
Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology 607


Chemometric Tools for Enhanced Performance in Liquid Chromatography-Mass Spectrometry

BY

DAN BYLUND

ACTA UNIVERSITATIS UPSALIENSIS
UPPSALA 2001

Dissertation for the Degree of Doctor of Philosophy in Analytical Chemistry presented at Uppsala University in 2001

Abstract

Bylund, Dan, 2001. Chemometric Tools for Enhanced Performance in Liquid Chromatography–Mass Spectrometry. Acta Univ. Ups., Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology 607. 47 pp. Uppsala. ISBN 91-554-4946-8.

Liquid chromatography–mass spectrometry (LC-MS) has become an important analytical on-line technique, capable of producing large amounts of data with high selectivity and sensitivity. Optimal use of the sophisticated instrumentation can be attained if the analytical chemists are guided to perform the proper experiments and to extract the useful information from the acquired data. In this thesis, strategies and methods concerning these two issues are presented.

LC-MS method development will benefit from fundamental understanding of the processes involved. An experimental procedure was designed to determine the coefficients in a model for the electrospray process. By relating these coefficients to the experimental conditions, the influence on signal level and on sensitivity to the presence of matrix compounds was studied.

For the optimization of LC-MS methods, strategies based on empirical modelling were worked out. Comparisons were made between artificial neural network (ANN) modelling and linear modelling tools, and a genetic algorithm was implemented to explore the ANN models.

Visual interpretation and multivariate analysis of LC-MS data are hampered by background signals and noise, and a digital filter for background suppression and signal-to-noise improvement was developed. It is also important to indicate the presence of overlapping peaks, and a strategy for the assessment of peak purity was therefore worked out. These methods and several established methods were implemented in an add-on program (LC-MS Toolbox 1.0) for information extraction from LC-MS data.

Ultimately, the data produced with LC-MS can be separated into the mass spectra, the elution profiles and the concentrations of the analytes, e.g. with PARAFAC modelling. The trilinear data structure assumed may, however, be distorted by variations in the LC conditions causing retention time shifts. An improved algorithm for time warping that can compensate for some of these deviations was worked out, and its performance as a pre-processing tool for PARAFAC was examined.

Dan Bylund, Institute of Chemistry, Department of Analytical Chemistry, Uppsala University, Box 531, SE-751 21 Uppsala, Sweden

Dan Bylund 2001

ISSN 1104-232X
ISBN 91-554-4946-8

Printed in Sweden by Lindbergs Grafiska HB, Uppsala 2001

List of papers

The following papers are discussed in this thesis. They are referred to in the text by their Roman numerals I-VI.

I. A method for determination of ion distribution within electrosprayed droplets
P.J.R. Sjöberg, C.F. Bökman, D. Bylund, K.E. Markides
Anal. Chem., 73(1), 23-28 (2001)

II. Optimisation of chromatographic separations by use of a chromatographic response function, empirical modelling and multivariate analysis
D. Bylund, A. Bergens, S.P. Jacobsson
Chromatographia, 44(1/2), 74-80 (1997)

III. Optimization strategy for liquid chromatography-electrospray ionization mass spectrometry methods
M. Moberg, D. Bylund, R. Danielsson, K.E. Markides
Analyst, 125(11), 1970-1976 (2000)

IV. Matched filtering with background suppression for improved base peak chromatograms and mass spectra in LC-MS
D. Bylund, R. Danielsson, K.E. Markides
Anal. Chim. Acta, submitted for publication

V. Peak purity assessment in liquid chromatography/mass spectrometry
D. Bylund, R. Danielsson, K.E. Markides
J. Chromatogr. A, accepted for publication

VI. Chromatographic alignment by warping and dynamic programming as a pre-processing tool for PARAFAC modelling of LC-MS data
D. Bylund, R. Danielsson, G. Malmquist, K.E. Markides
In manuscript

Reprints were made with kind permission from the publishers.

The following paper has been omitted since its content is not related to the scope of this thesis.

Chemometric techniques applied to soft x-ray emission spectra of aliphatic molecules
K. Gunnelin, M. Wirde, D. Bylund, G. Bray
Paper I, Doctoral Thesis, K. Gunnelin, Uppsala University, 1999

Table of contents

1. Introduction
2. Liquid chromatography/mass spectrometry
    ESI theory
3. Chemometrics
    Principal component analysis
    Multivariate calibration
    Parallel factor analysis
    Artificial neural networks
    Genetic algorithms
4. Optimization
    LC optimization
    LC-MS optimization
5. Exploration of LC-MS data
    Mathematical description of LC-MS data
    LC-MS Toolbox 1.0
6. LC/MS for quantification and classification
    Quantitative analysis
    PARAFAC modelling and the alignment problem
7. Concluding remarks and future aspects
8. Acknowledgements
9. References

Abbreviations

ALS        Alternating least squares
AMDIS      Automated mass spectral deconvolution and identification system
ANN        Artificial neural networks
APCI       Atmospheric pressure chemical ionisation
API        Atmospheric pressure ionisation
BPC        Base peak chromatogram
CMCP       Comparative mass chromatogram plot
CODA       Component detection algorithm
COW        Correlation optimized warping
CRF        Chromatographic response function
ESI        Electrospray ionisation
FSMW-EFA   Fixed size moving window - evolving factor analysis
GA         Genetic algorithms
GC         Gas chromatography
GRAM       Generalised rank annihilation method
GSD        Gaussian second derivative
ICP        Inductively coupled plasma
LC         Liquid chromatography
LIC        Length ion chromatogram
m/z        Mass-to-charge ratio
MCR        Multivariate curve resolution
MLR        Multiple linear regression
MS         Mass spectrometry
NIR        Near infrared
OBS        Orthogonal background subtraction
OSCAR      Optimization by stepwise constraining of alternating regression
PARAFAC    Parallel factor analysis
PCA        Principal component analysis
PCR        Principal component regression
PLS        Partial least squares regression
S/N        Signal-to-noise ratio
SIM        Single ion monitoring
SPC        Sequential paired covariance
TIC        Total ion chromatogram
UV         Ultraviolet
WMSM       Windowed mass selection method
XIC        Extracted ion chromatogram

Conventions

Scalars are represented by italic lower case characters, $a$
Vectors are represented by bold lower case characters, $\mathbf{a}$
Matrices are represented by bold upper case characters, $\mathbf{A}$
Three-way data arrays are represented by underlined, bold upper case characters, $\underline{\mathbf{A}}$
The superscript T indicates the transpose, $\mathbf{A}^{\mathrm{T}}$
Running indices are indicated by italic lower case characters, $a_i$
Dimensions are given by italic upper case characters, $\mathbf{A}$ $(I \times J)$


1. Introduction

A general trend in analytical chemistry is to produce more and more data per sample. This is due to increasing analytical demands for higher specificity and sensitivity, and has been facilitated by developments in instrumentation and computer systems, making large amounts of data possible to produce and store with good economy. In order to make efficient use of the sophisticated analysis systems present, there is a need for methods that can help the analytical chemists to perform good experiments and to extract the relevant information from the acquired data. Such methods are collected under the discipline of chemometrics.

One of the recent important instrumental developments, giving high selectivity and sensitivity for complex samples, is the hyphenation of liquid chromatography with mass spectrometry (LC-MS). The use of LC-MS in analytical chemistry has shown an exponential increase ever since the introduction of commercial instruments with interfaces based on atmospheric pressure ionisation in the 1980s. LC-MS has now become an important tool in, e.g., pharmaceutical drug development, protein characterisation (proteomics) and environmental control.

This thesis deals with how to increase the flow of chemical information from analytical methods based on LC-MS. Such an improved performance can be achieved by method development, as discussed in Papers I-III, and by data processing, as discussed in Papers IV-VI.

Analytical method development can be separated into optimization and validation, of which the first issue is discussed in this thesis. For LC-MS methods, several processes are involved and a good starting point for the optimization would be chosen from fundamental understanding of these processes (Paper I). The complexity of the system also means that further optimization preferably should be guided by the use of empirical models (Papers II and III).

The data produced with an LC-MS system are of second order; the data points represent intensity as a function of both retention time and m/z ratio. Full use and interpretability of this information can be attained by the application of chemometrics. Ultimately, the data can be separated into the mass spectra, the elution profiles and the concentrations of the solutes in the sample. This separation can be performed with multiway modelling (Paper VI) and is facilitated if non-ideal properties of the data are first removed. Such data pre-processing may involve background subtraction (Paper IV) and retention time alignment (Paper VI). The models might be less accurate in the presence of co-eluting compounds; hence assessment of the peak purity is also of interest (Paper V).


2. Liquid chromatography-mass spectrometry

For decades, the liquid chromatograph has been a workhorse in the separation of organic compounds. At the same time the mass spectrometer has been an important and sensitive tool for structure elucidation. By hyphenating the two techniques, a very powerful instrumental set-up is achieved. However, interfacing the two techniques is not straightforward since the solutes leaving an LC column are dissolved in mobile phase at atmospheric pressure, while the MS is constructed to detect gas phase ions in vacuum.

One of the first attempts at on-line LC-MS experiments was reported by Tal'roze et al., using a capillary inlet interface [1]. Several other types of interfaces have been suggested, including, e.g., thermospray [2] and fast atom bombardment [3]. The breakthrough for LC-MS was, however, the development of two techniques for atmospheric pressure ionisation (API): the electrospray ionisation (ESI) [4] and the atmospheric pressure chemical ionisation (APCI) [5].

ESI and APCI are both 'soft' ionisation techniques generating mainly protonated or deprotonated ions. The two techniques can be considered as complementary since ESI involves solution chemistry and APCI gas phase chemistry, although it can be hard to select a preferred method for a given compound. Generally APCI is preferred for less polar compounds with moderate acid-base properties in solution, while ESI should be the method of choice for more polar compounds. With ESI it is also possible to obtain multiply charged ions for large molecules [6], e.g. proteins and carbohydrates. Thereby the detection of high molecular weight compounds is facilitated for instruments with limited m/z range.

Apart from the ion source, the main parts of a mass spectrometer are the mass analyser, where the ions are separated, and the detector counting the ions. Today the most common mass analysers for LC-MS are those used in quadrupole, time of flight (TOF) and ion trap instruments.

The high gas load generated by the LC effluent makes the hyphenation with MS difficult. It is also hard for the MS system to handle the non-volatile buffers often used in LC. Even though the LC-MS manufacturers are struggling to construct interfaces that can handle both the flow rates used in conventional LC and the presence of phosphate in the effluent, the best performance with ESI is still obtained by the use of volatile buffers, e.g. ammonium acetate, and flow rates below what is normally used in conventional LC. Hence LC-MS has been one of the driving forces for the development of miniaturised LC systems [7]. In addition to the MS compatibility, µ-LC offers higher efficiency at a lower cost than conventional LC, together with low sample volume and mobile phase consumption. With µ-LC it is also easier to use the temperature for selectivity control [8,9].


Throughout this work, quadrupole instruments with pneumatically assisted ESI interfaces [10] have been used. These instruments have been hyphenated to both µ- and conventional LC systems. In the latter case the flow has been split before the ESI interface.

ESI theory (Paper I)

Electrospray is the dispersion of a liquid into charged droplets caused by the force of an applied electrostatic field. In the late 1960s Dole et al. [11] were the first to report studies of this mechanism as a sample introduction technique, capable of producing gas phase ions at atmospheric pressure. Iribarne and Thomson reported further studies in the 1970s [12,13] and in the 1980s several research groups considered electrospray as a source for gas phase ions detected with MS [14-16].

In ESI [17], an electrostatic field is generated between the spray emitter and the MS entrance by applying a potential difference between the electrodes. The field induces a charge separation in the liquid, and thereby formation of an aerosol with a net charge flow proportional to the current driven by the system. A fraction, f, of this net charge may leave the droplets as gas phase ions. The exact mechanism [18] for this step is not known, but the predominant theories are those of Dole et al. [11] and Iribarne and Thomson [12,13]. The gas phase ions are then sampled, transmitted, and detected by the MS system with a certain probability, P.

Due to charge repulsion, the main part of the net charge will be situated at the surface of the droplets. The surface activity of the solute is thus an important parameter as pointed out in the model by Tang and Kebarle [19]. Enke refined this model by defining a partition coefficient (K) between the charged surface and the neutral interior of the droplet [20]. For a system with analyte (A) and electrolyte (E) present, there will be a competition for the surface sites described by

\[ \frac{K_A}{K_E} = \frac{[A^+]_s\,[E^+X^-]_i}{[A^+X^-]_i\,[E^+]_s} \tag{2.1} \]

where X- is the counter ion and the subscripts s and i stand for surface and interior, respectively. The response (R) for the analyte can then be described as a function of its surface concentration

\[ R_A = Pf\,[A^+]_s \tag{2.2} \]

where [A+]s is some fraction of the net charge concentration [Q]. By measuring the current I and the liquid flow rate L, [Q] can be determined from

\[ [Q] = \frac{I}{FL} \tag{2.3} \]

where F is the Faraday constant. By incorporating mass balance equations and the fact that [Q] = [A+]s + [E+]s for the two-component system described above, Enke [20] arrived at the following equation

\[ [A^+]_s^2\left(\frac{K_A}{K_E}-1\right) - [A^+]_s\left([Q]\left(\frac{K_A}{K_E}-1\right) + C_A\frac{K_A}{K_E} + C_E\right) + C_A[Q]\frac{K_A}{K_E} = 0 \tag{2.4} \]

where CA and CE are the total concentrations of analyte and electrolyte, respectively. By solving this quadratic equation an expression for [A+]s is obtained that can be fitted to experimental values of RA vs. CA and from which the value of KA/KE can be determined (Fig. 1).

Fig. 1. Surface concentration of the tetrapentylammonium (TPA) ion vs. the total concentration of TPA in methanol - water (50/50) with 0.10 mM tetramethylammonium bromide as electrolyte: (!) experimental data, (o) data points fitted using Eq. 2.4.
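As a minimal sketch of how Eqs. 2.3 and 2.4 could be evaluated numerically, the Python functions below compute [Q] from the spray current and the flow rate and then select the physically meaningful root of the quadratic. The function names, the units (A, L/s, mol/L) and the root selection rule (0 ≤ [A+]s ≤ min(CA, [Q])) are assumptions made for illustration; they are not taken from Paper I or from Enke's work.

```python
import math

FARADAY = 96485.33212  # Faraday constant, C/mol


def net_charge_concentration(current_a, flow_l_per_s):
    """Eq. 2.3: [Q] = I / (F L), in mol/L of excess charge."""
    return current_a / (FARADAY * flow_l_per_s)


def surface_concentration(c_a, c_e, q, k_ratio):
    """Solve Eq. 2.4 for [A+]s given C_A, C_E, [Q] and K_A/K_E.

    All concentrations are in mol/L; k_ratio is K_A/K_E.
    """
    a = k_ratio - 1.0
    b = -(q * (k_ratio - 1.0) + c_a * k_ratio + c_e)
    c = c_a * q * k_ratio
    if abs(a) < 1e-12:
        # K_A = K_E: the quadratic term vanishes and Eq. 2.4 is linear.
        return -c / b
    root_disc = math.sqrt(b * b - 4.0 * a * c)
    roots = ((-b - root_disc) / (2.0 * a), (-b + root_disc) / (2.0 * a))
    # Assumed selection rule: the surface concentration can exceed neither
    # the total analyte concentration nor the net charge concentration.
    upper = min(c_a, q)
    for r in roots:
        if 0.0 <= r <= upper * (1.0 + 1e-9):
            return r
    raise ValueError("no physically meaningful root found")
```

In a fitting procedure of the kind described above, KA/KE would be varied until Pf·[A+]s reproduces the measured RA vs. CA curve.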

In Paper I, an alternative method for determination of KA/KE was presented. This method is based on the assumption that the instrumental response factor Pf for the electrolyte can be determined for a one-component system with only electrolyte present, and then utilised as a constant for a two-component system with both electrolyte and analyte present. Under this assumption only two experiments are required to determine [A+]s and [E+]s. By incorporating mass balance, a value for KA/KE can then be directly calculated from Eq. 2.1 according to


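\[ \frac{K_A}{K_E} = \frac{\left([Q] - \dfrac{R_E}{R_E^0}[Q]^0\right)\left(C_E - \dfrac{R_E}{R_E^0}[Q]^0\right)}{\dfrac{R_E}{R_E^0}[Q]^0\left(C_A - [Q] + \dfrac{R_E}{R_E^0}[Q]^0\right)} \tag{2.5} \]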

where the superscript zero indicates the results obtained for the experiment with no analyte present. Thus the only things that have to be measured in the two experiments are the current and the response for the electrolyte.
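The two-experiment calculation can be sketched in Python as follows. The sketch assumes that Pf is obtained as the ratio of the electrolyte response to [Q] in the electrolyte-only experiment, that [E+]s in the second experiment follows from the electrolyte response, and that [A+]s follows from the charge balance [Q] = [A+]s + [E+]s. The function and argument names are hypothetical.

```python
def partition_ratio(c_a, c_e, q, q0, r_e, r_e0):
    """Estimate K_A/K_E from two experiments (a sketch of Eq. 2.5).

    c_a, c_e : total analyte and electrolyte concentrations (mol/L)
    q, r_e   : net charge concentration and electrolyte response
               with analyte present
    q0, r_e0 : the same quantities with electrolyte only
    """
    # Pf = R_E^0 / [Q]^0, so [E+]s = R_E / Pf = (R_E / R_E^0) [Q]^0
    e_s = (r_e / r_e0) * q0
    # Charge balance for the two-component system
    a_s = q - e_s
    # Eq. 2.1 with the interior concentrations replaced via mass balance:
    # [A+X-]i = C_A - [A+]s and [E+X-]i = C_E - [E+]s
    return (a_s * (c_e - e_s)) / (e_s * (c_a - a_s))
```

Only the currents (through [Q] and [Q]0) and the electrolyte responses enter the calculation, in line with the statement above.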

This experimental procedure was in Paper I applied for a series of permanently charged quaternary ammonium compounds. It was found that the partition factor KA/KE and the instrumental response factor Pf optimized at different amounts of organic modifier in the mobile phase. This behaviour has several consequences regarding method development. One is that when optimizing for maximum response, there will exist an interaction between the amount of organic modifier and the buffer content (the electrolyte). Another important consequence is that operation at optimal signal conditions might not be optimal when matrix effects are considered.

In a recent report by Zhou and Cook [21], Enke's partition coefficient was expressed as a function of surface activity, electrophoretic mobility and ion-pairing properties. This model can be used to further explain some of the results in Paper I. Since the electrophoretic mobility is inversely proportional to the viscosity, this factor will have a lower impact on KA/KE for high viscosity mobile phase compositions. Hence high KA/KE values are expected for such mobile phases, provided that the electrolyte has a higher mobility. This is fully in agreement with the results obtained for the quaternary ammonium compounds.


3. Chemometrics

Svante Wold gave the following definition of chemometrics [22] in 1994: “How do we get chemically relevant information out of measured chemical data, how do we represent and display this information, and how do we get such information into data?” In other words, chemometrics comprises mathematical and statistical methods that guide the flow of chemical information. Hence chemometrics is a natural extension of analytical chemistry (Fig. 2).

Fig. 2. The route of analytical chemistry in which chemometrics can play many roles. It can help the analytical chemist to select the analytical method, comprising all the steps from sampling to data acquisition, to optimize and validate this method, and finally to interpret the data and translate it into an answer that can be understood by those who ordered the analysis.

The concept of a model is central in chemometrics. Depending on the assumptions made, models can be classified from soft to hard, with the hardest being those described by physical laws. The complexity of the model determines how the experiments shall be designed and which tools can be applied to extract the information from the acquired data.

The thinking in terms of models has also permeated this work and numerous chemometric methods have been used. The theory behind those methods will be briefly described below. Further reading can be found in the references given to each technique or in general textbooks; see for example Handbook of Chemometrics and Qualimetrics [23,24].

Principal component analysis

Principal component analysis (PCA) [25] is a standard tool to compress and visualise large amounts of data. In chemistry, it has been used in a wide range of applications including, e.g., classification [26], experimental design [27] and multivariate quality control [28]. In this work PCA has been used for peak purity analysis [Paper V].

[Fig. 2 labels: Analytical request, Analytical method, Analytical answer, CHEMOMETRICS]

With PCA, a data matrix X (M×N) is decomposed into a bilinear model according to

\[ \mathbf{X} = \mathbf{T}\mathbf{P}^{\mathrm{T}} \tag{3.1} \]

where T (M×Fmax) is the orthogonal score matrix and P (N×Fmax) is the orthonormal loading matrix. The matrices are arranged so that the columns of T have descending variance, i.e. so that the first principal component describes most of the variance in X etc. The value of Fmax (the number of components) equals the rank of X. Due to noise this value often exceeds the number of components F that is necessary to reconstruct the relevant information in X. It is then possible to divide the principal components into primary and secondary components, where the latter contain only noise. Hence the model (Eq. 3.1) can be truncated into

\[ \mathbf{X} = \mathbf{T}\mathbf{P}^{\mathrm{T}} + \mathbf{E} \tag{3.2} \]

where E (M×N) constitutes the residuals. It is often of interest to find the optimal number of components of this model (the pseudo-rank F), and several test procedures have been suggested for this purpose [29,30]. The resulting model means a projection of X on so-called latent variables. The co-ordinates along these variables (components) form the score matrix T.

By plotting the scores of one component vs. another, a score plot is obtained which can be used to find relations between the objects, e.g. for classification or outlier detection. Similarly, loading plots can be used to relate the variables and visualise their contribution to the model.
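The truncated decomposition of Eq. 3.2 can be illustrated with a short NumPy sketch. It computes scores and loadings from a singular value decomposition, which is one common way to obtain a PCA model and is an assumption here, since the text does not specify an algorithm. Any centring or scaling of X is left to the caller, and the function name is hypothetical.

```python
import numpy as np


def pca(X, n_components):
    """Decompose X (M x N) into scores T, loadings P and residuals E."""
    # Thin SVD: X = U diag(s) Vt, singular values in descending order
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Scores: orthogonal columns with descending variance
    T = U[:, :n_components] * s[:n_components]
    # Loadings: orthonormal columns
    P = Vt[:n_components].T
    # Residuals of the truncated model X = T P^T + E
    E = X - T @ P.T
    return T, P, E
```

A score plot is then simply a scatter plot of one column of T against another, and a loading plot the same for columns of P.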

Multivariate calibration

Methods for multivariate calibration [31] are commonly applied in modern analytical chemistry, e.g. in NIR spectroscopy [32] and ICP-MS [33]. In this work, multivariate calibration methods have been used for empirical modelling of chromatographic behaviour [Papers II,III] and of the signal-to-noise ratio (S/N) obtained for LC-MS experiments [Paper III].

By multivariate calibration a dependent variable y is related to a multivariate space of independent variables X according to

\[ \mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e} \tag{3.3} \]

where e constitutes the residuals. The regression coefficient vector b is determined from

\[ \mathbf{b} = \mathbf{X}^{+}\mathbf{y} \tag{3.4} \]

where X+ is a generalised inverse of X. With multiple linear regression (MLR), X+ is given by

\[ \mathbf{X}^{+} = \left(\mathbf{X}^{\mathrm{T}}\mathbf{X}\right)^{-1}\mathbf{X}^{\mathrm{T}} \tag{3.5} \]

This inverse will exist only if X has full column rank and hence the number of variables must not exceed the number of samples. As a consequence variable selection [34] may be necessary. Problems in terms of unstable results will occur if X is close to singular, e.g. due to co-linearity. An alternative is then to perform PCA on X followed by regression of y on T, a method referred to as principal component regression (PCR). In PCR the generalised inverse is given by

\[ \mathbf{X}^{+} = \mathbf{P}_F\left(\mathbf{T}_F^{\mathrm{T}}\mathbf{T}_F\right)^{-1}\mathbf{T}_F^{\mathrm{T}} \tag{3.6} \]

An optimal value for F can be found by cross- or test set validation [35]. It is not certain, however, that the principal components describing much variance in X are the ones most predictive for y. An alternative is to apply partial least squares regression (PLS) where the decomposition of X is guided to maximise the covariance with y rather than to describe as much variance in X as possible. In PLS the generalised inverse is given by

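\[ \mathbf{X}^{+} = \mathbf{W}_F\left(\mathbf{P}_F^{\mathrm{T}}\mathbf{W}_F\right)^{-1}\left(\mathbf{T}_F^{\mathrm{T}}\mathbf{T}_F\right)^{-1}\mathbf{T}_F^{\mathrm{T}} \tag{3.7} \]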

where W is the loading weights matrix.
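The MLR and PCR inverses of Eqs. 3.5 and 3.6 can be compared in a few lines of NumPy. This is a sketch under assumptions: scores and loadings are taken from a singular value decomposition, the matrix inverses are formed explicitly to mirror the equations (a least-squares solver would normally be preferred numerically), and the function names are hypothetical. PLS is left out because it needs an additional algorithm to obtain W.

```python
import numpy as np


def mlr_inverse(X):
    """Eq. 3.5: X+ = (X^T X)^-1 X^T, requires full column rank."""
    return np.linalg.inv(X.T @ X) @ X.T


def pcr_inverse(X, n_components):
    """Eq. 3.6: X+ = P_F (T_F^T T_F)^-1 T_F^T with F retained components."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]  # scores T_F
    P = Vt[:n_components].T                     # loadings P_F
    return P @ np.linalg.inv(T.T @ T) @ T.T


def regression_vector(X_plus, y):
    """Eq. 3.4: b = X+ y."""
    return X_plus @ y
```

When all components are retained and X has full column rank, the PCR inverse coincides with the MLR inverse; dropping components trades some fit for stability when X is close to singular.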

Parallel factor analysis

Parallel factor analysis (PARAFAC) [36] can be regarded as an extension of PCA into higher order data, i.e. data that are dependent on more than two controlled factors. In chemistry, PARAFAC has been used for, e.g., studies of kinetic systems [37] and in qualitative and quantitative analysis with UV spectrophotometry [38]. In this work PARAFAC has been used to model LC-MS data for peptide standard mixtures [Paper VI].

For a three-way data array X (I×J×K), the PARAFAC model is

\[ x_{ijk} = \sum_{f=1}^{F} a_{if}\,b_{jf}\,c_{kf} + e_{ijk} \tag{3.8} \]

where F is the number of factors, eijk is an element of the residual array E (I×J×K) and aif, bjf and ckf are elements of the loading matrices A (I×F), B (J×F) and C (K×F) respectively.

Compared to the bilinear case modelled by PCA, there are a number of important differences. The first is that unlike PCA, the optimal PARAFAC solution will usually not be obtained if the factors are calculated sequentially. The consequence is that the number of factors should be set in advance and the entire model fitted simultaneously. Another important difference is that the rotational freedom present for bilinear data does not exist for higher order linearity. The PARAFAC solution is unique, and the contents of A, B and C can be directly interpreted as physical-chemical properties of X.

After setting the value of F, the PARAFAC model is fitted by alternating least squares (ALS) [39,40]. It is possible to add constraints to this iterative algorithm, e.g. non-negativity or unimodality [40]. The unconstrained case will always give the best fit for a given number of factors. However, by the use of constraints it is possible to incorporate pre-knowledge of the data, e.g. that a chromatographic elution profile has a single maximum and that a mass spectrum does not contain negative values. Thereby the resulting model can be easier to interpret, especially if X is not truly trilinear or contains high levels of noise.
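An unconstrained ALS fit of Eq. 3.8 can be sketched with NumPy as below. This is an illustrative sketch, not the algorithm of refs. [39,40]: it uses random starting values, a fixed number of iterations instead of a convergence criterion, and no constraints. The function names are hypothetical.

```python
import numpy as np


def khatri_rao(U, V):
    """Column-wise Kronecker product; row (u*V.shape[0] + v) = U[u] * V[v]."""
    return (U[:, None, :] * V[None, :, :]).reshape(-1, U.shape[1])


def reconstruct(A, B, C):
    """Model part of Eq. 3.8: sum over f of a_if b_jf c_kf."""
    return np.einsum('if,jf,kf->ijk', A, B, C)


def parafac_als(X, n_factors, n_iter=200, seed=0):
    """Fit loading matrices A (I x F), B (J x F), C (K x F) by ALS."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.standard_normal((I, n_factors))
    B = rng.standard_normal((J, n_factors))
    C = rng.standard_normal((K, n_factors))
    # Unfold the three-way array along each mode
    X1 = X.reshape(I, J * K)                     # columns ordered (j, k)
    X2 = X.transpose(1, 0, 2).reshape(J, I * K)  # columns ordered (i, k)
    X3 = X.transpose(2, 0, 1).reshape(K, I * J)  # columns ordered (i, j)
    for _ in range(n_iter):
        # Each step is a linear least-squares update with the
        # other two loading matrices held fixed.
        A = X1 @ np.linalg.pinv(khatri_rao(B, C)).T
        B = X2 @ np.linalg.pinv(khatri_rao(A, C)).T
        C = X3 @ np.linalg.pinv(khatri_rao(A, B)).T
    return A, B, C
```

For LC-MS data the three modes could be, for instance, sample, retention time and m/z. Constraints such as non-negativity would replace the plain least-squares updates by constrained ones.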

Artificial neural networks

Algorithms with the purpose to mimic the properties of the human brain are classified as artificial neural networks (ANNs). Different types of ANNs have been used for a wide variety of tasks, including, e.g., speech recognition and space travelling. Within analytical chemistry, ANNs have mainly been used for non-linear calibration [41] and classification problems [42]. In most of these works, multi-layer feed-forward neural networks [43] have been the method of choice. In this work ANNs have been used for empirical modelling of a chromatographic response function [Paper II] and of the S/N for LC-MS experiments [Paper III].

A multi-layer feed-forward neural network consists of a number of operating units, referred to as nodes or neurones. The nodes are arranged in layers with one input layer, one output layer, and at least one hidden layer in between. The number of nodes in the input and output layers equals the number of independent and dependent variables, respectively, while the number of hidden nodes must be optimized for the problem at hand. Each node in a layer is connected to all of the nodes in the next layer, and each connection is assigned a weight (w). The first operation in a node is to sum the incoming signals according to

\[ net_j = \sum_i w_{ji}\,o_i \tag{3.9} \]

The output (oj) from the node j is then given by a function of the sum (netj). Any differentiable function can be used, but frequently a sigmoidal function is applied

\[ o_j = \frac{1}{1 + e^{-net_j}} \tag{3.10} \]

Once the numbers of hidden nodes and layers have been chosen, calibrating the network is a matter of optimizing the weights. This is often accomplished iteratively by the backpropagation learning rule. After initialising the network with random weights, calibration data is presented to the network and the error (ε) between the network output (ok) and the experimental data (y) is calculated.

\[ \varepsilon = y - o_k \tag{3.11} \]

The weights are then adjusted according to

\[ \Delta w_{ji} = \eta\,\delta_j\,o_i \tag{3.12} \]

where η is the learning rate and δ is proportional to the error and the derivative of the transfer function of the node. For a node in the output layer δ is given by

\[ \delta_k = f'(net_k)\,\varepsilon \tag{3.13} \]

while δ for a hidden node, where the desired value is unknown, is given by

\[ \delta_j = f'(net_j)\sum_k \delta_k\,w_{kj} \tag{3.14} \]

The learning rate is an important parameter. A low setting for η, i.e. a small step size in the weight optimization, gives accurate results at the cost of slow convergence. There is also a higher risk of getting stuck at local optima. On the other hand, a high setting may lead to unstable or even oscillating results. In order to combine the properties of low and high learning rate, η can be made adaptive as a function of the learning progress. Another common customisation to prevent oscillations is to invoke a momentum term α in the weight adjustment according to

\[ \Delta w_{ji}(n+1) = \eta\,\delta_j\,o_i + \alpha\,\Delta w_{ji}(n) \tag{3.15} \]

where n and n+1 indicate the previous and the current iteration, respectively. A more robust backpropagation algorithm was proposed by Walczak [44]. Other weight optimisation procedures are also available, all with different merits and drawbacks [45].
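The update rules of Eqs. 3.9 to 3.15 can be put together in a small NumPy sketch of a network with one hidden layer and a single output node. Several choices here are assumptions for illustration and do not come from Papers II or III: a constant input of 1 is appended to each layer as a bias, the weights are updated after every sample, and the default values of η, α and the number of hidden nodes are arbitrary. For the sigmoid of Eq. 3.10 the derivative is f'(net) = o(1 - o).

```python
import numpy as np


def sigmoid(net):
    """Eq. 3.10."""
    return 1.0 / (1.0 + np.exp(-net))


def train(X, y, n_hidden=3, eta=0.5, alpha=0.5, epochs=2000, seed=0):
    """Backpropagation with momentum for one hidden layer, one output."""
    rng = np.random.default_rng(seed)
    n_in = X.shape[1]
    # Random initial weights; the last column/entry is the assumed bias
    W_h = rng.uniform(-0.5, 0.5, (n_hidden, n_in + 1))  # hidden weights w_ji
    w_o = rng.uniform(-0.5, 0.5, n_hidden + 1)          # output weights w_kj
    dW_h = np.zeros_like(W_h)
    dw_o = np.zeros_like(w_o)
    for _ in range(epochs):
        for x, target in zip(X, y):
            o_i = np.append(x, 1.0)                   # input layer output
            o_j = np.append(sigmoid(W_h @ o_i), 1.0)  # Eqs. 3.9 and 3.10
            o_k = sigmoid(w_o @ o_j)
            eps = target - o_k                        # Eq. 3.11
            delta_k = o_k * (1.0 - o_k) * eps         # Eq. 3.13
            # Eq. 3.14, using the output weights before they are updated
            delta_j = o_j[:-1] * (1.0 - o_j[:-1]) * delta_k * w_o[:-1]
            # Eq. 3.15: new step = eta * delta * o + alpha * previous step
            dw_o = eta * delta_k * o_j + alpha * dw_o
            dW_h = eta * np.outer(delta_j, o_i) + alpha * dW_h
            w_o += dw_o
            W_h += dW_h
    return W_h, w_o


def predict(X, W_h, w_o):
    """Forward pass for all rows of X."""
    ones = np.ones((X.shape[0], 1))
    hidden = sigmoid(np.hstack([X, ones]) @ W_h.T)
    return sigmoid(np.hstack([hidden, ones]) @ w_o)
```

Because the output node is sigmoidal, the dependent variable has to be scaled into the interval (0, 1) before training in this sketch.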


Genetic algorithms

Genetic algorithms (GAs) [46] are suited to solve combinatorial optimization problems. In analytical chemistry, most applications with GAs have concerned variable selection [47,48], but they have also been used for other problems, e.g. the resolution of overlapping chromatographic peaks [49]. In this work, a GA was used to explore ANN models in order to find optimal parameter settings for LC-MS experiments [Paper III].

The main principle of a GA is the natural selection according to Darwin’s theory of survival of the fittest [50], and several parallels can be drawn with the evolving forces of nature. A standard GA is initialised by random selection of a start population of individuals, each representing a candidate solution to the optimization problem. The individuals are ranked according to their fitness, i.e. to their success in solving the problem as measured by an objective function. The individuals are then allowed to recombine, i.e. exchange variable setting properties, with probabilities related to their fitness. The offspring forms the new generation, which is evaluated with the objective function and so on.

A common customisation of the flow chart described above is the introduction of a mutation step, i.e. low probability random changes in variable settings. This ensures a more complete search of the candidate solution space, especially for small populations for which random effects may easily lead to irrecoverable loss of some variable settings due to what in nature is known as the bottleneck effect.

The individuals of a GA are often represented by bitstrings (chromosomes). The bitstrings (sequences of ones and zeros) are divided into sections (genes), each representing a variable setting. For such GAs, a mutation is simply the exchange between zero and one at a given position. Recombination is often accomplished by so-called 1X recombination as shown in Fig. 3.

Fig. 3. 1X recombination of two individuals from a GA population. The position for the cleavage is randomly chosen and the offspring is formed by recombining the pieces.

Parents:    1 0 0 1 0 1 1 1 0 1 | 0 1   +   0 1 0 0 1 1 0 0 0 1 | 1 0

Offspring:  1 0 0 1 0 1 1 1 0 1 | 1 0   +   0 1 0 0 1 1 0 0 0 1 | 0 1
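The two bitstring operators described above, 1X recombination and mutation, can be sketched in plain Python. The function names and the list-of-integers representation are assumptions for illustration; the example strings are those of Fig. 3, read as two parents followed by two offspring with the cleavage after the tenth bit.

```python
import random


def one_point_crossover(parent_a, parent_b, cut):
    """1X recombination: swap the tails of two bitstrings after `cut`."""
    child_a = parent_a[:cut] + parent_b[cut:]
    child_b = parent_b[:cut] + parent_a[cut:]
    return child_a, child_b


def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]


def recombine(parent_a, parent_b, rng, mutation_rate=0.01):
    """Cut at a random position, recombine, then mutate the offspring."""
    cut = rng.randrange(1, len(parent_a))
    child_a, child_b = one_point_crossover(parent_a, parent_b, cut)
    return mutate(child_a, mutation_rate, rng), mutate(child_b, mutation_rate, rng)


# Bitstrings from Fig. 3
PARENT_A = [1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1]
PARENT_B = [0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0]
```

In a complete GA, the pairs passed to `recombine` would be drawn with probabilities related to their fitness, and the offspring would form the next generation.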

4. Optimization

When setting up an analytical method it is often of interest to find the combination ofparameter settings that gives the best results, i.e. to optimize the conditions within theexperimental domain. When only few parameters are considered, this domain can bestudied by varying one parameter at the time. However, as the number of parametersincreases this procedure soon becomes impractical, especially if the effects of theparameters are not independent of each other. It is then preferable to use a moreefficient optimization strategy.

Multi-parameter optimization strategies [51] can be divided into response surface methodology and sequential methods. In sequential optimization the experiments are performed one by one at parameter settings given by a search routine, usually a gradient search method like the simplex [52]. Thereby the results of the previous experiments will guide the experimental conditions towards an optimum. A problem is that the entire experimental domain will usually not be covered, and there is a risk of ending up at a local optimum rather than the global optimum. To avoid this, it is often recommended to repeat the optimization from a different start position and see if the same optimum is found.

In response surface optimization, a number of experiments are performed according to an experimental design. The results are then fitted to a model describing the relationship between the response and the parameters investigated. The best conditions are then selected from this model. The choice of experimental design determines how complex this model can be made. In the early stage of method development, a screening design with few experiments per parameter can be used to give first estimates for the main effects. The important parameters can then be modelled by the use of a response surface design, e.g. a central composite design [53], in order to account for interactions and non-linearities.
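As an illustration of what a central composite design looks like in coded units, the sketch below generates the factorial corners, the axial points and the centre runs. The function name, the default axial distance (the rotatable choice, the fourth root of the number of corner points) and the single centre run are assumptions made for the example; the thesis does not state which variant was used.

```python
from itertools import product

def central_composite_design(n_factors, alpha=None, n_center=1):
    """Coded design points: 2**n corners, 2*n axial points and the centre runs."""
    if alpha is None:
        alpha = (2 ** n_factors) ** 0.25  # rotatable design; an assumed default
    corners = [list(p) for p in product((-1.0, 1.0), repeat=n_factors)]
    axial = []
    for j in range(n_factors):
        for sign in (-1.0, 1.0):
            point = [0.0] * n_factors
            point[j] = sign * alpha
            axial.append(point)
    centre = [[0.0] * n_factors for _ in range(n_center)]
    return corners + axial + centre

design_2 = central_composite_design(2)  # 4 corners + 4 axial + 1 centre
design_3 = central_composite_design(3)  # 8 corners + 6 axial + 1 centre
```

Each parameter is thereby studied at five levels, which is what allows quadratic terms to be estimated.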

LC optimization (Paper II)

The aim of analytical LC method development is to obtain sufficient resolution for the solutes within a reasonable analysis time. The chromatographic resolution (Rs) can be expressed as a function of efficiency (as measured by the plate number N), selectivity (α) and retention (k) according to

$$R_s = \frac{\sqrt{N}}{2}\left(\frac{\alpha-1}{\alpha+1}\right)\left(\frac{k}{1+k}\right) \qquad (4.1)$$
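A short numerical example may help to show the relative weight of the three terms. It assumes that Eq. 4.1 has the familiar form Rs = (√N/2)·((α−1)/(α+1))·(k/(1+k)); the input values are invented for illustration only.

```python
import math

def resolution(n_plates, alpha, k):
    """Chromatographic resolution from plate number, selectivity and retention factor."""
    efficiency = math.sqrt(n_plates) / 2.0
    selectivity = (alpha - 1.0) / (alpha + 1.0)
    retention = k / (1.0 + k)
    return efficiency * selectivity * retention

# Hypothetical separation: N = 10000, alpha = 1.10, k = 2
rs_base = resolution(10000, 1.10, 2.0)
# Doubling the plate number only gains a factor sqrt(2) ...
rs_more_plates = resolution(20000, 1.10, 2.0)
# ... whereas a modest selectivity increase almost doubles Rs.
rs_more_selective = resolution(10000, 1.20, 2.0)
```

This is consistent with the statement that changing the selectivity is usually the most effective route to better resolution.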

In LC, the most common way to improve the resolution is to increase the selectivity by changing the mobile phase. General guidelines for the selection of the mobile phase can be found in the literature [54] and are often offered by LC manufacturers. There are also some hard models that can be utilised, e.g. the linear relationship between the logarithm of the retention factor and the fraction of organic modifier. Such knowledge can provide a good starting point and even be sufficient for less demanding separations.

For more complex separation problems, many parameters, with possible interactions, must be considered. Then optimization with chemometric methods can offer an efficient pathway. In order to define the goal of the optimization it is necessary to describe the chromatogram by a single measure. This is often accomplished with a chromatographic response function (CRF), and some of the most commonly used CRFs are given in reference [55].

In Paper II, a new CRF was developed, describing the chromatogram as a product of a quality function (Q=f(Rs)) and a time function (T=f(k)) according to

$$\mathrm{CRF} = Q^{a/(a+b)}\,T^{b/(a+b)} \qquad (4.2)$$

By changing the weights (a and b) it is possible to alter the relative importance of resolution and analysis time.
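The effect of the weights can be illustrated with a small sketch. It assumes that Eq. 4.2 is the weighted geometric mean CRF = Q^(a/(a+b))·T^(b/(a+b)) and that Q and T are scaled between 0 and 1; the actual forms of Q=f(Rs) and T=f(k) are given in Paper II and are not reproduced here, so the input values are purely hypothetical.

```python
def crf(q, t, a=1.0, b=1.0):
    """Weighted geometric mean of a quality score q and a time score t."""
    weight_q = a / (a + b)
    weight_t = b / (a + b)
    return (q ** weight_q) * (t ** weight_t)

# Hypothetical chromatogram: good resolution (Q = 0.9) but long run time (T = 0.4)
balanced = crf(0.9, 0.4, a=1.0, b=1.0)
favour_resolution = crf(0.9, 0.4, a=3.0, b=1.0)
favour_speed = crf(0.9, 0.4, a=1.0, b=3.0)
```

With equal weights the CRF is the plain geometric mean of the two scores; increasing a pulls the CRF towards Q, increasing b pulls it towards T.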

In response surface optimization, the CRF can be modelled either as a direct function of the LC parameters considered or indirectly from retention factor models [56] and the use of Eq. 4.1 assuming N to be constant. In Paper II, the stationary phase was a protein for which the properties could be suspected to be dependent on the operating conditions. Hence the CRF was modelled directly as a function of the investigated parameters (temperature, pH, buffer concentration and amount of organic modifier).

The idea behind the optimization strategy applied in Paper II was to combine response surface methodology and sequential optimization and make use of all the acquired data. After a few initial experiments, a reduced factorial design with a centre point was planned and the experiments were performed accordingly. A PLS model of the results, including all the quadratic and interaction parameters, was then used to select the conditions for the next experiment. This experiment was performed and the result used to update the PLS model, and so on. It can be argued that this strategy may lead to local optima. In order to prevent this, experiments can be performed at regular intervals under the least explored conditions of the experimental domain, as suggested by Djordjevic et al. [57].

For the large experimental domain investigated in Paper II it could be suspected that a quadratic model for the CRF (Eq. 4.2) was insufficient. Hence a comparison was made between MLR, PLS and ANN modelling of the results. In this study, an ANN model with eight hidden nodes in a single hidden layer outperformed the other methods in prediction ability (Table 1). It was also found that none of the models could be used for extrapolation.


Table 1. The root mean squared errors of prediction obtained for validation of three models of CRF=f(T,pH,%MeOH,[buffer])

                            MLR     PLS     ANN
Test set                    0.241   0.263   0.091
Validation inside model     0.412   0.359   0.145
Validation outside model    0.692   0.386   0.459
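For reference, the figure of merit in Table 1 can be computed as sketched below. The function name and the example numbers are illustrative assumptions and are not taken from Paper II.

```python
import math

def rmsep(observed, predicted):
    """Root mean squared error of prediction for paired response values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equally long")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical CRF values for four validation experiments
crf_observed = [0.62, 0.48, 0.71, 0.55]
crf_predicted = [0.60, 0.53, 0.66, 0.58]
error = rmsep(crf_observed, crf_predicted)
```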

Low prediction ability, however, is not necessarily equivalent to poor optimization performance. In optimization, what matters is finding the direction in which to move within the experimental domain. In that respect PLS was successful for the chiral separation of oxybutynin presented in Paper II.

LC-MS optimization (Paper III)

Compared to LC-UV, many more parameters must be considered when setting up an LC-MS method (Fig. 4). Some of these parameters, particularly the mobile phase composition, will affect both LC and MS performance (optimization problems of this kind have been described by Lundstedt [58]). Unfortunately, there is also limited guidance in the literature on how different parameters affect the ESI process [59], even though several research groups are currently working within this field [60-62].

Fig. 4. Parameters considered in Paper III for the optimization of LC-MS methods with electrospray ionisation. (Org. mod.=volume fraction of organic modifier; Buffer=buffer concentration; Flow=liquid flow rate; NEB=nebuliser gas flow; ISV=ionspray voltage; d=distance; CUR=curtain gas flow; OR=orifice voltage; RING=ring voltage)

In most published works, API interfaces have been studied and optimized with a one-parameter-at-a-time approach [63,64]. Among the exceptions are a combined response surface and simplex optimization reported by Mazsaroff et al. [65] and a response surface approach reported by Garcia et al. [66]. Generally the LC performance has not been considered in these studies.

In Paper III, a stepwise strategy for LC-MS method development was presented. The first two steps of this strategy involve an LC study and a screening of the parameters affecting the ion source. In the screening, direct infusion or flow injection of the pure analytes dissolved in mobile phase is performed according to a screening design. From the obtained S/N ratios, the parameters with high influence on the signal level and/or stability can be identified.

The LC study is performed in order to find limits for the mobile phase composition. How narrow these limits shall be set depends on how much weight the LC performance shall be given. In many cases it is sufficient to obtain k>1 for all analytes of interest, just to ensure a separation from polar matrix compounds eluted with the front. The LC study can then be performed to obtain a simple model of the retention behaviour. In some cases the LC study must be more thorough. An example is the analysis of enantiomers, for which the MS offers no selectivity under normal operating conditions. The LC performance can also be important for quantitative analysis with ESI, where co-elution may affect the accuracy due to suppression effects [67,68].

The information gathered from the screening and the LC study is then used to determine the setting for some parameters and the limits for the parameters remaining to optimize. The final optimization step is performed with infusion or flow injection experiments according to a response surface design. From the experiments a model is obtained that relates the S/N ratio to the studied parameters. The choice of design, and thereby also the complexity of the model, is determined by the number of parameters and by the category to which these parameters belong. Some parameters, e.g. the mobile phase composition, can be considered as slow to adjust, while others, e.g. the orifice voltage, can be considered as fast. With many parameters left to optimize, it may be necessary to use separate designs for these categories in order to save experimental time. The 'fast design' is then performed for each experiment in the 'slow design'. The drawback of this nested approach is that instrumental drift may influence the models to a higher degree since the experimental sequence is not completely randomised.

The results from the final experiments can be fitted to an appropriate model with MLR or PLS. With many parameters and possible non-linearities and interactions present it can be hard to define a suitable model. An alternative is to apply an ANN model, where no assumptions about the relationship between the parameters and the response have to be made. The drawback is that it can, for the same reason, be hard to locate the optimum for such a model. Here GAs offer an effective way to explore the model. Compared to other search strategies, like the simplex algorithm, GAs have the advantage of a lower risk of arriving at a local optimum. Such optima can be suspected to be extensively present for complex ANN models.
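How a GA can be used to explore a fitted model may be sketched as follows. This is a toy illustration under several assumptions: `toy_model` is an invented stand-in for a trained ANN (it has several local maxima), each variable is coded by an 8-bit gene, selection is fitness-proportional, and the best individual is always carried over (elitism), which the text above does not mention. None of the parameter values are taken from Paper III.

```python
import math
import random

BITS_PER_GENE = 8  # resolution of each coded variable; an arbitrary choice

def decode(chromosome, low=0.0, high=1.0):
    """Map each 8-bit gene to a real value in [low, high]."""
    values = []
    for start in range(0, len(chromosome), BITS_PER_GENE):
        gene = chromosome[start:start + BITS_PER_GENE]
        integer = int("".join(str(b) for b in gene), 2)
        values.append(low + (high - low) * integer / (2 ** BITS_PER_GENE - 1))
    return values

def toy_model(x):
    """Hypothetical non-negative response surface with several local maxima."""
    return sum(math.sin(5.0 * math.pi * v) ** 2 * math.exp(-2.0 * (v - 0.7) ** 2) for v in x)

def run_ga(fitness, n_genes=2, pop_size=30, generations=40, p_mut=0.02, seed=1):
    rng = random.Random(seed)
    length = n_genes * BITS_PER_GENE
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    best = max(population, key=lambda c: fitness(decode(c)))
    for _ in range(generations):
        # Selection probabilities proportional to fitness (must be non-negative).
        weights = [fitness(decode(c)) + 1e-9 for c in population]
        offspring = [best[:]]  # elitism: keep the best individual found so far
        while len(offspring) < pop_size:
            mother, father = rng.choices(population, weights=weights, k=2)
            cut = rng.randint(1, length - 1)            # 1X recombination
            child = mother[:cut] + father[cut:]
            child = [1 - b if rng.random() < p_mut else b for b in child]  # mutation
            offspring.append(child)
        population = offspring
        best = max(population, key=lambda c: fitness(decode(c)))
    return decode(best), fitness(decode(best))

x_best, f_best = run_ga(toy_model)
```

Because the population samples the whole coded domain from the start, the search is less dependent on a single starting point than a simplex run would be.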


An example of the results in Paper III is the optimization of the LC-ESI-MS analysis of three estrogens. The parameters considered were the amounts of buffer and organic modifier, the flow rates of effluent, nebulising gas and curtain gas, the ionspray voltage, the distance between the spray emitter and the curtain plate, and the voltage settings for the orifice and the ring electrode (Fig. 4). The results of the screening design for estriol (Fig. 5) indicated that the mobile phase composition, the nebuliser gas and curtain gas flows and the orifice voltage were the most important factors. From the LC study, where the mobile phase composition was varied according to a central composite design and the obtained retention factors for the three analytes fitted to quadratic models, it was found that within the experimental domain there was no problem to achieve sufficient resolution. Further it was found that the retention was mainly determined by the amount of organic modifier.

Fig. 5. The influence of nine parameters (cf. Fig. 4) on the S/N for ESI-MS analysis of estriol according to screening experiments. The 5% significance limits are indicated.

From the results of these studies it was determined that further studies should involve the significant parameters according to the screening but also the ring voltage, since an interaction with the orifice voltage was suspected. The amount of organic modifier was studied at high levels since the LC study had shown that this would ensure a reasonable analysis time. A nested design was applied with the four 'slow' parameters following a three-level fractional factorial design and the two 'fast' parameters a central composite design. The S/N values were modelled with an ANN with five nodes in a single hidden layer. The optimal conditions were then located with a GA and used to obtain the chromatogram shown in Fig. 6.

[Fig. 5 bar plot: coefficient (axis range -3 to 4) for the parameters ORG, BUF, FLOW, POS, ISV, NEB, CUR, OR and RING.]


Fig. 6. XIC (m/z 289) for the optimized LC-MS analysis of estriol (1), 4-hydroxyestradiol (2) and 2-hydroxyestradiol (3).

Paper III also included a comparison between ANN and MLR modelling of the final experiments. The prediction errors for the test sets were found to be lowest for ANN when some parameters had been studied at more than three levels (Table 2). An explanation for this could be that such designs might allow for more complex models than the quadratic model used for MLR.

Table 2. Root mean squared errors of prediction for validation of MLR and ANN models of S/N=f(ESI parameters) for three compounds.

                        MLR     ANN
Estriol (5 levels)      10.5    7.9
Ibuprofen (5 levels)    14.1    7.7
Morphine (3 levels)     13.1    16.2

[Fig. 6 plot: I∗10^-6 / cps (0 to 4) versus t / min (0 to 20), with peaks labelled 1, 2 and 3.]


5. Exploration of LC-MS data

Mathematical description of LC-MS data

The data generated with LC-MS can be organised as an I×J matrix D with I mass spectra and J ion chromatograms. Contributors to D are analyte signals (A), background signals (B), and noise (N) according to

$$\mathbf{D} = \mathbf{A} + \mathbf{B} + \mathbf{N} \qquad (5.1)$$

Under ideal conditions, A can be considered as bilinear combinations of concentration profiles (c) and mass spectra (s) of M analytes:

$$\mathbf{A} = \sum_{m=1}^{M} \mathbf{c}_m \mathbf{s}_m^{\mathrm{T}} = \mathbf{C}\mathbf{S}^{\mathrm{T}} \qquad (5.2)$$

Deviations from this ideal behaviour can have different causes, e.g. non-linear detector response and multimer formation [69,Paper V].
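The bilinear structure of Eq. 5.2 can be made concrete with a small noise-free example. The concentration profiles and spectra below are invented, and the helper name is an assumption; rows of C are time points, rows of S are m/z channels, and columns are analytes.

```python
def bilinear(C, S):
    """Return A = C S^T for C (I x M) and S (J x M), as nested lists."""
    return [[sum(c * s for c, s in zip(c_row, s_row)) for s_row in S] for c_row in C]

# Two partly overlapping analytes observed at four time points (I = 4, M = 2)
C = [[0.0, 0.0],
     [1.0, 0.0],
     [2.0, 1.0],
     [0.0, 2.0]]
# Their mass spectra over three m/z channels (J = 3, M = 2)
S = [[1.0, 0.0],
     [0.5, 0.0],
     [0.0, 1.0]]

A = bilinear(C, S)  # each row of A is one mass spectrum, each column one ion chromatogram
```

At the third time point both analytes elute, so the recorded spectrum is the sum 2·s1 + 1·s2.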

The relevant information to extract is present in S (qualitative analysis) and C (quantitative analysis). This extraction can be accomplished by different methods, of which some, e.g. OSCAR [39,70], self-modelling curve resolution [71] and AMDIS [72,73], can be automated for large-scale applications. The performance of these methods can be improved by applying data pre-processing methods in order to minimise the influence of B and N.

LC-MS Toolbox 1.0 (Papers IV and V)

A considerable part of this work has been the development of a chemometrics toolbox for information extraction from LC-MS data. The toolbox runs under MATLAB v.4.2c (The MathWorks Inc., Natick, MA), but is currently being upgraded for MATLAB v.5.3. Included are methods to visualise LC-MS data, methods for background and noise reduction, methods for peak detection, and methods for peak purity analysis. When selecting among existing methods, much weight has been given to speed and ease of use. The methods have also been selected to be general, allowing for analysis of so-called black systems [74] where no a priori knowledge of the solutes is available. Methods of interest that have not (yet) been implemented include, e.g., Fourier filtering [75-77], wavelets [77,78], two-dimensional filtering [79] and several methods for peak purity analysis [80].

The main purpose of the toolbox is to aid visual interpretation of LC-MS data, but it can also be useful in data pre-processing for further mathematical analysis. Here the contents and performance of the toolbox will be exemplified for an LC-MS analysis of a rosemary extract [81]. A more complete guide is in preparation [82].

When the program is started, a main menu is shown and the total ion chromatogram (TIC: the sum of each spectrum versus time) of the current data file is plotted (Fig. 7).

Fig. 7. The initial screen view of LC-MS Toolbox 1.0 exemplified with an LC-MS analysis of a rosemary extract.

The View menu contains all standard methods to visualise LC-MS data, i.e. the TIC, the base peak chromatogram (BPC: the maximum signal for each spectrum versus time), extracted ion chromatograms (XICs: the signal for specified m/z ratios versus time), contour plotting, and mass spectra. An additional option is the “length ion chromatogram” (LIC: the length of each spectrum versus time) [83]. The LIC can be complementary to the TIC and BPC in peak detection for some background characteristics.
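The chromatogram types listed above are simple row-wise summaries of the data matrix D, as the following sketch shows. It is not the toolbox code (which is written in MATLAB); the function names are assumptions, and the "length" of a spectrum is read here as its Euclidean norm.

```python
import math

def tic(D):
    """Total ion chromatogram: sum of each spectrum."""
    return [sum(spectrum) for spectrum in D]

def bpc(D):
    """Base peak chromatogram: maximum signal of each spectrum."""
    return [max(spectrum) for spectrum in D]

def xic(D, channel):
    """Extracted ion chromatogram for one m/z channel (column index)."""
    return [spectrum[channel] for spectrum in D]

def lic(D):
    """Length ion chromatogram: Euclidean norm of each spectrum (assumed definition)."""
    return [math.sqrt(sum(v * v for v in spectrum)) for spectrum in D]

# Two invented spectra over three m/z channels
D = [[3.0, 4.0, 0.0],
     [1.0, 1.0, 1.0]]
```

For the first spectrum the TIC is 7, the BPC is 4 and the LIC is 5; the LIC thus weights dominant ions more heavily than the TIC but less exclusively than the BPC.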

The Background menu contains a method for background subtraction based on orthogonalisation (OBS: 'Orthogonal Background Subtraction') [83]. When activated, the user is prompted to select a region representing mainly background. PCA is then performed on the selected data (Db):


$$\mathbf{D}_b = \mathbf{T}_b\mathbf{P}_b^{\mathrm{T}} + \mathbf{E} \qquad (5.3)$$

If Db is properly chosen, the spectral composition of the background is gathered in one or a few loading vectors Pb. The entire data set can then be orthogonalised against Pb to form the background subtracted data set Ds according to

$$\mathbf{D}_s = \mathbf{D} - \mathbf{D}\mathbf{P}_b\mathbf{P}_b^{\mathrm{T}} \qquad (5.4)$$

This procedure handles quantitative variations of the background under the assumption that the qualitative content is relatively constant. Each spectrum in Ds will then approximate the corresponding net analyte signal (NAS) vector [84,85], i.e. the part of the spectrum that is orthogonal to the background. In practice, the mass spectra of the solutes will not be completely orthogonal to the background and hence the results should only be considered as semi-quantitative.
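A minimal sketch of the orthogonalisation is given below, restricted to a single background loading vector. It assumes an uncentred PCA whose first loading is found by power iteration, which is a convenient stand-in and not necessarily how the toolbox computes it; all names and data are invented.

```python
import math

def first_loading(Db, n_iter=50):
    """First loading of an uncentred PCA of Db, by power iteration on Db^T Db."""
    n_channels = len(Db[0])
    p = [1.0] * n_channels
    for _ in range(n_iter):
        scores = [sum(x * w for x, w in zip(row, p)) for row in Db]   # t = Db p
        p = [sum(t * row[j] for t, row in zip(scores, Db)) for j in range(n_channels)]
        norm = math.sqrt(sum(v * v for v in p))  # assumes Db is not all zeros
        p = [v / norm for v in p]
    return p

def subtract_background(D, p):
    """Ds = D - D p p^T for one unit-length loading vector p."""
    Ds = []
    for row in D:
        score = sum(x * w for x, w in zip(row, p))
        Ds.append([x - score * w for x, w in zip(row, p)])
    return Ds

# Background spectrum (2, 1, 0, 1) at varying intensity
Db = [[2.0, 1.0, 0.0, 1.0], [4.0, 2.0, 0.0, 2.0], [1.0, 0.5, 0.0, 0.5]]
# Full data set: the background region plus one spectrum with an analyte ion in channel 2
D = Db + [[2.0, 1.0, 3.0, 1.0]]

p_b = first_loading(Db)
Ds = subtract_background(D, p_b)
```

In this constructed case the analyte spectrum is exactly orthogonal to the background, so the analyte signal survives unchanged while the background rows vanish; with real spectra some analyte intensity is removed as well, which is why the result is only semi-quantitative.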

The Noise menu contains several time domain filters for smoothing of the data in the chromatographic direction. Well-known and established methods in this category are the simple moving average [75] and the Savitzky-Golay filter [86]. A robust alternative is the median filter [75], which can be used for spike removal. The influence of spikes can also be reduced by the use of a filter that generates the geometric mean of the data points within a time window of size 2m+1. The idea is that a sequence of adjacent data points with non-zero signals is needed to give a non-zero response in accordance with

$$z(t) = \left(\prod_{i} y(t + i\Delta t)\right)^{1/(2m+1)}; \quad i = 0, \pm 1, \ldots, \pm m \qquad (5.5)$$

This procedure is related to the SPC algorithm [87] and the WMSM algorithm [88]. Compared to SPC it has the advantage of representing the signals on the same scale as the raw data, and with peak heights that are less dependent on the background level.
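The spike-rejecting behaviour of the geometric mean filter (Eq. 5.5) is easy to demonstrate. The sketch assumes non-negative signals (ion counts) and leaves the first and last m points at zero; the edge handling and the function name are assumptions.

```python
def geometric_mean_filter(y, m):
    """Geometric mean over a window of 2m+1 points; edge points are left at zero."""
    z = [0.0] * len(y)
    for t in range(m, len(y) - m):
        product = 1.0
        for value in y[t - m:t + m + 1]:
            product *= value  # a single zero in the window zeroes the output
        z[t] = product ** (1.0 / (2 * m + 1))
    return z

# An isolated spike (index 2) and a three-point peak (indices 5 to 7)
signal = [0.0, 0.0, 9.0, 0.0, 0.0, 2.0, 4.0, 8.0, 0.0, 0.0]
filtered = geometric_mean_filter(signal, m=1)
```

The spike is removed completely, whereas the centre of the peak is retained with a height (4) on the same scale as the raw data.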

An optimal S/N improvement for a chromatographic peak with superimposed white noise is achieved by cross correlating the acquired signal with a proper noise free representation of the peak, a procedure referred to as matched filtering [89,90]. The matched filter implemented in the toolbox adopts a Gaussian peak model

$$f(x) = \frac{A}{\sigma\sqrt{2\pi}}\, e^{-x^2/(2\sigma^2)} \qquad (5.6)$$

where A is the area and x is the position relative to the retention time tr. In order to maintain a matched filter for the entire run, the model peak width (σ) is changed with tr according to

$$\sigma = a + b\,t_r \qquad (5.7)$$

The constants a and b are specific for the chromatographic system in use [91,92] and can be set by fitting Eq. 5.7 to a number of peaks in the chromatogram. Since the signal y(t) is sampled with regular intervals ∆t, the cross correlation is implemented as a digital filter according to

$$z(t) = \sum_{i} y(t + i\Delta t)\,k(i\Delta t); \quad i = 0, \pm 1, \ldots, \pm m \qquad (5.8)$$

The value of m, controlling the size of the time window, is for practical reasons and computational speed set to 4σ rounded to the next integer (this window covers essentially 100% of the peak). The filter coefficients k are given by the model peak (Eq. 5.6) and can be scaled arbitrarily without affecting the S/N of the filtered signal z(t). By setting A equal to √2, the peak heights will be maintained and the filtering effect is manifest as noise reduction.

The optimal gain in S/N ratio is achieved at the cost of chromatographic resolution, since the cross correlation operation for a matched filter will cause a peak broadening by a factor √2. Another disadvantage with matched filtering is that white noise will adjust to the same frequency spectrum as the Gaussian peak, thus being harder to distinguish from real chromatographic peaks [Paper IV]. In Fig. 8 the matched filtering effects on S/N and peak width are exemplified for the peak at 35 min (cf. Fig. 7).

Fig. 8. XICs (m/z 347) for the peak at 35 min before (a) and after (b) matched filtering.
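The two properties stated above for the matched filter, preserved peak height with A=√2 and broadening by a factor √2, can be checked numerically. The sketch below assumes the Gaussian model f(x) = A/(σ√(2π))·exp(−x²/(2σ²)), expresses σ in units of the sampling interval (∆t = 1), reads "rounded to the next integer" as rounding up, and treats samples outside the record as zero; the function names are illustrative and a fixed σ is used instead of Eq. 5.7.

```python
import math

def gaussian_kernel(sigma, area=math.sqrt(2.0)):
    """Gaussian model peak sampled at offsets -m..m, with m = ceil(4*sigma)."""
    m = math.ceil(4.0 * sigma)
    scale = area / (sigma * math.sqrt(2.0 * math.pi))
    return [scale * math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-m, m + 1)]

def cross_correlate(y, k):
    """z(t) = sum_i y(t + i) k(i); samples beyond the ends count as zero."""
    m = (len(k) - 1) // 2
    z = []
    for t in range(len(y)):
        total = 0.0
        for i in range(-m, m + 1):
            if 0 <= t + i < len(y):
                total += y[t + i] * k[i + m]
        z.append(total)
    return z

sigma = 5.0
# Noise-free Gaussian peak of height 100 centred at sample 50
y = [100.0 * math.exp(-((t - 50) ** 2) / (2.0 * sigma ** 2)) for t in range(101)]
z = cross_correlate(y, gaussian_kernel(sigma))

height_ratio = z[50] / y[50]   # close to 1: the peak height is maintained
raw_shape = y[57] / y[50]      # exp(-49/50) for the raw peak
filtered_shape = z[57] / z[50] # exp(-49/100): variance doubled, width times sqrt(2)
```

The filtered peak falls off more slowly than the raw peak, in agreement with a width of √2·σ.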

The GSD menu supports the combination of matched filtering and two-fold differentiation ('Gaussian Second Derivative') presented in Paper IV. By differentiating a signal twice, linear background is eliminated and the peaks will appear as sharpened. These advantages have been utilised quite extensively in GC-MS [93,94]. The drawback is that the noise is amplified. Hence it is natural to combine the differentiation with a smoothing filter.

[Fig. 8 panels (a) and (b): I∗10^-5 / cps (0 to 10) versus t / min (30 to 40).]


The two operations filtering and differentiation can be performed in a single step, as shown already in 1964 by Savitzky and Golay [86]. In GSD, the matched filter described above is combined with a two-fold differentiation according to

$$z''(t) = \sum_{i} y(t + i\Delta t)\,k(i\Delta t); \quad i = 0, \pm 1, \ldots, \pm m \qquad (5.9)$$

where the filter coefficients are given by the negative second derivative of the Gaussian model peak (Eq. 5.6 and 5.7)

$$k(x) = -f''(x) = C(\sigma^2 - x^2)\,e^{-x^2/(2\sigma^2)} \qquad (5.10)$$

The value of C can be set arbitrarily without affecting the S/N of z″. By normalising to unit square sum

$$\sum_{i} k^2(i\Delta t) = 1 \qquad (5.11)$$

the noise level is unaffected and the filter effect is manifest as peak amplification. The degree of peak amplification is proportional to the square root of σ as shown in Paper IV.
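A minimal sketch of a GSD-type filter is shown below. It assumes the kernel k(x) = C(σ²−x²)·exp(−x²/(2σ²)) normalised to unit square sum, σ in units of the sampling interval, a window of m = ceil(4σ) points on each side, and zero padding at the ends; names and test signals are invented.

```python
import math

def gsd_kernel(sigma):
    """Negative second derivative of a Gaussian, normalised to unit square sum."""
    m = math.ceil(4.0 * sigma)
    raw = [(sigma * sigma - i * i) * math.exp(-(i * i) / (2.0 * sigma * sigma))
           for i in range(-m, m + 1)]
    norm = math.sqrt(sum(v * v for v in raw))
    return [v / norm for v in raw]

def apply_filter(y, k):
    """z(t) = sum_i y(t + i) k(i); samples beyond the ends count as zero."""
    m = (len(k) - 1) // 2
    return [sum(y[t + i] * k[i + m] for i in range(-m, m + 1) if 0 <= t + i < len(y))
            for t in range(len(y))]

sigma = 5.0
kernel = gsd_kernel(sigma)

# A Gaussian peak of height 100 and a sloping linear background, both noise-free
peak = [100.0 * math.exp(-((t - 50) ** 2) / (2.0 * sigma ** 2)) for t in range(101)]
ramp = [50.0 + 2.0 * t for t in range(101)]

peak_response = apply_filter(peak, kernel)[50]  # amplified peak height
ramp_response = apply_filter(ramp, kernel)[50]  # nearly zero (window truncation only)
```

With σ = 5 sampling intervals the peak height is amplified by roughly a factor 1.7, while the linear background, whose value is 150 at the same position, is reduced to a fraction of a count.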

It is especially favourable to combine the data produced by GSD with the BPC representation. Since the background is suppressed, the BPC baseline will be lowered accordingly. The baseline will also become more stable when defined by essentially all detection channels rather than from the intensity of a few channels representing ions with high background. Together with the peak amplification, this gives an improved peak detection capability of the BPC. This is illustrated in Fig. 9, which shows the BPCs in the retention interval 15-35 min (cf. Fig. 7), before and after filtering with GSD.

Fig. 9. BPCs for the LC-MS analysis of a rosemary extract before (a) and after (b) treatment with GSD.

[Fig. 9 panels (a) and (b): I∗10^-6 / cps (0 to 6) versus t / min (15 to 35).]


The GSD filtering has several positive effects also in the spectral domain. The mass spectra obtained at peak maxima will show virtually zero intensity for background ions, while the ions corresponding to the analytes are amplified. This is illustrated in Fig. 10, which shows how GSD affects the mass spectrum of the peak at 23.5 min (cf. Fig. 7).

Fig. 10. The mass spectra of the peak at 23.5 min before (a) and after (b) GSD filtering.

Due to the peak sharpening with GSD, the mass spectrum obtained at the peak maximum will be less influenced by nearly co-eluting compounds with Rs<0.35. On the other hand, close-eluting compounds with higher Rs values may give considerable negative contributions. In such a case, the presence of an impurity is indicated by negative values for masses that are unique for the impurity.

The Purity menu supports the procedures included in the strategy for peak purity assessment presented in Paper V. The strategy involves detection of possibly impure peaks with BPC [95] and fixed size moving window evolving factor analysis (FSMW-EFA) [96,97], and further analysis of these peaks with local PCA modelling and comparative mass chromatogram plots (CMCPs).

With BPC, impurities are detected by a change of the most dominant mass within a peak, as indicated by a plot colour change. Hence, to be detected the impurity must have a dominant mass that is different from the base peak mass of the main compound, and also give higher intensity at some position within the chromatographic peak. In Paper V it was shown that the success of impurity profiling with BPC is highly dependent on the chromatographic resolution. The strength of the method is that, apart from the basic requirement, it is insensitive to spectral similarity between the main compound and the impurity.

With FSMW-EFA the bilinear structure of the data (Eq. 5.2) is utilised. The principle is that for each point in time, a sub-matrix is formed by the spectra recorded in the surrounding time window of size 2m+1. The matrix is modelled with PCA (Eq. 3.2) and the eigenvalues λ are calculated from the score matrix ($\boldsymbol{\lambda} = \mathrm{diag}(\mathbf{T}^{\mathrm{T}}\mathbf{T})$). Finally, the singular values (θ, the positive square root of λ) or the logarithm of the eigenvalues are plotted versus time. Thereby a rank map is obtained, where impurities are detected as peaks in the second and higher order eigenvalue traces. In Paper V, the factors influencing the detectability of impurities with FSMW-EFA were systematically investigated. A relationship S/N(θ2)=c∗f(Rs,%I,σnoise) was found, where %I is the amount of impurity and c is a factor reflecting the degree of spectral similarity between the main compound and the impurity. A high degree of spectral similarity gives low values for c, and hence a low detectability. An example of the results for S/N(θ2) obtained by varying Rs, %I and σnoise is given in Fig. 11.

[Fig. 10 panels (a) and (b): I∗10^-5 / cps (0 to 10) versus m/z (100 to 500).]

Fig. 11. The detectability for impurities with FSMW-EFA, measured with the S/N for θ2, as a function of chromatographic resolution (Rs), amount of impurity (%I) and noise (σnoise).

When the complementary information from BPC and FSMW-EFA has been analysed, there is a need for further analysis of suspected impure peaks. With a CMCP the XICs for two ions are plotted versus each other. A retention time difference will then show up as a loop pattern and peak shape deviations as a bow shape, while perfect agreement results in a straight line with a slope determined by the peak height ratio [Paper V]. A problem is that it can be hard to determine which ions to study, especially if the impurity is present at a low level. PCA modelling of the peak cluster can support the selection, since ions related to the impurity and the main compound will appear at different angles from the origin in the loading plot. In Fig. 12 the peak purity assessment strategy is exemplified for the peak at 17.5 min (cf. Fig. 7).

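[Fig. 11 cube plot with axes Rs, %I and σnoise; the S/N(θ2) values shown at the corners are 19.9, 37.2, 44.6, 72.5, 6.8, 14.6, 17.1 and 32.3 (their positions cannot be recovered from the extracted text).]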


Fig. 12. Peak purity profiling of an impure LC-MS peak: (a) The FSMW-EFA plot where the presence of impurity is suspected from the peak for the second singular value; (b) The BPC where the presence of impurity is suspected from the shift of base peak mass within the peak; (c) The loading plot of a local PCA model revealing two main directions, one headed by m/z 331 and one headed by m/z 299; (d) The CMCP for m/z 299 vs. m/z 331 showing the typical loop pattern for a retention time difference.

The Peaks menu contains two methods that can improve the detectability of real chromatographic peaks. The first method is the CODA algorithm presented by Windig et al. [98]. With CODA the single channel chromatograms with the highest S/N are found and used to give a reconstructed TIC, where those with low S/N are zeroed. Thereby the overall S/N is improved. A problem with this method is to find the dividing line between high and low S/N. The other method in the menu shows a normalised plot of the first eigenvalue trace obtained with FSMW-EFA (cf. the Purity menu). The idea is that real peaks will show a more bilinear data structure than the background and noise, and hence be better described by the first principal component. In Fig. 13, the performance is exemplified for the retention interval between 20 and 30 min (cf. Fig. 7).

[Fig. 12 panels: (a) θ∗10^-6 versus t / min (17 to 19); (b) I∗10^-6 / cps versus t / min (17 to 19); (c) loading plot, PC2 (48%) versus PC1 (51%), with labelled ions m/z 285, 299, 300, 331, 332, 348 and 353; (d) I(m/z 299)∗10^-6 / cps versus I(m/z 331)∗10^-6 / cps (0 to 4.5).]


Fig. 13. Peak detection in an LC-MS analysis of rosemary: (a) The TIC for the retention time interval 20-30 min; (b) The corresponding first singular value trace.

Combined use of the methods in the toolbox might result in further data enhancement. In Paper V, it was shown that the presence of background impairs the peak purity assessment tools, and hence that pre-processing with OBS could be advantageous. In Paper IV, it was claimed that the apparent resolution enhancement achieved by GSD might aid the BPC as a peak purity analysis method. This claim is supported by the BPCs for a peak cluster obtained in an LC-MS analysis of estrogens [Paper III] shown in Fig. 14.

Fig. 14. BPCs for a peak cluster, with changes in the base peak mass indicated, before (a) and after (b) filtering with GSD. The filtering improves the S/N and sharpens the peaks. Both these effects improve the detectability for the minor compound.

[Fig. 13 panels: (a) I∗10^-6 / cps (7 to 10) versus t / min (20 to 30); (b) normalised θ1 (70 to 100) versus t / min (20 to 30).]

[Fig. 14 panels (a) and (b): intensity (axis labelled I∗10^-6 / cps) versus t / min (14 to 16).]


Combination of OBS and GSD can also be beneficial. This is exemplified in Fig. 15, which shows the BPCs obtained for the LC-MS analysis of 2 pmol of the anabolic steroid nandrolone (a five times stronger solution was used in Paper IV) before and after treatment with OBS and GSD, alone or in combination. A possible explanation for the effectiveness of this combination is that fluctuations of the background, which may be enhanced by the GSD filtering, are reduced by the use of OBS.

Fig. 15. BPCs for the LC-MS analysis of a nandrolone standard solution at a concentration close to the detection limit: (a) no peak is visible for the raw data; (b) after OBS treatment, the LC front at 1.6 min is visualised; (c) after GSD treatment, the LC front and the nandrolone peak at 4.6 min are visualised, but also a number of non-significant peaks; (d) after combined treatment with OBS and GSD, the LC front and the nandrolone peak are clearly registered.

[Fig. 15 panels: (a) I∗10^-5 / cps (0 to 2) versus t / min (1 to 5); (b), (c) and (d) I∗10^-4 / cps (0 to 3) versus t / min (1 to 5).]


6. LC-MS for quantification and classification

Quantitative analysis

Quantitative LC-MS and LC-MS/MS have mainly been performed with quadrupole instruments operated in the single ion monitoring (SIM) mode and the multiple reaction monitoring mode, respectively, utilising the selectivity and sensitivity of the MS. The analytical peaks are then integrated, and the data are fitted to linear models or polynomials in order to increase the dynamic range. Apparent drawbacks with this procedure are that only one or a few known analytes can be determined for each run and that the second order advantage [99], i.e. the possibility of accurate analysis in the presence of unknown interferents, is sacrificed. Apart from arguments like convenience and data storage problems there is, however, at least one good reason for using the SIM mode, and that is the improved counting statistics and thereby the higher S/N compared to the scan mode [100]. In Fig. 16 this is exemplified with an LC-MS analysis of the anabolic steroid nandrolone.

Fig. 16. LC-MS analysis of 2 pmol nandrolone in: (a) SIM mode with dwell time 200 ms; (b) Scan mode with dwell time 1 ms. From counting statistics, the S/N is expected to be approximately 14 times higher in the SIM trace.
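The factor of about 14 follows if the noise is dominated by ion counting (Poisson) statistics, in which case the S/N grows with the square root of the number of ions counted and hence with the square root of the dwell time. The short calculation below makes this explicit; the function name is an assumption, and other noise sources are ignored.

```python
import math

def expected_snr_gain(dwell_long_ms, dwell_short_ms):
    """Ratio of S/N values for two dwell times, assuming pure counting noise."""
    return math.sqrt(dwell_long_ms / dwell_short_ms)

# SIM (200 ms per ion) versus scan mode (1 ms per ion), as in Fig. 16
gain = expected_snr_gain(200.0, 1.0)
```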

The precision obtained in quantitative analysis with LC-MS has been considered as low compared to LC-UV [101]. A possible reason for this is the additional variance from the interface, where small variations in the chemical conditions can influence the sensitivity for the studied ion.

An alternative method, where some interference can be accounted for, is to record entire mass spectra, integrate over the peak duration and apply a multivariate regression model like PLS. For co-eluting compounds in GC-MS, Demir and Brereton have shown that this procedure has some benefits compared to univariate regression [102]. Similar studies on LC-MS data performed at our laboratory point in the same direction [103].

[Fig. 16 panels (a) and (b): I∗10^-3 / cps (0 to 15) versus t / min (0 to 6).]


With second order data it is possible to apply multiway methods, e.g. GRAM [104,105], n-PLS [106,107] or PARAFAC [36,108], for quantification (Table 3). A comparative study in this area was recently reported by Gardner et al. [109]. A problem with such methods is that irreproducible chromatographic conditions, as discussed below, can introduce extra factors and hence models that can be hard to interpret and more influenced by noise.

Table 3. Calibration properties with different data structures.

                                    Zero order (a)   First order (a)   Second order (a)
Data structure                      Scalar           Vector (I×1)      Matrix (I×J)
Calibration                         Univariate       Multivariate      Multiway
Rank for K samples (b)              1                ≤min(I,K)         unknown (c)
Interferents possible to detect?    No               Yes               Yes
Interferents affect accuracy?       Yes              Yes               No

(a) Instrumentation classified according to Sánchez and Kowalski [99]
(b) Determines how many analytes can be quantified
(c) For a 2×2×2 array ten Berge et al. [110] have shown that the rank is 3

A general problem for complex samples is that the presence of unknown interferents might affect the sensitivity for the analyte. Such interference is not covered by the second order advantage, and it might be necessary to use internal standards [111] or to apply the standard addition method [112].

PARAFAC modelling and the alignment problem (Paper VI)

The data obtained from several LC-MS runs can be organised as a three-way array D (I×J×K) with I mass spectra, J mass chromatograms and K objects. Each object k contributes with an (I×J) matrix according to

$$\mathbf{D}_k = \mathbf{P}\,\mathrm{diag}(\mathbf{c}_k)\,\mathbf{S}^{\mathrm{T}} + \mathbf{N} \qquad (6.1)$$

where P (I×M) contains the elution profiles for M solutes, c_k (M×1) the concentrations of the solutes, arranged by diag(c_k) as an (M×M) diagonal matrix, and S (J×M) the corresponding mass spectra. When S and P are common to all objects, as indicated in Eq. 6.1, D is truly trilinear and suitable for PARAFAC modelling (cf. Eq. 3.8).
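As an illustration of Eq. 6.1, the following Python sketch builds a small noise-free three-way array from elution profiles, spectra and concentrations. All numbers and the function names (`object_slice`, `build_array`) are invented for illustration, and the term N is left out.

```python
def object_slice(P, S, c_k):
    """Return D_k = P diag(c_k) S^T for one object (the term N is omitted).

    P is I x M (elution profiles), S is J x M (mass spectra) and c_k holds
    the M concentrations of object k.
    """
    M = len(c_k)
    return [[sum(P[i][m] * c_k[m] * S[j][m] for m in range(M))
             for j in range(len(S))]
            for i in range(len(P))]

def build_array(P, S, C):
    """Stack the slices of all K objects; the result is indexed D[k][i][j]."""
    return [object_slice(P, S, c_k) for c_k in C]

# Hypothetical example with I = 4 scans, J = 3 mass channels, M = 2 solutes.
P = [[0.0, 0.0], [1.0, 0.0], [0.5, 1.0], [0.0, 0.5]]
S = [[1.0, 0.0], [0.2, 1.0], [0.0, 0.3]]
C = [[1.0, 2.0], [2.0, 4.0]]          # K = 2 objects

D = build_array(P, S, C)
```

Because P and S are shared by both objects, doubling the concentrations simply doubles the slice, which is the trilinear property that PARAFAC relies on. A retention time shift in one run would break this, since P would no longer be common.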

Deviations from the trilinear data structure can have several causes including, e.g., variations in charge state distribution [113] or non-linear detector response. The single largest problem, however, is variation in the elution profiles due to variations in the chromatographic conditions. One remedy is to apply multiway methods that can account for such variations, e.g. PARAFAC2 [114,115] or MCR-ALS [116]. Another approach is to pre-process the data in order to minimise the influence of chromatographic profile variations. Several such methods for chromatographic alignment of first order data have been reported [117-120]. There are also some published methods for second order data, where the spectral information is used to guide the alignment procedure [121,122].

One algorithm for chromatographic alignment is correlation optimized warping (COW), developed by Nielsen et al. [123,124]. In Paper VI the COW algorithm was modified for use with LC-MS data. The main principle of COW is to find corresponding time positions (nodes) in a target chromatogram (T) and the chromatogram (P) subject to alignment. The number of data points between two nodes is then made equal for P and T by linear time warping. For a segment length (the two nodes included) of m_T for T and m_P for P, the warping is performed by determining m_T positions p_j in P according to

$$p_j = \frac{m_{\mathrm{P}} - 1}{m_{\mathrm{T}} - 1}\,(j - 1) + x_s ; \qquad j = 1, \ldots, m_{\mathrm{T}} \qquad (6.2)$$

where x_s is the position of the first node in the P segment. The values for the warped P segment are then calculated by linear interpolation between the two points in P adjacent to each p_j.
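The warping step of Eq. 6.2 can be sketched in a few lines of Python. This is only an illustration: the function names are invented, and x_s is treated as a 0-based list index rather than the position convention used in Paper VI, which is not stated here.

```python
def warp_positions(m_p, m_t, x_s):
    """Positions p_j in P for j = 1..m_T according to Eq. 6.2."""
    return [(m_p - 1) / (m_t - 1) * (j - 1) + x_s for j in range(1, m_t + 1)]

def warp_segment(p, x_s, m_p, m_t):
    """Resample the P segment starting at index x_s (m_p points) to m_t points.

    Each warped value is a linear interpolation between the two points of p
    adjacent to the position p_j. Requires m_p >= 2 and m_t >= 2.
    """
    last = x_s + m_p - 1                  # index of the second node
    warped = []
    for pos in warp_positions(m_p, m_t, x_s):
        lo = min(int(pos), last - 1)      # left neighbour, kept inside the segment
        frac = pos - lo
        warped.append((1.0 - frac) * p[lo] + frac * p[lo + 1])
    return warped

# Stretch a 3-point segment (indices 1..3) of a hypothetical signal to 5 points.
signal = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0]
stretched = warp_segment(signal, x_s=1, m_p=3, m_t=5)
print(stretched)  # [10.0, 15.0, 20.0, 25.0, 30.0]
```

The two node values are preserved, and the points in between are filled in by interpolation, so a segment can be stretched or compressed without changing its end points.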

The optimal node sequence, as measured with a benefit function, is found by evaluating all the candidate solutions. This is done in a structured way by so-called dynamic programming. The size of the candidate solution space is given by the number of nodes and the value of the slack parameter t. The latter sets the limit for the amount of time warping, i.e. how much the time axis is allowed to stretch or contract.
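To make the dynamic programming idea concrete, a deliberately simplified Python sketch follows. It is not the algorithm of Paper VI or of Nielsen et al.: the nodes are pre-set in T, the first node of P is pinned to the first node of T, the benefit is the ordinary correlation coefficient, and all names (`resample`, `correlation`, `cow_align`) are invented for this illustration.

```python
import math

def resample(p, start, end, m_t):
    """Linearly warp p[start..end] (inclusive, 0-based) onto m_t points."""
    scale = (end - start) / (m_t - 1)
    out = []
    for j in range(m_t):
        pos = start + scale * j
        lo = min(int(pos), end - 1)       # left neighbour inside the segment
        frac = pos - lo
        out.append((1.0 - frac) * p[lo] + frac * p[lo + 1])
    return out

def correlation(x, y):
    """Ordinary correlation coefficient; 0.0 if either segment is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0

def cow_align(t, p, t_nodes, slack):
    """Return (total benefit, node positions in p) for nodes pre-set in t.

    For every node only the best path to each reachable position in p is
    kept, so the work grows with nodes x positions x (2 * slack + 1) rather
    than with the number of complete node sequences.
    """
    best = {t_nodes[0]: (0.0, [t_nodes[0]])}   # position in p -> (benefit, path)
    for a, b in zip(t_nodes, t_nodes[1:]):
        m_t = b - a + 1
        t_seg = t[a:b + 1]
        nxt = {}
        for start, (score, path) in best.items():
            for d in range(-slack, slack + 1):
                end = start + (m_t - 1) + d    # segment length may differ by d
                if end <= start or end >= len(p):
                    continue
                cand = score + correlation(t_seg, resample(p, start, end, m_t))
                if end not in nxt or cand > nxt[end][0]:
                    nxt[end] = (cand, path + [end])
        best = nxt
    return max(best.values(), key=lambda item: item[0])

# Hypothetical two-peak chromatogram aligned with itself: each of the three
# segments can reach a correlation of 1, so the total benefit should be 3.
target = [math.exp(-((i - 12) / 3.0) ** 2) + 0.5 * math.exp(-((i - 30) / 4.0) ** 2)
          for i in range(45)]
nodes = [0, 15, 30, 44]
score, path = cow_align(target, target, nodes, slack=2)
```

Keeping one best path per node position is what makes the search tractable; an exhaustive enumeration would have to evaluate up to (2t + 1) choices per segment multiplied over all segments.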

The modified COW algorithm presented in Paper VI differs from the original mainly in three respects: (i) the chromatogram for which the node positions are pre-set, (ii) the shape of the candidate solution space and (iii) the choice of benefit function. In the modified COW algorithm, the node positions are pre-set for T instead of P. Since this selection is common to all P, the nodes can be selected carefully in order to focus on key features in the data. The modified shape of the candidate solution space is shown in Fig. 17. In the original COW algorithm, the extreme points of P are set to correspond to the extreme points of T. Thus the flexibility is largest in the middle of the chromatogram. In reality the retention time deviations can be expected to increase with the retention time (t_r), and in the modified algorithm no positions are fixed, resulting in the largest time shift allowance at the end of the chromatogram.


Fig. 17. The shapes of the candidate solution spaces for the original (a) and the modified (b) COW algorithms.

The benefit function (the function that measures the similarity between segments of T and warped P) used in the original COW algorithm is the ordinary correlation coefficient, while the modified algorithm, by default, uses a modified covariance measure. In Paper VI the properties of different benefit functions were studied. Comparisons were made between correlation and covariance and between total mean centering of the data and separate centering of each channel. It was found that for LC-MS data channel centering outperformed total mean centering in several respects, the most significant being a higher sensitivity for peak position deviations. An explanation for this is that the analyte signals in LC-MS are present only in a minor subset of the detection channels. The other channels represent mostly background and noise, and their contributions to the overall covariance are minimised when each channel is centred to zero mean. The result of the comparison between covariance and correlation was that no clear guidelines for selection could be given for the general case.
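The effect of the centering choice can be illustrated with a toy example in Python. The data are invented (one analyte channel and three constant background channels), the benefit is a plain pooled covariance rather than the modified covariance measure of Paper VI, and all function names are illustrative.

```python
def centre_total(data):
    """Subtract the grand mean taken over all channels and time points."""
    n = sum(len(ch) for ch in data)
    mean = sum(sum(ch) for ch in data) / n
    return [[v - mean for v in ch] for ch in data]

def centre_channels(data):
    """Subtract each channel's own mean (channel centering)."""
    centred = []
    for ch in data:
        mean = sum(ch) / len(ch)
        centred.append([v - mean for v in ch])
    return centred

def covariance(x, y):
    """Covariance between two already centred segments, pooled over channels."""
    n = sum(len(ch) for ch in x)
    total = sum(a * b for cx, cy in zip(x, y) for a, b in zip(cx, cy))
    return total / (n - 1)

def relative_benefit(t_seg, p_seg, centre):
    """Benefit for (T, P) divided by the benefit for a perfect match (T, T)."""
    t_c, p_c = centre(t_seg), centre(p_seg)
    return covariance(t_c, p_c) / covariance(t_c, t_c)

# One analyte channel with a peak, three background channels at a constant level.
background = [5.0] * 5
t_seg = [[0.0, 0.0, 10.0, 0.0, 0.0]] + [background[:] for _ in range(3)]
p_seg = [[0.0, 0.0, 0.0, 10.0, 0.0]] + [background[:] for _ in range(3)]  # peak one scan late

total_ratio = relative_benefit(t_seg, p_seg, centre_total)
channel_ratio = relative_benefit(t_seg, p_seg, centre_channels)
```

In this toy case the one-scan shift lowers the relative benefit to about 0.12 with total mean centering but to -0.25 with channel centering. The background channels keep contributing a shift-independent term under total mean centering, whereas they drop out entirely once each channel is centred.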

In Paper VI, the modified COW algorithm was used as a pre-processing tool for PARAFAC modelling of the LC-MS data obtained for five runs (#a-#e) of a peptide standard. Fig. 18 shows the BPCs for the five objects before and after alignment. Apart from the retention time variation in the raw data, the most apparent difference between the objects was the almost total absence of the third peak in run #e. The question was whether a PARAFAC model could reflect this sample-related variation in the presence of the variations caused by the chromatographic conditions. Therefore a comparison was made between PARAFAC modelling of the raw data array and of the aligned data array.

Seven factors were used in these models since the data were known to comprise seven peptides. In order to achieve interpretable results, non-negativity constraints were set on all modes when calculating the models. For the raw data this PARAFAC model explained only 59.5% of the total variance. The first mode loadings, representing the mass spectra, showed high correlation between the factors, while the third mode loadings, representing the concentrations, included a lot of zeros. Problems were also found in the second mode loadings, representing the elution profiles, where some of the factors contained more than one peak. Altogether this was due to the same peptide being modelled at different peak positions. In summary, the varying chromatographic conditions made the seven factors insufficient to describe this data set.

[Fig. 17, panels (a) and (b): Node # (1 to N + 1) versus Scan # (1 to LP + 1).]

Fig. 18. BPCs of five LC-MS runs of a peptide standard mixture before and after alignment with the modified COW algorithm.

For the aligned data, the seven-factor PARAFAC model explained 97% of the total variance. The first mode loadings (Fig. 19) could be related to the mass spectra of the seven peptides, and the second mode loadings (Fig. 20) to the corresponding elution profiles. The separation of the co-eluting compounds (peaks 4 and 5 in Fig. 19) is an example of the second order advantage. The third mode loadings are related to the objects. When PCA was performed on the matrix of third mode loadings, two components were sufficient to explain 99% of the variance. In the score plot (Fig. 21), run #e was separated from the other objects in the PC2 direction. This is in line with the direct observation made on the BPCs. For non-normalised data, PC1 can be expected to describe mainly variations in the total amount of the compounds, while PC2 describes variations in the distribution between the compounds. The PARAFAC model could then be backtracked to discover that run #e differed from the other objects mainly by the low level of the third peak.

[Fig. 18, two panels: traces for runs a to e versus Scan # (0 to 300).]

Fig. 19. The first mode loadings of a PARAFAC model representing the mass spectra of bradykinin (1), luteinizing hormone releasing hormone (2), substance P (3), oxytocin (4), methionine enkephalin (5), bombesin (6) and leucine enkephalin (7).

Fig. 20. The second mode loadings of a PARAFAC model representing the elution profiles of seven peptides (cf. Fig. 19).

[Fig. 19: 1st mode loadings versus m / z (500 to 1100), traces 1 to 7.]

[Fig. 20: 2nd mode loadings versus t, peaks 1 to 7.]

Fig. 21. The score plot obtained by PCA modelling of the third mode loadings of a PARAFAC model.

[Fig. 21: PC2 (2%) versus PC1 (97%), with points labelled a to e.]

7. Concluding remarks and future aspects

LC-MS is a powerful instrumental set-up, capable of producing large amounts of highly selective information. Optimal use of the sophisticated instrumentation and the acquired data can be attained if LC and MS experts, chemometricians and software developers work side by side. My hope is that this thesis can help to facilitate such collaboration.

The needs for analytical chemistry in life science, combinatorial chemistry and process analysis are continuously changing towards the handling of more complex samples at increasingly higher speed. In addition, demands to monitor such data with high time resolution are put forward in many areas. These needs have inspired instrumental and technical developments, e.g. coupled separation systems and hybrid MS instruments, using chip technology and on-line mass detection.

The trends in chemometrics [125] have always followed the developments in analytical systems and the computational power available. As the amount and dimensionality of the data increase, chemometrics not only serves the analytical chemists by guiding the flow of chemical information, but also becomes a requirement for the elucidation of analytical results. Multivariate patterns rather than the absolute amounts of specific analytes will be increasingly important in many areas, and further developments of the chemometrics toolbox presented in this thesis should be directed accordingly.

Multiway analysis is considered by some to be the new wave in chemometrics, and there is no doubt that second and higher order data will be even more common in the future. The success of multiway analysis probably depends on algorithmic developments to account for non-ideal properties of the data. It is also important that the statistics of the methods can be elucidated and understood.


8. Acknowledgements

During the last six years I have had the pleasure of being involved in numerous stimulating projects and of meeting several interesting and skilful people, each of whom has influenced this thesis to a greater or lesser degree. In particular I wish to thank:

• My supervisor Rolf Danielsson for interesting discussions concerning all the fields of chemometrics, for co-authoring and for critical reading of most parts of this work.
• My head supervisor, Professor Karin Markides, for taking me on as a Ph.D. student, and for all of the encouragement during these years.
• Sven Jacobsson at AstraZeneca and Arne Bergens at Pharmacia & Upjohn for co-authoring and for convincing me to start up this work.
• Gunnar Malmquist at Amersham Pharmacia Biotech for co-authoring and for fruitful discussions on chemometrics.

Thanks to all of my colleagues, past and present, at the analytical chemistry department. In addition to those already mentioned, special thanks are due to:

• Fredrik Bökman for nice company, fruitful collaboration, interesting discussions (not necessarily in the field of LC-MS!) and for critical reading of this thesis.
• Per Sjöberg for fruitful collaboration, interesting discussions concerning the electrospray process and for critical reading of this thesis.
• My Moberg for fruitful collaboration and interesting discussions on chemometrics.
• Jenny Samskog for nice company and collaboration.
• Pernilla Koivisto for the preparation of several of the LC columns used (I only made one to glow...!) and for critical reading of this thesis.
• Barbo Nelson, Yngve Öst and Lasse Svensson for secretarial and technical support whenever I have asked for it.

I would also like to thank all of the students that I have had the opportunity to teach in chemometrics during these years. It has always been an inspiring challenge!

There is actually a world outside the walls of Kemikum too. People in this world have also contributed to the progress of this work and I especially wish to thank:

• Ivar Holmquist and his family for nice company at and beside the bridge table.
• The team members of Disraeli and Jaggernaut.
• My parents, Lennart and Mai, my brother Mats and my parents-in-law, Eva and Roland, for long distance support and encouragement.
• And finally, my two lovely daughters, Lisa and Sara (“Nu när pappa slutar jobba kan vi väl få en lillebror?!”, roughly “Now that Daddy is stopping work, surely we can have a little brother?!”), and my dear wife Helena for love and patience. What would I be without you?


9. References

1. V.L. Tal'roze, V.E. Skurat, G.V. Karpov, Russ. J. Phys. Chem., 43, 241 (1969)
2. C.R. Blakley, M.L. Vestal, Anal. Chem., 55, 750 (1983)
3. R.M. Caprioli, T. Fan, J.S. Cottrell, Anal. Chem., 58, 2949 (1986)
4. C.M. Whitehouse, R.N. Dreyer, M. Yamashita, J.B. Fenn, Anal. Chem., 57, 675 (1985)
5. E.C. Horning, D.I. Carroll, I. Dzidic, K.D. Haegele, M.G. Horning, R.N. Stillwell, J. Chromatog., 99, 13 (1974)
6. S.F. Wong, C.K. Meng, J.B. Fenn, J. Phys. Chem., 92, 546 (1988)
7. J.P.C. Vissers, H.A. Claessens, C.A. Cramers, J. Chromatog. A, 779, 1 (1997)
8. F. Houdiere, P.W.J. Fowler, N.M. Djordjevic, Anal. Chem., 69, 2589 (1997)
9. N.M. Djordjevic, P.W.J. Fowler, F. Houdiere, J. Microcol. Sep., 11, 403 (1999)
10. A.P. Bruins, T.R. Covey, J.D. Henion, Anal. Chem., 59, 2642 (1987)
11. M. Dole, L.L. Mack, R.L. Hines, R.C. Mobley, L.D. Ferguson, M.B. Alice, J. Chem. Phys., 49, 2240 (1968)
12. J.V. Iribarne, B.A. Thomson, J. Chem. Phys., 64, 2287 (1976)
13. B.A. Thomson, J.V. Iribarne, J. Chem. Phys., 71, 4451 (1979)
14. J. Gienic, L.L. Mack, K. Nakamae, C. Gupta, V. Kumar, M. Dole, Biomed. Mass Spectrom., 11, 259 (1984)
15. M. Yamashita, J.B. Fenn, J. Phys. Chem., 88, 4451 (1984)
16. M.L. Alexandrov, L.N. Gall', M.V. Krasnov, V.I. Nikolaev, V.A. Pavelenko, V.A. Shkurov, Dokl. Akad. Nauk SSSR, 277, 379 (1984)
17. Electrospray Ionisation Mass Spectrometry: Fundamentals, Instrumentation and Applications, R.B. Cole (Ed.), John Wiley and Sons Inc., New York, 1997
18. P. Kebarle, M. Peschke, Anal. Chim. Acta, 406, 11 (2000)
19. L. Tang, P. Kebarle, Anal. Chem., 65, 3654 (1993)
20. C.G. Enke, Anal. Chem., 69, 4885 (1997)
21. S. Zhou, K.D. Cook, J. Am. Soc. Mass Spectrom., 12, 206 (2001)
22. http://www.emsl.pnl.gov:2080/docs/incinc/chem_phd/SWdoc.html, 16 January 2001
23. D.L. Massart, B.G.M. Vandeginste, L.M.C. Buydens, S. De Jong, P.J. Lewi, J. Smeyers-Verbeke, Handbook of Chemometrics and Qualimetrics: Part A, Elsevier, Amsterdam, 1997
24. B.G.M. Vandeginste, D.L. Massart, L.M.C. Buydens, S. De Jong, P.J. Lewi, J. Smeyers-Verbeke, Handbook of Chemometrics and Qualimetrics: Part B, Elsevier, Amsterdam, 1998
25. S. Wold, K. Esbensen, P. Geladi, Chemom. Intell. Lab. Syst., 2, 37 (1987)
26. B. Tan, J.K. Hardy, R.E. Snavely, Anal. Chim. Acta, 422, 37 (2000)
27. L. Eriksson, E. Johansson, Chemom. Intell. Lab. Syst., 34, 1 (1996)
28. C. Wikström, C. Albano, L. Eriksson, H. Fridén, E. Johansson, Å. Nordahl, S. Rännar, M. Sandberg, N. Kettaneh-Wold, S. Wold, Chemom. Intell. Lab. Syst., 42, 221 (1998)


29. R.C. Henry, E. Sug Park, C.H. Spiegelman, Chemom. Intell. Lab. Syst., 48, 91 (1999)
30. K. Faber, B.R. Kowalski, Anal. Chim. Acta, 337, 57 (1997)
31. R.G. Brereton, Analyst, 125, 2125 (2000)
32. F.J. Rambla, S. Garrigues, M. de la Guardia, Anal. Chim. Acta, 344, 41 (1997)
33. M. Rupprecht, T. Probst, Anal. Chim. Acta, 358, 205 (1998)
34. A.D. Walmsley, Anal. Chim. Acta, 354, 225 (1997)
35. H.A. Martens, P. Dardenne, Chemom. Intell. Lab. Syst., 44, 99 (1998)
36. R. Bro, Chemom. Intell. Lab. Syst., 38, 149 (1997)
37. Y. Tan, J.-H. Jiang, H.-L. Wu, H. Cui, R.-Q. Yu, Anal. Chim. Acta, 412, 195 (2000)
38. M.M. Sena, J.C.B. Fernandes, L. Rover Jr., R.J. Poppi, L.T. Kubota, Anal. Chim. Acta, 409, 159 (2000)
39. E.J. Karjalainen, U.P. Karjalainen, Anal. Chim. Acta, 250, 169 (1991)
40. A. de Juan, Y. Vander Heyden, R. Tauler, D.L. Massart, Anal. Chim. Acta, 346, 307 (1997)
41. H. Lidén, T. Bachinger, L. Gorton, C.-F. Mandenius, Analyst, 125, 1123 (2000)
42. J.R. Long, H.T. Mayfield, M.V. Henley, P.R. Kromann, Anal. Chem., 63, 1256 (1991)
43. J.R.M. Smits, W.J. Melssen, L.M.C. Buydens, G. Kateman, Chemom. Intell. Lab. Syst., 22, 165 (1994)
44. B. Walczak, Anal. Chim. Acta, 322, 21 (1996)
45. Y.L. Loukas, J. Chromatog. A, 904, 119 (2000)
46. C.B. Lucasius, G. Kateman, Chemom. Intell. Lab. Syst., 19, 1 (1993)
47. B.M. Smith, P.J. Gemperline, Anal. Chim. Acta, 423, 167 (2000)
48. M.G.B. Drew, J.A. Lumley, N.R. Price, Quant. Struct.-Act. Relat., 18, 573 (1999)
49. X. Shao, Z. Chen, X. Lin, Chemom. Intell. Lab. Syst., 50, 91 (2000)
50. C. Darwin, On the origin of the species by means of natural selection, John Murray, London, 1859
51. T. Lundstedt, E. Seifert, L. Abramo, B. Thelin, Å. Nyström, J. Pettersen, R. Bergman, Chemom. Intell. Lab. Syst., 42, 3 (1998)
52. P. Hedlund, A. Gustavsson, Anal. Chim. Acta, 371, 9 (1998)
53. A.G. González, D. González-Arjona, Anal. Chim. Acta, 298, 65 (1994)
54. L.R. Snyder, J. Chromatog. B, 689, 105 (1997)
55. A.M. Siouffi, R. Phan-Tan-Lu, J. Chromatog. A, 892, 75 (2000)
56. Y.L. Xie, J.J. Baeza-Baeza, J.R. Torres-Lapasió, M.C. García Alvarez-Coque, G. Ramis-Ramos, Chromatographia, 41, 435 (1995)
57. N.M. Djordjevic, F. Erni, B. Schreiber, E.P. Lankmayr, W. Wegscheider, L. Jaufmann, J. Chromatog., 550, 27 (1991)
58. T. Lundstedt, B. Thelin, Chemom. Intell. Lab. Syst., 29, 255 (1995)
59. P.J.R. Sjöberg, C.F. Bökman, D. Bylund, K.E. Markides, J. Am. Soc. Mass Spectrom., submitted
60. ESI special issue, J. Mass Spectrom., 35, 761 (2000)


61. ESI special issue, Anal. Chim. Acta, 406, 1 (2000)
62. ESI special issue, J. Am. Soc. Mass Spectrom., 11, 931 (2000)
63. S. Zhou, M. Hamburger, J. Chromatog. A, 755, 189 (1996)
64. M. Jemal, Z. Ouyang, D.S. Teitz, Rapid Commun. Mass Spectrom., 12, 429 (1998)
65. I. Mazsaroff, W. Yu, B.D. Kelley, J.E. Vath, Anal. Chem., 69, 2157 (1997)
66. D.M. Garcia, S.K. Huang, W.F. Stansbury, J. Am. Soc. Mass Spectrom., 7, 59 (1996)
67. R. King, R. Bonfiglio, C. Fernandez-Metzler, C. Miller-Stein, T. Olah, J. Am. Soc. Mass Spectrom., 11, 942 (2000)
68. B.K. Choi, D.M. Hercules, A.I. Gusev, J. Chromatog. A, 907, 337 (2001)
69. M. Stefansson, P.J.R. Sjöberg, K.E. Markides, Anal. Chem., 68, 1792 (1996)
70. E.J. Karjalainen, U.P. Karjalainen, Data analysis for hyphenated techniques, Elsevier, Amsterdam, 1996
71. J.S. Salau, M. Honing, R. Tauler, D. Barceló, J. Chromatog. A, 795, 3 (1998)
72. S.E. Stein, J. Am. Soc. Mass Spectrom., 10, 770 (1999)
73. J.M. Halket, A. Przyborowska, S.E. Stein, W.G. Mallard, S. Down, R.A. Chalmers, Rapid Commun. Mass Spectrom., 13, 279 (1999)
74. Y.Z. Liang, O.M. Kvalheim, R. Manne, Chemom. Intell. Lab. Syst., 18, 235 (1993)
75. A. Felinger, Data analysis and signal processing in chromatography, Elsevier, Amsterdam, 1998
76. A. Felinger, T.L. Pap, J. Inczédy, Anal. Chim. Acta, 248, 441 (1991)
77. V.J. Barclay, R.F. Bonner, I.P. Hamilton, Anal. Chem., 69, 78 (1997)
78. X. Shao, W. Cai, Z. Pan, Chemom. Intell. Lab. Syst., 45, 249 (1999)
79. P. Nikitas, A. Pappa-Louisi, Anal. Chim. Acta, 415, 117 (2000)
80. F. Cuesta Sánchez, B. van den Bogaert, S.C. Rutan, D.L. Massart, Chemom. Intell. Lab. Syst., 34, 139 (1996)
81. T. Staaf, C.F. Bökman, D. Bylund, unpublished results
82. D. Bylund, LC-MS Toolbox 1.0 Users guide, in preparation
83. D. Bylund, R. Danielsson, G. Malmquist, K.E. Markides, Poster presented at the 14th International Mass Spectrometry Conference, Tampere, 1997
84. A. Lorber, Anal. Chem., 58, 1167 (1986)
85. N.M. Faber, Anal. Chem., 70, 5108 (1998)
86. A. Savitzky, M.J.E. Golay, Anal. Chem., 36, 1627 (1964)
87. D.C. Muddiman, B.M. Huang, G.A. Andersson, A. Rockwood, S.A. Hofstadler, M.S. Weir-Lipton, A. Proctor, Q. Wu, R.D. Smith, J. Chromatog. A, 771, 1 (1997)
88. C.M. Fleming, B.R. Kowalski, A. Apffel, W.S. Hancock, J. Chromatog. A, 849, 71 (1999)
89. B. van den Bogaert, H.F.M. Boelens, H.C. Smit, Anal. Chim. Acta, 274, 71 (1993)
90. Z. Zhang, J.S. McElvain, Anal. Chem., 71, 39 (1999)
91. Y. Shen, M.L. Lee, Anal. Chem., 70, 3853 (1998)


92. J.L. Excoffier, G. Guiochon, Chromatographia, 15, 543 (1982)
93. A. Gosh, R.J. Anderegg, Anal. Chem., 61, 2118 (1989)
94. W.G. Pool, J.W. de Leeuw, B. van de Graaf, J. Mass Spectrom., 31, 509 (1996)
95. MultiView User's manual, PE Sciex, 1996
96. H.R. Keller, D.L. Massart, Anal. Chim. Acta, 246, 379 (1991)
97. F. Cuesta Sánchez, J. Toft, O.M. Kvalheim, D.L. Massart, Anal. Chim. Acta, 314, 131 (1995)
98. W. Windig, J.M. Phalp, A.W. Payne, Anal. Chem., 68, 3602 (1996)
99. E. Sánchez, B.R. Kowalski, J. Chemom., 2, 247 (1990)
100. F. Laborda, J. Medrano, J.R. Castillo, Anal. Chim. Acta, 407, 301 (2000)
101. Y. Chen, M. Kele, A.A. Tuinman, G. Guiochon, J. Chromatog. A, 873, 163 (2000)
102. C. Demir, R.G. Brereton, Analyst, 122, 631 (1997)
103. D. Bylund, R. Danielsson, K.E. Markides, Poster presented at the 6th Scandinavian Symposium on Chemometrics, Porsgrunn, 1999
104. E. Sánchez, B.R. Kowalski, Anal. Chem., 58, 496 (1986)
105. C.A. Bruckner, B.J. Prazen, R.E. Synovec, Anal. Chem., 70, 2796 (1998)
106. R. Bro, J. Chemom., 10, 47 (1996)
107. K.D. Zissis, R.G. Brereton, S. Dunkerley, R.E.A. Escott, Anal. Chim. Acta, 384, 71 (1999)
108. J.L. Beltrán, J. Guiteras, R. Ferrer, J. Chromatog. A, 802, 263 (1998)
109. W.P. Gardner, R.E. Shaffer, J.E. Girard, J.H. Callahan, Anal. Chem., 73, 596 (2001)
110. J.M.F. ten Berge, H.A.L. Kiers, J. de Leeuw, Psychometrika, 53, 579 (1988)
111. B.K. Choi, A.I. Gusev, D.M. Hercules, Anal. Chem., 71, 4107 (1999)
112. J. Saurina, R. Tauler, Analyst, 125, 2038 (2000)
113. A.T. Iavarone, J.C. Jurchen, E.R. Williams, J. Am. Soc. Mass Spectrom., 11, 976 (2000)
114. R. Bro, C.A. Andersson, H.A.L. Kiers, J. Chemom., 13, 295 (1999)
115. H.A.L. Kiers, J.M.F. ten Berge, R. Bro, J. Chemom., 13, 275 (1999)
116. A. de Juan, S.C. Rutan, R. Tauler, D.L. Massart, Chemom. Intell. Lab. Syst., 40, 19 (1998)
117. E. Reiner, L.E. Abbey, T.F. Moran, P. Papamichalis, R.W. Schafer, Biomed. Mass Spectrom., 6, 491 (1979)
118. R. Andersson, M.D. Hämäläinen, Chemom. Intell. Lab. Syst., 22, 49 (1994)
119. T.L. Cecil, S.C. Rutan, Anal. Chem., 62, 1998 (1990)
120. G. Malmquist, R. Danielsson, J. Chromatog. A, 687, 71 (1994)
121. B.J. Prazen, R.E. Synovec, B.R. Kowalski, Anal. Chem., 70, 218 (1998)
122. B. Grung, O.M. Kvalheim, Anal. Chim. Acta, 304, 57 (1995)
123. N.-P.V. Nielsen, J.M. Carstensen, J. Smedsgaard, J. Chromatog. A, 805, 17 (1998)
124. N.-P.V. Nielsen, J. Smedsgaard, J.C. Frisvad, Anal. Chem., 71, 727 (1999)
125. S. Wold, M. Sjöström, Chemom. Intell. Lab. Syst., 44, 3 (1998)
