processing and analysis of nmr data - diva...

95
Processing and analysis of NMR data Impurity determination and metabolic profiling Jenny Forshed A Doctoral Thesis Department of Analytical Chemistry Stockholm University 2005

Upload: others

Post on 04-Dec-2020

8 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

Processing and analysis of NMR dataImpurity determination and metabolic profiling

Jenny Forshed

ADoctoral Thesis

Department of Analytical ChemistryStockholm University

2005

Page 2: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

Processing and analysis of NMR dataImpurity determination and metabolic profiling

Doctoral Thesis, 2005

Jenny [email protected]

Department of Analytical ChemistryStockholm UniversitySE-106 91 Stockholm

Academic dissertation for the Degree of Doctor of Philosophy in Analytical Chemistry at Stockholm University to be publicly defended on Thursday 1 December 2005 at 13:00 in Magnélisalen, Kemiska övningslaboratoriet,

Svante Arrhenius v. 10-12.

© Jenny Forshed 2005, pp 1-95. ISBN: 91-7155-139-5

Cover by Mina, Folke and Jenny ForshedAkademitryck AB, Edsbruk, Sweden

Page 3: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

till Mina, Folke och Östen

Page 4: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling
Page 5: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

| 5

Abstract

This thesis describes the use of nuclear magnetic resonance (NMR) spectroscopy as an analytical tool. The theory of NMR spectroscopy in general and quantitative NMR spectrometry (qNMR) in particular is described and the instrumental properties and parameter setups for qNMR measurements are discussed. Examples of qNMR are presented by impurity determination of pharmaceutical compounds and analysis of urine samples from rats fed with either water or a drug (metabolic profiling). The instrumental parameter setup of qNMR and traditional data pre-treatments are examined. Spectral smoothing by convolution with a triangular function, which is an unusual application in this context, was shown to be successful regarding the sensitivity and robustness of the method in paper II. In addition, papers III and IV comprise the field of peak alignment, especially designed for 1H-NMR spectra of urine samples. This is an important preprocessing tool when multivariate analysis is to be applied. A novel peak alignment method was developed and compared to the traditional bucketing approach and a conceptually different alignment method.

Univariate, multivariate, linear and nonlinear data analyses were applied to qNMR data. In papers I–II, calibration models were created to examine the potential of qNMR for these applications. The data analysis in papers III–VI was mainly explorative. The potential of data fusion and data correlation was examined in order to increase the possibilities of analysing the highly complex samples from metabolic profiling (papers V–VI). Data from LC/MS analysis of the same samples were used with the 1H-NMR data in different ways. Correlation analyses between the 1H-NMR data and the drug metabolites identified from the LC/MS data were also performed. In this process, data fusion proved to be a valuable tool.

Page 6: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

6 | Processing and analysis of NMR data

Page 7: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

| 7

Sammanfattning

Denna avhandling beskriver hur NMR (nuclear magnetic resonance) spektroskopi kan användas för kvantifiering i analytisk kemi. Teorin för NMR-spektroskopi i allmänhet och kvantitativ NMR (qNMR) i synnerhet är beskriven och instrumentella förutsättningar och parameterinställningar för qNMR diskuteras. Föroreningsanalys av två olika läkemedelskomponenter och analys av urin från råttor som matats med vatten eller läkemedel (metaboliska profiler) ges som exempel på qNMR. I de två första artiklarna utvärderades instrumentella parameterinställningar för qNMR och betydelsen av traditionell databehandling. Utjämning av spektra (”spectral smoothing”) med hjälp av faltning med en triangulär funktion visade sig förbättra både känslighet och robusthet hos qNMR-metoden i artikel II, dock syns få sådana applikationer i litteraturen. Artiklarna III–IV omfattar topplinjering, att anpassa analytiska signaler utefter en rät linje, anpassat för 1H-NMR data från urinprov. Detta är en viktig förprocessning av data vid multivariat analys. En ny metod för topplinjering utvecklades och jämfördes med den traditionella (integrering av förutbestämda, lika stora segment) och en konceptuellt skild topplinjeringsmetod.

Univariat, multivariat, linjär och olinjär data-analys har tillämpats på qNMR data. För att utreda potentialen av qNMR i artiklarna I–II gjordes kalibrerings-modeller. Data-analysen i artiklarna III–VI var i första hand explorativ. För att öka möjligheterna att analysera de mycket komplexa proverna från metaboliska profiler prövades olika metoder för datafusion och datakorrelation (artiklarna V–VI). Data från LC/MS- och 1H-NMR-analys av samma prover analyserades sammanslagna på olika sätt. Vidare utfördes korrelationsanalys mellan 1H-NMR-data och LC/MS-variabler identifierade som läkemedelsmetaboliter. I denna process visade sig datafusion vara ett värdefullt verktyg.

Page 8: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

8 | Processing and analysis of NMR data

Page 9: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

| 9

Preface

The work on this thesis has been carried out in the borderland between the fields of nuclear magnetic resonance (NMR) spectroscopy – mostly from a structure elucidation perspective, analytical chemistry – aiming for low detection limits and high accuracy and precision, and chemometrics – including the large field of multivariate data analysis. These three fields of science do not often come together in the same department, and even less often in the same person. This PhD thesis, however, aims to combine these fields of knowledge in order to create opportunities for novel research in the field of quantitative NMR spectrometry (qNMR).

As I come from a background in analytical chemistry and chemometrics, the work of this thesis started with the basic theories and practice of NMR spectrometry, which was then a new technique to me. The theory and background sections in this thesis were written more as a handbook for myself at that time. The thesis also contains a summary of the six papers I have completed during my PhD studies at the Department of Analytical Chemistry at Stockholm University. The laboratory work was done at AstraZeneca in Södertälje, which provided all the laboratory materials, chemicals and people with expertise in NMR spectroscopy.

Page 10: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

10 | Processing and analysis of NMR data

Page 11: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

| 11

Contents

List of papers _____________________________________13Abbreviations and notations _________________________151 Introduction ____________________________________192 Nuclear magnetic resonance theory ________________21

Spin ___________________________________________21Applying a magnetic field ___________________________22The laboratory frame ______________________________24The rotating frame of reference ______________________25NMR data acquisition _____________________________30NMR data processing _____________________________33

3 NMR for quantitative analysis _____________________37Background _____________________________________37qNMR methods __________________________________39The sensitivity of NMR spectrometry _________________39Accuracy and precision of qNMR ____________________43Discussion ______________________________________44

4 qNMR applications _____________________________45Impurity determination ____________________________45Metabolic analyses ________________________________50

5 Chemometrics __________________________________53Principal component analysis ________________________53Calibration ______________________________________55Data preprocessing _______________________________59Validation ______________________________________60

6 Peak alignment _________________________________63Profile alignment or peak alignment ___________________65Target for alignment ______________________________67Optimisation ____________________________________67NMR data compression ____________________________70

Page 12: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

12 | Processing and analysis of NMR data

Discussion ______________________________________707 1H-NMR + LC/MS ______________________________71

Data fusion _____________________________________72Data correlation __________________________________75Discussion ______________________________________77

8 Conclusions ____________________________________799 Future perspectives ______________________________81Acknowledgements ________________________________83References ________________________________________85

Page 13: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

List of papers | 13

List of papers

I. NMR and Bayesian regularized neural network regression for impurity determination of 4-aminophenol

Forshed, J., Andersson, F.O. and Jacobsson, S.P., Journal of Pharmaceutical and Biomedical Analysis 29 (2002)

495-505.

The author is responsible for the experimental work and for writing the paper.

II. Quantification of aldehyde impurities in poloxamer by ¹H-NMR spectrometry

Forshed, J., Erlandsson, B. and Jacobsson, S.P. Analytica Chimica Acta, 552 (2005) 160-165. The author is responsible for the experimental work, the calculations and the writing of

the paper.

III. Peak alignment of NMR signals by means of a genetic algorithm Forshed, J., Schuppe-Koistinen, I. and Jacobsson, S.P. Analytica Chimica Acta, 487 (2003) 189-199. The author is responsible for the experimental work, developing the algorithm and

writing the paper.

IV. A comparison of methods for alignment of NMR peaks in the context of cluster analysis

Forshed, J., Torgrip, R.J.O., Åberg, K.M., Karlberg, B., Lindberg, J. and Jacobsson, S.P.

Journal of Pharmaceutical and Biomedical Analysis 38 (2005) 824-832.

The author is responsible for the comparison idea, developing the algorithm for the segmentwise method and writing the main part of the paper.

Page 14: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

14 | Processing and analysis of NMR data

V. Evaluation of different techniques for fusion of LC/MS and 1H-NMR data

Forshed, J., Idborg, H. and Jacobsson, S.P. Chemometrics and Intelligent Laboratory Systems (2005)

submitted. The author is, together with H. Idborg, responsible for the ideas, implementing the code,

and writing the paper.

VI. Enhanced multivariate analysis by correlation scaling and fusion of LC/MS and 1H-NMR data.

Forshed, J., Stolt, R., Idborg, H. and Jacobsson, S.P. Chemometrics and Intelligent Laboratory Systems (2005)

submitted. The author is responsible for the major ideas (together with H. Idborg), some of the

computations and writing the paper.

Page 15: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

Abbreviations and notations | 15

Abbreviations and notations

Abbreviations1H-NMR proton-NMR ADC analogue-to-digital converterCC correlation coefficientCOW correlation optimised warpingDFT discrete Fourier transformDSP digital signal processingDTW dynamic time warpingFID free induction decay = S(t)FT Fourier transformFW fuzzy warpingGA genetic algorithmGC gas chromatographyIR infrared spectroscopyLC liquid chromatographyLOD limit of detectionLS least-squares regressionLW local warpingMLR multiple linear regressionMS mass spectrometryNMR nuclear magnetic resonance NN neural networkN-PLS N-way PLSOPA outer product analysisPAGA peak alignment by means of a genetic algorithmPARAFAC parallel factor analysisPARS peak alignment by reduced set mappingPC principal componentPCA principal component analysis

Page 16: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

16 | Processing and analysis of NMR data

PCR principal component regressionPLF partial linear fitPLS partial least squares or projection to latent structuresPLS-DA PLS-discriminant analysisppm parts per million (the units of the horizontal scale in an NMR spectrum)PTW parametric time warpingPWA piecewise alignmentqNMR quantitative NMRRF radio frequencyRMS root-mean-squareRMSE root-mean-square errorRMSEP root-mean-square error of predictionS/N signal-to-noise ratioTMS tetramethyl silaneTMSP (sodium 3-(trimethylsilyl)propionate-2,2,3,3-d4)

Notations

Generally the algebraic notation is used as follows: small letters represent scalars, small bold letters represent vectors, and bold capitals represent data matrices.A(ω) absorption spectraat acquisition timeb regression vectorb receiver bandwidthB0 external magnetic fieldB1 RF pulseD(ω) dispersion spectraE residual matrixE energyf filling factorF(ω) the spectrum as a function of frequencyFsampling sampling frequencyFsignal the highest frequency signalh Planck’s constant (= 6.62618 E-34 Js)I spin quantum numberI(ω) imaginary part of spectrum k the Bolzmann constant (= 1.380 E-23 J/K)m magnetic quantum states

Page 17: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

Abbreviations and notations | 17

M net magnetisationN number of detectable nuclei (magnetically equivalent)n the noise figure of the amplifiernt the number of transients (scans) p angular momentP loading matrixQ the quality factor of the RF coilR(ω) real part of spectrumS spectrum to be alignedS(t) the NMR signal as a function of time (= FID)rd relaxation delayrt the pulse repetition timeT sample temperatureT scores matrixT target spectrumT1 spin-lattice relaxation time constantT2 spin-spin relaxation time constantVs volume of the samplew weightsβ pulse angleμ magnetic momentν0 the Larmor frequency [Hz]φ phase errorω0 the Larmor frequency [rad/s]γ the magnetogyric ratioτ pulse width [μs]λ wavelength

Page 18: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

18 | Processing and analysis of NMR data

Page 19: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

Introduction | 19

1 Introduction

Nuclear magnetic resonance (NMR) spectroscopy is a powerful technique for determination of molecular structures as well as chemical and physical properties. It allows the detection of the most common atoms in organic compounds, i.e. carbon and hydrogen. Experience of quantitative determinations by NMR spectrometry (qNMR) is still, however, fairly limited.

One limiting factor of NMR spectroscopy as a quantitative tool is that it is considered to be a relatively insensitive technique. However, recent developments of the technique have led to fairly robust instrumentation and low limits of detection in the picomole range have been reported.1 Consequently, NMR spectroscopy has attracted a growing interest and has become a viable analytical technique.

The scope of this thesis

The aim of this thesis has been to address the possibilities and limitations in qNMR and to make a contribution to this field of research. While the requirements for an accurate and precise analysis with qNMR has been investigated in the literature,2-9 there is a large demand for further investigation.

In analytical chemistry reference is frequently made to what is known as the analytical chain:

Sampling → sample preparation → instrumental analysis → data analysis.

This work includes all the steps in this chain, with the main emphasis on the last one. Linear and nonlinear, univariate and multivariate calibration methods have been employed, as well as principal component cluster analysis. The substantial need for methods to correct peak shifts, especially when analysing biological samples, were realised and explored. In response to this need, such methods

Page 20: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

20 | Processing and analysis of NMR data

were developed. Furthermore, techniques for data fusion and data correlation of 1H-NMR and LC/MS data from metabolic samples have been investigated and evaluated.

This thesis will cover some basic theory of NMR spectroscopy, to an extent comprehensive enough to enable one to understand the significance of qNMR and be able to use it. This will be followed by the special case of analysing low-level components by qNMR in general. 1H-NMR spectrometry for impurity determination (papers I–II) and urine analysis (papers III–VI) will be described in more detail.

The final chapters will include the chemometric methods that have been used in the papers. Separate chapters have also been devoted to peak alignment, which has been dealt with in papers III and IV, and data fusion, covered in papers V and VI.

My experiences, results and conclusions from the work in the papers will be addressed in the chapters when considered relevant.

Page 21: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

Nuclear magnetic resonance theory | 21

2 Nuclear magnetic resonance theory

This chapter includes the NMR theory needed for understanding quantitative measurements by one-dimensional NMR spectrometry. Some quantum mechanics and classical mechanics will be addressed to facilitate the understanding of the theories. More extensive descriptions of the NMR theory can be found in the references.4,9-17

History

The theoretical bases of NMR spectroscopy were proposed by W. Pauli as early as 1924. However, it was not until 1946 that Bloch and Purcell independently showed that nuclei absorb electromagnetic radiation in a strong magnetic field.18,19 They shared the Nobel prize in Physics for this work in 1952. In 1953 the first high-resolution NMR spectrometer was presented (a continuous wave instrument), although it was not until about 1970 that the first Fourier transform (FT) NMR instrument was available on the market. At least eight Nobel prizes in physics and two in chemistry have been awarded to scientists for their work in the field of magnetic resonance.1,8,17

SpinAtomic nuclei have a spin quantum number, I. If I differs from zero, the nucleus possess a magnetic moment (μ) that may interact with an external magnetic field:

, (1)

where γ is the magnetogyric† ratio and p is the angular momentum. γ will differ between nuclei and may be positive or negative. The possible values of p are quantised and depend on I. The interrelation between the nuclear spin and μ leads to 2I+1 observable magnetic quantum states (m).

†Also denoted as gyromagnetic

Page 22: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

22 | Processing and analysis of NMR data

Applying a magnetic fieldThe fact that nuclei are magnetic allows us to study them by means of NMR spectroscopy. A nucleus with I ≠ 0 will interact with a magnetic field and can absorb or emit electromagnetic radiation at certain resonance conditions. Some of the most important nuclei in organic chemistry and biochemistry (1H, 13C, 19F and 31P) have I = 1/2 and therefore two allowed magnetic quantum states: m = +1/2 and –1/2. The 1H nuclei will be the model in the following theoretical descriptions.

In the absence of an external magnetic field, the nuclear spins are disordered. Applying a magnetic field (B0) will make the 1H nuclei orient with a small access in the lower energy state (m = +1/2) according to the Bolzmann distribution:

, (2)

where k is the Bolzmann constant, T is the absolute temperature and N represents the population at the different stages. The excess of nuclei in the lower energy state is small (64 nuclei per million (for 1H) in a 400 MHz instrument). The difference in energy (ΔE) is proportional to the strength of the magnetic field according to:

, (3)

where h is Planck’s constant and B0 the external/applied magnetic field. This creates the basic prerequisites for NMR spectrometry.

Energy transitions

The transition between lower and upper energy levels in NMR spectroscopy is analogous to absorption in other spectroscopic methods, although wavelengths and frequencies differ (Figure 1). However, the discrete energy levels between which the energy transitions take place are created artificially in NMR spectroscopy, by placing the nuclei in a magnetic field.

Page 23: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

Nuclear magnetic resonance theory | 23

ŵcm

MHz

X-ray

VIS γ-rays

IR UVMWNMR

10-1410-1010-610-2102106

Frequency (Hz)

1022

Wavelength (m)

radio

101810141010106102

m

----

Figure 1: The electromagnetic spectrum.

Transitions between energy states are brought about by the absorption or emission of electromagnetic radiation of a frequency ν0 with the corresponding energy:

(4)

From equations (3) and (4) it follows that the frequency of radiation required for a transition is:

(5)

Applied energy with the frequency ν0 will induce both upward and downward spin transitions of nuclei. As long as there are a greater number of nuclei in the lower energy state, the energy absorption will exceed the emission and a signal can be observed. However, continuously applied energy of the “right” frequency will cause the spin populations to become equal and no signal will be observed, due to competing relaxation processes. This phenomenon is called saturation and may be utilised when a specific signal (for example, water) needs to be suppressed,20 e.g. when urine samples are to be analysed.

Page 24: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

24 | Processing and analysis of NMR data

† As when a toy gyroscope starts to precess/wobble because of the gravity force.

The laboratory frameTo illustrate the measurement of energy transitions, a classical description of the behaviour of a charged particle in a magnetic field is illuminative. A coordinate system with B0 along the z axis, and the x and y axes describing the horizontal plane, is introduced to facilitate the following theories. This is called the laboratory frame of reference.

Larmor frequency

When a magnetic field (B0) is applied to a nucleus with a magnetic moment (I ≠ 0), it starts to precess about the z axis because of the gyroscopic effect,† Figure 2.

Figure 2: A precessing nucleus. B0 is the applied magnetic field and ω0 is the angular frequency of motion.

The angular frequency of the motion (ω0) is called the Larmor frequency and is given by:

. (6)

ω0 [rad/s] is equivalent to ν0 [Hz] (equation (5)). The frequency, ν0, for the 1H nucleus is actually what is referred to when the strength of an NMR magnet is denoted in MHz.

Net magnetisation

The ensemble of spins in a sample generates an average magnetisation or net magnetisation (M0) in the direction of the z axis upon interaction with a magnetic field (B0), Figure 3. The individual spins may point in any direction in the xy plane, so their component in the xy plane evens out.

Page 25: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

Nuclear magnetic resonance theory | 25

Figure 3: The net magnetisation, M0.

M0 is the sum of all the spins and is directly proportional to the population difference between the low and high energy levels, N+1/2

– N-1/2. Although the magnetic moment (μ) in each nucleus is quantised, M0 is continuous since it reflects the whole population of nuclei.

The rotating frame of referenceTo further facilitate the illustration of energy absorption and to describe the single pulse experiment, we change the angle of view. When studying the nuclei in the laboratory coordinate system, the laboratory frame of reference, each nucleus is precessing at the Larmor frequency (ν0 or ω0) as described above. By subtracting the laboratory frame we can define a new rotating frame of reference in which the precession around the z axis is cancelled out. Moving our view into this rotating coordinate system makes it easier to illustrate the observed energy transitions. This is further described below in “Analogue filtering”.

RF pulse

When a radio frequency (RF) pulse (B1) at the Larmor frequency is applied perpendicular to B0 we achieve a resonant condition and the system absorbs energy (∆E). The populations of N+1/2

and N-1/2 are changed according to the Bolzmann equation (2) and M0 changes direction. Application of the RF pulse is equivalent to application of a magnetic field along the x or y axis. This rotates M0 away from the z axis into the xy plane to give Mxy , Figure 4.

Page 26: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

26 | Processing and analysis of NMR data

Figure 4: The direction of the net magnetisation (M0) is changed with the angle β when an RF pulse (B1) is applied to give Mxy.

The changed angle (β) of M0 depends on the field strength of the RF pulse (B1) and the duration τ (typically μs) according to:

(7)

Relaxation

After applying the RF pulse, the system will relax back and Bolzmann equilibrium (2) will be re-established. The relaxation can be described by two different processes, the disappearance of Mxy and the appearance of Mz until equilibrium is reached and Mxy = 0 and Mz = M0, as illustrated in Figure 5.

Figure 5: The directions of T1 and T2 are shown in a schematic representation of the relaxation of Mxy back to M0.

Page 27: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

Nuclear magnetic resonance theory | 27

The process restoring Mz is called the spin-lattice (or longitudinal) relaxation and is defined by the relaxation time constant T1. The energy of the excited state dissipates into molecular vibrations, rotations etc, until M0 has returned to its original value on the z axis. The “lattice” denotes the surroundings of the molecule, hence T1 will depend on, for example, the viscosity of the sample (low mobility will give a high T1).13

The loss of Mxy is due to intrinsic molecular properties and to instrumental imperfections. When excluding the contributions from instrumental imperfections, we may define the spin-spin (or transverse) relaxation time constant T2, which includes interactions between neighbouring nuclei. This results in loss of xy magnetisation until no preferred orientation in the transverse plane remains. When we include the contribution of instrumental imperfections to the loss of xy magnetisation, we define T2*. T2* determines the spectral resolution (peak width) and depends on, for example, inhomogeneities in the magnetic field.

T2* is always less than T2 and T2 is always less than or equal to T1. For many liquids, T1 and T2 are about the same and in the order of seconds.

In contrast to other spectroscopic methods, it is not possible to accurately measure the absolute values of transition frequencies. Instead, the difference between two resonance frequencies is measured. Each nucleus, rotating with its own resonance frequency ωi, differs from the Larmor frequency ω0 by a few parts per million (ppm) in a span of about 10 ppm for the 1H-nuclei. This relaxing magnetisation induces a measurable current in the xy plane, the free induction decay (FID), Figure 6.

Page 28: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

28 | Processing and analysis of NMR data

Figure 6: Schemes of the relaxation of Mxy back to equilibrium (M0) after an applied RF pulse. The drawing also depicts the RF and detection coils. The relaxation of My with the time constant T2

shows the free induction decay (FID).

The FID is composed of exponentially decaying signals, expressed in simple form by

, 0<t<at, (8)

where ωi represents the angular frequency of the ith of K magnetically different nuclei in the same sample. φ is the phase error, which will be corrected for later, t is the time and at is the acquisition time.

Page 29: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

Nuclear magnetic resonance theory | 29

Quadrature detection

To distinguish between vectors rotating faster or slower than the carrier frequency, the FID is sampled in two directions perpendicular to each other (Figure 6). This is called quadrature detection14 and results in two signals with phases shifted 90o. These two signals are subsequently treated as one real R(t) and one imaginary part I(t):

(9)

. (10)

One signal of the FID may be illustrated as a combination of equations (9) and (10):

. (11)

Chemical shifts

Resonance frequencies from the same isotopes will differ depending on different molecular (magnetic) surroundings, which give the chemical shifts. Therefore, nuclei with different surroundings result in peaks at different frequencies in an NMR spectrum. In 1H-NMR spectroscopy TMS (tetramethyl silane) or TMSP (sodium 3- (trimethylsilyl)propionate-2,2,3,3-d4) are usually set as the reference value of 0 ppm (Figure 7). This relative scale implies that each peak will appear at the same chemical shift, regardless of the strength of the magnetic field.

Coupling constants

The energy levels of a nucleus may also be affected by the spin state of nuclei nearby, in which case the nuclei are said to couple to each other. This will give split peaks and the magnitude of this separation depends on the strength of the coupling interaction, the coupling constant (Figure 7). These coupling patterns are crucial for the determination of chemical structures but will not be described further here.

Page 30: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

30 | Processing and analysis of NMR data

Figure 7: A sketch of a 1H-NMR spectrum of ethane (CH3-CH3) showing the chemical shift scale (δ) and the coupling constant (J).

TMS assigns the reference value δ=0.

NMR data acquisition

Locking and shimming

The frequency of a resonance in the sample (usually deuterium in the solvent) is compared to a fixed frequency derived from the master clock used to generate the spectrometer frequency. Any deviation due, for example, to small changes in the magnetic field is converted into a correction to B0. This is called field frequency locking.4,21

The lack of homogeneity in the magnetic field is compensated for by spinning the sample and/or by shimming. Shimming is the automatic or manual adjustment of certain shim coils or magnetic gradients until the instrumental contributions to the observed line width are minimised.4,21

Analogue filtering

We are interested in spectral frequencies (ωi) relative to some reference frequency (ω0), as stated above. The NMR signal from the laboratory frame of reference is demodulated by subtracting the reference frequency (ω0) by a bandpass filter, leaving a signal which then is sampled and digitised. This is comparable with how FM radio stations broadcast in the MHz range. The audio signal desired is superimposed on a high-frequency carrier signal, and this carrier frequency is tuned into the radio receiver and subsequently subtracted to leave a signal consisting of frequencies audible to the human ear. ω0 is adjusted by tuning (to find the right frequency) and matching (impedance).21

Page 31: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

Nuclear magnetic resonance theory | 31

Analogue to digital conversion

The FID collected by the detection coils in the xy plane (Figure 6) is converted to digital signals. An analogue-to-digital converter (ADC) is characterised by its sampling rate and dynamic range. The Nyquist sampling theorem22 says that the rate of sampling (Fsampling) must be at least twice the frequency of the highest frequency signal (Fsignal) that is to be detected, i.e. at least two points per cycle of the highest frequency signal according to

Fsampling ≥ 2 Fsignal. (12)

The dynamic range of the ADC gives the limit of the smallest signals that are measurable in the presence of a strong signal and may be a source of error/noise when there are large concentration differences.1,22 This is the case, for example, in impurity determinations and biological samples containing a substantial amount of water. A typical dynamic range is 16 bits, which corresponds to an intensity resolution of 1/216. It is important to set the receiver gain of the instrument so that the dynamic range is fully utilised, but to avoid overflow, which results in a truncated FID. Theoretically, the height of the smallest detectable peak is then 1/216-1 = 30 µmol per mole of the main peak. This is also shown to be the practical limit in paper I, where the influence of the dynamic range is further discussed.

Digital filtering

Digital signal processing (DSP) or digital filtering represents processing for extending or enhancing the spectrometer capabilities, particularly when analysing small signals in the presence of large ones. The general steps involved are oversampling, filtering and downsampling.23

Oversampling is a technique borrowed from audio technology and has proved to be useful in NMR technology. The oversampling rate describes how many times (q) greater the actual sampling rate is compared to the Nyquist sampling rate (equation (12)). Theoretically, an oversampling factor of q gives a gain in the effective dynamic range of the ADC by log2(q) bit. For instance, with an oversampling of 8, a gain of 3 bits in ADC resolution is obtained.22

Page 32: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

32 | Processing and analysis of NMR data

After the oversampled data is acquired, a digital filter is applied to the FID to reduce noise or extraneous signals outside the final spectral width desired. This consists of signals with frequencies higher than the Nyquist frequency, which will be “undersampled” and consequently appear at the wrong frequency in the spectrum.

Downsampling of the filtered data is then done to reduce the data size, which is generally reset to the values of the selected spectral width and the Nyquist sampling frequency.

Signal averaging

In the pulsed NMR technique, the responses from multiple experimental repetitions may be co-added, a technique known as signal averaging. Like infrared (IR) spectroscopy, NMR spectroscopy fits the definition of a “detector noise limited” measurement, which means that the process of signal averaging results in an increase of the signal-to-noise ratio by the square root of the number of repetitions.24

, (13)

where S/N is the signal-to-noise ratio and nt is the number of transients (scans).

Important parameters to be set for a repeated pulse experiment are the acquisition time (at) during which the FID is collected and the relaxation delay (rd), see Figure 8. The choice of at and rd is influenced by the relaxation processes T2* and T1.

pp

at at rdrdFigure 8: An example of two repeated RF pulses (p) from the transmitter (upper) with acquisition time (at), during which the FID is collected, and relaxation delay (rd) of the reciever (lower).

Page 33: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

Nuclear magnetic resonance theory | 33

NMR data processing

Zero filling

The spectral resolution could be enhanced by zero filling (zero padding) of the FID.9,11,24,25 The information content in the FID is not increased but the distribution between the real and imaginary parts of the spectrum is changed. A rule of thumb is to zero fill by at least double the number of data points to regain the spectral resolution after Fourier transformation, when the imaginary part of the spectrum is eliminated.

Apodisation

More information from the spectrum may be obtained if the FID is multiplied by a suitable apodisation function before Fourier transformation.26,27 The Lorenzian line shapes in spectra may be changed by multiplying the exponentially decaying FID by a function. This will affect the spectral resolution as well as S/N.

A negative exponential function will emphasise the early part of the FID with the strongest signal and suppress the late part, which consists mostly of noise. Such a function would increase S/N but also give line broadening since it would simulate faster relaxation (T2*). However, in general, broader peaks will be covered with more data points which help to obtain more precise integral values. e(-bt) is a so-called matched filter (compare with equation (8)) where b is called the line-broadening factor. The matched filter is evaluated in papers I–II.

Besides the negative exponential function, an abundance of various functions have been used together with the FID. Examples are gaussian, lorenzian-to-gaussian transformation, sine bell and trapezoidal function.12,27 No single apodisation is optimal for all signals in spectra and different functions will enhance different phenomena, such as S/N or resolution.

Convolution

Furthermore, convolution or spectral smoothing could be performed on the spectrum in various ways. To give one example, a one-dimensional convolution with a triangular function26 means

Page 34: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

34 | Processing and analysis of NMR data

that the spectra are filtered by a triangular window. The convolution theorem says that convolving two vectors is equal to multiplying their Fourier transforms.14 Thus, the convolution with a triangular function is the same as multiplying the FID by the Fourier transform of a triangular function (an exponentially decaying cosine curve), which also enhances the very first part of the FID. This has been tried and evaluated in paper II.

Fourier transformation

One aim of NMR data processing is to calculate the frequencies (ωi) from equation (8) and obtain a spectrum. The most usual method for converting the NMR signal from the time domain into a frequency domain is Fourier transformation (FT).24,26

Fourier-transforming the signal S(t) from equation (11) gives a function of frequency, F(ω), also consisting of one real and one imaginary part:

(14)

Since we have discrete values from the ADC, a discrete FT (DFT) must be applied. DFT requires linear sampling, data sampling at regular time intervals until the signal is close to zero, and truncation of the FID will lead to artefacts.

Hodgkinson and Hore28 have pointed out possible ways in which nonlinear sampling of data can be more efficient. A frequency spectrum could then be obtained by alternative data processing procedures such as least squares methods (LS),29 linear prediction (LP),30 the Bayesian method31,32 or the maximum entropy method (MEM).33 These alternative methods are recommended for quantitative problems in the literature,5,10 although only a few have been employed.34

Page 35: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

Nuclear magnetic resonance theory | 35

Phase correction

The FID S(t ) as well as the Fourier-transformed spectrum F(ω) has been described as a signal consisting of one real part R and one imaginary part I, equations (11) and (14). Due to phase shifts, R(ω) and I(ω) consist not just of the absorption spectra A(ω) and the dispersion spectra D(ω) respectively, but a linear combination of the two according to:

(15)

where φ represents the phase error. A linear combination of R(ω) and I(ω) will give the pure absorption and dispersion spectra, according to

(16)

This is obtained by phase correction, which is most often performed manually. Since the phase correction is frequency dependent, φ is defined as:

, (17)

where the constant phase error, φ0, arises from the inability of the spectrometer to detect the signals from the two detector coils in the exact x and y directions. Frequency-dependent linear phase errors, φ1, arise because the detectors cannot detect the transverse magnetisation immediately after the RF pulse due to the risk of pulse breakthrough and the RF pulse being slightly off the Larmor frequency of every nucleus.4

Page 36: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

36 | Processing and analysis of NMR data

Page 37: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

NMR for quantitative analysis | 37

3 NMR for quantitative analysis

Peak area determination is very common in routine proton NMR spectrometry. Accurately calculating integrals to determine the relative number of protons contributing to particular peaks is straightforward. For this purpose, an integral accuracy of 10–15% is satisfying and easily achieved. When NMR spectrometry is applied as a quantitative analytical tool, an accuracy of 1–2% or better is desirable. In this thesis NMR spectrometry for quantitative applications is denoted qNMR;35 however, this is not a generally accepted convention. This chapter will review the possibilities of, discuss the requirements for and hopefully give a deeper understanding of the applicability of qNMR.

BackgroundEarly research in the qNMR field was done, among others, by Muller and Goldenson,36 Hollis,37 and Jungnickel and Forbes38 and by 1976 qNMR in pharmaceutical research had been reviewed.39 Several texts have discussed this subject in the last two decades1,5,8-10,25,34,35,40,41 and the notation QNMR was introduced in 1998.7

The use of NMR spectrometry as a quantitative technique has been rather limited despite its many advantages, which include the following: Since it is able to detect small differences in a molecular structure, it is a very selective technique. It is also directly applicable to most samples, little or no sample preparation being required. For instance, samples with a complex background matrix such as urine,42 plasma,42 petrol,43,44 oil,45 beer46 or egg yolk extracts47 can be analysed virtually directly. It is also a non-destructive tool of analysis. One special advantage of NMR spectrometry is its property of being a primary method of measurement, a method with the highest metrological qualities whose operations can be completely described

Page 38: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

38 | Processing and analysis of NMR data

and understood. Thus the results can be accepted without reference to an independently determined standard of the same material but can be determined directly from the physical context. In theory, the peak intensity of each NMR signal exactly reflects the molar ratio of the nuclei present (i.e. the molar response is 1 for all nuclei of the same isotope), making the technique conceptually and technically simple for quantification.Because organic compounds typically generate several peaks, the complexity of the spectra, especially for 1H-NMR spectrometry, may cause problems as well as bringing an advantage. A discouraging fact in qNMR analysis may be the large amount of parameters that have to be set. The most important ones are accounted for in this chapter. Compared to other analytical techniques, the major disadvantage of qNMR is its high limit of detection.

qNMR vs other analytical techniquesDespite its limitations, qNMR may rival chromatography in sensitivity, speed precision and accuracy, while also avoiding the need for a reference standard of each analyte. Examples where qNMR is comparable and even more accurate and precise than the standard liquid chromatography (LC) method are easily found in the literature.7,40,48-51, papers I–II Other advantages of qNMR over LC include time for development and validation, robustness and analysis time (i.e. cost),48 together with the possibility of getting a good overall picture of all types of organic compounds in the sample. Compared to mass spectrometry (MS), where individual events could be counted, NMR spectrometry has S/N ratios that are lower by many orders of magnitude. However, it does have some advantages over MS: normally, no separation (chromatographic or other) is required, no expensive, authentic reference samples are necessary and, in many cases, it is quicker and easier to perform.48

There are no general rules for predicting whether analytical problems can be solved by NMR spectrometry. Pauli et al.40 demonstrated the strong demand for non-chromatographic alternatives in the quality assessment of reference compounds. The area of impurity determination (papers I–II) is suitable for qNMR since the technique is very selective. Favourable application areas of qNMR are also where it can replace complicated and time-consuming sample preparation and analysing methods, one example being found in paper II.

Page 39: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

NMR for quantitative analysis | 39

qNMR methods Two principal methods may be used for qNMR: the absolute method with reference to an internal standard compound (a primary method) and the calibration method with reference to a calibration model.

(i) The absolute method

The intensity of a signal (A) is proportional to the total number of nuclei generating it (N). Comparing two signals, where one works as an internal standard (is), gives the relationship:

, (18)

which makes it easy to determine the number of nuclei generating the other signal (s) without any calibration. However, in order for equation (18) to be valid, the nuclei s and is have to be relaxed to the same extent or, easier to fulfil, fully relaxed.

(ii) The calibration method

NMR spectrometry can also be used without utilising the advantage of the equal molar response for all nuclei. However, this requires a calibration based on the internal standard technique. The equal molar response does not then have to be valid, and incomplete relaxations or peak interferences may be included in the model. The focus of the analysis will then be to optimise the S/N ratio for the particular quantification task. Calibration qNMR is used (more or less) in all the papers in this thesis, although it has been presented in the literature43,44,47,52-55 to a lesser extent than the absolute method.

The sensitivity of NMR spectrometryThe low sensitivity of NMR spectrometry26 is due particularly to the very small differences in energy that are measured. Low-frequency energies are also more difficult to detect than high-frequency ones. Moreover, only the small differences in population between the energy states (equation (2)) are observed in the experiment.

Page 40: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

40 | Processing and analysis of NMR data

However, since the nuclei have a relatively long lifetime in the excited state (compared to other spectroscopic techniques), the resonance is well defined, resulting in narrow peaks, according to Heisenberg’s uncertainty principle.†,13

New technical developments in the NMR instrumental field will provide greater sensitivity in the analyses. Imminent developments include higher magnetic field strengths. Also NMR probes with cooled transmitter/receiver coils and preamplifiers will increase the sensitivity due to reduction of electronic thermal noise. Probes cooled by cryogenic liquids such as liquid helium are called cryoprobes and will increase the S/N ratio approximately fourfold or more.56 References1,57 are also made to the possibilities of using microcoil technology in order to reach a limit of detection (LOD) down to picomole (10-12) level.

Signal-to-noise ratio

The achievable signal-to-noise ratio (S/N) of an NMR single pulse experiment is a function of a number of parameters. These are summarised in equation (19)58,59 and in Table 1, where the time parameters are also included.

(19)

Instrumental parameters determined by the available equipment are the filling factor ( f ), the magnetic field strength (B0), receiver bandwidth (b), the quality factor of the RF coil (Q) and the noise figure of the amplifier (n). The filling factor is a measure of the fraction of the coil volume occupied by the sample and could be increased by fixing the RF coil directly onto the sample cell. This construction, however, makes it impossible to spin the sample to get improved magnetic field homogeneity. On the other hand, the magnetic field homogeneity in such a probe is superior to those with an unfixed coil. According to equation (19) an increase in magnetic field strength will increase S/N by a power of 3/2, i.e. changing a magnet of 500 MHz into 600 MHz would give an increase in S/N of 30%. Instrumental factors affecting sensitivity are further discussed by Hoult.60

†To measure an energy difference accurately, one needs a long time. If the system relaxes rapidly, the time available is small and hence ΔE (=hν) is poorly defined and the observed NMR signals are wide.

Page 41: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

NMR for quantitative analysis | 41

Table 1: A summary and categorisation of the factors influencing NMR spectrometry S/N.16,59

Categorisation Parameter Possibility of S/N enhancement

f filling factor RF coil fixed onto the sample cell

b receiverbandwidth decrease

Qthe quality factor of the RF coil

increase

nthe noise figure of the amplifier

decreased by cooled probe

Probedesign

Vsvolume of the sample a large coil volume

Nnumber of detectablenuclei

increased sample concentration

Adjustablelaboratoryparameters T temperature decrease Instrumentparameter BB0

magneticfield strength increaseAffect the

Bolzmanndistribution � gyromagnetic

ratioa nucleus with a high �, e.g. 1H or 19FIntrinsic

nuclearproperties I spin quantum

numbera nucleus with a high I

T1

spin-latticerelaxationtime

decrease by e.g. temperature or relaxation reagents

T2*

experimental spin-spinrelaxationtime

increase by shimming �narrower peaks

Timeparameters

nt number of scans increase

An adjustable instrumental parameter is the sample temperature (T). Lowering the temperature will give a larger polarisation (larger M0) according to the Bolzmann equation (2), giving increased sensitivity. However, lowering the temperature may reduce T2* and lead to a loss of S/N due to larger line widths.

The volume of the sample (Vs) and the number of detectable nuclei (N) are factors that may be limited by the available amount of sample and solubility, but should be as high as possible. Too high a concentration, however, may risk incomplete dissolution or cause

Page 42: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

42 | Processing and analysis of NMR data

ADC overflow, which results in signal truncation or downscaling and hence a decrease in the ADC dynamic range. A sample with a very high concentration can also give line broadening due to intermolecular interactions. The number of nuclei detected also depends on the number of structurally equivalent nuclei that give rise to one signal from the studied molecule.

The gyromagnetic ratio (γ) and the spin quantum number (I ) are given parameters for the chosen nucleus. If possible, the nuclei of protons are the ones to measure since 1H is a frequently occurring isotope (99.99% natural abundance) and has a high gyromagnetic ratio (26.75 compared to 6.73 for 13C). All in all, 1H-NMR is some 5,700 times more sensitive than 13C NMR spectrometry.9

Furthermore, the splitting of the signal, depending on the structural and chemical surroundings of the proton, affects S/N. To give an example, a singlet peak has a much higher S/N than a doublet peak, although they come from the same number of nuclei and thus have the same total area. If possible, this should be considered when choosing one peak for quantification.

As discussed earlier, various time parameters will also affect the S/N ratio in the resulting spectra: the number of transients (pulse experiments) in signal averaging (equation (13)), acquisition time (determined by T2*) and pulse repetition time (determined by T1). These are also included in Table 1.

For the best S/N ratio, there is an optimum combination of the tip angle (β), the T1 relaxation time and the time between pulses, given by Ernst’s equation:61

(20)

where rt is the pulse repetition time. This equation is valid for one nucleus with the spin-lattice relaxation T1 and does not give the conditions for the best accuracy of the peak integral that are required for the absolute method (i).

The simplest, and most frequently mentioned, way to ensure that the spin system is in equilibrium between pulses is to wait for 5 times the longest T1 after a 90° pulse before repulsing. This corresponds to a relaxation of 99.3% and will give a maximum 0.7% error in the integral accuracy. According to Trafficante,62 a pulse angle of

Page 43: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

NMR for quantitative analysis | 43

83° and a pulse repetition period of 4.5 times the longest T1 give the optimum recovery after a pulse regarding accurate integrations. The settings of these parameters for quantification are discussed in several references.4,5,9,61,63,64

Accuracy and precision of qNMRThe signal peaks in an NMR spectrum are generally characterised by four attributes: chemical shift, multiplicity, line width and relative intensity. A well-resolved, narrow, single peak with a high S/N ratio has the best chance of giving an accurate and precise quantification. Several studies have discussed the uncertainties and errors in qNMR,2-9 and accuracy and precision better than 1% have been reported.2,3,7,25 The main contributions to the experimental error have been localised in these studies to the sample preparation and the analyst.2,7

Accuracy

The systematic errors presented will mainly affect the accuracy of the absolute method (i) of qNMR, where no calibration is performed.

If the aquisition time is shorter than the time it takes for all nuclei to relax completely, the molar response will differ from unity. To avoid systematic errors in the peak areas, the relaxation delay must also be long enough to allow complete relaxation in z-led (T1) for all nuclei. Also vertical truncation of the FID, due to ADC overflow, will introduce distortions of spectra.

Resonances (ωi) with different distances from the excitation pulse (B1) may be differently excited and detected due to off resonance effects and non-uniform responses to the RF pulse. If the bandpass filter width prior to the ADC is too narrow, the lines at the ends of the spectrum may be reduced in intensity. In this respect it is beneficial to use peaks close to each other for quantification.

The integration range for Lorenzian NMR peaks is limited, especially in a crowded spectrum. An integration error smaller than 1% requires integration limits of ±24 times the peak width.25 However, if the integration limits are consistent between peaks, errors or less than 0.5% are easily achieved.3,7,40

Page 44: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

44 | Processing and analysis of NMR data

Precision

Random errors will affect both the precision of the absolute (i) and the calibration method (ii) for qNMR.

Point-to-point noise due to spectral resolution is a random error source. If the spectral resolution exceeds 0.4 of the peak width, the maximum error of the integral will be 0.1%.25 The spectral resolution may be enhanced by zero filling. Furthermore, the intensity resolution will be limited when a small peak is detected in the presence of a strong peak, depending on the dynamic range of the ADC, which will give random errors.

Automatically phased peaks are rarely acceptable with available methods, so manual phasing with a vertical expansion of the peaks is still the recommended method for the best results.35 Since perfect phasing of NMR spectra then relies on a subjective judgment of the goodness of the shape of signals, this will be a source of error. Baseline and phase anomalies are thoroughly discussed in the literature.5,40

DiscussionVery few articles describe the calibration method (ii) in the context of qNMR. By using this method, the parameter determination ensuring complete relaxation is avoided. This is a great advantage over the absolute method. The drawbacks are that the calibration samples have to be prepared and that all the sample components are required as standards. The advantage of the absolute method (i) is that it is very straightforward and that no calibration is needed.

Page 45: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

qNMR applications | 45

4 qNMR applications

This chapter will describe qNMR applications for impurity determination and metabolic profiling. The sample matrices, NMR instrumental parameters and applied data processing from the papers will be briefly illustrated.

Impurity determinationThe presence of unwanted chemicals, e.g. residuals from synthesis and storage, even in small amounts may affect the efficiency and safety of pharmaceutical products. This is why the different pharmacopoeias prescribe limits for the allowable levels of impurities present in the active pharmaceutical ingredients and formulations. Impurity profiling – the identification and quantification of impurities – is of great importance in the pharmaceutical industry.65

qNMR is a very useful technique for impurity determinations due to the absolute response factor, selectivity and the reproducibility of chemical shifts. Impurities with similar chemical structures may readily be both identified and quantified by NMR spectrometry, making it a very cost-effective alternative compared to other analytical methods that require the acquisition of separate standards of the target analyte and possible impurities.

The use of 1H-NMR spectrometry for impurity (or low-level) determination has been reported in several papers1,7,45,48-52,55,66-72 and in papers I–II in this thesis. Organic components have been reliably quantified below the 0.1% level7,70 and some papers have reported LODs lower than 100 µg/g.49,52,67,70,papers I–II The main aim in papers I–II was to investigate the potential of the qNMR technique for impurity determination in these specific cases. There were no limits of materials, i.e. no problem of making calibration samples, and no attempt was made to perform an absolute determination.

Page 46: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

46 | Processing and analysis of NMR data

Paracetamol samples (paper I)

Paracetamol (N-(4-hydroxyphenyl)-acetamide) is a drug having an analgesic, antipyretic and antiphlogistic action that is widely employed in therapeutics. The main impurity in paracetamol preparations is 4-aminophenol, an intermediate in the synthesis of paracetamol (Figure 9), which may also be formed during storage. The 4-aminophenol content in paracetamol is limited to 50 µg/g by the European, United States, British and German pharmacopoeias.73

II III

II III

I

I

Figure 9: Structures of paracetamol (left) and 4-aminophenol (right). The Roman numbers is referred in the legend of Figure 10.

In paper I, the potential of determining 4-aminophenol in paracetamol by qNMR was explored. Mixtures with different concentrations of the impurity were dissolved in DMSO-d6. The pulse angle and repetition time chosen were 83° and 4.5*T1 respectively, the optimal settings for best peak integral accuracy according to Trafficante.62 To expand the vertical dynamic range, the oversampling factor was set to 20. The study is fully described in paper I and part of a spectrum is shown in Figure 10.

Page 47: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

qNMR applications | 47

6.8 6.6 6.4δ(1Η)/ppm

p+4ap

4ap

Figure 10: A part of an NMR spectrum of paracetamol holding 1 mg/g of the impurity 4-aminophenol. The peak marked p+4ap comprises a 13C-satellite signal (0.55% of the main peak intensity) from two aromatic protons in paracetamol (marked I in Figure 9) and the signal from the corresponding signals in 4-aminophenol

(II). The peak marked 4ap arises from 4-aminophenol (III).

Poloxamer samples (paper II)

Poloxamer materials are synthetic copolymers of ethylene oxide and propylene oxide (Figure 11) and were first synthesised in 1954 by Lundsted and Ile.74 Poloxamers form a thermoreversible gel, a low-viscous liquid at low temperatures and a gel at body temperature, which has been used for pharmaceutical formulations.75

Acetaldehyde and propionaldehyde (Figure 11) are well-known impurities in poloxamers.76 They are formed from the monomers ethylene oxide and propylene oxide, which are the building blocks of the poloxamers. The maximum allowed amounts of acetaldehyde and propionaldehyde in the poloxamer are 80 and 100 µg/g, respectively, for some medical applications.

Page 48: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

48 | Processing and analysis of NMR data

Figure 11: Structures of poloxamer (upper), acetaldehyde (left) and propionaldehyde (right). a and b hold different numbers for

different types of poloxamers.

The LC method for the determination of acetaldehyde and propionaldehyde in poloxamer in use today involves a very time-consuming work-up procedure. By means of qNMR the poloxamer samples can be analysed only by dissolving the samples in acidic water. Due to the volatile aldehydes and because the poloxamer forms a gel at temperatures above room temperature, the poloxamer samples were analysed at 275 K. The viscosity of the samples was also the reason for running the experiments in a non-spinning mode. The oversampling factor was set to 16 to expand the vertical dynamic range (chapter 2, “Digital filtering”). The remainder of the parameter setup varied according to Table 2. An NMR spectrum is shown in Figure 12.

Table 2: NMR instrumental parameter setups that were tried for determination of acetaldehyde and propionaldehyde in poloxamer. Shimming Gradient Gradient Manual Gradient Pulse angle 27o 27o 78o 78o

Recycle time (s) 6 (3.3�T1aa) 4 (2.1�T1aa) 9 (5�T1aa) 14 (7.8�T1aa)Acquisition time (s) 6 4 4 4Number of scans 64 256 / 64 64 64Tot. exp. time (min) 6.4 17 / 4.3 9.6 14.9T1aa: Spin-lattice relaxation time constant for acetaldehyde (1.8 s)

Page 49: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

qNMR applications | 49

0.81 1.21.41.61.82 2.2δ(1H)/ppm

poloxamer

aa

pa

Figure 12: A spectral segment including signals from the CH3 groups from the substances poloxamer at 1.2 ppm, acetaldehyde (aa) at 2.22 ppm (176 μg/g poloxamer) and propionaldehyde (pa)

at 0.89 ppm (175 μg/g poloxamer).

Zero filling and line broadening (papers I–II)

Zero filling and line broadening were performed to a varied extent previous to a Fourier transform in both paper I (Table 3) and paper II. The evaluation showed no improvement in the quantitative results due to zero filling, although line broadening improved the quantifications to some extent in paper II.

Table 3: Root-mean-square error of prediction (RMSEP) (equation (31)) of different test sets from NN calibration models including partly different data sets (A-D), and with different data preprocessing (see paper I for details).

RMSEP (�g 4ap/g pa) Zero

filling†Wavelet

compressionLine

broadening A B C D Mean

no no no 35 24 15 27 25yes yes no 21 59 32 38 37yes yes yes 62 59 30 46 49

† to double the number of data points 4ap: 4-aminophenol pa: paracetamol

Page 50: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

50 | Processing and analysis of NMR data

Spectral smoothing (paper II)

A one-dimensional convolution of spectra with a triangular function (of height 1 and width 41) was performed in paper II. The evaluation of this coarse smoothing of spectra improved both repeatability, reproducibility and the LOD.

Metabolic analysesNMR spectrometry is very attractive for quantitative and qualitative determinations of small molecules in biological fluids since it requires no or very little sample preparation. Drugs, toxic substances and metabolites may be analysed without prior assumptions, e.g. to measure the biological response to some xenobiotics. Already in the early eighties NMR spectrometry was used for the analysis of serum, plasma and urine.77-79 This field of research in combination with high resolution 1H-NMR was reviewed by Nicholson and Wilson in 1989.42

Although the terminology in this area of research is still evolving, metabolic analyses may be divided into four general areas.80-83

• Metabolite target analysis: The quantification of specific metabolites.

• Metabolite/metabolic profiling: Quantitative or qualitative determination of a group of related compounds or of specific metabolic pathways.

• Metabolomics/metabonomics: Comprehensive analysis of all metabolites at a specified time and given environmental conditions.

• Metabolic fingerprinting: Sample classification by global analysis of all present components.

In drug research, the term metabolic profiling is often used to describe the metabolic fate of an administrated drug.83 Metabolic profiling makes it possible to generate information about in vivo multiorgan functions. This is an effective preclinical screening method. This type of analysis may also provide one key to “systems biology”: the combination of genomic, proteomic, and metabolomic data.80,81,84

Page 51: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

qNMR applications | 51

By means of urine sampling, information can be semi-continuously collected from the same animal without euthanasia. The 1H-NMR analysis of urine samples takes a few minutes and the sample preparation includes only buffering and centrifugation. Urine sample analysis is expected to reflect a large number of biochemical processes in the body, making it a potential diagnostic tool.81,84

The improved capabilities of the NMR instruments due to the development of technology makes it possible to detect thousands of signals arising from hundreds of endogenous molecules in spectra from 1H-NMR analysis of biological fluids such as urine or plasma. Although these signals often overlap, this is a useful tool for metabolic fingerprinting. When analysing these complex samples, the assignment of all peaks is seldom accomplished. However, even unidentified signals can be used as fingerprints. Techniques for multivariate data analysis and pattern recognition then help to give qualitative and quantitative interpretation out of complex spectra.46,47,54-56,72,80,82,84-97

A lot of biological variations will affect the quality of urine sample 1H-NMR spectra. For example, the ionic strength varies considerably in urine samples and may affect tuning and matching (chapter 2, “NMR data acquisition”).81 Water, which is present in a very high concentration,42 has to be reduced by solvent suppression during the NMR experiment. The concentration of the urine must also be compensated for in the analyses.

Rat urine samples (papers III–VI)

The rat urine samples studied in papers III–VI were received from an animal study at AstraZeneca. The study included two groups of six rats each: one control group orally dosed with water and one group dosed with citalopram (positive control for phospholipidosis). The rats were dosed once a day for 14 consecutive days and urine samples were collected on days -5 (pre-dose, two occasions), 1, 3, 7, 10 and 14. The study is fully described in paper III.

The samples were buffered and centrifuged prior to 1H-NMR analysis on a Bruker DRX600 instrument working at 600.13 MHz. Each sample was automatically shimmed by gradient shimming and suppression of the water signal was achieved using presaturation during relaxation delay and mixing time with a shaped pulse for

Page 52: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

52 | Processing and analysis of NMR data

selective saturation. The acquisition time was 4.68 s and the total pulse recycle time was 7.71 s, 64 scans per sample using a spectral width of 7002.8 Hz. These parameters were chosen on the basis of previous work analysing urine samples with NMR spectrometry.92,95 The total NMR acquisition time for one sample was 8.6 min. An example of an NMR spectrum is shown in Figure 13.

0246810

δ(1H)/ppm

Figure 13: 1H-NMR spectrum of urine from a rat dosed with citalopram for 7 days.

Phase correction and peak alignment (papers I–VI)

Spectra were manually phased and referred to the TMS peak on the NMR instrument and further processed in MATLAB.98 In paper I, the spectral areas of interest (6.4-6.55 ppm) were cut out, automatically phased and corrected laterally by means of a genetic algorithm until the peaks were aligned. In papers II–VI each spectrum was manually phased (again) in MATLAB and peak-aligned segmentwise as described in papers III and IV. The subject of peak alignment is thoroughly discussed in chapter 6.

Page 53: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

Chemometrics | 53

5 Chemometrics

Data from a chemical analysis consist of information and noise. To get high accuracy and precision for a chemical analysis, we need knowledge about both of these components. The properties of the instrumental noise of NMR spectrometry are discussed in chapter 3 above. The noise also has chemical and/or biological contributions.

The desired and relevant information is often just one part of the total information. When it comes to data analysis, this is what we want to extract. This can be performed by chemometrics, which comprises the application of multivariate statistics, mathematics and computational methods and also includes how to represent and display this information. A philosophical article about the subject has been written by Svante Wold.99 More theoretical references to the subject of chemometrics can be found in books.100-102

Principal component analysisPrincipal component analysis (PCA)103,104 is a well-established bilinear projection method for interpreting multivariate data. Pearson105 is considered to be the first person to have described PCA as we know it today, although Cauchy derived it as early as 1829.100 PCA denotes the extraction of eigenvectors† from the variance-covariance (var-cov) matrix (21) to obtain a number of uncorrelated (orthogonal) new variables called latent variables or principal components (PCs).

(21)

for variable i, j =1... p in the data matrix X of the dimensions n × p.

†From the German eigen meaning “inherent, characteristic”

Page 54: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

54 | Processing and analysis of NMR data

PCA gives a new representation of data in a new, rotated coordinate system. Every PC is optimised in regard to its ability to explain the direction of maximum variance in the original data and each PC explains variation uncorrelated with the other PCs. The data are approximated by the model:

X = TP’ + E,

where X is the original n × p set of data, typically n samples and p variables. P’ is an m × p matrix of variable coefficients (loadings) that give the relative contributions of every variable in calculating the object scores, T, which is an n × m matrix, of the coordinate values on the PCs. E is an n × p matrix containing the residuals not explained by the model using m PCs. The score value representation (scores plot) will show a clustering of similar samples, as sketched in Figure 14. The loadings represent the variable direction, describing the influence of the variables on the scores, and are a valuable interpretation tool.

v1

v2

v3

Figure 14: A geometrical interpretation of a PCA model. The original data matrix X consist of p variables and eight samples. The first three variables (v) are represented in the coordinate system to the left, where the first two principal component vectors, PC1 and

PC2, are also shown.

PCA as a projection technique in combination with NMR spectra is frequently employed in systems biology and in the pharmaceutical industry. Examples are found in papers III–VI and several references.82,86,90-97,106

Page 55: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

Chemometrics | 55

Parallel factor analysis

Parallel factor analysis (PARAFAC) is a multiway decomposition method, a generalisation of PCA to higher order arrays.107,108 X is decomposed in the loading matrices A, B and C, illustrated by:

Xnpq = AnmBpmCqm + Enpq

where n, p and q represent the dimensions of the data matrices, m denotes the number of PARAFAC components and E is the residual matrix in this three dimensional example.

PARAFAC was applied to three- and four-dimensional data in paper VI and has the advantage of being able to retain the time-related structure of this data. Compared to bilinear models with unfolded data, multiway models are generally more robust and easier to interpret, provided that all dimensions hold information.107 Theoretically, the PARAFAC model has a unique solution compared to bilinear methods, which solutions suffer from rotational freedom. However, scaling and centring are not as straightforward in the multiway case.

Calibration Since qNMR has the property of being a primary method of measurement, no calibration is needed as long as we have only random noise. However, if we are uncertain about the randomness in the noise or we do not wait for full relaxation, the equal molar ratio between peaks will not be valid and a calibration is required (as discussed above in chapter 3).

The simplest form of calibration is to measure one sample for which we have all the information, typically concentration. From this sample we get a relationship (b) between the signal (x) and the information (y), from which an unknown sample (yi) could be predicted when we have new measured data (xi):

yi = b ⋅ xi (22)

This is suitable when we have samples to determine in a known, rather limited, concentration range.

Page 56: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

56 | Processing and analysis of NMR data

To cover a larger concentration interval, several samples (x) are required in a calibration model:

y = b ⋅ x + e (23)

e will include both measurement and model error, and b is estimated by least-squares regression (24) and is called the regression vector.

, (24)

where x = (x1, x2,... , xi,... , xn)’ and y = (y1, y2,... , yi,... , yn)’.

qNMR calibration modelling including area determinations, i.e. univariate calibration, is exemplified in paper II and references.43,44,53

NMR peak area determinations are easily performed with consistency if the peaks are singlets and have no other peaks in their immediate surroundings. However, if the peaks suffer from interferences or if singlets have to be compared with multiplets, integration is a more difficult task. A very complex spectrum, such as the NMR spectra of biofluids, may also lead to difficulties in sorting out the important signals. To exploit the full information content of NMR data, including hundreds of overlapping signals, multivariate data analysis methods are preferred.

The use of methods for multivariate calibration methods have increased with the desire to solve increasingly difficult problems and complex data. Naturally this goes hand-in-hand with the increased computer performance. A simple method for multivariate regression is multiple linear regression (MLR), with the estimate of b, according to:

(25)

where X = (x1, x2,..., xi,... , xn), in accordance with equation (24) above.

MLR may also be performed on PCA scores (T) on y, which is called principal component regression (PCR). b is then estimated as:

(26)

The advantage of PCR over MLR is that it involves no limit in the number of variables, since in MLR the number of variables must be less than the number of samples.

Page 57: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

Chemometrics | 57

Partial least squares

When the maximum variance in X does not correlate with the maximum variance in y, PCR will give an unnecessarily complex and unstable model for calibration. In such a case, it may be more appropriate to use partial least squares or projection to latent structures (PLS). The objective function of PLS is to find maximum covariance between y and X, in contrast to PCR, where the maximum variance in X is found.

PLS was first invented by Herman Wold in the 1960s109,110 and was later modified and extended by Svante Wold et al. in 1983,111 when it also was named partial least squares. This method for regression has been established by its success in a wide variety of problems.112-114 The details of PLS regression have been thoroughly described in the literature, and only a short summary follows here.

A vector of weights, w, describing the maximum covariance between X and y is defined according to:

= X’y/|X’y| (27)

Scores values t are obtained when X is projected onto w

t = Xw, (28)

and the regression coefficient vector is then determined by

(29)

After estimation of the x-loadings vector

p = X’t/(t’t), (30)

the tp’ matrix is subtracted from X and the bt matrix is subtracted from y, and the procedure is repeated from (27) to extract additional PLS components.

PLS applied to NMR data

PLS has previously been applied to NMR data46,47,54,55,85-87 and in the course of the work on all the papers in this thesis, both for creating calibration models and for explorative analysis. In paper I, PLS did not perform as well as a calibration model as the applied neural network, and was therefore not described. In paper II, the univariate analysis based on integration of peaks performed as well

Page 58: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

58 | Processing and analysis of NMR data

as the PLS models. In order to chose as simple a model as possible, the PLS models were rejected. In paper III, PLS was rejected in favour of the unsupervised PCA model to avoid the risk of guiding the latent variables in the direction of a false y variable. Day of sampling was tried as y, although the kinetics of the drug-induced metabolic changes were unknown. Papers IV–VI include PLS, both with y as a variable of time and a measured y variable (LC/MS), and as a discriminant analysis (DA) method.112 There is, however, a risk, especially with the PLS-DA models, in such analyses. The PLS model may appear to perform really well if only one or a few, e.g. misaligned peaks or noise, discriminate between the samples, though the actual model performance is poor. This risk must be minimised by studying loadings and by a proper validation.

N-PLS was developed by Bro 1996115 as an extension of PLS to higher order data. It was used in paper VI, making it possible to utilise the structure of original data and avoid unfolding.

Nonlinear regression

Linear multivariate regression methods like PCR and PLS may to some extent explain nonlinearities if a sufficient number of latent variables are selected or the x and/or y block are transformed or expanded. However, a better alternative may be to use a nonlinear model. There are examples of nonlinear PLS models where the inner relation between x and y scores is defined as nonlinear (a quadratic polynomial).116 A more common way of modelling nonlinear data is probably neural networks (NN), which was employed in paper I, where a nonlinear relationship was observed. The first application of NN to NMR data was demonstrated in 1989 by Thomsen and Meyer,117 followed by, among others, Amendolia et al.118 A short description will be given here, and further reading about NN can be found in the literature.119

A multilayer feed-forward NN is composed of a highly interconnected mesh of nonlinear and linear computing elements. The fundamental processing element of an NN is the neuron. Neurons are connected by links, and there is a coefficient (or a weight) associated with each link. The neurons receive external inputs (first layer) or inputs from other neurons and perform a weighted sum of these inputs. The neurons process the resulting signal with a transfer function and

Page 59: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

Chemometrics | 59

then produce an output passed on as an input to other neurons or as an output from the entire model. This is illustrated in Figure 15.

Figure 15: A sketch of a neural network with three input data, two neurons in the hidden layer and one output data. Each line represents a weighting and each circle is a neuron defining a

transfer function.

A process known as training or learning establishes the values of the weights. Output vectors ( ) are calculated from a set of input vectors (x) for which y is known a priori. The weights are modified based on the difference y- . As this process is repeated, the weights gradually converge to values that transform each input pattern to an output pattern close to its target (y). Any type of variation in the inputs may be accounted for by including them in the training process.

Data preprocessingA common denominator of all of the multivariate methods is the necessity of proper data preprocessing prior to data analysis. Several approaches are available that deal with undesired spectral artefacts, e.g. multiplicative scatter correction (MSC)120 for baseline correction, orthogonal scatter correction (OSC)86,121 for removal of unwanted variances and corrections for relative intensity variations such as mean-centring (xij-xjmean, for variable j and sample i), unit variance scaling (xij/xistd, where xistd is the standard deviation of xi), and normalisation (which could be performed in various ways). Mean-centring is performed prior to multivariate analysis to centre the new data representation on the origin of coordinates since we are generally not interested in the mean level of the variables, but the variation around the mean. Unit variance scaling is generally not suitable for spectral data since noise is scaled to be as important as peaks, which may be risky, especially when PLS is applied due to the risk of overfitting.

Page 60: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

60 | Processing and analysis of NMR data

Normalisation is actually worth a chapter of its own, but will only briefly be mentioned here. It must be performed with care since normalising to the wrong standard could introduce erroneous trends into data. In papers III–VI, spectra are normalised to equal area in order to deal with concentration differences in the urine samples. Also, normalisation with reference to one peak as an internal standard (e.g. the TMSP55 or creatinine peak) appears in the literature.

Furthermore, peak alignment is an important preprocessing tool in multivariate analysis of NMR data. It is employed in all the papers in this thesis and the subject is discussed in detail in chapter 6.

ValidationOne may make an explorative model to search for trends or groupings in data and use no validation at all, as is done in papers III, IV and VI in this thesis. This is, however, a somewhat “quick and dirty” approach. A proper validation of a model (explorative or for prediction) is preferred. The purpose of the validation is to derive estimates of the accuracy of the validity or predictions from the model.

An appropriate method of validation is, prior to any data analysis, to choose a (representative of the population) test set to be put aside. The remaining samples, the training set, are then used to determine the model parameters, e.g. the number of components in the model, variable selection, number of learning cycles in a NN training, as well as preprocessing and scaling of data. The test set is then predicted or projected onto the optimised model to estimate the generalisation error. This procedure was used in papers I, II and V.

Model selection

The optimisation of a model could be guided by cross validation.122 The data set (here the training set) is divided into a number of subsets. Each subset is predicted onto a model built from the other subsets. When all the subsets have been predicted, the generalisation error of the model can be estimated. The subsets must be picked with care, duplicates must be in the same subset, and the subset must not represent all samples from one single corner of the sample domain. An example of this type of optimisation is shown in paper V.

Page 61: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

Chemometrics | 61

If the number of subsets equals the number of samples, this is called leave-one-out cross validation. This could give a good estimate of the error of prediction if the samples are too few to comprise a test set and the samples are well spread in the domain, as in paper II. In paper I, an “internal test set”, here called a validation set, is used for the optimisation of the neural net parameters. Additional validation methods for optimisation can be found in the literature.102

Objective function

We may have different performance objectives for the model depending on the situation. Most often, the error of prediction is to be minimised. Root-mean-square error of prediction (RMSEP), equation (31), is a suitable and common measure (papers I–II).

, (31)

where is the predicted concentration, y is the measured concentration and J represents the number of predictions.

Another measure of quality may be a measure of class separation.123 In papers IV–VI we have used a measure developed by M. Åberg,124 illustrated in Figure 16. In the scores vectors space, each class was approximated by a bivariate normal probability distribution. The boundary between a pair of classes was defined as the hypersurface on which the probability densities of the two classes were equal. The measure of class separation was calculated as the minimum Mahalanobis distance125 between the class boundary and the class centres.

Figure 16: A geometrical interpretation of the measure of class separation where x and o represent two classes, the line between them represents equal probability density and the arrow shows the

measure of class separation.

Page 62: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

62 | Processing and analysis of NMR data

Page 63: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

Peak alignment | 63

6 Peak alignment

NMR peak area and position give detailed chemical and quantitative information. Peak shifts and differences in peak shapes may, however, also arise from variations in experimental conditions such as temperature variations and inhomogeneities in the applied magnetic field or differences in the background matrix of the sample.81,87 The pH of the sample is a major source of variation in peak positions,126 and even if the samples are buffered, small pH differences will be detected.42,92,95,127 In biofluid samples the background matrix also reflects an individual variation depending on the metabolism.82,128 These factors are, however, most often not what we want to measure.

From an ideal multivariate analysis point of view, any NMR peak originating from identical nuclei should have an identical position and shape between samples. If this is not the case, a multivariate model will be unnecessarily complex and more difficult to interpret. Consequently, the usefulness of the results achieved will decrease. Examples of problems and artefacts originating from misaligned NMR signals have previously been discussed in papers III–IV and in references.87,89,126,129,130

Peak alignment in NMR spectrometry

To overcome this NMR peak shift problem in combination with pattern recognition techniques, different approaches have previously been suggested. Cloarec et al. 88 have recently presented the use of orthogonal projection to latent structures (O-PLS)131 discriminant analysis to handle peak shift problems. The predominant method in the literature is, however, integration of predetermined spectral segments, i.e. bucketing, typically 0.04 ppm wide.86,91,92,95,96 The bucketing approach deals with the peak shift problem but ruins

Page 64: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

64 | Processing and analysis of NMR data

the spectral resolution of the acquired data, typically the spectral resolution decreasing from 6,500 data points/ppm to 28 data points/ppm. This makes multivariate detection of small variances (peaks) impossible if they are confounded in the same bucket as larger variances. Hence, the interpretation of a bucketed data model will not reveal the specific peaks responsible for the clustering, as is shown, for example, in paper III and by Cloarec et al.88

Also, more refined alignment methods (than bucketing) for NMR data have been reported. One example is partial linear fit (PLF), a method outlined by Vogels et al. 1996.47 PLF automatically picks out segments in an NMR spectrum of size d and shifts them s points left and right. Every possible and relevant combination of d and s is tried until the sum of squared differences between the spectrum and the target for alignment is minimised. Another approach, outlined by Brown and Stoyanova 1996,129 performs automatic removal of both frequency and phase shifts in NMR spectra by using PCA to determine the misalignment in a single peak appearing across a series of spectra. This method has been extended and applied to in vivo NMR spectra,132 and has been further developed and applied to high-resolution spectral data.130

As a contribution to this field of research, a method of peak alignment by means of a genetic algorithm (PAGA) was developed in Paper III. It is an automated segment picking method including both linear interpolation and shifting of segments, optimised by a genetic algorithm with the correlation coefficient as objective function. This algorithm was speeded up by Lee and Woodruff 133 by replacing the genetic algorithm by a beam search algorithm, and was further analysed and compared to peak alignment by reduced set mapping (PARS) in paper IV.

PARS was developed by Torgrip et al.134 The method relies on peak detection and converts spectra into sparse vectors consisting of zeros and peak intensities corresponding to peak maxima. This peak alignment is performed by a fast tree-search algorithm with early pruning of non-optimal alignment solutions.

Page 65: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

Peak alignment | 65

Peak alignment for other analytical techniques

There are also several examples of alignment of first-order data, which have not yet been applied to NMR data. Some of them will be mentioned here.

Dynamic time warping (DTW) is an application of dynamic programming which has been used for aligning profiles from industrial batch processes of polymerisation135 and GC/FT-IR/MS chromatograms,136 although it has its origin in speech recognition.137 The signals are warped nonlinearly, point by point, to fit a reference profile and the cumulative distance (e.g. Euclidean, equation (35)) between points is optimised. Additional methods developed from DTW have been presented for chromatograms: one peak matching algorithm relying on peak detection and local warping (LW)138 and one parametric time warping method (PTW).139

Piecewise linear correlation optimised warping (COW)140 has been applied to GC and LC data.141,142 This algorithm selects predetermined spectral segments of equal length and by linear interpolation stretches and shrinks them to fit a reference spectrum. The correlation coefficient is optimised. A method for peak alignment very similar to COW, but with sideways shifting instead of interpolation, is piecewise alignment (PWA).123 Another segmentwise peak alignment approach based on genetic algorithm optimisation has been applied to LC chromatograms on peptide maps.143

Walczak and Wu recently presented a method for fuzzy warping (FW) of chromatograms relying on only the main peaks.144 They also compared the performance and processing time for several different methods of peak alignment referred to above: FW, COW, PAGA, PTW and DTW. PAGA (from paper III) performs really well in comparison to the others, although it is relatively time-consuming in some cases.

Profile alignment or peak alignmentSome of the methods mentioned rely on peak detection in some way (PARS, FW and LW) or include just one peak at a time. When a peak list is created, no noise is present and unit variance scaling may successfully be applied as in paper IV, for instance. This and the data reduction are some of the advantages of peak detection. The disadvantage is the uncertainty in defining the peaks.

Page 66: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

66 | Processing and analysis of NMR data

The data set from the urine sample 1H-NMR analysis studied in this thesis has a lot of line-width variations due to, for example, shimming problems. An example is shown in Figure 17, which illustrates the difficulty of peak detection relying on a peak shape definition in these spectra. In order to avoid detection of peaks, methods relying on alignment of profiles are preferred, e.g. DTW, PTW, COW, PLF, PWA and PAGA.

3.063.083.13.123.143.163.183.23.22

3.043.063.083.13.123.143.163.183.2

δ(1H)/ppm

Figure 17: Example of peak shifts and peak shape distortions due to varying background matrix and shimming between 1H-NMR spectra from rat urine samples. Black lines represents samples from control rats and grey lines represent samples from dosed rats (days 7-14). The upper figure shows unaligned spectra and the

lower figure shows spectra aligned by PAGA.

Page 67: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

Peak alignment | 67

Target for alignmentIn first-order real data (one vector per sample), there are no data to guide or validate a peak alignment as in second-order data (one matrix per sample). Hence, a target must be chosen – it could be one spectrum, although various alternatives are possible.

When the same target is used for all the spectra in a data set, differences between groups of spectra will easily be detected in the same score plot from a multivariate analysis, assuming that peaks appearing from the same substance appear at the same position in all spectra. An unknown sample would be aligned to the previously chosen target and projected into the multivariate model space (loadings) for visual interpretation. This is the situation in papers III–IV and references,47,123,134,135,138-140,144,145 where one target spectrum is chosen and assumed to reflect all peaks of interest. Also a combination of targets may well be employed, such as the sum of two spectra (paper II) or a mean spectrum.138 For the PARS method,134 there is also a possibility of using a recursively updated target, where new (unaligned) peaks appearing in aligned spectra are added to the target (paper IV), or a dendogram alignment scheme where a root target is created.146

When interpreting data with defined class associations, this information could be used. All the spectra in one class may be aligned to one chosen target in its own class.143 The interpretation with all classes in one multivariate model will not be valid in this case. There are also examples where the whole data matrix (all samples in the study) is part of the alignment and no special target is chosen.129,130,132

OptimisationAn optimisation problem is a computational problem in which the objective is to find the best of all possible solutions, or more formally, find a solution in the feasible region which has the minimum or maximum value of the objective function. There are several solution methods to optimisation problems. I will bring up and compare some used in the methods for alignment mentioned above.

Page 68: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

68 | Processing and analysis of NMR data

Dynamic programming is an algorithmic technique which decomposes a multivariate optimisation by evaluating all possible solutions. This is a “safe” method but could be very computer-intense if the variables are numerous, as when comparing two NMR spectra with 65,000 variables each. Dynamic programming is utilised in COW, DTW or LW, for example.

Genetic algorithms (GAs), used for PAGA (paper III), search the solution space of a function by simulated evolution.147 A GA usually starts with a random population of candidate solutions. Each solution is subjected to a genetic search, which encompasses assignment to a fitness value according to an objective function, selection of strings for further replication, and recombination by crossover and/or mutation. This genetic search runs until a predefined optimisation criterion is met.

Beam search,148 which was used for alignment by Lee and Woodruff,133 is a heuristic search algorithm that includes an optimisation of best-first search, where a number of nearly optimal alternatives (“the beam”) are examined. Initially, a set of likely solutions is created on the edge of a predetermined search radius and subsequently evaluated. The algorithm then assigns one or more new candidate solutions by selecting the k best steps from the current trial solution(s) where k is the parametric input beam width. For each loop of the algorithm, the radius is reduced until a stop criterion is met. In paper IV and the paper by Lee and Woodruff,133 peak alignments by a genetic algorithm and by beam search are compared. According to these studies, they perform equally well, but beam search is about ten times faster.

In optimisation by tree-search algorithms, nodes (e.g. possible peak matches) are defined from the data structure. The nodes are then searched level by level until the globally optimal solution is found (breadth-first search as in PARS).

Page 69: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

Peak alignment | 69

Objective function

Several objective functions are mentioned together with peak alignment techniques. In paper I, the fitness criterion was the sum of the absolute differences between the actual spectrum (S ) and a target spectrum (T ):

(32)

where si and ti represents each data point i in spectrum and target respectively. The correlation coefficient (CC ) was used as a measure for the goodness of alignment in, for example, COW, PWA, LW and PAGA (paper III). It is a numeric measure of the strength of linear relationship between variables and it is calculated by dividing the covariance (cov) by the product of their standard deviations (σ):

. (33)

The standard deviation (σ) is estimated by the square root of the variance, the root mean square (RMS) of the deviations from the average:

,

(34)

where J is the number of observations. In COW, PWA and LW, a Wallis filter140 was also used to minimise the effect of varying peak heights. This was not used, however, in paper III since we wanted the larger peaks to guide the alignment.

The root-mean-square error (RMSE) used as objective function in PLF and PTW, for example, is actually a measure of the mean Euclidean distances between profiles. The Euclidean distance is defined as:

(35)

for J variables.

Page 70: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

70 | Processing and analysis of NMR data

NMR data compression Generally, no data compression of a 1H-NMR spectrum is required to perform the data analysis. Since today’s powerful computers can handle (quite) large data volumes, the whole data matrix should not be a problem to cope with. However, special cases when, for example, neural networks are utilised for calibration, the data need to be reduced due to heavy computations, as shown in paper I. This reduction can be performed by more sophisticated methods than bucketing (mentioned above). Examples in the already referred literature are PARS134 where a sparse matrix replace the spectra by peak identification, or wavelet reduction and variable selection (paper I). In paper IV, data compressions by bucketing to different extents after peak alignment are compared. The conclusion was that no difference in class separation from a PLS-DA model is shown when a reduction from about 65,000 to 8,000 data points of spectra is performed.

DiscussionThe optimum choice of method for peak alignment will differ between applications. However, most of the methods mentioned above could be applied to any data produced by an analytical chemist. On the other hand, it is not straightforward to put the optimum parameters into the algorithms. Either the alignment methods rely on peak detection, which implies some definition of a peak and may cause problems, e.g. if the peak shapes differ between samples, or the alignment requires some segment picking, in which optimum parameters will also differ between applications. One conclusion may be that the peak picking approach is favourable in samples with well-defined peaks, appearing similar between samples, and a profile alignment including segment picking is preferred when peaks or profiles are not that well defined. Point-by-point alignments such as DTW are, of course, also possible. However, for NMR data, consisting of around 60K data points, the available methods are very computer-intense.

Another parameter to choose in alignment tasks is the objective function, which should be chosen to fit the samples, either the largest intensities being allowed to guide the alignment (CC (paper III) or RMSE47,139,145) or each peak being equally evaluated (Wallis filter140).

Page 71: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

1H-NMR + LC/MS | 71

7 1H-NMR + LC/MS

As described in chapter 4, 1H-NMR spectrometry is a well-established technique for generating metabolic profiles. Recently, liquid chromatography/mass spectrometry (LC/MS) has also been used for the same purpose.149,150 Both techniques have their advantages in this field of research. For instance, 1H-NMR spectrometry requires minor sample pretreatment, while LC/MS in general possesses a higher sensitivity. 1H-NMR spectrometry may detect compounds that are not retained on the LC column or not ionisable in the mass spectrometer. LC/MS may detect compounds that are present in a concentration below the LOD for 1H-NMR spectrometry.

The rat urine samples described in chapter 4 have been analysed both by 1H-NMR (paper III) and by LC/MS.149,151 When comparing the data from the two techniques, it is most likely that they generate complementary information. A few previous metabolic studies have analysed both LC/MS and NMR spectrometry for analysing the same samples to verify and complement results of the other technique.93,94,97 The data sets have then been studied separately. In papers V and VI, we wanted to analyse the two data sets within the same chemometric model. This was carried out by data fusion and data correlation.

Original data

These data analyses started with the data schematically represented in Figure 18. The LC/MS data consisted of a 95 variable peak list for each sample, pretreated by curve resolution, peak alignment, log10 transformation and mean centring. The 1H-NMR data were pretreated by peak alignment according to PAGA (but with the beam search optimisation algorithm), data reduction by segmentwise integration, normalisation to equal area and mean centring, resulting

Page 72: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

72 | Processing and analysis of NMR data

in 8,192 variables for the data fusion and 2,048 variables for the data correlation. The data pretreatments are further described in papers V and VI.

Figure 18: Original data.

Data fusionIn the field of chemometrics the number of possible operations and combinations of method applications to data are numerous. In paper V, the aim was to examine different possibilities of performing data fusion of the measured LC/MS and 1H-NMR data. Data fusion is defined as a method that combines information from different sources to produce a single model or decision.152 There are a few examples of data fusion in the literature153,154 showing that this is not straightforward. Our hypothesis for this work was that the classification of the samples would be improved by using data fusion, making the diagnosis (i.e. pattern recognition) of the studied phenomena more reliable.

Concatenation, full hierarchical modelling, batch modelling and outer product analysis were the techniques examined for the fusion of LC/MS and 1H-NMR data in papers V and VI. Various scalings of the variables depending on the standard deviation and/or the mean of each variable and two types of block scaling were tried, and the possibilities of reaching an enhanced classification were explored. Of course, various types of other data fusion methods, scalings and multivariate analyses are possible; however, the optimum data treatment must be tried out for each new case.

Page 73: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

1H-NMR + LC/MS | 73

Data concatenation

Data concatenation (Figure 19) is denoted as data-level (low-level) fusion according to definitions in the literature.152,155 It is straightforward, but most certainly requires appropriate block scaling156-158 to prevent one block of data from being numerically dominant. In paper V, a number of different scalings of the concatenated data were evaluated prior to mean centring and multivariate analysis (MVA) – PCA or PLS.

Figure 19: Data concatenation.

Hierarchical modelling

Hierarchical modelling157,159 was also tried in paper V. It is a way of modelling data at two levels: one lower level and one higher level. The lower-level modelling is performed on separate blocks of the data set (LC/MS and 1H-NMR in this case), while higher-level modelling is performed on the concatenated and scaled scores from the lower-level, Figure 20. Different combinations of PCA and PLS were tried. This type of data fusion has previously been termed decision-level (high-level) fusion in the literature.152,155

Figure 20: Hierarchical modelling.

Page 74: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

74 | Processing and analysis of NMR data

Batch modelling

Batch modelling85,160 is a special case of hierarchical modelling in which every batch (rat in paper V), is modelled with respect to one variable (time in paper V) in the lower-level modelling. Prior to the higher-level modelling, the scores from each rat and each technique at the lower level are concatenated and scaled (Figure 21). Different combinations of PCA and PLS were tried.

Figure 21: Batch modelling.

Outer product analysis

Outer product analysis (OPA) is another possible method of fusing two different data sets in order to determine the relations that exist between the two types of signals.161 The outer product matrix for two vectors is defined as all possible product combinations of the vector variables (Figure 22). In paper VI, this data fusion was performed samplewise, resulting in one matrix per sample.

Figure 22: Outer product analysis.

Page 75: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

1H-NMR + LC/MS | 75

Data correlation

Correlation scaling

Paper VI shows the potential of using prior knowledge, in this case identified metabolites, in the data analysis. 1H-NMR variables correlated with drug metabolites variables in LC/MS151 were identified by correlation scaling and PARAFAC analysis. The correlation scaling was performed by multiplying the NMR variables by the correlation coefficient (33), scaled to the range [0 1], between the time trajectory of each 1H-NMR variable and the LC/MS variables from the identified drug metabolites. These identified 1H-NMR signals (most likely arising from drug metabolites) were then eliminated from 1H-NMR data. The drug metabolite variables in LC/MS were also eliminated, and the two data sets were fused by OPA and again analysed by PARAFAC. The data fusion step was required to achieve satisfactory class separation. The PARAFAC analysis gave rise, of course, to new 1H-NMR variables, separating the classes of dosed and control rats. The kinetics of those variables differed from the kinetics of the drug metabolites and they were thus identified as potential endogenous markers for phospholipidosis. Some of the peaks could be identified from the literature (see paper VI). An example is shown in Figure 23.

Page 76: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

76 | Processing and analysis of NMR data

7.80 7.73 7.66 7.59 7.52 7.44-5

1 3

7 10

14

δ(1H)/ppm

day of dosage

Figure 23: A time profile of a 1H-NMR spectral segment. The peak at 7.75 ppm was selected in the correlation scaling analysis. It grows with time in the same way as a drug metabolite. The peaks at 6.64 and 7.56 ppm show a decrease (minimum at day 3) and then increase with time, these peaks being selected in the PARAFAC analysis after the elimination of data correlating with identified

drug metabolites and being identified as hippurate.

Page 77: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

1H-NMR + LC/MS | 77

PLS

PLS was also used to find correlating variables in paper VI. There the 1H-NMR data were chosen as X and selected LC/MS variables identified as drug metabolites were chosen as y. The loadings from PLS (unfolded data) and N-PLS showed the variables separating the sample classes and correlating with the chosen LC/MS variables. These results agreed with the correlation scaling and PARAFAC analysis. However, problems occurred in the next step (with eliminated metabolite variables), when the separation of 1H-NMR data was not satisfactory, and no data fusion was possible since the LC/MS data were used as y data.

DiscussionIn paper V, data fusion by full hierarchical modelling and concatenation of data generally improved the classification, especially for the unsupervised models (PCA). This indicates that we may have some complementary data between the two data sets. The batch modelling, however, was not very successful in this study.

In paper VI, too, the separation between dosed and control classes was improved by data fusion, although this time OPA was utilised for the purpose. The large-scale use of data fusion in general was discovered in this work when some dominating peaks had been removed and the data fusion was actually needed to discriminate between the different groups of samples.

It is not possible to choose the best data fusion method on the basis of these studies. The four methods presented will be successful to different extents in different cases. The scaling of data is of great importance and should be optimised by a proper validation, as exemplified in paper V.

The data correlation methodology in Paper VI shows a lot of potential in the interpretation of metabolic data. More variables could be selected in both steps of the analysis and hence give more signals for interpretation. Moreover, further analysis of the LC/MS data still remains to be done.

Page 78: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

78 | Processing and analysis of NMR data

Page 79: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

Conclusions | 79

8 Conclusions

Determination of low-level components by qNMR in organic and biological samples is very easy. The sample is simply dissolved in a deuterated solvent, perhaps buffered to reduce pH variations, put in an NMR sample tube and placed in the magnet. A lot of standard pulse programs have already been prepared, ready for use, in the modern NMR instruments.

However, to get good spectral resolution and quantitative information out of an NMR spectrum, the instrumental parameters have to be carefully chosen. Knowledge of the influence of the sample matrix, relaxation processes and the NMR instrumentation will be of great importance. The optimal parameter settings must be selected for each new type of sample. The work of this thesis includes only the initial step to enable qualified guesses to be made for the basic starting parameters in a qNMR experiment.

The evaluation of qNMR data has been the subject uppermost in my thoughts during this work. Classical data pretreatments (zero filling and line broadening) were tried, together with more unusual and novel ones. The conclusions from these studies were that the classical methods (with typical setups) had very little influence on the quantitative results (papers I–II), while the non-standard methods (convolution by a triangular function and peak alignment) significantly improved the evaluation of qNMR data (papers II–IV).

Extensions in the field of chemometrics were explored by fusion of data from the various analytical techniques 1H-NMR and LC/MS (papers V–VI). Data fusion was shown to be successful when more detailed studies in the interpretation of 1H-NMR data were performed by means of correlation with LC/MS data.

Page 80: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

80 | Processing and analysis of NMR data

Page 81: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

Future perspectives | 81

9 Future perspectives

Of course, several additional parameter setups, data pretreatments, and calibration models are possible in the field of qNMR. Experience from the work carried out on this thesis provides me with even more motivation to tackle new qNMR problems and try new methods. If I get the chance to proceed with this work, my immediate efforts would include further work on the profile/peak alignment problem and nonlinear modelling, which may handle peak shifts, incomplete nuclear relaxations and nonlinear relations between samples. Moreover, the project of data fusion has just been initiated and a lot of scaling and methods remain to be investigated.

In order to extract the maximum information out of a set of samples, the noise must be minimised and the resolution maximised. As the time devoted to experiments has taken up a very small part of the total time in the research projects presented, an increase of more transients in favour of S/N would be beneficial. More and shorter pulses would probably have given a lower LOD in papers I–II. For the metabonomic samples in papers III–VI, an improved spectral resolution, in particular, would have been desirable. This will require improved instrumental facilities such as the sample probe and improved shimming. This would facilitate and improve the final interpretation.

Metabonomic sample collection (which is really outside my subject) may also be an area for improvement. More frequently collected samples would, for example, give a more reliable estimate of the kinetics of a metabolite or biomarker (paper VI).

In the future, the metabolic profiling analysis will perhaps have a more important role to play in toxicology, enabling classification of the type of toxicity. This will probably require more extensive data

Page 82: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

82 | Processing and analysis of NMR data

analysis than the most straightforward types of analysis. In papers III–VI, the relevant goal is to find a biomarker time pattern for a specific type of toxicity, namely phospolipidosis. This task (and many others) still remains to be solved and presents a challenge. More than one analysis technique will most likely be needed to give a clear and significant diagnosis or classification. This will require appropriate methods of data fusion. Furthermore, metabolomics, including metabolic profiling, will most likely be of greater significance in the area of systems biology in the future, requiring data fusion techniques.

Page 83: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

Acknowledgements | 83

Acknowledgements

Jag vill hjärtligt tacka alla Ni som direkt eller indirekt hjälpt och stöttat mig under min tid som doktorand på Institutionen för Analytisk kemi (IAK). Det har varit roligt att gå till jobbet varje dag.

Sven, tack för din oändliga entusiasm när det gäller forskningen, det smittar av sig. Få har sån koll på vad som är rätt och hett, det har varit ett privilegium att få ha dig som handledare.

Bosse, tack för att du visat både vetenskapligt och personligt engagemang, trots att du inte har varit min huvudhandledare.

Tack, alla ni doktorander före mig, kursassistenter och inspiratörer: Åsa, Gunnar, Jonas B, Kakan, Ludde, Magnus A, Bettan, Anders Ch, Joanna, Petr, David och Eva.

Och TACK alla ni samtida doktorander: Helena I (för ett fantastiskt samarbete i Bergen, på Smyril, på Färöarna, i källaren, i Ume och i Tänndalen – high five!), Malin (för alla allvarliga och lättsamma prat och ditt smittande skratt), Magnus Å (för alla bra diskussioner, mer och mindre vetenskapliga), Stina C.B. (för trevligt rumssällskap och handledning på kurslab), Leila (tack för att du hjälper mig att behålla mina vänner – och för att du dansar så bra!), Helena H (för att du är en lyssnare och en toppen-träningskompis), Petter (tack för allt tjôt om JAVA, barn, riskokare, brödbak och locket på eller av), Stina M (för din humor och för att du är så bra på danska – va fäen Määts!), Magnus E (för att du kan vara både arg och glad), Johanna (för ditt lugn och din attityd – det är en konst att leva i nuet), Sindra (för att du hjälpte mig överleva en grundkurs som assistent), Ove (för en helt underbar kajak-tur och för att du är en rättskaffens), Yvonne (som ju tydligen var doktor hela tiden, fast bara inte på pappret), Ragnar (tack för gott samarbete, och lunchsällskap i bögparken), Thorvald (för att du är så sträng mot studenterna men så barnkär), Ralf (för alla diskussioner som inte leder nånstans, och för alla de som leder nånstans), Kent (för bugg-kursen), Nana (för allt dagis-prat), Leo (för att du fortsätter att jobba för jämställdheten på IAK – och aligna NMR data), Lina (tack för din oblyghet och glada humör), Christoffer (den perfekta studenten), Axel (för det tyska popen), Helena B, Bodil, Cristina S and Sharam.

Page 84: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

84 | Processing and analysis of NMR data

Tack alla ni andra som gör/gjorde det så trevligt på IAK: Jonas R (för att du så väl har balanserat upp mitt ”bråttom” med ditt lugn, du är ovärderlig), Anders Co (för att du är så mån om att stämningen skall vara god på IAK), Anita (för trevligt samarbete på kurs), AnneMarie (för allt du alltid fixar), Carlo (för vin och kulinariska upplevelser), Håkan (för äventyrliga hemresor) Pol (for your whistling), Conny (för att du så vänligt lånar ut din scanner), Ulrika, Lena, Björn, Roger, Ulla, Ramon, Eva, Rolf och Boven.

Och tack Hillevi på kemibiblioteket för din gränslösa service!

Stort tack Ni på AstraZeneca som har hjälpt mig med allt från teoretiska problem till att hitta min dator och mina tofflor: Fredrik, Ina, Bengt, Johan, Ingrid, Joanna med flera. Tack särskilt till Susanne, Anders och Alex; er NMR-hjälp har varit ovärderlig!

Ett extra tack till Er som har korrekturläst och kommit med värdefulla åsikter om manus: Magnus, Susanne, Ralf, Bosse, Sven, Jonas R.

Jag vill också tacka för värdefull inspiration från Ume: Henrik, Johan, Suss, Mattias, Pera m.fl.

Tack alla ni sköna vänner som inte har med kemi att göra: Jenny, Micke, Camilla, Mathias, Gunilla, Alvin, Eva, Anna, Robert, Karin (tackas extra för snowboard och surfing), Danne, Jenny, Sandis, Christina, Magnus, Magda, Ludde, Annika, Peter, Martin, Marie, Ola, Klara.

Jag vll också tacka Ylva, Kjell, Folke (d.ä.), Monica, Tim och Frida för släktkalas och visat intresse.

Tack också alla trivsamma grannar, uthålliga barnvakter, trevliga dagisföräldrar och proffsig dagis- och skolpersonal.

Och min närmsta familj: Farmor och Morfar, tack för att ni finns/fanns. Mamma och Pappa, tack för ert outtröttliga och kravlösa stöd, jag är er evigt tacksam för det. Sara och Carl med familjer, tack för att ni gör något helt annat än jag och på så sätt ger mig perspektiv.

Men... ♫♪ Vad vore livet utan Östen, vad vore livet utan Folke, vad vore livet utan Mina – Ingenting!! – Ingenting!! – Ingenting!! ♪♫ Östen, tack för att du är som du är, jag älskar dig. Folke och Mina, mina kära solstrålar! Det är nog i alla fall ni som har hjälpt mig mest i detta arbetet, genom att ni ger mig den viktiga distansen till allt det här.

Page 85: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

References | 85

References

1. Szantay Jr, C. and Demeter, A., NMR spectroscopy, Progress in Pharmaceutical and Biomedical Analysis 4 (2000) 109-145.

2. Al Deen, T.S., Hibbert, D.B., Hook, J.M. and Wells, R.J., An uncertainty budget for the determination of the purity of glyphosate by quantitative nuclear magnetic resonance (QNMR) spectroscopy, Accreditation and Quality Assurance 9 (2004) 55-63.

3. Bauer, M., Reproducibility of 1H-NMR integrals: a collaborative study, Journal of Pharmaceutical and Biomedical Analysis 17 (1998) 419-425.

4. Derome, A.E., Modern NMR Techniques for Chemistry Research, Series ed: Baldwin, J.E. and Magnus, P.D. Vol. 6. Pergamon, Oxford, UK (1987)

5. Grivet, J.-P., Accuracy and precision of intensity determinations in quantitative NMR, Data Handling in Science and Technology (1996) 306-329.

6. Malz, F. and Jancke, H., Validation of quantitative NMR, Journal of Pharmaceutical and Biomedical Analysis 38 (2005) 813.

7. Maniara, G., Rajamoorthi, K., Rajan, S. and Stockton, G.W., Method performance and validation for quantitative analysis by 1H and 31P NMR spectroscopy. Applications to analytical standards and agricultural chemicals, Analytical Chemistry 70 (1998) 4921-4928.

8. Pieters, L.A.C. and Vlietinck, A.J., Applications of quantitative 1H- and 13C-NMR spectroscopy in drug analysis, Journal of Pharmaceutical and Biomedical Analysis 7 (1989) 1405-1417.

9. Rabenstein, D.L. and Keire, D.A., Quantitative chemical analysis by NMR, Practical Spectroscopy 11 (1991) 323-369.

10. Bain, A.D. and Lao, L., NMR as a quantitative analytical tool in chemical applications of isotopes. in: Isotopes in the Physical and Biomedical Science, Ed: Buncel, E. and R., J.J., Elsevier Science Publishers B. V., Amsterdam, (1991) pp. 411-429.

11. Becker, E.D., Ferretti, J.A. and Gambhir, P.N., Selection of optimum parameters for pulse Fourier transform nuclear magnetic resonance, Analytical Chemistry 51 (1979) 1413-20.

Page 86: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

86 | Processing and analysis of NMR data

12. Ernst, R.R., Bodenhausen, G. and Wokaun, A., Principles of nuclear magnetic resonance in one and two dimensions. Clarendon, Oxford (1987)

13. Field, L.D. and Sternhell, S., Analytical NMR. John Wiley & sons, Chichester (1989)

14. Fukushima, E. and Roeder, S.B.W., Experimental Pulse NMR: A Nuts and Bolts Approach. Addison-Wesley Pub. Comp., Massachusetts (1981)

15. King, R.W. and Williams, K.R., The Fourier transform in chemistry. Part 1. Nuclear magnetic resonance: introduction, Journal of Chemical Education 66 (1989) A213-A219.

16. King, R.W. and Williams, K.R., The Fourier transform in chemistry. Part 2. Nuclear magnetic resonance: the single pulse experiment, Journal of Chemical Education 66 (1989) A243-A248.

17. Skoog, D.A., Holler, F.J. and Nieman, T.A., Principles of Instrumental Analysis. 5:th ed. Saunders College Publishing, Florida (1998)

18. Bloch, F., Hansen, W. and Packard, M.E., Nuclear induction, Physical Review 69 (1946) 127.

19. Purcell, E.M., Torrey, H.C. and Pound, R.V., Resonance absorption by nuclear magnetic moments in a solid, Physical Review 69 (1946) 37-38.

20. Gueron, M., Plateau, P. and Decorps, M., Solvent Signal Suppression in NMR, Progress in Nuclear Magnetic Resonance Spectroscopy 23 (1991) 135-209.

21. Braun, S., Kalinowski, H.-O. and Berger, S., 150 and more basic NMR experiments: a practical course. 2nd ed. Wiley-VCH, Weinheim (1998)

22. Delsuc, M.A. and Lallemand, J.Y., Improvement of dynamic range in nmr by oversampling, Journal of Magnetic Resonance 69 (1986) 504-507.

23. Cross, K.J., Digital filtering, Data Handling in Science and Technology 18 (1996) 120-145.

24. Glasser, L., Fourier transforms for chemists. Part II. Fourier transforms in chemistry and spectroscopy, Journal of Chemical Education 64 (1987) A260-A266.

25. Griffiths, L. and Irving, A.M., Assay by nuclear magnetic resonance spectroscopy: quantification limits, Analyst 123 (1998) 1061-1068.

26. Ernst, R.R., Sensitivity Enhancement in Magnetic Resonance. in: Advances in Magnetic Resonance, Ed: Waugh, J.S., Academic Press Inc., New York, (1966) pp. 1-135.

27. Lindon, J.C. and Ferrige, A.G., digitisation and data processing in Fourier rtransform NMR, Progress in NMR spectroscopy 14 (1980) 27-66.

28. Hodginsson, P. and Hore, P.J., Sampling and the quantification of NMR data. in: Advances in magnetic and optical resonance, Academic press, (1997).

29. Martin, Y.L., A Global Approach to Accurate and Automatic Quantitative Analysis of NMR Spectra by Complex Least-Squares Curve Fitting, Journal of Magnetic Resonance. Series A 111 (1994) 1-10.

30. Gesmar, H., Led, J.J. and Abildgaard, F., Improved Methods For Quantitative Spectral Analysis Of NMR Data, Progress in Nuclear Magnetic Resonance Spectroscopy 22 (1990) 255-288.

Page 87: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

References | 87

31. Bretthorst, G.L., Bayesian-Analysis. I. Parameter Estimation Using Quadrature NNR Models, Journal of Magnetic Resonance 88 (1990) 533-551.

32. Kotyk, J.J., Hoffman, N.G., Hutton, W.C., Bretthorst, G.L. and Ackerman, J.J.H., Comparison Of Fourier And Bayesian-Analysis Of Nmr Signals. II. Examination Of Truncated Free Induction Decay Nmr Data, Journal of Magnetic Resonance. Series A 116 (1995) 1-9.

33. Sibisi, S., Skilling, J., Brereton, R.G., Laue, E.D. and Staunton, J., Maximum entropy signal processing in practical NMR spectroscopy, Nature 311 (1984) 446-447.

34. Evilia, R.F., Quantitative NMR spectroscopy, Analytical Letters 34 (2001) 2227-2236.

35. Pauli, G.F., Jaki, B.U. and Lankin, D.C., Quantitative 1H NMR: Development and potential of a method for natural products analysis, Journal of Natural Products 68 (2005) 133-149.

36. Muller, N. and Goldenson, J., Rapid analysis of reaction mixtures by nuclear magnetic resonance spectroscopy, Journal of the American Chemical Society 78 (1956) 5182-5183.

37. Hollis, D.P., Quantitative analysis of aspirin, phenacetin, and caffeine mixtures by Nuclear Magnetic Resonance Spectrometry, Analytical Chemistry 35 (1963) 1682-1684.

38. Jungnickel, J.L. and Forbes, J.W., Quantitative Measurement of Hydrogen Types by Intregated Nuclear Magnetic Resonance Intensities, Analytical Chemistry 35 (1963) 938-942.

39. Rackham, D.M., Recent applications of quantitative nuclear magnetic resonance spectroscopy in pharmaceutical research, Talanta 23 (1976) 269-274.

40. Pauli, G.F., qNMR - A versatile concept for the validation of natural product reference compounds, Phytochemical Analysis 12 (2001) 28-42.

41. Szantay Jr, C., High-field NMR spectroscopy as an analytical tool for quantitative determinations: pitfalls, limitations and possible solutions, TrAC Trends in Analytical Chemistry 11 (1992) 332-344.

42. Nicholson, J.K. and Wilson, I.D., High resolution proton magnetic resonance spectroscopy of biological fluids, Progress in NMR spectroscopy 21 (1989) 449-501.

43. Ichikawa, M., Nonaka, N., Amano, H., Takada, I., Ishimori, S., Andoh, H. and Kumamoto, K., Proton NMR Analysis of Octane Number for Motor Gasoline: Part V, Applied Spectroscopy 46 (1992) 1548-1551.

44. Sarpal, A.S., Kapur, G.S., Mukherjee, S. and Jain, S.K., Estimation of oxygenates in gasoline by 13C NMR spectroscopy, Energy & Fuels 11 (1997) 662-667.

45. Yang, C.M., Grey, A.A., Archer, M.C. and Bruce, W.R., Rapid Quantitation of Thermal Oxidation Products in Fats and Oils by 1H-NMR Spectroscopy, Nutrition and Cancer 30 (1998) 64-68.

46. Lachenmeier, D.W., Frank, W., Humpfer, E., Schafer, H., Keller, S., Mörtter, M. and Spraul, M., Quality control of beer using high-resolution nuclear magnetic resonance spectroscopy and multivariate analysis, European Food Research And Technology 220 (2005) 215-221.

Page 88: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

88 | Processing and analysis of NMR data

47. Vogels, J.T.W.E., Tas, A.C., Venekamp, J. and Van der Greef, J., Partial Linear Fit: a new NMR spectroscopy preprocessing tool for pattern recognition applications, Journal of Chemometrics 10 (1996) 425-438.

48. Holzgrabe, U., Diehl, B.W.K. and Wawer, I., NMR spectroscopy in pharmacy, Journal of Pharmaceutical and Biomedical Analysis 17 (1998) 557-616.

49. Lacroix, P.M., Dawson, B.A., Sears, R.W., Black, D.B., Cyr, T.D. and Ethier, J.C., Fenofibrate raw materials: HPLC methods for assay and purity and an NMR method for purity, Journal of Pharmaceutical and Biomedical Analysis 18 (1998) 383-402.

50. Lankhorst, P.P., Poot, M.M. and deLange, M.P.A., Quantitative determination of lovastatin and dihydrolovastatin by means of 1H NMR spectroscopy, Pharmacopeial Forum 22 (1996) 2414-2422.

51. Wells, R.J., Hook, J.M., Al-Deen, T.S. and Hibbert, D.B., Quantitative nuclear magnetic resonance (QNMR) spectroscopy for assessing the purity of technical grade agrochemicals: 2,4-dichlorophenoxyacetic acid (2,4-D) and sodium 2,2-dichloropropionate (dalapon sodium), Journal of Agricultural and Food Chemistry 50 (2002) 3366-3374.

52. Berregi, I., Santos, J.I., del Campo, G. and Miranda, J.I., Quantitative determination of (-)-epicatechin in cider apple juices by 1H NMR, Talanta 61 (2003) 139-145.

53. Fernández-Ramírez, A., Salazar-Cavazos, M.D.L., Rivas-Galindo, V., Ceniceros-Almaguer, L. and Waksman de Torres, N., Determination of isoperoxisomicine A1 content in peroxisomicine A1 batches by means of 1H NMR, Analytical Letters 37 (2004) 2433-2444.

54. Ruiz-Calero, V., Saurina, J., Galceran, M.T., Hernández-Cassou, S. and Puignou, L., Estimation of the composition of heparin mixtures from various origins using proton nuclear magnetic resonance and multivariate calibration methods, Analytical and Bioanalytical Chemistry 373 (2002) 259-265.

55. Ruiz-Calero, V., Saurina, J., Galceran, M.T., Hernández-Cassou, S. and Puignou, L., Potentiality of proton nuclear magnetic resonance and multivariate calibration methods for the determination of dermatan sulfate contamination in heparin samples, Analyst 125 (2000) 933-938.

56. Keun, H.C., Beckonert, O., Griffin, J.L., Richter, C., Moskau, D., Lindon, J.C. and Nicholson, J.K., Cryogenic probe 13C NMR spectroscopy of urine for metabonomic studies, Analytical Chemistry 74 (2002) 4588-4593.

57. Griffin, J.L., Nicholls, A.W., Keun, H.C., Mortishire-Smith, R.J., Nicholson, J.K. and Kuehn, T., Metabolic profiling of rodent biological fluids via 1H NMR spectroscopy using a 1 mm microlitre probe, Analyst 127 (2002) 582-584.

58. Hoult, D.I. and Richards, R.E., The signal-to-noise ratio of the Nuclear Magnetic Resonance Experiment, Journal of Magnetic Resonance 24 (1976) 71-85.

59. Lindon, J.C., Nicholson, J.K. and Wilson, I.D., Direct coupling of chromatographic separations to NMR spectroscopy, Progress in Nuclear Magnetic Resonance Spectroscopy 29 (1996) 1-49.

60. Hoult, D.I., Sensitivity optimization [in NMR], Topics in Carbon 13 NMR Spectroscopy 3 (1979) 16-28.

Page 89: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

References | 89

61. Ernst, R.R. and Anderson, W.A., Application of Fourier transform spectroscopy to magnetic resonance, The Review of Scientific Instruments 37 (1966) 93-102.

62. Traficante, D.D., Optimum tip angle and relaxation delay for quantitative analysis, Concepts in Magnetic Resonance 4 (1992) 153-160.

63. Rabenstein, D.L., Sensitivity enhancement by signal averaging in pulsed/Fourier transform NMR spectroscopy, Journal of Chemical Education 61 (1984) 909-913.

64. Traficante, D.D. and Steward, L.R., The relationship between sensitivity and integral accuracy. Further comments on optimum tip angle and relaxation delay for quantitative analysis, Concepts in Magnetic Resonance 6 (1994) 131-135.

65. Roy, J., Pharmaceutical Impurities -A Mini-Review, AAPS PharmSciTech 3 (2002) 1-8.

66. Fux, P., Determination of trace amounts of polydimethylsiloxane in extracts of chemicals by proton nuclear magnetic-resonance spectroscopy, Analyst 115 (1990) 179-183.

67. Kestner, T.A., Newmark, R.A. and Bardeau, C., Quantitative analysis of polyethylene oxide in polyethylene by 1H-NMR spectroscopy, Polymer Preprints (American Chemical Society, Division of Polymer Chemistry) 37 (1996) 232-233.

68. Lindgren, B. and Martin, J.R., Nuclear Magnetic Resonance (NMR) as a tool for quantitative determination, Pharmaeuropa 5 (1993) 51-54.

69. Simeral, L.S., Improved Analysis of Minor Species in Mixtures Using Carbon-13 Decoupling in One-Dimensional Proton NMR, Applied Spectroscopy 49 (1995) 400-402.

70. Soininen, P., Haarala, J., Vepsalainen, J., Niemitz, M. and Laatikainen, R., Strategies for organic impurity quantification by 1H NMR spectroscopy: Constrained total-line-shape fitting, Analytica Chimica Acta 542 (2005) 178-185.

71. Thakur, K.A.M., Kean, R.T., Hall, E.S., Doscotch, M.A. and Munson, E.J., A quantitative method for determination of lactide composition in poly(lactide) using 1H NMR, Analytical Chemistry 69 (1997) 4303-4309.

72. Vogels, J.T.W.E. and Venekamp, J.C., Profiling impurities in Chlomethiazole Edisilate using proton NMR spectroscopy and multivariate calibration techniques, TNO Report V97.659, TNO Nutrition and Food Research Institute, Zeist, (1997)

73. Wyszecka-Kaszuba, E., Warowna-Grzeskiewicz, M. and Fijalek, Z., Determination of 4-aminophenol impurities in multicomponent analgesic preparations by HPLC with amperometric detection, Journal of Pharmaceutical and Biomedical Analysis 32 (2003) 1081-1086.

74. Lundsted, L.G. and Ile, G., Polyoxyalkylene compounds, United States Patent, no: 2 674 619 (1954)

75. Scherlund, M., Malmsten, M., Holmqvist, P. and Brodin, A., Thermosetting microemulsions and mixed micellar solutions as drug delivery systems for periodontal anesthesia, International Journal of Pharmaceutics 194 (2000) 103-116.

76. Berg, M., Djurle, A., Melin, M. and Nyman, P., Process for the purification of aldehyde impurities, United States Patent, no: 6448371 B1 (2002)

77. Bales, J.R., Higham, D.P., Howe, I., Nicholson, J.K. and Sadler, P.J., Use of high-resolution proton nuclear magnetic resonance spectroscopy for rapid multi-component analysis of urine, Clinical Chemistry 30 (1984) 426-432.

Page 90: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

90 | Processing and analysis of NMR data

78. Bock, J.L., Analysis of Serum by High-Field Proton Nuclear Magnetic Resonance, Clinical Chemistry 28 (1982) 1873-1877.

79. Nicholson, J.K., Buckingham, M.J. and Sadler, P.J., High resolution 1H NMR studies of vertebrate blood and plasma, Biochemical Journal 211 (1983) 605-615.

80. Goodacre, R., Vaidyanathan, S., Dunn, W.B., Harrigan, G.G. and Kell, D.B., Metabolomics by numbers: acquiring and understanding global metabolite data, Trends In Biotechnology 22 (2004) 245-252.

81. Lindon, J.C., Nicholson, J.K., Holmes, E. and Everett, J.R., Metabonomics: Metabolic processes studied by NMR spectroscopy of biofluids, Concepts in Magnetic Resonance 12 (2000) 289-320.

82. Nicholson, J.K., Lindon, J.C. and Holmes, E., “Metabonomics”: understanding the metabolic responses of living systems to pathophysiological stimuli via multivariate statistical analysis of biological NMR spectroscopic data, Xenobiotica 29 (1999) 1181-1189.

83. Fiehn, O., Combining genomics, metabolome analysis, and biochemical modelling to understand metabolic networks, Comparative and Functional Genomics 2 (2001) 155-168.

84. Keun, H.C., Metabonomic modeling of drug toxicity, Pharmacology & Therapeutics In Press (2005) Available online 26 July 2005.

85. Antti, H., Bollard, M.E., Ebbels, T., Keun, H., Lindon, J.C., Nicholson, J.K. and Holmes, E., Batch statistical processing of 1H NMR-derived urinary spectral data, Journal of Chemometrics 16 (2002) 461-468.

86. Beckwith-Hall, B.M., Brindle, J.T., Barton, R.H., Coen, M., Holmes, E., Nicholson, J.K. and Antti, H., Application of orthogonal signal correction to minimise the effects of physical and biological variation in high resolution 1H NMR spectra of biofluids, Analyst 127 (2002) 1283-1288.

87. Brekke, T., Kvalheim, O.M. and Sletten, E., Prediction of Physical Properties of Hydrocarbon Mixtures by Partial-Least-Squares Calibration of Carbon-13 Nuclear Magnetic Resonance Data, Analytica Chimica Acta 223 (1989) 123-134.

88. Cloarec, O., Dumas, M.E., Trygg, J., Craig, A., Barton, R.H., Lindon, J.C., Nicholson, J.K. and Holmes, E., Evaluation of the orthogonal projection on latent structure model limitations caused by chemical shift variability and improved visualization of biomarker changes in 1H NMR spectroscopic metabonomic studies, Analytical Chemistry 77 (2005) 517-526.

89. Defernez, M. and Colquhoun, I.J., Factors affecting the robustness of metabolite fingerprinting using 1H NMR spectra, Phytochemistry 62 (2003) 1009-1017.

90. Holmes, E. and Antti, H., Chemometric contributions to the evolution of metabonomics: mathematical solutions to characterising and interpreting complex biological NMR spectra, Analyst 127 (2002) 1549-1557.

91. Holmes, E., Foxall, P.J.D., Nicholson, J.K., Neild, G.H., Brown, S.M., Beddell, C.R., Sweatman, B.C., Rahr, E., Lindon, J.C., Spraul, M., and Neidig, P., Automatic Data Reduction and Pattern-Recognition Methods for Analysis of 1H Nuclear Magnetic Resonance Spectra of Human Urine from Normal And Pathological States, Analytical Biochemistry 220 (1994) 284-296.

Page 91: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

References | 91

92. Holmes, E., Nicholson, J.K., Nicholls, A.W., Lindon, J.C., Connor, S.C., Polley, S. and Connelly, J., The identification of novel biomarkers of renal toxicity using automatic data reduction techniques and PCA of proton NMR spectra of urine, Chemometrics and Intelligent Laboratory Systems 44 (1998) 245-255.

93. Lenz, E.M., Bright, J., Knight, R., Wilson, I.D. and Major, H., Cyclosporin A-induced changes in endogenous metabolites in rat urine: A metabonomic investigation using high field 1H NMR spectroscopy, HPLC-TOF/MS and chemometrics, Journal of Pharmaceutical and Biomedical Analysis 35 (2004) 599-608.

94. Lenz, E.M., Bright, J., Knight, R., Wilson, I.D. and Major, H., A metabonomic investigation of the biochemical effects of mercuric chloride in the rat using 1H NMR and HPLC-TOF/MS: time dependant changes in the urinary profile of endogenous metabolites as a result of nephrotoxicity, Analyst 129 (2004) 535-541.

95. Potts, B.C.M., Deese, A.J., Stevens, G.J., Reily, M.D., Robertson, D.G. and Theiss, J., NMR of biofluids and pattern recognition: assessing the impact of NMR parameters on the principal component analysis of urine from rat and mouse, Journal of Pharmaceutical and Biomedical Analysis 26 (2001) 463-476.

96. Spraul, M., Neidig, P., Klauck, U., Kessler, P., Holmes, E., Nicholson, J.K., Sweatman, B.C., Salman, S.R., Farrant, R.D., Rahr, E., Beddell, C.R., and Lindon, J.C., Automatic Reduction of NMR Spectroscopic Data for Statistical and Pattern Recognition Classification of Samples, Journal of Pharmaceutical and Biomedical Analysis 12 (1994) 1215-1225.

97. Williams, R.E., Lenz, E.M., Evans, J.A., Wilson, I.D., Granger, J.H., Plumb, R.S. and Stumpf, C.L., A combined 1H NMR and HPLC-MS-based metabonomic study of urine from obese (fa/fa) Zucker and normal Wistar-derived rats, Journal of Pharmaceutical and Biomedical Analysis 38 (2005) 465-471.

98. MATLAB 6.5. The MathWorks Inc., Natick, MA (2002)99. Wold, S., Chemometrics; what do we mean with it, and what do we want from it?

Chemometrics and Intelligent Laboratory Systems 30 (1995) 109-115.100. Brereton, R.G., Chemometrics: Data Analysis for the Laboratory and Chemical Plant.

John Wiley & Sons, West Sussex, UK. (2003)101. Eriksson, L., Johansson, E., Kettaneh-Wold, N. and Wold, S., Multi- and

Megavariate Data Analysis, Principles and applications. Umetrics Academy, Umeå (2001)

102. Martens, H. and Naes, T., Multivariate Calibration. Wiley, Chichester, Sussex, UK (1991)

103. Jackson, J.E., A user’s guide to principal components. Wiley, New York (1991)104. Wold, S., Esbensen, K. and Geladi, P., Principal Component Analysis, Chemometrics

and Intelligent Laboratory Systems 2 (1987) 37-52.105. Pearson, K., On lines and planes of closest fit to systems of points in space, Philosophical

Magazine and Journal B 2 (1901) 559-572.106. Keun, H.C., Ebbels, T.M.D., Antti, H., Bollard, M.E., Beckonert, O.,

Schlotterbeck, G., Senn, H., Niederhauser, U., Holmes, E., Lindon, J.C., and Nicholson, J.K., Analytical reproducibility in 1H NMR-based metabonomic urinalysis, Chemical Research in Toxicology 15 (2002) 1380-1386.

Page 92: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

92 | Processing and analysis of NMR data

107. Bro, R., PARAFAC. Tutorial and applications, Chemometrics and Intelligent Laboratory Systems 38 (1997) 149-171.

108. Kiers, H.A.L. and Van Mechelen, I., Three-Way Component Analysis: Principles and Illustrative Application, Physiological methods 6 (2001) 84-110.

109. Wold, H., Estimation of principal components and related models by iterative least squares. in: Multivariate analysis, Ed: Krishnaiah, P.R., Academic Press, New York, (1966) pp. 391-420.

110. Wold, H., Soft Modelling by Latent Variables: The Non-Linear Iterative Partial Least Squares (NIPALS) Approach. in: Perspectives in probability and statistics, Ed: Gani, J., Jerusalem Academic Press, Jerusalem, (1975) pp. 117-142.

111. Wold, S., Martens, H. and Wold, H. The multivariate calibration problem in chemistry solved by the PLS method. in: Matrix Pencils (Lecture notes in mathematics). 1983. Pite Havsbad, Springer Verlag, Heidelberg.

112. Barker, M. and Rayens, W., Partial least squares for discrimination, Journal of Chemometrics 17 (2003) 166-173.

113. Geladi, P. and Kowalski, B.R., Partial least-squares regression: a tutorial, Analytica Chimica Acta 185 (1986) 1-17.

114. Wold, S., Sjöstrom, M. and Eriksson, L., PLS-regression: a basic tool of chemometrics, Chemometrics and Intelligent Laboratory Systems 58 (2001) 109-130.

115. Bro, R., Multiway calibration. Multilinear PLS, Journal of Chemometrics 10 (1996) 47-61.

116. Wold, S., Kettaneh Wold, N. and Skagerberg, B., Nonlinear PLS Modeling, Chemometrics and Intelligent Laboratory Systems 7 (1989) 53-65.

117. Thomsen, J.U. and Meyer, B., Pattern Recognition of the 1H NMR Spectra of Sugar Alditols Using a Neural Network, Journal of Magnetic Resonance 84 (1989) 212-217.

118. Amendolia, S.R., Doppiu, A., Ganadu, M.L. and Lubinu, G., Classification and quantitation of 1H NMR spectra of alditols binary mixtures using artificial neural networks, Analytical Chemistry 70 (1998) 1249-1254.

119. Zupan, J. and Gasteiger, J., Neural Networks for Chemists. VCH Verlagsgesellschaft, Weinheim (1993)

120. Geladi, P., MacDougall, D. and Martens, H., Linearization and Scatter-Correction for Near-Infrared Reflectance Spectra of Meat, Applied Spectroscopy 39 (1985) 491-500.

121. Wold, S., Antti, H., Lindgren, F. and Öhman, J., Orthogonal signal correction of near-infrared spectra, Chemometrics and Intelligent Laboratory Systems 44 (1998) 175-185.

122. Stone, M., Cross-Validatory Choise and Assessment of Statistical Predictions, Journal of the Royal Statistical Society 36 (1974) 111-147.

123. Pierce, K.M., Hope, J.L., Johnson, K.J., Wright, B.W. and Synovec, R.E., Classification of gasoline data obtained by gas chromatography using a piecewise alignment algorithm combined with feature selection and principal component analysis, Journal of Chromatography A In Press (2005) Available online 17 May.

Page 93: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

References | 93

124. Åberg, M., Thesis: Variance reduction in analytical chemistry, Analytical Chemistry, Stockholm University (2004)

125. Mahalanobis, P.C., On tests and measures of group divergence. Part I. Theoretical formulae., Journal and Proceedings of the Asian Society of Bengal 26 (1930) 541-588.

126. Jaumot, J., Vives, M., Gargallo, R. and Tauler, R., Multivariate resolution of NMR labile signals by means of hard- and soft-modelling methods, Analytica Chimica Acta 490 (2003) 253-264.

127. Rabenstein, D.L. and Isab, A.A., Determination of the intracellular pH of intact erythrocytes by 1H NMR spectroscopy, Analytical Biochemistry 121 (1982) 423-432.

128. Bollard, M.E., Stanley, E.G., Lindon, J.C., Nicholson, J.K. and Holmes, E., NMR-based metabonomic approaches for evaluating physiological influences on biofluid composition, NMR in Biomedicine 18 (2005) 143-162.

129. Brown, T.R. and Stoyanova, R., NMR spectral quantitation by principal-component analysis. II. Determination of frequency and phase shifts, Journal of Magnetic Resonance. Series B 112 (1996) 32-43.

130. Stoyanova, R., Nicholls, A.W., Nicholson, J.K., Lindon, J.C. and Brown, T.R., Automatic alignment of individual peaks in large high-resolution spectral data sets, Jounal of Magnetic Resonance 170 (2004) 329-335.

131. Trygg, J. and Wold, S., Orthogonal projections to latent structures (O-PLS), Journal of Chemometrics 16 (2002) 119-128.

132. Witjes, H., Melssen, W.J., Zandt, H., van der Graaf, M., Heerschap, A. and Buydens, L.M.C., Automatic correction for phase shifts, frequency shifts, and lineshape distortions across a series of single resonance lines in large spectral data sets, Journal of Magnetic Resonance 144 (2000) 35-44.

133. Lee, G.-C. and Woodruff, D.L., Beam search for peak alignment of NMR signals, Analytica Chimica Acta 513 (2004) 413-416.

134. Torgrip, R.J.O., Åberg, M., Karlberg, B. and Jacobsson, S.P., Peak alignment using reduced set mapping, Journal of Chemometrics 17 (2003) 573-582.

135. Kassidas, A., MacGregor, J.F. and Taylor, P.A., Synchronization of batch trajectories using dynamic time warping, Aiche Journal 44 (1998) 864-875.

136. Wang, C.P. and Isenhour, T.L., Time-Warping Algorithm Applied to Chromatographic Peak Matching Gas Chromatography/Fourier Transform Infrared Mass Spectrometry, Analytical Chemistry 59 (1987) 649-654.

137. Vintsyuk, T.K., Element-wise recognition of continuous speech composed of words from a specified dictionary, Kibernetika 2 (1971) 133-143.

138. Johnson, K.J., Wright, B.W., Jarman, K.H. and Synovec, R.E., High-speed peak matching algorithm for retention time alignment of gas chromatographic data for chemometric analysis, Journal of Chromatography A 996 (2003) 141-155.

139. Eilers, P.H.C., Parametric Time Warping, Analytical Chemistry 76 (2004) 404-411.

140. Nielsen, N.P.V., Carstensen, J.M. and Smedsgaard, J., Aligning of single and multiple wavelength chromatographic profiles for chemometric data analysis using correlation optimised warping, Journal of Chromatography A 805 (1998) 17-35.

Page 94: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

94 | Processing and analysis of NMR data

141. Nielsen, N.P.V., Smedsgaard, J. and Frisvad, J.C., Full second-order chromatographic/spectrometric data matrices for automated sample identification and component analysis by non-data-reducing image analysis, Analytical Chemistry 71 (1999) 727-735.

142. Pravdova, V., Walczak, B. and Massart, D.L., A comparison of two algorithms for warping of analytical signals, Analytica Chimica Acta 456 (2002) 77-92.

143. Andersson, F.O., Kaiser, R. and Jacobsson, S.P., Data preprocessing by wavelets and genetic algorithms for enhanced multivariate analysis of LC peptide mapping, Journal of Pharmaceutical and Biomedical Analysis 34 (2004) 531-541.

144. Walczak, B. and Wu, W., Fuzzy warping of chromatograms, Chemometrics and Intelligent Laboratory Systems 77 (2005) 173-180.

145. Westad, F. and Martens, H., Shift and intensity modeling in spectroscopy - general concept and applications, Chemometrics and Intelligent Laboratory Systems 45 (1999) 361-370.

146. Åberg, K.M., Torgrip, R.J.O. and Jacobsson, S.P., Extensions to peak alignment using reduced set mapping: classification of LC/UV data from peptide mapping, Journal of Chemometrics 18 (2004) 465-473.

147. Wehrens, R. and Buydens, L.M.C., Evolutionary optimisation: a tutorial, Trends in Analytical Chemistry 17 (1998) 193-203.

148. Bisiani, R., Search, Beam. in: Encyclopedia of Artificial Intelligence, Ed: Shapiro, S.C., Wiley, New York, (1992) pp. 1467-1468.

149. Idborg-Björkman, H., Edlund, P.-O., Kvalheim, O.M., Schuppe-Koistinen, I. and Jacobsson, S.P., Screening of Biomarkers in Rat Urine Using LC/Electrospray Ionization-MS and Two-Way Data Analysis, Analytical Chemistry 75 (2003) 4784-4792.

150. Plumb, R.S., Stumpf, C.L., Gorenstein, M.V., Castro-Perez, J.M., Dear, G.J., Anthony, M., Sweatman, B.C., Connor, S.C. and Haselden, J.N., Metabonomics: the use of electrospray mass spectrometry coupled to reversed-phase liquid chromatography shows potential for the screening of rat urine in drug development., Rapid Communications in Mass Spectrometry 16 (2002) 1991-1996.

151. Idborg, H., Edlund, P.-O. and Jacobsson, S.P., Multivariate approaches for efficient detection of potential metabolites from liquid chromatography/mass spectrometry data, Rapid Communications in Mass Spectrometry 18 (2004) 944-954.

152. Liu, Y. and Brown, S.D., Wavelet multiscale regression from the perspective of data fusion: new conceptual approaches, Analytical and Bioanalytical Chemistry 380 (2004) 445-452.

153. Roussel, S., Bellon-Maurel, V., Roger, J.M. and Grenier, P., Fusion of aroma, FT-IR and UV sensor data based on the Bayesian inference. Application to the discrimination of white grape varieties, Chemometrics and Intelligent Laboratory Systems 65 (2003) 209-219.

154. Steinmetz, V., Sevila, F. and Bellon-Maurel, V., A methodology for sensor fusion design: Application to fruit quality assessment, Journal of Agricultural Engineering Research 74 (1999) 21-31.

155. Hall, D.L., Mathematical techniques in Multisensor Data Fusion. Artech House, London (1992)

Page 95: Processing and analysis of NMR data - DiVA portalsu.diva-portal.org/smash/get/diva2:197688/FULLTEXT01.pdfProcessing and analysis of NMR data Impurity determination and metabolic profiling

References | 95

156. Eriksson, L., Johansson, E., Lindgren, F., Sjöström, M. and Wold, S., Megavariate analysis of hierarchical QSAR data, Journal of Computer-Aided Molecular Design 16 (2002) 711-726.

157. Wold, S., Kettaneh, N. and Tjessem, K., Hierarchical multiblock PLS and PC models for easier model interpretation and as an alternative to variable selection, Journal of Chemometrics 10 (1996) 463-482.

158. Workman Jr, J.J., Quantification of LDPE [low density poly(ethylene)], LLDPE [linear low density poly(ethylene)], and HDPE [high density poly(ethylene)] in polymer film mixtures “as received” using multivariate modeling with data augmentation (data fusion) and infrared, Raman, and near-infrared spectroscopy, Spectroscopy Letters 32 (1999) 1057-1071.

159. Westerhuis, J.A., Kourti, T. and MacGregor, J.F., Analysis of multiblock and hierarchical PCA and PLS models, Journal of Chemometrics 12 (1998) 301-321.

160. Rännar, S., MacGregor, J.F. and Wold, S., Adaptive batch monitoring using hierarchical PCA, Chemometrics and Intelligent Laboratory Systems 41 (1998) 73-81.

161. Rutledge, D.N., Barros, A.S. and Giangiacomo, R., Interpreting near infrared spectra of solutions by outer product analysis with time domain - NMR, Magnetic Resonance in Food Science: a view to the future. - (Special Publication - Royal Society of Chemistry) 262 (2001) 179-192.