

Volume 55, Number 2, 2001, APPLIED SPECTROSCOPY 163. 0003-7028/01/5502-0163$2.00/0. © 2001 Society for Applied Spectroscopy

Wavelength–Wavelength and Sample–Sample Two-Dimensional Correlation Analyses of Short-Wave Near-Infrared Spectra of Raw Milk

SLOBODAN ŠAŠIĆ and YUKIHIRO OZAKI*
School of Science, Department of Chemistry, Kwansei-Gakuin University, Uegahara, Nishinomiya, 662-8501, Japan

Short-wave near-infrared (NIR) spectra of raw milk have been analyzed in the 800–1100 nm region by two-dimensional (2D) correlation spectroscopy. In this study, we have used both the well-known generalized 2D correlation spectroscopy method, which yields correlation coefficients among spectral variances on all the wavelength points (wavelength–wavelength correlation), and a novel sample–sample correlation spectroscopy method, which gives correlation coefficients among the concentration dynamics of the species in the system. The sample–sample correlation spectroscopy develops correlation maps with the samples on the axes. First, a set of 34 spectra was ordered according to increasing fat content, while all other milk components varied freely. Both synchronous and asynchronous wavelength–wavelength correlation maps have shown strong baseline changes despite the use of multiplicative scatter correction as a pretreatment. Bands at 930 and 970 nm due to fat and water, respectively, have been found to be the most significant spectral features. The synchronous sample–sample correlation map was calculated from the original data matrix and was compared with the outer product of the fat concentration vector. The comparison has revealed that the main spectral variances in the raw milk spectra in the short-wave NIR region are due to fat. The same procedure was repeated for a set of 15 samples that contained a constant fat content and were ordered according to increasing protein content. Poor agreement was found between the outer product of the protein concentration vector and the synchronous sample–sample correlation map. This result suggests that the spectral variances in raw milk spectra that have a constant fat content are due not exclusively to proteins but also to other milk components. Comparisons of partial least-squares (PLS) regression analysis of fat and proteins and both sample–sample and wavelength–wavelength 2D correlation analyses of raw milk spectra have been made.

Index Headings: Two-dimensional correlation spectroscopy; Partial least-squares regression; PLS; Milk; NIR spectroscopy; Protein; Fat.

INTRODUCTION

Since it was proposed in 1993 [1], generalized two-dimensional (2D) correlation spectroscopy has gained wide popularity in a variety of fields in science and technology. One can find a number of 2D correlation studies on various molecules and materials such as alcohols, polymers, proteins, and biomedical materials [2–10]. In all these studies, spectral variations were investigated through the main outputs of 2D correlation spectroscopy, synchronous and asynchronous spectra. Since the time of its initial appearance, development of the theory of generalized 2D correlation spectroscopy has been relatively slow. In comparison with a large number of application studies, there are only a few studies that are concerned solely with the theory of generalized 2D spectroscopy [11–13].

Received 2 June 2000; accepted 16 October 2000.
* Author to whom correspondence should be sent.

Quite recently, we have provided a new insight into the calculation of synchronous and asynchronous spectra [14]. We have defined a synchronous spectrum as a continuous array of the scalar products of dynamic vectors of the data matrix, where a dynamic vector is the spectral variance on a given variable as a function of the samples. At the same time, an asynchronous spectrum has been recognized as an array of the scalar products of the dynamic vectors in the original data matrix and the matrix which is orthogonal to the original one. In this way, synchronous and asynchronous spectra can be calculated on the basis of linear algebra.
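The linear-algebra view described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the 2D Pocha implementation: it assumes that the matrix "orthogonal to the original one" is obtained with the Hilbert–Noda transformation matrix commonly used in generalized 2D correlation analysis, and the function names `hilbert_noda` and `variable_variable_maps`, as well as the toy data, are hypothetical.

```python
import numpy as np

def hilbert_noda(n):
    """n x n Hilbert-Noda matrix: 0 on the diagonal, 1/(pi*(k-j)) elsewhere."""
    j, k = np.indices((n, n))
    diff = (k - j).astype(float)
    N = np.zeros((n, n))
    off = diff != 0
    N[off] = 1.0 / (np.pi * diff[off])
    return N

def variable_variable_maps(M):
    """M holds spectra in columns (variables x samples).

    Returns the synchronous and asynchronous variable-variable maps.
    """
    n = M.shape[1]
    # Centre each dynamic vector (one row per variable) over the samples.
    Y = M - M.mean(axis=1, keepdims=True)
    sync = Y @ Y.T / (n - 1)                     # scalar products of dynamic vectors
    asyn = Y @ hilbert_noda(n) @ Y.T / (n - 1)   # products with the transformed vectors
    return sync, asyn

# Toy data (assumed): one band whose intensity follows a single profile.
s = np.exp(-0.5 * ((np.linspace(800, 1100, 61) - 930) / 15) ** 2)
c = np.linspace(1.0, 5.0, 12)
sync, asyn = variable_variable_maps(np.outer(s, c))

# With a single concentration profile every variable changes in concert,
# so the asynchronous map vanishes to within rounding error.
print(np.abs(asyn).max() < 1e-9 * np.abs(sync).max())
```

The synchronous map is symmetric and the asynchronous map antisymmetric, which follows directly from the antisymmetry of the transformation matrix.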

Moreover, we have proposed a new way of calculating the correlation matrix that contains information directly related to the concentration dynamics of the species under consideration [15]. In conventional 2D correlation analysis, spectra are ordered as columns in the data matrix, and post-multiplying this matrix by its transpose gives the synchronous spectrum. Our new idea is to calculate a synchronous spectrum by premultiplying the original data matrix (with the spectra in columns) by its transpose. In proposing this method, we have utilized the fact that all the spectral points in the overlapped spectra can be expressed as the matrix product of the spectral characteristics and concentrations of the species present in the system,

$D = SC$

where D is an experimental matrix, S is a matrix with the spectra of pure components in columns, and C is a concentration matrix with concentration profiles of components in rows [15]. Conventional generalized 2D correlation spectra present correlations among spectral features of the components. Synchronous and asynchronous spectra give information only about spectral bands that change simultaneously and show the order of these spectral changes. They do not offer any explicit information about the concentration of the species. Our new method calculates scalar products of the vectors that present concentrations of the species, and thus it offers straightforward data about concentration dynamics. The new method provides correlation maps that are considerably simpler than conventional 2D correlation maps and are also complementary to them. The resulting correlation spectra have samples on the axes instead of wavelength or wavenumber, and they contain correlation coefficients that reveal relations among concentration changes of the species present in the system. To distinguish between conventional generalized 2D correlation spectroscopy having two wavenumber axes and this new approach that uses two sample axes, we call the former wavenumber–wavenumber correlation spectroscopy and the latter sample–sample correlation spectroscopy. A detailed background of sample–sample correlation spectroscopy was presented in our previous paper [15]. The first two applications of sample–sample correlation spectroscopy were concerned with a two-component model system and temperature-dependent near-infrared (NIR) spectra of neat oleic acid [15].

FIG. 1. Short-wave near-infrared spectra of 34 milk samples before MSC (A) and after MSC (B).
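To make the two products concrete, the following sketch builds a synthetic data matrix from the bilinear model D = SC and forms both maps. The Gaussian band shapes, the concentration profiles, and all variable names are assumptions chosen only for illustration; they are not the milk data.

```python
import numpy as np

wl = np.linspace(800, 1100, 151)          # wavelength axis in nm (assumed grid)

def band(center, width):
    """Gaussian band used as a stand-in for a pure-component spectrum."""
    return np.exp(-0.5 * ((wl - center) / width) ** 2)

# S: pure-component spectra in columns (wavelengths x components).
S = np.column_stack([band(930, 15), band(970, 30)])
# C: concentration profiles in rows (components x samples).
C = np.vstack([np.linspace(0.1, 1.0, 10), np.linspace(1.0, 0.6, 10)])

D = S @ C                                  # spectra in columns (wavelengths x samples)

var_var = D @ D.T    # post-multiplication: wavelength-wavelength map (151 x 151)
sam_sam = D.T @ D    # premultiplication:   sample-sample map (10 x 10)

# The sample-sample map is built from the concentration profiles,
# weighted by the overlap of the pure spectra, S^t S.
print(np.allclose(sam_sam, C.T @ (S.T @ S) @ C))
```

Under this model the sample–sample map depends on the samples only through C, which is why it reports on concentration dynamics rather than on band positions.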

In the present study we have applied both wavelength–wavelength and sample–sample correlation spectroscopy to the short-wave NIR spectra of raw milk. (In general, the former should be called variable–variable correlation spectroscopy; in this study the variable is wavelength, so the maps have wavelength–wavelength axes.) The purpose of this study is to demonstrate the potential of the novel method of 2D sample–sample correlation spectroscopy and the power of combining both correlation matrices, with the dimensions of the variables (wavelength–wavelength) and with those of the samples (sample–sample). There are two benefits in choosing milk spectra as an example for the application of combined correlation spectroscopies. First, the results obtained can be compared with those obtained in our parallel partial least-squares (PLS) regression study of the same system [16]. Comparison of the sample–sample correlation analysis with the well-known regression method can show the benefits of its use and make it more understandable. Second, NIR spectra of milk have attracted keen interest but are fairly difficult to analyze because of the water band that dominates the spectra and also because of the strong reflection effects of fat globules, which create strong baseline changes [17]. Finally, the results of the application to such a heavily overlapped system can be a benchmark for the future implementation of the combined correlation analysis.

EXPERIMENTAL

A detailed procedure for obtaining the milk samples and recording NIR spectra was described elsewhere [16].

All the correlation matrices that appear in this paper were calculated by 2D Pocha software written by D. Adachi (Kwansei-Gakuin University) and plotted by commercial software.

RESULTS AND DISCUSSION

A set of 34 short-wave NIR spectra of milk is shown in Fig. 1A. The spectra are dominated by a broad feature at 970 nm due to a second overtone of the O–H stretching vibration of water. Baseline fluctuation is also notable. The spectra in Fig. 1A were subjected to multiplicative scatter correction (MSC) [18], which accounts for common amplification and offset. MSC pretreatment is highly desirable in the analysis of NIR spectra of milk. The spectra of milk after MSC, shown in Fig. 1B, were used for the calculation of the 2D correlation spectra.
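For readers unfamiliar with MSC, a minimal sketch of the usual formulation is given here: each spectrum is regressed on a reference spectrum (by default the mean spectrum), and the fitted offset and amplification are removed. The exact variant used for the milk data is not specified in this paper, so the function `msc` and the synthetic spectra are assumptions for illustration only.

```python
import numpy as np

def msc(X, reference=None):
    """Multiplicative scatter correction.

    X: spectra in rows (samples x wavelengths).
    reference: spectrum to regress against; defaults to the mean spectrum.
    """
    X = np.asarray(X, dtype=float)
    ref = X.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(X)
    for i, x in enumerate(X):
        # Least-squares fit x ~ offset + slope * ref (polyfit returns slope first).
        slope, offset = np.polyfit(ref, x, 1)
        corrected[i] = (x - offset) / slope
    return corrected

# Synthetic check (assumed data): the same band with different gains and offsets.
wl = np.linspace(800, 1100, 151)
base = np.exp(-0.5 * ((wl - 970) / 40) ** 2)
gains = np.array([0.8, 1.0, 1.3])
offsets = np.array([0.05, 0.00, 0.12])
X = offsets[:, None] + gains[:, None] * base
Xc = msc(X)

# Pure amplification/offset differences are removed completely.
print(np.allclose(Xc, Xc[0]))
```

Real spectra differ by more than a gain and an offset, so residual baseline variation after MSC, as reported for the milk spectra, is not unexpected.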

Wavelength–Wavelength Correlation Spectroscopy of NIR Spectra of Milk Ordered with Respect to the Fat Content. Initially, wavelength–wavelength correlation spectroscopy was applied to fat concentration-dependent NIR spectral variations of milk. Wavelength–wavelength correlation spectroscopy cannot give meaningful results if samples are not ordered according to some perturbation. We selected a set with a total of 34 spectra ordered with the fat content, the first spectrum having the smallest concentration of fat and the last spectrum having the highest concentration of the set. All the other major milk components such as proteins and lactose vary freely among this spectral set. Figure 2A shows the wavelength–wavelength synchronous spectrum, calculated according to $MM^{t}$. There are only two autopeaks at 970 and 930 nm in the synchronous spectrum. In the corners of the synchronous spectrum appear strong positive correlation coefficients due to very intensive spectral variances in the regions of 800–890 and 1010–1100 nm. This observation reveals that, after MSC, strong changes appear in the regions outside the water band. The corresponding asynchronous spectrum in Fig. 2B shows the positions where the deviations from Beer's law are the highest. The asynchronous spectrum contains more discernible information because of the elimination of all the bands that change simultaneously (the asynchronous correlation between the dynamic vectors at the peak positions of bands that change in concert is zero). Figure 2B reveals that the strongest asynchronicity appears along the region of 1020–1100 nm. No simultaneous change of the spectral data is observed for that region, the spectral region of 800–840 nm, or for the bands at 940, 968, and 1020 nm. In addition, clear cross peaks appear at wavelength coordinates 968/940 and 968/876 nm as well as between the region of 800–840 nm and the bands at 876 and 948 nm.

FIG. 2. The wavelength–wavelength synchronous (A) and asynchronous (B) spectra generated from the spectra in Fig. 1B.

In this way, the wavelength–wavelength correlation spectra enable us to register the strong bands at 927 and 970 nm, the significant baseline changes in the regions of 800–890 and 1010–1100 nm, and the weak bands at 876, 940, and 1020 nm. We estimated the intensity of the bands from the correlation spectra where they appeared. The synchronous spectrum has more intensity than the asynchronous spectrum, and the bands that appear in the former carry most of the spectral information.

Now, we must clarify the origin of the bands shown. From which species do they come? In the design we have chosen, there is nothing that could be regarded as a "perturbation variable". All the major components of milk (fat, proteins, and lactose) change. There is an order in the change of fat content, but it certainly does not allow us to conclude that the bands that appear in the wavelength–wavelength correlation spectra are due to fat. They could arise from other components of milk and represent combinations of their spectral changes. In our previous 2D correlation study of NIR spectra of milk in the 1100–2300 nm region [18], the protein or fat content was kept constant, and ordered change in the fat or protein content was considered the "perturbation variable". However, it was an approximation because concentration changes in all the other milk components were simply neglected. All the bands that appeared in wavelength–wavelength correlation maps in that study were assumed to come from fat or protein, and all the spectral changes were regarded as a combination of contributions from fat and protein. The result needed to be supported by some other facts, and the agreement between the results obtained and the literature data was regarded as verification of the correctness of the approach. Apparently, in that study there was no "formal" proof verifying that the autopeaks and cross peaks are due to fat and proteins. Moreover, this is a general problem that occurs for all the variable–variable correlation spectra. In the present case, the design chosen is quite general and does not allow for the separation of the spectral contributions from fat and proteins because both of them vary throughout the sample set. To overcome this problem, we have applied sample–sample correlation spectroscopy.

Sample–Sample Correlation Spectroscopy of NIR Spectra of Milk Ordered with Respect to the Fat Content. In our PLS regression study of NIR spectra of milk (in the 1100–2300 nm region) it was found that the main spectral variations after MSC are related to the fat content [18]. It seems very likely that the same holds for the spectral region of 800–1100 nm. This means that all the milk spectra can be approximated by an expression, $s \cdot c^{t}$, where s is a column vector that roughly represents the spectrum of fat, and c is a column vector that represents the concentration change in fat throughout the samples. Strictly speaking, s is a vector that includes not only the bands due to fat but also the bands that come from the interactions of fat with water or any other milk species. In this case, we are not able to estimate the spectrum of fat, but we know the concentration changes in fat, i.e., vector c. This vector is shown in Fig. 3A.

Now let us form the outer product, $c \cdot c^{t}$, of that vector. The three-dimensional representation of the matrix obtained is shown in Fig. 3B. One can observe the intensity increase in the array of lines as one moves from the first to the last sample. The corresponding 2D representation is given in Fig. 3C for the sake of visual comparison with the wavelength–wavelength 2D maps. The highest point in Figs. 3B and 3C is located at the sample coordinate (34,34) because the fat content is the highest in sample 34. The weakest intensity is found at the sample coordinate (1,1) because the smallest fat content is in sample 1.
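The location of the extremes in the outer product follows directly from the ordering of the samples, as the short sketch below shows. The fat values are hypothetical (an evenly spaced, increasing vector standing in for the measured contents in Fig. 3A), and indices are zero-based in the code, so sample 34 corresponds to index 33.

```python
import numpy as np

# Hypothetical fat contents for 34 samples ordered by increasing fat (wt %).
c = np.linspace(2.5, 5.5, 34)

outer = np.outer(c, c)        # element (i, j) equals c[i] * c[j]

# For a positive, increasing vector the largest product is c[-1]**2 and
# the smallest is c[0]**2, so the extremes sit at opposite diagonal corners.
i_max = np.unravel_index(np.argmax(outer), outer.shape)
i_min = np.unravel_index(np.argmin(outer), outer.shape)
print(i_max, i_min)
```

Every row of this matrix is the concentration vector scaled by one of its own elements, which is what makes a single slice sufficient for the comparison with the sample–sample map.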

FIG. 3. The fat concentration vector (A) and the three- (B) and two- (C) dimensional presentations of the outer product of that vector.

FIG. 4. The three- (A) and two-dimensional (B) representations of the synchronous sample–sample correlation spectrum generated from the spectra in Fig. 1B.

A sample–sample correlation spectrum given by the matrix product $M^{t}M$ is shown in Fig. 4A (three-dimensional plot) and Fig. 4B (two-dimensional plot). The similarities between Figs. 3B and 4A and between Figs. 3C and 4B are striking. A precise comparison between the outer product of the fat concentration vector and the sample–sample synchronous correlation spectrum is possible by simultaneously plotting autoscaled slice spectra extracted from the spectra in Figs. 3B and 4A. Such an autoscaled representation of the slice spectra taken along sample 1 from both matrices is shown in Fig. 5. The correlation coefficient between the two lines is rather high, 0.890. Figure 5 allows us to conclude that all the major spectral variances in the spectra in Fig. 1B are really due to the fat content. The asynchronous 2D sample–sample correlation spectrum (Fig. 6) shows only noise, confirming that only one active concentration vector exists in the milk spectra ordered with the fat content. If the contribution from the other species were significant, the resulting asynchronous correlation would not be zero. In this case, the asynchronous spectrum represents the scalar product between one vector (fat concentration change) and the vector that is nearly orthogonal to it. Before considering the wavelength–wavelength correlation maps in the light of results of the sample–sample correlation, we attempted sample–sample correlation analysis once more with the column-centered $M^{t}$ matrix.

FIG. 5. An autoscaled slice spectrum extracted from the sample–sample synchronous correlation spectrum along sample No. 1 and a slice spectrum of the outer product of the vector from Fig. 3A along the same sample.

FIG. 6. The asynchronous sample–sample correlation spectrum generated from the spectra in Fig. 1B.

FIG. 7. Some of the milk spectra shown in Fig. 1B after subtraction of the mean spectrum.
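The slice comparison described for Fig. 5 can be sketched as follows. The data are synthetic and every name (`autoscale`, `slice_correlation`, the band shapes, the random second component) is an assumption; the sketch only illustrates why agreement is perfect when one concentration vector drives the spectra and degrades when a second, independently varying component contributes.

```python
import numpy as np

def autoscale(v):
    """Centre to zero mean and scale to unit standard deviation."""
    return (v - v.mean()) / v.std(ddof=1)

def slice_correlation(M, c, sample=0):
    """Correlate one slice of M^t M with the same slice of c c^t.

    M: spectra in columns (wavelengths x samples); c: one value per sample.
    """
    sam_sam = M.T @ M
    outer = np.outer(c, c)
    # Autoscaling mirrors the plotted comparison; the coefficient itself
    # is unaffected by it.
    a = autoscale(sam_sam[sample])
    b = autoscale(outer[sample])
    return np.corrcoef(a, b)[0, 1]

rng = np.random.default_rng(1)
wl = np.linspace(800, 1100, 151)
fat = np.exp(-0.5 * ((wl - 930) / 12) ** 2)       # stand-in fat band
other = np.exp(-0.5 * ((wl - 1020) / 20) ** 2)    # stand-in second component
c_fat = np.linspace(2.5, 5.5, 34)                 # ordered, hypothetical
c_other = rng.uniform(2.8, 3.6, 34)               # varies freely, hypothetical

M_one = np.outer(fat, c_fat)                      # rank-one system
M_two = M_one + np.outer(other, c_other)          # second active component

r_one = slice_correlation(M_one, c_fat)
r_two = slice_correlation(M_two, c_fat)
print(r_one, r_two)
```

In the rank-one case the slice of the sample–sample map is proportional to the concentration vector, so the coefficient equals one up to rounding; any additional active component lowers it.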

Actually, an attempt was made to improve the precision of the sample–sample correlation spectra. The first step in the calculation of generalized 2D correlation spectroscopy is usually the centering along the dynamic vectors [1]. When such a centering (row centering) is applied to $M^{t}$, the spectra remain unchanged, moving only slightly downward. The row means are simply mean values of given spectra, and the subtraction of that number decreases the value of each spectral ordinate uniformly but does not affect the shape of the spectra shown in Fig. 1B. If the spectra were centered along the columns of $M^{t}$ before the calculation of the sample–sample correlation matrix, all the bulk and inert parts of the spectra that were due to water would be removed. Figure 7 shows the milk spectra after the column centering. The sample–sample correlation matrix obtained from these spectra now has a shape that is more understandable.

The positive values in the corners (1,1) and (34,34) show the highest autopeaks at the beginning and the end of the sample set, while the most negative values are found at the corners (1,34) and (34,1). An almost equivalent map is obtained by the calculation of the outer product of the mean-centered vector from Fig. 3A. After mean centering, negative values appear at the samples that contain less fat than the average content of fat and positive values at the samples with a fat content higher than the average. Thus, the resulting outer product must contain positive and negative values. The extreme values in the sample–sample correlation matrix and outer product matrix are found in the corners due to the fact that the fat contents of samples 1 and 34 are (geometrically) located at the longest distance from the mean. Figure 8 compares autoscaled slice spectra extracted from the sample–sample correlation matrix with those from the outer product of the centered fat concentration vector. The comparison reveals that only a slight improvement is achieved. This result means that, although they are visually very different from each other, the spectra in Fig. 1B and those in Fig. 7 carry the same information about the concentration dynamics in the system.
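The effect of the two centering choices can be checked on a synthetic example. Here the matrix `Mt` (samples in rows) is an assumed stand-in for $M^{t}$: a constant water-like background plus a fat-like band that scales with a hypothetical, ordered fat content. The 0.05 scaling factor and the band shapes are arbitrary.

```python
import numpy as np

wl = np.linspace(800, 1100, 151)
water = np.exp(-0.5 * ((wl - 970) / 40) ** 2)    # inert background (assumed)
fat = np.exp(-0.5 * ((wl - 930) / 12) ** 2)      # fat-like band (assumed)
c = np.linspace(2.5, 5.5, 34)                    # hypothetical fat contents

# Spectra in rows (samples x wavelengths), playing the role of M^t.
Mt = water[None, :] + 0.05 * np.outer(c, fat)

# Row centering: subtract each spectrum's own mean (uniform downward shift).
row_centered = Mt - Mt.mean(axis=1, keepdims=True)
# Column centering: subtract the mean spectrum (removes the constant part).
col_centered = Mt - Mt.mean(axis=0, keepdims=True)

sam_sam = col_centered @ col_centered.T          # sample-sample map

# For this one-component model the map is the outer product of the
# mean-centred concentration vector, up to a constant factor.
cc = c - c.mean()
expected = (0.05 ** 2) * (fat @ fat) * np.outer(cc, cc)
print(np.allclose(sam_sam, expected))
print(sam_sam[0, 0] > 0, sam_sam[-1, -1] > 0, sam_sam[0, -1] < 0)
```

Row centering leaves the spectral shapes untouched, whereas column centering removes whatever is common to all samples, which is why the corner pattern of positive and negative values appears only in the second case.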

The Assignment of the Fat Bands. Since sample–sample correlation spectroscopy has proved that the main source of the spectral variations is the fat content, we can assign the bands observed in the wavelength–wavelength synchronous and asynchronous maps. The band at 930 nm is due to a third overtone of the C–H stretching vibration of fat, while the band at 970 nm is assigned to a second overtone of the O–H stretching vibration of water [19]. The notable baseline changes near both ends of the spectral regions could be related to the fat content, although not in a straightforward way. Namely, if the baseline change comes only from the fat fluctuation, then there should not be peaks along the 1100 nm line in the asynchronous spectrum. However, the asynchronicity is the strongest along that very line. On the other hand, if the baseline change is not related to the fat content, then we could not get such a clear sample–sample correlation result. Consequently, there is a relationship between the fat content and baseline level, but the quantification of that relationship is quite difficult. The other bands observed in the asynchronous spectrum are assigned to fat and interactions among fat, water, and proteins. It must be kept in mind that the water band dominates the spectrum and that the proteins vary freely among the samples, interacting more strongly with water than fat. Hence, the band at 946 nm that shows striking asynchronous correlation could come from interaction between fat and water. Similarly, the band at 870 nm does not show any asynchronicity with the fat band at 930 nm, allowing us to assign it tentatively to another third overtone of the C–H stretching vibration of fat. The band at 1020 nm could be due to a combination mode of C–H stretching and bending vibrations of CH3 groups of fat, although a second overtone of the N–H stretching vibrations of proteins may appear at nearly the same frequency [19].

FIG. 8. An autoscaled slice spectrum extracted from the sample–sample synchronous correlation spectrum along sample No. 1 and a slice spectrum of the outer product of the centered vector from Fig. 3A along the same sample.

FIG. 9. A set of 15 spectra of milk samples with constant fat content.

We must emphasize that our interpretation of the asynchronous spectrum (Fig. 2B) strongly relies on the fact that, among all the highest coefficients in the synchronous spectrum (Fig. 2A) that covers quite a large part of the experimental spectra, only the band at 930 nm is due to fat, while other important correlation coefficients come from the baseline changes. Hence, the main effect of asynchronous correlation is not only to eliminate the bands with the same spectral dynamics but also to emphasize some bands of fat through their asynchronous correlation with the regions (e.g., 1020–1100 nm) that are only partly connected with fat. In the synchronous spectrum, all these bands are too weak to be noticed, but the asynchronous correlation diminishes the domination of the baseline changes in the spectra, enabling the appearance of fat bands.

Wavelength–Wavelength Correlation Spectroscopy of NIR Spectra of Milk Ordered with Respect to the Protein Content. A set of NIR spectra chosen for the wavelength–wavelength correlation analysis of proteins in milk had to be composed in a different way from the set used for the corresponding analysis of fat. In this data set, the fat content was kept constant. There were a total of 15 samples with an average fat content of 4.00 wt % and a standard deviation of 0.08 wt %. The samples were ordered according to increasing protein content. Figure 9 depicts NIR spectra of the set prepared. Again, one can see a significant change in the baseline. It is interesting to note that a similar design applied to the NIR spectra in the 1100–2300 nm region diminished the baseline change markedly [18]. The shape of the spectra shown in Fig. 9 suggests that great difficulty could be expected in the protein analysis because the system is not simplified, even though one factor (fat) is excluded from consideration.

Figure 10A shows the wavelength–wavelength synchronous spectrum obtained from the spectra in Fig. 9. As expected, the main variation of the data is again positioned in the regions of 800–850 and 1020–1100 nm. The important autopeaks are found at 950 and 962 nm, while the band at 840 nm appears through the correlation with the peak at 950 nm. The situation is similar to that found in Fig. 2A. Instead of the bands at 930 and 970 nm, the bands appear at 950, 962, and 840 nm. An asynchronous counterpart illustrated in Fig. 10B reveals that the strongest cross peaks are again located along the 1100 nm line at the wavelength coordinates 1100/936, 1100/968, and 1100/1026 nm as well as in the region of 800–860 nm. In addition, there are cross peaks at 940/968 and 940/800 nm. Note that the peaks at 950 and 840 nm do not appear in the asynchronous map.

FIG. 10. The synchronous (A) and asynchronous (B) wavelength–wavelength correlation spectra calculated from the spectra shown in Fig. 9.

FIG. 11. The synchronous sample–sample correlation spectrum (A) calculated from the spectra in Fig. 9, the protein concentration vector (B), and the outer product of the protein concentration vector (C).

Sample–Sample Correlation Spectroscopy of NIR Spectra of Milk Ordered with Respect to the Protein Content. The procedure for the calculation of sample–sample correlation spectra was the same as that used for the set ordered by fat content. Figure 11A shows the 2D sample–sample correlation map, and Figs. 11B and 11C present the protein concentration vector and a 2D presentation of the outer product obtained from that vector, respectively. Comparisons of the autoscaled slice spectra extracted from the spectra in Figs. 11A and 11C without and with column centering of $M^{t}$ are shown in Figs. 12A and 12B, respectively. The correlation coefficients for the slice spectra in Figs. 12A and 12B show relatively poor correlation between the concentration change of protein and the concentration dynamics revealed by the $M^{t}M$ cross-product matrix. One of the slices from the asynchronous sample–sample correlation spectrum definitely shows a specific trend (Fig. 13). Hence, one can conclude that, in the case of protein, there is no dominant contribution from the protein to the overall concentration dynamics of the system. The concentration variances of some other milk components, the presence of water, and some physical effects that yield strong light scattering create the spectral variances together with the protein variation. The system is not as simple as that for the analysis of fat, and therefore, we cannot assign the bands in the synchronous and asynchronous spectra to proteins as confidently.

FIG. 12. Autoscaled slice spectra extracted from the synchronous correlation map in Fig. 11A along sample No. 1 and from the outer product in Fig. 11C before (A) and after (B) centering.

FIG. 13. A slice spectrum along sample No. 3 in the asynchronous sample–sample correlation spectrum calculated from the spectra shown in Fig. 9.
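The comparison between a slice of the synchronous sample–sample map and the same slice of the outer product of the concentration vector can be sketched as follows. This is a minimal illustration under stated assumptions: spectra are stored as rows (samples by wavelengths), "centering" is taken to mean removing the mean spectrum and the mean concentration, and the normalization of the cross product is arbitrary because the slices are autoscaled. The names `slice_correlation`, `autoscale`, and `band`, and the synthetic data, are hypothetical.

```python
import numpy as np

def autoscale(v):
    """Mean-center a slice and scale it to unit standard deviation."""
    return (v - v.mean()) / v.std(ddof=1)

def slice_correlation(spectra, conc, sample=0, center=False):
    """Correlate one slice of the synchronous sample-sample map with the
    same slice of the outer product of the concentration vector.

    spectra: (m samples, n wavelengths); conc: (m,) reference concentrations.
    center=True removes the mean spectrum and the mean concentration first.
    """
    X = spectra - spectra.mean(axis=0) if center else spectra
    c = conc - conc.mean() if center else conc
    sync = X @ X.T / (X.shape[1] - 1)            # (m, m) sample-sample map
    outer = np.outer(c, c)                       # (m, m) concentration map
    a, b = autoscale(sync[sample]), autoscale(outer[sample])
    return float(np.corrcoef(a, b)[0, 1])

# Synthetic demo (hypothetical data, not the milk spectra).
wl = np.arange(800, 1101, 2)                     # wavelength axis in nm

def band(center, width, height):
    return height * np.exp(-((wl - center) ** 2) / (2 * width**2))

protein = np.linspace(2.8, 3.6, 15)              # ordered component of interest
other = 1.0 + 0.5 * np.cos(np.arange(15))        # unrelated, stronger component

pure = np.outer(protein, band(1020, 15, 0.2))    # one-component system
mixed = pure + np.outer(other, band(970, 15, 1.0))

r_pure = slice_correlation(pure, protein)
r_mixed = slice_correlation(mixed, protein)
```

When a single component drives the spectra, the two slices coincide and the coefficient is essentially 1; when a stronger, unrelated component is added, the coefficient drops, which mirrors the qualitative argument made above for protein. Autoscaling does not change the Pearson coefficient itself; it is kept here only because the slices are displayed in autoscaled form.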

The Assignment of the Protein Bands. The band at 972 nm is due to an O–H stretching vibration of water, while those at 950 and 962 nm probably come from that of water interacting with proteins in milk. The band at 840 nm may be assigned to a combination of the second overtone of N–H stretching, bending, and C–N stretching vibrations.19 The band at 1026 nm is most likely due to the second overtone of the N–H stretching vibration or the combination of 2 × N–H stretching and 2 × amide I.19

Comparison of the Present Results from the Combined 2D Correlation Spectroscopy Analyses with the Previous Results from the PLS Regression Analysis. The same NIR spectra of milk used in this paper were also used in our previous study for the PLS regression analysis of fat, protein, and lactose in raw milk.16 The comparison of the results obtained here with those obtained by the PLS regression shows a good agreement for the band assignment of fat. The first PLS loading weights gave peaks at 930 and 970 nm with strong slopes in the regions of 800–850 and 1020–1100 nm. That loading explained 86% of spectral and 73% of concentration variances. The second loading weights were found to explain 13% and 6% of spectral and concentration variances, respectively. In the second loading, besides the bands at 930 and 970 nm, additional bands at 840 and 1020 nm were registered. The third loading weights were found to be very important for concentration variation (16%) with the new band at 950 nm. Thus, all the bands observed in the present wavelength–wavelength correlation analysis appeared in the first three loadings too, with the clear domination of the first loading that had the same features as the synchronous wavelength–wavelength correlation spectrum. The bands in the lower loadings are located at the same positions as those in the asynchronous spectrum. This result proves that our methodology for reading the asynchronous spectrum is consistent. The good agreement between the outer product of the fat concentration vector and the sample–sample synchronous correlation matrix is in line with the fact that the first loading explains 73% of concentration variation while the third one, which has an additional band at 946 nm between the fat and water bands, adds an additional 16%.
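The percentages of spectral and concentration variance attributed to each loading can be illustrated with a textbook NIPALS PLS1 decomposition. The sketch below is an assumption about how such figures are typically obtained; it is not taken from Ref. 16, and the function name `pls1_explained_variance` and the synthetic two-band data are hypothetical.

```python
import numpy as np

def pls1_explained_variance(X, y, n_components):
    """Textbook NIPALS PLS1.

    Returns the loading weights (one row per component) and the fraction of
    spectral (X) and concentration (y) variance explained by each component.
    """
    Xr = X - X.mean(axis=0)                      # residual spectra (mean-centered)
    yr = y - y.mean()                            # residual concentrations
    ss_x, ss_y = np.sum(Xr**2), np.sum(yr**2)
    weights, x_frac, y_frac = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)                   # loading weight vector
        t = Xr @ w                               # scores
        tt = t @ t
        p = Xr.T @ t / tt                        # spectral loadings
        q = (yr @ t) / tt                        # concentration loading
        x_frac.append(tt * (p @ p) / ss_x)       # share of X sum of squares removed
        y_frac.append(q**2 * tt / ss_y)          # share of y sum of squares removed
        Xr = Xr - np.outer(t, p)                 # deflate spectra
        yr = yr - q * t                          # deflate concentrations
        weights.append(w)
    return np.array(weights), np.array(x_frac), np.array(y_frac)

# Synthetic demo (hypothetical data): a dominant band plus an interfering one.
rng = np.random.default_rng(0)
wl = np.arange(800, 1101, 2)                     # wavelength axis in nm
band_930 = np.exp(-((wl - 930) ** 2) / (2 * 15.0**2))
band_970 = np.exp(-((wl - 970) ** 2) / (2 * 20.0**2))
fat = np.linspace(3.0, 5.0, 34)
other = rng.normal(1.0, 0.2, size=34)
X = (np.outer(fat, band_930) + np.outer(other, band_970)
     + rng.normal(0.0, 0.01, (34, wl.size)))

W, x_frac, y_frac = pls1_explained_variance(X, fat, 3)
```

Here `x_frac` and `y_frac` play the role of the "spectral" and "concentration" variances quoted per loading, and the rows of `W` are the loading weights whose peaks are compared with the correlation maps.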

As for proteins, the first two loadings were found to cover 99% of spectral and 22% of concentration variances. The other four important loadings explained less than 1% of spectral and 77% of concentration variance. The main bands in the first two PLS loadings were observed near 940 and 1020 nm with strong slopes near the ends of the spectral regions, as in the case of fat. The additional important bands for the protein calibration were registered at 906, 926, 950, 972, 1025, and 1032 nm. Many of these bands are identified in the synchronous and asynchronous wavelength–wavelength correlation maps (Figs. 10A and 10B). However, there is no clear correspondence between the PLS loadings and correlation peaks, or between the concentration variance explained by loadings and sample–sample correlation matrices. The reason lies in the different way of processing spectra. PLS regression utilizes the protein vector as a "guiding hand" for decomposition of the original data matrix, while correlation analyses are only tables of correlation coefficients that are simplified by means of orthogonal correlation. Thus, for all the cases where one component dominates the data matrix, PLS regression and combined 2D correlation analyses should yield essentially similar results. The discrepancies emerge when important data have a small spectral response or when they are strongly obscured by some other spectral effects, such as the baseline change in this case.

However, these discrepancies are mainly due to difficulties in plotting variable–variable correlation maps. In general, all the bands observed in the PLS regression study probably could be identified in the variable–variable correlation spectra as well. To achieve such consistency, one must set the threshold level for displaying correlation peaks at a rather low level. All the bands identified by the regression study must show some mutual correlation, and accordingly they should appear in the 2D synchronous or asynchronous variable–variable correlation spectra. However, the complexity of the correlation maps is inversely proportional to the threshold level of peak selection; the lower the level, the more complicated the map. The correlation results for the set of spectra ordered with respect to the protein content are a very illustrative example for the relation between PLS loadings and variable–variable correlation maps in the case where the component of interest shows minor spectral response. It is possible to make synchronous and asynchronous maps that contain correlation peaks at all the positions, as in the PLS regression study. Unfortunately, such maps would be extremely complicated and could not offer straightforward information. On the other hand, the PLS regression method offers well-classified information about the spectral and concentration importance of particular bands. It makes analysis of species with minor spectral contributions more precise. In turn, the PLS regression method is more complicated than correlation analyses, and demands strong background knowledge in chemometrics.
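The trade-off between the display threshold and the complexity of a correlation map can be made concrete by counting the peaks that survive at different threshold levels. The sketch below is only an illustration: `count_peaks` is a hypothetical helper that counts strict local maxima of the absolute map, and the map is a synthetic rank-one synchronous map, which is proportional to the outer product of a band profile with itself. The profile has one strong and two weak bands.

```python
import numpy as np

def count_peaks(corr_map, level):
    """Count strict local maxima of |corr_map| at or above level * max|corr_map|."""
    a = np.abs(corr_map)
    padded = np.pad(a, 1, mode="constant", constant_values=-np.inf)
    n_rows, n_cols = a.shape
    is_peak = np.ones(a.shape, dtype=bool)
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            # Same-shaped view of the map shifted by (dr, dc).
            neighbor = padded[1 + dr : 1 + dr + n_rows, 1 + dc : 1 + dc + n_cols]
            is_peak &= a > neighbor
    return int(np.sum(is_peak & (a >= level * a.max())))

# Synthetic demo (hypothetical data): one strong band and two weak bands.
wl = np.arange(800, 1101, 2)                     # wavelength axis in nm
profile = sum(
    height * np.exp(-((wl - center) ** 2) / (2 * 10.0**2))
    for center, height in [(950, 1.0), (850, 0.3), (1050, 0.1)]
)
sync = np.outer(profile, profile)                # rank-one synchronous map

levels = [0.5, 0.05, 0.005]                      # fraction of the strongest peak
counts = [count_peaks(sync, level) for level in levels]
```

Lowering the level reveals the weak bands and their cross peaks, so the number of displayed peaks grows quickly even for this three-band example.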

CONCLUSION

This paper has presented for the first time the application of a novel method for 2D correlation analysis, sample–sample correlation spectroscopy, to real-world samples that show almost identical spectra with baseline fluctuations. The concentration changes in fat and proteins in raw milk have been proved by synchronous and asynchronous sample–sample correlation spectra. It has been found that the fat content is of crucial importance for the spectral variations in short-wave NIR spectra of milk. Correlation coefficients of 0.896 and 0.912 have been obtained for one of the slice spectra from the outer product of the fat concentration vector and one of the slice spectra from the sample–sample synchronous correlation spectrum. On the other hand, a correlation coefficient of only 0.67 has been obtained for the slice spectrum from the outer product of the protein concentration vector and that from the synchronous sample–sample correlation map generated from a set of spectra with constant fat content. The synchronous and asynchronous wavelength–wavelength correlation maps for the set of spectra ordered with respect to the fat content have revealed the bands at 930 and 970 nm as being the most important. The baseline changes near the ends of the spectral regions investigated have been found to be partially related to fat. As for proteins, the baseline changes are poorly related to protein content and present an obstacle for the spectral analysis of proteins. A comparison between the results from PLS regression and those from the combined use of sample–sample and wavelength–wavelength correlation analyses shows good agreement for the spectra ordered by the fat content and partial accordance with those ordered by the protein content. The reason for this result lies in the fact that the spectral responses of proteins in raw milk are very weak due to the strong baseline changes, the domination of water bands, and the influence of other milk components.

1. I. Noda, Appl. Spectrosc. 47, 1329 (1993).
2. Proceedings of the International Symposium on Two-Dimensional Correlation Spectroscopy, Y. Ozaki and I. Noda, Eds. (American Institute of Physics, New York, 1999).
3. I. Noda, Y. Liu, Y. Ozaki, and M. A. Czarnecki, J. Phys. Chem. 99, 3068 (1995).
4. Y. Ren, T. Murakami, T. Nishioka, K. Nakashima, I. Noda, and Y. Ozaki, Macromolecules 32, 6307 (1999).
5. Y. Wang, K. Murayama, Y. Myojo, R. Tsenkova, N. Hayashi, and Y. Ozaki, J. Phys. Chem. B 102, 6655 (1998).
6. C. P. Schultz, H. Fabian, and H. H. Mantsch, Biospectrosc. 4, 19 (1998).
7. N. P. Magtoto, N. L. Sefara, and H. H. Richardson, Appl. Spectrosc. 53, 178 (1999).
8. A. Nabet and M. Pezolet, Appl. Spectrosc. 51, 466 (1997).
9. M. A. Czarnecki, P. Wu, and H. W. Siesler, Chem. Phys. Lett. 283, 326 (1998).
10. L. Smeller and K. Heremans, Vib. Spectrosc. 19, 315 (1999).
11. W. Windig, D. E. Margevich, and W. P. McKenna, Chemom. Intell. Lab. Syst. 28, 109 (1995).
12. H. Wang and R. Palmer, in Proceedings of the International Symposium on Two-Dimensional Correlation Spectroscopy, Y. Ozaki and I. Noda, Eds. (American Institute of Physics, New York, 1999), p. 41.
13. I. Noda, Appl. Spectrosc. 54, 994 (2000).
14. S. Šašić, A. Muszynski, and Y. Ozaki, Appl. Spectrosc., paper in press.
15. S. Šašić, A. Muszynski, and Y. Ozaki, J. Phys. Chem. A 104, 6380 and 6388 (2000).
16. S. Šašić and Y. Ozaki, Anal. Chem., paper in press.
17. S. Šašić and Y. Ozaki, Appl. Spectrosc. 54, 1327 (2000).
18. P. Geladi, D. MacDougall, and H. Martens, Appl. Spectrosc. 39, 491 (1985).
19. B. G. Osborne and T. Fearn, Near Infrared Spectroscopy in Food Analysis (Longman Scientific and Technical, Harlow, 1986).