
www.elsevier.com/locate/yabio

Analytical Biochemistry 332 (2004) 238–244


Using artificially generated spectral data to improve protein secondary structure prediction from Fourier transform infrared spectra of proteins

Mete Severcan a,*, Parvez I. Haris b, Feride Severcan c

a Department of Electrical and Electronics Engineering, Middle East Technical University, Ankara 06531, Turkey

b Faculty of Health and Life Sciences, De Montfort University, The Gateway, Leicester, LE1 9BH, UK

c Department of Biology, Middle East Technical University, Ankara 06531, Turkey

Received 8 February 2004

Available online 28 July 2004

Abstract

Secondary structures of proteins have been predicted from their Fourier transform infrared spectra using neural networks. To improve the generalization ability of the neural networks, the training data set has been artificially enlarged by linear interpolation. The leave-one-out approach has been used to demonstrate the applicability of the method. Bayesian regularization has been used to train the neural networks, and the predictions have been further improved by maximum-likelihood estimation. The networks have been tested, and standard errors of prediction (SEP) of 4.19% for α helix, 3.49% for β sheet, and 3.15% for turns have been achieved. The results indicate a significant decrease in the SEP for each type of structure parameter compared with previous work.

© 2004 Elsevier Inc. All rights reserved.

Keywords: Protein; Secondary structure prediction; Neural networks; FTIR; Spectroscopy

Fourier transform infrared (FTIR)¹ spectroscopy is being increasingly used for investigating protein structure and stability (for a recent review, see [1]). FTIR spectra of proteins contain valuable information about the structure of proteins. Different conformational types such as helix, sheet, and turns give rise to different absorption bands, which are usually broad and overlapping. Unfortunately, there are still problems with accurate quantification of protein secondary structure from FTIR spectra. Various techniques such as curve fitting [2], partial least squares analysis [3], factor analysis [4], and neural networks [5] have been used to predict the secondary structure of proteins from their infrared spectra. Each of these techniques has its own advantages and disadvantages. In a previous study [6], we proposed a neural network (NN) method that provides better predictions than previously used methods. In that work, we searched for a neural network model that performs best for all the validation vectors taken one at a time from a collection of training vectors.

0003-2697/$ - see front matter © 2004 Elsevier Inc. All rights reserved.

doi:10.1016/j.ab.2004.06.030

* Corresponding author. Fax: +90-312-210-1261.

E-mail address: [email protected] (M. Severcan).

¹ Abbreviations used: FTIR, Fourier transform infrared; NN, neural network; SEP, standard error of prediction; DCT, discrete cosine transform; pdf, probability density function.

Recently, one of us [7,8] proposed two new NN implementations which provide improved results. In the first study, it was shown that supplying the neural network analysis with only part of the amide I region, taken from empirically determined structure-sensitive regions, in combination with appropriate preprocessing of the spectral data produced better results. This led to an SEP of 4.47% for α helix, 6.16% for β sheet, and 4.61% for turns. The second study incorporated an automatic amide I frequency selection procedure utilizing a genetic algorithm. The corresponding SEP values for this method are 4.58, 5.72, and 4.42%. The highly limited data set available is the main problem in obtaining a NN with good generalization properties and small prediction errors. In this study, we extend our previous work by increasing the size of the training data set artificially, by using Bayesian regularization to train the neural networks, which minimizes a combination of squared errors and weights, and by using maximum-likelihood estimation to improve the predictions further. The SEP values obtained in this case are 4.19% for α helix, 3.49% for β sheet, and 3.15% for turns. After presenting the method and the results, we discuss the advantages and limitations of the NN methods for predictions of secondary structure of proteins from FTIR spectra.

Materials and methods

Data set

We have used the same data set that we used previously, taken from the work of Lee et al. [4], who used it to test their factor analysis method. The data set contains FTIR spectra of 18 water-soluble proteins recorded in water. The secondary structures of these proteins are known from X-ray crystallography. Details about the sources of the proteins, sample preparation for infrared spectroscopy, etc. can be found in Lee et al. [4]. Details about the FTIR spectroscopic analysis of citrate synthase have been reported by Severcan and Haris [9].

Neural network

A NN is a nonlinear parameterized mapping from an input vector x to an output y = g(x; w), where the parameter vector w consists of the connection weights of the NN. It can be trained to perform regression and classification tasks. The weights w are adjusted during training so as to minimize a cost function which is a function of the error e = t − g(p; w), where t is the known target value for a given training input vector x = p.

In this work, the target values are the structure parameters obtained from X-ray crystallography, while the input vectors are the infrared absorption values measured at 101 points from 1600 to 1700 cm⁻¹, namely the amide I band. The FTIR spectral shape within the amide I band is mainly determined by the secondary structure content, and the contribution of bands outside the amide I band to the prediction of structure parameters is not significant [7]. The highly limited size of the training data set constitutes the most important problem encountered in this work. FTIR spectra of only 18 proteins and their corresponding secondary structure parameters are currently available to us. With the leave-one-out method for testing the performance of the NN model, at most 17 sets of data are available for training the NN. This constraint forces the use of a NN with few hidden units and a restricted number of connection weights. Therefore, a data reduction technique should be applied to the spectral data. For this purpose we have employed the discrete cosine transform (DCT) [10]. The DCT is a good approximation to the Karhunen–Loeve transform, by which it is possible to represent a signal with the least number of coefficients for a specified amount of error in a statistical sense. The magnitude of a DCT coefficient indicates the content of a particular frequency in the signal. Features of the signal with sharp variations require the inclusion of higher frequencies in the DCT representation, implying more DCT coefficients. In this work we have taken the DCT of the FTIR spectra and determined the number of coefficients to be used by optimizing the NN. Depending on the protein structure, this number varies between 10 and 20 in our case.

The limited size of the training data set also raises the problem of generalization: a neural network is expected to make good predictions when data not present in the training set are used as input. To improve the generalization ability, the algorithm that we have developed uses (1) an artificially increased data set, (2) training with Bayesian regularization, which minimizes a combination of squared errors and weights and determines the correct combination to produce a NN that generalizes well, and (3) maximum-likelihood estimation to determine the most likely prediction from the conditional histogram of predictions, as described below.
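The data reduction step can be illustrated with a minimal sketch. The paper does not state which DCT variant or scaling was used, so the orthonormal DCT-II below is an assumption, as are the function names `dct_ii` and `dct_features` and the synthetic Gaussian band that stands in for a measured amide I spectrum.

```python
import math

def dct_ii(signal):
    """Orthonormal DCT-II of a sequence (pure Python, O(n^2)).
    Assumption: the paper does not specify the DCT variant or scaling."""
    n = len(signal)
    coeffs = []
    for k in range(n):
        s = sum(x * math.cos(math.pi * (j + 0.5) * k / n)
                for j, x in enumerate(signal))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        coeffs.append(scale * s)
    return coeffs

def dct_features(spectrum, n_coeffs):
    """Keep only the first n_coeffs (lowest-frequency) DCT coefficients."""
    return dct_ii(spectrum)[:n_coeffs]

# Synthetic stand-in for a spectrum: one broad band sampled at 101 points
# from 1600 to 1700 cm^-1 (not real protein data).
wavenumbers = [1600 + i for i in range(101)]
spectrum = [math.exp(-((w - 1655) / 15.0) ** 2) for w in wavenumbers]

features = dct_features(spectrum, 13)

# Fraction of the signal energy captured by the retained coefficients.
retained = sum(c * c for c in features) / sum(x * x for x in spectrum)
```

For a broad, smooth band almost all of the signal energy sits in the first few coefficients, which is why a 101-point spectrum can be reduced to 10 to 20 inputs. Sharper features would push energy into higher-order coefficients.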

We have modeled each structure parameter by a different neural network. In the preprocessing step, the DCTs of the normalized spectra of K proteins (K = 18) are first obtained. Then the first L DCT coefficients are retained, forming length-L training vectors p_i, i = 1, ..., K. The target values t_i, i = 1, ..., K, are the corresponding known structure parameters. Fig. 1 illustrates the basic steps of the NN training and testing algorithm. To test the performance of the method, the leave-one-out method has been used, removing one element of the protein data in turn, as illustrated by the parallel paths in Fig. 1. After one element of the protein data is removed, the size of the data set is artificially increased by averaging p_i with p_{i+1}, p_i with p_{i+2}, and p_i with p_{i+3}, and the corresponding target values t_i with t_{i+1}, t_i with t_{i+2}, and t_i with t_{i+3}, in a cyclic manner. In this way the number of training vectors is increased to 68 (the 17 original vectors plus 3 × 17 averaged ones). Then a NN is trained using a Bayesian regularization back-propagation algorithm [11,12]. The trained NN is tested with the left-out protein data and a prediction is obtained. Since the training algorithm starts with random initial weights, this prediction is a random variable, and therefore a maximum-likelihood estimate of this variable can be obtained from the most likely value of the conditional probability density function (pdf) of the prediction, conditioned on the target value being estimated. The conditional pdf can be approximated numerically by the histogram of the predictions. For this purpose, the training step is repeated a large number of times with different initial weights (200 in our case), and a histogram is obtained. The most likely value of this conditional histogram, i.e., the prediction at which the histogram is maximum, gives the maximum-likelihood estimate. In the final step, the standard error of prediction, defined as the square root of the mean square value of the prediction errors, is computed.

Fig. 1. Basic training algorithm.
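The augmentation and maximum-likelihood steps described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' MATLAB implementation: the function names are hypothetical, the histogram bin width (1 percentage point) is not given in the paper, and the toy vectors and predictions are invented for the example.

```python
import math
from collections import Counter

def augment_cyclic(vectors, targets, offsets=(1, 2, 3)):
    """Append the average of item i with item i+d (indices wrap around)
    for each offset d, for both input vectors and target values."""
    n = len(vectors)
    aug_p = [list(v) for v in vectors]
    aug_t = list(targets)
    for d in offsets:
        for i in range(n):
            j = (i + d) % n  # cyclic partner
            aug_p.append([(a + b) / 2.0 for a, b in zip(vectors[i], vectors[j])])
            aug_t.append((targets[i] + targets[j]) / 2.0)
    return aug_p, aug_t

def ml_estimate(predictions, bin_width=1.0):
    """Centre of the most populated histogram bin.
    Assumption: bin_width is a free choice; ties go to the lower bin."""
    counts = Counter(math.floor(p / bin_width) for p in predictions)
    best_bin = max(counts, key=lambda b: (counts[b], -b))
    return (best_bin + 0.5) * bin_width

# Toy data: 17 training vectors remain after one protein is left out.
vectors = [[float(i), float(2 * i)] for i in range(17)]
targets = [float(i) for i in range(17)]
aug_p, aug_t = augment_cyclic(vectors, targets)

# Toy predictions from repeated training runs (the paper uses 200 runs).
runs = [41.2, 41.7, 41.4, 39.8, 44.1, 41.9, 40.3]
estimate = ml_estimate(runs)
```

With 17 vectors and three offsets the augmented set has 17 + 3 × 17 = 68 members, matching the count given in the text. Taking the histogram mode rather than the mean makes the estimate insensitive to a few badly converged runs.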

The SEP of the training algorithm described above depends on the structure of the neural network and its parameters. The number of inputs (i.e., the number of DCT coefficients used), the number of units in the hidden layer, and the sum squared error (performance goal) of the NN have to be optimized. Since the spectral shapes are highly dependent on the contents of the different protein secondary structures, NN models for different secondary structures are expected to have different numbers of inputs and different numbers of hidden units. For example, sharp peaks due to β sheet structure are better described by higher frequency components in the DCT expansion. Consequently, the optimized network has more inputs in the case of β sheet content prediction. This explains why lower SEP values can be obtained when different NN models are designed for different secondary structures.
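The selection of the number of inputs by minimizing the leave-one-out SEP can be sketched as below. To keep the example self-contained, a nearest-neighbour regressor stands in for the Bayesian-regularized NN, which is an assumption made only for illustration; the function names and the toy data are likewise hypothetical. In the paper the search also covers the number of hidden units and the performance goal, which would add further loops.

```python
import math

def sep(targets, predictions):
    """Standard error of prediction: root mean square prediction error."""
    n = len(targets)
    return math.sqrt(sum((t - y) ** 2 for t, y in zip(targets, predictions)) / n)

def nearest_neighbour_predict(train_x, train_t, x):
    """Stand-in model (NOT the paper's NN): target of the closest vector."""
    best = min(range(len(train_x)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)))
    return train_t[best]

def loo_sep(vectors, targets, n_inputs, predict=nearest_neighbour_predict):
    """Leave each item out in turn, predict it from the rest, return the SEP."""
    preds = []
    for k in range(len(vectors)):
        train_x = [v[:n_inputs] for i, v in enumerate(vectors) if i != k]
        train_t = [t for i, t in enumerate(targets) if i != k]
        preds.append(predict(train_x, train_t, vectors[k][:n_inputs]))
    return sep(targets, preds)

def select_n_inputs(vectors, targets, candidates):
    """Pick the candidate number of inputs with the lowest leave-one-out SEP."""
    scores = {n: loo_sep(vectors, targets, n) for n in candidates}
    best = min(scores, key=lambda n: (scores[n], n))
    return best, scores

# Toy data: the second coefficient carries the information about the target.
vectors = [[(i % 2) * 100.0, float(i)] for i in range(8)]
targets = [float(i) for i in range(8)]
best, scores = select_n_inputs(vectors, targets, candidates=(1, 2))
```

In this toy case the SEP falls when the second coefficient is included, so two inputs are selected. The same loop structure applies when the stand-in model is replaced by a trained network.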

Results and discussion

The algorithm described above has been implemented using the Neural Network toolbox of MATLAB. For each structure parameter a different neural network model has been realized. The network parameters have been optimized to yield the minimum SEP. As an example, for the NN modeling the β sheet structure, the minimum SEP was obtained with 19 DCT coefficients and 2 hidden neurons, as shown in Fig. 2. Optimization for α helix yielded 12 DCT coefficients and 2 hidden units, while for the NN modeling of turns 11 DCT coefficients and 3 hidden units were obtained. Table 1 presents the results together with our previous results for comparison. As can be seen from this table, there is a considerable improvement in prediction accuracy for all three types of structure. The prediction error has dropped from 7.7 to 4.19% for α helix, from 6.4 to 3.49% for β sheet, and from 4.8 to 3.15% for turns.

Fig. 2. Variation of SEP as a function of number of DCT coefficients used as input, with number of hidden neurons as a parameter.
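As a worked check of the SEP definition (square root of the mean square prediction error), the α helix values can be recomputed from the per-protein entries of Table 1. The lists below are the X-ray, previous-study, and present-study α helix columns in table order; the variable and function names are chosen here for illustration.

```python
import math

# alpha helix content (%) for the 18 proteins, in the row order of Table 1
xray_alpha = [29, 26, 16, 3, 12, 11, 49, 10, 86, 61, 46, 88, 26, 6, 28, 11, 23, 23]
previous_alpha = [21, 38, 10, 8, 18, 15, 59, 13, 75, 50, 60, 90, 16, 8, 30, 17, 19, 17]
present_alpha = [25.41, 27.19, 8.70, 7.48, 10.69, 14.47, 52.18, 12.04, 76.36,
                 61.32, 48.15, 88.30, 20.71, 12.92, 27.26, 8.65, 26.84, 20.59]

def sep(targets, predictions):
    """Standard error of prediction: root mean square of (target - prediction)."""
    n = len(targets)
    return math.sqrt(sum((t - y) ** 2 for t, y in zip(targets, predictions)) / n)

sep_previous = sep(xray_alpha, previous_alpha)  # about 7.70
sep_present = sep(xray_alpha, present_alpha)    # about 4.19
```

Both values agree with the SEP row of Table 1 (7.7 and 4.19), which confirms that the SEP is taken over all 18 left-out predictions with a divisor of 18.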

Table 2 shows a comparison of our results with the recent NN prediction results of Hering et al. [7,8] in two different studies, using the same data set. As observed, there is a significant reduction in prediction error for all three structure parameters, especially for β sheet and turns contents.

Table 1

Comparison of the standard error of prediction results of this study with the results of the previous study

| Protein | % α Helix: X-ray | % α Helix: Previous study | % α Helix: Present study | % β Sheet: X-ray | % β Sheet: Previous study | % β Sheet: Present study | % Turns: X-ray | % Turns: Previous study | % Turns: Present study |
|---|---|---|---|---|---|---|---|---|---|
| Alcohol dehydrogenase | 29 | 21 | 25.41 | 40 | 38 | 41.64 | 19 | 19 | 17.44 |
| Trypsin inhibitor | 26 | 38 | 27.19 | 45 | 33 | 46.20 | 16 | 19 | 18.94 |
| Carbonic anhydrase | 16 | 10 | 8.70 | 45 | 57 | 45.87 | 25 | 22 | 26.97 |
| Concanavalin A | 3 | 8 | 7.48 | 65 | 59 | 62.45 | 22 | 24 | 22.24 |
| Chymotrypsinogen | 12 | 18 | 10.69 | 49 | 44 | 51.93 | 23 | 21 | 21.52 |
| Chymotrypsin | 11 | 15 | 14.47 | 50 | 48 | 47.52 | 25 | 23 | 27.87 |
| Cytochrome c | 49 | 59 | 52.18 | 11 | 14 | 13.78 | 22 | 14 | 21.62 |
| Elastase | 10 | 13 | 12.04 | 46 | 53 | 51.06 | 28 | 21 | 21.19 |
| Hemoglobin | 86 | 75 | 76.36 | 0 | 5 | 3.87 | 8 | 13 | 9.02 |
| Insulin | 61 | 50 | 61.32 | 15 | 19 | 17.42 | 12 | 16 | 16.29 |
| Lysozyme | 46 | 60 | 48.15 | 19 | 10 | 12.10 | 23 | 13 | 17.27 |
| Myoglobin | 88 | 90 | 88.30 | 0 | 3 | 0.60 | 7 | 13 | 8.55 |
| Nuclease | 26 | 16 | 20.71 | 37 | 42 | 42.29 | 23 | 22 | 22.02 |
| Prealbumin | 6 | 8 | 12.92 | 61 | 55 | 57.05 | 19 | 23 | 18.04 |
| Papain | 28 | 30 | 27.26 | 29 | 31 | 35.39 | 18 | 20 | 21.02 |
| Protease | 11 | 17 | 8.65 | 57 | 48 | 54.58 | 18 | 19 | 20.10 |
| Ribonuclease A | 23 | 19 | 26.84 | 46 | 47 | 45.79 | 21 | 22 | 16.66 |
| Ribonuclease S | 23 | 17 | 20.59 | 53 | 47 | 52.79 | 15 | 23 | 19.26 |
| SEP | | 7.7 | 4.19 | | 6.4 | 3.49 | | 4.8 | 3.15 |

Table 2

Comparison of the standard error of prediction results of this study with the results of Hering et al. [7,8]

| | SEP % α Helix | SEP % β Sheet | SEP % Turns |
|---|---|---|---|
| This study | 4.19 | 3.49 | 3.15 |
| Hering et al. [7] | 4.5 | 6.2 | 4.6 |
| Hering et al. [8] | 4.6 | 5.7 | 4.4 |

Predicting the secondary structure of proteins using neural networks can be considered a multidimensional curve-fitting problem. The NN training algorithm tries to fit a surface to the data points (p_i, t_i), where the coordinate p_i is an N-dimensional vector representing the FTIR spectrum and the value of the surface at p_i is the target value, t_i, representing the known structure parameter of the ith protein. When the FTIR spectrum of a new protein is provided, the corresponding vector p is calculated and fed to the NN, which simply reads out, or interpolates, the structure parameter t from the surface that the model represents. The accuracy of the model depends on how well the surface represents the physical phenomenon behind the FTIR measurement process.

One of the factors that affect the accuracy is the number of FTIR spectra available for training and how well they cover the different structural classes. With a small number of spectra, if the NN is overtrained to make the training error very small, the fitted surface might deviate considerably from the actual surface. Therefore, the prediction error may be large when the network is tested with a new vector p. This is the well-known generalization problem. To improve generalization, the complexity of the NN should be kept sufficiently low. It has been shown that the required training set size for no overfitting and good generalization of a feedforward NN is N � 38nw, where nw is the number of connection weights [13]. Thus, using the available data set of 18 FTIR spectra, it should be possible to train NNs with two or three hidden units, one output unit, and 13 inputs, with good generalization. Based on this assumption, in our previous work we obtained the prediction error values mentioned above using resilient back-propagation training and a form of cross validation.

Clearly, a larger training data set should improve generalization. In the absence of original data, artificial data generated by averaging different combinations of input data, as described above, force the multidimensional surface to pass through linearly interpolated points. This method helps to reduce prediction error and error variance as long as the input data are reliable. The interpolated data do not introduce any new features to the training set. However, they help to regularize the network by forcing its response to be smoother and less likely to overfit. In fact, when we applied the artificially increased data set for training in our earlier method, we obtained an SEP of 3.9% for α helix; however, the SEP for β sheet increased slightly to 7.0% and that for turns decreased slightly to 4.4%. The almost 50% reduction in the SEP for α helix clearly demonstrates the advantage of incorporating artificial data. The algorithm that we used in this work is much simpler and yields a much lower SEP on average. Therefore, we preferred to present this method. However, it should be stressed that if one of the input data points is in error (which could simply be due to normalization of the spectra, as will be discussed below), averaging this data point with other input points may spread the local error to other regions of the interpolation surface.

The second factor affecting the accuracy of the method is the accuracy of the training data. FTIR spectra are plots of log absorption values as a function of infrared wavelength. The absorption depends not only on the protein but also on sample thickness and protein concentration, parameters which are often difficult to control. In addition, subtraction of the background water absorption is usually done by eye, contributing additional uncertainty to the data. It is obvious from the multidimensional surface description of the NN that if the input data points, i.e., the vectors p_i, are not correctly positioned, the surface will be distorted and the resulting predictions will be in error. If it were possible to keep the relative levels of the spectra of different proteins at a constant value for a predefined concentration and sample thickness, no such problem would arise. However, in practice this ideal condition is difficult to realize even in the same laboratory. Normalization of the spectra helps to reduce this problem if the proteins have similar properties. However, to normalize the spectra one usually has to scale and shift them to satisfy certain conditions. These operations may change the spectral shapes of proteins in relation to each other and therefore cause incorrect training of the NN and, hence, errors in prediction. Typical normalizations used in practice normalize the area under the amide I band while keeping the absorption value at 1700 cm⁻¹ at a fixed value, normalize the maximum absorbance swing in the amide I band to a constant value, or, in the case of the DCT, normalize the transform coefficients of the absorbance spectra by the average value. One can increase the number of different normalizations by including other parameters, such as statistical moments. Clearly, each normalization results in a different set of p_i, and therefore the predictions may be slightly different. In this report we have presented predictions obtained with a normalization that fixes the absorbance maximum of the spectra in the amide I band at a constant value, which gave better results than the other normalizations. Which type of normalization is best is an open question.
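Three of the normalizations mentioned above can be sketched as follows. The exact recipes are not given in the paper, so each function is one plausible reading: the function names, the choice of 1.0 as the constant, fixing the 1700 cm⁻¹ value at zero, and the uniform 1 cm⁻¹ point spacing are all assumptions. The five-point spectrum is a toy example.

```python
def normalize_peak(spectrum, peak=1.0):
    """Scale so that the absorbance maximum equals `peak`
    (one reading of 'fixes the absorbance maximum at a constant value')."""
    m = max(spectrum)  # assumes a non-zero maximum
    return [peak * a / m for a in spectrum]

def normalize_swing(spectrum, swing=1.0):
    """Shift and scale so that (max - min) equals `swing` and the minimum is 0.
    Assumes the spectrum is not flat."""
    lo, hi = min(spectrum), max(spectrum)
    return [swing * (a - lo) / (hi - lo) for a in spectrum]

def normalize_area(spectrum, area=1.0):
    """Shift so the last point (taken as 1700 cm^-1) is zero, then scale the
    trapezoidal area to `area`. Assumes unit spacing between points."""
    shifted = [a - spectrum[-1] for a in spectrum]
    trapezoid = sum((shifted[i] + shifted[i + 1]) / 2.0
                    for i in range(len(shifted) - 1))
    return [area * a / trapezoid for a in shifted]

def trapezoid_area(values):
    """Trapezoidal area with unit spacing, used to check normalize_area."""
    return sum((values[i] + values[i + 1]) / 2.0 for i in range(len(values) - 1))

toy = [0.4, 1.0, 2.0, 1.2, 0.2]  # toy absorbances, not real data
by_peak = normalize_peak(toy)
by_swing = normalize_swing(toy)
by_area = normalize_area(toy)
```

The three results differ in both offset and scale, which illustrates the point made in the text: each normalization places the vectors p_i differently, so the trained surface and the predictions can change with the choice.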

Table 3 compares the absolute prediction errors of the current method with those of the previous method and the methods used by Lee et al. [4] and Hering et al. [7]. The average absolute error and the standard deviation of the absolute error have decreased by almost 50%, partly due to the smoothing effect of the newly generated artificial data and partly due to the training algorithm used. On the other hand, although the correlation of our previous work with that of Hering et al. was 0.8, as reported by Hering, the current errors are much less correlated. This can be expected given the considerable decrease in the SEP of the present method. However, the maximum absolute error for the current work occurs for lysozyme, which agrees with the previous work and with the results of Hering et al. [7]. Lysozyme also has the second largest absolute error in the results of Lee et al. [4]. Similarly, the average absolute error for hemoglobin is also large, in agreement with the other results. At the opposite end, the average absolute errors for trypsin inhibitor and myoglobin have dropped dramatically.

Table 3

Comparison of the average absolute prediction errors of the current method with the previous method and the methods of Lee et al. and Hering et al.

| Protein | Current method | Previous method | Lee et al. | Hering et al. |
|---|---|---|---|---|
| Alcohol dehydrogenase | 2.26 | 3.33 | 1 | 2 |
| Trypsin inhibitor | 1.78 | 9 | 14 | 5.01 |
| Carbonic anhydrase | 3.38 | 7 | 7 | 7.28 |
| Concanavalin A | 2.42 | 4.33 | 8 | 1.87 |
| Chymotrypsinogen | 1.91 | 4.33 | 4.33 | 3.41 |
| Chymotrypsin | 2.94 | 2.67 | 1.67 | 2.2 |
| Cytochrome c | 2.11 | 7 | 5 | 6.43 |
| Elastase | 4.64 | 5.67 | 4.33 | 5.49 |
| Hemoglobin | 4.84 | 7 | 10 | 3.46 |
| Insulin | 2.34 | 6.33 | 5 | 4.42 |
| Lysozyme | 4.93 | 11 | 8 | 8.68 |
| Myoglobin | 0.81 | 3.67 | 6.67 | 2.36 |
| Nuclease | 3.85 | 5.33 | 2.67 | 3.99 |
| Prealbumin | 3.94 | 4 | 10.67 | 2.93 |
| Papain | 3.38 | 2 | 4.67 | 4.21 |
| Protease | 2.29 | 5.33 | 5.33 | 5.74 |
| Ribonuclease A | 2.80 | 2 | 4 | 1.29 |
| Ribonuclease S | 2.29 | 6.67 | 5 | 6 |
| Average | 2.94 | 5.37 | 5.96 | 4.27 |
| Standard deviation | 1.14 | 2.38 | 3.26 | 2.05 |
| Correlation coefficient | | 0.28 | 0.10 | 0.31 |

To test the effectiveness of the method in predicting the secondary structure content of proteins outside the original 18-protein data set, we trained NNs using all 18 proteins and tested these networks by applying the FTIR spectrum of pig citrate synthase as input. Two sets of NNs were obtained, one for the extended training data set and the other for the original data set. Each NN was separately optimized. The SEP values for the NNs trained with the original data set were 7.03% for α helix, 12.00% for β sheet, and 4.72% for turn structures. The corresponding values for the NNs trained with the extended data set were 4.19, 3.49, and 3.15%, respectively. Table 4 shows the secondary structure contents as determined by X-ray crystallography and the corresponding NN predictions. The absolute errors are within or close to the SEPs obtained. The average absolute error of the structure parameters for NNs trained on the original data set alone is considerably larger than that for NNs trained on the extended data set. However, this result should not be taken to imply that the prediction errors for any newly introduced protein will always be within the SEP. The SEP is a statistical measure of confidence of the NN. Prediction errors larger than the SEP are possible, although their probability of occurrence becomes smaller as the prediction error increases.

Table 4

Secondary structure predictions for pig citrate synthase, a protein outside the original training set

| | % α Helix | % β Sheet | % Turns |
|---|---|---|---|
| X-ray crystallography | 56.5 | 1.4 | 16.5 |
| NN prediction with extended data set | 52.3 | 3.9 | 19.1 |
| Absolute error with extended data set | 4.2 | 2.5 | 2.6 |
| NN prediction with original data set | 54.0 | 14.0 | 14.4 |
| Absolute error with original data set | 2.5 | 12.6 | 2.1 |
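The absolute errors reported in Table 4 for pig citrate synthase, and the average absolute errors compared in the text, can be recomputed directly from the tabulated values. The dictionary keys and function name below are chosen for illustration only.

```python
# Values taken from Table 4 (percent secondary structure content).
xray = {"alpha": 56.5, "beta": 1.4, "turns": 16.5}
extended = {"alpha": 52.3, "beta": 3.9, "turns": 19.1}  # NNs, extended data set
original = {"alpha": 54.0, "beta": 14.0, "turns": 14.4}  # NNs, original data set

def abs_errors(pred, ref):
    """Absolute error per structure type, rounded to one decimal as in Table 4."""
    return {k: round(abs(pred[k] - ref[k]), 1) for k in ref}

err_ext = abs_errors(extended, xray)
err_orig = abs_errors(original, xray)

# Average absolute error over the three structure types.
mean_ext = sum(err_ext.values()) / len(err_ext)
mean_orig = sum(err_orig.values()) / len(err_orig)
```

The averages come out at roughly 3.1 for the extended data set and 5.7 for the original data set. The difference is driven almost entirely by the β sheet error (2.5 versus 12.6), while the α helix and turns errors are slightly smaller for the original data set.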

The significant reduction in the error of prediction presented in our study must be viewed with caution. This is because, in addition to the problems highlighted earlier, there are other factors that may influence the accuracy of the results. Analysis of secondary structure content by FTIR spectroscopy, using methods such as ours, depends on protein reference spectra whose secondary structures have been deduced from X-ray crystallography, which is a direct method for determining the complete three-dimensional structure of proteins. However, there is more than one way of defining the beginning and end of certain secondary structural elements from X-ray data. Various methods, such as those reported by Levitt and Greer [14] and Kabsch and Sander [15], are used to quantify the contents of different secondary structures from the X-ray data. Levitt and Greer's method is based on dihedral angles and bond distances associated with the α carbons in conjunction with hydrogen bonding. In contrast, Kabsch and Sander's method is based on hydrogen-bonding patterns. Often the values of secondary structures deduced by these methods differ significantly for the same protein. This is another possible source of error in secondary structure predictions using FTIR spectroscopy. Furthermore, there are specific limitations associated with the various methodologies used for predicting secondary structure from FTIR spectra, such as curve fitting and multivariate and neural network analyses. With multivariate and neural network approaches, the number and diversity of protein FTIR spectra used in the reference set also influence the quality of predictions. With curve-fitting techniques, assignment of peaks can often be a serious problem in accurate secondary structure estimation. Therefore, one has to be aware of how these factors affect the error associated with the estimation of secondary structures from FTIR spectra.

In summary, the results presented in this paper demonstrate that in the absence of sufficient training data, artificially generated data help to increase the prediction accuracy. However, the ideal solution to the problem is to sufficiently enrich the reference set by including original spectra of proteins covering the diversity of protein structural classes.


Acknowledgments

This work has been supported by the Royal Society

of UK through a collaborative travel grant and by the

Scientific and Technical Research Council of Turkey

(Project No. 100E036).

References

[1] P.I. Haris, F. Severcan, FTIR spectroscopic characterization of protein structure in aqueous and non-aqueous media, J. Mol. Catal. B-Enzymatic 7 (1999) 207–221.

[2] D.M. Byler, H. Susi, Examination of the secondary structure of proteins by deconvolved FTIR spectra, Biopolymers 25 (1986) 469–487.

[3] F. Dousseau, M. Pezolet, Determination of the secondary structure content of proteins in aqueous solutions from their Amide-I and Amide-II infrared bands: comparison between classical and partial least-squares methods, Biochemistry 29 (1990) 8771–8779.

[4] D.C. Lee, P.I. Haris, D. Chapman, C.R. Mitchell, Determination of protein secondary structure using factor-analysis of infrared spectra, Biochemistry 29 (1990) 9185–9193.

[5] P. Pancoska, V. Janota, T.A. Keiderling, Novel matrix descriptor for secondary structure segments in proteins: demonstration of predictability from circular dichroism spectra, Anal. Biochem. 267 (1999) 72–83.

[6] M. Severcan, F. Severcan, P.I. Haris, Estimation of protein secondary structure from FTIR spectra using neural networks, J. Mol. Struct. 565-566 (2001) 383–387.

[7] J.A. Hering, P.R. Innocent, P.I. Haris, An alternative method for rapid quantification of protein secondary structure from FTIR spectra using neural networks, Spectroscopy 16 (2002) 53–69.

[8] J.A. Hering, P.R. Innocent, P.I. Haris, Automatic amide I frequency selection for rapid quantification of protein secondary structure from Fourier transform infrared spectra of proteins, Proteomics 2 (2002) 839–849.

[9] F. Severcan, P.I. Haris, Fourier transform infrared spectroscopy suggests unfolding of loop structures precedes complete unfolding of pig citrate synthase, Biopolymers 69 (2003) 440–447.

[10] A.K. Jain, Fundamentals of Digital Image Processing, Prentice-Hall, Englewood Cliffs, 1986, pp. 150–154.

[11] D.J.C. MacKay, Bayesian interpolation, Neural Comput. 4 (1992) 415–447.

[12] F.D. Foresee, M.T. Hagan, Gauss-Newton approximation to Bayesian learning, in: Proceedings of the International Joint Conference on Neural Networks, 1997.

[13] R. Lange, R. Maenner, Quantifying a critical training set size for generalization and overfitting using teacher neural networks, in: M. Marinaro, P. Morasso (Eds.), Proceedings of the International Conference on Artificial Neural Networks (ICANN), London, GB 1, 1994, pp. 497–500.

[14] M. Levitt, J. Greer, Automatic identification of secondary structure in globular proteins, J. Mol. Biol. 114 (1977) 181–239.

[15] W. Kabsch, C. Sander, Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features, Biopolymers 22 (1983) 2577–2637.