Analytical Biochemistry 319 (2003) 114–121
www.elsevier.com/locate/yabio
Protein concentration is not an absolute prerequisite for the determination of secondary structure from circular
dichroism spectra: a new scaling method
Vincent Raussens, Jean-Marie Ruysschaert, and Erik Goormaghtigh*
Laboratory for Structure and Function of Biological Membranes, Structural Biology and Bioinformatics Center, Free University of Brussels,
Campus Plaine, CP 206/2, Boulevard du Triomphe, Brussels B-1050, Belgium
Received 25 February 2003
Abstract
We present here a simple and rapid method to extract good estimates of protein secondary structure content from circular
dichroism (CD) spectra without any prior knowledge of the sample concentration. The method involves two steps: first, a
single-wavelength normalization procedure and, second, the application for each secondary structure of a quadratic model based on one or
two wavelength intensities. These quadratic models were derived by a cross-validation analysis of a new protein CD spectrum
database. Tested on CD spectra of proteins at different concentrations, the normalization was shown to render the method virtually
independent of the sample concentration. Further tests on CD spectra not recorded in our laboratory showed that our quadratic
models are of general applicability. Even though the success of the present approach is less than that for currently available
methods, its simplicity and the fact that the concentration is not needed may be very attractive for the study of small amounts of
membrane proteins or peptides for which an accurate concentration determination might be very difficult or impossible to obtain.
© 2003 Elsevier Science (USA). All rights reserved.
Keywords: Circular dichroism; Protein secondary structure; Protein concentration
Circular dichroism (CD) is one of the most widely used techniques for determining the structure of proteins.
Far-UV CD (below 240–250 nm) ellipticity is highly sensitive to the different secondary structures found in
proteins and polypeptides, each secondary structure type having bands with characteristic wavelengths and
intensities. Many mathematical methods have been devised to extract this structure information from CD spectra.
They are all based on the representation of a spectrum as a linear combination of basis spectra. The basis
spectra are characteristic of the various secondary structure elements or a combination of these. The major
methods for extracting information from spectra are multilinear regression [1], singular value decomposition [2],
ridge regression [3], principal component factor analysis [4], convex constraint analysis [5], neural network [6],
and self-consistent method [7] (for a review see [8,9]). Because band intensity is sensitive to the different
secondary structures, the use of these methods requires precise knowledge of the sample concentration. In
practice, this step becomes limiting when working with small amounts of rare biological materials. Colorimetric
assays (such as Lowry et al. [10], Bradford [11], or bicinchoninic acid [12]) are not accurate enough because
most of them depend (at least in part) on the specific amino acid content of the protein studied, and this
content can differ from that of the commonly used standard (BSA).¹ The discrepancy between the calculated and
the actual concentration can sometimes reach several tens of percent [13], especially with membrane proteins or
small peptides. Probably the most used method for protein concentration determination is the measurement of the
solution absorbance at 280 nm [14]. Yet this technique needs precise knowledge of the protein extinction
coefficient in native conditions and might be perturbed by light scattering, especially for membrane proteins
in the presence of micelle-forming detergent or in the presence of proteoliposomes. For insoluble proteins, such
as prions or amyloid, thin films or gels are examined, but concentrations cannot be precisely defined. The most
accurate concentration determination is certainly quantitative amino acid analysis, but the technique is not
always accessible and requires a relatively large amount of material. Alternatively, methods directly related
to the amide bond content, like Biuret, or to the total nitrogen content can be utilized, but these methods also
require large amounts of material because they are not sensitive. Furthermore, they are also subject to
interference from different agents quite commonly used in protein study (e.g., reducing agents for the Biuret).
* Corresponding author. Fax: +32-26505382.
E-mail address: [email protected] (E. Goormaghtigh).
¹ Abbreviations used: BSA, bovine serum albumin; RaSP, rationally selected proteins; PC, principal component;
PCA, principal component analysis.
0003-2697/03/$ - see front matter © 2003 Elsevier Science (USA). All rights reserved. doi:10.1016/S0003-2697(03)00285-9
V. Raussens et al. / Analytical Biochemistry 319 (2003) 114–121 115
We investigated here the possibility of using a simple
normalization of the CD spectra that does not require
knowledge of the protein concentration. Even though
the success of the present approach is less than that for
current methods that require the input of protein concentration, the fact that the concentration is not needed
may be very attractive for the study of small amounts of
materials, for membrane proteins and peptides for
which an accurate concentration determination might be
very difficult to obtain.
In 2001, McPhie [15] presented the first procedure that does not require knowledge of the protein
concentration; it is based on the analysis of an intensive value of CD spectra, the Kuhn g factor. We present
here another approach that is much simpler to use and
reaches the same final accuracy.
Materials and methods
RaSP database
The set of reference proteins used for this study is an
‘‘optimal’’ basis set that is described in another paper
(K.O. Oberg, J.-M. Ruysschaert, and E. Goormaghtigh,
unpublished). It represents a wide range of helix and
sheet fractional content values and 60 different protein
domain folds. Briefly, the set members were chosen from
120 proteins that had been identified as potential reference set proteins through a search of the protein crystal
structure databases CATH [16], SCOP [17–19], and
PDB_SELECT [20,21] and commercial protein sources
(Sigma-Aldrich and Fluka). Proteins were chosen based
on their fold. The final selection was based on other
criteria including available purity as checked by densi-
tometry of SDS–PAGE analyses, crystal structure
quality, nonprotein contaminants, sufficient solubility, and stability. The final set of 50 proteins fully spans
several different ‘‘conformational spaces’’ as described
by CATH, has fractional content in the different sec-
ondary structures, and has distributions of structures
that reflect the natural abundances found in the PDB.
Briefly, the proteins (sorted by helix content as in Fig. 2)
are the following: (1) trypsin inhibitor (soybean), (2) avidin, (3) erabutoxin b, (4) concanavalin A,
(5) metallothionein II, (6) α-hemolysin (alpha-toxin), (7) lectin (lentil), (8) superoxide dismutase (Cu, Zn),
(9) immunoglobulin gamma, (10) xylanase, (11) trypsinogen, (12) α-chymotrypsinogen A, (13) carbonic anhydrase,
(14) thaumatin, (15) pepsinogen, (16) rennin (chymosin b), (17) pepsin, (18) trypsin inhibitor (BPTI),
(19) ubiquitin, (20) monellin, (21) ribonuclease A, (22) ricin, (23) papain, (24) alcohol dehydrogenase,
(25) glucose oxidase, (26) ovalbumin, (27) subtilisin BPN′ (nagarse), (28) α-lactalbumin,
(29) subtilisin Carlsberg, (30) lysozyme, (31) penicillin amidohydrolase, (32) DD-transpeptidase,
(33) lipoxygenase-1, (34) phosphoglyceric kinase, (35) peroxidase, (36) dihydropteridine reductase,
(37) triose phosphate isomerase, (38) insulin, (39) cytochrome c, (40) phospholipase A2,
(41) superoxide dismutase (Fe), (42) glutathione S-transferase, (43) parvalbumin, (44) citrate synthetase,
(45) troponin, (46) apolipoprotein E3, N-terminal domain (residues 1–183), (47) hemoglobin, (48) ferritin (apo),
(49) colicin A, C-terminal domain, and (50) myoglobin.
Protein secondary structure evaluation from the DSSP
program output
The secondary structure of the RaSP proteins was
determined with the DSSP program [22]. There are eight
assignments made by DSSP. Six are familiar to protein
chemists: α helix (denoted by H), 3₁₀ helix (G), π helix (I), β sheet (E), turn (T), and unassigned structure
(indicated by a blank space in the DSSP program output).
Unassigned structure has been referred to by many
names, such as irregular, other, disordered, random, or coil. Because of their extremely low frequency,
π helices were not considered in this study.
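The reference contents used later in the regression can be obtained by counting DSSP codes per residue. The following is a minimal sketch, not the authors' code: the function name, the class labels, and the pooling of every code not listed in the text (including π helix) into an "other" class are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical mapping of DSSP one-letter codes to the classes named in the text.
# Codes not listed here (for example I for pi helix) are pooled as "other";
# that pooling is an assumption of this sketch.
DSSP_CLASSES = {
    "H": "alpha_helix",
    "G": "310_helix",
    "E": "beta_sheet",
    "T": "turn",
    " ": "random",  # unassigned residues appear as a blank in DSSP output
}

def secondary_structure_fractions(dssp_string):
    """Return the percentage of residues in each class for one DSSP string."""
    if not dssp_string:
        raise ValueError("empty DSSP assignment string")
    counts = Counter(DSSP_CLASSES.get(code, "other") for code in dssp_string)
    total = len(dssp_string)
    return {name: 100.0 * n / total for name, n in counts.items()}
```

Called on a per-residue assignment string, the function returns percentages that sum to 100 and can serve as the reference values of a regression.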
CD data collection and processing
All protein preparations were desalted by dialysis or
size-exclusion chromatography. CD spectra were collected on a JASCO J-710 CD spectrometer using filtered
protein solutions in 2 mM Hepes, pH 7.2, with an absorbance of ~0.5–0.8 at 192 nm (~0.1 mg/ml) in a
0.1-cm cell. Each CD spectrum was the accumulation of
eight scans at 50 nm/min with a 1-nm slit width and a
time constant of 0.5 s for a nominal resolution of 1.7 nm.
Data were collected from 185 to 260 nm.
Linear regression models—cross-validation
We present here a two-step procedure: (1) in the first
step, the best wavelengths to use for both
the normalization and the regression equations are searched for;
(2) in the second step, the normalization and
the regression equations are applied to any unknown
protein. The second step can be carried out easily with a
pocket calculator using the equations reported in Table 1. The first ‘‘calibration’’ step is presented in detail here.
In an attempt to build a model describing the
secondary structure content, we used either the elliptic-
ities at selected wavelengths or the PC scores. In the
latter case, principal component analysis was per-
formed.
The simplest model relates the ellipticity at one
wavelength to one secondary structure content. For the sake of simplicity, we consider the helix content in
the following description. The model used is linear in the
ellipticity $E_{j,\lambda_i}$ and the square of the ellipticity $E_{j,\lambda_i}^2$, where $j$ is the spectrum
number and $\lambda_i$ is the wavelength index. $E$ represents here the ellipticity of the rescaled spectra
(see below). The model includes a constant, $a_1$, and two proportionality factors ($a_2$ and $a_3$), one for
the ellipticity and one for the square of the ellipticity. For the α helix content, this one-wavelength
($\lambda_i$) model can be written

$$
\begin{pmatrix}
1 & E_{1,\lambda_i} & E_{1,\lambda_i}^2 \\
\vdots & \vdots & \vdots \\
1 & E_{50,\lambda_i} & E_{50,\lambda_i}^2
\end{pmatrix}
\cdot
\begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix}
=
\begin{pmatrix}
f_{\mathrm{helix}\,1} \\ f_{\mathrm{helix}\,2} \\ f_{\mathrm{helix}\,3} \\ \vdots \\ f_{\mathrm{helix}\,49} \\ f_{\mathrm{helix}\,50}
\end{pmatrix}.
\tag{1}
$$
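Eq. (1) is an ordinary linear least-squares problem in the three constants. As a minimal sketch (the function names are hypothetical and NumPy is an assumed tool, not something the paper specifies), the constants can be estimated as follows:

```python
import numpy as np

def fit_one_wavelength_model(E, f_ref):
    """Least-squares constants (a1, a2, a3) of the one-wavelength model, Eq. (1).

    E     : rescaled ellipticities at one wavelength, one value per spectrum
    f_ref : reference structure content (%) for the same spectra
    """
    E = np.asarray(E, dtype=float)
    f_ref = np.asarray(f_ref, dtype=float)
    # Columns of the design matrix: constant, ellipticity, squared ellipticity.
    X = np.column_stack([np.ones_like(E), E, E**2])
    a, *_ = np.linalg.lstsq(X, f_ref, rcond=None)
    return a

def predict_one_wavelength(a, E):
    """Evaluate a1 + a2*E + a3*E**2 for one or several ellipticities."""
    E = np.asarray(E, dtype=float)
    return a[0] + a[1] * E + a[2] * E**2
```

With 50 spectra and only three unknowns the system is strongly overdetermined, which is why a least-squares solution rather than an exact one is sought.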
It can be easily solved for the best constants $a_k$ in the least-squares sense.
For identifying the best wavelength, cross-validation
was carried out. For this purpose, each spectrum was
removed in turn from the database, a model was built
from the remaining spectra, and the removed spectrum
was predicted, yielding a predicted content $f_{\mathrm{helix}\,j}$. This prediction was repeated for all the
spectra of the series. The standard deviation of the difference between
the predicted values $f_{\mathrm{helix}\,j}$ and the ‘‘real’’ $f^{\mathrm{ref}}_{\mathrm{helix}\,j}$ values was
used to evaluate the quality of the model. This
standard deviation characterizes the quality of the model built at wavelength $\lambda_i$.
The whole cross-validation procedure was then repeated for every wavelength $\lambda_i$ of the spectra (every
nanometer). The wavelength with the smallest standard
deviation was retained for further building of a more
accurate model containing a second ellipticity value in
an ascending stepwise manner. The selected wavelength
is called $\lambda_1$ below.
When PCA decomposition was carried out, the scores
were used instead of the ellipticities in the procedure
described above. In the course of the cross-validation
procedure, PCA was carried out for every model.

Table 1
Predictive models for the main secondary structures
Structure   Predictive model (a)                                                                  Std. dev. (%) (b)
α helix     $27.58 - 14.46\,E_{193} - 5.66\,E_{193}^2 + 1.86\,E_{211} - 14.72\,E_{211}^2$         11.9
β sheet     $8.66 + 11.97\,E_{196} + 7.36\,E_{196}^2 - 0.80\,E_{211} + 15.38\,E_{211}^2$          11.1
Turn        $12.49 + 0.28\,E_{234} - 0.49\,E_{234}^2$                                             4.15
Random      $38.9 + 3.14\,E_{193} - 0.56\,E_{193}^2$                                              10.3
(a) All CD spectra were normalized at 207 nm prior to the application of the models. E represents the ellipticity at a given wavelength.
(b) Standard deviation obtained after a cross-validation process for each secondary structure.

Ascending stepwise model building
For the ascending stepwise building of the model, the best wavelength $\lambda_1$ selected in the
cross-validation process described above was retained in the model and a
second one was added as described in Eq. (2) for the α helix content model.

$$
\begin{pmatrix}
1 & E_{1,\lambda_1} & E_{1,\lambda_1}^2 & E_{1,\lambda_i} & E_{1,\lambda_i}^2 \\
\vdots & \vdots & \vdots & \vdots & \vdots \\
1 & E_{50,\lambda_1} & E_{50,\lambda_1}^2 & E_{50,\lambda_i} & E_{50,\lambda_i}^2
\end{pmatrix}
\cdot
\begin{pmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \\ a_5 \end{pmatrix}
=
\begin{pmatrix}
f_{\mathrm{helix}\,1} \\ f_{\mathrm{helix}\,2} \\ f_{\mathrm{helix}\,3} \\ \vdots \\ f_{\mathrm{helix}\,49} \\ f_{\mathrm{helix}\,50}
\end{pmatrix}.
\tag{2}
$$

As described before, the cross-validation procedure
was used, the constants $a_1$ to $a_5$ were obtained and used to predict the secondary structure
$f_{\mathrm{helix}\,j}$ from Eq. (2).
The standard deviation of the difference between the
predicted values $f_{\mathrm{helix}\,j}$ and the reference $f^{\mathrm{ref}}_{\mathrm{helix}\,j}$ values was used
to evaluate the quality of the new model. The
cross-validation procedure was repeated for every wavelength $\lambda_i$ of the spectra. The wavelength with the
smallest standard deviation is retained. The model now
contains two wavelengths, $\lambda_1$ and $\lambda_2$. The same procedure was repeated up to eight times in this
study, defining the eight best wavelengths for the description of a
secondary structure. Finally, the entire procedure was repeated for all the secondary structures considered.
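The leave-one-out cross-validation and the ascending stepwise selection described above can be sketched as follows. This is an illustrative outline under stated assumptions rather than the authors' implementation: the function names are hypothetical, wavelengths are referred to by column index, and the sample standard deviation (`ddof=1`) is an assumed convention.

```python
import numpy as np

def design_matrix(spectra, wavelength_idx):
    """Columns: 1, then E and E**2 for each selected wavelength index (Eqs. (1)-(2))."""
    cols = [np.ones(spectra.shape[0])]
    for i in wavelength_idx:
        cols += [spectra[:, i], spectra[:, i] ** 2]
    return np.column_stack(cols)

def loo_std(spectra, f_ref, wavelength_idx):
    """Leave-one-out standard deviation of (predicted - reference) content."""
    spectra = np.asarray(spectra, dtype=float)
    f_ref = np.asarray(f_ref, dtype=float)
    X = design_matrix(spectra, wavelength_idx)
    n = X.shape[0]
    errors = np.empty(n)
    for j in range(n):
        keep = np.arange(n) != j          # remove spectrum j from the database
        a, *_ = np.linalg.lstsq(X[keep], f_ref[keep], rcond=None)
        errors[j] = X[j] @ a - f_ref[j]   # predict the removed spectrum
    return errors.std(ddof=1)

def ascending_stepwise(spectra, f_ref, max_wavelengths=2):
    """Greedily add the wavelength index that gives the smallest LOO std."""
    spectra = np.asarray(spectra, dtype=float)
    selected, history = [], []
    for _ in range(max_wavelengths):
        candidates = [i for i in range(spectra.shape[1]) if i not in selected]
        scores = [loo_std(spectra, f_ref, selected + [i]) for i in candidates]
        best = int(np.argmin(scores))
        selected.append(candidates[best])
        history.append(scores[best])
    return selected, history
```

Here `spectra` is assumed to hold one rescaled spectrum per row and one wavelength per column; `history` records the cross-validated standard deviation after each added wavelength, which is the quantity plotted against the number of wavelengths in the Results.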
Normalization
Normalization consisted of multiplying every spectrum by a factor such that its ellipticity at a given
wavelength was equal to 1. Each wavelength (every
nanometer) was used in turn as the normalization wavelength, and the whole process described above was repeated.
The normalization wavelength yielding the lowest standard deviation
was determined for every secondary structure.
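A minimal sketch of this rescaling is given below. The function name is hypothetical, and picking the sampled point nearest to the requested wavelength is an assumption that suits spectra recorded every nanometer.

```python
import numpy as np

def normalize_spectrum(wavelengths, ellipticity, norm_wavelength=207.0):
    """Rescale a CD spectrum so that its ellipticity equals 1 at norm_wavelength."""
    wavelengths = np.asarray(wavelengths, dtype=float)
    ellipticity = np.asarray(ellipticity, dtype=float)
    # Use the sampled point closest to the requested wavelength.
    idx = int(np.argmin(np.abs(wavelengths - norm_wavelength)))
    reference = ellipticity[idx]
    if reference == 0.0:
        raise ValueError("ellipticity is zero at the normalization wavelength")
    return ellipticity / reference
```

Because the whole spectrum is divided by one of its own values, multiplying the raw spectrum by any nonzero constant leaves the rescaled spectrum unchanged. Dividing by a negative reference value also inverts the sign of the spectrum.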
Asynchronous map
Generalized two-dimensional correlation spectra
were calculated according to Noda [23,24] and Sasic et al. [25].
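For orientation, the generalized two-dimensional correlation maps can be computed with the Hilbert–Noda transformation matrix. The sketch below is an illustration and not the code used in the paper: the function names and the optional per-wavelength scaling flag are assumptions, and the rows of `spectra` must already be ordered along the perturbation variable (here, helix content).

```python
import numpy as np

def hilbert_noda_matrix(m):
    """N[j, k] = 0 if j == k, else 1 / (pi * (k - j))."""
    j, k = np.indices((m, m))
    diff = (k - j).astype(float)
    N = np.zeros((m, m))
    off_diagonal = diff != 0
    N[off_diagonal] = 1.0 / (np.pi * diff[off_diagonal])
    return N

def correlation_maps(spectra, scale=False):
    """Synchronous and asynchronous maps for spectra of shape (m, n_wavelengths)."""
    Y = np.asarray(spectra, dtype=float)
    m = Y.shape[0]
    D = Y - Y.mean(axis=0)                 # dynamic spectra (mean-centered)
    if scale:
        # Unit variance at each wavelength; fails if a column is constant.
        D = D / D.std(axis=0, ddof=1)
    sync = D.T @ D / (m - 1)
    asyn = D.T @ hilbert_noda_matrix(m) @ D / (m - 1)
    return sync, asyn
```

The synchronous map is symmetric and the asynchronous map is antisymmetric, so an off-diagonal asynchronous peak at a pair of wavelengths indicates intensities that vary out of phase along the series.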
Results
α helix structure
In the first part of the work, we searched for the best spectral normalization. To select the wavelength that
will give the best normalization, we arbitrarily set the
ellipticity of the RaSP50 CD spectra to 1 at every
wavelength in turn. Linear models describing the α helix structure content were built with each of these
normalized sets of spectra. Each model was obtained by a
combined cross-validation/ascending stepwise approach
(see Materials and methods) and the corresponding standard deviation was calculated.
Fig. 1 shows the standard deviation as a function of
the normalization wavelength for models using only one
wavelength (Eq. (1)). The best normalization wavelength appears to be around 206–207 nm, with an
additional region of interest near 245 nm. Such a profile
computed with two- to eight-wavelength models indicates
that the normalization region, 206–207 nm, remains the
best in all cases (not shown).
Fig. 1. Evolution of the best standard deviation for the α helix prediction in cross-validation for the 50 proteins of the database as a
function of the wavelength at which the CD spectra were normalized.
The model contains one wavelength.
In the second part of the work, we investigated the
effect of incorporating an increasing number of wavelengths in the construction of the model. Fig. 2 shows
the evolution of the prediction standard deviation for
the determination of the α helix structure content by cross-validation as a function of the number of
wavelengths included in the model after normalization of all
the spectra at 207 nm. Clearly, the ellipticity at a single
wavelength (193 nm) contains most of the information
necessary to build the model. In Fig. 3, the normalized
spectra have been sorted according to the helix content (inset) and plotted. It is apparent from the rescaled
spectra (Fig. 3) that the region near 190 nm is correlated
to the helical content. After normalization (by a negative factor for most spectra), the ellipticity decreases as
the helix content increases. Addition of a second wavelength (211 nm) significantly improves the description of
the helix content (standard deviation ~12%). Addition of a third wavelength does not improve the prediction
model any further, and further additions degrade the prediction, indicating that these other
wavelengths do not contain more useful independent
information for describing the helix content but rather
bring in noisy unrelated information. Fig. 4 illustrates
the relation between the actual α helix content and the predicted one obtained by cross-validation for a model
containing two wavelengths, 193 and 211 nm, after
normalization of the spectra at 207 nm. The standard deviation of the prediction is 11.9%. It is important to
compare this value to the standard deviation of the actual content distribution. For the α helix, the spread of
the helical content in the RaSP50 database is characterized by a standard deviation of 22%. This reference
value is shown in Fig. 2 as the starting point (this would
correspond to a model calculated with 0 wavelengths
taken into account). It can be observed in Fig. 4 that a
series of spectra (circled spectra 45, 46, 47, 49, and 50)
with high helix content is predicted with too low a helix
content. We could not find any rational explanation
for the poor prediction for these five proteins.
Fig. 2. Evolution of the standard deviation (%) for the determination
of the α helix structure content by cross-validation as a function of the number of wavelengths added to the model. All the spectra were
normalized at 207 nm. The wavelengths at which the best standard
deviations are found are indicated on the curve. The 0 added wavelength point refers to the standard deviation
characterizing the distribution of the helix content in the RaSP database.
Fig. 3. Series of spectra sorted according to the α helix content after rescaling at 207 nm. Inset: evolution of the α helix content with the spectrum
number.
Fig. 4. Relation between the actual α helix content and the predicted α helix content obtained by cross-validation for a
model containing the ellipticities at two wavelengths: 193 and 211 nm.
All the spectra were normalized at 207 nm for building this model. The
model built by linear regression is the central dotted line. The external
dotted lines are drawn at 1 standard deviation. The numbers identify the spectra in the RaSP database. The circled proteins are troponin
(45); N-terminal apolipoprotein E3(1–183) (46); hemoglobin (47); colicin A (49); and myoglobin (50). The predictive model used to obtain
the helix content is described in Table 1.
Decomposition of the spectrum series into principal
components is another way to extract uncorrelated information (principal components are the eigenvectors of
the spectrum correlation matrix). When the normalization/cross-validation process was repeated for building
models from the first three PCs, the best model built was
obtained with a normalization at 207 nm, in perfect
agreement with the previous data. A model using the 190–240 nm range and including three PCs described the
helix content with a standard deviation of 11.7% (data
not shown). Interestingly, the series of spectra circled in
Fig. 4 appeared at the same position with respect to the
regression line, underlining the intrinsic inability of the approach
to correctly predict the helix content in these proteins.
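A principal component variant of the model replaces ellipticities by PC scores. The sketch below is one possible reading under stated assumptions: the function names are hypothetical, and the decomposition is done by singular value decomposition of the mean-centered spectra (equivalent to diagonalizing their covariance matrix), which may differ in scaling from the correlation matrix mentioned in the text.

```python
import numpy as np

def pca_scores(spectra, n_components=3):
    """Scores of each spectrum on the first principal components.

    spectra : array of shape (n_spectra, n_wavelengths), already rescaled
    """
    X = np.asarray(spectra, dtype=float)
    centered = X - X.mean(axis=0)
    # Rows of Vt are orthonormal principal axes, ordered by singular value.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ Vt[:n_components].T

def quadratic_design(scores):
    """Constant, score, and squared-score columns, mirroring Eq. (1)."""
    scores = np.asarray(scores, dtype=float)
    return np.column_stack([np.ones(len(scores)), scores, scores**2])
```

The resulting design matrix can be passed to the same least-squares and cross-validation steps as the ellipticity-based models; to follow the text, the decomposition would be recomputed inside each cross-validation round.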
β sheet structure
A similar analysis has been carried out for the β sheet structure. The best normalization wavelength was found
to be 207 nm again (not shown). As was the case for the
α helix, two wavelengths, 196 and 211 nm, are sufficient to describe the information contained in the spectra that
is related to the β sheet structure content. As for the helix structure, adding more than two wavelengths
resulted in poorer predictive models when tested in cross-validation mode. The standard deviation of the
prediction is now 11.1% (Fig. 5) (the actual standard
deviation in RaSP50 for β sheet is 17.7%).
Fig. 5. Relation between the actual β sheet content and the predicted β sheet content obtained by cross-validation for a
model containing two wavelengths: 196 and 211 nm. The model built
by linear regression is the central dotted line. The external dotted lines
are drawn at 1 standard deviation. The numbers identify the spectra in the RaSP database. The predictive model used to obtain the
β sheet content is described in Table 1.
Fig. 6. Prediction of the α helix content for hemoglobin (Hemo alpha) and pepsin (Peps alpha) and of the β sheet content for hemoglobin
(Hemo beta) and pepsin (Peps beta). The spectra of pepsin and hemoglobin were recorded for different protein concentrations and
different pathlengths as explained in the text. They were normalized at
207 nm before prediction. The arrows indicate the corresponding values determined for the proteins in the database.
Other structures
The other structures defined by DSSP have been
tested in the same way (see Table 1). For turns, random structures, and their sum, the standard deviations
of the prediction are 4, 11, and 12%, respectively.
Yet, this is barely 1% better than the reference standard
deviation characterizing the distribution of the structures in the database. Summing the 3₁₀ helix and the
α helix contents did not improve the prediction for the helices in general (not shown).
The investigation described so far deals with the ac-
curacy of the structure determination from rescaled CD
spectra. Another question is the precision obtained for
data recorded at different protein concentrations and
with different cell pathlengths. To address this question,
an investigator who took no part in the work described so far conducted a series of dilutions of two test
proteins: pepsin (17) and hemoglobin (47). Spectra were
recorded at 0.05, 0.1, 0.5, and 1 mg/ml in cells with
pathlengths of 0.1, 0.2, 0.5, and 1 mm, yielding 16
spectra for each protein. Fig. 6 shows the predicted α
helix and β sheet contents obtained with the predictive models determined previously (Table 1) after
normalization at 207 nm. The most extreme conditions (i.e., the two most-diluted samples with a pathlength of 0.1 mm
(spectra too noisy) and the most concentrated sample
with a pathlength of 1 mm (intensity too high)) were out
of range and removed from the figure. The remaining
spectra, as judged from Fig. 6, allowed a quite precise
determination of the secondary structure content, demonstrating the validity of the normalization procedure.
Detailed inspection of the data reported in Fig. 6 reveals that there is no correlation between the deviation of a
particular measurement and the protein concentration
or the cell pathlength (not shown).
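The concentration independence observed here follows from the form of the normalization. As a short worked derivation, assume (this notation is not in the original) that the measured ellipticity is proportional to the concentration $c$ and the pathlength $l$, with $[\theta](\lambda)$ a concentration-independent ellipticity and $\kappa$ a unit-dependent constant:

$$
\theta_{\mathrm{obs}}(\lambda) = \kappa\, c\, l\, [\theta](\lambda)
\quad\Longrightarrow\quad
E_{\lambda} = \frac{\theta_{\mathrm{obs}}(\lambda)}{\theta_{\mathrm{obs}}(207)} = \frac{[\theta](\lambda)}{[\theta](207)} .
$$

The product $c\,l$ cancels, so the rescaled ellipticities entering the predictive models do not depend on it, provided that the instrument response stays linear and that $\theta_{\mathrm{obs}}(207) \neq 0$. This is consistent with the exclusion of the noisiest and the most intense spectra.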
Portability
To establish the portability of the method, we obtained the CD spectra of 18 proteins identical to proteins
included in the series used for this work (RaSP50
database). The 18 CD spectra were extracted from a 42-protein database combining spectra from different
origins (for a description, see [26,27]). These spectra were
rescaled at 207 nm as previously described, and the α helix and β sheet contents were predicted with the
predictive models (Table 1). For both structures, the predictions agreed with the values obtained from our own
database, with a standard deviation of 7% (not shown). It can
therefore be concluded that the parameters determined with the RaSP database used in
this work can be transferred to other spectra recorded
under completely different conditions with reasonably
good success.
Fig. 7. Asynchronous correlation map of the spectra series presented in
Fig. 2 after rescaling at 207 nm. For equally weighting the spectral
variations at each wavelength, normalization of the data across the
spectra had been realized for each wavelength before computation.
Discussion
All the current methods available for protein CD
spectra analysis need an accurate protein concentration
determination prior to any calculation. This step is
crucial to obtain good and valuable results. In practice,
the protein concentration determination with the accu-
racy needed by these methods is not always easy to
obtain for reasons briefly described in the introduction. In some cases, the protein concentration of the sample is
almost impossible to assess. Available quantities can be
very small, the proteins or peptides can be difficult to
handle for quantitative measurements (e.g., membrane
and hydrophobic proteins or peptides, proteins that
aggregate such as prion, β-amyloid peptide, etc., and proteins that have to be analyzed in gels or thin films).
In an attempt to overcome this problem, we present here a simple and rapid method to obtain rather good
estimates of the secondary structure contents without prior
knowledge of the sample concentration.
The method involves two steps: first a normalization
procedure at one wavelength and second the application
of a quadratic model equation including one or two
wavelength intensities.
It is important to assess the rationale behind the process described above. It appears that at least three
wavelengths are correlated with the secondary structure
content and are independent. The first one is the normalization wavelength (207 nm). It is indeed necessary
that a good correlation exists between the normalization
wavelength and the structure content. The absence of
such a correlation would result in normalized spectra in
which the information relative to the structure is destroyed. The first wavelength included in the model must
obviously bring information that is independent of the
information contained in the wavelength used for normalization. Once a first wavelength is added, the
information content at any wavelength that is correlated with
it becomes zero. The cross-validation procedure indicates that, in the cases of α helix and β sheet, there is
also a second unrelated wavelength that can be included in the model. The intensity at this second added
wavelength has to be uncorrelated with the first one and with
the normalization wavelength. To test this, we show in
Fig. 7 the asynchronous correlation map for the spectra
normalized at 207 nm. A major asynchronous correlation was found between the spectral regions around 193
and 211 nm, in good agreement with the selected wavelengths in the α helix and the β sheet predictive models
(Table 1). This observation supports the selection of
the wavelengths obtained from Fig. 2 and validates the
approach presented here. Altogether, it seems that three
unrelated types of information in a CD spectrum can
be used to describe and predict protein secondary
structures; one, probably the most useful one, is used for
normalization. The loss of information with respect to
concentration-normalized spectra is therefore probably really significant.
Prediction of α helix content results in a standard deviation of 11.9%. Yet, a group of proteins appears
specifically ill predicted in Fig. 4, namely troponin (45),
N-terminal apolipoprotein E3(1–183) (46), hemoglobin
(47), colicin A (49), and myoglobin (50). It can be hypothesized that useful information for predicting these
proteins with high helix content has been lost in the normalization process. This is particularly striking in
view of the slightly better result (standard deviation = 11.1%) achieved for the β sheet structure. The
other structures defined by DSSP (turns, 3₁₀ helices,
random) were predicted with standard deviations of 4, 3,
and 11%, respectively, i.e., barely better than the standard
deviations of the distribution of these structures in the
database. Consequently, summing the α and 3₁₀ helix contents did not improve the prediction (not shown).
The analysis of a series of spectra of pepsin and he-
moglobin recorded at different concentrations with dif-
ferent pathlengths demonstrates that the results
obtained are basically independent of the protein con-
centration, provided that the spectra are of sufficient
quality. This confirms the validity of our normalization
procedure.
Finally, the analysis of spectra recorded and published by others [26,27] using our predictive models
indicates that the simple equations (Table 1) that we
derived from the analysis of our new rationally selected
protein database are widely applicable.
Recently, McPhie [15] presented a procedure, based
on the analysis of an intensive value of CD spectra, the
Kuhn g factor, that for the first time does not require knowledge of the protein concentration. We presented
here another approach that reaches the same final accuracy. These two approaches are definitely nonoptimal
compared to the most accurate methods available, but
those accurate methods all require a highly precise
knowledge of the sample concentration. Therefore,
McPhie's method and ours can provide very useful assessments of protein secondary structure when this
knowledge is lacking. In addition, we believe that our
method is much simpler to use. It requires only a normalization of the data at 207 nm and the application of
the predictive model equations reported in Table 1. This can
be easily done with a pocket calculator.
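The pocket-calculator step can also be written as a few lines of code. The sketch below is illustrative only: the function and variable names are hypothetical, and the coefficients and signs are transcribed from Table 1 as read here, so they should be checked against the published table before any real use.

```python
# Coefficients transcribed from Table 1 (spectra rescaled to 1 at 207 nm).
# Each entry: constant, then (wavelength in nm, linear coefficient, quadratic coefficient).
TABLE_1 = {
    "alpha_helix": (27.58, [(193, -14.46, -5.66), (211, 1.86, -14.72)]),
    "beta_sheet": (8.66, [(196, 11.97, 7.36), (211, -0.80, 15.38)]),
    "turn": (12.49, [(234, 0.28, -0.49)]),
    "random": (38.9, [(193, 3.14, -0.56)]),
}

def rescale(raw, norm_wavelength=207):
    """Divide raw ellipticities (dict: wavelength -> value) by the value at 207 nm."""
    reference = raw[norm_wavelength]
    if reference == 0:
        raise ValueError("ellipticity is zero at the normalization wavelength")
    return {wl: value / reference for wl, value in raw.items()}

def predict_content(normalized):
    """Apply the quadratic models to rescaled ellipticities (dict: wavelength -> E)."""
    result = {}
    for name, (constant, terms) in TABLE_1.items():
        value = constant
        for wavelength, linear, quadratic in terms:
            e = normalized[wavelength]
            value += linear * e + quadratic * e * e
        result[name] = value
    return result
```

Only the raw ellipticities at 193, 196, 207, 211, and 234 nm are needed: they are passed through `rescale` and then through `predict_content`, in any units, since the units cancel in the rescaling.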
References
[1] N. Greenfield, G.D. Fasman, Computed circular dichroism
spectra for the evaluation of protein conformation, Biochemistry
8 (1969) 4108–4116.
[2] J.P. Hennessey, W.C. Johnson, Information content in the
circular dichroism of proteins, Biochemistry 20 (1981) 1085–
1094.
[3] S.W. Provencher, J. Glöckner, Estimation of globular protein
secondary structure from circular dichroism, Biochemistry 20
(1981) 33–37.
[4] R. Pribić, Principal component analysis of Fourier transform infrared and/or circular dichroism spectra of proteins applied in a
calibration of protein secondary structure, Anal. Biochem. 223
(1994) 26–34.
[5] A. Perczel, M. Hollosi, G. Tusnady, G.D. Fasman, Convex
constraint analysis: a natural deconvolution of circular dichroism
curves of proteins, Protein Eng. 4 (1991) 669–679.
[6] G. Bohm, R. Muhr, R. Jaenicke, Quantitative analysis of protein
far UV circular dichroism spectra by neural networks, Protein
Eng. 5 (1992) 191–195.
[7] N. Sreerama, R.W. Woody, A self-consistent method for the
analysis of protein secondary structure from circular dichroism,
Anal. Biochem. 209 (1993) 32–44.
[8] N. Greenfield, Methods to estimate the conformation of proteins
and polypeptide from circular dichroism data, Anal. Biochem. 235
(1996) 1–10.
[9] S.Y. Venyaminov, J.T. Yang, in: G.D. Fasman (Ed.), Circular
Dichroism and the Conformational Analysis of Biomolecules,
Plenum Press, New York, 1996, pp. 69–107.
[10] O.H. Lowry, N.J. Rosebrough, A.L. Farr, R.J. Randall, Protein
measurement with the Folin phenol reagent, J. Biol. Chem. 193
(1951) 265–275.
[11] M. Bradford, A rapid and sensitive method for the quantitation of
microgram quantities of protein utilizing the principle of protein–
dye binding, Anal. Biochem. 72 (1976) 248–254.
[12] P.K. Smith, R.I. Krohn, G.T. Hermanson, A.K. Mallia, F.H.
Gartner, M.D. Provenzano, E.K. Fujimoto, N.M. Goeke, B.J.
Olson, D.C. Klenk, Measurement of protein using bicinchoninic
acid, Anal. Biochem. 150 (1985) 76–85.
[13] W.H. Peters, A.M. Fleuren-Jakobs, K.M. Kamps, J.J. de Pont,
S.L. Bonting, Lowry protein determination on membrane prep-
arations: need for standardization by amino acid analysis, Anal.
Biochem. 124 (1982) 349–352.
[14] C.N. Pace, F. Vajdos, L. Fee, G. Grimsley, T. Gray, How to
measure and predict the molar absorption coefficient of a protein,
Protein Sci. 4 (1995) 2411–2423.
[15] P. McPhie, Circular dichroism studies on proteins in films and in
solution: estimation of secondary structure by g-factor analysis,
Anal. Biochem. 293 (2001) 109–119.
[16] C.A. Orengo, A.D. Michie, S. Jones, D.T. Jones, M.B. Swindells,
J.M. Thornton, CATH—a hierarchic classification of protein
domain structures, Structure 5 (1997) 1093–1108.
[17] T.J.P. Hubbard, A.G. Murzin, S.E. Brenner, C. Chothia, SCOP: a
structural classification of proteins database, Nucleic Acids Res.
25 (1997) 236–239.
[18] A.G. Murzin, S.E. Brenner, T. Hubbard, C. Chothia, SCOP: a
structural classification of proteins database for the investigation
of sequences and structures, J. Mol. Biol. 247 (1995) 536–540.
[19] G.J. Barton, SCOP: structural classification of proteins, Trends
Biochem. Sci. 19 (1994) 554–555.
[20] U. Hobohm, C. Sander, Enlarged representative set of protein
structures, Protein Sci. 3 (1994) 522–524.
[21] U. Hobohm, M. Scharf, R. Schneider, C. Sander, Selection of
representative protein data sets, Protein Sci. 1 (1992) 409–417.
[22] W. Kabsch, C. Sander, Dictionary of protein secondary structure:
pattern recognition of hydrogen-bonded and geometrical features,
Biopolymers 22 (1983) 2577–2637.
[23] I. Noda, Generalized 2-dimensional correlation method applicable
to infrared, Raman, and other types of spectroscopy, Appl.
Spectrosc. 47 (1993) 1329–1336.
[24] I. Noda, Determination of two-dimensional correlation spectra
using the Hilbert transform, Appl. Spectrosc. 54 (2000) 994–999.
[25] S. Sasic, A. Muszynski, Y. Ozaki, New insight into the mathe-
matical background of generalized two-dimensional correlation
spectroscopy and the influence of mean normalization pretreat-
ment on two-dimensional correlation spectra, Appl. Spectrosc. 55
(2001) 343–349.
[26] N. Sreerama, S.Y. Venyaminov, R.W. Woody, Estimation of
protein secondary structure from circular dichroism spectra:
inclusion of denatured proteins with native proteins in the
analysis, Anal. Biochem. 287 (2000) 243–251.
[27] N. Sreerama, R.W. Woody, Estimation of protein secondary
structure from circular dichroism spectra: comparison of CON-
TIN, SELCON, and CDSSTR methods with an expanded
reference set, Anal. Biochem. 287 (2000) 252–260.