an orientation-dependent hydrogen bonding potential improves prediction … · 2008-02-27 · an...

21
An Orientation-dependent Hydrogen Bonding Potential Improves Prediction of Specificity and Structure for Proteins and Protein–Protein Complexes Tanja Kortemme a , Alexandre V. Morozov a,b and David Baker a * a Howard Hughes Medical Institute and Department of Biochemistry J-567 Health Sciences Box 357350, University of Washington, Seattle WA 98195-7350, USA b Department of Physics University of Washington Box 351560, Seattle WA 98195-1560, USA Hydrogen bonding is a key contributor to the specificity of intramolecular and intermolecular interactions in biological systems. Here, we develop an orientation-dependent hydrogen bonding potential based on the geometric characteristics of hydrogen bonds in high-resolution protein crystal structures, and evaluate it using four tests related to the prediction and design of protein structures and protein– protein complexes. The new potential is superior to the widely used Coulomb model of hydrogen bonding in prediction of the sequences of proteins and protein–protein interfaces from their structures, and improves discrimination of correctly docked protein – protein complexes from large sets of alternative structures. q 2003 Elsevier Science Ltd. All rights reserved Keywords: hydrogen bond; electrostatics; protein docking; protein design; free energy function *Corresponding author Introduction Hydrogen bonding interactions are abundant in proteins and protein – protein complexes. 1,2 Despite their ubiquitous nature, the relative importance of hydrogen bonds for protein stability and pro- tein – protein recognition has been somewhat controversial. 3,4 Most hydrogen bond donors and acceptors are satisfied in non-surface-accessible parts of proteins. 1,5 However, replacement of buried salt-bridge networks with hydrophobic resi- dues can lead to protein stabilization. 6 There may be no net gain in free energy for hydrogen bond formation in folding and binding, as the formation of hydrogen bonds between protein atoms results in the loss of hydrogen bonds formed with water. 7,8 Thus hydrogen bonds might primarily provide specificity rather than stability to proteins and protein – protein interfaces. 9,10 An accurate energetic description of hydrogen bonding interactions is required for understanding the role of hydrogen bonds in both intra- molecular and intermolecular interactions. How- ever, the physical nature of hydrogen bonds is complex. Ab initio calculations decompose the total energy of a hydrogen bond into several components: electrostatics, polarization, exchange repulsion, charge-transfer and coupling contri- butions, 11,12 and calculation of these terms from first principles is not straightforward for biological macromolecules. Phenomenologically, a hydrogen bond is formed when a positively polarized hydrogen atom (bound to an electronegative donor atom) penetrates the van der Waals sphere of an acceptor atom to interact with its lone pair electrons (or polarizable p-electrons in the case of aromatic rings). This partial covalent character implies a directionality of the hydrogen bond. The observed orientation dependence of hydrogen bonds in crystal structures of small molecules, 13 – 18 and proteins 1,19 – 21 generally supports an orien- tation of the hydrogen towards the lone electron pairs of the acceptor atom. However, the location of the lone pair cannot be simply assumed based on the hybridization of the acceptor, as the hybridi- zation state of the acceptor atom itself is perturbed by hydrogen bond formation, leading to a distor- tion of the original hybridization by mixing with the 1s orbital of the hydrogen. 22,23 This highlights the potential “environment dependency” of hydrogen bonding interactions. Additional prob- lems are posed by polarization effects causing non-additivity in hydrogen bond energetics. Whereas earlier molecular mechanics potentials included explicit hydrogen bonding terms, 24,25 current force fields generally attempt to model the specifics of hydrogen bonds by a combination of Coulomb and Lennard – Jones interactions with 0022-2836/03/$ - see front matter q 2003 Elsevier Science Ltd. All rights reserved E-mail address of the corresponding author: [email protected] Abbreviation used: rmsd, root-mean-square deviation. doi:10.1016/S0022-2836(03)00021-4 J. Mol. Biol. (2003) 326, 1239–1259

Upload: others

Post on 18-Jun-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: An Orientation-dependent Hydrogen Bonding Potential Improves Prediction … · 2008-02-27 · An Orientation-dependent Hydrogen Bonding Potential Improves Prediction of Specificity

An Orientation-dependent Hydrogen BondingPotential Improves Prediction of Specificity andStructure for Proteins and Protein–Protein Complexes

Tanja Kortemmea, Alexandre V. Morozova,b and David Bakera*

aHoward Hughes MedicalInstitute and Departmentof BiochemistryJ-567 Health SciencesBox 357350, University ofWashington, SeattleWA 98195-7350, USA

bDepartment of PhysicsUniversity of WashingtonBox 351560, SeattleWA 98195-1560, USA

Hydrogen bonding is a key contributor to the specificity of intramolecularand intermolecular interactions in biological systems. Here, we developan orientation-dependent hydrogen bonding potential based on thegeometric characteristics of hydrogen bonds in high-resolution proteincrystal structures, and evaluate it using four tests related to the predictionand design of protein structures and protein–protein complexes. The newpotential is superior to the widely used Coulomb model of hydrogenbonding in prediction of the sequences of proteins and protein–proteininterfaces from their structures, and improves discrimination of correctlydocked protein–protein complexes from large sets of alternativestructures.

q 2003 Elsevier Science Ltd. All rights reserved

Keywords: hydrogen bond; electrostatics; protein docking; protein design;free energy function*Corresponding author

Introduction

Hydrogen bonding interactions are abundant inproteins and protein–protein complexes.1,2 Despitetheir ubiquitous nature, the relative importanceof hydrogen bonds for protein stability and pro-tein–protein recognition has been somewhatcontroversial.3,4 Most hydrogen bond donors andacceptors are satisfied in non-surface-accessibleparts of proteins.1,5 However, replacement ofburied salt-bridge networks with hydrophobic resi-dues can lead to protein stabilization.6 There maybe no net gain in free energy for hydrogen bondformation in folding and binding, as the formationof hydrogen bonds between protein atoms resultsin the loss of hydrogen bonds formed withwater.7,8 Thus hydrogen bonds might primarilyprovide specificity rather than stability to proteinsand protein–protein interfaces.9,10

An accurate energetic description of hydrogenbonding interactions is required for understandingthe role of hydrogen bonds in both intra-molecular and intermolecular interactions. How-ever, the physical nature of hydrogen bonds iscomplex. Ab initio calculations decompose thetotal energy of a hydrogen bond into severalcomponents: electrostatics, polarization, exchange

repulsion, charge-transfer and coupling contri-butions,11,12 and calculation of these terms fromfirst principles is not straightforward for biologicalmacromolecules. Phenomenologically, a hydrogenbond is formed when a positively polarizedhydrogen atom (bound to an electronegativedonor atom) penetrates the van der Waals sphereof an acceptor atom to interact with its lone pairelectrons (or polarizable p-electrons in the case ofaromatic rings). This partial covalent characterimplies a directionality of the hydrogen bond. Theobserved orientation dependence of hydrogenbonds in crystal structures of small molecules,13 – 18

and proteins1,19 – 21 generally supports an orien-tation of the hydrogen towards the lone electronpairs of the acceptor atom. However, the locationof the lone pair cannot be simply assumed basedon the hybridization of the acceptor, as the hybridi-zation state of the acceptor atom itself is perturbedby hydrogen bond formation, leading to a distor-tion of the original hybridization by mixing withthe 1s orbital of the hydrogen.22,23 This highlightsthe potential “environment dependency” ofhydrogen bonding interactions. Additional prob-lems are posed by polarization effects causingnon-additivity in hydrogen bond energetics.

Whereas earlier molecular mechanics potentialsincluded explicit hydrogen bonding terms,24,25

current force fields generally attempt to model thespecifics of hydrogen bonds by a combination ofCoulomb and Lennard–Jones interactions with

0022-2836/03/$ - see front matter q 2003 Elsevier Science Ltd. All rights reserved

E-mail address of the corresponding author:[email protected]

Abbreviation used: rmsd, root-mean-square deviation.

doi:10.1016/S0022-2836(03)00021-4 J. Mol. Biol. (2003) 326, 1239–1259

Page 2: An Orientation-dependent Hydrogen Bonding Potential Improves Prediction … · 2008-02-27 · An Orientation-dependent Hydrogen Bonding Potential Improves Prediction of Specificity

refined atomic charges,26 – 29 although explicithydrogen bonding has been used in potentialsapplied successfully to protein design.30,31 Inthe absence of feasible first principle methods,our approach to the improvement of currenthydrogen bonding potentials relies on chemicalintuition and the vast information available inthe protein structure database. This approach isconceptually similar to previous studies ofhydrogen bonds which use information avail-able in databases of small molecule crystalstructures,15,18 but determines the relevant par-ameters for proteins (while the physical principlesgoverning interactions should be transferablebetween different classes of molecules, the detailsmight not be).

We derive a hydrogen bond energy functionbased on geometrical parameters of hydrogenbonds observed in high-resolution protein crys-tal structures. Subsequently, we evaluate thenew hydrogen bonding potential and compareit to a purely electrostatic representation ofpolar interactions using four different tests: therecovery of the native amino acid sequencebased on the structure of proteins (test 1) andprotein–protein complexes (test 2), the discrimi-nation of misfolded from native or near-nativeprotein structures (test 3) and the identificationof correct relative orientations of protein part-ners in protein–protein complexes (test 4). Thefour tests are closely related to the proteindesign problem (test 1, for review see Pokala &Handel32), the protein–protein interface designproblem (test 2), the decoy discrimination pro-blem (test 3)33,34 and the protein docking pro-blem (test 4).35 Our tests demonstrate theusefulness of the database-derived hydrogenbonding function, its superiority to simple effec-tive distance-dependent Coulomb treatments ofelectrostatic interactions in our test cases, andhighlight the importance of continued develop-ment of accurate descriptions of hydrogen bondinginteractions in biological systems.

Results

Derivation of the hydrogen bonding function

Hydrogen bond geometries were derived from aset of 698 crystal structures with a resolution ofbetter than 1.6 A and R-factors of better than 0.25(see Methods). Figure 1 illustrates the four geo-metrical parameters considered: (a) the distancedHA between the hydrogen and acceptor atoms,(b) the angle Q at the hydrogen atom, (c) the angleC at the acceptor atom and (d) the dihedral angleX corresponding to rotation around the acceptor–acceptor base bond in the case of an sp2 hybridizedacceptor (the distribution in the sp3 case is flat).

This analysis requires the explicit placement ofpolar hydrogen atoms, which has been noted tobe important for correct treatment of hydrogenbonding interactions.36 As hydrogen atoms aregenerally not included in the coordinates derivedfrom the crystal diffraction data, polar hydrogenatoms were added in cases where the position ofthe hydrogen atom was defined by the chemistryof the donor group (backbone amide protons,tryptophan indole, histidine imidazole, asparagineand glutamine amide groups and arginine guanidogroup). Standard bond lengths and angleswere taken from the CHARMM19 force fieldparameters.28 Polar hydrogen atoms associatedwith a rotatable bond (serine, threonine and tyro-sine hydroxyl groups and lysine amino group)were not considered for the compilation of statis-tics as they could not be placed without makingassumptions about the hydrogen bond geometry.

Figure 2 shows the distributions of dHA, Q, C andX obtained from the analysis of a total of 11,680side-chain–side-chain and 89,537 backbone–back-bone hydrogen bonds. Backbone hydrogen bondswere treated separately as their geometry wasfound to differ significantly from that of side-chain–side-chain hydrogen bonds, presumablydue to steric constraints imposed by regular sec-ondary structure elements. For the angular distri-butions of backbone–backbone hydrogen bonds,only occurrences with a proton-acceptor distancebetween 1.4 A and 2.6 A were considered. Forside-chain–side-chain hydrogen bonds, angledistributions were collected in two different dis-tance ranges (1.4–2.1 A and 2.1–3.0 A) to takeinto account a shift in hydrogen bond geometryat longer distances observed in high-resolutionprotein structures.37 The distributions shown werecorrected for the differences in volume elementsof the bins (see Figure 2). For the dependenceon the acceptor angle C, separate statistics werecollected for sp2 and sp3 hybridized acceptoratoms to take into account different electronicdistributions around the acceptor atom. The dis-tance distributions for both side-chain–side-chainand backbone–backbone hydrogen bonds show amaximum at around 2.0 A, with slightly shorterdistances observed for side-chain–side-chainhydrogen bonds with an sp2 hybridized acceptor

Figure 1. Schematic representation of the parametersused to describe hydrogen bond geometry. dHA, distancebetween the hydrogen and acceptor atoms; Q, angle atthe hydrogen atom; C, angle at the acceptor atom; X,the dihedral angle given by rotation around theacceptor–acceptor base bond in the case of an sp2 hybri-dized acceptor. A, acceptor; D, donor; H, hydrogen; AB,acceptor base; R1, R2, atoms bound to the acceptor base.

1240 An Orientation-dependent Hydrogen Bonding Potential

Page 3: An Orientation-dependent Hydrogen Bonding Potential Improves Prediction … · 2008-02-27 · An Orientation-dependent Hydrogen Bonding Potential Improves Prediction of Specificity

Figure 2 (continued on next page)

Page 4: An Orientation-dependent Hydrogen Bonding Potential Improves Prediction … · 2008-02-27 · An Orientation-dependent Hydrogen Bonding Potential Improves Prediction of Specificity

Figure 2 (legend opposite)

Page 5: An Orientation-dependent Hydrogen Bonding Potential Improves Prediction … · 2008-02-27 · An Orientation-dependent Hydrogen Bonding Potential Improves Prediction of Specificity

and backbone–backbone hydrogen bonds inb-sheets, and hardly any hydrogen bonds shorterthan 1.6 A. The angle at the hydrogen atom showsa clear preference for linearity for short side-chain–side-chain hydrogen bonds, whereas forbackbone–backbone hydrogen bonds the distri-bution is shifted to lower angles, presumablycaused by steric constraints of hydrogen bonds inregular secondary structure elements. Differencesbetween side-chain–side-chain and backbone–backbone hydrogen bonds are also observed forthe angle at the acceptor, with maxima at around1208 for side-chain–side-chain hydrogen bondsand shifted to higher angles for the backbone–backbone distributions. The differences betweenthe different hybridization states at the acceptorfor side-chain–side-chain hydrogen bonds aresmall, with a slightly sharper distribution (but notshifted to significantly lower angles) for the sp3

case. A larger difference is seen comparing the geo-metries in short and long hydrogen bonds, withbroader distributions found in the latter case. Thedihedral angle shows fairly flat distributionsexcept for a well-defined peak in the case of helicalhydrogen bonds and other non-b backbone–back-bone hydrogen bonds (mainly involving turns).

Our hydrogen bonding potential consists of adistance-dependent energy term ðEðdHAÞÞ andthree angular dependent energy components(EðQÞ dependent on the angle at the hydrogen,EðCÞ dependent on the angle at the acceptor atom,and EðXÞ dependent on the dihedral angle inhydrogen bonds involving an sp2 hybridizedacceptor) derived from the logarithm of the prob-ability distributions found in the crystal structureanalysis (see Methods). The total hydrogen bondenergy (EHB) was taken to be a linear combinationof the four terms:

EHB ¼ WHB½EðdHAÞ þ EðQÞ þ EðCÞ þ EðXÞ� ð1Þ

where WHB is the relative weight of the hydrogenbonding term with respect to the other terms ofthe energy function (see Methods). The addition ofthe different energy terms assumes an indepen-

dence of the geometric parameters, which appearsto be a reasonable approximation in our dataset.An exception is the distance dependence of theangular distributions for side-chain–side-chainhydrogen bonds, which we approximate usingtwo different distance ranges for collecting angularstatistics. The observed acceptor angle for side-chain–side-chain hydrogen bonds differs signifi-cantly from the cosine-dependence with a maxi-mum at linear angles used in other geometry-dependent hydrogen bonding potentials (seeFigure 2(d)). Other differences are the replacementof the often used 10–12 potential for the distance-dependence by the more sharply peaked database-derived term (see Figure 2), as well as the inclusionof explicit hydrogen atoms for determining thehydrogen bonding geometry.

Testing of the hydrogen bonding function

It is not trivial to demonstrate that a newdescription of a contribution to macromolecularenergetics is an improvement over previousdescriptions because the individual components ofthe free energy cannot readily be measured inde-pendently in experiments. Here, we use a numberof different tests to compare our new treatment toprevious representations. The first two tests arebased on the assumption that the substitution ofthe sequences of monomeric proteins and inter-faces of protein–protein complexes with non-native amino acids on average produces anincrease in free energy over the naturally occurringsequence. This assumption is consistent withextensive mutational data that show that aminoacid replacements are far more often destabilizingrather than stabilizing. The third and fourth testsare based on the assumption that native proteinstructures and protein–protein interfaces arelower in free energy than the vast majority of non-native conformations.38 While it is not necessarythat the individual contributions to the free energy(such as the electrostatic component) all favor thenative structure, it is plausible that given several

Figure 2. Distribution of geometric hydrogen bonding parameters obtained from 698 protein crystal structures.(a) Backbone–backbone hydrogen bonds occurring in a-helices (for secondary structure classification see Methods).(b) Backbone–backbone hydrogen bonds occurring in b-sheets. (c) All other backbone–backbone hydrogen bonds.(d) Side-chain–side-chain hydrogen bonds involving an sp2 hybridized acceptor. (e) Side-chain–side-chain hydrogenbonds involving an sp3 hybridized acceptor. The X distributions are not shown, as they are uniform. Raw countswere corrected for the different volume elements encompassed by the bins (angular correction: sinðangleÞ except forthe X angle; distance correction: (distance)2). Distributions shown are (indicated as label in each graph): hydrogen-acceptor distance dHA, red bars, light blue bars show the comparison with a standard 10–12 potential; angle Q at thehydrogen; angle C at the acceptor (red bars, blue bars in the plot for sp2 hybridized acceptors show the comparisonwith the angular dependency of a dipole–dipole interaction assuming a 1808 angle at the hydrogen and planarity ofthe hydrogen bond); dihedral angle X. Side-chain–side-chain angular distributions were collected for two differentdistance ranges as explained in the text. The Q and C angular distributions for helical backbone–backbone hydrogenbonds show side peaks at angles lower than 1208 which most likely result from 310 conformations at helix termini orother i ¼ i; i þ 3 interactions. These conformations also cause the small peak in the distance distribution at distanceslarger than 2.4 A. The C angular distribution for backbone–backbone hydrogen bonds classified as other has ashoulder around 1108 probably resulting from strained i; i þ 2 interactions that could be omitted in the potential.

An Orientation-dependent Hydrogen Bonding Potential 1243

Page 6: An Orientation-dependent Hydrogen Bonding Potential Improves Prediction … · 2008-02-27 · An Orientation-dependent Hydrogen Bonding Potential Improves Prediction of Specificity

Figure 3 (legend opposite)

Page 7: An Orientation-dependent Hydrogen Bonding Potential Improves Prediction … · 2008-02-27 · An Orientation-dependent Hydrogen Bonding Potential Improves Prediction of Specificity

alternative models of electrostatic interactions, theone which most favors the native sequence andstructure is the most accurate.

Prediction of amino acid identity in monomericproteins (test 1)

The first test is to compare the recovery of thenaturally occurring amino acids in protein designcalculations using the new hydrogen bondingpotential or a Coulomb treatment of electrostatics.In these calculations, side-chains at each sequenceposition in a set of proteins were substituted one-by-one by all amino acids in all the rotamer confor-mations in the Dunbrack backbone-dependentlibrary (see Methods).39 For each of a total of 7308sequence positions, the free energy of all rotamersof all amino acids is determined, and the lowestfree energy amino acid is selected. For each aminoacid type, Figure 3 shows a substitution profile,depicting how often the native amino acid wasfound to be energetically most favorable, and howoften each of the other 18 amino acids (cysteineresidues were excluded because potential disulfidebonds were not modeled) was chosen to be themost favorable replacement. Three different energyfunctions were used in creating these substitutionprofiles to pinpoint the influence of the represen-tation of hydrogen bonds and electrostatic inter-actions to the total free energy: (1) inclusion of allenergy terms (red bars) (see Methods for the com-plete free energy function and parameterization);(2) exclusion of the hydrogen bonding contribution(light blue bars); and (3) exclusion of the hydrogenbonding term and representation of electrostaticand hydrogen bonding interactions instead by aCoulomb potential with a linear distance-depen-dent dielectric constant and a weight adjusted toapproximately match the magnitude of the hydro-gen bonding term used in (1). Clear differencesbetween the three energy functions are observedfor the charged (D, E, R, K) and polar (N, Q, T, S)amino acids (Figure 3(a) and (b)). In all cases, theinclusion of the new hydrogen bonding term isuseful in discriminating the native amino acidtype from others. For all amino acids except gluta-mine the native amino acid is predicted with thehighest frequency (for a sequence position with anative glutamine residue, glutamate is pickedwith a higher probability). Representing hydrogenbonding interactions with a Coulomb term using alinear dielectric constant gives worse results for allamino acids, in some cases selecting a non-nativeamino acid type with the highest frequency. Particu-

larly dramatic examples are glutamate, lysine, andserine, where alanine is preferred frequently ifhydrogen bonds and Coulomb terms are excluded,an effect that can only partially be rescued by repre-senting hydrogen bonding with a strong Coulombterm alone.

For the polar aromatic amino acids (W, Y, H;Figure 3(c)), the differences between the threeenergy functions are small. Presumably the selec-tion is dominated by the packing interactions ofthe large aromatic ring in these cases, althoughusing the hydrogen bonding potential improvesthe discrimination of tyrosine from phenylalanine,and histidine from tyrosine.

Although, as shown in Figure 3, the hydrogenbonding potential provides a significant improve-ment of the recognition of native amino acids forpolar and charged residues, the overall predictionaccuracy for these residue classes is worse thanfor the hydrophobic amino acids A, I, L, V, and Fas well as G and P (for substitution profiles of thehydrophobic amino acids see the distributions forinterfaces in Figure 4 which were similar to thosefor monomeric proteins). This is partially due to alimitation inherent in our test, as one assumptionin our optimization procedure is that the nativeamino acid type is lowest in free energy at eachsequence position. This assumption does notnecessarily hold true for all sequence positions (asthe choice of a certain amino acid type at a certainposition might also have been optimized for func-tion and solubility), and will be more valid foramino acids in the protein core that are pre-sumably selected for stability. The positions ofpolar and charged residues in native structuresare likely to be determined not only by energeticconsiderations tested by our procedure but also byfunctional and solubility constraints (this mightexplain the particularly poor predictions for histi-dine residues, which are often found in activesites due to their pKa value in the experimentalpH range; also, all histidine residues are modeledas neutral in our procedure). Moreover, in particu-lar for the long polar amino acids, limited confor-mational sampling using the rotamer approachmight make it difficult to reach optimal hydrogenbonding geometries required for selecting the nativeamino acid (hydrophobic packing might be lessdemanding). Also, since our free energy functiondoes not contain explicit penalties for cavities(apart from the loss of favorable packing inter-actions) or unsatisfied hydrogen bonding donorsor acceptors, replacement of a larger side-chain bya smaller residue may not be sufficiently penalized.

Figure 3. Recovery of native sequences in single domain proteins. For all sequence positions containing a specificamino acid type, the bars show for all amino acids except cysteine how often each amino acid type is found to be ener-getically most favorable. Substitution profiles are calculated with different energy functions: red bars, complete energyfunction; light blue bars, energy function without the hydrogen bonding term; dark blue bars, energy function withoutthe hydrogen bonding term, but scaling the Coulomb term to be of similar magnitude. (a) Charged amino acids;(b) polar amino acids; (c) polar aromatic amino acids.

An Orientation-dependent Hydrogen Bonding Potential 1245

Page 8: An Orientation-dependent Hydrogen Bonding Potential Improves Prediction … · 2008-02-27 · An Orientation-dependent Hydrogen Bonding Potential Improves Prediction of Specificity

Figure 4 (legend opposite)

Page 9: An Orientation-dependent Hydrogen Bonding Potential Improves Prediction … · 2008-02-27 · An Orientation-dependent Hydrogen Bonding Potential Improves Prediction of Specificity

Figure 4. Application of the energy function to the prediction of sequences in protein–protein interfaces. For all sequence positions containing a specific amino acid type, the barsshow for all amino acids except cysteine how often each amino acid type is found to be energetically most favorable. Substitution profiles are calculated with the complete energyfunction including the hydrogen bonding term. (a) Charged amino acids; (b) polar amino acids; (c) polar aromatic amino acids; (d) hydrophobic amino acids; (e) glycine and proline.

Page 10: An Orientation-dependent Hydrogen Bonding Potential Improves Prediction … · 2008-02-27 · An Orientation-dependent Hydrogen Bonding Potential Improves Prediction of Specificity

Prediction of amino acid identities in protein–protein complexes (test 2)

Electrostatic interactions are thought to beparticularly relevant in molecular recognition.Experimental as well as theoretical studies havehighlighted the role of electrostatic interactionsin determining the rate of association of protein–protein complexes,40 as well as for recognitionspecificity.41 While the role of hydrogen bonds andcharge–charge interactions for specificity appearswell established, their importance for the stabilityof protein–protein complexes is somewhat underdebate.42 The energy function described above wasapplied unchanged to a different dataset of 50binary protein–protein complexes. The rationalewas twofold. First, we wanted to see whether theenergy function is also useful for prediction ofspecificity in protein recognition by testingwhether it performs well in recognizing the nativeamino acid sequence in protein interfaces. Second,as the energy function was parameterized on themonomeric dataset, we used this second set as anindependent test of performance. The substitutionprofiles for a total of 2986 sequence positionslocated in protein interfaces (Figure 4) show that,as in the case of the monomeric protein set, formost positions the most strongly predicted aminoacid is the naturally occurring residue. This

suggests that the potential function should begenerally applicable to the prediction of specificityin protein–protein interactions for whole familiesof protein sequences where a structure is availablefor at least one homologous protein–protein com-plex. If, given the structure of a protein–proteininterface and the sequence of one partner, thesequences of potentially interacting partner can bepredicted, the method can be used to computation-ally subdivide the sequences into interacting andnon-interacting pairs as a guide for furtherinvestigations.

Decoy discrimination for monomeric singledomain proteins (test 3)

Next, we investigated the difference in thehydrogen bond energy of native or native-repacked structures (native backbone coordinateswith modeled side-chains from a rotamer library)and alternative conformations (decoys) generatedusing the ROSETTA ab initio structure predictionmethod.43,44 We use the normalized energy gap(Z-score: energy difference between the native ornear-native structure and the mean of the decoydistribution, divided by the standard deviation)to quantify the signal-to-noise ratio for decoydiscrimination.33 Table 1 shows the Z-scores for

Table 1. Native (Zn) and native repacked (Znr) Z-scores for the monomeric single domain decoy set (SS-secondarystructure class: a-helix, b-sheet) for the following energetic contributions: side-chain–backbone hydrogen bonds (HBsc–bb), side-chain–side-chain hydrogen bonds (HB sc–sc), backbone–backbone hydrogen bonds (HB bb–bb)

PDB code SSHB sc–bb HB sc–sc HB bb–bb Combined HB score

Zn Znr Zn Znr Zn Znr Zn Znr

1a32 a 1.07 0.25 5.98 0.39 3.04 3.04 4.59 3.121ail a 21.75 20.97 5.79 1.18 8.32 8.32 8.22 7.561am3 a 1.79 2.15 0.84 20.41 1.50 1.50 2.39 2.401cc5 a 0.29 3.59 20.30 20.30 21.72 21.72 21.53 20.161cei a 0.17 1.54 6.50 20.51 4.69 4.69 5.80 4.891hyp a 2.56 1.39 20.28 20.28 2.35 2.35 3.30 2.851lfb a 21.11 20.26 1.09 0.26 0.73 0.73 0.45 0.591mzm a 0.44 20.38 20.31 20.31 2.79 2.79 2.79 2.511r69 a 2.22 1.94 3.75 1.85 1.40 1.40 3.36 2.581utg ab 0.55 20.97 2.75 20.41 4.18 4.18 4.80 3.541ctf ab 2.43 21.37 20.43 20.43 5.12 5.12 6.01 4.331dol ab 20.22 2.84 20.42 20.42 1.08 1.08 0.57 2.651orc ab 20.93 20.16 4.68 2.77 3.87 3.87 3.57 3.451pgx ab 0.19 2.21 1.80 20.39 5.28 5.28 4.47 5.331ptq ab 5.77 4.62 2.47 1.05 20.51 20.51 3.18 2.191tif ab 1.05 0.79 7.03 20.48 7.09 7.09 7.09 5.361vcc ab 2.42 2.85 4.00 20.43 3.66 3.66 5.50 4.762fxb ab 20.07 1.09 9.53 9.13 20.11 20.11 2.48 1.885icb ab 2.11 3.07 7.04 20.41 3.80 3.80 5.61 5.031bq9 b 2.30 21.26 3.69 20.30 5.09 5.09 6.37 2.691csp b 21.25 20.38 20.26 20.26 4.67 4.67 2.43 3.151msi b 2.50 0.95 2.88 20.29 2.15 2.15 3.82 2.401tuc b 1.51 2.99 2.50 1.07 3.45 3.45 4.38 5.161vif b 1.88 21.10 3.31 0.30 3.42 3.42 4.47 2.215pti b 20.60 20.25 13.31 20.34 4.29 4.29 6.62 3.11

Mean 1.01 1.01 3.48 0.48 3.20 3.20 4.03 3.34Stdev 1.66 1.72 3.46 1.99 2.33 2.33 2.18 1.63

Combined HB score, generalized linear model fit using the three hydrogen bond scores. Stdev, standard deviation from mean value.Successful discrimination is defined as a Z-score .1.

1248 An Orientation-dependent Hydrogen Bonding Potential

Page 11: An Orientation-dependent Hydrogen Bonding Potential Improves Prediction … · 2008-02-27 · An Orientation-dependent Hydrogen Bonding Potential Improves Prediction of Specificity

the hydrogen bonding term of monomeric single-domain proteins, split into side-chain–side-chain,side-chain–backbone (using side-chain–side-chainstatistics, see Methods) and backbone–backbonecontributions. The backbone–backbone contri-bution is found to be the best discriminator of thenative structure versus the decoys (Zn column,Table 1). It alone is capable of successful discrimi-nation for 21 out of the 25 structures (a failure isdefined as a Z-score ,1). If all three hydrogenbonding terms are combined using a generalizedlinear model fit, 22 out of the 25 structures aresuccessfully discriminated. Interestingly, one ofthe failures contains a heme cofactor (1cc5) nottaken into account in either decoy creation or inthe scoring of decoys and the native structures.

We also computed Z-scores for native structureswith all side-chains repacked (native-repacked z-scores) using the same potential as for the decoys.Similar values for side-chain–backbone contri-butions are obtained when the decoys are com-pared to either the native structures (Zn column)or the native-repacked structures (Znr column).45,46

However, there is a noticeable drop in the side-chain–side-chain Z-score for native versus nativerepacked structures, indicating that for some pro-

teins the repacking procedure does not reproducethe precise side-chain–side-chain geometriesobserved in the native structure. This effect can bedue to limitations in rotamer sampling in particu-lar for long polar side-chains, or, alternatively, theenergy function favors the exposure of surfaceside-chains to solvent over the formation of side-chain–side-chain hydrogen bonds.

A further, more challenging test is to be able torank different decoy conformations by their close-ness to the native structure, often represented as acorrelation between the free energy score and theroot-mean-square deviation (rmsd) of the back-bone atoms to the experimentally determinednative protein structure (the decoy discriminationproblem).10,47 – 49 Table 2 shows the Z-scores fordiscriminating near-native decoys from all otherdecoy conformations (Zlrms). Near-native decoysare defined as the lowest 5% of the rmsd distri-bution in the decoys; the cutoff rmsd value variesbetween 1.10 A and 2.84 A for different proteinsin the set. Discrimination is poor in the set, withsuccessful discrimination for only four out of 23proteins both using all hydrogen bonding terms incombination or just considering the backbone–backbone contribution alone (Table 2, Dec. columns).

Table 2. Low rmsd (Zlrms) Z-scores for the monomeric single domain decoy set (SS, secondary structure class: a-helix,b-sheet) for the following energetic contributions: Coulomb electrostatics with a linear distance-dependent dielectricconstant (Coulomb), side-chain–backbone hydrogen bonds (HB sc–bb), side-chain–side-chain hydrogen bonds (HBsc–sc), backbone–backbone hydrogen bonds (HB bb–bb)

PDB code SS

Coulomb(Zlrms)

HB sc–bb(Zlrms)

HB sc–sc(Zlrms)

HB bb–bb(Zlrms)

Combined HBscore (Zlrms)

CombinedHB þ vdW

score (Zlrms)

Dec. PN. Dec. PN. Dec. PN. Dec. PN. Dec. PN. Dec. PN.

1a32 a 0.20 0.14 20.49 20.46 0.05 0.09 1.16 0.59 1.14 0.56 1.02 0.731am3 a 20.39 20.26 0.00 0.14 20.26 20.19 0.43 0.46 0.42 0.47 0.69 0.841bw6 a 20.27 0.08 0.20 0.11 20.04 20.03 0.51 0.10 0.53 0.12 0.66 0.521gab a 0.58 0.48 0.54 0.41 0.39 0.33 0.55 0.03 0.60 0.12 1.13 0.691kjs a 20.01 0.13 20.16 20.13 0.01 0.10 0.32 0.31 0.31 0.29 0.69 0.801mzm a 20.04 0.39 0.01 20.17 20.02 0.05 0.55 1.92 0.56 1.90 0.46 2.061nkl a 20.13 0.41 20.17 20.04 0.02 0.21 0.14 2.25 0.14 2.25 0.20 2.101nre a 1.05 0.67 20.81 20.75 0.17 20.02 1.27 1.81 1.25 1.80 0.90 1.671pou a 0.23 0.47 20.12 20.08 0.40 0.15 0.22 1.69 0.23 1.71 0.37 1.551r69 a 0.80 1.13 0.41 0.68 0.13 0.07 0.02 1.83 0.06 1.91 0.98 2.261res a 20.46 20.43 0.07 0.11 0.08 0.06 0.32 0.10 0.33 0.13 0.43 0.481uba a 20.39 20.07 0.06 0.30 0.11 0.04 0.24 20.17 0.24 20.09 0.13 0.021uxd a 0.26 0.31 20.43 20.45 20.05 0.00 1.14 0.43 1.13 0.36 1.25 0.842ezh a 0.15 20.08 20.26 0.45 0.15 0.00 0.53 1.61 0.52 1.66 0.48 1.382pdd a 0.31 0.33 0.20 20.15 0.43 1.09 0.55 0.37 0.59 0.44 0.89 1.191aa3 ab 0.84 0.36 0.28 0.32 20.06 0.02 20.34 0.61 20.32 0.68 0.42 0.851afi ab 0.84 0.62 20.09 0.30 0.31 0.22 0.63 2.26 0.65 2.25 1.07 2.301ctf ab 0.53 1.20 0.15 20.04 20.14 20.15 20.13 2.75 20.13 2.72 0.41 2.581pgx ab 0.77 1.22 0.27 0.98 20.03 20.26 0.74 2.84 0.76 2.85 0.59 1.922fow ab 0.35 0.15 20.07 20.07 20.09 20.26 20.51 1.51 20.52 1.48 20.14 1.492ptl ab 0.52 0.73 20.21 20.03 20.18 0.24 0.32 1.56 0.29 1.58 0.89 1.641sro b 0.97 2.19 0.19 0.30 0.20 0.31 0.38 0.26 0.41 0.40 0.47 1.481vif b 2.03 1.86 0.23 20.03 0.13 0.22 1.55 1.52 1.57 1.47 1.58 1.29

Mean 0.38 0.52 20.01 0.07 0.08 0.10 0.46 1.16 0.47 1.18 0.68 1.33Stdev 0.58 0.64 0.31 0.38 0.18 0.27 0.49 0.93 0.48 0.90 0.39 0.66

Combined HB score: generalized linear model fit using the three hydrogen bond scores only. Combined HB þ vdW score, general-ized linear model fit using the three hydrogen bond scores and the van der Waals scores. Dec. and PN. denote Z-scores for the originalab initio decoy set and the perturbed-native set (see Methods). Stdev, standard deviation from mean value. Successful discrimination isdefined as a Z-score .1.

An Orientation-dependent Hydrogen Bonding Potential 1249

Page 12: An Orientation-dependent Hydrogen Bonding Potential Improves Prediction … · 2008-02-27 · An Orientation-dependent Hydrogen Bonding Potential Improves Prediction of Specificity

Even the addition of a van der Waals term (seeMethods) does not improve discrimination signifi-cantly (five out of 23 structures are successfullydiscriminated). As for the recovery of nativeamino acid types in monomeric proteins (Figure2), a representation of electrostatic interactionspurely using a Coulomb term with a linear dis-tance-dependent dielectric performs worse thanthe hydrogen bonding term, discriminating lowrmsd decoys for only two out of 23 structures inthe set (Table 2). It should, however, be notedthat the overall Z-scores for discriminating near-native from other decoys are low and do notjustify unequivocal conclusions comparing theperformance of the different energy terms.

The hydrogen bonding potential used for decoydiscrimination is relatively short-ranged. Thus, ifthere are few structures close enough to the nativestructure to detect native-like hydrogen bondinginteractions, discrimination is expected to be poor.To test this hypothesis, we repeated the discrimi-nation test for near-native structures with a dif-ferent decoy set created by perturbations startingfrom the native structure, containing many struc-tures with rmsd values to the native structure inthe 1–3 A range (Table 2, PN columns). Whilethere is still no significant signal using a Coulombterm, side-chain–backbone or side-chain–side-chain hydrogen bonds alone, 12 out of 23 structures

can now be successfully discriminated using thebackbone–backbone hydrogen bonding term, withslight improvements by combining all hydrogenbonding scores and adding in van-der-Waals inter-actions (Table 2, PN columns).

In cases of successful discrimination of low rmsddecoys in the perturbed native set, scatter plotsshow a correlation between the hydrogen bondingterm (alone and in combination with the van-der-Waals scores) with the rmsd to the native structure.As suggested by the data in Table 2, in most ofthese cases the hydrogen bonding term alone pro-vides the main part of the signal; an example ofthis is the protein 1vif shown in Figure 5(a) and(b). However, in a few cases the addition ofthe van-der-Waals term significantly improvesdiscrimination (Figure 5(c) and (d)).

Decoy discrimination in protein docking (test 4)

As mentioned above, electrostatic interactionshave been shown to be important in protein–pro-tein recognition. Therefore, as the fourth and finaltest of our hydrogen bond function, we assessedits ability to discriminate native and near-nativebinary protein–protein complex structures fromdocked arrangements with a range of rmsd values(rmsd values are obtained by computing the over-all Ca rmsd in the complex) (the protein docking

Figure 5. Scatter plots of the combined hydrogen bonding score alone (a) and (c) and in combination with the vander Waals score (b) and (d) versus decoy Ca rmsd from the native structure for selected monomeric proteins from theperturbed native data set. a) and b) Structure 1vif; c) and d) structure 1r69. Native structures are shown with redcircles, native structures with the side-chains modeled using our rotamer repacking protocol are shown with greensquares and decoys are shown with black triangles.

1250 An Orientation-dependent Hydrogen Bonding Potential

Page 13: An Orientation-dependent Hydrogen Bonding Potential Improves Prediction … · 2008-02-27 · An Orientation-dependent Hydrogen Bonding Potential Improves Prediction of Specificity

problem). We used the backbone conformationsof the protein partners as observed in the nativecomplex structure. However, the side-chain confor-mations in the native complex structure were dis-carded, and modeled from our standard rotamerlibrary during all docking steps. All decoys werecreated by rigid-body motions of the two proteinsversus each other, incorporating extensive confor-mational rearrangement of the side-chains inthe complex interface.46,50 Tables 3 and 4 show theresults of a Z-score analysis for protein–proteincomplexes as described for the single-domainmonomeric proteins above. The protein–proteincomplex decoy set was divided into antibody-antigen (18 structures) and other complexes (13structures). The hydrogen bonding model alone

successfully discriminates the native structurefor 23 out of the 31 protein–protein complexesstudied. All three hydrogen bonding terms contri-bute to decoy discrimination, with successful pre-dictions in about two-thirds of all cases for eachcomponent separately (Table 3).

Lastly, we tested whether the hydrogen bondingfunction also helps in discriminating near-nativefrom high rsmd decoys (Table 4). In contrast to theresults obtained for the single domain monomericproteins (Table 2), for protein–protein complexesgood discrimination is achieved using a combi-nation of the three hydrogen bonding terms inabout two-thirds of all structures studied. Againall three hydrogen bond components are significantcontributors (the combined HB score discriminates

Table 3. Native (Zn) and native repacked (Znr) Z-scores for the protein–protein complex decoy sets for the followingenergetic contributions: side-chain–backbone hydrogen bonds (HB sc–bb), side-chain–side-chain hydrogen bonds(HB sc–sc), backbone–backbone hydrogen bonds (HB bb–bb): (A) Antibody/antigen (ab) decoy set; (B) non-anti-body/antigen (nab) decoy set

PDB codeHB sc–bb HB sc–sc HB bb–bb

Combined HBscore

Zn Znr Zn Znr Zn Znr Zn Znr

(A)1a2y ab 4.07 3.08 3.99 2.96 0.92 0.92 2.47 1.981cz8 ab 0.16 20.25 0.44 0.07 6.12 6.12 6.04 5.991dqj ab 1.06 2.21 1.71 2.29 5.46 5.46 5.80 5.591e6j ab 1.79 2.78 1.24 2.76 6.28 6.28 5.28 6.171egj ab 1.74 0.12 1.76 0.29 20.22 20.22 0.72 0.121eo8 ab 20.65 0.92 0.28 1.95 20.19 20.19 0.96 2.011fdl ab 3.02 2.10 2.93 2.26 1.89 1.89 2.66 2.561fj1 ab 2.84 2.97 2.61 2.90 20.13 20.13 1.51 1.901g7h ab 2.94 1.32 2.74 1.14 2.88 2.88 3.38 2.721ic4 ab 1.68 1.71 2.05 1.75 5.06 5.06 5.29 4.861jhl ab 21.58 20.40 21.52 20.12 3.96 3.96 2.31 3.181jrh ab 4.07 2.16 4.41 2.51 8.08 8.08 8.56 7.751mlc ab 20.55 2.57 0.11 2.93 2.40 2.40 2.33 3.401nca ab 1.88 4.54 1.58 4.10 20.23 20.23 0.50 1.911nsn ab 20.77 0.13 20.62 0.13 20.16 20.16 20.36 20.011osp ab 1.93 1.32 1.69 1.44 8.46 8.46 7.82 7.961qfu ab 20.86 4.14 20.30 4.81 20.26 20.26 0.01 2.111wej ab 1.07 1.53 1.29 1.50 20.20 20.20 0.79 0.69

Mean 1.32 1.83 1.47 1.98 2.78 2.78 3.12 3.38Stdev 1.72 1.41 1.56 1.37 3.11 3.11 2.64 2.38

(B)1ACB nab 20.85 20.88 20.99 20.87 12.70 12.70 11.33 11.421AVZ nab 1.07 1.89 1.35 2.41 20.16 20.16 1.05 1.961brs nab 5.74 5.09 6.33 4.80 20.32 20.32 3.43 2.321CHO nab 20.51 20.32 20.59 20.13 12.79 12.79 12.06 12.251MDA nab 21.27 21.22 21.40 21.37 20.35 20.35 21.03 21.021PPF nab 21.09 20.27 21.20 20.34 9.82 9.82 8.77 9.071SPB nab 7.00 5.65 6.32 5.44 13.85 13.85 14.06 13.901UGH nab 3.95 6.64 3.63 6.24 20.27 20.27 2.34 4.212PCC nab 20.93 20.63 20.97 20.62 20.32 20.32 20.87 20.622PTC nab 3.43 2.44 3.17 2.51 5.70 5.70 6.18 6.001CSE nab 2.14 0.56 1.94 0.52 9.64 9.64 9.16 8.661FIN nab 5.01 5.40 4.93 5.36 20.20 20.20 3.65 3.992BTF nab 1.70 2.49 2.19 2.68 3.54 3.54 4.18 4.40

Mean 1.95 2.06 1.90 2.05 5.11 5.11 5.72 5.89Stdev 2.86 2.81 2.84 2.72 5.86 5.86 4.78 4.64

Combined HB score, generalized linear model fit using the three hydrogen bond scores. Stdev, standard deviation from mean value.Successful discrimination is defined as a Z-score .1.

An Orientation-dependent Hydrogen Bonding Potential 1251

Page 14: An Orientation-dependent Hydrogen Bonding Potential Improves Prediction … · 2008-02-27 · An Orientation-dependent Hydrogen Bonding Potential Improves Prediction of Specificity

22 out of 31 structures, whereas backbone–back-bone hydrogen bonds alone only discriminate 11structures). As seen before, a Coulomb term aloneis substantially less effective than the combinedhydrogen bonding terms, only discriminating 13structures.

The chemical character of protein–protein inter-faces is a combination of that seen on proteinsurfaces and in protein cores.2 Thus we wanted totest whether the inclusion of a van der Waals terminto the free energy function, describing non-polarpacking interactions in the interface, could furtherimprove the discrimination of docking decoys.The encouraging results obtained just using the

hydrogen bonding terms of our energy functioncan be slightly improved when including the vander Waals component, leading to successful dis-crimination in 24 out of the 31 complexes. Theslight improvement is only seen for non-antibodycomplexes. This behavior might be due to theknown poorer shape complementarity and largersolvation in antibody–antigen complexes,51 whichwas our original justification for splitting theprotein–protein complex set into the two differentclasses.

In 70% of the cases for non-antibody complexesand 50% for antibody/antigen complexes, a clearcorrelation between the combined hydrogen

Table 4. Low rmsd (Zlrms) Z-scores for the protein–protein complex decoy sets for the following energetic contri-butions: Coulomb electrostatics with a linear distance-dependent dielectric constant (Coulomb), side-chain–backbonehydrogen bonds (HB sc–bb), side-chain–side-chain hydrogen bonds (HB sc–sc), backbone–backbone hydrogenbonds (HB bb–bb): (A) antibody/antigen (ab) decoy set; (B) non-antibody/antigen (nab) decoy set

PDB codeCoulomb

(linear model) HB sc–bb HB sc–sc HB bb–bb Combined HB score Combined HB þ vdW scoreZlrms Zlrms Zlrms Zlrms Zlrms Zlrms

(A)1a2y ab 0.41 1.57 1.57 20.27 1.28 1.191cz8 ab 1.53 0.60 0.63 1.99 1.66 1.711dqj ab 1.42 1.97 1.90 20.21 1.50 2.361e6j ab 0.73 1.12 1.40 1.04 1.76 1.961egj ab 0.63 1.27 1.42 20.22 1.42 1.551eo8 ab 1.27 20.67 20.25 20.20 0.48 1.041fdl ab 20.27 1.00 1.12 0.44 1.20 0.851fj1 ab 0.34 2.66 2.74 20.14 2.58 2.721g7h ab 0.80 1.69 1.72 1.88 1.99 1.241ic4 ab 1.61 1.15 1.33 2.50 1.97 2.191jhl ab 0.13 20.44 20.31 20.24 20.11 0.181jrh ab 1.64 1.35 1.47 1.56 1.81 1.421mlc ab 1.13 0.47 0.82 0.95 1.45 1.781nca ab 1.59 1.66 1.67 20.09 1.39 1.861nsn ab 0.83 20.04 0.04 20.16 0.14 0.131osp ab 0.39 0.16 0.25 20.13 0.31 0.831qfu ab 0.80 1.32 1.72 0.32 2.15 2.181wej ab 0.67 20.31 20.05 20.20 0.28 0.07

Mean 0.87 0.92 1.07 0.49 1.29 1.40Stdev 0.56 0.91 0.85 0.92 0.75 0.76

(B)1ACB nab 0.67 20.24 20.36 3.84 2.14 2.131AVZ nab 1.12 20.24 20.02 20.16 0.24 0.651brs nab 1.42 3.29 3.31 0.27 2.53 2.531CHO nab 1.12 20.34 20.33 4.20 3.39 3.191MDA nab 0.00 0.72 0.65 20.11 0.27 1.551PPF nab 0.78 20.69 20.67 16.45 10.61 7.231SPB nab 2.64 1.05 1.09 6.77 5.29 4.231UGH nab 1.11 1.87 1.86 20.02 1.48 1.522PCC nab 0.50 1.20 1.10 20.32 0.55 0.462PTC nab 0.85 0.97 0.96 3.80 3.23 2.611CSE nab 1.36 0.63 0.76 3.93 3.32 2.941FIN nab 0.98 0.14 0.25 20.22 0.29 1.232BTF nab 0.60 0.54 1.04 0.61 1.79 1.88

Mean 1.01 0.68 0.74 3.00 2.70 2.47Stdev 0.62 1.07 1.06 4.67 2.71 1.70

Combined HB score, generalized linear model fit using the three hydrogen bond scores only. Combined HB þ vdW score,generalized linear model fit using the three hydrogen bond scores and the van der Waals scores. Stdev, standard deviation frommean value. Successful discrimination is defined as a Z-score .1. Stdev, standard deviation of mean value. Successful discriminationis defined as a Z-score .1. The mean Z-score for the combined HB scores and combined HB þ vdW scores in (B) is lower than thescore for the bb–bb hydrogen bonds alone, but maximizes the number of protein structures that can be successfully discriminated(lower standard deviation).

1252 An Orientation-dependent Hydrogen Bonding Potential

Page 15: An Orientation-dependent Hydrogen Bonding Potential Improves Prediction … · 2008-02-27 · An Orientation-dependent Hydrogen Bonding Potential Improves Prediction of Specificity

bonding score (alone or in combination with thevan der Waals scores versus score) and rmsd isapparent when approaching the native structure,starting from about 3 A away. Examples are givenin Figures 6 and 7 for antibody/antigen and non-antibody complexes.

Discussion

We have developed a simple orientation-depen-dent hydrogen bonding function, derived from thegeometries of hydrogen bonds observed in high-resolution protein crystal structures (Figure 2).Several tests of this function have been performed:the prediction of amino acid sequences in mono-meric proteins (1) and protein–protein interfaces(2); the discrimination of native and near-nativestructures from misfolded conformations for singledomain monomeric proteins (3); and the appli-cation of the hydrogen bonding term to theprotein–protein docking problem (using boundprotein backbones, but repacked side-chains) (4).The hydrogen bonding term contributes signifi-cantly to the performance of the energy functionin all four tests.

The prediction of amino acid sequences inmonomeric structures for polar and charged

amino acids is clearly improved by inclusion ofthe hydrogen bonding potential (Figure 3). It isparticularly notable that this effect cannot be repro-duced using a Coulomb potential of similar magni-tude with a linear distance-dependent dielectricconstant. This suggests that the Coulomb modelof electrostatic interactions ignores some of theessential physical chemistry. Although the hydro-gen bonding potential developed here is simple, itappears to capture the specifics of hydrogen bond-ing interactions reasonably well. Its directionalityand explicit placement of polar hydrogen atomsare likely to be the major advantage over the Cou-lomb description of electrostatic interactions. Theinclusion of polar hydrogen atoms in traditionaltreatments of electrostatic interactions has beensuggested to introduce significant noise due to thesensitivity of the electrostatic energy to the preciselocations of the protons.10 Interestingly, molecularmechanics potentials originally contained explicithydrogen bonding terms, which were replaced byelectrostatic representations in later versions. Incontrast, our results suggest a clear advantage ofthe inclusion of the explicit hydrogen bonding,but not based on a simple model such as dipole–dipole interactions (see Figure 2(d)).

The hydrogen bonding potential also performswell in discriminating native structures from

Figure 6. Scatter plots of the combined hydrogen bonding score alone (a), c) and e) and in combination with the vander Waals score (b), d) and f) versus decoy Ca rmsd from the native structure for selected antibody/antigen complexes.Native structures with the side-chains modeled using our rotamer repacking protocol are shown with green squaresand decoys with black triangles.

An Orientation-dependent Hydrogen Bonding Potential 1253

Page 16: An Orientation-dependent Hydrogen Bonding Potential Improves Prediction … · 2008-02-27 · An Orientation-dependent Hydrogen Bonding Potential Improves Prediction of Specificity

misfolded decoys in both the monomeric and pro-tein–protein complex decoy sets. However, it doesnot discriminate incorrect conformations fromnative-like structures in the case of the single-domain decoys generated by the ROSETTA ab initiomethod, and subjected to refinement and extensiveside-chain repacking. The hydrogen bondingpotential is clearly sensitive to distances of atomsbetween 1.5 A and 2.5 A, and there are very fewdecoys within the 1–3 A rmsd range for most ofthe structures in the single domain decoy set.The manifestation of the specificity of hydrogenbonds suggests that the width of the “hydrogenbond funnel” around the native structure isnarrow on the scale of these decoy sets. This issupported by the results on the perturbed-native decoy set: in cases where there aremany decoys in the 1–3 A range available, dis-crimination of native-like structures is possiblefor about half of the structures in the set(Table 2, Figure 5). Backbone hydrogen bondsare the best discriminator, while side-chain–side-chain and side-chain–backbone hydrogenbonds are not contributing significantly. Perhapsin less well packed globally misfolded decoy struc-tures sufficient alternative side-chain conformationsare available that local side-chain hydrogen bond-ing patterns can be optimized to a similar extentas in the native structure.

In contrast, both backbone and side-chainmediated hydrogen bonds contribute significantlyto the discrimination of docking decoys. In manycases, we observe a good correlation between thehydrogen bond score (alone or in combinationwith the van der Waals score) and the rmsd to thenative structure (Figures 6 and 7). This correlationstarts to become apparent in the rmsd range of2–3 A, as suggested by the width of the hydrogenbonding funnel in the single-domain set. Thisresult points to the applicability of this type ofenergy function to the minimization of dockingdecoys towards the native complex once decoysstructurally close enough can be generated.52 Ourprotein complex data set was generated using thebackbones of the protein partners in their boundconformations (although the native side-chainconformations are eliminated in the creation of alldocking decoys). A more stringent test for futurework will be to create protein complex decoysstarting from the conformations of separatelycrystallized (unbound) components.

It should be emphasized that the success in thehydrogen bonding potential in reproducing nativesequences and discriminating misfolded fromnative and near-native conformations does notbear on the question of whether hydrogen bondsare contributing to the stability of proteins andprotein interfaces. Our approach is based on the

Figure 7. Scatter plots of the combined hydrogen bonding score alone (a), c) and e) and in combination with the vander Waals score (b), d) and f) versus decoy Ca rmsd from the native structure for selected non-antibody complexes.Native structures with the side-chains modeled using our rotamer repacking protocol are shown with green squaresand decoys with black triangles.

1254 An Orientation-dependent Hydrogen Bonding Potential

Page 17: An Orientation-dependent Hydrogen Bonding Potential Improves Prediction … · 2008-02-27 · An Orientation-dependent Hydrogen Bonding Potential Improves Prediction of Specificity

assumption that the sequences and structures ofnative single domain structures and protein inter-faces are on average electrostatically optimized42

when compared to non-native sequences andalternative compact conformations. This does notrequire that hydrogen bonding interactions innative proteins and protein–protein complexes aremore favorable than the hydrogen bonds the samegroups make with water in the solvated unfoldedor unbound ensembles.

The encouraging results predicting the identityof amino acid residues in protein interfaces (Figure4), taken together with a previous study predictingbinding energy hotspots in 19 protein–proteincomplexes with reasonable accuracy,46 suggest thatour energy function including the new hydrogenbonding term recapitulates determinants of bothspecificity and affinity in protein–protein inter-faces. This suggests that a combination of theenergy function and our side-chain repackingprotocol will enhance the prediction of specificityin protein interactions and aid the design of pro-tein–protein complexes. Given the available struc-ture of a specific complex of one or more membersof a large family of known protein interactiondomains, the method should allow the generationof specificity profiles for all family members withsignificant overall sequence and assumed struc-tural similarity (T.K. & D.B., unpublished results).With regard to the design aspect, we have appliedthis method to the redesign of specificity in a com-plex between a bacterial DNase and its inhibitorprotein as well as to the design of a protein–protein interface to create a chimeric artificialendonuclease.53

Methods

Native protein structure datasets

Three different collections of protein structures solvedby X-ray crystallography were used in this study. (1) Thedataset used for compiling hydrogen bonding statisticscontained 698 proteins with a resolution of 1.6 A orbetter and a crystallographic R factor of 0.25 or better,taken from the Dunbrack culled pdb collection †. Thelist was additionally filtered to only include single-chainproteins. (2) The high-resolution dataset used for para-meterizing the energy function was taken from Word etal.54 and contained non-redundant structures (less than30% sequence homology) with a resolution of 1.7 A orbetter, a crystallographic R factor of 0.2 or better, and aPro-Check overall G-factor of 20.6 or better.55 Anadditional filter excluded structures with missing side-chain atoms and backbone sections and yielded a finalset of 52 structures. (3) The protein interface dataset wasgenerated from the non-redundant set of protein–pro-tein interfaces compiled by Tsai et al.56 Only heterodi-meric protein–protein complexes were selected, andstructures were additionally filtered to not contain sig-nificant portions of missing density, yielding a total of

50 structures. Ligands and ions contained in the struc-tures were ignored in the analysis.

Atomic coordinates and preparation ofnative structures

Atomic coordinates were taken from structures solvedby X-ray crystallography. Polar hydrogen atoms wereadded to all structures, using CHARMM 19 standardbond lengths and angles. For rotatable bonds in polarhydrogen containing side-chains, several rotamersreflecting different hydrogen positions were created (seebelow), including a 1808 flip of asparagine and glutamineamide groups and the two histidine imidazole tautomers(assumed to be uncharged). Global optimization of thehydrogen bonding network and replacement of missingatoms for amino acid side-chains was performed foreach structure using a simple Metropolis Monte Carloprocedure as described previously45 and the energy func-tion described below. No other minimization of nativestructures was performed.

Definition of secondary structure for backbone–backbone hydrogen bonds

Secondary structure was defined by backbone torsionangles (helix: 21808 , f , 2208; 2908 , c , 2108;sheet: 21808 , f , 2208; 1808 . c . 208 or21808 , c , 21708). Classification of “helix” or “strand”required that at least two adjacent residues have thesame secondary structure. For a hydrogen bond to becounted as occurring in a helix or strand, both residueswere required to have the same secondary structureclassification; all other backbone–backbone hydrogenbonds were summarized as “other”.

The free energy function

The free energy function is a linear combination of vander Waals interactions (represented by the attractive partof a Lennard–Jones potential ðELJattrÞ and a linear dis-tance-dependent repulsive term ðELJrepÞÞ; orientation-dependent terms for side-chain–side-chain, side-chain–backbone and backbone–backbone hydrogen bondsðEHBðsc– bbÞ; EHBðsc– scÞ and EHBðbb– bbÞÞ (see Results andbelow), an implicit solvation model ðGsolÞ;

57 an energyderived from an amino acid and backbone-dependentrotamer probability ðErotðaa;f;cÞÞ;39 an energy derivedfrom amino-acid type (aa) dependent backbone f;cprobabilities ðEf=cðaaÞÞ; and amino acid type dependentreference energies to approximate the interactions madein the unfolded state ensemble (Eref

aa ; naa is the numberof amino acids of a certain type):

DG ¼ WattrELJattr þ WrepELJrep þ WHBðsc– bbÞEHBðsc–bbÞ

þ WHBðsc– scÞEHBðsc– scÞ þ WHBðbb– bbÞEHBðbb– bbÞ

þ WsolGsol þ Wf=cEf=cðaaÞ þ WrotErotðaa;f;cÞ

þX20

aa¼1

naaErefaa ð2Þ

The Lennard–Jones potential, solvation term, and back-bone-dependent amino acid and rotamer probabilitieswere as previously described.45,57 The amino acid typedependent reference energies which approximate thefree energy of the unfolded reference state45 and the† http://www.fccc.edu/research/labs/dunbrack

An Orientation-dependent Hydrogen Bonding Potential 1255

Page 18: An Orientation-dependent Hydrogen Bonding Potential Improves Prediction … · 2008-02-27 · An Orientation-dependent Hydrogen Bonding Potential Improves Prediction of Specificity

weights W for the relative contributions of the differentenergy terms were obtained by fitting all parts of thescoring function to reproduce native sequences of natu-rally occurring proteins as described below. Energies forthe different geometric parameters describing back-bone–backbone and side-chain–side-chain hydrogenbonds EðpÞ were obtained using:

EðpÞ ¼ 2lnfproteinðpÞ

frandomðpÞð3Þ

where fproteinðpÞ is the frequency at which a geometricparameter p is observed in a certain bin in the high-resolution crystal structure dataset, and frandomðpÞ is areference frequency value assuming equal distributionin all bins (for the distance distributions, the long-rangecutoff was 2.6 A for backbone–backbone hydrogenbonds and 3.0 A for side-chain–side-chain hydrogenbonds). The kT prefactor is left out in equation (3) as itis included in the weight given in equation (1). Thedistributions were collected as described in Results.Energies for backbone–side-chain hydrogen bonds weretaken from the side-chain–side-chain statistics. Hydro-gen bonding energies with largely unfavorable energiesfor one or more of the component energy terms (result-ing in a positive total hydrogen bonding energy) wereset to zero. Coulomb electrostatics (to replace the hydro-gen bonding term in our tests) used a linear distance-dependent dielectric constant. All parameters for thehydrogen bonding potential can be found in theSupplementary Material. Partial charges were takenfrom the CHARMM19 parameter set28 with or without acorrection for charged residues neutralizing chargedgroups but increasing their polarity;57 no significantdifference between the two charge sets was observed inour tests.

Parameterizing the energy function onmonomeric proteins

The relative contributions of the different terms of thefree energy function were parameterized on the high-resolution structure dataset as described previously.45

Briefly, rotamers for all amino acids at all sequencepositions in the data set with 52 crystal structures witha resolution of 1.7 A or better were created (a total of7308 sequence positions with an average of 684 rotamersper sequence position). The components of the energyfunction (attractive and repulsive van der Waals inter-actions, solvation, hydrogen bonding, Coulomb electro-statics, backbone-dependent amino acid type androtamer probabilities and reference energies) werecomputed for all rotamers at each sequence positionassuming a constant environment of all other aminoacids in their native conformation. The weights onall terms were optimized using a conjugate gradientmethod to maximize the probability of the native aminoacid type at each position. Different initial values yieldedsimilar fitted parameters, showing convergence of the fit.The relative weights were 1.06 (attractive Lennard–Jones), 0.77 (repulsive Lennard–Jones), 0.72 (solvation),0.42 (backbone–side-chain hydrogen bonding) 0.40(side-chain–side-chain hydrogen bonding; the weightfor backbone–backbone hydrogen bonding could not bedetermined using this procedure as the backbone stayedconstant), 0.89 (backbone-dependent rotamer proba-bility) and 0.86 (backbone-dependent amino acid typeprobability). These weights result in a free energy ofabout 3 kcal/mol for a hydrogen bond with ideal geome-

try. The rotamer library used was taken from Dunbrack39

as described by Kuhlman & Baker.45 Additional rotamerswere included with small deviations (10–208) of the x1

and x2 angles for buried residues, and extra angles forx3 and x4 angles as described by Mayo & co-workers.58

For serine, threonine and tyrosine, hydrogen confor-mations were chosen according to those observed inneutron diffraction maps.59 For threonine and serinehydroxyl groups, the three different staggered positions,for tyrosine hydroxyl groups hydrogen atoms wereassumed to be in the plane of the aromatic ring. Inall cases, small deviations (^208) were includedadditionally.

Decoy sets

The generation of the decoy sets is described in moredetail elsewhere.50,60,61 In brief, two sets of decoys wereused: the first set contains decoys for 41 monomeric,single domain proteins with less than 90 amino acid resi-dues, generated by the ROSETTA method for ab initioprotein structure prediction. Approximately 2000 decoyswere used for each structure. This set was further subdi-vided into (a) 25 proteins with an available high res-olution crystal structure (used for discriminationof the native structure) and (b) 23 proteins where10% of the decoys produced by ROSETTA had a Ca

rmsd from the native structure of 4 A or better (notethat some proteins belong to both subsets). Inaddition, for each of these 23 structures, 300 decoyswere created by peptide fragment-insertion startingfrom the native structure (“perturbed-native decoyset”) using the ROSETTA method, followed by refine-ment on the centroid and full-atom level. The lattertwo decoy sets were used for discriminating low-rmsd from high-rmsd decoys. Finally, polar hydrogenatoms were added to all decoys, followed by side-chain repacking, and simultaneous optimization ofthe hydrogen bonding network and scoring using theenergy function described above.

The second set contained docking decoys for 31 pro-tein–protein complexes, with 18 antibody–antigen and13 enzyme–enzyme inhibitor and other interfaces. Foreach structure, 400 decoys were used (the results wereunchanged when a larger set of 2000 decoys per struc-ture was tested). Decoys were created by rigid-body per-turbations of the relative orientation of the two partnersin the protein–protein complex.50 Note that the backbonecoordinates of the bound conformations of both partnerswere used. However, the side-chain conformationalinformation contained in the crystal structure coordi-nates was eliminated by repacking all side-chains usingthe rotamer repacking protocol described previously45

prior to rigid-body docking. As a last step, the interfaceof all docked decoys was repacked using a Monte-Carlosimulated annealing protocol and the energy functiondescribed above, and final scores were collected. For therepacking and scoring of decoys, the side-chain–side-chain hydrogen bonds were divided into three environ-ment classes, dependent on the extent of burial ofboth participating residues (class 1, exposed–exposedand exposed–intermediate; class 2, exposed–buried andintermediate–intermediate; class 3, intermediate–buriedand buried–buried). The extent of burial was definedby the number of Cb atoms within a sphere of 10 Aradius of the Cb atom of the residue of interest: exposed0–14, intermediate 15–20, buried .20). The relativecontributions of the environment-dependent hydrogen

1256 An Orientation-dependent Hydrogen Bonding Potential

Page 19: An Orientation-dependent Hydrogen Bonding Potential Improves Prediction … · 2008-02-27 · An Orientation-dependent Hydrogen Bonding Potential Improves Prediction of Specificity

bonding terms were estimated from changes in proteinstability upon point mutation, and were 0.18, 0.28 and0.91.46

Z-score analysis and logistic regression

Three different Z-score measures were used, definedas follows:

Zref ¼kEl2 Eref

sEð4Þ

where:

kEl ¼1

N

XN

i¼1

Ei ð5Þ

is an average energy of N decoys:

s2E ¼

1

N

XN

i¼1

ðEi 2 kElÞ2 ð6Þ

is the standard deviation of decoy energies, and Eref isthe reference energy which is either Enat (energy of thenative structure experimentally determined by X-raydiffraction or nuclear magnetic resonance spectroscopy)or Enat_rep (energy of the structure with the nativepolypeptide backbone but repacked side-chains usingour rotamer repacking protocol). These Z-scores arereferred to as Zn (native) and Znr (native-repacked) inthe Tables.

The low rmsd (root mean standard deviation in Ca

coordinates from the native structure) Z-scores (Zlrms,discriminating near-native from non-native confor-mations) are defined as:

Zlrms ¼kElhi 2 kEllo

shiE

ð7Þ

where the sum of the averages and the standard devia-tion are computed over decoys with high (hi) and low(lo) rmsd separately. Low rmsd (near-native) decoys aredefined as the lowest 5% of the rmsd distribution.

Combined free energies (and their Z-scores) usingdifferent energy terms were obtained by logisticregression using a generalized linear modelimplemented in the R statistical software package.

Acknowledgements

We thank members of the Baker laboratory formany stimulating discussions, Jerry Tsai, KiraMisura, Jeff Gray and Stewart Moughon for helpwith creating the original decoy sets, Kira Misuraand a reviewer for very helpful comments on themanuscript, and Keith Laidig for computing sup-port. T.K. was supported by the Human FrontierScience Program and EMBO. This work was alsosupported by a grant from the NIH and theHoward Hughes Medical Institute.

References

1. Baker, E. N. & Hubbard, R. E. (1984). Hydrogenbonding in globular proteins. Prog. Biophys. Mol.Biol. 44, 97–179.

2. Conte, L. L., Chothia, C. & Janin, J. (1999). Theatomic structure of protein–protein recognitionsites. J. Mol. Biol. 285, 2177–2198.

3. Hendsch, Z. S. & Tidor, B. (1994). Do salt bridgesstabilize proteins? A continuum electrostatic analy-sis. Protein Sci. 3, 211–226.

4. Pace, C. N. (2001). Polar group burial contributesmore to protein stability than nonpolar group burial.Biochemistry, 40, 310–313.

5. McDonald, I. K. & Thornton, J. M. (1994). Satisfyinghydrogen bonding potential in proteins. J. Mol. Biol.238, 777–793.

6. Waldburger, C. D., Schildbach, J. F. & Sauer, R. T.(1995). Are buried salt bridges important for proteinstability and conformational specificity? NatureStruct. Biol. 2, 122–128.

7. Yang, A. S. & Honig, B. (1995). Free energy determi-nants of secondary structure formation: I. alpha-helices. J. Mol. Biol. 252, 351–365.

8. Hendsch, Z. S., Jonsson, T., Sauer, R. T. & Tidor, B.(1996). Protein stabilization by removal of unsatisfiedpolar groups: computational approaches and experi-mental tests. Biochemistry, 35, 7621–7625.

9. Lumb, K. J. & Kim, P. S. (1995). A buried polar inter-action imparts structural uniqueness in a designedheterodimeric coiled coil. Biochemistry, 34, 8642–8648.

10. Petrey, D. & Honig, B. (2000). Free energy determi-nants of tertiary structure and the evaluation ofprotein models. Protein Sci. 9, 2181–2191.

11. Morokuma, K. (1977). Why do molecules interact?The origin of electron donor–acceptor complexes,hydrogen bonding and proton affinity. Accts. Chem.Res. 10, 294–300.

12. Kollman, P. A. (1977). Noncovalent interactions.Accts. Chem. Res. 10, 365–371.

13. Taylor, R., Kennard, O. & Versichel, W. (1983).Geometry of the N–H· · ·OvC hydrogen bond. 1.Lone-pair directionality. J. Am. Chem. Soc. 105,5761–5766.

14. Taylor, R. & Kennard, O. (1984). Hydrogen-bondgeometry in organic crystals. Accts. Chem. Res. 17,320–325.

15. Gavezzotti, A. & Filippini, G. (1994). Geometry of theintermolecular X–H· · ·Y (X, Y ¼ N, O) hydrogenbond and the calibration of empirical hydrogen-bond potentials. J. Phys. Chem. ser. B, 98, 4831–4837.

16. Platts, J. A., Howard, S. T. & Bracke, B. R. F. (1996).Directionality of hydrogen bonds to sulfur andoxygen. J. Am. Chem. Soc. 118, 2726–2733.

17. Lommerse, J. P. M., Price, S. L. & Taylor, R. (1997).Hydrogen bonding of carbonyl, ether, and ester oxy-gen atoms with alkanol hydroxyl groups. J. Comput.Chem. 18, 757–774.

18. Grzybowski, B. A., Ishchenko, A. V., DeWitte, R. S.,Whitesides, G. M. & Shakhnovich, E. I. (2000).Development of a knowledge-based potential forcrystals of small organic molecules: calculation ofenergy surfaces for CvO· · ·H–N hydrogen bonds.J. Phys. Chem. ser. B, 104, 7293–7298.

19. Ippolito, J. A., Alexander, R. S. & Christianson,D. W. (1990). Hydrogen bond stereochemistry inprotein structure and function. J. Mol. Biol. 215,457–471.

An Orientation-dependent Hydrogen Bonding Potential 1257

Page 20: An Orientation-dependent Hydrogen Bonding Potential Improves Prediction … · 2008-02-27 · An Orientation-dependent Hydrogen Bonding Potential Improves Prediction of Specificity

20. Stickle, D. F., Presta, L. G., Dill, K. A. & Rose, G. D.(1992). Hydrogen bonding in globular proteins.J. Mol. Biol. 226, 1143–1159.

21. Fabiola, F., Bertram, R., Korostelev, A. & Chapman,M. S. (2002). An improved hydrogen bond potential:impact on medium resolution protein structures.Protein Sci. 11, 1415–1423.

22. McGuire, R. F., Momany, F. A. & Scheraga, H. A.(1972). Energy parameters in polypeptides. V. Anempirical hydrogen bond function based on molecu-lar orbital calculations. J. Phys. Chem. 76, 375–393.

23. Wiberg, K. B., Marquez, M. & Castejon, H. (1994).Lone pairs in carbonyl compounds and ethers.J. Org. Chem. 59, 6817–6822.

24. Brooks, B. R., Bruccoleri, R. E., Olafson, B. D., States,D. J., Swaminathan, S. & Karplus, M. (1983).CHARMM: a program for macromolecular energy,minimization and dynamics calculations. J. Comput.Chem. 4, 187–217.

25. Weiner, S. J., Kollman, P. A., Case, D. A., Singh, U. C.,Ghio, C., Alagona, G. et al. (1984). A new force fieldfor molecular mechanical simulation of nucleic acidsand proteins. J. Am. Chem. Soc. 106, 765–784.

26. Jorgensen, W. J. & Tirado-Rives, J. (1988). The OPLSpotential function for proteins. Energy minimi-zations for crystals of cyclic peptides and crambin.J. Am. Chem. Soc. 110, 1657–1666.

27. Cornell, W. D., Cieplak, P., Bayly, C. I., Gould, I. R.,Merz, K. M. J., Ferguson, D. M. et al. (1995). A secondgeneration force field for the simulation of proteins,nucleic acids, and organic molecules. J. Am. Chem.Soc. 117, 5179–5197.

28. Neria, E., Fischer, S. & Karplus, M. (1996). Simulationof activation free energies in molecular systems.J. Chem. Phys. 105, 1902–1921.

29. Buck, M. & Karplus, M. (2001). Hydrogen bondenergetics: a simulation and statistical analysis ofN-methyl acetamide (NMA), water and human lyso-zyme. J. Phys. Chem. ser. B, 105, 11000–11015.

30. Mayo, S. L., Olafson, B. D. & Goddard, W. A. I.(1990). DREIDING: a generic force field for molecu-lar simulations. J. Phys. Chem. 94, 8897–8909.

31. Dahiyat, B. I., Gordon, D. B. & Mayo, S. L. (1997).Automated design of the surface positions of proteinhelices. Protein Sci. 6, 1333–1337.

32. Pokala, N. & Handel, T. M. (2001). Review: proteindesign—where we were, where we are, where we’regoing. J. Struct. Biol. 134, 269–281.

33. Hao, M. H. & Scheraga, H. A. (1999). Designingpotential energy functions for protein folding. Curr.Opin. Struct. Biol. 9, 184–188.

34. Osguthorpe, D. J. (2000). Ab initio protein folding.Curr. Opin. Struct. Biol. 10, 146–152.

35. Sternberg, M. J., Gabb, H. A. & Jackson, R. M. (1998).Predictive docking of protein–protein and protein–DNA complexes. Curr. Opin. Struct. Biol. 8, 250–256.

36. Brunger, A. T. & Karplus, M. (1988). Polar hydrogenpositions in proteins: empirical energy placementand neutron diffraction comparison. Proteins: Struct.Funct. Genet. 4, 148–156.

37. Lipsitz, R. S., Sharma, Y., Brooks, B. R. & Tjandra, N.(2002). Hydrogen bonding in high-resolution proteinstructures: a new method to assess NMR proteingeometry. J. Am. Chem. Soc. 124, 10621–10626.

38. Anfinsen, C. B. (1973). Principles that govern thefolding of protein chains. Science, 181, 223–230.

39. Dunbrack, R. L., Jr & Cohen, F. E. (1997). Bayesianstatistical analysis of protein side-chain rotamerpreferences. Protein Sci. 6, 1661–1681.

40. Schreiber, G. (2002). Kinetic studies of protein–pro-tein interactions. Curr. Opin. Struct. Biol. 12, 41–47.

41. Sheinerman, F. B., Norel, R. & Honig, B. (2000).Electrostatic aspects of protein–protein interactions.Curr. Opin. Struct. Biol. 10, 153–159.

42. Lee, L. P. & Tidor, B. (2001). Barstar is electrostaticallyoptimized for tight binding to barnase. Nature Struct.Biol. 8, 73–76.

43. Bonneau, R., Tsai, J., Ruczinski, I., Chivian, D., Rohl,C., Strauss, C. E. & Baker, D. (2001). Rosetta inCASP4: progress in ab initio protein structure predic-tion. Proteins: Struct. Funct. Genet. 45, 119–126.

44. Simons, K. T., Ruczinski, I., Kooperberg, C., Fox,B. A., Bystroff, C. & Baker, D. (1999). Improvedrecognition of native-like protein structures using acombination of sequence-dependent and sequence-independent features of proteins. Proteins: Struct.Funct. Genet. 34, 82–95.

45. Kuhlman, B. & Baker, D. (2000). Native proteinsequences are close to optimal for their structures.Proc. Natl Acad. Sci. USA, 97, 10383–10388.

46. Kortemme, T. & Baker, D. (2002). A simple physicalmodel for binding energy hot spots in protein–pro-tein complexes. Proc. Natl Acad. Sci. USA, 99,14116–14121.

47. Park, B. H., Huang, E. S. & Levitt, M. (1997). Factorsaffecting the ability of energy functions to discrimi-nate correct from incorrect folds. J. Mol. Biol. 266,831–846.

48. Gatchell, D. W., Dennis, S. & Vajda, S. (2000).Discrimination of near-native protein structuresfrom misfolded models by empirical free energyfunctions. Proteins: Struct. Funct. Genet. 41, 518–534.

49. Vorobjev, Y. N. & Hermans, J. (2001). Free energiesof protein decoys provide insight into determinantsof protein stability. Protein Sci. 10, 2498–2506.

50. Gray, J. J., Moughon, S., Kortemme, T., Schueler-Furman, O., Misura, K. M. S., Morozov, A. V. &Baker, D. (2003). Protein–protein docking predic-tions for the CAPRI experiment. In press.

51. Lawrence, M. C. & Colman, P. M. (1993). Shapecomplementarity at protein/protein interfaces.J. Mol. Biol. 234, 946–950.

52. Camacho, C. J. & Vajda, S. (2001). Protein dockingalong smooth association pathways. Proc. Natl Acad.Sci. USA, 98, 10636–10641.

53. Chevalier, B. S., Kortemme, T., Chadsey, M. S.,Baker, D., Monnat, R. J. J. & Stoddard, B. L.(2002). Design, activity and structure of E-DreI, ahighly site-specific artifical endonuclease. Mol.Cell, 10, 895–905.

54. Word, J. M., Lovell, S. C., LaBean, T. H., Taylor, H. C.,Zalis, M. E., Presley, B. K. et al. (1999). Visualizingand quantifying molecular goodness-of-fit: small-probe contact dots with explicit hydrogen atoms.J. Mol. Biol. 285, 1711–1733.

55. Laskowski, R. A., Rullmannn, J. A., MacArthur,M. W., Kaptein, R. & Thornton, J. M. (1996). AQUAand PROCHECK-NMR: programs for checkingthe quality of protein structures solved by NMR.J. Biomol. NMR, 8, 477–486.

56. Tsai, C. J., Lin, S. L., Wolfson, H. J. & Nussinov,R. (1996). A dataset of protein–protein interfacesgenerated with a sequence-order-independent com-parison technique. J. Mol. Biol. 260, 604–620.

57. Lazaridis, T. & Karplus, M. (1999). Effective energyfunction for proteins in solution. Proteins: Struct.Funct. Genet. 35, 133–152.

1258 An Orientation-dependent Hydrogen Bonding Potential

Page 21: An Orientation-dependent Hydrogen Bonding Potential Improves Prediction … · 2008-02-27 · An Orientation-dependent Hydrogen Bonding Potential Improves Prediction of Specificity

58. Dahiyat, B. I. & Mayo, S. L. (1997). De novo proteindesign: fully automated sequence selection. Science,278, 82–87.

59. Kossiakoff, A. A., Shpungin, J. & Sintchak, M. D.(1990). Hydroxyl hydrogen conformations in trypsindetermined by the neutron diffraction solvent dif-ference map method: relative importance of stericand electrostatic factors in defining hydrogen-bonding geometries. Proc. Natl Acad. Sci. USA, 87,4468–4472.

60. Tsai, J., Bonneau, R., Morozov, A. V., Kuhlman, B.,Rohl, C. A. & Baker, D. (2003). An improved proteindecoy set for testing energy functions for proteinstructure prediction. In press

61. Morozov, A. V., Kortemme, T. & Baker, D. (2003).Evaluation of models of electrostatic interactions inproteins. J. Phys. Chem. B. In press.

Edited by P. Wright

(Received 19 July 2002; received in revised form 17December 2002; accepted 17 December 2002)

Supplementary Material comprising four Tablesis available on Science Direct

An Orientation-dependent Hydrogen Bonding Potential 1259