

doi:10.1016/j.jmb.2006.01.053 J. Mol. Biol. (2006) 358, 289–309

Mapping the Energetics of Water–Protein and Water–Ligand Interactions with the “Natural” HINT Forcefield: Predictive Tools for Characterizing the Roles of Water in Biomolecules

Alessio Amadasi1, Francesca Spyrakis1, Pietro Cozzini2*, Donald J. Abraham3, Glen E. Kellogg3* and Andrea Mozzarelli1*

1Department of Biochemistry and Molecular Biology, University of Parma, 43100 Parma, Italy

2Laboratory of Molecular Modelling, Department of General and Inorganic Chemistry, Chemical-Physics and Analytical Chemistry, University of Parma, 43100 Parma, Italy

3Department of Medicinal Chemistry & Institute for Structural Biology and Drug Discovery, Virginia Commonwealth University, Richmond, VA 23298-0540 USA


The energetics and hydrogen bonding pattern of water molecules bound to proteins were mapped by analyzing structural data (resolution better than 2.3 Å) for sets of uncomplexed and ligand-complexed proteins. Water–protein and water–ligand interactions were evaluated using hydropathic interactions (HINT), a non-Newtonian forcefield based on experimentally determined log P(octanol/water) values. Potential water hydrogen bonding ability was assessed by a new Rank algorithm. The HINT-derived binding energies and Ranks were −0.04 kcal mol⁻¹ and 0.0, respectively, for second shell water molecules; −0.38 kcal mol⁻¹ and 1.6 for first shell water molecules; −0.45 kcal mol⁻¹ and 2.3 for active site water molecules; −0.55 kcal mol⁻¹ and 3.3 for cavity water molecules; and −0.56 kcal mol⁻¹ and 4.4 for buried water molecules. For the last four classes, similar energies indicate that internal and external water molecules interact with protein almost equally, despite different degrees of hydrogen bonding. The binding energies and Ranks for water molecules bridging ligand–protein were −1.13 kcal mol⁻¹ and 4.5, respectively. This energetic contribution is shared equally between protein and ligand, whereas Rank favors the protein. Lastly, by comparing the uncomplexed and complexed forms of proteins, guidelines were developed for prediction of the roles played by active site water molecules in ligand binding. A water molecule with high Rank and HINT score is unlikely to make further interactions with the ligand and is largely irrelevant to the binding process, while a water molecule with moderate Rank and high HINT score is available for ligand interaction. Water molecules displaced for steric reasons were characterized by lower Rank and HINT score. These guidelines, tested by calculating HINT score and Rank for 50 water molecules bound in the active site of four uncomplexed proteins (for which the structures of the liganded forms were also available), correctly predicted the ultimate roles (in the complex) for 76% of water molecules. Some failures were likely due to ambiguities in the structural data.

© 2006 Elsevier Ltd. All rights reserved.

Keywords: free energy of ligand binding; water; computational methods; HINT; protein

*Corresponding authors


Introduction

Water is a biomolecule

Proteins are biological macromolecules built from L-amino acids and characterized by a defined three-dimensional structure and function. A second, always present, constituent of proteins is water, critical for achieving the correct protein fold, for flexibility in carrying out biological functions, and for mediating protein–protein, protein–ligand and protein–DNA interactions. The role and the energetic contribution of water molecules to protein structure, dynamics and function have been thoroughly investigated by experimental, theoretical and computational approaches over the past several decades.1–20

Furthermore, in recent years, the distribution and the affinity of water molecules within enzyme active sites have been evaluated for the design of selective inhibitors that may eventually be suitable for therapeutic applications.21–29 Most experimental information on the localization of water molecules within a protein is derived from X-ray and neutron diffraction crystallography and nuclear magnetic resonance spectroscopy.4,30

Conventional wisdom is that there is about one water molecule per amino acid residue in a protein. However, in practice, this number depends on the quality of the structural determination;4,31 i.e. while one water molecule is found per residue at 2.0 Å, 1.6–1.7 are found at 1.0 Å resolution.2,32 In addition, visual inspection of 906 protein crystal structures indicated a weak dependence between the number of water molecules and the fraction of polar/apolar surface,32 which is somewhat surprising, since water is expected to preferentially interact with polar residues. Furthermore, it was found that, on average, there is one “buried” water molecule per 27 protein residues.33

Water molecules have been classified on the basis of different criteria, including the number of hydrogen bonds formed and the associated interaction energies, the thermal B-factor, the accessible surface area, the residence time, the conservation and displacement upon ligand binding. The concepts of first and second hydration shells,34,35 water bound to flexible or fixed side-chains in wide, deep or narrow crevices, small or big cavities in the interior of the protein,2 distribution hierarchies36 and proximal or perpendicular radial distribution functions5,9,37 are among many ideas put forward to link simulation and experimental evidence. Buried and tightly bound water molecules exhibit residence times of the order of hundreds of picoseconds,38–41 whereas water molecules that are more on the surface and in contact with the bulk water exhibit shorter residence times of the order of 5–50 ps.37,41–45 The interdependence of protein surface and bound water has been analyzed carefully for 56 high-resolution protein structures.46 Computational simulations have provided details on water networks within a protein, and on pathways that allow water to move from one site to another, or access to sites located deep in the protein core.41,47–49 Computational procedures have been developed for the identification of potential sites for water molecules and the prediction of their energy.21,25,26,50–67

The energetics of water–protein and water–ligand interactions

The thermodynamic energy balance of a water molecule that interacts with either a protein or a ligand or with both simultaneously (the so-called “bridging water”) is complex, as it entails enthalpic and entropic differences between water “free” in solution and “bound” at the protein site.28 Even more complex is the thermodynamic description of the concerted encounter between a protein and ligand that must take into account the reorganization of water molecules around the ligand and within the protein active site. Analyses of high-resolution structures of protein–ligand complexes have indicated that water molecules mediate recognition via formation of hydrogen bonds.24 Strongly bound water molecules in the active site of a protein are not displaced easily by ligand binding, thus these water molecules structurally modify the shape of the protein surface as it is recognized by a ligand.23 Some of these water molecules are quite conserved within the active site of homologous proteins.22 Careful analyses of the three-dimensional structure of proteins in the absence and presence of ligands have allowed a partial, yet valuable, understanding of the role played by individual water molecules in complex stabilization.25,26,48,68–74 Some ordered water molecules present in the free form of a protein are retained upon complex formation with ligands, while others are apparently present only in the complex. Complex formation may be associated, perhaps surprisingly, with solvation or with a decrease in water mobility, thus rendering these water molecules more ordered and detectable in the complex. It has been argued that the unfavorable entropic contribution associated with decreased water mobility might be compensated partially by an increase in protein flexibility.75–77 Accordingly, a number of ligands have been designed to displace specific water molecules, thus increasing the entropic term and, consequently, the affinity. Well-known examples are cyclic urea inhibitors of human immunodeficiency virus type 1 (HIV-1) protease displacing the conserved water 301 (ref. 78) and nitrile-containing compounds that bind in the active site of scytalone dehydratase.79

In the present work, we have addressed the following two questions: (i) what is the strength of the water–protein interaction and to what extent does this strength depend on water localization? and (ii) are conserved or ligand-displaceable water molecules predictable by analyzing the structure of the native protein and evaluating their affinity with the protein active site and their hydrogen bonding pattern? Obviously, this knowledge would be of great help in the rational design of strongly binding ligands.

The HINT “natural” forcefield

To answer these questions, we have made use of HINT (hydropathic interactions), a non-Newtonian forcefield based on experimentally determined $\log P_{o/w}$ (partition constant for 1-octanol/water) values, developed by Kellogg and Abraham.80–82 This natural forcefield is particularly suitable for the quantitative evaluation of non-covalent molecular interactions, especially entropically driven hydrophobic effects, whereas classical molecular mechanics approaches usually omit or only partially evaluate solvation/desolvation events. The product $a_i a_j T_{ij}$ is a simple representation of the interaction between two atoms ($i$ and $j$) and is both a qualitative and a quantitative measure of the association process for the two atoms; the entire biomolecular interaction is scored by the following equation:

$$\sum_i \sum_j b_{ij} = \sum_i \sum_j \left( a_i S_i a_j S_j T_{ij} R_{ij} + r_{ij} \right) \qquad (1)$$

where $b_{ij}$ is the interaction score between atoms $i$ and $j$, $a$ is the hydrophobic atom constant, $S$ is the atomic solvent-accessible surface area, $T_{ij}$ is a logic function assuming +1 or −1 values, depending on the polar nature of interacting atoms, and $R_{ij}$ and $r_{ij}$ are functions of the distance between the atoms $i$ and $j$. $R_{ij}$ is usually a simple exponential function, while $r_{ij}$ is an adaptation of the Lennard-Jones function. The key parameters $a$ are calculated by a procedure adapted from the CLOG-P method conceived by Hansch and Leo.83 A positive $b_{ij}$ value indicates hydropathically favorable contacts (hydrogen bonds, acid/base and hydrophobic/hydrophobic), while negative $b_{ij}$ values indicate unfavorable contacts (acid/acid, base/base and hydrophobic/polar).
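The double sum in equation (1) can be illustrated with a minimal Python sketch. This is not the HINT program: the `Atom` class and the callables `T`, `R` and `r_term` are hypothetical stand-ins introduced only for illustration, the first sum is assumed to run over the atoms of one molecule and the second over the atoms of its partner, and the actual functional forms and parameters used by HINT are not given in this text.

```python
import math
from dataclasses import dataclass


@dataclass
class Atom:
    """Hypothetical container for the per-atom quantities in equation (1)."""
    a: float        # hydrophobic atom constant
    S: float        # solvent-accessible surface area
    xyz: tuple      # Cartesian coordinates in Å


def hint_like_score(atoms_1, atoms_2, T, R, r_term):
    """Sum b_ij = a_i*S_i*a_j*S_j*T_ij*R_ij + r_ij over all atom pairs.

    T(i, j) must return +1 or -1; R(d) and r_term(d) are functions of the
    interatomic distance d. All three are supplied by the caller because
    their exact forms are assumptions here.
    """
    total = 0.0
    for i in atoms_1:
        for j in atoms_2:
            d = math.dist(i.xyz, j.xyz)
            total += i.a * i.S * j.a * j.S * T(i, j) * R(d) + r_term(d)
    return total


# Toy inputs: every number below is a placeholder, not a HINT parameter.
water = [Atom(a=1.0, S=2.0, xyz=(0.0, 0.0, 0.0))]
protein = [Atom(a=3.0, S=4.0, xyz=(0.0, 0.0, 1.0))]

score = hint_like_score(
    water,
    protein,
    T=lambda i, j: 1,             # placeholder sign function
    R=lambda d: math.exp(-d),     # "simple exponential", assumed form
    r_term=lambda d: 0.0,         # Lennard-Jones-like term omitted in this toy
)
```

Passing `T`, `R` and `r_term` as arguments keeps the sketch honest about what is unknown: only the structure of the sum is taken from equation (1).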

In the HINT model, both positive and negative interactions are quantified using the same protocol, i.e. a biomolecular association process is evaluated as a concerted event and not as a net sum of terms categorized by interaction type, each having its own “equation”.84 Consequently, different scores are only due to the different hydropathic properties of the interacting atoms, expressed implicitly in the hydrophobic atomic constants. The $\log P_{o/w}$ of a ligand or biological molecule is the sum of all hydrophobic atomic constants for that molecule; thus, the $a$ parameters are dimensionless thermodynamic quantities related directly to the free energy of atom transfer between 1-octanol and water. The HINT paradigm is based on the assumption that each $b_{ij}$ is related to a partial $\delta g$ value and the total HINT score ($H_{TOT}$) is comparable directly to the global $\Delta G^{\circ}_{\mathrm{interaction}}$. HINT was applied successfully in a correlation of free energy of binding in ligand–protein complexes,85,86 for the evaluation of the influence of ionization states of protein and ligand functional groups on complex affinities,87,88 for the contribution of water to the free energy of association at protein–protein interfaces89 and at ligand–protein interfaces.14,29 A number of other studies with ligand–nucleic acid association have been reported.90–92

Recently, we have developed a procedure called Rank that is designed to calculate the number and the geometrical quality of potential hydrogen bonds formed by each water molecule to non-water atoms in a solvated protein.93 We have applied a combined HINT and Rank analysis for the characterization of water molecules bound to proteins both in the absence and in the presence of ligands. This has allowed us to predict the role played by water molecules localized in the active site of the native protein when ligand binding takes place.

Results and Discussion

Classification of crystallographically detected water molecules

Water molecules bound to a protein were classified with respect to their location using MOLCAD tools as described in Materials and Methods.47,94,95 The most superficial and solvent-exposed regions of each protein were colored blue, while depressions, clefts and grooves were progressively colored green, yellow and orange. A representative example, HIV-1 protease, is shown in Figure 1(a). Water molecules were divided into five classes: (1) water molecules found in active sites; (2) water molecules inserted deeply in cavities; (3) buried water molecules; (4) first shell external water molecules; and (5) second shell external water molecules. In the first hydration shell, water molecules were 4 Å or less from the nearest protein heavy-atom (Figure 1(b)), while the second shell water molecules were more than 4 Å from the protein. Thus, the minimum distance between the hydration layers was 4 Å.46 Water molecules placed in protein active sites (Figure 1(c)) could be displaced or retained upon substrate or ligand entrance, or possibly internalized into cavities formed at the protein–ligand interface. Cavities are identified easily, because they appear as pores in the protein surface (Figure 1(d)) with a diameter similar to that of a water molecule.46 Therefore, water molecules were considered localized in a cavity when, with the exception of this single pore, the entire molecule surface was surrounded by protein residues, and the water molecule could escape only by translating through the pore in a unique solvent-accessible direction. In contrast, buried water molecules are locked in closed internal clefts, i.e. are encased completely in the protein matrix and are completely undetectable with a Connolly surface model. As a consequence, buried water molecules should form at least three hydrogen bonds with protein residues to compensate for the loss of hydrogen bonds with bulk water.96 These “internal” water molecules are normally detectable easily by X-ray crystallography due to their low B-factors, corresponding to well-defined energy minima.2
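The 4 Å criterion separating the first and second external hydration shells lends itself to a short sketch. The function names and the bare coordinate tuples below are illustrative assumptions; the sketch covers only the distance test for external water molecules, since the active site, cavity and buried classes were assigned from the MOLCAD surface and are not reproduced here.

```python
import math

FIRST_SHELL_CUTOFF = 4.0  # Å, the distance criterion quoted in the text


def min_distance(water_oxygen, protein_heavy_atoms):
    """Distance (Å) from a water oxygen to the nearest protein heavy atom."""
    return min(math.dist(water_oxygen, atom) for atom in protein_heavy_atoms)


def external_shell(water_oxygen, protein_heavy_atoms, cutoff=FIRST_SHELL_CUTOFF):
    """Label an external water as first or second shell by the cutoff."""
    if min_distance(water_oxygen, protein_heavy_atoms) <= cutoff:
        return "first shell"
    return "second shell"


# Toy coordinates (Å), not taken from any PDB entry.
protein_atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
near_water = (0.0, 0.0, 3.0)
far_water = (0.0, 0.0, 6.0)

near_label = external_shell(near_water, protein_atoms)
far_label = external_shell(far_water, protein_atoms)
```

In practice the coordinates would come from the heavy-atom and water records of a crystal structure file; reading such a file is left out to keep the example self-contained.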

Figure 1. (a) Connolly surface of HIV-1 protease (PDB code 1G6L) built using Sybyl MOLCAD tools. Protruding regions are colored blue, while depressions, clefts and grooves are colored green, yellow and orange, respectively. Water molecules belonging to different topographic classes are rendered in CPK. (b) Water molecule in the first hydration shell. (c) Buried (1) and active site (2) water molecules. For clarity, a portion of the surface belonging to the Arg8 side-chain has been hidden. (d) Water molecule in a surface cavity.

Crystallographically detected water molecules were identified, characterized and analyzed for 12 protein structures (data set I, see Table 1). Proteins in set I were selected on the basis of three criteria: functional and structural heterogeneity, availability of a relatively high-resolution structure (all structures determined at resolution better than 2.3 Å). The set includes three aspartic proteases (PDB codes 1G6L, 4APE, and 1W50), two serine proteases (1TPO and 1JOU), four transport proteins (1LIB, 1I04, 1GCG, and 1XCA), a ribonuclease (1FS3), a GTPase domain (1WER), and a heat shock protein (1YER). In total, 2186 water molecules were evaluated and classified into the five categories described above: 112 water molecules were found within protein active sites, 205 in cavities, 148 buried into the protein matrix, 1493 in the first external hydration layer and 228 in the second hydration layer (Table 2). The strength of the interaction formed by each water molecule with the protein was evaluated by the HINT forcefield,82 and the number of potential hydrogen bonds was predicted using the Rank algorithm (Materials and Methods).93 All water molecules were optimized exhaustively by the HINT tool93 in order to produce the maximum number and strength of interactions. It is important to note that hydrogen atoms are not usually detected by X-ray diffraction techniques, and nearly always must be added and minimized by computational techniques in molecular models of proteins. Often these water molecules can be trapped in local energy minima, so our exhaustive procedure is designed to find a global minimum for the orientation of each water molecule, even at a cost of increased calculation time.

Table 1. Data set I: protein structures

                                                         Crystallographically detected water molecules(a)
Protein                        PDB code  Resolution (Å)  Total  2nd shell  1st shell  Site  Cavity  Buried  Ref.
HIV-1 protease                 1G6L      1.90             95      4         70         8      8       5     109
Lipid-binding protein          1LIB      1.70             89      3         65        12      3       6     97
Trypsin                        1TPO      1.70             84      2         45         2     12      23     110
Mouse major urinary protein-I  1I04      2.00            154      8        113         1     29       3     111
Thrombin                       1JOU      1.80            205     26        116         7     32      24     112
Endothiapepsin                 4APE      2.10            343    102        198        17     11      15     113
Galactose-binding protein      1GCG      1.90            175      4        133        14     17       7     114
β Secretase                    1W50      1.75            279      5        170        19     43      42     115
P120 GAP activating domain     1WER      1.60            181      0        142         8     21      10     116
Ribonuclease A                 1FS3      1.40            108      8         90         4      4       2     117
Heat shock protein 90          1YER      1.65            335     37        263         9     18       8     118
Retinoic acid transport        1XCA      2.30            138     29         88        11      7       3     119

(a) See the text for definitions of water topographic classes.

The mean HINT score and Rank values, relative to each individual class of water molecule and to the entire water molecule set, were calculated and are presented in Table 2. These data indicate that HINT score and Rank values increase concomitantly from second layer to first layer hydration water molecules to, in order, water molecules placed in active sites, in cavities and buried completely. This finding is in agreement with several empirical and experimental observations, stating that peripheral water molecules are less stable than internal water molecules.2,7,44 The average HINT score value was 20 for second shell water molecules, 198 for the first shell, 233 for active sites, 284 for cavities, and 287 for buried water molecules (Figure 2(a)). We have reported that about 515 HINT score units correspond to −1 kcal mol⁻¹.85 Thus, for reference, the buried water molecule HINT score value represents about −0.56 kcal mol⁻¹. This value might appear small for a buried water molecule. However, it represents the total enthalpic and entropic balance to place a water molecule within a protein. In previous work by Fischer and Verma, the entropic cost was estimated to be about 20.6 kcal mol⁻¹ at 300 K, while the enthalpic gain for a water molecule forming four hydrogen bonds was −20.0 kcal mol⁻¹.76 Williams et al. estimated about −0.6 kcal mol⁻¹ for each hydrogen bond formed by water molecules in apolar cavities.33 The free energy for the transfer of a water molecule to either a crystallographically unoccupied apolar cavity, forming two hydrogen bonds, or to a cavity that already contains a water molecule was calculated to be −0.2 (±1.5) kcal mol⁻¹ and −10.0 (±1.3) kcal mol⁻¹, respectively, by Wade et al.47 More recently, Olano and Rick calculated a free energy change of −4.7 kcal mol⁻¹ for the hydration of a polar cavity in bovine pancreatic trypsin inhibitor with formation of four hydrogen bonds.77 It should be pointed out that all thermodynamic calculated values contain large uncertainty, as there can be significant disagreement between calculations and structural data in predicting cavity occupancy.77

Table 2. Distribution of water molecules bound to protein structures in five different topographic classes

                     2nd external layer  1st external shell  Site waters  Cavity   Buried   Total
No. water molecules  228                 1493                112          205      148      2186
Mean HINT score      20±38               198±175             233±217      284±180  287±194  204±183
Mean Rank            0.0                 1.6±1.0             2.3±1.3      3.3±1.1  4.4±1.1  1.8±1.4

Figure 2. (a) Distribution of HINT scores for different classes of water molecules in data set I (uncomplexed proteins). HINT scores for second shell external water molecules (black), first shell external water molecules (dark blue), active site water molecules (green), cavity water molecules (magenta) and buried water molecules (red) are plotted. HINT scores are binned as follows: 250 = percentage of water molecules with scores between 0 and 249; 500 = percentage of water molecules with scores between 250 and 499, etc. (b) Distribution of Ranks for different classes of water molecules in data set I. Ranks for second shell external water molecules (black), first shell external water molecules (dark blue), active site water molecules (green), cavity water molecules (magenta) and buried water molecules (red). Ranks are binned as follows: 0 = percentage of water molecules with Ranks less than 0.5; 1 = percentage of water molecules with Ranks between 0.5 and 1.49, etc.

The HINT score, although proportional to free energy, is more properly used in relative comparisons between different interacting molecules (see below). The last four classes of water exhibit rather similar HINT score averages (−0.38 to −0.56 kcal mol⁻¹), indicating that internal and external water molecules are similarly bound to the protein. Since buried and cavity water molecules are almost completely surrounded by the protein matrix, whereas external and active site water molecules are (by definition) directly solvent-exposed and interact with the protein matrix only partially, this behavior is at first somewhat surprising. However, the chemical nature of the protein residues that interact with water molecules provides a likely explanation. The surface of internal cavities is essentially shaped by polar backbone groups. Williams and co-workers observed that buried water molecules usually make 53% of their polar contacts with the protein backbone, 30% with side-chains and 17% with other buried water molecules.33 The protein surface is characterized largely by polar (charged and uncharged) amino acid residues that expose their side-chains to the bulk solvent, where they are able to interact strongly with the hydration water molecules. The interactions between water and polar side-chains are scored higher than contacts between water and backbone moieties because the latter are often involved in intra-chain interactions. This is reflected in the different hydrophobic atomic constants and solvent-accessible surface areas assigned by the HINT forcefield to these atoms in their different environments. Thus, external and cavity water molecules can exhibit very similar score values, even if the number of hydrogen bonds they are involved in is quite different. This interpretation is supported by the distribution of Ranks, which shows a smooth increase with water class: for the 2186 water molecules analyzed, the average values are 0.0 for second shell hydration water molecules (they have no direct interaction with the protein), 1.6 for the first shell water molecules, 2.3 for active site water molecules, 3.3 for cavity water molecules, and 4.4 for buried water molecules (Figure 2(b); Table 2).
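As a worked check of the conversion quoted above (about 515 HINT score units per −1 kcal mol⁻¹), the following sketch converts the mean HINT scores of the five water classes into approximate energies. The function and dictionary names are illustrative; only the factor of 515 and the mean scores come from the text, and the linear conversion is used here purely for reference, since the HINT score is described as more appropriate for relative comparisons.

```python
HINT_UNITS_PER_KCAL = 515.0  # about 515 score units correspond to -1 kcal/mol


def hint_to_kcal_per_mol(hint_score):
    """Approximate energy in kcal/mol; positive (favorable) scores give negative values."""
    return -hint_score / HINT_UNITS_PER_KCAL


# Mean HINT scores per topographic class, as reported for data set I.
mean_hint_scores = {
    "second shell": 20,
    "first shell": 198,
    "active site": 233,
    "cavity": 284,
    "buried": 287,
}

energies = {
    water_class: round(hint_to_kcal_per_mol(score), 2)
    for water_class, score in mean_hint_scores.items()
}

for water_class, energy in energies.items():
    print(f"{water_class}: {energy:.2f} kcal/mol")
```

The rounded values reproduce the energies quoted for each class, from −0.04 kcal mol⁻¹ for second shell water molecules to −0.56 kcal mol⁻¹ for buried ones.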

Identification of water molecules bridging protein and ligand in protein–ligand complexes

The HINT score and Rank analysis were performed on a set of 15 protein–ligand complexes that each contain at least one well-identified water molecule bridging the protein–ligand interface (data set II, see Table 3). In most cases, the same proteins were analyzed also in the absence of ligand to investigate the expulsion of water by incoming ligands (vide infra). Here, we investigated the ability of the HINT forcefield to identify water molecules playing a significant energetic and/or functional role, and to characterize their geometrical features. This is of obvious relevance for ligand design.25,26,28,29,64

The 15 protein–ligand complexes in our data set, all structurally determined at a resolution of 2.3 Å or better, contained 2489 water molecules (between 62 and 360 per complex, according to the counts listed in Table 3). These data, especially with respect to the critical bridging water molecules, are summarized in Table 3. Figure 3 presents pictorial vignettes of ten of these water molecules that have been identified as “bridging”.

Table 3. Data set II: ligand–protein complexes

                                                                                 Bridging water molecules
Complexed protein                   PDB code  Resolution (Å)  No. crystal water molecules  ID(a)     Interacting residues(b)               Figure(c)  Reference
HIV-1 protease                      4PHV      2.10            104                          1(301)    Ile50, Ile150                         3(a)       78,120,121
HIV-2 protease                      1HII      2.30            194                          1(301)    Ile50, Ile150                         –          122
Lipid-binding protein               1LID      1.60             92                          184       Arg106                                3(b)       97
Trypsin                             3PTB      1.70             62                          416       Trp215, Val227                        3(c)       110
Trypsin                             1TNH      1.80            163                          299(414)  Ser217, Gly219, Lys224                –          110,123
                                                                                           349(416)  Ser214, Trp215, Val227
Mouse major urinary protein-I       1I05      2.00             92                          501       Phe56                                 3(d)       111
                                                                                           503       Leu58, Tyr138
Thrombin                            1D6W      2.00            133                          426       Phe227, Tyr228, Trp215                3(e)       124,125
                                                                                           415       Asp189, Gly219, Asp221
Thrombin                            1AE8      2.00            159                          531       Asp189                                3(f)       125,126
                                                                                           478       Asp189, Tyr225
Thrombin                            1A4W      1.80            157                          418       Gly193, Ser195                        –          125,127
                                                                                           409       Ser214, Ser195
                                                                                           413       Leu41
                                                                                           395       Phe227, Tyr228
                                                                                           471       Asp189
Endothiapepsin                      1ENT      1.90            264                          1         Leu220, Tyr222                        3(g)       128–130
Galactose-binding protein           2GBP      1.90            214                          313(5)    Asp14, Phe16, Asn211                  –          114,131
L-Arabinose-binding protein         1ABE      1.70            227                          309       Gln11, Glu14, Asn205                  3(h)       132,133
                                                                                           310       Asp89, Thr147
β Secretase                         1TQF      1.80            360                          35        Asp32, Asp228                         3(i)       134
Retinoic acid transport             1CBS      1.80            100                          309       Arg111                                –          97,98
Protein kinase A catalytic subunit  1JBP      2.20            168                          448(f)    ADP ribose, Leu49, Glu127, Tyr330     3(j)       135,136

(a) Identification number for the water molecule in the PDB file and, when present, the nomenclature for that water molecule as used within the literature reference (in parentheses).
(b) The named residues are interacting with the ligand through the bridging water molecule.
(c) A vignette for the water molecule and its surrounding environment is presented in this Figure.

For further characterization of bridging water molecules, the Rank algorithm was used to obtain “partial” Rank values for the protein and ligand contributions of the interacting water molecule. Similarly, partial HINT score values for HProtein–Water and HLigand–Water contributions were calculated. The Rank and HINT score values for bridging water molecules in the 15 protein–ligand complexes analyzed in set II are given in Table 4. The average HINT and Rank values are 583 (−1.13 kcal mol⁻¹) and 4.5, respectively. The average HINT score value here is significantly higher than that for buried water molecules (583 versus 287); this difference can likely be attributed to the differing chemical natures of protein groups shaping internal cavities versus the protein/ligand groups forming binding sites. As noted above, cavities are formed mostly by backbone residues; however, binding pockets very often contain polar or charged polar groups like carboxylate and amine, and/or guanidinium ions, contributed by either the proteins or the ligand. Interactions with these groups are, as expected due to the implicit Coulombic reinforcement, scored higher by the HINT forcefield. Thus, HINT recognizes and energetically rewards conservation of water molecules and indicates the relevance of these bridging water molecules to ligand binding. The distinct protein–water and ligand–water HINT score contributions are given in Figure 4(a), while the total HINT score distribution for bridging waters is shown in the inset. No real difference can be observed for the protein–water and ligand–water HINT score contributions; i.e. the energetic contribution of bridging water molecules to the complex is shared equally between protein (HProtein–Water = 307) and ligand (HLigand–Water = 277). A somewhat different, but not contradictory, conclusion is derived from the analysis of the Ranks for these bridging water molecules. The calculated Rank for these water molecules (Table 4) is comparable to the Rank determined for buried water molecules (Table 2). This is not unexpected, due to the geometrical constraints of bridging water molecules, which are frequently hydrogen-bonding saturated. The average protein–water and ligand–water Ranks are 3.0 and 1.5, respectively (Table 4). The distribution of protein–water and ligand–water Rank contributions is illustrated in Figure 4(b), whereas the overall Rank distribution of bridging water molecules is shown in the inset. It is immediately clear that proteins serve to lock conserved bridging water molecules to a much greater extent than ligands. One satisfactory explanation is that protein-binding pocket surfaces, full of clefts and dips, and significant side-chain flexibility, are better able to envelop water molecules than synthetic ligands that are usually endowed (by design) with rigid geometries.
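The contrast between the evenly shared HINT score and the protein-weighted Rank can be made concrete with a small calculation on the mean partial values quoted above. The helper name `contribution_shares` is illustrative; the inputs (307 and 277 for the HINT score, 3.0 and 1.5 for Rank) are the reported means, and the split into fractions is a simple normalization rather than anything prescribed by the HINT or Rank procedures.

```python
def contribution_shares(protein_part, ligand_part):
    """Return the protein and ligand fractions of a summed partial quantity."""
    total = protein_part + ligand_part
    return protein_part / total, ligand_part / total


# Mean partial values for the bridging water molecules of data set II.
hint_protein_share, hint_ligand_share = contribution_shares(307, 277)
rank_protein_share, rank_ligand_share = contribution_shares(3.0, 1.5)

print(f"HINT score: protein {hint_protein_share:.0%}, ligand {hint_ligand_share:.0%}")
print(f"Rank:       protein {rank_protein_share:.0%}, ligand {rank_ligand_share:.0%}")
```

On these means the protein accounts for roughly half of the HINT score but about two-thirds of the Rank, which is the numerical form of the statement that the energy is shared equally while the hydrogen bonding geometry favors the protein.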

Comparison of active site water molecules between native and liganded proteins

Water molecules located within native (unliganded) protein active sites were examined to identify features that would allow predictions of whether

Figure 3. Examples of water molecules bridging ligand and protein. (a) HIV-1 protease complexed with inhibitor L-700,417 (4PHV); (b) lipid-binding protein complexed with oleic acid (1LID); (c) trypsin complexed with benzyldiamine (3PTB); (d) mouse major urinary protein complexed with HMH (1I05); (e) thrombin complexed with guanidinic-based inhibitor (1D6W); (f) thrombin complexed with thiazole-based inhibitor (1AE8); (g) endothiapepsin complexed with phosphostatine (1ENT); (h) L-arabinose-binding protein complexed with L-arabinose (1ABE); (i) β-secretase complexed with an aminopentyloxyacetamide (1TQF); (j) protein kinase A catalytic subunit complexed with ADP (1JBP).

296 Water–Protein and Water–Ligand Interactions

Table 4. Rank and HINT score values for bridging water molecules identified in set II ligand–protein complexes

                         Rank                                    HINT score
PDB code  Water IDᵃ      Protein–water  Ligand–water  Total      Protein–water  Ligand–water  Total
4PHV      1(301)         2.7            3.1           5.9        53             474           527
1HII      1(301)         2.9            3.0           6.0        92             591           680
1LID      184            2.5            1.4           3.9        139            662           797
3PTB      416            3.8            1.3           5.1        256            173           431
1TNH      299(414)       4.1            1.4           5.5        294            632           926
          349(416)       3.8            1.2           5.0        225            96            321
1I05      501            1.2            1.1           2.3        −9             88            76
          503            3.2            0.0           3.2        255            −86           170
1D6W      426            3.6            1.4           4.9        75             363           439
          415            3.9            1.3           5.2        426            404           832
1AE8      531            1.2            1.2           2.4        511            335           846
          478            4.7            0.0           4.7        467            215           682
1A4W      418            2.6            1.1           3.7        109            243           350
          409            2.9            2.4           5.4        206            102           307
          413            1.2            1.0           2.2        66             422           488
          395            3.7            1.3           4.9        146            143           287
          471            3.9            1.4           5.2        613            41            661
1ENT      1              2.9            2.6           5.5        122            854           969
2GBP      313(5)         3.6            1.2           4.7        301            19            317
1ABE      309            4.1            1.6           5.7        751            102           849
          310            2.7            1.3           4.0        471            −95           372
1TQF      35             2.7            1.3           4.0        1039           85            1125
1CBS      309            2.5            1.4           3.9        68             739           807
1JBP      448(f)         3.0            2.4           5.4        700            34            732
Mean                     3.0±0.9        1.5±0.6       4.5±1.1    307±265        277±268       583±279

ᵃ Identification number for the water molecule in the PDB file and, when present, the nomenclature for that water molecule as used within the literature reference (in parentheses).

Figure 4. (a) HINT score distribution for bridging water molecules in ligand–protein complexes. The partial H_Protein–Water and H_Ligand–Water values are colored blue and yellow, respectively. Inset: the average HINT score distribution (H_TOT = H_Protein–Water + H_Ligand–Water). (b) Rank distribution for bridging water molecules in ligand–protein complexes. The partial Rank_Protein–Water and Rank_Ligand–Water values are colored blue and yellow, respectively. Inset: average Rank distribution (Rank_TOT = Rank_Protein–Water + Rank_Ligand–Water), representing the number of potential hydrogen bonds formed by each bridging water molecule with both protein and ligand.


particular water molecules would be displaced or retained upon entrance of the ligand. This analysis proceeded in two steps: first, our model was trained using data set III (a subset of set II where X-ray structures for both the unbound and ligand-bound protein forms are available); next, in the following subsection, this model will be tested with data set IV (four additional proteins with both unbound and ligand-bound structures available). Set III includes HIV-1 protease, lipid-binding protein, trypsin, mouse major urinary protein I, thrombin, endothiapepsin, galactose-binding protein, β-secretase and retinoic acid transport protein. The ligand-bound and unbound models for each protein were superimposed and active site water molecules, within a 4 Å range from the protein–ligand interface, were identified before and after ligand or substrate entrance, and catalogued into seven different categories: (1) conserved protein–ligand bridging water molecules; (2) conserved water molecules in the binding pocket but not essential for complex formation; (3) conserved water molecules in binding site cavities; (4) conserved water molecules in more peripheral binding site areas; (5) “functionally” and (6) “sterically” displaced water molecules, i.e. water molecules that are displaced by the action of a ligand that either bears a hydrogen-bonding group or is not geometrically compatible with the presence of water; and (7) “missing” water molecules, i.e. water molecules that do not appear to play any functional or structural role and yet are apparently displaced upon ligand binding. Active site water molecules are considered conserved only when the water oxygen atom in the protein–ligand complex lies within 1.2 Å from the position occupied in the corresponding unliganded protein [59]. The only exceptions are the conserved bridging water molecules located at the protein–ligand interfaces in the lipid-binding and retinoic acid-binding proteins (water molecules 184 and 309), located 2.4 Å and 2.2 Å from the positions occupied in the uncomplexed structures, respectively. This is a typical case in which water molecules, pushed towards the binding pocket walls, may move to more significant roles after the ligand entrance. Due to the recognized importance of these conserved water molecules in mediating the protein–ligand complex formation, they have been retained in our analyses [97,98].
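The 1.2 Å conservation criterion can be illustrated with a short sketch. It assumes that the unliganded and liganded structures have already been superimposed, so that all water oxygen coordinates share one frame. The function name `match_conserved_waters`, the dictionary inputs and the toy coordinates are hypothetical and are not part of HINT or Sybyl.

```python
import math

CONSERVATION_CUTOFF = 1.2  # angstrom, the criterion quoted in the text


def match_conserved_waters(apo_waters, holo_waters, cutoff=CONSERVATION_CUTOFF):
    """Map each unliganded-structure water ID to the nearest liganded-structure
    water ID lying within `cutoff`, or to None when no water is that close.

    Both inputs are dicts {water_id: (x, y, z)} of oxygen coordinates that are
    assumed to be expressed in the same (superimposed) frame.
    """
    matches = {}
    for apo_id, apo_xyz in apo_waters.items():
        best_id, best_dist = None, cutoff
        for holo_id, holo_xyz in holo_waters.items():
            d = math.dist(apo_xyz, holo_xyz)
            if d <= best_dist:
                best_id, best_dist = holo_id, d
        matches[apo_id] = best_id
    return matches


# Toy coordinates (hypothetical, for illustration only).
apo = {"W1": (0.0, 0.0, 0.0), "W2": (5.0, 0.0, 0.0)}
holo = {"W10": (0.5, 0.0, 0.0), "W11": (7.4, 0.0, 0.0)}
conserved = match_conserved_waters(apo, holo)
```

In this toy case W2 has moved 2.4 Å and is not matched at the default cutoff, which mirrors why shifted bridging waters such as water 184 of lipid-binding protein had to be retained as explicit exceptions.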

A representative example, uncomplexed and liganded endothiapepsin, is illustrated in Figure 5(a) and (b). Out of 20 water molecules localized in the putative active site of the uncomplexed protein, only six are retained in the active site after ligand binding. One of these (Figure 5, green) is buried deeply in a protein cavity, and is thus not easily accessible to ligand groups; two water molecules (yellow) are conserved in both liganded and unliganded endothiapepsin structures but their location is not critical for the protein–ligand interaction; another two (blue), although conserved, are too far away to affect the binding event. Only water 1 (red) seems to actually participate in the binding process, forming an equal number of hydrogen bonds with both the protein and the ligand.

HINT scores and Ranks for the water molecules in the uncomplexed proteins of set III were determined and are given in Table 5. Conserved water molecules that are localized in the active site of the uncomplexed protein and categorized as either bridging or non-bridging in the corresponding ligand–protein complex exhibited a Rank of 2.8 and HINT score of 260 (−0.50 kcal mol⁻¹). These values are very close to those in Table 4 (Rank = 3.0, H_Protein–Water = 307) for well-defined bridging water molecules. This similarity provides insight into the nature of potential interfacial water molecules: despite being “locked” in their positions, they are still able to form strong hydrogen bonds with incoming ligands or substrates. It is important to reiterate that, while a Rank of around 3 may correspond to three “average” H-bonds formed with protein residues, it may also result from two exceptional contacts (ideal distance and angles), as, for example, in water 301 of HIV-1 protease. This arrangement allows these conserved water molecules to form additional strong and well-localized H-bonds with incoming ligand molecules in the binding pocket, thus contributing significantly to protein–ligand complex formation. In contrast, conserved water molecules localized in deep site cavities, showing higher mean Rank and HINT scores, 3.7 and 384, respectively, are perhaps too well-fixed by the surrounding protein residues and are less suitable for interaction with ligands and, therefore, less involved in the binding process. The Rank of 3.7 for this class almost certainly indicates three or four H-bond interactions with the protein and, at most, only one remaining interaction site. Functionally and sterically displaced water molecules were both characterized by lower Ranks, 2.2 and 1.5, respectively, and HINT scores, 402 and 154, respectively. It follows that sterically displaced water molecules can be removed easily by ligands, whereas functionally displaced water molecules are bound to the protein more tightly and can be shifted or eliminated only by ligands with groups capable of forming compensating hydrogen bonds. Finally, external conserved and missing water molecules exhibit identical Ranks, 1.1, but very different HINT scores, 151 and 26, respectively. External conserved water molecules were found to form a single hydrogen bond to, at most, one peripheral polar protein group. As a consequence, they can be displaced at a modest energetic cost. However, these hydration water molecules, placed around the protein, even if less detectable and more easily displaced than other more conserved water molecules, are not less relevant. They are in a fluctuating cloud that is more or less affected by the nature of the protein surface [7].

We can conclude from these results that water molecules characterized by Rank less than 1.5 are likely non-relevant for binding and should be classed as sterically displaced, conserved external or missing

Figure 5. Comparison between water molecules found in the binding pocket of (a) uncomplexed endothiapepsin and (b) water molecules still present in the liganded protein active site. Functionally displaced, sterically displaced and missing water molecules are colored magenta, cyan and white, respectively. Water molecules conserved in protein cavities, at the protein–ligand interface, or in more external binding pocket regions are colored green, yellow and blue, respectively, while the unique bridging water molecule (water 1) is colored red.


water molecules. Rank between 1.5 and 4.0 indicates water molecules that may be functionally displaced, conserved active site, conserved bridging, or sterically displaced. Of these, all are relevant with respect to ligand binding except the latter, sterically displaced, class of water molecules. Here, reference to the corresponding HINT scores indicates that water molecules with Rank greater than 1.5 and HINT score lower than 150 should be considered as (only) sterically displaceable and not engaged actively in the ligand binding process. Others with Rank between 1.5 and 4.0 and HINT scores greater than 150 are energetically relevant and of potential interest for ligand design. Finally, as alluded to above, Ranks of around 4 and greater clearly identify less accessible water molecules that are usually strongly conserved in active site cavities and are unlikely to interact significantly with ligands.
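The thresholds above can be condensed into a small decision rule. The following is a minimal sketch rather than the code used in this work: the function name `predict_water_role` is hypothetical, the class labels are borrowed from the prediction column of Table 6, and the handling of values falling exactly on a threshold (Rank of 1.5 or 4.0, HINT score of 150) is an assumption, since the text does not specify it.

```python
RANK_LOW = 1.5      # below this, the water is treated as non-relevant for binding
RANK_HIGH = 4.0     # "around 4 and greater": conserved cavity water
HINT_CUTOFF = 150   # HINT score separating sterically displaceable waters


def predict_water_role(rank, hint_score):
    """Assign a water molecule of the unliganded protein to a predicted class.

    Boundary handling (<, >=) is an assumption made for this sketch.
    """
    if rank < RANK_LOW:
        return "Sterically displaced/missing/external"
    if rank >= RANK_HIGH:
        return "Conserved cavity water"
    if hint_score < HINT_CUTOFF:
        return "Sterically displaced/missing/external"
    return "Conserved/functionally displaced"


# (Rank, HINT score) pairs taken from Table 6.
examples = [(3.6, 691), (4.8, 85), (3.2, 128), (0.9, 387)]
predictions = [predict_water_role(r, h) for r, h in examples]
```

Note that Rank is tested first, so a water with a high HINT score but a Rank below 1.5 (for example water 728 of phosphodiesterase 4B) is still assigned to the non-relevant class.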

Prediction of the role of water molecules in ligand binding by inspecting unbound proteins

In order to test whether the HINT scores and Ranks of water molecules can be used (alone) to predict the role of water molecules in a ligand binding event, a test set of four proteins (set IV), for which the structures of both the uncomplexed protein and the protein–ligand complex are available with resolution better than 2.20 Å (Table 6), was analyzed. First, water molecules located within a 4 Å range from the walls of the empty binding pockets were examined and catalogued as described above for set III, based on their HINT score and Rank. A total of 50 water molecules were thus analyzed and scored: six were found in the binding pocket of unliganded concanavalin A [99], 16 in phosphodiesterase 4B [100], 11 in carboxypeptidase A [101] and 17 in penicillopepsin [102]. Table 6 lists the assignments (predictions) of each of these water molecules to the categories described above. Next, searches for the same water molecules in the corresponding ligand–protein complexes [103–106] were performed, and their actual roles in the complexes were determined as for set III.

Predicted and actual roles played by the 50 water molecules within the set IV protein–ligand complexes are compared in Table 6. In concanavalin A all water molecules, except water 292, were

Table 5. Distribution and parameters for water molecules in set III and belonging to seven different categories

                          PDB code                  Number of water molecules
Protein                   Uncomplexed  Complexed    Conserved  Conserved    Conserved  Conserved  Functionally  Sterically  Missing
                                                    bridging   active site  cavity     external   displaced     displaced
HIV-1 protease            1G6L         4PHV         1          0            0          0          3             5           0
Lipid binding protein     1LIB         1LID         1          3            0          0          0             1           0
Trypsin                   1TPO         1TNH         2          0            0          1          1             0           0
Mouse MUP-I               1I04         1I05         2          0            0          0          0             0           0
Thrombin                  1JOU         1A4W         1          0            2          1          0             2           0
Endothiapepsin            4APE         1ENT         1          2            1          2          6             3           6
Galactose-binding         1GCG         2GBP         1          0            2          0          5             2           0
β-Secretase               1W50         1TQF         1          5            0          1          2             7           0
Retinoic acid transport   1XCA         1CBS         1          0            0          0          1             4           1
Total                                               11         10           5          5          18            24          7
Mean HINT score                                     259±348    266±140      384±27     151±98     402±324       154±164     26±57
Mean rank                                           2.8±1.7    2.8±0.8      3.7±1.1    1.1±0.4    2.2±0.9       1.5±1.1     1.1±1.0


predicted properly. We predicted water 292 to be conserved or functionally displaceable, whereas it was sterically displaced in the complex, i.e. it was displaced by a ligand functional group that did not make compensating interactions, at least in this complex.

In comparing the liganded and the unliganded forms of phosphodiesterase, water molecules 173, 718 and 741 were not identified correctly. In particular, given its high Rank (4.8), water 173 was predicted to be conserved in the cavity, while inspection of the ligand–protein complex active site identifies it to be a missing water. At first, it would appear that a more correct assignment may have been achieved by considering the HINT score (85 units), which is far too low for a conserved cavity water molecule, but this is not the case. This discrepancy between Rank and HINT score for water 173 is actually due to its cavity having significant hydrophobic character despite its apparently perfect balance of H-bond acceptors and donors. In general, discrepancies between Rank and HINT score estimations arise because HINT calculations are influenced strongly by the chemical natures of the interacting groups. For instance, a water molecule surrounded by backbone groups is characterized by a HINT score lower than one interacting with charged side-chains, whereas the Rank would be unaffected. For water molecules 718 and 741, neither Rank nor HINT score, which are too low and too high, respectively, allow prediction of these water molecules to be functionally displaceable and missing. A GRID analysis [50] of the binding pocket, using the water probe, located water 173 in both the unliganded and liganded phosphodiesterase, whereas water 741 was not confirmed (energetically unfavorable) even in the uncomplexed protein. In other words, water 173 may have been missed by the crystallographic analysis in the liganded structure, while water 741 may be a crystallographic artefact in the unliganded structure. This conclusion is supported by the very high B-factor of 60.42 Å² assigned to water 741 [100], suggesting a high degree of uncertainty in its location.

Similarly, water 314, water 574 and water 313 were crystallographically identified in the uncomplexed carboxypeptidase A active site, but not in the protein–ligand complex. It is possible that these water molecules are present in the binding pocket but are either not detected or undetectable by diffraction analysis. In fact, GRID analysis confirms the presence of water molecules 313 and 314 in both the uncomplexed and ligand-complexed protein structures of carboxypeptidase A, while water 574 was not confirmed by GRID in either structure. Thus, while water 574 is probably a crystallographic artefact, there is enough space and favorable interactions for water 313 and water 314 to be retained in the protein–ligand complex. It is possible that these water molecules were missed in the X-ray diffraction analysis of the complex.

Five water molecules were not predicted properly in penicillopepsin. In particular, water 170 and water 52, having low HINT scores and Rank, were classified as non-relevant, whereas they were found to be conserved. These inconsistencies could be attributed to a number of factors, but it is difficult, if not impossible, to definitively assign specific movements or mechanisms for particular water molecules based only on crystallographic evidence. Solvent molecules may move into more significant roles after the ligand entered the active site; for example, water molecules found in the binding pocket of the uncomplexed protein too far away from site atoms could be pushed towards the protein surface by the incoming ligand and become conserved or even bridging water molecules. In contrast, water 111, which was predicted to be conserved or functionally displaceable, was not detected in the protein–ligand complex. GRID analysis confirmed the presence of this water

Table 6. Prediction set IV: the predicted and crystallographically observed roles for listed water molecules present in protein active sites

Each block is headed by: Protein (PDB code and resolution of the uncomplexed structure; PDB code and resolution of the complexed structure; number of active site waters).

  Water ID  Rank  HINT score  Prediction                              Crystallographic evidence

Concanavalin-A (uncomplexed 2CTV, 1.95 Å; complexed 5CNA, 2.00 Å; 6 active site waters)
  291       3.6   691         Conserved/functionally displaced        Functionally displaced
  300       3.4   609         Conserved/functionally displaced        Functionally displaced
  371       2.2   25          Sterically displaced/missing/external   Missing
  306       2.1   105         Sterically displaced/missing/external   External
  292       2.0   242         Conserved/functionally displaced        Sterically displaced
  372       1.0   119         Sterically displaced/missing/external   Sterically displaced

Phosphodiesterase 4B (uncomplexed 1F0J, 1.77 Å; complexed 1XLX, 2.19 Å; 16 active site waters)
  173       4.8   85          Conserved cavity water                  Missing
  324       3.2   128         Sterically displaced/missing/external   Sterically displaced
  36        2.3   564         Conserved/functionally displaced        Conserved
  748       2.3   1068        Conserved/functionally displaced        Conserved
  180       2.2   537         Conserved/functionally displaced        Conserved
  538       2.1   102         Sterically displaced/missing/external   Missing
  5         2.1   549         Conserved/functionally displaced        Conserved
  233       2.1   603         Conserved/functionally displaced        Conserved
  741       2.1   216         Conserved/functionally displaced        Missing
  124       2.0   295         Conserved/functionally displaced        Conserved
  91        1.9   317         Conserved/functionally displaced        Functionally displaced
  641       0.9   126         Sterically displaced/missing/external   Sterically displaced
  728       0.9   387         Sterically displaced/missing/external   Missing
  718       0.9   50          Sterically displaced/missing/external   Functionally displaced
  359       0.8   172         Sterically displaced/missing/external   Missing
  139       0.0   76          Sterically displaced/missing/external   Missing

Carboxypeptidase A (uncomplexed 5CPA, 1.54 Å; complexed 6CPA, 2.00 Å; 11 active site waters)
  587       5.3   899         Conserved cavity water                  Conserved cavity water
  563       5.2   33          Conserved cavity water                  Conserved cavity water
  567       3.5   908         Conserved/functionally displaced        Functionally displaced
  314       3.5   319         Conserved/functionally displaced        Missing
  574       3.4   304         Conserved/functionally displaced        Missing
  313       3.1   316         Conserved/functionally displaced        Missing
  571       2.2   834         Conserved/functionally displaced        Functionally displaced
  570       2.0   77          Sterically displaced/missing/external   Sterically displaced
  354       1.8   99          Sterically displaced/missing/external   External
  581       1.1   89          Sterically displaced/missing/external   Sterically displaced
  340       0.0   24          Sterically displaced/missing/external   Sterically displaced

Penicillopepsin (uncomplexed 3APP, 1.80 Å; complexed 1PPM, 1.70 Å; 17 active site waters)
  111       3.6   266         Conserved/functionally displaced        Missing
  39        3.5   1105        Conserved/functionally displaced        Functionally displaced
  174       2.5   347         Conserved/functionally displaced        Conserved
  271       2.3   256         Conserved/functionally displaced        Conserved
  257       2.1   320         Conserved/functionally displaced        Sterically displaced
  284       2.0   534         Conserved/functionally displaced        Functionally displaced
  305       1.9   91          Sterically displaced/missing/external   Missing
  145       1.8   99          Sterically displaced/missing/external   Functionally displaced
  170       1.0   152         Sterically displaced/missing/external   Conserved
  61        1.0   365         Sterically displaced/missing/external   Sterically displaced
  303       0.9   63          Sterically displaced/missing/external   Sterically displaced
  78        0.9   30          Sterically displaced/missing/external   External
  160       0.9   74          Sterically displaced/missing/external   Sterically displaced
  287       0.9   89          Sterically displaced/missing/external   Sterically displaced
  52        0.8   −52         Sterically displaced/missing/external   Conserved
  300       0.7   −2          Sterically displaced/missing/external   Sterically displaced
  241       0.0   −38         Sterically displaced/missing/external   Sterically displaced


molecule in the uncomplexed, but not in the liganded, protein. However, careful inspection of the ligand–protein complex explains the absence of water 111. There is no space available at the protein–ligand interface, because of space now occupied by the ligand, and because of slight movement of the protein residues. This is a case where water displacement is associated with a conformational change. The prediction of conformational changes such as this, induced by ligand binding, is not a trivial task, and can be addressed only by intensive and time-consuming molecular dynamics simulations. In a practical sense, while dynamics approaches could be useful in some later stage ligand design projects, they would be far too expensive as part of early stage in silico screening, where thousands to millions of compounds would need to be analyzed, scored and ranked.

Overall, 38 of the 50 water molecules were classified correctly, i.e. a 76% correct prediction rate. (It should be noted that at least another five, or 10%, of the incorrectly classified water molecules had questions about the accuracy of crystallography as described above.) A similar success rate of 75% was reported by Raymer and co-workers using Consolv, a K-nearest-neighbors genetic algorithm [59] that identifies conserved water molecules in protein–ligand complexes using four environmental predictive parameters: atomic density, atomic hydrophilicity, number of hydrogen bonds and crystallographic temperature B-factors. Consolv was the first empirical method developed to predict conservation of active site water molecules upon ligand binding. Because the algorithm relies on the evaluation of geometric features, the results are dependent on crystallographic data quality, as are all computational procedures, including HINT [86] and Rank. Conserved or displaced water molecules are identified by characterizing the micro-environment of each water molecule, manually collecting geometrical features known to correlate with water binding. Consolv provides no energetic estimation of the displacement or retention of water molecules and no discrimination between the different water roles, i.e., functionally and sterically displaced water molecules, bridging or cavity water molecules. Simply, water molecules are described as conserved or displaced.
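As an arithmetic check, the overall rate can be recovered from the per-protein outcomes described above: one misassigned water in concanavalin A, three in phosphodiesterase 4B, three in carboxypeptidase A and five in penicillopepsin. The dictionary below is only an illustrative tally of those counts; the variable names are hypothetical.

```python
# (number misassigned, number of active site waters) per set IV protein,
# as stated in the text and Table 6.
set_iv = {
    "Concanavalin A": (1, 6),
    "Phosphodiesterase 4B": (3, 16),
    "Carboxypeptidase A": (3, 11),
    "Penicillopepsin": (5, 17),
}

total = sum(n for _, n in set_iv.values())
wrong = sum(w for w, _ in set_iv.values())
correct = total - wrong
rate_percent = 100.0 * correct / total

print(f"{correct} of {total} correct ({rate_percent:.0f}%)")
```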

Obviously, all conserved water molecules, whether they are bridging, conserved in cavities or in more external regions of the binding pocket, are important molecules, and should be included in molecular modeling and docking experiments. However, the water molecules energetically essential are probably only those bridging or functionally displaceable, both of which belong to the same structural class. Water molecules buried deeply in cavities are, in fact, too removed from the active site to impact the binding event. Similarly, the inclusion of conserved external water molecules might lead to misleading binding pocket geometries. Recently, a novel method called WaterScore, based on geometrical parameters and on energetic estimations,


was proposed [60]. WaterScore is able to discriminate between bound and displaceable water molecules using four properties: temperature B-factor, solvent-accessible surface area (SASA), number of protein–water contacts, and total hydrogen bond energy. Water molecules are catalogued into three different categories: bound, sterically displaced and displaced (roughly corresponding to our functionally displaceable classification). A prediction rate of 67% correct on a set of 46 water molecules bound to four native proteins was obtained [60]. The authors observed that bound water molecules usually present low B-factors, small SASA, large atomic contacts and low hydrogen bond energies, suggesting that important conditions for conserved water molecules are to be buried deeply in crevices or grooves of the binding pocket, to be surrounded by many protein atoms, and to form several hydrogen bonds, all of which strongly restrict water mobility. These are clearly the necessary conditions for a cavity water molecule, but do not describe potential bridging water molecules, which need to be sufficiently exposed for contact with incoming ligands and able to make at least one or two new hydrogen bonds.

It is also of interest that a continuum electrostatic analysis carried out on bacteriophage T4 lysozyme [107] indicated that only about half of the potential sites were actually occupied by ordered water molecules. Moreover, conserved and non-conserved sites exhibited very similar free energies, thus suggesting difficulties in discriminating between them on the basis of pure thermodynamic calculations [107]. Limitations to this approach have been overcome recently using a “solvated rotamer” model that, as of yet, has been applied only to water-mediated hydrogen bonds in protein–protein and protein–nucleic acid interfaces [108].

X-ray crystal data quality, B-factor, HINT score and Rank

We have shown that the quality of HINT free energy predictions depends on the resolution of the underlying crystallographic structures [86]. It was found that the correlation between the HINT score and binding free energy for 76 protein–ligand complexes, solved at a resolution of ≤3.2 Å, produced a standard error of 2.5 kcal mol⁻¹, whereas the correlation for a subset of 56 structures solved at a resolution of ≤2.5 Å produced a standard error of 1.8 kcal mol⁻¹. Certainly, HINT and Rank, as well as all other computational approaches attempting to evaluate the energetics of biomolecular interactions and the role and relevance of water, rely significantly on the quality of the structural data. In particular, the localization of a water molecule by X-ray crystallography is strongly dependent on its thermal motion. In fact, a water vibrating 1.5 Å around a fixed mean position shows a peak electron density of 10% compared to a water vibrating 0.5 Å. The corresponding B-factor is 60 Å² versus 4 Å² [2]. In addition, the actual fidelity of a structure depends also on other factors, including the date of deposition, the program used to process raw data and refine the structure, the X-ray source and the temperature of data collection. This evolution in technology suggests that it would be of great interest for the scientific community and particularly for computational chemists to recollect data on a number of “classic” protein structures with current methodologies, or to, at least, re-analyze the original structure factors for a number of them. We would, of course, be very interested in analyzing with HINT and Rank, and with other computational procedures, the same protein structure with its associated water molecules as a function of varying resolution.

Finally, we should acknowledge that, while in contrast to other procedures [59,60] HINT and Rank analyses do not make a direct use of the information reported in crystallographic B-factors on the uncertainty of atom coordinates, B-factors are not ignored in the overall analysis. In the present work, and in other applications of HINT, knowledge of the B-factors associated with water oxygen was useful in discussing discrepancies in water localization between X-ray crystallographic data and our analyses.

Conclusions

Our “natural” approach, which is based only on the combination of HINT score and Rank, is apparently able to recognize important water molecules that should be considered in understanding ligand binding by identifying with relatively high accuracy the relevant geometric and energetic features of water–protein and water–ligand interactions. As demonstrated in this work, well-defined HINT scores and Ranks are diagnostic for functionally displaceable, conserved or bridging water molecules. These water molecules must be included when building geometrically and functionally correct models of the binding pocket. Water molecules conserved within the unliganded binding site of a protein could mediate the formation of the protein–ligand complex. Similarly, bridging water molecules could become functionally displaced or vice versa, depending on the chemical nature and geometry of the ligand. In contrast, sterically displaced water molecules, missing or external water molecules (all characterized by low HINT score and Rank), as well as deep cavity water molecules, are not really carrying significant information with respect to ligand binding. These water molecules should not be considered actively in molecular models, thus avoiding lengthy time-consuming calculations. Finally, both HINT score and Rank calculations are very time-efficient, taking of the order of a few seconds to complete. We feel that the procedures outlined here represent a set of very useful and accessible tools for modeling protein–ligand systems. Biological processes where careful mapping of the water contribution


to the energetics of a biomacromolecular association is important include protein–protein and protein–DNA recognition. Certainly, this information would be useful in structure-based (rational) drug design. We are currently applying HINT score and Rank to these systems.

Materials and Methods

Model building

The three-dimensional coordinates of protein and protein–ligand complex structures were retrieved from the Protein Data Bank† and imported into the molecular modeling program Sybyl version 6.91‡. All structures were checked for chemically consistent atom and bond type assignment. Amino-terminal and carboxyl-terminal groups were set to be protonated and deprotonated, respectively. Hydrogen atoms, not present in the PDB files, were added using Sybyl Biopolymer and Build/Edit menu tools. To ameliorate steric clashes, added hydrogen atoms were energy minimized using the Powell algorithm with a convergence gradient of 0.5 kcal (mol Å)⁻¹ for 1500 cycles. This procedure does not affect heavy-atom positions.

Hydropathic analysis

The program HINT§ (Hydropathic INTeractions) provides the core methodology for this hydropathic analysis. Certain algorithms used in this study are currently only available in the 3.09Sb versions. As described previously [85], the “all” option for treatment and calculation of parameters for hydrogen atoms was chosen for molecule partitions. With this option, a partial log P_o/w value a_i and a SASA S_i are assigned to each polar and non-polar interacting atom (including all hydrogen atoms). The “neutral” option was chosen as the solvent condition for protein partitioning. A new HINT option that corrects the S_i terms for backbone amide hydrogen atoms by adding 20 Å² was used in this study. This correction improves the relative energetics of inter- and intra-molecular hydrogen bonds involving backbone amide groups, which had been de-emphasized in previous versions of HINT.

Rank and HINT score calculations for water molecules

† www.rcsb.org
‡ www.tripos.com
§ www.tripos.com
‖ www.edusoft-lc.com: contact G.E.K. for more information.

Each water molecule was automatically scored and optimized against the protein, using the Rank algorithm,93 implemented in this HINT version as part of the “optimize water network” option. The most constrained water molecule, i.e. the water molecule that is able to form the highest number of hydrogen bonds, exhibits the highest Rank value and is thus the first to be optimized and locked in a defined position. Next, all the other water molecules are optimized sequentially in their positions, following the decreasing Rank scale. The algorithm allows a maximum of four interactions for each water molecule; i.e. at most four targets (≤2 donors and ≤2 acceptors) can be identified and classified. The following equation is used to calculate the Rank, corresponding to the weighted number of potential hydrogen bonds:

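$$
\mathrm{Rank} = \sum_{n} \left\{ \left( 2.80\,\mathrm{\AA} / r_{n} \right) + \left[ \sum_{m} \cos\left( \theta_{Td} - \theta_{nm} \right) \right] \Big/ 6 \right\} \tag{2}
$$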

where r_n is the distance between the water oxygen atom and the target heavy atom n (n runs over the valid targets, up to a maximum of 4). This distance is scaled relative to 2.8 Å. θ_Td is the ideal tetrahedral angle (109.5°), and θ_nm is the angle between targets n and m (m = n to number of valid targets). To weight the geometrical quality of hydrogen bonds properly, any angle less than 60° is rejected along with its associated target. The combination of the distance and angle term values can, in some cases of particularly favorable geometries, exceed the actual number of hydrogen bonds, e.g. >4.0. However, Rank is generally a fairly reliable indicator of potential hydrogen bonds for the water. During the optimization process, the water oxygen atom is allowed to translate 0.1 Å around its crystallographic location, while hydrogen atoms can adopt all possible positions in order to maximize hydrogen bonds and acid/base interactions and to minimize unfavorable hydrophobic/polar or acid/acid contacts. The HINT score for each water molecule is calculated taking into account non-covalent interactions with protein and ligand (if present in the molecular model) groups in an 8.0 Å sphere, centered at the water oxygen atom.
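As an illustration only, the Rank expression and the Rank-ordered optimization sequence described above can be sketched in a few lines of Python. This is not the HINT implementation. Several choices are assumptions made for the sketch: the inner sum is taken over distinct target pairs (m > n), candidate targets are considered from nearest to farthest, and a candidate is rejected when it lies less than 60° from an already accepted target. The function names (`angle_deg`, `select_targets`, `water_rank`, `optimization_order`) are hypothetical, and candidate targets are assumed to have been pre-selected as plausible donors or acceptors.

```python
import math

IDEAL_DISTANCE = 2.80      # Å, reference distance used to scale r_n
TETRAHEDRAL_ANGLE = 109.5  # degrees, ideal tetrahedral angle
MIN_ANGLE = 60.0           # degrees, smaller target-water-target angles are rejected
MAX_TARGETS = 4            # at most four interactions per water molecule


def angle_deg(origin, a, b):
    """Return the angle a-origin-b in degrees."""
    va = [a[i] - origin[i] for i in range(3)]
    vb = [b[i] - origin[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(va, vb))
    norm = math.hypot(*va) * math.hypot(*vb)
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


def select_targets(water_o, candidates):
    """Keep up to four targets, nearest first (assumed order), skipping any
    candidate that lies less than 60 degrees from an accepted target."""
    accepted = []
    for target in sorted(candidates, key=lambda t: math.dist(water_o, t)):
        if all(angle_deg(water_o, target, a) >= MIN_ANGLE for a in accepted):
            accepted.append(target)
        if len(accepted) == MAX_TARGETS:
            break
    return accepted


def water_rank(water_o, candidates):
    """Distance term plus angular term, summed over the accepted targets."""
    targets = select_targets(water_o, candidates)
    total = 0.0
    for n, tn in enumerate(targets):
        total += IDEAL_DISTANCE / math.dist(water_o, tn)
        # Assumption: each pair of targets is counted once (m > n).
        pair_sum = sum(
            math.cos(math.radians(TETRAHEDRAL_ANGLE - angle_deg(water_o, tn, tm)))
            for tm in targets[n + 1:]
        )
        total += pair_sum / 6.0
    return total


def optimization_order(ranks):
    """Water identifiers sorted by decreasing Rank (most constrained first)."""
    return sorted(ranks, key=ranks.get, reverse=True)
```

Under this reading, four targets at 2.8 Å in an ideal tetrahedral arrangement give a value close to 5, which is consistent with the remark that Rank can exceed the actual number of hydrogen bonds.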

Classification of water molecules

After being ranked, scored and optimized, water molecules were classified with respect to their position in the protein matrix. Protein surfaces were built in Sybyl using MOLCAD tools. “Connolly” was chosen as surface type and the probe radius was set to 1.4 Å, to better identify regions effectively accessible to water molecules.47,94 The water probe (radius 1.4 Å) is rolled along the molecular accessible areas, building convex and concave surfaces, depending on whether protein atoms were well-exposed to solvent or arranged to form clefts and/or grooves. The topographical features of the protein surface, represented by different map colors (Figure 1), were defined as a function of cavity depth. This property describes qualitatively the solvent exposure of the surface and the depth at which each superficial protein atom is localized. The cavity depth was calculated by building, over the first molecular surface, a second surface with a 6.0 Å radius probe that is unable to explore smaller cavities. The distance between these two generated surface maps is a direct measure of the cavity depth.95
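The two-surface idea behind the cavity depth can be illustrated with a small sketch. It assumes that both surfaces are available as lists of (x, y, z) points and that the depth at a point of the 1.4 Å probe surface is its distance to the nearest point of the 6.0 Å probe surface. Both assumptions, and the function name `cavity_depths`, are illustrative; the MOLCAD procedure may measure the distance differently.

```python
import math


def cavity_depths(fine_surface, coarse_surface):
    """For each point of the fine (1.4 Å probe) surface, return the distance
    to the nearest point of the coarse (6.0 Å probe) surface."""
    return [
        min(math.dist(p, q) for q in coarse_surface)
        for p in fine_surface
    ]
```

Points where the two surfaces coincide have a depth near zero (exposed regions), whereas points at the bottom of a cleft that the larger probe cannot enter have larger values.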

Identification of bridging water molecules in protein–ligand complexes

The selected protein–ligand complexes contain at least one water molecule bridging the protein–ligand interface. All water molecules were Ranked, optimized and scored against both the protein and ligand. Crystallographic water molecules were scored individually such that the total H_Complex–Water could be differentiated into H_Protein–Water and H_Ligand–Water contributions.
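The decomposition of a water score into protein and ligand parts amounts to grouping pairwise interaction terms by partner. The sketch below assumes that the per-pair scores for one water molecule are already available as (partner, score) tuples; the function name `split_water_score` and the input format are hypothetical and do not reflect the HINT file formats.

```python
from collections import defaultdict


def split_water_score(pair_scores):
    """Sum pairwise water scores by partner ('protein' or 'ligand').

    Returns (h_protein_water, h_ligand_water, h_complex_water), where the
    last value is the sum of the first two.
    """
    totals = defaultdict(float)
    for partner, score in pair_scores:
        totals[partner] += score
    h_protein = totals["protein"]
    h_ligand = totals["ligand"]
    return h_protein, h_ligand, h_protein + h_ligand
```

Because the total is a plain sum of pairwise terms, the protein and ligand contributions of a bridging water molecule can be compared directly.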

Comparison of site and number of water molecules in native and ligand–protein complexes

Structures of the same protein, in the absence and in the presence of ligand, were examined to investigate the change of active site water molecules upon ligand binding. Since it is difficult to predict conserved water molecules when there are significant changes in conformation, i.e. between bound and unbound states, only proteins presenting little or no binding pocket geometric variations were considered in this study, in a manner similar to previous investigations.59,60 For this calculation, each pair of proteins (liganded and native) was first superimposed. Only water molecules within 4 Å of both the protein and ligand were identified. This filter highlights water molecules within the binding pocket and discards more peripheral water molecules that do not contribute significantly to binding free energy.29 These solvent molecules, found in both native and ligand–protein complex structures, were compared and classified in different categories: conserved bridging water molecules, conserved active site water molecules, sterically displaced water molecules, functionally displaced water molecules, water molecules conserved in cavities of the active site, conserved external water molecules and missing water molecules, when no justification for the displacement was found. These water molecules were Ranked and scored with the HINT tools (vide supra). The software GRID version 22a† was used to localize potential water sites50 as a check on crystallographic water placements. All GRID analyses were performed defining a box with dimensions 15 Å × 15 Å × 15 Å centered on the ligand molecule; the grid spacing was set to 0.33 Å.
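The 4 Å distance filter can be written compactly. The sketch below assumes that water oxygen positions and protein and ligand atom positions are given as (x, y, z) tuples in the superimposed frame, and that "within 4 Å" means a distance of at most 4.0 Å to at least one atom of each partner. The function names are hypothetical.

```python
import math

CUTOFF = 4.0  # Å, distance filter described in the text


def is_near(point, atoms, cutoff=CUTOFF):
    """True if any atom lies within the cutoff distance of the point."""
    return any(math.dist(point, atom) <= cutoff for atom in atoms)


def binding_pocket_waters(waters, protein_atoms, ligand_atoms, cutoff=CUTOFF):
    """Keep water oxygen positions close to both the protein and the ligand."""
    return [
        w for w in waters
        if is_near(w, protein_atoms, cutoff) and is_near(w, ligand_atoms, cutoff)
    ]
```

A water molecule that is near the protein but far from the ligand is discarded by this filter, which matches the stated aim of excluding peripheral water molecules.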

Acknowledgements

The financial support of COFIN 2005 from the Italian Ministry of Education and University (to A.M.) and GM71894 from the U.S. National Institutes of Health (to G.E.K.) is gratefully acknowledged. Helpful suggestions from J. Neel Scarsdale and Jason Rife in preparing this manuscript are appreciated.

References

1. Rupley, J. A. & Careri, G. (1991). Protein hydration and function. Advan. Protein Chem. 41, 37–172.

2. Levitt, M. & Park, B. H. (1993). Water: now you see it, now you don’t. Structure, 1, 223–226.

3. Gregory, R. B. (1995). Protein–Solvent Interactions, Marcel Dekker, Inc., New York, NY.

4. Karplus, P. A. & Faerman, C. (1994). Ordered water in macromolecular structure. Curr. Opin. Struct. Biol. 4, 770–776.

5. Makarov, V., Pettitt, B. M. & Feig, M. (2002). Solvation and hydration of protein and nucleic acids: a theoretical view of simulation and experiments. Accts Chem. Res. 35, 376–384.

6. Israelachvili, J. & Wennerstrom, H. (1996). Role of hydration and water structure in biological and colloidal interactions. Nature, 379, 219–225.

† www.moldiscovery.com

7. Timasheff, S. N. (2002). Protein hydration, thermodynamic binding, and preferential hydration. Biochemistry, 41, 13473–13482.

8. Barron, L. D., Hecht, L. & Wilson, G. (1997). The lubricant of life: a proposal that solvent water promotes extremely fast conformational fluctuations in mobile heteropolypeptide structure. Biochemistry, 36, 13143–13147.

9. Pettitt, B. M., Makarov, V. A. & Andrews, B. K. (1998). Protein hydration density: theory, simulations and crystallography. Curr. Opin. Struct. Biol. 8, 218–221.

10. Papoian, G. A., Ulander, J., Eastwood, M. P., Luthey-Schulten, Z. & Wolynes, P. G. (2004). Water in protein structure prediction. Proc. Natl Acad. Sci. USA, 101, 3352–3357.

11. Sessions, R. B., Thomas, G. L. & Parker, M. J. (2004). Water as a conformational editor in protein folding. J. Mol. Biol. 343, 1125–1133.

12. Mattos, C. (2002). Protein–water interactions in a dynamic world. Trends Biochem. Sci. 27, 203–208.

13. Saenger, W. (1987). Structure and dynamics of water surrounding biomolecules. Annu. Rev. Biophys. Biophys. Chem. 16, 93–114.

14. Cozzini, P., Fornabaio, M., Marabotti, A., Abraham, D. J., Kellogg, G. E. & Mozzarelli, A. (2004). Free energy of ligand binding to protein: evaluation of the contribution of water molecules by computational methods. Curr. Med. Chem. 11, 3093–3118.

15. Makhatadze, G. I. & Privalov, P. L. (1994). Hydration effects in protein unfolding. Biophys. Chem. 51, 291–309.

16. Edsall, J. T. & McKenzie, H. A. (1983). Water and proteins II. The location and dynamics of water in protein systems and its relation to their stability and properties. Advan. Biophys. 16, 53–183.

17. Cooper, A. (2005). Heat capacity effects in protein folding and ligand binding: a re-evaluation of the role of water in biomolecular thermodynamics. Biophys. Chem. 115, 89–97.

18. Thanki, N., Umrania, Y., Thornton, J. M. & Goodfellow, J. M. (1991). Analysis of protein main-chain solvation as a function of secondary structure. J. Mol. Biol. 221, 669–691.

19. Petukhov, M., Rychkov, G., Firsov, L. & Serrano, L. (2004). H-bonding in protein hydration revisited. Protein Sci. 13, 2120–2129.

20. Frauenfelder, H., Fenimore, P. W. & McMahon, B. H. (2002). Hydration, slaving and protein function. Biophys. Chem. 98, 35–48.

21. de Graaf, C., Vermeulen, N. P. & Feenstra, K. A. (2005). Cytochrome p450 in silico: an integrative modeling approach. J. Med. Chem. 48, 2725–2755.

22. Poornima, C. S. & Dean, P. M. (1995). Hydration in drug design. 3. Conserved water molecules at the ligand-binding sites of homologous proteins. J. Comput. Aided Mol. Des. 9, 521–531.

23. Poornima, C. S. & Dean, P. M. (1995). Hydration in drug design. 2. Influence of local site surface shape on water binding. J. Comput. Aided Mol. Des. 9, 513–520.

24. Poornima, C. S. & Dean, P. M. (1995). Hydration in drug design. 1. Multiple hydrogen-bonding features of water molecules in mediating protein–ligand interactions. J. Comput. Aided Mol. Des. 9, 500–512.

25. Li, Z. & Lazaridis, T. (2003). Thermodynamic contributions of the ordered water molecule in HIV-1 protease. J. Am. Chem. Soc. 125, 6636–6637.

26. Li, Z. & Lazaridis, T. (2005). The effect of water displacement on binding thermodynamics: concanavalin A. J. Phys. Chem. B, 109, 662–670.

27. Gohlke, H. & Klebe, G. (2002). Approaches to the description and prediction of the binding affinity of small-molecule ligands to macromolecular receptors. Angewandte Chem. Intl. Edit. 41, 2644–2676.

28. Ladbury, J. E. (1996). Just add water! The effect of water on the specificity of protein-ligand binding sites and its potential application to drug design. Chem. Biol. 3, 973–980.

29. Fornabaio, M., Spyrakis, F., Mozzarelli, A., Cozzini, P., Abraham, D. J. & Kellogg, G. E. (2004). Simple, intuitive calculations of free energy of binding for protein-ligand complexes. 3. The free energy contribution of structural water molecules in HIV-1 protease complexes. J. Med. Chem. 47, 4507–4516.

30. Teeter, M. M. (1991). Water-protein interactions: theory and experiment. Annu. Rev. Biophys. Biophys. Chem. 20, 557–600.

31. Thanki, N., Thornton, J. M. & Goodfellow, J. M. (1988). Distributions of water around amino acid residues in proteins. J. Mol. Biol. 202, 637–657.

32. Carugo, O. & Bordo, D. (1999). How many water molecules can be detected by protein crystallography? Acta Crystallog. sect. D, 55, 479–483.

33. Williams, M. A., Goodfellow, J. M. & Thornton, J. M. (1994). Buried waters and internal cavities in monomeric proteins. Protein Sci. 3, 1224–1235.

34. Eisenberg, D. & McLachlan, A. D. (1986). Solvation energy in protein folding and binding. Nature, 319, 199–203.

35. Vedani, A. & Huhta, D. W. (1991). An algorithm for the systematic solvation of proteins based on directionality of hydrogen-bonds. J. Am. Chem. Soc. 113, 5860–5862.

36. Mezei, M. & Beveridge, D. L. (1986). Structural chemistry of biomolecular hydration via computer simulation: the proximity criterion. Methods Enzymol. 127, 21–47.

37. Lounnas, V., Pettitt, B. M. & Phillips, G. N., Jr (1994). A global model of the protein-solvent interface. Biophys. J. 66, 601–614.

38. Otting, G., Liepinsh, E. & Wuthrich, K. (1991). Protein hydration in aqueous solution. Science, 254, 974–980.

39. Makarov, V. A., Andrews, B. K., Smith, P. E. & Pettitt, B. M. (2000). Residence times of water molecules in the hydration sites of myoglobin. Biophys. J. 79, 2966–2974.

40. Likic, V. A. & Prendergast, F. G. (2001). Dynamics of internal water in fatty acid binding protein: computer simulations and comparison with experiments. Proteins: Struct. Funct. Genet. 43, 65–72.

41. Levitt, M. & Sharon, R. (1988). Accurate simulation of protein dynamics in solution. Proc. Natl Acad. Sci. USA, 85, 7557–7561.

42. Knapp, E. W. & Muegge, I. (1993). Heterogeneous diffusion of water at protein surfaces: application to BPTI. J. Phys. Chem. 97, 11339–11343.

43. Muegge, I. & Knapp, E. W. (1995). Residence times and lateral diffusion of water at protein surfaces: application to BPTI. J. Phys. Chem. 99, 1371–1374.

44. Brunne, R. M., Liepinsh, E., Otting, G., Wuthrich, K. & van Gunsteren, W. F. (1993). Hydration of proteins. A comparison of experimental residence times of water molecules solvating the bovine pancreatic trypsin inhibitor with theoretical model calculations. J. Mol. Biol. 231, 1040–1048.

45. Ahlstrom, P., Teleman, O. & Jonsson, B. (1988). Molecular dynamics simulation of interfacial water structure and dynamics in a parvalbumin solution. J. Am. Chem. Soc. 110, 4198–4203.

46. Kuhn, L. A., Siani, M. A., Pique, M. E., Fisher, C. L., Getzoff, E. D. & Tainer, J. A. (1992). The interdependence of protein surface topography and bound water molecules revealed by surface accessibility and fractal density measures. J. Mol. Biol. 228, 13–22.

47. Wade, R. C., Mazor, M. H., McCammon, J. A. & Quiocho, F. A. (1991). A molecular dynamics study of thermodynamic and structural aspects of the hydration of cavities in proteins. Biopolymers, 31, 919–931.

48. Okimoto, N., Nakamura, T., Suenaga, A., Futatsugi, N., Hirano, Y., Yamaguchi, I. & Ebisuzaki, T. (2004). Cooperative motions of protein and hydration water molecules: molecular dynamics study of scytalone dehydratase. J. Am. Chem. Soc. 126, 13132–13139.

49. Henchman, R. H. & McCammon, J. A. (2002). Structural and dynamic properties of water around acetylcholinesterase. Protein Sci. 11, 2080–2090.

50. Goodford, P. J. (1985). A computational procedure for determining energetically favorable binding sites on biologically important macromolecules. J. Med. Chem. 28, 849–857.

51. Pastor, M., Cruciani, G. & Watson, K. A. (1997). A strategy for the incorporation of water molecules present in a ligand binding site into a three-dimensional quantitative structure–activity relationship analysis. J. Med. Chem. 40, 4089–4102.

52. Kastenholz, M. A., Pastor, M., Cruciani, G., Haaksma, E. E. & Fox, T. (2000). GRID/CPCA: a new computational tool to design selective ligands. J. Med. Chem. 43, 3033–3044.

53. Wade, R. C., Clark, K. J. & Goodford, P. J. (1993). Further development of hydrogen bond functions for use in determining energetically favorable binding sites on molecules of known structure. 1. Ligand probe groups with the ability to form two hydrogen bonds. J. Med. Chem. 36, 140–147.

54. Wade, R. C. & Goodford, P. J. (1993). Further development of hydrogen bond functions for use in determining energetically favorable binding sites on molecules of known structure. 2. Ligand probe groups with the ability to form more than two hydrogen bonds. J. Med. Chem. 36, 148–156.

55. Boobbyer, D. N., Goodford, P. J., McWhinnie, P. M. & Wade, R. C. (1989). New hydrogen-bond potentials for use in determining energetically favorable binding sites on molecules of known structure. J. Med. Chem. 32, 1083–1094.

56. Rarey, M., Kramer, B. & Lengauer, T. (1999). The particle concept: placing discrete water molecules during protein-ligand docking predictions. Proteins: Struct. Funct. Genet. 34, 17–28.

57. Pitt, W. R., Murray-Rust, J. & Goodfellow, J. M. (1993). AQUARIUS2: knowledge-based modeling of solvent sites around proteins. J. Comput. Chem. 14, 1007–1018.

58. Roe, S. M. & Teeter, M. M. (1993). Patterns for prediction of hydration around polar residues in proteins. J. Mol. Biol. 229, 419–427.

59. Raymer, M. L., Sanschagrin, P. C., Punch, W. F., Venkataraman, S., Goodman, E. D. & Kuhn, L. A. (1997). Predicting conserved water-mediated and polar ligand interactions in proteins using a K-nearest-neighbors genetic algorithm. J. Mol. Biol. 265, 445–464.

60. Garcia-Sosa, A. T., Mancera, R. L. & Dean, P. M. (2003). WaterScore: a novel method for distinguishing between bound and displaceable water molecules in the crystal structure of the binding site of protein-ligand complexes. J. Mol. Model. (Online), 9, 172–182.

61. Jiang, L., Kuhlman, B., Kortemme, T. & Baker, D. (2005). A “solvated rotamer” approach to modeling water-mediated hydrogen bonds at protein–protein interfaces. Proteins: Struct. Funct. Genet. 58, 893–904.

62. Blundell, T. L. (1996). Structure-based drug design. Nature, 384, 23–26.

63. Verlinde, C. L. M. J. & Hol, W. G. J. (1994). Structure-based drug design: progress, results and challenges. Structure, 2, 577–587.

64. Lloyd, D. G., Garcia-Sosa, A. T., Alberts, I. L., Todorov, N. P. & Mancera, R. L. (2004). The effect of tightly bound water molecules on the structural interpretation of ligand-derived pharmacophore models. J. Comput. Aided Mol. Des. 18, 89–100.

65. Ehrlich, L., Reczko, M., Bohr, H. & Wade, R. C. (1998). Prediction of protein hydration sites from sequence by modular neural networks. Protein Eng. 11, 11–19.

66. Zhang, L. & Hermans, J. (1996). Hydrophilicity of cavities in proteins. Proteins: Struct. Funct. Genet. 24, 433–438.

67. Vedani, A. & Huhta, D. W. (1991). An algorithm for the systematic solvation of proteins based on the directionality of hydrogen-bonds. J. Am. Chem. Soc. 113, 5860–5862.

68. Bhat, T. N., Bentley, G. A., Boulot, G., Greene, M. I., Tello, D., Dall’Acqua, W. et al. (1994). Bound water molecules and conformational stabilization help mediate an antigen-antibody association. Proc. Natl Acad. Sci. USA, 91, 1089–1093.

69. Covell, D. G. & Wallqvist, A. (1997). Analysis of protein-protein interactions and the effects of amino acid mutations on their energetics. The importance of water molecules in the binding epitope. J. Mol. Biol. 269, 281–297.

70. Tame, J. R., Sleigh, S. H., Wilkinson, A. J. & Ladbury, J. E. (1996). The role of water in sequence-independent ligand binding by an oligopeptide transporter protein. Nature Struct. Biol. 3, 998–1001.

71. Otwinowski, Z., Schevitz, R. W., Zhang, R. G., Lawson, C. L., Joachimiak, A., Marmorstein, R. Q. et al. (1988). Crystal structure of trp repressor/operator complex at atomic resolution. Nature, 335, 321–329.

72. Lemieux, R. (1996). How water provides the impetus for molecular recognition in aqueous solution. Accts Chem. Res. 29, 373–380.

73. Ikura, T., Urakabo, Y. & Ito, N. (2004). Water-mediated interaction at a protein-protein interface. Chem. Phys. 307, 111–119.

74. Lazaridis, T. (2002). Binding affinity and specificity from computational studies. Curr. Org. Chem. 6, 1319–1332.

75. Verma, C. S. & Fischer, S. (2005). Protein stability and ligand binding: new paradigms from in-silico experiments. Biophys. Chem. 115, 295–302.

76. Fischer, S. & Verma, C. S. (1999). Binding of buried structural water increases the flexibility of proteins. Proc. Natl Acad. Sci. USA, 96, 9613–9615.

77. Olano, L. R. & Rick, S. W. (2004). Hydration free energies and entropies for water in protein interiors. J. Am. Chem. Soc. 126, 7991–8000.

78. Lam, P. Y., Jadhav, P. K., Eyermann, C. J., Hodge, C. N., Ru, Y., Bacheler, L. T. et al. (1994). Rational design of potent, bioavailable, nonpeptide cyclic ureas as HIV protease inhibitors. Science, 263, 380–384.

79. Chen, J. M., Xu, S. L., Wawrzak, Z., Basarab, G. S. & Jordan, D. B. (1998). Structure-based design of potent inhibitors of scytalone dehydratase: displacement of a water molecule from the active site. Biochemistry, 37, 17735–17744.

80. Kellogg, G. E., Semus, S. F. & Abraham, D. J. (1991). HINT: a new method of empirical hydrophobic field calculation for CoMFA. J. Comput. Aided Mol. Des. 5, 545–552.

81. Abraham, D. J. & Kellogg, G. E. (1994). The effect of physical organic properties on hydrophobic fields. J. Comput. Aided Mol. Des. 8, 41–49.

82. Kellogg, G. E. & Abraham, D. J. (2000). Hydrophobicity: is LogPo/w more than the sum of its parts? Eur. J. Med. Chem. 35, 651–661.

83. Hansch, C. & Leo, A. J. (1979). Substituent Constants for Correlation Analysis in Chemistry and Biology, Wiley, New York.

84. Dill, K. A. (1997). Additivity principles in biochemistry. J. Biol. Chem. 272, 701–704.

85. Cozzini, P., Fornabaio, M., Marabotti, A., Abraham, D. J., Kellogg, G. E. & Mozzarelli, A. (2002). Simple, intuitive calculations of free energy of binding for protein-ligand complexes. 1. Models without explicit constrained water. J. Med. Chem. 45, 2469–2483.

86. Kellogg, G. E., Fornabaio, M., Spyrakis, F., Lodola, A., Cozzini, P., Mozzarelli, A. & Abraham, D. J. (2004). Getting it right: modeling of pH, solvent and “nearly” everything else in virtual screening of biological targets. J. Mol. Graph. Model. 22, 479–486.

87. Fornabaio, M., Cozzini, P., Mozzarelli, A., Abraham, D. J. & Kellogg, G. E. (2003). Simple, intuitive calculations of free energy of binding for protein-ligand complexes. 2. Computational titration and pH effects in molecular models of neuraminidase-inhibitor complexes. J. Med. Chem. 46, 4487–4500.

88. Spyrakis, F., Fornabaio, M., Cozzini, P., Mozzarelli, A., Abraham, D. J. & Kellogg, G. E. (2004). Computational titration analysis of a multiprotic HIV-1 protease-ligand complex. J. Am. Chem. Soc. 126, 11764–11765.

89. Burnett, J. C., Botti, P., Abraham, D. J. & Kellogg, G. E. (2001). Computationally accessible method for estimating free energy changes resulting from site-specific mutations of biomolecules: systematic model building and structural/hydropathic analysis of deoxy and oxy hemoglobins. Proteins: Struct. Funct. Genet. 42, 355–377.

90. Cashman, D. J., Rife, J. P. & Kellogg, G. E. (2001). Which aminoglycoside ring is most important for binding? A hydropathic analysis of gentamicin, paromomycin, and analogues. Bioorg. Med. Chem. Letters, 11, 119–122.

91. Cashman, D. J. & Kellogg, G. E. (2004). A computational model for anthracycline binding to DNA: tuning groove-binding intercalators for specific sequences. J. Med. Chem. 47, 1360–1374.

92. Cashman, D. J., Scarsdale, J. N. & Kellogg, G. E. (2003). Hydropathic analysis of the free energy differences in anthracycline antibiotic binding to DNA. Nucl. Acids Res. 31, 4410–4416.

93. Chen, D. L. & Kellogg, G. E. (2005). A computational tool to optimize ligand selectivity between two similar biomacromolecular targets. J. Comput. Aided Mol. Des. 19, 69–82.

94. Sreenivasan, U. & Axelsen, P. H. (1992). Buried water in homologous serine proteases. Biochemistry, 31, 12785–12791.

95. Keil, M., Exner, T. E. & Brickmann, J. (2004). Pattern recognition strategies for molecular surfaces: III. Binding site prediction with a neural network. J. Comput. Chem. 25, 779–789.

96. Rashin, A. A., Iofin, M. & Honig, B. (1986). Internal cavities and buried waters in globular proteins. Biochemistry, 25, 3619–3625.

97. Xu, Z., Bernlohr, D. A. & Banaszak, L. J. (1993). The adipocyte lipid-binding protein at 1.6-Å resolution. Crystal structures of the apoprotein and with bound saturated and unsaturated fatty acids. J. Biol. Chem. 268, 7875–7884.

98. Kleywegt, G. J., Bergfors, T., Senn, H., Le Motte, P., Gsell, B., Shudo, K. & Jones, T. A. (1994). Crystal structures of cellular retinoic acid binding proteins I and II in complex with all-trans-retinoic acid and a synthetic retinoid. Structure, 2, 1241–1258.

99. Weisgerber, S. & Helliwell, J. R. (1993). Resolution crystallographic studies of native concanavalin A using rapid laue data collection methods and the introduction of a monochromatic large-angle oscillation technique (Lot). J. Chem. Soc. Faraday Trans. 89, 2667–2675.

100. Xu, R. X., Hassell, A. M., Vanderwall, D., Lambert, M. H., Holmes, W. D., Luther, M. A. et al. (2000). Atomic structure of PDE4: insights into phosphodiesterase mechanism and specificity. Science, 288, 1822–1825.

101. Rees, D. C., Lewis, M. & Lipscomb, W. N. (1983). Refined crystal structure of carboxypeptidase A at 1.54 Å resolution. J. Mol. Biol. 168, 367–387.

102. James, M. N. & Sielecki, A. R. (1983). Structure and refinement of penicillopepsin at 1.8 Å resolution. J. Mol. Biol. 163, 299–361.

103. Naismith, J. H., Emmerich, C., Habash, J., Har, S. J., Helliwell, J. R., Hunter, W. N. et al. (1994). Refined structure of concanavalin-A complexed with methyl alpha-D-mannopyranoside at 2.0 Å resolution and comparison with the saccharide-free structure. Acta Crystallog. sect. D, 50, 847–858.

104. Card, G. L., England, B. P., Suzuki, Y., Fong, I., Lee, B., Luu, C. et al. (2004). Structural basis for the activity of drugs that inhibit phosphodiesterase. Structure, 12, 2233–2247.

105. Kim, H. & Lipscomb, W. N. (1990). Crystal structure of carboxypeptidase A with a strong bound phosphonate in a new crystalline form: comparison with structures of other complexes. Biochemistry, 29, 5546–5555.

106. Fraser, M. E., Strynadka, N. C., Bartlett, P. A., Hanson, J. E. & James, M. N. (1992). Crystallographic analysis of transition-state mimics bound to penicillopepsin: phosphorus-containing peptide analogues. Biochemistry, 31, 5201–5214.

107. Dennis, S., Camacho, C. J. & Vajda, S. (2000). Continuum electrostatic analysis of preferred solvation sites around proteins in solution. Proteins: Struct. Funct. Genet. 38, 176–188.

108. Jiang, L., Kuhlman, B., Kortemme, T. & Baker, D. (2005). A “solvated rotamer” approach to modeling water-mediated hydrogen bonds at protein–protein interfaces. Proteins: Struct. Funct. Genet. 58, 893–904.

109. Pillai, B., Kannan, K. K. & Hosur, M. V. (2001). 1.9 Å X-ray study shows closed flap conformation in crystals of tethered HIV-1 PR. Proteins: Struct. Funct. Genet. 43, 57–64.

110. Marquart, M., Walter, J., Deisenhofer, J., Bode, W. & Huber, R. (1983). The geometry of the reactive site and of the peptide groups in trypsin, trypsinogen and its complexes with inhibitors. Acta Crystallog. sect. B, 39, 480–490.

111. Timm, D. E., Baker, L. J., Mueller, H., Zidek, L. & Novotny, M. V. (2001). Structural basis of pheromone binding to mouse major urinary protein (MUP-I). Protein Sci. 10, 997–1004.

112. Huntington, J. A. & Esmon, C. T. (2003). The molecular basis of thrombin allostery revealed by a 1.8 Å structure of the “slow” form. Structure, 11, 363–364.

113. Pearl, L. & Blundell, T. L. (1984). The active site of aspartic proteinases. FEBS Letters, 174, 96–101.

114. Flocco, M. M. & Mowbray, S. L. (1994). The 1.9 Å X-ray structure of a closed unliganded form of the periplasmic glucose/galactose receptor from Salmonella typhimurium. J. Biol. Chem. 269, 8931–8936.

115. Patel, S., Vuillard, L., Cleasby, A., Murray, C. V. & Yon, J. (2004). Apo and inhibitor complex structures of Bace (Beta-Secretase). J. Mol. Biol. 343, 407–416.

116. Scheffzek, K., Lautwein, A., Kabsch, W., Ahmadian, M. R. & Wittinghofer, A. (1996). Crystal structure of the GTPase-activating domain of human p120GAP and implications for the interaction with Ras. Nature, 384, 591–596.

117. Chatani, E., Hayashi, R., Moriyama, H. & Ueki, T. (2002). Conformational strictness required for maximum activity and stability of bovine pancreatic ribonuclease A as revealed by crystallographic study of three Phe120 mutants at 1.4 Å resolution. Protein Sci. 11, 72–81.

118. Stebbins, C. E., Russo, A. A., Schneider, C., Rosen, N., Hartl, F. U. & Pavletich, N. P. (1997). Crystal structure of an Hsp90-geldanamycin complex: targeting of a protein chaperone by an antitumor agent. Cell, 89, 239–250.

119. Chen, X., Tordova, M., Gilliland, G. L., Wang, L., Li, Y., Yan, H. & Ji, X. (1998). Crystal structure of apo-cellular retinoic acid-binding protein type II (R111M) suggests a mechanism of ligand entry. J. Mol. Biol. 278, 641–653.

120. Bone, R., Vacca, J. P., Anderson, P. S. & Holloway, M. K. (1991). X-ray crystal structure of the HIV protease complex with L-700,417, an inhibitor with pseudo C2 symmetry. J. Am. Chem. Soc. 113, 9382–9384.

121. Wlodawer, A. & Vondrasek, J. (1998). Inhibitors of HIV-1 protease: a major success of structure-assisted drug design. Annu. Rev. Biophys. Biomol. Struct. 27, 249–284.

122. Priestle, J. P., Fassler, A., Rosel, J., Tintelnot-Blomley, M., Strop, P. & Grutter, M. G. (1995). Comparative analysis of the X-ray structures of HIV-1 and HIV-2 proteases in complex with CGP 53820, a novel pseudosymmetric inhibitor. Structure, 3, 381–389.

123. Kurinov, I. V. & Harrison, R. W. (1994). Prediction of new serine proteinase inhibitors. Nature Struct. Biol. 1, 735–743.

124. Krishnan, R., Mochalkin, I., Arni, R. & Tulinsky, A. (2000). Structure of thrombin complexed with selective non-electrophilic inhibitors having cyclohexyl moieties at P1. Acta Crystallog. sect. D, 56, 294–303.

125. Vijayalakshmi, J., Padmanabhan, K. P., Mann, K. J. & Tulinsky, A. (1994). The isomorphous structures of prethrombin2, hirugen-, and PPACK-thrombin: changes accompanying activation and exosite binding to thrombin. Protein Sci. 3, 2254–2271.

126. De Simone, G., Balliano, G., Milla, P., Gallina, C., Giordano, C., Tarricone, C. et al. (1997). Human alpha-thrombin inhibition by the highly selective compounds N-ethoxycarbonyl-D-Phe-Pro-alpha-azaLys p-nitrophenyl ester and N-carbobenzoxy-Pro-alpha-azaLys p-nitrophenyl ester: a kinetic, thermodynamic and X-ray crystallographic study. J. Mol. Biol. 269, 558–569.

127. Matthews, J. H., Krishnan, R., Costanzo, M. J., Maryanoff, B. E. & Tulinsky, A. (1996). Crystal structures of thrombin with thiazole-containing inhibitors: probes of the S1′ binding site. Biophys. J. 71, 2830–2839.

128. Bailey, D., Cooper, J. B., Veerapandian, B., Blundell, T. L., Atrash, B., Jones, D. M. & Szelke, M. (1993). X-ray-crystallographic studies of complexes of pepstatin A and a statine-containing human renin inhibitor with endothiapepsin. Biochem. J. 289, 363–371.

129. Bailey, D. & Cooper, J. B. (1994). A structural comparison of 21 inhibitor complexes of the aspartic proteinase from Endothia parasitica. Protein Sci. 3, 2129–2143.

130. Blundell, T. L., Jenkins, J. A., Sewell, B. T., Pearl, L. H., Cooper, J. B. et al. (1990). X-ray analyses of aspartic proteinases. The three-dimensional structure at 2.1 Å resolution of endothiapepsin. J. Mol. Biol. 211, 919–941.

131. Vyas, N. K., Vyas, M. N. & Quiocho, F. A. (1988). Sugar and signal-transducer binding sites of the Escherichia coli galactose chemoreceptor protein. Science, 242, 1290–1295.

132. Quiocho, F. A. & Vyas, N. K. (1984). Novel stereospecificity of the L-arabinose-binding protein. Nature, 310, 381–386.

133. Quiocho, F. A., Wilson, D. K. & Vyas, N. K. (1989). Substrate specificity and affinity of a protein modulated by bound water molecules. Nature, 340, 404–407.

134. Coburn, C. A., Stachel, S. J., Li, Y. M., Rush, D. M., Steele, T. G., Chen-Dodson, E. et al. (2004). Identification of a small molecule nonpeptide active site beta-secretase inhibitor that displays a nontraditional binding mode for aspartyl proteases. J. Med. Chem. 47, 6117–6119.

135. Madhusudan-Trafny, E. A., Xuong, N. H., Adams, J. A., Ten Eyck, L. F., Taylor, S. S. & Sowadski, J. M. (1994). cAMP-dependent protein kinase: crystallographic insights into substrate recognition and phosphotransfer. Protein Sci. 3, 176–187.

136. Shaltiel, S., Cox, S. & Taylor, S. S. (1998). Conserved water molecules contribute to the extensive network of interactions at the active site of protein kinase A. Proc. Natl Acad. Sci. USA, 95, 484–491.

Edited by M. Levitt

(Received 5 September 2005; received in revised form 30 December 2005; accepted 14 January 2006)
Available online 2 February 2006