

Modelling Protein Docking using Shape Complementarity, Electrostatics and Biochemical Information

Henry A. Gabb, Richard M. Jackson and Michael J. E. Sternberg*

Biomolecular Modelling Laboratory, Imperial Cancer Research Fund, Lincoln's Inn Fields, P.O. Box 123, London WC2A 3PX, UK

A protein docking study was performed for two classes of biomolecular complexes: six enzyme/inhibitor and four antibody/antigen. Biomolecular complexes for which crystal structures of both the complexed and uncomplexed proteins are available were used for eight of the ten test systems. Our docking experiments consist of a global search of translational and rotational space followed by refinement of the best predictions. Potential complexes are scored on the basis of shape complementarity and favourable electrostatic interactions using Fourier correlation theory. Since proteins undergo conformational changes upon binding, the scoring function must be sufficiently soft to dock unbound structures successfully. Some degree of surface overlap is tolerated to account for side-chain flexibility. Similarly for electrostatics, the interaction of the dispersed point charges of one protein with the Coulombic field of the other is measured rather than precise atomic interactions. We tested our docking protocol using the native rather than the complexed forms of the proteins to address the more scientifically interesting problem of predictive docking. In all but one of our test cases, correctly docked geometries (interface Cα RMS deviation ≤2 Å from the experimental structure) are found during a global search of translational and rotational space in a list that was always less than 250 complexes and often less than 30. Varying degrees of biochemical information are still necessary to remove most of the incorrectly docked complexes.

© 1997 Academic Press Limited

Keywords: molecular recognition; protein–protein docking; protein–protein complex; predictive docking algorithm; fast Fourier transform

*Corresponding author

Introduction

Knowledge of the three-dimensional (3D) structure of protein–protein complexes provides a valuable understanding of the function of molecular systems. The rate of protein structure determination is increasing rapidly. Today there are some 5,000 entries in the Brookhaven Protein Data Bank (PDB) (Bernstein et al., 1977). However, determination of the structure of protein–protein complexes still remains a difficult problem, with only about 100 sets of co-ordinates deposited. Thus, computer algorithms capable of predicting protein–protein interactions are becoming increasingly important. Such algorithms can also provide insight into the process of protein–protein recognition.

The general strategy for simulating protein–protein docking involves matching shape complementarity (for recent reviews, see Janin, 1995a; Shoichet & Kuntz, 1996). Some approaches focus specifically on matching surfaces (e.g. see Jiang & Kim, 1991; Katchalski-Katzir et al., 1992; Walls & Sternberg, 1992; Helmer-Citterich & Tramontano, 1994). Others enhance the search for geometric complementarity by matching the position of surface spheres and surface normals (e.g. see Kuntz et al., 1982; Shoichet & Kuntz, 1991; Norel et al., 1995). Shape complementarity is measured by a variety of scoring functions, some of which aim to model the hydrophobic effect during association from the change in solvent-accessible surface area or molecular surface area (e.g. see Cherfils et al., 1991). Several algorithms employ a simplified scheme to estimate electrostatic interactions (e.g. see Jiang & Kim, 1991; Walls & Sternberg, 1992). In general the algorithms yield a limited set of favourable complexes, one or a few of which are close (typically 1 to 3 Å RMS) to the native structure. Recognizing this, several groups have additionally focused on screening the correct solution from the false positives by modeling the hydrophobic effect, electrostatic interactions and desolvation (Gilson & Honig, 1988; Vakser & Aflalo, 1994; Jackson & Sternberg, 1995; Weng et al., 1996). Most of the above studies have focused on rigid body docking. Recently, however, Monte Carlo simulations have been used to refine flexible side-chain positions after rigid body docking (Totrov & Abagyan, 1994).

Abbreviations used: PDB, Brookhaven Protein Data Bank; 3D, three-dimensional; FFT, fast Fourier transform; DFT, discrete Fourier transform; IFT, inverse Fourier transform; CDR, complementarity determining region; RMS, root-mean square; CPU, central processing unit.

J. Mol. Biol. (1997) 272, 106–120

0022–2836/97/360106–15 $25.00/0/mb971203 © 1997 Academic Press Limited

In the development of these algorithms many workers demonstrate that the bound complex can be generated starting from the component bound molecules. Simulations on the pertinent biological problem of docking starting from unbound complexes have also been performed on a limited number of test cases. However, these studies tended to be applications of algorithms developed on bound complexes. As a consequence, selective pruning of some interface side-chains was required to demonstrate that the algorithms could be applied to unbound complexes. The next step, therefore, is to establish an automated procedure that is successful on a series of molecules without manual intervention guided by knowledge of the answer.

An important step in the identification of reliable docking approaches was the blind trial organized by James and co-workers (Strynadka et al., 1996). Five groups submitted entries to predict the docking of β-lactamase and its inhibitor starting with the unbound structures prior to knowledge of the complex. All entrants were able to identify a solution close to the correct answer (i.e. better than 2.7 Å RMS for superposition of the predicted on the X-ray complex). The closest agreement was by the Fourier correlation algorithm (Katchalski-Katzir et al., 1992), which found a 1.1 Å solution. This approach performs a complete translational and rotational search in Fourier space and selects binding geometries with high surface correlation scores. However, general conclusions cannot be drawn from one trial, particularly as the β-lactamase/inhibitor complex is highly amenable to docking methods that rely on shape complementarity alone as the surface area buried in the complex is particularly large (Janin, 1995a). It is not clear whether the Fourier approach or the other strategies involving measures of surface complementarity will reliably predict unbound systems without consideration of other properties, particularly electrostatics.

This paper reports extensive trials of docking unbound protein–protein molecules to obtain structural models for their complexes. Our approach recognises that a major problem in rigid body docking, starting with unbound complexes, is that the algorithm must be sufficiently soft to manage conformational changes, yet specific enough to identify the correct solution. At present, the alternative approach of an initial non-rigid body search is computationally intractable. A variation of the Fourier correlation algorithm (Katchalski-Katzir et al., 1992) was chosen as it incorporates soft docking, is computationally fast and mathematically elegant. We introduce a soft treatment of electrostatic interactions into the Fourier correlation approach. In addition, we evaluate the selectivity provided by specific biological constraints of the type likely to be available in genuine docking applications.

Algorithm

Measuring shape complementarity by Fourier correlation

The docking protocol used in this study follows closely the shape recognition algorithm of Katchalski-Katzir et al. (1992) (shown schematically in Figure 1). The geometric surface recognition method takes advantage of the fast Fourier transform (FFT) and Fourier correlation theory to scan rapidly the translational space of two rigidly rotating molecules. The algorithm begins by discretizing two molecules, A and B, in 3D grids with every node ($l, m, n \in \{1 \ldots N\}$) assigned a value:

$$
f_{A_{l,m,n}} =
\begin{cases}
1 & \text{surface of molecule} \\
\rho & \text{core of molecule} \\
0 & \text{open space}
\end{cases}
$$

and

$$
f_{B_{l,m,n}} =
\begin{cases}
1 & \text{inside molecule} \\
0 & \text{open space}
\end{cases}
$$

By convention, A and B denote the larger and smaller proteins, respectively. Any grid node within 1.8 Å of an atom is considered to be inside a protein. Grid nodes at the surface of molecule A are scored differently. A 1.5 Å surface layer is used. We set ρ = −15 in all of our docking experiments. Contrary to the finding of Katchalski-Katzir et al. (1992) we find that the choice of ρ does affect the performance of the algorithm. Specifically, the degree of surface overlap tolerated during docking is directly related to the value of ρ.

In all our calculations, 128 × 128 × 128 grids are used. Grid spacing is determined by the size of the respective molecules. The approximate radius of a protein is the distance between the geometric centre and the most distal atom. The grid size is the sum of the protein diameters plus 1 Å. Grid spacings ranged from 0.74 Å/node to 0.84 Å/node for the enzyme/inhibitor complexes and from 0.91 Å/node to 0.94 Å/node for the antibody/antigen complexes.
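The discretization described above can be sketched roughly as follows. This is a minimal illustration on a toy grid, not the authors' implementation: the function names, the brute-force distance search, and in particular the rule used to decide which inside nodes belong to the surface layer (any inside node with open space within the surface thickness) are assumptions made for the example.

```python
import numpy as np

def grid_spacing(radius_a, radius_b, n=128):
    """Spacing (Å/node) when the grid spans both protein diameters plus 1 Å."""
    return (2.0 * radius_a + 2.0 * radius_b + 1.0) / n

def discretize(coords, n, spacing, rho=None, inside_cutoff=1.8, surface=1.5):
    """Return an n x n x n grid for atoms at `coords` (Å).

    rho=None gives the molecule-B scheme (1 inside, 0 open space);
    a number gives the molecule-A scheme (1 surface, rho core, 0 open space).
    """
    coords = np.asarray(coords, dtype=float)
    coords = coords - coords.mean(axis=0)          # centre the molecule on the grid
    axis = (np.arange(n) - n // 2) * spacing       # node positions along one axis
    nodes = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)
    # Distance from every node to its nearest atom (brute force; toy grids only).
    dist = np.linalg.norm(nodes[..., None, :] - coords, axis=-1).min(axis=-1)
    inside = dist <= inside_cutoff
    grid = inside.astype(float)
    if rho is None:
        return grid
    # Assumption: an inside node is "surface" if open space lies within `surface` Å.
    reach = int(np.ceil(surface / spacing))
    near_open = np.zeros_like(inside)
    for dx in range(-reach, reach + 1):
        for dy in range(-reach, reach + 1):
            for dz in range(-reach, reach + 1):
                if 0 < spacing * np.sqrt(dx * dx + dy * dy + dz * dz) <= surface:
                    near_open |= np.roll(~inside, (dx, dy, dz), axis=(0, 1, 2))
    grid[inside & ~near_open] = rho                # buried nodes form the core
    return grid
```

For example, `discretize(atoms_a, 128, grid_spacing(r_a, r_b), rho=-15.0)` would build the grid for molecule A and `discretize(atoms_b, 128, grid_spacing(r_a, r_b))` the grid for molecule B, where `atoms_a`, `atoms_b`, `r_a` and `r_b` are hypothetical inputs. The brute-force distance step would need a neighbour search to be practical at 128³ nodes.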


The correlation function of $f_A$ and $f_B$ is:

$$
f_{C_{\alpha,\beta,\gamma}} = \sum_{l=1}^{N} \sum_{m=1}^{N} \sum_{n=1}^{N} f_{A_{l,m,n}} \times f_{B_{l+\alpha,\,m+\beta,\,n+\gamma}}
$$

where $N^3$ is the number of grid points and α, β, γ is the translation vector of molecule B relative to A. A high correlation score denotes a complex with good surface complementarity. If molecule B significantly overlaps molecule A the correlation is negative. Zero correlation most likely means that the molecules are not in contact.

Figure 1. The Fourier correlation docking algorithm used in this study, based on the method of Katchalski-Katzir et al. (1992). Molecules A and B are discretized differently. Molecule A has a negative core and a positive surface layer (the dark band) whereas no surface/core distinction is made for molecule B. It is only necessary to discretize and Fourier transform molecule A one time. Electrostatic complementarity is calculated concurrently with shape complementarity. Similarly, the transform of the electric field of molecule A need only be calculated once. The cross-section of a sample 3D Fourier correlation function illustrates a search of translational space. The geometric centres of the two molecules are superposed at the origin. Molecule A is fixed in the centre of the grid. As molecule B moves through the grid, a "signal" describing shape complementarity emerges. A zero correlation score indicates that the proteins are not in contact while negative scores (the empty region in the centre) indicate significant surface penetration. The highest peak indicates the translation vector giving the best surface complementarity.

Calculating $f_{C_{\alpha,\beta,\gamma}}$ as shown above is very inefficient, requiring $N^3$ multiplications and additions for each of the $N^3$ shifts α, β, γ. Since $f_A$ and $f_B$ are discrete functions (i.e. representing the discretized molecules) it is possible to calculate $f_C$ much more rapidly by FFT. The FFT requires of the order of $N^3 \ln(N^3)$ calculations, which is significantly less than $N^6$. (See Press et al. (1986) and Bracewell (1990) for thorough reviews of fast transforms and Fourier correlation.) The discrete functions $f_A$ and $f_B$ are transformed (denoted DFT for discrete Fourier transform) and the complex conjugate $F_A^*$ is multiplied by $F_B$:

$$
\begin{aligned}
F_A &= \mathrm{DFT}(f_A) \\
F_B &= \mathrm{DFT}(f_B) \\
F_C &= (F_A^*)(F_B)
\end{aligned}
$$

The correlation function describing the shape complementarity of molecules A and B along each translational vector is the inverse Fourier transform (IFT) of the transform product:

$$
f_C = \mathrm{IFT}(F_C)
$$

A cross-section through a sample correlation function is shown in Figure 1. The highest peak denotes the translation vector producing the best shape complementarity for the current orientation.
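The translational scan can be illustrated with NumPy's FFT routines. The sketch below is not the original code; the function names are chosen for this example. It compares the direct triple sum with the transform product IFT(F_A* F_B) and picks the highest peak. Indices are treated as cyclic, which is what the discrete transform implies.

```python
import numpy as np

def correlate_direct(f_a, f_b):
    """Brute-force cyclic correlation: about N^6 operations for N^3 shifts."""
    c = np.empty(f_a.shape)
    for shift in np.ndindex(*f_a.shape):
        # Rolling by -shift places f_b[l+a, m+b, n+g] at index (l, m, n).
        shifted = np.roll(f_b, tuple(-s for s in shift), axis=(0, 1, 2))
        c[shift] = np.sum(f_a * shifted)
    return c

def correlate_fft(f_a, f_b):
    """Same correlation through the DFT: IFT(conj(F_A) * F_B)."""
    big_f_a = np.fft.fftn(f_a)
    big_f_b = np.fft.fftn(f_b)
    return np.real(np.fft.ifftn(np.conj(big_f_a) * big_f_b))

def best_translation(f_c):
    """Grid shift (alpha, beta, gamma) with the highest correlation score."""
    return tuple(int(i) for i in np.unravel_index(np.argmax(f_c), f_c.shape))
```

Because the indices wrap around, a shift larger than N/2 along an axis corresponds to a negative translation along that axis. In a docking run the transform of molecule A would be computed once and reused for every orientation of molecule B, as the Figure 1 caption notes.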

After each translational scan, molecule B is rotated about one of its Euler angles until rotational space has been completely scanned. For an angular deviation of α = 15°, this yields $360 \times 360 \times 180/\alpha^3 = 6912$ orientations. However, many of these orientations are degenerate and must be removed using the following relationship (Lattman, 1972):

$$
\alpha = \cos^{-1}\left[\frac{\mathrm{tr}(R_1 \times R_2^{T}) - 1}{2}\right]
$$

where $R_1$ is the rotation matrix of the first orientation, $R_2^{T}$ is the transpose of the rotation matrix of the second orientation, and tr is the matrix trace. If α ≤ 1° then the two orientations are degenerate. Removing degeneracies in this fashion yields 6385 unique orientations. A finer angular rotation is computationally demanding for extensive trials. For example, there are 22,105 non-degenerate orientations for α = 10°.
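The degeneracy test above can be sketched as follows. The z-x-z Euler convention, the angle ranges and the greedy pruning loop are assumptions for illustration, since this section does not specify them, so the sketch is not claimed to reproduce the 6385 figure.

```python
import numpy as np

def euler_zxz(phi, theta, psi):
    """Rotation matrix Rz(phi) @ Rx(theta) @ Rz(psi), angles in degrees (assumed convention)."""
    def rz(a):
        c, s = np.cos(np.radians(a)), np.sin(np.radians(a))
        return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    def rx(a):
        c, s = np.cos(np.radians(a)), np.sin(np.radians(a))
        return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])
    return rz(phi) @ rx(theta) @ rz(psi)

def angle_between(r1, r2):
    """Angle in degrees from tr(R1 R2^T), the relationship quoted from Lattman (1972)."""
    cos_a = (np.trace(r1 @ r2.T) - 1.0) / 2.0
    # Clip guards against round-off pushing the cosine just outside [-1, 1].
    return float(np.degrees(np.arccos(np.clip(cos_a, -1.0, 1.0))))

def unique_orientations(step, tol=1.0):
    """Greedily keep orientations more than `tol` degrees from all kept so far."""
    kept = []
    for phi in range(0, 360, step):
        for theta in range(0, 180, step):
            for psi in range(0, 360, step):
                r = euler_zxz(phi, theta, psi)
                if all(angle_between(r, k) > tol for k in kept):
                    kept.append(r)
    return kept
```

With θ = 0 the first and third rotations share an axis, so `euler_zxz(30, 0, 45)` and `euler_zxz(75, 0, 0)` describe the same orientation and `angle_between` returns a value near zero. The pairwise loop is quadratic in the number of orientations, so a coarse `step` is advisable when experimenting.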

Measuring electrostatic complementarity by Fourier correlation

Shape complementarity is not the only factor involved in molecular binding. Electrostatic attraction, particularly the specific charge–charge interactions in the binding interface, also plays an important role. For speed and consistency, electrostatic complementarity is calculated by Fourier correlation using a simple Coulombic model. Since charged amino acid side-chains are usually on the protein surface, they are often involved in binding and tend to be highly flexible (Figure 2). Therefore, calculating individual point charge interactions when attempting to dock the uncomplexed structures is not feasible and can produce misleading results. So rather than try to measure specific charge–charge interactions, we measure the point charges of one protein interacting with the electric field of the other as grid points. In this way, point charges are dispersed to simulate side-chain movement. The electrostatic calculations proceed in a manner very similar to those of shape complementarity. Charges are assigned to the atoms of protein A (Table 1) and the molecule is placed in a grid. The electric field at each grid node (excluding those of the protein core) is calculated:

$$
\phi_i = \sum_j \frac{q_j}{\varepsilon(r_{ij})\, r_{ij}}
$$

where $\phi_i$ is the field strength at node i (position l, m, n), $q_j$ is the charge on atom j, $r_{ij}$ is the distance between i and j (a minimum cutoff distance of 2 Å is imposed to avoid artificially large values of φ), and $\varepsilon(r_{ij})$ is a distance-dependent dielectric function. In this case, a pseudo-sigmoidal function, based on the sigmoidal function of Hingerty et al. (1985), is used:

$$
\varepsilon(r_{ij}) =
\begin{cases}
4 & r_{ij} \le 6\ \text{Å} \\
38\, r_{ij} - 224 & 6\ \text{Å} < r_{ij} < 8\ \text{Å} \\
80 & r_{ij} \ge 8\ \text{Å}
\end{cases}
$$

Several distance-dependent dielectric functions were tested. This one was chosen because it effectively damps long-range electrostatic effects that are not relevant to the binding interface. In fact, dielectric functions that do not damp long-range effects give inconsistent results, sometimes showing poor electrostatics for experimentally determined complexes. The treatment of protein B is much simpler. Charges are assigned to its atoms and then discretized in a grid ($q_{l,m,n}$) by trilinear weighting (Rogers & Sternberg, 1984; Edmonds et al., 1984).

Figure 2. Examples of docking to illustrate induced binding in the interface. (a) Selected binding site residues of ovomucoid when free (2ovo, light grey) and when bound to α-chymotrypsin (1cho, dark grey). (b) Selected binding site residues of BPTI when free (4pti, light grey) and when bound to trypsin (2ptc, dark grey). Scoring functions used for predictive docking must be sufficiently "soft" to account for conformational changes of this magnitude. The whole complexes for CHO and PTC are shown in Figures 4(b) and (d) respectively.
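The Coulombic field of protein A with the pseudo-sigmoidal dielectric described above can be sketched in a few lines. This is an illustrative reading rather than the authors' code: the function names are hypothetical, the 2 Å minimum cutoff is interpreted here as clamping shorter distances to 2 Å, and charges are left in units of the electron charge with no physical prefactor.

```python
import math

def dielectric(r):
    """Pseudo-sigmoidal distance-dependent dielectric, r in Å."""
    if r <= 6.0:
        return 4.0
    if r < 8.0:
        return 38.0 * r - 224.0   # linear ramp joining 4 (at 6 Å) to 80 (at 8 Å)
    return 80.0

def field_at(node, atoms, min_dist=2.0):
    """Field phi_i = sum_j q_j / (eps(r_ij) * r_ij) at one grid node.

    `atoms` is an iterable of (x, y, z, charge) tuples; distances below
    `min_dist` are clamped (assumed meaning of the 2 Å cutoff).
    """
    phi = 0.0
    for x, y, z, q in atoms:
        r = max(math.dist(node, (x, y, z)), min_dist)
        phi += q / (dielectric(r) * r)
    return phi
```

The two branch points are continuous: 38 × 6 − 224 = 4 and 38 × 8 − 224 = 80, so the function rises smoothly from the low short-range value to the bulk-water-like value of 80 without a jump. In a full calculation `field_at` would be evaluated at every grid node outside the core of molecule A.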

Calculation of the electrostatic interactions proceeds as outlined in Figure 1 and as described for surface correlation except that the discrete functions are:

$$
f_{A_{l,m,n}} =
\begin{cases}
\phi_{l,m,n} & \text{entire grid excluding core} \\
0 & \text{core of molecule}
\end{cases}
$$

$$
f_{B_{l,m,n}} = q_{l,m,n}
$$

and the correlation function becomes:

$$
f^{elec}_{\alpha,\beta,\gamma} = \sum_{l=1}^{N} \sum_{m=1}^{N} \sum_{n=1}^{N} \phi_{A_{l,m,n}} \times q_{B_{l+\alpha,\,m+\beta,\,n+\gamma}}
$$

So both grids are Fourier transformed and correlated such that the static charges of molecule B move through the electric field of molecule A. The electrostatic correlation score is used as a binary filter. Specifically, false positive geometries that give high shape correlation scores can be excluded if their electrostatic correlation is unfavourable (i.e. positive).
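A minimal sketch of the charge grid for molecule B and of the binary electrostatic filter follows. The helper names, the cyclic handling of grid edges and the small tolerance used to absorb FFT round-off are assumptions for this example.

```python
import numpy as np

def spread_charge(grid, pos, q):
    """Trilinear weighting: share charge q among the 8 nodes around `pos`.

    `pos` is a fractional grid index (x, y, z); the grid is modified in place.
    """
    pos = np.asarray(pos, dtype=float)
    base = np.floor(pos).astype(int)
    frac = pos - base
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = ((frac[0] if dx else 1.0 - frac[0]) *
                     (frac[1] if dy else 1.0 - frac[1]) *
                     (frac[2] if dz else 1.0 - frac[2]))
                idx = tuple((base + (dx, dy, dz)) % grid.shape[0])
                grid[idx] += q * w

def correlate_fft(f_a, f_b):
    """Cyclic correlation IFT(conj(DFT(f_a)) * DFT(f_b))."""
    return np.real(np.fft.ifftn(np.conj(np.fft.fftn(f_a)) * np.fft.fftn(f_b)))

def electrostatic_filter(shape_score, elec_score, tol=1e-9):
    """Discard translations whose electrostatic correlation is positive."""
    return np.where(elec_score > tol, -np.inf, shape_score)
```

Here `correlate_fft(phi_grid_a, charge_grid_b)` would give the electrostatic score for every translation, where `phi_grid_a` and `charge_grid_b` are hypothetical names for the field grid of A and the charge grid of B. The shape score is left untouched wherever the electrostatic score is zero or negative, so the electrostatics act only as a pass/fail test and do not reweight the ranking.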

Results

Predictive docking of native protein structures

The docking protocol, shown schematically in Figure 1, was applied to the unbound coordinates of six enzyme/inhibitor and two antibody/antigen systems. In addition, two bound antibodies were docked to unbound antigens. These ten systems are referred to as "unbound" docking, for simplicity. Docked solutions are ranked by surface correlation score. A correct solution is defined as any geometry with an interface Cα RMS less than or equal to 2.5 Å from the crystal structure (see Methods).
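The success criterion can be expressed as a short check. This sketch assumes that the predicted and crystal interface Cα coordinates are already paired atom for atom and expressed in a common frame; how the interface is defined and how structures are superposed is described in Methods, which is not reproduced here, so the function names and that assumption are illustrative only.

```python
import math

def rms_deviation(predicted, reference):
    """Root-mean-square deviation (Å) between two paired coordinate lists."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for p, r in zip(predicted, reference):
        total += sum((pc - rc) ** 2 for pc, rc in zip(p, r))
    return math.sqrt(total / len(predicted))

def is_correct(predicted_ca, reference_ca, cutoff=2.5):
    """True if the interface C-alpha RMS is within the 2.5 Å cutoff."""
    return rms_deviation(predicted_ca, reference_ca) <= cutoff
```

A prediction whose interface Cα atoms are all displaced by 2 Å from the crystal positions would pass this test, while a uniform 3 Å displacement would not.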

The results of the global search before filtering and without consideration of electrostatics show that geometric complementarity alone (as measured by Katchalski-Katzir et al., 1992) cannot reliably dock unbound complexes (Tables 2 and 3). For three out of the ten test systems no correctly docked complex is ranked in the top 4000 predictions. For the other seven systems, the highest ranked correct answer was in a list of hundreds of alternatives. This shows that a high surface correlation score does not necessarily indicate a correctly docked complex. For example, even a limited rotational scan near the correct geometry produces a broad range of correlation scores (Figure 3). In a complete rotational search it is possible to find incorrectly docked complexes that score higher than the actual crystal structure. This does not pose a problem because our aim during the global search is to place at least one near-correct prediction in the final output, not necessarily at the highest scoring position. Correctly docked complexes can be screened later using experimental constraints and advanced refinement techniques.

The additional constraint of removing predictions with unfavourable electrostatic interactions markedly improves the ranking of correctly docked structures in the global search (Table 2). With electrostatics, a good solution is found in the top 4000 structures in nine out of ten systems. In general, inclusion of electrostatics reduces the number of geometries to be evaluated by approximately 50%.

Global searching and the dependence on filtering

Knowledge of the location of the binding site on one or both proteins drastically reduces the number of possible allowed conformations. Knowing specific binding site residues reduces the search space even further. It is possible to utilize this information in the form of distance constraints (see Methods). Generally, information about the binding site is available from experimental data (e.g. site-directed mutagenesis, chemical cross-linking, phylogenetic data). In the absence of experimental data, it is often possible to predict the correct binding site by examining potential hydrogen bonding groups, clefts and/or charged sites on a protein surface (Gilson & Honig, 1987; DesJarlais et al., 1988; Nicholls & Honig 1991; Laskowski, 1995; Laskowski et al., 1996; Meyer et al., 1996). Serine proteases and immunoglobulins represent systems where the binding sites are known in advance. The catalytic triad of the serine proteases and the complementarity determining region (CDR) of immunoglobulins are both well characterized. We take advantage of this information to varying degrees in our docking experiments (Table 2). In the serine protease/inhibitor docking attempts, filters are defined as: loose, any residue of the inhibitor in contact with any residue of the catalytic triad (i.e. His, Asp, Ser); medium, an inhibitor residue in contact with both the His and the Ser of the catalytic triad; tight, a specific binding site residue of the inhibitor in contact with both the His and the Ser of the catalytic triad. In the antibody/lysozyme docking attempts, filters are defined as: loose, any part of the lysozyme in contact with either the L3 or the H3 CDR; medium, lysozyme in contact with both the L3 and H3 CDRs; tight, the medium filter together with one residue of the epitope in contact with any part of the CDR. The L3/H3 CDR filters are based on the study of MacCallum et al. (1996), which analysed general structural principles of antibody/antigen contacts.

Table 1. Charges used in Coulombic electrostatic fields

| Peptide backbone | Charge | Side-chain atoms | Charge |
|---|---|---|---|
| Terminal-N | 1.0 | Arg-Nη | 0.5 |
| Terminal-O | −1.0 | Glu-Oε | −0.5 |
| Cα | 0.0 | Asp-Oδ | −0.5 |
| C | 0.0 | Lys-Nζ | 1.0 |
| O | −0.5 | Pro-N | −0.1 |
| N | 0.5 | | |

If the proteins in question are heavily studied, as are serine proteases and immunoglobulins, information typical of the medium or the tight filter might be available. In other practical applications of docking, there often will be some available constraints typical of the loose filter. Table 2 shows that loose filtering of the output from the global search removes between 79 and 99% of false positives, usually leaving at least one correct answer in the top 50 predictions. As expected, medium and tight filtering removes successively more incorrect predictions. For example, going from loose to medium filtering reduces the total number of predictions by an order of magnitude. With fewer false positives to contend with, the ranking of the correct predictions improves dramatically. A good solution is in the top five in six out of nine cases. For the other three systems no more than 45 alternate geometries would have to be examined. In going from medium to tight filtering, although the procedure does not significantly alter the number of false positives, a near correct geometry is ranked in the top five predictions in all but two of the nine test systems. In one other system, the good solution is rank 27. It is clear that given experimental constraints this docking procedure can effectively dock two proteins using crystal structures for the unbound subunits. Since few predictions remain at this stage, it may be possible to use more computationally demanding, and hence more accurate, models to predict binding.

Local refinement of predicted geometries

The global search is carried out at an angular deviation of α = 15°. A finer rotational scan is desirable but computationally expensive. So, a local refinement of the most reasonable predictions is performed. We chose to refine the structures that passed through the loose filter because this level of information is generally available, e.g. in the docking challenge (Strynadka et al., 1996). During refinement each geometry is shifted (±5 Å in each direction) and rotated (±5° for each Euler angle) slightly to find the highest surface correlation score in the local space. Refinement using the same surface thickness as in the global search (1.5 Å) did not improve selectivity as evaluated by the rank of the first good solution; in some systems ranking was improved but in others it was worse. However, refinement tends to improve the final RMS values of the highest ranked correct predictions for the enzyme/inhibitor complexes (exception: KAI; Table 3). Refining the antibody/antigen predictions proved more difficult. In each case except HFL, refinement of the filtered data from the global search worsened the rank of the first correctly docked complex. HFL, however, had the worst overall rank.

Table 2. Docking unbound subunits with varying degrees of filtering

| System | Global search before filtering: rank (N = 4000), no elec | Global search before filtering: rank (N = 4000), elec | Global search before filtering: N ≤ 2.5 Å | Loose filtering: N | Loose filtering: rank | Loose filtering: N ≤ 2.5 Å | Medium filtering: N | Medium filtering: rank | Medium filtering: N ≤ 2.5 Å | Tight filtering: N | Tight filtering: rank | Tight filtering: N ≤ 2.5 Å |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CGI | 207 | 87 | 2 | 172 | 5 | 2 | 24 | 2 | 1 | 7 | 1 | 1 |
| CHO | 282 | 127 | 7 | 234 | 12 | 7 | 19 | 2 | 7 | 13 | 1 | 7 |
| KAI | 462 | 223 | 25 | 377 | 64 | 19 | 85 | 45 | 13 | 48 | 27 | 13 |
| PTC | 989 | 502 | 8 | 242 | 39 | 8 | 34 | 3 | 8 | 21 | 2 | 8 |
| SNI | NF | 1518 | 2 | 29 | 19 | 2 | 3 | 2 | 2 | 2 | 1 | 2 |
| SIC | NF | NF | 0 | – | – | – | – | – | – | – | – | – |
| FDL | 219 | 106 | 2 | 717 | 22 | 2 | 331 | 14 | 2 | 51 | 3 | 2 |
| MLC | 188 | 104 | 5 | 623 | 13 | 5 | 97 | 3 | 5 | 20 | 1 | 5 |
| HFL | NF | 3318 | 2 | 544 | 460 | 2 | 16 | 13 | 2 | 6 | 5 | 2 |
| HFM | 100 | 67 | 7 | 837 | 27 | 7 | 177 | 5 | 5 | 40 | 1 | 5 |

N, the number of predictions remaining at each stage. Rank refers to the position of the first correct answer in the list of predictions. NF, none found. N ≤ 2.5 Å equals the number of predicted complexes with an interface Cα RMS ≤ 2.5 Å.

The following side-chain lengths were used during filtering: Gly, 0.5 Å; Ala, Thr, Val, 1.0 Å; Cys, Ile, Leu, Ser, 2.0 Å; Asn, Asp, Pro, 3.0 Å; Gln, Glu, His, Met, Phe, 4.0 Å; Lys, Trp, Tyr, 5.0 Å; Arg, 6.0 Å.

Table 3. Docking unbound subunits: global search and local refinement

| System | Filtered global search: N | Filtered global search: rank | Filtered global search: N ≤ 2.5 Å | Filtered global search: Cα RMS | Local refinement (1.5 Å surface): N | Local refinement (1.5 Å surface): rank | Local refinement (1.5 Å surface): N ≤ 2.5 Å | Local refinement (1.5 Å surface): Cα RMS | Local refinement (1.2 Å surface): N | Local refinement (1.2 Å surface): rank | Local refinement (1.2 Å surface): N ≤ 2.5 Å | Local refinement (1.2 Å surface): Cα RMS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CGI | 172 | 5 | 2 | 2.1/1.9 | 94 | 3 | 1 | 1.8/2.0 | 133 | 1 | 1 | 1.7/2.0 |
| CHO | 234 | 12 | 7 | 1.4/1.3 | 86 | 11 | 5 | 1.2/1.1 | 178 | 8 | 6 | 1.5/1.7 |
| KAI | 377 | 64 | 19 | 2.2/1.8 | 364 | 130 | 18 | 1.5/1.2 | 181 | 33 | 4 | 1.9/1.6 |
| PTC | 242 | 39 | 8 | 1.9/2.0 | 229 | 16 | 8 | 1.5/1.7 | 177 | 9 | 7 | 1.7/1.6 |
| SNI | 29 | 19 | 2 | 1.8/1.6 | 26 | 8 | 2 | 1.8/1.6 | 15 | 2 | 1 | 2.4/1.9 |
| SIC | – | – | – | –/– | – | – | – | –/– | – | – | – | –/– |
| FDL | 717 | 22 | 2 | 2.1/1.8 | 707 | 176 | 2 | 2.1/1.8 | 525 | – | – | –/– |
| MLC | 623 | 13 | 5 | 2.2/1.9 | 590 | 41 | 4 | 1.2/1.6 | 378 | 1 | 4 | 2.0/2.1 |
| HFL | 544 | 460 | 2 | 2.3/1.8 | 519 | 228 | 2 | 1.8/1.5 | 403 | 78 | 2 | 1.5/1.5 |
| HFM | 837 | 27 | 7 | 1.7/1.5 | 762 | 65 | 6 | 1.1/1.1 | 408 | 1 | 2 | 0.7/0.8 |

Cα RMS no./no. refers to the total Cα RMS and the interface Cα RMS, respectively.

To improve these results, we performed the refinement again using a thinner surface thickness (1.2 Å). A thinner surface layer is less tolerant of overlapping protein surfaces. This leads to more stringent scoring of shape complementarity, something that is not necessarily beneficial when doing a global search of docking space for native structures. However, stricter shape fitting tends to dampen correlation scores for false positives while enhancing those of predicted complexes already near the correct docking geometry, as shown in Table 3. With the exception of FDL, using a slightly thinner surface thickness significantly improves ranking but sometimes worsens RMS. This suggests that there is no strict correlation between surface complementarity score and the correct RMS when docking native structures (Figure 3).

Enzyme/inhibitor docking

The highest ranked correct prediction for eachenzyme/inhibitor test case after re®nement (1.5 AÊ

surface thickness) is shown in Figure 4. Note thatthese are not necessarily the predictions with thebest RMS values. CGI represents the best of all ourdocking attempts. A correct answer scores in thetop 100 predictions after the global search. Loose®ltering leaves a reasonable number of complexesfor subsequent re®nement, after which a correctanswer is ranked in the top ®ve regardless of thesurface thickness used (Table 3). The single remain-ing correct structure is shown in Figure 4(a). In-spection shows that the RMS deviation from thecorrect docking geometry is primarily rotational.

CHO represents another successful docking at-tempt. Loose ®ltering and local re®nement leaveseveral correct geometries, one of which is the top20 predictions (Table 3). The deviation from thecrystal structure of the complex is primarily ro-tational but with a small translational componentas well (Figure 4(b)).

Problems arose during the refinement of KAI. Unlike the other enzyme/inhibitor test cases, refinement using a 1.5 Å surface layer worsened the rank of the best prediction while improving its RMS deviation (Table 3), which is primarily rotational (Figure 4(c)). Results were better when a thinner surface thickness was used; both the rank and RMS values improved.

The attempt by Katchalski-Katzir et al. (1992) to dock trypsin and BPTI was unsuccessful, leading the authors to conclude that docking native structures was beyond the capability of the Fourier correlation algorithm. We also find that PTC is a difficult test case. Even with the added electrostatic filtering (not available in the original work of Katchalski-Katzir and co-workers) the first correct prediction still ranks far down in the list of possibilities (Table 3). All of the structures used in this study experience some induced binding effects, but compared to the binding site of ovomucoid (Figure 2(a)), for example, the amino acid side-chains of BPTI that are important to binding, especially Lys15, undergo significant conformational change upon binding to trypsin (Figure 2(b)). Loose filtering, however, reduces the global search to more manageable proportions, leaving several correct geometries, including one in the top 50 predictions. Subsequent refinement improves both rank and RMS (Table 3). The RMS deviation is almost entirely translational (Figure 4(d)).

Similar to PTC, SNI is another case where electrostatic filtering and at least some experimental data are essential if a correctly docked complex is to be found at all. However, applying the loose filter removes 99% of the false positives. Refinement at either surface thickness improves the final rankings but not the RMS values. The RMS deviation is mostly translational but with a small rotational component (Figure 4(e)).

Figure 3. A high surface correlation score does not necessarily indicate a correct complex. Limited rotational scans (±15° in 5° increments for each Euler angle) around the correct geometry were performed using the bound and unbound structures of KAI and MLC: (a) complexed KAI structures, (b) native KAI structures (2pka/1bpi), (c) complexed MLC structures (1mlc), (d) native MLC structures (1cfa/1lza). These plots show that high surface correlation scores are possible for incorrect geometries and vice versa.
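The limited rotational scan described in the legend to Figure 3 (±15° in 5° increments for each Euler angle) can be enumerated as in the following minimal sketch. The function name `euler_scan` and the representation of an orientation as a tuple of three angular offsets are illustrative assumptions, not part of the docking package.

```python
from itertools import product

def euler_scan(half_range_deg=15, step_deg=5):
    """Enumerate Euler-angle offsets (in degrees) around a reference geometry.

    Each of the three angles takes the values -half_range, ..., +half_range
    in steps of step_deg; the reference geometry is the offset (0, 0, 0).
    """
    offsets = range(-half_range_deg, half_range_deg + 1, step_deg)
    return list(product(offsets, repeat=3))

orientations = euler_scan()
# Seven offsets per angle give 7**3 orientations in total.
print(len(orientations))
```

With the stated range and increment, each angle takes seven values, so a scan of this kind evaluates 343 orientations, one of which is the reference geometry itself.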

SIC is the only case in which no correct answers were found in the top 4000 predictions. The first correct answer is found at rank 7745. The docking was performed again but with all side-chains greater than 70% solvent-exposed truncated at the Cγ position. This time, correctly docked structures scored in the top 1000 predictions. However, this does not represent a general solution to the docking problem, since truncating exposed side-chains worsened results for the other test cases (data not shown).

Grid resolutions for the enzyme/inhibitor test cases are: CGI, 0.78; CHO, 0.76; KAI, 0.74; PTC, 0.76; SNI, 0.78; and SIC, 0.84 Å/grid space. It is tempting to blame the failure to dock SIC on the comparatively large grid spacing. However, antibody/antigen complexes require even larger grid spacings: FDL, 0.96; MLC, 0.91; HFL, 0.94; and HFM, 0.94 Å/grid space. In general, the docking results for both sets of molecules are qualitatively similar, suggesting that the failure of SIC is not a consequence of grid resolution.

Antibody/antigen docking

The highest ranked correct prediction for each antibody/antigen test case after refinement (1.5 Å surface thickness) is shown in Figure 5. FDL is the most difficult of the antibody docking tests (Figure 5(a)). Our refinement scheme failed to improve either the rank of the first correct prediction or its RMS values. Near the interface, the RMS difference from the crystal structure is primarily translational. Further from the interface, the rotational component of the RMS difference becomes more apparent. The highest ranked, refined prediction for MLC is shown in Figure 5(b). The RMS deviation from the crystal structure is primarily translational.

Since the free antibodies for HFL and HFM have not been solved crystallographically, it was necessary to use the complexed forms of the Fvs. Refinement of HFL improved both the rank and RMS of the first correctly predicted complex. The rank, however, was still lower than the other antibody/antigen test cases in spite of the fact that the bound form of the antibody was used. The RMS difference for HFL is mostly rotational and for the most part away from the interface, which is in close agreement with the crystal structure (Figure 5(c)). HFM represents the best of the antibody/antigen docking attempts in terms of rank and RMS (Table 3). The RMS difference from the crystal structure is primarily translational with a slight rotational component visible further from the interface (Figure 5(d)).

Docking complexed protein structures

The test systems used in this study were chosen because crystal structures were available for both the free and complexed subunits. We decided to test our docking procedure on bound conformers using the same parameter set as used to dock the unbound proteins. To avoid biasing towards the original binding orientation, the subunits of the complexes are rotated at random before starting the global search. As expected, better results are obtained when docking bound conformers (Table 4) compared to native structures (Tables 2 and 3). In general, a correct answer ranks in the top 100 predictions after the global search. Electrostatic filtering improves the ranking of correct geometries by roughly a factor of 2, similar to the results for unbound docking (Table 2). However, even in the absence of electrostatics the results are fairly good. In two cases, SIC and FDL, however, electrostatics is essential in order to score a correct answer in the top 4000 predictions. There are also a greater number of correctly docked predictions in the final output from the global search, more so for the enzyme/inhibitor than for the antibody/antigen complexes.

Figure 4. The highest ranked predicted geometries for the enzyme/inhibitor test cases superposed onto the crystal structure of the enzyme: (a) CGI, (b) CHO, (c) KAI, (d) PTC, (e) SNI. In each case, the view was chosen to highlight the difference between the correctly bound inhibitor and the prediction. The crystal structure of the inhibitor/enzyme complex is shown in orange/yellow and the predicted docking in magenta/cyan.


Local refinement consistently improves the final RMS values of the highest scoring correct prediction. However, in all but three cases the final rank of this structure is worsened. Using the electrostatic filter during local refinement does not significantly change this result (not shown). It should be noted that the docking parameters were optimized for unbound molecules where shape complementarity is less specific. So improved performance using bound molecules is not necessarily to be expected. Still, the overall results of docking using bound conformers are superior to predictive docking using native structures. A correctly docked complex is consistently placed in the top 100 scoring geometries with an overall greater density of correct answers in the final output. This highlights the problem of evaluating docking protocols using bound complexes.

Figure 5. The highest ranked predicted geometries for the antibody/antigen test systems superposed on the crystal structure of the Fv: (a) FDL, (b) MLC, (c) HFL, (d) HFM. The view was chosen to highlight the difference between the correctly bound antigen and the prediction. The crystal structure of the lysozyme/antibody complex is shown in orange/yellow and the predicted docking in magenta/cyan.

Discussion and Conclusions

We have presented a docking protocol based on the Fourier correlation algorithm (Katchalski-Katzir et al., 1992). Our method used a parameter set developed specifically for docking native structures and therefore can be used for predictive docking. In addition, a simple but effective treatment of electrostatic interactions further improved our ability to distinguish true from false positive dockings. Because of this we were able to succeed in at least one test system where surface correlation alone failed. A complete search of translational and rotational space is performed, but within this space the correct binding geometry is quite small, so most of the complexes sampled are incorrect by definition. This leads to noisy output. Even after post-filtering based on experimental data there are significantly more incorrectly docked structures than correct solutions. Though the present combination of surface complementarity and electrostatic interactions performs well for native structures, experimental information in the form of distance constraints is necessary in order to separate correct solutions from false positives. Three levels of filter were applied to the output, each with successively more information about the binding interface, and each producing successively better results (Table 2).

Since protein–protein binding is governed, among other things, by electrostatic interactions as well as hydrophobicity, one cannot expect surface complementarity alone to distinguish true from false positives in predictive docking. Hydrophobicity is modelled implicitly in the surface complementarity score (e.g. see Cherfils et al., 1991; Jackson & Sternberg, 1995) but electrostatic interactions were not included in the original Fourier correlation method (Katchalski-Katzir et al., 1992). We decided to consider electrostatics with the aim of excluding false positives and improving the rank of the correct answers. We opted for a simple Coulombic model rather than a more rigorous, and more computationally expensive, Poisson–Boltzmann description (Warwicker & Watson, 1982; Sharp & Honig, 1990). A dispersed Coulombic field was used rather than charge–charge interactions described at the atomic level. Point charge interactions failed to distinguish right from wrong answers (data not shown) for the simple reason that the side-chains governing specificity are not necessarily in their optimal binding conformations. Individual charges will rarely be positioned for favorable interaction. In fact, they can often be placed in a repulsive interaction even if the backbone geometry is correct. Even one unrealistically close contact can severely distort the electrostatic calculations. A Coulombic field, whose absolute value has an upper bound, disperses the charges in a manner that simulates the uncertainty in side-chain conformation. It is therefore sufficiently "soft" to dock native structures. However, at this level of detail, electrostatics can only be used as a binary filter as it is not sufficiently sensitive to be included directly in the scoring function. Nevertheless, the results show that removing predictions with unfavorable electrostatics significantly improves the ranking of correctly docked complexes (Table 2). In some complexes, it is essential in order to find any correct answers.
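The use of electrostatics as a binary filter, rather than as a term in the scoring function, can be illustrated with the following minimal sketch. The `Prediction` record, the sign convention (a negative electrostatic score taken to be favourable) and the numerical values are assumptions made purely for illustration; they are not taken from the docking package or from the results tables.

```python
from dataclasses import dataclass

@dataclass
class Prediction:
    label: str
    surface_score: float        # shape-complementarity correlation, higher is better
    electrostatic_score: float  # assumed convention: negative means favourable

def rank_with_binary_filter(predictions):
    """Drop electrostatically unfavourable predictions, then rank by surface score.

    The electrostatic score is used only as a pass/fail test and never
    contributes to the ordering of the predictions that survive.
    """
    kept = [p for p in predictions if p.electrostatic_score < 0.0]
    return sorted(kept, key=lambda p: p.surface_score, reverse=True)

# Made-up candidates: the best-fitting one has unfavourable electrostatics.
candidates = [
    Prediction("false positive", 310.0, +4.2),
    Prediction("near-native", 290.0, -1.5),
    Prediction("weak decoy", 120.0, -0.3),
]
ranked = rank_with_binary_filter(candidates)
print([p.label for p in ranked])
```

In this toy example the prediction with the highest surface score is removed by the filter, so the near-native geometry moves to the top of the list, which is the kind of rank improvement the filter is intended to produce.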

Other researchers have used lower-resolution grids and coarser rotational scans for their initial search (Katchalski-Katzir et al., 1992; Vakser & Aflalo, 1994; Meyer et al., 1996). Vakser & Aflalo (1994), for example, used a 64 × 64 × 64 grid. In our initial studies, we used a 64 × 64 × 64 grid with an angular deviation of 20° for our global search. This produced marginal results for the enzyme/inhibitor systems but failed to find correct solutions for the antibody/antigen systems (data not shown). Antibody/antigen systems are too large to model at this grid resolution and angular deviation. Access to a more powerful computer capable of performing the FFT in parallel enabled us to rapidly perform both stages of docking (i.e. global search and local refinement) at high resolution. This significantly improved our results for both enzyme/inhibitor and antibody/antigen docking.

We obtained good results using a surface layer that was thinner than that normally used by other groups. In the docking challenge, for example, Eisenstein and co-workers were successful using a surface thickness of 3.5 Å (Strynadka et al., 1996). However, using the bound forms of the proteins studied in this paper, we saw very little shape complementarity when using a thicker surface layer in the procedure (data not shown). The final output was composed primarily of complexes that buried large amounts of surface area with substantial overlap with the protein core. A thinner surface layer demands greater shape specificity and our results show that a surface thickness of 1.5 Å works well when docking unbound proteins. Decreasing surface thickness to 1.2 Å during local refinement improved results even further (Table 3). Local refinement using the same surface thickness (i.e. 1.5 Å) as the global search is less able to distinguish correctly docked molecules clearly (Table 3). This suggests that a sufficient level of surface complementarity still exists at the protein–protein interface in spite of incorrectly positioned side-chains.

Table 4. Docking native structures

        Global search                  Loose filtered global search          Local refinement (1.5 Å surface)
        Rank (N = 4000)
System  No elec   Elec    N ≤ 2.5 Å    N      Rank   N ≤ 2.5 Å    Cα RMS     N      Rank   N ≤ 2.5 Å    Cα RMS
CGI     87        45      19           174    3      19           1.8/1.7    161    3      19           1.0/1.0
CHO     118       59      15           238    3      13           1.6/1.6    218    40     13           0.8/0.9
KAI     33        25      30           534    25     30           1.4/1.3    502    38     28           0.4/0.4
PTC     232       118     6            515    24     6            0.8/0.8    513    91     7            0.7/0.8
SNI     104       40      15           62     2      15           0.6/0.7    54     8      15           0.6/0.7
SIC     NF        985     2            37     8      2            1.2/1.1    30     22     2            0.8/0.7
FDL     NF        1983    1            644    355    1            0.6/0.8    631    240    1            0.7/0.8
MLC     52        23      8            522    5      8            0.6/0.7    507    2      8            0.8/0.8
HFL     158       89      7            805    16     7            0.7/0.8    784    96     7            0.8/0.9
HFM     118       78      2            664    23     2            0.8/0.9    633    104    2            0.8/0.9

It has been suggested that enzyme/inhibitor and antibody/antigen complexes represent two different classes of binding (Lawrence & Colman, 1993). Enzymes and their inhibitors have co-evolved to form an interface with a high degree of surface complementarity. On the other hand, the immune system produces many different antibodies in response to an antigen, some of which bind their respective epitopes quite well while others bind quite poorly. So a particular antibody/antigen complex does not necessarily possess the best possible binding interface. Shape correlation may not be as important in antibody/antigen complexes (Lawrence & Colman, 1993). Our results for the bound forms do show slightly better surface complementarity for the enzyme/inhibitor complexes than for the antibody/antigen complexes. This may reflect the different classes of binding. Perhaps it is not a coincidence that the β-lactamase/inhibitor complex was predicted successfully by all research groups in the first docking challenge (Strynadka et al., 1996) whereas no groups could dock an antibody to haemagglutinin in the second docking challenge (Dunbrack et al., 1997).

Protein docking algorithms can provide insights into the kinetics and stability of recognition (e.g. see Janin, 1995b). Our procedure does not consider association at the atomic level and thus it is inappropriate to interpret the results in terms of individual atomic interactions. However, the global search followed by local refinement could mimic aspects of molecular recognition. Our treatment considers geometric and electrostatic complementarity at a lower resolution. This level of detail may well model a state in which the two proteins are close to their native conformation but the precise atomic interactions have not been achieved. Such a state might be similar to the molten globule phase during protein folding (Ptitsyn, 1995). This study shows that there are a very limited number of orientations by which an optimised binding interface can associate with an active site or an antigen epitope. Furthermore, the results show that to provide specificity the avoidance of unfavorable electrostatic interactions augments shape complementarity.

The docking procedure presented here successfully screens a very large number (~10^10) of possible binding geometries to a reasonable number (~50) of predicted complexes using the native structures of the proteins. For such a small number of candidates, it is possible to use more computationally demanding techniques to refine further the few remaining complexes to account for desolvation and, more importantly, conformational changes. Unless the interfacial side-chains are correctly positioned, true molecular recognition will not be achieved. Inclusion of a method for induced binding effects could lead to a general solution to the protein–protein docking problem.

Methods

Distance filtering

Distance filtering is implemented as a two-step process. First, a rapid check of intermolecular Cα distances between constraint residues is performed. For example, in the case of an antibody–antigen binding where the epitope is unknown, the Cα distances between residues in the hypervariable loops and all antigen residues would be checked. If a pair of Cα atoms is within a cutoff distance, in this case the sum of the side-chain lengths for the two residues (see legend to Table 2), the distances between all atoms of the two residues are checked. If any atom pair is within 4.5 Å then the distance constraint is satisfied. Predicted complexes that do not satisfy the distance constraint are discarded.
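The two-step distance check described above can be sketched as follows. The residue representation (a dictionary holding a Cα position, a list of atom positions and a side-chain length) and the function names are assumptions made for illustration; the actual side-chain lengths are those given in the legend to Table 2 and are not reproduced here.

```python
import math

CONTACT_CUTOFF = 4.5  # Å; any atom pair this close satisfies the constraint

def residues_in_contact(res_a, res_b):
    """Two-step test: rapid Cα pre-screen, then all-atom distance check."""
    # Step 1: Cα atoms must lie within the sum of the two side-chain lengths.
    ca_cutoff = res_a["side_chain_length"] + res_b["side_chain_length"]
    if math.dist(res_a["ca"], res_b["ca"]) > ca_cutoff:
        return False
    # Step 2: check every atom of one residue against every atom of the other.
    return any(
        math.dist(atom_a, atom_b) <= CONTACT_CUTOFF
        for atom_a in res_a["atoms"]
        for atom_b in res_b["atoms"]
    )

def satisfies_constraint(constraint_residues, partner_residues):
    """True if any constraint residue contacts any residue of the partner."""
    return any(
        residues_in_contact(a, b)
        for a in constraint_residues
        for b in partner_residues
    )

# Toy coordinates (Å) and a placeholder side-chain length of 4.0 Å.
loop_residue = {"ca": (0.0, 0.0, 0.0), "side_chain_length": 4.0,
                "atoms": [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]}
near_residue = {"ca": (7.0, 0.0, 0.0), "side_chain_length": 4.0,
                "atoms": [(7.0, 0.0, 0.0), (5.5, 0.0, 0.0)]}
far_residue = {"ca": (30.0, 0.0, 0.0), "side_chain_length": 4.0,
               "atoms": [(30.0, 0.0, 0.0)]}

print(satisfies_constraint([loop_residue], [near_residue]))
print(satisfies_constraint([loop_residue], [far_residue]))
```

The Cα pre-screen avoids the all-atom comparison for the great majority of residue pairs, which is why the check can be applied rapidly to several thousand predicted complexes.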

Global search and local refinement

A complete docking experiment consists of two distinct phases: global search and local refinement. It should be noted that high-resolution grids are used in both phases, unlike previous studies (Katchalski-Katzir et al., 1992; Vakser & Aflalo, 1994; Meyer et al., 1996) in which smaller, low-resolution grids were used during the global search. It is possible for us to use a high-resolution grid for both the initial search and the refinement due to the speed afforded by multiprocessing (see Technical information, below). The global search examines all translation (i.e. within the discrete space of the grid) and rotation space. This produces (128^3 × 6385) ~10^10 possible docking geometries. Obviously, geometries with zero or negative correlation scores can be excluded immediately. However, several hundred thousand possible docking geometries still remain. To reduce the number of possibilities to manageable levels, only three complexes from each translational scan are saved to the "stack": those with the highest surface correlation scores and favourable electrostatics. This leaves (3 × 6385 =) 19,155 possible complexes after the global search. The stack is sorted and the best 4000 geometries are kept. These complexes are filtered on the basis of available biochemical information and those that pass through the filter undergo local refinement. Each predicted geometry is shifted (±5 Å in each direction) and rotated (±5° for each Euler angle) slightly and the highest scoring complex is saved. Electrostatics are not calculated during local refinement for two reasons. First, it doubles the computational time required for the refinement stage of docking. Second, the docked geometry does not change significantly during refinement. Consequently, electrostatic interaction will not change significantly. Tests performed using electrostatic correlation during refinement showed little or no improvement in the final results (data not shown). At the end of the local refinement stage of docking, the stack is filtered again and the remaining complexes with the highest surface correlation scores are considered reasonable docking predictions.
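The bookkeeping of the global search can be checked with a short sketch. The arithmetic follows the figures quoted above; the tuple layout of a scan entry and the function names `best_of_scan` and `build_stack` are assumptions for illustration, and the selection rule (discard non-positive scores and unfavourable electrostatics, then keep the three highest surface scores) is one reading of the description rather than the package's actual code.

```python
import heapq

GRID = 128          # grid points per axis
ROTATIONS = 6385    # orientations sampled in the global search
PER_ROTATION = 3    # complexes saved from each translational scan
STACK_SIZE = 4000   # geometries kept after sorting the stack

total_geometries = GRID**3 * ROTATIONS        # of the order of 10^10
stack_before_sort = PER_ROTATION * ROTATIONS  # 19,155

def best_of_scan(scan, keep=PER_ROTATION):
    """Select the saved entries from one translational scan.

    scan: iterable of (surface_score, electrostatics_favourable, translation).
    """
    eligible = [s for s in scan if s[0] > 0 and s[1]]
    return heapq.nlargest(keep, eligible, key=lambda s: s[0])

def build_stack(scans, size=STACK_SIZE):
    """Merge the saved entries of all scans and keep the best `size`."""
    stack = [hit for scan in scans for hit in best_of_scan(scan)]
    stack.sort(key=lambda s: s[0], reverse=True)
    return stack[:size]

# Toy translational scan for a single orientation.
scan = [(5.0, True, (0, 0, 0)), (9.0, False, (1, 0, 0)), (7.0, True, (2, 0, 0)),
        (-1.0, True, (3, 0, 0)), (3.0, True, (4, 0, 0)), (1.0, True, (5, 0, 0))]
print(total_geometries, stack_before_sort)
print([s[0] for s in best_of_scan(scan)])
```

In the toy scan the highest-scoring entry is rejected because its electrostatics are unfavourable, and the three best remaining entries are saved.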

Calculating RMS deviation

The quality of docking predictions is assessed by calculating the RMS deviation in the Cα atoms after superposition (Kabsch, 1978) of the predicted complex onto the crystal complex. Both receptor and ligand Cα coordinates are used during the superposition. The peptide backbone and amino acid side-chains undergo conformational changes upon binding. However, induced binding affects the side-chains more drastically than the main chain. So to avoid masking otherwise good results, side-chain atoms are excluded when calculating RMS deviations. In addition to total Cα RMS values, the RMS of the interface Cα atoms is also determined. Residues with any atom within 10 Å of the opposite protein comprise the interface.
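The total and interface Cα RMS values can be sketched as below. This sketch assumes that the superposition has already been carried out, so the Kabsch step is not implemented, and for brevity it evaluates the deviation over the ligand residues only. The residue dictionaries, function names and coordinates are illustrative assumptions.

```python
import math

INTERFACE_CUTOFF = 10.0  # Å; any atom this close to the partner marks an interface residue

def rms_deviation(coords_a, coords_b):
    """RMS distance between equally ordered, already superposed coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))

def interface_residues(protein, partner, cutoff=INTERFACE_CUTOFF):
    """Indices of residues in `protein` with any atom within `cutoff` of `partner`."""
    partner_atoms = [atom for res in partner for atom in res["atoms"]]
    return [
        i for i, res in enumerate(protein)
        if any(math.dist(a, p) <= cutoff for a in res["atoms"] for p in partner_atoms)
    ]

# Toy crystal complex: one receptor residue and two ligand residues.
crystal_receptor = [{"ca": (-5.0, 0.0, 0.0), "atoms": [(-5.0, 0.0, 0.0)]}]
crystal_ligand = [{"ca": (0.0, 0.0, 0.0), "atoms": [(0.0, 0.0, 0.0)]},
                  {"ca": (20.0, 0.0, 0.0), "atoms": [(20.0, 0.0, 0.0)]}]
predicted_ligand_ca = [(0.0, 1.0, 0.0), (20.0, 3.0, 0.0)]

interface = interface_residues(crystal_ligand, crystal_receptor)
total_rms = rms_deviation([r["ca"] for r in crystal_ligand], predicted_ligand_ca)
interface_rms = rms_deviation([crystal_ligand[i]["ca"] for i in interface],
                              [predicted_ligand_ca[i] for i in interface])
print(interface, total_rms, interface_rms)
```

Here only the first ligand residue lies within 10 Å of the receptor, so the interface RMS reflects its 1 Å displacement alone, whereas the total RMS is raised by the larger error further from the interface. This is the pattern expected when the deviation is largely rotational.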

Test systems

The enzyme/inhibitor systems used in this study consist of the following crystal structures (with their PDB codes): (1) CGI, human pancreatic trypsin inhibitor (1hpt) (Hecht et al., 1992) bound to α-chymotrypsinogen (1chg) (Freer et al., 1970). The PDB code of the complex is 1cgi (Hecht et al., 1991). (2) CHO, ovomucoid (2ovo) (Bode et al., 1985) bound to α-chymotrypsin (5cha) (Blevins & Tulinsky, 1985). The PDB code of the complex is 1cho (Fujinaga et al., 1987). (3) KAI, bovine pancreatic trypsin inhibitor (1bpi) bound to kallikrein A (2pka) (Bode et al., 1983). The PDB code of the complex is 2kai (Chen & Bode, 1983). (4) PTC, bovine pancreatic trypsin inhibitor (4pti) (Wlodawer et al., 1987) bound to trypsin (2ptn) (Fehlhemmer et al., 1977). The PDB code of the complex is 2ptc (Marquart et al., 1983). (5) SNI, chymotrypsin inhibitor 2 (2ci2) (McPhalen & James, 1987) bound to subtilisin (1sup). The PDB code of the complex is 2sni (McPhalen & James, 1988). (6) SIC, Streptomyces subtilisin inhibitor (2ssi) (Mitsui et al., 1979) bound to subtilisin (1sup). The PDB code of the complex is 2sic (Takeuchi et al., 1991).

The antibody/antigen systems used in this study consisted of the following Fabs and Fvs bound to lysozyme (1lza) (Maenaka et al., 1995): (1) FDL, D1.3 Fab (1vfa), complex (1fdl) (Fischmann et al., 1991). (2) MLC, D44.1 Fv (1mlb), complex (1mlc) (Braden et al., 1994). (3) HFL, HyHel-5 Fab (2hfl) (Sheriff et al., 1987). (4) HFM, HyHel-10 Fab (3hfm) (Padlan et al., 1989). Unfortunately, native crystal structures for the antibody in 2hfl and 3hfm are not yet solved. So the bound forms of the Fab are used in HFL and HFM docking. Only the Fv regions of 1mlb, 2hfl, and 3hfm were used during docking.

Technical information

The complete docking package, named FTDOCK, consists of approximately 3,500 lines of Fortran 77 and Perl 5.0 (Wall & Schwartz, 1991) code designed to run under the UNIX operating system. All docking experiments were carried out on an SGI Power Challenge symmetric-array multiprocessor with 12 R10000 CPUs. Parallel compiler directives as well as the LIBFFT parallel maths library (J.-P. Panziera, SGI Paris, France) containing the necessary FFT routines are used to maximize computational efficiency. A complete docking experiment including post-filtering requires approximately six hours of CPU time using eight processors simultaneously. Preprocessor commands in the source code allow compilation on serial workstations.

Acknowledgements

The authors thank Matthew Betts for his help in identifying suitable test cases for docking. R. M. J. is a fellow of the Lloyd's of London Tercentenary Foundation. H. A. G. is a fellow of the European Molecular Biology Organization.

References

Bernstein, F. C., Koetzle, T. F., Williams, G. J. B., Meyer, E. F., Brice, M. D., Rodgers, J. R., Kennard, O., Shimanovichi, T. & Tasumi, M. (1977). The protein data bank: a computer-based archival file for macromolecular structures. J. Mol. Biol. 112, 535–542.

Blevins, R. A. & Tulinsky, A. (1985). The refinement and the structure of the dimer of alpha-chymotrypsin at 1.67-Å resolution. J. Biol. Chem. 260, 4264–4275.

Bode, W., Chen, Z., Bartels, K., Kutzbach, C., Schmidt-Kastner, G. & Bartunik, H. (1983). Refined 2 ångstrom X-ray crystal structure of porcine pancreatic kallikrein a and a specific trypsin-like proteinase, crystallization and structure determination and crystallographic refinement and structure and its comparison with bovine trypsin. J. Mol. Biol. 164, 237–282.


Bode, W., Epp, O., Huber, R., Laskowski, M., Jr & Ardelt, W. (1985). The crystal and molecular structure of the third domain of silver pheasant ovomucoid (OMSVP3). Eur. J. Biochem. 147, 387–395.

Bracewell, R. N. (1990). Numerical transforms. Science, 248, 697–704.

Braden, B. C., Souchon, H., Eisele, J.-L., Bentley, G. A., Bhat, T. N., Navaza, J. & Poljak, R. J. (1994). Three-dimensional structures of the free and the antigen-complexed Fab from monoclonal anti-lysozyme antibody D44.1. J. Mol. Biol. 243, 767–781.

Chen, Z. & Bode, W. (1983). Refined 2.5 Å X-ray crystal structure of the complex formed by porcine kallikrein a and the crystallization, Patterson search, structure comparison with its components and with the bovine trypsin-pancreatic trypsin inhibitor complex. J. Mol. Biol. 164, 283–311.

Cherfils, J., Duquerroy, S. & Janin, J. (1991). Protein protein recognition analyzed by docking simulation. Proteins, 11, 271–280.

DesJarlais, R. L., Sheridan, R. P., Seibel, G. L., Dixon, J. S., Kuntz, I. D. & Venkataraghavan, R. (1988). Using shape complementarity as an initial screen in designing ligands for a receptor binding site of known three dimensional structure. J. Med. Chem. 31, 722–729.

Dunbrack, R. L., Jr, Gerloff, D. L., Bower, M., Chen, X., Lichtarge, O. & Cohen, F. E. (1997). Meeting review: the second meeting on the critical assessment of techniques for protein structure prediction (CASP2), Asilomar, California, December 13–16, 1996. Folding & Design, 2, R27–R42.

Edmonds, D. T., Rogers, N. K. & Sternberg, M. J. E. (1984). Regular representation of irregular charge distributions: Application to the electrostatic potentials of globular proteins. Mol. Phys. 52, 1487–1494.

Fehlhemmer, H., Bode, W. & Huber, R. (1977). Crystal structure of bovine trypsinogen at 1.8 Å resolution. II crystallographic refinement, refined crystal structure and comparison with bovine trypsin. J. Mol. Biol. 111, 415–438.

Fischmann, T. O., Bentley, G. A., Bhat, T. N., Boulot, G., Mariuzza, R. A., Phillips, S. E. V., Tello, D. & Poljak, R. J. (1991). Crystallographic refinement of the three-dimensional structure of the D1.3-lysozyme complex at 2.5 Å resolution. J. Biol. Chem. 266, 12915–12920.

Freer, S. T., Kraut, J., Robertus, J. D., Wright, H. T. & Xuong, N. H. (1970). Chymotrypsinogen, 2.5 angstroms crystal structure, comparison with alpha-chymotrypsin, and implications for zymogen activation. Biochemistry, 9, 1997–2009.

Fujinaga, M., Sielecki, A. R., Read, R. J., Ardelt, W., Laskowski, M., Jr & James, M. N. G. (1987). Crystal and molecular structures of the complex of alpha-chymotrypsin with its inhibitor turkey ovomucoid third domain at 1.8 Å resolution. J. Mol. Biol. 195, 397–418.

Gilson, M. K. & Honig, B. (1987). Calculation of electrostatic potentials in an enzyme active site. Nature, 330, 84–86.

Gilson, M. K. & Honig, B. (1988). Calculation of the total electrostatic energy of a macromolecular system: solvation energies, binding energies, and conformational analysis. Proteins, 4, 7–18.

Hecht, H. J., Szardenings, M., Collins, J. & Schomburg, D. (1991). Three-dimensional structure of the complexes between bovine chymotrypsinogen A and two recombinant variants of human pancreatic secretory trypsin inhibitor (Kazal type). J. Mol. Biol. 220, 711–722.

Hecht, H. J., Szardenings, M., Collins, J. & Schomburg, D. (1992). Three-dimensional structure of a recombinant variant of human pancreatic secretory trypsin inhibitor (Kazal type). J. Mol. Biol. 225, 1095–1103.

Helmer-Citterich, M. & Tramontano, A. (1994). Puzzle: a new method for automated protein docking based on surface shape complementarity. J. Mol. Biol. 235, 1021–1031.

Hingerty, B. E., Ritchie, R. H., Ferrell, T. L. & Turner, J. E. (1985). Biopolymers, 24, 427.

Jackson, R. M. & Sternberg, M. J. E. (1995). A continuum model for protein/protein interactions: application to the docking problem. J. Mol. Biol. 250, 258–275.

Janin, J. (1995a). Protein–protein recognition. Prog. Biophys. Mol. Biol. 64, 145–166.

Janin, J. (1995b). Elusive affinities. Proteins, 21, 30–39.

Jiang, F. & Kim, S. (1991). Soft docking matching of molecular surface cubes. J. Mol. Biol. 219, 79–102.

Kabsch, W. (1978). A discussion of the solution for the best rotation to relate two sets of vectors. Acta Crystallog. sect. A, 34, 827–828.

Katchalski-Katzir, E., Shariv, I., Eisenstein, M., Friesen, A. A., Aflalo, C. & Wodak, S. J. (1992). Molecular surface recognition: Determination of geometric fit between proteins and their ligands by correlation techniques. Proc. Natl Acad. Sci. USA, 89, 2195–2199.

Kuntz, I. D., Blaney, J. M., Oatley, S. J., Langridge, R. & Ferrin, T. E. (1982). A geometric approach to macromolecule-ligand interactions. J. Mol. Biol. 161, 269–288.

Laskowski, R. A. (1995). Surfnet: A program for visualizing molecular surfaces, cavities, and intermolecular interactions. J. Mol. Graph. 13, 323–330.

Laskowski, R. A., Luscombe, N. M., Swindells, M. B. & Thornton, J. M. (1996). Protein clefts in molecular recognition and function. Protein Sci. 5, 2438–2452.

Lattman, E. E. (1972). Optimal sampling of the rotation function. In The Molecular Replacement Method (Rossman, M. G., ed.), pp. 179–185, Gordon and Breach, New York.

Lawrence, M. C. & Colman, P. M. (1993). Shape complementarity at protein/protein interfaces. J. Mol. Biol. 234, 946–950.

MacCullum, R. M., Martin, A. C. R. & Thornton, J. M. (1996). Protein–protein recognition. J. Mol. Biol. 262, 732–745.

Maenaka, K., Matsushima, M., Song, H., Sunada, F., Watanabe, K. & Kumagai, I. (1995). Dissection of protein–carbohydrate interactions in mutant hen egg-white lysozyme complexes and their hydrolytic activity. J. Mol. Biol. 247, 281–293.

Marquart, M., Walter, J., Deisenhofer, J., Bode, W. & Huber, R. (1983). The geometry of the reactive site and of the peptide groups in trypsin and trypsinogen and its complexes with inhibitors. Acta Crystallog. 39, 480.

McPhalen, C. A. & James, M. N. G. (1987). Crystal and molecular structure of the serine proteinase inhibitor CI-2 from barley seeds. Biochemistry, 26, 261–269.

McPhalen, C. A. & James, M. N. G. (1988). Structural comparison of two serine proteinase-protein inhibitor complexes, eglin-c-subtilisin Carlsberg and CI-2-subtilisin novo. Biochemistry, 27, 6582–6589.

Meyer, M., Wilson, P. & D., S. (1996). Hydrogen bonding and molecular surface shape complementarity as a basis for protein docking. J. Mol. Biol. 264, 199–210.

Mitsui, Y., Satow, Y., Watanabe, Y. & Iitaka, Y. (1979). Crystal structure of a bacterial protein proteinase inhibitor (Streptomyces subtilisin inhibitor) at 2.6 Å resolution. J. Mol. Biol. 131, 697–724.

Nicholls, A. & Honig, B. (1991). A rapid finite difference algorithm, utilizing successive over-relaxation to solve the Poisson-Boltzmann equation. J. Comput. Chem. 12, 435–445.

Norel, R., Lin, S. L., Wolfson, H. L. & Nussinov, R. (1995). Molecular surface complementarity at protein–protein interfaces: the critical role played by surface normals at well placed sparse points in docking. J. Mol. Biol. 252, 263–273.

Padlan, E. A., Silverton, E. W., Sheriff, S., Cohen, G. H., Smith-Gill, S. J. & Davies, D. R. (1989). Structure of an antibody-antigen complex: crystal structure of the HyHEL-10 Fab-lysozyme complex. Proc. Natl Acad. Sci. USA, 86, 5938–5942.

Press, W. H., Teukolsky, S. A., Vetterling, W. T. & Flannery, B. P. (1986). Numerical Recipes in Fortran, Cambridge University Press, Cambridge.

Ptitsyn, O. B. (1995). Structures of folding intermediates. Curr. Opin. Struct. Biol. 5, 74–78.

Rogers, N. K. & Sternberg, M. J. E. (1984). Electrostatic interactions in globular proteins: Different dielectric models applied to the packing of α-helices. J. Mol. Biol. 174, 527–542.

Sharp, K. & Honig, B. (1990). Electrostatic interactions in macromolecules: theory and applications. Annu. Rev. Biophys. Biophys. Chem. 19, 301–332.

Sheriff, S., Silverton, E. W., Padlan, E. A., Cohen, G. H., Smith-Gill, S. J. & Davies, D. R. (1987). Three-dimensional structure of an antibody-antigen complex. Proc. Natl Acad. Sci. USA, 84, 8075–8079.

Shoichet, B. K. & Kuntz, I. D. (1991). Protein docking and complementarity. J. Mol. Biol. 221, 327–346.

Shoichet, B. K. & Kuntz, I. D. (1996). Predicting the structure of protein complexes: a step in the right direction. Chem. Biol. 3, 151–156.

Strynadka, N. C. J., Eisenstein, M., Katchalski-Katzir, E., Shoichet, B. K., Kuntz, I. D., Abagyan, R., Totrov, M., Janin, J., Cherfils, J., Zimmerman, F., Olson, A., Duncan, B., Rao, M., Jackson, R., Sternberg, M. J. E. & James, M. N. G. (1996). Molecular docking programs successfully predict the binding of a β-lactamase inhibitory protein to TEM-1 β-lactamase. Nature Struct. Biol. 3, 233–239.

Takechi, Y., Satow, Y., Nakamura, K. T. & Mitsui, Y. (1991). Refined crystal structure of the complex of subtilisin BPN' and streptomyces subtilisin inhibitor at 1.8 Å resolution. J. Mol. Biol. 221, 309–325.

Totrov, M. & Abagyan, R. (1994). Detailed ab initio prediction of lysozyme-antibody complex with 1.6 Å accuracy. Nature Struct. Biol. 1, 259–263.

Vakser, I. A. & Aflao, C. (1994). Hydrophobic docking: a proposed enhancement to molecular recognition techniques. Proteins, 20, 320–329.

Wall, L. & Schwartz, R. L. (1991). Programming Perl, O'Reilly & Associates, Inc., Sebastopol, CA.

Walls, P. H. & Sternberg, M. J. E. (1992). New algorithm to model protein–protein recognition based on surface complementarity. J. Mol. Biol. 228, 277–297.

Warwicker, J. & Watson, H. C. (1982). Calculation of the electric potential in the active site cleft due to α-helix dipoles. J. Mol. Biol. 157, 671–679.

Weng, Z. P., Vajda, S. & Delisi, C. (1996). Prediction of protein complexes using empirical free-energy functions. Protein Sci. 5, 614–626.

Wlodawer, A., Deisenhofer, J. & Huber, R. (1987). Comparison of two highly refined structures of bovine pancreatic trypsin inhibitor. J. Mol. Biol. 193, 145–156.

Edited by J. Thornton

(Received 3 March 1997; received in revised form 21 March 1997; accepted 7 June 1997)

Note added in proof: For availability of the program FTDOCK please see our home page (http://www.icnet.uk/bmm) or contact M. J. E. S. at [email protected].
