

Drug Information Journal, Vol. 28, pp. 735-749, 1994. Printed in the USA. All rights reserved.

0092-8615/94 Copyright © 1994 Drug Information Association Inc.

MOLECULAR DOCKING: A TOOL FOR LIGAND DISCOVERY AND DESIGN

ELAINE C. MENG, BS PHARM
Graduate Fellow

IRWIN D. KUNTZ, PHD
Professor

Department of Pharmaceutical Chemistry, School of Pharmacy, University of California, San Francisco, California

The ability to propose reasonable ligand-receptor binding geometries is crucial to the success of structure-based drug design. One approach is to “dock” molecules together in many ways and then “score” or evaluate each orientation; in a database of compounds, those which score well should be more likely to bind to the target macromolecule. A method that combines a rapid, geometric docking algorithm with the evaluation of molecular mechanics interaction energies is presented. The “force field score” is successful in identifying the experimental binding mode in four systems, and represents an improvement over the other scoring methods tested. The degree of orientational sampling required to reproduce and identify the known geometries, with and without energy minimization, is also investigated. Both scoring and sampling issues are of paramount importance to the usefulness of molecular docking in real-life applications.

Key Words: Automated; Docking; Interaction; Binding; Complementarity

INTRODUCTION

A REVOLUTION IS occurring, in which disease is being studied and understood at an increasingly basic level. Some conditions are known to result from a single genetic defect; more frequently, pathological processes but not etiologies have been described in molecular terms.

Recombinant DNA technology has fostered many discoveries in this area and, in addition, can yield the quantities of protein needed for structure determination.

Reprint address: Irwin D. Kuntz, Ph.D., Professor, Department of Pharmaceutical Chemistry, School of Pharmacy, University of California, San Francisco, CA 94143-0446.

The database of known protein structures has increased rapidly in recent years. Although most disorders cannot be traced to a single molecular defect, there are generally several points at which a drug may act to alter the course or symptoms of a disease. The targets for action are often proteins, and when the structure of a target protein is known, one can theoretically find or devise a compound that will bind to it and affect its biochemical activity.

Molecular docking may be useful as a screening procedure to find compounds that will bind to a site of known structure. Several computational approaches to molecular docking exist. In some, the user is responsible for positioning the molecules (1-3); in others, the molecules are automatically positioned by algorithms that vary from exhaustive to stochastic to deterministic (4-11). Automated methods are often preferable to manual docking, as they are less dependent upon the preconceptions of the user and can be used to search through databases much too large for human perusal.

During the evaluation of binding orientations, a balance between thermodynamic accuracy and computational tractability is desirable. Time efficiency can be enhanced by precomputing terms in the scoring function that are sums over receptor atoms. In the current work, as in the program GRID (12) and several of the docking methods (1-3,10), these sums are evaluated at each point on a three-dimensional grid. Parameters that approximate the AMBER force field (13,14) are used to score ligand orientations from the rigid-body docking procedure of Kuntz and coworkers, DOCK (5,15-17). Encouraging results are obtained in reproducing the crystal structures of four complexes.

COMPUTATIONAL METHODS

The algorithm can be divided into two phases. In the geometric phase, ligand atoms are matched to points that describe the receptor site to produce orientations. In the scoring phase, precalculated grids are used to assign scores per ligand atom. The phases are distinct, in that scores do not drive positioning.

Geometric Phase

The receptor site is described by a collection of spheres (5). Each sphere is constructed to touch the molecular surface (18) at two points and is centered along the surface normal at one of the points. Only one sphere per surface atom, the largest that does not intersect the surface, is generally retained; groups of overlapping spheres are referred to as clusters. Usually, the largest cluster occupies the site of interest. One or more clusters can be selected for docking.
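As a rough illustration of the sphere construction, the sketch below computes the sphere that is tangent to the surface at a point `p_i` (centered along the outward unit normal `n_i`) and also passes through a second surface point `p_j`. Requiring both points to lie at distance r from the center gives r = |d|² / (2 n·d), with d = p_j − p_i. The function name and the tuple-based vectors are illustrative assumptions, not part of the original sphere-generation program.

```python
import math

def sphere_through(p_i, n_i, p_j):
    """Return (center, radius) of the sphere tangent at p_i along the unit
    normal n_i that also touches p_j, or None if no such sphere exists on
    the normal's side of the tangent plane."""
    d = [b - a for a, b in zip(p_i, p_j)]
    d_dot_n = sum(x * y for x, y in zip(d, n_i))
    if d_dot_n <= 0.0:
        return None  # p_j lies in or below the tangent plane at p_i
    radius = sum(x * x for x in d) / (2.0 * d_dot_n)
    center = tuple(p + radius * n for p, n in zip(p_i, n_i))
    return center, radius

# Hypothetical surface points, chosen only to make the arithmetic easy to follow.
center, radius = sphere_through((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (2.0, 0.0, 2.0))
```

Selecting, for each surface atom, the largest such sphere that does not intersect the surface would require a surface representation and is not attempted here.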

Orientations are generated by finding sets of ligand atoms that match sets of sphere centers, then performing a least-squares superimposition (19). Sets are considered to match if their pairwise internal distances correspond, within some tolerance. The distances are presorted into “bins” of controllable width and overlap (16,17) such that the thoroughness of orientational sampling can easily be varied.
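The matching criterion can be sketched as a brute-force search, assuming small inputs. This is not the DOCK implementation: the presorted distance bins are replaced by exhaustive enumeration, the least-squares superimposition step is omitted, and all names and coordinates are hypothetical.

```python
import itertools
import math

def internal_distances(points):
    """Pairwise distances, in a fixed order, for a sequence of 3D points."""
    return [math.dist(a, b) for a, b in itertools.combinations(points, 2)]

def matching_sets(ligand_atoms, sphere_centers, size=3, tolerance=0.5):
    """Yield (ligand indices, sphere indices) whose internal distances
    agree pair by pair within `tolerance` (angstroms)."""
    for lig_idx in itertools.combinations(range(len(ligand_atoms)), size):
        lig_d = internal_distances([ligand_atoms[i] for i in lig_idx])
        # Ordered selections of sphere centers: atom k is paired with
        # center k, which matters for any later superimposition.
        for sph_idx in itertools.permutations(range(len(sphere_centers)), size):
            sph_d = internal_distances([sphere_centers[j] for j in sph_idx])
            if all(abs(a - b) <= tolerance for a, b in zip(lig_d, sph_d)):
                yield lig_idx, sph_idx

# A 3-4-5 triangle of ligand atoms and four sphere centers, three of which
# form the same triangle in a different position and order.
ligand = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
spheres = [(10.0, 10.0, 14.0), (10.0, 10.0, 10.0), (13.0, 10.0, 10.0), (30.0, 30.0, 30.0)]
matches = list(matching_sets(ligand, spheres, size=3, tolerance=0.1))
```

Widening `tolerance` admits more matches and therefore more orientations, which is the same trade-off the bin width and overlap control in the text.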

Scoring Phase

Three types of score are tested, each using grids calculated before the docking stage: a contact score, an electrostatic interaction energy, and a molecular mechanics interaction energy. The program DISTMAP (16,17) produces the grid for contact scoring. For each grid point, the number of receptor atoms within an acceptable distance range is stored; if one or more receptor atoms is too close, a large negative number is stored instead. Hydrogens are ignored. Each ligand atom receives the score of the nearest grid point, and the total contact score is the sum over the ligand atoms.
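A minimal sketch of this kind of contact grid follows. It is not DISTMAP: it uses a single close-contact limit rather than separate polar and nonpolar limits, treats every receptor atom as a heavy atom, and the penalty value and function names are hypothetical.

```python
import math

BUMP_PENALTY = -1000  # hypothetical stand-in for the "large negative number"

def contact_value(point, receptor_atoms, close_limit=2.3, good_cutoff=4.5):
    """Number of receptor atoms within good_cutoff of point, or the
    penalty if any receptor atom is closer than close_limit."""
    count = 0
    for atom in receptor_atoms:
        d = math.dist(point, atom)
        if d < close_limit:
            return BUMP_PENALTY
        if d <= good_cutoff:
            count += 1
    return count

def build_contact_grid(origin, spacing, shape, receptor_atoms):
    """Precompute contact values at every grid point, keyed by (i, j, k)."""
    grid = {}
    for i in range(shape[0]):
        for j in range(shape[1]):
            for k in range(shape[2]):
                point = tuple(o + n * spacing for o, n in zip(origin, (i, j, k)))
                grid[(i, j, k)] = contact_value(point, receptor_atoms)
    return grid

def contact_score(ligand_atoms, grid, origin, spacing):
    """Sum over ligand atoms of the value stored at the nearest grid point."""
    total = 0
    for atom in ligand_atoms:
        key = tuple(round((c - o) / spacing) for c, o in zip(atom, origin))
        total += grid[key]
    return total

# Two made-up receptor atoms on the x axis and a one-dimensional strip of grid points.
receptor = [(0.0, 0.0, 0.0), (6.0, 0.0, 0.0)]
origin, spacing = (0.0, 0.0, 0.0), 1.0
grid = build_contact_grid(origin, spacing, (7, 1, 1), receptor)
good = contact_score([(3.2, 0.0, 0.0), (2.9, 0.0, 0.0)], grid, origin, spacing)
bad = contact_score([(1.0, 0.0, 0.0)], grid, origin, spacing)
```

Because the receptor sums are done once when the grid is built, scoring an orientation costs one lookup per ligand atom.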

The electrostatic interaction energy is based on potentials from the Delphi program (20,21). Delphi solves the Poisson-Boltzmann equation for a system of point charges embedded in a region of low dielectric (the receptor molecule) surrounded by a region of high dielectric (the solvent). In the current implementation, it is assumed that a suitable potential can be calculated using the receptor alone; it is not recalculated in the presence of the ligand. Running the Delphi program for each ligand-receptor geometry is presently not feasible, as thousands of orientations must be evaluated. The electrostatic potential at each ligand atom is obtained by interpolation and multiplied by the atom’s point charge to give the interaction energy; summing over the ligand atoms yields the total “Delphi score.”
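The interpolate-then-multiply step might look like the sketch below. The text does not say which interpolation scheme is used, so trilinear interpolation is an assumption here, as are the nested-list grid layout, the function names, and the toy potential; unit conversion is left out.

```python
def trilinear(grid, origin, spacing, point):
    """Trilinearly interpolate grid[i][j][k] at `point`, which is assumed
    to lie inside the grid (not on its upper faces)."""
    fx, fy, fz = ((p - o) / spacing for p, o in zip(point, origin))
    i, j, k = int(fx), int(fy), int(fz)      # lower corner of the enclosing cell
    tx, ty, tz = fx - i, fy - j, fz - k      # fractional position within the cell
    value = 0.0
    for di, wx in ((0, 1.0 - tx), (1, tx)):
        for dj, wy in ((0, 1.0 - ty), (1, ty)):
            for dk, wz in ((0, 1.0 - tz), (1, tz)):
                value += wx * wy * wz * grid[i + di][j + dj][k + dk]
    return value

def electrostatic_score(atoms, charges, grid, origin, spacing):
    """Sum of (point charge x interpolated potential) over ligand atoms."""
    return sum(q * trilinear(grid, origin, spacing, xyz)
               for xyz, q in zip(atoms, charges))

# Toy potential that rises linearly along x: phi = i at grid index i.
potential = [[[float(i) for _k in range(2)] for _j in range(2)] for i in range(2)]
origin, spacing = (0.0, 0.0, 0.0), 1.0
phi = trilinear(potential, origin, spacing, (0.25, 0.5, 0.5))
score = electrostatic_score([(0.25, 0.5, 0.5)], [-2.0], potential, origin, spacing)
```

Trilinear interpolation reproduces a linear potential exactly, which makes the toy case easy to check by hand.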

The program CHEMGRID produces the grid values for force field scoring. Force field scores are approximate molecular mechanics interaction energies, consisting of van der Waals (VDW) and electrostatic components:

$$E = \sum_{i=1}^{\mathrm{lig}} \sum_{j=1}^{\mathrm{rec}} \left( \frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}} + 332.0\,\frac{q_i q_j}{D\,r_{ij}} \right) \tag{1}$$

where each term is a double sum over ligand atoms i and receptor atoms j, $A_{ij}$ and $B_{ij}$ are VDW repulsion and attraction parameters, $r_{ij}$ is the distance between atoms i and j, $q_i$ and $q_j$ are the point charges on atoms i and j, D is the dielectric function, and 332.0 is a factor that converts the electrostatic energy into kcal/mol. Equation (1) contains all of the intermolecular terms in the AMBER potential function (13) except for the 10-12 “hydrogen bond” contribution. Hydrogen bond energies are largely accounted for in the electrostatic term; the 10-12 function exists mainly to fine-tune hydrogen bond geometries (13).

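A geometric mean approximation is used for the VDW parameters (2,22):

$$A_{ij} = \sqrt{A_{ii}}\,\sqrt{A_{jj}} \quad \text{and} \quad B_{ij} = \sqrt{B_{ii}}\,\sqrt{B_{jj}} \tag{2}$$

This allows the sums over receptor atoms in equation (1) to be separated out:

$$E = \sum_{i=1}^{\mathrm{lig}} \left( \sqrt{A_{ii}} \sum_{j=1}^{\mathrm{rec}} \frac{\sqrt{A_{jj}}}{r_{ij}^{12}} - \sqrt{B_{ii}} \sum_{j=1}^{\mathrm{rec}} \frac{\sqrt{B_{jj}}}{r_{ij}^{6}} + 332.0\,q_i \sum_{j=1}^{\mathrm{rec}} \frac{q_j}{D\,r_{ij}} \right) \tag{3}$$

Three values are stored for every grid point k, each a sum over receptor atoms within a user-defined cut-off distance of the point:

$$\mathrm{aval} = \sum_{j=1}^{\mathrm{rec}} \frac{\sqrt{A_{jj}}}{r_{jk}^{12}}, \qquad \mathrm{bval} = \sum_{j=1}^{\mathrm{rec}} \frac{\sqrt{B_{jj}}}{r_{jk}^{6}}, \qquad \mathrm{esval} = 332.0 \sum_{j=1}^{\mathrm{rec}} \frac{q_j}{D\,r_{jk}} \tag{4}$$

These values, with or without interpolation, are subsequently multiplied by the appropriate ligand values to give the interaction energy:

$$E = \sum_{i=1}^{\mathrm{lig}} \left( \sqrt{A_{ii}}\,(\mathrm{aval}) - \sqrt{B_{ii}}\,(\mathrm{bval}) + q_i\,(\mathrm{esval}) \right) \tag{5}$$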
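The effect of the geometric mean approximation can be checked numerically. The sketch below compares the direct double sum of equation (1) with the separated form of equations (4) and (5). It assumes a distance-dependent dielectric D = 4r, applies no cut-off, and evaluates the receptor sums exactly at each ligand atom rather than at grid points. The parameter values are invented for illustration and are not AMBER parameters.

```python
import math

def dielectric(r, scale=4.0):
    """Distance-dependent dielectric function, D = scale * r."""
    return scale * r

def direct_energy(ligand, receptor):
    """Equation (1) with the geometric-mean parameters of equation (2).
    Each atom is (xyz, A_ii, B_ii, q)."""
    total = 0.0
    for xyz_i, a_i, b_i, q_i in ligand:
        for xyz_j, a_j, b_j, q_j in receptor:
            r = math.dist(xyz_i, xyz_j)
            total += (math.sqrt(a_i * a_j) / r**12
                      - math.sqrt(b_i * b_j) / r**6
                      + 332.0 * q_i * q_j / (dielectric(r) * r))
    return total

def receptor_sums(point, receptor):
    """aval, bval, esval of equation (4), evaluated at one point."""
    aval = bval = esval = 0.0
    for xyz_j, a_j, b_j, q_j in receptor:
        r = math.dist(point, xyz_j)
        aval += math.sqrt(a_j) / r**12
        bval += math.sqrt(b_j) / r**6
        esval += 332.0 * q_j / (dielectric(r) * r)
    return aval, bval, esval

def separated_energy(ligand, receptor):
    """Equation (5): ligand values multiplied by the receptor-only sums."""
    total = 0.0
    for xyz_i, a_i, b_i, q_i in ligand:
        aval, bval, esval = receptor_sums(xyz_i, receptor)
        total += math.sqrt(a_i) * aval - math.sqrt(b_i) * bval + q_i * esval
    return total

# Made-up atoms: (coordinates, A_ii, B_ii, charge). Not AMBER values.
receptor = [((0.0, 0.0, 0.0), 9.0e5, 6.0e2, -0.5), ((4.0, 0.0, 0.0), 4.0e5, 5.0e2, 0.3)]
ligand = [((2.0, 3.0, 0.0), 6.0e5, 7.0e2, 0.4), ((2.0, 3.5, 1.0), 1.0e4, 2.0e1, -0.2)]
e_direct = direct_energy(ligand, receptor)
e_separated = separated_energy(ligand, receptor)
```

In a grid-based scheme, `receptor_sums` would be called once per grid point before docking, so that scoring an orientation needs only lookups and three multiplications per ligand atom.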

Input to CHEMGRID includes the grid resolution, location, and dimensions; the form of the dielectric function (constant or distance-dependent); a scaling factor for the dielectric function; and the cut-off distance for interactions. The present work uses AMBER united-atom parameters (13) for the receptors and AMBER all-atom VDW parameters (14) for the ligands, except that hydrogens bonded to electronegative atoms are considered volumeless. One can easily use other parameter sets, however, without changing the code. Because the grid values are stored in one-dimensional arrays, any combination of spacing and x, y, and z extents may be used as long as the total number of points does not exceed the array size ($10^6$ in the current work).
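The one-dimensional storage constraint can be illustrated as follows. The index ordering (x slowest, z fastest) and the way points per axis are counted are assumptions made for this sketch, not details taken from CHEMGRID.

```python
def flat_index(i, j, k, ny, nz):
    """Position of grid point (i, j, k) in a one-dimensional array,
    assuming x varies slowest and z fastest."""
    return (i * ny + j) * nz + k

def grid_fits(extents, spacing, max_points=10**6):
    """Points per axis, total points, and whether the grid fits the array.
    Points per axis = nearest whole number of intervals, plus one."""
    counts = [int(round(e / spacing)) + 1 for e in extents]
    total = counts[0] * counts[1] * counts[2]
    return counts, total, total <= max_points

# A 20-angstrom cube at 0.3-angstrom spacing fits; a 30-angstrom cube does not.
small = grid_fits((20.0, 20.0, 20.0), 0.3)
large = grid_fits((30.0, 30.0, 30.0), 0.3)
```

Under these assumptions, a larger box can still be accommodated by coarsening the spacing, since only the product of the three point counts is limited.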

Test Systems and Run Parameters

Four crystallographic complexes with resolution 2.0 angstroms or lower were chosen from the Brookhaven Protein Data Bank (23) (Figure 1): 4dfr (25) (dihydrofolate reductase and methotrexate), 6rsa (26) (ribonuclease A and uridine vanadate), 2gbp (27) (periplasmic binding protein and glucose), and 3cpa (carboxypeptidase A and glycyltyrosine; structure determined by W. N. Lipscomb). Different aspects of complementarity are evident in these systems, including salt bridge formation, hydrogen bonding, and hydrophobic interactions.

FIGURE 1. Test systems (panels 4dfr, 6rsa, 2gbp, and 3cpa): Cα representations of the proteins, shown with ligands, sphere centers used for docking (triangles), and boxes outlining the force field grids. Pictures generated with UCSF MidasPlus (24).

In each case, crystallographic waters and ions were removed; the ligand and receptor were separated and hydrogens were added as necessary, in standard geometries. A partial molecular surface was calculated, excluding the portion of the molecule farthest from the site of interest, and used in sphere calculation. The largest of the resulting sphere clusters was selected for docking. In the DISTMAP (contact grid) calculation, the polar and nonpolar close contact limits were 2.3 and 2.8 angstroms, respectively, the cut-off for a “good” contact was 4.5 angstroms, and the grid spacing was three points per angstrom. Delphi runs used the entire receptor with AMBER united-atom partial charges (13) and three-step focusing (21), in which the protein occupied 20, 60, and then 90% of the electrostatic potential grid. Internal and external dielectric constants were 4 and 80, respectively, the ionic strength was 0.145 M, the ion exclusion radius was 2.0 angstroms, and the probe radius was 1.4 angstroms. Force field grids were calculated in CHEMGRID, using the entire receptor, 0.3-angstrom spacing, a 10.0-angstrom cut-off, and D = 4r.
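With the distance-dependent dielectric D = 4r, the electrostatic term of equation (1) reduces to an inverse-square form. The worked example below assumes two unit charges of opposite sign separated by 3.0 angstroms, values chosen purely for illustration.

$$332.0\,\frac{q_i q_j}{D\,r_{ij}} = 332.0\,\frac{q_i q_j}{4 r_{ij} \cdot r_{ij}} = 83.0\,\frac{q_i q_j}{r_{ij}^{2}}$$

$$q_i = +1,\; q_j = -1,\; r_{ij} = 3.0\ \text{angstroms}: \quad 83.0 \times \frac{(+1)(-1)}{(3.0)^{2}} \approx -9.2\ \text{kcal/mol}$$

The interaction therefore falls off faster with distance than it would with a constant dielectric, where the same term decays as $1/r_{ij}$.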

For the initial, high-sampling DOCK runs, distance-matching parameters were chosen to yield several thousand orientations that were not in violation of the close contact limits set in DISTMAP. Subsequently, runs were performed for each system at intermediate and low sampling levels. Force field scoring was performed with interpolation of grid values, and the root-mean-square deviation (RMSD) of each orientation from the crystallographic position was calculated.
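The RMSD used to judge each orientation can be sketched as below. The sketch assumes that both coordinate sets list the same atoms in the same order and are already in the receptor frame, so no superposition is applied; symmetry-equivalent atoms are not treated specially. The function name and coordinates are illustrative.

```python
import math

def rmsd(docked, crystal):
    """Root-mean-square deviation between two equally ordered sets of
    atomic coordinates, without any fitting."""
    if len(docked) != len(crystal) or not docked:
        raise ValueError("coordinate sets must be non-empty and the same length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(docked, crystal))
    return math.sqrt(squared / len(docked))

# A two-atom ligand displaced by 1.0 angstrom along z.
crystal = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
docked = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
deviation = rmsd(docked, crystal)
```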

All 12 sets of orientations (three sampling levels in each of four systems) were subjected to rigid-body minimization with a version of Blaney’s program RGDMIN (28), modified by Daniel Gschwend to use the force field scoring function.

RESULTS

All predocking calculations were performed on Silicon Graphics IRIS 4D/25 workstations. Molecular surface generation took one to two minutes, sphere generation 3-72 minutes, contact grid calculation with DISTMAP one to two minutes, electrostatic potential calculation with Delphi four to five minutes, and force field grid calculation with CHEMGRID 17-24 minutes of central processing unit (CPU) time, depending on the system. The DOCK run times and minimization times on a Silicon Graphics IRIS 4D/35 workstation are listed in Table 1.

Results of the high-sampling DOCK runs are shown in Figures 2-5. The RMSD of the ligand from the crystallographically determined position is plotted versus contact score and force field score. Results of the intermediate- and low-sampling runs are discussed but not shown. In viewing the plots, it should be noted that there is no reason to expect a simple correlation between RMSD and score, since the most favorable alternative sites are not necessarily those closest to the crystallographically determined binding site.

Dihydrofolate Reductase

N1-protonated 2,4-diamino-6-methylpteridine, the rigid part of methotrexate, was docked to dihydrofolate reductase. Use of the entire methotrexate molecule proved to be too restrictive; relatively few orientations were generated.

STO-3G-derived partial atomic charges (29) were used for the ligand, 86 spheres described the binding site, and 2,617 orientations were written out. The best (highest) contact score corresponds to a low RMSD (Figure 2). Other orientations that receive high contact scores include 2.8-angstrom structures, which are barrel-rolled and angled slightly relative to the crystallographic orientation; 4.0-angstrom structures, which are angled approximately 90°; and 4.8-angstrom structures, which are flipped end-to-end. While these structures are also favorable according to the force field score (Figure 2) and the Delphi score (not shown), members of the lowest-RMSD family receive the best scores.

The results were not significantly changed by the use of a coarser grid with 0.5-angstrom spacing, an “infinite” cut-off distance for interactions, or different charge sets for the ligand, generated with the Gasteiger-Marsili (30) and Gasteiger-Hückel (30,31) options in SYBYL 5.4 (32). The lowest-RMSD family of orientations can be identified by force field score at all three levels of sampling, both before and after rigid-body minimization.

Ribonuclease A

Uridine 3′-phosphate with AMBER all-atom charges (14) was docked to ribonuclease A. It was constructed from the crystallographic ligand, uridine vanadate, by changing atom types as necessary and optimizing the phosphate geometry with the Tripos force field (32).

The cluster for docking contained 47 spheres and 3,738 orientations were written out. Six of the eight best contact scores, including the highest of all, correspond to the lowest-RMSD family of orientations (Figure 3). The force field score (Figure 3) and the Delphi score (not shown) are also able to distinguish the lowest-RMSD dockings from other orientations. The highest-ranking alternative structures have RMSDs of 4.0-6.0 angstroms and place the phosphate essentially correctly, with the rest of the molecule angled 60-90° relative to the crystallographic orientation. The 10.0-13.0-angstrom structures with favorable force field scores are approximately related to the known orientation by a plane of reflection; the true and image phosphates face each other through the nitrogen of a nearby lysine side chain. Such results are evidence of the weight placed upon charge-charge interactions by the scoring function, especially in this test case where the ligand bears a net charge of -2. Results were virtually unchanged using Gasteiger-Marsili and Gasteiger-Hückel charge sets for the ligand.

TABLE 1
DOCK Orientational Sampling and Run Timesᵃ; Minimization Timesᵃ

Run          # Foundᵇ    # Writtenᶜ    DOCK timeᵈ (s)    Minimization timeᵉ (hr:min)
4dfr-high      67,234         2,406               103    10:39
4dfr-med       17,354           869                43     3:51
4dfr-low        7,158           337                30     1:30
6rsa-high      77,184         3,492               146     9:26
6rsa-med        4,785           266                34     0:39
6rsa-low        1,693           105                28     0:15
2gbp-high     196,752         1,680               178     6:37
2gbp-med       21,037           849                60     2:25
2gbp-low        7,773           389                48     1:04
3cpa-high     794,541         2,684               766    23:24
3cpa-med        7,946           518                33     3:00
3cpa-low        4,053           301                25     1:42

ᵃ CPU time on a Silicon Graphics IRIS 4D/35 workstation
ᵇ Total number of orientations generated
ᶜ Number of orientations written out, controlled by score cut-offs
ᵈ Both contact scoring and force field scoring performed
ᵉ Only the orientations written out were minimized

The lowest-RMSD family of orientations can be identified by force field score at all three levels of sampling, both before and after minimization. The separation in score from other orientations, however, is especially small in the intermediate sampling set before minimization and in the low sampling set after minimization. In the case of low sampling, minimization actually decreases the separation in score between representatives of the “correct” binding mode and structures with 5.0-angstrom RMSDs.

Periplasmic Binding Protein

The complex of periplasmic binding protein and glucose was expected to present a challenge to the scoring functions. Since glucose bears no net charge and is roughly an oblate ellipsoid, neither charge nor shape will strongly differentiate among the various orientations possible. In addition, periplasmic binding protein has a high affinity for the α- and β-anomers of D-glucose and D-galactose; apparently, each of the four isomers can participate in 13 hydrogen bonds with the receptor (27).

β-D-glucose with Gasteiger-Marsili charges was docked, 75 spheres were in the cluster of interest, and 2,265 orientations were written out. Three obvious clusters of RMSD values resulted (Figure 4). The lowest RMSDs correspond to structures resembling the crystallographic orientation, the 3.0-5.0-angstrom RMSDs correspond to orientations that overlap the crystal structure ligand but are flipped or rotated, and the high RMSDs correspond to structures located in either end of the tunnel that traverses the protein (Figure 1). There are constrictions that prevent orientations from being distributed throughout the tunnel. This may pose a problem for methods in which the protein conformation is held constant and the ligand is moved through a representation of real space, since a large energy barrier must be surmounted to reproduce the known geometry of the complex. In reality, of course, protein mobility allows passage of the ligand into the binding site.

FIGURE 2. 4dfr test case, using STO-3G charges for the ligand: RMSDs versus contact scores (top) and force field scores under 0.0 kcal/mol (bottom). [Plot panels: “4dfr: RMSD vs. contact score”; “4dfr: RMSD vs. force field score”.]

FIGURE 3. 6rsa test case, using AMBER charges for the ligand: RMSDs versus contact scores (top) and force field scores under 0.0 kcal/mol (bottom). [Plot panels: “6rsa: RMSD vs. contact score”; “6rsa: RMSD vs. force field score”.]

FIGURE 4. 2gbp test case, using Gasteiger-Marsili charges for the ligand: RMSDs versus contact scores (top) and force field scores under 0.0 kcal/mol (bottom). [Plot panels: “2gbp: RMSD vs. contact score”; “2gbp: RMSD vs. force field score”.]

FIGURE 5. 3cpa test case, using AMBER charges for the ligand: RMSDs versus contact scores (top) and force field scores under 0.0 kcal/mol (bottom). [Plot panels: “3cpa: RMSD vs. contact score”; “3cpa: RMSD vs. force field score”.]

The contact score favors orientations in the correct region of space over the end-of-the-tunnel dockings, but the highest rankings go to 3.0-angstrom structures (Figure 4). The Delphi score (not shown) also suggests that 3.0-4.0-angstrom structures are the most favorable. Only the force field score is successful in identifying structures with RMSDs below 1.0 angstrom (Figure 4).

At the highest level of sampling, the force field score identifies the lowest-RMSD family of orientations as the most favorable both before and after minimization. At the intermediate and low levels of sampling, however, even though at least one “correct” orientation is generated, it does not receive the best score until after minimization.

Carboxypeptidase A

The carboxypeptidase A/glycyltyrosine structure contains close contacts, so it is not possible to reproduce the experimental geometry exactly without decreasing the close contact limits or allowing them to be violated. Because a ligand atom receives the attributes of the nearest grid point, however, it is possible for an acceptable orientation to violate the limits by as much as 0.866 × (contact grid spacing), or 0.29 angstroms in the present work.
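The factor 0.866 is √3/2: on a cubic grid, the point farthest from any grid point is the center of a grid cell, at half the cell's body diagonal. A quick check of the quoted figure, using the contact grid spacing of three points per angstrom (the function name is illustrative):

```python
import math

def max_nearest_grid_error(spacing):
    """Largest possible distance from any position to its nearest grid
    point on a cubic grid: half of the cell's body diagonal."""
    return math.sqrt(3.0) / 2.0 * spacing

# Three grid points per angstrom gives a spacing of 1/3 angstrom.
error = max_nearest_grid_error(1.0 / 3.0)
```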

AMBER all-atom charges (14) were used for the ligand. There were 47 spheres in the cluster of interest, and 4,327 orientations were written out. Structures with RMSDs below 2.0 angstroms describe essentially the experimental binding mode. Relative to the crystallographic orientation, dockings with RMSDs just above 2.0 angstroms are angled slightly, 3.0-4.0-angstrom structures are barrel-rolled and translated along the long axis of the molecule, and structures with RMSDs greater than 6.0 angstroms are flipped end-to-end. The highest contact score corresponds to a member of the lowest-RMSD family, but good scores are also given to several of the end-to-end flipped structures (Figure 5). The same could be said of the Delphi scores (not shown). With these scoring methods, the best values are outliers. The force field score, however, clearly selects a low-RMSD cluster; the 45 best scores all correspond to structures with RMSDs below 2.0 angstroms (Figure 5).

Results were essentially the same using Gasteiger-Marsili and Gasteiger-Hückel charge sets for glycyltyrosine. As in the periplasmic binding protein system, members of the lowest-RMSD family of orientations receive the best force field scores both before and after minimization when sampling is intensive, but only after minimization when intermediate or low sampling is performed.

Summary

The results of docking with high orientational sampling are summarized in Table 2. Some observations can be made. First, only the force field score is successful in identifying orientations close to the crystallographic result in all four systems. Second, the identification of a family of orientations near the experimental structure is robust. This family is well represented in the set of best force field scores. Third, alternative binding modes that are reasonable according to the force field score are found in each case; these are inherently plausible and may be worth considering in ligand design. Finally, varying the extent of orientational sampling shows that a thorough search is required for the consistent identification of the “correct” binding mode by score.

TABLE 2
Comparison of Best-Scoring Orientations from High-Sampling DOCK Runs

           Contact            Delphi Electrostatic    Force Field
System     Score    RMSDᵃ     Scoreᵇ     RMSDᵃ        Scoreᵇ     RMSDᵃ
4dfrᶜ      136      1.42      -6.347     1.50         -30.829    0.64
6rsaᵈ      149      0.65      -16.214    0.92         -58.429    0.56
2gbpᵉ      162      3.01      -4.252     3.34         -21.483    0.29
3cpaᵈ      165      1.52      -11.914    1.86         -41.302    1.16

ᵃ Angstroms, relative to crystal structure orientation
ᵇ kcal/mol
ᶜ STO-3G charges for ligand
ᵈ AMBER all-atom charges for ligand
ᵉ Gasteiger-Marsili charges for ligand
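The first observation can be checked directly against Table 2. The short sketch below transcribes the table and reports, for each system, which scoring method's top-ranked orientation lies closest to the crystal structure; the dictionary layout and function name are illustrative choices.

```python
# Best-scoring orientation per method, transcribed from Table 2:
# system -> {method: (score, RMSD in angstroms)}
TABLE_2 = {
    "4dfr": {"contact": (136, 1.42), "delphi": (-6.347, 1.50), "force field": (-30.829, 0.64)},
    "6rsa": {"contact": (149, 0.65), "delphi": (-16.214, 0.92), "force field": (-58.429, 0.56)},
    "2gbp": {"contact": (162, 3.01), "delphi": (-4.252, 3.34), "force field": (-21.483, 0.29)},
    "3cpa": {"contact": (165, 1.52), "delphi": (-11.914, 1.86), "force field": (-41.302, 1.16)},
}

def closest_method(rows):
    """Name of the scoring method whose top orientation has the lowest RMSD."""
    return min(rows, key=lambda method: rows[method][1])

winners = {system: closest_method(rows) for system, rows in TABLE_2.items()}
sub_angstrom = [system for system, rows in TABLE_2.items()
                if rows["force field"][1] < 1.0]
```

Scores from different methods are on different scales and cannot be compared with one another; only the RMSDs of the top-ranked orientations are compared here.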

DISCUSSION

The generation of feasible binding modes is important to the process of structure-based drug design. One application is the identification of the preferred mode of binding of a specific conformation of a ligand. Only with this information can one suggest structural modifications intended to form, enhance, or disrupt specific interactions with the receptor. A further application is the discovery of ligands by searching through databases of molecules. Performance in either application hinges on adequate sampling and energy evaluation (scoring) within affordable amounts of computer time.

Sampling

DOCK samples the six degrees of freedom involved in the relative placement of two rigid, three-dimensional objects. Tens to hundreds of low-RMSD structures were found in each test case, suggesting that adequate orientational sampling can be performed with relative ease when the ligand and receptor conformations are known. Conformational sampling remains problematic (see “Limitations”).

Scoring

The best contact scores are associated with low-RMSD dockings in three out of the four cases. The number of low-RMSD/high-score orientations and their separation in score from alternative modes, however, are smaller for contact scores than for the other options tested. These problems are most severe when the ligand has a roughly symmetric shape, such as that of a cylinder (glycyl-L-tyrosine) or oblate ellipsoid (β-D-glucose). Contact scoring is most useful for discarding orientations that overlap receptor atoms and for finding “templates”: frameworks that fit the site but must be altered to produce electrostatic complementarity.

The Delphi score is fairly successful when the ligand bears a formal charge, but like the contact score, fails to identify the crystallographic binding mode of glucose in periplasmic binding protein. It should be noted that the calculation of Delphi electrostatic interaction energies as described above is not rigorously correct. A more rigorous application of Delphi to calculate electrostatic interaction energies in solution has been described (33), and involves evaluation of a full thermodynamic cycle including the bound and unbound states of the molecules. Assuming that a suitable potential map can be obtained using the receptor alone leads to the underestimation of favorable interactions; dipoles induced upon binding and solvent exclusion by the ligand are not modeled (34).

Of the options investigated, the force field score is the most successful in identifying ligand-receptor configurations that resemble the experimental geometry (Table 2). It is important, however, not to overinterpret the results. The force field score, while combining steric and electrostatic aspects of molecular recognition, represents at best an estimate of the enthalpy of interaction. The calculation of entropy, and thus of free energy, requires sampling of a statistical ensemble of system configurations. Furthermore, a rigorous description of events in aqueous solution must include explicit water molecules, and evaluation of the unbound as well as the bound structures. These issues are applicable to any molecular mechanics study of complexation energetics. Approximations in addition to those of standard force field calculations include the neglect of intramolecular terms and the discretization of space (the grid approximation). Comparisons to interaction energies from AMBER (nongrid calculations) suggest that the results are not degraded for the most favorable orientations, which are also the most interesting and the most important for the success of the method.

Time Requirements

The calculations described above can be grouped into preprocessing and docking stages. Preprocessing includes calculation of a molecular surface, generation of site-filling sphere clusters, and the creation of grids for scoring. For the systems tested, these steps took approximately an hour on a Silicon Graphics 4D/25 workstation.

Docking by internal distance matching is quite rapid (Table 1). The algorithm is flexible as well as powerful, in that the user may easily vary the thoroughness of the procedure and the number of sterically allowed orientations that will be found. The added time requirements of force field scoring compared to contact scoring are small; one can expect up to a 50% increase in CPU time for single-molecule docking (typically only one to two minutes) and up to a 10% increase for database searching. Thus, the time costs of more sophisticated calculations can, in part, be shifted to the predocking stage and traded for increased usage of physical memory.

When sampling is relatively sparse, the correct binding mode may not be identified unless minimization is performed. Results show that currently, it is more time-efficient to sample thoroughly than to combine low sampling with minimization (Table 1). There is also the danger of missing the most favorable orientational families altogether when sampling is too low.

Limitations

It should be noted that the approach described here does not address conformational flexibility (1,10,35,36), keep track of surface area burial (1,7,11) or solvation energy (4), or include energy minimization (1-4,9,10). These features are present in some of the other docking algorithms, albeit at a computational cost.

A goal of the method is to find lead compounds in an efficient way, rather than finding every molecule in the database that might bind to the receptor. Some leads may be missed because they are in the wrong conformation for binding. In conjunction with DOCK, flexibility has been treated by docking molecule fragments separately and then joining them (35), combining docking with conformational searching (36), and including multiple conformations of molecules in databases for searching. Although surface area burial and related solvation energy calculations (37) have proven useful in identifying misfolded protein structures (37,38), they have been less helpful in docking studies (16).

Full energy-minimization of docked complexes requires parameterization of each “ligand” molecule. This is difficult when large numbers of compounds are to be evaluated. Some docking methods (2-4) include rigid-body minimization, in which there are no intramolecular degrees of freedom; only nonbonded parameters are required. Although minimization is useful for finding local optima, it adds to the costs of computation and cannot salvage insufficient sampling.
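
The idea of rigid-body minimization can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure used by the cited programs: the optimizer is a simple compass (pattern) search over three translations and three rotation angles, the score is a hypothetical toy function (`toy_score`), and all names and coordinates are invented for the example. Because every atom is moved by the same rotation and translation, no intramolecular terms enter.

```python
import math

def rotate(p, ax, ay, az):
    """Rotate point p about the x, then y, then z axis (angles in radians)."""
    x, y, z = p
    y, z = y * math.cos(ax) - z * math.sin(ax), y * math.sin(ax) + z * math.cos(ax)
    x, z = x * math.cos(ay) + z * math.sin(ay), -x * math.sin(ay) + z * math.cos(ay)
    x, y = x * math.cos(az) - y * math.sin(az), x * math.sin(az) + y * math.cos(az)
    return (x, y, z)

def transform(coords, params):
    """Rigid-body move: rotate about the centroid, then translate."""
    tx, ty, tz, ax, ay, az = params
    n = len(coords)
    c = tuple(sum(p[i] for p in coords) / n for i in range(3))
    moved = []
    for p in coords:
        r = rotate((p[0] - c[0], p[1] - c[1], p[2] - c[2]), ax, ay, az)
        moved.append((r[0] + c[0] + tx, r[1] + c[1] + ty, r[2] + c[2] + tz))
    return moved

def rigid_minimize(coords, score, step=0.5, tol=1e-4, max_iter=10000):
    """Compass search over six rigid-body parameters; finds a local optimum only."""
    params = [0.0] * 6
    best = score(transform(coords, params))
    iterations = 0
    while step > tol and iterations < max_iter:
        improved = False
        for i in range(6):
            for delta in (step, -step):
                trial = list(params)
                trial[i] += delta
                s = score(transform(coords, trial))
                if s < best:
                    params, best, improved = trial, s, True
        if not improved:
            step *= 0.5  # refine the step once no single move helps
        iterations += 1
    return transform(coords, params), best

# Toy example: recover a displaced pose (hypothetical coordinates).
ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.2, 0.0), (0.3, 0.4, 1.0)]
target = transform(ligand, [1.0, -0.5, 0.25, 0.0, 0.0, 0.3])

def toy_score(coords):
    """Sum of squared distances to the target pose (stand-in for an energy)."""
    return sum(math.dist(a, b) ** 2 for a, b in zip(coords, target))

start = toy_score(ligand)
final_coords, final = rigid_minimize(ligand, toy_score)
print(start, final)
```

The search only descends from its starting orientation, which is consistent with the point made above: minimization refines an orientation that has already been generated but does not replace adequate sampling.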

Finally, there are limitations inherent in the structure of the receptor as well as in the scoring functions. Crystal structures contain the effects of static and thermal disorder; only average atomic positions can be derived from the diffraction data, and these do not necessarily match the coordinates for any single molecule within the crystal lattice. Bias may result from the use of energy terms during refinement and the placement of hydrogens in “standard geometries.”

CONCLUSION

Molecular mechanics scoring capabilities have been added to a rapid, geometric docking algorithm. The computational costs of force field scoring are minimized by precalculation steps. The results of redocking the components of four crystallographic complexes are encouraging, as the force field score is able to identify the correct family of orientations in each case. Scoring methods that consider solely sterics or solely electrostatics are less successful. Although many approximations are made, it seems that a reasonable balance between rigor and computational tractability has been achieved.

The results of database searching are highly dependent on the scoring function and the degree of orientational sampling performed. Improving the evaluation of orientations of a single molecule is, therefore, an important step in improving the effectiveness of searching for lead compounds. Sampling must also be sufficient, however, because a favorable orientation must be generated before it can be identified. The current work suggests that unless minimization is performed, it is best to sample several thousand orientations per ligand molecule.
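
The dependence of a database ranking on orientational sampling can be shown with a small Python sketch. The function `rank_database`, the molecule names, and the scores are all hypothetical; the sketch assumes only that each molecule is represented by its best (lowest) orientation score, with lower values being more favorable.

```python
def rank_database(scores_by_molecule, top_n=None):
    """Rank molecules by their best orientation score (lower is better).

    scores_by_molecule maps a molecule name to the scores of the
    orientations that were actually sampled for it.
    """
    best = {name: min(scores)
            for name, scores in scores_by_molecule.items() if scores}
    ranked = sorted(best.items(), key=lambda item: item[1])
    return ranked if top_n is None else ranked[:top_n]

# Hypothetical scores for thoroughly sampled orientations.
thorough = {
    "mol_A": [-12.0, -18.5, -8.2, -40.1],
    "mol_B": [-20.3, -25.0, -22.7],
    "mol_C": [-5.0, -9.9],
}

# Sparse sampling: only the first two orientations of each molecule are generated.
sparse = {name: scores[:2] for name, scores in thorough.items()}

print(rank_database(thorough))
print(rank_database(sparse))
```

With thorough sampling the hypothetical `mol_A` ranks first because its most favorable orientation is found; with sparse sampling that orientation is never generated and `mol_A` falls behind `mol_B`, regardless of how good the scoring function is.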

SPHGEN, DISTMAP, and contact scoring are included in DOCK version 2.0 (16,17); CHEMGRID and force field scoring are included in DOCK version 3.0. DOCK and associated programs are implemented in Fortran77 and available from I. D. Kuntz.

Acknowledgments: The authors gratefully acknowledge support by NIH grants GM-31497 (I. D. Kuntz) and GM-39552 (G. L. Kenyon), DARPA grant MDA-91-J-1013 (F. E. Cohen), and Glaxo Inc.; thanks are due also to K. A. Sharp, A. R. Leach, D. A. Pearlman, P. A. Kollman, R. Langridge, and the UCSF Computer Graphics Laboratory for their advice and assistance; and to Tripos Associates, Inc. (St. Louis) for providing the SYBYL molecular modeling package.

REFERENCES

1. Busetta B, Tickle IJ, Blundell TL. DOCKER, an Interactive Program for Simulating Protein Receptor and Substrate Interactions. J Appl Cryst. 1983;16(4):432-437.

2. Pattabiraman N, Levitt M, Ferrin TE, Langridge R. Computer Graphics in Real-Time Docking with Energy Calculation and Minimization. J Comp Chem. 1985;6(5):432-436.

3. Tomioka N, Itai A, Iitaka Y. A Method for Fast Energy Estimation and Visualization of Protein-Ligand Interaction. J Comp-Aided Mol Design. 1987;1(3):197-210.

4. Wodak SJ, Janin J. Computer Analysis of Protein-Protein Interaction. J Mol Biol. 1978;124(2):323-342.

5. Kuntz ID, Blaney JM, Oatley SJ, Langridge R, Ferrin TE. A Geometric Approach to Macromolecule-Ligand Interactions. J Mol Biol. 1982;161(2):269-288.

6. Goodsell D, Dickerson RE. Isohelical Analysis of DNA Groove-Binding Drugs. J Med Chem. 1986;29(5):727-733.

7. Connolly ML. Shape Complementarity at the Hemoglobin Alpha 1 Beta 1 Subunit Interface. Biopolymers. 1986;25(7):1229-1247.

8. Billeter M, Havel TF, Kuntz ID. A New Approach to the Problem of Docking Two Molecules: The Ellipsoid Algorithm. Biopolymers. 1987;26(6):777-793.

9. Lipkowitz KB, Zegarra R. Theoretical Studies in Molecular Recognition: Rebek’s Cleft. J Comp Chem. 1989;10(5):595-602.

10. Goodsell DS, Olson AJ. Automated Docking of Substrates to Proteins by Simulated Annealing. Proteins. 1990;8(3):195-202.

11. Jiang F, Kim S-H. “Soft Docking”: Matching of Molecular Surface Cubes. J Mol Biol. 1991;219(1):79-102.

12. Goodford PJ. A Computational Procedure for Determining Energetically Favorable Binding Sites on Biologically Important Macromolecules. J Med Chem. 1985;28(7):849-857.

13. Weiner SJ, Kollman PA, Case DA, Singh UC, Ghio C, Alagona G, Profeta S Jr, Weiner P. A New Force Field for Molecular Mechanical Simulation of Nucleic Acids and Proteins. J Am Chem Soc. 1984;106(3):765-784.

14. Weiner SJ, Kollman PA, Nguyen DT, Case DA. An All Atom Force Field for Simulations of Proteins and Nucleic Acids. J Comp Chem. 1986;7(2):230-252.

15. DesJarlais RL, Sheridan RP, Seibel GL, Dixon JS, Kuntz ID, Venkataraghavan R. Using Shape Complementarity as an Initial Screen in Designing Ligands for a Receptor Binding Site of Known Three-Dimensional Structure. J Med Chem. 1988;31(4):722-729.

16. Shoichet BK, Kuntz ID. Protein Docking and Complementarity. J Mol Biol. 1991;221(1):327-346.

17. Shoichet BK, Bodian DL, Kuntz ID. Molecular Docking Using Shape Descriptors. J Comp Chem. 1992;13(3):380-397.

18. Connolly ML. Solvent-Accessible Surfaces of Proteins and Nucleic Acids. Science. 1983;221(4612):709-713.

19. Ferro DR, Hermans J. A Different Best Rigid-Body Molecular Fit Routine. Acta Crystallogr. 1977;A33(2):345-347.

20. Klapper I, Hagstrom R, Fine R, Sharp K, Honig B. Focusing of Electric Fields in the Active Site of Cu-Zn Superoxide Dismutase: Effects of Ionic Strength and Amino-Acid Modification. Proteins. 1986;1(1):47-59.

21. Gilson MK, Sharp KA, Honig BH. Calculating the Electrostatic Potential of Molecules in Solution: Method and Error Assessment. J Comp Chem. 1987;9(4):327-335.

22. Hagler AT, Huler E, Lifson S. Energy Functions for Peptides and Proteins. I. Derivation of a Consistent Force Field Including the Hydrogen Bond from Amide Crystals. J Am Chem Soc. 1977;96(17):5319-5335.

23. Bernstein FC, Koetzle TF, Williams GJB, Meyer EF Jr, Brice MD, Rodgers JR, Kennard O, Shimanouchi T, Tasumi M. The Protein Data Bank: A Computer-Based Archival File for Macromolecular Structures. J Mol Biol. 1977;112(3):535-542.

24. Ferrin TE, Huang CC, Jarvis LE, Langridge R. The MIDAS Display System. J Mol Graph. 1988;6(1):13-27.

25. Bolin JT, Filman DJ, Matthews DA, Hamlin RC, Kraut J. Crystal Structures of Escherichia coli and Lactobacillus casei Dihydrofolate Reductase Refined at 1.7 Angstroms Resolution. I. General Features and Binding of Methotrexate. J Biol Chem. 1982;257(22):13650-13662.

26. Borah B, Chen C-W, Egan W, Miller M, Wlodawer A, Cohen JS. Nuclear Magnetic Resonance and Neutron Diffraction Studies of the Complex of Ribonuclease A with Uridine Vanadate, a Transition-State Analogue. Biochemistry. 1985;24(8):2058-2067.

27. Vyas NK, Vyas MN, Quiocho FA. Sugar and Signal-Transducer Binding Sites of the Escherichia coli Galactose Chemoreceptor Protein. Science. 1988;242(4883):1290-1295.

28. Blaney JM. Ph.D. dissertation, University of California, San Francisco, 1982.

29. Singh UC, Kollman PA. An Approach to Computing Electrostatic Charges for Molecules. J Comp Chem. 1984;5(2):129-145.

30. Gasteiger J, Marsili M. Iterative Partial Equalization of Orbital Electronegativity - A Rapid Access to Atomic Charges. Tetrahedron. 1980;36(22):3219-3228.

31. Streitwieser A. Molecular Orbital Theory for Organic Chemists. New York: Wiley; 1961.

32. Molecular Modeling System SYBYL, Version 5.4, TRIPOS Associates, Inc., St. Louis, MO 63117.

33. Gilson MK, Honig B. Calculation of the Total Electrostatic Energy of a Macromolecular System: Solvation Energies, Binding Energies, and Conformational Analysis. Proteins. 1988;4(1):7-18.

34. Davis ME, McCammon JA. Calculating Electrostatic Forces from Grid-Calculated Potentials. J Comp Chem. 1990;11(3):401-409.

35. DesJarlais RL, Sheridan RP, Dixon JS, Kuntz ID, Venkataraghavan R. Docking Flexible Ligands to Macromolecular Receptors by Molecular Shape. J Med Chem. 1986;29(11):2149-2153.

36. Leach AR, Kuntz ID. Conformational Analysis of Flexible Ligands in Macromolecular Receptor Sites. J Comp Chem. 1992;13(6):730-748.

37. Eisenberg D, McLachlan AD. Solvation Energy in Protein Folding and Binding. Nature. 1986;319(6050):199-203.

38. Chiche L, Gregoret LM, Cohen FE, Kollman PA. Protein Model Structure Evaluation Using the Solvation Free Energy of Folding. Proc Natl Acad Sci USA. 1990;87(8):3240-3243.