comparison of ligand affinity ranking using autodock-gpu

13
doi.org/10.26434/chemrxiv.8309690.v1 Comparison of Ligand Affinity Ranking Using AutoDock-GPU and MM-GBSA Scores in the D3R Grand Challenge 4 Léa El Khoury, Diogo Santos-Martins, Sukanya Sasmal, Jérome Eberhardt, Giulia Bianco, Francesca Alessandra Ambrosio, Leonardo Solis-Vasquez, Andreas Koch, Stefano Forli, David Mobley Submitted date: 21/06/2019 Posted date: 24/06/2019 Licence: CC BY 4.0 Citation information: El Khoury, Léa; Santos-Martins, Diogo; Sasmal, Sukanya; Eberhardt, Jérome; Bianco, Giulia; Ambrosio, Francesca Alessandra; et al. (2019): Comparison of Ligand Affinity Ranking Using AutoDock-GPU and MM-GBSA Scores in the D3R Grand Challenge 4. ChemRxiv. Preprint. Molecular docking has been successfully used in computer-aided molecular design projects for the identification of ligand poses within protein binding sites. However, relying on docking scores to rank different ligands with respect to their experimental affinities might not be sufficient. It is believed that the binding scores calculated using molecular mechanics combined with the Poisson-Boltzman surface area (MM-PBSA) or generalized Born surface area (MM-GBSA) can more accurately predict binding affinities. In this perspective, we decided to take part in Stage 2 in the Drug Design Data Resource (D3R) Grand Challenge 4 (GC4) to compare the performance of a quick scoring function, Autodock4, to that of MM-GBSA in predicting the binding affinities of a set of Beta-Amyloid Cleaving Enzyme 1 (BACE-1) ligands. Our results show that re-scoring docking poses using MM-GBSA did not improve the correlation with experimental affinities. We further did a retrospective analysis of the results and found that our MM-GBSA protocol is sensitive to details in the protein-ligand system: i) neutral ligands are more adapted to MM-GBSA calculations than charged ligands, ii) predicted binding affinities depend on the initial conformation of the BACE-1 receptor, iii) protonating the aspartyl dyad of BACE-1 correctly results in more accurate binding pose and affinity predictions. File list (1) download file view on ChemRxiv D3R-AutoDock-MMGBSA.pdf (0.97 MiB)

Upload: others

Post on 16-Feb-2022

9 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Comparison of Ligand Affinity Ranking Using AutoDock-GPU

doi.org/10.26434/chemrxiv.8309690.v1

Comparison of Ligand Affinity Ranking Using AutoDock-GPU andMM-GBSA Scores in the D3R Grand Challenge 4Léa El Khoury, Diogo Santos-Martins, Sukanya Sasmal, Jérome Eberhardt, Giulia Bianco, FrancescaAlessandra Ambrosio, Leonardo Solis-Vasquez, Andreas Koch, Stefano Forli, David Mobley

Submitted date: 21/06/2019 • Posted date: 24/06/2019Licence: CC BY 4.0Citation information: El Khoury, Léa; Santos-Martins, Diogo; Sasmal, Sukanya; Eberhardt, Jérome; Bianco,Giulia; Ambrosio, Francesca Alessandra; et al. (2019): Comparison of Ligand Affinity Ranking UsingAutoDock-GPU and MM-GBSA Scores in the D3R Grand Challenge 4. ChemRxiv. Preprint.

Molecular docking has been successfully used in computer-aided molecular design projects for theidentification of ligand poses within protein binding sites. However, relying on docking scores to rank differentligands with respect to their experimental affinities might not be sufficient. It is believed that the binding scorescalculated using molecular mechanics combined with the Poisson-Boltzman surface area (MM-PBSA) orgeneralized Born surface area (MM-GBSA) can more accurately predict binding affinities. In this perspective,we decided to take part in Stage 2 in the Drug Design Data Resource (D3R) Grand Challenge 4 (GC4) tocompare the performance of a quick scoring function, Autodock4, to that of MM-GBSA in predicting thebinding affinities of a set of Beta-Amyloid Cleaving Enzyme 1 (BACE-1) ligands. Our results show thatre-scoring docking poses using MM-GBSA did not improve the correlation with experimental affinities. Wefurther did a retrospective analysis of the results and found that our MM-GBSA protocol is sensitive to detailsin the protein-ligand system: i) neutral ligands are more adapted to MM-GBSA calculations than chargedligands, ii) predicted binding affinities depend on the initial conformation of the BACE-1 receptor, iii)protonating the aspartyl dyad of BACE-1 correctly results in more accurate binding pose and affinitypredictions.

File list (1)

download fileview on ChemRxivD3R-AutoDock-MMGBSA.pdf (0.97 MiB)

Page 2: Comparison of Ligand Affinity Ranking Using AutoDock-GPU

Comparison of ligand affinity ranking using AutoDock-GPU andMM-GBSA scores in the D3R Grand Challenge 4

Lea El Khoury* · Diogo Santos-Martins* · Sukanya Sasmal* · Jerome Eberhardt ·Giulia Bianco · Francesca Alessandra Ambrosio · Leonardo Solis-Vasquez · AndreasKoch · Stefano Forli1 · David L. Mobley1

* Contributed equally to this work.1 Corresponding authors.

L. El KhouryDepartment of Pharmaceutical Sciences, University of California,Irvine.Orcid ID: 0000-0003-3157-9425

D. Santos-MartinsDepartment of Integrative Structural and Computational Biology, TheScripps Research Institute.Orcid ID: 0000-0003-4622-3747

S. SasmalDepartment of Pharmaceutical Sciences, University of California,Irvine.Orcid ID: 0000-0003-2731-5777

J. EberhardtDepartment of Integrative Structural and Computational Biology, TheScripps Research Institute

G. BiancoDepartment of Integrative Structural and Computational Biology, TheScripps Research Institute

F. A. AmbrosioDepartment of Integrative Structural and Computational Biology, TheScripps Research Institute; Department of Health Sciences, “MagnaGræcia” University of Catanzaro, Campus “S. Venuta”, Viale Europa,88100, Catanzaro, Italy.

L. Solis-VasquezEmbedded Systems and Applications Group, Technische UniversitatDarmstadt

A. KochEmbedded Systems and Applications Group, Technische UniversitatDarmstadt Orcid ID: 0000-0002-1164-3082

S. ForliDepartment of Integrative Structural and Computational Biology, TheScripps Research InstituteOrcid ID: 0000-0002-5964-7111Scripps Research,10550 North Torrey Pines RoadLa Jolla, CA 92037-1000, USA.Tel: +1 (858) 784-2055E-mail: [email protected]

Abstract Molecular docking has been successfully used incomputer-aided molecular design projects for the identifi-cation of ligand poses within protein binding sites. How-ever, relying on docking scores to rank different ligands withrespect to their experimental affinities might not be suffi-cient. It is believed that the binding scores calculated usingmolecular mechanics combined with the Poisson-Boltzmansurface area (MM-PBSA) or generalized Born surface area(MM-GBSA) can more accurately predict binding affinities.In this perspective, we decided to take part in Stage 2 inthe Drug Design Data Resource (D3R) Grand Challenge 4(GC4) to compare the performance of a quick scoring func-tion, Autodock4, to that of MM-GBSA in predicting thebinding affinities of a set of β -Amyloid Cleaving Enzyme1 (BACE-1) ligands. Our results show that re-scoring dock-ing poses using MM-GBSA did not improve the correlationwith experimental affinities. We further did a retrospectiveanalysis of the results and found that our MM-GBSA pro-tocol is sensitive to details in the protein-ligand system: i)neutral ligands are more adapted to MM-GBSA calculationsthan charged ligands, ii) predicted binding affinities dependon the initial conformation of the BACE-1 receptor, iii) pro-tonating the aspartyl dyad of BACE-1 correctly results inmore accurate binding pose and affinity predictions.

Keywords Docking · MM-GBSA · AutoDock · Scoringfunctions

D. L. MobleyDepartments of Pharmaceutical Sciences and Chemistry, University ofCalifornia, IrvineOrcid ID: 0000-0002-1083-5533147 Bison Modular, Irvine, CA 92697Tel: +949-824-6383Fax: +949-824-2949E-mail: [email protected]

Page 3: Comparison of Ligand Affinity Ranking Using AutoDock-GPU

2

Introduction

Accurate estimation of protein-ligand interactions and theenergetic basis of ligand binding are of great importance forsuccessful structure-based drug discovery projects. Much ef-fort has been devoted to develop computational methods forevaluating the binding of a ligand to a protein of interest andthe strength of that binding, including docking and scoringapproaches [1], virtual screening [1,2], and physics-basedfree energy methods [3].

Blind community wide-challenges such as the Drug De-sign Data Resource (D3R) Grand Challenge [4] and the Sta-tistical Assessment of Modeling of Proteins and Ligands(SAMPL) (https://samplchallenges.github.io/) [5] provide anexcellent platform for developers to test and validate theircomputer-aided drug design methodologies against exper-imental datasets. Moreover, the unbiased setting of thesechallenges allows the participants to compare the perfor-mance of their workflows and techniques to other computa-tional methods in the field, thus highlighting potential areasof enhancement.

Every year, the D3R Grand Challenge organizers releasepharmaceutically relevant datasets of protein-ligand com-plexes for the evaluation of ligand pose prediction and bind-ing affinity protocols. This year, we decided to bring to-gether our expertise in pose prediction and binding free en-ergy calculations to participate in the D3R Grand Challenge4 (GC4) to rank a set of ligands based on their predictedaffinities towards the β -Amyloid Cleaving Enzyme 1 (BACE-1) involved in Alzheimer′s disease [6].

In Stage 1a of D3R GC4, the participants were asked topredict the binding affinities of 154 BACE-1 compounds us-ing any available PDB structure data or binding data. At theend of Stage 1, co-crystal structures of additional 20 ligandswere released by the D3R organizers, allowing the partici-pants to use these structures to refine the affinity predictions.In the present work, we describe our participation in Stage 2,where we predicted the binding affinities of the 154 ligands.

Many previous studies have evaluated the ability of molec-ular mechanics combined with the Poisson-Boltzmann sur-face area (MM-PBSA) and molecular mechanics combinedwith generalized Born surface area (MM-GBSA) [7,8] meth-ods to predict ligand binding poses and binding affinities,and compared the accuracy of these predictions to that ofdocking scores [9–18]. Kaus et al. [9] have reported thatMM-GBSA method performs better than standard scoringfunctions in ranking binding poses. Similarly, Rastelli et al.[15] have shown that both MM-GBSA and MM-PBSA re-scoring methodologies improve the affinity ranking estimatedby AutoDock scoring function.

On the other hand, participants in past D3R Grand Chal-lenges have reported poor correlations between their esti-mated MM-GBSA binding scores and experimental affini-

ties [16–18]. In fact, MM-GBSA and MM-PBSA are not al-ways accurate methods for drug design projects because thequality of the results appears to depend on the protein-ligandsystem, the ligand sets, and details used in the method, suchas the interior dielectric constant, the continuum-solvationmethod, the charges, and the entropies [7].

The ambition of this collaborative report was to eval-uate only the binding affinity prediction accuracy of MM-GBSA scores relative to AutoDock4 scores. Therefore, start-ing with the same binding poses generated using AutoDock-GPU, we compared our predicted affinities with the experi-mental affinities for the two scoring approaches – AutoDock4scores and MM-GBSA scores. Based on the correlation statis-tics and performance metrics obtained for both submissions,we found that re-scoring the affinities using MM-GBSA freeenergy estimates did not improve the correlation with exper-imental values. During post-analysis, we found that MM-GBSA scores depend on the initial protein conformation, theprotonation states of the BACE-1 active site, and the chargeof the ligands, and we were able to improve our MM-GBSA-based correlation metrics retrospectively.

Theory and Methods

AutoDock4 energy score

The AutoDock4 scoring function [19] can be described as aclassical (additive) force field supplemented with an implicitsolvation model and a ligand entropy term. The force fieldconsists of three terms: a 12-6 Lennard-Jones potential, aColoumb potential with a distance-dependent dielectric con-stant, and a 12-10 potential for hydrogen bonds. The implicitsolvation model is an additive pairwise potential in whichthe solvent accessibility of each atom is estimated based onthe proximity of surrounding atoms [20]. It is an empiricalmodel that depends both on atom type parameters and thepartial charges [19]. It accounts for (de)solvation of both theligand and the receptor. The ligand entropy term accountsfor the loss of conformational freedom of the ligand uponbinding, adding a penalty that is linearly dependent on thenumber of rotatable bonds.

Several approximations are employed in AutoDock, inorder to make the calculations suitable for rapidly dockinglarge libraries of compounds. Scores are single point eval-uations of the scoring function, thus neglecting the major-ity of entropic contributions. Bond lengths and bond anglesare treated as rigid and non-polar hydrogens are omitted.Rotatable bonds are allowed to rotate freely, neglecting ro-tamer preferences. Partial charges are based on the empiricalmethod from Gasteiger and Marsili [21], and can be con-sidered “lower quality” in comparison to charge assignmentmethods based on some form of electronic structure calcu-lation, even via semi-empirical methods such as AM1-BCC

Page 4: Comparison of Ligand Affinity Ranking Using AutoDock-GPU

Comparison of ligand affinity ranking using AutoDock-GPU and MM-GBSA scores in the D3R Grand Challenge 4 3

[22].

MM-GBSA score calculations

MM-GBSA calculations are a widely used tool for estimat-ing protein-ligand binding free energies [7,15,23,24]. Somepropose these as a reasonable compromise between fast andinaccurate methods like scoring functions [25] to computa-tionally rigorous but expensive methods like alchemical freeenergy calculations [26].

MM-GBSA calculations are usually performed on an setof protein-ligand binding conformations generated using ashort MD simulation, which can either be in implicit or ex-plicit solvent. MM-GBSA energy values, which provide anestimate of the free energy of binding (∆Gbind), are then cal-culated using end-point estimates given in the equation be-low:

∆Gbind = Gcomplex−Greceptor−Gligand (1)

The free energy for each of the complex, receptor and ligandis evaluated using contribution from four different terms

G = 〈EMM〉+ 〈GGB〉+ 〈GSASA〉−T 〈Ssolute〉 (2)

where EMM is the molecular mechanical energy in the gas-phase consisting of contributions for electrostatic, van derWaals and internal energies, EGB is the polar solvation freeenergy based on the Generalized-Born implicit solvent model,ESASA is the non-polar solvation term calculated using thesolvent accessible surface area (SASA) and T Ssolute is theproduct of the absolute temperature T and the solute entropySsolute. The solute entropy term can either be ignored (of-ten done for congeneric series) or approximated. The quasi-harmonic approximation [27] and normal mode analysis ofthe vibration frequencies [28] are most commonly used forestimating the solute entropy term.

Here, MM-GBSA scores are reported in units of kcal/moland are based on end-point free energy estimates. However,they should not be confused with true binding free energycalculations [26], as they involve several additional approx-imations. Partly as a result, calculated MM-GBSA valuesare typically much lower (more negative) than experimen-tal binding free energies. The differences arise from the dif-ferent approximations used to account for the solvation andconfigurational entropy of the protein and the ligand. Also,water molecules or cavities in the binding pocket are mod-eled using bulk (continuum) water, which cannot capturethe finer details of water-mediated interactions present inexplicit water simulations, and in some cases can produceartifactual water placements. Hence in this work, we will bereferring the MM-GBSA values as MM-GBSA scores andnot MM-GBSA free energies.

Docking protocol

Docking was performed using AutoDock-GPU, an OpenCLimplementation of AutoDock4. The search procedure uti-lizes three types of ligand motion: translation, rotation (ofthe entire ligand as a rigid body) and rotation of atoms af-fected by each rotatable bond. The search algorithm is a ge-netic algorithm (GA) hybridized with ADADELTA [29], agradient-based optimizer. In every GA generation, all indi-viduals (i.e. poses) are subjected to 500 ADADELTA itera-tions. The number of GA runs was 100, and each GA per-formed 10 million score evaluations, totaling 109 score eval-uations per docking.

Each ligand was docked to 10 different protein confor-mations, corresponding to the following PDB IDs (chain):2B8V (A), 2F3F (C), 2P4J (D), 2WF3 (A), 3L59 (B), 3MSJ(C), 3MSK (A), 4EWO (A), 4FS4 (A) and 4RCF (A). Thesestructures were the selected representatives of different bind-ing pocket conformations. REDUCE [30] added hydrogenatoms assuming pH 7, while allowing Asn, Gln and His sidechains to flip. Then, the standard protocol [31] was used toassign AutoDock4 atom types, partial charges, and deter-mine which bonds are rotatable. After all dockings were per-formed, the binding pose with the best score, as well as theprotein structure it was docked into, constituted the startingstructure for the MD simulation and subsequent MM-GBSAcalculation.

Macrocycle conformations were explored during the dock-ing by artificially breaking one of the chemical bonds withineach macrocycle, generating the corresponding open linearstructure which can be modeled as fully flexible during dock-ing. To restore the original bond, and consequently the cyclicstructure, a potential is applied to attract the atoms formallybonded to their covalent bond distance. This allows the macro-cycle conformation to be fully sampled flexibly during dock-ing.

Nearly all BACE-1 ligands of GC4 share a common sub-structure that is also found in several PDB structures. Thissubstructure consists of a hydroxyl group attached to a shorttwo-carbon aliphatic chain, followed by an amide. Based onpublicly available BACE-1 structures, this substructure hasa conserved binding mode, making several hydrogen bondswith the binding pocket. To exploit this information duringthe docking, we used the ligand in the PDB 4DPF as a tem-plate for the position of the common substructure of GC4ligands. A biasing penalty was applied to three atoms: thehydroxyl oxygen, one of the carbons in the aliphatic chainand the nitrogen in the amide. This penalty increases lin-early with the distance from the corresponding atom of thetemplate structure. If this distance is below 1.2 A, the bias-ing penalty is zero, and the output score corresponds to theoriginal, unaltered AutoDock4.2 scoring function.

Page 5: Comparison of Ligand Affinity Ranking Using AutoDock-GPU

4

To take advantage of further similarities with BACE-1ligands in the PDB, we used pose filters to discard dockedposes that differ from the binding modes observed in 2F3F,4DPF and 4K8S. These pose filters act on specific chemicalmotifs that occur both in GC4 ligands and in these referencestructures. The chemical motifs were identified manually byvisual inspection, and the filtering process was automatedusing Python scripts and OpenBabel [32,33].

This docking protocol, including the biasing penalty andpose filters, was used to predict the binding poses of the 20BACE-1 ligands in Stages 1a and 1b of GC4. We describethe performance of variations of this protocol, as well as fur-ther details about the methodology in a separate publication[34].

Re-evaluating the binding affinities of the ligands usingMM-GBSA scores

We generated a 14ns long MD trajectory for each of theprotein-ligand complexes in explicit solvent and then re-evaluated the binding scores using end-point MM-GBSAcalculations. The MD simulations were performed using thepmemd.cuda module of Amber18 simulation package [35].We added partial charges to the ligand atoms using the An-techamber program (from Amber 16 package [36]) and AM1-BCC charge model [22]. The simulated system was preparedusing tleap (also available in Amber16 package) and usedAmberff99sb [37], GAFF version 1.8 [38] and TIP3P water[39] for the protein, ligand and water force field respectively.The protein-ligand complex was placed in a cubic simula-tion box with 10 A of water surrounding the complex. Wenext added Na+ and Cl- ions to neutralize the system andto ensure a salt concentration of 0.1M. The protein heavyatom-hydrogen bonds were constrained using SHAKE. Thesimulation used a time step of 2 fs. Particle mesh Ewaldmethod was used to evaluate long-range electrostatic inter-actions with 9.0 A cutoff for the real space electrostatics andvan der Waals forces.

We first minimized the ligand, water, and the ions for1000 steps with 25 kcal/mol-A2 positional restraints on theprotein, followed by another 1000 steps of minimization withthe protein restraints reduced to 10 kcal/mol-A2. Next, thesystem was heated from 10 K to 300 K in NVT ensemble for140 ps with 10 kcal/mol-A2 restraints on the protein-ligandcomplex. We then successively decreased the restraints onthe protein-ligand complex for 20 ps to first 5 and then to2 kcal/mol-A2, followed by 2 kcal/mol-A2 restraint only onthe ligand. We used Langevin thermostat with a collisionfrequency of 2 ps−1 to maintain the temperature of the sys-tem. We simulated the system for 14 ns in NPT ensemblefor the production run. Isotropic pressure scaling was usedto regulate the pressure with a relaxation time of 2 ps. Thefirst four ns was discarded as equilibration.

We saved the positions of the atoms every 100 ps duringMD. The final trajectory used for the MM-GBSA calcula-tions consisted of 100 frames that correspond to the last 10ns of the production trajectory. MM-GBSA scores were cal-culated using the MMPBSA.py program [40] in Amber16 at asalt concentration of 0.1 nM and using the GBneck2 model[41]. Quasi-harmonic approximation was used to approxi-mate the solute entropy.

Results and Discussions

Our main goal in this work was to see whether re-scoringdocked poses with MM-GBSA scores can improve the cor-relation of predicted binding affinities with experimental val-ues. We used AutoDock4.2 scores and MM-GBSA scoresto rank the binding affinities of BACE-1 ligands in Stage 2in D3R GC4. Our workflow consists of a series of entirelyautomated steps, starting with a SMILES representation ofthe ligands up to the MM-GBSA score. For this reason, theresults reported herein are representative of automated ap-proaches.

z3uni

urt

76

x0qtn

tjny7

rnbjh

jjbys

wia

0t

jscxd

jvbjy

3iq

gq

ich8p

8dsky

u867k

zcw

fqzanor

utg

v6

dxji8

cq7ug

pupet

k250i

ycdh7

6jy

jp5rd

da

0cz3i

cqnrg

p280k

yboen

ufr

7g

aadjz

zcngp

c6b63

pxhuz

xx4i5

py7r6

dzyxt

i88w

au7r6

ykzsv5

q6m

vt

pngkk

sycgn

n4p8a

p2f2

260tp

qnneeb

hfx

sa

b84m

jtf236

b7bv0

xhiji

fgth

s0fk

x5

mhkzs

auoh3

-0.2

-0.1

0

0.1

0.2

0.3

0.4

0.5

Fig. 1 Ranking of our submissions based on Kendall’s τ in theBACE-1 affinity ranking challenge. The submissions include bothstructure-based and ligand-based predictions. The gray bar corre-sponds to the submission with AutoDock score and the green one forthe one with re-scored binding affinities using MM-GBSA calcula-tions. There was negligible improvement in the correlations with ex-periments with the re-scored submission.

The performance of docking and MM-GBSA scores inpredicting the affinities of the 154 ligands is assessed by theKendall’s τ and Spearman’s ρ rank correlation coefficients.We also report Pearson’s r and R2. We have compiled allthese metrics for our predictions in Table 1. Since all met-rics lead to the same conclusions, we base our discussion onKendall’s τ values, which is the first metric reported in theD3R evaluation page.

Page 6: Comparison of Ligand Affinity Ranking Using AutoDock-GPU

Comparison of ligand affinity ranking using AutoDock-GPU and MM-GBSA scores in the D3R Grand Challenge 4 5

Table 1 Correlation coefficients between predicted and experimental affinities. R2 is Pearson’s r squared. Standard deviations were calculatedfrom 10000 bootstrap samples.

All (n=154)Kendall’s τ Pearson’s r Spearman’s ρ R2

AutoDock4 a 0.19 ± 0.06 0.30 ± 0.09 0.27 ± 0.08 0.09 ± 0.05MM-GBSA b 0.20 ± 0.06 0.31 ± 0.08 0.30 ± 0.08 0.10 ± 0.05

Charge 0 (n=18)Kendall’s τ Pearson’s r Spearman’s ρ R2

AutoDock4 0.05 ± 0.23 0.25 ± 0.32 0.06 ± 0.30 0.06 ± 0.14MM-GBSA 0.44 ± 0.15 0.65 ± 0.11 0.62 ± 0.18 0.42 ± 0.13

Charge +1 (n=132)Kendall’s τ Pearson’s r Spearman’s ρ R2

AutoDock4 0.23 ± 0.06 0.32 ± 0.09 0.31 ± 0.09 0.10 ± 0.06MM-GBSA 0.19 ± 0.07 0.30 ± 0.09 0.27 ± 0.09 0.09 ± 0.05

a Fig. 2a, submission cq7ugb Fig. 2b, submission utgv6

Re-scoring docked poses with MM-GBSA did not im-prove correlation with experimental values. The Kendall’sτ between experimental pKd and AutoDock4.2 scores (sub-mission cq7ug) is 0.19± 0.06, and that of MM-GBSA scores(submission utgv6) is 0.20± 0.06. Thus, the predictive per-formance of these methods is statistically identical and bothof them correlate poorly with the experimental values. Nev-ertheless, our predictions are statistically better than a ran-dom prediction which has an average Kendall tau of zero. Inthe context of all submissions to Stage 2 of GC4, our pre-dictions ranked in the top third (Fig. 1).

MM-GBSA calculations have more detailed representa-tion of the underlying physics at play than docking scoresand literature work from other groups suggested they wouldbe more accurate [9,11,13]. So it was perhaps surprisingthat there was not any improvement in the affinity estima-tion with the MM-GBSA rescoring. We also saw that therewas no correlation between the AutoDock4 scores and MM-GBSA scores (Kendall’s τ equal to -0.06 ± 0.05). Thesefindings prompted us to investigate more about our protocolto check whether any change in the simulation conditionscould improve the results.

Ligands with different net charge have differentcorrelation metrics

With the goal of identifying aspects of the MM-GBSA ap-proach that could be improved, we searched for features thatare associated with particularly good predictions. Our liganddataset consisted of both positive and neutral ligands. Previ-ous work by Rastelli et al. [15] has shown that there is adecrease in correlation between predicted and experimen-tal affinities for ligands with different formal charges, whichled us to separately analyze ligands with different formalcharges (Table 1).

We found that the predicted affinities of ligands mod-eled in a neutral state (n=18) exhibited better rank correla-tion (Kendall’s τ of 0.44± 0.15) with experiment than thosefor ligands modeled with a +1 charge (Kendall’s τ of 0.19±0.07). This suggests that neutral ligands are more amenableto MM-GBSA calculations. Sun et al. [10] have reported lig-and binding affinity prediction accuracy degrading with netcharge of the ligand. Their Pearson’s r degraded from 0.608± 0.003 for ligands with net charge zero to 0.564 ± 0.003for those with net charge one. It is to be noted here, that inour study the sample size for neutral ligands is small (only18 ligands).

Predictive performance varies with proteinconformations

A total of ten protein conformations were considered fordocking the ligands. The MM-GBSA calculations were per-formed using the protein conformation that produced theligand pose with the best docking score, according to theAutoDock4.2 scoring function. The majority of ligands weresimulated in the protein conformations associated with PDBs4EWO (n=75) and 2WF3 (n=69).

We next decided to investigate whether different proteinconformations resulted in different correlation metrics. Ta-ble 2 lists the correlation statistics for subsets of ligandsdocked and simulated in different protein conformations.Fig. 2 plots the predicted MM-GBSA scores against the ex-perimental binding affinities for different protein conforma-tions.

The subset of ligands simulated in 4EWO exhibited poorerrank correlation with experiment (Kendall’s τ of 0.15 ±0.09) for the MM-GBSA scores compared to the other pro-tein structures used - 2WF3, 2B8V, 2P4J (Kendall’s τ of0.36 ± 0.07). For AutoDock4 scores, we see similar rankcorrelation coefficients for different protein conformations.

Page 7: Comparison of Ligand Affinity Ranking Using AutoDock-GPU

6

Fig. 2 Plots of predicted scores against the logarithm of experimental binding affinities. a) Docking scores (submission ID cq7ug). b) MM-GBSAscores (submission ID utgv6). AutoDock4 scores and MM-GBSA scores had similiar correlation metrics in our original binding affinity prediction(submissions a and b). c) MM-GBSA scores based on MD simulation in 4EWO were modified to have a single protonated aspartate instead of two.Correlation metrics improved with new protonation state. d) 71 out of 75 ligands originally simulated in 4EWO were simulated in 2WF3 instead.We saw further improvement in correlation metrics with the 2WF3 receptor structures.

Hence in the succeeding work, we looked into the proteinstructure 4EWO to see whether we modeled it correctly.

Protonation states affect MM-GBSA scores

We noticed while doing post-analysis of our results that thecatalytic aspartyl dyad (Asp32, Asp228) of BACE-1 in theprepared protein structure for 4EWO had both aspartates inthe protonated form (i.e., Asp32H , Asp228H ). This is a pos-sible source of error, because in the apo state, Asp32 is pro-tonated and Asp228 is de-protonated in the active pH range(3.5-5.5) of BACE-1 [42]. Although the protonation state ofthe aspartyl dyad changes in the presence of inhibitors [43,44], it is a safer choice to model the aspartyl dyad as Asp32protonated, and Asp228 de-protonated (Asp32H , Asp228−).

Hence, we decided to recalculate the MM-GBSA scores ofthe ligands docked and simulated in 4EWO, but with theAsp32H , Asp228− protonation state of the 4EWO structure.

The correlation statistics of the 4EWO simulations withthe Asp32H , Asp228− protonation state are reported in Ta-ble 3 and the plot of predicted versus experimental affinitiesis depicted in Fig. 2c. Also, Fig. S1 shows the distribution ofthe MM-GBSA scores for the two protonation states. Over-all, the MM-GBSA scores were lower for the single pro-tonated aspartyl dyad compared to the double protonated,indicating that the binding pose is more stable for the sin-gle protonation state. The Kendall τ improved by about onestandard deviation, confirming that the Asp32H , Asp228−

form is more adequate than modeling both Asp as proto-nated.

Page 8: Comparison of Ligand Affinity Ranking Using AutoDock-GPU

Comparison of ligand affinity ranking using AutoDock-GPU and MM-GBSA scores in the D3R Grand Challenge 4 7

Moreover, we computed the RMSD of the ligands be-tween the initial docked poses and the final states at theend of the MD simulations for the two protonation statesof 4EWO: i) Asp32H , Asp228H and ii) Asp32H , Asp228−.We used Chimera [45] to align the active site of each initialBACE-1-ligand complex to the last frame of the correspond-ing MD trajectory. The alignment of the active site was donewithin 5 A of the ligand. Then, we computed the RMSD val-ues with Chimera. Out of 75 ligands, 54 had lower RMSDvalues when docked and simulated in BACE-1 with the sin-gle protonated aspartyl dyad (Asp32H , Asp228−). The RMSDvalues are reported in the Supplementary Information avail-able on https://github.com/MobleyLab/D3R-2018-AutoDock-MMGBSA/blob/master/RMSD.csv. This result shows thatthe binding mode of BACE-1 ligands depends on the proto-nation states of aspartates 32 and 228 and confirms that it isbetter to model the protonation state of BACE-1 as Asp32H ,Asp228− for the given ligand dataset. Overall, these find-ings suggest that in large-scale binding affinity calculations,it is better to use the protonation state of the apo form of theprotein.

Simulating in 2WF3 instead of 4EWO improvescalculated binding affinities

Since the correlation between predicted and experimentalaffinities was better for the subset of ligands modeled in thestructure 2WF3 than in 4EWO (Table 2), we recalculatedthe docking and MM-GBSA scores for the ligands mod-eled in 4EWO using the 2WF3 structure instead. Then, thecorrelation coefficients on the entire set were computed us-ing the updated scores (Table 3). The performance of MM-GBSA improved, displaying a Kendall τ equal to 0.30 ±0.05, while the performance of docking remained constant(Kendall τ equal to 0.21 ± 0.06). This shows that MM-GBSA is potentially better than docking, in agreement withthe more accurate physical description, but the results aresensitive to modeling choices, such as protein conformationand protonation state, making it difficult to achieve optimalpredictive performance in a prospective context.

The largest difference between 2WF3 and 4EWO is inthe ‘flap’ region (Fig. 3), which interacts extensively withBACE-1 ligands. Interestingly, docking scores are generallylower when docking to 4EWO (Fig. 2a), while MM-GBSAscores are lower when simulated in 2WF3 (Fig. 2b and S2).The exact reasons behind this opposite trend are unknown,but we speculate that none of the 10 protein conformationsused for docking was “good enough” to accommodate the75 ligands that displayed lower (i.e. better) docking scoresin 4EWO. Arguably, the docking score was better in 4EWObecause the flap is slightly more open, thereby accommodat-ing ligands that could not fit as well in 2WF3. On the other

hand, we argue that 2WF3 is more representative of the ac-tual protein-ligand complex, resulting in lower (i.e. better)MM-GBSA scores on average.

Overall, these arguments highlight the sensitivity of dock-ing to receptor conformation, and motivate the inclusion offlexibility into the receptor [46], not only for the improve-ment of docking methodology by itself, but also for provid-ing better docked poses for free energy calculations.

Fig. 3 Differences in the ‘flap’ region formed by the residues 67-75 inthe 4EWO (in tan) and 2WF3 (in blue) co-crystal structures of BACE-1. The mesh region is the binding site for the macrocycles in the D3Rdataset. The β -hairpin of the ‘flap’ region of 2WF3 is narrower than4EWO with the Gln73 side-chain being pointed towards the bindingsite. This results in the binding pocket of 2WF3 being slightly smallerthan that of 4EWO.

In this study, we looked at two different parameters forthe MM-GBSA calculations, namely protonation state andprotein conformation during our retrospective analysis. Bothof these parameters turned out to have non-trivial impact onthe calculated binding affinities. However, they are manymore MM-GBSA calculation parameters which can possi-bly affect the binding affinities which were not explored inthe current work.

We performed the MM-GBSA calculations using the GB-neck2 model in this work, which has not, to our knowledge,been benchmarked against older GBSA models (GB-HCT,GB-OBC1 and GB-OBC2 [47,48]) for affinity prediction.Protein targets are sensitive to GBSA models [10], henceusing older GBSA models might improve the binding affini-ties. Another option is to perform MM-PBSA calculationsinstead, which are physically more accurate and computa-tionally more expensive.

We used the popular ‘single trajectory protocol’ [49] inthis work, which involves simulating only the protein-ligandcomplex and re-purposing the bound conformations of theprotein and ligand for the unbound calculations. It is as-sumed that the protein and the ligand sample similar con-formations in the bound and unbound states which mightnot be always valid. We do not expect to see a big differencein predicted affinities when using the alternative ‘multiple

Page 9: Comparison of Ligand Affinity Ranking Using AutoDock-GPU

8

Table 2 Correlation coefficients between predicted and experimental ligand binding affinities for subsets of BACE-1 ligands modeled in differentprotein conformations. R2 is Pearson’s r squared. The standard deviations do not account for the experimental uncertainty. 4EWO, 2WF3, 2B8V,and 2P4J are the PDB codes of the different protein conformations used. n refers to the total number of ligands in each dataset. For MM-GBSAscores, affinities of ligands modeled in 4EWO have the weakest correlation with the experimental affinities. For AutoDock4, affinities of ligandsmodeled in different protein conformations have similar correlation coefficients with the experimental values.

4EWO (n=75)Kendall’s τ Pearson’s r Spearman’s ρ R2

AutoDock4 0.26 ± 0.09 0.32 ± 0.14 0.36 ± 0.12 0.10 ± 0.09MM-GBSA 0.15 ± 0.09 0.27 ± 0.12 0.22 ± 0.12 0.08 ± 0.06

2WF3 (n=69), 2B8V (n=5) or 2P4J (n=5)Kendall’s τ Pearson’s r Spearman’s ρ R2

AutoDock4 0.23 ± 0.09 0.33 ± 0.12 0.29 ± 0.12 0.11 ± 0.07MM-GBSA 0.36 ± 0.07 0.53 ± 0.09 0.50 ± 0.09 0.28 ± 0.09

Table 3 Correlation coefficients between predicted and experimental ligand binding affinities for the entire BACE-1 ligands set. R2 is Pearson’sr squared. The standard deviations do not account for the experimental uncertainty. n refers to the total number of ligands. Ligands originallysimulated in 4EWO with a double protonated aspartyl dyad (Asp32H Asp228H ), were modeled again in 4EWO with a single protonated aspartyldyad (Asp32H Asp228−) or in 2WF3 structure. Compared to 4EWO with Asp32H Asp228H , using 4EWO with Asp32H Asp228− improvedthe correlation between predicted and experimental binding affinities. The highest correlation between predicted and experimental affinities wasobtained using the MM-GBSA method and 2WF3 structure.

Retrospective calculations (n=154)Kendall’s τ Pearson’s r Spearman’s ρ R2 Modifications to ligands originally simulated in 4EWO

MM-GBSAc 0.25 ± 0.06 0.38 ± 0.08 0.37 ± 0.08 0.15 ± 0.06 4EWO structure modeled as Asp32H Asp228−

MM-GBSAd 0.30 ± 0.05 0.44 ± 0.07 0.44 ± 0.07 0.20 ± 0.06 Ligands docked and simulated in 2WF3 structureAutoDock4 0.21 ± 0.06 0.29 ± 0.09 0.29 ± 0.08 0.08 ± 0.05 Ligands docked to 2WF3 structure

c Fig. 2cd Fig. 2d

trajectory protocol’. However, it may be worth trying this inthe future for the macrocycles in the current dataset sincethe macrocycles do not have much flexibility in the bindingpocket due to their size, and might possibly sample otherconformations when simulated in pure solvent.

Other parameters which could be investigated in this con-text are different entropic approximations, ligand charge mod-els and the solute or interior dielectric constant [14]. Lastly,another avenue worth exploring from a cost-cutting point ofview is to use single-point minimized structures for the MM-GBSA calculations, similar to single-point docking scorecalculations. There are studies present in the literature [15,24] which have shown similar correlations between MM-GBSA scores obtained using single-point structures and thoseusing ensembles of MD generated structures. However, re-cently published literature suggests that most studies stilluse multiple conformational snapshots from MD for the MM-GBSA calculations [50–52].

Conclusions

There are two components in structure-based affinity predic-tion challenges – pose prediction and binding score evalua-tion. Both of these contribute to the accuracy of predictedbinding affinities. In order to evaluate only the affinity pre-diction capability of different scoring methods, the starting

protein-ligand binding poses need to be the same, which israrely the case in D3R Grand Challenges where each partic-ipating group has its own pose prediction and binding scoreevaluation methodology.

To better separate pose prediction from scoring, our twogroups decided to collaborate and combine our different ar-eas of expertise in this study – docking and free energycalculations and participate in the structure-based bindingaffinity prediction challenge for the target BACE-1 in D3RGC4. We used AutoDock-GPU for docking the ligands, em-ploying the AutoDock4 scoring function, and then calcu-lated MM-GBSA binding affinities. MM-GBSA binding en-ergy calculations did not improve the predictive performancewith respect to AutoDock4 scores. In a retrospective analy-sis, we identified two modeling aspects that were detrimen-tal to the quality of MM-GBSA scores, namely the choice ofprotein conformation and the protonation state of residuesin the binding pocket. While it is clear that MM-GBSA canmake better predictions than docking scores, making the op-timal modeling choices is a non-trivial task that requiresknowledge of the system under study and thus is likely dif-ficult in a prospective setting.

Thus, our assessment of MM-GBSA performance roughlyagrees with that of several previous research groups whichparticipated in previous D3R Grand Challenges [16–18] –specifically, we find that MM-GBSA does not perform par-

Page 10: Comparison of Ligand Affinity Ranking Using AutoDock-GPU

Comparison of ligand affinity ranking using AutoDock-GPU and MM-GBSA scores in the D3R Grand Challenge 4 9

ticularly well at binding affinity estimation. These results –not just our own – highlight the importance of blind chal-lenges for the community to evaluate method performance,particularly for drug design projects.

Acknowledgements SS, LEK and DM thank Christopher I. Bayly(OpenEye Scientific Software) for helpful discussions on MM-GBSAcalculations. SS, LEK and DM also acknowledge OpenEye ScientificSoftware for licensing the pieces of software used in this work. TheNational Institutes of Health supported this work through grants1R01GM108889-01 (DLM), R01 GM069832 (DSM, JE, SF) and U54-GM103368 (GB). LSV and AK thank the German Academic ExchangeService (DAAD) and the Peruvian National Program for Scholarshipsand Educational Loans (PRONABEC) for financial aid.

Supplementary Materials

The Supporting Information is available free of charge onhttps://github.com/MobleyLab/D3R-2018-AutoDock-MMGBSA/and includes all the analysis scripts and input files used forMM-GBSA calculations in this work.

References

1. D.B. Kitchen, H. Decornez, J.R. Furr, J. Bajorath, Nat RevDrug Discov 3(11), 935 (2004). DOI 10.1038/nrd1549. URLhttps://doi.org/10.1038/nrd1549

2. K. Heikamp, J. Bajorath, Chem Biol Drug Des81(1), 33 (2013). DOI 10.1111/cbdd.12054. URLhttps://onlinelibrary.wiley.com/doi/abs/10.1111/cbdd.12054

3. M.K. Gilson, H.X. Zhou, Annu Rev Biophys Biomol Struct 36(1),21 (2007). DOI 10.1146/annurev.biophys.36.040306.132550.URL https://doi.org/10.1146/annurev.biophys.36.040306.132550.PMID: 17201676

4. Z. Gaieb, C.D. Parks, M. Chiu, H. Yang, C. Shao, W.P. Wal-ters, M.H. Lambert, N. Nevins, S.D. Bembenek, M.K. Ameriks,T. Mirzadegan, S.K. Burley, R.E. Amaro, M.K. Gilson, J ComputAided Mol Des 33(1), 1 (2019). DOI 10.1007/s10822-018-0180-4. URL https://doi.org/10.1007/s10822-018-0180-4

5. J. Yin, N.M. Henriksen, D.R. Slochower, M.R. Shirts, M.W.Chiu, D.L. Mobley, M.K. Gilson, J Comput Aided Mol Des31(1), 1 (2017). DOI 10.1007/s10822-016-9974-4. URLhttps://doi.org/10.1007/s10822-016-9974-4

6. R. Vassar, B.D. Bennett, S. Babu-Khan, S. Kahn, E.A. Men-diaz, P. Denis, D.B. Teplow, S. Ross, P. Amarante, R. Loeloff,Y. Luo, S. Fisher, J. Fuller, S. Edenson, J. Lile, M.A.Jarosinski, A.L. Biere, E. Curran, T. Burgess, J.C. Louis,F. Collins, J. Treanor, G. Rogers, M. Citron, Science 286(5440),735 (1999). DOI 10.1126/science.286.5440.735. URLhttps://science.sciencemag.org/content/286/5440/735

7. S. Genheden, U. Ryde, Expert Opin Drug Discov 10(5), 449(2015). DOI 10.1517/17460441.2015.1032936

8. P.A. Kollman, I. Massova, C. Reyes, B. Kuhn, S. Huo,L. Chong, M. Lee, T. Lee, Y. Duan, W. Wang, O. Donini,P. Cieplak, J. Srinivasan, D.A. Case, T.E. Cheatham, Acc ChemRes 33(12), 889 (2000). DOI 10.1021/ar000033j. URLhttps://doi.org/10.1021/ar000033j. PMID: 11123888

9. J.W. Kaus, E. Harder, T. Lin, R. Abel, J.A. Mc-Cammon, L. Wang, J Chem Theory Comput 11(6),2670 (2015). DOI 10.1021/acs.jctc.5b00214. URLhttps://doi.org/10.1021/acs.jctc.5b00214

10. T. Hou, J. Wang, Y. Li, W. Wang, J Chem Inf Model 51(1), 69(2010). DOI 10.1039/c4cp01388c

11. P.A. Greenidge, C. Kramer, J.C. Mozziconacci, W. Sherman, JChem Inf Model 54(10), 2697 (2014). DOI 10.1021/ci5003735.URL https://doi.org/10.1021/ci5003735. PMID: 25266271

12. C. Wang, D. Greene, L. Xiao, R. Qi, R. Luo, Front MolBiosci 4, 87 (2018). DOI 10.3389/fmolb.2017.00087. URLhttps://www.frontiersin.org/article/10.3389/fmolb.2017.00087

13. I. Slynko, M. Scharfe, T. Rumpf, J. Eib, E. Metzger, R. Schle,M. Jung, W. Sippl, J Chem Inf Model 54(1), 138 (2014). DOI10.1021/ci400628q. URL https://doi.org/10.1021/ci400628q.PMID: 24377786

14. H. Sun, Y. Li, M. Shen, S. Tian, L. Xu, P. Pan,Y. Guan, T. Hou, Phys Chem Chem Phys 16,22035 (2014). DOI 10.1039/C4CP03179B. URLhttp://dx.doi.org/10.1039/C4CP03179B

15. G. Rastelli, A. Del Rio, G. Degliesposti, M. Sgobba, J Com-put Chem 31(4), 797 (2010). DOI 10.1002/jcc.21372. URLhttps://doi.org/10.1002/jcc.21372

16. M. Reau, F. Langenfeld, J.F. Zagury, M. Montes, J Comput AidedMol Des 32(1), 231 (2018). DOI 10.1007/s10822-017-0063-0.URL https://doi.org/10.1007/s10822-017-0063-0

17. M. Misini Ignjatovic, O. Caldararu, G. Dong, C. Munoz-Gutierrez, F. Adasme-Carreno, U. Ryde, J Comput Aided MolDes 30(9), 707 (2016). DOI 10.1007/s10822-016-9942-z. URLhttps://doi.org/10.1007/s10822-016-9942-z

18. V. Salmaso, M. Sturlese, A. Cuzzolin, S. Moro, J Comput AidedMol Des 32(1), 251 (2018). DOI 10.1007/s10822-017-0051-4.URL https://doi.org/10.1007/s10822-017-0051-4

19. R. Huey, G.M. Morris, A.J. Olson, D.S. Goodsell, J Comput Chem28(6), 1145 (2007)

20. P.F. Stouten, C. Frommel, H. Nakamura, C. Sander, Mol Simulat10(2-6), 97 (1993)

21. J. Gasteiger, M. Marsili, Tetrahedron 36(22), 3219 (1980)22. A. Jakalian, D.B. Jack, C.I. Bayly, J Comput Chem

23(16), 1623 (2002). DOI 10.1002/jcc.10128. URLhttps://doi.org/10.1002/jcc.10128

23. P.D. Lyne, M.L. Lamb, J.C. Saeh, J Med Chem49(16), 4805 (2006). DOI 10.1021/jm060522a. URLhttps://doi.org/10.1021/jm060522a

24. P.C. Su, C.C. Tsai, S. Mehboob, K.E. Hevener, M.E. Johnson,J Comput Chem 36(25), 1859 (2015). DOI 10.1002/jcc.24011.URL https://onlinelibrary.wiley.com/doi/abs/10.1002/jcc.24011

25. S.Y. Huang, S.Z. Grinter, X. Zou, Phys Chem Chem Phys 12(40),12899 (2010). DOI 10.1039/C0CP00151A

26. D.L. Mobley, M.K. Gilson, Annu Rev Biophys 46(1), 531(2017). DOI 10.1146/annurev-biophys-070816-033654. URLhttps://doi.org/10.1146/annurev-biophys-070816-033654

27. C.E. Chang, W. Chen, M.K. Gilson, J Chem Theory Com-put 1(5), 1017 (2005). DOI 10.1021/ct0500904. URLhttps://doi.org/10.1021/ct0500904

28. B.R. Brooks, D. Janezic, M. Karplus, J Comput Chem16(12), 1522 (1995). DOI 10.1002/jcc.540161209. URLhttps://onlinelibrary.wiley.com/doi/abs/10.1002/jcc.540161209

29. M.D. Zeiler, arXiv preprint arXiv:1212.5701 (2012)30. J.M. Word, S.C. Lovell, J.S. Richardson, D.C. Richardson, J Mol

Biol 285(4), 1735 (1999)31. S. Forli, R. Huey, M.E. Pique, M.F. Sanner, D.S. Goodsell, A.J.

Olson, Nat Protoc 11(5), 905 (2016)32. N.M. O’Boyle, M. Banck, C.A. James, C. Morley, T. Vandermeer-

sch, G.R. Hutchison, J Cheminform 3(1), 33 (2011)33. N.M. O’Boyle, C. Morley, G.R. Hutchison, Chem Cent J 2(1), 5

(2008)34. D. Santos-Martins, et. al., In preparation35. D. Case, S. Brozell, D. Cerutti, T.I. Cheatham, V. Cruzeiro,

T. Darden, R. Duke, D. Ghoreishi, H. Gohlke, A. Goetz,D. Greene, R. Harris, N. Homeyer, S. Izadi, A. Kovalenko, T. Lee,

Page 11: Comparison of Ligand Affinity Ranking Using AutoDock-GPU

10

S. LeGrand, P. Li, C. Lin, J. Liu, T. Luchko, R. Luo, D. Mer-melstein, K. Merz, Y. Miao, G. Monard, H. Nguyen, I. Omelyan,A. Onufriev, F. Pan, R. Qi, D. Roe, A. Roitberg, C. Sagui,S. Schott-Verdugo, J. Shen, C. Simmerling, J. Smith, J. Swails,R. Walker, J. Wang, H. Wei, R. Wolf, X. Wu, L. Xiao, D. York,P. Kollman. Amber 2018, university of california, san francisco(2018)

36. D. Case, D. Cerutti, T. Cheateham, T. Darden, R. Duke, T. Giese,H. Gohlke, A. Goetz, D. Greene, N. Homeyer, C. Simmerling,W. Botello-Smith, J. Swail, R. Walker, J. Wang, R. Wolf, X. Wu,L. Xiao, P. Kollman. Amber 2016, university of california, sanfrancisco (2016)

37. V. Hornak, R. Abel, A. Okur, B. Strockbine, A. Roitberg, C. Sim-merling, Proteins Struct Funct Bioinf 65, 712 (2006). DOI10.1002/prot.21123. URL https://doi.org/10.1002/prot.21123

38. J. Wang, R.M. Wolf, J.W. Caldwell, P.A. Kollman, D.A. Case, JComp Chem 25(9), 1157 (2004). DOI 10.1002/jcc.20035. URLhttps://doi.org/10.1002/jcc.20035

39. W.L. Jorgensen, J. Chandrasekhar, J.D. Madura, R.W. Im-pey, M.L. Klein, J Chem Phys 79(2), 926 (1983). DOI10.1063/1.445869. URL https://doi.org/10.1063/1.445869

40. B.R. Miller, T.D. McGee, J.M. Swails, N. Homeyer, H. Gohlke,A.E. Roitberg, J Chem Theory Comput 8(9), 3314 (2012). DOI10.1021/ct300418h. URL https://doi.org/10.1021/ct300418h

41. H. Nguyen, D.R. Roe, C. Simmerling, J Chem Theory Com-put 9(4), 2020 (2013). DOI 10.1021/ct3010485. URLhttps://doi.org/10.1021/ct3010485

42. H. Shimizu, A. Tosaki, K. Kaneko, T. Hisano, T. Sakurai,N. Nukina, Mol Cell Biol 28(11), 3663 (2008). DOI10.1128/MCB.02185-07

43. C.R. Ellis, C.C. Tsai, X. Hou, J. Shen, J Phys Chem Lett 7(6), 944(2016). DOI 10.1021/acs.jpclett.6b00137

44. M.O. Kim, P.G. Blachly, J.A. McCammon, PLoS Comput Biol11(10), 1 (2015). DOI 10.1371/journal.pcbi.1004341

45. E.F. Pettersen, T.D. Goddard, C.C. Huang, G.S. Couch, D.M.Greenblatt, E.C. Meng, T.E. Ferrin, J Comp Chem 25(13), 1605(2004). DOI 10.1002/jcc.20084

46. P.A. Ravindranath, S. Forli, D.S. Goodsell, A.J. Olson, M.F. San-ner, PLoS Comput Biol 11(12), e1004586 (2015)

47. M. Feig, A. Onufriev, M.S. Lee, W. Im, D.A.Case, C.L. Brooks III, J Comput Chem 25(2),265 (2004). DOI 10.1002/jcc.10378. URLhttps://onlinelibrary.wiley.com/doi/abs/10.1002/jcc.10378

48. A. Onufriev, D. Bashford, D.A. Case, Proteins Struct FunctBioinf 55(2), 383 (2004). DOI 10.1002/prot.20033. URLhttps://onlinelibrary.wiley.com/doi/abs/10.1002/prot.20033

49. M.R. Shirts, D.L. Mobley, S.P. Brown, Drug design: structure-andligand-based approaches pp. 61–86 (2010)

50. Y. Niu, X. Yao, H. Ji, RSC Adv 9(22), 12441 (2019). DOI10.1039/C9RA01657K

51. S. Hu, Y. Dong, X. Zhao, L. Zhang, J Mol Graph Model (2019).DOI 10.1016/j.jmgm.2019.03.022

52. S.K. Mishra, J. Koca, J Phys Chem B 122(34), 8113 (2018). DOI10.1021/acs.jpcb.8b03655

Page 12: Comparison of Ligand Affinity Ranking Using AutoDock-GPU

Comparison of ligand affinity ranking using AutoDock-GPU and MM-GBSA scores in the D3R Grand Challenge 4 1

Supplementary Information

-80 -70 -60 -50 -40 -30 -20 -10MM-GBSA score

0

5

10

15

# lig

ands

4EWO - Asp32 H , Asp228 H

4EWO - Asp32 H , Asp228 -

Fig. S1 Distribution of the calculated MM-GBSA scores for different protonation states of the receptor 4EWO. Overall, the single protonation state(Asp32H , Asp228−) resulted in lower MM-GBSA scores than the double protonation state (Asp32H , Asp228H ).

-80 -70 -60 -50 -40 -30 -20 -10MM-GBSA score

0

5

10

15

# lig

ands

4EWO - Asp32 H , Asp228 H

2WF3

Fig. S2 Distribution of the calculated MM-GBSA scores for the two receptor conformations - 4EWO and 2WF3 for the same set of 75 ligands. In general,the receptor 2WF3 resulted in lower MM-GBSA scores than 4EWO.