protein:ligand standard binding free energies: a tutorial ... · protein:ligand standard binding...
TRANSCRIPT
School of PhysicsGeorgia Institute of Technology
Department of Biochemistry and Molecular BiologyGordon Center for Integrative ScienceThe University of Chicago
Centre National de la Recherche ScientifiqueLaboratoire International Associe CNRS-UIUCUniversite de Lorraine
University of Illinois at Urbana-ChampaignBeckman Institute for Advanced Science and TechnologyTheoretical and Computational Biophysics Group
Protein:ligand standard binding free energies:A tutorial for alchemical and geometrical transformations
James GumbartBenoıt Roux
Christophe ChipotNovember 5, 2018
Please visit www.ks.uiuc.edu/Training/Tutorials/ to get the latest version of this tutorial, to obtainmore tutorials like this one, or to join the [email protected] mailing list for additionalhelp.
1
Abstract
This tutorial sets out to demonstrate the application of numerical simulations to the calculation ofthe standard binding free energy of a protein:ligand complex. To a large extent, computation of stan-dard binding free energies remains a daunting theoretical challenge on account of the considerablevariations in conformational, translational and orientational entropies that accompany the associationof the substrate to the host protein and is not easily captured by conventional molecular-dynamicssimulations. Sampling these entropic contributions is addressed here, following two distinct routes,an alchemical one and a geometrical one, wherein restraining potentials have been introduced to acton the collective variables that define the conformation of the ligand and its relative position and ori-entation with respect to the protein. The methodology is illustrated by the well-documented exampleof a small, proline-rich peptide, referred to as p41, associating to the Src homology 3 domain of atyrosine kinase with a standard binding free energy of −7.94 kcal/mol. Considering the difficultiesthat current additive force fields face to describe organic ligands, e.g., drugs, the choice of an all-peptide-based molecular assembly is particularly fitting, allowing the reader to focus primarily onthe methodology and the sampling.
c© 2018, Centre National de la Recherche Scientifique, University of Illinois, Urbana–Champaign
2
Protein:ligand standard binding free energies 3
Contents
1. Introduction 4
1.1. Theoretical underpinnings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2. Geometrical transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3. Alchemical transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2. Setting up the simulations 11
2.1. Construction of the molecular assemblies . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2. Definition of the collective variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.3. The geometrical-route simulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.4. The alchemical-route simulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3. Running and analyzing the simulations 18
3.1. The geometrical route . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.2. The alchemical route . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
4. Concluding remarks and extensions of the tutorial 24
Protein:ligand standard binding free energies 4
1. Introduction
The primary objective of this tutorial is to compute the standard binding free energy of a ligand to a
protein, using two distinct strategies, relying upon alchemical transformations, on the one hand, and geo-
metrical transformations, on the other hand [1,2]. From a theoretical perspective, accurate determination
of absolute free energies remains a daunting challenge, owing to the considerable variation of not only
translational and rotational, but also conformational entropies underlying the binding process and which
cannot be fully captured in routine, finite-length statistical simulations [3]. To address this challenge,
a series of geometrical restraints is introduced to act on relevant collective variables [4, 1, 5, 6], thereby
alleviating the inherent sampling limitations of molecular dynamics. In this tutorial, the proposed strat-
egy is applied to the case of the src–homology 3, or SH3, domain of tyrosine kinase Abl binding a short,
proline-rich peptide referred to as p41 and of amino-acid sequence APSYSPPPPP [7,8], employing both
geometrical, i.e., potential-of-mean-force, and alchemical free-energy calculations.
The reader of this tutorial is assumed to be familiarized with the use of NAMD [9] to perform stan-
dard and advanced computations, including energy minimization, molecular-dynamics simulations and
free-energy calculations, both perturbative, i.e., free-energy perturbation [10, 11], and geometrical, i.e.,
adaptive biasing force [12, 13].
This tutorial contains advanced material. Do not attempt to tackle the problems therein if you
have no preliminary experience with free-energy calculations. The neophyte reader eager to
get acquainted with the computation of protein:ligand standard binding affinities is advised to
complete first the introductory tutorials on alchemical, free-energy perturbation calculations
and adaptive-biasing-force calculations.
The ABF algorithm is implemented as part of the “collective variable caculations” (colvars) module
of NAMD. The colvars module [14,15] is extensively documented in the NAMD user’s guide. Other
information about the ABF method can be found in the reference article 26. A basic working knowledge
of VMD is highly recommended.
Completion of this tutorial requires
– the various files contained in the archive tutorial-protein-ligand.tar.gz located at
http://www.ks.uiuc.edu/Training/Tutorials/namd/PLB;
– NAMD 2.12 or later (http://ks.uiuc.edu/Research/namd);
– VMD 1.9 or later (http://ks.uiuc.edu/Research/vmd).
Protein:ligand standard binding free energies 5
1.1. Theoretical underpinnings
The equilibrium binding constant that characterizes the reversible association of the ligand with the
protein, i.e., protein + ligand protein:ligand,
Keq =[protein:ligand][protein][ligand]
(1)
can be restated as,
Keq =1
[ligand]p1
p0(2)
where p0 is the fraction of free protein and p1, the fraction of protein binding the ligand. Assuming low
protein concentration, one can imagine an isolated protein in a solution of N indistinguishable ligands.
Under these premises, the logarithm of the ratio p1/p0 can be related to the reversible work required to
extract one ligand from its bulk environment and bring it to the binding site of the protein,
Keq =1
[ligand]
N
∫site
d1 . . .∫
bulkdN
∫dx e−βU∫
bulkd1 . . .
∫bulk
dN∫
dx e−βU
=1
[ligand]
N
∫site
d1∫
dx e−βU∫bulk
d1∫
dx e−βU(3)
Here, the subscript “site” and “bulk” refer to the bound and the unbound states of the ligand, and the
relevant regions of configurational space over which the integrals are evaluated. As a convention, ligand
“1” occupies the binding site of the protein. Since the bulk environment is isotropic and homogenous, it
follows that,
Keq =1
[ligand]
N
∫site
d1∫
dx e−βU
Vbulk
∫bulk
d1 δ(x1 − x∗1)∫
dx e−βU(4)
Vbulk is the volume of the bulk medium. x1 is the position of the center of mass of the ligand and x∗1,
an arbitrary location in the bulk medium, sufficiently far from the binding site of the protein. Since the
concentration of the ligands is equal to N/Vbulk, the equilibrium binding constant can be simplified to,
Keq =
∫site
d1∫
dx e−βU∫bulk
d1 δ(x1 − x∗1)∫
dx e−βU(5)
In the above expression of the equilibrium constant, the numerator reflects the final state of the binding
Protein:ligand standard binding free energies 6
process and the denominator, its initial state. Per se, this expression is of limited use in the context of
molecular dynamics, evaluation of the individual configurational integrals being euphemistically imprac-
tical. A viable strategy towards the computation of Keq consists in decomposing the reversible associa-
tion phenomenon into several stages, the corresponding contribution of which is determined in separate
simulations. Two possible routes, a geometrical route and an alchemical route, can be followed to reach
this objective. These two routes have a common denominator — assuming that over the timescales
amenable to molecular dynamics the configurational space available to the ligand cannot be sampled ad-
equately, a series of biasing potentials is introduced to restrain the substrate in the native conformation,
position and orientation of the bound state. Introduction of properly chosen restraints [4, 1] constitutes
the preamble to the separation of the ligand from the protein, either geometrically, i.e., translation, or
alchemically, i.e., decoupling.
1.2. Geometrical transformations
From a geometrical standpoint, computation of the equilibrium constant, Keq, could be conceived re-
sorting to a simple one-dimensional potential-of-mean-force calculation, w(r), wherein the ligand is
separated reversibly from the protein [16],
Keq = 4π∫ R
0dr r2 e−βw(r) (6)
Here, R stands for the limit of association.
This vision of the problem at hand is not only naıve, but also extremely deceitful. It assumes
that over the timescale of the simulation, the ligand can sample the available configurational
space. Nothing could be further from the truth, in particular for large, flexible ligands.
One possible route to access with appreciable accuracy the equilibrium constant, Keq, consists in a
series of geometrical transformations, wherein the substrate is progressively restrained in the native con-
formation, position and orientation of the bound state [1, 2]. In other words, granted that finite-length
simulations cannot capture the conformational variability of the ligand in the course of the binding pro-
cess, nor sample the available 8π2 of solid angle, it is preferable to tether the molecule of interest with a
suitable set of restraints and subsequently evaluate the free-energy cost due to these restraints.
Introduction of geometrical restraints in the simulations can be viewed as a loss of confor-
mational, positional and orientational entropies, the contribution to the free energy of which
must be determined independently [3].
Under the assumption that the protein does not undergo perceptible conformational change in the course
Protein:ligand standard binding free energies 7
of ligand association, which is a valid premise in the instance of the SH3 domain of Abl binding p41, the
series of geometrical transformations involves the following steps:
(1) Determine the free-energy change for deforming the ligand in the protein:ligand complex (“site”),
using as a collective variable the root mean-square deviation with respect to the conformation of the
ligand in the bound state.
(2) Determine the free-energy change for reorienting the ligand in the protein:ligand complex (“site”)
about the first Euler angle, Θ, using as a collective variable a valence angle, restraining the confor-
mation to that of the bound state (uc).
(3) Determine the free-energy change for reorienting the ligand in the protein:ligand complex (“site”)
about the second Euler angle, Φ, using as a collective variable a dihedral angle, restraining the
conformation and the orientation with respect to Θ (uΘ) to that of the bound state.
(4) Determine the free-energy change for reorienting the ligand in the protein:ligand complex (“site”)
about the third Euler angle, Ψ, using as a collective variable a dihedral angle, restraining the confor-
mation and the orientation with respect to Θ and Φ (uΦ) to that of the bound state.
(5) Determine the free-energy change for changing the position of the ligand in the protein:ligand com-
plex (“site”) about the first polar angle, θ, using as a collective variable a valence angle, restraining
the conformation and the orientation with respect to Θ, Φ and Ψ (uΨ) to that of the bound state.
(6) Determine the free-energy change for changing the position of the ligand in the protein:ligand com-
plex (“site”) about the second polar angle, φ, using as a collective variable a dihedral angle, restrain-
ing the conformation, the orientation with respect to Θ, Φ and Ψ, and the position with respect to θ
(uθ) to that of the bound state.
(7) Determine the free-energy change for changing the position of the ligand in the protein:ligand com-
plex (“site”) along the vector connecting their respective center of mass, r, using as a collective
variable a Euclidian distance, restraining the conformation, the orientation with respect to Θ, Φ and
Ψ, and the position with respect to θ and φ (uφ) to that of the bound state.
(8) Determine the free-energy change for deforming the ligand in the free, unbound state (“bulk”), using
as a collective variable a root mean-square deviation with respect to the conformation of the ligand
in the bound state.
Once the eight, individual potentials of mean force are generated, the “geometrical” equilibrium constant
is determined according to,
Protein:ligand standard binding free energies 8
Kgeomeq =
∫site
d1∫
dx e−βU∫site
d1∫
dx e−β(U+uc)
×
∫site
d1∫
dx e−β(U+uc)∫site
d1∫
dx e−β(U+uc+uo)
×
∫site
d1∫
dx e−β(U+uc+uo)∫site
d1∫
dx e−β(U+uc+uo+ua)
×
∫site
d1∫
dx e−β(U+uc+uo+ua)∫bulk
d1 δ(x1 − x∗1)∫
dx e−β(U+uc+uo)
×
∫bulk
d1 δ(x1 − x∗1)∫
dx e−β(U+uc+uo)∫bulk
d1 δ(x1 − x∗1)∫
dx e−β(U+uc)
×
∫bulk
d1 δ(x1 − x∗1)∫
dx e−β(U+uc)∫bulk
d1 δ(x1 − x∗1)∫
dx e−βU(7)
where uo = uΘ+uΦ+uΨ denotes the orientational potential and ua = uθ+uφ, the polar-angle potential.
The three first contributions correspond, respectively, to the conformational, orientational and positional
restraints acting on the ligand and represent six independent potentials of mean force. The fourth contri-
bution corresponds to the reversible separation of the ligand from the binding site of the protein towards
the bulk environment. It should be clearly understood that this potential-of-mean-force is performed,
keeping all the other relevant conformational, positional and orientational degrees of freedom at their
equilibrium value by means of geometrical restraints. The fifth contribution highlighted in cyan corre-
sponds to the reorientation of a rigid body, i.e., the ligand restrained in its native conformation when
bound to the protein, and can, thus, be evaluated analytically. The sixth and last contribution corre-
sponds to free-energy cost to restrain in the bulk environment the conformation of the ligand to that in
the native bound state. It constitutes the eighth and last potential-of-mean-force calculation towards the
determination of Keq.
1.3. Alchemical transformations
The second route for the computation of the protein : ligand equilibrium constant consists of a series
of alchemical transformations [17]. This route follows the thermodynamic cycle depicted in Figure 1,
where the ligand, either in the free, unbound state, or in the bound state, is not decoupled reversibly
from the environment as is, i.e., ligand0, but rather restrained in its native conformation, position and
orientation, i.e., ligand∗, characteristic of the bound state. The rationale for the introduction of a suitable
set of geometric restraints in the alchemical route is rooted in the so-called wandering-ligand problem
[18, 19, 20], whereby, upon decoupling of the substrate from its environment, the former becomes free
to drift away from the binding site. It follows that in the backward, coupling transformation, the ligand
is unlikely to form in the binding site the relevant, native network of intermolecular interactions, hence,
violating the underlying principle of thermodynamic micro-reversibility.
Protein:ligand standard binding free energies 9
protein + ligand* protein:ligand*
protein + nothing* protein:nothing*
ΔG*
protein + ligand0 protein:ligand0ΔG0
ΔGasitebulkΔGa
ΔGcsitebulkΔGcΔGositebulkΔGoΔGpsitebulkΔGp Figure 1: Complete thermodynamic cycle delineat-
ing the reversible association of a ligand to a proteinand the necessary steps to determine the correspond-ing standard binding free energy, ∆G0. “ligand0” de-notes an unrestrained ligand, whereas “ligand∗” refersto a ligand restrained in its native conformation, posi-tion and orientation in the protein:ligand complex.
Under the same assumption made previously that the protein does not undergo an appreciable confor-
mational modification as the ligand binds to it, the series of alchemical transformations involves the
following steps:
(1) To circumvent the wandering-ligand problem arising when the substrate is decoupled from the pro-
tein, restrain the former in the conformation, orientation and position representative of the pro-
tein:ligand complex.
(2) Determine in the bound state (“site”) the alchemical free-energy change for decoupling reversibly
from the protein the ligand restrained in its native conformation, orientation and position, i.e.,
protein : ligand∗.
(3) Determine in the free, unbound state (“bulk”) the alchemical free-energy change for decoupling
reversibly from the bulk water the ligand restrained in its native conformation, i.e., ligand∗.
(4) Determine in the bound state (“site”) the free-energy contribution for maintaining the ligand in its
native conformation, orientation and position by means of restraining potentials (uc, uΘ, uΦ, uΨ, uθ,
uφ and ur). Towards this end, the set of collective variables utilized are a root mean-square deviation
with respect to the conformation of the ligand in the bound state, the three Euler angles, Θ, Φ and
Ψ, the two polar angles, θ and φ, and the Euclidian distance, r, separating the substrate from the
protein.
(5) Determine in the free, unbound state (“bulk”) the free-energy contribution for maintaining the ligand
in its native conformation by means of a restraining potential (uc). Towards this end, the chosen
collective variable is the root mean-square deviation with respect to the conformation of the substrate
when bound to the protein.
Once the two alchemical free-energy changes, i.e., the reversible decoupling in the bound and in the
unbound states, alongside the eight free-energy contributions due to the geometrical restraints, i.e., seven
terms when the ligand is in the bound state and a single one when it is in the unbound state, are computed,
the “alchemical” equilibrium constant is determined according to,
Protein:ligand standard binding free energies 10
Kalcheq =
∫site
d1∫
dx e−βU1∫site
d1∫
dx e−β(U1+uc)
×
∫site
d1∫
dx e−β(U1+uc)∫site
d1∫
dx e−β(U1+uc+uo)
×
∫site
d1∫
dx e−β(U1+uc+uo)∫site
d1∫
dx e−β(U1+uc+uo+ua)
×
∫site
d1∫
dx e−β(U1+uc+uo+ua)∫site
d1∫
dx e−β(U1+uc+uo+ua+ur)
×
∫site
d1∫
dx e−β(U1+uc+uo+ua+ur)∫site
d1∫
dx e−β(U0+uc+uo+ua+ur)
×
∫bulk
d1∫
dx e−β(U0+uc+uo+ua+ur)∫bulk
d1 δ(x1 − x∗1)∫
dx e−β(U0+uc+uo)
×
∫bulk
d1 δ(x1 − x∗1)∫
dx e−β(U0+uc+uo)∫bulk
d1 δ(x1 − x∗1)∫
dx e−β(U0+uc)
×
∫bulk
d1 δ(x1 − x∗1)∫
dx e−β(U0+uc)∫bulk
d1 δ(x1 − x∗1)∫
dx e−β(U1+uc)
×
∫bulk
d1 δ(x1 − x∗1)∫
dx e−β(U1+uc)∫bulk
d1 δ(x1 − x∗1)∫
dx e−βU1
(8)
where U0 characterizes the non-interacting (“ghost”) state of the substrate, and U1 the state in which it
is coupled to the environment. The above equation follows precisely the different steps described in the
thermodynamic cycle of Figure 1. Its first two contributions arise, respectively, from the conformational
and the orientational restraints acting on the ligand. Its third contribution corresponds to the polar-
angle term, i.e., ua = uθ + uφ, of the positional restraints. Its fourth contribution corresponds to the
translational term, i.e., ur, of the positional restraints. The fifth and the eighth contributions highlighted
in magenta correspond to the alchemical transformations, whereby the substrate is decoupled reversibly
from its environment, respectively in the bound and in the unbound states. The sixth and the seventh
contributions highlighted in cyan are analytical ones and account for the reorientation and translation of
a rigid body in an homogenous bulk liquid. The ninth and last contribution represents the deformation of
the ligand in the free, unbound state.
One of the main goals of the present tutorial is to demonstrate that the geometrical and the
alchemical routes are overall equivalent, and that the standard binding affinity of p41 towards
the SH3 domain of Abl obeys Keq = Kgeomeq = Kalch
eq .
Protein:ligand standard binding free energies 11
2. Setting up the simulations
The starting point of the setups for the free-energy calculations is PDB entry 1bbz. Generation of
the structure, PSF files and the initial configurations will be performed using the psfgen module of
VMD [21]. Because the two routes for calculatingKeq, from whence the standard free energy of binding
of Abl to the SH3 domain of Abl is inferred, are inherently of different nature, distinct setups will be
considered for the geometrical and the alchemical transformations.
2.1. Construction of the molecular assemblies
In the geometrical route, as has been outlined above, eight different potentials of mean force ought to be
determined. To optimize the computational effort, three distinct simulation cells will be devised, namely,
(1) the protein:ligand assembly solvated in a cubic box of TIP3P water with adequate padding to avoid
periodicity-induced artifacts. Given the size of the dimer and the charged termini of the substrate,
solvation by about 3,400 water molecules, which corresponds to a dimension of about 48× 48× 48A3, has proven appropriate. This setup will be utilized for the computation of potentials of mean
force, wherein the ligand remains associated to the protein, i.e., the conformational, the Euler-angle,
orientational and the polar-angle, positional terms of Equation 7.
(2) the protein:ligand assembly solvated in a rectangular box of TIP3P water with adequate padding to
avoid periodicity-induced artifacts. This setup will be utilized for the computation of the separation
potential of mean force. Considering the extent of the separation of the substrate from the binding
site, a cell of dimension of about 48× 48× 68 A3 is perfectly adapted. Choice of a rectangular cell
supposes that relevant geometrical restraints are enforced to prevent the complex from tumbling as
the ligand moves away from the protein, i.e., collective variables orientation and distance
with respect to a dummy particle. Alternatively, the user can resort to a cubic box of adequate di-
mension to allow the dimer to tumble in the course of the separation. A simulation cell of dimension
roughly equal to 60 × 60 × 60 A3, i.e., about 6,300 water, has proven adapted to the determination
of the separation potential of mean force.
(3) the free ligand solvated in a cubic box of TIP3P water with adequate padding to avoid periodicity-
induced artifacts. For simplicity, the same simulation cell will be utilized for the solvation by about
3,400 water molecules has proven appropriate and corresponds to a dimension of about 48×48×48A3. This setup will be utilized to determine the potential of mean force for deforming the ligand in
the unbound state.
Conversely, in the alchemical route, only two different setups need to be devised, corresponding to the
bound and unbound states of the substrate, or the lefthand– and righthand sides of the thermodynamic
Protein:ligand standard binding free energies 12
cycle of Figure1, namely,
(1) the protein:ligand assembly solvated in a cubic box of TIP3P water with adequate padding to avoid
periodicity-induced artifacts. Given the size of the dimer, solvation by about 3,400 water molecules
has proven appropriate and corresponds to a dimension of about 48 × 48 × 48 A3. This setup will
be utilized for the computation of the alchemical free-energy changes, wherein the restrained ligand
is decoupled reversibly from the protein, together with the free-energy contributions arising from
conformational, orientational and positional restraints.
(2) the free, unbound ligand solvated in a cubic box of TIP3P water with adequate padding to avoid
periodicity-induced artifacts. For simplicity and given the size of the ligand and its charged termini,
solvation by about 3,400 water molecules has proven appropriate and corresponds to a dimension of
about 48 × 48 × 48 A3. This setup will be utilized to determine the alchemical free-energy change
arising from the reversible decoupling of the unbound substrate from its aqueous environment, and
the free-energy contribution incurred in its deformation with respect to the reference conformation
in the protein:ligand complex.
2.2. Definition of the collective variables
A fundamental aspect of the approach detailed in this tutorial, following either a geometrical route, or an
alchemical one, is the introduction of a suitable set of harmonic restraints to preserve the conformation,
the orientation and the position of the ligand as it is dissociated reversibly from the protein.
Since molecular dynamics, in general, cannot capture the deformation, reorientation and
repositioning of the ligand as it binds to the protein, it is preferable to restrain the former and
evaluate the free-energy contribution due to the loss of configurational entropy.
Incorporation of restraints in the free-energy calculations presupposes that the position and the orienta-
tion of the substrate with respect to the protein can be described without ambiguity. Toward this end, a
frame of reference is designed, from which the position of the ligand can be defined [2]. This frame of
reference is formed by three groups of atoms of the protein, { P1, P2, P3 }. Likewise, a frame of reference
formed by three groups of atoms of the ligand, { L1, L2, L3 }, is introduced to define the orientation of
the latter, as depicted in Figure 2.
A-posteriori verification utilizing alternate frames of reference for both the protein and its substrate,
demonstrates that the equilibrium binding constant does not depend on the choice of the triplets { P1,
P2, P3 } and { L1, L2, L3 }, provided unambivalent definition of the restrained degrees of freedom,
notably angles θ and Θ — i.e., in other words, avoiding aligned groups of atoms is crucial to guarantee
geometrical invariance of Keq.
Protein:ligand standard binding free energies 13
P3P2
P1
L1L2
L3r
�
⇥
✓�
Figure 2: Binding of proline-rich ligand p41 to the SH3 domain of tyrosinekinase Abl. The position of p41 with respect to the protein is expressedby means of a Euclidian distance, r, and two polar angles, θ and φ, whichaltogether form a set of spherical coordinates. The relative orientation of theligand is determined by the three Euler angles, Θ, Φ and Ψ. Position andorientation of the substrate rely on the definition of nonambiguous groups ofatoms, referred here to as { P1, P2, P3 } and { L1, L2, L3 } for the proteinand for the ligand, respectively. As a rule of thumb, should the substrate beseparated from the protein, for instance, in the z direction of Cartesian space,it would be a good idea to align P1 and L1 along that direction.
In the particular example of p41 binding to the SH3 domain of tyrosine kinase Abl, a suitable choice
for { L1, L2, L3 } could be the backbone atoms of residue Ser5, of residue Ser3 and of residue Ala1,
respectively. For { P1, P2, P3 }, the backbone atoms of residue Leu25, of residue Glu38 and of residue
Gly46 represent a reasonable choice, albeit, once again, not a unique one. The reader of this tutorial is
strongly encouraged to investigate other possible selections of atoms and show that Keq is independent
of these selections.
2.3. The geometrical-route simulations
Once the position and the orientation of the ligand with respect to the protein are fully defined, the
geometrical restraints outlined in section 1.2. can be safely introduced, one after the other, just like ma-
tryoshka, or Russian nested dolls [1,2]. One-dimensional potentials of mean force are then determined to
assess the contribution to the binding free energy of these geometrical restraints. To attain this objective,
the adaptive biasing force, or ABF algorithm will be utilized. ABF is not the only option available here —
the reader of this tutorial is suggested to try alternate approaches like umbrella-sampling-like stratifica-
tion strategies [22, 17], wherein the reaction pathway is broken down into multiple, overlapping, narrow
windows. In the latter numerical scheme, harmonic potentials confine sampling to the region of interest
of the collective variable. The unnormalized probability distribution and, hence, the free-energy land-
scape are recovered self-consistently by means of such algorithms as the weighting histogram analysis
method, or WHAM [23] — or, alternatively, by piecewise matching of the individual potential-of-mean-
force segments. Umbrella sampling and its replica-exchange variant [24, 25] are available in NAMD.
In the latter instance, given the size of the molecular assembly, a significant number of computer cores
ought to be reserved to see the real benefit of a replica-based approach.
Should the reader decide to resort solely to the adaptive biasing force algorithm, a number of caveats,
Protein:ligand standard binding free energies 14
possibly shortcomings of the methodology, ought to be considered, namely,
Historically, ABF calculations involving geometric restraints were burdened by a stringent
limitation in the colvars implementation of the algorithm to discriminate between thermo-
dynamic forces arising from the potential energy function and forces arising from external
harmonic potential. In recent versions of NAMD, thermodynamic forces are measured by
including contributions of both the potential energy function and the enforced geometrical
restraints.
The aforementioned limitation of the colvars implementation of the ABF algorithm in
earlier versions of NAMD can be circumvented by turning to an extended, generalized co-
ordinate, wherein the collective variable is coupled by means of a stiff spring to a fictitious
particle upon which the biasing force acts. This approach is referred to as extended adaptive
biasing force, or eABF method [26], and will be utilized here as an alternative to plain ABF.
It is invoked by issuing the following command in the definition of the collective variable:
extendedLagrangian on
The true free-energy change and its gradient are recovered through a deconvolution of
the harmonic-spring contribution from that of the force field, performed on the fly within
colvars [27, 28].
Furthermore, particular care ought to be taken when defining the collective variables, specifically,
The collective variable should be fully decoupled from degrees of freedom handled by the
rattle algorithm [13,14]. The latter engender constraint forces, which contaminate the measure
of the instantaneous thermodynamic force acting along the model reaction coordinate.
Since turning off rattle and, hence, decreasing the integration time step, does not consti-
tute a reasonable option, collective variables featuring heavy atoms pertaining to constrained
chemical bonds should also include the hydrogen atoms thereof, e.g., atoms CA and HA.
The reader will find in the present distribution a generic colvars file, in which all relevant collective
variables are defined [15]. The adaptive biasing force (abf) method is applied to the Euclidian distance,
r, separating the substrate from the protein, i.e.,
Protein:ligand standard binding free energies 15
colvar {name R
width 1.0
lowerboundary 0.0upperboundary 40.0
distance {forceNoPBC yesgroup1 {
atomnumbers { 367 368 369 370 384 385 }}group2 {
atomnumbers { 949 950 951 952 958 959 }}
}}
whilst the other angular variables, Θ, Φ, Ψ, θ and φ, are set to their equilibrium value (harmonic), e.g.,
colvar {name Psi
width 1.0
lowerboundary -180.0upperboundary 180.0
dihedral {group1 {
atomnumbers { 575 576 577 578 588 589 }}group2 {
atomnumbers { 367 368 369 370 384 385 }}group3 {
atomnumbers { 949 950 951 952 958 959 }}group4 {
atomnumbers { 917 918 919 920 926 927 }}
}}
harmonic {colvars PsiforceConstant 0.1centers 23.0
}
The conformational variable (rmsd) is set to zero, which corresponds to restraining the ligand in its
native form, when it is bound to the protein.
Protein:ligand standard binding free energies 16
2.4. The alchemical-route simulations
As has been discussed in 1.3., the alchemical route relies upon the introduction of geometrical restraints,
the substrate being decoupled reversibly from the protein in its native conformation, position and orien-
tation, i.e., ligand∗ in the thermodynamic cycle of Figure 1. For consistency reasons, use will be made
here of the same groups of atoms, { P1, P2, P3 } and { L1, L2, L3 }, introduced previously to define the
position and the orientation of the ligand with respect to the protein. Just like in the geometrical route,
the relevant angular variables, Θ, Φ, Ψ, θ and φ, are set to their equilibrium value, employing harmonic
potentials (harmonic). However, in sharp contrast with the geometrical route, wherein the ligand is
separated reversibly from the protein by means of a Euclidian distance, r, the latter will be also set here
to its equilibrium value, inferred, like the other collective variables, from an adequately long thermaliza-
tion simulation of the protein:ligand complex. The substrate will be restrained to its native conformation
by imposing that the root mean-square deviation (rmsd) with respect to the structure representative of
the bound state be equal to zero.
Though the final result is independent from the force constants in the conformational, posi-
tional and orientational restraints [5], their choice necessarily impacts the alchemical free-
energy change. Looser restraints, allowing the ligand to rearrange spatially with respect to
the protein, affect thermodynamic micro-reversibility. Conversely, setting exaggeratedly large
force constants results in possible instabilities in the trajectory.
Setting force constants of 10 kcal/mol·A2 for the distance and rmsd collective variables,
and of 0.1 kcal/mol·degrees2 for the angle and dihedral collective variables constitutes
a reasonable choice [2]. The reader is reminded that force constants in NAMD are expressed
in units of the bin widths. Hint: to confirm that the force constant used in the simulation
is the intended one, search the beginning of the simulation log file for a line similar to the
following:
colvars: The force constant for colvar "R" will be rescaled to 250
according to the specified width..
As a good practice, the reader is advised to perform a bidirectional transformation, wherein the substrate
is decoupled from the protein, i.e., forward transformation, prior to being coupled again to the binding
site, i.e., backward transformation. Convergence of alchemical free-energy calculations, in general, and
free-energy perturbation, or FEP calculations, in particular, can be markedly slow [17, 29], in particular
when the vanishing, or appearing compound is sizable, thus, corresponding to a significant perturbation
of the molecular assembly. In the case of p41, a fine stratification strategy is recommended, consisting
of a minimum of 50 windows, in each of which at least 200,000 molecular-dynamics steps, e.g., 50,000
thermalization steps (alchEquilSteps) followed by 150,000 data-collection steps, are generated,
Protein:ligand standard binding free energies 17
source fep.tcl
alch onalchType FEPalchFile bound.fepalchCol B
alchOutFile forward.fepoutalchOutFreq 10
alchVdwLambdaEnd 1.0alchElecLambdaStart 0.5
alchEquilSteps 50000set numSteps 200000set numMinSteps 1000
runFEPmin 0.0 1.0 0.02 $numSteps $numMinSteps $temperature
In the above free-energy perturbation section of the NAMD configuration file, use is made of fep.tcl,
a Tcl script for setting the range of the general-extent parameter λ over which free-energy differences
are computed.
The reader is suggested to use parseFEP to monitor the convergence of the alchemical transforma-
tions and ultimately combine the forward and backward simulations to obtain the maximum-likelihood
Bennett-acceptance-ratio, or BAR estimator of the free energy [30, 31].
Evaluating the contribution to the standard binding free energy of the geometrical restraints is appre-
ciably less demanding from a computational perspective. The contribution of each restraint acting on
a collective variable is determined alchemically, by decreasing in a stepwise fashion the force constant
from its nominal value to zero — and proceeding similarly in the opposite direction. Yet, in contrast with
the free-energy calculations wherein the substrate is decoupled reversibly from the protein, the present
simulations will be carried out in the framework of thermodynamic integration, within the colvars
module [15],
harmonic {colvars Rcenters 21.3targetNumSteps 100000targetEquilSteps 25000lambdaSchedule 1 0.9999 0.999 0.99 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0forceConstant 0.0targetForceConstant 10.0targetForceExponent 4
In this example, a harmonic potential is imposed on the Euclidian distance, r, separating the ligand from
the protein, defined previously. The nominal value of the force constant, k, utilized in the course of the
alchemical free-energy perturbation calculation is 10 kcal/mol·A2. Here, this represents the value of the
Protein:ligand standard binding free energies 18
force constant at λ = 1, i.e., k1, (targetForceConstant). At λ = 0, the force constant, k0, is equal
to 0 kcal/mol·A2 (forceConstant). The instantaneous value of the force constant at λ is given by,
k = k0 + (k1 − k0) {λn(1− λ) + λ [1− (1− λ)n]} (9)
where n is referred to as targetForceExponent. For each value of lambdaSchedule and for
each collective variable, dG/dλ is computed. The free-energy contribution due to the restraint of interest
is obtained by integrating the gradient profile.
3. Running and analyzing the simulations
Before embarking on the present tutorial, the reader ought to be reminded that the free-energy calcula-
tions proposed herein correspond to the state-of-the-art approach to address protein:ligand binding, and
necessitate several hours of computer time on the parallel architecture of a commodity cluster.
Determination of Keq, following either the geometrical route, or the alchemical route, is
a computationally demanding endeavor and, hence, cannot be undertaken within the time
usually allotted to an hands-on tutorial. The calculations described herein require a significant
amount of computer time to attain fully converged free-energy estimates.
For the geometrical route, stratification of the reaction pathway is always recommended [22, 13, 14],
with a number of strata as small as two for the angular free-energy profiles to as large as ten. Typical
simulation times range from as low as 2 ns for the angular free-energy profiles to as much as 60 ns for
the separation potential of mean force. For the alchemical route, stratification is also strongly advised,
and typical simulation times can amount to as low as 6 ns to evaluate the free-energy contribution of
the geometrical restraints in the bound state to as much as 40 ns for the BAR estimator. The reader is
encouraged to examine alternate sampling strategies capable of offering the best precision:cost ratio.
3.1. The geometrical route
As has been discussed previously, completion of the geometrical route requires the determination of eight
independent potentials of mean force, i.e., seven for the bound state and one for the unbound state. To
explore the performance of an alternate algorithm to standard ABF, we have proposed that the extended
adaptive biasing force, or eABF algorithm be employed. Under these premises, a post-treatment of the
data generated in the course of the simulation is necessary.
Protein:ligand standard binding free energies 19
The grad and pmf files written by colvars in an eABF calculation reflect the average
force acting on the extended, generalized coordinate — not on the actual collective variable,
and should not be used as-is for analysis purposes.
Access to the true gradient imposes a deconvolution of the contribution due to the harmonic spring from
that of the potential energy function. This deconvolution is performed on the fly in colvars, employing
either the Zheng and Yang estimator [32], or the corrected z-averaged restraint estimator [28]. The first
estimator is based on an umbrella integration (UI) [33],
G′(ξ′) =(
dGdξ
)ξ′
=
∑Ξ′
N(ξ′,Ξ′)[
(ξ′ − 〈ξΞ′〉)βσ2
Ξ′−Kξ(ξ′ − Ξ′)
]∑Ξ′
N(ξ′,Ξ′)(10)
where ξ(x) is the collective variable, function of the positions of the real particles of the molecular
assembly and is restrained through potential 1/2 Kξ(ξ′ − ξΞ′)2 to a one-dimensional fictitious particle,
Ξ, moving dynamically. N(ξ′,Ξ′) is the number of samples ξ′ collected from the Ξ′-restrained ensemble,
which is assumed to be Gaussian.
The second estimator is the corrected z–averaged restraint (CZAR) estimator [28],
G′(z′) =(
dGdz
)z′
= − 1β
d ln ρ(z′)dz′
+ k(〈λ〉z′ − z′
)(11)
where z = ξ(q) is the collective variable, λ is the extended variable harmonically coupled to z by means
of a stiff spring of force constant k, and ρ(z) is the observed distribution of collected variable z, upon
which the time-dependent eABF bias is exerted.
The end-user may choose at the level of the colvars configuration file between the CZAR
(CZARestimator) and the Zheng and Yang (UIestimator) estimator. The former is
the default for eABF calculations. Depending upon the choice of the estimator, NAMD will
generate the usual potential-of-mean-force (for one-dimensional free-energy profiles only),
gradient and histogram files with a distinctive czar or UI suffix.
Once the different geometric free-energy calculations are completed, the different potentials of mean
force can be post-processed numerically, using, for instance, XMGrace to evaluate the ratios of con-
figurational integrals embodied in Equation 7, i.e., Data −→ Transformations −→ Evaluate
expression/ Integration.
To illustrate how individual contributions arising from the geometrical restraints ought to be determined,
two collective variables will be chosen, namely Euler angle Ψ and Euclidian distance r. The free-energy
change reflecting the reorientation about Ψ of the substrate in the binding site is depicted in Figure 3.
Protein:ligand standard binding free energies 20
Contribution of this Euler angle to the overall binding affinity amounts to,
e+β∆GsiteΨ =
∫site
d1∫
dx e−β(U+uc+uΘ+uΦ)∫site
d1∫
dx e−β(U+uc+uΘ+uΦ+uΨ)(12)
In practical terms, the configurational and spatial integrals over the full potential energy function can be
reduced to one over the potential of mean force for a given reaction coordinate. Thus,
e+β∆GsiteΨ =
∫dΨ e−β[wsite(Ψ)]∫
dΨ e−β[wsite(Ψ)+uΨ](13)
where wsite(Ψ) is the PMF for Ψ in the bound state and uΨ is the harmonic restraint potential, i.e.,
uΨ = 12kΨ(Ψ−Ψ0)2. Using a force constant of 0.1 kcal/mol·degrees2 for the three Euler angles and an
equilibrium value of Θ0 = +102◦, Φ0 = −7◦ and Ψ0 = +23◦, it follows that ∆GsiteΨ = 0.4 kcal/mol.
0 10 20 30 40 50Ψ (degrees)
0
2
4
6
8
10
12
14
16
∆G
Ψsite (k
cal/m
ol)
0 10 20 30 40 50Ψ (degrees)
-2
-1
0
1
2
5G
(kca
l/mol
/deg
)
0 10 20 30 40 50Ψ (degrees)
0
2×104
4×104
6×104
8×104
Nsa
mpl
es
(a) (b)
(c)
Figure 3: Free-energy change accounting forthe reorientation of p41 with respect to the SH3domain of Abl about Euler angle Ψ defined inFigure 2 (a). This potential of mean force wasobtained, using two windows, approximately40◦ wide. Its harmonic nature, highlighted in adashed, light curve, is noteworthy. The readeris strongly advised to check the continuity ofthe gradient in adjacent windows (b). Here, twowindows were utilized to cover the 80◦ rangespanned by the collective variable. The readeris equally strongly advised to check samplinguniformity across the reaction pathway (c).
Compared with the evaluation of the free-energy contributions associated to the conformational and to
the angular variables, determination of free-energy change arising from the separation of the ligand from
the protein, shown in Figure 4, requires additional care. Returning to the original definition of Kgeomeq ,
this contribution corresponds to the fourth term of equation 7,∫site
d1∫
dx e−β(U+uc+uo+ua)∫bulk
d1 δ(x1 − x∗1)∫
dx e−β(U+uc+uo)= S∗I∗ (14)
wherein
S∗ = r∗2∫ π
0dθ sin θ
∫ 2π
0dφ e−βua and I∗ =
∫site
dr e−β[w(r)−w(r∗)] (15)
Protein:ligand standard binding free energies 21
S∗ is a surface term and represents the fraction of the sphere of radius r∗, centered about the binding
site of the protein, accessible to the substrate. The choice of r∗ should not impact the final value of Keq,
provided that it is sufficiently far from the binding site [2]. In the particular example of Figure 4 and for
illustrative purposes, r∗ will be chosen equal to 35 A. It follows that I∗ = 3.16× 1012 A and S∗ = 17.15
A2, based on the equilibrium values of polar angles θ and φ appearing in the expression of ua.
20 25 30 35 40r (Å)
-20.0
-15.0
-10.0
-5.0
0.0
5.0
w(r
) (kc
al/m
ol)
Figure 4: Free-energy change accounting forthe separation of p41 from the SH3 domainof Abl, restraining conformational, orientationaland positional degrees of freedom of the sub-strate to their equilibrium value in the boundstate. It is the piece de resistance of the geo-metrical route. The apparent decay of the free-energy profile at large separations reflects theprobabilistic definition of the potential of meanforce, wherein the geometric entropy term is notremoved from the gradient. This definition ofw(r) is consistent with the basic expression 6of the standard binding affinity, subsuming com-plete averaging of all other degrees of freedom.
To complete the calculation of the standard binding free energy of p41 to the SH3 domain of tyrosine
kinase Abl, the various contributions arising from the geometrical restraints, either in the bound state,
or in the unbound state, ought to be determined in the spirit of the computation of e+β∆GsiteΨ detailed
previously. The different pieces of the puzzle are then pasted together to recover the binding constant,
Keq = S∗I∗ e−β(∆Gbulkc +∆Gbulk
o −∆Gsitec −∆Gsite
o −∆Gsitea ) (16)
where the second term of the exponent, ∆Gbulko , the free-energy contribution arising from the reorienta-
tion about the three Euler angles of a rigid body in a homogenous, bulk liquid, is calculated analytically,
e−β∆Gbulko =
18π2
∫ π
0dΘ sin Θ
∫ 2π
0dΦ∫ 2π
0dΨ e−βuo (17)
and uo is the sum of uΘ, uΦ and uΨ.
When carrying out integrals over the potentials of mean force and restraints, pay careful
attention to the units! The analytical integrals in Equation 17 are in radians but the restraints,
including equilibrium values and force constants, are defined in degrees.
Following the geometrical route and assuming convergence of the different potentials of
mean force of Equation 7, the final standard binding free energy, ∆G0 = −1/β lnKeq,
amounts to −7.7 kcal/mol [7, 8].
Protein:ligand standard binding free energies 22
3.2. The alchemical route
The piece de resistance of the alchemical route is without a doubt the alchemical transformation, wherein
the restrained ligand is decoupled reversibly from its environment [20]. Even in the homogenous bulk
aqueous medium, convergence of the free-energy perturbation calculation can be appreciably long, which
can be understood in terms of rearrangement of the environment in response to the perturbation. The
reader is strongly suggested to monitor convergence of the free energy in the different strata forming
the reaction pathway that separate the coupled state from the decoupled one. Furthermore, since the
transformation is run bidirectionally, the reader ought to ascertain that thermodynamic micro-reversibility
is satisfied, verifying, using for instance parseFEP [34], that the underlying probability distributions
for the various windows overlap appropriately [31].
As can be seen in Figure 5, which illustrates the reversible decoupling of p41 from the binding site
of the SH3 domain of Abl, it is crucial that the bidirectional transformation reflects thermodynamic
micro-reversibility, manifested here by a virtual absence of hysteresis. The shape of the two free-energy
profiles ought to be commented. It mirrors the schedule chosen to extinguish, or ignite intermolecular
interactions, i.e., alchVdwLambdaEnd set to 1.0 and alchElecLambdaStart set to 0.5. When the
substrate is removed from the protein, electrostatic interactions are turned off faster than van der Waals
interactions. For λ > 0.5, only the latter remain, which explains the point of inflection, midway on the
reaction path.
0 0.2 0.4 0.6 0.8 1λbound
-40.0
-30.0
-20.0
-10.0
0.0
10.0
∆G
site
(kca
l/mol
)co
uple
Figure 5: Free-energy change accounting fordecoupling reversibly p41 from the SH3 do-main of Abl, obtained using free-energy per-turbation. The black profile corresponds tothe forward transformation, wherein the re-strained substrate is removed from the pro-tein by turning off intermolecular interactions.A similar transformation is performed in thebulk aqueous environment, hence the popu-lar designation of double-annihilation simula-tion [20]. The red profile characterizes thebackward transformation, wherein the ligand isgrown back in the binding site.
An example of the probability distributions obtained in the course of the bidirectional free-energy per-
turbation calculation is provided in Figure 6. Overlap of the histograms P0(∆U) and P1(∆U), for the
forward and the backward transformations, respectively, is generally good. The reader is reminded that
the relative inaccuracy of the forward simulation is related to the area under P1(∆U) where there is no
overlap with P0(∆U) [17].
Protein:ligand standard binding free energies 23
Visual examination of the underlying histograms of probability distributions P0(∆U) and
P1(∆U), using parseFEP may not be sufficient to probe convergence. The reader is
suggested to compare P0(∆U) with P1(∆U)e+β∆U and, symmetrically, P1(∆U) with
P0(∆U)e−β∆U , since e−β∆UP0(∆U) = e−β∆GP1(∆U) [31].
Figure 6: Histograms of theprobability dstributions P0(∆U)and P1(∆U) for the forward, i.e.,decoupling, and backward, i.e.,coupling, transformations of p41bound to the SH3 domain of ty-rosine kinase Abl. The abscissais ∆U , the perturbation term, ex-pressed in kcal/mol.
In a sense, post-treatment of the free-energy calculations of the alchemical route is simpler than that of the
geometrical route, requiring no tedious evaluation of ratios of configurational integrals. The contribution
from the geometrical restraints is extracted from the NAMD output files,
> grep "dA/dLambda" rest-01.logcolvars: Lambda= 1 dA/dLambda= 0.995394colvars: Lambda= 0.9999 dA/dLambda= 0.827822colvars: Lambda= 0.999 dA/dLambda= 0.760705colvars: Lambda= 0.99 dA/dLambda= 0.800472colvars: Lambda= 0.9 dA/dLambda= 0.783329colvars: Lambda= 0.8 dA/dLambda= 0.620776colvars: Lambda= 0.7 dA/dLambda= 1.12615colvars: Lambda= 0.6 dA/dLambda= 0.47743colvars: Lambda= 0.5 dA/dLambda= 0.305801colvars: Lambda= 0.4 dA/dLambda= 0.16728colvars: Lambda= 0.3 dA/dLambda= 0.0677748colvars: Lambda= 0.2 dA/dLambda= 0.0269255colvars: Lambda= 0.1 dA/dLambda= 0.00323781colvars: Lambda= 0 dA/dLambda= 0
It suffices then to integrate the gradients, using for instance XMGrace to recover the associated free-
energy profile, from whence the different conformational, orientational and positional terms can be in-
ferred, as illustrated in Figure 7.
To complete the thermodynamic cycle of Figure 1, the analytical terms appearing in Equation 8 ought to
be added, namely, for the positional contribution,
Ft e−β∆Gbulka =
∫ ∞0
dr r2 e−βur
∫ π
0dθ sin θ
∫ 2π
0dφ e−βua (18)
where ua = uθ + uφ and Ft is a translational factor with units of volume [1]. For the orientational
Protein:ligand standard binding free energies 24
0 0.2 0.4 0.6 0.8 1λr
0.0
0.1
0.2
0.3
0.4
0.5
0.6
∆G
site (k
cal/m
ol)
r
Figure 7: Free-energy contribution due to therestraining potential acting on the Euclidiandistance between p41 from the SH3 domainof Abl, based on the equilibrium value in thebound state. The black and red curves de-lineate the forward and backward transforma-tions, wherein, respectively, the force constantof the geometrical restraint is decreased fromits nominal value to 0, and back. The hysteresisbetween the two free-energy profiles provides arough estimate of the inaccuracy of the calcu-lation.
contribution,
e−β∆Gbulko =
18π2
∫ π
0dΘ sin Θ
∫ 2π
0dΦ∫ 2π
0dΨ e−βuo (19)
where uo = uΘ + uΦ + uΨ. Pasting all the contributions together,
Keq = Ft e−β(∆Gbulkc −∆Gbulk
couple+∆Gbulko +∆Gbulk
a −∆Gsitedecouple−∆Gsite
a −∆Gsiter −∆Gsite
o −∆Gsitec ). (20)
Following the alchemical route and assuming convergence of the individual free-energy
calculations embodied in Equation 8, the final standard binding free energy, ∆G0 =−1/β lnKeq, amounts to −7.7 kcal/mol [7, 8].
4. Concluding remarks and extensions of the tutorial
It is not superfluous to reemphasize that this document contains advanced material, which, in glaring
contrast with introductory tutorials, requires that the reader has grasped the theoretical underpinnings
of the methodology to perform the proposed free-energy calculations with utmost efficiency. Such cal-
culations are computationally demanding and still cannot be run in a routine fashion. Aside from the
slowly-converging nature of the simulations [17,29], the size of the molecular assembly imposes adapted
computational resource, in the form of commodity clusters, chiefly making use of either central process-
ing units, or graphical processing units. Contrary to introductory tutorials, the archive does not include
all the files, input and output, but only the bare necessities to complete the study gracefully.
The methodology described herein raises, however, a number of questions and comments. Central to both
the alchemical and the geometrical routes is the choice of atoms, or groups thereof, utilized to define the
Protein:ligand standard binding free energies 25
position and the orientation of the ligand with respect to the protein. It should be clearly understood that
∆G0 is an invariant of this choice [1,2]. The reader is invited to explore alternate selections of atoms for
the triplets { P1, P2, P3 } and { L1, L2, L3 }, and show that the computed standard binding affinity is,
indeed, independent from these selections.
Closely related to the choice of atoms forming { P1, P2, P3 } and { L1, L2, L3 } is the choice
of the reference three-dimensional structure, central to the computation of the conformational
terms, and from whence the equilibrium values of the collective variables are inferred. What
is a good reference structure? A natural choice is the energy-minimized crystallographic
structure of the protein:ligand complex solvated in the aqueous environment. Another rele-
vant option consists in running an adequately long molecular-dynamics simulations past the
thermalization stage and compute an average structure, which ought to be energy-minimized
to remove spurious averaging artifacts, notably from freely rotating methyl groups.
A third comment that comes naturally to mind concerns the necessity to impose geometrical restraints on
the protein. In the particular example of p41 binding to the SH3 domain of Abl, comparison of the protein
conformation in the native complex and in the free state suggests that the spatial rearrangement that
accompanies the association process is extremely limited and, therefore, obviates the need for additional
harmonic potentials. If such were not the case, it would become necessary to restrain the conformation
of the protein to that of the bound state, incorporating in Equation 7, and its alchemical counterpart, the
relevant, supplementary contribution.
One of the virtues of this tutorial, and the study therein, is to reveal and illuminate the pros
and cons of free-energy methods, and how well suited the latter are for a given problem.
Whereas the pseudo-harmonic nature of the orientational contributions to Kgeomeq suggests
that the extended adaptive biasing force algorithm is well adapted for such collective vari-
ables, the same cannot be said for the separation potential of mean force, plagued by slow
orthogonal degrees of freedom and hidden barriers. The beauty of free-energy methods
is their ability to be combined with other algorithms targeted at improving ergodic sam-
pling [24, 25, 35, 36, 37, 38]. The reader is invited to consider alternate options to eABF, or
standard stratified umbrella-sampling approaches, notably multiple-walker schemes, wherein
replicas of the reaction coordinate models are spawned and handled by the different cores of
a parallel computer architecture [37].
Another question eminently relevant to the problem at hand is the provision of a measure of the reliability
of the free-energy estimate. As a matter of principle, just like publishing experimental binding free
energies bereft of an error bar would be inconceivable, a theoretical estimate ought to be reported with
its reliability, i.e., the combination of the accuracy, or systematic error, and of the precision, or statistical
error [39]. While the parseFEP plugin [34] of VMD supplies an estimate of both the precision and
Protein:ligand standard binding free energies 26
the accuracy of the Bennett acceptance ratio free-energy difference, the reader is invited to refer to the
specialized literature [40,41,42,29] to adopt the best strategy towards the computation of these quantities
in the framework of the adaptive biasing force, or the stratified umbrella-sampling method.
The example detailed in this tutorial, p41, a proline-rich peptide binding to the SH3 domain of
tyrosine kinase Abl, however intricate, still represents one standard binding free energy, that
is one number. An interesting extension of the present work would consist in investigating
other, structurally related substrates, thereby demonstrating that the proposed methodology
is predictive. For instance, replacing the decapeptide APSYSPPPPP by APTYHPPLPP, i.e.,
three point mutations, results in an experimental absolute decrease of the standard binding
free energy from−7.94 to−5.30 kcal/mol, whilst a single point mutation like APSYSPPPPP
by APTYSPPPPP yields a more modest increase from −7.94 to −8.72 kcal/mol [43]. The
reader is invited to explore these mutants, using the methodology described herein, and verify
that the computed standard binding free energies are consistent with the results of relative
free-energy calculations.
Appendix: Archive
This tutorial is provided with all the files necessary towards the calculation of the standard binding
constants Kalcheq and Kgeom
eq . The archive is organized in two directories, AlchemicalRoute and
GeometricalRoute. AlchemicalRoute contains four main subdirectories, which correspond to
the alchemical transformations in the bound and in the unbound states, as well as the evaluation of the
free-energy contribution due to the geometrical restraints in these two states. GeometricalRoute
contains eight main subdirectories, which correspond to the eight potential-of-mean-force calculations
required to determine the standard binding free energy.
Caveat emptor. Though the tutorial includes all the relevant files needed to perform the differ-
ent free-energy calculations described herein, the reader is strongly advised to not use these
files blindly, without checking first their contents.
As stated previously, the transformations carried out in this tutorial, either of geometrical or alchemical
nature, ought to be stratified for statistical efficiency. The various subdirectories of the archive only rep-
resent one stratum of the full reaction pathway, thus, requiring duplication of the NAMD and colvars
files.
Protein:ligand standard binding free energies 27
Caveat emptor. The reader may decide to choose triplets { P1, P2, P3 } and { L1, L2, L3} dis-
tinct from those proposed in the archive, in which case the colvars files should be adapted
accordingly. Different choices of atoms can result in different sampling and convergence
properties.
Amongst the many possible adaptations of the input files for the geometrical route, modification of
FullSamples, the threshold beyond which the time-dependent bias is applied on the collective vari-
able, may impact the overall efficiency of the adaptive biasing force simulations. In addition, the reader
might find necessary to alter the stiffness of the spring connecting the collective variable to the ficti-
tious particle, i.e., extendedFluctuation. The reader is also suggested to write information in
the colvars trajectory files, or traj files, at an appropriate frequency for the post-treatment of the
extended adaptive biasing force, or eABF simulations.
In the case of the alchemical route, similar adaptations ought to be considered. For the alchemical trans-
formations, the stratification strategy has proven crucial and can be tailored to increase both efficiency
and reliability. Stratification is also important when zeroing reversibly the force constant of the geomet-
rical restraints utilized to preserve the conformation, the orientation and the position of the ligand repre-
sentative of the bound state, i.e., lambdaSchedule. The number of steps, i.e., targetNumSteps,
including the thermalization stage, i.e., targetEquilSteps, is of paramount importance to insure
both convergence and thermodynamic micro-reversibility of the simulations.
Acknowledgments
Jerome Henin and Giacomo Fiorin are gratefully acknowledged for helpful discussions about the
colvars module.
References
[1] Woo, H. J.; Roux, B., Calculation of absolute protein–ligand binding free energy from computer
simulations, Proc. Natl. Acad. Sci. USA 2005, 102, 6825–6830.
[2] Gumbart, J. C.; Roux, B.; Chipot, C., Standard binding free energies from computer simulations:
What is the best strategy?, J. Chem. Theor. Comput. 2013, 9, 794–802.
[3] Hermans, J.; Wang, L., Inclusion of loss of translational and rotational freedom in theoretical esti-
mates of free energies of binding. Application to a complex of benzene and mutant T4 lysozyme,
J. Am. Chem. Soc. 1997, 119, 2707–2714.
Protein:ligand standard binding free energies 28
[4] Dixit, S. B.; Chipot, C., Can absolute free energies of association be estimated from molecular
mechanical simulations ? The biotin–streptavidin system revisited, J. Phys. Chem. A 2001, 105,
9795–9799.
[5] Deng, Y.; Roux, B., Calculation of standard binding free energies: Aromatic molecules in the T4
lysozyme L99A mutant, J. Chem. Theor. Comp. 2006, 2, 1255–1273.
[6] Wang, J.; Deng, Y.; Roux, B., Absolute binding free energy calculations using molecular dynamics
simulations with restraining potentials, Biophys. J. 2006, 91, 2798–2814.
[7] Pisabarro, M. T.; Serrano, L., Rational design of specific high-affinity peptide ligands for Abl-SH3
domain, Biochemistry 1996, 35, 10634–10640.
[8] Pisabarro, M. T.; Serrano, L.; Wilmanns, M., Crystal structure of the abl-SH3 domain complexed
with a designed high-affinity peptide ligand: implications for SH3-ligand interactions., J. Mol. Biol.
1998, 281, 513–521.
[9] Phillips, J. C.; Braun, R.; Wang, W.; Gumbart, J.; Tajkhorshid, E.; Villa, E.; Chipot, C.; Skeel, L.;
Schulten, K., Scalable molecular dynamics with NAMD, J. Comput. Chem. 2005, 26, 1781–1802.
[10] Landau, L. D., Statistical physics, The Clarendon Press: Oxford, 1938.
[11] Zwanzig, R. W., High–temperature equation of state by a perturbation method. I. Nonpolar gases,
J. Chem. Phys. 1954, 22, 1420–1426.
[12] Darve, E.; Pohorille, A., Calculating free energies using average force, J. Chem. Phys. 2001, 115,
9169–9183.
[13] Henin, J.; Chipot, C., Overcoming free energy barriers using unconstrained molecular dynamics
simulations, J. Chem. Phys. 2004, 121, 2904–2914.
[14] Henin, J.; Fiorin, G.; Chipot, C.; Klein, M. L., Exploring multidimensional free energy landscapes
using time-dependent biases on collective variables, J. Chem. Theor. Comput. 2010, 6, 35–47.
[15] Fiorin, G.; Klein, M. L.; Henin, J., Using collective variables to drive molecular dynamics simula-
tions, Mol. Phys. 2013.
[16] Shoup, D.; Szabo, A., Role of diffusion in ligand binding to macromolecules and cell–bound re-
ceptors, Biophys. J. 1982, 40, 33–39.
[17] Chipot, C.; Pohorille, A., Eds., Free energy calculations. Theory and applications in chemistry and
biology, Springer Verlag, 2007.
[18] Hermans, J.; Shankar, S., The free energy of xenon binding to myoglobin from molecular–dynamics
simulation, Isr. J. Chem. 1986, 27, 225–227.
Protein:ligand standard binding free energies 29
[19] Roux, B.; Nina, M.; Pomes, R.; Smith, J. C., Thermodynamic stability of water molecules in
the Bacteriorhodopsin proton channel: A molecular dynamics and free energy perturbation study,
Biophys. J. 1996, 71, 670–681.
[20] Gilson, M. K.; Given, J. A..; Bush, B. L.; McCammon, J. A., The statistical–thermodynamic basis
for computation of binding affinities: A critical review, Biophys. J. 1997, 72, 1047–1069.
[21] Humphrey, W.; Dalke, A.; Schulten, K., VMD — Visual molecular dynamics, J. Molec. Graphics
1996, 14, 33–38.
[22] Roux, B., The calculation of the potential of mean force using computer simulations, Comput. Phys.
Comm. 1995, 91, 275–282.
[23] Kumar, S.; Bouzida, D.; Swendsen, R. H.; Kollman, P. A.; Rosenberg, J. M., The weighted his-
togram analysis method for free energy calculations on biomolecules. I. The method, J. Comput.
Chem. 1992, 13, 1011–1021.
[24] Sugita, Y.; Okamoto, Y., Replica–exchange molecular dynamics method for protein folding, Chem.
Phys. Lett. 1999, 314, 141–151.
[25] Sugita, Y.; Kitao, A.; Okamoto, Y., Multidimensional replica–exchange method for free–energy
calculations, J. Chem. Phys. 2000, 113, 6042–6051.
[26] Comer, J.; Gumbart, J. C.; Henin, J.; Lelievre, T.; Pohorille, A.; Chipot, C., The adaptive biasing
force method: Everything you always wanted to know, but were afraid to ask, J. Phys. Chem. B
2015, 119, 1129–1151.
[27] Fu, H.; Shao, X.; Chipot, C.; Cai, W., Extended adaptive biasing force algorithm. An on–the–fly
implementation for accurate free–energy calculations, J. Chem. Theory Comput. 2016, 12, 3506–
3513.
[28] Lesage, Adrien; LeliA¨vre, Tony; Stoltz, Gabriel; Henin, Jerome, Smoothed biasing forces yield
unbiased free energies with the extended-system adaptive biasing force method., J. Phys. Chem. B
2017, 121, 3676–3685.
[29] Lelievre, T.; Stoltz, G.; Rousset, M., Free energy computations: A mathematical perspective, Im-
perial College Press, 2010.
[30] Bennett, C. H., Efficient estimation of free energy differences from Monte Carlo data, J. Comp.
Phys. 1976, 22, 245–268.
[31] Pohorille, A.; Jarzynski, C.; Chipot, C., Good practices in free-energy calculations, J. Phys. Chem.
B 2010, 114, 10235–10253.
Protein:ligand standard binding free energies 30
[32] Zheng, L.; Yang, W., Practically efficient and robust free energy calculations: Double-integration
orthogonal space tempering, J. Chem. Theor. Compt. 2012, 8, 810–823.
[33] Kastner, Johannes; Thiel, Walter, Bridging the gap between thermodynamic integration and um-
brella sampling provides a novel analysis method:?Umbrella integration?, The Journal of chemical
physics 2005, 123, 144104.
[34] Liu, P.; Dehez, F.; Cai, W.; Chipot, C., A toolkit for the analysis of free-energy perturbation calcu-
lations, J. Chem. Theor. Comput. 2012, 8, 2606–2616.
[35] Woods, C. J.; Essex, J. W.; King, M. A., The development of replica-exchange-based free-energy
methods, J. Phys. Chem. B 2003, 107, 13703–13710.
[36] Jiang, W.; Hodoscek, M.; Roux, B., Computation of absolute hydration and binding free energy
with free energy perturbation distributed replica-exchange molecular dynamics , J. Chem. Theory
Comput. 2009, 5, 2583–2588.
[37] Minoukadeh, K.; Chipot, C.; Lelievre, T., Potential of mean force calculations: A multiple-walker
adaptive biasing force approach, J. Chem. Theor. Comput. 2010, 6, 1008–1017.
[38] Jiang, W.; Roux, B., Free energy perturbation Hamiltonian replica-exchange molecular dynamics
(FEP/H-REMD) for absolute ligand binding free energy calculations, J. Chem. Theory Comput.
2010, 6, 2559–2565.
[39] Kofke, D.A.; Cummings, P.T., Precision and accuracy of staged free-energy perturbation methods
for computing the chemical potential by molecular simulation, Fluid Phase Equil. 1998, 150, 41–
49.
[40] Straatsma, T. P.; Berendsen, H. J. C.; Stam, A. J., Estimation of statistical errors in molecular
simulation calculations, Mol. Phys. 1986, 57, 89–95.
[41] Flyvbjerg, H.; Petersen, H. G., Error estimates on averages of correlated data, J. Chem. Phys. 1989,
91, 461–466.
[42] Rodriguez-Gomez, D.; Darve, E.; Pohorille, A., Assessing the efficiency of free energy calculation
methods, J. Chem. Phys. 2004, 120, 3563–3578.
[43] Hou, T.; Chen, K.; McLaughlin, W. A.; Lu, B.; Wang, W., Computational analysis and prediction
of the binding motif and protein interacting partners of the Abl SH3 domain., PLoS Comput. Biol.
2006, 2, e1.