Molecular design: one step back and two paths forward
DESCRIPTION
I presented this at the RACI Biomolecular on the Beach conference in December 2011. It opens with a correlation inflation teaser, followed by alkane/water logP and SAR/SPR based on relationships between structures. The photograph in the title slide was taken in Asunción.
TRANSCRIPT
Molecular Design: One step back and two paths forward
Peter W Kenny
Some things that are hurting Pharma
• Having to exploit targets that are less well-linked to human disease
• Inability to predict idiosyncratic toxicity
• Inability to measure free (unbound) physiological concentrations of drug for remote targets (e.g. intracellular or within the blood-brain barrier)
Dans la merde: http://fbdd-lit.blogspot.com/2011/09/dans-la-merde.html
Keep an eye out for creative data analysis
Preparation of data sets
Data set A; Data set B
Points plotted at constant increment
Equal numbers of points for each value of x
Add Normally-distributed noise
Data set A: Fit median value of Y to X
r2 = 0.99; RMSE = 0.36
An example of this approach to plotting data can be seen in Leeson & Springthorpe, The influence of drug-like concepts on decision-making in medicinal chemistry. Nat. Rev. Drug Discov. 2007, 6, 881-890.
Data set B: Use value of X to split into three equally-sized groups (Low, Medium, High) and show mean and associated confidence interval for each
An example of this approach to analysing data can be seen in: Gleeson, Generation of a Set of Simple, Interpretable ADMET Rules of Thumb. J. Med. Chem. 2008, 51, 817-834.
What data set A really looks like
Fit to original data
N=11000; r2 = 0.09 ; RMSE = 9.95
Fit to transformed data
N=11; r2 = 0.99 ; RMSE = 0.36
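The contrast between the two fits can be reproduced with a small simulation. The sketch below is not the author's procedure: it assumes 11 evenly spaced values of X (0 to 10), 1000 points per value, a true slope of 1 and Normally-distributed noise with a standard deviation of 10, which are choices that happen to give statistics close to those quoted on the slide. The helper `fit_line` and all variable names are hypothetical.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b, r2, rmse)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    r2 = 1.0 - ss_res / syy
    rmse = (ss_res / n) ** 0.5  # assumed definition: root mean squared residual
    return a, b, r2, rmse


rng = random.Random(2011)   # fixed seed so the sketch is repeatable
x_values = range(11)        # assumed: 11 evenly spaced values of X
points_per_x = 1000         # assumed: gives N = 11000
noise_sd = 10.0             # assumed noise level

xs, ys = [], []
for x in x_values:
    for _ in range(points_per_x):
        xs.append(float(x))
        ys.append(x + rng.gauss(0.0, noise_sd))

# Fit to the original data (every point counts).
_, _, r2_raw, rmse_raw = fit_line(xs, ys)

# Fit to transformed data: one median value of Y per value of X.
median_xs = [float(x) for x in x_values]
median_ys = [
    statistics.median(y for xv, y in zip(xs, ys) if xv == x) for x in median_xs
]
_, _, r2_med, rmse_med = fit_line(median_xs, median_ys)

print(f"original:    N={len(xs)}  r2={r2_raw:.2f}  RMSE={rmse_raw:.2f}")
print(f"transformed: N={len(median_xs)}  r2={r2_med:.2f}  RMSE={rmse_med:.2f}")
```

Taking medians discards almost all of the scatter in Y, so the second fit describes how the median of Y depends on X and says little about how well X predicts Y for an individual point.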
Percentile plot showing the 10%, 25%, 50%, 75% and 90% percentiles (see Colclough et al, BMC 2008, 16, 6611-6616)
Residual plot for fit to original data
Fit to original data
N=10000; r2 = 0.08; RMSE = 10.0
Residual plot for fit to original data
Low Medium High
What data set B really looks like
Mean values of Y and (barely visible) confidence intervals shown with standard deviations
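The difference between a confidence interval for a group mean and the spread of the data within the group can be illustrated as follows. This is a sketch under stated assumptions rather than a reconstruction of the original data: X is taken to run from 0 to 10 at a constant increment, Y is X plus Normally-distributed noise with a standard deviation of 10, and the 95% confidence interval is approximated as 1.96 standard errors. All names are hypothetical.

```python
import random
import statistics

rng = random.Random(2011)   # fixed seed so the sketch is repeatable
n_points = 10000            # assumed: matches N quoted for data set B
noise_sd = 10.0             # assumed noise level

# X at constant increment from 0 to 10; Y = X + noise.
xs = [10.0 * i / (n_points - 1) for i in range(n_points)]
ys = [x + rng.gauss(0.0, noise_sd) for x in xs]

pairs = sorted(zip(xs, ys))  # order by X before splitting into groups
labels = ["Low", "Medium", "High"]
group_size = n_points // 3   # the single leftover point joins the High group

summary = {}
for i, label in enumerate(labels):
    start = i * group_size
    stop = (i + 1) * group_size if i < len(labels) - 1 else n_points
    group_y = [y for _, y in pairs[start:stop]]
    sd_y = statistics.stdev(group_y)
    se_y = sd_y / len(group_y) ** 0.5
    summary[label] = {
        "n": len(group_y),
        "mean": statistics.mean(group_y),
        "sd": sd_y,            # spread of individual values in the group
        "ci95": 1.96 * se_y,   # approximate half-width of the CI for the mean
    }

for label in labels:
    s = summary[label]
    print(f"{label:6s} n={s['n']}  mean={s['mean']:.2f}  "
          f"SD={s['sd']:.2f}  95% CI half-width={s['ci95']:.2f}")
```

With thousands of points per group the confidence interval for each mean shrinks to a fraction of a unit while the standard deviation stays close to the noise level, so the groups look well separated even though individual values overlap heavily.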
Octanol was the first mistake...
Introduction to partition coefficients
Lipophilic & half ionised; Hydrophilic
An alternative view of the Rule of 5
Polarity; N; ClogP ≤ 5; Acc ≤ 10; Don ≤ 5
Does octanol/water ‘see’ hydrogen bond donors?
Values shown: -0.06, -0.23, -0.24, -1.01, -0.66, -1.05
Sangster lab database of octanol/water partition coefficients: http://logkow.cisti.nrc.ca/logkow/index.jsp
Octanol/Water Alkane/Water
Octanol/water is not the only partitioning system
logPoct = 2.1; logPalk = 1.9; ΔlogP = 0.2
logPoct = 1.5; logPalk = -0.8; ΔlogP = 2.3
logPoct = 2.5; logPalk = -1.8; ΔlogP = 4.3
Differences in octanol/water and alkane/water logP values
reflect hydrogen bonding between solute and octanol
Toulmin et al, J. Med. Chem. 2008, 51, 3720-3730
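The logP difference used on this slide is the octanol/water value minus the alkane/water value. A minimal sketch using the three pairs of values quoted on the slide (the function and variable names are hypothetical):

```python
def delta_logp(logp_oct, logp_alk):
    """Difference between octanol/water and alkane/water logP."""
    return logp_oct - logp_alk


# (logPoct, logPalk) pairs as quoted on the slide
slide_values = [(2.1, 1.9), (1.5, -0.8), (2.5, -1.8)]

differences = [round(delta_logp(oct_, alk), 1) for oct_, alk in slide_values]
print(differences)
```

A small difference means the solute partitions similarly into both organic phases, while a large one indicates that octanol is stabilising the solute in a way that an alkane cannot.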
Polar Surface Area is not predictive of hydrogen bond strength
ΔlogP = 0.5; PSA/Å² = 48
ΔlogP = 4.3; PSA/Å² = 22
Toulmin et al, J. Med. Chem. 2008, 51, 3720-3730
Measured values of ΔlogP
1.0, 1.1, 0.8, 1.3, 1.7, 0.8, 1.5, 1.6, 1.1 (structures not reproduced)
Toulmin et al, J. Med. Chem. 2008, 51, 3720-3730
Prediction of contribution of acceptors to ΔlogP
Plots of ΔlogP (corrected) against Vmin/(Hartree/electron): N or ether O; Carbonyl O
ΔlogP = ΔlogP0 × exp(-kVmin)
Toulmin et al, J. Med. Chem. 2008, 51, 3720-3730
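An exponential relationship of this form becomes linear on taking logarithms, ln(ΔlogP) = ln(ΔlogP0) - k·Vmin, so its two parameters can be estimated by a straight-line fit. The sketch below illustrates that idea only: the parameter values and Vmin values are invented for the example and are not the fitted values from Toulmin et al, and the function names are hypothetical.

```python
import math


def predict_delta_logp(v_min, delta_logp0, k):
    """Acceptor contribution to the logP difference from the electrostatic minimum."""
    return delta_logp0 * math.exp(-k * v_min)


def fit_exponential(v_mins, delta_logps):
    """Least squares fit of ln(delta_logp) = ln(delta_logp0) - k * v_min."""
    log_values = [math.log(d) for d in delta_logps]  # requires positive values
    n = len(v_mins)
    mean_v = sum(v_mins) / n
    mean_l = sum(log_values) / n
    slope = (
        sum((v - mean_v) * (l - mean_l) for v, l in zip(v_mins, log_values))
        / sum((v - mean_v) ** 2 for v in v_mins)
    )
    intercept = mean_l - slope * mean_v
    return math.exp(intercept), -slope  # (delta_logp0, k)


# Invented illustration values: more negative Vmin gives a larger contribution.
true_delta_logp0, true_k = 0.1, 30.0
v_mins = [-0.10, -0.09, -0.08, -0.07, -0.06]
delta_logps = [predict_delta_logp(v, true_delta_logp0, true_k) for v in v_mins]

fitted_delta_logp0, fitted_k = fit_exponential(v_mins, delta_logps)
print(fitted_delta_logp0, fitted_k)
```

Fitting in log space weights relative rather than absolute errors, which may or may not be appropriate for measured data; a nonlinear fit to the untransformed values is the alternative.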
Lipophilicity/polarity of Morphine & Heroin
logPoct = 0.89; predicted logPalk = -4.2; PSA/Å² = 53
logPoct = 1.58; predicted logPalk = -1.4; PSA/Å² = 65
Toulmin et al, J. Med. Chem. 2008, 51, 3720-3730
Prediction of blood/brain partitioning
Plots of log(Cbrain/Cblood) against logPhxd, logPoct and ΔlogP
Fit statistics shown: R2 = 0.66, RMSE = 0.54; R2 = 0.82, RMSE = 0.39; R2 = 0.88, RMSE = 0.32
Toulmin et al, J. Med. Chem. 2008, 51, 3720-3730
Alkane/water partition coefficients: Where next?
Difficulties in measuring logPalk:
Many compounds poorly soluble in alkanes
Self-association masks polarity
General access to logPalk likely to require predictive models for some time
Carefully measure logPalk for structurally diverse compounds
Solvation models: logPalk easier to measure than ΔG(g→aq)
Another way to look at SAR
(Descriptor-based) QSAR/QSPR: Some questions
• How valid is methodology (especially for validation) when distribution of compounds in training/test space is highly non-uniform?
• Are models predicting activity or locating neighbours?
• Are ‘global’ models ensembles of local models?
• How well do the methods handle ‘activity cliffs’?
• How should we account for sizes of descriptor pools when comparing models?
Measures of Diversity & Coverage
2-Dimensional representation of chemical space is used here to illustrate concepts of diversity and coverage. Stars indicate compounds selected to sample this region of chemical space. In this representation, similar compounds are close together.
Neighborhoods and library design
Examples of relationships between structures
Tanimoto coefficient (foyfi) for structures is 0.90
Ester is methyl-substituted acid; amides are ‘reversed’
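For readers unfamiliar with the Tanimoto coefficient quoted above, it is the number of features two structures share divided by the number of features present in either. The sketch below represents each fingerprint as a set of integer bit positions; the bit sets are invented for illustration and have nothing to do with the foyfi fingerprints used for the slide.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient for two fingerprints given as sets of 'on' bits."""
    union = bits_a | bits_b
    if not union:
        return 0.0  # convention assumed here for two empty fingerprints
    return len(bits_a & bits_b) / len(union)


# Hypothetical fingerprints: the second lacks one feature present in the first.
fingerprint_a = set(range(10))
fingerprint_b = set(range(9))

similarity = tanimoto(fingerprint_a, fingerprint_b)
print(similarity)
```

A single similarity number of this kind says how much two structures have in common but not what the difference is, which is the motivation for describing the relationship itself (methylation, reversed amide) instead.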
Leatherface molecular editor
From chain saw to Matched Molecular Pairs
c-[A;!R]
bnd 1 2
c-Br
cul 2
hyd 1 1
[nX2]1c([OH])cccc1
hyd 1 1
hyd 3 -1
bnd 2 3 2
Kenny & Sadowski, Structure modification in chemical databases. Methods and Principles in Medicinal Chemistry (Chemoinformatics in Drug Discovery) 2005, 23, 271-285.
Glycogen Phosphorylase inhibitors: Series comparison
ΔpIC50 = 0.38 (0.06); ΔlogFu = -0.30 (0.06); ΔlogS = -0.29 (0.13)
ΔpIC50 = 0.21 (0.06); ΔlogFu = 0.13 (0.04); ΔlogS = 0.20 (0.09)
ΔpIC50 = 0.29 (0.07); ΔlogFu = -0.42 (0.08); ΔlogS = -0.62 (0.13)
Standard errors in mean values shown in parentheses; see Birch et al, BMCL 2009, 19, 850-853
Effect of bioisosteric replacement on plasma protein binding
Date of Analysis | N | ΔlogFu | SE | SD | %increase
2003 | 7 | -0.64 | 0.09 | 0.23 | 0
2008 | 12 | -0.60 | 0.06 | 0.20 | 0
Mining the PPB database for carboxylate/tetrazole pairs suggested that bioisosteric replacement would lead to a decrease in Fu, so the tetrazoles were not synthesised.
Birch et al, BMCL 2009, 19, 850-853
Effect of amide N-methylation on aqueous solubility is dependent on substructural context
Amide | N | ΔlogS | SE | SD | %Increase
Acyclic (aliphatic amine) | 109 | 0.59 | 0.07 | 0.71 | 76
Cyclic | 9 | 0.18 | 0.15 | 0.47 | 44
Benzanilides | 9 | 1.49 | 0.25 | 0.76 | 100
Birch et al, BMCL 2009, 19, 850-853
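The columns in tables of this kind can be computed from a list of matched pairs once each pair has a measured value before and after the structural change. The sketch below assumes that SE is the standard deviation divided by the square root of N and that %increase is the percentage of pairs for which the property goes up; both readings are consistent with the tabulated numbers but are assumptions. The measurements are invented and the function name is hypothetical.

```python
import statistics


def summarise_pairs(pairs):
    """Summarise property changes for matched pairs of (before, after) values."""
    deltas = [after - before for before, after in pairs]
    n = len(deltas)
    sd = statistics.stdev(deltas)  # sample standard deviation; needs n >= 2
    return {
        "N": n,
        "mean": statistics.mean(deltas),
        "SE": sd / n ** 0.5,
        "SD": sd,
        "pct_increase": 100.0 * sum(1 for d in deltas if d > 0) / n,
    }


# Invented (before, after) log-scale measurements for four matched pairs.
example_pairs = [(1.0, 1.5), (2.0, 3.0), (0.5, 0.0), (-1.0, 0.0)]

result = summarise_pairs(example_pairs)
print(result)
```

Reporting SD alongside SE matters here: the SE shows how well the mean shift is determined, while the SD shows how much an individual pair can deviate from it.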
Relationships between structures
Discover new bioisosteres
Prediction of activity & properties: direct prediction (e.g. look up substituent effects); indirect prediction (e.g. apply correction to existing model)
Recognise extreme data: bad measurement or interesting effect?
Conclusions
• Data can be massaged and correlations can be enhanced but it won’t extract us from ‘la merde’
• There is life beyond octanol/water if we choose to look for it
• Even molecules can have meaningful relationships
Selected references
• Seiler (1974) Interconversion of lipophilicities from hydrocarbon/water systems into the octanol/water system. Eur. J. Med. Chem. 9, 473–479.
• Toulmin, Wood & Kenny (2008) Toward Prediction of Alkane/Water Partition Coefficients. J. Med. Chem. 51, 3720-3730. http://dx.doi.org/10.1021/jm701549s
• Kenny & Sadowski (2005) Structure modification in chemical databases. Methods and Principles in Medicinal Chemistry 23 (Chemoinformatics in Drug Discovery), 271-285. http://dx.doi.org/10.1002/3527603743.ch11
• Leach et al (2006) Matched Molecular Pairs as a Guide in the Optimization of Pharmaceutical Properties; a Study of Aqueous Solubility, Plasma Protein Binding and Oral Exposure. J. Med. Chem. 49, 6672-6682. http://dx.doi.org/10.1021/jm0605233
• Birch et al (2009) Matched molecular pair analysis of activity and properties of glycogen phosphorylase inhibitors. Bioorg. Med. Chem. Lett. 19, 850-853. http://dx.doi.org/10.1016/j.bmcl.2008.12.003
• Wassermann, Wawer & Bajorath (2010) Activity Landscape Representations for Structure−Activity Relationship Analysis. J. Med. Chem. 53, 8209-8223. http://dx.doi.org/10.1021/jm100933w
Alkane/water partition coefficients
Relationships between structures