A Web-Based Computational Tool for Combinatorial Library Design
that Simultaneously Optimises Multiple Properties
Weifan Zheng, Sunny T. Hung,
Joel T. Saunders, Stephen R. Johnson, George L. Seibel
A short paper: http://www-smi.stanford.edu/projects/helix/psb-online/
Outline
• Library Design - Problem Definition
• Criteria in Early Computational Techniques
• Important Developability Parameters
• Multifactorial Nature of Library Design
• PICCOLO
– Optimisation Protocol
– Individual Penalty Terms and Their Definition
– Snapshots of the Intranet-Based System
• Conclusions
Library Design - Problem Definition
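[Figure: a 10 x 10 matrix of R1 and R2 reagents reduced to a 5 x 5 full-combination library (10 x 10 => 5 x 5). Which reagents should be selected?]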
Criteria Used in Early Computational Design Techniques
• Diverse Design:
– diversity analysis and void-filling
• Targeted Design:
– similarity to leads
– docking to a binding site
– predicted activity using QSAR/QualSAR models
– pharmacophore models
Failure of Compounds in Development
• Poor biopharmaceutical properties, 39%
• Lack of efficacy, 29%
• Toxicity, 21%
• Market reasons, 6%
- Venkatesh & Lipper, J. Pharm. Sci. 89, 145-154 (2000)
“an efficacious but non-absorbed agent is no better than a well absorbed but in-efficacious one”
- Curatolo W. Pharm Sci Tech Today 1, 387 (1998)
Developability Should Be Considered in Library Design
To avoid serious ADME liabilities as early as possible in the drug discovery process
• Empirical rules
– Lipinski rule of 5 (MW, clogP, #HD, #HA)
• Drug-likeness
– Ajay & Murcko (JMC, 1998, 41, 3314-3324)
– Sadowski & Kubinyi (JMC, 1998, 41, 3325-3329)
Some Fundamental Properties Contributing to Pharmacokinetics (PK)
• Aqueous solubility
• Membrane passive permeability
• Cytochrome P450 activities
• Plasma protein binding
• Efflux pumping and active transport
• ...
Factors That Are Optimised
• Similarity to leads
• Reagent diversity/coverage
• Product novelty with respect to the corporate compound inventory
• Lipinski parameters
• Liabilities against P450 enzymes
• Aqueous solubility; [Permeability]
• Molecular flexibility; MS redundancy; reagent price
PICCOLO: reagent PICking by COmbinatorial Library Optimisation
[Figure: R1 x R2 reagent selections improving over iterations, from an initial library through a better library to an optimal library, guided by penalty scores (Lipinski properties, P450 activity, diversity).]
The Size of the Solution Space is Huge
50 Amines + 50 carboxylic acids
• Total number of compounds
50 x 50 = 2500
• Total number of solutions for an 8 x 12 library
50!/(8!42!) * 50!/(12!38!) = 6.52 x 10^19
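The counts above can be checked with a few lines of Python. This is only an arithmetic sketch; the variable names are illustrative and not part of PICCOLO.

```python
from math import comb

# Reagent pool sizes and library dimensions taken from the slide.
n_amines, n_acids = 50, 50
rows, cols = 8, 12

# Every amine combined with every acid.
total_compounds = n_amines * n_acids

# Ways to choose 8 of 50 amines times ways to choose 12 of 50 acids.
n_solutions = comb(n_amines, rows) * comb(n_acids, cols)

print(total_compounds)
print(f"{n_solutions:.2e}")
```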
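Stochastic Optimisation to Sample the Solution Space
[Flowchart, in order of execution:]
• Randomly pick a 5 x 5 trial solution from the reagent pool
• Enumerate the library, calculate penalty scores for the trial solution and save the scores
• Metropolis criteria?
– Y: save the trial solution
– N: reject the trial solution
• Swap a fraction of the reagents and repeat from the enumeration step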
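The loop in the flowchart can be sketched as follows. This is a minimal illustration under stated assumptions, not the PICCOLO code: reagents are plain integers, `toy_penalty` is a made-up score (lower is better), the temperature is fixed, and each step swaps a single reagent rather than a fraction of them.

```python
import math
import random


def metropolis_accept(old_score, new_score, temperature, rng):
    """Accept improvements always; accept worse trials with Boltzmann probability."""
    if new_score <= old_score:
        return True
    return rng.random() < math.exp(-(new_score - old_score) / temperature)


def optimise(pool_r1, pool_r2, n1, n2, penalty, steps=2000, temperature=0.005, seed=0):
    rng = random.Random(seed)
    pools = (pool_r1, pool_r2)
    # Random initial n1 x n2 selection from the reagent pools.
    current = (rng.sample(pool_r1, n1), rng.sample(pool_r2, n2))
    current_score = penalty(*current)
    best, best_score = current, current_score
    for _ in range(steps):
        # Build a trial solution by swapping one reagent in one R-group (assumption).
        trial = (list(current[0]), list(current[1]))
        which = rng.randrange(2)
        unused = [r for r in pools[which] if r not in trial[which]]
        trial[which][rng.randrange(len(trial[which]))] = rng.choice(unused)
        trial_score = penalty(*trial)
        # Keep the trial if it passes the Metropolis test, otherwise discard it.
        if metropolis_accept(current_score, trial_score, temperature, rng):
            current, current_score = trial, trial_score
            if current_score < best_score:
                best, best_score = current, current_score
    return best, best_score


def toy_penalty(r1, r2):
    # Hypothetical penalty: small reagent indices are "better".
    return (sum(r1) / len(r1) + sum(r2) / len(r2)) / 18.0


best, best_score = optimise(list(range(10)), list(range(10)), 5, 5, toy_penalty)
```

In a real run, `toy_penalty` would be replaced by the enumeration and scoring step, and the temperature would normally follow some schedule; the slides do not specify either.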
Perturbation Scheme
• Which R-group to perturb
– bias toward the R-groups that need more sampling
• Which new reagent to pick
– uniform sampling by cycling through the selected R-group list
• Which old reagent to kick out
– randomly chosen
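One way to read the three choices above is sketched below. The slides do not say how "need more sampling" is measured; here it is assumed, purely for illustration, to be proportional to the number of reagents in an R-group's pool that are not currently selected. The `Perturber` class and its names are hypothetical.

```python
import itertools
import random


class Perturber:
    def __init__(self, pools, seed=0):
        # pools: dict mapping R-group name -> list of candidate reagents.
        self.pools = pools
        self.rng = random.Random(seed)
        # One persistent cycle per R-group gives uniform coverage of its list.
        self.cyclers = {name: itertools.cycle(reagents) for name, reagents in pools.items()}

    def perturb(self, selection):
        """Return a new selection with one reagent replaced in one R-group."""
        names = list(self.pools)
        # Assumed bias: R-groups with more unselected reagents are perturbed more often.
        weights = [len(self.pools[n]) - len(selection[n]) for n in names]
        name = self.rng.choices(names, weights=weights, k=1)[0]
        chosen = list(selection[name])
        # New reagent: next one in the cycle that is not already selected.
        new_reagent = next(r for r in self.cyclers[name] if r not in chosen)
        # Old reagent to kick out: chosen at random.
        chosen[self.rng.randrange(len(chosen))] = new_reagent
        return {**selection, name: chosen}
```

The sketch assumes every pool is larger than its current selection; otherwise there is nothing left to swap in.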
Total Penalty Score is the Weighted Sum of Individual Penalty Terms
$E(S) = \sum_i w_i E_i(S)$
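As a small illustration of the weighted sum, the total penalty of a trial library S is each individual term multiplied by its weight, added together. The term names, values and weights below are invented for the example.

```python
def total_penalty(terms, weights):
    """Weighted sum of individual penalty terms for one trial library."""
    return sum(weights[name] * value for name, value in terms.items())


# Hypothetical penalty terms and user-chosen weights.
terms = {"similarity": 0.40, "lipinski": 0.10, "p450": 0.25}
weights = {"similarity": 2.0, "lipinski": 1.0, "p450": 1.0}
score = total_penalty(terms, weights)
```

Raising a weight makes the optimiser trade other properties away to reduce that term first.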
Similarity to Leads
• Esim(S) = Daylight Tanimoto “distances” between all the compounds in a given library and the lead, averaged over the size of the library
• In case of multiple leads, the Tanimoto distance between a compound and the leads is defined as the nearest neighbour distance
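A minimal sketch of this term, assuming fingerprints are represented as Python sets of "on" bit positions (Daylight fingerprints themselves are not reproduced here) and taking the Tanimoto distance as one minus the Tanimoto coefficient:

```python
def tanimoto_distance(fp_a, fp_b):
    """1 - |A & B| / |A | B| for two fingerprints given as sets of set bits."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(fp_a & fp_b) / union


def similarity_penalty(library_fps, lead_fps):
    """Average, over the library, of each compound's distance to its nearest lead."""
    nearest = [min(tanimoto_distance(fp, lead) for lead in lead_fps) for fp in library_fps]
    return sum(nearest) / len(nearest)


# Toy fingerprints: two library compounds, two leads.
library = [{1, 2, 3}, {1, 2}]
leads = [{1, 2, 3}, {7}]
e_sim = similarity_penalty(library, leads)
```

With a single lead, the `min` over leads reduces to the plain distance to that lead.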
Reagent Diversity: S-Optimal Criterion
• Esdiv (S) = Reverse S optimal scores for all R-groups averaged over the number of R-groups
$S_{opt} = \dfrac{N_D}{\sum_{y \in D} \dfrac{1}{d(y,\, D - y)}}$
N_D: the number of design points in D
D: a set of design points (i.e., the selected reagents)
d(x, A): minimum Tanimoto distance (TD) between point x and set of points A
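The S-optimal score can be sketched as below, assuming the usual form of the criterion: the harmonic mean of each design point's distance to its nearest neighbour among the other design points. The function name and the one-dimensional toy distance are illustrative; in the setting above the distance would be the Tanimoto distance between reagents.

```python
def s_optimal(points, dist):
    """N_D divided by the sum over y in D of 1 / d(y, D - y)."""
    n = len(points)
    total = 0.0
    for i, y in enumerate(points):
        # d(y, D - y): distance from y to its nearest other design point.
        nearest = min(dist(y, x) for j, x in enumerate(points) if j != i)
        if nearest == 0:
            return 0.0  # duplicate points drive the harmonic mean to zero
        total += 1.0 / nearest
    return n / total


# Toy example on a line: nearest-neighbour distances are 1, 1 and 2.
score = s_optimal([0.0, 1.0, 3.0], lambda a, b: abs(a - b))
```

A larger score means the selected reagents are more spread out, so a penalty built from it has to reverse the score in some way; the exact reversal used is not given on the slide.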
Product Novelty with Respect to Corporate Collection
• All S.B. compounds were mapped onto a 6D cell space (PCA, or formed by selected features to distinguish biological activities)
• Epn (S) = the smoothed average number of S.B. compounds in the neighbouring cells
Developability Penalty Scores
• Lipinski Parameters
– MW <= 500
– ClogP: -1 to 5
– NHD <= 5
– NHA <= 10
• P450s - predicted non-inhibitory by the P450 classifiers
• Solubility - should be higher than a limit
Each penalty term is the percentage of library compounds that violate the limits for that term
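The percentage-of-violators definition can be illustrated as follows. The `Compound` record, the precomputed property values and the solubility threshold `min_log_solubility` are assumptions made for the example; the slides give the Lipinski limits but not the solubility limit.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    mw: float
    clogp: float
    n_hd: int
    n_ha: int
    p450_inhibitor: bool   # assumed output of a P450 classifier
    log_solubility: float  # assumed output of a solubility predictor


def developability_penalties(library, min_log_solubility=-5.0):
    """Percentage of library compounds violating each limit (threshold is hypothetical)."""
    checks = {
        "mw": lambda c: c.mw <= 500,
        "clogp": lambda c: -1 <= c.clogp <= 5,
        "n_hd": lambda c: c.n_hd <= 5,
        "n_ha": lambda c: c.n_ha <= 10,
        "p450": lambda c: not c.p450_inhibitor,
        "solubility": lambda c: c.log_solubility > min_log_solubility,
    }
    n = len(library)
    return {name: 100.0 * sum(not ok(c) for c in library) / n for name, ok in checks.items()}


# Four made-up compounds; each limit is violated by exactly one of them.
library = [
    Compound(350, 2.1, 2, 5, False, -3.0),
    Compound(620, 4.0, 1, 8, False, -4.0),
    Compound(410, 6.2, 3, 6, True, -6.5),
    Compound(480, 0.5, 6, 11, False, -2.0),
]
penalties = developability_penalties(library)
```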
P450 Classifiers and Solubility Predictor
• P450s: 2D6, 3A4, 1A2, 2C9
– dataset (2D6): Active: ~3500; Inactive: ~4000
– method: 3-layer ANN
– FP: 20%; FN: 10%; Ambiguous: 12-18%
• Solubility
– N = ~550
– method: 3-layer ANN
– rms error: ~1.0 log unit
Logon page
Experiment list
New Experiment Page
Spreadsheet Page
Structure Show
MW/ClogP
Conclusions
• PICCOLO is an in-house library design system that can simultaneously optimise all the factors we care about
• Important developability parameters are taken into account
• Expandable to include other criteria
• A Web-based system being used by SB chemists worldwide
Acknowledgements
Colleagues in Cheminformatics Department: Ken Kopple, Jie Liang (now at Univ. Illinois at Chicago)
Medicinal Chemists: Todd Graybill, Jian Jin, Ronggang Liu, Tom Ku, Dennis Yamashita, Scott Thompson, Jia-Ning Xiang