tam june 2009
Prediction of Animal Clearance Using Naïve Bayesian Classification and Extended Connectivity Fingerprints
Timothy A McIntyre
Introduction
• The pharmaceutical industry faces unprecedented pressure from payers, regulators, ethicists and the general public.
• The cost of drug development is staggering: US $1 Billion.
• There is increasing demand for new medicines with well established safety and efficacy.
• Attrition during drug discovery and development remains high.
• Absorption, distribution, metabolism and excretion (ADME) screening has significantly reduced attrition due to poor pharmacokinetics in humans.
Pharmacokinetics and Clearance
• What is Pharmacokinetics (PK)?
– The study of what the body does to a drug
– Characterization of the drug concentration-time profile
• Drug Clearance (CL) is a primary determinant of drug PK.
– Rate of Elimination = CL x Drug Concentration.
– Measures the efficiency of irreversible elimination from the body.
– Often a result of metabolism by the liver.
• High CL limits systemic exposure and oral bioavailability.
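The relationships on this slide can be made concrete with a short calculation. This is a minimal sketch, assuming clearance is entirely hepatic and absorption is complete, so that the hepatic extraction ratio is CL/Q and the upper bound on oral bioavailability is 1 - CL/Q. The function names and the example numbers are illustrative and are not taken from the study.

```python
def elimination_rate(cl_ml_min_kg, conc_ug_ml):
    """Rate of elimination = CL x drug concentration (here in ug/min/kg)."""
    return cl_ml_min_kg * conc_ug_ml


def max_oral_bioavailability(cl_ml_min_kg, liver_blood_flow_ml_min_kg):
    """Upper bound on oral bioavailability.

    Assumes purely hepatic clearance and complete absorption, so the
    extraction ratio is CL / Q (capped at 1).
    """
    extraction = min(cl_ml_min_kg / liver_blood_flow_ml_min_kg, 1.0)
    return 1.0 - extraction


# Hypothetical compound: CL = 40 mL/min/kg, liver blood flow = 50 mL/min/kg.
rate = elimination_rate(40.0, 2.0)            # elimination at 2 ug/mL
f_max = max_oral_bioavailability(40.0, 50.0)  # high CL leaves little room for exposure
```

With these hypothetical inputs, clearance at 80% of liver blood flow caps oral bioavailability at roughly 20%, which is why high CL limits systemic exposure.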
Experimental ADME and lead optimization
• In vitro metabolic stability (intrinsic CL in liver microsomes) is commonly used to prioritize compounds for in vivo studies.
• Animal studies are more resource intensive, more expensive and lower throughput.
• Rodent PK evaluated prior to higher species using smaller amounts of compound.
• Appropriate PK essential for subsequent evaluation in resource intensive pharmacology or toxicology models.
• Animal PK is used to predict PK in humans and set appropriate starting doses for initial clinical trials.
In Silico ADME – Background of our approach
• Potential to provide unlimited information at low cost.
– Often based on descriptive physico-chemical properties.
• Limited precedent for modeling animal CL directly using detailed structural information.
• Our approach: Bayesian classification and extended connectivity fingerprints.
• Compared model performance to common experimental approaches.
– in vitro liver microsome intrinsic CL (CLi)
– animal CL in a lower species
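To illustrate the kind of classifier named above, here is a minimal sketch of a naïve Bayesian model over binary structural features. It uses a Laplacian-corrected estimator, one formulation commonly described for fingerprint-based Bayesian models; the slides do not say which estimator was used, so treat this as an assumption. The integer feature IDs are hypothetical stand-ins for extended connectivity fingerprint features; no real fingerprints are generated here.

```python
import math
from collections import Counter


def train_laplacian_nb(samples):
    """Learn per-feature log weights from (feature_set, is_high_cl) pairs.

    Weight = log((A + 1) / (T * P + 1)), where A is the number of high-CL
    compounds containing the feature, T the number of all compounds
    containing it, and P the overall fraction of high-CL compounds.
    """
    base_rate = sum(1 for _, is_high in samples if is_high) / len(samples)
    total_counts = Counter()
    high_counts = Counter()
    for features, is_high in samples:
        for feature in features:
            total_counts[feature] += 1
            if is_high:
                high_counts[feature] += 1
    return {
        feature: math.log((high_counts[feature] + 1) / (total_counts[feature] * base_rate + 1))
        for feature in total_counts
    }


def score(weights, features):
    """Sum the weights of the features present; unseen features contribute 0."""
    return sum(weights.get(feature, 0.0) for feature in features)


# Toy training set: hypothetical feature IDs, True = high CL.
training = [
    ({1, 2, 3}, True), ({1, 2}, True), ({1, 5}, True),
    ({2, 4}, False), ({4, 5}, False), ({3, 4}, False),
]
weights = train_laplacian_nb(training)
high_like = score(weights, {1, 2})   # features seen mostly in high-CL compounds
low_like = score(weights, {4})       # feature seen only in low-CL compounds
```

A positive score means the compound's features are enriched among high-CL training compounds; a cut-off on the score then yields the high/low call.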
Experimental Data
• Mouse, rat, dog and monkey in vivo CL and in vitro CLi.
• GSK corporate database (20,000 unique compounds).
• Animal CL normalized to liver blood flow.
– 30 to 90 mL/min/kg depending on the species.
• CL >70% liver blood flow considered high (<70%, low).
• CLi >5 mL/min/g considered high (<5 mL/min/g, low).
– Mid-point of the assay dynamic range (0.5 to 50 mL/min/g tissue).
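The normalization and binning rules above can be sketched as follows. The per-species liver blood flow values are hypothetical placeholders chosen inside the 30 to 90 mL/min/kg range quoted above; the slides do not list the values actually used. The treatment of values exactly at a threshold is also an assumption.

```python
import math

# Hypothetical liver blood flows (mL/min/kg); the source only gives a 30-90 range.
LIVER_BLOOD_FLOW = {"mouse": 90.0, "rat": 70.0, "monkey": 45.0, "dog": 30.0}

HIGH_CL_FRACTION = 0.70   # CL above 70% of liver blood flow is "high"
HIGH_CLI = 5.0            # CLi above 5 mL/min/g is "high"


def normalize_cl(cl_ml_min_kg, species):
    """Express in vivo CL as a fraction of liver blood flow."""
    return cl_ml_min_kg / LIVER_BLOOD_FLOW[species]


def classify_cl(cl_ml_min_kg, species):
    """Bin in vivo CL; values exactly at the threshold are treated as low (assumption)."""
    return "high" if normalize_cl(cl_ml_min_kg, species) > HIGH_CL_FRACTION else "low"


def classify_cli(cli_ml_min_g):
    """Bin in vitro intrinsic CL; values exactly at the threshold are treated as low."""
    return "high" if cli_ml_min_g > HIGH_CLI else "low"


# The CLi cut-off equals the log-scale (geometric) mid-point of the 0.5-50 assay range.
geometric_midpoint = math.sqrt(0.5 * 50.0)
```

Note that 5 mL/min/g is the mid-point of the assay range on a logarithmic scale, not the arithmetic mean of 0.5 and 50.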
Bayesian Modeling
• Pipeline Pilot™ (Accelrys, Inc., San Diego, CA, USA).
• Extended connectivity fingerprints (six bond diameter) using simplified molecular input line entry specification (SMILES) strings as input.
• Compounds randomly assigned to training or test sets in a 5:1 ratio.
• Model predictions (high/low CL) for each test set compared with the corresponding experimental data.
– Accuracy (ACC), Positive Predictive Value (PPV), Negative Predictive Value (NPV), True Positive Rate (TPR), False Positive Rate (FPR), Receiver Operating Characteristic (ROC) AUC
– 90% confidence intervals, p values
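The evaluation steps listed above can be sketched in plain Python. This is an illustration under stated assumptions: the slides do not say which class is treated as positive, how ROC AUC was computed, or how the 90% confidence intervals were derived. Here `True` marks whichever class is chosen as positive, ROC AUC is the pairwise (rank-based) estimate, and the interval is a simple normal approximation, so it need not match the reported intervals exactly. All function names are hypothetical.

```python
import math
import random


def split_train_test(items, ratio=5, seed=0):
    """Randomly split items into training and test sets in a ratio:1 proportion."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = len(shuffled) // (ratio + 1)
    return shuffled[n_test:], shuffled[:n_test]


def _ratio(num, den):
    return num / den if den else float("nan")


def diagnostics(predicted, observed):
    """Confusion-matrix diagnostics for paired boolean predictions and observations."""
    pairs = list(zip(predicted, observed))
    tp = sum(1 for p, o in pairs if p and o)
    fp = sum(1 for p, o in pairs if p and not o)
    tn = sum(1 for p, o in pairs if not p and not o)
    fn = sum(1 for p, o in pairs if not p and o)
    return {
        "ACC": _ratio(tp + tn, len(pairs)),
        "PPV": _ratio(tp, tp + fp),
        "NPV": _ratio(tn, tn + fn),
        "TPR": _ratio(tp, tp + fn),
        "FPR": _ratio(fp, fp + tn),
    }


def roc_auc(scores, observed):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, o in zip(scores, observed) if o]
    neg = [s for s, o in zip(scores, observed) if not o]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return _ratio(wins, len(pos) * len(neg))


def normal_ci(proportion, n, z=1.645):
    """Two-sided 90% normal-approximation interval for a proportion (assumed method)."""
    half = z * math.sqrt(proportion * (1.0 - proportion) / n)
    return proportion - half, proportion + half


# Toy example with made-up predictions and observations.
predicted = [True, True, False, False, True, False]
observed = [True, False, False, False, True, True]
stats = diagnostics(predicted, observed)
auc = roc_auc([0.9, 0.8, 0.3, 0.1], [True, False, True, False])
train, test = split_train_test(range(12))
low, high = normal_ci(0.76, 490)
```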
Descriptive Statistics: Experimental CL and CLi
Summary of animal in vivo clearance and in vitro intrinsic clearance data (a)
Statistic   Mouse CL   Rat CL   Monkey CL   Dog CL   Mouse CLi   Rat CLi   Monkey CLi   Dog CLi
N           1369       17529    1129        2690     3021        42470     3600         3700
Q1          0.28       0.33     0.20        0.31     1.25        1.10      2.02         0.86
Q2          0.60       0.68     0.38        0.65     4.78        3.50      7.20         2.00
Q3          0.92       1.21     0.63        1.17     21.74       11.96     22.00        7.20
(a) CL values are expressed as a fraction of liver blood flow; CLi units are mL/min/g liver. Q1, Q2 (median value) and Q3 are the first, second and third quartiles.
Summary of animal in vivo clearance
[Figure: percentile (0 to 0.8) plotted against normalized CL (0 to 1.5). Series: Mouse CL, N = 1369; Rat CL, N = 17529; Dog CL, N = 2690; Monkey CL, N = 1129.]
Chemical Diversity
• 20,000 unique compounds representing hundreds of lead optimization programs.
• Self-similarity tests, ring analysis, Murcko assemblies.
• Demonstrate substantial structural diversity.
• The top 20 rings accounted for 45-51% of the compounds, depending on the species.
• Median frequency of any particular Murcko assembly less than 0.1%.
– Corresponds to ~18 compounds with Rat CL sharing the same assembly
[Figure: example ring systems: benzimidazole, quinoline, pyrido-pyrimidine.]
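The "~18 compounds" figure follows from applying the 0.1% median frequency to the 17,529 compounds with rat CL. The sketch below shows that arithmetic and a simple way to tabulate assembly frequencies. The assembly labels are hypothetical placeholders; a real analysis would derive them from the structures with a cheminformatics toolkit.

```python
from collections import Counter
from statistics import median


def assembly_frequencies(assemblies):
    """Fraction of compounds carrying each assembly label."""
    counts = Counter(assemblies)
    total = len(assemblies)
    return {label: n / total for label, n in counts.items()}


# Arithmetic behind the slide: 0.1% of the rat CL data set.
n_rat_cl = 17529
median_frequency = 0.001
compounds_sharing_assembly = n_rat_cl * median_frequency  # about 17.5, i.e. ~18

# Toy data with hypothetical assembly labels.
toy = ["A", "A", "B", "C"]
toy_freqs = assembly_frequencies(toy)
toy_median = median(toy_freqs.values())
```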
Near Neighbors
[Figure: share of compounds in each near-neighbor bin (0, 1-2, 3-4, 5-9, >= 10), with one panel each for Rat CL, Mouse CL, Dog CL and Monkey CL.]
Dog Model
Summary of Performance Diagnostics for Methods Used (a)
Predictor   N      ACC                  PPV                  NPV                  TPR                  FPR                  ROC AUC
Dog Model   490    0.76 (0.72 – 0.79)   0.72 (0.68 – 0.77)   0.80 (0.76 – 0.85)   0.83 (0.79 – 0.87)   0.32 (0.27 – 0.37)   0.81 (0.78 – 0.85)
Rat CL      1417   0.70 (0.68 – 0.72)   0.77 (0.75 – 0.80)   0.58 (0.54 – 0.61)   0.77 (0.75 – 0.79)   0.42 (0.38 – 0.45)   0.74 (0.71 – 0.76)
Mouse CL    202    0.65 (0.60 – 0.71)   0.72 (0.65 – 0.78)   0.56 (0.47 – 0.65)   0.72 (0.65 – 0.78)   0.44 (0.35 – 0.53)   0.68 (0.62 – 0.75)
Dog CLi     610    0.71 (0.68 – 0.74)   0.71 (0.67 – 0.74)   0.74 (0.67 – 0.80)   0.91 (0.89 – 0.94)   0.60 (0.55 – 0.66)   0.75 (0.70 – 0.79)
(a) The upper and lower limits of the 90% confidence intervals for each diagnostic are included in parentheses.
FPR Comparisons
[Figure: FPR (axis 0.1 to 0.7) by predictor: Dog Model, Rat CL, Dog CLi; Rat Model, Mouse CL, Rat CLi; Mouse Model, Mouse CLi. Comparison p values shown on the plot: p = 0.012, p = 0.008, p < 0.001, p < 0.001, p = 0.002.]
ROC AUC Comparisons
[Figure: ROC AUC (axis 0.50 to 0.90) by predictor: Dog Model, Rat CL, Mouse CL; Rat Model, Rat CLi; Mouse Model, Mouse CLi; Monkey Model, Monkey CLi. Comparison p values shown on the plot: p < 0.001, p = 0.008, p = 0.007, p = 0.002, p = 0.003.]
NPV Comparisons
[Figure: NPV (axis 0.4 to 0.9) by predictor: Dog Model, Rat CL, Mouse CL; Rat Model, Rat CLi. Comparison p values shown on the plot: p < 0.001, p < 0.001, p < 0.001.]
Effect of Optimization on Rat CL
[Figure: percentile (0 to 1) plotted against normalized CL (CL/Q, 0 to 2). Series: Rat; Rat with Dog; Rat with Monkey.]
Key Messages
• Bayesian model performance was exceptional.
• ROC AUC, ACC, PPV, NPV and TPR ranged from 0.72 to 0.82.
• In predicting dog CL, the Bayesian model was better than experimental rat or mouse CL.
• In predicting rat CL, the Bayesian model performed just as well as mouse CL.
• Models outperformed mouse, rat and monkey CLi for predicting mouse, rat and monkey CL, respectively.
• Models have higher negative predictive value (compounds with high experimental CL have high Bayesian CL).
• Lead optimization bias can affect modeling success (monkey).
Conclusions
• Study demonstrates the potential of naïve Bayesian classification to predict animal CL based on structural fingerprints.
• Models can be used to
– optimize chemical libraries
– direct new chemical synthesis
– increase the efficiency of screening cascades
• Significant potential to reduce cost, time and animal usage associated with the discovery of new medicines.
Acknowledgements
• GSK Drug discovery scientists
• Charles B Davis
• Robert Gagnon
• Amber Anderson
• Chao Han
• John Conway (Accelrys)
• Keith Ward (Bausch & Lomb)
Backup Slides
Near Neighbors
Distribution of Near Neighbors: Animal CL and CLi (a)
NN     Mouse       Rat          Monkey      Dog
CL     N = 1,369   N = 17,529   N = 1,129   N = 2,690
0      32.4        22.4         34.4        25.8
1-2    26.2        23.4         27.7        24.0
3-9    26.8        29.0         27.0        26.7
≥10    14.6        25.2         11.0        23.5
CLi    N = 3,691   N = 43,118   N = 3,634   N = 3,732
0      23.7        15.4         21.8        23.7
1-2    26.4        19.2         23.6        26.2
3-9    26.2        29.5         25.6        26.5
≥10    23.8        35.9         29.0        23.6
(a) Percentage of compounds in various NN (near neighbor) bins. Tanimoto distance <0.15.
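The near-neighbor binning in this table can be sketched as follows, assuming fingerprints are represented as sets of feature IDs and that Tanimoto distance is 1 minus the Tanimoto similarity. The 0.15 cut-off and the bin edges come from the table; the feature sets and function names are hypothetical, and the all-pairs loop is only practical for small sets.

```python
def tanimoto_distance(a, b):
    """1 - |A & B| / |A | B| for two feature sets (0.0 if both are empty)."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union


def near_neighbor_counts(fingerprints, cutoff=0.15):
    """Count, for each compound, the other compounds within the distance cut-off."""
    counts = [0] * len(fingerprints)
    for i in range(len(fingerprints)):
        for j in range(i + 1, len(fingerprints)):
            if tanimoto_distance(fingerprints[i], fingerprints[j]) < cutoff:
                counts[i] += 1
                counts[j] += 1
    return counts


def nn_bin(count):
    """Map a near-neighbor count onto the bins used in the table."""
    if count == 0:
        return "0"
    if count <= 2:
        return "1-2"
    if count <= 9:
        return "3-9"
    return ">=10"


# Toy fingerprints with hypothetical feature IDs.
fps = [set(range(10)), set(range(11)), {100, 101}]
counts = near_neighbor_counts(fps)
bins = [nn_bin(c) for c in counts]
```

In the toy data the first two compounds differ by one feature out of eleven (distance about 0.09), so each is the other's only near neighbor, while the third has none.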
Rat Model
Summary of Performance Diagnostics for Methods Used (a)
Predictor   N      ACC                  PPV                  NPV                  TPR                  FPR                  ROC AUC
Rat Model   3077   0.74 (0.73 – 0.75)   0.73 (0.71 – 0.75)   0.76 (0.74 – 0.78)   0.79 (0.77 – 0.81)   0.31 (0.29 – 0.33)   0.82 (0.80 – 0.83)
Mouse CL    409    0.74 (0.70 – 0.77)   0.74 (0.70 – 0.79)   0.72 (0.66 – 0.78)   0.84 (0.80 – 0.88)   0.42 (0.35 – 0.48)   0.80 (0.76 – 0.84)
Rat CLi     5947   0.61 (0.60 – 0.62)   0.58 (0.57 – 0.59)   0.66 (0.64 – 0.68)   0.79 (0.78 – 0.80)   0.58 (0.57 – 0.60)   0.65 (0.64 – 0.67)
(a) The upper and lower limits of the 90% confidence intervals for each diagnostic are included in parentheses.
Monkey Model
Summary of Performance Diagnostics for Methods Used (a)
Predictor      N     ACC                  PPV                  NPV                  TPR                  FPR                  ROC AUC
Monkey Model   206   0.76 (0.71 – 0.81)   0.92 (0.88 – 0.96)   0.42 (0.31 – 0.52)   0.77 (0.72 – 0.83)   0.29 (0.17 – 0.41)   0.81 (0.75 – 0.87)
Rat CL         835   0.70 (0.67 – 0.73)   0.90 (0.87 – 0.92)   0.35 (0.31 – 0.40)   0.71 (0.68 – 0.74)   0.35 (0.29 – 0.41)   0.74 (0.71 – 0.77)
Dog CL         569   0.67 (0.64 – 0.71)   0.84 (0.81 – 0.87)   0.36 (0.30 – 0.42)   0.71 (0.67 – 0.74)   0.44 (0.37 – 0.52)   0.73 (0.69 – 0.77)
Monkey CLi     486   0.60 (0.57 – 0.64)   0.85 (0.81 – 0.89)   0.35 (0.30 – 0.40)   0.58 (0.53 – 0.62)   0.31 (0.24 – 0.37)   0.69 (0.66 – 0.73)
(a) The upper and lower limits of the 90% confidence intervals for each diagnostic are included in parentheses.