Prediction Model for Absorptivity of Drug-like Compounds
Based on Structural Features and Interfacial Properties
Chihae Yang, Glenn Myatt, Paul Blower (LeadScope, Inc.)
Jim Rathman (The Ohio State University)
Objectives
Develop Prediction Model for Absorptivity:
§ Compound Description by Structural Features
– Selection of features
§ Compound Description by Physical Properties
– Molecular parameters
– Interfacial parameters
§ Prediction Models
– Structure feature based
– Property based
– Structure feature and property based
Structural Description of Dataset
Classes | Sub classification | # of compounds
Heterocycles | N-containing heterocycles | 70
 | 5-membered ring | 37
 | Pyrrolidone | 9
 | Naphthalenes | 2
Natural Products | Steroids | 11
 | 5,6 or 6,6-fused rings | 6
 | Pyridine (partially saturated) | 21
 | 3,4 ring system | 14
L.G. Martini et al., European Journal of Pharmaceutics and Biopharmaceutics 48 (1999) 259-263
K. Palm, K. Luthman et al., J. Med. Chem. 1998, 41, 5382-5392
W.L. Chiou, Pharmaceutical Research, Vol. 17, No. 2, 135-140, 2000
P. Stenberg, U. Norinder et al., J. Med. Chem. 2001, 44, 1927-1937
Total compounds ~100
Structural Description of Dataset
Classes | Sub classification | # of compounds
 | Amino acids | 14
 | Bases, nucleosides | 4
Benzenes | 1,2 substitution | 55
 | 1,3 substitution | 45
 | 1,4 substitution | 50
 | Any 1-substitution | 80
Functional Group | Alcohol | 56
 | Amines | 91
 | Carboxylate and carboxylic acid | 18
 | Quinone | 2
 | Halide | 38
 | Sulfide | 15
 | Ether | 37
Distribution of % Fraction Absorbed Data
Fraction Absorbed after oral administration to humans*
* P. Stenberg, U. Norinder et al., J. Med. Chem. 2001, 44, 1927-1937
K. Palm et al., Pharm. Research 1997, 14, 568-571
K. Palm, K. Luthman et al., J. Med. Chem. 1998, 41, 5382-5392
Clustering of Compounds Against %FA
[Figure: compounds clustered by %FA range]
0-10 %
25-44 %
50-70 %
71-89 %
90-99 %
100 % (removed from model)
Factors Affecting Absorption
§ Physical Properties:
– Solubility
– Dissolution rate
– Molecular size
– Partition coefficient
§ Physiological Properties:
– Regional pH
– Intestinal permeability
§ Not considered:
– Active transport, binding, complexation, etc.
– Paracellular transport
– Metabolism
– Gastric and intestinal transit
Compound Description By Physical Properties
§ Molecular weight
§ Hydrogen bond acceptors and donors
§ Log P
§ Log D (calculated at pH 1, 4, 7, and 8)
§ pKa and solubility (at pH 1, 4, 7, 8)
§ Polar surface area
§ Thermodynamic solution/interfacial property
– Activity coefficients at infinite dilution
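The slides do not say how Log D was calculated. As a minimal sketch of how Log D relates to Log P and pKa, the block below assumes a monoprotic compound whose neutral form alone partitions into octanol; the function name `log_d` and the example values (log P = 3.0, pKa = 4.0) are hypothetical.

```python
import math


def log_d(log_p, pka, ph, acid=True):
    """Log D of a monoprotic compound, assuming only the neutral form partitions.

    acid=True  : HA <-> A-  + H+  (ionized above the pKa)
    acid=False : BH+ <-> B + H+   (ionized below the pKa)
    """
    exponent = (ph - pka) if acid else (pka - ph)
    return log_p - math.log10(1.0 + 10.0 ** exponent)


# Hypothetical monoprotic acid evaluated at the four pH values listed on the slide.
for ph in (1, 4, 7, 8):
    print(f"pH {ph}: log D = {log_d(3.0, 4.0, ph):.2f}")
```

At pH well below the pKa of an acid, Log D approaches Log P; above the pKa it falls by roughly one log unit per pH unit, which is why a single Log P value can mislead for ionizable drugs.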
Property Distributions of Dataset
[Figure: histograms of molecular weight, rotatable bonds, hydrogen bond acceptors, hydrogen bond donors, polar surface area, and aLogP]
Relationships (or Lack of) between FA and Properties
Prediction Based on Properties using NIPALS*
[Figure: scatter plot of predicted_FA vs. actual_FA, both axes from 0 to 1]
Properties: MW, HBD, HBA, PSA, LogP, Log D (at pH 1, 4, 7, 8), solubility (at pH 1, 4, 7), pKa
Compounds: 93 compounds with %FA ranging from 0 to 100
R2 = 0.40; R2 = 0.49 if compounds with 100 % absorption are excluded
* Nonlinear iterative partial least squares algorithm from Geladi and Kowalski, Analytica Chimica Acta, 185 (1986) 1-17.
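For readers unfamiliar with NIPALS, the following is a minimal pure-Python sketch of partial least squares with a single response (such as FA). It is not the authors' implementation; the function names and the toy data are hypothetical. With one response variable the inner NIPALS iteration converges in a single pass, so each factor is extracted directly and then deflated from the data.

```python
import math


def nipals_pls1(X, y, n_factors):
    """Fit PLS with one response by NIPALS-style extraction and deflation."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # centered X
    f = [v - y_mean for v in y]                                # centered y
    factors = []
    for _ in range(n_factors):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:  # nothing left in y that X can explain
            break
        w = [v / norm for v in w]                                   # weights
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]  # X loadings
        q = sum(f[i] * t[i] for i in range(n)) / tt                 # y loading
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        factors.append((w, p, q))
    return x_mean, y_mean, factors


def pls_predict(model, x):
    """Predict one sample by repeating the deflation steps used in fitting."""
    x_mean, y_mean, factors = model
    e = [x[j] - x_mean[j] for j in range(len(x))]
    y_hat = y_mean
    for w, p, q in factors:
        t = sum(e[j] * w[j] for j in range(len(e)))
        y_hat += q * t
        e = [e[j] - t * p[j] for j in range(len(e))]
    return y_hat


def r_squared(actual, predicted):
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy data: two descriptors, response exactly linear in both.
X_demo = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0]]
y_demo = [2.0 * a - b + 1.0 for a, b in X_demo]
model_demo = nipals_pls1(X_demo, y_demo, n_factors=2)
r2_demo = r_squared(y_demo, [pls_predict(model_demo, row) for row in X_demo])
```

The number of factors plays the role of the "7 factors" and "20 factors" settings quoted in the slides: fewer factors give a more constrained model.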
Prediction Based on Structural Features
§ Selection of representative features from the dataset
– global
– local neighbors
§ Scoring or extraction criteria
§ Reduction of dimensionality
Feature Selection by Scoring Criteria
§ Method 1: Scoring of features (select 25 from ∼1600)
– Atom coverage: maximize
– Partition of compound set
• prioritize features that partition the compound set close to 50:50
– Complementarity of features
• minimize the overlap between features
§ Method 2: Extraction from principal components
– diagnostic: influence function*
§ Compared features from methods 1 and 2 for selection
§ Used 25 feature counts per compound as fingerprint values
* Brooks, S.P. The Statistician (1994), 43, 483-494
* Pack, P.; Jolliffe, I.T.; Morgan, B.J.T. Journal of Applied Statistics (1988), 15, 39-52.
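The slides do not give the actual scoring function, so the following is only a sketch of how two of the Method 1 criteria (a partition close to 50:50 and low overlap between features) could be combined in a greedy selection. The score formula, the `overlap_weight` parameter, and the compound sets are all hypothetical; atom coverage is left out because it needs the structures themselves.

```python
def partition_score(members, n_compounds):
    """1.0 when a feature splits the set 50:50, 0.0 when it hits none or all."""
    fraction = len(members) / n_compounds
    return 1.0 - abs(fraction - 0.5) * 2.0


def overlap(a, b):
    """Jaccard overlap between the compound sets matched by two features."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_features(feature_hits, n_compounds, n_select, overlap_weight=1.0):
    """Greedily pick features that split the set evenly and overlap little."""
    selected = []
    candidates = set(feature_hits)

    def score(name):
        worst = max((overlap(feature_hits[name], feature_hits[s]) for s in selected),
                    default=0.0)
        return partition_score(feature_hits[name], n_compounds) - overlap_weight * worst

    while candidates and len(selected) < n_select:
        best = max(sorted(candidates), key=score)  # sorted() makes ties deterministic
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical hit lists: feature name -> ids of the compounds containing it.
hits_demo = {
    "tert-amine": {0, 1, 2, 3, 4},
    "amine, alkyl, cyc-": {0, 1, 2, 3, 5},
    "ether": {5, 6, 7, 8},
    "alkene": {9},
}
chosen_demo = select_features(hits_demo, n_compounds=10, n_select=2)
```

In this toy case the second pick is "ether" rather than "tert-amine": both amine features split the set 50:50, but they match nearly the same compounds, so the complementarity penalty favors the less redundant feature.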
Selected Features According to Criteria
Features | Counts in the data set
[benzene, 1-amino-] + [benzene, 1-amino-] | 14
[amine, alkyl, acyc-] + [tert-amine, p-alkyl-] | 27
[alcohol] + [ether, p-alkyl-] | 21
[benzene, 1-(alkyl, cyc)-] + [benzene, 1-(alkyl, cyc)-] | 17
[carbonyl, alkyl, acyc-] + [carboxamide] | 39
[carbonyl] + [methane, 1-aryl-,1-carbonyl-] | 58
[amine, alkyl, cyc-] + [carboxamide(NHR), alkyl-] | 33
carboxamide | 39
benzene, 1-chloro- | 20
amine(NR), diphenyl | 11
benzene, 1-(alkyl, acyc)- | 31
ether | 37
alcohol, alkyl- | 51
benzene, 1-oxy- | 33
benzene, 1-(alkyl, cyc)- | 22
pyridine(H) | 21
tert-amine | 48
amine, alkyl, cyc- | 47
alkene | 36
benzene, 1-amino- | 31
carbonyl, alkyl, acyc- | 40
Fingerprint Table: Features and Counts
TemplateName | Acebutolol | Acetazolamide | Alprazolam | Amiodarone | Amitriptyline | ...
carbonyl, alkyl, acyc- | 2 | 1 | 0 | 0 | 0
benzene, 1-alkylamino- | 0 | 0 | 0 | 0 | 0
benzene, 1-amino- | 1 | 0 | 1 | 0 | 0
alkene | 0 | 0 | 0 | 0 | 1
amine, alkyl, cyc- | 0 | 0 | 0 | 0 | 0
tert-amine | 0 | 0 | 0 | 1 | 1
benzene, 1-alkoxy- | 1 | 0 | 0 | 1 | 0
benzene, 1-(alkyl, cyc)- | 0 | 0 | 0 | 0 | 2
benzene, 1-oxy- | 1 | 0 | 0 | 1 | 0
alcohol | 1 | 0 | 0 | 0 | 0
alcohol, alkyl- | 1 | 0 | 0 | 0 | 0
ether | 1 | 0 | 0 | 1 | 0
benzene, 1-(alkyl, acyc)- | 0 | 0 | 0 | 0 | 0
amine(NR), diphenyl | 0 | 0 | 0 | 0 | 0
benzene, 1-chloro- | 0 | 0 | 1 | 0 | 0
carboxamide | 1 | 1 | 0 | 0 | 0
[tert-amine] + [pyridine(H)] | 0 | 0 | 0 | 0 | 0
...
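A fingerprint table like the one above becomes model input once each compound is turned into a fixed-order vector of feature counts. The sketch below shows that step for a five-feature subset and three compounds taken from the table; the variable and function names are hypothetical, and the real fingerprints use 25 features.

```python
# Fixed feature order; every fingerprint vector follows it.
FEATURES = [
    "carbonyl, alkyl, acyc-",
    "benzene, 1-amino-",
    "alkene",
    "tert-amine",
    "ether",
]

# Non-zero counts only, copied from the fingerprint table for three compounds.
feature_counts = {
    "Acebutolol": {"carbonyl, alkyl, acyc-": 2, "benzene, 1-amino-": 1, "ether": 1},
    "Amiodarone": {"tert-amine": 1, "ether": 1},
    "Amitriptyline": {"alkene": 1, "tert-amine": 1},
}


def fingerprint(compound):
    """Return the count vector for one compound; absent features count as 0."""
    counts = feature_counts[compound]
    return [counts.get(feature, 0) for feature in FEATURES]


# Rows = compounds (alphabetical), columns = features: the X matrix for a model.
X_fingerprints = [fingerprint(name) for name in sorted(feature_counts)]
```

Because counts rather than presence/absence bits are stored, a compound with two acyclic alkyl carbonyls is distinguished from one with a single such group.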
Introduction of Solution/Interfacial Properties
§ Factors important for passive diffusion through lipid bilayer
– Headgroup interaction
– Hydrophobic tail interaction
– Hydrophilic to lipophilic balance (HLB)
§ Partition model of drug molecules in lipid layer:
$$\text{Drug}_{\text{bulk}} \Leftrightarrow \text{Drug}_{\text{lipid}}$$
at equilibrium:
$$a_{\text{Drug-bulk}} = a_{\text{Drug-lipid}}$$
partition coefficient:
$$K \approx \frac{x_{\text{Drug-lipid}}}{x_{\text{Drug-bulk}}} = \frac{\gamma_{\text{Drug-bulk}}}{\gamma_{\text{Drug-lipid}}}$$
γ: activity coefficient
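The step from equal activities to a ratio of activity coefficients can be written out explicitly. The derivation below assumes the usual definition of activity as a = γx with the same reference state in both phases; that assumption is not stated on the slide.

```latex
\begin{align}
a_{\text{Drug-bulk}} &= a_{\text{Drug-lipid}} \\
\gamma_{\text{Drug-bulk}}\, x_{\text{Drug-bulk}} &= \gamma_{\text{Drug-lipid}}\, x_{\text{Drug-lipid}} \\
\frac{x_{\text{Drug-lipid}}}{x_{\text{Drug-bulk}}} &= \frac{\gamma_{\text{Drug-bulk}}}{\gamma_{\text{Drug-lipid}}}
\end{align}
```

A drug that is "uncomfortable" in the bulk phase (large γ there) and "comfortable" in the lipid (small γ there) therefore partitions strongly into the lipid.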
Partition and Activity Coefficients
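Partition coefficient (in dilute solution):
$$K \approx \frac{\gamma^{\text{bulk}}_{\text{Drug}}}{\gamma^{\text{lipid}}_{\text{Drug}}}$$
$$\log K \approx \log \gamma^{\text{bulk}}_{\text{Drug}} - \log \gamma^{\text{lipid}}_{\text{Drug}}$$
Compare with LogP (octanol-water):
$$P = \frac{C^{\text{octanol}}_{\text{Drug}}}{C^{\text{water}}_{\text{Drug}}} \approx \frac{C^{\text{octanol}}_{\text{pure}}}{C^{\text{water}}_{\text{pure}}} \cdot \frac{\gamma^{\text{water}}_{\text{Drug}}}{\gamma^{\text{octanol}}_{\text{Drug}}}$$
$$\log P \propto \log \gamma^{\text{water}}_{\text{Drug}} - \log \gamma^{\text{octanol}}_{\text{Drug}}$$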
UNIFAC Activity Coefficient Model
$$\ln \gamma_i = \ln \gamma_i^{C} + \ln \gamma_i^{R}$$
“combinatorial” term ($\ln \gamma_i^{C}$): molecular volume and surface area effects (size, shape, packing)
“residual” term ($\ln \gamma_i^{R}$): intermolecular energy effects (interaction)
Combinatorial Term
$$\ln \gamma_i^{C} = \ln\frac{\phi_i}{x_i} + \frac{z}{2}\, q_i \ln\frac{\theta_i}{\phi_i} + l_i - \frac{\phi_i}{x_i}\sum_j x_j l_j$$
where:
$$l_i = \frac{z}{2}\,(r_i - q_i) - (r_i - 1)$$
$\ln \gamma_i^{C}$ is calculated using a group contribution approach:
• The drug and solvent molecules are decomposed into simple fragments.
• Volume (r) and surface area (q) parameters are computed for each molecule by summing values for the appropriate fragments.
• At a given mole fraction xi, the fraction of the total volume (φi) and total surface area (θi) due to compound i are calculated.
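The three steps above can be sketched in a few lines. This is an illustration, not the authors' code: it assumes the conventional coordination number z = 10 and uses the commonly tabulated UNIFAC volume and surface parameters for four simple groups; the function names are hypothetical. Writing φi/xi as ri/Σxjrj keeps the expression finite at infinite dilution (xi = 0), the limit used in the slides.

```python
import math

Z = 10  # lattice coordination number; 10 is the conventional choice (assumption)

# Group volume (R) and surface area (Q) parameters as commonly tabulated for UNIFAC.
GROUP_RQ = {
    "CH3": (0.9011, 0.848),
    "CH2": (0.6744, 0.540),
    "OH": (1.0000, 1.200),
    "H2O": (0.9200, 1.400),
}


def molecule_rq(groups):
    """Sum fragment R and Q values to get molecular r and q."""
    r = sum(GROUP_RQ[g][0] for g in groups)
    q = sum(GROUP_RQ[g][1] for g in groups)
    return r, q


def ln_gamma_combinatorial(x, r, q):
    """Combinatorial ln(gamma) for every component of a mixture.

    x, r, q are lists of mole fractions, volume and surface parameters.
    """
    sum_xr = sum(xi * ri for xi, ri in zip(x, r))
    sum_xq = sum(xi * qi for xi, qi in zip(x, q))
    l = [Z / 2 * (ri - qi) - (ri - 1) for ri, qi in zip(r, q)]
    sum_xl = sum(xi * li for xi, li in zip(x, l))
    result = []
    for ri, qi, li in zip(r, q, l):
        phi_over_x = ri / sum_xr                       # phi_i / x_i
        theta_over_phi = (qi / sum_xq) / (ri / sum_xr)  # theta_i / phi_i
        result.append(
            math.log(phi_over_x)
            + Z / 2 * qi * math.log(theta_over_phi)
            + li
            - phi_over_x * sum_xl
        )
    return result


# Ethanol (CH3 + CH2 + OH) at infinite dilution in water.
r_eth, q_eth = molecule_rq(["CH3", "CH2", "OH"])
r_w, q_w = molecule_rq(["H2O"])
ln_gc_demo = ln_gamma_combinatorial([0.0, 1.0], [r_eth, r_w], [q_eth, q_w])
```

The combinatorial term depends only on size and shape, so it vanishes for a pure component and for mixtures of molecules with identical r and q.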
Residual Term
$\ln \gamma_i^{R}$ is calculated using the same fragments:
• Pairwise interaction terms (Ψmn and Ψnm) are available for the fragments.
• Ψ values are directly related to intermolecular potentials:
$$\Psi_{mn} = \exp[(u_{nn} - u_{mn})/RT] \qquad \Psi_{nm} = \exp[(u_{mm} - u_{mn})/RT]$$
• Although in theory these can be calculated from intermolecular potential functions, in practice they are based on experimental data (primarily from petrochemical and polymer databases).
$$\ln \gamma_i^{R} \propto q_k \left[ 1 - \ln \sum_m \theta_m \Psi_{mk} - \sum_m \frac{\theta_m \Psi_{km}}{\sum_n \theta_n \Psi_{nm}} \right]$$
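As a sketch of how the bracketed group expression is evaluated, the block below computes it for every group in a mixture of fragments. The function names, the example Ψ matrix, and the group mole fractions are hypothetical; a full residual term would also sum these group values over each molecule and subtract the pure-component reference, which is omitted here.

```python
import math


def surface_fractions(group_mole_fractions, Q):
    """theta_m = Q_m * X_m / sum_n(Q_n * X_n) for each group m."""
    total = sum(Qm * Xm for Qm, Xm in zip(Q, group_mole_fractions))
    return [Qm * Xm / total for Qm, Xm in zip(Q, group_mole_fractions)]


def ln_group_gamma(theta, psi, Q):
    """Evaluate Q_k * [1 - ln(sum_m theta_m psi_mk) - sum_m(theta_m psi_km / sum_n theta_n psi_nm)].

    psi[m][n] holds the interaction term Psi_mn.
    """
    n = len(theta)
    # denom[m] = sum_n theta_n * Psi_nm, shared by every group k
    denom = [sum(theta[j] * psi[j][m] for j in range(n)) for m in range(n)]
    values = []
    for k in range(n):
        log_term = math.log(sum(theta[m] * psi[m][k] for m in range(n)))
        ratio_term = sum(theta[m] * psi[k][m] / denom[m] for m in range(n))
        values.append(Q[k] * (1.0 - log_term - ratio_term))
    return values


# Hypothetical two-group system (surface parameters for CH3 and OH).
Q_demo = [0.848, 1.200]
theta_demo = surface_fractions([0.5, 0.5], Q_demo)
psi_demo = [[1.0, 0.05], [0.6, 1.0]]  # made-up interaction terms, Psi_mm = 1
ln_group_demo = ln_group_gamma(theta_demo, psi_demo, Q_demo)
```

When every Ψ equals 1 (no energetic preference between fragments) the expression is zero for all groups, which is the sense in which the residual term captures interaction energy only.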
UNIFAC Group Contribution
CH3, CH2, CH, C, CH2=CH, CH=CH, CH2=C, C=C, ArH, ArC, ArCH3, ArCH2, ArCH
OH, CH3OH, H2O, ArOH, CH3C(O), CH2C(O), CH(O), CH3C(O)O, CH2C(O)O, HC(O)O, CH3O, CH2O, CH-O, Ring-CH2O
CH3NH2, CH2NH2, CHNH, CH3N, CH2N, ArNH2, C5H5N, C5H4N, C5H3N, CH3CN, CH2CN, COOH, HCOOH
CH2Cl, CHCl, CCl, CH2Cl2, CHCl2, CCl2, CHCl3, CCl3, CCl4, ArCl, CH3NO2, CH2NO2, CHNO2, ArNO2
CS2, CH3SH, CH2SH, CF3, CF2, CF, (CH2OH)2, Furfural, Cl(C=C), Me2SO, C(O)N(Me)2, C(O)N(Me)CH2, C(O)N(CH2)2
The Properties of Gases and Liquids, 4th ed., R. Reid, J. Prausnitz, B. Poling, McGraw-Hill, 1987
Lipid As A Solvent Phase
[Figure: lipid structure drawings; only atom labels (P, N, O) survive in the transcript]
Example of Activity Coefficients in Various Environments
[Figure: structure drawing of the example compound]
Solvent | Log10 γ∞
Water | 5.23
Octanol | 0.05
Lipid tail | -0.40
Glycolipid | 0.12
Hexadecane | 0.73
Because UNIFAC originated in petrochemical applications, its standard tables do not include a few of the basic drug-like fragments present in this preliminary study.
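The tabulated Log10 γ∞ values can be combined with the dilute-solution relation log K ≈ log γ(bulk) − log γ(lipid) from the earlier slide. The sketch below treats water as the bulk phase and each other solvent in turn as the receiving phase; the function name is hypothetical and the numbers are the ones in the table.

```python
# Log10 of the infinite-dilution activity coefficient, copied from the table.
log_gamma_inf = {
    "Water": 5.23,
    "Octanol": 0.05,
    "Lipid tail": -0.40,
    "Glycolipid": 0.12,
    "Hexadecane": 0.73,
}


def log_k_from_water(solvent):
    """log K for partitioning from water into `solvent` (dilute-solution limit)."""
    return log_gamma_inf["Water"] - log_gamma_inf[solvent]


for solvent in ("Octanol", "Lipid tail", "Glycolipid", "Hexadecane"):
    print(f"water -> {solvent}: log K = {log_k_from_water(solvent):.2f}")
```

For this example compound the estimates for octanol and the lipid tail region differ by 0.45 log units (5.18 versus 5.63), which illustrates how an octanol-based value can misjudge partitioning into the tail region.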
Pairwise Correlations of Variables
[Figure: scatterplot matrix of FA, water, octanol, glycolipid, tails, and hexadecane]
Model Comparisons
Model | R2 (7 factors) | R2 (other)
11 Properties only (LogP, PSA, LogD, pKa, MW, HBA, HBD, etc.) | | 0.40
Activity coefficients only (1) | |
PSA only (1) | |
Structural features only | 0.65 |
Structure features + activity coefficients | 0.66 |
Structure features + HBA | 0.66 |
Structure features + activity coefficients + PSA | 0.70 |
Structure features + PSA | 0.68 |
Structure features + activity coefficients + HBA | 0.68 |
[Row alignment is not preserved in the transcript for the remaining R2 values. 20 factors: 0.72, 0.73, 0.69, 0.70, 0.69, 0.67. Other (PSA only, activity coefficients only): 0.34, 0.32. Structure features + PSA + HBA: 0.67 and 0.69, columns not labeled.]
(1) By a simple linear regression; all others by nonlinear iterative partial least squares (NIPALS).
Order of importance: Features > activity coefficient ≈ PSA > H-bond acceptors
Feature and Interfacial Property Based Prediction Model
[Figure: scatter plot of predicted vs. actual %FA, both axes from 0 to 100]
Model: Structural features, activity coefficients, PSA
Method: nonlinear iterative partial least squares (NIPALS) with 7 factors
R2 = 0.70
Preliminary Prediction
§ Test set: 5 compounds were randomly selected (one from each cluster of the FA values) and were not included in the model building
§ Training set: 66 compounds were used as the training set, using the NIPALS method with 7 factors. The model was based on structural features, PSA, and 5 activity coefficients
Drug name | Actual (%) | Predicted (%)
Acebutolol | 50 | 44
Doxorubicin | 5 | 8
Metoprolol | 95 | 83
Mianserin | 70 | 71
Penicillin-G | 20 | 59
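Holding out one compound per %FA cluster, as described for the test set above, amounts to a small stratified split. The sketch below is one way to do it, not the authors' procedure: the cluster boundaries are the ranges from the clustering slide, while the function names, the seed, and the compound identifiers with their %FA values are hypothetical.

```python
import random

# %FA ranges from the clustering slide (the 100 % group was removed from the model).
FA_CLUSTERS = [(0, 10), (25, 44), (50, 70), (71, 89), (90, 99)]


def cluster_of(fa):
    """Return the index of the cluster containing fa, or None if it fits none."""
    for index, (low, high) in enumerate(FA_CLUSTERS):
        if low <= fa <= high:
            return index
    return None


def stratified_holdout(fa_by_compound, seed=0):
    """Pick one random compound per cluster as the test set; the rest train."""
    rng = random.Random(seed)
    by_cluster = {}
    for name in sorted(fa_by_compound):  # sorted for reproducibility
        index = cluster_of(fa_by_compound[name])
        if index is not None:
            by_cluster.setdefault(index, []).append(name)
    test = [rng.choice(names) for _, names in sorted(by_cluster.items())]
    train = [n for names in by_cluster.values() for n in names if n not in test]
    return train, test


# Hypothetical compounds and %FA values, three per cluster plus one at 100 %.
fa_demo = {
    f"cmpd_{i:02d}": fa
    for i, fa in enumerate([5, 8, 2, 30, 40, 25, 55, 60, 70, 75, 80, 89, 90, 95, 99, 100])
}
train_demo, test_demo = stratified_holdout(fa_demo, seed=0)
```

Sampling within clusters keeps the small test set spread across the whole %FA range instead of concentrating in the well-absorbed compounds that dominate such datasets.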
Conclusions
§ Assuming passive diffusion to be the most critical factor for small molecule absorption in the GI tract, structural features extracted from the compound dataset described %FA much better than any properties.
§ Activity coefficient calculations may explain why LogP does not correlate well with absorption: partitioning into a highly hydrophobic environment (lipid tail region) is not modeled properly using octanol.
§ This preliminary study shows that models based on structural features may be further improved by addition of interfacial properties such as activity coefficients and polar surface area.
Next Steps
§ Apply to larger dataset
§ Further elaborate the scoring function for feature selection
§ Method refinement of UNIFAC to model drug-like compounds
– Calculate R and Q values for the selected features from this dataset.
– Calculate activity coefficients at infinite dilution
– Explore activity coefficients in multicomponent environments
§ Model can be applied to Caco-2 cell permeability studies
– Human or animal absorption data may be too complicated to model with predictive accuracy
§ The model will also be compensated for transport phenomena.
Acknowledgement
§ Julie Roberts, LeadScope, Inc.
– building structures
§ Kevin Cross, LeadScope, Inc.
– calculation of LogP and PSA
§ Tim Sötherlund, Kibron, Inc.
– application of surface (air-liquid) properties to ADME properties