8/3/2019 Urmila Joshi
2D QSAR
QSAR IN DRUG DESIGN
To introduce order in the universe, man must
pay attention to the quantitative aspects of
the universe and try to find a mathematical
relationship between them.
--------- Galileo Galilei
History of QSAR
Biological properties of molecules are related to their structure.
A mathematical relationship between structure and biological properties was proposed at the end of the 19th century.
Hydrophobicity was the first physicochemical property shown to have a quantitative relationship with narcotic activity.
History of QSAR
Crum-Brown and Fraser postulate: $\phi = f(C)$, i.e., physiological activity ($\phi$) is a function of chemical constitution ($C$)
Meyer and Overton: practical evidence of toxicity as a function of lipophilicity
Ferguson's principle: depressant action is related to the relative saturation in the vapour phase; the first attempt to use a thermodynamic constant
History of QSAR
Use of substituent constants:
1. Hammett constant (σ)
2. Taft steric constant Es and the Hancock correction
3. Lipophilicity constant (π)
A Variety of Physicochemical Parameters
Lipophilicity: log P, π, Rm
Electronic effects: σ, F, R, dipole moments, spectral shifts, ionization constants, quantum chemical indices for electron density
Steric parameters:
Molecular Connectivity Index
Molecular Descriptors
Calculated/computed: obtained by a mathematical procedure that converts chemical structural information into a number
Experimentally determined: measured with standardized experiments on some molecular attribute
Why do we need descriptors?
Describe different aspects of molecules
Compare different molecular structures
Compare different conformations of the same molecule
Database storage
Molecular Descriptors
Structural information is given by molecular descriptors
Molecular descriptors can be classified as 1D, 2D and 3D descriptors
In this scheme, 1D descriptors give information about the whole molecule, 2D descriptors about the substituents, and 3D descriptors about the molecular fields
Types of Descriptors
Counts of features: e.g., hydrogen-bond acceptors (HBAs), hydrogen-bond donors (HBDs), aromatic ring systems, substructures/fragments (e.g., carbonyl groups, basic nitrogens, carboxyl groups), etc.
Physicochemical properties: log P, solubility, MW, MP, BP, heat of sublimation, molar refractivity, Hammett parameters, etc.
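Molar refractivity is usually obtained from the Lorentz-Lorenz relation MR = [(n^2 - 1)/(n^2 + 2)] * (MW/d), where n is the refractive index, MW the molecular weight and d the density. The sketch below assumes that relation; the function name and the input values are illustrative only, not data from this deck.

```python
def molar_refractivity(n: float, mol_weight: float, density: float) -> float:
    """Lorentz-Lorenz molar refractivity in cm^3/mol.

    n          -- refractive index (dimensionless)
    mol_weight -- molecular weight in g/mol
    density    -- density in g/cm^3
    """
    if density <= 0:
        raise ValueError("density must be positive")
    lorentz_factor = (n ** 2 - 1) / (n ** 2 + 2)   # polarizability term, between 0 and 1
    molar_volume = mol_weight / density            # cm^3/mol
    return lorentz_factor * molar_volume


# Hypothetical liquid: n = 1.50, MW = 100 g/mol, d = 1.00 g/cm^3
mr = molar_refractivity(1.50, 100.0, 1.00)
```

Because MR combines a volume term with a polarizability term, it is often read as a bulk (steric) descriptor with some electronic character.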
Types of Descriptors (contd.)
Topological indices: Wiener index, branching indices, kappa shape indices, electrotopological state indices, atom pairs, topological torsions, etc.
BCUTs (3-D, 2-D, 2-T): electrostatic, charge, and polarizability (hydrophobic).
Others: VolSurf, polar surface area, etc.
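Two of the topological indices listed above can be computed directly from the hydrogen-suppressed molecular graph: the Wiener index (sum of shortest-path bond counts over all atom pairs) and the Randić branching index, which is also the first-order molecular connectivity index (sum over bonds of 1/sqrt(d_i * d_j), where d is the heavy-atom degree). A minimal pure-Python sketch, assuming a connected graph stored as an adjacency dict; n-butane and isobutane are toy inputs.

```python
from collections import deque
from math import sqrt


def distances_from(adj, start):
    """Breadth-first search: number of bonds from `start` to every atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for nbr in adj[atom]:
            if nbr not in dist:
                dist[nbr] = dist[atom] + 1
                queue.append(nbr)
    return dist


def wiener_index(adj):
    """Sum of topological distances over all unordered atom pairs."""
    total = sum(sum(distances_from(adj, a).values()) for a in adj)
    return total // 2  # every pair was counted from both ends


def randic_index(adj):
    """First-order connectivity index: sum over bonds of (d_i * d_j) ** -0.5."""
    seen, chi = set(), 0.0
    for i in adj:
        for j in adj[i]:
            bond = frozenset((i, j))
            if bond not in seen:
                seen.add(bond)
                chi += 1.0 / sqrt(len(adj[i]) * len(adj[j]))
    return chi


# Hydrogen-suppressed graphs (carbon atoms only)
n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
```

Branching lowers both indices (Wiener 9 vs 10, Randić about 1.73 vs 1.91 for isobutane vs n-butane), which is the kind of size and shape information these descriptors encode.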
History of QSAR
Subsequent efforts resulted in electronic and steric parameters being correlated with biological activity.
The first successful application of QSAR is credited to Prof. Corwin Hansch, who developed an equation correlating the biological activity of a series of closely related molecules with a linear combination of their lipophilicity and electronic effects.
Models in Medicinal Chemistry
A model is a transformation of a prototype into something that can be handled more conveniently.
Need for a model: ethical considerations and reduction in complexity
Types of models
What is QSAR?
A QSAR is a mathematical relationship between the biological activity of a molecular system and its geometric and chemical characteristics.
QSAR attempts to find a consistent relationship between biological activity and molecular properties, so that the resulting rules can be used to evaluate the activity of new compounds.
Why QSAR?
The number of compounds that must be synthesized to place 10 different groups in 4 positions of a benzene ring is 10^4 (10,000).
Solution: synthesize a small number of compounds and derive rules from their data to predict the biological activity of other compounds.
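The 10^4 figure is simply 10 choices at each of 4 independent positions. The enumeration below makes this concrete; the group names are hypothetical placeholders, and the count treats the four positions as distinct (it ignores symmetry-equivalent substitution patterns).

```python
from itertools import product

groups = [f"G{i}" for i in range(1, 11)]   # 10 hypothetical substituents
positions = ("R1", "R2", "R3", "R4")       # 4 substitutable ring positions

# Every way of assigning one group to each position
analogues = list(product(groups, repeat=len(positions)))
```

With six positions the same count would grow to 10^6, which is why deriving rules from a small, well-chosen subset is preferred.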
Necessities of QSAR
Good input data
Meaningful Structural Information
Predictive Models
Experimental Data Set
Needs of Experimental Data :
1. As numerous as possible
2. Correct
3. Representative
4. Homogeneous (ideally same lab, same method)
Criteria for the Experimental Dataset
Compounds should belong to the same congeneric series
Compounds should have a similar binding mode
Binding affinity should correlate with interaction energy
Biological activity should correlate with binding affinity
Garbage In, Garbage Out
The models will only be as good as
the dataset used to develop them
A Dataset Will Look Like This!
No.   Comp.   Activity
1     A-1     1.23
2     A-2     1.87
3     A-3     2.65
4     A-4     2.08
5     A-5     1.95
6     A-6     2.43
7     A-7     2.28
A Dataset Along With Descriptors
X       log(1/EC50)   MR       π       σ       Es
H       4.93           1.03    0.00    0.00    0.00
Cl      5.91           6.03    0.71    0.23   -0.97
NO2     5.34           7.36   -0.28    0.78   -2.52
CN      4.58           6.33   -0.57    0.66   -0.51
C6H5    6.62          25.36    1.96   -0.01   -3.82
NMe2    5.36          15.55    0.18   -0.83   -2.90
I       6.46          13.94    1.12    0.18   -1.40
NHCHO   ?             10.31   -0.98    0.00   -0.98
Too Many Descriptors!
Reduce to a manageable size by:
1. Principal component analysis
2. Cluster analysis
3. Choice of important descriptors: remove descriptors that have a similar value for 90% of the compounds
4. Investigator's intervention: computers do mathematics, they do not understand biology
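The 90% rule in item 3 can be sketched as a simple column filter. What counts as a "similar value" is an assumption here (values equal after rounding to two decimals), and the descriptor names and numbers are hypothetical.

```python
from collections import Counter


def drop_near_constant(descriptors, threshold=0.9, decimals=2):
    """Keep only descriptors whose most common (rounded) value covers less than
    `threshold` of the compounds.

    descriptors: dict mapping descriptor name -> list of values, one per compound.
    """
    kept = {}
    for name, values in descriptors.items():
        rounded = [round(v, decimals) for v in values]   # 'similar' = equal after rounding
        top_count = Counter(rounded).most_common(1)[0][1]
        if top_count / len(values) < threshold:
            kept[name] = values
    return kept


# Hypothetical descriptor table for 10 compounds
table = {
    "log_p": [1.2, 2.3, 0.8, 3.1, 1.9, 2.7, 0.5, 1.1, 2.2, 3.0],
    "ortho_flag": [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],   # same value in 90% of compounds
}
kept = drop_near_constant(table)
```

Both the cut-off and the rounding tolerance are arbitrary choices, so a dropped descriptor still deserves a chemist's second look.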
Number of Descriptors
The dataset should contain at least 5 times as many compounds as there are descriptors in the QSAR if MLR is used as the method, and 60% of the number of compounds if PLS is used as the method
Too few compounds relative to the number of descriptors will give a high but false, meaningless correlation
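Why too many descriptors produce a meaningless correlation can be demonstrated with random numbers: regressing random "activities" on random "descriptors" gives r^2 that climbs toward 1 as the number of descriptors k approaches the number of compounds n, and with k = n - 1 (plus an intercept) the fit is exact. A minimal pure-Python sketch; all data are random, and `enough_compounds_for_mlr` is a hypothetical helper encoding the 5-compounds-per-descriptor rule above.

```python
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def mlr_r2(X, y):
    """Least-squares fit of y = b0 + b1*x1 + ... via the normal equations; returns r^2."""
    rows = [[1.0] + list(x) for x in X]               # prepend the intercept column
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    b = solve(xtx, xty)
    mean_y = sum(y) / len(y)
    sse = sum((yi - sum(bj * rj for bj, rj in zip(b, r))) ** 2 for r, yi in zip(rows, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - sse / sst


def chance_r2(n_compounds, n_descriptors, seed=0):
    """r^2 from regressing random 'activities' on random 'descriptors'."""
    rng = random.Random(seed)
    y = [rng.uniform(4.0, 7.0) for _ in range(n_compounds)]
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_descriptors)] for _ in range(n_compounds)]
    return mlr_r2(X, y)


def enough_compounds_for_mlr(n_compounds, n_descriptors):
    """Rule of thumb: at least 5 compounds per descriptor when using MLR."""
    return n_compounds >= 5 * n_descriptors


for k in (1, 3, 5, 7):
    print(f"n = 8, k = {k}: r^2 = {chance_r2(8, k):.3f}, "
          f"rule met: {enough_compounds_for_mlr(8, k)}")
```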
Biological Data for QSAR
Experimentally generated data
Reported Data
Hansch Model
Steps involved:
1. Selection of lead compound
2. Selection of substituents
3. Synthesis and biological evaluation
4. Determination of descriptors
5. Generation of regression equation
Selection of Substituents
Batchwise Selection
Stepwise Selection
Indicator Variable
Denoted by I
Arbitrarily assigned only two values, 1 and 0 (e.g., presence or absence of a structural feature)
Should always be used along with other descriptors, and those descriptors should be significant in the regression equation even in the absence of the indicator variable.
Mathematical Relationship
QSAR attempts to find a mathematical relationship between the descriptors and the biological activity in the form of an equation
This is traditionally done using regression analysis
Descriptors are treated as the independent variables and the biological activity as the dependent variable
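The simplest case of such a regression uses one descriptor. The sketch below fits log(1/EC50) against π for the seven substituents with measured activity in the descriptor table of this deck, then predicts the missing NHCHO value. Using π alone is an illustrative choice, not the author's model; it also respects the 5-compounds-per-descriptor rule, which allows only one descriptor for seven compounds.

```python
# (substituent, pi, log(1/EC50)) for the compounds with measured activity
data = [
    ("H", 0.00, 4.93),
    ("Cl", 0.71, 5.91),
    ("NO2", -0.28, 5.34),
    ("CN", -0.57, 4.58),
    ("C6H5", 1.96, 6.62),
    ("NMe2", 0.18, 5.36),
    ("I", 1.12, 6.46),
]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


pis = [row[1] for row in data]    # independent variable (descriptor)
acts = [row[2] for row in data]   # dependent variable (activity)
slope, intercept = fit_line(pis, acts)
predicted_nhcho = slope * (-0.98) + intercept   # NHCHO has pi = -0.98 in the table
```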
Statistical Techniques
Two major statistical techniques are used for
this purpose
1. MLR : Multiple Linear Regression
2. PLS : Partial Least Squares
Newer Techniques
Artificial Neural Networks
Genetic Algorithm
Advantage: can detect nonlinear relationships between the descriptors and biological activity
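The Mathematical Result
The result is an equation with associated statistical parameters
The statistical parameters are:
1. n: number of compounds
2. r: correlation coefficient
3. s: standard deviation (standard error of the estimate)
4. F: Fisher ratio (F-test value)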
Putting it all together
For a group of antihistamines:
$\log(1/C) = 0.440\,E_s - 2.204$ (n = 30, s = 0.307, r = 0.886)
$\log(1/C) = 2.814\,\sigma - 0.223$ (n = 30, s = 0.519, r = 0.629)
$\log(1/C) = 0.492\,E_s - 0.585\,\sigma - 2.445$ (n = 30, s = 0.301, r = 0.889)
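The statistics n, r, s and F reported with each equation can be computed from observed and fitted activities. The sketch below assumes a least-squares fit with an intercept, so that r^2 = 1 - SSE/SST, s = sqrt(SSE/(n - k - 1)) and F = (r^2/k) / ((1 - r^2)/(n - k - 1)) for k descriptors. The toy values are hypothetical; `f_from_r` shows how F follows from a reported n and r, as for the antihistamine equations above.

```python
from math import sqrt


def qsar_statistics(y_obs, y_pred, n_descriptors):
    """n, r, s and F for a least-squares fit with an intercept and k descriptors."""
    n = len(y_obs)
    mean_y = sum(y_obs) / n
    sse = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))   # residual sum of squares
    sst = sum((o - mean_y) ** 2 for o in y_obs)             # total sum of squares
    r2 = 1.0 - sse / sst        # assumes a proper least-squares fit, so 0 <= r2 <= 1
    dof = n - n_descriptors - 1   # residual degrees of freedom
    return {
        "n": n,
        "r": sqrt(r2),
        "s": sqrt(sse / dof),
        "F": (r2 / n_descriptors) / ((1.0 - r2) / dof),
    }


def f_from_r(r, n, n_descriptors):
    """F-value implied by a reported correlation coefficient."""
    return (r ** 2 / n_descriptors) / ((1.0 - r ** 2) / (n - n_descriptors - 1))


# Hypothetical observed/fitted values for a one-descriptor model
stats = qsar_statistics([1.0, 2.0, 3.0, 4.0, 5.0], [1.1, 1.9, 3.2, 3.8, 5.0], 1)

# F implied by the reported n and r of the three antihistamine equations
for label, r, k in [("Es", 0.886, 1), ("sigma", 0.629, 1), ("Es + sigma", 0.889, 2)]:
    print(label, round(f_from_r(r, 30, k), 1))
```

Comparing F (not just r) between the one- and two-descriptor equations is a quick check on whether the extra term earns its degree of freedom.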
Validation of QSAR Equations
Statistical parameters and their significance
Scrambling the Y-values (biological activity)
Leave-one-out and leave-many-out methods
Test-set/training-set method
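Two of these checks are easy to sketch for a one-descriptor model: leave-one-out cross-validation, giving q^2 = 1 - PRESS/SST, and Y-scrambling, which refits after randomly permuting the activities. The data are the π and log(1/EC50) values of the seven measured substituents in this deck's descriptor table; the function names, the 100 scrambling rounds and the seed are hypothetical choices.

```python
import random

pis = [0.00, 0.71, -0.28, -0.57, 1.96, 0.18, 1.12]
acts = [4.93, 5.91, 5.34, 4.58, 6.62, 5.36, 6.46]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def r_squared(xs, ys):
    """Fitted r^2 on the full dataset."""
    slope, intercept = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - sse / sst


def loo_q2(xs, ys):
    """Leave-one-out q^2: each compound is predicted by a model fitted without it."""
    press = 0.0
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    mean_y = sum(ys) / len(ys)
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / sst


def y_scrambling(xs, ys, rounds=100, seed=0):
    """r^2 values after randomly permuting the activities."""
    rng = random.Random(seed)
    scores = []
    for _ in range(rounds):
        shuffled = ys[:]
        rng.shuffle(shuffled)
        scores.append(r_squared(xs, shuffled))
    return scores


q2 = loo_q2(pis, acts)
r2 = r_squared(pis, acts)
scrambled = y_scrambling(pis, acts)
```

For least-squares models q^2 can never exceed the fitted r^2, and a meaningful model is expected to score clearly above the scrambled r^2 values; a test-set check follows the same pattern with a held-out subset instead of single compounds.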
Interpretation of Equations and Use
Prediction of the activity of unknown compounds
Improving the series of compounds with respect to biological activity by synthesizing new, active compounds
Restricting the number of compounds synthesized while maximising activity
Limitations of QSAR
False correlations due to noisy data
False positives are a bigger problem than false negatives
Statistical gimmick
Metabolism of the compounds is not taken into consideration
Alternative binding modes may affect the results significantly.
3D QSAR: CoMFA
Cramer and Milne (1979): comparison of molecules by alignment and field generation
Wold (1986): proposes using PLS instead of PCA for problems with far more variables than compounds (thousands of non-orthogonal field variables)
Cramer, Patterson and Bunce (1988): introduced CoMFA
Free Energy of Binding and Equilibrium Constants
The free energy of binding is related to the equilibrium constant of ligand-receptor complex formation:
$\Delta G_{\mathrm{binding}} = -2.303\,RT \log K = -2.303\,RT \log(k_{\mathrm{on}}/k_{\mathrm{off}})$
$K$: equilibrium constant
$k_{\mathrm{on}}$ (association) and $k_{\mathrm{off}}$ (dissociation): rate constants
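As a worked example of ΔG = -2.303 RT log K, the sketch below converts an association constant (or a pair of rate constants) into a binding free energy. It assumes K in M^-1 with a 1 M standard state and T = 298.15 K; the rate constants are hypothetical.

```python
from math import isclose, log

R = 8.314462618e-3  # gas constant in kJ/(mol*K)


def delta_g_binding(k_eq, temperature=298.15):
    """Binding free energy in kJ/mol from an association constant K (1/M).

    -RT ln K equals -2.303 RT log10 K, since ln 10 is about 2.303.
    """
    if k_eq <= 0:
        raise ValueError("K must be positive")
    return -R * temperature * log(k_eq)


def k_from_rates(k_on, k_off):
    """K = k_on / k_off (k_on in 1/(M*s), k_off in 1/s)."""
    return k_on / k_off


# Hypothetical kinetics: k_on = 1e6 1/(M*s), k_off = 1e-3 1/s
k_eq = k_from_rates(1e6, 1e-3)
dg = delta_g_binding(k_eq)
```

At room temperature each tenfold increase in K lowers ΔG by about 5.7 kJ/mol, so a nanomolar binder (K = 10^9 M^-1) sits near -51 kJ/mol.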
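Free Energy of Binding
$\Delta G_{\mathrm{binding}} = \Delta G_0 + \Delta G_{\mathrm{hb}} + \Delta G_{\mathrm{ionic}} + \Delta G_{\mathrm{lipo}} + \Delta G_{\mathrm{rot}}$
$\Delta G_0$: entropy loss (translational + rotational), +5.4
$\Delta G_{\mathrm{hb}}$: ideal hydrogen bond, -4.7
$\Delta G_{\mathrm{ionic}}$: ideal ionic interaction, -8.3
$\Delta G_{\mathrm{lipo}}$: lipophilic contact, -0.17 (per Å² of contact surface)
$\Delta G_{\mathrm{rot}}$: entropy loss (rotatable bonds), +1.4
(Energies in kJ/mol per unit feature)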
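The additive scheme above can be evaluated by counting features. This sketch assumes the conventional signs of Böhm's additive scoring function, which this slide appears to follow (favourable terms negative: -4.7 per H-bond, -8.3 per ionic contact, -0.17 per Å² of lipophilic contact; penalties +5.4 once and +1.4 per rotatable bond). It also assumes ideal interaction geometries, and the ligand feature counts are hypothetical.

```python
# Increments in kJ/mol (sign convention assumed: favourable terms negative)
DG_0 = 5.4       # loss of translational/rotational entropy, applied once
DG_HB = -4.7     # per ideal hydrogen bond
DG_IONIC = -8.3  # per ideal ionic interaction
DG_LIPO = -0.17  # per square Angstrom of lipophilic contact surface
DG_ROT = 1.4     # per rotatable bond frozen on binding


def estimate_binding_energy(n_hbonds, n_ionic, lipo_area, n_rot_bonds):
    """Additive estimate of Delta G_binding in kJ/mol, assuming ideal geometries."""
    return (DG_0
            + DG_HB * n_hbonds
            + DG_IONIC * n_ionic
            + DG_LIPO * lipo_area
            + DG_ROT * n_rot_bonds)


# Hypothetical ligand: 2 H-bonds, 1 salt bridge, 100 A^2 lipophilic contact, 3 rotatable bonds
dg = estimate_binding_energy(2, 1, 100.0, 3)
```

Real scoring functions scale the H-bond and ionic terms by penalties for non-ideal distances and angles; setting those penalties to 1 is what "ideal" means here.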
CoMFA
Set of chemically related compounds
3D structures needed
Bioactive conformations of the active
compounds are to be aligned
CoMFA Alignment
[Figure: pharmacophore-based alignment; ligands (L) are superimposed using the distances d1, d2 and d3 between pharmacophore features (labelled A, B, D and OH groups)]
CoMFA Grid and Field Probe
Molecular Fields in CoMFA
CoMFA standard: steric and electrostatic fields; additional: H-bonding, indicator, parabolic and others.
A grid of field energies is calculated by placing a probe atom at each voxel (grid point).
The molecular fields are:
Steric (Lennard-Jones) interactions
Electrostatic (Coulombic) interactions
The probe is an sp3 carbon atom with a charge of +1.0
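The field calculation can be sketched for a toy molecule: at each grid point a +1.0 probe contributes a Lennard-Jones 12-6 steric energy and a Coulomb electrostatic energy, and each value becomes one column of the descriptor matrix. All numeric choices here are assumptions for illustration: the two-atom "molecule", the LJ parameters, the 2 Å spacing, a constant dielectric of 1, and a ±30 kcal/mol cap on field values (CoMFA implementations also truncate large energies and may use a distance-dependent dielectric instead).

```python
from math import dist

COULOMB_K = 332.06   # converts e*e/Angstrom to kcal/mol
CUTOFF = 30.0        # assumed cap on field values, kcal/mol
PROBE_CHARGE = 1.0   # sp3 carbon probe with charge +1.0
PROBE_EPS, PROBE_SIGMA = 0.10, 3.4   # hypothetical LJ parameters for the probe

# Hypothetical toy molecule: (position in Angstrom, partial charge, LJ epsilon, LJ sigma)
ATOMS = [
    ((0.0, 0.0, 0.0), -0.40, 0.10, 3.4),
    ((1.2, 0.0, 0.0), 0.40, 0.10, 3.4),
]


def cap(energy):
    """Truncate a field value to the range [-CUTOFF, +CUTOFF]."""
    return max(-CUTOFF, min(CUTOFF, energy))


def probe_energies(point):
    """Steric (LJ 12-6) and electrostatic (Coulomb) energy of the probe at one grid point."""
    steric = electrostatic = 0.0
    for position, charge, eps, sigma in ATOMS:
        r = max(dist(point, position), 0.5)   # avoid division by zero on top of an atom
        eps_ij = (eps * PROBE_EPS) ** 0.5      # Lorentz-Berthelot combining rules
        sigma_ij = 0.5 * (sigma + PROBE_SIGMA)
        sr6 = (sigma_ij / r) ** 6
        steric += 4.0 * eps_ij * (sr6 * sr6 - sr6)
        electrostatic += COULOMB_K * charge * PROBE_CHARGE / r   # dielectric constant 1
    return cap(steric), cap(electrostatic)


def grid_points(extent=4.0, spacing=2.0):
    """Cubic lattice of points from -extent to +extent along each axis."""
    steps = int(round(2 * extent / spacing)) + 1
    axis = [-extent + i * spacing for i in range(steps)]
    return [(x, y, z) for x in axis for y in axis for z in axis]


# One row of the descriptor matrix: two field values per grid point
row = [value for point in grid_points() for value in probe_energies(point)]
```

Even this tiny 5 x 5 x 5 grid yields 250 columns for a single molecule, which is why the many-variables problem arises later in the workflow.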
Common 3D molecular fields
MEP Molecular Electrostatic Potential (unit
positive charge probe).
MLP Molecular Lipophilicity Potential (no
probe necessary).
GRID total energy of interaction: the sum of
steric (Lennard-Jones), H-bonding and
electrostatics (any probe can be used).
Electrostatic Potential Contour Lines
3D Contour Map for Electronegativity
CoMFA Pros andCons
Suitable to describe receptor-ligand
interactions
3D visualization of important features
Good correlation within related set
Predictive power within scanned space
Alignment is often difficult
Training required
3D-QSAR: CoMFA
First, the structures need to be aligned
2-D alignment methods:
Maximum common substructure based methods
Feature-based
Vector methods
Discrete feature values (bit strings)
Continuous feature values
3-D alignment methods:
Field-based methods
Structure-based methods (generalized RMSD approaches)
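Feature-based vector methods compare molecules through a similarity coefficient. The Tanimoto coefficient is a common choice (swapped in here as an example, not stated on the slide), with a bit-string form for discrete features and a continuous form a.b / (|a|^2 + |b|^2 - a.b). The fingerprints below are hypothetical, and returning 0.0 for two empty fingerprints is a convention chosen for this sketch.

```python
def tanimoto_bits(a: int, b: int) -> float:
    """Tanimoto coefficient of two binary fingerprints stored as Python ints."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0   # convention chosen here for two empty fingerprints
    return bin(a & b).count("1") / union


def tanimoto_continuous(a, b):
    """Continuous Tanimoto: a.b / (|a|^2 + |b|^2 - a.b); undefined for two zero vectors."""
    ab = sum(x * y for x, y in zip(a, b))
    aa = sum(x * x for x in a)
    bb = sum(y * y for y in b)
    return ab / (aa + bb - ab)


# Hypothetical 6-bit fingerprints (each bit = presence of some substructure)
similarity = tanimoto_bits(0b101101, 0b100111)
```

The two fingerprints share 3 set bits out of 5 set in either, giving 0.6.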
Subsequent Steps
Calculate property fields for each molecule at every grid point (training set)
The property value at each grid point is equivalent to a descriptor value in 2-D QSAR
Grid points with low variance may be neglected; nevertheless, hundreds of grid points may remain
There are many more descriptor values than experimental data points, so the traditional least-squares approach cannot be used
Perform partial least squares (PLS) analysis
Validate model (test set)
Predict activities of new molecules
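PLS copes with more descriptors than compounds by regressing on a few latent components instead of the raw columns. Below is a minimal NIPALS PLS1 sketch (one activity variable) in pure Python; the data are random stand-ins for CoMFA field values (6 compounds, 50 grid-point descriptors), and the number of components is a hypothetical choice that would normally be set by cross-validation.

```python
import random


def pls1_fit(X, y, n_components):
    """NIPALS PLS1: X is a list of rows (compounds x descriptors), y a list of activities."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]   # centred X residuals
    f = [v - y_mean for v in y]                                  # centred y residuals
    model = {"x_mean": x_mean, "y_mean": y_mean, "W": [], "P": [], "q": []}
    for _ in range(n_components):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]  # covariance direction
        norm = sum(v * v for v in w) ** 0.5
        if norm == 0.0:
            break                                                     # nothing left to explain
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]  # X loadings
        q = sum(f[i] * t[i] for i in range(n)) / tt                          # y loading
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]   # deflate X
        f = [f[i] - q * t[i] for i in range(n)]                              # deflate y
        model["W"].append(w)
        model["P"].append(p)
        model["q"].append(q)
    return model


def pls1_predict(model, x):
    """Predict the activity of one descriptor vector by replaying the deflation steps."""
    e = [xj - mj for xj, mj in zip(x, model["x_mean"])]
    y_hat = model["y_mean"]
    for w, p, q in zip(model["W"], model["P"], model["q"]):
        t = sum(ej * wj for ej, wj in zip(e, w))
        y_hat += q * t
        e = [ej - t * pj for ej, pj in zip(e, p)]
    return y_hat


# Hypothetical CoMFA-like data: 6 compounds, 50 grid-point descriptors each
rng = random.Random(1)
X = [[rng.gauss(0.0, 1.0) for _ in range(50)] for _ in range(6)]
y = [rng.uniform(4.0, 7.0) for _ in range(6)]
model = pls1_fit(X, y, n_components=2)
fitted = [pls1_predict(model, row) for row in X]
```

With a single descriptor and one component, PLS1 reduces to ordinary least squares, which is a convenient sanity check.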
CoMFA Contour Plots
[Figure: contour plots for an inactive molecule and an active molecule]
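The Hansch equation
$\log(1/C) = k_1 \log P - k_2 (\log P)^2 + k_3 \sigma + k_4$
where $k_1$, $k_2$, $k_3$ and $k_4$ are constants,
C is the molar concentration that produces a standard biological response,
log P is the logarithm of the partition coefficient P, and
σ is the substituent constant describing the electronic effect of the substituent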
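The parabolic Hansch form implies an optimum lipophilicity. Assuming log(1/C) = k1 log P - k2 (log P)^2 + k3 σ + k4 with k2 > 0, setting the derivative k1 - 2 k2 log P to zero gives log P0 = k1 / (2 k2). The sketch below encodes that assumption; the coefficients are hypothetical and chosen only for illustration.

```python
def hansch_log_activity(log_p, sigma, k1, k2, k3, k4):
    """log(1/C) = k1*logP - k2*(logP)^2 + k3*sigma + k4."""
    return k1 * log_p - k2 * log_p ** 2 + k3 * sigma + k4


def optimum_log_p(k1, k2):
    """Setting d log(1/C) / d logP = k1 - 2*k2*logP to zero gives logP0 = k1 / (2*k2)."""
    if k2 <= 0:
        raise ValueError("the parabola has a maximum only when k2 > 0")
    return k1 / (2.0 * k2)


# Hypothetical coefficients, for illustration only
K1, K2, K3, K4 = 1.0, 0.25, 0.8, 2.0
log_p0 = optimum_log_p(K1, K2)
```

Compounds more or less lipophilic than log P0 are predicted to be less active, which is how the squared term captures transport effects that a purely linear log P term cannot.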