computer aided drug design qsar related methods
Post on 26-Jan-2015
5/27/2014 Importance of PROCESS is not less than PRODUCT
Computer Aided Drug Design:
QSAR Related Methods
Jahan B Ghasemi
DDSLab K N Toosi Univ of Tech.
Tehran, Iran
Topics in this Talk are:
General Introduction
Some of These QSAR Steps:
Data Pre-Processing
Normalization
Standardization
Variable Selection
Subset Selection
Outlier Detection
Multivariate Analysis
MLR
PCA
PLS
SVM
ANN
CART
Molecular Descriptors
Constitutional
Electronic
Geometrical
Hydrophobic
Lipophilicity
Solubility
Steric
Quantum Chemical
Topological
Molecular Structures
1D (e.g. the SMILES string OC1=CC=CC=C1)
2D
3D
Statistical Evaluation
R
R²
Q²
MSE
RMSE
PRESS
"Well begun is half done." Aristotle
René Descartes, 1619: Quantitative Measurement in Science
Research Types
Inductive Approach
Deductive Approach
Abductive Approach
General Introduction
Deductive: Theory → Hypothesis → Observation → Confirmation
Inductive: Observation → Pattern → Hypothesis → Theory
Induction is usually described as moving from the specific to the general, while deduction begins
with the general and ends with the specific.
Arguments based on laws, rules and accepted principles are generally used for Deductive
Reasoning. Observations tend to be used for Inductive Arguments.
The -metrics disciplines, as soft-computing or soft-modeling methods, are inductive research approaches, so their conclusions carry uncertainty.
Are humans natural logic reasoners?
No!!!
What Do We Need to Know in a Successful QSAR Modeling as a Drug Design Tool?
I- Math-Science or Informatique or Informatics Aspect
Linear Algebra
Vectors, Matrices, Tensors…
Homogeneous and regular linear and nonlinear simultaneous equations
Graph Theory
Maximal Subgraph
Clique Detection
Multivariate Statistical Analysis
Column Space, Row Space
Pattern Recognition
(Dis)Similarity Distance Metrics: Euclidean, Manhattan, Mahalanobis
Fingerprints, Tanimoto, Jaccard
Supervised and Unsupervised Pattern Recognition
Clustering: Agglomerative (bottom up), Divisive (top down)
MLR, PCA, PLS
Optimization
Selection of the most informative variables: GA
Selection of the most representative objects: KS
Function minimization: Newton, Gauss-Newton, Levenberg-Marquardt
Computer
Computer Graphic
HPC
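The (dis)similarity measures named in the list above can be sketched in a few lines. This is only an illustration using the standard library: the function names, the example descriptor vectors and the choice to represent a fingerprint as a set of on-bit indices are assumptions made here, not part of the talk.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """City-block distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) coefficient of two fingerprints.

    Each fingerprint is given as a collection of on-bit indices
    (a representation chosen for this sketch).
    """
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        return 1.0  # convention assumed here for two empty fingerprints
    return len(a & b) / union

# Hypothetical descriptor vectors and fingerprints
d1 = [1.0, 2.0, 3.0]
d2 = [4.0, 6.0, 3.0]
print(euclidean(d1, d2))                  # 5.0
print(manhattan(d1, d2))                  # 7.0
print(tanimoto({1, 4, 7, 9}, {1, 4, 8}))  # 2 shared bits / 5 bits in the union = 0.4
```

The Mahalanobis distance is left out because it needs the inverse covariance matrix of the descriptors, which is easier to obtain with a linear algebra library.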
II- Bio-Science Aspect
Chemistry
Organic Chemistry
Quantum/Molecular Mechanics
Forcefield, Conformer, Bioactive Conformer
Medicinal Chemistry
Biology
Molecular Biology
Systems Biology
Pharmacology
Pharmacokinetics
Pharmacodynamics
Toxicity
ADMET
Combination of I and II
OMICS
Bioinformatics
Proteomics
Metabolomics
Genomics
Metrics
Biometrics
Chemometrics
Technometrics
Chem(o)informatics
QSAR is related to most of the -OMICS and -METRICS routines
Bio-Science Part Starts Here:
Chemical Space
(Gathering Information from All Involved Species)
Aggregation
Host-Guest Complex
Receptor-Inhibitor Complex
Macromolecules
Protein
Receptor
Host
Small Molecules
Guest
Ligand
Inhibitor
Chemical Space
Chemical Information
Information
due to
Macromolecule Structure
Information
due to
Aggregation Structure
Information
Due to
Small Molecule Structure
To have and use
Chemical Space:
Extract and Convert
Chemical Information
to
Numerical Values
We call these numerical values Molecular Descriptors
Descriptors should be associated with
the following desirable features:
Easy Interpretation
Show Correlation with a Property
Discrimination of Isomers
Independence
Simplicity
Not to be based on properties
Not to be trivially related to other descriptors
Allow for efficient construction
Use familiar structural concepts
Show gradual change with gradual change in structures
End Points to Be Modeled
Chemical properties
Boiling point
Retention time
Dielectric constant
Diffusion coefficient
Dissociation constant
Melting point
Reactivity
Solubility
Stability
Thermodynamic properties
Viscosity
End Points to Be Modeled
Biological Properties
Bioconcentration
Biodegradation
Carcinogenicity
Drug metabolism and clearance
Inhibition constant
Mutagenicity
Permeability
Blood brain barrier
Skin
Pharmacokinetics
Receptor binding
There are more than 5500 molecular descriptors. BUT!
Why do we need more and more molecular descriptors?
Each molecular descriptor takes into account only a small part of the whole chemical information contained in the real molecule. As a consequence, the number of descriptors keeps increasing with the growing demand for deeper investigation of chemical and biological systems.
Different descriptors view a molecule from independent perspectives, each taking into account different features of the chemical structure. Molecular descriptors have become some of the most important variables used in molecular modeling and are consequently handled with statistics, chemometrics, and chemoinformatics.
Molecular Descriptors
Cost to Generate:
Cheap Expensive
Molecular Descriptors
How to Calculate Molecular Descriptors?
By Hand! By Software
Dragon, SYBYL, PaDEL-Descriptor, AdrianaCode
Molecular Descriptors
Classes!
Different Classes?
Yes
How many?
Many classes
What are the bases of Classification?
Based on Dimensionality
0D-4D
Geometric, Constitutional, Topological, Quantum Chemical, etc.
Based on Origin
Theoretical, Experimental
Both!
Molecular Descriptors
Do they have equal importance?
0D<1D<2D<2.5D<3D<4D…<nD
Low Information Content → High Information Content
Now We Have Molecular Descriptors and Chemical, Molecular or Information Space
But first define and introduce:
Objects=Molecules
Variables=Descriptors
Object to Variable ratio ≥ 4
Why? Least squares needs it! With too few objects per variable, a least-squares fit is prone to chance correlation and overfitting.
Math-Science Part Starts Here: Using a Very Efficient Way to Show Chemical Information: Matrix-Vector
Matrix: objects as rows (1, 2, 3, …, n), variables as columns (1, 2, 3, …, m)
Vector: objects as rows (1, 2, 3, …, n)
Preprocessing
On End Point Vector y
nM unit
log Transformation
To linearize the variation
To have an LFER interpretation
Mean Centering
Autoscaling
On Molecular Descriptors Matrix X
Mean Centering: has its general purpose
Autoscaling: has its general purpose
Outlier Detection, AD
Dimensionality Reduction
PCA
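The preprocessing steps listed above can be sketched as follows. The talk only says that the end point is in nM and is log-transformed; the specific pIC50 convention (9 minus log10 of the nM value), the function names and the numbers below are assumptions made for illustration.

```python
import math
import statistics

def pic50_from_nM(ic50_nM):
    """Assumed convention: pIC50 = -log10(IC50 in mol/L) = 9 - log10(IC50 in nM)."""
    return 9.0 - math.log10(ic50_nM)

def mean_center(column):
    """Subtract the column mean so the variable is centered on zero."""
    mu = statistics.mean(column)
    return [v - mu for v in column]

def autoscale(column):
    """Mean-center, then divide by the sample standard deviation (n - 1)."""
    mu = statistics.mean(column)
    sd = statistics.stdev(column)
    return [(v - mu) / sd for v in column]

# Hypothetical end point values spanning four orders of magnitude
ic50_nM = [1.0, 10.0, 100.0, 10000.0]
y = [pic50_from_nM(v) for v in ic50_nM]   # the log scale compresses the range

# Hypothetical descriptor column of the X matrix
descriptor = [2.0, 4.0, 6.0, 8.0]
print(mean_center(descriptor))            # [-3.0, -1.0, 1.0, 3.0]
print(autoscale(descriptor))              # zero mean, unit standard deviation
```

Autoscaling gives every descriptor the same variance, which matters when descriptors are measured on very different scales.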
Geometrical Interpretation of Information Matrix
Spaces
Row Space
Column Space: Object Map
Metrics
Distances
Euclidean and….
Classes Clusters Groups
Row Space!
Is it informative? How? What does it mean? How can we use it?
[Figure: axes O1, O2, …, On]
Each Point is a Vector!
m-dimensional space Sm
n-points pattern Pn
Column Space
Objects Map: what scientists (chemists, biologists, ...) are interested in!!!
Is it informative? How? What does it mean? How can we use it?
[Figure: axes V1, V2, …, Vn; points grouped as Class I (Group I) and Class II (Group II)]
Each Point is a Vector!
n-dimensional space Sn
m-points pattern Pm
QSAR Model Building
Based on Molecular Geometry
2D-QSAR 2.5D-QSAR 3D-QSAR
QSAR Model Building
Type of Mapping Function
A Crucial Decision
Linear
MLR kNN PLS
Nonlinear
ANN SVM
Linear+Non-Linear
DT + other Tree and Ensemble
Methods
QSAR Model Building
Object Selection, Data Splitting, Train-Test Sets
To have good 1- representativeness and 2- diversity
y-Based Method
Randomly Evenly
X-Based Methods
Random Selection
kNN Selection
Similarity Principle
KS, SOM, LMD, Duplex, MDC
QSAR Model Building
Variable Selection
Filters(Subjective)
Uninformative Variable Elimination (UVE)
Correlation Ranking (CR)
Wrappers(Objective)
GA-PLS
Embedded(Selection+Mapping Integrated)
Stepwise Selection
RM, ERM, FFD
QSAR Model Building
Model Validation: there are different criteria in the literature
Residual Analysis
Analysis of Variance
Applicability Domain
Residual Leverage
Good Leverage
Bad Leverage
Q Residuals, Hotelling T²
Model Precision(Confidence Intervals of Model Parameters)
Bootstrap Resampling
Jackknife Resampling
Model Accuracy (Prediction Error)
Internal Validation
Cross Validation
Leave One Out
Leave Many Out
Scrambling
X-randomization
y-randomization
External Validation
External and Fully Unseen or
Independent Data Set
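As an illustration of the internal-validation items above, the sketch below computes R², the leave-one-out PRESS and Q² for a one-descriptor linear model. The definition Q² = 1 - PRESS / SS_total (using the mean of all y values) is one common convention and is assumed here; the function names and data are hypothetical.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1

def total_ss(y):
    """Total sum of squares of y around its mean."""
    my = sum(y) / len(y)
    return sum((yi - my) ** 2 for yi in y)

def r_squared(x, y):
    """Fit on all objects and compare residual to total sum of squares."""
    b0, b1 = fit_line(x, y)
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return 1.0 - ss_res / total_ss(y)

def loo_press(x, y):
    """Leave one object out, refit, predict it, and sum the squared errors."""
    press = 0.0
    for i in range(len(x)):
        b0, b1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (b0 + b1 * x[i])) ** 2
    return press

def q_squared(x, y):
    """Cross-validated analogue of R² (assumed convention, see lead-in)."""
    return 1.0 - loo_press(x, y) / total_ss(y)

# Hypothetical descriptor values and activities
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]
print(r_squared(x, y), q_squared(x, y))
```

For a least-squares model Q² from leave-one-out never exceeds R², so a large gap between the two is a warning sign of overfitting. It does not replace the external, unseen data set.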
Final word on validation: an external, independent, unseen data set is mandatory for a successful QSAR model. Do you know why?
Local-X-Global or Induction Research has Uncertainty
Purposes of QSAR:
Rational Identification of New Leads with:
Pharmacological, Biocidal or Pesticidal Activity.
Optimization of New Leads with:
Pharmacological, Biocidal or Pesticidal Activity.
The Rational Design of:
Surface-active agents, Perfumes, Dyes, and Fine Chemicals.
Purposes of QSAR:
The Selection of Compounds with Optimal Pharmacokinetic Properties.
The Prediction of a variety of Physicochemical Properties of Molecules.
The Prediction of the Fate of Molecules.
The Rationalization and Prediction of the Combined Effects of Molecules.
Purposes of QSAR:
The Identification of Hazardous Compounds at Early Stages.
The Designing out of Toxicity and Side-Effects in New Compounds.
The Prediction of Toxicity of Compounds to Humans.
The Prediction of Toxicity to Environmental Species.
Original Data Set
Curated Dataset
Split into training, test and external validation set
Multiple Training Sets
Y-Randomization
Combi-QSAR modeling
Multiple Test Sets
Activity Prediction
Only retain models that pass both internal and external accuracy filters
Validated predictive models with high internal and external accuracy
External Validation using Applicability Domain
Virtual Screening Using Applicability Domain
Experimental Validation
The Most Rigorous and Currently Accepted QSAR Methodology
A Small Question!!!
Why is QSAR alive in spite of the existence of very strong rivals like docking, MD, pharmacophore, SB and LB methods?
Modeling and taking into account all pharmacological phenomena is nearly or totally impossible, even in high-level and advanced research laboratories.
Thank You All!
[Figure: selected points 1 and 2 and candidate points a, b, c, d]
Which one would be the third point? a, b, c or d?
Points 1 and 2 have the largest distance, so they are selected first. Then the distances between every unselected point and every selected point are calculated.
Calculate distances 1a and 2a then min(1a,2a)= 2a.
Calculate distances 1b and 2b then min(1b,2b)= 2b.
Calculate distances 1c and 2c then min(1c,2c)= 1c.
Calculate distances 1d and 2d then min(1d,2d)= 1d.
Max(min(1a,2a),min(1b,2b),min(1c,2c),min(1d,2d))=1d
Then the point d is selected as the Third Point and so on…
[Figure: distances 1a, 2a, 1b, 2b, 1c, 2c, 1d, 2d between candidate and selected points]
KSA Graphical Algorithm
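The max-min selection walked through above can be written as a short sketch. The coordinates below are hypothetical and were chosen only so that they reproduce the situation on the slide (points 1 and 2 farthest apart, then d chosen third); the function name is also an assumption.

```python
import math
from itertools import combinations

def kennard_stone(points, n_select):
    """Return indices of selected points in order of selection (max-min rule)."""
    def dist(i, j):
        return math.dist(points[i], points[j])

    # Step 1: start with the two most distant points.
    first, second = max(combinations(range(len(points)), 2),
                        key=lambda pair: dist(*pair))
    selected = [first, second]
    remaining = [i for i in range(len(points)) if i not in selected]

    # Step 2: repeatedly add the candidate whose nearest selected
    # point is farthest away (maximum of the minimum distances).
    while len(selected) < n_select and remaining:
        nxt = max(remaining, key=lambda i: min(dist(i, s) for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected

# Hypothetical 2-D coordinates for the six points of the example
labels = ["1", "2", "a", "b", "c", "d"]
points = [(0, 0), (10, 0), (8, 1), (7, 3), (2, 1), (4, 4)]

order = kennard_stone(points, 4)
print([labels[i] for i in order])   # ['1', '2', 'd', 'b']
```

With these coordinates the minimum distances of a, b, c and d to the selected pair are about 2.24, 4.24, 2.24 and 5.66, so d is picked third, as in the worked example.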
Applicability Domain
Q Residuals and Hotelling T2
[Figure: original data (axis 0 to 12000) and their log values (axis -1 to 4.5) for samples 1 to 11]
Activity  Descr 1  Descr 2  …  Descr m
Y1        X11      X12      …  X1m
Y2        X21      X22      …  X2m
…         …        …        …  …
Yn        Xn1      Xn2      …  Xnm
$Y_i = a_0 + a_1 X_{i1} + a_2 X_{i2} + \dots + a_m X_{im}$
Does not account for nonlinear effects
Multiple Linear Regression (MLR)
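The MLR equation above can be fitted by solving the normal equations. The sketch below does this with plain Gaussian elimination so that it needs no external library; in practice a linear algebra package would be used. The helper names and the data (8 objects, 2 descriptors, generated from known coefficients) are assumptions for illustration.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                 # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def fit_mlr(X, y):
    """Return [a0, a1, ..., am] minimizing the sum of squared residuals."""
    Z = [[1.0] + list(row) for row in X]           # intercept column for a0
    p = len(Z[0])
    ZtZ = [[sum(z[i] * z[j] for z in Z) for j in range(p)] for i in range(p)]
    Zty = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(p)]
    return solve(ZtZ, Zty)

def predict(coef, row):
    """Evaluate Y = a0 + a1*X1 + ... + am*Xm for one object."""
    return coef[0] + sum(a * x for a, x in zip(coef[1:], row))

# Hypothetical data: 8 objects, 2 descriptors, activities from known coefficients
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6], [6, 5], [7, 9], [8, 7]]
y = [1.0 + 2.0 * x1 - 0.5 * x2 for x1, x2 in X]

coef = fit_mlr(X, y)
print([round(a, 6) for a in coef])   # recovers a0 = 1, a1 = 2, a2 = -0.5
```

The normal equations become ill-conditioned when descriptors are collinear, which is one reason methods such as PLS are preferred for large descriptor sets.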
$Y = t_1 q_1 + t_2 q_2 + \dots + t_n q_n + F_n$
• t latent variables or scores
• q loading vectors
Partial Least Squares (PLS)
Robust with respect to collinear descriptors
Only one model optimization parameter (the number of LVs)
Fast computation
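To make the roles of the scores t and loadings q concrete, here is a minimal sketch of a single-response PLS (PLS1) fitted with the NIPALS-style deflation scheme. The talk does not say which PLS algorithm is used, so this variant, the function names and the data are assumptions.

```python
def pls1_fit(X, y, n_components):
    """Fit PLS1: extract latent variables one at a time and deflate X and y."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # X residual
    f = [yi - y_mean for yi in y]                              # y residual
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = sum(v * v for v in w) ** 0.5
        if norm < 1e-12:
            break                      # nothing left in y to explain
        w = [v / norm for v in w]      # weight vector
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        q = sum(f[i] * t[i] for i in range(n)) / tt            # y loading
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        W.append(w)
        P.append(p)
        Q.append(q)
    return {"x_mean": x_mean, "y_mean": y_mean, "W": W, "P": P, "Q": Q}

def pls1_predict(model, row):
    """Predict y as y_mean + sum of q_a * t_a, deflating the object as in fitting."""
    e = [v - mu for v, mu in zip(row, model["x_mean"])]
    y_hat = model["y_mean"]
    for w, p, q in zip(model["W"], model["P"], model["Q"]):
        t = sum(a * b for a, b in zip(e, w))
        y_hat += q * t
        e = [a - t * b for a, b in zip(e, p)]
    return y_hat

# Hypothetical data: 8 objects, 2 descriptors, exactly linear activities
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6], [6, 5], [7, 9], [8, 7]]
y = [1.0 + 2.0 * x1 - 0.5 * x2 for x1, x2 in X]

model = pls1_fit(X, y, n_components=2)
print([round(pls1_predict(model, row), 6) for row in X])
```

With as many latent variables as descriptors the fit coincides with ordinary least squares; using fewer latent variables is what gives PLS its robustness to collinearity, and that number is the single parameter to optimize.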
Works on Similarity Principle
For a query compound, the method finds its k nearest neighbors among the training-set compounds in descriptor space and predicts the activity class that is most highly represented among these neighbors.
The k-NN scheme is sensitive to: 1- the distance metric, 2- the number of training compounds, 3- the value of k, which can be optimized to yield the best results.
The k-Nearest Neighbor Method (kNN)
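A minimal sketch of the kNN classification rule described above, using Euclidean distance and a majority vote. The function name, the two-descriptor training set and the class labels are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_X, train_labels, query, k=3):
    """Predict the class most represented among the k nearest training compounds."""
    neighbors = sorted(zip(train_X, train_labels),
                       key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical training compounds described by two descriptors
train_X = [(1.0, 1.0), (1.5, 0.8), (0.9, 1.4),
           (5.0, 5.0), (5.5, 4.6), (4.7, 5.3)]
train_labels = ["active", "active", "active",
                "inactive", "inactive", "inactive"]

print(knn_predict(train_X, train_labels, (1.2, 0.9)))   # active
print(knn_predict(train_X, train_labels, (4.8, 5.1)))   # inactive
```

An odd k avoids tied votes in a two-class problem, and the descriptors should be scaled first because the distance metric treats all coordinates alike.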
Artificial Neural Network (ANN)
Descriptors or Original Space → Nonlinear or Hidden Space → Properties Being Predicted
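$L_\varepsilon(y, f(x)) = \begin{cases} 0 & \text{if } |y - f(x)| \le \varepsilon \\ |y - f(x)| - \varepsilon & \text{otherwise} \end{cases}$
Only the points outside the ε-tube are penalized, in a linear fashion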
ε-Insensitive Loss Function
Support Vector Regression (SVR)
Support Vector Classification (SVC)
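The ε-insensitive loss used in SVR is simple enough to state as code. This sketch uses the standard definition (zero inside the tube, linear outside); the function name and the example values are chosen here for illustration.

```python
def eps_insensitive_loss(y_true, y_pred, eps=0.1):
    """Zero inside the ε-tube, growing linearly with the error outside it."""
    return max(0.0, abs(y_true - y_pred) - eps)

# Hypothetical measured and predicted activities
print(eps_insensitive_loss(5.0, 5.05))   # 0.0, the error is inside the tube
print(eps_insensitive_loss(5.0, 5.5))    # about 0.4
print(eps_insensitive_loss(5.0, 4.0))    # about 0.9
```

Because small errors cost nothing, only the points on or outside the tube (the support vectors) determine the fitted function.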
Non-linear SVMs
Datasets that are linearly separable with some noise work out great:
But what are we going to do if the dataset is just too hard?
How about… mapping data to a higher-dimensional space:
[Figure: 1-D data along the x axis, then the same data mapped to the (x, x²) plane]
Non-linear SVMs: Feature spaces
General idea: the original input space can always be mapped to some higher-dimensional feature space where the training set is separable:
Φ: x → φ(x)
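The idea of mapping to a higher-dimensional feature space can be shown with a toy case. The map φ(x) = (x, x²), the data and the helper name below are assumptions chosen for illustration; real SVMs use kernels rather than explicit maps.

```python
def phi(x):
    """Explicit feature map from the 1-D input space to a 2-D feature space."""
    return (x, x * x)

def separable_by_threshold(values, labels):
    """True if one cut point splits the two classes along this coordinate.

    Assumes no two objects with different labels share the same value.
    """
    ordered = [lab for _, lab in sorted(zip(values, labels))]
    changes = sum(1 for a, b in zip(ordered, ordered[1:]) if a != b)
    return changes <= 1

# Hypothetical 1-D data: one class sits between the two halves of the other
xs = [-4.0, -3.0, -1.0, 0.0, 1.0, 3.0, 4.0]
labels = [1, 1, -1, -1, -1, 1, 1]

print(separable_by_threshold(xs, labels))                       # False
print(separable_by_threshold([phi(x)[1] for x in xs], labels))  # True
```

In the original space no single threshold on x works, while in the feature space a horizontal line on the x² coordinate separates the classes.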
Decision Trees as a Greedy Algorithm:
CART: Classification and Regression Tree
Binary recursive partitioning tree
Best First
Left Right
Up down
Here the variable is used to classify the audience! Here the first variable is "Biologist or Not?"
Why? We are in the Bio-Dept.
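The greedy "best first" step of a binary partitioning tree can be sketched as a search for the single split with the lowest weighted Gini impurity. Gini is the usual CART criterion for classification, but the talk does not name the criterion, so treat that choice, the function names and the data as assumptions.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a group: 0 for a pure group."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return (variable index, threshold, weighted Gini) of the best binary split."""
    best = None
    n = len(rows)
    for j in range(len(rows[0])):
        values = sorted({row[j] for row in rows})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0      # candidate cut between neighboring values
            left = [lab for row, lab in zip(rows, labels) if row[j] <= thr]
            right = [lab for row, lab in zip(rows, labels) if row[j] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (j, thr, score)
    return best

# Hypothetical compounds with two descriptors; only the first separates the classes
rows = [[1.0, 300], [1.5, 450], [2.0, 320], [3.5, 310], [4.0, 460], [4.5, 330]]
labels = ["inactive", "inactive", "inactive", "active", "active", "active"]

print(best_split(rows, labels))   # (0, 2.75, 0.0)
```

A full tree applies the same search recursively to the left and right groups until a stopping rule is met.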
3D-QSAR
Notes
Advantages over 2D-QSAR
No reliance on experimental values
Can be applied to molecules with unusual substituents
Not restricted to molecules of the same structural class (in the pharmacophore 3D-QSAR case)
Predictive capability
No experimental constants or measurements are involved
Properties are known as ‘Fields’
Steric field - defines the size and shape of the molecule
Electrostatic field - defines electron rich/poor regions of molecule
3D-QSAR
Comparative molecular field analysis (CoMFA) - Tripos
Build each molecule using modelling software
Identify the active conformation for each molecule
Identify the pharmacophore
Method
[Figure: 2D structure → build 3D model → active conformation → define pharmacophore]
•Place the pharmacophore into a lattice of grid points
•Each grid point defines a point in space
[Figure: lattice of grid points]
•Position molecule to match the pharmacophore
•A probe atom is placed at each grid point in turn
•Probe atom = a proton or sp3 hybridised carbocation
[Figure: probe atom at a grid point]
•Measure the steric or electrostatic interaction of the probe atom with the molecule at each grid point
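The probe-at-each-grid-point step can be sketched as below. This is not the Tripos implementation: the toy two-atom "molecule", the partial charges, the van der Waals parameters, the constant dielectric, the 2 Å grid spacing and the 30 kcal/mol cutoff are all illustrative assumptions, as are the function and constant names.

```python
import math
from itertools import product

# Hypothetical atoms: (x, y, z, partial charge, vdW radius in Å, well depth in kcal/mol)
atoms = [
    (0.0, 0.0, 0.0, -0.4, 1.52, 0.15),
    (1.0, 0.0, 0.0, 0.4, 1.20, 0.02),
]

PROBE_CHARGE = 1.0     # +1 probe, as for a proton or carbocation probe
PROBE_RADIUS = 1.70    # assumed vdW radius of the probe (Å)
PROBE_EPS = 0.10       # assumed well depth of the probe (kcal/mol)
COULOMB = 332.06       # converts e²/Å to kcal/mol
CUTOFF = 30.0          # assumed energy truncation (kcal/mol)

def probe_energies(point, atoms):
    """Return (steric, electrostatic) energy of the probe at one grid point."""
    steric = 0.0
    electrostatic = 0.0
    for x, y, z, charge, radius, eps in atoms:
        r = max(math.dist(point, (x, y, z)), 1e-3)   # avoid division by zero
        r_min = radius + PROBE_RADIUS
        depth = math.sqrt(eps * PROBE_EPS)
        ratio6 = (r_min / r) ** 6
        steric += depth * (ratio6 ** 2 - 2.0 * ratio6)        # Lennard-Jones 6-12
        electrostatic += COULOMB * charge * PROBE_CHARGE / r  # Coulomb term
    steric = min(steric, CUTOFF)
    electrostatic = max(-CUTOFF, min(electrostatic, CUTOFF))
    return steric, electrostatic

# Lattice of grid points around the molecule, 2 Å apart
axis = range(-4, 5, 2)
grid = [(float(x), float(y), float(z)) for x, y, z in product(axis, axis, axis)]

# One row of field descriptors for this molecule: S and E at every grid point
fields = [probe_energies(p, atoms) for p in grid]
print(len(grid), fields[0])
```

Repeating this for every aligned molecule gives one row of steric and electrostatic columns per compound, which is the table that PLS then relates to activity.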
Compound  Biological activity  Steric fields (S) at grid points (001-998)  Electrostatic fields (E) at grid points (001-998)
                               S001 S002 S003 S004 S005 etc                E001 E002 E003 E004 E005 etc
1         5.1
2         6.8
3         5.3
4         6.4
5         6.1
Tabulate fields for each compound at each grid point
Partial least squares analysis (PLS)
QSAR equation: Activity = a·S001 + b·S002 + … + m·S998 + n·E001 + … + y·E998 + z
•Define fields using contour maps round a representative molecule
A procedure based on the information included in the MIF, generating a handful of informative variables that are independent of the location of the molecules within the grid.
Two main steps of the transformation procedure:
Field filtering
Maximum auto-cross correlation (MACC2) encoding. The 2 refers to the distance between two points in space.
2.5D-QSAR or GRIND methodology
MACC2 transform
The MACC transform keeps the maximum value of the products of the field values at two nodes i and j, found at each different distance r_ij.
Here the colors represent the activity of the compounds (blue inactive, red active)
33 means the energy products produced by two N1 probes
8 means the 8th variable of auto-correlogram 33
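The maximum auto-correlation encoding described above can be sketched as follows: for every pair of filtered nodes of one field, the product of the two energies is assigned to a distance bin, and only the largest product per bin is kept. The node coordinates, energies, bin width and function name are hypothetical; the actual GRIND software may differ in details.

```python
import math
from itertools import combinations

def macc2_auto(nodes, bin_width=1.0, n_bins=10):
    """Return (values, pairs): the maximum energy product per distance bin
    and the node pair that produced it.

    nodes: list of ((x, y, z), energy) for one probe, after field filtering.
    """
    values = [0.0] * n_bins
    pairs = [None] * n_bins
    for i, j in combinations(range(len(nodes)), 2):
        (pos_i, e_i), (pos_j, e_j) = nodes[i], nodes[j]
        b = int(math.dist(pos_i, pos_j) / bin_width)   # distance bin of r_ij
        if b < n_bins and e_i * e_j > values[b]:
            values[b] = e_i * e_j
            pairs[b] = (i, j)                          # kept for tracing back
    return values, pairs

# Hypothetical filtered nodes of one probe: favorable (negative) energies
nodes = [
    ((0.0, 0.0, 0.0), -3.0),
    ((2.5, 0.0, 0.0), -2.0),
    ((0.0, 2.2, 0.0), -4.0),
    ((6.1, 0.0, 0.0), -1.5),
]

values, pairs = macc2_auto(nodes)
print(values[2], pairs[2])   # 12.0 (0, 2)
print(values[3], pairs[3])   # 8.0 (1, 2)
print(values[6], pairs[6])   # 6.0 (2, 3)
```

Because the encoding depends only on distances between nodes, the resulting variables do not depend on where the molecule sits in the grid, and the stored pair shows which two nodes generated each variable.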
GRID interaction fields calculated using the N1 probe: positive (yellow) interactions are unfavorable and negative (blue) interactions are favorable.
The nodes kept by field filtering should have low energy values (representing highly favorable interactions) and should be as far as possible from each other.
Each number corresponds to a specific distance between the fields
One of the unique features of the MACC transform is that it is possible to trace back the variables that generated this "most intense" interaction.
VRS