2nd Joint Sheffield Conference on Chemoinformatics: Computational Tools for Lead Discovery
Flexsim-R: A new 3D descriptor for combinatorial library design
and in-silico screening
2nd Joint Sheffield Conference on Chemoinformatics: Computational Tools
for Lead Discovery
Outline
• Introduction
• The Flexsim-R Methodology
• Validation
• Conclusion and Outlook
Introduction
What is Flexsim-R?
Flexsim-R calculates 3D descriptors for reagents, based on the virtual affinity fingerprint concept.
Motivation for developing Flexsim-R
• Reagent-based descriptors are important for
  – combinatorial library design
  – virtual screening experiments
  – bioisosteric replacements
  – rational augmentation of the in-house reagent pool
• For large combinatorial libraries, product-based descriptor calculation is often not feasible -> possible solution: reagent-based product selection (e.g. by a genetic algorithm, GA)
• Descriptor calculation should be fast and automatable
• Descriptors should be related to experimental affinity data
• Encouraged by previous virtual affinity fingerprint methods
In-vitro Affinity Fingerprints
Terrapin's Affinity Fingerprint Approach (Kauvar et al., Chemistry & Biology, 1995, 2, 107-118):
Molecular similarity is defined by the in-vitro binding patterns ("affinity fingerprints") of a ligand set (L) in reference binding assays (A).
[Figure: binding patterns of ligands L1 to L6 across reference assays A1 to A8]
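As a minimal sketch of the idea, assume each ligand's fingerprint is a vector of affinities across the reference assays and that similarity is measured by Euclidean distance (an assumption; Kauvar et al. may use a different metric). The ligand names and affinity values below are hypothetical, and `fingerprint_distances` / `most_similar` are illustrative helpers, not part of any published tool.

```python
import numpy as np

# Hypothetical affinity fingerprints: rows = ligands, columns = reference assays.
# The values are invented for illustration, not taken from Kauvar et al.
LIGANDS = ["L1", "L2", "L3", "L4"]
FINGERPRINTS = np.array([
    [6.1, 4.0, 7.2, 5.5],
    [6.0, 4.2, 7.0, 5.4],
    [3.1, 8.0, 4.5, 6.9],
    [5.0, 5.0, 5.0, 5.0],
])


def fingerprint_distances(fp: np.ndarray) -> np.ndarray:
    """Pairwise Euclidean distances between ligand fingerprints (rows)."""
    diff = fp[:, None, :] - fp[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def most_similar(fp: np.ndarray, names: list, query: str) -> str:
    """Name of the ligand whose binding pattern is closest to `query`."""
    d = fingerprint_distances(fp)
    i = names.index(query)
    d[i, i] = np.inf  # exclude the trivial self-match
    return names[int(np.argmin(d[i]))]


print(most_similar(FINGERPRINTS, LIGANDS, "L1"))  # L2
```

The point of the approach is visible here: L1 and L2 count as similar because they bind the assay panel in the same pattern, not because of any structural comparison.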
Virtual Affinity Fingerprints (VAF)
Terrapin's in-vitro screening in diverse reference assays is simulated:
• by Computational Docking into a reference panel of protein pockets (Docksim, Flexsim-X)
• by Computational Fitting onto a reference panel of small molecules (Flexsim-S)
(Briem and Lessel, Perspectives in Drug Discovery and Design, 20 (2000) 231-244)
The Flexsim-R Method
[Figure: Rgroups are attached to a common core at attachment point X to give the products]
The Flexsim-R Method
[Figure: protein pocket]
Problems with Rgroups in conventional VAF approaches:
• Rgroups tend to be smaller than "drug-like" molecules
• The alignment rule given by the common core attachment point is lost
Solution: core-constrained multiple-site docking
The Flexsim-R Method
Components of core-constrained multiple-site docking:
1. Rgroup set
2. Common core
3. Protein binding pockets
The Flexsim-R Method
First step:
• Docking of the common core group with FlexX
• Multiple solutions (e.g. the 50 best) are stored
• An RMS threshold can be applied to prevent the stored solutions from clustering
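Storing the best core poses under an RMS threshold can be pictured as a greedy filter. This is a minimal sketch, not FlexX's own implementation; it assumes poses are coordinate arrays with identical atom ordering in the same pocket frame (so no superposition is done), that lower scores are better, and that a pose is kept only if its RMS deviation from every already-kept pose is at least the threshold. `rmsd` and `select_core_poses` are hypothetical names.

```python
import numpy as np


def rmsd(a: np.ndarray, b: np.ndarray) -> float:
    """RMS deviation of two poses (n_atoms x 3) with the same atom order.
    No superposition: both poses sit in the same pocket frame."""
    return float(np.sqrt(((a - b) ** 2).sum(axis=1).mean()))


def select_core_poses(poses, scores, n_keep=50, rms_threshold=2.0):
    """Greedy pick: best score first, skip poses too close to a kept one."""
    kept = []
    for i in np.argsort(scores):  # assumption: lower score = better
        if all(rmsd(poses[i], poses[j]) >= rms_threshold for j in kept):
            kept.append(int(i))
            if len(kept) == n_keep:
                break
    return kept


# Three toy poses of a 5-atom core: pose 1 is a near-duplicate of pose 0.
base = np.zeros((5, 3))
poses = [base, base + [0.5, 0.0, 0.0], base + [5.0, 0.0, 0.0]]
scores = [-10.0, -9.0, -8.0]
print(select_core_poses(poses, scores))  # [0, 2]
```

Without the threshold, the stored solutions could all be slight variations of the single best pose; the filter trades a little score for coverage of distinct core positions.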
The Flexsim-R Method
Example: Thrombin active site with the 50 best FlexX solutions for hydantoin (RMS threshold = 2.0)
The Flexsim-R Method
Second step:
• Docking of core group + Rgroup with FlexX
• Pre-stored core positions serve as reference
• FlexX scores are stored in a descriptor matrix
[Figure: descriptor matrix for one protein pocket; rows R1, R2, R3, ... (Rgroups), columns Core Pos1, Core Pos2, ... (stored core positions); each cell holds a FlexX score]
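The second step fills one descriptor matrix per protein pocket, with one row per Rgroup and one column per stored core position. A minimal sketch, assuming a docking call that takes an Rgroup and a core position and returns a score; `descriptor_matrix`, `dock_score` and `toy_score` are hypothetical, and `toy_score` only exists to make the example runnable (it is not FlexX).

```python
from typing import Callable, Sequence

import numpy as np


def descriptor_matrix(
    rgroups: Sequence[str],
    core_poses: Sequence[object],
    dock_score: Callable[[str, object], float],
) -> np.ndarray:
    """Rows = Rgroups, columns = pre-stored core positions of one pocket."""
    m = np.empty((len(rgroups), len(core_poses)))
    for i, rgroup in enumerate(rgroups):
        for j, pose in enumerate(core_poses):
            # dock core + Rgroup with the core held at stored position j
            m[i, j] = dock_score(rgroup, pose)
    return m


def toy_score(rgroup: str, pose: int) -> float:
    """Hypothetical stand-in for a core-constrained docking run."""
    return -float(len(rgroup)) + 0.5 * pose


M = descriptor_matrix(["CH3", "H"], [0, 1, 2], toy_score)
print(M)
```

Because the core positions are computed once and reused, the cost per new reagent is one constrained docking run per stored core position.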
The Flexsim-R Method
[Figure: affinity profiles for Ala (A) and Gly (G); docking score (y-axis, -15 to 5) vs. core position (x-axis, 0 to 50)]
The Flexsim-R Method
[Figure: affinity profiles for Asp (D) and Glu (E); docking score (y-axis, -15 to 5) vs. core position (x-axis, 0 to 50)]
The Flexsim-R Method
[Figure: affinity profiles for A/G and D/E overlaid; docking score (y-axis, -15 to 5) vs. core position (x-axis, 0 to 50)]
The Flexsim-R Method
Multiple protein pockets -> concatenated descriptor matrix
[Figure: descriptor blocks for Pocket 1, Pocket 2 and Pocket 3 (rows R1, R2, R3, ...; columns C1, C2, C3, ...) are concatenated]
The Flexsim-R Method
Multiple core attachment points -> concatenated descriptor matrix
[Figure: hydantoin core with attachment points X1, X2, X3 and X4; the descriptor blocks for each attachment point (rows R1, R2, R3, ...; columns C1, C2, C3, ...) are concatenated]
The Flexsim-R Method
Example: Hydantoin core
[Figure: hydantoin core with attachment points X1, X2, X3 and X4]
4 attachment points × 7 protein pockets × 50 FlexX solutions
-> descriptor vector length = 4 × 7 × 50 = 1,400
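The length of 1,400 follows from concatenating one block per (attachment point, protein pocket) combination. A minimal sketch with random placeholder scores standing in for real FlexX results; the block ordering (attachment point first, then pocket) and the helper `concatenate_blocks` are assumptions, since the slides only state that the matrices are concatenated.

```python
import numpy as np

N_RGROUPS = 20                       # e.g. the 20 natural amino acids
ATTACHMENT_POINTS = ["X1", "X2", "X3", "X4"]
POCKETS = ["1dwc", "1eed", "1pop", "2tsc", "3cla", "3dfr", "5ht2"]
N_CORE_POSES = 50


def concatenate_blocks(blocks):
    """blocks[(point, pocket)] is an (n_rgroups x n_core_poses) matrix.
    Columns are joined in a fixed order: attachment point, then pocket."""
    ordered = [blocks[(x, p)] for x in ATTACHMENT_POINTS for p in POCKETS]
    return np.hstack(ordered)


rng = np.random.default_rng(0)
# Random placeholder scores, standing in for real docking results
blocks = {(x, p): rng.normal(size=(N_RGROUPS, N_CORE_POSES))
          for x in ATTACHMENT_POINTS for p in POCKETS}
D = concatenate_blocks(blocks)
print(D.shape)  # (20, 1400)
```

A fixed column order matters: every Rgroup vector must place the same (attachment point, pocket, core position) triple in the same column, or distances between Rgroups become meaningless.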
The Flexsim-R Method
Test set for method development and evaluation:
• Rgroups: 20 natural amino acids
• Core groups: hydantoin, phenol, pyrimido-pyrimidine, benzimidazole
  [Figure: structures of the four core groups with their attachment points (hydantoin X1 to X4, phenol X1 to X3)]
• 7 protein pockets: 1dwc, 1eed, 1pop, 2tsc, 3cla, 3dfr, 5ht2 (model)
Correlation Analysis
• Analyses were performed to check the correlation between
  – different protein pockets
  – different cores
  – different attachment points
• Analyses are based on Euclidean distance matrices for all 190 pairwise amino acid combinations (20 × 19 / 2 = 190)
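One way to compare two descriptor variants (e.g. two pockets, cores or attachment points) along these lines is to compute the 190 pairwise Euclidean distances between the 20 amino acids under each variant and correlate the two distance vectors. This sketch assumes Pearson's r as the correlation measure, which the slides do not state; `pairwise_distances` and `distance_matrix_r` are hypothetical names and the descriptor values are random placeholders.

```python
from itertools import combinations

import numpy as np


def pairwise_distances(desc: np.ndarray) -> np.ndarray:
    """Euclidean distances for all unordered row pairs (20 rows -> 190 pairs)."""
    return np.array([
        np.linalg.norm(desc[i] - desc[j])
        for i, j in combinations(range(len(desc)), 2)
    ])


def distance_matrix_r(desc_a: np.ndarray, desc_b: np.ndarray) -> float:
    """Pearson r between the distance vectors of two descriptor variants.

    Both matrices must describe the same Rgroups in the same row order;
    they may have different numbers of columns.
    """
    return float(np.corrcoef(pairwise_distances(desc_a),
                             pairwise_distances(desc_b))[0, 1])


rng = np.random.default_rng(0)
pocket_a = rng.normal(size=(20, 50))   # placeholder scores, 20 amino acids
pocket_b = rng.normal(size=(20, 50))
print(len(pairwise_distances(pocket_a)))            # 190
print(distance_matrix_r(pocket_a, 3 * pocket_a))    # approximately 1.0
print(distance_matrix_r(pocket_a, pocket_b))
```

Comparing distance matrices rather than raw columns lets two variants be compared even when their descriptor vectors differ in length or column meaning.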
Correlation Analysis
• Correlation matrix of protein pockets (hydantoin core, all 4 attachment points):

Protein  1eed   1pop   2tsc   3cla   3dfr   5ht2
1dwc     0.922  0.917  0.852  0.794  0.889  0.826
1eed     -      0.937  0.784  0.726  0.863  0.811
1pop     -      -      0.853  0.740  0.940  0.894
2tsc     -      -      -      0.723  0.924  0.864
3cla     -      -      -      -      0.795  0.838
3dfr     -      -      -      -      -      0.932
Correlation Analysis
• Correlation matrix of core groups (all 7 protein pockets, all attachment points):

Core                 Phenol  Pyrimido-pyrimidine  Benzimidazole
Hydantoin            0.954   0.971                0.978
Phenol               -       0.963                0.973
Pyrimido-pyrimidine  -       -                    0.987
Correlation Analysis
• Correlation matrix of attachment points (hydantoin core, all 7 protein pockets):

Position  X2     X3     X4     All
X1        0.985  0.981  0.988  0.995
X2        -      0.964  0.994  0.995
X3        -      -      0.967  0.983
X4        -      -      -      0.995
Correlation Analysis
Reduction of descriptor vector length (dimensionality):
• No PCA was performed, since we want to identify the least correlated original descriptor columns themselves
• Instead, an elimination method was applied:
  1. The complete pairwise correlation matrix is calculated.
  2. All pairs of columns with a correlation coefficient (r) above a user-defined threshold (e.g. 0.7) are considered for elimination.
  3. From each correlating pair, the column that can be better described by multiple linear regression on the remaining descriptors is eliminated.
  4. The resulting matrix contains no pair of columns with a correlation coefficient above the threshold.
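The elimination steps can be sketched as below. Several details are assumptions not given on the slide: pairs are processed from the highest |r| downward, r is compared in absolute value, each column of a pair is regressed (ordinary least squares with intercept) on all other currently kept columns, and the column with the higher R² is dropped. Constant columns, whose correlation is undefined, are treated as uncorrelated. `r_squared` and `eliminate_correlated` are hypothetical names.

```python
import numpy as np


def r_squared(y: np.ndarray, X: np.ndarray) -> float:
    """R^2 of an OLS fit of y on the columns of X (with intercept)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - (resid ** 2).sum() / ((y - y.mean()) ** 2).sum()


def eliminate_correlated(X: np.ndarray, threshold: float = 0.7) -> list:
    """Indices of the columns kept after correlation-based elimination."""
    keep = list(range(X.shape[1]))
    while len(keep) > 1:
        with np.errstate(invalid="ignore", divide="ignore"):
            r = np.abs(np.corrcoef(X[:, keep], rowvar=False))
        r = np.nan_to_num(r)          # constant columns -> treated as r = 0
        np.fill_diagonal(r, 0.0)
        a, b = np.unravel_index(np.argmax(r), r.shape)
        if r[a, b] <= threshold:
            break                     # no remaining pair exceeds the threshold
        ca, cb = keep[a], keep[b]
        r2_a = r_squared(X[:, ca], X[:, [c for c in keep if c != ca]])
        r2_b = r_squared(X[:, cb], X[:, [c for c in keep if c != cb]])
        # drop the column that the remaining descriptors describe better
        keep.remove(ca if r2_a >= r2_b else cb)
    return keep


rng = np.random.default_rng(1)
x0 = rng.normal(size=30)
X = np.column_stack([x0, x0 + 0.01 * rng.normal(size=30), rng.normal(size=30)])
print(eliminate_correlated(X, threshold=0.7))  # one of columns 0/1, plus 2
```

Unlike PCA, the result is a subset of the original columns, so each kept descriptor can still be traced back to a specific pocket, attachment point and core position.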
Correlation Analysis
Example: hydantoin core, all 7 proteins, all 4 attachment points
[Figure: descriptor vector length (y-axis) vs. correlation threshold r (x-axis, 0 to 1); labelled data points include 1100 and 443; Descriptor set 1, Descriptor set 2 and Descriptor set 3 are marked]
Correlation Analysis
[Figure: thrombin with the three most information-rich core positions]
Descriptor Validation
• Five peptide datasets taken from the literature (refs. in Matter, H., J. Peptide Res. 52 (1998) 305-314)
• Product descriptors are generated by concatenating the respective reagent descriptors
• Validation by PLS analysis
• Leave-one-out (LOO) and leave-random-groups-out (LRGO) cross-validation
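A compact sketch of this validation pipeline, under stated assumptions: product descriptors are built by concatenating per-residue reagent descriptors in sequence order, a single-response PLS model (PLS1, NIPALS algorithm) is fitted, and predictive power is reported as q² = 1 - PRESS/SS, where PRESS sums squared prediction errors on held-out samples and SS sums squared deviations of the observed activities from their mean. The LRGO settings (5 groups, 20 repeats), all function names, and the data are hypothetical placeholders, not the peptide datasets.

```python
import numpy as np


def product_descriptors(sequences, reagent_desc):
    """Concatenate reagent descriptors in sequence order, e.g. 'AG' -> [d(A), d(G)]."""
    return np.array([np.concatenate([reagent_desc[aa] for aa in seq])
                     for seq in sequences])


def pls1_fit(X, y, n_comp):
    """Single-response PLS via NIPALS; returns (x_mean, y_mean, coefficients)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_comp):
        w = Xk.T @ yk
        norm = np.linalg.norm(w)
        if norm < 1e-12:
            break                       # y fully explained; stop early
        w = w / norm
        t = Xk @ w                      # scores
        tt = t @ t
        p = Xk.T @ t / tt               # X loadings
        q = (yk @ t) / tt               # y loading
        Xk = Xk - np.outer(t, p)        # deflate X and y
        yk = yk - q * t
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    coef = W @ np.linalg.solve(P.T @ W, Q)
    return x_mean, y_mean, coef


def pls1_predict(model, X):
    x_mean, y_mean, coef = model
    return (X - x_mean) @ coef + y_mean


def q2(X, y, n_comp, groups):
    """q^2 = 1 - PRESS/SS over the held-out index groups."""
    press = ss = 0.0
    for test in groups:
        train = np.setdiff1d(np.arange(len(y)), test)
        model = pls1_fit(X[train], y[train], n_comp)
        press += ((y[test] - pls1_predict(model, X[test])) ** 2).sum()
        ss += ((y[test] - y.mean()) ** 2).sum()
    return 1.0 - press / ss


def loo_groups(n):
    return [np.array([i]) for i in range(n)]


def lrgo_groups(n, n_groups=5, n_repeats=20, seed=0):
    rng = np.random.default_rng(seed)
    return [g for _ in range(n_repeats)
            for g in np.array_split(rng.permutation(n), n_groups)]


# Placeholder data: random 3-column descriptors for the 20 amino acids,
# 40 random dipeptides, and a noisy linear "activity".
rng = np.random.default_rng(0)
amino = "ACDEFGHIKLMNPQRSTVWY"
reagent_desc = {aa: rng.normal(size=3) for aa in amino}
seqs = ["".join(rng.choice(list(amino), 2)) for _ in range(40)]
X = product_descriptors(seqs, reagent_desc)            # shape (40, 6)
y = X @ np.array([1.0, -2.0, 0.5, 1.5, -1.0, 0.8]) + 0.1 * rng.normal(size=40)
print("LOO  q2:", q2(X, y, n_comp=6, groups=loo_groups(len(y))))
print("LRGO q2:", q2(X, y, n_comp=6, groups=lrgo_groups(len(y))))
```

LRGO is usually the stricter test: holding out whole random groups repeatedly gives a less optimistic q² than leaving out one peptide at a time.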
Descriptor Validation
• Datasets:

Dataset  Activity                  N   Peptide length
ACE      ACE inhibitors            58  2
BIT      Bitter-tasting            48  2
BRA      Bradykinin-potentiating   29  5
ENK      Enkephalin analogs        19  5
BR9      Bradykinin analogs        26  9
Descriptor Validation: Results
Leave-random-groups-out (LRGO) results:
[Figure: cross-validated q2 (y-axis, 0 to 1) for the ACE, BIT, BRA, ENK and BR9 datasets, each at correlation thresholds r = 1.0, 0.7 and 0.5]
Summary
• Flexsim-R is a novel virtual affinity fingerprint method that calculates meaningful 3D descriptors for reagents
• Descriptors derived from different cores and attachment points are highly correlated
• For 3 out of 5 validation sets, significant cross-validated q2 values were obtained
• The Rgroup alignment problem is handled inherently
• Flexsim-R calculations are fast and easily automated:
  – only clipped reagent structures are required
  – core positions need to be calculated only once
Outlook
• More validation sets have to be tested (e.g. a "real-life" combichem dataset)
• Is there a set of descriptors that works well for different datasets?
• Integration into the Boehringer Ingelheim library design and virtual screening workflow
Acknowledgements
• Alexander Weber (Boehringer Ingelheim/University of Marburg)
• Andreas Teckentrup (Boehringer Ingelheim)
• Hans Matter (Aventis)
• BMBF for financial support