2nd Joint Sheffield Conference on Chemoinformatics: Computational Tools for Lead Discovery


Flexsim-R: A new 3D descriptor for combinatorial library design and in-silico screening

2nd Joint Sheffield Conference on Chemoinformatics: Computational Tools for Lead Discovery

Outline

• Introduction

• The Flexsim-R Methodology

• Validation

• Conclusion and Outlook

Introduction

What is Flexsim-R?

Flexsim-R calculates 3D descriptors for reagents, based on the virtual affinity fingerprint idea

Motivation to develop Flexsim-R

• Reagent-based descriptors are important for
– combinatorial library design
– virtual screening experiments
– bioisosteric replacements
– rational augmentation of in-house reagent pool

• For large combinatorial libraries, product-based descriptor calculation is often not feasible -> possible solution: reagent-based product selection (e.g. by a GA)

• Descriptor calculation should be fast and automatable

• Descriptor should be related to experimental affinity data

• Encouragement by virtual affinity fingerprint methods

In-vitro Affinity Fingerprints

Terrapin's Affinity Fingerprint Approach: (Kauvar et al., Chemistry & Biology, 1995, 2, 107-118)

Molecular similarity is defined by in-vitro binding patterns ("Affinity Fingerprints") of a ligand set (L) in reference binding assays (A)

Figure: binding patterns of ligands L1 to L6 in reference binding assays A1 to A8

Virtual Affinity Fingerprints (VAF)

Terrapin's in-vitro screening in diverse reference assays is simulated

• by Computational Docking into a reference panel of protein pockets (Docksim, Flexsim-X)

• by Computational Fitting onto a reference panel of small molecules (Flexsim-S)

(Briem and Lessel, Perspectives in Drug Discovery and Design, 20 (2000) 231-244)

The Flexsim-R Method

Figure: Rgroups, a common core with attachment point X, the resulting products, and a protein pocket

Problems with Rgroups in conventional VAF approaches:

• Rgroups tend to be smaller than "drug-like" molecules

• Alignment rule by common core attachment point gets lost

Solution: Core-constrained multiple-site docking


Components of core-constrained multiple-site docking:
1. Rgroup Set
2. Common Core
3. Protein Binding Pockets


First step:
• Docking of common core group with FlexX
• Multiple (e.g. 50 best) solutions are stored
• RMS threshold can be applied to prevent clustering

Example: Thrombin active site with 50 best FlexX solutions of hydantoin (RMS threshold = 2.0)
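
The pose selection in the first step can be illustrated with a greedy filter: walk through the docking solutions from best to worst score and keep a solution only if it is at least the RMS threshold away from every solution already kept. This is a minimal sketch under stated assumptions, not FlexX's own procedure: the function names, the pose representation (lists of x, y, z tuples in identical atom order) and the convention that a lower score is better are all assumptions made for illustration.

```python
import math


def rmsd(a, b):
    """Root-mean-square deviation between two poses with identical atom order."""
    if len(a) != len(b):
        raise ValueError("poses must have the same number of atoms")
    squared = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(squared / len(a))


def select_core_poses(scored_poses, rms_threshold=2.0, max_poses=50):
    """Greedy selection of diverse core poses.

    scored_poses: list of (score, coords); a lower score is assumed to be better.
    A pose is kept only if it lies at least rms_threshold away from all kept poses.
    """
    kept = []
    for score, coords in sorted(scored_poses, key=lambda sp: sp[0]):
        if all(rmsd(coords, k[1]) >= rms_threshold for k in kept):
            kept.append((score, coords))
        if len(kept) == max_poses:
            break
    return kept


# Invented single-atom "poses", for illustration only.
poses = [
    (-10.0, [(0.0, 0.0, 0.0)]),
    (-9.0, [(0.5, 0.0, 0.0)]),   # too close to the best pose, skipped
    (-8.0, [(3.0, 0.0, 0.0)]),
]
selected = select_core_poses(poses)
```

With the defaults above (threshold 2.0, at most 50 poses), near-duplicate core placements are dropped so that the stored positions sample different parts of the pocket.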



Second step:
• Docking of core group + Rgroup with FlexX
• Pre-stored core positions serve as reference
• FlexX scores are stored in descriptor matrix

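Figure: descriptor matrix for one protein pocket, with one row per Rgroup (R1, R2, R3, ...) and one column per pre-stored core position (Core Pos1, Core Pos2, ...); each cell holds a FlexX score (example values shown: 13.5, 22.0, 15.7, 15.5, 11.2, 21.7)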
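
The second step amounts to filling one row per Rgroup and one column per pre-stored core position. The sketch below shows only that bookkeeping; it is not the Flexsim-R implementation. `dock_score` is a hypothetical callable standing in for a core-constrained FlexX run, and the toy scoring function returns invented numbers.

```python
def build_descriptor_matrix(rgroups, core_positions, dock_score):
    """Return {rgroup: [score at core position 1, score at core position 2, ...]}.

    dock_score(rgroup, core_position) is a hypothetical stand-in for docking
    core + Rgroup with the core held at a pre-stored position.
    """
    return {
        rgroup: [dock_score(rgroup, pos) for pos in core_positions]
        for rgroup in rgroups
    }


# Toy stand-in: invented numbers, not real FlexX output.
def toy_score(rgroup, core_position):
    return -1.0 * len(rgroup) - 0.5 * core_position


matrix = build_descriptor_matrix(["Ala", "Gly", "Asp"], [1, 2, 3], toy_score)
```

Each row of `matrix` is the affinity profile of one Rgroup across the stored core positions.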

Figure: Affinity Profiles for Ala and Gly. Docking score (axis from -15 to 5) versus core position (0 to 50), one curve each for A and G.


Figure: Affinity Profiles for Asp and Glu. Docking score (axis from -15 to 5) versus core position (0 to 50), one curve each for D and E.


Figure: Affinity Profiles for A/G and D/E. Docking score (axis from -15 to 5) versus core position (0 to 50), one curve each for A, G, D and E.


Multiple protein pockets -> Concatenated descriptor matrix

Figure: descriptor matrix with one row per Rgroup (R1, R2, R3, ...) and the core-position columns (C1, C2, C3) of Pocket 1, Pocket 2 and Pocket 3 placed side by side

Multiple core attachment points -> Concatenated descriptor matrix

Figure: descriptor matrix with one row per Rgroup (R1, R2, R3, ...) and the core-position columns (C1, C2, C3) for the core attachment points X1, X2, X3 and X4 placed side by side


Example: Hydantoin Core

Figure: hydantoin core with attachment points X1, X2, X3 and X4

4 attachment points * 7 protein pockets * 50 FlexX solutions -> descriptor vector length = 1,400
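
The length of 1,400 follows directly from concatenating one block of 50 scores per combination of attachment point and protein pocket. The sketch below checks that arithmetic on placeholder data; the function name and the dictionary layout are assumptions made for illustration, and the zero-filled blocks carry no real scores.

```python
def concatenate_blocks(blocks):
    """Join per-block score lists into one descriptor vector per Rgroup.

    blocks: list of {rgroup: [scores]}; every block must cover the same Rgroups.
    """
    rgroups = list(blocks[0])
    for block in blocks:
        if set(block) != set(rgroups):
            raise ValueError("all blocks must describe the same Rgroups")
    return {r: [s for block in blocks for s in block[r]] for r in rgroups}


n_attachment_points, n_pockets, n_core_positions = 4, 7, 50

# One placeholder block per (attachment point, pocket) combination.
blocks = [
    {"Ala": [0.0] * n_core_positions, "Gly": [0.0] * n_core_positions}
    for _ in range(n_attachment_points * n_pockets)
]
vectors = concatenate_blocks(blocks)
vector_length = len(vectors["Ala"])
```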


Test set for method development and evaluation:
• Rgroups: 20 natural amino acids
• Core groups: Hydantoin, Phenol, Pyrimido-pyrimidine, Benzimidazole
• 7 protein pockets: 1dwc, 1eed, 1pop, 2tsc, 3cla, 3dfr, 5ht2 (model)

Figure: the four core structures with their attachment points (X1 to X4 on hydantoin)

Correlation Analysis

• Analyses were performed to check correlation between
– different protein pockets
– different cores
– different attachment points

• Analyses are based on Euclidean distance matrices for all 190 pairwise amino acid vector combinations
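
The figure of 190 is the number of unordered pairs among 20 amino acids (20 * 19 / 2). The sketch below computes those pairwise Euclidean distances for two descriptor sets and correlates the two distance lists. It assumes the reported coefficients are Pearson correlations between such lists, which the slides do not state explicitly; the reagent names and descriptor values are invented.

```python
import math
from itertools import combinations


def pairwise_distances(vectors):
    """Euclidean distances for all unordered pairs, in a fixed (sorted) order."""
    names = sorted(vectors)
    return [math.dist(vectors[a], vectors[b]) for a, b in combinations(names, 2)]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Toy data: 20 hypothetical reagents described in two hypothetical pockets.
pocket_a = {f"aa{i:02d}": [float(i), float(i * i)] for i in range(20)}
pocket_b = {name: [2.0 * v for v in vec] for name, vec in pocket_a.items()}

dist_a = pairwise_distances(pocket_a)
dist_b = pairwise_distances(pocket_b)
r = pearson(dist_a, dist_b)
```

Because the second toy pocket is just a rescaled copy of the first, the two distance lists are perfectly correlated; real pockets would give values below 1.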


• Correlation matrix of protein pockets (hydantoin core, all 4 attachment points):

Protein  1eed   1pop   2tsc   3cla   3dfr   5ht2
1dwc     0.922  0.917  0.852  0.794  0.889  0.826
1eed            0.937  0.784  0.726  0.863  0.811
1pop                   0.853  0.740  0.940  0.894
2tsc                          0.723  0.924  0.864
3cla                                 0.795  0.838
3dfr                                        0.932


• Correlation matrix of core groups (all 7 protein pockets, all attachment points):

Core                 Phenol  Pyrimido-pyrimidine  Benzimidazole
Hydantoin            0.954   0.971                0.978
Phenol                       0.963                0.973
Pyrimido-pyrimidine                               0.987


• Correlation matrix of attachment points (hydantoin core, all 7 protein pockets):

Position  X2     X3     X4     All
X1        0.985  0.981  0.988  0.995
X2               0.964  0.994  0.995
X3                      0.967  0.983
X4                             0.995


Reduction of descriptor vector length (dimensionality):

• no PCA was performed, since we want to get information about the most uncorrelated descriptor columns
• instead, an elimination method has been applied:
– the complete pairwise correlation matrix is calculated
– all pairs of columns with correlation coefficient (r) above a user-defined threshold (e.g. 0.7) are considered for elimination
– from each correlating pair, the column that can be better described by multiple linear regression of the remaining descriptors is eliminated
– the resulting matrix doesn't contain pairs of columns with correlation coefficient above the threshold
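
The elimination procedure can be sketched as a loop that repeatedly finds the most strongly correlated pair above the threshold and drops the member that the other columns explain better. Several details here are assumptions, since the slides do not specify them: the absolute value of Pearson r is used, pairs are processed strongest first, "remaining descriptors" is read as all kept columns outside the pair, and ties drop the first column of the pair. The column names and values are invented.

```python
import math
from itertools import combinations


def _centered(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def pearson(x, y):
    """Pearson r; returns 0.0 if either column is constant."""
    xc, yc = _centered(x), _centered(y)
    denom = math.sqrt(sum(a * a for a in xc) * sum(b * b for b in yc))
    return sum(a * b for a, b in zip(xc, yc)) / denom if denom > 0 else 0.0


def r_squared(target, predictors):
    """R^2 of a least-squares fit (with intercept) of target on predictors.

    Uses Gram-Schmidt on the centered predictors, so linearly dependent
    predictors are skipped instead of breaking the fit.
    """
    basis = []
    for column in predictors:
        v = _centered(column)
        start_norm = math.sqrt(sum(a * a for a in v))
        for b in basis:
            proj = sum(p * q for p, q in zip(v, b))
            v = [p - proj * q for p, q in zip(v, b)]
        norm = math.sqrt(sum(a * a for a in v))
        if start_norm > 0 and norm > 1e-10 * start_norm:
            basis.append([a / norm for a in v])
    tc = _centered(target)
    total = sum(a * a for a in tc)
    if total == 0:
        return 1.0  # a constant column is trivially described
    explained = sum(sum(p * q for p, q in zip(tc, b)) ** 2 for b in basis)
    return explained / total


def eliminate_correlated(columns, threshold=0.7):
    """Drop columns until no pair has |r| above threshold; return kept names."""
    kept = list(columns)
    while True:
        pairs = [
            (abs(pearson(columns[a], columns[b])), a, b)
            for a, b in combinations(kept, 2)
        ]
        over = [p for p in pairs if p[0] > threshold]
        if not over:
            return kept
        _, a, b = max(over, key=lambda p: p[0])
        others = [columns[c] for c in kept if c not in (a, b)]
        # Eliminate the member of the pair that the other columns explain better.
        drop = a if r_squared(columns[a], others) >= r_squared(columns[b], others) else b
        kept.remove(drop)


# Toy columns (invented values): x and y are nearly collinear, z is not.
columns = {
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "y": [2.0, 4.0, 6.0, 8.0, 10.5],
    "z": [1.0, -1.0, 2.0, -2.0, 0.0],
}
kept = eliminate_correlated(columns, threshold=0.7)
```

Unlike PCA, this keeps a subset of the original columns, so each retained descriptor still corresponds to one identifiable core position.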

Example: hydantoin core, all 7 proteins, all 4 attachment points

Figure: descriptor vector length (axis from 0 to 1200) versus correlation threshold r (0 to 1); data labels include 1100 and 443; Descriptor set 1, Descriptor set 2 and Descriptor set 3 are indicated

Figure: Thrombin with the three most information-rich core positions

Descriptor Validation

• Five peptide datasets, taken from literature (Refs. in Matter, H., J. Peptide Res. 52 (1998) 305-314)

• Product descriptors are generated by concatenation of respective reagent descriptors

• Validation by PLS Analysis

• leave-one-out (LOO) and leave-random-groups-out (LRGO) cross-validation
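
Generating a product descriptor by concatenation means joining the reagent vectors in the order in which the reagents occur in the product, so a dipeptide vector is twice as long as a single amino acid vector. A minimal sketch with invented three-element reagent descriptors follows; the function name and data layout are assumptions for illustration.

```python
def product_descriptor(reagents, reagent_descriptors):
    """Concatenate the descriptor vectors of a product's reagents, in order."""
    vector = []
    for reagent in reagents:
        vector.extend(reagent_descriptors[reagent])
    return vector


# Invented 3-element reagent descriptors, for illustration only.
reagent_descriptors = {
    "Ala": [-3.1, -2.0, -4.4],
    "Gly": [-1.2, -0.8, -2.5],
}
dipeptide = product_descriptor(["Ala", "Gly"], reagent_descriptors)
```

The order matters: `["Gly", "Ala"]` yields a different vector from `["Ala", "Gly"]`, which keeps the two sequences distinguishable.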

Descriptor Validation

• Datasets:

Dataset  Activity                 N   Peptide length
ACE      ACE-Inhibitors           58  2
BIT      Bitter-tasting           48  2
BRA      Bradykinin-potentiating  29  5
ENK      Enkephalin-analogs       19  5
BR9      Bradykinin-analogs       26  9

Descriptor Validation: Results

Leave-random-groups-out (LRGO) results:

Figure: cross-validated q2 (axis from 0 to 1) for the datasets ACE, BIT, BRA, ENK and BR9, each at r thresholds 1.0, 0.7 and 0.5
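
For reference, a cross-validated q2 is conventionally computed as 1 - PRESS/SS, where PRESS is the sum of squared differences between observed values and their cross-validated predictions, and SS is the sum of squared deviations of the observed values from their mean. The sketch below shows the leave-one-out variant. To stay dependency-free it replaces the PLS model with a least-squares line on a single descriptor, so it illustrates the validation loop only and says nothing about the reported results; all names and data are invented.

```python
def q2(observed, predicted):
    """Cross-validated q2 = 1 - PRESS / SS (SS taken about the observed mean)."""
    mean_obs = sum(observed) / len(observed)
    press = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - press / ss


def leave_one_out(X, y, fit_predict):
    """Predict each sample from a model fitted on all other samples."""
    return [
        fit_predict(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i])
        for i in range(len(y))
    ]


def line_fit_predict(X_train, y_train, x_new):
    """Stand-in for PLS: least-squares line on the first descriptor only."""
    xs = [row[0] for row in X_train]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(y_train) / len(y_train)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (t - mean_y) for x, t in zip(xs, y_train))
    return mean_y + (sxy / sxx) * (x_new[0] - mean_x)


# Invented, exactly linear toy data.
X = [[0.0], [1.0], [2.0], [3.0], [4.0]]
y = [1.0, 3.0, 5.0, 7.0, 9.0]
loo_q2 = q2(y, leave_one_out(X, y, line_fit_predict))
```

A leave-random-groups-out run would follow the same pattern but hold out a randomly chosen group of samples in each round instead of a single one.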

Summary

• Flexsim-R is a novel virtual affinity fingerprint method that calculates meaningful 3D descriptors for reagents

• High correlation between different cores and attachment points

• For 3 out of 5 validation sets, significant cross-validated q2 values could be obtained

• Rgroup alignment problem is tackled inherently

• Flexsim-R calculations are fast and can be automated easily:

• only clipped reagent structures are required

• core positions need to be calculated only once

Outlook

• More validation sets have to be tested (e.g. "real-life" combichem dataset)

• Is there a set of descriptors which works well for different datasets?

• Integration in Boehringer Ingelheim library design and virtual screening workflow

Acknowledgements

• Alexander Weber (Boehringer Ingelheim/University of Marburg)

• Andreas Teckentrup (Boehringer Ingelheim)

• Hans Matter (Aventis)

• BMBF for financial support
