Flexsim-R: A new 3D descriptor for combinatorial library design and in-silico screening

2nd Joint Sheffield Conference on Chemoinformatics: Computational Tools for Lead Discovery

Page 1

Flexsim-R: A new 3D descriptor for combinatorial library design and in-silico screening

2nd Joint Sheffield Conference on Chemoinformatics: Computational Tools for Lead Discovery

Page 2

Outline

• Introduction

• The Flexsim-R Methodology

• Validation

• Conclusion and Outlook

Page 3

Introduction

What is Flexsim-R?

Flexsim-R calculates 3D descriptors for reagents, based on the virtual affinity fingerprint idea.

Page 4

Motivation to develop Flexsim-R

• Reagent-based descriptors are important for

  – combinatorial library design

  – virtual screening experiments

  – bioisosteric replacements

  – rational augmentation of the in-house reagent pool

• For large combinatorial libraries, product-based descriptor calculation is often not feasible -> possible solution: reagent-based product selection (e.g. by a GA)

• Descriptor calculation should be fast and automatable

• Descriptor should be related to experimental affinity data

• Encouragement by virtual affinity fingerprint methods

Page 5

In-vitro Affinity Fingerprints

Terrapin's Affinity Fingerprint Approach (Kauvar et al., Chemistry & Biology, 1995, 2, 107-118):

Molecular similarity is defined by in-vitro binding patterns ("Affinity Fingerprints") of a ligand set (L) in reference binding assays (A).

[Figure: ligands L1 to L6 measured against reference binding assays A1 to A8]

Page 6

Virtual Affinity Fingerprints (VAF)

Terrapin's in-vitro screening in diverse reference assays is simulated

• by computational docking into a reference panel of protein pockets (Docksim, Flexsim-X)

• by computational fitting onto a reference panel of small molecules (Flexsim-S)

(Briem and Lessel, Perspectives in Drug Discovery and Design, 20 (2000) 231-244)

Page 7

The Flexsim-R Method

[Figure: Rgroups (R) are combined with a core carrying an attachment point (X) to give products]

Page 8

The Flexsim-R Method

Problems with Rgroups in conventional VAF approaches:

• Rgroups tend to be smaller than "drug-like" molecules

• The alignment rule given by the common core attachment point gets lost

Solution: core-constrained multiple-site docking

[Figure: protein pocket]

Page 9

The Flexsim-R Method

Components of core-constrained multiple-site docking:

1. Rgroup set

2. Common core

3. Protein binding pockets

Page 10

The Flexsim-R Method

First step:

• Docking of the common core group with FlexX

• Multiple (e.g. 50 best) solutions are stored

• An RMS threshold can be applied to prevent clustering
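The RMS threshold of the first step can be pictured as a greedy filter: walk through the core poses from best to worst score and keep a pose only if it lies at least the threshold away (in RMSD) from every pose already kept. The sketch below illustrates that idea only and is not FlexX's own procedure. The function names, the coordinate layout, the "lower score is better" convention and the absence of symmetry handling are all assumptions.

```python
import numpy as np

def rmsd(a, b):
    """Root-mean-square deviation between two (n_atoms, 3) coordinate arrays."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))

def select_diverse_poses(poses, scores, max_poses=50, rms_threshold=2.0):
    """Greedily keep the best-scoring poses that are mutually >= rms_threshold apart.

    Assumptions: a lower score means a better pose, and atoms appear in the
    same order in every pose. Returns indices into `poses`.
    """
    order = np.argsort(scores)  # best (lowest) score first
    kept = []
    for idx in order:
        if all(rmsd(poses[idx], poses[k]) >= rms_threshold for k in kept):
            kept.append(int(idx))
        if len(kept) == max_poses:
            break
    return kept

# Toy poses (made-up coordinates): the second is almost identical to the first.
base = np.zeros((4, 3))
toy_poses = [base, base + 0.1, base + 5.0]
toy_scores = [-10.0, -9.0, -8.0]
kept = select_diverse_poses(toy_poses, toy_scores)
```

With these toy inputs the near-duplicate second pose is skipped, which is the "prevent clustering" effect described on the slide.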

Page 11

The Flexsim-R Method

[Figure: Example: thrombin active site with the 50 best FlexX solutions of hydantoin (RMS threshold = 2.0)]

Page 12

The Flexsim-R Method

Second step:

• Docking of core group + Rgroup with FlexX

• Pre-stored core positions serve as reference

• FlexX scores are stored in the descriptor matrix

[Figure: descriptor matrix for one protein pocket; rows are Rgroups (R1, R2, R3, ...), columns are core positions (Core Pos1, Core Pos2, ...), cells hold FlexX scores (example values shown: 13.5, 22.0, 15.7, 15.5, 11.2, 21.7)]
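The descriptor matrix of the second step is a table with one row per Rgroup and one column per stored core position. A minimal sketch of filling such a table follows. `score_fn` is a hypothetical stand-in for a core-constrained docking run (no real FlexX interface is implied), and the toy scores are made up.

```python
import numpy as np

def build_descriptor_matrix(rgroups, core_positions, score_fn):
    """Return an (n_rgroups, n_core_positions) matrix of docking scores.

    score_fn(rgroup, core_position) is a hypothetical callable standing in
    for one core-constrained docking run; it must return a single float.
    """
    matrix = np.empty((len(rgroups), len(core_positions)))
    for i, rgroup in enumerate(rgroups):
        for j, position in enumerate(core_positions):
            matrix[i, j] = score_fn(rgroup, position)
    return matrix

# Toy stand-in for a docking program: invented numbers, for illustration only.
toy_scores = {("R1", 0): -3.0, ("R1", 1): -1.5, ("R2", 0): -7.2, ("R2", 1): 0.4}
matrix = build_descriptor_matrix(["R1", "R2"], [0, 1],
                                 lambda r, p: toy_scores[(r, p)])
```

Each row of `matrix` is then the affinity profile of one Rgroup across the core positions of a single pocket.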

Page 13

The Flexsim-R Method

[Figure: Affinity profiles for Ala (A) and Gly (G); x-axis: core position (0 to 50), y-axis: docking score (-15 to 5)]

Page 14

The Flexsim-R Method

[Figure: Affinity profiles for Asp (D) and Glu (E); x-axis: core position (0 to 50), y-axis: docking score (-15 to 5)]

Page 15

The Flexsim-R Method

[Figure: Affinity profiles for A/G and D/E; x-axis: core position (0 to 50), y-axis: docking score (-15 to 5)]

Page 16

The Flexsim-R Method

Multiple protein pockets -> Concatenated descriptor matrix

[Figure: descriptor matrix with rows R1, R2, R3, ... and column blocks C1, C2, C3 repeated for Pocket 1, Pocket 2 and Pocket 3]

Page 17

The Flexsim-R Method

Multiple core attachment points -> Concatenated descriptor matrix

[Figure: descriptor matrix with rows R1, R2, R3, ... and column blocks C1, C2, C3 repeated for the core attachment points X1, X2, X3 and X4]

Page 18

The Flexsim-R Method

Example: Hydantoin Core

[Figure: hydantoin core with attachment points X1, X2, X3 and X4]

4 attachment points * 7 protein pockets * 50 FlexX solutions

-> descriptor vector length = 1,400
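The vector length follows directly from concatenating one block of scores per attachment point and pocket. The sketch below checks the arithmetic with random placeholder blocks; the values are not real docking scores, and the row count of 20 is an assumption borrowed from the amino acid test set.

```python
import numpy as np

n_rgroups = 20            # assumption: e.g. the 20 natural amino acids
n_attachment_points = 4   # X1..X4 on the hydantoin core
n_pockets = 7
n_core_positions = 50     # stored FlexX solutions per pocket

rng = np.random.default_rng(0)
# One (n_rgroups, n_core_positions) block per (attachment point, pocket) pair.
blocks = [
    rng.normal(size=(n_rgroups, n_core_positions))
    for _ in range(n_attachment_points)
    for _ in range(n_pockets)
]
descriptors = np.hstack(blocks)  # concatenate blocks column-wise
print(descriptors.shape)  # (20, 1400)
```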

Page 19

The Flexsim-R Method

Test set for method development and evaluation:

• Rgroups: 20 natural amino acids

• Core groups: hydantoin, phenol, pyrimido-pyrimidine, benzimidazole

[Figure: structures of the four cores with their attachment points (X1 to X4 for hydantoin)]

• 7 protein pockets: 1dwc, 1eed, 1pop, 2tsc, 3cla, 3dfr, 5ht2 (model)

Page 20

Correlation Analysis

• Analyses were performed to check the correlation between

  – different protein pockets

  – different cores

  – different attachment points

• Analyses are based on Euclidean distance matrices for all 190 pairwise amino acid vector combinations
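The figure of 190 is the number of unordered pairs among 20 amino acids (20 * 19 / 2). One way to compare two descriptor blocks along these lines is to compute the pairwise Euclidean distances within each block and then correlate the two distance vectors. The sketch below does that with random placeholder data. The slides do not state which correlation coefficient was used, so Pearson r is an assumption, as are the function names.

```python
import numpy as np
from itertools import combinations

def pairwise_distances(descriptors):
    """Euclidean distances for all unordered row pairs, as a flat vector."""
    rows = np.asarray(descriptors, dtype=float)
    return np.array([np.linalg.norm(rows[i] - rows[j])
                     for i, j in combinations(range(len(rows)), 2)])

def correlate_distance_matrices(desc_a, desc_b):
    """Pearson r between the pairwise-distance vectors of two descriptor blocks."""
    return float(np.corrcoef(pairwise_distances(desc_a),
                             pairwise_distances(desc_b))[0, 1])

rng = np.random.default_rng(1)
pocket_a = rng.normal(size=(20, 50))                        # placeholder block
pocket_b = pocket_a + rng.normal(scale=0.1, size=(20, 50))  # a similar block
print(len(pairwise_distances(pocket_a)))  # 190
r_ab = correlate_distance_matrices(pocket_a, pocket_b)
```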

Page 21

Correlation Analysis

Correlation matrix of protein pockets (hydantoin core, all 4 attachment points):

Protein  1eed   1pop   2tsc   3cla   3dfr   5ht2
1dwc     0.922  0.917  0.852  0.794  0.889  0.826
1eed            0.937  0.784  0.726  0.863  0.811
1pop                   0.853  0.740  0.940  0.894
2tsc                          0.723  0.924  0.864
3cla                                 0.795  0.838
3dfr                                        0.932

Page 22

Correlation Analysis

Correlation matrix of core groups (all 7 protein pockets, all attachment points):

Core                 Phenol  Pyrimido-pyrimidine  Benzimidazole
Hydantoin            0.954   0.971                0.978
Phenol                       0.963                0.973
Pyrimido-pyrimidine                               0.987

Page 23

Correlation Analysis

Correlation matrix of attachment points (hydantoin core, all 7 protein pockets):

Position  X2     X3     X4     All
X1        0.985  0.981  0.988  0.995
X2               0.964  0.994  0.995
X3                      0.967  0.983
X4                             0.995

Page 24

Correlation Analysis

Reduction of descriptor vector length (dimensionality):

• No PCA was performed, since we want to get information about the most uncorrelated descriptor columns

• Instead, an elimination method has been applied:

  – the complete pairwise correlation matrix is calculated

  – all pairs of columns with a correlation coefficient (r) above a user-defined threshold (e.g. 0.7) are considered for elimination

  – from each correlating pair, the column that can be better described by multiple linear regression of the remaining descriptors is eliminated

  – the resulting matrix doesn't contain pairs of columns with a correlation coefficient above the threshold
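The elimination steps can be sketched as a loop that repeatedly finds the most strongly correlated pair of columns and drops the member that the other columns explain better. This is one possible reading of the slide, not the authors' code. Comparing |r| rather than r, treating the strongest pair first, counting the pair partner among the "remaining descriptors", and the function names are all assumptions.

```python
import numpy as np

def r_squared(X, target_idx, other_idx):
    """R^2 of a least-squares fit (with intercept) of one column on other columns."""
    y = X[:, target_idx]
    A = np.column_stack([np.ones(len(y)), X[:, other_idx]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    ss_res = float(((y - A @ coef) ** 2).sum())
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot

def eliminate_correlated(X, threshold=0.7):
    """Return the indices of the columns that survive the elimination.

    Assumptions: |r| is compared with the threshold, the most strongly
    correlated pair is handled first, and no column is constant.
    """
    X = np.asarray(X, dtype=float)
    keep = list(range(X.shape[1]))
    while len(keep) > 1:
        corr = np.abs(np.corrcoef(X[:, keep], rowvar=False))
        np.fill_diagonal(corr, 0.0)
        a, b = np.unravel_index(np.argmax(corr), corr.shape)
        if corr[a, b] <= threshold:
            break  # no pair above the threshold is left
        # Drop the column that the other surviving columns explain better.
        fits = []
        for cand in (a, b):
            others = [k for pos, k in enumerate(keep) if pos != cand]
            fits.append(r_squared(X, keep[cand], others))
        drop = a if fits[0] >= fits[1] else b
        del keep[drop]
    return keep
```

When there are more columns than rows (for example 20 Rgroups and hundreds of descriptor columns), the regression fits every column perfectly and the choice within a pair becomes a tie; the slides do not say how this case was handled.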

Page 25

Correlation Analysis

Example: hydantoin core, all 7 proteins, all 4 attachment points

[Figure: descriptor vector length (0 to 1200) versus correlation threshold r (0 to 1); data labels: 1100, 443, 130, 54, 20, 15, 7; descriptor sets 1, 2 and 3 are marked]

Page 26

Correlation Analysis

[Figure: thrombin with the three most information-rich core positions]

Page 27

Descriptor Validation

• Five peptide datasets, taken from the literature (Refs. in Matter, H., J. Peptide Res. 52 (1998) 305-314)

• Product descriptors are generated by concatenation of the respective reagent descriptors

• Validation by PLS analysis

• Leave-one-out (LOO) and leave-random-groups-out (LRGO) cross-validation
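The validation recipe can be sketched in two parts: build a product (peptide) descriptor by concatenating the reagent (residue) descriptors, then estimate a leave-one-out q2 with a PLS model. The code below uses a generic textbook PLS1 (NIPALS) written in NumPy so that it stays self-contained. It is not the software used for the slides, all function names are hypothetical, and the data are random placeholders.

```python
import numpy as np

def product_descriptor(sequence, reagent_descriptors):
    """Concatenate per-residue reagent descriptors into one product vector."""
    return np.concatenate([reagent_descriptors[res] for res in sequence])

def pls1_fit(X, y, n_components):
    """Fit PLS1 (single response) with NIPALS; returns (x_mean, y_mean, coef)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        norm = np.linalg.norm(w)
        if norm < 1e-12:          # nothing left to explain
            break
        w = w / norm
        t = Xk @ w                # scores
        tt = t @ t
        p = Xk.T @ t / tt         # X loadings
        c = (yk @ t) / tt         # y loading
        Xk = Xk - np.outer(t, p)  # deflate X
        yk = yk - c * t           # deflate y
        W.append(w); P.append(p); q.append(c)
    if not W:
        return x_mean, y_mean, np.zeros(X.shape[1])
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return x_mean, y_mean, coef

def loo_q2(X, y, n_components=2):
    """Leave-one-out cross-validated q2 = 1 - PRESS / SS_total."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    press = 0.0
    for i in range(len(y)):
        train = np.arange(len(y)) != i
        x_mean, y_mean, coef = pls1_fit(X[train], y[train], n_components)
        pred = (X[i] - x_mean) @ coef + y_mean
        press += (y[i] - pred) ** 2
    return 1.0 - press / ((y - y.mean()) ** 2).sum()

# Toy dipeptide example with random 3-value reagent descriptors.
rng = np.random.default_rng(0)
reagents = {aa: rng.normal(size=3) for aa in "ACDEFGHIKLMNPQRSTVWY"}
dipeptide = product_descriptor("AG", reagents)  # length 2 * 3 = 6
```

A leave-random-groups-out variant would hold out several randomly chosen compounds per round instead of one; it is omitted here to keep the sketch short.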

Page 28

Descriptor Validation

• Datasets:

Dataset  Activity                 N   Peptide length
ACE      ACE inhibitors           58  2
BIT      Bitter-tasting           48  2
BRA      Bradykinin-potentiating  29  5
ENK      Enkephalin analogs       19  5
BR9      Bradykinin analogs       26  9

Page 29

Descriptor Validation: Results

[Figure: leave-random-groups-out (LRGO) results; cross-validated q2 (0 to 1) for the ACE, BIT, BRA, ENK and BR9 datasets at r thresholds 1.0, 0.7 and 0.5]

Page 30

Summary

• Flexsim-R comprises a novel virtual affinity fingerprint method, which calculates meaningful 3D descriptors for reagents

• High correlation between different cores and attachment points

• For 3 out of 5 validation sets, significant cross-validated q2 values could be obtained

• Rgroup alignment problem is tackled inherently

• Flexsim-R calculations are fast and can be automated easily:

  – only clipped reagent structures are required

  – core positions need to be calculated only once

Page 31

Outlook

• More validation sets have to be tested (e.g. a "real-life" combichem dataset)

• Is there a set of descriptors which works well for different datasets?

• Integration into the Boehringer Ingelheim library design and virtual screening workflow

Page 32

Acknowledgements

• Alexander Weber (Boehringer Ingelheim/University of Marburg)

• Andreas Teckentrup (Boehringer Ingelheim)

• Hans Matter (Aventis)

• BMBF for financial support