numerical characterization of structure-activity relationships from a medicinal chemists point of...

38
Defining & Using Structure-Activity Landscapes Rajarshi Guha Background Visualization Utilization Predictive Models 3D models Chemical spaces Summary Numerical Characterization of Structure-Activity Relationships from a Medicinal Chemists Point of View Rajarshi Guha School of Informatics Indiana University National Medicinal Chemistry Symposium Pittsburgh, PA 15 th June, 2008

Upload: rguha

Post on 14-Jul-2015

1.108 views

Category:

Education


1 download

TRANSCRIPT

Defining & UsingStructure-Activity

Landscapes

Rajarshi Guha

Background

Visualization

Utilization

Predictive Models

3D models

Chemical spaces

Summary

Numerical Characterization ofStructure-Activity Relationships from a

Medicinal Chemists Point of View

Rajarshi Guha

School of InformaticsIndiana University

National Medicinal Chemistry SymposiumPittsburgh, PA15th June, 2008

Defining & UsingStructure-Activity

Landscapes

Rajarshi Guha

Background

Visualization

Utilization

Predictive Models

3D models

Chemical spaces

Summary

Structure Activity Relationships

Assumptions

I Similar molecules will have similar activities

I Small changes in structure will lead to small changes inactivity

I One implication is that SAR’s are additive

I This is the basis for QSAR modeling

Martin, Y.C. et al., J. Med. Chem., 2002, 45, 4350–4358

Defining & UsingStructure-Activity

Landscapes

Rajarshi Guha

Background

Visualization

Utilization

Predictive Models

3D models

Chemical spaces

Summary

Structure Activity Landscapes

Melanocortin-4 receptor inhibitors

N

N

O

F3C

NH2

NH2

Cl Cl

N

N

O

F3C

NH2

NH

Cl Cl

O

Ki = 39.0 nM Ki = 1.8 nM

N

N

O

NH2

NH

Cl Cl

O

F3C

NH2

N

N

O

F3C

NH2

NH

Cl Cl

O NH2

Ki = 10.0 nM Ki = 1.0 nM

Tran, J.A. et al., Bioorg. Med. Chem. Lett., 2007, 15, 5166–5176

Defining & UsingStructure-Activity

Landscapes

Rajarshi Guha

Background

Visualization

Utilization

Predictive Models

3D models

Chemical spaces

Summary

Structure Activity Landscapes

Rugged gorges or rolling hills?

I Small structural changes associated with large activitychanges represent steep slopes in the landscape

I Activity Cliffs

I But traditionally, QSAR assumes gentle slopes

I Machine learning is not very good for special cases

Maggiora, G.M., J. Chem. Inf. Model., 2006, 46, 1535–1535

Defining & UsingStructure-Activity

Landscapes

Rajarshi Guha

Background

Visualization

Utilization

Predictive Models

3D models

Chemical spaces

Summary

Characterizing the Landscape

Converting activity cliffs to numbers

I A cliff can be numerically characterized

I Structure-Activity Landscape Index (SALI)

SALIi ,j =|Ai − Aj |

1− sim(i , j)

I Cliffs are characterized by elements of the matrix withvery large values

Guha, R.; Van Drie, J.H., J. Chem. Inf. Model., 2008, 48, 646–658

Defining & UsingStructure-Activity

Landscapes

Rajarshi Guha

Background

Visualization

Utilization

Predictive Models

3D models

Chemical spaces

Summary

Visualizing the SALI Matrix

Defining & UsingStructure-Activity

Landscapes

Rajarshi Guha

Background

Visualization

Utilization

Predictive Models

3D models

Chemical spaces

Summary

Visualizing SALI Values

Alternatives?

I A heatmap is an easy to understand visualization

I Coupled with brushing, can be a handy tool

I A more flexible approach is to consider a network view ofthe matrix

The SALI graph

I Compounds are nodes

I Nodes i , j are connected if SALIi ,j > X

I Only display connected nodes

Defining & UsingStructure-Activity

Landscapes

Rajarshi Guha

Background

Visualization

Utilization

Predictive Models

3D models

Chemical spaces

Summary

Visualizing the SALI Graph

●1

●2●3

●4

●5

●6

●7

●8

●9

●10●12

●13

●15

●17

●18

●19

●20 ●21●22

●23●24

●25

●27

●28

●29

●30

●31

●34●35

●37

●38

●39

●40

●41 ●42

●43

●44

●45

●46

●49

●50

●51

●52

●53

●54

●55

●56●57

●58

●59

●60

●62

●63

●64

●65

●66

●68

●69

●71 ●72 ●73

●75

●76

I Nodes are ordered such that the tail node in an edge haslower activity than the head node

Defining & UsingStructure-Activity

Landscapes

Rajarshi Guha

Background

Visualization

Utilization

Predictive Models

3D models

Chemical spaces

Summary

Better Visualization

SALIViewer

I Java application for generating and visualizing SALIgraphs

I Create SALI graphs from SMILES and activity data,using the CDK fingerprints

I Easily examine SALI graphs at different cutoffs

I Provides 2D depictions for nodes and edges

I Generate SALI curves

http://cheminfo.informatics.indiana.edu/~rguha/code/java/salivis

Defining & UsingStructure-Activity

Landscapes

Rajarshi Guha

Background

Visualization

Utilization

Predictive Models

3D models

Chemical spaces

Summary

Better Visualization - SALIViewer

Defining & UsingStructure-Activity

Landscapes

Rajarshi Guha

Background

Visualization

Utilization

Predictive Models

3D models

Chemical spaces

Summary

Varying Fingerprint Methods

BCI 1052 bit

Tanimoto Similarity

Den

sity

0.70 0.75 0.80 0.85 0.90 0.95 1.00

02

46

8MACCS 166 bit

Tanimoto Similarity

Den

sity

0.70 0.75 0.80 0.85 0.90 0.95 1.00

02

46

8

CDK 1024 bit

Tanimoto Similarity

Den

sity

0.6 0.7 0.8 0.9 1.0

02

46

8

I Shorter fingerprints will lead to more “similar” pairs

I Requires a higher cutoff to focus on significant cliffs

Defining & UsingStructure-Activity

Landscapes

Rajarshi Guha

Background

Visualization

Utilization

Predictive Models

3D models

Chemical spaces

Summary

Varying the Similarity Metric

I The similarity metric does not affect the SALI values

Defining & UsingStructure-Activity

Landscapes

Rajarshi Guha

Background

Visualization

Utilization

Predictive Models

3D models

Chemical spaces

Summary

SALI Graphs & Predictive Models

I The graph view allows us to view SAR’s and identifytrends easily

I The aim of a QSAR model is to encode SAR’s

I Traditionally, we consider the quality of a model in termsof RMSE or R2

I But in general, we’re not as interested in RMSE’s as weare in whether the model predicted something as moreactive than something else

I What we want to have is the correct orderingI We assume the model is statistically significant

Defining & UsingStructure-Activity

Landscapes

Rajarshi Guha

Background

Visualization

Utilization

Predictive Models

3D models

Chemical spaces

Summary

SALI Graphs & Predictive Models

Measuring model quality

I A QSAR model should easily encode the “rolling hills”

I A good model captures the most significant cliffs

I Can be formalized as

How many of the edge orderings of a SALIgraph does the model predict correctly?

I Define S(X ), representing the number of edges correctlypredicted for a SALI network at a threshold X

I Repeat for varying X and obtain the SALI curve

Defining & UsingStructure-Activity

Landscapes

Rajarshi Guha

Background

Visualization

Utilization

Predictive Models

3D models

Chemical spaces

Summary

SALI Graphs & Predictive Models

Measuring model quality

I A QSAR model should easily encode the “rolling hills”

I A good model captures the most significant cliffs

I Can be formalized as

How many of the edge orderings of a SALIgraph does the model predict correctly?

I Define S(X ), representing the number of edges correctlypredicted for a SALI network at a threshold X

I Repeat for varying X and obtain the SALI curve

Defining & UsingStructure-Activity

Landscapes

Rajarshi Guha

Background

Visualization

Utilization

Predictive Models

3D models

Chemical spaces

Summary

SALI Curves - An Example

0.0 0.2 0.4 0.6 0.8 1.0

−1.

0−

0.5

0.0

0.5

1.0

X

S(X

)

3−descriptor5−descriptorScrambled 3−descriptor

Defining & UsingStructure-Activity

Landscapes

Rajarshi Guha

Background

Visualization

Utilization

Predictive Models

3D models

Chemical spaces

Summary

SALI Curves & Model Comparison

I Considered four datasets

I Developed linear regression models, using exhaustivesearch for feature selection

I Identify three models for each datasetI Minimum RMSE (“best”)I Median RMSEI Maximum RMSE (“worst”)

I Generate SALI curves for each model and summarize bydataset

Guha, R.; Van Drie, J.H., J. Chem. Inf. Model., submitted

Defining & UsingStructure-Activity

Landscapes

Rajarshi Guha

Background

Visualization

Utilization

Predictive Models

3D models

Chemical spaces

Summary

SALI Curves & Model Comparison

0.0 0.2 0.4 0.6 0.8 1.0

−1.

0−

0.5

0.0

0.5

1.0

X

S(X

)

Min−RMSEMedian−RMSEMax−RMSE

0.0 0.2 0.4 0.6 0.8 1.0

−1.

0−

0.5

0.0

0.5

1.0

X

S(X

)

Min−RMSEMedian−RMSEMax−RMSE

PDGFR (3) Artemisinin (3)

0.0 0.2 0.4 0.6 0.8 1.0

−1.

0−

0.5

0.0

0.5

1.0

X

S(X

)

Min−RMSEMedian−RMSEMax−RMSE

0.0 0.2 0.4 0.6 0.8 1.0

−1.

0−

0.5

0.0

0.5

1.0

X

S(X

)Min−RMSEMedian−RMSEMax−RMSE

Melanocortin (6) Benzodiazepine (5)

Defining & UsingStructure-Activity

Landscapes

Rajarshi Guha

Background

Visualization

Utilization

Predictive Models

3D models

Chemical spaces

Summary

SALI Curves & Model Comparison

I The initial and final portions ofthe curve are of interest

I It’s also useful to summarizethe whole curve

I We evaluate the area betweenthe curve and the X-axis (SCI)

I -1 ≤ SCI ≤ 1 0.0 0.2 0.4 0.6 0.8 1.0−

1.0

−0.

50.

00.

51.

0

X

S(X

)

SCI = 0.12

SALI Curve Integral

Defining & UsingStructure-Activity

Landscapes

Rajarshi Guha

Background

Visualization

Utilization

Predictive Models

3D models

Chemical spaces

Summary

SALI Curves & Model Comparison

PDGFR Artemisinin Melanocortin Benzodiazepine

SC

I

−0.

10.

00.

10.

20.

30.

40.

50.

6Min−RMSEMedian−RMSEMax−RMSE

Defining & UsingStructure-Activity

Landscapes

Rajarshi Guha

Background

Visualization

Utilization

Predictive Models

3D models

Chemical spaces

Summary

Examining Any Type of Model . . .

I Previous examples make use of predicted values fromQSAR models

I We can consider any “prediction” that is supposed totrack observed activity

I RanksI Energies

I Allows us to apply this approach to any type ofcomputational model that predicts something

I DockingI CoMFAI Pharmacophore

Defining & UsingStructure-Activity

Landscapes

Rajarshi Guha

Background

Visualization

Utilization

Predictive Models

3D models

Chemical spaces

Summary

Docking & CoMFA Models

0.0 0.2 0.4 0.6 0.8 1.0

−1.

0−

0.5

0.0

0.5

1.0

X

S(X

)

SCI = 0.95

0.0 0.2 0.4 0.6 0.8 1.0−

1.0

−0.

50.

00.

51.

0

X

S(X

)SCI = 0.99

Docking CoMFA

I Not surprising that 3D models capture more cliffs

I The CoMFA model is nearly perfect!

Holloway, K. et al, J. Med. Chem., 1995, 38, 305–317

Cavalli, A. et al, J. Med. Chem., 2002, 45, 3844–3853

Defining & UsingStructure-Activity

Landscapes

Rajarshi Guha

Background

Visualization

Utilization

Predictive Models

3D models

Chemical spaces

Summary

Comparing Landscapes

I The SALI curve is a function ofI datasetI descriptor space

I We can quantify a descriptor spaces ability to encodethe structure-activity landscape using SALI graphs

I What is the size of the graph as a function of SALIcutoff?

I The SALI approach allows us to investigate molecularrepresentations that may not be directly accessible

I Work in progress

Defining & UsingStructure-Activity

Landscapes

Rajarshi Guha

Background

Visualization

Utilization

Predictive Models

3D models

Chemical spaces

Summary

Comparing Landscapes

I The SALI curve is a function ofI datasetI descriptor space

I We can quantify a descriptor spaces ability to encodethe structure-activity landscape using SALI graphs

I What is the size of the graph as a function of SALIcutoff?

I The SALI approach allows us to investigate molecularrepresentations that may not be directly accessible

I Work in progress

Defining & UsingStructure-Activity

Landscapes

Rajarshi Guha

Background

Visualization

Utilization

Predictive Models

3D models

Chemical spaces

Summary

Comparing Landscapes

I The SALI curve is a function ofI datasetI descriptor space

I We can quantify a descriptor spaces ability to encodethe structure-activity landscape using SALI graphs

I What is the size of the graph as a function of SALIcutoff?

I The SALI approach allows us to investigate molecularrepresentations that may not be directly accessible

I Work in progress

Defining & UsingStructure-Activity

Landscapes

Rajarshi Guha

Background

Visualization

Utilization

Predictive Models

3D models

Chemical spaces

Summary

Comparing Landscapes

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

SALI Cutoff

Nor

mal

ized

Num

ber

of C

liffs

FingerprintBest 3−DescriptorWorst 3−Descriptor

A Type 2 SALI curve for the PDGFR dataset, comparing 3different molecular representations

Defining & UsingStructure-Activity

Landscapes

Rajarshi Guha

Background

Visualization

Utilization

Predictive Models

3D models

Chemical spaces

Summary

What’s Next?

I SALI graphs and curves represent a framework forexploring structure-∗ landscapes

Open questions

I Weighted SALI graphs (ADMET, synthetic feasibility)

I Is it correct to identify cliffs using fingerprints, and thenpredict cliffs using different descriptors?

I Can we use SALI curves to compare 3D and 2Ddescriptor spaces?

I Can we use SCI for feature selection?

Defining & UsingStructure-Activity

Landscapes

Rajarshi Guha

Background

Visualization

Utilization

Predictive Models

3D models

Chemical spaces

Summary

Conclusions

I The SALI is an effective way to numerically encodeactivity cliffs

I The network view of these values allows us to exploreSAR’s in an intuitive way

I Using the SALI curve allows us to compare predictivemodels in a manner that is intuitive for a medicinalchemist

Defining & UsingStructure-Activity

Landscapes

Rajarshi Guha

Background

Visualization

Utilization

Predictive Models

3D models

Chemical spaces

Summary

Acknowledgments

I John Van Drie

Defining & UsingStructure-Activity

Landscapes

Rajarshi Guha

Background

Visualization

Utilization

Predictive Models

3D models

Chemical spaces

Summary

Defining & UsingStructure-Activity

Landscapes

Rajarshi Guha

Background

Visualization

Utilization

Predictive Models

3D models

Chemical spaces

Summary

Making Use of the SALI Graph

I A little difficult with a non-interactive graph

I We can investigate a series of transformations thatincrease (or decrease) activity

I Identify two types of SAR’sI BroadI DetailedI Depends on what cutoff we choose

I These correspond somewhat to the continuous anddiscontinuous SAR’s described by Peltason et al.

Peltason, L. et al., J. Med. Chem., 2007, 50, 5571–5578

Defining & UsingStructure-Activity

Landscapes

Rajarshi Guha

Background

Visualization

Utilization

Predictive Models

3D models

Chemical spaces

Summary

Glucocorticoid Inhibitors

06−20 06−16

06−30 07−12

07−17

06−42 06−24 06−44 07−30

07−38

07−4106−21 06−32 06−41

06−43 07−13 07−14

07−19 07−20 07−23

07−35 07−36 07−37 07−39 07−42

I 62 dihydroquinoline derivatives

I IC50’s reported, some values were censored

I 50% SALI graph generated using 1052 bit BCIfingerprints

Takahashi, H. et al, Bioorg. Med. Chem. Lett., 2007, 17, 5091–5095

Defining & UsingStructure-Activity

Landscapes

Rajarshi Guha

Background

Visualization

Utilization

Predictive Models

3D models

Chemical spaces

Summary

Glucocorticoid Inhibitors

I Moving from ally or phenylethylto ethyl causes a 6-fold increasein activity

I Reducing bulk at this positionseems to improve activity

I Pretty broad conclusion

I But ethyl is not much smallerthan allyl

I We need more detail

NH

O

O

07-20, 2000 nM

NH

O

O

07-23, 2000 nM

NH

O

O

07-17, 355 nM

Defining & UsingStructure-Activity

Landscapes

Rajarshi Guha

Background

Visualization

Utilization

Predictive Models

3D models

Chemical spaces

Summary

Glucocorticoid Inhibitors

●06−20

●06−23

●06−16 ●06−36

●06−40●06−41 ●06−30 ●06−31 ●07−12●07−15

●07−17

●07−18

●07−19 ●07−20 ●07−23

●07−22●06−42●06−24

●06−44

●07−30●07−38

●07−41●06−21 ●06−32 ●06−37●06−39

●06−43 ●07−13 ●07−14●07−21●07−25●07−26 ●07−27 ●07−29

●07−35●07−36

●07−37

●07−39

●07−42

I Generated using a 30% cutoff

Defining & UsingStructure-Activity

Landscapes

Rajarshi Guha

Background

Visualization

Utilization

Predictive Models

3D models

Chemical spaces

Summary

Glucocorticoid Inhibitors

●06−20

●06−23

●06−16 ●06−36

●06−40●06−41 ●06−30 ●06−31 ●07−12●07−15

●07−17

●07−18

●07−19 ●07−20 ●07−23

●07−22●06−42●06−24

●06−44

●07−30●07−38

●07−41●06−21 ●06−32 ●06−37●06−39

●06−43 ●07−13 ●07−14●07−21●07−25●07−26 ●07−27 ●07−29

●07−35●07−36

●07−37

●07−39

●07−42

I Generated using a 30% cutoff

Defining & UsingStructure-Activity

Landscapes

Rajarshi Guha

Background

Visualization

Utilization

Predictive Models

3D models

Chemical spaces

Summary

Glucocorticoid Inhibitors

NH

O

O

07-15, 2000 nM

I Suggests that electron density isalso important

I Lower π density possibly correlatesto increased activity

I Confirmed by 07-23 → 07-18

I 07-15 → 07-17 is interesting sincethe change increases the bulk

NH

O

O

07-20, 2000 nM

NH

O

O

07-18, 710 nM

NH

O

O

07-17, 355 nM

Defining & UsingStructure-Activity

Landscapes

Rajarshi Guha

Background

Visualization

Utilization

Predictive Models

3D models

Chemical spaces

Summary

Glucocorticoid Inhibitors

I These observations match those made by Takahashiet al.

I More detailed graphs exhibit longer paths that focus onthe bulk of side chains at the C4-α position

I A number of paths consider changes to the epoxidesubstitution

I Usually of length 1I Highlights the fact that bulk at the C4 α has greater

impact on activity than epoxide substitutions

I The SALI graph stresses the non-linearity of the SAR

Defining & UsingStructure-Activity

Landscapes

Rajarshi Guha

Background

Visualization

Utilization

Predictive Models

3D models

Chemical spaces

Summary

SALI Curves - Control Experiments

0.0 0.2 0.4 0.6 0.8 1.0

−1.

0−

0.5

0.0

0.5

1.0

X

S(X

)

0.0 0.2 0.4 0.6 0.8 1.0

−1.

0−

0.5

0.0

0.5

1.0

X

S(X

)

No noise1050100200

Scrambling

I Scramble the Y-variable andrebuild the model

I Evaluate the SALI curve

I Repeat 50 times and takethe mean of the counts for agiven cutoff

Noise

I Add uniform noise to eachdescriptor, rebuild the model

I We expect little variation inthe plateau