application of chemometric and qsar approaches...

29
Application of Chemometric and QSAR Approaches to Scoring Ligand Receptor Binding Affinity Alexander Tropsha Laboratory for Molecular Modeling University of North Carolina at Chapel Hill Collaborators: C. Breneman, W. Deng, N. Sukumar (RPI); A. Golbraikh, J. Feng, Y.D.Xiao (UNC)

Upload: others

Post on 08-May-2020

13 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Application of Chemometric and QSAR Approaches …acscinf.org/docs/meetings/224nm/presentations/224nm25.pdfApplication of Chemometric and QSAR Approaches to Scoring Ligand Receptor

Application of Chemometric and QSAR Approaches to Scoring Ligand

Receptor Binding Affinity

Alexander TropshaLaboratory for Molecular Modeling

University of North Carolina at Chapel Hill

Collaborators: C. Breneman, W. Deng, N. Sukumar (RPI);A. Golbraikh, J. Feng, Y.D.Xiao (UNC)

Page 2: Application of Chemometric and QSAR Approaches …acscinf.org/docs/meetings/224nm/presentations/224nm25.pdfApplication of Chemometric and QSAR Approaches to Scoring Ligand Receptor

OUTLINE

• Representation and scoring of ligand-receptor interactions in the context of structure-based design

• Chemometric and QSAR Approaches to Scoring– Complementarity between ligands and active sites in

multidimensional descriptor space– QSAR modeling of diverse ligands of different

receptors– Database mining

Page 3: Application of Chemometric and QSAR Approaches …acscinf.org/docs/meetings/224nm/presentations/224nm25.pdfApplication of Chemometric and QSAR Approaches to Scoring Ligand Receptor

Scoring function is the crucial component of structure-based drug

designDatabase Compounds

Complexes

Docking methods

Scoring function

Candidate Compounds

Page 4: Application of Chemometric and QSAR Approaches …acscinf.org/docs/meetings/224nm/presentations/224nm25.pdfApplication of Chemometric and QSAR Approaches to Scoring Ligand Receptor

Molecular Representations and Scoring Functions for Ligand-Receptor Interaction

Chemical similarity, QSAR (QSBR)

Multiple Chemical Descriptors

Statistical, or knowledge-based (PMF, SMoG, BLEEP)

Chemical atom types

Empirical estimates of intermolecular interaction terms (VALIDATE, SCORE)

Molecular mechanics

Force-Field basedMolecular mechanics

Scoring functionRepresentation

Page 5: Application of Chemometric and QSAR Approaches …acscinf.org/docs/meetings/224nm/presentations/224nm25.pdfApplication of Chemometric and QSAR Approaches to Scoring Ligand Receptor

Representation of Ligand-Receptor Interface in Chemometric Space

• Both ligands and active sites are represented independently by multidimensional chemical descriptors

• Chemical similarity measures are used to establish complementarity between ligands and their receptors

• QSAR approaches are used to establish correlations between descriptors of ligands and their binding affinity to diverse receptors

Page 6: Application of Chemometric and QSAR Approaches …acscinf.org/docs/meetings/224nm/presentations/224nm25.pdfApplication of Chemometric and QSAR Approaches to Scoring Ligand Receptor

PMF test set: 68 compounds (Muegge & Martin,

J.Med.Chem.1999,42,791-804)

Serine Proteases: 16Metalloproteases: 15L-arabinose binding protein: 9Endothiapepsin: 11Other proteins: 17

Aspartic Proteases: 18 complexesSerine Proteases: 20Metalloproteases: 22Human Carbonic Anhydrase II (HCA, 19 Complexes) Sugar-Binding Proteins (14 Complexes)Endothiapepsin (11 Complexes)Purine Nucleoside Phosphatase(PNP, 5 Complexes) Other Proteins (10 Complexes)

SMoG2001 test set: 111 compounds (Ishchenko & Shakhnovich

J. Med. Chem. 2002, 45, 2770-2780)

Datasets

Page 7: Application of Chemometric and QSAR Approaches …acscinf.org/docs/meetings/224nm/presentations/224nm25.pdfApplication of Chemometric and QSAR Approaches to Scoring Ligand Receptor

SCORING…BUT NO DOCKING: QSAR ANALYSIS OF LIGANDS TO DIFFERENT RECEPTORS:

MOLECULAR DIVERSITY

Page 8: Application of Chemometric and QSAR Approaches …acscinf.org/docs/meetings/224nm/presentations/224nm25.pdfApplication of Chemometric and QSAR Approaches to Scoring Ligand Receptor

Transferable Atom Equivalents (TAE Descriptors)*

uA library of atomic charge density fragments has been built in a form that allows for the rapid retrieval of the fragments and molecular assembly.

uAtoms in the library correspond to structurally distinct atom types

uAssociated with each atomic charge density fragment in the TAE library are a set of descriptors.

uThe atomic descriptors in the TAE library are calculated ab initio at a high level of theory (HF 6-31+G*)

*e.g., Breneman et al, "Electron Density Modeling of Large Systems Using the Transferable Atom Equivalent Method" Computers & Chemistry, 1995, 19(3), 161.

Page 9: Application of Chemometric and QSAR Approaches …acscinf.org/docs/meetings/224nm/presentations/224nm25.pdfApplication of Chemometric and QSAR Approaches to Scoring Ligand Receptor

Mapping of PMF dataset from high-dimensional to 2D TAE Descriptor Space

using SVD/TNPACK (Xie, Tropsha, Schlick. JCICS, 1999, 2000; 167-177)

Projections Based on Ligand-TAE

-4

-3

-2

-1

0

1

2

3

4

0 2 4 6 8

TNPack1

TN

Pack2

Ki < -45.0 -45.0 < Ki < -35.0-35.0 < Ki < -25.0 Ki > -25.0

Projections Based on Protein-TAE

-2.5-2

-1.5-1

-0.50

0.5

11.5

2

2.53

0 2 4 6 8 10TNPack1

TN

Pa

ck

2

Ki < -45.0 -45.0 < Ki < -35.0

-35.0 < Ki < -25.0 Ki > -25.0

Page 10: Application of Chemometric and QSAR Approaches …acscinf.org/docs/meetings/224nm/presentations/224nm25.pdfApplication of Chemometric and QSAR Approaches to Scoring Ligand Receptor

Algorithm to establish complementarity between ligands and their receptors in multidimensional TAE descriptor

space using variable selection

Calculate pairwise distances in selected subspace based on protein TAE

Randomly Select a Subset of MMvarvar Descriptors

Calculate pairwise distances in selected subspace based on ligand TAE

Calculate r2 in selected descriptor subspace

SA

Build correlations between corresponding pairwise distances

Page 11: Application of Chemometric and QSAR Approaches …acscinf.org/docs/meetings/224nm/presentations/224nm25.pdfApplication of Chemometric and QSAR Approaches to Scoring Ligand Receptor

RR vs. LL Distances in the Entire TAE Space (2016 points)

Ligand-Protein Distance Pairs

y = 1.0771xR2 = 0.5028

0

1

2

3

4

5

6

7

8

0 2 4 6 8

LL-Distances

RR-D

ista

nces

Page 12: Application of Chemometric and QSAR Approaches …acscinf.org/docs/meetings/224nm/presentations/224nm25.pdfApplication of Chemometric and QSAR Approaches to Scoring Ligand Receptor

LL vs RR Distances in Selected Descriptors Space of 20 TAE

descriptors (2016 points)Ligand-Protein Distance Pairs After Variable

Selection (Nselected=10)

y = 1.2244xR2 = 0.7059

0

0.5

1

1.5

2

2.5

0 0.5 1 1.5 2 2.5

LL

RR

Page 13: Application of Chemometric and QSAR Approaches …acscinf.org/docs/meetings/224nm/presentations/224nm25.pdfApplication of Chemometric and QSAR Approaches to Scoring Ligand Receptor

Most frequently occurring TAE descriptors in 35 best models

05

101520253035

EP2PIP

13

SIEPA

2

SIEPA

3

Del(G)N

A5

Del(K)N

A4 Lapl5SIK

A3PIP

11SIK

A2 EP3Fu

k5La

pl6

Del(K)M

ax EP4

PIP6

SIEPA

4PIP

3

Del(G)N

A3

Del(Rho

)NA3

Fre

qu

ency

Page 14: Application of Chemometric and QSAR Approaches …acscinf.org/docs/meetings/224nm/presentations/224nm25.pdfApplication of Chemometric and QSAR Approaches to Scoring Ligand Receptor

Chemical meaning of selected descriptors

• Electrostatic interactions, induced-dipole interactions and hydrogen bonding

• Hydrophobic surface area

Page 15: Application of Chemometric and QSAR Approaches …acscinf.org/docs/meetings/224nm/presentations/224nm25.pdfApplication of Chemometric and QSAR Approaches to Scoring Ligand Receptor

QSAR ANALYSIS OF LIGANDS OF DIFFERENT RECEPTORS

OBJECTIVE:

BUILD QSAR MODELS USING INFORMATION ABOUT LIGAND STRUCTURES ONLY

Page 16: Application of Chemometric and QSAR Approaches …acscinf.org/docs/meetings/224nm/presentations/224nm25.pdfApplication of Chemometric and QSAR Approaches to Scoring Ligand Receptor

QSAR ANALYSIS OF LIGANDS OF DIFFERENT RECEPTORS: COMPUTATIONAL PROCEDURE

59 COMPOUNDS MOLCONNZ DESCRIPTORS

NORMALIZATION:

Xij and are the non-normalized and normalized j-th descriptor values for compound i, Xj,min and Xj,max are the minimum and maximum values for j-thdescriptor

kNN QSAR:FOR EACH TRAINING SETnvar 10 to 50 with step 4 (11 values)10 models for each nvar value

min,max,

min,

jj

jijnij XX

XXX

−=

nijX

SELECTION OF TRAINING AND TEST SETS

MODEL VALIDATION:Prediction of binding energies for the test set compounds. Comparison of predicted and observed binding energies.

Page 17: Application of Chemometric and QSAR Approaches …acscinf.org/docs/meetings/224nm/presentations/224nm25.pdfApplication of Chemometric and QSAR Approaches to Scoring Ligand Receptor

2D Descriptors2D DescriptorsSet of descriptors includes topological features of compounds

Hydrogen - depleted molecular graph and vertex degrees ai

1

1

2

13

Molecular connectivity indices

∑=

−=N

iia

1

5.00 )(χ ∑ −=edges All

5.01 )(21 ii aaχ ∑ −− =

subgraphs edge- 1)-(n All

5.01 )...(21 viii

n aaaχ

EXAMPLE: Molecular connectivity indices

Molecular shape indices

Wiener number

Information indices

Platt number

Graph radius and diameter

etc.

The number of descriptors exceeds the number of compounds

Hall, L. H.; Kier, L. B. In Reviews in Computational Chemistry II, VCH, 1991, 367-422.

Page 18: Application of Chemometric and QSAR Approaches …acscinf.org/docs/meetings/224nm/presentations/224nm25.pdfApplication of Chemometric and QSAR Approaches to Scoring Ligand Receptor

VARIABLE SELECTION kNN QSAR

Randomly select a subset of descriptors

Select the best QSAR model for nvar and K

SIM

UL

AT

ED

AN

NE

AL

ING

LEAVE-ONE-OUT CROSS-VALIDATION

Exclude a compound

Predict activity y of the excluded compound as the weighted average of activities of 1 to K nearest neighbors

Calculate the predictive ability (q2) of the “model”

Modify descriptor subset

∑∑

−=

neighborsnearest

neihborsnearest

)exp(

)exp(ˆ

i

ii

d

dyy

∑∑

−−= 2

22

)(

)ˆ(1

i

ii

yy

yyq

Zheng, W. and Tropsha, A. JCICS., 2000; 40; 185-194

Page 19: Application of Chemometric and QSAR Approaches …acscinf.org/docs/meetings/224nm/presentations/224nm25.pdfApplication of Chemometric and QSAR Approaches to Scoring Ligand Receptor

y = 0.9383xR 0

2 = -3.3825

y = 0.3154x + 3.4908R2 = 0.9778

0

2

4

6

8

10

0 2 4 6 8 10

Predicted

Ob

serv

ed

y = 1.0023xR 0

2 = 0.5238

y = 3.1007x - 10.715R

2 = 0.9778

0

2

4

6

8

10

0 2 4 6 8 10

Observed

Pre

dic

ted

∑∑∑

−−

−−=

22 )~~()(

)~~)((

yyyy

yyyyR

ii

ii

yky r ~0 =

2

220 )~~(

)~(1

0

yy

yyR

i

rii

−−=

∑∑

2

220 )(

)~(1'

0

yy

yyR

i

rii

−−

−=∑∑

''~ byay r +=yky r '~ 0 =

PRACTICING SAFE QSAR

Correlation coefficient

2~~

i

ii

yyy

k∑∑=

2)~~()~~)((

yyyyyy

ai

ii

−−−

=∑

yayb ~−=

y = 1.2458x - 1.8812

R2 = 0.8604

y = 0.9796x

R02 = 0.8209

3

5

7

9

3 5 7 9

Predicted

Ob

serv

ed

2)(

)~~)(('

yy

yyyya

i

ii

−−−

=∑

yayb '~' −=2

~'

i

ii

y

yyk

∑∑=

Coefficients of determinationRegressionRegression through

the origin

CRITERIA

22'0

20

'

22

or ;0.1or

;6.0;5.0

RRRkk

Rq

≈≈

>>

19

Page 20: Application of Chemometric and QSAR Approaches …acscinf.org/docs/meetings/224nm/presentations/224nm25.pdfApplication of Chemometric and QSAR Approaches to Scoring Ligand Receptor

QSAR ANALYSIS OF PMF DATASET: RESULTS

59 COMPOUNDS

TRAINING SET: 50TEST SET: 9

TRAINING SET: 46TEST SET: 13

TRAINING SET: 39TEST SET: 20

q2=0.737R2

pred=0.850R0

2=0.847k=1.137

S.E.=0.479F=39.58

q2=0.510R2

pred=0.718R0

2=0.716k=0.850

S.E.=0.653F=45.74

q2=0.584R2

pred=0.757R0

2=0.743k=1.108

S.E.=0.509F=34.22

Page 21: Application of Chemometric and QSAR Approaches …acscinf.org/docs/meetings/224nm/presentations/224nm25.pdfApplication of Chemometric and QSAR Approaches to Scoring Ligand Receptor

QSAR ANALYSIS OF LIGANDS OF DIFFERENT RECEPTORS: OBSERVED VS. PREDICTED BINDING ENERGIES

TRAINING SET: 50TEST SET: 9

TRAINING SET: 46TEST SET: 13

y = 1.0183xR2 = 0.7393

y = 0.8551xR2 = 0.841

y = 1.0198x + 0.0637R2 = 0.7393

y = 0.7835x - 2.914R2 = 0.8497

-80

-60

-40

-20

0

-80 -60 -40 -20 0

Predicted

Obs

erve

d

y = 1.0391xR2 = 0.5978

y = 0.8586x - 7.3812R2 = 0.6287

y = 1.1078xR2 = 0.7436

y = 0.9911x - 3.8447R2 = 0.7568

-80

-60

-40

-20

0

-80 -60 -40 -20 0

Predicted

Obs

erve

d

Page 22: Application of Chemometric and QSAR Approaches …acscinf.org/docs/meetings/224nm/presentations/224nm25.pdfApplication of Chemometric and QSAR Approaches to Scoring Ligand Receptor

QSAR ANALYSIS OF LIGANDS TO DIFFERENT RECEPTORS

111 RECEPTOR-LIGAND COMPLEXES

111 LIGANDS

ELIMINATION OF IDENTICAL COMPOUNDS

95 COMPOUNDS

Page 23: Application of Chemometric and QSAR Approaches …acscinf.org/docs/meetings/224nm/presentations/224nm25.pdfApplication of Chemometric and QSAR Approaches to Scoring Ligand Receptor

TRAINING SET: 88TEST SET: 7

TRAINING SET: 67TEST SET: 28

TRAINING SET: 44TEST SET: 51

QSAR ANALYSIS OF SMoG2001 dataset: BEWARE OF

q2!

0.2

0.4

0.6

0.8

1

0.5 0.6 0.7 0.8

q2

R2

0.1

0.3

0.5

0.7

0.9

0.4 0.5 0.6 0.7 0.8

q2

R2

0.2

0.4

0.6

0.8

0.4 0.5 0.6 0.7

q2

R2

Page 24: Application of Chemometric and QSAR Approaches …acscinf.org/docs/meetings/224nm/presentations/224nm25.pdfApplication of Chemometric and QSAR Approaches to Scoring Ligand Receptor

QSAR ANALYSIS OF LIGANDS TO DIFFERENT RECEPTORS: RESULTS

95 COMPOUNDS

64.31.750.960.610.610.5043526

72.12.930.970.660.670.5735605

70.72.040.970.570.570.5451447

61.42.830.950.680.700.6728674

88.31.980.970.780.820.6021743

57.51.520.950.800.840.6313822

122.80.500.950.960.960.677881

Fs2kR02R2q2Test setTraining setN

Page 25: Application of Chemometric and QSAR Approaches …acscinf.org/docs/meetings/224nm/presentations/224nm25.pdfApplication of Chemometric and QSAR Approaches to Scoring Ligand Receptor

QSAR ANALYSIS OF LIGANDS TO DIFFERENT RECEPTORS: OBSERVED VS. PREDICTED BINDING ENERGIES

TRAINING SET: 88TEST SET: 7

TRAINING SET: 67TEST SET: 28

y = 1.2418x + 1.9386R2 = 0.6938

y = 0.9979xR2 = 0.6662

y = 1.1454x + 1.5642R2 = 0.7024

y = 0.9467xR0

2 = 0.6794

-16

-14

-12

-10

-8

-6

-4

-2

0

-14 -12 -10 -8 -6 -4 -2 0

Predicted log(Kd)

Ob

se

rve

d l

og

(Kd)

y = 0.9733x - 0.1802R2 = 0.673

y = 1.0051x + 0.3972R2 = 0.9609

y = 0.9528xR0

2 = 0.9577

y = 0.9958xR0

2 = 0.6726

-16

-14

-12

-10

-8

-6

-4

-2

0

-14 -12 -10 -8 -6 -4 -2 0

Predicted log(Kd)

Ob

se

rve

d l

og

(Kd)

Page 26: Application of Chemometric and QSAR Approaches …acscinf.org/docs/meetings/224nm/presentations/224nm25.pdfApplication of Chemometric and QSAR Approaches to Scoring Ligand Receptor

TRAINING SET: 44TEST SET: 51

QSAR ANALYSIS OF LIGANDS TO DIFFERENT RECEPTORS: OBSERVED VS. PREDICTED BINDING ENERGIES

y = 1.123x + 0.8693R2 = 0.5464

y = 1.0115xR0

2 = 0.5408

y = 0.8957x - 0.5852R2 = 0.5731

y = 0.9691xR0

2 = 0.5689

-16

-14

-12

-10

-8

-6

-4

-2

0

-14 -12 -10 -8 -6 -4 -2 0

Predicted log(Kd)

Obs

erve

d lo

g(K

d)

Page 27: Application of Chemometric and QSAR Approaches …acscinf.org/docs/meetings/224nm/presentations/224nm25.pdfApplication of Chemometric and QSAR Approaches to Scoring Ligand Receptor

Activity randomization to assess model robustness

Struc.1

Struc.2

Struc.n

..

..

..

..

..

..

Pro.1

Struc.3............

Pro.2

Pro.3

Pro.n

Struc.1

Struc.2

Struc.n

..

..

..

..

..

..

Pro.1

Struc.3

..

..

..

..

..

..

Pro.2

Pro.3

Pro.n

None of the random models was acceptable

Page 28: Application of Chemometric and QSAR Approaches …acscinf.org/docs/meetings/224nm/presentations/224nm25.pdfApplication of Chemometric and QSAR Approaches to Scoring Ligand Receptor

CONCLUSIONS• Chemometric and QSAR approaches to representation

and scoring of protein ligand interaction provide an important supplement to existing structure based methods

• Based on the description of active site atoms one can search for complementary ligands in chemical databases

• Ligand affinity can be predicted from multi-target QSAR models of ligands in the absence of the receptor structure

• (Optimal) binding free energy is defined by ligand structure alone?

Page 29: Application of Chemometric and QSAR Approaches …acscinf.org/docs/meetings/224nm/presentations/224nm25.pdfApplication of Chemometric and QSAR Approaches to Scoring Ligand Receptor

ACKNOWLEDGMENTS• UNC associates

Former:– Weifan ZHENG– Sung Jin CHO– Xin Chen– Stephen CAMMER

• Collaborators:– C. Breneman (RPI)– D. Bonchev (Texas A&M)– R. Hormann (R&H)– C. Reynolds

R&HàOrtho)– V. Gombar (GSK)– E. Gilford (Pfizer)

• Funding– NIH– Millenium– Rohm & Haas– Ortho McNail– GSK– Pfizer

QSAR group:

– Alex GOLBRAIKH

– Yun-De XIAO– Min SHEN– Scott OLOFF– Yuanyuan QIAO

Protein folding group:

–John GRIER

–Jun FENG–Shuxing ZHANG

–David BOSTICK

–Bala KRISHNAMOORTHY

–Ruchir SHAH

–Sagar KHARE