
Post on 31-Mar-2015


Page 1: Presentation of a Structurally Diverse and Commercially Available Drug Data Set for Correlation and Benchmarking Studies Anders Karlén Uppsala University


Page 2

Aim of study

• Derive a “benchmark data set”
  – Drug-like
  – Physicochemically diverse
  – Commercially available and inexpensive
  – Amenable to analytical measurements

• Start the generation of benchmark data
  – Derive good-quality data from the same lab

Page 3

Possible use of the data set

• General description of drugs
• Developing ADME/TOX filters (permeability, solubility, plasma protein binding etc.)
• To validate novel experimental techniques

Page 4

Generation of a “benchmark” data set based on the list of drugs in Sweden (FASS 2001)

799 cpds
→ Remove compounds
   • Molecular weight >900
   • Polymers, polypeptides
   • Inorganic and metal containing
691 cpds
→ Select commercially available
450 cpds
→ < $800/g
370 cpds
→ Select only oral, nasal, pulmonary, ocular, parenteral and rectal administered drugs
332 cpds
→ Remove “odd” ATC classes, e.g. A01 (Mouth and teeth), A05 (Bile acids), A06 (Laxative)…
284 cpds
→ Exp. design
24-compound data set
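The selection steps on this slide can be read as a chain of filters. The sketch below is only an illustration of that reading: the record fields, the example compounds and the helper name `passes_selection` are hypothetical, and only the three ATC classes named on the slide are excluded.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds and categories taken from the slide; everything else is assumed.
MAX_MOL_WEIGHT = 900.0
MAX_PRICE_PER_GRAM = 800.0
ALLOWED_ROUTES = {"oral", "nasal", "pulmonary", "ocular", "parenteral", "rectal"}
ODD_ATC_PREFIXES = ("A01", "A05", "A06")  # the slide lists more ("...")


@dataclass
class Compound:
    """Hypothetical record layout for one drug substance."""
    name: str
    mol_weight: float
    is_polymer_or_peptide: bool
    contains_metal: bool
    price_per_gram: Optional[float]  # None means not commercially available
    routes: frozenset
    atc_code: str


def passes_selection(c: Compound) -> bool:
    """Apply the slide's filters in order; return True if the compound survives."""
    if c.mol_weight > MAX_MOL_WEIGHT or c.is_polymer_or_peptide or c.contains_metal:
        return False
    if c.price_per_gram is None or c.price_per_gram >= MAX_PRICE_PER_GRAM:
        return False
    if not (c.routes & ALLOWED_ROUTES):
        return False
    return not c.atc_code.startswith(ODD_ATC_PREFIXES)


# Made-up example records, not taken from FASS.
demo = [
    Compound("cpd_ok", 350.0, False, False, 12.0, frozenset({"oral"}), "C03AA01"),
    Compound("cpd_heavy", 1200.0, False, False, 5.0, frozenset({"oral"}), "J01FA01"),
    Compound("cpd_pricey", 300.0, False, False, 950.0, frozenset({"oral"}), "N05AF01"),
    Compound("cpd_topical", 300.0, False, False, 3.0, frozenset({"topical"}), "D07AA01"),
    Compound("cpd_laxative", 300.0, False, False, 3.0, frozenset({"oral"}), "A06AB02"),
]
selected = [c.name for c in demo if passes_selection(c)]
print(selected)
```

The experimental-design step that reduces the filtered list to 24 compounds is not covered by this sketch.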

Page 5

Cost and availability of the 691-compound data set

[Histogram: binned price/gram ($); bins 0.03 - 24.9, 24.9 - 50.2, 50.2 - 79.6, 79.6 - 100, 100 - 995, 995 - 3,228,000]

450 of the 691 compounds can be bought
Price range $0.03/gram - $3,228,000/gram (2001)

Structures shown: Methenamine, Calcitriol

Page 6

Principal component analysis

28 molecular descriptors
• General descriptors
• General hydrogen bonding descriptors
• Hydrogen bond donor descriptors
• Hydrogen bond acceptor descriptors

[Figure: PCA score plot (SIMCA-P 11); directions in the plot are labelled Lipophilicity, Size and Polarity]
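For readers who want to reproduce this kind of analysis without SIMCA-P, the first principal component of an autoscaled descriptor table can be sketched as below. This is a minimal illustration using power iteration on the covariance matrix, not the software's own algorithm. The descriptor values are made up, and the function names are assumptions.

```python
import math


def autoscale(rows):
    """Centre each column and scale it to unit variance (columns must not be constant)."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / (n - 1)) for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def first_component(rows, iterations=200):
    """Return (loadings, scores) of the first principal component via power iteration."""
    x = autoscale(rows)
    n, p = len(x), len(x[0])
    cov = [[sum(r[i] * r[j] for r in x) / (n - 1) for j in range(p)] for i in range(p)]
    v = [1.0 / (i + 1) for i in range(p)]  # asymmetric start vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    scores = [sum(a * b for a, b in zip(row, v)) for row in x]
    return v, scores


# Made-up descriptor table: columns 0 and 1 are perfectly correlated.
demo_rows = [[1, 2, 5], [2, 4, 3], [3, 6, 6], [4, 8, 2], [5, 10, 4]]
loadings, scores = first_component(demo_rows)
print(loadings, scores)
```

Plotting the scores of the first two components against each other would give a t[1]/t[2] plot of the kind shown on the slide; a second component needs an extra deflation step that is left out here.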

Page 7

Principal component analysis

[Figure: PCA score plots, t[2] vs t[1] (SIMCA-P+ 11), coloured by
• MOL_WEIGHT: 0 - 200, 200 - 400, 400 - 600, 600 - 800, 800 - 1000
• MLOGP: -7 to -4, -4 to -1, -1 to 2, 2 to 5, 5 to 8
• PSASAVOL: 0 - 100, 100 - 200, 200 - 300, 300 - 400
Directions in the score plot are labelled Polarity, Size and Lipophilicity]

Page 8

The factorial design: “A face-centered central composite design”

Corner points of the design cube:
+ + -
+ - -
+ - +
- + -
+ + +
- + +
- - +
- - -
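As a rough illustration of the design named on this slide, the points of a three-factor face-centered central composite design can be enumerated as follows. The coding of levels as -1, 0 and +1 and the function name are assumptions; how compounds were assigned to each point is not covered.

```python
from itertools import product


def face_centered_ccd(n_factors=3):
    """Enumerate corner, axial (face-centre) and centre points in coded units."""
    corners = list(product((-1, 1), repeat=n_factors))
    axial = []
    for i in range(n_factors):
        for level in (-1, 1):
            point = [0] * n_factors
            point[i] = level  # one factor at its extreme, the others at the centre
            axial.append(tuple(point))
    center = [(0,) * n_factors]
    return corners, axial, center


corners, axial, center = face_centered_ccd(3)
print(len(corners), len(axial), len(center))
```

For three factors this gives 8 corner points, 6 axial points and 1 centre point; the eight sign patterns listed on the slide are the corner points.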

Page 9

20 protolytes, 4 nonprotolytes


Captopril
Bendroflumethiazide (a)
Glipizide
Levodopa
Levothyroxine
Thiamazole
Amantadine
Sulindac
Amiloride
Carbamazepine
Chlorprothixene
Hydrochlorothiazide
Chlorzoxazone
Prednisone
Tinidazole
Flupenthixol
Metoclopramide
Fenofibrate
Tetracycline
Folic acid
Carisoprodol (a)
Meclizine (a)
Terfenadine (b)
Erythromycin

24-compound data set

The cost of buying the entire data set (at least 1 gram of each compound) is less than $1,500

Page 10

Comparison of the data sets with respect to some common molecular descriptors

               691-compound data set      24-compound data set
Descriptor     Min      Max      Mean     Min      Max      Mean
MW             60       854      347      114      777      349
PSA            0        373      93       8        246      99
logPMor        -6.4     7.6      1.9      -2.0     5.3      1.9
logDACD_6.5    -10.6    12.3     0.74     -5.0     4.8      0.94
HBD            0        19       2.4      0        8        2.7
HBA            0        19       4.9      1        14       4.7

Structures shown:
• Candesartan cilexetil: logPMor = 7.6
• Neomycin: HBD = 19

Page 11

Comparison of the data sets with respect to functional groups

[Bar chart: percent of compounds containing the functional group (axis 0% to 75%). Functional groups: aliphatic q-amine, aliphatic t-amine, aliphatic s-amine, aliphatic p-amine, COOH, benzene, aliphatic OH, aromatic t-amine, aromatic s-amine, aromatic p-amine, aromatic OH, ester, heterocyclic. Legend: 24-set; FASS (druglike only) 691-set]

Page 12

Comparison of the data sets with respect to ATC classes

Distribution in ATC

                      Number of substances    Percent of dataset
ATC  Description      24-set    691-set       24-set    691-set
A    GI               1         69            4.2%      9.99%
B    Blood            0         21            0.0%      3.04%
C    Cardio           2         89            8.3%      12.88%
D    Topical          0         36            0.0%      5.21%
G    Gen.hormones     1         38            4.2%      5.50%
H    Hormones         3         14            12.5%     2.03%
J    Infection        5         89            20.8%     12.88%
L    Tum.,immuno      1         53            4.2%      7.67%
M    Muscle,mov.      3         37            12.5%     5.35%
N    Nervous          6         134           25.0%     19.39%
P    Antiparasite     0         13            0.0%      1.88%
R    Respiration      1         52            4.2%      7.53%
S    Eye,ear          1         24            4.2%      3.47%
V    Various          0         22            0.0%      3.18%

The Anatomical Therapeutic Chemical (ATC) classification system is the most commonly used classification system for drug substances
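The percentage columns follow directly from the counts. The short sketch below recomputes them from the numbers in the table; the variable and function names are illustrative only.

```python
# Counts per ATC main group as (24-set, 691-set), copied from the table.
atc_counts = {
    "A": (1, 69), "B": (0, 21), "C": (2, 89), "D": (0, 36), "G": (1, 38),
    "H": (3, 14), "J": (5, 89), "L": (1, 53), "M": (3, 37), "N": (6, 134),
    "P": (0, 13), "R": (1, 52), "S": (1, 24), "V": (0, 22),
}

total_24 = sum(small for small, _ in atc_counts.values())
total_691 = sum(large for _, large in atc_counts.values())


def percent(count, total):
    """Share of the data set, in percent."""
    return 100.0 * count / total


for code, (small, large) in atc_counts.items():
    print(f"{code}  {percent(small, total_24):5.1f}%  {percent(large, total_691):6.2f}%")
```

The counts sum to 24 and 691, so each percentage column covers its whole data set.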

Page 13

Start the generation of benchmark data. Derive good-quality data from the same lab

1. Measurement of pKa by pH-metric or pH-UV technique (n=20)

2. Measurement of lipophilicity
   (a) pH-metric logP (n=18)
   (b) capacity factors by RP-HPLC (n=21)

3. Measurement of intrinsic and kinetic solubility: pH-metric solubility (CheqSol technique) or shake-plate solubility (n=17)

4. Measurement of permeability across Caco-2 cells, A to B direction (n=22)

Page 14

2. Lipophilicity: pH-metric measurement of logP and logD

[Bar chart: logP (neutral) and logD (pH 7.4), axis from -3.00 to 7.00, for Amantadine, Amiloride, Bendroflumethiazide, Captopril, Chlorprothixene, Chlorzoxazone, Erythromycin, Fenofibrate, Flupenthixol, Glipizide, Hydrochlorothiazide, Levodopa, Levothyroxine, Meclizine, Metoclopramide, Sulindac, Terfenadine, Tetracycline, Thiamazole, Tinidazole]

logP missing for:
• Folic acid
• Carbamazepine
• Prednisone
• Carisoprodol
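The gap between logP (neutral species) and logD at a given pH can be illustrated with the standard textbook relation for a monoprotic compound. The sketch assumes that only the neutral form partitions into octanol, which is not necessarily how the pH-metric values on this slide were derived. The function name and example numbers are hypothetical.

```python
import math


def log_d_monoprotic(log_p, pka, ph, is_acid):
    """logD of a monoprotic acid or base, assuming the ionised form does not partition."""
    exponent = (ph - pka) if is_acid else (pka - ph)
    return log_p - math.log10(1.0 + 10.0 ** exponent)


# Hypothetical acid with logP 2.0 and pKa 4.4, evaluated at pH 7.4.
print(round(log_d_monoprotic(2.0, 4.4, 7.4, is_acid=True), 2))
```

When pH equals pKa, half of the compound is ionised and logD is about 0.3 log units below logP.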

Page 15

2. Lipophilicity: Experimental logP vs calculated logP

[Four scatter plots of calculated logP against experimental logP (logPexp)]
• Crippen logP: R2 = 0.70
• ACD/LogP: R2 = 0.88
• ClogP (BioByte): R2 = 0.89
• Moriguchi logP: R2 = 0.80
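An R2 value of this kind can be computed as the squared Pearson correlation between experimental and calculated logP. The sketch below assumes that definition, which the slide does not state, and uses made-up numbers rather than the measured data.

```python
def r_squared(x, y):
    """Squared Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Hypothetical experimental and calculated logP values for five compounds.
logp_exp = [0.5, 1.2, 2.3, 3.1, 4.8]
logp_calc = [0.9, 1.0, 2.8, 2.9, 4.1]
print(round(r_squared(logp_exp, logp_calc), 2))
```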

Page 16

2. Lipophilicity: Correlation between the measured HPLC capacity factor (k) and pH-metric logD (pH 6.8)

• Compounds from the 8 corner points have different colors
• The 2 compounds at each corner point have the same color
• The axis points are colored black
• Center point pink

R2 = 0.92 (pH = 6.8)

Page 17

3. Solubility: Measurement of intrinsic solubility using CheqSol (24-compound data set)

[Bar chart: Log (µg/mL), axis from -3.0 to 4.0, for Terfenadine, Meclizine, Chlorprothixene, Fenofibrate, Glipizide, Folic acid, Sulindac, Bendroflumethiazide, Levothyroxine, Flupenthixol, Metoclopramide, Carbamazepine, Prednisone, Tetracycline, Hydrochlorothiazide, Chlorzoxazone, Amantadine]

Solubility ranges from 0.009 µg/mL to 2119 µg/mL

Page 18

3. Solubility

http://www.cheqsol.com/download%20files/download01.pdf

19 of the compounds studied are also present in the 691-compound data set

CheqSol solubility ranges from 0.9 µg/mL to 3500 µg/mL in these 19 compounds

Legend: Compound not present in the 691 data set

Columns: Name; Equilibrium solubility (CheqSol, Shake-Flask, Literature); Kinetic solubility (Chaser, non-chaser)

All results in µg/mL

1 Phthalic Acid 5330 5950 8462
2 Quinine 363 201 491 391
3 Trazodone 134.6 138.0 435
4 Nitrofurantoin 112.5 109.5 78.9 319
5 Nortriptyline 27.0 49.3 20.0 27.3
6 Verapamil 48.5 48.5 9.7 47.8
7 Niflumic Acid 9.53 29.5 59
8 Imipramine 17.2 21.7 18.1 17.3
9 Flumequine 34.2 20.7 121
10 Furosemide 19.7 20.4 5.9 96
11 Maprotiline 5.80 8.05 3.49 77
12 Piroxicam 5.92 5.95 3.16 233
13 Warfarin 5.30 5.25 5.60 120
14 Chlorpromazine 2.70 2.41 1.71 2.70
15 Lidocaine 3500 3810 4600
16 Famotidine 740 1100 5900
17 Hydrochlorothiazide 630 700 2400
18 Chlorpheniramine 608.3 615.2 668
19 Sulfamerazine 200.3 203.0 701
20 Ketoprofen 130.6 178.0 336
21 Propranolol 81.0 70.0 340
22 Ibuprofen 50.0 49.0 180
23 Pindolol 41.7 32.7 1424
24 Miconazole 1.00 0.67
25 Diclofenac 0.90 0.80 45
26 Amodiaquin 0.41 8.8
27 Pamoic acid 0.0003 0.019

In the 24-compound data set the solubility ranges from 0.009 µg/mL to 2119 µg/mL

Page 19

24-compound data set is structurally diverse

[Figure: PCA score plots, t[2] vs t[1] (SIMCA-P+ 11). Legend: No class, 19-data set, 24-data set. Directions in the plot are labelled Polarity, Size and Lipophilicity]

Page 20

4. Permeability/absorption

[Figure: log-log plot of human jejunum permeability (x 10^-4 cm/s) at pH 6.5 against Caco-2 permeability (x 10^-6 cm/s) at pH 6.5. Labelled drugs: Furosemide, Hydrochlorothiazide, Atenolol, Cimetidine, Mannitol, Terbutaline, Amoxicillin (C), Lisinopril (C), Metoprolol, Cephalexin (C), Enalapril (C), Propranolol, Phenylalanine (C), Desipramine, Antipyrine, Piroxicam, Verapamil (C), Ketoprofen, Naproxen, D-Glucose (C)]

logY = 0.6532 logX - 0.3036, R2 = 0.7276 (all drugs)
logY = 0.7524 logX - 0.5441, R2 = 0.8492 (passively diffusive)
logY = 0.542 logX + 0.06, R2 = 0.7854 (carrier-mediated)

Sun, D. et al. Comparison of Human and Caco-2 Gene Expression Profiles for 12,000 Genes and the Permeabilities of 26 Drugs in the Human Intestine and Caco-2 Cells. Pharm Res 2002, 19, 1398-1413
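The regression lines quoted from Sun et al. can be applied as in the sketch below. It assumes, following the axis labels, that X is the Caco-2 permeability in units of 10^-6 cm/s and Y the human jejunum permeability in units of 10^-4 cm/s; the function and dictionary names are illustrative.

```python
import math

# (slope, intercept) of logY = slope * logX + intercept, as quoted on the slide.
REGRESSIONS = {
    "all drugs": (0.6532, -0.3036),
    "passively diffusive": (0.7524, -0.5441),
    "carrier-mediated": (0.542, 0.06),
}


def predict_jejunum_permeability(caco2_papp, model="all drugs"):
    """Estimate human jejunum permeability (x 10^-4 cm/s) from Caco-2 Papp (x 10^-6 cm/s)."""
    slope, intercept = REGRESSIONS[model]
    return 10 ** (slope * math.log10(caco2_papp) + intercept)


for papp in (0.1, 1.0, 10.0, 100.0):
    print(papp, round(predict_jejunum_permeability(papp, "passively diffusive"), 3))
```

Because the fits are on log-transformed values, the prediction is a power law in the Caco-2 permeability, and the quoted R2 values indicate a fair amount of scatter around it.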

Page 21

4. Permeability/absorption: In vitro Papp values in human Caco-2 cells

[Figure: Papp values grouped as Low, Medium and High]

Page 22

Suggestions on the “Uppsala diverse data set” usage

• The 24 compounds can be used
  – as a test set for testing already derived models of permeability, lipophilicity, solubility etc.
  – as a validation set for new experimental techniques
  – on its own for building and validating models by dividing it into a training set and a test set
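One simple way to divide the 24 compounds into a training set and a test set is a seeded random split, sketched below. The split size of 18/6, the seed and the function name are assumptions for illustration; a split that respects the experimental design may be preferable in practice.

```python
import random

# The 24 compounds of the data set, as named in this presentation.
COMPOUNDS = [
    "Captopril", "Bendroflumethiazide", "Glipizide", "Levodopa", "Levothyroxine",
    "Thiamazole", "Amantadine", "Sulindac", "Amiloride", "Carbamazepine",
    "Chlorprothixene", "Hydrochlorothiazide", "Chlorzoxazone", "Prednisone",
    "Tinidazole", "Flupenthixol", "Metoclopramide", "Fenofibrate", "Tetracycline",
    "Folic acid", "Carisoprodol", "Meclizine", "Terfenadine", "Erythromycin",
]


def split_train_test(compounds, n_test=6, seed=0):
    """Shuffle reproducibly and return (training set, test set) as sorted lists."""
    rng = random.Random(seed)
    shuffled = list(compounds)
    rng.shuffle(shuffled)
    return sorted(shuffled[n_test:]), sorted(shuffled[:n_test])


train, test = split_train_test(COMPOUNDS)
print(len(train), len(test))
```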

We hope that other groups are willing to help us to supplement the herein-started characterization

“Benchmark data set”

J. Med. Chem.; (ASAP); 2006; 49(23); 6660-6671

Page 23

Acknowledgements

AstraZeneca R&D Mölndal
Susanne Winiwarter, Anna-Lena Ungell, Johan Wernevik, Fredrik Bergström, Leif Engström

Sirius Analytical Instruments Ltd
John Comer, Karl Box, Ruth Allen, Jon Mole

Faculty of Pharmacy, Uppsala University
Christian Sköld, Torbjörn Lundstedt, Anders Hallberg, Hans Lennernäs