
European Journal of Medicinal Chemistry 45 (2010) 2000–2009

Contents lists available at ScienceDirect

European Journal of Medicinal Chemistry

journal homepage: http://www.elsevier.com/locate/ejmech

Original article

Assessing the chemical diversity of an hsp90 database

Davide Audisio a,b,*, Samir Messaoudi a, Ismail Ijjaali b, Elodie Dubus b, François Petitet b, Jean-François Peyrat a, Jean-Daniel Brion a, Mouad Alami a,*

a Univ Paris-Sud, CNRS, BioCIS UMR 8076, Laboratoire de Chimie Therapeutique, Faculte de Pharmacie, 5 rue J.-B. Clement, Chatenay-Malabry F-92296, France
b Aureus Pharma, 174 Quai de Jemmapes, Paris F-75010, France

Article info

Article history:
Received 19 October 2009
Received in revised form 18 January 2010
Accepted 20 January 2010
Available online 28 January 2010

Keywords:
Heat shock protein 90
hsp90 inhibitors
Molecular descriptors
Chemical fragments
Chemical diversity

Abbreviations: Akt, protein kinase B (also PKB); Her-2, human epidermal growth factor receptor 2 (also ErbB-2); c-Met, hepatocyte growth factor receptor (HGFR); Flt3, FMS-like receptor tyrosine kinase (also CD135); Raf1, a proto-oncogene serine/threonine protein kinase; Bcr–Abl, an oncogene fusion protein consisting of Bcr and Abl.

* Corresponding authors. Univ Paris-Sud, CNRS, BioCIS UMR 8076, Laboratoire de Chimie Therapeutique, Faculte de Pharmacie, 5 rue J.-B. Clement, Chatenay-Malabry F-92296, France. Tel.: +33 1 46 83 58 87/53 92; fax: +33 1 46 83 58 28.

E-mail addresses: [email protected] (D. Audisio), mouad.alami@u-psud.fr (M. Alami).

0223-5234/$ – see front matter © 2010 Elsevier Masson SAS. All rights reserved.
doi:10.1016/j.ejmech.2010.01.048

Abstract

The 90-kDa heat shock protein (hsp90) has emerged as a new, promising target for cancer drug discovery. Because it simultaneously disrupts a large range of oncogenic pathways, hsp90 inhibition results in either cytostasis or cell death. Diverse inhibitors of this molecular chaperone are currently under intensive study, and several have reached clinical trials. In the present work, patented and published structure–activity relationships on hsp90 inhibitors were organised in a database format that associates chemical structures with their biological activities.

This hsp90 database contains 814 unique structures and, to our knowledge, is the most complete ever reported. To provide a general overview and evaluation of the chemical diversity of the ligands included in the dataset, a two-dimensional analysis was performed. A set of twenty-five topological molecular descriptors was calculated, highlighting those of greatest importance for hsp90 active compounds and for the three chemical scaffold families: geldanamycins, purines and pyrazole–isoxazoles. We used principal-component analysis (PCA) to analyse the 2D descriptor space of active and non-active hsp90 ligands. Furthermore, a fragment-based study highlighted the moieties most frequently represented in the active purine and pyrazole–isoxazole derivatives, which are likely to be responsible for the observed biological activities.

© 2010 Elsevier Masson SAS. All rights reserved.

1. Introduction

Heat shock protein 90 is a molecular chaperone that plays a key role in protein regulation in cells: it protects proteins against aggregation, assists the refolding of damaged proteins, facilitates the folding of nascent proteins and, in the case of acute misfolding and aggregation, targets proteins for degradation by the proteasomal pathway [1]. In the past few years, the number of known client proteins chaperoned by hsp90 has grown rapidly, and more than 200 examples are now known [2]. A number of them are oncoproteins, including the kinases Akt, Her-2, c-Met, Flt3, Raf1 and Bcr-Abl, steroid hormone receptors, and mutated forms of the tumour suppressor p53 [2,3]. Interestingly, hsp90 client proteins are involved in all six hallmarks of cancer, as defined by Hanahan and Weinberg [4]. Moreover, cancer cells have been observed to be significantly more sensitive to chaperone inhibitors, and hsp90 is present in these cells in an activated, high-affinity conformation [5]. These features make hsp90 an exciting target in cancer drug discovery [6].

Under non-stress conditions, the quaternary structure of hsp90 is well established to be a dimeric complex [7]. Each monomer consists of three well-conserved structural domains: (i) the N-terminal domain (NTD), involved in nucleotide and inhibitor binding (e.g. geldanamycin, radicicol, purines, pyrazole–isoxazoles and their derivatives); (ii) the middle domain (MD), involved in the binding of both co-chaperones and client proteins; and (iii) the C-terminal domain (CTD), implicated in dimerisation [7]. The CTD is suspected to contain a second nucleotide-binding site, which is inhibited by novobiocin, but its crystal structure has yet to be solved [8].

Two natural products, geldanamycin (GA) and radicicol (RD), were the first inhibitors of the chaperone to be studied (Fig. 1), but liver toxicity for GA and limited in vivo stability for RD have precluded their use as drugs [9,10]. Two promising derivatives of GA, 17-allylamino-17-demethoxygeldanamycin (17-AAG) and 17-(2-dimethylamino)ethylamino-17-demethoxygeldanamycin (17-DMAG), are currently in clinical trials [11]. Furthermore, several new classes of hsp90 inhibitors have been developed, including the rationally designed purine family [12] (see BIIB021, Fig. 1) and the pyrazole–isoxazole analogues identified by high-throughput screening [13]. Novobiocin and coumarin scaffold analogues, such as 4-TCNA (Fig. 1), instead inhibit a second putative ATP binding site located in the carboxyl-terminal domain [14].

Fig. 1. Structures of known hsp90 inhibitors: geldanamycin (R = -OCH3), 17-AAG (R = -NHCH2CH=CH2), 17-DMAG (R = -NHCH2CH2N(CH3)2), radicicol, BIIB021, VER-49009 and 4-TCNA.

Management systems that gather and organise biological and chemical data are valuable tools that help to analyse and better comprehend pharmacological properties and the complex relationships that exist between chemical structures and biological activities [15–17]. Analyses of such databases may increase the ability to identify promising new therapeutic compounds.

To better evaluate the chemical diversity of ligands reported in the scientific literature, including patents, as active against the hsp90 chaperone, a database containing 814 unique structures was constructed. Two recent papers reported analyses of smaller hsp90 databases, containing 129 [18] and 187 [19] molecules, respectively.

In this paper, we present a study of the most complete hsp90 database (814 structures), with the aim of providing a general overview and a useful evaluation of the chemical diversity of ligands associated with this biological target, using various computational approaches.

Fig. 2. Radar view of the three hsp90 datasets: IC50 < 300 nM (blue area), IC50 between 5 and 25 µM (red area), IC50 > 80 µM (green area). For each dataset, 25 topological molecular descriptors were analysed, and median values are represented. For a larger radar figure and exact values, see the supplementary content section. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Radar axes: Mol. Weight/100, Polar surface area/10, LogP, LogD, Acceptor count, Donor count, Rotatable bond count, Polarizability/10, Refractivity/15, Aliphatic atom count/3, Aliphatic bond count/3, Aliphatic ring count, Aromatic atom count/3, Aromatic bond count/3, Aromatic ring count, Fused aliphatic ring count, Fused aromatic ring count, Fused ring count, Heteroaromatic ring count, Hetero ring count, Largest ring size, Ring atom count/3, Ring bond count/3, Ring count, Smallest ring size.

Fig. 3. Radar views of three classes of hsp90 inhibitors: purine family (IC50 < 1 µM, top); pyrazole–isoxazole scaffold (IC50 < 10 µM, middle); geldanamycin analogues (IC50 < 300 nM, bottom). Median values are represented for each molecular descriptor. For larger radar figures and exact values, see the supplementary content section.

Fig. 4. Radar view of PU (IC50 < 1 µM, blue outline) and PY (IC50 < 10 µM, red outline) scaffolds. For better visualisation, the radars have been re-scaled and superimposed. Green and yellow areas highlight the main differing descriptors. Median values are represented for each molecular descriptor. For exact values, see the supplementary content section. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)


2. Results and discussion

The hsp90 chemical space was explored using three different analyses: two-dimensional molecular descriptors (with a radar view representation), molecular descriptor-based principal-component analysis (PCA) and a fragment-based approach.

2.1. Two-dimensional molecular descriptors

“Chemical space” is a term often used in place of “multi-dimensional descriptor space”: a region, defined by a particular choice of descriptors, that encompasses all the possible small organic molecules that could be mapped onto the coordinates of this multi-dimensional space [20,21]. This concept is closely related to the notion of chemical diversity. Small organic molecules come in all shapes and sizes and can be characterised by a wide range of “descriptors”, such as molecular mass, lipophilicity or topological features. The diversity of a chemical library is a quantitative description of how different its compounds are from each other [22], with similar molecules falling in the same region of chemical space.

In this work, two-dimensional molecular descriptors were used to encode the chemical space, and analyses were performed using ChemAxon’s applications [23]. Before assessing the chemical diversity, biological activity thresholds were defined and active or inactive categories were assigned to the compounds under study. Categories differed in their inhibition threshold values, which were expressed mostly as IC50 values. To analyse active compounds, inactive compounds and those with intermediate activity, IC50 values were chosen to define the categories without ambiguity (see Section 4.1, Dataset extraction). Thus, three well-balanced datasets were considered: the first including active ligands with IC50 < 300 nM (130 molecules), the second containing structures with IC50 values between 5 µM and 25 µM (122 molecules), and the third containing inactive compounds with IC50 > 80 µM (121 molecules) [24].

Fig. 5. The 5-chloro-2,4-dihydroxyphenyl motif was present in 64% of the pyrazole–isoxazole active dataset (IC50 < 10 µM). This resorcinolic fragment, which donates two hydrogen bonds, may explain the difference between PUs and PYs found for the hydrogen bond donor count descriptor. (Core shown with X = NH, O.)

Fig. 6. Projection of the hsp90 dataset (red dots, 814 structures) within a literature-based kinase database [26] (grey dots, more than 107,000 structures). Each dot represents a ligand mapped into the coordinates of the chemical space according to the first two PCA axes, computed using twenty-five intuitive topological molecular descriptors. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 7. Projection of hsp90 ligands. Red spheres represent GA analogues with IC50 < 300 nM, blue spheres are PU active ligands (IC50 < 1 µM) and yellow spheres are PY–IS active ligands (IC50 < 10 µM). Each sphere represents an active ligand (activity values referred to the family thresholds) mapped into the coordinates of the chemical space defined by principal components constructed from 2D molecular descriptors. Grey dots are all other structures contained in the hsp90 database that do not belong to the three datasets analysed. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 8. Fifteen most common fragments identified in the hsp90 active dataset (IC50 < 300 nM). The numbers between parentheses indicate the occurrence in the database. Letters are associated with the motifs: A8 (29), B8 (13), C8 (11), D8 (9), E8 (8), F8 (6), G8 (6), H8 (5), I8 (5), J8 (5), K8 (5), L8 (4), M8 (4), N8 (4), O8 (4).

Fig. 9. Most frequent scaffolds for the PU active dataset (IC50 < 1 µM). The numbers between parentheses indicate the number of molecules containing the motif: A9 (33), B9 (13), C9 (7), D9 (6), E9 (6), F9 (5), G9 (4), H9 (3), I9 (3), J9 (2), K9 (1), L9 (1), M9 (1).

Twenty-five intuitive topological molecular descriptors were analysed for the three datasets (see the supplementary content for a complete listing), and their median values are represented in a graphical radar view (Fig. 2).
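The median-based radar profile can be sketched in a few lines of Python. This is an illustration only, assuming descriptor values have already been computed for each compound (the study used ChemAxon’s calculator module; here they are plain dictionaries with hypothetical keys and made-up values). Only four of the twenty-five axes are shown, using the divisors printed on the radar labels (e.g. "Mol. Weight/100", "Polar surface area/10").

```python
from statistics import median

# Axis divisors copied from a few radar labels; the keys are hypothetical
# names, not ChemAxon descriptor identifiers.
RADAR_SCALE = {
    "mol_weight": 100,          # "Mol. Weight/100"
    "polar_surface_area": 10,   # "Polar surface area/10"
    "acceptor_count": 1,        # "Acceptor count"
    "donor_count": 1,           # "Donor count"
}


def radar_profile(compounds):
    """Return the median of each descriptor over one dataset, divided by its axis scale.

    `compounds` is a list of dicts mapping descriptor name -> value.
    """
    return {
        name: median(c[name] for c in compounds) / scale
        for name, scale in RADAR_SCALE.items()
    }


# Hypothetical descriptor values for a three-compound dataset.
actives = [
    {"mol_weight": 420.0, "polar_surface_area": 95.0, "acceptor_count": 7, "donor_count": 1},
    {"mol_weight": 580.0, "polar_surface_area": 160.0, "acceptor_count": 9, "donor_count": 3},
    {"mol_weight": 390.0, "polar_surface_area": 88.0, "acceptor_count": 6, "donor_count": 1},
]
print(radar_profile(actives))
# {'mol_weight': 4.2, 'polar_surface_area': 9.5, 'acceptor_count': 7.0, 'donor_count': 1.0}
```

Dividing by the axis scale only brings descriptors with very different ranges (molecular weight versus ring counts) onto a comparable radius; it does not change how two datasets compare on any single axis.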

For inactive compounds (IC50 > 80 µM) and those with an IC50 ranging from 5 to 25 µM, there was no significant change in molecular descriptor median values. However, important differences were observed in the profile of the active compound dataset (IC50 < 300 nM): for several descriptors, such as molecular weight, polar surface area, hydrogen bond acceptor count, refractivity, aliphatic atom count, aliphatic bond count, ring atom count and ring bond count, higher median values were observed.

Among the 130 most active compounds identified (IC50 < 300 nM), 117 were analogues of geldanamycin and purines, 8 were analogues of radicicol and 5 belonged to the pyrazole family. These four chemical families were of great interest for further analysis of their covered chemical space. Note that radicicol analogues could not be analysed because of the lack of such structures in the database.

Three datasets were created, encompassing geldanamycin analogues (GA) with a threshold of 300 nM (44 structures), purine scaffold (PU) analogues with a threshold of 1 µM (89 structures) and pyrazole–isoxazole scaffold (PY–IS) analogues with a threshold of 10 µM (87 structures) [24]. To balance the three datasets accurately, higher thresholds were used for the PU and PY–IS datasets. Moreover, the pyrazole–isoxazole scaffold was only recently discovered [25], and the number of biologically active compounds was lower than for the GA and PU sets.

As shown in Fig. 3, significant differences between the radar views of the GA and PU/PY–IS datasets were observed. As expected, GA has a different profile with regard to median molecular descriptor values, owing to its complex chemical structure, which has higher molecular weight, polar surface area, aliphatic atom/bond count and ring atom/bond count, and a larger ring size. However, higher similarity was found between the PU and PY–IS datasets. For better visualisation, the radars of these two scaffolds were re-scaled and superimposed (Fig. 4).

Fig. 10. Most frequent side chains for the PU active dataset (IC50 < 1 µM). The numbers between parentheses indicate the number of molecules containing the motif: A10 (17), B10 (12), C10 (10), D10 (8), E10 (8), F10 (5), G10 (4), H10 (4), I10 (4), J10 (4), K10 (3), L10 (3), M10 (3), N10 (3), O10 (2), P10 (2), Q10 (2).

Table 1. Average rate of occurrence for adenine and 7-fluoroadenine across the active (IC50 < 1 µM) and inactive (IC50 > 25 µM) PU datasets.

Entry   Fragment           Active   Inactive
A       Adenine            37%      79%
B       7-Fluoroadenine    7%       19%

Table 2. Average rate of occurrence for the three side chains A, B and C across the active (IC50 < 1 µM) and inactive (IC50 > 25 µM) PU datasets.

Entry   Fragment                                         Active   Inactive
A       Aliphatic chain                                  4%       32%
B       Aliphatic chain                                  3%       18%
C       MeO/OMe/OMe-substituted motif bearing [Cl, Br]   6%       4%

Among the twenty-five molecular descriptors used, only five presented notable differences: the number of H-bond donors, the fused aromatic ring count, the fused ring count, the heteroaromatic ring count and the hetero ring count. Concerning the number of H-bond donors (Fig. 4, green area), PU had a median value of 1 and PY–IS a median value of 3. The difference is explained by the fact that 64% of the PY–IS compounds carry the 5-chloro-2,4-dihydroxyphenyl fragment at position 3 of the pyrazole–isoxazole ring, which adds two H-bond donors per molecule (Fig. 5).

The second main difference concerned the number of fused rings and heteroaromatic rings (Fig. 4, yellow area). These descriptors are directly tied to the chemical structures of the PU and PY–IS scaffolds, as the purine ring is composed of two fused heteroaromatic rings, whereas the pyrazole is a single heteroaromatic ring.

Thus, except for molecular descriptors tightly associated with the structure of the chemical scaffolds, PU and PY–IS active compounds showed a surprising similarity in their radar view representations. For comparison, molecular descriptor median values are reported in the supplementary content section.

Fig. 11. Most frequent side chains for the PU inactive group (IC50 > 25 µM). The numbers between parentheses indicate the number of molecules containing the motif: A11 (31), B11 (27), C11 (15), D11 (8), E11 (5), F11 (4), G11 (3), H11 (3), I11 (3), J11 (3), K11 (2), L11 (2), M11 (2), N11 (2), O11 (2), P11 (2), Q11 (2), R11 (2), S11 (2).

Table 3. Average rate of occurrence for 8 common fragments across the active (IC50 < 10 µM) and inactive (IC50 > 30 µM) PY datasets.

Entry   Fragment (substituents)                        Active   Inactive
D       Cl, HO, OH (5-chloro-2,4-dihydroxyphenyl)      64%      34%
E       OMe                                            29%      11%
F       HO, OH                                         8%       4%
G       Br                                             7%       4%
H       O, O                                           6%       23%
I       Br, HO, OH                                     5%       4%
J       O, O                                           3%       3%
K       OMe, OMe                                       3%       3%


2.2. Principal-component analysis

Another meaningful and accessible way of visualising medicinal chemistry space is principal-component analysis (PCA). PCA is a relatively simple and widely used method for reducing high-dimensional data to a lower-dimensional space, making the data more manageable and comprehensible by extracting the essential information. PCA transforms the original measured variables (i.e. the n molecular descriptors) into new, uncorrelated variables called principal components, each of which is a linear combination of the original variables.
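A minimal, dependency-free sketch of the PCA step follows. It is not the SpotFire workflow used in the study: it z-scores each descriptor column (an assumption; the text does not state how descriptors were scaled), builds the covariance matrix of the standardised data, and extracts the leading components by power iteration with deflation, a simple substitute for the eigendecomposition or SVD a statistics package would normally use. All function names are hypothetical.

```python
import math
import random


def standardize(rows):
    """Z-score each descriptor column so that differently scaled descriptors weigh equally."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1)) for j in range(m)]
    return [[(r[j] - means[j]) / sds[j] if sds[j] > 0 else 0.0 for j in range(m)] for r in rows]


def covariance(z):
    """Covariance matrix of already-centred data (a correlation matrix after z-scoring)."""
    n, m = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(m)] for i in range(m)]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _remove(v, basis):
    """Subtract the projections of v onto previously found orthonormal components."""
    for b in basis:
        d = _dot(v, b)
        v = [x - d * y for x, y in zip(v, b)]
    return v


def principal_components(c, k=2, iters=1000, seed=0):
    """Top-k eigenvalues/eigenvectors of a symmetric PSD matrix (k <= matrix size)."""
    rng = random.Random(seed)
    comps, eigvals = [], []
    for _ in range(k):
        v = _remove([rng.uniform(-1, 1) for _ in c], comps)
        norm = math.sqrt(_dot(v, v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = _remove([_dot(row, v) for row in c], comps)
            norm = math.sqrt(_dot(w, w))
            if norm < 1e-12:  # v already spans a zero-variance direction
                break
            v = [x / norm for x in w]
        eigvals.append(_dot(v, [_dot(row, v) for row in c]))
        comps.append(v)
    return eigvals, comps


def pca_scores(rows, k=2):
    """Project molecules (rows of descriptor values) onto the first k principal components."""
    z = standardize(rows)
    eigvals, comps = principal_components(covariance(z), k)
    return eigvals, [[_dot(r, comp) for comp in comps] for r in z]


# Two perfectly correlated hypothetical descriptors: all variance lies on PC1.
eigvals, scores = pca_scores([[1, 2], [2, 4], [3, 6], [4, 8]])
print(eigvals)
```

Plotting the two score columns against each other gives a map of the kind shown in Figs. 6 and 7, with one point per molecule; the eigenvalues indicate how much of the standardised descriptor variance each axis retains.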

To evaluate the overall distribution of the hsp90 database, we projected its 814 structures (Fig. 6, red dots) onto the broader literature-based kinase database [26], which contains more than 107,000 chemical structures (Fig. 6, grey dots).

Fig. 6 illustrates that hsp90 ligands are widely distributed in the kinase chemical space, reflecting the chemical diversity of the dataset. Fig. 7 gives a more specific view of hsp90 chemical space, with red spheres representing active compounds of the GA family (IC50 < 300 nM), yellow spheres active ligands of the PY–IS family (IC50 < 10 µM) and blue spheres active members of the PU family (IC50 < 1 µM). Grey dots are other hsp90 ligands included in the database but not belonging to the three datasets analysed. Consistent with the 2D molecular descriptor radar view analysis, the GA distribution was shifted from those of PU and PY–IS. The PCA chart (Fig. 7) also confirmed the hypothesis that PU and PY–IS active compounds have quite similar properties.

2.3. Fragment-based approach

The recognition of fragments in small organic molecules is an intuitive process for medicinal chemists, reflecting the manner in which molecules are synthesised from chemical building blocks [27]. Fragmentation studies are very useful tools in the drug discovery process, especially in the lead optimisation phase, where exploration of particular regions of chemical space is required to improve the activity of a lead structure. To identify common motifs in hsp90 inhibitors that are likely to be responsible for biological activity, we employed ChemAxon’s Fragmenter [23], which implements the RECAP algorithm [28].

In a first attempt, we fragmented the dataset of hsp90 active ligands with IC50 < 300 nM (130 molecules); Fig. 8 shows the 15 most represented fragments identified in this group.
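Once every molecule has been fragmented, the ranking itself is a frequency count. A minimal sketch, assuming each molecule is already represented by a list of fragment identifiers (in practice these would be canonical SMILES produced by the fragmenter; the labels below are hypothetical). Each fragment is counted at most once per molecule, matching the "number of molecules containing the motif" wording in the figure captions, and ties are broken alphabetically so the output is deterministic.

```python
from collections import Counter


def most_common_fragments(fragmented_molecules, top_n=15):
    """Rank fragments by the number of molecules containing them.

    A fragment appearing twice in the same molecule is counted once.
    Ties are ordered alphabetically.
    """
    counts = Counter()
    for fragments in fragmented_molecules:
        counts.update(set(fragments))
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


# Hypothetical fragment labels for four molecules.
dataset = [
    ["adenine", "chain_1"],
    ["adenine", "aryl_2", "aryl_2"],
    ["adenine", "aryl_2"],
    ["2-amino-6-chloropurine"],
]
print(most_common_fragments(dataset, top_n=2))
# [('adenine', 3), ('aryl_2', 2)]
```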

Motif A8 (Fig. 8), a 17-N-substituted-17-demethoxygeldanamycin scaffold, was the most frequent fragment. This result seems logical, since the GA analogues were the most represented derivatives in this dataset and their modifications at position 17 [29,30,31] do not affect the biological activities [32]. Moreover, because the RECAP cleavage rules only cleave acyclic bonds, the benzoquinone ansamycin motif was considered as a single entity. The large size of motif A8 makes fragmentation of the GA family less informative. All the other fragments shown in Fig. 8 belong to PU family analogues, represented either by purine scaffold fragments (C8, D8, G8, H8, I8, K8 and O8) or by side chain motifs (B8, E8, F8, J8, L8, M8 and N8).
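The acyclic-bond restriction can be illustrated with a toy fragmenter. This is a simplified sketch, not ChemAxon’s Fragmenter or the full RECAP rule set: molecules are plain atom-label and bond-list graphs, a single hypothetical rule (cut amide bonds between atoms labelled "C=O" and "N") stands in for the RECAP rules, and a candidate bond is only cut when it lies in no ring. The lactam example shows why a ring-embedded bond, like those of the ansamycin macrocycle, leaves the core intact.

```python
from collections import deque


def is_acyclic_bond(adj, u, v):
    """True if bond u-v lies in no ring, i.e. removing it disconnects u from v."""
    seen, queue = {u}, deque([u])
    while queue:
        a = queue.popleft()
        for b in adj[a]:
            if {a, b} == {u, v}:
                continue  # ignore the bond under test
            if b == v:
                return False  # v reached by another path: the bond closes a ring
            if b not in seen:
                seen.add(b)
                queue.append(b)
    return True


def fragment(atoms, bonds, cleavable):
    """Cut every acyclic bond accepted by `cleavable`; return groups of atom indices."""
    adj = {i: set() for i in range(len(atoms))}
    for u, v in bonds:
        adj[u].add(v)
        adj[v].add(u)
    cuts = [(u, v) for u, v in bonds
            if cleavable(atoms[u], atoms[v]) and is_acyclic_bond(adj, u, v)]
    for u, v in cuts:
        adj[u].discard(v)
        adj[v].discard(u)
    fragments, seen = [], set()
    for start in range(len(atoms)):
        if start in seen:
            continue
        seen.add(start)
        component, queue = [], deque([start])
        while queue:  # breadth-first search collects one connected fragment
            a = queue.popleft()
            component.append(a)
            for b in adj[a]:
                if b not in seen:
                    seen.add(b)
                    queue.append(b)
        fragments.append(sorted(component))
    return fragments


def amide_rule(a, b):
    # Hypothetical single rule standing in for RECAP's amide cleavage.
    return {a, b} == {"C=O", "N"}


# Open-chain amide: the C(=O)-N bond is cut.
print(fragment(["C", "C=O", "N", "C"], [(0, 1), (1, 2), (2, 3)], amide_rule))
# [[0, 1], [2, 3]]
# Lactam: the same bond sits inside a ring, so the molecule stays whole.
print(fragment(["C", "C=O", "N", "C", "C", "C"],
               [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)], amide_rule))
# [[0, 1, 2, 3, 4, 5]]
```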

To obtain a more detailed and comprehensive overview, we applied fragmentation analysis to the purine (PU) and pyrazole–isoxazole (PY–IS) families of hsp90 inhibitors. For the PU analogues, we created two datasets: the first including compounds with IC50 < 1 µM (active compounds, 89 structures), and the second including ligands with IC50 > 25 µM (inactive compounds, 84 structures) [24]. For better visualisation, the resulting fragments are shown separately as scaffolds (Fig. 9) and side chains (Fig. 10).

Fig. 9 shows the scaffolds of the PU active dataset (IC50 < 1 µM). Of note is the chemical diversity of the heterocyclic ring, which includes purines, 7-azapurines, 8-aza-7-deazapurines and indazoles. The most frequent scaffold was adenine (33 times), followed by 2-amino-6-chloropurine (13 times).

Fig. 10 shows the common side chains for members of the PU family with IC50 < 1 µM. As already seen in Fig. 8, motif A10 was the most frequent (17 times), followed by B10 (12 times), C10 (10 times) and D10 (8 times).

Next, the PU inactive dataset (IC50 > 25 µM) was analysed in the same way. Concerning the chemical scaffold, we found that two heterocycles, adenine (66 times) and 7-fluoroadenine (16 times), accounted for 98% of the entire dataset, excluding singletons (Table 1). This is an interesting difference from the PU active group, in which 13 different scaffolds were found, with adenine and 7-fluoroadenine at 37% and 7%, respectively.
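As a worked check of how the Table 1 percentages relate to the counts quoted above, the occurrence rate is the number of molecules containing a scaffold divided by the dataset size (89 active and 84 inactive PU structures). The sketch assumes the rates were computed this way and rounded to whole percentages; it reproduces the adenine and inactive 7-fluoroadenine entries.

```python
def occurrence_rate(count, dataset_size):
    """Percentage of molecules in a dataset containing a fragment, rounded to an integer."""
    return round(100 * count / dataset_size)


print(occurrence_rate(33, 89))  # adenine, active PU set -> 37
print(occurrence_rate(66, 84))  # adenine, inactive PU set -> 79
print(occurrence_rate(16, 84))  # 7-fluoroadenine, inactive PU set -> 19
```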

Fig. 11 shows the 19 most common side chains identified for members of the PU inactive group (IC50 > 25 µM). Significant differences were found compared with the active dataset (Fig. 10). Indeed, the active and inactive datasets share only three common motifs: the two aliphatic chains A and B (Table 2), which were mostly represented in the inactive group (32% and 18%, respectively), and fragment C, with comparable frequency in both datasets (6% for the active set and 4% for the inactive set, Table 2).

The same analysis was also performed on the PY–IS family. Two datasets were considered: one with compounds with IC50 < 10 µM (87 structures), and a second containing 100 structures with IC50 > 30 µM [24]. After fragmentation, we found a surprising similarity between the motifs obtained from the two datasets: they shared 8 of their 16 most common fragments. This observation led us to evaluate the “presence rate” of these 8 common fragments in the two groups (Table 3).

Fragments D, E and H (Table 3) showed the largest rate differences between the active and inactive groups. Motifs D and E had occurrences of 64% and 29%, respectively, in the active group, whereas motif H was more represented in the inactive dataset. The other fragments showed similar values across the active and inactive groups. Concerning the chemical scaffold of the pyrazole–isoxazole family, no significant differences were found between the active and inactive groups (data not shown).
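The comparison described above can be sketched as follows: compute per-group presence rates, keep the fragments that appear in both groups’ top-k lists, and sort them by the gap between active and inactive rates. Fragment labels and data are hypothetical; only the procedure mirrors the text, and breaking ties alphabetically is an arbitrary choice made here for determinism.

```python
def presence_rates(fragmented_molecules):
    """Fraction of molecules containing each fragment (counted once per molecule)."""
    counts = {}
    for fragments in fragmented_molecules:
        for f in set(fragments):
            counts[f] = counts.get(f, 0) + 1
    n = len(fragmented_molecules)
    return {f: c / n for f, c in counts.items()}


def top_k(rates, k):
    """The k most frequent fragments (ties broken alphabetically)."""
    ranked = sorted(rates.items(), key=lambda kv: (-kv[1], kv[0]))
    return {f for f, _ in ranked[:k]}


def compare_groups(active, inactive, k=16):
    """Fragments shared by both top-k lists, largest |active - inactive| rate gap first."""
    ra, ri = presence_rates(active), presence_rates(inactive)
    shared = top_k(ra, k) & top_k(ri, k)
    rows = [(f, ra[f], ri[f]) for f in shared]
    return sorted(rows, key=lambda row: (-abs(row[1] - row[2]), row[0]))


# Hypothetical fragmented molecules for two small groups.
active = [["frag_1", "frag_3"], ["frag_1", "frag_2"], ["frag_1"], ["frag_2", "frag_4"]]
inactive = [["frag_4", "frag_3"], ["frag_4"], ["frag_1", "frag_4"], ["frag_2"]]
for name, rate_active, rate_inactive in compare_groups(active, inactive):
    print(f"{name}: {rate_active:.0%} active vs {rate_inactive:.0%} inactive")
```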

3. Conclusions

In this study, we carried out a global exploration of the chemical space of ligands extracted from an hsp90 database that contains chemical structures associated with their respective biological activities. The chemical diversity of the dataset was assessed using 2D molecular descriptors, principal-component analysis and a fragment-based approach. Molecular descriptors were applied to investigate active versus inactive compounds, and three classes of hsp90 inhibitors (geldanamycin, purine and pyrazole–isoxazole derivatives) were evaluated in this analysis. The main structural differences between these classes were highlighted. We showed that the PU and PY–IS families displayed high similarity with regard to their molecular descriptors, in contrast to the GA family. Furthermore, molecular descriptor-based principal-component analysis was performed to visualise and project different clusters into the chemical space, taking into account the related biological data. Finally, rule-based chemical fragmentation was applied to the purine and pyrazole–isoxazole families, allowing us to identify common motifs that are likely to be responsible for biological activities. The information obtained in this study provides a wide overview of the chemical space associated with the hsp90 target and can be used for lead optimisation. Future directions for the present work include investigations of other families of hsp90 inhibitors, such as novobiocin analogues [6,14,33], which bind to a second putative ATP binding site situated in the C-terminal domain of the chaperone, and tetrahydro-4H-carbazol-4-ones, recently developed by Serenex [34,35].

4. Experimental

4.1. Dataset extraction

The training sets used for model generation were extracted from the Aureus Pharma hsp90 database [26]. This database covers biological data published on hsp90 and provides chemical structure information, references to the original publication or patent, and detailed information on experimental conditions.

The hsp90 database contained a total of 896 chemical structures from 73 bibliographic references (50 articles and 23 patents). At the time of this analysis (January 2008 release), a total of 1923 activities were recorded in the database. As our analysis was a two-dimensional approach, duplicate structures differing only in stereochemistry, as well as molecules differing only in their counter-ions, were eliminated, leading to 814 unique molecules.
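Collapsing stereoisomers and salt forms into one 2D structure can be sketched on SMILES strings. This is a deliberately naive, stdlib-only illustration, not the procedure used for the database: it assumes canonical SMILES as input, drops the stereo markers `@`, `/` and `\`, and keeps the longest dot-separated component as the parent (a crude proxy for the largest fragment). A cheminformatics toolkit’s standardiser and canonicaliser would be used in practice; the function names are hypothetical.

```python
def to_2d_parent(smiles):
    """Strip counter-ions and stereo descriptors from a (canonical) SMILES string."""
    parent = max(smiles.split("."), key=len)  # longest component taken as the parent
    for mark in ("@", "/", "\\"):
        parent = parent.replace(mark, "")
    return parent


def unique_2d_structures(smiles_list):
    """Return unique 2D parents, preserving first-seen order."""
    seen, unique = set(), []
    for s in smiles_list:
        p = to_2d_parent(s)
        if p not in seen:
            seen.add(p)
            unique.append(p)
    return unique


# Two enantiomers, a hydrochloride salt of the same compound, and a different molecule.
records = ["C[C@H](N)C(=O)O", "C[C@@H](N)C(=O)O", "C[C@H](N)C(=O)O.Cl", "OC(=O)c1ccccc1"]
print(unique_2d_structures(records))
# ['C[CH](N)C(=O)O', 'OC(=O)c1ccccc1']
```

Because the comparison is purely textual, two different SMILES spellings of the same molecule would survive as separate entries, which is why canonical input is assumed.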

Before analysing the chemical diversity of the datasets, we defined biological activity thresholds and assigned active or inactive categories to the compounds under study. The numbers of actives and inactives differed depending on the inhibition thresholds, which were expressed mainly as IC50 values. To examine the effect of the threshold, exact and modulated IC50 values were meticulously examined to constitute a variety of datasets. As a result, well-balanced datasets were created, especially when active and inactive molecules were compared. Ligands associated with different IC50 values (e.g. on different cancer cell lines), which could not be included in a dataset without ambiguity, were excluded from this study.
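The threshold logic can be sketched as a small classifier. The cut-offs are those quoted in Section 2.1 for the whole database (active < 300 nM, intermediate 5–25 µM, inactive > 80 µM); values between bands are left unassigned. Treating a ligand as ambiguous when its IC50 values fall into different bands is an assumption about what "could not be included without ambiguity" means, as is the inclusive 5–25 µM boundary. Names and the record layout are hypothetical.

```python
NM_PER_UM = 1000.0


def category(ic50_nm):
    """Map one IC50 (in nM) to a dataset label, or None if it falls between bands."""
    if ic50_nm < 300:
        return "active"
    if 5 * NM_PER_UM <= ic50_nm <= 25 * NM_PER_UM:
        return "intermediate"
    if ic50_nm > 80 * NM_PER_UM:
        return "inactive"
    return None


def assign(ic50_values_nm):
    """Assign a ligand to a dataset only if all of its IC50 values agree on one band."""
    labels = {category(v) for v in ic50_values_nm}
    if len(labels) == 1:
        return labels.pop()  # may still be None for a value between bands
    return None  # no data, or values pointing to different bands


ligands = {"lig-1": [85.0], "lig-2": [250.0, 4200.0], "lig-3": [110000.0]}
print({name: assign(values) for name, values in ligands.items()})
# {'lig-1': 'active', 'lig-2': None, 'lig-3': 'inactive'}
```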

The hsp90 database is continuously updated, and readers may contact the corresponding author for additional information.

4.2. Molecular descriptors computations

In this analysis, 2D molecular descriptors were used to encode the chemical space and perform similarity searches. Analyses were performed using ChemAxon’s calculator module [23]. Twenty-five topological molecular descriptors were chosen for their intuitiveness to medicinal chemists; these descriptors encode the topological information of a molecular structure. See the supplementary content for a complete listing of the descriptors.

For principal-component analysis (PCA), variables were computed from the set of twenty-five 2D molecular descriptors mentioned above, using SpotFire visualisation.

For the fragment-based study, ChemAxon’s Fragmenter [23] was employed. Fragmenter cleaves single bonds to generate molecular fragments according to predefined cleavage rules, a method known as RECAP [28]; the cleavage rules correspond to chemical reactions in order to enhance synthetic accessibility. For PU and PY–IS fragmentation, RECAP rules were manually implemented to obtain the desired separation between “scaffolds” and “side chains”.

Acknowledgments

The CNRS is gratefully acknowledged for financial support of this research. We thank the European Union (EU) within the EST network BIOMEDCHEM (MEST-CT-2005-020580) for a Ph.D. grant (to D.A.) and for financial support. Region Ile-de-France is also acknowledged for support. The authors express their appreciation to the ChemAxon team for providing JChem tools and for their helpful support. We thank the Aureus Pharma Knowledge Management team, as well as the IT team, for their valuable help during the preparation of this work.

Appendix. Supplementary data

Supplementary data associated with this article can be found in the online version, at doi:10.1016/j.ejmech.2010.01.048.

References

[1] L.X. Wu, J.H. Xu, K.Z. Zhang, Q. Lin, X.W. Huang, C.X. Wen, Y.Z. Chen, Leukemia22 (2008) 1402–1409.

[2] For a more detailed client protein list see: http://www.picard.ch/downloads/hsp90interactors.pdf.

[3] U. Banerji, Clin. Cancer Res. 15 (2009) 9–14.[4] D. Hanahan, R.A. Weinberg, Cell 100 (2000) 57–70.[5] A. Kamal, L. Thao, J. Sensintaffar, L. Zhang, M.F. Boehm, L.C. Fritz, F.J. Burrows,

Nature 425 (2003) 407–410.[6] S. Messaoudi, J.-F. Peyrat, J.-D. Brion, M. Alami, Anticancer Agents Med. Chem.

8 (2008) 761–782.[7] L.H. Pearl, C. Prodromou, Annu. Rev. Biochem. 75 (2006) 271–294.[8] M.G. Marcu, T.W. Schulte, L. Neckers, J. Natl. Cancer Inst. 92 (2000) 242–248.[9] D.B. Solit, G. Chiosis, Drug Discov. Today 13 (2008) 38–43.

[10] G. Chiosis, A. Rodina, K. Moulick, Anticancer Agents Med. Chem. 6 (2006) 1–8.

[11] T. Taldone, A. Gozman, R. Maharaj, G. Chiosis, Curr. Opin. Pharmacol. 8 (2008) 370–374.

[12] S.R. Kasibhatla, K. Hong, M.A. Biamonte, D.J. Busch, P.L. Karjian, J.L. Sensintaffar, A. Kamal, R.E. Lough, J. Brekken, K. Lundgren, R. Grecko, G.A. Timony, Y. Ran, R. Mansfield, L.C. Fritz, E. Ulm, F.J. Burrows, M.F. Boehm, J. Med. Chem. 50 (2007) 2767–2778.

[13] B.W. Dymock, X. Barril, P.A. Brough, J.E. Cansfield, A. Massey, E. McDonald, R.E. Hubbard, A. Surgenor, S.D. Roughley, P. Webb, P. Workman, L. Wright, M.J. Drysdale, J. Med. Chem. 48 (2005) 4212–4215.

[14] (a) G. Le Bras, C. Radanyi, J.-F. Peyrat, J.-D. Brion, M. Alami, V. Marsaud, B. Stella, J.-M. Renoir, J. Med. Chem. 50 (2007) 6189–6200;
(b) C. Radanyi, G. Le Bras, V. Marsaud, S. Messaoudi, J.-F. Peyrat, J.-D. Brion, M. Alami, J.-M. Renoir, Cancer Lett. 274 (2009) 88–94;
(c) C. Radanyi, G. Le Bras, C. Bouclier, S. Messaoudi, J.-F. Peyrat, J.-D. Brion, M. Alami, J.-M. Renoir, Biochem. Biophys. Res. Commun. 379 (2009) 514–518;
(d) D. Audisio, S. Messaoudi, J.-F. Peyrat, J.-D. Brion, M. Alami, Tetrahedron Lett. 48 (2007) 6928–6932;
(e) S. Messaoudi, D. Audisio, J.-D. Brion, M. Alami, Tetrahedron 63 (2007) 10202–10210;
(f) S. Sahnoun, S. Messaoudi, J.-F. Peyrat, J.-D. Brion, M. Alami, Tetrahedron Lett. 49 (2008) 7279–7283;
(g) S. Sahnoun, S. Messaoudi, J.-D. Brion, M. Alami, Org. Biomol. Chem. 7 (2009) 4271–4278.

[15] T.I. Oprea, A. Tropsha, Drug Discov. Today: Technol. 3 (2006) 357–365.

[16] A.K. Ghose, T. Herbertz, J.M. Salvino, J.P. Mallamo, Drug Discov. Today 11 (2006) 1107–1114.

[17] I. Ijjaali, E. Dubus, E. Bourinet, F. Petitet, Channels 1 (2007) 291–299.

[18] A.J. Knox, T. Price, M. Pawlak, G. Golfis, C.T. Flood, D. Fayne, D.C. Williams, M.J. Meegan, D.G. Lloyd, J. Med. Chem. 52 (2009) 2177–2180.

[19] A. Lauria, M. Ippolito, A.M. Almerico, Comput. Biol. Chem. 33 (2009) 386–390.

[20] C.M. Dobson, Nature 432 (2004) 824–828.

[21] C. Lipinski, A. Hopkins, Nature 432 (2004) 855–861.

[22] B.R. Stockwell, Nature 432 (2004) 846–854.

[23] ChemAxon, Budapest, 1037, Hungary, http://www.chemaxon.com.

[24] Note: thresholds were chosen to obtain well-balanced datasets containing enough structures (i.e. information) for the analysis.

[25] M.G. Rowlands, Y.M. Newbatt, C. Prodromou, L.H. Pearl, P. Workman, W. Aherne, Anal. Biochem. 327 (2004) 176–183.

[26] Aureus Pharma, Paris, 75010, France, http://www.aureus.pharma.com.

[27] J.J. Sutherland, R.E. Higgs, I. Watson, M. Vieth, J. Med. Chem. 51 (2008) 2689–2700.

[28] X.Q. Lewell, D.B. Judd, S.P. Watson, M. Hann, J. Chem. Inf. Comput. Sci. 38 (1998) 511–522.

[29] J.Y. Le Brazidec, A. Kamal, D. Busch, L. Thao, L. Zhang, G. Timony, R. Grecko, K. Trent, R. Lough, T. Salazar, S. Khan, F. Burrows, M.F. Boehm, J. Med. Chem. 47 (2004) 3865–3873.

[30] Z.Q. Tian, Y. Liu, D. Zhang, Z. Wang, S.D. Dong, C.W. Carreras, Y. Zhou, G. Rastelli, D.V. Santi, D.C. Myles, Bioorg. Med. Chem. 12 (2004) 5317–5329.

[31] G. Chiosis, J. Aguirre, C.V. Nicchitta, Bioorg. Med. Chem. Lett. 16 (2006) 3529–3532.

[32] G. Chiosis, B. Lucas, H. Huezo, D. Solit, A. Basso, N. Rosen, Curr. Cancer Drug Targets 3 (2003) 371–376.

[33] A. Donnelly, B.S.J. Blagg, Curr. Med. Chem. 15 (2008) 2702–2717.

[34] T.E. Barta, J.M. Veal, J.W. Rice, J.M. Partridge, R.P. Fadden, W. Ma, M. Jenks, L. Geng, G.J. Hanson, K.H. Huang, A.F. Barabasz, B.E. Foley, J. Otto, S.E. Hall, Bioorg. Med. Chem. Lett. 18 (2008) 3517–3521.

[35] K.H. Huang, J.M. Veal, R.P. Fadden, J.W. Rice, J. Eaves, J.P. Strachan, A.F. Barabasz, B.E. Foley, T.E. Barta, W. Ma, M.A. Silinski, M. Hu, J.M. Partridge, A. Scott, L.G. DuBois, T. Freed, P.M. Steed, A.J. Ommen, E.D. Smith, P.F. Hughes, A.R. Woodward, G.J. Hanson, W.S. McCall, C.J. Markworth, L. Hinkley, M. Jenks, L. Geng, M. Lewis, J. Otto, B. Pronk, K. Verleysen, S.E. Hall, J. Med. Chem. 52 (2009) 4288–4305.
