Opening up and connecting antimalarial data: Progress with caveats

www.guidetopharmacology.org Opening up and connecting antimalarial data: Progress with caveats Christopher Southan ACS CINF session: The Growing Impact of Openness in Chemistry: A Symposium in Honour of JC Bradley http://www.slideshare.net/cdsouthan/southan-malaria-acs

Upload: chris-southan

Post on 18-Aug-2015


TRANSCRIPT

1

www.guidetopharmacology.org

Opening up and connecting antimalarial data: Progress with caveats

Christopher Southan

ACS CINF session: The Growing Impact of Openness in Chemistry: A Symposium in Honour of JC Bradley

http://www.slideshare.net/cdsouthan/southan-malaria-acs

2

Abstract

Among JCB's achievements, his work on Open Notebook Science (ONS) has had perhaps the largest impact, and its ripple effect continues to broaden. This is particularly the case in Open Source Drug Discovery (OSDD), where ONS is a natural fit. This presentation will review the “findability” of new antimalarial drug discovery data. While antimalarials are very much a poster child for OSDD, the patterns of result disclosure and the practical extent of openness vary widely. A recent blog post (http://cdsouthan.blogspot.se/2014/06/getting-into-box-with-some-recent.html) describes “digging out” 26 antimalarial leads to add to a new MMV pathogen box. The difficulties associated with this task will be outlined. In particular, examples are still emerging from conventional (i.e. closed) drug discovery operations, even to the extent of finding patent-only lead compounds. Even for the academic groups that do publish papers, examples show the system can be slow and patchy in getting the structures surfaced in database records. This may not happen at all if MeSH curation fails to index the lead compound in PubChem, so curation of the paper is necessary. This slowness contrasts with the Sydney University Open Source Malaria project (OSM, http://opensourcemalaria.org/) with its declared open source principles. It thus comes closest to ONS in that the team and their collaborators endeavour to surface results in close to real time. Technical aspects of extracting the information from open web instantiations will be described, including the use of SMILES, InChI strings and InChIKeys. The InChIKey comes close to a perfect ONS vehicle for chemistry since it makes an explicit chemical structure globally “findable” literally within minutes of being written into a blog post, via a search taking ~0.3 seconds (PMID 23399051).
Because JCB's ideas still need wider implementation, issues around improving connections between papers, patents, database entries, OSM data and potential new box inclusions will be discussed.

3

Introduction

• As we have heard, Jean-Claude Bradley’s (JCB) work on Open Notebook Science (ONS) was a major innovation

• The core revolutionary philosophy is real-time data surfaced on the open web via an Electronic Laboratory Notebook (ELN)

• It has been embraced by Open Source Drug Discovery (OSDD, used here as a generic term not specific to any group)

• The openness is a radical departure from what could be termed Traditional Closed Drug Discovery (TCDD)

• ONS touches several contemporary themes

  • Disclosure of results for others to build on

  • Exposure of detailed protocols

  • Reproducibility (i.e. warts-and-all sharing of positive and negative results)

  • A logical extrapolation of the “open access” publication principle

  • Transparency: knowing what different groups are doing globally

  • Potential to accelerate discovery research by telescoping timelines

4

Origins of Open Notebook Science from 2005

A 2012 page from the JCB lab run through ChemAxon chemicalize.org

5

Antimalarial research and context

• Research progress for all NTDs is crucial, but antimalarial research has become something of a poster child for OSDD

• The boundaries between OSDD and TCDD are blurred

• The majority of current leads have still come through the TCDD route (e.g. many are patented)

• Antimalarials have become a test bed for new approaches (e.g. open data sets from GSK and others, the Medicines for Malaria Venture (MMV) “Malaria Box” of physical compounds, and WIPO Re:Search intellectual property sharing)

• So far, the Sydney Open Source Malaria project is the only ONS instantiation http://opensourcemalaria.org/#

• For context, I have donated small amounts of voluntary support to the OSM team since 2012

• This has focused on chemical structure searching, data organisation and surfacing strategies

• I blog occasionally on the themes of data connectivity in general and for antimalarial leads in particular

• The surfacing of these leads illustrates “shades of openness”, and the problems thereof, particularly well

6

Useful recent review of leads, but

• Link-free zone (except for references)

• PDF “tomb”

• Images for structures

• No systematic chemical descriptions

• No chemical database identifiers

• No target protein database identifiers

• DDD107498 was blinded at that time (no structure)

• I decided to address the problem as a community service

7

Consequently, much effort was needed to get from this to this

http://www.ncbi.nlm.nih.gov/sites/myncbi/christopher.southan.1/collections/48460617/public/

8

Getting compounds out of papers into the Pathogen Box: not easy

http://cdsouthan.blogspot.se/2014/06/getting-into-box-with-some-recent.html

On a good day, MeSH curators will index the lead structures specified in PubMed and connect them to PubChem. On a bad day (as in this case), they may record the name but without any link to a chemical structure.

9

But a little curatorial perspicacity did resolve DDD107498

IUPAC name from supp data > chemicalize.org > PubChem > SureChEMBL > SAR table

10

But it was still a tough job to get 28 antimalarial structures

• The 6 structures not in PubChem are de facto unfindable in open databases but some may get Google InChIKey matches via chemicalize.org cache

• The only systematic identifier encountered was the IUPAC name which often had to be dug out of the supplementary data (i.e. neither SMILES nor InChI in papers or patents)

• No authors made direct database submissions

• The code name was often not a PubChem synonym

• ChEMBL had picked up 16 with data in PubChem BioAssay

• 13 had patent-extraction matches and 11 chemical vendor matches

• The MeSH annotation had only linked two directly to PMIDs
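To put the counts above in proportion, here is a minimal sketch that expresses each one as a share of the 28 structures. Treating 28 as the denominator for every row is an assumption on my part; the slide does not say so explicitly, and the labels are paraphrased from the bullets.

```python
# Counts as reported in the bullets above.
# ASSUMPTION: every count is out of the same 28 structures.
TOTAL = 28
counts = {
    "not in PubChem": 6,
    "in ChEMBL with PubChem BioAssay data": 16,
    "patent-extraction matches": 13,
    "chemical vendor matches": 11,
    "MeSH-linked directly to a PMID": 2,
}


def coverage(counts, total):
    """Return each count as a percentage of the total, rounded to one decimal."""
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


if __name__ == "__main__":
    for label, pct in coverage(counts, TOTAL).items():
        print(f"{label}: {counts[label]}/{TOTAL} = {pct}%")
```

On these assumptions, only about 7% of the leads were connected to a paper through MeSH, while roughly a fifth were absent from PubChem altogether.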

11

Because OSM practices ONS, finding stuff is much easier

12

The entire portfolio is open: even the new designs

Chemicalize.org does open name-to-structure (n2s) conversion on the web pages

13

Googling the InChIKey for global findability

• Direct from the Open Lab Book sheet

• Or from a chemicalize conversion

• Gives exact match instantly

• Works also with inner layer

• Can cross-check from PubChem <> ELN

• Many directly uploaded > ChEMBL then > PubChem BioAssay
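The InChIKey search described on this slide can be sketched in a few lines of Python. This is an illustration only, not the workflow used in the talk. It assumes that "inner layer" means the first 14-character block of the key, which encodes the connectivity skeleton. The helper names are hypothetical, and the example key is aspirin's, chosen because it is widely known rather than because it appears in the OSM data.

```python
import re
from urllib.parse import quote_plus

# Standard InChIKey layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def split_inchikey(key):
    """Split an InChIKey into its three hyphen-separated blocks."""
    key = key.strip().upper()
    if not INCHIKEY_RE.match(key):
        raise ValueError(f"not a well-formed InChIKey: {key!r}")
    connectivity, second, protonation = key.split("-")
    return connectivity, second, protonation


def google_query_url(term):
    """Build a Google search URL for an exact-phrase match on the term."""
    return "https://www.google.com/search?q=" + quote_plus(f'"{term}"')


if __name__ == "__main__":
    key = "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"  # aspirin, illustrative only
    skeleton, _, _ = split_inchikey(key)
    print(google_query_url(key))       # exact-structure search
    print(google_query_url(skeleton))  # looser, skeleton-only search
```

Searching on the first block alone trades precision for recall: it should also retrieve records that share the same skeleton but differ in the details encoded in the second block, such as stereochemistry.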

14

Speed sharing via OSM > Twitter

15

If PubChem is negative, then search the chemicalize.org cache

In this case we get similarity hits to other OSM compounds
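The "PubChem first, then fall back" decision could be scripted along the following lines. This is a sketch under stated assumptions: it uses the PubChem PUG REST InChIKey-to-CID route, deliberately leaves out the network fetch, and treats any response without a CID list as "PubChem negative". The function names are hypothetical, and no chemicalize.org API is assumed, so the fallback is returned only as a text instruction.

```python
import json
from urllib.parse import quote

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def cid_lookup_url(inchikey):
    """URL asking PubChem PUG REST for the CIDs that match an InChIKey."""
    return f"{PUG_REST}/compound/inchikey/{quote(inchikey)}/cids/JSON"


def cids_from_response(body):
    """Extract CIDs from a PUG REST JSON body; an empty list means no match."""
    data = json.loads(body)
    return data.get("IdentifierList", {}).get("CID", [])


def next_step(cids):
    """Decide what to do next, following the rule on the slide."""
    if cids:
        return "cross-check in PubChem"
    return "search the chemicalize.org cache"


if __name__ == "__main__":
    print(cid_lookup_url("BSYNRYMUTXBXSQ-UHFFFAOYSA-N"))
    # Hand-written sample bodies, not captured from a live request.
    print(next_step(cids_from_response('{"IdentifierList": {"CID": [2244]}}')))
    print(next_step(cids_from_response('{"Fault": {"Code": "PUGREST.NotFound"}}')))
```

Keeping the URL builder and the response parser separate from the fetch means the decision logic can be checked offline before it is pointed at the live service.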

16

Rapid triage in PubChem

identity matches 90% similarity

chemicalize download > PubChem upload > search
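For readers unfamiliar with similarity thresholds, here is a toy sketch of the distinction between an identity match and a 90% similarity hit. I am assuming the 90% figure is a Tanimoto threshold on structural fingerprints, as in PubChem's 2D similarity search. The bit sets below are made up for illustration and are not real fingerprints.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient: shared 'on' bits divided by all 'on' bits."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def triage(query_bits, candidate_bits, threshold=0.9):
    """Bin a candidate by fingerprint similarity to the query.

    Identical fingerprints do not prove identical structures, so the top
    bin is labelled by fingerprint rather than by identity.
    """
    score = tanimoto(query_bits, candidate_bits)
    if score == 1.0:
        return "same fingerprint"
    if score >= threshold:
        return "similar"
    return "below threshold"


if __name__ == "__main__":
    query = range(20)      # hypothetical 'on' bit positions
    near = range(19)       # shares 19 of the 20 bits
    print(tanimoto(query, near), triage(query, near))
```

A true identity check would compare full structures, for example by InChIKey, rather than fingerprints.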

18

Conclusions

• Challenges of curating published antimalarial leads were similar to those encountered by the GtoPdb team for human targets and their ligands on a daily basis

• This impedes progress in many ways

• Authors spend little effort on ensuring their leads and SAR are surfaced and connected in databases with a retrievable name

• There are also gaps in reciprocal mappings between leads, targets and pathways

• Journals should step up efforts towards author chemistry mark-up (Nature Chemical Biology being a good example)

• Authors seem peculiarly reluctant to cite even their own patents

• Compared to TCDD, the way Sydney OSM and their collaborators work in the open makes a huge difference in the pace of research

• JCB's pioneering work continues to spread out into the open science community and will extend its impact