building an integrated system for chemistry markup and online publishing integrated to online...

41
Building an integrated system for chemistry markup and online publishing integrated to online chemistry resources

Upload: orcid-0000-0002-2668-4821

Post on 10-May-2015

1.676 views

Category:

Technology


0 download

DESCRIPTION

The extraction of chemical entities from documents such as patents and publications has been pursued for a number of years. We wish to report on ChemMantis, an integrated system for chemistry-based entity extraction and document mark-up enabling access to the rich resource of online chemistry know as ChemSpider. We will discuss the development of the platform from its inception as a series of dictionaries to the integration of an entity extraction algorithm and its expansion to a public deposition and publishing platform for chemistry. Chemistry articles can now be deposited, marked-up and exposed to the public within a few minutes in many cases making it an ideal platform for communicating research and providing integrated access to data sources including PubChem, ChEBI, Wikipedia and Entrez.

TRANSCRIPT

Page 1: Building an integrated system for chemistry markup and online publishing integrated to online chemistry resources

Building an integrated system for chemistry markup and online publishing integrated to online chemistry resources

Page 2: Building an integrated system for chemistry markup and online publishing integrated to online chemistry resources

Electronic Publishing

Publishers know that embracing electronic publication is a must.

“Cell Press and Elsevier have launched a project called Article of the Future … to redefine how the scientific article is presented online. ” Prototype: http://beta.cell.com/

Page 3: Building an integrated system for chemistry markup and online publishing integrated to online chemistry resources

Publishers are experimenting

…invited researchers to prototype tools dealing with the ever-increasing amount of online life sciences information

The winners built: Reflect: Automated Annotation of Scientific

Terms : http://reflect.ws

Page 4: Building an integrated system for chemistry markup and online publishing integrated to online chemistry resources

Reflect

Page 5: Building an integrated system for chemistry markup and online publishing integrated to online chemistry resources

Entity-Extraction, Mark-up, Annotate

Page 6: Building an integrated system for chemistry markup and online publishing integrated to online chemistry resources

Entity-Extraction, Mark-up, Annotate

Page 7: Building an integrated system for chemistry markup and online publishing integrated to online chemistry resources

And linked to STITCH…

Page 8: Building an integrated system for chemistry markup and online publishing integrated to online chemistry resources

Success Depends on Dictionaries

Page 9: Building an integrated system for chemistry markup and online publishing integrated to online chemistry resources

Concept Web Alliance, Knewco, Others

Page 10: Building an integrated system for chemistry markup and online publishing integrated to online chemistry resources

NextBio and ScienceDirect

Page 11: Building an integrated system for chemistry markup and online publishing integrated to online chemistry resources

Semantic Mark-up for Chemistry

Semantic mark-up for chemistry is here

RSC project prospect (structure linking, IUPAC Gold Book ontology and other ontologies

Nature publishing group compound linking

ChemSpider Journal of Chemistry

Page 12: Building an integrated system for chemistry markup and online publishing integrated to online chemistry resources

Nature Chemistry Compound Pages

Page 13: Building an integrated system for chemistry markup and online publishing integrated to online chemistry resources

Project Prospect

Page 14: Building an integrated system for chemistry markup and online publishing integrated to online chemistry resources

ChemSpider and Publishing

The curation efforts on ChemSpider led to a set of validated dictionaries

Integrate best-in-class entity extraction (SureChem) with validated name dictionaries

Additional dictionaries gave reactions, groups, families, hardware and software vendors etc

Page 15: Building an integrated system for chemistry markup and online publishing integrated to online chemistry resources

ChemMantis and CJOC

Page 16: Building an integrated system for chemistry markup and online publishing integrated to online chemistry resources

Name-Structure Pairs

Page 17: Building an integrated system for chemistry markup and online publishing integrated to online chemistry resources

Converting Detected Names…

Names are searched against a validated dictionary (this expands as ChemSpider is curated)

If not found then they are passed through a Name to Structure algorithm

If they cannot convert then ChemSpider is searched for non-validated names

Page 18: Building an integrated system for chemistry markup and online publishing integrated to online chemistry resources

Manual Curation is Necessary

Page 19: Building an integrated system for chemistry markup and online publishing integrated to online chemistry resources

Deposit Structures

Page 20: Building an integrated system for chemistry markup and online publishing integrated to online chemistry resources

Custom Dictionaries

Entity Extraction built around modified algorithms from SureChem

Optimized for “publications”

Dictionaries for chemical entities, groups, reactions, elements, families, species…

Dictionaries can be expanded

Page 21: Building an integrated system for chemistry markup and online publishing integrated to online chemistry resources

Dictionaries are Easily Enhanced

Copy-Paste into appropriate Entity Dictionary

Impacts all future markups

Expanding knowledge bases of information

Page 22: Building an integrated system for chemistry markup and online publishing integrated to online chemistry resources

Build Dictionaries

Page 23: Building an integrated system for chemistry markup and online publishing integrated to online chemistry resources

Species – linked to Wikipedia

Page 24: Building an integrated system for chemistry markup and online publishing integrated to online chemistry resources

Semantic Linking of Structures

What would you want to link off a structure? Chemical suppliers Other publications Analytical Data Related Reactions Wikipedia Patents “Everything”

Page 25: Building an integrated system for chemistry markup and online publishing integrated to online chemistry resources

ChemSpider and its content

ChemSpider is:

A link farm for > 21 million compounds and 200 data sources

A curation platform to improve the quality of data online

A deposition platform for chemicals and content

Page 26: Building an integrated system for chemistry markup and online publishing integrated to online chemistry resources

Link off a structure in ChemSpider

Chemical suppliers Other publications Analytical Data Related Reactions Wikipedia Patents “Everything”

Page 27: Building an integrated system for chemistry markup and online publishing integrated to online chemistry resources

SureChem Services

The SureChem Portal is a gateway for patent searching – can be searched by structure/substructure

ChemSpider previously integrated by depositing structures and linking out

Page 28: Building an integrated system for chemistry markup and online publishing integrated to online chemistry resources

SureChem Services

Previous integration lacked any sense of numbers of patents, titles etc

New integration:

Page 29: Building an integrated system for chemistry markup and online publishing integrated to online chemistry resources

SureChem Services

Page 30: Building an integrated system for chemistry markup and online publishing integrated to online chemistry resources

Pubmed Articles Linked

Page 31: Building an integrated system for chemistry markup and online publishing integrated to online chemistry resources

From Compounds to Syntheses

ChemSpider will support synthesis procedures moving forward

The ChemSpider Journal of Chemistry is an ideal platform for text-based procedures

Page 32: Building an integrated system for chemistry markup and online publishing integrated to online chemistry resources

Org Prep Daily (Blog)

Page 33: Building an integrated system for chemistry markup and online publishing integrated to online chemistry resources

Molbank (Open Access Journal)

Page 34: Building an integrated system for chemistry markup and online publishing integrated to online chemistry resources

Synthetic Pages (Website)

Page 35: Building an integrated system for chemistry markup and online publishing integrated to online chemistry resources

RSC Supplementary Info

Page 36: Building an integrated system for chemistry markup and online publishing integrated to online chemistry resources

RSC Supplementary Info

Page 37: Building an integrated system for chemistry markup and online publishing integrated to online chemistry resources

ChemSpider Synthesis

ChemSpider Synthesis will be a home for all things “synthetic”

An online resource for synthetic procedures from blogs, other online resources, RSC supplementary info, other publishers etc.

We will mine the RSC supplementary info backfile for reactions

Public peer-review and feedback for synthetic procedures

Page 38: Building an integrated system for chemistry markup and online publishing integrated to online chemistry resources

Online Journals and Live Data

Page 39: Building an integrated system for chemistry markup and online publishing integrated to online chemistry resources

Moving forward

Integrate ChemMantis and Project Prospect as appropriate – best of both worlds

Expand ChemSpider validated dictionaries with RSC content

Expand information in “Compound Boxes” after markup – take advantage of all ChemSpider resources

Invite the community to help build ChemSpider Synthesis

Page 40: Building an integrated system for chemistry markup and online publishing integrated to online chemistry resources

Acknowledgments

SureChem (Nicko Goncharoff and Richard Koks)

RSC – Richard Kidd and Colin Batchelor

CJOC – multiple authors and reviewers ChemSpider curation – a cast of many

Page 41: Building an integrated system for chemistry markup and online publishing integrated to online chemistry resources

Thank you

[email protected]: ChemSpidermanwww.chemspider.com/blog