eScience Resources for the Chemistry Community from the Royal Society of Chemistry
DESCRIPTION
Our access to scientific information has changed in ways that were hardly imagined even by the early pioneers of the internet. The immense quantities of data and the array of tools available to search and analyze online content continue to expand, and the pace of change does not appear to be slowing. ChemSpider is one of the chemistry community’s primary online public compound databases. Containing tens of millions of chemical compounds and their associated data, ChemSpider serves data to tens of thousands of chemists every day. It also serves as the foundation for many important international projects to integrate chemistry and biology data, facilitate drug discovery efforts and help to identify new chemicals from under the ocean. This presentation will provide an overview of the expanding reach of the ChemSpider platform and the nature of the solutions that it helps to enable. We will also discuss the possibilities it offers in the domain of crowdsourcing and open data sharing. The future of scientific information and communication will be underpinned by these efforts and influenced by increasing participation from the scientific community; the collaboration they facilitate should ultimately accelerate scientific progress.
TRANSCRIPT
![Page 1: eScience Resources for the Chemistry Community from the Royal Society of Chemistry](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554e7e2db4c90545698b517d/html5/thumbnails/1.jpg)
eScience Resources for the Chemistry Community from the Royal Society of Chemistry
Antony Williams
NCSU, College of Textiles
October 2nd 2013
![Page 2: eScience Resources for the Chemistry Community from the Royal Society of Chemistry](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554e7e2db4c90545698b517d/html5/thumbnails/2.jpg)
We Have …Too Much Data!!!
![Page 3: eScience Resources for the Chemistry Community from the Royal Society of Chemistry](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554e7e2db4c90545698b517d/html5/thumbnails/3.jpg)
The World of Online Chemistry
• Property databases
• Compound aggregators
• Screening assay results
• Scientific publications
• Encyclopedic articles (Wikipedia)
• Metabolic pathway databases
• ADME/Tox data – eTOX for example
• Blogs/Wikis and Open Notebook Science
![Page 4: eScience Resources for the Chemistry Community from the Royal Society of Chemistry](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554e7e2db4c90545698b517d/html5/thumbnails/4.jpg)
e-Science and Primary Data
• How much data generated in a lab, that COULD go public, is lost forever?
![Page 5: eScience Resources for the Chemistry Community from the Royal Society of Chemistry](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554e7e2db4c90545698b517d/html5/thumbnails/5.jpg)
e-Science and Primary Data
• How much data generated in a lab, that COULD go public, is lost forever?
• Public Domain reference databases of value?
– Syntheses
– Properties
– Spectra
– CIFs
– Images
![Page 6: eScience Resources for the Chemistry Community from the Royal Society of Chemistry](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554e7e2db4c90545698b517d/html5/thumbnails/6.jpg)
e-Science and Primary Data
• How much data generated in a lab, that COULD go public, is lost forever?
• Public Domain reference databases of value?
– Syntheses
– Properties
– Spectra
– CIFs
– Images
• Much of chemistry is chemical structure-based – where and how could we host these data?
![Page 7: eScience Resources for the Chemistry Community from the Royal Society of Chemistry](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554e7e2db4c90545698b517d/html5/thumbnails/7.jpg)
RSC’s ChemSpider
![Page 8: eScience Resources for the Chemistry Community from the Royal Society of Chemistry](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554e7e2db4c90545698b517d/html5/thumbnails/8.jpg)
ChemSpider
• >29 million unique chemicals from >500 data sources
• Focus on improving data quality, enhancing functionality, integrating and enabling
![Page 9: eScience Resources for the Chemistry Community from the Royal Society of Chemistry](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554e7e2db4c90545698b517d/html5/thumbnails/9.jpg)
Crowdsourced “Annotations”
• Users can add
– Descriptions/Syntheses/Commentaries
– Links to PubMed articles
– Links to articles via DOIs
– Spectral data
– Crystallographic Information Files
– Photos
– MP3 files
– Videos
![Page 10: eScience Resources for the Chemistry Community from the Royal Society of Chemistry](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554e7e2db4c90545698b517d/html5/thumbnails/10.jpg)
![Page 11: eScience Resources for the Chemistry Community from the Royal Society of Chemistry](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554e7e2db4c90545698b517d/html5/thumbnails/11.jpg)
Spectra
![Page 12: eScience Resources for the Chemistry Community from the Royal Society of Chemistry](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554e7e2db4c90545698b517d/html5/thumbnails/12.jpg)
Chemistry Data online are messy
• We have inherited errors
• All public compound databases have errors
• “Incorrect” structures – assertions, timelines etc
• “Incorrect” names associated with structures
• Properties
• Links
• Publications
• ENORMOUS CHALLENGE
![Page 13: eScience Resources for the Chemistry Community from the Royal Society of Chemistry](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554e7e2db4c90545698b517d/html5/thumbnails/13.jpg)
Crowdsourced Curation
• Crowd-sourced curation: identify/tag errors, edit names, synonyms, identify records to deprecate
![Page 14: eScience Resources for the Chemistry Community from the Royal Society of Chemistry](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554e7e2db4c90545698b517d/html5/thumbnails/14.jpg)
Search “Vitamin H”
![Page 15: eScience Resources for the Chemistry Community from the Royal Society of Chemistry](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554e7e2db4c90545698b517d/html5/thumbnails/15.jpg)
“Curate” Identifiers
![Page 16: eScience Resources for the Chemistry Community from the Royal Society of Chemistry](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554e7e2db4c90545698b517d/html5/thumbnails/16.jpg)
“Curate” Identifiers
![Page 17: eScience Resources for the Chemistry Community from the Royal Society of Chemistry](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554e7e2db4c90545698b517d/html5/thumbnails/17.jpg)
“Curate” Identifiers
![Page 18: eScience Resources for the Chemistry Community from the Royal Society of Chemistry](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554e7e2db4c90545698b517d/html5/thumbnails/18.jpg)
Validated Name-Structure Dictionaries
• Chemical name dictionaries are used for:
• Text-mining (publications, patents)
– Used to index PubMed and link to Google Patents
• Linking to other databases – think Biology!
– When structures are not available drug names link
• Searching the web
– Names link to structures link to InChIs
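The name-to-structure-to-InChI chain on this slide can be sketched as a small lookup table. This is a minimal illustration, not ChemSpider's implementation: the dictionary `NAME_TO_INCHI`, the helpers `normalize`, `lookup` and `tag_names`, and the four entries are all assumptions chosen for the example.

```python
import re

# Hypothetical miniature name-structure dictionary. The entries are
# illustrative only and were not taken from ChemSpider.
NAME_TO_INCHI = {
    "water": "InChI=1S/H2O/h1H2",
    "oxidane": "InChI=1S/H2O/h1H2",
    "ethanol": "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3",
    "ethyl alcohol": "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3",
}


def normalize(name: str) -> str:
    """Lower-case a name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def lookup(name: str):
    """Return the InChI string for a name, or None if the name is unknown."""
    return NAME_TO_INCHI.get(normalize(name))


def tag_names(text: str):
    """Return (name, InChI) pairs for every dictionary name found in text."""
    lowered = normalize(text)
    found = []
    for name, inchi in NAME_TO_INCHI.items():
        # Whole-word match so that "ethanol" does not match inside "methanol".
        if re.search(r"\b" + re.escape(name) + r"\b", lowered):
            found.append((name, inchi))
    return found
```

Two synonyms resolving to the same InChI is what lets a text-mined name link to a single structure record; a wrong entry in the dictionary propagates to every document it tags, which is why validation matters.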
![Page 19: eScience Resources for the Chemistry Community from the Royal Society of Chemistry](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554e7e2db4c90545698b517d/html5/thumbnails/19.jpg)
I want to know about “Vincristine”
![Page 20: eScience Resources for the Chemistry Community from the Royal Society of Chemistry](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554e7e2db4c90545698b517d/html5/thumbnails/20.jpg)
Vincristine: Identifiers and Properties
![Page 21: eScience Resources for the Chemistry Community from the Royal Society of Chemistry](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554e7e2db4c90545698b517d/html5/thumbnails/21.jpg)
Vincristine: Vendors and Sources
Linked by Structure
![Page 22: eScience Resources for the Chemistry Community from the Royal Society of Chemistry](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554e7e2db4c90545698b517d/html5/thumbnails/22.jpg)
Vincristine: Patents
Linked by Name
![Page 23: eScience Resources for the Chemistry Community from the Royal Society of Chemistry](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554e7e2db4c90545698b517d/html5/thumbnails/23.jpg)
Vincristine: Articles
Linked by Name
![Page 24: eScience Resources for the Chemistry Community from the Royal Society of Chemistry](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554e7e2db4c90545698b517d/html5/thumbnails/24.jpg)
Semantic Mark-up of Articles
![Page 25: eScience Resources for the Chemistry Community from the Royal Society of Chemistry](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554e7e2db4c90545698b517d/html5/thumbnails/25.jpg)
Linking Names to Structures
![Page 26: eScience Resources for the Chemistry Community from the Royal Society of Chemistry](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554e7e2db4c90545698b517d/html5/thumbnails/26.jpg)
The InChI Identifier
![Page 27: eScience Resources for the Chemistry Community from the Royal Society of Chemistry](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554e7e2db4c90545698b517d/html5/thumbnails/27.jpg)
InChI Strings Hash to InChIKeys
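As a rough illustration of the idea on this slide: an InChIKey is a fixed-length, 27-character string derived from SHA-256 hashes of the InChI string, laid out as a 14-letter block, a 10-letter block and a single final letter. The sketch below only checks that layout and shows the hashing principle. `looks_like_inchikey` and `toy_hash_block` are hypothetical helpers, and `toy_hash_block` deliberately does not reproduce the official InChIKey encoding.

```python
import hashlib
import re

# Layout of an InChIKey: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def looks_like_inchikey(value: str) -> bool:
    """Check the shape of a string only; this does not verify the hash."""
    return INCHIKEY_PATTERN.fullmatch(value) is not None


def toy_hash_block(text: str, length: int) -> str:
    """Map SHA-256 digest bytes to uppercase letters.

    Illustrative only: the real InChIKey uses a specific base-26 encoding
    of selected digest bits, which this function does not implement.
    """
    if length > hashlib.sha256().digest_size:
        raise ValueError("length exceeds the SHA-256 digest size")
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return "".join(chr(ord("A") + byte % 26) for byte in digest[:length])


water_inchi = "InChI=1S/H2O/h1H2"
water_key = "XLYOFNOQVPJJNP-UHFFFAOYSA-N"  # standard InChIKey for water
print(looks_like_inchikey(water_key))    # True
print(looks_like_inchikey(water_inchi))  # False
```

Because the key has a fixed length and contains only letters and hyphens, it can be pasted into a general web search engine, which an arbitrary-length InChI string often cannot.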
![Page 28: eScience Resources for the Chemistry Community from the Royal Society of Chemistry](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554e7e2db4c90545698b517d/html5/thumbnails/28.jpg)
Vancomycin – Search the Internet
![Page 29: eScience Resources for the Chemistry Community from the Royal Society of Chemistry](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554e7e2db4c90545698b517d/html5/thumbnails/29.jpg)
Vancomycin
Search Molecular SKELETON
Search Full Molecule
![Page 30: eScience Resources for the Chemistry Community from the Royal Society of Chemistry](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554e7e2db4c90545698b517d/html5/thumbnails/30.jpg)
Full Skeleton Search: 104 Hits
![Page 31: eScience Resources for the Chemistry Community from the Royal Society of Chemistry](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554e7e2db4c90545698b517d/html5/thumbnails/31.jpg)
Full Molecule Search: 4 Hits
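The difference between the two hit counts follows from the InChIKey layout: the first 14-letter block is derived from the molecular skeleton (formula and connectivity), while the second block also reflects stereochemistry and other layers. The sketch below assumes that the skeleton search matches on the first block only, which the slides suggest but do not state. The function names are hypothetical and the keys are made-up placeholders, not real compounds.

```python
def skeleton_block(inchikey: str) -> str:
    """Return the first (connectivity) block of an InChIKey."""
    return inchikey.split("-")[0]


def search(keys, query, skeleton_only=False):
    """Return keys matching the query on the full key or on the skeleton block."""
    if skeleton_only:
        target = skeleton_block(query)
        return [key for key in keys if skeleton_block(key) == target]
    return [key for key in keys if key == query]


# Placeholder keys (NOT real compounds): three share a skeleton block.
INDEXED_KEYS = [
    "AAAAAAAAAAAAAA-BBBBBBBBSA-N",
    "AAAAAAAAAAAAAA-CCCCCCCCSA-N",
    "AAAAAAAAAAAAAA-UHFFFAOYSA-N",
    "DDDDDDDDDDDDDD-UHFFFAOYSA-N",
]

query = "AAAAAAAAAAAAAA-BBBBBBBBSA-N"
print(len(search(INDEXED_KEYS, query, skeleton_only=True)))  # 3
print(len(search(INDEXED_KEYS, query)))                      # 1
```

A skeleton search therefore also returns stereoisomers and records with undefined stereochemistry, which would explain why it finds far more pages than the full-molecule search.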
![Page 32: eScience Resources for the Chemistry Community from the Royal Society of Chemistry](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554e7e2db4c90545698b517d/html5/thumbnails/32.jpg)
ChemSpider Resources for Chemistry
![Page 33: eScience Resources for the Chemistry Community from the Royal Society of Chemistry](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554e7e2db4c90545698b517d/html5/thumbnails/33.jpg)
Some usage statistics
• ca. 200 visitors at any one time, ~30,000 visits per day
• Mar 4-Apr 3, 2013
– Visits = 731,656
– Unique Visitors = 527,008
• Independent servers to support other projects
![Page 34: eScience Resources for the Chemistry Community from the Royal Society of Chemistry](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554e7e2db4c90545698b517d/html5/thumbnails/34.jpg)
Publications - a summary of work
• Scientific publications are a summary of work
– Is all work reported?
– How much science is lost to pruning?
– What of value sits in notebooks and is lost?
• How much data is lost?
– How many compounds never reported?
– How many syntheses fail or succeed?
– How many characterization measurements?
![Page 35: eScience Resources for the Chemistry Community from the Royal Society of Chemistry](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554e7e2db4c90545698b517d/html5/thumbnails/35.jpg)
About Me…as a Chemist
• I’ve performed a few dozen chemical syntheses
• I’ve run thousands of analytical spectra
• I’ve generated thousands of NMR assignments
• I’ve probably published <5% of all work
• Most of it has been lost
• But things can be different today….
• But it still needs to be associated with me…
![Page 36: eScience Resources for the Chemistry Community from the Royal Society of Chemistry](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554e7e2db4c90545698b517d/html5/thumbnails/36.jpg)
Micropublishing Syntheses
![Page 37: eScience Resources for the Chemistry Community from the Royal Society of Chemistry](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554e7e2db4c90545698b517d/html5/thumbnails/37.jpg)
ChemSpider SyntheticPages
![Page 38: eScience Resources for the Chemistry Community from the Royal Society of Chemistry](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554e7e2db4c90545698b517d/html5/thumbnails/38.jpg)
Olympicene
![Page 39: eScience Resources for the Chemistry Community from the Royal Society of Chemistry](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554e7e2db4c90545698b517d/html5/thumbnails/39.jpg)
So you Want a Profile???
![Page 40: eScience Resources for the Chemistry Community from the Royal Society of Chemistry](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554e7e2db4c90545698b517d/html5/thumbnails/40.jpg)
![Page 41: eScience Resources for the Chemistry Community from the Royal Society of Chemistry](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554e7e2db4c90545698b517d/html5/thumbnails/41.jpg)
![Page 42: eScience Resources for the Chemistry Community from the Royal Society of Chemistry](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554e7e2db4c90545698b517d/html5/thumbnails/42.jpg)
Interactive Data
![Page 43: eScience Resources for the Chemistry Community from the Royal Society of Chemistry](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554e7e2db4c90545698b517d/html5/thumbnails/43.jpg)
Rewards and Recognition
Congratulations! Your 1st CSSP article has been published. Philosopher Lao Tzu said “A journey of a thousand miles begins with a single step”. In the same way we hope that this will be the first of many submissions that you make to CSSP.
The First Step badge is awarded when a user submits (& has published) their 1st CSSP article.
![Page 44: eScience Resources for the Chemistry Community from the Royal Society of Chemistry](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554e7e2db4c90545698b517d/html5/thumbnails/44.jpg)
Integrate to instruments and software
• Integration to analytical instrumentation vendors already in place – Agilent, Bruker, Thermo, Waters
• Also, Cheminformatics vendors link to ChemSpider
– Accelrys, ACD/Labs, ChemAxon, iChemLabs, and…
![Page 45: eScience Resources for the Chemistry Community from the Royal Society of Chemistry](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554e7e2db4c90545698b517d/html5/thumbnails/45.jpg)
![Page 46: eScience Resources for the Chemistry Community from the Royal Society of Chemistry](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554e7e2db4c90545698b517d/html5/thumbnails/46.jpg)
PharmaSea
• Dereplication via ChemSpider
• Segregation of natural products datasets
• Analytical data algorithms & integration
– Mass spec searching – predicted fragmentation
– NMR feature searching – NMR prediction
– Computer-assisted structure elucidation
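Dereplication by mass can be sketched as matching an observed neutral monoisotopic mass against a library of known compounds within a ppm tolerance. This is a minimal illustration under stated assumptions, not the PharmaSea or ChemSpider method: the three-entry `LIBRARY`, the function names and the 5 ppm default are all chosen for the example, and adduct correction is assumed to have been done already.

```python
# Monoisotopic masses of the most abundant isotopes, in daltons.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}

# Hypothetical reference library: name -> elemental composition.
LIBRARY = {
    "caffeine": {"C": 8, "H": 10, "N": 4, "O": 2},
    "theobromine": {"C": 7, "H": 8, "N": 4, "O": 2},
    "glucose": {"C": 6, "H": 12, "O": 6},
}


def monoisotopic_mass(formula: dict) -> float:
    """Sum the monoisotopic masses of all atoms in an elemental composition."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


def dereplicate(observed_mass: float, tolerance_ppm: float = 5.0):
    """Return (name, error in ppm) for library compounds within the tolerance."""
    hits = []
    for name, formula in LIBRARY.items():
        mass = monoisotopic_mass(formula)
        error_ppm = (observed_mass - mass) / mass * 1e6
        if abs(error_ppm) <= tolerance_ppm:
            hits.append((name, round(error_ppm, 2)))
    return hits


print(dereplicate(194.0804))  # matches caffeine only
```

Mass alone cannot separate isomers, which is one reason the slide pairs mass searching with predicted fragmentation and NMR-based checks.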
![Page 47: eScience Resources for the Chemistry Community from the Royal Society of Chemistry](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554e7e2db4c90545698b517d/html5/thumbnails/47.jpg)
It is so difficult to navigate…
• What’s the structure?
• Are they in our file?
• What’s similar?
• What’s the target?
• Pharmacology data?
• Known Pathways?
• Working On Now?
• Connections to disease?
• Expressed in right cell type?
• Competitors?
• IP?
![Page 48: eScience Resources for the Chemistry Community from the Royal Society of Chemistry](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554e7e2db4c90545698b517d/html5/thumbnails/48.jpg)
• 3-year Innovative Medicines Initiative project
• Integrating chemistry and biology data using semantic web technologies
• Open source code, open data and open standards
• Academics, Pharma companies, Publishers….
![Page 49: eScience Resources for the Chemistry Community from the Royal Society of Chemistry](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554e7e2db4c90545698b517d/html5/thumbnails/49.jpg)
ChemSpider Contributions
• The host of the chemistry services
– Supplier of “standardized” chemical data files
– Chemistry searching (structure, substructure etc)
– Curator and data quality checking
• We built the Open PHACTS chemical registration system
![Page 50: eScience Resources for the Chemistry Community from the Royal Society of Chemistry](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554e7e2db4c90545698b517d/html5/thumbnails/50.jpg)
Open Source Drug Discovery
![Page 51: eScience Resources for the Chemistry Community from the Royal Society of Chemistry](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554e7e2db4c90545698b517d/html5/thumbnails/51.jpg)
Chemical Database Service
• National Chemical Database Service for UK Academics
• Integrating Commercial Databases and Services
• Chemicals, analytical data, prediction algorithms
• Development of data repository
![Page 52: eScience Resources for the Chemistry Community from the Royal Society of Chemistry](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554e7e2db4c90545698b517d/html5/thumbnails/52.jpg)
Community Repository for Data
• Funding agencies encourage sharing of data
• Increasing availability of “Open Data”
• Institutional repositories offer no specific domain support
• Develop a community repository for chemistry data – private, public, embargoed
• Provides data to develop models and algorithms
![Page 53: eScience Resources for the Chemistry Community from the Royal Society of Chemistry](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554e7e2db4c90545698b517d/html5/thumbnails/53.jpg)
Community Repository for Data
• Automated depositions of data
• DOI’ed data objects for citation purposes
• A database of reference data, but validated by the community
• National services feeding the repository – crystallography, mass spectrometry
• Integrate to blogging tools for chemistry
• Integrate to Electronic Lab Notebooks as feeds
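One way to picture the repository described on these two slides is a deposition record that carries a visibility state (private, public or embargoed) and an optional DOI for citation. The sketch below is purely hypothetical: the class names, field names and embargo rule are assumptions for illustration and do not describe the actual RSC repository design.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Visibility(Enum):
    PRIVATE = "private"
    EMBARGOED = "embargoed"
    PUBLIC = "public"


@dataclass
class Deposition:
    """Hypothetical record for one deposited data object."""
    title: str
    depositor: str
    data_type: str  # e.g. "spectrum", "CIF", "property"
    visibility: Visibility = Visibility.PRIVATE
    embargo_until: Optional[date] = None
    doi: Optional[str] = None  # would be assigned by the repository

    def is_visible_to_public(self, today: date) -> bool:
        """Public data is always visible; embargoed data once the embargo ends."""
        if self.visibility is Visibility.PUBLIC:
            return True
        if self.visibility is Visibility.EMBARGOED and self.embargo_until is not None:
            return today >= self.embargo_until
        return False


record = Deposition(
    title="1H NMR spectrum",
    depositor="A. Chemist",
    data_type="spectrum",
    visibility=Visibility.EMBARGOED,
    embargo_until=date(2014, 1, 1),
)
print(record.is_visible_to_public(date(2013, 10, 2)))  # False
print(record.is_visible_to_public(date(2014, 1, 1)))   # True
```

Keeping the depositor on every record also addresses the earlier point that shared data still needs to be associated with the person who generated it.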
![Page 54: eScience Resources for the Chemistry Community from the Royal Society of Chemistry](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554e7e2db4c90545698b517d/html5/thumbnails/54.jpg)
Model Building with Community Data
• Community data as a basis of model building
– Consume data from available databases, community data, new publications and build predictive algorithms for the community
– How many algorithms are reported and lost? How much repeat work is done in the domain of algorithmic development?
![Page 55: eScience Resources for the Chemistry Community from the Royal Society of Chemistry](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554e7e2db4c90545698b517d/html5/thumbnails/55.jpg)
Inside our Publication Archive
• How much data is in the archive, in the publications and in the supplementary info?
– How many compounds for ChemSpider?
– How many syntheses for ChemSpider reactions?
– How many characterization measurements?
• Property Data
• Spectral Data
• Graphs and charts to be used for modeling?
![Page 56: eScience Resources for the Chemistry Community from the Royal Society of Chemistry](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554e7e2db4c90545698b517d/html5/thumbnails/56.jpg)
What if we could capture it all?
Digitally Enhancing the RSC Archive
![Page 57: eScience Resources for the Chemistry Community from the Royal Society of Chemistry](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554e7e2db4c90545698b517d/html5/thumbnails/57.jpg)
Start with data in publications
![Page 58: eScience Resources for the Chemistry Community from the Royal Society of Chemistry](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554e7e2db4c90545698b517d/html5/thumbnails/58.jpg)
Turn “Figures” Into Data
![Page 59: eScience Resources for the Chemistry Community from the Royal Society of Chemistry](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554e7e2db4c90545698b517d/html5/thumbnails/59.jpg)
ChemSpider Reactions
• Starting with data from CSSP, MOS and CCR
• Will cover reactions extracted from:
• Patents
• RSC journal articles and ESI
![Page 60: eScience Resources for the Chemistry Community from the Royal Society of Chemistry](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554e7e2db4c90545698b517d/html5/thumbnails/60.jpg)
E-Lab Notebooks
• Integration between ELNs and:
• ChemSpider
• ChemSpider Reactions
• Chemistry Data Repository
![Page 61: eScience Resources for the Chemistry Community from the Royal Society of Chemistry](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554e7e2db4c90545698b517d/html5/thumbnails/61.jpg)
The Future
Internet Data
• Commercial Software
• Pre-competitive Data
• Open Science
• Open Data
• Publishers
• Educators
• Open Databases
• Chemical Vendors
• Small organic molecules
• Undefined materials
• Organometallics
• Nanomaterials
• Polymers
• Minerals
• Particle bound
• Links to Biologicals
![Page 62: eScience Resources for the Chemistry Community from the Royal Society of Chemistry](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554e7e2db4c90545698b517d/html5/thumbnails/62.jpg)
The Future of Chemistry on the Web?
• Public compound databases federate & build a linked environment of validated data!
• Data validation needs are not ignored
• Publishers layer on information to make publications discoverable
• Open Data proliferate
• The “Semantic Web” will continue to develop…
![Page 63: eScience Resources for the Chemistry Community from the Royal Society of Chemistry](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554e7e2db4c90545698b517d/html5/thumbnails/63.jpg)
Thank you
Email: [email protected]
Twitter: @ChemConnector
Personal Blog: www.chemconnector.com
SLIDES: www.slideshare.net/AntonyWilliams