Post on 12-Jun-2020


Page 1: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

The Possibilities and Pitfalls of Internet-Based Chemical Data

Antony Williams

Royal Society of Chemistry

Page 2: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

About Me…as a Chemist

I’ve performed a few dozen chemical syntheses

I’ve run thousands of analytical spectra

I’ve generated thousands of NMR assignments

I’ve probably published <5% of all work

But things can be different today…

Page 3: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

My Early Scientific Computing

Page 4: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

If it was not just about me…

Page 6: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

If it was not just about me…

Together we might:

build an encyclopedia

…and rate restaurants

…provide book reviews to each other

…or movie reviews

…or reviews of service providers

…organize sit-ins and social action

…and more data might just be Open

…more Chemists might share rather than just take!

Page 7: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

A story of a hobby gone wild… Years 1 and 2

A hobby project to connect chemistry data on the web

Three servers – one purchased, two hand-built

Software begged and borrowed – and thanks to Microsoft!

Some late nights – 10pm to 2am for over a year

Surviving the naysayers in the community

…and taking advantage of a changing world of data availability and the crowdsourcing of willing participants

NO formal funding. Simply passion and abilities lining up.

Page 8: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

ChemSpider (Year 2-present)

Building a Free Chemical Database

A central hub for chemists to source information

>28 million unique chemical records

Aggregated from >400 data sources

Chemicals, analytical data, movies, images, podcasts, links to patents, publications, predictions

Web services for integration

Daily updates of data

Page 9: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

Answer Questions for Chemists

Questions a chemist might ask…

What is the melting point of n-heptanol?

What is the chemical structure of Xanax?

Chemically, what is phenolphthalein?

What are the stereocenters of cholesterol?

Where can I find publications about xylene?

What are the different trade names for Ketoconazole?

What is the NMR spectrum of Aspirin?

What are the safety handling issues for Thymol Blue?

Page 10: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

A LITTLE Chemistry First

Page 11: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

Structural Diagrams

Page 12: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

Structural Diagrams

Page 13: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

Analytical Data

Page 14: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

Does Stereochemistry Matter?

Page 15: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

Does one stereocenter matter?

Distaval, Talimol, Nibrol, Sedimide, Quietoplex, Contergan, Neurosedyn, Softenon, Thalidomide

Page 16: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

Structural Representations

Page 17: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

The InChI Standard

Page 18: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

InChIKeys Search the Web by Structure
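
A standard InChIKey is a fixed-length, 27-character hash of the InChI string (14 letters, a hyphen, 10 letters, a hyphen, one letter), which is what makes it practical to paste into a general web search engine as an exact phrase. The Python sketch below only checks that layout and builds a quoted query string; the function names are hypothetical, the aspirin key is used purely as a sample, and no particular search engine is assumed.

```python
import re

# Layout of a standard InChIKey: 14 letters, 10 letters, 1 letter,
# separated by hyphens (27 characters in total).
INCHIKEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")

# Sample key (aspirin), used only for illustration.
ASPIRIN_KEY = "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"


def is_inchikey(text: str) -> bool:
    """Return True if text has the 27-character InChIKey layout."""
    return INCHIKEY_PATTERN.fullmatch(text.strip()) is not None


def first_block(inchikey: str) -> str:
    """Return the first 14-letter block of the key.

    This block is derived from the connectivity information, so
    stereoisomers of one skeleton are generally expected to share it.
    """
    if not is_inchikey(inchikey):
        raise ValueError(f"not an InChIKey: {inchikey!r}")
    return inchikey.strip().split("-")[0]


def exact_phrase_query(inchikey: str) -> str:
    """Wrap the key in double quotes so it is searched as one exact phrase."""
    if not is_inchikey(inchikey):
        raise ValueError(f"not an InChIKey: {inchikey!r}")
    return f'"{inchikey.strip()}"'


if __name__ == "__main__":
    print(is_inchikey(ASPIRIN_KEY))
    print(first_block(ASPIRIN_KEY))
    print(exact_phrase_query(ASPIRIN_KEY))
```

Searching on the first block alone is one way to pull together records that differ only in stereochemistry, which ties back to the question of whether one stereocenter matters.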

Page 19: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

I want to know about “Vincristine”

Page 20: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open
Page 21: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

Vincristine: Identifiers and Properties

Page 22: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

Vincristine: Vendors and Sources

Page 23: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

Vincristine: Patents

Page 24: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

Chemical Names and Synonyms

VALIDATION OF NAMES

Page 25: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

Validated Names for Searching…

Page 26: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

Information System Architecture

Input Filtering Curation Archival

Storage

Indexing

Processing Search Browse

Presentation

API

Page 27: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

The Quality of Chemical Data Online

What is the Structure of Vitamin K?

A lipid cofactor that is required for normal blood clotting. Several forms of vitamin K have been identified: VITAMIN K1 (phytomenadione) derived from plants, VITAMIN K2 (menaquinone) from bacteria & synthetic naphthoquinone provitamins, VITAMIN K3 (menadione).

Page 28: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

What is the Structure of Vitamin K1?

Page 29: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

What is the Structure of Vitamin K1?

Page 30: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

CAS’s Common Chemistry

Page 31: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

Wikipedia

Page 32: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

Wolfram Alpha

Page 33: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

DailyMed

Page 34: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open
Page 35: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

People Use Trusted Resources…

Page 36: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

Just Yesterday…

Page 37: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

How will it improve?

Participation

and

contribution

Page 38: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

ALL Different, ALL “Domoic Acids”

Page 39: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

ALL Different, ALL “Domoic Acids”

Page 40: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

The EXPERTS must get it right?!

Page 41: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

Question Everything Online: www.dhmo.org

Page 42: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

Deposition, Annotation and Validation

ANYBODY can annotate a record on ChemSpider

Registered users can deposit new data

Registered users can validate existing data
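
The three rules on this slide amount to a small permission table: annotation is open to everyone, while deposition and validation require registration. A minimal Python sketch of that table follows; the `Action` enum and `is_allowed` function are hypothetical names chosen for illustration and do not describe how ChemSpider implements access control.

```python
from enum import Enum


class Action(Enum):
    """The three kinds of contribution named on the slide."""
    ANNOTATE = "annotate"
    DEPOSIT = "deposit"
    VALIDATE = "validate"


# Actions the slide restricts to registered users (assumed to be exactly these two).
REGISTERED_ONLY = frozenset({Action.DEPOSIT, Action.VALIDATE})


def is_allowed(action: Action, registered: bool) -> bool:
    """Return True if a user with this registration status may perform the action."""
    return registered or action not in REGISTERED_ONLY


if __name__ == "__main__":
    for action in Action:
        print(action.value, is_allowed(action, registered=False))
```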

Page 43: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

CURATION

Search “Vitamin H”

Page 44: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

“Curate” Identifiers

Page 45: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

“Curate” Identifiers

Page 46: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

ChemSpider Web Services

Page 47: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

Open APIs for Science

ChemSpider via web service access

For structure identification for mass spectrometry

For name and structure resolution

For structure and substructure searching

For an “innovative medicines initiative” semantic web project…

Page 48: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

Open PHACTS Project

Develop a set of robust standards

Integrate Chemistry and Biology data by implementing the standards in a semantic integration hub

Deliver services to support drug discovery programs in pharma and public domain

INITIALLY 22 partners, 8 pharmaceutical companies, 3 biotechs

36-month project – first public release version is imminent

Guiding principle is open access, open usage, open source: key to standards adoption

Page 49: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

RDF and the semantic web

Using RDF permalinks

http://www.chemspider.com/Chemical-Structure.7787.rdf

Using a Search Term

http://www.chemspider.com/rdf.ashx?q=cyclohexane

http://rdf.chemspider.com/cyclohexane
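
The three URL patterns on this slide can be generated from a record number or a search term. The Python sketch below only builds the strings, following the patterns exactly as they appear on the slide; the function names are hypothetical, no network request is made, and whether these endpoints still respond has not been checked here.

```python
from urllib.parse import quote

# Host names copied from the URLs shown on the slide.
MAIN_HOST = "http://www.chemspider.com"
RDF_HOST = "http://rdf.chemspider.com"


def rdf_permalink(record_id: int) -> str:
    """Build the RDF permalink for a numeric record identifier."""
    return f"{MAIN_HOST}/Chemical-Structure.{record_id}.rdf"


def rdf_search_url(term: str) -> str:
    """Build the query-string form of an RDF search for a term."""
    return f"{MAIN_HOST}/rdf.ashx?q={quote(term, safe='')}"


def rdf_short_url(term: str) -> str:
    """Build the short, path-based form of an RDF search for a term."""
    return f"{RDF_HOST}/{quote(term, safe='')}"


if __name__ == "__main__":
    print(rdf_permalink(7787))
    print(rdf_search_url("cyclohexane"))
    print(rdf_short_url("cyclohexane"))
```

Percent-encoding the term with `quote` is a precaution for names containing spaces or punctuation; the slide itself only shows a single-word example.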

Page 50: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

RDF and the semantic web

Page 51: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

www.SpectralGame.com

http://www.jcheminf.com/content/1/1/9

Page 52: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

The World of Contribution

Times have changed

Immediacy of social networks

Commenting on articles/data is here

The “participating scientist” has high profile

And who can be a scientist now???

Page 53: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

A Ten Year Old Scientist

Page 54: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open
Page 55: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

Challenging a Publication

Page 56: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open
Page 57: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

Oops…

Page 58: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

>2 Years to Resolution

Page 60: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

The Blogosphere “Discusses”…

Page 61: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

Oxidation by Sodium Hydride?

Page 62: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

The Blogosphere Analyzes…

Page 63: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

The Blogosphere Analyzes…

Page 64: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

How much is in the archives?

Page 65: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

Open Notebook Science Analysis

Page 66: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

Motivation

Faster Science, Better Science

Page 67: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

Openness – Still Carries Licensing

Openness may be hard…

Open Access flavors

Open Source licenses

Open Data licenses

Open Notebook Science

Page 68: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

We Suggest Rules for Licensing Data

License data based on GOALS: scientific, commercial, or mixed

Explore the benefits of open licensing and drawbacks of enclosure

Provide simple explanations of terms of use

If you can't make the data public domain, make the metadata public domain.

Page 69: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

We Suggest Rules for Licensing Data

Page 70: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

Challenged in the Twittersphere

Page 71: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

Annotating Articles Today…

Page 72: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

Attribution to me…

Page 73: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

Other Publications to Annotate…

Page 74: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

Other Publications to Annotate…

Page 75: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

Publications to Annotate…

“We then established a collaboration with professor Sum Ting Wong, a fugitive from the North Korean University Hu Yu Hai Ding”

“..identified as the new protein Wai So Dim”

Page 76: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

A New World for Publishing?

Page 77: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

An Adventure into the World of Small but Significant Contribution…

Page 78: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

ChemSpider SyntheticPages

Page 79: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

Micropublishing with Peer Review (a chemical synthesis blog?)

Page 80: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

Multi-Step Synthesis

Page 81: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

Interactive Data

Page 82: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

A New Route for Scientific Recognition?

Page 83: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

The Measure of a Scientist?

How do “we” measure a scientist?

The funding bodies, department heads etc. use:

Publication profile

Impact factors

An index – h, m, g, i10, c, s …

Grants brought in

Scientists are notable in different ways – technology can help measure different types of “impact”
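
Three of the indices named on this slide have simple definitions that can be computed directly from a list of per-paper citation counts: the h-index is the largest h such that h papers have at least h citations each, the i10-index is the number of papers with at least 10 citations, and the g-index is the largest g such that the g most-cited papers together have at least g² citations. The Python sketch below implements those definitions; the sample citation counts are invented for illustration, and the g-index is capped at the number of papers, which is a simplifying assumption.

```python
from typing import Sequence


def h_index(citations: Sequence[int]) -> int:
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def i10_index(citations: Sequence[int]) -> int:
    """Number of papers with at least 10 citations."""
    return sum(1 for count in citations if count >= 10)


def g_index(citations: Sequence[int]) -> int:
    """Largest g such that the top g papers total at least g*g citations.

    Capped at the number of papers (a simplifying assumption).
    """
    ranked = sorted(citations, reverse=True)
    running_total = 0
    g = 0
    for rank, count in enumerate(ranked, start=1):
        running_total += count
        if running_total >= rank * rank:
            g = rank
    return g


if __name__ == "__main__":
    sample = [10, 8, 5, 4, 3]  # hypothetical citation counts
    print(h_index(sample), i10_index(sample), g_index(sample))
```

All three numbers depend only on citation counts, which is the slide's point: none of them captures data sharing, curation or other kinds of contribution.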

Page 84: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

What makes a Scientist Notable?

Page 85: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

Public Profiles of Scientists

Online tools track activities of scientists

Some are totally opt-in, an increasing number are about you and need checking!

Take responsibility for your profile online

Actively BUILD your online profile

Page 86: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

Microsoft Academic Search

Page 87: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

My Academic Search Profile

Page 88: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

My Co-author Graph

Page 89: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

Q: How Often Do You Contribute?

Annotation and Validation

How many times do you see errors where:

1) You have not been able to annotate or curate

2) You have chosen not to annotate or curate

Page 90: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

My Co-author Graph

Page 91: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

Contribute when you can!

Page 92: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

Contribute when you can!

Page 93: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open
Page 94: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

Scientists and ORCIDs?

Page 95: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

A unique identifier for a scientist – a scientist's InChI!

Will enable aggregation of a scientist's activities

ORCIDs associated with publications, data, blog comments, other contributions (Wikipedia, reviews etc.) will be a way to measure their impact

Page 96: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

The Alt-Metrics Manifesto

http://altmetrics.org/manifesto/

Page 97: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

ImpactStory

Page 98: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

ImpactStory

Page 99: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

SlideShare

Page 100: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

SlideShare via ImpactStory

Page 101: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

ImpactStory

Page 102: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

Where do I contribute? How might I be measured?

Page 103: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

Article Level Metrics

Page 104: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

Article Level Metrics

Page 105: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

New Measures of Impact

Impact will be an aggregate measure of

Publications – classic measures and article level metrics

Data, algorithms and code – and its distribution and reuse

Contributions as comments, annotation and curation activities

New “impact factors” will develop with time

Page 106: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

The Challenges

Some challenges are technology based

The growth in data – storage and compute speed

Ontologies, dictionaries and trusted sources

Many challenges are “about us”

Licenses and rights

Rewards and recognition

Participation, contribution and collaboration

Page 107: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

Tear Down Walls between Government Labs

There are many government institutions building public compound databases that should collaborate more:

National Cancer Institute (NCI)

National Institutes of Health (NIH)

Environmental Protection Agency (EPA)

Food and Drug Administration (FDA)

National Library of Medicine (NLM)

Page 108: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open
Page 109: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

Release STRUCTURES Please!

Page 110: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

What Does the Future Hold?

Page 111: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

The Linked Network Will Grow

Page 112: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

The Data Deluge Will Not Go Away

Page 113: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

RSC Activities in Development

Deliver a Global Chemistry Hub

“Data enable” the RSC archive back to 1841:

Extract chemistry – chemicals, reactions, experimental data points, complex data

Enrich the articles for interactive viewing and crowdsourced annotation and curation

Enhance queries possible across the archive

Page 114: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

Federated Data Segregation

Page 115: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

Future System Architecture

Input

Filtering: Smarter algorithms

Curation

Archival

Storage: Elastic, distributed

Indexing: New algorithms

Processing: Distributed

Search: Over federated systems

Browse: Over federated systems

Presentation: No more complex

API: Complexity is hidden

Page 116: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

Data Validation is Exacting Work

Page 117: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

“Challenge” the Community

Page 118: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

Chemistry Data at RSC

Chemistry is NOT just small molecules!

Data in RSC publications will be “enabled”

Data available for validation and curation

The delivery of the “Datument”

Data will be fed to models for validation and to retrain the models, with full provenance retained

Algorithms will be provided to the community

Page 119: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

Enhanced Mark-Up?

Page 120: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

An Error in my Abstract?

Page 121: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

An Error in my Abstract?

Chemists have embraced the web as a rich source of data and knowledge. However, all that glisters is not gold

Page 122: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

Thanks Shakespeare

Page 123: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

Acknowledgments

RSC and RSC|Cheminformatics team

All data source providers, curators and annotators

All software providers: commercial and open source

Contributors, curators, collaborators

Trusted Advisors: Jean-Claude Bradley, Sean Ekins, Lee Harland, Gary Martin, Martin Walker and…

Page 124: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

Meet Valery… We’d love to chat…

Page 125: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open

Thank you

Email: [email protected]

Twitter: ChemConnector

Personal Blog: www.chemconnector.com

SLIDES: www.slideshare.net/AntonyWilliams