integration of orechem with the ecrystals repository for crystal structures

25
Integration of oreChem with the eCrystals repository for crystal structures Mark Borkum, Simon Coles and Jeremy Frey 15 September 2010

Upload: mark-borkum

Post on 01-Nov-2014

1.304 views

Category:

Technology


0 download

DESCRIPTION

 

TRANSCRIPT

Page 1: Integration of oreChem with the eCrystals repository for crystal structures

Integration of oreChem with the eCrystals repository for crystal structures

Mark Borkum, Simon Coles and Jeremy Frey15 September 2010

Page 2: Integration of oreChem with the eCrystals repository for crystal structures

2

Overview• Motivation

• Implementation

• Discussion and Summary

Page 3: Integration of oreChem with the eCrystals repository for crystal structures

3

Current Practice in Crystallography• Crystallography data

is highly structured

– The de facto standard adopted by the community is the CIF (Crystallographic Information File)

• Relatively few crystal structures are openly published

http://www.rin.ac.uk/our-work/data-management-and-curation/share-or-not-share-research-data-outputs

Page 4: Integration of oreChem with the eCrystals repository for crystal structures

4

Open Access Journals• Advantages:

– Rapid publication– Highly cited– Data is available to download

• Disadvantages:– Electronic only– Not all data is of primary

importance to the underlying chemistry

• By-products, unexpected results, tracking reactions, etc.

Page 5: Integration of oreChem with the eCrystals repository for crystal structures

5

Crystallography and Fraud

Page 6: Integration of oreChem with the eCrystals repository for crystal structures

6

The eCrystals Federation• JISC project to establish a network of

crystallography resources on the Internet, with metadata that is harvested by a number of aggregation services

– Led by the UK National Crystallography Service (NCS)

– With core partners at UKOLN, the Digital Curation Centre, and the Unilever Centre for Molecular Science Informatics

Page 7: Integration of oreChem with the eCrystals repository for crystal structures

7

eCrystals – University of Southampton• Located @ http://ecrystals.chem.soton.ac.uk

• Archive for crystal structures that are generated by:

– Southampton Chemical Crystallography Group– UK National Crystallography Service (NCS)

• Modified version of EPrints 3.1

– OAI-PMH compliant– Extensible platform (with plug-ins architecture)

Page 8: Integration of oreChem with the eCrystals repository for crystal structures

8

What is an eCrystal?• “all the fundamental

and derived data resulting from a single crystal X-ray structure determination”

• “the information supplied should enable any reader to check the reliability and validity”

http://www.ukoln.ac.uk/projects/ebank-uk/images/collage-web.gif

Page 9: Integration of oreChem with the eCrystals repository for crystal structures

9

The Scientific Web

Page 10: Integration of oreChem with the eCrystals repository for crystal structures

10

The Data Deluge• In Haiku:

– Lots of producers;Generating more datathan ever before.

• 40 years ago, a PhD student would determine 3 structures over the entire course of their study!

The Great Wave off Kanagawa by Katsushika Hokusai

Page 11: Integration of oreChem with the eCrystals repository for crystal structures

11

Provenance• The 7 W’s [Goble

2002]

– Who, What, Where, Why, When, Which, & (W)How

• The Why aspect is usually ignored

– Rational, intent, hypothesis, protocol, methodology, workflow, etc.

“Diana and Actaeon by Titian has a full provenance covering its passage through several owners and four

countries since it was painted for Philip II of Spain in the 1550s.”

Source: http://en.wikipedia.org/wiki/Diana_and_Actaeon_%28Titian%29

Page 12: Integration of oreChem with the eCrystals repository for crystal structures

“In theory, there is no difference between theory and practice.But, in practice, there is.” Unknown (possibly Yogi Berra)

12

Page 13: Integration of oreChem with the eCrystals repository for crystal structures

13

Why “Why” Matters• It is the reason for

the data’s existence

• It gives us the ability to interpret the data in the correct context

• It allows us to align the data with the big picture http://www.myexperiment.org/workflows/16.html

Page 14: Integration of oreChem with the eCrystals repository for crystal structures

14

The oreChem Core Ontology• Describes three concepts:

1. The methodology (planned method) of a scientific experiment

2. The enactment of methodologies

3. The provenance of realised artefacts

Page 15: Integration of oreChem with the eCrystals repository for crystal structures

15

Methodology (Planned Method)• The “plan” is

modelled as a directed graph

• Two node types:

– Plan Stagedescription of an activity that will be enacted

– Plan Object description of an artefact that will be realised

Page 16: Integration of oreChem with the eCrystals repository for crystal structures

16

Enactment (of a Methodology)• Each “run” (of a plan)

is modelled as a directed graph

• Two node types:

– Stagedescription of an activity that has been enacted

– Objectdescription of an artefact that has been realised

Page 17: Integration of oreChem with the eCrystals repository for crystal structures

17

Provenance• Prospective

– The plan describes a scientific experiment that will be enacted

• Retrospective

– The run describes a scientific experiment that has been enacted

– Every ‘run thing’ is linked to exactly one ‘plan thing’

Page 18: Integration of oreChem with the eCrystals repository for crystal structures

18

oreChem Plug-in for eCrystals• Three components:

1. orechem:Plan (the eCrystals methodology)

2. “eCrystal orechem:Run” mapping

3. “orechem:Run provenance graph” pipeline

Page 19: Integration of oreChem with the eCrystals repository for crystal structures

19

The eCrystals Methodology

Before After

Page 20: Integration of oreChem with the eCrystals repository for crystal structures

20

Example: eCrystal #643

Before After

Page 21: Integration of oreChem with the eCrystals repository for crystal structures

21

SPARQL RequestPREFIX orechem: <http://www.openarchives.org/2010/05/24-orechem-ns#>PREFIX ecrystals: <http://ecrystals.chem.soton.ac.uk/plan.rdf#>SELECT ?run ?raw ?derived ?reportedWHERE { ?run a orechem:Run ; orechem:hasPlan ecrystals:Ecrystals ; orechem:containsObject ?raw ; orechem:containsObject ?derived ; orechem:containsObject ?reported . ?raw a orechem:File ; orechem:hasPlanObject ecrystals:HKL . ?derived a orechem:File ; orechem:derivedFrom ?raw . ?reported a orechem:File ; orechem:hasPlanObject ecrystals:CIF ; orechem:derivedFrom ?derived .}

Page 22: Integration of oreChem with the eCrystals repository for crystal structures

22

SPARQL Response (for eCrystal #643)

?run

?raw

?reported

?derived

Page 23: Integration of oreChem with the eCrystals repository for crystal structures

23

Summary• <summary/>

Page 24: Integration of oreChem with the eCrystals repository for crystal structures

24

Acknowledgments• oreChem is funded by Microsoft External Research

• eCrystals is funded by both EPSRC and JISC

• The oreChem project team:– Nico Adams, Mark Borkum, William Brouwer, Rameswara Sashi

Kiran Challa, Simon Coles, Nick Day, Jim Downing, Jeremy Frey, C. Lee Giles, Carl Lagoze (PI), Na Li, Prasenjit Mitra, Karl Meuller, Peter Murray-Rust, Marlon Pierce, Joe Townsend, and Theresa Velden.

Page 25: Integration of oreChem with the eCrystals repository for crystal structures

25

#ahm2010

#ahm

#ahm10

#pch2010

http://pegasus.chem.soton.ac.uk #ahm2010 until 11am Wed 15 Sept 2010