
EPrints Workshop, January 2005 1

eBank UK: Dissemination of research data using EPrints

Simon Coles, School of Chemistry, University of Southampton


Overview

• Scholarly communications in Chemistry
  – Data, information, workflows and provenance
• The data publication bottleneck
  – e-Science and chemistry
• eBank UK
  – Information architecture, data flow and interoperability
• Challenges for the future
  – Expansion into other disciplines and data formats


The scholarly knowledge cycle

[Figure: The scholarly knowledge cycle in research & e-Science workflows. Stages shown: data creation/capture/gathering (laboratory experiments, Grids, fieldwork, surveys, media); data analysis, transformation, mining, modelling; peer-reviewed publication (journals, conference proceedings); deposit/self-archiving into repositories (institutional, e-prints, subject, data, learning objects); data curation (databases & databanks); metadata harvesting by aggregator services (national, commercial); presentation services (subject, media-specific, data, commercial portals); searching, harvesting, embedding; resource discovery, linking, embedding; validation at several points in the cycle.]

Source: Liz Lyon, eBank UK article. Ariadne, July 2003.


Learning & Teaching workflows

[Figure: The scholarly knowledge cycle extended with learning & teaching workflows. Alongside the research & e-Science stages (data creation, analysis, publication, deposit/self-archiving, metadata harvesting, presentation services, resource discovery and linking), it shows eBank UK as the aggregator service; institutional presentation services (portals, Learning Management Systems, undergraduate and postgraduate courses, modules); learning object creation and re-use; and quality assurance bodies providing validation.]


Current chemistry publishing protocols

• Ideas and interpretations
• Hooks into the literature
• Results & derived data
• Raw data!


Data Overload!

How do we disseminate?

EPSRC National Crystallography Service

The data deluge


CombeChem: e-Science testbed

[Figure: CombeChem components connected through Grid middleware: X-Ray e-Lab (diffractometer), Properties e-Lab, analysis, simulation, video, properties and a Structures Database.]


Establishing common ground…

• Understand the data creation process
• Terminology and definitions
  – Data
  – Metadata
  – Datafile
  – Dataset
  – Data holding
• Different views
  – Digital library researchers, computer scientists, chemists
  – Generic vs specific
  – Modeller vs practitioner
• Aim for a common ontology
• Modelling the domain
• Creating a metadata schema
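The terminology above can be made concrete as a small containment model. This is a minimal sketch under our own assumptions: the slide lists the terms but not their definitions, so the nesting used here (a data holding such as one crystal structure groups datasets, each dataset groups datafiles, and metadata describes the holding) is inferred partly from the eBank data-flow diagram, and all class, field and file names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Datafile:
    """A single file produced by an instrument or program (hypothetical model)."""
    name: str


@dataclass
class Dataset:
    """A group of datafiles from one stage of an experiment, e.g. raw images."""
    label: str
    files: list[Datafile] = field(default_factory=list)


@dataclass
class DataHolding:
    """A complete unit such as one crystal structure, described by its metadata."""
    identifier: str
    metadata: dict[str, str] = field(default_factory=dict)
    datasets: list[Dataset] = field(default_factory=list)

    def all_files(self) -> list[str]:
        # Flatten every datafile name across all datasets in the holding, in order.
        return [f.name for ds in self.datasets for f in ds.files]


holding = DataHolding(
    identifier="structure-0001",  # hypothetical identifier
    metadata={"title": "Example crystal structure"},
    datasets=[
        Dataset("raw", [Datafile("frame_001.img"), Datafile("frame_002.img")]),
        Dataset("results", [Datafile("structure.cif")]),
    ],
)
print(holding.all_files())  # ['frame_001.img', 'frame_002.img', 'structure.cif']
```

Separating the holding from its datasets lets metadata describe the experiment as a whole while still linking to each dataset individually.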


Crystallography workflow

• Initialisation: mount new sample on diffractometer & set up data collection
• Collection: collect data
• Processing: process and correct images
• Solution: solve structures
• Refinement: refine structure
• CIF: produce CIF (Crystallographic Information File format)
• Report: generate Crystal Structure Report

RAW DATA → DERIVED DATA → RESULTS DATA
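One way to encode this workflow is as an ordered list of stages, each tagged with the data category it produces. This sketch assumes a stage-to-category assignment that the slide does not spell out (initialisation and collection yield raw data; processing, solution and refinement yield derived data; the CIF and report are results data), so treat the mapping and the function names as illustrative.

```python
from enum import Enum


class DataCategory(Enum):
    RAW = "raw data"
    DERIVED = "derived data"
    RESULTS = "results data"


# Ordered workflow stages with the category each is ASSUMED to produce.
# Stage names come from the slide; the category assignment is our assumption.
WORKFLOW = [
    ("Initialisation", DataCategory.RAW),
    ("Collection", DataCategory.RAW),
    ("Processing", DataCategory.DERIVED),
    ("Solution", DataCategory.DERIVED),
    ("Refinement", DataCategory.DERIVED),
    ("CIF", DataCategory.RESULTS),
    ("Report", DataCategory.RESULTS),
]


def stages_for(category: DataCategory) -> list[str]:
    """Return the workflow stages, in order, that produce the given category."""
    return [stage for stage, cat in WORKFLOW if cat is category]


def category_of(stage: str) -> DataCategory:
    """Look up the data category for a named stage (raises KeyError if unknown)."""
    return dict(WORKFLOW)[stage]


print(stages_for(DataCategory.DERIVED))  # ['Processing', 'Solution', 'Refinement']
print(category_of("CIF").value)          # results data
```

Keeping the order explicit means a repository could expose each category as a separate dataset while preserving the provenance chain from raw to results data.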


Deposition into the archive


An Archive entry

ecrystals.chem.soton.ac.uk


Access to the underlying data


Some metadata issues

• Using simple and qualified Dublin Core
• Additional chemical information in schema for harvesting, e.g. empirical formula
• Schema contains International Chemical Identifier (InChI)
• Links to all datasets associated with an experiment
• Links to individual datasets within an experiment
• Links to EPrints (and other published literature) derived from the data
• Using vocabularies specific to crystallography
• Engaging the broader scientific community to ensure different schemas are compliant and standards can emerge
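The points above can be illustrated by building one record that mixes simple Dublin Core, qualified Dublin Core and chemistry-specific fields. This is a minimal sketch, not the actual ebank_dc schema: the dc and dcterms namespace URIs are the standard DCMI ones, but the "chem" namespace, its element names, the use of dcterms:hasPart for dataset links and all example URLs are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"
# Hypothetical namespace for chemistry fields; NOT the real ebank_dc namespace.
CHEM = "http://example.org/ebank-sketch/"

ET.register_namespace("dc", DC)
ET.register_namespace("dcterms", DCTERMS)
ET.register_namespace("chem", CHEM)


def build_record(identifier, title, formula, inchi, datasets, eprints):
    """Build a Dublin Core style record for one crystal structure (illustrative)."""
    root = ET.Element(f"{{{CHEM}}}record")

    def add(ns, tag, text):
        # Append one child element in the given namespace.
        el = ET.SubElement(root, f"{{{ns}}}{tag}")
        el.text = text

    add(DC, "identifier", identifier)
    add(DC, "title", title)
    add(DC, "type", "CrystalStructure")
    add(CHEM, "empiricalFormula", formula)  # extra chemical information for harvesting
    add(CHEM, "inchi", inchi)               # International Chemical Identifier
    for url in datasets:                    # links to individual datasets (assumed term)
        add(DCTERMS, "hasPart", url)
    for url in eprints:                     # links to literature derived from the data
        add(DCTERMS, "isReferencedBy", url)
    return root


record = build_record(
    identifier="http://example.org/structure/1",  # hypothetical URLs throughout
    title="Example crystal structure",
    formula="H2O",
    inchi="InChI=1S/H2O/h1H2",
    datasets=["http://example.org/structure/1/raw", "http://example.org/structure/1/cif"],
    eprints=["http://example.org/eprint/42"],
)
print(ET.tostring(record, encoding="unicode"))
```

Keeping the chemistry fields in their own namespace means a harvester that only understands Dublin Core can still read the dc and dcterms elements.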


Data flow in eBank

[Figure: Data flow in eBank (model input: Andy Powell, UKOLN). In an institutional repository, a crystal structure (data holding) groups several datasets and a crystal structure report (HTML), and is described by an ebank_dc record (XML) with dc:type="CrystalStructure" and/or "Collection". Records are deposited, then harvested over OAI-PMH as ebank_dc or oai_dc by the eBank UK aggregator service, the ePrint UK aggregator service and a subject service. A related eprint is described by an oai_dc record (XML) with dc:type="Eprint" and/or "Text", whose dc:identifier leads to the eprint "jump-off" page (HTML) and its manifestation (e.g. PDF). The data and eprint records are linked with dcterms:references and dcterms:isReferencedBy.]
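Because the data record and the eprint record each carry one half of the link, an aggregator might check that every link has its inverse. This is a minimal sketch assuming each harvested record has been reduced to a dictionary with an "identifier" plus optional "references" and "isReferencedBy" lists; the record layout, identifiers and function name are hypothetical, and the check does not assume which side of the link is the data or the eprint.

```python
# Each linking term paired with its inverse.
INVERSE = {"references": "isReferencedBy", "isReferencedBy": "references"}


def missing_inverse_links(records):
    """List (subject, term, object) statements that are implied by an existing
    link but absent from the harvested records."""
    by_id = {r["identifier"]: r for r in records}
    missing = []
    for rec in records:
        for term, inverse in INVERSE.items():
            for target in rec.get(term, []):
                other = by_id.get(target)
                # Only check targets we actually harvested.
                if other is not None and rec["identifier"] not in other.get(inverse, []):
                    missing.append((target, inverse, rec["identifier"]))
    return missing


records = [
    {"identifier": "data:1", "isReferencedBy": ["eprint:7"]},
    {"identifier": "eprint:7", "references": ["data:1", "data:2"]},
    {"identifier": "data:2"},  # has no back-link to eprint:7
]
print(missing_inverse_links(records))  # [('data:2', 'isReferencedBy', 'eprint:7')]
```

Reporting the missing statement rather than silently adding it leaves the repository as the authority for its own metadata.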


Harvesting: OAIster
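Services such as OAIster and the eBank UK aggregator gather records over OAI-PMH. A minimal sketch of one harvesting step using only the Python standard library: it builds a ListRecords request and parses one page of the response, returning any resumptionToken for the next page. The base URL and the trimmed sample response are hypothetical (real responses also carry responseDate, request and datestamp elements), and network access and error handling are left out.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, metadata_prefix="oai_dc", token=None):
    """Build an OAI-PMH ListRecords request. resumptionToken is an exclusive
    argument, so it replaces metadataPrefix on follow-up requests."""
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text):
    """Return (records, resumption_token) from one ListRecords response page."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(f"{OAI}record"):
        header = rec.find(f"{OAI}header")
        records.append({
            "identifier": header.findtext(f"{OAI}identifier"),
            "titles": [t.text for t in rec.iter(f"{DC}title")],
            "types": [t.text for t in rec.iter(f"{DC}type")],
        })
    token_el = root.find(f".//{OAI}resumptionToken")
    token = token_el.text if token_el is not None and token_el.text else None
    return records, token


# Trimmed, hypothetical response page.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example crystal structure</dc:title>
          <dc:type>CrystalStructure</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("http://example.org/oai"))
records, token = parse_list_records(SAMPLE)
print(records, token)
```

A full harvester would loop: request the first page, then keep requesting with the returned token until no token comes back.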


Linking and aggregating


Embedded in a science portal


Current situation

• Version 2.0 eBank metadata schema
• Pilot institutional e-data repository for harvesting (raw, derived, results data) using EPrints.org software
• Exports records as ebank_dc and oai_dc
• Validation of schema & discussion with International Union of Crystallography for final developments and wider deployment
• Pilot eBank UK aggregator service
• Developing search interface Version 1.0
• Testing with PSIgate physical sciences portal – embedding eBank UK
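Serving the same record as both ebank_dc and oai_dc implies a crosswalk from the richer format to simple Dublin Core. The sketch below assumes a record is a list of (term, value) pairs with "dc:"/"dcterms:" prefixes and applies the DCMI "dumb-down" idea of mapping each qualified term to the element it refines; it is not the actual EPrints export code, the refinement table covers only a few terms, and the "chem:" field is a hypothetical placeholder.

```python
# Qualified term -> simple Dublin Core element it refines (partial table).
REFINES = {
    "dcterms:references": "dc:relation",
    "dcterms:isReferencedBy": "dc:relation",
    "dcterms:hasPart": "dc:relation",
    "dcterms:created": "dc:date",
    "dcterms:abstract": "dc:description",
}


def to_oai_dc(fields):
    """Reduce (term, value) pairs to simple Dublin Core pairs.

    dc: terms pass through unchanged, known dcterms refinements are mapped to
    their parent element, and anything else (e.g. chemistry-specific fields)
    is dropped because oai_dc has no element for it.
    """
    simple = []
    for term, value in fields:
        if term.startswith("dc:"):
            simple.append((term, value))
        elif term in REFINES:
            simple.append((REFINES[term], value))
    return simple


record = [
    ("dc:title", "Example crystal structure"),
    ("dc:type", "CrystalStructure"),
    ("chem:inchi", "InChI=1S/H2O/h1H2"),                       # hypothetical field
    ("dcterms:isReferencedBy", "http://example.org/eprint/42"),  # hypothetical URL
]
print(to_oai_dc(record))
```

The loss is deliberate: oai_dc consumers still see the title, type and a relation link, while harvesters that request ebank_dc receive the full chemical description.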


What’s next?

• Progress towards generic metadata schemas
• Validation against other schemas (CCLRC Model)
• EPrints.org software: allow for more generic scientific data and schemas?
• Metadata enhancement: keywords based on knowledge of keywords in related publications?
• Investigate identifiers: International Chemical Identifier
• Explore context-sensitive linking
• Full embedding into chemical and crystallographic research and publishing
• e-Learning embedding and pedagogic evaluation
• Feasibility study in related domains


Breakout Session?

• Describing non-‘Dublin Core’ terms
  – Qualified Dublin Core
  – Complex object formats: METS vs MPEG-21 DIDL
  – Set & Friends containers
• Compliance between schemas
  – One generic schema
  – Develop multiple schemas
• Rights
  – Use / reuse
  – Publisher
• Linking & aggregating
  – DOI
  – Keyword ontologies
  – Identifiers
  – Context-sensitive linking