Create and receive scientific data

a centre of expertise in data curation and preservation

Digital Curation 101, October 6th-10th, 2008, NeSC, Edinburgh

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5 UK: Scotland License. To view a copy of this license, (a) visit http://creativecommons.org/licenses/by-nc-sa/2.5/scotland/; or (b) send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.

Create or Receive Scientific Data

Dr. Frank Gibson and Dr. Phillip Lord

Upload: frank-gibson

Post on 18-May-2015

DESCRIPTION

A talk given at the DCC Digital Curation 101 workshop, illustrating how to curate and manage scientific data by considering the content, syntax and semantics of the data.

TRANSCRIPT

Create or Receive

“In the standard model, one collects data, publishes a paper or papers and then gradually loses the original dataset.”

- Geoffrey Bowker

[Four image-only slides by Cameron Neylon http://www.slideshare.net/CameronNeylon]

If we have a paper, who cares about the data?

Image: http://flickr.com/photos/nicmcphee/2756494307/

A paper = a claim (or claims)

The full record that supports that claim should be available for detailed examination and critique

Slide by Cameron Neylon http://www.slideshare.net/CameronNeylon

[Image-only slide by Cameron Neylon http://www.slideshare.net/CameronNeylon]

1000+ Databases

Biocuration: Databases

Biocuration: Wiki

[Image-only slide by Cameron Neylon http://www.slideshare.net/CameronNeylon]

Funders

http://flickr.com/photos/luismimunoznajar/2093185804/

Create or Receive

Curation aims

• Amenable
• Preservable
• Ownable
• Accessible
• Citable

Significant Properties of Data

• Content
• Syntax
• Semantics

Content

Title

Creator

Type

Source

Date

Identifier

Publisher

Rights

Simple Dublin Core

Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, Rights
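
A minimal sketch of how the fifteen Simple Dublin Core elements might be attached to a dataset as an XML record. The namespace is the standard Dublin Core element set namespace; the `record` wrapper element, the function name and all field values are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
DC_ELEMENTS = [
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
]


def dublin_core_record(metadata):
    """Build an XML record from a dict of Simple Dublin Core elements.

    Every element is optional; keys outside the fifteen elements are rejected.
    """
    unknown = set(metadata) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"not Simple Dublin Core elements: {sorted(unknown)}")
    ET.register_namespace("dc", DC_NS)
    record = ET.Element("record")  # hypothetical wrapper element
    for name in DC_ELEMENTS:
        if name in metadata:
            child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
            child.text = metadata[name]
    return record


# Hypothetical values, for illustration only
example = dublin_core_record({
    "title": "2D gel electrophoresis of sample A",
    "creator": "A. Researcher",
    "date": "2008-10-06",
    "type": "Dataset",
})
xml_text = ET.tostring(example, encoding="unicode")
print(xml_text)
```

Because the element names are fixed and generic, a record like this can describe any dataset, which is also why domain-specific content needs more than Dublin Core alone.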

Content: Domain Specific

Syntax

Choosing a Syntax

• Openness
  - Is there an open, publicly available specification for the format; are its specifications in the public domain; is it unencrypted?
• Portability
  - Is the format independent of hardware, operating system and other software; is it independent of particular institutions, groups, or events; is it in widespread current use; does it contain little or no built-in functionality?
• Quality
  - Is it robust; simple; highly tested; loss-free?

Semantics

Semantics can be complex

One semantic = many words

Many words = one semantic

• Excel data example – do I need it?

• Zeeberg et al. BMC Bioinformatics 2004 5:80 doi:10.1186/1471-2105-5-80
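
The cited paper reports that a spreadsheet can silently turn gene symbols such as DEC1 into dates and clone identifiers such as 2310009E13 into floating-point numbers. A rough sketch of a pre-flight check follows; the two patterns are heuristics assumed here for illustration and are not taken from the paper.

```python
import re

# Symbols made of a month name or abbreviation plus a day number, e.g. SEPT2, DEC1
MONTH_LIKE = re.compile(
    r"^(JAN|FEB|MAR|MARCH|APR|MAY|JUN|JUL|AUG|SEP|SEPT|OCT|NOV|DEC)-?\d{1,2}$",
    re.IGNORECASE,
)
# Identifiers that read as scientific notation, e.g. 2310009E13
FLOAT_LIKE = re.compile(r"^\d+E\d+$", re.IGNORECASE)


def at_risk(identifier):
    """Return why a spreadsheet might rewrite this identifier, or None."""
    if MONTH_LIKE.match(identifier):
        return "may be read as a date"
    if FLOAT_LIKE.match(identifier):
        return "may be read as a floating-point number"
    return None


identifiers = ["SEPT2", "DEC1", "2310009E13", "TP53", "BRCA1"]
flagged = {name: at_risk(name) for name in identifiers if at_risk(name)}
print(flagged)
```

The content of the data is unchanged by the author, yet the tool alters its meaning; checking identifiers before and after a round trip through any editing tool is one way to notice this.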

What is fly?

• Fly: http://en.wikipedia.org/wiki/Image:Air_india_b747-400_vt-esn_arp.jpg
• Fly: http://en.wikipedia.org/wiki/Image:MuscuDomestica.jpg
• Fly: http://en.wikipedia.org/wiki/Image:Green_Highlander_salmon_fly.jpg
• Fly: http://en.wikipedia.org/wiki/Image:Fly_poster.jpg

Ontology

• A controlled vocabulary is an association between formal names (identifiers) and their definitions.
• An ontology is a controlled vocabulary augmented with logical constraints that describe the interrelationships between its terms.
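
The two definitions above can be sketched in a few lines. Everything here is invented for illustration: the `EX:` identifiers, the definitions and the single `is_a` relationship type are assumptions, not terms from any real ontology.

```python
# Controlled vocabulary: identifiers mapped to a name and a definition.
VOCABULARY = {
    "EX:0001": ("organism", "A living individual."),
    "EX:0002": ("insect", "An organism with six legs and a segmented body."),
    "EX:0003": ("house fly", "An insect of the species Musca domestica."),
    "EX:0004": ("vehicle", "A machine that transports people or goods."),
    "EX:0005": ("aeroplane", "A powered vehicle that flies."),
}

# Ontology: the same vocabulary plus relationships between its terms.
IS_A = {"EX:0002": "EX:0001", "EX:0003": "EX:0002", "EX:0005": "EX:0004"}

# One ambiguous word can point at several identifiers.
LABELS = {"fly": ["EX:0003", "EX:0005"]}


def ancestors(term_id):
    """Follow is_a links upward and return every broader term."""
    found = []
    while term_id in IS_A:
        term_id = IS_A[term_id]
        found.append(term_id)
    return found


def is_a(term_id, ancestor_id):
    """True if term_id is a kind of ancestor_id."""
    return ancestor_id in ancestors(term_id)


for term_id in LABELS["fly"]:
    name, _definition = VOCABULARY[term_id]
    print(term_id, name, "is an organism:", is_a(term_id, "EX:0001"))
```

Annotating data with the identifier rather than the word "fly" removes the ambiguity, and the relationships let software answer questions the annotation never stated directly.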

Ontologies for Life Science

• Bio-ontologies have emerged for two reasons
  - Consistent annotation of data
  - To add meaning and understanding that can be interpreted computationally
• Bio-ontologies are registered on the OBO Foundry

Application of Significant Properties in Proteomics

Minimum Information about a Proteomics Experiment (MIAPE)

• Sufficiency
  - The MIAPE guidelines should require sufficient information about a dataset and its experimental context to allow a reader to understand and critically evaluate the interpretation and conclusions, and to support their experimental corroboration.
• Practicability
  - Achieving compliance with MIAPE should not be so burdensome as to prohibit its widespread use.

Minimum reporting guidelines

• Describe content
• Implementation independent
• Impacts
  - Publication
  - Syntax
  - Semantics
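
A reporting guideline describes what must be present, not how it is stored, so a compliance check can be sketched against any key-value view of a report. The checklist items below are hypothetical placeholders and are not the actual MIAPE requirements.

```python
# Hypothetical checklist items for illustration; NOT the real MIAPE modules.
REQUIRED_ITEMS = ("sample", "protocol", "equipment", "software", "contact")


def missing_items(report, required=REQUIRED_ITEMS):
    """List the required items that are absent or blank in a report."""
    return [item for item in required if not str(report.get(item) or "").strip()]


def is_compliant(report, required=REQUIRED_ITEMS):
    """A report complies when no required item is missing."""
    return not missing_items(report, required)


# Hypothetical report with one blank and one absent item
report = {
    "sample": "protein extract, sample A",
    "protocol": "2D gel electrophoresis",
    "equipment": "",
    "software": "image analysis package",
}
print(missing_items(report))
print(is_compliant(report))
```

The check only inspects content, so the same checklist could be applied to a spreadsheet row, an XML document or a database record once each is read into a dictionary.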

Syntax for proteomics

• The content in MIAPE GE needs to be structured to facilitate
  - dissemination
  - transfer
  - storage
• A community development process to agree on a syntax
  - building upon the FuGE data model
  - A pre-existing community-developed representation of scientific experiments
  - Interoperable

FuGE

• Model of common components in science investigations, such as materials, data, protocols, equipment and software.
• Provides a framework for capturing complete laboratory workflows, enabling the integration of pre-existing data formats.

UML/XML/RDBMS

• UML gives structure (but not syntax)
  - Very abstract, very general
• XML provides a concrete syntax
  - Meta language is interoperable, checkable, viable and has basic metadata support (language, character encoding and so on).
  - Tends toward the verbose. Not (very) searchable by itself.
  - Therefore, a transfer and archive format.
• RDBMS
  - SQL is (sort of) a standard
  - Highly computationally amenable form; very good for searching
  - Conversion from XML is possible, but in a number of ways.
  - Hard work – nice to have an off-the-shelf implementation.
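
A small sketch of the division of labour described above: XML as the transfer format, a relational database as the searchable form. The XML document and the single-table schema are invented for illustration; they are not GelML or the FuGE schema, and this is only one of the many possible mappings.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical transfer document
XML_TEXT = """<experiment id="exp1">
  <material id="m1" name="protein extract"/>
  <material id="m2" name="polyacrylamide gel"/>
  <protocol id="p1" name="2D electrophoresis"/>
</experiment>"""


def load_into_database(xml_text, connection):
    """Flatten each child element of the experiment into one table row."""
    root = ET.fromstring(xml_text)
    connection.execute(
        "CREATE TABLE component "
        "(id TEXT PRIMARY KEY, kind TEXT, name TEXT, experiment TEXT)"
    )
    rows = [(c.get("id"), c.tag, c.get("name"), root.get("id")) for c in root]
    connection.executemany("INSERT INTO component VALUES (?, ?, ?, ?)", rows)
    connection.commit()


connection = sqlite3.connect(":memory:")
load_into_database(XML_TEXT, connection)
materials = connection.execute(
    "SELECT name FROM component WHERE kind = 'material' ORDER BY id"
).fetchall()
print(materials)
```

Choosing one table keyed by element name is a design decision, not a given; a table per element type would be equally valid, which illustrates why the conversion can be done in a number of ways.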

GelML

Semantics for Gels

Semantics for science

Curation of Gel experiments

[Diagram labels: Laboratory; Data entry and transfer; Public repositories; MIAPE GE; MIAPE GI; GelML; sepCV]

I) GelML data entry tools

II) Direct database submission

III) Automated export of GelInfoML

Discoverability and reuse

• Persistent Identifiers
• Rights management

Persistent Identifiers

• A name for a resource which will remain the same regardless of where the resource is located
• In biology typically assigned to data upon publication
• Type of identifier dependent on publication method
• Description and Representation Information provides more information about persistent identifiers
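
The idea of a name that stays fixed while the location changes can be sketched as a lookup table. The `Resolver` class, the `example:` identifier scheme and the URLs are all hypothetical; real identifier systems add governance and infrastructure that this toy omits.

```python
class Resolver:
    """Toy resolver: maps persistent identifiers to current locations."""

    def __init__(self):
        self._locations = {}

    def register(self, identifier, location):
        """Assign an identifier once; it is never reassigned to another resource."""
        if identifier in self._locations:
            raise ValueError(f"{identifier} is already assigned")
        self._locations[identifier] = location

    def relocate(self, identifier, new_location):
        """Record that the resource moved; the identifier itself is unchanged."""
        if identifier not in self._locations:
            raise KeyError(identifier)
        self._locations[identifier] = new_location

    def resolve(self, identifier):
        """Return the current location of the resource."""
        return self._locations[identifier]


resolver = Resolver()
pid = "example:dataset/42"  # hypothetical identifier
resolver.register(pid, "http://old-server.example.org/data/42.xml")
first = resolver.resolve(pid)
resolver.relocate(pid, "http://archive.example.org/gel/42.xml")
second = resolver.resolve(pid)
print(pid, first, second)
```

A paper that cites the identifier keeps working after the move, whereas a paper that cites the first URL does not.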

Rights management

• Difficult to determine
• Lots of legal issues
• In biology/bioinformatics tends to be open access
• Creative Commons

Receiving data for curation

• Content
• Syntax
• Semantics

Route map

• Who will receive it?
• What are their policies on Content, Syntax, Semantics?
• Plan your experiment to conform to Content, Syntax, Semantics
• Implement experiment to:
  - Collect appropriate Content
  - Structure in appropriate Syntax
  - Ensure Semantics are preserved
• Curate…

Meta Route Map

• How to build the map if you don’t have one yet.

Appraise and Select

• Investigates the evaluation and selection of data for long-term curation and preservation

Acknowledgments

• The CARMEN project
  - www.carmen.org.uk
• The Proteomics Standards Initiative (PSI)
  - http://psidev.info
• Colleagues at Newcastle University
  - Phillip Lord, Anil Wipat, Allyson Lister
