Using the Biological Collections Ontology to Advance Biodiversity Science

TDWG 2014, Jönköping, Sweden
Ramona Walls, John Wieczorek, Robert Guralnick, John Deck

Upload: rlwalls2008

Post on 14-Jul-2015

TRANSCRIPT

Page 1: Using the Biological Collections Ontology to Advance Biodiversity Science

Using the Biological Collections Ontology to Advance Biodiversity Science

TDWG 2014, Jönköping, Sweden

Ramona Walls
John Wieczorek
Robert Guralnick
John Deck

Page 2: Using the Biological Collections Ontology to Advance Biodiversity Science

Overview

1. How we model biodiversity information in the Biological Collections Ontology

2. Integrating ontologies into biodiversity information workflows

Page 3: Using the Biological Collections Ontology to Advance Biodiversity Science

Properties in an example Darwin Core record

• occurrenceID • modified • rights • institutionCode • collectionCode • datasetName • basisOfRecord • dynamicProperty • catalogNumber • recordedBy • sex • preparations • otherCatalogNumbers • associatedMedia • associatedReferences • associatedSequences • eventDate • year • month • day

• fieldNumber • eventRemarks • higherGeography • continent • waterBody • islandGroup • island • country • stateProvince • county • locality • minimumDepthInMeters • maximumDepthInMeters • locationRemarks • decimalLatitude • decimalLongitude • geodeticDatum • coordinateUncertaintyInMeters • georeferencedBy

• georeferencedDate • georeferenceSources • georeferenceRemarks • identifiedBy • dateIdentified • typeStatus • scientificName • kingdom • phylum • class • order • family • genus • specificEpithet • infraspecificEpithet • scientificNameAuthorship

Page 4: Using the Biological Collections Ontology to Advance Biodiversity Science

Properties in an example Darwin Core record

RECORD

Page 5: Using the Biological Collections Ontology to Advance Biodiversity Science

Properties in an example Darwin Core record

MATERIAL SAMPLE & ORGANISM

Page 6: Using the Biological Collections Ontology to Advance Biodiversity Science

Properties in an example Darwin Core record

EVENT & OCCURRENCE

Page 7: Using the Biological Collections Ontology to Advance Biodiversity Science

Properties in an example Darwin Core record

LOCATION

Page 8: Using the Biological Collections Ontology to Advance Biodiversity Science

Properties in an example Darwin Core record

IDENTIFICATION/TAXON
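
The slides above sort one flat Darwin Core record into class-level groups. A minimal Python sketch of that idea follows. The transcript does not preserve which properties each slide highlighted, so the partial assignment in `GROUPS` is an assumption, `group_record` is a hypothetical helper name, and the example values are made up.

```python
# Assumed, partial assignment of Darwin Core properties to three of the
# groups named on the slides. The transcript does not record the actual
# highlighting, so treat this table as illustrative only.
GROUPS = {
    "EVENT & OCCURRENCE": ["eventDate", "year", "month", "day",
                           "fieldNumber", "eventRemarks"],
    "LOCATION": ["country", "stateProvince", "locality",
                 "decimalLatitude", "decimalLongitude", "geodeticDatum"],
    "IDENTIFICATION/TAXON": ["identifiedBy", "dateIdentified",
                             "scientificName", "family", "genus"],
}


def group_record(record):
    """Split a flat record (dict) into one dict per group.

    Properties that are not listed in GROUPS end up under "UNASSIGNED".
    """
    lookup = {prop: group for group, props in GROUPS.items() for prop in props}
    grouped = {}
    for key, value in record.items():
        grouped.setdefault(lookup.get(key, "UNASSIGNED"), {})[key] = value
    return grouped


# Made-up record, for illustration only.
example = {
    "catalogNumber": "12345",
    "eventDate": "2014-10-27",
    "country": "Sweden",
    "decimalLatitude": "57.78",
    "scientificName": "Parus major",
}

grouped = group_record(example)
for group, props in grouped.items():
    print(group, props)
```

Keeping the assignment in a single table means the grouping can be corrected or extended without touching the function.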

Page 9: Using the Biological Collections Ontology to Advance Biodiversity Science

Using DwC properties in BCO: Event as an example

Page 10: Using the Biological Collections Ontology to Advance Biodiversity Science

Material entities, information entities, and processes in the Basic Formal Ontology

Page 11: Using the Biological Collections Ontology to Advance Biodiversity Science

Mapping DwC classes to BCO:basisOfRecord terms as an example

Page 12: Using the Biological Collections Ontology to Advance Biodiversity Science

How to create RDF triples (using Ontology terms) for biodiversity data

Check for an easy way first!
See if you can use the BiSciCol triplifier (http://biscicol.org/triplifier/) or a similar tool that automates file conversion for specific formats. If not, proceed.

Create Mapping File
• Create groups of columns and assign them to relevant classes.
• Define the columns containing a URI identifier for each class within each distinct record.
• If you're not importing an existing ontology, create relationships between classes.
Assemble these into a mapping file; the format depends on the tool used in the next step.

Use Conversion Tool
Check out WebKarma (http://www.isi.edu/integration/karma/) or D2RQ (http://d2rq.org/).

Send to Triple-Store
Upload the data to a triple store or SPARQL endpoint (e.g., Virtuoso, http://www.openlinksw.com/).

http://www.wikihow.com/Create-RDF-Triples-%28Using-Ontology-Terms%29-for-Biodiversity-Data
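
The mapping and conversion steps above can be sketched by hand in Python. This is not how the triplifier, WebKarma, or D2RQ work internally; it only illustrates the idea of grouping columns by class, taking one URI column per class, and linking the classes. The `MAPPING` layout, the `http://example.org/` identifiers, the `relatedTo` predicate, and the sample row are all assumptions made for illustration. The Darwin Core namespace and the `rdf:type` IRI are the real ones.

```python
import csv
import io

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DWC = "http://rs.tdwg.org/dwc/terms/"  # Darwin Core terms namespace
EX = "http://example.org/"             # hypothetical namespace

# Hypothetical mapping "file": columns grouped by class, plus the column
# holding the URI identifier of each class instance within a record.
MAPPING = {
    "Occurrence": {"class_iri": DWC + "Occurrence",
                   "id_column": "occurrenceID",
                   "columns": ["catalogNumber", "recordedBy"]},
    "Event": {"class_iri": DWC + "Event",
              "id_column": "eventID",
              "columns": ["eventDate"]},
}
# Relationship between classes; the predicate is a placeholder, not a BCO term.
RELATIONS = [("Occurrence", EX + "relatedTo", "Event")]


def literal(value):
    """Quote a string as a plain N-Triples literal (simplified escaping)."""
    escaped = (value.replace("\\", "\\\\")
                    .replace('"', '\\"')
                    .replace("\n", "\\n"))
    return f'"{escaped}"'


def triplify(rows):
    """Turn an iterable of row dicts into a list of N-Triples lines."""
    lines = []
    for row in rows:
        for spec in MAPPING.values():
            subject = f"<{row[spec['id_column']]}>"
            lines.append(f"{subject} <{RDF_TYPE}> <{spec['class_iri']}> .")
            for col in spec["columns"]:
                if row.get(col):  # skip empty cells
                    lines.append(f"{subject} <{DWC}{col}> {literal(row[col])} .")
        for src, predicate, dst in RELATIONS:
            s = row[MAPPING[src]["id_column"]]
            o = row[MAPPING[dst]["id_column"]]
            lines.append(f"<{s}> <{predicate}> <{o}> .")
    return lines


# Made-up single-row input, for illustration only.
CSV_TEXT = """occurrenceID,eventID,catalogNumber,recordedBy,eventDate
http://example.org/occ/1,http://example.org/event/1,12345,A. Collector,2014-10-27
"""

triples = triplify(csv.DictReader(io.StringIO(CSV_TEXT)))
print("\n".join(triples))
```

The printed lines are N-Triples text, which could then be written to a file and loaded into a triple store in the final step.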

Page 13: Using the Biological Collections Ontology to Advance Biodiversity Science

Specimen data from a Darwin Core Archive: VertNet

Page 14: Using the Biological Collections Ontology to Advance Biodiversity Science

iMicrobe data links specimens to metagenomic sequences and environmental parameters

Collecting event: location, depth, weather, cruise, biome, site description, temperature

Metagenomic sequence: library accession #, sequencing method, molecule type, number of reads, …

Parameters: salinity, pH, fluorescence, turbidity, sample volume, silicate, oxygen, dissolved organic carbon, …
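
As a rough illustration of the linkage this slide describes, the sketch below models a collecting event, its samples, and the sequences and parameters attached to each sample as Python dataclasses. The class names, field names, one-to-many links, and all values are assumptions made for illustration; they are not iMicrobe's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class MetagenomicSequence:
    library_accession: str
    sequencing_method: str
    molecule_type: str
    number_of_reads: int


@dataclass
class Sample:
    sample_id: str
    # Environmental parameters measured for this sample, e.g. salinity or pH.
    parameters: dict = field(default_factory=dict)
    # Assumed one-to-many link from a sample to its sequences.
    sequences: list = field(default_factory=list)


@dataclass
class CollectingEvent:
    location: str
    depth_m: float
    # Assumed one-to-many link from an event to its samples.
    samples: list = field(default_factory=list)


# Placeholder values, for illustration only.
sequence = MetagenomicSequence("ACC-0001", "shotgun", "DNA", 100000)
sample = Sample("sample-1", parameters={"salinity": 35.0, "pH": 8.1},
                sequences=[sequence])
event = CollectingEvent("station A", depth_m=10.0, samples=[sample])

# Walk from the event down to the reads and parameters linked to it.
total_reads = sum(seq.number_of_reads
                  for s in event.samples for seq in s.sequences)
print(total_reads, event.samples[0].parameters["pH"])
```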

Page 15: Using the Biological Collections Ontology to Advance Biodiversity Science

iMicrobe data mapped to BCO

Page 16: Using the Biological Collections Ontology to Advance Biodiversity Science

Linking prospective data to ontologies is much easier!

query

Page 17: Using the Biological Collections Ontology to Advance Biodiversity Science

Conclusions

• BCO can work across different data types, not just for DwC.

• The work of producing BCO has forced us to look at DwC definitions more rigorously.

• BCO provides an opportunity to manage parts of the DwC vocabulary as controlled vocabularies that are rigorously, logically defined. Example: basisOfRecord

• Road map for this work includes the intention to propose BCO as a TDWG standard.

Page 18: Using the Biological Collections Ontology to Advance Biodiversity Science

Acknowledgments

• Dozens of participants at BCO workshops and hackathons over the past two years

• NSF-EAGER: An Interoperable Information Infrastructure for Biodiversity Research (I3BR)

• NSF: Research Coordination Network for GSC (RCN4GSC)

• Gordon and Betty Moore Foundation (iMicrobe)

• VertNet

• University of Kansas Biodiversity Institute
