Using the Biological Collections Ontology to Advance Biodiversity
Science
TDWG 2014, Jönköping, Sweden
Ramona Walls
John Wieczorek
Robert Guralnick
John Deck
Overview
1. How we model biodiversity information in the Biological Collections Ontology
2. Integrating ontologies into biodiversity information workflows
Properties in an example Darwin Core record
• occurrenceID • modified • rights • institutionCode • collectionCode • datasetName • basisOfRecord • dynamicProperty • catalogNumber • recordedBy • sex • preparations • otherCatalogNumbers • associatedMedia • associatedReferences • associatedSequences • eventDate • year • month • day
• fieldNumber • eventRemarks • higherGeography • continent • waterBody • islandGroup • island • country • stateProvince • county • locality • minimumDepthInMeters • maximumDepthInMeters • locationRemarks • decimalLatitude • decimalLongitude • geodeticDatum • coordinateUncertaintyInMeters • georeferencedBy
• georeferencedDate • georeferenceSources • georeferenceRemarks • identifiedBy • dateIdentified • typeStatus • scientificName • kingdom • phylum • class • order • family • genus • specificEpithet • infraspecificEpithet • scientificNameAuthorship
RECORD
MATERIAL SAMPLE & ORGANISM
EVENT & OCCURRENCE
LOCATION
IDENTIFICATION/TAXON
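The idea of splitting one flat Darwin Core record into class-level groups can be sketched in a few lines of Python. This is only an illustration: the `TERM_GROUPS` table assigns a handful of terms to the group labels above as an assumption (the original slides showed the grouping by highlighting, which is not recoverable here), and the record values are made up.

```python
# Assumed assignment of a few Darwin Core terms to the group labels above.
# The real slides conveyed the grouping visually; this table is illustrative.
TERM_GROUPS = {
    "country": "LOCATION",
    "decimalLatitude": "LOCATION",
    "decimalLongitude": "LOCATION",
    "scientificName": "IDENTIFICATION/TAXON",
    "identifiedBy": "IDENTIFICATION/TAXON",
    "eventDate": "EVENT & OCCURRENCE",
}


def group_record(record, term_groups):
    """Split a flat record (term -> value) into one dict per group."""
    grouped = {}
    for term, value in record.items():
        # Terms without an assumed group are kept, but set aside.
        group = term_groups.get(term, "UNASSIGNED")
        grouped.setdefault(group, {})[term] = value
    return grouped


# Hypothetical flat record with made-up values.
flat_record = {
    "eventDate": "2014-10-27",
    "country": "Sweden",
    "decimalLatitude": "57.78",
    "decimalLongitude": "14.16",
    "scientificName": "Puma concolor",
    "rights": "CC0",
}

grouped = group_record(flat_record, TERM_GROUPS)
for group, terms in grouped.items():
    print(group, terms)
```

Keeping unassigned terms in their own bucket makes it easy to see which properties still need a home in the model.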
Using DwC properties in BCO: Event as an example
Material entities, information entities, and processes in the Basic Formal Ontology
Mapping DwC classes to BCO: basisOfRecord terms as an example
How to create RDF triples (using Ontology terms) for biodiversity data
Check for an easy way first!
See if you can use the BiSciCol triplifier (http://biscicol.org/triplifier/) or a similar tool that automates file conversion for specific formats. If not, proceed.
Create Mapping File
• Create groups of columns and assign them to relevant classes.
• Define the columns containing a URI identifier for each class within each distinct record.
• If you’re not importing an existing ontology, create relationships between classes.
Assemble these into a mapping file, whose format depends on the tool used in the next step.
Use Conversion Tool
Check out WebKarma (http://www.isi.edu/integration/karma/) or D2RQ (http://d2rq.org/).
Send to Triple-Store
Upload the data to a triple-store or SPARQL endpoint (e.g., Virtuoso, http://www.openlinksw.com/).
http://www.wikihow.com/Create-RDF-Triples-%28Using-Ontology-Terms%29-for-Biodiversity-Data
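The mapping and conversion steps above can be sketched in plain Python. This is a minimal illustration, not the format used by the triplifier, WebKarma, or D2RQ: the `MAPPING` structure, the `row_to_ntriples` helper, the `http://example.org/...` identifiers, and the `relatedTo` predicate are all hypothetical. Only the Darwin Core term namespace and the `rdf:type` URI are real. The output is N-Triples text, which most triple-stores can load.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DWC = "http://rs.tdwg.org/dwc/terms/"

# Hypothetical mapping: column groups per class, the column holding each
# class's URI identifier, and relationships between classes.
MAPPING = {
    "classes": {
        "occurrence": {
            "type": DWC + "Occurrence",
            "id_column": "occurrenceID",
            "columns": ["catalogNumber", "recordedBy"],
        },
        "event": {
            "type": DWC + "Event",
            "id_column": "eventID",
            "columns": ["eventDate"],
        },
    },
    # Placeholder predicate; a real mapping would use an ontology term.
    "relations": [("occurrence", "http://example.org/terms/relatedTo", "event")],
}


def literal(value):
    """Format a string as an N-Triples literal, escaping special characters."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def row_to_ntriples(row, mapping):
    """Convert one flat record into a list of N-Triples lines."""
    lines = []
    subjects = {}
    for name, spec in mapping["classes"].items():
        subject = row.get(spec["id_column"])
        if not subject:
            continue  # no identifier, so no instance of this class
        subjects[name] = subject
        lines.append(f"<{subject}> <{RDF_TYPE}> <{spec['type']}> .")
        for column in spec["columns"]:
            value = row.get(column)
            if value:
                lines.append(f"<{subject}> <{DWC}{column}> {literal(value)} .")
    for source, predicate, target in mapping["relations"]:
        if source in subjects and target in subjects:
            lines.append(f"<{subjects[source]}> <{predicate}> <{subjects[target]}> .")
    return lines


# Hypothetical record with made-up values.
example_row = {
    "occurrenceID": "http://example.org/occurrence/1",
    "eventID": "http://example.org/event/7",
    "catalogNumber": "12345",
    "recordedBy": "A. Collector",
    "eventDate": "2014-10-27",
}

triples = row_to_ntriples(example_row, MAPPING)
print("\n".join(triples))
```

Writing the joined lines to a `.nt` file gives something that can be uploaded in the final step. Keeping the mapping as data rather than code means the same function can be reused for other column layouts.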
Specimen data from a Darwin Core Archive: VertNet
iMicrobe data links specimens to metagenomic sequences and environmental parameters
Collecting event:
• location
• depth
• weather
• cruise
• biome
• site description
• temperature
• …
Metagenomic sequence:
• library accession #
• sequencing method
• molecule type
• number of reads
• …
Parameters:
• salinity
• pH
• fluorescence
• turbidity
• sample volume
• silicate
• oxygen
• dissolved organic carbon
• …
iMicrobe data mapped to BCO
Linking prospective data to ontologies is much easier!
query
Conclusions
• BCO can work across different data types, not just for DwC.
• The work of producing BCO has forced us to look at DwC definitions more rigorously.
• BCO provides an opportunity to manage parts of the DwC vocabulary as controlled vocabularies that are rigorously and logically defined.
– Example: basisOfRecord
• The road map for this work includes proposing BCO as a TDWG standard.
Acknowledgments
• Dozens of participants at BCO workshops and hackathons over the past two years
• NSF-EAGER: An Interoperable Information Infrastructure for Biodiversity Research (I3BR)
• NSF: Research Coordination Network for GSC (RCN4GSC)
• Gordon and Betty Moore Foundation (iMicrobe)
• VertNet
• University of Kansas Biodiversity Institute