Interoperability of taxon treatments. Lecture at the Final Meeting of the Pro-iBiosphere Conference, Meise, Belgium. http://wiki.pro-ibiosphere.eu/wiki/Final_Conference
Donat Agosti, Plazi
Brussels, June 2, 2014
Supported by the European Commission through its FP7 research funding programme
Interoperability of Taxon Treatments
Hardisty, Nature 502, 171 (2013)
BUT: predictive ecology has substantial data needs
Harfoot, BIH2013, Rome, 2013
The big question
What is the future of the biological world?
Imagine if we could:
…Predict community level dynamics of ecosystems at scales from local to global, based on the ecology and biology of all individual organisms
200,000,000+ printed pages
1,900,000 species described
20,000,000+ species treatments
17,000 new species per year
Biodiversity libraries
BUT: The data are hidden
Incomplete digitization
Publications are not semantically enhanced
Collections are incomplete
Data are not linked
Most data are not open
Interoperability of taxa
Can we build a system (e.g. Open Biodiversity Knowledge Management System) that includes a component that extracts, stores and serves information on taxa in a system that is agnostic of Biota?
Traditionally Floras, Faunas, Mycotas are dealt with by different communities
The Pro‐iBiosphere project is to develop a blueprint of an Open Knowledge Management System
It is not building a system
Pilots to demonstrate specific issues
interoperability of taxa
explore workflows to produce recommendations of «best» practices
interoperability of infrastructures
registration of names
advanced publishing
Do not expect production level products
Each taxonomic name usage has its treatment
Treatment
Formica obsoleta Linnaeus, 1758: 580
Treatment as standard containers
http://en.wikipedia.org
Pilot 1: Taxa used for markup
Taxa          Documents   Treatments
Mistletoes    3           124
Chenopodium   15          174
Fungi         5           5
Bryophyta     2           25
Nephrolepis   1           35
Centipedes    50          154
Ants          40          486
Spiders       30          219
TOTAL         ca. 140     ca. 1500
Chenopodium pilot
Pardosa logunovi
Spider pilot: machine access to content through markup
Spider pilot: overview of 34 OA Zootaxa publications
5170 specimens
4062 plottable specimens from 1138 unique locations
Pseudomyrmex ants and Vachellia ant‐acacias are a classic example of mutualism in biology.
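As a rough illustration of how counts of this kind could be derived once specimen records are accessible through markup, the sketch below tallies specimens, plottable specimens (those carrying coordinates) and unique locations. The record layout (`taxon`, `lat`, `lon`) and the sample coordinates are hypothetical and are not taken from Plazi's data model or from the pilot data.

```python
from typing import NamedTuple, Optional


class Specimen(NamedTuple):
    """Hypothetical specimen record extracted from a marked-up treatment."""
    taxon: str
    lat: Optional[float]  # None when the source gives no coordinates
    lon: Optional[float]


def summarize(specimens):
    """Count all specimens, plottable ones, and distinct coordinate pairs."""
    plottable = [s for s in specimens if s.lat is not None and s.lon is not None]
    locations = {(s.lat, s.lon) for s in plottable}
    return {
        "specimens": len(specimens),
        "plottable": len(plottable),
        "unique_locations": len(locations),
    }


# Made-up sample values, for illustration only.
records = [
    Specimen("Pardosa logunovi", 50.1, 88.3),
    Specimen("Pardosa logunovi", 50.1, 88.3),
    Specimen("Pardosa logunovi", None, None),
]
summary = summarize(records)
print(summary)
```

A specimen without coordinates still counts toward the total, which is why the plottable figure on the slide is lower than the specimen figure.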
allenii
melanoceras
ruddiae
chiapensis
collinsii
cookii
cornigera
globulifera
hindsii
janzenii
mayana
sphaerocephala
boopis
flavicornis
hesperius
ita
janzeni
kuenckeli
mixtecus
nigrocinctus
nigropilosus
opaciceps
particeps
peperi
reconditus
satanicus
simulans
spinicola
subtilissimus
veneficus
ferrugineus
gentlei
gracilis
Transbiotic link network
Associated species linked through references in taxonomic treatments
Acacia‐ant species: Pseudomyrmex gracilis
Treatment: original description
Treatment: redescription
Associated ant‐acacia: Acacia gentlei
Ants Plants
Photocredits: Alex Wild
Treatment
Treatments linked through citations
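One way to picture such a link network is as an undirected graph whose nodes are treatments and whose edges come from one treatment mentioning the taxon of another. The sketch below is only an illustration under assumptions: the record fields (`id`, `taxon`, `mentions`) and the identifiers are hypothetical, and each taxon is simplified to a single treatment, whereas in practice a taxon has several (original description, redescriptions).

```python
from collections import defaultdict

# Hypothetical treatment records: an identifier, the taxon treated,
# and the other taxa named in the treatment text.
treatments = [
    {"id": "treatment-1", "taxon": "Pseudomyrmex gracilis",
     "mentions": ["Acacia gentlei"]},
    {"id": "treatment-2", "taxon": "Acacia gentlei", "mentions": []},
]


def build_links(treatments):
    """Link treatments in both directions whenever one mentions the other's taxon."""
    by_taxon = {t["taxon"]: t["id"] for t in treatments}
    links = defaultdict(set)
    for t in treatments:
        for name in t["mentions"]:
            target = by_taxon.get(name)
            if target is not None and target != t["id"]:
                links[t["id"]].add(target)
                links[target].add(t["id"])
    return dict(links)


links = build_links(treatments)
print(links)
```

Here the ant treatment and the plant treatment end up linked even though only the ant treatment mentions its partner, which is the transbiotic connection the slides describe.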
Transbiotic interoperability
Pro‐iBiosphere: 1,000 treatments
Plazi: 10,000 treatments
Pensoft: 23,000 treatments
Total: 34,000 treatments
Legacy literature
Prospective literature
All data in Plazi
14,590 specimens
8900 plottable specimens from 1138 unique locations
Brazil
5170 specimens
4062 plottable specimens from 1138 unique locations
Brazil
Journal of Hymenoptera Research
5170 specimens
4062 plottable specimens from 1138 unique locations
Interoperability of taxa
Can we build a system (e.g. Open Biodiversity Knowledge Management System) that includes a component that extracts, stores and serves information on taxa in a system that is agnostic of Biota?
Yes, we can.
Legacy Prospective
Digitization √
OCR / Text capture √
Markup √ (√)
Standardization √ √
Strategies to markup √
External links √ (√)
Semantic enhancement √ (√)
Create content √ (√)
Issues and Recommendations
Find the right mix of generic and domain specific solutions
Plazi SRS
find scan «OCR» markup store
? domain domain generic
Digitization and Markup Workflow:
$$$$ ?
200,000 Taxonomic Articles in Zoological Record Since 1864
Create Content: selection strategy
Markup / data extraction strategies
Dedicated external services, bulk
Applications for individual contributor, small scale
Involve community / crowd / wikimedia
Ad hoc Web Services, individual
Mixed strategies
Combination with re‐publishing, small scale
Create market for treatments, large scale
Variation in status labels
TaxStatus         Total
comb. nov.        246
G. N.             65
gen. nov.         19
gen.nov.          10
hybr. nov.        13
n sp              12
n. comb.          2
n. nom.           6
n. sp.            267
n. stat.          5
n. subg.          3
new combination   139
new species       651
NEW STATUS        114
nomen novum       6
nov. spec.        1
REVISED STATUS    10
s. str.           1
sp. n.            130
sp. nov.          4057
sp.n.             3
spec. nov.        34
stat. nov.        56
Status revised    9
subsp. nov.       26
var. nov.         80
(blank)
Grand Total       5965
Standardize and apply in prospective publishing …
Quality Control and Standardization
«sp.nov.»
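The variation in status labels suggests a normalization step before counting. The sketch below is one possible approach, not the method used by Plazi: it strips case, spacing and punctuation from a label and looks the result up in a small table. Which spellings count as the same status is an assumption made here for illustration; the counts are the ones listed in the status label table.

```python
import re
from collections import Counter

# Assumed equivalences between label variants (key: label reduced to letters).
CANONICAL = {
    "spnov": "sp. nov.", "spn": "sp. nov.", "nsp": "sp. nov.",
    "specnov": "sp. nov.", "novspec": "sp. nov.", "newspecies": "sp. nov.",
    "gennov": "gen. nov.",
    "combnov": "comb. nov.", "ncomb": "comb. nov.",
    "newcombination": "comb. nov.",
}


def normalize(label):
    """Map a status label to its canonical form; unknown labels pass through."""
    key = re.sub(r"[^a-z]", "", label.lower())
    return CANONICAL.get(key, label)


# Counts as given in the status label table.
counts = {
    "sp. n.": 130, "sp. nov.": 4057, "sp.n.": 3, "spec. nov.": 34,
    "n sp": 12, "n. sp.": 267, "new species": 651, "nov. spec.": 1,
    "gen. nov.": 19, "gen.nov.": 10,
}

merged = Counter()
for label, n in counts.items():
    merged[normalize(label)] += n

print(merged["sp. nov."], merged["gen. nov."])
```

Under these assumed equivalences, eight spellings of "new species" collapse into a single label. Labels that the table does not recognize are returned unchanged so they can be reviewed by hand.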
Standardization of markup
Formica rufa Linnaeus 1758: 426
Genus name
Species epithet
Name
Authority
Year of publication
Page of publication
Bibliographic reference
Treatment citation
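To make the parts of a treatment citation concrete, the sketch below splits a citation string of the form shown on the slides into genus name, species epithet, authority, year and page. The regular expression and the field names are assumptions for illustration; real citations vary far more (subgenera, multiple authors in parentheses, page ranges) and would need a more tolerant parser.

```python
import re

# Assumed pattern: "Genus epithet Authority[,] YYYY: page"
CITATION = re.compile(
    r"^(?P<genus>[A-Z][a-z]+) "
    r"(?P<epithet>[a-z]+) "
    r"(?P<authority>[^\d,]+?),? "
    r"(?P<year>\d{4}): "
    r"(?P<page>\d+)$"
)


def parse_citation(text):
    """Return the citation parts as a dict, or None if the pattern does not match."""
    match = CITATION.match(text.strip())
    if match is None:
        return None
    parts = match.groupdict()
    parts["year"] = int(parts["year"])
    parts["page"] = int(parts["page"])
    return parts


print(parse_citation("Formica rufa Linnaeus 1758: 426"))
print(parse_citation("Formica obsoleta Linnaeus, 1758: 580"))
```

Both citation strings from the slides parse with this pattern, with or without the comma after the authority.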
Linking of treatment as an example for external links
Treatment citation
Treatment identifier
Conclusions
• Biodiversity literature is very rich in data
• BL has a basic structure (treatments) across all Biota
• Legacy literature should be strategically marked up
• Prospective literature should be semantically enhanced
• Markup tools exist and should be optimized
• Identifiers for treatments exist to link to treatments