irmng – the interim register of marine and nonmarine genera: rationale and current status tony...
- Slide 1
IRMNG, the Interim Register of Marine and Nonmarine Genera: rationale and current status
Tony Rees, CSIRO Marine and Atmospheric Research, Australia
for: GN-CoL names and taxonomy sharing workshop, Hawaii, March 2012
www.obis.org.au/irmng
- Slide 2
Tony Rees: IRMNG March 2012
The Dream
Imagine a system that would:
Automatically classify any genus and species name to kingdom / phylum / class / order / family (as far down as possible), i.e. "what is this critter?", plus hierarchical relations, e.g. parents / children / siblings
Return whether a name is current (valid) or non-current, e.g. a synonym
Check spelling for correctness, also authority details, plus supply original publication ref. as available
Return associated attributes such as extant / fossil status, habitat information, geographic / geologic range, and more
Work seamlessly, with a single point of entry, across all groups and geologic epochs including the present day
Be as up-to-date as possible (latest content), and authoritative (maintained by relevant experts)
- Slide 3
Realising the Dream
For extant taxa: role of Cat. of Life, however ~30% of species still to go; for fossil taxa: PaleoDB (unknown proportion missing, maybe 50%?)
In the meantime, could make progress by assembling a global genera list, and infilling with species names as available
IRMNG is an attempt along these lines: a work in progress, with modest resourcing, but available for use now.
genera / species
- Slide 4
IRMNG data sources
Animal genera + auths from Nomenclator Zoologicus and elsewhere, tax. placements and synonymies from multiple sources including CoL, individual taxon treatments and printed works
Botanical genera and auths from Index Nominum Genericorum (ING) supplemented with other sources, tax. placements and synonymies from multiple sources including GRIN (APGIII in the main), Index Fungorum, AlgaeBase, CyanoDB, more
Prokaryote genera, auths and tax.
placements from LPSN (Euzéby list), previous/non-valid names from multiple sources
Virus genera and tax. placements from ICTV db (multiple versions, very different through time)
Species lists (all groups) from CoL 2006, Aphia/WoRMS 2006, AFD, NZ Organisms Register + more.
- Slide 5
IRMNG content as at March 2012 (cf. e.g. Cat. of Life):
Not all IRMNG genera yet linked to relevant families, but ~370k are (remainder linked to higher taxon, i.e. phylum, class or order)
Extant/fossil, marine/nonmarine flags held for majority of names
Nomenclatural status known for most names; tax. status, i.e. valid name/synonym, for only a subset at this time (varies by group)
Authority known for >97% of genera, publication details for animal subset (from Nomenclator Zoologicus in the main)
Fuzzy matching (TAXAMATCH) deployed over all web-based queries for correction of potential errors in input names to be matched.
IRMNG: 19k families, 454k genera, 1.46m species names (including synonyms)
Cat. of Life (2011 version): 8k families, 178k genera, 2.25m species names (including synonyms)
- Slide 6
IRMNG in practice: example genus = Lawsonia
Same name is currently a valid genus in 3 Codes, i.e. plants, animals and bacteria (no barriers to this)
- Slide 7
Required base information is scattered in multiple systems / printed works at this time (etc.)
plant / animal / bacterium
- Slide 8
Required base information is scattered in multiple systems / printed works at this time (etc.)
plant / animal / bacterium
- Slide 9
IRMNG query as at March 2012
- Slide 10
IRMNG query as at March 2012
parents / children / synonym of (as known) / extant, habitat flags
- Slide 11
Note: IRMNG fields displayed on the web are only a subset of full information held for any name, e.g.:
- Slide 12
IRMNG core fields
IRMNG ID, Rank
Scientific name (for species: epithet + parent ID)
Authority
Publication (as microcitation subset with link to refs. module)
Source(s) for above
Orthography verified against (authoritative source)
Parent ID (+ according to), Linnaean ranks only at this time
Nomenclatural status (+ relation with other names as needed) + according to
Taxonomic status (same)
Nomenclatural Code
Taxonomic or nomenclatural remarks
Extant/fossil, marine/nonmarine flags + according to (could be as per parent)
Date entered, last modified, deprecated (where required)
(under consideration)
Intermediate ranks e.g. subfamily, subgenus, also infraspecies (not currently held)
Type genus / species indicator
Freshwater / terrestrial flags vs. present nonmarine
Geo flags (country codes etc.)
Palaeo range (periods/epochs)
Vernacular names as available
- Slide 13
IRMNG is not just a passive aggregator
Editorial / curatorial decisions / actions required to:
Correct obvious data errors
Assemble complete records from multiple sources (where one source data deficient)
Normalise authority data (in particular) to a house style
Digitise or transcribe print material into electronic form where not otherwise available
Decide between conflicting content in data sources e.g. for authority orthography/year, taxonomic placement, valid/synonym status and more
Cross-link names e.g. synonyms -> current names, basionyms -> replacement names, misspelled names to their correctly spelled counterparts, etc.
Reconcile variant higher taxonomies as supplied to a single hierarchy
Add nomenclatural or taxonomic remarks as required.
- Slide 14
Relevance to present meeting?
Demonstrates utility of a single entry point to a system permitting query on any name, i.e. a [comprehensive] Taxonomic Name Resolution Service (TNRS) covering all life
Envisage something like OBIS or GBIF, but for taxonomy: the aggregator / central query point is not a content author, but provides integration and value-added services
IRMNG based on static snapshot/s of multiple data sources; cf. a super catalogue should be based on live feeds from relevant authoritative sources, continuously updated as available (?+ some static data not available as feeds)
Maybe the static data lives outside the data aggregation/query point, becomes a separately managed source
How does / should GNA facilitate this?
Will the need for an IRMNG (or IRMNG equivalent) disappear or grow in the above scenario? (for example could this role be taken by another player or group of players)
- Slide 15
Thank you!
- Slide 16
(supplementary slides)
- Slide 17
Size of the task: IRMNG 2011 content cf. Cat. of Life 2011

| Rank | Cat. of Life, 2011 edition | % with auth's | IRMNG Oct 2011, extant + fossil | % with auth's | IRMNG Oct 2011, fossil only |
|---|---|---|---|---|---|
| Kingdoms | 8 | | 7 | | (0) |
| Phyla | 111 | | 153 | | (12) |
| Classes | 288 | | 509 | | (64) |
| Orders | 1,233 | | 2,645 | | (715) |
| Families | 8,071 | 0% | 19,639 | 22.1% | (6,542) |
| Subfamilies | - | - | - | - | - |
| Genera | 178,515 | 0% | 452,848 | 97.1% | (90,278) |
| Subgenera | - | - | - | - | - |
| Species (valid) | 1,347,224 | ~100% | 1,020,519 | ~100% | (16,792) |
| Species (synonyms) | 895,441 | ~100% | 440,738 | ~100% | (100) |

CoL has 70% of valid extant species names (of est. 1.9m total), thus maybe also 70% of valid extant genera (with subset of genus-level synonyms)
IRMNG has further ~180k extant genus names and ~90k fossil names at this time (including syns); est.
~25k still missing
- Slide 18
Taxonomic names: what the customer is currently offered (+ more)
Diagram labels: publication discovery; official registers; taxon-specific DBs; integrated DBs; all names; Botany; Zoology; New names published (in primary literature); ZOOBANK?; ICTV Viruses DB; LPSN (Prokaryote names); ICBN Decisions; ICZN Decisions; Journal TOCs, RSS feeds, text mining; Abstracting services; Subject bibliographies; Reviews, secondary literature; Zoological Record; ION (Index of Organism Names); ChecklistBank; GNI; GNUB; Catalogue of Life; ITIS; NCBI Taxonomy; WoRMS etc.; CyanoDB; Plant GSDs; PaleoDB; Animal GSDs; other compilations e.g. regional lists, Wikispecies, Wikipedia, more; The Plant List, IPNI, TROPICOS, ING; Index Fungorum; MycoBank; AlgaeBase; Nomenclator Zoologicus
- Slide 19
Two approaches - GNI and Cat. of Life
NameBank / GNI
20m+ names
all ranks, no hierarchy
mix of clean and dirty names
many duplicates
extant + fossil, most sectors with at least some names
- Slide 20
GNI search result: Lawsonia (all ranks returned) (Mar 2012)
candidate genus names highlighted in red (although could be other ranks too)
need access to original taxonomic / nomenclatural resources to sort out / see if anything missed
- Slide 21
Two approaches - GNI and Cat. of Life
NameBank / GNI; Cat. of Life
20m+ names
all ranks, no hierarchy
mix of clean and dirty names
many duplicates
extant + fossil, most sectors with at least some names
- Slide 22
Cat. of Life search result: Lawsonia (Mar 2012)
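
The core fields (Slide 12) and the Lawsonia example (Slides 6 to 10) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `NameRecord` class, its field names, the numeric IDs and the `lineage` / `resolve` helpers are hypothetical, the toy records are not IRMNG content, and the standard-library `difflib` matcher is only a generic stand-in for fuzzy matching, not the TAXAMATCH algorithm. Purely for brevity, the animal and bacterial Lawsonia are attached directly to a kingdom.

```python
from dataclasses import dataclass
from difflib import get_close_matches
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class NameRecord:
    """Hypothetical, much-reduced version of the core fields on Slide 12."""
    record_id: int               # stand-in for the IRMNG ID
    rank: str                    # Linnaean ranks only
    name: str                    # scientific name
    parent_id: Optional[int]     # None for a top-level taxon
    code: Optional[str] = None   # nomenclatural Code, where relevant


# Toy data for illustration only; IDs and placements are NOT IRMNG content.
RECORDS = [
    NameRecord(1, "kingdom", "Plantae", None),
    NameRecord(2, "kingdom", "Animalia", None),
    NameRecord(3, "kingdom", "Bacteria", None),
    NameRecord(10, "family", "Lythraceae", 1),
    NameRecord(11, "genus", "Lawsonia", 10, "botanical"),
    NameRecord(20, "genus", "Lawsonia", 2, "zoological"),
    NameRecord(30, "genus", "Lawsonia", 3, "prokaryote"),
]
BY_ID = {r.record_id: r for r in RECORDS}


def lineage(record: NameRecord) -> List[Tuple[str, str]]:
    """Follow parent IDs to the root; return (rank, name) pairs, top-down."""
    chain = []
    current: Optional[NameRecord] = record
    while current is not None:
        chain.append((current.rank, current.name))
        if current.parent_id is None:
            current = None
        else:
            current = BY_ID.get(current.parent_id)
    chain.reverse()
    return chain


def resolve(query: str, rank: str = "genus", cutoff: float = 0.8) -> List[NameRecord]:
    """Return every record at `rank` whose name matches `query`.

    An exact match is preferred; otherwise the closest spelling above
    `cutoff` is used (generic difflib similarity, not TAXAMATCH).
    """
    names = sorted({r.name for r in RECORDS if r.rank == rank})
    if query in names:
        matched = [query]
    else:
        matched = get_close_matches(query, names, n=1, cutoff=cutoff)
    return [r for r in RECORDS if r.rank == rank and r.name in matched]


if __name__ == "__main__":
    # A misspelled query still returns all three homonyms, one per Code.
    for rec in resolve("Lawsonnia"):
        print(rec.code, " > ".join(name for _, name in lineage(rec)))
```

The point of the sketch is that a single query on one spelling returns one record per nomenclatural Code, each with its own parent chain, which is the situation the Lawsonia slides describe.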