remsen lect04

Post on 16-Jan-2015


DESCRIPTION

David Remsen lecture on Tuesday, Sept 15, 2009, for the Biodiversity Informatics Course, a Swedish Taxonomy Initiative (Svenska Artprojektet) course at the Swedish Natural History Museum, Stockholm, supported by the Swedish Species Service (ArtDatabanken) and the Swedish GBIF node.

TRANSCRIPT

GLOBAL BIODIVERSITY INFORMATION FACILITY

David Remsen, Senior Programme Officer, GBIF
15 September 2009, Biodiversity Informatics
WWW.GBIF.ORG

Global Names Architecture

• A Rationale
• Brief History
• Components

All accumulated information of a species is tied to a scientific name, a name that serves as a link between what has been learned in the past and what we today add to the body of knowledge.

- Grimaldi & Engel, 2005, Evolution of the Insects

Biodiversity Information: A focus on taxa

Biodiversity Informatics: Creation, Curation, Discovery, Delivery of biodiversity information

A name that serves as a link to what has been learned in the past…

From T.E. Glover, The Fishes of Southwestern Japan, c.1870

Unlike many other domains of science, historic publications have continued importance.

…and what we today add to the body of knowledge.

From T.E. Glover, The Fishes of Southwestern Japan, c.1870

GBIF index

• 177 million records (> 5%/month)
• Gigabytes of text (~100 now)

All data mobilized through GBIF

Biodiversity Information

Species information “tied” to scientific names

The “Names Problem”

• Not stable: 5-10% of names invalidated per decade
• Not unique
• No complete list of names
• No complete list of species
• No agreement on how many, even within a single group
• Impacts discovery and access of information about species

The “Names Problem”: Properties of Names

• Orthographic (as labels of text that are “tied” to information about species)
• Nomenclature (as the core “words” of taxonomy that tie a name to an original publication and type)
• Taxonomy (as components of taxon definitions derived via authoritative taxonomic rigor)

Orthography

Orthography and the Names Problem

Objectives for Remediation

Variations in name spelling

Loligo pealeii
Loligo pealii
Loligo pealei

Some names are harder to spell than others

Actinobacillus actimomycetemcomitans
Actinobacillus actimycetemcomitans
Actinobacillus actinmycetemcomitans
Actinobacillus actinomicetemcomitans
Actinobacillus actinomy
Actinobacillus actinomyce
Actinobacillus actinomycemcomitans
Actinobacillus actinomyceremcomitans
Actinobacillus actinomycetam
Actinobacillus actinomycetamcomitans
Actinobacillus actinomycetecomitans
Actinobacillus actinomycetemcmitans
Actinobacillus actinomycetemcomintans
Actinobacillus actinomycetemcomitance
Actinobacillus actinomycetemcomitans
Actinobacillus actinomycetemcomitants
Actinobacillus actinomycetemcommitans
Actinobacillus actinomycetemocimitans
Actinobacillus actinomycetencomitans
Actinobacillus actinomycetum
Actinobacillus actinomyctemcomitans
Actinobacillus actinomyectomcomitans
Actinobacillus actinomyetemcomitans
Actinobacillus actinonmycetemcomitans
Actinobacillus actionomycetemcomitans
Actinobacillus actynomicetemcomitans
Actinobacillus antinomycetemcomitans

• Difficulties with Latinized Names
• Transcription errors
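Transcription variants like these are often caught with approximate string matching. The following is a minimal sketch and not a method described in the lecture: it uses Python's standard-library `difflib` to compare a spelling against a small reference list. The reference list, the function name `closest_reference_name`, and the 0.85 similarity cutoff are assumptions made for illustration.

```python
import difflib
from typing import Optional

# Assumed reference list; a real service would draw on an authoritative names source.
REFERENCE_NAMES = [
    "Actinobacillus actinomycetemcomitans",
    "Loligo pealeii",
]


def closest_reference_name(candidate: str, cutoff: float = 0.85) -> Optional[str]:
    """Return the most similar reference name, or None if nothing is close enough."""
    matches = difflib.get_close_matches(candidate, REFERENCE_NAMES, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    for variant in [
        "Actinobacillus actinomycetemcommitans",
        "Actinobacillus antinomycetemcomitans",
        "Loligo pealii",
    ]:
        print(variant, "->", closest_reference_name(variant))
```

The cutoff is a trade-off: a lower value catches heavily truncated forms but raises the risk of matching a different, genuinely distinct name.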

Which one is the correct one?

Agalinus paupercula borealis
Agalinus pauperculum borealis
Agalinis paupercula var. Borealis
Agalinus pauperculum var. borealis
Agalinus paupercula var. borealis
Agalinus paupercula var. borealis Pennell
Agalinus paupercula Britton var. borealis Pennell
Agalinus paupercula (Gray) Britt. var. borealis Pennell
Agalinis paupercula (A.Gray) Britton var. borealis Pennell
Agalinus paupercula (Gray) Britton var. borealis (Pennell) Zenkert 1934

Gerardia paupercula borealis
Gerardia paupercula var. borealis
Gerardia paupercula var. borealis (Pennell) Deam
Gerardia paupercula (Gray) Britt. var. borealis (Pennell) Deam
Gerardia paupercula (Gray) Britt. var. borealis (Pennell) Deam
Gerardia paupercula (A. Gray) Britton var. borealis (Pennell) Deam

Gerardia paupercula (A. Gray) Britton subsp. borealis (Pennell) Pennell
Gerardia paupercula (Gray) Britt. ssp. borealis (Pennell) Pennell
Gerardia paupercula Britton ssp. borealis Pennell

Many ways to correctly spell a name

Should GBIF/EoL/BHL display all/one/some?
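Many of the variants in this list differ only in authorship and rank markers, so one way to group them is to reduce each string to a bare "canonical form". The sketch below is a deliberately naive illustration and not a tool from the lecture: the function name `canonical_form` and the rule "keep the first word plus every all-lower-case word" are assumptions, and the rule misreads variants such as "var. Borealis", where the epithet is capitalised.

```python
import re

# Rank markers that appear in the list above; a fuller parser would know many more.
RANK_MARKERS = {"var.", "subsp.", "ssp."}


def canonical_form(name_string: str) -> str:
    """Reduce a name string to the genus plus its lower-case epithets."""
    # Drop parenthesised authorship such as "(A. Gray)" or "(Pennell)".
    without_parens = re.sub(r"\([^)]*\)", " ", name_string)
    tokens = without_parens.split()
    if not tokens:
        return ""
    genus, rest = tokens[0], tokens[1:]
    # Keep only purely alphabetic lower-case tokens; this skips rank markers,
    # capitalised author names, abbreviations ending in "." and years.
    epithets = [
        tok for tok in rest
        if tok not in RANK_MARKERS and tok.isalpha() and tok.islower()
    ]
    return " ".join([genus] + epithets)


if __name__ == "__main__":
    for variant in [
        "Agalinus paupercula (Gray) Britton var. borealis (Pennell) Zenkert 1934",
        "Gerardia paupercula (A. Gray) Britton subsp. borealis (Pennell) Pennell",
        "Gerardia paupercula Britton ssp. borealis Pennell",
    ]:
        print(canonical_form(variant))
```

Canonical forms only remove authorship noise. Spelling differences such as Agalinus versus Agalinis, and the placement in Gerardia versus Agalinis, still need separate reconciliation.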

Objectives

Informatics can contribute:

• Index names occurring in content we wish to publicise and access
• Develop tools to extract, catalog, and match names
• Reconcile names to authoritative names sources via a common resolution path
• Reconcile name occurrence to taxonomic concepts via a common concept resolution path
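As a rough illustration of the first three objectives, the sketch below extracts candidate binomials from free text, keeps those found in an authoritative list, and records which documents mention each name. Everything in it is hypothetical: the regular expression, the two-name authoritative set, the sample documents, and the function names are assumptions for illustration, not components presented in the lecture.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

# Assumed authoritative list; in practice this would come from a names source.
AUTHORITATIVE_NAMES = {"Loligo pealeii", "Halichondria panicea"}

# Candidate binomial: a capitalised word followed by a lower-case word.
# This over-matches ordinary prose ("Specimens of"), so candidates are
# filtered against the authoritative list afterwards.
BINOMIAL = re.compile(r"\b([A-Z][a-z]+) ([a-z]{2,})\b")


def extract_candidates(text: str) -> List[str]:
    """Return every 'Capitalised lowercase' word pair found in the text."""
    return [f"{genus} {epithet}" for genus, epithet in BINOMIAL.findall(text)]


def build_index(documents: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each authoritative name to the ids of the documents that mention it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in documents.items():
        for name in extract_candidates(text):
            if name in AUTHORITATIVE_NAMES:
                index[name].add(doc_id)
    return dict(index)


if __name__ == "__main__":
    documents = {
        "doc1": "Specimens of Loligo pealeii were collected near the harbour.",
        "doc2": "The sponge Halichondria panicea and the squid Loligo pealeii co-occur.",
    }
    for name, doc_ids in sorted(build_index(documents).items()):
        print(name, sorted(doc_ids))
```

Exact lookup against the authoritative list would miss misspelled occurrences, which is where approximate matching and a shared resolution path come in.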

Nomenclature

Nomenclatural aspects of the names problem.

Approaches for remediating them

Don’t pass on bad information.

How can we determine the status of the names we discover in content that we serve?

Nomenclatural changes impact search and retrieval

Where can I find out these names are related?

Zoological Code doesn’t track recombinations

Botanical Code does.

Homonyms

Peranema – the fern

Peranema – the euglenid

How many Peranema are there?

How can I tell them apart?

Kingdom Phylum Class Order Family Genus

Plantae Magnoliophyta Magnoliopsida Apiales Umbelliferae Oenanthe

Plantae Oenanthe Oenanthe

Plantae Magnoliophyta Magnoliopsida Apiales Apiaceae Oenanthe

Plantae Orchidaceae Oenanthe

Animalia Chordata Aves Passeriformes Muscicapidae Oenanthe

Animalia Chordata Aves Passeriformes Turdidae Oenanthe

Animalia Chordata Actinopterygii Perciformes Pomatomidae Pomatomus

Animalia Chordata Pisces Perciformes Serranidae Pomatomus

Taxonomic context alone doesn’t tell me enough.
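The point can be made concrete by counting how many distinct classification strings accompany each genus in the rows above. This is a small sketch under stated assumptions: the rows are copied as given, the first word is treated as the kingdom and the last as the genus, and no attempt is made to assign the remaining words to ranks, since several rows omit ranks. The function name `summarise` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Rows copied from the table above, with whatever ranks each source supplied.
ROWS = [
    "Plantae Magnoliophyta Magnoliopsida Apiales Umbelliferae Oenanthe",
    "Plantae Oenanthe Oenanthe",
    "Plantae Magnoliophyta Magnoliopsida Apiales Apiaceae Oenanthe",
    "Plantae Orchidaceae Oenanthe",
    "Animalia Chordata Aves Passeriformes Muscicapidae Oenanthe",
    "Animalia Chordata Aves Passeriformes Turdidae Oenanthe",
    "Animalia Chordata Actinopterygii Perciformes Pomatomidae Pomatomus",
    "Animalia Chordata Pisces Perciformes Serranidae Pomatomus",
]


def summarise(rows: List[str]) -> Dict[str, Tuple[int, int]]:
    """For each genus, count distinct classification paths and distinct kingdoms."""
    paths = defaultdict(set)
    kingdoms = defaultdict(set)
    for row in rows:
        parts = row.split()
        kingdom, genus = parts[0], parts[-1]
        paths[genus].add(tuple(parts[:-1]))  # everything above the genus
        kingdoms[genus].add(kingdom)
    return {genus: (len(paths[genus]), len(kingdoms[genus])) for genus in paths}


if __name__ == "__main__":
    for genus, (n_paths, n_kingdoms) in summarise(ROWS).items():
        print(f"{genus}: {n_paths} classification paths, {n_kingdoms} kingdom(s)")
```

The counts show several different paths per genus, but they cannot say which differences reflect true homonyms and which are merely different classifications of the same genus.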

Approaches to remediation

Consolidate the major nomenclatural databases

• A single nomenclatural dictionary
• Populate with provisionally verified records and enable open annotation
• Provides nomenclatural status of a name
• Collectively identifies all homonyms. Identifiers used in taxonomic data provide disambiguation context
• Ties all distinct nomenclatural combinations to the original published name.

Informatics

• Promote global identifiers and simple resolution pathway for these data
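A toy data structure can illustrate what such a dictionary offers: one identifier per name record, a lookup that returns all homonyms for a name string, and a link from each later combination back to the original published name. This is a sketch only. The identifiers `n1` to `n4`, the class and field names, and the link from the Gerardia combination to the Agalinis one are all assumptions; the link is inferred from the author citations in the earlier list, where "(Pennell) Deam" suggests a recombination of a name first published by Pennell.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class NameRecord:
    record_id: str                     # hypothetical identifier
    name: str
    note: str = ""
    original_id: Optional[str] = None  # None when this record is the original name


class NomenclaturalDictionary:
    def __init__(self, records: List[NameRecord]) -> None:
        self._by_id: Dict[str, NameRecord] = {r.record_id: r for r in records}

    def homonyms(self, name: str) -> List[NameRecord]:
        """Return every record that shares this exact name string."""
        return [r for r in self._by_id.values() if r.name == name]

    def original(self, record_id: str) -> NameRecord:
        """Follow original_id links until the original published name is reached."""
        record = self._by_id[record_id]
        while record.original_id is not None:
            record = self._by_id[record.original_id]
        return record


# Illustrative records only; identifiers and links are assumptions.
RECORDS = [
    NameRecord("n1", "Peranema", note="the fern"),
    NameRecord("n2", "Peranema", note="the euglenid"),
    NameRecord("n3", "Agalinis paupercula var. borealis"),
    NameRecord("n4", "Gerardia paupercula var. borealis", original_id="n3"),
]

if __name__ == "__main__":
    dictionary = NomenclaturalDictionary(RECORDS)
    print([r.record_id for r in dictionary.homonyms("Peranema")])
    print(dictionary.original("n4").name)
```

Because taxonomic data would cite `n1` or `n2` rather than the bare string "Peranema", the identifier itself carries the disambiguation context.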

Taxonomy

Taxonomic examples of the Names Problem

Approaches for remediating them

Taxonomic synonyms

Halichondria panicea (Pallas 1776) sec Van Soest 2002 (WoRMS)

Consequences of Splitting

Taxon Concept problem: What does someone mean when they refer to P. carinii?

The Perils of Lumping

Bear Lodge meadow jumping mouse: Zapus hudsonius campestris

Preble’s meadow jumping mouse: Zapus hudsonius preblei

Dr. Rob Roy Ramey says: Z. h. campestris INCLUDES Z. h. preblei

Dr. Tim King says: Z. h. campestris DOES NOT INCLUDE Z. h. preblei

What should a search for “Zapus hudsonius campestris” return?
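One way to frame the question is that the answer depends on whose taxon concept the search follows. The sketch below, using the spelling Zapus for the genus, models each concept as a name plus the authority it is "according to" and the set of names it includes. It assumes the slide pairs INCLUDES with Ramey and DOES NOT INCLUDE with King; the class, field, and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

CAMPESTRIS = "Zapus hudsonius campestris"
PREBLEI = "Zapus hudsonius preblei"


@dataclass(frozen=True)
class TaxonConcept:
    name: str
    according_to: str
    includes: FrozenSet[str]  # all names covered by this concept


# Assumed reading of the slide: Ramey lumps preblei into campestris, King does not.
CONCEPTS = [
    TaxonConcept(CAMPESTRIS, "Ramey", frozenset({CAMPESTRIS, PREBLEI})),
    TaxonConcept(CAMPESTRIS, "King", frozenset({CAMPESTRIS})),
]


def names_to_search(name: str, according_to: str) -> List[str]:
    """Expand a searched name into every name its concept includes."""
    for concept in CONCEPTS:
        if concept.name == name and concept.according_to == according_to:
            return sorted(concept.includes)
    return [name]  # no known concept: fall back to the literal name


if __name__ == "__main__":
    for authority in ("Ramey", "King"):
        print(authority, "->", names_to_search(CAMPESTRIS, authority))
```

Under the first view a search for campestris would also retrieve records filed under preblei; under the second it would not.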

Different taxonomic views, different # species, different names

Taxonomic Backbones: Scope and completeness

Organisational value of Non-Taxonomic Lists

Approaches to remediation

• An inventory of different taxonomic catalogues
• Inform if there are concept issues for the species
• Provide synonymised taxon concepts with unique and resolvable identifiers
• Multiple classifications via checklists and catalogues accessible and utilised as organisational frameworks for species information

Summary

A data publication framework that enables:

• A complete index of all names that are tied to information about species
• Tools and infrastructure to support this

A complete index of verified nomenclature and an identification and resolution system that makes it easy to tie a name to an authoritative record.

A global taxonomic resolution system that allows a particular usage of a name to be tied to a defined taxon.

A system that puts taxonomy as a global organisational framework for species information.

Inventory and Index

uBio Indexes

Web Service outputs Taxon Object

Web Service calls from client applications

Taxonomic organisation of content

Indexes support processes that support discovery

That enable new and better tools and services

Formalise the Architecture

Coordinate Communities of Interest

Summary: GNA Objectives

A complete index of names tied to information about species reconciled to a common and verified nomenclatural dictionary.

This same dictionary forms the basis for multiple expressions of taxonomic catalogues, regional checklists, and thematic lists of species.

These lists are openly accessible and tied to services and processes that enable them to be effectively employed in data organisation and retrieval.

Collectively, these components serve the delivery and utilisation of biological knowledge.

Thank you

dremsen@gbif.org
Skype: dremsen
