15 October 2005, Richard White - Sp2000/ENBI - Stockholm

Litchi: interlinking species information systems

Richard White, Andrew Jones, Ed Donovan
Computer Science, Cardiff University

Naomi Russell, John Robinson
Biological Sciences, University of Southampton

Judith Stephen
Plant Sciences, University of Reading

[email protected]


Key role of species names

Species names are the key to biodiversity information

• Trend towards large biodiversity databases and global systems

• Manual merging of taxonomic databases is very time-consuming

• Users want to browse “seamlessly” from one web-site to another

• Users want to assemble reliable data sets drawn from several sources, but information on naming “conflicts” is hard to find and checking for them is tedious


Ambiguous nomenclature

• Challenges in creating global biodiversity information systems by merging and linking databases:

– ambiguities arise from the way scientific names refer to species

– for example, if two species are combined, one of the original names must be re-used to refer to the new concept

– conversely, when a species is divided into two, one part must retain the original name


Ambiguous nomenclature

• The problems are inherent in the subjective nature of the species concept

– they cannot be removed by, for example, using numbers instead of names

– (unless a completely new name or number is invented every time the circumscription changes, and even then, the correspondence between these new names or numbers still needs to be captured)


A problem in Biodiversity Informatics

• The way species are named may affect the reliability and usability of species information systems

• The differing interpretations of species names are being addressed in the TCS (Taxon Concept Schema) and “Berlin data model” databases, but such data sets still need to be created and the differing concepts documented and recorded; this is a major task

• Techniques to handle the problem semi-automatically can be developed

• Some of these issues were addressed in the LITCHI project …


Litchi version 1

• We modelled the knowledge integrity rules implicit in the assemblage of scientific names and synonyms used to represent each taxon in a taxonomic treatment (checklist or database)

• We formulated rules for integrity and conflict

– in English

– in definite clauses of logic

– in a Prolog model

• We devised and tested algorithms

– to detect and report conflicts

– to manage the partially-automated correction of the conflicting elements

• We built and operated a prototype software system for merging checklists and checking the integrity of individual checklists; it is freely available (but scarcely usable)


Litchi 2

• “Intelligent linking” aims to protect users from nomenclatural ambiguities and to explain them

• Development of these techniques would be easier if we had an explicit representation of the overlaps between species in different databases

• Such “cross-maps” can be constructed automatically using similar rules in the new Litchi version 2


LITCHI architecture - requirements

• Web-based – taxonomic editors in different institutions may wish to collaborate

• Persistence - an editor may update a checklist over many weeks or months

• Common interchange format(s) for the communication of data sets between applications

• Ability to support related software components such as cross-map editors and servers


Litchi’s Role in Sp2000 Europa

To create revised versions of LITCHI for taxonomic database integrity checking:

• Task 5.6A: Quality Checker for taxonomic integrity in a single taxonomic treatment

• Task 5.6B: Taxonomic Conflict Detector for checking and reporting breaches of taxonomic integrity between two taxonomic treatments

• Task 5.6C: Cross-map Generator (version of the Taxonomic Conflict Detector) to generate a “cross-map” semi-automatically for use in “intelligent linking” between databases


How Does Litchi Work?

• The rules are derived from taxonomic practice in the creation of checklists.

• Litchi uses these rules to examine the taxonomic treatments:

– Check a single checklist or two checklists for errors

– Compare two checklists to discover relationships between their taxa

• Some of the rules used on a single checklist can also be used on two checklists


Types of integrity and conflict rules

• How a scientific name should be composed (Rules of Nomenclature)

• Rules for citing the assemblage of names and synonyms for one taxon

• Rules of integrity between the taxa in a taxonomic treatment

• Rules for detecting conflicts between treatments

• Rules for classifying conflicts to describe the “concept relationship” or to determine the action to be taken
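As a rough illustration of the last rule type, the sketch below classifies the relationship between two taxa from different treatments by comparing the sets of names (accepted name plus synonyms) each treatment cites. This is only a naive heuristic under the assumption that shared names approximate shared circumscription; it is not Litchi's actual rule set, and the function name `classify_relationship` and the placeholder names "Aus bus" and "Aus cus" are hypothetical.

```python
def classify_relationship(names_a, names_b):
    """Classify taxon A relative to taxon B by comparing their name sets.

    Heuristic only (an assumption for illustration): overlap in cited
    names is treated as a proxy for overlap in taxon concept.
    """
    a, b = set(names_a), set(names_b)
    if not a & b:
        return "disjoint"      # no names in common
    if a == b:
        return "identical"     # same assemblage of names
    if a > b:
        return "larger"        # A cites everything B does, and more
    if a < b:
        return "smaller"       # A cites only part of what B does
    return "overlapping"       # some names shared, some unique to each


# Hypothetical placeholder names, not real taxa.
names_a = {"Aus bus L.", "Aus cus Mill."}
names_b = {"Aus bus L."}
print(classify_relationship(names_a, names_b))
```

A real classifier would also need to weigh pro-parte synonyms and expert judgement, which a pure set comparison cannot capture.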


Internal Checking Rules - example

When examining the consistency of a single checklist these are some of the rules that may be used:

• A name with the same Latin components but different authorities cannot appear as an accepted name of different taxa.

Mercurialis elliptica Lam. (accepted)

Mercurialis elliptica Poir. (accepted)
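A minimal sketch of how such a rule might be checked, assuming each accepted name is a simple binomial followed by its authority. The helper names `split_name` and `find_accepted_homonyms` are hypothetical and do not come from Litchi, whose rules were expressed in logic clauses and Prolog.

```python
from collections import defaultdict


def split_name(full_name):
    """Split 'Genus epithet Authority' into (Latin part, authority).

    Assumes a plain binomial: the first two words are the Latin components.
    """
    parts = full_name.split()
    return " ".join(parts[:2]), " ".join(parts[2:])


def find_accepted_homonyms(accepted_names):
    """Return Latin names accepted under more than one authority."""
    authorities = defaultdict(set)
    for full_name in accepted_names:
        latin, authority = split_name(full_name)
        authorities[latin].add(authority)
    return {
        latin: sorted(auths)
        for latin, auths in authorities.items()
        if len(auths) > 1
    }


checklist = [
    "Mercurialis elliptica Lam.",
    "Mercurialis elliptica Poir.",
    "Euphorbia pygmaea Ledeb.",
]
conflicts = find_accepted_homonyms(checklist)
print(conflicts)  # {'Mercurialis elliptica': ['Lam.', 'Poir.']}
```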


Internal Checking Rules - example

• A full name, which is not a pro-parte name, cannot appear both as an accepted name and as a synonym.

Euphorbia ledebourii Boiss. [accepted]

Euphorbia pygmaea Ledeb. [accepted]

– Euphorbia ledebourii Boiss. [synonym]

– Tithymalus pygmaeus (Ledeb.) Klotzsch & Garcke [synonym]

(This particular conflict can be “repaired” if an expert says that E. ledebourii has been split, so that the synonym can be labelled “p.p.”)
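The rule and its repair can be sketched as follows. The data layout (a list of dicts with a pro-parte flag per synonym) and the function name `find_accepted_synonym_clashes` are assumptions made for illustration, as is the reading that both synonyms are listed under E. pygmaea; the slide does not describe how Litchi stores checklists.

```python
def find_accepted_synonym_clashes(taxa):
    """Report full names used both as an accepted name and as a
    synonym that is not marked pro parte.

    taxa: list of dicts with keys
      'accepted' -> full accepted name (str)
      'synonyms' -> list of (full name, is_pro_parte) tuples
    Returns a list of (clashing name, accepted name of the taxon citing it).
    """
    accepted = {taxon["accepted"] for taxon in taxa}
    clashes = []
    for taxon in taxa:
        for name, is_pro_parte in taxon["synonyms"]:
            if name in accepted and not is_pro_parte:
                clashes.append((name, taxon["accepted"]))
    return clashes


def make_checklist(ledebourii_pro_parte):
    """Build the example checklist; the flag models the expert's repair."""
    return [
        {"accepted": "Euphorbia ledebourii Boiss.", "synonyms": []},
        {
            "accepted": "Euphorbia pygmaea Ledeb.",
            "synonyms": [
                ("Euphorbia ledebourii Boiss.", ledebourii_pro_parte),
                ("Tithymalus pygmaeus (Ledeb.) Klotzsch & Garcke", False),
            ],
        },
    ]


before = find_accepted_synonym_clashes(make_checklist(False))
after = find_accepted_synonym_clashes(make_checklist(True))
print(before)  # one clash reported
print(after)   # [] once the synonym is labelled p.p.
```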


Litchi 2.2 in use


Log-in screen


Main menu bar


Importing a list


Starting a run


Monitor progress or choose result set to browse


Name relationships (two lists)


Choosing a cross-map to view


Viewing the cross-map


Editing the cross-map


Managing the rules


An “intelligent” system

• It would know about the differing taxon concepts in various databases

• It would help the user work with such data by interpreting these differences for the user

• It would assist the user in navigating from one database to another where the concepts are different


“Mr Linnaeus”

• A web-based mock-up to explore aspects of the user interface of a system for interpreting “taxonomically intelligent links”

• Prepared by Helen Bradbrook, an MSc student in the School of Plant Sciences at the University of Reading


Services available and planned

• Submit data sets, run rule sets, create cross-maps (Litchi web interface)

• Given a taxon in database A, find taxa (if any) in database B which have identical, larger, smaller or overlapping taxon concepts (Web service)

• Obtain entire cross-map (Web service)

• Extension for hierarchies (higher taxa)
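The lookup service described above might behave roughly like the following in-memory sketch. The `CrossMapEntry` record, the `find_related` function and the taxon identifiers are all hypothetical; the slides do not specify the actual web service interface or cross-map format.

```python
from typing import NamedTuple


class CrossMapEntry(NamedTuple):
    """One hypothetical cross-map row linking a taxon in A to one in B."""
    taxon_a: str
    taxon_b: str
    relationship: str  # "identical", "larger", "smaller" or "overlapping"


def find_related(cross_map, taxon_a, relationship=None):
    """Given a taxon in database A, list (taxon in B, relationship) pairs.

    If relationship is given, only entries of that kind are returned.
    """
    return [
        (entry.taxon_b, entry.relationship)
        for entry in cross_map
        if entry.taxon_a == taxon_a
        and (relationship is None or entry.relationship == relationship)
    ]


# Hypothetical identifiers, for illustration only.
cross_map = [
    CrossMapEntry("A:taxon-1", "B:taxon-7", "identical"),
    CrossMapEntry("A:taxon-2", "B:taxon-8", "larger"),
    CrossMapEntry("A:taxon-2", "B:taxon-9", "larger"),
]

print(find_related(cross_map, "A:taxon-2"))
print(find_related(cross_map, "A:taxon-3"))  # no matching taxa in B
```

Returning an empty list for an unmapped taxon mirrors the "(if any)" in the service description.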


Acknowledgements

• Funding: BBSRC, NERC (UK), European Commission

• Ideas: Frank Bisby, Andrew Jones, Alex Gray

• Testing: Judith Stephen, Yde de Jong

• Programmers: John Robinson, Naomi Russell

• Litchi 1: Suzanne Embury, Iain Sutherland

• Cartoons: Helen Bradbrook