15 October 2005, Richard White - Sp2000/ENBI - Stockholm
Litchi: interlinking species information systems
Richard White, Andrew Jones, Ed Donovan (Computer Science, Cardiff University)
Naomi Russell, John Robinson (Biological Sciences, University of Southampton)
Judith Stephen (Plant Sciences, University of Reading)
Key role of species names
Species names are the key to biodiversity information
• Trend towards large biodiversity databases and global systems
• Manual merging of taxonomic databases very time-consuming
• Users want to browse “seamlessly” from one web-site to another
• Users want to assemble reliable data sets drawn from several sources, but information on naming “conflicts” is hard to find and checking for them is tedious
Ambiguous nomenclature
• Challenges in creating global biodiversity information systems by merging and linking databases:
– ambiguities arise from the way scientific names refer to species
– for example, if two species are combined, one of the original names must be re-used to refer to the new concept
– conversely, when a species is divided into two, one part must retain the original name
Ambiguous nomenclature
• The problems are inherent in the subjective nature of the species concept
– they cannot be removed by, for example, using numbers instead of names
– (unless a completely new name or number is invented every time the circumscription changes, and even then, the correspondence between these new names or numbers still needs to be captured)
A problem in Biodiversity Informatics
• The way species are named may affect the reliability and usability of species information systems
• Differing interpretations of species names are being addressed in the TCS (Taxon Concept Schema) and “Berlin data model” databases, but such data sets need to be created and the differing concepts documented and recorded; this is a major task
• Techniques to handle the problem semi-automatically can be developed
• Some of these issues were addressed in the LITCHI project …
Litchi version 1
• We modelled the knowledge integrity rules implicit in the assemblage of scientific names and synonyms used to represent each taxon in a taxonomic treatment (checklist or database)
• We formulated rules for integrity and conflict
– in English
– in definite clauses of logic
– in a Prolog model
• Devised and tested algorithms
– to detect and report conflicts
– to manage the partially-automated correction of the conflicting elements
• Built and operated a prototype software system for merging checklists & checking integrity of individual checklists; freely available (but scarcely usable)
Litchi 2
• “Intelligent linking” aims to protect users from nomenclatural ambiguities and to explain them
• Development of these techniques would be easier if we had an explicit representation of the overlaps between species in different databases
• Such “cross-maps” can be constructed automatically using similar rules in the new Litchi version 2
LITCHI architecture - requirements
• Web-based – taxonomic editors in different institutions may wish to collaborate
• Persistence - an editor may update a checklist over many weeks or months
• Common interchange format(s) for the communication of data sets between applications
• Ability to support related software components such as cross-map editors and servers
Litchi’s Role in Sp2000 Europa
To create revised versions of LITCHI for taxonomic database integrity checking:
• Task 5.6A: Quality Checker for taxonomic integrity in a single taxonomic treatment
• Task 5.6B: Taxonomic Conflict Detector for checking and reporting breaches of taxonomic integrity between two taxonomic treatments
• Task 5.6C: Cross-map Generator (version of the Taxonomic Conflict Detector) to generate a “cross-map” semi-automatically for use in “intelligent linking” between databases
How Does Litchi Work?
• Specially crafted rules are derived from taxonomic practice in the creation of checklists.
• These rules are used by Litchi to examine the taxonomic treatments:
– Check a single checklist or two checklists for errors
– Compare two checklists to discover relationships between their taxa
• Some of the rules used on a single checklist can also be used on two checklists
Types of integrity and conflict rules
• How a scientific name should be composed (Rules of Nomenclature)
• Rules for citing the assemblage of names and synonyms for one taxon
• Rules of integrity between the taxa in a taxonomic treatment
• Rules for detecting conflicts between treatments
• Rules for classifying conflicts to describe the “concept relationship” or to determine the action to be taken
Internal Checking Rules - example
These are some of the rules that may be used when examining the consistency of a single checklist:
• Names with the same Latin components but different authorities cannot appear as accepted names of different taxa.
Mercurialis elliptica Lam. (accepted)
Mercurialis elliptica Poir. (accepted)
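A minimal sketch of how such a check might look in code. This is not Litchi's implementation (Litchi 1 expressed its rules in Prolog); the `Name` record, the `find_homonym_conflicts` function and the extra entry Mercurialis annua L. are assumptions made for illustration only.

```python
from collections import defaultdict
from typing import NamedTuple


class Name(NamedTuple):
    """Hypothetical checklist record, not Litchi's real data model."""
    latin: str      # Latin components, e.g. "Mercurialis elliptica"
    authority: str  # author citation, e.g. "Lam."
    status: str     # "accepted" or "synonym"


def find_homonym_conflicts(checklist: list[Name]) -> dict[str, list[str]]:
    """Return Latin names that are accepted under more than one authority."""
    authorities = defaultdict(set)
    for name in checklist:
        if name.status == "accepted":
            authorities[name.latin].add(name.authority)
    return {
        latin: sorted(auths)
        for latin, auths in authorities.items()
        if len(auths) > 1
    }


checklist = [
    Name("Mercurialis elliptica", "Lam.", "accepted"),
    Name("Mercurialis elliptica", "Poir.", "accepted"),
    Name("Mercurialis annua", "L.", "accepted"),  # illustrative extra entry
]
conflicts = find_homonym_conflicts(checklist)
print(conflicts)  # {'Mercurialis elliptica': ['Lam.', 'Poir.']}
```

Grouping by the Latin components first means the check needs only one pass over the checklist; synonyms are ignored because the rule concerns accepted names only.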
Internal Checking Rules - example
• A full name, which is not a pro-parte name, cannot appear both as an accepted name and as a synonym.
Euphorbia ledebourii Boiss. [accepted]
Euphorbia pygmaea Ledeb. [accepted]
– Euphorbia ledebourii Boiss. [synonym]
– Tithymalus pygmaeus (Ledeb.) Klotzsch & Garke [synonym]
(This particular conflict can be “repaired” if an expert says that E. ledebourii has been split and could be labelled “p.p.”)
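The rule and its “repair” can be sketched as follows. This is an illustration only, not Litchi's own code: the `Entry` record and the `accepted_and_synonym` function are hypothetical, and the sketch assumes that any occurrence flagged “p.p.” is simply exempt from the check.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    """Hypothetical checklist record, not Litchi's real data model."""
    full_name: str           # Latin name plus authority
    status: str              # "accepted" or "synonym"
    pro_parte: bool = False  # True if the name is labelled "p.p."


def accepted_and_synonym(checklist: list[Entry]) -> list[str]:
    """Return full names that occur both as accepted name and as synonym.

    Assumption: occurrences labelled pro parte are exempt from the rule.
    """
    accepted = {e.full_name for e in checklist
                if e.status == "accepted" and not e.pro_parte}
    synonyms = {e.full_name for e in checklist
                if e.status == "synonym" and not e.pro_parte}
    return sorted(accepted & synonyms)


checklist = [
    Entry("Euphorbia ledebourii Boiss.", "accepted"),
    Entry("Euphorbia pygmaea Ledeb.", "accepted"),
    Entry("Euphorbia ledebourii Boiss.", "synonym"),
]
before = accepted_and_synonym(checklist)
print(before)  # ['Euphorbia ledebourii Boiss.']

# "Repair": an expert labels the synonym occurrence as pro parte.
repaired = checklist[:2] + [
    Entry("Euphorbia ledebourii Boiss.", "synonym", pro_parte=True)
]
after = accepted_and_synonym(repaired)
print(after)  # []
```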
Monitor progress or choose result set to browse
An “intelligent” system
• It would know about the differing taxon concepts in various databases
• It would help the user work with such data by interpreting these differences
• It would assist the user in navigating from one database to another where the concepts are different
“Mr Linnaeus”
• A web-based mock-up to explore aspects of the user interface of a system for interpreting “taxonomically intelligent links”
• Prepared by Helen Bradbrook, an MSc student in the School of Plant Sciences at the University of Reading
Services available and planned
• Submit data sets, run rule sets, create cross-maps (Litchi web interface)
• Given a taxon in database A, find taxa (if any) in database B which have identical, larger, smaller or overlapping taxon concepts (Web service)
• Obtain entire cross-map (Web service)
• Extension for hierarchies (higher taxa)
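A rough sketch of the taxon lookup described above. The slides do not specify the web service interface, so everything here is an assumption: the function names are hypothetical, the taxon names are invented placeholders, and a taxon concept is approximated, purely for illustration, by the set of names (accepted name plus synonyms) that a database assigns to it.

```python
from typing import Optional


def relationship(a: frozenset, b: frozenset) -> Optional[str]:
    """Describe concept b (database B) relative to concept a (database A)."""
    if a == b:
        return "identical"
    if a < b:   # a is a proper subset of b, so b is the larger concept
        return "larger"
    if a > b:   # b is a proper subset of a, so b is the smaller concept
        return "smaller"
    if a & b:   # some names shared, neither contains the other
        return "overlapping"
    return None  # no names in common


def find_related(taxon: str, db_a: dict, db_b: dict) -> dict:
    """For one taxon in database A, list the related taxa in database B."""
    concept = db_a[taxon]
    related = {}
    for other, other_concept in db_b.items():
        rel = relationship(concept, other_concept)
        if rel is not None:
            related[other] = rel
    return related


# Invented placeholder names, not real taxa.
db_a = {"Aus bus": frozenset({"Aus bus", "Aus cus"})}
db_b = {
    "Aus bus": frozenset({"Aus bus"}),
    "Aus cus": frozenset({"Aus cus", "Aus dus"}),
    "Aus eus": frozenset({"Aus eus"}),
}
result = find_related("Aus bus", db_a, db_b)
print(result)  # {'Aus bus': 'smaller', 'Aus cus': 'overlapping'}
```

Running `find_related` for every taxon in database A would yield one possible form of an entire cross-map under the same assumptions.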
Acknowledgements
• Funding: BBSRC, NERC (UK), European Commission
• Ideas: Frank Bisby, Andrew Jones, Alex Gray
• Testing: Judith Stephen, Yde de Jong
• Programmers: John Robinson, Naomi Russell
• Litchi 1: Suzanne Embury, Iain Sutherland
• Cartoons: Helen Bradbrook