Strings to things: a user-friendly framework for data reconciliation
Nicky Nicolson, RBG Kew
@nickynicolson
Biodiversity Information Standards (TDWG) annual meeting
Nairobi, Kenya / 28th September – 1 October 2015
Reconciliation
• Turns a string representation of an entity into an actionable identifier.
e.g.: Tahina spectabilis
Will reconcile to: http://ipni.org/urn:lsid:ipni.org:names:77086615-1
Maximise reuse, two stage process
1. Standardise data
- Package of 40 plus “transformers”
- All accept a string input, produce a string output
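The string-in, string-out contract can be sketched in a few lines of Python. This is only an illustration of the idea, not code from the framework itself: the function names (strip_diacritics, collapse_whitespace, lowercase, apply_all) are hypothetical and the real package's transformers may behave differently.

```python
import re
import unicodedata
from typing import Callable, List

# A transformer is any function from str to str.
# All names below are hypothetical, chosen for illustration only.
Transformer = Callable[[str], str]


def strip_diacritics(s: str) -> str:
    """Decompose accented characters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def collapse_whitespace(s: str) -> str:
    """Squeeze runs of whitespace to one space and trim both ends."""
    return re.sub(r"\s+", " ", s).strip()


def lowercase(s: str) -> str:
    """Fold the string to lower case."""
    return s.lower()


def apply_all(value: str, transformers: List[Transformer]) -> str:
    """Run the transformers in order, feeding each output to the next."""
    for transform in transformers:
        value = transform(value)
    return value


if __name__ == "__main__":
    chain = [strip_diacritics, collapse_whitespace, lowercase]
    print(apply_all("  Tahina   spectabilis ", chain))
```

Because every transformer has the same shape, they can be chained in any order and reused across data sets.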
Examples of transformers
Open Refine screenshot
Maximise reuse, two stage process
2. Match the data
- Package of 20 plus “matchers”
- All accept two inputs and return a flag if they match
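A matcher, as described above, takes two values and returns a true/false flag. The Python sketch below shows that shape under stated assumptions: the names (exact_match, make_similarity_matcher) and the 0.9 threshold are hypothetical, and the similarity measure from Python's difflib is a stand-in, not necessarily what the framework's matchers use.

```python
from difflib import SequenceMatcher
from typing import Callable

# A matcher is any function of two strings that returns a flag.
# Names and thresholds here are illustrative assumptions.
Matcher = Callable[[str, str], bool]


def exact_match(a: str, b: str) -> bool:
    """True only when the two (already standardised) strings are identical."""
    return a == b


def make_similarity_matcher(threshold: float) -> Matcher:
    """Build a matcher that tolerates small differences such as typos."""
    def matcher(a: str, b: str) -> bool:
        # ratio() is 1.0 for identical strings and 0.0 for no overlap.
        return SequenceMatcher(None, a, b).ratio() >= threshold
    return matcher


if __name__ == "__main__":
    fuzzy = make_similarity_matcher(0.9)
    print(exact_match("tahina", "tahina"))
    print(fuzzy("spectabilis", "spectabils"))
```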
Configuring a service
1) Read tabular data (file or DB)
2) Configure transformers
3) Configure matchers
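The three configuration steps can be sketched end to end as follows. This is a simplified Python illustration, not the framework's actual configuration format: the CONFIG layout, the column names and the second reference row are invented for the example. Only the Tahina spectabilis identifier comes from the slides.

```python
import csv
import io
from typing import Dict, List

# 1) Tabular reference data. Inlined here; a real service would read a
#    file or database. The second row is a made-up placeholder record.
REFERENCE_CSV = """id,genus,species
urn:lsid:ipni.org:names:77086615-1,Tahina,spectabilis
example:0002,Examplea,ficta
"""


def normalise(s: str) -> str:
    """Squeeze whitespace and fold case before comparing."""
    return " ".join(s.split()).lower()


# 2) and 3) Per-column transformers and matcher (hypothetical layout).
CONFIG = {
    "genus": {"transformers": [normalise], "matcher": lambda a, b: a == b},
    "species": {"transformers": [normalise], "matcher": lambda a, b: a == b},
}


def load_records(text: str) -> List[Dict[str, str]]:
    """Parse the CSV text into one dict per row."""
    return list(csv.DictReader(io.StringIO(text)))


def standardise(value: str, transformers) -> str:
    """Apply each transformer in turn."""
    for transform in transformers:
        value = transform(value)
    return value


def reconcile(query: Dict[str, str], records, config) -> List[str]:
    """Return ids of records whose configured columns all match the query."""
    matches = []
    for record in records:
        if all(
            rule["matcher"](
                standardise(query[column], rule["transformers"]),
                standardise(record[column], rule["transformers"]),
            )
            for column, rule in config.items()
        ):
            matches.append(record["id"])
    return matches


if __name__ == "__main__":
    records = load_records(REFERENCE_CSV)
    print(reconcile({"genus": " TAHINA ", "species": "Spectabilis"}, records, CONFIG))
```

The same transformers are applied to both the query and the reference record, so the matcher only ever compares standardised values.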
Run it…
1) Service description
2) Three service endpoints
3) JavaScript query interface
IPNI Reconciliation Service
3 service endpoints
Flexible web service
• Open Refine compatible
• But underneath it’s JSON over HTTP
• … so call it from any programming language
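As a rough illustration of "JSON over HTTP", the Python sketch below builds a batch request in the style of the OpenRefine reconciliation protocol (a "queries" parameter holding a JSON object keyed q0, q1, …) and reads the best candidate from a response. The service URL is a hypothetical placeholder, and SAMPLE_RESPONSE is hand-written to show the expected shape; it is not output captured from the IPNI service.

```python
import json
from typing import Dict, List, Optional
from urllib.parse import urlencode

# Hypothetical placeholder; substitute the real service endpoint.
SERVICE_URL = "https://example.org/reconcile/names"


def build_request_url(names: List[str], base_url: str = SERVICE_URL) -> str:
    """Encode a batch of names as one 'queries' parameter (q0, q1, ...)."""
    queries = {f"q{i}": {"query": name} for i, name in enumerate(names)}
    return base_url + "?" + urlencode({"queries": json.dumps(queries)})


def best_ids(response_text: str) -> Dict[str, Optional[str]]:
    """Pick the highest-scoring candidate id for each query key."""
    response = json.loads(response_text)
    best = {}
    for key, payload in response.items():
        results = payload.get("result", [])
        if results:
            top = max(results, key=lambda r: r.get("score", 0))
            best[key] = top["id"]
        else:
            best[key] = None
    return best


# Hand-written illustration of the response shape, not real service output.
SAMPLE_RESPONSE = json.dumps({
    "q0": {"result": [{
        "id": "urn:lsid:ipni.org:names:77086615-1",
        "name": "Tahina spectabilis",
        "score": 100,
        "match": True,
    }]},
    "q1": {"result": []},
})

if __name__ == "__main__":
    print(build_request_url(["Tahina spectabilis"]))
    print(best_ids(SAMPLE_RESPONSE))
```

The resulting URL can be fetched with any HTTP client, which is what makes the service usable outside Open Refine.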
Service metadata
Service call
Service response
List of reconciliation services
https://github.com/OpenRefine/OpenRefine/wiki/Reconcilable-Data-Sources
Open source
https://github.com/RBGKew/Reconciliation-and-Matching-Framework
What we’ll work on in the future
Reconciliation services on different data types
• Specimens
– Add DwCA as a readable data store
– Collections focussed transformers & matchers
– Resolve & link specimen duplicates
• People
• Trait glossaries
Integration with github
Thanks to:
• Biodiversity Informatics team (Abigail Barker, Matt Blissett, James Crowe, John Iacona, Rob Turner, Alecs Gueder)
• Plant & fungal name curation team (Christine Barker, Irina Belyaeva, Katherine Challis, Rafael Govaerts, Paul Kirk, Heather Lindon, Emma Williams)
• Data improvement team (Anna Lynch, Rachel Witherow, Malin Rivers, Esther Wainwright-Deri)
@nickynicolson
http://bit.ly/k-names-service
http://github.com/RBGKew