Brief Notes from Kew
Mark Jackson, Software Applications Manager
Focussing on...
Herbarium digitisation
electronic Plant Information Centre (ePIC)
Kew Herbarium (guesstimated)
– 7 million specimens
– 250,000 types
Less than 5% of specimens databased
A variety of personal databases
Preparation for Digitisation
Computerise transactions
Agree and document policy and procedures
Establish core fields (HISPID, pending ABCD)
Develop hardware and software infrastructure (e.g. catalogue database, mass storage)
Digitisation Strategy
Curators to barcode, database and image types for loan
Repatriation & research projects
– to use infrastructure and core fields
– data to be imported into Catalogue (eventually)
Pursue digitisation projects
www.kew.org/data/repatbr
Specimen imaging
Decision to try to match Cibachrome prints in terms of quality (e.g. suitable for many diagnostic purposes)
– 600 dpi delivers 200MB images
Stored as uncompressed TIFFs, then bzip2-compressed
Acquisition of mass storage
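The 200MB figure can be checked with a back-of-envelope calculation. This is a minimal sketch under assumptions the slides do not state: 24-bit RGB colour (3 bytes per pixel), no TIFF compression, and an A3 scan area of 297 × 420 mm. `ScanSize` and its methods are illustrative names.

```java
import java.util.Locale;

/**
 * Back-of-envelope size of an uncompressed RGB scan.
 * Assumptions (not from the slides): 3 bytes per pixel, no compression,
 * A3 scan area of 297 x 420 mm.
 */
class ScanSize {
    static final double MM_PER_INCH = 25.4;

    // Pixels along one edge at the given resolution, rounded to the nearest pixel.
    static long pixels(double mm, int dpi) {
        return Math.round(mm / MM_PER_INCH * dpi);
    }

    // Raw image size in bytes: width pixels x height pixels x bytes per pixel.
    static long uncompressedBytes(double widthMm, double heightMm, int dpi, int bytesPerPixel) {
        return pixels(widthMm, dpi) * pixels(heightMm, dpi) * bytesPerPixel;
    }

    public static void main(String[] args) {
        long bytes = uncompressedBytes(297, 420, 600, 3);
        // Locale.ROOT keeps the decimal point stable regardless of system locale.
        System.out.printf(Locale.ROOT, "%d bytes = %.1f MiB%n", bytes, bytes / (1024.0 * 1024.0));
    }
}
```

Under these assumptions 600 dpi gives 7,016 × 9,921 pixels, and the program prints `208817208 bytes = 199.1 MiB`, i.e. about 200MB per master image before bzip2 compression.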
HerbScan
A3 flatbed scanner, inverted
Cradle for specimens
Distributed throughout Herbarium
Pros and cons
                Camera               HerbScan
Cost            £30-40,000           £7,500
200MB images    barely achievable    easily achievable
Throughput      1 image per minute   10 images per hour
Mobility        fixed                some mobility
Versatility     versatile            suited to flat items
200 MB master images (600 dpi scans), based on capturing the level of detail of Cibachromes.
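The pros and cons figures also support a rough capacity estimate for imaging the 250,000 types. This sketch assumes one image per type, a single device capturing continuously at the quoted rate, and 200MB per image before bzip2 compression; `CapturePlan` and its methods are illustrative names.

```java
import java.util.Locale;

/**
 * Rough capacity planning from the slide figures: 250,000 types, 200 MB per
 * master image, camera at 1 image/minute (60/hour), HerbScan at 10 images/hour.
 * Assumes one device working continuously and sizes before bzip2 compression.
 */
class CapturePlan {
    static final long TYPES = 250_000;
    static final long MB_PER_IMAGE = 200;

    // Hours of continuous capture for one device at the given rate.
    static double hours(long images, double imagesPerHour) {
        return images / imagesPerHour;
    }

    // Total uncompressed storage in terabytes (1 TB = 1,000,000 MB).
    static double terabytes(long images, long mbPerImage) {
        return images * mbPerImage / 1_000_000.0;
    }

    public static void main(String[] args) {
        System.out.printf(Locale.ROOT, "Camera:   %.0f hours%n", hours(TYPES, 60));
        System.out.printf(Locale.ROOT, "HerbScan: %.0f hours%n", hours(TYPES, 10));
        System.out.printf(Locale.ROOT, "Storage:  %.0f TB%n", terabytes(TYPES, MB_PER_IMAGE));
    }
}
```

It prints about 4167 hours for the camera, 25000 hours for one HerbScan and 50 TB of master images. Those figures suggest why throughput (several scanners working in parallel) and mass storage both matter.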
[Diagram: HerbCat client, Image Server, images, metadata, image enquiries, HerbCat enquiries]
electronic Plant Information Centre (ePIC)
UK government funding for delivery of services electronically
Resource-discovery interface to multiple Kew data sources (not necessarily at Kew)
Data sources are heterogeneous
Simple interface overlaying other systems
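The "simple interface overlaying other systems" idea can be sketched as a single Java interface to which each heterogeneous source is adapted. This is a minimal sketch, assuming every source can answer a plain text query; `DataSource`, `Hit`, `NameListSource`, `KeyedRecordSource` and `Overlay` are hypothetical names, not ePIC's classes, and the in-memory collections stand in for real databases.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** A single result, tagged with the data source it came from. */
record Hit(String source, String text) {}

/** The one contract the overlay relies on; each adapter hides its source's own format. */
interface DataSource {
    List<Hit> search(String query);
}

/** Hypothetical adapter over a flat list of names (e.g. a names checklist). */
class NameListSource implements DataSource {
    private final String label;
    private final List<String> names;

    NameListSource(String label, List<String> names) {
        this.label = label;
        this.names = names;
    }

    public List<Hit> search(String query) {
        String q = query.toLowerCase(Locale.ROOT);
        List<Hit> hits = new ArrayList<>();
        for (String n : names) {
            if (n.toLowerCase(Locale.ROOT).contains(q)) hits.add(new Hit(label, n));
        }
        return hits;
    }
}

/** Hypothetical adapter over keyed records (e.g. barcode -> label text), a differently shaped source. */
class KeyedRecordSource implements DataSource {
    private final String label;
    private final Map<String, String> records;

    KeyedRecordSource(String label, Map<String, String> records) {
        this.label = label;
        this.records = records;
    }

    public List<Hit> search(String query) {
        String q = query.toLowerCase(Locale.ROOT);
        List<Hit> hits = new ArrayList<>();
        records.forEach((key, text) -> {
            if (text.toLowerCase(Locale.ROOT).contains(q)) hits.add(new Hit(label, key + " " + text));
        });
        return hits;
    }
}

/** The overlay: one query in, merged results from every source out, in source order. */
class Overlay {
    private final List<DataSource> sources;

    Overlay(List<DataSource> sources) {
        this.sources = sources;
    }

    List<Hit> search(String query) {
        List<Hit> all = new ArrayList<>();
        for (DataSource s : sources) all.addAll(s.search(query));
        return all;
    }
}
```

Adding a new kind of source then means writing one more adapter; the front end only ever sees `Overlay.search`.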
ePIC Interface
[Diagram: a single ePIC interface overlaying four data sources]
Architecture
Interface: Java servlet/JSPs
Multi-threaded Java server
– request queue
– handlers: one per data source, one for logging, one for spell-checking
Configuration files (XML)
[Diagram: requests pass from the interface to the server; results come back from the data sources]
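A minimal sketch of the request-queue design, assuming each data-source handler can be modelled as a function from a query string to a list of result strings. `QueryServer`, `Request` and `submit` are hypothetical names, not ePIC's code. One dispatcher thread takes requests off a `LinkedBlockingQueue`, records each in an in-memory list standing in for the logging handler, and fans the query out to all source handlers in parallel with `ExecutorService.invokeAll`; the spell-checking handler and the XML configuration are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

class QueryServer {
    /** A queued request plus the future its caller (e.g. a servlet) waits on. */
    record Request(String query, CompletableFuture<List<String>> reply) {}

    private final BlockingQueue<Request> queue = new LinkedBlockingQueue<>();
    private final List<String> requestLog = new CopyOnWriteArrayList<>(); // stand-in logging handler
    private final List<Function<String, List<String>>> sourceHandlers;     // one per data source
    private final ExecutorService pool;
    private final Thread dispatcher;

    QueryServer(List<Function<String, List<String>>> sourceHandlers) {
        this.sourceHandlers = List.copyOf(sourceHandlers);
        // Daemon threads so an idle server never keeps the JVM alive.
        this.pool = Executors.newFixedThreadPool(Math.max(1, sourceHandlers.size()), r -> {
            Thread t = new Thread(r, "source-handler");
            t.setDaemon(true);
            return t;
        });
        this.dispatcher = new Thread(this::dispatchLoop, "dispatcher");
        this.dispatcher.setDaemon(true);
        this.dispatcher.start();
    }

    /** Enqueue a query and return a future that completes with the merged results. */
    CompletableFuture<List<String>> submit(String query) {
        CompletableFuture<List<String>> reply = new CompletableFuture<>();
        queue.add(new Request(query, reply));
        return reply;
    }

    List<String> loggedQueries() {
        return List.copyOf(requestLog);
    }

    void shutdown() {
        dispatcher.interrupt();
        pool.shutdownNow();
    }

    private void dispatchLoop() {
        try {
            while (true) {
                Request req = queue.take();
                requestLog.add(req.query());
                List<Callable<List<String>>> tasks = new ArrayList<>();
                for (Function<String, List<String>> handler : sourceHandlers) {
                    tasks.add(() -> handler.apply(req.query()));
                }
                List<String> merged = new ArrayList<>();
                try {
                    // invokeAll runs the handlers in parallel and returns futures in task order.
                    for (Future<List<String>> f : pool.invokeAll(tasks)) {
                        merged.addAll(f.get());
                    }
                    req.reply().complete(merged);
                } catch (ExecutionException e) {
                    req.reply().completeExceptionally(e.getCause());
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // shutdown() was called
        }
    }
}
```

Because `invokeAll` waits for every handler, one slow source delays the whole response; a production server would more likely use the timed overload, `invokeAll(tasks, timeout, unit)`, and report the sources that did not answer in time.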
Texts
Web documents indexed using Lucene
Flora Zambesiaca digitised and marked up with XML
Experimentation with options for query and output via Java servlet:
– using XSL to output selections
– using Lucene to index the XML
– importing the XML into a database
Other texts: jury still out, but the Lucene route looks promising
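The "using XSL to output selections" option can be sketched with the XSLT 1.0 processor that ships with the JDK (`javax.xml.transform`); Lucene is not shown because it is a third-party library. The markup here (`flora`, `taxon` with a `genus` attribute, `name`, `distribution`) is hypothetical and not the actual Flora Zambesiaca mark-up, and `FloraSelector.select` is an illustrative name.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class FloraSelector {
    // Hypothetical markup: <flora><taxon genus="..."><name/><distribution/></taxon>...</flora>
    private static final String XSL = """
        <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:output method="text"/>
          <xsl:param name="genus"/>
          <xsl:template match="/">
            <xsl:for-each select="flora/taxon[@genus = $genus]">
              <xsl:value-of select="name"/>
              <xsl:text>: </xsl:text>
              <xsl:value-of select="distribution"/>
              <xsl:text>&#10;</xsl:text>
            </xsl:for-each>
          </xsl:template>
        </xsl:stylesheet>
        """;

    /** Returns one "name: distribution" line per taxon of the requested genus. */
    static String select(String floraXml, String genus) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));
            t.setParameter("genus", genus); // bound to $genus in the stylesheet
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(floraXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("XSL transformation failed", e);
        }
    }
}
```

In a servlet, the selection parameter would come from the request and the output would be written to the response; swapping the stylesheet changes the output without touching the Java.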
Feedback
Email mechanisms
Web usability testing/focus groups
Logging
– Quantitative success
• levels of usage, patterns & trends
• beware: crawlers, testing & development staff, harvesters
• referring URLs, Google link: popularity of site
• country, domain
– Qualitative success
• success of queries, esp. zero hits (spelling, common names, families)
• performance & system monitoring
• number of queries per session, return visits
• results pages viewed
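The "beware" item and the zero-hit item lend themselves to a small log filter. This sketch assumes a hypothetical tab-separated log line (IP address, user agent, query, hit count), treats user agents containing "bot", "crawler" or "spider" as crawlers and the `10.` address prefix as the staff network; all of these are placeholders for whatever the real logs contain, and `QueryLogStats` is an illustrative name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class QueryLogStats {
    // Hypothetical values: tune to the crawlers and staff network actually seen in the logs.
    private static final List<String> CRAWLER_WORDS = List.of("bot", "crawler", "spider");
    private static final String INTERNAL_PREFIX = "10.";

    final Map<String, Integer> queryCounts = new TreeMap<>(); // normalised query -> count
    final List<String> zeroHitQueries = new ArrayList<>();    // candidates for spelling/common-name fixes
    int kept = 0;
    int skipped = 0;

    /** Accepts one line in the assumed format ip<TAB>user-agent<TAB>query<TAB>hits. */
    void add(String line) {
        String[] f = line.split("\t", -1);
        if (f.length != 4) { skipped++; return; } // malformed line
        String agent = f[1].toLowerCase(Locale.ROOT);
        boolean crawler = CRAWLER_WORDS.stream().anyMatch(agent::contains);
        boolean internal = f[0].startsWith(INTERNAL_PREFIX);
        int hits;
        try {
            hits = Integer.parseInt(f[3].trim());
        } catch (NumberFormatException e) {
            skipped++;
            return;
        }
        if (crawler || internal) { skipped++; return; } // exclude from usage statistics
        kept++;
        queryCounts.merge(f[2].trim().toLowerCase(Locale.ROOT), 1, Integer::sum);
        if (hits == 0) zeroHitQueries.add(f[2]);
    }
}
```

Reviewing `zeroHitQueries` by hand is one way to find the misspellings, common names and family names that the spell-checking handler should learn to catch.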
World distribution of queries
www.kew.org/epic
Future
More data sources, including texts and images
Hierarchical browsing front-end based around a revamped Brummitt Families & Genera, with phylogenetic classification
Looking forward to:
– using the GBIF Names Service…
– links with DiGIR/BioCASE resources...
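A hierarchical browsing front-end needs a tree behind it. This sketch assumes a three-level order, family, genus hierarchy kept in sorted maps so each level displays alphabetically; it does not model the revamped Brummitt Families & Genera itself, and `BrowseTree` is a hypothetical name.

```java
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

class BrowseTree {
    // order -> (family -> sorted genera); TreeMap/TreeSet keep every level alphabetical.
    private final Map<String, Map<String, SortedSet<String>>> orders = new TreeMap<>();

    void add(String order, String family, String genus) {
        orders.computeIfAbsent(order, o -> new TreeMap<>())
              .computeIfAbsent(family, f -> new TreeSet<>())
              .add(genus);
    }

    /** Renders the tree as an indented outline, two spaces per level. */
    String outline() {
        StringBuilder sb = new StringBuilder();
        orders.forEach((order, families) -> {
            sb.append(order).append('\n');
            families.forEach((family, genera) -> {
                sb.append("  ").append(family).append('\n');
                for (String g : genera) sb.append("    ").append(g).append('\n');
            });
        });
        return sb.toString();
    }
}
```

For example, adding Acacia and Bauhinia under Fabaceae (Fabales) and Rosa under Rosaceae (Rosales) renders Fabales first, with both genera indented beneath Fabaceae; a web front end would expand one level per click instead of printing the whole outline.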