Brief Notes from Kew

Mark Jackson Software Applications Manager

Focussing on...

Herbarium digitisation
electronic Plant Information Centre (ePIC)

Kew Herbarium

Guesstimated
– 7 million specimens
– 250,000 types

Less than 5% specimens databased

A variety of personal databases

Preparation for Digitisation

Computerise transactions
Agree and document policy and procedures
Establish core fields (HISPID pending ABCD)
Develop hardware and software infrastructure (e.g. catalogue database, mass storage)

Digitisation Strategy

Curators to barcode, database and image types for loan
Repatriation & research projects
– to use infrastructure and core fields
– data to be imported into Catalogue (eventually)
Pursue digitisation projects

www.kew.org/data/repatbr

Specimen imaging

Decision to try to match Cibachrome prints in terms of quality (e.g. suitable for many diagnostic purposes)
– 600 dpi delivers 200MB images
Stored as uncompressed (but bzipped) TIFFs

Acquisition of mass storage
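The "uncompressed (but bzipped)" storage choice can be illustrated with a minimal sketch. It assumes that "bzipped" means the standard bzip2 format applied to the whole TIFF file, which is lossless; the byte string below is an invented stand-in for a TIFF, not a real scan, and the function names are hypothetical.

```python
import bz2


def pack(tiff_bytes: bytes) -> bytes:
    """Compress a whole file's bytes with bzip2 (lossless)."""
    return bz2.compress(tiff_bytes, compresslevel=9)


def unpack(packed: bytes) -> bytes:
    """Recover the original bytes exactly."""
    return bz2.decompress(packed)


# Invented stand-in for an uncompressed TIFF: little-endian TIFF magic
# followed by flat, highly repetitive pixel data (assumption for the demo).
fake_tiff = b"II*\x00" + bytes([200, 198, 190]) * 100_000

packed = pack(fake_tiff)
identical = unpack(packed) == fake_tiff   # bit-for-bit round trip
smaller = len(packed) < len(fake_tiff)    # repetitive data shrinks
print(identical, smaller)
```

Real specimen scans will compress far less than this artificial data; the point is only that the master image comes back bit for bit.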

HerbScan

A3 flatbed scanner, inverted

Cradle for specimens

Distributed throughout Herbarium

Pros and cons

Camera
– £30-40,000
– 200MB images barely achievable
– 1 image per minute
– Fixed
– Versatile

HerbScan
– £7,500
– 200MB images easily achievable
– 10 images per hour
– Some mobility
– Suited to flat items
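As a rough sense of scale, the two throughput figures (1 image per minute, 10 images per hour) can be combined with the guesstimated 250,000 types and the 200MB image size. This is back-of-envelope arithmetic only: it assumes the first set of figures belongs to the camera and the second to the HerbScan, counts device-hours for a single device, ignores specimen handling and databasing, and uses decimal units before any bzip compression.

```python
TYPES = 250_000        # type specimens (guesstimate from the slides)
IMAGE_MB = 200         # master image size from the slides
IMAGES_PER_HOUR = {    # 1 image per minute = 60 per hour
    "Camera": 60,
    "HerbScan": 10,
}


def hours_needed(n_images: int, per_hour: int) -> float:
    """Device-hours to image n_images at a steady rate."""
    return n_images / per_hour


def storage_tb(n_images: int, mb_each: int) -> float:
    """Uncompressed storage in decimal terabytes (1 TB = 1,000,000 MB)."""
    return n_images * mb_each / 1_000_000


for device, rate in IMAGES_PER_HOUR.items():
    print(device, round(hours_needed(TYPES, rate)), "device-hours")
print(storage_tb(TYPES, IMAGE_MB), "TB for all types")
```

Because HerbScans are cheap and distributed throughout the Herbarium, several can run in parallel, which divides the device-hours accordingly.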

200 MB master images (600 dpi scans), based on capturing the level of detail of Cibachromes.
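The 200 MB figure is consistent with simple arithmetic, sketched here under two assumptions that the slides do not state: the scan covers a full A3 area (297 x 420 mm) and pixels are stored as 24-bit RGB.

```python
A3_MM = (297, 420)      # ISO 216 A3 sheet; assumed scan area
DPI = 600               # resolution from the slides
BYTES_PER_PIXEL = 3     # assumed 24-bit RGB
MM_PER_INCH = 25.4


def pixels(mm: float, dpi: int = DPI) -> int:
    """Convert a length in millimetres to a whole number of pixels."""
    return round(mm / MM_PER_INCH * dpi)


width_px = pixels(A3_MM[0])
height_px = pixels(A3_MM[1])
size_bytes = width_px * height_px * BYTES_PER_PIXEL
print(width_px, height_px, round(size_bytes / 1e6), "MB")
```

This gives 7016 x 9921 pixels and about 209 MB (decimal), in line with the 200 MB quoted.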

Diagram components: Camera, HerbScan, HerbCat, Client, Image Server
Diagram labels: Images, Metadata, image enquiries, HerbCat enquiries

Focussing on...

Herbarium digitisation
electronic Plant Information Centre (ePIC)

UK government funding for delivery of services electronically

Resource-discovery interface to multiple Kew data sources (not necessarily at Kew)

Data sources are heterogeneous
Simple interface overlaying other systems

ePIC Interface

Diagram: ePIC interface connected to four data sources
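One way to picture a simple interface overlaying heterogeneous data sources is an adapter per source behind one common search method. The sketch below is illustrative only: ePIC itself is Java, and the class names, source names and sample records here are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


@dataclass(frozen=True)
class Hit:
    source: str   # which data source produced the hit
    label: str    # display text for a results page


class DataSource(Protocol):
    name: str

    def search(self, term: str) -> List[Hit]: ...


class ListSource:
    """Hypothetical source whose native form is a plain list of names."""

    def __init__(self, name: str, names: List[str]):
        self.name = name
        self._names = names

    def search(self, term: str) -> List[Hit]:
        t = term.lower()
        return [Hit(self.name, n) for n in self._names if t in n.lower()]


class RecordSource:
    """Hypothetical source whose native form is a dict of id -> record."""

    def __init__(self, name: str, records: Dict[int, Dict[str, str]]):
        self.name = name
        self._records = records

    def search(self, term: str) -> List[Hit]:
        t = term.lower()
        return [
            Hit(self.name, f"{rec['title']} ({rec['year']})")
            for rec in self._records.values()
            if t in rec["title"].lower()
        ]


def discover(term: str, sources: List[DataSource]) -> Dict[str, List[Hit]]:
    """Ask every source the same question and group the hits by source."""
    return {s.name: s.search(term) for s in sources}


# Invented sample data
sources = [
    ListSource("names", ["Quercus robur", "Acacia karroo"]),
    RecordSource("literature", {1: {"title": "Notes on Quercus", "year": "1998"}}),
]
found = discover("quercus", sources)
print(found)
```

Each adapter hides its source's native shape, so adding a data source means adding one class rather than changing the interface.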

Architecture

Interface (Java servlet/JSPs)
Requests and Results (exchanged between the interface and the server)
Multi-threaded Java server
Request queue
Handlers:
– one per data source
– one for logging
– one for spell-checking
Configuration files (XML)
Data sources
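The queue-and-handlers arrangement can be sketched as follows. This is not the ePIC code (which is a Java server): it is a minimal Python illustration that assumes every request is fanned out to every handler, each handler runs in its own thread with its own inbox, and spell-checking is approximated with the standard-library difflib. All class names and sample data are hypothetical.

```python
import difflib
import queue
import threading

STOP = object()  # sentinel telling a handler thread to exit


class Handler(threading.Thread):
    """Worker thread with its own inbox; subclasses implement handle()."""

    def __init__(self, name, results):
        super().__init__(name=name, daemon=True)
        self.inbox = queue.Queue()
        self.results = results

    def run(self):
        while True:
            term = self.inbox.get()
            if term is STOP:
                break
            self.results.put((self.name, term, self.handle(term)))


class SourceHandler(Handler):
    """One of these per data source."""

    def __init__(self, name, records, results):
        super().__init__(name, results)
        self.records = records

    def handle(self, term):
        return [r for r in self.records if term.lower() in r.lower()]


class SpellHandler(Handler):
    def __init__(self, vocabulary, results):
        super().__init__("spelling", results)
        self.vocabulary = vocabulary

    def handle(self, term):
        return difflib.get_close_matches(term, self.vocabulary, n=3)


class LogHandler(Handler):
    def __init__(self, results):
        super().__init__("log", results)
        self.entries = []

    def handle(self, term):
        self.entries.append(term)
        return len(self.entries)  # running count of logged requests


def serve(terms, handlers, results):
    """Drain a request queue, passing each request to every handler."""
    request_queue = queue.Queue()
    for term in terms:
        request_queue.put(term)
    for h in handlers:
        h.start()
    while not request_queue.empty():
        term = request_queue.get()
        for h in handlers:
            h.inbox.put(term)
    for h in handlers:
        h.inbox.put(STOP)
    for h in handlers:
        h.join()
    collected = {}
    while not results.empty():
        name, term, value = results.get()
        collected[(name, term)] = value
    return collected


results = queue.Queue()
handlers = [
    SourceHandler("source_a", ["Quercus robur", "Quercus ilex"], results),
    SourceHandler("source_b", ["Acacia karroo"], results),
    SpellHandler(["quercus", "acacia"], results),
    LogHandler(results),
]
answers = serve(["quercus", "acasia"], handlers, results)
print(answers[("spelling", "acasia")])
```

A slow data source only delays its own handler thread; the others keep answering, which is the main attraction of one handler per source.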

Texts

Web documents indexed using Lucene
Flora Zambesiaca digitised and marked-up with XML
Experimentation with options for query and output via Java servlet
– using XSL to output selections
– using Lucene to index the XML
– importing the XML into a database
Other texts: jury still out, but Lucene route looks promising
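The idea of indexing marked-up text can be shown with a toy inverted index. This is a stand-in for Lucene, not Lucene itself, and the XML element names and sample text are invented for the example; the actual Flora Zambesiaca markup is not given in these slides.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Invented markup and sample text (assumption, not the real schema)
SAMPLE = """
<flora>
  <taxon id="t1"><name>Acacia karroo</name>
    <description>Tree with paired white spines.</description></taxon>
  <taxon id="t2"><name>Acacia nilotica</name>
    <description>Tree; pods constricted between the seeds.</description></taxon>
</flora>
"""


def tokens(text):
    """Lower-case alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(xml_text):
    """Map each token to the ids of the taxon elements that contain it."""
    index = defaultdict(set)
    root = ET.fromstring(xml_text.strip())
    for taxon in root.iter("taxon"):
        for token in tokens(" ".join(taxon.itertext())):
            index[token].add(taxon.get("id"))
    return index


def search(index, query):
    """AND query: ids of elements containing every query term."""
    terms = tokens(query)
    if not terms:
        return set()
    return set.intersection(*(index.get(t, set()) for t in terms))


index = build_index(SAMPLE)
print(search(index, "acacia spines"))
```

Indexing per marked-up element, rather than per file, is what lets a query return the relevant treatment instead of a whole volume.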

Feedback

Email mechanisms
Web usability testing/focus groups
Logging
– Quantitative success
  • levels of usage, patterns & trends
  • beware: crawlers, testing & development staff, harvesters
  • referring URLs, Google link: popularity of site
  • country, domain
– Qualitative success
  • success of queries esp. zero hits (spelling, common names, families)
  • performance & system monitoring
  • number of queries per session, return visits
  • results pages viewed
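A few of the logging measures can be sketched in code. The log record layout, the robot-detection markers and the sample entries below are all assumptions made for illustration; the slides do not describe the real ePIC log format.

```python
from collections import Counter

# Hypothetical log records: (session id, user agent, query, number of hits)
LOG = [
    ("s1", "Mozilla/5.0", "quercus", 12),
    ("s1", "Mozilla/5.0", "oak", 0),          # common name
    ("s2", "Googlebot/2.1", "acacia", 40),    # crawler traffic
    ("s3", "Mozilla/5.0", "acasia", 0),       # misspelling
    ("s3", "Mozilla/5.0", "acacia", 40),
    ("s3", "Mozilla/5.0", "fabaceae", 0),     # family name
]
ROBOT_MARKERS = ("bot", "crawler", "spider", "harvest")  # assumed heuristic


def is_robot(agent):
    """Crude user-agent check for crawlers and harvesters."""
    agent = agent.lower()
    return any(marker in agent for marker in ROBOT_MARKERS)


def summarise(log):
    """Usage figures with robot traffic excluded."""
    human = [rec for rec in log if not is_robot(rec[1])]
    if not human:
        return {"queries": 0, "sessions": 0, "zero_hit_queries": []}
    per_session = Counter(rec[0] for rec in human)
    zero_hits = [rec[2] for rec in human if rec[3] == 0]
    return {
        "queries": len(human),
        "sessions": len(per_session),
        "queries_per_session": len(human) / len(per_session),
        "zero_hit_rate": len(zero_hits) / len(human),
        "zero_hit_queries": zero_hits,
    }


summary = summarise(LOG)
print(summary)
```

Listing the zero-hit queries themselves, not just the rate, is what reveals whether misspellings, common names or family names are the main cause.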

World distribution of queries

www.kew.org/epic

Future

More data sources, including texts and images

Hierarchical browsing front-end based around revamped Brummitt Families & Genera with phylogenetic classification

Looking forward to
– using the GBIF Names Service…
– links with DiGIR/BioCASE resources...
