Moving beyond the box: automating the digitisation of insect collections

Post on 13-Jul-2015

Category:

Science

TRANSCRIPT

  • Moving beyond the box: automating the digitisation of insect collections

  • Blagoderov et al. (2012) No specimen left behind: industrial scale digitization of natural history collections. ZooKeys 209: 131–146, doi: 10.3897/zookeys.209.3178

  • Drawer level imaging is (mostly) a solved problem

    1. Place drawer
    2. Scan
    3. Stitch

  • Drawer level imaging is (mostly) a solved problem

    Result: high resolution composite image
    - Fast (5 mins per drawer)
    - High resolution (circa 500MB per image)
    - No specimen handling

  • But, two key problems remain

    - Synchronisation: keeping the physical & digital copies in sync
    - Label data: capturing data from multiple pinned labels

  • Approaches to the synchronisation problem

    - Don't worry about it, re-image as required
    - Lock down the drawers
    - Crop out each specimen image
    - Automate the cropping process
    - Link each specimen to its digital image
    - Make it easy to collect label data

  • Annotation software, NHM working with SmartDrive

    Initial supporting software:
    - No automated cropping
    - Manually link images & specimens
    - Poor UX/UI
    - Closed source and proprietary
    - Not cross-platform (Windows only)

    A good first step to understanding the problems

  • Automating specimen segmentation

    - Starting image
    - Work with Pieter Holtzhausen and Stéfan van der Walt (Stellenbosch University)
    - Software: Inselect, written in Python

  • Segmentation methods

    [Figure panel: Original]
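
A minimal sketch of what a contrast-based segmentation pass could look like: threshold the image, then group connected dark pixels into one bounding box per object. This is not Inselect's actual algorithm. The function name, the dark-specimen-on-light-background assumption and the toy pixel values are all illustrative, and plain Python lists stand in for a real image array.

```python
from collections import deque


def segment_by_contrast(image, threshold):
    """Return (left, top, right, bottom) boxes of connected dark regions.

    `image` is a list of rows of grey values (0 = black, 255 = white).
    Pixels darker than `threshold` are treated as specimen (an assumption).
    """
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if seen[y][x] or image[y][x] >= threshold:
                continue
            # Breadth-first flood fill over 4-connected dark pixels.
            left = right = x
            top = bottom = y
            queue = deque([(x, y)])
            seen[y][x] = True
            while queue:
                cx, cy = queue.popleft()
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    inside = 0 <= nx < width and 0 <= ny < height
                    if inside and not seen[ny][nx] and image[ny][nx] < threshold:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((left, top, right, bottom))
    return boxes


# Toy "drawer": two dark specimens on a light background (made-up values).
drawer = [
    [250, 250, 250, 250, 250, 250],
    [250,  40,  50, 250, 250, 250],
    [250,  45, 250, 250,  30, 250],
    [250, 250, 250, 250,  35, 250],
    [250, 250, 250, 250, 250, 250],
]
boxes = segment_by_contrast(drawer, threshold=128)
```

A single global threshold is cheap, which matches the "quick and simple" character of the contrast-based approach, but it merges specimens that touch or overlap.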

  • Inselect: http://naturalhistorymuseum.github.io/inselect/

    Automated recognition, cropping and annotation of specimens

    - Currently pre-release (alpha)
    - Automatically detects specimens
    - Creates bounding boxes for cropping and exporting images
    - Rapid annotation interface
    - Persistent settings & keyboard shortcuts
    - Data export in JSON format
    - Open source & modular
    - Python based (OpenCV, scikit-image libraries)
    - Windows, OSX & Linux
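
To illustrate how bounding boxes, cropping and JSON export fit together, here is a small sketch. The field names (`left`, `top`, `width`, `height`, `taxon`) and the drawer identifier are hypothetical and are not Inselect's export schema. The image is a plain list of rows so the example has no dependencies.

```python
import json


def crop(image, box):
    """Cut the pixels inside `box` out of `image` (a list of rows)."""
    rows = image[box["top"]:box["top"] + box["height"]]
    return [row[box["left"]:box["left"] + box["width"]] for row in rows]


def export_boxes(drawer_name, boxes):
    """Serialise the boxes and their annotations as a JSON string.

    The document layout here is an assumption made for illustration.
    """
    return json.dumps({"drawer": drawer_name, "specimens": boxes},
                      indent=2, sort_keys=True)


# Toy 4x6 image whose pixel value encodes its position (row * 10 + col).
image = [[row * 10 + col for col in range(6)] for row in range(4)]
boxes = [
    {"left": 1, "top": 0, "width": 2, "height": 2, "taxon": "Chauliodes pectinicornis"},
    {"left": 4, "top": 2, "width": 2, "height": 2, "taxon": "Agulla astuta"},
]
crops = [crop(image, box) for box in boxes]
exported = export_boxes("drawer_001", boxes)
```

Storing the box with its annotation in one record keeps each cropped specimen image traceable to its position in the whole-drawer image.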

  • Inselect: segmentation of specimen images

    [Figure: whole drawer image]

  • Whole drawer image

    - Easy to spot & correct errors
    - Secondary re-segmentation (seed growing)
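
Seed growing can be sketched as follows: starting from a point marked by the operator, neighbouring pixels are added while they stay close to the seed's grey value. This is a generic region-growing illustration under assumed inputs, not the method implemented in Inselect. The tolerance parameter and the pixel values are invented for the example.

```python
from collections import deque


def grow_region(image, seed, tolerance):
    """Grow a region from `seed` (x, y) over 4-connected neighbours whose
    grey value is within `tolerance` of the seed pixel's value."""
    height, width = len(image), len(image[0])
    seed_value = image[seed[1]][seed[0]]
    region = {seed}
    queue = deque([seed])
    while queue:
        cx, cy = queue.popleft()
        for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
            inside = 0 <= nx < width and 0 <= ny < height
            if (inside and (nx, ny) not in region
                    and abs(image[ny][nx] - seed_value) <= tolerance):
                region.add((nx, ny))
                queue.append((nx, ny))
    return region


def bounding_box(region):
    """Return (left, top, right, bottom) enclosing all pixels in `region`."""
    xs = [x for x, _ in region]
    ys = [y for _, y in region]
    return (min(xs), min(ys), max(xs), max(ys))


# Two touching objects of different darkness: one global threshold would
# merge them, while one seed per object separates them (made-up values).
patch = [
    [250, 250, 250, 250, 250],
    [250,  40,  45, 120, 250],
    [250,  42, 118, 125, 250],
    [250, 250, 250, 250, 250],
]
first = bounding_box(grow_region(patch, seed=(1, 1), tolerance=20))
second = bounding_box(grow_region(patch, seed=(3, 2), tolerance=20))
```

The cost is one human click per object plus a fill per seed, in exchange for boxes that follow each object rather than the merged blob.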

  • Works on slides & pinned insects

    Also testing mineral & fossil samples

  • Planned UX / UI enhancements

    - Unit tray recognition
    - Multi-specimen annotation
    - Plug in controlled vocabulary services

  • 1D & 2D barcode recognition

    - Recognition and reading of 1D & 2D matrix barcodes from images
    - Different physical requirements (smallest 6x6mm matrix, readable via handheld scanners)
    - Testing open source & commercial libraries at different scan resolutions

    Initial results:
    - Commercial solutions outperform open source
    - Max. read success 94%
    - Idiosyncratic results (different results on different OS)
    - Testing continues

  • The Holy Grail: label imaging & text recognition

    Chauliodes pectinicornis: dorsal view

  • The Holy Grail: label imaging & text recognition

    Agulla astuta: dorsal, caudal and lateral views; reconstructed labels

  • Approaches to label imaging pinned specimens

    - Could be incorporated as part of the barcode dispensing process
    - Barcode dispenser & scanner (two-sided barcode labels)
    - Multiple label imaging cameras
    - Assemble labels from composite images

    Figure labels: freshly pinned barcode label; other collection labels

  • Acknowledgements

    - Segmentation algorithm & app. development: Stéfan van der Walt and Pieter Holtzhausen
    - Application development: Alice Heaton
    - Barcode recognition & testing: Lawrence Hudson
    - Analysis & testing: Laurence Livermore, Vladimir Blagoderov and Ben Price
    - Initial specification and funding: Vince Smith

    Big digitisation programmes. Need high throughput approaches.

    We've had access to industrial scale imaging hardware and processing software for a number of years, allowing us to rapidly create digital images of entire drawers of insects, arrays of slides and potentially other 3D objects. Such systems include SatScan (pictured above) but also Dscan and InvertNet.

    Project history: Inselect was originally conceived as an answer to whole-drawer imaging hardware setups, such as Dscan, SatScan and InvertNet, which are in use at different natural history collections around the world.

    The systems are typically used to industrially digitise insect collections but can also be used to digitise slide collections.

    In early 2014 we started working with research teams in higher education to develop prototypes for automated drawer segmentation.

    The Inselect prototype was developed in collaboration between the NHM, London and Stellenbosch University, South Africa.

    The initial collaboration was instigated by Ben Price and the following people: Stéfan van der Walt, Pieter Holtzhausen, Vladimir Blagoderov and Laurence Livermore.

    The first prototype developed by Stéfan and Pieter was Python-based and had the following functionality:
    - Downscale high resolution TIFF images to JPEG for processing (easier to run the segmentation algorithm)
    - Automatically detect specimens
    - Create rectangular bounding boxes around specimens
    - Batch crop specimen images

    One of the rate-limiting steps for these whole drawer imaging systems is annotation and post-processing. Typically collections want to record metadata about individual specimens, which involves cropping specimens and entering metadata such as taxonomic name, location in collection etc.
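
Running segmentation on a downscaled copy implies that the detected boxes have to be mapped back to the full-resolution image before cropping. A minimal sketch of that mapping follows. The image sizes and box coordinates are hypothetical, and the rounding policy (floor the top-left corner, ceil the bottom-right) is one reasonable choice rather than the prototype's documented behaviour.

```python
import math


def scale_box(box, small_size, full_size):
    """Map a (left, top, right, bottom) box found on the downscaled image
    to pixel coordinates on the full-resolution image.

    Sizes are (width, height). Right and bottom are treated as exclusive.
    """
    small_w, small_h = small_size
    full_w, full_h = full_size
    fx, fy = full_w / small_w, full_h / small_h
    left, top, right, bottom = box
    return (
        math.floor(left * fx),               # round outwards so no pixels are lost
        math.floor(top * fy),
        min(full_w, math.ceil(right * fx)),  # clamp to the image edge
        min(full_h, math.ceil(bottom * fy)),
    )


# Hypothetical sizes: an 8000x6000 scan segmented at 1000x750.
small_size = (1000, 750)
full_size = (8000, 6000)
boxes_small = [(100, 50, 220, 180), (900, 700, 1000, 750)]
boxes_full = [scale_box(box, small_size, full_size) for box in boxes_small]
```

Cropping then happens on the original TIFF with `boxes_full`, so the exported specimen images keep the full scan resolution.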

    Most software currently in use has limitations:
    - No automation: all specimens have to be manually selected by a human operator
    - Poor user interface and user experience: most software has a very basic user interface and provides a poor user experience
    - Closed source & proprietary: making it costly or impossible for museums to collaborate, build upon and share software
    - Not cross-platform: limiting annotation and post-processing to Windows users only

    What I want to do for the remainder of this talk is focus on elements of this programme and pick out some of the key elements which are of broader interest, in terms of

    Contrast based is the simple approach (quick and simple, less computationally expensive). When you resegment you use seed growing (more time consuming but better fidelity). It also uses a human to mark a starting centroid.

    The sidebar allows very quick reviewing of segmentation for false positives and incorrectly segmented objects (e.g. multiple objects together).

    (Fishfly: Megaloptera)

    Lace wing Neuroptera
