price2 ecn2013
Rapid, industrial-scale digitization of the NHM microscope slide collection
Ben Price & Vladimir Blagoderov
Outline
• The NHM slide collection
• What is Digitization?
• The NHM workflow
• Psyllid collection
• Future prospects
The NHM slide collection
• ~ 2 million slides (60 : 40 vertical : horizontal storage)
• Mix of slide sizes, mounts, storage cabinets
What is Digitization?
Label data:
– Quick to image
• 5000 per day
– Slow to transcribe (crowdsourcing)
– Slow to georeference (crowdsourcing)
Specimen:
– Slow to image
• 100,000 per year
– Data storage
• GB images
– Image delivery
• Proprietary software
– Do we need ALL specimens?
The NHM workflow*
Preparation, Handling, Imaging, Post Processing, Data Capture
* Work in progress
Preparation
• Datamatrix Labels (4.5mm)
• Processing Scripts (GIMP, Barcodefiler)
• Computing Facilities (64bit, 16GB RAM)
• Storage & Retrieval (Ke-EMu)
– What is a slide?
• Delivery (NHM data portal)
Handling
• Horizontal vs Vertical storage
• Card Slide covers!
• Labelling & Handling = up to 90% of the time
Imaging
• Scanner – SLR – Mamiya Leaf – SatScanner
• Balance slides per image vs label resolution (PPI)
• Single slide imaging?
Horizontal Storage:
• Less handling
– Tray fits A3 scanner / SLR
• Cropping: can be autocropped, otherwise manual cropping
– Crowd cropping?
Vertical storage:
• Single type of template (post processing)
• High contrast (scripts)
• Cheap (foam, card)
• More Handling
• Autocropping
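The autocropping idea for a high-contrast template can be illustrated with a minimal sketch. It assumes bright slides on a dark template with dark gaps between them, and a grayscale image held as a list of pixel rows. The function names and the threshold value are hypothetical and are not taken from the NHM scripts.

```python
def find_runs(flags):
    """Return (start, end) pairs for each run of True values; end is exclusive."""
    runs, start = [], None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(flags)))
    return runs


def autocrop_boxes(image, threshold=128):
    """Find one bounding box per bright region in a grayscale image.

    Assumes bright slides on a dark template, separated by dark gaps.
    Returns (left, top, right, bottom) tuples with exclusive right/bottom.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    # Column projection: a column is occupied if any pixel exceeds the threshold.
    col_flags = [any(image[y][x] > threshold for y in range(height))
                 for x in range(width)]
    boxes = []
    for left, right in find_runs(col_flags):
        # Within each occupied column band, project onto rows to find top and bottom.
        row_flags = [any(image[y][x] > threshold for x in range(left, right))
                     for y in range(height)]
        for top, bottom in find_runs(row_flags):
            boxes.append((left, top, right, bottom))
    return boxes


# Tiny synthetic "tray": two bright slides on a dark background.
tray = [
    [0, 0,   0,   0, 0,   0,   0],
    [0, 255, 255, 0, 255, 255, 0],
    [0, 255, 255, 0, 255, 255, 0],
    [0, 0,   0,   0, 0,   0,   0],
]
print(autocrop_boxes(tray))
```

A projection approach like this only works when the gaps between slides are clean, which is one reason a single high-contrast template helps.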
• Resolution tests (slides per image at each PPI)
– Canon 650D (18MP sensor) + 50mm Macro
PPI:    250  300  450  600
Slides:  72   45   18   10
– Mamiya Leaf (80MP sensor) + 80mm lens
PPI:    300  450  600
Slides: 180   72   50
– HerbScanner (EPSON A3 size)
PPI:    300  450  600
Slides:  50   50   50
– SatScanner (0.16x lens, low resolution ~1000 PPI)
Slides: 72 - 100
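The trade-off between slides per image and PPI follows from simple geometry, sketched below. The sketch assumes a 76 x 26 mm slide and a 5184 x 3456 pixel frame for the 18MP Canon sensor. Neither figure is stated in the talk, and the function names are illustrative. The result is an upper bound that ignores gaps between slides and tray borders, so real counts should be lower.

```python
import math

MM_PER_INCH = 25.4


def field_of_view_mm(pixels_wide, pixels_high, ppi):
    """Width and height (mm) covered by one frame at the given pixels per inch."""
    return (pixels_wide / ppi * MM_PER_INCH, pixels_high / ppi * MM_PER_INCH)


def max_slides(pixels_wide, pixels_high, ppi, slide_mm=(76.0, 26.0)):
    """Upper bound on slides per frame, trying both slide orientations.

    slide_mm is an assumed standard slide size; gaps between slides are ignored.
    """
    fov_w, fov_h = field_of_view_mm(pixels_wide, pixels_high, ppi)
    slide_w, slide_h = slide_mm

    def grid(w, h):
        # Whole slides that fit across the frame times whole slides that fit down it.
        return math.floor(fov_w / w) * math.floor(fov_h / h)

    return max(grid(slide_w, slide_h), grid(slide_h, slide_w))


# Assumed frame size for an 18MP sensor: 5184 x 3456 pixels.
for ppi in (250, 300, 450, 600):
    print(ppi, max_slides(5184, 3456, ppi))
```

Halving the PPI roughly quadruples the area covered, which is why the slide count falls so quickly as label resolution goes up.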
Progress to date
• Psyllidae slide collection (4000 slides)
• Two digitizers + SatScanner = 4 days
• Handling (not Imaging) is the bottleneck
• Solutions:
– More digitizers
– Crowd cropping of tray scans?
• Theoretical maximum
– SatScan: 7000 slides per day (5-8 people)
– Other: 700 - 1000 slides per person per day
• NHM Entomology collection = 10 – 15 person years
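The throughput figures above can be checked with some rough arithmetic, sketched here. Two inputs are assumptions rather than figures from the talk: 200 working days per year, and the ~2 million total quoted for the whole NHM slide collection, used as a stand-in for the collection size. The talk does not say how the person-year estimate was derived.

```python
def slides_per_person_day(slides, people, days):
    """Average handling rate achieved by a team over a run."""
    return slides / (people * days)


def person_years(total_slides, rate_per_person_day, working_days_per_year=200):
    """Effort to digitize a collection; working_days_per_year is an assumption."""
    return total_slides / rate_per_person_day / working_days_per_year


# Psyllidae run: 4000 slides, two digitizers, 4 days.
psyllid_rate = slides_per_person_day(4000, 2, 4)
print(f"Psyllidae run: {psyllid_rate:.0f} slides per person per day")

# Assumed collection size of ~2 million slides at the quoted 700 - 1000 rates.
for rate in (700, 1000):
    print(f"{rate} per day: {person_years(2_000_000, rate):.1f} person years")
```

Under these assumptions the range comes out close to the quoted 10 to 15 person years. The Psyllidae run itself sits below the 700 to 1000 theoretical rate, consistent with handling being the bottleneck.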
[Diagram: repeated cycles of the label, load, image and unload steps]
Future Plans
• Specimen Imaging
– Type material
Acknowledgments
Flavia
Johanna
Elisa
Peter
Lyndsey
Sara
Questions?