Applying Taxonomic Intelligence to Digitization Initiatives

Upload: martin-kalfatovic

Post on 18-May-2015

DESCRIPTION

Applying Taxonomic Intelligence to Digitization Initiatives by Cathy Norton, Marine Biological Laboratory / Woods Hole Oceanographic Institution Library. Translating the Value of the Research Library / Information Futures Institute. October 27, 2007. Mountain View, CA.

TRANSCRIPT

Page 1: Applying Taxonomic Intelligence to Digitization Initiatives

MBL WHOI Library

Marine Biological Laboratory

Woods Hole Oceanographic Institution

© 2007 MBLWHOI Library www.mblwhoilibrary.org

Applying Taxonomic Intelligence to Digitization Initiatives

Translating the Value of the Research Library

Information Futures Institute

October 27, 2007

Cathy Norton

MBLWHOI Library Director

Deputy Director, Biodiversity Heritage Library

Page 2: Applying Taxonomic Intelligence to Digitization Initiatives

TOPICS

• Biodiversity Heritage Libraries

• Open Content Alliance, Principles

• Internet Archive Partner

• Northeast Regional Digitizing Center @ Boston Public Library

• Taxonomic Intelligence - modernizing the literature

Page 3: Applying Taxonomic Intelligence to Digitization Initiatives

NMFS - 1871

MBL - 1888

WHOI - 1930

USGS - 1960

SEA - 1971

WHRC - 1985

Woods Hole Scientific Community

This library serves the MBL, WHOI, USGS, NMFS, SEA, WHRC, and other scientific groups in the area.

Facing a new dynamic phase

Page 4: Applying Taxonomic Intelligence to Digitization Initiatives

The vision

Imagine an electronic page for each species of organism on Earth, available everywhere by single access on command. The page contains the scientific name of the species, a pictorial or genomic presentation of the primary type specimen on which its name is based, and a summary of its diagnostic traits. The page opens out directly or by linkage with other databases such as ARKive, Ecoport, and GenBank. It comprises a summary of everything known about the species’ genome, proteome, geographic distribution, phylogenetic position, habitat, ecological relationships, and, not least, its perceived practical importance for humanity.

E. O. Wilson, 2003.

Page 6: Applying Taxonomic Intelligence to Digitization Initiatives

Vision

Build a Digital Open Access Library for Biodiversity Literature

Page 7: Applying Taxonomic Intelligence to Digitization Initiatives

Meetings in Colorado, 2005

London, 2005 laboratories and libraries

Washington BHL 2006

Simultaneous Meetings in Woods Hole for BHL & EOL 2006

Page 8: Applying Taxonomic Intelligence to Digitization Initiatives

Members

• American Museum of Natural History

• Botany Library - Harvard

• British Natural History Museum, UK

• Field Museum*

• MBLWHOI Library*

• Missouri Botanical Gardens*

• Museum of Comparative Zoology - Harvard*

• New York Botanical Gardens

• Royal Botanical Gardens @ Kew, UK

• Smithsonian Museum of Natural History*

– University of Illinois, contributing member

– EOL Founding Institution

Page 9: Applying Taxonomic Intelligence to Digitization Initiatives

Why BHL now?

• Legacy taxonomic literature available in museums has limited access

• Much of it is rare

• Systematic literature depends on the historic literature

• The cited half-life of natural history is longer than that of any other scientific domain (TAXA TOY) Pre 1923

• 90% of biodiversity information is in these libraries

• 90% of biodiversity is in developing regions such as Africa and South America

Page 10: Applying Taxonomic Intelligence to Digitization Initiatives

The Open Content Alliance (OCA) represents the collaborative efforts of a group of cultural, technology, nonprofit, and governmental organizations from around the world that will help build a permanent archive of multilingual digitized text and multimedia content.

Page 11: Applying Taxonomic Intelligence to Digitization Initiatives

Principles of OCA

• The OCA will encourage the greatest possible degree of access to and reuse of collections in the archive, while respecting the rights of content owners and contributors.

INTERNET ARCHIVE

• Contributors will determine the terms and conditions under which their collections are distributed and how attribution should be made.

• IA need not be obligated to accept all content that is offered to it and may give preference to that which can be made widely accessible.

• IA will offer collection and item-level metadata of its hosted collections in a variety of formats.

• IA welcomes efforts to create and offer tools (including finding aids, catalogs, and indexes) that will enhance the usability of the materials in the archive.

• Copies of IA collections will reside in multiple archives internationally to ensure their long-term preservation and accessibility to all.

Page 12: Applying Taxonomic Intelligence to Digitization Initiatives

Name:

BioDiversity Heritage Library

Wiki- for all involved

Web Presence! Where to begin?

Page 13: Applying Taxonomic Intelligence to Digitization Initiatives

NAME of Consortium -

BioDiversity Heritage Library

Web Presence!

Page 25: Applying Taxonomic Intelligence to Digitization Initiatives

In the end… simplicity…

• http://bhl.si.edu/

Page 26: Applying Taxonomic Intelligence to Digitization Initiatives

• BHL invited to be a part of the EOL project.

• EOL - build one web page for each known species… 1.8 million!

• Alfred P. Sloan and MacArthur Foundations

Page 30: Applying Taxonomic Intelligence to Digitization Initiatives

Northeast Digitization Center

• Boston Public Library

– Space infrastructure

• 10 Scanning Stations

• 10¢ per page

• 50 Books per day

• Journals - metadata, foldouts

• Transportation

– ILL delivery

– moving company

– 15 rolling carts per trip

Cathy Norton, Bernie Margolis, Brewster Kahle

Photo by lesveilleus 9/20/07

Page 31: Applying Taxonomic Intelligence to Digitization Initiatives

Economies of Scale

• North East Regional Digitization Center

• Agreements made with the Boston Public Library to include the Boston Library Consortium and NE BHL members.

• Smithsonian and Library of Congress

• Field Museum of Ill

• BNH UK and Kew UK*

Barbara Preece, Exec Director, BLC

Cathy Norton, MBLWHOI Library

Page 32: Applying Taxonomic Intelligence to Digitization Initiatives

10 scribes

BPL

Page 33: Applying Taxonomic Intelligence to Digitization Initiatives

Biology Digitization Projects: Problems, Dilemmas, Puzzles, Difficulties

• Copyright - Pre 1923, 1923-1964, orphan works, out-of-print

– Stanford University Copyright Renewal Database

• Permissions

• Collaboration with publisher, societies, institutions, etc.

• Duplicates, journals 85,000 - 14,000 BID LIST

• Monographs, collection analysis -- Ref Works

Page 34: Applying Taxonomic Intelligence to Digitization Initiatives

Name Changes over Time

Taxonomic Intelligence

Page 35: Applying Taxonomic Intelligence to Digitization Initiatives

“All accumulated information of a species is tied to a scientific name, a name that serves as a link between what has been learned in the past and what we today add to the body of knowledge.”

~ Grimaldi & Engel, 2005, Evolution of the Insects

Page 36: Applying Taxonomic Intelligence to Digitization Initiatives

• Information about named groups (taxa) of organisms (taxon-related information)

• Extends back at least 1000 years

• Books, journals, surveys

• Museum specimens, herbaria

• In many languages and is distributed

From T.E. Glover, The Fishes of Southwestern Japan, c.1870

Page 37: Applying Taxonomic Intelligence to Digitization Initiatives

The challenge for contemporary DIGITAL libraries

But … names of organisms change over time

Goal:

Use one name to find the content for all names

Page 38: Applying Taxonomic Intelligence to Digitization Initiatives

Names are even misspelled, such as

Loligo pealei

Loligo pealeii

Loligo pealii

Loligo pealei
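
A minimal sketch of how spelling variants like these might be folded onto a single spelling, using Python's standard `difflib`. The reference list, the 0.85 similarity cutoff, and the choice of "Loligo pealeii" as the accepted spelling are assumptions made for illustration; the slides do not say how the matching is done.

```python
import difflib

# Hypothetical reference list; a real service would hold millions of names.
KNOWN_NAMES = ["Loligo pealeii", "Escherichia coli", "Peranema trichophorum"]

def closest_name(candidate, known=KNOWN_NAMES, cutoff=0.85):
    """Return the known name most similar to `candidate`, or None if nothing is close."""
    matches = difflib.get_close_matches(candidate, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None

# The three spellings shown on the slide all resolve to the same entry.
variants = ["Loligo pealeii", "Loligo pealii", "Loligo pealei"]
resolved = {v: closest_name(v) for v in variants}
```

A cutoff this high keeps unrelated names apart, but it would also miss heavier OCR damage, so a production matcher would likely need more than plain string similarity.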

Page 39: Applying Taxonomic Intelligence to Digitization Initiatives

Homonyms and polysemes

Virginia

• People

• Places

• Animals

And of course Anorexia nervosa, Habeas corpus, and Etcetera etcetera

Peranema– the fern

Peranema– the euglenid
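
One way a homonym such as Peranema could be told apart is by looking at the words around it. The sketch below is only an illustration of that idea: the cue-word lists and the `disambiguate` function are hypothetical and are not taken from the slides.

```python
# Hypothetical sense inventory: one name string, two unrelated taxa.
SENSES = {
    "Peranema": {
        "fern": {"frond", "spore", "rhizome", "fern"},
        "euglenid": {"flagellum", "protist", "cell", "euglenid"},
    }
}

def disambiguate(name, context):
    """Pick the sense whose cue words overlap most with the context.

    Returns None when the name is unknown, nothing overlaps, or senses tie.
    """
    words = set(context.lower().split())
    scores = {sense: len(cues & words) for sense, cues in SENSES.get(name, {}).items()}
    if not scores:
        return None
    best = max(scores.values())
    winners = [sense for sense, score in scores.items() if score == best]
    return winners[0] if best > 0 and len(winners) == 1 else None
```

Returning None on a tie leaves the ambiguous case visible instead of guessing, which seems the safer default for an index.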

Page 40: Applying Taxonomic Intelligence to Digitization Initiatives

Libraries

Publishers

Museums

Federal Agencies

Search engines

Federated databases

Students and researchers

106000515358003371215585018700

Red spotted newt

COML

Page 41: Applying Taxonomic Intelligence to Digitization Initiatives

Serious challenges in federated environments

One organism

4 scientific names

4 maps

We want one map

Page 42: Applying Taxonomic Intelligence to Digitization Initiatives

• Metadata – such as names – provided the power to index and search

• Classifications allowed us to browse, navigate, and run hierarchical searches

Classifications

Page 43: Applying Taxonomic Intelligence to Digitization Initiatives

Reconciliation – linking alternative names for the same organism

A query initiated with any name can be expanded to all names and will unify the data associated with each
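
Reconciliation as described here can be sketched as query expansion over synonym groups. The Escherichia coli names are the ones used elsewhere in this talk; the `GROUPS` and `RECORDS` structures, the record labels, and both function names are hypothetical.

```python
# Each set holds names treated as referring to the same organism.
# The grouping structure is an assumption made for this sketch.
GROUPS = [
    {"Escherichia coli", "Bacterium coli", "Bacillus coli"},
]

# Hypothetical records, filed under whichever name the source used.
RECORDS = {
    "Escherichia coli": ["record A"],
    "Bacterium coli": ["record B"],
    "Bacillus coli": ["record C"],
}

def expand(name):
    """Return every name reconciled with `name`, or just `name` if none is known."""
    for group in GROUPS:
        if name in group:
            return sorted(group)
    return [name]

def reconciled_search(name):
    """Collect the records filed under any name in the expanded set."""
    hits = []
    for alias in expand(name):
        hits.extend(RECORDS.get(alias, []))
    return hits
```

Starting from any of the three names returns the same three records, which is the "one name finds the content for all names" behaviour the talk sets as its goal.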

Page 44: Applying Taxonomic Intelligence to Digitization Initiatives

And the other issue: connecting ALL data about all organisms together?

• Data stores – mostly was not happening (despite the success of GenBank)

• Search engines – not taxonomically intelligent and missed 90%

• Hyperlinks – slow, tedious, and unstable

• Dynamic links – using variables, databases, and code (e.g. micro*scope)

• Federation – cluster of partners playing by the same rules (e.g. OBIS)

• Data transfer standards – rules that anyone can use (e.g. DiGIR, TAPIR, UBDB)

• APIs – spigots from databases

• Aggregation (mashups) – the chosen way

Page 45: Applying Taxonomic Intelligence to Digitization Initiatives

Taxonomic intelligence is the inclusion of taxonomic practices, skills and knowledge within informatics services to manage information about organisms

• All names & all Classifications ClassificationBank

• Alternative names reconciled

• Similar names disambiguated

• Exploit hierarchies to browse and search, build a comprehensive classification

• Improve performance with federated systems

• Read documents, web sites, and databases, and taxonomically index the content

• Create a unified portal to information about organisms on the internet
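
The point about exploiting hierarchies can be illustrated with a small child-to-parent map: a search on a higher taxon is widened to everything classified beneath it. The map below is a tiny hand-picked classification that skips ranks, and `descendants` is a hypothetical helper, not part of any service named in the talk.

```python
# Child -> parent links for a very small, illustrative classification.
PARENT = {
    "Loligo pealeii": "Loligo",
    "Loligo": "Loliginidae",
    "Loliginidae": "Cephalopoda",
    "Octopus vulgaris": "Octopus",
    "Octopus": "Octopodidae",
    "Octopodidae": "Cephalopoda",
}

def descendants(taxon):
    """All taxa below `taxon`, found by walking each entry up toward the root."""
    found = set()
    for child in PARENT:
        node = child
        while node in PARENT:
            node = PARENT[node]
            if node == taxon:
                found.add(child)
                break
    return found
```

A hierarchical search for "Loliginidae" would then also query "Loligo" and "Loligo pealeii", so content indexed only at species level is still found.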

Page 46: Applying Taxonomic Intelligence to Digitization Initiatives

Taxonomically intelligent aggregation technology builds portals to distribute information about organisms

• There are many resources out there, but no single comprehensive resource for species information

• Rather than building another big database, we can create a new way to link existing information using an aggregation portal

• This places little or no burden on data providers

• Protecting ownership and diversity of initiatives

Page 47: Applying Taxonomic Intelligence to Digitization Initiatives

uBioRSS Taxonomically Intelligent RSS Feed Aggregator

Page 48: Applying Taxonomic Intelligence to Digitization Initiatives

MBL WHOI Library – Woods Hole authors’ publications

Page 49: Applying Taxonomic Intelligence to Digitization Initiatives

MBL WHOI Library – Woods Hole species publications

Page 50: Applying Taxonomic Intelligence to Digitization Initiatives

Taxonomically intelligent scientific text parsing

Page 51: Applying Taxonomic Intelligence to Digitization Initiatives

Taxonomically intelligent scientific text parsing

Page 52: Applying Taxonomic Intelligence to Digitization Initiatives

Taxonomic intelligence works miracles

• It will benefit any initiative that uses distributed and heterogeneous information about biology

• Distributed content on the same species can be drawn together because different names will be standardized through reconciliation

• We can read documents, find names, catalog and taxonomically index documents

• Produce a framework around which we can organize and assemble remote and local content

Page 53: Applying Taxonomic Intelligence to Digitization Initiatives

“Taxonomic intelligence” enhances search

Page 54: Applying Taxonomic Intelligence to Digitization Initiatives

• Documents go to Internet Archive for OCR and storage

• The documents are added to the BHL collection

• uBio checks the BHL collection for new documents

• The documents are scanned for names

• TaxonFinder adds new strings to Namebank

• Document markup with anchors

• TaxonFinder adds all namebankIDs to Taxonomic Index

• This index is called upon by various applications...
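
The name-scanning and indexing steps above could look roughly like the following. This is a sketch under stated assumptions, not a description of how TaxonFinder works: the regular expression, the `namebank` and `taxonomic_index` dictionaries, and `index_document` are all hypothetical stand-ins, and the sketch only recognises names already on the list instead of adding new strings.

```python
import re

# Hypothetical stand-ins for the name list and the taxonomic index.
namebank = {"Loligo pealeii": 1, "Escherichia coli": 2}
taxonomic_index = {}  # name ID -> set of document IDs

# Candidate binomials: a capitalised word followed by a lowercase word of 3+ letters.
BINOMIAL = re.compile(r"\b([A-Z][a-z]+ [a-z]{3,})\b")

def index_document(doc_id, ocr_text):
    """Find known names in OCR text and file doc_id under each name's ID."""
    found = []
    for candidate in BINOMIAL.findall(ocr_text):
        name_id = namebank.get(candidate)
        if name_id is not None:
            taxonomic_index.setdefault(name_id, set()).add(doc_id)
            found.append(candidate)
    return found
```

Applications would then query `taxonomic_index` by name ID to list every scanned document that mentions a given organism.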

Page 55: Applying Taxonomic Intelligence to Digitization Initiatives

Biological Data Revolution

Biomedical Knowledge Biodiversity Knowledge

Page 56: Applying Taxonomic Intelligence to Digitization Initiatives

Scientific Names

No Complete List of Scientific Names

112,133 741,872

49,382*

*Scientific Names ≠ Species

Published Variants

Escherichia coli

Objective Synonyms

Bacterium coli

Bacillus coli

Mis-spellings

Escheria coli

Page 57: Applying Taxonomic Intelligence to Digitization Initiatives

Taxonomic Knowledge

Page 58: Applying Taxonomic Intelligence to Digitization Initiatives

Data, Data, Everywhere

Page 59: Applying Taxonomic Intelligence to Digitization Initiatives

The ‘biopipes’ concept

BIOPIPES

Nomenclator Zoologicus

Then, dragged the functions (pipes) you wanted onto your desktop

get data

blast

get tree

get matching clade name

get ITIS preferred name

GoogleEarth

get all names

reconciled search

myEoL page

get subset of EoL species site

Original publication information

Get original description

And, of course, saved the functionality to apply to the next data
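
The idea of chaining pipes and saving the result for reuse maps naturally onto function composition. The sketch below assumes each pipe is a plain single-argument function; `pipeline`, `get_all_names`, `search_holdings`, and the sample data are hypothetical, and real pipes would call remote services.

```python
from functools import reduce

def pipeline(*steps):
    """Chain single-argument functions left to right into one reusable function."""
    return lambda data: reduce(lambda value, step: step(value), steps, data)

# Hypothetical pipes with hard-coded sample data.
def get_all_names(name):
    synonyms = {"Escherichia coli": ["Bacterium coli", "Bacillus coli"]}
    return [name] + synonyms.get(name, [])

def search_holdings(names):
    holdings = {"Escherichia coli": ["doc-1"], "Bacterium coli": ["doc-2"]}
    return [doc for n in names for doc in holdings.get(n, [])]

# The saved functionality: build once, apply to the next piece of data.
find_literature = pipeline(get_all_names, search_holdings)
```

Calling `find_literature` with a different starting name reruns the same chain, which is the "apply to the next data" step on the slide.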

Page 60: Applying Taxonomic Intelligence to Digitization Initiatives

Proceeding Boldly

Page 61: Applying Taxonomic Intelligence to Digitization Initiatives

Progress

MID 2007 -- July

• EoL funds started to flow

By MID 2008

• EOL Informatics Teams -- Core bioinformatics infrastructure (Taxonomic Intelligence and high priority marine modules of the Universal Biodiversity Data Bus) will be in place

• BioPipes for OBIS, BOLD, GENBANK, EoL, BHL

• List of most marine genera

• EoL, with agreement, will show content from FishBase, SeaLifeBase, CephBase, etc.

• RSS feeds and other alerts established to inform interested parties of new content

Page 62: Applying Taxonomic Intelligence to Digitization Initiatives

Progress

• BHL

• 10 Scribes installed in Boston

– MBL/Harvard/SI/BNH/MOBOT/Field Museum all scanning

– AMNH/NYBG will use NY PUBLIC

– Close to 2 million pages - AS OF NOV 07

Page 63: Applying Taxonomic Intelligence to Digitization Initiatives

What role will libraries play once the scanning is done?

Will you be negotiators like you are now with serials?

Public domain publications restricted for EVER by contract or open?

Page 64: Applying Taxonomic Intelligence to Digitization Initiatives

Road map

• Be a community

• Be a working environment

• Be creative

• Become an Informatics Center with scientific appointments.

• Think translation not transactions!

• Stay alive professionally

Page 65: Applying Taxonomic Intelligence to Digitization Initiatives

Acknowledgments

Neil Sarkar

David Remsen

David Patterson

Diane Rielinger

Martin Kalfatovic

Tom Garnett

Graham Higley

Connie Rinaldo

A.W. Mellon Foundation

Alfred P. Sloan Foundation

John D. and Catherine T. MacArthur Foundation