On-line biological data concepts at CSIRO Marine Research, Australia
Tony Rees & Kim Finney
Divisional Data Centre
CSIRO Marine Research, Hobart, Australia
http://www.marine.csiro.au/datacentre/
Pre-existing situation at CMR (before 1997)
• Data in a variety of databases and flat files
• No metadata or digital documentation
• No web access to any data or metadata
• CAAB (taxon coding system) in existence but coverage patchy and compliance variable
Our implementation path
Stage 1 (1997-2000) ...
• Construct a searchable, web-accessible metadata system and start populating it with information - MarLIN v1
• Upgrade CAAB to form a comprehensive taxon dictionary for MarLIN (also accessible by SQuID)
• Build a pilot data store and visualisation system with a web-driven GUI (Java applet) - SQuID v1
Stage 2 (2000-) ...
• Build SQuID v2 (onwards) to become a comprehensive data store, with upgraded links to MarLIN and CAAB
• Implement linkage between MarLIN and the Australia-wide distributed metadata search system
Stage 3… ???
Our system overview
Subsets of information shared with other metadata directory systems
Entry point to data
Display relevant metadata
Data directory (metadatabase) - holds info at “dataset” level (e.g. survey, species range)
Master data storage (includes index layer) - holds info at the atomic data level
Taxon dictionary
Digression #1: Taxon matching
• Simplistic view:
– text match on one field (“scientific name”) or two (genus + species)
• More comprehensive approach:
– 10 or more fields required, e.g. in CAAB we define the following: Genus, Subgenus, Species, Qualifier, Subspecies, Variety, Original Author/s, Original Date, Revising Author/s, Revision Date, Authority Addendum
– also need to flag:
   - Is botanical or zoological code applicable?
   - Species name Latin or informal (“sp. A”, etc.)?
   - Has name changed from original? (even if no revising author/date stored)
Examples from our database:
• Chlamys (Belchlamys) aktinos (Petterd, 1886) … a scallop
• Ophiaster hydroideus (Lohmann) Lohmann, 1913 emend. Manton & Oates, 1983 … a coccolithophorid
• Heteroclinus sp. 1 [in Gomon et al, 1994] .. Kuiter's weedfish
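As a rough illustration of why a one- or two-field text match falls short, the sketch below holds a name as separate fields and rebuilds the display string from them. The class and field names are hypothetical and cover only a subset of the fields listed above; this is not the CAAB schema. It assumes the zoological convention that a parenthesised authority marks a name changed from the original, and it does not attempt botanical-style authorities such as the Ophiaster example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaxonName:
    """Hypothetical structured name record (subset of the fields listed above)."""
    genus: str
    species: str
    subgenus: Optional[str] = None
    original_author: Optional[str] = None
    original_date: Optional[str] = None
    name_changed: bool = False  # flag: has the name changed from the original?
    informal: bool = False      # flag: informal species name such as "sp. 1"
    addendum: Optional[str] = None

    def display(self) -> str:
        parts = [self.genus]
        if self.subgenus:
            parts.append(f"({self.subgenus})")
        parts.append(self.species)
        name = " ".join(parts)
        if self.original_author:
            authority = self.original_author
            if self.original_date:
                authority += f", {self.original_date}"
            if self.name_changed:
                # Zoological convention: parentheses around the authority
                authority = f"({authority})"
            name += " " + authority
        if self.addendum:
            name += " " + self.addendum
        return name


scallop = TaxonName("Chlamys", "aktinos", subgenus="Belchlamys",
                    original_author="Petterd", original_date="1886",
                    name_changed=True)
weedfish = TaxonName("Heteroclinus", "sp. 1", informal=True,
                     addendum="[in Gomon et al, 1994]")

print(scallop.display())
print(weedfish.display())
```

Matching on the individual fields (rather than on the assembled string) lets a query for genus "Chlamys" and species "aktinos" succeed whether or not the subgenus or authority is supplied.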
Taxon matching … continued
• We have standardised on taxon codes, rather than names for data storage and matching … names are stored as an attribute of the code (and can be updated in the future as needed)
• Our “CAAB” coding system has evolved over 20+ years - earlier generations of codes are maintained on the system
• New web-based access facility for retrieving latest name for a code, searching for a taxon, etc.
• Same CAAB codes are also used by other marine science/fisheries agencies around Australia
• Facility newly implemented in CAAB to hold ITIS codes, for cross-reference to international systems in the future
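The code-based approach above can be sketched as follows: data records carry only a code, the name is looked up at display time, and an earlier-generation code is resolved to its current equivalent. The two dictionaries are hypothetical in-memory stand-ins for database tables, and the legacy code "OLD-0001" is made up for illustration; the two current codes and names are taken from the MarLIN species listing later in this talk.

```python
# Hypothetical stand-in for the taxon dictionary: code -> (scientific name, common name)
NAMES = {
    "37 118001": ("Saurida undosquamis", "brushtooth lizardfish"),
    "23 636004": ("Nototodarus gouldi", "Gould's squid"),
}

# Hypothetical table of earlier-generation codes -> their replacement code.
# "OLD-0001" is an invented placeholder, not a real legacy code.
SUPERSEDED = {"OLD-0001": "37 118001"}


def current_code(code: str) -> str:
    """Follow the chain of superseded codes until a current code is reached."""
    seen = set()
    while code in SUPERSEDED:
        if code in seen:
            raise ValueError(f"circular supersession for code {code!r}")
        seen.add(code)
        code = SUPERSEDED[code]
    return code


def current_name(code: str) -> str:
    """Return the latest scientific name for a current or legacy code."""
    return NAMES[current_code(code)][0]


print(current_name("OLD-0001"))
print(current_name("23 636004"))
```

Because names are an attribute of the code, a later name revision means updating one dictionary entry; stored data records are untouched.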
CAAB services available
CAAB user interface
User searches by scientific name, common name or taxon code (or portion thereof)
• Retrieve current sci. name, common name(s), taxon code, taxon report
• Initiate a MarLIN search, ITIS report, FishBase report
• List taxa by CAAB category or family
Application-level requests
• Generate scientific name, common name, current code (if applicable) for a given taxon code
• Call a CAAB taxon report
• List taxa matching query
• Translate an ITIS number to a CAAB code (or vice versa)
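A minimal sketch of the "search by scientific name, common name or taxon code (or portion thereof)" behaviour, assuming a simple case-insensitive substring match. The function name and the in-memory list are hypothetical; the real service runs against the CAAB database. The sample rows are taken from the MarLIN species listing later in this talk.

```python
# (taxon code, scientific name, common name)
TAXA = [
    ("23 636004", "Nototodarus gouldi", "Gould's squid"),
    ("28 786002", "Metanephrops boschmai", "Boschma's scampi"),
    ("28 786005", "Metanephrops velutinus", "velvet scampi"),
    ("37 118001", "Saurida undosquamis", "brushtooth lizardfish"),
    ("37 258002", "Beryx splendens", "alfonsino"),
]


def find_taxa(query: str):
    """Return every taxon where any field contains the query (case-insensitive)."""
    q = query.strip().lower()
    return [taxon for taxon in TAXA
            if any(q in field.lower() for field in taxon)]


for code, sci_name, common in find_taxa("scampi"):
    print(code, sci_name, "..", common)
```

The same function serves a code fragment such as "37 1" or a genus such as "metanephrops", since all three fields are searched.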
CAAB web interface (current version)
Digression #2: taxonomy keywords
• CAAB uses “major categories” (mostly = phyla)
• MarLIN uses Australian “Blue Pages” keywords (c. 100 terms) - independent of CAAB codes (in current implementation)
• NASA GCMD keywords would be an OBIS option (maybe with additions to suit OBIS) - c. 50 currently relevant … could also cross-map to GEMET (EC) list (c. 200)
EARTH SCIENCE >> Biosphere >> Zoology >> Amphibians, Anemones, Arachnids, Arthropods, Birds, Centipedes, Corals, Crustaceans, Echinoderms, Fish, Flatworms, Insects, Invertebrates, Jellyfish, Mammals, Millipedes, Mollusks, Reptiles, Roundworms, Segmented Worms, Sponges, Vertebrates, Zooplankton
EARTH SCIENCE >> Biosphere >> Microbiota >> Amoebae, Bacteria, Blue-green Algae, Ciliates, Coccolithophore, Diatoms, Flagellates, Foraminifers, Microalgae, Microphyte, Phytoplankton, Plankton, Protist, Radiolarians, Zooplankton
EARTH SCIENCE >> Biosphere >> Vegetation >> Algae, Flowering Plants, Lichens, Macroalgae, Macrophyte, Phytoplankton
Taxonomy keyword cross-mapping (examples)
GCMD list:
Invertebrates, Sponges, Jellyfish, Anemones, Corals, Flatworms, Roundworms, Segmented Worms, Mollusks, Arthropods, Insects, Arachnids, Echinoderms, Crustaceans, Vertebrates, Fish, Amphibians, Reptiles, Birds, Mammals
GEMET list:
invertebrate … S709, poriferan … S744, coelenterate … S737, coral … S738, nematode … S743, annelid … S711 ++, mollusc … S740, cephalopod … S741, gastropod … S742, arthropod … S713, insect … S719 ++, chelicerate … S714 ++, echinoderm … S739, crustacean … S717, vertebrate … S649, fish … S754, amphibian … S650 ++, reptile … S691 ++, bird … S654 ++, mammal … S664 ++
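A cross-map of this kind can be held as a simple lookup table. The sketch below is illustrative only: the pairings shown are a by-name reading of the two lists (the original slide's linkage between terms is not preserved in this transcript), the function name is hypothetical, and treating "Flatworms" as having no GEMET counterpart is an assumption based on no matching term appearing in the GEMET list above.

```python
# Assumed pairings, read by name from the two lists: GCMD term -> (GEMET term, GEMET code)
GCMD_TO_GEMET = {
    "Sponges": ("poriferan", "S744"),
    "Corals": ("coral", "S738"),
    "Mollusks": ("mollusc", "S740"),
    "Fish": ("fish", "S754"),
    "Birds": ("bird", "S654"),
    "Flatworms": None,  # assumption: no counterpart in the GEMET list shown
}


def to_gemet(gcmd_keyword: str):
    """Map a GCMD keyword (full path or final term) to a GEMET (term, code) pair.

    Returns None when the term is unmapped or has no known counterpart.
    """
    term = gcmd_keyword.rsplit(">>", 1)[-1].strip()
    return GCMD_TO_GEMET.get(term)


print(to_gemet("EARTH SCIENCE >> Biosphere >> Zoology >> Fish"))
print(to_gemet("Flatworms"))
```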
MarLIN - used for data discovery
• MarLIN - based on an Oracle database containing dataset, project, and survey descriptions, plus on-line links to data and web resources
• Holds metadata according to regional (ANZLIC and “Blue Pages”) standards, with additional agency-constructed fields (“extended ANZLIC”)
• Web interface for searching and metadata contribution/update, using HTML, Oracle Web Server and custom PL/SQL application
• Produces lists of datasets, or dataset reports, as requested
• Includes links to pre-formatted data “packets” (now) and to SQuID (in future), for access to the data
NB: no data visualisation capability, apart from “thumbnails” showing data extent
MarLIN - behind the scenes
• Some 25+ tables, holding the following:
– text-based fields (e.g. title, abstract, contributors, references, etc.)
– keywords, handled as numeric IDs (including taxonomic keywords)
– species/species groups, handled as CAAB codes
– spatial extent, handled as bounding coordinates (max. and min. latitude and longitude)
– time extent, handled as earliest and latest collection date for items in the dataset
– originator organisation, present custodian, survey, contact person, etc., handled as numeric IDs
• Initial search set up by keyword/ID type, spatial coordinates, time period (if desired)
• Then search/browse by subject categories, keywords, taxon names, contributing project, vessel/voyage identifier, location of data, etc.
• Free text search also supported
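The initial spatial/time filter described above amounts to an overlap test between the query extent and each dataset's stored bounding coordinates and date range. The sketch below is one way to express that test, not MarLIN's actual PL/SQL; the class name, years-only time extent, and the two sample dataset extents are hypothetical. The query region uses the North West Shelf coordinates shown in the search examples in this talk.

```python
from dataclasses import dataclass


@dataclass
class Extent:
    """Bounding coordinates (decimal degrees, south negative) plus a year range."""
    north: float
    south: float
    west: float
    east: float
    start_year: int
    end_year: int


def overlaps(a: Extent, b: Extent) -> bool:
    """True if the two extents intersect in both space and time.

    Simplification: boxes that cross the 180 degree meridian are not handled.
    """
    in_space = (a.south <= b.north and b.south <= a.north
                and a.west <= b.east and b.west <= a.east)
    in_time = a.start_year <= b.end_year and b.start_year <= a.end_year
    return in_space and in_time


# Query: Australian North West Shelf, 1990-1995
query = Extent(north=-17, south=-24, west=114, east=122,
               start_year=1990, end_year=1995)

# Hypothetical dataset extents for illustration
shelf_survey = Extent(north=-19, south=-21, west=115, east=118,
                      start_year=1991, end_year=1991)
southern_survey = Extent(north=-40, south=-44, west=144, east=149,
                         start_year=1992, end_year=1993)

print(overlaps(query, shelf_survey))
print(overlaps(query, southern_survey))
```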
MarLIN search interface
Example MarLIN search result - by taxonomic group
subject categories | custodian organisations | vessels | voyages | projects |taxonomic groups | species | habitats | parameters | equipment
The following choices are presently available for MarLIN records in the selected region and/or time period:
Start year: 1990 End year: 1995
Selected region: Australian North West Shelf (stored coordinates used: North=-17, West=114, South=-24, East=122)
Click on any hyperlink to see the full listing for that item.
Invertebrates 4
. . Cephalopods 1
. . . . Squids 1
. . Crustaceans 2
. . . . Prawns & Shrimps 2
Fishes 4
. . Breams 1
. . Dories 1
. . Leatherjackets 1
. . Perches 3
. . Redfishes 1
. . Roughies 1
. . Snappers 4
. . Whales 1
Example MarLIN search result - by species
subject categories | custodian organisations | vessels | voyages | projects |taxonomic groups | species | habitats | parameters | equipment
The following choices are presently available for MarLIN records in the selected region and/or time period:
Start year: 1990 End year: 1995
Selected region: Australian North West Shelf (stored coordinates used: North=-17, West=114, South=-24, East=122)
Click on any hyperlink to see the full listing for that item.
23 636004 Nototodarus gouldi .. Gould's squid 1
28 786002 Metanephrops boschmai .. Boschma's scampi 1
28 786005 Metanephrops velutinus .. velvet scampi 1
28 821001 Ibacus alticrenatus .. deepwater bug 1
28 821002 Ibacus pubescens .. [a shovel-nosed/slipper lobster] 1
37 118001 Saurida undosquamis .. brushtooth lizardfish 3
37 118016 Saurida sp. 2 [in Sainsbury et al, 1985] .. grey lizardfish 3
37 255004 Gephyroberyx darwinii .. Darwin's roughy 1
37 258002 Beryx splendens .. alfonsino 1
(etc.)
Example MarLIN search result - dataset titles
You searched on the following criteria:
Start year: 1990 End year: 1995
Selected region: Australian North West Shelf
CAAB Species: 37 118001 - Saurida undosquamis
There are 3 datasets matching your criteria in MarLIN at this time.
Click on the dataset title to view the metadata record for any dataset.
Southern Surveyor Voyage SS 02/90 - Biological Data Overview
Southern Surveyor Voyage SS 04/91 - Biological Data Overview
Southern Surveyor Voyage SS 08/95 - Biological Data Overview
SQuID - data repository and visualisation tool
• Oracle relational database containing c. 45 tables (present version)
• Holds point, poly-line, and polygon based, geo-referenced data (also time and depth referenced)
• Client runs as Java applet, connects to Oracle data store by Remote Method Invocation (RMI) and JDBC
• Search by spatial coordinates, time period, data “stream” … can subset by survey if desired
• Retrieve atomic-level data for inspection or upload to user’s system
• Basic plotting routines provided, such as:
– geographic distribution of data (sampling points, vessel tracks)
– vertical plots (e.g. temperature, salinity, oxygen vs depth)
– time-based plots (e.g. water temperature measurement through a voyage)
– pie charts for catch composition by number or weight
– length-frequency data, aggregated or by sex of individual
• Taxon handling using CAAB codes (system includes legacy data with obsolete codes)
• Links to MarLIN to display relevant metadata
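The data behind a catch-composition pie chart is a per-taxon percentage of the total, by number or by weight. The sketch below shows that aggregation under stated assumptions: the record layout and function name are hypothetical, and the counts and weights are made-up values, not survey data. Only the CAAB codes are taken from this talk.

```python
from collections import defaultdict

# Hypothetical catch records: (CAAB code, number of individuals, weight in kg).
# All counts and weights are invented for illustration.
CATCH = [
    ("37 118001", 40, 6.0),
    ("37 258002", 10, 12.0),
    ("37 118001", 20, 2.0),
    ("23 636004", 30, 20.0),
]


def composition(records, by="number"):
    """Return {CAAB code: percentage of total catch}, by 'number' or 'weight'."""
    if by not in ("number", "weight"):
        raise ValueError("by must be 'number' or 'weight'")
    column = 1 if by == "number" else 2
    totals = defaultdict(float)
    for record in records:
        totals[record[0]] += record[column]
    grand_total = sum(totals.values())
    return {code: 100.0 * value / grand_total for code, value in totals.items()}


print(composition(CATCH, by="number"))
print(composition(CATCH, by="weight"))
```

Aggregating on the code rather than the name means records entered under different spellings of the same taxon still fall into one slice.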
SQuID user interface - version 1.0
Example SQuID search result
SQuID atomic level data - example
Time series data in SQuID
SQuID vs MarLIN / CAAB - two different approaches
SQuID - a data-rich browser environment
• Large files uploaded to the browser to allow interactive functions (zoomable maps, on-demand display of sample details, cursor tracking, browser-generated plots)
• Disadvantages: more complex applet to load, longer waits for queries to be serviced, performance on user’s machine may be limiting
MarLIN & CAAB - a minimal browser environment
• No reliance on Java version control, browser plugins, etc.; no load time at startup
• All processing takes place on the server (can maximise performance there) - less stringent requirements for users in hardware terms
• Disadvantage: less real-time interactivity provided (although some workarounds possible)
… May look at a hybrid solution for SQuID v2 - prioritise what level of interactivity/data upload is really needed, handle more at server level
Some considerations for OBIS ...
• For agency-specific reasons, we have arrived at separate metadata/data systems. OBIS might want to integrate these two aspects more fully
• Automated generation/maintenance of metadata might be possible (at least in part) and is certainly desirable
• Where would OBIS metadata reside? (centrally or replicated or fully distributed?) - Australian “ASDD” is an example of a fully distributed system, NASA “GCMD” is a centralised one
• Need to decide on taxon handling for OBIS (names or codes), plus standard(s) for higher level searching
• OBIS software should aim to tolerate a diversity of agency-level systems, while encouraging/facilitating “best practice” data management
The End
CAAB web search