TDWG at the University of Tasmania

Post on 16-Apr-2017

Integrating Bio-Data

Lee Belbin, Manager, Infrastructure Project

TDWG (Biodiversity Information Standards)

Who has heard of GBIF?

GBIF

The Global Biodiversity Information Facility

GBIF

International organisation established to share bio-data

GBIF

Supported by ~42 countries (including Australia) and ~35 international organisations

GBIF

The Australian hub is through ABRS (ABIF)

Who has heard of TDWG?

…that’s what I figured

TDWG

Formerly: The Taxonomic Database Working Group

…but more accurately referred to as

Biodiversity Information Standards

Biodiversity Information Standards

International group responsible for standards and protocols for sharing bio-data

EOL?

Encyclopedia of Life

ALA?

Atlas of Living Australia

GBIF, EoL, ALA …

…are now or will be based on TDWG standards

So what?

(…hence this talk…)

Science is getting more collegiate

… a good thing.

The Project

US$2 million over 2.5 years (Gordon & Betty Moore Foundation)

Aim

To improve the standards for sharing 'bio-data'

Why?

The whole is (far) more than the sum of the parts…

People

Lee Belbin (Hobart: Manager)
Roger Hyam (Edinburgh: Systems Architect)
Ricardo Pereira (Brasilia: Software Engineer)
Donald Hobern (Copenhagen: GBIF & now Manager of the ALA)
Stan Blum (San Francisco: TDWG old timer!)

Once … we had paper!

and calculators!

The attitude:

“It’s Mine!”

Then..

… but we are moving to

…far more open sharing and integration of data

This will enable

…more effective environmental and species conservation / management

(among many other things)

To do this, we need effective standards

…using ‘web 2.0’ technologies

Video

‘The web is us’

http://www.youtube.com/watch?v=6gmP4nk0EOE

Standards?

…Good ones are transparent to most who use them

But for your education … I’ll give you a little insight … it will be good for you.

Promise

Standards to exchange bio-data have three components:

1. An ontology
2. GUIDs
3. Transport protocols

1. Ontology

A data model that represents a formal set of concepts within a domain and the relationships between those concepts

Ontologies

…are the basis of the Semantic Web, where objects are given meaning which computers and humans can understand

Ontologies

…can be used by machines to reason about the objects within that domain
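To make "reasoning" concrete, here is a minimal sketch of one very simple inference: if an ontology records only direct "is a kind of" links, a program can still derive the indirect ones. The concept names and the `SUBCLASS_OF` table are hypothetical and are not taken from any TDWG ontology.

```python
# Hypothetical direct "is a kind of" links between concepts.
SUBCLASS_OF = {
    "Ant": "Insect",
    "Insect": "Animal",
    "Animal": "Organism",
}


def ancestors(concept, hierarchy):
    """Return every broader concept reachable from `concept`, nearest first."""
    chain = []
    seen = {concept}
    while concept in hierarchy:
        concept = hierarchy[concept]
        if concept in seen:  # guard against a badly formed, circular hierarchy
            raise ValueError("cycle in hierarchy")
        seen.add(concept)
        chain.append(concept)
    return chain


def is_a(concept, candidate, hierarchy):
    """True if `concept` is `candidate` or a (possibly indirect) kind of it."""
    return concept == candidate or candidate in ancestors(concept, hierarchy)


ant_is_animal = is_a("Ant", "Animal", SUBCLASS_OF)
animal_is_ant = is_a("Animal", "Ant", SUBCLASS_OF)
```

Nobody stated that an ant is an animal; the program inferred it from the two links it was given. Real ontology reasoners do far more than this, but the principle is the same.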

Resource Description Framework …

RDF is the language of the Semantic Web

ALL data can be stored in the form of ‘RDF triples’ …

subject – predicate (verb) – object

Wine – has vintage – 2005
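A minimal sketch of the triple idea in plain Python, with no RDF library. Only the wine/vintage triple comes from the slide; the other triples and the `match` helper are assumptions added for illustration. Real RDF also identifies subjects and predicates with URIs rather than bare strings.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


triples = [
    Triple("Wine", "has vintage", "2005"),       # from the slide
    Triple("Wine", "has colour", "red"),         # hypothetical
    Triple("Specimen", "has collector", "A. N. Other"),  # hypothetical
]


def match(store, subject=None, predicate=None, obj=None):
    """Return the triples that agree with every field that was supplied."""
    return [
        t for t in store
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


wine_facts = match(triples, subject="Wine")
vintage = match(triples, subject="Wine", predicate="has vintage")[0].obj
```

Because every statement has the same three-part shape, one generic query function works for any kind of data. That uniformity is what lets independent datasets be merged.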

2. GUIDs

Globally Unique Identifiers

GUIDs

Assigned by authorities to their (bio) objects

GUIDs

…Remain attached to data objects (with attribution!)

GUIDs

… When ‘clicked’ return ‘semantic’ metadata / data

GUID of Choice …

Life Science Identifiers (LSIDs)
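An LSID is a URN of the form `urn:lsid:<authority>:<namespace>:<object>` with an optional trailing `:<revision>`. The sketch below splits one into its parts. The `parse_lsid` helper and the `example.org` identifiers are hypothetical, and the check is deliberately loose (it does not validate the characters allowed in each part).

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str           # who assigned the identifier
    namespace: str           # the authority's own grouping, e.g. a collection
    object_id: str           # the object within that namespace
    revision: Optional[str]  # optional version of the object


def parse_lsid(text: str) -> LSID:
    """Split an LSID string into its parts; raise ValueError if malformed."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    if any(part == "" for part in parts[2:5]):
        raise ValueError(f"empty component in LSID: {text!r}")
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    return LSID(parts[2], parts[3], parts[4], revision)


plain = parse_lsid("urn:lsid:example.org:specimens:12345")
versioned = parse_lsid("urn:lsid:example.org:specimens:12345:2")
```

The authority part is what makes attribution travel with the data: wherever the identifier ends up, it still names who issued it.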

3. Transport Protocols

…Map local data to global standards

Transport Protocols

… Enable searching across geographically separated data repositories (based on different systems)
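The two jobs above (mapping local data to a shared standard, then searching across repositories) can be sketched as follows. This illustrates the idea only and is not an implementation of any particular protocol: the two repositories, their records, and their local field names are hypothetical, and the shared term names are chosen for illustration.

```python
# Two hypothetical repositories that describe the same kind of thing
# using different local field names.
herbarium = [
    {"sci_name": "Eucalyptus globulus", "place": "Site A"},
    {"sci_name": "Acacia dealbata", "place": "Site B"},
]
museum = [
    {"taxon": "Eucalyptus globulus", "loc": "Site C"},
]

# Each repository publishes a mapping from its local fields to shared terms.
HERBARIUM_MAP = {"sci_name": "scientificName", "place": "locality"}
MUSEUM_MAP = {"taxon": "scientificName", "loc": "locality"}

SOURCES = [
    ("herbarium", herbarium, HERBARIUM_MAP),
    ("museum", museum, MUSEUM_MAP),
]


def to_shared(record, mapping):
    """Rename a record's local fields to the shared terms."""
    return {mapping[key]: value for key, value in record.items() if key in mapping}


def search_all(sources, term, value):
    """Run one query, phrased in shared terms, against every repository."""
    hits = []
    for name, records, mapping in sources:
        for record in records:
            shared = to_shared(record, mapping)
            if shared.get(term) == value:
                hits.append((name, shared))
    return hits


hits = search_all(SOURCES, "scientificName", "Eucalyptus globulus")
```

The person searching never needs to know that one repository calls the field `sci_name` and the other `taxon`. Each provider keeps its own system and maintains only the mapping.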

The transport protocol of choice …

TAPIR: TDWG Access Protocol for Information Retrieval

Transport Protocol

Video

http://www.youtube.com/watch?v=x9404is3RJ8

An Example

Antbase, Google, GenBank, PubMed ‘skimmed’ for RDF and GUIDs using TAPIR

… Emergent Properties

…there are specimens that have been barcoded and which are labelled in GenBank as unidentified (i.e., names like "Melissotarsus sp. BLF m1"), but the same specimen has a proper name in AntWeb (e.g., casent0107665-d01 is Melissotarsus insularis).

We can then use this information to add value to GenBank. For example, a search of GenBank for sequences for Melissotarsus insularis finds nothing, but it does have sequences for this taxon, albeit under the name "Melissotarsus sp. BLF m1".

Rod Page
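The example above comes down to joining two sources on a shared specimen identifier. Here is a minimal sketch: the specimen code and the two names are from the quotation, while the record layout, the `SEQ-PLACEHOLDER` sequence id, and the function names are hypothetical.

```python
# Sequence records as a GenBank-like source might hold them (layout is hypothetical).
sequence_records = [
    {
        "specimen": "casent0107665-d01",
        "name": "Melissotarsus sp. BLF m1",
        "sequence_id": "SEQ-PLACEHOLDER",  # placeholder, not a real accession
    },
]

# Identifications as an AntWeb-like source might hold them, keyed by specimen.
identifications = {
    "casent0107665-d01": "Melissotarsus insularis",
}


def resolve_names(sequences, identified):
    """Attach the better name from `identified` wherever the specimen matches."""
    resolved = []
    for record in sequences:
        better = identified.get(record["specimen"])
        resolved.append({**record, "resolved_name": better or record["name"]})
    return resolved


def search_by_name(records, name):
    """Find records whose original or resolved name equals `name`."""
    return [r for r in records if name in (r["name"], r.get("resolved_name"))]


before = search_by_name(sequence_records, "Melissotarsus insularis")
after = search_by_name(
    resolve_names(sequence_records, identifications), "Melissotarsus insularis"
)
```

Before the join the search finds nothing; after it, the sequence turns up under the proper name. Neither source held that answer by itself, which is the sense in which the property is "emergent".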
