EIA Biodiversity Data Mobilisation

Post on 10-May-2015


GLOBAL BIODIVERSITY INFORMATION FACILITY

WWW.GBIF.ORG

Publishing EIA Biodiversity Data:

Technology and Infrastructure

Vishwas Chavan, Nick King and Francois Rogers

Global Biodiversity Information Facility

vchavan@gbif.org

Scoping Workshop on Developing an EIA Biodiversity Data Publishing Framework in South Africa

2-4 March 2010, Cape Town, South Africa

Contents

• EIA Biodiversity Data: Types and formats
• Data Capture & Digitisation tools
• Data Discovery
• Data Publishing
• Data Quality & fitness-for-use
• Data Hosting Centers
• Community Building Platforms

What are the challenges?

• More data types
• Richer user interface
• Better management
• Richer content
• Better synchronisation
• Improved discovery

EIA BIODIVERSITY DATA: TYPES AND FORMATS


Evidence

Metadata

Taxon names

Taxon concepts

Indices, Nomenclators, Namebanks

Biology, Conservation, Ecology, Distribution, Phylogenies ...

Geolocation, Country, Collector, Date …

Observation

Voucher specimen, Blood sample, DNA Barcode, Image, Audio, Video ...

Literature

Species banks, BHL, Plazi.org ...

EIA Biodiversity data are very diverse

DATA CAPTURE AND DIGITISATION TOOLS

Florin, Pandora, Taxis, Cassia, FieldNote, Mandala, ATTA, BirdRecorder

Data Capture and Digitisation Tools

uBio Tools

• Name recognition tool (FindIT)
• Author abbreviation resolver
• Checking classification (TSN name mapper)
• Deconstruct scientific name (ParseIT)
• Find scientific name (CrawlIT)
• etc…

http://www.ubio.org

GBIF Templates

• Capture data in DwC-compatible format
  • Occurrence Data Template
  • Names Data Template
• Facilitate authoring "resource metadata"
  • Occurrence template
  • Documentation for occurrence template
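To make "DwC-compatible format" concrete, here is a minimal Python sketch that writes field records to CSV using Darwin Core term names as column headers. It does not reproduce the actual GBIF Occurrence Data Template: the column subset, the `to_dwc_csv` helper and the sample record are assumptions chosen for illustration.

```python
import csv
import io

# Darwin Core term names used as column headers (a small illustrative subset,
# not the column set of the GBIF template).
DWC_COLUMNS = [
    "occurrenceID", "basisOfRecord", "scientificName", "eventDate",
    "country", "decimalLatitude", "decimalLongitude", "recordedBy",
]


def to_dwc_csv(records):
    """Serialise field records (dicts) to CSV text with Darwin Core headers."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=DWC_COLUMNS, lineterminator="\n")
    writer.writeheader()
    for record in records:
        # Keys that are not in DWC_COLUMNS are dropped; missing terms become
        # empty cells.
        writer.writerow({column: record.get(column, "") for column in DWC_COLUMNS})
    return buffer.getvalue()


# Hypothetical record, invented for illustration only.
sample = [{
    "occurrenceID": "eia-demo-0001",
    "basisOfRecord": "HumanObservation",
    "scientificName": "Protea cynaroides",
    "eventDate": "2010-03-02",
    "country": "South Africa",
    "decimalLatitude": -33.96,
    "decimalLongitude": 18.41,
    "recordedBy": "Example Surveyor",
    "siteNotes": "local field, not a Darwin Core column",
}]

csv_text = to_dwc_csv(sample)
print(csv_text)
```

Capturing data under shared term names at the point of entry is what makes later mapping and publishing largely mechanical.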

GBIF Informatics Architecture

Improved access to Names, Metadata and Primary Biodiversity Data

Distributed GBIF informatics architecture

Faster and easier publishing of data

DATA DISCOVERY

• GBRDS REGISTRY

• METADATA CATALOGUE

GBRDS: Global Biodiversity Resources Discovery System

DATA DISCOVERY: GBRDS REGISTRY

GBRDS, a Discovery System

[Diagram labels: Consumers; Data Publishers; Service Publishers; Others…; Discovery System; Registering; Discovering; Searching; Retrieving]

That links to resources…

Who? Institutions, Collections …

What? Data, Services, GUID/LSID…

Where? Location, Access points…

When? Temporal Scope…

How? Formats, protocols, qualities
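The Who/What/Where/When/How questions can be read as the fields of a registry entry. The Python sketch below is a toy in-memory model under that reading. It is not the GBRDS data model or API: `ResourceEntry`, `ToyRegistry` and the example entries are hypothetical names and values.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceEntry:
    """One registered resource, described by the five questions."""
    who: str      # institution or collection
    what: str     # data, service or identifier
    where: str    # location or access point
    when: str     # temporal scope
    how: list = field(default_factory=list)  # formats, protocols


class ToyRegistry:
    """In-memory stand-in for a discovery system: register, then discover."""

    def __init__(self):
        self._entries = []

    def register(self, entry):
        self._entries.append(entry)

    def discover(self, keyword):
        # Case-insensitive match on the "who" and "what" fields only.
        keyword = keyword.lower()
        return [
            entry for entry in self._entries
            if keyword in entry.who.lower() or keyword in entry.what.lower()
        ]


registry = ToyRegistry()
registry.register(ResourceEntry(
    who="Example Herbarium", what="Occurrence dataset",
    where="http://example.org/data", when="1990-2010", how=["DwC Archive"],
))
registry.register(ResourceEntry(
    who="Example Consultancy", what="EIA bird survey",
    where="http://example.org/eia", when="2009", how=["spreadsheet"],
))

matches = registry.discover("eia")
print([entry.who for entry in matches])
```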

A distributed service … which resolves to information resources

Global Biodiversity Resources Discovery System

• Institutions/Collections
• LSIDs/DOI/GUIDs
• Standards
• Protocols
• Resources
• Services/Applications
• etc…

GBRDS Registry Release: April 2010

DATA DISCOVERY: METADATA CATALOGUES


Two perspectives on metadata

Data Producer Perspective
• Document data with minimum effort
• Assess the value of the data for others
• Bridge the gap between data owners and users
• Educate users about the characteristics of the data

User Perspective
• Discover if data exists
• Identify source, provenance
• Make judgement about data quality and usability before getting it
• Minimise costs involved in the search, retrieval, integration and use of the data

Craglia: http://www.ec-gis.org/Workshops/6ec-gis/papers/craglia-metadata.doc

Two levels of metadata

Discovery Metadata
Discover if a resource exists; get information on:
• Ownership
• Location
• How to get further information

Full Metadata
Provides a full description of the resource, including:
• Data quality
• Data lineage
• Full access and exploitation
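One way to picture the two levels is to treat discovery metadata as a small projection of the full record. The Python sketch below assumes this reading. The field names and values are hypothetical and do not follow any particular metadata standard.

```python
# Hypothetical field names; no metadata standard is implied.
DISCOVERY_FIELDS = ("title", "owner", "location", "further_information")

# Invented full metadata record for illustration only.
full_metadata = {
    "title": "Example EIA vegetation survey",
    "owner": "Example Consultancy",
    "location": "Western Cape, South Africa",
    "further_information": "data-manager@example.org",
    "data_quality": "Coordinates recorded by handheld GPS",
    "data_lineage": "Field sheets digitised in 2010",
    "access_and_use": "Available on request",
}


def discovery_view(full):
    """Return only the fields needed to find out that the resource exists."""
    return {name: full[name] for name in DISCOVERY_FIELDS if name in full}


summary = discovery_view(full_metadata)
print(summary)
```

A catalogue could index only the discovery view and point back to the publisher for the full description.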

Metadata Standards

• Natural Collections Descriptions (NCD)
• Ecological Metadata Language (EML)
• ISO 19115/19139
• FGDC Biological Data Profile
• Dublin Core
• MRTG Multimedia Metadata Schema
• IPT 1.1 Metadata Profile

DATA PUBLISHING

Key Components: the IPT

The Integrated Publishing Toolkit is a state-of-the-art tool to simplify the mobilisation of biodiversity information resources such as Names, Metadata and primary biodiversity data

Data Publisher

Registration (GBRDS) + Publishing of Names, Metadata, Primary biodiversity data etc…

Simple process!

The Integrated Publishing Toolkit (IPT) is designed to simplify the mapping, indexing and harvesting of Names, Metadata and Primary Biodiversity Data!

GBIF Integrated Publishing Toolkit (IPT)

• Open source Java web application
• Bypasses limitations of traditional wrapper tools in publishing large amounts of data by publishing whole datasets in DwC-Archive dumps (especially useful for small data publishers or those with little or no internet access)
• Has a richer environment than current wrapper tools, providing some data cleaning, visualisation capabilities, and the ability to publish dataset metadata
• Documentation and download: http://code.google.com/p/gbif-providertoolkit/
• Demo site: http://ipt.gbif.org

* Darwin Core (Text-Archive) based on standard submitted to TDWG for review Feb 2009

IPT Publishes Through…

More to come…

IPT Demo

• Screencast of IPT demo
• GBIF Help Desk (helpdesk@gbif.org)

IPT 1.1 Release: April 2010

NAMES DATA

Scope of the Global Names Architecture

Referencing names in Checklists to a common Nomenclatural Index

Checklist Bank – A Name Services brokerage


Global broker of taxonomic data

Index of Taxonomic Catalogues and Annotated Checklists

Extends the GBIF network to support publishing Species-level data

Publishing Checklists to GBIF

• Using Integrated Publishing Toolkit
• Via pre-composed Spreadsheet templates
• Exporting according to DwC Archive format and registering a local data file (self-serve)
• GBIF desktop publishing tool
• Other taxonomic editors (EDIT/ITIS) that support DwC Archive format

Desktop Annotated Checklist Builder

Create, manage, publish

• Synonymised checklists
• Vernacular Names
• Distribution data
• Bibliography
• Type/Specimen data

Mac OS / Windows

Publishes “GBIF-ready” format

DwC Archive – simple, extensible Text-based format

Q3 2010
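As a rough picture of what a "simple, extensible text-based" DwC Archive looks like, the Python sketch below zips a small checklist CSV together with a `meta.xml` descriptor that maps columns to Darwin Core terms. It is a simplified illustration, not output of any GBIF tool: the file names, the three-column layout and the sample rows are assumptions, and the descriptor attributes should be checked against the Darwin Core text guide before real use.

```python
import csv
import io
import zipfile

DWC_TERMS = "http://rs.tdwg.org/dwc/terms/"
COLUMNS = ["taxonID", "scientificName", "taxonRank"]


def build_data_file(rows):
    """Write the checklist rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(COLUMNS)
    writer.writerows(rows)
    return buffer.getvalue()


def build_meta_xml(data_file):
    """Describe the core data file: column 0 is the id, the rest are terms."""
    fields = "\n".join(
        f'    <field index="{index}" term="{DWC_TERMS}{name}"/>'
        for index, name in enumerate(COLUMNS)
        if index > 0
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<archive xmlns="http://rs.tdwg.org/dwc/text/">\n'
        '  <core encoding="UTF-8" fieldsTerminatedBy="," '
        'linesTerminatedBy="\\n" fieldsEnclosedBy="&quot;" '
        f'ignoreHeaderLines="1" rowType="{DWC_TERMS}Taxon">\n'
        f'    <files><location>{data_file}</location></files>\n'
        '    <id index="0"/>\n'
        f'{fields}\n'
        '  </core>\n'
        '</archive>\n'
    )


def build_archive(rows):
    """Return the bytes of a zip holding the data file and its descriptor."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("taxon.csv", build_data_file(rows))
        archive.writestr("meta.xml", build_meta_xml("taxon.csv"))
    return buffer.getvalue()


# Illustrative checklist rows; the identifiers are made up.
archive_bytes = build_archive([
    ["1", "Protea cynaroides", "species"],
    ["2", "Aloe ferox", "species"],
])
print(len(archive_bytes), "bytes")
```

Extensions such as vernacular names or distribution data would be added as further files with their own entries in the descriptor, which is where the "extensible" part comes from.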

Controlled Vocabularies Server

• ISO: Countries
• ISO: Language
• DwC: Basis of Record
• DwC: Nomenclatural Status
• DwC: Sex (Gender)
• DwC: Taxonomic Status
• IUCN: Threat Status
• …

vocabularies.gbif.org

Vocabularies publishing platform – Internationalise all GBIF vocabularies
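A controlled vocabulary is most useful when records are checked against it. The Python sketch below validates a "Basis of Record" column against a hard-coded value set. The values are illustrative and were not retrieved from vocabularies.gbif.org, and `check_vocabulary` is a hypothetical helper, not part of any GBIF tool.

```python
# Illustrative value set for Basis of Record; a real check would load the
# published vocabulary instead of hard-coding it.
BASIS_OF_RECORD = {
    "PreservedSpecimen", "FossilSpecimen", "LivingSpecimen",
    "HumanObservation", "MachineObservation",
}


def check_vocabulary(records, term, allowed):
    """Normalise recognisable values in place; return (row, value) problems."""
    lookup = {value.lower(): value for value in allowed}
    problems = []
    for index, record in enumerate(records):
        raw = record.get(term, "")
        # Ignore case and spaces so "human observation" still matches.
        key = raw.strip().replace(" ", "").lower()
        if key in lookup:
            record[term] = lookup[key]
        else:
            problems.append((index, raw))
    return problems


# Hypothetical input rows.
records = [
    {"basisOfRecord": "human observation"},
    {"basisOfRecord": "sighting"},
    {"basisOfRecord": "PreservedSpecimen"},
]

problems = check_vocabulary(records, "basisOfRecord", BASIS_OF_RECORD)
print(problems)
```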

Controlled Vocabularies Server

Create, manage, publish

Extensions to Darwin Core

Extend Occurrence Data

Extend Species Data

vocabularies.gbif.org

Tie to vocabularies that are also drafted and published to this system. Then translate to your native language.

DATA QUALITY & FITNESS-FOR-USE

Fitness-for-use

• Primary biodiversity data can be used for multiple purposes by various user communities worldwide.

• Assessing and enhancing fitness-for-use of data is therefore critical for the scientific and social relevance of biodiversity science.

Fitness-for-use varies from one use case to another…

Data quality assessment and quality control are important components of a ‘fitness-for-use’ regime

Loss of Data Quality

At the time of collection

During digitisation

During documentation

During storage and archiving

During analysis and manipulation

During dissemination and presentation

Through the use to which they are put

Issues influencing data quality

• Accuracy and precision
• Completeness
• Currency and Timeliness
• Update frequency
• Consistency
• Flexibility
• Transparency
• Performance measures and targets
• Data cleaning
• Outliers
• Setting targets for improvement
• Truth in labelling
• Error and bias
• Uncertainty
• Auditability
• Edit Controls
• Minimise duplication and reworking of data
• Maintenance of original (or verbatim) data
• Categorisation can lead to loss of data and quality
• Documentation
• Feedback
• Education and Training
• Accountability

Data quality: Responsible Players

Collectors

Custodian or Curator

Aggregator

Publisher

Users

Data Cleaning: definition & framework


A process used to determine inaccurate, incomplete, or unreasonable data and then to improve the quality through correction of detected errors and omissions.

General framework for data cleaning

• Define and determine error types
• Search and identify error instances
• Correct the errors
• Document error instances and error types; and
• Modify data entry procedures to reduce future errors
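The first four steps of the framework can be sketched in a few lines of Python: error types are defined as detector functions, instances are searched for, safe corrections are applied to a copy so the verbatim data survives, and every instance is logged. The two error types, the field names and the `clean` function are assumptions chosen for illustration, not a prescribed GBIF procedure. The resulting log is the kind of evidence that would inform the fifth step, changing data entry procedures.

```python
import copy


def untidy_name(record):
    """Error type: leading, trailing or repeated whitespace in the name."""
    name = record.get("scientificName", "")
    return name != " ".join(name.split())


def fix_untidy_name(record):
    record["scientificName"] = " ".join(record["scientificName"].split())


def bad_coordinates(record):
    """Error type: coordinates missing, non-numeric or out of range."""
    try:
        latitude = float(record["decimalLatitude"])
        longitude = float(record["decimalLongitude"])
    except (KeyError, TypeError, ValueError):
        return True
    return not (-90 <= latitude <= 90 and -180 <= longitude <= 180)


# Step 1: define error types as (detector, optional correction).
ERROR_TYPES = {
    "untidy_name": (untidy_name, fix_untidy_name),
    "bad_coordinates": (bad_coordinates, None),  # flagged, never guessed
}


def clean(records):
    cleaned = copy.deepcopy(records)  # the verbatim input is left untouched
    log = []
    for row, record in enumerate(cleaned):
        for error_type, (detect, correct) in ERROR_TYPES.items():
            if detect(record):              # step 2: identify instances
                if correct is not None:
                    correct(record)         # step 3: correct where safe
                log.append({                # step 4: document
                    "row": row,
                    "error_type": error_type,
                    "corrected": correct is not None,
                })
    return cleaned, log


# Hypothetical input rows.
raw = [
    {"scientificName": "  Protea   cynaroides ",
     "decimalLatitude": "-33.96", "decimalLongitude": "18.41"},
    {"scientificName": "Aloe ferox",
     "decimalLatitude": "133.0", "decimalLongitude": "18.0"},
]

cleaned, log = clean(raw)
print(log)
```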

Tools and Best Practices

http://mapstedi.colorado.edu/

http://manisnet.org/GeorefGuide.html

GBIF Templates

Best Practice Guidelines

All freely available

Best resource…

Chapters on:
• Data Quality
• Data Cleaning
• Geo-referencing
• Generalising sensitive data

http://www2.gbif.org/TM1.pdf

DATA HOSTING CENTERS

Data Hosting Centers

Cater to data publishers without the skills and resources

Facilitate long-term archival and publishing

GBIF Plans
• Criteria for establishing DHC
• Criteria for endorsement of DHC
• Tools and Best Practices for DHC

COMMUNITY BUILDING PLATFORMS


http://community.gbif.org

Email: vchavan@gbif.org

Skype: vishwaschavan
