Data Quality in GBIF Spain

Data Quality in GBIF Spain. Francisco Pando, GBIF - España. GBIF Mentoring (France, Portugal, Spain), Madrid Meeting. GBIF España, Unidad de Coordinación, Real Jardín Botánico - CSIC. Madrid, January 2014.


TRANSCRIPT

Page 1: Data  Quality  in GBIF  Spain

Data Quality in GBIF Spain

Francisco Pando
GBIF - España

GBIF Mentoring (France, Portugal, Spain)
Madrid Meeting

GBIF España. Unidad de Coordinación
Real Jardín Botánico - CSIC

Madrid, January 2014

Page 2: Data  Quality  in GBIF  Spain

GBIF.ES and Data quality

• Situation
• Multipronged approach:
– Training program
– Cleaning & validation tool
– Tracking
– Repository of resources

Page 3: Data  Quality  in GBIF  Spain

Situation in GBIF Spain

• (2007) 110 out of 122 datasets using the “hosting service”
• DIGIR crashes
• Room for improvement

Page 4: Data  Quality  in GBIF  Spain

We needed a strategy

GBIF is about science; data has to be usable and re-usable:

• Training, tools, tracking, facilitating

That is not all:

• Data usefulness depends on each use

• No data record will be prevented from being published on the basis of its intrinsic quality

• Metadata becomes very important

• Analysis & recommendations

Otegui J, Ariño AH, Encinas MA, Pando F (2013) Assessing the Primary Data Hosted by the Spanish Node of the Global Biodiversity Information Facility (GBIF). PLoS ONE 8(1): e55144. doi:10.1371/journal.pone.0055144

Page 5: Data  Quality  in GBIF  Spain

Training

http://www.gbif.es/formaciondetalles.php?IDForm=46

Most recent edition: Feb. 2011 (the 5th), delivered online using an eLearning platform

Page 6: Data  Quality  in GBIF  Spain

Tool

http://www.gbif.es/darwin_test/Darwin_Test_in.php

Two kinds of tests

• Technical
– Field names, data types, etc.
– ASCII characters

• Content
– Congruence tests
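The two kinds of tests can be illustrated with a minimal sketch. This is not the Darwin Test code: the field list (borrowed from Darwin Core term names), the function names, and the specific checks are assumptions made for illustration, and the sketch assumes the ASCII test flags any non-ASCII character.

```python
from datetime import date

# Hypothetical required fields (Darwin Core term names); the actual
# Darwin Test field list is not given in the slides.
REQUIRED_FIELDS = ("scientificName", "decimalLatitude", "decimalLongitude",
                   "year", "month", "day")


def technical_errors(record):
    """Technical checks: field names, data types, ASCII-only text."""
    errors = []
    for field in REQUIRED_FIELDS:
        if field not in record:
            errors.append(f"missing field: {field}")
    for field in ("decimalLatitude", "decimalLongitude"):
        if field in record:
            try:
                float(record[field])
            except (TypeError, ValueError):
                errors.append(f"not a number: {field}")
    for field, value in record.items():
        if isinstance(value, str) and not value.isascii():
            errors.append(f"non-ASCII characters: {field}")
    return errors


def congruence_warnings(record):
    """Content checks: do the values agree with each other?"""
    warnings = []
    try:
        # date() rejects impossible combinations such as 30 February.
        date(int(record["year"]), int(record["month"]), int(record["day"]))
    except (KeyError, TypeError, ValueError):
        warnings.append("year/month/day do not form a valid date")
    try:
        lat = float(record["decimalLatitude"])
        lon = float(record["decimalLongitude"])
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            warnings.append("coordinates out of range")
    except (KeyError, TypeError, ValueError):
        warnings.append("coordinates cannot be checked")
    return warnings
```

Keeping the two functions separate mirrors the distinction on the slide: technical problems can be treated as blocking, while congruence findings are only reported.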

Page 7: Data  Quality  in GBIF  Spain

Tool (Darwin test) deployment

• First stage
– Tool used at the coordination unit on “ready to publish” datasets
– Report to data providers
  • Technical part: has to be passed
  • An FYI report (on content)

• 2nd stage
– Make it publicly available:
  • Offer it to all GBIF participants as a service
  • Ask users of the hosting service to pass the Darwin Test

• 3rd stage
– Using CoL as controlled vocabulary for scientific names
– Using Colombia’s AATs (Archivos de autoridad taxonómica; taxonomic reference archives) as dictionaries for scientific names
– Making Darwin Test multilingual

Page 8: Data  Quality  in GBIF  Spain

DarwinTest DOWNLOADS

• http://www.gbif.es/darwin_test/Darwin_test_in.php
• Source code at SourceForge under CCAS license: http://sourceforge.net/projects/darwintest33/

Page 10: Data  Quality  in GBIF  Spain

Tracking Data quality

The ICA Apparent Quality Index

• Three components (taxonomy, georeferencing, dates)

• Calculated on dataset from DT

• Improving over time

http://www.gbif.es/ica.php
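How a three-component index of this kind could be computed is sketched below. The slides do not give the ICA formula, so everything here is an assumption: each component is taken as the share of records that carry the relevant information, the index is their unweighted mean on a 0 to 100 scale, and the field and function names are hypothetical.

```python
def component_score(records, is_ok):
    """Share of records (0..1) for which the predicate is_ok holds."""
    if not records:
        return 0.0
    return sum(1 for r in records if is_ok(r)) / len(records)


def apparent_quality_index(records):
    """Hypothetical index: unweighted mean of three component scores, 0..100.

    The real ICA formula and weights are not stated in the slides.
    """
    scores = {
        "taxonomy": component_score(
            records, lambda r: bool(r.get("scientificName"))),
        "georeferencing": component_score(
            records,
            lambda r: r.get("decimalLatitude") is not None
            and r.get("decimalLongitude") is not None),
        "dates": component_score(records, lambda r: bool(r.get("year"))),
    }
    index = 100 * sum(scores.values()) / len(scores)
    return index, scores
```

Recomputing such an index each time a dataset is re-tested is one way the "improving over time" trend could be tracked.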

Page 11: Data  Quality  in GBIF  Spain

TRAINING: PROMOTE DQ

• Extensive training at gbif.es, first seminar in 2007
• International – strong recruitment in the Americas
• Materials made publicly available
• Repository of DQ materials: BDQ

Page 12: Data  Quality  in GBIF  Spain

DQ TRAINING

• Started 2007

• On-site workshops:

– III GBIF Workshop on Biodiversity Database Quality (2009)

http://www.gbif.es/formaciondetalles.php?IDForm=60

– Etc.

• On-line workshops:

– E-learning at GBIF.ES: IV Workshop on Biodiversity Database Quality (2013)

http://www.gbif.es/formaciondetalles.php?IDForm=109

– Etc.

• Video recordings of workshops

http://www.gbif.es/videos/videos.php

Page 13: Data  Quality  in GBIF  Spain

DQ TRAINING

http://www.gbif.es/formacion_in.php

Page 14: Data  Quality  in GBIF  Spain

http://www.gbif.es/formaciondetalles.php?IDForm=60

http://www.cienciatk.csic.es/

DQ TRAINING: ON SITE

Page 15: Data  Quality  in GBIF  Spain

DQ TRAINING: ONLINE

• Started 2010, 7 courses so far
• Enrollment: 130 students, 16 countries
• Two components:
– ATutor, open during the actual courses
  • http://elearning.gbif.es/login.php
– AContent, permanent repository for the courses
  • http://elearning.gbif.es/AContent/home/index.php

SCORM package

Page 16: Data  Quality  in GBIF  Spain

BIODIVERSITY DATA QUALITY HUB

• DQ Resource Locator. Proposed at the GBIF European Node Meeting, 2011

• Compatible with GBIFS’ ORC

• Includes tools, thesauri, training materials, experiences, Prezi presentation

• Allows for resource submission

• http://www.gbif.es/BDQ

Page 20: Data  Quality  in GBIF  Spain

At your service:

Francisco [Paco] Pando ([email protected])

Director

Unidad de Coordinación de GBIF
Real Jardín Botánico - CSIC
Claudio Moyano 1
28014 Madrid, Spain

[email protected]
Phone: + 34 91 420 3017
Fax: + 34 91 420 0157
