Global Biodiversity Information Facility - 2013


Uploaded by dag-endresen on 10 May 2015.


DESCRIPTION

Presentation of the Global Biodiversity Information Facility (GBIF), GBIF Norway and the Norwegian Biodiversity Information Centre (NBIC, Artsdatabanken) at the Norwegian Forest and Landscape Institute (Skog og Landskap) at Ås, outside Oslo, on 17 October 2013. The seminar was held jointly with the Norwegian Biodiversity Information Centre (NBIC, Artsdatabanken).

TRANSCRIPT

Page 1: Global Biodiversity Information Facility - 2013

   

Seminar at the Norwegian Forest and Landscape Institute

Global Biodiversity Information Facility (GBIF)

A global infrastructure for publishing biodiversity data

Dag Endresen and Christian Svindseth, GBIF Norway, Natural History Museum, University of Oslo (NHM-UiO)

17 October 2013

Page 2

Topics
• What is GBIF?
• International partners
• Darwin Core terminology
• GBIF data portal and services
• Norwegian collection portals
• Persistent identifiers (PID)
• Data paper

Page 3

Status of the GBIF data portal

October 2013

GBIF enables free and open access to biodiversity data online.

We are an international government-initiated and funded initiative focused on making biodiversity data available to all and anyone, for scientific research, conservation and sustainable development.


Page 4

GBIF’s unique role
• Registry of biodiversity data resources.
• Tools and support for biodiversity data publication.
• Network development at national, regional and global levels.
• Global virtual natural history collection.
• Cross-domain linkage between data from collections, ecology and genomics.
• Access to global biodiversity data for GIS analysis and environmental monitoring.
  – Aggregated presence data
  – Site-based survey data (samples, presence/absence)

Slide by Donald Hobern, 2012

Page 5

Norway joined GBIF in February 2004.

The low membership coverage in Africa and Asia is an important gap!

Page 6

OECD Global Science Forum (1999): “establish and support a distributed system of interlinked and interoperable modules (databases, software and networking tools, search engines, analytical algorithms, etc.) that together will form a Global Biodiversity Information Facility (GBIF)”.

Page 7

   

The Millennium Ecosystem Assessment showed that human actions often lead to irreversible losses in the diversity of life, and these losses have been more rapid in the past 50 years than ever before in human history. Biological diversity is key to resilience – the ability of natural and social systems to adapt to change, and is essential for nearly every aspect of human well-being. Because human threats to biodiversity occur across large spatial and temporal scales, biodiversity and ecosystem monitoring, forecasting, and risk assessments require data to be organised in a globally-accessible, integrated infrastructure. GBIF’s Data Portal provides this infrastructure.


Page 8

Organisational partnerships

• Some potential data collaborations
  – Taxon names and nomenclature
    • Catalogue of Life (CoL)
    • IPT (Integrated Publishing Toolkit) to publish global and regional species databases
    • GBIF infrastructure to support construction of CoL
  – Biodiversity literature
    • Biodiversity Heritage Library (BHL)
    • User annotations to extract occurrence records
    • Link original (and other) descriptions to taxonomy
  – Species information and traits
    • Encyclopedia of Life (EOL)
    • Support EOL as global species information aggregator
    • Include EOL summary box on each GBIF species page

Based on slide by Donald Hobern, 2012

Page 9

GBIF and GEO, the intergovernmental Group on Earth Observations

Data Integration & Interoperability

GBIF provides the infrastructure delivering species occurrence data.

GEO BON: the GEO Biodiversity Observation Network

Page 10

GIASIP: Global Invasive Alien Species Information Partnership

GBIF provides the infrastructure delivering species occurrence data.

Launched at CBD COP11 in Hyderabad, India, in October 2012.

Page 11

GBIF and IPBES (Naturpanelet): the Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services

IPBES provides information to support policy decisions and scientific research on biodiversity. GBIF operates within the data, information and knowledge domain of biodiversity informatics, and provides the infrastructure delivering species occurrence data to IPBES.

[Diagram labels: Science; Policy; Biodiversity; Data, information and knowledge; IPBES; GBIF]

Page 12

1. Information infrastructure – an Internet-based index of a globally distributed network of interoperable databases that contain primary biodiversity data.

2. Community-developed tools, standards and protocols – the tools data providers need to format and share their data.

3. Capacity-building and training – and access to a global expert community.

Page 13

Common discovery system: http://gbrds.gbif.org

[Figure labels: gbrds.gbif.org, www.gbif.org]

Based on slide by David Remsen, GBIF, January 2012

Page 14

Architecture
• Global registry for resource discovery.
• Common and documented data standards.
  – Metadata
  – Data
  – Vocabularies
• Data sharing tools.
• Common web service methods.
• Resolvable identifiers.

Slide by David Remsen, GBIF, November 2011
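The “common web service methods” point can be illustrated with a small sketch that builds a query URL for GBIF’s occurrence search web service. It assumes the JSON endpoint https://api.gbif.org/v1/occurrence/search and its scientificName, country and limit parameters as GBIF documents them for API v1; the helper name occurrence_search_url is hypothetical, and the sketch only builds the URL without calling the network.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumed endpoint: GBIF's public JSON occurrence search service (API v1).
GBIF_OCCURRENCE_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def occurrence_search_url(scientific_name: str,
                          country: Optional[str] = None,
                          limit: int = 20) -> str:
    """Build (but do not fetch) an occurrence search URL.

    `country` is an ISO 3166-1 alpha-2 code, e.g. "NO" for Norway.
    """
    params = {"scientificName": scientific_name, "limit": limit}
    if country is not None:
        params["country"] = country
    # urlencode URL-encodes the values (spaces become "+") and joins them with "&"
    return f"{GBIF_OCCURRENCE_SEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    print(occurrence_search_url("Amanita phalloides", country="NO", limit=5))
```

Fetching such a URL with any HTTP client should return a JSON page of matching occurrence records; larger result sets are paged with limit together with offset.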

Page 15

Darwin Core – a vocabulary of terms

Wieczorek J, Bloom D, Guralnick R, Blum S, Döring M, De Giovanni R, Robertson T, and Vieglais D (2012) Darwin Core: An Evolving Community-Developed Biodiversity Data Standard. PLoS ONE 7(1): e29715. (doi:10.1371/journal.pone.0029715)

Page 16

http://rs.tdwg.org/terms/

Page 17

Unifying species data

Integrated access for records of the occurrence of any species:
• What?
• When?
• Where?
• What evidence?
• Data owner?
• Link to full record

[Diagram: presence-only data from collections, ecological monitoring and genomics, unified through Darwin Core]

Slide by Donald Hobern, 2012
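One way to read the six questions on this slide is as a mapping onto Darwin Core terms. The terms used below (scientificName, eventDate, decimalLatitude, decimalLongitude, basisOfRecord, institutionCode, occurrenceID) are standard Darwin Core terms, but the choice of terms per question is an illustrative assumption, and the record values are hypothetical.

```python
# Illustrative mapping from the slide's questions to Darwin Core terms.
QUESTION_TO_TERMS = {
    "What?": ["scientificName"],
    "When?": ["eventDate"],
    "Where?": ["decimalLatitude", "decimalLongitude"],
    "What evidence?": ["basisOfRecord"],
    "Data owner?": ["institutionCode"],
    "Link to full record": ["occurrenceID"],
}


def answer_questions(record: dict) -> dict:
    """Group a flat Darwin Core record by question; absent terms give None."""
    return {question: [record.get(term) for term in terms]
            for question, terms in QUESTION_TO_TERMS.items()}


def missing_terms(record: dict) -> list:
    """List the mapped terms that the record omits or leaves empty."""
    return [term
            for terms in QUESTION_TO_TERMS.values()
            for term in terms
            if record.get(term) in (None, "")]


# Hypothetical occurrence record (all values invented for illustration).
example_record = {
    "occurrenceID": "urn:example:occurrence:1",
    "scientificName": "Amanita phalloides",
    "eventDate": "2012-09-15",
    "decimalLatitude": 59.66,
    "decimalLongitude": 10.78,
    "basisOfRecord": "PreservedSpecimen",
    "institutionCode": "EXAMPLE",
}

if __name__ == "__main__":
    for question, values in answer_questions(example_record).items():
        print(question, values)
```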

Page 18


Fully compatible with existing Darwin Core data, plus:
• Which species were recorded together?
• Which sets of data are directly comparable?
• Which species were most abundant in each sample?

[Diagram: presence/absence data described by Darwin Core + core survey fields: Sample Id, Method Id, Relative abundance, ...]

Slide by Donald Hobern, 2012
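The three survey questions can be answered once each record carries a sample identifier, a method identifier and a relative abundance. The field names sampleId, methodId and relativeAbundance below are hypothetical stand-ins for the slide's “Sample Id”, “Method Id” and “Relative abundance”, the records are invented, and treating samples taken with the same method as directly comparable is a simplifying assumption.

```python
from collections import defaultdict

# Hypothetical survey records: Darwin Core-style fields plus survey fields.
records = [
    {"sampleId": "S1", "methodId": "pitfall", "scientificName": "Species A", "relativeAbundance": 0.6},
    {"sampleId": "S1", "methodId": "pitfall", "scientificName": "Species B", "relativeAbundance": 0.4},
    {"sampleId": "S2", "methodId": "pitfall", "scientificName": "Species B", "relativeAbundance": 0.3},
    {"sampleId": "S2", "methodId": "pitfall", "scientificName": "Species C", "relativeAbundance": 0.7},
    {"sampleId": "S3", "methodId": "sweep-net", "scientificName": "Species A", "relativeAbundance": 1.0},
]


def species_by_sample(rows):
    """Which species were recorded together? (species per sample)"""
    together = defaultdict(set)
    for r in rows:
        together[r["sampleId"]].add(r["scientificName"])
    return dict(together)


def comparable_samples(rows):
    """Which sets of data are directly comparable? (samples grouped by method)"""
    by_method = defaultdict(set)
    for r in rows:
        by_method[r["methodId"]].add(r["sampleId"])
    return {method: sorted(ids) for method, ids in by_method.items()}


def most_abundant(rows):
    """Which species was most abundant in each sample?"""
    best = {}
    for r in rows:
        current = best.get(r["sampleId"])
        if current is None or r["relativeAbundance"] > current["relativeAbundance"]:
            best[r["sampleId"]] = r
    return {sample: r["scientificName"] for sample, r in best.items()}
```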

Page 19

Darwin Core Archive (DwC-A)
• DwC-A publishes DwC records, including terms from DwC-A extensions.
• Simple text-based format.
• Zipped single-file archive.

[Figure label: Germplasm.txt]
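To make the “simple text-based format” concrete, the sketch below builds a tiny in-memory archive (a tab-delimited occurrence.txt plus a meta.xml descriptor, zipped into one file) and reads it back, using meta.xml to map columns to Darwin Core terms. It follows the general shape of the Darwin Core Text Guidelines but handles only a core file, with no extension files such as Germplasm.txt; the file contents and helper names are hypothetical.

```python
import csv
import io
import zipfile
import xml.etree.ElementTree as ET

NS = "{http://rs.tdwg.org/dwc/text/}"  # namespace of the meta.xml descriptor

# Descriptor for one tab-delimited core file; "\\t" puts a literal \t in the XML.
META_XML = """<?xml version="1.0" encoding="UTF-8"?>
<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core encoding="UTF-8" fieldsTerminatedBy="\\t" linesTerminatedBy="\\n"
        fieldsEnclosedBy="" ignoreHeaderLines="1"
        rowType="http://rs.tdwg.org/dwc/terms/Occurrence">
    <files><location>occurrence.txt</location></files>
    <id index="0"/>
    <field index="0" term="http://rs.tdwg.org/dwc/terms/occurrenceID"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/scientificName"/>
    <field index="2" term="http://rs.tdwg.org/dwc/terms/eventDate"/>
  </core>
</archive>
"""


def build_example_archive() -> bytes:
    """Zip a hypothetical data file and its descriptor into one archive."""
    data = ("occurrenceID\tscientificName\teventDate\n"
            "urn:example:1\tAmanita phalloides\t2012-09-15\n")
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("meta.xml", META_XML)
        zf.writestr("occurrence.txt", data)
    return buffer.getvalue()


def read_dwca_core(archive: bytes) -> list:
    """Return core rows as dicts keyed by short Darwin Core term names."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        core = ET.fromstring(zf.read("meta.xml")).find(f"{NS}core")
        location = core.find(f"{NS}files/{NS}location").text
        # Turn the escaped delimiter text (backslash + t) into a real tab character
        delimiter = core.get("fieldsTerminatedBy", ",").encode().decode("unicode_escape")
        skip = int(core.get("ignoreHeaderLines", "0"))
        # Column index -> last segment of the term URI, e.g. "scientificName"
        columns = {int(f.get("index")): f.get("term").rsplit("/", 1)[-1]
                   for f in core.findall(f"{NS}field")}
        text = zf.read(location).decode(core.get("encoding", "UTF-8"))
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter,
                           quoting=csv.QUOTE_NONE))[skip:]
    return [{term: row[i] for i, term in columns.items()} for row in rows]
```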

Page 20

Darwin Core Archive Assistant (GBIF, 2010)

The Darwin Core Archive Assistant is a web application that presents a simple interface for describing the data elements a data publisher wishes to serve to the GBIF network as basic text files, and composes the appropriate XML descriptor file, as defined in the Darwin Core Text Guidelines, to accompany them. It communicates with the GBIF registry to provide an up-to-date listing of all relevant Darwin Core terms and available extensions, and presents these in a simple checklist format.

http://tools.gbif.org/dwca-assistant/

Page 21

http://tools.gbif.org/spreadsheet-processor/

Page 22

Fitness for use: definition

"The general intent of describing the quality of a particular dataset or record is to describe the fitness of that dataset or record for a particular use that one may have in mind for the data."

Chrisman, 1991

Slide by Laura Russell, VertNet, September 2011

Page 23

Improving fitness-for-use

[Diagram: Aggregate; Data indexes; Data quality; Expert curation]

• Progressive improvement
  – Data indexes
    • Centralised discovery
    • Standardisation of persistent identifiers
    • Consistent metadata
  – Data quality
    • Inconsistencies within records
    • Validation against metadata
    • Outlier detection
    • Metrics per record and per data set
  – Expert curation
    • Interface with taxon expert groups
    • Incorporate findings of data users
    • Need efficient researcher-friendly tools

Slide by Donald Hobern, 2012
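The data quality checks listed on this slide can be sketched as simple per-record tests plus a per-data-set metric. The rules here (coordinate ranges, an ISO 8601 calendar date, a non-empty scientificName) are illustrative assumptions, not GBIF's actual processing rules; real Darwin Core eventDate values may also be date ranges, which this sketch would flag.

```python
from datetime import date


def record_issues(record: dict) -> list:
    """Return human-readable problems found inside one occurrence record."""
    issues = []
    if not record.get("scientificName"):
        issues.append("missing scientificName")
    try:
        lat = float(record["decimalLatitude"])
        lon = float(record["decimalLongitude"])
    except (KeyError, TypeError, ValueError):
        issues.append("missing or non-numeric coordinates")
    else:
        if not -90 <= lat <= 90:
            issues.append("latitude out of range")
        if not -180 <= lon <= 180:
            issues.append("longitude out of range")
    event_date = record.get("eventDate")  # assumed to be a string if present
    if event_date:
        try:
            if date.fromisoformat(event_date) > date.today():
                issues.append("eventDate in the future")
        except ValueError:
            issues.append("eventDate not an ISO 8601 date (YYYY-MM-DD)")
    return issues


def dataset_metrics(records: list) -> dict:
    """Per-data-set metric: how many records have at least one issue."""
    flagged = sum(1 for r in records if record_issues(r))
    share = flagged / len(records) if records else 0.0
    return {"records": len(records), "flagged": flagged, "share_flagged": share}
```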

Page 24

Taxonomic data

Names are often the first point of entry to biodiversity databases.
=> Risk of error propagation

Possible errors:
• Wrong identification
• Wrong format
• Spelling errors

Slide by Laura Russell, VertNet, September 2011
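Of the three error types, “wrong format” is the easiest to catch mechanically. The sketch below normalises whitespace, underscores and capitalisation in a genus-plus-epithet name, for example an underscore-joined label such as “Marasmius_siccus”. normalize_binomial is a hypothetical helper; it does nothing about wrong identifications and ignores the rules for hybrids, infraspecific ranks and authorship formatting.

```python
import re


def normalize_binomial(raw: str) -> str:
    """Tidy the format of a 'Genus epithet ...' name string."""
    # Collapse runs of whitespace and underscores into single spaces
    parts = re.sub(r"[\s_]+", " ", raw).strip().split(" ")
    if len(parts) < 2:
        raise ValueError(f"expected at least a genus and an epithet: {raw!r}")
    genus, epithet, *rest = parts
    # Genus capitalised, species epithet lower case; anything after is kept as-is
    return " ".join([genus.capitalize(), epithet.lower(), *rest])
```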

Page 25

The problem with scientific names

• No comprehensive catalog of species
• Names ≠ species
• The species problem – species concepts
• Competing classifications / phylogenies
• Many names for one taxon
• One name for many taxa
• ‘Names’ are more than code-compliant scientific names

Slide by David Shorthouse, Canadensys, January 2013

Page 26

Proposed solution
• Inclusive
  – Accommodate alternate perspectives
• Reconciliation
  – Map names among and between each other
• Disambiguation
  – Context to assign homonymic names to their rightful place

Slide by David Shorthouse, Canadensys, January 2013
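Reconciliation and disambiguation can be sketched with a small lookup: exact match against accepted names, then a synonym table, then a close spelling match, while a homonym is resolved only when context (here, the kingdom) is supplied. The checklist names “Examplea borealis” and “Oldgenus borealis” are invented; Morus is used because it is a genuine case of one genus name used for both a plant (mulberries) and a bird (gannets). Python's difflib is a generic string-similarity tool chosen only for illustration, and the 0.9 cutoff is an arbitrary assumption.

```python
from difflib import get_close_matches
from typing import Optional

# Hypothetical reference checklist.
ACCEPTED = {"Examplea borealis", "Examplea australis"}
SYNONYMS = {"Oldgenus borealis": "Examplea borealis"}
# One name, several taxa: resolved only with context (kingdom).
HOMONYMS = {
    "Morus": {
        "Plantae": "Morus (plant genus, mulberries)",
        "Animalia": "Morus (bird genus, gannets)",
    },
}


def reconcile(name: str, kingdom: Optional[str] = None) -> Optional[str]:
    """Map a name to an accepted name, or return None if it cannot be resolved."""
    if name in HOMONYMS:
        return HOMONYMS[name].get(kingdom)  # None when context is missing or unknown
    if name in ACCEPTED:
        return name
    if name in SYNONYMS:
        return SYNONYMS[name]
    # Possible spelling error: take the closest known name above a similarity cutoff
    candidates = get_close_matches(name, sorted(ACCEPTED | set(SYNONYMS)), n=1, cutoff=0.9)
    return reconcile(candidates[0], kingdom) if candidates else None
```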

Page 27

Improving data quality

Some records in the fish collection at NHM had their longitude and latitude columns swapped. This was noticed and corrected in April 2013 (dataset 8102).

[Map panels: indexed by GBIF 3 May 2013; indexed by GBIF 14 January 2013]
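Swapped latitude and longitude columns, as in the fish collection example, can often be caught with a bounding-box heuristic: a record expected in Norway that falls outside the country, but would fall inside it with the two values exchanged, is a likely swap. The box below is a rough approximation of mainland Norway chosen for this sketch (it ignores Svalbard and Jan Mayen), and the proposed corrections carry a hypothetical reviewFlag field so that a curator can check them.

```python
# Rough bounding box for mainland Norway (approximate; an assumption of this sketch).
NORWAY_BOX = {"lat": (57.5, 71.5), "lon": (4.0, 31.5)}


def in_box(lat: float, lon: float, box: dict = NORWAY_BOX) -> bool:
    """True if the point lies inside the latitude/longitude box."""
    return (box["lat"][0] <= lat <= box["lat"][1]
            and box["lon"][0] <= lon <= box["lon"][1])


def looks_swapped(lat: float, lon: float, box: dict = NORWAY_BOX) -> bool:
    """Outside the box as given, but inside once latitude and longitude are exchanged."""
    return not in_box(lat, lon, box) and in_box(lon, lat, box)


def propose_fixes(records: list) -> list:
    """Return new record dicts; likely swaps get exchanged values and a review flag."""
    out = []
    for r in records:
        lat, lon = r["decimalLatitude"], r["decimalLongitude"]
        if looks_swapped(lat, lon):
            r = {**r, "decimalLatitude": lon, "decimalLongitude": lat,
                 "reviewFlag": "latitude/longitude probably swapped"}
        out.append(dict(r))
    return out
```

For example, a point recorded as latitude 10.75, longitude 59.91 lies far outside Norway, but exchanged it falls at roughly Oslo, so it is flagged.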

Page 28

   

http://www.gbif.org/

New portal launched 9 October 2013

Page 29

Data published through GBIF

Last updated: 2013-10-02

A modest decline in the total number of data records in January 2013 resulted from deletion of duplicates and withdrawn data, identified through software and processing upgrades.

[Chart: primary biodiversity records (millions) published through GBIF over time; y-axis from 80 to 440]

Page 30

GBIF data publishers

Last updated: 2013-10-02

A sharp rise in the number of data publishers in September 2013 resulted from institutions choosing to register as separate entities rather than sharing datasets through a single publisher at their national node institution. This raises the visibility and branding of the institutions and provides more accurate attribution, especially in the new GBIF portal coming online shortly.

[Chart: number of institutions registered as GBIF data publishers over time; y-axis from 200 to 580]

Page 31

GBIF citation in research

Last updated: October 2013

[Chart: number of peer-reviewed publications per year, 2008 to 2013 (January to September), in three categories: GBIF mentioned, GBIF discussed, GBIF-mediated data used; y-axis from 0 to 250]

Page 32

GBIF portal:

13.3 million occurrences are located in Norway, published from 30 countries worldwide.

Page 33

GBIF portal:

12.5 million occurrences are published from Norwegian institutes, covering 180 countries worldwide.

Page 34

Status of Nordic GBIF data sets (data hosted by…), October 2013

Country    Data sets   Occurrences
Denmark           45     9,311,741
Finland           57    14,666,474
Iceland            4       458,705
Norway            85    12,531,207
Sweden            47    43,374,550

[Map labels: Denmark, Finland, Norway, Sweden, Iceland]

Page 35

“Artskart” provides the national “GBIF” portal to species occurrences and specimens in Norway.

Page 36

The site at http://gbif.no provides an overview of the Norwegian data sets published to GBIF.

Page 37

• Custom data portals for Norwegian collections.
• Upgrade to Darwin Core archives across Norway.
• Persistent identifiers (UUID, QR code).
• Data set metadata descriptions (data paper).
• GIS data server for spatial environment data.

Page 38

Custom collection portals

Page 39

• Software from GBIF to implement online data portals for biodiversity data.
  – National, thematic or regional.
  – Based on data published using GBIF standards.

Page 40

Different data portals will implement very different modules and functionality to meet their own needs.

Slide by David Remsen (2011)

Page 41

Opportunities with Darwin Core: a data portal for an institute, region, or theme?

Collections and data sets are published from the data owner as one single Darwin Core archive (DwC-A). Different data types from the same DwC-A can be included in different data portals.

[Diagram: one Darwin Core Archive feeding Artskart, UiT, UiB, S&L and the GBIF Portal]
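Feeding several portals from one archive amounts to applying a different selection rule to the same set of records. The portal names and rules below are hypothetical (a regional portal keyed on country, a thematic one on kingdom, an institutional one on institutionCode); the slide does not say how Artskart or the university portals actually select their data, and the records are invented.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]

# Hypothetical selection rules, one per portal.
PORTAL_RULES: Dict[str, Callable[[Record], bool]] = {
    "regional portal (Norway)": lambda r: r.get("country") == "NO",
    "thematic portal (fungi)": lambda r: r.get("kingdom") == "Fungi",
    "institutional portal": lambda r: r.get("institutionCode") == "EXAMPLE",
}


def route(records: List[Record], rules=PORTAL_RULES) -> Dict[str, List[str]]:
    """Return, for each portal, the occurrenceIDs of the records it would show."""
    return {portal: [r["occurrenceID"] for r in records if rule(r)]
            for portal, rule in rules.items()}


# Hypothetical records from a single Darwin Core archive.
archive_records = [
    {"occurrenceID": "urn:example:1", "country": "NO", "kingdom": "Fungi", "institutionCode": "EXAMPLE"},
    {"occurrenceID": "urn:example:2", "country": "SE", "kingdom": "Plantae", "institutionCode": "EXAMPLE"},
    {"occurrenceID": "urn:example:3", "country": "NO", "kingdom": "Animalia", "institutionCode": "OTHER"},
]
```

The same record can appear in several portals, which is the point of publishing once and filtering per portal rather than maintaining separate copies.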

Page 42

Page 43

The purpose of identifiers … is to name things, making it possible to refer to them.

What is an identifier: “Each identifier refers to one and only one thing” (Coyle 2006). “An association between a string and a thing” (Kunze 2003). “A stated association between a symbol and a thing; that the symbol may be used to unambiguously refer to the thing within a given context” (Campbell 2007).


Page 44

UUID QR codes for all museum objects at NHM-UiO would provide:
• Machine readability using an ordinary smartphone (or PDA).
• New and efficient workflows for collection management.
• Deployment of stable identifiers appropriate for databasing.

Page 45

Catalog number: O-L-000014, http://purl.org/nhmuio/id/41d9cbb4-4590-4265-8079-ca44d46d27c3


Page 46

http://purl.org/nhmuio/id/d91e8253-0ac1-4681-ac69-e50070af86a2
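The identifier definitions given earlier (one identifier, one thing) and the PURL pattern in these examples can be combined in a small sketch: a registry that mints a random UUID per catalogue number, never re-mints, and resolves a PURL back to its object. The base URL is taken from the example identifiers; the IdentifierRegistry class is hypothetical, the pairing of O-L-000014 with 41d9cbb4-4590-4265-8079-ca44d46d27c3 is the one shown on the slide, and O-L-000015 is an invented catalogue number. The PURL string is what a QR code label would encode; generating the QR image itself needs a third-party library and is left out.

```python
import uuid

PURL_BASE = "http://purl.org/nhmuio/id/"  # base used in the slide's example identifiers


class IdentifierRegistry:
    """One-to-one mapping between catalogue numbers and UUID-based PURLs."""

    def __init__(self) -> None:
        self._uuid_by_catalog = {}
        self._catalog_by_uuid = {}

    def register(self, catalog_number: str, uuid_text: str) -> str:
        """Record an existing pairing; refuse anything that breaks one-to-one."""
        key = uuid.UUID(uuid_text)  # raises ValueError if malformed
        if self._uuid_by_catalog.get(catalog_number, key) != key:
            raise ValueError(f"{catalog_number} already has another identifier")
        if self._catalog_by_uuid.get(key, catalog_number) != catalog_number:
            raise ValueError(f"{key} already names another object")
        self._uuid_by_catalog[catalog_number] = key
        self._catalog_by_uuid[key] = catalog_number
        return PURL_BASE + str(key)

    def mint(self, catalog_number: str) -> str:
        """Return the object's PURL, creating a random (version 4) UUID only once."""
        existing = self._uuid_by_catalog.get(catalog_number)
        if existing is not None:
            return PURL_BASE + str(existing)
        return self.register(catalog_number, str(uuid.uuid4()))

    def resolve(self, purl: str) -> str:
        """Map a PURL back to the catalogue number it identifies."""
        if not purl.startswith(PURL_BASE):
            raise ValueError(f"not an identifier under {PURL_BASE}: {purl}")
        return self._catalog_by_uuid[uuid.UUID(purl[len(PURL_BASE):])]
```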


Page 47


Page 48


Page 49

• Peer review option for biodiversity data.
• Authors get scientific credit for data publication.
• Meeting concerns over data quality.
• Meeting concerns over data citation mechanism.
• Metadata formats: Ecological Metadata Language (EML), Dublin Core, Darwin Core, Natural Collections Descriptions (NCD)…
• Towards → each data set published through GBIF accompanied by a data paper…?
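Since EML is listed as a metadata format for data papers, here is a sketch of the skeleton of an EML document built with Python's standard library. It assumes the EML 2.1.1 namespace and a minimal set of dataset elements (title, creator, abstract, contact); it has not been validated against the EML schema, a real data paper needs far richer metadata, and every value in the example is hypothetical.

```python
import xml.etree.ElementTree as ET

EML_NS = "eml://ecoinformatics.org/eml-2.1.1"  # assumed EML version
ET.register_namespace("eml", EML_NS)


def _person(parent: ET.Element, tag: str, surname: str) -> None:
    """Add <tag><individualName><surName>...</surName></individualName></tag>."""
    name = ET.SubElement(ET.SubElement(parent, tag), "individualName")
    ET.SubElement(name, "surName").text = surname


def minimal_eml(package_id: str, title: str, surname: str, abstract: str,
                system: str = "http://example.org") -> str:
    """Serialise a minimal, unvalidated EML skeleton for one dataset."""
    root = ET.Element(f"{{{EML_NS}}}eml", {"packageId": package_id, "system": system})
    dataset = ET.SubElement(root, "dataset")  # child elements are unqualified in EML
    ET.SubElement(dataset, "title").text = title
    _person(dataset, "creator", surname)
    ET.SubElement(ET.SubElement(dataset, "abstract"), "para").text = abstract
    _person(dataset, "contact", surname)
    return ET.tostring(root, encoding="unicode")
```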

Page 50


Page 51

Why publish your data
• Citable publication
• Establish scientific priority
• Increase collaboration
• Link data to bigger network
• Re-use and multiply effect
• Respond to funding requirements

http://biodiversitydatajournal.com/

Smith V, Georgiev T, Stoev P, Biserkov J, Miller J, Livermore L, Baker E, Mietchen D, Couvreur T, Mueller G, Dikow T, Helgen K, Frank J, Agosti D, Roberts D, Penev L (2013) Beyond dead trees: integrating the scientific process in the Biodiversity Data Journal. Biodiversity Data Journal 1: e995. DOI: 10.3897/BDJ.1.e995

Page 52

Data rescue activity: many species occurrence records are “hidden” in reports and documents produced by universities, research institutes, public agencies and the university museums. Project with Artsdatabanken.

Photo by: Niklas Bildhauer

Page 53

Page 54

Scientists from Norwegian institutes using GBIF-mediated data:

Page 55

PCA of 54 environmental variables across Norway, compared with the Norwegian Vegetation Atlas (Moen 1999).

[Map panels: Sections (Moen 1999); PCA component 1; Zones (Moen 1999); PCA component 2; “PCA Norway”]

Bakkestuen, V., Erikstad, L., and Økland, R.H. (2008). Step-less models for regional environmental variation in Norway. J. Biogeography 35: 1906-1922.

Based on a slide by Vegar Bakkestuen
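The PCA behind “PCA Norway” reduces many correlated environmental variables to a few principal components. As a minimal, standard-library sketch (not the method code from Bakkestuen et al. 2008), the first component can be found by power iteration on the covariance matrix. The two-variable toy data are hypothetical, and with real variables in different units one would normally standardise each column first, which this sketch does not do.

```python
import math


def centre(rows):
    """Subtract each column's mean."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(rows):
    """Sample covariance matrix of the columns."""
    c = centre(rows)
    n, p = len(c), len(c[0])
    return [[sum(r[i] * r[j] for r in c) / (n - 1) for j in range(p)] for i in range(p)]


def first_component(rows, iterations=100):
    """Leading eigenvector of the covariance matrix via power iteration.

    Assumes non-constant data and a starting vector of ones that is not
    orthogonal to the leading eigenvector.
    """
    cov = covariance(rows)
    v = [1.0] * len(cov)
    for _ in range(iterations):
        w = [sum(a * b for a, b in zip(row, v)) for row in cov]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def component_scores(rows, component):
    """Project each centred row onto the component (one score per site)."""
    return [sum(x * c for x, c in zip(row, component)) for row in centre(rows)]
```

The scores along component 1 are what a gradient map like the ones on the slide would colour, one value per grid cell or site.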

Page 56

Modeling Norwegian fungi
• 83 fungi species.
• 10,500 occurrences from the GBIF portal.
• Predictive modeling of species distribution.

Wollan, A. K., Bakkestuen, V., Kauserud, H., Gulden, G. and Halvorsen, R. 2008. Modelling and predicting fungal distribution patterns using herbarium data. J. Biogeography 35: 2298-2310.

Slide by Vegar Bakkestuen

[Map panels: Amanita phalloides, Catathelasma imperiale, Hygrocybe vitellina, Marasmius siccus]
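The slide does not say which modelling method was used, so the sketch below swaps in about the simplest presence-only approach, a rectilinear environmental envelope in the spirit of BIOCLIM: a site is predicted suitable if every environmental variable lies within the range seen at known occurrences. The variables (mean temperature in °C, annual precipitation in mm) and all values are hypothetical; BIOCLIM-style models usually trim the range to percentiles, which is omitted here.

```python
def fit_envelope(presence_env):
    """Per-variable (min, max) over the environmental values at presence sites."""
    return [(min(values), max(values)) for values in zip(*presence_env)]


def suitability(envelope, site_env):
    """Share of variables whose value at the site falls inside the envelope."""
    inside = sum(lo <= x <= hi for (lo, hi), x in zip(envelope, site_env))
    return inside / len(envelope)


def predict_presence(envelope, site_env):
    """Predict 'suitable' only when every variable is inside the envelope."""
    return suitability(envelope, site_env) == 1.0


# Hypothetical presence sites: [mean temperature (°C), annual precipitation (mm)]
presences = [[5.0, 800.0], [7.0, 1200.0], [6.0, 1000.0]]
envelope = fit_envelope(presences)
```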

Page 57

Node personnel
• Dag Endresen, Node Manager
• Christian Svindseth, Database Manager
• Fridtjof Mehlum, Research Director
• Einar Timdal, Associate Professor
• Vegar Bakkestuen, Researcher
• Geir Søli, Associate Professor
• Nils Valland, Artsdatabanken
• Wouter Koch, Artsdatabanken

Page 58

Thanks for listening!

GBIF Norway

Dag Endresen

Christian Svindseth