DINI Jahrestagung (DINI annual conference), Göttingen – 2017-10-05
PANGAEA
Data Publisher for Earth & Environmental Sciences
Michael Diepenbroek
What is PANGAEA?
• Information system for long-term archiving and publication of data from earth & environmental sciences (since 1993)
• Accredited by the World Meteorological Organization (WMO) as "World Radiation Monitoring Center" (WRMC) (since 2007)
• Accredited by the International Council for Science (ICSU) as World Data Center "Publisher for Earth & Environmental Science" (since 2001)
PANGAEA – contents
[Figure: example contents. Downcore profiles of IRD (grav/10 cm³), sand (%), CaCO3 (%), TOC (%), Radio (% of sand) and smectite (% of clay) against age (0 to 233.55 kyr) for cores PS1389-3, PS1390-3, PS1431-1, PS1640-1 and PS1648-1; map of grain-size classes (KOLP A, KOLP B, KOLP DIN, KOEHN, KOEHN2) and geochemistry stations, 54°N to 55°30'N, 11°E to 15°E. Source: Baltic Sea Research Institute, Warnemünde.]
• Integral part of science – more than 160 European and international projects since 1995 (https://www.pangaea.de/projects)
• Highly heterogeneous & dynamic
• Multidisciplinary
Hydrosphere
Human Dimensions
Biosphere
Cryosphere
Lithosphere
Atmosphere
Number of data sets ~360,000
Number of data items ~14 billion
Data volume <3 PB
Increase ~5% per year
[Chart: cumulative growth, y-axis from 1,000,000,000 to 15,000,000,000]
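To make the "~5% per year" figure concrete, the sketch below projects the data-item count forward and computes the doubling time. It assumes, for illustration only, that the rate applies to the ~14 billion data items and stays constant as compound annual growth; the function names are hypothetical.

```python
import math

def project_count(current: float, annual_rate: float, years: int) -> float:
    """Project a count forward assuming constant compound annual growth (an assumption)."""
    return current * (1.0 + annual_rate) ** years

def doubling_time(annual_rate: float) -> float:
    """Years needed for a count to double at a constant compound annual rate."""
    return math.log(2.0) / math.log(1.0 + annual_rate)

if __name__ == "__main__":
    items_2017 = 14e9  # ~14 billion data items (slide figure)
    rate = 0.05        # ~5% per year (slide figure), assumed constant
    for years in (1, 5, 10):
        print(f"+{years:2d} years: {project_count(items_2017, rate, years):.3e} items")
    print(f"doubling time: {doubling_time(rate):.1f} years")
```

At a constant 5% per year the count would double in roughly 14 years; real archive growth is rarely this smooth, so the numbers are only an order-of-magnitude guide.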
PANGAEA – interoperability
[Diagram: PANGAEA data management & long-term archiving (RDB, catalogues, index), connected via XSLT, a marshaller and standard protocols to frontends/portals, harvesters and DOI registration]
• Metadata formats: Dublin Core, STD-DOI, ISO19115, DIF, Darwin Core, GML, KML
• Protocols & services: OAI-PMH, OGC CSW, Geoserver (OGC), DiGIR, WS (SOAP/WSDL), DOI registration
• Frontends, portals & harvesters: PANGAEA web frontend, Dublin Core harvester, ISO19115 harvester, DIF harvester, Darwin Core harvester, catalogues, DOI registry
• Partners & portals: DataCite, OCLC, Thomson Reuters, Elsevier, Scopus …, EUR-OCEANS, CARBOOCEAN, OBIS, GBIF, IODP, ICSU WDS, PubMed Central, OpenAIRE, WMO-IS, INSPIRE, GEOSS, GFBio
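OAI-PMH with Dublin Core is one of the harvesting paths listed above. The sketch below shows what a harvester does on the client side: build a `ListRecords` request and pull identifiers, titles and the paging `resumptionToken` out of the XML response. The base URL and the sample record are hypothetical placeholders (the real PANGAEA endpoint should be taken from its documentation); the namespaces and request arguments follow the OAI-PMH 2.0 specification.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH 2.0 and the Dublin Core element set.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request; a resumptionToken replaces all other arguments."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"

def parse_list_records(xml_text):
    """Return (records, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        header = rec.find("oai:header", NS)
        dc = rec.find("oai:metadata/oai_dc:dc", NS)  # absent for deleted records
        records.append({
            "identifier": header.findtext("oai:identifier", default="", namespaces=NS),
            "title": dc.findtext("dc:title", default="", namespaces=NS) if dc is not None else "",
            "dc_identifiers": [e.text for e in dc.findall("dc:identifier", NS)] if dc is not None else [],
        })
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS)
    return records, (token or None)

# Hypothetical response with one record and a token for the next page.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:dataset-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Hypothetical sediment core data</dc:title>
          <dc:identifier>https://doi.org/10.5072/example-1</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

records, token = parse_list_records(SAMPLE)
print(records, token)
print(list_records_url("https://example.org/oai", resumption_token=token))
```

A harvester would loop: fetch the URL, parse it, and request the next page with the returned token until no token comes back.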
PANGAEA – Dissemination of Data & Metadata
Cross-referencing, linking:
• Publications
• Researchers
• Samples
• Organisms
• Sequences
• Projects
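One way to picture cross-referencing is as typed links from each data set to related publications, samples or projects, plus a reverse index so that a publication can list every data set that supplements it. The sketch below is a hypothetical data structure, not PANGAEA's implementation; the relation names are taken from the DataCite Metadata Schema, and all identifiers are placeholders.

```python
from collections import defaultdict

# Relation types from the DataCite Metadata Schema (relatedIdentifier/@relationType);
# this selection is illustrative only.
RELATION_TYPES = {"IsSupplementTo", "IsCitedBy", "IsDerivedFrom", "IsPartOf", "References"}

# Hypothetical records; identifiers are placeholders, not real DOIs.
DATASETS = [
    {"doi": "10.5072/dataset-a",
     "related": [{"identifier": "10.5072/article-x", "relationType": "IsSupplementTo"},
                 {"identifier": "project:example", "relationType": "IsPartOf"}]},
    {"doi": "10.5072/dataset-b",
     "related": [{"identifier": "10.5072/article-x", "relationType": "IsSupplementTo"},
                 {"identifier": "sample:core-1", "relationType": "IsDerivedFrom"}]},
]

def build_reverse_index(datasets):
    """Map each related identifier to the (dataset DOI, relation type) pairs pointing at it."""
    index = defaultdict(list)
    for ds in datasets:
        for rel in ds["related"]:
            if rel["relationType"] not in RELATION_TYPES:
                raise ValueError(f"unknown relation type: {rel['relationType']}")
            index[rel["identifier"]].append((ds["doi"], rel["relationType"]))
    return index

index = build_reverse_index(DATASETS)
# All data sets linked to the (hypothetical) article:
print(index["10.5072/article-x"])
```

Keeping the relation type on each link lets the same index answer different questions, e.g. "which data supplement this article" versus "which data derive from this sample".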
Data Publishing – Cross-referencing
Data publication – prerequisites
[Diagram: many data sets in heterogeneous formats (DOC, CSV, NetCDF, TXT, XML, XLSX, XLS, GRIB, …)]
OECD principles and guidelines for access to research data (2007):
• Licenses & persistent identification (DOI)
• Quality: QA/QC -> review procedures; harmonization of data -> ontologies
• Efficiency: (meta)data & interoperability standards (machine-readable)
FITNESS FOR USE!
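Persistent identification only helps machines if identifiers are written consistently. The sketch below normalizes the common ways a DOI is cited (bare, `doi:` prefix, resolver URL) into one comparison key and a resolvable `https://doi.org/` link. The syntax check is a pragmatic heuristic modelled on commonly used DOI patterns, not the full DOI Handbook grammar; the function names are hypothetical.

```python
import re

# Prefixes commonly found in front of a DOI when it is cited.
_PREFIXES = ("https://doi.org/", "http://doi.org/",
             "https://dx.doi.org/", "http://dx.doi.org/", "doi:")

# Heuristic: "10." + registrant code (digits, optional dotted subdivisions) + "/" + suffix.
_DOI_RE = re.compile(r"^10\.\d{4,9}(\.\d+)*/\S+$")

def normalize_doi(value: str) -> str:
    """Strip resolver/scheme prefixes and return a lowercase DOI for comparison."""
    s = value.strip()
    lowered = s.lower()
    for prefix in _PREFIXES:
        if lowered.startswith(prefix):
            s = s[len(prefix):]
            break
    if not _DOI_RE.match(s):
        raise ValueError(f"not a DOI: {value!r}")
    # DOI names are case-insensitive, so lowercase gives a canonical key.
    return s.lower()

def doi_url(value: str) -> str:
    """Return the https://doi.org/ resolver link for a DOI in any accepted form."""
    return "https://doi.org/" + normalize_doi(value)

print(normalize_doi("DOI:10.5072/Example-1"))          # placeholder DOI
print(doi_url("https://dx.doi.org/10.5072/Example-1"))
```

Both calls reduce to the same key, which is what makes deduplication and cross-referencing of citations reliable.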
Data Publishing – simplified workflow
Data submission -> Editorial review & processing -> Archiving -> Author proof-read -> Publication registered & citable (DOI)
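The simplified workflow above is linear, so it can be modelled as a small state machine in which the final step is only allowed once a DOI has been registered. This is a hypothetical sketch of that ordering; the class names are invented, and the revision loops a real editorial process has (e.g. sending a submission back after proof-reading) are deliberately left out.

```python
from enum import Enum, auto
from typing import Optional

class Stage(Enum):
    SUBMITTED = auto()          # data submission
    EDITORIAL_REVIEW = auto()   # editorial review & processing
    ARCHIVED = auto()           # archiving
    AUTHOR_PROOFREAD = auto()   # author proof-read
    PUBLISHED = auto()          # publication registered & citable (DOI)

# Linear order of the simplified workflow; revision loops are omitted.
NEXT_STAGE = {
    Stage.SUBMITTED: Stage.EDITORIAL_REVIEW,
    Stage.EDITORIAL_REVIEW: Stage.ARCHIVED,
    Stage.ARCHIVED: Stage.AUTHOR_PROOFREAD,
    Stage.AUTHOR_PROOFREAD: Stage.PUBLISHED,
}

class Submission:
    """Hypothetical record tracking one data submission through the workflow."""

    def __init__(self, title: str) -> None:
        self.title = title
        self.stage = Stage.SUBMITTED
        self.doi: Optional[str] = None

    def advance(self, doi: Optional[str] = None) -> Stage:
        """Move to the next stage; publishing requires a registered DOI."""
        if self.stage is Stage.PUBLISHED:
            raise RuntimeError("submission is already published")
        nxt = NEXT_STAGE[self.stage]
        if nxt is Stage.PUBLISHED:
            if not doi:
                raise ValueError("a DOI must be registered before publication")
            self.doi = doi
        self.stage = nxt
        return nxt

sub = Submission("Hypothetical CTD profiles")
for _ in range(3):
    sub.advance()                    # review, archiving, proof-read
sub.advance(doi="10.5072/example")   # placeholder DOI
print(sub.stage, sub.doi)
```

Tying publication to the DOI in the transition itself encodes the rule that a data set only counts as published once it is citable.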
Fitness for Use – Initiatives
• RDA/WDS Data Publishing Workflows WG
• Certification of data centers/repositories
• FAIR principles
• GEO label facets
• ESIP Information Quality Cluster
• Literature!
Fitness for Use - Assessment & Roles
• Certification authority – Reviewers
• Data center / repository – Data editors / reviewers
• User – Downloads, social tagging
Current approaches
[Figure: mock-up quality badges for data sets: FAIR rating, "2 User Reviews", "1 Archivist Assessment", "24 Downloads"; TrustSeal Repository, TrustSeal Software, TrustSeal Data; 5 ★ Open Data]
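The "5 ★ Open Data" badge refers to Tim Berners-Lee's cumulative deployment scheme: one star for data on the web under an open licence, two for structured machine-readable data (e.g. Excel instead of a scanned table), three for a non-proprietary format (e.g. CSV instead of Excel), four for using URIs to identify things, five for linking to other data. The sketch below scores a data set against those criteria; the profile fields and function name are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class OpenDataProfile:
    """Yes/no answers for the five cumulative criteria of the 5-star scheme."""
    on_web_open_licence: bool   # 1 star: on the web under an open licence
    structured: bool            # 2 stars: machine-readable structured data
    non_proprietary: bool       # 3 stars: non-proprietary format
    uses_uris: bool             # 4 stars: URIs identify things
    linked: bool                # 5 stars: links to other data for context

def open_data_stars(p: OpenDataProfile) -> int:
    """Count stars; a level only counts if all lower levels are met."""
    stars = 0
    for met in (p.on_web_open_licence, p.structured, p.non_proprietary,
                p.uses_uris, p.linked):
        if not met:
            break
        stars += 1
    return stars

# Hypothetical example: an openly licensed XLS file without URIs or links.
example = OpenDataProfile(True, True, False, False, False)
print(open_data_stars(example))
```

The early `break` reflects that the stars are cumulative: a linked data set in a proprietary format still rates only two stars.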
Assessment of Data Fitness for Use
WDS/RDA Publishing Data IG; WDS/RDA Certification of Digital Repositories IG
Helena Cousijn, Claire Austin, Jon Petters, Michael Diepenbroek
Lessons learnt
• Multidisciplinarity
• Generic & flexible technical infrastructure
• Flexible business model
• Linkage to international developments
• Moving target!
Costs
• Overall annual budget -> ~€1 million
• Staff -> ~24, >2/3 for curation
• Open access
• Basic operation -> host institutions (AWI, marum ~15%)
• Further development -> third party funds
• Curational costs -> third party funds
– Open science policy -> EU, DFG, BMBF