pangaea archiving and publication of scholarly data for the long tail of science
PANGAEA
Archiving and Publication of Scholarly Data for the Long Tail of Science
Michael Diepenbroek
What is PANGAEA?
• Information system for long-term archiving and publication of data from earth & environmental sciences (since 1993)
• Accredited by the World Meteorological Organization (WMO) as "World Radiation Monitoring Center" (WRMC) (since 2007)
• Accredited by the International Council for Science (ICSU) as World Data Center "Publisher for Earth & Environmental Science" (since 2001)
PANGAEA - contents
[Figure: profiles of IRD (grav/10 cm3), Sand (%), CaCO3 (%), TOC (%), Radio (%/sand) and Smect (%/clay) against Age (kyr), 0.0 to 200.0, max. 233.55 kyr, for cores PS1389-3, PS1390-3, PS1431-1, PS1640-1 and PS1648-1]
[Figure: map covering 54° 0' to 55°30' and 11° to 15°. Legend: World vector shore line; Grain size class KOLP A; Grain size class KOEHN2; Grain size class KOEHN; Geochemistry; Grain size class KOLP B; Grain size class KOLP DIN; 20 m. Scale: 1:2695194 at Latitude 0°. Source: Baltic Sea Research Institute, Warnemünde.]
• Integral part of science
– More than 160 European and international projects since 1995 (www.pangaea.de/projects)
• Highly heterogeneous & dynamic
• Multidisciplinary
• Hydrosphere, Lithosphere, Atmosphere, Cryosphere
• Total number of data sets: ~350,000
• Data volume: <2 PB
• Increase: ~5% per year
PANGAEA - technical architecture
[Diagram labels: Editorial System; Sybase ASE; RDB; Middleware; Webserver; PANGAEA search engine; Hard disk + tape (silo); Sybase IQ warehouse; IQ interface; Various services; Ticket System; Curators; Users]
PANGAEA - interoperability
• Portals: CARBOOCEAN, EUR-OCEANS, IODP - SEDIS, ICSU WDS portal, ESONET/EMSO
• Broker function: GBIF, OBIS
• Sensor webs: ESONET/EMSO, Statoil
• Conform to global standards: ISO19xxx, OGC, W3C, OAI
PANGAEA – interoperability
[Diagram labels:
Dublin Core, STD-DOI, ISO19115, ISO690, DIF, Darwin Core, INSPIRE, gml, kml;
data management & long-term archiving, RDB, catalogues, PANGAEA, XSLT, Index, protocols, marshaller;
WS (SOAP/WSDL), OGC CSW, Geoserver (OGC), OAI-PMH, DIGIR;
DataCite, DOI registration, DOI registry;
DIF / Dublin Core harvester, ISO19115 harvester, harvester;
Frontends / portals, PANGAEA web frontend, Elsevier, Scopus …, OCLC, Thomson Reuters, EUR-OCEANS, CARBOOCEAN, GEOSS, OBIS, GBIF, IODP, ICSU WDS, PubMed, OpenAire]
PANGAEA – Dissemination of Data & Metadata
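The diagram names OAI-PMH and Dublin Core among the dissemination routes. As a rough illustration of what a harvester does with that protocol, the sketch below builds a standard OAI-PMH ListRecords request and reads Dublin Core fields from a response. The endpoint URL and the sample record are made up for this example; the slides do not give the actual harvesting address or any record content.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical endpoint: the slides do not state the real harvesting URL.
BASE_URL = "https://example.org/oai"

# XML namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Invented response fragment, for illustration only (not real repository data).
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:dataset-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example sediment core data set</dc:title>
          <dc:identifier>doi:10.0000/example.1</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def parse_records(xml_text):
    """Extract the OAI identifier, title and DC identifier of each record."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "oai_id": rec.findtext("oai:header/oai:identifier", namespaces=NS),
            "title": rec.findtext(".//dc:title", namespaces=NS),
            "identifier": rec.findtext(".//dc:identifier", namespaces=NS),
        })
    return records


if __name__ == "__main__":
    print(list_records_url(BASE_URL))
    for record in parse_records(SAMPLE_RESPONSE):
        print(record)
```

A real harvester would also follow the protocol's resumptionToken to page through large result sets; that is left out here to keep the sketch short.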
The Long Tail of Data
[Figure axes: Fitness of use (vertical); Total volume of scientific data (horizontal)]
• Professionally managed & published data: large-scale monitoring & computed data & disciplinary data centers
• Unmanaged & non-public data: data from individual scientists, labs, or smaller projects
• Unmanaged open access data
Publishing data with PANGAEA
[Diagram labels: file formats DOC, CSV, NetCDF, TXT, XML, XLSX, XLS, GRIB, …; Data Set, Data Set, Data Set, …]
• Citable & persistent (DOI)
• CC-BY License
• Quality data: QA/QC -> review procedures
• Efficient usage: (meta)data & interoperability standards (machine readable)
• FITNESS OF USE!
OECD principles and guidelines for access to research data (2007)
Data publication - citability
[Diagram: Article and Data boxes in several combinations (Data; Article + Data; Article) arranged along a time axis]
Publishing workflow - synchronized
[Flowchart with two lanes: JOURNAL and DATA ARCHIVE]
• Actors: author / data originator, data curator, editor, reviewers
• Steps (as labelled in the flowchart): technical review; peer review (incl. data); submit data sets; archive data sets; send DOI; publish data sets; submit article; publish article; prepare article & related data sets
• Decisions: accepted? (yes / no), shown twice
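One way to read the flowchart is as a short sequence with two review gates. The sketch below is only that: the step labels come from the slide, but their order, the pairing of each "accepted?" decision with a review, and the "returned to author" outcomes are assumptions, since the transcript does not preserve the arrows.

```python
def run_workflow(technical_ok, peer_ok):
    """Return the steps taken, in an ASSUMED order, for the given review outcomes.

    Step labels are taken from the slide. The ordering and the handling of a
    rejection are assumptions made for this illustration.
    """
    steps = [
        "prepare article & related data sets",
        "submit data sets",
        "technical review",
    ]
    # First "accepted?" gate, assumed to follow the technical review.
    if not technical_ok:
        steps.append("returned to author")  # hypothetical label, not on the slide
        return steps

    steps += [
        "archive data sets",
        "send DOI",
        "submit article",
        "peer review (incl. data)",
    ]
    # Second "accepted?" gate, assumed to follow the peer review.
    if not peer_ok:
        steps.append("returned to author")  # hypothetical label, not on the slide
        return steps

    # "Synchronized" is read here as article and data sets going public together.
    steps += ["publish article", "publish data sets"]
    return steps


if __name__ == "__main__":
    for outcome in [(True, True), (True, False), (False, True)]:
        print(outcome, "->", run_workflow(*outcome))
```

Under this reading the DOI is issued before the article is submitted, so the manuscript can cite the data sets while they are still unpublished.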
Impact on citation rates
35% to 69% more citations!
courtesy of Jon Sears (AGU)
Piwowar HA, Day RS, Fridsma DB (2007) Sharing Detailed Research Data Is Associated with Increased Citation Rate. PLoS ONE 2(3): e308. doi:10.1371/journal.pone.0000308
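Citability through DOIs rests on the fact that a DOI can be turned into a persistent link via the doi.org resolver. The sketch below does this for the DOI cited above. The helper name and the simplified syntax check are assumptions for illustration, not a full DOI validator.

```python
import re
from urllib.parse import quote

# Simplified pattern (an assumption): "10.", a registrant code, "/", a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi):
    """Turn a DOI string into an https://doi.org/ resolver link."""
    doi = doi.strip()
    # Strip common prefixes such as "doi:" or an existing resolver URL.
    for prefix in ("doi:", "https://doi.org/", "http://dx.doi.org/"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a recognisable DOI: {doi!r}")
    # Percent-encode unusual characters but keep the slash separator.
    return "https://doi.org/" + quote(doi, safe="/")


if __name__ == "__main__":
    print(doi_to_url("doi:10.1371/journal.pone.0000308"))
```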
Collaboration between data centers & science journals
• linking editorial workflows
• linking services
Data Publishing – Cross-referencing
[Diagram labels: Publishers; Bibliometrics; Catalogues; Linking infrastructure; Data archive, Data archive, Data archive, …]
ICSU WDS perspective
[Diagram labels: Certified Data Archives; Registries; Bibliometric Services; Catalogues; Web of Knowledge, Google Scholar, Scopus; Thomson Reuters Citation Indexes; Crossref, DataCite, ORCID, CrossData; Journals; ICSU WDS]
WDS Certification & accreditation
• Trustworthiness of WDS data holders and service providers
• Evaluation criteria: based on a compilation of international standards and best practices
• Certification authority: WDS Scientific Committee
• 2014/03: 75 members
WDS/RDA WGs and IGs
[Figure axes: Fitness of use (vertical); Total volume of scientific data (horizontal). Labels: e-Infrastructures; Scientific research projects]
• Publishing workflows
• Publishing services
• Incentives (bibliometrics)
• Trusted repositories & services
• Cost compensation models
Some conclusions
• Publishing data benefits providers and has a significant impact on data quality.
• "Fitness of use" is an important aspect of data quality and a prerequisite for integrating data from different sources.
• Certification is key for evaluating the quality of services and data.
• Scalable services are needed to embed data publications into the current scholarly publishing system.