

Maciej Telszewski

IOCCP Director and Coordinator of GOOS Biogeochemistry Panel

With slides from: Meike Becker and Jay Pearlman

Data Management Overview

Institute of Oceanology of Polish Academy of Sciences, ul. Powstańców Warszawy 55, 81-712 Sopot, Poland

Phone: +48 58 731 16 10 / Fax: +48 58 551 21 30, www.ioccp.org

Biogeochemistry Panel

A broad schematic of a full value chain in sustained ocean observing programs

• System requirements: Applications/products, knowledge challenges, phenomena, EOVs, network design

• Coordination: Global networks and global approaches

• Observations: In situ and satellite observations

• Data systems: Assembly and dissemination

• Understanding: Scientific analysis, indicators

• Predicting / Modeling: Ocean forecast systems

• Assessing: Policy-relevant scientific assessments

• Services [informing]: Early warning, forecasts, short and long term direct advice

• Societal benefit from actionable information: Policy, public and private management and individual decisions

www.ioccp.org/foo

1st International GO2NE Summer School

2-8 September 2019, Xiamen, China

How to take care of data?

Always! Everything! With backup!

And keep it tidy!

So that you (and others) can still find things ten years from now

How to store data-related documentation?

In a way that nothing gets lost

and that can be understood by others

• Lab-books

• Cruise reports

• Excel spreadsheets or txt files with the data to be submitted

• netCDF files

Open access to publications and data has become a formal requirement of:

Funding agencies

Publishers

Researchers

Society

By traditional measures scientific data is already open!

researchers publish their results in peer-reviewed journals

they share data with one another

they present at conferences

collaborate on projects

Sharing data with colleagues and collaborators is the basic principle of science!

Several international agreements exist for open data

• Good scientific practice in research and scholarship, European Science Foundation (ESF), 2000

It is vital that all primary and secondary data are stored in a secure and accessible form.

• Principles for dissemination of scientific data (ICSU/CODATA, 2000)

Scientific advances rely on full and open access to data. … Legal entities should foster a balance between individual rights to data and the public good of shared data.

• OECD Principles and Guidelines for Access to Research Data from Public Funding (2007)

Databases are rapidly becoming an essential part of the infrastructure of the global science system.

What has to be done prior to data archival?

› Get the data in compliance with SOPs!

› Ensure completeness and consistency (reformatting, standardised vocabulary, units)

› Quality control and quality assurance

› Documentation
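
The consistency step above (standardised vocabulary, completeness) could be sketched roughly as follows. This is only an illustration: the alias table, the standard column names and the list of required columns are hypothetical and would come from the SOPs and the target archive in practice.

```python
# Hypothetical alias table: maps column names as they appear in lab
# spreadsheets to one agreed vocabulary. Illustrative names only,
# not an official standard.
COLUMN_ALIASES = {
    "temp": "temperature_degC",
    "temperature": "temperature_degC",
    "sal": "salinity",
    "salinity": "salinity",
    "o2": "oxygen_umol_kg",
    "oxygen": "oxygen_umol_kg",
}

# Hypothetical set of columns that every submitted row must contain.
REQUIRED_COLUMNS = {"temperature_degC", "salinity", "oxygen_umol_kg"}


def standardise_row(row):
    """Rename the keys of one data row to the agreed vocabulary."""
    standardised = {}
    for name, value in row.items():
        key = name.strip().lower()
        if key not in COLUMN_ALIASES:
            raise KeyError(f"Unknown column name: {name!r}")
        standardised[COLUMN_ALIASES[key]] = value
    return standardised


def missing_columns(row):
    """Return the required columns that are absent or empty in a row."""
    return sorted(
        col for col in REQUIRED_COLUMNS
        if row.get(col) in (None, "")
    )


raw = {"Temp": 12.4, "SAL": 35.1}
clean = standardise_row(raw)
print(clean)
print(missing_columns(clean))
```

Raising an error on an unknown column name, rather than silently passing it through, is a deliberate choice here: it forces the vocabulary question to be settled before submission.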

Types of best practice

Equipment User Manuals

Come from the developer/manufacturer

Useful for assembly and deployment

Specs often recorded in an unrealistic environment

Standard Operating Procedures (SOP)

Very comprehensive one-parameter, one-platform description

Describe the method, not the nuances of a specific design

Best Practices (guides / manuals, cookbooks, SOP etc)

Practical knowledge plus elements of the two categories above

Often developed for a specific environment, phenomenon or platform

(Certified) Reference Materials and Standards

Provide trusted reference for calibration and quality control

Published Papers

Methodology/protocol described in a published journal/book article

Best Practices Documents (OBPS templates available)

Written by practitioners for the community; often used as the basis for a published peer-reviewed article

Training Courses

Face-to-face / hands-on experience

https://www.oceanbestpractices.org

Why a System for Best Practices?

Best practices bring many benefits:

Quality and consistency of observations

Interoperability of data

Efficiency (don’t re-invent the wheel – cost saving)

Data traceability

Connections between data, models and applications

BUT:

Not all best practice knowledge is documented

They are scattered and can be hard to find

Not stored in a machine readable format

Can be lost when a project ends

Promising methods may not be shared

Work to create a best practice is often not acknowledged

An Ocean Best Practices System is needed


The Ocean Best Practices System

Vision: To have agreed and broadly adopted methods across ocean research, operations, and applications.

Mission: To provide a trusted system to support the collaborative development, sharing, and adoption of best practices across the ocean community.

Participating Organizations and Programs

[Diagram labels: Repository, Peer Review Journal, Users, Training, OBO, Technologies]

Components of the Ocean Best Practices System

1) A trusted repository

2) Advanced technology, including text mining, natural language processing and semantic search

3) Sophisticated but user-friendly web interface

4) Peer-reviewed journal linked to the repository

5) Training materials supporting the users and their experience with the OBPS

6) A community forum for users and providers of best practices


[Table: Ocean Best Practices, organised by EOV, sub-variable and procedure (rows) against observing approach (columns)]

Observing approach ☞ Ship-based Repeat Hydrography; Ship-based Underway Observations; Profiling Floats; Moored Fixed-point Observatories; Gliders; Ship-based Fixed-point Observatories; Satellite Remote Sensing; ...

EOV / Sub-variable / Procedure

INORGANIC CARBON: pCO2, DIC, Total Alkalinity, pH

Measurement technique ☞ Procedures:

• Deployment & sampling

• Data retrieval & formatting

• Calibration / validation

• Reference materials & standards

• Primary quality control: (near) real-time, delayed-mode

• Secondary quality control

Other EOVs: OXYGEN, NUTRIENTS, TRANSIENT TRACERS, PARTICULATE MATTER, NITROUS OXIDE, STABLE CARBON ISOTOPES, DISSOLVED ORGANIC MATTER, OCEAN COLOUR


The Repository – hub of the system

FAIR: Findable, Accessible, Interoperable, Reusable

Discovery and access to relevant and tested methods

● Global, permanent, open access repository, hosted by IOC/UNESCO

● All elements of the ocean information value chain

● DOIs issued, version control, standard metadata, active links

● Templates supporting uniform submission and processing

● Notification services to keep track of updates

BP Webinar May 8 2018


Metadata

Without metadata - all the rest is useless…

Should provide all important information about the dataset.

What do you think should be included in a metadata document?

What would you want to know if you had to use data that someone else measured and processed?

Metadata – describing your data

Questions the metadata should answer: who, what, where, when, how

• Principal investigator(s) (PI), Project(s)

• Data types, Parameter [unit]

• Methods

• Spatial coverage -> geographical positions

• Temporal coverage ->

• Title, Identifier (DOI)

• Reference(s)

• Quantities

• Sampling event, Campaign, Location
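
As a rough sketch of how the who/what/where/when/how questions could be turned into a completeness check before submission: the field names and their assignment to the five questions below are assumptions made for illustration, and the record contains placeholder values only. A real archive defines its own required fields.

```python
# Assumed grouping of metadata fields under the five questions.
# Field names are illustrative, not an archive's actual schema.
REQUIRED_FIELDS = {
    "who": ["principal_investigators", "projects"],
    "what": ["title", "parameters"],
    "where": ["spatial_coverage"],
    "when": ["temporal_coverage"],
    "how": ["methods"],
}


def missing_metadata(record):
    """Map each question to the fields that are missing or empty in the record."""
    gaps = {}
    for question, fields in REQUIRED_FIELDS.items():
        absent = [field for field in fields if not record.get(field)]
        if absent:
            gaps[question] = absent
    return gaps


# Placeholder values only; this does not describe a real dataset.
example_record = {
    "principal_investigators": ["A. Researcher"],
    "projects": ["Example project"],
    "title": "Example surface pCO2 dataset",
    "parameters": {"pCO2": "µatm"},
    "spatial_coverage": {"lat": [54.0, 55.0], "lon": [18.0, 19.5]},
    "temporal_coverage": None,  # deliberately left empty
    "methods": "",              # deliberately left empty
}

print(missing_metadata(example_record))
```

Running the check on the example record reports the "when" and "how" gaps, which is the kind of feedback worth getting before a data centre sends the submission back.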

Data archival – important note

Data archives store your data and metadata, but…

What they don’t store (yet):

• Calibration sheets (pre- and post-deployment) for all sensors used in your data reduction

• Certificates/calibrations of the calibration gases you used

• Specific documentation about your system setup

• Specific circumstances that might have influenced your measurements during the cruise/deployment

There is still a lot of information that only the PI has.

Make sure it doesn’t get lost!

Players in data management

› National data archives (e.g. National Oceanographic Data Centres)

› International data archives (e.g. World Data Centres, regional archives)

› Community agreed data archives (e.g. human genome project, CMIP 5 (model intercomparison), OBIS (Ocean Biogeographic Information System))

› Portals/data harvesters: data from various sources (GEO, Copernicus Marine Environment Monitoring Service, Global Change Master Directory)

› Data products (OBIS, World Ocean Database, GLODAP, SOCAT)

In reality many archives are a mix of the above

National Responsibilities include:

• Receiving data from researchers, performing quality control, and archiving;

• Receiving data from buoys, ships and satellites on a daily basis, processing the data in a timely way, and providing outputs to various research and engineering users, forecasters, experiment managers, or to other centres participating in the data management plan for the data in question.

• Reporting the results of quality control directly to data collectors as part of the quality assurance module for the system.

• Participating in the development of data management plans and establishing systems to support major experiments, monitoring systems, fisheries advisory systems;

• Disseminating data on the Internet and through other means (and on CD-ROM, DVD, etc);

• Publishing statistical studies and atlases of oceanographic variables.

• Providing indicators for the different types of data being exchanged in order to track the progress.

National Oceanographic Data Centres


Examples for NODCs

IODE Ocean Data Portal - www.oceandataportal.org

International Council for Science (ICSU) :

World Data Center system (WDCs)

• Founded in 1931 to promote international scientific activity in the different branches of science and its application for the benefit of humanity

• One of the oldest non-governmental organizations

• More than 135 nations adhere to it

• ICSU established the World Data Center system in the 1950s

Mission: Data constitute the raw material of scientific understanding. The World Data Center system works to guarantee access to solar, geophysical and related environmental data. It serves the whole scientific community by assembling, scrutinizing, organizing and disseminating data and information.

Source: www.iscu.org

Network of ICSU WDCs

•Nuclear Radiation

Tokyo, Japan

WDC Co-ordination Offices

Washington DC, USA

Beijing, China

•Meteorology

Asheville NC, USA

Beijing, China

Obninsk, Russia

•Oceanography

Obninsk, Russia

Silver Spring MD, USA

Tianjin, China

•Paleoclimatology

Boulder CO, USA

•Marine Geology and Geophysics

Boulder CO, USA

Moscow, Russia

•Remotely Sensed Land Data

Sioux Falls SD, USA

•Renewable Resources and Environment

Beijing, China

•Recent Crustal Movements

Ondrejov, Czech Republic

•Airglow

Mitaka, Japan

•Astronomy

Beijing, China

•Atmospheric Trace Gases

Oak Ridge TN, USA

•Aurora

Tokyo, Japan

•Cosmic Rays

Toyokawa, Japan

•Geology

Beijing, China

•Human Interactions in the Environment

Palisades NY, USA

•Ionosphere

Tokyo, Japan

•Earth Tides

Brussels, Belgium

•Geomagnetism

Copenhagen, Denmark

Edinburgh, UK

Kyoto, Japan

Colaba, India

•Glaciology

Boulder CO, USA

Cambridge, UK

Lanzhou, China

•Marine Environmental Sciences

Bremen, Germany

•Rotation of the Earth

Obninsk, Russia

Washington DC, USA

•Satellite Information

Greenbelt MD, USA

•Rockets and Satellites

Obninsk, Russia

•Seismology

Denver CO, USA

Beijing, China

•Solar Radio Emission

Nagano, Japan

•Space Science

Beijing, China

•Space Science Satellites

Kanagawa, Japan

•Solar Activity

Meudon, France

•Soils

Wageningen, The Netherlands

•Sunspot Index

Brussels, Belgium

•Solar Terrestrial Physics

Boulder CO, USA

Didcot Oxon, UK

Moscow, Russia

Haymarket, Australia

•Solid Earth Geophysics

Beijing, China

Boulder CO, USA

Moscow, Russia

Community agreed data archives

• Data from various sources is made available in a structured manner

• Structure can be on the metadata level or data level

Portals / Data harvesters


Marine Data Infrastructure for the management of large and diverse sets of data deriving from in situ observations of the seas and oceans.

Portal / Data harvester


Portal / Data harvester


Organize and maintain the acquisition, in real time and in delayed mode, of the in situ measurements necessary for operational oceanography

Data products

• Often parameter-centered

• Global synthesis of raw data

• Gridded products with or without interpolation

• In uniform format with quality control (e.g. quality flags)

• Often periodically updated with new data (e.g. annual releases)

• Often with online viewers

• Downloadable in several formats (text, NetCDF, ODV)

• Often documented in ESSD articles

• Fair Data Use Statement

• Often a community activity with numerous contributors worldwide
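
Quality flags in such products are meant to be used when reading the data. The sketch below assumes a WOCE-style numeric scheme (2 = good, 3 = questionable, 4 = bad, 9 = missing) and a hypothetical `<variable>_flag` naming pattern; both are assumptions, and each product's documentation states the scheme it actually uses. The sample values are made up.

```python
# Assumed flag meanings (WOCE-style); confirm against the product documentation.
GOOD, QUESTIONABLE, BAD, MISSING = 2, 3, 4, 9

# Made-up sample values, each paired with a quality flag.
samples = [
    {"depth_m": 10, "oxygen": 250.1, "oxygen_flag": GOOD},
    {"depth_m": 50, "oxygen": 231.7, "oxygen_flag": QUESTIONABLE},
    {"depth_m": 100, "oxygen": -999.0, "oxygen_flag": MISSING},
    {"depth_m": 200, "oxygen": 198.4, "oxygen_flag": GOOD},
]


def keep_flagged(rows, variable, accepted=(GOOD,)):
    """Return only the rows whose flag for `variable` is in `accepted`."""
    flag_name = f"{variable}_flag"  # hypothetical naming pattern
    return [row for row in rows if row[flag_name] in accepted]


good_only = keep_flagged(samples, "oxygen")
print([row["depth_m"] for row in good_only])
```

Filtering on the flag rather than on the value itself avoids treating fill values such as -999 as measurements.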

Data products – WOD climatologies

Finally!

• A global collection of data from 724 hydrographic cruises

• 45 306 stations

• 999 488 sampling depths

• 1972-2013, GEOSECS-TTO-WOCE-CLIVAR

• Corrected for biases

• Extensively documented

• Released January 2016

Data products - Ocean Interior Data Synthesis


Data products - Surface Ocean CO2 Atlas v6

23 million in situ surface ocean CO2 observations

[Figure: global maps, panels "V6 new, A-E" and "V6 all, A-E"; scale 260 to 440 µatm]

Possible problems in retrieving data

• Version conflicts (data is archived in many data centres – in different stages e.g. raw data, quality controlled, etc.)

• Poorly documented metadata and data (methods, units, unclear parameter definitions, etc.)

• Only the metadata is available online; the data has to be requested or a log-in is required

• Metadata is standardised but the data itself is not

• Cruise naming varies between countries, so it is hard to identify the same cruise

• Date formats (mm/dd/yyyy; yy/mm/dd; dd/mm/yyyy etc)

• Ways to report the position (Lat/Long, UTM)

• Different export formats (plain text, xml, netCDF, etc)

• Different entities (one data set = data from one cruise or data from one station or data from one sample)

• Data set is too large to be downloaded (e.g. model data)

Keep that in mind when preparing your data for submission and when accessing data from others!
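
The date-format problem listed above is easy to demonstrate. The sketch below uses Python's standard `datetime.strptime` with an explicitly stated format and writes the result as ISO 8601 (yyyy-mm-dd); the helper name `to_iso_date` is hypothetical, and choosing ISO 8601 as the common target is an assumption, not a requirement stated on the slide.

```python
from datetime import datetime


def to_iso_date(text, fmt):
    """Parse a date string using an explicitly stated format; return ISO 8601."""
    return datetime.strptime(text, fmt).date().isoformat()


# The same string means two different days depending on the convention,
# which is why the format must be documented alongside the data.
ambiguous = "03/04/2019"
as_month_first = to_iso_date(ambiguous, "%m/%d/%Y")
as_day_first = to_iso_date(ambiguous, "%d/%m/%Y")

# Two-digit year, year first.
short_year = to_iso_date("19/09/02", "%y/%m/%d")

print(as_month_first, as_day_first, short_year)
```

Nothing in the string itself says which reading is right, so the only safe fix is at the source: state the format in the metadata, or submit dates in an unambiguous form.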

Carbon Data Management


Global Data Assembly Centre for Marine Biogeochemistry

• Central access to QCed EOV Inorganic Carbon data regardless of the source

• Collaboration - no competition

• Mirrored inventories ensure sustainability

• IOC UNESCO GOOS and IODE, UNESCO/SCOR's IOCCP, GEO’s Carbon and GHG Initiative, GOA-ON, SOLAS, GCP’s GCB, GO-SHIP, ATLANTOS, COPERNICUS, GDAC ARGO


Maciej: [email protected]

Our website: www.ioccp.org