Research Data Infrastructure for Geochemistry (DFG Roundtable)

Post on 18-Jan-2017


Research Data Infrastructure for Geochemistry

iedadata.org

Investment

IEDA 2016-2021: Operation of a Multi-Disciplinary Data Facility for the Earth Science Community
• Invited renewal proposal after IEDA review in 2014/15
• Next 5 years of operating IEDA
• $14.4 million

IEDA Data Systems for Geochemistry

IEDA / EarthChem

Community driven
• Community governance
• Community engagement & training

Standards compliant (accredited ‘trustworthiness’)
• Follow data curation standards
• QA/QC procedures
• Unique, persistent identification of data
• Persistent access of data holdings
• Operational procedures (risk management, IP, etc.)

Demonstrated impact on science


Scientific Justification

• enable new data intensive science, new cross-disciplinary studies, and new kinds of collaborations.
• expand opportunities for scientists, educators, and the public to participate in science.
• maximize the return on national research investments.
• ensure reproducible science: permit verification of research results.
• contribute to new science initiatives.

“Data collections provide more than an increase in the efficiency and accuracy of research: they enable new research opportunities.” (“Long-lived Digital Data Collections: Enabling Research and Education in the 21st Century”, NSB Report, September 2005)


Science from EarthChem Data Systems

[Figures: Gale et al.]

Data Policies

December 11, 2013


Agencies

Societies

Journals

May 9, 2013

February 22, 2013


Concern: Reproducibility

“The field sciences (e.g., geology, ecology, and archaeology), where each study is temporally (and often spatially) unique, provide exemplars for the importance of preserving data and samples for further analysis.”

Data Policies:

December 11, 2013

COPDESS: Coalition for Publishing Data in the Earth & Space Sciences

“Connecting Earth Science publishers and Data Facilities to help translate the aspirations of open, available, and useful data from policy into practice.”


Data: Publishers’ Perspective
• Many have had supplements for some time.
• Difficult to deal with, costly
• PDFs mostly (not searchable, poorly indexed, variable quality)
• Require authors to comply with data availability policy; policing
• Little guidance on community standards
• Want to use and promote repositories, but not well integrated except for a few exceptions
• Worried about repository funding and stability

Slide courtesy of Brooks Hanson, AGU Director for Publications


Statement of Commitment (COPDESS.org)

reaffirm and ensure adherence to our existing journal and publishing policies…regarding data sharing and archiving...

Signed by ~50 publishers & data facilities

“Earth and space science data should, to the greatest extent possible, be stored in appropriate domain repositories that ... follow leading practices, and can provide additional data services.”

Released 15 January. Article in Eos.org: https://eos.org/agu-news/committing-publishing-data-earth-space-sciences

https://copdessdirectory.osf.io/

To be integrated with re3data.org

Domain-specific Data Facilities

Science Community

Domain specific Data facility


Libraries Archives

CI, Computer Science

Publishers, editors

Discipline-specific data services
• Context & provenance metadata
• Semantics
• Workflows

Funding Agencies

Data Facilities

Registries

Data curation services

CI development

findable: identification, persistence

accessible: protection, protocols

context, provenance

re-usable

harmonized, machine-readable

interoperable

BIG DATA

Adding Value

small data

1/6/16 ESIP Winter 2016: "Unleashing the BIG in Small Data"

Generic Repositories

Data Curation Standards

Community Data Collections

Domain-specific Data Standards


Generic Repositories Community Data Collections

Domain Repositories


Unleashing the BIG in small Research Data

Kerstin Lehnert
Lamont-Doherty Earth Observatory of Columbia University, Palisades, NY 10964

http://bigdata-madesimple.com/hey-big-data-dont-forget-your-little-data-cousin/

Small Data: Pieces of a Puzzle …

… that build a picture

Small Data, Big Science: Example 1

“Understanding where the dust that's in the atmosphere and oceans comes from can help scientists estimate its impact on earth's climate system.”

Bess Koffman, Michael Kaplan, Steven Goldstein, Gisela Winckler (LDEO), Natalie Mahowald (Cornell)

http://blogs.ei.columbia.edu/2014/03/13/did-new-zealand-dust-influence-the-last-ice-age/

Science Question: Did New Zealand Dust Influence the Last Ice Age?

Small Data - Big Effort or What it takes to generate a few kilobytes of data


Small Data, Big Science: Example 2

Science question: Do convergent margin volcanoes really represent continental crust?

“As it is crucial to understand the extent and origin of the compositional difference between central Aleutian lavas and plutons through time and space, this project will map and sample plutonic rocks exposed on the central Aleutians and their coeval volcanic host rocks.”

http://www.nsf.gov/discoveries/disc_summ.jsp?cntn_id=135851&org=NSF

Small Data - Big Effort or What it takes to generate a few kilobytes of data

• 4 scientists (3 institutions) traveling to Alaska
• 5 weeks on remote islands
• a boat (with crew)
• a helicopter

Anticipated Data:
• ~250 samples
• ~200 major element analyses
• ~150 trace element analyses
• 50 U/Pb zircon geochronology
• 30 Ar-Ar ages
• 80 Sr, Nd, Hf and Pb isotope analyses

EarthChem Data Systems

[Diagram: Investigators; EarthChem Library (Data Publication & Preservation); PetDB, SedDB, EarthChem Portal (Data Mining & Analysis); External Systems; EarthChem Data Managers. Flows labeled "Data", "Metadata Catalog", and "Data & Metadata".]


EarthChem Library

Data Types:
- Analytical datasets
- Experimental datasets
- Macros/tools
- Data compilations (syntheses)
- Images
- Data reports

DOI to allow proper citation of data

Link to publications

Link to funding source


Accessible in the EarthChem Library


Editors Roundtable Recommendations

Data need to be available in useful format
• Complete disclosure of data
• Data in tabular (usable!) format, no .PDF or .jpg
• No ratios

Sample metadata
• locations
• Unique sample identifiers
• Object classifications

Analytical metadata
• Method
• Lab
• Data quality & reproducibility (reference material measurements)
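The recommendations above amount to a checklist that a tabular submission can be tested against. The following is a minimal sketch of such a check, not an EarthChem tool: the column names in `REQUIRED_COLUMNS` and the function names are hypothetical, chosen only to mirror the sample and analytical metadata items listed on the slide.

```python
import csv
import io

# Hypothetical column names for illustration only; they are not taken from
# an EarthChem template or from the slides.
REQUIRED_COLUMNS = {
    "sample_id",           # unique sample identifier
    "latitude",            # sample location
    "longitude",
    "material_class",      # object classification
    "method",              # analytical method
    "lab",                 # laboratory
    "reference_material",  # reference material measured for data quality
}


def missing_columns(header):
    """Return the required metadata columns absent from a table header."""
    present = {name.strip().lower() for name in header}
    return sorted(REQUIRED_COLUMNS - present)


def check_table(text):
    """Check CSV text for missing metadata columns and blank sample IDs."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = missing_columns(header)
    blank_ids = 0
    if "sample_id" not in missing:
        # Find the header spelling that matched "sample_id".
        key = next(h for h in header if h.strip().lower() == "sample_id")
        for row in reader:
            if not (row[key] or "").strip():
                blank_ids += 1
    return {"missing_columns": missing, "rows_without_sample_id": blank_ids}
```

Calling `check_table` on the text of a CSV file returns the metadata columns that are absent and the number of rows that lack a sample identifier; a real template would define its own required fields.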


Data Templates

LPSC 2015 Workshop: Restoration and Synthesis of Planetary Geochemical Data

EarthChem Data Templates

NEW!

Data Standards: Why?

Re-usability of data

Reproducibility of science

Integration/interoperability of data

Open Geospatial Consortium (OGC): Observations & Measurements

Observation Result

Feature of Interest

Sampling Sampling Feature

Observation

“Observations commonly involve sampling of an ultimate feature of interest. This International Standard defines a common set of sampling feature types classified primarily by topological dimension, as well as samples for ex-situ observations.” (OGC O&M 2.0.0 / ISO 19156; editor: Simon Cox)

e.g. Station, Transect, Section, Specimen
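The relationship between an observation, its result, the sampling feature, and the ultimate feature of interest can be sketched as a small data structure. This is an illustrative Python rendering under stated assumptions, not the O&M schema itself and not ODM2: the class and field names are simplified stand-ins for the standard's concepts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureOfInterest:
    """The ultimate feature the science question is about."""
    name: str


@dataclass(frozen=True)
class SamplingFeature:
    """A station, transect, section, or specimen that samples a feature."""
    identifier: str
    kind: str  # e.g. "Station", "Transect", "Section", "Specimen"
    sampled_feature: FeatureOfInterest


@dataclass(frozen=True)
class Observation:
    """One measurement made on a sampling feature."""
    feature_of_interest: SamplingFeature  # proximate feature: the sample
    observed_property: str                # what was measured
    procedure: str                        # how it was measured
    result: float                         # the measured value
    unit: str


def ultimate_feature(observation):
    """Follow the chain from an observation back to what was sampled."""
    return observation.feature_of_interest.sampled_feature
```

For an ex-situ measurement, the `Observation` points at a `SamplingFeature` of kind "Specimen", and `ultimate_feature` recovers the body of rock or water that the specimen stands for.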

Observation Data Model v2

ODM2 Team: J S Horsburgh, A K Aufdenkampe, L Hsu, A Jones, K Lehnert, E Mayorga, L Song, D Tarboton, I Zaslavsky

Horsburgh et al., Environmental Modelling & Software, Volume 79, 2016.

PetDB

PetDB Data Mining: Search & Filter


Filter by method or concentration
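As a rough illustration of what filtering by method or concentration means for a table of analyses, the sketch below applies both kinds of criteria to in-memory records. It is not the PetDB interface: the `Analysis` record, the `filter_analyses` function, and its parameters are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Analysis:
    """One analysis of one sample (hypothetical record layout)."""
    sample_id: str
    method: str
    values: dict = field(default_factory=dict)  # analyte -> concentration


def filter_analyses(analyses, method=None, analyte=None,
                    min_value=None, max_value=None):
    """Keep analyses that satisfy every criterion that was supplied.

    method:    keep only analyses made with this method.
    analyte:   keep only analyses that report this analyte, optionally
               within the closed range [min_value, max_value].
    """
    selected = []
    for analysis in analyses:
        if method is not None and analysis.method != method:
            continue
        if analyte is not None:
            value = analysis.values.get(analyte)
            if value is None:
                continue
            if min_value is not None and value < min_value:
                continue
            if max_value is not None and value > max_value:
                continue
        selected.append(analysis)
    return selected
```

Supplying only `method` gives a method filter, supplying `analyte` with a range gives a concentration filter, and supplying both narrows the result to analyses that pass both.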


EarthChem Collaborations

• External EC Portal contributors: GEOROC, USGS, MetPetDB, GANSEKI
• Critical Zone Observatories
• DiamondDB (funded by Sloan Foundation/DCO)
• DECADE Portal (funded by Sloan Foundation/DCO): collaboration with Global Volcanism Program & MAGA database (C. Cardellini)
• Layered Intrusions Database: J. van Tongeren (student engagement project)
• MoonDB (funded by NASA 2015-2017): Johnson Space Center, C. Neal,


IEDA Data Rescue Initiative

Data Rescue Mini-awards ($7,000)
• J. Delano (SUNY Albany), A. Saal, E. Hauri: Apollo samples
• J. Gill (UCSC, retired):
• P. Janney (UCT): UCT Mantle Xenolith Collection
• M. Rhodes (U Mass): Hawaiian Drilling project
• T. Fischer (UNM): Russian Volcanic Gas Data

International Data Rescue Award in the Geosciences
• Sponsored by Elsevier Research Data division
• Awarded 2013 (at AGU FM) and 2015 (at EGU GA)
• Competition for 2016 starting soon

Special Issue of GeoResJ on Data Rescue (volume 6, 2015)


EarthChem Portal

Data Analysis

Interoperability with LEPR (M. Ghiorso)

Results at LEPR

EarthCube

• Advances coordination, collaboration, and integration: Community governance, Integrative Activities
• Fosters new data communities: Research Coordination Networks
• Develops and adapts new technologies to structure, transform, integrate, document, harmonize data & metadata: Building Blocks
