ieda: making small data big through interdisciplinary partnerships among long-tail domains

Making small Data BIG through Interdisciplinary Partnerships among Long-Tail Domains

AGU FM 2014: IN14B-01

K. Lehnert 1, S. Carbotte 1, R. Arko 1, V. L. Ferrini 1, L. Hsu 1, L. Song 1, M. Ghiorso 2, J. D. Walker 3

1 Lamont-Doherty Earth Observatory, Columbia University, Palisades, NY; 2 OFM Research, Seattle, WA; 3 University of Kansas, Lawrence, KS

Upload: kerstin-lehnert

Post on 07-Aug-2015




DATA FACILITIES IN A BIG DATA WORLD

[Figure: small data versus BIG data. X axis: Data Volume; Y axis: Data Size. Labels: Data Centers & Facilities, Distributed datasets, Research Data Collections.]


HOW WE DEFINE ‘BIG’

Volume

Velocity

Variety

Veracity

VALUE


“The long tail is a breeding ground for new ideas and never before attempted science.”

(Heidorn, B. 2008: “Shedding Light on the Dark Data in the Long Tail of Science”)

ADDING VALUE

citable

small data

BIG DATA

accessible

integrated

digital data collection

trustworthy repositories

domain standards

interoperable

APIs, OLP,


DATA FACILITIES

“acquire, curate, preserve, and/or disseminate data, software, and/or models for one or more defined communities or disciplines”

Data facilities need to adhere to standards (e.g. ISO 16363, ICSU-WDS) that cover criteria such as

• governance and organizational viability
• organizational structure and staffing
• procedural accountability and preservation policy framework
• financial sustainability
• contracts, licenses, and liabilities


DOMAIN-SPECIFIC DATA FACILITIES

“With both content-area and digital curation expertise, domain repositories are uniquely capable of ensuring that data and other research products are adequately preserved, enhanced, and made available for replication, collaboration, and cumulative knowledge building.”

“Sustaining Domain Repositories for Digital Data: A Call for Change from an Interdisciplinary Working Group of Domain Repositories”, Inter-university Consortium for Political and Social Research (ICPSR), 2013

IEDA: A MULTI-DISCIPLINARY DATA FACILITY FOR LONG-TAIL SCIENCE

• Many disciplines
  • geochemistry, marine geophysics, marine geology, geochronology, and more

• Many data types
  • sensor data and sample-based observations & experiments
  • raw data (e.g. multi-beam), field data, lab data, derived data, samples
  • gridded data, point data, time-series data, maps, photos, and more

• File sizes varying from a few kilobytes to terabytes

DRIVEN BY MULTI-DISCIPLINARY SCIENCE

• Ridge 2000

• MARGINS

• GeoPRISMS

FROM RESEARCH DATA COLLECTIONS TO DATA FACILITY

“This Cooperative Agreement converts a series of proposal/award-driven activities into a community-based facility that serves to support, sustain, and advance the geosciences by providing a centralized location for the registry of and access to data essential for research in the solid-earth and polar sciences.”

LDEO Data projects funded by NSF OCE, EAR, OPP that were merged into IEDA

FROM RESEARCH DATA COLLECTIONS TO DATA FACILITY

Formal Governance

Robust Infrastructure

Stable Expert Team

Accreditation

Adherence to Community Standards

IEDA: small data gone BIG

IEDA Syntheses
• 19 x 10^6 analytical values in EarthChem
• 2.63 x 10^6 miles of data from 808 cruises in the Global Multi-Resolution Topography (GMRT)

IEDA Repositories
• >500,000 files
• 47 TB
• 4 x 10^6 samples


LAYERED SERVICES: THE EUDAT MODEL

Users

Discipline-specific Services
- Data capture (templates, software tools)
- Domain-specific workflows & GUIs
- Data products (syntheses)
- Community standards
- User support & training

Common Services
- data publication (DOI)
- data submission
- data management (investigator) support
- integrated data access & visualization
- interoperability (web services, RDF linked data, etc.)
- community governance
- community liaison (E&O)


IEDA: SCOPE & PARTNERS

EarthChem MGDS

Users (Data contribution & retrieval)

Geochron

IEDA Common Services

Solid Earth Observational Data

Areas of expertise: Sensor data & Sample data


IEDA: SCOPE & PARTNERS

EarthChem MGDS

Users (Data contribution & retrieval)

Geochron

IEDA Common Services

LEPR

PARTNERS: ROLES & RESPONSIBILITIES

Operation of partner systems & services

• Day-to-day operation (except sys admin)
• Planning improvements & new capabilities
  • (supported by and in coordination with IEDA Implementation Team)
• Align partner systems with IEDA Common Services
• Plan & oversee budget for their activities
• Interact with their specific user communities (user support, training, feedback, etc.)

Participate in IEDA Partner Assembly

• Contribute to strategic planning & development
• Contribute to planning & prioritization of IEDA developments & activities
• Recommend new opportunities & partnerships
• Participate in IEDA governance
• Participate in annual face-to-face meeting

EXAMPLE

IEDA Repository

IEDA Sample Registry

IEDA Sys Op

J.D. Walker (KU):
- metadata schemas
- user interfaces
- web services
- community liaison

Geochron

IEDA Common Services

EXAMPLE

IEDA Repository

IEDA Sample Registry

IEDA Sys Op

M. Ghiorso (OFM Research):
- metadata schemas
- user interface
- web services
- community liaison

LEPR

IEDA Common Services


A SCALABLE MODEL

EarthChem MGDS

Users (Data contribution & retrieval)

Geochron LEPR

IEDA Common Services

XX  YY  ...

‘EXTERNAL’ PARTNERSHIPS

Partner: funded through the Cooperative Agreement (CA)

Partner: funded outside the CA; contract with IEDA

Users (Data contribution & retrieval)

IEDA Common Services


CONCLUSION

Data facilities can grow small data through partnerships among data efforts in long-tail communities:

• Maintain the expertise and community liaison of domain-specific data efforts

• Leverage data curation expertise & infrastructure of data facilities

Interdisciplinary Earth Data Alliance

THE NEW IEDA

Interdisciplinary Earth Data Alliance

“IEDA strives to be a leading-edge inter-disciplinary data facility for solid earth data and information, founded in domain-specific data resources, to deliver integrated and streamlined data services that advance Ocean, Earth and Polar science and education.”