IEDA: Making Small Data Big Through Interdisciplinary Partnerships Among Long-Tail Domains
AGU FM 2014: IN14B-01
K. Lehnert (1), S. Carbotte (1), R. Arko (1), V. L. Ferrini (1), L. Hsu (1), L. Song (1), M. Ghiorso (2), J. D. Walker (3)
(1) Lamont-Doherty Earth Observatory, Columbia University, Palisades, NY
(2) OFM Research, Seattle, WA
(3) University of Kansas, Lawrence, KS
DATA FACILITIES IN A BIG DATA WORLD
[Figure: small data vs. BIG data. X axis: data volume; Y axis: data size. Labeled regions: Data Centers & Facilities, Distributed datasets, Research Data Collections.]
HOW WE DEFINE ‘BIG’
Volume
Velocity
Variety
Veracity
VALUE
“The long tail is a breeding ground for new ideas and never before attempted science.”
(Heidorn, P. B., 2008: "Shedding Light on the Dark Data in the Long Tail of Science")
ADDING VALUE
[Figure: adding value to turn small data into BIG DATA. Labels: digital data collection; trustworthy repositories; domain standards; citable; accessible; integrated; interoperable (APIs, OLP).]
DATA FACILITIES
“acquire, curate, preserve, and/or disseminate data, software, and/or models for one or more defined communities or disciplines”
Data facilities need to adhere to standards (e.g., ISO 16363, ICSU-WDS) that cover:
• governance and organizational viability
• organizational structure and staffing
• procedural accountability and preservation policy framework
• financial sustainability
• contracts, licenses, and liabilities
DOMAIN-SPECIFIC DATA FACILITIES
“With both content-area and digital curation expertise, domain repositories are uniquely capable of ensuring that data and other research products are adequately preserved, enhanced, and made available for replication, collaboration, and cumulative knowledge building.”
“Sustaining Domain Repositories for Digital Data: A Call for Change from an Interdisciplinary Working Group of Domain Repositories” Interuniversity Consortium for Political and Social Research (ICPSR), 2013
IEDA: A MULTI-DISCIPLINARY DATA FACILITY FOR LONG-TAIL SCIENCE
• Many disciplines: geochemistry, marine geophysics, marine geology, geochronology, and more
• Many data types:
  • sensor data and sample-based observations & experiments
  • raw data (e.g., multibeam), field data, lab data, derived data, samples
  • gridded data, point data, time-series data, maps, photos, and more
• File sizes varying from a few kilobytes to terabytes
FROM RESEARCH DATA COLLECTIONS TO DATA FACILITY
“This Cooperative Agreement converts a series of proposal/award-driven activities into a community-based facility that serves to support, sustain, and advance the geosciences by providing a centralized location for the registry of and access to data essential for research in the solid-earth and polar sciences.”
LDEO data projects, funded by NSF OCE, EAR, and OPP, that were merged into IEDA
FROM RESEARCH DATA COLLECTIONS TO DATA FACILITY
Formal Governance
Robust Infrastructure
Stable Expert Team
Accreditation
Adherence to Community Standards
IEDA: small data gone BIG
IEDA Syntheses
• 19 × 10⁶ analytical values in EarthChem
• 2.63 × 10⁶ miles of data from 808 cruises in the Global Multi-Resolution Topography (GMRT)
IEDA Repositories
• >500,000 files
• 47 TB
• 4 × 10⁶ samples
LAYERED SERVICES: THE EUDAT MODEL
Users
Discipline-specific Services
- Data capture (templates, software tools)
- Domain-specific workflows & GUIs
- Data products (syntheses)
- Community standards
- User support & training
Common Services
- Data publication (DOI)
- Data submission
- Data management (investigator) support
- Integrated data access & visualization
- Interoperability (web services, RDF linked data, etc.)
- Community governance
- Community liaison (E&O)
IEDA: SCOPE & PARTNERS
EarthChem, MGDS, Geochron
Users (data contribution & retrieval)
IEDA Common Services
Solid Earth observational data
Areas of expertise: sensor data & sample data
IEDA: SCOPE & PARTNERS
EarthChem, MGDS, Geochron, LEPR
Users (data contribution & retrieval)
IEDA Common Services
PARTNERS' ROLES & RESPONSIBILITIES
Operation of partner systems & services
• Day-to-day operation (except sys admin)
• Planning improvements & new capabilities (supported by and in coordination with the IEDA Implementation Team)
• Align partner systems with IEDA Common Services
• Plan & oversee budget for their activities
• Interact with their specific user communities (user support, training, feedback, etc.)
Participate in IEDA Partner Assembly
• Contribute to strategic planning & development
• Contribute to planning & prioritization of IEDA developments & activities
• Recommend new opportunities & partnerships
• Participate in IEDA governance
• Participate in the annual face-to-face meeting
EXAMPLE
IEDA Repository
IEDA Sample Registry
IEDA Sys Op
J. D. Walker (KU):
- metadata schemas
- user interfaces
- web services
- community liaison
Geochron
IEDA Common Services
EXAMPLE
IEDA Repository
IEDA Sample Registry
IEDA Sys Op
M. Ghiorso (OFM Research):
- metadata schemas
- user interface
- web services
- community liaison
LEPR
IEDA Common Services
A SCALABLE MODEL
EarthChem, MGDS, Geochron, LEPR, XX, YY, ... (placeholders for additional partners)
Users (data contribution & retrieval)
IEDA Common Services
‘EXTERNAL’ PARTNERSHIPS
Partner: funded through the Cooperative Agreement (CA)
Partner: funded outside the CA; contract with IEDA
Users (Data contribution & retrieval)
IEDA Common Services
CONCLUSION
Data facilities can grow small data through partnerships among data efforts in long tail communities
• Maintain the expertise and community liaison of domain-specific data efforts
• Leverage data curation expertise & infrastructure of data facilities
Interdisciplinary Earth Data Alliance
THE NEW IEDA
Interdisciplinary Earth Data Alliance
“IEDA strives to be a leading-edge inter-disciplinary data facility for solid earth data and information, founded in domain-specific data resources, to deliver integrated and streamlined data services that advance Ocean, Earth and Polar science and education.”