The SEAD Prototype: Data Curation and Preservation for Sustainability Science
A poster presented at ESIP, July 2013
Beth Plale, Robert H. McDonald, Kavitha Chandrasekar, Inna Kouper, Indiana University, {plale, rhmcdona, kavchand, inkouper}@indiana.edu
Margaret Hedstrom, James Myers, University of Michigan, {hedstrom, myersjd}@umich.edu
Praveen Kumar, Rob Kooper, Luigi Marini, University of Illinois at Urbana-Champaign, {kumar1, kooper, lmarini}@illinois.edu
SEAD Vision and Rationale
Serve interdisciplinary and data-driven research in sustainability science
Enable access to publications, data and people
Support new types of analyses with heterogeneous data
Reduce overall cost of data curation and preservation
Capture metadata to provide immediate value for users, producers and repositories
Increase capabilities for research data reuse
SEAD Use Cases (focusing on curation)
Ingestion of heterogeneous data types (e.g., images, geospatial data, and sensor data), mapping of semantic relationships among research data collections, and semantic annotation and tagging.
Support of data discovery through interoperable standards and algorithms, social networking and data publishing.
Enhancements of existing data through automated scientific metadata extraction and data visualization plugins.
Ingestion of new data sets directly via workbench tools.
Curation of data via federated deposit into institutional and disciplinary repositories.
SEAD Prototype
[Figure: SEAD prototype architecture. Panel titles: Branded Public Access; Active Project Spaces; Individual Data Pages. The diagram shows three components, each with user-facing features, services, and a storage stack.]
Active Content Repository
Features: Data pages; Collection pages; Tag – Search – Map; Project Summary; Geo-Web App; Branded Repository; Android – Desktop Apps
Services: APIs – Web Services; Role-based Access Control; Data/Metadata Management; Extractors and Indexing; User Management
Stack: RDF – Tupelo 2 – Medici – Lucene – Geoserver; MySQL – Local File System
VIVO
Features: People; Projects; Publications; Organizations; Data Citations; Visualizations
Services: APIs – Joseki – Web Services
Stack: Jena – RDF; MySQL – Local File System
Virtual Archive
Features: Curator's Workbench; Ingest Processing; Matchmaking; Faceted Search; Geo-spatial Search
Services: APIs – Web Services; Metadata Extraction; Persistent IDs; Indexing; Archiving; Solr Query (XML); Geospatial Query
Stack: MySQL – Local File System – Solr – PostGIS
Additional elements: BagIt Conversion; Matchmaker; DataONE Member Node
Acknowledgements
SEAD is funded by the National Science Foundation under Cooperative Agreement #OCI0940824.
SEAD gratefully acknowledges all of the partner participants who have been involved in developing our services framework. These include research teams from the following organizations: School of Information, University of Michigan; Department of Civil and Environmental Engineering, the National Center for Supercomputing Applications (NCSA) and UIUC Libraries, University of Illinois at Urbana-Champaign; Data to Insight Center, IU Libraries and School of Informatics and Computing, Indiana University; the Inter-university Consortium for Political and Social Research (ICPSR); the National Center for Earth-surface Dynamics (NCED); and the Data Conservancy Project, Johns Hopkins University.
SEAD currently implements core functionality for uploading, annotating, and viewing data; linking data to researcher profiles; and packaging this information for transfer to institutional repositories or archival cloud storage. The curation pipeline to institutional repositories supports both long-term preservation and search and discovery workflows. The SEAD prototype is being tested by ingesting, annotating, and preserving datasets from the National Center for Earth-surface Dynamics (1.6 terabytes of data in over 450,000 files), which involves transferring data and metadata among the SEAD ACR, VIVO and VA components.
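The packaging step can be illustrated with BagIt, which the poster names as the conversion and transfer format. The following is a minimal sketch, not SEAD's implementation: it assumes the BagIt 1.0 layout (a `bagit.txt` declaration, a `data/` payload directory, and a SHA-256 manifest), and the function names `make_bag` and `verify_bag` are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def make_bag(bag_dir: Path, files: dict[str, bytes]) -> Path:
    """Write a minimal bag: declaration, payload files, and a SHA-256 manifest.

    `files` maps flat file names (no subdirectories) to their contents.
    """
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)

    manifest_lines = []
    for name, content in sorted(files.items()):
        (data_dir / name).write_bytes(content)
        digest = hashlib.sha256(content).hexdigest()
        # Manifest line format: checksum, whitespace, path relative to the bag root.
        manifest_lines.append(f"{digest}  data/{name}")

    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    (bag_dir / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )
    return bag_dir


def verify_bag(bag_dir: Path) -> bool:
    """Recompute each payload checksum and compare it with the manifest."""
    manifest = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        digest, rel_path = line.split(maxsplit=1)
        actual = hashlib.sha256((bag_dir / rel_path).read_bytes()).hexdigest()
        if actual != digest:
            return False
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        bag = make_bag(Path(tmp) / "example_bag", {"readings.csv": b"t,depth\n0,1.2\n"})
        print(verify_bag(bag))
```

The manifest lets a receiving repository confirm that every payload file arrived intact, which matters when a transfer involves hundreds of thousands of files.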
[Figure: SEAD component overview.]
Active Curation, Actionable Data (ACR)
Community Exploration, Research Analytics (VIVO): People / Projects / Publications; Data Citations; Organizations; Visualized Networks and Community Dynamics
Data Publication, Preservation and Discovery (VA): Policy-Driven Curation; Institutional / Cloud / Grid Storage; Faceted Search
Connections between components: SPARQL / HTTP; SPARQL / HTTP and BagIt
User / Entity Management – Analytics
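The component links in the overview are labeled as SPARQL over HTTP. As a rough illustration of what such an exchange looks like, the sketch below builds a SPARQL Protocol GET request URL for listing people in an RDF store. The endpoint URL, the choice of the FOAF and RDFS vocabularies, and the function names are assumptions made for illustration; the poster does not specify SEAD's actual endpoints or queries, and no request is sent.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint; the poster does not give a URL.
ENDPOINT = "http://example.org/sparql"


def people_query(limit: int = 10) -> str:
    """Return a SPARQL SELECT query listing people and their labels."""
    return (
        "PREFIX foaf: <http://xmlns.com/foaf/0.1/>\n"
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?person ?name WHERE {\n"
        "  ?person a foaf:Person ;\n"
        "          rdfs:label ?name .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


def sparql_get_url(endpoint: str, query: str) -> str:
    """Encode the query as the `query` parameter of a GET request URL."""
    return f"{endpoint}?{urlencode({'query': query})}"


if __name__ == "__main__":
    print(sparql_get_url(ENDPOINT, people_query(5)))
```

Passing the query as a URL parameter keeps the exchange to plain HTTP, so any component that can issue a GET request can read from another component's RDF store.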