data management for grown ups
Post on 21-Jan-2017
Data Management For Grown Ups
Terrell Russell, Ph.D.
@terrellrussell
Senior Data Scientist, iRODS Consortium
Renaissance Computing Institute (RENCI), UNC-Chapel Hill
iRODS Consortium
was created to ensure the sustainability of iRODS and to further its adoption and continued evolution. To this end, the Consortium works to standardize the definition, development, and release of iRODS-based data middleware technologies, evangelize iRODS among potential users, promote new advances in iRODS, and expand the adoption of iRODS-based data middleware technologies through the development, release, and support of an open-source, mission-critical, production-level distribution of iRODS.
Current Members:
RENCI, DICE, Seagate, DDN, Novartis, IBM, Complete Genomics, Wellcome Trust Sanger Institute, UCL, Cleversafe, EMC, and the NASA Atmospheric Science Data Center
The iRODS Consortium
Data Management
Access
Description
Integrity
Replication
Availability - If things are down, nothing else matters
Data Management
Access
Description
Integrity
Replication
Availability
Migration - Hardware changes, format changes
Data Management
Access
Description
Integrity
Replication
Availability
Migration
Recovery - Robust plans for when things go wrong
Data Management
Access
Description
Integrity
Replication
Availability
Migration
Recovery
Provenance - Full record of all related activity
Data Management
Access
Description
Integrity
Replication
Availability
Migration
Recovery
Provenance
Retention - Deleting data on a defined schedule
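As a minimal sketch of what "deleting data on a defined schedule" can mean in practice, the Python below selects records whose retention period has elapsed. The `Record` type, its field names, and the sample data are hypothetical and chosen only for illustration; nothing here is an iRODS interface.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Record:
    name: str          # hypothetical identifier for a stored object
    created: date      # when the object entered the system
    retain_days: int   # retention period assigned by policy (assumed)


def expired(records, today):
    """Return the records whose retention period has fully elapsed."""
    return [r for r in records
            if r.created + timedelta(days=r.retain_days) <= today]


# Illustrative data only: both records carry a five-year retention period.
records = [
    Record("scan-001", date(2010, 1, 1), 365 * 5),
    Record("scan-002", date(2016, 6, 1), 365 * 5),
]
due = expired(records, today=date(2017, 1, 21))
print([r.name for r in due])
```

Keeping the selection step separate from the actual deletion means the list of expiring records can be reviewed or logged before anything is removed.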
People with Keys + Notes/Reports
Passwords + Folders + Scripts (Maybe)
Credentials + Metadata + Automation
Policy Enforcement - Through the Years
Four Verticals → Four Case Studies
Health Care & Life Science
Oil & Gas
Media & Entertainment
Archives & Records Management
Health Care & Life Science
Genomics Use Case - Data begins as a series of images
from a sequencer, is converted to bases (ATCG), then
fragmented, aligned, annotated for variants, filtered,
and analyzed
Extensive Data Pipelines
Saved State
Diverse Data Products
Share Results
Oil & Gas
Ingest Use Case - As existing storage fills up, there are
two complementary strategies: 1) migrate data from active
storage to a slower, cheaper archive and 2) add more active
storage. Traditional HSM (hierarchical storage management)
has limited flexibility (access date, physical location,
etc.), and additional namespaces just add more complexity.
Diverse Data Sources
Spread Geographically
Computationally Intense
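The first ingest strategy above (moving data from active storage to an archive tier) could be planned roughly as follows. This is a sketch under stated assumptions: the `StoredFile` type, the 80% high-water mark, and the file paths are all hypothetical. The ranking function is a parameter, so a criterion other than access age can be swapped in.

```python
from dataclasses import dataclass


@dataclass
class StoredFile:
    path: str               # hypothetical logical path
    size_gb: float
    days_since_access: int
    tier: str = "active"


def plan_migration(files, capacity_gb, high_water=0.8,
                   priority=lambda f: f.days_since_access):
    """Choose active files to archive until usage falls to the high-water mark.

    Files with the highest `priority` value are archived first. The default
    ranks by access age; any other scoring function can be passed instead.
    """
    active = [f for f in files if f.tier == "active"]
    used = sum(f.size_gb for f in active)
    target = capacity_gb * high_water
    to_move = []
    for f in sorted(active, key=priority, reverse=True):
        if used <= target:
            break
        to_move.append(f)
        used -= f.size_gb
    return to_move


# Illustrative data only: 95 GB in use on a 100 GB active tier.
files = [
    StoredFile("/survey/a.segy", 40, 400),
    StoredFile("/survey/b.segy", 30, 10),
    StoredFile("/survey/c.segy", 25, 200),
]
plan = plan_migration(files, capacity_gb=100)
print([f.path for f in plan])
```

Passing a different `priority` function, for example one based on project status rather than access date, is one way to get past the single-criterion limitation described above.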
Media & Entertainment
Born Digital Use Case - New valuable creative
content (movie assets, original musical tracks)
requires large, robust, long-term, flexible,
accessible infrastructure.
Popular Content
Unique
Largely Video and Games
Archives & Records Management
Provenance Use Case - Libraries, museums, and
other cultural institutions have a 100+ year view on
their digital assets. Must maintain archival and
dissemination copies. Lots of metadata.
Cultural Heritage
Original and Derivative Copies
Quality Search and Browse
Four Verticals → Four Case Studies
Health Care & Life Science
Oil & Gas
Media & Entertainment
Archives & Records Management
Open Source Data Management Middleware
iRODS enables data discovery using a metadata catalog that describes every file, every directory, and every storage resource in the data grid.
iRODS automates data workflows, with a rule engine that permits any action to be initiated by any trigger on any server or client in the grid.
iRODS enables secure collaboration, so users only need to log in to their home grid to access data hosted on a remote grid.
iRODS implements data virtualization, allowing access to distributed storage assets under a unified namespace, and freeing organizations from getting locked in to single-vendor storage solutions.
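To make the metadata catalog and rule engine ideas above concrete, here is a toy, in-memory sketch. It is not the iRODS API or rule language: `ToyCatalog`, the `"put"` event name, the `tag_by_extension` rule, and the paths are all hypothetical stand-ins for illustration.

```python
from collections import defaultdict


class ToyCatalog:
    """In-memory stand-in for a metadata catalog with triggers (not iRODS)."""

    def __init__(self):
        self.metadata = defaultdict(dict)  # logical path -> {attribute: value}
        self.rules = defaultdict(list)     # event name -> list of actions

    def on(self, event, action):
        """Register an action to run whenever `event` fires."""
        self.rules[event].append(action)

    def put(self, path, **attrs):
        """Record metadata for a path, then fire the 'put' trigger."""
        self.metadata[path].update(attrs)
        for action in self.rules["put"]:
            action(self, path)

    def find(self, **criteria):
        """Return sorted paths whose metadata matches every criterion."""
        return sorted(p for p, m in self.metadata.items()
                      if all(m.get(k) == v for k, v in criteria.items()))


def tag_by_extension(catalog, path):
    # Hypothetical rule: label alignment files automatically on ingest.
    if path.endswith(".bam"):
        catalog.metadata[path]["kind"] = "alignment"


catalog = ToyCatalog()
catalog.on("put", tag_by_extension)
catalog.put("/zone/home/alice/run1.bam", project="p1")
catalog.put("/zone/home/alice/notes.txt", project="p1")
print(catalog.find(kind="alignment"))
```

The point of the sketch is the division of labor: policy lives in registered rules rather than in each client, and discovery is a query against metadata rather than a walk of the storage itself.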