rogercummings combining snia cloud v7
TRANSCRIPT
-
7/26/2019 RogerCummings Combining SNIA Cloud v7
1/31
Combining SNIA Cloud, Tape and Container Format Technologies for the Long Term Retention of Big Data
Roger Cummings, Antesignanus
Co-Author: Simona Rabinovici-Cohen, IBM Research Haifa
-
Combining SNIA Cloud, Tape and Container Format Technologies for the Long Term Retention of Big Data 2013 Storage Networking Industry Association. All Rights Reserved.
SNIA Legal Notice
The material contained in this tutorial is copyrighted by the SNIA unless otherwise noted.
Member companies and individual members may use this material in presentations and literature under the following conditions:
Any slide or slides used must be reproduced in their entirety without modification
The SNIA must be acknowledged as the source of any material used in the body of any document containing material from these presentations.
This presentation is a project of the SNIA Education Committee.
Neither the author nor the presenter is an attorney and nothing in this presentation is intended to be, or should be construed as legal advice or an opinion of counsel. If you need legal advice or a legal opinion please contact your attorney.
The information presented herein represents the author's personal opinion and current understanding of the relevant issues involved. The author, the presenter, and the SNIA do not assume any responsibility or liability for damages arising out of any reliance on or use of this information.
NO WARRANTIES, EXPRESS OR IMPLIED. USE AT YOUR OWN RISK.
-
Abstract
Combining SNIA Cloud, Tape and Container Format Technologies for the Long Term Retention of Big Data
Generating and collecting very large data sets is becoming a necessity in many domains that also need to keep that data for long periods. Examples include astronomy, atmospheric science, genomics, medical records, photographic archives, video archives, and large-scale e-commerce. While this presents significant opportunities, a key challenge is providing economically scalable storage systems to efficiently store and preserve the data, as well as to enable search, access, and analytics on that data in the far future.
Both cloud and tape technologies are viable alternatives for storage of big data and SNIA supports their standardization. The SNIA Cloud Data Management Interface (CDMI) provides a standardized interface to create, retrieve, update, and delete objects in a cloud. The SNIA Linear Tape File System (LTFS) takes advantage of a new generation of tape hardware to provide efficient access to tape using standard, familiar system tools and interfaces. In addition, the SNIA Self-contained Information Retention Format (SIRF) defines a storage container for long term retention that will enable future applications to interpret stored data regardless of the application that originally produced it.
This tutorial will present advantages and challenges in long term retention of big data, as well as initial work on how to combine SIRF with LTFS and SIRF with CDMI to address some of those challenges. SIRF with CDMI will also be examined in the European Union integrated research project ENSURE: Enabling kNowledge, Sustainability, Usability and Recovery for Economic value.
-
Outline
Introduction
SNIA technologies
  Cloud Data Management Interface (CDMI)
  Linear Tape File System (LTFS)
  Self-contained Information Retention Format (SIRF)
Combining SNIA technologies
  SIRF Serialization for CDMI
  SIRF Serialization for LTFS
EU Enabling kNowledge, Sustainability, Usability and Recovery for Economic Value (ENSURE)
Summary
-
Big Data..
Really is BIG..
  2.5 quintillion (10^18) bytes of new data created per day in 2012 (source IBM)
And the move to the Internet of Things is only going to increase this volume
  19.8 billion connected devices by 2020 (source McKinsey)
  Only 4.2 billion smartphones and tablets, 3.4 billion PCs
Data analytics is improving all the time
  Therefore historical information has significant value
  Apply new techniques and algorithms to gain new insights
  Need to ensure ALL necessary information is captured to extract full value
Therefore Big Data has similarities to (long term) preservation
-
The Need for Digital Preservation of Big Data
Regulatory compliance and legal issues
  Sarbanes-Oxley, HIPAA, FRCP, intellectual property litigation
Emerging web services and applications
  Email, photo sharing, web site archives, social networks, blogs
Many other fixed-content repositories
  Scientific data, intelligence, libraries, movies, music
Domains that have Big Data require preservation
M&E: Film masters, outtakes, related artifacts (e.g., games). 100 years or more
Healthcare: X-rays are often stored for periods of 75 years. Records of minors are needed until 20 to 43 years of age
Scientific and Cultural: Satellite data is kept forever. We would like to keep digital art forever
-
SNIA Survey from 2007
What does Long-Term Mean?
Retention of 20 years or more is required by 70% of responses.
  Retention period    Percent of respondents
  >3-6 Years          1.9%
  >7-10 Years         12.3%
  >11-20 Years        15.7%
  >21-50 Years        13.1%
  >50-100 Years       18.3%
  >100 Years          38.8%
Top External Factors Driving Long-Term Retention Requirements: Legal Risk, Compliance Regulations, Business Risk, Security Risk
[Bar chart: percent of respondents (0% to 60%) for the categories Legal Risk, Compliance Requirements, Business Risk, Security Risk, Other]
  Bar labels:
  Concern with litigation protection
  Meeting regulatory requirements
  Protection from compliance or legal fines
  Retaining history for competitiveness or protection
  Protection of business or intellectual assets
  Protection of customer privacy
  Preservation of business history
Source: SNIA-100 Year Archive Requirements Survey, January 2007.
-
Goals of Digital Preservation
Digital assets stored now should remain
  Accessible
  Undamaged
  Usable
For as long as desired beyond the lifetime of
  Any particular storage system
  Any particular storage technology
And at an affordable cost
-
Real Life Example Problem
2003
  To: [email protected]
  From: [email protected]
  Subject: Something or other
2007
  To: [email protected]
  From: [email protected]
  Subject: Something else
Same people?? Could you PROVE it 20 years on?
-
Cloud Data Management Interface (CDMI)
Being developed by SNIA CDMI TWG
The CDMI standard defines an interoperable format for moving data and associated metadata between cloud providers
CDMI data objects can be accessed by standard browsers and internet tools (subject to the owner's access control lists)
CDMI data objects may order data services from the cloud
  Secure Erasure, Encryption, Replication, Retention, Backup/Restore, Tiering, Hashing, Preservation, etc. (extensible)
  Done through Data System Metadata (key/value) on the Containers or Objects
Has several implementations including OpenStack
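To make the "Data System Metadata (key/value)" point concrete, here is a minimal Python sketch that builds, but does not send, a CDMI container-creation request carrying such metadata. The endpoint URL, the container name, and the specification version string are assumptions for illustration. The `cdmi_` metadata names and their values reflect our reading of the CDMI specification and should be checked against the version in use.

```python
import json
import urllib.request

# Hypothetical endpoint; a real deployment supplies its own root URI.
CDMI_ROOT = "https://cloud.example.com/cdmi"


def build_container_request(name, data_system_metadata, spec_version="1.0.2"):
    """Build (but do not send) a PUT request that creates a CDMI container.

    The requested data services travel as key/value pairs in the
    "metadata" field of the JSON request body.
    """
    body = json.dumps({"metadata": data_system_metadata}).encode("utf-8")
    return urllib.request.Request(
        url=f"{CDMI_ROOT}/{name}/",
        data=body,
        method="PUT",
        headers={
            "Content-Type": "application/cdmi-container",
            "Accept": "application/cdmi-container",
            "X-CDMI-Specification-Version": spec_version,  # assumed version
        },
    )


# Illustrative values only: ask for three copies and a SHA-256 value hash.
request = build_container_request(
    "PatientContainer",
    {"cdmi_data_redundancy": "3", "cdmi_value_hash": "SHA256"},
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it. The sketch stops short of that because no real endpoint is assumed.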
-
Model for the CDMI Interface
Resources accessed through a RESTful interface (figure not reproduced)
-
Linear Tape File System (LTFS)
A file system implemented on dual-partition linear tape:
  Index Partition and Data Partition
  Index Partition is small (2 wraps, 37.5 GB out of 1.5 TB on LTO5)
  Data Partition is remainder of the tape
File System module that implements a set of standard file system interfaces
  Implemented using FUSE on Linux and Mac OS X
  Windows implementation uses FUSE-like framework
Includes an on-tape structure used to track tape contents
  XML Index Schema
Format becoming the standard for linear tape
  Formal standardization through SNIA LTFS TWG
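The partition figures on this slide can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above (2 wraps, 37.5 GB, 1.5 TB, decimal units). The per-wrap capacity and the implied wrap count are derived from those figures, not taken from a tape specification, and the data partition size is an upper bound because the guard wraps are ignored.

```python
# Figures quoted on the slide for LTO5 (decimal units).
TAPE_CAPACITY_GB = 1500.0       # 1.5 TB
INDEX_PARTITION_GB = 37.5
INDEX_PARTITION_WRAPS = 2

# Capacity of one wrap, derived from the index partition figures.
gb_per_wrap = INDEX_PARTITION_GB / INDEX_PARTITION_WRAPS

# Share of the tape given to the index partition.
index_fraction = INDEX_PARTITION_GB / TAPE_CAPACITY_GB

# Wrap count implied by the slide's figures (a derived value).
implied_wraps = TAPE_CAPACITY_GB / gb_per_wrap

# Upper bound for the data partition: guard wraps are not subtracted.
data_partition_upper_bound_gb = TAPE_CAPACITY_GB - INDEX_PARTITION_GB

print(f"{gb_per_wrap} GB per wrap, index partition is {index_fraction:.1%} of the tape")
```

So the index partition costs about 2.5% of the cartridge, which is the sense in which the slide calls it small.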
-
Logical View of LTFS Volume
[Diagram: tape running from BOT (beginning of tape) to EOT (end of tape), with an Index Partition and a Data Partition separated by Guard Wraps; labeled contents: LTFS XML Index and Files]
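As a rough illustration of how an XML index can track tape contents, the following Python sketch walks a small, hand-written index and lists the files it describes. The sample document is a simplified assumption modeled on the general shape of an LTFS index (directories with contents holding files). It is not a complete or authoritative instance of the LTFS index schema, and the file names and sizes are invented.

```python
import xml.etree.ElementTree as ET

# Simplified, hand-written sample; element names are assumptions modeled
# on the LTFS index layout, and the values are invented.
SAMPLE_INDEX = """\
<ltfsindex version="2.0.0">
  <generationnumber>3</generationnumber>
  <directory>
    <name>PatientVolume</name>
    <contents>
      <file><name>encounterJan2001</name><length>2048</length></file>
      <directory>
        <name>chestImage</name>
        <contents>
          <file><name>manifest</name><length>512</length></file>
          <file><name>dicom1</name><length>1048576</length></file>
        </contents>
      </directory>
    </contents>
  </directory>
</ltfsindex>
"""


def list_files(directory, prefix=""):
    """Yield (path, length) for every file below a <directory> element."""
    path = prefix + directory.findtext("name", default="") + "/"
    contents = directory.find("contents")
    if contents is None:
        return
    for child in contents:
        if child.tag == "file":
            yield path + child.findtext("name"), int(child.findtext("length"))
        elif child.tag == "directory":
            yield from list_files(child, path)


root = ET.fromstring(SAMPLE_INDEX)
files = list(list_files(root.find("directory")))
for path, length in files:
    print(path, length)
```

Because the index is a self-describing text document, a reader needs only an XML parser to enumerate the volume's contents.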
Check out SNIA Tutorial: Big Data Storage Options for Hadoop
Check out SNIA Tutorial: Protecting Data in the "Big Data" World
-
SIRF: Self-contained Information Retention Format
An Analogy
Standard physical archival box
  Archivists gather together a group of related items and place them in a physical box container
  The box is labeled with information about its content, e.g., name and reference number, date, contents description, destroy date
SIRF is the digital equivalent
  Logical container for a set of (digital) preservation objects and a catalog
  The SIRF catalog contains metadata related to the entire contents of the container as well as to the individual objects
  SIRF standardizes the information in the catalog
Photo courtesy Oregon State Archives
Being developed by SNIA Long Term Retention (LTR) TWG
-
SIRF Properties
SIRF is a logical data format of a storage container appropriate for long term storage of digital information
A storage container may comprise a logical or physical storage area considered as a unit.
  Examples: a file system, a tape, a block device, a stream device, an object store, a data bucket in a cloud storage
Required Properties
  Self-describing: can be interpreted by different systems
  Self-contained: all data needed for the interpretation is in the container
  Extensible: so it can meet future needs
-
SIRF Components
A SIRF container includes:
  A magic object: identifies SIRF container and its version
  Multiple preservation objects that are immutable
  A catalog that is
    Updatable
    Contains metadata to make container and preservation objects portable into the future without external functions
* Work-in-progress and less mature than CDMI and LTFS
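The three components can be sketched as a small data model. This is only an illustration of the roles described above (an identifying magic object, immutable preservation objects, an updatable catalog). The class names, field names, and sample values are hypothetical and are not taken from the SIRF specification.

```python
from dataclasses import FrozenInstanceError, dataclass, field


@dataclass(frozen=True)
class MagicObject:
    """Identifies the container as SIRF and records its version."""
    specification: str
    sirf_level: int


@dataclass(frozen=True)
class PreservationObject:
    """Immutable once created: frozen dataclasses reject attribute writes."""
    object_id: str
    content: bytes


@dataclass
class SirfContainer:
    magic: MagicObject
    objects: dict = field(default_factory=dict)
    catalog: dict = field(default_factory=dict)  # the updatable part

    def add_object(self, po, metadata):
        # Existing objects are never overwritten; only new ones are added.
        if po.object_id in self.objects:
            raise ValueError("preservation objects are immutable")
        self.objects[po.object_id] = po
        self.catalog[po.object_id] = dict(metadata)

    def update_catalog(self, object_id, **changes):
        # Catalog metadata may change over the container's lifetime.
        self.catalog[object_id].update(changes)


container = SirfContainer(MagicObject(specification="1111", sirf_level=1))
po = PreservationObject("encounterJan2001", b"encounter record")
container.add_object(po, {"packaging": "tar"})
container.update_catalog("encounterJan2001", note="migrated to new media")

try:
    po.content = b"tampered"
    tamper_blocked = False
except FrozenInstanceError:
    tamper_blocked = True
print(tamper_blocked)
```

The design choice mirrors the slide: content never changes after ingest, while the catalog absorbs every later change in knowledge about that content.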
-
Goals of SIRF Serialization for CDMI/LTFS
SIRF serialization for CDMI/LTFS specifies how a CDMI container or LTFS tape can also become SIRF-compliant
A SIRF-compliant CDMI container or LTFS tape enables a future CDMI/LTFS client to understand containers created by today's CDMI/LTFS client
  The properties of the future client are unknown to us today
  "Understand" means identifying the preservation objects in the container, the packaging format of each object, its fixity values, etc. (as defined in the SIRF catalog)
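One thing a future client would do with fixity values is recompute a digest and compare it with the one recorded in the catalog. The sketch below assumes a catalog entry shaped as a dictionary with hypothetical `algorithm` and `value` keys. The SIRF catalog's actual field names are not given in this tutorial.

```python
import hashlib


def verify_fixity(content: bytes, fixity: dict) -> bool:
    """Recompute the digest named in the fixity record and compare it.

    `fixity` is a hypothetical record such as
    {"algorithm": "sha256", "value": "<hex digest>"}.
    """
    digest = hashlib.new(fixity["algorithm"], content).hexdigest()
    return digest == fixity["value"]


# Record the fixity when the object is stored...
original = b"encounter record, January 2001"
entry = {"algorithm": "sha256", "value": hashlib.sha256(original).hexdigest()}

# ...and check it again when the object is read back years later.
print(verify_fixity(original, entry))
print(verify_fixity(original + b" (altered)", entry))
```

Storing the algorithm name beside the value is what lets a client that has never seen the container pick the right check.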
-
SIRF Serialization for CDMI: Interface
CDMI API can be used to access the various preservation objects and the catalog object in a SIRF-compliant CDMI container
Example
Assume we have a cloud container named "PatientContainer" that is SIRF-compliant
  each encounter is a preservation object
  each image is a preservation object
  the container has a catalog object
We can read the various preservation objects and the catalog object via CDMI REST API as follows:
  GET /<PatientContainer>/encounterJan2001
  GET /<PatientContainer>/chestImage
  GET /<PatientContainer>/sirfCatalog
[Diagram: PatientContainer holding preservation objects (PO) and the catalog (cat)]
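The three reads in the example can be sketched in Python by building the corresponding GET requests without sending them. The root URI is a hypothetical placeholder. The media type and version header follow our understanding of CDMI conventions and are assumptions, not details taken from the slide.

```python
import urllib.request

# Hypothetical root URI of the cloud; the slide does not give one.
CDMI_ROOT = "https://cloud.example.com/cdmi"


def build_get(container, object_name, spec_version="1.0.2"):
    """Build (but do not send) a CDMI GET for one object in a container."""
    return urllib.request.Request(
        url=f"{CDMI_ROOT}/{container}/{object_name}",
        method="GET",
        headers={
            "Accept": "application/cdmi-object",
            "X-CDMI-Specification-Version": spec_version,  # assumed version
        },
    )


# Two preservation objects and the catalog object from the example.
planned_requests = [
    build_get("PatientContainer", name)
    for name in ("encounterJan2001", "chestImage", "sirfCatalog")
]
for req in planned_requests:
    print(req.get_method(), req.full_url)
```

The catalog is fetched like any other object, which is the point of the serialization: no SIRF-specific API is needed to reach it.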
-
SIRF Serialization for CDMI
PatientContainer
  SIRF magic object: specification=1111, SIRF level = 1, Catalog object = sirfCatalog
  sirfCatalog
    {
      "encounterJan2001": {
        "IDs": [{ ... }],
        "Fixity": [{ ... }]
      },
      "chestImage": {
        "IDs": [{ ... }],
        "Fixity": [{ ... }]
      }
    }
  Simple PO: encounterJan2001
  Composite PO: chestImage manifest, chestImage dicom1, chestImage dicom1
-
SIRF Serialization for CDMI: General
A CDMI container also qualifies as a SIRF container when:
  The SIRF magic object is mapped to the CDMI container metadata and includes, for example, specification ID and version, SIRF level, SIRF catalog object ID.
  The SIRF catalog is an object in the CDMI container formatted in JSON
  A SIRF preservation object (PO) that is a simple object (contains one element) is mapped to a CDMI data object
    The simple object can be a tar/zip
  A SIRF PO that is a composite object (contains several elements) is mapped to a set of data objects (one for each element) and a manifest data object whose content includes the IDs and fixities of the element data objects
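The composite-object mapping can be sketched as a function that takes the element data objects and produces the manifest content. The JSON layout, the key names, and the choice of SHA-256 are assumptions for illustration. The slides say only that the manifest includes the IDs and fixities of the elements.

```python
import hashlib
import json


def build_manifest(elements: dict) -> str:
    """Return manifest content listing each element's ID and fixity.

    `elements` maps a hypothetical element ID to that element's bytes.
    """
    entries = [
        {
            "id": element_id,
            "fixity": {
                "algorithm": "sha256",
                "value": hashlib.sha256(data).hexdigest(),
            },
        }
        for element_id, data in elements.items()
    ]
    return json.dumps({"elements": entries}, indent=2)


# A composite PO with two elements (names and content are invented).
chest_image_elements = {
    "chestImage_dicom1": b"first image bytes",
    "chestImage_dicom2": b"second image bytes",
}
manifest = build_manifest(chest_image_elements)
print(manifest)
```

Each element would then be stored as its own data object, with the manifest stored beside them as one more data object.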
-
SIRF Serialization for LTFS: General
The SIRF catalog resides in the index partition
  An LTFS application has rules that indicate what to store in the index partition. These rules are used to place the SIRF catalog in the index partition.
A SIRF preservation object (PO) that is a simple object (contains one element) is mapped to an LTFS file
A SIRF PO that is a composite object (contains several elements) is mapped to a set of LTFS files (one for each element) and a manifest file whose content includes the IDs and fixities of the element data objects
[Diagram: Index Partition (IP) with Label Construct, LTFS index, SIRF catalog and File Mark; Data Partition (DP) with Label Construct and Preservation Objects]
-
SIRF and LTFS: Label Construct
VOL1 Label (fixed-size, 80 bytes) includes, for example, volume identifier (6 bytes), implementation identifier (13 bytes), owner identifier (14 bytes).
LTFS Label (XML) includes, for example, creator, volume UUID, blocksize, compression, partition IDs.
The SIRF Label (XML) is the magic object and includes, for example, specification ID and version, SIRF level.
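Because the VOL1 label is a fixed-size 80-byte record, its fields can be read by byte offset. The sketch below uses the field widths quoted above (6, 13 and 14 bytes). The exact offsets, the reserved gaps, and the sample values follow our understanding of the ANSI-style VOL1 layout and are assumptions to be verified against the LTFS format specification.

```python
def parse_vol1(label: bytes) -> dict:
    """Extract the three fields named on the slide from a VOL1 label.

    Assumed layout (0-based slices): "VOL1" at 0:4, volume identifier at
    4:10, implementation identifier at 24:37, owner identifier at 37:51.
    """
    if len(label) != 80 or label[0:4] != b"VOL1":
        raise ValueError("not an 80-byte VOL1 label")
    text = label.decode("ascii")
    return {
        "volume_identifier": text[4:10].rstrip(),           # 6 bytes
        "implementation_identifier": text[24:37].rstrip(),  # 13 bytes
        "owner_identifier": text[37:51].rstrip(),           # 14 bytes
    }


# Build an illustrative label; every value here is invented.
sample_label = (
    b"VOL1"
    + b"A00001"               # volume identifier
    + b"L"                    # accessibility (assumed)
    + b" " * 13               # reserved (assumed)
    + b"LTFS".ljust(13)       # implementation identifier
    + b"ARCHIVE".ljust(14)    # owner identifier
    + b" " * 28               # reserved (assumed)
    + b"4"                    # label standard version (assumed)
)
parsed = parse_vol1(sample_label)
print(parsed)
```

The LTFS and SIRF labels, being XML, would be read with an XML parser and not by offset.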
-
ENSURE is an FP7 EU project in the area of preservation
  Three-year Integrated Project (IP) started Feb. 1, 2011
  Consortium of 13 partners (industry and academic)
ENSURE has a business/industry-oriented focus
  Drivers for preservation are both regulatory and business value
  Demonstrated with three use cases: Health Care, Clinical Trials and Finance
Contributions to standards are a goal of the project
-
PDS Cloud and SIRF in ENSURE
Preservation DataStores (PDS) in the Cloud provides preservation-aware storage services for ENSURE based on OAIS
The SIRF Handler component will implement SIRF Serialization for CDMI
-
Summary
Need to retain not only information of interest but ALL other information to make it fully usable in future
  Put it all in the SIRF digital box, preserve that as a unit
No single technology will be usable over the timespans mandated by current digital preservation needs
  SNIA CDMI and LTFS technologies are among best current choices
  Are good for perhaps 5-10 years
SIRF provides a vehicle for collecting all of the information that will be needed to transition to new technologies in the future
  SIRF can be serialized for the future technologies as they come
-
Other Tutorials and Labs
Data Protection, Business Continuity, and Disaster Recovery - New Technologies
Check out SNIA Tutorial: Deploying Public, Private, and Hybrid Cloud Storage
Check out SNIA Tutorial: Massively Scalable File Storage
Check out SNIA Tutorial: Object Storage Systems: The Underpinning of Cloud and Big Data Initiatives
-
Attribution & Feedback
Please send any questions or comments regarding this SNIA Tutorial to [email protected]
The SNIA Education Committee thanks the following individuals for their contributions to this Tutorial.
Authorship History
Authors (Spring 2013)
  Mary Baker
  Simona Rabinovici-Cohen
  Roger Cummings
  Sam Fineberg
  (incorporating materials from earlier tutorials dating back to 2008, and with particular thanks to the 100 Year Archive Task Force (2007))
Additional Contributors
  Mark Carlson (& the Cloud TWG)
  David Pease
  Joseph White
  Alan Yoder
-
For further information
SIRF use cases and requirements document is released for public review
  http://www.snia.org/tech_activities/publicreview
More information on SIRF (& other SNIA LTR activities) is available at
  http://www.snia.org/ltr
More information on ENSURE is available at
  www.ensure-fp7.eu