silverton cleversafe-object-based-dispersed-storage

© 2012 Silverton Consulting, Inc. All Rights Reserved | twitter.com/RayLucchesi | RayOnStorage.com

+1-720-221-7270|SilvertonConsulting.com

Silverton Consulting, Inc. StorInt™ Briefing

Cleversafe Object-based Dispersed Storage

Introduction

During the evolution of the IT industry, data centers have seen the development of block-oriented storage, in the form of storage area networks (SAN) and direct attached storage (DAS), as well as file-oriented network attached storage (NAS). Recently, a new technology has emerged to supplement these, using a storage paradigm that originated in high performance computing environments and is based on a new data unit called an object. Objects, or data elements, consist of data and metadata and can solve many of the problems found in current DAS, SAN and NAS solutions. An object repository provides a new way to store, access and manage data and, as such, no longer needs to adhere to traditional storage system restrictions or protocols. This new collection of data elements has become the foundation for a number of sophisticated applications, such as active archives, vast content farms and omnipresent cloud storage services, to name just a few.

The Cleversafe® Dispersed Storage® solution provides unique object storage functionality not found in other vendor offerings. For instance, the product uses an innovative information dispersal approach to distribute data across a number of nodes or locations, supplying a much more robust, fault tolerant system in the face of drive, node and/or site outages.

Why object storage?

IT and end user unstructured data multiplies rapidly, often leading to orphaned files, a manageability morass, or worse. When this happens, file systems must be partitioned, data must be moved, and new mount points/shares must be created. All this consumes extra administrative time and causes unnecessary end user confusion.

In contrast, an object repository can support vast numbers of data elements. These repositories can easily grow from thousands to billions of objects without partitioning or other application/end user disruptions, all within the same system environment. One may need to add more system nodes to accommodate data growth, but this can be done without altering data elements, changing accessibility or incurring system outages.

Next, customary file and block metadata, or information about data, is defined and controlled by standards committees, making it limited, immutable and thus hard to extend. For example, Internet Engineering Task Force (IETF) defined NFS file metadata¹ includes such items as the filename, directory path, creation/last-open/last-modified dates, as well as size and physical file location. Changing or excising NFS metadata usually involves moving, modifying or deleting the file altogether. Beyond that, there is typically no capability to extend this file metadata other than by using an associated store alongside the original file system or by encoding additional information into file directory paths.

On the other hand, object metadata can be created, added to or modified almost at will. This allows a very flexible, easily adaptable, rich set of information about data that can then be used to better manage the elements of a repository over its lifetime. Such easy extensibility enables more automation and other services unavailable with conventional storage systems.

Another problem with today’s file and block storage is that data can typically only be accessed within a single IT location. Solutions exist that can extend this beyond the data center boundary, but they have historically been expensive and very proprietary. Alternatively, objects can be read or written over the Internet. As such, this data can be processed from anyplace around the world with Web access, leading to all sorts of new possibilities and a more disaster tolerant storage solution.

Object characteristics

An object is essentially a package of data and rich metadata, identified by a single object-ID. Further, data elements are normally read or written sequentially, in one continuous access, and may contain any binary information. Equally important, metadata can supply any information about an object and can be easily modified or extended well beyond anything available in today’s file and block storage systems. Thus, application and system designers can define any information needed to help with cataloguing, processing and managing data elements. With such complete versatility, object repositories can be tailored to meet many diverse customer requirements. For example, metadata could be used to identify:

• Lifecycle attributes – in an intelligent archive, objects can be moved to different storage tiers to reduce expense over time as data ages. Lifecycle metadata can be used to identify how aggressively to manage the item or how quickly to move the data down to less expensive tiers of storage.

• Expiration attributes – in a compliance repository, some records may have different expiration dates than other data. Providing an expiration date at the time of creation can guarantee that important records are not modified or deleted until they have properly expired.

• Processing attributes – in a video library, some clips may need further processing at ingest time, e.g., to transcode to other formats. Supplying processing metadata at time of video clip creation can enable the system to quickly convert the segment into required formats before it’s needed.

¹ Please see http://tools.ietf.org/html/rfc5661#section-5.1 for more information.
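The expiration and processing attributes described above can be pictured as entries in a free-form metadata dictionary that travels with each object. The following is only a minimal sketch under stated assumptions: the `StoredObject` class, the attribute names (`expires_on`, `transcode_to`, `transcoded`) and both helper functions are hypothetical and do not correspond to any particular vendor's API.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class StoredObject:
    """An object: opaque data plus free-form, extensible metadata (hypothetical model)."""
    object_id: str
    data: bytes
    metadata: dict = field(default_factory=dict)


def may_delete(obj: StoredObject, today: date) -> bool:
    """Expiration attribute: refuse deletion until the expiration date is reached.

    Objects without an 'expires_on' entry are treated as deletable (an assumption).
    """
    expires_on = obj.metadata.get("expires_on")
    return expires_on is None or today >= expires_on


def pending_transcodes(obj: StoredObject) -> list:
    """Processing attribute: formats requested at ingest that are not yet produced."""
    wanted = obj.metadata.get("transcode_to", [])
    done = obj.metadata.get("transcoded", [])
    return [fmt for fmt in wanted if fmt not in done]


clip = StoredObject(
    object_id="clip-0001",
    data=b"...video bytes...",
    metadata={"expires_on": date(2020, 1, 1), "transcode_to": ["mp4", "webm"]},
)

# Metadata can be extended after creation without touching the data itself.
clip.metadata["transcoded"] = ["mp4"]

print(may_delete(clip, date(2019, 6, 1)))   # False: the record has not expired yet
print(may_delete(clip, date(2020, 1, 2)))   # True
print(pending_transcodes(clip))             # ['webm']
```

The point of the sketch is that new attributes are added by writing a new dictionary key, with no schema change and no rewrite of the stored data.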

Object storage advantages

First and foremost, object-based storage systems can scale with ease. Most of these systems are multi-node clusters, built out of storage, access and management components with a system interconnect between nodes. As such, these systems grow by adding more cluster nodes, scaling from a few TB to multiple PB in the same system environment. Usually storage, access and management components can be added independently of one another, but to obtain adequate performance one may need to add access elements as capacity grows over time.

Some object stores can span multiple sites, creating a geographically dispersed storage system. In this case, there is a storage cluster at each location, which participates in the fully distributed storage system consisting of all sites. With such storage, data is commonly retrievable from one or more sites or from multiple nodes at a single location and, as such, is more fault tolerant.

In addition, most object stores support REST (REpresentational State Transfer) interfaces. Such protocols underlie today’s World Wide Web and are in action every day when we browse the Internet. These access conventions are generally considered more loosely coupled than traditional storage interfaces and, as a result, are easier to extend. This allows metadata to be easily added to object data and permits access to data elements from anywhere with a link to the Internet.

Another benefit of RESTful interfaces is that they are simpler to map to other protocols, e.g., using a file system gateway to access an object store. In this fashion, data elements can be read or written by more standard IT applications that currently employ file or block storage. Object repositories front-ended by file gateways like this may sacrifice some advantages, such as extensible metadata, but give standard applications and current end user computing environments access to data elements.
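To make the REST-style access described above more concrete, the sketch below builds (but does not send) plain HTTP requests using Python's standard `urllib.request` module. The endpoint URL, the `X-Meta-` header prefix and the helper function names are assumptions chosen for illustration; real object stores each define their own URL layout and metadata header conventions.

```python
import urllib.request

BASE_URL = "https://objectstore.example.com/vault1"   # hypothetical endpoint


def build_put(data: bytes, metadata: dict) -> urllib.request.Request:
    """Build an HTTP PUT that would create an object.

    Metadata travels as request headers; the 'X-Meta-' prefix is an
    illustrative convention, not a documented one.
    """
    headers = {f"X-Meta-{key}": str(value) for key, value in metadata.items()}
    headers["Content-Type"] = "application/octet-stream"
    return urllib.request.Request(BASE_URL, data=data, headers=headers, method="PUT")


def build_get(object_id: str) -> urllib.request.Request:
    """Build an HTTP GET that would retrieve an object by its identifier."""
    return urllib.request.Request(f"{BASE_URL}/{object_id}", method="GET")


def build_delete(object_id: str) -> urllib.request.Request:
    """Build an HTTP DELETE that would remove an object by its identifier."""
    return urllib.request.Request(f"{BASE_URL}/{object_id}", method="DELETE")


put = build_put(b"hello", {"tier": "archive"})
print(put.get_method(), put.full_url)
```

Passing any of these requests to `urllib.request.urlopen` would send it over the network. Because the interface is ordinary HTTP, the same calls work from any location with Internet access, and adding a new metadata field means adding one more header.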

Object storage use cases

Object stores are ideal for hosting large quantities of data elements in applications such as content storage, content distribution, data archives and cloud storage. Specifically:

• Content storage – media storage solutions can contain millions of media segments that can overburden classic file systems with their number and metadata requirements. However, by using an object store, content repositories can support almost any number of MPEG files and can provide the metadata needed to manage all of them. For example, metadata can be supplied for video data such as speech-to-text translations, facial recognition results, clip abstracts, etc. With an object repository’s extensible metadata, even more information about video fragments can be added to the content storage, making them more searchable and thus more discoverable.

• Content distribution – video distribution centers can hold thousands of videos whose streaming requirements may easily tax the performance of customary file systems. In contrast, object repositories can be implemented across multiple sites, with data residing at many locations to provide quick, regional video streaming. In this way, content distribution can be scaled up to meet whatever video streaming performance the customer environment requires.

• Intelligent data archives – object storage can be used to build data archives that are almost impossible to supply with file systems alone. Most file data passes through a pre-defined access cycle, i.e., data is referenced extensively for the first week to 90 days after creation/modification, and then access rates fall off precipitously. By migrating or archiving this data through a multi-tier object store as it ages, one can reduce costs by using slower storage commensurate with the drop in access intensity.

• Cloud storage – cloud data storage can be hard to support with traditional data center storage systems. As discussed previously, object repositories with RESTful interfaces are inherently WWW enabled and thereby a better cloud-based storage medium. Also, with extensive metadata, cloud data services can be tailored to the needs of the data element rather than the limited capabilities of classic storage systems.
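The intelligent data archive case above amounts to a placement rule driven by how long an object has gone unreferenced. A minimal sketch of such a rule follows. The tier names and the 7-day and 90-day boundaries are assumptions loosely based on the "first week to 90 days" access cycle mentioned in the text, not values from any product.

```python
from datetime import date

# Hypothetical tiers: (maximum days since last access, tier name), fastest first.
TIERS = [(7, "fast-disk"), (90, "capacity-disk")]
ARCHIVE_TIER = "archive"


def choose_tier(last_access: date, today: date) -> str:
    """Pick the cheapest tier consistent with how recently the object was used."""
    idle_days = (today - last_access).days
    for max_idle_days, tier in TIERS:
        if idle_days <= max_idle_days:
            return tier
    return ARCHIVE_TIER


today = date(2012, 9, 1)
print(choose_tier(date(2012, 8, 30), today))  # fast-disk (2 days idle)
print(choose_tier(date(2012, 7, 1), today))   # capacity-disk (62 days idle)
print(choose_tier(date(2012, 1, 1), today))   # archive (244 days idle)
```

In an object store, the last-access date and the current tier could themselves be kept as object metadata, so a background process can apply a rule like this without any external catalogue.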

Cleversafe object-based dispersed storage

Cleversafe’s Dispersed Storage Network (dsNet®) solution is an object storage system that spans multiple nodes or geographically dispersed locations and can be deployed as a cluster of hardware appliances or as a software-only solution. Because of these flexible deployment options, customers can elect to implement their dsNet store on currently owned hardware or purchase a complete, integrated and tested storage solution from Cleversafe.

With either approach, Cleversafe functionality is partitioned across the following components:

• dsNet Manager – one of these instances is required to configure, upgrade and monitor the object repository.

• Accesser® – two or more of these instances are required for each Cleversafe storage site; they provide access to the stored data elements for multiple clients.




• Slicestor® – multiple instances of these components are required for each Cleversafe location; they provide the actual storage for all data elements.

As discussed previously, Cleversafe’s dispersed storage system is built around an information dispersal algorithm that slices up objects and distributes data to multiple storage nodes or locations. The advantages of such an approach include:

• Cost effective data protection – with dispersed storage, a mathematically deduced, minimal amount of check or parity information is added to each slice of data to support fault tolerance for location outages. To be this highly available with conventional storage would require whole replications of the data at multiple sites, significantly increasing storage capacity and thus, system costs.

• Configurable levels of data protection – with the data protection described above, dsNet data availability levels can be configured to support whatever fault tolerance is required for one’s object store, based on site layouts, network connectivity and storage configuration. Cleversafe data protection can be varied to support 1, 2 or even N site failures, all with a single parameter change. Naturally, this may require more parity, but the system automatically takes care of computing and storing the revised check information for all data elements.

• Inherent levels of data security – with dsNet information dispersal, no one location has all of an object’s data, as slices are scattered across multiple nodes or sites. In this way, even if someone could read all the information at one node, all they would get is pieces of data and parity information, with no way of knowing which bits go with which objects. Thus, dispersed storage is inherently more secure than more common object stores that keep all of an object’s data in consecutive locations within a node.
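The idea behind the three advantages above can be illustrated with the simplest possible dispersal scheme: split an object into k data slices and add one XOR parity slice, so that any single lost slice can be rebuilt from the survivors. This is a deliberately simplified stand-in, not Cleversafe's information dispersal algorithm, which the text says can be configured to survive multiple failures. All function names are hypothetical, and the widths and thresholds passed to `expansion_factor` are made-up examples.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def disperse(data: bytes, k: int) -> list:
    """Split data into k equal slices and append one XOR parity slice."""
    slice_len = -(-len(data) // k)                     # ceiling division
    padded = data.ljust(slice_len * k, b"\0")
    slices = [padded[i * slice_len:(i + 1) * slice_len] for i in range(k)]
    parity = reduce(xor_bytes, slices)
    return slices + [parity]


def rebuild(slices: list, original_len: int) -> bytes:
    """Reassemble the data when at most one slice (marked None) is missing."""
    missing = [i for i, s in enumerate(slices) if s is None]
    if len(missing) > 1:
        raise ValueError("single-parity sketch tolerates only one lost slice")
    if missing:
        survivors = [s for s in slices if s is not None]
        slices = list(slices)
        # XOR of all surviving slices equals the one that was lost.
        slices[missing[0]] = reduce(xor_bytes, survivors)
    return b"".join(slices[:-1])[:original_len]


def expansion_factor(width: int, threshold: int) -> float:
    """Raw capacity used per byte stored when any `threshold` of `width` slices suffice."""
    return width / threshold


message = b"object data spread over four sites"
stored = disperse(message, k=3)          # 3 data slices + 1 parity = 4 locations
stored[1] = None                         # simulate one site outage
assert rebuild(stored, len(message)) == message

print(expansion_factor(4, 3))            # about 1.33x raw capacity
print(expansion_factor(16, 10))          # 1.6x, tolerating 6 lost slices in principle
```

For comparison, keeping three full replicas at three sites costs 3.0x raw capacity, which is the cost argument made in the first bullet. Each location holds only a fragment of the object, which is the security argument made in the third.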

Moreover, Cleversafe storage is readily scalable and easily supports billions of data elements. In fact, Shutterfly, a Cleversafe customer, started out with a 217TB store and quickly scaled it to multiple PB, storing over 15 billion objects today.²

² Please see http://www.cleversafe.com/images/pdf/shutterfly-cleversafe-case-study-07142012.pdf for more information.

Cleversafe also can use a RESTful interface to access its object store, along with a defined software-oriented API. For the REST access protocol, HTTP-oriented PUT, GET, DELETE and LIST commands are used to create, retrieve, delete and identify data elements within the dsNet storage repository. At data element creation, the application issuing the PUT request receives an object-ID, which uniquely identifies its data and metadata within the repository. Any application using the storage repository is responsible for remembering the object-ID returned by Cleversafe.

Furthermore, Cleversafe storage solutions provide extensive integrity checking to ensure that objects are readily accessible and always correct. This integrity verification operates continuously, validating that data in the object repository remain accessible as stored. The same facilities are used at retrieval time to ensure that current and correct data is always read.

In addition to the inherent security provided by information dispersal, Cleversafe also offers SecureSlice™ keyless encryption technology. With SecureSlice, an object’s data is encrypted and cryptographically signed before being sliced and written to Slicestor(s). Thus, during read back, data can only be decrypted after a predefined threshold of slices has been retrieved, making it impossible for individual portions of data to be read without the whole threshold being present.

While Cleversafe provides a very capable, standalone object store, it has partnered with several 3rd party solutions to supply unique, vertical/industry-specific data services over the dsNet storage repository. For instance:

• iRODS™ (integrated Rule Oriented Data System) is an open source solution that can integrate with Cleversafe storage to supply automated policy management over data elements. The iRODS data grid application is widely deployed in data intensive research and high performance computing environments throughout the world. This application provides easy scalability, automated management and shareability for large collections of scientific data used by researchers located across the globe.

• QStar Archive Manager is data archiving software that creates a gateway supporting NFS and CIFS/SMB data center protocol access to Cleversafe’s object store. As such, the QStar archive is presented as a network mountable file share that provides automated storage tiering across high-speed disk and the backend dsNet storage as a function of data access frequency or age within the system. This data archive was designed to support vast quantities of data and easy scalability from TB to PB without system disruption.

• Mezeo Cloud Storage is an enterprise class, cloud based file sync solution. The combined Mezeo and Cleversafe solution provides secure, highly available data center file synchronization using cloud storage that enables easier collaboration and intrinsic data protection for enterprise files. Further, as a cloud based storage system, data in the Mezeo and Cleversafe solution can be accessed securely from any Internet enabled location.
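Returning to the keyless encryption idea described earlier for SecureSlice: one well-known way to make data unreadable until enough of it has been gathered, without managing any external key, is an all-or-nothing transform. The source does not describe SecureSlice's internals, so the sketch below should be read only as an illustration of the threshold idea under stated assumptions. The function names are hypothetical, and the SHA-256 counter-mode keystream is a toy stand-in for a real cipher, not something to use in production.

```python
import hashlib
import secrets


def _keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 in counter mode (stand-in for a real cipher)."""
    blocks = (length + 31) // 32
    stream = b"".join(
        hashlib.sha256(key + i.to_bytes(8, "big")).digest() for i in range(blocks)
    )
    return stream[:length]


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def package(data: bytes) -> bytes:
    """Encrypt with a random key, then hide that key behind a hash of the ciphertext."""
    key = secrets.token_bytes(32)
    ciphertext = _xor(data, _keystream(key, len(data)))
    masked_key = _xor(key, hashlib.sha256(ciphertext).digest())
    return ciphertext + masked_key


def unpackage(blob: bytes) -> bytes:
    """Recover the data; the key can only be unmasked with the complete ciphertext."""
    ciphertext, masked_key = blob[:-32], blob[-32:]
    key = _xor(masked_key, hashlib.sha256(ciphertext).digest())
    return _xor(ciphertext, _keystream(key, len(ciphertext)))


message = b"records that must stay private"
blob = package(message)
assert unpackage(blob) == message

# Altering (or lacking) any part of the ciphertext yields the wrong key and garbage.
damaged = bytes([blob[0] ^ 1]) + blob[1:]
assert unpackage(damaged) != message
```

If a package like this were then cut into slices and dispersed, a reader holding fewer slices than needed to reassemble it could not recover the random key, and no key would ever have to be stored or managed separately.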




Summary

In short, Cleversafe dispersed storage implements a highly resilient object storage solution that goes well beyond traditional IT storage systems. Cleversafe has proven dispersed storage’s high capacity scalability and support for billions of data elements. Just as important, configurable data protection, flexible security and extensible metadata are inherent features of the Cleversafe dsNet system. Furthermore, 3rd party applications exist that enhance Cleversafe storage capabilities to support high performance/scientific research data grids, vast data archives and immense cloud storage systems. Given all this, Cleversafe’s object storage and its application ecosystem provide a compelling set of advanced functionality that supports the large data collections needed by many new and emerging data center solutions.

Silverton Consulting, Inc. is a Storage, Strategy & Systems consulting services company, based in the USA offering products and services to the data storage community.
