
89 Fifth Avenue, 7th Floor

New York, NY 10003

www.TheEdison.com

212.367.7400

White Paper

Dell EMC Elastic Cloud Storage and

Red Hat Ceph:

Contrasts in Object Storage


This report was developed by Edison Group, Inc. with Dell EMC assistance and

funding. This report may utilize information, including publicly available data,

provided by various companies and sources, including Dell EMC. The opinions are

those of Edison Group, Inc. and do not necessarily represent Dell EMC's position.

Printed in the United States of America.

Copyright 2016 Edison Group, Inc. New York. Edison Group offers no warranty either

expressed or implied on the information contained herein and shall be held harmless for

errors resulting from its use.

All products are trademarks of their respective owners.

First Publication: November 2016

Produced by: Barry Cohen, Chief Analyst and Editor-in-Chief; Manny Frishberg, Editor


Table of Contents

Introduction

Comparing EMC ECS and Red Hat Ceph Storage

About the Platforms

Ceph - Open Source Object Storage from Red Hat

Elastic Cloud Storage - 3rd Generation Object Storage from Dell EMC

Platform Use Comparisons

Bringing It Up - System Planning, Installation and Provisioning

How to Use It - Management and Access Protocols

Planning for Trouble - Data Integrity, Replication and Disaster Recovery

Keeping It Useful - Support and Upgrades

Conclusion


Introduction

In the last half century, data storage has evolved from raw data blocks through structured files.

Conventional file systems can only scale so far and even the best clustered file systems have size

and performance limits.

With the explosion of non-structured and imagery data, a new need has arisen to represent and

store data as uniquely addressed objects, often associated with relevant metadata - data about

the data itself. By dispensing with centralized directories and file hierarchies, object storage

systems can scale almost without bound and, with appropriate design, run reliably on

commodity hardware platforms.

Object storage has been strongly popularized by the public cloud services that have emerged,

especially Amazon Web Services and their keystone S3 object storage. While many have

embraced the new cloud data storage models, many others want similar functionality on their

own premises - for reasons of cost, security or intent to compete with cloud providers.

There are many choices of object storage available today, open source and commercial. Many

organizations are specifically curious about choosing open source versus commercial storage

products in today’s world of Software Defined Storage. In this paper, open source Ceph —

packaged and supported by Red Hat — will be compared with Elastic Cloud Storage (ECS)

from Dell EMC.


Comparing EMC ECS and Red Hat Ceph Storage

About the Platforms

Ceph - Open Source Object Storage from Red Hat

Ceph is an open source storage system providing block, object and file interfaces. It was devised

by Sage Weil for his doctoral thesis at the University of California, Santa Cruz. Ceph was designed

as a scalable clustered file system with object-oriented underpinnings, akin to the Lustre file

system used for High Performance Computing.

Ceph was released as open source to serve as a reference implementation and research

platform. Open source is an intellectual model where the copyright holder releases software

source code with rights to study, change and distribute to anyone for any purpose. The intent is

that collaborative development from multiple independent sources will bring an

increasingly diverse range of design perspectives.

After Weil’s graduation in 2007, work continued on Ceph, resulting in the creation of Inktank

Software in 2012 to provide professional services and support. Red Hat Software acquired

Inktank in 2014, combining Ceph development and support with previously acquired

GlusterFS.

There have been ten major releases of open source Ceph since 2012, some officially “stable” and

some less so, most recently Jewel in April 2016. Red Hat has taken this code base and produced

two major supported versions of Red Hat Ceph Storage, last updated in June 2016.

Ceph is built on top of a Reliable Autonomic Distributed Object Store (RADOS) which abstracts

individual object, metadata and monitor services and servers from higher level services.

Individual data items are physically located on object storage servers, each running many

instances of the Ceph object storage daemon (OSD), usually one per disk drive. Metadata

servers implement file system abstractions like inodes and directories. Monitor servers and

services keep the cluster operational by tracking active and failed cluster nodes.

Ceph remains heavily in flux, with over 1,100 open bugs and 845 open feature issues as of this

writing. Ceph was originally intended to have a strong dependence on the underlying Linux

btrfs filesystem but fundamental design and reliability issues have limited production to the

well-proven Linux XFS filesystem. The OSD backend is currently being rewritten as there’s a

perceived mismatch between what Ceph needs and what POSIX file systems provide.


Elastic Cloud Storage - 3rd Generation Object Storage from Dell EMC

The first commercial object storage system was FilePool, developed by a Belgian startup in the

late 1990s for their internal use, and acquired, productized and generalized by EMC as Centera

starting in 2001. Centera pioneered commercial large scale object storage for the largest of

enterprises including such applications as image storage and archiving for regulatory

compliance.

EMC released their successor object storage, Atmos, in 2008. Atmos was designed as a low-cost

bulk storage system, measured in petabytes, for emerging markets such as Web 2.0 companies and

other industries with lots of user-generated content. Atmos pioneered policy-based

management for object storage at EMC, also adding multi-tenancy, a unified namespace, a

single management console and physical distribution. Eventually over 1.5 exabytes of Atmos

storage were sold.

EMC introduced Elastic Cloud Storage (ECS) in 2014 as their third-generation object storage

solution, building on knowledge gained from Centera and Atmos. The third major ECS

software release, 3.0, was released in October 2016.

ECS design was strongly influenced by the cloud services that emerged after older object

systems were deployed. EMC’s stated design goal is to deliver “better than AWS S3” services -

more protocols and faster, better geo distribution, better global namespace, lower cost to serve,

support for structured, semi-structured and unstructured data, simple management and

elaborate monitoring.

ECS is designed around a pool of server nodes, each server node providing the entire

functionality and all data being available from all nodes. Data is written across local nodes

while also being asynchronously replicated to remote ones to ensure data protection while

minimizing WAN usage.


Platform Use Comparisons

Bringing It Up - System Planning, Installation and Provisioning

There are two choices when deciding to deploy Ceph - building a distribution from open source

repositories or licensing an existing supported distribution, usually via Red Hat subscription.

Given the complexity and dependencies of Ceph, most commercial deployments tend to license

from Red Hat. Red Hat acknowledges high complexity but views it as an acceptable tradeoff for

the flexibility that deep required knowledge brings.

Ceph can be deployed on bare metal, containers or virtual machines but most deployments are

bare metal. There are configuration rules of thumb available on the Internet but it’s challenging

to find ones that are up to date with current software and hardware. Hardware configurations

must be selected: Red Hat has supported configurations for their subscription Ceph Storage.

ECS is available as software for deployment on supported commodity hardware, a packaged

appliance — hardware and software — and as a hosted solution from Dell EMC’s Virtustream.

As a commercial product with a large dedicated development team, ECS is intended to be a

black box for its users, performing without requiring knowledge of inner workings.

ECS can be deployed as a single or multi-site configuration. In a multi-site replicated

environment, different hardware platforms can be used in each replicated site. For instance,

commodity hardware can be used at one site and an ECS Appliance at a replicated site. Once

hardware is selected and configured, ECS software functions in the same fashion as if it were

running on an ECS appliance. The ECS software is deployed fully containerized on the

host Linux operating system.

Takeaway

Both Ceph and ECS have considerable flexibility in deployment. Ceph has both open source

and supported versions but only the latter is suggested due to the complexity and often the

obscurity of the necessary knowledge. ECS, designed as a turnkey product, requires little initial

knowledge beyond an understanding of desired workloads and the available facilities to locate

them.


How to Use It - Management and Access Protocols

Ceph is configured and managed primarily from the Linux command line and requires

significant knowledge of the components that comprise Ceph. There is a relatively new open

source project called Calamari which provides a web-based UI for Ceph management. Calamari

lacks deployment functionality and uses a REST API distinct from Ceph itself. Another

community project is Virtual Storage Manager (VSM), backed by Intel, which provides another

alternative UI for managing and monitoring Ceph.

Ceph block storage services are available via RADOS Block Device (RBD). RBD allows users to

thinly provision (i.e., blocks are actually allocated only on use) and mount block storage. Block

device images are stored as objects via RADOS which stripes them across the cluster for

increased performance. RBD is supported as block and image storage by several open source

virtualization platforms, most notably OpenStack - which is the most common current use for

Ceph. It is a high latency service, though, and not appropriate for IOPS heavy workloads.
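
The thin provisioning and striping described above can be sketched in a few lines of Python. This is an illustrative model only, not the Ceph or librbd API: the class name ThinImage, its methods and the 4 MiB object size are assumptions made for the example. It shows a block image whose backing objects come into existence only when a block inside them is first written, and a write that crosses an object boundary being split across two objects.

```python
# Illustrative model only; names and sizes are assumptions, not Ceph's API.
OBJECT_SIZE = 4 * 1024 * 1024  # assumed size of each backing object, in bytes


class ThinImage:
    """A block image whose backing objects are allocated only on first write."""

    def __init__(self, size_bytes, object_size=OBJECT_SIZE):
        self.size_bytes = size_bytes
        self.object_size = object_size
        self.objects = {}  # object index -> bytearray, created lazily

    def _check(self, offset, length):
        if offset < 0 or length < 0 or offset + length > self.size_bytes:
            raise ValueError("access outside image bounds")

    def write(self, offset, data):
        self._check(offset, len(data))
        pos = 0
        while pos < len(data):
            # Map the image offset to (object index, offset inside that object).
            index, inner = divmod(offset + pos, self.object_size)
            chunk = min(self.object_size - inner, len(data) - pos)
            obj = self.objects.setdefault(index, bytearray(self.object_size))
            obj[inner:inner + chunk] = data[pos:pos + chunk]
            pos += chunk

    def read(self, offset, length):
        self._check(offset, length)
        out = bytearray()
        pos = 0
        while pos < length:
            index, inner = divmod(offset + pos, self.object_size)
            chunk = min(self.object_size - inner, length - pos)
            obj = self.objects.get(index)
            # Unallocated regions read back as zeros.
            out += obj[inner:inner + chunk] if obj is not None else bytes(chunk)
            pos += chunk
        return bytes(out)

    def allocated_bytes(self):
        return len(self.objects) * self.object_size
```

In this model a freshly created 1 GiB image consumes no backing storage at all; a five-byte write that straddles the first object boundary allocates exactly two objects.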

File storage is provided via CephFS which allows file volumes to be mounted with either a

Linux native or FUSE client. It has recently been declared “stable” but is only recommended for

adventurous early adopters. CephFS currently lacks robust check and repair functionality, and

erasure coding, snapshots and multiple metadata servers aren’t yet supported.

Most importantly for this document, user object storage — different from Ceph’s object backend

— is implemented by RADOS Gateway (RGW). Large subsets of the S3 and OpenStack Swift

RESTful APIs are supported by RGW. Freshly added is support for Swift bulk delete and object

expiration but object versioning remains missing. These interfaces also have their own user

management separate from base Ceph. RGW can store data in the same Ceph Storage Cluster

used to store block or file data but S3 and Swift APIs can only share data between each other.

ECS is managed entirely from a Web-based UI that controls storage provisioning, monitoring

and reporting globally across the ECS namespace. Automation, licensing and authentication

features are also available via the ECS UI, along with a self-service interface allowing delegation

of storage configuration to end users.

ECS supports object storage via the popular S3 and Swift RESTful APIs as well as EMC’s legacy

Centera and Atmos APIs. Supersets of Swift and S3 APIs are implemented allowing atomic

appends and byte-range updates.

File storage is supported using the Hadoop distributed file system (HDFS) and NFSv3. Files

written via NFS can be transparently accessed as objects via S3 or Swift and vice versa.


Takeaway

Both Ceph and ECS offer a variety of access protocols. Ceph can be useful if the primary

workload is non-transactional block storage for OpenStack with object as a secondary limited

access mode. ECS was designed to focus on unstructured storage use cases supporting a wider

range of object and file protocols with fuller implementations of Swift and S3 APIs. The ability of ECS

to transparently access data across protocols is notably useful in easing data access across

organizations.

Planning for Trouble - Data Integrity, Replication and Disaster Recovery

Storage systems require methods to protect data and recover in case of drive or system failure.

There are three main methods of data protection:

1. Replication - making extra copies of data, local or remote

2. RAID - fewer copies for lower storage overhead at the cost of reconstruction time

3. Erasure Coding

Erasure coding (EC) is a method of data protection in which data is broken into fragments,

encoded and then stored in a distributed manner. The storage overhead, and the compute overhead

required to encode the fragments, can be tuned to the desired storage and compute footprint.
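
As a rough illustration of the storage side of this tradeoff, the short Python sketch below compares the raw capacity consumed per byte of user data under full replication and under erasure coding. The function names and the 12 + 4 fragment layout are assumptions chosen for the example, not figures taken from either product.

```python
def replication_overhead(copies):
    """Raw bytes stored per byte of user data when keeping full copies."""
    if copies < 1:
        raise ValueError("at least one copy is required")
    return float(copies)


def erasure_coding_overhead(data_fragments, coding_fragments):
    """Raw bytes stored per byte of user data with k data + m coding fragments."""
    if data_fragments < 1 or coding_fragments < 0:
        raise ValueError("need k >= 1 data fragments and m >= 0 coding fragments")
    return (data_fragments + coding_fragments) / data_fragments


# Hypothetical layouts for comparison.
triple_copy = replication_overhead(3)            # 3 full copies
twelve_plus_four = erasure_coding_overhead(12, 4)  # 12 data + 4 coding fragments
```

Under these assumptions triple replication stores three raw bytes per user byte, while the 12 + 4 layout stores 16/12, or about 1.33, and can lose any four fragments. The price is the encoding work on write and the reconstruction work after a failure.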

Ceph originally supported only mirroring for data integrity, but as of version 0.78, erasure

coding was introduced as an option. Storage pools can now be provisioned either as replicated

or, at drastically lower storage overhead, as erasure coded.

At the base system level, Ceph data replication is synchronous, requiring high speed, low

latency WAN links. Layered on top, RBD and RGW support asynchronous replication of their

specific data via separately configured WAN gateways. Backup Ceph sites will eventually see

all master site updates, but delay between master site operations and backup site replay means

that clients of backup sites will sometimes see old data.
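
The stale-read behavior described above follows directly from asynchronous replay, as this toy Python model suggests. It is a sketch under simplifying assumptions: the Site and AsyncReplicator classes are hypothetical, hold data in memory and stand in for the separately configured gateways; they do not reflect how RBD or RGW replication is actually implemented.

```python
from collections import deque


class Site:
    """A storage site reduced to a key/value map."""

    def __init__(self):
        self.data = {}


class AsyncReplicator:
    """Applies writes to the master at once and to the backup only on replay."""

    def __init__(self, master, backup):
        self.master = master
        self.backup = backup
        self.pending = deque()  # updates acknowledged but not yet replayed

    def write(self, key, value):
        self.master.data[key] = value
        self.pending.append((key, value))

    def replay(self, max_updates=None):
        """Replay queued updates on the backup, oldest first; return the count."""
        applied = 0
        while self.pending and (max_updates is None or applied < max_updates):
            key, value = self.pending.popleft()
            self.backup.data[key] = value
            applied += 1
        return applied
```

Between a write and the next replay, a client reading from the backup site sees the previous value (or none at all), which is the window an application has to tolerate.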

Data is erasure coded by ECS for efficiency with the actual erasure coding being deferred to

enhance performance. Incoming data is initially triple replicated across a local ECS cluster

allowing quick acknowledgement. Once safely on disk, data is migrated to erasure coding in the

background. Small files are containerized allowing all data, regardless of size, to be protected

efficiently. Normal access to data doesn’t require decoding an object and can typically be

satisfied by a single offset read from an individual drive.


ECS data replication works in three different ways depending on the number of sites. ECS

supports a strong consistency model and the latest data is always returned.

1. In the case of one geographical location, data is erasure coded into pieces distributed

across the cluster.

2. For two locations, data is distributed across the local cluster, encrypted and

asynchronously replicated to the remote cluster. The receiving site is responsible for

local data protection via erasure coding.

3. For three or more locations, the replicated data is XORed together to greatly increase

storage efficiency.
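
The XOR technique in the third case rests on a simple property: if a site keeps only the XOR of two chunks replicated to it from two other sites, either chunk can be rebuilt from the XOR and the surviving chunk. The Python sketch below demonstrates that principle only. The chunk contents, the site names and the equal-length requirement are assumptions for illustration, and the actual ECS chunk layout is not described here.

```python
def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("chunks must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))


# Hypothetical equal-sized chunks owned by two different sites.
chunk_from_site_a = b"chunk written at A"
chunk_from_site_b = b"chunk written at B"

# A third site stores one XORed chunk instead of two full replicas.
parity_at_site_c = xor_bytes(chunk_from_site_a, chunk_from_site_b)

# If site A is lost, its chunk is rebuilt from site C's parity and site B's chunk.
recovered_a = xor_bytes(parity_at_site_c, chunk_from_site_b)
```

In this sketch the third site holds one chunk's worth of data where plain replication would hold two, which is where the storage efficiency gain comes from.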

Takeaway

Ceph offers a complex mix of synchronous and asynchronous replication demanding

considerable system planning expertise and real familiarity with current Ceph practice. Ceph's

use of eventual consistency requires that applications account for the possibility of receiving

older data. ECS replication - a combination of local erasure coding with asynchronous

replication - is designed to be plug-and-play with strong consistency. Users specify sites to be

connected and all ECS services simply work across the sum of all sites with tolerance to site

loss.

Keeping It Useful - Support and Upgrades

As with other open source software, Ceph support can come from the greater community for

free or via paid support and professional services from an experienced provider: Red Hat.

Ceph is maintained as a combination of several projects, including the base file system, the

Linux kernel client, management, RBD and RGW. Each has separately tracked project issues,

roadmaps and activities.

The ceph.com web site exists as a project portal, allowing access to documentation, current

source code, mailing list archives and Internet Relay Chat (IRC) for support. Access is also

available to the public issue tracking system, which covers many categories of bugs, fixes and

enhancements.

While use of open source resources is completely without financial cost, there is a significant

expense in time and required expertise to learn the various aspects of the system and of the

community. There is also no guarantee that an issue important to a customer will be considered

as reproducible or even interesting enough for a maintainer to take on.


To lessen this burden, Red Hat sells support subscriptions that include access to tested

combinations of Ceph components, relevant updates and various levels of support. Red Hat’s

highest level support provides one-hour response to critical issues on a 24 x 7 basis with

separate support from the hardware platform provider. Red Hat also sells both training and

consulting services which are highly recommended to ensure customer success.

Dell EMC offers a variety of support options for ECS with the most common option promising

30-minute response time on a 24 x 7 basis. If ECS has been purchased as an appliance, hardware

support is offered on the same terms, with onsite support within 4 hours if necessary. There is also 24 x 7 access to

Dell EMC’s web-based knowledge and customer self-help tools. Dell EMC also provides and

performs installation of new software releases.

Takeaway

Ceph, acquired via supported Red Hat subscription, and ECS both offer enterprise levels of

support. Ceph installations require separate support chains for software and hardware -

compute and networking. Most ECS installations are appliance-based with a single contact for

the entire system, software and all hardware. While an appliance constrains ultimate flexibility,

it greatly reduces the effort customers must spend to keep their services running.


Conclusion

Red Hat Ceph Storage and Dell EMC Elastic Cloud Storage offer very different answers to the

question of which object storage platform to choose.

Ceph is built and sold on the model that flexibility is everything and appliances are an excuse

for vendor lock-in. Ceph is a very complex system, immature and frequently changing. It

requires high levels of competence and specialist knowledge from those who configure and

operate it. Open source is often seen as a panacea, providing high value for zero financial

expenditure. In reality, open source often has greater demands than equivalent commercial

products, requiring considerable direct spending on support, training and professional services.

ECS, by comparison, offers very high functionality packaged as a turnkey scale-out appliance.

While the likes of web companies prefer software only solutions, the enterprise tends to prefer

preconfigured solutions that install and operate with the minimum of administrative resources.

It’s also a complex system, but the complexity is hidden and there is commonly a single point of

support contact.

If an IT organization wishes to move to the new paradigm of object storage, other variables

should be kept to a minimum. Ceph requires the mastery of all of its moving, changing parts

while ECS can be thought of as a black box with defined interfaces. It is strongly suggested that

the latter is preferable for most organizations, whose mission is to solve other problems.