An Overview of On-Premise File and Object Storage Access Protocols
TRANSCRIPT
-
An Overview of On-Premise File and Object
Storage Access Protocols
Dean Hildebrand - Research Staff Member, IBM Research
Bill Owen - Senior Engineer, IBM
v1.2
-
Introduction
File and Object Discussion
NFS
Object Storage Introduction
Swift
S3
File to Object and Object to File
Comparison Discussion
Conclusion
Outline
2
-
Dean Hildebrand - Research Staff Member, IBM Research
Bill Owen - Senior Engineer, IBM
3
-
Attendance Poll
SysAdmin / Storage Architect / Manager, Developers, Students, Researchers
4
-
Software Storage Market Growth
5
-
Accessing Data in On-Premise Storage Systems
6
-
Local vs Shared Storage
7
-
Local Storage
Most common for laptops, desktops, mobile devices, server OS boot disks
Typically formatted with a file system
E.g., Ext4, XFS, NTFS, HFS+, BtrFS, ZFS
Invaluable for managing a single device (or maybe a few with LVM)
Varying levels of availability, durability, scalability, etc., supported
All limited to a single node
E.g., cannot support VM or container migration, support 1000s of applications, etc.
In your research, think about the real benefits of further optimizing local storage
How many pressing problems are left to be solved? Only incremental gains?
Commonly used as a building block in higher-level storage systems
8
-
Shared Storage
[Diagram: clients connect over a network to SSD, fast disk, slow disk, and tape]
Supports any kind of storage device
Supports any type of network and network/file protocol
Supports any kind of client device
Independent scaling of clients
Independent scaling of storage bandwidth and capacity
9
-
Block Shared Storage
Used to dominate, now mostly shrinking...except:
FC continues to have very low latency, and so is finding new life with Flash storage systems
iSCSI still very popular for VMs
[Diagram: servers connect via iSCSI/FC/FCoE (and others) over Fibre Channel/Ethernet to SSD, fast disk, slow disk, and tape; typically deployed in pairs for H/A]
10
-
Parallel and Scale-out File Systems
Scalability (all dimensions)
Performance (all dimensions)
Support general applications and middleware
Make managing billions of files, TB/s of bandwidth, and PBs of data *easy*
[Diagram: HPC and commercial clients access SSD, fast disk, slow disk, and tape over Infiniband/Ethernet via a proprietary file access protocol; scale-out as needed]
11
-
Distributed Access Protocols
[Diagram: clients access a variety of storage systems over Ethernet and Infiniband]
Wide variety of solutions
Vast range of performance and scalability options
Standard and non-standard protocols
12
-
Distributed Access Protocols: Portability and Lock-In
Standard APIs help:
Maximize application portability
Minimize vendor lock-in
Numerous benefits of standard protocols:
Standard protocol clients ship in most OSs
Promote predictability of semantics and application behavior
Minimize changes to applications and system infrastructure when switching to a new storage system (many times due to reasons out of your control)
Applications can move between on-premise and off-premise (cloud) systems
A wider and broader user base makes it easier to find support and also hardens implementations
13
-
Distributed Access Protocols: Standards Are Not A Silver Bullet
For File, while applications use POSIX, they are sensitive to the implementation
No common set of commands guarantees crash consistency [***]
For distributed file systems, it becomes even more complicated
Different crash consistency semantics, cache consistency semantics, locking semantics, security levels/APIs/tools, etc.
For Object, each implementation varies w.r.t. level of eventual consistency, security, versioning, etc.
Even what we consider standards are not actually well defined
E.g., SMB, S3
Examples: CIFS/SMB provides sequential consistency whereas NFS has close-to-open semantics
Versioning is quite different between object protocols
[***] All File Systems Are Not Created Equal: On the Complexity of Crafting Crash-Consistent Applications, OSDI14
14
-
Distributed Access Protocols: One Storage Protocol CANNOT Do It All
There are so many vendors...each claiming they have *solved* data storage (or is it world hunger?)
Vendors sell what they have, not what you need
A storage seller takes what they have and makes it fit for practically any requirement and use case
Leads to many unsatisfied customers soon after deployment
Many protocols have existed: DDN WOS, EMC ATMOS, CDMI, AFS, DFS, RFS, 9P, etc.
Tips:
Attend sessions like this to learn more about reality and not hype :)
Dig into advertised feature support
How many customers use a feature, will the customer talk about it, in what context do they use it, etc.
Validate the system on-premise using realistic workloads (do you know your workloads?)
Remember there is no guarantee for what you haven't tried (x- and y-axis have an upper bound for a reason)
Don't buy H/W first and then expect any storage S/W vendor to support it efficiently
15
-
On-Premise Data Access Protocols: NFS and now Swift, S3
File winners: NFS and SMB are the clear winners
SMB is being discussed in a SNIA tutorial this week, so we'll focus on NFS
Note: HDFS is also dominant for analytics
Object winners: Industry appears to be centralizing around Swift and S3
S3: Amazon + many, many apps/tools
Swift: Open source + API + 3 cloud vendors (or more)
Easily repatriate apps due to cost
16
-
Tutorial Goals
SysAdmin / Storage Architect / Manager:
Understand which protocols are best for which applications
Understand tradeoffs between protocols
Introduction to vendor landscape
Developers:
Be able to determine which file-based applications are good candidates for using an object protocol
Understand how to choose the best protocol for an application (and consequences of choosing the wrong protocol)
Students:
Introduction to NAS and Object history and vendor landscape
Introduction to distributed data access research potential
Understand challenges of on-premise distributed data access
Researchers:
Understand on-premise data center challenges
Introduction to distributed data access research potential
17
-
Introduction
File and Object Discussion
NFS
Object Storage Introduction
Swift
S3
File to Object and Object to File
Comparison Discussion
Conclusion
Outline
18
-
File and Object: Both Can Do Anything
Fantasy
19
-
File and Object: Each has its strengths and weaknesses
vs
Reality
20
-
File and Object: Each has its strengths and weaknesses
vs
Reality
Confusion
21
-
Object vs File Summary

File:
Target most workloads (except HPC)
Medium to high performance
Typically scales to medium scalability
Low to high cost
Limited capability for global distribution
Standard file data access
POSIX + snapshots
Strong(er) consistency

Object Store:
Target cold data (Backup, Read-only, Archive)
Low to medium performance
Typically scales to large capacity
Low cost
Global and ubiquitous/mobile access
Data access through REST APIs
Immutable objects and versioning
Loose/eventual consistency
22
-
Introduction
File and Object Discussion
NFS
Object Storage Introduction
Swift
S3
File to Object and Object to File
Comparison Discussion
Conclusion
Outline
23
-
NFS: A Little History...
NFSv2 in 1983
Synchronous, stable writes...outdated
Finally removed from Fedora
NFSv3 in 1995
Still default on many distros...
NFSv4 in 2003, updated 2015
Default in RHEL... possibly others
NFSv4.1 and pNFS in 2010
Many structural changes and new features
NFSv4.2 practically complete now
Many new features, VM workloads specifically targeted
Now going to try per-feature releases
24
-
Deployment
The beauty is that it is everywhere (even Windows)
Well, mostly...more on that later with object
Most NFS servers are in-kernel or proprietary
But Ganesha is the first open-source user-level NFS server daemon
For the enterprise, scale-out NAS is now a requirement for capacity and availability
New clients and environments emerging:
VMware announces support for NFSv4.1 as a client for storing VMDKs
Amazon announces support for NFSv4.0 in AWS Elastic File System (EFS)
OpenStack Manila is a shared file service with NFS as the initial file protocol
Docker has volume plugins that support NFS
25
-
NFS Caching Semantics
Not POSIX
But a single client with exclusive data access should see POSIX semantics
v2 could not cache data
Sync writes
v3 can cache data, but...
Weak cache consistency
Revalidates on open and periodically (30s in Linux)
Data must be kept in cache until committed by the server (just in case the server fails)
v4 standardizes close-to-open cache consistency
Similar to v3, but guarantees the cache is revalidated on OPEN and flushed at CLOSE
Also checked periodically and at LOCK/LOCKU
Note granularity is typically 1 second
Delegations reduce the number of revalidations required...
26
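As a rough sketch (illustrative only, with hypothetical class names - not how any NFS client is actually structured), close-to-open consistency can be modeled as: revalidate the cache against the server's change attribute on OPEN, buffer writes locally, and flush them at CLOSE so the next OPEN on any client sees them.

```python
# Minimal sketch of close-to-open cache consistency (hypothetical model).
class Server:
    def __init__(self):
        self.data, self.change = b"", 0      # change attribute bumps on each write
    def write(self, data):
        self.data, self.change = data, self.change + 1

class Client:
    def __init__(self, server):
        self.server = server
        self.cache, self.cached_change, self.dirty = None, -1, None
    def open(self):
        # Revalidate on OPEN: drop the cache if the server's file has changed
        if self.server.change != self.cached_change:
            self.cache = self.server.data
            self.cached_change = self.server.change
        return self.cache
    def write(self, data):
        self.cache = self.dirty = data       # buffered locally until CLOSE
    def close(self):
        # Flush at CLOSE so a subsequent OPEN on any client sees the data
        if self.dirty is not None:
            self.server.write(self.dirty)
            self.cached_change = self.server.change
            self.dirty = None

srv = Server()
a, b = Client(srv), Client(srv)
a.open(); a.write(b"v1"); a.close()
seen = b.open()   # b revalidates on OPEN and sees a's flushed write
```

Between a's CLOSE and b's OPEN the data is guaranteed visible; between periodic revalidations within an open, it is not - which is why the slide notes the 1-second granularity.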
-
NFSv3
Collection of protocols (file, mount, lock, status)
Each on its own port
Stateless (mostly)
Locks add state
Server must keep a request cache to prevent duplicate non-idempotent RPCs
UNIX-centric, but seen in Windows too
32-bit numeric uids/gids
UNIX permissions, but Kerberos also possible
Works over UDP, TCP
Needs a priori agreement on character sets
27
-
NFSv4 New Features
Finally standardized almost everything
Custom export tree with pseudo-name space
Mandatory use of a congestion-controlled protocol (TCP)
Delegations
Clients become the server for a file, coordinating multi-threaded access
Less communication and better caching
Also includes callbacks from server to client
Linux only implements RO delegations
Uses a universal character set for file names
Integrated and well-defined locking
Removes need for additional ports and daemons
Share reservations for Windows
Mandatory locks supported
Much easier to support consistency across failures
Security
NFSv4 ACLs (much more full-featured than POSIX ACLs)
Use of named strings instead of 32-bit integers
Lofty goals with new GSS-API, but the essential benefit is that Kerberos is officially supported and easier to configure
Kerberos V5 must always be supported (but not necessarily used)
Compound RPCs
Dream was to reduce the number of messages, but...
Due to state operations and the POSIX API, the number of messages actually increases in some cases
Referrals
Server can refer clients to other servers for a subtree
Migration, load balancing
Increased create rate through asynchronous ops
Better inter-protocol support
OPEN operation allows coordination with CIFS, etc.
28
-
NFSv4 Statefulness Implies Talkative
[Chart: OPEN & CLOSE account for 56% and 43% of operations]
*From Newer Is Sometimes Better: An Evaluation of NFSv4.1, SIGMETRICS15
29
http://doi.acm.org/10.1145/2745844.2745845
-
NFSv3: Statelessness - The state of being immortal
NFSv4: Lease-based state
30
-
But Does Statelessness Really Justify Lack of Innovation?
Are We Frozen In Time?
31
-
*From Newer Is Sometimes Better: An Evaluation of NFSv4.1, SIGMETRICS15
32
http://doi.acm.org/10.1145/2745844.2745845
-
*From Newer Is Sometimes Better: An Evaluation of NFSv4.1, SIGMETRICS15
Reading Small Files
33
http://doi.acm.org/10.1145/2745844.2745845
-
NFSv4.1 New Features
Introduces a session layer
Exactly-once semantics
Vastly simplifies locking
Multipathing via trunking
Utilize more paths: multiple IPs can be identified as the same server
Retry failed requests over other paths
Retention attributes for compliance
Delegations are easier to manage
Recall ANY semantics allow clients to decide which are the best delegations to recall
Re-acquisition without re-open
pNFS
Scalable data access to scale-out storage systems
Improved load balancing
34
-
What is pNFS? Scalable access to YOUR data
Direct and parallel data access
Scale with the underlying storage system
Better load balancing
If NFS can access it, then so can pNFS
Standard file access (part of OS)
Open client, no client licensing issues
Layouts
Metadata
Clients always issue metadata requests to an NFSv4.1 server
Scale-out systems can support multiple metadata servers to the same data
Data
File layout is part of NFSv4.1
Object and Block variants in separate Internet Drafts
Security and Access Control
Control path uses NFSv4.1 security
Data path uses the security of the I/O protocol
[Diagram: pNFS clients accessing GPFS, HDFS, Lustre, ZFS, PanFS, Netapp, dCache]
35
-
What's Coming in NFSv4.2
Sparse File Support
Hole punching to reclaim space
Avoid transferring 0s and unallocated space across the wire on reads
Space Reservation
Ensure application does not run out of space
Server-Side Copy
Finally stop copying data through the client machine
Labeled NFS
Allows (partial) SELinux support for Mandatory Access Control (MAC)
Client can inform server of I/O patterns
Provides an fadvise-like mechanism over a network
Application Data Holes
Allows definition of file format
E.g., initializing a 30G database takes a single over-the-wire operation instead of 30G of traffic
Great for managing virtual disk images
36
-
Other Notable NFS Features
RDMA
Support possible for all versions of NFS, but best with NFSv4.1
Federated File System
Enables file access and namespace traversal across independent file servers
Across organizations or within a single organization
Suite of standards including DNS, NSDB, ADMIN, and file-access (NFS)
Extended Attribute (xattr) support on track to be the first post-NFSv4.2 feature
Existing named attributes did not work well with modern OS xattrs
New NFS xattrs will interoperate with existing OS xattr support
37
-
Ganesha: User Space NFS Server
Ganesha History
Developed by Philippe Deniel (CEA)
Ganesha Features
Efficient caching of data and metadata
Scalable, high performance
Per-file-system namespace (FSAL) modules
Abstraction that allows each file system to perform its own optimizations
Also allows for proxy support and other non-traditional back-ends
User space makes life easier
Security managers (like Kerberos) reside in user space
Can be accessed directly via GSSAPI (no need for rpc_pipefs)
ID mappers (NIS, LDAP) reside in user space
Accessed directly (the daemon is linked with libnfsidmap)
Fewer constraints on memory allocation than in kernel space
Managing huge pieces of memory is easy
Developing in user space is so much easier
Hopefully increased community support
Great open source community that includes IBM, Panasas, DDN, CEA, Red Hat
38
-
Ganesha FSAL: File System Abstraction Layer
Namespace-independent API
Translation layer between the NFS server and the underlying file system
Allows a file system to customize how files and directories are managed
Allows for file system value-add
Handle-based API (lookup, readdir, ...)
Implements namespace-specific authentication mechanisms
Many FSAL modules exist: GPFS, HPSS, Proxy, Lustre, XFS, ZFS, GlusterFS, etc.
39
-
NFS Security
NFSv3 first relied on ONC RPC
AUTH_SYS is trivial to exploit
AUTH_DES is trivial to exploit by someone with a degree in Mathematics
AUTH_KERB is better, but it isn't standard
No written specification to refer to
Like AUTH_SYS and AUTH_DES, there is no integrity or privacy protection
All NFS versions now support RPCSEC_GSS
NFSv4 added:
Mandatory support for Kerberos V5
krb5 (authentication)
krb5i (auth + integrity)
krb5p (auth + integrity + privacy)
Removed external mount protocol
NFSv4 ACLs
40
-
Quick Basics on ACLs (Authorization)
Linux permissions too coarse
Single user too narrow
Group too broad
POSIX ACLs are very basic
Allow multiple users/groups per file/directory
Files/directories inherit ACLs of their parent directory
Use standard userids
NFSv4 ACLs are richer
Close to subsuming Windows ACLs
A principal is a user/group (at an org) defined by text name
4 types of Access Control Entries (ACEs):
ALLOW - Grant access
DENY - Deny access
AUDIT - Log access to any file or directory
ALARM - Generate an alarm on attempt to access any file or directory
Can control inheritance, among other things
Works well with enterprise directory services

Example 1 (POSIX): Give myuser read permission on file1:
$ setfacl -m user:myuser:r file1

Example 2 (NFSv4): Give myuser read permission on file1:
$ nfs4_setfacl -a "A::[email protected]:R" file1
41
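As a small illustrative sketch (not part of the tutorial), the ACE string passed to nfs4_setfacl follows the nfs4_acl format type:flags:principal:permissions; the principal and permission letters used below are example values.

```python
# Minimal sketch: parse an NFSv4 ACE string as accepted by nfs4_setfacl.
# Format (per nfs4_acl(5)): type:flags:principal:permissions
ACE_TYPES = {"A": "ALLOW", "D": "DENY", "U": "AUDIT", "L": "ALARM"}

def parse_ace(ace: str) -> dict:
    ace_type, flags, principal, perms = ace.split(":")
    return {
        "type": ACE_TYPES[ace_type],   # one of the 4 ACE types above
        "flags": set(flags),           # e.g., inheritance flags 'f' (file), 'd' (dir)
        "principal": principal,        # user/group as a text name, e.g., user@domain
        "perms": set(perms),           # e.g., 'R' for read
    }

parsed = parse_ace("A:fd:myuser@example.com:R")
```

Note how the principal is a text name rather than a 32-bit uid, which is what lets NFSv4 ACLs work across enterprise directory services.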
-
So Do I Just Need an NFS Server and I'm in Business?
42
-
Maybe...
but how important are performance, scalability, availability, durability, multi-protocol access, backup, disaster recovery, encryption, compression, cost, ease of management, tiering, archiving, etc.,
to you?
43
-
If so, you need Scale-Out NAS
High Availability
Tricky with NFSv4, since state must be migrated
Failure requires a grace period to recover state
Capacity and Performance Scaling
Much, much more...
Many good options depending on requirements and budget
44
-
Workloads and Benchmarks
Modern NAS systems can support 100k+ IOPS from 1000s of clients
So the range of workloads they currently handle is practically everything
Spec SFS 2008 only represents a very specific metadata-heavy workload
45
-
Current NAS Benchmarks vs New NAS Benchmarks
[Diagram: current benchmarks run applications on physical machines directly against a NAS appliance (GPFS, WAFL, ZFS) over NFS/SMB; new benchmarks run applications inside virtual machines whose hosts access the same NAS appliance]
Meta-data ops: SPECsfs2008: 72%; Virtual setup: < 1%
46
-
VM Workload Changes
Workload Property | Physical NAS clients | Virtual NAS clients
File and directory count | Many files and directories | Few files per VM
Directory tree depth | Deep and non-uniform | Shallow and uniform
File size | Lean towards small files | Multi-gigabyte, but sparse
Meta-data operations | Many | Almost none
I/O synchronization | Async and sync | All writes are sync
In-file randomness | Workload-dependent | Increased randomness
Cross-file randomness | Workload-dependent | Predictable
I/O sizes | Workload-dependent | Increased and decreased
Read-modify-write | Infrequent | More frequent
Think time | Workload-dependent | Increased
47
-
Workloads
Spec SFS 2014 - 4 separate workloads that support any POSIX interface:
number of simultaneous builds
number of video streams that can be captured
number of simultaneous databases
number of virtual desktops
So Spec SFS 2014 is a step forward, but still only represents a very marginal slice of possible workloads
Makes assumptions on architecture and use of features: sparse/allocated files, file size, direct I/O, data ingest rates, etc.
Client now plays a pivotal role in results
NAS systems rarely support a single dedicated workload
Locking?
Doesn't cover day-to-day operations such as copying files, find, grep, etc.
Won't see a big performance difference between NFSv4 and NFSv3
NFSv4 is more than just performance enhancements (pNFS an exception :)
48
-
Summary Comparison of NFSv4 over NFSv3
Benefits:
Single protocol
Coherent locking
Security
NFSv4 ACLs
Enhanced Kerberos support
Eliminate hotspots with pNFS
Ride wave of NFS enhancements
Exactly-once semantics
Asynchronous creates
Close-to-open semantics
Drawbacks:
More work for NFS developers
49
-
Summary
In order for NFS to advance, we need to move to NFSv4
Let's work together to stop implementing new v3 servers
I do love the 90s though...but not everything is worth keeping
Ask your NAS vendor if they have a path from NFSv3 to NFSv4.1
And to NFSv4.2 and beyond
NFS *can* do most anything, but is really good at the following use cases:
easy access to data within a LAN, since laptops and servers have built-in clients
plug-n-play for any file-based application
very good performance without installing extra specialized clients
small to moderate amounts of data
interoperability with SMB
storage for virtualization (and other emerging areas like containers)
NFS continues to struggle with several areas:
Mobile, WAN, HPC, Cost (for H/A), Scalability, Searching for Files and Data
50
-
Introduction
File and Object Discussion
NFS
Object Storage Introduction
Swift
S3
File to Object and Object to File
Comparison Discussion
Conclusion
Outline
51
-
Why Do Clients Need Object Storage?
Significantly Reduced Complexity:
Simplified data scaling through flat namespace
Easy-to-use REST-based data access
Storage automation
User-defined metadata and search capabilities
Highly Scalable with Low Cost:
Software-defined storage flexibility
Leverage low-cost commodity hardware
High-density storage
Handles ever-increasing storage requirements
Global, Secure Multi-tenant Access:
Global data access and distribution
Multi-tenant management and data access
Role-based authentication
Encryption of data in flight and at rest
Supports Emerging Workloads:
Unstructured immutable data store
Social, Mobile, Analytics
52
-
Sample Object Storage Use Cases
Backup and Disaster Recovery:
Private, public, hybrid backup repository
Recover after data loss event (corruption, deletion)
Leverage copy on object storage to recover from a disaster
Archive:
Active archive
Compliant archive
Cold archive
Content storage and distribution:
Big data storage / analytics
Content management repository
Global collaboration and distribution
Cloud Service Provider (CSP):
Non-ephemeral data store for cloud compute
Public cloud storage
Static web content
53
-
How Do Clients Access Object Storage?
Two APIs emerging for on-premise object storage deployments:
OpenStack Swift
Amazon S3
Many products/public clouds support proprietary APIs:
Microsoft Azure, Google Cloud Storage, EMC Atmos, DDN WOS
CDMI is an attempt to standardize, but support is fading
Concepts are similar across all APIs - we will focus on Swift and S3
54
-
Object Storage Introduction
Some questions we'll answer:
1. Object APIs are built using RESTful APIs - what does that mean?
2. What are the commands Object APIs support? Are there extensions?
3. What does an object command look like? How do I know if my client request succeeded?
4. What is object data?
5. What is eventual consistency? Is it the same for every kind of object storage?
6. How do I make my object store secure and protected?
55
-
Just Enough REST
REpresentational State Transfer
Defined:
Resource-based
Stateless
Client-Server
Cacheable
Layered System
In Practice:
Simple interfaces
Resources uniquely identified by URIs
Relationships are identified by URIs
Can access from ANY HTTP client
Note: There is no REST standard. It is an architectural style with plenty of best practices defined. It is typically composed using standards like HTTP, XML, JSON, etc.
56
-
Object Resources
Resource | Swift | S3
Your Data! | Object | Object
Collections of Objects | Container | Bucket
Collections of Containers/Buckets in an Organization Unit (Department, Company, Site) | Account | Service (implicit)
Discoverability - provides listing of configuration information | Info | n/a
Location information - provides URI to access a resource directly from the storage server | Endpoints | n/a
Bucket sub-resources | (features provided with middleware) | acl, policy, lifecycle, version, replication, tagging, cors, website, ...
Object sub-resources | n/a | acl, torrent
57
-
Object Namespace - Super Simple
[Diagram: a Swift Account contains Containers, each holding Objects; an Amazon S3 Service contains Buckets, each holding Objects]
58
-
Object REST Operations
Operation | Description | Idempotent? | Safe?
GET | Return the contents of the resource | Y | Y
HEAD | Return the metadata for a resource | Y | Y
PUT | Create or update the contents of the resource | Y | N
POST | Create, update or delete metadata for the resource, or create a sub-resource | N | N
DELETE | Remove the resource from the system | Y | N
COPY | Copy an object to another location (Swift only) | Y | N
59
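The table above can be sketched in a few lines of Python (illustrative only): safe verbs never change server state, and idempotent verbs can be repeated with the same effect.

```python
# Minimal sketch: classify the object REST verbs from the table above.
SAFE = {"GET", "HEAD"}                         # read-only: no server-side change
IDEMPOTENT = SAFE | {"PUT", "DELETE", "COPY"}  # repeating has the same effect

def classify(op: str) -> dict:
    op = op.upper()
    return {"safe": op in SAFE, "idempotent": op in IDEMPOTENT}

# POST is neither safe nor idempotent (e.g., repeated metadata POSTs
# to a Swift container keep merging new attributes in).
post = classify("POST")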
-
Example Command Format
General Format: command uri[?query-string] headers [data]
Swift URI: http(s)://server:port/api_version/account/container/object
Example:
GET https://192.168.56.101:8080/v1/AUTH_acct/Demo-Container/object1 -H "X-Auth-Token: xxxxxx"
S3 URI: http(s)://server:port/bucket/object
Example:
GET https://192.168.56.101:8080/s3_test_bucket/object1 -H 'Date: Sat, 06 Feb 2016 19:25:22 +0000' -H 'Authorization: AWS s3key:xxxx'
60
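As a small sketch (hypothetical helper functions, not part of either API), the two URI formats above can be built programmatically:

```python
# Minimal sketch: build request URLs following the Swift and S3 path
# formats shown on this slide.
def swift_url(server, port, account, container, obj, api_version="v1"):
    # Swift: http(s)://server:port/api_version/account/container/object
    return f"https://{server}:{port}/{api_version}/{account}/{container}/{obj}"

def s3_url(server, port, bucket, obj):
    # S3 (path-style): http(s)://server:port/bucket/object
    return f"https://{server}:{port}/{bucket}/{obj}"

url = swift_url("192.168.56.101", 8080, "AUTH_acct", "Demo-Container", "object1")
```

Note the structural difference: Swift paths carry an API version and an account segment, while the S3 path is just bucket/object, with identity conveyed entirely in the Authorization header.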
-
And some common response codes...
Description | Code | Client Retry? | Common Examples
Success | 20x | No effect | 200: Success (GET); 201: New resource created (PUT); 202: Accepted (POST); 204: No content (HEAD)
Client Error | 4xx | No, will still fail | 400: Bad Request (incorrectly formatted request, e.g., non-numeric quota specification); 401: Unauthorized (wrong credentials); 403: Forbidden (no access to resource); 404: Not Found (wrong URL); 405: Method Not Allowed (PUT to a resource that doesn't support PUT)
Server Error | 5xx | Yes, in most cases | 500: Internal Server Error (system problem - can be transient); 503: Service Not Available (often due to loading - internal timeout)
S3 Details: http://docs.aws.amazon.com/AmazonS3/latest/API/ErrorResponses.html
Swift Details: http://developer.openstack.org/api-ref-objectstorage-v1.html
61
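The retry column above reduces to a simple rule, sketched here (illustrative only): 4xx means the same request will fail again, while 5xx is often transient and worth retrying.

```python
# Minimal sketch: should a client retry, per the response-code table above?
def should_retry(status: int) -> bool:
    if 200 <= status < 300:
        return False   # success: nothing to retry
    if 400 <= status < 500:
        return False   # client error: resending the same request fails again
    if 500 <= status < 600:
        return True    # server error: often transient, retry (with backoff)
    return False

# e.g., 503 (Service Not Available) is worth retrying; 404 is not
```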
-
Some Simple Example Clients/Libraries
Swift: curl, boto (python library), poster (firefox browser plugin), swiftclient
S3: curl/s3curl, boto, poster, s3sh, s3cmd
Note that client caching is not common in object libraries/clients today
62
-
Some Example Requests - Firefox Poster
Poster: When you want full control - container HEAD request
63
-
Some Example Requests - Firefox Poster
Results of HEAD request
64
-
Some Example Requests: GET
Get a list of containers in a Swift account using the swift command line (hiding all command details):
Using curl:
Note: We will talk about authentication details later
65
-
Some Example Requests: GET
Using curl and formatting output as json or xml:
66
-
Some Example Requests: GET
Get a list of all objects in an S3 bucket using boto:
Output:
67
-
Additional Object API Features
Access Control Lists
Quotas
Versioned Objects
Expiring Objects
Automatic Storage Tiering
Storage Policies (placement, durability, etc.)
Upload Multipart Large Objects
Container Synchronization
Notification Infrastructure
Metadata Search
68
-
Object Storage Metadata
Useful for flexibly organizing data in the flat namespace and enriching data
System Metadata on Objects
Creation time
etag (md5sum of object contents)
User Metadata on Objects (and Accounts and Containers in Swift)
Attribute/value pair passed as a header in a PUT or POST request
Objects: new metadata overwrites all previous metadata for that object
Accounts & containers (Swift only): new metadata is added to existing metadata
Coming Soon: Metadata Search
69
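The two update semantics above can be modeled with plain dicts (an illustrative sketch, not Swift code): object metadata updates replace everything, while account/container updates merge.

```python
# Minimal sketch: the differing user-metadata update semantics in Swift.
def update_object_meta(current: dict, new: dict) -> dict:
    # Objects: a metadata update replaces ALL existing user metadata
    return dict(new)

def update_container_meta(current: dict, new: dict) -> dict:
    # Accounts & containers: new metadata is merged into the existing set
    merged = dict(current)
    merged.update(new)
    return merged

obj = update_object_meta({"X-Object-Meta-A": "1"}, {"X-Object-Meta-B": "2"})
ctr = update_container_meta({"X-Container-Meta-A": "1"}, {"X-Container-Meta-B": "2"})
```

A practical consequence: to add one attribute to an object, a client must first read the existing metadata and resend all of it, or the other attributes are lost.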
-
Object Storage Metadata - An Example
bill_selfie.jpg
System Metadata:
Content-Length: 68351
Content-Type: image/jpeg
Etag: 1f32161a3c3baefb9a548a72daffa7ab
X-Timestamp: 1455144452.21496
X-Object-Meta-Mtime: 1455144440.207139
User (Client) Metadata:
X-Object-Meta-Brightness: 10.5
X-Object-Meta-Latitude: 117.2303
X-Object-Meta-Longitude: 33.03279
X-Object-Meta-Altitude: 2322.16
X-Object-Meta-Aperture: 2.275
X-Object-Meta-Camera-Model: iphone6.0.1.3
Metadata can be as valuable as the data itself!
70
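As noted earlier, the Etag system metadata is the md5sum of the object's contents; a client can recompute it to verify an upload or download (illustrative sketch):

```python
# Minimal sketch: the Etag is the MD5 hex digest of the object body,
# so a client can verify data integrity after a PUT or GET.
import hashlib

def compute_etag(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()

etag = compute_etag(b"hello")
# Compare this against the Etag header returned by the server
```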
-
Eventual Consistency - CAP Theorem
CAP Theorem: Pick Any 2
1. Consistency
2. Availability
3. Partition tolerance
Object Storage systems are typically AP
Consistency is eventual
No standard
Note that a POSIX-based distributed file system would require CP...
71
-
Eventual Consistency: I/O Characteristics
Typically no locking
Object operations are atomic
The entire object must be written successfully to be committed
Reads will always return a consistent object or no object
Range reads supported - not range writes
This is an artifact of HTTP GET/PUT, and derives from the consistency model
Last writer (creator) wins: for concurrent creates of the same object, the one with the latest timestamp wins
Container/Bucket listings may be updated asynchronously
72
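The "last writer wins" rule above can be sketched directly (illustrative model, not Swift's implementation): when replicas disagree after concurrent creates, the version with the latest timestamp becomes the object.

```python
# Minimal sketch: resolve concurrent creates of the same object by timestamp.
def resolve(replicas):
    # Each replica carries the timestamp set when its write was accepted;
    # the version with the latest timestamp wins.
    return max(replicas, key=lambda r: r["x_timestamp"])

winner = resolve([
    {"x_timestamp": 1455144452.21, "body": b"version A"},
    {"x_timestamp": 1455144453.07, "body": b"version B"},  # written last
])
```

Note there is no merge: the earlier write is simply discarded, which is acceptable because whole-object PUTs are atomic.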
-
Eventual Consistency
Consistency is a characteristic of the object store implementation
No standard
Different products, different architectures = different consistency models
When writing an object: container listing will not show the object until container updates are completed
When deleting an object: object may continue to appear in container listing until container updates are completed
When replacing an object: reads may return the existing version until the new version is propagated across the entire system [1]
[1] If the storage backend is strongly consistent (like a parallel file system), the new or updated object is available to all nodes as soon as the write is committed.
73
-
Object Storage Architectures
Community Swift:
Object PUTs 3x replicated
Majority of writes must succeed for a success status
Consistency daemons ensure that failed replicas are eventually written
Reads try each replica sequentially until success
Account & Container listings updated asynchronously
74
-
Object Storage Architectures
Swift with clustered file system storage:
Object storage writes a single replica
The file system is responsible for data replication
Account and Container listings updated asynchronously
Reads always go to a single replica
[Diagram: Object Nodes on top of a Clustered File System]
75
-
Object Security
Production object storage systems typically interface with a dedicated identity service like OpenStack Keystone
Simpler schemes can be used for proof of concept (Swift tempauth)
Authentication: does the user in a request have a valid password or security token?
The authentication service may integrate with an enterprise directory service using LDAP or Microsoft Active Directory
Authorization: does the user in a request have permission to execute that request?
76
-
Authentication/Authorization Example using OpenStack Swift
A client wants to upload an object to a container in project MYACCOUNT:
1. The client sends credentials to the Keystone identity service
2. Keystone verifies credentials, creates a new token and returns it to the client
3. Token contains authorization information:
a. Endpoint catalog (a list of available services)
b. Projects the requesting user is assigned to
c. Role for that project
d. Token expiration time
4. The client sends the upload request (including the token) to the object storage service
5. Object storage service verifies the token with Keystone (or with a cached copy of the token)
6. If the client has a valid role for MYACCOUNT, the upload request is implemented
77
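The check in step 6 can be sketched as follows (illustrative only; the token layout and function names are hypothetical, not the Keystone API): the token must be unexpired and must grant a role on the target project.

```python
# Minimal sketch: authorize a request against a token's project roles
# and expiration time, per steps 3 and 6 above.
import time

def authorize(token: dict, project: str) -> bool:
    if token["expires_at"] <= time.time():
        return False                       # step 3d: token has expired
    roles = token["project_roles"].get(project, [])
    return len(roles) > 0                  # steps 3b/3c: any role on this project?

token = {
    "expires_at": time.time() + 3600,      # valid for another hour
    "project_roles": {"MYACCOUNT": ["member"]},
}
ok = authorize(token, "MYACCOUNT")
```

Because the token carries its own expiration and role data, the object storage service can make this decision from a cached copy without a round trip to Keystone on every request.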
-
Object Security
Secure Data In Flight
SSL can be enabled from client to identity service, and to object storage service
Load balancer can also provide SSL termination
Secure Data at Rest
Data encryption can be provided by object storage software or by the storage backend (or by the client)
Not all data needs to be encrypted - enable encryption on a bucket or container basis
Consider maturity of the encryption implementation
External key manager vs. integral key encoding
78
-
Object Data Protection
Object storage data protection typically implemented with:
3x replication - local or geo-dispersed
Erasure coding - local or geo-dispersed
Either approach can be implemented by object storage software or delegated to the storage backend
How to protect against user error? Or application bugs?
Backups and snapshots still have their place
Snapshot and/or backup critical portions of your data
Easy to select by container, but can also select by metadata values
79
-
Introduction
File and Object Discussion
NFS
Object Storage Introduction
Swift
S3
File to Object and Object to File
Comparison Discussion
Conclusion
Outline
BREAK
80
-
Introduction
File and Object Discussion
NFS
Object Storage Introduction
Swift
S3
File to Object and Object to File
Comparison Discussion
Conclusion
Outline
81
-
OpenStack: Open Source IaaS platform & global collaboration
Mission: Create a ubiquitous open source cloud computing platform that is simple to implement and massively scalable
Scalable
Massive-scale design goals: 1 million physical machines, 60 million VMs, billions of objects stored
Controlled by the OpenStack Foundation
IBM is proud to be a Platinum Sponsor
Open
All code is Apache 2 licensed
Simple
Architecture is modular
Composed of multiple projects around four capabilities: Compute, Network, Storage, Shared services
Mar 2013: 859 contributors, 8,500 members
Oct 2014: 2,556 contributors, 16,000+ members
Exponential growth in ~1 YR
82
-
83
-
History of OpenStack Swift
Date | Release | Description
Aug 2009 | n/a | Swift development started by Rackspace
Jul 2010 | n/a | OpenStack launches with 25 member companies
Oct 2010 | Swift 1.1.0 (Austin) | First OpenStack release includes Swift & Nova
Jun 2012 | Swift 1.6.0 (Essex) | Integration with Keystone
Jun 2014 | Swift 2.0.0 (Icehouse) | Add Swift Storage Policy support
Jan 2016 | Swift 2.6.0 (Liberty) | Current release
Early OpenStack History: http://www.tiki-toki.com/timeline/entry/138134/OpenStack-History/
84
-
History of OpenStack Swift
As of June 2015: Over 300 PB of Swift storage deployed
85
-
Swift API and Semantics
OpenStack Swift is two parts: API specification & middleware description; object storage implementation
Two choices for object storage implementation:
Native Swift - can be extended, but core is Swift
API emulation - can never be 100% compatible; especially difficult to emulate middleware
API & Middleware Links:
http://developer.openstack.org/api-ref-objectstorage-v1.html
http://docs.openstack.org/developer/swift/middleware.html
86
-
High-Level on OpenStack Swift
Load balancer (e.g., HAProxy) to balance requests
Each request stateless
Proxy Nodes (public face): authorize and forward requests to the appropriate storage server(s) using the ring
Storage Nodes (account, container and object): store, serve and manage data and metadata, partitioned based upon the ring
Object mapping and layout:
Objects mapped to partitions by a hash on the fully qualified object name
Partitions mapped to virtual devices using a consistent hashing ring
87
Keystone Authentication Service (public face) Authenticates credentials and provides an access token for future requests. Users can be defined locally or in external LDAP or AD system. Also defines user roles for accounts / projects.
Additional Swift Services Maintain eventual consistency in the distributed object storage environment. Account, container & object updaters, replicators, auditors, reaper.
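The object-to-partition mapping described above can be sketched in a few lines of Python. This is a simplified illustration, not Swift's actual implementation: real Swift also mixes a per-cluster hash prefix/suffix into the MD5, and the ring then maps each partition to several devices for replication.

```python
import hashlib
import struct

PART_POWER = 10  # illustrative: a ring built with 2**10 = 1024 partitions


def object_to_partition(account, container, obj, part_power=PART_POWER):
    """Map a fully qualified object name to a ring partition number."""
    path = "/%s/%s/%s" % (account, container, obj)
    # MD5 the full path, then keep the top `part_power` bits of the
    # first 4 bytes as the partition number
    digest = hashlib.md5(path.encode("utf-8")).digest()
    (top,) = struct.unpack(">I", digest[:4])
    return top >> (32 - part_power)
```

Because the mapping is a pure function of the object name, any proxy node can locate an object's partition without consulting a central directory.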
-
Proxy Server Architecture
88
Processes all user requests
Requests & responses pass through WSGI pipeline (community and custom middleware)
Requests delegated to controller module
Controller forwards requests to account, container or object server
Responses are received by controller & passed to the client
[Diagram: Proxy Server WSGI pipeline feeding account, container and object controllers]
-
Storage Server Architecture
Object Server: reads and writes object files onto storage; pipeline for community or custom middleware; pluggable backend interface
Diskfile controls object layout on the filesystem; the SwiftOnFile diskfile provides file access to object data
Account and Container Servers: manage the listing DB for each account and container; pipeline for community or custom middleware; pluggable backend interface
Backend interface specified but no community implementations; could allow the use of directory listings instead of account and container DBs for the SwiftOnFile layout
[Diagram: Object, Account and Container Servers, each with a WSGI pipeline and pluggable backend; the Object Server's backend is the diskfile]
89
-
Anatomy of an Object Write: Client Gets a Token
90
1. Client sends token request to Keystone with credentials
2. Keystone authenticates credentials using local or external identity server
3. If credentials are OK, Keystone returns token to client
Example:
curl -i \
  -H "Content-Type: application/json" \
  -d @mycreds.json \
  http://localhost:5000/v3/auth/tokens
Clustered File System
Object Node Object Node Object Node
Keystone
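The mycreds.json file passed to curl carries a Keystone v3 password-authentication body; a minimal sketch (user, project and domain names are placeholders):

```json
{
  "auth": {
    "identity": {
      "methods": ["password"],
      "password": {
        "user": {
          "name": "demo",
          "domain": { "name": "Default" },
          "password": "secret"
        }
      }
    },
    "scope": {
      "project": {
        "name": "demo",
        "domain": { "name": "Default" }
      }
    }
  }
}
```

Keystone returns the token in the X-Subject-Token response header.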
-
Anatomy of an Object Write: Client Issues PUT request
91
1. Client sends PUT request to proxy-server with token, object URI and object data
2. Client saves the token for use until token expires
Example:
curl -i -X PUT -H "X-Auth-Token: $TOKEN" \
  http://util5:8080/v1/acct/container/newobject \
  -T vacation.mp4
Clustered File System
Object Node Object Node Object Node
Keystone
-
1. Proxy-server receives request and each middleware in pipeline looks at and optionally acts on the request
2. authtoken and keystoneauth middleware authenticate and authorize the request (against data in memcached if possible)
Anatomy of an Object Write: Proxy Processes PUT request
92
Clustered File System
Object Node Object Node Object Node
Keystone
-
1. Proxy-server adds X-Timestamp header to the request with current system time
2. Uses ring to determine where each replica of the object is to be placed: object-server IP, virtual device, partition, object URI hash
3. Passes PUT request to designated object-server(s)
4. Embedded WSGI server manages reading data a chunk at a time from the client and passing it on to the object-server
Example URI: http://util5:8080/v1/acct/container/newobject
is placed here: 192.167.12.22:$mount/object_fileset/o/z1device42/objects/13540/3bd/d39381ea07419cec19ae196149a943bd/
Anatomy of an Object Write: Proxy Processes PUT request
93
Clustered File System
Object Node Object Node Object Node
Keystone
-
Anatomy of an Object Write: Object Processes PUT request
94
1. Object-server receives PUT request and checks that it satisfies object constraints (valid timestamp, object name length within limits, etc.)
2. Creates a diskfile instance for the new object
3. Diskfile creates a tmp file and begins writing to it
4. Calculates length & md5sum for the new object as the object is written
5. When the object write is complete, writes system metadata to the object as file xattrs
6. Moves data to the location specified by the ring; filename is <timestamp>.data
7. Removes any files older than the new timestamp
Example tmp file location: $mount/object_fileset/o/z1device42/tmp/tmpVkeXj
Example object location: $mount/object_fileset/o/z1device42/objects/13540/3bd/d39381ea07419cec19ae196149a943bd/1442395677.59514.data
Clustered File System
Object Node Object Node Object Node
Keystone
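The write path above follows the classic write-to-tmp-then-rename pattern: data is made durable in a temporary file, then atomically renamed into its final location. A stripped-down sketch under those assumptions (paths and names are illustrative; real Swift also records metadata such as Content-Type in xattrs, which this sketch skips):

```python
import hashlib
import os
import time


def diskfile_put(device_dir, name_hash, data):
    """Write object data to a tmp file, then atomically move it into place."""
    tmp_dir = os.path.join(device_dir, "tmp")
    obj_dir = os.path.join(device_dir, "objects", name_hash)
    os.makedirs(tmp_dir, exist_ok=True)
    os.makedirs(obj_dir, exist_ok=True)

    md5 = hashlib.md5()
    tmp_path = os.path.join(tmp_dir, "tmp%d" % os.getpid())
    with open(tmp_path, "wb") as f:
        f.write(data)            # write and checksum as data arrives
        md5.update(data)
        f.flush()
        os.fsync(f.fileno())     # make the data durable before the rename

    # final name is the request timestamp plus ".data"
    final_path = os.path.join(obj_dir, "%.5f.data" % time.time())
    os.rename(tmp_path, final_path)  # atomic on the same filesystem
    return final_path, md5.hexdigest()
```

The rename guarantees a reader never sees a partially written object: it sees either the old version or the complete new one.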
-
Anatomy of an Object Write: Update Container and Return Status
95
1. Send request to container server to add the new object to the container listing
2. Wait a short time (2 sec) for the container server response
3. If the container update times out, write the update into the async_pending directory (the object-updater is responsible for updating container DBs with async_pending entries)
4. Return status to the proxy server, and on to the client
Example async_pending location: $mount/object_fileset/o/z1device42/async_pending/
Clustered File System
Object Server Object Server Object Server
Keystone
Container Server Container Server Container Server
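The async_pending fallback is a simple pattern: attempt the synchronous update with a short timeout, and on failure persist the update for a background daemon to replay. A sketch of that pattern (the callable and the JSON file format here are illustrative, not Swift's actual on-disk format):

```python
import json
import os
import time


def update_container(send_update, pending_dir, entry, timeout=2.0):
    """Try the container update; on timeout, queue it as async_pending."""
    try:
        send_update(entry, timeout=timeout)  # synchronous attempt
        return "updated"
    except TimeoutError:
        # park the update on disk; the object-updater replays it later
        os.makedirs(pending_dir, exist_ok=True)
        fname = os.path.join(pending_dir, "%.5f.pending" % time.time())
        with open(fname, "w") as f:
            json.dump(entry, f)
        return "async_pending"
```

This is why container listings are only eventually consistent: the object write succeeds even when the listing update is deferred.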
-
Extending Swift - Diskfile Interface
Object server diskfile: on-disk abstraction layer
Deployers can implement their own storage interface
Specialized classes for Manager, Reader & Writer
Example diskfiles: community (default), SwiftOnFile (Red Hat, IBM), swift-ceph, Seagate Kinetic, Isilon, in-memory
SwiftOnFile provides native access to object data through the filesystem interface.
[Diagram: Object Server with WSGI pipeline and pluggable diskfile backend]
96
-
Extending Swift - WSGI Middleware
API? or Implementation?
Web Server Gateway Interface:
Python standard PEP 3333
Chains together modules to process requests
Used by all OpenStack services
Middleware: pluggable modules that can be configured in the request pipeline
Specified in the service configuration file
Each middleware module has a chance to process (or change) the request coming in, and to process (or change) the response on the way out
[Diagram: Proxy Server WSGI pipeline feeding account, container and object controllers]
97
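A minimal middleware of the kind described above: it wraps the next app in the pipeline, can inspect or modify the request on the way in and the response on the way out. Names here are illustrative; real Swift middleware is wired into the pipeline via a paste-deploy style filter_factory entry point named in the config file.

```python
class HeaderTagger:
    """Toy WSGI middleware that stamps a header on every response."""

    def __init__(self, app, tag="demo"):
        self.app = app  # the next app/middleware in the pipeline
        self.tag = tag

    def __call__(self, environ, start_response):
        # the request could be inspected or rewritten here (environ dict)
        def tagged_start(status, headers, exc_info=None):
            headers = list(headers) + [("X-Tagged-By", self.tag)]
            return start_response(status, headers, exc_info)

        # delegate to the rest of the pipeline
        return self.app(environ, tagged_start)


def filter_factory(global_conf, **local_conf):
    """Entry point a paste-deploy pipeline loader would call."""
    def factory(app):
        return HeaderTagger(app, tag=local_conf.get("tag", "demo"))
    return factory
```

Because every module follows this same call convention, middleware can be chained in any order simply by editing the pipeline line in proxy-server.conf.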
-
Proxy Server Middleware
Proxy Server
WSGI Pipeline: mware-1, mware-2, ..., mware-n, proxy-server
Controllers
Operations:
GET PUT POST HEAD DEL
Client
proxy-server.conf
98
-
Extending Swift - WSGI Middleware
API? or Implementation?
authentication & authorization: auth_token, keystoneauth
multi-part upload: slo, dlo
quotas: account-quotas, container-quotas
protocol emulation: swift3, s3token
bulk operations: expand archive on upload
object versioning
container sync
rate limiting
domain remapping
static web & temporary url
profiling & monitoring
your custom middleware
http://docs.openstack.org/developer/swift/middleware.html
[Diagram: Proxy Server WSGI pipeline feeding account, container and object controllers]
99
-
Storage Policies
Used by Object Server only
Allow you to specify:
Durability levels: 1, 2 or 3x replication
Storage backends: cost vs. performance tradeoffs; storage features - encryption, compression
Grouping of storage nodes, including multi-region
Containers are permanently assigned to a policy on creation (default or explicit)
Policies can be deprecated - no new containers assigned
100
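Storage policies are declared in swift.conf, one section per policy index, with each policy backed by its own object ring; a minimal sketch (policy names are illustrative):

```ini
# swift.conf - each policy index N also needs an object-N.ring.gz
[storage-policy:0]
name = standard-3x
default = yes

[storage-policy:1]
name = reduced-2x

[storage-policy:2]
name = old-tier
deprecated = yes  ; existing containers keep working, no new assignments
```

A container picks its policy via the X-Storage-Policy header at creation time, or falls back to the default.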
-
Geo-Distributed Object Clusters: Building an Active-Active Multi-Site Storage Cloud
101
Global Distribution: ingest and access from any data center
Multi-Site Availability: objects replicated across 2 or more sites
Flexible: async or sync replication
-
Geo-Distributed Object Clusters: Architecture Details
Disaster recovery of data center failures - active-active storage cloud
Binds geo-distributed sites into an extended-capacity storage cloud
Leverages Swift replication between sites
Objects are stored in one or more regions depending on:
Required durability - data copies can be 1 to N (typically max of 3)
Required number of supported data center failures
Objects accessible from ANY site If object not local, system retrieves it from remote region
Asynchronous or synchronous replication Research on WAN acceleration technologies
Aspera or TransferSoft are examples
102
Region A
Region C
Region B
Data Center 1
Data Center 2
Data Center 3
-
Swift Authentication: Pluggable Authentication and Authorization
Three common flavors, one choice for production environments
1. Keystone: production-ready identity system
Models users, roles, projects, domains (v3) & groups (v3)
Supports integration with backend LDAP and AD
Authtoken (authentication) and keystoneauth (authorization) middleware
Authentication through separate Keystone API
2. tempauth, aka version 1: super simple; user credentials & project assignment stored in proxy-server.conf
3. swauth: user credentials & project assignment stored in Swift
103
-
Swift Authentication: Role Based Access Control
Two Swift authorization roles today:
1. operator
a. Can create, update and delete containers and objects in projects where the role is assigned
b. Can assign ACLs to control other users' access
c. operator_roles config value (proxy-server.conf) specifies Keystone roles
2. reseller_admin
a. Can operate on any account
b. reseller_admin config value (proxy-server.conf) specifies Keystone roles
Finer access control with Swift Container ACLs
104
-
Swift Additional Features
Quotas on Accounts and Containers - must have the reseller_admin role to set Account quotas
StaticWeb - serve container contents as a static web site
Versioning:
Current version in the current container; older versions in a dedicated container
Implemented in middleware (as of Swift version 2.4)
Static and Dynamic Large Objects - multi-part upload
RateLimit - limit operations on Accounts and Containers
Object Expiration
105
-
Some OpenStack Swift Issues
Community software hard to install & manage
Performance:
Standard Swift daemons scan directory metadata every 30s, decreasing performance of the entire system by increasing CPU and disk utilization
No data caching
Upcoming erasure coding can hurt performance for small objects; slow to rebuild
Inefficient to scale capacity:
Swift must re-balance partitions to add additional storage, creating potential for out-of-space conditions and requiring excessive over-provisioning and data movement
Lack of enterprise features:
Backup/snapshots/encryption
No ILM for tiering or to external storage (tape)
RAS, etc.
106
-
Get Involved! Core Swift community:
Weekly meetings on IRC
Fix bugs, improve tests, improve docs
Single process optimizations, container sharding, improved versioning, encryption, erasure codes
swiftonfile: Unified File and object access Bi-weekly meetings on IRC
swift3: Amazon S3 emulation middleware Bi-weekly meetings on IRC
107
-
Introduction
File and Object Discussion
NFS
Object Storage Introduction
Swift
S3
File to Object and Object to File
Comparison Discussion
Conclusion
Outline
108
-
History of Amazon S3 Storage & API
Date Description
June 2006 Amazon launches Simple Storage Service
2008 Amazon reports over 29 billion objects hosted by S3
2010 S3 API support for versioning, bucket policies, notifications, multi-part upload
2011 S3 API support for server side encryption, multi-object delete & object expiration
2012 S3 API support for CORS & archiving to glacier
2013 Amazon reports over 2 trillion objects hosted by S3
2014 S3 API support for life cycle versioning policies, sigv4, event notification
2015 S3 API support for cross region replication, infrequent access storage class
[Chart: approximate S3 object count (billions)]
109
-
Why Use S3 for On-Premise Storage?
Run the same apps against on-premise and cloud storage
Repatriate S3 cloud data & applications to reduce cost
Rich API and tool set
Swift3 middleware provides an emulation layer in a Swift environment
But
Some APIs may not apply on premise, e.g., torrents, payments
API is controlled by Amazon with no published extension points
On-premise implementations will not be 100% compatible
110
-
S3 models features explicitly
Middleware not required
Each resource/subresource is managed explicitly from the REST API (GET, PUT, DELETE)
But, how do you get changes into the API spec?
111
-
S3 Authentication
S3 requests authenticated using credentials:
Access Key ID (AWSAccessKeyID)
Secret Access Key (AWSSecretKey)
Two signing algorithms today:
AWS Signature V2: Secret Access Key is used to sign the request string using HMAC-SHA1
AWS Signature V4: Secret Access Key is used to create a signing key (valid for 7 days)
Each S3 request passes an authorization header constructed using one of these algorithms
Both are tedious to construct - let your client create the signature for you!
Swift3 middleware today only supports AWS Signature V2.
112
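Signature V2 can be sketched compactly: HMAC-SHA1 the canonical "string to sign" with the secret key and base64-encode the result. This is a simplified illustration - the real canonical string also folds in canonicalized x-amz-* headers and the canonicalized resource - but it shows the shape of the algorithm:

```python
import base64
import hashlib
import hmac


def sign_v2(secret_key, verb, content_md5, content_type, date, resource):
    """Build a (simplified) AWS Signature V2 signature for one request."""
    # canonical string-to-sign: newline-joined request elements
    string_to_sign = "\n".join([verb, content_md5, content_type, date, resource])
    mac = hmac.new(secret_key.encode("utf-8"),
                   string_to_sign.encode("utf-8"),
                   hashlib.sha1)
    return base64.b64encode(mac.digest()).decode("ascii")


# the resulting header sent with each request:
#   Authorization: AWS <AWSAccessKeyID>:<signature>
```

Because the server recomputes the same HMAC from its stored copy of the secret key, the key itself never travels on the wire.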
-
S3 Lifecycle and Bucket Policies
Policy resources to automate management of object storage resources
Lifecycle Policies:
Expire aged objects or object versions - example: automatically delete versions older than 90 days
Transition objects to another storage class - example: move objects from Standard to Glacier after 30 days
Combining policies - example: move from Standard to Standard_IA to Glacier to Expired
Bucket Policies:
Another way to control access to bucket resources
Allow read-only access to an anonymous user
Require MFA for bucket resources
Restrict access to specific client IP addresses
113
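A bucket policy is a JSON document attached to the bucket. The anonymous read-only case above might look like this (bucket name is a placeholder):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PublicReadOnly",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::example-bucket/*"
    }
  ]
}
```

Statements can also carry Condition blocks, which is how MFA or source-IP restrictions are expressed.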
-
S3 Access Control
S3 ACLs manage access to Buckets and Objects
Every Bucket and Object has an ACL subresource
If no ACL is specified on create, a default ACL is used giving the owner full control
ACLs consist of Grants; each Grant has a Grantee and a Permission
Up to 100 grants per ACL
Grantee types:
User: user id, user email
Group: Authenticated Users, All Users, Log Delivery Group
Note that these are the only possible groups
Permissions: READ, WRITE, READ_ACP, WRITE_ACP, FULL_CONTROL
Canned ACLs are predefined ACLs to simplify access control definition
114
-
S3 Access Control - Permissions
Permission | Granted on a Bucket | Granted on an Object
READ | Allows grantee to list objects in the bucket | Allows grantee to read object data and its metadata
WRITE | Allows grantee to create, overwrite, and delete any object in the bucket | Not applicable
READ_ACP | Allows grantee to read the bucket ACL | Allows grantee to read the object ACL
WRITE_ACP | Allows grantee to write the ACL for the bucket | Allows grantee to write the ACL for the object
FULL_CONTROL | Allows grantee READ, WRITE, READ_ACP, and WRITE_ACP permissions on the bucket | Allows grantee READ, READ_ACP, and WRITE_ACP permissions on the object
115
-
S3 Access Control - Example Default ACL
Single grant giving owner full control:
Owner: Owner-Canonical-User-ID (owner-display-name)
Grant: Owner-Canonical-User-ID (display-name) -> FULL_CONTROL
116
-
S3 Access Control - Example ACL
ACL with 2 user grants and 1 group grant:
Owner-canonical-user-ID (display-name) -> FULL_CONTROL
user1-canonical-user-ID (display-name) -> WRITE
http://acs.amazonaws.com/groups/global/AllUsers -> READ
117
-
S3 Access Control - Canned ACLs
Canned ACL | Applies To | Permissions added to ACL
private | Bucket & Object | Owner gets FULL_CONTROL. No one else has access rights (default).
public-read | Bucket & Object | Owner gets FULL_CONTROL. The AllUsers group gets READ access.
public-read-write | Bucket & Object | Owner gets FULL_CONTROL. The AllUsers group gets READ and WRITE access. Granting this on a bucket is generally not recommended.
aws-exec-read | Bucket & Object | Owner gets FULL_CONTROL. Amazon EC2 gets READ access to GET an Amazon Machine Image (AMI) bundle from Amazon S3.
authenticated-read | Bucket & Object | Owner gets FULL_CONTROL. The AuthenticatedUsers group gets READ access.
bucket-owner-read | Object only** | Object owner gets FULL_CONTROL. Bucket owner gets READ access.
bucket-owner-full-control | Object only** | Both the object owner and the bucket owner get FULL_CONTROL over the object.
log-delivery-write | Bucket only | The LogDelivery group gets WRITE and READ_ACP permissions on the bucket.
** If you specify this canned ACL when creating a bucket, Amazon S3 ignores it.
118
-
S3 Access Control - Limitations
Object PUTs reset the object ACL to the default (unless an ACL is specified in the PUT request)
If you give another user WRITE access to a bucket you own, they will be the owner of any objects they create. You will not have READ access to those objects, and won't be able to see metadata like size. You still have WRITE access from the Bucket ACL, so you can delete or overwrite them.
Caution: when granting WRITE access at the bucket level - there is no object-level WRITE access. With Bucket WRITE access, I can create or delete objects that you created.
Caution: be especially careful giving Bucket WRITE access to groups
119
-
S3 Object Versioning
Versioning enabled at the Bucket level
Objects in these buckets have a current object and 0 or more versions
PUT creates a new instance that becomes the current object
GET bucket?versions lists all object versions
GET bucket?versions&prefix=myobject lists all versions of myobject
DELETE inserts a "delete marker" but no objects are removed
DELETE myobject?version=1001 permanently deletes an object version
Undelete by deleting the marker: DELETE myobject?version=9876
GET myobject?version=1001 retrieves an older version
[Diagram: myobject versions id=1001, id=1002, and delete marker id=9876]
120
-
Validating the API
ceph-s3 tests: open source compatibility tests for S3 clones
Approximately 350 tests
Swift3 v1.9 passes approx 75% of tests
https://github.com/ceph/s3-tests
121
-
Comparing Swift and S3 Features
Feature | Swift | S3
Access Control Lists | Container | Container & Object, plus policies
Quotas | Account & Container | No API support
Versioned Objects | Y (limited functionality) | Y
Expiring Objects | Y | Y (with lifecycle policies)
Automatic Storage Tiering | Y (based on storage backend) | Y (with lifecycle policies)
Storage Policies (placement, durability, etc.) | Y | No API support
Upload Multipart Large Objects | SLO & DLO | Y
Container Synchronization | Y | Y (cross region replication)
Notification Infrastructure | Future | SNS, SQS, AWS Lambda (cloud only)
Metadata Search | Future | Future?
122
-
Swift & S3 Summary
Swift | S3
100% open source with active community that is steadily adding features | Closed source implementation (except Swift3)
Deployers and customers can influence API and features | API controlled by a single company
Documented ways to extend with middleware and diskfile changes | No documented extension points
Vendor extensions can address many of the management issues listed on the earlier Swift slide | No documented extension points
Large and growing support community | Limited options to support S3 on-premise deployments
123
-
Swift & S3 Summary
Swift | S3
API and middleware provide the feature set | Well-defined API, with features explicitly modelled
| More complete feature set: ACL and access control model, versioning support, notification service
On-premise deployment allows repatriating apps & data from the cloud | On-premise deployment allows repatriating apps & data from the cloud
Native Swift deployments are 100% compliant; API-only deployments may lack key features, especially middleware | On-premise vendors have different levels of compliance - each says "we support core features", but what are those?
Improving development ecosystem | Rich development ecosystem
124
-
Get Involved with S3 also!
swift3: Amazon S3 emulation middleware
Bi-weekly meetings on IRC
S3 versioning, lifecycle policies, bucket policies
ceph/s3-tests: improve test coverage; fix compliance bugs in Swift3
125
-
Introduction
File and Object Discussion
NFS
Object Storage Introduction
Swift
S3
File to Object and Object to File
Comparison Discussion
Conclusion
Outline
126
-
Object Storage Challenge...
The world is not object today!
(and never will be completely)
Multi-Protocol Access to the Same Dataset Can Provide Value(S3/Swift/NFS/SMB/POSIX/HDFS)
127
-
Using File to Access Objects - Primary Use Cases
1. Transition period: use the file API as a transition to the object API
2. Single management pane: manage file and object within a single system
3. Sharing data globally: create data via the file interface and share globally using the object interface
4. Analysis: many analysis tools are not a good match for object immutability semantics
5. Connecting NAS clients to object storage: home directories, shared storage from Linux clusters, etc.
128
-
File Access to Objects - NAS Gateways and Accessors
...
Swift/S3
Gateway
Accessor
GW and Accessor Use Cases:
Good for browsing files
OK for migration into object store
OK for backup tool
Optional disk cache
Caution:
Can't control users
How are users to know what works well and what doesn't?
Scalability issues
129
-
File Access to Objects - Gateway and Accessor Vendors
Example NAS Gateway Vendors:
Panzura - NAS front-end to cloud; distributed caching and link to off-premise cloud (solution includes disks)
Avere - NAS front-end to cloud
Maldivica - NAS gateway
Nasuni - NAS front-end to cloud
Riverbed - backup of branch offices
Ctera - consolidation of branch offices
Example File Accessor Vendors:
Storage Made Easy - sync-and-share; direct integration with Windows Explorer, Mac Finder; Swift-only mobile access app
Cloudberry - Windows-only object access; separate application; supports all clouds; has backup apps as well
Cyberduck/Swift Explorer - separate app for Mac, Windows, Linux with support for Swift, S3, etc.; open source
Expandrive - virtual USB drive that allows Dropbox-like access to most cloud providers
130
-
File and Object Access - Integrated Solutions
Several solutions exist that offer file and object in a single solution
Object solutions with integrated NAS gateway:
Object storage solution that directly integrates a NAS gateway
Same advantages and disadvantages as with NAS gateways
This is offered by almost every object storage vendor
Full integration of file and object support:
NAS support is just as good as a native NAS storage solution
Object support is just as good as a native object storage solution
This can include separate or the same datasets
Examples include IBM Spectrum Scale (GPFS) and Red Hat GlusterFS
131
-
File and Object Access to the Same Data - What Should It Look Like?
Research challenge: the dream of full simultaneous access
How to achieve a unified user namespace? Possible to achieve behavior similar to NFSv4+SMB3?
Should File see file semantics, and Object see object semantics? For workflows, this works quite well
e.g., ingest through file, read through object
e.g., ingest through object, analyze and update, read results through object
It's all semantics: eventual semantics vs. file semantics
Objects are allowed to just disappear... how would File deal with that?
Buckets/containers are supposed to scale without limit... but directories typically do not
Objects do not respect locks, but how does this fit with file?
Should object protocols wait on a lock? How would Object deal with the delay?
How in sync do the namespaces need to be? Across sites, maintaining strong file semantics is a challenge
Separate security, e.g., ACLs, authentication servers, interpreting LDAP/AD users
Do we need a new set of semantics?
132
-
A Way Forward: Swift-On-File
An OpenStack Swift per-bucket/container storage policy
Stores objects on any cluster/parallel file system
Objects created using the Object API can be accessed as files, and vice-versa
Newly created files are immediately accessible via Swift/S3
Newly created objects are immediately available for editing
Challenges it overcomes:
Hardens object visibility semantics to ensure read-after-write
Object namespace is eventually consistent; object data is strongly consistent
Common LDAP/AD user database for both file and object
Maintains file attributes on new object PUT
Currently working on further integrating ACLs, metadata and xattrs, etc.
Leverages file system data protection
Part of IBM Spectrum Scale 4.2 and experimental with Red Hat GlusterFS
Swift code available at https://github.com/openstack/swiftonfile
133
-
Co-Existence of Traditional and Swift-On-File
Object Ring 1
Object Ring 2
Proxy Tier
Traditional Swift storage policy
Swift-on-File storage policy
Object storage path:
-rwxr-xr-x 1 swift swift 29 Aug 22 09:25 /mnt/sdb1/2/node/sdb2/objects/981/f79/d7b843bf79/1401254393.89313.data
File System storage path:
-rwxr-xr-x 1 swift swift 29 Aug 22 09:25 /mnt/fs/container/object1
Swift/S3 user
Spectrum Scale
134
-
File in Object:
http://swift.example.com/v1/acct/cont/obj
Object in File:
/mnt/fs/acct/cont/obj
135
-
Analytics for File and Object
Analytics on File is well established
Is Object storage storing Big Data or Dead Data? If data cannot be analyzed, might as well use tape - tape is still much cheaper
Running directly through the Swift/S3 API limits functionality:
Hive and HBase (among others) lack efficient support due to the file append requirement
Plus many more...
Load imbalance due to inefficient data distribution
Large data movement on name changes
HTTP slower than RPC
Multiple network hops when writing data
Loss of data locality
136
-
Analytic Possibilities on Object Storage - No Single Solution
1. Use object storage solution's HDFS APIs: mileage will vary; performance results are specific to the analytics framework
2. Spark: targeted towards in-memory analytics; lower demands on storage depending on the application
3. Analytics tool + Tachyon: Tachyon creates an in-memory distributed storage system; not yet for production; can lower demands on the storage solution
4. Use a File + Object solution: realize native file performance
137
-
Introduction
File and Object Discussion
NFS
Object Storage Introduction
Swift
S3
File to Object and Object to File
Comparison Discussion
Conclusion
Outline
138
-
Between File and Object...
So are NFSv4, S3, and Swift really all needed?
139
-
Gross Generalization of Target Workloads
Object:
Backup (write mostly) - immutable object storage; backup => write mostly
Distribution/streaming => read mostly
Archive (write mostly) - rarely accessed data, but when needed, it must be retrieved quickly
***Note that this is what Object is today, not necessarily where it will be tomorrow
File:
It can do object workloads and much more...
User data and home directories
Applications with small to medium performance and scalability requirements
Analytics
***Note that NFS (without pNFS) is still not ideal for scientific applications that require high-throughput data access from medium to large compute clusters
140
-
Applications
Object:
Converse in whole objects
Simple API that doesn't have complicated concepts like hard links, crash consistency operations, etc.
Many short-lived TCP connections: adds latency but increases parallelism
Must tolerate eventual consistency: must be willing to retry; objects could temporarily disappear... but highly available...
Simple hierarchy makes objects hard to find: many vendors disable even listing containers/buckets; many apps keep a separate database
Must tolerate low bandwidth/high latency (this is today, so could change in future)
File:
Converse in bytes, files, inodes, file descriptors: complicated yet now familiar
Single long-lived TCP connection: a benefit, but one TCP connection is not good in the WAN
Stronger consistency, but that makes it confusing
Must be aware of scaling issues, e.g., too many files in a single directory
Data sharing has shortcomings: locking is typically only advisory and creates delays during failure (due to state)
High performance, but NFS has inherent load imbalances without pNFS
141
-
Ease of Access
Object:
Access data from anywhere on the globe
Very thin client with no optimizations
Mobile integration: iPhone includes an S3 client
More and more applications supporting native object access
To ease user transition, several startups have file-based viewers for Mac/Windows/Linux: Storage Made Easy, Cloudberry, Cyberduck, etc.
Several S3/Swift mobile apps exist as well: Storage Made Easy among many others
Use curl and build your own HTTP request
File:
NFS clients available in all OSs for laptops, desktops and servers - but not for mobile devices
Most applications today natively support POSIX
142
-
Data Protection - What Can Go Wrong...
Coordinated H/W failures
Server Failure
Disk Failure/Corruption
Rack Failure
Data Center Failure
Accidental User Error
Data Transfer Corruption between storage client to storage device
Storage software bugs
143
-
Data Protection
Object:
Object vendors writing SW from scratch - very new
Support 3-way replication and erasure coding
Object vendors currently focused on being the backup, not backing up their data:
Little attention to backup; more focus on DR support
Beware the snake oil salesman: triplication and erasure coding do not prevent data loss
Versioning: no ability to capture the entire dataset
File:
NAS vendors support a wide variety of storage systems: software based; controller based with specialized H/W; controller based with commodity H/W
Backup and DR support widely available
Snapshots widely available
144
-
Security
Object:
Typically provide multi-tenancy at the level of authentication of users
No client software required
Few if any provide data isolation
Encryption becoming more common
Each protocol has its own ACL format and granularity
HTTP-based token mechanisms work nicely for web and global access
Privacy through HTTPS
File:
Variety of authentication mechanisms
Kerberos now standard, and supports multi-tenancy, but requires client-side support
Typically used in LAN, but can work in WAN
Rich ACL format
Data transfer encryption supported
True multi-tenancy (network and data isolation) available from some vendors
145
-
Cost and Features
Object:
Current solutions sold as SW-only or SW+commodity H/W
Currently priced low (to what the market will bear)
OpenStack Swift is *free* (minus blood, sweat, and tears)
Typically simply storing data at this point; analytics support mostly in name only
Relatively easy to manage: only applies to supported vendor solutions; note this correlates with fewer features
File:
Cost can vary widely: roll your own, SW-only, SW+commodity H/W, SW+specialized H/W
Many have tape support
Viable analytics support
Enterprise vendors support multi-protocol access
Block-storage support for VMs: can support the entire OpenStack storage ecosystem, VMware, Hyper-V
146
-
Each Protocol Has Purpose and Real Value
S3
Swift
NFS
147
-
Require POSIX?
File
Proprietary applications, in-place updates, file append, locking, strong consistency
Unique to File
S3
Swift
NFS
148
-
Require Mobile or Cloud?
Object
Smartphone/tablet access, cloud-friendly security, cloud-friendly tools
Unique to Object
S3
Swift
NFS
149
-
The Overlap...today
S3
Swift
NFS
Chances are you have applications that fit in the middle as well
Today, stark differences exist between vendors, making the choice relatively easy:
Object vendors by and large have lower cost/capacity, targeting the backup/archive market
NAS vendors by and large have higher performance and are feature-rich
150
-
The Overlap...tomorrow
S3
Swift
NFS
Remember that NFS/Swift/S3 are simply protocols to access data
Nothing in Swift or S3 limits performance or future possible features
Most enterprise and advanced features are independent of protocol
Object vendors are busy working their way up the application chain
Even in-place updates can be mitigated to some degree:
Many videos are stored frame by frame, with each frame updated in its entirety
With small files, updating the entire file isn't a big deal, e.g., IoT
With better integration, maybe you won't have to decide :)
151
-
Metadata Search
It is hard to find data in both File and Object
A key issue with Object's flat namespace is finding data
Even File can become difficult with billions of files
Scalable search is becoming required to realize the value of data: find needles in unstructured haystacks
Goal is to dynamically index objects/files
Create structure of well-known system and user attributes
Tags and attributes automatically added to a database
Useful for both users and administrators
Users search based upon their tags
Administrators search based upon system attributes
E.g., account_last_activity_time, container_read_permissions, object_content_type
REST-based search API
IBM has built an open-source solution with OpenStack Swift using RabbitMQ and ElasticSearch
TAG IT → FIND IT
General use cases: data mining, data warehousing, selective data retrieval, data backup, data archival, data migration, management/reporting
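The "tag it, then find it" idea above can be sketched in a few lines. This is a toy in-memory index, not the API of the IBM/Swift/ElasticSearch solution mentioned on this slide: attributes are indexed as objects land, then queried with attribute=value pairs.

```python
from collections import defaultdict

class TagIndex:
    """Toy tag index: (attribute, value) pairs map to object names."""

    def __init__(self):
        self._by_tag = defaultdict(set)  # (attr, value) -> set of object names
        self._attrs = {}                 # object name -> its attribute dict

    def add_object(self, name, attrs):
        """Index an object's system/user attributes as it is stored."""
        self._attrs[name] = attrs
        for item in attrs.items():
            self._by_tag[item].add(name)

    def search(self, **query):
        """Return object names matching ALL attribute=value pairs."""
        sets = [self._by_tag[item] for item in query.items()]
        return set.intersection(*sets) if sets else set()

idx = TagIndex()
idx.add_object("img001", {"object_content_type": "image/jpeg", "project": "apollo"})
idx.add_object("log42", {"object_content_type": "text/plain", "project": "apollo"})
hits = idx.search(project="apollo", object_content_type="image/jpeg")
# hits == {"img001"}
```

A production system would push these tags into a real search backend (the slide names ElasticSearch, fed via RabbitMQ) and expose the query through a REST endpoint, but the indexing shape is the same.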
152
-
File vs. Object Summary
So it's not cut and dried
File is very mature, but can be complicated
Object is very immature, but all disruptive technologies are
The real question is how much of the NAS pie will Object eat?
153
-
Introduction
File and Object Discussion
NFS
Object Storage Introduction
Swift
S3
File to Object and Object to File
Comparison Discussion
Conclusion
Outline
154
-
Whew...that was a lot of info
The 5 Ws of File and Object
NFS, Swift, S3
Industry File and Object solutions
There are few easy decisions
There are some now, but it's getting harder as object vendors mature
NFS: a long history...but let's work together to advance the technology
Check out NFSv4.2 and help to make it the new default!
Swift/S3 on-premise are still emerging as standards
Object access will become an essential data access mechanism for ALL data
Get involved! Swift and NFS have active open source communities
155
-
More Information
NFSv4 IETF working group: https://datatracker.ietf.org/wg/nfsv4
NFSv4 RFC: http://www.ietf.org/rfc/rfc3530.txt
NFSv4.1 RFC: http://www.ietf.org/rfc/rfc5661.txt
NFSv4.2 RFC Draft: https://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-41
Ganesha: http://nfs-ganesha.sourceforge.net
SNIA white papers & tutorials on NFS:
https://www.brighttalk.com/search?duration=0..&keywords[]=nfs&q=snia&rank=webcast_relevance
http://www.snia.org/sites/default/files/SNIA_An_Overview_of_NFSv4-3_0.pdf
http://www.snia.org/sites/default/files/Migrating_to_NFSv4_v04_-Final.pdf
http://www.snia.org/sites/default/files/ChuckLever_Introducing_FedFS_On_Linux.pdf
Original pNFS paper - Exporting Storage Systems in a Scalable Manner with pNFS, MSST05: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.76.3177&rep=rep1&type=pdf
NFS XATTR Draft: https://tools.ietf.org/html/draft-naik-nfsv4-xattrs-02
156
-
More Information
NFS FAQ: http://nfs.sourceforge.net/
Virtual Machine Workloads: The Case for New Benchmarks for NAS, FAST13: https://www.usenix.org/system/files/conference/fast13/fast13-final84.pdf
Newer Is Sometimes Better: An Evaluation of NFSv4.1, SIGMETRICS15: https://www.fsl.cs.sunysb.edu/docs/nfs4perf/nfs4perf-sigm15.pdf
All File Systems Are Not Created Equal: On the Complexity of Crafting Crash-Consistent Applications, OSDI14: http://research.cs.wisc.edu/wind/Publications/alice-osdi14.pdf
Boosting the Power of Swift Using Metadata Search: https://www.youtube.com/watch?v=_bODZWvIprY
From Archive to Insight: Debunking Myths of Analytics on Object Stores: https://www.youtube.com/watch?v=brhEUptD3JQ
Swift 101: Technology and Architecture for Beginners: https://www.openstack.org/summit/openstack-paris-summit-2014/session-videos/presentation/swift-101-technology-and-architecture-for-beginners
Building Applications with Swift: The Swift Developer On-Ramp: https://www.openstack.org/summit/openstack-paris-summit-2014/session-videos/presentation/building-applications-with-swift-the-swift-developer-on-ramp
157
-
More Information
Building web-applications using OpenStack Swift:
https://www.openstack.org/summit/tokyo-2015/videos/presentation/building-web-applications-using-openstack-swift
SwiftOnFile Project https://github.com/openstack/swiftonfile
Swift3 Project https://github.com/openstack/swift3
ceph/s3-tests Project https://github.com/ceph/s3-tests
158
-
BACKUP
159
-
What is Object Storage?
Multi-Site Cloud Storage
Multi-Tenancy
Simpler management and flatter namespace
Simple APIs and semantics (Swift/S3 and whole-file updates)
Scalable metadata access
Scalable and highly-available
Versioning
Ubiquitous access
160
-
Data Protection
In the context of what can actually go wrong (and not what is only likely to go wrong)
Disk Failure/Corruption
Object: per-object auditing common; low coverage; erasure coding
File: per-file or block auditing is vendor specific; typically high coverage
Server Failure
Object: erasure coding or triplication
File: high-end supports erasure coding; low-end has no support
Rack Failure
Object: erasure coding or triplication
File: high-end supports erasure coding
Data Center Failure
Object: erasure coding or replication; scalability can be a concern...
File: high-end supports replication at the file or block level
User Error
Object: per-object versioning; S3 supports undelete
File: snapshots (dataset consistent); backup
Data Transfer Corruption
Object: end-to-end checksums vendor specific
File: end-to-end checksums vendor specific; backup
Storage Software Bugs
Object: typically lack scalable backup
File: backup
Coordinated H/W Failures
Object: typically lack scalable backup
File: backup
161
-
File and Object Security Comparison
Authentication
Object: standard APIs, both standard and custom implementations; designed for global access; userid/password or certificate; many support an enterprise directory service (LDAP/AD)
File: standard (Kerberos); typically not globally accessible; userid/password or certificate; many support an enterprise directory service (LDAP/AD)
Authorization
Object: ACLs (of varying granularity)
File: NFSv4 and POSIX ACLs
Data Privacy
Object: HTTPS
File: Kerberos, IPsec
Multi-Tenancy
Object: typically software-based separation; shared servers and storage for everyone
File: typically software-based separation; high-end vendors can provide physical separation as well
162
-
S3 Authentication Signature V2 (Backup)
Access Key ID (AWSAccessKeyID)
Secret Access Key (AWSSecretKey)
signature = Base64( HMAC-SHA1( AWSSecretKey, UTF-8-Encoding-Of( StringToSign )))
StringToSign = HTTP-Verb + "\n" + Content-MD5 + "\n" + Content-Type + "\n" + Date + "\n" + CanonicalizedAmzHeaders + CanonicalizedResource
Authorization Header:
-H Authorization: AWS awsaccesskeyid:signature
http://docs.aws.amazon.com/AmazonS3/latest/dev/RESTAuthentication.html
163
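The V2 signature formula above maps directly onto Python's standard library. A minimal sketch; the access key and secret below are placeholders, not real credentials:

```python
import base64
import hashlib
import hmac

def string_to_sign(verb, content_md5, content_type, date,
                   canonicalized_amz_headers, canonicalized_resource):
    """Assemble StringToSign exactly as the formula on this slide."""
    # CanonicalizedAmzHeaders carries its own trailing newlines when present
    return (verb + "\n" + content_md5 + "\n" + content_type + "\n" +
            date + "\n" + canonicalized_amz_headers + canonicalized_resource)

def sign_v2(secret_key, sts):
    """signature = Base64(HMAC-SHA1(AWSSecretKey, UTF-8(StringToSign)))"""
    digest = hmac.new(secret_key.encode("utf-8"),
                      sts.encode("utf-8"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")

# Placeholder request and credentials for illustration
sts = string_to_sign("GET", "", "", "Tue, 27 Mar 2007 19:36:42 +0000",
                     "", "/examplebucket/photos/puppy.jpg")
sig = sign_v2("wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY", sts)
header = "Authorization: AWS AKIAIOSFODNN7EXAMPLE:" + sig
```

Because SHA-1 produces a 20-byte digest, the Base64 signature is always 28 characters.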
-
S3 Authentication Signature V4 (Backup)
Access Key ID (AWSAccessKeyID)
Secret Access Key (AWSSecretKey)
Authorization Header:
-H Authorization: AWS4-HMAC-SHA256 Credential=awsaccesskeyid/20160220/us-east-1/s3/aws4_request, SignedHeaders=host;range;x-amz-date, Signature=signature
http://docs.aws.amazon.com/AmazonS3/latest/API/sig-v4-header-based-auth.html
164
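Unlike V2, Signature V4 derives a per-day, per-region, per-service signing key through a chain of HMAC-SHA256 operations, matching the credential scope shown in the header above (20160220/us-east-1/s3/aws4_request). The sketch below shows only the key derivation and final signing; building the canonical request and StringToSign is omitted, and "example-string-to-sign" plus the credentials are placeholders:

```python
import hashlib
import hmac

def _hmac(key, msg):
    """HMAC-SHA256 of a UTF-8 string, returning raw bytes for chaining."""
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()

def v4_signing_key(secret_key, date, region, service):
    """Derive the SigV4 signing key: date -> region -> service -> aws4_request."""
    k_date = _hmac(("AWS4" + secret_key).encode("utf-8"), date)
    k_region = _hmac(k_date, region)
    k_service = _hmac(k_region, service)
    return _hmac(k_service, "aws4_request")

def v4_signature(signing_key, sts):
    """Final signature is hex-encoded, not Base64 as in V2."""
    return hmac.new(signing_key, sts.encode("utf-8"),
                    hashlib.sha256).hexdigest()

# Scope matching the slide's example header (placeholder credentials)
key = v4_signing_key("wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
                     "20160220", "us-east-1", "s3")
sig = v4_signature(key, "example-string-to-sign")
auth = ("AWS4-HMAC-SHA256 "
        "Credential=AKIAIOSFODNN7EXAMPLE/20160220/us-east-1/s3/aws4_request, "
        "SignedHeaders=host;range;x-amz-date, Signature=" + sig)
```

Scoping the key this way means a leaked signing key is only useful for one service, region, and day, which is one of the main security improvements of V4 over V2.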