An Overview of On-Premise File and Object Storage Access Protocols
TRANSCRIPT
-
An Overview of On-Premise File and Object
Storage Access Protocols
Dean Hildebrand - Research Staff Member, IBM Research
Bill Owen - Senior Engineer, IBM
v1.2
-
Introduction
File and Object Discussion
NFS
Object Storage Introduction
Swift
S3
File to Object and Object to File
Comparison Discussion
Conclusion
Outline
2
-
Dean Hildebrand - Research Staff Member, IBM Research
Bill Owen - Senior Engineer, IBM
3
-
Attendance Poll
SysAdmin / Storage Architect / Manager, Developers, Students, Researchers
4
-
Software Storage Market Growth
5
-
Accessing Data in On-Premise Storage Systems
6
-
Local vs Shared Storage
7
-
Local Storage
Most common for laptops, desktops, mobile devices, server OS boot disks
Typically formatted with a file system
E.g., Ext4, XFS, NTFS, HFS+, BtrFS, ZFS
Invaluable for managing a single device (or maybe a few with LVM)
Varying levels of availability, durability, scalability, etc., supported
All limited to a single node
E.g., cannot support VM or container migration, support 1000s of applications, etc.
In your research, think about the real benefits of further optimizing local storage
How many pressing problems are left to be solved? Only incremental gains?
Commonly used as a building block in higher-level storage systems
8
-
Shared Storage
[Diagram: clients connect over a network to SSD, fast disk, slow disk, and tape]
Supports any kind of storage device
Supports any type of network and network/file protocol
Supports any kind of client device
Independent scaling of clients
Independent scaling of storage bandwidth and capacity
9
-
Block Shared Storage
Used to dominate, now mostly shrinking...except:
FC continues to have very low latency, and so is finding new life with Flash storage systems
iSCSI still very popular for VMs
[Diagram: servers connect via iSCSI/FC/FCoE (and others) over Fibre Channel/Ethernet to SSD, fast disk, slow disk, and tape; typically deployed in pairs for H/A]
10
-
Parallel and Scale-out File Systems
Scalability (all dimensions)
Performance (all dimensions)
Support general applications and middleware
Make managing billions of files, TB/s of bandwidth, and PBs of data *easy*
[Diagram: HPC and commercial clients access SSD, fast disk, slow disk, and tape over Infiniband/Ethernet via a proprietary file access protocol; scale-out as needed]
11
-
Distributed Access Protocols
[Diagram: clients access a variety of storage systems over Ethernet and Infiniband]
Wide variety of solutions
Vast range of performance and scalability options
Standard and non-standard protocols
12
-
Distributed Access Protocols: Portability and Lock-In
Standard APIs help:
Maximize application portability
Minimize vendor lock-in
Numerous benefits of standard protocols:
Standard protocol clients ship in most OSs
Promote predictability of semantics and application behavior
Minimize changes to applications and system infrastructure when switching to a new storage system (many times due to reasons out of your control)
Applications can move between on-premise and off-premise (cloud) systems
A wider and broader user base makes it easier to find support and also hardens implementations
13
-
Distributed Access Protocols: Standards Are Not A Silver Bullet
For File, while applications use POSIX, they are sensitive to the implementation
No common set of commands guarantees crash consistency [***]
For distributed file systems, it becomes even more complicated
Different crash consistency semantics, cache consistency semantics, locking semantics, security levels/APIs/tools, etc.
For Object, each implementation varies w.r.t. level of eventual consistency, security, versioning, etc.
Even what we consider standards are not actually well defined
E.g., SMB, S3
Examples: CIFS/SMB provides sequential consistency whereas NFS has close-to-open semantics
Versioning is quite different between object protocols
[***] All File Systems Are Not Created Equal: On the Complexity of Crafting Crash-Consistent Applications, OSDI14
14
-
Distributed Access Protocols: One Storage Protocol CANNOT Do It All
There are so many vendors...each claiming they have *solved* data storage (or is it world hunger?)
Vendors sell what they have, not what you need
A storage seller takes what they have and makes it fit for practically any requirement and use case
Leads to many unsatisfied customers soon after deployment
Many protocols have existed: DDN WOS, EMC ATMOS, CDMI, AFS, DFS, RFS, 9P, etc.
Tips:
Attend sessions like this to learn more about reality and not hype :)
Dig into advertised feature support
How many customers use a feature, will the customer talk about it, in what context do they use it, etc.
Validate the system on-premise using realistic workloads (do you know your workloads?)
Remember there is no guarantee for what you haven't tried (x- and y-axis have an upper bound for a reason)
Don't buy H/W first and then expect any storage S/W vendor to support it efficiently
15
-
On-Premise Data Access Protocols: NFS and now Swift, S3
File winners: NFS and SMB are the clear winners
SMB is being discussed in a SNIA tutorial this week, so we'll focus on NFS
Note: HDFS is also dominant for analytics
Object winners: Industry appears to be centralizing around Swift and S3
S3: Amazon + many, many apps/tools
Swift: Open source + API + 3 cloud vendors (or more)
Easily repatriate apps due to cost
16
-
Tutorial Goals
SysAdmin / Storage Architect / Manager:
Understand which protocols are best for which applications
Understand tradeoffs between protocols
Introduction to vendor landscape
Developers:
Be able to determine which file-based applications are good candidates for using an object protocol
Understand how to choose the best protocol for an application (and consequences of choosing the wrong protocol)
Students:
Introduction to NAS and Object history and vendor landscape
Introduction to distributed data access research potential
Understand challenges of on-premise distributed data access
Researchers:
Understand on-premise data center challenges
Introduction to distributed data access research potential
17
-
Introduction
File and Object Discussion
NFS
Object Storage Introduction
Swift
S3
File to Object and Object to File
Comparison Discussion
Conclusion
Outline
18
-
File and Object: Both Can Do Anything
Fantasy
19
-
File and Object: Each has its strengths and weaknesses
vs
Reality
20
-
File and Object: Each has its strengths and weaknesses
vs
Reality
Confusion
21
-
Object vs File Summary

File:
Target most workloads (except HPC)
Medium to high performance
Typically scales to medium scalability
Low to high cost
Limited capability for global distribution
Standard file data access
POSIX + snapshots
Strong(er) consistency

Object Store:
Target cold data (Backup, Read-only, Archive)
Low to medium performance
Typically scales to large capacity
Low cost
Global and ubiquitous/mobile access
Data access through REST APIs
Immutable objects and versioning
Loose/eventual consistency
22
-
Introduction
File and Object Discussion
NFS
Object Storage Introduction
Swift
S3
File to Object and Object to File
Comparison Discussion
Conclusion
Outline
23
-
NFS: A Little History...
NFSv2 in 1983
Synchronous, stable writes...outdated
Finally removed from Fedora
NFSv3 in 1995
Still default on many distros...
NFSv4 in 2003, updated 2015
Default in RHEL... possibly others
NFSv4.1 and pNFS in 2010
Many structural changes and new features
NFSv4.2 practically complete now
Many new features, VM workloads specifically targeted
Now going to try per-feature releases
24
-
Deployment
The beauty is that it is everywhere (even Windows)
Well, mostly...more on that later with object
Most NFS servers are in-kernel or proprietary
But Ganesha is the first open-source user-level NFS server daemon
For the enterprise, scale-out NAS is now a requirement for capacity and availability
New clients and environments emerging:
VMware announces support for NFSv4.1 as a client for storing VMDKs
Amazon announces support for NFSv4.0 in AWS Elastic File System (EFS)
OpenStack Manila is a shared file service with NFS as the initial file protocol
Docker has volume plugins that support NFS
25
-
NFS Caching Semantics
Not POSIX
But a single client with exclusive data access should see POSIX semantics
v2 could not cache data
Sync writes
v3 can cache data, but...
Weak cache consistency
Revalidates on open and periodically (30s in Linux)
Data must be kept in cache until committed by the server (just in case the server fails)
v4 standardizes close-to-open cache consistency
Similar to v3, but guarantees the cache is revalidated on OPEN and flushed at CLOSE
Also checked periodically and at LOCK/LOCKU
Note granularity is typically 1 second
Delegations reduce the number of revalidations required...
26
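As a rough sketch (illustrative only, with hypothetical class names - not how any NFS client is actually structured), close-to-open consistency can be modeled as: revalidate the cache against the server's change attribute on OPEN, buffer writes locally, and flush them at CLOSE so the next OPEN on any client sees them.

```python
# Minimal sketch of close-to-open cache consistency (hypothetical model).
class Server:
    def __init__(self):
        self.data, self.change = b"", 0      # change attribute bumps on each write
    def write(self, data):
        self.data, self.change = data, self.change + 1

class Client:
    def __init__(self, server):
        self.server = server
        self.cache, self.cached_change, self.dirty = None, -1, None
    def open(self):
        # Revalidate on OPEN: drop the cache if the server's file has changed
        if self.server.change != self.cached_change:
            self.cache = self.server.data
            self.cached_change = self.server.change
        return self.cache
    def write(self, data):
        self.cache = self.dirty = data       # buffered locally until CLOSE
    def close(self):
        # Flush at CLOSE so a subsequent OPEN on any client sees the data
        if self.dirty is not None:
            self.server.write(self.dirty)
            self.cached_change = self.server.change
            self.dirty = None

srv = Server()
a, b = Client(srv), Client(srv)
a.open(); a.write(b"v1"); a.close()
seen = b.open()   # b revalidates on OPEN and sees a's flushed write
```

Between a's CLOSE and b's OPEN the data is guaranteed visible; between periodic revalidations within an open, it is not - which is why the slide notes the 1-second granularity.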
-
NFSv3
Collection of protocols (file, mount, lock, status)
Each on its own port
Stateless (mostly)
Locks add state
Server must keep a request cache to prevent duplicate non-idempotent RPCs
UNIX-centric, but seen in Windows too
32-bit numeric uids/gids
UNIX permissions, but Kerberos also possible
Works over UDP, TCP
Needs a priori agreement on character sets
27
-
NFSv4 New Features
Finally standardized almost everything
Custom export tree with pseudo-name space
Mandatory use of a congestion-controlled protocol (TCP)
Delegations
Clients become the server for a file, coordinating multi-threaded access
Less communication and better caching
Also includes callbacks from server to client
Linux only implements RO delegations
Uses a universal character set for file names
Integrated and well-defined locking
Removes need for additional ports and daemons
Share reservations for Windows
Mandatory locks supported
Much easier to support consistency across failures
Security
NFSv4 ACLs (much more full-featured than POSIX ACLs)
Use of named strings instead of 32-bit integers
Lofty goals with new GSS-API, but the essential benefit is that Kerberos is officially supported and easier to configure
Kerberos V5 must always be supported (but not necessarily used)
Compound RPCs
Dream was to reduce the number of messages, but...
Due to state operations and the POSIX API, the number of messages actually increases in some cases
Referrals
Server can refer clients to other servers for a subtree
Migration, load balancing
Increased create rate through asynchronous ops
Better inter-protocol support
OPEN operation allows coordination with CIFS, etc.
28
-
NFSv4 Statefulness Implies Talkative
[Chart: OPEN & CLOSE account for 56% and 43% of operations]
*From Newer Is Sometimes Better: An Evaluation of NFSv4.1, SIGMETRICS15
29
http://doi.acm.org/10.1145/2745844.2745845
-
NFSv3: Statelessness - The state of being immortal
NFSv4: Lease-based state
30
-
But Does Statelessness Really Justify Lack of Innovation?
Are We Frozen In Time?
31
-
*From Newer Is Sometimes Better: An Evaluation of NFSv4.1, SIGMETRICS15
32
http://doi.acm.org/10.1145/2745844.2745845
-
*From Newer Is Sometimes Better: An Evaluation of NFSv4.1, SIGMETRICS15
Reading Small Files
33
http://doi.acm.org/10.1145/2745844.2745845
-
NFSv4.1 New Features
Introduces a session layer
Exactly-once semantics
Vastly simplifies locking
Multipathing via trunking
Utilize more paths: multiple IPs can be identified as the same server
Retry failed requests over other paths
Retention attributes for compliance
Delegations are easier to manage
Recall ANY semantics allow clients to decide which are the best delegations to recall
Re-acquisition without re-open
pNFS
Scalable data access to scale-out storage systems
Improved load balancing
34
-
What is pNFS? Scalable access to YOUR data
Direct and parallel data access
Scale with the underlying storage system
Better load balancing
If NFS can access it, then so can pNFS
Standard file access (part of OS)
Open client, no client licensing issues
Layouts
Metadata
Clients always issue metadata requests to an NFSv4.1 server
Scale-out systems can support multiple metadata servers to the same data
Data
File layout is part of NFSv4.1
Object and Block variants in separate Internet Drafts
Security and Access Control
Control path uses NFSv4.1 security
Data path uses the security of the I/O protocol
[Diagram: pNFS clients accessing GPFS, HDFS, Lustre, ZFS, PanFS, Netapp, dCache]
35
-
What's Coming in NFSv4.2
Sparse File Support
Hole punching to reclaim space
Avoid transferring 0s and unallocated space across the wire on reads
Space Reservation
Ensure application does not run out of space
Server-Side Copy
Finally stop copying data through the client machine
Labeled NFS
Allows (partial) SELinux support for Mandatory Access Control (MAC)
Client can inform server of I/O patterns
Provides an fadvise-like mechanism over a network
Application Data Holes
Allows definition of file format
E.g., initializing a 30G database takes a single over-the-wire operation instead of 30G of traffic
Great for managing virtual disk images
36
-
Other Notable NFS Features
RDMA
Support possible for all versions of NFS, but best with NFSv4.1
Federated File System
Enables file access and namespace traversal across independent file servers
Across organizations or within a single organization
Suite of standards including DNS, NSDB, ADMIN, and file-access (NFS)
Extended Attribute (xattr) support on track to be the first post-NFSv4.2 feature
Existing named attributes did not work well with modern OS xattrs
New NFS xattrs will interoperate with existing OS xattr support
37
-
Ganesha: User Space NFS Server
Ganesha History
Developed by Philippe Deniel (CEA)
Ganesha Features
Efficient caching of data and metadata
Scalable, high performance
Per-file-system namespace (FSAL) modules
Abstraction that allows each file system to perform its own optimizations
Also allows for proxy support and other non-traditional back-ends
User space makes life easier
Security managers (like Kerberos) reside in user space
Can be accessed directly via GSSAPI (no need for rpc_pipefs)
ID mappers (NIS, LDAP) reside in user space
Accessed directly (the daemon is linked with libnfsidmap)
Fewer constraints on memory allocation than in kernel space
Managing huge pieces of memory is easy
Developing in user space is so much easier
Hopefully increased community support
Great open source community that includes IBM, Panasas, DDN, CEA, Red Hat
38
-
Ganesha FSAL: File System Abstraction Layer
Namespace-independent API
Translation layer between the NFS server and the underlying file system
Allows a file system to customize how files and directories are managed
Allows for file system value-add
Handle-based API (lookup, readdir, ...)
Implements namespace-specific authentication mechanisms
Many FSAL modules exist: GPFS, HPSS, Proxy, Lustre, XFS, ZFS, GlusterFS, etc.
39
-
NFS Security
NFSv3 first relied on ONC RPC
AUTH_SYS is trivial to exploit
AUTH_DES is trivial to exploit by someone with a degree in Mathematics
AUTH_KERB is better, but it isn't standard
No written specification to refer to
Like AUTH_SYS and AUTH_DES, there is no integrity or privacy protection
All NFS versions now support RPCSEC_GSS
NFSv4 added:
Mandatory support for Kerberos V5
krb5 (authentication)
krb5i (auth + integrity)
krb5p (auth + integrity + privacy)
Removed external mount protocol
NFSv4 ACLs
40
-
Quick Basics on ACLs (Authorization)
Linux permissions too coarse
Single user too narrow
Group too broad
POSIX ACLs are very basic
Allow multiple users/groups per file/directory
Files/directories inherit ACLs of their parent directory
Use standard userids
NFSv4 ACLs are richer
Close to subsuming Windows ACLs
A principal is a user/group (at an org) defined by text name
4 types of Access Control Entries (ACEs):
ALLOW - Grant access
DENY - Deny access
AUDIT - Log access to any file or directory
ALARM - Generate an alarm on attempt to access any file or directory
Can control inheritance, among other things
Works well with enterprise directory services

Example 1 (POSIX): Give myuser read permission on file1:
$ setfacl -m user:myuser:r file1

Example 2 (NFSv4): Give myuser read permission on file1:
$ nfs4_setfacl -a "A::[email protected]:R" file1
41
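As a small illustrative sketch (not part of the tutorial), the ACE string passed to nfs4_setfacl follows the nfs4_acl format type:flags:principal:permissions; the principal and permission letters used below are example values.

```python
# Minimal sketch: parse an NFSv4 ACE string as accepted by nfs4_setfacl.
# Format (per nfs4_acl(5)): type:flags:principal:permissions
ACE_TYPES = {"A": "ALLOW", "D": "DENY", "U": "AUDIT", "L": "ALARM"}

def parse_ace(ace: str) -> dict:
    ace_type, flags, principal, perms = ace.split(":")
    return {
        "type": ACE_TYPES[ace_type],   # one of the 4 ACE types above
        "flags": set(flags),           # e.g., inheritance flags 'f' (file), 'd' (dir)
        "principal": principal,        # user/group as a text name, e.g., user@domain
        "perms": set(perms),           # e.g., 'R' for read
    }

parsed = parse_ace("A:fd:myuser@example.com:R")
```

Note how the principal is a text name rather than a 32-bit uid, which is what lets NFSv4 ACLs work across enterprise directory services.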
-
So Do I Just Need an NFS Server and I'm in Business?
42
-
Maybe...
but how important are performance, scalability, availability, durability, multi-protocol access, backup, disaster recovery, encryption, compression, cost, ease of management, tiering, archiving, etc.,
to you?
43
-
If so, you need Scale-Out NAS
High Availability
Tricky with NFSv4, since state must be migrated
Failure requires a grace period to recover state
Capacity and Performance Scaling
Much, much more...
Many good options depending on requirements and budget
44
-
Workloads and Benchmarks
Modern NAS systems can support 100k+ IOPS from 1000s of clients
So the range of workloads they currently handle is practically everything
Spec SFS 2008 only represents a very specific metadata-heavy workload
45
-
Current NAS Benchmarks vs New NAS Benchmarks
[Diagram: current benchmarks run applications on physical machines directly against a NAS appliance (GPFS, WAFL, ZFS) over NFS/SMB; new benchmarks run applications inside virtual machines whose hosts access the same NAS appliance]
Meta-data ops: SPECsfs2008: 72%; Virtual setup: < 1%
46
-
VM Workload Changes
Workload Property | Physical NAS clients | Virtual NAS clients
File and directory count | Many files and directories | Few files per VM
Directory tree depth | Deep and non-uniform | Shallow and uniform
File size | Lean towards small files | Multi-gigabyte, but sparse
Meta-data operations | Many | Almost none
I/O synchronization | Async and sync | All writes are sync
In-file randomness | Workload-dependent | Increased randomness
Cross-file randomness | Workload-dependent | Predictable
I/O sizes | Workload-dependent | Increased and decreased
Read-modify-write | Infrequent | More frequent
Think time | Workload-dependent | Increased
47
-
Workloads
Spec SFS 2014 - 4 separate workloads that support any POSIX interface:
number of simultaneous builds
number of video streams that can be captured
number of simultaneous databases
number of virtual desktops
So Spec SFS 2014 is a step forward, but still only represents a very marginal slice of possible workloads
Makes assumptions on architecture and use of features: sparse/allocated files, file size, direct I/O, data ingest rates, etc.
Client now plays a pivotal role in results
NAS systems rarely support a single dedicated workload
Locking?
Doesn't cover day-to-day operations such as copying files, find, grep, etc.
Won't see a big performance difference between NFSv4 and NFSv3
NFSv4 is more than just performance enhancements (pNFS an exception :)
48
-
Summary Comparison of NFSv4 over NFSv3
Benefits:
Single protocol
Coherent locking
Security
NFSv4 ACLs
Enhanced Kerberos support
Eliminate hotspots with pNFS
Ride wave of NFS enhancements
Exactly-once semantics
Asynchronous creates
Close-to-open semantics
Drawbacks:
More work for NFS developers
49
-
Summary
In order for NFS to advance, we need to move to NFSv4
Let's work together to stop implementing new v3 servers
I do love the 90s though...but not everything is worth keeping
Ask your NAS vendor if they have a path from NFSv3 to NFSv4.1
And to NFSv4.2 and beyond
NFS *can* do most anything, but is really good at the following use cases:
easy access to data within a LAN, since laptops and servers have built-in clients
plug-n-play for any file-based application
very good performance without installing extra specialized clients
small to moderate amounts of data
interoperability with SMB
storage for virtualization (and other emerging areas like containers)
NFS continues to struggle with several areas:
Mobile, WAN, HPC, Cost (for H/A), Scalability, Searching for Files and Data
50
-
Introduction
File and Object Discussion
NFS
Object Storage Introduction
Swift
S3
File to Object and Object to File
Comparison Discussion
Conclusion
Outline
51
-
Why Do Clients Need Object Storage?
Significantly Reduced Complexity:
Simplified data scaling through flat namespace
Easy-to-use REST-based data access
Storage automation
User-defined metadata and search capabilities
Highly Scalable with Low Cost:
Software-defined storage flexibility
Leverage low-cost commodity hardware
High-density storage
Handles ever-increasing storage requirements
Global, Secure Multi-tenant Access:
Global data access and distribution
Multi-tenant management and data access
Role-based authentication
Encryption of data in flight and at rest
Supports Emerging Workloads:
Unstructured immutable data store
Social, Mobile, Analytics
52
-
Sample Object Storage Use Cases
Backup and Disaster Recovery:
Private, public, hybrid backup repository
Recover after data loss event (corruption, deletion)
Leverage copy on object storage to recover from a disaster
Archive:
Active archive
Compliant archive
Cold archive
Content storage and distribution:
Big data storage / analytics
Content management repository
Global collaboration and distribution
Cloud Service Provider (CSP):
Non-ephemeral data store for cloud compute
Public cloud storage
Static web content
53
-
How Do Clients Access Object Storage?
Two APIs emerging for on-premise object storage deployments:
OpenStack Swift
Amazon S3
Many products/public clouds support proprietary APIs:
Microsoft Azure, Google Cloud Storage, EMC Atmos, DDN WOS
CDMI is an attempt to standardize, but support is fading
Concepts are similar across all APIs - we will focus on Swift and S3
54
-
Object Storage Introduction
Some questions we'll answer:
1. Object APIs are built using RESTful APIs - what does that mean?
2. What are the commands Object APIs support? Are there extensions?
3. What does an object command look like? How do I know if my client request succeeded?
4. What is object data?
5. What is eventual consistency? Is it the same for every kind of object storage?
6. How do I make my object store secure and protected?
55
-
Just Enough REST
REpresentational State Transfer
Defined:
Resource-based
Stateless
Client-Server
Cacheable
Layered System
In Practice:
Simple interfaces
Resources uniquely identified by URIs
Relationships are identified by URIs
Can access from ANY HTTP client
Note: There is no REST standard. It is an architectural style with plenty of best practices defined. It is typically composed using standards like HTTP, XML, JSON, etc.
56
-
Object Resources
Resource | Swift | S3
Your Data! | Object | Object
Collections of Objects | Container | Bucket
Collections of Containers/Buckets in an Organization Unit (Department, Company, Site) | Account | Service (implicit)
Discoverability - provides listing of configuration information | Info | n/a
Location information - provides URI to access a resource directly from the storage server | Endpoints | n/a
Bucket sub-resources | (features provided with middleware) | acl, policy, lifecycle, version, replication, tagging, cors, website, ...
Object sub-resources | n/a | acl, torrent
57
-
Object Namespace - Super Simple
[Diagram: a Swift Account contains Containers, each holding Objects; an Amazon S3 Service contains Buckets, each holding Objects]
58
-
Object REST Operations
Operation | Description | Idempotent? | Safe?
GET | Return the contents of the resource | Y | Y
HEAD | Return the metadata for a resource | Y | Y
PUT | Create or update the contents of the resource | Y | N
POST | Create, update or delete metadata for the resource, or create a sub-resource | N | N
DELETE | Remove the resource from the system | Y | N
COPY | Copy an object to another location (Swift only) | Y | N
59
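The table above can be sketched in a few lines of Python (illustrative only): safe verbs never change server state, and idempotent verbs can be repeated with the same effect.

```python
# Minimal sketch: classify the object REST verbs from the table above.
SAFE = {"GET", "HEAD"}                         # read-only: no server-side change
IDEMPOTENT = SAFE | {"PUT", "DELETE", "COPY"}  # repeating has the same effect

def classify(op: str) -> dict:
    op = op.upper()
    return {"safe": op in SAFE, "idempotent": op in IDEMPOTENT}

# POST is neither safe nor idempotent (e.g., repeated metadata POSTs
# to a Swift container keep merging new attributes in).
post = classify("POST")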
-
Example Command Format
General Format: command uri[?query-string] headers [data]
Swift URI: http(s)://server:port/api_version/account/container/object
Example:
GET https://192.168.56.101:8080/v1/AUTH_acct/Demo-Container/object1 -H "X-Auth-Token: xxxxxx"
S3 URI: http(s)://server:port/bucket/object
Example:
GET https://192.168.56.101:8080/s3_test_bucket/object1 -H 'Date: Sat, 06 Feb 2016 19:25:22 +0000' -H 'Authorization: AWS s3key:xxxx'
60
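As a small sketch (hypothetical helper functions, not part of either API), the two URI formats above can be built programmatically:

```python
# Minimal sketch: build request URLs following the Swift and S3 path
# formats shown on this slide.
def swift_url(server, port, account, container, obj, api_version="v1"):
    # Swift: http(s)://server:port/api_version/account/container/object
    return f"https://{server}:{port}/{api_version}/{account}/{container}/{obj}"

def s3_url(server, port, bucket, obj):
    # S3 (path-style): http(s)://server:port/bucket/object
    return f"https://{server}:{port}/{bucket}/{obj}"

url = swift_url("192.168.56.101", 8080, "AUTH_acct", "Demo-Container", "object1")
```

Note the structural difference: Swift paths carry an API version and an account segment, while the S3 path is just bucket/object, with identity conveyed entirely in the Authorization header.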
-
And some common response codes...
Description | Code | Client Retry? | Common Examples
Success | 20x | No effect | 200: Success (GET); 201: New resource created (PUT); 202: Accepted (POST); 204: No content (HEAD)
Client Error | 4xx | No, will still fail | 400: Bad Request (incorrectly formatted request, e.g., non-numeric quota specification); 401: Unauthorized (wrong credentials); 403: Forbidden (no access to resource); 404: Not Found (wrong URL); 405: Method Not Allowed (PUT to a resource that doesn't support PUT)
Server Error | 5xx | Yes, in most cases | 500: Internal Server Error (system problem - can be transient); 503: Service Not Available (often due to loading - internal timeout)
S3 Details: http://docs.aws.amazon.com/AmazonS3/latest/API/ErrorResponses.html
Swift Details: http://developer.openstack.org/api-ref-objectstorage-v1.html
61
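The retry column above reduces to a simple rule, sketched here (illustrative only): 4xx means the same request will fail again, while 5xx is often transient and worth retrying.

```python
# Minimal sketch: should a client retry, per the response-code table above?
def should_retry(status: int) -> bool:
    if 200 <= status < 300:
        return False   # success: nothing to retry
    if 400 <= status < 500:
        return False   # client error: resending the same request fails again
    if 500 <= status < 600:
        return True    # server error: often transient, retry (with backoff)
    return False

# e.g., 503 (Service Not Available) is worth retrying; 404 is not
```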
-
Some Simple Example Clients/Libraries
Swift: curl, boto (python library), poster (firefox browser plugin), swiftclient
S3: curl/s3curl, boto, poster, s3sh, s3cmd
Note that client caching is not common in object libraries/clients today
62
-
Some Example Requests - Firefox Poster
Poster: When you want full control - container HEAD request
63
-
Some Example Requests - Firefox Poster
Results of HEAD request
64
-
Some Example Requests: GET
Get a list of containers in a Swift account using the swift command line (hiding all command details):
Using curl:
Note: We will talk about authentication details later
65
-
Some Example Requests: GET
Using curl and formatting output as json or xml:
66
-
Some Example Requests: GET
Get a list of all objects in an S3 bucket using boto:
Output:
67
-
Additional Object API Features
Access Control Lists
Quotas
Versioned Objects
Expiring Objects
Automatic Storage Tiering
Storage Policies (placement, durability, etc.)
Upload Multipart Large Objects
Container Synchronization
Notification Infrastructure
Metadata Search
68
-
Object Storage Metadata
Useful for flexibly organizing data in the flat namespace and enriching data
System Metadata on Objects
Creation time
etag (md5sum of object contents)
User Metadata on Objects (and Accounts and Containers in Swift)
Attribute/value pair passed as a header in a PUT or POST request
Objects: new metadata overwrites all previous metadata for that object
Accounts & containers (Swift only): new metadata is added to existing metadata
Coming Soon: Metadata Search
69
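The two update semantics above can be modeled with plain dicts (an illustrative sketch, not Swift code): object metadata updates replace everything, while account/container updates merge.

```python
# Minimal sketch: the differing user-metadata update semantics in Swift.
def update_object_meta(current: dict, new: dict) -> dict:
    # Objects: a metadata update replaces ALL existing user metadata
    return dict(new)

def update_container_meta(current: dict, new: dict) -> dict:
    # Accounts & containers: new metadata is merged into the existing set
    merged = dict(current)
    merged.update(new)
    return merged

obj = update_object_meta({"X-Object-Meta-A": "1"}, {"X-Object-Meta-B": "2"})
ctr = update_container_meta({"X-Container-Meta-A": "1"}, {"X-Container-Meta-B": "2"})
```

A practical consequence: to add one attribute to an object, a client must first read the existing metadata and resend all of it, or the other attributes are lost.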
-
Object Storage Metadata - An Example
bill_selfie.jpg
System Metadata:
Content-Length: 68351
Content-Type: image/jpeg
Etag: 1f32161a3c3baefb9a548a72daffa7ab
X-Timestamp: 1455144452.21496
X-Object-Meta-Mtime: 1455144440.207139
User (Client) Metadata:
X-Object-Meta-Brightness: 10.5
X-Object-Meta-Latitude: 117.2303
X-Object-Meta-Longitude: 33.03279
X-Object-Meta-Altitude: 2322.16
X-Object-Meta-Aperture: 2.275
X-Object-Meta-Camera-Model: iphone6.0.1.3
Metadata can be as valuable as the data itself!
70
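As noted earlier, the Etag system metadata is the md5sum of the object's contents; a client can recompute it to verify an upload or download (illustrative sketch):

```python
# Minimal sketch: the Etag is the MD5 hex digest of the object body,
# so a client can verify data integrity after a PUT or GET.
import hashlib

def compute_etag(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()

etag = compute_etag(b"hello")
# Compare this against the Etag header returned by the server
```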
-
Eventual Consistency - CAP Theorem
CAP Theorem: Pick Any 2
1. Consistency
2. Availability
3. Partition tolerance
Object Storage systems are typically AP
Consistency is eventual
No standard
Note that a POSIX-based distributed file system would require CP...
71
-
Eventual Consistency: I/O Characteristics
Typically no locking
Object operations are atomic
The entire object must be written successfully to be committed
Reads will always return a consistent object or no object
Range reads supported - not range writes
This is an artifact of HTTP GET/PUT, and derives from the consistency model
Last writer (creator) wins: for concurrent creates of the same object, the one with the latest timestamp wins
Container/Bucket listings may be updated asynchronously
72
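The "last writer wins" rule above can be sketched directly (illustrative model, not Swift's implementation): when replicas disagree after concurrent creates, the version with the latest timestamp becomes the object.

```python
# Minimal sketch: resolve concurrent creates of the same object by timestamp.
def resolve(replicas):
    # Each replica carries the timestamp set when its write was accepted;
    # the version with the latest timestamp wins.
    return max(replicas, key=lambda r: r["x_timestamp"])

winner = resolve([
    {"x_timestamp": 1455144452.21, "body": b"version A"},
    {"x_timestamp": 1455144453.07, "body": b"version B"},  # written last
])
```

Note there is no merge: the earlier write is simply discarded, which is acceptable because whole-object PUTs are atomic.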
-
Eventual Consistency
Consistency is a characteristic of the object store implementation
No standard
Different products, different architectures = different consistency models
When writing an object: container listing will not show the object until container updates are completed
When deleting an object: object may continue to appear in container listing until container updates are completed
When replacing an object: reads may return the existing version until the new version is propagated across the entire system [1]
[1] If the storage backend is strongly consistent (like a parallel file system), the new or updated object is available to all nodes as soon as the write is committed.
73
-
Object Storage Architectures
Community Swift:
Object PUTs 3x replicated
Majority of writes must succeed for a success status
Consistency daemons ensure that failed replicas are eventually written
Reads try each replica sequentially until success
Account & Container listings updated asynchronously
74
-
Object Storage Architectures
Swift with clustered file system storage:
Object storage writes a single replica
The file system is responsible for data replication
Account and Container listings updated asynchronously
Reads always go to a single replica
[Diagram: Object Nodes on top of a Clustered File System]
75
-
Object Security
Production object storage systems typically interface with a dedicated identity service like OpenStack Keystone
Simpler schemes can be used for proof of concept (Swift tempauth)
Authentication: does the user in a request have a valid password or security token?
The authentication service may integrate with an enterprise directory service using LDAP or Microsoft Active Directory
Authorization: does the user in a request have permission to execute that request?
76
-
Authentication/Authorization Example using OpenStack Swift
A client wants to upload an object to a container in project MYACCOUNT:
1. The client sends credentials to the Keystone identity service
2. Keystone verifies credentials, creates a new token and returns it to the client
3. Token contains authorization information:
a. Endpoint catalog (a list of available services)
b. Projects the requesting user is assigned to
c. Role for that project
d. Token expiration time
4. The client sends the upload request (including the token) to the object storage service
5. Object storage service verifies the token with Keystone (or with a cached copy of the token)
6. If the client has a valid role for MYACCOUNT, the upload request is implemented
77
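The check in step 6 can be sketched as follows (illustrative only; the token layout and function names are hypothetical, not the Keystone API): the token must be unexpired and must grant a role on the target project.

```python
# Minimal sketch: authorize a request against a token's project roles
# and expiration time, per steps 3 and 6 above.
import time

def authorize(token: dict, project: str) -> bool:
    if token["expires_at"] <= time.time():
        return False                       # step 3d: token has expired
    roles = token["project_roles"].get(project, [])
    return len(roles) > 0                  # steps 3b/3c: any role on this project?

token = {
    "expires_at": time.time() + 3600,      # valid for another hour
    "project_roles": {"MYACCOUNT": ["member"]},
}
ok = authorize(token, "MYACCOUNT")
```

Because the token carries its own expiration and role data, the object storage service can make this decision from a cached copy without a round trip to Keystone on every request.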
-
Object Security
Secure Data In Flight
SSL can be enabled from client to identity service, and to object storage service
Load balancer can also provide SSL termination
Secure Data at Rest
Data encryption can be provided by object storage software or by the storage backend (or by the client)
Not all data needs to be encrypted - enable encryption on a bucket or container basis
Consider maturity of the encryption implementation
External key manager vs. integral key encoding
78
-
Object Data Protection
Object storage data protection typically implemented with:
3x replication - local or geo-dispersed
Erasure coding - local or geo-dispersed
Either approach can be implemented by object storage software or delegated to the storage backend
How to protect against user error? Or application bugs?
Backups and snapshots still have their place
Snapshot and/or backup critical portions of your data
Easy to select by container, but can also select by metadata values
79
-
Introduction
File and Object Discussion
NFS
Object Storage Introduction
Swift
S3
File to Object and Object to File
Comparison Discussion
Conclusion
Outline
BREAK
80
-
Introduction
File and Object Discussion
NFS
Object Storage Introduction
Swift
S3
File to Object and Object to File
Comparison Discussion
Conclusion
Outline
81
-
OpenStack: Open Source IaaS platform & global collaboration
Mission: Create a ubiquitous open source cloud computing platform that is simple to implement and massively scalable
Scalable
Massive-scale design goals: 1 million physical machines, 60 million VMs, billions of objects stored
Controlled by the OpenStack Foundation
IBM is proud to be a Platinum Sponsor
Open
All code is Apache 2 licensed
Simple
Architecture is modular
Composed of multiple projects around four capabilities: Compute, Network, Storage, Shared services
Mar 2013: 859 contributors, 8,500 members
Oct 2014: 2,556 contributors, 16,000+ members
Exponential growth in ~1 YR
82
-
83
-
History of OpenStack Swift
Date | Release | Description
Aug 2009 | n/a | Swift development started by Rackspace
Jul 2010 | n/a | OpenStack launches with 25 member companies
Oct 2010 | Swift 1.1.0 (Austin) | First OpenStack release includes Swift & Nova
Jun 2012 | Swift 1.6.0 (Essex) | Integration with Keystone
Jun 2014 | Swift 2.0.0 (Icehouse) | Add Swift Storage Policy support
Jan 2016 | Swift 2.6.0 (Liberty) | Current release
Early OpenStack History: http://www.tiki-toki.com/timeline/entry/138134/OpenStack-History/
84
-
History of OpenStack Swift
As of June 2015: Over 300 PB of Swift storage deployed
85
-
Swift API and Semantics
OpenStack Swift is two parts: API specification & middleware description; object storage implementation
Two choices for object storage implementation:
Native Swift - can be extended, but core is Swift
API emulation - can never be 100% compatible; especially difficult to emulate middleware
API & Middleware Links:
http://developer.openstack.org/api-ref-objectstorage-v1.html
http://docs.openstack.org/developer/swift/middleware.html
86
-
High-Level on OpenStack Swift
Load balancer (e.g., HAProxy) to balance requests
Each request stateless
Proxy Nodes (public face): authorize and forward requests to the appropriate storage server(s) using the ring
Storage Nodes (account, container and object): store, serve and manage data and metadata, partitioned based upon the ring
Object mapping and layout:
Objects mapped to partitions by a hash on the fully qualified object name
Partitions mapped to virtual devices using a consistent hashing ring
87
Keystone Authentication Service (public face) Authenticates credentials and provides an access token for future requests. Users can be defined locally or in external LDAP or AD system. Also defines user roles for accounts / projects.
Additional Swift Services Maintain eventual consistency in the distributed object storage environment. Account, container & object updaters, replicators, auditors, reaper.
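The object-to-partition mapping described above can be sketched in a few lines of Python. This is a simplified illustration, not Swift's actual implementation: real Swift also mixes a per-cluster hash prefix/suffix into the MD5, and the ring then maps each partition to several devices for replication.

```python
import hashlib
import struct

PART_POWER = 10  # illustrative: a ring built with 2**10 = 1024 partitions


def object_to_partition(account, container, obj, part_power=PART_POWER):
    """Map a fully qualified object name to a ring partition number."""
    path = "/%s/%s/%s" % (account, container, obj)
    # MD5 the full path, then keep the top `part_power` bits of the
    # first 4 bytes as the partition number
    digest = hashlib.md5(path.encode("utf-8")).digest()
    (top,) = struct.unpack(">I", digest[:4])
    return top >> (32 - part_power)
```

Because the mapping is a pure function of the object name, any proxy node can locate an object's partition without consulting a central directory.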
-
Proxy Server Architecture
88
Processes all user requests
Requests & responses pass through WSGI pipeline (community and custom middleware)
Requests delegated to controller module
Controller forwards requests to account, container or object server
Responses are received by controller & passed to the client
[Diagram: Proxy Server WSGI pipeline feeding account, container and object controllers]
-
Storage Server Architecture
Object Server: reads and writes object files onto storage; pipeline for community or custom middleware; pluggable backend interface
Diskfile controls object layout on the filesystem; the SwiftOnFile diskfile provides file access to object data
Account and Container Servers: manage the listing DB for each account and container; pipeline for community or custom middleware; pluggable backend interface
Backend interface specified but no community implementations; could allow the use of directory listings instead of account and container DBs for the SwiftOnFile layout
[Diagram: Object, Account and Container Servers, each with a WSGI pipeline and pluggable backend; the Object Server's backend is the diskfile]
89
-
Anatomy of an Object Write: Client Gets a Token
90
1. Client sends token request to Keystone with credentials
2. Keystone authenticates credentials using local or external identity server
3. If credentials are OK, Keystone returns token to client
Example:
curl -i \
  -H "Content-Type: application/json" \
  -d @mycreds.json \
  http://localhost:5000/v3/auth/tokens
Clustered File System
Object Node Object Node Object Node
Keystone
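The mycreds.json file passed to curl carries a Keystone v3 password-authentication body; a minimal sketch (user, project and domain names are placeholders):

```json
{
  "auth": {
    "identity": {
      "methods": ["password"],
      "password": {
        "user": {
          "name": "demo",
          "domain": { "name": "Default" },
          "password": "secret"
        }
      }
    },
    "scope": {
      "project": {
        "name": "demo",
        "domain": { "name": "Default" }
      }
    }
  }
}
```

Keystone returns the token in the X-Subject-Token response header.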
-
Anatomy of an Object Write: Client Issues PUT request
91
1. Client sends PUT request to proxy-server with token, object URI and object data
2. Client saves the token for use until token expires
Example:
curl -i -X PUT -H "X-Auth-Token: $TOKEN" \
  http://util5:8080/v1/acct/container/newobject \
  -T vacation.mp4
Clustered File System
Object Node Object Node Object Node
Keystone
-
1. Proxy-server receives request and each middleware in pipeline looks at and optionally acts on the request
2. authtoken and keystoneauth middleware authenticate and authorize the request (against data in memcached if possible)
Anatomy of an Object Write: Proxy Processes PUT request
92
Clustered File System
Object Node Object Node Object Node
Keystone
-
1. Proxy-server adds X-Timestamp header to the request with current system time
2. Uses ring to determine where each replica of the object is to be placed: object-server IP, virtual device, partition, object URI hash
3. Passes PUT request to designated object-server(s)
4. Embedded WSGI server manages reading data a chunk at a time from the client and passing it on to the object-server
Example URI: http://util5:8080/v1/acct/container/newobject
is placed here: 192.167.12.22:$mount/object_fileset/o/z1device42/objects/13540/3bd/d39381ea07419cec19ae196149a943bd/
Anatomy of an Object Write: Proxy Processes PUT request
93
Clustered File System
Object Node Object Node Object Node
Keystone
-
Anatomy of an Object Write: Object Processes PUT request
94
1. Object-server receives PUT request and checks that it satisfies object constraints (valid timestamp, object name length within limits, etc.)
2. Creates a diskfile instance for the new object
3. Diskfile creates a tmp file and begins writing to it
4. Calculates length & md5sum for the new object as the object is written
5. When the object write is complete, writes system metadata to the object as file xattrs
6. Moves data to the location specified by the ring; filename is <timestamp>.data
7. Removes any files older than the new timestamp
Example tmp file location: $mount/object_fileset/o/z1device42/tmp/tmpVkeXj
Example object location: $mount/object_fileset/o/z1device42/objects/13540/3bd/d39381ea07419cec19ae196149a943bd/1442395677.59514.data
Clustered File System
Object Node Object Node Object Node
Keystone
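The write path above follows the classic write-to-tmp-then-rename pattern: data is made durable in a temporary file, then atomically renamed into its final location. A stripped-down sketch under those assumptions (paths and names are illustrative; real Swift also records metadata such as Content-Type in xattrs, which this sketch skips):

```python
import hashlib
import os
import time


def diskfile_put(device_dir, name_hash, data):
    """Write object data to a tmp file, then atomically move it into place."""
    tmp_dir = os.path.join(device_dir, "tmp")
    obj_dir = os.path.join(device_dir, "objects", name_hash)
    os.makedirs(tmp_dir, exist_ok=True)
    os.makedirs(obj_dir, exist_ok=True)

    md5 = hashlib.md5()
    tmp_path = os.path.join(tmp_dir, "tmp%d" % os.getpid())
    with open(tmp_path, "wb") as f:
        f.write(data)            # write and checksum as data arrives
        md5.update(data)
        f.flush()
        os.fsync(f.fileno())     # make the data durable before the rename

    # final name is the request timestamp plus ".data"
    final_path = os.path.join(obj_dir, "%.5f.data" % time.time())
    os.rename(tmp_path, final_path)  # atomic on the same filesystem
    return final_path, md5.hexdigest()
```

The rename guarantees a reader never sees a partially written object: it sees either the old version or the complete new one.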
-
Anatomy of an Object Write: Update Container and Return Status
95
1. Send request to container server to add the new object to the container listing
2. Wait a short time (2 sec) for the container server response
3. If the container update times out, write the update into the async_pending directory (the object-updater is responsible for updating container DBs with async_pending entries)
4. Return status to the proxy server, and on to the client
Example async_pending location: $mount/object_fileset/o/z1device42/async_pending/
Clustered File System
Object Server Object Server Object Server
Keystone
Container Server Container Server Container Server
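The async_pending fallback is a simple pattern: attempt the synchronous update with a short timeout, and on failure persist the update for a background daemon to replay. A sketch of that pattern (the callable and the JSON file format here are illustrative, not Swift's actual on-disk format):

```python
import json
import os
import time


def update_container(send_update, pending_dir, entry, timeout=2.0):
    """Try the container update; on timeout, queue it as async_pending."""
    try:
        send_update(entry, timeout=timeout)  # synchronous attempt
        return "updated"
    except TimeoutError:
        # park the update on disk; the object-updater replays it later
        os.makedirs(pending_dir, exist_ok=True)
        fname = os.path.join(pending_dir, "%.5f.pending" % time.time())
        with open(fname, "w") as f:
            json.dump(entry, f)
        return "async_pending"
```

This is why container listings are only eventually consistent: the object write succeeds even when the listing update is deferred.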
-
Extending Swift - Diskfile Interface
Object server diskfile: on-disk abstraction layer
Deployers can implement their own storage interface
Specialized classes for Manager, Reader & Writer
Example diskfiles: community (default), SwiftOnFile (Red Hat, IBM), swift-ceph, Seagate Kinetic, Isilon, in-memory
SwiftOnFile provides native access to object data through the filesystem interface.
[Diagram: Object Server with WSGI pipeline and pluggable diskfile backend]
96
-
Extending Swift - WSGI Middleware
API? or Implementation?
Web Server Gateway Interface:
Python standard PEP 3333
Chains together modules to process requests
Used by all OpenStack services
Middleware: pluggable modules that can be configured in the request pipeline
Specified in the service configuration file
Each middleware module has a chance to process (or change) the request coming in, and to process (or change) the response on the way out
[Diagram: Proxy Server WSGI pipeline feeding account, container and object controllers]
97
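A minimal middleware of the kind described above: it wraps the next app in the pipeline, can inspect or modify the request on the way in and the response on the way out. Names here are illustrative; real Swift middleware is wired into the pipeline via a paste-deploy style filter_factory entry point named in the config file.

```python
class HeaderTagger:
    """Toy WSGI middleware that stamps a header on every response."""

    def __init__(self, app, tag="demo"):
        self.app = app  # the next app/middleware in the pipeline
        self.tag = tag

    def __call__(self, environ, start_response):
        # the request could be inspected or rewritten here (environ dict)
        def tagged_start(status, headers, exc_info=None):
            headers = list(headers) + [("X-Tagged-By", self.tag)]
            return start_response(status, headers, exc_info)

        # delegate to the rest of the pipeline
        return self.app(environ, tagged_start)


def filter_factory(global_conf, **local_conf):
    """Entry point a paste-deploy pipeline loader would call."""
    def factory(app):
        return HeaderTagger(app, tag=local_conf.get("tag", "demo"))
    return factory
```

Because every module follows this same call convention, middleware can be chained in any order simply by editing the pipeline line in proxy-server.conf.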
-
Proxy Server Middleware
Proxy Server
WSGI Pipeline: mware-1, mware-2, ..., mware-n, proxy-server
Controllers
Operations:
GET PUT POST HEAD DEL
Client
proxy-server.conf
98
-
Extending Swift - WSGI Middleware
API? or Implementation?
authentication & authorization: auth_token, keystoneauth
multi-part upload: slo, dlo
quotas: account-quotas, container-quotas
protocol emulation: swift3, s3token
bulk operations: expand archive on upload
object versioning
container sync
rate limiting
domain remapping
static web & temporary url
profiling & monitoring
your custom middleware
http://docs.openstack.org/developer/swift/middleware.html
[Diagram: Proxy Server WSGI pipeline feeding account, container and object controllers]
99
-
Storage Policies
Used by Object Server only
Allow you to specify:
Durability levels: 1, 2 or 3x replication
Storage backends: cost vs. performance tradeoffs; storage features - encryption, compression
Grouping of storage nodes, including multi-region
Containers are permanently assigned to a policy on creation (default or explicit)
Policies can be deprecated - no new containers assigned
100
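Storage policies are declared in swift.conf, one section per policy index, with each policy backed by its own object ring; a minimal sketch (policy names are illustrative):

```ini
# swift.conf - each policy index N also needs an object-N.ring.gz
[storage-policy:0]
name = standard-3x
default = yes

[storage-policy:1]
name = reduced-2x

[storage-policy:2]
name = old-tier
deprecated = yes  ; existing containers keep working, no new assignments
```

A container picks its policy via the X-Storage-Policy header at creation time, or falls back to the default.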
-
Geo-Distributed Object Clusters: Building an Active-Active Multi-Site Storage Cloud
101
Global Distribution: ingest and access from any data center
Multi-Site Availability: objects replicated across 2 or more sites
Flexible: async or sync replication
-
Geo-Distributed Object Clusters: Architecture Details
Disaster recovery of data center failures - active-active storage cloud
Binds geo-distributed sites into an extended-capacity storage cloud
Leverages Swift replication between sites
Objects are stored in one or more regions depending on:
Required durability - data copies can be 1 to N (typically max of 3)
Required number of supported data center failures
Objects accessible from ANY site If object not local, system retrieves it from remote region
Asynchronous or synchronous replication Research on WAN acceleration technologies
Aspera or TransferSoft are examples
102
Region A
Region C
Region B
Data Center 1
Data Center 2
Data Center 3
-
Swift Authentication: Pluggable Authentication and Authorization
Three common flavors, one choice for production environments
1. Keystone: production-ready identity system
Models users, roles, projects, domains (v3) & groups (v3)
Supports integration with backend LDAP and AD
Authtoken (authentication) and keystoneauth (authorization) middleware
Authentication through separate Keystone API
2. tempauth, aka version 1: super simple; user credentials & project assignment stored in proxy-server.conf
3. swauth: user credentials & project assignment stored in Swift
103
-
Swift Authentication: Role Based Access Control
Two Swift authorization roles today:
1. operator
a. Can create, update and delete containers and objects in projects where the role is assigned
b. Can assign ACLs to control other users' access
c. operator_roles config value (proxy-server.conf) specifies Keystone roles
2. reseller_admin
a. Can operate on any account
b. reseller_admin config value (proxy-server.conf) specifies Keystone roles
Finer access control with Swift Container ACLs
104
-
Swift Additional Features
Quotas on Accounts and Containers - must have the reseller_admin role to set Account quotas
StaticWeb - serve container contents as a static web site
Versioning:
Current version in the current container; older versions in a dedicated container
Implemented in middleware (as of Swift version 2.4)
Static and Dynamic Large Objects - multi-part upload
RateLimit - limit operations on Accounts and Containers
Object Expiration
105
-
Some OpenStack Swift Issues
Community software hard to install & manage
Performance:
Standard Swift daemons scan directory metadata every 30s, decreasing performance of the entire system by increasing CPU and disk utilization
No data caching
Upcoming erasure coding can hurt performance for small objects; slow to rebuild
Inefficient to scale capacity:
Swift must re-balance partitions to add additional storage, creating potential for out-of-space conditions and requiring excessive over-provisioning and data movement
Lack of enterprise features:
Backup/snapshots/encryption
No ILM for tiering or to external storage (tape)
RAS, etc.
106
-
Get Involved! Core Swift community:
Weekly meetings on IRC
Fix bugs, improve tests, improve docs
Single process optimizations, container sharding, improved versioning, encryption, erasure codes
swiftonfile: Unified File and object access Bi-weekly meetings on IRC
swift3: Amazon S3 emulation middleware Bi-weekly meetings on IRC
107
-
Introduction
File and Object Discussion
NFS
Object Storage Introduction
Swift
S3
File to Object and Object to File
Comparison Discussion
Conclusion
Outline
108
-
History of Amazon S3 Storage & API
Date Description
June 2006 Amazon launches Simple Storage Service
2008 Amazon reports over 29 billion objects hosted by S3
2010 S3 API support for versioning, bucket policies, notifications, multi-part upload
2011 S3 API support for server side encryption, multi-object delete & object expiration
2012 S3 API support for CORS & archiving to glacier
2013 Amazon reports over 2 trillion objects hosted by S3
2014 S3 API support for life cycle versioning policies, sigv4, event notification
2015 S3 API support for cross region replication, infrequent access storage class
[Chart: approximate S3 object count (billions)]
109
-
Why Use S3 for On-Premise Storage?
Run the same apps against on-premise and cloud storage
Repatriate S3 cloud data & applications to reduce cost
Rich API and tool set
Swift3 middleware provides an emulation layer in a Swift environment
But
Some APIs may not apply on premise, e.g., torrents, payments
API is controlled by Amazon with no published extension points
On-premise implementations will not be 100% compatible
110
-
S3 models features explicitly
Middleware not required
Each resource/subresource is managed explicitly from the REST API (GET, PUT, DELETE)
But, how do you get changes into the API spec?
111
-
S3 Authentication
S3 requests authenticated using credentials:
Access Key ID (AWSAccessKeyID)
Secret Access Key (AWSSecretKey)
Two signing algorithms today:
AWS Signature V2: Secret Access Key is used to sign the request string using HMAC-SHA1
AWS Signature V4: Secret Access Key is used to create a signing key (valid for 7 days)
Each S3 request passes an authorization header constructed using one of these algorithms
Both are tedious to construct - let your client create the signature for you!
Swift3 middleware today only supports AWS Signature V2.
112
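Signature V2 can be sketched compactly: HMAC-SHA1 the canonical "string to sign" with the secret key and base64-encode the result. This is a simplified illustration - the real canonical string also folds in canonicalized x-amz-* headers and the canonicalized resource - but it shows the shape of the algorithm:

```python
import base64
import hashlib
import hmac


def sign_v2(secret_key, verb, content_md5, content_type, date, resource):
    """Build a (simplified) AWS Signature V2 signature for one request."""
    # canonical string-to-sign: newline-joined request elements
    string_to_sign = "\n".join([verb, content_md5, content_type, date, resource])
    mac = hmac.new(secret_key.encode("utf-8"),
                   string_to_sign.encode("utf-8"),
                   hashlib.sha1)
    return base64.b64encode(mac.digest()).decode("ascii")


# the resulting header sent with each request:
#   Authorization: AWS <AWSAccessKeyID>:<signature>
```

Because the server recomputes the same HMAC from its stored copy of the secret key, the key itself never travels on the wire.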
-
S3 Lifecycle and Bucket Policies
Policy resources to automate management of object storage resources
Lifecycle Policies:
Expire aged objects or object versions - example: automatically delete versions older than 90 days
Transition objects to another storage class - example: move objects from Standard to Glacier after 30 days
Combining policies - example: move from Standard to Standard_IA to Glacier to Expired
Bucket Policies:
Another way to control access to bucket resources
Allow read-only access to an anonymous user
Require MFA for bucket resources
Restrict access to specific client IP addresses
113
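A bucket policy is a JSON document attached to the bucket. The anonymous read-only case above might look like this (bucket name is a placeholder):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PublicReadOnly",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::example-bucket/*"
    }
  ]
}
```

Statements can also carry Condition blocks, which is how MFA or source-IP restrictions are expressed.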
-
S3 Access Control
S3 ACLs manage access to Buckets and Objects
Every Bucket and Object has an ACL subresource
If no ACL is specified on create, a default ACL is used giving the owner full control
ACLs consist of Grants; each Grant has a Grantee and a Permission
Up to 100 grants per ACL
Grantee types:
User: user id, user email
Group: Authenticated Users, All Users, Log Delivery Group
Note that these are the only possible groups
Permissions: READ, WRITE, READ_ACP, WRITE_ACP, FULL_CONTROL
Canned ACLs are predefined ACLs to simplify access control definition
114
-
S3 Access Control - Permissions
Permission | Granted on a Bucket | Granted on an Object
READ | Allows grantee to list objects in the bucket | Allows grantee to read object data and its metadata
WRITE | Allows grantee to create, overwrite, and delete any object in the bucket | Not applicable
READ_ACP | Allows grantee to read the bucket ACL | Allows grantee to read the object ACL
WRITE_ACP | Allows grantee to write the ACL for the bucket | Allows grantee to write the ACL for the object
FULL_CONTROL | Allows grantee READ, WRITE, READ_ACP, and WRITE_ACP permissions on the bucket | Allows grantee READ, READ_ACP, and WRITE_ACP permissions on the object
115
-
S3 Access Control - Example Default ACL
Single grant giving owner full control:
Owner: Owner-Canonical-User-ID (owner-display-name)
Grant: Owner-Canonical-User-ID (display-name) -> FULL_CONTROL
116
-
S3 Access Control - Example ACL
ACL with 2 user grants and 1 group grant:
Owner-canonical-user-ID (display-name) -> FULL_CONTROL
user1-canonical-user-ID (display-name) -> WRITE
http://acs.amazonaws.com/groups/global/AllUsers -> READ
117
-
S3 Access Control - Canned ACLs
Canned ACL | Applies To | Permissions added to ACL
private | Bucket & Object | Owner gets FULL_CONTROL. No one else has access rights (default).
public-read | Bucket & Object | Owner gets FULL_CONTROL. The AllUsers group gets READ access.
public-read-write | Bucket & Object | Owner gets FULL_CONTROL. The AllUsers group gets READ and WRITE access. Granting this on a bucket is generally not recommended.
aws-exec-read | Bucket & Object | Owner gets FULL_CONTROL. Amazon EC2 gets READ access to GET an Amazon Machine Image (AMI) bundle from Amazon S3.
authenticated-read | Bucket & Object | Owner gets FULL_CONTROL. The AuthenticatedUsers group gets READ access.
bucket-owner-read | Object only** | Object owner gets FULL_CONTROL. Bucket owner gets READ access.
bucket-owner-full-control | Object only** | Both the object owner and the bucket owner get FULL_CONTROL over the object.
log-delivery-write | Bucket only | The LogDelivery group gets WRITE and READ_ACP permissions on the bucket.
** If you specify this canned ACL when creating a bucket, Amazon S3 ignores it.
118
-
S3 Access Control - Limitations
Object PUTs reset the object ACL to the default (unless an ACL is specified in the PUT request)
If you give another user WRITE access to a bucket you own, they will be the owner of any objects they create. You will not have READ access to those objects, and won't be able to see metadata like size. You still have WRITE access from the Bucket ACL, so you can delete or overwrite them.
Caution: when granting WRITE access at the bucket level - there is no object-level WRITE access. With Bucket WRITE access, I can create or delete objects that you created.
Caution: be especially careful giving Bucket WRITE access to groups
119
-
S3 Object Versioning
Versioning enabled at the Bucket level
Objects in these buckets have a current object and 0 or more versions
PUT creates a new instance that becomes the current object
GET bucket?versions lists all object versions
GET bucket?versions&prefix=myobject lists all versions of myobject
DELETE inserts a "delete marker" but no objects are removed
DELETE myobject?version=1001 permanently deletes an object version
Undelete by deleting the marker: DELETE myobject?version=9876
GET myobject?version=1001 retrieves an older version
[Diagram: myobject versions id=1001, id=1002, and delete marker id=9876]
120
-
Validating the API
ceph-s3 tests: open source compatibility tests for S3 clones
Approximately 350 tests
Swift3 v1.9 passes approx 75% of tests
https://github.com/ceph/s3-tests
121
-
Comparing Swift and S3 Features
Feature | Swift | S3
Access Control Lists | Container | Container & Object, plus policies
Quotas | Account & Container | No API support
Versioned Objects | Y (limited functionality) | Y
Expiring Objects | Y | Y (with lifecycle policies)
Automatic Storage Tiering | Y (based on storage backend) | Y (with lifecycle policies)
Storage Policies (placement, durability, etc.) | Y | No API support
Upload Multipart Large Objects | SLO & DLO | Y
Container Synchronization | Y | Y (cross region replication)
Notification Infrastructure | Future | SNS, SQS, AWS Lambda (cloud only)
Metadata Search | Future | Future?
122
-
Swift & S3 Summary
Swift | S3
100% open source with active community that is steadily adding features | Closed source implementation (except Swift3)
Deployers and customers can influence API and features | API controlled by a single company
Documented ways to extend with middleware and diskfile changes | No documented extension points
Vendor extensions can address many of the management issues listed on the earlier Swift slide | No documented extension points
Large and growing support community | Limited options to support S3 on-premise deployments
123
-
Swift & S3 Summary
Swift | S3
API and middleware provide the feature set | Well-defined API, with features explicitly modelled
| More complete feature set: ACL and access control model, versioning support, notification service
On-premise deployment allows repatriating apps & data from the cloud | On-premise deployment allows repatriating apps & data from the cloud
Native Swift deployments are 100% compliant; API-only deployments may lack key features, especially middleware | On-premise vendors have different levels of compliance - each says "we support core features", but what are those?
Improving development ecosystem | Rich development ecosystem
124
-
Get Involved with S3 also!
swift3: Amazon S3 emulation middleware
Bi-weekly meetings on IRC
S3 versioning, lifecycle policies, bucket policies
ceph/s3-tests: improve test coverage; fix compliance bugs in Swift3
125
-
Introduction
File and Object Discussion
NFS
Object Storage Introduction
Swift
S3
File to Object and Object to File
Comparison Discussion
Conclusion
Outline
126
-
Object Storage Challenge...
The world is not object today!
(and never will be completely)
Multi-Protocol Access to the Same Dataset Can Provide Value(S3/Swift/NFS/SMB/POSIX/HDFS)
127
-
Using File to Access Objects - Primary Use Cases
1. Transition period: use the file API as a transition to the object API
2. Single management pane: manage file and object within a single system
3. Sharing data globally: create data via the file interface and share globally using the object interface
4. Analysis: many analysis tools are not a good match for object immutability semantics
5. Connecting NAS clients to object storage: home directories, shared storage from Linux clusters, etc.
128
-
File Access to Objects - NAS Gateways and Accessors
...
Swift/S3
Gateway
Accessor
GW and Accessor Use Cases:
Good for browsing files
OK for migration into object store
OK for backup tool
Optional disk cache
Caution:
Can't control users
How are users to know what works well and what doesn't?
Scalability issues
129
-
File Access to Objects - Gateway and Accessor Vendors
Example NAS Gateway Vendors:
Panzura - NAS front-end to cloud; distributed caching and link to off-premise cloud (solution includes disks)
Avere - NAS front-end to cloud
Maldivica - NAS gateway
Nasuni - NAS front-end to cloud
Riverbed - backup of branch offices
Ctera - consolidation of branch offices
Example File Accessor Vendors:
Storage Made Easy - sync-and-share; direct integration with Windows Explorer, Mac Finder; Swift-only mobile access app
Cloudberry - Windows-only object access; separate application; supports all clouds; has backup apps as well
Cyberduck/Swift Explorer - separate app for Mac, Windows, Linux with support for Swift, S3, etc.; open source
Expandrive - virtual USB drive that allows Dropbox-like access to most cloud providers
130
-
File and Object Access - Integrated Solutions
Several solutions exist that offer file and object in a single solution
Object solutions with integrated NAS gateway:
Object storage solution that directly integrates a NAS gateway
Same advantages and disadvantages as with NAS gateways
This is offered by almost every object storage vendor
Full integration of file and object support:
NAS support is just as good as a native NAS storage solution
Object support is just as good as a native object storage solution
This can include separate or the same datasets
Examples include IBM Spectrum Scale (GPFS) and Red Hat GlusterFS
131
-
File and Object Access to the Same Data - What Should It Look Like?
Research challenge: the dream of full simultaneous access
How to achieve a unified user namespace? Possible to achieve behavior similar to NFSv4+SMB3?
Should File see file semantics, and Object see object semantics? For workflows, this works quite well
e.g., ingest through file, read through object
e.g., ingest through object, analyze and update, read results through object
It's all semantics: eventual semantics vs. file semantics
Objects are allowed to just disappear... how would File deal with that?
Buckets/containers are supposed to scale without limit... but directories typically do not
Objects do not respect locks, but how does this fit with file?
Should object protocols wait on a lock? How would Object deal with the delay?
How in sync do the namespaces need to be? Across sites, maintaining strong file semantics is a challenge
Separate security, e.g., ACLs, authentication servers, interpreting LDAP/AD users
Do we need a new set of semantics?
132
-
A Way Forward: Swift-On-File
An OpenStack Swift per-bucket/container storage policy
Stores objects on any cluster/parallel file system
Objects created using the Object API can be accessed as files, and vice-versa
Newly created files are immediately accessible via Swift/S3
Newly created objects are immediately available for editing
Challenges it overcomes:
Hardens object visibility semantics to ensure read-after-write
Object namespace is eventually consistent; object data is strongly consistent
Common LDAP/AD user database for both file and object
Maintains file attributes on new object PUT
Currently working on further integrating ACLs, metadata and xattrs, etc.
Leverages file system data protection
Part of IBM Spectrum Scale 4.2 and experimental with Red Hat GlusterFS
Swift code available at https://github.com/openstack/swiftonfile
133
-
Co-Existence of Traditional and Swift-On-File
Object Ring 1
Object Ring 2
Proxy Tier
Traditional Swift storage policy
Swift-on-File storage policy
Object storage path:
-rwxr-xr-x 1 swift swift 29 Aug 22 09:25 /mnt/sdb1/2/node/sdb2/objects/981/f79/d7b843bf79/1401254393.89313.data
File System storage path:
-rwxr-xr-x 1 swift swift 29 Aug 22 09:25 /mnt/fs/container/object1
Swift/S3 user
Spectrum Scale
134
-
File in Object:
http://swift.example.com/v1/acct/cont/obj
Object in File:
/mnt/fs/acct/cont/obj
135
-
Analytics for File and Object
Analytics on File is well established
Is Object storage storing Big Data or Dead Data? If data cannot be analyzed, might as well use tape - tape is still much cheaper
Running directly through the Swift/S3 API limits functionality:
Hive and HBase (among others) lack efficient support due to the file append requirement
Plus many more...
Load imbalance due to inefficient data distribution
Large data movement on name changes
HTTP slower than RPC
Multiple network hops when writing data
Loss of data locality
136
-
Analytic Possibilities on Object Storage - No Single Solution
1. Use object storage solution's HDFS APIs: mileage will vary; performance results are specific to the analytics framework
2. Spark: targeted towards in-memory analytics; lower demands on storage depending on the application
3. Analytics tool + Tachyon: Tachyon creates an in-memory distributed storage system; not yet for production; can lower demands on the storage solution
4. Use a File + Object solution: realize native file performance
137
-
Introduction
File and Object Discussion
NFS
Object Storage Introduction
Swift
S3
File to Object and Object to File
Comparison Discussion
Conclusion
Outline
138
-
Between File and Object...
So are NFSv4, S3, and Swift really all needed?
139
-
Gross Generalization of Target Workloads
Object:
Backup (write mostly) - immutable object storage; backup => write mostly
Distribution/streaming => read mostly
Archive (write mostly) - rarely accessed data, but when needed, it must be retrieved quickly
***Note that this is what Object is today, not necessarily where it will be tomorrow
File:
It can do object workloads and much more...
User data and home directories
Applications with small to medium performance and scalability requirements
Analytics
***Note that NFS (without pNFS) is still not ideal for scientific applications that require high-throughput data access from medium to large compute clusters
140
-
Applications
Object:
Converse in whole objects
Simple API that doesn't have complicated concepts like hard links, crash consistency operations, etc.
Many short-lived TCP connections: adds latency but increases parallelism
Must tolerate eventual consistency: must be willing to retry; objects could temporarily disappear... but highly available...
Simple hierarchy makes objects hard to find: many vendors disable even listing containers/buckets; many apps keep a separate database
Must tolerate low bandwidth/high latency (this is today, so could change in future)
File:
Converse in bytes, files, inodes, file descriptors: complicated yet now familiar
Single long-lived TCP connection: a benefit, but one TCP connection is not good in the WAN
Stronger consistency, but that makes it confusing
Must be aware of scaling issues, e.g., too many files in a single directory
Data sharing has shortcomings: locking is typically only advisory and creates delays during failure (due to state)
High performance, but NFS has inherent load imbalances without pNFS
141
-
Ease of Access
Object:
Access data from anywhere on the globe
Very thin client with no optimizations
Mobile integration: iPhone includes an S3 client
More and more applications supporting native object access
To ease user transition, several startups have file-based viewers for Mac/Windows/Linux: Storage Made Easy, Cloudberry, Cyberduck, etc.
Several S3/Swift mobile apps exist as well: Storage Made Easy among many others
Use curl and build your own HTTP request
File:
NFS clients available in all OSs for laptops, desktops and servers - but not for mobile devices
Most applications today natively support POSIX
142
-
Data Protection - What Can Go Wrong...
Coordinated H/W failures
Server Failure
Disk Failure/Corruption
Rack Failure
Data Center Failure
Accidental User Error
Data Transfer Corruption between storage client to storage device
Storage software bugs
143
-
Data Protection
Object:
Object vendors writing SW from scratch - very new
Support 3-way replication and erasure coding
Object vendors currently focused on being the backup, not backing up their data:
Little attention to backup; more focus on DR support
Beware the snake oil salesman: triplication and erasure coding do not prevent data loss
Versioning: no ability to capture the entire dataset
File:
NAS vendors support a wide variety of storage systems: software based; controller based with specialized H/W; controller based with commodity H/W
Backup and DR support widely available
Snapshots widely available
144
-
Security
Object:
Typically provide multi-tenancy at the level of authentication of users
No client software required
Few if any provide data isolation
Encryption becoming more common
Each protocol has its own ACL format and granularity
HTTP-based token mechanisms work nicely for web and global access
Privacy through HTTPS
File:
Variety of authentication mechanisms
Kerberos now standard, and supports multi-tenancy, but requires client-side support
Typically used in LAN, but can work in WAN
Rich ACL format
Data transfer encryption supported
True multi-tenancy (network and data isolation) available from some vendors
145
-
Cost and Features
Object:
Current solutions sold as SW-only or SW+commodity H/W
Currently priced low (to what the market will bear)
OpenStack Swift is *free* (minus blood, sweat, and tears)
Typically simply storing data at this point; analytics support mostly in name only
Relatively easy to manage: only applies to supported vendor solutions; note this correlates with fewer features
File:
Cost can vary widely: roll your own, SW-only, SW+commodity H/W, SW+specialized H/W
Many have tape support
Viable analytics support
Enterprise vendors support multi-protocol access
Block-storage support for VMs: can support the entire OpenStack storage ecosystem, VMware, Hyper-V
146
-
Each Protocol Has Purpose and Real Value
S3
Swift
NFS
147
-
Require POSIX?
File
Proprietary applications, in-place updates, file append, locking, strong consistency
Unique to File
S3
Swift
NFS
148
-
Require Mobile or Cloud?
Object
Smartphone/tablet access, cloud-friendly security, cloud-friendly tools
Unique to Object
S3
Swift
NFS
149
-
The Overlap...today
S3
Swift
NFS
Chances are you have applications that fit in the middle as well
Today, stark differences exist between vendors, making the choice relatively easy:
Object vendors by and large have lower cost/capacity, targeting the backup/archive market
NAS vendors by and large have higher performance and are feature-rich
150
-
The Overlap...tomorrow
S3
Swift
NFS
Remember that NFS/Swift/S3 are simply protocols to access data
Nothing in Swift or S3 limits performance or future possible features
Most enterprise and advanced features are independent of protocol
Object vendors are busy working their way up the application chain
Even in-place updates can be mitigated to some degree:
Many videos are stored frame by frame, with each frame updated in its entirety
With small files, updating the entire file isn't a big deal, e.g., IoT
With better integration, maybe you won't have to decide :)
151
-
Metadata Search
It is hard to find data in both File and Object
A key issue with Object's flat namespace is finding data
Even File can become difficult with billions of files
Scalable search is becoming required to realize the value of data: find needles in unstructured haystacks
Goal is to dynamically index objects/files
Create structure of well-known system and user attributes
Tags and attributes automatically added to a database
Useful for both users and administrators
Users search based upon their tags
Administrators search based upon system attributes
E.g., account_last_activity_time, container_read_permissions, object_content_type
REST-based search API
IBM has built an open-source solution with OpenStack Swift using RabbitMQ and ElasticSearch
TAG IT → FIND IT
General use cases: data mining, data warehousing, selective data retrieval, data backup, data archival, data migration, management/reporting
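The "tag it, then find it" idea above can be sketched in a few lines. This is a toy in-memory index, not the API of the IBM/Swift/ElasticSearch solution mentioned on this slide: attributes are indexed as objects land, then queried with attribute=value pairs.

```python
from collections import defaultdict

class TagIndex:
    """Toy tag index: (attribute, value) pairs map to object names."""

    def __init__(self):
        self._by_tag = defaultdict(set)  # (attr, value) -> set of object names
        self._attrs = {}                 # object name -> its attribute dict

    def add_object(self, name, attrs):
        """Index an object's system/user attributes as it is stored."""
        self._attrs[name] = attrs
        for item in attrs.items():
            self._by_tag[item].add(name)

    def search(self, **query):
        """Return object names matching ALL attribute=value pairs."""
        sets = [self._by_tag[item] for item in query.items()]
        return set.intersection(*sets) if sets else set()

idx = TagIndex()
idx.add_object("img001", {"object_content_type": "image/jpeg", "project": "apollo"})
idx.add_object("log42", {"object_content_type": "text/plain", "project": "apollo"})
hits = idx.search(project="apollo", object_content_type="image/jpeg")
# hits == {"img001"}
```

A production system would push these tags into a real search backend (the slide names ElasticSearch, fed via RabbitMQ) and expose the query through a REST endpoint, but the indexing shape is the same.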
152
-
File vs. Object Summary
So it's not cut and dried
File is very mature, but can be complicated
Object is very immature, but all disruptive technologies are
The real question is how much of the NAS pie will Object eat?
153
-
Introduction
File and Object Discussion
NFS
Object Storage Introduction
Swift
S3
File to Object and Object to File
Comparison Discussion
Conclusion
Outline
154
-
Whew...that was a lot of info
The 5 Ws of File and Object
NFS, Swift, S3
Industry File and Object solutions
There are few easy decisions
There are some now, but it's getting harder as object vendors mature
NFS: a long history...but let's work together to advance the technology
Check out NFSv4.2 and help to make it the new default!
Swift/S3 on-premise are still emerging as standards
Object access will become an essential data access mechanism for ALL data
Get involved! Swift and NFS have active open source communities
155
-
More Information
NFSv4 IETF working group: https://datatracker.ietf.org/wg/nfsv4
NFSv4 RFC: http://www.ietf.org/rfc/rfc3530.txt
NFSv4.1 RFC: http://www.ietf.org/rfc/rfc5661.txt
NFSv4.2 RFC Draft: https://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-41
Ganesha: http://nfs-ganesha.sourceforge.net
SNIA white papers & tutorials on NFS:
https://www.brighttalk.com/search?duration=0..&keywords[]=nfs&q=snia&rank=webcast_relevance
http://www.snia.org/sites/default/files/SNIA_An_Overview_of_NFSv4-3_0.pdf
http://www.snia.org/sites/default/files/Migrating_to_NFSv4_v04_-Final.pdf
http://www.snia.org/sites/default/files/ChuckLever_Introducing_FedFS_On_Linux.pdf
Original pNFS paper - Exporting Storage Systems in a Scalable Manner with pNFS, MSST05: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.76.3177&rep=rep1&type=pdf
NFS XATTR Draft: https://tools.ietf.org/html/draft-naik-nfsv4-xattrs-02
156
-
More Information
NFS FAQ: http://nfs.sourceforge.net/
Virtual Machine Workloads: The Case for New Benchmarks for NAS, FAST13: https://www.usenix.org/system/files/conference/fast13/fast13-final84.pdf
Newer Is Sometimes Better: An Evaluation of NFSv4.1, SIGMETRICS15: https://www.fsl.cs.sunysb.edu/docs/nfs4perf/nfs4perf-sigm15.pdf
All File Systems Are Not Created Equal: On the Complexity of Crafting Crash-Consistent Applications, OSDI14: http://research.cs.wisc.edu/wind/Publications/alice-osdi14.pdf
Boosting the Power of Swift Using Metadata Search: https://www.youtube.com/watch?v=_bODZWvIprY
From Archive to Insight: Debunking Myths of Analytics on Object Stores: https://www.youtube.com/watch?v=brhEUptD3JQ
Swift 101: Technology and Architecture for Beginners: https://www.openstack.org/summit/openstack-paris-summit-2014/session-videos/presentation/swift-101-technology-and-architecture-for-beginners
Building Applications with Swift: The Swift Developer On-Ramp: https://www.openstack.org/summit/openstack-paris-summit-2014/session-videos/presentation/building-applications-with-swift-the-swift-developer-on-ramp
157
-
More Information
Building web-applications using OpenStack Swift:
https://www.openstack.org/summit/tokyo-2015/videos/presentation/building-web-applications-using-openstack-swift
SwiftOnFile Project https://github.com/openstack/swiftonfile
Swift3 Project https://github.com/openstack/swift3
ceph/s3-tests Project https://github.com/ceph/s3-tests
158
-
BACKUP
159
-
What is Object Storage?
Multi-Site Cloud Storage
Multi-Tenancy
Simpler management and flatter namespace
Simple APIs and semantics (Swift/S3 and whole-file updates)
Scalable metadata access
Scalable and highly-available
Versioning
Ubiquitous access
160
-
Data Protection
In the context of what can actually go wrong (and not what is only likely to go wrong)
Disk Failure/Corruption
Object: per-object auditing common; low coverage; erasure coding
File: per-file or block auditing is vendor specific; typically high coverage
Server Failure
Object: erasure coding or triplication
File: high-end supports erasure coding; low-end has no support
Rack Failure
Object: erasure coding or triplication
File: high-end supports erasure coding
Data Center Failure
Object: erasure coding or replication; scalability can be a concern...
File: high-end supports replication at the file or block level
User Error
Object: per-object versioning; S3 supports undelete
File: snapshots (dataset consistent); backup
Data Transfer Corruption
Object: end-to-end checksums vendor specific
File: end-to-end checksums vendor specific; backup
Storage Software Bugs
Object: typically lack scalable backup
File: backup
Coordinated H/W Failures
Object: typically lack scalable backup
File: backup
161
-
File and Object Security Comparison
Authentication
Object: standard APIs, both standard and custom implementations; designed for global access; userid/password or certificate; many support an enterprise directory service (LDAP/AD)
File: standard (Kerberos); typically not globally accessible; userid/password or certificate; many support an enterprise directory service (LDAP/AD)
Authorization
Object: ACLs (of varying granularity)
File: NFSv4 and POSIX ACLs
Data Privacy
Object: HTTPS
File: Kerberos, IPsec
Multi-Tenancy
Object: typically software-based separation; shared servers and storage for everyone
File: typically software-based separation; high-end vendors can provide physical separation as well
162
-
S3 Authentication Signature V2 (Backup)
Access Key ID (AWSAccessKeyID)
Secret Access Key (AWSSecretKey)
signature = Base64( HMAC-SHA1( AWSSecretKey, UTF-8-Encoding-Of( StringToSign )))
StringToSign = HTTP-Verb + "\n" + Content-MD5 + "\n" + Content-Type + "\n" + Date + "\n" + CanonicalizedAmzHeaders + CanonicalizedResource
Authorization Header:
-H Authorization: AWS awsaccesskeyid:signature
http://docs.aws.amazon.com/AmazonS3/latest/dev/RESTAuthentication.html
163
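The V2 signature formula above maps directly onto Python's standard library. A minimal sketch; the access key and secret below are placeholders, not real credentials:

```python
import base64
import hashlib
import hmac

def string_to_sign(verb, content_md5, content_type, date,
                   canonicalized_amz_headers, canonicalized_resource):
    """Assemble StringToSign exactly as the formula on this slide."""
    # CanonicalizedAmzHeaders carries its own trailing newlines when present
    return (verb + "\n" + content_md5 + "\n" + content_type + "\n" +
            date + "\n" + canonicalized_amz_headers + canonicalized_resource)

def sign_v2(secret_key, sts):
    """signature = Base64(HMAC-SHA1(AWSSecretKey, UTF-8(StringToSign)))"""
    digest = hmac.new(secret_key.encode("utf-8"),
                      sts.encode("utf-8"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")

# Placeholder request and credentials for illustration
sts = string_to_sign("GET", "", "", "Tue, 27 Mar 2007 19:36:42 +0000",
                     "", "/examplebucket/photos/puppy.jpg")
sig = sign_v2("wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY", sts)
header = "Authorization: AWS AKIAIOSFODNN7EXAMPLE:" + sig
```

Because SHA-1 produces a 20-byte digest, the Base64 signature is always 28 characters.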
-
S3 Authentication Signature V4 (Backup)
Access Key ID (AWSAccessKeyID)
Secret Access Key (AWSSecretKey)
Authorization Header:
-H Authorization: AWS4-HMAC-SHA256 Credential=awsaccesskeyid/20160220/us-east-1/s3/aws4_request, SignedHeaders=host;range;x-amz-date, Signature=signature
http://docs.aws.amazon.com/AmazonS3/latest/API/sig-v4-header-based-auth.html
164
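Unlike V2, Signature V4 derives a per-day, per-region, per-service signing key through a chain of HMAC-SHA256 operations, matching the credential scope shown in the header above (20160220/us-east-1/s3/aws4_request). The sketch below shows only the key derivation and final signing; building the canonical request and StringToSign is omitted, and "example-string-to-sign" plus the credentials are placeholders:

```python
import hashlib
import hmac

def _hmac(key, msg):
    """HMAC-SHA256 of a UTF-8 string, returning raw bytes for chaining."""
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()

def v4_signing_key(secret_key, date, region, service):
    """Derive the SigV4 signing key: date -> region -> service -> aws4_request."""
    k_date = _hmac(("AWS4" + secret_key).encode("utf-8"), date)
    k_region = _hmac(k_date, region)
    k_service = _hmac(k_region, service)
    return _hmac(k_service, "aws4_request")

def v4_signature(signing_key, sts):
    """Final signature is hex-encoded, not Base64 as in V2."""
    return hmac.new(signing_key, sts.encode("utf-8"),
                    hashlib.sha256).hexdigest()

# Scope matching the slide's example header (placeholder credentials)
key = v4_signing_key("wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
                     "20160220", "us-east-1", "s3")
sig = v4_signature(key, "example-string-to-sign")
auth = ("AWS4-HMAC-SHA256 "
        "Credential=AKIAIOSFODNN7EXAMPLE/20160220/us-east-1/s3/aws4_request, "
        "SignedHeaders=host;range;x-amz-date, Signature=" + sig)
```

Scoping the key this way means a leaked signing key is only useful for one service, region, and day, which is one of the main security improvements of V4 over V2.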