Exoscale: Pithos, your personal S3 object store on Cassandra
@pyr, CTO at Exoscale, Swiss Cloud Hosting.
Open-source developer: pithos, cyanite, riemann, collectd.
AIM OF THIS TALK
Presenting object storage
Showcasing efficient uses of object storage
Presenting pithos
Feedback on usage
OUTLINE
Object Storage 101
6 things you should do with S3
Pithos, your personal object store
Pithos in production
OBJECT STORAGE 101
THE ELEVATOR PITCH
"Object storage is a storage architecture that manages data as objects."
(Wikipedia)
INCEPTION
Asset and content storage for large hosting platforms.
LiveJournal's MogileFS.
A shift in how we perceive distributed storage.
ESSENTIAL PROPERTIES
No POSIX guarantees.
No atomicity.
Eventual consistency.
Pushes some responsibility back to the application.
THE OBJECT STORAGE LANDSCAPE
Mostly hosted solutions:
AWS S3
Rackspace Cloud Files
DreamObjects
Exoscale SOS
No real API standardisation: AWS S3 is the de facto standard.
THE ON-PREMISE OBJECT STORAGE LANDSCAPE
Some vendor-backed solutions:
EMC Atmos
Scality
Cloudian
Open-source solutions:
Swift
Ceph
Riak CS
Pithos
A TYPICAL OBJECT STORE REQUEST
# curl -X PUT --data-binary @file.txt https://mybucket.myprovider.com/some-file.txt
# curl https://mybucket.myprovider.com/some-file.txt
S3 TERMINOLOGY
Region: determines where objects will be stored.
Storage class: storage properties for objects.
Bucket: a named container for objects.
Object: a file.
THE S3 API
A global bucket namespace
Artificial hierarchy support
Authentication and authorization through ACLs
Multipart uploads
CORS support & form-based uploads
Eventual consistency
A GLOBAL BUCKET NAMESPACE
A single consistent namespace for buckets, across tenants.
There can be only one (Highlander) bucket with a given name.
A bucket is located within a region.
HIERARCHY SUPPORT
Listing requests may supply a delimiter and a prefix.
Emulates directories when keys contain slashes.
HIERARCHY SUPPORT
GET /?delimiter=/ HTTP/1.1
Host: mybucket.service.uri
Date: <date>
Authorization: AWS <key>:<signature>
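The `Authorization: AWS <key>:<signature>` header above follows the S3 V2 signing scheme: an HMAC-SHA1 over a canonical "string to sign", base64-encoded. Below is a minimal sketch of that computation. It assumes a virtual-host style request, where the bucket name goes into the canonicalized resource, and uses the AWS documentation example key pair. It is not pithos's own code.

```python
import base64
import hashlib
import hmac
from typing import Dict, Optional


def sign_v2(secret_key: str, verb: str, date: str, resource: str,
            content_md5: str = "", content_type: str = "",
            amz_headers: Optional[Dict[str, str]] = None) -> str:
    """Return the signature part of 'Authorization: AWS <key>:<signature>' (S3 signature V2)."""
    # x-amz-* headers are lower-cased, sorted by name and appended as "name:value\n".
    # Simplification: repeated headers are not folded into a comma-separated value.
    canonical_amz = "".join(
        f"{name.lower()}:{value.strip()}\n"
        for name, value in sorted((amz_headers or {}).items(), key=lambda kv: kv[0].lower())
    )
    string_to_sign = f"{verb}\n{content_md5}\n{content_type}\n{date}\n{canonical_amz}{resource}"
    digest = hmac.new(secret_key.encode(), string_to_sign.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()


# The listing request above: ?delimiter= is not one of the signed sub-resources,
# so the canonicalized resource is just the bucket root.
signature = sign_v2(
    secret_key="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
    verb="GET",
    date="Fri, 17 Oct 2014 12:35:10 +0000",
    resource="/mybucket/",
)
print(f"Authorization: AWS AKIAIOSFODNN7EXAMPLE:{signature}")
```

The server recomputes the same HMAC using the secret it looks up for the access key, then compares the two values. The secret itself never travels over the wire.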
HIERARCHY SUPPORT
<?xml version="1.0" encoding="UTF-8"?>
<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>batman</Name>
  <Prefix></Prefix>
  <MaxKeys>100</MaxKeys>
  <Delimiter>/</Delimiter>
  <IsTruncated>false</IsTruncated>
  <Contents>
    <Key>sample.txt</Key>
    <LastModified>2014-10-17T12:35:10.423Z</LastModified>
    <ETag>"a4b7923f7b2df9bc96fb263978c8bc40"</ETag>
    <Size>1603</Size>
    <Owner>
      <ID>[email protected]</ID>
      <DisplayName>[email protected]</DisplayName>
    </Owner>
    <StorageClass>Standard</StorageClass>
  </Contents>
</ListBucketResult>
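The "artificial hierarchy" is computed entirely at listing time. Keys stay flat, and the delimiter only decides how keys are grouped in the response. Below is a minimal sketch of that grouping over an in-memory key list. The function name and sample keys are illustrative.

```python
from typing import Iterable, List, Tuple


def list_objects(keys: Iterable[str], prefix: str = "",
                 delimiter: str = "") -> Tuple[List[str], List[str]]:
    """Emulate an S3 listing over flat keys: return (contents, common_prefixes)."""
    contents: List[str] = []
    common_prefixes: List[str] = []
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            # Everything up to and including the first delimiter after the prefix
            # is rolled up into a single "directory" entry.
            rolled_up = prefix + rest.split(delimiter, 1)[0] + delimiter
            # Keys sharing a rolled-up prefix are adjacent once sorted.
            if not common_prefixes or common_prefixes[-1] != rolled_up:
                common_prefixes.append(rolled_up)
        else:
            contents.append(key)
    return contents, common_prefixes


keys = ["sample.txt", "photos/2014/a.jpg", "photos/b.jpg", "docs/readme"]
print(list_objects(keys, delimiter="/"))                   # (['sample.txt'], ['docs/', 'photos/'])
print(list_objects(keys, prefix="photos/", delimiter="/"))  # (['photos/b.jpg'], ['photos/2014/'])
```

In a real response, rolled-up entries appear as `<CommonPrefixes>` elements next to `<Contents>`.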
AUTHENTICATION & AUTHORIZATION THROUGH ACLS
Simple canned ACLs cover common settings, e.g. public-read.
An XML syntax is also available for finer-grained grants.
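A canned ACL is shorthand for a fixed list of grants. Below is a minimal sketch of expanding the common canned ACLs and checking a request against them. The grant model is simplified: the names `AllUsers` and `AuthenticatedUsers` stand in for the S3 group URIs, and pithos's internal representation may differ.

```python
from typing import List, Optional, Tuple

Grant = Tuple[str, str]  # (grantee, permission)

# Simplified grant model: "AllUsers" / "AuthenticatedUsers" stand in for the S3 group URIs.
CANNED_ACLS = {
    "private": [],
    "public-read": [("AllUsers", "READ")],
    "public-read-write": [("AllUsers", "READ"), ("AllUsers", "WRITE")],
    "authenticated-read": [("AuthenticatedUsers", "READ")],
}


def expand_canned_acl(name: str, owner: str) -> List[Grant]:
    """Turn a canned ACL name (as sent in the x-amz-acl header) into explicit grants."""
    if name not in CANNED_ACLS:
        raise ValueError(f"unknown canned ACL: {name}")
    # Every canned ACL keeps FULL_CONTROL for the owner.
    return [(owner, "FULL_CONTROL")] + CANNED_ACLS[name]


def is_allowed(grants: List[Grant], requester: Optional[str], permission: str) -> bool:
    """requester is None for anonymous requests."""
    for grantee, granted in grants:
        if granted not in (permission, "FULL_CONTROL"):
            continue
        if grantee == "AllUsers":
            return True
        if grantee == "AuthenticatedUsers" and requester is not None:
            return True
        if grantee == requester:
            return True
    return False


grants = expand_canned_acl("public-read", owner="alice")
print(is_allowed(grants, None, "READ"))   # True: anyone may read
print(is_allowed(grants, None, "WRITE"))  # False: only the owner may write
```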
MULTIPART UPLOADS
Allows uploading a large file as several parts.
User-controlled re-aggregation step.
Limits the impact of upload failures for large files.
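Below is a minimal in-memory sketch of the multipart flow: cut the payload into numbered parts, then let the client decide when and in which order they are stitched together. Two points are assumptions. The 5 MiB minimum part size is the documented AWS limit and may differ in pithos. The ETag formula is a widely observed AWS convention, not a formal guarantee.

```python
import hashlib
from typing import Dict, List, Tuple

# S3 requires every part except the last to be at least 5 MiB.
MIN_PART_SIZE = 5 * 1024 * 1024


def split_into_parts(data: bytes, part_size: int = MIN_PART_SIZE) -> Dict[int, bytes]:
    """Cut a payload into parts numbered from 1, as S3 part numbers are."""
    return {
        number: data[offset:offset + part_size]
        for number, offset in enumerate(range(0, len(data), part_size), start=1)
    }


def complete_upload(parts: Dict[int, bytes], order: List[int]) -> Tuple[bytes, str]:
    """The user-controlled re-aggregation step: stitch the listed parts together."""
    if order != sorted(order):
        raise ValueError("part numbers must be listed in ascending order")
    missing = [number for number in order if number not in parts]
    if missing:
        raise KeyError(f"parts never uploaded: {missing}")
    body = b"".join(parts[number] for number in order)
    # Widely observed convention: the multipart ETag is the MD5 of the concatenated
    # binary part MD5s, suffixed with the number of parts.
    part_digests = b"".join(hashlib.md5(parts[number]).digest() for number in order)
    etag = f"{hashlib.md5(part_digests).hexdigest()}-{len(order)}"
    return body, etag


data = b"0123456789AB"
parts = split_into_parts(data, part_size=5)  # tiny parts, for illustration only
parts[2] = b"56789"                           # re-uploading a failed part replaces only that part
body, etag = complete_upload(parts, [1, 2, 3])
```

This is why failures are cheap: a broken connection costs one part, not the whole file.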
CORS SUPPORT AND FORM-BASED UPLOADS
Web interaction without any backend components.
CORS setup through an XML configuration syntax.
Form-based uploads through pre-signed requests.
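The CORS configuration is a small XML document attached to a bucket. In S3 it is sent with `PUT /?cors`. Below is a minimal sketch that builds a single-rule document with the standard library. The origin and methods are placeholders, and how strictly pithos validates the document is not covered here.

```python
import xml.etree.ElementTree as ET
from typing import Iterable


def cors_configuration(origins: Iterable[str], methods: Iterable[str],
                       headers: Iterable[str] = ("*",), max_age: int = 3000) -> str:
    """Build a single-rule S3 CORS document."""
    root = ET.Element("CORSConfiguration")
    rule = ET.SubElement(root, "CORSRule")
    for origin in origins:
        ET.SubElement(rule, "AllowedOrigin").text = origin
    for method in methods:
        ET.SubElement(rule, "AllowedMethod").text = method
    for header in headers:
        ET.SubElement(rule, "AllowedHeader").text = header
    # How long browsers may cache the preflight (OPTIONS) response.
    ET.SubElement(rule, "MaxAgeSeconds").text = str(max_age)
    return ET.tostring(root, encoding="unicode")


print(cors_configuration(["https://www.example.com"], ["GET", "PUT", "POST"]))
```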
EVENTUAL CONSISTENCY
An easy sell at Cassandra Summit.
Possible delay between PUT and GET availability.
Checksums avoid massive inconsistencies.
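On the application side, "pushing responsibility back" can be as simple as comparing the MD5 of what was written with what is read back, and retrying while they differ. Below is a minimal sketch. `get` is a hypothetical callable standing in for any S3 client, where `None` means "not visible yet". For single-part uploads, the ETag returned on PUT is commonly this MD5.

```python
import hashlib
import time
from typing import Callable, Optional


def read_after_write(get: Callable[[str], Optional[bytes]], key: str, expected_md5: str,
                     attempts: int = 5, delay: float = 0.1) -> bytes:
    """Poll until a GET returns exactly the bytes that were PUT, using the MD5 as the check."""
    for attempt in range(attempts):
        body = get(key)  # None stands for "not visible yet" (e.g. a 404)
        if body is not None and hashlib.md5(body).hexdigest() == expected_md5:
            return body
        if attempt < attempts - 1:
            time.sleep(delay * 2 ** attempt)  # exponential backoff between polls
    raise TimeoutError(f"{key} did not converge after {attempts} attempts")


# A fake store standing in for an S3 client: the object becomes visible on the third read.
reads = []


def flaky_get(key: str) -> Optional[bytes]:
    reads.append(key)
    return None if len(reads) < 3 else b"hello"


payload = read_after_write(flaky_get, "greeting.txt", hashlib.md5(b"hello").hexdigest(), delay=0.01)
```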
6 THINGS TO DO WITH S3
12-FACTOR APP SUPPORT FOR PERSISTENCE
Eliminates the need for NFS.
Eases interaction with PaaS-type platforms.
http://12factor.net/
STATIC CONTENT HOSTING
Perfect for hosting CSS, JS and other static assets.
Simply requires setting a bucket's ACL to public.
FORM-BASED UPLOADS
Pre-signed requests.
Requests encapsulate a policy.
No proxying to the S3 service required.
Great for supporting user-generated content.
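A pre-signed form upload works by having the backend sign a policy once. The browser then posts the file straight to the object store. Below is a minimal sketch of the V2-style (HMAC-SHA1) POST policy. The bucket, key, limits and key pair are placeholders. The sketch assumes V2 signing, since V4 signatures are not yet supported in pithos.

```python
import base64
import hashlib
import hmac
import json
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional


def presigned_post_fields(access_key: str, secret_key: str, bucket: str, key: str,
                          acl: str = "private", expires_in: int = 3600,
                          max_size: int = 10 * 1024 * 1024,
                          now: Optional[datetime] = None) -> Dict[str, str]:
    """Hidden form fields for a browser POST upload, signed V2-style (HMAC-SHA1)."""
    now = now or datetime.now(timezone.utc)
    policy = {
        "expiration": (now + timedelta(seconds=expires_in)).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "conditions": [
            {"bucket": bucket},
            {"key": key},  # exact match; ["starts-with", "$key", "uploads/"] allows a prefix
            {"acl": acl},
            ["content-length-range", 0, max_size],
        ],
    }
    policy_b64 = base64.b64encode(json.dumps(policy).encode()).decode()
    signature = base64.b64encode(
        hmac.new(secret_key.encode(), policy_b64.encode(), hashlib.sha1).digest()
    ).decode()
    # The browser posts these fields plus a final "file" field straight to the bucket URL.
    return {"key": key, "AWSAccessKeyId": access_key, "acl": acl,
            "policy": policy_b64, "signature": signature}


fields = presigned_post_fields("AKIAIOSFODNN7EXAMPLE", "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
                               bucket="mybucket", key="uploads/avatar.png",
                               now=datetime(2015, 1, 1, tzinfo=timezone.utc))
```

The policy is what makes this safe for user-generated content: the store rejects any upload that falls outside the signed conditions.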
ARTIFACT STORAGE
Supported in Maven.
Supported in Docker Registry.
Supported in Apt.
Supported in the Mesos fetcher.
BACKUPS
Great open-source options like duplicity.
Commercial storage gateway support.
Some home NAS-type products support S3 as well.
CLIENT-SIDE ENCRYPTION
GPG encryption support.
Guarantees full data ownership, even when leveraging third-party providers.
Don't lose your keys!
PITHOS, YOUR PERSONAL OBJECT-STORE
FROM THE WEBSITE
"Pithos is a daemon which provides an S3-compatible frontend for storing files in a Cassandra cluster."
WHY?
Provide your own S3-compatible service (that's us!).
Restricted from using hosted object-storage services.
Willingness to fully own availability.
PITHOS ESSENTIAL PROPERTIES
Extensive S3 API coverage.
Fully stateless.
Multi-region support.
Fully Cassandra-backed.
Extensible.
Open source.
MISC.
Runs on the JVM.
Written in Clojure.
Small codebase (~5300 LoC).
Can run an embedded Cassandra for testing purposes.
PITHOS ARCHITECTURE
A daemon built out of 5 isolated and pluggable components.
PITHOS ARCHITECTURE
Keystore
Bucketstore
Metastore
Blobstore
Reporter
OVERALL CONCEPT
THE KEYSTORE
Authentication & authorization data is handled outside of pithos.
The only component which doesn't rely on Cassandra by default.
The default implementation relies on the pithos configuration file.
Maps an API key to its credentials.
An example alternative implementation is given in the documentation.
THE KEYSTORE
{
  "tenant": "tenant name",
  "secret": "secret key",
  "memberof": ["group1", "group2"]
}
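The keystore contract is small: an access key goes in, and a record shaped like the JSON above comes out. Below is a minimal sketch of a mapping-backed keystore in that spirit. `MapKeystore`, `credentials` and `is_member` are hypothetical names, not pithos's actual protocol.

```python
from typing import Dict, List, TypedDict


class Credentials(TypedDict):
    tenant: str
    secret: str
    memberof: List[str]


class MapKeystore:
    """Hypothetical keystore backed by a plain mapping: access key in, credentials out."""

    def __init__(self, keys: Dict[str, Credentials]) -> None:
        self._keys = dict(keys)

    def credentials(self, access_key: str) -> Credentials:
        try:
            return self._keys[access_key]
        except KeyError:
            raise PermissionError(f"unknown access key: {access_key}") from None

    def is_member(self, access_key: str, group: str) -> bool:
        # "memberof" lists the groups this key belongs to.
        return group in self.credentials(access_key)["memberof"]


keystore = MapKeystore({
    "AKIAIOSFODNN7EXAMPLE": {"tenant": "tenant name", "secret": "secret key",
                             "memberof": ["group1", "group2"]},
})
```

Because the lookup sits behind one small interface, swapping the mapping for a database or directory service does not affect the other components.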
THE BUCKETSTORE
Stores essential bucket properties:
Bucket tenant.
Region and storage class where the bucket is located.
Optional CORS properties.
THE BUCKETSTORE
Bucket ownership is transactional.
Cassandra is not best suited for this task.
Its lightweight transactions help.
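With a global bucket namespace, two tenants racing to create the same bucket must not both succeed. Cassandra's lightweight transactions (`IF NOT EXISTS`) give exactly that first-writer-wins behaviour. Below is a minimal sketch. It pairs the CQL shape with an in-memory simulation of its semantics. The table and column names are hypothetical, not pithos's actual schema.

```python
import threading
from typing import Dict, Tuple

# The kind of CQL lightweight transaction involved (table and column names are hypothetical):
CREATE_BUCKET_CQL = (
    "INSERT INTO bucket (bucket, tenant, region) VALUES (?, ?, ?) IF NOT EXISTS"
)


class InMemoryBucketstore:
    """Mimics what IF NOT EXISTS guarantees: the first writer wins, and losers
    get back the row that already exists (Cassandra's [applied] = false case)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # stands in for Cassandra's Paxos-based coordination
        self._buckets: Dict[str, Dict[str, str]] = {}

    def create_bucket(self, bucket: str, tenant: str, region: str) -> Tuple[bool, Dict[str, str]]:
        with self._lock:
            existing = self._buckets.get(bucket)
            if existing is not None:
                return False, dict(existing)
            self._buckets[bucket] = {"tenant": tenant, "region": region}
            return True, dict(self._buckets[bucket])


store = InMemoryBucketstore()
first = store.create_bucket("batman", "alice", "ch-dk-2")
second = store.create_bucket("batman", "bob", "ch-dk-2")  # loses: the bucket is taken
```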
THE BUCKETSTORE
{
  "bucket": "batman",
  "created": "2012-01-01 01:30:00",
  "tenant": "[email protected]",
  "region": "ch-dk-2",
  "acl": "...",
  "cors": "..."
}
THE METASTORE
Stores all object details.
References an inode and a version in the blobstore.
Using the path as the key within a wide row ensures keys are sorted.
THE METASTORE
{
  "bucket": "test",
  "object": "file.txt",
  "inode": "4e682d3d-28fa-4ea6-aa28-282c2757f31b",
  "version": "c97894cd-e2cd-46d5-a217-1add544e88a4",
  "atime": "2012-01-01 01:30:00",
  "size": 1024,
  "checksum": "d41d8cd98f00b204e9800998ecf8427e",
  "storageclass": "standard",
  "acl": "...",
  "metadata": { }
}
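Keeping object paths sorted within a bucket turns a prefix listing into a range scan. You seek to the first key at or after the prefix (or just after the pagination marker), then stop at the first key that no longer matches. Below is a minimal sketch in which a sorted Python list stands in for Cassandra's clustering order. The function name and parameters are illustrative.

```python
import bisect
from typing import List, Tuple


def scan_prefix(sorted_keys: List[str], prefix: str = "", marker: str = "",
                max_keys: int = 1000) -> Tuple[List[str], bool]:
    """Return (page, is_truncated) for keys starting with prefix and strictly after marker."""
    if marker and marker >= prefix:
        start = bisect.bisect_right(sorted_keys, marker)  # resume after the last key seen
    else:
        start = bisect.bisect_left(sorted_keys, prefix)   # jump straight to the prefix
    page: List[str] = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # sorted order: no later key can match the prefix again
        if len(page) == max_keys:
            return page, True
        page.append(key)
    return page, False


keys = sorted(["a.txt", "photos/1.jpg", "photos/2.jpg", "photos/3.jpg", "z.txt"])
first_page, truncated = scan_prefix(keys, prefix="photos/", max_keys=2)
next_page, _ = scan_prefix(keys, prefix="photos/", marker=first_page[-1], max_keys=2)
```

The work is proportional to the page size rather than the bucket size. That is the payoff of storing paths in sorted order.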
THE BLOBSTORE
Stores data.
Inodes are lists of blocks.
Blocks are lists of chunks.
Chunks hold small (128k) pieces of the file.
THE BLOBSTORE
Not what Cassandra was meant for.
Works surprisingly well.
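Below is a minimal sketch of the inode, block and chunk layout. It splits a payload into chunks of at most `max_chunk` bytes, then groups at most `max_block_chunk` chunks per block. The defaults mirror `max-chunk: "128k"` and `max-block-chunk: 1024` from the configuration, assuming "128k" means 131072 bytes. Keying blocks by their byte offset is an illustrative choice, not necessarily pithos's schema.

```python
from typing import List, Tuple

Block = Tuple[int, List[bytes]]  # (byte offset of the block, its chunks)


def layout(data: bytes, max_chunk: int = 128 * 1024,
           max_block_chunk: int = 1024) -> List[Block]:
    """Split an object into blocks of at most max_block_chunk chunks of max_chunk bytes."""
    block_size = max_chunk * max_block_chunk
    return [
        (offset, [data[i:i + max_chunk]
                  for i in range(offset, min(offset + block_size, len(data)), max_chunk)])
        for offset in range(0, len(data), block_size)
    ]


def reassemble(blocks: List[Block]) -> bytes:
    """Concatenate chunks in block order to rebuild the original object."""
    return b"".join(chunk for _, chunks in blocks for chunk in chunks)


data = bytes(range(256)) * 10                      # 2560 bytes
blocks = layout(data, max_chunk=100, max_block_chunk=4)  # small sizes, for illustration only
```

Small chunks keep each Cassandra cell modest, and blocks bound the size of any one partition. Under the assumption above, the defaults give blocks of up to 128 MiB.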
THE REPORTER
Emits useful usage information.
A good basis for building billing extensions.
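As an illustration of a billing extension built on reporter output, below is a sketch that folds usage events into stored bytes and request counts per tenant. The event shape (`tenant`, `op`, `size`) is entirely hypothetical. Pithos's actual events may carry different fields.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Tuple

# Hypothetical event shape: {"tenant": str, "op": "put" | "delete" | "get", "size": int}
Event = Dict[str, Any]


def usage_by_tenant(events: Iterable[Event]) -> Tuple[Dict[str, int], Counter]:
    """Fold usage events into stored bytes and request counts per tenant."""
    stored: Dict[str, int] = {}
    requests: Counter = Counter()
    for event in events:
        tenant, op = event["tenant"], event["op"]
        requests[(tenant, op)] += 1
        if op == "put":
            stored[tenant] = stored.get(tenant, 0) + event["size"]
        elif op == "delete":
            stored[tenant] = stored.get(tenant, 0) - event["size"]
    return stored, requests


events = [
    {"tenant": "alice", "op": "put", "size": 150},
    {"tenant": "alice", "op": "delete", "size": 50},
    {"tenant": "bob", "op": "put", "size": 10},
    {"tenant": "bob", "op": "get"},
]
stored, requests = usage_by_tenant(events)
```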
CONFIGURATION
A single configuration file configures all aspects:
Logging & server options.
Keystore, bucketstore, metastore and blobstore.
Each can have its own settings and its own Cassandra cluster.
CONFIGURATION
service:
  host: "0.0.0.0"
  port: 8080
logging:
  level: info
  console: true
  overrides:
    io.pithos: debug
options:
  service-uri: s3.example.com
  default-region: myregion
CONFIGURATION
keystore:
  keys:
    AKIAIOSFODNN7EXAMPLE:
      tenant: [email protected]
      secret: 'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY'
bucketstore:
  default-region: myregion
  cluster: "localhost"
  keyspace: storage
regions:
  myregion:
    metastore:
      cluster: "localhost"
      keyspace: storage
    storage-classes:
      standard:
        cluster: "localhost"
        keyspace: storage
        max-chunk: "128k"
        max-block-chunk: 1024
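To make the nesting concrete, below is a sketch that resolves which cluster and keyspace store a given region's storage class, using the configuration above after YAML parsing. Two things are assumptions: the `blobstore_target` helper is hypothetical, and size suffixes are read as binary units ("128k" = 131072 bytes).

```python
import re
from typing import Any, Dict, Optional, Union

# The regions section above, as it would look once parsed from YAML.
CONFIG: Dict[str, Any] = {
    "bucketstore": {"default-region": "myregion", "cluster": "localhost", "keyspace": "storage"},
    "regions": {
        "myregion": {
            "metastore": {"cluster": "localhost", "keyspace": "storage"},
            "storage-classes": {
                "standard": {"cluster": "localhost", "keyspace": "storage",
                             "max-chunk": "128k", "max-block-chunk": 1024},
            },
        },
    },
}

_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}  # assumed binary units


def parse_size(value: Union[int, str]) -> int:
    """Parse sizes such as "128k" into a byte count."""
    if isinstance(value, int):
        return value
    match = re.fullmatch(r"(\d+)([kmg]?)", value.strip().lower())
    if match is None:
        raise ValueError(f"unparseable size: {value!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def blobstore_target(config: Dict[str, Any], region: Optional[str] = None,
                     storage_class: str = "standard") -> Dict[str, Any]:
    """Which cluster/keyspace stores a region's storage class, and its chunking limits."""
    region = region or config["bucketstore"]["default-region"]
    target = config["regions"][region]["storage-classes"][storage_class]
    chunk = parse_size(target["max-chunk"])
    return {"cluster": target["cluster"], "keyspace": target["keyspace"],
            "chunk_bytes": chunk, "block_bytes": chunk * target["max-block-chunk"]}
```

Because every region and storage class names its own cluster, the metastore and blobstores can be moved to separate clusters by editing this file alone.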
AREAS OF IMPROVEMENT
V4 signatures.
Overall S3 API coverage.
Overall S3 client coverage.
Promoting Cassandra compact storage.
Simple web interface.
More contributors and users!
V4 SIGNATURES
V4 signatures are still not supported in pithos and are item number 1 on the to-do list.
OVERALL S3 API COVERAGE
The S3 API is byzantine and corner cases are poorly documented.
Still missing some useful bits (versioning, bucket policies, session tokens).
OVERALL S3 CLIENT COVERAGE
Some clients are very sensitive to API behavior.
The essentials work.
Glitches are quickly fixed when caught.
PROMOTING CASSANDRA COMPACT STORAGE
WITH COMPACT STORAGE gives great benefits.
Not yet promoted, and not automatically converged on startup.
SIMPLE WEB INTERFACE
A simple JavaScript SPA would be nice.
PITHOS IN PRODUCTION
A WORD OF WARNING
Running an object store is not necessarily for the faint of heart.
HOW WE USE IT
No multi-datacenter clusters.
Dedicated metadata cluster.
Dedicated "blobstore" clusters.
ELSEWHERE
Few known installations (in the tens).
Always rather large.
Always used where Cassandra previously existed.
MAINTENANCE (PITHOS)
A few cases generate orphan inodes, which must be pruned manually.
Internal tooling is used for this and should eventually be released.
Otherwise rather worry-free.
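Pithos's own pruning tool is internal and unreleased, so what follows is only a hypothetical sketch of the idea: an orphan is a blobstore (inode, version) pair that no metastore entry references any more. The grace period assumes that an upload still in progress may have written its data but not yet its metastore entry. Such versions are left alone.

```python
from typing import Iterable, List, Mapping, Tuple

Version = Tuple[str, str]  # (inode, version)


def find_orphans(blob_versions: Mapping[Version, float],
                 metastore_entries: Iterable[Mapping[str, str]],
                 now: float, grace_seconds: float = 86400) -> List[Version]:
    """Blobstore versions older than the grace period that no metastore entry points to."""
    referenced = {(entry["inode"], entry["version"]) for entry in metastore_entries}
    return sorted(
        version for version, created in blob_versions.items()
        if version not in referenced and now - created > grace_seconds
    )


blob_versions = {("i1", "v1"): 0.0, ("i2", "v1"): 0.0, ("i3", "v1"): 99_000.0}
metastore_entries = [{"inode": "i1", "version": "v1"}]
orphans = find_orphans(blob_versions, metastore_entries, now=100_000.0)
```

On a real cluster, both sides would be full-table scans, so a job like this belongs off-peak.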
MAINTENANCE (CASSANDRA)
The usual applies:
Schedule regular repairs of your clusters.
Follow releases.
Best supported version: 2.1.x.
Quorum is satisfactory in terms of performance.
SCALING
Pithos is stateless.
Colocate Cassandra and pithos daemons.
Split blobstore and metastore keyspaces into separate clusters.
Splitting data and proxy nodes is worth investigating for huge deployments.
Use HAProxy to distribute queries to pithos instances.
PARTING WORDS
Try it out! (There's an all-in-one version.)
Get involved:
Docs need proofreading and additions.
Some issues need to be tackled.
THANKS!
Pithos owes a lot to:
Max Penet (@mpenet) for the great alia & jet libraries
DataStax for the awesome Cassandra Java driver
Its contributors
Apache Cassandra, obviously
@pyr