Silo
Can we implement our own Amazon Glacier API?
Author: Steven Murray
CERN IT Department
CH-1211 Genève 23
Switzerland
www.cern.ch/it
Internet Services
DSS
Scope and contents of this talk
What is the scope and contents
•Why think about Silo?
•Amazon Glacier
•Implementing the Amazon Glacier API
•Mapping CASTOR concepts to Amazon Glacier
What is NOT the scope
•How our own private Glacier instance would fit within the
catalogue of DSS services
Why think about Silo?
• Silo is currently vaporware so it can do anything and does not crash
• Tape drives need their own staging area
• The storage and processing of tape metadata should be separate from disk-cache metadata
• A system focusing on tape storage is easier to make efficient than a system focusing on both a disk-cache and tape storage
A disk cache is not a staging area
Once a file is uploaded to a staging area the user can no longer access it. Once a file has been downloaded from a staging area it will be deleted as soon as possible.
Why a staging area?
• Avoid the shoe-shining of tapes
• Hold back enough data to warrant a tape mount
• Disk is multi-stream, tape at CERN is not
• There are thousands of disk-cache hard drives and around a hundred tape drives
• If disk cannot supply an individual file to tape at tape-drive speed, then a disk staging area is required to allow multiple disk streams to write files concurrently and then give mutually exclusive access to tape
• Likewise if a disk cannot sink an individual file from tape at tape-drive speed…
• A multilane staging area can prevent one or more inefficient users from hogging a tape drive by allowing efficient users to overtake them
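The two ideas above (hold back data until a mount is worthwhile, and let efficient users overtake) can be illustrated with a small C++17 sketch. It assumes, purely for illustration, that each file has already been classified as coming from an efficient or an inefficient client, that a lane becomes eligible for a tape mount once its queued bytes reach a fixed threshold, and that a ready fast lane is always served first. `StagedFile` and `TwoLaneStagingArea` are hypothetical names, not Silo or CASTOR code.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <initializer_list>
#include <string>
#include <vector>

// Hypothetical file waiting in the staging area to be written to tape.
struct StagedFile {
  std::string name;
  std::uint64_t sizeBytes;
};

// Hypothetical two-lane staging area: files from clients that keep up with
// the tape drive go into the fast lane, all other files into the slow lane.
class TwoLaneStagingArea {
public:
  explicit TwoLaneStagingArea(std::uint64_t mountThresholdBytes)
      : m_mountThresholdBytes(mountThresholdBytes) {}

  void add(const StagedFile &file, bool efficientClient) {
    Lane &lane = efficientClient ? m_fast : m_slow;
    lane.files.push_back(file);
    lane.queuedBytes += file.sizeBytes;
  }

  // Returns every file of the first lane that has held back enough data to
  // warrant a tape mount, emptying that lane. The fast lane is checked first,
  // so efficient users overtake inefficient ones. An empty result means that
  // no mount is warranted yet.
  std::vector<StagedFile> nextBatchForTape() {
    for (Lane *lane : {&m_fast, &m_slow}) {
      if (!lane->files.empty() && lane->queuedBytes >= m_mountThresholdBytes) {
        std::vector<StagedFile> batch(lane->files.begin(), lane->files.end());
        lane->files.clear();
        lane->queuedBytes = 0;
        return batch;
      }
    }
    return {};
  }

private:
  struct Lane {
    std::deque<StagedFile> files;
    std::uint64_t queuedBytes = 0;
  };

  std::uint64_t m_mountThresholdBytes;
  Lane m_fast;
  Lane m_slow;
};
```

A real scheduler would also need a timeout so that a lane that never reaches the threshold is still written eventually; that is left out to keep the overtaking behaviour visible.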
How do we stage today?
• The CASTOR disk cache acts as both a disk cache and a tape staging area
• CASTOR gives priority to disk-to-tape streams
• CASTOR does not give priority to tape-to-disk streams
• CASTOR does not explicitly limit the number of tape streams per disk
[Figure, two panels of disk servers: (1) disk-to-disk streams (lower priority) alongside disk-to-tape streams (higher priority); (2) disk-to-disk streams (lower priority) alongside tape-to-disk streams (lower priority). Caption: bandwidth is limited by the network; more streams = less bandwidth per stream.]
How big could a staging area be?
• Worst-case "back of an envelope" calculation
• Assume a simple scheduling algorithm
– 2 spindles per tape drive for pedaling: whilst the client writes to one spindle the tape drive reads from the other, and vice versa
– × 2 spindles for RAID 0: it will take two spindles in RAID 0 to saturate a tape drive
– × 2 spindles for two data lanes: inefficient users should not hog tape drives, and efficient users should be able to overtake inefficient users
• Assume 80 tape drives always running in parallel
• Total number of spindles = 80 × 8 = 640
• Assume 3 terabyte drives
• Size of staging area = 640 × 3 TB ≈ 2 petabytes
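The envelope calculation above can be written out so that each assumption is a named input and can be varied. This is an illustrative sketch; `StagingEstimate` and the two functions are not part of any Silo code. With the slide's inputs it yields 640 spindles and 1920 TB, about 1.9 PB, which the slide rounds to 2 PB.

```cpp
#include <cassert>
#include <cstdint>

// Worst-case staging-area estimate, one field per assumption on the slide.
struct StagingEstimate {
  std::uint32_t tapeDrives;           // drives always running in parallel
  std::uint32_t spindlesForPedaling;  // client writes one, tape reads the other
  std::uint32_t raid0Factor;          // spindles in RAID 0 to saturate a drive
  std::uint32_t dataLanes;            // lanes so efficient users can overtake
  std::uint32_t terabytesPerSpindle;  // capacity of one hard drive
};

constexpr std::uint32_t totalSpindles(const StagingEstimate &e) {
  return e.tapeDrives * e.spindlesForPedaling * e.raid0Factor * e.dataLanes;
}

constexpr std::uint32_t stagingAreaTerabytes(const StagingEstimate &e) {
  return totalSpindles(e) * e.terabytesPerSpindle;
}
```

For example, `stagingAreaTerabytes({80, 2, 2, 2, 3})` evaluates to 1920.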
What is Amazon Glacier?
• A cold storage data archival solution
• A REST-based web service
– Clients GET, POST, PUT and DELETE resources identified by URLs
• Has four types of resources:
– Archive – a user file with a unique ID generated by Glacier
– Vault – a user-named container of archives
– Job – either takes an inventory listing of a vault or retrieves an archive
– Notification configuration – jobs can take up to 4 hours to complete, so a vault can be configured to send a notification when a job finishes
• The Amazon Glacier API is a potential wrapper to encapsulate all of the tape software, including a tape staging area
• If we wrapped our tape software within the Amazon Glacier API, then other tiers could replace us with the real Amazon Glacier
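A Silo server implementing this API would have to recognise the URL of each resource type. The sketch below builds those paths following the layout used in the Amazon Glacier REST API (`/ACCOUNT_ID/vaults/VAULT_NAME/...`), where an account ID of `-` stands for the account that signed the request. The helper function names are hypothetical.

```cpp
#include <cassert>
#include <string>

// Vault: PUT creates it, DELETE removes it, GET describes it.
std::string vaultPath(const std::string &accountId, const std::string &vault) {
  return "/" + accountId + "/vaults/" + vault;
}

// Archive: created by POST to .../archives, removed by DELETE on this path.
std::string archivePath(const std::string &accountId, const std::string &vault,
                        const std::string &archiveId) {
  return vaultPath(accountId, vault) + "/archives/" + archiveId;
}

// Job: created by POST to .../jobs, described by GET on this path.
std::string jobPath(const std::string &accountId, const std::string &vault,
                    const std::string &jobId) {
  return vaultPath(accountId, vault) + "/jobs/" + jobId;
}

// Notification configuration of a vault.
std::string notificationConfigPath(const std::string &accountId,
                                   const std::string &vault) {
  return vaultPath(accountId, vault) + "/notification-configuration";
}
```

Keeping the path construction in one place would let a server-side URL parser be unit tested against exactly the same strings a client produces.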
Amazon Glacier Namespace
• The Amazon Glacier namespace is simpler than S3
• Users create named vaults
• Users cannot name their archives
• Amazon Glacier generates the IDs of archives
• Users can give a one-off description of an archive of no more than 1024 characters
• A vault inventory can be taken, and it returns the description of each archive
• Users specify access control to their vaults via the Amazon Identity and Access Management (IAM) service
• Users cannot specify quality of service or quotas
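Because the description is the only user-supplied metadata on an archive, a Silo front end would need to validate it on upload. A minimal sketch: the 1024-character limit comes from the slide; the additional restriction to printable ASCII (0x20 to 0x7E) is taken from Amazon's Glacier documentation as I understand it and should be checked against the current API reference. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns true if the description is at most 1024 characters long and every
// character is printable ASCII (assumed Glacier rule, see lead-in).
bool isValidArchiveDescription(const std::string &description) {
  constexpr std::size_t kMaxLength = 1024;
  if (description.size() > kMaxLength) {
    return false;
  }
  for (unsigned char c : description) {
    if (c < 0x20 || c > 0x7E) {
      return false;
    }
  }
  return true;
}
```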
Simplifications implied by the Amazon Glacier namespace – part 1
• Users cannot update files
– Amazon Glacier always returns a new ID for an uploaded archive
– A user cannot update an archive once it is uploaded
– A user cannot recreate an archive with the same ID
• Users cannot specify ACLs at a finer granularity than a vault, because they cannot specify any pattern-matching expressions to identify a subset of the archives within a vault
Simplifications implied by the Amazon Glacier namespace – part 2
• Users cannot request inventory jobs of finer granularity than a vault
• There is only one level of inheritance for ACLs: from vault to archive
• There is no nesting of vaults and therefore no nesting of ACLs
Buzzword mayhem
I am going to talk about S3, so...
• Amazon Glacier
– An archive is a file stored in Amazon Glacier
– An archive has an ID generated by Amazon Glacier
– A vault is a container of archives
– A vault has a name given by the user
• Amazon S3
– An object is a file stored in Amazon S3
– An object has a name given by the user
– A bucket is a container of objects
– A bucket has a name given by the user
How to access Amazon Glacier?
• Directly through the Amazon Glacier API
– In my opinion this is not targeted at end users but rather at end-user applications that can remember the generated archive IDs
• Indirectly through the Amazon S3 API
– In my opinion this API, as seen through client applications, is intended for direct usage by end users
Direct access to Amazon Glacier
• Users specify the name of the vault
• Users have to remember the opaque IDs of their archives
• Users can request the inventory of a vault
– A vault inventory can take up to 4 hours
– A vault inventory is meant for disaster recovery and infrequent namespace reconciliation
Indirect access to Amazon Glacier via Amazon S3
• Users can define lifecycle rules that transition objects to the Glacier storage class
• Users can list files in real time
• Users cannot access or delete S3 objects with the Glacier storage class using the Amazon Glacier API
• Users are not tempted to use the Amazon Glacier API
– Users cannot specify or see the destination vault
– Users do not see the archive IDs
S3 and bucket lifecycles
• A bucket lifecycle can have up to 1000 rules
• A rule has
– Either a relative or absolute time
– A prefix used to identify a group of objects by name
– An action: either a storage class transition or an expiration
• Amazon S3 has three storage classes
– Standard
– Reduced redundancy
– Glacier
• Transitions to the Glacier storage class are one way
• Users do not see vault names or archive IDs
• Files are temporarily restored from Glacier in order to be accessed through S3
• An expiration deletes objects whether they are archived in Amazon Glacier or not
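How such rules select objects can be sketched in C++17. This is a simplified model, not the S3 implementation: only relative times (age in days) are modelled, and when several rules match the sketch lets expiration win over transition, which is how I understand the S3 documentation to resolve overlapping rules. All type and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

enum class LifecycleAction { TransitionToGlacier, Expire };

// Simplified lifecycle rule: object-name prefix, age in days, action.
struct LifecycleRule {
  std::string prefix;
  int days;
  LifecycleAction action;
};

// A bucket lifecycle may hold at most 1000 rules.
bool isValidLifecycle(const std::vector<LifecycleRule> &rules) {
  constexpr std::size_t kMaxRules = 1000;
  return rules.size() <= kMaxRules;
}

// Returns the action due for an object of the given name and age, if any.
// An empty prefix matches every object.
std::optional<LifecycleAction> dueAction(const std::vector<LifecycleRule> &rules,
                                         const std::string &objectName,
                                         int objectAgeDays) {
  std::optional<LifecycleAction> due;
  for (const LifecycleRule &rule : rules) {
    const bool prefixMatches =
        objectName.compare(0, rule.prefix.size(), rule.prefix) == 0;
    if (!prefixMatches || objectAgeDays < rule.days) {
      continue;
    }
    if (rule.action == LifecycleAction::Expire) {
      return LifecycleAction::Expire;  // expiration takes precedence
    }
    due = rule.action;
  }
  return due;
}
```

Since transitions to Glacier are one way, a real implementation would also skip objects that are already in the Glacier storage class; the sketch does not track storage class.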
Retrieval paradigms
• Retrieval from Amazon Glacier
– Job oriented
– Client creates a retrieval job
– Client queries job status
– Client downloads output of completed job
• Retrieval from Amazon Glacier via S3
– Object (file) oriented
– Client creates a restore job
– Client polls the properties of the object
– Client downloads the object once its contents are available
Storing a file <= 100MB to Glacier
• Client sends POST /ACCOUNT_ID/vaults/VAULT_NAME/archives with the file contents as the request body
– The account ID can be a dash '-'
• Glacier replies with x-amz-archive-id: ARCHIVE_ID
• Major issues:
– Amazon Glacier does not specify how to redirect
– HTTP "Expect: 100-continue" is NOT sent by the Amazon Java SDK
• Further investigations could include:
– Section 8.2.4 of RFC 2068 Hypertext Transfer Protocol -- HTTP/1.1, Client Behaviour if Server Prematurely Closes Connection
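The redirect problem above comes down to whether the client announces its body with "Expect: 100-continue". If it does, a server can answer with a redirect before any file data is sent; if it does not, the whole body is already on the wire. The sketch below chooses between a single POST and a multipart upload and builds the request head with the header optional. It takes "100MB" to mean 100 × 2^20 bytes, which is an assumption, and omits the date, version, hash and Authorization headers a real Glacier request carries. All names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Assumed boundary between a single POST and a multipart upload.
constexpr std::uint64_t kSingleUploadLimitBytes = 100ULL * 1024 * 1024;

enum class UploadKind { SinglePost, Multipart };

UploadKind chooseUploadKind(std::uint64_t fileSizeBytes) {
  return fileSizeBytes <= kSingleUploadLimitBytes ? UploadKind::SinglePost
                                                  : UploadKind::Multipart;
}

// Builds the request line and a minimal set of headers for a single-POST
// upload ('-' as account ID). With expectContinue set, the client waits for
// "100 Continue" before sending the body, giving the server a chance to
// redirect first.
std::string singleUploadRequestHead(const std::string &vault,
                                    std::uint64_t contentLength,
                                    bool expectContinue) {
  std::string head = "POST /-/vaults/" + vault + "/archives HTTP/1.1\r\n";
  head += "Content-Length: " + std::to_string(contentLength) + "\r\n";
  if (expectContinue) {
    head += "Expect: 100-continue\r\n";
  }
  head += "\r\n";
  return head;
}
```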
Subliminal message!
• The Amazon Glacier API assumes the client file is safe once it has been uploaded
• Amazon can make this assumption because an uploaded file is stored in three separate storage centers
• To keep Silo simple here at CERN we will not be able to make the same assumption
• We will have to modify the Amazon Glacier API to address this issue
Storing a file > 100MB to Glacier
• Client initiates the upload: POST /-/vaults/VAULT_NAME/multipart-uploads (the '-' is the account ID)
– Glacier replies with x-amz-multipart-upload-id: MULTIPART_ID
• For each part: PUT /-/vaults/VAULT_NAME/multipart-uploads/MULTIPART_ID with headers such as Content-Range: bytes 0-16777215/* and Expect: 100-continue
– The part of the file (part of the body) is sent only if the server tells the client to continue, so redirection can be applied here
– Byte ranges can come out of order and in parallel
• Client completes the upload: POST /-/vaults/VAULT_NAME/multipart-uploads/MULTIPART_ID
– Glacier replies with x-amz-archive-id: ARCHIVE_ID
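The per-part Content-Range values follow directly from the file size and the part size. A minimal sketch: the 16 MiB part size matches the slide's example range; the rule that a part size must be a power-of-two number of MiB between 1 MiB and 4 GiB is from Amazon's Glacier documentation as I understand it and is worth re-checking. Function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// True if partSize is a power-of-two number of MiB from 1 MiB to 4 GiB
// (assumed Glacier rule, see lead-in).
bool isValidPartSize(std::uint64_t partSize) {
  constexpr std::uint64_t kMiB = 1024ULL * 1024;
  constexpr std::uint64_t kMax = 4096ULL * kMiB;  // 4 GiB
  if (partSize < kMiB || partSize > kMax || partSize % kMiB != 0) {
    return false;
  }
  const std::uint64_t mib = partSize / kMiB;
  return (mib & (mib - 1)) == 0;
}

// One Content-Range value per part, e.g. "bytes 0-16777215/*". Every part
// except the last holds exactly partSize bytes; the last holds the rest.
std::vector<std::string> partContentRanges(std::uint64_t fileSize,
                                           std::uint64_t partSize) {
  assert(partSize > 0 && "part size must be positive");
  std::vector<std::string> ranges;
  for (std::uint64_t start = 0; start < fileSize; start += partSize) {
    const std::uint64_t end = std::min(start + partSize, fileSize) - 1;
    ranges.push_back("bytes " + std::to_string(start) + "-" +
                     std::to_string(end) + "/*");
  }
  return ranges;
}
```

Because each range is self-describing, the parts can be sent in any order or in parallel, as the slide notes.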
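Retrieving a file from Glacier
• Client creates a retrieval job: POST /-/vaults/VAULT_NAME/jobs with a job description
– Glacier replies with x-amz-job-id: JOB_ID
• Loop until the job output is ready for download: GET /-/vaults/VAULT_NAME/jobs/JOB_ID
– Glacier replies with a description of the job
• Client downloads the output: GET /-/vaults/VAULT_NAME/jobs/JOB_ID/output
– Glacier replies with the file contents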
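The polling loop in the middle step can be sketched independently of any HTTP library by passing the status request and the wait in as functions. The three status values mirror the `StatusCode` values Glacier reports for a job (`InProgress`, `Succeeded`, `Failed`); `waitForJob` and its parameters are hypothetical. Since a job can take up to 4 hours, a real client would wait minutes between polls, or rely on the vault's notification configuration instead.

```cpp
#include <cassert>
#include <functional>

enum class JobStatus { InProgress, Succeeded, Failed };

// Polls until the job leaves InProgress or maxPolls requests have been made.
// describeJob stands in for GET /-/vaults/VAULT_NAME/jobs/JOB_ID and
// waitBetweenPolls for a sleep. Returns the last status seen.
JobStatus waitForJob(const std::function<JobStatus()> &describeJob,
                     const std::function<void()> &waitBetweenPolls,
                     int maxPolls) {
  JobStatus status = JobStatus::InProgress;
  for (int poll = 0; poll < maxPolls; ++poll) {
    status = describeJob();
    if (status != JobStatus::InProgress) {
      break;
    }
    if (poll + 1 < maxPolls) {
      waitBetweenPolls();  // no wait after the final poll
    }
  }
  return status;
}
```

Only when the result is `Succeeded` would the client go on to GET `.../jobs/JOB_ID/output`.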
Storing a file to Glacier via S3
• Client sends PUT /ObjectName, Host: BucketName.s3.amazonaws.com, with the file contents as the body
– S3 replies 200 OK
• S3 explicitly supports "Expect: 100-continue"
• S3 has 3 storage classes: standard, reduced redundancy and Glacier
• An S3 object must be transitioned to the Glacier storage class by a bucket lifecycle configuration: PUT /?lifecycle, Host: BucketName.s3.amazonaws.com
Retrieving a file from Glacier via S3
• Client requests a restore: POST /ObjectName?restore, Host: BucketName.s3.amazonaws.com
– S3 replies 202 Accepted
• Client polls the object: HEAD /ObjectName, Host: BucketName.s3.amazonaws.com
– S3 replies with x-amz-restore: STATUS OF RESTORE
• Client downloads the object: GET /ObjectName, Host: BucketName.s3.amazonaws.com
– S3 replies with the file contents
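The HEAD step is where an S3-style client decides whether it may issue the GET. A C++17 sketch of interpreting the header, assuming the value formats described in the S3 documentation: `ongoing-request="true"` while the restore runs, and `ongoing-request="false", expiry-date="..."` once the temporary copy is available. An absent header is taken to mean no restore was requested. Type and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

enum class RestoreState { NotRequested, InProgress, Available };

struct RestoreStatus {
  RestoreState state;
  std::string expiryDate;  // set only when the restored copy is available
};

// Interprets the x-amz-restore header of a HEAD response; std::nullopt means
// the header was absent.
RestoreStatus parseRestoreHeader(const std::optional<std::string> &header) {
  if (!header) {
    return {RestoreState::NotRequested, ""};
  }
  if (header->find("ongoing-request=\"true\"") != std::string::npos) {
    return {RestoreState::InProgress, ""};
  }
  RestoreStatus status{RestoreState::Available, ""};
  const std::string key = "expiry-date=\"";
  const std::size_t start = header->find(key);
  if (start != std::string::npos) {
    const std::size_t valueStart = start + key.size();
    const std::size_t valueEnd = header->find('"', valueStart);
    if (valueEnd != std::string::npos) {
      status.expiryDate = header->substr(valueStart, valueEnd - valueStart);
    }
  }
  return status;
}
```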
Amazon Glacier and security
• Signing
– Client and Amazon Glacier share a secret key
– Client authenticates messages by signing them
– String to sign = name of hash algorithm + request date + credential scope + hash of the canonical request (method, URI, query string, headers and payload hash)
– Signature = hash-based message authentication code (HMAC) of the string to sign, using a signing key derived from the secret key and the credential scope
– Message body sent without encryption
• HTTPS
– Both header and body encrypted
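The string-to-sign step can be sketched without any cryptography, which is the part a C/C++ Silo client would have to write itself. This follows AWS Signature Version 4: the scope is `DATE/REGION/SERVICE/aws4_request` and the string to sign joins the algorithm name, request timestamp, scope and hex SHA-256 of the canonical request with newlines. The signing key is then derived by chaining HMAC-SHA256 over the date, region, service and `aws4_request`, starting from `"AWS4" + secret key`; those hashing steps are left to a crypto library here. Function names are hypothetical.

```cpp
#include <cassert>
#include <string>

// Credential scope, e.g. "20120525/us-east-1/glacier/aws4_request".
std::string credentialScope(const std::string &date,  // YYYYMMDD
                            const std::string &region,
                            const std::string &service) {
  return date + "/" + region + "/" + service + "/aws4_request";
}

// String to sign. hashedCanonicalRequest must be the lowercase hex SHA-256
// of the canonical request, computed elsewhere.
std::string stringToSign(const std::string &amzDate,  // YYYYMMDDTHHMMSSZ
                         const std::string &scope,
                         const std::string &hashedCanonicalRequest) {
  return "AWS4-HMAC-SHA256\n" + amzDate + "\n" + scope + "\n" +
         hashedCanonicalRequest;
}
```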
Amazon Glacier clients
• Amazon does NOT provide a client application
• Amazon provides an SDK for the following platforms, but…
– Android (Glacier is NOT supported)
– iOS (Glacier is NOT supported)
– Java (Glacier IS supported)
– .NET (Glacier IS supported)
– Node.js (Glacier IS supported)
– PHP (Glacier IS supported)
– Ruby (Glacier IS supported)
• No support from Amazon for a C/C++ client
Why CMake?
• Because everybody else (EOS) is using it
• Has a simple but powerful language
• Calculates C/C++ header file dependencies itself – no
other tool required
• Explicitly covers:
– Configuration
– Building
– Installing
– Packaging
CMake syntax
• Commands
• Flow control
• Regular expressions
CMakeLists.txt:
# Split the silo library source files into test and non-test files
file (GLOB SILO_LIB_SRC_FILES_ALL silo/*.cpp silo/exception/*.cpp silo/parser/*.cpp silo/utils/*.cpp)
foreach (SRC_FILE ${SILO_LIB_SRC_FILES_ALL})
  get_filename_component (SRC_NAME ${SRC_FILE} NAME)
  # Escape the dot so that only names ending in "Test.cpp" match
  if (SRC_NAME MATCHES "Test\\.cpp$")
    list (APPEND SILO_LIB_SRC_FILES_TST ${SRC_FILE})
  else ()
    list (APPEND SILO_LIB_SRC_FILES_NTST ${SRC_FILE})
  endif ()
endforeach ()
CMake configuration step
CMakeLists.txt:
set (SILO_VERSION_MAJOR 1)
set (SILO_VERSION_MINOR 0)
...
configure_file (
  "${CMAKE_CURRENT_SOURCE_DIR}/silo/version.hpp.in"
  "${CMAKE_CURRENT_SOURCE_DIR}/silo/version.hpp"
  @ONLY)

version.hpp.in:
#include <cstdint>

namespace silo {
const std::uint32_t c_versionMajor = @SILO_VERSION_MAJOR@;
const std::uint32_t c_versionMinor = @SILO_VERSION_MINOR@;
} // namespace silo
Why the Apache HTTP server?
• Mature technology
• Decided to write an Apache module in C++ because of
performance and group knowledge
• Features of interest
– Concrete resource management (files, memory, etc.) through the use
of resource pools attached to the lifespans of the server, connections
and requests
– Database connection cache via the DBD module
– Hides chunked encoding
– API support for adding module specific configuration directives
– Transparently provides HTTPS via the SSL module
– Bucket brigades (avoids copying memory)
Wrapping the Apache HTTP server
• Reduce dependency on Apache HTTP server API
• Provide seams for CppUnit
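One way to provide such a seam is to hide the request body behind a small abstract interface: production code implements it on top of the httpd API, while unit tests substitute an in-memory mock. The class names below mirror `MockHttpInputStream` from the CppUnit slide, but the interface itself is a hypothetical sketch, not the actual Silo code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical seam over the Apache request body.
class HttpInputStream {
public:
  virtual ~HttpInputStream() = default;
  // Reads up to bufSize bytes into buf; returns 0 at end of stream.
  virtual std::size_t read(char *buf, std::size_t bufSize) = 0;
};

// In-memory stand-in for the request body, usable from unit tests.
class MockHttpInputStream : public HttpInputStream {
public:
  explicit MockHttpInputStream(std::string body) : m_body(std::move(body)) {}

  std::size_t read(char *buf, std::size_t bufSize) override {
    const std::size_t n = std::min(bufSize, m_body.size() - m_pos);
    std::copy_n(m_body.data() + m_pos, n, buf);
    m_pos += n;
    return n;
  }

private:
  std::string m_body;
  std::size_t m_pos = 0;
};

// Code under test depends only on the interface, never on Apache headers.
std::string readAll(HttpInputStream &in, std::size_t chunkSize) {
  std::string result;
  std::string chunk(chunkSize, '\0');
  for (std::size_t n; (n = in.read(&chunk[0], chunk.size())) > 0;) {
    result.append(chunk, 0, n);
  }
  return result;
}
```

A small chunk size in tests exercises the multi-read path that large uploads take in production.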
Resource-oriented dispatcher
switch (m_r->method_number) {
case M_DELETE: return resource.httpDelete();
case M_GET:    return resource.httpGet();
case M_POST:   return resource.httpPost();
case M_PUT:    return resource.httpPut();
default:
  throw exception::BadRequest(EXCEPTION_LOCALE,
    std::string("Unexpected HTTP method: ") + m_r->method);
}
Only after the dispatch logic has either returned an HTTP response object or thrown an exception does the silo code start to construct the actual response message for the client
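The same dispatch pattern can be shown without Apache, which is also how it could be unit tested. In this sketch `HttpMethod` replaces the Apache `M_*` method numbers, and `HttpResponse`, `BadRequest`, `Resource` and `EchoResource` are hypothetical stand-ins for the silo types; the point is that dispatch only selects a handler, and turning its result (or exception) into a client message happens afterwards.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Stand-in for the Apache method numbers.
enum class HttpMethod { Delete, Get, Post, Put, Options };

struct HttpResponse {
  int status;
  std::string body;
};

struct BadRequest : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// A REST resource implements one handler per supported HTTP verb.
class Resource {
public:
  virtual ~Resource() = default;
  virtual HttpResponse httpDelete() = 0;
  virtual HttpResponse httpGet() = 0;
  virtual HttpResponse httpPost() = 0;
  virtual HttpResponse httpPut() = 0;
};

// Selects the handler; building the message for the client happens later.
HttpResponse dispatch(HttpMethod method, Resource &resource) {
  switch (method) {
  case HttpMethod::Delete: return resource.httpDelete();
  case HttpMethod::Get:    return resource.httpGet();
  case HttpMethod::Post:   return resource.httpPost();
  case HttpMethod::Put:    return resource.httpPut();
  default:
    throw BadRequest("Unexpected HTTP method");
  }
}

// Toy resource that reports which handler the dispatcher selected.
class EchoResource : public Resource {
public:
  HttpResponse httpDelete() override { return {204, "DELETE"}; }
  HttpResponse httpGet() override { return {200, "GET"}; }
  HttpResponse httpPost() override { return {201, "POST"}; }
  HttpResponse httpPut() override { return {201, "PUT"}; }
};
```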
CppUnit
• 51 implementation classes (excluding test and mock)
• 93 Unit tests
• 23 Test classes
• 5 Mock classes
– MockCatalogue
– MockHttpInputStream
– MockLog
– MockResource
– MockTmpFileFactory
The most complex test is the mock upload of a file from the client to the local disk of the httpd daemon
The Silo vaporware prototype
[Architecture diagram with three regions: central server private to Silo; staging area and client interface (Glacier API); tape server. Components shown: VDQM, VMGR, simplified NS, non-CASTOR scheduler, Apache httpd with a disk server module, disk, readtp and writetp (fork and exec), rfiod, Bridge, rtcpd, taped, rmcd. Arrows show read/write paths and transfer requests.]
A Silo prototype would be 90% CASTOR
CASTOR file classes
• Tape storage-class
• Specifies the number of copies to be stored on tape
• A file has one and only one file class
• Created by power users
• Standard users tag their files with file classes
CASTOR tape pools
• Simply a list of tapes (can span multiple libraries)
• Created by tape operations
• Used to control collocation
• Used to store different copies in different buildings
• Used to direct small files to the most suitable tape
drives
CASTOR migration routes
• Decide the destination tape pool of a file based on
the file’s:
– File class
– Copy number
– File size (big or small)
• Created by tape operators
• Decouples file classes from tape pools
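A migration route is essentially a lookup table keyed on (file class, copy number, size category). A C++17 sketch, with two labelled assumptions: the big/small boundary is a constructor parameter because the slides give no value, and the tape pool names in any usage are invented. In Silo the vault name would take the place of the file class. All names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <tuple>

// Key of a migration route: file class, copy number and size category.
struct RouteKey {
  std::string fileClass;
  int copyNumber;
  bool smallFile;
  bool operator<(const RouteKey &other) const {
    return std::tie(fileClass, copyNumber, smallFile) <
           std::tie(other.fileClass, other.copyNumber, other.smallFile);
  }
};

class MigrationRoutes {
public:
  explicit MigrationRoutes(std::uint64_t smallFileLimitBytes)
      : m_smallFileLimitBytes(smallFileLimitBytes) {}

  // Created by tape operators: route one copy of a file class to a pool.
  void addRoute(const std::string &fileClass, int copyNumber, bool smallFile,
                const std::string &tapePool) {
    m_routes[{fileClass, copyNumber, smallFile}] = tapePool;
  }

  // Destination tape pool for one copy of a file, if a route exists.
  std::optional<std::string> tapePoolFor(const std::string &fileClass,
                                         int copyNumber,
                                         std::uint64_t fileSizeBytes) const {
    const bool small = fileSizeBytes < m_smallFileLimitBytes;
    const auto it = m_routes.find({fileClass, copyNumber, small});
    if (it == m_routes.end()) {
      return std::nullopt;
    }
    return it->second;
  }

private:
  std::uint64_t m_smallFileLimitBytes;
  std::map<RouteKey, std::string> m_routes;
};
```

Because users only ever name the file class (or vault), operators can move a class to different tape pools by editing routes without touching user metadata.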
Migration routes within Silo?
Users specify, in the namespace of the disk cache, the vault they wish to store their files in
The namespace of the disk cache must remember the archive ID of each file stored in Silo
Silo stores the migration routes of files based on vault, copy number and file size (big or small)
File class = Vault
Conclusions and future – part 1
• Silo does not exist apart from code developed to investigate parts of the Amazon Glacier API
• A Silo prototype would be nearly a complete CASTOR system
• Implementing the Amazon Glacier API as an Apache HTTP server module requires a lot of manpower
• The Amazon Glacier API is the right direction because it enforces a simple namespace and non-real-time downloads, but
– It does not include the Amazon Identity and Access Management (IAM) API
– It does not explicitly specify redirection for file uploads
– It does not specify how we manage our tape pools
– It assumes a 100% safe staging area, which we cannot guarantee
– There is no official C/C++ client library
Conclusions and future – part 2
• In my opinion we need to replace HTTP and modify the Amazon Glacier API so that it no longer relies on the staging area being 100% reliable
• We will have to write our own client library
• In my opinion Amazon's idea of separating the Amazon IAM API from the Amazon Glacier API is a good one, but we still need to implement this separate access control / account management module
• Architecture meetings will now take place every Wednesday