Silo

CERN IT Department

CH-1211 Genève 23

Switzerland

www.cern.ch/it

DSS

Silo

Can we implement our own Amazon Glacier API?

Author: Steven Murray

([email protected])


Scope and contents of this talk

What is in scope

• Why think about Silo?

• Amazon Glacier

• Implementing the Amazon Glacier API

• Mapping CASTOR concepts to Amazon Glacier

What is NOT in scope

• How our own private Glacier instance would fit within the catalogue of DSS services

Why think about Silo?

• Silo is currently vaporware so it can do anything and does not crash

• Tape drives need their own staging area

• The storage and processing of tape metadata should be separate from disk-cache metadata

• A system focusing on tape storage is easier to make efficient than a system focusing on both a disk-cache and tape storage

A disk cache is not a staging area

Once a file is uploaded to a staging area the user can no longer access it. Once a file has been downloaded from a staging area it will be deleted as soon as possible.

Why a staging area?

• Avoid the shoe-shining of tapes

• Hold back enough data to warrant a tape mount

• Disk is multi-stream, tape at CERN is not

• There are thousands of disk-cache hard drives and around a hundred tape drives

• If disk cannot supply an individual file to tape at tape-drive speed then a disk staging-area is required to allow multiple disk streams to write files concurrently and then give mutually exclusive access to tape

• Likewise if a disk cannot sink an individual file from tape at tape-drive speed…

• A multilane staging area can prevent one or more inefficient users from hogging a tape drive by allowing efficient users to overtake

How do we stage today?

• The CASTOR disk cache acts as both a disk cache and a tape staging area

• CASTOR gives priority to disk-to-tape streams

• CASTOR does not give priority to tape-to-disk streams

• CASTOR does not explicitly limit the number of tape streams per disk

[Figure: two diagrams of streams competing for disk servers. In the first, disk-to-disk streams have lower priority and the disk-to-tape stream has higher priority. In the second, disk-to-disk streams and the tape-to-disk stream both have lower priority. Bandwidth is limited by the network: more streams = less bandwidth per stream.]

How big could a staging area be?

• Worst case “back of an envelope” calculation

• Assume a simple scheduling algorithm

– 2 spindles per tape drive for pedaling: whilst the client writes to one spindle the tape drive reads from the other and vice versa

– × 2 spindles for RAID 0: it will take two spindles in RAID 0 to saturate a tape drive

– × 2 spindles for two data lanes: inefficient users should not hog tape drives, efficient users should be able to overtake inefficient users

• Assume 80 tape drives always running in parallel

• Total number of spindles = 80 × 8 = 640

• Assume 3 Terabyte drives

• Size of staging area = 640 × 3 TB = 1920 TB ≈ 2 Petabytes
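The arithmetic above can be written down as a small C++ sketch so that the assumptions are easy to vary. The struct and function names are illustrative only and are not part of any Silo code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical parameter set for the back-of-an-envelope estimate.
struct StagingAssumptions {
    std::uint32_t tapeDrives;          // tape drives running in parallel
    std::uint32_t pedalingFactor;      // spindles alternated between client and drive
    std::uint32_t raid0Factor;         // spindles striped to saturate one drive
    std::uint32_t dataLanes;           // lanes that let efficient users overtake
    std::uint32_t terabytesPerSpindle; // capacity of one disk
};

// Total spindles = drives x pedaling x RAID 0 x lanes.
std::uint32_t totalSpindles(const StagingAssumptions &a) {
    assert(a.tapeDrives > 0); // an estimate without drives is meaningless
    return a.tapeDrives * a.pedalingFactor * a.raid0Factor * a.dataLanes;
}

// Staging-area capacity in terabytes.
std::uint64_t stagingAreaTerabytes(const StagingAssumptions &a) {
    return static_cast<std::uint64_t>(totalSpindles(a)) * a.terabytesPerSpindle;
}
```

With the slide's figures (80 drives, factors of 2, 2 and 2, and 3 TB disks) this gives 640 spindles and 1920 TB, which rounds to the quoted 2 Petabytes.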


What is Amazon Glacier?

• A cold storage data archival solution

• A REST-based web service

– Clients GET, POST, PUT and DELETE resources identified by URLs

• Has four types of resources:

– Archive – a user file with a unique ID generated by Glacier

– Vault – a user named container of archives

– Job – either take an inventory listing of a vault or retrieve an archive

– Notification-configuration – jobs can take up to 4 hours to complete

• The Amazon Glacier API is a potential wrapper to encapsulate all of the tape software including a tape staging area

• If we wrapped our tape software within the Amazon Glacier API then other tiers could replace us with the real Amazon Glacier

Amazon Glacier Namespace

• The Amazon Glacier namespace is simpler than S3

• Users create named vaults

• Users cannot name their archives

• Amazon Glacier generates the IDs of archives

• Users can give a once-off description of an archive in no more than 1024 characters

• A vault inventory can be taken and it includes the 1024-character description of each archive

• Users specify access control to their vaults via the Amazon Identity and Access Management (IAM) service

• Users cannot specify quality of service or quotas

Simplifications implied by the Amazon Glacier namespace – part 1

• Users cannot update files

– Amazon Glacier always returns a new ID for an uploaded archive

– A user cannot update an archive once it is uploaded

– A user cannot recreate an archive with the same ID

• Users cannot specify ACLs at a finer granularity than a vault because they cannot specify any pattern matching expressions to identify a subset of the archives within a vault

Simplifications implied by the Amazon Glacier namespace – part 2

• Users cannot request inventory jobs of finer granularity than a vault

• There is only one level of inheritance for ACLs: from vault to archive

• There is no nesting of vaults and therefore no nesting of ACLs

Buzzword mayhem

I am going to talk about S3, so…

• Amazon Glacier

– An archive is a file stored in Amazon Glacier

– An archive has an ID generated by Amazon Glacier

– A vault is a container of archives

– A vault has a name given by the user

• Amazon S3

– An object is a file stored in Amazon S3

– An object has a name given by the user

– A bucket is a container of objects

– A bucket has a name given by the user


How to access Amazon Glacier?

• Directly through the Amazon Glacier API

– In my opinion this is not targeted at end users but rather end-user applications that can remember the generated archive IDs

• Indirectly through the Amazon S3 API

– In my opinion this API as seen through client applications is intended for direct usage by end users

Direct access to Amazon Glacier

• Users specify the name of the vault

• Users have to remember the opaque IDs of their archives

• Users can request the inventory of a vault

– A vault inventory can take up to 4 hours

– A vault inventory is meant for disaster recovery and infrequent namespace reconciliation

Indirect access to Amazon Glacier via Amazon S3

• Users can define lifecycle rules that transition objects to the Glacier storage class

• Users can list files in real time

• Users cannot access or delete S3 objects with the Glacier storage class using the Amazon Glacier API

• Users are not tempted to use the Amazon Glacier API

– Users cannot specify or see the destination vault

– Users do not see the archive IDs


S3 and bucket lifecycles

• A bucket lifecycle can have up to 1000 rules

• A rule has

– Either a relative or absolute time

– A prefix used to identify a group of objects by name

– An action: either a storage class transition or an expiration

• Amazon S3 has three storage classes

– Standard

– Reduced redundancy

– Glacier

• Transitions to the Glacier storage class are one way

• Users do not see vault names or archive IDs

• Files are temporarily restored from Glacier in order to be accessed through S3

• An expiration deletes objects whether they are archived in Amazon Glacier or not
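To make the rule structure concrete, here is a minimal C++ sketch of how a lifecycle rule with a relative time could be evaluated against an object. It is an illustration under stated assumptions, not Amazon's implementation: the type and function names are hypothetical, absolute dates are left out, and every matching rule is simply reported.

```cpp
#include <string>
#include <vector>

// The two kinds of action a rule can carry.
enum class LifecycleAction { TransitionToGlacier, Expire };

// Hypothetical in-memory form of one lifecycle rule (relative time only).
struct LifecycleRule {
    std::string prefix;     // selects objects whose name starts with this
    unsigned afterDays;     // object age at which the action becomes due
    LifecycleAction action;
};

// True if name starts with prefix.
bool hasPrefix(const std::string &name, const std::string &prefix) {
    return name.compare(0, prefix.size(), prefix) == 0;
}

// Returns the actions of all rules that match the object and are due.
std::vector<LifecycleAction> dueActions(const std::vector<LifecycleRule> &rules,
                                        const std::string &objectName,
                                        unsigned objectAgeDays) {
    std::vector<LifecycleAction> due;
    for (const LifecycleRule &rule : rules) {
        if (hasPrefix(objectName, rule.prefix) && objectAgeDays >= rule.afterDays) {
            due.push_back(rule.action);
        }
    }
    return due;
}
```

For example, with one rule that transitions "logs/" objects after 30 days and another that expires them after 365 days, a 40-day-old object named "logs/a.txt" has only the transition due.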

Retrieval paradigms

• Retrieval from Amazon Glacier

– Job oriented

– Client creates a retrieval job

– Client queries job status

– Client downloads output of completed job

• Retrieval from Amazon Glacier via S3

– Object (file) oriented

– Client creates a restore job

– Client polls the properties of the object

– Client downloads the object once its contents are available


Storing a file <= 100MB to Glacier

Request: POST /ACCOUNT_ID/vaults/VAULT_NAME/archives, with the file contents as the body

Response: x-amz-archive-id: ARCHIVE_ID

• Amazon Glacier does not specify how to redirect

• HTTP Expect 100-Continue is NOT sent by the Amazon Java SDK

• Further investigations could include:

• Section 8.2.4 of RFC 2068 Hypertext Transfer Protocol -- HTTP/1.1: Client Behaviour if Server Prematurely Closes Connection

The account ID can be a dash ‘-’

Major issue

Subliminal message!

• The Amazon Glacier API assumes the client file is safe once it has been uploaded

• Amazon can make this assumption because an uploaded file is stored in three separate storage centers

• To keep Silo simple here at CERN we will not be able to make the same assumption

• We will have to modify the Amazon Glacier API to address this issue


Storing a file > 100MB to Glacier

Initiate the upload (the dash ‘-’ stands in for the account ID):

POST /-/vaults/VAULT_NAME/multipart-uploads

Response: x-amz-multipart-upload-id: MULTIPART_ID

For each part (byte ranges can come out of order and in parallel):

PUT /-/vaults/VAULT_NAME/multipart-uploads/MULTIPART_ID
Content-Range: bytes 0-16777215/*
Expect: 100-continue

The body (part of the file) is sent if the client is told to continue. Redirection can be applied here.

Complete the upload:

POST /-/vaults/VAULT_NAME/multipart-uploads/MULTIPART_ID

Response: x-amz-archive-id: ARCHIVE_ID
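As an illustration of the per-part requests, the following C++ sketch computes the Content-Range header value for each part of a file. The function name is hypothetical. The 16 MiB part size in the usage note matches the range 0-16777215 shown above, but the part size is an input here, not something the sketch knows about Glacier.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Returns one Content-Range value per part, in file order. The parts may
// then be uploaded in any order, or in parallel.
std::vector<std::string> contentRanges(std::uint64_t fileSize,
                                       std::uint64_t partSize) {
    assert(partSize > 0); // a zero part size would never advance
    std::vector<std::string> ranges;
    for (std::uint64_t first = 0; first < fileSize; first += partSize) {
        // The last part may be shorter than partSize.
        const std::uint64_t last = std::min(first + partSize, fileSize) - 1;
        ranges.push_back("bytes " + std::to_string(first) + "-" +
                         std::to_string(last) + "/*");
    }
    return ranges;
}
```

Calling contentRanges(40000000, 16777216) yields three ranges, the first being "bytes 0-16777215/*" and the last the shorter "bytes 33554432-39999999/*".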


Retrieving a file from Glacier

Create the retrieval job:

POST /-/vaults/VAULT_NAME/jobs

Response: x-amz-job-id: JOB_ID

Loop until job output is ready for download:

GET /-/vaults/VAULT_NAME/jobs/JOB_ID

Response: description of the job

Download the output:

GET /-/vaults/VAULT_NAME/jobs/JOB_ID/output

Response: file contents
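The polling loop can be sketched in C++ without committing to any HTTP library by passing the "describe job" request in as a callback. Everything here is an assumption made for illustration: the function name, the callbacks and the poll limit are hypothetical. The three status values mirror the job status codes Glacier reports (InProgress, Succeeded, Failed).

```cpp
#include <functional>

// Status of a retrieval job as seen by the client.
enum class JobStatus { InProgress, Succeeded, Failed };

// Polls describeJob until the job leaves InProgress or maxPolls is reached.
// waitBetweenPolls is called after each unsuccessful poll; a real client
// would sleep there, since jobs can take hours.
JobStatus pollJob(const std::function<JobStatus()> &describeJob,
                  const std::function<void()> &waitBetweenPolls,
                  unsigned maxPolls) {
    for (unsigned attempt = 0; attempt < maxPolls; ++attempt) {
        const JobStatus status = describeJob();
        if (status != JobStatus::InProgress) {
            return status; // output is ready, or the job failed
        }
        waitBetweenPolls();
    }
    return JobStatus::InProgress; // gave up; the job may still complete later
}
```

Only when pollJob returns Succeeded would the client go on to request the job output.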


Storing a file to Glacier via S3

PUT /ObjectName
Host: BucketName.s3.amazonaws.com

Body: file contents

Response: 200 OK

• S3 explicitly supports the “Expect: 100-continue”

• S3 has 3 storage classes: standard, reduced redundancy and Glacier

• An S3 object must be transitioned to the Glacier storage class

PUT /?lifecycle
Host: BucketName.s3.amazonaws.com


Retrieving a file from Glacier via S3

Request the restore:

POST /ObjectName?restore
Host: BucketName.s3.amazonaws.com

Response: 202 Accepted

Poll the properties of the object:

HEAD /ObjectName
Host: BucketName.s3.amazonaws.com

Response: x-amz-restore: STATUS OF RESTORE

Download the object:

GET /ObjectName
Host: BucketName.s3.amazonaws.com

Response: file contents


Amazon Glacier and security

• Signing

– Client and Amazon Glacier share a secret key

– Client authenticates messages by signing them

– String to sign = name of hash algorithm + request date + credential scope + canonical form of message header

– Signature = Hash-based message authentication code generated from string to sign + secret key + credential scope

– Message body sent without encryption

• HTTPS

– Both header and body encrypted
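The layout of the string to sign can be sketched in C++ as plain string assembly. This follows the AWS Signature Version 4 layout, in which the last component is the hex-encoded SHA-256 hash of the canonical request; that hash is passed in as a parameter because computing it, and the HMAC itself, needs a crypto library that is left out here. The struct, its field names and the sample values in the usage note are hypothetical.

```cpp
#include <string>

// Hypothetical holder for the inputs of the string to sign.
struct SigningInputs {
    std::string requestDateTime;        // e.g. "20130101T120000Z"
    std::string date;                   // e.g. "20130101"
    std::string region;                 // e.g. "us-east-1"
    std::string service;                // "glacier"
    std::string hashedCanonicalRequest; // hex SHA-256, computed elsewhere
};

// Credential scope = date/region/service/aws4_request.
std::string credentialScope(const SigningInputs &in) {
    return in.date + "/" + in.region + "/" + in.service + "/aws4_request";
}

// String to sign = algorithm, request date, credential scope and hashed
// canonical request, separated by newlines.
std::string stringToSign(const SigningInputs &in) {
    return "AWS4-HMAC-SHA256\n" + in.requestDateTime + "\n" +
           credentialScope(in) + "\n" + in.hashedCanonicalRequest;
}
```

For a request dated 20130101 in us-east-1, credentialScope returns "20130101/us-east-1/glacier/aws4_request". The signature is then an HMAC over the string to sign, keyed with a key derived from the secret key and the credential scope.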

Amazon Glacier clients

• Amazon does NOT provide a client application

• Amazon provides an SDK for the following platforms, but…

– Android (but Glacier is NOT supported)

– iOS (but Glacier is NOT supported)

– Java (Glacier IS supported)

– .NET (Glacier IS supported)

– Node.js (Glacier IS supported)

– PHP (Glacier IS supported)

– Ruby (Glacier IS supported)

• No support from Amazon for a C/C++ client

Why CMake?

• Because everybody else (EOS) is using it

• Has a simple but powerful language

• Calculates C/C++ header file dependencies itself – no other tool required

• Explicitly covers:

– Configuration

– Building

– Installing

– Packaging

CMake syntax

• Commands

• Flow control

• Regular expressions

CMakeLists.txt:

# Split the silo library source files into test and non-test files
file (GLOB SILO_LIB_SRC_FILES_ALL
  silo/*.cpp
  silo/exception/*.cpp
  silo/parser/*.cpp
  silo/utils/*.cpp)
foreach (SRC_FILE ${SILO_LIB_SRC_FILES_ALL})
  get_filename_component (SRC_NAME ${SRC_FILE} NAME)
  if (${SRC_NAME} MATCHES ".*Test.cpp$")
    set (SILO_LIB_SRC_FILES_TST ${SILO_LIB_SRC_FILES_TST} ${SRC_FILE})
  else ()
    set (SILO_LIB_SRC_FILES_NTST ${SILO_LIB_SRC_FILES_NTST} ${SRC_FILE})
  endif ()
endforeach ()

CMake configuration step

CMakeLists.txt:

set (SILO_VERSION_MAJOR 1)
set (SILO_VERSION_MINOR 0)
...
configure_file (
  "${CMAKE_CURRENT_SOURCE_DIR}/silo/version.hpp.in"
  "${CMAKE_CURRENT_SOURCE_DIR}/silo/version.hpp"
  @ONLY)

version.hpp.in:

namespace silo {

const uint32_t c_versionMajor = @SILO_VERSION_MAJOR@;
const uint32_t c_versionMinor = @SILO_VERSION_MINOR@;

} // namespace silo


Why the Apache HTTP server?

• Mature technology

• Decided to write an Apache module in C++ because of performance and group knowledge

• Features of interest

– Concrete resource management (files, memory, etc.) through the use of resource pools attached to the lifespans of the server, connections and requests

– Database connection cache via the DBD module

– Hides chunked encoding

– API support for adding module specific configuration directives

– Transparently provides HTTPS via the SSL module

– Bucket brigades (avoids copying memory)

Data & Storage Services

Wrapping the Apache HTTP server


• Reduce dependency on Apache HTTP server API

• Provide seams for CppUnit


Resource oriented dispatcher

switch(m_r->method_number) {
case M_DELETE: return resource.httpDelete();
case M_GET: return resource.httpGet();
case M_POST: return resource.httpPost();
case M_PUT: return resource.httpPut();
default:
  throw exception::BadRequest(EXCEPTION_LOCALE,
    std::string("Unexpected HTTP method: ") + m_r->method);
}

Only after the dispatch logic has either returned an HTTP response object or thrown an exception does the Silo code start to construct the actual response message for the client
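The same idea can be shown as a self-contained C++ sketch that does not depend on the Apache HTTP server API. It is not the Silo code: the enum, the HttpResponse struct, the Resource interface and the EchoResource class are all hypothetical stand-ins, and std::invalid_argument stands in for the BadRequest exception.

```cpp
#include <stdexcept>
#include <string>

enum class HttpMethod { Delete, Get, Post, Put, Other };

// Hypothetical response object; turned into a message only after dispatch.
struct HttpResponse {
    int status;
    std::string body;
};

// One virtual method per HTTP verb, so each resource decides how to react.
class Resource {
public:
    virtual ~Resource() = default;
    virtual HttpResponse httpDelete() = 0;
    virtual HttpResponse httpGet() = 0;
    virtual HttpResponse httpPost() = 0;
    virtual HttpResponse httpPut() = 0;
};

// Routes the request to the resource, or throws for an unexpected method.
HttpResponse dispatch(HttpMethod method, Resource &resource) {
    switch (method) {
    case HttpMethod::Delete: return resource.httpDelete();
    case HttpMethod::Get:    return resource.httpGet();
    case HttpMethod::Post:   return resource.httpPost();
    case HttpMethod::Put:    return resource.httpPut();
    default: throw std::invalid_argument("Unexpected HTTP method");
    }
}

// The response is built in one place, after dispatch returned or threw.
HttpResponse handle(HttpMethod method, Resource &resource) {
    try {
        return dispatch(method, resource);
    } catch (const std::invalid_argument &e) {
        return {400, e.what()};
    }
}

// Trivial resource used to exercise the dispatcher, in the spirit of a mock.
class EchoResource : public Resource {
public:
    HttpResponse httpDelete() override { return {200, "DELETE"}; }
    HttpResponse httpGet() override { return {200, "GET"}; }
    HttpResponse httpPost() override { return {200, "POST"}; }
    HttpResponse httpPut() override { return {200, "PUT"}; }
};
```

Keeping the verb switch apart from response construction is also what provides a seam for unit tests: a mock resource can be dispatched to without a running server.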

CppUnit

• 51 implementation classes (excluding test and mock)

• 93 Unit tests

• 23 Test classes

• 5 Mock classes

– MockCatalogue

– MockHttpInputStream

– MockLog

– MockResource

– MockTmpFileFactory


The most complex test is the mock upload of a file from the client to the local disk of the httpd daemon


The Silo vaporware prototype

[Figure: block diagram of the Silo prototype. Labels in the diagram: VDQM, VMGR, simplified NS, rfiod, bridge, rtcpd, taped, rmcd, Apache httpd with the disk server module (Glacier API), readtp and writetp (fork and exec), disk (read / write), staging area and client interface, tape server (read / write), non-CASTOR scheduler, central server private to Silo, transfer requests.]

A Silo prototype would be 90% CASTOR


CASTOR file classes

• Tape storage-class

• Specifies the number of copies to be stored on tape

• A file has one and only one file class

• Created by power users

• Standard users tag their files with file classes

CASTOR tape pools

• Simply a list of tapes (can span multiple libraries)

• Created by tape operations

• Used to control collocation

• Used to store different copies in different buildings

• Used to direct small files to the most suitable tape drives


CASTOR migration routes

• Decide the destination tape pool of a file based on the file’s:

– File class

– Copy number

– File size (big or small)

• Created by tape operators

• Decouples file classes from tape pools

Migration routes within Silo?

[Figure: three tape pools]

Users specify in the namespace of the disk cache the vault they wish to store their files to

The namespace of the disk cache must remember the archive ID of each file stored in Silo

Silo stores the migration routes of files based on vault, copy number, and file size (big or small)

File class = Vault
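A migration-route table keyed on vault, copy number and size class could look like the following C++ sketch. It is only an illustration of the mapping described above: the class name, the method names and the idea of a single byte threshold separating big from small files are assumptions, as are the vault and pool names in the usage note.

```cpp
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <tuple>

enum class SizeClass { Small, Big };

// Hypothetical route table: (vault, copy number, size class) -> tape pool.
class MigrationRoutes {
public:
    explicit MigrationRoutes(std::uint64_t bigFileThresholdBytes)
        : m_threshold(bigFileThresholdBytes) {}

    void addRoute(const std::string &vault, unsigned copyNb,
                  SizeClass sizeClass, const std::string &tapePool) {
        m_routes[std::make_tuple(vault, copyNb, sizeClass)] = tapePool;
    }

    // Returns the destination tape pool; throws std::out_of_range if no
    // route has been defined for this combination.
    const std::string &tapePoolFor(const std::string &vault, unsigned copyNb,
                                   std::uint64_t fileSizeBytes) const {
        const SizeClass sizeClass =
            fileSizeBytes >= m_threshold ? SizeClass::Big : SizeClass::Small;
        return m_routes.at(std::make_tuple(vault, copyNb, sizeClass));
    }

private:
    std::uint64_t m_threshold;
    std::map<std::tuple<std::string, unsigned, SizeClass>, std::string> m_routes;
};
```

With routes defined for a vault named "raw_data", copy 1, a 200 MiB file and a 10-byte file can be sent to different pools by the same lookup, which is how small files would be steered to the most suitable tape drives.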

Conclusions and future – part 1

• Silo does not exist besides code developed to investigate parts of the Amazon Glacier API

• A Silo prototype would nearly be a complete CASTOR system

• Implementing the Amazon Glacier API as an Apache HTTP server module requires a lot of manpower

• The Amazon Glacier API is the right direction because it enforces a simple namespace and non real-time downloads, but

– It does not include the Amazon Identity and Access Management (IAM) API

– It does not explicitly specify redirection for file uploads

– It does not specify how we manage our tape pools

– It assumes a staging area that is 100% safe, which we cannot assume

– There is no official C/C++ client library

Conclusions and future – part 2

• In my opinion we need to replace HTTP and modify the Amazon Glacier API so that it no longer relies on the staging area being 100% reliable

• We will have to write our own client library

• In my opinion the idea of Amazon to separate the Amazon IAM API from the Amazon Glacier API is a good one, but we still need to implement this separate access control / account management module

• Architecture meetings will now take place every Wednesday
