aws re:invent 2016: how amazon s3 storage management helps optimize storage at scale, with special...

© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. STG215 December 1, 2016 How Amazon S3 Storage Management Helps Optimize Storage at Scale, with Special Guest, Pinterest Omair Gillani, Sr. Product Manager, AWS John Elliott, Mgr. Data and Storage, Pinterest

TRANSCRIPT

Page 1: AWS re:Invent 2016: How Amazon S3 Storage Management Helps Optimize Storage at Scale, with Special Guest, Pinterest (STG215)

© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

STG215

December 1, 2016

How Amazon S3 Storage Management Helps Optimize Storage at Scale, with Special Guest, Pinterest

Omair Gillani, Sr. Product Manager, AWS

John Elliott, Mgr. Data and Storage, Pinterest

Page 2: AWS re:Invent 2016: How Amazon S3 Storage Management Helps Optimize Storage at Scale, with Special Guest, Pinterest (STG215)

What to Expect from the Session

• How we think about Storage Management for Amazon S3

• Storage Management portfolio for S3

• Understand your data

• Monitor your data

• Manage your data

• Pulling it all together

• Storage management @ Pinterest

Page 3: AWS re:Invent 2016: How Amazon S3 Storage Management Helps Optimize Storage at Scale, with Special Guest, Pinterest (STG215)

How we think about Storage Management for Amazon S3

Page 4: AWS re:Invent 2016: How Amazon S3 Storage Management Helps Optimize Storage at Scale, with Special Guest, Pinterest (STG215)

Amazon storage usage

[Chart: Amazon storage usage, 2012 to 2014]

Trillions of objects

Millions of transactions per second

Page 5: AWS re:Invent 2016: How Amazon S3 Storage Management Helps Optimize Storage at Scale, with Special Guest, Pinterest (STG215)

“Why Storage Management?”

What data do I have?

How is my data being used?

How can I better manage my data?

Do I have data that is not being accessed?

Can I perform data-driven storage management?

What data should I archive?

[Cartoon: The New Yorker, 2013]

Page 6: AWS re:Invent 2016: How Amazon S3 Storage Management Helps Optimize Storage at Scale, with Special Guest, Pinterest (STG215)

A comprehensive Storage Management portfolio for Amazon S3

Page 7: AWS re:Invent 2016: How Amazon S3 Storage Management Helps Optimize Storage at Scale, with Special Guest, Pinterest (STG215)

Storage Management for S3

• S3 Analytics

• S3 Inventory

• Amazon S3 CloudWatch Metrics

• Audit with AWS CloudTrail S3 Data Events

• S3 Object Tags

• Lifecycle Policy

• Event Notifications

• Cross-Region Replication

Storage classes: Standard, Standard - Infrequent Access, Amazon Glacier

Page 8: AWS re:Invent 2016: How Amazon S3 Storage Management Helps Optimize Storage at Scale, with Special Guest, Pinterest (STG215)

Understand your storage usage

• S3 Inventory

• Analyze Logs with Amazon EMR

• S3 Analytics

Page 9: AWS re:Invent 2016: How Amazon S3 Storage Management Helps Optimize Storage at Scale, with Special Guest, Pinterest (STG215)

S3 Inventory

• Save time

• Daily or weekly delivery

• Delivery to S3 bucket

• CSV file output

Trigger business workflows and applications such as secondary index, garbage collection, data auditing, and offline analytics

Half the price of LIST API at $0.0025 per million objects listed
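The price on this slide can be turned into a quick estimate. The following is a minimal sketch, assuming the quoted $0.0025 per million objects applies to each report delivered, and taking "half the price of LIST" literally to derive a LIST figure. The function names and the 10-billion-object bucket are hypothetical.

```python
from decimal import Decimal

# Price quoted on the slide: $0.0025 per million objects listed.
INVENTORY_PRICE_PER_MILLION = Decimal("0.0025")
MILLION = Decimal(1_000_000)


def inventory_report_cost(object_count: int) -> Decimal:
    """Cost in USD of one inventory report covering object_count objects."""
    return INVENTORY_PRICE_PER_MILLION * Decimal(object_count) / MILLION


def implied_list_cost(object_count: int) -> Decimal:
    """Cost of listing the same objects via LIST, reading 'half the price' literally."""
    return 2 * inventory_report_cost(object_count)


if __name__ == "__main__":
    objects = 10_000_000_000  # hypothetical bucket with 10 billion objects
    print("one inventory report:", inventory_report_cost(objects))
    print("equivalent LIST calls:", implied_list_cost(objects))
```

Decimal is used so that small per-million prices do not pick up floating-point noise when multiplied by large object counts.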

Page 10: AWS re:Invent 2016: How Amazon S3 Storage Management Helps Optimize Storage at Scale, with Special Guest, Pinterest (STG215)

S3 Inventory

More information about your objects than provided by the LIST API, such as replication status, multipart upload flag, and delete marker

Name | Value Type | Description
Bucket | String | Bucket name. UTF-8 encoded.
Key | String | Object key name. UTF-8 encoded.
Version Id | String | Version Id of the object
Is Latest | Boolean | true if object is the latest version (current version) of a versioned object, otherwise false
Delete Marker | Boolean | true if object is a delete marker of a versioned object, otherwise false
Size | Long | Object size in bytes
Last Modified | String | Last modified timestamp. Format in ISO: YYYY-MM-DDTHH:mm:ss.SSSZ
ETag | String | eTag in HEX encoded format
StorageClass | String | Valid values: STANDARD, REDUCED_REDUNDANCY, GLACIER, STANDARD_IA. UTF-8 encoded.
Multipart Uploaded | Boolean | true if object is uploaded by using multipart, otherwise false
Replication Status | String | Valid values: REPLICA, COMPLETED, PENDING, FAILED. UTF-8 encoded.
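As an illustration of consuming these fields, here is a small sketch that parses inventory CSV rows into typed records. It assumes the columns arrive in the order of the table above; in practice the manifest describes the actual column order, so treat `FIELDS` and the sample rows as hypothetical.

```python
import csv
import io

# Assumed column order, following the field table on the slide.
FIELDS = [
    "Bucket", "Key", "VersionId", "IsLatest", "DeleteMarker", "Size",
    "LastModified", "ETag", "StorageClass", "MultipartUploaded",
    "ReplicationStatus",
]
BOOLEAN_FIELDS = ("IsLatest", "DeleteMarker", "MultipartUploaded")


def parse_inventory(text: str) -> list:
    """Parse inventory CSV text into a list of dicts with typed values."""
    rows = []
    for raw in csv.reader(io.StringIO(text)):
        row = dict(zip(FIELDS, raw))
        for name in BOOLEAN_FIELDS:
            row[name] = row[name].lower() == "true"
        # Delete markers carry no size, so an empty value is read as 0.
        row["Size"] = int(row["Size"]) if row["Size"] else 0
        rows.append(row)
    return rows


# Hypothetical sample: one regular object and one delete marker.
SAMPLE = (
    '"example-bucket","logs/a.gz","v1","true","false","1024",'
    '"2016-12-01T00:00:00.000Z","abc123","STANDARD","false",""\n'
    '"example-bucket","logs/b.gz","v2","true","true","",'
    '"2016-12-01T01:00:00.000Z","","","false",""\n'
)

if __name__ == "__main__":
    for record in parse_inventory(SAMPLE):
        print(record["Key"], record["Size"], record["DeleteMarker"])
```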

Page 11: AWS re:Invent 2016: How Amazon S3 Storage Management Helps Optimize Storage at Scale, with Special Guest, Pinterest (STG215)

S3 Inventory

Set up a notification when S3 Inventory is complete

Files delivered to the destination bucket:

/Data/<InventoryFile>.gz

/<InventoryFile>.gz

/<DayofReport>/manifest.json

/manifest.checksum

…

Notification targets: AWS Lambda, Amazon SQS, Amazon SNS
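One way to act on such a notification is a Lambda function subscribed to object-created events on the destination bucket. The sketch below assumes that the arrival of `manifest.checksum` signals a complete inventory delivery, and that the event follows the standard S3 notification shape. The bucket name, key layout, and function names are hypothetical.

```python
from urllib.parse import unquote_plus


def completed_inventory_manifests(event: dict) -> list:
    """Return (bucket, key) pairs for manifest.checksum objects in an S3 event."""
    found = []
    for record in event.get("Records", []):
        s3 = record.get("s3", {})
        # Object keys in S3 event notifications are URL-encoded.
        key = unquote_plus(s3.get("object", {}).get("key", ""))
        if key.endswith("/manifest.checksum"):
            found.append((s3["bucket"]["name"], key))
    return found


def handler(event, context):
    """Lambda entry point: report each completed inventory delivery."""
    completed = completed_inventory_manifests(event)
    for bucket, key in completed:
        print(f"inventory complete: s3://{bucket}/{key}")
    return {"completed": len(completed)}


# Hypothetical event with one checksum file and one data file.
SAMPLE_EVENT = {
    "Records": [
        {"s3": {"bucket": {"name": "inv-dest"},
                "object": {"key": "src/cfg/2016-12-01T00-00Z/manifest.checksum"}}},
        {"s3": {"bucket": {"name": "inv-dest"},
                "object": {"key": "src/cfg/Data/part+1.gz"}}},
    ]
}
```

Keeping the filtering in a plain function makes it testable without deploying anything; the same function could sit behind an SQS or SNS consumer once the S3 event is unwrapped from the message.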

Page 12: AWS re:Invent 2016: How Amazon S3 Storage Management Helps Optimize Storage at Scale, with Special Guest, Pinterest (STG215)

S3 Inventory

Eventually consistent rolling snapshot

• New objects may not be listed

• Recently deleted objects may still be included

[Figure: bucket contents across successive snapshots: O1, O2, O3; O1, O2, O3; O1, O2; O1, O2, O3 (new)]

Validate before you act! Use HEAD Object or GET Object
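The "validate before you act" advice can be sketched as a filter that re-checks every key taken from an inventory report before a job touches it. The existence check is passed in as a callable so the example runs offline; with boto3 it would typically wrap `head_object` and treat a 404 error as "missing". All names here are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple


def split_by_existence(
    keys: Iterable[str],
    object_exists: Callable[[str], bool],
) -> Tuple[List[str], List[str]]:
    """Split inventory keys into (still present, no longer present)."""
    present, missing = [], []
    for key in keys:
        (present if object_exists(key) else missing).append(key)
    return present, missing


if __name__ == "__main__":
    # Stand-in for the live bucket: O3 was deleted after the snapshot was taken.
    live_bucket = {"O1", "O2"}
    inventory_keys = ["O1", "O2", "O3"]
    present, missing = split_by_existence(inventory_keys, live_bucket.__contains__)
    print("safe to process:", present)
    print("skip, already gone:", missing)
```

This only covers objects that disappeared after the snapshot; objects created since the snapshot are simply absent from the report and would be picked up by a later one.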

Page 13: AWS re:Invent 2016: How Amazon S3 Storage Management Helps Optimize Storage at Scale, with Special Guest, Pinterest (STG215)

S3 Analytics – Storage Class Analysis

Data-driven storage management for S3

• Analyze buckets, prefixes, or tags

• Daily Storage Class Analysis and Lifecycle candidates

• Export Analysis data to your S3 bucket

• $0.10 per million objects analyzed per month

Page 14: AWS re:Invent 2016: How Amazon S3 Storage Management Helps Optimize Storage at Scale, with Special Guest, Pinterest (STG215)

S3 Analytics – Storage Class Analysis

Export to use BI tool of your choice

Page 15: AWS re:Invent 2016: How Amazon S3 Storage Management Helps Optimize Storage at Scale, with Special Guest, Pinterest (STG215)

Demo

Heavily used storage

Archival storage

Infrequently used storage

Page 16: AWS re:Invent 2016: How Amazon S3 Storage Management Helps Optimize Storage at Scale, with Special Guest, Pinterest (STG215)

S3 Analytics – Storage Class Analysis

Page 17: AWS re:Invent 2016: How Amazon S3 Storage Management Helps Optimize Storage at Scale, with Special Guest, Pinterest (STG215)

S3 Analytics – Storage Class Analysis

Page 18: AWS re:Invent 2016: How Amazon S3 Storage Management Helps Optimize Storage at Scale, with Special Guest, Pinterest (STG215)

Simple to configure S3 Analytics

• S3 Management Console

• PUT Bucket Analytics

• Multiple Policy Documents

<AnalyticsConfiguration>
  <Id>...</Id>
  <Filter>
    ...
  </Filter>
  <StorageClassAnalysis>
    <DataExport>
      ...
    </DataExport>
  </StorageClassAnalysis>
</AnalyticsConfiguration>
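For readers using the SDK rather than raw XML, the same structure can be expressed as a Python dictionary. This is a sketch under assumptions: the values are hypothetical placeholders rather than the elided contents of the slide, and the `DataExport` fields shown (`OutputSchemaVersion`, `Destination`, `S3BucketDestination`) follow the shape boto3 accepts for `put_bucket_analytics_configuration`.

```python
def storage_class_analysis_config(
    config_id: str,
    prefix: str,
    export_bucket_arn: str,
    export_prefix: str,
) -> dict:
    """Build an analytics configuration mirroring the XML skeleton on the slide."""
    return {
        "Id": config_id,
        "Filter": {"Prefix": prefix},
        "StorageClassAnalysis": {
            "DataExport": {
                "OutputSchemaVersion": "V_1",
                "Destination": {
                    "S3BucketDestination": {
                        "Format": "CSV",
                        "Bucket": export_bucket_arn,
                        "Prefix": export_prefix,
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    # Hypothetical names throughout.
    config = storage_class_analysis_config(
        "docs-analysis",
        "documents/",
        "arn:aws:s3:::example-export-bucket",
        "analytics/",
    )
    print(config)
    # With boto3 this dict would be passed roughly as:
    #   boto3.client("s3").put_bucket_analytics_configuration(
    #       Bucket="example-source-bucket", Id=config["Id"],
    #       AnalyticsConfiguration=config)
```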

Page 19: AWS re:Invent 2016: How Amazon S3 Storage Management Helps Optimize Storage at Scale, with Special Guest, Pinterest (STG215)

Monitor your storage

• Monitor and Alert with CloudWatch

• Audit your storage with CloudTrail Data Events

• Server Access Logs

Page 20: AWS re:Invent 2016: How Amazon S3 Storage Management Helps Optimize Storage at Scale, with Special Guest, Pinterest (STG215)

CloudWatch metrics for S3

Operational & performance monitoring

• Generate metrics for data of your choice

• Entire bucket, Prefixes, and Tags

• Up to 1,000 object groups

• 1-minute CloudWatch metrics

• Alert and alarm on metrics

Page 21: AWS re:Invent 2016: How Amazon S3 Storage Management Helps Optimize Storage at Scale, with Special Guest, Pinterest (STG215)

CloudWatch metrics for S3

Metric Name | Metric value
AllRequests | Count
PutRequests | Count
PostRequests | Count
GetRequests | Count
ListRequests | Count
DeleteRequests | Count
HeadRequests | Count
BytesDownloaded | MB
BytesUploaded | MB
4xxErrors | Count
5xxErrors | Count
FirstByteLatency | ms
TotalRequestLatency | ms

$0.30 per metric per month
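To make "alert and alarm on metrics" concrete, here is a small sketch of the arithmetic an alarm on server errors might encode: the ratio of 5xxErrors to AllRequests over one period compared against a threshold. The 1% threshold and the function names are assumptions for illustration; a real alarm would be defined in CloudWatch rather than in application code.

```python
def error_rate(all_requests: int, errors: int) -> float:
    """Fraction of requests in a period that returned an error."""
    if all_requests <= 0:
        return 0.0
    return errors / all_requests


def should_alarm(all_requests: int, errors_5xx: int, threshold: float = 0.01) -> bool:
    """True when the 5xx error rate for the period exceeds the threshold."""
    return error_rate(all_requests, errors_5xx) > threshold


if __name__ == "__main__":
    # Hypothetical one-minute samples of (AllRequests, 5xxErrors).
    for all_requests, errors_5xx in [(2000, 10), (2000, 50), (0, 0)]:
        print(all_requests, errors_5xx, should_alarm(all_requests, errors_5xx))
```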

Page 22: AWS re:Invent 2016: How Amazon S3 Storage Management Helps Optimize Storage at Scale, with Special Guest, Pinterest (STG215)

Demo

S3 CloudWatch Metrics

Page 23: AWS re:Invent 2016: How Amazon S3 Storage Management Helps Optimize Storage at Scale, with Special Guest, Pinterest (STG215)

S3 Data Events in CloudTrail

Perform security analysis, meet your IT auditing and compliance needs, and take immediate action on object-level activity to improve your security posture

• Log object-level operations

• Changes to bucket configurations

• SNS notification for log delivery

Pricing: $1 per million data events recorded; storage charges apply

Page 24: AWS re:Invent 2016: How Amazon S3 Storage Management Helps Optimize Storage at Scale, with Special Guest, Pinterest (STG215)

Manage your data

• Cross-Region Replication

• Lifecycle Policies

• Event Notifications

• S3 Object Tags

Page 25: AWS re:Invent 2016: How Amazon S3 Storage Management Helps Optimize Storage at Scale, with Special Guest, Pinterest (STG215)

Manage your data

S3 Object Tags

Easily manage and control access for Amazon S3 objects

• Classify your data

• Tag your objects with key-value pairs

• Write policies once based on the type of data

Analyze, Lifecycle Policy, Access Control

Page 26: AWS re:Invent 2016: How Amazon S3 Storage Management Helps Optimize Storage at Scale, with Special Guest, Pinterest (STG215)

Deep dive on tags

• Tags are key-value pairs

• Maximum 10 tags per object

• Maximum key length—127 Unicode characters

• Maximum value length—255 Unicode characters

• Tag keys and values are case-sensitive.

Two ways to add tags via the API

• PUT the object with a tagging parameter, or

• Call the tagging API after the object is created

Simple pricing

• $0.01 per 10,000 tags per month
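The limits on this slide are easy to check before an upload. The sketch below validates a tag set against the numbers quoted above (10 tags, 127-character keys, 255-character values) and builds the URL-encoded string form that a tagging parameter on PUT generally expects. The limits are taken from the slide as stated, and the function names are hypothetical.

```python
from urllib.parse import urlencode

# Limits as quoted on the slide.
MAX_TAGS = 10
MAX_KEY_LENGTH = 127
MAX_VALUE_LENGTH = 255


def validate_tags(tags: dict) -> list:
    """Return a list of problems; an empty list means the tag set looks valid."""
    problems = []
    if len(tags) > MAX_TAGS:
        problems.append(f"too many tags: {len(tags)} > {MAX_TAGS}")
    for key, value in tags.items():
        if not key:
            problems.append("empty tag key")
        if len(key) > MAX_KEY_LENGTH:
            problems.append(f"key too long: {key[:20]}...")
        if len(value) > MAX_VALUE_LENGTH:
            problems.append(f"value too long for key {key!r}")
    return problems


def tagging_parameter(tags: dict) -> str:
    """Encode tags as a URL query string, e.g. for a tagging header on PUT."""
    return urlencode(tags)


if __name__ == "__main__":
    # Keys are case-sensitive, so "Project" and "project" are distinct tags.
    tags = {"Project": "Delta", "Data type": "HPI"}
    print(validate_tags(tags), tagging_parameter(tags))
```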

Page 27: AWS re:Invent 2016: How Amazon S3 Storage Management Helps Optimize Storage at Scale, with Special Guest, Pinterest (STG215)

What can I do with tags?

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject"
      ],
      "Resource": "arn:aws:s3:::EXAMPLE-BUCKET-NAME/*",
      "Condition": {"StringEquals": {"s3:ExistingObjectTag/HIPAA": "True"}}
    }
  ]
}

Manage permissions with tags

Page 28: AWS re:Invent 2016: How Amazon S3 Storage Management Helps Optimize Storage at Scale, with Special Guest, Pinterest (STG215)

Lifecycle policies based on tags

<LifecycleConfiguration>
  <Rule>
    <ID>sample-rule</ID>
    <Filter>
      <And>
        <Prefix>documents/</Prefix>
        <Tag>
          <Key>Project</Key>
          <Value>Delta</Value>
        </Tag>
        <Tag>
          <Key>Data type</Key>
          <Value>HPI</Value>
        </Tag>
      </And>
    </Filter>
    <Status>Enabled</Status>
    <Transition>
      <Days>365</Days>
      <StorageClass>GLACIER</StorageClass>
    </Transition>
    <Expiration>
      <Days>3650</Days>
    </Expiration>
  </Rule>
</LifecycleConfiguration>

• Transition or expire storage using tags

• Simplify S3 lifecycle policies

• Filter with prefix, tag, or both
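The rule above can be read as a small decision procedure: an object is in scope only if its key starts with the prefix and it carries every listed tag, and its age then determines the action. The sketch below models that reading locally for the sample rule; it is an illustration of the filter semantics, not how S3 evaluates lifecycle rules internally, and the function names are hypothetical.

```python
# Values copied from the sample rule on the slide.
RULE_PREFIX = "documents/"
RULE_TAGS = {"Project": "Delta", "Data type": "HPI"}
TRANSITION_DAYS = 365
EXPIRATION_DAYS = 3650


def matches_filter(key: str, tags: dict) -> bool:
    """<And> filter: the prefix must match and every rule tag must be present."""
    has_prefix = key.startswith(RULE_PREFIX)
    has_tags = all(tags.get(name) == value for name, value in RULE_TAGS.items())
    return has_prefix and has_tags


def lifecycle_action(key: str, tags: dict, age_days: int) -> str:
    """Action the sample rule implies for an object of the given age."""
    if not matches_filter(key, tags):
        return "none"
    if age_days >= EXPIRATION_DAYS:
        return "expire"
    if age_days >= TRANSITION_DAYS:
        return "transition to GLACIER"
    return "none"


if __name__ == "__main__":
    tags = {"Project": "Delta", "Data type": "HPI", "Owner": "team-a"}
    for age in (30, 400, 4000):
        print(age, lifecycle_action("documents/report.pdf", tags, age))
```

Extra tags on the object (such as the hypothetical `Owner` tag) do not prevent a match; only the tags named in the rule are compared.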

Page 29: AWS re:Invent 2016: How Amazon S3 Storage Management Helps Optimize Storage at Scale, with Special Guest, Pinterest (STG215)

Putting it all together

Page 30: AWS re:Invent 2016: How Amazon S3 Storage Management Helps Optimize Storage at Scale, with Special Guest, Pinterest (STG215)

Storage Management for S3

• S3 Analytics

• S3 Inventory

• S3 CloudWatch Metrics

• CloudTrail S3 Data Events

• S3 Object Tags

• Lifecycle Policy

• Event Notifications

• Cross-Region Replication

Page 31: AWS re:Invent 2016: How Amazon S3 Storage Management Helps Optimize Storage at Scale, with Special Guest, Pinterest (STG215)

Confidential

Pinterest Infrastructure

John Elliott


Page 32: AWS re:Invent 2016: How Amazon S3 Storage Management Helps Optimize Storage at Scale, with Special Guest, Pinterest (STG215)

80+ billion Pins categorized by people into more than 2.6 billion Boards

Page 33: AWS re:Invent 2016: How Amazon S3 Storage Management Helps Optimize Storage at Scale, with Special Guest, Pinterest (STG215)

80+ terabytes of new data... every day

Almost entirely log data...

Over 140 petabytes of data

Page 34: AWS re:Invent 2016: How Amazon S3 Storage Management Helps Optimize Storage at Scale, with Special Guest, Pinterest (STG215)
Page 35: AWS re:Invent 2016: How Amazon S3 Storage Management Helps Optimize Storage at Scale, with Special Guest, Pinterest (STG215)

Proprietary and Confidential

Pinterest Growth for S3


Storage Growth

YTD 60%

12 Months 86%

Since Jan ‘14 1,467%

Page 36: AWS re:Invent 2016: How Amazon S3 Storage Management Helps Optimize Storage at Scale, with Special Guest, Pinterest (STG215)

Old data flow: 6 hr runtime

• Count object sizes and read API log

• Join datasets to determine object access activity in order to make tiering decisions

[Diagram components: S3 bucket listing, S3 API logs, Inventory Job, Operations Job, Efficiency Job, Rollup Job, Efficiency Report]

Page 37: AWS re:Invent 2016: How Amazon S3 Storage Management Helps Optimize Storage at Scale, with Special Guest, Pinterest (STG215)

New data flow: 20 min runtime

• S3 Inventory report allows full bucket inventory and operations data

• S3 Analytics provides much needed data on object age and access patterns

[Diagram components: S3 Inventory, Efficiency Job, Rollup Job, Efficiency Report]
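As a rough illustration of what a rollup over inventory data might look like, the sketch below totals object counts and bytes per top-level prefix. It is not Pinterest's job: the row shape, the prefix depth, and the function name are all assumptions made for the example.

```python
from collections import defaultdict


def rollup_by_prefix(rows, depth: int = 1) -> dict:
    """Total object count and bytes per key prefix.

    rows is an iterable of (key, size_in_bytes) pairs, e.g. taken from an
    inventory report. Keys with fewer than depth + 1 path segments are
    grouped under the empty prefix "".
    """
    totals = defaultdict(lambda: {"objects": 0, "bytes": 0})
    for key, size in rows:
        parts = key.split("/")
        prefix = "/".join(parts[:depth]) + "/" if len(parts) > depth else ""
        totals[prefix]["objects"] += 1
        totals[prefix]["bytes"] += size
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical inventory rows.
    rows = [
        ("logs/2016/a.gz", 100),
        ("logs/2016/b.gz", 50),
        ("images/x.png", 10),
        ("readme.txt", 1),
    ]
    for prefix, total in sorted(rollup_by_prefix(rows).items()):
        print(repr(prefix), total)
```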

Page 38: AWS re:Invent 2016: How Amazon S3 Storage Management Helps Optimize Storage at Scale, with Special Guest, Pinterest (STG215)


A single click with S3 Analytics

● S3 Analytics provides Storage Class Analysis

Page 39: AWS re:Invent 2016: How Amazon S3 Storage Management Helps Optimize Storage at Scale, with Special Guest, Pinterest (STG215)


Storage “Office Hours”: Meet the People who Build AWS Storage

Who: Lead Software Development Engineers, Architects, and Technical PMs

Where: Storage Booth Walk-up Bar

When: Exhibit hours (Tues 5-7pm, Wed & Thurs 10:30a-6:00p)

What: Architecture best practices, code reviews, feature requests

Page 40: AWS re:Invent 2016: How Amazon S3 Storage Management Helps Optimize Storage at Scale, with Special Guest, Pinterest (STG215)

Remember to complete

your evaluations!