© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
STG215
December 1, 2016
How Amazon S3 Storage Management
Helps Optimize Storage at Scale, with
Special Guest, Pinterest
Omair Gillani, Sr. Product Manager, AWS
John Elliott, Mgr. Data and Storage, Pinterest
What to Expect from the Session
• How we think about Storage Management for
Amazon S3
• Storage Management portfolio for S3
• Understand your data
• Monitor your data
• Manage your data
• Pulling it all together
• Storage management @ Pinterest
How we think about Storage Management
for Amazon S3
[Chart: Amazon storage usage, 2012 to 2014]
Trillions of objects
Millions of transactions per second
What data do I have?
How is my data being used?
How can I better manage my data?
Do I have data that is not being accessed?
Can I perform data-driven storage management?
“Why Storage Management?”
The New Yorker 2013
What data should I archive?
A comprehensive Storage Management
portfolio for Amazon S3
• Cross-Region Replication
• Lifecycle Policy
• S3 Object Tags
• Event Notifications
• Amazon S3 CloudWatch Metrics
• S3 Inventory
• Audit with AWS CloudTrail S3 Data Events
• S3 Analytics
Storage classes: Standard, Standard - Infrequent Access, Amazon Glacier
Storage Management for S3
Understand your storage usage
• S3 Inventory
• Analyze Logs with Amazon EMR
• S3 Analytics
S3 Inventory
• Save time
• Daily or Weekly delivery
• Delivery to S3 bucket
• CSV File Output
Trigger business workflows and applications such as secondary index, garbage collection,
data auditing, and offline analytics
Half the price of LIST API at $0.0025 per million objects listed
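The pricing line above can be turned into a quick estimate. The sketch below is only an illustration: the S3 Inventory price comes from the slide, the LIST price is inferred from the "half the price" statement, and the one-billion-object bucket is a hypothetical figure.

```python
# Price per million objects listed. The inventory price is from the slide;
# the LIST price is inferred from "half the price of LIST API".
INVENTORY_PRICE_PER_MILLION = 0.0025
LIST_PRICE_PER_MILLION = INVENTORY_PRICE_PER_MILLION * 2


def listing_cost(object_count: int, price_per_million: float) -> float:
    """Return the cost in dollars of listing object_count objects once."""
    return object_count / 1_000_000 * price_per_million


if __name__ == "__main__":
    objects = 1_000_000_000  # hypothetical bucket size
    inventory = listing_cost(objects, INVENTORY_PRICE_PER_MILLION)
    listed = listing_cost(objects, LIST_PRICE_PER_MILLION)
    print(f"S3 Inventory: ${inventory:.2f} per report")
    print(f"LIST API:     ${listed:.2f} per full listing")
```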
S3 Inventory
More information about your objects than provided by LIST API such as replication
status, multipart upload flag, and delete marker
Name | Value Type | Description
Bucket | String | Bucket name. UTF-8 encoded.
Key | String | Object key name. UTF-8 encoded.
Version Id | String | Version Id of the object
Is Latest | Boolean | true if object is the latest version (current version) of a versioned object, otherwise false
Delete Marker | Boolean | true if object is a delete marker of a versioned object, otherwise false
Size | Long | Object size in bytes
Last Modified | String | Last modified timestamp. Format in ISO: YYYY-MM-DDTHH:mm:ss.SSSZ
ETag | String | eTag in HEX encoded format
StorageClass | String | Valid values: STANDARD, REDUCED_REDUNDANCY, GLACIER, STANDARD_IA. UTF-8 encoded.
Multipart Uploaded | Boolean | true if object is uploaded by using multipart, otherwise false
Replication Status | String | Valid values: REPLICA, COMPLETED, PENDING, FAILED. UTF-8 encoded.
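A minimal sketch of reading one inventory file into typed records, using the fields listed above. It assumes the CSV has no header row and that its columns appear in the same order as the table; both are assumptions to check against your own inventory output. The `InventoryRecord` and `parse_inventory` names are hypothetical.

```python
import csv
import gzip
import io
from typing import Iterator, NamedTuple


class InventoryRecord(NamedTuple):
    bucket: str
    key: str
    version_id: str
    is_latest: bool
    delete_marker: bool
    size: int
    last_modified: str
    etag: str
    storage_class: str
    multipart_uploaded: bool
    replication_status: str


def _flag(text: str) -> bool:
    """Interpret a 'true'/'false' CSV cell as a boolean."""
    return text.strip().lower() == "true"


def parse_inventory(gz_bytes: bytes) -> Iterator[InventoryRecord]:
    """Yield one record per row of a gzip-compressed inventory CSV."""
    with gzip.open(io.BytesIO(gz_bytes), mode="rt",
                   encoding="utf-8", newline="") as fh:
        for row in csv.reader(fh):
            yield InventoryRecord(
                bucket=row[0],
                key=row[1],
                version_id=row[2],
                is_latest=_flag(row[3]),
                delete_marker=_flag(row[4]),
                size=int(row[5]) if row[5] else 0,  # empty size treated as 0
                last_modified=row[6],
                etag=row[7],
                storage_class=row[8],
                multipart_uploaded=_flag(row[9]),
                replication_status=row[10],
            )
```

Records parsed this way can feed the workflows named earlier, such as a secondary index or offline analytics.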
S3 Inventory
Set up a notification for when S3 Inventory is complete
/Data/<InventoryFile>.gz
/<InventoryFile>.gz
…
/<DayofReport>/manifest.json
/manifest.checksum
…
Notification destinations: AWS Lambda, Amazon SQS, Amazon SNS
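As a rough illustration of the notification step, the sketch below shows how an AWS Lambda function might pick out the manifest.checksum object from an S3 event. The `Records[].s3.bucket.name` and `Records[].s3.object.key` fields follow the standard S3 event notification format; treating the arrival of manifest.checksum as the "inventory complete" signal is an assumption here, and the function names are hypothetical.

```python
from urllib.parse import unquote_plus


def completed_inventories(event: dict) -> list:
    """Return (bucket, key) pairs for manifest.checksum objects in an S3 event."""
    found = []
    for record in event.get("Records", []):
        s3 = record.get("s3", {})
        bucket = s3.get("bucket", {}).get("name")
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(s3.get("object", {}).get("key", ""))
        if bucket and key.endswith("manifest.checksum"):
            found.append((bucket, key))
    return found


def handler(event, context):
    """Hypothetical Lambda entry point: report which inventories are ready."""
    ready = completed_inventories(event)
    for bucket, key in ready:
        print(f"Inventory ready: s3://{bucket}/{key}")
    return {"ready": len(ready)}
```

From here the function could start a downstream workflow such as the auditing or garbage collection jobs mentioned earlier.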
Eventually consistent rolling snapshot
S3 Inventory
• New objects may not be listed
• Recently deleted objects may still be included
[Diagram: four object sets: O1 O2 O3; O1 O2 O3; O1 O2; O1 O2 O3 (NEW)]
Validate before you act! Use HEAD OBJECT or GET OBJECT.
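One way to apply the "validate before you act" advice is to re-check every inventory key with a fresh HEAD request before doing anything destructive. The sketch below is not an AWS API: `head_object` is a hypothetical callable that returns object metadata, or `None` when the object no longer exists. In practice it would wrap an S3 HEAD Object call.

```python
from typing import Callable, Iterable, List, Optional


def confirm_live_keys(
    inventory_keys: Iterable[str],
    head_object: Callable[[str], Optional[dict]],
) -> List[str]:
    """Return only the inventory keys that a fresh HEAD request still finds."""
    live = []
    for key in inventory_keys:
        # The inventory is an eventually consistent snapshot, so a listed
        # object may already be gone; skip anything HEAD cannot find.
        if head_object(key) is not None:
            live.append(key)
    return live


if __name__ == "__main__":
    # In-memory stand-in for the bucket: O3 was deleted after the snapshot.
    bucket = {"O1": {"size": 10}, "O2": {"size": 20}}
    print(confirm_live_keys(["O1", "O2", "O3"], bucket.get))
```

The same check does not catch objects created after the snapshot, so workflows that need completeness still have to account for new keys separately.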
S3 Analytics – Storage Class Analysis
Data-driven storage management for S3
• Analyze buckets, prefixes, or tags
• Daily Storage Class Analysis and Lifecycle candidates
• Export analysis data to your S3 bucket
• $0.10 per million objects analyzed per month
S3 Analytics – Storage Class Analysis
Export to use BI tool of your choice
Demo
Heavily used storage
Archival storage
Infrequently used storage
S3 Analytics – Storage Class Analysis
S3 Analytics – Storage Class Analysis
Simple to configure S3 Analytics
• S3 Management Console
• PUT Bucket Analytics
• Multiple Policy Documents
<AnalyticsConfiguration>
  <Id>...</Id>
  <Filter>
    ...
  </Filter>
  <StorageClassAnalysis>
    <DataExport>
      ...
    </DataExport>
  </StorageClassAnalysis>
</AnalyticsConfiguration>
Monitor your storage
• Monitor and Alert with CloudWatch
• Audit your storage with CloudTrail Data Events
• Server Access Logs
CloudWatch metrics for S3
Operational & performance monitoring
• Generate metrics for data of your choice
• Entire bucket, Prefixes, and Tags
• Up to 1,000 object groups
• 1-minute CloudWatch metrics
• Alert and alarm on metrics
CloudWatch metrics for S3
Metric Name | Metric Value
AllRequests | Count
PutRequests | Count
PostRequests | Count
GetRequests | Count
ListRequests | Count
DeleteRequests | Count
HeadRequests | Count
BytesDownloaded | MB
BytesUploaded | MB
4xxErrors | Count
5xxErrors | Count
FirstByteLatency | ms
TotalRequestLatency | ms
$0.30 per metric per month
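To illustrate alarming on these metrics, the sketch below computes an error rate from the `4xxErrors`, `5xxErrors`, and `AllRequests` counts and fires only when several consecutive one-minute datapoints exceed a threshold. This is plain Python rather than the CloudWatch alarm API, and the 5% threshold and three-period window are hypothetical values.

```python
from typing import Dict, List


def error_rate(datapoint: Dict[str, int]) -> float:
    """Fraction of requests in one datapoint that returned a 4xx or 5xx."""
    total = datapoint.get("AllRequests", 0)
    if total == 0:
        return 0.0
    errors = datapoint.get("4xxErrors", 0) + datapoint.get("5xxErrors", 0)
    return errors / total


def should_alarm(
    datapoints: List[Dict[str, int]],
    threshold: float = 0.05,  # hypothetical: alarm above 5% errors
    periods: int = 3,         # hypothetical: for 3 consecutive minutes
) -> bool:
    """True when the most recent `periods` datapoints all exceed the threshold."""
    if len(datapoints) < periods:
        return False
    return all(error_rate(d) > threshold for d in datapoints[-periods:])
```

Requiring several consecutive breaches avoids alarming on a single noisy minute, at the cost of a slower reaction.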
Demo
S3 CloudWatch Metrics
S3 Data Events in CloudTrail
Perform security analysis, meet your IT auditing and compliance needs,
and act immediately on object-level activity to improve your security
posture
Pricing: $1 per million data events recorded; storage charges apply
• Log object-level operations
• Changes to bucket configurations
• SNS notification for log delivery
Manage your data
• Cross-Region Replication
• Lifecycle Policies
• Event Notifications
• S3 Object Tags
Manage your data
S3 Object Tags
Easily manage and control access for Amazon S3 objects
• Classify your data
• Tag your objects with key-value pairs
• Write policies once based on the type of data
Analyze, Lifecycle Policy, Access Control
Deep dive on tags
• Tags are key-value pairs
• Maximum 10 tags per object
• Maximum key length: 127 Unicode characters
• Maximum value length: 255 Unicode characters
• Tag keys and values are case-sensitive
Two ways to put tags via the API
• Put objects with a tag parameter, or
• Use the add tag API after the object is created
Simple pricing
• $0.01 per 10,000 tags per month
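The limits above can be checked on the client before a tag set is sent. The sketch below uses the limits as stated on the slide (10 tags, 127-character keys, 255-character values) and encodes the tags as a URL query string, which is the form the tag parameter on a put request is assumed to take here. The function names are hypothetical.

```python
from typing import Dict
from urllib.parse import urlencode

# Limits as stated on the slide.
MAX_TAGS = 10
MAX_KEY_LEN = 127
MAX_VALUE_LEN = 255


def validate_tags(tags: Dict[str, str]) -> None:
    """Raise ValueError if the tag set breaks any of the stated limits."""
    if len(tags) > MAX_TAGS:
        raise ValueError(f"too many tags: {len(tags)} > {MAX_TAGS}")
    for key, value in tags.items():
        if not key or len(key) > MAX_KEY_LEN:
            raise ValueError(f"invalid tag key length: {key!r}")
        if len(value) > MAX_VALUE_LEN:
            raise ValueError(f"tag value too long for key {key!r}")


def encode_tagging(tags: Dict[str, str]) -> str:
    """Validate the tags, then encode them as a URL query string."""
    validate_tags(tags)
    return urlencode(tags)


if __name__ == "__main__":
    print(encode_tagging({"Project": "Delta", "Data type": "HPI"}))
```

Because keys are case-sensitive, `Project` and `project` count as two distinct tags against the limit.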
What can I do with tags?
Manage permissions with tags
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject"
      ],
      "Resource": "arn:aws:s3:::EXAMPLE-BUCKET-NAME/*",
      "Condition": {"StringEquals": {"s3:ExistingObjectTag/HIPAA": "True"}}
    }
  ]
}
Lifecycle policies based on tags
<LifecycleConfiguration>
  <Rule>
    <ID>sample-rule</ID>
    <Filter>
      <And>
        <Prefix>documents/</Prefix>
        <Tag>
          <Key>Project</Key>
          <Value>Delta</Value>
        </Tag>
        <Tag>
          <Key>Data type</Key>
          <Value>HPI</Value>
        </Tag>
      </And>
    </Filter>
    <Status>Enabled</Status>
    <Transition>
      <Days>365</Days>
      <StorageClass>GLACIER</StorageClass>
    </Transition>
    <Expiration>
      <Days>3650</Days>
    </Expiration>
  </Rule>
</LifecycleConfiguration>
• Transition or expire storage using tags
• Simplify S3 lifecycle policies
• Filter with prefix, tag, or both
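To make the filter semantics concrete, the sketch below models the sample rule in plain Python: the `<And>` filter matches only when the prefix and every listed tag match, and the action depends on the object's age. This is a simplified model for reasoning about a rule, not how S3 schedules lifecycle actions. The `RULE` dictionary layout and the function names are hypothetical.

```python
from typing import Dict, Optional

# The sample rule, restated as a dictionary (layout is illustrative only).
RULE = {
    "prefix": "documents/",
    "tags": {"Project": "Delta", "Data type": "HPI"},
    "transition_days": 365,
    "transition_class": "GLACIER",
    "expiration_days": 3650,
}


def rule_matches(rule: dict, key: str, tags: Dict[str, str]) -> bool:
    """An <And> filter matches only if the prefix and every listed tag match."""
    if not key.startswith(rule["prefix"]):
        return False
    return all(tags.get(k) == v for k, v in rule["tags"].items())


def action_for(rule: dict, key: str, tags: Dict[str, str],
               age_days: int) -> Optional[str]:
    """Return the action the rule implies for an object of the given age."""
    if not rule_matches(rule, key, tags):
        return None
    if age_days >= rule["expiration_days"]:
        return "EXPIRE"
    if age_days >= rule["transition_days"]:
        return "TRANSITION:" + rule["transition_class"]
    return None
```

For example, a 400-day-old object under `documents/` carrying both tags would be a transition candidate, while the same object without the `Project` tag would be left alone.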
Putting it all together
Storage Management for S3
• Cross-Region Replication
• Lifecycle Policy
• S3 Object Tags
• Event Notifications
• S3 CloudWatch Metrics
• S3 Inventory
• CloudTrail S3 Data Events
• S3 Analytics
Confidential
Pinterest Infrastructure
John Elliott
80+ Billion Pins categorized by people into more than
2.6 Billion Boards
80+ terabytes of new data... every day
Almost entirely log data...
Over 140 petabytes of data
Proprietary and Confidential
Pinterest Growth for S3
Storage Growth
YTD 60%
12 Months 86%
Since Jan ‘14 1,467%
Old data flow: 6 hr runtime
• Count object sizes and read API log
• Join datasets to determine object access activity in
order to make tiering decisions
[Diagram components: S3 bucket listing, S3 API logs, Inventory Job, Operations Job, Efficiency Job, Rollup Job, Efficiency Report]
New data flow: 20 min runtime
• S3 Inventory report allows full bucket inventory
and operations data
• S3 Analytics provides much needed data on
object age and access patterns
[Diagram components: S3 Inventory, Efficiency Job, Rollup Job, Efficiency Report]
A single click with S3 Analytics
• S3 Analytics provides Storage Class Analysis
Storage “Office Hours”
Meet the People who Build AWS Storage
Who: Lead Software Development Engineers, Architects, and Technical PMs
Where: Storage Booth Walk-up Bar
When: Exhibit hours (Tues 5-7pm, Wed & Thurs 10:30a-6:00p)
What: Architecture best practices, code reviews, feature requests
Remember to complete
your evaluations!