Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013

DESCRIPTION

Amazon Redshift is a fast, fully-managed, petabyte-scale data warehouse service that costs less than $1,000 per terabyte per year—less than a tenth the price of most traditional data warehousing solutions. In this session, you get an overview of Amazon Redshift, including how Amazon Redshift uses columnar technology, optimized hardware, and massively parallel processing to deliver fast query performance on data sets ranging in size from hundreds of gigabytes to a petabyte or more. Finally, we announce new features that we've been working on over the past few months.

TRANSCRIPT

Page 1: Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013

© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.

Amazon Redshift Overview & What’s Next

Rahul Pathak, Redshift PM
Anurag Gupta, Redshift GM

November 13, 2013

Page 2: Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013

Amazon Redshift

Fast, simple, petabyte-scale data warehousing for less than $1,000/TB/Year

Page 3: Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013

Amazon Redshift dramatically reduces I/O

• Data compression

• Zone maps

• Direct-attached storage

• With row storage you do unnecessary I/O

• To get total amount, you have to read everything

ID   Age  State  Amount
123  20   CA     500
345  25   WA     250
678  40   FL     125
957  37   WA     375

Page 4: Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013

• With column storage, you only read the data you need

ID   Age  State  Amount
123  20   CA     500
345  25   WA     250
678  40   FL     125
957  37   WA     375

Amazon Redshift dramatically reduces I/O

• Data compression

• Zone maps

• Direct-attached storage
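The difference between row and column storage on the two slides above can be sketched with a toy in-memory model. This is an illustration only, not Redshift's storage format: the function names are hypothetical, and "values read" is a stand-in for I/O.

```python
# Toy model of the table on the slides; not Redshift's on-disk layout.
rows = [
    (123, 20, "CA", 500),
    (345, 25, "WA", 250),
    (678, 40, "FL", 125),
    (957, 37, "WA", 375),
]

def total_amount_row_store(rows):
    """Row storage: every field of every row is read to total one column."""
    values_read = 0
    total = 0
    for row in rows:
        values_read += len(row)  # the whole row comes off disk
        total += row[3]          # only Amount is actually needed
    return total, values_read

# Column storage: each column is kept contiguously and can be read alone.
columns = {
    "id": [r[0] for r in rows],
    "age": [r[1] for r in rows],
    "state": [r[2] for r in rows],
    "amount": [r[3] for r in rows],
}

def total_amount_column_store(columns):
    """Column storage: only the Amount column is read."""
    amounts = columns["amount"]
    return sum(amounts), len(amounts)

print(total_amount_row_store(rows))        # total and values read, row layout
print(total_amount_column_store(columns))  # total and values read, column layout
```

Both functions return the same total; the row layout touches all sixteen values while the column layout touches only the four amounts.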

Page 5: Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013

analyze compression listing;

  Table  |     Column     | Encoding
---------+----------------+----------
 listing | listid         | delta
 listing | sellerid       | delta32k
 listing | eventid        | delta32k
 listing | dateid         | bytedict
 listing | numtickets     | bytedict
 listing | priceperticket | delta32k
 listing | totalprice     | mostly32
 listing | listtime       | raw

Slides not intended for redistribution.

Amazon Redshift dramatically reduces I/O

• Data compression

• Zone maps

• Direct-attached storage

• COPY compresses automatically on load

• You can analyze and override

• More performance, less cost
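Several of the encodings in the analyze output above are delta variants. The general idea of delta encoding can be sketched as follows. This is a minimal illustration of the technique, not Redshift's actual encoder; the sample `list_ids` values are made up.

```python
def delta_encode(values):
    """Keep the first value, then store each value as a difference from the previous one."""
    if not values:
        return []
    deltas = [values[0]]
    for prev, cur in zip(values, values[1:]):
        deltas.append(cur - prev)
    return deltas

def delta_decode(deltas):
    """Rebuild the original values with a running sum."""
    values = []
    running = 0
    for d in deltas:
        running += d
        values.append(running)
    return values

# Hypothetical, slowly increasing IDs: the differences are much smaller
# than the values themselves, so they need fewer bytes to store.
list_ids = [1000001, 1000002, 1000004, 1000007]
encoded = delta_encode(list_ids)
print(encoded)
print(delta_decode(encoded) == list_ids)
```

Small differences fit in fewer bytes than the full values, which is why this style of encoding suits sorted or slowly changing columns.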

Page 6: Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013

Amazon Redshift dramatically reduces I/O

• Data compression

• Zone maps

• Direct-attached storage

Block 1: 10 | 13 | 14 | 26 | … | 100 | 245 | 324   (min 10, max 324)

Block 2: 375 | 393 | 417 | … | 512 | 549 | 623   (min 375, max 623)

Block 3: 637 | 712 | 809 | … | 834 | 921 | 959   (min 637, max 959)

• Track the minimum and maximum value for each block

• Skip over blocks that don’t contain relevant data
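The two bullets above can be sketched in a few lines. This is a simplified model, assuming each block is a plain list holding only the values visible on the slide (the elided values are left out); the function names are hypothetical.

```python
# Only the values visible on the slide; elided values are omitted.
blocks = [
    [10, 13, 14, 26, 100, 245, 324],
    [375, 393, 417, 512, 549, 623],
    [637, 712, 809, 834, 921, 959],
]

def build_zone_map(blocks):
    """Record the minimum and maximum value of each block."""
    return [(min(b), max(b)) for b in blocks]

def scan_equal(blocks, zone_map, target):
    """Return matching values and how many blocks had to be read."""
    matches = []
    blocks_read = 0
    for block, (lo, hi) in zip(blocks, zone_map):
        if target < lo or target > hi:
            continue  # the block cannot contain the target, so skip it
        blocks_read += 1
        matches.extend(v for v in block if v == target)
    return matches, blocks_read

zone_map = build_zone_map(blocks)
print(zone_map)
print(scan_equal(blocks, zone_map, 549))  # only the middle block is read
print(scan_equal(blocks, zone_map, 350))  # falls between blocks: nothing is read
```

The skipping is most effective when the data is sorted, because the min/max ranges of different blocks then overlap very little.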

Page 7: Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013

Amazon Redshift dramatically reduces I/O

• Data compression

• Zone maps

• Direct-attached storage

DW.HS1.8XL:

• > 2 GB/s scan rate

• Optimized for data processing

• High disk density

DW.HS1.XL:

Page 8: Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013

Amazon Redshift architecture

• Leader Node
– SQL endpoint
– Stores metadata
– Coordinates query execution

• Compute Nodes
– Local, columnar storage
– Execute queries in parallel
– Load, backup, restore via Amazon S3
– Parallel load from Amazon DynamoDB

• Single node version available

Diagram labels: JDBC/ODBC, 10 GigE (HPC), Ingestion, Backup, Restore

Page 9: Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013

Amazon Redshift parallelizes and distributes everything

• Load

• Backup/Restore

• Resize

Page 10: Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013

• Load in parallel from Amazon S3 or Amazon DynamoDB

• Data automatically distributed and sorted according to DDL

• Scales linearly with number of nodes

Amazon Redshift parallelizes and distributes everything

• Load

• Backup/Restore

• Resize
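"Distributed and sorted according to DDL" can be pictured with a small sketch. This is an assumption-laden illustration, not Redshift's implementation: it uses `zlib.crc32` as a stand-in hash, treats one column as the distribution key and another as the sort key, and the `distribute` function and sample rows are hypothetical.

```python
import zlib

def distribute(rows, dist_key, sort_key, num_nodes):
    """Assign each row to a node by hashing its distribution key,
    then sort each node's rows by the sort key."""
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        # Stand-in hash; the real hash function is not described in the slides.
        h = zlib.crc32(str(row[dist_key]).encode("utf-8"))
        nodes[h % num_nodes].append(row)
    for node_rows in nodes:
        node_rows.sort(key=lambda r: r[sort_key])
    return nodes

# Hypothetical rows loosely modeled on the listing table shown earlier.
sample_rows = [
    {"listid": i, "sellerid": i % 5, "listtime": 100 - i}
    for i in range(20)
]

for node_id, node_rows in enumerate(distribute(sample_rows, "sellerid", "listtime", 4)):
    print(node_id, [r["listid"] for r in node_rows])
```

Because rows with the same key always hash to the same node, each node can load and later scan its own share independently, which is what lets load throughput grow with the number of nodes.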

Page 11: Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013

• Backups to Amazon S3 are automatic, continuous and incremental

• Configurable system snapshot retention period

• Take user snapshots on-demand

• Streaming restores enable you to resume querying faster

Amazon Redshift parallelizes and distributes everything

• Load

• Backup/Restore

• Resize

Page 12: Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013

• Resize while remaining online

• Provision a new cluster in the background

• Copy data in parallel from node to node

• Only charged for source cluster

Amazon Redshift parallelizes and distributes everything

• Load

• Backup/Restore

• Resize

Page 13: Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013

• Automatic SQL endpoint switchover via DNS

• Decommission the source cluster

• Simple operation via Console or API

Amazon Redshift parallelizes and distributes everything

• Load

• Backup/Restore

• Resize

Page 14: Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013

Amazon Redshift lets you start small and grow big

Extra Large Node (DW.HS1.XL)
– 3 spindles, 2 TB, 16 GB RAM, 2 cores
– Single Node (2 TB)
– Cluster 2-32 Nodes (4 TB – 64 TB)

Eight Extra Large Node (DW.HS1.8XL)
– 24 spindles, 16 TB, 128 GB RAM, 16 cores, 10 GigE
– Cluster 2-100 Nodes (32 TB – 1.6 PB)

Note: Nodes not to scale
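The cluster capacity ranges on this slide follow directly from storage per node times node count. A quick check, using only the figures from the slide (the dictionary layout and function name are illustrative):

```python
# Per-node storage and cluster size limits as stated on the slide.
NODE_TYPES = {
    "DW.HS1.XL": {"tb_per_node": 2, "min_nodes": 2, "max_nodes": 32},
    "DW.HS1.8XL": {"tb_per_node": 16, "min_nodes": 2, "max_nodes": 100},
}

def capacity_range_tb(node_type):
    """Return (smallest, largest) multi-node cluster capacity in TB."""
    spec = NODE_TYPES[node_type]
    return (
        spec["tb_per_node"] * spec["min_nodes"],
        spec["tb_per_node"] * spec["max_nodes"],
    )

for name in NODE_TYPES:
    low, high = capacity_range_tb(name)
    print(f"{name}: {low} TB to {high} TB")
```

For DW.HS1.8XL the upper bound is 1,600 TB, which is the 1.6 PB quoted on the slide.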

Page 15: Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013

Amazon Redshift is priced to let you analyze all your data

                     Price per Hour for    Effective Hourly   Effective Annual
                     HS1.XL Single Node    Price per TB       Price per TB
On-Demand            $0.850                $0.425             $3,723
1 Year Reservation   $0.500                $0.250             $2,190
3 Year Reservation   $0.228                $0.114             $999

Simple Pricing
• Number of Nodes x Cost per Hour
• No charge for Leader Node
• No upfront costs
• Pay as you go
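The "effective" columns in the pricing table can be reproduced from the hourly node price, assuming 2 TB per HS1.XL node (from the node-size slide) and 8,760 hours in a year. The function names below are illustrative.

```python
HOURS_PER_YEAR = 24 * 365   # 8,760
TB_PER_HS1_XL_NODE = 2      # from the DW.HS1.XL node description

def effective_prices(price_per_node_hour):
    """Return (hourly price per TB, annual price per TB) for one HS1.XL node."""
    hourly_per_tb = price_per_node_hour / TB_PER_HS1_XL_NODE
    annual_per_tb = hourly_per_tb * HOURS_PER_YEAR
    return hourly_per_tb, annual_per_tb

def cluster_cost_per_hour(num_nodes, price_per_node_hour):
    """Simple pricing: number of nodes times cost per hour (leader node is free)."""
    return num_nodes * price_per_node_hour

for label, price in [
    ("On-Demand", 0.850),
    ("1 Year Reservation", 0.500),
    ("3 Year Reservation", 0.228),
]:
    hourly, annual = effective_prices(price)
    print(f"{label}: ${hourly:.3f}/TB/hour, ${annual:,.0f}/TB/year")
```

The 3 Year Reservation row works out to roughly $999 per TB per year, which is the basis for the "less than $1,000/TB/Year" headline.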

Page 16: Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013

Amazon Redshift has security built in

• SSL to secure data in transit

• Encryption to secure data at rest
– AES-256; hardware accelerated
– All blocks on disk and in Amazon S3 encrypted

• No direct access to compute nodes

• Amazon VPC support

Diagram labels: JDBC/ODBC, Customer VPC, Internal VPC, 10 GigE (HPC), Ingestion, Backup, Restore

Page 17: Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013

Amazon Redshift automatically manages data replication and hardware failures

• Replication within the cluster and backup to Amazon S3 to maintain multiple copies of data at all times

• Backups to Amazon S3 are continuous, automatic, and incremental
– Designed for eleven nines of durability

• Continuous monitoring and automated recovery from failures of drives and nodes

• Able to restore snapshots to any Availability Zone within a region

Page 18: Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013

Growing ecosystem

Page 19: Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013

AWS Marketplace

• Find software to use with Amazon Redshift

• One-click deployments

• Flexible pricing options

http://aws.amazon.com/marketplace

Page 20: Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013

Over 40 new features since launch on Feb 14

• Regions
– N. Virginia, Oregon, Dublin, Tokyo, Singapore, Sydney

• Certifications
– PCI, SOC 1/2/3

• Security
– Load/unload encrypted files, Resource-level IAM, Temporary credentials

• Manageability
– Snapshot sharing, backup/restore progress indicators

• Query
– Regex, Cursors, MD5, SHA1, Time zone, workload queue timeout

• Ingestion
– S3 Manifest, LZOP/LZO, JSON built-ins, UTF-8 4-byte, invalid character substitution, CSV, auto datetime format detection, epoch

Page 21: Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013

Amazon Redshift – What’s Next

Page 22: Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013

Security, visibility and control

• Audit logging

• SNS Alerts

Redshift

Page 23: Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013

Visibility and control

• Audit logging

• SNS Alerts

Amazon S3

Amazon Redshift

Database Activity: Logins, Login failures, Queries, Loads

System Activity: Creates, Changes, Deletes, Resizes

AWS CloudTrail

Page 24: Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013

Visibility and control

• Audit logging

• SNS Alerts

Amazon Redshift

SNS Topic

Monitoring, Security, Maintenance, Errors

Page 25: Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013

Batch operations

• Cluster Creation

• Faster Resize

Diagram labels: Amazon Redshift, Amazon S3, Amazon EMR, Amazon EC2, Corporate Data Center

Page 27: Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013

Batch operations

• Cluster Creation

• Faster Resize

15-20 min

3 min

Page 28: Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013

Batch operations

• Cluster Creation

• Faster Resize

29 hours

7 hours

Page 29: Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013

Performance & Concurrency

Page 30: Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013

Performance & Concurrency

692.8s

34.9s

< 2%

Page 31: Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013

Performance & Concurrency

5,951.7s

2,151.9s

Page 32: Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013

Performance & Concurrency

15

50

Page 33: Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013

Service Launch (2/14)

PDX (4/2)

Temp Credentials (4/11)

Unload Encrypted Files

DUB (4/25)

NRT (6/5)

JDBC Fetch Size (6/27)

Unload logs (7/5)

4 byte UTF-8 (7/18)

Statement Timeout (7/22)

SHA1 Builtin (7/15)

Timezone, Epoch, Autoformat (7/25)

WLM Timeout/Wildcards (8/1)

CRC32 Builtin, CSV, Restore Progress (8/9)

UTF-8 Substitution (8/29)

JSON, Regex, Cursors (9/10)

Split_part, Audit tables (10/3)

SIN/SYD (10/8)

HSM Support (11/11)

EMR/HDFS/SSH copy, Distributed Tables, Audit Logging/CloudTrail, Concurrency, Resize Perf., Approximate Count Distinct, SNS Alerts, WLM Memory Management (11/13)

SOC1/2/3 (5/8)

Sharing snapshots (7/18)

Resource Level IAM (8/9)

PCI (8/22)

Feature Delivery

6 weeks left

Page 34: Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013

Redshift Customers at re:Invent

BDT 101: Big Data ‘State of the Union’
Earlier today

DAT 305: Getting Maximum Performance from Amazon Redshift
Wednesday 11/13: 3pm in Murano 3303

Page 35: Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013

Redshift Customers at re:Invent

DAT 306: How Amazon.com is Leveraging Amazon Redshift
Thursday 11/14: 3pm in Murano 3303

DAT 205: Amazon Redshift in Action: Enterprise, Big Data, SaaS
Friday 11/15: 9am in Lido 3006

Page 36: Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013

Please give us your feedback on this presentation

As a thank you, we will select prize winners daily for completed surveys!

DAT 103