AWS Webcast - Introducing Amazon Redshift

Post on 15-Jan-2015

Category:

Technology

DESCRIPTION

This webinar is aimed at older portfolio companies that may have started when AWS wasn't as strong as it is today. Redshift is a great way to use the cloud and to bring data into the cloud, where other cloud services (such as EMR) can consume it.

TRANSCRIPT

© 2011 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified or distributed in whole or in part without the express consent of Amazon.com, Inc.

Introducing Amazon Redshift

Amazon’s Data Warehouse as a Service

Ben Butler, Solutions Architect

Worldwide Public Sector

butlerb@amazon.com

What is Amazon Web Services?

AWS Global Infrastructure

Application Services

Networking

Deployment & Administration

Database Storage Compute

AWS Database Services

• Fully managed SQL database service for OLTP workloads

• Fully managed NoSQL service for massively scalable, high-throughput, low-latency workloads

• Fully managed, fast and powerful, petabyte-scale data warehouse service

• Fully managed Memcached-compliant in-memory caching service

Traditional data warehousing is expensive and complicated

Expensive Hardware and Software

Complex Tuning and Admin

Enterprises average between 3 and 4 DBAs per data warehouse

Source: Oracle technology global price list 11/1/2012

Gartner: Critical factors in calculating the data warehouse TCO, July 2009

Customers Aren’t Happy with Today’s Solutions

Large Companies: expensive, hard to scale

Small Companies: can’t afford to have a data warehouse

Data warehousing done the AWS way

• Pay as you go, no up front costs

• Fast, cheap, easy to use

• SQL

• Provision in minutes

Introducing Amazon Redshift

Data Warehousing the AWS Way

Easily and rapidly analyze petabytes of data

1/10 the cost of traditional data warehouses

Automated deployment & administration

Compatible with popular BI tools

Most data never makes it to a data warehouse

[Chart: The Data Analysis Gap, 1990 to 2020, comparing Enterprise Data with Data in Warehouse]

Enterprise data is growing at over 50% yearly

Data warehousing is growing at less than 10% yearly

Most data is left on the floor

Sources:

Gartner: User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011

IDC: Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares

We set out to build…

A fast and powerful, petabyte-scale data warehouse that is:

• A Lot Faster

• A Lot Cheaper

• A Lot Simpler

Amazon Redshift: Delivered as a Managed Service

Data warehousing performance is all about I/O

Amazon Redshift dramatically reduces I/O

• Column storage

• Data compression

• Zone maps

• Direct-attached storage

• Large data block sizes

Column storage

 ID  | Age | State | Amount
-----+-----+-------+--------
 123 | 20  | CA    | 500
 345 | 25  | WA    | 250
 678 | 40  | FL    | 125
 957 | 37  | WA    | 375

• With row storage you do unnecessary I/O

• To get total amount, you have to read everything

• With column storage, you only read the data you need
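
The difference between the two layouts can be sketched with the four-row table from the slide. This is only a toy model, assuming in-memory Python lists stand in for disk blocks; the function names and the "values read" counter are illustrative and are not part of Amazon Redshift.

```python
# Toy model of row vs. column layout, using the four rows shown on the slide.
# Counting "values read" is an illustrative stand-in for disk I/O.
rows = [
    (123, 20, "CA", 500),
    (345, 25, "WA", 250),
    (678, 40, "FL", 125),
    (957, 37, "WA", 375),
]


def total_amount_row_store(table_rows):
    """Row layout: every field of every row is read to reach Amount."""
    values_read = 0
    total = 0
    for row in table_rows:
        values_read += len(row)  # ID, Age, State and Amount all come along
        total += row[3]
    return total, values_read


# Column layout: one contiguous list per column.
columns = {
    "id": [r[0] for r in rows],
    "age": [r[1] for r in rows],
    "state": [r[2] for r in rows],
    "amount": [r[3] for r in rows],
}


def total_amount_column_store(table_columns):
    """Column layout: only the Amount column is read."""
    amounts = table_columns["amount"]
    return sum(amounts), len(amounts)


if __name__ == "__main__":
    print(total_amount_row_store(rows))
    print(total_amount_column_store(columns))
```

Both functions return the same total; the row layout touches all sixteen values while the column layout touches only the four amounts.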

Amazon Redshift dramatically reduces I/O: Data compression

• Columnar compression saves space & reduces I/O

• Amazon Redshift analyzes and compresses your data

analyze compression listing;

  Table  |     Column     | Encoding
---------+----------------+----------
 listing | listid         | delta
 listing | sellerid       | delta32k
 listing | eventid        | delta32k
 listing | dateid         | bytedict
 listing | numtickets     | bytedict
 listing | priceperticket | delta32k
 listing | totalprice     | mostly32
 listing | listtime       | raw
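
To give a feel for why encodings with names like delta and bytedict shrink a column, here is a minimal sketch of the two general ideas. It is an assumption-laden illustration of delta and dictionary encoding in plain Python, not Amazon Redshift's on-disk format; all function names are hypothetical.

```python
# Simplified illustrations of two general column-encoding ideas.
# These are NOT Redshift's actual storage formats.


def delta_encode(values):
    """Store the first value, then only the difference to the previous value."""
    if not values:
        return []
    encoded = [values[0]]
    for previous, current in zip(values, values[1:]):
        encoded.append(current - previous)
    return encoded


def delta_decode(encoded):
    """Rebuild the original values by accumulating the differences."""
    decoded = []
    running = 0
    for position, value in enumerate(encoded):
        running = value if position == 0 else running + value
        decoded.append(running)
    return decoded


def dictionary_encode(values):
    """Replace each distinct value with a small integer code."""
    dictionary = []
    code_for = {}
    codes = []
    for value in values:
        if value not in code_for:
            code_for[value] = len(dictionary)
            dictionary.append(value)
        codes.append(code_for[value])
    return dictionary, codes


if __name__ == "__main__":
    print(delta_encode([1001, 1002, 1004, 1007]))
    print(dictionary_encode(["CA", "WA", "FL", "WA"]))
```

Delta encoding pays off for slowly increasing values such as sequential IDs, while dictionary encoding pays off for columns with few distinct values.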

Amazon Redshift dramatically reduces I/O: Zone maps

• Keep track of the minimum and maximum value for each block

• Skip over blocks that don’t contain the data needed for a given query

• Minimize unnecessary I/O
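
The block-skipping idea can be sketched as follows. This is a minimal illustration, assuming each block is a small Python list with its minimum and maximum recorded when the block is built; the `Block` class and `scan_equal` function are hypothetical names, not a Redshift API.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """A block of column values plus its zone-map entry (min and max)."""
    values: list
    min_value: int = field(init=False)
    max_value: int = field(init=False)

    def __post_init__(self):
        self.min_value = min(self.values)
        self.max_value = max(self.values)


def scan_equal(blocks, target):
    """Return matching values and the number of blocks actually read."""
    matches = []
    blocks_read = 0
    for block in blocks:
        # The zone map proves the target cannot be in this block: skip it.
        if target < block.min_value or target > block.max_value:
            continue
        blocks_read += 1
        matches.extend(v for v in block.values if v == target)
    return matches, blocks_read


if __name__ == "__main__":
    age_blocks = [Block([20, 21, 22]), Block([25, 25, 27]), Block([37, 40, 41])]
    print(scan_equal(age_blocks, 25))
```

Skipping works best when the data is sorted, because then each block covers a narrow, non-overlapping range of values.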

Amazon Redshift dramatically reduces I/O: Direct-attached storage and large data block sizes

• Use direct-attached storage to maximize throughput

• Hardware optimized for high performance data processing

• Large block sizes to make the most of each read

• Amazon Redshift manages durability for you

Amazon Redshift architecture

Leader Node

• SQL endpoint

• Stores metadata

• Coordinates query execution

Compute Nodes

• Local, columnar storage

• Execute queries in parallel

• Load, backup, restore via Amazon S3

• Parallel load from Amazon DynamoDB

Single node version available

[Architecture diagram labels: JDBC/ODBC; 10 GigE (HPC); Ingestion; Backup; Restore]
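
The division of labor between the leader node and the compute nodes can be sketched as a scatter-gather pattern. This is only an analogy, assuming threads stand in for compute nodes and a simple SUM stands in for a query; the function names are hypothetical and nothing here is Redshift code.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_node_partial_sum(local_amounts):
    """Each compute node aggregates only the data it stores locally."""
    return sum(local_amounts)


def leader_total(node_slices):
    """The leader fans the work out, then combines the partial results."""
    workers = max(1, len(node_slices))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(compute_node_partial_sum, node_slices))
    return sum(partials)


if __name__ == "__main__":
    # Two hypothetical compute nodes, each holding half of the Amount column.
    print(leader_total([[500, 250], [125, 375]]))
```

Because each node only scans its own slice, adding nodes shrinks the amount of data any single node has to read for the same query.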

Amazon Redshift runs on optimized hardware

HS1.8XL: 128 GB RAM, 16 Cores, 24 Spindles, 16 TB compressed user storage, 2 GB/sec scan rate

HS1.XL: 16 GB RAM, 2 Cores, 3 Spindles, 2 TB compressed customer storage

Optimized for I/O intensive workloads

High disk density

Runs in HPC - fast network

HS1.8XL available on Amazon EC2

Amazon Redshift parallelizes and distributes everything

Query

Load

Backup/Restore

Resize

Amazon Redshift parallelizes and distributes everything: Load

• Load in parallel from Amazon S3 or Amazon DynamoDB

• Data automatically distributed and sorted

• Scales linearly with number of nodes
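
What "distributed and sorted" could look like is sketched below. This is a simplified model under stated assumptions: rows are assigned to nodes by a CRC32 hash of a chosen key and then sorted per node. The hash function, the parameter names, and the function itself are illustrative choices, not the mechanism Amazon Redshift actually uses.

```python
import zlib


def distribute_and_sort(rows, key_index, sort_index, num_nodes):
    """Assign each row to a node by hashing a key, then sort each node's rows."""
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        key_bytes = str(row[key_index]).encode("utf-8")
        # CRC32 is used only because it is deterministic across runs.
        node_id = zlib.crc32(key_bytes) % num_nodes
        nodes[node_id].append(row)
    for node_rows in nodes:
        node_rows.sort(key=lambda r: r[sort_index])
    return nodes


if __name__ == "__main__":
    sample_rows = [
        (123, 20, "CA", 500),
        (345, 25, "WA", 250),
        (678, 40, "FL", 125),
        (957, 37, "WA", 375),
    ]
    # Distribute on ID (column 0), sort each node's rows by Age (column 1).
    for node_id, node_rows in enumerate(distribute_and_sort(sample_rows, 0, 1, 2)):
        print(node_id, node_rows)
```

Rows with the same key always land on the same node, and keeping each node's rows sorted is what makes per-block minimum and maximum values selective.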

Amazon Redshift parallelizes and distributes everything: Backup/Restore

• Backups to Amazon S3 are automatic, continuous and incremental

• Configurable system snapshot retention period

• Take user snapshots on-demand

• Streaming restores enable you to resume querying faster
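
The word "incremental" can be illustrated with a small sketch. The slides do not describe how Amazon Redshift tracks changes, so this assumes a simple model in which blocks are compared by content against the last backed-up copy; the dictionaries and the function name are hypothetical.

```python
def incremental_backup(current_blocks, backed_up):
    """Copy only blocks that are new or changed since the previous backup.

    current_blocks: dict mapping block id -> bytes currently on the cluster
    backed_up:      dict mapping block id -> bytes already in the backup store
    Returns the sorted ids of the blocks copied in this run.
    """
    changed = {
        block_id: data
        for block_id, data in current_blocks.items()
        if backed_up.get(block_id) != data
    }
    backed_up.update(changed)
    return sorted(changed)


if __name__ == "__main__":
    store = {}
    print(incremental_backup({"b1": b"aaa", "b2": b"bbb"}, store))
    print(incremental_backup({"b1": b"aaa", "b2": b"BBB", "b3": b"ccc"}, store))
```

The second run copies only the changed and the new block, which is why incremental backups stay cheap as the data set grows.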

Amazon Redshift parallelizes and distributes everything: Resize

• Resize while remaining online

• Provision a new cluster in the background

• Copy data in parallel from node to node

• Only charged for source cluster

Amazon Redshift parallelizes and distributes everything: Resize (continued)

• Automatic SQL endpoint switchover via DNS

• Decommission the source cluster

• Simple operation via AWS Console or API

Amazon Redshift lets you start small and grow big

Extra Large Node (HS1.XL): 3 spindles, 2 TB, 16 GB RAM, 2 cores

• Single Node (2 TB)

• Cluster 2-32 Nodes (4 TB – 64 TB)

Eight Extra Large Node (HS1.8XL): 24 spindles, 16 TB, 128 GB RAM, 16 cores, 10 GigE

• Cluster 2-100 Nodes (32 TB – 1.6 PB)

Note: Nodes not to scale
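
The capacity ranges on this slide follow directly from nodes multiplied by storage per node. A minimal sketch of that arithmetic, using only the figures quoted on the slide (the dictionary layout and function name are illustrative):

```python
# Per-node storage and multi-node cluster limits as quoted on the slide.
# A single-node HS1.XL (2 TB) is a separate option and is not modeled here.
NODE_TYPES = {
    "HS1.XL": {"tb_per_node": 2, "min_nodes": 2, "max_nodes": 32},
    "HS1.8XL": {"tb_per_node": 16, "min_nodes": 2, "max_nodes": 100},
}


def cluster_capacity_tb(node_type, num_nodes):
    """Return compressed storage in TB for a multi-node cluster."""
    spec = NODE_TYPES[node_type]
    if not spec["min_nodes"] <= num_nodes <= spec["max_nodes"]:
        raise ValueError(
            f"{node_type} clusters support {spec['min_nodes']}-{spec['max_nodes']} nodes"
        )
    return spec["tb_per_node"] * num_nodes


if __name__ == "__main__":
    print(cluster_capacity_tb("HS1.XL", 32))    # upper end of the HS1.XL range
    print(cluster_capacity_tb("HS1.8XL", 100))  # 1,600 TB, i.e. 1.6 PB
```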

Amazon Redshift is priced to let you analyze all your data

                    | Price Per Hour for HS1.XL Single Node | Effective Hourly Price Per TB | Effective Annual Price Per TB
--------------------+---------------------------------------+-------------------------------+-------------------------------
 On-Demand          | $0.850                                | $0.425                        | $3,723
 1 Year Reservation | $0.500                                | $0.250                        | $2,190
 3 Year Reservation | $0.228                                | $0.114                        | $999

Simple Pricing

• Number of Nodes x Cost per Hour

• No charge for Leader Node

• No upfront costs

• Pay as you go
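
The effective per-TB prices can be reproduced from the hourly node price, assuming 2 TB per HS1.XL node (as stated elsewhere in the deck) and 8,760 hours per year. The function names below are illustrative.

```python
HOURS_PER_YEAR = 24 * 365   # 8,760
TB_PER_HS1_XL_NODE = 2      # storage per HS1.XL node, from the slides


def effective_prices(price_per_node_hour):
    """Return (hourly price per TB, annual price per TB) for an HS1.XL node."""
    hourly_per_tb = price_per_node_hour / TB_PER_HS1_XL_NODE
    annual_per_tb = hourly_per_tb * HOURS_PER_YEAR
    return hourly_per_tb, annual_per_tb


def cluster_cost_per_hour(num_nodes, price_per_node_hour):
    """Simple pricing: number of nodes x cost per hour (leader node is free)."""
    return num_nodes * price_per_node_hour


if __name__ == "__main__":
    for label, price in [("On-Demand", 0.850), ("1 Year", 0.500), ("3 Year", 0.228)]:
        hourly, annual = effective_prices(price)
        print(f"{label}: ${hourly:.3f} per TB-hour, ${annual:,.0f} per TB-year")
```

Rounded to whole dollars, the annual figures match the slide: $3,723, $2,190 and $999 per TB.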

Amazon Redshift is easy to use

Provision in minutes

Monitor query performance

Point and click resize

Built in security

Automatic backups

Provision a data warehouse in minutes

Monitor query performance

Deep dive analysis

Point and click resize

Amazon Redshift has security built-in

SSL to secure data in transit

Encryption to secure data at rest

• AES-256; hardware accelerated

• All blocks on disks and in Amazon S3 encrypted

No direct access to compute nodes

Amazon VPC support

[Diagram labels: Customer VPC; Internal VPC; JDBC/ODBC; 10 GigE (HPC); Ingestion; Backup; Restore]

Amazon Redshift continuously backs up your data and recovers from failures

Replication within the cluster and backup to Amazon S3 to maintain multiple copies of data at all times

Backups to Amazon S3 are continuous, automatic, and incremental

• Designed for eleven nines of durability

Continuous monitoring and automated recovery from failures of drives and nodes

Able to restore snapshots to any Availability Zone within a region

Amazon Redshift integrates with multiple data sources

[Diagram: data sources connected to Amazon Redshift]

• Amazon DynamoDB

• Amazon Elastic MapReduce

• Amazon Simple Storage Service (S3)

• Amazon Elastic Compute Cloud (EC2)

• AWS Storage Gateway Service

• Corporate Data Center

• Amazon Relational Database Service (RDS)

Amazon Redshift provides multiple data loading options

• Upload to Amazon S3

• AWS Import/Export

• AWS Direct Connect

• Work with a partner: Data Integration, Systems Integrators

Amazon Redshift works with your existing analysis tools

JDBC/ODBC

Amazon Redshift

Pilot results have been dramatic

Tested 2 billion row data set, 6 representative queries on a 2-node Amazon Redshift cluster

Queries ran between 12x and 150x faster

Current environment: 32 nodes, 128 CPUs, 4.2 TB RAM, 1.6 PB disk

Reporting Warehouse

Accelerated operational reporting

Support for short-time use cases

Data compression, index redundancy

[Diagram: RDBMS (OLTP, ERP) → Redshift (Reporting and BI)]

Data Integration Partners*

On-Premises Integration

[Diagram: RDBMS (OLTP, ERP) → Redshift (Reporting and BI)]

* as of 3/14/2013

Live Archive for (Structured) Big Data

Direct integration with copy command

High velocity data ages into Redshift

Low cost, high scale option for new apps

[Diagram: DynamoDB (OLTP, Web Apps) → Redshift (Reporting and BI)]

Cloud ETL for Big Data

Maintain online SQL access to historical logs

Transformation and enrichment with EMR

Longer history ensures better insight

[Diagram labels: S3; EMR; Redshift; Reporting and BI]

Resources & Questions

Ben Butler | butlerb@amazon.com

Redshift on AWS - http://aws.amazon.com/redshift

Marketplace - https://aws.amazon.com/marketplace/redshift/

Documentation/User Guide - http://aws.amazon.com/documentation/redshift/

Best Practices

• http://docs.aws.amazon.com/redshift/latest/dg/c_designing-tables-best-practices.html

• http://docs.aws.amazon.com/redshift/latest/dg/c_loading-data-best-practices.html

Introducing Amazon Redshift

Amazon’s Data Warehouse as a Service

http://aws.amazon.com/resources/databaseservices/webinars

Ben Butler, Solutions Architect

Worldwide Public Sector

butlerb@amazon.com
