
Page 1

AMSTERDAM

©2015, Amazon Web Services, Inc. or its affiliates. All rights reserved

Page 2

©2015, Amazon Web Services, Inc. or its affiliates. All rights reserved

Building Your Data Warehouse with Amazon Redshift

Alex Sinner, AWS (@alexsinner)

Guest Speaker: Bartosz Kuzmicki, CTO and co-founder, FLXone ([email protected])

Page 3

Data Warehouse - Challenges

Cost

Complexity

Performance

Rigidity

[Chart: Enterprise Data vs. Data in Warehouse, 1990–2020]

Page 4

Petabyte scale; massively parallel

Relational data warehouse

Fully managed; zero admin

SSD & HDD platforms

As low as $1,000/TB/Year

Amazon Redshift

Page 5

Redshift powers Clickstream Analytics for Amazon.com

Web log analysis for Amazon.com
–  Over one petabyte workload
–  Largest table: 400TB
–  2TB of data per day

Understand customer behavior
–  Who is browsing but not buying
–  Which products / features are winners
–  What sequence led to higher customer conversion

Solution
–  Best scale-out solution – query across 1 week
–  Hadoop – query across 1 month

Page 6

Redshift Performance Realized

Performance
–  Scan 15 months of data (2.25 trillion rows): 14 minutes
–  Load one day's worth of data (5 billion rows): 10 minutes
–  Backfill one month of data (150 billion rows): 9.75 hours
–  Pig → Amazon Redshift: 2 days to 1 hr (10B row join with 700M rows)
–  Oracle → Amazon Redshift: 90 hours to 8 hrs (reduced number of SQLs by a factor of 3)

Cost
–  2PB cluster: 100 node ds2.8xl (3yr RI), $180/hr

Complexity
–  20% of one DBA's time for backup, restore, and resizing

Page 7

[Chart: Time to Deploy and Manage a Cluster – minutes of hands-on ("click") time for Deploy, Connect, Backup, Restore, and Resize (2 to 16 nodes), shown for 2-, 16-, and 128-node clusters]

Simplicity

Page 8

Who uses Amazon Redshift?

Page 9

Common Customer Use Cases

Traditional Enterprise DW
•  Reduce costs by extending DW rather than adding HW
•  Migrate completely from existing DW systems
•  Respond faster to business

Companies with Big Data
•  Improve performance by an order of magnitude
•  Make more data available for analysis
•  Access business data via standard reporting tools

SaaS Companies
•  Add analytic functionality to applications
•  Scale DW capacity as demand grows
•  Reduce HW & SW costs by an order of magnitude

Page 10

Selected Amazon Redshift Customers

Page 11

Amazon Redshift Partners

Page 12

Amazon Redshift Architecture

•  Leader Node
–  SQL endpoint
–  Stores metadata
–  Coordinates query execution

•  Compute Nodes
–  Local, columnar storage
–  Execute queries in parallel
–  Load, backup, restore via Amazon S3; load from Amazon DynamoDB, Amazon EMR, or SSH (see the COPY sketch below)

•  Two hardware platforms
–  Optimized for data processing
–  DS2: HDD; scale from 2TB to 2PB
–  DC1: SSD; scale from 160GB to 326TB

[Diagram: SQL clients/BI tools connect via JDBC/ODBC to the Leader Node, which coordinates multiple Compute Nodes (each 16 cores, 128GB RAM, 16TB disk) over 10 GigE (HPC); ingestion, backup, and restore flow to and from Amazon S3 / DynamoDB / SSH]
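Loading is typically done in parallel from Amazon S3 with the COPY command. A minimal sketch, assuming a hypothetical events table, bucket prefix, and placeholder credentials:

-- Hypothetical table and bucket; COPY loads all files under the prefix in parallel
COPY events
FROM 's3://my-bucket/clickstream/2015/05/'
CREDENTIALS 'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>'
GZIP
DELIMITER '|';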

Page 13

Amazon Redshift dramatically reduces I/O

•  Data compression

•  Zone maps

•  Direct-attached storage

•  Large data block sizes

ID    Age   State   Amount
123   20    CA      500
345   25    WA      250
678   40    FL      125
957   37    WA      375


Page 15

Amazon Redshift dramatically reduces I/O

•  Column storage

•  Data compression

•  Zone maps

•  Direct-attached storage

•  Large data block sizes

analyze compression listing;

 Table   |     Column     | Encoding
---------+----------------+----------
 listing | listid         | delta
 listing | sellerid       | delta32k
 listing | eventid        | delta32k
 listing | dateid         | bytedict
 listing | numtickets     | bytedict
 listing | priceperticket | delta32k
 listing | totalprice     | mostly32
 listing | listtime       | raw
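The suggested encodings can be applied explicitly when (re)creating the table. A minimal sketch, with column types assumed since they are not shown here:

-- Hypothetical column types; encodings taken from the ANALYZE COMPRESSION output above
CREATE TABLE listing (
    listid         INTEGER       ENCODE delta,
    sellerid       INTEGER       ENCODE delta32k,
    eventid        INTEGER       ENCODE delta32k,
    dateid         SMALLINT      ENCODE bytedict,
    numtickets     SMALLINT      ENCODE bytedict,
    priceperticket DECIMAL(8,2)  ENCODE delta32k,
    totalprice     DECIMAL(8,2)  ENCODE mostly32,
    listtime       TIMESTAMP     ENCODE raw
);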

Page 16

Amazon Redshift dramatically reduces I/O

•  Column storage

•  Data compression

•  Direct-attached storage

•  Large data block sizes

•  Tracks the minimum and maximum value for each block

•  Skip over blocks that don’t contain the data needed for a given query

•  Minimize unnecessary I/O
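Zone maps help most when a query filters on a column the table is sorted by. A minimal sketch, assuming a hypothetical sales table sorted on sale_date:

-- With the table sorted on sale_date, blocks whose min/max range falls outside
-- the one-week predicate are skipped instead of being read from disk
SELECT COUNT(*)
FROM sales
WHERE sale_date BETWEEN '2015-03-01' AND '2015-03-07';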

Page 17

Amazon Redshift dramatically reduces I/O

•  Column storage

•  Data compression

•  Zone maps

•  Direct-attached storage

•  Large data block sizes

•  Use direct-attached storage to maximize throughput

•  Hardware optimized for high performance data processing

•  Large block sizes to make the most of each read

•  Amazon Redshift manages durability for you

Page 18

Amazon Redshift has security built-in

•  SSL to secure data in transit

•  Encryption to secure data at rest
–  AES-256; hardware accelerated
–  All blocks on disks and in Amazon S3 encrypted
–  KMS & HSM support

•  No direct access to compute nodes

•  Audit logging & AWS CloudTrail integration

•  Amazon VPC support

•  SOC 1/2/3, PCI-DSS Level 1, FedRAMP, others

[Diagram: SQL clients/BI tools connect via JDBC/ODBC from the customer VPC to the Leader Node; the Leader Node and Compute Nodes (each 16 cores, 128GB RAM, 16TB disk) sit in an internal VPC connected over 10 GigE (HPC), with ingestion, backup, and restore via Amazon S3 / Amazon DynamoDB]

Page 19

Amazon Redshift is 1/10th the Price of a Traditional Data Warehouse

DS2 (HDD)                  Price Per Hour for DS2.XL Single Node   Effective Annual Price per TB
On-Demand                  $ 0.850                                 $ 3,723
1 Year Reserved Instance   $ 0.215                                 $ 2,192
3 Year Reserved Instance   $ 0.114                                 $ 999

DC1 (SSD)                  Price Per Hour for DC1.L Single Node    Effective Annual Price per TB
On-Demand                  $ 0.250                                 $ 13,688
1 Year Reserved Instance   $ 0.075                                 $ 8,794
3 Year Reserved Instance   $ 0.050                                 $ 5,498
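For the on-demand rows, the per-TB figure follows directly from the hourly price and the node's storage: a DS2.XL node holds 2TB, so $0.850/hr × 8,760 hr/yr ÷ 2TB ≈ $3,723/TB/year, and a DC1.L node holds 160GB, so $0.250/hr × 8,760 ÷ 0.16TB ≈ $13,688/TB/year. The reserved-instance rows do not reduce to the listed hourly rate alone, since they presumably also amortize the upfront reservation payment.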

Page 20

Expanding Amazon Redshift’s Functionality

Page 21

New Dense Storage Instance

DS2, based on EC2's D2, has twice the memory and CPU of DW1

Migrate from DS1 to DS2 by restoring from snapshot. We will help you migrate your RIs

•  Twice the memory and compute power of DW1

•  Enhanced networking and 1.5X gain in disk throughput

•  40% to 60% performance gain over DW1

•  Available in the two node types: XL (2TB) and 8XL (16TB)

Page 22

Custom ODBC and JDBC Drivers

•  Up to 35% higher performance than open source drivers

•  Supported by Informatica, MicroStrategy, Pentaho, Qlik, SAS, Tableau

•  Will continue to support PostgreSQL open source drivers

•  Download drivers from console

Page 23

Explain Plan Visualization

Page 24

User Defined Functions

•  We're enabling User Defined Functions (UDFs) so you can add your own
–  Scalar and Aggregate Functions supported

•  You'll be able to write UDFs using Python 2.7
–  Syntax is largely identical to PostgreSQL UDF syntax
–  System and network calls within UDFs are prohibited

•  Comes with Pandas, NumPy, and SciPy pre-installed
–  You'll also be able to import your own libraries for even more flexibility

Page 25

Scalar UDF example – URL parsing

CREATE FUNCTION f_hostname (url VARCHAR)
RETURNS VARCHAR
IMMUTABLE AS $$
    import urlparse
    return urlparse.urlparse(url).hostname
$$ LANGUAGE plpythonu;
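Once created, the function can be called like any built-in scalar function; for example, against a hypothetical weblog table:

-- Hypothetical table; counts page hits per hostname extracted by the UDF
SELECT f_hostname(url) AS hostname, COUNT(*) AS hits
FROM weblog
GROUP BY 1
ORDER BY 2 DESC
LIMIT 10;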

Page 26

Interleaved Multi Column Sort

•  Currently support Compound Sort Keys
–  Optimized for applications that filter data by one leading column

•  Adding support for Interleaved Sort Keys
–  Optimized for filtering data by up to eight columns
–  No storage overhead, unlike an index
–  Lower maintenance penalty compared to indexes

Page 27

Compound Sort Keys Illustrated

Records in Redshift are stored in blocks. For this illustration, let's assume that four records fill a block. Records with a given cust_id are all in one block; however, records with a given prod_id are spread across four blocks.

[Diagram: a 4x4 grid of (cust_id, prod_id) value pairs mapped to blocks; with a compound sort key on (cust_id, prod_id), each block holds a single cust_id value, while each prod_id value is spread across four blocks]

Page 28

Interleaved Sort Keys Illustrated

Records with a given cust_id are spread across two blocks, and records with a given prod_id are also spread across two blocks. Data is sorted in equal measures for both keys.

[Diagram: the same 4x4 grid of (cust_id, prod_id) value pairs; with an interleaved sort key, each cust_id value and each prod_id value spans only two blocks]

Page 29

How to use the feature

•  New keyword 'INTERLEAVED' when defining sort keys
–  Existing syntax will still work and behavior is unchanged
–  You can choose up to 8 columns to include and can query with any or all of them

•  No change needed to queries

•  Benefits are significant

[ SORTKEY [ COMPOUND | INTERLEAVED ] ( column_name [, ...] ) ]
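A minimal sketch of the syntax in use, with a hypothetical table matching the earlier cust_id/prod_id illustration:

-- Hypothetical table; either cust_id or prod_id filters benefit equally
CREATE TABLE order_items (
    cust_id  INTEGER,
    prod_id  INTEGER,
    order_ts TIMESTAMP,
    quantity INTEGER
)
INTERLEAVED SORTKEY (cust_id, prod_id);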

Page 30

Amazon Redshift

Spend time with your data, not your database….

Page 31

Understand and target your customers across all channels

Bartosz Kuzmicki Co-Founder / CTO

Page 32

FLXONE INTRODUCTION

FLXone Data Management Platform for marketing and advertising

•  Founded in 2012

•  Based in Eindhoven

•  International presence

Page 33

FLXONE DATA MANAGEMENT PLATFORM
Unify and activate data

Advertising, Content, Email, In-store, Mobile

Page 34

CHALLENGE

•  Complex, disparate data sources

•  Terabytes of data that need to be unified

•  Only recent data is valuable; queries need to finish within minutes

Page 35

REDSHIFT

•  Started using Redshift in 2013

•  Up and running a Redshift cluster in one week

•  One-click scalability and upgradability

•  No server maintenance

•  Automatic backups

Page 36

SOLUTION

INPUT: Onsite, RTB, Programmatic, Mobile Apps, CRM, Email, Search, Contextual, Social, Offline

Amazon EC2 – FLXone data collection
Amazon S3 – Long term storage
Amazon Redshift – Process and unify
Amazon EC2 – Connect to external platforms

OUTPUT: Audiences, Lookalikes, Insights, Data export

Page 37

REDSHIFT

•  All SSD instances (40 x dc1.large)

•  5TB of compressed data

•  4,000 complex queries every day

•  Audience creation within 30 seconds

Page 38

LESSONS LEARNED

•  Use query queues (see the sketch below)
–  Audience creation queue
–  Reports queue

•  Resource allocation
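The queues themselves are defined in the cluster parameter group's WLM configuration; a statement is then routed to a queue by setting its query group before it runs. A minimal sketch, assuming queue names like 'audience' and 'reports' have been configured:

-- Route the following statements to the audience-creation queue
SET query_group TO 'audience';
-- ... run audience-creation queries here ...
RESET query_group;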

Page 39

LESSONS LEARNED

•  dc1.large: 2x more cost efficient than dc1.8xlarge, 5x more cost efficient than ds1.xlarge

•  Test all the instances for your use case

Page 40

SUMMARY

Redshift allows us to focus on what makes our product unique

Redshift is easy to test, change and tune to our use case

Audience creation decreased from 3 minutes to 30 seconds


Page 41

Thank you

Bartosz Kuzmicki Co-Founder / CTO

[email protected] linkedin.com/in/bartoszkuzmicki

Page 42

AMSTERDAM