Transcript
Page 1: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse

AWS Summit 2013 Tel Aviv Oct 16 – Tel Aviv, Israel

Guy Ernest

Solutions Architecture, Amazon Web Services

Data Warehouse on AWS

Page 2: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse

DATAWAREHOUSE

ERP

ANALYST CRM

DB

Page 3: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse

DATAWAREHOUSE

ERP

ANALYST CRM

DB

OLTP

OLTP

OLTP

OLAP

Page 4: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse

Transactional Processing | Analytical Processing

Transactional context | Global context

Latency | Throughput

Indexed access | Full table scans

Random IO | Sequential IO

Disk seek times | Disk transfer rate
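
The contrast between seek-bound and transfer-bound workloads can be illustrated with a back-of-the-envelope sketch. The seek time, transfer rate, row size, and table size below are illustrative assumptions, not measurements from the talk.

```python
# Rough model: random (indexed) reads vs. a sequential full table scan.
# All constants are illustrative assumptions.

SEEK_TIME_S = 0.010          # assumed average disk seek time: 10 ms
TRANSFER_RATE_BPS = 200e6    # assumed sequential transfer rate: 200 MB/s
ROW_BYTES = 200              # assumed average row size in bytes


def indexed_read_seconds(rows: int) -> float:
    """OLTP-style access: roughly one disk seek per row fetched."""
    return rows * SEEK_TIME_S


def full_scan_seconds(total_rows: int) -> float:
    """OLAP-style access: stream the whole table sequentially."""
    return total_rows * ROW_BYTES / TRANSFER_RATE_BPS


table_rows = 100_000_000
print(indexed_read_seconds(10))          # a few rows: cost is dominated by seeks
print(full_scan_seconds(table_rows))     # whole table: cost is dominated by transfer rate
print(indexed_read_seconds(table_rows))  # seeking once per row for the whole table
```

Under these assumptions, fetching a handful of rows by index is fast, but touching every row through random reads is far slower than streaming the table once, which is why analytical systems favor sequential IO.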

Page 5: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse

OLTP

OLAP

Page 6: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse

DATAWAREHOUSE ANALYST

BUSINESS INTELLIGENCE: REPORTS, DASHBOARDS, …

PRODUCTION OFFLOAD: DIFFERENT DATA STRUCTURE, USING ETLs, …

Page 7: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse
Page 8: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse
Page 9: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse

BIG ENTERPRISES

VERY EXPENSIVE (ROI)

DIFFICULT TO MAINTAIN

NOT SCALABLE

Page 10: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse

BIG ENTERPRISES SME

WAY TOO EXPENSIVE !

VERY EXPENSIVE (ROI)

DIFFICULT TO MAINTAIN

NOT SCALABLE

Page 11: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse

Jeff Bezos

Page 12: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse

Data Sources

Queries

Value

Page 13: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse

+ ELASTIC CAPACITY + NO CAPEX + PAY FOR WHAT YOU USE + DISPOSE ON DEMAND

= NO CONSTRAINTS

Page 14: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse

COLLECT STORE ANALYZE SHARE

ACCELERATION

AMAZON REDSHIFT

Page 15: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse

AMAZON REDSHIFT

Page 16: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse

DWH that scales to petabytes and…

AMAZON REDSHIFT

… WAY LESS EXPENSIVE

… WAY FASTER

…WAY SIMPLER

Page 17: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse

AMAZON REDSHIFT RUNNING ON OPTIMIZED HARDWARE

HS1.8XL: 128 GB RAM, 16 Cores, 16 TB Compressed Data, 2 GB/sec Disk Scan

HS1.XL: 16 GB RAM, 2 Cores, 2 TB Compressed Data

Page 18: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse

Extra Large Node (HS1.XL)

Single Node (2 TB)

Cluster 2-32 Nodes (4 TB – 64 TB)

Eight Extra Large Node (HS1.8XL)

Cluster 2-100 Nodes (32 TB – 1.6 PB)
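
The capacity ranges above follow directly from the per-node storage on the hardware slide. A minimal sketch of that arithmetic, where the function and dictionary names are hypothetical and only the node sizes and node-count limits come from the slides:

```python
# Compressed storage per node (TB) and allowed multi-node cluster sizes, from the slides.
NODE_TB = {"HS1.XL": 2, "HS1.8XL": 16}
NODE_RANGE = {"HS1.XL": (2, 32), "HS1.8XL": (2, 100)}


def cluster_capacity_tb(node_type: str, nodes: int) -> int:
    """Return total compressed capacity in TB for a cluster of the given size."""
    if node_type == "HS1.XL" and nodes == 1:
        # The slides list a single-node option only for HS1.XL.
        return NODE_TB[node_type]
    low, high = NODE_RANGE[node_type]
    if not low <= nodes <= high:
        raise ValueError(f"{node_type} clusters support {low}-{high} nodes")
    return NODE_TB[node_type] * nodes


print(cluster_capacity_tb("HS1.XL", 32))    # 64 TB
print(cluster_capacity_tb("HS1.8XL", 100))  # 1600 TB, i.e. 1.6 PB
```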

Page 19: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse

10 GigE (HPC)

Ingestion Backup

Restoration

JDBC/ODBC

Page 20: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse
Page 21: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse

…WAY SIMPLER

Page 22: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse

LOADING DATA

Parallel Loading

Data sorted and distributed automatically

Linear Growth
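
As a rough illustration of a parallel load, the sketch below builds a Redshift COPY statement that reads every file under an S3 prefix. The table name, bucket, prefix, and credential placeholders are hypothetical, and the helper function is not part of any AWS library; it only assembles a SQL string that would then be sent over JDBC/ODBC. It assumes the input is split into several gzip-compressed, pipe-delimited files so the load can be spread across the cluster.

```python
def build_copy_statement(table: str, s3_prefix: str, credentials: str,
                         delimiter: str = "|", gzip: bool = True) -> str:
    """Assemble a COPY statement that loads all files under an S3 prefix.

    This is a sketch: it does no SQL escaping and should only be fed
    trusted values.
    """
    options = [f"DELIMITER '{delimiter}'"]
    if gzip:
        options.append("GZIP")
    return (
        f"COPY {table}\n"
        f"FROM '{s3_prefix}'\n"
        f"CREDENTIALS '{credentials}'\n"
        + " ".join(options) + ";"
    )


# Hypothetical names; replace with real values before use.
statement = build_copy_statement(
    table="events",
    s3_prefix="s3://example-bucket/events/part-",
    credentials="aws_access_key_id=<KEY>;aws_secret_access_key=<SECRET>",
)
print(statement)
```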

Page 23: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse

DATA SNAPSHOTS

Automatic and Incremental snapshots in Amazon S3

Configurable Retention Period

Manual Snapshots

“Streaming” Restore

Page 24: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse

REPLICATION IN CLUSTER +

AUTOMATIC SNAPSHOT IN AMAZON S3 +

MONITORING OF CLUSTER NODES

Page 25: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse

AUTOMATIC RESIZING

Page 26: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse

Read-only mode while resizing

New cluster is created in the background

Parallel node-to-node data copy

Only charged for a single cluster

Page 27: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse

Automatic DNS based endpoint cut-over

Deletion of source cluster

Page 28: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse
Page 29: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse

CREATE A DATAWAREHOUSE IN MINUTES

Page 30: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse
Page 31: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse
Page 32: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse
Page 33: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse
Page 34: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse
Page 35: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse
Page 36: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse
Page 37: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse

…WAY FASTER

Page 38: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse

MEMORY CAPACITY AND CPU PERFORMANCE DOUBLE EVERY 2 YEARS

DISK PERFORMANCE DOUBLES EVERY 10 YEARS
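
To see how quickly these two doubling rates diverge, here is a small sketch that compounds them over the same period. The 10-year horizon is an arbitrary choice for illustration; the doubling periods are the ones stated on the slide.

```python
def growth_factor(years: float, doubling_period_years: float) -> float:
    """Multiplicative improvement after `years` given a fixed doubling period."""
    return 2 ** (years / doubling_period_years)


horizon = 10  # years, chosen for illustration
cpu_gain = growth_factor(horizon, 2)    # memory capacity and CPU performance
disk_gain = growth_factor(horizon, 10)  # disk performance

print(cpu_gain, disk_gain, cpu_gain / disk_gain)
```

Over a decade the CPU and memory side improves 32 times while disk improves only 2 times, so the gap between them widens by a factor of 16.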

Page 39: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse

Progress is not evenly distributed

1980 | Today | Change

14,000,000$/TB | 30$/TB | 450,000 ÷

100MB | 3TB | 30,000 X

4MB/s | 200MB/s | 50 X
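
The three ratios on this slide can be checked with simple arithmetic. The sketch below assumes decimal units (1 TB = 1,000,000 MB); the variable names are only for illustration.

```python
# Figures from the slide: 1980 vs. today.
cost_1980, cost_today = 14_000_000, 30                  # dollars per TB
capacity_1980_mb, capacity_today_mb = 100, 3_000_000    # 100 MB vs. 3 TB (decimal)
rate_1980, rate_today = 4, 200                          # MB/s

cost_ratio = cost_1980 / cost_today                     # about 466,667; the slide rounds to 450,000
capacity_ratio = capacity_today_mb / capacity_1980_mb   # 30,000
rate_ratio = rate_today / rate_1980                     # 50

print(round(cost_ratio), capacity_ratio, rate_ratio)
```

Capacity grew 30,000 times while transfer rate grew only 50 times, which is the imbalance the next slide builds on.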

Page 40: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse

I/O IS THE MAIN FACTOR FOR PERFORMANCE

Page 41: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse

• COLUMNAR STORAGE

• COMPRESSION PER COLUMN

• ZONE MAPS

• OPTIMIZED HARDWARE

• LARGE DATA BLOCK SIZE

Id | Age | State

123 | 20 | CA

345 | 25 | WA

678 | 40 | FL
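
The columnar storage and zone map ideas can be sketched with the three example rows from the slide. This is a simplified in-memory model, not Redshift's actual storage format; the function names and the block size of 2 are assumptions made for illustration.

```python
# Example rows from the slide: (Id, Age, State).
rows = [(123, 20, "CA"), (345, 25, "WA"), (678, 40, "FL")]


def to_columns(rows, names):
    """Pivot row-oriented tuples into one list per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def build_zone_map(values, block_size):
    """Record the (min, max) of each fixed-size block of a column."""
    blocks = [values[i:i + block_size] for i in range(0, len(values), block_size)]
    return [(min(block), max(block)) for block in blocks]


def blocks_to_scan(zone_map, low, high):
    """Return indexes of blocks whose value range can overlap [low, high]."""
    return [i for i, (mn, mx) in enumerate(zone_map) if mx >= low and mn <= high]


columns = to_columns(rows, ["id", "age", "state"])
age_zone_map = build_zone_map(columns["age"], block_size=2)

print(columns["age"])                        # only this column is read for an age query
print(age_zone_map)                          # per-block min/max
print(blocks_to_scan(age_zone_map, 30, 50))  # blocks that may hold ages 30..50
```

A query on Age reads only the age column, and the zone map lets it skip the first block entirely because its maximum (25) is below the requested range.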

Page 42: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse
Page 43: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse
Page 44: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse
Page 45: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse
Page 46: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse

TEST:

2 BILLION RECORDS

6 REPRESENTATIVE QUERIES

Page 47: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse

AMAZON REDSHIFT 2xHS1.8XL

Vs.

32 NODES, 4.2TB RAM, 1.6PB

Page 48: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse

12x - 150x FASTER

Page 49: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse
Page 50: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse

30 MINUTES

12 SECONDS

Page 51: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse

…WAY LESS EXPENSIVE

Page 52: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse

2x HS1.8XL 3.65$ / HOUR

32 000$ / YEAR
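
The yearly figure follows from the hourly price if the cluster runs around the clock. A minimal check, assuming the 3.65$ hourly price covers both nodes together and a 365-day year:

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years

hourly_price = 3.65  # dollars per hour for the 2-node cluster, from the slide
yearly_price = hourly_price * HOURS_PER_YEAR

print(round(yearly_price))  # close to the 32 000$ / year shown on the slide
```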

Page 53: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse

Pricing model | Instance HS1.XL per hour | Hourly Price per TB | Yearly Price per TB

On-Demand | 0.850 $ | 0.425 $ | 3 723 $

1 Year Reservation | 0.500 $ | 0.250 $ | 2 190 $

3 Years Reservation | 0.228 $ | 0.114 $ | 999 $
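
The per-TB columns can be derived from the instance price, given that an HS1.XL node holds 2 TB of compressed data. A short sketch of that derivation, assuming 8,760 hours per year; the dictionary and function names are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365   # 8,760
TB_PER_HS1_XL = 2           # compressed storage per HS1.XL node, from the hardware slide

# Effective HS1.XL price per hour in dollars, from the slide.
hourly_instance_price = {
    "On-Demand": 0.850,
    "1 Year Reservation": 0.500,
    "3 Years Reservation": 0.228,
}


def price_per_tb(instance_price_per_hour: float) -> tuple:
    """Return (hourly price per TB, yearly price per TB)."""
    hourly = instance_price_per_hour / TB_PER_HS1_XL
    return hourly, hourly * HOURS_PER_YEAR


for plan, price in hourly_instance_price.items():
    hourly, yearly = price_per_tb(price)
    print(f"{plan}: {hourly:.3f} $/TB/hour, {yearly:.0f} $/TB/year")
```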

Page 54: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse
Page 55: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse

Intel Confidential

Intel Analytics on AWS

Assaf Araki

October, 2013

Page 56: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse

Intel Confidential

Agenda

• Advanced Analytics @ Intel

• Enterprise on the Cloud

• Use Case

Page 57: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse

Intel Confidential

Advanced Analytics

• Vision: Make analytics a competitive advantage for Intel

• Mission:

• Solve strategic high value business line problems

• Leverage analytics to grow Intel revenue

• About the team:

• ~100 employees - corporate ownership of advanced analytics

• Big data and Machine Learning are key focus areas

• Skills: Software Engineering / Decision Science / Business Acumen

• Value driven – ROI>$10M and/or key corporate problem as defined by VPs

• Part of the Israel Academy Computational research center

Intel AA Team

Page 58: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse

Intel Confidential

Big Data Analytics Platform

• Highly scalable, hybrid platform to support a range of business use cases

MPP High Speed Data Loader

Rich advanced analytics and real-time, in-database data mining capabilities

Heterogeneous data, batch oriented on advanced analytics

Prediction Module

AA Overview

Page 59: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse

Intel Confidential

Why Cloud ?

• Known reasons

– Reduce cost

– Universal access

– Scale fast

• Additional reasons

– Flexible & Agile platform – no need for the engineering team to certify each tool

– Development accelerator – R&D team can start developing while engineering teams implement the platform on premise

Enterprise On the Cloud

Page 60: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse

Intel Confidential

Use Case

• Characteristics:

– CPU behavior data

– Size: 30TB of data per month

– Type: Structured data

– Processing:

• Create aggregation facts and grant ad hoc analysis

• Create ML solutions

• Current Status:

– Data is sampled and processed on SMP RDBMS

– Takes almost 24 hours to process the entire data

• Problem Statement

– Limited ability to analyze all data

Use Case

Page 61: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse

Intel Confidential

Platforms

• On premise

– Hbase – Hadoop platform exists

• No Hbase

– MPP DB – Exists with Machine Learning capabilities

• Evaluate and purchase a lower cost platform

• Cloud

– HBase - EMR

– MPP DB - AWS Redshift

Enterprise On the Cloud

Go for POC on the Cloud

Page 62: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse

Intel Confidential

Evaluation Criteria

• Capabilities

– Create statistics calculations

• Cost of HW per TB

– Replication

– Compression

• Performance

– Load, transformation, querying

• Scalability

• Ability to execute

Enterprise On the Cloud

Page 63: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse

Intel Confidential

Preliminary Results

• Dataset example

– 34GB compressed data divided to files

– ~1,500,000,000 records

– 24B compressed, 240B per record ( ~15 columns )

• Performance & Scalability - 8 x 1XL nodes

– Load time – for 32 files – 2 hours ( 4 files – 5 hours )

– Table size – 202GB (compression rate ~1.5:1)

– SQL aggregation statements

• 38K records – 6 minutes

• 14M records – 7 minutes

• 66M records – 11 minutes ( on 4 x 1XL – 22 minutes )

• 939M records – 34 minutes ( on 4 x 1XL – 77 minutes )
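
A few of these figures can be cross-checked with simple arithmetic. The sketch below assumes "GB" means 2^30 bytes, which reproduces the ~24 B per compressed record on the slide, and compares the query times on 4 and 8 nodes. The variable and function names are illustrative only.

```python
GIB = 2 ** 30  # assumption: the slide's "GB" is a binary gigabyte

compressed_bytes = 34 * GIB
records = 1_500_000_000
bytes_per_record = compressed_bytes / records  # roughly 24 bytes


def speedup(minutes_on_4_nodes: float, minutes_on_8_nodes: float) -> float:
    """How many times faster the 8-node cluster ran the same query."""
    return minutes_on_4_nodes / minutes_on_8_nodes


# (minutes on 4 x 1XL, minutes on 8 x 1XL), from the slide.
timings = {"66M records": (22, 11), "939M records": (77, 34)}

print(round(bytes_per_record, 1))
for label, (four_nodes, eight_nodes) in timings.items():
    print(label, round(speedup(four_nodes, eight_nodes), 2))
```

Doubling the node count halves the 66M-record query time and slightly more than halves the 939M-record one, which is consistent with roughly linear scaling for these aggregations.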

Use Case

Page 64: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse

Intel Confidential

Capabilities and Cost

• No current ability to write code (Java/C++/Python/R)

– Implement statistics and algorithm in SQL

• Compression is not straightforward

• Cost sensitive for actual compression

– 2.6 : 1 is break even

• 8XL vs. High Storage instance (16 cores 48TB)

• 3 years with 100% utilization

Use Case

Page 65: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse

Intel Confidential

[email protected]

Page 66: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse

Intel Confidential

Thank You!

Page 67: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse

USE CASE

Page 68: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse

AMAZON ELASTIC MAPREDUCE

AMAZON DYNAMODB

AMAZON EC2

AWS STORAGE GATEWAY

AMAZON S3

DATA CENTER

AMAZON RDS

AMAZON REDSHIFT

Page 69: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse

UPLOAD TO AMAZON S3

AWS IMPORT/EXPORT

AWS DIRECT CONNECT

DATA

INTEGRATION

INTEGRATION

SYSTEMS

Page 70: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse
Page 71: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse
Page 72: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse

2 million

15 million

MEMBER REGISTRATIONS

2011 2012 2013

Page 73: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse

1,500,000+ NEW MEMBERS EACH MONTH

Page 74: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse

1,200,000,000+ SOCIAL CONNECTIONS IMPORTED

Page 75: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse

Data Analyst

Raw Data

Get Data

Join via Facebook

Add a Skill Page

Invite Friends

Web Servers Amazon S3 User Action Trace Events

EMR Hive Scripts Process Content

• Processes log files with regular expressions to parse out the info we need.

• Processes cookies into useful searchable data such as Session, UserId, API Security token.

• Filters surplus info such as internal Varnish logging.
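
The three processing steps above can be sketched as follows. The talk used Hive scripts on EMR; this Python version only illustrates the idea. The log line layout, the `/internal/` path filter, and all field names are assumptions, since the slides do not show the real log format.

```python
import re

# Hypothetical log layout: ip [timestamp] "METHOD path" status "cookie string"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \[(?P<ts>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+)" '
    r'(?P<status>\d{3}) "(?P<cookie>[^"]*)"'
)
WANTED_COOKIES = ("Session", "UserId")


def parse_cookies(cookie_string):
    """Split 'a=1; b=2' into a dict, ignoring malformed pairs."""
    pairs = (part.split("=", 1) for part in cookie_string.split("; ") if "=" in part)
    return {name: value for name, value in pairs}


def parse_line(line):
    """Return a searchable event dict, or None for unparsable or surplus lines."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    if match.group("path").startswith("/internal/"):  # assumed marker for surplus logging
        return None
    cookies = parse_cookies(match.group("cookie"))
    event = {
        "ts": match.group("ts"),
        "path": match.group("path"),
        "status": int(match.group("status")),
    }
    for name in WANTED_COOKIES:
        event[name] = cookies.get(name)
    return event


sample = '10.0.0.1 [16/Oct/2013:10:00:00 +0000] "GET /skills/add" 200 "Session=abc123; UserId=42"'
print(parse_line(sample))
```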

Amazon S3

Aggregated Data

Raw Events

Internal Web

Excel Tableau

Amazon Redshift

Page 76: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse

ELASTIC DATA WAREHOUSE

Page 77: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse
Page 78: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse
Page 79: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse
Page 80: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse
Page 81: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse
Page 82: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse
Page 83: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse

Monthly Reports on a new cluster

Page 84: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse

Redshift Reporting

and BI EMR

S3

Page 85: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse

DynamoDB Redshift

OLTP Web Apps

Reporting and BI

Page 86: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse

RDBMS Redshift

OLTP ERP

Reporting & BI

Page 87: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse

+

RDBMS Redshift

OLTP ERP

Reporting & BI

Page 88: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse

JDBC/ODBC

Amazon Redshift

Page 89: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse
Page 90: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse
Page 91: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse
Page 92: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse
Page 93: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse
Page 94: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse
Page 95: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse

DATAWAREHOUSE BY AWS

Pay per use, no CAPEX

Low cost for high performance

Open and integrates with existing BI tools

Simple to use and scalable

Page 96: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse

Speed and Agility

Frequent Experiments

Low Cost of Failure

More Innovation

Fewer Experiments

High Cost of Failures

Less Innovation

“On Premise”

Page 97: AWS Summit Tel Aviv - Enterprise Track - Data Warehouse

Thank you very much

