aws partner webcast - analyze big data for consumer applications with looker bi and amazon redshift

Analyze Big Data for Consumer Applications with Looker BI and Amazon Redshift

Upload: amazon-web-services

Post on 11-May-2015


Category: Technology



DESCRIPTION

Customizing the customer experience based on user behavior is a constant challenge for today’s consumer apps. Business intelligence helps analyze and model large amounts of data. Looker offers a modern approach to BI leveraging AWS that’s fast, agile, and easy to manage. Join this webinar to learn how MessageMe, which provides emotionally engaging messaging apps to consumers, leverages Looker business intelligence software and the Amazon Redshift data warehouse service to analyze billions of rows of customer data in seconds.

Webinar topics include:

• How MessageMe turns billions of rows of customer data stored in Amazon Redshift into actionable insights

• How Looker connects directly to Amazon Redshift in just a few clicks, enabling MessageMe to build modern, big data analytics in the cloud

Who should attend:

• Information or Solution Architects, Data Analysts, BI Directors, DBAs, Development Leads, Developers, or Technical IT Leaders

Presenters:

• Justin Rosenthal, CTO, MessageMe

• Keenan Rice, VP, Marketing & Alliances, Looker

• Tina Adams, Senior Product Manager, AWS

TRANSCRIPT

Page 1: AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker BI and Amazon Redshift

Analyze Big Data for Consumer Applications with Looker BI and Amazon Redshift

Page 2: AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker BI and Amazon Redshift

Welcome

Maya Cabassi

Partner Marketing Manager

Amazon Web Services

Page 3: AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker BI and Amazon Redshift

Webinar Overview

Submit your questions using the Q&A tool.

A copy of today’s presentation will be made available on:

AWS SlideShare Channel: http://www.slideshare.net/AmazonWebServices/

AWS Webinar Channel on YouTube: http://www.youtube.com/channel/UCT-nPlVzJI-ccQXlxjSvJmw

Page 4: AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker BI and Amazon Redshift

Introducing

Tina Adams, Senior Product Manager, Amazon Web Services

Keenan Rice, VP, Marketing & Alliances, Looker

Justin Rosenthal, Chief Technology Officer, MessageMe

Page 5: AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker BI and Amazon Redshift

What We’ll Cover

• Overview of Amazon Redshift data warehouse

• How Looker integrates with Amazon Redshift to enable big data analytics in the cloud

• How MessageMe turns application metrics stored in Amazon Redshift into actionable insights with Looker BI

• Q&A

Page 6: AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker BI and Amazon Redshift

Amazon Redshift

Fast, simple, petabyte-scale data warehousing for less than $1,000/TB/Year

Tina Adams, Senior Product Manager

Page 7: AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker BI and Amazon Redshift

We set out to build…

A fast and powerful, petabyte-scale data warehouse that is:

• A Lot Faster

• A Lot Cheaper

• A Lot Simpler

Amazon Redshift

Page 8: AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker BI and Amazon Redshift

Data warehousing done the AWS way

• Easy to provision

• Pay as you go, no up front costs

• Fast, cheap, easy to use

• SQL

Deploy

Page 9: AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker BI and Amazon Redshift

Common Customer Use Cases

Traditional Enterprise DW

• Reduce costs by extending DW rather than adding HW

• Migrate completely from existing DW systems

• Respond faster to business; provision in minutes

Companies with Big Data

• Improve performance by an order of magnitude

• Make more data available for analysis

• Access business data via standard reporting tools

SaaS Companies

• Add analytic functionality to applications

• Scale DW capacity as demand grows

• Reduce HW & SW costs by an order of magnitude

Page 10: AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker BI and Amazon Redshift

Channel

Amazon Redshift Customers

Page 11: AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker BI and Amazon Redshift

Feature Delivery

Service Launch (2/14)

PDX (4/2)

Temp Credentials (4/11)

Unload Encrypted Files

DUB (4/25)

NRT (6/5)

JDBC Fetch Size (6/27)

Unload logs (7/5)

4 byte UTF-8 (7/18)

Statement Timeout (7/22)

SHA1 Builtin (7/15)

Timezone, Epoch, Autoformat (7/25)

WLM Timeout/Wildcards (8/1)

CRC32 Builtin, CSV, Restore Progress (8/9)

UTF-8 Substitution (8/29)

JSON, Regex, Cursors (9/10)

Split_part, Audit tables (10/3)

SIN/SYD (10/8)

HSM Support (11/11)

Kinesis EMR/HDFS/SSH copy, Distributed Tables, Audit Logging/CloudTrail, Concurrency, Resize Perf., Approximate Count Distinct, SNS Alerts (11/13)

SOC1/2/3 (5/8)

Sharing snapshots (7/18)

Resource Level IAM (8/9)

PCI (8/22)

Distributed Tables, Single Node Cursor Support, Maximum Connections to 500 (12/13)

EIP Support for VPC Clusters (12/28)

Page 12: AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker BI and Amazon Redshift

Amazon Redshift architecture

• Leader Node

– SQL endpoint

– Stores metadata

– Coordinates query execution

• Compute Nodes

– Local, columnar storage

– Execute queries in parallel

– Load, backup, restore via Amazon S3

– Parallel load from Amazon S3, DynamoDB, EMR/HDFS/SSH

– Kinesis integration

• Hardware optimized for data processing

• Scale while remaining online from a single node to a 100 node 1.6 PB cluster

Diagram labels: 10 GigE (HPC); Ingestion, Backup, Restore; JDBC/ODBC
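The split between a leader node that coordinates and compute nodes that execute in parallel can be pictured as a scatter-gather aggregation. The following is only a toy sketch in plain Python, not a description of Redshift's actual execution engine; `NODE_SLICES`, `partial_count`, and `leader_count` are hypothetical names, and the rows are made-up data.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each inner list stands in for the rows held by one compute node.
NODE_SLICES = [
    ["US", "US", "CA"],
    ["US", "GB"],
    ["CA", "GB", "GB"],
]


def partial_count(rows):
    """Work done on one compute node: aggregate only the locally stored rows."""
    return Counter(rows)


def leader_count(slices):
    """Work done on the leader: fan the query out, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(slices)) as pool:
        partials = list(pool.map(partial_count, slices))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)


result = leader_count(NODE_SLICES)
```

Each node only ever touches its own slice, so adding nodes spreads the scan while the leader's merge step stays small.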

Page 13: AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker BI and Amazon Redshift

Amazon Redshift is priced to let you analyze all your data

Pricing Option | Effective Hourly Price (single node) | Effective Hourly Price per TB | Effective Annual Price per TB
On-Demand | $0.850 | $0.425 | $3,723
1 Year Reservation | $0.500 | $0.250 | $2,190
3 Year Reservation | $0.228 | $0.114 | $999

Simple Pricing

Number of Nodes x Cost per Hour

No charge for Leader Node

No upfront costs

Pay as you go
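The per-TB figures on this slide follow from the per-node prices by simple arithmetic. The sketch below reproduces them under one stated assumption: 2 TB of storage per node, which is inferred from the node price being exactly twice the per-TB price and is not stated on the slide. The function names are illustrative only.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760

# Per-node effective hourly prices, taken from the slide.
NODE_HOURLY = {
    "On-Demand": 0.850,
    "1 Year Reservation": 0.500,
    "3 Year Reservation": 0.228,
}

# Assumption: 2 TB per node, inferred from node price / per-TB price on the slide.
TB_PER_NODE = 2


def cluster_hourly_cost(nodes, option):
    """Simple pricing: number of nodes x cost per hour (no charge for the leader)."""
    return nodes * NODE_HOURLY[option]


def annual_price_per_tb(option):
    """Effective annual price per TB for one pricing option."""
    return NODE_HOURLY[option] / TB_PER_NODE * HOURS_PER_YEAR


annual = {option: round(annual_price_per_tb(option)) for option in NODE_HOURLY}
```

Rounded to whole dollars, `annual` matches the slide's last column, including the 3-year figure that underlies the "less than $1,000/TB/Year" headline.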

Page 14: AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker BI and Amazon Redshift

Amazon Redshift has security built-in

• SSL to secure data in transit

• Encryption to secure data at rest

– AES-256; hardware accelerated

– All blocks on disks and in Amazon S3 encrypted

– HSM/CloudHSM

• No direct access to compute nodes

• Amazon VPC support

• SOC1/2/3, PCI level 1, and others coming soon

Diagram labels: 10 GigE (HPC); Ingestion, Backup, Restore; Customer VPC; Internal Security Group; JDBC/ODBC

Page 15: AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker BI and Amazon Redshift

Amazon Redshift integrates with multiple data sources

Amazon EMR

Amazon DynamoDB

Amazon RDS

Amazon Redshift

Amazon S3

Amazon Kinesis

JDBC

ODBC

Corporate Datacenter

Page 16: AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker BI and Amazon Redshift
Page 17: AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker BI and Amazon Redshift

Analytics For Today’s Data-Driven Organizations


Keenan Rice, Vice President, Marketing & Alliances

1.28.14

Page 18: AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker BI and Amazon Redshift

The New Data Landscape

The Missed Innovation Cycle

The Next Generation

Innovative Customers

MessageMe Intro


Page 19: AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker BI and Amazon Redshift

New Data Landscape

Business Data

ETL Data Warehouse

Data Analysts: Data Modeling

Business Users: Limited data discovery

Ridiculous Quantities of Event & Business Data

New MPP Databases

Data Analysts: New Breed of Data Experts

Business Users: New Curious Generation, Expect Immediate Results

Page 20: AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker BI and Amazon Redshift

Missed Innovation Cycle

BI is a relic of the old (expensive) data landscape

Event & Business Application Data

New MPP databases

Traditional BI

BI Software: Heavy desktop apps

No reusability

One-time-use queries

No direct data access

Cubes / Simple models

Back to hand-coding SQL

Data Analysts: New Breed of Data Experts

Business Users: New Curious Generation, Expect Immediate Results

Page 21: AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker BI and Amazon Redshift

Looker: The Next Generation

Modern analytics, built for the new data landscape

Load, Transform, Query

BI Software: Web Based App

Data Analysts: Flexible Delivery, Agile Modeling

Business Users: High-Resolution Discovery, Sharing & Collaboration

Page 22: AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker BI and Amazon Redshift

Looker Inside

• Near real-time access to your Redshift data

• Exploit the computing power of the AWS cloud and Redshift

• No need to re-architect or cube data

Load, Transform, Query

BI Software: Web Based App

Data Analysts: Flexible Delivery, Agile Modeling

Business Users: High-Resolution Discovery, Sharing & Collaboration

Page 23: AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker BI and Amazon Redshift

Looker Intelligence

• Extend the power of your data analysts

• Fold data as complex as necessary without any database effort

• Use Git for agile team development

Copy, Transform, Query

BI Software: Web Based App

Data Analysts: Flexible Delivery, Agile Modeling

Business Users: High-Resolution Discovery, Sharing & Collaboration

Page 24: AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker BI and Amazon Redshift

Looker Everywhere

• Powerful data discovery for anyone

• Share, save, and collaborate

• Access all the data, in an interactive web application

Copy, Transform, Query

BI Software: Web Based App

Data Analysts: Flexible Delivery, Agile Modeling

Business Users: High-Resolution Discovery, Sharing & Collaboration

Page 25: AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker BI and Amazon Redshift

© 2014 Looker Inc. All Rights Reserved.

A New Perspective

Changing the way organizations make decisions

2012: Founded in Santa Cruz, California

$18M: Redpoint, First Round Capital & Pivot North

1200: Hours per month spent in Looker per customer

50+: Customers changing how they run their businesses

Lloyd Tabb: Created first app server (Netscape), founder Mozilla.org, LiveOps, etc.

Frank Bien: Proven software exec: Greenplum, EMC

Marc Randolph: Founder and first CEO Netflix

Page 26: AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker BI and Amazon Redshift

Who’s Lookering?

Data-driven organizations realizing the power of Looker

Page 27: AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker BI and Amazon Redshift
Page 28: AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker BI and Amazon Redshift

Powering Analytics @ MessageMe

1. Redshift + Looker

2. Example Looker Report & Model

3. MessageMe Data Storage

4. Analytics Strategies

5. DynamoDB → Redshift

Page 29: AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker BI and Amazon Redshift

Redshift + Looker

Empower your team to answer their own questions.

• What types of Stickers are sent the most?

• How do event/holiday-themed packs perform?

• Which SMS provider is most cost-effective?

Internal dashboards and Looker link-sharing are commonplace.

Looker makes the data accessible and Redshift makes it fast.

Page 30: AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker BI and Amazon Redshift

Redshift + Looker

Page 31: AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker BI and Amazon Redshift

Redshift + Looker

Page 32: AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker BI and Amazon Redshift

Data Storage: Why Redshift?

At Launch:

• DynamoDB for all application data

• MySQL for all statistics data

RDS Config (March 2013)

Master: db.m1.xlarge (15GB)

Slave: db.m1.xlarge (15GB)

RDS Config (April 2013)

Master: db.m1.xlarge (15GB)

Slave: db.m2.4xlarge (68GB)

90% of writes were via LOAD DATA INFILE, so write IOPS were not a problem.

However, index sizes were growing quickly…

Page 33: AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker BI and Amazon Redshift

Data Storage: Why Redshift?

MySQL Status (April 2013)

Table | Engine | Index Width | Row Count | Index Size
event | InnoDB | 48 Bytes / Row | ~3 Billion | 144 GB
message | InnoDB | 32 Bytes / Row | ~2 Billion | 64 GB

Slave: db.m2.4xlarge (68GB)
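The index sizes on this slide are just index width times row count. A quick back-of-the-envelope check, using decimal gigabytes (which is what the slide's round numbers work out to), shows how far the combined indexes had outgrown the 68 GB replica. Reading that gap as the source of the query problems is an interpretation of the slides, not something they state outright.

```python
GB = 10**9  # decimal gigabytes

# (index width in bytes per row, approximate row count), taken from the slide
TABLES = {
    "event": (48, 3 * 10**9),
    "message": (32, 2 * 10**9),
}

SLAVE_RAM_GB = 68  # db.m2.4xlarge, per the slide

# Index size per table = width x rows
index_gb = {name: width * rows // GB for name, (width, rows) in TABLES.items()}
total_index_gb = sum(index_gb.values())
fits_in_memory = total_index_gb <= SLAVE_RAM_GB
```

With these inputs the two indexes total roughly three times the replica's memory, so `fits_in_memory` is false.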

Page 34: AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker BI and Amazon Redshift

Data Storage: Why Redshift?

We could put data in, but we couldn’t get it back out!

Possible Solutions

1. Summarize

• PRO: Data compression

• CON: Data loss

2. Shard

• PRO: No data loss

• CON: Difficult to query

3. Redshift?

Page 35: AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker BI and Amazon Redshift

Data Storage: Current System

Redshift (90%)

• Append-only tables

• Delayed, bulk inserts OK

Examples:

• `event`

• `message`

• `user_demographic`

MySQL (10%)

• Inline inserts

• Non-negotiable uniqueness requirements (ON DUPLICATE KEY UPDATE)

Examples:

• `purchase`

• `user_demographic`

Page 36: AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker BI and Amazon Redshift

Analytics Strategies w/ Billions of Rows

Deep-dive queries w/ row-level specifics

You get this out-of-the-box with Redshift

vs.

Super fast top-line metrics, aggregates

How do we get these, really fast?

1. Summarization

2. Cached Derived Tables

Page 37: AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker BI and Amazon Redshift

Analytics Strategies: Summarization

How many doodles were sent each day in the US since we launched?

100 seconds vs. 3 seconds

message

Columns: `id`, `sender_id`, `recipient_user_id`, `recipient_room_id`, `message_type`, `country`, `os_family`, `os_version`, `app_version`, `timestamp`

Rows / Day: 10,000,000-100,000,000

sm_message

Columns: `send_hour`, `recipient_type`, `message_type`, `country`, `os_family`, `send_count`

Rows / Day: 10,000-100,000

1,000:1 Compression
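The summarization idea can be sketched as a roll-up from `message` into `sm_message`, followed by the "doodles per day" question answered from the small table. This is an illustration only: it uses SQLite from the Python standard library as a stand-in for Redshift, the sample rows are invented, and the way `send_hour` and `recipient_type` are derived is an assumption, since the slides list the columns but not the roll-up query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE message (
        id INTEGER PRIMARY KEY, sender_id INTEGER,
        recipient_user_id INTEGER, recipient_room_id INTEGER,
        message_type TEXT, country TEXT, os_family TEXT,
        os_version TEXT, app_version TEXT, timestamp TEXT
    )
""")
# Invented sample rows.
rows = [
    (1, 10, 20, None, "doodle", "US", "iOS", "7.0", "1.2", "2013-04-01 09:05:00"),
    (2, 11, 21, None, "doodle", "US", "iOS", "7.1", "1.2", "2013-04-01 09:40:00"),
    (3, 12, None, 5, "text", "US", "Android", "4.4", "1.2", "2013-04-01 09:50:00"),
    (4, 10, 22, None, "doodle", "US", "iOS", "7.0", "1.3", "2013-04-02 18:15:00"),
]
conn.executemany("INSERT INTO message VALUES (?,?,?,?,?,?,?,?,?,?)", rows)

# Roll up: drop per-row identifiers, truncate the timestamp to the hour,
# and keep one counted row per combination of the remaining dimensions.
# Assumption: recipient_type is 'room' when recipient_room_id is set, else 'user'.
conn.execute("""
    CREATE TABLE sm_message AS
    SELECT
        strftime('%Y-%m-%d %H:00:00', timestamp) AS send_hour,
        CASE WHEN recipient_room_id IS NOT NULL THEN 'room' ELSE 'user' END
            AS recipient_type,
        message_type, country, os_family,
        COUNT(*) AS send_count
    FROM message
    GROUP BY send_hour, recipient_type, message_type, country, os_family
""")

# The slide's question, answered from the summary table only.
doodles_per_day = conn.execute("""
    SELECT date(send_hour) AS send_day, SUM(send_count)
    FROM sm_message
    WHERE message_type = 'doodle' AND country = 'US'
    GROUP BY send_day
    ORDER BY send_day
""").fetchall()

summary_rows = conn.execute("SELECT COUNT(*) FROM sm_message").fetchone()[0]
```

The trade-off is the one named earlier in the deck: the summary answers aggregate questions from far fewer rows, but per-message detail such as `sender_id` is no longer available from it.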

Page 38: AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker BI and Amazon Redshift

Analytics Strategies: Cached Derived Tables

Some important queries will be complex and demand row-specific data.

Summarizing is not an option. What to do?

…build Cached Derived Tables

• Turn long-running, complex queries into flat tables

Page 39: AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker BI and Amazon Redshift

Analytics Strategies: Cached Derived Tables

Example: Retention by Cohort

sm_retention_day

Columns: `join_day`, `nday`, `country`, `os_family`, `os_version`, `traffic_source`, `active_users`, `signups`

    SELECT
    INTO TABLE `sm_retention_day`
    FROM (
        SELECT ….
        FROM `user`
        JOIN `message`
        JOIN `user_source`
    ), (
        SELECT ….
        FROM `user`
        JOIN `user_source`
    )
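To make the shape of that query concrete, here is a simplified, runnable sketch of materializing a retention table from two subqueries, one for active users and one for signups. It is not MessageMe's actual query, whose SELECT lists are elided on the slide. It assumes SQLite as a stand-in engine, invented sample data, a hypothetical `send_day` column on `message`, and only a subset of the `sm_retention_day` columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY, join_day TEXT);
    CREATE TABLE user_source (user_id INTEGER, traffic_source TEXT);
    CREATE TABLE message (sender_id INTEGER, send_day TEXT);

    -- Invented sample data.
    INSERT INTO user VALUES (1, '2013-04-01'), (2, '2013-04-01'), (3, '2013-04-02');
    INSERT INTO user_source VALUES (1, 'organic'), (2, 'organic'), (3, 'ad');
    INSERT INTO message VALUES
        (1, '2013-04-01'), (1, '2013-04-02'), (2, '2013-04-01'), (3, '2013-04-02');

    -- Run the expensive joins once and store the flat result.
    CREATE TABLE sm_retention_day AS
    SELECT a.join_day, a.nday, a.traffic_source, a.active_users, s.signups
    FROM (
        -- Users active N days after joining, per cohort and traffic source.
        SELECT u.join_day,
               CAST(julianday(m.send_day) - julianday(u.join_day) AS INTEGER) AS nday,
               us.traffic_source,
               COUNT(DISTINCT u.id) AS active_users
        FROM user u
        JOIN message m ON m.sender_id = u.id
        JOIN user_source us ON us.user_id = u.id
        GROUP BY u.join_day, nday, us.traffic_source
    ) a
    JOIN (
        -- Cohort size: signups per join day and traffic source.
        SELECT u.join_day, us.traffic_source, COUNT(*) AS signups
        FROM user u
        JOIN user_source us ON us.user_id = u.id
        GROUP BY u.join_day, us.traffic_source
    ) s ON s.join_day = a.join_day AND s.traffic_source = a.traffic_source;
""")

# Reports then read the small flat table instead of re-running the joins.
retention = conn.execute("""
    SELECT join_day, nday, active_users, signups
    FROM sm_retention_day
    ORDER BY join_day, nday
""").fetchall()
```

Dividing `active_users` by `signups` per row gives the retention rate for each cohort and day offset; rebuilding the table on a schedule keeps it reasonably fresh.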

Page 40: AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker BI and Amazon Redshift

DynamoDB → Redshift

• Stats tables are homogeneous and compact

• Application data can be heterogeneous and heavy

– Mixture of numbers, strings, binary, etc.

How many users signed up this week with a .edu email address?

COPY dynamodb://user
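The slide shows only the start of the load command. As a hedged sketch of how the two steps might fit together: Redshift's COPY can read from a `dynamodb://` source and takes a READRATIO setting for how much of the table's provisioned read throughput to use. Everything else here is an assumption for illustration: the target table name `users`, the `email` and `signup_day` columns, the placeholder IAM role, the cut-off date, and the use of SQLite to run the follow-up query locally.

```python
import sqlite3


def build_copy_statement(target_table, dynamodb_table, iam_role, read_ratio=50):
    """Build a Redshift COPY statement that loads from a DynamoDB table.

    This only constructs the SQL text; nothing is sent to Redshift here.
    """
    return (
        f"COPY {target_table} FROM 'dynamodb://{dynamodb_table}' "
        f"IAM_ROLE '{iam_role}' READRATIO {read_ratio};"
    )


# Hypothetical target table and placeholder role ARN.
copy_sql = build_copy_statement(
    "users", "user", "arn:aws:iam::<account-id>:role/<role-name>"
)

# Once the data is in a SQL table, the slide's question becomes a simple filter.
# SQLite and the rows below are stand-ins for the loaded Redshift table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, email TEXT, signup_day TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        (1, "ana@example.edu", "2014-01-27"),
        (2, "bo@example.com", "2014-01-27"),
        (3, "cy@school.edu", "2014-01-10"),
    ],
)
edu_signups = conn.execute(
    "SELECT COUNT(*) FROM users WHERE email LIKE '%.edu' AND signup_day >= ?",
    ("2014-01-22",),  # hypothetical start of "this week"
).fetchone()[0]
```

The point of the slide survives the simplification: a question that is awkward to ask of a key-value store becomes a one-line filter once the same data sits in a SQL warehouse.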

Page 42: AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker BI and Amazon Redshift

We’d like your feedback. Please respond to a short survey.

https://aws.asia.qualtrics.com/SE/?SID=SV_1yUN9wjaZX960kd