Automating Big Data Technologies for Faster Time-to-Value


Upload: amazon-web-services

Post on 21-Jan-2018


Page 1: Automating Big Data Technologies for Faster Time-to-Value

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

November 1, 2017 | 11:00 AM PT

Automating Big Data Technologies for Faster Time-to-Value

Page 2: Automating Big Data Technologies for Faster Time-to-Value

Today’s Presenters

David Potes, Solutions Architect, Amazon Web Services

Minesh Patel, Technical Director, Qubole

Seth Myers, Senior Data Scientist, Demandbase

Page 3: Automating Big Data Technologies for Faster Time-to-Value

Today’s Agenda

1. An overview of AWS and AWS Marketplace, with an emphasis on AWS data lake solutions and Qubole

2. Overview of the Qubole solutions featured in our story

3. Challenges faced by Demandbase

4. The Demandbase success story with AWS and Qubole

5. Q&A/Discussion

Page 4: Automating Big Data Technologies for Faster Time-to-Value

Learning Objectives

1. How to dramatically reduce management complexities for analytics operations

2. How to reduce the costs of processing and analyzing data in a data lake on AWS

3. How to operate at the scale and efficiency of a large enterprise with a small data team

Page 5: Automating Big Data Technologies for Faster Time-to-Value

Introduction to Data Lake Concepts

Page 6: Automating Big Data Technologies for Faster Time-to-Value

Unlocking Data

Most companies and organizations are embarking on ambitious innovation initiatives to unlock their data.

The data already exists but goes unused, or is locked away from complementary data sets in isolated data silos.

Page 7: Automating Big Data Technologies for Faster Time-to-Value

Enter Data Lake Architectures

A data lake is a new and increasingly popular architecture for storing and analyzing massive volumes and heterogeneous types of data.

Page 8: Automating Big Data Technologies for Faster Time-to-Value

Benefits of a Data Lake – All Data in One Place

Store and analyze all of your data, from all of your sources, in one centralized location.

“Why is the data distributed in many locations? Where is the single source of truth?”

Page 9: Automating Big Data Technologies for Faster Time-to-Value

Benefits of a Data Lake – Quick Ingest

Quickly ingest data without needing to force it into a pre-defined schema.

“How can I collect data quickly from various sources and store it efficiently?”

Page 10: Automating Big Data Technologies for Faster Time-to-Value

Benefits of a Data Lake – Storage vs Compute

Separating your storage and compute allows you to scale each component as required.

“How can I scale up with the volume of data being generated?”

Page 11: Automating Big Data Technologies for Faster Time-to-Value

Benefits of a Data Lake – Schema on Read

“Is there a way I can apply multiple analytics and processing frameworks to the same data?”

A data lake enables ad-hoc analysis by applying schemas on read, not write.
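As a minimal sketch of schema on read, the Python below keeps raw events exactly as they were ingested and applies two different schemas only when the data is read. The event fields, the schema dictionaries, and the `read_with_schema` helper are all hypothetical and are not taken from the presentation.

```python
import json

# Raw events are stored exactly as they arrived; no schema was enforced on write.
# These records and field names are made up for illustration.
RAW_EVENTS = [
    '{"user": "u1", "page": "/pricing", "ms": "120"}',
    '{"user": "u2", "page": "/docs", "ms": "45", "referrer": "search"}',
]


def read_with_schema(lines, schema):
    """Parse raw JSON lines, keeping only the schema's fields and casting each one.

    Fields missing from a record come back as None instead of failing the read.
    """
    for line in lines:
        record = json.loads(line)
        yield {
            name: cast(record[name]) if name in record else None
            for name, cast in schema.items()
        }


# Two consumers project the same raw data through different schemas.
latency_schema = {"page": str, "ms": int}
traffic_schema = {"user": str, "referrer": str}

latency_rows = list(read_with_schema(RAW_EVENTS, latency_schema))
traffic_rows = list(read_with_schema(RAW_EVENTS, traffic_schema))

print(latency_rows)
print(traffic_rows)
```

Because neither schema was imposed at ingest time, adding a third consumer means writing a third schema, not reloading the data.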

Page 12: Automating Big Data Technologies for Faster Time-to-Value

AWS Approach to Data Lake

Page 13: Automating Big Data Technologies for Faster Time-to-Value

Amazon S3 is the Data Lake

Page 14: Automating Big Data Technologies for Faster Time-to-Value

Designed Benefits of an Amazon S3 Data Lake

Fixed Cluster Data Lake

• Limited to the single tool contained on the cluster (e.g., Hadoop, a data warehouse, or Cassandra), while use cases and ecosystem tools change rapidly

• Expensive to add nodes to add storage capacity

• Expensive to replicate data against node loss

• Complexity in scaling local storage capacity

• Long refresh cycles to add additional storage equipment

Amazon S3 Data Lake

• Decouple storage and compute by making Amazon S3 object-based storage, not a fixed tool cluster, the data lake

• Flexibility to use any and all tools in the ecosystem: the right tool for the job

• Future-proof your architecture: as new use cases and new tools emerge, you can plug and play the current best of breed

Page 15: Automating Big Data Technologies for Faster Time-to-Value

Why Amazon S3 for Data Lake?

Durable

• Designed for 11 9s of durability

Available

• Designed for 99.99% availability

High performance

• Multipart upload

• Range GET

Scalable

• Store as much as you need

• Scale storage and compute independently

• No minimum usage commitments

Integrated

• Amazon EMR

• Amazon Redshift

• Amazon DynamoDB

Easy to use

• Simple REST API

• AWS SDKs

• Read-after-create consistency

• Event notification

• Lifecycle policies
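To put the durability and availability design targets in concrete terms, here is a rough arithmetic sketch. It assumes both figures are annual rates and treats them as design targets, not guarantees. The function names are illustrative only.

```python
from fractions import Fraction

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def expected_annual_object_loss(object_count, nines):
    """Expected objects lost per year at a durability of `nines` nines.

    Eleven nines (99.999999999%) means an annual loss rate of 1 in 10**11.
    """
    return object_count * Fraction(1, 10 ** nines)


def max_annual_downtime_minutes(availability_percent):
    """Minutes per year of unavailability allowed by an availability target.

    Pass the percentage as a string (e.g. "99.99") so it is parsed exactly.
    """
    unavailable = 1 - Fraction(availability_percent) / 100
    return unavailable * MINUTES_PER_YEAR


loss = expected_annual_object_loss(10_000_000, nines=11)
downtime = max_annual_downtime_minutes("99.99")

print(f"Storing 10,000,000 objects: about one object lost every {1 / loss} years")
print(f"99.99% availability: about {float(downtime)} minutes of downtime per year")
```

Exact fractions are used here because subtracting 0.99999999999 from 1 in floating point would introduce rounding noise at this scale.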

Page 16: Automating Big Data Technologies for Faster Time-to-Value

Automating Complex Tasks

Qubole makes Big Data technologies swift and simple

Page 17: Automating Big Data Technologies for Faster Time-to-Value

About Qubole

One of the largest cloud-agnostic Big Data as a Service companies

Founded by the pioneers of “big data” at Facebook and the creators of Apache Hive

Page 18: Automating Big Data Technologies for Faster Time-to-Value

Poll Question #1

What is the status of your big data initiative?

Page 19: Automating Big Data Technologies for Faster Time-to-Value

The Vision

Page 20: Automating Big Data Technologies for Faster Time-to-Value

Qubole Data Service

Page 21: Automating Big Data Technologies for Faster Time-to-Value

Autonomous Data Management

Page 22: Automating Big Data Technologies for Faster Time-to-Value

Qubole Cloud Agents

Page 23: Automating Big Data Technologies for Faster Time-to-Value

Total Cost Savings Among Qubole Customers in 2016 and 2017

• Cluster Life Cycle Management savings: $150M

  – Amount saved by automatically terminating a cluster when inactive

• Workload-aware Autoscaling savings: $121M

  – Amount saved by predictively adjusting the number of nodes to meet demand

• Spot Shopper savings: $40M

  – Amount saved by utilizing Spot instances
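The first two savings categories can be pictured with a small sketch. This is not Qubole's algorithm: the slide describes predictive autoscaling, while the rule below is a simpler reactive stand-in. The thresholds and function names are assumptions made for illustration.

```python
import math


def desired_nodes(pending_tasks, tasks_per_node, min_nodes, max_nodes):
    """Size the cluster to the current backlog, clamped to configured bounds.

    A reactive stand-in for workload-aware autoscaling: it looks only at the
    tasks waiting right now, not at a forecast of future demand.
    """
    needed = math.ceil(pending_tasks / tasks_per_node)
    return max(min_nodes, min(max_nodes, needed))


def should_terminate(idle_minutes, idle_timeout_minutes):
    """Cluster life cycle rule: shut the cluster down once it has sat idle long enough."""
    return idle_minutes >= idle_timeout_minutes


# Hypothetical settings: each node handles 100 tasks, cluster bounded to 2..60 nodes.
print(desired_nodes(950, tasks_per_node=100, min_nodes=2, max_nodes=60))
print(desired_nodes(0, tasks_per_node=100, min_nodes=2, max_nodes=60))
print(should_terminate(idle_minutes=45, idle_timeout_minutes=30))
```

Both rules save money the same way: compute that is not doing useful work stops being billed, which is practical because the data itself stays in Amazon S3 and not on the cluster.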

Page 24: Automating Big Data Technologies for Faster Time-to-Value

Architectural Diagram

Page 25: Automating Big Data Technologies for Faster Time-to-Value

Poll Question #2

What big data technology are you using or evaluating?

Page 26: Automating Big Data Technologies for Faster Time-to-Value

Why Qubole?

Page 27: Automating Big Data Technologies for Faster Time-to-Value

Demandbase Automates With Qubole

Demandbase provides more value for their B2B marketing customers by automating Big Data and Machine Learning operations.

Page 28: Automating Big Data Technologies for Faster Time-to-Value

Who is Demandbase?

Demandbase is a B2B marketing automation company that leverages artificial intelligence to automate all aspects of the advertising, selling, and marketing process.

Page 29: Automating Big Data Technologies for Faster Time-to-Value

The Challenge

• Many factors determine which accounts a business should target

  • Do they have a need/budget for the product?

  • Are they currently in-market for the product?

  • Do they have decision makers ready to buy?

• These insights must come from many different types of big datasets

• Demandbase’s previous account identification tool took multiple days to run

• Our clients could not iterate or modify their strategies with such slow turn-around

Page 30: Automating Big Data Technologies for Faster Time-to-Value

The Data Used to Identify Accounts

• To determine an account’s need for the product

  • We have firmographic information on 14 million accounts

  • We’ve built a knowledge graph of all accounts using NLP technology that crawls 350 TB of web pages a month

• To determine if an account is in-market

  • We track 700 billion web interactions a year, each one mapped to employees across all accounts

• To identify decision makers

  • We are currently tracking over 100 million employees across all accounts

Page 31: Automating Big Data Technologies for Faster Time-to-Value

[Figure: annotated example account list (Company 2, Company 3)]

• All 14M accounts are scored; the top 5K are available to the user

• Keywords extracted from 700B web interactions

• Buyers at each account identified from 100M+ contacts

Page 32: Automating Big Data Technologies for Faster Time-to-Value

The Solution

• The user requests a new list of accounts with a button-press

• 60 EC2 servers are spun up

• A machine learning algorithm is built using Spark and MLlib

• For each of 14 million accounts

  • Information about relevant web interactions, buyers, online content, etc. is fed into the machine learning model

  • The model scores each account

• Top 5K accounts are pushed to the web app, along with relevant info

• From button-press to new account list: 20 minutes
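The score-then-rank step in the list above can be sketched in plain Python. This is a stand-in, not Demandbase's Spark and MLlib job: the logistic scoring function, the feature names, and the weights are all hypothetical, and the real pipeline distributes this work across the cluster.

```python
import heapq
import math


def score(features, weights, bias=0.0):
    """Logistic score in (0, 1) from a weighted sum of an account's features."""
    z = bias + sum(weights.get(name, 0.0) * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def top_accounts(accounts, weights, k):
    """Score every account and keep only the k highest-scoring ones.

    heapq.nlargest avoids sorting the full list, which matters when the
    input has millions of accounts and only a few thousand are kept.
    """
    scored = ((score(features, weights), account_id)
              for account_id, features in accounts.items())
    return heapq.nlargest(k, scored)


# Made-up accounts and weights; the real model's inputs are not given in the slides.
accounts = {
    "a": {"visits": 3.0, "buyers": 1.0},
    "b": {"visits": 0.0, "buyers": 0.0},
    "c": {"visits": 1.0, "buyers": 1.0},
}
weights = {"visits": 0.5, "buyers": 1.0}

for account_score, account_id in top_accounts(accounts, weights, k=2):
    print(account_id, round(account_score, 3))
```

In the workflow described on the slide, `k` would be 5,000 and the input would be all 14 million accounts.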

Page 33: Automating Big Data Technologies for Faster Time-to-Value

Qubole Makes This Possible

• Qubole manages all of our EC2 instances

• So far, we’ve successfully tested 20 different concurrent models (20 × 60 EC2 servers)

• Qubole keeps our costs down through dynamic bidding and heterogeneous server clusters

• Our web app calls Qubole’s easy-to-implement Play API, which spins up the EC2 instances and deploys our Spark job

• With Qubole taking care of the infrastructure, we could focus on developing the machine learning

• Qubole allowed us to build a self-serve machine-learning-as-a-service solution
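One way to picture why a heterogeneous cluster can cost less is a greedy sketch that fills the required capacity from whichever instance type is currently cheapest per vCPU. This is an illustration only, not Qubole's bidding logic. The instance labels, prices, and availability counts are invented and are not real AWS figures.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    instance_type: str  # hypothetical label, not a real EC2 instance type
    vcpus: int
    spot_price: float   # USD per hour; illustrative number only
    available: int      # how many instances can be obtained at this price


def pick_cluster(offers, vcpus_needed):
    """Greedily fill the vCPU requirement, cheapest price per vCPU first.

    When the cheapest type runs out, the plan falls through to the next one,
    which is what makes the resulting cluster heterogeneous.
    """
    plan = {}
    remaining = vcpus_needed
    for offer in sorted(offers, key=lambda o: o.spot_price / o.vcpus):
        if remaining <= 0:
            break
        count = min(offer.available, math.ceil(remaining / offer.vcpus))
        if count > 0:
            plan[offer.instance_type] = count
            remaining -= count * offer.vcpus
    if remaining > 0:
        raise ValueError("not enough capacity on offer")
    return plan


offers = [
    Offer("small", vcpus=4, spot_price=0.04, available=10),
    Offer("large", vcpus=16, spot_price=0.20, available=5),
    Offer("medium", vcpus=8, spot_price=0.07, available=3),
]

print(pick_cluster(offers, vcpus_needed=100))
```

A single-type cluster would be capped by that type's availability or forced onto a more expensive type for every node. Mixing types lets the cheap capacity be used first.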

Page 34: Automating Big Data Technologies for Faster Time-to-Value

Next Steps and Further Information

• Try a pre-configured production-ready Qubole deployment on AWS Data Lake:

• https://aws.amazon.com/quickstart/architecture/qubole-on-data-lake-foundation/

• Buy on AWS Marketplace:

• https://aws.amazon.com/marketplace/pp/B06XX76R24

• Learn more about Qubole:

• https://www.qubole.com/products/qds-for-aws/

• Learn more about Demandbase:

• https://www.demandbase.com/technology/

• Try AWS:

• https://aws.amazon.com/

Page 35: Automating Big Data Technologies for Faster Time-to-Value

Q & A

Page 36: Automating Big Data Technologies for Faster Time-to-Value

Thank you!