AWS re:Invent 2016

© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

November 30, 2016

How DataXu Scaled Its Attribution System to Handle Billions of Events per Day with Amazon DynamoDB

Padma Malligarjunan, AWS
Yekesa Kosuru, DataXu
Rohit Dialani, DataXu

Agenda

• Benefits of NoSQL

• Fully managed features of Amazon DynamoDB

• DynamoDB integration with AWS services

• DataXu’s DynamoDB use case

SQL (Relational) vs. NoSQL (Non-relational)

[Figure: traditional SQL databases scale up, on a bigger primary with a secondary; NoSQL databases scale out across many database nodes.]

SQL (Relational)

[Figure: a normalized product catalog. A Products table (columns Product ID, Type, Price, Desc.; rows 1 Book, 2 Album, 3 Movie; prices such as $11.50, $8.99, and $14.95; descriptions such as "Must watch..") links to one table per product type: Books (Book ID, Title, Author, Date: 1, Harry Potter…, JK Ro.., 2010), Albums (Album ID, Title, Artist: 2, The Fox, Ylvis), and Movies (Movie ID, Title, Genre, Director: 3, Batman vs Super…, Action, Zack Snyder). Labels mark the columns, rows, and primary key index.]

SQL (Relational) vs. NoSQL (Non-relational)

[Figure: the relational Products/Books/Albums/Movies schema beside a single NoSQL Products table. In the NoSQL table the primary key is a partition key (Product ID) plus a sort key (Type: Book ID, Album ID, Movie ID), and each item carries its own attributes: Harry Potter.. and JK Rowling for the book; The Fox and Ylvis for the album; Batman vs Super, Action, 2010, and Zack Snyder for the movie, plus an Actor ID item (Ben Affleck) under product 3. Schema is defined per item.]

NoSQL design optimizes for compute instead of storage.
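To make "schema is defined per item" concrete, here is a minimal in-memory sketch in Python. It models the data layout only and is not the DynamoDB API: items are addressed by a partition key plus a sort key, each item stores its own attributes, and a query on one partition key returns that partition's items in sort-key order. The class name `ProductsTable`, the method names, and the `Actor ID` item are illustrative assumptions.

```python
from collections import defaultdict


class ProductsTable:
    """Toy in-memory model of a table keyed by (partition key, sort key).

    Illustrative only: it mimics the NoSQL data model, not DynamoDB's API.
    """

    def __init__(self):
        # partition key -> {sort key -> attribute dict}
        self._partitions = defaultdict(dict)

    def put_item(self, partition_key, sort_key, **attributes):
        # No table-wide schema: each item keeps whatever attributes it is given.
        self._partitions[partition_key][sort_key] = dict(attributes)

    def query(self, partition_key):
        # Every item sharing one partition key, returned in sort-key order.
        items = self._partitions.get(partition_key, {})
        return [(sk, items[sk]) for sk in sorted(items)]


table = ProductsTable()
table.put_item(1, "Book ID", author="JK Rowling")
table.put_item(2, "Album ID", title="The Fox", artist="Ylvis")
table.put_item(3, "Movie ID", genre="Action", director="Zack Snyder")
table.put_item(3, "Actor ID", actor="Ben Affleck")  # hypothetical second item for product 3

for sort_key, attrs in table.query(3):
    print(sort_key, attrs)
```

Reading everything about product 3 touches one partition and needs no join; the cost is repeated, denormalized attributes, which is the storage-for-compute trade the slide describes.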

Why NoSQL?

SQL                      NoSQL
Optimized for storage    Optimized for compute
Normalized/relational    Denormalized/hierarchical
Ad hoc queries           Instantiated views
Scale vertically         Scale horizontally
Good for OLAP            Built for OLTP at scale

DynamoDB Benefits

• Fully managed
• Fast, consistent performance
• Highly scalable
• Flexible
• Event-driven programming
• Fine-grained access control

Scaling High-Velocity Use Cases with DynamoDB

Ad Tech, Gaming, Mobile, IoT, Web

Table and Item API

• Admin: Create Table, Update Table, Delete Table, Describe Table
• CRUD: Put/Get Item, Batch Put/Get Item, Update Item, Delete Item, Query, Scan
• DynamoDB Streams: ListStreams, DescribeStream, GetShardIterator, GetRecords

DynamoDB Streams

• Stream of updates to a table
• Asynchronous
• Exactly once
• Strictly ordered
  • Per item
• Highly durable
  • Scale with table
• 24-hour lifetime
• Sub-second latency

DynamoDB Streams and Amazon Kinesis Client Library

[Figure: a DynamoDB client application writes updates to a table with five partitions; the updates flow into a stream with four shards, and each shard is read by a KCL worker in an Amazon Kinesis Client Library application.]

DynamoDB Streams and AWS Lambda

[Figure: triggers invoke a Lambda function that notifies of changes and updates derivative tables, Amazon Elasticsearch Service, and Amazon ElastiCache.]

Analytics with DynamoDB Streams

• Collect and de-dupe data in DynamoDB
• Aggregate data in memory and flush periodically

[Figure: reference architecture for performing real-time aggregation and analytics.]
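As a hedged illustration of the "aggregate data in memory and flush periodically" step, the Python sketch below is a Lambda-style handler that receives a batch of DynamoDB Streams records (the `Records` / `eventName` / `dynamodb.Keys` shape Lambda delivers), counts newly inserted items per user in memory, and flushes once per batch. The key name `userid`, the `flush` function, and its in-process `FLUSHED` sink are hypothetical; a real pipeline would write to a derivative table or cache such as those in the figure.

```python
from collections import Counter

# Hypothetical sink standing in for a derivative table, ElastiCache, etc.
FLUSHED = []


def flush(counts):
    """Persist one batch of aggregates (here: just keep them in memory)."""
    if counts:
        FLUSHED.append(dict(counts))


def handler(event, context=None):
    """Aggregate INSERT records from one DynamoDB Streams batch by partition key."""
    counts = Counter()
    for record in event.get("Records", []):
        if record.get("eventName") != "INSERT":
            continue  # this sketch only counts newly written events
        keys = record["dynamodb"]["Keys"]
        # Keys arrive as DynamoDB attribute values, e.g. {"userid": {"S": "..."}}.
        user_id = keys["userid"]["S"]
        counts[user_id] += 1
    flush(counts)  # one flush per batch; the talk does not give a flush interval
    return dict(counts)


def _rec(name, user, ts):
    # Build a minimal stream record for the demo below.
    return {"eventName": name,
            "dynamodb": {"Keys": {"userid": {"S": user}, "timestamp": {"N": str(ts)}}}}


sample_event = {"Records": [
    _rec("INSERT", "u1", 1),
    _rec("INSERT", "u1", 2),
    _rec("MODIFY", "u1", 2),
    _rec("INSERT", "u2", 3),
]}
print(handler(sample_event))
```

Flushing per invocation keeps the sketch stateless; batching across invocations would need state that survives between calls, which a Lambda container does not guarantee.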

DataXu’s DynamoDB Use Case

• Who is DataXu

• Attribution Use Case

• Why DynamoDB

• Deployment Architecture

• Capacity & Performance

• Tips & Lessons Learned

DataXu

• Who
  • Spun out of MIT Labs
  • A petabyte-scale digital marketing platform
  • One of the fastest-growing companies in the Inc. 5000
• What
  • Help the world’s most valuable brands understand and engage with their consumers
  • Maximize ROI

Quick Statistics

• 2M+ bid requests per second
• Billions of impressions per month, petabytes of data
• ~10 ms round-trip response time
• 180+ TB of logs per day
• 2 PB of data analyzed
• 3000+ servers powering the platform
• 13 regions, 24x7

[Figure: Real-Time Bidding]

DataXu Reads and Writes on DynamoDB

[Figure: two charts of read/write capacity used; one plotted by day, the other in 6-hour intervals.]

Attribution Use Case

Attribution is the science of allocating credit for an activity or sale to the marketing touchpoints a customer was exposed to before the purchase or activity.

[Figure: Attribution. A customer journey of an impression, a click, and another impression leading to an online purchase.]

Generalized Event Chains

[Figure: events on a timeline (I = Impression, E = Event, A = Activity) forming chains such as A, I, E, I, A.]

• Billions of events and activities are organized into sequences.
• Events are correlated by time and user to construct paths leading to an activity.
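The chain-building step can be sketched in Python as follows, under stated assumptions: events are grouped by user, sorted by time, each activity is paired with the touchpoints since the previous activity that fall inside an attribution window, and credit is then split equally. The `Event` type, `build_paths`, `linear_credit`, and the equal-split (linear) credit model are all hypothetical; the talk does not say which attribution model DataXu uses.

```python
from collections import defaultdict
from typing import List, NamedTuple, Tuple


class Event(NamedTuple):
    user: str
    ts: int    # epoch milliseconds
    kind: str  # "I" = impression, "E" = other event, "A" = activity


def build_paths(events: List[Event], window_ms: int) -> List[Tuple[Event, List[Event]]]:
    """Pair each activity with the same user's preceding touchpoints.

    A chain restarts after every activity (A I E I A yields two chains), and
    only touchpoints within window_ms of the activity are kept.
    """
    by_user = defaultdict(list)
    for e in events:
        by_user[e.user].append(e)

    paths = []
    for user in sorted(by_user):
        chain: List[Event] = []
        for e in sorted(by_user[user], key=lambda ev: ev.ts):
            if e.kind == "A":
                touches = [t for t in chain if e.ts - t.ts <= window_ms]
                paths.append((e, touches))
                chain = []  # the next activity starts a fresh chain
            else:
                chain.append(e)
    return paths


def linear_credit(touches: List[Event]) -> List[Tuple[Event, float]]:
    """Split one unit of credit equally across touchpoints (assumed model)."""
    if not touches:
        return []
    share = 1.0 / len(touches)
    return [(t, share) for t in touches]


events = [
    Event("u1", 0, "I"), Event("u1", 10, "E"), Event("u1", 20, "I"),
    Event("u1", 30, "A"), Event("u1", 40, "I"), Event("u1", 50, "A"),
]
for activity, touches in build_paths(events, window_ms=100):
    print(activity.ts, [t.kind for t in touches])
```

At DataXu's scale this grouping would run as a distributed job rather than in one process; the per-user sort is why a store keyed by user and ordered by timestamp fits the lookup pattern.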

Why DynamoDB

• Managed Service
  • Easy to use
  • Elastic scaling, no need to overprovision
  • API driven
• Fast & Predictable Performance (milliseconds)
  • Fast lookup/scan of user events
  • Consistent & predictable read/write performance
• TCO
  • Reasonable capex and no opex

Deployment Architecture

DataXu Flows

[Figure: DataXu data flows connecting a CDN, Real-Time Bidding, the Retargeting Platform, Streams (Amazon Kinesis), All Data (Amazon S3), ETL (Spark SQL), Attribution (MR), Machine Learning (Spark), Advanced Analytics (third-party), and Reporting Tools (third-party).]

Ecosystem of tools and services

[Figure: the Attribution Engine and surrounding services: Meta, Amazon EMR Job, Amazon CloudWatch, DynamoDB, AWS Data Pipeline, 3rd-party S3 buckets, 1st party, AWS Direct Connect, Amazon VPC, Amazon EC2, Amazon RDS, Amazon SNS, and AWS IAM.]

Inside DynamoDB: Events Table

[Figure: user events are written with Put Item into monthly tables Events_<month_1>, Events_<month_2>, and so on, each with hash=userid, range=timestamp, and a <payload> attribute; each user has many events (1:N relationship).]

Property           Value
Storage            25 TB
Avg. record size   4 KB

Events Table Schema

userid (partition key)   timestamp (sort key)   payload (attributes)
Userid-1                 epoch1                 ..
Userid-1                 epoch2                 ..
rLsWAQZU1C00TU5          1475624579321          <Binary compressed>
rLsWAQZU1C00TU5          1477762942692          <Binary compressed>
rLsWAQZU1C00TU5          1475624579695          <Binary compressed>
rLsWAQZU1C00TU5          1475624579703          <Binary compressed>
SS2U6KnX1BWziP5          1476829764673          <Binary compressed>

The slide annotates the rows with event types I, I, E, A.
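Because events live in one table per month, a writer has to pick the table from the event timestamp, and retention comes down to deciding which whole tables have aged out. A minimal Python sketch, assuming a hypothetical `Events_YYYY_MM` naming scheme (the slide only shows `Events_<month_1>`) and UTC month boundaries:

```python
from datetime import datetime, timezone

PREFIX = "Events_"  # hypothetical naming scheme: Events_YYYY_MM


def table_for(ts_ms: int) -> str:
    """Name of the monthly events table for an epoch-millisecond timestamp (UTC)."""
    dt = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)
    return f"{PREFIX}{dt.year:04d}_{dt.month:02d}"


def month_index(table_name: str) -> int:
    """Months since year 0, parsed from a name like Events_2016_10."""
    year, month = table_name[len(PREFIX):].split("_")
    return int(year) * 12 + int(month) - 1


def tables_to_drop(table_names, now_ms: int, keep_months: int):
    """Tables older than the window; the current month counts as month 1."""
    current = month_index(table_for(now_ms))
    return sorted(n for n in table_names if current - month_index(n) >= keep_months)


print(table_for(1475624579321))  # a timestamp from the schema example
```

With `keep_months=3`, a job running in October 2016 keeps the August, September, and October tables and drops July, so old data disappears with one table deletion instead of many item deletes.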

Capacity & Performance

R/W Operations vs. R/W Capacity Units

What influences capacity units for your table?

• Item size: capacity unit size
  • 4 KB per read or 1 KB per write
• Read/write request rate: item gets and puts by your application
• Consistency: a strongly consistent read consumes twice the capacity of an eventually consistent read
• Local secondary index: synchronized with the table, so index updates also consume the table's write capacity
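The sizing rules above reduce to a little arithmetic. This Python sketch turns a steady request rate and an item size into provisioned RCU and WCU, rounding item size up to the next capacity-unit boundary and halving read cost for eventually consistent reads. The function names are hypothetical, and it ignores index writes.

```python
import math


def rcu_per_read(item_kb: float, strongly_consistent: bool = True) -> float:
    """1 RCU per 4 KB read (rounded up); eventually consistent reads cost half."""
    units = max(1, math.ceil(item_kb / 4))
    return units if strongly_consistent else units / 2


def wcu_per_write(item_kb: float) -> int:
    """1 WCU per 1 KB written, rounded up."""
    return max(1, math.ceil(item_kb))


def capacity_needed(reads_per_sec, writes_per_sec, item_kb, strongly_consistent=True):
    """Provisioned (RCU, WCU) for a steady request rate at a fixed item size."""
    rcu = math.ceil(reads_per_sec * rcu_per_read(item_kb, strongly_consistent))
    wcu = math.ceil(writes_per_sec * wcu_per_write(item_kb))
    return rcu, wcu


# 4 KB, the average record size quoted for the events table
print(capacity_needed(100, 10, 4))
print(capacity_needed(100, 10, 4, strongly_consistent=False))
```

Note how writes dominate at 4 KB: each write costs four WCU while each read costs one RCU, which is one reason compressing payloads pays off.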

Capacity Planning: Unit of Scaling

• Partition:
  • Storage: 10 GB per partition
  • Compute: 3000 RCU or 1000 WCU per partition
• Partitions (for throughput) = (RCU / 3000) + (WCU / 1000)
• Partitions (for size) = Storage used in GB / 10
• Total number of partitions = Ceiling(Max(Partitions (for throughput), Partitions (for size)))
  • e.g., Ceiling(Max(100/10, 9000/3000 + 3000/1000)) = Ceiling(Max(10, 6)) = 10
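The partition estimate can be written directly as code. This is a sketch of the formula exactly as presented in the talk (10 GB, 3000 RCU, or 1000 WCU per partition); the function name `partition_count` is hypothetical.

```python
import math


def partition_count(storage_gb: float, rcu: int, wcu: int) -> int:
    """Partition estimate from the talk: 10 GB, 3000 RCU, or 1000 WCU per partition."""
    for_throughput = rcu / 3000 + wcu / 1000
    for_size = storage_gb / 10
    return math.ceil(max(for_throughput, for_size))


print(partition_count(100, 9000, 3000))  # the slide's worked example
```

The same function reproduces the partition column of the capacity examples, e.g. 35 GB at 1000 RCU / 500 WCU is size-bound at Ceiling(3.5) = 4 partitions.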

Capacity Examples

Storage    Provisioned RCU   Provisioned WCU   Partitions   Reads/sec/partition   Writes/sec/partition
35 GB*     1000              500               4            250                   125
1000 GB*   1000              500               100          10                    5
100 GB*    9000              3000              10           900                   300
100 GB*    90K               30K               60           1500                  500
100 GB**   9000              3000              10           450                   60

* Item size of 1 KB or less
** Item size of 5 KB
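The per-partition columns follow from dividing provisioned throughput evenly across partitions and then by the capacity units each item consumes. A Python sketch under two assumptions: throughput is spread evenly, and reads are strongly consistent (which is what makes 9000 RCU over 10 partitions come out to 900 reads per second). The function name is hypothetical.

```python
import math


def per_partition_rates(rcu, wcu, partitions, item_kb):
    """Item reads/writes per second each partition can serve, assuming an even
    spread of provisioned throughput and strongly consistent reads."""
    rcu_per_item = max(1, math.ceil(item_kb / 4))  # 1 RCU per 4 KB read
    wcu_per_item = max(1, math.ceil(item_kb))      # 1 WCU per 1 KB write
    return rcu / partitions / rcu_per_item, wcu / partitions / wcu_per_item


rows = [  # (RCU, WCU, partitions, item size in KB) from the Capacity Examples rows
    (1000, 500, 4, 1),
    (1000, 500, 100, 1),
    (9000, 3000, 10, 1),
    (90_000, 30_000, 60, 1),
    (9000, 3000, 10, 5),
]
for row in rows:
    print(row, per_partition_rates(*row))
```

The last row shows the item-size effect: a 5 KB item costs 2 RCU per read and 5 WCU per write, so the same provisioned capacity serves only 450 reads and 60 writes per second per partition.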

Throttling

Storage   Provisioned RCU   Provisioned WCU   Partitions   Reads/sec/partition   Writes/sec/partition
100 GB    9000              3000              10           900                   300

[Figure: 10 partitions, each serving 900 reads and 300 writes per second.]

Throttling kicks in above 900 reads or 300 writes per second on a partition.
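Because the limit applies per partition, one hot key can be throttled while the table as a whole is far below its provisioned total. The Python sketch below illustrates this with a stand-in hash: MD5 modulo the partition count is an assumption for illustration only, since DynamoDB's internal key-to-partition mapping is not public. Function names are hypothetical.

```python
import hashlib
from collections import Counter


def partition_of(key: str, partitions: int) -> int:
    """Map a partition-key value to a partition.

    Stand-in only: DynamoDB's internal hash is not public, so MD5 is used here.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partitions


def throttled_partitions(rate_by_key, partitions, limit_per_partition):
    """Partitions whose summed request rate exceeds their share of capacity."""
    load = Counter()
    for key, rate in rate_by_key.items():
        load[partition_of(key, partitions)] += rate
    return {p: rate for p, rate in load.items() if rate > limit_per_partition}


# 9000 RCU over 10 partitions -> 900 reads/sec per partition.
traffic = {"hot-user": 1200}
print(throttled_partitions(traffic, partitions=10, limit_per_partition=900))
```

Here 1200 reads per second is well under the table's 9000 RCU, yet the single partition holding `hot-user` is over its 900 reads-per-second share, which is the hot-key case the design tips warn about.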

How to Detect Throttling

Auto (Predictive) Scaling

Reduce TCO by tuning throughput to match usage

0.01% of the runs

Tips & Lessons Learned

Design Tips

• Understand Scaling

• Understand Hot Keys/Throttling

• Capture Application Metrics

• Configure Table Alarms

• Application Tuning for Outliers

• Retry with Backoff

• DynamoDB Best Practices
  • http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/BestPractices.html

• AWS Service Limits
  • http://docs.aws.amazon.com/general/latest/gr/aws_service_limits.html
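"Retry with Backoff" can be sketched as capped exponential backoff with full jitter. AWS SDKs already retry throttled requests on their own; this Python wrapper is a sketch for cases where the application wants its own policy. `ThrottledError` is a hypothetical stand-in for a throttling error such as `ProvisionedThroughputExceededException`, and the delay parameters are assumptions, not values from the talk.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a throttling error such as
    ProvisionedThroughputExceededException."""


def with_backoff(operation, max_attempts=5, base_delay=0.05, max_delay=2.0, sleep=time.sleep):
    """Call operation(); on throttling, sleep a random time in
    [0, min(max_delay, base_delay * 2**attempt)] and try again."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the throttle to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))


attempts = []


def flaky_put():
    # Simulated write that is throttled twice before succeeding.
    attempts.append(1)
    if len(attempts) < 3:
        raise ThrottledError()
    return "ok"


print(with_backoff(flaky_put))
```

Randomizing the whole delay (full jitter) spreads retries from many clients over time, so they do not all hit the same hot partition again at once.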

Lessons Learned

• Reduce RCU and WCU
  • Combined reads and writes using the Batch API
  • Combined multiple rows that share the same hash key into a single row (3x fewer puts)
  • LZ4 compression
• How do we handle deletes?
  • Table rotation to match attribution windows
  • Drop the entire table when it is no longer needed
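The three write-reduction ideas compose naturally: group events by hash key, compress each group into one item, and send items in batches. A Python sketch under stated assumptions: zlib stands in for LZ4 (which is not in the standard library), and the item layout (`userid`, earliest `timestamp`, compressed `payload`) is hypothetical. The limit of 25 put or delete requests per `BatchWriteItem` call is DynamoDB's documented maximum.

```python
import json
import zlib
from collections import defaultdict

BATCH_WRITE_LIMIT = 25  # BatchWriteItem accepts at most 25 put/delete requests per call


def combine_by_user(events):
    """Pack all events for one userid into a single compressed item.

    zlib stands in for LZ4; the item layout is a hypothetical example.
    """
    grouped = defaultdict(list)
    for user_id, ts, payload in events:
        grouped[user_id].append({"ts": ts, "payload": payload})
    items = []
    for user_id, evs in grouped.items():
        evs.sort(key=lambda e: e["ts"])
        blob = zlib.compress(json.dumps(evs).encode("utf-8"))
        items.append({"userid": user_id, "timestamp": evs[0]["ts"], "payload": blob})
    return items


def batches(items, size=BATCH_WRITE_LIMIT):
    """Split items into chunks that each fit in one batch write."""
    return [items[i:i + size] for i in range(0, len(items), size)]


# 90 raw events for 30 users -> 30 combined items (3x fewer puts) -> 2 batch calls
events = [(f"user-{i % 30}", 1_000 + i, "impression") for i in range(90)]
items = combine_by_user(events)
print(len(events), "events ->", len(items), "items ->", len(batches(items)), "batches")
```

In practice a combined item must also stay under DynamoDB's 400 KB item-size limit, and because write cost scales with item size in 1 KB units, compression is what keeps the combined rows from giving back the savings.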

Lessons Learned

• Dynamic scaling to a large number of partitions takes time
• Debugging
  • Application logging/metrics
  • TCP dumps
  • Turn on request ID logging
  • CloudWatch
• Local DynamoDB for testing
  • http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Tools.DynamoDBLocal.html
