PayPal Real Time Analytics

Posted on 22-Nov-2014

Category:

Data & Analytics

© 2014 PayPal Inc. All rights reserved. Confidential and proprietary.

Open Source Real Time BI using Storm, Hadoop, Titan, Druid & D3

Anil Madan, Sr. Director Engineering, PayPal

$1 in every $6 spent on e-commerce is spent through PayPal.*

*Source: Morgan Stanley, “eCommerce Disruption: A Global Theme,” January 6, 2013, p.21.

Creating Tomorrow’s Mobile Payment Experiences

25 countries with live PayPal fingerprint authentication on Samsung devices.

Helping Developers Innovate & Monetize New Mobile Apps

Braintree launches its new API, including Pay with PayPal.

PayPal Now Available in 203 Markets

10 new markets added in the second quarter, making PayPal available to 80 million new internet users.

Paraguay

Côte d’Ivoire

Nigeria

Monaco

Belarus

Moldova

Cameroon

Zimbabwe

Montenegro

Macedonia


Business Problem

We need to better understand our customers…

Awareness: How do prospective customers learn about PayPal?
Acquisition: Where do prospects sign up for accounts?
Activation: How can we help them to complete their 1st payment?
Adoption: How can we help them use PayPal even more?


How we solved it…

Direct/Home Page

Product Experiences

Search Engine Marketing

Transaction Emails

Tracking Metadata Tool

Taxonomy

Tracking Event Service

Tracking Servers

Tag Catalog

Tracking Validation Service

Marketing

Segmentation

Real Time Systems

Experimentation

Metadata

Attribution

Exploratory Analytics

Predictive Analytics

Big Data

Mobile


Reporting & Visualization

Pathing Store

Logical View

Client Side Events

Page Performance

Events

Server Side Events

Collection Service

Sessionization

Behavioral Metrics

Marketing Metrics

Metadata Instrumentation Collection Processing Analytics

Performance Metrics

Operational Metrics (OpenTSDB)

DRUID Metrics Store

Real Time Event

Metrics


Metadata – Logical Entity Model

COMPONENTS

PAGE TEMPLATE

TAGS

LINK


Metadata – Logical Event Model

Impression Event
Tracking Event
Reaction Event
Component Impression Event
Ad Impression Event
Click Event
Click-Through Event
Mouse-over Event
Entry Event
Exit Event
Outcome Event
Page Impression Event
Client Page Impression Event
Server Page Impression Event


Metadata - Self-Service Management Workflow…


DATA PIPELINE

Processing
Analysis & Visualization
Client Side
Metadata
Performance
Collection
Metrics
Tools
REST Spout
Bot flagging Bolt
Aggregation
Sessionization
REST Proxy
HTTP
Server Side
Geo Enrichment Bolt
Reporting
Data Stores
Druid
Apache Titan
Developers
Product Owners
Customers
Meta data
Reporting Consumers
Metadata Service


Druid Architecture

• Open-source
• Distributed
• Real-time
• Highly-Available Data store
• Column-oriented
• Approximate or Exact


Real Time Nodes

• Ingest data and buffer events in memory
• Incremental indexing
• Query data as soon as it is ingested
• Periodically persist collected events to disk
• Combine multiple disk indexes to create immutable ‘segments’
• Log-structured merge-tree
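The buffer, persist, and merge cycle listed for real-time nodes can be illustrated with a toy model. This is only a sketch under stated assumptions: the class name, the `max_buffered_events` threshold, and the use of plain dictionaries as stand-ins for on-disk indexes are all hypothetical, and none of it reflects Druid's actual storage code.

```python
from collections import defaultdict


class RealTimeNodeSketch:
    """Toy model: in-memory buffer -> persisted indexes -> immutable segment."""

    def __init__(self, max_buffered_events=3):
        # Hypothetical threshold that triggers a persist.
        self.max_buffered_events = max_buffered_events
        self.buffer = defaultdict(int)  # incremental index: (hour, page) -> count
        self.buffered_events = 0
        self.persisted = []             # stand-ins for on-disk indexes
        self.segments = []              # merged, immutable segments

    def ingest(self, hour, page):
        """Index the event in memory; it is queryable immediately."""
        self.buffer[(hour, page)] += 1
        self.buffered_events += 1
        if self.buffered_events >= self.max_buffered_events:
            self.persist()

    def persist(self):
        """Move the in-memory index to the list of persisted indexes."""
        if self.buffer:
            self.persisted.append(dict(self.buffer))
        self.buffer = defaultdict(int)
        self.buffered_events = 0

    def merge(self):
        """Combine all persisted indexes into one immutable segment."""
        self.persist()
        merged = defaultdict(int)
        for index in self.persisted:
            for key, count in index.items():
                merged[key] += count
        self.persisted = []
        segment = tuple(sorted(merged.items()))  # a tuple cannot be mutated
        self.segments.append(segment)
        return segment

    def query(self, page):
        """Count events for a page across buffer, persisted indexes, segments."""
        sources = [self.buffer, *self.persisted, *(dict(s) for s in self.segments)]
        return sum(c for src in sources for (_, p), c in src.items() if p == page)


node = RealTimeNodeSketch(max_buffered_events=3)
for page in ["Login", "Login", "Checkout", "Login"]:
    node.ingest("2014-10-16T05", page)

login_before_merge = node.query("Login")  # spans the buffer and one persisted index
segment = node.merge()
```

The query path reads the buffer, the persisted indexes, and the merged segments together, which is what lets data be queried as soon as it is ingested.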


Druid Architecture


Historical Nodes

• Load immutable read-optimized data from deep storage
• Memory mapped storage engine
• Caches segments
• Supports tiered storage


Druid Architecture


Druid Systems Overview

Metrics & Dimensions

Standard Metrics
  { "type": "doubleSum", "name": "pageviews", "fieldName": "PV" },
  { "type": "doubleSum", "name": "bounces", "fieldName": "bnc" },
  ....

Estimated Metrics (HyperLogLog)
  { "type": "hyperUnique", "name": "unique_visits", "fieldName": "user_session_guid" },
  { "type": "hyperUnique", "name": "unique_visitors", "fieldName": "user_guid" }

Dimensions
  …2014/06/11/10",
  "filter": "part-",
  "parser": {
    "type": "string",
    "timestampSpec": { "column": "timestamp", "format": "auto" },
    "data": {
      "format": "json",
      "dimensions": [ "timestamp", "USER_GUID", "USER_SESSION_GUID", "PAGE_GROUP", "PAGE_NAME", "PAGEGROUP_LINK_NAME", "PAGE_LINK_NAME", …
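The estimated metrics rely on HyperLogLog, which approximates a distinct count from a fixed-size array of registers. The following is a minimal textbook-style sketch, not Druid's `hyperUnique` implementation: the class name, the SHA-1 hash, the precision `p=12`, and the `user-N` identifiers are all assumptions chosen for illustration.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog: 2**p registers, 64-bit hash values."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, value):
        digest = hashlib.sha1(value.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")       # 64-bit hash
        width = 64 - self.p
        idx = h >> width                            # first p bits pick a register
        rest = h & ((1 << width) - 1)               # remaining bits
        rank = width - rest.bit_length() + 1        # position of the leftmost 1-bit
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)       # bias constant for m >= 128
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw


hll = HyperLogLog()
for _ in range(2):                                  # duplicates do not change the estimate
    for i in range(10000):
        hll.add(f"user-{i}")
estimate = hll.count()
```

The trade-off is the one the slide names: an exact distinct count needs memory proportional to the number of visitors, while this structure keeps a fixed number of registers and accepts a small relative error.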


Sessionization

Events

Visitor ID  Session ID  Timestamp         Event Payload
V1          S1          2014-10-16 05:12  E1
V2          S2          2014-10-16 05:14  E2
V1          S1          2014-10-16 05:15  E3
V1          S1          2014-10-16 05:20  E4
V2          S2          2014-10-16 05:21  E5
V1          S3          2014-10-16 05:25  E6
…           …           …                 …

Visit Container

Visitor ID  Session ID  Payload                                                  Events
V1          S1          sf, mac, {flash, quicktime}, {ca, usa}, 480 secs, …      E1, E3, E4
V2          S2          ff, win, {acrobat, mediaplayer}, {wb, in}, 420 secs, …   E2, E5
V1          S3          sf, mac, {quicktime, java}, {on, ca}, 60 secs            E6
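The step from the event stream to the visit container can be sketched as a group-by on visitor and session. This is an illustrative assumption about the logic, not the production Storm bolt: the function name is hypothetical, duration is taken as last timestamp minus first, and a single-event visit therefore reports 0 seconds here, whereas the slide shows 60 secs for S3 without stating the rule.

```python
from datetime import datetime

# Event rows as shown on the slide: visitor, session, timestamp, payload.
events = [
    ("V1", "S1", "2014-10-16 05:12", "E1"),
    ("V2", "S2", "2014-10-16 05:14", "E2"),
    ("V1", "S1", "2014-10-16 05:15", "E3"),
    ("V1", "S1", "2014-10-16 05:20", "E4"),
    ("V2", "S2", "2014-10-16 05:21", "E5"),
    ("V1", "S3", "2014-10-16 05:25", "E6"),
]


def build_visit_containers(events):
    """Group events per (visitor, session) and derive the visit duration."""
    visits = {}
    # The timestamp format sorts correctly as a plain string.
    for visitor, session, ts, payload in sorted(events, key=lambda e: e[2]):
        when = datetime.strptime(ts, "%Y-%m-%d %H:%M")
        visit = visits.setdefault(
            (visitor, session), {"start": when, "end": when, "events": []}
        )
        visit["end"] = when
        visit["events"].append(payload)
    return {
        key: {
            "events": v["events"],
            "duration_secs": int((v["end"] - v["start"]).total_seconds()),
        }
        for key, v in visits.items()
    }


containers = build_visit_containers(events)
```

With the slide's rows, S1 spans 05:12 to 05:20 and S2 spans 05:14 to 05:21, which reproduces the 480 and 420 second durations in the visit container.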


Druid Storage – Columns & Dictionaries

Timestamp (Hr)  Session ID  Country  OS   User Agent  Page Name
2014-10-16 05   S1          US       MAC  SF          Login, AccountOverview
2014-10-16 05   S2          DE       WIN  IE          Login, PaymentReview, AccountHistory
2014-10-16 05   S3          US       LNX  FF          Login, PaymentReview, Checkout
2014-10-16 05   S4          UK       LNX  FF          Login, Profile, Checkout
2014-10-16 05   S5          DE       WIN  CR          Login, Profile
2014-10-16 05   S6          UK       MAC  SF          Login, AccountOverview, Checkout

Page Name (dictionary-encoded, LZF-compressed)
S1  0 1
S2  0 2 3
S3  0 2 4
S4  0 5 4
S5  0 5
S6  0 1 4

Dictionary
Login            0
AccountOverview  1
PaymentReview    2
AccountHistory   3
Checkout         4
Profile          5
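The dictionary encoding of the Page Name column can be reproduced with a few lines. This sketch assigns IDs in first-seen order, which is an assumption made because it happens to yield the IDs on the slide; the function name is hypothetical and LZF compression of the encoded column is not shown.

```python
# Page Name values per row, as listed on the slide (S1 through S6).
page_name_column = [
    ["Login", "AccountOverview"],
    ["Login", "PaymentReview", "AccountHistory"],
    ["Login", "PaymentReview", "Checkout"],
    ["Login", "Profile", "Checkout"],
    ["Login", "Profile"],
    ["Login", "AccountOverview", "Checkout"],
]


def dictionary_encode(rows):
    """Replace each string with a small integer ID from a shared dictionary."""
    dictionary = {}
    encoded = []
    for values in rows:
        ids = []
        for value in values:
            if value not in dictionary:
                dictionary[value] = len(dictionary)  # next unused ID
            ids.append(dictionary[value])
        encoded.append(ids)
    return dictionary, encoded


dictionary, encoded = dictionary_encode(page_name_column)
```

Storing small integers instead of repeated strings is what makes the column compact and cheap to compare during filtering.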


Druid Data Structure - Bitmap Indices


Herald – Self Service Analytics


Herald – Self Service Analytics


Druid Metrics


Enter

Pathing


Fallout Reports


Pathing

Paths: A->B->C->D->X->A->M (S1) and A->B->C->D->E (S2)

Visitor ID  Current Page  Next Page 1  Next Page 2  Prev Page 1  Prev Page 2
S1          A             B            C            null         null
S1          B             C            D            A            null
S1          C             D            X            B            A
S1          D             X            A            C            B
S1          X             A            M            D            C
S1          A             M            null         X            D
S1          M             null         null         A            X
S2          A             B            C            null         null
S2          B             C            D            A            null
S2          C             D            E            B            A
S2          D             E            null         C            B
S2          E             null         null         D            C
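Each pathing row is the current page plus its two successors and two predecessors in the ordered page sequence. A minimal sketch of how such rows could be derived, assuming the pages of one visit are already in order (the function name and tuple layout are hypothetical):

```python
def pathing_rows(visit_id, pages):
    """Emit (id, current, next1, next2, prev1, prev2) for every page in a path."""

    def at(i):
        # Positions outside the path become None (shown as null on the slide).
        return pages[i] if 0 <= i < len(pages) else None

    return [
        (visit_id, page, at(i + 1), at(i + 2), at(i - 1), at(i - 2))
        for i, page in enumerate(pages)
    ]


rows_s1 = pathing_rows("S1", list("ABCDXAM"))
rows_s2 = pathing_rows("S2", list("ABCDE"))
```

Storing the neighbours on every row trades storage for query simplicity: next-page and previous-page questions become plain filters and group-bys with no self-join.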


Pathing

Next Page
  {
    "queryType": "groupBy",
    "dimensions": ["current_page", "dimensions like country, segmentation etc"],
    "aggregations": [
      { "type": "count", "name": "next_page_count", "fieldName": "next_page, next_page2" }
    ],
    "filter": { "type": "selector", "dimension": "current_page", "value": "C" }
  }

Previous Page
  {
    "queryType": "groupBy",
    "dimensions": ["current_page", "dimensions like country, segmentation etc"],
    "aggregations": [
      { "type": "count", "name": "prev_page_count", "fieldName": "prev_page1, prev_page2" }
    ],
    "filter": { "type": "selector", "dimension": "current_page", "value": "C" }
  }
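What the next-page and previous-page queries compute can be shown in memory: select the rows whose current page matches, then count the neighbouring pages. This is a sketch of the intent only, using the two example paths from the pathing table; the function name and the `offset` parameter are hypothetical and it does not call Druid.

```python
from collections import Counter

# The two example paths from the pathing table.
paths = {"S1": list("ABCDXAM"), "S2": list("ABCDE")}


def neighbor_counts(paths, current_page, offset):
    """Count pages found `offset` steps from each occurrence of current_page.

    offset=1 is Next Page 1, offset=2 is Next Page 2, offset=-1 is Prev Page 1.
    """
    counts = Counter()
    for pages in paths.values():
        for i, page in enumerate(pages):
            j = i + offset
            if page == current_page and 0 <= j < len(pages):
                counts[pages[j]] += 1
    return counts


next1_after_c = neighbor_counts(paths, "C", 1)
next2_after_c = neighbor_counts(paths, "C", 2)
prev1_before_c = neighbor_counts(paths, "C", -1)
next1_after_a = neighbor_counts(paths, "A", 1)
```

For page C both visits continue to D, while two steps ahead they diverge to X and E, which is the kind of split a pathing report surfaces.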


Fallout

Fallout steps: A->D->X->M
Example path: A->B->C->D->X->A->M

• Apply the filter patterns to the dictionary
• Figure out the values that match
• Take those bitmap indices
• OR the bitmap indices together
• Use the output bitmap as the filter

  {
    "queryType": "search",
    "dimensions": ["current_page_path_count", "dimensions like country, segmentation etc"],
    "filter": { "type": "regex", "dimension": "next_page_path", "pattern": "^A.*D.*X.*M$" }
  }
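The bitmap steps listed for the fallout filter can be sketched with Python integers standing in for bitmaps, one bit per row. Everything here is illustrative: the row values, the function names, and the pattern `^A.*D.*X.*M$` (read as "A, then D, then X, then M, in that order") are assumptions, and real Druid segments use compressed bitmap structures instead of plain integers.

```python
import re

# Hypothetical values of a path dimension, one per row.
rows = [
    "A->B->C->D->X->A->M",
    "A->B->C->D->E",
    "A->D->X->M",
    "A->B->C->D->E",
]


def build_index(values):
    """Return the sorted dictionary and one bitmap per distinct value."""
    dictionary = sorted(set(values))
    bitmaps = {value: 0 for value in dictionary}
    for row_number, value in enumerate(values):
        bitmaps[value] |= 1 << row_number      # set the bit for this row
    return dictionary, bitmaps


def regex_filter(dictionary, bitmaps, pattern):
    """Match the pattern against dictionary values only, then OR their bitmaps."""
    compiled = re.compile(pattern)
    result = 0
    for value in dictionary:
        if compiled.search(value):
            result |= bitmaps[value]
    return result


def matching_rows(bitmap, row_count):
    """Expand a bitmap back into row numbers."""
    return [i for i in range(row_count) if (bitmap >> i) & 1]


dictionary, bitmaps = build_index(rows)
fallout_bitmap = regex_filter(dictionary, bitmaps, r"^A.*D.*X.*M$")
fallout_rows = matching_rows(fallout_bitmap, len(rows))
```

The regex runs once per distinct value instead of once per row, so the cost of the filter scales with the dictionary size and the final row selection is a cheap bitwise operation.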


Herald Architecture

Model View Controller
NVD3 Directives
CLIENT
SERVER


SSO

Druid

Herald Deployment


Adhoc Graph Analytics

[Figure: page-flow graph. Vertices are named by page and hour: Login_2014101611, AccountOverview_2014101611, PaymentReview_2014101611, Checkout_2014101611. Elements are annotated with Country (US) and Count values between 5 and 15.]


gremlin> g.V('name', 'Login_2014101611').as('x').
           outE.inV.loop('x'){it.loops < 4}
           {it.object.getProperty('name') == 'Checkout_2014101611'}.path
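The traversal asks for every bounded-length path from the Login vertex to the Checkout vertex. A plain-Python sketch of that idea follows; the edge list is hypothetical because the figure's edges are not recoverable from the transcript, `max_hops` is an assumed parameter, and the code does not claim to match Gremlin's loop-counter semantics exactly.

```python
# Hypothetical adjacency list; only the vertex names come from the slide.
edges = {
    "Login_2014101611": ["AccountOverview_2014101611", "PaymentReview_2014101611"],
    "AccountOverview_2014101611": ["Checkout_2014101611"],
    "PaymentReview_2014101611": ["Checkout_2014101611"],
    "Checkout_2014101611": [],
}


def paths_to(edges, start, target, max_hops):
    """Depth-first enumeration of paths from start to target within max_hops edges."""
    found = []
    stack = [[start]]
    while stack:
        path = stack.pop()
        if path[-1] == target and len(path) > 1:
            found.append(path)              # emit, but keep exploring like the loop step
        if len(path) - 1 < max_hops:
            for neighbor in edges.get(path[-1], []):
                stack.append(path + [neighbor])
    return found


checkout_paths = paths_to(edges, "Login_2014101611", "Checkout_2014101611", max_hops=3)
```

Bounding the number of hops keeps the enumeration finite even when the page graph contains cycles, which is the role the loop condition plays in the query.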


Summary

• Problem
  • Understand our customer behavior
  • Across disparate channels & experiences
• Solution
  • Democratize data
  • Consistent standardized metadata
  • Disciplined instrumentation
  • Distributed scalable backend for adhoc & interactive analytics
  • Self-service BI through modern visualization tools


Questions?
