realtime attribution flink


Upload: vishnu-viswanath

Posted on 22-Jan-2018


TRANSCRIPT

Page 1: Realtime attribution flink

©2016 MediaMath Inc. 1

08.08.2016

Real-time attribution from streaming data using Apache Flink

Real-time Attribution - POC


AGENDA

1. What is Attribution

2. Stream vs Batch

3. Architecture & Flow

4. Windowing & Attribution

5. Handling out-of-order messages

6. Result

7. Demo


What is Attribution

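Impression – an ad served to the user

Events – the user's reaction to an impression: clicks and conversion events

Attribution is the process of matching a conversion event with the impression that led to it

It is needed to assign the credit for an event to the right impression

Two kinds: PostView (the user converted after seeing the ad without clicking) and PostClick (the user converted after clicking the ad)

(Diagram: User → Impression → Click → Conversion)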
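To make the PostView/PostClick distinction concrete, here is a minimal plain-Python sketch. It assumes last-touch attribution (the conversion is credited to the most recent impression before it), which the slides do not specify; the `Event` record, its fields and the `attribute` function are hypothetical names for illustration, not MediaMath's schema or a Flink API.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Event:
    # Hypothetical record; field names are illustrative, not MediaMath's schema.
    kind: str                            # "impression", "click" or "conversion"
    ts: int                              # event time in seconds
    impression_id: Optional[str] = None  # set on impressions and clicks


def attribute(events: List[Event]) -> Optional[Tuple[str, str]]:
    """Match the first conversion to the latest impression before it (last touch, assumed).

    Returns (impression_id, "PostClick" | "PostView"), or None if there is
    no conversion or no earlier impression to credit.
    """
    ordered = sorted(events, key=lambda e: e.ts)
    conversion = next((e for e in ordered if e.kind == "conversion"), None)
    if conversion is None:
        return None
    before = [e for e in ordered if e.ts <= conversion.ts]
    impressions = [e for e in before if e.kind == "impression"]
    if not impressions:
        return None
    credited = impressions[-1]
    # PostClick if the user clicked the credited impression before converting.
    clicked = any(
        e.kind == "click" and e.impression_id == credited.impression_id
        for e in before
    )
    return credited.impression_id, "PostClick" if clicked else "PostView"


events = [
    Event("impression", 100, "imp-1"),
    Event("click", 160, "imp-1"),
    Event("conversion", 900),
]
print(attribute(events))  # ('imp-1', 'PostClick')
```

Dropping the click from the same input yields `('imp-1', 'PostView')`.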


Stream Vs Batch

All data sources are streaming data sources

We only process in batches because we buffer the data

No need to wait for a batch to finish

More control over the actions

Data sources: social networks, server logs, IoT, user activity


Architecture & Flow


Windowing & Attribution

Rolling window of 30 days

Stream keyed by UUID and AdvertiserID
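A minimal sketch of keyed 30-day state follows, written in plain Python rather than Flink's window API. It assumes the window is measured back from the newest event seen for each (UUID, AdvertiserID) key; `KeyedRollingWindow`, `add` and `WINDOW_SECONDS` are hypothetical names. Keying means one user's events for one advertiser never mix with another key's.

```python
from collections import defaultdict

DAY = 24 * 3600
WINDOW_SECONDS = 30 * DAY  # 30-day window, as on the slide


class KeyedRollingWindow:
    """Per-key buffer of the last 30 days of events, by event time.

    Plain-Python illustration of keyed state, not Flink's window API.
    Assumption: the window is measured back from the newest event seen for each key.
    """

    def __init__(self, window_seconds: int = WINDOW_SECONDS):
        self.window = window_seconds
        # (uuid, advertiser_id) -> list of (ts, kind), sorted by ts
        self.state = defaultdict(list)

    def add(self, uuid: str, advertiser_id: str, ts: int, kind: str):
        key = (uuid, advertiser_id)
        buf = self.state[key] + [(ts, kind)]
        newest = max(t for t, _ in buf)
        # Evict events that fell out of the 30-day window for this key only.
        self.state[key] = sorted(
            (t, k) for t, k in buf if t > newest - self.window
        )
        return self.state[key]


w = KeyedRollingWindow()
w.add("user-1", "adv-7", 0, "impression")
print(w.add("user-1", "adv-7", 10 * DAY, "conversion"))
# [(0, 'impression'), (864000, 'conversion')]
print(w.add("user-1", "adv-7", 31 * DAY, "click"))
# [(864000, 'conversion'), (2678400, 'click')]
```

The second call shows the day-0 impression being evicted once an event 31 days later arrives for the same key.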


Handling out-of-order messages

i, e and c can occur in 3! = 6 possible orders

Each of those can arrive at the system in any of 6 orders, giving 6 × 6 = 36 combinations

The state of the stream has to stay consistent in every case
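One way to keep state consistent is to store messages by event time and recompute the result from that state, so arrival order cannot change the final answer. The plain-Python sketch below, assuming the same last-touch PostView/PostClick rule and using the hypothetical `AttributionState` class, replays all 3! = 6 arrival orders of an impression, a click and a conversion and checks that they converge. It is an illustration, not the presentation's Flink code.

```python
from itertools import permutations

# One (UUID, AdvertiserID) key: (kind, event time in seconds). Illustrative values.
MESSAGES = [("impression", 100), ("click", 160), ("conversion", 900)]


class AttributionState:
    """Keyed state whose final result does not depend on arrival order.

    Messages are stored by event time and the attribution is recomputed
    from the sorted state, so a late message simply slots into place.
    """

    def __init__(self):
        self.events = []  # list of (ts, kind), kept sorted by ts

    def on_message(self, kind, ts):
        self.events.append((ts, kind))
        self.events.sort()
        return self.result()

    def result(self):
        conv = next((ts for ts, k in self.events if k == "conversion"), None)
        if conv is None:
            return None
        imps = [ts for ts, k in self.events if k == "impression" and ts <= conv]
        if not imps:
            return None  # the impression may still arrive late
        imp = imps[-1]
        clicked = any(k == "click" and imp <= ts <= conv for ts, k in self.events)
        return ("PostClick" if clicked else "PostView", imp, conv)


finals = set()
for order in permutations(MESSAGES):  # all 3! = 6 arrival orders
    state = AttributionState()
    for kind, ts in order:
        state.on_message(kind, ts)
    finals.add(state.result())
print(finals)  # {('PostClick', 100, 900)}
```

Intermediate results can still change, for example PostView until a late click arrives, so anything consuming them would need to treat each output as an update for that key.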


Querying the result

Flink's Queryable State API for real-time queries

Results are served from Flink's internal state backend

Avoids the overhead of communicating with an external system

Druid datastore (updated less frequently)
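The difference between the two query paths can be pictured with a small plain-Python analogy. `QueryableResults`, `update`, `query_live`, `query_external` and `flush_every` are hypothetical names: the `live` dict stands in for Flink's keyed state, which is always current, and `external` for a store such as Druid that is only refreshed periodically. It does not use Flink's Queryable State client or Druid's API.

```python
class QueryableResults:
    """Analogy for querying live state versus a periodically refreshed store.

    `live` stands in for Flink's keyed state; `external` stands in for an
    external store refreshed every `flush_every` updates. Not a real API.
    """

    def __init__(self, flush_every: int = 3):
        self.live = {}
        self.external = {}
        self.flush_every = flush_every
        self.pending = 0  # updates not yet exported

    def update(self, key, value):
        self.live[key] = value
        self.pending += 1
        if self.pending >= self.flush_every:
            self.external = dict(self.live)  # periodic snapshot export
            self.pending = 0

    def query_live(self, key):
        return self.live.get(key)

    def query_external(self, key):
        return self.external.get(key)


r = QueryableResults(flush_every=3)
r.update(("user-1", "adv-7"), 1)  # e.g. attributed conversions for this key
print(r.query_live(("user-1", "adv-7")), r.query_external(("user-1", "adv-7")))
# 1 None
r.update(("user-1", "adv-7"), 2)
r.update(("user-2", "adv-7"), 1)
print(r.query_live(("user-1", "adv-7")), r.query_external(("user-1", "adv-7")))
# 2 2
```

The live path answers immediately, while the external copy lags until the next export, which mirrors "updated less frequently" on the slide.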


Demo


Conclusion

Ran on an AWS cluster with 15 task slots of 1.3 GB each

Processed 40 GB of data in about 4 hours

Customers can see how their campaigns are performing in real time

Can switch to more complex attribution logic, e.g. multi-touch attribution
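For scale, the reported figures work out roughly as follows, assuming decimal units (1 GB = 1000 MB) and an even split of work across the 15 slots; since the run took "about" 4 hours, these are approximations.

```latex
\begin{aligned}
\text{throughput} &\approx \frac{40\ \text{GB}}{4\ \text{h}} = 10\ \text{GB/h}
  = \frac{10\,000\ \text{MB}}{3\,600\ \text{s}} \approx 2.8\ \text{MB/s} \\
\text{per slot} &\approx \frac{2.8\ \text{MB/s}}{15} \approx 0.19\ \text{MB/s} \\
\text{total slot memory} &= 15 \times 1.3\ \text{GB} = 19.5\ \text{GB}
\end{aligned}
```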


Vishnu Viswanath

Data Engineer Intern

4 World Trade Center, 46th Floor, New York, NY 10007

THANK YOU!