©2016 MediaMath Inc. 1
08.08.2016
Real-time attribution from streaming data using Apache Flink
Real-time Attribution - POC
AGENDA
1. What is Attribution
2. Stream vs Batch
3. Architecture & Flow
4. Windowing & Attribution
5. Handling out-of-order messages
6. Result
7. Demo
What is Attribution
Impression – ad served to the user
Events – the user’s reactions to the Impression: Clicks and Conversion events
Attribution – the process of matching a Conversion event with the Impression
Needed for assigning the credit for an Event to the right Impression
PostView and PostClick attribution
[Figure: Users, Impression, Click, Conversion]
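The matching described on this slide can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the logic used in the POC: the record types, the field names, and the rule "a clicked impression outranks a viewed one, and the most recent one wins" are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Impression:
    impression_id: str  # hypothetical identifier
    user_id: str
    timestamp: int      # event time in seconds


@dataclass(frozen=True)
class Click:
    impression_id: str  # the impression that was clicked
    timestamp: int


def attribute(user_id: str, conversion_time: int,
              impressions: List[Impression],
              clicks: List[Click]) -> Optional[Tuple[str, str]]:
    """Return (impression_id, attribution_type) for a conversion, or None."""
    # Only impressions served to this user before the conversion are eligible.
    eligible = [i for i in impressions
                if i.user_id == user_id and i.timestamp <= conversion_time]
    if not eligible:
        return None
    # Impressions that were clicked before the conversion happened.
    clicked_ids = {c.impression_id for c in clicks
                   if c.timestamp <= conversion_time}
    clicked = [i for i in eligible if i.impression_id in clicked_ids]
    # Assumption: a clicked impression takes priority (PostClick);
    # otherwise the most recent viewed impression gets the credit (PostView).
    if clicked:
        winner = max(clicked, key=lambda i: i.timestamp)
        return (winner.impression_id, "PostClick")
    winner = max(eligible, key=lambda i: i.timestamp)
    return (winner.impression_id, "PostView")
```

With two impressions for the same user and a click on the older one, the conversion is credited to the clicked impression as PostClick; without any click it falls back to the most recent impression as PostView.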
Stream Vs Batch
All data sources are streaming data sources
We do batch processing only because we buffer the data
No need to wait for the batch to finish
More control over the actions
Data sources: Social Networks, Server logs, IoT, User Activity
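The point about not waiting for the batch to finish can be illustrated with a toy event counter. This is a minimal sketch in plain Python, not Flink code; the function names and the event labels are made up for the example.

```python
from typing import Dict, Iterable, Iterator, List


def batch_counts(events: List[str]) -> Dict[str, int]:
    """Batch style: the whole buffered list must exist before any result."""
    counts: Dict[str, int] = {}
    for event in events:
        counts[event] = counts.get(event, 0) + 1
    return counts


def streaming_counts(events: Iterable[str]) -> Iterator[Dict[str, int]]:
    """Streaming style: an up-to-date result is emitted after every event."""
    counts: Dict[str, int] = {}
    for event in events:
        counts[event] = counts.get(event, 0) + 1
        yield dict(counts)  # snapshot available immediately
```

Both versions end with the same totals; the streaming version simply makes every intermediate result available as soon as each event arrives.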
Architecture & Flow
Windowing & Attribution
Rolling window of 30 days
Keyed by UUID and AdvertiserID
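The two bullets above can be sketched as per-key state with a 30-day lookback. This is a plain-Python illustration, not the Flink job from the talk: the class, its method names, and the "most recent impression wins" rule are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

WINDOW_SECONDS = 30 * 24 * 60 * 60  # 30 days

Key = Tuple[str, str]  # (uuid, advertiser_id)


class KeyedWindowState:
    """Impressions grouped by (UUID, AdvertiserID), read through a 30-day window."""

    def __init__(self) -> None:
        self._impressions: Dict[Key, List[Tuple[int, str]]] = defaultdict(list)

    def add_impression(self, uuid: str, advertiser_id: str,
                       timestamp: int, impression_id: str) -> None:
        self._impressions[(uuid, advertiser_id)].append((timestamp, impression_id))

    def attribute(self, uuid: str, advertiser_id: str,
                  conversion_time: int) -> Optional[str]:
        """Return the id of the credited impression, or None if none is in the window."""
        stored = self._impressions.get((uuid, advertiser_id), [])
        # Keep only impressions at most 30 days older than the conversion.
        in_window = [(ts, iid) for ts, iid in stored
                     if 0 <= conversion_time - ts <= WINDOW_SECONDS]
        if not in_window:
            return None
        # Assumption: the most recent impression in the window gets the credit.
        return max(in_window)[1]
```

Keying by both fields means a conversion for one advertiser never picks up an impression served on behalf of another advertiser, even for the same user.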
Handling out-of-order messages
6 possible orders (3! permutations) in which i, e, c can occur
6 possible orders in which they can arrive at the system
The state of the stream has to be kept consistent in every case
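The counting above can be checked with a small plain-Python sketch. It treats i, e and c as three opaque message kinds (the slide does not expand the abbreviations) and assumes each message carries its own event time, so that the state depends on event time rather than on arrival order. The function names are hypothetical and this is not the POC's implementation.

```python
from itertools import permutations
from typing import Dict, Iterable, List, Tuple

KINDS = ("i", "e", "c")
Message = Tuple[str, int]  # (kind, event_time)


def replay(arrivals: Iterable[Message]) -> Dict[str, int]:
    """Fold messages into state: kind -> event time, whatever the arrival order."""
    state: Dict[str, int] = {}
    for kind, event_time in arrivals:
        state[kind] = event_time
    return state


def event_order(state: Dict[str, int]) -> List[str]:
    """Recover the order in which the messages actually occurred."""
    return sorted(state, key=state.get)


def check_all_cases() -> Tuple[int, bool]:
    """Try every occurrence order against every arrival order."""
    cases = 0
    consistent = True
    for occurred in permutations(KINDS):          # 6 ways to occur
        messages = [(kind, t) for t, kind in enumerate(occurred)]
        for arrived in permutations(messages):    # 6 ways to arrive
            cases += 1
            if event_order(replay(arrived)) != list(occurred):
                consistent = False
    return cases, consistent
```

Because the state is keyed by message kind and ordered by event time, all 6 x 6 combinations end in the same state for a given occurrence order.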
Querying the result
QueryableStream API for real-time queries
Flink’s internal state backend
Avoids the overhead of communicating with an external system
Druid datastore (updated less frequently)
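The design on this slide, answering real-time queries from the job's own state while pushing to an external store less often, can be sketched as follows. This is plain Python, not Flink's or Druid's API: the class, the `campaign_id` key, the `flush_every` parameter, and the `sink` callback (a stand-in for the external datastore) are all hypothetical.

```python
from typing import Callable, Dict


class AttributionCounts:
    """In-process counts that can be queried directly, flushed externally in batches."""

    def __init__(self, flush_every: int,
                 sink: Callable[[Dict[str, int]], None]) -> None:
        self._counts: Dict[str, int] = {}
        self._flush_every = flush_every
        self._since_flush = 0
        self._sink = sink  # stand-in for the external datastore

    def record(self, campaign_id: str) -> None:
        """Update the internal state for one attributed conversion."""
        self._counts[campaign_id] = self._counts.get(campaign_id, 0) + 1
        self._since_flush += 1
        # The external store is only updated every `flush_every` records.
        if self._since_flush >= self._flush_every:
            self._sink(dict(self._counts))
            self._since_flush = 0

    def query(self, campaign_id: str) -> int:
        """Real-time read served from internal state, with no external call."""
        return self._counts.get(campaign_id, 0)
```

A query always reflects the latest record, while the external copy can lag by up to `flush_every - 1` updates.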
Demo
Conclusion
Ran on an AWS cluster with 15 task slots of 1.3 GB each
Processed 40 GB of data in about 4 hours
Customers can see how their campaigns are doing in real-time
Can switch to more complex attribution logic, e.g. multi-touch attribution
Vishnu Viswanath
Data Engineer Intern
4 World Trade Center, 46th Floor, New York, NY 10007
THANK YOU!