Modelling event data in LookML
DESCRIPTION
Description of how we use Looker at Snowplow Analytics, including the specific steps involved in modelling event data using LookML
TRANSCRIPT
Modeling event data in LookML
London Look & Tell, Nov 19 2014
Modeling event data in Looker
• Snowplow: what is it?
• Snowplow + Looker: why?
• LookML: why is it so important?
Snowplow is an event analytics platform
1. Trackers
2. Collectors
2. Webhooks
3. Enrich
4. Storage
5. Modelling
6. Analytics
Unified log: record of every event that has occurred
Snowplow works great with Looker
Enormous, detailed record of events
Turn that data into insight
So what is actually happening in the data modeling step of the pipeline?
1. Identity stitching: identifying that groups of events belong to the same user
[Figure: a stream of events over time: Page view, Product summary view, Transaction, Product detailed view, Share product, Add product to basket, Viewed ad, …]
1. Generate single record for each user
2. Perform any behavioral segmentation based on that user’s event stream
3. Join that user record with other sources of user data e.g. CRM
Customer record
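The stitching and per-user roll-up described above can be sketched in Python. This is only an illustration under stated assumptions, not the method used in the talk: each event is assumed to be a dict with a cookie-based `domain_userid`, an optional login-based `user_id`, and an `event` name, and the functions `stitch_identities` and `user_records` are hypothetical.

```python
from collections import defaultdict


def stitch_identities(events):
    """Attach a stitched_id to each event.

    Assumption: 'domain_userid' is a cookie id that is always present and
    'user_id' is a login id that is None while the visitor is anonymous.
    """
    # Learn cookie -> login mappings from events where both ids are present.
    cookie_to_user = {}
    for e in events:
        if e.get("user_id"):
            cookie_to_user[e["domain_userid"]] = e["user_id"]

    # Fall back to the cookie id when no login was ever seen for that cookie.
    stitched = []
    for e in events:
        sid = cookie_to_user.get(e["domain_userid"], e["domain_userid"])
        stitched.append({**e, "stitched_id": sid})
    return stitched


def user_records(stitched_events):
    """Build one record per stitched user: event count and event types seen."""
    records = defaultdict(lambda: {"events": 0, "event_types": set()})
    for e in stitched_events:
        record = records[e["stitched_id"]]
        record["events"] += 1
        record["event_types"].add(e["event"])
    return dict(records)


# Illustrative data only: two cookies that both logged in as "alice",
# plus one cookie that stayed anonymous.
events = [
    {"domain_userid": "c1", "user_id": None, "event": "page_view"},
    {"domain_userid": "c1", "user_id": "alice", "event": "transaction"},
    {"domain_userid": "c2", "user_id": "alice", "event": "page_view"},
    {"domain_userid": "c3", "user_id": None, "event": "page_view"},
]
records = user_records(stitch_identities(events))
```

Here the anonymous page view on cookie `c1` is attributed to `alice` because a later event on the same cookie carried her login. The resulting per-user record is the natural place to attach behavioral segments or CRM attributes.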
2. Group micro-events into macro-events
[Figure: micro-events over time (Listed video, Viewed synopsis, Paused video, Paused video, Played video, Finished video) grouped into one macro-event: User A engagement with video Y]
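As a rough sketch of this grouping, the Python below collapses individual video events into one engagement record per user and video. The event shape (`user`, `video`, `type`) and the function `video_engagement` are assumptions made for illustration; they are not taken from the talk.

```python
from collections import defaultdict


def video_engagement(events):
    """Collapse micro-events into one macro-event per (user, video) pair."""
    summary = defaultdict(lambda: {"plays": 0, "pauses": 0, "finished": False})
    for e in events:
        row = summary[(e["user"], e["video"])]
        if e["type"] == "played_video":
            row["plays"] += 1
        elif e["type"] == "paused_video":
            row["pauses"] += 1
        elif e["type"] == "finished_video":
            row["finished"] = True
        # Other micro-events (e.g. listing, synopsis view) still create the
        # row, but do not change any counter in this simplified sketch.
    return dict(summary)


# Illustrative data only, mirroring the event names on the slide.
micro_events = [
    {"user": "A", "video": "Y", "type": "listed_video"},
    {"user": "A", "video": "Y", "type": "viewed_synopsis"},
    {"user": "A", "video": "Y", "type": "played_video"},
    {"user": "A", "video": "Y", "type": "paused_video"},
    {"user": "A", "video": "Y", "type": "paused_video"},
    {"user": "A", "video": "Y", "type": "finished_video"},
]
engagement = video_engagement(micro_events)
```

Six micro-events become a single record for user A and video Y, which is much easier to count, filter, and compare than the raw stream.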
3. Group sequences of events into sessions
[Figure: events over time grouped into consecutive session records]
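One common way to sessionize is to start a new session whenever a user has been inactive for longer than some threshold. The sketch below assumes a 30-minute inactivity gap and events given as `(user, timestamp)` tuples; both the threshold and the `sessionize` function are illustrative assumptions, not something stated in the talk.

```python
from datetime import datetime, timedelta


def sessionize(events, gap=timedelta(minutes=30)):
    """Group (user, timestamp) events into session records.

    A new session starts when the user has no open session yet, or when more
    than `gap` has elapsed since that user's previous event.
    """
    sessions = []
    open_session = {}  # user -> that user's most recent session record
    for user, ts in sorted(events, key=lambda e: (e[0], e[1])):
        current = open_session.get(user)
        if current is None or ts - current["end"] > gap:
            current = {"user": user, "start": ts, "end": ts, "events": 0}
            sessions.append(current)
            open_session[user] = current
        current["end"] = ts
        current["events"] += 1
    return sessions


# Illustrative data only.
raw_events = [
    ("u1", datetime(2014, 11, 19, 10, 0)),
    ("u1", datetime(2014, 11, 19, 10, 10)),
    ("u1", datetime(2014, 11, 19, 11, 0)),  # 50 minutes later: new session
    ("u2", datetime(2014, 11, 19, 10, 5)),
]
session_records = sessionize(raw_events)
```

Because the threshold is a business decision rather than a fact in the data, it is kept as a parameter so it can change without touching the underlying events.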
4. Join Snowplow event data to data on the entities involved in the events
CMS: Articles, Products, Videos, Levels, …
Marketing: Adwords, Display, Social, …
CRM: Customers, …
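A minimal sketch of such a join, assuming events carry a `product_id` and that product attributes come from a CMS export keyed by that id. The field names and the function `join_entities` are hypothetical, and a plain dictionary lookup stands in for a SQL join.

```python
def join_entities(events, products):
    """Left join events to product attributes.

    Every event is kept; events whose product is unknown to the CMS get
    None for the joined attributes.
    """
    joined = []
    for e in events:
        product = products.get(e["product_id"], {})
        joined.append({
            **e,
            "brand": product.get("brand"),
            "category": product.get("category"),
        })
    return joined


# Illustrative data only.
cms_products = {
    "p1": {"brand": "Acme", "category": "Shoes"},
}
product_events = [
    {"user": "alice", "event": "add_to_basket", "product_id": "p1"},
    {"user": "bob", "event": "page_view", "product_id": "p9"},  # not in CMS
]
enriched = join_entities(product_events, cms_products)
```

A left join is used so that events are never dropped just because the entity data is incomplete.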
5. Finally, we define a consistent set of dimensions and measures across the consolidated data set
Dimensions
• Products
• Brands
• Categories
• Articles
• Author
• Days since published
• Categories
• Users
• User cohort
• Behavioral segments
• Demographic segments
• Stage in funnel
• …
Measures
• Users count
• Engagement levels
• Current value
• Forecast lifetime value
• Number of SKUs
• Number of articles
• Number of upsells
• Number of new users
• …
Accessible to the whole business
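To illustrate the idea of defining dimensions and measures once and reusing them everywhere, here is a small Python sketch. It is not LookML and does not reflect Looker's implementation: the `DIMENSIONS` and `MEASURES` registries, the row fields, and the `explore` function are all hypothetical.

```python
from collections import defaultdict

# Dimensions: named ways of slicing a row. Field names are assumptions.
DIMENSIONS = {
    "category": lambda row: row["category"],
    "user_cohort": lambda row: row["signup_month"],
}

# Measures: named aggregations over a group of rows.
MEASURES = {
    "users_count": lambda rows: len({r["user"] for r in rows}),
    "transactions": lambda rows: sum(1 for r in rows if r["event"] == "transaction"),
}


def explore(rows, dimension, measure):
    """Group rows by a named dimension, then apply a named measure per group."""
    groups = defaultdict(list)
    for r in rows:
        groups[DIMENSIONS[dimension](r)].append(r)
    return {key: MEASURES[measure](group) for key, group in groups.items()}


# Illustrative data only.
rows = [
    {"user": "alice", "event": "transaction", "category": "Shoes", "signup_month": "2014-10"},
    {"user": "alice", "event": "page_view", "category": "Shoes", "signup_month": "2014-10"},
    {"user": "bob", "event": "transaction", "category": "Hats", "signup_month": "2014-11"},
    {"user": "carol", "event": "page_view", "category": "Shoes", "signup_month": "2014-11"},
]
users_by_category = explore(rows, "category", "users_count")
transactions_by_cohort = explore(rows, "user_cohort", "transactions")
```

Because every query goes through the same named definitions, two people asking for "users count by category" get the same answer, and changing a definition in one place changes it for everyone.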
In summary
• LookML: application of business logic to our underlying data
• Data from Snowplow represents what has happened
• In LookML we define how we interpret that underlying data, given our own business logic e.g.
• How do we identify users?
• How do we segment users?
• How do we join multiple different data sets into a single source of truth?
• How do we measure engagement?
• We need to do this at the end of the data pipeline
• Businesses evolve: as you get more sophisticated, your LookML model will evolve
• Your data is constantly recast as your model evolves – data never goes stale
• LookML is the best framework we’ve used to manage the data modeling process required on Snowplow event data