Snowplow is at the core of everything we do

Post on 16-Apr-2017

Snowplow drives everything we do

What and why?

Bauer Media

Digital and print publisher

Family-owned German company

116 sites across Australia and New Zealand

Tag management across all sites

Just start collecting

Snowplow data collection in 2014

We didn’t really have a use case

Stuff we record

Page views

Metadata around content

User logins

Email click-throughs

Ad impressions

Use cases started showing up

Cross-site integrated reporting

Ad hoc tricky analysis

Sanity checking industry audience reporting

Stalking individual users

Audience overlaps
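As a rough illustration of the audience-overlap use case, the sketch below counts users shared between each pair of sites. It assumes the events have already been reduced to (site, user ID) pairs; the function name and the sample site names are hypothetical, not taken from the talk.

```python
from itertools import combinations


def audience_overlaps(visits):
    """Count shared users for every pair of sites.

    visits: iterable of (site, user_id) pairs (hypothetical input shape).
    Returns {(site_a, site_b): number_of_shared_users}.
    """
    users_by_site = {}
    for site, user_id in visits:
        users_by_site.setdefault(site, set()).add(user_id)

    overlaps = {}
    # Sort the site names so each pair appears once, in a stable order.
    for site_a, site_b in combinations(sorted(users_by_site), 2):
        shared = users_by_site[site_a] & users_by_site[site_b]
        overlaps[(site_a, site_b)] = len(shared)
    return overlaps
```

With 116 sites this is about 6,700 pairs, which is small enough to compute in memory once the per-site user sets are built.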

User behaviour

Ad impressions

Content metadata

Trending service

Recommendations

Dashboards

Ad hoc analysis

Some things you can’t do in GA

Tag-based reporting

Accurate reporting of in-app Facebook using user-agent contains FBAN
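The FBAN check can be sketched as a plain substring test on the recorded user-agent string. This is a minimal sketch of the rule as stated on the slide; the function names are hypothetical.

```python
def is_facebook_in_app(useragent):
    """True when the user-agent string contains the FBAN marker."""
    # Treat a missing user-agent as "not in-app" rather than failing.
    return "FBAN" in (useragent or "")


def in_app_share(useragents):
    """Fraction of the given user-agent strings flagged as Facebook in-app."""
    useragents = list(useragents)
    if not useragents:
        return 0.0
    flagged = sum(1 for ua in useragents if is_facebook_in_app(ua))
    return flagged / len(useragents)
```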

We’re using Snowplow 0.9.2 from 2014-04-29!

It just works

We’ve been busy building other stuff

But...

Page pings are broken: no time spent or scroll depth

(Out-of-the-box) browser categorisation is terrible

Hourly batches are a bit higher latency than we’d like

No context shredding, but JSON queries are performant enough
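Without shredding, a context stays in the events table as a JSON string and fields are pulled out at query time. In Redshift that kind of lookup would typically use a JSON function such as JSON_EXTRACT_PATH_TEXT; the talk does not show the actual queries. The sketch below mirrors the same idea in Python. The function name and the sample keys are hypothetical.

```python
import json


def extract_path(json_text, *path, default=None):
    """Walk a key path through a JSON string, returning default on any miss."""
    try:
        node = json.loads(json_text)
    except (TypeError, ValueError):
        # Missing or malformed JSON: behave like an empty lookup.
        return default
    for key in path:
        if isinstance(node, dict) and key in node:
            node = node[key]
        else:
            return default
    return node
```

For example, `extract_path(context, "content", "section")` would return the section recorded for a page, assuming the context JSON carries a `content.section` field.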

runSnowPlow.sh

Web page (JavaScript in page creates image beacon)

S3

Cloudfront

SnowCannon (Node app in Elastic Beanstalk)

Redirects to

Writes logs to

ETL (Elastic MapReduce)

S3

events (Redshift)

events_temp (Redshift)

x_events (Redshift)
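The image beacon at the start of the pipeline is an image request whose query string carries the event. A minimal sketch of building such a URL follows. The parameter names (`e`, `url`, `page`, `aid`, `p`) follow the Snowplow tracker protocol as commonly documented; the `/i` path, the collector host and the function name are assumptions for illustration only.

```python
from urllib.parse import urlencode


def beacon_url(collector_host, page_url, page_title, app_id):
    """Build a page-view beacon URL for a (hypothetical) collector host."""
    params = {
        "e": "pv",           # event type: page view
        "url": page_url,     # page being viewed
        "page": page_title,  # page title
        "aid": app_id,       # application / site identifier
        "p": "web",          # platform
    }
    # urlencode percent-escapes the values so the URL stays valid.
    return "https://" + collector_host + "/i?" + urlencode(params)
```

In the browser the JavaScript tracker does the equivalent and assigns the URL to an image element, so the collector logs the request and its query string.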

Tips

Redshift can get very expensive very quickly

Decent dashboarding platforms are rare

And plenty of crap ones are overpriced

Just tip everything in and worry about what you’ll do later

What’s next?

Future plans

Upgrade ETL to real-time: probably our own solution

Time spent and scroll depth

Shredding?
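Once page pings work, time spent and scroll depth could be derived roughly as sketched below. It assumes each ping arrives at a fixed heartbeat interval and carries the furthest vertical pixel reached plus the document height; the dictionary keys, the 10-second default and the function name are all hypothetical.

```python
def engagement(pings, heartbeat_seconds=10):
    """Summarise pings into time spent and max scroll depth per page view.

    pings: iterable of dicts with hypothetical keys
           "page_view", "y_max" and "doc_height".
    Returns {page_view: {"seconds": int, "scroll_depth": float in 0..1}}.
    """
    summary = {}
    for ping in pings:
        entry = summary.setdefault(
            ping["page_view"], {"seconds": 0, "scroll_depth": 0.0}
        )
        # Each ping stands for one heartbeat interval of attention.
        entry["seconds"] += heartbeat_seconds
        if ping["doc_height"] > 0:
            depth = min(1.0, ping["y_max"] / ping["doc_height"])
            entry["scroll_depth"] = max(entry["scroll_depth"], depth)
    return summary
```

Counting pings only approximates time spent: anything after the last ping is not measured, so the result reads as a lower bound.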
