Google Analytics and AdWords optimisation with GNU R

Hinnerk Gnutzmann & Piotr Śpiewanowski

flexponsive UG

Booster Conference, 9th March 2016

About flexponsive UG

• e-commerce consulting
• Big Data focus
• Qualitative user testing
• Academic (PhD in economics) and programming background

Contact
• mailto: spiewanowski@flexponsive.net
• web: https://www.flexponsive.net/
• t: @flexponsive

Topic of the day

• Marketing outcomes
  • difficult to define
  • even more difficult to measure

• Before Big Data: “Half the money I spend on advertising is wasted; the trouble is I don’t know which half.” (John Wanamaker, 1838-1922)

• With Big Data: “AdWords brand keyword ads have no measurable short-term benefits” (Blake et al., 2015) - 100% wasted?

• Open Questions:
  • Incrementality Debate: Do AdWords campaigns cannibalise organic traffic?
  • Quality: Are bought visitors good or bad customers?
  • Heterogeneity: Do campaign effects differ between customers?

Agenda

1. Case study Brand Keyword: The Secret of vanishing AdWords ROI

2. What can we do?
  • attribution models
  • controlled experiments
  • GNU R & Analytics: A Dream Team

3. How to do that?
  • Google Core Reporting API & GNU R
  • GA Query Explorer
  • Configuring an experiment in AdWords

4. Analysis with GNU R
  • Data wrangling, sampling, etc.
  • GA replicate metrics
  • Regression Analysis

5. Case Study II: adClicks and rain in Bergen

Example - Skandiabanken

What happened?

• The AdWord is highly relevant to the search
  • Navigational query: the visitor wants to visit Skandiabanken.
  • Customer knows the bank and maybe even has a service in mind

• Result: Probably the best keyword in the account
  • Excellent CTR
  • Very good conversion on-site
  • CPC perhaps not so high

• Any questions?

• Organic result is the same!
• What would you click if there was no ad?


Example - Skandiabanken (without AdWords)

Problem: SEM expenditure is a function not only of the campaign, but also of the behavior and intent of the consumer

The eBay study

• Blake et al. (2015), “Consumer Heterogeneity and Paid Search Effectiveness: A Large Scale Field Experiment”

• Field Experiment: Does AdWords work for eBay?

• Very controversial results:
  1. Conventional methods used to measure the causal (incremental) impact of SEM vastly overstate its effect.
  2. True effectiveness of SEM is small for a well-known company like eBay.
  3. Click substitution: When the brand keyword AdWord disappeared, almost all the users clicked on the organic result.
  4. Informative advertising: AdWords work if a visitor gains additional information through the advertisement. AdWords had almost no effect on revenues from existing customers. They found their own way to eBay!

What can be done? Attribution modelling

But how to know the true channel’s impact?

Attribution modelling

• a way to divide the “credit” for a sale between different marketing channels

• if you don’t know what attribution model you are using, it’s “last click” => you believe the sale only depends on the last ad the customer saw before purchasing

• that’s probably not true: perhaps the customer had been following the company blog for a long time, heard friends talk off-line about the product, or saw many banner ads on different sites before making a purchase

• problem: no good way to decide how to “attribute” between different marketing channels

• results depend a lot on assumptions, which you cannot test

• similar problem: if you advertise your brick-and-mortar store on TV and on radio, what drives the customer to your store?
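To make the point about assumptions concrete, here is a minimal sketch that splits the same revenue under two attribution rules. The touchpoint paths and revenues are invented for illustration, and the "linear" rule (equal credit to every touchpoint) is an assumption added here for contrast with "last click"; it is not taken from the slides.

```r
# Hypothetical touchpoint paths of three converting customers (illustrative only)
paths <- list(
  c("blog", "banner", "adwords"),
  c("adwords"),
  c("organic", "adwords", "organic")
)
revenue <- c(100, 50, 80)          # made-up revenue per customer
channels <- sort(unique(unlist(paths)))

# Last click: the final touchpoint receives all the credit
last_click <- setNames(numeric(length(channels)), channels)
# Linear (assumed alternative): every touchpoint gets an equal share
linear <- last_click

for (i in seq_along(paths)) {
  p <- paths[[i]]
  last <- p[length(p)]
  last_click[last] <- last_click[last] + revenue[i]
  share <- revenue[i] / length(p)
  for (ch in p) linear[ch] <- linear[ch] + share
}

attribution <- data.frame(channel = channels,
                          last_click = unname(last_click),
                          linear = unname(linear))
print(attribution)
```

Both rules distribute the same total revenue, yet the credit per channel differs, and nothing in the data says which split is right.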

What can be done? Controlled experiments

• Randomly select a treatment and a control group, for example:
  • Per user: A / B Testing
  • By Geographical Region

• Assumption: Without the experiment, both groups behave similarly
• Evaluation: difference in differences
  • difference in the control group: Noise
  • difference in the treatment group: Effect + Noise

• Metrics: ∆TREATED − ∆UNTREATED

• Advantages of a geographical experiment:
  • no multi-device tracking necessary
  • easy integration with external data

• Caveat: Geographical groups really need to be comparable (e.g. commuters)
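The metric ∆TREATED − ∆UNTREATED can be computed by hand from four group averages. The sketch below uses invented numbers purely to show the arithmetic; the helper `cell_mean` is a hypothetical name.

```r
# Hypothetical average daily revenues (made-up numbers for illustration)
revenue <- data.frame(
  group  = c("treated", "treated", "untreated", "untreated"),
  period = c("before", "after", "before", "after"),
  mean_revenue = c(100, 52, 110, 60)
)

# Helper: look up one cell of the 2 x 2 table
cell_mean <- function(g, p) {
  revenue$mean_revenue[revenue$group == g & revenue$period == p]
}

delta_treated   <- cell_mean("treated", "after")   - cell_mean("treated", "before")    # effect + noise
delta_untreated <- cell_mean("untreated", "after") - cell_mean("untreated", "before")  # noise only

did <- delta_treated - delta_untreated
did
```

In this toy table both groups fall by roughly the same amount, so most of the drop in the treated group is noise common to both groups and only the small remainder is attributed to the treatment.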

Difference in Differences

GNU R and Google Analytics: Dream Team

1. Selection of the treated and control group
  • Install R, generate a sample with GNU R
  • Export: Copy & paste to AdWords

2. Data collection
  • Google Analytics already configured

3. Aggregation and query
  • In the cloud: Google Analytics Query Explorer
  • Integration with RGoogleAnalytics

4. Evaluation: Estimation and Visualization
  • All necessary functions available as packages in R

About R

• Programming language and software environment for statistical computing and graphics, a dialect of S
• Quite lean; functionality is divided into modular packages
• Graphics better than in most stat packages.
• Useful for interactive work, but contains a powerful programming language for developing new tools (user -> programmer)
• Very active and vibrant user community; R-help and R-devel mailing lists and Stack Overflow
• Markdown packages for reproducible research and automated reporting
• It’s free!

Install R

• Open Source for Windows / Mac / Linux etc.
  • GNU R: https://www.r-project.org/
  • RStudio IDE: http://www.rstudio.com

• Cheat Sheets to help!
  • R Reference Card
  • RStudio cheatsheets

• Package management via CRAN

install.packages('RGoogleAnalytics',repos = "http://cran.no.r-project.org");

install.packages('plm',repos = "http://cran.no.r-project.org");

install.packages('ggplot2',repos = "http://cran.no.r-project.org");

Selecting Treatment Group

download.file('https://goo.gl/qVgiYp', destfile = 'geoid.csv');

# Target.Type 'County' corresponds to the fylke level; kommune level selection is also possible
regions <- read.csv('geoid.csv');
norway <- regions[which(regions$Country.Code == 'NO'
                        & regions$Target.Type == 'County'
                        & regions$Status == 'Active'), ];

set.seed(1);

norway$isTreatment <- sample(c(0, 1), nrow(norway), replace = TRUE)

write.csv(norway, file = 'norway.csv');

# paste into AdWords
writeLines(as.character(norway[which(norway$isTreatment == 1), ]$Canonical.Name),
           'treatment.csv');
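Drawing 0/1 independently for each region, as above, leaves the size of the treatment group to chance. If a fixed group size is preferred, one possible variant is to draw the treated rows without replacement. This is a sketch under assumptions: the data frame below is a stand-in for `norway`, and the group size of 10 is chosen here for illustration.

```r
# Stand-in for the rows of `norway` (hypothetical region names)
regions_demo <- data.frame(Name = paste0("Region", 1:19))

set.seed(1)        # reproducible assignment
n_treated <- 10    # assumed size of the treatment group

# Draw row numbers without replacement, then flag those rows as treated
treated_rows <- sample(nrow(regions_demo), n_treated)
regions_demo$isTreatment <- as.integer(seq_len(nrow(regions_demo)) %in% treated_rows)

table(regions_demo$isTreatment)
```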

Configuring Google AdWords I

Configuring Google AdWords II

Configuring Google AdWords III

Configuring Google AdWords IV

Configuring Google AdWords V

Done!!

. . . for the results

Google Analytics Core Reporting API & R

1. Create an “app”
  • Google Developers page
  • Enable Google Analytics API
  • Create Credentials: OAuth client ID, Application type: Other
  • Result: Client ID and Client Secret

2. Find your GA Profile ID

Setting up GNU R

client.id <- 'xxxxxxxxxxxxxxx.apps.googleusercontent.com';
client.secret <- 'xxxxxxxxxxxxxxx';
analyticsProfileId <- '111111111';

# redirects to Google; paste the authorisation code back into R
require(RGoogleAnalytics);
token <- Auth(client.id, client.secret)

# save
save(token, file = 'gatoken.txt');

# next time: load() restores the saved object under its original name, `token`
load("./gatoken.txt")
ValidateToken(token);

Create a query

query.list <- Init(start.date = "2015-10-01",
                   end.date = "2016-02-29",
                   dimensions = "ga:region,ga:date,ga:medium",
                   metrics = "ga:sessions,ga:transactionRevenue",
                   filter = "ga:country==Norway",
                   max.results = 50000,
                   sort = "-ga:date,ga:region",
                   table.id = paste0("ga:", analyticsProfileId));

ga.query <- QueryBuilder(query.list);
ga.data <- GetReportData(ga.query, token);

Real Data Example - www.flexponsive.net

kable(head(ga.data))

region                 date      medium    country    sessions  transactionRevenue
Brussels               20160229  referral  Belgium    1         0
State of Parana        20160229  referral  Brazil     1         0
Baden-Wurttemberg      20160229  organic   Germany    1         0
Baden-Wurttemberg      20160229  referral  Germany    1         0
Rhineland-Palatinate   20160229  referral  Germany    1         0
(not set)              20160229  (none)    Hong Kong  5         0

Tip: Query Explorer

Tip2: Dimensions & Metrics Explorer

Tip3: Avoiding sampling

> ga.data <- GetReportData(ga.query, token)
Status of Query:
The API returned 1393 results
The query response contains sampled data. It is based on
XX.XX % of your visits. You can split the query day-wise
in order to reduce the effect of sampling.

Set split_daywise = T in the GetReportData function
Note that split_daywise = T will automatically ....

• “Sampling occurs automatically when more than 500,000 sessions (25M for Premium) are collected for a report, allowing Google Analytics to generate reports more quickly for those large data sets.”

Data Integration

• Wide Format: one row for each region and time
• Long Format: one row per region / time / dimension (EAV)

require(reshape2);

## Loading required package: reshape2

w <- reshape(ga.data, timevar = 'medium',
             idvar = c('region', 'date'), direction = 'wide');
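A small self-contained illustration of the long-to-wide step, using invented rows in the shape of the query result. Note that `reshape()` is the base R function from the stats package, so this sketch runs without any extra package.

```r
# Toy long-format data mimicking the GA query result (values are made up)
ga.long <- data.frame(
  region   = c("Hordaland", "Hordaland", "Sor-Trondelag", "Sor-Trondelag"),
  date     = c("20160229", "20160229", "20160229", "20160229"),
  medium   = c("cpc", "organic", "cpc", "organic"),
  sessions = c(10, 25, 7, 30),
  transactionRevenue = c(120, 300, 90, 410)
)

# One row per region and date; one column per metric and medium
w.toy <- reshape(ga.long, timevar = "medium",
                 idvar = c("region", "date"), direction = "wide")
names(w.toy)
```

Each metric is spread into one column per medium, named with the medium as suffix (for example `sessions.cpc`).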

Data Integration: Almost finished

• Merge: Who is in which group?

ds <- merge(w, norway[, c('Name', 'isTreatment')],
            by.x = 'region', by.y = 'Name', all.x = TRUE)

• Data set is ready!
• Comfortable DSL for data manipulation
• Use packages to minimize code
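Because the merge uses `all.x`, rows whose region name has no match in the group table are kept with `isTreatment` set to `NA`. A quick check for such rows can be sketched as follows; the toy data frames are invented, and the possibility that region names differ between the two sources is an assumption worth verifying on real data.

```r
# Toy stand-ins for the wide GA data and the group assignment (made-up values)
w.toy <- data.frame(region = c("Hordaland", "Sor-Trondelag", "(not set)"),
                    sessions.cpc = c(10, 7, 1),
                    stringsAsFactors = FALSE)
groups <- data.frame(Name = c("Hordaland", "Sor-Trondelag"),
                     isTreatment = c(0, 1),
                     stringsAsFactors = FALSE)

ds.toy <- merge(w.toy, groups, by.x = "region", by.y = "Name", all.x = TRUE)

# Regions that could not be assigned to a group
unmatched <- ds.toy$region[is.na(ds.toy$isTreatment)]
unmatched
```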

Case Study: Wanderlust

• an app “developed” for this presentation
• mysterious weekend getaway and short holidays booking engine
• supports inventory management of hotels and airlines
• seasonal demand fluctuations

Evaluation

• Simulated data for illustration: 3 summer months
• 1st August: experiment starts in 10 random provinces (fylke) - AdWords stopped
• 1st August: start of school, search volume falls everywhere by 50%
• Scenario: 100% of visitors click organically when the AdWord is invisible
• Randomization has decided:
  • Sor-Trondelag (Trondheim): in the treatment group - no AdWords from 1st August
  • Hordaland (Bergen): in the control group - AdWords continue

Revenues in Sor-Trondelag (treatment)

[Figure: revenues by date, 1st June to 1st September]

Revenues in Hordaland (control)

Revenues in both Fylke

[Figure: revenues by date; legend (region): Hordaland, Sor-Trondelag]

ROI Calculation - standard regression

require(stargazer);
out <- lm(transactionRevenue.total ~ isTreatment.cpc,
          data = sd.w)
stargazer(out, header = FALSE, type = 'latex')

Table 2

                       Dependent variable:
                       transactionRevenue.total

isTreatment.cpc        -48.358*** (1.350)
Constant               111.350*** (0.569)

Observations           1,748
R2                     0.424
Adjusted R2            0.423
Residual Std. Error    21.560 (df = 1746)
F Statistic            1,282.996*** (df = 1; 1746)

Note: *p<0.1; **p<0.05; ***p<0.01

ROI Calculation - standard regression

• Standard OLS regression with a binary variable == comparing means
• But not the right ones. In this case:

Revenues = β0 + β1 ∗ treatment

• The treatment variable takes value 1 for the treatment group after the AdWords were stopped in Sor-Trondelag, otherwise 0

• As a result, β1 represents the difference between the average revenues in Sor-Trondelag in August and the average over all other observations, i.e. Hordaland in all months and Sor-Trondelag in June and July

• That’s clearly not what we are looking for!!
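The claim that OLS on a single binary regressor simply compares two means can be checked on simulated numbers. The data below are invented for illustration and are unrelated to the data behind Table 2.

```r
set.seed(42)

# Made-up data: 60 observations with treatment = 0 and 30 with treatment = 1
toy <- data.frame(treatment = rep(c(0, 1), times = c(60, 30)))
toy$revenue <- 110 - 48 * toy$treatment + rnorm(nrow(toy), sd = 5)

# Slope of the regression on the binary variable
fit <- lm(revenue ~ treatment, data = toy)
beta1 <- unname(coef(fit)["treatment"])

# Plain difference of the two group means
mean_diff <- mean(toy$revenue[toy$treatment == 1]) -
  mean(toy$revenue[toy$treatment == 0])

all.equal(beta1, mean_diff)
```

The slope and the difference in means agree up to numerical precision, so the interpretation of β1 depends entirely on which observations fall into each of the two groups.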

Difference in Differences

ROI Calculation - Differences in Differences

require(plm)
out <- plm(transactionRevenue.total ~ isTreatment.cpc,
           data = sd.w, index = c("region", "date"), model = "between")
stargazer(out, header = FALSE, type = 'latex')

Table 3

                       Dependent variable:
                       transactionRevenue.total

isTreatment.cpc        0.189 (0.254)
Constant               102.741*** (0.062)

Observations           19
R2                     0.032
Adjusted R2            0.028
F Statistic            0.556 (df = 1; 17)

Note: *p<0.1; **p<0.05; ***p<0.01

ROI Calculation - Difference in Differences

• The Difference in Differences estimator, using a fixed effects model with binary variables, allows us to calculate the true effect of the treatment

• Econometrically we estimate this equation:

Revenues = β0 + β1 ∗ treatment + β2 ∗ before + γ ∗ fylke

• fylke is a matrix of binary variables, one for each district
• before is a binary variable that takes value 0 in the period in which AdWords were running in all districts and value 1 in the period in which the experiment was running in some regions
• treatment takes value 1 for the treatment group in the period in which the experiment was running, i.e. after the AdWords were stopped in Sor-Trondelag, otherwise 0

• The estimation result reveals the true impact of AdWords on revenues in this data set
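The equation can also be estimated with base R's `lm()` and `factor()` dummies. The sketch below is an illustration on freshly simulated data, not the `plm` call or the data set used for the tables: the four regions, their revenue levels, the year, the additive seasonal drop of 50 from 1st August and the true AdWords effect of zero are all assumptions made for this example.

```r
set.seed(1)
dates <- seq(as.Date("2015-06-01"), as.Date("2015-08-31"), by = "day")
fylke <- c("Hordaland", "Rogaland", "Oslo", "Sor-Trondelag")
treated_fylke <- c("Oslo", "Sor-Trondelag")   # hypothetical assignment

sim <- expand.grid(date = dates, fylke = fylke, stringsAsFactors = FALSE)
# Named as in the equation: 1 once the experiment is running
sim$before <- as.integer(sim$date >= as.Date("2015-08-01"))
sim$treatment <- as.integer(sim$before == 1 & sim$fylke %in% treated_fylke)

# Assumed revenue levels per region; same seasonal drop everywhere; no AdWords effect
base_level <- c(Hordaland = 120, Rogaland = 90, Oslo = 150, "Sor-Trondelag" = 100)
sim$revenue <- unname(base_level[sim$fylke]) - 50 * sim$before +
  rnorm(nrow(sim), sd = 5)

# Naive regression versus difference in differences with region dummies
naive <- lm(revenue ~ treatment, data = sim)
did <- lm(revenue ~ treatment + before + factor(fylke), data = sim)

naive_beta1 <- unname(coef(naive)["treatment"])
did_beta1 <- unname(coef(did)["treatment"])
round(c(naive = naive_beta1, did = did_beta1), 2)
```

The naive coefficient absorbs the seasonal drop and looks like a large loss, while the coefficient from the regression with `before` and region dummies stays close to the simulated effect of zero.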

Discussion

• The missing counterfactual: we do not know what else could be happening. Help: experiment

• Challenge: Big Data without Big Code. Google Analytics & GNU R: a very rich toolbox

• Result: Differences in Differences can work, but note the assumptions

Table of Contents

Brand Keywords
  The eBay Study
  Calculating the true ROI

Brand keywords with R
  Configuring Experiment
  Using Google Analytics API

AdWords experiment: an example
  Regression Results
