Combining Choices and Response Times in the Field:
a Drift-Diffusion Model of Mobile Advertisements∗
Khai Chiong
University of Texas at Dallas
Matthew Shum
California Institute of Technology
Ryan Webb
University of Toronto
Richard Chen
Happy Elements, Inc.
Abstract
We utilize the class of drift-diffusion models — originally developed in psychology and neuroeconomics to jointly explain subjects' choices and response times in laboratory experiments — to model users' responses to video advertisements on mobile devices. The combination of response time with choice data allows separate identification of the diffusion processes characterizing users when the ad is playing, as well as when they subsequently decide whether to click-through on the ad. Our estimates identify two segments of users that also exhibit distinct out-of-sample behaviour, including app engagement and in-app spending. Counterfactual simulations utilizing our estimates predict that ads with a 2-second duration yield roughly the same revenue as forcing a user to view the entire thirty-second ad, thus rationalizing the practice of some platforms (e.g. YouTube) where users can skip an ad after watching for a few seconds.
Keywords: Mobile advertising, Attention, Drift-diffusion model, Response times, Video advertisements, Skippable ads
JEL codes: L81, M37, D83, D87, C15, C22
1 Introduction
The notion of revealed preference — that agents’ preferences can be recovered from the
choices that they make — underlies large swaths of the empirical literature in economics. In

∗ First draft: December 2018; this draft: October 11, 2019. Comments welcome.
We thank Colin Camerer, John Clithero, Cary Frydman, Lawrence Jin, Shijie Lu, Rosemarie Nagel, Whitney Newey,
David Reiley, Xiaoxia Shi, Jakub Steiner, Tomasz Strzalecki, Pengfei Sui and seminar attendees at Berkeley (Haas),
Caltech, Chinese University (HK), Duke (Fuqua), HKUST, HK Baptist University, Ohio State, Toronto (Rotman), 2019
ESA (Los Angeles), 2019 Workshop on the Economics of Advertising and Marketing (Porto), 2019 DSE Conference
on Structural Dynamic Models (Chicago), 2018 California Econometrics Conference (Irvine) for helpful comments.
many choice situations, however, the amount of time it takes decision-makers to make a choice
(their response time) is also sharply informative about their preferences (Frydman & Krajbich,
2017; Konovalov & Krajbich, 2019; Alos-Ferrer, Fehr & Netzer, 2018).1 A swift food choice at a
restaurant may indicate strong preferences for the chosen dish relative to the other available op-
tions; the time an online shopper spends deciding between different clothing items may suggest
each item is equally stylish in the shopper’s mind. In their seminal formulation of the Random
Utility model, Block & Marschak (1960) describe:
“The shorter the delay, the larger is the difference between certain values said to be attached by the subject to two alternative responses. A very long delay reveals a state of (almost) indifference, the ‘conflict-situation’: Hamlet took a very long time to decide whether to kill his uncle."
In this paper, we consider a structural model of choice which incorporates not only decision-
makers’ eventual choices, but also their response times. To understand why decisions take
time, recent theoretical work has examined models of sequential information sampling (Wood-
ford, 2014; Fudenberg, Strack & Strzalecki, 2018; Gossner, Steiner & Stewart, 2018; Hebert &
Woodford, 2017; Morris & Strack, 2019; Fudenberg, Newey, Strack & Strzalecki, 2019). In this
setting, the utilities of options are unknown and must be learned by the (costly) sampling of noisy
information. Building on research in psychology and neuroeconomics, this approach models the
decision process as an accumulation and comparison of this noisy information over time. Here,
we consider the drift-diffusion model (hereafter DDM) postulated for modeling agents’ choices
in settings where decisions are made on the order of seconds (Fehr & Rangel, 2011). In the
DDM, decision-makers accumulate information about the match values, or utilities, of available
options in a choice problem. The time it takes a decision-maker to accumulate enough infor-
mation or evidence favoring a given option is modeled directly; therefore both the choice and
response time are endogenously related as part of a dynamic choice process. The parame-
ters governing this information accumulation process can then be estimated from choice and
response time data.
The DDM and its various extensions have been well-tested in laboratory experiments (see
Clithero, 2018b, for a review). However, the applicability of this model to real-world, consequential decisions has remained unexamined. Our contribution is to apply the sequential-sampling

1 Response times are referred to by a variety of names, including reaction times, decision times, latencies (in psychology), and dwell times (in tech companies).
framework in a field setting: we adapt the drift-diffusion model to analyze responses to ad-
vertisements on mobile devices, a decision scenario which in recent years has become the
dominant component of internet advertising.2 An important vehicle of mobile advertising is the video trailer, in which users of an app are shown a short video advertisement before continuing to use the app. After the ad video finishes, users are prompted to make a binary choice
to either “click-through” to the app-store and download the advertised app, or “click-back” and
return to the app which they were using (the “originating” app). This field setting, which directs
users’ attention towards the advertised item as the video plays, closely mimics the types of de-
cision tasks previously used to calibrate the DDM and measure attentional manipulations in the
laboratory (Mormann, Koch & Rangel, 2012; Gwinn, Leber & Krajbich, 2019).3 We can thus use
a dataset on users’ decisions and their corresponding response times to estimate a structural
econometric model based on the DDM.
Our model is a “two-stage" extension of the drift-diffusion model motivated by our mobile ad-
vertising setting. In the initial ad exposure stage, the user is exposed to the ad and cannot make
any decision while the video ad is being played (the ad is “non-skippable", in industry parlance:
it must be played in its entirety). In the second decision stage, after the video ad finishes, users
are presented with a visual end-screen prompting them to either click-through or click-back. By
combining response times with choice data, we can separately identify the parameters of the
information accumulation process during both the ad exposure stage as well as the decision
stage, and examine how users’ behaviour differs in response to the advertisement.
We find that both as the ad plays and after it is finished, users’ diffusion processes drift, on
average, away from the advertised app, implying that an average user perceives a negative utility difference for the advertised app vis-à-vis the originating app. However, these average effects mask a great deal of user-level heterogeneity. For many users, the estimated diffusion process drifts toward the advertised app as the video ad plays and, indeed, a key
finding is that users can be segmented into two groups depending on whether their estimated
diffusion process drifts toward or away from the decision threshold for the advertised app during the ad exposure stage. These two segments of users are behaviorally distinct, and exhibit striking differences in out-of-sample behavioral metrics including app engagement and in-app spending.

2 In 2018, mobile advertising in the United States accounted for roughly 75% of all spending on digital ads, eclipsing TV advertising for the first time (https://www.forbes.com/sites/johnkoetsier/2018/02/23/mobile-advertising-will-drive-75-of-all-digital-ad-spend-in-2018-heres-whats-changing/).
3 Indeed, the DDM has been previously applied to model primates watching videos, specifically the accumulation of visual evidence in a motion discrimination task (Roitman & Shadlen, 2002). In this study, the accumulation of neural activity in the monkey parietal cortex varied with the strength of the visual stimuli.
While many empirical studies of advertising have focused on measuring the effects of ad-
vertisements on consumer choices, our structural model of the decision process permits us to
go further and assess empirically the optimal design and format of advertisements. We use our
estimates to address an important design problem for mobile advertising platforms: whether
advertisers should make their ads “skippable”: that is, to allow ad viewers to skip part or all of
the ad and make a choice before the ad finishes. We find that the click-through rates from ex-
posing viewers to the entire 30-second ad during the ad exposure stage are virtually the same
as cutting the ad after its initial five seconds. This rationalizes the practice of some websites,
notably YouTube, where users can skip an ad after viewing the first few seconds.
However, these effects are also very heterogeneous across the two user segments identified in our estimates: one segment (the users whose process drifts away from the advertised app) is more willing to click on short ads, while for the other segment the opposite holds. Such heterogeneity suggests
that ad format, just like ad content, could be targeted and individualized according to user
demographics; indeed, ad revenue would be even higher if the skippability of the ad could be
targeted to the two user segments differently. Our results suggest that mobile advertisements
have high-frequency effects which vary substantially both across users and over time. The use
of response times in conjunction with choice data is important for identifying these effects.
More broadly, the DDM-based framework used in this paper provides a way of integrating
endogenous response time data organically into an empirical analysis. As more “clickstream”
datasets are becoming available to researchers in economics and marketing — in which times-
tamps are recorded for all the choices that agents are observed to make — econometric meth-
ods and models for utilizing this additional data source are crucial.4 As far as we are aware,
this paper is the first attempt to apply a model in the drift-diffusion paradigm to field data, which
demonstrates the value of applying this framework beyond the controlled laboratory settings in
which it has previously been used.5
4 For an extended discussion of the endogeneity of response times in discrete choice settings, see Webb (2019).
5 A handful of papers have estimated the structural parameters of the DDM using data from laboratory experiments, including Frydman & Nave (2016) and Clithero (2018a); see Clithero (2018b) for a review.
The existing literature in marketing contains a small set of papers incorporating both choices
and response times in a (non-DDM) structural choice model. An early paper of this type is Otter,
Allenby & Van Zandt (2008), who incorporated response times in a conjoint analysis. More
recently, Seiler & Pinna (2017) and Ursu, Wang & Chintagunta (2018) incorporate response
time into a structural model of consumer search.6
2 The Drift-Diffusion Model (DDM) Paradigm
The drift-diffusion model (DDM) originated in psychology as a model of subjects’ choices and
reaction times for decisions typically made within a matter of seconds.7 The DDM and its
various extensions, often referred to as “sequential sampling models", have been successfully
verified and calibrated in a wide variety of laboratory experiments and mapped to neural ac-
tivity in the brain.8 More recently, the modeling framework has been applied in the context of
economic choices.9
Consider a binary choice situation, where the difference in the utilities of the two options de-
termines an optimizing agent’s choice. In a sequential sampling model, the utilities are assumed
to be unknown to the agent, who only learns about them gradually over time. Mathematically,
the perceived utility difference between option 1 and option 0 can be modeled as a Gaussian
process Z(t) which evolves according to the stochastic differential equation
dZ(t) = (μ + λ Z(t)) dt + σ dW(t),   t > 0,   Z(0) = 0.   (1)
This stochastic process can be interpreted as a process of “stochastic evidence accumulation”
in favor of one or the other alternative. Agents do not observe the drift term μ, which represents the true utility difference between alternatives; rather, they only observe the noisy signal process Z(t), where the noise follows a Wiener process W(t) with standard deviation σ.
Stochastic evidence accumulation is a natural interpretation for settings where information
about utilities arrives sequentially and must be integrated by the decision-maker, as in a typical psychophysical experiment or the sequence of video images in an advertisement. Recent work also suggests that a stochastic accumulation process applies to decisions in which evidence is sampled (recalled) directly from the decision-maker’s memory, rather than simply from sensory evidence (Shadlen & Shohamy, 2016).

6 Relatedly, Krajbich, Bartling, Hare & Fehr (2015) critique the psychology literature which has used only reaction time data to infer agents’ choice rules, without taking subjects’ preferences or valuations of the choices into account.
7 We follow the presentation in Webb (2019). Book-length treatments of this modeling framework include Luce (1991) and Link (1992). Detailed mathematical derivations of accumulation models are presented in Smith (2000). Also see Busemeyer & Townsend (1992), Ratcliff & McKoon (2008), Krajbich, Armel & Rangel (2010).
8 Gold & Shadlen (2002), Hare, Schultz, Camerer, O’Doherty & Rangel (2011).
9 Krajbich et al. (2010), Frydman & Nave (2016), Clithero (2018a), Fudenberg et al. (2018).
The λ parameter captures serial dependence in the diffusion process, accommodating the “carryover” effects familiar from the empirical literature on advertising. In the DDM literature, λ is called a “leakage" parameter (e.g. Usher & McClelland, 2001). When λ = 0, the stochastic process (1) is a continuous-time random walk with drift in which each sample is independent and weighted equally. When λ ≠ 0, the information accumulated prior to t can affect the information perceived by the user at time t. As we will see below,
the inclusion of these parameters is important for explaining observed patterns in the response
time data.
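To fix ideas, the accumulation process in Eq. (1) can be simulated with a simple Euler–Maruyama scheme. The sketch below uses illustrative parameter values (not estimates from this paper) and shows the role of leakage: with λ < 0 the dispersion of Z(t) stabilizes, while with λ = 0 it grows linearly in t.

```python
import numpy as np

def simulate_z(mu, lam, sigma, T=30.0, dt=0.25, n_paths=20000, seed=0):
    """Euler-Maruyama simulation of dZ = (mu + lam*Z) dt + sigma dW, Z(0) = 0;
    returns the terminal positions Z(T) across paths."""
    rng = np.random.default_rng(seed)
    z = np.zeros(n_paths)
    for _ in range(int(T / dt)):
        z += (mu + lam * z) * dt + sigma * rng.normal(0.0, np.sqrt(dt), n_paths)
    return z

# With lam < 0 ("leakage"), the cross-path variance of Z(T) stabilizes near
# sigma^2 / (2|lam|); with lam = 0 it grows like sigma^2 * T.
z_leak = simulate_z(mu=0.1, lam=-0.5, sigma=1.0)
z_rw = simulate_z(mu=0.1, lam=0.0, sigma=1.0)
```

Comparing `z_leak.var()` with `z_rw.var()` makes the stabilizing effect of negative leakage visible directly.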
Figure 1: A Sample Path of Z(t) in DDM. In this example, the user chooses 1, with a response time of t∗.
[Figure: a sample path of Z(t) evolving over time between the barriers −B and B, first hitting B at t∗.]
Paired with this utility difference process is a decision rule. The most common decision rule
is the symmetric “first passage” rule: for some threshold or barrier B > 0, the agent chooses
choice 1 once Z(t) > B and chooses choice 0 once Z(t) < −B. The response time is the first
passage time to either B or −B. See Figure 1. For the purposes of specification and estimation,
we will follow most applications of the DDM and take the threshold B to be exogenously given.10
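The first-passage rule can be sketched as a minimal simulation: run the diffusion of Eq. (1) until it crosses one of the two barriers, recording the choice and the crossing time. Parameter values below are illustrative placeholders, not the paper's estimates.

```python
import numpy as np

def first_passage(mu, lam, sigma, B, dt=0.01, t_max=60.0, seed=1):
    """Simulate one path of Eq. (1) until Z first crosses +B (choice 1) or -B
    (choice 0); returns (choice, response_time). Returns (None, t_max) in the
    (rare) event that neither barrier is reached by t_max."""
    rng = np.random.default_rng(seed)
    z, t = 0.0, 0.0
    while t < t_max:
        z += (mu + lam * z) * dt + sigma * rng.normal(0.0, np.sqrt(dt))
        t += dt
        if z >= B:
            return 1, t
        if z <= -B:
            return 0, t
    return None, t_max

choice, rt = first_passage(mu=0.5, lam=0.0, sigma=1.0, B=1.0)
```

Repeating this over many seeds traces out the joint distribution of choices and response times implied by a given (μ, λ, σ, B).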
10 Optimal decision rules in the no-leakage (λ = 0) setting have been studied since Wald’s (1973) sequential
Broadly speaking, the DDM choice framework resembles a “high-frequency” or “split-second”
version of dynamic search or learning models which have been analyzed theoretically and es-
timated in the empirical industrial organization and marketing literatures11, or duration models
in statistics.
3 Application: Video advertisements on mobile platforms
The number and variety of applications on mobile devices (“mobile apps”) has soared in the last
decade. A dominant revenue model for apps is the “freemium" model in which users download
the app for free, but are subsequently monetized with periodic exposure to advertisements. As
a result, mobile advertising has recently become the largest segment of digital advertising. We
will refer to the originating mobile app with which the user was engaged as the “publisher”, and
the creator of the advertisement as the “advertiser”. In most cases, both the advertiser as well
as the publisher are apps – an example could be users playing the Candy Crush app being
shown a video ad for the Clash of Clans app.12 Because of this, the same app can be active on
both the publisher and advertiser sides of the market, as they seek both to monetize existing
users by showing them ads, as well as acquire new users by advertising on other apps.
The most familiar ad format in mobile advertising is the “non-skippable” video ad: users on
a particular publisher app are served a 30-second video trailer ad, and are not allowed to skip
the ad. After the ad finishes playing, the endscreen (a “Call to Action” screen) is displayed and
users are prompted to take one of two actions. First, the user can close the ad, and return to
the publisher’s originating app. Second, the user can click on the ‘install’ button which takes the
user to an App Store, where the user has the opportunity to find out more about the advertised
app and install it. We will use “click-back” to denote the first action, and “click-through” to denote
the second.
sampling model (see discussions in Lehmann (1959) and Bogacz, Brown, Moehlis, Holmes & Cohen (2006)), up to
recent theoretical work by Fudenberg et al. (2018), Echenique & Saito (2017), Baldassi, Cerreia-Vioglio, Maccheroni,
Marinacci & Pirazzini (2019) and Ke & Villas-Boas (2019). Particularly, Fudenberg et al. (2018) show that optimal
decision rules in the DDM setting typically involve time-varying choice thresholds, and Fudenberg et al. (2019)
discuss testing and identification of this model. As far as we are aware, there are no optimality results for the
leakage (λ ≠ 0) case.
11 See, e.g., Erdem & Keane (1996), Crawford & Shum (2005), Hong & Shum (2006), Branco, Sun & Villas-Boas (2012), Honka (2014), Lu & Hutchinson (2017).
12 See Chen & Chiong (2016) for a study of a mobile advertising auction market in which advertising rates are set.
Our data come from a mobile ad network, which acts as an intermediary between advertisers
and publishers in this two-sided mobile advertising platform. We will focus on data from a single
advertiser-publisher pair, that is, the same ad is shown to users within a single app. Both the
publisher and advertiser are popular gaming apps, with daily average users numbering in the
millions. The advertised app, in particular, is one installment in a popular game franchise.
Thus, even within this single advertiser-publisher pair, we observe N = 194,607 ad servings,
or “impressions”, which occurred from October 1 through November 15, 2016. Since we
naturally do not observe users who were not served the ad, our analysis is entirely conditional
on a sample of users which both the platform and advertiser deemed to be profitable enough to
be shown this particular ad.
Table 1: Summary statistics.

Variables                                     Mean     Std. Dev.
User covariates:
  Android Version ∈ {1, …, 8}                 2.82     2.71
  Device Volume ∈ [0, 1]                      0.541    0.314
  iOS Dummy 0/1                               0.477    0.500
  iOS Version                                 4.69     4.91
  Samsung Dummy 0/1                           0.280    0.449
  Screen Resolution, pixels (/10^6)           1.30     0.832
  Connected via WiFi 0/1                      0.959    0.198
Choice variables:
  Click-through                               0.0209   0.143
  Response time, for click-through (secs.)    4.97     1.52
  Response time, for click-back (secs.)       4.14     1.73
# observations (“impressions”): 194,607
Table 1 summarizes our dataset. There is substantial observed heterogeneity across users,
and we include a number of user-specific covariates Xi in our analysis, as listed at the top of the
table. For the endogenous choice variables, 190,548 (97.91%) of impressions are click-backs,
and only 4,059 (2.09%) are click-throughs. This matches the industry norm, where click-through
rates are typically between 1.5% and 3%. We also dropped users with response times exceeding 8
seconds; such excessive response times likely arise from users’ inattention while the video ad
was playing (e.g., looking away from the phone). This left us with 194,607 observations (80%
of the full sample).13
We define the response time as the time (in seconds) elapsed from the end of the ad until
a decision has been made. We plot smoothed kernel densities of response time for click-backs
(y = 0) and click-throughs (y = 1) in Figure 2. Several interesting patterns emerge. First,
there is no “piling-up” in the response time distributions at t = 0, indicating that users are not
simply waiting for the ad to end in order to click-back at the first available opportunity. This lack
of piling-up plays an important role in the identification of model parameters, as we elaborate
below.
Figure 2: Empirical density of the actual observed response times
[Figure: probability density (y-axis) of response times in seconds (x-axis, 0–10), plotted separately for click-backs and click-throughs.]
Second, response times vary systematically across users’ choices: users who click-through
take on average 4.97 seconds (median=4.88), while those who click-back take on average only
4.14 seconds (median=3.99), which is significantly lower. Thus, response times matter since
they contain information on preferences. This feature of the reaction time distributions is a
robust characteristic across most ad campaigns, not just the one we chose for our analysis.
13 To assess robustness, we also estimated our model in turn on larger datasets where we removed observations with response times greater than 9 or 10 seconds. The results are very similar.
The histogram in Figure 3 shows that, across 439 large ad campaigns (advertiser-publisher
pairs) from our full dataset, the median time to click-through outstrips the median time to click-
back for most of them. This observation – that the less-frequent choice (e.g. click-through) is
selected after longer response times – is familiar in lab experimental settings involving attention
tasks and, indeed, motivated development of extensions to the DDM framework.14 At the very
least, this suggests that click-through decisions were not hasty or lazy decisions made with
little forethought: if anything, the longer response times for click-throughs suggest more mental
effort.
Figure 3: Histogram of response time differences for 439 large ad campaigns (advertiser/publisher pairs)
[Histogram: x-axis is the median RT of click-throughs minus the median RT of click-backs (in seconds); most of the mass lies above zero.]
14 For instance, in lab experiments where subjects were asked to judge which of two objective stimuli is larger, a “slow-errors” phenomenon was observed whereby mistakes tended to be rare events, but when subjects made them, the response time was longer (see discussion in Luce, 1991; Woodford, 2016). The standard DDM — one given by Eq. (1) with λ = 0, and with a fixed threshold B — cannot generate slow errors; instead, it implies that the
distribution of response times is the same regardless of choice. This arises from the independence and symmetry
of the increments; see Shadlen, Hanks, Churchland, Kiani & Yang (2006) for a discussion. The two-stage DDM we
propose below does not run into this difficulty.
3.1 A two-stage DDM of mobile advertisements
In order to map the DDM framework to the mobile advertising data, we propose a two-stage
extension of the DDM. In our model, users initially accumulate decision-relevant information
during the ad exposure stage, but are only able to actively make a choice in the second stage
(the decision stage) after the 30-second ad has finished. While users experience a diffusion
process in both stages, we permit the parameters of the process to vary across the two stages.
Changes in parameters may reflect differences in the types or sources of information between
the two stages: for instance, during the ad exposure stage, the predominant source of infor-
mation may be the video ad itself, but in the decision stage users may be comparing the new
information about the advertised app with their recollections of previous game-play experiences
about the originating app.15
Figure 4: Two-stage Drift-Diffusion Model
[Figure: during the ad exposure stage (0 ≤ t ≤ 30), Z0(t) evolves with no barriers; at t = 30 the decision stage begins and Z1(t) evolves until it hits one of the barriers B or −B.]
15 Previously, the attentional drift-diffusion model (“aDDM”) has been developed to examine the role of attention within a DDM framework (Krajbich et al., 2010; Krajbich, 2019). In this model, the parameters of the diffusion
process are affected by measurements of the eye-movements of a decision-maker. Analogously, the differences
in parameters in the two stages of our model can reflect attentional and informational differences between the two
stages.
Formally, during a user’s exposure to the 30-second video advertisement (in the initial ad
exposure stage), information accumulates as:
dZ0(t) = μ0 dt + λ0 Z0(t) dt + σ0 dW(t),   for 0 ≤ t ≤ 30.   (2)
The drift for this process is denoted μ0 and represents the utility differences during the first stage. In this stage, there is no barrier, as users are not allowed to make a decision while the ad plays. As the ad plays, the position of Z0(t) at any point in time is a Gaussian random variable:
Z0(t) ∼ N( (μ0/λ0)(e^{λ0 t} − 1),  (σ0²/(2λ0))(e^{2λ0 t} − 1) ),   for 0 ≤ t ≤ 30,   (3)

where N(a, b) denotes a Gaussian distribution with mean a and variance b.16
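The closed-form moments in Eq. (3) can be checked by direct Monte Carlo simulation. The sketch below uses illustrative parameter values and a function name (`ou_moments`) of our own choosing.

```python
import numpy as np

def ou_moments(mu0, lam0, sigma0, t):
    """Closed-form mean and variance of Z0(t) from Eq. (3), for lam0 != 0."""
    mean = (mu0 / lam0) * (np.exp(lam0 * t) - 1.0)
    var = (sigma0**2 / (2.0 * lam0)) * (np.exp(2.0 * lam0 * t) - 1.0)
    return mean, var

# Monte Carlo check via Euler steps (illustrative parameters)
rng = np.random.default_rng(2)
mu0, lam0, sigma0, T, dt, n = 0.2, -0.3, 1.0, 30.0, 0.05, 50000
z = np.zeros(n)
for _ in range(int(T / dt)):
    z += (mu0 + lam0 * z) * dt + sigma0 * rng.normal(0.0, np.sqrt(dt), n)
m, v = ou_moments(mu0, lam0, sigma0, T)
```

The simulated mean and variance of `z` should agree closely with the closed-form values `m` and `v`, up to discretization and sampling error.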
After the ad has finished playing, users have the opportunity to either click-back or click-through in the decision stage; the information accumulation process is now allowed to differ, governed instead by μ1 and λ1:

dZ1(t) = μ1 dt + λ1 Z1(t) dt + σ1 dW(t),   for t > 30.   (4)
As users are called upon to make a decision in this stage, a barrier B is present. Figure 4
illustrates the two-stage model. A user makes the choice y = 1 at the response time t∗ if and
only if the Z1(t) process first hits the barrier B at time t∗ > 30. Similarly, the choice is y = 0 if
it hits −B first. If the parameters governing these two processes are identical (e.g. μ0 = μ1), then the two signal processes are identical across the two stages.
As illustrated in Figure 4, the first stage affects the second by situating the initial position for
the second-stage diffusion process. That is, the initial condition for the decision stage accu-
mulation process Z1(t) coincides with the (random) position of the ad exposure stage process
Z0(t = 30) at the end of the video ad, as given by Equation 3.17
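This two-stage structure — draw the end-of-ad position from the Gaussian in Eq. (3), then run the barrier-crossing decision stage — can be sketched as follows. Parameter values are illustrative, and σ0 = σ1 = 2 mirrors the σ² = 1/Δt = 4 normalization used later in estimation.

```python
import numpy as np

rng = np.random.default_rng(3)

def draw_start(mu0, lam0, sigma0, t_ad=30.0, size=1):
    """Draw the end-of-ad position Z0(30) from the Gaussian in Eq. (3)."""
    mean = (mu0 / lam0) * (np.exp(lam0 * t_ad) - 1.0)
    var = (sigma0**2 / (2.0 * lam0)) * (np.exp(2.0 * lam0 * t_ad) - 1.0)
    return rng.normal(mean, np.sqrt(var), size)

def decision_stage(z0, mu1, lam1, sigma1, B, dt=0.25, t_max=10.0):
    """Run the decision-stage diffusion from starting point z0; returns
    (choice, response time measured from the end of the ad)."""
    z, t = z0, 0.0
    while t < t_max:
        z += (mu1 + lam1 * z) * dt + sigma1 * rng.normal(0.0, np.sqrt(dt))
        t += dt
        if z >= B:
            return 1, t          # click-through
        if z <= -B:
            return 0, t          # click-back
    return 0, t_max              # no crossing by t_max: treated as click-back here

# Illustrative (not estimated) parameters
starts = draw_start(mu0=-0.05, lam0=-0.2, sigma0=2.0, size=1000)
sims = [decision_stage(z, mu1=-0.1, lam1=-0.1, sigma1=2.0, B=6.0) for z in starts]
```

Because the starting point is random, the simulated response-time density has no mass point at zero, consistent with Figure 2.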
Moreover, from examination of Eq. (3), we see that the sign of the leakage parameter λ0 determines whether the mean and variance of the sample path Z0(t) converge (for λ0 < 0) or diverge (for λ0 > 0) as t → ∞; in the absence of leakage (λ0 = 0), the sample paths likewise diverge with
probability one. As we discuss below, the non-divergence of the diffusion in the initial stage
16 This follows from the Fokker-Planck equation; see also Appendix B for derivations.
17 Frydman & Nave (2016) also consider a DDM model with random starting points across subjects.
is important for fitting the data; in particular, it is crucial for generating the lack of piling-up in
response times at zero evident in Fig. 2.
3.2 Estimation
For the two-stage DDM, the optimal choices and response times are not characterized in closed
forms.18 Therefore, we discretize time into small finite increments and use simulation to approximate the joint distribution of users’ optimal choices and response times. As discussed above,
the two-stage model is essentially equivalent to a single-stage DDM model with a random start-
ing value – a starting value given by the normal distribution in Eq. (3). We use this equivalence
for simulation.
Let τ = {t0, t1, t2, …} denote a grid of time points for the second (decision) stage of the decision-making process. Since the ad is 30 seconds long, and users are only allowed to make decisions after that, we set t0 = 30. We discretize time into increments of Δt = tj+1 − tj for all j, equal to a quarter-second (Δt = 0.25) for our estimation.19 For each candidate parameter vector θ = (μ0, μ1, λ0, λ1, B),20 we draw a starting point Z0(t = 30) according to Eq. (3), and then simulate the accumulation process Z1(tj) recursively for each tj ∈ τ and user i as:
Z1,i(tj) − Z1,i(tj−1) = μ1 Δt + λ1 Z1,i(tj−1) Δt + σ1 ΔWi(tj),   for tj ≥ 30.   (5)
The increments of the Brownian motion ΔWi(tj) are drawn i.i.d. from N(0, Δt) for each tj ∈ τ. We do not estimate σ0² and σ1² separately from the other parameters, but normalize σ0² = σ1² = 1/Δt = 4.21 This normalization implies that the increments of the Brownian motion ΔWi(tj) are drawn i.i.d. from a standard Normal, which is convenient for simulation. To obtain the corresponding set of estimates under a different normalization, say σ0 = σ1 = k, we just have to multiply all the parameters by k/2, except for λ0 and λ1, which are unaffected by the scale normalization.

18 In the simple DDM model, however, closed-form expressions are available for the optimal choice probabilities and associated response times (Shadlen et al. (2006)), and software packages (Wiecki, Sofer & Frank (2013) or Drugowitsch (2016)) are available which exploit these closed forms for fitting the model to data. For the standard Ornstein-Uhlenbeck process proposed in Equation 1, a closed-form density of the first-passage time is only available when there is a single absorbing barrier, as opposed to two absorbing barriers here (Alili, Patie & Pedersen (2005)).
19 Section B in the appendix shows that this time-increment is accurate enough in the sense that the pdf of the discrete accumulation process closely approximates the pdf of the continuous accumulation process.
20 As explained below, in the preferred specification we allow the drift terms μ0 and μ1 to be linear functions of user-specific covariates.
21 Consider two alternative sets of parameters θ = (μ0, λ0, σ0², μ1, λ1, σ1²) and θ′ = (kμ0, λ0, (kσ0)², kμ1, λ1, (kσ1)²). Then from Equation 3, Z0(t; θ′) ∼ kZ0(t; θ) and Z1(t; θ′) ∼ kZ1(t; θ). Hence, the two-stage DDM under θ with barrier B, and under θ′ with barrier kB, are observationally equivalent. By fixing the values of σ0 and σ1, we identify μ0, μ1 and B up to scale.
In the data, we observe (yi, t∗i, Xi), the choice, response time and covariates, for each user i. Correspondingly, for each user, we generate S (= 5000) independent sample paths for the diffusion process. For each user i and each sample path s = 1, …, S, we record (yi,s, t∗i,s), where t∗i,s is the time the sample process s first hits either the upper bound or the lower bound (whichever comes first) for user i. Similarly, yi,s = 1 if the sample process hits the upper bound at t∗i,s and yi,s = 0 otherwise. The likelihood of each observation is approximated by

P(yi, t∗i | Xi; θ) ≈ (1/S) Σ_{s=1}^{S} 1(yi,s = yi, t∗i,s = t∗i).

Thus we estimate θ by maximizing the full simulated log-likelihood Σ_{i=1}^{n} log P(yi, t∗i | Xi; θ).
Despite the large number of simulations, this problem is highly parallelizable and, as such,
we utilize GPU computing to evaluate the simulated likelihood. In addition, we use an approach
similar to importance sampling (see Ackerberg (2009)) to avoid repeated simulations at each
parameter value; this further reduces the computational burden.22
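The pre-solve-and-interpolate idea behind this importance-sampling-style shortcut can be illustrated schematically; the grid points and stored values below are stand-in random numbers, not actual pre-solved likelihoods.

```python
import numpy as np

rng = np.random.default_rng(5)
grid = rng.uniform(-1.0, 1.0, size=(500, 5))  # pre-solved (mu0, mu1, lam0, lam1, B) points
presolved = rng.uniform(size=500)             # stand-in for stored likelihood values

def knn_likelihood(theta, k=5):
    """Approximate the likelihood at a candidate parameter point by averaging
    the pre-solved values at its k nearest neighbors on the grid."""
    d = np.linalg.norm(grid - np.asarray(theta), axis=1)
    idx = np.argsort(d)[:k]
    return presolved[idx].mean()

val = knn_likelihood([0.0, 0.0, 0.0, 0.0, 0.0])
```

The expensive simulation step is thus paid once per grid point rather than once per candidate parameter vector visited by the optimizer.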
3.2.1 Parameter identification
Before proceeding, we provide some discussion regarding identification of the structural model
parameters. We start by considering identification in the simpler single-stage DDM without
leakage, corresponding to Eq. (1) with ‚ = 0. In this setting, the parameters of the model are
identified up to σ, using only two moments of the data: the mean response time,23 and the
click-through probability.24
22 This importance sampling procedure exploits the assumed index form on the user-specific drifts, i.e. μi0 = Xiβ0 and μi1 = Xiβ1. We pre-solve and store the mapping between the following 5 parameters, (μ0, μ1, λ0, λ1, B), and the joint probability distribution of response time and choice. Subsequently, we approximate the likelihood at a candidate parameter vector, (β0, β1, λ0, λ1, B), as a weighted average of pre-solved likelihoods, computed at the k-nearest neighbors in the space of (μ0, μ1, λ0, λ1, B).
23 As discussed above, the simple DDM predicts the same response time distribution conditional on choice y = 0 or y = 1, so data on response times conditional on choice are not useful here. Thus the distributions of response times that we observe in Figure 2 are much richer than required for identification of the basic DDM.
24 To our knowledge, such identification results are new in the primarily experimental literature where DDM is prevalent; in experimental settings, the drift parameter is usually assumed to be known or estimated from indepen-
Lemma 1. In the simple DDM, the drift μ and the bound B are identified using only two moments of the data: t̄ ≡ E[t*], the mean response time, and ȳ ≡ E[y], the probability of click-through. The variance of the diffusion, σ², is not separately identified, and we normalize σ. The estimates μ̂ and B̂ are then:

    μ̂ = −√[ σ²(2ȳ − 1) log(ȳ/(1 − ȳ)) / (2t̄) ]   if ȳ < 0.5
    μ̂ = +√[ σ²(2ȳ − 1) log(ȳ/(1 − ȳ)) / (2t̄) ]   if ȳ ≥ 0.5        (6)

    B̂ = (σ/2) √[ 2t̄ log(ȳ/(1 − ȳ)) / (2ȳ − 1) ]                      (7)
The expression for μ̂ in Eq. (6) has components familiar from the discrete-choice literature: in particular, the log odds-ratio transformation log(ȳ/(1 − ȳ)) is the inverse of the binary logit choice probability ȳ = exp(μ)/[1 + exp(μ)], in which μ is the latent utility difference between the binary choice options. Unlike the binary logit model, which is a model only of choice, here Eq. (6) demonstrates how the estimated μ̂ is affected by the response time: namely, it is larger in magnitude when the average response time t̄ is shorter. The intuition is clear: when a choice is made rather quickly, the preference of the individual must be more intense.

Moreover, the impact of response time on the estimated utilities is greatest when the choice probability is further away from 0.5 (see also Clithero (2018a)). In other words, estimated utilities are much more sensitive to response times precisely when the two choice probabilities are very unequal. For instance, in our mobile advertising setting, where ȳ is only around 0.02, the estimated utility would be μ̂ = −1.37 when t̄ = 1, and μ̂ = −0.68 when t̄ = 4: when one of the choices is rare, as here, data on reaction times are particularly valuable for identification.
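In code, the estimators in Eqs. (6) and (7) are a few lines; with the normalization σ = 1, this minimal sketch reproduces the illustrative values of μ̂ quoted above:

```python
import math

def simple_ddm_moments(ybar, tbar, sigma=1.0):
    """Method-of-moments estimators for the simple DDM (Eqs. 6-7):
    drift mu and bound B from the click-through rate ybar = E[y]
    and the mean response time tbar = E[t*]."""
    log_odds = math.log(ybar / (1.0 - ybar))
    # (2*ybar - 1) and the log odds always share a sign, so the radicand is >= 0
    mu = math.sqrt(sigma**2 * (2*ybar - 1) * log_odds / (2*tbar))
    if ybar < 0.5:
        mu = -mu
    B = (sigma / 2.0) * math.sqrt(2*tbar * log_odds / (2*ybar - 1))
    return mu, B

# With ybar = 0.02 (as in the data) and sigma normalized to 1:
mu1, _ = simple_ddm_moments(ybar=0.02, tbar=1.0)  # mu ~= -1.37
mu4, _ = simple_ddm_moments(ybar=0.02, tbar=4.0)  # mu ~= -0.68
```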
Compared to this setting, our two-stage model additionally contains the parameters γ1, μ0, and γ0. The identification of these parameters is related to differences in the response time distributions conditional on click-backs vs. click-throughs. All else equal, more positive values of μ0 imply that the starting values for the second-stage diffusion process will be closer to the upper boundary (cf. Eq. (3)), so that first passage times to the upper boundary will be shorter than first passage times to the lower boundary. As such, the response times conditional on y = 1 will be shorter than the response times conditional on y = 0. In the data,
however, we observe the opposite: click-throughs are associated with longer response times (as in Figure 2), suggesting that μ0 should be negative (as we will see below).

In addition, the γ parameters affect the variance of the accumulation process (cf. Eq. (3)), so that the higher-order moments of the distribution of response times provide identification for these parameters. Specifically, from Figure 2, we observe no density mass at zero. This suggests that γ0 should take negative values: otherwise, the mean and variance of the first-stage diffusion process would diverge and many users would already have crossed a decision threshold by t = 30, resulting in a “piling up” of response times at zero.
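The role of the starting point can be illustrated with a small simulation (stylized, assumed values rather than the estimated model): when second-stage paths start below zero, the rare upper-boundary hits (click-throughs) take longer on average than the common lower-boundary hits (click-backs).

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed illustrative values: start the second stage at z0 = -3 (a negative,
# "mu0-like" starting point), bounds at +/-5, and no second-stage drift.
B, z0, sigma, dt = 5.0, -3.0, 2.0, 0.01
S = 2000

t_click, t_back = [], []
for _ in range(S):
    z, t = z0, 0.0
    while abs(z) < B:
        z += sigma * np.sqrt(dt) * rng.standard_normal()
        t += dt
    (t_click if z >= B else t_back).append(t)

# Starting closer to the lower bound makes click-backs fast and the rare
# click-throughs slow, matching the pattern described in the text.
print(np.mean(t_click) > np.mean(t_back))  # True
```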
3.3 Monte Carlo
Before moving on to the estimates, we present Monte Carlo simulations which demonstrate that
our estimation procedure performs well. The results are reported in Table 2. We considered
five designs, differing in the values of the model parameters. For each design, we generated
datasets consisting of N = 1000 observations, and repeated each design for 100 replications.
For each replication, we estimate these parameters using the procedure described in Section
3.2. Evidently, from Table 2, the estimation procedure works well, and recovers parameters
quite accurately even at a modest sample size of N = 1000.
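A miniature version of this exercise, using the simple DDM and the closed-form estimators of Eqs. (6) and (7) rather than the full five-parameter model, can be sketched as follows (the true values μ = −1, B = 5, σ = 2 are assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

def recover_simple_ddm(mu, B, sigma=2.0, n=20000, dt=0.01, t_max=60.0):
    """Simulate n first passages of dZ = mu dt + sigma dW through +/-B,
    then invert the mean response time and the choice share via the
    method-of-moments formulas of Eqs. (6)-(7)."""
    steps = int(t_max / dt)
    z = np.zeros(n)
    t_star = np.full(n, t_max)
    y = np.zeros(n)
    alive = np.ones(n, dtype=bool)
    for k in range(1, steps + 1):
        z[alive] += mu * dt + sigma * np.sqrt(dt) * rng.standard_normal(alive.sum())
        up, dn = alive & (z >= B), alive & (z <= -B)
        y[up] = 1.0
        t_star[up | dn] = k * dt
        alive &= ~(up | dn)
    t_bar, y_bar = t_star.mean(), y.mean()
    L = np.log(y_bar / (1 - y_bar))
    mu_hat = np.sign(y_bar - 0.5) * np.sqrt(sigma**2 * (2*y_bar - 1) * L / (2*t_bar))
    B_hat = (sigma / 2) * np.sqrt(2 * t_bar * L / (2*y_bar - 1))
    return mu_hat, B_hat

mu_hat, B_hat = recover_simple_ddm(mu=-1.0, B=5.0)
# mu_hat and B_hat land close to the true values (-1.0, 5.0), up to
# Euler-discretization bias and Monte Carlo noise.
```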
4 Estimation Results
4.1 Single-stage model results
We first estimate a single-stage model which leaves out the ad exposure stage. While this model
is descriptively wrong for our mobile advertising platform, a discussion of these results will
highlight the benefits of using response time data in conjunction with choice data in estimation.
The results are reported in Table 3. In Column 1, the parameters of the simple DDM are recovered using the equations in Lemma 1, which require only two moments from the data: the mean response time and the overall click-through rate. The drift parameter of the simple DDM (column 1) is μ1 = -1.332, indicating a utility difference favoring the more frequent choice of y = 0 (click-back). The bound B is estimated to be 5.778, which is roughly three times the standard deviation σ1, which is fixed at 2.
As a computational check, in Column 2, we estimate the parameters using the simulated
Table 2: Results from Monte Carlo Experiments

                            γ0        γ1        μ0        μ1        B
Design 1  True values     -1.00      0.25     -6.00      0.50     10.0
          Avg. estimate   -1.0368    0.2671   -6.1188    0.4426   10.1798
          (Std dev)       (0.0216)  (0.0182)  (0.1492)  (0.0324)  (0.1258)
Design 2  True values     -1.00      0.25     -2.00     -1.50     10.0
          Avg. estimate   -0.992     0.2565   -1.948    -1.5443   10.4101
          (Std dev)       (0.028)   (0.0083)  (0.0516)  (0.0488)  (0.2193)
Design 3  True values     -2.00      0.25      2.00     -1.5      10.0
          Avg. estimate   -1.9816    0.2638    1.8712   -1.7024   10.4561
          (Std dev)       (0.1324)  (0.0110)  (0.1112)  (0.0917)  (0.5673)
Design 4  True values     -1.00      0.5       2.00     -1.50     10.0
          Avg. estimate   -1.0436    0.5289    1.8044   -1.5176   10.3994
          (Std dev)       (0.0224)  (0.0150)  (0.0912)  (0.0369)  (0.1551)
Design 5  True values     -1.00      0.25     -4.00      0.50     11.0
          Avg. estimate   -1.0236    0.2621   -3.9292    0.5009   10.7581
          (Std dev)       (0.0268)  (0.0171)  (0.0732)  (0.0118)  (0.0419)

Each design was run for 100 replications, with datasets of size N = 1000.
Table 3: Estimates of the single-stage DDM.

                         (1)              (2)               (3)            (4)
                     Simple DDM      DDM (simulated)    Leaky DDM     Binary Logit
B, Bound            5.778 (0.0204)   5.588 (0.632)     7.02 (0.3313)      –
μ1                 -1.332 (0.0064)  -1.257 (0.197)    -1.36 (0.0553)      –
σ1                  2 (fixed)        2 (fixed)         2 (fixed)          –
γ1                       –                –            0.163 (0.0325)     –
u1, “utility diff”       –                –                 –           -3.84

In column (1), estimates are recovered using the method-of-moments estimators in Equations 6 and 7. Standard errors (in parentheses) are computed using the delta method. Estimates in columns (2) and (3) are obtained via simulated MLE, with bootstrapped standard errors.
MLE procedure. That is, we discretize time using a fixed increment of ∆t = 0.25 seconds, and simulate the accumulation process according to Section 3.2 above. As expected, the results do not deviate far from Column 1.
In Column 3, we consider a richer model allowing for a non-zero leakage parameter γ. We now observe that both the bound B and the drift μ1 are larger in magnitude. This is consistent with the positive estimated value (0.163) of the parameter γ: since the drift μ1 is negative, a positive value of γ accelerates the downward trend of the accumulation process, so that a wider bound is required to rationalize the observed click choices and response times.
As counterpoint, in Column 4 we naïvely disregard the response time data and fit a binary logit model P(y = 1) = [1 + exp(−u1)]^(−1) using only the users' choice data. Although the utility parameters cannot be compared between the DDM and the binary logit, we see that on the basis of choice data alone, the logit model requires a substantial utility difference to explain the asymmetric observed choice probabilities (< 3% click-throughs). In the DDM, however, asymmetric choice probabilities need not imply a large utility difference: with long response times, asymmetric choice probabilities can be consistent with a more modest utility difference, as here.25
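As a quick check on Column 4, with choice data alone the binary logit utility difference is just the log odds of the click-through rate; a back-of-the-envelope computation at a rate of roughly 2.1% recovers the reported value:

```python
import math

p = 0.021                      # approximate observed click-through rate
u1 = math.log(p / (1 - p))     # inverse of P(y = 1) = 1/(1 + exp(-u1))
print(round(u1, 2))            # -3.84, matching Column 4 of Table 3
```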
4.2 Two-stage model results
In Table 4, we consider specifications of the full two-stage model. Column (1) contains estimates
of a “homogeneous user” model in which the model parameters are assumed to be identical
across users. Column (2) contains estimates from a “heterogeneous user” specification in which
—0 and —1, the drift terms in the model, vary across users depending on observed covariates.
For discussion, we focus primarily on the richer heterogeneous specification.
As the parameters in the heterogeneous specification are not straightforward to interpret, we
illustrate the collective implications of these parameter estimates on users’ diffusion processes
in Figure 5. In this figure, we simulate and plot the average diffusion processes corresponding
to our estimates (along with bands at +/- two standard deviations). We see that, during the ad exposure stage, the diffusion path (on average) drifts down, away from the decision threshold for the advertised app, but flattens out quickly. Behaviorally, this is consistent with an interpretation that users pay attention only during the initial segment of an ad. Formally, this flattening arises from our negative estimate for γ0, which implies (cf. Eq. (3)) that the variance of the first-stage process is convergent and bounded. Indeed, such a pattern is critical in order to generate the
25 Cf. Eq. (6) above, where longer response times are indicative of near-indifference between the options.
Table 4: Two-stage model: Parameter estimates and bootstrapped standard errors

                              (1)                     (2)
                       Homog. Users Model      Heterog. Users Model
B, Bound                8.846 (0.6383)          8.7545 (0.5541)
γ0                     -2.216 (0.3396)         -2.9132 (0.2316)
μ0: Constant           -3.116 (0.4480)         -3.8052 (0.3096)
μ0: Android Version          –                  0.8772 (0.0648)
μ0: Device Volume            –                  0.3704 (0.0392)
μ0: iOS Dummy                –                  0.8648 (0.0668)
μ0: iOS Version              –                 -0.4616 (0.0968)
μ0: Samsung Dummy            –                  0.4640 (0.0364)
μ0: Screen Resolution        –                  0.8332 (0.0660)
μ0: WiFi                     –                  0.8024 (0.0716)
γ1                      0.178 (0.0340)          0.2041 (0.0148)
μ1: Constant           -1.006 (0.0974)         -0.9901 (0.0661)
μ1: Android Version          –                 -0.0601 (0.0132)
μ1: Device Volume            –                  0.1346 (0.0091)
μ1: iOS Dummy                –                  0.0389 (0.0061)
μ1: iOS Version              –                 -0.0006 (0.0173)
μ1: Samsung Dummy            –                 -0.1245 (0.0090)
μ1: Screen Resolution        –                 -0.2113 (0.0163)
μ1: WiFi                     –                  0.0162 (0.0066)
σ0 and σ1               2 (fixed)               2 (fixed)
Log-Likelihood         -409654                 -395630
lack of “piling up” at t = 30 in the reaction time distributions plotted in Figure 2.26 In the second
stage, however, the estimated γ1 turns positive, which hastens the diffusion process towards the boundaries.27
Figure 5: Average diffusion path (implied by heterogeneous users estimates)

[Figure: average simulated diffusion path over 0–35 seconds, on a scale of −10 to 10. Blue line: average diffusion path; shaded area: +/- two standard deviations.]
The heterogeneous users specification fits the data well. As summarized in Table 5, the
predicted click-through rates and response times from this model closely match the observed
data. Our two-stage DDM is able to generate longer response times for click-throughs than for click-backs.
The coefficients on the user covariates are precisely estimated. The coefficients associated with Device Volume and the iOS Dummy are positive in both stages, which means that these variables unambiguously increase ad effectiveness. This suggests that users who viewed the ad on an iOS mobile device, with a higher audio volume, had higher click-through rates.

26 If, on the contrary, γ0 > 0, then the variance diverges, which would lead to a large mass of choices at t = 30.

27 This amplifying role of γ1 > 0 is consistent with some neural theories of decision-making in which the need to make a decision becomes more urgent as time passes (Cisek, Puskas & El-Murr (2009); Thura, Beauregard-Racine, Fradet & Cisek (2012)).
Table 5: Model Fit of Heterogeneous Model Estimates

                                                  Predicted            Observed
                                            (Heterog. Users Model)
Average clickthrough probability (%)            2.02                   2.09
Mean response time (secs)                       4.03 (0.64)            4.16 (1.73)
Mean response time for clickbacks (secs)        4.02 (0.64)            4.14 (1.73)
Mean response time for clickthroughs (secs)     5.39 (0.93)            4.97 (1.52)

Standard deviations in parentheses. Model-predicted outcomes simulated from 100 representative users, obtained by k-means clustering of user covariates.
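The “representative users” device in the table note can be sketched as follows (a plain Lloyd's-algorithm k-means on a synthetic stand-in for the covariate matrix; the actual covariates are those listed in Table 4):

```python
import numpy as np

rng = np.random.default_rng(3)

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm: returns (centers, labels)."""
    r = np.random.default_rng(seed)
    centers = X[r.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # assign each point to its nearest center
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # move each center to the mean of its assigned points
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

# Synthetic stand-in for the user covariate matrix (n users, 8 covariates)
X = rng.normal(size=(2000, 8))
reps, labels = kmeans(X, k=100)          # 100 representative users
weights = np.bincount(labels, minlength=100) / len(X)
```

Model-implied outcomes are then simulated at the cluster centers and averaged with the cluster-size weights, instead of simulating every user.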
Figure 6 contains kernel densities of the estimated drift terms μ0 and μ1 across all users. We can clearly reject equality of the distributions; there is considerably more heterogeneity in the first-stage drifts (μ0) than in the second-stage drifts (μ1).

A marked bimodality characterizes the estimated density function for μ0 (and, to a lesser extent, μ1). For the exposure-stage drift μ0, the two modes are far apart and differ in sign (with roughly half the mass contained in each mode), suggesting substantial heterogeneity in users' responses to the video ad.
While the estimated values of μ0 can be positive or negative across users i, the second-stage drift, μ1, exhibits more limited heterogeneity, with only negative support. Any goodwill created by the video ad towards the advertised app appears short-lived: once the ad finishes, the diffusion process drifts away from the click-through barrier. This may reflect the difference in information sources between the two stages. In the exposure stage, users obtain decision-relevant information primarily from the video ad; in the second stage, users may refocus their attention back on the publisher's app, which shifts the diffusion process back towards the click-back barrier. This explanation is assessed in Section 4.3 below, using a selected subsample of users who were exposed to the video ad multiple times.
4.2.1 Behavioral differences between Positive and Negative Users
Given the substantial and bimodal heterogeneity in the estimated values of the first-stage drift μ0 (evident in Fig. 6), we segment users into two groups depending on the sign of the estimated
Figure 6: Kernel density plot of the estimated drifts μ0i and μ1i for all users.

[Figure: overlaid kernel densities over the range −10 to 10. For each user i, we compute the drift parameters as μ0i = Xi β̂0 and μ1i = Xi β̂1.]
μ0; we call these the positive (μ0 > 0) vs. negative (μ0 < 0) users. Positive users drift towards the click-through barrier as the ad plays, while negative users drift away.
Positive and negative users are behaviorally distinct, even in out-of-sample measures not used in the estimation, as we show here. As shown in Table 6, the negative users (μ0 < 0) have much lower click-through rates (0.46% vs. 3.57%) and shorter response times (3.63 seconds vs. 4.64 seconds). In addition, while the negative users have response times for click-throughs that are about a second longer than for click-backs (4.87 vs. 3.62 seconds), the positive users (μ0 > 0) have, on average, equal response times for both actions. This suggests that the difference in the distribution of response times across choices (shown in Figure 2) arises almost entirely from the segment of “negative” users.
While differences in click-through rates and response times across segments are informa-
tive, we next relate our estimated structural parameters to additional out-of-sample measures
of users’ game-play engagement and behavior. Any systematic relationship between our esti-
mates and these out-of-sample metrics serves as a validity check on our model specification.
The results are presented in Table 7.
Table 6: Differences in ad responses for two user segments

                                   Negative users    Positive users
                                      μ0 < 0            μ0 > 0        p-values
Average click-through rate           0.00459           0.0357         0.00***
                                    (0.000222)        (0.000582)
                                    N = 92,913        N = 101,694
Average response times               3.63              4.64           0.00***
                                    (0.00580)         (0.00485)
                                    N = 92,913        N = 101,694
Average response times               4.87              4.98           0.1537
conditional on click-throughs       (0.0925)          (0.0242)
                                    N = 426           N = 3,633
Average response times               3.62              4.62           0.00***
conditional on click-backs          (0.00581)         (0.00495)
                                    N = 92,487        N = 98,061

(standard errors in parentheses)
Table 7: Differences in out-of-sample game-play metrics across two user segments

                                   Negative users    Positive users
                                      μ0 < 0            μ0 > 0        p-values
Average auction price                $0.4543           $0.2125        0.00***
                                    (0.0239)          (0.00832)
                                    N = 663           N = 1,735
Average install rate                 0.4906            0.4776         0.61
conditional on click-throughs       (0.0242)          (0.00829)
                                    N = 426           N = 3,633
Average game-progression rate        0.4615            0.3856         0.0007***
conditional on installs             (0.0194)          (0.0117)
                                    N = 663           N = 1,735
Average predicted spending           $0.2402           $0.0684        0.00***
conditional on installs             (0.0690)          (0.0146)
                                    N = 5,936         N = 35,157
In our mobile advertisement platform, advertisers reimburse publishers only when an adver-
tisement leads a user to install an app, and the prospective reimbursement amount is deter-
mined via high-bid auctions. A surprising pattern is that the auction price is higher, on average,
to show an ad to negative users ($0.45) vis-a-vis positive users ($0.21). Exploring further, we
see that the reason is that, conditional on installing an app, negative users are more profitable for advertisers. Negative users progress further in the game (they complete at least level one with probability 0.4615, compared to 0.3856 for positive users). Moreover, negative users spent almost three times more – $0.23 vs. $0.08 – on average on in-app purchases.28
Since advertisers pay only once a user installs an app, the lower click-through (and hence install) rates of the negative users do not influence their bids; their bids should reflect primarily these users' higher post-install spending. More broadly, the negative users appear, at the same time, to be less exploratory about trying new apps, but more serious and engaged in the apps which they do try; these are not “casual” game-players.29
4.3 Repeated Exposures
As we discussed above, one explanation for the differences in the estimated drift distribution
between the two stages (—0 6= —1) is the difference in information sources: in the first stage,
the predominant information source may be the video advertisement, while in the second stage
the user may be recalling experiences about the originating app. If this is the case, then pre-
sumably the advertising information should become more redundant upon repeated exposure.
While most of the users in our sample are only exposed to the advertisement once, there is
a subset of users who were repeatedly exposed to the same ad (16,451 users, or 10.5% of
all observations). We can therefore leverage estimates across this subsample to assess this
explanation for how the information accumulation process differs over subsequent, repeated
28 This spending is computed across all users who installed the app after viewing this advertisement, aggregated across all publishers (not just the one publisher used in the estimation). We do this because in-game spending occurs too infrequently – among the 200,000 users in our estimating dataset, there were < 10 spending users. However, after pooling together all the users exposed to the study ad across all publishers, there were 157 spending users. For each of these users, we predicted their μ0 and μ1 using the estimates in Table 4, and report the average spending for those with μ0 < 0 and μ0 > 0.

29 We speculate that these users may also be more willing to pay for “premium” versions of the game without ads, but we do not have the requisite data to test this.
exposures of the same ad. These results are reported in Table 8.
Of particular note, we observe a significant decrease in the estimate of γ0, from -1.38 at the first exposure to -1.95 at the second. This may reflect users' underweighting of information in the ad in later viewings, consistent with an interpretation that advertising information becomes redundant after multiple exposures. Conversely, we see that γ1 does not change
much across exposures (0.14 vs. 0.17), suggesting that the information considered by users in
the second stage is not much affected by multiple ad exposures. Moreover, we do not observe
a significant change in the estimate of the decision threshold B.
5 Counterfactuals: Nonskippable vs. skippable ads
In this section we use our estimation results to consider alternative advertising designs beyond
the “non-skippable” ad format observed in our estimation dataset. In these counterfactuals,
we consider alternative ad formats, in which users are permitted to skip part of the ad.30 We compute a range of simulations in which we vary the ad exposure duration, the number of seconds users are forced to watch the ad, from 1 second to 30 seconds.
Specifically, in these simulations, we assume that the ad is cut off after x (∈ [1, ..., 30]) seconds. The accumulation process during the ad exposure stage follows Eq. (12), while the accumulation process in the decision period (Eq. (4)) begins once users are allowed to skip the ad.31 The counterfactual changes in the click-through probabilities are reported in Table 9.
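These counterfactual simulations can be sketched as follows. The sketch assumes a leaky-accumulator form dZ = (μ + γZ)dt + σ dW for both stages (standing in for Eqs. (12) and (4), which are not reproduced in this section) and uses roughly the homogeneous-model estimates from Table 4:

```python
import numpy as np

rng = np.random.default_rng(4)

def ctr_given_cutoff(x, mu0=-3.0, g0=-2.2, mu1=-1.0, g1=0.18, B=8.8,
                     sigma=2.0, dt=0.05, t_max=30.0, S=20000):
    """Counterfactual click-through rate when the ad can be skipped after
    x seconds. Stage 1 runs (unbounded) for x seconds and yields the
    starting points; stage 2 then runs to the first passage through +/-B."""
    z = np.zeros(S)
    for _ in range(int(x / dt)):                     # ad exposure stage
        z += (mu0 + g0 * z) * dt + sigma * np.sqrt(dt) * rng.standard_normal(S)
    y = np.zeros(S)
    alive = np.ones(S, dtype=bool)
    for _ in range(int(t_max / dt)):                 # decision stage
        z[alive] += (mu1 + g1 * z[alive]) * dt \
                    + sigma * np.sqrt(dt) * rng.standard_normal(alive.sum())
        up, dn = alive & (z >= B), alive & (z <= -B)
        y[up] = 1.0
        alive &= ~(up | dn)
    return y.mean()

rates = {x: ctr_given_cutoff(x) for x in (0.5, 2.0, 5.0, 30.0)}
# With gamma0 < 0, the stage-1 distribution settles quickly, so the
# click-through rate is nearly flat beyond the first few seconds.
```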
Counterfactuals for both the homogeneous and heterogeneous specifications demonstrate that the biggest changes in the click-through rate occur for an ad exposure of 3 seconds or less —
beyond this, click-through rates do not change appreciably, even up to 30 seconds. Ultimately,
there is little difference between forcing viewers to watch only the first 3 seconds of a video ad
versus the entire 30-second ad. Mathematically, this arises from the estimates of the leakage
parameter, γ0 < 0, which implies that the variance of the accumulation during the ad exposure
stage converges. Such a result rationalizes the practice of YouTube’s ad platform, which allows
30 Dukes, Liu & Shuai (2018) present a theoretical analysis of skippable ads.

31 Since the two-stage model is a one-stage DDM with a random starting point, for estimation we fit the latter model (because no data are generated from the first stage). But in the counterfactual analysis here, the explicit structure of the two-stage model becomes important, as it pins down (cf. Eq. (3)) how the distribution of starting values changes as we vary the length of time before viewers are permitted to “skip” the ad.
Table 8: Parameter estimates: first vs. second exposure. Estimated on subsample of users with multiple exposures to the ad

                             (1)                  (2)
                       First exposure       Second exposure
B, Bound               9.5339 (0.6399)      9.2888 (0.7540)
γ0                    -1.3808 (0.1012)     -1.9468 (0.1324)
μ0: Constant          -0.1976 (0.0440)     -1.7692 (0.1220)
μ0: Android Version    0.4608 (0.0748)      0.5872 (0.0816)
μ0: Device Volume      0.4496 (0.0372)      0.1964 (0.0260)
μ0: iOS Dummy         -0.3972 (0.0396)     -1.8536 (0.1448)
μ0: iOS Version       -0.0728 (0.0620)      0.1532 (0.0648)
μ0: Samsung Dummy     -0.1920 (0.0420)      0.2044 (0.0224)
μ0: Screen Resolution -0.6976 (0.0712)     -0.0064 (0.0304)
μ0: WiFi              -1.0976 (0.0848)     -1.3004 (0.1052)
γ1                     0.1434 (0.0196)      0.1721 (0.0261)
μ1: Constant          -1.4036 (0.0870)     -1.7017 (0.1269)
μ1: Android Version   -0.0014 (0.0176)      0.0253 (0.0210)
μ1: Device Volume      0.3161 (0.0232)      0.2786 (0.0193)
μ1: iOS Dummy         -0.2351 (0.0165)     -0.2282 (0.0179)
μ1: iOS Version        0.0306 (0.0161)      0.0174 (0.0196)
μ1: Samsung Dummy     -0.5980 (0.0374)     -0.2839 (0.0252)
μ1: Screen Resolution  0.0173 (0.0126)     -0.0453 (0.0096)
μ1: WiFi              -0.1359 (0.0107)     -0.0150 (0.0066)
σ0 and σ1              2 (fixed)            2 (fixed)
N                      16,451               16,451
Log-Likelihood        -32,292              -32,048

(bootstrapped standard errors in parentheses)
users to skip an ad after a few seconds.
Table 9: Counterfactual click-through probabilities.

Treatments:                      Homog. Users Model           Heterog. Users Model
ad exposure                      Click-through          Click-through     Click-through
duration (secs.)                 probability            prob. (mean)      prob. (sd.)
0.5                              0.0282                 0.0184            0.0099
1.0                              0.0225                 0.0199            0.0120
2                                0.0207                 0.0204            0.0126
5                                0.0201                 0.0203            0.0126
15                               0.0201                 0.0204            0.0127
30 (benchmark)                   0.0202                 0.0204            0.0126

Homogeneous results utilize estimates from Column (1) of Table 4. Heterogeneous results utilize estimates from Column (2) of Table 4, and report simulated counterfactual click-through probabilities across 50 representative users (obtained by partitioning the characteristics space of the users using k-means clustering).
However, the aggregate results shown in Table 9 mask a great deal of heterogeneity in the
counterfactual outcomes across users; particularly, across the two user segments (negative vs.
positive users) that we identified earlier. In Figure 7, we graph the counterfactual click-through
rates across the treatments, for two representative users from each of these segments. Clearly,
for negative users, the highest click-through rates are obtained by minimizing their ad exposure,
but positive users are more likely to respond positively after longer ad exposure. This difference
is important given the difference in revenue from these two segments, as we now discuss.
5.1 Implications for publishers’ mobile ad revenues
Finally, we quantify how different counterfactual scenarios affect the publisher’s revenue. Be-
cause in our mobile advertising platform, publishers are paid for hosting ads only when users
install the advertised app,32 revenue projections require models not only of users’ clicking be-
havior (which we have already estimated), but also predictive models of user installation behav-
ior, conditional on click-throughs.
32 This differs from other advertising markets in which publishers are paid per click (as in banner ad markets) or per exposure (as in traditional media markets).
Figure 7: Counterfactual click-through probabilities.

[Figure: counterfactual click-through rates (vertical axis, roughly 0.005–0.035) against ad exposure duration in seconds (horizontal axis, 0–5), for two representative users, one with μ0 > 0, the other with μ0 < 0.]
For this exercise, then, we estimated predictive nonparametric models (random forests) for the relationships between users' characteristics and (i) the rate of installs conditional on clicks, and (ii) the advertiser's payment for that user conditional on installing. Subsequently, we used these predictive models to predict ad revenues for each user under each counterfactual scenario explored above.
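A sketch of this two-model pipeline, with synthetic stand-ins for the covariates, install outcomes, and payments (scikit-learn's random forests are one natural implementation; the paper does not specify the software used):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(5)

# Synthetic stand-ins: covariates for users who clicked through,
# whether they installed, and the advertiser's payment if they installed.
X_click = rng.normal(size=(3000, 8))
installed = (rng.random(3000) < 0.48).astype(int)      # ~ observed install rate
payment = np.exp(rng.normal(-1.5, 0.5, size=3000))     # payment | install, in $

# (i) install probability conditional on a click-through
install_model = RandomForestClassifier(n_estimators=200, random_state=0)
install_model.fit(X_click, installed)

# (ii) advertiser payment conditional on an install
pay_model = RandomForestRegressor(n_estimators=200, random_state=0)
pay_model.fit(X_click[installed == 1], payment[installed == 1])

def expected_revenue(X_users, p_click):
    """Per-impression revenue: P(click) * P(install | click) * E[payment | install]."""
    p_install = install_model.predict_proba(X_users)[:, 1]
    pay = pay_model.predict(X_users)
    return p_click * p_install * pay

rev = expected_revenue(X_click[:5], p_click=0.02)
```

In the counterfactuals, p_click is replaced by the model-implied click-through rate under each ad exposure duration.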
The results in Table 10 show that if we uniformly reduce the ad exposure duration from the
30s benchmark to half a second for all users (second row of Table 10), total revenue would
increase by 6.35%. Moreover, because counterfactual click-through rates are virtually constant
for ad exposure duration ≥ 3 seconds (as in Figure 7), there are only minor effects on publisher
revenue from showing the entire ad of 30 seconds vs. cutting it off at 3 seconds. Indeed,
reducing ad exposure to 2 seconds uniformly for all users yields roughly the same revenue
(-0.30%) as the 30-second benchmark. These results may arise in part from the fact that
the advertised app is a “sequel” in a popular series of games, so that users may have little
information to glean about the advertised app, apart from the fact that it exists.
However, the publisher can do much better: their revenue can be increased further by per-
Table 10: Publishers’ advertising revenue under different counterfactual scenarios.
Counterfactual forced ad exposure duration % change in revenue relative to 30s
Uniform treatment:30s (benchmark) —0.5s +6.35%2s -0.30%Optimized treatment by segment:0.5s for negative users; 30s for positive users +18.03%0.5s for negative users; 2s for positive users +17.47%
sonalizing the counterfactual design treatment across users based on observable covariates. In
Figure 8, we plot the counterfactual revenue generated by representative users for the two user
segments, as a function of the ad exposure duration. (Revenues are denominated in dollars per
thousand user impressions, corresponding to the standard “CPM” measure in the advertising
industry.) For the negative users, the CPM is maximized at just over $2.15 when we cut the ad at 1s, while for positive users, revenues are maximized at $2.56 when ad exposure is maintained at the 30s benchmark.33 Accordingly, the bottom rows of Table 10 show that the publisher's revenue increases by 18.03% relative to the benchmark when the ad exposure durations are optimized by segment (0.5s for negative users, 30s for positive users).
6 Conclusions
We study how choice and response time data from the field can be combined to estimate
a structural model of smartphone users’ responses to mobile advertisements. We consider
a two-stage drift-diffusion model in which the combination of response time with choice data
allows separate identification of the diffusion processes characterizing users’ preferences when
the ad is playing, as well as when users face a subsequent decision to click-through on the
ad. In our analysis, we find two segments of users: a “positive” segment is initially drawn to the advertised app (and is more likely to click-through, with longer response times), while a second “negative” segment responds much more negatively to the advertisement (and is less likely to click-through). We also find that these two segments differ substantially on out-of-sample metrics. Negative users are more engaged in the apps they install, and conditional on
33 For perspective, Instagram's average CPM for US users was $5.10 in 2016 Q1 (Salesforce Advertising Index Q1 2016 Report).
Figure 8: Counterfactual ad revenue

[Figure: publisher's ad revenue, in $ per thousand impressions (vertical axis, 0–8), against ad exposure duration in seconds (horizontal axis, 0–10).]
installation, spend more in-game.
Using our estimates, we address the counterfactual of whether users should be permitted to
skip part or all of a video advertisement before making a choice. Overall, we find that allowing
users to skip the ad after 3 seconds yields roughly the same revenue as forcing them to view the
entire thirty-second ad, thus rationalizing the practice of some platforms (e.g. YouTube) where
users can skip an ad after watching for a few seconds. However, the effects are very heteroge-
neous across users. While positive users’ propensity to click on the ad increases over time, the
opposite is true for negative users; overall, however, publishers’ revenues are optimized withh
shorter forced ad exposures, as this appeals more to the more lucrative negative users. Ad
revenue can be increased even further if the “skip-ability" of the ad is targeted and personalized
according to users’ demographics.
We conclude by mentioning several extensions. First, given the results in this paper, one
natural question is whether response time data can be useful for choice prediction. Such
usefulness has already been demonstrated in choice experiments in the lab,34 and it will be
interesting to examine whether these benefits are also present in field data.
34 See Clithero (2018a) and the citations therein.
Second, the use of structural estimation and modeling to address the effects of skippable
vs. non-skippable ads is novel relative to existing methodologies for determining policy effects
in online and mobile platforms, which typically involve randomized A/B testing.35 These two
approaches are complementary. While A/B testing yields convenient measures of policy effects,
the structural parameters estimated in our approach allow us to identify segments of behav-
iorally distinct users – the negative and positive users – which are policy-relevant and important
for effective ad targeting. More generally, a structural model is required to extract information
from endogenous variables like response time. Otherwise, simply including response times as
an explanatory variable in a regression leads to difficulties in both interpretation of parameters
and choice prediction.
Finally, most clickstream datasets keep track of consumers’ choices in multinomial choice
settings, involving choice among more than two items. A common extension of the DDM
for multinomial choices is the “race” model, in which consumers face N competing accumulation
processes, and the first process to hit a common “finish line” is chosen.36 Formally, this model is
equivalent to the competing risks model from statistical duration analysis, and it will be interesting
to explore these connections.
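For concreteness, the race model can be sketched in a few lines of simulation code. This is a hypothetical illustration only: the drift rates, barrier, and function names below are made up for exposition and are not estimates from our data.

```python
import numpy as np

def simulate_race(drifts, barrier=1.0, sigma=1.0, dt=0.01, t_max=30.0, rng=None):
    """Simulate N competing accumulators; return (winning index, hitting time).

    Each accumulator follows dZ_i = drifts[i] dt + sigma dB_i; the first to
    reach `barrier` is "chosen". Returns (None, t_max) if none hits in time.
    """
    rng = rng or np.random.default_rng()
    drifts = np.asarray(drifts, dtype=float)
    z = np.zeros_like(drifts)
    t = 0.0
    while t < t_max:
        z += drifts * dt + sigma * np.sqrt(dt) * rng.standard_normal(drifts.size)
        t += dt
        hit = np.flatnonzero(z >= barrier)
        if hit.size:
            # Tie-break (rare): pick the accumulator furthest past the barrier.
            return int(hit[np.argmax(z[hit])]), t
    return None, t_max

rng = np.random.default_rng(0)
results = [simulate_race([0.8, 0.4, 0.2], rng=rng) for _ in range(2000)]
choices = [c for c, _ in results if c is not None]
```

Across simulated trials, the accumulator with the largest drift should win most often, and the resulting (choice, hitting-time) pairs are exactly the kind of data that competing-risks duration methods are designed to handle.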
35 Lewis & Rao (2015) discuss limitations of using randomized trials for evaluating online advertising campaigns.
36 See Webb (2019) and Marley & Colonius (1992).
References
Ackerberg, D. A. (2009). A new use of importance sampling to reduce computational burden in simulation estimation. Quantitative Marketing and Economics, 7(4), 343–376.
Alili, L., Patie, P., & Pedersen, J. L. (2005). Representations of the first hitting time density of an Ornstein-Uhlenbeck process. Stochastic Models, 21(4), 967–980.
Alos-Ferrer, C., Fehr, E., & Netzer, N. (2018). Time will tell: Recovering preferences when choices are noisy. SSRN Working Paper.
Baldassi, C., Cerreia-Vioglio, S., Maccheroni, F., Marinacci, M., & Pirazzini, M. (2019). A behavioral characterization of the drift diffusion model and its multi-alternative extension to choice under time pressure. Working Paper.
Block, H. & Marschak, J. (1960). Random orderings and stochastic theories of responses. Cowles Foundation Discussion Paper, 7(147), 97–132.
Bogacz, R., Brown, E., Moehlis, J., Holmes, P., & Cohen, J. D. (2006). The physics of optimal decision making: a formal analysis of models of performance in two-alternative forced-choice tasks. Psychological Review, 113(4), 700.
Branco, F., Sun, M., & Villas-Boas, J. M. (2012). Optimal search for product information. Management Science, 58(11), 2037–2056.
Busemeyer, J. R. & Townsend, J. T. (1992). Fundamental derivations from decision field theory. Mathematical Social Sciences, 23(3), 255–282.
Chen, R. & Chiong, K. (2016). An empirical model of mobile advertising platforms. Working paper.
Cisek, P., Puskas, G. A., & El-Murr, S. (2009). Decisions in changing conditions: the urgency-gating model. Journal of Neuroscience, 29(37), 11560–11571.
Clithero, J. A. (2018a). Improving out-of-sample predictions using response times and a model of the decision process. Journal of Economic Behavior & Organization, 148, 344–375.
Clithero, J. A. (2018b). Response times in economics: Looking through the lens of sequential sampling models. Journal of Economic Psychology.
Crawford, G. S. & Shum, M. (2005). Uncertainty and learning in pharmaceutical demand. Econometrica, 73(4), 1137–1173.
Drugowitsch, J. (2016). Fast and accurate Monte Carlo sampling of first-passage times from Wiener diffusion models. Scientific Reports, 6, 20490. Software package available at https://github.com/DrugowitschLab/dm.
Dukes, A. J., Liu, Q., & Shuai, J. (2018). Interactive advertising: The case of skippable ads. SSRN.
Echenique, F. & Saito, K. (2017). Response time and utility. Journal of Economic Behavior & Organization, 139, 49–59.
Erdem, T. & Keane, M. P. (1996). Decision-making under uncertainty: Capturing dynamic brand choice processes in turbulent consumer goods markets. Marketing Science, 15(1), 1–20.
Fehr, E. & Rangel, A. (2011). Neuroeconomic Foundations of Economic Choice—Recent Advances. Journal of Economic Perspectives, 25(4), 3–30.
Frydman, C. & Krajbich, I. (2017). Using response times to infer others’ beliefs: An application to information cascades. SSRN Working Paper.
Frydman, C. & Nave, G. (2016). Extrapolative beliefs in perceptual and economic decisions: Evidence of a common mechanism. Management Science, 63(7), 2340–2352.
Fudenberg, D., Newey, W., Strack, P., & Strzalecki, T. (2019). Testing the drift diffusion model. SSRN Working Paper.
Fudenberg, D., Strack, P., & Strzalecki, T. (2018). Speed, accuracy, and the optimal timing of choices. American Economic Review, 108, 3651–3684.
Gold, J. I. & Shadlen, M. N. (2002). Banburismus and the brain: decoding the relationship between sensory stimuli, decisions, and reward. Neuron, 36(2), 299–308.
Gossner, O., Steiner, J., & Stewart, C. (2018). Attention Please! Working Paper.
Gwinn, R., Leber, A. B., & Krajbich, I. (2019). The spillover effects of attentional learning on value-based choice. Cognition, 182, 6–11.
Hare, T. A., Schultz, W., Camerer, C. F., O’Doherty, J. P., & Rangel, A. (2011). Transformation of stimulus value signals into motor commands during simple choice. Proceedings of the National Academy of Sciences, 18120–18125.
Hebert, B. & Woodford, M. (2017). Rational inattention with sequential information sampling. Stanford Working Paper.
Hong, H. & Shum, M. (2006). Using price distributions to estimate search costs. RAND Journal of Economics, 37(2), 257–275.
Honka, E. (2014). Quantifying search and switching costs in the US auto insurance industry. RAND Journal of Economics, 45(4), 847–884.
Ke, T. & Villas-Boas, M. (2019). Optimal learning before choice. Journal of Economic Theory, 180, 383–437.
Konovalov, A. & Krajbich, I. (2019). Revealed strength of preference: Inference from response times. Judgment and Decision Making.
Krajbich, I. (2019). Accounting for attention in sequential sampling models of decision making. Current Opinion in Psychology, 29.
Krajbich, I., Armel, C., & Rangel, A. (2010). Visual fixations and the computation and comparison of value in simple choice. Nature Neuroscience, 13(10), 1292.
Krajbich, I., Bartling, B., Hare, T., & Fehr, E. (2015). Rethinking fast and slow based on a critique of reaction-time reverse inference. Nature Communications, 6, 7455.
Lehmann, E. (1959). Testing Statistical Hypotheses. Wiley, first edition.
Lewis, R. A. & Rao, J. M. (2015). The unfavorable economics of measuring the returns to advertising. Quarterly Journal of Economics, 130(4), 1941–1973.
Link, S. W. (1992). The wave theory of difference and similarity. Psychology Press.
Lu, J. & Hutchinson, J. W. (2017). Split-second decisions during online information search: Static vs. dynamic decision thresholds for making the first click. Technical report, Wharton.
Luce, R. D. (1991). Response Times: Their Role in Inferring Elementary Mental Organization. Oxford.
Marley, A. A. J. & Colonius, H. (1992). The “horse race” random utility model for choice probabilities and reaction times, and its competing risks interpretation. Journal of Mathematical Psychology, 36(1), 1–20.
Mormann, M., Navalpakkam, V., Koch, C., & Rangel, A. (2012). Relative visual saliency differences induce sizable bias in consumer choice. Journal of Consumer Psychology, 22, 67–74.
Morris, S. & Strack, P. (2019). The Wald problem and the relation of sequential sampling and ex-ante information costs. SSRN Working Paper.
Otter, T., Allenby, G. M., & Van Zandt, T. (2008). An integrated model of discrete choice and response time. Journal of Marketing Research, 45(5), 593–607.
Ratcliff, R. & McKoon, G. (2008). The diffusion decision model: theory and data for two-choice decision tasks. Neural Computation, 20(4), 873–922.
Roitman, J. & Shadlen, M. (2002). Response of neurons in the lateral intraparietal area during a combined visual discrimination reaction time task. Journal of Neuroscience, 22(21), 9475–9489.
Seiler, S. & Pinna, F. (2017). Estimating search benefits from path-tracking data: measurement and determinants. Marketing Science, 36(4), 565–589.
Shadlen, M. & Shohamy, D. (2016). Decision making and sequential sampling from memory. Neuron, 90, 927–939.
Shadlen, M. N., Hanks, T. D., Churchland, A. K., Kiani, R., & Yang, T. (2006). The speed and accuracy of a simple perceptual decision: a mathematical primer. In K. Doya, S. Ishii, A. Pouget, & R. Rao (Eds.), Bayesian Brain: Probabilistic Approaches to Neural Coding (pp. 201–236). MIT Press.
Smith, P. L. (2000). Stochastic dynamic models of response time and accuracy: A foundational primer. Journal of Mathematical Psychology, 44(3), 408–463.
Thura, D., Beauregard-Racine, J., Fradet, C.-W., & Cisek, P. (2012). Decision making by urgency gating: theory and experimental support. Journal of Neurophysiology, 108(11), 2912–2930.
Ursu, R., Wang, Q., & Chintagunta, P. (2018). Search duration. Working paper.
Usher, M. & McClelland, J. L. (2001). The time course of perceptual choice: the leaky, competing accumulator model. Psychological Review, 108(3), 550.
Wald, A. (1973). Sequential analysis. John Wiley.
Webb, R. (2019). The (Neural) Dynamics of Stochastic Choice. Management Science, 65, 230–255.
Wiecki, T. V., Sofer, I., & Frank, M. J. (2013). HDDM: Hierarchical Bayesian estimation of the drift-diffusion model in Python. Frontiers in Neuroinformatics, 7, 14.
Woodford, M. (2014). Stochastic choice: An optimizing neuroeconomic model. American Economic Review, 104(5), 495–500.
Woodford, M. (2016). Optimal evidence accumulation and stochastic choice. Manuscript, Columbia University.
A Proof of Lemma 1
Proof. A derivation from Shadlen et al. (2006), Equations (10.23) and (10.33), shows that the simple DDM $dZ(t) = \mu\, dt + \sigma\, dB(t)$ implies:

$$y = E[y_i] = \frac{1}{e^{-2B\mu/\sigma^2} + 1} \qquad (8)$$

$$\bar{t} = E[t^*_i] = \frac{B\left(e^{B\mu/\sigma^2} - e^{-B\mu/\sigma^2}\right)}{\mu\left(e^{B\mu/\sigma^2} + e^{-B\mu/\sigma^2}\right)} \qquad (9)$$

To see that $\sigma^2$ is not identified through the moments above, suppose we multiply $\sigma^2$ by $k^2$. Then if we multiply $B$ by $k$ and multiply $\mu$ by $k$, the two equations above remain unchanged. That is, $(\sigma^2, B, \mu)$ and $(k^2\sigma^2, kB, k\mu)$ are both solutions to the equations above.

First, we solve for $\mu$ and $B$ by normalizing $\sigma = 1$. (If we instead normalize $\sigma = \bar{\sigma}$, then $(\bar{\sigma}B, \bar{\sigma}\mu)$ would be the corresponding solution.) Given the normalization $\sigma = 1$, solving these equations for $\mu$ and $B$ yields only two real solutions. In one of them $\mu$ is increasing in $y$, and in the other $\mu$ is decreasing in $y$; therefore the first solution is the valid one.

Moreover, when we solve for the first solution for $\mu$, we get:

$$\mu = \sqrt{\frac{2y-1}{2\bar{t}}}\,\sqrt{\log\left(\frac{y}{1-y}\right)} \qquad (10)$$

Now when $y < 0.5$, both $\frac{2y-1}{2\bar{t}}$ and $\log\left(\frac{y}{1-y}\right)$ are negative. Therefore, we can rewrite Equation (10) as $\mu = \left(\sqrt{-1}\right)^2 \sqrt{\left|\frac{2y-1}{2\bar{t}}\right|}\,\sqrt{\left|\log\left(\frac{y}{1-y}\right)\right|} = -\sqrt{\frac{(2y-1)\log\left(\frac{y}{1-y}\right)}{2\bar{t}}}$. When $y > 0.5$, both $\frac{2y-1}{2\bar{t}}$ and $\log\left(\frac{y}{1-y}\right)$ are positive, and we have $\mu = \sqrt{\frac{(2y-1)\log\left(\frac{y}{1-y}\right)}{2\bar{t}}}$. This leads to the following solution (which we additionally verified using numerical solvers):

$$\mu = \begin{cases} -\sqrt{\dfrac{(2y-1)\log\left(\frac{y}{1-y}\right)}{2\bar{t}}} & \text{if } y < 0.5 \\[2ex] \phantom{-}\sqrt{\dfrac{(2y-1)\log\left(\frac{y}{1-y}\right)}{2\bar{t}}} & \text{if } y \geq 0.5 \end{cases}$$

$$B = \frac{1}{2}\sqrt{\frac{2\bar{t}\log\left(\frac{y}{1-y}\right)}{2y-1}}$$

$\square$
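The closed-form solution in Lemma 1 can be checked numerically. The sketch below is our own illustration (the function names are ours, not from any replication code): it inverts the moment equations under the normalization $\sigma = 1$, and confirms the round trip reproduces $(y, \bar{t})$. It uses the fact that Equation (9) simplifies to $(B/\mu)\tanh(B\mu/\sigma^2)$, and also confirms the scale indeterminacy $(\sigma^2, B, \mu) \mapsto (k^2\sigma^2, kB, k\mu)$.

```python
import numpy as np

def solve_mu_B(y, t_bar):
    """Closed-form (mu, B) from the hitting probability y and mean response
    time t_bar, under the normalization sigma = 1 (Lemma 1)."""
    lo = np.log(y / (1 - y))
    # (2y - 1) and log(y / (1 - y)) always share a sign, so the ratio under
    # the square root is positive; the sign of mu follows sign(2y - 1).
    mu = np.sign(2 * y - 1) * np.sqrt(abs((2 * y - 1) * lo / (2 * t_bar)))
    B = 0.5 * np.sqrt(2 * t_bar * lo / (2 * y - 1))
    return mu, B

def moments(mu, B, sigma=1.0):
    """Forward map, Equations (8)-(9): choice probability and mean RT.
    Equation (9) equals (B / mu) * tanh(B * mu / sigma^2)."""
    y = 1.0 / (np.exp(-2 * B * mu / sigma**2) + 1.0)
    t_bar = (B / mu) * np.tanh(B * mu / sigma**2)
    return y, t_bar

# Round trip: moments at the recovered (mu, B) reproduce (y, t_bar).
mu, B = solve_mu_B(0.7, 2.0)
y_chk, t_chk = moments(mu, B)

# Scale indeterminacy: (k^2 sigma^2, kB, k mu) leaves both moments unchanged.
k = 3.0
y_alt, t_alt = moments(k * mu, k * B, sigma=k)
```

Here `y_chk` and `t_chk` recover 0.7 and 2.0 up to floating-point error, and the rescaled parameters yield the same moments, mirroring the non-identification argument above.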
B How good is the discrete approximation?
Consider approximating the stochastic process $dZ_t = \mu\, dt + \gamma Z_t\, dt + \sigma\, dW_t$ via $Z_{t_j} - Z_{t_{j-1}} = \mu \Delta t + \gamma Z_{t_{j-1}} \Delta t + \sigma \Delta W_{t_j}$, where $t_j - t_{j-1} = \Delta t$ for all $j = 1, \ldots$. Let $J = \lfloor \frac{t}{\Delta t} \rfloor$ denote the number of discrete time intervals at time $t$. The discretized process at time $t$ has the following distribution:

$$Z_{t_J} \sim N\left( \mu \Delta t \sum_{i=0}^{J-1} (1 + \gamma \Delta t)^i,\; \sigma^2 \Delta t \sum_{i=0}^{J-1} (1 + \gamma \Delta t)^{2i} \right) \qquad (11)$$

or, equivalently:

$$Z_{t_J} \sim N\left( \mu\, \frac{1 - (1 + \gamma \Delta t)^J}{-\gamma},\; \sigma^2 \Delta t\, \frac{1 - (1 + \gamma \Delta t)^{2J}}{1 - (1 + \gamma \Delta t)^2} \right) \qquad (12)$$

As $\Delta t \to 0$, we check that $\lim_{\Delta t \to 0} \mu\, \frac{1 - (1 + \gamma \Delta t)^{t/\Delta t}}{-\gamma} = \frac{\mu}{\gamma}\left(e^{\gamma t} - 1\right)$ and $\lim_{\Delta t \to 0} \sigma^2 \Delta t\, \frac{1 - (1 + \gamma \Delta t)^{2t/\Delta t}}{1 - (1 + \gamma \Delta t)^2} = \frac{\sigma^2}{2\gamma}\left(e^{2\gamma t} - 1\right)$. Suppose that at $t = 30$, $\mathrm{Var}(Z_{30}) = 2.0833$. This number is implied by the estimation result (Column 2 of Table 4). Using the time increment $\Delta t = 0.25$, the resulting $\gamma_0$ is $-0.247$. Now as $\Delta t \to 0$, we have $\gamma_0 \to -0.240$. Therefore, the time increment of a quarter-second appears to do a reasonable job of approximating the continuous-time density.
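This check can be reproduced with a short script. The sketch below is our own (the bisection solver and all names are ours; $\sigma$ is normalized to 1): it solves for the $\gamma$ that matches $\mathrm{Var}(Z_{30}) = 2.0833$ under the discrete variance formula with $\Delta t = 0.25$, and under its continuous-time limit.

```python
import math

DT = 0.25            # quarter-second time increment, as in the estimation
T = 30.0             # horizon in seconds (the thirty-second ad)
TARGET_VAR = 2.0833  # Var(Z_30) implied by the estimates (Table 4, Column 2)

def discrete_var(gamma, dt=DT, t=T, sigma=1.0):
    """Variance of the Euler-discretized process at time t, Equation (12)."""
    J = int(t / dt)
    a = 1.0 + gamma * dt
    return sigma**2 * dt * (1.0 - a**(2 * J)) / (1.0 - a**2)

def continuous_var(gamma, t=T, sigma=1.0):
    """Continuous-time limit: sigma^2 (e^{2 gamma t} - 1) / (2 gamma)."""
    return sigma**2 * (math.exp(2.0 * gamma * t) - 1.0) / (2.0 * gamma)

def solve_gamma(var_fn, target=TARGET_VAR, lo=-1.0, hi=-1e-6, tol=1e-10):
    """Bisect for the negative gamma matching the target variance;
    the variance is increasing in gamma on this bracket."""
    f = lambda g: var_fn(g) - target
    assert f(lo) < 0 < f(hi), "root not bracketed"
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

gamma_discrete = solve_gamma(discrete_var)      # approx. -0.247
gamma_continuous = solve_gamma(continuous_var)  # approx. -0.240
```

The two roots are close to each other, matching the conclusion above that a quarter-second grid approximates the continuous-time process well.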