Combining Choices and Response Times in the Field:
a Drift-Diffusion Model of Mobile Advertisements∗
Khai Chiong
University of Texas at Dallas
Matthew Shum
California Institute of Technology
Ryan Webb
University of Toronto
Richard Chen
Happy Elements, Inc.
Abstract
We utilize the class of drift-diffusion models — originally developed in psychology and neuroeconomics to jointly explain subjects' choices and response times in laboratory experiments — to model users' responses to video advertisements on mobile devices. The combination of response time with choice data allows separate identification of the diffusion processes characterizing users when the ad is playing, as well as when they subsequently decide whether to click-through on the ad. Our estimates identify two segments of users that also exhibit distinct out-of-sample behaviour, including app engagement and in-app spending. Counterfactual simulations utilizing our estimates predict that ads with a 2-second duration yield roughly the same revenue as forcing a user to view the entire thirty-second ad, thus rationalizing the practice of some platforms (e.g. YouTube) where users can skip an ad after watching for a few seconds.
Keywords: Mobile advertising, Attention, Drift-diffusion model, Response times, Video advertisements, Skippable ads
JEL codes: L81, M37, D83, D87, C15, C22
1 Introduction
The notion of revealed preference — that agents’ preferences can be recovered from the
choices that they make — underlies large swaths of the empirical literature in economics. In

∗ First draft: December 2018; this draft: October 11, 2019. Comments welcome.
We thank Colin Camerer, John Clithero, Cary Frydman, Lawrence Jin, Shijie Lu, Rosemarie Nagel, Whitney Newey,
David Reiley, Xiaoxia Shi, Jakub Steiner, Tomasz Strzalecki, Pengfei Sui and seminar attendees at Berkeley (Haas),
Caltech, Chinese University (HK), Duke (Fuqua), HKUST, HK Baptist University, Ohio State, Toronto (Rotman), 2019
ESA (Los Angeles), 2019 Workshop on the Economics of Advertising and Marketing (Porto), 2019 DSE Conference
on Structural Dynamic Models (Chicago), 2018 California Econometrics Conference (Irvine) for helpful comments.
many choice situations, however, the amount of time it takes decision-makers to make a choice
(their response time) is also sharply informative about their preferences (Frydman & Krajbich,
2017; Konovalov & Krajbich, 2019; Alos-Ferrer, Fehr & Netzer, 2018).1 A swift food choice at a
restaurant may indicate strong preferences for the chosen dish relative to the other available op-
tions; the time an online shopper spends deciding between different clothing items may suggest
each item is equally stylish in the shopper’s mind. In their seminal formulation of the Random
Utility model, Block & Marschak (1960) describe:
“The shorter the delay, the larger is the difference between certain values said to be attached by the subject to two alternative responses. A very long delay reveals a state of (almost) indifference, the ‘conflict-situation’: Hamlet took a very long time to decide whether to kill his uncle."
In this paper, we consider a structural model of choice which incorporates not only decision-
makers’ eventual choices, but also their response times. To understand why decisions take
time, recent theoretical work has examined models of sequential information sampling (Wood-
ford, 2014; Fudenberg, Strack & Strzalecki, 2018; Gossner, Steiner & Stewart, 2018; Hebert &
Woodford, 2017; Morris & Strack, 2019; Fudenberg, Newey, Strack & Strzalecki, 2019). In this
setting, the utilities of options are unknown and must be learned by the (costly) sampling of noisy
information. Building on research in psychology and neuroeconomics, this approach models the
decision process as an accumulation and comparison of this noisy information over time. Here,
we consider the drift-diffusion model (hereafter DDM) postulated for modeling agents’ choices
in settings where decisions are made on the order of seconds (Fehr & Rangel, 2011). In the
DDM, decision-makers accumulate information about the match values, or utilities, of available
options in a choice problem. The time it takes a decision-maker to accumulate enough infor-
mation or evidence favoring a given option is modeled directly; therefore both the choice and
response time are endogenously related as part of a dynamic choice process. The parame-
ters governing this information accumulation process can then be estimated from choice and
response time data.
The DDM and its various extensions have been well-tested in laboratory experiments (see
Clithero, 2018b, for a review). However, the applicability of this model to real-world, consequential decisions has remained unexamined. Our contribution is to apply the sequential-sampling

1 Response times are referred to by a variety of names, including reaction times, decision times, latencies (in psychology), and dwell times (in tech companies).
framework in a field setting: we adapt the drift-diffusion model to analyze responses to ad-
vertisements on mobile devices, a decision scenario which in recent years has become the
dominant component of internet advertising.2 An important vehicle of mobile advertising is the video trailer, in which users of an app are shown a short video advertisement before continuing to use the app. After the ad video finishes, users are prompted to make a binary choice
to either “click-through” to the app-store and download the advertised app, or “click-back” and
return to the app which they were using (the “originating” app). This field setting, which directs
users’ attention towards the advertised item as the video plays, closely mimics the types of de-
cision tasks previously used to calibrate the DDM and measure attentional manipulations in the
laboratory (Mormann, Koch & Rangel, 2012; Gwinn, Leber & Krajbich, 2019).3 We can thus use
a dataset on users’ decisions and their corresponding response times to estimate a structural
econometric model based on the DDM.
Our model is a “two-stage" extension of the drift-diffusion model motivated by our mobile ad-
vertising setting. In the initial ad exposure stage, the user is exposed to the ad and cannot make
any decision while the video ad is being played (the ad is “non-skippable", in industry parlance:
it must be played in its entirety). In the second decision stage, after the video ad finishes, users
are presented with a visual end-screen prompting them to either click-through or click-back. By
combining response times with choice data, we can separately identify the parameters of the
information accumulation process during both the ad exposure stage as well as the decision
stage, and examine how users’ behaviour differs in response to the advertisement.
We find that both as the ad plays and after it is finished, users’ diffusion processes drift, on
average, away from the advertised app, implying that an average user perceives a negative utility difference for the advertised app vis-à-vis the originating app. However, these average effects mask a great deal of user-level heterogeneity. For many users, the estimated diffusion process drifts toward the advertised app as the video ad plays and, indeed, a key
finding is that users can be segmented into two groups depending on whether their estimated
diffusion process drifts toward or away from the decision threshold for the advertised app during the ad exposure stage. These two segments of users are behaviorally distinct, and exhibit striking differences in out-of-sample behavioral metrics including app engagement and in-app spending.

2 In 2018, mobile advertising in the United States accounted for roughly 75% of all spending on digital ads, eclipsing TV advertising for the first time (https://www.forbes.com/sites/johnkoetsier/2018/02/23/mobile-advertising-will-drive-75-of-all-digital-ad-spend-in-2018-heres-whats-changing/).
3 Indeed, the DDM has been previously applied to model primates watching videos, specifically the accumulation of visual evidence in a motion discrimination task (Roitman & Shadlen, 2002). In this study, the accumulation of neural activity in the monkey parietal cortex varied with the strength of the visual stimuli.
While many empirical studies of advertising have focused on measuring the effects of ad-
vertisements on consumer choices, our structural model of the decision process permits us to
go further and assess empirically the optimal design and format of advertisements. We use our
estimates to address an important design problem for mobile advertising platforms: whether
advertisers should make their ads “skippable”: that is, to allow ad viewers to skip part or all of
the ad and make a choice before the ad finishes. We find that the click-through rates from ex-
posing viewers to the entire 30-second ad during the ad exposure stage are virtually the same
as cutting the ad after its initial five seconds. This rationalizes the practice of some websites,
notably YouTube, where users can skip an ad after viewing the first few seconds.
However, these effects are also very heterogeneous across the two user segments identified in our estimates: one segment (the users whose process drifts away from the advertised app) is more willing to click on short ads, while for the other segment the opposite holds. Such heterogeneity suggests
that ad format, just like ad content, could be targeted and individualized according to user
demographics; indeed, ad revenue would be even higher if the skippability of the ad could be
targeted to the two user segments differently. Our results suggest that mobile advertisements
have high-frequency effects which vary substantially both across users and over time. The use
of response times in conjunction with choice data is important for identifying these effects.
More broadly, the DDM-based framework used in this paper provides a way of integrating
endogenous response time data organically into an empirical analysis. As more “clickstream”
datasets are becoming available to researchers in economics and marketing — in which times-
tamps are recorded for all the choices that agents are observed to make — econometric meth-
ods and models for utilizing this additional data source are crucial.4 As far as we are aware,
this paper is the first attempt to apply a model in the drift-diffusion paradigm to field data, which
demonstrates the value of applying this framework beyond the controlled laboratory settings in
which it has previously been used.5
4 For an extended discussion of the endogeneity of response times in discrete choice settings, see Webb (2019).
5 A handful of papers have estimated the structural parameters of the DDM using data from laboratory experiments, including Frydman & Nave (2016) and Clithero (2018a); see Clithero (2018b) for a review.
The existing literature in marketing contains a small set of papers incorporating both choices
and response times in a (non-DDM) structural choice model. An early paper of this type is Otter,
Allenby & Van Zandt (2008), who incorporated response times in a conjoint analysis. More
recently, Seiler & Pinna (2017) and Ursu, Wang & Chintagunta (2018) incorporate response
time into a structural model of consumer search.6
2 The Drift-Diffusion Model (DDM) Paradigm
The drift-diffusion model (DDM) originated in psychology as a model of subjects’ choices and
reaction times for decisions typically made within a matter of seconds.7 The DDM and its
various extensions, often referred to as “sequential sampling models", have been successfully
verified and calibrated in a wide variety of laboratory experiments and mapped to neural ac-
tivity in the brain.8 More recently, the modeling framework has been applied in the context of
economic choices.9
Consider a binary choice situation, where the difference in the utilities of the two options de-
termines an optimizing agent’s choice. In a sequential sampling model, the utilities are assumed
to be unknown to the agent, who only learns about them gradually over time. Mathematically,
the perceived utility difference between option 1 and option 0 can be modeled as a Gaussian
process Z(t) which evolves according to the stochastic differential equation
dZ(t) = (μ + λ Z(t)) dt + σ dW(t),   t > 0,   Z(0) = 0.   (1)
This stochastic process can be interpreted as a process of “stochastic evidence accumulation”
in favor of one or the other alternative. Agents do not observe the drift term μ, which represents the true utility difference between alternatives; rather, they only observe the noisy signal process Z(t), where the noise follows a Wiener process W(t) with standard deviation σ.
Stochastic evidence accumulation is a natural interpretation for settings where information
about utilities arrives sequentially and must be integrated by the decision-maker, as in a typical psychophysical experiment or the sequence of video images in an advertisement. Recent work also suggests that a stochastic accumulation process applies to decisions in which evidence is sampled (recalled) directly from the decision-maker’s memory, rather than simply from sensory evidence (Shadlen & Shohamy, 2016).

6 Relatedly, Krajbich, Bartling, Hare & Fehr (2015) critique the psychology literature which has used only reaction time data to infer agents’ choice rules, without taking subjects’ preferences or valuations of the choices into account.
7 We follow the presentation in Webb (2019). Book-length treatments of this modeling framework include Luce (1991) and Link (1992). Detailed mathematical derivations of accumulation models are presented in Smith (2000). Also see Busemeyer & Townsend (1992), Ratcliff & McKoon (2008), Krajbich, Armel & Rangel (2010).
8 Gold & Shadlen (2002), Hare, Schultz, Camerer, O’Doherty & Rangel (2011).
9 Krajbich et al. (2010), Frydman & Nave (2016), Clithero (2018a), Fudenberg et al. (2018).
The λ parameter captures serial dependence in the diffusion process, accommodating the “carryover” effects familiar from the empirical literature on advertising. In the DDM literature, λ is called a “leakage" parameter (e.g. Usher & McClelland, 2001). When λ = 0, the stochastic process (1) is a continuous-time random walk with drift in which each sample is independent and weighted equally. When λ ≠ 0, the information accumulated prior to t can affect the information perceived by the user at time t. As we will see below,
the inclusion of these parameters is important for explaining observed patterns in the response
time data.
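To fix ideas, the accumulation process in Eq. (1) can be simulated with a simple Euler–Maruyama scheme. The sketch below uses illustrative parameter values (not estimates from this paper) and shows the role of leakage: with λ < 0 the dispersion of Z(t) stabilizes, while with λ = 0 it grows linearly in t.

```python
import numpy as np

def simulate_z(mu, lam, sigma, T=30.0, dt=0.25, n_paths=20000, seed=0):
    """Euler-Maruyama simulation of dZ = (mu + lam*Z) dt + sigma dW, Z(0) = 0;
    returns the terminal positions Z(T) across paths."""
    rng = np.random.default_rng(seed)
    z = np.zeros(n_paths)
    for _ in range(int(T / dt)):
        z += (mu + lam * z) * dt + sigma * rng.normal(0.0, np.sqrt(dt), n_paths)
    return z

# With lam < 0 ("leakage"), the cross-path variance of Z(T) stabilizes near
# sigma^2 / (2|lam|); with lam = 0 it grows like sigma^2 * T.
z_leak = simulate_z(mu=0.1, lam=-0.5, sigma=1.0)
z_rw = simulate_z(mu=0.1, lam=0.0, sigma=1.0)
```

Comparing `z_leak.var()` with `z_rw.var()` makes the stabilizing effect of negative leakage visible directly.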
Figure 1: A Sample Path of Z(t) in DDM. In this example, the user chooses 1, with a response time of t∗.
[Figure: a sample path of Z(t) evolving over time between the barriers −B and B, first hitting B at t∗.]
Paired with this utility difference process is a decision rule. The most common decision rule
is the symmetric “first passage” rule: for some threshold or barrier B > 0, the agent chooses
choice 1 once Z(t) > B and chooses choice 0 once Z(t) < −B. The response time is the first
passage time to either B or −B. See Figure 1. For the purposes of specification and estimation,
we will follow most applications of the DDM and take the threshold B to be exogenously given.10
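The first-passage rule can be sketched as a minimal simulation: run the diffusion of Eq. (1) until it crosses one of the two barriers, recording the choice and the crossing time. Parameter values below are illustrative placeholders, not the paper's estimates.

```python
import numpy as np

def first_passage(mu, lam, sigma, B, dt=0.01, t_max=60.0, seed=1):
    """Simulate one path of Eq. (1) until Z first crosses +B (choice 1) or -B
    (choice 0); returns (choice, response_time). Returns (None, t_max) in the
    (rare) event that neither barrier is reached by t_max."""
    rng = np.random.default_rng(seed)
    z, t = 0.0, 0.0
    while t < t_max:
        z += (mu + lam * z) * dt + sigma * rng.normal(0.0, np.sqrt(dt))
        t += dt
        if z >= B:
            return 1, t
        if z <= -B:
            return 0, t
    return None, t_max

choice, rt = first_passage(mu=0.5, lam=0.0, sigma=1.0, B=1.0)
```

Repeating this over many seeds traces out the joint distribution of choices and response times implied by a given (μ, λ, σ, B).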
10 Optimal decision rules in the no-leakage (λ = 0) setting have been studied since Wald’s (1973) sequential
Broadly speaking, the DDM choice framework resembles a “high-frequency” or “split-second”
version of dynamic search or learning models which have been analyzed theoretically and es-
timated in the empirical industrial organization and marketing literatures11, or duration models
in statistics.
3 Application: Video advertisements on mobile platforms
The number and variety of applications on mobile devices (“mobile apps”) has soared in the last
decade. A dominant revenue model for apps is the “freemium" model in which users download
the app for free, but are subsequently monetized with periodic exposure to advertisements. As
a result, mobile advertising has recently become the largest segment of digital advertising. We
will refer to the originating mobile app with which the user was engaged as the “publisher”, and
the creator of the advertisement as the “advertiser”. In most cases, both the advertiser as well
as the publisher are apps – an example could be users playing the Candy Crush app being
shown a video ad for the Clash of Clans app.12 Because of this, the same app can be active on
both the publisher and advertiser sides of the market, as they seek both to monetize existing
users by showing them ads, as well as acquire new users by advertising on other apps.
The most familiar ad format in mobile advertising is the “non-skippable” video ad: users on
a particular publisher app are served a 30-second video trailer ad, and are not allowed to skip
the ad. After the ad finishes playing, the endscreen (a “Call to Action” screen) is displayed and
users are prompted to take one of two actions. First, the user can close the ad, and return to
the publisher’s originating app. Second, the user can click on the ‘install’ button which takes the
user to an App Store, where the user has the opportunity to find out more about the advertised
app and install it. We will use “click-back” to denote the first action, and “click-through” to denote
the second.
sampling model (see discussions in Lehmann (1959) and Bogacz, Brown, Moehlis, Holmes & Cohen (2006)), up to
recent theoretical work by Fudenberg et al. (2018), Echenique & Saito (2017), Baldassi, Cerreia-Vioglio, Maccheroni,
Marinacci & Pirazzini (2019) and Ke & Villas-Boas (2019). Particularly, Fudenberg et al. (2018) show that optimal
decision rules in the DDM setting typically involve time-varying choice thresholds, and Fudenberg et al. (2019)
discuss testing and identification of this model. As far as we are aware, there are no optimality results for the
leakage (λ ≠ 0) case.
11 See, e.g., Erdem & Keane (1996), Crawford & Shum (2005), Hong & Shum (2006), Branco, Sun & Villas-Boas (2012), Honka (2014), Lu & Hutchinson (2017).
12 See Chen & Chiong (2016) for a study of a mobile advertising auction market in which advertising rates are set.
Our data come from a mobile ad network, which acts as an intermediary between advertisers
and publishers in this two-sided mobile advertising platform. We will focus on data from a single
advertiser-publisher pair, that is, the same ad is shown to users within a single app. Both the
publisher and advertiser are popular gaming apps, with daily average users numbering in the
millions. The advertised app, in particular, is one installment in a popular game franchise.
Thus, even within this single advertiser-publisher pair, we observe N = 194,607 ad servings,
or “impressions”, which occurred from October 1 through November 15, 2016. Since we
naturally do not observe users who were not served the ad, our analysis is entirely conditional
on a sample of users which both the platform and advertiser deemed to be profitable enough to
be shown this particular ad.
Table 1: Summary statistics.

Variables                                     Mean     Std. Dev.
User covariates:
  Android Version ∈ {1, …, 8}                 2.82     2.71
  Device Volume ∈ [0, 1]                      0.541    0.314
  iOS Dummy 0/1                               0.477    0.500
  iOS Version                                 4.69     4.91
  Samsung Dummy 0/1                           0.280    0.449
  Screen Resolution, pixels (/10^6)           1.30     0.832
  Connected via WiFi 0/1                      0.959    0.198
Choice variables:
  Click-through                               0.0209   0.143
  Response time, for click-through (secs.)    4.97     1.52
  Response time, for click-back (secs.)       4.14     1.73
# observations (“impressions”): 194,607
Table 1 summarizes our dataset. There is substantial observed heterogeneity across users,
and we include a number of user-specific covariates Xi in our analysis, as listed at the top of the
table. For the endogenous choice variables, 190,548 (97.91%) of impressions are click-backs,
and only 4,059 (2.09%) are click-throughs. This matches the industry norm, where click-through
rates are typically between 1.5% and 3%. We also dropped users with response times exceeding 8
seconds; such excessive response times likely arise from users’ inattention while the video ad
was playing (e.g., looking away from the phone). This left us with 194,607 observations (80%
of the full sample).13
We define the response time as the time (in seconds) elapsed from the end of the ad until
a decision has been made. We plot smoothed kernel densities of response time for click-backs
(y = 0) and click-throughs (y = 1) in Figure 2. Several interesting patterns emerge. First,
there is no “piling-up” in the response time distributions at t = 0, indicating that users are not
simply waiting for the ad to end in order to click-back at the first available opportunity. This lack
of piling-up plays an important role in the identification of model parameters, as we elaborate
below.
Figure 2: Empirical density of the actual observed response times
[Figure: probability density (y-axis) of response times in seconds (x-axis, 0–10), plotted separately for click-backs and click-throughs.]
Second, response times vary systematically across users’ choices: users who click-through
take on average 4.97 seconds (median=4.88), while those who click-back take on average only
4.14 seconds (median=3.99), which is significantly lower. Thus, response times matter since
they contain information on preferences. This feature of the reaction time distributions is a
robust characteristic across most ad campaigns, not just the one we chose for our analysis.
13 To assess robustness, we also estimated our model in turn on larger datasets where we removed observations with response times greater than 9 or 10 seconds. The results are very similar.
The histogram in Figure 3 shows that, across 439 large ad campaigns (advertiser-publisher
pairs) from our full dataset, the median time to click-through outstrips the median time to click-
back for most of them. This observation – that the less-frequent choice (e.g. click-through) is
selected after longer response times – is familiar in lab experimental settings involving attention
tasks and, indeed, motivated development of extensions to the DDM framework.14 At the very
least, this suggests that click-through decisions were not hasty or lazy decisions made with
little forethought: if anything, the longer response times for click-throughs suggest more mental
effort.
Figure 3: Histogram of response time differences for 439 large ad campaigns (advertiser/publisher pairs)
[Histogram: x-axis is the median RT of click-throughs minus the median RT of click-backs (in seconds); most of the mass lies above zero.]
14 For instance, in lab experiments where subjects were asked to judge which of two objective stimuli is larger, a “slow-errors” phenomenon was observed whereby mistakes tended to be rare events, but when subjects made them, the response time was longer (see discussion in Luce, 1991; Woodford, 2016). The standard DDM — one given by Eq. (1) with λ = 0, and with a fixed threshold B — cannot generate slow errors; instead, it implies that the
distribution of response times is the same regardless of choice. This arises from the independence and symmetry
of the increments; see Shadlen, Hanks, Churchland, Kiani & Yang (2006) for a discussion. The two-stage DDM we
propose below does not run into this difficulty.
3.1 A two-stage DDM of mobile advertisements
In order to map the DDM framework to the mobile advertising data, we propose a two-stage
extension of the DDM. In our model, users initially accumulate decision-relevant information
during the ad exposure stage, but are only able to actively make a choice in the second stage
(the decision stage) after the 30-second ad has finished. While users experience a diffusion
process in both stages, we permit the parameters of the process to vary across the two stages.
Changes in parameters may reflect differences in the types or sources of information between
the two stages: for instance, during the ad exposure stage, the predominant source of infor-
mation may be the video ad itself, but in the decision stage users may be comparing the new
information about the advertised app with their recollections of previous game-play experiences
about the originating app.15
Figure 4: Two-stage Drift-Diffusion Model
[Figure: during the ad exposure stage (0 ≤ t ≤ 30), Z0(t) evolves with no barriers; at t = 30 the decision stage begins and Z1(t) evolves until it hits one of the barriers B or −B.]
15 Previously, the attentional drift-diffusion model (“aDDM”) has been developed to examine the role of attention within a DDM framework (Krajbich et al., 2010; Krajbich, 2019). In this model, the parameters of the diffusion
process are affected by measurements of the eye-movements of a decision-maker. Analogously, the differences
in parameters in the two stages of our model can reflect attentional and informational differences between the two
stages.
Formally, during a user’s exposure to the 30-second video advertisement (in the initial ad
exposure stage), information accumulates as:
dZ0(t) = μ0 dt + λ0 Z0(t) dt + σ0 dW(t),   for 0 ≤ t ≤ 30.   (2)
The drift for this process is denoted μ0 and represents the utility differences during the first stage. In this stage, there is no barrier, as users are not allowed to make a decision while the ad plays. As the ad plays, the position of Z0(t) at any point in time is a Gaussian random variable:
Z0(t) ∼ N( (μ0/λ0)(e^{λ0 t} − 1),  (σ0²/(2λ0))(e^{2λ0 t} − 1) ),   for 0 ≤ t ≤ 30,   (3)

where N(a, b) denotes a Gaussian distribution with mean a and variance b.16
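The closed-form moments in Eq. (3) can be checked by direct Monte Carlo simulation. The sketch below uses illustrative parameter values and a function name (`ou_moments`) of our own choosing.

```python
import numpy as np

def ou_moments(mu0, lam0, sigma0, t):
    """Closed-form mean and variance of Z0(t) from Eq. (3), for lam0 != 0."""
    mean = (mu0 / lam0) * (np.exp(lam0 * t) - 1.0)
    var = (sigma0**2 / (2.0 * lam0)) * (np.exp(2.0 * lam0 * t) - 1.0)
    return mean, var

# Monte Carlo check via Euler steps (illustrative parameters)
rng = np.random.default_rng(2)
mu0, lam0, sigma0, T, dt, n = 0.2, -0.3, 1.0, 30.0, 0.05, 50000
z = np.zeros(n)
for _ in range(int(T / dt)):
    z += (mu0 + lam0 * z) * dt + sigma0 * rng.normal(0.0, np.sqrt(dt), n)
m, v = ou_moments(mu0, lam0, sigma0, T)
```

The simulated mean and variance of `z` should agree closely with the closed-form values `m` and `v`, up to discretization and sampling error.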
After the ad has finished playing, users have the opportunity to either click-back or click-through in the decision stage; the information accumulation process is now allowed to differ, governed instead by μ1 and λ1:

dZ1(t) = μ1 dt + λ1 Z1(t) dt + σ1 dW(t),   for t > 30.   (4)
As users are called upon to make a decision in this stage, a barrier B is present. Figure 4
illustrates the two-stage model. A user makes the choice y = 1 at the response time t∗ if and
only if the Z1(t) process first hits the barrier B at time t∗ > 30. Similarly, the choice is y = 0 if
it hits −B first. If the parameters governing these two processes are identical (e.g. μ0 = μ1), then the two signal processes are identical across the two stages.
As illustrated in Figure 4, the first stage affects the second by situating the initial position for
the second-stage diffusion process. That is, the initial condition for the decision stage accu-
mulation process Z1(t) coincides with the (random) position of the ad exposure stage process
Z0(t = 30) at the end of the video ad, as given by Equation 3.17
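This two-stage structure — draw the end-of-ad position from the Gaussian in Eq. (3), then run the barrier-crossing decision stage — can be sketched as follows. Parameter values are illustrative, and σ0 = σ1 = 2 mirrors the σ² = 1/Δt = 4 normalization used later in estimation.

```python
import numpy as np

rng = np.random.default_rng(3)

def draw_start(mu0, lam0, sigma0, t_ad=30.0, size=1):
    """Draw the end-of-ad position Z0(30) from the Gaussian in Eq. (3)."""
    mean = (mu0 / lam0) * (np.exp(lam0 * t_ad) - 1.0)
    var = (sigma0**2 / (2.0 * lam0)) * (np.exp(2.0 * lam0 * t_ad) - 1.0)
    return rng.normal(mean, np.sqrt(var), size)

def decision_stage(z0, mu1, lam1, sigma1, B, dt=0.25, t_max=10.0):
    """Run the decision-stage diffusion from starting point z0; returns
    (choice, response time measured from the end of the ad)."""
    z, t = z0, 0.0
    while t < t_max:
        z += (mu1 + lam1 * z) * dt + sigma1 * rng.normal(0.0, np.sqrt(dt))
        t += dt
        if z >= B:
            return 1, t          # click-through
        if z <= -B:
            return 0, t          # click-back
    return 0, t_max              # no crossing by t_max: treated as click-back here

# Illustrative (not estimated) parameters
starts = draw_start(mu0=-0.05, lam0=-0.2, sigma0=2.0, size=1000)
sims = [decision_stage(z, mu1=-0.1, lam1=-0.1, sigma1=2.0, B=6.0) for z in starts]
```

Because the starting point is random, the simulated response-time density has no mass point at zero, consistent with Figure 2.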
Moreover, from examination of Eq. (3), we see that the sign of the leakage parameter λ0 determines whether the mean and variance of the sample path Z0(t) converge (for λ0 < 0) or diverge (for λ0 > 0) as t → ∞; in the absence of leakage (λ0 = 0), the sample paths likewise diverge with
probability one. As we discuss below, the non-divergence of the diffusion in the initial stage
16 This follows from the Fokker-Planck equation; see also Appendix B for derivations.
17 Frydman & Nave (2016) also consider a DDM model with random starting points across subjects.
is important for fitting the data; in particular, it is crucial for generating the lack of piling-up in
response times at zero evident in Fig. 2.
3.2 Estimation
For the two-stage DDM, the optimal choices and response times are not characterized in closed
forms.18 Therefore, we discretize time into small finite increments and use simulation to approximate the joint distribution of users’ optimal choices and response times. As discussed above,
the two-stage model is essentially equivalent to a single-stage DDM model with a random start-
ing value – a starting value given by the normal distribution in Eq. (3). We use this equivalence
for simulation.
Let τ = {t0, t1, t2, …} denote a grid of time points for the second (decision) stage of the decision-making process. Since the ad is 30 seconds long, and users are only allowed to make decisions after that, we set t0 = 30. We discretize time into increments of Δt = tj+1 − tj for all j, equal to a quarter-second (Δt = 0.25) for our estimation.19 For each candidate parameter vector θ = (μ0, μ1, λ0, λ1, B),20 we draw a starting point Z0(t = 30) according to Eq. (3), and then simulate the accumulation process Z1(tj) recursively for each tj ∈ τ and user i as:
Z1,i(tj) − Z1,i(tj−1) = μ1 Δt + λ1 Z1,i(tj−1) Δt + σ1 ΔWi(tj),   for tj ≥ 30.   (5)
The increments of the Brownian motion ΔWi(tj) are drawn i.i.d. from N(0, Δt) for each tj ∈ τ. We do not estimate σ0² and σ1² separately from the other parameters, but normalize σ0² = σ1² = 1/Δt = 4.21 This normalization implies that the increments of the Brownian motion ΔWi(tj) are drawn i.i.d. from a standard Normal, which is convenient for simulation. To obtain the corresponding set of estimates under a different normalization, say σ0 = σ1 = k, we just have to multiply all the parameters by k/2, except for λ0 and λ1, which are unaffected by the scale normalization.

18 In the simple DDM model, however, closed-form expressions are available for the optimal choice probabilities and associated response times (Shadlen et al. (2006)), and software packages (Wiecki, Sofer & Frank (2013) or Drugowitsch (2016)) are available which exploit these closed forms for fitting the model to data. For the standard Ornstein-Uhlenbeck process proposed in Equation 1, a closed-form density of the first-passage time is only available when there is a single absorbing barrier, as opposed to two absorbing barriers here (Alili, Patie & Pedersen (2005)).
19 Section B in the appendix shows that this time-increment is accurate enough in the sense that the pdf of the discrete accumulation process closely approximates the pdf of the continuous accumulation process.
20 As explained below, in the preferred specification we allow the drift terms μ0 and μ1 to be linear functions of user-specific covariates.
21 Consider two alternative sets of parameters θ = (μ0, λ0, σ0², μ1, λ1, σ1²) and θ′ = (kμ0, λ0, (kσ0)², kμ1, λ1, (kσ1)²). Then from Equation 3, Z0(t; θ′) ∼ kZ0(t; θ) and Z1(t; θ′) ∼ kZ1(t; θ). Hence, the two-stage DDM under θ with barrier B, and under θ′ with barrier kB, are observationally equivalent. By fixing the values of σ0 and σ1, we identify μ0, μ1 and B up to scale.
In the data, we observe (yi, t∗i, Xi), the choice, response time and covariates, for each user i. Correspondingly, for each user, we generate S (= 5000) independent sample paths for the diffusion process. For each user i and each sample path s = 1, …, S, we record (yi,s, t∗i,s), where t∗i,s is the time the sample process s first hits either the upper bound or the lower bound (whichever comes first) for user i. Similarly, yi,s = 1 if the sample process hits the upper bound at t∗i,s and yi,s = 0 otherwise. The likelihood of each observation is approximated by

P(yi, t∗i | Xi; θ) ≈ (1/S) Σ_{s=1}^{S} 1(yi,s = yi, t∗i,s = t∗i).

Thus we estimate θ by maximizing the full simulated log-likelihood Σ_{i=1}^{n} log P(yi, t∗i | Xi; θ).
Despite the large number of simulations, this problem is highly parallelizable and, as such,
we utilize GPU computing to evaluate the simulated likelihood. In addition, we use an approach
similar to importance sampling (see Ackerberg (2009)) to avoid repeated simulations at each
parameter value; this further reduces the computational burden.22
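The pre-solve-and-interpolate idea behind this importance-sampling-style shortcut can be illustrated schematically; the grid points and stored values below are stand-in random numbers, not actual pre-solved likelihoods.

```python
import numpy as np

rng = np.random.default_rng(5)
grid = rng.uniform(-1.0, 1.0, size=(500, 5))  # pre-solved (mu0, mu1, lam0, lam1, B) points
presolved = rng.uniform(size=500)             # stand-in for stored likelihood values

def knn_likelihood(theta, k=5):
    """Approximate the likelihood at a candidate parameter point by averaging
    the pre-solved values at its k nearest neighbors on the grid."""
    d = np.linalg.norm(grid - np.asarray(theta), axis=1)
    idx = np.argsort(d)[:k]
    return presolved[idx].mean()

val = knn_likelihood([0.0, 0.0, 0.0, 0.0, 0.0])
```

The expensive simulation step is thus paid once per grid point rather than once per candidate parameter vector visited by the optimizer.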
3.2.1 Parameter identification
Before proceeding, we provide some discussion regarding identification of the structural model
parameters. We start by considering identification in the simpler single-stage DDM without
leakage, corresponding to Eq. (1) with ‚ = 0. In this setting, the parameters of the model are
identified up to σ, using only two moments of the data: the mean response time,23 and the
click-through probability.24
22 This importance sampling procedure exploits the assumed index form on the user-specific drifts, i.e. μi0 = Xiβ0 and μi1 = Xiβ1. We pre-solve and store the mapping between the following 5 parameters, (μ0, μ1, λ0, λ1, B), and the joint probability distribution of response time and choice. Subsequently, we approximate the likelihood at a candidate parameter vector, (β0, β1, λ0, λ1, B), as a weighted average of pre-solved likelihoods, computed at the k-nearest neighbors in the space of (μ0, μ1, λ0, λ1, B).
23 As discussed above, the simple DDM predicts the same response time distribution conditional on choice y = 0 or y = 1, so data on response times conditional on choice are not useful here. Thus the distributions of response times that we observe in Figure 2 are much richer than required for identification of the basic DDM.
24 To our knowledge, such identification results are new in the primarily experimental literature where DDM is prevalent; in experimental settings, the drift parameter is usually assumed to be known or estimated from indepen-
Lemma 1. In the simple DDM, the drift μ and the bound B are identified using only two moments of the data: t̄ ≡ E[t*], the mean response time, and ȳ ≡ E[y], the probability of click-through. The variance of the diffusion, σ², is not separately identified, and we normalize σ. The estimates μ̂ and B̂ are then:

    μ̂ = −√[ σ²(2ȳ − 1) log(ȳ/(1 − ȳ)) / (2t̄) ]   if ȳ < 0.5
    μ̂ = +√[ σ²(2ȳ − 1) log(ȳ/(1 − ȳ)) / (2t̄) ]   if ȳ ≥ 0.5        (6)

    B̂ = (σ/2) √[ 2t̄ log(ȳ/(1 − ȳ)) / (2ȳ − 1) ]                      (7)
The expression for μ̂ in Eq. (6) has components familiar from the discrete-choice literature: in particular, the log odds-ratio transformation log(ȳ/(1 − ȳ)) is the inverse of the binary logit choice probability ȳ = exp(μ)/[1 + exp(μ)], in which μ is the latent utility difference between the binary choice options. Unlike the binary logit model, which is a model only of choice, here Eq. (6) demonstrates how the estimated μ̂ is affected by the response time: namely, it is larger in magnitude when the average response time t̄ is shorter. The intuition is clear: when a choice is made rather quickly, the preference of the individual must be more intense.

Moreover, the impact of response time on the estimated utilities is greatest when the choice probability is further away from 0.5 (see also Clithero (2018a)). In other words, estimated utilities are much more sensitive to response times precisely when the two choice probabilities are very unequal. For instance, in our mobile advertising setting, where ȳ is only around 0.02, the estimated utility would be μ̂ = −1.37 when t̄ = 1, and μ̂ = −0.68 when t̄ = 4: when one of the choices is rare, as here, data on reaction times are particularly valuable for identification.
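In code, the estimators in Eqs. (6) and (7) are a few lines; with the normalization σ = 1, this minimal sketch reproduces the illustrative values of μ̂ quoted above:

```python
import math

def simple_ddm_moments(ybar, tbar, sigma=1.0):
    """Method-of-moments estimators for the simple DDM (Eqs. 6-7):
    drift mu and bound B from the click-through rate ybar = E[y]
    and the mean response time tbar = E[t*]."""
    log_odds = math.log(ybar / (1.0 - ybar))
    # (2*ybar - 1) and the log odds always share a sign, so the radicand is >= 0
    mu = math.sqrt(sigma**2 * (2*ybar - 1) * log_odds / (2*tbar))
    if ybar < 0.5:
        mu = -mu
    B = (sigma / 2.0) * math.sqrt(2*tbar * log_odds / (2*ybar - 1))
    return mu, B

# With ybar = 0.02 (as in the data) and sigma normalized to 1:
mu1, _ = simple_ddm_moments(ybar=0.02, tbar=1.0)  # mu ~= -1.37
mu4, _ = simple_ddm_moments(ybar=0.02, tbar=4.0)  # mu ~= -0.68
```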
Compared to this setting, our two-stage model additionally contains the parameters γ1, μ0, and γ0. The identification of these parameters is related to differences in the response time distributions conditional on click-backs vs. click-throughs. All else equal, more positive values of μ0 imply that the starting values for the second-stage diffusion process will be closer to the upper boundary (cf. Eq. (3)), so that first passage times to the upper boundary will be shorter than first passage times to the lower boundary. As such, the response times conditional on y = 1 will be shorter than the response times conditional on y = 0. In the data,
however, we observe the opposite: click-throughs are associated with longer response times (as in Figure 2), suggesting that μ0 should be negative (as we will see below).

In addition, the γ parameters affect the variance of the accumulation process (cf. Eq. (3)), so that the higher-order moments of the distribution of response times provide identification for these parameters. Specifically, from Figure 2, we observe no density mass at zero. This suggests that γ0 should take negative values: otherwise, the mean and variance of the first-stage diffusion process would diverge and many users would already have crossed a decision threshold by t = 30, resulting in a “piling up” of response times at zero.
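The role of the starting point can be illustrated with a small simulation (stylized, assumed values rather than the estimated model): when second-stage paths start below zero, the rare upper-boundary hits (click-throughs) take longer on average than the common lower-boundary hits (click-backs).

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed illustrative values: start the second stage at z0 = -3 (a negative,
# "mu0-like" starting point), bounds at +/-5, and no second-stage drift.
B, z0, sigma, dt = 5.0, -3.0, 2.0, 0.01
S = 2000

t_click, t_back = [], []
for _ in range(S):
    z, t = z0, 0.0
    while abs(z) < B:
        z += sigma * np.sqrt(dt) * rng.standard_normal()
        t += dt
    (t_click if z >= B else t_back).append(t)

# Starting closer to the lower bound makes click-backs fast and the rare
# click-throughs slow, matching the pattern described in the text.
print(np.mean(t_click) > np.mean(t_back))  # True
```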
3.3 Monte Carlo
Before moving on to the estimates, we present Monte Carlo simulations which demonstrate that
our estimation procedure performs well. The results are reported in Table 2. We considered
five designs, differing in the values of the model parameters. For each design, we generated
datasets consisting of N = 1000 observations, and repeated each design for 100 replications.
For each replication, we estimate these parameters using the procedure described in Section
3.2. Evidently, from Table 2, the estimation procedure works well, and recovers parameters
quite accurately even at a modest sample size of N = 1000.
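A miniature version of this exercise, using the simple DDM and the closed-form estimators of Eqs. (6) and (7) rather than the full five-parameter model, can be sketched as follows (the true values μ = −1, B = 5, σ = 2 are assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

def recover_simple_ddm(mu, B, sigma=2.0, n=20000, dt=0.01, t_max=60.0):
    """Simulate n first passages of dZ = mu dt + sigma dW through +/-B,
    then invert the mean response time and the choice share via the
    method-of-moments formulas of Eqs. (6)-(7)."""
    steps = int(t_max / dt)
    z = np.zeros(n)
    t_star = np.full(n, t_max)
    y = np.zeros(n)
    alive = np.ones(n, dtype=bool)
    for k in range(1, steps + 1):
        z[alive] += mu * dt + sigma * np.sqrt(dt) * rng.standard_normal(alive.sum())
        up, dn = alive & (z >= B), alive & (z <= -B)
        y[up] = 1.0
        t_star[up | dn] = k * dt
        alive &= ~(up | dn)
    t_bar, y_bar = t_star.mean(), y.mean()
    L = np.log(y_bar / (1 - y_bar))
    mu_hat = np.sign(y_bar - 0.5) * np.sqrt(sigma**2 * (2*y_bar - 1) * L / (2*t_bar))
    B_hat = (sigma / 2) * np.sqrt(2 * t_bar * L / (2*y_bar - 1))
    return mu_hat, B_hat

mu_hat, B_hat = recover_simple_ddm(mu=-1.0, B=5.0)
# mu_hat and B_hat land close to the true values (-1.0, 5.0), up to
# Euler-discretization bias and Monte Carlo noise.
```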
4 Estimation Results
4.1 Single-stage model results
We first estimate a single-stage model which leaves out the ad exposure stage. While this model
is descriptively wrong for our mobile advertising platform, a discussion of these results will
highlight the benefits of using response time data in conjunction with choice data in estimation.
The results are reported in Table 3. In Column 1, the parameters of the simple DDM are recovered using the equations in Lemma 1, which require only two moments from the data: the mean response time and the overall click-through rate. The drift parameter of the simple DDM (column 1) is μ1 = -1.332, indicating a utility difference favoring the more frequent choice of y = 0 (click-back). The bound B is estimated to be 5.778, which is roughly three times the standard deviation σ1, which is fixed at 2.
As a computational check, in Column 2, we estimate the parameters using the simulated
Table 2: Results from Monte Carlo Experiments

                            γ0        γ1        μ0        μ1        B
Design 1  True values     -1.00      0.25     -6.00      0.50     10.0
          Avg. estimate   -1.0368    0.2671   -6.1188    0.4426   10.1798
          (Std dev)       (0.0216)  (0.0182)  (0.1492)  (0.0324)  (0.1258)
Design 2  True values     -1.00      0.25     -2.00     -1.50     10.0
          Avg. estimate   -0.992     0.2565   -1.948    -1.5443   10.4101
          (Std dev)       (0.028)   (0.0083)  (0.0516)  (0.0488)  (0.2193)
Design 3  True values     -2.00      0.25      2.00     -1.5      10.0
          Avg. estimate   -1.9816    0.2638    1.8712   -1.7024   10.4561
          (Std dev)       (0.1324)  (0.0110)  (0.1112)  (0.0917)  (0.5673)
Design 4  True values     -1.00      0.5       2.00     -1.50     10.0
          Avg. estimate   -1.0436    0.5289    1.8044   -1.5176   10.3994
          (Std dev)       (0.0224)  (0.0150)  (0.0912)  (0.0369)  (0.1551)
Design 5  True values     -1.00      0.25     -4.00      0.50     11.0
          Avg. estimate   -1.0236    0.2621   -3.9292    0.5009   10.7581
          (Std dev)       (0.0268)  (0.0171)  (0.0732)  (0.0118)  (0.0419)

Each design was run for 100 replications, with datasets of size N = 1000.
Table 3: Estimates of the single-stage DDM.

                         (1)              (2)               (3)            (4)
                     Simple DDM      DDM (simulated)    Leaky DDM     Binary Logit
B, Bound            5.778 (0.0204)   5.588 (0.632)     7.02 (0.3313)      –
μ1                 -1.332 (0.0064)  -1.257 (0.197)    -1.36 (0.0553)      –
σ1                  2 (fixed)        2 (fixed)         2 (fixed)          –
γ1                       –                –            0.163 (0.0325)     –
u1, “utility diff”       –                –                 –           -3.84

In column (1), estimates are recovered using the method-of-moments estimators in Equations 6 and 7. Standard errors (in parentheses) are computed using the delta method. Estimates in columns (2) and (3) are obtained via simulated MLE, with bootstrapped standard errors.
MLE procedure. That is, we discretize time using a fixed increment of ∆t = 0.25 seconds, and simulate the accumulation process according to Section 3.2 above. As expected, the results do not deviate far from Column 1.
In Column 3, we consider a richer model allowing for a non-zero leakage parameter γ. We now observe that both the bound B and the drift μ1 are larger in magnitude. This is consistent with the positive estimated value (0.163) of the parameter γ: since the drift μ1 is negative, a positive value of γ accelerates the downward trend of the accumulation process, so that a wider bound is required to rationalize the observed click choices and response times.
As counterpoint, in Column 4 we naïvely disregard the response time data and fit a binary logit model P(y = 1) = [1 + exp(−u1)]^(−1) using only the users' choice data. Although the utility parameters cannot be compared between the DDM and the binary logit, we see that on the basis of choice data alone, the logit model requires a substantial utility difference to explain the asymmetric observed choice probabilities (< 3% click-throughs). In the DDM, however, asymmetric choice probabilities need not imply a large utility difference: with long response times, asymmetric choice probabilities can be consistent with a more modest utility difference, as here.25
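As a quick check on Column 4, with choice data alone the binary logit utility difference is just the log odds of the click-through rate; a back-of-the-envelope computation at a rate of roughly 2.1% recovers the reported value:

```python
import math

p = 0.021                      # approximate observed click-through rate
u1 = math.log(p / (1 - p))     # inverse of P(y = 1) = 1/(1 + exp(-u1))
print(round(u1, 2))            # -3.84, matching Column 4 of Table 3
```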
4.2 Two-stage model results
In Table 4, we consider specifications of the full two-stage model. Column (1) contains estimates
of a “homogeneous user” model in which the model parameters are assumed to be identical
across users. Column (2) contains estimates from a “heterogeneous user” specification in which
—0 and —1, the drift terms in the model, vary across users depending on observed covariates.
For discussion, we focus primarily on the richer heterogeneous specification.
As the parameters in the heterogeneous specification are not straightforward to interpret, we
illustrate the collective implications of these parameter estimates on users’ diffusion processes
in Figure 5. In this figure, we simulate and plot the average diffusion processes corresponding
to our estimates (along with bands at +/- two standard deviations). We see that, during the ad exposure stage, the diffusion path (on average) drifts down, away from the decision threshold for the advertised app, but flattens out quickly. Behaviorally, this is consistent with an interpretation that users pay attention only during the initial segment of an ad. Formally, this flattening arises from our negative estimate for γ0, which implies (cf. Eq. (3)) that the variance of the first-stage process is convergent and bounded. Indeed, such a pattern is critical in order to generate the
25 Cf. Eq. (6) above, where longer response times are indicative of near-indifference between the options.
Table 4: Two-stage model: Parameter estimates and bootstrapped standard errors

                              (1)                     (2)
                       Homog. Users Model      Heterog. Users Model
B, Bound                8.846 (0.6383)          8.7545 (0.5541)
γ0                     -2.216 (0.3396)         -2.9132 (0.2316)
μ0: Constant           -3.116 (0.4480)         -3.8052 (0.3096)
μ0: Android Version          –                  0.8772 (0.0648)
μ0: Device Volume            –                  0.3704 (0.0392)
μ0: iOS Dummy                –                  0.8648 (0.0668)
μ0: iOS Version              –                 -0.4616 (0.0968)
μ0: Samsung Dummy            –                  0.4640 (0.0364)
μ0: Screen Resolution        –                  0.8332 (0.0660)
μ0: WiFi                     –                  0.8024 (0.0716)
γ1                      0.178 (0.0340)          0.2041 (0.0148)
μ1: Constant           -1.006 (0.0974)         -0.9901 (0.0661)
μ1: Android Version          –                 -0.0601 (0.0132)
μ1: Device Volume            –                  0.1346 (0.0091)
μ1: iOS Dummy                –                  0.0389 (0.0061)
μ1: iOS Version              –                 -0.0006 (0.0173)
μ1: Samsung Dummy            –                 -0.1245 (0.0090)
μ1: Screen Resolution        –                 -0.2113 (0.0163)
μ1: WiFi                     –                  0.0162 (0.0066)
σ0 and σ1               2 (fixed)               2 (fixed)
Log-Likelihood         -409654                 -395630
lack of “piling up” at t = 30 in the reaction time distributions plotted in Figure 2.26 In the second
stage, however, the estimated γ1 turns positive, which hastens the diffusion process towards the boundaries.27
Figure 5: Average diffusion path (implied by heterogeneous users estimates)

[Figure: average simulated diffusion path over 0–35 seconds, on a scale of −10 to 10. Blue line: average diffusion path; shaded area: +/- two standard deviations.]
The heterogeneous users specification fits the data well. As summarized in Table 5, the
predicted click-through rates and response times from this model closely match the observed
data. Our two-stage DDM is able to generate longer response times for click-throughs than for click-backs.
The coefficients on the user covariates are precisely estimated. The coefficients associated with Device Volume and the iOS Dummy are positive in both stages, which means that these variables unambiguously increase ad effectiveness. This suggests that users who viewed the ad on an iOS mobile device, with a higher audio volume, had higher click-through rates.

26 If, on the contrary, γ0 > 0, then the variance diverges, which would lead to a large mass of choices at t = 30.

27 This amplifying role of γ1 > 0 is consistent with some neural theories of decision-making in which the need to make a decision becomes more urgent as time passes (Cisek, Puskas & El-Murr (2009); Thura, Beauregard-Racine, Fradet & Cisek (2012)).
Table 5: Model Fit of Heterogeneous Model Estimates

                                                  Predicted            Observed
                                            (Heterog. Users Model)
Average clickthrough probability (%)            2.02                   2.09
Mean response time (secs)                       4.03 (0.64)            4.16 (1.73)
Mean response time for clickbacks (secs)        4.02 (0.64)            4.14 (1.73)
Mean response time for clickthroughs (secs)     5.39 (0.93)            4.97 (1.52)

Standard deviations in parentheses. Model-predicted outcomes simulated from 100 representative users, obtained by k-means clustering of user covariates.
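The “representative users” device in the table note can be sketched as follows (a plain Lloyd's-algorithm k-means on a synthetic stand-in for the covariate matrix; the actual covariates are those listed in Table 4):

```python
import numpy as np

rng = np.random.default_rng(3)

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm: returns (centers, labels)."""
    r = np.random.default_rng(seed)
    centers = X[r.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # assign each point to its nearest center
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # move each center to the mean of its assigned points
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

# Synthetic stand-in for the user covariate matrix (n users, 8 covariates)
X = rng.normal(size=(2000, 8))
reps, labels = kmeans(X, k=100)          # 100 representative users
weights = np.bincount(labels, minlength=100) / len(X)
```

Model-implied outcomes are then simulated at the cluster centers and averaged with the cluster-size weights, instead of simulating every user.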
Figure 6 contains kernel densities of the estimated drift terms μ0 and μ1 across all users. We can clearly reject equality of the distributions; there is considerably more heterogeneity in the first-stage drifts (μ0) than in the second-stage drifts (μ1).

A marked bimodality characterizes the estimated density function for μ0 (and, to a lesser extent, μ1). For the exposure-stage drift μ0, the two modes are far apart and differ in sign (with roughly half the mass contained in each mode), suggesting substantial heterogeneity in users' responses to the video ad.
While the estimated values of μ0 can be positive or negative across users i, the second-stage drift, μ1, exhibits more limited heterogeneity, with only negative support. Any goodwill created by the video ad towards the advertised app appears short-lived: once the ad finishes, the diffusion process drifts away from the click-through barrier. This may reflect the difference in information sources between the two stages. In the exposure stage, users obtain decision-relevant information primarily from the video ad; in the second stage, users may refocus their attention back on the publisher's app, which shifts the diffusion process back towards the click-back barrier. This explanation is assessed in Section 4.3 below, using a selected subsample of users who were exposed to the video ad multiple times.
4.2.1 Behavioral differences between Positive and Negative Users
Given the substantial and bimodal heterogeneity in the estimated values of the first-stage drift μ0 (evident in Fig. 6), we segment users into two groups depending on the sign of the estimated
Figure 6: Kernel density plot of the estimated drifts μ0i and μ1i for all users.

[Figure: overlaid kernel densities over the range −10 to 10. For each user i, we compute the drift parameters as μ0i = Xi β̂0 and μ1i = Xi β̂1.]
μ0; we call these the positive (μ0 > 0) vs. negative (μ0 < 0) users. Positive users drift towards the click-through barrier as the ad plays, while negative users drift away.
Positive and negative users are behaviorally distinct, even in out-of-sample measures not used in the estimation, as we show here. As shown in Table 6, the negative users (μ0 < 0) have much lower click-through rates (0.46% vs. 3.57%) and shorter response times (3.63 seconds vs. 4.64 seconds). In addition, while the negative users have response times for click-throughs that are about a second longer than for click-backs (4.87 vs. 3.62 seconds), the positive users (μ0 > 0) have, on average, equal response times for both actions. This suggests that the difference in the distribution of response times across choices (shown in Figure 2) arises almost entirely from the segment of “negative” users.
While differences in click-through rates and response times across segments are informa-
tive, we next relate our estimated structural parameters to additional out-of-sample measures
of users’ game-play engagement and behavior. Any systematic relationship between our esti-
mates and these out-of-sample metrics serves as a validity check on our model specification.
The results are presented in Table 7.
Table 6: Differences in ad responses for two user segments

                                   Negative users    Positive users
                                      μ0 < 0            μ0 > 0        p-values
Average click-through rate           0.00459           0.0357         0.00***
                                    (0.000222)        (0.000582)
                                    N = 92,913        N = 101,694
Average response times               3.63              4.64           0.00***
                                    (0.00580)         (0.00485)
                                    N = 92,913        N = 101,694
Average response times               4.87              4.98           0.1537
conditional on click-throughs       (0.0925)          (0.0242)
                                    N = 426           N = 3,633
Average response times               3.62              4.62           0.00***
conditional on click-backs          (0.00581)         (0.00495)
                                    N = 92,487        N = 98,061

(standard errors in parentheses)
Table 7: Differences in out-of-sample game-play metrics across two user segments

                                   Negative users    Positive users
                                      μ0 < 0            μ0 > 0        p-values
Average auction price                $0.4543           $0.2125        0.00***
                                    (0.0239)          (0.00832)
                                    N = 663           N = 1,735
Average install rate                 0.4906            0.4776         0.61
conditional on click-throughs       (0.0242)          (0.00829)
                                    N = 426           N = 3,633
Average game-progression rate        0.4615            0.3856         0.0007***
conditional on installs             (0.0194)          (0.0117)
                                    N = 663           N = 1,735
Average predicted spending           $0.2402           $0.0684        0.00***
conditional on installs             (0.0690)          (0.0146)
                                    N = 5,936         N = 35,157
In our mobile advertisement platform, advertisers reimburse publishers only when an adver-
tisement leads a user to install an app, and the prospective reimbursement amount is deter-
mined via high-bid auctions. A surprising pattern is that the auction price is higher, on average,
to show an ad to negative users ($0.45) vis-a-vis positive users ($0.21). Exploring further, we
see that the reason is that, conditional on installing an app, negative users are more profitable for advertisers. Negative users progress further in the game (they complete at least level one with probability 0.4615, compared to 0.3856 for positive users). Moreover, negative users spent almost three times more – $0.23 vs. $0.08 – on average on in-app purchases.28
Since advertisers pay only once a user installs an app, the lower click-through (and hence install) rates of the negative users do not influence their bids; their bids should reflect primarily these users' higher post-install spending. More broadly, the negative users appear, at the same time, to be less exploratory about trying new apps, but more serious and engaged in the apps which they do try; these are not “casual” game-players.29
4.3 Repeated Exposures
As we discussed above, one explanation for the differences in the estimated drift distribution
between the two stages (—0 6= —1) is the difference in information sources: in the first stage,
the predominant information source may be the video advertisement, while in the second stage
the user may be recalling experiences about the originating app. If this is the case, then pre-
sumably the advertising information should become more redundant upon repeated exposure.
While most of the users in our sample are only exposed to the advertisement once, there is
a subset of users who were repeatedly exposed to the same ad (16,451 users, or 10.5% of
all observations). We can therefore leverage estimates across this subsample to assess this
explanation for how the information accumulation process differs over subsequent, repeated
28 This spending is computed across all users who installed the app after viewing this advertisement, aggregated across all publishers (not just the one publisher used in the estimation). We do this because in-game spending occurs too infrequently – among the 200,000 users in our estimating dataset, there were < 10 spending users. However, after pooling together all the users exposed to the study ad across all publishers, there were 157 spending users. For each of these users, we predicted their μ0 and μ1 using the estimates in Table 4, and report the average spending for those with μ0 < 0 and μ0 > 0.

29 We speculate that these users may also be more willing to pay for “premium” versions of the game without ads, but we do not have the requisite data to test this.
exposures of the same ad. These results are reported in Table 8.
Of particular note, we observe a significant decrease in the estimate of γ0, from -1.38 at the first exposure to -1.95 at the second. This may reflect users' underweighting of information in the ad in later viewings, consistent with an interpretation that advertising information becomes redundant after multiple exposures. Conversely, we see that γ1 does not change
much across exposures (0.14 vs. 0.17), suggesting that the information considered by users in
the second stage is not much affected by multiple ad exposures. Moreover, we do not observe
a significant change in the estimate of the decision threshold B.
5 Counterfactuals: Nonskippable vs. skippable ads
In this section we use our estimation results to consider alternative advertising designs beyond
the “non-skippable” ad format observed in our estimation dataset. In these counterfactuals,
we consider alternative ad formats, in which users are permitted to skip part of the ad.30 We compute a range of simulations in which we vary the ad exposure duration, the number of seconds users are forced to watch the ad, from 1 second to 30 seconds.
Specifically, in these simulations, we assume that the ad is cut off after x (∈ [1, ..., 30]) seconds. The accumulation process during the ad exposure stage follows Eq. (12), while the accumulation process in the decision period (Eq. (4)) begins once users are allowed to skip the ad.31 The counterfactual changes in the click-through probabilities are reported in Table 9.
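These counterfactual simulations can be sketched as follows. The sketch assumes a leaky-accumulator form dZ = (μ + γZ)dt + σ dW for both stages (standing in for Eqs. (12) and (4), which are not reproduced in this section) and uses roughly the homogeneous-model estimates from Table 4:

```python
import numpy as np

rng = np.random.default_rng(4)

def ctr_given_cutoff(x, mu0=-3.0, g0=-2.2, mu1=-1.0, g1=0.18, B=8.8,
                     sigma=2.0, dt=0.05, t_max=30.0, S=20000):
    """Counterfactual click-through rate when the ad can be skipped after
    x seconds. Stage 1 runs (unbounded) for x seconds and yields the
    starting points; stage 2 then runs to the first passage through +/-B."""
    z = np.zeros(S)
    for _ in range(int(x / dt)):                     # ad exposure stage
        z += (mu0 + g0 * z) * dt + sigma * np.sqrt(dt) * rng.standard_normal(S)
    y = np.zeros(S)
    alive = np.ones(S, dtype=bool)
    for _ in range(int(t_max / dt)):                 # decision stage
        z[alive] += (mu1 + g1 * z[alive]) * dt \
                    + sigma * np.sqrt(dt) * rng.standard_normal(alive.sum())
        up, dn = alive & (z >= B), alive & (z <= -B)
        y[up] = 1.0
        alive &= ~(up | dn)
    return y.mean()

rates = {x: ctr_given_cutoff(x) for x in (0.5, 2.0, 5.0, 30.0)}
# With gamma0 < 0, the stage-1 distribution settles quickly, so the
# click-through rate is nearly flat beyond the first few seconds.
```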
Counterfactuals for both the homogeneous and heterogeneous specifications demonstrate that the biggest changes in the click-through rate occur for an ad exposure of 3 seconds or less —
beyond this, click-through rates do not change appreciably, even up to 30 seconds. Ultimately,
there is little difference between forcing viewers to watch only the first 3 seconds of a video ad
versus the entire 30-second ad. Mathematically, this arises from the estimates of the leakage
parameter, γ0 < 0, which implies that the variance of the accumulation during the ad exposure
stage converges. Such a result rationalizes the practice of YouTube’s ad platform, which allows
30 Dukes, Liu & Shuai (2018) present a theoretical analysis of skippable ads.

31 Since the two-stage model is a one-stage DDM with a random starting point, for estimation we fit the latter model (because no data are generated from the first stage). But in the counterfactual analysis here, the explicit structure of the two-stage model becomes important, as it pins down (cf. Eq. (3)) how the distribution of starting values changes as we vary the length of time before viewers are permitted to “skip” the ad.
Table 8: Parameter estimates: first vs. second exposure. Estimated on subsample of users with multiple exposures to the ad

                             (1)                  (2)
                       First exposure       Second exposure
B, Bound               9.5339 (0.6399)      9.2888 (0.7540)
γ0                    -1.3808 (0.1012)     -1.9468 (0.1324)
μ0: Constant          -0.1976 (0.0440)     -1.7692 (0.1220)
μ0: Android Version    0.4608 (0.0748)      0.5872 (0.0816)
μ0: Device Volume      0.4496 (0.0372)      0.1964 (0.0260)
μ0: iOS Dummy         -0.3972 (0.0396)     -1.8536 (0.1448)
μ0: iOS Version       -0.0728 (0.0620)      0.1532 (0.0648)
μ0: Samsung Dummy     -0.1920 (0.0420)      0.2044 (0.0224)
μ0: Screen Resolution -0.6976 (0.0712)     -0.0064 (0.0304)
μ0: WiFi              -1.0976 (0.0848)     -1.3004 (0.1052)
γ1                     0.1434 (0.0196)      0.1721 (0.0261)
μ1: Constant          -1.4036 (0.0870)     -1.7017 (0.1269)
μ1: Android Version   -0.0014 (0.0176)      0.0253 (0.0210)
μ1: Device Volume      0.3161 (0.0232)      0.2786 (0.0193)
μ1: iOS Dummy         -0.2351 (0.0165)     -0.2282 (0.0179)
μ1: iOS Version        0.0306 (0.0161)      0.0174 (0.0196)
μ1: Samsung Dummy     -0.5980 (0.0374)     -0.2839 (0.0252)
μ1: Screen Resolution  0.0173 (0.0126)     -0.0453 (0.0096)
μ1: WiFi              -0.1359 (0.0107)     -0.0150 (0.0066)
σ0 and σ1              2 (fixed)            2 (fixed)
N                      16,451               16,451
Log-Likelihood        -32,292              -32,048

(bootstrapped standard errors in parentheses)
users to skip an ad after a few seconds.
Table 9: Counterfactual click-through probabilities.

Treatments:                      Homog. Users Model           Heterog. Users Model
ad exposure                      Click-through          Click-through     Click-through
duration (secs.)                 probability            prob. (mean)      prob. (sd.)
0.5                              0.0282                 0.0184            0.0099
1.0                              0.0225                 0.0199            0.0120
2                                0.0207                 0.0204            0.0126
5                                0.0201                 0.0203            0.0126
15                               0.0201                 0.0204            0.0127
30 (benchmark)                   0.0202                 0.0204            0.0126

Homogeneous results utilize estimates from Column (1) of Table 4. Heterogeneous results utilize estimates from Column (2) of Table 4, and report simulated counterfactual click-through probabilities across 50 representative users (obtained by partitioning the characteristics space of the users using k-means clustering).
However, the aggregate results shown in Table 9 mask a great deal of heterogeneity in the
counterfactual outcomes across users; particularly, across the two user segments (negative vs.
positive users) that we identified earlier. In Figure 7, we graph the counterfactual click-through
rates across the treatments, for two representative users from each of these segments. Clearly,
for negative users, the highest click-through rates are obtained by minimizing their ad exposure,
but positive users are more likely to respond positively after longer ad exposure. This difference
is important given the difference in revenue from these two segments, as we now discuss.
5.1 Implications for publishers’ mobile ad revenues
Finally, we quantify how different counterfactual scenarios affect the publisher’s revenue. Be-
cause in our mobile advertising platform, publishers are paid for hosting ads only when users
install the advertised app,32 revenue projections require models not only of users’ clicking be-
havior (which we have already estimated), but also predictive models of user installation behav-
ior, conditional on click-throughs.
32 This differs from other advertising markets in which publishers are paid per click (as in banner ad markets) or per exposure (as in traditional media markets).
Figure 7: Counterfactual click-through probabilities.

[Figure: counterfactual click-through rates (vertical axis, roughly 0.005–0.035) against ad exposure duration in seconds (horizontal axis, 0–5), for two representative users, one with μ0 > 0, the other with μ0 < 0.]
For this exercise, then, we estimated predictive nonparametric models (random forests) for the relationships between users' characteristics and (i) the rate of installs conditional on clicks, and (ii) the advertiser's payment for that user conditional on installing. Subsequently, we used these predictive models to predict ad revenues for each user under each counterfactual scenario explored above.
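A sketch of this two-model pipeline, with synthetic stand-ins for the covariates, install outcomes, and payments (scikit-learn's random forests are one natural implementation; the paper does not specify the software used):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(5)

# Synthetic stand-ins: covariates for users who clicked through,
# whether they installed, and the advertiser's payment if they installed.
X_click = rng.normal(size=(3000, 8))
installed = (rng.random(3000) < 0.48).astype(int)      # ~ observed install rate
payment = np.exp(rng.normal(-1.5, 0.5, size=3000))     # payment | install, in $

# (i) install probability conditional on a click-through
install_model = RandomForestClassifier(n_estimators=200, random_state=0)
install_model.fit(X_click, installed)

# (ii) advertiser payment conditional on an install
pay_model = RandomForestRegressor(n_estimators=200, random_state=0)
pay_model.fit(X_click[installed == 1], payment[installed == 1])

def expected_revenue(X_users, p_click):
    """Per-impression revenue: P(click) * P(install | click) * E[payment | install]."""
    p_install = install_model.predict_proba(X_users)[:, 1]
    pay = pay_model.predict(X_users)
    return p_click * p_install * pay

rev = expected_revenue(X_click[:5], p_click=0.02)
```

In the counterfactuals, p_click is replaced by the model-implied click-through rate under each ad exposure duration.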
The results in Table 10 show that if we uniformly reduce the ad exposure duration from the
30s benchmark to half a second for all users (second row of Table 10), total revenue would
increase by 6.35%. Moreover, because counterfactual click-through rates are virtually constant
for ad exposure duration ≥ 3 seconds (as in Figure 7), there are only minor effects on publisher
revenue from showing the entire ad of 30 seconds vs. cutting it off at 3 seconds. Indeed,
reducing ad exposure to 2 seconds uniformly for all users yields roughly the same revenue
(-0.30%) as the 30-second benchmark. These results may arise in part from the fact that
the advertised app is a “sequel” in a popular series of games, so that users may have little
information to glean about the advertised app, apart from the fact that it exists.
However, the publisher can do much better: their revenue can be increased further by per-
Table 10: Publishers’ advertising revenue under different counterfactual scenarios.
Counterfactual forced ad exposure duration % change in revenue relative to 30s
Uniform treatment:30s (benchmark) —0.5s +6.35%2s -0.30%Optimized treatment by segment:0.5s for negative users; 30s for positive users +18.03%0.5s for negative users; 2s for positive users +17.47%
sonalizing the counterfactual design treatment across users based on observable covariates. In
Figure 8, we plot the counterfactual revenue generated by representative users for the two user
segments, as a function of the ad exposure duration. (Revenues are denominated in dollars per
thousand user impressions, corresponding to the standard “CPM” measure in the advertising
industry.) For the negative users, the CPM is maximized at just over $2.15 when we cut the ad at 1s, while for positive users, revenues are maximized at $2.56 when ad exposure is maintained at the 30s benchmark.33 Accordingly, the bottom rows of Table 10 show that the publisher's revenue increases by 18.03% relative to the benchmark when the ad exposure durations are optimized by segment (0.5s for negative users, 30s for positive users).
6 Conclusions
We study how choice and response time data from the field can be combined to estimate
a structural model of smartphone users’ responses to mobile advertisements. We consider
a two-stage drift-diffusion model in which the combination of response time with choice data
allows separate identification of the diffusion processes characterizing users’ preferences when
the ad is playing, as well as when users face a subsequent decision to click-through on the
ad. In our analysis, we find two segments of users: a “positive” segment is initially drawn to the advertised app (and is more likely to click-through, with longer response times), while a second “negative” segment responds much more negatively to the advertisement (and is less likely to click-through). We also find that these two segments differ substantially on out-of-sample metrics. Negative users are more engaged in the apps they install, and conditional on
33 For perspective, Instagram's average CPM for US users was $5.10 in 2016 Q1 (Salesforce Advertising Index Q1 2016 Report).
Figure 8: Counterfactual ad revenue

[Figure: publisher's ad revenue, in $ per thousand impressions (vertical axis, 0–8), against ad exposure duration in seconds (horizontal axis, 0–10).]
installation, spend more in-game.
Using our estimates, we address the counterfactual of whether users should be permitted to
skip part or all of a video advertisement before making a choice. Overall, we find that allowing
users to skip the ad after 3 seconds yields roughly the same revenue as forcing them to view the
entire thirty-second ad, thus rationalizing the practice of some platforms (e.g. YouTube) where
users can skip an ad after watching for a few seconds. However, the effects are very heteroge-
neous across users. While positive users’ propensity to click on the ad increases over time, the
opposite is true for negative users; overall, however, publishers’ revenues are optimized withh
shorter forced ad exposures, as this appeals more to the more lucrative negative users. Ad
revenue can be increased even further if the “skip-ability" of the ad is targeted and personalized
according to users’ demographics.
We conclude by mentioning several extensions. First, given the results in this paper, one
natural question is whether response time data can be useful for choice prediction. Such
usefulness has already been demonstrated in choice experiments in the lab,34 and it will be
interesting to examine whether these benefits are also present in field data.
34 See Clithero (2018a) and the citations therein.
Second, the use of structural estimation and modeling to address the effects of skippable
vs. non-skippable ads is novel relative to existing methodologies for determining policy effects
in online and mobile platforms, which typically involve randomized A/B testing.35 These two
approaches are complementary. While A/B testing yields convenient measures of policy effects,
the structural parameters estimated in our approach allow us to identify segments of behav-
iorally distinct users – the negative and positive users – which are policy-relevant and important
for effective ad targeting. More generally, a structural model is required to extract information
from endogenous variables like response time. Otherwise, simply including response times as
an explanatory variable in a regression leads to difficulties in both interpretation of parameters
and choice prediction.
Finally, most clickstream datasets keep track of consumers’ choices in multinomial choice
settings, involving choice among more than two items. A common extension of the DDM
for multinomial choices is the “race” model, in which consumers face N competing accumulation
processes, and the first process to hit a common “finish line” is chosen.36 Formally, this model is
equivalent to the competing risks model from statistical duration analysis, and it will be interesting
to explore these connections.
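For concreteness, the race model can be sketched in a few lines of simulation code. This is a hypothetical illustration only: the drift rates, barrier, and function names below are made up for exposition and are not estimates from our data.

```python
import numpy as np

def simulate_race(drifts, barrier=1.0, sigma=1.0, dt=0.01, t_max=30.0, rng=None):
    """Simulate N competing accumulators; return (winning index, hitting time).

    Each accumulator follows dZ_i = drifts[i] dt + sigma dB_i; the first to
    reach `barrier` is "chosen". Returns (None, t_max) if none hits in time.
    """
    rng = rng or np.random.default_rng()
    drifts = np.asarray(drifts, dtype=float)
    z = np.zeros_like(drifts)
    t = 0.0
    while t < t_max:
        z += drifts * dt + sigma * np.sqrt(dt) * rng.standard_normal(drifts.size)
        t += dt
        hit = np.flatnonzero(z >= barrier)
        if hit.size:
            # Tie-break (rare): pick the accumulator furthest past the barrier.
            return int(hit[np.argmax(z[hit])]), t
    return None, t_max

rng = np.random.default_rng(0)
results = [simulate_race([0.8, 0.4, 0.2], rng=rng) for _ in range(2000)]
choices = [c for c, _ in results if c is not None]
```

Across simulated trials, the accumulator with the largest drift should win most often, and the resulting (choice, hitting-time) pairs are exactly the kind of data that competing-risks duration methods are designed to handle.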
35 Lewis & Rao (2015) discuss limitations of using randomized trials for evaluating online advertising campaigns.
36 See Webb (2019) and Marley & Colonius (1992).
References
Ackerberg, D. A. (2009). A new use of importance sampling to reduce computational burden in simulation estimation. Quantitative Marketing and Economics, 7(4), 343–376.
Alili, L., Patie, P., & Pedersen, J. L. (2005). Representations of the first hitting time density of an Ornstein-Uhlenbeck process. Stochastic Models, 21(4), 967–980.
Alos-Ferrer, C., Fehr, E., & Netzer, N. (2018). Time will tell: Recovering preferences when choices are noisy. SSRN Working Paper.
Baldassi, C., Cerreia-Vioglio, S., Maccheroni, F., Marinacci, M., & Pirazzini, M. (2019). A behavioral characterization of the drift diffusion model and its multi-alternative extension to choice under time pressure. Working Paper.
Block, H. & Marschak, J. (1960). Random orderings and stochastic theories of responses. Cowles Foundation Discussion Paper, 7(147), 97–132.
Bogacz, R., Brown, E., Moehlis, J., Holmes, P., & Cohen, J. D. (2006). The physics of optimal decision making: a formal analysis of models of performance in two-alternative forced-choice tasks. Psychological Review, 113(4), 700.
Branco, F., Sun, M., & Villas-Boas, J. M. (2012). Optimal search for product information. Management Science, 58(11), 2037–2056.
Busemeyer, J. R. & Townsend, J. T. (1992). Fundamental derivations from decision field theory. Mathematical Social Sciences, 23(3), 255–282.
Chen, R. & Chiong, K. (2016). An empirical model of mobile advertising platforms. Working paper.
Cisek, P., Puskas, G. A., & El-Murr, S. (2009). Decisions in changing conditions: the urgency-gating model. Journal of Neuroscience, 29(37), 11560–11571.
Clithero, J. A. (2018a). Improving out-of-sample predictions using response times and a model of the decision process. Journal of Economic Behavior & Organization, 148, 344–375.
Clithero, J. A. (2018b). Response times in economics: Looking through the lens of sequential sampling models. Journal of Economic Psychology.
Crawford, G. S. & Shum, M. (2005). Uncertainty and learning in pharmaceutical demand. Econometrica, 73(4), 1137–1173.
Drugowitsch, J. (2016). Fast and accurate Monte Carlo sampling of first-passage times from Wiener diffusion models. Scientific Reports, 6, 20490. Software package available at https://github.com/DrugowitschLab/dm.
Dukes, A. J., Liu, Q., & Shuai, J. (2018). Interactive advertising: The case of skippable ads. SSRN.
Echenique, F. & Saito, K. (2017). Response time and utility. Journal of Economic Behavior & Organization, 139, 49–59.
Erdem, T. & Keane, M. P. (1996). Decision-making under uncertainty: Capturing dynamic brand choice processes in turbulent consumer goods markets. Marketing Science, 15(1), 1–20.
Fehr, E. & Rangel, A. (2011). Neuroeconomic Foundations of Economic Choice—Recent Advances. Journal of Economic Perspectives, 25(4), 3–30.
Frydman, C. & Krajbich, I. (2017). Using response times to infer others’ beliefs: An application to information cascades. SSRN Working Paper.
Frydman, C. & Nave, G. (2016). Extrapolative beliefs in perceptual and economic decisions: Evidence of a common mechanism. Management Science, 63(7), 2340–2352.
Fudenberg, D., Newey, W., Strack, P., & Strzalecki, T. (2019). Testing the drift diffusion model. SSRN Working Paper.
Fudenberg, D., Strack, P., & Strzalecki, T. (2018). Speed, accuracy, and the optimal timing of choices. American Economic Review, 108, 3651–3684.
Gold, J. I. & Shadlen, M. N. (2002). Banburismus and the brain: decoding the relationship between sensory stimuli, decisions, and reward. Neuron, 36(2), 299–308.
Gossner, O., Steiner, J., & Stewart, C. (2018). Attention Please! Working Paper.
Gwinn, R., Leber, A. B., & Krajbich, I. (2019). The spillover effects of attentional learning on value-based choice. Cognition, 182, 6–11.
Hare, T. A., Schultz, W., Camerer, C. F., O’Doherty, J. P., & Rangel, A. (2011). Transformation of stimulus value signals into motor commands during simple choice. Proceedings of the National Academy of Sciences, 18120–18125.
Hebert, B. & Woodford, M. (2017). Rational inattention with sequential information sampling. Stanford Working Paper.
Hong, H. & Shum, M. (2006). Using price distributions to estimate search costs. RAND Journal of Economics, 37(2), 257–275.
Honka, E. (2014). Quantifying search and switching costs in the US auto insurance industry. RAND Journal of Economics, 45(4), 847–884.
Ke, T. & Villas-Boas, M. (2019). Optimal learning before choice. Journal of Economic Theory, 180, 383–437.
Konovalov, A. & Krajbich, I. (2019). Revealed strength of preference: Inference from response times. Judgment and Decision Making.
Krajbich, I. (2019). Accounting for attention in sequential sampling models of decision making. Current Opinion in Psychology, 29.
Krajbich, I., Armel, C., & Rangel, A. (2010). Visual fixations and the computation and comparison of value in simple choice. Nature Neuroscience, 13(10), 1292.
Krajbich, I., Bartling, B., Hare, T., & Fehr, E. (2015). Rethinking fast and slow based on a critique of reaction-time reverse inference. Nature Communications, 6, 7455.
Lehmann, E. (1959). Testing Statistical Hypotheses. Wiley, first edition.
Lewis, R. A. & Rao, J. M. (2015). The unfavorable economics of measuring the returns to advertising. Quarterly Journal of Economics, 130(4), 1941–1973.
Link, S. W. (1992). The wave theory of difference and similarity. Psychology Press.
Lu, J. & Hutchinson, J. W. (2017). Split-second decisions during online information search: Static vs. dynamic decision thresholds for making the first click. Technical report, Wharton.
Luce, R. D. (1991). Response Times: Their Role in Inferring Elementary Mental Organization. Oxford.
Marley, A. A. J. & Colonius, H. (1992). The “horse race” random utility model for choice probabilities and reaction times, and its competing risks interpretation. Journal of Mathematical Psychology, 36(1), 1–20.
Mormann, M., Navalpakkam, V., Koch, C., & Rangel, A. (2012). Relative visual saliency differences induce sizable bias in consumer choice. Journal of Consumer Psychology, 22, 67–74.
Morris, S. & Strack, P. (2019). The Wald problem and the relation of sequential sampling and ex-ante information costs. SSRN Working Paper.
Otter, T., Allenby, G. M., & Van Zandt, T. (2008). An integrated model of discrete choice and response time. Journal of Marketing Research, 45(5), 593–607.
Ratcliff, R. & McKoon, G. (2008). The diffusion decision model: theory and data for two-choice decision tasks. Neural Computation, 20(4), 873–922.
Roitman, J. & Shadlen, M. (2002). Response of neurons in the lateral intraparietal area during a combined visual discrimination reaction time task. Journal of Neuroscience, 22(21), 9475–9489.
Seiler, S. & Pinna, F. (2017). Estimating search benefits from path-tracking data: measurement and determinants. Marketing Science, 36(4), 565–589.
Shadlen, M. & Shohamy, D. (2016). Decision making and sequential sampling from memory. Neuron, 90, 927–939.
Shadlen, M. N., Hanks, T. D., Churchland, A. K., Kiani, R., & Yang, T. (2006). The speed and accuracy of a simple perceptual decision: a mathematical primer. In K. Doya, S. Ishii, A. Pouget, & R. Rao (Eds.), Bayesian Brain: Probabilistic Approaches to Neural Coding (pp. 201–236). MIT Press.
Smith, P. L. (2000). Stochastic dynamic models of response time and accuracy: A foundational primer. Journal of Mathematical Psychology, 44(3), 408–463.
Thura, D., Beauregard-Racine, J., Fradet, C.-W., & Cisek, P. (2012). Decision making by urgency gating: theory and experimental support. Journal of Neurophysiology, 108(11), 2912–2930.
Ursu, R., Wang, Q., & Chintagunta, P. (2018). Search duration. Working paper.
Usher, M. & McClelland, J. L. (2001). The time course of perceptual choice: the leaky, competing accumulator model. Psychological Review, 108(3), 550.
Wald, A. (1973). Sequential analysis. John Wiley.
Webb, R. (2019). The (Neural) Dynamics of Stochastic Choice. Management Science, 65, 230–255.
Wiecki, T. V., Sofer, I., & Frank, M. J. (2013). HDDM: Hierarchical Bayesian estimation of the drift-diffusion model in Python. Frontiers in Neuroinformatics, 7, 14.
Woodford, M. (2014). Stochastic choice: An optimizing neuroeconomic model. American Economic Review, 104(5), 495–500.
Woodford, M. (2016). Optimal evidence accumulation and stochastic choice. Manuscript, Columbia University.
A Proof of Lemma 1
Proof. A derivation from Shadlen et al. (2006), Equations (10.23) and (10.33), shows that the simple DDM $dZ(t) = \mu\, dt + \sigma\, dB(t)$ implies:

$$y = E[y_i] = \frac{1}{e^{-2B\mu/\sigma^2} + 1} \qquad (8)$$

$$\bar{t} = E[t^*_i] = \frac{B\left(e^{B\mu/\sigma^2} - e^{-B\mu/\sigma^2}\right)}{\mu\left(e^{B\mu/\sigma^2} + e^{-B\mu/\sigma^2}\right)} \qquad (9)$$

To see that $\sigma^2$ is not identified through the moments above, suppose we multiply $\sigma^2$ by $k^2$. Then if we multiply $B$ by $k$ and multiply $\mu$ by $k$, the two equations above remain unchanged. That is, $(\sigma^2, B, \mu)$ and $(k^2\sigma^2, kB, k\mu)$ are both solutions to the equations above.

First, we solve for $\mu$ and $B$ by normalizing $\sigma = 1$. (If we instead normalize $\sigma = \bar{\sigma}$, then $(\bar{\sigma}B, \bar{\sigma}\mu)$ would be the corresponding solution.) Given the normalization $\sigma = 1$, solving these equations for $\mu$ and $B$ yields only two real solutions. In one of them $\mu$ is increasing in $y$, and in the other $\mu$ is decreasing in $y$; therefore the first solution is the valid one.

Moreover, when we solve for the first solution for $\mu$, we get:

$$\mu = \sqrt{\frac{2y-1}{2\bar{t}}}\,\sqrt{\log\left(\frac{y}{1-y}\right)} \qquad (10)$$

Now when $y < 0.5$, both $\frac{2y-1}{2\bar{t}}$ and $\log\left(\frac{y}{1-y}\right)$ are negative. Therefore, we can rewrite Equation (10) as $\mu = \left(\sqrt{-1}\right)^2 \sqrt{\left|\frac{2y-1}{2\bar{t}}\right|}\,\sqrt{\left|\log\left(\frac{y}{1-y}\right)\right|} = -\sqrt{\frac{(2y-1)\log\left(\frac{y}{1-y}\right)}{2\bar{t}}}$. When $y > 0.5$, both $\frac{2y-1}{2\bar{t}}$ and $\log\left(\frac{y}{1-y}\right)$ are positive, and we have $\mu = \sqrt{\frac{(2y-1)\log\left(\frac{y}{1-y}\right)}{2\bar{t}}}$. This leads to the following solution (which we additionally verified using numerical solvers):

$$\mu = \begin{cases} -\sqrt{\dfrac{(2y-1)\log\left(\frac{y}{1-y}\right)}{2\bar{t}}} & \text{if } y < 0.5 \\[2ex] \phantom{-}\sqrt{\dfrac{(2y-1)\log\left(\frac{y}{1-y}\right)}{2\bar{t}}} & \text{if } y \geq 0.5 \end{cases}$$

$$B = \frac{1}{2}\sqrt{\frac{2\bar{t}\log\left(\frac{y}{1-y}\right)}{2y-1}}$$

$\square$
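The closed-form solution in Lemma 1 can be checked numerically. The sketch below is our own illustration (the function names are ours, not from any replication code): it inverts the moment equations under the normalization $\sigma = 1$, and confirms the round trip reproduces $(y, \bar{t})$. It uses the fact that Equation (9) simplifies to $(B/\mu)\tanh(B\mu/\sigma^2)$, and also confirms the scale indeterminacy $(\sigma^2, B, \mu) \mapsto (k^2\sigma^2, kB, k\mu)$.

```python
import numpy as np

def solve_mu_B(y, t_bar):
    """Closed-form (mu, B) from the hitting probability y and mean response
    time t_bar, under the normalization sigma = 1 (Lemma 1)."""
    lo = np.log(y / (1 - y))
    # (2y - 1) and log(y / (1 - y)) always share a sign, so the ratio under
    # the square root is positive; the sign of mu follows sign(2y - 1).
    mu = np.sign(2 * y - 1) * np.sqrt(abs((2 * y - 1) * lo / (2 * t_bar)))
    B = 0.5 * np.sqrt(2 * t_bar * lo / (2 * y - 1))
    return mu, B

def moments(mu, B, sigma=1.0):
    """Forward map, Equations (8)-(9): choice probability and mean RT.
    Equation (9) equals (B / mu) * tanh(B * mu / sigma^2)."""
    y = 1.0 / (np.exp(-2 * B * mu / sigma**2) + 1.0)
    t_bar = (B / mu) * np.tanh(B * mu / sigma**2)
    return y, t_bar

# Round trip: moments at the recovered (mu, B) reproduce (y, t_bar).
mu, B = solve_mu_B(0.7, 2.0)
y_chk, t_chk = moments(mu, B)

# Scale indeterminacy: (k^2 sigma^2, kB, k mu) leaves both moments unchanged.
k = 3.0
y_alt, t_alt = moments(k * mu, k * B, sigma=k)
```

Here `y_chk` and `t_chk` recover 0.7 and 2.0 up to floating-point error, and the rescaled parameters yield the same moments, mirroring the non-identification argument above.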
B How good is the discrete approximation?
Consider approximating the stochastic process $dZ_t = \mu\, dt + \gamma Z_t\, dt + \sigma\, dW_t$ via $Z_{t_j} - Z_{t_{j-1}} = \mu \Delta t + \gamma Z_{t_{j-1}} \Delta t + \sigma \Delta W_{t_j}$, where $t_j - t_{j-1} = \Delta t$ for all $j = 1, \ldots$. Let $J = \lfloor \frac{t}{\Delta t} \rfloor$ denote the number of discrete time intervals at time $t$. The discretized process at time $t$ has the following distribution:

$$Z_{t_J} \sim N\left( \mu \Delta t \sum_{i=0}^{J-1} (1 + \gamma \Delta t)^i,\; \sigma^2 \Delta t \sum_{i=0}^{J-1} (1 + \gamma \Delta t)^{2i} \right) \qquad (11)$$

or, equivalently:

$$Z_{t_J} \sim N\left( \mu\, \frac{1 - (1 + \gamma \Delta t)^J}{-\gamma},\; \sigma^2 \Delta t\, \frac{1 - (1 + \gamma \Delta t)^{2J}}{1 - (1 + \gamma \Delta t)^2} \right) \qquad (12)$$

As $\Delta t \to 0$, we check that $\lim_{\Delta t \to 0} \mu\, \frac{1 - (1 + \gamma \Delta t)^{t/\Delta t}}{-\gamma} = \frac{\mu}{\gamma}\left(e^{\gamma t} - 1\right)$ and $\lim_{\Delta t \to 0} \sigma^2 \Delta t\, \frac{1 - (1 + \gamma \Delta t)^{2t/\Delta t}}{1 - (1 + \gamma \Delta t)^2} = \frac{\sigma^2}{2\gamma}\left(e^{2\gamma t} - 1\right)$. Suppose that at $t = 30$, $\mathrm{Var}(Z_{30}) = 2.0833$. This number is implied by the estimation result (Column 2 of Table 4). Using the time increment $\Delta t = 0.25$, the resulting $\gamma_0$ is $-0.247$. Now as $\Delta t \to 0$, we have $\gamma_0 \to -0.240$. Therefore, the time increment of a quarter-second appears to do a reasonable job of approximating the continuous-time density.
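This check can be reproduced with a short script. The sketch below is our own (the bisection solver and all names are ours; $\sigma$ is normalized to 1): it solves for the $\gamma$ that matches $\mathrm{Var}(Z_{30}) = 2.0833$ under the discrete variance formula with $\Delta t = 0.25$, and under its continuous-time limit.

```python
import math

DT = 0.25            # quarter-second time increment, as in the estimation
T = 30.0             # horizon in seconds (the thirty-second ad)
TARGET_VAR = 2.0833  # Var(Z_30) implied by the estimates (Table 4, Column 2)

def discrete_var(gamma, dt=DT, t=T, sigma=1.0):
    """Variance of the Euler-discretized process at time t, Equation (12)."""
    J = int(t / dt)
    a = 1.0 + gamma * dt
    return sigma**2 * dt * (1.0 - a**(2 * J)) / (1.0 - a**2)

def continuous_var(gamma, t=T, sigma=1.0):
    """Continuous-time limit: sigma^2 (e^{2 gamma t} - 1) / (2 gamma)."""
    return sigma**2 * (math.exp(2.0 * gamma * t) - 1.0) / (2.0 * gamma)

def solve_gamma(var_fn, target=TARGET_VAR, lo=-1.0, hi=-1e-6, tol=1e-10):
    """Bisect for the negative gamma matching the target variance;
    the variance is increasing in gamma on this bracket."""
    f = lambda g: var_fn(g) - target
    assert f(lo) < 0 < f(hi), "root not bracketed"
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

gamma_discrete = solve_gamma(discrete_var)      # approx. -0.247
gamma_continuous = solve_gamma(continuous_var)  # approx. -0.240
```

The two roots are close to each other, matching the conclusion above that a quarter-second grid approximates the continuous-time process well.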