Measuring user engagement: the do, the do not do, and the we do not know
Posted on 02-Jul-2015
Mounia Lalmas mounia@acm.org
World Usability Day Berlin – November 2014
About me
§ Since October 2013: Principal Research Scientist at Yahoo Labs London › User engagement, native advertising, social media, search
§ 2011-2013: Visiting Principal Scientist at Yahoo Labs Barcelona › User engagement, social media, search
§ 2008-2010: Microsoft Research/RAEng Research Professor at the University of Glasgow › Quantum theory to model information retrieval
§ 1999-2008: Lecturer (assistant professor) to Professor at Queen Mary, University of London › XML retrieval and evaluation (INEX)
Blog: labtomarket.wordpress.com
This talk
§ What is user engagement › Definitions › Characteristics › Approaches
§ Attributes of user engagement measurement › Scalability › Setting › Objectivity versus subjectivity › Temporality
What is user engagement?
“User engagement is a quality of the user experience that emphasizes the phenomena associated with wanting to use a technological resource longer and frequently” (Attfield et al, 2011)
User engagement is the emotional, cognitive and behavioural connection that exists, at any point in time and over time, between a user and a technological resource
Measured through:
§ self-report: happy, sad, enjoyment, …
§ analytics: click, upload, read, comment, share …
§ physiology: gaze, body heat, mouse movement, …
Why is it important to engage users?
§ In today’s wired world, users have enhanced expectations about their interactions with technology … resulting in increased competition amongst the purveyors and designers of interactive systems.
§ In addition to utilitarian factors, such as usability, we must consider the hedonic and experiential factors of interacting with technology, such as fun, fulfillment, play, and user engagement.
(O’Brien, Lalmas & Yom-Tov, 2014)
Patterns of user engagement: online sites differ in how users engage with them!
§ Games: users spend much time per visit
§ Search: users come frequently and do not stay long
§ Social media: users come frequently and stay long
§ Niche: users come on average once a week, e.g. for a weekly post
§ News: users come periodically, e.g. morning and evening
§ Service: users visit the site when needed, e.g. to renew a subscription
(Lehmann et al., 2012)
Why is it important to measure and interpret user engagement well?
[Figure: CTR]
Characteristics of user engagement
• Users must be focused to be engaged
• Distortions in the subjective perception of time are used to measure it
Focused attention (Webster & Ho, 1997; O’Brien, 2008)
• Emotions experienced by users are intrinsically motivating
• An initial affective “hook” can induce a desire for exploration, active discovery or participation
Positive affect (O’Brien & Toms, 2008)
• Sensory, visual appeal of the interface stimulates the user and promotes focused attention
• Linked to design principles (e.g. symmetry, balance, saliency)
Aesthetics (Jacques et al., 1995; O’Brien, 2008)
• People remember enjoyable, useful, engaging experiences and want to repeat them
• Reflected in e.g. the propensity of users to recommend an experience/a site/a product
Endurability (Read, MacFarlane, & Casey, 2002; O’Brien, 2008)
• Novelty, surprise, unfamiliarity and the unexpected
• Appeals to users’ curiosity, encourages inquisitive behaviour and promotes repeated engagement
Novelty (Webster & Ho, 1997; O’Brien, 2008)
• Richness captures the growth potential of an activity
• Control captures the extent to which a person is able to achieve this growth potential
Richness and control (Jacques et al., 1995; Webster & Ho, 1997)
• Trust is a necessary condition for user engagement
• Implicit contract among people and entities, which is more than technological
Reputation, trust and expectation (Attfield et al., 2011)
• Difficulties in setting up “laboratory” style experiments
• Why should users engage?
Motivation, interests, incentives, and benefits (Jacques et al., 1995; O’Brien & Toms, 2008)
Attributes of user engagement
§ Scale (large versus small)
§ Setting (laboratory versus field)
§ Objective versus subjective
§ Temporality (short- versus long-term)
Neither end of any attribute is better than the other: the right choice depends on the study’s aims and constraints.
Measuring user engagement: measures and attributes
§ Self-report › Measures: questionnaire, interview, think-aloud and think-after protocols › Attributes: subjective; short- and long-term; lab and field; small scale
§ Physiology › Measures: EEG, SCL, fMRI, eye tracking, mouse tracking › Attributes: objective; short-term; lab and field; small and large scale
§ Analytics › Measures: intra- and inter-session metrics, data science › Attributes: objective; short- and long-term; field; large scale
Towards reliable and valid measurement
Scalability
§ dozens – qualitative & physiology
§ hundreds to thousands – online survey
§ millions – analytics
from rich but limited in generalisation … to powerful but hard to explain
Large scale measurement – analytics
Metrics
• Dwell time
• Session duration
• Bounce rate
• Play time (video)
• Mouse movement
• Click-through rate (CTR)
• Number of pages viewed (click depth)
• Conversion rate
• Number of UGC items (user-generated content, e.g. comments)
• …
Dwell time as a proxy of user interest
Dwell time as a proxy of relevance
Dwell time as a proxy of conversion
Intra-session measurement
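A minimal sketch of how a few of the intra-session metrics above could be computed from a page-view log. The `PageView` record, its fields and the function name are hypothetical. Bounce rate is taken as the share of single-page sessions, click depth as pages per session, and CTR is simplified to the share of page views with at least one click (CTR is often defined per impression instead).

```python
from dataclasses import dataclass


@dataclass
class PageView:
    """One page view (hypothetical log schema)."""
    session_id: str
    start: float   # seconds, when the page was loaded
    end: float     # seconds, when the user left the page
    clicked: bool  # whether the user clicked something on this page


def intra_session_metrics(views):
    """Compute simple intra-session metrics over a non-empty list of PageView records."""
    sessions = {}
    for v in views:
        sessions.setdefault(v.session_id, []).append(v)

    n_sessions = len(sessions)
    durations = []  # first page load to last page exit, per session
    depths = []     # pages viewed per session (click depth)
    for pages in sessions.values():
        durations.append(max(p.end for p in pages) - min(p.start for p in pages))
        depths.append(len(pages))

    return {
        "mean_dwell_time": sum(v.end - v.start for v in views) / len(views),
        "mean_session_duration": sum(durations) / n_sessions,
        "bounce_rate": sum(1 for d in depths if d == 1) / n_sessions,
        "mean_click_depth": sum(depths) / n_sessions,
        "ctr": sum(v.clicked for v in views) / len(views),
    }


views = [
    PageView("a", 0, 30, True),
    PageView("a", 30, 90, False),
    PageView("b", 0, 5, False),  # single-page session, i.e. a bounce
]
print(intra_session_metrics(views))
```

Whether a given dwell time signals interest, relevance or conversion is an interpretation layered on top of these raw numbers, not something the computation decides.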
Small scale measurement – eye tracking
§ 18 users, 16 tasks each (choose one story & rate it); eye movements recorded
§ Attention (gaze): interest plays no role; position > saliency
§ Selection: mainly driven by interest; position > attention
(Navalpakkam et al., 2012)
(Lin et al., 2007)
Small scale measurement – focused attention questionnaire, 5-point scale (strongly disagree to strongly agree)
1. I lost myself in this news tasks experience
2. I was so involved in my news tasks that I lost track of time
3. I blocked things out around me when I was completing the news tasks
4. When I was performing these news tasks, I lost track of the world around me
5. The time I spent performing these news tasks just slipped away
6. I was absorbed in my news tasks
7. During the news tasks experience I let myself go
(O'Brien & Toms, 2010)
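A sketch of scoring this questionnaire, assuming (as is common for Likert scales, but not stated on the slide) that a participant's focused attention score is the mean of the seven items. Cronbach's alpha is included as one standard check of the scale's internal consistency; the function names are illustrative.

```python
from statistics import variance

N_ITEMS = 7  # the seven focused attention items above


def focused_attention_score(responses):
    """Mean of one participant's 7 responses on the 1-5 agreement scale (assumed scoring)."""
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    if any(not 1 <= r <= 5 for r in responses):
        raise ValueError("responses must lie on the 1-5 scale")
    return sum(responses) / N_ITEMS


def cronbach_alpha(matrix):
    """Internal consistency of a scale; matrix[p][i] is participant p's answer to item i."""
    k = len(matrix[0])
    item_variances = [variance([row[i] for row in matrix]) for i in range(k)]
    total_variance = variance([sum(row) for row in matrix])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical answers from three participants.
answers = [
    [4, 5, 3, 4, 4, 5, 3],
    [2, 2, 1, 2, 3, 2, 2],
    [5, 5, 4, 5, 5, 4, 5],
]
print([focused_attention_score(a) for a in answers])
print(cronbach_alpha(answers))
```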
Small scale measurement – PANAS questionnaire (10 positive items and 10 negative items)
§ You feel this way right now, that is, at the present moment [1 = very slightly or not at all; 2 = a little; 3 = moderately; 4 = quite a bit; 5 = extremely] [randomize items]
§ Negative items: distressed, upset, guilty, scared, hostile, irritable, ashamed, nervous, jittery, afraid
§ Positive items: interested, excited, strong, enthusiastic, proud, alert, inspired, determined, attentive, active
(Watson, Clark & Tellegen, 1988)
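PANAS is conventionally scored by summing the ten positive and the ten negative items separately, giving two scores between 10 and 50. A minimal sketch with hypothetical function names: `presentation_order` shuffles the items as the [randomize items] note asks, and `affect_change` illustrates a pre-/post-task comparison.

```python
import random

POSITIVE = ["interested", "excited", "strong", "enthusiastic", "proud",
            "alert", "inspired", "determined", "attentive", "active"]
NEGATIVE = ["distressed", "upset", "guilty", "scared", "hostile",
            "irritable", "ashamed", "nervous", "jittery", "afraid"]


def presentation_order(rng=random):
    """Return the 20 items in a random order for presentation."""
    return rng.sample(POSITIVE + NEGATIVE, k=len(POSITIVE) + len(NEGATIVE))


def panas_scores(ratings):
    """ratings maps item -> 1..5; returns (positive affect, negative affect), each 10-50."""
    missing = set(POSITIVE + NEGATIVE) - set(ratings)
    if missing:
        raise ValueError(f"missing items: {sorted(missing)}")
    positive = sum(ratings[item] for item in POSITIVE)
    negative = sum(ratings[item] for item in NEGATIVE)
    return positive, negative


def affect_change(pre, post):
    """Post-task minus pre-task scores for a pre-/post-task design."""
    (pa_pre, na_pre), (pa_post, na_post) = panas_scores(pre), panas_scores(post)
    return pa_post - pa_pre, na_post - na_pre


# Hypothetical ratings for one participant before and after a task.
pre = {item: 3 for item in POSITIVE + NEGATIVE}
post = {**{item: 4 for item in POSITIVE}, **{item: 2 for item in NEGATIVE}}
print(panas_scores(pre), affect_change(pre, post))
```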
Small scale measurement – gaze and self-reporting
§ News domain
§ Interest
§ 57 users
§ Reading task (114)
§ Questionnaire (qualitative data)
§ Eye tracking recorded (quantitative data)
Three metrics: gaze, focused attention and positive affect
All three metrics align: interesting content promotes all engagement metrics
(Arapakis et al., 2014)
From small- to large-scale measurement – mouse tracking
§ Navigation & interaction with a digital environment usually involves the use of a mouse (selecting, positioning, clicking)
§ Several works show the mouse cursor to be a weak proxy of gaze (attention)
§ Low-cost, scalable alternative
§ Can be performed in a non-invasive manner, without removing users from their natural setting
Relevance, dwell time & cursor
“reading” a relevant long document vs “scanning” a long non-relevant document
(Guo & Agichtein, 2012)
Mouse gestures → features
[Figure: cursor positions (x0, y0) to (x8, y8) plotted over time t on x/y axes, with resting-cursor intervals (Δt rest) of 500 ms, 1000 ms and 1500 ms, followed by a click]
(Arapakis, Lalmas & Valkanas, 2014)
22 users reading two articles: 176,550 cursor positions, 2,913 mouse gestures
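The figure shows cursor movement interrupted by resting periods. One simple way to turn a raw trajectory into gestures and features is sketched below, assuming a gesture is a run of cursor samples separated by pauses of at least `rest_ms` (a hypothetical 500 ms default). The features (duration, path length, mean speed, straightness) are illustrative and not necessarily those used by Arapakis, Lalmas & Valkanas (2014).

```python
import math


def segment_gestures(samples, rest_ms=500):
    """Split time-ordered (t_ms, x, y) cursor samples into gestures.

    A new gesture starts whenever the cursor has rested for at least rest_ms
    (a hypothetical threshold) since the previous sample.
    """
    gestures, current = [], []
    for sample in samples:
        if current and sample[0] - current[-1][0] >= rest_ms:
            gestures.append(current)
            current = []
        current.append(sample)
    if current:
        gestures.append(current)
    return gestures


def gesture_features(gesture):
    """Illustrative features of one gesture: duration, path length, speed, straightness."""
    duration = gesture[-1][0] - gesture[0][0]
    path = sum(math.dist(a[1:], b[1:]) for a, b in zip(gesture, gesture[1:]))
    chord = math.dist(gesture[0][1:], gesture[-1][1:])
    return {
        "duration_ms": duration,
        "path_length": path,
        "mean_speed": path / duration if duration else 0.0,  # coordinate units per ms
        "straightness": chord / path if path else 1.0,       # 1.0 = straight line
    }


samples = [(0, 0, 0), (100, 3, 4), (200, 6, 8), (1000, 6, 8), (1100, 6, 18)]
features = [gesture_features(g) for g in segment_gestures(samples)]
print(features)
```

Feature vectors like these are what a clustering step would then group into gesture types.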
Towards a taxonomy of mouse gestures for user engagement measurement
§ The top-ranked clustering configuration is spectral clustering on the original dataset, with a hyperbolic tangent kernel and k = 38
• Certain types of mouse gestures occur more or less often depending on user interest in the article
• Significant correlations between certain types of mouse gestures and self-report measures
• Cursor behaviour goes beyond measuring frustration
• It informs about both positive and negative interaction
Setting
laboratory versus “in the wild”
from high level of consistency and control … to greater external validity and more “true to life”
§ How the visual catchiness (saliency) of “relevant” information impacts › focused attention › affect
§ Saliency model of visual attention developed by Itti & Koch (2000)
Crowdsourcing and self-report
Manipulating saliency
[Figure: web page screenshot and saliency maps for the salient and non-salient conditions]
(McCay-Peet, Lalmas & Navalpakkam, 2012)
Study design
§ 8 tasks = finding the latest news or headline on a celebrity or entertainment topic
§ Affect measured pre- and post-task using the Positive and Negative Affect Schedule (PANAS), e.g. “determined”, “attentive”
§ Focused attention measured with the 7-item focused attention scale, e.g. “I was so involved in my news tasks that I lost track of time”, “I blocked things out around me when I was completing the news tasks”, as well as perceived time
§ Interest level in topics (pre-task) and questionnaire (post-task), e.g. “I was interested in the content of the web pages”, “I wanted to find out more about the topics that I encountered on the web pages”
§ 189 (90+99) participants from Amazon Mechanical Turk
Using crowdsourcing works
§ When headlines are non-salient: users are slow at finding them, report more distraction due to web page features, and show a drop in affect
§ When headlines are salient: users find them faster, report that it is easy to focus, and maintain positive affect
§ Users reported “easier to focus in the salient condition” BUT no significant improvement in the focused attention scale or differences in perceived time spent on tasks
User interest in web page content is a good predictor of focused attention, which is itself a good predictor of positive affect
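To check whether one measure predicts another, a simple least-squares fit is one option. This is a generic sketch, not the analysis reported in the study: `x` could be each participant's interest rating and `y` their focused attention score, and the same function could then be applied to focused attention and positive affect. The data below are made up for illustration.

```python
def simple_regression(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b, r_squared)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must vary to fit a slope")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Share of the variance in y explained by x (a constant y is fitted exactly).
    r_squared = sxy ** 2 / (sxx * syy) if syy else 1.0
    return a, b, r_squared


# Hypothetical per-participant scores, for illustration only.
interest = [1, 2, 3, 4, 5]
focused_attention = [2.0, 2.6, 3.1, 3.9, 4.4]
print(simple_regression(interest, focused_attention))
```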
Objectivity vs Subjectivity
§ objective – analytics and physiology
§ subjective – self-report
towards reliability and validity … mapping objective and subjective measurement
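One way to map objective onto subjective measurement is to correlate them per participant, e.g. a cursor feature against a questionnaire score. Because Likert-based scores are ordinal, the sketch below uses Spearman's rank correlation (Pearson correlation on tie-averaged ranks); the function names and data are illustrative.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (each must vary)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical example: an objective cursor feature vs. a 1-5 self-report score.
cursor_speed = [120.0, 95.5, 143.2, 80.1, 110.7]
self_report = [4, 3, 5, 2, 3]
print(spearman(cursor_speed, self_report))
```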
“Ugly” vs “Normal” interface
[Figure: “normal” and “ugly” variants of BBC News and Wikipedia pages]
Mouse tracking and self-reporting
§ 324 users from Amazon Mechanical Turk (between-subject design)
§ Two domains (BBC News and Wikipedia)
§ Two tasks (reading and search)
§ “Normal vs Ugly” interface
§ Questionnaires (qualitative data) › focused attention, positive affect › interest, aesthetics › + demographics, hardware
§ Mouse tracking (quantitative data) › movement speed, movement rate, click rate, pause length, percentage of time still
(Warnock & Lalmas, 2013)
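A sketch of how the mouse-tracking features listed above might be computed for one page visit. The exact definitions used by Warnock & Lalmas (2013) are not given here, so these are assumptions: a gap of at least `pause_ms` (hypothetically 200 ms) between cursor samples counts as a pause, speed is distance over non-pause time, movement rate is cursor samples per second, and “percentage of time still” is total pause time over the visit duration.

```python
import math


def mouse_visit_features(moves, clicks, pause_ms=200):
    """Aggregate cursor features for one page visit (assumed definitions).

    moves:    time-ordered (t_ms, x, y) cursor samples
    clicks:   click timestamps in ms
    pause_ms: hypothetical threshold; a gap this long between samples is a pause
    """
    if len(moves) < 2:
        raise ValueError("need at least two cursor samples")
    visit_ms = moves[-1][0] - moves[0][0]
    if visit_ms <= 0:
        raise ValueError("cursor samples must span a positive duration")

    distance, moving_ms, pauses = 0.0, 0.0, []
    for (t0, x0, y0), (t1, x1, y1) in zip(moves, moves[1:]):
        distance += math.dist((x0, y0), (x1, y1))
        gap = t1 - t0
        if gap >= pause_ms:
            pauses.append(gap)  # cursor treated as still during this gap
        else:
            moving_ms += gap

    return {
        "movement_speed": distance / moving_ms * 1000 if moving_ms else 0.0,  # units/s
        "movement_rate": len(moves) / visit_ms * 1000,                         # samples/s
        "click_rate": len(clicks) / visit_ms * 60_000,                         # clicks/min
        "mean_pause_ms": sum(pauses) / len(pauses) if pauses else 0.0,
        "pct_time_still": 100 * sum(pauses) / visit_ms,
    }


moves = [(0, 0, 0), (100, 30, 40), (200, 60, 80), (1200, 60, 80), (1300, 90, 120)]
print(mouse_visit_features(moves, clicks=[1250]))
```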
Mouse tracking could not tell much about
§ focused attention and positive affect
§ user interest in the task/topic
§ aesthetics
§ BUT BUT BUT BUT › the “ugly” variant did not result in lower user aesthetics scores › although BBC > Wikipedia
• BUT – the comments left …
› Wikipedia: “The website was simply awful. Ads flashing everywhere, poor text colors on a dark blue background.”; “The webpage was entirely blue. I don't know if it was supposed to be like that, but it definitely detracted from the browsing experience.”
› BBC News: “The website's layout and color scheme were a bitch to navigate and read.”; “Comic sans is a horrible font.”
Flawed methodology? Non-existent signal? Wrong metric? Wrong measure?
§ Hawthorne effect
§ Design › Usability versus engagement › Within- versus between-subject
§ The mouse movement analysis was not sophisticated enough, as shown by recent work (Arapakis et al., 2014)
Temporality
short-term versus long-term
from intra-session … to inter-session
Large scale measurements – analytics
Intra-session measures
• Dwell time
• Session duration
• Bounce rate
• Play time (video)
• Mouse movement
• Click-through rate (CTR)
• Number of pages viewed (click depth)
• Conversion rate
• Number of UGC items (comments)
• …
Inter-session measures
• Fraction of return visits
• Time between visits (inter-session time, absence time)
• Total view time per month (video)
• Lifetime value (number of actions)
• Number of sessions per unit of time
• Total usage time per unit of time
• Number of friends on site (social networks)
• Number of UGC items (comments)
• …
• Intra-session engagement measures success in attracting the user to remain on the site for as long as possible.
• Inter-session engagement is measured by observing user lifetime value.
[Figure: loyalty, popularity, activity]
Inter-session metric – absence time
short absence is a sign of loyalty
important indication of user engagement
(Dupret & Lalmas, 2013)
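Absence time requires grouping a user's activity into sessions first. The sketch below assumes a 30-minute inactivity timeout, a common web-analytics convention rather than something stated in the talk, and also derives the fraction of return visits listed among the inter-session measures. Timestamps are in seconds; all names and data are hypothetical.

```python
def sessionize(timestamps, timeout_s=30 * 60):
    """Group one user's activity timestamps (seconds) into (start, end) sessions.

    A gap longer than timeout_s starts a new session; the 30-minute default is an
    assumed web-analytics convention.
    """
    sessions = []
    for t in sorted(timestamps):
        if sessions and t - sessions[-1][1] <= timeout_s:
            sessions[-1][1] = t            # extend the current session
        else:
            sessions.append([t, t])        # start a new session
    return [tuple(s) for s in sessions]


def absence_times(timestamps, timeout_s=30 * 60):
    """Time from the end of each session to the start of the next one."""
    spans = sessionize(timestamps, timeout_s)
    return [nxt[0] - prev[1] for prev, nxt in zip(spans, spans[1:])]


def return_fraction(user_timestamps, timeout_s=30 * 60):
    """Fraction of users (dict user -> timestamps) with more than one session."""
    returning = sum(1 for ts in user_timestamps.values()
                    if len(sessionize(ts, timeout_s)) > 1)
    return returning / len(user_timestamps)


activity = {"u1": [0, 600, 1200, 10_000, 10_300, 100_000], "u2": [5]}
print(absence_times(activity["u1"]), return_fraction(activity))
```

Under this reading, a shorter absence time for a user is the loyalty signal the slide describes.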
Absence time – search experience
1. Clicks after the 5th result reflect a poorer user experience; users cannot find what they are looking for
2. No click means a bad user experience
3. Clicking at the bottom is a sign of a low-quality overall ranking
4. Users finding their answers quickly (clicking sooner) return sooner to the search application
5. Returning to the same search result page is a worse user experience than reformulating the query.
search session metrics → absence time
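A descriptive sketch of relating a search-session metric to absence time: sessions are bucketed by the rank of the first click (a hypothetical grouping mirroring findings 1 and 2) and the median absence time is reported per bucket. This is not the analysis used by Dupret & Lalmas (2013), and the data are made up.

```python
from statistics import median


def click_bucket(first_click_rank):
    """Hypothetical bucketing of a search session by the rank of its first click."""
    if first_click_rank is None:
        return "no click"
    return "top 5" if first_click_rank <= 5 else "below 5"


def median_absence_by_bucket(sessions):
    """sessions: iterable of (first_click_rank or None, absence_time) pairs."""
    buckets = {}
    for rank, absence in sessions:
        buckets.setdefault(click_bucket(rank), []).append(absence)
    return {bucket: median(values) for bucket, values in buckets.items()}


# Illustrative, made-up data: absence times in hours.
log = [(1, 10), (3, 20), (7, 40), (None, 50), (None, 70)]
print(median_absence_by_bucket(log))
```

A fuller analysis would also need to handle users who have not returned by the end of the observation window, whose absence times are censored; survival analysis is the usual tool for that.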
Conclusions
Measuring User Engagement
1. No one measurement is perfect or complete.
2. Studies have different constraints.
3. Measurement should be applied consistently with attention to reliability.
4. Mostly “normal” interaction.
5. “It is a capital mistake to theorize before one has data.” - Arthur Conan Doyle
What is a good signal?
What is a good metric?
What is a correct interpretation?
Thank you (Danke schön)
This talk is based on tutorial & book “Measuring User Engagement” (with Heather O’Brien and Elad Yom-Tov)