User Behaviour Modelling - Online and Offline Methods, Metrics, and Challenges


User Behaviour Modelling: Online and Offline Methods, Metrics and Challenges

System and User Centered Evaluation Approaches in Interactive Information Retrieval (SAUCE 2016)

Presented by Ioannis Arapakis (Sr Data Scientist, Eurecat) | March 17, 2016

Contents

1. Short Biography
2. User Engagement in Web Search
3. User Modelling Using Mouse Cursor Interactions
4. On Human Information Processing in Information Retrieval

Short Bio

Education & Research Experience

§ Ph.D. in Computer Science, University of Glasgow (2010)
• Supervisor: Prof. Joemon M. Jose

§ M.Sc. in Information Technology, Royal Institute of Technology (KTH), Sweden (2007)

§ 2015 – 2016 Senior Data Scientist, Eurecat, Barcelona
• Data Mining Group

§ 2011 – 2015 Researcher, Yahoo Labs, Barcelona
• User Engagement, Web Search Group, Ad Processing and Retrieval Group

Research Interests

§ Data Mining
• Pattern recognition, predictive modelling, statistical inference, time series analysis

§ Information Retrieval
• Multimedia mining and search, user modelling, personalised search systems, recommender systems, evaluation and applications

§ Human-Computer Interaction
• Experimental methods, user engagement, neuro-physiological signal processing, sentiment analysis

Internal Projects

§ User Engagement
§ Ad Retrieval
§ Modelling News Article Quality
§ Mouse Tracking Analysis for Inferring User Behaviour
§ Discovery and Localisation of Points of Interest

Yahoo Labs: Impact of Search Latency on User Engagement in Web Search

Trade-off between the speed of a search system and the quality of its results

Being too slow, or trading result quality for speed, may have financial consequences for the search engine

Web Search Economics

§ Web users
• are impatient
• have limited time
• expect sub-second response times

§ High response latency
• can distract users
• decreases user engagement over time
• results in fewer query submissions

§ Sophisticated and costly solutions
• More information stored in the inverted index
• Machine-learned ranking strategies
• Fusing results from multiple resources

Research Methodology

Controlled Experimentation
• Small samples
• Controlled conditions
• High internal validity
• Behavioural observations
• Questionnaires
• Neurophysiological measures with high temporal and spatial resolution

Log Analysis
• Large datasets / samples
• High external validity
• Flexible parameter exploration
• A/B testing
• Bucket testing
• Real-life conditions

Research Questions

§ What are the main components in the response latency of a search engine?
§ How sensitive are users to response latency?
§ How much does response latency affect user behaviour?

Components of User-Perceived Response Latency

§ network latency: t_uf + t_fu
§ search engine latency: t_pre + t_fb + t_proc + t_bf + t_post
§ browser latency: t_render

[Diagram: user ↔ search frontend ↔ search backend, annotated with the component latencies t_uf, t_fu, t_pre, t_fb, t_proc, t_bf, t_post, and t_render]
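The decomposition above sums to the end-to-end latency the user actually experiences. A minimal sketch of that arithmetic (component names follow the diagram; the timing values are hypothetical, not measurements from the talk):

    # Sketch: summing the latency components named above (all values in milliseconds).
    def user_perceived_latency(t_uf, t_pre, t_fb, t_proc, t_bf, t_post, t_fu, t_render):
        """Total response latency as perceived by the user."""
        network = t_uf + t_fu                                  # user <-> frontend transfer
        search_engine = t_pre + t_fb + t_proc + t_bf + t_post  # frontend/backend processing
        browser = t_render                                     # client-side rendering
        return network + search_engine + browser

    # Example: 40ms each way on the network, 310ms inside the engine, 120ms to render
    print(user_perceived_latency(t_uf=40, t_pre=20, t_fb=10, t_proc=250, t_bf=10,
                                 t_post=20, t_fu=40, t_render=120))  # -> 510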

Contribution of Latency Components

[Figures: fraction of queries and cumulative fraction of queries vs. latency (normalized by the mean); contribution per component (%) vs. latency (normalized by the mean), for search engine, network, and browser latency]

Yahoo Labs: Impact of Search Latency on User Engagement in Web Search

Controlled Study (1)

Tasks

§ Task 1: Investigate users' perception of the search site response (slow or fast?)

§ Task 2: Assess users' ability to estimate the experienced search site latency (what was the latency in milliseconds?)

§ Task 3: Examine how brand bias affects perceived search site usability and UX

Experimental Methodology (Task 1)

§ Controlled study (12 participants) with two independent variables
• Search latency (0 – 2750ms)
• Search site speed (slow, fast)

§ Participants submitted 40 navigational queries
§ For each query we increased latency by a fixed amount (0 – 1750ms), using a step of 250ms
§ Each latency value (e.g., 0, 250, 500) was introduced five times, in a random order
§ After submitting each query, they were asked to report if the response of the search site was "slow" or "normal"
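For concreteness, the latency manipulation above amounts to a simple randomised schedule; a sketch (uniform random ordering is assumed, the slide only says "random order"):

    # Sketch of the Task 1 latency schedule: each added delay from 0 to 1750ms
    # (step 250ms) is injected five times, in random order, over the 40 queries.
    import random

    levels = list(range(0, 1751, 250))   # 0, 250, ..., 1750 -> 8 latency levels
    schedule = levels * 5                # 8 levels x 5 repetitions = 40 trials
    random.shuffle(schedule)             # randomised presentation order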

Was it Too Slow or Too Fast?

§ Up to a point (500ms), added response time delays are not noticeable to users
§ Beyond a certain threshold (1000ms), users can feel the added delay with very high likelihood

[Figures: likelihood of feeling the added latency vs. added latency (ms), for Slow SE (base), Slow SE, Fast SE (base), and Fast SE; increase relative to base likelihood vs. added latency (ms), for Slow SE and Fast SE]

Experimental Methodology (Task 2)

§ Controlled study (12 participants) with two independent variables
• Search latency (0 – 2750ms)
• Search site speed (slow, fast)

§ Participants submitted 50 navigational queries
§ For each query we increased latency by a fixed amount (500 – 2750ms), using a step of 250ms
§ Each latency value was introduced five times, in a random order
§ After each query submission, participants provided an estimation of the search latency in milliseconds

Counting the Seconds

[Figures: predicted latency (ms) vs. actual latency (ms), for males, females, and the average, compared against the actual values]

Perception of search latency varies considerably across the population

Experimental Methodology (Task 3)

§ Controlled study (20 participants) with two independent variables
• Search latency (0, 750, 1250, 1750ms)
• Search site speed (slow, fast)

§ Participants submitted 50 navigational queries
§ Participants performed four search tasks
• Asked to evaluate the performance of four different backend search systems
• Submitted as many navigational queries as possible from a list of 200 randomly sampled web domains
• For each query, they were asked to locate the target URL among the first ten results of the SERP

Reported User Engagement and System Usability

§ The tendency to overestimate or underestimate system performance biases users' perception of system usability
• Positive bias towards SEfast
• SEfast participants were more deeply engaged

                           SEslow latency                   SEfast latency
                           0ms    750ms  1250ms  1750ms     0ms    750ms  1250ms  1750ms
Post-Task Positive Affect  16.20  14.50  15.50   15.20      20.50  19.00  20.80   19.30
Post-Task Negative Affect   7.00   6.80   7.60    6.90       6.80   7.40   7.40    7.20
Frustration                 3.20   3.10   2.90    3.30       2.80   3.00   3.50    2.60
Focused Attention          22.80  22.90  19.90   22.20      27.90  26.60  23.90   29.50
SYSUS                      32.80  28.90  29.80   27.90      35.20  31.30  29.80   33.20

Yahoo Labs: Impact of Search Latency on User Engagement in Web Search

Large-scale Log Analysis (1)

Query Log Data

§ Random sample of 30M web search queries obtained from Yahoo
§ End-to-end (user-perceived) latency values
§ We select queries issued:
• within the US
• to a particular search data centre
• from desktop computers

§ Compare presence of clicks for two given query instances qfast & qslow (a rough pairing sketch follows below):
• submitted by the same user
• having the same query string
• matching the same search results
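To make the pairing idea concrete, here is a hypothetical sketch of the comparison, not the original pipeline; the DataFrame columns user_id, query, result_set_id, latency_ms, and clicked are assumptions:

    # Pair identical query instances served at different latencies and compare clicks.
    import pandas as pd

    def click_on_fast_vs_slow(log: pd.DataFrame) -> float:
        """For each (user, query, result set) with >= 2 instances, compare the fastest
        and slowest instance and count which one received a click."""
        click_on_fast = click_on_slow = 0
        for _, group in log.groupby(["user_id", "query", "result_set_id"]):
            if len(group) < 2:
                continue
            fast = group.loc[group["latency_ms"].idxmin()]
            slow = group.loc[group["latency_ms"].idxmax()]
            if fast["clicked"] and not slow["clicked"]:
                click_on_fast += 1
            elif slow["clicked"] and not fast["clicked"]:
                click_on_slow += 1
        return click_on_fast / max(click_on_slow, 1)  # ratio > 1 => preference for the fast instance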

Variation of Clicked Page Ratio Metric

[Figure: clicked page ratio (normalized by the max) vs. latency (normalized by the mean)]

Click Presence

[Figure: fraction of query pairs and click-on-fast / click-on-slow ratio vs. latency difference (in milliseconds)]

§ Given two content-wise identical result pages, users are more likely to click on the result page that is served with lower latency

§ 500ms of latency difference is the critical point beyond which users are more likely to click on a result retrieved with lower latency

Click Count

[Figure: fraction of query pairs and click-more-on-fast / click-more-on-slow ratio vs. latency difference (in milliseconds)]

§ Clicking on more results becomes preferable to submitting new queries when the latency difference exceeds a certain threshold (1250ms)

Yahoo Labs: Impact of Search Latency on User Engagement in Web Search

Controlled Study (2)

Do Small Latency Increases Affect User Engagement?

Human Information Processing

§ We are consciously unaware of the mental processes determining our behaviour
§ Such unconscious influences reach from basic or low-level mental processes to high-level psychological processes
§ Conclusions based on self-report methods are inherently limited
§ Users cannot provide information that is not consciously available to them

Psychophysiological Measures of Engagement

§ User Engagement Scale (UES)
• Positive affect (PAS)
• Negative affect (NAS)
• Perceived usability
• Felt involvement and focused attention

§ IBM's Computer System Usability Questionnaire (CSUQ)
• System usefulness (SYSUSE)

§ Electrodermal activity (EDA)
§ Electromyography [corrugator supercilii] (EMG-CS)

EDA Signal

§ Applied 200ms smoothing filter & artifact removal
§ A temporal series was constructed from each physiological signal
§ Averaged the data every 1-second period (480 points ≈ 8 minutes)
§ Each 10-second period following a query submission was visually inspected for SCRs (skin conductance responses)
§ Data sample: 132 SCRs; 10 points (seconds) per SCR
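A minimal sketch of that preprocessing, assuming a raw EDA trace sampled at fs Hz, query-submission times in seconds, and a moving-average smoother (the talk does not specify the filter implementation):

    # Illustrative EDA preprocessing: smoothing, 1-second averaging, post-query windows.
    import numpy as np

    def preprocess_eda(eda, fs, query_times_s):
        # 200ms moving-average smoothing filter
        win = max(1, int(0.2 * fs))
        smoothed = np.convolve(eda, np.ones(win) / win, mode="same")

        # Average the signal into 1-second bins to build the temporal series
        n_sec = len(smoothed) // fs
        per_second = smoothed[: n_sec * fs].reshape(n_sec, fs).mean(axis=1)

        # Extract the 10-second window following each query submission, which
        # would then be inspected for skin conductance responses (SCRs)
        windows = [per_second[int(t): int(t) + 10]
                   for t in query_times_s if int(t) + 10 <= n_sec]
        return per_second, windows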

[Figure: EDA (µS) vs. time after stimulus onset (in seconds)]

EMG-CS Signal

§ Band-pass filter 30-500Hz & artifact removal
§ A temporal series was constructed from each physiological signal
§ Averaged the data every 1-second period (480 points ≈ 8 minutes)
§ Included the data for the entire 3-second period after each query submission
§ Outliers excluded. Data sample: 7256 samples (4 seconds per query)
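As an illustration only, a 30-500Hz band-pass of this kind could be implemented as follows; the sampling rate of 2000Hz and the Butterworth design are assumptions, since the talk does not state the filter design:

    # Sketch of a 30-500Hz band-pass filter for the EMG-CS signal.
    from scipy.signal import butter, filtfilt

    def bandpass_emg(emg, fs=2000, low=30.0, high=500.0):
        nyq = fs / 2.0
        b, a = butter(4, [low / nyq, high / nyq], btype="bandpass")
        return filtfilt(b, a, emg)  # zero-phase filtering; then average into 1-second bins as above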

Physiological Data

§ Mixed multilevel models (a regression-based approach)
• Allow comparison of data at different levels
• Level 1: conditions within subjects
• Level 2: subjects
• Allow including random terms in the model for random factors
• Random intercepts for between-subject variability; account for the difference in means between subjects
• Useful for physiological data, since between-subject variability can be much larger than variability due to experimental conditions and can therefore mask it
• Random slopes for the effects of time and order of presentation
• Deal with autocorrelated data (e.g., physiological data)
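One way such a model could be specified in Python, as a sketch under assumed column names (subject, latency, segment, eda); the original analysis was not necessarily fitted with statsmodels:

    # Mixed multilevel model: fixed effects for latency and time segment,
    # random intercept per subject.
    import statsmodels.formula.api as smf

    def fit_eda_model(df):
        model = smf.mixedlm(
            "eda ~ C(latency) + C(segment)",  # level 1: conditions within subjects
            df,
            groups=df["subject"],             # level 2: subjects (random intercepts)
            re_formula="1",                   # random intercept; slopes could be added, e.g. "~time"
        )
        return model.fit()

    # result = fit_eda_model(df); print(result.summary())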

EDA Model

Fixed factors    Coefficients
Intercept        -.31*
Latency 500ms     .50***
Latency 750ms     .42**
Latency 1000ms    .60***
Seg 2             .11***
Seg 3             .36***
Seg 4             .68***
Seg 5             .88***
Seg 6             .90***
Seg 7             .80***
Seg 8             .74***
Seg 9             .72***
Seg 10            .69***

EMG-CS Model

Fixed factors    Coefficients
Intercept         .0188***
Latency 500ms     .0019***
Latency 750ms     .0034***
Latency 1000ms    .0010*
Seg 1             .0000393
Seg 2             .0002397***
Seg 3             .0003163***

§ Higher EMG values → more negative experience
§ Higher EDA values → more intense experience
§ Even short latency increases (>500ms) that are not consciously perceived have sizeable physiological effects

Yahoo Labs: Impact of Search Latency on User Engagement in Web Search

Large-scale Log Analysis (2)

Query Log Data

§ Random sample of 30M web search queries obtained from Yahoo
§ We select queries issued:
• within the US
• to a particular search data centre
• from desktop computers

§ Compare presence of clicks for two given query instances qfast & qslow:
• submitted by the same user
• having the same query string
• matching the same search results

§ Click presence (click-on-fast, click-on-slow)
§ Click count (click-more-on-fast, click-more-on-slow)

[Figure: fraction of query pairs and click-on-fast / click-on-slow ratio vs. latency difference (in milliseconds). Fast or slow query response preference according to the click presence metric]

[Figure: fraction of query pairs and click-more-on-fast / click-more-on-slow ratio vs. latency difference (in milliseconds). Fast or slow query response preference according to the click count metric]

Yahoo Labs: Mouse Tracking Analysis for Inferring User Behaviour

Background Information

§ Abundance of multimedia content
§ Availability of large volumes of interaction data
§ Scalable data mining techniques

Part of these efforts has focused on understanding how users interact and engage with web content.

Measurement of within-content engagement remains a difficult and unsolved task, with applications in:
• personalisation
• service quality
• ad quality
• recommender algorithms

Challenges

§ Lack of standardised methodologies
§ Absence of well-validated measures
§ Users often don't provide explicit feedback about their QoE
§ Existing methods don't form scalable solutions
§ Traditional web analytics (e.g., clicks, dwell time, pageviews) vs. users' true intentions and motivations

Why Mouse Tracking?

§ Navigation & interaction with a digital environment usually involves the use of a mouse (i.e., selecting, hovering, clicking)
§ Can be easily performed in a non-invasive manner, without removing users from their natural setting
§ Several works have shown that the mouse cursor is a proxy of gaze (attention)
§ Low-cost, scalable alternative to eye-tracking

Motivation

§ Develop techniques for measuring within-content engagement with online news articles
§ Quantify user engagement with Direct Displays in web search, e.g., Knowledge Graph

Methodology

§ Large-scale analysis
• ~15GB of mouse cursor data (e.g., <x,y,t>, clicks) of users interacting with online news (bucket test)
• Learn mouse cursor patterns (unsupervised approach)

§ Controlled study
• A small sample (~50 participants) of users interacting with engaging and non-engaging news articles
• Create ground truth for our prediction task

§ Apply the learned patterns to the smaller set and test on the ground truth

Feature Engineering

§ Time
§ Coverage
§ Type (e.g., vertical scroll)
§ Distance
§ Speed
§ Acceleration
§ Direction
§ Spectral analysis
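Several of these kinematic features can be derived directly from the raw <x,y,t> cursor events; a hypothetical sketch (function and feature names are illustrative, not the original feature set):

    # Simple kinematic features from one page view's cursor trajectory.
    import numpy as np

    def cursor_features(x, y, t):
        dx, dy, dt = np.diff(x), np.diff(y), np.diff(t)
        dt = np.where(dt == 0, 1e-6, dt)      # guard against repeated timestamps
        step = np.hypot(dx, dy)               # per-step distance in pixels
        speed = step / dt
        accel = np.diff(speed) / dt[1:]
        return {
            "time": t[-1] - t[0],                         # total interaction time
            "distance": step.sum(),                       # total path length
            "speed_mean": speed.mean(),
            "accel_mean": accel.mean(),
            "direction_mean": np.arctan2(dy, dx).mean(),  # crude direction summary (radians)
        }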

Learning Mouse Cursor Motifs

§ Perform the clustering for k = 1..40 (see the sketch below)
• Agglomerative Hierarchical Clustering
• K-Means
• Spectral Clustering

§ Compute cluster validity using a large number of internal criteria; each criterion results in a ranking

§ Perform rank aggregation to derive a single ranked list L' that has the minimum distance from a given set of ranked input lists L = {L1, L2, …, Lm}
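As a rough illustration of this model-selection loop (not the original code: only K-Means and three scikit-learn internal criteria are shown, and a simple Borda count stands in for the unspecified rank-aggregation method; AgglomerativeClustering or SpectralClustering could be swapped in):

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.metrics import (silhouette_score, calinski_harabasz_score,
                                 davies_bouldin_score)

    def select_k(X, k_range=range(2, 41)):
        scores = {"silhouette": [], "calinski": [], "davies": []}
        for k in k_range:  # internal criteria need at least 2 clusters
            labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
            scores["silhouette"].append(silhouette_score(X, labels))
            scores["calinski"].append(calinski_harabasz_score(X, labels))
            scores["davies"].append(-davies_bouldin_score(X, labels))  # lower is better, so negate
        # Borda-style rank aggregation: sum each candidate k's rank position across criteria
        borda = np.zeros(len(list(k_range)))
        for vals in scores.values():
            borda += np.argsort(np.argsort(vals))  # higher score -> more rank points
        return list(k_range)[int(np.argmax(borda))]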

Prediction Task

§ The frequency distribution of mouse gestures varies per user and content (interesting vs. uninteresting)

Classifier             Precision  Recall  F-Measure  Accuracy
Baseline               .273       .523    .359       .522
1NN                    .664       .659    .659       .659
SMO                    .700       .682    .678       .681
Random Forest          .727       .727    .727       .727
Stacking (1NN + SMO)   .751       .750    .750       .750
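A scikit-learn analogue of the best-performing stacked model could look like the sketch below; the slide's SMO is Weka's SVM trainer, so an SVC stands in for it, and the feature matrix of per-session gesture frequencies is assumed:

    # Stacked 1NN + SVM classifier over gesture-frequency features.
    from sklearn.ensemble import StackingClassifier
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import SVC
    from sklearn.linear_model import LogisticRegression

    def build_stacker():
        return StackingClassifier(
            estimators=[
                ("1nn", KNeighborsClassifier(n_neighbors=1)),
                ("svm", SVC(probability=True)),   # rough stand-in for Weka's SMO
            ],
            final_estimator=LogisticRegression(),
            cv=5,
        )

    # Usage (X = per-session gesture frequencies, y = engaging vs. non-engaging):
    # from sklearn.model_selection import cross_val_score
    # print(cross_val_score(build_stacker(), X, y, scoring="f1").mean())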

On Human Information Processing in Information Retrieval

Human Information Processing (HIP)

§ We are not consciously aware of the mental processes determining our behaviour

§ Such unconscious influences reach from basic or low-level mental processes to high-level psychological processes like motivations, preferences, or complex behaviours

Human Information Processing (HIP)

§ The search for information is often led by a human brain
§ HIP is a field of study within experimental psychology and cognitive neuroscience

Psychological Variables

§ The most interesting psychological variables and processes for the study of IR are those related to attentional and emotional phenomena
• Selective attention
• Cognitive effort / arousal
• Emotional reactions

Psychophysiological Measures of HIP

§ Standardised questionnaires for measuring perceptual aspects, perceived usability, cognitive workload, or affective state
§ Online measures of user behaviour and cognitive states that are often unavailable for conscious report:
• Behavioural
• Psychophysiological

Characteristics of Psychological Methods

§ Helpful in unveiling attentional and emotional reactions not consciously available to us

§ Offer high temporal and spatial resolution
§ Robust against cognitive biases (e.g., social desirability bias*)
§ Always provide "honest" responses
§ No direct question to the subject, no direct answer
§ The information on the research questions has to be inferred from the variations in the physiological signals and the way they relate to psychological constructs

* The tendency of survey respondents to answer questions in a manner that will be viewed favorably by others.

Electrodermal Activity (EDA)

§ Changes in the conductivity of the skin caused by the activation of sweat glands, driven by the autonomic nervous system (sympathetic division)

§ Reflects general activation both for attentionaland emotional measures (in fact, it is calibrated by having participants perform complex math calculations)

§ It’s the basis of the “truth machine”, though not as effective as fiction has led us to believe…

Electrodermal Activity (EDA)

§ Unconscious Physiological Effects of Search Latency on Users and Their Click Behaviour (SIGIR 2015)
• Although the latency effects did not produce changes in the self-reported data, their impact on users' physiological responses is evident
• Even when short latency increases of under 500ms are not consciously perceived, they have sizeable physiological effects that can contribute to the overall user experience

[Figures: EDA (µS) vs. time after query onset (in seconds), for 0ms, 500ms, 750ms, and 1000ms of added latency; EDA (µS) vs. time after stimulus onset (in seconds)]

Electrodermal Activity (EDA)

§ A large-scale query log analysis confirmed the effect on users' clicking behaviour and revealed a significant decrease in users' engagement with the search result page, even at small increases in latency

[Figures: fraction of query pairs and click ratio vs. latency difference (in milliseconds), for the click count metric (click-more-on-fast vs. click-more-on-slow) and the click presence metric (click-on-fast vs. click-on-slow)]

HIP Dynamics

§ Human information processing is both serial and parallel
§ Cognitive science has provided large amounts of evidence that conscious information processing is mainly serial
§ When processing information in situations that require shifting the focus of attention between different tasks and/or stimuli, the effort required to process that information increases
§ Simon effect

HIP Dynamics (Serial Processing)

§ Switching tasks
§ Try to read the word on odd trials and name the colour on even trials!

Green

Red

Blue

Red

Green

Yellow

HIP Dynamics (Parallel Processing)

§ Simon effect: hit the left key if there is an A on screen and the right key if there is a B
§ The effect is still there with crossed hands!

Multimodal Behaviour Modelling

§ Behaviour measurements in ecological conditions
§ Behaviour understanding through cameras and microphones
§ Aggregating various online measures gives an accurate picture of the user's experience
§ Robust real-time behaviour analyses, information that can be used for research on human behaviour and user experience
§ The opportunity is ripe to move beyond experimental laboratory settings into real large-scale controlled studies

Take-away Messages

§ The use of neuro-physiological methods in IR research is essential in order to obtain a complete picture of the mental processes underlying user search behaviour
§ The collaboration between psychological and IR research can go far beyond the application of sophisticated measuring methodologies
§ Introduce actual knowledge on the dynamics of human information processing into a real-world testing ground
§ The use of multimodal signals holds the promise of allowing large-scale, controlled studies that will undoubtedly foster the progress of both research fields

Thank you for your attention!

iarapakis

arapakis.ioannis@gmail.com

https://es.linkedin.com/in/ioannisarapakis
