MediaEval 2016 - IR Evaluation: Putting the User Back in the Loop
TRANSCRIPT
Change the search algorithm.
How can we know whether we made the users happier?
Different approaches to evaluation
• User studies
• In-situ evaluation
  – A/B Testing
  – Interleaving
• Collection-based evaluation
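Interleaving, listed above, merges the rankings of two systems into a single result list and credits each click to the system that contributed the clicked document. A minimal sketch of the team-draft variant, with made-up document IDs; this is one common formulation, not a specific production implementation:

```python
import random

def team_draft_interleave(ranking_a, ranking_b, rng):
    """Team-draft interleaving: each round, a coin flip decides which
    system picks first; each system then contributes its highest-ranked
    document not already in the merged list."""
    all_docs = set(ranking_a) | set(ranking_b)
    merged, team = [], []
    while len(merged) < len(all_docs):
        order = [("A", ranking_a), ("B", ranking_b)]
        if rng.random() < 0.5:
            order.reverse()
        for label, ranking in order:
            doc = next((d for d in ranking if d not in merged), None)
            if doc is not None:
                merged.append(doc)
                team.append(label)
    return merged, team

def credit_clicks(merged, team, clicked):
    """Credit each click on the interleaved list to the contributing system."""
    wins = {"A": 0, "B": 0}
    for doc, label in zip(merged, team):
        if doc in clicked:
            wins[label] += 1
    return wins
```

Aggregated over many queries, the system with more credited clicks is preferred by users.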
In-situ evaluation
A/B Testing
Baseline (control) vs. Experimental (treatment)
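In an A/B test, incoming users are split between the control and treatment rankers and a behavioral metric is compared across the two groups. A minimal sketch, assuming deterministic hash-based bucketing and a click-through-rate comparison via a two-proportion z-test; both are illustrative choices, not details given in the slides:

```python
import hashlib
from math import sqrt

def bucket(user_id, salt="experiment-1"):
    """Deterministically assign a user to control ("A") or treatment ("B")
    by hashing the user id with an experiment-specific salt."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"

def two_proportion_z(clicks_a, n_a, clicks_b, n_b):
    """z-statistic for the difference between two click-through rates,
    using the pooled proportion for the standard error."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    p_pool = (clicks_a + clicks_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se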
Collection-based evaluation
Machine Learning
• Feature vectors
• Labels

Cranfield Collections (Information Retrieval)
• Documents
• Queries
• Labels – relevance judgments

Query 1, Query 2, ..., Query N
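Collection-based evaluation scores a system's ranking for each query against the collection's relevance judgments. A sketch using nDCG@k, one common judgment-based metric; the metric choice here is illustrative, and the document IDs and grades are made up:

```python
from math import log2

def ndcg_at_k(ranked_docs, qrels, k=10):
    """nDCG@k: DCG of the submitted ranking divided by the DCG of the
    ideal ranking built from the relevance judgments (qrels)."""
    gains = [qrels.get(d, 0) for d in ranked_docs[:k]]
    dcg = sum(g / log2(i + 2) for i, g in enumerate(gains))
    ideal = sorted(qrels.values(), reverse=True)[:k]
    idcg = sum(g / log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0
```

Because the judgments are frozen with the collection, any number of systems can be scored and compared on the same queries without new user involvement; that reusability is the core appeal of the Cranfield approach.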
Evaluation Landscape

Cranfield Paradigm (System Focus)
• Simple user model
• Controlled experiments
• Reusable but static test collections

Online Evaluation (User Focus)
• Full user participation
• Many degrees of freedom
• Unrepeatable experiments

Tracks along this spectrum: TREC Tasks, TREC Session, TREC Total Recall, TREC OpenSearch
TREC Total Recall

[Diagram: query → search algorithm → results → human assessor, with judgments fed back into the search algorithm over the document collection]
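The assessor-in-the-loop design can be sketched as a one-document-at-a-time feedback loop; `score` and `assess` below are placeholder callables standing in for a real relevance model and a real human assessor, and are not part of the track's specification:

```python
def total_recall_loop(collection, score, assess, budget):
    """Repeatedly show the assessor the highest-scoring unjudged
    document and record the judgment, which a real system would fold
    back into retraining the scoring model."""
    judged, found = {}, []
    for _ in range(budget):
        unjudged = [d for d in collection if d not in judged]
        if not unjudged:
            break
        doc = max(unjudged, key=lambda d: score(d, judged))
        judged[doc] = assess(doc)       # human judgment
        if judged[doc]:
            found.append(doc)           # relevant document recovered
    return found, judged
```

The goal in Total Recall is to surface (nearly) all relevant documents with as little assessor effort as possible, so the loop is evaluated by recall as a function of the number of judgments.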
TREC Session Track
TREC Session Track [2010-2014]
1. improve search by using session information
2. improve search over an entire user’s session instead of a single query
Example queries: "Paris Luxurious Hotels" vs. "Paris Hilton"
Test Collection
Evaluating Retrieval over Sessions: The TREC Session Track 2011–2014
Ben Carterette (1), Paul Clough (2), Mark Hall (3), Evangelos Kanoulas (4), Mark Sanderson (5)
(1) University of Delaware, (2) University of Sheffield, (3) Edge Hill University, (4) University of Amsterdam, (5) RMIT University
Objectives
• Test whether the retrieval effectiveness of a query could be improved by using previous queries, ranked results, and user interactions.
Test Collection
Four test collections (2011–2014) comprising N sessions of varying length; each session consisted of:
• m_i blocks of user interactions (the session's length);
• the current query q_{m_i} in the session;
• the m_i − 1 blocks of interactions in the session prior to the current query, composed of:
  – the user queries in the session, q_1, q_2, ..., q_{m_i − 1};
  – the ranked list of URLs seen by the user for each of those queries;
  – the set of clicked URLs/snippets.
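The per-session records described above could be represented as follows; the class and field names are illustrative, not the track's official schema:

```python
from dataclasses import dataclass, field

@dataclass
class Interaction:
    """One prior block in the session: a query, the ranked URLs the
    user saw for it, and the URLs/snippets the user clicked."""
    query: str
    results: list
    clicks: list = field(default_factory=list)

@dataclass
class Session:
    """A session: the current query q_m plus the m-1 prior blocks."""
    current_query: str
    history: list = field(default_factory=list)

    @property
    def length(self):
        # Session length in queries: prior blocks plus the current query.
        return len(self.history) + 1
```

A participating system receives everything except judgments for the current query and must produce a ranking for `current_query`, optionally exploiting `history`.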
Test Collection Statistics

|                             | 2011         | 2012         | 2013         | 2014         |
|-----------------------------|--------------|--------------|--------------|--------------|
| collection                  | ClueWeb09    | ClueWeb09    | ClueWeb12    | ClueWeb12    |
| topic set size              | 62           | 48           | 61           | 60           |
| topic category distribution | known-item   | 10 exploratory, 6 interpretive, 20 known-item, 12 known-subj | 10 exploratory, 9 interpretive, 32 known-item, 10 known-subj | 15 exploratory, 15 interpretive, 15 known-item, 15 known-subj |
| user population             | U. Sheffield | U. Sheffield | U. Sheffield + IR researchers | MTurk |
| search engine               | BOSS + CW09 filter | BOSS + CW09 filter | indri  | indri        |
| total sessions              | 76           | 98           | 133          | 1,257        |
| sessions per topic          | 1.2          | 2.0          | 2.2          | 21.0         |
| mean length (in queries)    | 3.7          | 3.0          | 3.7          | 3.7          |
| median time between queries | 68.5 s       | 66.7 s       | 72.2 s       | 25.6 s       |
| topics judged               | 62           | 48           | 49           | 51           |
| total relevance judgments   | 19,413       | 17,861       | 13,132       | 16,949       |
Algorithmic Improvements
• Session history can be used to improve effectiveness over basic ad hoc retrieval.
[Figure: maximum change in nDCG@10 from the RL1 baseline per run, runs on the x-axis; series for 2011, 2012, 2013, and 2014]
Topic - System Analysis
• Known-subject and exploratory topics benefit most from access to session history.
• There is substantial variability across topics due to the way users perform their search and formulate their queries.
[Figure: per-topic difference in ΔnDCG@10 over sessions, topics ordered by median and labeled year–topic number (e.g. 2012–10, 2014–40)]
Conclusions
• Retrieval effectiveness can be improved for ad hoc retrieval using data based on session history.
• The more detailed the session data, the greater the improvement.
SIGIR 2016
TREC Session Track [2010-2014]
1. improve search by using session information
2. improve search over an entire user’s session instead of a single query
TREC Tasks Track
TREC Tasks Track [2015–now]
1. understand the user's underlying task
2. assist the user in completing the task
Make Improvements At Home
TASK UNDERSTANDING
Make Improvements At Home
TASK COMPLETION
TREC Session Track [2010-2014]
1. improve search by using session information
2. improve search over an entire user’s session instead of a single query
CLEF Dynamic Search for Complex Tasks
CLEF Complex Tasks [now]
1. Produce a methodology and algorithms that lead to a dynamic test collection by simulating users
2. Understand and quantify what constitutes a good ranking of documents at different stages of a session, and a good overall session
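One way to drive such a dynamic test collection is a simulated user. The sketch below uses a cascade-style click model, with illustrative `p_click` and `p_stop` parameters; this is a common simulation choice, not a model prescribed by the CLEF lab:

```python
import random

def simulate_user(ranking, qrels, p_click=0.8, p_stop=0.3, seed=0):
    """Cascade-style simulated user: scan the ranking top-down, click a
    relevant document with probability p_click, and after each click
    stop the scan with probability p_stop."""
    rng = random.Random(seed)
    clicks = []
    for doc in ranking:
        if qrels.get(doc, 0) > 0 and rng.random() < p_click:
            clicks.append(doc)
            if rng.random() < p_stop:
                break
    return clicks
```

Because the simulated user's clicks are a deterministic function of the ranking, judgments, and seed, a system can be run against the simulator repeatedly, which restores the repeatability that live online evaluation lacks.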
TREC Open Search