Feeling Lucky? Multi-armed Bandits for Ordering Judgements in Pooling-based Evaluation
TRANSCRIPT
Feeling Lucky? Multi-armed bandits for Ordering Judgements in Pooling-based Evaluation
David E. Losada
Javier Parapar, Álvaro Barreiro
ACM SAC, 2016
Evaluation is crucial
compare retrieval algorithms, design new search solutions, ...
information retrieval evaluation: 3 main ingredients
docs, queries, relevance judgements
relevance assessments are incomplete

search system 1, search system 2, search system 3, ..., search system n
each system returns a ranking of docs by estimated relevance (a run):

run 1: 1. WSJ13   2. WSJ17   ...   100. AP567   101. AP555   ...
run 2: 1. FT941   2. WSJ13   ...   100. WSJ19   101. AP555   ...
run 3: 1. ZF207   2. AP881   ...   100. FT967   101. AP555   ...
run 4: 1. WSJ13   2. CR93E   ...   100. AP111   101. AP555   ...

the top docs of every run, down to a fixed pool depth (e.g. 100), are merged into the pool:
pool = { WSJ13, WSJ17, AP567, WSJ19, AP111, CR93E, ZF207, AP881, FT967, ... }
only the pooled docs receive human assessments; docs below the pool depth go unjudged
finding relevant docs is the key
The most productive use of assessors' time is judging relevant docs (Sanderson & Zobel, 2005)
![Page 15: Feeling Lucky? Multi-armed Bandits for Ordering Judgements in Pooling-based Evaluation](https://reader034.vdocuments.mx/reader034/viewer/2022052705/58f0af511a28ab967d8b45e9/html5/thumbnails/15.jpg)
Effective adjudication methods
Give priority to pooled docs that are potentially relevant
Can significantly reduce the number of judgements required to identify a given number of relevant docs
But most existing methods are ad hoc...
![Page 16: Feeling Lucky? Multi-armed Bandits for Ordering Judgements in Pooling-based Evaluation](https://reader034.vdocuments.mx/reader034/viewer/2022052705/58f0af511a28ab967d8b45e9/html5/thumbnails/16.jpg)
Our main idea...
Cast doc adjudication as a reinforcement learning problem
Doc judging is an iterative process where we learn as judgements come in
![Page 17: Feeling Lucky? Multi-armed Bandits for Ordering Judgements in Pooling-based Evaluation](https://reader034.vdocuments.mx/reader034/viewer/2022052705/58f0af511a28ab967d8b45e9/html5/thumbnails/17.jpg)
Doc adjudication as a reinforcement learning problem
Initially we know nothing about the quality of the runs
? ? ? ?...
As judgements come in...
And we can adapt and allocate more docs for judgement from the most promising runs
Multi-armed bandits

machines with unknown probabilities of giving a prize
play a machine and observe the reward; repeat
exploration vs exploitation
exploitation: uses current knowledge, spends no time sampling inferior actions, maximizes expected reward on the next action
exploration: tries uncertain actions, gets more info about expected payoffs, may produce greater total reward in the long run
allocation methods: choose the next action (play) based on past plays and obtained rewards; they implement different ways to trade off exploration and exploitation
![Page 26: Feeling Lucky? Multi-armed Bandits for Ordering Judgements in Pooling-based Evaluation](https://reader034.vdocuments.mx/reader034/viewer/2022052705/58f0af511a28ab967d8b45e9/html5/thumbnails/26.jpg)
Multi-armed bandits for ordering judgements
machines = runs
play a machine = select a run and get its next (unjudged) doc (e.g. 1. WSJ13, 2. CR93E, ...)
(binary) reward = relevance/non-relevance of the selected doc
![Page 27: Feeling Lucky? Multi-armed Bandits for Ordering Judgements in Pooling-based Evaluation](https://reader034.vdocuments.mx/reader034/viewer/2022052705/58f0af511a28ab967d8b45e9/html5/thumbnails/27.jpg)
Allocation methods tested
random: plays a random machine at each step
ϵn-greedy:
  with prob 1-ϵ plays the machine with the highest avg reward
  with prob ϵ plays a random machine
  the prob of exploration (ϵ) decreases with the number of plays
Upper Confidence Bound (UCB):
  computes upper confidence bounds for the avg rewards
  confidence intervals get narrower with the number of plays
  selects the machine with the highest optimistic estimate
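As a concrete sketch, the non-Bayesian allocation rules above might look as follows (a minimal illustration with binary rewards, 1 = relevant, 0 = non-relevant; the ϵn schedule with constants c and d follows the usual ϵn-greedy formulation and is an assumption, not the authors' code):

```python
import math
import random

def epsilon_n_greedy(counts, rewards, n, c=1.0, d=0.1):
    """epsilon_n-greedy: exploration prob. shrinks with the number of plays n."""
    k = len(counts)
    eps = min(1.0, (c * k) / (d * d * max(n, 1)))   # assumed epsilon_n schedule
    if random.random() < eps:
        return random.randrange(k)                  # explore: random machine
    avgs = [r / m if m > 0 else 0.0 for r, m in zip(rewards, counts)]
    return max(range(k), key=lambda i: avgs[i])     # exploit: best avg reward

def ucb1(counts, rewards, n):
    """UCB1: play the machine with the highest optimistic estimate."""
    for i, m in enumerate(counts):
        if m == 0:
            return i                                # play each machine once first
    def bound(i):
        return rewards[i] / counts[i] + math.sqrt(2 * math.log(n) / counts[i])
    return max(range(len(counts)), key=bound)
```

Here `counts[i]` is how many docs were judged from run i, `rewards[i]` how many of them were relevant; the confidence term of UCB1 narrows as a run is played more.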
![Page 28: Feeling Lucky? Multi-armed Bandits for Ordering Judgements in Pooling-based Evaluation](https://reader034.vdocuments.mx/reader034/viewer/2022052705/58f0af511a28ab967d8b45e9/html5/thumbnails/28.jpg)
Allocation methods tested: Bayesian bandits
prior probability of each run giving a relevant doc: Uniform(0,1) (or, equivalently, Beta(α,β) with α=β=1)
U(0,1) U(0,1) U(0,1) U(0,1) ...
evidence (O ∈ {0,1}) is Bernoulli (or, equivalently, Binomial(1,p))
posterior probability of giving a relevant doc: Beta(α+O, β+1−O) (Beta is the conjugate prior for the Binomial)
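A minimal numeric illustration of this conjugate update (the function name is mine, not the authors' code):

```python
def beta_update(a, b, o):
    """Posterior Beta parameters after observing relevance o (1 or 0):
    Beta(a, b) becomes Beta(a + o, b + 1 - o)."""
    return a + o, b + 1 - o

# judging three docs from one run: relevant, relevant, non-relevant
a, b = 1, 1                         # Beta(1,1) = Uniform(0,1) prior
for o in (1, 1, 0):
    a, b = beta_update(a, b, o)
print((a, b))                       # (3, 2): posterior mean a/(a+b) = 0.6
```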
Allocation methods tested: Bayesian bandits
we iteratively update our estimates using Bayes' rule
(slides 29-36: figures showing the posterior Beta distributions being updated as judgements come in)
two strategies to select the next machine:
Bayesian Learning Automaton (BLA): draws a sample from each posterior distribution and selects the machine yielding the highest sample
MaxMean (MM): selects the machine with the highest expectation (mean) of the posterior distribution
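Both selection strategies are easy to sketch, assuming each run keeps its posterior Beta parameters (a, b) as above (function names are illustrative):

```python
import random

def bla_select(posteriors):
    """BLA: sample once from each posterior Beta(a, b), pick the highest sample."""
    samples = [random.betavariate(a, b) for a, b in posteriors]
    return max(range(len(samples)), key=lambda i: samples[i])

def mm_select(posteriors):
    """MaxMean: pick the run with the highest posterior mean a / (a + b)."""
    means = [a / (a + b) for a, b in posteriors]
    return max(range(len(means)), key=lambda i: means[i])
```

BLA keeps a little randomness (uncertain runs can still win a draw), while MM is purely greedy on the posterior mean.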
![Page 37: Feeling Lucky? Multi-armed Bandits for Ordering Judgements in Pooling-based Evaluation](https://reader034.vdocuments.mx/reader034/viewer/2022052705/58f0af511a28ab967d8b45e9/html5/thumbnails/37.jpg)
experiments

goal: test different document adjudication strategies in terms of how quickly they find the relevant docs in the pool
measure: # relevant docs found at different numbers of judgements performed
experiments: data
experiments: baselines
pool: WSJ13, WSJ17, AP567, WSJ19, AP111, CR93E, ZF207, AP881, FT967, ...
DocId: sorts the pooled docs by doc id
→ AP111, AP567, AP881, CR93E, FT967, WSJ13, ...
experiments: baselines
run 1: 1. WSJ13   2. WSJ17   ...   100. AP567
run 2: 1. FT941   2. WSJ13   ...   100. WSJ19
run 3: 1. ZF207   2. AP881   ...   100. FT967
run 4: 1. WSJ13   2. CR93E   ...   100. AP111

Rank: rank #1 docs go first, then rank #2 docs, ...
→ WSJ13, FT941, ZF207, WSJ17, CR93E, AP881, ...
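The two non-adaptive baselines, DocId and Rank, can be sketched as follows (function names are mine; Rank's exact output order depends on how ties across runs are broken, so the printed order is only one possibility):

```python
def docid_order(runs):
    """DocId: judge the pooled docs in doc-id order."""
    pool = {doc for run in runs for doc in run}
    return sorted(pool)

def rank_order(runs):
    """Rank: rank #1 docs first, then rank #2 docs, ...; skip docs already scheduled."""
    order, seen = [], set()
    for rank in range(max(len(run) for run in runs)):
        for run in runs:
            if rank < len(run) and run[rank] not in seen:
                seen.add(run[rank])
                order.append(run[rank])
    return order

runs = [["WSJ13", "WSJ17"], ["FT941", "WSJ13"],
        ["ZF207", "AP881"], ["WSJ13", "CR93E"]]
print(rank_order(runs))  # ['WSJ13', 'FT941', 'ZF207', 'WSJ17', 'AP881', 'CR93E']
```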
experiments: baselines

MoveToFront (MTF) (Cormack et al. 1998)
starts with uniform priorities for all runs (e.g. max priority = 100)
selects a random run from those with max priority
extracts & judges docs from the selected run, staying in the run while relevant docs are found
(e.g. from the run 1. WSJ13, 2. CR93E, 3. AP111, ... it judges WSJ13, then CR93E, then AP111, ...)
when a non-relevant doc is found, the run's priority is decreased (100 → 99)
and we jump again to another max-priority run
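The MTF loop above can be sketched like this (a hedged reconstruction from the slides, not Cormack et al.'s code; `judge` stands in for the human assessor and returns True iff the doc is relevant):

```python
import random

def mtf_order(runs, judge, max_priority=100):
    priority = [max_priority] * len(runs)
    position = [0] * len(runs)                 # next unjudged rank per run
    judged, order = {}, []                     # doc -> relevance; judging order
    while True:
        active = [s for s in range(len(runs)) if position[s] < len(runs[s])]
        if not active:
            return order
        top = max(priority[s] for s in active)
        s = random.choice([s for s in active if priority[s] == top])
        while position[s] < len(runs[s]):      # stay in the selected run...
            doc = runs[s][position[s]]
            position[s] += 1
            if doc not in judged:
                judged[doc] = judge(doc)
                order.append(doc)
            if not judged[doc]:                # ...until a non-relevant doc
                priority[s] -= 1               # demote the run and jump
                break
```

One design choice here: hitting a doc already known to be non-relevant also demotes the run, which keeps the sketch simple; variants of MTF handle previously judged docs differently.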
experiments: baselines

Moffat et al.'s method (A) (Moffat et al. 2007)
based on rank-biased precision (RBP): each pooled doc accumulates a rank-dependent score from every run that retrieves it (scores 0.20, 0.16, 0.13, ... for ranks 1, 2, 3, ...)
e.g. WSJ13: 0.20 + 0.16 + 0.20 + ...
all docs are ranked by decreasing accumulated score, and that ranked list defines the order in which docs are judged
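Method A can be sketched as follows, under the assumption that the rank-dependent score is the RBP weight (1 − p)·p^(rank−1); the slide's scores 0.20, 0.16, 0.13 match persistence p = 0.8 (function name is mine):

```python
from collections import defaultdict

def method_a_order(runs, p=0.8):
    """Judge docs by decreasing accumulated RBP weight across all runs."""
    score = defaultdict(float)
    for run in runs:
        for rank, doc in enumerate(run, start=1):
            score[doc] += (1 - p) * p ** (rank - 1)   # rank-dependent contribution
    return sorted(score, key=score.get, reverse=True)
```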
experiments: baselines
Moffat et al.'s method (B) (Moffat et al. 2007)
evolution of method A: considers not only the rank-dependent doc contributions but also the runs' residuals; promotes the selection of docs from runs with many unjudged docs

Moffat et al.'s method (C) (Moffat et al. 2007)
evolution of method B: considers the rank-dependent doc contributions, the residuals, and an estimate of each run's effectiveness; promotes the selection of docs from effective runs
experiments: baselines
MTF: best performing baseline
experiments: MTF vs bandit-based models
experiments: MTF vs bandit-based models
Random: weakest approach
BLA/UCB/ϵn-greedy are suboptimal
(a sophisticated exploration/exploitation trade-off is not needed)
MTF and MM: best performing methods
improved bandit-based models
MTF forgets quickly about past rewards (a single non-relevant doc triggers a jump)
non-stationary bandit-based solutions: not all historical rewards count the same
MM-NS and BLA-NS: non-stationary variants of MM and BLA
stationary bandits
prior: Beta(α,β), α=β=1
relevant docs add 1 to α; non-relevant docs add 1 to β
after n iterations: Beta(αn, βn), with
αn = 1 + jrel_s
βn = 1 + jret_s − jrel_s
jrel_s: # judged relevant docs retrieved by run s
jret_s: # judged docs retrieved by run s
all judged docs count the same

non-stationary bandits
prior: Beta(α,β), α=β=1
each new judgement of a doc d updates the counts as
jrel_s ← rate · jrel_s + rel_d
jret_s ← rate · jret_s + 1
after n iterations: Beta(αn, βn), with αn = 1 + jrel_s, βn = 1 + jret_s − jrel_s
rate > 1: weights early relevant docs more
rate < 1: weights late relevant docs more
rate = 0: only the last judged doc counts (BLA-NS, MM-NS)
rate = 1: stationary version
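A small sketch of the non-stationary update above (illustrative names; `relevant` plays the role of rel_d):

```python
def ns_update(jrel, jret, relevant, rate):
    """Discount past pseudo-counts by 'rate' before adding the new judgement."""
    jrel = rate * jrel + (1 if relevant else 0)
    jret = rate * jret + 1
    return jrel, jret

def posterior(jrel, jret):
    """Beta(alpha_n, beta_n) with alpha_n = 1 + jrel_s, beta_n = 1 + jret_s - jrel_s."""
    return 1 + jrel, 1 + jret - jrel

jrel, jret = 0.0, 0.0
for rel in (True, True, False):            # two relevant docs, then a miss
    jrel, jret = ns_update(jrel, jret, rel, rate=0.0)
print(posterior(jrel, jret))               # rate=0: only the last doc counts -> (1.0, 2.0)
```

With rate < 1 old judgements fade, so a run that just returned a non-relevant doc loses credibility fast, mimicking MTF's quick reaction.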
experiments: improved bandit-based models
conclusions
multi-armed bandits: a formal & effective framework for doc adjudication in pooling-based evaluation
it's not good to increasingly reduce exploration (UCB, ϵn-greedy)
it's good to react quickly to non-relevant docs (non-stationary variants)
future work
query-related variabilities
hierarchical bandits
stopping criteria
metasearch
reproduce our experiments & test new ideas! http://tec.citius.usc.es/ir/code/pooling_bandits.html
(our R code, instructions, etc)
David E. Losada
Javier Parapar, Álvaro Barreiro
Feeling Lucky? Multi-armed bandits for Ordering Judgements in Pooling-based Evaluation
Acknowledgements:
MsSaraKelly, picture pg 1 (modified), CC BY 2.0. Sanofi Pasteur, picture pg 2 (modified), CC BY-NC-ND 2.0. pedrik, pictures pgs 3-5, CC BY 2.0. Christa Lohman, picture pg 3 (left), CC BY-NC-ND 2.0. Chris, picture pg 4 (tag cloud), CC BY 2.0. Daniel Horacio Agostini, picture pg 5 (right), CC BY-NC-ND 2.0. ScaarAT, picture pg 14, CC BY-NC-ND 2.0. Sebastien Wiertz, picture pg 15 (modified), CC BY 2.0. Willard, picture pg 16 (modified), CC BY-NC-ND 2.0. Jose Luis Cernadas Iglesias, picture pg 17 (modified), CC BY 2.0. Michelle Bender, picture pg 25 (left), CC BY-NC-ND 2.0. Robert Levy, picture pg 25 (right), CC BY-NC-ND 2.0. Simply Swim UK, picture pg 37, CC BY-SA 2.0. Sarah J. Poe, picture pg 55, CC BY-ND 2.0. Kate Brady, picture pg 58, CC BY 2.0. August Brill, picture pg 59, CC BY 2.0.
This work was supported by the "Ministerio de Economía y Competitividad" of the Government of Spain and FEDER Funds under research projects TIN2012-33867 and TIN2015-64282-R.