player rating algorithms for balancing human computation games: testing the effect of bipartiteness
Player Rating Systems for Balancing Human Computation Games: Testing the Effect of Bipartiteness
Seth Cooper, Sebastian Deterding, Theo Tsapakos
DiGRA 2016, August 6, 2016
<1> the challenge
»flow«
[Figure: difficulty vs. skill/time; too much difficulty leads to frustration, too little to boredom, matched difficulty to flow]
flow (1990), Mihaly Csikszentmihalyi
winning odds correlate w/ retention (Lomas et al., 2013)
human computation games
[Figure: difficulty vs. skill/time]
1. scientific tasks are predetermined
the problem
[Figure: difficulty vs. skill/time]
2. tasks can’t be changed
[Figure: difficulty vs. skill/time]
3. difficulty is unknown in advance
[Figure: tasks of unknown difficulty, marked "?", on difficulty vs. skill/time]
4. solving tasks defeats crowdsourcing
[Figures: tasks marked "!" and then "?" on difficulty vs. skill/time]
… hence tasks are served randomly (Lintott, 2016)
hence retention is very poor (Sauermann & Franzoni, 2015)
most players leave after the balanced tutorials
[Figure: % of players retained vs. time/levels, tutorial vs. actual tasks*; * idealised]
[Figure: difficulty vs. skill/time]
How to sequence tasks w/o solving?
[Figure: tasks of unknown difficulty, marked "?", on difficulty vs. skill/time]
the challenge also applies to user-generated content and crowdsourcing
<2> the approach
multiplayer matchmaking
uses player rating algorithms: Elo (1978), Glicko-2 (2012/3), TrueSkill (2006)
skill = winning odds, updated w/ each game (Moser, 2010)
remember: winning odds correlate w/ retention (Lomas et al., 2013)
widely used, effective prediction (Menke, 2016)
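To make "skill = winning odds, updated w/ each game" concrete, here is a minimal sketch of the textbook Elo update. It assumes the common logistic form on a 400-point scale with a fixed K-factor of 32. Both are conventional defaults and not parameters reported in the study. The expected score turns a rating difference into winning odds, and each game shifts both ratings by K times the difference between the actual and expected result.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the logistic Elo model (400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, score_a: float,
               k: float = 32.0) -> tuple[float, float]:
    """Return updated (rating_a, rating_b) after one game.

    score_a is 1.0 for an A win, 0.5 for a draw, 0.0 for an A loss.
    K = 32 is a common default, assumed here for illustration.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


a, b = elo_update(1500.0, 1500.0, 1.0)  # equal ratings, A wins
print(a, b)  # 1516.0 1484.0
```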
our approach: tasks = players
Player rating = skill
Task rating = difficulty
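A sketch of the "tasks = players" idea under stated assumptions: every attempt is scored as an Elo game in which the player wins if the task is solved and the task wins otherwise. The class name `PlayerTaskElo`, the 1500 starting rating and K = 32 are illustrative choices, not the authors' implementation. Keeping skill and difficulty in separate dictionaries prevents a player id from colliding with a task id.

```python
class PlayerTaskElo:
    """Hypothetical helper: Elo ratings for players (skill) and tasks (difficulty).

    Each attempt is treated as a game; the player wins if the task is solved.
    Initial rating 1500 and K = 32 are conventional defaults, assumed here.
    """

    def __init__(self, initial: float = 1500.0, k: float = 32.0) -> None:
        self.initial = initial
        self.k = k
        self.skill: dict[str, float] = {}       # player rating = skill
        self.difficulty: dict[str, float] = {}  # task rating = difficulty

    def p_solve(self, player: str, task: str) -> float:
        """Predicted probability that the player solves ('beats') the task."""
        s = self.skill.get(player, self.initial)
        d = self.difficulty.get(task, self.initial)
        return 1.0 / (1.0 + 10 ** ((d - s) / 400.0))

    def record_attempt(self, player: str, task: str, solved: bool) -> None:
        """Update both ratings after one attempt, exactly like an Elo game."""
        delta = self.k * ((1.0 if solved else 0.0) - self.p_solve(player, task))
        # Solving raises skill and lowers difficulty; failing does the reverse.
        self.skill[player] = self.skill.get(player, self.initial) + delta
        self.difficulty[task] = self.difficulty.get(task, self.initial) - delta


ratings = PlayerTaskElo()
ratings.record_attempt("alice", "task-17", solved=True)
print(ratings.skill["alice"], ratings.difficulty["task-17"])  # 1516.0 1484.0
```

Because only player-task attempts are ever recorded, the resulting comparisons never link two players or two tasks directly, which is the bipartite structure the next section examines.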
<3> the question
we produce a bipartite graph (Asratian et al., 1998)
[Figure: players on one side, tasks on the other]
less density, less information flow (Scott, 2012)
more structural holes (Scott, 2012)
[Figure: player-task graph]
more unbalanced graphs (Scott, 2012)
[Figure: unbalanced player-task graph]
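The graph properties above can be measured directly on a list of (player, task) edges. The sketch below uses illustrative definitions that are assumptions, not the study's exact metrics. Density is computed two ways: against the most a bipartite graph can hold, |players| × |tasks|, and against a complete graph on all n vertices, n(n − 1)/2, which shows why a bipartite graph is necessarily sparser. Balance is taken as the ratio of the smaller to the larger side, and maximum degree serves as a rough "super vertex" indicator.

```python
from collections import Counter


def graph_stats(edges: list[tuple[str, str]]) -> dict[str, float]:
    """Summarise a player-task graph given as (player, task) edges.

    Illustrative definitions, assumed rather than taken from the study:
    - bipartite_density: edges / (|players| * |tasks|)
    - density_vs_complete: edges / (n * (n - 1) / 2) over all n vertices
    - balance: smaller side / larger side (1.0 means equal-sized sides)
    - max_degree: most connections of any vertex ('super vertex' indicator)
    """
    unique = set(edges)
    if not unique:
        raise ValueError("need at least one edge")
    players = {p for p, _ in unique}
    tasks = {t for _, t in unique}
    degree: Counter = Counter()
    for p, t in unique:
        degree[("player", p)] += 1  # tag by side so ids cannot collide
        degree[("task", t)] += 1
    n = len(players) + len(tasks)
    return {
        "bipartite_density": len(unique) / (len(players) * len(tasks)),
        "density_vs_complete": len(unique) / (n * (n - 1) / 2),
        "balance": min(len(players), len(tasks)) / max(len(players), len(tasks)),
        "max_degree": float(max(degree.values())),
    }


# Four players all attempting one task: maximally unbalanced, t1 is a super vertex.
print(graph_stats([("p1", "t1"), ("p2", "t1"), ("p3", "t1"), ("p4", "t1")]))
# {'bipartite_density': 1.0, 'density_vs_complete': 0.4, 'balance': 0.25, 'max_degree': 4.0}
```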
Research question: does a bipartite (user-task instead of player-player) graph negatively affect the prediction accuracy of player rating algorithms? does graph balancedness affect accuracy?
<4> the study
predicting chess matches with Elo
data set 1
bipartite training data has no effect
unbalanced bipartite graphs perform better
unbalanced bipartite graphs have super vertices
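One way to derive bipartite training data from ordinary head-to-head games is to split the players into two disjoint groups and keep only the games played across groups. The sketch below uses a random split. The function name and the `balance` parameter are hypothetical, and the slides do not say how the study built its splits. A small `balance` value puts few players in one group, so each of them appears in many kept games, which is how super vertices arise in an unbalanced graph.

```python
import random


def bipartite_subset(
    games: list[tuple[str, str, float]], balance: float = 0.5, seed: int = 0
) -> list[tuple[str, str, float]]:
    """Keep only games played across two disjoint player groups.

    games: (player_a, player_b, score_a) tuples, score_a in {0, 0.5, 1}.
    balance: hypothetical parameter, the fraction of players put in group A
             (0.5 gives a balanced split, small values an unbalanced one).
    """
    players = sorted({p for a, b, _ in games for p in (a, b)})
    random.Random(seed).shuffle(players)  # reproducible random split
    cut = min(len(players) - 1, max(1, round(len(players) * balance)))
    group_a = set(players[:cut])
    # Exactly one side in group A means the game crosses the partition.
    return [g for g in games if (g[0] in group_a) != (g[1] in group_a)]
```

With four players who each meet every other player once (six games), a balanced split keeps the 2 × 2 = 4 cross-group games, while `balance=0.25` keeps only the 1 × 3 = 3 games involving the lone group-A player.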
Elo, Glicko-2, TrueSkill on the human computation game Paradox
data set 2
all rating systems outperform baseline
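To illustrate what "outperform baseline" means operationally, the sketch below trains Elo on one set of games and scores how often it picks the winner of held-out decisive games. The baseline used here is an assumption: it always predicts whichever side (first- or second-listed) won more often in training. The slides do not specify the study's baseline or its handling of draws, and this version simply skips draws.

```python
def expected(r_a: float, r_b: float) -> float:
    """Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def train_elo(games, k: float = 32.0, initial: float = 1500.0) -> dict[str, float]:
    """Run the Elo update over (player_a, player_b, score_a) games in order."""
    ratings: dict[str, float] = {}
    for a, b, score_a in games:
        r_a, r_b = ratings.get(a, initial), ratings.get(b, initial)
        delta = k * (score_a - expected(r_a, r_b))
        ratings[a], ratings[b] = r_a + delta, r_b - delta
    return ratings


def accuracy(predict_a_wins, games) -> float:
    """Share of decisive games (draws skipped) whose winner is predicted correctly."""
    decisive = [(a, b, s) for a, b, s in games if s != 0.5]
    if not decisive:
        return float("nan")
    hits = sum(predict_a_wins(a, b) == (s == 1.0) for a, b, s in decisive)
    return hits / len(decisive)


def compare_to_baseline(train, test, initial: float = 1500.0) -> dict[str, float]:
    ratings = train_elo(train, initial=initial)

    def elo_predict(a: str, b: str) -> bool:
        # Higher expected score wins; ties (e.g. unseen players) default to A.
        return expected(ratings.get(a, initial), ratings.get(b, initial)) >= 0.5

    # Assumed baseline: always predict the side that won more often in training.
    outcomes = [s for _, _, s in train if s != 0.5]
    a_usually_wins = sum(s == 1.0 for s in outcomes) * 2 >= len(outcomes)

    def baseline_predict(a: str, b: str) -> bool:
        return a_usually_wins

    return {"elo": accuracy(elo_predict, test), "baseline": accuracy(baseline_predict, test)}


train = [("s", "w", 1.0), ("w", "s", 0.0), ("s", "w", 1.0), ("w", "s", 0.0), ("w", "s", 0.0)]
test = [("s", "w", 1.0), ("w", "s", 0.0), ("s", "w", 1.0)]
print(compare_to_baseline(train, test))  # {'elo': 1.0, 'baseline': 0.3333333333333333}
```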
<5> discussion & outlook
main contributions
• Identified 4 challenges to difficulty balancing in human computation games, crowdsourcing, and user-generated content (UGC)
• Introduced a novel approach: sequencing content by adapting player rating algorithms
• Identified bipartiteness of the user-task graph as a potential issue
• Found that bipartiteness does not affect the prediction accuracy of Elo, Glicko-2, or TrueSkill on chess matches or the human computation game Paradox
• Found that unbalanced graphs improve prediction accuracy, presumably due to super vertices/players
• Provided first support that our approach is viable
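A minimal sketch of the sequencing step itself, once players and tasks share a rating scale: from a pool of rated tasks, pick the one whose predicted solve probability is closest to a target. The target of 0.7, the task ids and the ratings are hypothetical values chosen for illustration. The study did not specify a target success rate.

```python
def p_solve(skill: float, difficulty: float) -> float:
    """Elo-style predicted probability that a player of this skill solves the task."""
    return 1.0 / (1.0 + 10 ** ((difficulty - skill) / 400.0))


def next_task(skill: float, pool: dict[str, float], target: float = 0.7) -> tuple[str, float]:
    """Pick the task whose predicted solve probability is closest to `target`.

    pool maps hypothetical task ids to difficulty ratings; target = 0.7 is an
    illustrative assumption, not a value from the study.
    """
    if not pool:
        raise ValueError("task pool is empty")
    best = min(pool, key=lambda task: abs(p_solve(skill, pool[task]) - target))
    return best, p_solve(skill, pool[best])


pool = {"easy": 1200.0, "medium": 1450.0, "hard": 1800.0}
task, p = next_task(1500.0, pool)
print(task, round(p, 2))  # medium 0.57
```

The returned probability also exposes how far the best available task falls from the target, which is the case described in the limitations where the task pool lacks a task of best-fitting difficulty.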
limitations & future work I
• Approach requires previous/initial data
  ◦ Use super-users to provide initial data
  ◦ Use “calibration” tasks in tutorials
  ◦ Use mixed-method data to identify skill & difficulty indicators, and data & machine learning to validate them & extract additional indicators
• Current algorithms only compute win/loss/draw
  ◦ Graded success measures could improve accuracy and learning speed
• Study trained on large data sets (10,000, 37 edges)
  ◦ Test learning speed of algorithms w/ current default retention in human computation games
• Study tested only one human computation game
  ◦ Replication with multiple games
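The Elo update itself already accepts any score between 0 and 1, so one hypothetical route to graded success is to map partial task performance to that score instead of collapsing it to win/loss/draw. The `graded_score` mapping below, the achieved/best-possible ratio clipped to [0, 1], is an assumption for illustration. The study did not test graded measures.

```python
def expected(skill: float, difficulty: float) -> float:
    """Elo expected score of the player against the task."""
    return 1.0 / (1.0 + 10 ** ((difficulty - skill) / 400.0))


def graded_score(achieved: float, best_possible: float) -> float:
    """Hypothetical mapping of partial task success to a score in [0, 1]."""
    if best_possible <= 0:
        raise ValueError("best_possible must be positive")
    return min(max(achieved / best_possible, 0.0), 1.0)


def update(skill: float, difficulty: float, score: float,
           k: float = 32.0) -> tuple[float, float]:
    """Elo update with a graded score instead of a strict win/loss/draw."""
    delta = k * (score - expected(skill, difficulty))
    return skill + delta, difficulty - delta


# A 60% solution against an evenly matched task: a binary loss would cost the
# player 16 points, whereas the graded score moves skill up by about 3.2.
skill, difficulty = update(1500.0, 1500.0, graded_score(6, 10))
```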
limitations & future work II
• Study didn’t test direct effect on retention
  ◦ Follow-up user study
• Task pool might not contain tasks of best-fitting difficulty (similar to an empty bar in multiplayer games)
  ◦ Procedural content generation to generate training/filler tasks
• Many human computation tasks don’t vary much in difficulty
  ◦ Expand matching approach to other factors like curiosity/variety