player rating algorithms for balancing human computation games: testing the effect of bipartiteness


Page 1: Player Rating Algorithms for Balancing Human Computation Games: Testing the Effect of Bipartiteness

Player Rating Systems for Balancing Human Computation Games: testing the effect of bipartiteness

Seth Cooper, Sebastian Deterding, Theo Tsapakos

DiGRA 2016, August 6, 2016 (CC BY)

Page 2:

<1> the challenge

Page 3:

»flow«

[Diagram: difficulty vs. skill/time, with frustration above the flow channel and boredom below]

Flow (1990), Mihaly Csikszentmihalyi

Page 4:

winning odds correlate w/ retention (Lomas et al., 2013)

Page 5:

human computation games

Page 6:

the problem

[Diagram: difficulty vs. skill/time]

1. scientific tasks are predetermined

Page 7:

[Diagram: difficulty vs. skill/time]

2. tasks can’t be changed

Page 8:

[Diagram: difficulty vs. skill/time, tasks marked with question marks]

3. difficulty is unknown in advance

Page 9:

[Diagram: difficulty vs. skill/time, tasks marked with exclamation marks]

4. solving tasks defeats crowdsourcing

Page 10:

[Diagram: difficulty vs. skill/time, tasks marked with question marks]

… hence tasks are served randomly (Lintott, 2016)

Page 11:

hence retention is very poor (Sauermann & Franzoni, 2015)

Page 12:

[Chart: % players retained over time/levels; most leave after the balanced tutorials (curves for tutorial vs. actual tasks, *idealised)]

Page 13:

the challenge

[Diagram: difficulty vs. skill/time, tasks marked with question marks]

How to sequence tasks w/o solving?

Page 14:

also applies to: user-generated content

Page 15:

also applies to: crowdsourcing

Page 16:

<2> the approach

Page 17:

multiplayer matchmaking

Page 18:

uses player rating algorithms: Elo (1978), Glicko-2 (2012/3), TrueSkill (2006)

Page 19:

skill = winning odds, updated w/ each game (Moser, 2010)
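The update rule behind this slide can be sketched with the standard Elo formulas: a rating encodes expected winning odds, and both ratings move toward the observed result after each game. This is the textbook Elo update, not code from the talk; the K-factor of 32 is one common convention.

```python
def expected_score(r_a, r_b):
    """Probability that A beats B under the Elo model (logistic in rating gap)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def elo_update(r_a, r_b, score_a, k=32):
    """Updated ratings after one game; score_a is 1 for a win, 0.5 draw, 0 loss."""
    e_a = expected_score(r_a, r_b)
    new_a = r_a + k * (score_a - e_a)
    new_b = r_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return new_a, new_b

# Two equal players: the winner gains exactly what the loser gives up.
elo_update(1500, 1500, 1.0)  # -> (1516.0, 1484.0)
```

Glicko-2 and TrueSkill refine this same idea by also tracking rating uncertainty.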

Page 20:

remember: winning odds → retention (Lomas et al., 2013)

Page 21:

widely used, effective prediction (Menke, 2016)

Page 22:

our approach: tasks = players

player rating = skill
task rating = difficulty
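The tasks-as-players idea can be made concrete by running the same Elo update over player-task pairs: a solved task counts as a "loss" for the task, so task ratings converge on difficulty while player ratings converge on skill. This is an illustrative sketch of the approach, not the paper's implementation; the names `ratings` and `record_attempt` are invented here.

```python
K = 32
ratings = {}  # one shared table: player ids and task ids both start at 1500

def expected(r_a, r_b):
    """Elo win probability for the first rating over the second."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def record_attempt(player, task, solved):
    """Update player skill and task difficulty after one attempt."""
    rp = ratings.setdefault(player, 1500.0)
    rt = ratings.setdefault(task, 1500.0)
    score = 1.0 if solved else 0.0          # solving = beating the task
    e = expected(rp, rt)
    ratings[player] = rp + K * (score - e)
    ratings[task] = rt + K * ((1.0 - score) - (1.0 - e))

record_attempt("alice", "task-7", solved=True)
# alice's rating rises; task-7's rating (its difficulty estimate) falls
```

Matchmaking then becomes task sequencing: serve each player the unsolved task whose rating is closest to theirs, i.e. the one they have roughly even odds of solving.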

Page 23:

<3> the question

Page 24:

we produce a bipartite graph (Asratian et al., 1998)

Page 25:

we produce a bipartite graph (Asratian et al., 1998)

[Figure: bipartite graph, players on one side, tasks on the other]
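Because players only ever "play against" tasks, never each other, the match graph is two-colorable by construction. A small BFS two-coloring check (over hypothetical match data) makes the property concrete:

```python
from collections import deque

def is_bipartite(adj):
    """Two-color an undirected graph given as {node: [neighbors]}.

    Returns True iff no edge connects two nodes of the same color,
    i.e. the graph is bipartite.
    """
    color = {}
    for start in adj:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in color:
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:
                    return False
    return True

# Player-task attempts: every edge joins a player to a task.
matches = {
    "p1": ["t1", "t2"], "p2": ["t2", "t3"],
    "t1": ["p1"], "t2": ["p1", "p2"], "t3": ["p2"],
}
# is_bipartite(matches) is True; any player-player edge could break that
```

Chess match graphs, by contrast, are generally not bipartite, which is what motivates testing whether rating algorithms tuned on them still predict well here.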

Page 26:

[Figure: sparser bipartite graph of players and tasks]

less density, less information flow (Scott, 2012)

Page 27:

more structural holes (Scott, 2012)

[Figure: bipartite graph of players and tasks with structural holes]

Page 28:

more unbalanced graphs (Scott, 2012)

[Figure: unbalanced bipartite graph of players and tasks]

Page 29:

Research questions: Does a bipartite (user-task rather than player-player) graph negatively affect the prediction accuracy of player rating algorithms? Does graph balancedness affect accuracy?

Page 30:

<4> the study

Page 31:

data set 1: predicting chess matches with Elo

Page 32:

bipartite training data has no effect

Page 33:

unbalanced bipartite graphs perform better

Page 34:

unbalanced bipartite graphs have super vertices

Page 35:

data set 2: Elo, Glicko-2, TrueSkill on the game Paradox

Page 36:

all rating systems outperform baseline

Page 37:

<5> discussion & outlook

Page 38:

main contributions

• Identified 4 challenges to difficulty balancing in human computation games, crowdsourcing, UGC

• Introduced content sequencing through adapting player rating algorithms as a novel approach

• Identified bipartiteness of user-task graph as potential issue

• Found that bipartiteness does not affect prediction accuracy of Elo, Glicko-2, TrueSkill in chess matches or the human computation game Paradox

• Found that unbalanced graphs improve prediction accuracy, presumably due to super vertices/players

• Provided first support that our approach is viable

Page 39:

limitations & future work I

• Approach requires previous/initial data
  • Use super-users to provide initial data
  • Use “calibration” tasks in tutorials
  • Use mixed-method data to identify skill & difficulty indicators, data & machine learning to validate & extract additional indicators
• Current algorithms only compute win/loss/draw
  • Graded success measures could improve accuracy and learning speed
• Study trained on large data sets (10,000, 37 edges)
  • Testing learning speed of algorithms w/ current default retention in human computation games
• Study tested only one human computation game
  • Replication with multiple games

Page 40:

limitations & future work II

• Study didn’t test direct effect on retention
  • Follow-up user study
• Task pool might not contain tasks of best-fitting difficulty (similar to an empty lobby in multiplayer games)
  • Procedural content generation to generate training/filler tasks
• Many human computation tasks don’t vary much in difficulty
  • Expand matching approach to other factors like curiosity/variety