BYU Computer Science Department
Hierarchical Bayesian Models for Rating Individual Players
from Group Competitions
Joshua E. Menke
BYU Computer Science Department
Why Rank and Rate?
• Ranking in groups important
– Sports, executive teams between competing corporations, military training, etc.
– Computer and video gaming industry: big business, $18 billion gross output in the U.S. in 2004
• Players prefer games that help them compare themselves
• Use for balancing teams: TrueSkill™
• Use for game / level design
Brief Rating Background
• Elo (1978) for Chess
– Thurstone Case V: normal distribution
– Later modified to use a logistic distribution
• Glickman (1999, 2001) for Chess
– Bradley-Terry model (Bradley and Terry, 1952)
– Uncertainty based on the number of matches played and the time between matches
Rating Players From Groups
• TrueSkill™ (Herbrich and Graepel, 2006)
– Generalized Bayesian Thurstone Case V
• Huang (2006)
– Generalized Bradley-Terry (maximum likelihood)
• Menke et al. (2006)
– Hierarchical Bayesian Bradley-Terry
– Extensions: improve predictions / analyze the game
Bradley-Terry Model
• Two opponents with ability parameters λ_1 and λ_2; the probability that the first opponent wins:

λ_1 / (λ_1 + λ_2)

• Current logistic Elo uses Bradley-Terry with λ_x = exp(x).
• A wider distribution than the normal: allows weaker players a greater chance of winning.
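As a concrete illustration (a minimal sketch, not from the talk), the Bradley-Terry win probability under the exponential parameterization looks like:

```python
import math

def win_prob(theta_1, theta_2):
    """Bradley-Terry: P(player 1 beats player 2) with lambda_i = exp(theta_i)."""
    lam_1, lam_2 = math.exp(theta_1), math.exp(theta_2)
    return lam_1 / (lam_1 + lam_2)

# Equal abilities give an even match; a higher rating wins more often.
print(win_prob(0.0, 0.0))  # 0.5
```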
Wolfenstein: Enemy Territory™
• Two Teams or Sides, WWII: Axis vs. Allies
• Objective-based
• Multiplayer
• Online: Players come, go, change teams
• Asymmetry: team sizes / map fairness
• Soccer (Football) Example
• Splash Damage, London
Map-Side in Enemy Territory
• Axis side vs. Allies side
• Matches take place on certain maps
• Different objectives for each side
• Player i on side s for map m
First Data Set
• Matches: 100 per server, 3 servers for 300
• Players: 877
• Matches per Player: ~ 7
Data Example
InitGame: ...\mapname\fueldump\...
Winner: AXIS  Time: 1800000
Name: |R!P|Orpheo GUID DFBB5: Axis: 0 Allies: 1450200
Name: |R!P|Crazyeskimo GUID EF071: Axis: 1549800 Allies: 0
Name: sliveR GUID 0A589: Axis: 1614950 Allies: 0
Name: DaSaNi GUID 3F6C7: Axis: 1278400 Allies: 0
Name: BlackSheep GUID 6C875: Axis: 352600 Allies: 1336200 *

* Played on both teams

• Map name, winner, duration
• Name, GUID, milliseconds on Axis / Allies
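A hypothetical parser for the per-player lines above; the exact field layout (name, GUID, then milliseconds on Axis and Allies) is an assumption based on the single example shown:

```python
import re

# Assumed per-player line format, inferred from the one example above.
PLAYER_LINE = re.compile(
    r"Name: (?P<name>.*) GUID (?P<guid>\w+): "
    r"Axis: (?P<axis>\d+) Allies: (?P<allies>\d+)"
)

def parse_player(line):
    """Return a dict of player fields, or None for non-player lines."""
    m = PLAYER_LINE.match(line)
    if m is None:
        return None
    return {
        "name": m.group("name"),
        "guid": m.group("guid"),
        "axis_ms": int(m.group("axis")),
        "allies_ms": int(m.group("allies")),
    }

rec = parse_player("Name: sliveR GUID 0A589: Axis: 1614950 Allies: 0")
print(rec["guid"], rec["axis_ms"])  # 0A589 1614950
```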
Model
Bayes' Law:

p(θ | matches) = p(matches | θ) p(θ) / ∫ p(matches | θ) p(θ) dθ

We need:

• Prior: p(θ), models the individual players
• Likelihood: p(matches | θ), models match outcomes given the players
Basic Player Model
• Let θ_i represent player i's ability to help their side win a match
• A simple model for θ_i:

θ_i ~ N(μ, σ_0²)
Basic
θ_i ~ N(μ, σ_0²)

• Let μ = 0 without loss of generality
• σ_0² is given a prior distribution
• Symmetric around 0: good players have θ_i > 0, bad players θ_i < 0
• But: assumes map-side has no effect
Accounting for Map-Side Effects
• Map fairness varied in Enemy Territory
• Sometimes harder for Axis, and vice versa
• Basic model naïve
• Map effects uniform for all players
Accounting for Map-Side Effects
• Let θ_{i,m-s} represent player i's ability to help side s win a match played on map m:

θ_{i,m-s} ≡ θ_i + δ_{m-s},  with δ_{m-s} ~ N(0, σ_δ²)

• σ_δ² is given a prior distribution
• A player's rating increases or decreases based on the map-side
Accounting for Map-Side Effects
θ_{i,m-s} ≡ θ_i + δ_{m-s}

• Similar to Agresti's (1988) "home-field" parameter, except with one offset for Axis and one for Allies: a modeling decision made for simplicity.
Map-Side Effects
• A more skilled team can face an equal challenge on a given map by playing the harder side
• Judges which maps are more balanced
• Useful for map / level designers
Server Difficulty
• Compare players across different servers
• Determine how a given server affects a player's rating by adding a server bias η_j:

θ_{i,m-s,j} ≡ θ_i + δ_{m-s} + η_j,  with η_j ~ N(0, σ_η²)

• σ_η² is given a prior distribution
Server Difficulty
θ_{i,m-s,j} ≡ θ_i + δ_{m-s} + η_j

• Modeled as an increase rather than a decrease in player ability for simplicity
• A lower (not higher) η_j means a more difficult server
• Player performance is composed of base ability, map-side offset, and server difficulty
Server Difficulty
• Can use to choose servers
• Rank players globally across servers
• Requires some server “cross-over”
Likelihood
• Choose side s's probability of winning a match on map m proportional to λ_{s,m}:
• the exponentiated sum of its players' ratings,
• modified by map-side and server:

λ_{s,m} = exp( Σ_{i ∈ P_s} θ_{i,m-s,j} )
Bradley Terry Likelihood
• Probability of s_Axis defeating s_Allies:

λ_{s_Axis,m} / (λ_{s_Axis,m} + λ_{s_Allies,m})
Likelihood Function
• Product of per-match predictions
• G: total number of matches; w(g): winning side of match g; l(g): losing side; m: map

P(w | λ) = Π_{g=1}^{G} λ_{w(g),m} (λ_{w(g),m} + λ_{l(g),m})^{-1}

λ_{w(g),m} = exp( Σ_{i ∈ P_{w(g)}} θ_{i,m-s,j} )

λ_{l(g),m} = exp( Σ_{i ∈ P_{l(g)}} θ_{i,m-s,j} )
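A runnable sketch of this likelihood (my illustration, not the talk's code; each side's effective per-player ratings are passed in as plain floats):

```python
import math

def team_strength(thetas):
    """lambda_{s,m}: exponentiated sum of a side's effective player ratings."""
    return math.exp(sum(thetas))

def likelihood(matches):
    """Product over matches of the Bradley-Terry win probability.
    Each match is a (winner_ratings, loser_ratings) pair of lists."""
    p = 1.0
    for winners, losers in matches:
        lam_w = team_strength(winners)
        lam_l = team_strength(losers)
        p *= lam_w / (lam_w + lam_l)
    return p

# Two evenly matched games: each contributes probability 0.5.
print(likelihood([([0.2, -0.2], [0.1, -0.1]), ([0.0], [0.0])]))  # 0.25
```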
Public Server Problem
• Players come, go, change teams at will
• Need time played per team
• Available in original data
Simple Exposure Model
• Weighted sum: fraction of time played per team
• τ_{i,w(g)} (τ_{i,l(g)}): fraction of the total match time player i spent on the winning (losing) team

λ_{w(g),m} = exp( Σ_{i ∈ P_{w(g)}} τ_{i,w(g)} θ_{i,m-s,j} )

λ_{l(g),m} = exp( Σ_{i ∈ P_{l(g)}} τ_{i,l(g)} θ_{i,m-s,j} )
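The time-weighted team strength can be sketched as follows (my illustration; the τ values are the per-player time fractions):

```python
import math

def weighted_strength(thetas, taus):
    """Exposure-weighted lambda: each rating is scaled by the fraction of
    the match the player spent on that side before being exponentiated."""
    return math.exp(sum(tau * theta for theta, tau in zip(thetas, taus)))

# A player who spent half the match on the side contributes half their rating.
print(weighted_strength([2.0], [0.5]))  # exp(1.0), about 2.718
```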
Prior Selection
• Instead of non-informative priors, hyperprior distributions:

σ_0², σ_δ², and σ_η² ~ Inverse-Gamma

• Parameters chosen such that the means are 1 and the variances 1/3
• Keeps player ratings roughly between -3 and 3
• Hyperpriors allow inferring relative differences
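Working out the hyperparameters implied by the mean-1, variance-1/3 constraint (the talk does not state the resulting values; α = 5, β = 4 is my derivation):

```python
# Inverse-Gamma(alpha, beta): mean = beta / (alpha - 1)              (alpha > 1)
#                             var  = beta^2 / ((alpha-1)^2 (alpha-2)) (alpha > 2)
# mean = 1 gives beta = alpha - 1; then var = 1 / (alpha - 2) = 1/3
# forces alpha = 5 and beta = 4.
alpha, beta = 5.0, 4.0
mean = beta / (alpha - 1)
var = beta ** 2 / ((alpha - 1) ** 2 * (alpha - 2))
print(mean, var)  # 1.0 0.3333333333333333
```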
Fit with MCMC: Quickly
• Markov-Chain Monte Carlo Integration
• Samples from the complete conditional distributions
– Thousands of samples per parameter
– Take the mean / standard deviation of the samples
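The talk samples the complete conditionals Gibbs-style; as a stand-in illustration of MCMC, here is a minimal random-walk Metropolis sampler (my sketch, not the talk's sampler):

```python
import math
import random

def metropolis(log_post, theta0, n=5000, step=0.5, seed=0):
    """Random-walk Metropolis: propose a Gaussian step, accept with
    probability min(1, posterior ratio); collect every state visited."""
    rng = random.Random(seed)
    theta, lp = theta0, log_post(theta0)
    samples = []
    for _ in range(n):
        proposal = theta + rng.gauss(0.0, step)
        lp_prop = log_post(proposal)
        if math.log(rng.random()) < lp_prop - lp:
            theta, lp = proposal, lp_prop
        samples.append(theta)
    return samples

# Sampling a standard-normal log posterior: mean near 0, sd near 1.
draws = metropolis(lambda t: -0.5 * t * t, 0.0)
```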
MCMC Results Example: 3-1
• Ranked 2 standard deviations below the mean
• 3rd place 8-1 vs. 8th place 9-0
Combined Server Difficulty
• Ranked in order of difficulty
• Lower posterior mean is more difficult
• Veterans could choose to play on server 2
• Newer players on Server 1
Combined Map-Side Effects
• Oasis biased towards Allies.
• Better players should play on Axis
• Venice a balanced map
• Of interest: both are popular maps.
Bayesian χ² Goodness-of-fit

• Valen Johnson, Annals of Statistics, 2004
• Yields p-values for joint samples
• Server 2 does have a less consistent player base
• Biased accuracy near 100%
Problems with MCMC
• Average Enemy Territory match: 15 minutes
• Time to fit 300 matches with MCMC: 30 minutes
• MCMC cannot keep up with new matches
Second Data Set
• Matches: 5,000
• Players: 2,000+
• Time for MCMC: On the order of days
• Common efficient solutions:
– Newton-Raphson method
– Elo / Glickman update
– Expectation Propagation
Newton-Raphson Method
• Batch Gradient Descent
• L’: vector of first derivatives
• L’’: matrix of second partial derivatives
• k: current iteration
• Note: [-L'']^{-1} is the covariance matrix of the multivariate normal approximation
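Reading the bullets together, the step they describe is presumably the standard Newton-Raphson update on the log posterior L (restored here, since the formula did not survive the transcript):

```latex
\theta_{k+1} = \theta_k + \left[ -L''(\theta_k) \right]^{-1} L'(\theta_k)
```

Because L'' is negative definite near the maximum, [-L'']^{-1} is positive definite and the step moves uphill on L.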
Problems with Newton-Raphson
• Requires storing the match history and re-fitting the data after every match; becomes impractical and slow
– Preferable to update based on the last match only
• The matrix of second partial derivatives is too large
– Millions of players: impossible to store
– Takes too long to invert
Recursive Newton-Raphson
• Based on Bottou and LeCun (2004)
• Maintains a "leaky" approximation to [-L'']^{-1} (the covariance matrix) at each step t.
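A one-parameter sketch of the idea (my simplification: Bottou and LeCun's method maintains a matrix approximation, and the scalar form and leak rate here are made up for illustration):

```python
def recursive_newton_step(theta, grad, curvature, h, leak=0.99):
    """One scalar step: h is a 'leaky' running estimate of [-L'']^{-1},
    blended toward the newest inverse-curvature observation, and the
    rating moves along the gradient scaled by h."""
    h = leak * h + (1.0 - leak) / max(curvature, 1e-8)
    return theta + h * grad, h

theta, h = 0.0, 1.0
theta, h = recursive_newton_step(theta, grad=0.5, curvature=2.0, h=h)
```

Because the running estimate "leaks," old curvature information decays, which is what lets the method track time-varying parameters.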
Recursive Newton-Raphson
• Bottou and LeCun: empirical / theoretical results
– Asymptotically outperforms Newton-Raphson
– And any other batch gradient-descent method
Applied to Enemy Territory
• Derive the update from the log posterior
• Priors taken instead from the MCMC fit
• Example: player rating update
– winning time − prediction − shrinkage
Bayesian Shrinkage Terms
• Batch: applied once on the entire set of matches
• Recursive: applied once per update
– Weight each application by 1/|matches|
• |matches| unknown a priori
– Instead weight by the infinite geometric series 2^{-t-1}
• Sums to 1.0, like applying the prior once
• The effect of the prior diminishes as data accumulates
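Checking the geometric weighting numerically (a quick sanity check, not from the slides):

```python
# Weights 2^-(t+1) for successive updates t = 0, 1, 2, ... form a geometric
# series that sums to 1, so the shrinkage term's total effect matches
# applying it once in batch.
weights = [2.0 ** -(t + 1) for t in range(30)]
print(sum(weights))  # ~1 (exactly 1 - 2**-30)
```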
Time-Varying
• Recursive algorithms track time-varying differences
• Update a weighted sum of prior performance and recent performance
• The leaky variance approximation can track changes over time.
Results: Accuracy
• Measured before updating for each match
• For an unfair comparison:
– TrueSkill™ reported for large teams: ~ 0.62
– Cited more to show that 70% is good
Uses for Ratings
• Rank Players
• Improve Map Design
• Help Choose Servers
• Level up, MMORPG-style
– Clear progression path
– Play on easier servers first, "graduate" to harder ones
Active Team Balancing
• Public server dynamics mean teams need to be balanced during play
• Greedy: move the player that brings both teams' win probabilities closest to 50-50
• Uncomfortable for the player moved
• Increases the overall "fun" factor
• Sequential optimal design
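A sketch of the greedy balancing step (my illustration; plain player ratings stand in for the full effective ratings):

```python
import math

def win_prob(team_a, team_b):
    """P(team A wins) under the Bradley-Terry team model."""
    lam_a, lam_b = math.exp(sum(team_a)), math.exp(sum(team_b))
    return lam_a / (lam_a + lam_b)

def greedy_rebalance(team_a, team_b):
    """Return the single player move ('a->b' or 'b->a', index) that brings
    P(team A wins) closest to 50-50, or None if no move improves balance."""
    best_gap = abs(win_prob(team_a, team_b) - 0.5)
    best_move = None
    for i in range(len(team_a)):
        gap = abs(win_prob(team_a[:i] + team_a[i+1:], team_b + [team_a[i]]) - 0.5)
        if gap < best_gap:
            best_gap, best_move = gap, ("a->b", i)
    for i in range(len(team_b)):
        gap = abs(win_prob(team_a + [team_b[i]], team_b[:i] + team_b[i+1:]) - 0.5)
        if gap < best_gap:
            best_gap, best_move = gap, ("b->a", i)
    return best_move

# A strong player on team A should be moved to even things out.
print(greedy_rebalance([1.0, 1.0, 0.0], [0.0, 0.0, 0.0]))  # ('a->b', 0)
```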
Future Directions
• Explicitly Model time-varying changes
• Number of players vs. map-side rating
• Online Bayesian Neural Network Training
• Expectation-Propagation for this model
• Direct Comparisons to TrueSkill™
Questions?
• Thanks for coming!
• Demo if time: http://stats.etpub.org