Churn Prediction in Mobile Social Games: Towards a Complete Assessment Using Survival Ensembles
África Periáñez, Alain Saas, Anna Guitart and Colin Magne
IEEE/ACM DSAA 2016
Montreal, October 19th, 2016
About us
Who are we?
● Game and technology company based in Tokyo (spin-off of Silicon Graphics)
● Research project to provide Game Data Science as a Service
● Goals: predict player behavior, scale to big data, and visualize results intuitively
Our data
● Free-to-play mobile social games
● In-app purchases and activity behavioral data
Churn prediction in Free-To-Play games
We focus on the top spenders: the whales
➔ 0.2% of the players, 50% of the revenues
➔ Their high engagement makes them more likely to respond positively to actions taken to retain them
➔ For this group, we can define churn as 10 days of inactivity
◆ The definition of churn in F2P games is not straightforward
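The 10-day inactivity rule can be sketched as below. This is only an illustration: the function name `is_churned`, the constant `CHURN_DAYS`, the player ids and the dates are all hypothetical, and reading the rule as "at least 10 days" is an assumption.

```python
from datetime import date

CHURN_DAYS = 10  # inactivity threshold from the slide; ">=" is an assumed reading


def is_churned(last_login: date, today: date, threshold: int = CHURN_DAYS) -> bool:
    """Return True when the player has been inactive for at least `threshold` days."""
    return (today - last_login).days >= threshold


# Hypothetical last-login dates for three whales.
last_logins = {
    "player_a": date(2016, 10, 1),
    "player_b": date(2016, 10, 12),
    "player_c": date(2016, 10, 18),
}
today = date(2016, 10, 19)
churned = {player: is_churned(d, today) for player, d in last_logins.items()}
print(churned)
```

Only `player_a`, inactive for 18 days, is flagged here; the other two logged in within the last 10 days.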
The model
Survival Ensembles
Challenge: modeling churn
◎ Survival analysis focuses on predicting the time-to-event, e.g. churn
○ when will a player stop playing?
◎ Classical methods, like regressions, are appropriate when all players have left the game
◎ Censoring problem: dataset with incomplete churning information
◎ Censoring is inherent to churn: players who are still active have not churned yet
➔ Survival analysis is used in biology and medicine to deal with this problem
➔ Ensemble learning techniques provide high-quality prediction results
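One common way to encode censored churn data is a (duration, event observed) pair per player. The sketch below assumes that encoding; `to_survival_record` and all dates are hypothetical and not taken from the slides.

```python
from datetime import date
from typing import Optional, Tuple


def to_survival_record(registered: date, churned_on: Optional[date],
                       observed_until: date) -> Tuple[int, bool]:
    """Return (duration_in_days, event_observed) for one player.

    event_observed is False when the player had not churned by the end of
    the observation window, i.e. the record is right-censored.
    """
    if churned_on is not None and churned_on <= observed_until:
        return (churned_on - registered).days, True
    return (observed_until - registered).days, False


end_of_window = date(2016, 10, 19)
records = [
    to_survival_record(date(2016, 9, 1), date(2016, 9, 21), end_of_window),  # churned
    to_survival_record(date(2016, 9, 10), None, end_of_window),              # still playing
]
print(records)  # [(20, True), (39, False)]
```

The second player contributes 39 days of known survival even though the churn date is unknown. A plain regression on churn dates would have to drop or mislabel that record.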
◎ We focus on whales
◎ Churn definition as 10 days of inactivity
◎ Cumulative survival probability (Kaplan-Meier estimates)
◎ Step function that changes every time that a player churns
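The Kaplan-Meier estimate multiplies, at each churn time, the fraction of at-risk players who did not churn: S(t) is the product over churn times t_i <= t of (1 - d_i / n_i), with d_i churns among n_i players still at risk. A minimal pure-Python sketch follows; the function name and the sample data are hypothetical.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(durations: Sequence[float],
                 events: Sequence[bool]) -> List[Tuple[float, float]]:
    """Return [(t, S(t))] at each distinct time where at least one churn occurs."""
    pairs = sorted(zip(durations, events))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        churns = 0
        leaving = 0  # churned or censored at time t
        while i < len(pairs) and pairs[i][0] == t:
            churns += int(pairs[i][1])
            leaving += 1
            i += 1
        if churns > 0:
            survival *= 1.0 - churns / at_risk  # the step down at time t
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical data: days observed, and whether churn was observed (False = censored).
durations = [5, 8, 8, 12, 15, 20]
events = [True, True, False, True, False, True]
km_curve = kaplan_meier(durations, events)
for t, s in km_curve:
    print(f"day {t}: S = {s:.3f}")
```

Censored players never cause a step, but they stay in the at-risk count until their last observed day, which is how the estimate uses incomplete records.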
Output of the model
◎ Two approaches:
○ Churn as a binary classification
○ Churn as a censored data problem
◎ One model: Conditional Inference Survival Ensembles¹
○ deals with censoring
○ high accuracy due to ensemble learning
Survival Analysis
➔ Survival analysis methods (e.g. Cox regression) do not assume any particular statistical distribution: they are fitted from data
➔ Fixed link between output and features: effort is required for model selection and evaluation
1) Hothorn et al., 2006. Unbiased recursive partitioning: A conditional inference framework
Challenge: modeling churn
Survival Tree
➔ Split the feature space recursively
➔ Based on a survival statistical criterion, the root node is divided into two daughter nodes
➔ Maximize the survival difference between nodes
➔ A single tree produces unstable predictions
Conditional Survival Ensembles
➔ Outstanding predictions
➔ Make use of hundreds of trees
➔ Conditional inference survival ensembles use a Kaplan-Meier function as splitting criterion
➔ Overfitting is not present
➔ Robust information about variable importance
➔ Unbiased approach
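To illustrate "maximize the survival difference between nodes", the sketch below scores candidate splits with the classical two-sample log-rank statistic. This is a swapped-in, textbook criterion used for illustration only; it is not claimed to be the exact criterion of the model in these slides. All names and data are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def logrank_statistic(durations: Sequence[float], events: Sequence[bool],
                      in_left: Sequence[bool]) -> float:
    """Chi-square log-rank statistic comparing the left and right daughter nodes."""
    observed_minus_expected = 0.0
    variance = 0.0
    for t in sorted({d for d, e in zip(durations, events) if e}):
        n = sum(1 for d in durations if d >= t)  # players at risk at time t
        n_left = sum(1 for d, g in zip(durations, in_left) if d >= t and g)
        churns = sum(1 for d, e in zip(durations, events) if d == t and e)
        churns_left = sum(1 for d, e, g in zip(durations, events, in_left)
                          if d == t and e and g)
        observed_minus_expected += churns_left - churns * n_left / n
        if n > 1:
            variance += (churns * (n_left / n) * (1 - n_left / n)
                         * (n - churns) / (n - 1))
    return observed_minus_expected ** 2 / variance if variance > 0 else 0.0


def best_split(feature: Sequence[float], durations: Sequence[float],
               events: Sequence[bool]) -> Tuple[Optional[float], float]:
    """Try every cut point `feature <= cut` and keep the largest statistic."""
    best: Tuple[Optional[float], float] = (None, 0.0)
    for cut in sorted(set(feature))[:-1]:
        stat = logrank_statistic(durations, events, [x <= cut for x in feature])
        if stat > best[1]:
            best = (cut, stat)
    return best


# Hypothetical whales: sessions per day, days observed, churn observed or censored.
sessions = [1, 2, 3, 4, 10, 11, 12, 13]
durations = [3, 5, 4, 6, 30, 28, 35, 40]
events = [True, True, True, True, False, True, False, False]
cut, stat = best_split(sessions, durations, events)
print(cut, round(stat, 3))
```

A real survival tree would apply this search recursively inside each daughter node and stop when no split is significant.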
Conditional inference survival ensembles
Figure: conditional inference survival tree partition, with Kaplan-Meier estimates of the survival time characterizing the players placed in each terminal node
Linear rank statistics as splitting criterion
Survival tree
◎ Two-step algorithm:
○ 1) the optimal split variable is selected: association between covariates and response
○ 2) the optimal split point is determined by comparing two-sample linear statistics for all possible partitions of the split variable
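The two steps can be sketched as follows, assuming log-rank scores as the transformation of the censored response and the permutation mean and variance to standardize the linear statistic. This is a simplified illustration: it ranks covariates by the standardized statistic instead of computing multiplicity-adjusted p-values, and it has no stopping rule. All names and data are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def logrank_scores(durations: Sequence[float], events: Sequence[bool]) -> List[float]:
    """Log-rank score per player: event indicator minus Nelson-Aalen cumulative hazard."""
    steps = []
    for t in sorted({d for d, e in zip(durations, events) if e}):
        at_risk = sum(1 for d in durations if d >= t)
        churns = sum(1 for d, e in zip(durations, events) if d == t and e)
        steps.append((t, churns / at_risk))
    return [float(e) - sum(h for t, h in steps if t <= d)
            for d, e in zip(durations, events)]


def standardized_statistic(x: Sequence[float], scores: Sequence[float]) -> float:
    """|z| of the linear statistic sum(x_i * score_i) under permutations of the scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_s = sum(scores) / n
    stat = sum(xi * si for xi, si in zip(x, scores))
    var = (sum((xi - mean_x) ** 2 for xi in x)
           * sum((si - mean_s) ** 2 for si in scores) / (n - 1))
    return abs(stat - n * mean_x * mean_s) / math.sqrt(var) if var > 0 else 0.0


def select_split(features: Dict[str, Sequence[float]], durations: Sequence[float],
                 events: Sequence[bool]) -> Tuple[str, Optional[float]]:
    scores = logrank_scores(durations, events)
    # Step 1: pick the covariate most strongly associated with the response.
    name = max(features, key=lambda k: standardized_statistic(features[k], scores))
    # Step 2: compare two-sample statistics over all binary partitions of that covariate.
    values = features[name]
    cuts = sorted(set(values))[:-1]
    if not cuts:
        return name, None
    cut = max(cuts, key=lambda c: standardized_statistic(
        [1.0 if v <= c else 0.0 for v in values], scores))
    return name, cut


# Hypothetical data for eight players.
features = {
    "sessions_per_day": [1, 2, 2, 3, 8, 9, 10, 12],
    "player_level": [40, 12, 33, 25, 18, 37, 21, 30],
}
durations = [3, 5, 4, 6, 30, 28, 35, 40]
events = [True, True, True, True, False, True, False, False]
print(select_split(features, durations, events))
```

Separating variable selection from split-point search is what avoids favoring covariates that simply offer more candidate cut points.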
Random Survival Forest
➔ RSF is based on the original random forest algorithm¹
➔ RSF favors variables with many possible split points over variables with fewer
1) Breiman L. 2001. Random Forests.
Conditional inference survival ensembles
Feature selection
◎ Game independent features:
○ player attention:
● time spent per day
○ player loyalty:
● number of days connecting (loyalty index)
● days from registration to first purchase
● days since last purchase
○ player intensity:
● number of actions, sessions, etc.
● amount of in-app purchases
◎ Game dependent features:
● player level (concept common to most games)
Feature selection
◎ Game independent features:
○ player attention: time spent per day, lifetime
○ player loyalty: number of days connecting, loyalty index (number of days played over lifetime), days from registration to first purchase, days since last purchase
○ player intensity: number of actions, sessions, amount of in-app purchases, action activity distance (total average actions compared to last days behaviour)
○ player level (concept common to most games)
◎ Game dependent features researched but ultimately not part of our model:
○ participation in a guild (social feature)
○ actions measured by categories
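Two of the game independent features can be sketched from a per-day action log. The loyalty index follows the definition on the slide (days played over lifetime). The activity distance formula here, recent mean minus lifetime mean, is an assumption, since the slides do not give the exact formula. Function and field names are hypothetical.

```python
from typing import Dict, Sequence


def player_features(daily_actions: Sequence[int], recent_days: int = 7) -> Dict[str, float]:
    """daily_actions[i] is the action count on day i since registration (0 = did not play).

    Assumes a non-empty log and recent_days >= 1.
    """
    lifetime = len(daily_actions)
    days_played = sum(1 for a in daily_actions if a > 0)
    lifetime_mean = sum(daily_actions) / lifetime
    recent = daily_actions[-recent_days:]
    recent_mean = sum(recent) / len(recent)
    return {
        "lifetime_days": lifetime,
        "days_played": days_played,
        "loyalty_index": days_played / lifetime,
        # Assumed definition: negative values mean activity dropped recently.
        "activity_distance": recent_mean - lifetime_mean,
    }


# Hypothetical ten-day history of one player.
example_features = player_features([120, 80, 0, 95, 110, 0, 0, 60, 40, 0], recent_days=3)
print(example_features)
```

For this player the loyalty index is 0.6 and the activity distance is negative, which is the kind of fading pattern a churn model is expected to pick up.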
The Results
With “Age of Ishtaria” Game Data
Binary classification results and comparison with other models
Predicted Kaplan-Meier survival curves as a function of time (days) for new or existing players
Censored data problem results
Validation -- Churn prediction
1000 bootstrap cross-validation error curves for the survival ensemble model and Cox regression
◎ Treating churn as a censored data problem is the right approach
○ the median survival time, i.e. the time at which the probability of surviving in the game is 50%, can be used as a time threshold to categorize a player as at risk of churning
◎ Binary problem -- static model
○ also brings relevant information
○ useful insight for short-term prediction
◎ SVM, ANN, Decision Trees, etc. are useful tools for regression or classification problems
○ in their original form they cannot handle censored data
○ this requires 1) modification of the algorithm or 2) transformation of the data
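Reading the median survival time off a predicted step curve can be sketched as below. The curve values, the function names and the 14-day horizon are hypothetical, and flagging a player when the median falls within the horizon is one assumed way to apply the threshold.

```python
from typing import List, Optional, Tuple


def median_survival_time(curve: List[Tuple[float, float]]) -> Optional[float]:
    """First time at which the step survival curve is at or below 0.5, else None."""
    for t, survival in curve:
        if survival <= 0.5:
            return t
    return None  # the curve never drops to 50% within the observed range


def at_risk_of_churning(curve: List[Tuple[float, float]], horizon_days: float) -> bool:
    """Flag a player whose predicted median survival time falls within the horizon."""
    median = median_survival_time(curve)
    return median is not None and median <= horizon_days


# Hypothetical predicted curve for one player: (day, survival probability).
predicted_curve = [(5, 0.83), (8, 0.67), (12, 0.44), (20, 0.0)]
print(median_survival_time(predicted_curve))     # 12
print(at_risk_of_churning(predicted_curve, 14))  # True
```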
Survival ensembles approach
◎ Application of the state-of-the-art algorithm “conditional inference survival ensembles”
○ to predict churn
○ and the survival probability of players in social games
◎ Model able to make predictions every day in an operational environment
◎ Adapts to other game data: Democratize Game Data Science
◎ Relevant information about whale behaviour
○ discovering new playing patterns as a function of time
○ classifying gamers by risk factors of survival experience
◎ Step towards the challenging goal of a comprehensive understanding of players
Summary and conclusion
Other work related to Game Data Science
Discovering Playing Patterns: Time Series Clustering of Free-To-Play Game Data
Alain Saas, Anna Guitart and África Periáñez
IEEE CIG 2016
Special Session on Game Data Science
Chaired by Alain Saas and África Periáñez
IEEE/ACM DSAA 2016
www.gamedatascience.org