Predicting Customer Lifetime Value Using Machine Learning Algorithms


Post on 16-Oct-2018






<ul><li><p>Predicting Customer Lifetime Value Using machine learning algorithms</p><p>Matilda Karlsson</p><p>Autumn term 2016. Degree project (Examensarbete), 30 credits. Supervisor: Patrik Eklund. Examiner: Henrik Björklund. Master of Science Programme in Computing Science and Engineering (Civilingenjörsprogrammet i teknisk datavetenskap), 300 credits.</p></li><li><p>Abstract</p><p>Spending money to acquire new customers is a risk, since new players never pay off immediately. In this thesis three machine learning algorithms (neural network, Bayesian network and regression) are used to investigate whether it is possible to determine early how much a user will spend in the game, in order to minimize that risk.</p><p>The results showed that the neural network performed badly, most likely because there might not be a strong correlation between how a player plays, or where he comes from, and how much he will spend.</p><p>Because of how Bayesian networks work, it was hard to answer the question directly, but the network still gave a good indication of what kind of players spend money in the game.</p><p>Regression showed that a player should have paid off around 50% of the advertisement cost by around day six or seven, or the player will most likely never pay off.</p></li><li><p>Contents</p><p>1 Introduction</p><p>1.1 Partners</p><p>1.2 Purpose</p><p>2 Problem description</p><p>2.1 Problem</p><p>2.2 Mad Skills Motocross 2</p><p>2.3 Goals and purposes</p><p>2.3.1 In-depth study</p><p>2.3.2 Data</p><p>3 Machine learning algorithms</p><p>3.1 Bayesian network</p><p>3.1.1 Sprinkler, rain and wet grass</p><p>3.1.2 Hugin Expert</p><p>3.2 Neural network</p><p>3.2.1 Back propagation</p><p>3.3 Regression</p><p>4 Results</p><p>4.1 Result of neural network</p><p>4.2 Result of Bayesian network</p><p>4.3 Result of regression</p><p>5 Conclusion and future work</p><p>5.1 Conclusion and limitation</p><p>5.2 Future work</p></li><li><p>A All other Bayesian networks</p></li><li><p>1 Introduction</p><p>Having customers is an essential part of any profitable 
company. Without paying customers most companies would quickly go bankrupt. Therefore most companies need a way of acquiring new customers.</p><p>Advertisement can be an efficient way to gain new customers. But advertisement costs money, and if the company wants to profit it needs to acquire enough customers to cover the advertisement costs. A new customer might have a lifetime of over a year, and during this time the customer will hopefully pay off. However, when acquiring new customers we do not want to wait a full year before knowing whether or not the advertisement was successful. The best case would be to know after only a week or two, so that we could quickly stop advertising in case it costs more than it will ever return. It is therefore valuable to be able to predict a customer's lifetime value. Knowing the customer lifetime value, a company also knows how much it can be willing to spend to acquire new customers.</p><p>Turborilla AB [23] develops games for iOS and Android and wishes to advertise their game with pay-per-click advertisement. Turborilla would have to pay for every click on their advertisement, and would hopefully gain a customer for every click as well. In Turborilla's game Mad Skills Motocross 2, they earn money from customers watching ads and from customers buying certain in-game features, such as a new bike or some kind of assist to beat a track.</p><p>1.1 Partners</p><p>The idea for this thesis was provided by Turborilla AB. They also provided test data and resources to use when implementing and testing the algorithms. 
They helped a lot during the work on this thesis with knowledge, information and a workspace.</p><p>1.2 Purpose</p><p>The purpose of this thesis is to test whether it is possible to predict the lifetime value of a customer within a confidence interval:</p><p>The customer X will spend Y amount within 180 days with 95% confidence.</p><p>Preferably as soon as possible after the customer was acquired.</p><p>This can be divided into several subtasks. First we need to figure out which attributes of a customer correlate with their lifetime value. Looking at the device they play on might give an indication of how much they are willing to spend on a mobile game. It could be that someone on a brand new phone is willing to spend real money on a mobile game while someone on an older phone is not, or that how well they perform in the game correlates with how much they are willing to spend.</p><p>Matilda Karlsson Predicting Customer Lifetime Value</p></li><li><p>It is also essential that we are able to measure the confidence of the predictions. A model that predicts the lifetime value of customers is useless unless it also gives some kind of confidence for the prediction. Using a 95% confidence interval means that we know with a certainty of 95% that the actual value will lie within the interval. 
Any percentage could be used for the interval; 90%, 95% and 99% are commonly chosen.</p></li><li><p>2 Problem description</p><p>This chapter explains more about the game in question and the problem, and describes the data that Turborilla currently saves about their customers.</p><p>2.1 Problem</p><p>Turborilla does not have any way of calculating their customers' lifetime value today, meaning that this thesis does not depend on any previous work at Turborilla.</p><p>The prediction of lifetime value is required to run fairly quickly. With Turborilla's 29 million users it can be expected to take some time, but this should still be minimized as much as possible. The software also needs to make confident predictions: it needs to predict the right value with high certainty to be of any use at all.</p><p>2.2 Mad Skills Motocross 2</p><p>The main focus of this thesis is how Turborilla AB profits from their game Mad Skills Motocross 2 (MSM2).</p></li><li><p>Figure 1: Screenshot from the game MSM2, where two players race against each other.</p><p>MSM2 is a 2D side-scrolling game where players compete against each other or against a computer-controlled player. Figure 1 is a screenshot from the game where two players race against each other. 
The game is free to download and can be played for free if the user wishes to. New bikes or bike parts can be unlocked when the player is skilled enough, or they can be bought at any time for real money.</p><p>A player who has never spent any money in the game is shown advertisement once per day, meaning that Turborilla profits from users even if they have not spent any money in the game.</p><p>2.3 Goals and purposes</p><p>This thesis consists of two parts.</p><p>• Make an in-depth study of machine learning algorithms and of how other companies calculate lifetime value, and study Turborilla's data.</p><p>• Use some machine learning methods to test whether it is possible to predict players' lifetime value.</p><p>2.3.1 In-depth study</p><p>The first part of the in-depth study consisted of reading how other companies evaluate their customers' lifetime value and of studying Turborilla's data. Most of the other calculations of customer lifetime value used an equation that depends on a customer's average spending per week and on how long the customer is expected to stay with the company: Customer Lifetime Value = weekly spend × number of weeks in company. When checking this against Turborilla's data it was quite clear that this approach would not be sufficient. This is the main reason why this thesis focuses on using machine learning algorithms to solve this problem.</p></li><li><p>In the in-depth study the main focus was therefore machine learning algorithms. The algorithms studied were Bayesian network, neural network and regression. The study of the algorithms shows what kind of data is usually used with each of them. 
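Neural networks tend to work best with continuous data, while Bayesian networks also work on categorical data.</p><p>2.3.2 Data</p><p>There are two kinds of data available: one for aggregated players and one for single players.</p><p>Turborilla today has around 29 million users, meaning that the data over every single player is huge. The data is also split in two: one set for Android users and one for Apple users. Around two thirds of the users are Android users and one third are Apple users.</p><p>Aggregated data</p><p>The aggregated user data is grouped by the date players started playing, and for each group it is possible to see what the players together spent on a specific date.</p><p>Table 1: Aggregated data showing, for each start date, the number of players and the amount spent on each day after the start.</p>
<table>
<tr><th>Date</th><th>Players</th><th>0</th><th>1</th><th>2</th><th>3</th><th>4</th><th>30</th></tr>
<tr><td>1 jan</td><td>5241</td><td>$</td><td>$</td><td>$</td><td>$</td><td>$</td><td>$</td></tr>
<tr><td>2 jan</td><td>4613</td><td>$</td><td>$</td><td>$</td><td>$</td><td>$</td><td>$</td></tr>
<tr><td>3 jan</td><td>4166</td><td>$</td><td>$</td><td>$</td><td>$</td><td>$</td><td>$</td></tr>
<tr><td>4 jan</td><td>2554</td><td>$</td><td>$</td><td>$</td><td>$</td><td>$</td><td>$</td></tr>
<tr><td>...</td><td></td><td></td><td></td><td></td><td></td><td></td><td></td></tr>
<tr><td>26 feb</td><td>5048</td><td>$</td><td>$</td><td>$</td><td>$</td><td>$</td><td></td></tr>
<tr><td>27 feb</td><td>4512</td><td>$</td><td>$</td><td>$</td><td>$</td><td></td><td></td></tr>
<tr><td>28 feb</td><td>5623</td><td>$</td><td>$</td><td>$</td><td></td><td></td><td></td></tr>
<tr><td>29 feb</td><td>5241</td><td>$</td><td>$</td><td></td><td></td><td></td><td></td></tr>
<tr><td>30 feb</td><td>1245</td><td>$</td><td></td><td></td><td></td><td></td><td></td></tr>
</table>
<p>Table 1 shows what the aggregated data looks like. The numbers of users starting on a specific day are made up, and in the real data there are numbers instead of $, but Turborilla wanted to keep these numbers hidden.</p><p>For every date (1 Jan, 2 Jan, and so on) it shows how many players installed the game on that specific date, and the remaining columns show how much they spent in total on day x after they started. So for the 5241 players who started on January 1, the 0 column is how much they spent on January 1, and the 1 column shows how much they spent on January 2. But for the 4613 players who started on January 2, the 0 column shows how much they spent on January 2, and the 1 column shows how much they spent on January 3. 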
This data is only collected up until day 30.</p></li><li><p>Single user data</p><p>The single user data is collected in massive database files, where every row is one user. The data is also split into several different files. Every player in this data has their own ID tag to keep track of them between files, which is also used as a primary key if the data is loaded into a database.</p><p>This data only saves whether, or how many times, a player has done something, much like a boolean value, and not any dates when it happened. So if a player has lost track one nine times, the database simply saves a 9. The only dates it does save are the starting date for each player and the last time they were online.</p><p>Table 2: Some example columns from the database.</p>
<table>
<tr><th>userID</th><th>device</th><th>timezone</th><th>bike customized</th><th>bike unlocked</th><th>bought anything</th><th>track one lost</th></tr>
<tr><td>sdf45ER234gFGGFGH</td><td>iPhone7</td><td>America/Honolulu</td><td>True</td><td>8</td><td>False</td><td>8</td></tr>
<tr><td>qwe234SDXV23aqwer</td><td>iPhone4</td><td>America/Toronto</td><td>True</td><td>3</td><td>True</td><td>7</td></tr>
</table>
<p>Table 2 shows some example columns of what the data might look like. This is a heavily reduced version of the database; in reality the data is a lot larger. The data in Table 2 is made up. As mentioned, these columns are also a concatenation of what is found in several different files. In reality game progress and device are not saved in the same file.</p></li><li><p>3 Machine learning algorithms</p><p>Machine learning is about teaching computers to do things without being explicitly programmed to do so [13]. Machine learning has developed from pattern recognition and is today used in several different fields. 
Banks use it to gain insight into investment opportunities, websites use it to recommend items users might like based on previous purchases, and transportation services analyse data to find patterns that make routes more efficient and therefore more profitable [5].</p><p>In this thesis three different machine learning algorithms were used, which are explained in detail below.</p><p>3.1 Bayesian network</p><p>A Bayesian network is used to find patterns of influence among a set of variables [12]. A Bayesian network consists of a directed acyclic graph where the nodes are random variables, such as attributes or features, and an arc between two nodes represents a dependence between them. Since the arcs have a direction, an arc going from A to B indicates that A causes B [14]. Bayesian networks are also called probabilistic networks because they use classic probability calculus.</p><p>The probability of an event A, denoted P(A), is a number in the interval [0,1]. The basic theorems of probability calculus are:</p><p>P(A) = 1 if and only if A is certain</p><p>If A and B are mutually exclusive then:</p><p>\[P(A \cup B) = P(A) + P(B)\]</p><p>A basic concept in Bayesian networks is conditional probability. Many statements have unstated prerequisites; for example, the statement that the probability of a die turning up 6 is 1/6 has the unstated prerequisite that the die is fair. This is denoted</p><p>\[P(A \mid B) = x\]</p><p>Given the event B, the probability of A is x.</p><p>This means that if B is true and everything else known is irrelevant for A, then P(A) = x.</p><p>Another fundamental rule of probability calculus is:</p></li><li><p>\[P(A \mid B)\,P(B) = P(A, B)\]</p><p>where P(A, B) is the same as \(P(A \cap B)\). 
From this it follows that</p><p>\[P(A \mid B)\,P(B) = P(B \mid A)\,P(A)\]</p><p>\[P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A)}\]</p><p>If the formula depends on a context C, it becomes equation 3.2 [10]:</p><p>\[P(A \mid B, C)\,P(B \mid C) = P(A, B \mid C) \tag{3.1}\]</p><p>\[P(B \mid A, C) = \frac{P(A \mid B, C)\,P(B \mid C)}{P(A \mid C)} \tag{3.2}\]</p><p>3.1.1 Sprinkler, rain and wet grass</p><p>A classical example of a Bayesian network is the cause of wet grass. The wet grass could be caused by either rain or sprinklers, but seldom by both. If the grass is wet and the sprinkler is on, the chance of rain decreases.</p><p>Figure 2: Bayesian network over weather, sprinkler, rain and wet grass (nodes: cloudy, sprinkler, rain, wet grass), showing that the wet grass depends on the sprinklers and rain, but not directly on the weather.</p><p>Figure 2 shows how the wet grass depends on sprinklers and rain.</p></li><li><p>Table 3: Several tables: first the chance of cloud, then the chance of the sprinklers being on given cloud, then the chance of rain given cloud, and last the chance of wet grass depending on sprinklers and rain.</p>
<table>
<tr><th>P(Cloudy=F)</th><th>P(Cloudy=T)</th></tr>
<tr><td>0.2</td><td>0.8</td></tr>
</table>
<table>
<tr><th>Cloudy</th><th>P(Sprinkler=F)</th><th>P(Sprinkler=T)</th></tr>
<tr><td>T</td><td>0.9</td><td>0.1</td></tr>
<tr><td>F</td><td>0.5</td><td>0.5</td></tr>
</table>
<table>
<tr><th>Cloudy</th><th>P(Rain=F)</th><th>P(Rain=T)</th></tr>
<tr><td>T</td><td>0.2</td><td>0.8</td></tr>
<tr><td>F</td><td>0.9</td><td>0.1</td></tr>
</table>
<table>
<tr><th>Sprinkler</th><th>Rain</th><th>P(Wet grass=F)</th><th>P(Wet grass=T)</th></tr>
<tr><td>F</td><td>F</td><td>1.0</td><td>0.0</td></tr>
<tr><td>T</td><td>F</td><td>0.1</td><td>0.9</td></tr>
<tr><td>F</td><td>T</td><td>0.1</td><td>0.9</td></tr>
<tr><td>T</td><td>T</td><td>0.01</td><td>0.99</td></tr>
</table>
<p>Table 3 shows the probabilities for cloudy weather, for sprinklers and rain given cloudy or not cloudy weather, and for wet grass given sprinklers and rain.</p><p>From this we can make predictions: if the sprinklers are on, the grass is probably wet. Abduction: if someone falls on the slippery grass, it is probably wet. Abduction: if the grass is wet, it is more likely that either the sprinkler is on or it is raining. Explaining away: if the grass is wet and the sprinklers are on, the likelihood of it also raining is reduced [17].</p><p>To calculate something in a Bayesian network, equation 3.2 is used. 
For example, \(P(W = T \mid C = T)\), the probability that the grass is wet given that it is cloudy, is computed as follows.</p><p>\[P(W = T \mid C = T) = \sum_{S \in \{T,F\}} \sum_{R \in \{T,F\}} \frac{P(C = T, S, R, W = T)}{P(C = T)} \tag{3.3}\]</p><p>Because of the conditional dependencies, \(P(C, S, R, W) = P(C)\,P(S \mid C)\,P(R \mid C)\,P(W \mid S, R)\).</p><p>Therefore equation 3.3 becomes, with the arguments of P given in the order (C, S, R, W):</p><p>\[\sum_{S \in \{T,F\}} \sum_{R \in \{T,F\}} \frac{P(C = T, S, R, W = T)}{P(C = T)} = \frac{P(T,T,T,T) + P(T,F,T,T) + P(T,T,F,T) + P(T,F,F,T)}{P(C = T)} \tag{3.4}\]</p><p>\[\frac{0.8 \cdot 0.1 \cdot 0.8 \cdot 0.99 + 0.8 \cdot 0.9 \cdot 0.8 \cdot 0.9 + 0.8 \cdot 0.1 \cdot 0.2 \cdot 0.9 + 0}{0.8} = 0.7452 \tag{3.5}\]</p><p>In equation 3.5 the last term is 0 because P(W=T | R=F, S=F) is 0. The result 0.7452 means that P(W=T | C=T) is about 75%.</p></li><li><p>3.1.2 Hugin Expert</p><p>For this thesis, the program Hug...</p></li></ul>
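<p>The wet-grass calculation in Section 3.1.1 can also be checked with a few lines of code. The following Python sketch is not code from the thesis. It enumerates the unobserved variables S and R using the values from Table 3 and the factorisation P(C)P(S|C)P(R|C)P(W|S,R). The names <code>joint</code> and <code>prob_wet_given_cloudy</code>, and the dictionary layout, are illustrative assumptions.</p>

```python
from itertools import product

# Conditional probability tables copied from Table 3.
P_CLOUDY = {True: 0.8, False: 0.2}
# P(Sprinkler = s | Cloudy = c), indexed as [c][s]
P_SPRINKLER = {True: {True: 0.1, False: 0.9},
               False: {True: 0.5, False: 0.5}}
# P(Rain = r | Cloudy = c), indexed as [c][r]
P_RAIN = {True: {True: 0.8, False: 0.2},
          False: {True: 0.1, False: 0.9}}
# P(Wet grass = True | Sprinkler = s, Rain = r), indexed as [(s, r)]
P_WET_TRUE = {(False, False): 0.0, (True, False): 0.9,
              (False, True): 0.9, (True, True): 0.99}


def joint(c, s, r, w):
    """Joint probability P(C=c, S=s, R=r, W=w) from the factorisation."""
    p_w = P_WET_TRUE[(s, r)] if w else 1.0 - P_WET_TRUE[(s, r)]
    return P_CLOUDY[c] * P_SPRINKLER[c][s] * P_RAIN[c][r] * p_w


def prob_wet_given_cloudy(cloudy=True):
    """P(W=T | C=cloudy), summing the joint over S and R."""
    numerator = sum(joint(cloudy, s, r, True)
                    for s, r in product([True, False], repeat=2))
    return numerator / P_CLOUDY[cloudy]


print(round(prob_wet_given_cloudy(True), 4))  # 0.7452
```

<p>Enumeration like this only scales to very small networks, since the number of terms doubles with every unobserved binary variable. That is one reason to use a dedicated tool for larger networks.</p>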