Steffen Steffensen Halkjær Eriksen 011189-2725 A win-rate model in Pokémon TCG Bachelor Project Bachelor in Mathematics-Business Economics Supervisor Professor Jørgen T. Lauridsen December 2011

Steffen Steffensen Halkjær Eriksen 011189-2725

A win-rate model in Pokémon TCG

Bachelor Project Bachelor in Mathematics-Business Economics

Supervisor Professor Jørgen T. Lauridsen

December 2011


Acknowledgements

I would like to thank Professor Jørgen T. Lauridsen for always being there when I needed it. He gave me good advice and pointed me in the right direction; without his help I am sure I would not have been able to do the best I could.

I would like to thank Søren Rud Kristensen for his help with Stata and for his helpful comments during my work.

I would especially like to thank all the people who let me interview them for this project. They gave me good and wise answers to all my questions and thereby enabled me to get a more complete view of the game. They have all been a pleasure to interview.

I would also like to express my gratitude to all the people who answered my survey and in that way helped me get the data I needed for this research.

Finally, heartfelt gratitude goes to my friends and family, who encouraged and supported me in the process; to my previous neighbor Ákos Kancsal, who helped me get the idea for this project; and to my fiancée Catarina for being the most amazing girl one could ever wish for.


Abstract

The strategic card game known as the Pokémon Trading Card Game (Pokémon TCG) is a dynamic game in which many different factors can influence the outcome of a game. This paper investigates what affects a Pokémon TCG player’s win-rate besides luck, with respect to the 2010-2011 season. To do so, a cross-sectional dataset containing 84 individuals was collected along with 7 interviews.

Techniques from econometrics are applied in order to determine the effect of the different factors on a player’s win-rate. In a situation where the dependent variable is a coding of a qualitative outcome, a probability model is applied to maintain the familiar type of regression. The dependent variable can then be linked to a list of factors, each of them with a different impact on the probability of a higher win-rate. The model chosen is the logit model, due to its mathematical convenience and the fact that the underlying distribution is a continuous probability distribution, which is consistent with the theory of probability models. To account for problems with heteroscedasticity, a weighted least-squares logistic regression for grouped data, known as glogit, has been used to estimate the model.

Results revealed a positive effect of experience, of playing decks containing an SP-engine and of being a Pokémon professor. Ageing proves to have a negative effect on a player’s win-rate. There is no difference whether a player is from the USA or not. Also, having family members playing, having a job or playing abroad during the season showed no significant influence on the win-rate.


Table of Contents

Acknowledgements
Abstract
Table of contents
Chapter 1. Introduction
Chapter 2. Data
2.1. Data collection
2.2. Variables
2.2.1 Dependent variable
2.2.2 Independent variables
Chapter 3. Methods
3.1. Probability models
3.2. The logit model
3.3. The model
Chapter 4. Results
Chapter 5. Discussion
Chapter 6. Conclusions
Chapter 7. Future work
References
Appendix A – Data collection
Appendix B – Scans of the cards contained in an SP-engine
Appendix C – Omitted variable bias
Appendix D – Heteroscedasticity tests


Chapter 1. Introduction


In 1998 the Pokémon Trading Card Game was created. The game, known as Pokémon TCG, was based on the video game series created by Satoshi Tajiri. In contrast to other TCGs at that time, like Magic: The Gathering, it appealed more to a younger audience. It quickly became very popular and spread from Japan throughout the world. The game was published by Wizards of the Coast from its creation until they lost the license to Nintendo in 2003. The Pokémon Company International (TPCi), which is affiliated with and owned by Nintendo, is responsible for the entire Pokémon franchise and its marketing. Play! Pokémon¹ is the division of TPCi that takes care of the Pokémon TCG and is responsible for organized tournament play. Today, after 13 years, Pokémon TCG is still one of the most popular TCGs, along with Yu-Gi-Oh! and Magic: The Gathering.

The game itself is a two-player game in which each player has their own pre-constructed 60-card deck. There are different ways to play the game, the most common being “Modified Constructed”.² This is the type of game play this report focuses on. The goal for each player is to knock out the other player’s Pokémon with the help of their own Pokémon, energy and trainer cards. Each time a player succeeds in knocking out one of his or her opponent’s Pokémon, the player is allowed to take 1 of his or her initial 6 prize cards. There are in general 4 ways to win the game:

1. Taking all 6 prize cards before the opponent.

2. Knocking out the opponent’s last Pokémon on the field.

3. Decking out the opponent.³

4. Winning by using “Lost World”.⁴

In the event that both players fulfill one or more of the above win conditions at the same time, the player fulfilling more win conditions wins the game; otherwise a new game to one prize card is played. For more specific rules of the game, such as how to play the cards and how to construct a deck, visit the official Pokémon website. (1)
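The tie-breaking rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as it is worded in this chapter, not an official rules implementation; the names `Result` and `resolve` are hypothetical, and the input is simply the number of win conditions each player fulfills at the same moment.

```python
from enum import Enum


class Result(Enum):
    PLAYER_A = "player A wins"
    PLAYER_B = "player B wins"
    SUDDEN_DEATH = "new game to one prize card"
    UNDECIDED = "game continues"


def resolve(conditions_a: int, conditions_b: int) -> Result:
    """Decide the outcome from the number of win conditions each player
    fulfills at the same moment (hypothetical helper, for illustration)."""
    if conditions_a == 0 and conditions_b == 0:
        return Result.UNDECIDED
    if conditions_a > conditions_b:
        return Result.PLAYER_A
    if conditions_b > conditions_a:
        return Result.PLAYER_B
    # Both players fulfill the same (non-zero) number of win conditions.
    return Result.SUDDEN_DEATH


print(resolve(2, 1).value)
print(resolve(1, 1).value)
```

A player who takes the last prize card while the opponent also runs out of Pokémon on the field would, under this reading, fulfill two conditions against the opponent's possible one.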

¹ Play! Pokémon was known as Pokémon Organized Play (POP) until August 2010.

² Modified Constructed means that each player plays with a pre-constructed 60-card deck, where only cards from a certain number of sets are allowed in the deck.

³ Each player must draw a card from their deck at the beginning of their turn. If there are no more cards left in the deck and the player is therefore unable to draw a card, that player loses the game.

⁴ Your opponent must have at least 6 Pokémon in their “Lost Zone” in order for you to declare yourself the winner.


Research has previously been done on other card games such as Texas Hold’em Poker (2) (3); however, no research has yet been done on Pokémon TCG. Unlike Texas Hold’em Poker, Pokémon TCG is not “constant”. The set of cards a player may use for his or her decks changes over time, while in Texas Hold’em Poker the players play with the same 52 cards every time. There are usually 4 new sets released every year, each of them containing around 100 cards. There is then a format change once a year that removes a number of the oldest sets. At the beginning of the season a player usually has a card pool of at least 500 different cards to create decks from. During the season more cards are added through the release of new sets, so a player ends up with a card pool of around 1000 cards before the format change at the end of the season. All this together gives a dynamic game in constant evolution, where players constantly have to learn new cards and combos in order not to fall behind.

When playing a game like Pokémon TCG, a player has to think a lot about how to construct his or her 60-card deck before entering a tournament. Since there is more than one way to win and the game is as dynamic as it is, play testing plays an important role in achieving more wins during tournament play. Having a test group that is very devoted to the game therefore gives a player a huge advantage. (3) The more a player play tests, the more the player knows about what he or she should include in the deck and which tactics to use. Things like this add many factors to consider when discussing what makes a player win more games. It is not just “the luck of the draw” that decides the outcome of a game.

The aim of this paper is to investigate what affects a Pokémon TCG player’s win-rate other than luck. The win-rate is defined as the percentage of premier rated games⁵ a player wins. This research will additionally focus on a number of hypotheses, which together will help to answer the main research question of this paper. The study focuses on the 2010-2011 season, from September 1st until April 24th.⁶ Knowing what affects a player’s win-rate might help in gaining a deeper understanding of the game and of how a player can improve his or her own win-rate.

⁵ At premier rated tournaments players can earn points in order to qualify for the World Championship.

⁶ Normally a season lasts one whole year (Sept. 1st to Sept. 1st), but because of the release of Black & White on April 25th, which was followed by a lot of rule changes, the study only goes until April 24th.


The objective of this paper is to answer the main research question:

“What significantly affects a Pokémon TCG player’s win-rate?”

In order to answer this question the following hypotheses will be used:

“Ageing will affect a player’s win-rate negatively”

“Playing the game for a longer time helps a player achieve a higher win-rate”

“Playing a deck with an SP-engine contributes to a higher win-rate”

“A player achieves a higher win-rate if he or she has other family members playing”

“A full-time job affects a player’s win-rate negatively”

“The Pokémon professor status affects a player’s win-rate positively”

“Playing abroad during the season boosts a player’s win-rate”

“Being from the USA does not affect a player’s win-rate”

When researching a subject that heavily involves the players themselves, the best approach seemed to be to get as much data from as many different players as possible. By also getting data from people who have another view of the game (mainly judges), a more complete view of the game can be obtained, which makes it possible to answer the main research question more precisely. This report makes use of nonexperimental data. There are both quantitative data in the form of a survey and qualitative data in the form of interviews. The advantage of gathering the quantitative data in this way is that it makes it possible to use techniques from econometrics to answer the research questions. The interviews, which form the qualitative part of the data, were used to create the different hypotheses stated in this paper.

To provide an answer to these questions, a cross-sectional data set has been collected for the 2010-2011 season. Chapter 2 discusses the data of this paper. Chapter 3 discusses the methods used. Chapter 4 presents the results, and a discussion of these is provided in chapter 5. In chapter 6 the final conclusions are drawn. Chapter 7 looks at some possible further research.


Chapter 2. Data


In order to answer the research questions, a cross-sectional analysis of the win-rates of Pokémon TCG players from around the world in the 2010-2011 season has been conducted. With the data collected it is possible to capture effects such as that of a player’s main deck choice during a season, particularly whether the chosen deck contains a so-called SP-engine.⁷ The dependent variable is the player’s win-rate throughout all the regressions. The independent variables capture the factors that may influence the player’s win-rate.

2.1. Data collection

As stated earlier, this report makes use of both quantitative and qualitative data, both of which are nonexperimental. The data was collected from June 1st until August 14th. The quantitative part of the data was collected as survey data. A total of 84 observations remained after incomplete or otherwise invalid answers had been sorted out. An online survey was created at www.surveymonkey.com, so people could either fill out the survey at that site or send their answers by e-mail. Several forms of PR were used to draw attention to the survey. An article was published on one of the big Pokémon websites, www.sizprizes.com; small paper flyers were handed out at the Dutch National Championship 2011 and the Danish National Championship 2011⁸; e-mails were distributed to people in the Pokémon community; and a copy of the article was posted on the Facebook page for Danish Pokémon players. A copy of the article and the flyer can be seen in appendix A.

The article and the flyer stated certain conditions that had to be met in order to contribute to the project. These conditions have been carefully considered, and apart from the reasons stated in the article some further comments ought to be given.

The first condition was: “Master players only”. The reason for only using players from the master division⁹, other than the one stated in the article, is that players who have reached that age will often be able to play the most advanced decks and also play the game with fewer mistakes than a player in a lower age division.

⁷ An SP-engine means that the collection of Supporter and Trainer cards in a deck mainly or partly consists of: Cyrus’s Conspiracy, Team Galactic's Invention G-101 Energy Gain, Team Galactic's Invention G-103 Power Spray, Team Galactic's Invention G-105 Poke Turn and Team Galactic's Invention G-109 SP Radar. A scan of each of the cards can be found in appendix B.

⁸ I was present at these two championships: one as a player (Dutch) and one as the head judge (Danish).

⁹ For the 2010-2011 season a player born in 1994 or earlier plays in the master division.

The second condition was: “Results from this season only – Prior to Black & White”. The reason for this is that the game underwent some rule changes with the release of that set, which dramatically changed the types of decks being played and would make the results very unstable. The first early rotation in the history of Pokémon TCG also happened on July 1st, removing a number of the oldest sets in order to keep a healthy game environment. (4)

The third and last condition was: “You must have played at least 25 premier rated games”. As stated in the article, this number was chosen after some consideration. Too low a number gives unstable win-rates that are less reliable, while too high a number limits the amount of data that can be collected.
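The trade-off behind the 25-game threshold can be illustrated with the standard error of an observed win-rate. The sketch below assumes, purely for illustration, that a player's games are independent and won with a constant probability; the function name and the win probability of 0.65 are hypothetical and not taken from the survey.

```python
import math


def win_rate_standard_error(win_probability: float, games: int) -> float:
    """Standard error of an observed win-rate after `games` independent games."""
    return math.sqrt(win_probability * (1.0 - win_probability) / games)


# Hypothetical player with a true win probability of 0.65.
for games in (10, 25, 100):
    print(games, round(win_rate_standard_error(0.65, games), 3))
```

Under these assumptions the standard error falls from roughly 0.15 at 10 games to roughly 0.10 at 25 games and 0.05 at 100 games, which shows why a low threshold gives unstable win-rates while a high one would exclude most respondents.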

The qualitative part of the data consists of 7 interviews with players and judges. The interviews were conducted during the Pokémon TCG World Championship 2011 at the Hilton Bayfront hotel in San Diego, California, from August 10th to August 14th. The selected people were all asked the same questions in the same order. As stated earlier, they were a mix of players and judges from all around the world, in order to get a more complete view of the game. The questions asked during the interviews can be seen in appendix A.

The questions in the interviews were meant to be closely connected to the questions asked in the survey. However, some of the interview questions also go beyond the survey questions, in order to get some interesting statements from the interviewees. Altogether, the statements obtained in the 7 interviews formed the basis of the different hypotheses stated in this paper.

2.2. Variables

In this section the variables used in the cross-sectional regression of Pokémon TCG players’ win-rates are described. Table 1 summarizes the data that has been collected.


Table 1: Name, definition, mean, variance, min and max value of each of the variables described in the following subsections.

Name        Definition                                                        Mean       Variance   Min  Max
wr          The player’s win-rate, expressed as a share between 0 and 1.      0.6463333  0.009979   0.4  0.904
age         Age of the player in years.                                       21.83333   66.45382   15   64
exp         The experience of the player in whole years.                      4.97619    12.43316   1    12
sp          If the player’s deck contained an SP-engine.                      0.5357143  0.2517212  0    1
job         Indicating if the player has a full-time job.                     0.202381   0.1633678  0    1
prof        Telling if the player has earned the title of Pokémon professor.  0.4285714  0.2478485  0    1
abroad      If the player played a tournament abroad in the past season.      0.6666667  0.2248996  0    1
family      Indicating if the player has any family members playing.          0.4404762  0.2494263  0    1
usa_player  Telling if the player is from the USA or not.                     0.5        0.253012   0    1
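The variances reported for the dummy variables in Table 1 can be cross-checked from their means alone, because the sample variance of a 0/1 variable with mean p over n observations is p(1 - p) · n / (n - 1). The following sketch assumes that the table reports sample variances with an n - 1 denominator; the function name is hypothetical.

```python
def dummy_sample_variance(mean: float, n: int) -> float:
    """Sample variance (n - 1 denominator) of a 0/1 variable with the given mean."""
    return mean * (1.0 - mean) * n / (n - 1)


N_PLAYERS = 84  # number of observations reported in the text

# Means of two of the dummy variables as listed in Table 1.
for name, mean in [("sp", 0.5357143), ("usa_player", 0.5)]:
    print(name, round(dummy_sample_variance(mean, N_PLAYERS), 6))
```

The values obtained this way agree with the variance column of Table 1 for these two variables, which supports reading that column as sample variances over the 84 players.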

As mentioned earlier, the dependent variable is the player’s win-rate. Multiple regressions are run in order to test models with different specifications. In this way the most appropriate of the tested models can be found. That model can then be used to describe what influences a Pokémon TCG player’s win-rate.

2.2.1 Dependent variable

When working with a win-rate, the model that fits the data best can be considered to be one that fits a discrete choice, since a player can either win or lose a game of Pokémon TCG. The model for this paper has therefore been chosen among the models known as qualitative response (QR) models. For the present case, the dependent variable can be formulated as a coding of a qualitative outcome, namely the wins (coded as 1) and losses (coded as 0) of a player. However, there is further information in the present case, as there are repeated observations for each player, i.e., each player played a number of games. This implies that the dependent variable can be represented as the share of games won by the player. Thus, the dependent variable (wr) is a figure between 0 and 1, which can be thought of as the probability that the player in question wins a game. To fit into a probability model, the dependent variable (wr) must therefore be transformed. In what follows, the logistic probability model, and thus the so-called logit transformation, is applied, which reads as follows:

$$\ln\left(\frac{wr}{1 - wr}\right)$$

A discussion of the choice of model and how the transformation is done, will be given in the

next section.
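As a small numerical illustration of the logit transformation, the sketch below applies it to the minimum, mean and maximum win-rate listed in Table 1. The function names are hypothetical, and transforming the mean win-rate is only meant to show the scale of the transformed variable.

```python
import math


def logit(win_rate: float) -> float:
    """Log-odds ln(wr / (1 - wr)) of a win-rate strictly between 0 and 1."""
    if not 0.0 < win_rate < 1.0:
        raise ValueError("win_rate must lie strictly between 0 and 1")
    return math.log(win_rate / (1.0 - win_rate))


def inverse_logit(log_odds: float) -> float:
    """Map log-odds back to a win-rate between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-log_odds))


# Minimum, mean and maximum win-rate reported in Table 1.
for wr in (0.4, 0.6463333, 0.904):
    print(wr, round(logit(wr), 3))
```

The transformed values are roughly -0.41, 0.60 and 2.24: a win-rate of 0.5 maps to 0, and the bounded interval from 0 to 1 is stretched onto the whole real line, which is what allows a linear regression on the transformed variable.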

2.2.2 Independent variables

A number of explanatory variables are applied in the regressions. This section gives a short description of each of the variables used.

The “age” variable denotes the age of the player at the time he or she answered the survey. This variable is used to capture the effect of age on a player’s performance. It was also used to create a new variable, “agesquare”. These two variables were then used to check for a possible ‘peak’ in a player’s performance and, more importantly, at what age such a peak occurs.
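How “age” and “agesquare” together locate such a peak can be shown with a short worked equation. As a sketch, assume the transformed win-rate contains the two terms below; the symbols $\beta_{age}$ and $\beta_{agesq}$ are illustrative labels for the two coefficients, not estimates reported in this paper.

```latex
\frac{\partial}{\partial\,\mathit{age}}
  \Bigl(\beta_{age}\,\mathit{age} + \beta_{agesq}\,\mathit{age}^{2}\Bigr)
  = \beta_{age} + 2\beta_{agesq}\,\mathit{age} = 0
\quad\Longrightarrow\quad
\mathit{age}^{*} = -\frac{\beta_{age}}{2\beta_{agesq}}
```

The turning point $\mathit{age}^{*}$ is a peak only if $\beta_{agesq} < 0$, and it is of practical interest only if it falls within the observed age range.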

“exp” denotes the number of whole years the individual player has been playing the game. It measures the effect of experience in the game. Together with the “sp” variable, the model was extended to allow the ‘premium’ of playing a deck with an SP-engine to depend on the years of experience. This interaction term, denoted “sp_exp”, tests the idea that the ‘premium’ a player gets depends on the years of experience: a player with little experience who picked up an SP deck would gain a rather small boost in win-rate, while a player with more experience would gain a bigger boost by playing an SP deck. (4)(5)(6) Furthermore, “exp” is used to create “expsquare”, to check whether the return to experience is decreasing, which would show up as a negative coefficient on “expsquare”.
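The roles of the interaction term and the squared term can be made explicit with a worked equation. As a sketch, assume the transformed win-rate $z$ contains the terms below; the $\beta$ symbols are illustrative labels, not estimates reported in this paper.

```latex
z = \dots + \beta_{exp}\,\mathit{exp} + \beta_{expsq}\,\mathit{exp}^{2}
      + \beta_{sp}\,\mathit{sp} + \beta_{sp\_exp}\,(\mathit{sp}\cdot\mathit{exp}) + \dots

z\big|_{sp=1} - z\big|_{sp=0} = \beta_{sp} + \beta_{sp\_exp}\,\mathit{exp}

\frac{\partial z}{\partial\,\mathit{exp}}
  = \beta_{exp} + 2\beta_{expsq}\,\mathit{exp} + \beta_{sp\_exp}\,\mathit{sp}
```

Under this assumption the SP ‘premium’ grows with experience when $\beta_{sp\_exp} > 0$, and the return to an additional year of experience declines when $\beta_{expsq} < 0$.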

The variable “sp” is a dummy variable which indicates whether a player mostly played a deck containing an SP-engine or not. It measures the effect of playing a deck containing an SP-engine. It is important to note that some players played different decks during the season, so they might not have played a deck containing an SP-engine the entire season. The variable is therefore based on the deck a player played the most during the season. As mentioned above, this variable was used together with “exp” to create an interaction term to test for the ‘premium’ of playing an SP deck.

To test for the effect of having a full-time job, the variable “job” indicates whether the player in question had a full-time job or was studying. It was used to test whether a player’s win-rate is negatively affected by having a full-time job rather than studying. (3)(7)

The “prof” variable captures the effect of being a Pokémon professor. This variable was added to test whether passing the professor exam and achieving the title of Pokémon professor increases a player’s win-rate. The interviews gave the impression that being a Pokémon professor will not affect the win-rate significantly. (8)(9)

“abroad” denotes whether the player has been playing abroad during the season.¹⁰ It measures the effect of playing abroad on a player’s win-rate, and is used to test whether playing abroad during the season has a positive effect. The general opinion gained from the interviews suggests a positive effect, since the player would in this way be exposed to more tournaments, different players, different decks and different strategies. (3) (5) (9)

The dummy variable “family” tells whether a player has other family members playing the game. It was used to test whether having family members who play has any positive effect. Having family playing makes it easy for a player to get a quick game without having to plan much. The player can just wake up in the morning and ask another family member if he or she wants to play Pokémon, which makes it easier to develop and test strategies. (3) (6)

The last dummy, “usa_player”, denotes whether a player comes from the USA. This variable was used to test whether there is any significant difference in level between players from the USA and players from outside the USA.¹¹

¹⁰ For players in the USA, playing in another state counts as playing abroad.

¹¹ Countries included in the list of countries outside the USA are: Australia, Canada, Denmark, England, Finland, Germany, Italy, Mexico, Netherlands, Portugal and Sweden.


Chapter 3. Methods


In this chapter the methods used to obtain the results in this paper are discussed, along with the reasons for the choice of model. The chapter concludes with the ‘main’ regression model used in this paper.

3.1. Probability models

In a situation where the dependent variable is some coding of a qualitative outcome, it does not seem that the familiar type of regression can be applied. However, it is possible to construct models in which each decision is linked to a set of factors. In this way it is still possible to maintain a regression-like approach. This is done by using probability models, with the general structure:

$$\mathrm{Prob}(\text{event } j \text{ occurs}) = \mathrm{Prob}(Y = j) = F[\text{relevant effects}, \text{parameters}] \tag{3.1}$$

The interest of this paper is to see how factors such as age, experience, family and more explain whether a TCG player wins or loses. The information on the factors can be gathered in a vector x, so that the model can be expressed as:

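$$\mathrm{Prob}(Y = 1 \mid \mathbf{x}) = F(\mathbf{x}, \boldsymbol{\beta}) \quad\text{and}\quad \mathrm{Prob}(Y = 0 \mid \mathbf{x}) = 1 - F(\mathbf{x}, \boldsymbol{\beta}) \tag{3.2}$$ (10)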

The set of parameters β reflects the impact of changes in the vector x on the probability. Such a change in x could be that a player started to play a deck containing an SP-engine (assuming the player did not play SP beforehand). It is possible to keep the standard linear regression $F(\mathbf{x}, \boldsymbol{\beta}) = \mathbf{x}'\boldsymbol{\beta}$ and construct a model on that basis. However, this model has been shown to have some complications, and it might not give predictions that look like probabilities. (10) One of the complications is that the error term follows a two-point (Bernoulli-type) distribution: ε is either equal to $-\mathbf{x}'\boldsymbol{\beta}$ or $1 - \mathbf{x}'\boldsymbol{\beta}$, with probabilities $1 - F$ and $F$, respectively. These complications are the reason that the linear regression approach is not used very frequently. The requirement for a model is then that it gives predictions which are consistent with the theory described earlier in (3.1). Following this theory it would be expected that:

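$$\lim_{\mathbf{x}'\boldsymbol{\beta} \to +\infty} \mathrm{Prob}(Y = 1 \mid \mathbf{x}) = 1 \quad\text{and}\quad \lim_{\mathbf{x}'\boldsymbol{\beta} \to -\infty} \mathrm{Prob}(Y = 1 \mid \mathbf{x}) = 0 \tag{3.3}$$ (10)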


In principle, any continuous probability distribution defined over the real line will suffice. The distribution chosen for this paper is the logistic distribution, which is defined as follows:

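$$\mathrm{Prob}(Y = 1 \mid \mathbf{x}) = \frac{e^{\mathbf{x}'\boldsymbol{\beta}}}{1 + e^{\mathbf{x}'\boldsymbol{\beta}}} = \Lambda(\mathbf{x}'\boldsymbol{\beta}) \tag{3.4}$$ (10)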
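The difference between keeping the linear specification and using the logistic distribution can be seen numerically. In the sketch below the index values for x′β are hypothetical and chosen only for illustration; the function names are likewise not part of the original analysis.

```python
import math


def linear_probability(index: float) -> float:
    """Linear specification F(x, b) = x'b: the index itself is the 'probability'."""
    return index


def logistic_probability(index: float) -> float:
    """Logistic specification Lambda(x'b) = e^{x'b} / (1 + e^{x'b})."""
    return math.exp(index) / (1.0 + math.exp(index))


# Hypothetical values of the index x'b, chosen only for illustration.
for index in (-2.0, 0.0, 0.5, 1.5):
    print(index, linear_probability(index), round(logistic_probability(index), 3))
```

For an index of -2.0 or 1.5 the linear specification returns a value outside the interval from 0 to 1, whereas the logistic function always returns a valid probability and approaches 0 and 1 only in the limit.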

The model that arises from this distribution is called the logit model. It is closely related to the probit model, which arises from the normal distribution. The difference between the distributions lies in the tails: the tails of the logistic distribution are heavier than those of the normal distribution. Similar results would therefore be expected for intermediate values of $\mathbf{x}'\boldsymbol{\beta}$ (e.g. between -1.2 and +1.2). Because of its heavier tails, the logistic distribution tends to give larger probabilities to Y=1 than the normal distribution when $\mathbf{x}'\boldsymbol{\beta}$ is extremely small, and smaller probabilities to Y=1 when $\mathbf{x}'\boldsymbol{\beta}$ is very large. (10) On a theoretical basis it is hard to justify which distribution should be used, and in most applications the choice between the two seems to make little difference. In this case the choice of the logit model is based on its mathematical convenience.
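The heavier tails of the logistic distribution can be checked numerically by comparing the two cumulative distribution functions far from zero. This is a minimal sketch using only the standard library; the index values of -4 and +4 are arbitrary illustrations.

```python
import math


def logistic_cdf(index: float) -> float:
    """Standard logistic CDF, Lambda(t) = 1 / (1 + e^{-t})."""
    return 1.0 / (1.0 + math.exp(-index))


def normal_cdf(index: float) -> float:
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(index / math.sqrt(2.0)))


# Prob(Y = 1) under each distribution for a very small and a very large index.
for index in (-4.0, 4.0):
    print(index, logistic_cdf(index), normal_cdf(index))
```

At an index of -4 the logistic CDF lies well above the normal CDF, and at +4 it lies below it, while both equal 0.5 at an index of 0.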

After explaining the choice of the logistic distribution it will be natural to look at how the

logit model is derived and thus how the transformation of the dependent variable became

ln(wr/(1-wr)).

3.2. The logit model

First of all, the data observed for this paper are grouped data. The grouped data were obtained by observing $n_i$ individuals (in this case 84 individuals were observed independently of each other), all of them having the same vector $\mathbf{x}_i$. The dependent variable was then a coding of the qualitative outcome (between 0 and 100). For simplicity it is assumed throughout the rest of this section that the dependent variable denotes whether a player won or lost a game, instead of the win-rate. The idea is the same and extends to the win-rate, since the win-rate denotes the proportion of games won out of all games played. The dependent variable then consists of the proportion $P_i$ of the $n_i$ individuals $ij$ who responded with $y_{ij} = 1$ (a game won). A single observation is then expressed as $[n_i, P_i, \mathbf{x}_i]$, $i = 1, \dots, 84$. In this formulation it is possible to use the familiar regression methods to analyze the relationship between the proportion $P_i$ and the vector of independent variables $\mathbf{x}_i$. The observed $P_i$ can be treated as an estimate of the population quantity $\pi_i = F(\mathbf{x}_i'\boldsymbol{\beta})$. This problem can then be treated as a Bernoulli experiment and written as:

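$$P_i = F(\mathbf{x}_i'\boldsymbol{\beta}) + \varepsilon_i = \pi_i + \varepsilon_i \tag{3.5}$$ (10)

where the expected value and variance of the error term are given by:

$$E[\varepsilon_i] = 0, \qquad \mathrm{Var}[\varepsilon_i] = \frac{\pi_i(1 - \pi_i)}{n_i} \tag{3.6}$$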

As can be seen, the variance depends on $\mathbf{x}_i$, so the error term is heteroscedastic, which suggests that the parameters can be estimated using a weighted least squares regression. However, there is another way to proceed. The function $F(\mathbf{x}_i'\boldsymbol{\beta})$ is strictly monotonic¹²; it is therefore one-to-one and has an inverse. A Taylor series approximation of this inverse function around the point $P_i = \pi_i$ ($\varepsilon_i = 0$) can be considered:

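$$F^{-1}(P_i) = F^{-1}(\pi_i + \varepsilon_i) \approx F^{-1}(\pi_i) + \left[\frac{dF^{-1}(\pi_i)}{d\pi_i}\right](P_i - \pi_i) \tag{3.7}$$ (10)

This expression can then be reduced to:

$$F^{-1}(P_i) \approx \mathbf{x}_i'\boldsymbol{\beta} + \frac{\varepsilon_i}{f(\pi_i)} \tag{3.8}$$ (10)

since

$$F^{-1}(\pi_i) = \mathbf{x}_i'\boldsymbol{\beta} \quad\text{and}\quad \frac{dF^{-1}(\pi_i)}{d\pi_i} = \frac{1}{F'(F^{-1}(\pi_i))} = \frac{1}{f(\pi_i)} \tag{3.9}$$ (10)

Equation (3.8) then produces a heteroscedastic linear regression of the form:

$$F^{-1}(P_i) = z_i = \mathbf{x}_i'\boldsymbol{\beta} + u_i \tag{3.10}$$ (10)

where

$$E[u_i \mid \mathbf{x}_i] = 0 \quad\text{and}\quad \mathrm{Var}[u_i \mid \mathbf{x}_i] = \frac{\pi_i(1 - \pi_i)}{n_i\,[f(\pi_i)]^{2}} \tag{3.11}$$ (10)

This result can now be applied to the logistic model in (3.4). The inverse of the logistic function is:

$$\Lambda^{-1}(\pi_i) = \ln\left(\frac{\pi_i}{1 - \pi_i}\right) \tag{3.12}$$ (10)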

¹² This is true for a probability model. (10)


The above function is known as the logit of $\pi_i$, hence the “logit” model. It has now been shown how the logit function of $\pi_i$ is derived and therefore why it can be used in this paper.
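As a small numerical illustration of (3.11) and (3.12), the sketch below computes the logit of an observed win proportion together with its approximate variance. It relies on the fact that the logistic density is $f = \Lambda(1 - \Lambda)$, so that (3.11) simplifies to $1 / (n_i \pi_i (1 - \pi_i))$. The function name and the two example players are hypothetical and are not taken from the survey data.

```python
import math

def empirical_logit(games_won, games_played):
    """Return (z, var_z): the logit of the observed win proportion and its
    approximate variance 1 / (n * P * (1 - P)) from the Taylor expansion."""
    p = games_won / games_played
    if not 0.0 < p < 1.0:
        raise ValueError("the logit is undefined for proportions of 0 or 1")
    z = math.log(p / (1.0 - p))
    var_z = 1.0 / (games_played * p * (1.0 - p))
    return z, var_z

# Hypothetical players: the same 60 % win-rate observed over 25 and 100 games.
z_small, var_small = empirical_logit(15, 25)
z_large, var_large = empirical_logit(60, 100)
```

Both players have the same logit, but the variance for the player with 25 games is four times that of the player with 100 games, which is the group-size heteroscedasticity the derivation points to.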

3.3. The model

As mentioned earlier, a cross-sectional analysis of 84 observations has been conducted. The logit model was chosen for its mathematical convenience and because a continuous probability distribution, like the logistic, is consistent with the theory from (3.1). Furthermore, since the dependent variable in this paper consists of a qualitative outcome, probability models made it possible to retain the familiar linear regression framework. The dependent variable could then be linked to a list of factors, each with a different impact on the probability of, in this case, a higher win-rate.

When estimating the data using the logit model, some problems arise due to certain types of heteroscedasticity. To account for this, a weighted least-squares logistic regression for grouped data has been used to estimate the data. This type of model is known as a glogit¹³ model, and it accounts for heteroscedasticity caused by differences in the group sizes and by an error term which is Bernoulli distributed. The two models are nevertheless closely related and provide almost the same estimates.
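To make the idea of a weighted least-squares logit for grouped data more concrete, the following minimal sketch regresses the empirical logit on the regressors with weights $n_i P_i (1 - P_i)$, i.e. the inverse of the variance in (3.11) with the observed proportions plugged in. It is an illustration under these assumptions and not Stata's implementation; the function names and the three example players are hypothetical.

```python
import math

def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: regressors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def grouped_logit_wls(games_won, games_played, regressors):
    """Weighted least squares on the empirical logit.

    regressors holds one list of explanatory values per player (without a
    constant); a constant is added as the first column.
    Returns the coefficients [b0, b1, ...].
    """
    rows, z, w = [], [], []
    for gw, gp, x in zip(games_won, games_played, regressors):
        p = gw / gp  # must lie strictly between 0 and 1
        rows.append([1.0] + list(x))
        z.append(math.log(p / (1.0 - p)))
        w.append(gp * p * (1.0 - p))  # inverse of Var[u_i] for the logit
    k = len(rows[0])
    obs = range(len(rows))
    xtwx = [[sum(w[i] * rows[i][r] * rows[i][c] for i in obs)
             for c in range(k)] for r in range(k)]
    xtwz = [sum(w[i] * rows[i][r] * z[i] for i in obs) for r in range(k)]
    return solve_linear_system(xtwx, xtwz)

# Hypothetical data: three players whose odds of winning double with each
# extra unit of the single regressor (odds 1, 2 and 4).
beta = grouped_logit_wls([10, 20, 40], [20, 30, 50], [[0.0], [1.0], [2.0]])
```

Because the example odds lie exactly on a line in logit space, the sketch recovers an intercept of 0 and a slope of ln 2 regardless of the weights; with real data the weights decide how much each player's win-rate counts.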

It is important to note that even though the glogit model is applied to estimate the data and produce the results, all tests performed in this paper are based on the logit model with the same explanatory variables and not on the glogit model. This is because the familiar tests known from OLS cannot readily be used on the transformed glogit model, as these are not standard options in the Stata implementation. Formally, it is possible to derive the tests for the glogit model, but this is beyond the scope and space of the present project. The results from the tests on the logit model can then be used to indicate any possible problems with the glogit model, since the two models produce very similar estimates.

As with any other linear regression model, it is important to know whether the assumptions are violated.¹⁴ Even though the glogit model guards against heteroscedasticity caused by the two

13 The name ‘glogit’ comes from the Stata command of the same name. Stata is the statistical software used to estimate the data in this paper.
14 In this paper the formulation of Hayashi will be used. (11)

cases described above, it is preferable to also test for heteroscedasticity as a function of the data matrix X, so the Breusch-Pagan / Cook-Weisberg test in Stata is applied. The assumption of homoscedasticity states that the conditional second moment, which in general is a nonlinear function of X, is a constant, or in more mathematical terms:

$$E(\varepsilon_i^2 \mid \mathbf{X}) = \sigma^2 > 0 \quad (i = 1, 2, \ldots, n)$$ (10) (3.13)

This assumption is also known as the spherical error variance assumption. If it is violated, the variance is not constant but varies with X. The estimator is then no longer BLUE, and the usual t- and F-tests are no longer valid. However, the estimator is still unbiased. (11)
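The idea behind a Breusch-Pagan type test can be sketched in a few lines: the squared OLS residuals are regressed on the explanatory variable, and $n \cdot R^2$ from this auxiliary regression is compared with a chi-squared distribution. The sketch below is a simplified version with a single regressor and uses the studentized $n \cdot R^2$ form of the statistic; it is not a reproduction of Stata's routine, which may differ in its default form of the statistic, and all function names and data are hypothetical.

```python
def simple_ols(x, y):
    """Intercept and slope of an OLS regression of y on one regressor x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def r_squared(x, y):
    """R-squared of an OLS regression of y on a constant and x."""
    intercept, slope = simple_ols(x, y)
    mean_y = sum(y) / len(y)
    sst = sum((yi - mean_y) ** 2 for yi in y)
    if sst == 0.0:
        return 0.0  # y is constant, so x cannot explain any variation
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return 1.0 - ssr / sst

def breusch_pagan_lm(x, y):
    """Studentized LM statistic: n * R^2 from regressing e^2 on x."""
    intercept, slope = simple_ols(x, y)
    squared_resid = [(yi - intercept - slope * xi) ** 2
                     for xi, yi in zip(x, y)]
    return len(x) * r_squared(x, squared_resid)

# Hypothetical data in which the squared residuals do not vary with x.
lm_flat = breusch_pagan_lm([1, 2, 3, 4], [1, -1, -1, 1])  # 0.0
# Hypothetical data in which the spread of y grows with x.
lm_growing = breusch_pagan_lm([1, 2, 3, 4, 5, 6], [1, 1, 2, 4, 3, 9])
```

With one regressor the statistic would be compared with the chi-squared distribution with one degree of freedom, whose 5 % critical value is about 3.84; values below it do not reject homoscedasticity.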

To check whether the assumption of no multicollinearity is violated, a correlation matrix has been used to inspect the correlation between the different explanatory variables. The idea of this assumption is that the matrix X should have full column rank, i.e. none of the columns of the data matrix X can be written as a linear combination of the other columns of X. The assumption then also automatically implies that there are at least as many observations as regressors.
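The correlation check described above can be sketched as follows: compute the pairwise Pearson correlations and list the pairs whose absolute correlation exceeds a chosen threshold. The threshold of 0.3, the function names and the small sample are illustrative assumptions; the sample is not the survey data.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def flag_correlated_pairs(columns, threshold=0.3):
    """Return (name, name, r) for every pair of columns with |r| > threshold.

    columns maps a variable name to its list of observed values.
    """
    names = list(columns)
    flagged = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson(columns[first], columns[second])
            if abs(r) > threshold:
                flagged.append((first, second, r))
    return flagged

# Hypothetical mini-sample of five players.
sample = {
    "age": [18, 22, 25, 31, 40],
    "job": [0, 0, 1, 1, 1],
    "sp": [1, 0, 1, 0, 1],
}
flagged = flag_correlated_pairs(sample)
```

In this made-up sample only the pair "age" and "job" is flagged. Pairwise correlations cannot detect a column that is a linear combination of several others, so such a check is a screening device rather than a proof of full column rank.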

The most important assumption of the linear regression model is strict exogeneity, which states that the expectation of the error term, conditional on the regressors for all observations, is zero.¹⁵ This can be stated mathematically as:

$$E(\varepsilon_i \mid \mathbf{X}) = 0 \quad (i = 1, 2, \ldots, n)$$ (11) (3.14)

The assumption of strict exogeneity has several useful implications. One of them is that $E(\varepsilon_i) = 0$, i.e. the unconditional mean of the error term is 0.¹⁶ Another implication is that the explanatory variables are orthogonal to the error term for all observations:¹⁷

$$E(x_{jk}\,\varepsilon_i) = 0 \quad (i, j = 1, \ldots, n;\ k = 1, \ldots, K)$$ (11) (3.15)

If this assumption is not satisfied, one or more explanatory variables are said to be endogenous. There are different ways in which an explanatory variable can be endogenous. The case considered in this paper is omitted variable bias. Remember that the error term captures the impact of variables not included in the regression; in this paper these could

15 Some authors define strict exogeneity as $\mathbf{x}_i$ being independent of $\varepsilon_i$.
16 The proof of this is an application of the law of total expectation.
17 The proof of this is an application of the law of iterated expectations.

be variables like “innate ability”, “Pokémon League”¹⁸ and “motivation”. If any such variable is correlated with any of the explanatory variables, the assumption of strict exogeneity does not hold. A closer study of omitted variable bias is provided in appendix C.

No test for endogeneity is applied since it would be rather complicated in the logit model.

Moreover the model does not suggest problems with endogenous variables.

The econometric model is presented below:

$$\mathit{logit\_wr}_i = \beta_0 + \boldsymbol{\beta}\mathbf{X}_i + \boldsymbol{\delta}\,\mathit{Dummies}_i + \varepsilon_i$$

X is a set of independent variables.

Dummies capture different effects such as:

$$sp_i,\ job_i,\ prof_i,\ abroad_i,\ family_i,\ usa\_player_i$$

However, a logit model with identical parameters is used to conduct the tests for each glogit model.

18 Dummy if the player regularly attends a Pokémon League.

Chapter 4. Results

This section presents the results of the different model specifications used in this research. The first model considered includes all the mentioned variables as independent variables. From there, independent variables are removed from the regression in order to find the model that best describes the present data and can answer the hypotheses stated in this paper. The section ends with the final model, describing what has a significant impact on a Pokémon TCG player’s win-rate.

Before presenting the results, table 2 shows the correlation matrix of the variables used in the different specifications, in order to check for any problems with multicollinearity. Correlations above 0.3 between the variables are highlighted in the table. There is a relatively high correlation between “age” and “job”, which is not surprising since an older player is more likely to have a full-time job rather than being a student. The correlation between “exp” and “prof” is also not that surprising, since it is usually players with at least some experience who are able to pass the professor exam. The fact that a player has to be 18 in order to take the exam also contributes to this correlation, and it explains the correlation between “prof” and “age” as well. Moreover, “usa_player” and “abroad” are positively correlated. This might be explained by the fact that the distribution of tournaments in the USA is somewhat different from that in Europe. The USA and Canada have so-called Regional Championships, which are big tournaments that attract players from many states. Only a few Regional Championships are held, and therefore people have to travel across state and country borders in order to attend. Europe, on the other hand, had only one tournament in the 2010/2011 season which could be counted as a Regional Championship.¹⁹ The variables “age” and “sp” are negatively correlated. This indicates that the older a player is, the less likely the player is to play a deck containing an SP-engine.

Finally, it can be concluded that the models are highly unlikely to suffer from multicollinearity, since no variables are highly correlated.

19 The European Challenge Cup held in Arnhem, The Netherlands.

Table 2: Correlation matrix of the variables used in the specifications.

             age      exp      sp       job      prof     abroad   family   usa_player
age          1
exp         -0.1167   1
sp          -0.2519   0.1163   1
job          0.4638  -0.1995  -0.1252   1
prof         0.2642   0.3765   0.0345   0.2224   1
abroad       0.001    0.0096   0.1013  -0.021    0.2041   1
family       0.1573   0.1839   0.0567   0.2096   0.2008   0.0678   1
usa_player   0.0617  -0.0068   0.0716   0.0296   0        0.2525  -0.2158   1

Table 3 shows the results from seven different specifications. All regressions are done with the glogit model, to account for heteroscedasticity caused by differences in the group sizes and by the Bernoulli distributed error term, which is a common problem with probability models. In regression (1), all variables shown in the above correlation matrix were used. However, this specification showed only a few variables with explanatory power. Therefore a specification with only “age”, “exp” and “sp” as independent variables was run in regression (2). In regressions (3) to (7), each of the variables not used in (2) was added in turn to check whether it would add any explanatory power to the model. In every specification a heteroscedasticity test was applied to check for heteroscedasticity as a function of the data matrix. In all the specifications the null hypothesis of homoscedasticity was not rejected, so there is no indication that trust in the model should be reduced by a violation of the spherical error variance assumption. All tests for heteroscedasticity can be seen in appendix D.

All specifications include “age”, “exp” and “sp”. “exp” and “sp” are significant at the 5 % level throughout, and so is “age” except in regressions (1), (3) and (6). The coefficient for “age” is negative, which is in line with the expectations. It is not strongly negative, which is not surprising. The coefficient for “exp”, which captures the effect of one extra year of experience in the game, is positive. This supports the theory well. The strongly positive sign of the “sp” coefficient is also in line with the expectations. It is large compared to the other coefficients; however, this does not seem so surprising when looking at the statements from the interviews. (4) (8) (13) Even though the effect of having a job was insignificant, the negative sign of the coefficient is in line with the expectations. (4) (10) The positive coefficients for “prof” and “abroad” are also as one would expect, even though the positive sign on “prof” could be

discussed. The negative sign on “family” does not meet the expectations. One would have expected this coefficient to be positive (7)(9)(10); however, the estimated coefficient is negative, although insignificant.

Table 3: Estimation results using different model specifications. When using the glogit regression, one must specify the number of positive values and then the total population as the LHS of the regression. gw (games won) is therefore the number of positive values out of the entire number of gp (games played).

             (1)        (2)        (3)        (4)        (5)        (6)        (7)
             gw         gw         gw         gw         gw         gw         gw
age         -0.00937   -0.0117*   -0.00791   -0.0149*   -0.0121*   -0.0101    -0.0117*
             (-1.51)    (-2.18)    (-1.32)    (-2.60)    (-2.25)    (-1.85)    (-2.13)
exp          0.0350*    0.0443**   0.0409**   0.0369*    0.0440**   0.0475***  0.0441**
             (2.36)     (3.34)     (3.05)     (2.62)     (3.34)     (3.54)     (3.30)
sp           0.234*     0.253**    0.247*     0.237*     0.232*     0.261**    0.255**
             (2.47)     (2.68)     (2.64)     (2.53)     (2.44)     (2.77)     (2.64)
job         -0.186                -0.177
             (-1.37)               (-1.39)
prof         0.165                            0.147
             (1.57)                           (1.48)
abroad       0.122                                       0.142
             (1.11)                                      (1.37)
family      -0.111                                                 -0.114
             (-1.11)                                                (-1.23)
usa_player  -0.0624                                                           -0.00253
             (-0.65)                                                           (-0.03)
_cons        0.545**    0.594***   0.565**    0.642***   0.506**    0.586***   0.594***
             (2.97)     (3.50)     (3.33)     (3.76)     (2.81)     (3.48)     (3.46)
N            84         84         84         84         84         84         84

t statistics in parentheses
* p < 0.05, ** p < 0.01, *** p < 0.001

Table 4 presents results for five more glogit specifications. In these specifications the new variables “agesquare” and “expsquare” are added, together with an interaction term testing for a potential premium of playing a deck containing an SP-engine when a player has more experience. These variables are added in turn to see their individual effect on the model. In regression (1) the variable “agesquare” was added to test whether the negative effect of ageing was declining. Furthermore, “prof” was added, since it proved to have a significant impact at the 5 % level. In regression (2) the variable “expsquare” was added to take into account the return to experience. “sp_exp” was added in regression (3) to check for the premium of playing with an SP-engine when a player has more experience. In (4) all three additional variables were added to the model, to see their combined effect. Regression (5) is the final specification, in which “sp_exp” was removed due to insignificance. A test for heteroscedasticity as a function of the data matrix is applied in each specification; the null hypothesis of homoscedasticity was not rejected in any of them, indicating a model which does not violate the spherical error variance assumption.

“agesquare” has a very small positive coefficient in all the specifications in which it was added, which could indicate a small decline in the effect of ageing. The two age coefficients are individually insignificant in the specifications where they are both present; however, they are not dropped, because they are jointly significant at the 5 % level. The coefficient for “expsquare” is negative, though only slightly. This indicates that the return to experience is decreasing. Again the coefficients for “exp” and “expsquare” are mostly individually insignificant, but an F-test shows evidence of joint significance, and they are therefore not dropped from the model. For both “age” and “exp”, adding the squared term makes the individual coefficients insignificant in most cases, while the two coefficients remain jointly significant, as previously stated. In the regressions where the interaction term “sp_exp” was added, it turned out insignificant. The sign of the coefficient is negative, which suggests that the premium of playing an SP-deck becomes smaller with more experience. The negative sign was somewhat expected, since such a deck could be a kind of autopilot deck for relatively new players. (4) On the other hand, a positive sign would not have been surprising either, since this deck is also considered rather complicated and requires skill to play. (7)
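The joint tests mentioned above can be written out as follows, under the assumption that the standard F-test for $q$ linear restrictions in a linear regression is the one applied (the paper does not state the formula). Here $SSR_r$ and $SSR_u$ denote the sums of squared residuals of the restricted model (without the two terms) and the unrestricted model (with both terms), $n$ is the number of observations and $k$ the number of estimated parameters in the unrestricted model; these symbols are introduced here for illustration only.

```latex
F = \frac{(SSR_r - SSR_u)/q}{SSR_u/(n - k)} \;\sim\; F(q,\, n - k)
```

For the two age terms in specification (5) this would mean $q = 2$ and, with $n = 84$ and $k = 7$ estimated parameters, 77 denominator degrees of freedom; the 5 % critical value of $F(2, 77)$ is roughly 3.1.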

Table 4: Estimation results when adding “agesquare”, “expsquare” and the interaction term “sp_exp”.

             (1)        (2)        (3)        (4)        (5)
             gw         gw         gw         gw         gw
age         -0.0586    -0.0145*   -0.0138*   -0.0528    -0.0528
             (-1.99)    (-2.53)    (-2.37)    (-1.78)    (-1.77)
agesquare    0.000626                         0.000567   0.000549
             (1.51)                           (1.35)     (1.31)
exp          0.0356*    0.114      0.0524**   0.121*     0.101
             (2.55)     (1.97)     (2.65)     (2.00)     (1.73)
expsquare              -0.00614              -0.00541   -0.00519
                        (-1.37)               (-1.20)    (-1.15)
sp           0.229*     0.207*     0.388*     0.368*     0.204*
             (2.44)     (2.16)     (2.37)     (2.27)     (2.13)
prof         0.233*     0.149      0.148      0.228*     0.224
             (2.05)     (1.51)     (1.50)     (2.02)     (1.97)
sp_exp                            -0.0297    -0.0327
                                   (-1.12)    (-1.26)
_cons        1.227**    0.486*     0.544**    0.927*     1.023*
             (2.90)     (2.38)     (2.84)     (2.00)     (2.24)
N            84         84         84         84         84

t statistics in parentheses
* p < 0.05, ** p < 0.01, *** p < 0.001

Chapter 5. Discussion

This chapter summarizes the results from the different specifications of the model tested in the last chapter. First, evidence is found that age has a negative effect on a player’s win-rate. All specifications in which “agesquare” was added showed that the negative effect of ageing declines until a player reaches the end of his or her forties. At this point the predicted contribution of age to the win-rate reaches its lowest value; it is a minimum because of the positive sign of “agesquare”. This is in line with the theory and the claims from the interviews. Generally, younger players who have had a few years in Masters do best. These players still have a high motivation for playing the game and becoming a champion. (8) (9) During their first years in Masters they are still learning and have not reached their full potential. (6)(7) Players at that age usually also work less and are going to college, and therefore usually have more free time at hand. (4) When getting older, a player has more responsibilities and the interest starts to fade, and thereby the player’s win-rate decreases.

The regressions have also shown that experience has explanatory power for the win-rate, which supports the basic theory that playing a game for a longer time makes a player better. However, the return to experience is declining, as a consequence of the negative sign of “expsquare”.
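The turning points implied by the quadratic terms can be checked directly: for a term $b_1 x + b_2 x^2$ the marginal effect $b_1 + 2 b_2 x$ is zero at $x = -b_1 / (2 b_2)$. The sketch below applies this to the coefficients of specification (5) in Table 4; the function name is hypothetical, and since the coefficients involved are individually insignificant, the resulting values should be read as rough indications only.

```python
def turning_point(linear_coef, squared_coef):
    """Value of x at which the marginal effect b1 + 2 * b2 * x equals zero."""
    return -linear_coef / (2.0 * squared_coef)

# Coefficients read from specification (5) in Table 4.
age_turning_point = turning_point(-0.0528, 0.000549)
exp_turning_point = turning_point(0.101, -0.00519)
```

With these coefficients the age term is at its lowest at about 48 years, consistent with "the end of his or her forties", and the return to experience is exhausted after just under 10 years.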

In all specifications the variable “sp”, denoting whether a player played a deck containing an SP-engine, showed a significant positive impact on a player’s win-rate. This supports the findings in the interviews, where many reasons were given for SP-decks being so strong. Some pointed out that they were not necessarily the best decks out there, but they were straightforward to play and players already had the cards. (7) (9) The fact that this deck was more or less carried over from the season before made it easy for the players to construct and play it. While many other decks still had to be formed, SP-decks were already formed, and with the release of new cards a player could often just grab one or two new cards and the deck was ready to go. Since this made a huge number of players play the deck, it was obvious that it would also take a lot of the top spots at tournaments. (7) Others also considered these decks to be a kind of autopilot deck, so even a relatively new player could play them. On the other hand, while most other decks were limited to one or two strategies, SP-decks offered many different strategies, so players with more experience could shift strategy from game to game. (4) (8) Some also viewed the decks as rather complicated and therefore time-consuming to learn, due to the

number of different strategies a player could use. These decks therefore required some skill from the player in order to generate a good win-rate. (5) (6) Furthermore, no evidence is found of a ‘premium’ from playing SP-decks with more experience.
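To give a feeling for the size of the “sp” effect, the logit index can be converted back into a predicted win-rate with the logistic function. The sketch below uses the coefficients of specification (5) in Table 4; the player profile (20 years old, 5 years of experience, not a professor) is a hypothetical example, and the function and variable names are assumptions made for illustration.

```python
import math

# Coefficients read from specification (5) in Table 4.
FINAL_MODEL = {
    "_cons": 1.023, "age": -0.0528, "agesquare": 0.000549,
    "exp": 0.101, "expsquare": -0.00519, "sp": 0.204, "prof": 0.224,
}

def predicted_win_rate(age, exp, sp, prof, coef=FINAL_MODEL):
    """Logistic transformation of the estimated logit index."""
    index = (coef["_cons"]
             + coef["age"] * age + coef["agesquare"] * age ** 2
             + coef["exp"] * exp + coef["expsquare"] * exp ** 2
             + coef["sp"] * sp + coef["prof"] * prof)
    return 1.0 / (1.0 + math.exp(-index))

# Hypothetical player: 20 years old, 5 years of experience, not a professor.
with_sp = predicted_win_rate(age=20, exp=5, sp=1, prof=0)
without_sp = predicted_win_rate(age=20, exp=5, sp=0, prof=0)
odds_ratio_sp = math.exp(FINAL_MODEL["sp"])
```

For this hypothetical player the predicted win-rate is about 68 % with an SP-deck and about 64 % without, corresponding to odds of winning that are roughly 1.23 times higher; the exact difference depends on the values chosen for the other variables.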

The results do not support the hypothesis that a player who has other family members playing will have a higher win-rate. The coefficient is negative, but insignificant. The interviews suggest an increased win-rate if a player has other family members playing, since the support will be higher. There is easier access to resources, since the family is then more likely to spend money on the game, and such families also tend to travel more to tournaments and thereby get to play more games. (5) (12) The fact that a player can wake up in the morning and play straight away with another family member also helps. When a player can only play in the Pokémon league once a week or with friends, the player often gets to play fewer games than if other family members played. (6) On the other hand, the interviews also pointed out that it was mainly younger players who benefited from having family members playing. This could very well explain the negative finding, since only Masters players are considered in this paper. Typically, having family members playing is helpful in the beginning, when getting into the game, and for younger players. (7) (12)

The theory that a full-time job negatively affects a player’s win-rate is supported by the sign of the coefficient; however, the connection is not significant. When a player is no longer studying and has a regular full-time job, that player typically has less time to play than before. A job is a bigger responsibility and you have to perform in order not to lose it, whereas a student can slack off a bit and still pass. (4) (6) Also, when studying, the player’s brain is used to being occupied with many different things and to thinking hard, which can help when playing the game. (5) (8)

The results reveal a positive connection between being a Pokémon professor and the win-rate; however, the connection is not strong, since it is significant only at a level between 5 and 10 %. This small positive connection is not surprising, since some have pointed out that there might be a small difference. This is because the player would know exactly what his or her cards do and what the different penalties are for making a mistake during the game. (4) (6) On the other hand, most would have expected no connection between the professor status and the win-rate. This is because many judges and professors do not necessarily know all the strategies and deck building, even though they know the rules very well. The professor status is also not only about knowing the

rulings, but also about how to handle and help players, how to judge games and knowing the

other mechanics of the game. (5) (7) (8)

The results support the theory that playing abroad increases a player’s win-rate. However, the connection is not significant. When playing abroad, a player gets more exposure to different decks and play styles. A player might see a card combination he or she would not have thought of and thereby get a broader view of the game. (4) (5) (9) Players who play abroad also get more used to handling stress at big tournaments, and they find out where their strategy is good and where it is weak. (7) (8) Furthermore, when a player chooses to play abroad, he or she often already has a big commitment to the game. (6) One could also argue that it is a case of the chicken and the egg: which came first? Are they better players because they play so much, or do they play so much because they win? Either way, both contribute to a higher win-rate. (12)

Lastly, no evidence is found that being from the USA affects a player’s win-rate, which is in line with the expectations. The coefficient is slightly negative, indicating a negative relationship; however, it is highly insignificant.

Chapter 6. Conclusions

When a person plays a game, whether it is Pokémon TCG or another game, that person will often ask how to achieve the most possible wins. The aim of this research was to look at which factors play an important role when playing Pokémon TCG, so that players can get a deeper understanding of what would make them better players and help them achieve the highest possible win-rate. Analyzing a game like Pokémon TCG is not easy due to the regular increases and decreases in the card pool; however, understanding the general factors that affect a player’s win-rate is an important step towards becoming a better player. For the 2010/2011 season, data were collected through surveys and interviews, and together with techniques known from econometric theory a picture of what affects a player’s win-rate could be drawn.

From the research it can be concluded that ageing has a negative effect on a player’s win-rate; however, the effect does not prove to be strong. The effect is also declining and will be at a minimum when a player reaches the end of his or her forties.

Experience is found to have a positive effect in every specification, which supports the stated hypothesis. Yet the return to experience is decreasing, so the marginal effect of one extra year of experience gets lower.

Playing with an SP-engine is shown to have a strong positive impact on a player’s win-rate. It can then be concluded that a player who mainly played decks containing an SP-engine would have a significantly higher win-rate than players who did not. However, no ‘premium’ of playing with an SP-engine with more experience could be shown.

It is also possible to conclude that having a regular full-time job and playing abroad during the season had the expected signs, whereas having family members playing had a negative sign, contrary to the expectation. None of them proved to be significant, resulting in a rejection of all the corresponding hypotheses.

The Pokémon professor status turned out to have a positive impact on the win-rate, giving a player who is a Pokémon professor a slightly higher win-rate. Since this result is only significant at a level between 5 and 10 %, the hypothesis is only partly supported. There is a positive effect; however, it is rather small.

Last but not least, no effect of being a player from the USA has been found, which is consistent with the stated hypothesis, and it is therefore not rejected.

Chapter 7. Future work

The results obtained during this research showed which factors had a significant impact on a Pokémon TCG player’s win-rate in the 2010-2011 season. They showed that techniques known from econometrics are appropriate for describing a Pokémon TCG player’s win-rate. The model described in this paper can also be used for future seasons, although not with exactly the same choice of variables, since some of them will not be relevant.

Further extensions to the model can also be made. It could be interesting to check for other effects on the win-rate, such as whether a player played regularly in a Pokémon league, the motivation of a player or a player’s innate ability. However, a variable like “innate ability” would be hard to measure, and a proxy variable would have to be used, such as a variable reflecting a score on an IQ test. Caution should be taken when adding such a variable, since problems with endogeneity can arise.

Regarding the data gathering process in general, a larger sample could be collected in

future studies in order to obtain more precise estimates.

References

1. Play Pokémon. The Official Pokémon Website. www.pokemon.com. [Online] 25 April 2011. [Cited: 4 September 2011.] http://www.pokemon.com/us/news/op_bw_modifiedformat-2011-04-25/.

2. Bragonier, Danny. Statistical Analysis of Texas Holdem Poker. California State Polytechnic University. [Online] Spring 2010. [Cited: 23 August 2011.] http://digitalcommons.calpoly.edu/cgi/viewcontent.cgi?article=1006&context=statsp&sei-redir=1#search=%22Statistical%20Analysis%20Texas%20Holdem%20Poker%20California%20State%20Polytechnic%20University%22.

3. Nguyen, Duy. Regression Analysis on Poker Models. Texas Christian University. [Online] 12 April 2010. [Cited: 3 November 2011.] http://wwwstu.tcu.edu/duynguyen/Regression%20Analysis%20on%20Poker%20Models.pdf.

4. Craig, Heidi. Interview #1. [interv.] Steffen Eriksen. 11 August 2011.

5. Nelson, David. Interview #4. [interv.] Steffen Eriksen. 11 August 2011.

6. Wittenkeller, Josh. Interview #5. [interv.] Steffen Eriksen. 11 August 2011.

7. Rountree, Ives. Interview #2. [interv.] Steffen Eriksen. 11 August 2011.

8. Ceolin, Andrea. Interview #7. [interv.] Steffen Eriksen. 12 August 2011.

9. Sucevich, Kyle. Interview #8. [interv.] Steffen Eriksen. 14 August 2011.

10. Greene, William H. Econometric Analysis, 6th Edition. s.l. : Prentice Hall, 2007.

11. Hayashi, Fumio. Econometrics. s.l. : Princeton University Press, 2000.

12. Kamada-Fujii, Doreen. Interview #6. [interv.] Steffen Eriksen. 11 August 2011.

13. Pokémon Organized Play. The Official Pokémon Website. www.pokemon.com. [Online] [Cited: August 23, 2011.] http://www.pokemon.com/us/organized-play/tournaments/rules/.

14. PokéBeach. Pokémon Card Search: Rising Rivals. www.pokébeach.com. [Online] [Cited: 13 November 2011.] http://pokebeach.com/tcg/rising-rivals/scans.

15. PokéBeach. Pokémon Card Search: Platinum. www.pokébeach.com. [Online] [Cited: 13 November 2011.] http://pokebeach.com/tcg/platinum/scans.


Appendix A – Data collection

1. Article on www.Sixprizes.com

Below a copy of the article published on www.sixprizes.com is given. For seeing the original

article visit: http://www.sixprizes.com/uncategorized/dane-bachelor-project-pokemon-tcg/

Dane Bachelor Project on the Pokemon TCG – Need Your Help!
Written by Steffen Eriksen | June 4, 2011

Hello everyone!

Since this is my first article here on SixPrizes I will start off by introducing myself. My name is Steffen Eriksen and I am currently doing my bachelor in Mathematic-Economics in Denmark. I have been playing Pokémon TCG for around 6 years and have played tournaments in many countries.

The article is about the data collection process for my bachelor, which I hope you will be a part of.

Some time ago I got an idea about a so-called “win-rate” model for the Pokémon TCG, so I asked my university if I could write my bachelor project about this model and I got a yes! So now I have this huge opportunity to write my project about my hobby, which is really cool!

To explain a little more about this win-rate model: it is a model where I try to explain what will affect a Pokémon player’s win-rate. What I mean when I say win-rate is the percentage of your premier rated games that you have won.

The methods I will use to estimate such a model is generally different types of regression models, which is borrowed from econometrics.

This model I can then use to answer some hypothesis I will state in the beginning of my paper. Such a hypothesis could be: Will playing the game for a longer time give you a higher win-rate?

So I will with this model maybe be able to answer some pretty interesting hypotheses about the game.

Before I can even start thinking about setting up such a model I need data from Pokémon players. Unfortunately I cannot just use data from every player out there. I have to make some certain conditions in order to make my model valid. I will state these 3 conditions and if you do not fulfill all 3 of the conditions below I will not be able to use your contribution.

Condition 1 – Masters Division Only

I have chosen only to make this model for players in the Masters division. This is done because in different age divisions there might be different decks which do well. For example, in the Junior division a “Speed Jumpluff” deck might do well.

However, I do not think that such a deck will do well in the Masters division. Such a sample would therefore give a misleading picture in my model and maybe show that the highest win-rate is achieved with a “Speed Jumpluff” deck, when in reality it is not. (Sorry to all Jumpluff fans out there).

Condition 2 – Results from This Season Only – Prior to B/W

I am only working with the current season. So all questions below assume the current season. It is then your tournament record from the 1st of September until now that counts.

If you have already played with B&W and the new rules (e.g. in the USA) then it is your tournament record from Regionals and back that counts.

The reason I will not take B&W into account is that it will have too big an impact on the decks being played and will also increase the luck factor even more.

Condition 3 – You must have played at least 25 Premier Rated games

You must have played at least 25 premier rated games this season. The reason is that a low number of games played gives a more unstable win-rate: a single loss will lower your win-rate too much when your number of games played is below 25.

Then you might ask yourself: why 25 and not 30, 35, or 42 (which is the answer to everything)? I just have to pick a suitable number. Picking too low a number will give unstable win-rates. Picking too high a number will result in too few samples.

So after stating these 3 conditions I can now go on to the actual questions. All questions are simple and can be answered right away, except for the last 2, which require you to log onto your Pokémon account on Pokemon.com, unless of course you can remember by heart how many games you have played this season and how many you have won.

You also have to answer all the questions; otherwise your contribution will not be valid.

Here are the questions:

1. What country are you from?
2. How old are you?
3. How many years have you played the game? (approximately)
4. What type of deck have you used the most this season? (Examples: LuxChomp, BlazeChomp, VileGar, LostGar, DialgaChomp, MagneRock, Gyarados, MewDos, etc.)
5. Do you have a full-time job? (Full-time student does not count as a job for this question.)
6. Are you a Pokémon Professor?
7. Have you played a Premier Rated tournament abroad this season? (For players in the US, I count playing in a different state as playing abroad.)
8. Do you have family members who also play the Pokemon TCG?
9. How many Premier Rated games have you played this season? (You can check on your My Pokemon account.)
10. How many of those games have you won? (You can check this on your My Pokemon account.)

E-mail your answers to: [email protected]

Or simply fill out my survey here: http://www.surveymonkey.com/s/8GWP8F9

Thanks so much for your help!

Image Credits: S.S. Anne Pokemon, PokeBeach, Pokegym, and Pokemon Paradijs

2. Interview Questions

First the person who was interviewed was asked to talk about his or her experience with the game. They were then asked to answer the following questions:

1. You see a lot of boys playing this game. Why do you think that so few girls play this game?

2. Do you think that having other family members who play helps a player achieve a higher win-rate?

3. When you look at all the Masters players, at what age do you think a player will peak, and why?

4. Do you think it makes a big difference to your win-rate whether you are studying or have a full-time job?

5. Do you think a person who is a Pokémon Professor performs better than a player who is not a Pokémon Professor?

6. Looking more at the decks played in the past season: do you think people who played a deck containing a so-called SP-engine performed better?

7. Can you point out a card or two that you think really made a difference if a player decided to play that card in his or her deck?

8. A lot of players travel across country borders, and in the case of the USA state borders, to play tournaments. Do you think that a player who does so ends up with a better win-rate?

3. Hand out at Dutch Nationals and Danish Nationals

Dear Pokémon player

- Are you playing in the master division?
- Have you played +25 premier rated matches this season?

If so, then please help a fellow Pokémon player with his bachelor project by filling out a survey on: http://www.surveymonkey.com/s/8GWP8F9

Appendix B – Scan of cards contained in a SP-engine

Below, scans of the cards found in a typical SP-engine are presented:

Figure 1: Card scan of “Cyrus’s Conspiracy” (13)

Figure 2: Card scan of “SP Radar” (14)

Figure 3: Card scan of “Power Spray” (13)

Figure 4: Card scan of “Energy Gain” (13)

Figure 5: Card scan of “Poké Turn” (13)

Appendix C – Omitted variable bias

To explain more about the consequences of omitting important variables, suppose that logit_wr is determined by:

$\mathit{logit\_wr}_i = \beta_0 + \beta_1\,\mathit{age}_i + \beta_2\,\mathit{exp}_i + \varepsilon_i$   (C.1)

where $E(\varepsilon_i \mid \mathit{age}_i, \mathit{exp}_i) = 0$.

Suppose that exp (experience) was not observed and the following model was estimated instead:

$\mathit{logit\_wr}_i = \beta_0 + \beta_1\,\mathit{age}_i + v_i$   (C.2)

where $v_i = \beta_2\,\mathit{exp}_i + \varepsilon_i$.

Notice that the correlation between the variable age and the error term is no longer zero, i.e.:

$E(v_i \mid \mathit{age}_i) = E(\beta_2\,\mathit{exp}_i + \varepsilon_i \mid \mathit{age}_i) = \beta_2\,E(\mathit{exp}_i \mid \mathit{age}_i) \neq 0$

This is because exp and age are positively correlated ($E(\mathit{exp}_i \mid \mathit{age}_i) > 0$). In the case of (C.2) the assumption of strict exogeneity is violated.

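The bias of the omitted variable can be described in the following way. Let:

$\tilde{\beta}_1$ be the estimator of $\beta_1$ from a simple regression of logit_wr on age (see equation (C.2)).

$y_i = \mathit{logit\_wr}_i$, $x_{1i} = \mathit{age}_i$ and $x_{2i} = \mathit{exp}_i$.

Model (C.1) can then be rewritten as follows:

$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i$   (C.3)

It can then be shown that:

$E(\tilde{\beta}_1 \mid x_1, x_2) = \beta_1 + \beta_2 \tilde{\delta}_1$   (C.4)

where $\tilde{\delta}_1$ is the estimate of $\delta_1$ from the following regression:

$x_{2i} = \delta_0 + \delta_1 x_{1i} + \varsigma_i$   (C.5)

Equation (C.4) implies that the omitted variable bias is equal to:

$\mathrm{bias}(\tilde{\beta}_1 \mid x_1, x_2) = E(\tilde{\beta}_1 \mid x_1, x_2) - \beta_1 = \beta_2 \tilde{\delta}_1$   (C.6)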
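The step from (C.3) to (C.4) is only stated above, so a short sketch of the standard argument may help. It assumes the usual simple-regression slope formula, writes $\tilde{\beta}_1$ for the slope from regressing $y$ on $x_1$ alone and $\tilde{\delta}_1$ for the slope from regression (C.5), and introduces $\bar{x}_1$ (the sample mean of $x_{1i}$) as extra notation that does not appear in the text.

```latex
% Sketch of the step from (C.3) to (C.4); \bar{x}_1 is the sample mean of x_{1i}.
\begin{align*}
\tilde{\beta}_1
  &= \frac{\sum_i (x_{1i}-\bar{x}_1)\, y_i}{\sum_i (x_{1i}-\bar{x}_1)^2} \\
  &= \beta_1
   + \beta_2\,\frac{\sum_i (x_{1i}-\bar{x}_1)\, x_{2i}}{\sum_i (x_{1i}-\bar{x}_1)^2}
   + \frac{\sum_i (x_{1i}-\bar{x}_1)\, \varepsilon_i}{\sum_i (x_{1i}-\bar{x}_1)^2} \\
  &= \beta_1 + \beta_2\,\tilde{\delta}_1
   + \frac{\sum_i (x_{1i}-\bar{x}_1)\, \varepsilon_i}{\sum_i (x_{1i}-\bar{x}_1)^2}
\end{align*}
% Taking expectations conditional on x_1 and x_2, and using
% E(\varepsilon_i | x_1, x_2) = 0, removes the last term and gives (C.4).
```

The second line substitutes (C.3) for $y_i$; the intercept drops out because the deviations $x_{1i}-\bar{x}_1$ sum to zero.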

From equation (C.6), there are two cases in which the estimator $\tilde{\beta}_1$ is unbiased:

- $\beta_2 = 0$: in this example, experience has no impact on the win-rate.
- $\tilde{\delta}_1 = 0$: this is equivalent to the variable $x_1$ (age) being uncorrelated with the omitted variable $x_2$ (experience).

In the example stated, it is unlikely that either of those conditions would hold. One would expect that $\beta_2 > 0$ and $\tilde{\delta}_1 > 0$ (positive correlation between age and experience).

According to equation (C.2) and the bias formula presented in (C.6), the estimator $\tilde{\beta}_1$ is biased upwards because:

$E(v_i \mid \mathit{age}_i) = E(\beta_2\,\mathit{exp}_i + \varepsilon_i \mid \mathit{age}_i) = \beta_2\,E(\mathit{exp}_i \mid \mathit{age}_i) > 0$   (C.7)

This problem can be partly solved with the use of panel data. However, using panel data is not possible here because there is a format change at the end of each season, as argued earlier in the introduction of this paper.
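The upward bias described in this appendix can be illustrated with a small simulation. The following is a minimal sketch on simulated data, not on the survey data: the coefficient values, the age range and the noise levels are all hypothetical choices made only for illustration, and `ols_slope` is a helper written for this example.

```python
import random

def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with an intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

random.seed(42)
n = 5000

# Hypothetical "true" coefficients of model (C.1); not estimates from the paper.
beta0, beta1, beta2 = -0.5, 0.02, 0.05

# Simulated regressors: experience rises with age, so the two are correlated.
age = [random.uniform(15, 40) for _ in range(n)]
experience = [0.3 * a + random.gauss(0, 2) for a in age]
logit_wr = [
    beta0 + beta1 * a + beta2 * e + random.gauss(0, 0.5)
    for a, e in zip(age, experience)
]

# Short regression (C.2): experience is omitted.
beta1_short = ols_slope(age, logit_wr)

# Auxiliary regression (C.5): experience on age.
delta1 = ols_slope(age, experience)

# Value of the short-regression slope implied by (C.4).
predicted = beta1 + beta2 * delta1

print("true beta1:             ", beta1)
print("short-regression slope: ", round(beta1_short, 4))
print("beta1 + beta2 * delta1: ", round(predicted, 4))
print("implied bias (C.6):     ", round(beta2 * delta1, 4))
```

In this setup the slope from the short regression lies above the true coefficient on age and close to the value implied by (C.4), which is the pattern the bias formula describes.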

Appendix D – Heteroscedasticity tests

This appendix presents the results of the heteroskedasticity tests performed on all the specifications used in this paper. The test applied is the Breusch-Pagan / Cook-Weisberg test for heteroskedasticity, with the null hypothesis of constant variance.
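To give an idea of how such a test works, the sketch below implements a simplified version for a regression with a single regressor. It is an illustration under stated assumptions, not the routine used for Table 5: it uses the n·R² (Koenker) form of the statistic, which may differ from the statistic the statistical software reports, and it runs on simulated data. The helper names `ols_fit`, `r_squared` and `breusch_pagan_pvalue` are hypothetical.

```python
import math
import random

def ols_fit(x, y):
    """Return (intercept, slope) of a simple OLS regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def r_squared(x, y):
    """R-squared of a simple OLS regression of y on x."""
    intercept, slope = ols_fit(x, y)
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def breusch_pagan_pvalue(x, y):
    """p-value of the n*R^2 test: regress squared residuals on x."""
    intercept, slope = ols_fit(x, y)
    sq_resid = [(yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y)]
    # Guard against a tiny negative R^2 caused by rounding.
    lm = max(0.0, len(x) * r_squared(x, sq_resid))
    # Upper tail of a chi-squared distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(lm / 2.0))

random.seed(1)
n = 500
x = [random.uniform(1, 10) for _ in range(n)]

# Constant error variance (the null hypothesis holds).
y_hom = [1 + 2 * xi + random.gauss(0, 1) for xi in x]
# Error standard deviation proportional to x (the null hypothesis fails).
y_het = [1 + 2 * xi + xi * random.gauss(0, 1) for xi in x]

p_hom = breusch_pagan_pvalue(x, y_hom)
p_het = breusch_pagan_pvalue(x, y_het)
print("p-value, constant variance:  ", round(p_hom, 4))
print("p-value, variance rises in x:", p_het)
```

A p-value above 0.05, as reported for every specification in this appendix, means the null hypothesis of constant variance is not rejected at the 5% level.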

Table 5: Results of the heteroscedasticity tests performed on all specifications in this paper.

Columns: Specification; result of the Breusch-Pagan / Cook-Weisberg test for heteroskedasticity (prob > chi2); conclusion of the Breusch-Pagan / Cook-Weisberg test (null hypothesis: constant variance).

Specification             Prob > chi2   Conclusion
Table 3: Regression (1)   0.1222        Fail to reject the null hypothesis at the 5% level
Table 3: Regression (2)   0.5557        Fail to reject the null hypothesis at the 5% level
Table 3: Regression (3)   0.3345        Fail to reject the null hypothesis at the 5% level
Table 3: Regression (4)   0.4037        Fail to reject the null hypothesis at the 5% level
Table 3: Regression (5)   0.5535        Fail to reject the null hypothesis at the 5% level
Table 3: Regression (6)   0.3373        Fail to reject the null hypothesis at the 5% level
Table 3: Regression (7)   0.6818        Fail to reject the null hypothesis at the 5% level
Table 4: Regression (1)   0.3254        Fail to reject the null hypothesis at the 5% level
Table 4: Regression (2)   0.5480        Fail to reject the null hypothesis at the 5% level
Table 4: Regression (3)   0.3667        Fail to reject the null hypothesis at the 5% level
Table 4: Regression (4)   0.3928        Fail to reject the null hypothesis at the 5% level
Table 4: Regression (5)   0.4572        Fail to reject the null hypothesis at the 5% level