stochastic modeling of irrationality in normal-form games


Stochastic Modeling of Irrationality in Normal-Form Games

Ben Zod

March 31, 2016

Abstract

Richard McKelvey and Thomas Palfrey introduced a statistical generalization of the Nash equilibrium solution concept that accounts for irrationality or lack of information among players of a game. Their model, quantal response, clouds an action's perceived utility with statistical noise, resulting in continuous movement of equilibria as payoffs change, and often describes empirical results with greater accuracy than does Nash's solution concept. This paper examines this model and its origins, and finds a weakness caused by its stochastic specification. Quantal response equilibria exhibit inflexibility when adapting to the appearance of an additional strategy, and can be manipulated to yield counter-intuitive results.

1 Introduction

Before McKelvey and Palfrey (1995) introduced the quantal response model for normal-form games, the field of game theory focused on the decision-making of intelligent and rational subjects. The field examined how perfectly rational players would act in situations involving conflict, cooperation, and strategy. McKelvey and Palfrey relax the assumptions of intelligence and rationality, and apply a stochastic model to normal-form games.

Traditional game theory, pioneered by John Nash, defines equilibrium in a game as a situation in which each player's strategy is the best response to every other player's strategy. That is, after the outcome is revealed, no player could have done better by changing strategies, given that every other player's strategy remains constant. In normal-form games, this solution concept results in pure strategy equilibria, in which players always play one of their strategy choices, and mixed strategy equilibria, in which players mix between their available options according to some probability distribution. Nash's solution concept shows exactly how perfectly intelligent and rational decision makers act. However, humans often have varying levels of information, and vary in their rationality. For this reason, empirical data on humans playing games often do not align well with the Nash equilibrium predictions, as shown in [8].

The assumption of perfectly rational and knowledgeable players is not one that fits well with human players. For example, imagine a subject is asked to participate in an experiment in which she is provided two light bulbs of differing brightness, and is asked to write down which bulb is brighter. The equivalent of the Nash solution concept in this experiment would be that the subject always knows which bulb is brighter, and always writes down the brighter bulb, regardless of how close in brightness they are. However, this is not something we expect from a human subject. If one light bulb is much brighter than the other, we expect the subject to answer correctly close to 100% of the time. However, as the light bulbs get closer and closer in brightness, we expect the subject to be more likely to make a mistake. That is, as the difference in brightness between the two bulbs decreases, the probability of the subject answering correctly also decreases. Thus, instead of "best responding," the subject is making an educated guess based on the information provided. This can be thought of as "better responding."

Quantal response equilibria attempt to model this behavior of "better responding" by making the probability that a player plays an action an increasing function of the difference between the expected utility of that action and that of every other possible action. In the light bulb example, this means that the probability that the subject answers that bulb one is brighter increases with the difference between the true brightness of bulb one and that of bulb two. If bulb one is much brighter than bulb two, the subject answers bulb one with probability close to 1. If bulb one is just a small amount brighter than bulb two, the subject answers bulb one with probability close to 0.5.¹

This paper will first explore individual choice models, closely examining Duncan Luce's (1959) choice axiom, its results, and its limitations. The logic applied to individual choice decisions will then be expanded to the realm of game theory, where McKelvey and Palfrey's quantal response equilibrium concept will be more carefully defined, calculated, and analyzed. Following this, I will delve into the limitations of the quantal response model, the situations in which it succeeds and fails, and the areas in which it can improve as a model that predicts the behavior of subjects with varying levels of rationality.

¹ The light bulb example is an experiment of individual choice. Although the quantal response model is designed and used for games of more than one player, the individual choice example is presented for simplicity and to convey the core concept.

2 Individual Choice

2.1 Luce’s Choice Model

All theorems and lemmas in this section are attributed to Duncan Luce (1959). For further information and proofs of the results below, see [6].

Throughout this section we will suppose that a universal set $U$ is given, which should be interpreted as the universe of possible alternatives. The decision maker must be able to evaluate the elements of $U$ according to some preference specification and select elements from certain subsets of $U$. Now, let $T$ be a finite subset of $U$, and suppose that an element must be chosen from $T$. If $S$ is a subset of $T$ ($S \subset T$), let $P_T(S)$ denote the probability that the selected element lies in $S$. If $x$ is an element of $T$ ($x \in T$), let $P_T(x)$ denote the probability that the selected element is $x$.

With notation defined, we can specify the three probability axioms that form the foundation of any probabilistic study.

The three axioms of probability

(i) For $S \subset T$, $0 \le P_T(S) \le 1$.

(ii) $P_T(T) = 1$.

(iii) If $R, S \subset T$ and $R \cap S = \emptyset$, then $P_T(R \cup S) = P_T(R) + P_T(S)$.

Note: If we repeatedly apply part (iii), we see that

$$P_T(S) = \sum_{x \in S} P_T(x);$$

therefore, it is always sufficient to state results just for $P_T(x)$.

Moving forward, we will be using probabilities that a certain element of a set is selected over all other options in that set. For example, the probability that $x$ is chosen from the menu $\{x, y\}$ is formally written as $P_{\{x,y\}}(x)$. For convenience, from here on this probability will be written as $P(x, y)$, assuming $x \neq y$. Thus, $P(x, y) > \frac{1}{2}$ when $x$ is preferred to $y$, and it is always true that $P(x, y) = 1 - P(y, x)$ and $P(x, x) = \frac{1}{2}$.

The probability axioms establish constraints on the measures $P_T$, but there are no assumed connections among the several measures. Luce suspected that, at least for choices, complete independence among the several measures would be a naive assumption. His proposed relationship is referred to as Luce's choice axiom, and is the foundation for this section.

Axiom 2.1. Let $T$ be a finite subset of $U$ such that, for every $S \subset T$, $P_S$ is defined.

(i) If $P(x, y) \neq 0, 1$ for all $x, y \in T$, then for $R \subset S \subset T$, $P_T(R) = P_S(R) P_T(S)$;

(ii) If $P(x, y) = 0$ for some $x, y \in T$, then for every $S \subset T$, $P_T(S) = P_{T - \{x\}}(S - \{x\})$.

Interpretation. Part (i) of the axiom states that the probability that the selected element of the menu $T$ lies in $R$ is exactly the probability that the selected element of the menu $S$ lies in $R$, multiplied by the probability that the selected item is in the menu $S$ at all, where $T$ contains $S$, which in turn contains $R$. Part (ii) essentially says that if the decision maker would never choose the element $x$ over $y$, where $x$ and $y$ are elements of $T$, then the probability that the selected item of the menu $T$ lies in $S$ is the same as if the element $x$ did not exist in $T$ or $S$.

Lemma 2.2. If $P(x, y) \neq 0, 1$ for all $x, y \in T$, then axiom 1 implies that for any $S \subset T$ such that $x, y \in S$,

$$\frac{P(x, y)}{P(y, x)} = \frac{P_S(x)}{P_S(y)}.$$

The importance of this lemma is that it implies that when axiom 1 holds for $T$ and its subsets, the ratio $P_S(x)/P_S(y)$ is independent of $S$.

Luce's first theorem formally establishes that, assuming axiom 1 holds, all the probabilities are determined by the pairwise probabilities.

Theorem 2.3. If axiom 1 holds for $T$ and if $P(x, y) \neq 0, 1$ for all $x, y \in T$, then

$$P_T(x) = \frac{1}{\sum_{y \in T} \frac{P(y, x)}{P(x, y)}} = \frac{1}{1 + \sum_{y \in T - \{x\}} \frac{P(y, x)}{P(x, y)}}.$$

His next theorem shows that axiom 1 also demands that the pairwise probabilities meet certain constraints.

Theorem 2.4. If axiom 1 holds for $\{x, y, z\}$ and if none of the pairwise discriminations is perfect², then

$$P(x, y) P(y, z) P(z, x) = P(x, z) P(z, y) P(y, x).$$

Corollary 2.5. Under the conditions of the theorem,

$$P(x, z) = \frac{P(x, y) P(y, z)}{P(x, y) P(y, z) + P(z, y) P(y, x)}.$$

If each of $\{x, y\}$, $\{y, z\}$, and $\{x, z\}$ is offered to a subject just once, his choices are governed by the given probabilities, and they are statistically independent, then $P(x, y) P(y, z) P(z, x)$ is exactly the probability that his choices imply the intransitivity $x > y > z > x$. Among other things that will become apparent later, Theorem 2.4 ensures that, if axiom 1 holds, this probability is the same as the probability of $x > z > y > x$.

In what follows, Luce shows that for situations in which pairwise choice discrimination is imperfect, axiom 1 implies the existence of a ratio scale that is unique except for its unit and is independent of any structural assumptions on the set of alternatives. This theorem lays the groundwork for the quantal response probabilities, discussed in the next section.

Theorem 2.6. Suppose that $T$ is a finite subset of $U$, that $P(x, y) \neq 0, 1$ for all $x, y \in T$, and that axiom 1 holds for $T$ and its subsets. Then there exists a positive real-valued function $v$ on $T$, which is unique up to multiplication by a positive constant, such that for every $S \subset T$

$$P_S(x) = \frac{v(x)}{\sum_{y \in S} v(y)}.$$

Proof. Define $v(x) = k P_T(x)$, where $k > 0$; then by part (i) of axiom 1 and part (iii) of the probability axioms, we have

$$P_S(x) = \frac{P_T(x)}{P_T(S)} = \frac{k P_T(x)}{\sum_{y \in S} k P_T(y)} = \frac{v(x)}{\sum_{y \in S} v(y)},$$

so existence is ensured.

To show uniqueness, suppose that $v'$ is another such function; then for any $x \in T$,

$$v(x) = k P_T(x) = \frac{k v'(x)}{\sum_{y \in T} v'(y)}.$$

Let $k' = k / \sum_{y \in T} v'(y)$, and we have $v(x) = k' v'(x)$, which concludes the proof.

² That is, $P(a, b) \neq 0, 1$ for any $a, b \in T$.
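The ratio-scale representation of Theorem 2.6 can be checked numerically. The following is a minimal sketch, not Luce's own construction: the scale values in `v` and the helper name `luce_choice` are hypothetical and chosen only for illustration. It confirms on one example that the ratio of Lemma 2.2 does not depend on the menu, that part (i) of the axiom holds, and that rescaling $v$ by a positive constant leaves the probabilities unchanged.

```python
from fractions import Fraction

def luce_choice(v, menu):
    """Choice probabilities P_S(x) = v(x) / sum_{y in S} v(y) on the menu S."""
    total = sum(v[y] for y in menu)
    return {x: v[x] / total for x in menu}

# Hypothetical scale values, chosen only for illustration.
v = {"x": Fraction(3), "y": Fraction(2), "z": Fraction(1)}

p_T = luce_choice(v, ["x", "y", "z"])  # full menu T
p_S = luce_choice(v, ["x", "y"])       # sub-menu S of T

# Lemma 2.2: the ratio P_S(x) / P_S(y) is the same on every menu.
ratio_T = p_T["x"] / p_T["y"]
ratio_S = p_S["x"] / p_S["y"]

# Axiom, part (i), with R = {x}: P_T(x) = P_S(x) * P_T(S).
P_T_of_S = p_T["x"] + p_T["y"]
axiom_holds = p_T["x"] == p_S["x"] * P_T_of_S

# Uniqueness up to a positive constant: rescaling v changes nothing.
v_scaled = {name: 10 * value for name, value in v.items()}
p_T_scaled = luce_choice(v_scaled, ["x", "y", "z"])

print(ratio_T, ratio_S, axiom_holds, p_T == p_T_scaled)
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the equalities can be tested without floating-point tolerance.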

What Luce has shown here is the following. Say we are confined to a local region $T$ in which all pairwise discriminations are imperfect. Suppose we also know that the several probability measures are related to one another in accordance with axiom 1, such that $P_S$ acts like a conditional probability relative to $P_T$. Then the distribution $P_T(x)$ can be interpreted as a particular choice of unit of a ratio scale over $T$. These scales can be extended throughout $U$, which has important implications.

In the third chapter of Individual Choice Behavior, Luce applies his axiom and its results to utility theory. If we let $A$ be the set of pure alternatives and $E$ the set of chance events, then $a\rho b$, where $a, b \in A$ and $\rho \in E$, is the uncertain alternative where $a$ is the outcome if $\rho$ occurs, and $b$ is the outcome if $\rho$ does not occur. The symbol $Q_D$ is introduced for the subsets $D$ of $E$ to describe the probability that an element from $D$ is most likely to occur, according to the subject. Luce's choice axiom is assumed to hold for the families $\{P_T\}$ and $\{Q_D\}$.

The idea is that $a\rho b$ will be preferred to $a\sigma b$, where $\sigma \in E$, if and only if one of the following is true:

(1) $a$ is preferred to $b$ and $\rho$ is considered more likely than $\sigma$, or

(2) $b$ is preferred to $a$ and $\sigma$ is considered more likely than $\rho$.

A preference structure following this rule is considered decomposable, and can be written as

$$P(a\rho b, a\sigma b) = P(a, b) Q(\rho, \sigma) + P(b, a) Q(\sigma, \rho) \quad \text{for } a, b \in A \text{ and } \rho, \sigma \in E.$$


2.2 Debreu’s Critique

In 1960, Gerard Debreu published a brief review of Luce's Individual Choice Behavior (see [2]). After summarizing Luce's axiom and its results and applications, Debreu focuses his attention on a simple example that illustrates a serious limitation to the applicability of the axiom.

Debreu gives the set $U$ the following three elements:

$D_C$, a recording of the Debussy quartet by the C quartet,

$B_F$, a recording of the eighth symphony of Beethoven by the B orchestra conducted by F,

$B_K$, a recording of the eighth symphony of Beethoven by the B orchestra conducted by K.

The subject will be given the choice of a subset of $U$, and will listen to the recording he has chosen. We see the following actions by our subject:

• When presented with $\{D_C, B_F\}$ he chooses $D_C$ with probability 3/5.

• When presented with $\{D_C, B_K\}$ he chooses $D_C$ with probability 3/5.

• When presented with $\{B_F, B_K\}$ he chooses $B_F$ with probability 1/2.

We can interpret this as the subject preferring the Debussy quartet by the C quartet to either of the Beethoven options, and being indifferent to who is conducting Beethoven's eighth symphony. But what happens if the subject is presented with all three options? According to the axiom,

• When presented with $\{D_C, B_F, B_K\}$, the subject must choose $D_C$ with probability 3/7, and $B_F$ and $B_K$ with probability 2/7 each.

This implies that, although the subject prefers Debussy to either individual Beethoven option, he chooses a Beethoven recording more often than Debussy (with total probability 4/7) when given all three options from which to choose. This is counter-intuitive, and presents a difficulty for the application of Luce's model of individual choice.
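The three-option probabilities in Debreu's example follow from the pairwise probabilities through Theorem 2.3. As a small sketch (the helper names `P` and `menu_probability` are illustrative, not from the original), the computation can be reproduced with exact fractions:

```python
from fractions import Fraction

# Pairwise probabilities P(x, y) stated in Debreu's example.
pairwise = {
    ("DC", "BF"): Fraction(3, 5),
    ("DC", "BK"): Fraction(3, 5),
    ("BF", "BK"): Fraction(1, 2),
}

def P(x, y):
    """Probability that x is chosen from {x, y}; P(x, x) = 1/2 by convention."""
    if x == y:
        return Fraction(1, 2)
    if (x, y) in pairwise:
        return pairwise[(x, y)]
    return 1 - pairwise[(y, x)]

def menu_probability(x, menu):
    """Theorem 2.3: P_T(x) = 1 / sum_{y in T} P(y, x) / P(x, y)."""
    return 1 / sum(P(y, x) / P(x, y) for y in menu)

menu = ["DC", "BF", "BK"]
probs = {x: menu_probability(x, menu) for x in menu}
beethoven_total = probs["BF"] + probs["BK"]
print(probs, beethoven_total)
```

The combined Beethoven probability exceeds the Debussy probability even though Debussy wins each pairwise comparison, which is the point of the critique.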

3 Quantal Response

Notation, definitions, and theorems in this section are credited to McKelvey and Palfrey (1995) (see [7]).

3.1 Notation

Consider a finite $n$-person game in normal form: there is a set $N = \{1, \ldots, n\}$ of players, and for each player $i \in N$, a strategy set $S_i = \{s_{i1}, \ldots, s_{iJ_i}\}$ consisting of $J_i$ pure strategies. There exists a payoff function $u_i : S \to \mathbb{R}$ for each $i \in N$, where $S = \prod_{i \in N} S_i$.

Let $\Delta_i$ be the set of probability measures on $S_i$. Elements of $\Delta_i$ are of the form $p_i : S_i \to \mathbb{R}$, where $\sum_{s_{ij} \in S_i} p_i(s_{ij}) = 1$ and $p_i(s_{ij}) \ge 0$ for all $s_{ij} \in S_i$. We use the notation $p_{ij} = p_i(s_{ij})$. Writing $\Delta = \prod_{i \in N} \Delta_i$, we denote points in $\Delta$ by $p = (p_1, \ldots, p_n)$, where $p_i = (p_{i1}, \ldots, p_{iJ_i}) \in \Delta_i$. We use $s_{ij}$ to denote the strategy $p_i \in \Delta_i$ with $p_{ij} = 1$, and we use the shorthand notation $p = (p_i, p_{-i})$. Thus, $(s_{ij}, p_{-i})$ represents the strategy profile in which $i$ plays the pure strategy $s_{ij}$ and all other players play their components of $p$.

The payoff function is extended to have domain $\Delta$ by the rule $u_i(p) = \sum_{s \in S} p(s) u_i(s)$, where $p(s) = \prod_{i \in N} p_i(s_i)$. As defined originally by John Nash (1950), a vector $p = (p_1, \ldots, p_n) \in \Delta$ is a Nash equilibrium if, for all $i \in N$ and all $p_i' \in \Delta_i$, $u_i(p_i', p_{-i}) \le u_i(p)$.

Now, we write $X_i = \mathbb{R}^{J_i}$ to represent the space of possible payoffs for strategies that player $i$ might adopt. We also let $X = \prod_{i=1}^{n} X_i$. The function $\bar{u} : \Delta \to X$ is defined as

$$\bar{u}(p) = (\bar{u}_1(p), \ldots, \bar{u}_n(p)),$$

where

$$\bar{u}_{ij}(p) = u_i(s_{ij}, p_{-i}).$$

3.2 Quantal Response Equilibrium

Next, McKelvey and Palfrey (1995) define a statistical version of Nash equilibrium called quantal response equilibrium. In this version, each player's utility for each action is subject to random error³. For each $i$ and each $j \in \{1, \ldots, J_i\}$, and for any $p \in \Delta$, player $i$'s utility for playing action $j$ is defined as

$$\hat{u}_{ij}(p) = \bar{u}_{ij}(p) + \varepsilon_{ij}.$$

³ There are multiple interpretations for this random error. The interpretation considered in this paper is that players are not entirely rational, meaning the true utility they receive is clouded by statistical noise, leading to occasional errors which decrease with an increase in rationality. An alternative interpretation is that players do in fact calculate the expected payoffs correctly, but have an additive payoff disturbance associated with each available pure strategy.

Each player's error vector, $\varepsilon_i = (\varepsilon_{i1}, \ldots, \varepsilon_{iJ_i})$, is distributed according to a joint distribution with density function $f_i(\varepsilon_i)$. We assume that $E(\varepsilon_i) = 0$ and that the marginal distribution of $f_i$ exists for each $\varepsilon_{ij}$. We call $f = (f_1, \ldots, f_n)$ admissible if, for all $i$, $f_i$ satisfies these properties.

An interpretation of this basic model is that each player receives a signal from each action, composed of the true expected utility of that action and some error. This results in actions with higher expected utilities being selected more often, but not always. It also implies that as the difference in true utility becomes greater, the better option is chosen more often.

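For any $\bar{u} = (\bar{u}_1, \ldots, \bar{u}_n)$ with $\bar{u}_i \in \mathbb{R}^{J_i}$ for each $i$, the authors define the $ij$-response set $R_{ij} \subseteq \mathbb{R}^{J_i}$ by

$$R_{ij}(\bar{u}_i) = \{\varepsilon_i \in \mathbb{R}^{J_i} \mid \bar{u}_{ij} + \varepsilon_{ij} \ge \bar{u}_{ik} + \varepsilon_{ik} \ \ \forall k = 1, \ldots, J_i\}.$$

Given $p$, each set $R_{ij}(\bar{u}_i(p))$ specifies the region of errors that would cause player $i$ to choose action $j$. Lastly, let $\sigma_{ij}(\bar{u}_i)$ be the probability that, given $\bar{u}$, player $i$ will select strategy $j$, that is,

$$\sigma_{ij}(\bar{u}_i) = \int_{R_{ij}(\bar{u}_i)} f(\varepsilon) \, d\varepsilon.$$

Definition 3.1. Let $\Gamma = (N, S, u)$ be a game in normal form, and let $f$ be admissible. For any such $f$ and game $\Gamma = (N, S, u)$, a quantal response equilibrium (QRE) is any $\pi \in \Delta$ such that for all $i \in N$, $1 \le j \le J_i$, $\pi_{ij} = \sigma_{ij}(\bar{u}_i(\pi))$.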

The authors call $\sigma_i : \mathbb{R}^{J_i} \to \Delta^{J_i}$ the statistical reaction function (or quantal response function) of player $i$. They then lay out several results about statistical reaction functions:

1. $\sigma \in \Delta$ is nonempty.

2. $\sigma_i$ is continuous on $\mathbb{R}^{J_i}$.

3. $\sigma_{ij}$ is monotonically increasing in $\bar{u}_{ij}$.

4. If, for all $i$ and for all $j, k = 1, \ldots, J_i$, $\varepsilon_{ij}$ and $\varepsilon_{ik}$ are i.i.d., then for all $\bar{u}$, for all $i$, and for all $j, k = 1, \ldots, J_i$,

$$\bar{u}_{ij} > \bar{u}_{ik} \Rightarrow \sigma_{ij}(\bar{u}) > \sigma_{ik}(\bar{u}).$$

Property 3 compares the statistical best response function when one of player $i$'s expected payoffs, $\bar{u}_{ij}$, has increased while every other component of $\bar{u}_i$ has stayed the same: the region $R_{ij}$ expands and each other region $R_{ik}$ shrinks or remains the same. Property 4 states that the probabilities of different actions are ordered by their expected payoffs. Together, these two properties mean that better actions are more likely to be chosen than worse actions.
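Properties 3 and 4 can be illustrated by simulation. The sketch below estimates $\sigma_{ij}$ by drawing error vectors and recording which action has the highest noisy utility. It assumes i.i.d. Gaussian errors with mean zero, which is only one admissible choice of $f_i$; the function name `simulate_response`, the payoff values, and the seed are hypothetical.

```python
import random

def simulate_response(utilities, noise_sd=1.0, draws=20000, seed=0):
    """Estimate sigma_ij by sampling errors and picking the highest noisy utility.

    Assumption: errors are i.i.d. normal with mean 0 (one admissible f_i).
    """
    rng = random.Random(seed)
    counts = [0] * len(utilities)
    for _ in range(draws):
        noisy = [u + rng.gauss(0.0, noise_sd) for u in utilities]
        counts[noisy.index(max(noisy))] += 1
    return [c / draws for c in counts]

# Hypothetical expected payoffs for three actions of one player.
base = simulate_response([1.0, 0.5, 0.0])
# Same errors (same seed), but the first action's payoff is raised.
raised = simulate_response([2.0, 0.5, 0.0])

print("base:  ", base)    # Property 4: frequencies ordered like the payoffs
print("raised:", raised)  # Property 3: first frequency rises with its payoff
```

Reusing the seed means both runs see the same error draws, so any change in the frequencies comes from the payoff change alone.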

Properties 1 and 2 above imply the theorem below.

Theorem 3.2. For any Γ and any admissible f , there exists a QRE.

Proof. A quantal response equilibrium is a fixed point of $\sigma \circ \bar{u}$. The map $\sigma \circ \bar{u}$ is continuous on $\Delta$, because the distribution of $\varepsilon$, whatever it may be, has a density. Thus, by Brouwer's fixed point theorem, $\sigma \circ \bar{u}$ has a fixed point, meaning a QRE exists.

3.3 The Logit QRE

The most commonly used, easiest to work with, and most readily understood class of quantal response functions is the logistic quantal response function. As will become apparent, the logistic function evolves directly from Luce's individual choice model.

For any given $\lambda > 0$ and $x_i \in \mathbb{R}^{J_i}$, the logistic quantal response function is defined by

$$\sigma_{ij}(x_i) = \frac{e^{\lambda x_{ij}}}{\sum_{k=1}^{J_i} e^{\lambda x_{ik}}}$$

and corresponds to optimal choice behavior if $f_i$ has an extreme value distribution⁴, with cdf $F_i(\varepsilon_{ij}) = e^{-e^{-\lambda \varepsilon_{ij} - \gamma}}$ and independent $\varepsilon_{ij}$'s. If each player uses a logistic quantal response function, the corresponding Logit Equilibrium requires, for each $i, j$,

$$\pi_{ij} = \frac{e^{\lambda x_{ij}}}{\sum_{k=1}^{J_i} e^{\lambda x_{ik}}},$$

where $x_{ij} = \bar{u}_{ij}(\pi)$, and the $\pi$'s are equilibrium probability distributions.

⁴ This function comes from a specific extreme value distribution called the Gumbel distribution, as described in the papers of Emil J. Gumbel.

The set of possible logistic response functions is parameterized by $\lambda$. In our interpretation, $\lambda$ represents a player's rationality, and is inversely related to the level of error. When $\lambda = 0$, actions are chosen completely randomly, and as $\lambda \to \infty$, the amount of error goes to 0. In fact, as we will soon show, as $\lambda \to \infty$ the Logit Equilibrium approaches a Nash equilibrium of the underlying game.
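The role of $\lambda$ is easy to see numerically. The following is a minimal sketch of the logistic quantal response function for a single player; the function name `logit_response` and the payoff vector are illustrative assumptions. Subtracting the largest payoff before exponentiating is a standard numerical safeguard and does not change the resulting probabilities.

```python
import math

def logit_response(x, lam):
    """Logit quantal response: sigma_j = exp(lam * x_j) / sum_k exp(lam * x_k)."""
    shift = max(x)  # subtract the max for numerical stability; ratios are unchanged
    weights = [math.exp(lam * (xj - shift)) for xj in x]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical expected payoffs x_i1, x_i2, x_i3 for one player.
payoffs = [1.0, 0.5, 0.0]

for lam in (0.0, 1.0, 10.0, 100.0):
    probs = logit_response(payoffs, lam)
    print(lam, [round(s, 3) for s in probs])
```

At $\lambda = 0$ every action receives equal weight, and as $\lambda$ grows nearly all probability moves to the action with the highest payoff.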

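For the purposes of showing such a result, we define the Logit Equilibrium correspondence as the correspondence $\pi^* : \mathbb{R}_+ \Rightarrow 2^{\Delta}$ given by

$$\pi^*(\lambda) = \left\{ \pi \in \Delta : \pi_{ij} = \frac{e^{\lambda \bar{u}_{ij}(\pi)}}{\sum_{k=1}^{J_i} e^{\lambda \bar{u}_{ik}(\pi)}} \ \ \forall i, j \right\}.$$

Theorem 3.3. Let $\sigma$ be the logistic quantal response function. Let $\{\lambda_1, \lambda_2, \ldots\}$ be a sequence such that $\lim_{t \to \infty} \lambda_t = \infty$. Let $\{p_1, p_2, \ldots\}$ be a corresponding sequence with $p_t \in \pi^*(\lambda_t)$ for all $t$ such that $\lim_{t \to \infty} p_t = p^*$. Then $p^*$ is a Nash equilibrium.

Proof. Assume, for contradiction, that $p^*$ is not a Nash equilibrium. Then there is some player $i$ and some pair of strategies, $s_{ij}$ and $s_{ik}$, with $p^*(s_{ik}) > 0$ and $u_i(s_{ij}, p^*_{-i}) > u_i(s_{ik}, p^*_{-i})$. Equivalently, $\bar{u}_{ij}(p^*) > \bar{u}_{ik}(p^*)$. Since $\bar{u}$ is a continuous function, it follows that for sufficiently small $\varepsilon > 0$ there is a $T$ such that for $t \ge T$, $\bar{u}_{ij}(p_t) > \bar{u}_{ik}(p_t) + \varepsilon$.

But then, as $t \to \infty$, $\sigma_{ik}(\bar{u}_i(p_t)) / \sigma_{ij}(\bar{u}_i(p_t)) \to 0$, since for the logistic response function this ratio equals $e^{-\lambda_t [\bar{u}_{ij}(p_t) - \bar{u}_{ik}(p_t)]} < e^{-\lambda_t \varepsilon}$. Therefore $p_t(s_{ik}) \to 0$. But this contradicts $p^*(s_{ik}) > 0$. Thus, our assumption that $p^*$ is not a Nash equilibrium is false, and the proof is complete.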

3.4 An Example

Consider the following game:

                Bob
                H       T
Alice   H       X, 0    0, 1
        T       0, 1    1, 0

In tables such as this, each player plays one of their given strategies (in this case, H and T on the left for Alice, Player 1, and H and T on the top for Bob, Player 2), resulting in the payoffs from one of the boxes shown above. In each outcome box, the left number is Player 1's utility gained from that outcome, and the right number is Player 2's utility. For example, when both players play "T", Alice gets 1 utility, and Bob gets 0 utility.

When X = 1, we have a game commonly known as "Matching Pennies". The idea is that Alice wants to play H (or "Heads") if Bob is playing H, and wants to play T ("Tails") if Bob is playing T. In other words, Alice wants to play the same pure strategy as Bob. Bob, on the other hand, wants to play the opposite strategy of Alice. There are no pure strategy Nash equilibria in this game. The mixed Nash equilibrium, determined by each player playing according to a probability distribution that makes the other player indifferent between their pure strategies, can easily be shown to be each player playing each strategy with probability 1/2.

But what happens when X ≠ 1? Intuition would suggest that Alice might want to play H more often as X increases. However, as a result of the way in which mixed Nash equilibria are calculated, Alice's strategy will not change in X, while Bob will play H less as X increases. This is slightly counter-intuitive as a description of human behavior, and empirical results confirm that there is somewhat of an "Own Payoff Effect", in that the probability Alice plays H increases as X increases (Goeree, Holt, Palfrey 2002). This is well captured by the logit quantal response model, as we will see shortly.
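The claim that Alice's Nash strategy does not move with X while Bob's does follows from the two indifference conditions. A short sketch, assuming X > 0 and using a hypothetical helper `mixed_nash`, works them out with exact fractions:

```python
from fractions import Fraction

def mixed_nash(X):
    """Mixed Nash equilibrium of the game above via the indifference conditions.

    Returns (p1H, p2H): the probabilities that Alice and Bob play H.
    Assumes X > 0, so that the game has no pure strategy equilibrium.
    """
    X = Fraction(X)
    # Bob's q = p2H must make Alice indifferent: X * q = 1 * (1 - q).
    p2H = 1 / (X + 1)
    # Alice's p = p1H must make Bob indifferent: 1 * (1 - p) = 1 * p.
    p1H = Fraction(1, 2)
    return p1H, p2H

for X in (1, 4, 9, 19):
    p1H, p2H = mixed_nash(X)
    print(f"X={X}: p1H={p1H}, p2H={p2H}")
```

Alice's mixing probability is pinned down by Bob's payoffs, which do not involve X, so only Bob's probability responds to changes in X.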

We now calculate the logit QRE for this game, with X as a variable. Let $p_{ij}$ be the probability that player $i$ plays strategy $j$, where $i \in \{\text{Alice}, \text{Bob}\}$ and $j \in \{H, T\}$, and let $e_{ij}$ be the corresponding error term.

For Alice:

$$EU_1(H) = X \cdot p_{2H} + 0 \cdot p_{2T} \quad \text{and} \quad EU_1(T) = 0 \cdot p_{2H} + 1 \cdot p_{2T}.$$

Thus, according to our model,

$$p_{1H} = \mathbb{P}(EU_1(H) + e_{1H} > EU_1(T) + e_{1T}) = \mathbb{P}(e_{1T} - e_{1H} < EU_1(H) - EU_1(T)).$$

Because $e_{1T} - e_{1H}$ is the difference between two independent error terms, we can write it as a single error term $e_1$ for Player 1 (Alice):

$$p_{1H} = \mathbb{P}(e_1 < EU_1(H) - EU_1(T)) = \mathbb{P}(e_1 < X p_{2H} - p_{2T}).$$

We also know, by the axioms of probability, that $p_{2T} = 1 - p_{2H}$; thus we have

$$p_{1H} = \mathbb{P}(e_1 < X p_{2H} - (1 - p_{2H})) = \mathbb{P}(e_1 < (X + 1) p_{2H} - 1) = F[(X + 1) p_{2H} - 1].$$

Here $F(x) = \frac{1}{1 + e^{-\lambda x}}$ is the cdf of the difference between two independent extreme value random variables. Following a similar method for Bob, we get

$$p_{2H} = F(1 - 2 p_{1H}).$$

Thus, our two equilibrium probabilities in this game are as follows:

$$p_{1H} = \frac{1}{1 + e^{-\lambda[(X + 1) p_{2H} - 1]}} \quad \text{and} \quad p_{2H} = \frac{1}{1 + e^{-\lambda(1 - 2 p_{1H})}}.$$

We are left with a system of two equations in two unknowns, parameterized by $\lambda$, the rationality parameter. There is no closed form solution for these logit QREs. However, for a given $\lambda$, we can plot the two functions against each other, with the intersection points being the equilibria.

Using Mathematica, we can vary $\lambda$ and trace the movement of the equilibrium as $\lambda$ moves from 0 to $\infty$. This progression is shown in Figures 1, 2 and 3, for X = 4, 9, 19.
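The two-equation system can also be solved numerically without plotting. The sketch below is not the Mathematica procedure used for the figures; it is an alternative that substitutes Bob's equation into Alice's and finds the fixed point by bisection. The names `F`, `logit_qre`, and `tol` are assumptions of this sketch. Bisection works here because, for X > -1, the gap between $p_{1H}$ and Alice's response is strictly increasing in $p_{1H}$, so this game has a single logit QRE for each $\lambda$.

```python
import math

def F(x, lam):
    """Logistic cdf F(x) = 1 / (1 + exp(-lam * x))."""
    return 1.0 / (1.0 + math.exp(-lam * x))

def logit_qre(X, lam, tol=1e-12):
    """Solve p1H = F[(X+1) p2H - 1], p2H = F(1 - 2 p1H) by bisection on p1H."""
    def gap(p1):
        p2 = F(1.0 - 2.0 * p1, lam)
        return p1 - F((X + 1.0) * p2 - 1.0, lam)

    lo, hi = 0.0, 1.0  # gap(0) < 0 < gap(1), and gap is increasing for X > -1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    p1 = 0.5 * (lo + hi)
    return p1, F(1.0 - 2.0 * p1, lam)

for lam in (0.0, 0.5, 1.0, 3.0, 10.0, 100.0):
    p1H, p2H = logit_qre(4.0, lam)
    print(f"lambda={lam:6.1f}  p1H={p1H:.3f}  p2H={p2H:.3f}")
```

For X = 4 the printed pairs should land close to the equilibria reported in Figure 1, and for large $\lambda$ they approach the mixed Nash equilibrium (1/2, 1/(X+1)). Very large values of $\lambda$ would overflow `math.exp` in this simple form.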

We can see that as X increases, so does $p_{1H}$ for a fixed $\lambda > 0$. The effect is most pronounced at low but positive $\lambda$ and fades as $\lambda$ grows. The interpretation here is that a player who has some level of irrationality will give in to the own payoff effect, in that they will more often play H as their payoff for (H, H) increases. The more rational a player is, the less they will give in to this effect, and when a player is perfectly rational ($\lambda = \infty$), they will not change behavior based on their own payoff, and will play according to the Nash mixed strategy equilibrium.

We also see that as $\lambda$ increases, the equilibrium probability that Bob plays H, $\bar{p}_{2H}$, strictly decreases, while the equilibrium probability that Alice plays H, $\bar{p}_{1H}$, increases and then decreases. The initial increase can be thought of as Alice realizing how large her own payoff is for playing H as she becomes slightly rational. But as her rationality continues to increase, she realizes that a rational Bob would play H infrequently, and so she adjusts accordingly, resulting in a $\bar{p}_{1H}$ that decreases in $\lambda$ for high enough $\lambda$.

[Figure 1: X = 4. Panels: (a) λ = 0, EQ = (0.500, 0.500); (b) λ = 0.5, EQ = (0.657, 0.461); (c) λ = 1, EQ = (0.723, 0.391); (d) λ = 3, EQ = (0.682, 0.251); (e) λ = 10, EQ = (0.568, 0.205); (f) λ = 100, EQ = (0.507, 0.200). The blue line in each image represents the probability equation for $p_{1H}$ in terms of $p_{2H}$. The orange line represents the probability equation for $p_{2H}$ in terms of $p_{1H}$. The red line is a trace of equilibrium points as a function of λ.]

[Figure 2: X = 9. Panels: (a) λ = 0, EQ = (0.500, 0.500); (b) λ = 0.5, EQ = (0.831, 0.418); (c) λ = 1, EQ = (0.894, 0.313); (d) λ = 3, EQ = (0.795, 0.145); (e) λ = 10, EQ = (0.607, 0.104); (f) λ = 100, EQ = (0.511, 0.100).]

[Figure 3: X = 19. Panels: (a) λ = 0, EQ = (0.500, 0.500); (b) λ = 0.5, EQ = (0.966, 0.386); (c) λ = 1, EQ = (0.989, 0.273); (d) λ = 3, EQ = (0.895, 0.086); (e) λ = 10, EQ = (0.644, 0.053); (f) λ = 100, EQ = (0.515, 0.050).]



4 Limitations of Quantal Response

4.1 Debreu’s Critique Revisited

Recall Gerard Debreu's 1960 critique of Luce's individual choice model. Debreu argued that the model gives undeserved⁵ additional probability to correlated strategies. Debreu uses the example of an individual deciding between options of music. When asked to decide between a recording of Debussy and a recording of Beethoven, the subject more often chooses Debussy. However, when asked to decide between Debussy, Beethoven, and the same Beethoven symphony but with a different conductor, the subject more often selects a Beethoven recording. This is counter-intuitive: the subject has no preference between the two Beethoven options, so we would expect them to select Debussy with the same probability as when only given two options. Taken to its logical extreme, as the number of Beethoven options increases, with the subject still preferring Debussy to any one Beethoven and being indifferent between all of the Beethoven options, the probability that the subject selects Debussy goes to 0.
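The limiting behavior described above can be made concrete. In the sketch below, the scale values 3 for Debussy and 2 for each Beethoven recording are one choice of unit consistent with the pairwise probability 3/5 (by Lemma 2.2 only the ratio 3/2 matters); the function name `debussy_probability` is illustrative.

```python
from fractions import Fraction

def debussy_probability(n_beethoven):
    """P(Debussy) under Luce's rule with n equally weighted Beethoven recordings."""
    # Scale values consistent with P(Debussy, Beethoven) = 3/5; the unit is arbitrary.
    v_debussy, v_beethoven = Fraction(3), Fraction(2)
    return v_debussy / (v_debussy + n_beethoven * v_beethoven)

for n in (1, 2, 10, 100):
    print(n, debussy_probability(n))
```

With one Beethoven option the subject picks Debussy with probability 3/5, with two it drops to 3/7, and the probability keeps shrinking toward 0 as more indistinguishable Beethoven options are added.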

This section will argue a related critique of the logistic quantal response model.

4.2 A Motivating Example

Consider the following game:

                Bob
                B       S
Alice   B       2, 1    0, 0
        S       0, 0    1, 2

This game, often called "Bach or Stravinsky" (or "Battle of the Sexes"), tells the story of Alice and Bob deciding separately whether to go to a Johann Sebastian Bach concert or an Igor Stravinsky concert. Alice would prefer Bach, Bob would prefer Stravinsky, but neither gets any utility if they go to different concerts.

⁵ This additional probability is undeserved in the sense that it is contrary to what we see and expect in human behavior.

It is important to note here that this game has three Nash equilibria: two in pure strategies, where they both go to the same concert, and one in mixed strategies, where each goes to the concert they would prefer with probability 2/3.

Let us also consider the following related game:

                Bob
                B       S
Alice   B       2, 1    0, 0
        S_1     0, 0    1, 2
        S_2     0, 0    1, 2

$S_1$ and $S_2$ can be thought of as Alice going to the Stravinsky concert wearing brown shoes or black shoes. The choice affects neither her utility nor Bob's utility, as the exact same outcome occurs whichever color shoe she chooses. Because the choice between $S_1$ and $S_2$ is so arbitrary and has no effect on either player's utility, intuitively we expect that the two options combined should get the same probability as $S$ would in the 2x2 version.

However, because of the way the quantal response model is specified, this is not the case. In fact, for a high enough $\lambda$, there are three QREs, each corresponding to one of the Nash equilibria stated earlier. However, all three of these equilibria are in mixed strategies, as even the QRE that corresponds to a pure strategy Nash equilibrium has no absolute probabilities (probabilities of 0 or 1) for any $\lambda \in \mathbb{R}_+$. Thus, any strategy for either player in a game gets at least some probabilistic weight, and therein lies the problem. I call this the additional strategy effect. As will be shown, in the Bach or Stravinsky example, the probability that Alice plays S in the 2x2 game is different from the combined probability she plays either $S_1$ or $S_2$ in the 3x2 game. In other words, $p^{2x2}_{1B} \neq p^{3x2}_{1B}$. Whether $p^{2x2}_{1B}$ is greater than or less than $p^{3x2}_{1B}$ depends on which of the three quantal response equilibria of this game we look at.

It is important to note, however, that this is simply one instance in which the QRE model does not fit human behavior. In many cases, the QRE model does very well. For example, if we were to take the 2x2 BoS game that we have already specified, and added another Stravinsky concert that both Alice and Bob could attend, the normal form game would be as follows:

                Bob
                B       S_X     S_Y
Alice   B       2, 1    0, 0    0, 0
        S_X     0, 0    1, 2    0, 0
        S_Y     0, 0    0, 0    1, 2

For interpretation, $S_X$ and $S_Y$ can be thought of as Stravinsky concerts in concert halls X and Y respectively. Just like in the 3x2 example, the difference between $S_X$ and $S_Y$ is arbitrary in that both concert halls are equal in quality. However, if Alice goes to concert hall X while Bob goes to concert hall Y, both players get 0 utility. In this case, the QRE model rightly gives the additional Stravinsky strategy significant probability. Alice's choice of X or Y changes Bob's utility, and similarly Bob's choice changes Alice's utility. So in this case, the quantal response equilibria do a good job of predicting how players should and do act.

4.3 Additional Strategy Effect in Generalized Games

In this subsection, games with variable payoffs will be examined in order to make generalized conclusions about the QRE model. The two games of interest are shown below:

                Bob
                L       R
Alice   T       a, α    b, β
        B       c, γ    d, δ

                Bob
                L       R
Alice   T       a, α    b, β
        M       c, γ    d, δ
        B       c, γ    d, δ

Using a method similar to the one used to calculate the QRE in the matching pennies game, we can calculate the general QRE system of probability equations.

2x2

$$p^{2x2}_{1T} = \frac{1}{1 + e^{-\lambda[(a - b - c + d) p^{2x2}_{2L} + b - d]}}$$

$$p^{2x2}_{2L} = \frac{1}{1 + e^{-\lambda[(\alpha - \gamma - \beta + \delta) p^{2x2}_{1T} + \gamma - \delta]}}$$

3x2

$$p^{3x2}_{1T} = \frac{1}{1 + 2e^{-\lambda[(a - b - c + d) p^{3x2}_{2L} + b - d]}}$$

$$p^{3x2}_{1M} = \frac{1}{2 + e^{-\lambda[(c - d - a + b) p^{3x2}_{2L} + d - b]}}$$

$$p^{3x2}_{2L} = \frac{1}{1 + e^{-\lambda[(\alpha - \gamma - \beta + \delta) p^{3x2}_{1T} + \gamma - \delta]}}$$

Observations. Most importantly, we first note that in the 3x2 game, although there are three equations, each probability can be written in terms of either $p^{3x2}_{1T}$ or $p^{3x2}_{2L}$, meaning that the system can be solved with only the first and third equations. Having produced equations for each system, we examine the relationship between the corresponding probabilities. We see that for any $a, b, c, d, \alpha, \beta, \gamma, \delta \in \mathbb{R}$, $p^{2x2}_{2L}$ and $p^{3x2}_{2L}$ are the same function of $p_{1T}$. We also note that the only difference between $p^{2x2}_{1T}$ and $p^{3x2}_{1T}$ is that the 3x2 version has an additional $e^{-\lambda[(a - b - c + d) p_{2L} + b - d]}$ term in the denominator.

Remark. It is easy to show that, for any given $p_{2L}$, $p^{2x2}_{1T} > p^{3x2}_{1T}$, as $e$ raised to any real exponent is positive, and $\frac{1}{1 + x} > \frac{1}{1 + 2x}$ for any positive $x$.

Lemma 4.1. For $\lambda > 0$ and $x, y, z \in \mathbb{R}$, a function of the form

$$f(p) = \frac{1}{1+xe^{-\lambda[yp+z]}}$$

is monotonic.

Proof. If $y < 0$, then $e^{-\lambda[yp+z]}$ increases in $p$, and $f(p)$ decreases in $p$ for positive $x$, and increases in $p$ for negative $x$.

If $y > 0$, then $e^{-\lambda[yp+z]}$ decreases in $p$, and $f(p)$ increases in $p$ for positive $x$, and decreases in $p$ for negative $x$.

If $y = 0$ or $x = 0$, then $f(p)$ does not change in $p$.
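The case analysis can also be condensed into a single derivative. The following is a sketch only, with $E(p)$ introduced here as shorthand for the exponential term:

```latex
\[
E(p) = e^{-\lambda[yp+z]} > 0, \qquad
f'(p) = \frac{\lambda\,x\,y\,E(p)}{\bigl(1 + x\,E(p)\bigr)^{2}} .
\]
```

Since $\lambda > 0$ and $E(p) > 0$, the sign of $f'(p)$ is the sign of $xy$ wherever the denominator is nonzero, which matches the three cases of the proof. In the response functions of this section $x$ is positive (1 or 2 in the systems above), so the denominator never vanishes.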

Lemma 4.2. For $\lambda > 0$ and $a, b, c, d, \alpha, \beta, \gamma, \delta \in \mathbb{R}$, the functions $p^{2x2}_{1T}$, $p^{2x2}_{2L}$, $p^{3x2}_{1T}$, and $p^{3x2}_{2L}$ are monotonic.

Proof. Each function can be written in the form of $f(p)$ as specified in Lemma 4.1. Thus, by Lemma 4.1, each function is monotonic.

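The quantal response equilibrium probabilities in the general 2x2 and 3x2 games are $\bar{p}^{2x2}_{1T}$, $\bar{p}^{2x2}_{2L}$, $\bar{p}^{3x2}_{1T}$, and $\bar{p}^{3x2}_{2L}$, such that

$$\bar{p}^{2x2}_{1T} = \frac{1}{1+e^{-\lambda[(a-b-c+d)\,\bar{p}^{2x2}_{2L}+b-d]}}$$

$$\bar{p}^{2x2}_{2L} = \frac{1}{1+e^{-\lambda[(\alpha-\gamma-\beta+\delta)\,\bar{p}^{2x2}_{1T}+\gamma-\delta]}}$$

and

$$\bar{p}^{3x2}_{1T} = \frac{1}{1+2e^{-\lambda[(a-b-c+d)\,\bar{p}^{3x2}_{2L}+b-d]}}$$

$$\bar{p}^{3x2}_{2L} = \frac{1}{1+e^{-\lambda[(\alpha-\gamma-\beta+\delta)\,\bar{p}^{3x2}_{1T}+\gamma-\delta]}}$$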

Theorem 4.3. For $\lambda > 0$ and $a, b, c, d, \alpha, \beta, \gamma, \delta \in \mathbb{R}$, $\bar{p}^{2x2}_{1T} \neq \bar{p}^{3x2}_{1T}$.

Proof. It can be easily shown that the difference between $p^{2x2}_{1T}$ and $p^{3x2}_{1T}$ is exactly

$$\frac{e^{-\lambda[(a-b-c+d)\,p_{2L}+b-d]}}{\left(1+e^{-\lambda[(a-b-c+d)\,p_{2L}+b-d]}\right)\left(1+2e^{-\lambda[(a-b-c+d)\,p_{2L}+b-d]}\right)} > 0,$$

thus the function $p^{2x2}_{1T}$ is always above the function $p^{3x2}_{1T}$ for any given $p_{2L}$ and all $\lambda > 0$. We also know that the difference between $p^{2x2}_{2L}$ and $p^{3x2}_{2L}$ is exactly 0 at all $p_{1T}$ and $\lambda > 0$, since they are the same function of $p_{1T}$. For convenience, we will just write both of these functions as $p_{2L}$.

Because $p^{2x2}_{1T}$, $p^{3x2}_{1T}$, and $p_{2L}$ are all monotonic, as shown in Lemma 4.2, and because $p^{2x2}_{1T} - p^{3x2}_{1T}$ is always positive, the intersection(s) of $p^{2x2}_{1T}$ and $p_{2L}$ must be different from the intersection(s) of $p^{3x2}_{1T}$ and $p_{2L}$. These intersections represent the fixed point equilibrium solutions to each system, and thus we obtain the desired result.
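The fixed-point argument can be checked numerically. The following is a minimal sketch, not the method used to produce the figures in this paper: the function names, the bisection solver, and the payoff values are all assumptions made for illustration. It finds one intersection of Alice's and Bob's response functions in the 2x2 game and in the 3x2 game with a duplicated bottom row.

```python
import math


def response(weight, slope, intercept, lam, prob):
    """Logit response 1 / (1 + weight * exp(-lam * (slope * prob + intercept)))."""
    return 1.0 / (1.0 + weight * math.exp(-lam * (slope * prob + intercept)))


def solve_qre(a, b, c, d, alpha, beta, gamma, delta, lam, bottom_rows=1, tol=1e-12):
    """Return one fixed point (p_1T, p_2L) found by bisection.

    bottom_rows=1 is the 2x2 game; bottom_rows=2 is the 3x2 game in which
    Alice's bottom row is duplicated.
    """

    def alice_top(p_2l):
        return response(bottom_rows, a - b - c + d, b - d, lam, p_2l)

    def bob_left(p_1t):
        return response(1, alpha - gamma - beta + delta, gamma - delta, lam, p_1t)

    # g(p) = p - alice_top(bob_left(p)) is negative at p = 0 and positive at
    # p = 1, so bisection converges to a root of g (one equilibrium; there
    # may be others when lam is large).
    low, high = 0.0, 1.0
    while high - low > tol:
        mid = 0.5 * (low + high)
        if mid - alice_top(bob_left(mid)) < 0.0:
            low = mid
        else:
            high = mid
    p_1t = 0.5 * (low + high)
    return p_1t, bob_left(p_1t)


# Hypothetical coordination-style payoffs, chosen only for illustration.
payoffs = dict(a=2, b=0, c=0, d=1, alpha=1, beta=0, gamma=0, delta=2)
p_2x2, q_2x2 = solve_qre(**payoffs, lam=0.5, bottom_rows=1)
p_3x2, q_3x2 = solve_qre(**payoffs, lam=0.5, bottom_rows=2)
print(f"2x2: p_1T = {p_2x2:.4f}, p_2L = {q_2x2:.4f}")
print(f"3x2: p_1T = {p_3x2:.4f}, p_2L = {q_3x2:.4f}")
```

Bisection is used here only because it needs no derivative and always brackets a root on [0, 1]. It returns a single equilibrium, so for large λ, where several intersections can exist, a grid search over starting brackets would be needed to find them all. Very large values of λ times the payoff differences can also overflow `math.exp`.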

This theorem shows that for any fixed λ, the addition of the new but identical strategy changes the equilibrium probabilities. Even if this change is small, it is significant because we can continue to add copies of the same strategy until the difference between the equilibrium of the new game and that of the original game is large. We show this in the next theorem.

Theorem 4.4. Let N be the number of copies of one of the original two strategies in the generalized game for Player 1. For any fixed λ, as N increases, the probability of the other strategy increases or decreases strictly monotonically.




Proof. This result is the logical extreme of Theorem 4.3. We have already seen that when N is increased from 0 to 1, as is the case in the 2x2 to the 3x2 example, $p^{3x2}_{1T}$ is always below $p^{2x2}_{1T}$. Thus, depending on the game's specification, $\bar{p}^{3x2}_{1T}$ will either increase or decrease from $\bar{p}^{2x2}_{1T}$ with this difference in $p^{3x2}_{1T}$ and $p^{2x2}_{1T}$. Now, imagine a game that has k strategies available, with k − 1 of them being identical copies of one strategy. When we add an additional copy to this kx2 game, such that we have a (k+1)x2 game with N = k, we have the following two functions for the two games:

$$\bar{p}^{kx2}_{1T} = \frac{1}{1+(k-1)e^{-\lambda[(a-b-c+d)\,\bar{p}^{kx2}_{2L}+b-d]}}$$

$$\bar{p}^{(k+1)x2}_{1T} = \frac{1}{1+ke^{-\lambda[(a-b-c+d)\,\bar{p}^{(k+1)x2}_{2L}+b-d]}}$$

We see that, for any given $p_{2L}$, the difference between these two functions is

$$\frac{e^{-\lambda[(a-b-c+d)\,p_{2L}+b-d]}}{\left(1+(k-1)e^{-\lambda[(a-b-c+d)\,p_{2L}+b-d]}\right)\left(1+ke^{-\lambda[(a-b-c+d)\,p_{2L}+b-d]}\right)},$$

which is always strictly greater than 0. By the same logic as Theorem 4.3, the addition of another copy when there are already k copies moves the equilibrium in the same direction, regardless of how many copies are already present.

Thus, every additional copy pushes the equilibrium probability that Player 1 plays the non-copied strategy towards 0 or 1.

Conjecture 4.5. As $\lambda \to \infty$, $|\bar{p}^{2x2}_{1T} - \bar{p}^{3x2}_{1T}|$ strictly decreases.

This conjecture proposes that as λ increases, the difference between the equilibrium probabilities of the 2x2 and 3x2 games constantly gets smaller. This is an intuitive result, as we expect a player to treat the two games as more and more alike as he becomes more and more rational.

In order to conceptualize all of this, it is helpful to return to our previous Bach or Stravinsky example.

Observing Figure 4, it is easy to see the result of Theorem 4.3, as the intersection points are noticeably different, especially for low values of λ. Additionally, we see that for low enough values of λ there is only one equilibrium for each version. It is only after λ gets large enough that the three equilibria appear. On this note, because of the nature of the functions, we see that three equilibria appear in the 2x2 game for lower values of λ than in the 3x2 game. We can think of this as Alice in the 3x2 game taking 'longer' to get to the same level of rationality as her 2x2 self.

[Figure 4 panels: (a) λ = 0, (b) λ = 0.5, (c) λ = 1.8, (d) λ = 3, (e) λ = 10, (f) λ = 100]

Figure 4: Bach or Stravinsky. The blue line represents the function $p^{2x2}_{1T}$, the orange line represents the function $p^{3x2}_{1T}$, and the green line represents the function $p_{2L}$.

4.4 Applications to Prisoner’s Dilemma

The story behind the famous prisoner's dilemma game is this: Two guilty people (Alice and Bob) are being questioned separately for collaboratively breaking the law. Both can either defect and confess to the crime, or cooperate with each other by not confessing. If both cooperate, they get 1 year each. If one defects and the other tries to cooperate, the defector gets off without jail time, and the other gets 5 years. If both defect, then they each get 3 years. The normal form game is presented below, with numbers corresponding to utility, not jail time.




                  Bob
               D        C
Alice    D   1, 1     5, 0
         C   0, 5     3, 3

This game has a dominant strategy, in that defecting provides more utility no matter how the other player acts. Similarly, cooperating is a dominated strategy, in that each player can get more utility by not cooperating. Thus, the only situation in which both players are best responding, and thus the only Nash equilibrium, is when both players defect. Even though the best joint outcome is when both players cooperate, it is too tempting in that case for each player to defect.

The theorems in the previous section have an interesting consequence for this game. As we did in previous sections, we create a third option for Alice that is an exact copy of her second strategy. We are now looking at the game below.

                  Bob
               D        C
Alice    D   1, 1     5, 0
         C1  0, 5     3, 3
         C2  0, 5     3, 3

After reading Section 4.3, we suspect that the probability that Alice defects in this game will be different from the same probability in the original 2x2 version. Using the same technique as before, we see that this is true, and in fact the addition of a copy of the C strategy actually makes it more likely that Alice will cooperate. This is shown in Figure 5.

Note also that as λ increases, the equilibrium moves towards both players playing D with probability close to 1 much more quickly than in other games we have looked at. This is caused by D being a dominant strategy. However, what this shows is that for any fixed rationality value λ, there is a number of copies of C, call it $N^*$, that can be added such that the probability that Alice plays D is approximately 0. For a very low value of λ, $N^*$ need not be large. As the value of λ increases, $N^*$ grows very quickly. For λ = ∞, there is no such $N^*$. But, as shown in Figure 6, for any fixed λ, we can keep adding copies of cooperate until Alice defects with probability approximately 0.
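The effect of piling up copies of C can be explored with a small numerical sketch that uses the prisoner's dilemma payoffs given above. Everything else is an assumption made for illustration: the function names, the bisection solver, the reading of "approximately 0" as a probability below 0.01, and the convention that `c_rows` counts all of Alice's C rows (so `c_rows=1` is the original 2x2 game).

```python
import math


def alice_defects(p_2d, lam, c_rows):
    """P(Alice plays D) given Bob's P(D), when Alice has `c_rows` identical C rows."""
    # Alice's utility gain from D over C is (5 - 4q) - (3 - 3q) = 2 - q.
    return 1.0 / (1.0 + c_rows * math.exp(-lam * (2.0 - p_2d)))


def bob_defects(p_1d, lam):
    """P(Bob plays D) given Alice's P(D); Bob's gain from D over C is 2 - p."""
    return 1.0 / (1.0 + math.exp(-lam * (2.0 - p_1d)))


def equilibrium_p1d(lam, c_rows, tol=1e-12):
    """Equilibrium probability that Alice defects, found by bisection on [0, 1]."""
    low, high = 0.0, 1.0
    while high - low > tol:
        mid = 0.5 * (low + high)
        # p - alice_defects(bob_defects(p)) is negative at 0 and positive at 1.
        if mid - alice_defects(bob_defects(mid, lam), lam, c_rows) < 0.0:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)


def c_rows_needed(lam, threshold=0.01, max_rows=10**6):
    """Smallest number of C rows with P(Alice plays D) < threshold, or None."""
    for c_rows in range(1, max_rows + 1):
        if equilibrium_p1d(lam, c_rows) < threshold:
            return c_rows
    return None


for c_rows in (1, 2, 5, 10, 100):
    print(c_rows, round(equilibrium_p1d(1.0, c_rows), 4))
print("C rows needed at lam = 1:", c_rows_needed(1.0))
```

The linear search in `c_rows_needed` is adequate for small λ. Because the required count grows quickly with λ, a doubling search followed by bisection on `c_rows` would be a more practical choice for larger values.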




[Figure 5 panels: (a) λ = 0, (b) λ = 1, (c) λ = 3]

Figure 5: Prisoner's Dilemma. The blue line represents the function $p^{2x2}_{1D}$, the orange line represents the function $p^{3x2}_{1D}$, and the green line represents the function $p_{2D}$.

[Figure 6 panels: (a) λ = 0, (b) λ = 1, (c) λ = 3, (d) λ = 5, (e) λ = 7, (f) λ = 9]

Figure 6: Prisoner's Dilemma. The green line represents the function $p_{2D}$. Each other line represents the function $p_{1D}$ in a game with N copies of the C strategy for Alice. Dark Blue: N = 1, Orange: N = 2, Red: N = 5, Purple: N = 10, Brown: N = 20, Light Blue: N = 50, Yellow: N = 100, Magenta: N = 1,000, Dark Green: N = 10,000, Bright Red: N = 1,000,000.




5 Conclusion

In this paper, we examine only the effect of adding identical copies of pre-existing strategies to a game. However, it is important to note that this is only an extreme example, used for convenience and ease of understanding. This specific example is only a small part of the phenomenon that exists. We know by the specification of the quantal response model that as we change the payoffs of a game, the equilibria change continuously. Thus, if we take an identical strategy and adjust it ever so slightly, we see the same phenomenon. Continue in this way, and we reach the conclusion that we see similar effects to the ones we have shown, even when strategies are just correlated in some way. That is, if two strategies are similar but not identical, the model still may have trouble adapting. This has much more widespread applications than just the identical strategy special case.

Moving forward, in order to better suit the model to adapt and adjust to similar or identical strategies, there needs to exist in the model specification some parameter representing correlation between strategies. If two strategies are highly correlated, meaning that they offer the player similar results, then we should see similar equilibrium probabilities in the game with only one of the two strategies, and the game with both strategies. If two strategies are mostly uncorrelated, in that there is a significant difference in outcome between the two strategies, then the addition of one of the strategies to a game containing the other strategy should in fact make a significant impact on the equilibrium probabilities.

References

[1] Anderson, Simon P., Jacob K. Goeree, and Charles A. Holt. "Minimum-Effort Coordination Games: Stochastic Potential and Logit Equilibrium." Games and Economic Behavior 34.2 (2001): 177-99.

[2] Debreu, Gerard. Rev. of Individual Choice Behavior: A Theoretical Analysis. The American Economic Review (1960): 186-88.

[3] Goeree, Jacob K., Charles A. Holt, and Thomas R. Palfrey. "Quantal Response Equilibria." The New Palgrave Dictionary of Economics (2008): 783-87.

[4] Goeree, Jacob K., Charles A. Holt, and Thomas R. Palfrey. "Risk Averse Behavior in Generalized Matching Pennies Games." Games and Economic Behavior 45.1 (2003): 97-113.

[5] Haile, Philip A., Ali Hortaçsu, and Grigory Kosenok. "On the Empirical Content of Quantal Response Equilibrium." American Economic Review 98.1 (2008): 180-200.

[6] Luce, R. Duncan. Individual Choice Behavior: A Theoretical Analysis. Mineola, NY: Dover Publications, 2005.

[7] McKelvey, Richard D., and Thomas R. Palfrey. "Quantal Response Equilibria for Normal Form Games." Games and Economic Behavior 10.1 (1995): 6-38.

[8] Ochs, Jack. "Coordination Problems." Handbook of Experimental Economics (1995), edited by John Kagel and Alvin E. Roth, Princeton University Press, Princeton, 195-251.
