
Review of Economic Dynamics 2, 472-485 (1999)

Article ID redy.1998.0047, available online at http://www.idealibrary.com

Adaptive Learning in Imperfect Monitoring Games*

Mario Gilli

Università Bocconi, Istituto di Economia Politica, via Gobbi 5, 20136 Milano, Italy

Received February 26, 1997

This paper deals with the problem of specifying a general learning model, the rationality of which is not situation dependent. I propose a very general model of adaptive learning suitable to study learning problems in games with imperfect monitoring, such as extensive form games. In this context I relate adaptive learning with a general notion of equilibrium. In particular I provide a dynamic characterisation of conjectural equilibria: a "stable" strategy profile is consistent with adaptive learning if and only if it is a conjectural equilibrium. Journal of Economic Literature Classification Numbers: C72, D83, D82. © 1999 Academic Press

Key Words: adaptive learning; justification operator; imperfect monitoring games.

1. INTRODUCTION

We have learned the answers, all the answers: It is the question that we do not know.

--Archibald MacLeish, The Hamlet of A. MacLeish

The last years have seen an extraordinary flourishing of works studying learning in noncooperative games. A common characteristic of most of these models is that the learning rules are often ad hoc and very specific, in the sense that they are not derived from an explicit behavioural model or are tailored for a specific context. The problem is that the properties of learning algorithms are not independent of the underlying environment. Therefore any specific representation of the players' way of learning is unlikely to be generally applicable: in some cases it could correctly represent the actual players' behaviour, but in other contexts it is hopelessly misspecified.

*I would like to thank David Canning, David Easley, Frank Hahn, Ehud Lehrer, Hamid Sabourian, and especially Ramon Marimon and two anonymous referees for their valuable observations, and the seminar participants at the 1995 IGIER Colloquia and at the 1995 CNR theory workshop for their helpful comments. The usual disclaimers apply.


A possible solution to this problem is to provide a formulation of learning general enough to encompass the more common learning algorithms such as best reply dynamics, fictitious play, stationary Bayesian learning, etc. This is the approach pioneered by Milgrom and Roberts (1990, 1991) (see also Marimon (1993), Marimon (1997), Marimon and McGrattan (1992), and Hendon et al. (1995)). The desired degree of generality is achieved by avoiding any detailed description of how the players actually reach their decision, identifying instead general properties that the sequence of plays over time should satisfy. In this paper I follow this approach by generalising both the model of strategic interaction and the behavioral hypotheses. This is obtained using strategic form games with signal functions (imperfect monitoring games) as models of strategic situations and justification operators to represent players' behaviour. Indeed the generalisation to games with imperfect monitoring also requires the generalisation of the behavioral hypothesis used in Milgrom and Roberts (1990, 1991), since in these more general contexts it would be very restrictive to limit the players' behaviour to static optimisation, while generally it would be rational to experiment with strategies that are suboptimal in the short run. Actually experimentation is an essential feature of learning. Contrary to Milgrom and Roberts' (1990, 1991) techniques, my approach allows a unified analysis of games with private information such as extensive form games and of different behavioral hypotheses such as active learning with experimentation. This paper not only provides a generalisation of Milgrom and Roberts' work and results to games with imperfect monitoring, but new relevant results are also presented. In fact the approach I follow in this paper generates new interesting problems; in particular I am able to prove necessary and sufficient conditions relating conjectural equilibria (see Gilli (1993) for a general definition) and general adaptive learning processes.

The paper is organised as follows: In Section 2 the general learning model is described; Section 3 provides the results relative to conjectural equilibria in pure strategies, while Section 4 comments on the generalisation of these results to mixed strategies. Section 5 concludes by interpreting the results in terms of the usual assumption of Bayesian rationality for extensive games.

2. A GENERAL LEARNING MODEL

The paper actually is about adaptive learning in general, not specifically in games. Notwithstanding this, I directly cast my model within a game theoretic structure to provide a more direct comparison with Milgrom and Roberts (1990, 1991) and to give more specific connections between agents' limit behaviour and equilibrium concepts.

In this paper I consider the class of finite imperfect monitoring games (IMGs) (a general analysis of IMGs is provided in Gilli (1995); the same model in the context of repeated games is studied in Lehrer (1989, 1990, 1991, 1992)). An imperfect monitoring game is defined as

$$G := \left( N, (S_i, u_i, \eta_i)_{i=1}^{n} \right),$$

where

• $N$ is the set of players, $i \in N$;

• $S_i$ is the set of player $i$'s pure strategies, $S := \times_{i \in N} S_i$ and $S_{-i} := \times_{j \neq i} S_j$; moreover $\Sigma_i := \Delta(S_i)$ and $\Sigma := \otimes_{i \in N} \Delta(S_i)$ are the sets of mixed strategies, respectively, of player $i$ and of all players, where $\Delta(\cdot)$ denotes the set of all probability measures on the set $\cdot$;

• $\eta_i$ is player $i$'s signal function: let $M_i$ be the set of $i$'s possibly private messages. Then $\eta_i: S \to M_i$, where $\eta_i(s)$ is the signal privately observed by player $i$ when the players play the strategy profile $s$, e.g., it could be the outcome of an extensive form game;

• $u_i$ is player $i$'s payoff function: $u_i: \Delta(S) \to \mathbb{R}$. Note that $u_i$ is not assumed to satisfy any particular condition, e.g., it could be merely ordinal, not necessarily satisfying the von Neumann-Morgenstern axioms.

I assume that the analysis is restricted to the class of finite imperfect monitoring games, i.e., the sets $N$, $S$, and $M_i$ are finite.

The behaviour in the general learning model of this paper is modelled through the notion of the justification operator (Milgrom and Roberts, 1991). Let $\mathcal{C}(X)$ be the family of all Cartesian subsets of a generic set $X$.

DEFINITION 1. A justification operator $J$ is a mapping defined as follows:

$$J: \mathcal{C}(S) \to \mathcal{C}(S)$$

such that

(i) $B, C \in \mathcal{C}(S),\ B \subseteq C \Rightarrow J(B) \subseteq J(C)$;

(ii) $B \in \mathcal{C}(S),\ B \neq \emptyset \Rightarrow J(B) \neq \emptyset$.

Moreover, $\forall k \in \mathbb{N}$, $\forall B \in \mathcal{C}(S)$ define

$$J^0(B) := B,$$

$$J^k(B) := J\big(J^{k-1}(B)\big),$$

$$J^\infty(B) := \bigcap_{k \in \mathbb{N}} J^k(B).$$


Finally, $J_i(B)$ is the projection on the $i$th component of $J(B)$ and $J_i(X_{-i})$ is the projection on the $i$th component of $J(S_i \times X_{-i})$.

Remark. Roughly, any strategy profile in $J(X)$ is a possible profile of players' choices if they suppose they are going to play in $X$. Therefore $J^k(X)$ is the set of players' possible strategies when they are supposed to choose a strategy profile in $J^{k-1}(X)$. The standard assumption in economics is the hypothesis of Bayesian rationality. In this case, the interpretation of $J(\cdot)$ is

$$b_i \in J_i(B_{-i}) \quad \text{if and only if} \quad \exists \mu_i \in \Delta(B_{-i}): \ b_i \in \arg\max_{b_i'} u_i(b_i', \mu_i).$$

Obviously many other "justifiable" behaviours are compatible with a particular specification of a justification operator, e.g., a dynamic behaviour with experimentation and active learning, the rationalizable strategies, the pure undominated strategies defined in Börgers (1993), or the hyperrational strategies part of the set of stable equilibria. Indeed the only restrictions on possible behaviours are requirements (i) and (ii); otherwise the choices are completely general, and standard results imply that in finite games conditions (i) and (ii) are satisfied for all the above solution concepts. On the other hand, not all common solution concepts satisfy nonemptiness (ii) and monotonicity (i); for example, the operator that selects the set of weakly undominated strategies is not monotone and the set of strict equilibria can be empty.
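To fix ideas, the Bayesian-rationality specification of $J_i$ can be computed mechanically in small finite games. The following sketch is purely illustrative and not part of the formal model: the function name is mine, and the belief simplex over $B_{-i}$ is approximated by a finite grid rather than solved exactly.

```python
import itertools

def justify_bayes(payoff_i, strategies_i, B_minus_i, grid=8):
    """Sketch of J_i(B_{-i}) under Bayesian rationality: b_i is justified
    iff some belief mu_i over B_{-i} makes it an expected-payoff maximiser.
    Beliefs are approximated on a finite grid of the simplex."""
    B = list(B_minus_i)
    justified = set()
    for weights in itertools.product(range(grid + 1), repeat=len(B)):
        if sum(weights) != grid:
            continue                      # keep only points of the simplex
        mu = [w / grid for w in weights]
        eu = {b_i: sum(m * payoff_i[b_i, b] for m, b in zip(mu, B))
              for b_i in strategies_i}
        best = max(eu.values())
        justified |= {b_i for b_i, v in eu.items() if v >= best - 1e-12}
    return justified

# Player 1 of the game in Figures 1 and 2 below, when the opponents are
# supposed to play A2 together with either choice of player 3:
payoff_1 = {('A1', ('A2', 'L')): 1, ('D1', ('A2', 'L')): 3,
            ('A1', ('A2', 'R')): 1, ('D1', ('A2', 'R')): 0}
print(justify_bayes(payoff_1, ['A1', 'D1'], [('A2', 'L'), ('A2', 'R')]))
# {'A1', 'D1'}: A1 is a best reply when P(L) <= 1/3, D1 when P(L) >= 1/3
```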

The following properties are easy to derive, but they help to explain the role and meaning of justification operators:

PROPOSITION 1. Assume that $J(B) \subseteq B$, $B \in \mathcal{C}(S)$, and $B$ is not empty. Then

$$\forall k \in \mathbb{N}, \quad J^{k+1}(B) \subseteq J^k(B); \qquad (1)$$

$$J^\infty(B) \neq \emptyset; \qquad (2)$$

$$C = J(C) \subseteq B \Rightarrow C \subseteq J^\infty(B). \qquad (3)$$

Let $J$ and $G$ be two justification operators. Then

$$J(B) \subseteq G(B) \Rightarrow \forall k \in \mathbb{N},\ J^k(B) \subseteq G^k(B). \qquad (4)$$

Proof. The proof of (1) is by induction. By assumption, (1) holds for $k = 0$. Suppose that $J^k(B) \subseteq J^{k-1}(B)$. Then applying the definition of $J^n$, with $n = k, k-1$, and property (i) of Definition 1,

$$J^{k+1}(B) = J\big(J^k(B)\big) \subseteq J\big(J^{k-1}(B)\big) = J^k(B).$$

The proof of (2) follows from properties (i) and (ii) of Definition 1 and from the finiteness of $B$, which implies that $J^k(B)$ is compact for all $k$ (see, e.g., Binmore (1981)).

To prove (3), suppose that $J(C) = C \subseteq B$. Then iterating the application of $J$, from the hypothesis and from (i) in Definition 1, it follows that $J^k(C) = C \subseteq J^k(B)$. Then by induction $C \subseteq J^\infty(B)$.

The proof of (4) follows immediately from the hypothesis and from property (i) of Definition 1, iterating the application of $J$ and $G$, according to the definition of $J^k(\cdot)$ and of $G^k(\cdot)$. ∎
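Computationally, property (1) guarantees that iterating $J$ terminates in a finite game: the iterates shrink weakly, so $J^\infty(B)$ is reached after finitely many steps. A minimal sketch, assuming a justification operator implemented as a set-to-set function; the operator used in the example (best replies to point beliefs) is one admissible specification among many.

```python
def j_infinity(J, B):
    """Iterate a justification operator until J^k(B) stops shrinking;
    by Proposition 1 this computes J^infinity(B) when J(B) <= B."""
    current = frozenset(B)
    while True:
        nxt = frozenset(J(current))
        if nxt == current:
            return set(current)
        current = nxt

# Example: a prisoners' dilemma, where the dominated strategy 'c' is
# discarded after one application of J and J^infinity = {('d', 'd')}.
payoffs = {('c', 'c'): (2, 2), ('c', 'd'): (0, 3),
           ('d', 'c'): (3, 0), ('d', 'd'): (1, 1)}
S1 = S2 = ['c', 'd']

def J(B):
    """Profiles whose components are best replies to some strategy in B."""
    rows = {s1 for s1, _ in B}
    cols = {s2 for _, s2 in B}
    just1 = {max(S1, key=lambda r: payoffs[r, c][0]) for c in cols}
    just2 = {max(S2, key=lambda c: payoffs[r, c][1]) for r in rows}
    return {(r, c) for r in just1 for c in just2}

print(j_infinity(J, {(r, c) for r in S1 for c in S2}))  # {('d', 'd')}
```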

The use of justification operators avoids any detailed description of how the players actually reach their decisions. Similarly, to obtain a very general notion of the learning process, no detailed description of a learning rule is provided. Instead I specify some general properties that plausible learning processes should have. Following the approach used in Milgrom and Roberts (1990, 1991), the learning model is cast in terms of properties of plays; that is, in terms of restrictions on the possible sequences of strategy profiles. In particular I identify some general properties that the sequence of plays over time must satisfy: the players' play is consistent with adaptive learning when the agents learn from past observations and their behaviour reflects this dependence only.

DEFINITION 2. A sequence of strategy profiles $\{s^t\}_t$ is consistent with adaptive learning iff

$$\forall i \in N,\ \forall \hat{t},\ \exists \bar{t}: \forall t \ge \bar{t}, \quad s_i^t \in J_i\Big( \bigcup_{\hat{t} \le \tau < t} S_{-i}\big(\eta_i(s^\tau)\big) \Big),$$

where

$$S_{-i}\big(\eta_i(s)\big) := \big\{ s'_{-i} \in S_{-i} \mid \eta_i(s_i, s'_{-i}) = \eta_i(s) \big\}.$$

Remarks.

1. A sequence of strategy profiles is consistent with adaptive learning iff each player can eventually find a way to justify her choices in terms of her past observations about opponents' past play. More specifically, the players eventually choose only strategies that are justified by the supposition that the opponents are not going to use pure strategies that are not compatible with the signals observed for a sufficiently long time.


2. When there is perfect monitoring and Bayesian rationality, Definition 2 coincides with the definition of adaptive learning given by Milgrom and Roberts (1991), but now many more situations are compatible with this definition. My definition is more general for two different reasons. First, it allows for imperfect monitoring of opponents' strategies, as required, e.g., by extensive form games. Second, the players' behaviour can be quite complex depending on the specification of $J$. For example active learning with experimentation in extensive games is allowed by Definition 2, while it is not compatible with Milgrom and Roberts' definition of adaptive learning. What is remarkable is that for the following theorems I do not need more specific assumptions.

3. Common naive learning processes, such as stationary Bayesian learning, fictitious play, or Cournot dynamics, induce sequences of strategy profiles that are consistent with adaptive learning. But the same is true of some extremely sophisticated learning processes, such as the steady state learning of Fudenberg and Levine (1993a).
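On a finite stretch of play Definition 2 can only be tested approximately, but the test is mechanical. The sketch below is an illustration under two simplifying assumptions of mine: perfect monitoring, so that $S_{-i}(\eta_i(s^\tau))$ reduces to the singleton $\{s_{-i}^\tau\}$, and the point-belief best-reply operator of the previous snippets.

```python
def consistent_with_adaptive_learning(hist_i, hist_opp, J_i):
    """Finite-horizon proxy for Definition 2 under perfect monitoring:
    for every starting date t_hat there must be some later date t_bar
    after which player i's choice is justified by the opponents'
    strategies observed since t_hat."""
    T = len(hist_i)
    for t_hat in range(T - 1):
        ok = any(all(hist_i[t] in J_i(set(hist_opp[t_hat:t]))
                     for t in range(t_bar, T))
                 for t_bar in range(t_hat + 1, T))
        if not ok:
            return False
    return True

# Player 1 of the prisoners'-dilemma snippet above, with J_1 the union of
# best replies to the pure strategies observed in the window:
payoff_1 = {('c', 'c'): 2, ('c', 'd'): 0, ('d', 'c'): 3, ('d', 'd'): 1}
def J_1(B):
    if not B:                      # no observations yet: nothing excluded
        return {'c', 'd'}
    return {max(['c', 'd'], key=lambda a: payoff_1[a, b]) for b in B}

print(consistent_with_adaptive_learning(
    ['c', 'c', 'd', 'd', 'd'], ['c', 'd', 'd', 'd', 'd'], J_1))  # True
```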

Finally, I need a notion of convergence for players' behaviour. Indicate with $\mathrm{SUPP}(\cdot)$ the support of a probability measure.

DEFINITION 3. A sequence of strategy profiles $\{s^t\}_t$ converges to a possibly correlated strategy profile $\beta \in \Delta(S)$ iff:

(i) $\forall s \in S, \quad \lim_{t \to \infty} \dfrac{\kappa(s;t)}{t} = \beta(s)$;

(ii) $\exists \bar{t}: \forall t \ge \bar{t}, \quad s^t \in \mathrm{SUPP}(\beta)$,

where $\kappa(s;t)$ is the number of times $s$ has been played until $t$, i.e., $\kappa(s;t)/t$ is the empirical relative frequency of $s$ in the play.

Remark. The notion of convergence is defined in terms of empirical relative frequencies of players' behaviour, taking the sequence of pure strategies played. The definition states that a sequence of pure strategies $\{s^t\}$ converges to a strategy profile if this profile replicates the empirical frequencies and if the players eventually play only strategies that have positive probability according to this profile. More specifically, (i) implies that every strategy in the support of $\beta$ is played infinitely often (but not vice versa) and (ii) means that there is a period after which each player plays only strategies in the carrier of $\beta$, i.e., every strategy played infinitely often belongs to the support of $\beta$.
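Both conditions of Definition 3 can be checked on a finite prefix of play, with the limit in (i) replaced by a tolerance; the sketch below is my own illustration, with the tolerance and names chosen for the example.

```python
from collections import Counter

def converges_to(history, beta, tol=0.05):
    """Approximate check of Definition 3 on a finite history of pure
    strategy profiles: (i) empirical frequencies kappa(s;t)/t close to
    beta(s); (ii) some tail of the history stays inside SUPP(beta)."""
    T = len(history)
    freq = Counter(history)
    support = {s for s, p in beta.items() if p > 0}
    cond_i = all(abs(freq[s] / T - beta.get(s, 0.0)) <= tol
                 for s in set(freq) | set(beta))
    cond_ii = any(set(history[t:]) <= support for t in range(T))
    return cond_i and cond_ii

# One early experiment, then (A1, A2) forever:
play = [('D1', 'A2')] + [('A1', 'A2')] * 99
print(converges_to(play, {('A1', 'A2'): 1.0}))   # True
```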


3. CONJECTURAL EQUILIBRIA IN PURE STRATEGIES AND ADAPTIVE LEARNING

The aim of this section is the analysis of the relationship between adaptive learning and conjectural equilibria in pure strategies.

DEFINITION 4. A pure strategy profile $\bar{s}$ is a conjectural equilibrium in pure strategies ($CE^p$) iff $\forall i \in N$, $\exists \beta_i \in \Delta(S_{-i})$ such that:

(a) $\bar{s}_i \in J_i\big(\mathrm{SUPP}(\beta_i)\big)$;

(b) $\mathrm{SUPP}(\beta_i) \subseteq S_{-i}\big(\eta_i(\bar{s})\big)$.

Remarks.

1. A strategy profile is a conjectural equilibrium in pure strategies when each strategy in the profile is justified assuming that the opponents play some strategy compatible with the information, possibly private, of the player. In other words, in a $CE^p$ the players' beliefs are not contradicted by the players' observations. Condition (a) is the behavioral hypothesis, while condition (b) is the equilibrium condition of compatibility between $i$'s observations and conjectures.

2. At this level of generality I do not hypothesise about the relationship between the information on opponents' behaviour and player $i$'s choice: any behaviour rule satisfying Definition 1 is compatible with this definition, not only Bayesian rationality. A comprehensive discussion of this equilibrium concept in the case of Bayesian rationality is contained in Gilli (1993). To understand the notion of conjectural equilibrium, assume Bayesian rationality and consider the imperfect monitoring game of Figures 1 and 2.

In this IMG each player receives a payoff (Figure 1) and a public signal $m_k$ (Figure 2) as a function of the strategy profile played.

FIGURE 1 (payoffs to players 1, 2, and 3; player 1 chooses the row, player 2 the column, player 3 the matrix):

    s3 = L:        A2         D2
          A1    1, 1, 1    3, 0, 0
          D1    3, 0, 0    3, 0, 0

    s3 = R:        A2         D2
          A1    1, 1, 1    0, 3, 0
          D1    0, 3, 0    0, 3, 0

FIGURE 2 (public signals):

    s3 = L:        A2    D2
          A1       m1    m2
          D1       m3    m3

    s3 = R:        A2    D2
          A1       m1    m4
          D1       m5    m5

Consider the strategy profile $(A_1, A_2, s_3)$, where $s_3 \in \{L, R\}$ is player 3's strategic choice: this situation is never a Nash equilibrium, since with common beliefs at least one of the players must choose $D_i$. However, suppose that player 1 believes $s_3 = L$ with probability $1/4$ and player 2 believes $s_3 = R$ with probability $1/4$. Then for both players it is rational to play $A_i$. Moreover the signals received by the players when they play $(A_1, A_2)$ do not contradict these conjectures. Therefore this situation is a (conjectural) equilibrium of the IMG: each player is rational and there is no incentive to change behaviour. But in a learning setting a player may try to discover the true state by playing $D_i$. In other words she can rationally give up some utility today in order to have useful information for utility maximisation tomorrow; i.e., the player can use active learning strategies. Surely this behaviour is extremely important in learning contexts, and to accommodate such behaviour we can just change the interpretation of the justification operator, e.g., assume intertemporal utility maximisation. Obviously it is possible to rationalise even more complex behaviour through an opportune use of the justification operator. Moreover suppose that the signal function of Figure 2 is the outcome function of the extensive form game of Figure 3, which was used by Fudenberg and Kreps to show the possibility of long-run non-Nash outcomes (now reported in Fudenberg and Levine (1998, pp. 176-177)).

FIGURE 3 (the extensive form game of Fudenberg and Kreps: player 1 chooses $A_1$ or $D_1$; after $A_1$, player 2 chooses $A_2$ or $D_2$; the profile $(A_1, A_2)$ yields $(1,1,1)$, while after any deviation player 3's choice between $L$ and $R$ determines whether the payoff is $(3,0,0)$ or $(0,3,0)$).

It is immediate to see that the above conjectural equilibrium is actually a self-confirming equilibrium of this game, since the mutual beliefs of players 1 and 2 are not contradicted on the equilibrium path and given these beliefs the players maximise their expected payoffs (Fudenberg and Levine, 1993b; Fudenberg and Levine, 1998, Chap. 6). This shows that self-confirming equilibria in pure strategies are a particular case of $CE^p$ for extensive form games and Bayesian rationality (for a detailed study of these connections, see Gilli (1993)).
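The two claims behind this example, optimality of $A_i$ under the stated beliefs and consistency of those beliefs with the observed signal, can be verified numerically. A sketch under Bayesian rationality; the encoding of the tables of Figures 1 and 2 is mine.

```python
# Payoffs to players 1 and 2 as a function of (s1, s2, s3) (Figure 1):
u = {('A1', 'A2', 'L'): (1, 1), ('A1', 'D2', 'L'): (3, 0),
     ('D1', 'A2', 'L'): (3, 0), ('D1', 'D2', 'L'): (3, 0),
     ('A1', 'A2', 'R'): (1, 1), ('A1', 'D2', 'R'): (0, 3),
     ('D1', 'A2', 'R'): (0, 3), ('D1', 'D2', 'R'): (0, 3)}
# Public signals (Figure 2):
eta = {('A1', 'A2', 'L'): 'm1', ('A1', 'D2', 'L'): 'm2',
       ('D1', 'A2', 'L'): 'm3', ('D1', 'D2', 'L'): 'm3',
       ('A1', 'A2', 'R'): 'm1', ('A1', 'D2', 'R'): 'm4',
       ('D1', 'A2', 'R'): 'm5', ('D1', 'D2', 'R'): 'm5'}

pL1, pR2 = 0.25, 0.25         # player 1's P(s3 = L), player 2's P(s3 = R)

# Player 1 compares A1 and D1 against (A2, believed s3):
eu1 = {s1: pL1 * u[s1, 'A2', 'L'][0] + (1 - pL1) * u[s1, 'A2', 'R'][0]
       for s1 in ('A1', 'D1')}
# Player 2 compares A2 and D2 against (A1, believed s3):
eu2 = {s2: pR2 * u['A1', s2, 'R'][1] + (1 - pR2) * u['A1', s2, 'L'][1]
       for s2 in ('A2', 'D2')}
print(eu1)   # {'A1': 1.0, 'D1': 0.75} -> A1 optimal
print(eu2)   # {'A2': 1.0, 'D2': 0.75} -> A2 optimal
# On the equilibrium path the signal reveals nothing about s3:
assert eta['A1', 'A2', 'L'] == eta['A1', 'A2', 'R'] == 'm1'
```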

The following simple lemma underlines the dynamic characterisation of conjectural equilibria in pure strategies.

LEMMA 1. $\bar{s} \in CE^p$ if and only if $\forall i \in N$, $\{\bar{s}_i\} \subseteq J_i\big(S_{-i}(\eta_i(\bar{s}))\big)$.

Proof. Only if: Relation (b) of Definition 4 and property (i) of Definition 1 imply that

$$\forall i \in N, \quad J_i\big(\mathrm{SUPP}(\beta_i)\big) \subseteq J_i\big(S_{-i}(\eta_i(\bar{s}))\big).$$

Therefore relation (a) of Definition 4 implies that

$$\forall i \in N, \quad \{\bar{s}_i\} \subseteq J_i\big(S_{-i}(\eta_i(\bar{s}))\big).$$

If: Take any $\beta_i \in \Delta(S_{-i})$ such that $\mathrm{SUPP}(\beta_i) = S_{-i}(\eta_i(\bar{s}))$. Then in Definition 4 condition (b) is satisfied by construction and condition (a) follows immediately from the definition of $\beta_i$ and from the hypothesis

$$\forall i \in N, \quad \{\bar{s}_i\} \subseteq J_i\big(S_{-i}(\eta_i(\bar{s}))\big). \qquad ∎$$

THEOREM 1. Consider a sequence of pure strategy profiles $\{s^t\}_t$ and the set of its cluster points $S^\infty(\{s^t\})$. Then $S^\infty(\{s^t\}) \subseteq CE^p$ only if $\{s^t\}_t$ is consistent with adaptive learning.

Proof. First note that $S^\infty(\{s^t\}) \neq \emptyset$, since by assumption $S$ is finite and therefore compact. Moreover by definition of cluster point and since $S$ is finite,

$$\exists \bar{t}: \quad S^\infty(\{s^t\}) = \{s^t \mid t \ge \bar{t}\}. \qquad (5)$$

Therefore

$$\forall \bar{s} \in S^\infty(\{s^t\}),\ \forall i \in N,\ \forall \hat{t},\ \exists \bar{t}: \forall t \ge \bar{t}, \quad S_{-i}\big(\eta_i(\bar{s})\big) \subseteq \bigcup_{\hat{t} \le \tau < t} S_{-i}\big(\eta_i(s^\tau)\big). \qquad (6)$$

Now the proof of the theorem is easy. By hypothesis $\bar{s} \in CE^p$; therefore

$$\forall i \in N,\ \forall \hat{t},\ \exists \bar{t}: \forall t \ge \bar{t}, \quad \{\bar{s}_i\} \subseteq J_i\big(S_{-i}(\eta_i(\bar{s}))\big) \subseteq J_i\Big(\bigcup_{\hat{t} \le \tau < t} S_{-i}\big(\eta_i(s^\tau)\big)\Big), \qquad (7)$$

where the first inclusion is implied by Lemma 1 and the second inclusion follows from condition (i) in Definition 1 and from the inclusion in (6). Moreover by hypothesis, (7) holds for any $\bar{s} \in S^\infty(\{s^t\})$. Therefore (5) implies

$$\forall i \in N,\ \forall \hat{t},\ \exists \bar{t}: \forall t \ge \bar{t}, \quad \{s_i^t\} \subseteq J_i\Big(\bigcup_{\hat{t} \le \tau < t} S_{-i}\big(\eta_i(s^\tau)\big)\Big),$$

i.e., by definition $\{s^t\}_t$ is consistent with adaptive learning. ∎

Remarks.

1. The intuition for Theorem 1 runs as follows. Suppose that players eventually play some pure strategy CE every period, although not necessarily always the same one. In a finite game, there are a finite number of pure strategy CE that are played repeatedly. In addition, the time between plays of a particular CE that is in the limit set must be finite. But then, since adaptive learning only requires that eventually the players play a "justifiable response" each period to what each knows about the strategies that the other players used a finite number of periods before, this repeated play of CE must satisfy adaptive learning.

2. The theorem says that adaptive learning is a necessary condition for eventually playing a CE. Therefore it is not possible to obtain results for playing a CE infinitely often with a broader class of learning models than the ones that I consider.

The following theorem provides a dynamic characterisation of the set of conjectural equilibria in pure strategies when there is convergence of $\{s^t\}_t$ to a pure strategy profile.


THEOREM 2. Suppose that the sequence of strategy profiles $\{s^t\}_t$ converges to a pure strategy profile $\bar{s} \in S$. Then $\bar{s} \in CE^p$ if and only if $\{s^t\}_t$ is consistent with adaptive learning.

Proof. First note that conditions (i) and (ii) in the Definition 3 of convergence of a sequence of strategy profiles are, in the case of pure strategies, equivalent to $\exists \bar{t}: \forall t \ge \bar{t}: s^t = \bar{s}$, i.e., $\exists \bar{t}: \{s^t \mid t \ge \bar{t}\} = \{\bar{s}\}$. This implies that $\forall \hat{t} \ge \bar{t},\ \exists \tilde{t}: \forall t \ge \tilde{t},\ \{s^\tau \mid \hat{t} \le \tau < t\} = \{\bar{s}\}$. Hence

$$\forall i \in N,\ \forall \hat{t} \ge \bar{t},\ \exists \tilde{t}: \forall t \ge \tilde{t}, \quad S_{-i}\big(\eta_i(\bar{s})\big) = \bigcup_{\hat{t} \le \tau < t} S_{-i}\big(\eta_i(s^\tau)\big). \qquad (8)$$

Now the proof of the theorem is easy.

Only if: It follows trivially from Theorem 1, since in this case $S^\infty(\{s^t\}) = \{\bar{s}\}$.

If: If $\{s^t\}$ converges (first equality) and if the sequence of strategy profiles is consistent with adaptive learning (first inclusion), then

$$\forall i \in N,\ \forall \hat{t} \ge \bar{t},\ \exists \tilde{t}: \forall t \ge \tilde{t}, \quad \{\bar{s}_i\} = \{s_i^t\} \subseteq J_i\Big(\bigcup_{\hat{t} \le \tau < t} S_{-i}\big(\eta_i(s^\tau)\big)\Big) = J_i\big(S_{-i}(\eta_i(\bar{s}))\big),$$

where the second equality follows from relation (8). Therefore Lemma 1 implies that $\bar{s} \in CE^p$. ∎

Remark. Half of this theorem (only if) is just a restatement of Theorem 1, while the other half can be understood as follows. Suppose a sequence of strategies converges to a pure strategy limit. Equivalently, in a finite game, all players must always play a given strategy after some finite time. For that to be consistent with adaptive learning, each player's limit strategy must be a justifiable response to what she knows about the other players' limit strategies. This just says that the limit strategy must be a conjectural equilibrium. Therefore adaptive learning excludes any sequence that converges to any pure strategy profile other than a conjectural equilibrium.
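Theorem 2 can be illustrated on the game of Figures 1 and 2; the path below and its encoding are mine. A play path that experiments early and then settles on $(A_1, A_2)$ forever converges in the sense of Definition 3, and along the constant tail the public signal $m_1$ never allows player 1 to exclude either value of $s_3$, so a belief such as $P(L) = 1/4$ that justifies $A_1$ is never contradicted.

```python
# Public signals of Figure 2, indexed by (s1, s2, s3):
eta = {('A1', 'A2', 'L'): 'm1', ('A1', 'A2', 'R'): 'm1',
       ('A1', 'D2', 'L'): 'm2', ('A1', 'D2', 'R'): 'm4',
       ('D1', 'A2', 'L'): 'm3', ('D1', 'A2', 'R'): 'm5',
       ('D1', 'D2', 'L'): 'm3', ('D1', 'D2', 'R'): 'm5'}

def compatible_1(s1, signal):
    """S_{-1}(eta_1(s)): opponent profiles (s2, s3) that player 1 cannot
    distinguish from the truth, given her own choice s1 and the signal."""
    return {(s2, s3) for s2 in ('A2', 'D2') for s3 in ('L', 'R')
            if eta[s1, s2, s3] == signal}

# Experiment twice, then settle on (A1, A2) forever (player 3 plays L):
path = [('D1', 'A2', 'L'), ('A1', 'D2', 'L')] + [('A1', 'A2', 'L')] * 98
# From date 2 on the signal is always m1, so the set of opponent profiles
# compatible with player 1's observations never excludes s3 = R:
tail_info = set().union(*(compatible_1(s[0], eta[s]) for s in path[2:]))
print(tail_info)   # {('A2', 'L'), ('A2', 'R')}
```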

4. CONJECTURAL EQUILIBRIA IN MIXED STRATEGIES AND ADAPTIVE LEARNING

The generalisation of previous results to mixed strategy conjectural equilibria is not immediate, but to avoid repetition I do not provide here all the details (available on request). I just give some hints on the tricks involved in this operation.


First, we should distinguish two different classes of learning models, depending on the players' object of learning. In particular the players can consider payoffs relevant to either their opponents' pure strategies or their opponents' mixed strategies. Indeed the players' observations and the elaboration of this information crucially depend on the concept the players have of their strategic situation, in the sense that these observations could convey information on the mixed strategy that regulates the players' choices or could directly convey information on the pure strategies the opponents are going to use. Note that the justification operators can easily be generalised to manage different situations, whether the players are referring to their opponents' pure or mixed strategies. Therefore it is necessary to distinguish situations where the players try to infer the average behaviour of the opponents' populations, i.e., the mixed strategies of the opponents, and situations where they try to infer the opponents' pure strategy behaviour. Consequently there exist two alternative definitions of conjectural equilibrium in mixed strategies according to two different behavioral hypotheses, whether each player's behaviour is justified with reference to either the opponents' pure strategies as induced by their mixed strategies, condition (a), or directly to the opponents' mixed strategies, condition (b):

(a) $\forall i \in N,\ \forall \bar{s}_i \in \mathrm{SUPP}(\bar{\sigma}_i)$, $\exists \mu_i \in \Delta(\Sigma_{-i})$ such that $\bar{s}_i \in J_i\big(\mathrm{SUPP}(\beta[\mu_i])\big)$, where $\beta[\mu_i] \in \Delta(S_{-i})$ is defined as

$$\beta[\mu_i](\bar{s}_{-i}) := \int_{\Sigma_{-i}} \sigma_{-i}(\bar{s}_{-i})\, \mu_i(d\sigma_{-i});$$

or

(b) $\forall i \in N,\ \forall \bar{s}_i \in \mathrm{SUPP}(\bar{\sigma}_i)$, $\exists \mu_i \in \Delta(\Sigma_{-i})$ such that $\bar{s}_i \in J_i\big(\mathrm{SUPP}(\mu_i)\big)$.

On the other hand the equilibrium condition is the trivial generalisation of the pure strategy case:

$$\mathrm{SUPP}(\mu_i) \subseteq \big\{ \sigma_{-i} \in \Sigma_{-i} \mid p_i(\cdot \mid \bar{s}_i, \sigma_{-i}) = p_i(\cdot \mid \bar{s}_i, \bar{\sigma}_{-i}) \big\},$$

where $p_i[\cdot]$ is defined as $\forall \beta \in \Delta(S),\ \forall m_i \in M_i,\ p_i[\beta](m_i) := \sum_{\{s \mid \eta_i(s) = m_i\}} \beta(s)$. For a better understanding of these notions, consider the special case of perfect monitoring, i.e., when $\eta_i$ is the identity function for each player, and of Bayesian rationality. Then it is not difficult to show that the first definition characterises the set of (normal form) rationalizable strategies, while the second coincides with Nash equilibria.
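When the belief $\mu_i$ has finite support the integral defining $\beta[\mu_i]$ reduces to a weighted sum, as in this sketch; the data structures are mine and assume independent opponent mixed strategies.

```python
from itertools import product

def induced_beta(mu):
    """beta[mu_i](s_{-i}) = sum_k mu(sigma_k) * sigma_k(s_{-i}), where each
    sigma_k is a tuple of {pure strategy: probability} dicts, one per
    opponent, and mu is a list of (sigma_k, weight) pairs."""
    beta = {}
    for sigma, weight in mu:
        for profile in product(*(d.keys() for d in sigma)):
            prob = weight
            for d, s in zip(sigma, profile):
                prob *= d[s]
            beta[profile] = beta.get(profile, 0.0) + prob
    return beta

# Player 1's belief: with prob 1/2 the opponents play (A2, L) for sure;
# with prob 1/2 player 2 plays A2 and player 3 mixes 50-50 over L, R.
mu_1 = [(({'A2': 1.0}, {'L': 1.0}), 0.5),
        (({'A2': 1.0}, {'L': 0.5, 'R': 0.5}), 0.5)]
print(induced_beta(mu_1))   # {('A2', 'L'): 0.75, ('A2', 'R'): 0.25}
```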


Second, we should consider the information available to player $i$ at $t$. Different learning models can be related to different situations of strategic interaction (see Battigalli et al. (1992) for a detailed description of the different ways to model learning in noncooperative game theory). In all these models even if each agent chooses a pure strategy, a player's behaviour is described by the statistical distribution of the agents' behaviour, i.e., by a mixed strategy. But the players' information is different according to the specific model: the relevant observation at $t$ can be the statistical distribution of the signals as generated by the statistical distribution of agents' behaviour $\bar{\sigma}^t \in \Delta(S)$, namely $p_i[\bar{\sigma}^t] \in \Delta(M_i)$, or it can be the realised signal $\eta_i(s^t)$, drawn with probability $p_i[\bar{\sigma}^t]$.

Then, given these notes, it is not difficult to prove the analogs of Theorems 1 and 2 for the mixed strategy case, following a similar pattern of proof if at each date the players observe the statistical distribution of the signals of their population $p_i[\bar{\sigma}^t]$, and otherwise assuming consistency in a statistical sense. For example, a plausible assumption is that if $\bar{\sigma}^t$ converges to $q$, then for every $i \in N$ the empirical distribution of the observed signals converges to $p_i[q]$. Intuitively this means that given the convergence of $\bar{\sigma}^t$, the agents can consistently estimate the statistical distribution of the signals since these are randomly extracted from the stationary unknown probability distribution $p_i[q] \in \Delta(M_i)$.

5. CONCLUSION

The results obtained in the previous sections are interesting because of the generality of the setting I consider in this paper, even if one may object that my approach justifies everything (see Marimon (1997) for effective arguments for the empirical content of learning models). A better understanding of the relevance of my results may be provided by their specialisation to more standard cases. In particular, consider extensive form games and assume Bayesian rationality; then conjectural equilibria and self-confirming equilibria coincide and thus the previous results can be read as follows:

1. Consider a sequence of strategy profiles $\{s^t\}_t$ and the set of its cluster points $S^\infty(\{s^t\})$. Then any $\bar{s} \in S^\infty(\{s^t\})$ is a self-confirming equilibrium only if $\{s^t\}_t$ is consistent with adaptive learning (Theorem 1).

2. Suppose that a sequence of strategy profiles $\{s^t\}_t$ converges to $\bar{s} \in S$. Then $\bar{s}$ is a self-confirming equilibrium if and only if $\{s^t\}_t$ is consistent with adaptive learning (Theorem 2).

3. Consider a sequence of strategy profiles $\{s^t\}_t$ and the set of its cluster points $S^\infty(\{s^t\})$. Then any $\bar{s} \in S^\infty(\{s^t\})$ is rationalizable (in the normal form) if and only if $\{s^t\}_t$ is consistent with a weak form of adaptive learning (Section 4).

4. Suppose that a sequence of statistical distributions of strategy profiles $\{\bar{\sigma}^t\}_t$ converges to a possibly correlated strategy profile $q \in \Delta(S)$. Then $q$ is a self-confirming equilibrium if and only if $\{\bar{\sigma}^t\}_t$ is consistent with a strong form of adaptive learning (Section 4).

REFERENCES

1. Battigalli, P., Gilli, M., and Molinari, C. (1992). "Learning and Convergence to Equilibrium in Repeated Strategic Interactions: An Introductory Survey," Ricerche Economiche XLVI(3-4), 335-378.

2. Binmore, K. (1981). Topological Ideas, Cambridge: Cambridge Univ. Press.

3. Börgers, T. (1993). "Pure Strategy Dominance," Econometrica 61, 423-430.

4. Fudenberg, D., and Kreps, D. "Learning, Experimentation and Equilibrium in Games," mimeo, Stanford University.

5. Fudenberg, D., and Levine, D. (1993a). "Steady State Learning and Nash Equilibrium," Econometrica 61, 547-574.

6. Fudenberg, D., and Levine, D. (1993b). "Self Confirming Equilibrium," Econometrica 61, 523-546.

7. Fudenberg, D., and Levine, D. (1998). The Theory of Learning in Games, Cambridge, MA: MIT Press.

8. Gilli, M. (1993). "On Non-Nash Equilibria," Working Paper 93-05, Bocconi University; Games and Economic Behavior, forthcoming.

9. Gilli, M. (1995). "Models of Strategic Interaction: A Case for Imperfect Monitoring Games," mimeo, King's College.

10. Hendon, E., Jacobsen, H., and Sloth, B. (1995). "Adaptive Learning in Extensive Form Games and Sequential Equilibrium," Discussion Paper 95-08, University of Copenhagen.

11. Lehrer, E. (1989). "Lower Equilibrium Payoffs in Two-Player Repeated Games with Non-Observable Actions," International Journal of Game Theory 18, 57-89.

12. Lehrer, E. (1990). "Nash Equilibria of n-Player Games with Semi-Standard Information," International Journal of Game Theory 19, 191-217.

13. Lehrer, E. (1991). "Internal Correlation in Repeated Games," International Journal of Game Theory 19, 431-456.

14. Lehrer, E. (1992). "On the Equilibrium Payoffs Set of Two Player Repeated Games with Imperfect Monitoring," International Journal of Game Theory 20, 211-226.

15. Marimon, R. (1993). "Adaptive Learning, Evolutionary Dynamics and Equilibrium Selection in Games," European Economic Review 37, 603-611.

16. Marimon, R. (1997). "Learning From Learning in Economics," in Advances in Economics and Econometrics: Theory and Applications (D. Kreps and K. Wallis, Eds.), Vol. 1, pp. 278-315, Cambridge, UK: Cambridge Univ. Press.

17. Marimon, R., and McGrattan, E. (1992). "On Adaptive Learning in Strategic Games," in Learning and Rationality in Economics (A. Kirman and M. Salmon, Eds.), pp. 158-193, Oxford: Blackwell.

18. Milgrom, P., and Roberts, J. (1990). "Rationalizability, Learning and Equilibrium in Games with Strategic Complementarities," Econometrica 58, 1255-1277.

19. Milgrom, P., and Roberts, J. (1991). "Adaptive and Sophisticated Learning in Normal Form Games," Games and Economic Behavior 3, 82-100.