fingerprint analysis of the noisy prisoner’s dilemma using...

14
Fingerprint Analysis of the Noisy Prisoner’s Dilemma Using a Finite State Representation Daniel Ashlock Mathematics and Statistics University of Guelph, Guelph, Ontario Canada N1G 2W1 [email protected] Eun-Youn Kim National Institute For Mathematical Sciences 385-16, Doryong-dong, Yuseong-gu, DaeJeon 305-340, South Korea [email protected] Wendy Ashlock Department of Computer Science and Engineering York University Toronto, Ontario Canada M3J 1T3 [email protected] Abstract— Fingerprinting is a technique that permits auto- matic classification of strategies for playing a game. In this study the evolution of strategies for playing the iterated pris- oner’s dilemma at three different noise levels is analyzed using fingerprinting and other techniques including a novel quantity, evolutionary velocity, derived from fingerprinting. The results are at odds with initial expectations and permit the detection of a critical difference in the evolution of agents with and without noise. Noise during fitness evaluation places a larger fraction of an agent’s genome under selective pressure, resulting in substantially more efficient training. In this case efficiency is the production of superior competitive ability at a lower evolutionary velocity. Prisoner’s dilemma playing agents are evolved for 6400 generations, taking samples at eight exponentially-spaced epochs. This permits assessment of the change in populations over long evolutionary time. Agents are evaluated for competitive ability between those evolved for different lengths of time and between those evolved using distinct noise levels. The presence of noise during agent training is found to convey a commanding competitive advantage. A novel analysis is done in which a tournament is run with no two agents from the same evolutionary line and one-third of agents from each noise level studied. This analysis simulates contributed agent tournaments without any genetic relation between agents. It is found that in early epochs the agents evolved without noise have the best average tournament rank, but that in later epochs they have the worst. I. I NTRODUCTION This study is a substantially lengthened version of [13]. The experiments were re-run to permit the gathering of new types of data and several additional analyses were performed. A new quantity, evolutionary velocity, is introduced in this study and is used to expose an unexpected and critical difference between agents evolved with and without noise using the finite state representation. The prisoner’s dilemma [18], [17] is a classic model in game theory. Two agents each decide, without communication, whether to cooperate (C) or defect (D). The agents receive individual payoffs depending on the actions taken. The payoffs used in this study are shown in Figure 1. The payoff for mutual cooperation C is the cooperation payoff. The payoff for mu- tual defection D is the defection payoff. The two asymmetric action payoffs S and T , are the sucker and temptation payoffs, respectively. In order for a two-player simultaneous game to be considered prisoner’s dilemma, it must obey the following pair of inequalities: S D C T (1) and 2C (S + T ). (2) In the iterated prisoner’s dilemma (IPD) the agents play many rounds of the prisoner’s dilemma. IPD is widely used to model emergent cooperative behaviors in populations of selfishly acting agents and is often used to model systems in biology [29], sociology [23], psychology [28], and economics [22]. S C D P C 3 5 D 0 1 (1) S C D P C C T D S D (2) Fig. 1. (1)The payoff matrix for prisoner’s dilemma used in this study – scores are earned by strategy S based on its actions and those of its opponent P (2) A payoff matrix of the general two player game – C,T,S, and D are the scores awarded. The study in [10], continued in [5], investigates the effect of changing the representation used for a prisoner’s dilemma agent. The representations covered by the two studies are two versions of feed forward neural nets (one biased at the neuron level toward cooperation), Boolean parse trees [19], with and without a one-step time-delay operation, a linear genetic programming representation called an ISAc list [1], lookup tables, a type of Markov chain [27], and both a direct and cellular [5] representation of finite state machines. The change of representation, with other factors held as near to constant as possible, yielded a change from 0% to 95% in the probability that final populations were cooperative. This study uses one of these representations, finite state machines, investigating the probability of cooperative behavior in the

Upload: others

Post on 22-Oct-2019

12 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Fingerprint Analysis of the Noisy Prisoner’s Dilemma Using ...eldar.mathstat.uoguelph.ca/dashlock/eprints/FingerprintsIII.pdfFingerprint Analysis of the Noisy Prisoner’s Dilemma

Fingerprint Analysis of the Noisy Prisoner’sDilemma Using a Finite State Representation

Daniel AshlockMathematics and Statistics

University of Guelph,Guelph, Ontario

Canada N1G [email protected]

Eun-Youn KimNational Institute ForMathematical Sciences385-16, Doryong-dong,

Yuseong-gu, DaeJeon 305-340,South Korea

[email protected]

Wendy AshlockDepartment of Computer Science

and EngineeringYork UniversityToronto, OntarioCanada M3J 1T3

[email protected]

Abstract— Fingerprinting is a technique that permits auto-matic classification of strategies for playing a game. In thisstudy the evolution of strategies for playing the iterated pris-oner’s dilemma at three different noise levels is analyzed usingfingerprinting and other techniques including a novel quantity,evolutionary velocity, derived from fingerprinting. The results areat odds with initial expectations and permit the detection of acritical difference in the evolution of agents with and withoutnoise. Noise during fitness evaluation places a larger fractionof an agent’s genome under selective pressure, resulting insubstantially more efficient training. In this case efficiency is theproduction of superior competitive ability at a lower evolutionaryvelocity. Prisoner’s dilemma playing agents are evolved for 6400generations, taking samples at eight exponentially-spaced epochs.This permits assessment of the change in populations overlong evolutionary time. Agents are evaluated for competitiveability between those evolved for different lengths of timeandbetween those evolved using distinct noise levels. The presenceof noise during agent training is found to convey a commandingcompetitive advantage. A novel analysis is done in which atournament is run with no two agents from the same evolutionaryline and one-third of agents from each noise level studied.This analysis simulates contributed agent tournaments withoutany genetic relation between agents. It is found that in earlyepochs the agents evolved without noise have the best averagetournament rank, but that in later epochs they have the worst.

I. I NTRODUCTION

This study is a substantially lengthened version of [13]. Theexperiments were re-run to permit the gathering of new typesof data and several additional analyses were performed. A newquantity,evolutionary velocity, is introduced in this study andis used to expose an unexpected and critical difference betweenagents evolved with and without noise using the finite staterepresentation.

The prisoner’s dilemma [18], [17] is a classic model ingame theory. Two agents each decide, without communication,whether to cooperate (C) or defect (D). The agents receiveindividual payoffs depending on the actions taken. The payoffsused in this study are shown in Figure 1. The payoff for mutualcooperationC is thecooperationpayoff. The payoff for mu-tual defectionD is thedefectionpayoff. The two asymmetricaction payoffsS andT , are thesuckerandtemptationpayoffs,

respectively. In order for a two-player simultaneous game tobe considered prisoner’s dilemma, it must obey the followingpair of inequalities:

S ≤ D ≤ C ≤ T (1)

and2C ≤ (S + T ). (2)

In the iterated prisoner’s dilemma(IPD) the agents playmany rounds of the prisoner’s dilemma. IPD is widely usedto model emergent cooperative behaviors in populations ofselfishly acting agents and is often used to model systems inbiology [29], sociology [23], psychology [28], and economics[22].

S

C D

P C 3 5D 0 1(1)

S

C D

P C C T

D S D

(2)

Fig. 1. (1)The payoff matrix for prisoner’s dilemma used in this study –scores are earned by strategyS based on its actions and those of its opponentP (2) A payoff matrix of the general two player game –C, T, S, andD arethe scores awarded.

The study in [10], continued in [5], investigates the effectof changing the representation used for a prisoner’s dilemmaagent. The representations covered by the two studies aretwo versions of feed forward neural nets (one biased at theneuron level toward cooperation), Boolean parse trees [19],with and without a one-step time-delay operation, a lineargenetic programming representation called an ISAc list [1],lookup tables, a type of Markov chain [27], and both a directand cellular [5] representation of finite state machines. Thechange of representation, with other factors held as near toconstant as possible, yielded a change from 0% to 95% inthe probability that final populations were cooperative. Thisstudy uses one of these representations, finite state machines,investigating the probability of cooperative behavior in the

Page 2: Fingerprint Analysis of the Noisy Prisoner’s Dilemma Using ...eldar.mathstat.uoguelph.ca/dashlock/eprints/FingerprintsIII.pdfFingerprint Analysis of the Noisy Prisoner’s Dilemma

presence of noise. The duration of evolution in [10], [5] was250 generations with samples taken at 50, 100, 150, 200, and250 generations. Little effect on the cooperativeness of agentswas observed at different epochs. This study demonstrates adifferent result for noisy strategies evolved for a longer time.

In [11] it was found that evolving agents to play theiterated prisoner’s dilemma for a long time gave them asubstantial competitive advantage against agents evolvedforless time from different evolutionary lines. This phenomenon,called non-local adaptation, suggests that agents are gainingnot just skill playing the agents with whom they are co-evolving but general skill at playing the prisoner’s dilemma;they have a statistically significant advantage against agentsthey have never been evaluated against before. In [12] itwas demonstrated that non-local adaptation takes place in asteady fashion across much of evolution. This study extendsmeasurement of competitive advantage to include the effectsof noise as well. Non-local adaptation is also observed inother co-evolutionary contexts. In [21] it was observed incompetitive exclusion in a spatial model of plant growth. In[3] it was observed in populations of virtual robots evolving topaint a floor in two competing colors. The effect was observedin predator-prey models in [2] and in populations of agents ina game calleddivide the dollarin [4].

A prisoner’s dilemma fingerprintis a function in twovariables,x and y ranging from 0 to 1, whose value is thestrategy’s expected score when playing against an infinite testsuite of strategies called Joss-Ann strategies. These strategieswere chosen to be as representative as possible of the full spaceof strategies. They are defined using the strategies TFT andPsycho described in Table I. Ifx+y ≤ 1, the Joss-Ann strategyplays C with a probability ofx and D with a probability ofyand TFT otherwise. Ifx + y ≥ 1, the Joss-Ann strategy playsC with a probability of1 − y, D with a probability of1 − x

and Psycho otherwise.Note that in either case whenx + y = 1, the strategy is

just the random strategy which cooperates with probabilityx and defects with probabilityy. For more details aboutfingerprinting see [24], [7], [5], [1], [6], [15]. This studyusesan approximation to the fingerprint function consisting of 25numbers which are the values of the fingerprint function on a 5by 5 evenly-spaced grid from 1/6 to 5/6. Given that fingerprintsare functions they are a subset of a potentially infinitelydimensioned space, making the use of a 25-point samplinggrid potentially suspect. In [15] it is proved (Corollary 2)thatthe space of fingerprint functions is at most 6-dimensional,making a 25 point sample a generous one.

Fingerprints are developed in detail in three theses [30],[24], [15]. These results are summarized and extended in [6]. Aportion of the theory of fingerprints and an initial application tothe visualization of evolved agents appears in [7]. Additionalapplications as well as a marriage of fingerprints with a newtechnique calledmulti-clusteringappear in [8]. One applica-tion of fingerprints is to place a metric-space structure on thespace of prisoner’s dilemma strategies. In this study, the metricspace structure induced by fingerprints is used to compute a

new quantity: evolutionary velocity. The evolutionary velocityof a population is the distance that its mean fingerprint movesin one generation.

Fingerprinting was used in [16], with a finite state repre-sentation, to demonstrate that the strategies that arise havedifferent distributions for different population sizes and indifferent epochs. The latter result, that strategies rare or absentat the beginning of evolution become common after thousandsof generations of evolution, was surprising. In [10] fingerprintswere used to demonstrate that the rate of appearance ofseveral well-known strategies varied between a direct finitestate representation for prisoner’s dilemma playing agents, acellular representation for finite state agents, and a new typeof representation called afunction stack, a modified form ofCartesian Genetic Programming [25].

The remainder of the study is structured as follows. Thedetails of experiments performed are given in Section II as aredetails of the analysis techniques used. Results and discussionare presented in Section III. Conclusions and possible nextsteps for both the improvement of fingerprinting techniquesand applications are given in Section IV.

II. EXPERIMENTAL DESIGN

The agent representation used in this study is 8-state fi-nite state machines with actions associated with transitionsbetween states (Mealy machines). State transitions are drivenby the opponent’s last action. Access to state informationpermits the machine to condition its play on several of itsopponent’s previous moves. The machines are stored as linearchromosomes listing the states. The initial state and action fora given machine are stored with and undergo crossover withthe first state in this linear chromosome.

Two variation operators are employed, a binary variationoperator and a unary variation operator. The binary variationoperator used is two-point crossover on the list of states.Crossover treats states as atomic objects. The mutation opera-tor changes a single state transition 40% of the time, the initialstate used by the machine 5% of the time, the initial action5% of the time, or an action associated with a transition 50%of the time. The unary variation operator replaces the currentvalue of whatever it is changing with a valid value selecteduniformly at random.

The evolutionary algorithm used in this study operates ona population of 36 agents, a number chosen for compatibilitywith previous studies [12], [9], [10]. Agent fitness is assessedby a round-robin tournament in which each pair of playersengage in 150 rounds of the iterated prisoner’s dilemma. Threecollections of runs were performed with different versionsofthis round-robin tournament. The first, serving as a baseline,had perfect communication between the agents. The secondand third had an average of 1% and 5%, respectively, of actionsmisunderstood by the opponent. This is one of two formsof possible noise; the other (not used in this study) has theactual nature of the action reversed rather than the opponent’sperception of it.

Page 3: Fingerprint Analysis of the Noisy Prisoner’s Dilemma Using ...eldar.mathstat.uoguelph.ca/dashlock/eprints/FingerprintsIII.pdfFingerprint Analysis of the Noisy Prisoner’s Dilemma

Reproduction is elitist with an elite of the 24 highest scoringstrategies, another choice that maintains consistency with paststudies. When constructing the elite, ties are broken uniformlyat random. Twelve pairs of parents are picked by fitness-proportional selection with replacement on the elite. Parentsare copied, and the copies are subjected to crossover andmutation.

In each simulation the evolutionary algorithm was runfor 6400 generations with 100 replicates (experiments withdistinct random number seeds). The elite portion of the pop-ulation in generations 50, 100, 200, 400, 800, 1600, 3200,and 6400 was saved for analysis. This yields 100 sets of 24machines at each of eight epochs for each of the three noiselevels. A number of descriptive statistics are saved for eachgeneration of each replicate. These include the mean fitness, a95% confidence interval on the fitness, the maximum fitness,and the evolutionary velocity of the population, defined inSection II-C.

Many of the parameters used in this study are either selectedin imitation of previous studies or are essentially arbitrarily.A careful parameter-variation study would require millionsof collections of evolutionary runs because of combinatorialexplosion of the parameter space. The current study representsa small slice of this diverse space of possible experiments.We now carefully describe the ways in which the evolvedmachines were analyzed.

A. Probability of Cooperativeness

For each epoch and noise level, the fitness (average pris-oner’s dilemma score) of the saved elite population in anoise-free tournament was computed. Following [10], [5] apopulation was judged to becooperativeif its average scorewas at least 2.8 out of a maximum possible of 3.0. The fractionof populations that were cooperative was modeled as theparameter of a binomial distribution; Figure 2 summarizes theresults. The choice to evaluate all populations without noisepermits an apples-to-apples comparison of the populationsevolved at different noise levels.

B. Competitiveness and Nonlocal-Adaptation

While both competitiveness and non-local adaptation wereanalyzed in [13] a different form of these analyses are per-formed in this study. The results are in substantial agreement,but the method of analysis has been revised and the presen-tation of the information has been substantially modified inamanner that enhances clarity.

Competitiveness is measured between sets of agents evolvedusing different fitness evaluation processes. It is intended toassess the impact on the agent’s competitive ability of thosedistinct fitness evaluation processes. In this study there arethree different fitness evaluation processes, all based on roundrobin tournaments, but using different levels of noise. Weevaluate competitiveness in the zero-noise environment – apractice that would logically give the agents evolved in thatenvironment an advantage. The zero noise environment waschosen because it yields deterministic results on any givenpair

of agents. As we shall see, agents evolved without noise didnot benefit from its lack when evaluated against agents evolvedin the presence of noise. Non-local adaptation is measuredbetween agents from distinct epochs but evolved using thesame fitness evaluation process. It measures the impact ofadditional evolution.

Save for the identity of the sets of agents being comparedboth competitiveness and non-local adaptation are measuredin the same fashion. All agents for a given noise level andepoch are loaded into agent-pools. All pairs of distinct agent-pools are then compared. The comparison is performed bysampling 10,000 pairs of agents, each pair consisting of oneagent from each of the agent-pools being compared. Each pairplays 150 rounds of iterated prisoner’s dilemma. Each time anagent from the first pool out scores an agent from the secondpool asuccessis recorded. Each time an agent from the secondpool out scores an agent from the first afailure is recorded. Thesuccesses and failures are modeled as Bernoulli experimentsusing a normal approximation to the binomial distribution.TheBernoulli parameter being estimated is the probability an agentfrom the first pool will beat an agent from the second. A 95%confidence interval is constructed for the Bernoulli parameterfor each pair of agent-pools. Examples of the presentation ofthese confidence intervals for competitiveness appear in Figure5 while those for non-local adaptation appear in Figure 6.

C. Evolutionary Velocity

A velocity is the change of a quantity over time. Thenatural measure of time in the evolutionary algorithms usedin this study is generations. The fingerprints of prisoner’sdilemma playing agents used in this study map the strategiesto points in 25-dimensional Euclidean space. A populationmay then be said to have a mean position in fingerprint spacein each generation. Theevolutionary velocityof an evolvingpopulation in a given generation is defined to be the change inthe position of the mean fingerprint of the population betweenthe last generation and the current one using the standardEuclidean distance.

The small, well-mixed populations used in this study ensurethat a population will move to and stay near a state of relativelylow diversity. When a population is relatively non-diverse, thefingerprints will be close together. A common event in anevolving population of game playing agents is a successionevent. This happens when a new agent type appears thatcan exploit a flaw in the current dominant agent type. Thepopulation diversity briefly increases as the new agent typetakes over. There should be some succession events visibleas simultaneous changes in the population average score andspikes in the evolutionary velocity.

Based on earlier work [13] we know that being exposedto noise yields a competitive advantage to prisoner’s dilemmaplaying agents, an observation confirmed via a different eval-uation of competitive ability in Section III-A of this study.Because of this we conjectured that evolutionary velocitywould be lowest in the noise-free environment and highestin the environment where agents were evolved with the most

Page 4: Fingerprint Analysis of the Noisy Prisoner’s Dilemma Using ...eldar.mathstat.uoguelph.ca/dashlock/eprints/FingerprintsIII.pdfFingerprint Analysis of the Noisy Prisoner’s Dilemma

noise. In Section III-B we test and discuss this incorrectconjecture.

D. Two Forms of Competitiveness

Non-local adaptation and competitiveness both measure theability of two agent types to out-score one another in apairwise competition. Axelrod’s famous tournaments [18],[17]measured a completely different sort of competitive ability.This is the ability to have the highest score in a tournamentwith a diverse collection of other players. The strategy TFTcannot beat another player in a pairwise contest. The only wayto get ahead of another player is to defect against them whenthey cooperate. TFT never defects first and so always defectsagainst its opponent’s cooperation on a move directly afterthe opponent does the same. In spite of this, TFT won bothof Axelrod’s tournaments. This is because, even though it wasout-scored by all its opponents, it elicited cooperation inmanyof them. Its opponents, on the other hand, sometimes got intovery low-scoring sequences of defections with one another.Atthe end of the tournaments TFT had the highest total score.

We create a contest similar to a contributed agent contestin the following fashion. A contest consists of a round robintournament with 36 agents, the number used in fitness evalu-ation during evolution, using 150 rounds of iterated prisoner’sdilemma. In order to prevent genetically-based collaboration,no two agents in a given tournament were selected fromthe same replicate. Twelve agents were selected from eachof the three available noise levels and all agents in a giventournament were selected from the same epoch. The agentswere selected from the saved elite population uniformly atrandom and the replicates used in a tournament were alsoselected uniformly at random, without replacement. A set of10,000 tournaments were performed. For each epoch 95%confidence intervals for the mean population rank of the agentsfrom each noise level were computed.

E. Fingerprint Analysis: Voronoi Tiles

In order to analyze prisoner’s dilemma strategies one cantry to find some way to define and name each individualstrategy. Table I lists twelve strategies with names. However,this naming method quickly becomes intractable as the numberof strategies increases. Also, it gives no objective way tocompare the strategies. A prisoner’s dilemma fingerprint isa way of assigning a real-valued function to each strategy, oras with the simplified version used in this study, a point in25-space which consists of 25 values of that function.

In order to analyze the evolved strategies, twelve referencestrategies, given in Table I, were selected. These strategiesrepresent all unique fingerprints that appear in the collection oflookup tables for prisoner’s dilemma based on the opponent’slast two moves[15]. They are all equivalent to one-state ortwo-state finite state machines. A depiction of their relativepositions in fingerprint space is shown in Figure 3.

The evolved strategies in this study are analyzed in terms ofwhich of these strategies they are closest to. This technique iscalled aVoronoi tiling or aDirichlet tessellation. Explanations

of the technique can be found in many places; one good oneis [20]. If a strategy is closer to Strategy S than it is to anyother reference strategy, then it is said to be in the StrategyS tile. The assumption is that strategies with fingerprints thatare close together behave similarly, so we can group themtogether and analyze them in terms of the simple strategiesthat we understand well and in terms of where they fall infingerprint space.

The space of fingerprint functions might be as much as 6-dimensional. However, the error is small when strategies areprojected onto two dimensions using non-linear projection, atechnique defined in [8] and applied in [14] to RNA folding.The technique uses an evolutionary algorithm to project multi-dimensional points onto a 2-dimensional picture preservingthe distance relationships between the points as closely aspossible. Thus, it is reasonable to conceptualize the spacein two dimensions. The greatest possible fingerprint distancebetween any two strategies is that between ALLD and ALLC(proved in [15] Theorem 3); the next greatest distance isbetween TFT and Psycho. One can visualize fingerprint spaceas the interior of the diamond formed by these four strategiesas shown in Figure 3.

III. R ESULTS AND DISCUSSION

In an early study of the impact of noise on the iteratedprisoner’s dilemma [26] it was found that higher levels ofnoise yielded lower levels of cooperation. This study evolved16-state finite state machines using the Moore encoding for 50generations with the same noise levels used in this study. Thetop panel of Figure 2 demonstrates that allowing additionalevolutionary time changes this result. In the final epochsamples, and only in that epoch, the significant differencesbe-tween the probabilities that a population will be cooperative atdifferent noise levels disappears. The noise free and 1% noisepopulations never exhibit a significant difference, thoughthe1% noise populations are less cooperative in 7 of 8 epochs. The5% noise populations, however, show a steady and statisticallysignificant increase in their ability to cooperate. It appears thatthey are learning strategies which allow cooperation in thepresence of noise.

The assessment of cooperativeness was performed in anoise-free environment to permit apples-to-apples comparisonof the probability of cooperativeness. The bottom panel ofFigure 2 displays the average fitness of populations, averagedover all 100 replicates. The experiments with 1% noise appearto pay a small fitness penalty for noise events, while the5% noise experiments pay a larger one. The 5% noise levelexperiments have a substantial upward trend in fitness overthe course of evolution. This is evidence that the strategiesevolving under 5% noise are adapting nontrivially to thepresence of noise.

A. Competitiveness and Non-local Adaptation

Results for competitiveness are presented in Figure 5. Sam-pling pairs of players as a win/loss Bernoulli variable yieldsentirely comparable results to placing whole populations into

Page 5: Fingerprint Analysis of the Noisy Prisoner’s Dilemma Using ...eldar.mathstat.uoguelph.ca/dashlock/eprints/FingerprintsIII.pdfFingerprint Analysis of the Noisy Prisoner’s Dilemma

TABLE I

STRATEGIES USED AS REFERENCE STRATEGIES FORVORONOI TILING.

Abbrev. Name DescriptionALLD Always defect This strategy always defects.2TFT Two-tits-for-tat This strategy defects twice in response to defection and otherwise cooperates.TFT Tit-for-tat. This strategy does whatever its opponent did last time.TF2T Tit-for-two-tats This strategy cooperates except after a sequence of two defections.UC Usually cooperate This strategy cooperates except after a C following a D.ALLC Always cooperate This strategy always cooperates.(2TFT) Inverse two-tits-for-tat This strategy does the opposite of what 2TFT would do.PSYC Psycho This strategy does the opposite of what its opponent did lasttime.TFT-PSY Tit-for-tat-psycho This strategy plays like TFT until its opponent defects;

then it plays like PSYCHO until its opponent cooperates.PSY-TFT Psycho-tit-for-tat This strategy plays like PSYCHO until its opponent defects;

then it plays like TFT until its opponent cooperates.(TF2T) Inverse tit-for-two-tats This strategy does the opposite of what TF2T would do.UD Usually defect This strategy defects except after a D following a C.

competition as in [13]. The main result is that evolution in thepresence of noise yields a competitive advantage. This is inspite of the fact that the arena for evaluating competitivenessis the noise-free environment. The use of two noise level inthe experimentation also demonstrates that 5% noise yieldsamore commanding advantage.

Figure 5 displays 64 comparisons representing each possiblepair of saved epochs for each of three noise level. Eachcomparison comprises 10,000 pairwise tests of agents and ismade in the form of a confidence interval on the probabilitythat the agent type indexing the row will beat the agent typeindexing the column. Agents evolved with a lower noise levelindex the columns. There are three categories of outcome:row victory, column victory, or a failure to find a significantcompetitive advantage. These have been gray-scale coded withlight gray representing column victory, dark gray representingrow victory, and medium gray representing a result in whichthe confidence interval for probability of row victory includes0.5.

The grey-scale coding permits the viewer to immediately seethat all epochs of agents evolved at 5% noise beat all epochs ofagents evolved at 0% noise, in spite of the fact that additionalevolution grants competitive advantage when the noise level isnot varied. The time factor becomes important in the currentexperiments in the 0:1 and 1:5 comparisons where agents froma substantially later epoch achieve victory over agents evolvedat a higher noise level. One is tempted to conjecture that allthree populations are becoming more competitive, and that theagents evolving in the presence of noise are simply doing sofaster. The evidence presented in Section III-B suggests thatthis is, at best, an oversimplification of the true state of affairs.

The results for non-local adaptation, shown in Figure 6 usethe same grey-scale encoding scheme as the competitivenessresults. These matrices have a form of skew-symmetry, withentries opposite across the main diagonal being mirror imagesof one another. This yields 28 distinct comparisons, in the formof pairs of symmetric entries off of the diagonal. However,they are less easy to interpret. The original observation ofnon-local adaptation [11] compared two epochs, 100 and 10,000

generations, of a noise free prisoner’s dilemma structuredas aspatial evolutionary algorithm. The current experiments havea much simpler design but report and compare eight epochsrather than two. The hypothesis tested in the initial non-localadaptation study was that additional evolution grants a com-petitive advantage even against agent types not encounteredbefore. The initial study supported this hypothesis; the resultsin [13] and here are consistent, but the new results reveal amore complex situation.

A comparison of two epochs isconsistentwith the non-localadaptation hypothesis (NLA hypothesis) if either the more-evolved population has a significant advantage or if neitherpopulation has a significant advantage. In the zero-noise exper-iments, summarized in the top panel of Figure 6, 26 of the 28comparisons are consistent with the NLA hypothesis. Both theinconsistent comparisons are on the sub-diagonal representingadjacent epochs. In the experiments with 1% noise, 25 of28 comparisons are consistent with the NLA hypothesis. Thethree comparisons that are inconsistent are the comparisons ofthe three most evolved populations. These three populationsalso have the greatest temporal separation. The situation growssubstantially worse for the NLA hypothesis in the experimentswith 5% noise, where only 18 of the 28 comparisons areconsistent with the hypothesis. The 5% noise agents sampledat epoch 6400 are beaten by epochs 200-3200. This is clearevidence of a substantial decline in competitive ability for theagents evolved with 5% noise at deep time against youngeragents evolved at 5% noise.

An explanation for this retrograde non-local adaptation liesin Figure 2. The strategies evolved at 5% noise begin a sharpincrease in cooperativeness in the 5th-8th epoch. This corre-sponds exactly with the sudden decline in competitive abilityin the third column, 5th-8th rows of the bottom panel of Figure6. There is an apparent tradeoff between competitive abilityand the ability to cooperate in the presence of noise. As theygain the ability to cooperate in a noise environment, the agentsevolved in a high-noise environment become more tolerant of(possibly-noise induced) defections. The agents from earlierepochs, presumably less tolerant of defection, exploit this

Page 6: Fingerprint Analysis of the Noisy Prisoner’s Dilemma Using ...eldar.mathstat.uoguelph.ca/dashlock/eprints/FingerprintsIII.pdfFingerprint Analysis of the Noisy Prisoner’s Dilemma

50 100 200 400 800 1600 3200 6400

0% Noise

1% Noise

5% Noise

95% confidence interval on probability of cooperative play.

Generations

Pro

babi

lity

of C

oope

ratio

n

0.00

0.25

0.50

0.75

1.00

1.8

2

2.2

2.4

2.6

2.8

3

0 1000 2000 3000 4000 5000 6000

Fitn

ess

Generations

Mean fitness over all replicates

0% noise1% noise5% noise

Fig. 2. Shown are 95% confidence intervals on the probabilitythat apopulation in a given epoch and noise level will play cooperatively in theabsence of noise (top) and the mean value over all 100 replicates of the meanpopulation fitness over the course of evolution.

tolerance. This highlights the dilemma of cooperation in anoisy environment: how do we tell a true defection from thenoise-induced appearance of defection?

Note that the additional tolerance of noise events in the morehighly evolved agents from the 5% noise environment doesnot mean they are becoming pushovers. The between-noise-level comparisons show that the agents are still very strongcompetitors; the loss of competitive ability in later epochs actsin a purely intramural manner.

B. Evolutionary Velocity

Fingerprint based evolutionary velocity is introduced inthis study. When small, well-mixed populations of Prisoner’sdilemma playing agents are trained with an evolutionaryalgorithm the populations rapidly become non-diverse. Thismeans that, most of the time, a single agent type with minorvariation dominates the population. An important event is thesuccessionin which a new agent type, able to out-compete thecurrent dominant type, appears and takes over the population.Succession events are sometimes accompanied by suddenchanges in the population mean fitness. In other cases the

����

������

����

��

���

�� ���

��� ��

������

����

����

Fig. 3. Stylized depiction of tiles in fingerprint space.

fitness difference is relatively small and the succession isdifficult to detect. One potential application of fingerprints isto detect such events.

This type of detection is prototyped in this study bydisplaying fitness and evolutionary velocity on the same setof axes. One representative population from each noise levelare displayed in this fashion in Figure 7. For all three noiselevels we note that abrupt changes in average score are oftenaccompanied by pronounced spikes in evolutionary velocity.The size of the changes in fitness is not well correlated withthe size of the evolutionary velocity spikes. It is intuitive thatthese sizes need not be correlated in magnitude. A relativelymodest shift in strategy, and hence fingerprint, can result in alarge change in score and vice versa.

During the analysis of these results a possible improvementin the experimental design was noted. The evolutionary ve-locity was computed from the mean fingerprint of the entirepopulation. In retrospect, computing the mean position of theelite would likely have generated a cleaner and more usefulsignal. All members of the elite have survived at least onefitness evaluation. It is possible for a mutant with very lowprobability of survival to also exhibit a large shift in itsfingerprint. This large fingerprint deviation would be reflectedas spurious evolutionary velocity. The fingerprints of mutantsthat will not survive into the next generation generates apreventable source of noise in evolutionary velocity.

Our hypothesis was that evolutionary velocity would in-crease with noise level. That simple hypothesis explains thepatterns in both Figure 4 and Figure 6. Using both these

Page 7: Fingerprint Analysis of the Noisy Prisoner’s Dilemma Using ...eldar.mathstat.uoguelph.ca/dashlock/eprints/FingerprintsIII.pdfFingerprint Analysis of the Noisy Prisoner’s Dilemma

0

0.5

1

1.5

2

2.5

3

0 1000 2000 3000 4000 5000 6000

Generations

Fitness and evolutionary velocity

fitnessvelocity

0

0.5

1

1.5

2

2.5

3

0 1000 2000 3000 4000 5000 6000

Generations

Fitness and evolutionary velocity

fitnessvelocity

0

0.5

1

1.5

2

2.5

3

0 1000 2000 3000 4000 5000 6000

Generations

Fitness and evolutionary velocity

fitnessvelocity

Fig. 7. This figures displays overlaid plots of fitness and evolutionary velocity on the same axes. Each of the three panelsare results for one replicate. Thetop panel is from a population evolved with 0% noise, the middle a 1% noise population, and the bottom 5% noise. These plotsare illustrative rather thanrepresentative.

Page 8: Fingerprint Analysis of the Noisy Prisoner’s Dilemma Using ...eldar.mathstat.uoguelph.ca/dashlock/eprints/FingerprintsIII.pdfFingerprint Analysis of the Noisy Prisoner’s Dilemma

� � � � � �� �� ��

��

��

���

��

���

���

��

�� ���

���

!����"

���

�������

!����"

����

�������

���

��

����

����

#�$�%��� $�

� � � � � �� �� ��

��

��

���

��

���

���

��

��� ���

���

!����"

���

�������

!����"

����

�������

���

��

����

����

#�$�%��� $�#�$�%��� $�

� � � � � �� �� ��

��

��

���

��

���

���

��

��� ���

���

!����"

���

�������

!����"

����

�������

���

��

����

����

#�$�%��� $�

Fig. 4. The relative size of the contents of the different Voronoi tiles infingerprint space across the sampling epochs.

0.4600.490

50

50

0.4430.474

0.4010.432

0.3450.375

0.3700.400

0.2840.312

0.3020.331

0.2930.322

0.4470.477

0.4000.431

100

100 0.355

0.3840.2910.320

0.2860.315

0.2420.269

0.2220.248

0.2170.243

0.5200.551

0.5080.539

0.4500.481

200

200 0.338

0.3680.3660.396

0.3010.330

0.2620.289

0.2520.279

0.5580.589

0.5360.566

0.4870.517

0.3690.400

400

400 0.405

0.4360.3120.341

0.2890.318

0.2710.299

0.5430.574

0.5150.546

0.4740.505

0.3600.389

0.3960.427

800

800 0.332

0.3620.2810.309

0.2890.318

0.5350.566

0.5060.537

0.4710.501

0.3480.378

0.3580.388

0.2880.316

1600

1600 0.282

0.3100.2740.302

0.6110.641

0.5680.598

0.5400.571

0.4090.440

0.4450.475

0.3580.387

0.3510.381

3200

3200 0.332

0.362

0.5860.616

0.5770.607

0.5090.540

0.4010.432

0.4140.445

0.3410.371

0.3350.365

0.3260.356

6400

6400

0% N

oise

1% Noise

0.3240.353

50

50

0.2560.283

0.2000.226

0.1490.172

0.1630.187

0.1710.195

0.1610.184

0.2010.226

0.2810.309

0.2200.246

100

100 0.173

0.1970.1070.126

0.1250.146

0.1300.152

0.1250.146

0.1470.170

0.3470.377

0.2630.291

0.2120.237

200

200 0.134

0.1560.1460.169

0.1610.184

0.1470.170

0.1820.206

0.3750.406

0.2790.307

0.2270.254

0.1500.173

400

400 0.154

0.1780.1750.199

0.1630.187

0.1990.224

0.3800.410

0.2860.314

0.2310.257

0.1290.151

0.1760.201

800

800 0.173

0.1970.1540.178

0.1920.217

0.3580.388

0.2640.292

0.2190.245

0.1310.153

0.1410.163

0.1560.179

1600

1600 0.138

0.1600.1750.199

0.4310.462

0.3400.370

0.2810.310

0.1790.204

0.2140.240

0.2140.240

0.2120.238

3200

3200 0.232

0.259

0.4060.436

0.3070.336

0.2480.275

0.1550.178

0.1860.210

0.1990.224

0.1840.209

0.2090.235

6400

6400

0% N

oise

5% Noise

0.3380.367

50

50

0.2650.293

0.2040.229

0.1290.151

0.1570.180

0.1600.183

0.1460.169

0.1760.201

0.3570.386

0.2730.301

100

100 0.219

0.2450.1230.145

0.1400.163

0.1570.180

0.1340.156

0.1470.169

0.4060.437

0.3060.335

0.2360.263

200

200 0.154

0.1780.1720.196

0.1810.206

0.1590.182

0.1910.216

0.5010.532

0.4050.436

0.3190.348

0.2170.243

400

400 0.253

0.2800.2620.289

0.2300.256

0.2750.303

0.4830.514

0.4020.433

0.3070.336

0.2050.231

0.2540.281

800

800 0.264

0.2910.2270.253

0.2830.311

0.5480.579

0.4430.473

0.3660.396

0.2460.273

0.3060.334

0.3240.353

1600

1600 0.279

0.3070.3630.394

0.5530.584

0.4970.528

0.3710.402

0.2650.293

0.2840.312

0.3230.352

0.2700.298

3200

3200 0.346

0.376

0.5830.613

0.5010.532

0.3730.403

0.2450.272

0.2680.296

0.3100.339

0.2610.289

0.3110.340

6400

6400

1% N

oise

5% Noise

Fig. 5. Competitiveness results in the form of 95% confidenceintervals forthe probability that the agents indexing the row will beat the agents indexingthe column. The top panel gives results for no noise versus 1%noise, themiddle for no noise versus 5% noise, and the bottom for 1% noise versus5%. Rows are indexed by the lower-noise agent pools with numerical labelsfor both rows and columns giving the epoch of the agent pool. Dark graycells indicate significant superiority of the agent-pool indexing the row, lightgray the same for the pool indexing the column. Middle gray indicates nosignificant difference.

Page 9: Fingerprint Analysis of the Noisy Prisoner’s Dilemma Using ...eldar.mathstat.uoguelph.ca/dashlock/eprints/FingerprintsIII.pdfFingerprint Analysis of the Noisy Prisoner’s Dilemma

50

5050

0.5000.500

0.5180.549

0.4360.467

0.4230.454

0.4000.431

0.4350.466

0.3640.395

0.3800.410

0.4510.482 100

10010

0 0.5000.500

0.4010.432

0.3690.400

0.3600.389

0.3770.407

0.3280.357

0.3300.359

0.5330.564

0.5680.599 200

20020

0 0.5000.500

0.4520.483

0.4240.454

0.4630.493

0.4080.438

0.4150.446

0.5460.577

0.6000.631

0.5170.548 400

40040

0 0.5000.500

0.4590.490

0.4940.525

0.4160.446

0.4440.475

0.5690.600

0.6110.640

0.5460.576

0.5100.541 800

80080

0 0.5000.500

0.5170.547

0.4400.471

0.4530.484

0.5340.565

0.5930.623

0.5070.537

0.4750.506

0.4530.483 1600

160016

00 0.5000.500

0.4110.442

0.4150.446

0.6050.636

0.6430.672

0.5620.592

0.5540.584

0.5290.560

0.5580.589 3200

320032

00 0.5000.500

0.4990.530

0.5900.620

0.6410.670

0.5540.585

0.5250.556

0.5160.547

0.5540.585

0.4700.501 6400

640064

00 0.5000.500

50

50

50

0.5000.500

0.4750.506

0.4300.461

0.3110.340

0.3460.375

0.2830.311

0.2820.310

0.2780.307

0.4940.525 100

100

100 0.500

0.5000.4390.470

0.3360.365

0.3360.365

0.2650.293

0.2410.267

0.2290.255

0.5390.570

0.5300.561 200

200

200 0.500

0.5000.3670.397

0.3740.404

0.3050.333

0.2890.317

0.2870.315

0.6600.689

0.6350.664

0.6030.633 400

400

400 0.500

0.5000.4980.529

0.4150.446

0.3870.417

0.3970.428

0.6250.654

0.6350.664

0.5960.626

0.4710.502 800

800

800 0.500

0.5000.3940.424

0.4120.443

0.4170.447

0.6890.717

0.7070.735

0.6670.695

0.5540.585

0.5760.606 1600

1600

1600 0.500

0.5000.5040.535

0.5170.548

0.6900.718

0.7330.759

0.6830.711

0.5830.613

0.5570.588

0.4650.496 3200

3200

3200 0.500

0.5000.5040.534

0.6930.722

0.7450.771

0.6850.713

0.5720.603

0.5530.583

0.4520.483

0.4660.496 6400

6400

6400 0.500

0.500

50

50

50

0.5000.500

0.3960.427

0.3280.357

0.2410.268

0.2660.294

0.2590.287

0.2670.294

0.2940.323

0.5730.604 100

100

100 0.500

0.5000.4130.444

0.3070.336

0.3320.362

0.3540.384

0.3380.367

0.3770.407

0.6430.672

0.5560.587 200

200

200 0.500

0.5000.3710.401

0.4090.440

0.4460.477

0.4370.468

0.5070.538

0.7320.759

0.6640.693

0.5990.629 400

400

400 0.500

0.5000.5130.543

0.5810.611

0.5890.619

0.6430.672

0.7060.734

0.6380.668

0.5600.591

0.4570.487 800

800

800 0.500

0.5000.5320.563

0.5320.563

0.5580.589

0.7130.741

0.6160.646

0.5230.554

0.3890.419

0.4370.468 1600

1600

1600 0.500

0.5000.4500.481

0.5190.550

0.7060.733

0.6330.662

0.5320.563

0.3810.411

0.4370.468

0.5190.550 3200

3200

3200 0.500

0.5000.5690.600

0.6770.706

0.5930.623

0.4620.493

0.3280.357

0.4110.442

0.4500.481

0.4000.431 6400

6400

6400 0.500

0.500

Fig. 6. Non-local adaptation results. The top panel gives results for nonoise, the middle for 1% noise, and the bottom for 5% noise. Given is a 95%confidence interval on the probability the the agent-pool for the epoch indexingthe row will beat the the agent pool for the epoch indexing thecolumn. Darkgray cells indicate significant superiority of the agent-pool indexing the row,light gray the same for the pool indexing the column. Middle gray indicatesno significant difference.

TABLE II

SHOW ARE 95% CONFIDENCE INTERVALS FOR THE MEAN EVOLUTIONARY

VELOCITY OVER ALL GENERATIONS AND REPLICATES FOR THREE

DIFFERENT NOISE LEVELS. NOTE ALL PAIRS OF CONFIDENCE INTERVALS

ARE DISJOINT AND THAT EVOLUTIONARY VELOCITY IS INVERSELY

CORRELATED WITH NOISE LEVEL.

Noise Mean evolutionary velocity0% (0.164,0.168)1% (0.134,0.138)5% (0.125,0.129)

D/D

C/C

D/D

C

1

2

Fig. 8. The finite state diagram for a two-state implementation of the strategyvengeful.

indicators it is not unreasonable to suppose that all threecollections of populations are heading to the same place withthe higher-noise populations getting there faster. Table II givesconfidence intervals for evolutionary velocity over all timesand replicates. The data in this table demonstrate that thehypothesis is the exact reverse of the true state of affairs.Noiseis inversely correlated with evolutionary velocity with all thedifferences statistically significant.

Given the substantial competitive advantage that additionalnoise grants, this suggests that agents evolved in the presenceof noise are moving through strategy space more efficiently.Evolutionary velocity is a measure of change of position instrategy space. If there is a reason the higher noise agents willtend to take a more direct path to good positions in the adaptivelandscape, then all the results reported in this study remainconsistent. Such an explanation follows with some necessarypreliminaries.

One feature of fingerprints is that they measure only theasymptotic behavior of agents. Examine the finite state ma-chine in Figure 8. It implements the strategyvengefulwhichcooperates until its opponent’s first defection and defectsthere-after. In a noise-free environment this is a fairly competitive

Page 10: Fingerprint Analysis of the Noisy Prisoner’s Dilemma Using ...eldar.mathstat.uoguelph.ca/dashlock/eprints/FingerprintsIII.pdfFingerprint Analysis of the Noisy Prisoner’s Dilemma

strategy. it cooperates perfectly with itself and defends itselfaggressively against anything that does not embrace a no-first-use policy toward defection. The noise used in the computationof fingerprints ensures that the asymptotic behavior and hencethe fingerprint of vengeful is identical to that of always defect.The ability of vengeful to cooperate with copies of itself arestrongly dependent on perfect information: a lack of noise.Itscooperation is embedded in transient states.

The implementation of vengeful given in Figure 8 is asimple example of the dependence of success on transientstates – states that the machine will leave and never return toif the opponent makes a particular move. Populations evolvedin the absence of noise are likely to exploit such transientstates to recognize copies or near-copies of themselves. Thisrecognition strategy is calledhandshakingand grants agentsthe ability to preferentially cooperate with strategies that sharetheir genes. Noise makes handshaking less practical, becausethe handshake must be able to tolerate noise. The practicalityof handshaking also drops as the level of noise increases bothbecause the imperative to correct for noise grows and becausethe potential for more complex patterns of noise grows. Astrategy that can deal with one noise event and still handshakecannot deal as easily with two. Given that the agents used inthis study have eight states, there may not be room in thegenome to handle multiple-noise errors in a handshake.

Consider now a population of agents evolved without noise.There is evolutionary pressure to make the portion of the state-diagram used to compete with copies of yourself small. Thisreduces the probability it will be modified by mutation and somakes it more heritable. This secondary evolutionary pressureto keep the active genome small is a form ofparsimony.It is also advantageous to have this portion of the statediagram immediately accessible from the starting state of thetransition diagram. This evolutionary pressure thus favors theuse of “early” states in the transition to manage the interactionof an agent with its close relatives. Such states are oftentransient and thus not measured by fingerprinting. “Later”states might be rarely used in fitness evaluation. Mutation,thus, would make them more or less random. We deduce thatagents evolved without noise will have evolutionary velocityconsiderably closer to the fundamental random-walk velocitythat would result from applying mutations to a populationwith random selection than would a population evolved withnoise. We offer this as a tentative explanation for the higherevolutionary velocity of the zero-noise populations. We turnnow to the question of more efficient travel though strategyspace by agents evolved at higher noise levels.

We have already discussed why an agent evolved in thepresence of noise will find it harder to depend on transientstates. In fact, any state accessible from the starting statehas a positive probability of being reached when noise ispresent. This means that the number of states used during afitness evaluation will be higher when noise is present and,to a degree, the evenness of access to states will increasewith noise. This means that agents evolved in the presence ofnoise have a higher fraction of their genetic material subjected

50 100 200 400 800 1600 3200 6400

- unused - permanant - transient

State type in each epoch, 0% noise

Fra

ctio

n of

sta

tes

50 100 200 400 800 1600 3200 6400

- unused - permanant - transient

State type in each epoch, 1% noise

Fra

ctio

n of

sta

tes

50 100 200 400 800 1600 3200 6400

- unused - permanant - transient

State type in each epoch, 5% noise

Fra

ctio

n of

sta

tes

Fig. 9. Fraction of states, across all members of eliter populations withinan epoch, that are transient, permanent, or unused. The top panel shows theresult for no noise, the middle for 1% noise, and the bottom for 5% noise.

to selection pressure. It is intuitive that this increase inthegenerality of selection pressure across the genome will growwith noise, at least for “low” noise levels.

At this point a thought experiment may be valuable. Apopulation of agents evolved in the absence of noise inheritmany features in their transition diagrams that are not sub-jected to selection pressure. A mutation will, with positiveprobability, place these untested features into the portion ofthe transition diagram that is relevant to selection. Likewise,mutation can detach parts of the transition diagram from thoseportions undergoing selection and, once they are detached,re-

Page 11: Fingerprint Analysis of the Noisy Prisoner’s Dilemma Using ...eldar.mathstat.uoguelph.ca/dashlock/eprints/FingerprintsIII.pdfFingerprint Analysis of the Noisy Prisoner’s Dilemma

randomize them. This means the rate at which a populationdevelops, and later expresses, non-adaptive features is muchhigher when the fraction of the genome undergoing selectionis lower. We have already demonstrated that the fraction ofthe genome undergoing selection is higher in the presenceof noise. This means that evolutionary velocity generated bythe exploitation of and elimination of non-adaptive featuresresulting from the long-term action of mutation will be higheras noise levels drop and will be highest at zero noise.

The agents evolved with noise are, in essence, making farbetter use of their fitness evaluations. Since a greater fractionof their genomes are under test, they are more likely to retainadaptive behaviors once those behaviors are acquired. Theyare less likely to gain non-adaptive features that, once placedunder selection pressure by a mutation, must be eliminatedfrom the gene-pool. We conclude that agents evolved in thepresence of noise move toward good locations in the adaptivelandscape more efficiently than agents evolved without noise.To be clear: no-noise agents have a higher fingerprint-basedevolutionary velocity that generates a lower rate of acquisitionof adaptive behavior. This results from the lower proportion oftheir genome that is under selection in any given generation.

To test this reasoning, we analyzed the state use of theagents in the experiments. The results of this analysis areshown in Figure 9. This figure shows that the noiselesspopulations use significantly more transient states. It alsoshows that, in the early epochs (the first five for 1% noise, thefirst three for 5% noise), the agents evolved with noise usemore of their states than those evolved without noise. In laterepochs the number of states used by agents evolved with noisedecreases. We conjecture that this is driven by the increaseinevolutionary efficiency.

All the evolving agents are subject to the same parsimonypressure to have smaller genomes. If the agents evolved withnoise are, in fact, evolving more efficiently, wasting less timeon weeding out mutation-induced non-adaptive features, thenthe parsimony pressure acts on them more strongly. The resultsof parsimony pressure may be set aside during a successionevent. The zero noise agents, which we have demonstratedrun around the adaptive landscape at a higher rate, are likelyto have a higher rate of succession events and so set asideparsimony gains more often. This tentative explanation forthe smaller number of states used by agents evolved in thepresence of noise suggests that experiments using agents withmore states might provide valuable additional data.

C. Contest Winners and Evolution Dominators

The results presented in Sections III-A and III-B suggestthat it might be worth testing agents sampled from the differentnoise levels and different populations in a contest setting.The confidence intervals for the mean contest rank of eachagent type are displayed, for all epochs, in Figure 10. In thesetournaments rank 1 is the best rank, and so the better an agentpopulation is the lower it appears in the figure. The tournamentranks are in the range 1-36 while the confidence intervals havea radius of 0.05-0.065 and so are difficult to see.

In the first three epochs, the agents evolved without noisehave a significantly better tournament ranking that thoseevolved with 1% noise which in turn are superior to thosewith 5% noise. Between the third and fourth epoch agentsevolved with 1% noise obtain significantly superior meancontest rank to those evolved with no noise. Between thefourth and fifth epochs the agents evolved with 5% noise alsopass the agents evolved without noise. The 1% noise agentsretain their superiority to the 5% noise agents throughout allepochs. In [16] it was found that strategies not observed earlyin evolution became common at deep time (6400 generations).The results on mean contest ranking echo these results –there is a significant difference in the character of prisoner’sdilemma agents between early and later evolution.

The ability to win a contest is different from the abilityto survive the process of evolution. Winning a tournamentinvolves getting high scores against a diversity of opponents.Tournament winners do not consistently beat their opponents;they just get high scores while playing them. Evolutionaryalgorithms select for strategies which get high scores whenplaying copies of themselves and which beat mutants. We callagents with these two different sorts of competitive abilitycontest winnersand evolution dominators. A contest winnerwill receive a high tournament ranking when playing round-robin against a slate of diverse opponents. An evolutiondominator will tend to cooperate with agents very similar toitself (its relatives) and beat agents different from itself. Thesedefinitions reflect the different objective functions used forscoring contests and for driving the evolution of a small well-mixed population.

It is intuitive that an evolution dominator will do poorly ina contest unless the contest is packed with strategies similar toitself with which it will cooperate. The evolutionary algorithmused in this study selects strongly for evolution dominators. Itdoes so less efficiently when noise is present, as the retrogradenon-local adaptation shown in Figure 6 demonstrates. Thepopulations evolved with noise are, however, better contestwinners. The 1% noise populations are superior to the 5%noise populations. This, in turn, suggests that there is adifference in the contest-winner ability of agents evolvedatdifferent noise levels and hence potentially an optimum noiselevel for such evolution. This level is probably below 5%.These results also suggests that a well-mixed, noiseless, small-population evolutionary algorithm is not an optimal tool fordesigning entries for a prisoner’s dilemma contest.

D. Fingerprint Analysis

The outcome of the Voronoi tile fingerprint analysis isgiven in Figure 4. These figures show the distribution of thestrategies in the tiles of the combined populations evolvedat each noise level. The difference between the noiselesspopulations and the two sets of populations evolved with noiseis substantial. The populations evolved with no noise changecomparatively little in the proportions of strategies in each tile,while both sets of populations evolved with noise show a trendwith time towards strategies in the ALLD and (TF2T) tiles.

Page 12: Fingerprint Analysis of the Noisy Prisoner’s Dilemma Using ...eldar.mathstat.uoguelph.ca/dashlock/eprints/FingerprintsIII.pdfFingerprint Analysis of the Noisy Prisoner’s Dilemma

5

10

15

20

25

30

35

1 2 3 4 5 6 7 8

Con

test

Ran

king

Epoch

95% confidence intervals for mean contest rank (rank 1 is best)

0% noise1% noise5% noise

Fig. 10. Shown above are 95% confidence intervals on the mean contest rank for agents evolved with each of the three noise levels. These confidenceintervals are given separately for each epoch. The confidence intervals are narrow enough to be masked by the impulses marking the data points for eachepoch. The horizontal scale is logarithmic: the epochs are exponentially spaced.

All three sets of populations contain no significant numberof strategies in the ALLC, (2TFT), and PSY tiles (less than1.0% in the noiseless populations and less than 0.2% in thepopulations evolved with noise). The strategies in these tilesare likely just bad strategies. Of the other nine tiles, thenoiseless populations favor the TFT tile with 45% of strategiesfalling in it. There does not seem to be much change in thedistribution over time, except that strategies begin to appearin significant numbers in the (TF2T) tile starting after 400generations (epoch 4). In epoch 1 they account for less than1% of the strategies; in epoch 4 they account for 2%, and byepoch 8 5% of the strategies are in that tile.

In the populations evolved with 1% noise, there is a signif-icant trend into the ALLD and (TF2T) tiles. The ALLD tilecontains 1% of the strategies in epoch 1 and 20% in epoch8. Likewise, the (TF2T) tile contains 1% of the strategiesin epoch 1 and 60% in epoch 8. The trend is even morepronounced in the populations evolved with 5% noise. 1%of the strategies in the first epoch are in the (TF2T) tile and60% in the last epoch. The move to the ALLD tile happenssooner in the 5% noise populations with 12% of the strategiesthere in the first epoch and 32% in the last epoch.

The Voronoi tiles do not give information about specificstrategies found; they only give information about what otherstrategies have nearby fingerprints. We also analyzed theoccurrences of two specific strategies: ALLD and TFT. In the

noiseless populations ALLD never amounts to more than 2%of the total strategies; TFT ranges from 10% to 22% with apeak in epoch 2. In the 1% noise populations neither strategyis common, they amount to less than 1% of the total, exceptin epoch 4 in which 3% of the strategies are ALLD. In the5% noise populations, TFT is also rare (less than 1%), butALLD exists in significant numbers, ranging from 8-32% ofthe strategies, peaking in epoch 4.

Another strategy worth analyzing in particular is the Fortressstrategy. It was first discovered and analyzed in [16] whenit appeared in populations that had evolved for more than4096 generations. The three state version of it is pictured inFigure 11. Fortressn hasn states in its minimal form. It willcooperate indefinitely with another strategy which cooperateswith it, but only after a series ofn− 1 defections, and, if theopponent ever defects, it must play the series of defectionsagain before Fortress will cooperate with it. Table III showsthe fraction of strategies with fingerprints close to Fortress 3for all the populations saved. “Close to” is defined by threedifferent distances:R = 0.1 means the strategies are virtuallyidentical to Fortress 3;R = 1.0 means they are as similar asstrategies in the Voronoi tiles are to their reference strategies,and R = 0.32 is in between. In the noiseless populations,these strategies are rare, but become more common in the laterepochs, peaking at 10% in epoch 8 withR = 1.0. The story isradically different in the populations evolved with noise.They

Page 13: Fingerprint Analysis of the Noisy Prisoner’s Dilemma Using ...eldar.mathstat.uoguelph.ca/dashlock/eprints/FingerprintsIII.pdfFingerprint Analysis of the Noisy Prisoner’s Dilemma

1

D

C/D

2 3

D/D

D/DC/D

C/CD/C

Fig. 11. The Fortress 3 strategy.

also become more common with time, but, instead of beingrare, they come to dominate the population with 83% of thestrategies withinR = 1.0 of Fortress 3 in epoch 8 of the 1%noise populations and 86% in the 5% noise populations.

IV. CONCLUSIONS ANDNEXT STEPS

This study introduces the idea of fingerprint-based evo-lutionary velocity and integrates it with previous resultsoncompetitiveness, non-local adaptation, and probability of co-operative behavior to reach the following conclusions. First,employing noise in a finite state representation increases thefraction of states that is under selection. Second, having ahigher fraction of the genome under selection enhances theefficiency of evolutionary training.

This study also highlights two distinct forms of compet-itive ability, the ability to dominate an evolving populationand the ability to win a contest against a diverse selectionof opponents. These two abilities, while not opposites, aredemonstrated in this study to have some degree of trade-off.In particular, we conclude that a well-mixed, small-populationevolutionary algorithm will favor the production of evolutiondominators over contest winners. We also note that evolvingwith noise enhances the algorithm’s ability to produce contestwinners and that adding 1% noise does this to a greater degreethan adding 5% noise. This suggests that the noise level canbe optimized for the production of contest winners, a possiblenext step for this research.

Results on the number of transient and unused states inthe three different agent types suggests that the more efficientevolution enjoyed by agents evolved with noise may havepermitted evolution to exert a greater parsimony pressure onthe size of the active genome in those agents. The genome size,eight states or 68 bits, used in this study leaves only modestroom for parsimony pressure to act. A follow-up study withmore states could be used better understand the interactionofnoise and evolution to affect state usage.

TABLE III

FRACTION OF SAVED POPULATION WITHIN VARIOUS RADII OF THE

FINGERPRINT OF THE STRATEGYFORTRESS-3 FOR DIFFERENT EPOCHS

AND NOISE LEVELS.

Epoch R=0.1 R=0.32 R=1.00% Noise

50 0.0000 0.0004 0.0471100 0.0000 0.0000 0.0271200 0.0000 0.0025 0.0354400 0.0000 0.0108 0.0387800 0.0000 0.0000 0.00461600 0.0308 0.0308 0.04463200 0.0196 0.0283 0.08676400 0.0413 0.0437 0.1008

1% Noise50 0.0000 0.0004 0.0171100 0.0175 0.0238 0.0425200 0.0646 0.0671 0.1083400 0.1179 0.1471 0.2525800 0.1988 0.2046 0.35881600 0.3125 0.3350 0.54833200 0.4646 0.4846 0.73966400 0.5729 0.5742 0.8329

5% Noise50 0.0000 0.0000 0.0342100 0.0088 0.0092 0.0721200 0.0683 0.0742 0.1371400 0.1988 0.2112 0.3162800 0.2658 0.2804 0.36461600 0.3917 0.4238 0.55293200 0.5575 0.5642 0.71836400 0.6775 0.6775 0.8625

This study shows that the non-local adaptation hypothesisfrom [11] is too simple. While additional evolution oftenconveys a competitive advantage, it doesn’t do so invariably.This study opens a number of doors for understanding whythis is so. The delineation of distinct types of competitiveness(evolution dominator and contest winner) also permits a morenuanced phrasing of the non-local adaptation hypothesis.

In [15] it was found that relaxing both the small-populationand well-mixed hypotheses in evolution without noise yieldsa diverse population that is cooperative and remains stableagainst invasion. Re-examining this evolutionary design withevolutionary velocity could be used to document the stability.The type of evolutionary algorithm used there might be muchbetter for evolving contest winners than the sort used here.

The hypothesis that the fraction of the genome under selec-

Page 14: Fingerprint Analysis of the Noisy Prisoner’s Dilemma Using ...eldar.mathstat.uoguelph.ca/dashlock/eprints/FingerprintsIII.pdfFingerprint Analysis of the Noisy Prisoner’s Dilemma

tion is an important variable deserves additional study. Thisstudy uses eight-state Mealy automata as its representation.This representation encodes a space of268 = 2.95E1020

distinct automata (many of which encode the same strategy),but each of these represents only 68 bits of information. Itseems likely that the character of evolution in the presenceofnoise may change significantly if the genome size of agents isincreased or permitted to evolve. In addition, the finite staterepresentation is adept at failing to use portions of its genome.The representation used in this study, directly encoded finitestate machines, was one of twelve used in [10], [5]. Thegenome-fraction question could be addressed in the contextof any of these representations and many others.

The questions treated in this study of genome size can beasked in many contexts well beyond the prisoner’s dilemma.Fingerprinting may be employed for any simultaneous two-player game with a finite number of moves [24]. Many ofthe results supporting the results in this study are restrictedto games with two moves (the mathematics becomes morecomplex when three or more moves are available).

V. ACKNOWLEDGMENTS

The first author thanks the University of Guelph and theNational Science and Engineering Research Council of Canada(NSERC) for supporting this work. The second author thanksKorean National Institute For Mathematical Sciences for itssupport of this research. The third author thanks York Uni-versity and the Ontario Council of Graduate Studies for theirsupport of this research.

REFERENCES

[1] D. Ashlock. Evolutionary Computation for Opimization and Modeling.Springer, New York, 2006.

[2] D. Ashlock and A. Aherk. Non-local adaptation of artificial predatorsand prey. In Proceedings of the 2005 Congress on EvolutionaryComputation, volume 1, pages 41–48, Piscataway, NJ, 2005. IEEE Press.

[3] D. Ashlock, E. Blankenship, and J. D. Gandrud. A note on generaladaptation in populations of painting robots. InProceedings of the 2003Congress on Evolutionary Computation, pages 46–53, Piscataway, NJ,2003. IEEE Press.

[4] D. Ashlock and K. M .Bryden. Non-local adaptation in bidding agents.In Smart Engineering System Design: Neural Networks, EvolutionaryProgramming, and Artificial Life, volume 15, pages 193–199. ASMEPress, 2005.

[5] D. Ashlock and E. Y. Kim. The impact of cellular representation onfinite state agents for prisoner’s dilemma. InProceedings of the 2005Genetic and Evolutionary Computation Conference, pages 59–66, NewYork, 2005. ACM Press.

[6] D. Ashlock and E.-Y. Kim. Fingerprinting: Automatic analysis andvisualization of prisoner’s dilemma strategies.IEEE Transaction onEvolutionary Computation, 12:647–659, 2008.

[7] D. Ashlock, E. Y. Kim, and W. K. vonRoeschlaub. Fingerprints:Enabling visualization and automatic analysis of strategies for twoplayer games. InProceedings of the 2004 Congress on EvolutionaryComputation, volume 1, pages 381–387, Piscataway NJ, 2004. IEEEPress.

[8] D. Ashlock and E.Y. Kim. Techniques for analysis of evolved prisoner’sdilemma strategies with fingerprints. InProceedings of the 2005Congress on Evolutionary Computation, volume 3, pages 2613–2620,Piscataway, NJ, 2005. IEEE Press.

[9] D. Ashlock, E.Y. Kim, and L. Guo. Multi-clustering: avoiding the naturalshape of underlying metrics. In C. H. Dagli et al., editor,Smart Engi-neering System Design: Neural Networks, Evolutionary Programming,and Artificial Life, volume 15, pages 453–461. ASME Press, 2005.

[10] D. Ashlock, E.Y. Kim, and N. Leahy. Understanding representationalsensitivity in the iterated prisoner’s dilemma with fingerprints. IEEETransactions on Systems, Man, and Cybernetics–Part C: Applicationsand Reviews, 36(4):464–475, 2006.

[11] D. Ashlock and J. Mayfield. Acquisition of general adaptive features byevolution. InProceedings of the 7th Annual Conference on EvolutionaryProgramming, pages 75–84, New York, 1998. Springer.

[12] D. Ashlock and B. Powers. The effect of tag recognition on non-local adaptation. InProceedings of the 2004 Congress on EvolutionaryComputation, volume 2, pages 2045–2051, Piscataway, NJ, 2004. IEEEPress.

[13] D. A. Ashlock and E. Y. Kim. Fingerprint analysis of the noisy prisoner’sdilemma. In Proceedings of the 2007 Congress on EvolutionaryComputation, pages 4073–4080, 2007.

[14] D.A. Ashlock and J. Schonfeld. Depth annotation of rna folds forsecondary structure motif search. InProceedings of the 2005 IEEESymposium on Computational Intelligence in Bioinformatics and Com-putational Biology, page 3845, 2005.

[15] W. Ashlock. Using priosoner’s dilemma fingerprints to analyse evolvedstrategies. Master’s thesis, University of Guelph, 2008.

[16] W. Ashlock and D. Ashlock. Changes in prisoner’s dilemma strategiesover evolutionary time with different population sizes. InProceedingsof the 2006 Congress On Evolutionary Computation, pages 1001–1008,Piscataway, NJ, 2006. IEEE press.

[17] R. Axelrod. The Evolution of Cooperation. Basic Books, New York,1984.

[18] R. Axelrod and W. D. Hamilton. The evolution of cooperation. Science,211:1390–1396, 1981.

[19] W. Banzhaf, P. Nordin, R. E. Keller, and F. D. Francone.GeneticProgramming : An Introduction. Morgan Kaufmann, San Francisco,1998.

[20] M. de Berg, M. van Kreveld, M. Overmans, and O. Schwarzkopf.Computational Geometry: Algorithms and Applications. Springer-Verlag, Berlin, 2000.

[21] D. Doty. Non-loca evolutionary adaptation in grid plants. InProceedingsof the 2004 Congress on Evolutionary Computation, volume 2, pages1602–1609, Piscataway, NJ, 2004. IEEE Press.

[22] M. Hemesath. Cooperate or defect? Russian and Americanstudents in aprisoner’s dilemma.Comparative Economics Studies, 176:83–93, 1994.

[23] J. M. Houston, J. Kinnie, B. Lupo, C. Terry, and S. S. Ho. Com-petitiveness and conflict behavior in simulation of a socialdilemma.Psychological Reports, 86:1219–1225, 2000.

[24] E. Y. Kim. Analysis of Game Playing Agents with Fingerprints. PhDthesis, Iowa State University, 2005.

[25] J. F. Miller and S. L. Smith. Redundancy and computational efficiencyin cartesian genetic programming.IEEE Transaction on EvolutionaryComputation, 10(2):167–174, 2006.

[26] J. H. Miller. The coevolution of automata in the repeated prisoner’sdilemma. Journal of Economic Behavior and Organization, 29(1):87–112, January 1996.

[27] S. I. Resnick.Adventures in Stochastic Processes. Birkhauser, Boston,1992.

[28] D. Roy. Learning and the theory of games.Journal of TheoreticalBiology, 204:409–414, 2000.

[29] K. Sigmund and M. A. Nowak. Evolutionary game theory.CurrentBiology, 9(14):R503–505, 1999.

[30] W. K. vonRoeschlaub. Automated analysis of evolved strategies initerated prisoner’s dilemma. Master’s thesis, Iowa State University, 1994.