# The Prisoner's Dilemma

Post on 15-Nov-2014

DESCRIPTION

Presentation of a project done by a group of students on the "Science of Complex Systems 2012" course at the University of Kent, UK. http://blogs.kent.ac.uk/complex/lectures-ph724/

TRANSCRIPT

1. Jimmel Stewart, Peter Fox, Steven Fowler-Tutt, Oliver Shorley-Smith

2. What is the Prisoner's Dilemma?

A paradox in decision analysis in which two individuals acting in their own best interest pursue a course of action that does not result in the ideal outcome. The typical prisoner's dilemma is set up in such a way that both parties choose to protect themselves at the expense of the other participant. As a result of following a purely logical thought process to help oneself, both participants find themselves in a worse state than if they had cooperated with each other in the decision-making process. http://www.investopedia.com/terms/p/prisoners-dilemma.asp#ixzz1r2pVAseN

3. Payoffs involved in the Prisoner's Dilemma

An alternative expression of this situation is given in the following payoff matrix. The payoffs are traditionally called:

- T: Temptation to defect
- R: Reward for cooperation
- S: Sucker's payoff
- P: Punishment for mutual defection

The condition T > R > P > S must hold.

4. Iterated Prisoner's Dilemma

- Repeated plays of the Prisoner's Dilemma are known as the Iterated Prisoner's Dilemma.
- The possibility of future interactions allowed us to look at the actions taken by an individual and how they could affect future payoffs. Therefore the "always defect" tactic no longer holds.
- This allowed the players to develop different types of strategies for game play, which may take into account the opponent's previous moves.
- The classic Axelrod iterated prisoner's dilemma was used.

5. Reasons for studying the Prisoner's Dilemma

The Prisoner's Dilemma game seems simple, but it has generated a large amount of research that has been used to explain and analyse real scenarios such as:

- Businesses interacting in a market
- Personal relationships
- Superpower negotiations
- The trench warfare "live and let live" system of World War I

This project is concerned with applying the Prisoner's Dilemma to show how cooperation can evolve in an environment of individuals from different walks of life.
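The payoff ordering from slide 3 and the repeated play from slide 4 can be sketched in a few lines of Python. This is a minimal illustration, not the project's own simulation code. The numeric payoffs (T=5, R=3, P=1, S=0) are the values commonly used in Axelrod-style tournaments and are an assumption here, since the slides only state the ordering. All function names are hypothetical.

```python
# Illustrative payoff values (assumed; the slides only require T > R > P > S).
T, R, P, S = 5, 3, 1, 0

# PAYOFF[(my_move, their_move)] -> my score; "C" = cooperate, "D" = defect.
PAYOFF = {
    ("C", "C"): R,
    ("C", "D"): S,
    ("D", "C"): T,
    ("D", "D"): P,
}


def is_valid_dilemma(t, r, p, s):
    """Return True when the ordering T > R > P > S holds."""
    return t > r > p > s


def always_defect(my_history, their_history):
    """Defect on every round."""
    return "D"


def tit_for_tat(my_history, their_history):
    """Cooperate first, then copy the opponent's previous move."""
    return their_history[-1] if their_history else "C"


def play_match(strategy_a, strategy_b, rounds):
    """Play an iterated match and return the two cumulative scores."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b


if __name__ == "__main__":
    print(is_valid_dilemma(T, R, P, S))
    print(play_match(tit_for_tat, tit_for_tat, 10))
    print(play_match(tit_for_tat, always_defect, 10))
```

With these assumed payoffs, ten rounds of Tit for Tat against itself yield 30 points each, while Tit for Tat against Always Defect yields 9 and 14. Defection wins that single pairing but earns far less than sustained mutual cooperation, which is why the "always defect" tactic stops being the obvious choice once the game is repeated.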
The Prisoner's Dilemma has proved a powerful tool for explaining the evolution of cooperation, from Robert Axelrod's pioneering work to Richard Dawkins's use of it in his famous work The Selfish Gene.

6. Axelrod and the Iterated Prisoner's Dilemma

7. Axelrod and the Iterated Prisoner's Dilemma

- Classic Prisoner's Dilemma game
- Iterated PD does not permit players to know the number of iterations
- A choice made today will influence the opponent's choice tomorrow
- The future can cast a shadow back upon the present and thereby affect the current strategic situation.
- Cooperation will evolve if there is a chance the players will meet again

8. Open Tournament

- Computer simulations from all over the world compete
- Tit for Tat was the overall winner
- Tit for Tat is not the best in any single game but, averaged out against all other strategies, it will win
- Ensure T > R > P > S and R + R > T + S in our simulations/real data

9. Computer Simulations Part 1

10. Computer Simulations Part 1: Is Tit for Tat the best strategy?

- Simulate Tit for Tat against seven other strategies and look for emergent behaviour
- Classic PD payoffs
- Match 1: Tit for Tat versus Random. No emergent behaviour.
- Match 2: Tit for Tat versus Always Defect. Evolution of defection behaviour.
- Matches 3, 4, 5, 6: Tit for Tat versus Always Cooperate, Tit for Tat, Pavlov and Grudger. Evolution of cooperation behaviour.

11. Match 7: Tit for Tat versus Naive Prober

- Naive Prober = Tit for Tat but with random factor R
- See three distinct phases of emergent behaviour
- Phase One = Cooperative Phase
- Phase Two = Flip-Flop Phase
- Phase Three = Defection Phase

12. [Figure: A graph to show the cumulative scores of Gabriel (red diamond) and Lucifer (blue square) for Tit for Tat versus Naive Prober. Axes: cumulative score against iteration. Regions marked: Cooperative Phase, Flip-Flop Phase, Defection Phase.]

13. [Figure: A graph to show how the Lucifer running mean (blue square) and phase mean (red diamond) vary with iteration for Tit for Tat versus Naive Prober. Axes: mean against iteration. Regions marked: Cooperative Phase, Flip-Flop Phase, Defection Phase.]

14. Random Factor R

- An increase in R will decrease average scores. Why? The phase transitions from the Cooperative to the Flip-Flop Phase, and from the Flip-Flop to the Defection Phase, are likely to happen earlier.
- A decrease in R increases average scores, by the same rationale.

15. Summary

- Tit for Tat did not win any single game
- Overall it scored fairly highly, which supports the Axelrod tournament outcome
- Sometimes emergent behaviour occurs, either towards cooperation or defection
- Tit for Tat versus Naive Prober sees three distinct phases of evolutionary, emergent behaviour

16. Strategies of the Iterated Prisoner's Dilemma and Comparison of Gradual

17. A Successful Iterated Strategy (classic round-robin tournaments of finite iterations)

Should exhibit the following behaviour:

- Nice gameplay
- Retaliation
- Forgiveness
- Non-envious

Axelrod (1984) says a successful strategy must be simple.

18. The Gradual Strategy

- Tit-for-tat, forgiveness, memory
- When faced with N total defections it responds with N consecutive defections and then follows with two co-operations

19. Does Gradual out-perform Tit-for-tat?

"Our Meeting With Gradual: A Good Strategy For The Iterated Prisoner's Dilemma" by B. Beaufils, J-P. Delahaye and P. Mathieu

- Round-robin tournament
- 1000 iterations per game
- Classical payoff
- Against 12 other classical strategies, including Tit-for-Tat
- Accumulated score

| Strategy | Final Score |
|---|---|
| Gradual | 33416 |
| Tit-for-tat | 31411 |
| Soft majo | 31210 |
| Spite | 30013 |
| Prober | 29177 |
| Pavlov | 28910 |
| Mistrust | 25921 |
| Cooperate | 25484 |
| Per kind | 24796 |
| Defect | 24363 |
| Per nasty | 23835 |
| Random | 22965 |

Results: Gradual shown as the best strategy

20. The Ecological Evolution Experiment

- Strategies begin in equal populations
- Round-robin tournaments
- Unsuccessful = population decreases
- Successful = population gains agents
- Simulation repeats until all populations stabilise

Results: Gradual shown to significantly out-perform other strategies

21. Results

22. Our Comparison via Computer Simulation

Software PRISON used to compile experiments

Aims:

- Confirm previous findings: 12 strategies
- Include more complex and up-to-date strategies: 37 strategies total

Experiments:

- Round-robin tournament
- Ecological simulation

23. Our Results

Round-Robin Final Scores:

| Position | Strategy | Final Score |
|---|---|---|
| 1 | gradual | 103058 |
| 2 | soft_spiteful | 99920 |
| 3 | soft_joss | 99712 |
| 4 | hard_prober | 99478 |
| 5 | c_then_per_dc | 99016 |
| 6 | tit_for_tat | 96896 |
| 7 | doubler | 96757 |
| 8 | prober4 | 94417 |
| 9 | worse_and_worse3 | 92701 |
| 10 | spiteful | 92118 |
| 11 | soft_majo | 91159 |
| 12 | soft_tf2t | 91110 |
| 13 | per_cd | 89480 |
| 14 | pavlov | 87934 |
| 15 | hard_tft | 87694 |
| 16 | slow_tft | 87462 |
| 17 | hard_tf2t | 87018 |
| 18 | mistrust | 86067 |
| 19 | prober | 86005 |
| 20 | prober2 | 83774 |
| 21 | per_ccd | 83589 |
| 22 | prober3 | 83287 |
| 23 | worse_and_worse | 81300 |
| 24 | prob_c_4_on_5 | 80901 |
| 25 | per_ddc | 80364 |
| 26 | worse_and_worse2 | 80150 |
| 27 | ipd_random | 79789 |
| 28 | per_cccdcd | 79538 |
| 29 | calculator | 79364 |
| 30 | easy_go | 79351 |
| 31 | hard_majo | 78757 |
| 32 | gradual_killer | 78739 |
| 33 | per_ccccd | 78610 |
| 34 | all_c | 75871 |
| 35 | hard_joss | 73248 |
| 36 | better_and_better | 72731 |
| 37 | all_d | 72022 |

Ecological Evolution Final Populations:

| Strategy | Final Population |
|---|---|
| gradual | 569 |
| soft_joss | 541 |
| soft_spiteful | 529 |
| tit_for_tat | 412 |
| doubler | 400 |
| c_then_per_dc | 286 |
| soft_tf2t | 232 |
| soft_majo | 151 |
| slow_tft | 137 |
| worse_and_worse3 | 135 |
| hard_tf2t | 132 |
| pavlov | 125 |
| spiteful | 28 |
| hard_tft | 19 |
| hard_prober | 0, stopped in 79 |
| mistrust | 0, stopped in 37 |
| all_c | 0, stopped in 34 |
| prober4 | 0, stopped in 29 |
| prober2 | 0, stopped in 27 |
| easy_go | 0, stopped in 25 |
| prober | 0, stopped in 24 |
| hard_majo | 0, stopped in 24 |
| per_cd | 0, stopped in 23 |
| prober3 | 0, stopped in 22 |
| per_ccd | 0, stopped in 21 |
| prob_c_4_on_5 | 0, stopped in 19 |
| gradual_killer | 0, stopped in 19 |
| per_ccccd | 0, stopped in 19 |
| per_cccdcd | 0, stopped in 18 |
| calculator | 0, stopped in 17 |
| worse_and_worse | 0, stopped in 17 |
| ipd_random | 0, stopped in 16 |
| per_ddc | 0, stopped in 15 |
| hard_joss | 0, stopped in 14 |
| better_and_better | 0, stopped in 13 |
| worse_and_worse2 | 0, stopped in 13 |
| all_d | 0, stopped in 11 |

24. Our Results

- Results obtained confirm that Gradual out-performs Tit-for-Tat and other strategies
- The more complex versions out-perform their simple counterparts
- Two new strategies are more successful than Tit-for-Tat:
  - soft-joss: always cooperates; plays like Tit for Tat, but it defects only with the probability 0.9
  - soft-spiteful: cooperates if the opponent cooperates; otherwise retaliates with 4 Ds followed by 2 Cs
- Both strategies promote forgiveness!

25. Conclusions

- Gradual consistently out-performed Tit-for-Tat = Axelrod's results are out-dated and Tit-for-Tat is no longer the best strategy for IPD
- Strategies out-performing Tit-for-Tat use additional complexity = Axelrod's original statement that a strategy is required to be simple is not necessarily true
- Encouraging forgiveness + protection from exploitation = increasing emergence of cooperative gameplay

26. The Prisoner's Dilemma

27. How the game was organised

- How would ex-prisoners/students cope when needing to cooperate?
- All players were put into a scenario with two options: to co-operate or to defect.
- Seven matches. Six iterations used.
- Each individual was briefed on what the accomplice had decided to do (co-operate or defect)

28. Payoffs in the Game

| | Player B: Cooperate | Player B: Defect |
|---|---|---|
| Player A: Cooperate | 1 month | Prisoner A: 1 year; Prisoner B: goes free |
| Player A: Defect | Prisoner A: goes free; Prisoner B: 1 year | 3 months |

29. Results from games with Ex-Prisoners

| Match | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| Relationship | Siblings | Siblings | Siblings | Friends | Friends | Cousins | Partners |
| Participant | 1 2 | 1 2 | 1 2 | 1 2 | 1 2 | 1 2 | 1 2 |
| Stage 1 | C D | C C | C C | C C | C C | C C | C C |
| Stage 2 | D D | C C | C C | C C | C C | C C | C C |
| Stage 3 | D D | C D | C C | C C | D C | C C | C C |
| Stage 4 | D D | D D | C D | C C | D D | C C | C C |
| Stage 5 | D D | D D | C C | C C | D D | D C | C C |
| Stage 6 | D D | C C | C C | D C | D D | D D | C D |

\* Letters in red are ex-prisoners

30. Ex-Prisoner Results showed

The results show that 8 out of the 14 participants defected in the final stage.
Of the 14, only 2 defected in the first stage, and most kept cooperating right up until stages 3 or 4. Out of the 7 pairings, 3 pairs both defected in the last stage, 2 pairs both cooperated in the last stage and 2 pairs had 1 of the 2 participants defecting in the last stage. All the other ex-prisoners cooperated up until the point that their partner defected. Of the 8 participants wh
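The final-stage figures quoted on slide 30 can be recomputed from the stage 6 row of the slide 29 results. The short tally below is only a sketch: the pair strings are read off the transcribed table (participant 1 first, then participant 2), and the function name and outcome labels are hypothetical.

```python
from collections import Counter

# Stage 6 choices for the seven pairs, as transcribed from slide 29.
# Each string is "<participant 1><participant 2>"; C = cooperate, D = defect.
FINAL_STAGE = ["DD", "CC", "CC", "DC", "DD", "DD", "CD"]


def summarise(pairs):
    """Count individual defections and classify each pair's joint outcome."""
    defections = sum(pair.count("D") for pair in pairs)
    outcomes = Counter()
    for pair in pairs:
        if pair == "DD":
            outcomes["both defected"] += 1
        elif pair == "CC":
            outcomes["both cooperated"] += 1
        else:
            outcomes["one defected"] += 1
    return defections, dict(outcomes)


defections, outcomes = summarise(FINAL_STAGE)
print(defections, outcomes)
```

The tally agrees with the slide: 8 of the 14 participants defected in the last stage, with 3 pairs both defecting, 2 pairs both cooperating and 2 pairs split.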

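The Gradual rule described on slide 18 (N total defections answered by N consecutive defections, then two co-operations) can be written as a small state machine. This is a sketch under stated assumptions rather than the implementation used in the PRISON software. It assumes Gradual cooperates on the first move, and that defections the opponent makes during a punishment run are counted towards N but do not restart the run. The class and function names are hypothetical.

```python
class Gradual:
    """Sketch of the Gradual rule: N defections, then two cooperations."""

    def __init__(self):
        self.opponent_defections = 0  # N: total opponent defections seen so far
        self.queue = []               # moves still owed from the current run

    def next_move(self, opponent_last):
        """Return "C" or "D" given the opponent's previous move (None on round 1)."""
        if opponent_last == "D":
            self.opponent_defections += 1
        if self.queue:
            # Assumption: finish the current punish-then-calm run first.
            return self.queue.pop(0)
        if opponent_last == "D":
            # Punish with N defections, then offer two cooperations.
            self.queue = ["D"] * self.opponent_defections + ["C", "C"]
            return self.queue.pop(0)
        return "C"


def gradual_replies(opponent_moves):
    """Return Gradual's moves against a fixed script of opponent moves."""
    player = Gradual()
    replies = []
    last = None
    for move in opponent_moves:
        replies.append(player.next_move(last))
        last = move
    return "".join(replies)


if __name__ == "__main__":
    # Opponent defects in rounds 2 and 6.
    print(gradual_replies("CDCCCDCCCCC"))
```

Against an opponent that defects in rounds 2 and 6, this sketch answers the first defection with one D and the second with two Ds, each followed by two Cs, giving the reply sequence `CCDCCCDDCCC`.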