Iterated Prisoner’s Dilemma Game in Evolutionary Computation 2003. 10. 2 Seung-Ryong Yang

Page 1

Iterated Prisoner’s Dilemma Game in Evolutionary Computation

2003. 10. 2

Seung-Ryong Yang

Page 2

Agenda

Motivation

Iterated Prisoner’s Dilemma Game

Related Works

Strategic Coalition

Improving Generalization Ability

Experimental Results

Conclusion

Page 3

Motivation

Evolutionary approach
• Understand complex behaviors by examining simulation results from an evolutionary process
• Provide a way to find optimal strategies in a dynamic environment

IPD game
• Models complex phenomena such as social and economic behaviors
• Provides a testbed for modeling a dynamic environment

Objectives
• Obtain multiple good strategies
• Form coalitions to improve generalization ability

Page 4

Iterated Prisoner’s Dilemma Game (1/2)

Overview
• Prisoner’s possible choices: Defection, Cooperation

Characteristics
• Non-cooperative
• Non-zero-sum

Types of Game
• 2IPD (2-player Iterated Prisoner’s Dilemma) game
• NIPD (N-player Iterated Prisoner’s Dilemma) game

Payoff Matrix of 2IPD Game by Axelrod, R. (1984)

Each cell gives the row player’s payoff / the column player’s payoff.

            Cooperate   Defect
Cooperate   R / R       S / T
Defect      T / S       P / P

$T > R > P > S,\ 2R > T + S$

            Cooperate   Defect
Cooperate   3 / 3       0 / 5
Defect      5 / 0       1 / 1
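The payoff values in the table above can be written as a small lookup, which also makes the two Prisoner's Dilemma conditions checkable. This is a minimal sketch: the constant names and the 0/1 move encoding are assumptions for illustration (the encoding follows the AllD example later in the deck, which is all 1s), and each tuple is read as row player's payoff, column player's payoff.

```python
# Axelrod's 2IPD payoffs as given in the numeric table.
T, R, P, S = 5, 3, 1, 0  # Temptation, Reward, Punishment, Sucker's payoff

# The two conditions that define the dilemma.
assert T > R > P > S
assert 2 * R > T + S

COOPERATE, DEFECT = 0, 1  # assumed encoding (hypothetical constants)

PAYOFF = {
    (COOPERATE, COOPERATE): (R, R),
    (COOPERATE, DEFECT): (S, T),
    (DEFECT, COOPERATE): (T, S),
    (DEFECT, DEFECT): (P, P),
}


def payoff(row_move, col_move):
    """Return (row player's payoff, column player's payoff) for one round."""
    return PAYOFF[(row_move, col_move)]
```

For example, `payoff(COOPERATE, DEFECT)` gives the cooperating row player 0 and the defecting column player 5.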

Page 5

Iterated Prisoner’s Dilemma Game (2/2)

Representation of Strategy

[Figure: History Table. The history holds the player’s Own History followed by the Opponent’s History, each running from Recent Action to Last Action, stored as a bit string (0 1 0 ∙∙∙ 1). Example history for l = 2: 11 01. Label: 2^N History.]
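A history-table strategy of this kind can be sketched as a lookup indexed by the recent moves. The sketch below assumes a table of 2^(2l) entries, a 0/1 move encoding, and a particular bit order (own moves first, then the opponent's, oldest first); the function names are hypothetical and none of these details are confirmed by the slide.

```python
L = 2  # history length l, as in the slide's example


def history_index(own, opp):
    """Pack the last L own moves and the last L opponent moves into an integer.

    Assumed bit order: own moves first, then the opponent's, oldest first.
    """
    bits = list(own[-L:]) + list(opp[-L:])
    index = 0
    for b in bits:
        index = (index << 1) | b
    return index


def next_move(table, own, opp):
    """Look up the next move in a table with 2**(2*L) entries."""
    assert len(table) == 2 ** (2 * L)
    return table[history_index(own, opp)]


# The example history from the slide: own = 1 1, opponent = 0 1.
print(history_index([1, 1], [0, 1]))  # 13, i.e. binary 1101
```

With l = 2 the table has 16 entries, which matches the history indices 0 to 15 shown in the example game later in the deck.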

Page 6

Related Works

Previous Study
• Paul J. Darwen and Xin Yao (1997): Speciation as Automatic Categorical Modularization
• Onn M. Shehory, et al. (1998): Multi-agent Coordination through Coalition Formation
• Y. G. Seo and S. B. Cho (1999): Exploiting Coalition in Co-Evolutionary Learning

Issues
• Work on coalition formation in multi-agent environments covers broad topics
• Darwen and Yao studied coalition in the IPD game, but with a different approach
• Focus has been on cooperation, the number of players, payoff variances, etc.

Page 7

What is Different?

Co-evolutionary Learning
• Selection methods: rank-based, roulette wheel, tournament

Coalition Formation
• A coalition keeps surviving into the next generation
• The condition for forming a coalition is flexible

Decision Making in Coalition
• Several decision-making methods are adapted to the coalition: Borda function, Condorcet function, average payoff, highest payoff, weighted voting

Page 8

Evolving Strategy

To evolve strategies, we use:
• Genetic algorithm
• Co-evolutionary learning
• Strategic coalition

Evolutionary Process

Page 9

Evolution of Agents (1/2)

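[Figure: Three stages labeled Before Population, Current Population, and Next Population. The before population contains coalitions C1, Ci, and Ck; the current population also contains Cj; the next population also contains Cl.]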

Evolution of Agents
• Agents develop their strategies through co-evolutionary learning
• Weak agents are removed from the population

Evolution of Coalition
• A formed coalition survives into the next generation
• Agents can join a coalition generation by generation
• A coalition survives or grows

Page 10

Evolution of Agents (2/2)

Problem: evolution may end up being driven by weak agents
• Cause: better agents that belong to a coalition are removed from the population
• Remedy: make new agents by mixing better agents within the coalition

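[Figure: The Population contains coalitions Ci, Cj, and Ck. Two agents, A1 and A2, are taken from a Coalition by Random Extraction, and Mutation yields a new agent Ai. The step is repeated as many times as there are agents in the coalition.]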
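The procedure in this slide (random extraction of two coalition members, mixing, mutation, repeated once per member) could look roughly like the sketch below. The slide does not say how two agents are mixed, so the one-point crossover is an assumption, as are the function name and the bit-list representation; the default mutation rate of 0.001 is taken from the Test Environment slide.

```python
import random


def make_offspring(coalition, mutation_rate=0.001, rng=random):
    """Create one new agent per coalition member by mixing two random members.

    coalition: list of at least two equal-length bit lists (length >= 2).
    """
    offspring = []
    for _ in range(len(coalition)):
        a1, a2 = rng.sample(coalition, 2)   # random extraction of two agents
        cut = rng.randrange(1, len(a1))     # assumed one-point crossover
        child = a1[:cut] + a2[cut:]
        # Bit-flip mutation, applied independently to every bit.
        child = [bit ^ 1 if rng.random() < mutation_rate else bit
                 for bit in child]
        offspring.append(child)
    return offspring
```

Passing a seeded `random.Random` instance as `rng` makes a run repeatable, which helps when comparing coalition variants.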

Page 11

Strategic Coalition (1/2)

What is Coalition?
• A cooperative game as a set A of agents in which each subset of A is called a coalition (Matthias Klusch and Andreas Gerber, 2002)
• A group of agents that work jointly in order to accomplish their tasks (Onn M. Shehory, 1995)

Coalition in the IPD game
• Coalitions form through round-robin games
• A coalition pursues higher payoff using its generalization ability
• Coalitions form autonomously, without supervision

Page 12

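Strategic Coalition (2/2)

Definitions

Definition 1 : Coalition Value

$S_C = \frac{1}{|C|} \sum_{i=1}^{|C|} p_i \cdot w_i$, where $w_i = \frac{p_i}{\sum_{j=1}^{|C|} p_j}$

Definition 2 : Payoff Function

$T > R > P > S,\ 2R > T + S$

Definition 3 : Coalition Identification

Definition 4 : Decision Making

$D_C = \begin{cases} 0\ (\text{Cooperate}) & \text{if } 1 < \dfrac{\sum_{i=1}^{|C|} C_i^C \cdot w_i}{\sum_{i=1}^{|C|} C_i^D \cdot w_i} \\ 1\ (\text{Defect}) & \text{if } 0 < \dfrac{\sum_{i=1}^{|C|} C_i^C \cdot w_i}{\sum_{i=1}^{|C|} C_i^D \cdot w_i} \le 1 \end{cases}$

Definition 5 : Payoff Distribution

$p_i = w_i \cdot S_C$, where $w_i$ is determined by $\mathrm{Rank}_i$, the rank of agent $i$ within coalition $C$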
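One reading of Definitions 1 and 5 can be sketched as follows, assuming the coalition value is S_C = (1/|C|) Σ p_i·w_i with confidence weights w_i = p_i / Σ p_j, and that each member receives p_i = w_i·S_C. The function names are hypothetical, and the rank-based weights of Definition 5 are passed in as plain numbers because their exact form is not given here.

```python
def confidence_weights(payoffs):
    """w_i = p_i / sum of all p_j; assumes the payoffs sum to a positive value."""
    total = sum(payoffs)
    return [p / total for p in payoffs]


def coalition_value(payoffs):
    """S_C = (1/|C|) * sum of p_i * w_i over the coalition members."""
    weights = confidence_weights(payoffs)
    return sum(p * w for p, w in zip(payoffs, weights)) / len(payoffs)


def distribute(weights, s_c):
    """Payoff distribution p_i = w_i * S_C for externally supplied weights."""
    return [w * s_c for w in weights]


payoffs = [3.0, 1.0]
print(coalition_value(payoffs))  # 1.25
```

In the example, the weights are 0.75 and 0.25, so S_C = (3·0.75 + 1·0.25) / 2 = 1.25.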

Page 13

Coalition Formation (1/2)

[Figure: Agents A1, A2, ..., An in the Initial Population play the 2IPD game and Form Coalition; the result is a Population Including Coalition, in which agents are grouped into coalitions C1, C2, ..., Ci.]

Page 14

Coalition Formation (2/2)

Algorithm

[Flowchart with the following steps and decisions: 2IPD Game; Exceeds iteration per generation?; Game type? (Agent vs. Agent, Agent vs. Coalition, Coalition vs. Coalition); Satisfy condition for forming coalition?; Forming Coalition; Joining Coalition; Genetic Operation; Satisfy condition?; Stop. Decision branches are labeled Y and N.]

Conditions 1 to 3 relate the payoffs $p_i$ and $p_j$ of the two players to $\frac{S + T}{2}$; condition 2 also involves a coalition $C$.

Forming coalition
1. Round-robin 2IPD game
2. Obtain rank
3. Determine confidence of agent according to the rank

Joining coalition
1. Round-robin 2IPD game
2. Obtain rank
3. If number of agents > max. number of agents within a coalition, remove the weakest agent
4. Determine confidence of each agent
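Steps 2 and 3 of the joining procedure can be sketched as below. The round-robin game is not simulated: payoffs are passed in as a dictionary, and the function and parameter names are hypothetical. Step 4 (confidence) is left out because the slide does not give the rule that maps rank to confidence.

```python
def join_coalition(coalition, newcomer, payoff_of, max_size):
    """Add a newcomer, rank members by payoff, and drop the weakest if over the cap.

    coalition: list of agent ids; payoff_of: dict mapping id -> round-robin payoff.
    Returns the members ordered from rank 1 (highest payoff) downward.
    """
    members = coalition + [newcomer]
    ranked = sorted(members, key=lambda a: payoff_of[a], reverse=True)
    if len(ranked) > max_size:
        ranked = ranked[:max_size]  # remove the weakest agent
    return ranked


payoffs = {"A1": 2.0, "A2": 3.0, "A3": 2.5}
print(join_coalition(["A1", "A2"], "A3", payoffs, max_size=2))  # ['A2', 'A3']
```

Because the newcomer is ranked together with the existing members, a weak newcomer is the one removed when the coalition is already full.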

Page 15

Coalition Decision Making

Decision making
• To decide the coalition’s opinion
• Uses the weighted voting method

Sharing profits
• Payoff is distributed according to each agent’s confidence
• Rank influences each weight

Determining next action of coalition
• $C_i^C$ : Weight for cooperation of coalition Ci
• $C_i^D$ : Weight for defection of coalition Ci

[Figure: Coalitions Ci, Cj, Ck, and Cl go from Previous Action to Next Action; the next action of Ci is C or D, according to $C_i^C$ and $C_i^D$.]
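Weighted voting over a coalition's members can be sketched as follows. The sketch assumes each member proposes a move (0 = cooperate, 1 = defect) and carries a confidence weight, and that the coalition cooperates only when the total weight behind cooperation is strictly larger; the tie-breaking toward defection and all names are assumptions for illustration.

```python
COOPERATE, DEFECT = 0, 1  # assumed encoding


def coalition_move(votes, weights):
    """Return the coalition's move from members' votes and confidence weights."""
    coop = sum(w for v, w in zip(votes, weights) if v == COOPERATE)
    defect = sum(w for v, w in zip(votes, weights) if v == DEFECT)
    # Cooperate only on a strict weighted majority; ties fall to defection.
    return COOPERATE if coop > defect else DEFECT


# Two low-confidence cooperators are outvoted by one high-confidence defector.
print(coalition_move([0, 0, 1], [0.2, 0.2, 0.6]))  # 1
```

The example shows why rank matters: the vote is decided by confidence, not by head count.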

Page 16

Weight of Agents

Adjusting weight
• Gives an incentive to agents in the coalition
• Reflects the decision making of the coalition

[Figure: Coalitions Ci, Cj, Ck, and Cl go from Previous Action to Next Action (C or D); the weights $C_i^C$ and $C_i^D$ are then adjusted.]

Page 17

Improving Generalization Ability (1/2)

Problem of one good strategy
• Not adaptive to a dynamic environment
• Obtain multiple good strategies for a specific environment
• Ex) Biological immune system

Method
• Fitness sharing
• Adjust confidences of multiple strategies by evolution
• Co-evolution
• Coalition formation
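Fitness sharing is named here without details. A common textbook formulation divides each individual's raw fitness by a niche count, so that similar strategies share their reward and diverse ones are favored. The sketch below uses that general formulation with Hamming distance and a linear sharing function; it is not necessarily the variant used in this work, and the names and the `sigma_share` parameter are assumptions.

```python
def hamming(a, b):
    """Number of positions at which two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


def shared_fitness(population, raw_fitness, sigma_share):
    """Divide each raw fitness by its niche count.

    The niche count sums max(0, 1 - d / sigma_share) over the whole population,
    including the individual itself, so it is always at least 1.
    """
    shared = []
    for i, ind in enumerate(population):
        niche = sum(max(0.0, 1.0 - hamming(ind, other) / sigma_share)
                    for other in population)
        shared.append(raw_fitness[i] / niche)
    return shared


pop = [[0, 0], [0, 0], [1, 1]]
print(shared_fitness(pop, [2.0, 2.0, 2.0], sigma_share=1))  # [1.0, 1.0, 2.0]
```

The two identical strategies split their fitness, while the distinct one keeps all of it.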

Page 18

Improving Generalization Ability (2/2)

Generalization ability: how well a player performs against an unknown player

Evaluation

[Figure: Evaluation procedure. Random Generation of 100 Strategies, followed by a 2IPD Game and extraction of the Top Strategies in the Population (ranks 1 to 10, shown as bit strings such as 0001110...). The Top Strategies then play an IPD Game against the Genetically Evolved Strategies.]

Page 19

Test Strategies

Strategy      Characteristics                                                      Example Strategy
Tit-for-Tat   Initially cooperate, and then follow the opponent                    0 0 1 0 1 1 0 0
Trigger       Initially cooperate; once the opponent defects, defect continuously  0 0 0 1 1 1 1 1
AllD          Always defect                                                        1 1 1 1 1 1 1 1
CDCD          Alternate cooperation and defection                                  0 1 0 1 0 1 0 1
CCD           Cooperate twice, then defect, repeatedly                             0 0 1 0 0 1 0 0
Random        Random move                                                          1 1 0 1 0 0 1 1
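The six test strategies can be written as small functions of the move histories. This sketch assumes 0 = cooperate and 1 = defect (AllD's example row is all 1s) and reads the example rows as move sequences; the function names and the opponent sequences used in the usage lines are hypothetical, chosen so that the outputs match the example rows.

```python
import random

C, D = 0, 1  # assumed encoding: AllD's example row is all 1s


def tit_for_tat(own, opp):
    return C if not opp else opp[-1]      # cooperate first, then copy opponent


def trigger(own, opp):
    return D if D in opp else C           # defect forever after any defection


def all_d(own, opp):
    return D


def cdcd(own, opp):
    return C if len(own) % 2 == 0 else D  # C, D, C, D, ...


def ccd(own, opp):
    return D if len(own) % 3 == 2 else C  # C, C, D, C, C, D, ...


def random_move(own, opp, rng=random):
    return rng.choice([C, D])


def moves_against(strategy, opponent_moves):
    """Replay a strategy against a fixed sequence of opponent moves."""
    own, opp = [], []
    for o in opponent_moves:
        own.append(strategy(own, opp))
        opp.append(o)
    return own


print(moves_against(tit_for_tat, [0, 1, 0, 1, 1, 0, 0, 0]))  # [0, 0, 1, 0, 1, 1, 0, 0]
print(moves_against(trigger, [0, 0, 1, 0, 0, 0, 0, 0]))      # [0, 0, 0, 1, 1, 1, 1, 1]
```

CDCD, CCD, and AllD ignore the opponent, so any eight opponent moves reproduce their example rows.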

Page 20

Example of Game

[Figure: Example 2IPD game, Tit-for-Tat vs. Evolved Strategy. Each strategy is shown as a bit string indexed by history (0 to 15). Bit strings shown: 1 0 1 1 1 0 0 1 1 1 1 0 1 0 1 1 1 1 0 1 0 0 1 1 0 0 0 1 0 0 1 1 1 0 1 1 0 0 0 1. Over rounds 1 to 5, the per-round payoffs are 3, 5, 1, 1, 1 for one player and 3, 0, 1, 1, 1 for the other.]

Page 21

Test Environment

Population size : 100

Crossover rate : 0.3

Mutation rate : 0.001

Number of generations : 200

Number of iterations : a third of population

Training set : Well-known 6 strategies

Experimental Result

Page 22

Evolved Strategy vs. Random

[Figure: Superior 10 Strategies. x-axis: strategy rank (1 to 10); y-axis: Payoff (0 to 4). Series: Coalition Payoff, Coalition S.D, Random Payoff, Random S.D.]

Rank  Genotype of evolved strategy  Evolved strategy        Random
                                    Avg. Payoff  S.D.       Avg. Payoff  S.D.
1     10111001111010111101          3.080000     1.998399   0.480000     0.499600
2     00111001111010111101          2.800000     1.989975   0.550000     0.497494
3     10111011111111111111          2.920000     1.998399   0.520000     0.499600
4     00111011111111111101          2.880000     1.996397   0.570000     0.667158
5     00111011111011111011          2.940000     1.989070   0.540000     0.555338
6     00110000111111111111          2.680000     1.690444   2.350000     1.996873
7     00111011111111111011          3.040000     1.999600   0.490000     0.499900
8     00111011111011111011          3.160000     1.993590   0.500000     0.670820
9     10111111111111111111          3.480000     1.941546   0.380000     0.485386
10    10111001111111111101          2.760000     1.985548   0.560000     0.496387

The Random strategy is one of the weakest strategies in the 2IPD game. The evolved strategies perform well here: all of them win against the Random test strategy with high payoffs.

Experimental Result

Page 23

Evolved Strategy vs. Tit-for-Tat

[Figure: Superior 10 Strategies. x-axis: strategy rank (1 to 10); y-axis: Payoff (0 to 4). Series: Coalition Payoff, Coalition S.D, TFT Payoff, TFT S.D.]

Rank  Genotype of evolved strategy  Evolved strategy        Tit-for-Tat
                                    Avg. Payoff  S.D.       Avg. Payoff  S.D.
1     11000100001011011100          3.020000     1.636948   2.640000     2.061650
2     01101100001010011100          3.000000     0.000000   3.000000     0.000000
3     10001000001011011100          1.040000     0.397995   0.990000     0.099499
4     00000100001010011100          1.080000     0.560000   1.020000     0.423792
5     10001000001011011100          2.980000     0.345832   2.970000     0.411218
6     01010100001011011100          3.000000     1.624808   2.670000     2.044774
7     11001000001010011100          1.040000     0.397995   0.990000     0.099499
8     11001100001011011110          3.000000     0.000000   3.000000     0.000000
9     01110100001011011100          3.020000     1.636948   2.640000     2.061650
10    01010100011011011100          3.000000     0.000000   3.000000     0.000000

Tit-for-Tat is a mimic strategy that cooperates on the first move of the 2IPD game. The evolved strategies respond appropriately so as not to lose the game, which supports their generalization ability.

Experimental Result

Page 24

Evolved Strategy vs. Trigger

[Figure: Superior 10 Strategies. x-axis: strategy rank (1 to 10); y-axis: Payoff (0 to 4). Series: Coalition Payoff, Coalition S.D, Trigger Payoff, Trigger S.D.]

Rank  Genotype of evolved strategy  Evolved strategy        Trigger
                                    Avg. Payoff  S.D.       Avg. Payoff  S.D.
1     10111011110011101000          1.040000     0.397995   0.990000     0.099499
2     10111011110011101001          1.040000     0.397995   0.990000     0.099499
3     00111011110011111000          1.060000     0.443170   1.010000     0.223383
4     10111011110011111001          1.040000     0.397995   0.990000     0.099499
5     10111011110011111001          1.080000     0.483322   1.030000     0.298496
6     10111011110011111001          1.040000     0.397995   0.990000     0.099499
7     10111111110010111000          1.040000     0.397995   0.990000     0.099499
8     00111011110011111001          1.040000     0.397995   0.990000     0.099499
9     10111011110011111001          1.060000     0.443170   1.010000     0.223383
10    00111011110011111001          1.040000     0.397995   0.990000     0.099499

The Trigger strategy never forgives an opponent’s defection. The way to win a game against Trigger is likewise to choose “defection” repeatedly.
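The most frequent pair of values in this table (1.040000 with S.D. 0.397995 for the evolved strategy, 0.990000 with S.D. 0.099499 for Trigger) is consistent with a game in which the evolved strategy defects once against a cooperating Trigger and both players defect in every other round. The check below assumes 100 rounds per game and a population standard deviation; neither is stated on the slide, so this is an inference from the numbers rather than a documented setup.

```python
from statistics import mean, pstdev

ROUNDS = 100  # assumed game length; not stated in the source


def payoff_stats(payoffs):
    """Average payoff and population standard deviation, rounded to 6 places."""
    return round(mean(payoffs), 6), round(pstdev(payoffs), 6)


# One round of (defect, cooperate) = 5 / 0, then mutual defection = 1 / 1.
evolved = [5] + [1] * (ROUNDS - 1)
trigger = [0] + [1] * (ROUNDS - 1)

print(payoff_stats(evolved))  # (1.04, 0.397995)
print(payoff_stats(trigger))  # (0.99, 0.099499)
```

Under the same assumptions, the small gap between the two averages comes entirely from that single exploited round.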

Experimental Result

Page 25

Evolved Strategy vs. AllD

[Figure: Superior 10 Strategies. x-axis: strategy rank (1 to 10); y-axis: Payoff (0 to 4). Series: Coalition Payoff, Coalition S.D, AllD Payoff, AllD S.D.]

Rank  Genotype of evolved strategy  Evolved strategy        AllD
                                    Avg. Payoff  S.D.       Avg. Payoff  S.D.
1     00111111111110101111          1.000000     0.000000   1.000000     0.000000
2     00111111111110101111          1.000000     0.000000   1.000000     0.000000
3     00111111111110101111          1.000000     0.000000   1.000000     0.000000
4     00111011111110101111          1.000000     0.000000   1.000000     0.000000
5     10111111111110101111          1.000000     0.000000   1.000000     0.000000
6     00111111111110101111          1.000000     0.000000   1.000000     0.000000
7     10111011111110101111          1.000000     0.000000   1.040000     0.397995
8     00111111111110101111          1.000000     0.000000   1.040000     0.397995
9     00111111111110101011          1.000000     0.000000   1.000000     0.000000
10    00111111111110101111          1.000000     0.000000   1.000000     0.000000

The only way not to lose against AllD is to choose “defection” on every move. Cooperation never pays in this game.

Experimental Result

Page 26

Number of Coalitions

[Figure: Number of coalitions per generation. x-axis: Generation (0 to 100); y-axis: Coalition (0 to 30).]

Coalitions survive into the next generation. Most coalitions are formed early in the evolutionary process, which keeps genetic diversity high and allows better choices against opponents. A coalition can grow when agents satisfy the conditions.

Experimental Result

Page 27

Comparing the Results

The evolved strategies earn higher payoffs against Random, CCD, and CDCD than against Tit-for-Tat, Trigger, and AllD. This indicates that the evolved strategies exploit their opponents’ actions well.

Experimental Result

Page 28

Bias of the Strategy

[Figure: Bias (y-axis, 0.4 to 1.1) over Generation (x-axis, 0 to 200), with one curve per opponent: Random, TFT, Trigger, AllD, CDCD, CCD.]

Bias shows how a strategy selects its next choice against its opponents. A higher bias means that the strategy chooses “cooperation” more often than “defection”, at the rate given by the bias, and vice versa.

Experimental Result

Page 29

Conclusions

Conclusion
• Strategic coalition might be a robust method that can adapt to a dynamic environment
• Decision-making methods influence the results, but not seriously
• The strategies evolved by coalition generalize well against various opponents

Discussion
• Can the strategic coalition be adapted to the NIPD game?
• Which parameters in the IPD game influence generalization ability?
• How can opponent strategies be constructed for testing?
• How can this problem be adapted to the real world?

Page 30

Examples (1)

Market Observer

Page 31

Examples (2)

Forest Prediction