Intelligent Strategies for Several Zero-, One- and Two-Player Games

Mugurel Ionut Andreica, Nicolae Tapus

Politehnica University of Bucharest

Computer Science Department


Summary

• Motivation
• Zero-Player Games
  – Model natural evolution
• Single-Player Games
  – Model resource usage optimization strategies
• Two-Player Games
  – Model the behavior of agents with conflicting interests
• Conclusions & Future Work


Motivation

• Games = a major motivation for developing:
  – Intelligent systems
  – Efficient algorithmic techniques
• (Emerging) game theory provides:
  – Means for analyzing complex interactions between rational (economic) agents
  – Strategies for maximizing revenues


Zero-Player Games (1/3)

• natural evolution based on a set of rules
  – No decisions to be made => 0 players
• well-known model: binary cellular automata
  – n cells (1, 2, ..., n)
  – state of cell i at time moment t: q(i,t) = 0 or 1
  – transition function: q(i,t) = f(q(i-1,t-1), q(i,t-1), q(i+1,t-1))
• the considered cellular automaton
  – if (q(i,t-1)=1 and q(i+1,t-1)=0) then
    • q(i,t)=0 and q(i+1,t)=1
  – “swap” every pair of adjacent cells holding 1 and 0 (1 before 0)
  – models natural evolution which converges to equilibrium
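The rule above can be simulated directly, one time step at a time. The following is a minimal sketch, assuming the cells are stored in a Python list and that all swaps within a time step happen simultaneously; the function names `step` and `simulate` are illustrative and do not come from the slides.

```python
def step(cells):
    """Apply one synchronous time step: every adjacent pair (1, 0) becomes (0, 1)."""
    nxt = list(cells)
    for i in range(len(cells) - 1):
        # Two swapping pairs can never overlap: the shared cell would have
        # to be 0 and 1 at the same time, so in-place writes to nxt are safe.
        if cells[i] == 1 and cells[i + 1] == 0:
            nxt[i], nxt[i + 1] = 0, 1
    return nxt


def simulate(cells, m):
    """Return the state after m time steps, in O(n*m) time."""
    state = list(cells)
    for _ in range(m):
        state = step(state)
    return state


print(simulate([1, 1, 0, 0], 1))  # [1, 0, 1, 0]
print(simulate([1, 1, 0, 0], 3))  # [0, 0, 1, 1]
```

Once all the 0s lie before all the 1s, further steps leave the state unchanged, which is the equilibrium mentioned above.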


Zero-Player Games (2/3)

• objective: find the state of the automaton after m time steps
  – m=O(n) (after O(n) steps, all the 0s lie before all the 1s => no more “swaps” occur)
• easy in O(n·m) time
• linear time algorithm (independent of m)
  – for each zero i (in left-to-right order), compute the list of actions a(i,1), a(i,2), ..., a(i,na(i))
  – na(i) = the number of time steps after which the i-th zero reaches its final position
  – an action:
    • move (the cell to the left is a 1)
    • wait (the cell to the left is a 0)
  – a(i,j) = the action performed during time step na(i)-j (action a(i,1) is the last action performed and a(i,na(i)) is the first one)


Zero-Player Games (3/3)

• compute the list of actions for all the “zero”-s in O(n) overall time

– the list of actions for “zero” i+1 is obtained from the list of “zero” i

– the list of “zero” i = handled like a stack (popping some actions + pushing others)

• auxiliary information (total number of waits during the first 1≤x≤na(i) actions)

– can determine in O(1) time the position of each “zero” after m time steps
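The stack-based action lists are not spelled out on the slides, so they are not reproduced here. As a smaller illustration, the values na(i) alone can be obtained in O(n) overall time with a simple recurrence, which is a different and simpler technique than the one described above: a zero with no 1 to its left never moves, and otherwise it finishes either when it has passed all the 1s to its left or one step after the previous zero, whichever is later. The function name `settle_times` is hypothetical.

```python
def settle_times(cells):
    """Return na(i) for every zero, in left-to-right order.

    Sketch of a simple recurrence (not the stack-based method of the slides):
      na(i) = 0                               if no 1 lies to the left of zero i
      na(i) = max(na(i-1) + 1, ones_before)   otherwise
    """
    times = []
    ones_before = 0  # number of 1s seen so far (to the left of the current cell)
    prev = 0         # na of the previous zero
    for c in cells:
        if c == 1:
            ones_before += 1
        elif ones_before == 0:
            times.append(0)  # already in its final position
        else:
            prev = max(prev + 1, ones_before)
            times.append(prev)
    return times


print(settle_times([1, 1, 0, 0]))  # [2, 3]
```

For the state 1100, the first zero moves twice (na = 2), while the second zero waits once and then moves twice (na = 3), matching the move/wait actions defined earlier.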


Single-Player Games

• One player makes decisions

• Resource usage optimization
  – Minimize the amount of resources used
  – Maximize the amount of resources collected
• Two games:
  – 1D Push-*
  – Resource Collector


1D Push-* (1/2)

• Push-* = a simplified version of Sokoban
• 1D Push-*
  – Linear board with n cells (1, ..., n, left to right)
  – Robot: located on cell 1 => must reach cell n
  – Each cell: empty or occupied (by a block)
  – Moves:
    • Walk 1 cell left/right – energy consumption: W
    • Push (any number of blocks in sequence) 1 cell left/right – energy consumption: P
    • Jump K>1 cells (only if the previous K-1 moves were walks) – energy consumption: J
  – Objective: reach cell n + minimize the total energy consumed


1D Push-* (2/2)

• Maximal intervals of unoccupied cells:
  – I_1, I_2, ..., I_k (from left to right)
• Optimal strategy: never return (jump) from I_b to I_a (a<b)
• Dynamic programming:
  – E(i,j) = the minimum energy consumed in order to have the robot located at cell i and having j empty cells to the left (i.e., the cells i-1, i-2, …, i-j are empty)
• For each pair (i,j): O(n^2) sequences of moves to reach other pairs (i’,j’)
• O(n^4) overall time complexity


Resource Collector (1/2)

• Complete directed graph with n vertices
  – tr_{i,j} = the travel time from i to j
• m “boxes of resources” appear at certain vertices (new resources are made available at some vertices)
  – v_k = the vertex where the k-th box appears
  – ta_k = the time moment when the k-th box appears
  – c_k = the amount of resources inside the k-th box
  – box k is collected only if the player is located at v_k at time ta_k (or reaches v_k at time ta_k)
• Objective: collect the maximum amount of resources (when all the information is known in advance)
  – initially, the player is located at vertex 1


Resource Collector (2/2)

• easy (but inefficient) O(m^2) dynamic programming algorithm:
  – Cmax[k] = the maximum amount of resources which the player can collect if at time ta_k he/she arrives (or is located) at vertex v_k (and, thus, collects the resources in box k)
  – $C_{max}[k] = c_k + \max\{\, C_{max}[p] \;:\; p \neq k,\ ta_p + tr_{v_p,v_k} \le ta_k \,\}$
• improved algorithms (using efficient algorithmic techniques):
  – O(m·n·log(n)), for m>n·log(n)
  – O(n·(n+Tmax)), for integer time moments and small Tmax
    • Tmax = max{tr_{i,j}}
  – O(m·log^2(m)), when:
    • each vertex is a point on the OX axis: vertex i at point x_i
    • tr_{i,j} = |x_i - x_j|
    • OY axis: time => box k = point (x_k, ta_k) with weight c_k
    • Orthogonal range search techniques, by rotating all the points by 45 degrees
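The quadratic dynamic programming over the boxes can be sketched as follows. This is an illustrative sketch under several assumptions that the slides do not state: vertices are 0-indexed (so the start vertex 1 becomes index 0), the player starts at time 0, `tr[i][i]` is 0, `tr` already holds shortest travel times, and the player may wait at a vertex. The name `max_resources` is hypothetical.

```python
def max_resources(tr, boxes, start=0):
    """O(m^2) DP sketch for the Resource Collector game.

    tr[i][j] : travel time from vertex i to vertex j (assumed shortest-path
               closed, with tr[i][i] == 0)
    boxes    : list of (vertex, appearance_time, amount) triples
    start    : index of the initial vertex (assumption: the player is there at time 0)
    """
    unreachable = float("-inf")
    order = sorted(boxes, key=lambda b: b[1])  # process boxes by appearance time
    cmax = [unreachable] * len(order)
    best = 0  # collecting nothing is always possible
    for k, (vk, tak, ck) in enumerate(order):
        # Case 1: go straight from the start vertex to box k.
        if tr[start][vk] <= tak:
            cmax[k] = ck
        # Case 2: collect some earlier box p first, then travel to box k in time.
        for p in range(k):
            vp, tap, _ = order[p]
            if cmax[p] != unreachable and tap + tr[vp][vk] <= tak:
                cmax[k] = max(cmax[k], cmax[p] + ck)
        best = max(best, cmax[k])
    return best


tr = [[0, 2, 5], [2, 0, 2], [5, 2, 0]]
boxes = [(1, 2, 10), (2, 3, 100), (2, 4, 7), (0, 1, 1)]
print(max_resources(tr, boxes))  # 17
```

In the example, the box worth 100 cannot be reached in time, so the best plan collects the boxes worth 10 and 7.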


Two-Player Games

• Two players with conflicting interests
  – Try to maximize their revenues
• Two games:
  – K in a Row (a version of Kayles)
    • Sprague-Grundy game theory + Observing Patterns
  – Collect an Even/Odd Number of Objects
    • Dynamic Programming + Observing Patterns


K in a Row (1/2)

• linear board – n squares
• two players, moving alternately (1st player = the one making the first move)
• a move = cover K consecutive uncovered squares of the board
• the first player unable to move => loses the game
• using the (well-known) Sprague-Grundy game theory:
  – G(i) = the Grundy number of a board composed of i consecutive uncovered squares
  – G(i) = mex{G(s)} ; s = a state of the board which can be reached by performing one move
  – a move: cover the squares j+1, ..., j+K => s = j squares to the left and (i-j-K) squares to the right ; G(s) = G(j) xor G(i-j-K)
  – if G(n)>0 => the 1st player wins ; otherwise: the 2nd player wins
  – O(n^2) to compute all the Grundy numbers
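The recurrence above translates almost line by line into code. The sketch below assumes that a board with fewer than K uncovered squares has Grundy number 0 (no move is possible); the function name `grundy_numbers` is illustrative.

```python
def grundy_numbers(n, K):
    """Return the list [G(0), G(1), ..., G(n)] for the K in a Row game, in O(n^2) time."""
    g = [0] * (n + 1)  # boards with fewer than K squares allow no move: G = 0
    for i in range(K, n + 1):
        # Covering squares j+1..j+K leaves j squares on the left
        # and i-j-K squares on the right.
        reachable = {g[j] ^ g[i - j - K] for j in range(i - K + 1)}
        mex = 0
        while mex in reachable:
            mex += 1
        g[i] = mex
    return g


g = grundy_numbers(9, 2)
print(g)  # [0, 0, 1, 1, 2, 0, 3, 1, 1, 0]
print("1st player wins" if g[9] > 0 else "2nd player wins")  # 2nd player wins
```

For K=2 the boards of sizes 1, 5 and 9 have Grundy number 0, so the player to move loses there.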


K in a Row (2/2)

• Patterns => reduce the complexity of computing the winning strategy
  – K=1 => 1st player wins only if n is odd
  – K≥2: focus on losing states
  – let s_0=K-1, s_1, s_2, ... be the sequence of losing states (the states 1, 2, ..., K-2 are trivial and are not considered here)
  – let d_1, d_2, ... (with d_i = s_i - s_{i-1}) be the sequence of differences between consecutive losing states
  – K=2: d has a prefix of length 8 (4, 4, 6, 6, 4, 4, 6, 4) and a period of length 5 afterwards (4, 12, 4, 4, 10) => O(1) for computing the winner (and O(1) per move for the winning strategy)
  – K≥4: d_1=2·K, d_2=2·K, d_3=4·K-2, d_4=4·K-2, d_5=4·K, d_6=4·K-2, d_7=8·K-2, d_8=4·K-2, d_9=8·K, d_10=8·K-2, d_11=16·K-6, d_12=4·K => any state between 1 and 69·K-19 can be analyzed in O(1) time
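For K=2, the prefix and the period listed above are enough to decide in constant time whether a board of n squares is a losing state. The sketch below is one possible way to use them, assuming n ≥ 1; the function name `is_losing_k2` and the constant names are hypothetical.

```python
from itertools import accumulate

PREFIX_DIFFS = (4, 4, 6, 6, 4, 4, 6, 4)   # d_1 .. d_8 for K = 2
PERIOD_DIFFS = (4, 12, 4, 4, 10)          # repeats forever afterwards
PERIOD_LEN = sum(PERIOD_DIFFS)            # 34
# Offsets of the losing states inside one period, reduced modulo the period.
PERIOD_OFFSETS = {off % PERIOD_LEN for off in accumulate(PERIOD_DIFFS)}


def is_losing_k2(n):
    """Return True if a board of n >= 1 squares is losing for the player to move (K = 2)."""
    s = 1  # s_0 = K - 1
    if n == s:
        return True
    for d in PREFIX_DIFFS:
        s += d
        if n == s:
            return True
    if n < s:
        return False  # inside the prefix range, but not a losing state
    # Beyond the prefix, losing states repeat with period 34.
    return (n - s) % PERIOD_LEN in PERIOD_OFFSETS


print([n for n in range(1, 60) if is_losing_k2(n)])
# [1, 5, 9, 15, 21, 25, 29, 35, 39, 43, 55, 59]
```

The prefix loop has a fixed length of 8, so the test runs in O(1) time regardless of n.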


Collect an Even/Odd Number of Objects (1/2)

• a pile composed of n (odd) objects
• two players, moving alternately
• a move = take at least 1 and at most min{K, # objects in the pile} objects from the pile + keep the objects
• winner = the player who gathered an even total number of objects
• dynamic programming – O(n·K) : easy, but inefficient:
  – win[0,i] is 1, if the pile contains i objects, the winner must gather an even number of objects and the player whose turn is next has a winning strategy (and 0, otherwise)
  – win[1,i] is defined similarly, except that the winner must gather an odd number of objects
  – $win[0,i] = \begin{cases} 1, & \text{if } \exists\, c\ (1 \le c \le \min\{i,K\}) \text{ such that } win[((c \bmod 2) + 1 - ((i-c) \bmod 2)) \bmod 2,\ i-c] = 0 \\ 0, & \text{otherwise} \end{cases}$
  – $win[1,i] = \begin{cases} 1, & \text{if } \exists\, c\ (1 \le c \le \min\{i,K\}) \text{ such that } win[((c \bmod 2) - ((i-c) \bmod 2)) \bmod 2,\ i-c] = 0 \\ 0, & \text{otherwise} \end{cases}$
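The O(n·K) dynamic programming can be sketched as below. Two points are assumptions rather than statements from the slides: the first index is read as the parity that the player to move still has to gather from the remaining pile, and the base case for an empty pile is win[0,0]=1 and win[1,0]=0 (a player who needs an even number of further objects from an empty pile has already succeeded). The function name `win_table` is illustrative.

```python
def win_table(n, K):
    """Return win[x][i] for x in {0, 1} and 0 <= i <= n, in O(n*K) time."""
    win = [[0] * (n + 1) for _ in range(2)]
    win[0][0] = 1  # assumed base case: nothing left to gather, even target met
    for i in range(1, n + 1):
        for x in (0, 1):
            for c in range(1, min(i, K) + 1):
                # Parity index of the state faced by the opponent after taking c
                # objects; (1 - x) is the "+1" that appears only for x = 0.
                nxt = ((c % 2) + (1 - x) - ((i - c) % 2)) % 2
                if win[nxt][i - c] == 0:
                    win[x][i] = 1  # moving to a losing state for the opponent
                    break
    return win


win = win_table(9, 2)
print([win[0][n] for n in (1, 3, 5, 7, 9)])  # [0, 1, 0, 1, 0]
```

Python's `%` operator returns a non-negative result for a positive modulus, so the subtraction inside the index needs no extra adjustment.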


Collect an Even/Odd Number of Objects (2/2)

• improve the time complexity to O(n)
  – maintain last[x,y,z] (0≤x,y,z≤1) = the last value of i (number of objects in the pile) such that:
    • the parity of the total number of objects to be gathered by the winner is x (0 for even, 1 for odd)
    • y = ((the number i of objects in the pile) mod 2)
    • z = win[x,i]
  – $win[0,i] = \begin{cases} 1, & \text{if } (i - last[(i-1) \bmod 2,\ i \bmod 2,\ 0]) \le K \\ 1, & \text{if } (i - last[(i-1) \bmod 2,\ (i-1) \bmod 2,\ 0]) \le K \\ 0, & \text{otherwise} \end{cases}$
  – $win[1,i] = \begin{cases} 1, & \text{if } (i - last[i \bmod 2,\ i \bmod 2,\ 0]) \le K \\ 1, & \text{if } (i - last[i \bmod 2,\ (i-1) \bmod 2,\ 0]) \le K \\ 0, & \text{otherwise} \end{cases}$
• Patterns
  – K even: win[0,n]=0, only if (n mod (K+2)=1)
  – K odd: win[0,n]=0, only if (n mod (2·K+2)=1)
  – if winner = player who gathers an odd total # of objects
    • K odd: win[1,n]=0, only if (n mod (2·K+2)=(K+2))
    • K even: win[1,n]=0, only if (n mod (K+2)=(K+1))
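The linear-time variant can be sketched by keeping only the z=0 slice of last, since the recurrences only ever query losing states. As assumptions not stated on the slides, the base case is win[0,0]=1 and win[1,0]=0, and entries of last that have not been set yet are treated as minus infinity. The function name `win_table_fast` is illustrative.

```python
def win_table_fast(n, K):
    """Return win[x][i] for x in {0, 1} and 0 <= i <= n, in O(n) time."""
    never = float("-inf")
    win = [[0] * (n + 1) for _ in range(2)]
    win[0][0] = 1  # assumed base case for the empty pile
    # last[x][y] = latest pile size j seen so far with win[x][j] == 0 and j % 2 == y
    # (the z = 0 slice of last[x, y, z]).
    last = [[never, never], [never, never]]
    last[1][0] = 0  # win[1][0] == 0 and 0 is even
    for i in range(1, n + 1):
        for x in (0, 1):
            # Parity index faced by the opponent:
            # (i-1) mod 2 for x = 0 and i mod 2 for x = 1.
            nxt = (i + x + 1) % 2
            # A losing state at distance <= K (either pile parity) gives a winning move.
            if i - max(last[nxt]) <= K:
                win[x][i] = 1
        # Update last only after both entries for pile size i are known.
        for x in (0, 1):
            if win[x][i] == 0:
                last[x][i % 2] = i
    return win


win = win_table_fast(9, 3)
print([win[0][n] for n in (1, 3, 5, 7, 9)])  # [0, 1, 1, 1, 0]
```

For K=3 the first player loses the "even total" game exactly at n=1 and n=9 among the odd n up to 9, in line with the pattern n mod (2·K+2)=1.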


Conclusions & Future Work

• intelligent strategies/algorithmic techniques for:
  – zero-player games
    • special kind of cellular automaton
    • efficient algorithm for state evaluation
  – single-player games
    • make decisions in order to reach a goal and optimize resource usage (minimize resource consumption, maximize the amount of gathered resources)
    • dynamic programming algorithms
    • geometric techniques
  – two-player games
    • agents with conflicting interests
    • Sprague-Grundy game theory + dynamic programming
    • observing unexpected, non-standard patterns for losing states => improve the time complexity of computing winning strategies
• future work
  – tackle more realistic and real-time game models
  – develop a game-theoretic approach towards resource management and rational decision making in resource optimization


Thank You!
