Uri Zwick Tel Aviv University Simple Stochastic Games Mean Payoff Games Parity Games CSR 2008 Moscow, Russia



Mean Payoff Games

Simple Stochastic Games

Parity Games

Randomized subexponential algorithm for SSG

Deterministic subexponential algorithm for PG

[Figure: A simple Simple Stochastic Game]

Simple Stochastic Games (SSGs): Reachability version [Condon (1992)]

Objective: MAX/min the probability of getting to the MAX-sink

Two players: MAX and min

Vertex types: MAX, min, RAND (R), MAX-sink, min-sink

Simple Stochastic Games (SSGs): Strategies

A general strategy may be randomized and history dependent

A positional strategy is deterministic and history independent

Positional strategy for MAX: choice of an outgoing edge from each MAX vertex

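Simple Stochastic Games (SSGs): Values

Both players have positional optimal strategies

Every vertex i in the game has a value $v_i$:

$$v_i = \max_{\sigma} \min_{\tau} v_i(\sigma,\tau) = \min_{\tau} \max_{\sigma} v_i(\sigma,\tau)$$

Here $v_i(\sigma,\tau)$ is the probability of reaching the MAX-sink from i when MAX plays σ and min plays τ. The value is the same whether σ and τ range over positional or over general strategies

There are strategies that are optimal for every starting position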

Simple Stochastic Games (SSGs): Terminating binary games [Condon (1992)]

The outdegrees of all non-sinks are 2

All probabilities are ½

The game terminates with prob. 1

Easy reduction from general games to terminating binary games

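“Solving” terminating binary SSGs

The values $v_i$ of the vertices of a game are the unique solution of the following equations, where j and k denote the two successors of a non-sink vertex i:

$$v_i = \begin{cases} 1 & \text{if } i \text{ is the MAX-sink} \\ 0 & \text{if } i \text{ is the min-sink} \\ \max\{v_j, v_k\} & \text{if } i \text{ is a MAX vertex} \\ \min\{v_j, v_k\} & \text{if } i \text{ is a min vertex} \\ \tfrac{1}{2}(v_j + v_k) & \text{if } i \text{ is a RAND vertex} \end{cases}$$

Corollary: Decision version in NP ∩ co-NP

The values are rational numbers requiring only a linear number of bits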

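Value iteration (for binary SSGs)

Iterate the operator that replaces $v_i$ by $\max\{v_j, v_k\}$ at MAX vertices, by $\min\{v_j, v_k\}$ at min vertices, and by $\tfrac{1}{2}(v_j + v_k)$ at RAND vertices, where j and k are the successors of i

Converges to the unique solution

But, may require an exponential number of iterations just to get close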
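A minimal sketch of value iteration on a terminating binary SSG may help make the operator concrete. The dictionary encoding, the vertex names, and the tiny example game `GAME` are assumptions made for illustration and are not taken from the slides.

```python
def value_iteration(game, iterations):
    """Apply the value-iteration operator `iterations` times.

    `game` maps each vertex to (kind, successors), where kind is one of
    "MAX", "MIN", "RAND", "MAX_SINK", "MIN_SINK" (hypothetical encoding).
    """
    # Start from 0 everywhere except the MAX-sink.
    v = {u: (1.0 if kind == "MAX_SINK" else 0.0) for u, (kind, _) in game.items()}
    for _ in range(iterations):
        new = {}
        for u, (kind, succ) in game.items():
            if kind == "MAX_SINK":
                new[u] = 1.0
            elif kind == "MIN_SINK":
                new[u] = 0.0
            else:
                a, b = (v[s] for s in succ)
                if kind == "MAX":
                    new[u] = max(a, b)
                elif kind == "MIN":
                    new[u] = min(a, b)
                else:  # RAND: both successors have probability 1/2
                    new[u] = 0.5 * (a + b)
        v = new
    return v


# Hypothetical example game (terminates with probability 1).
GAME = {
    "one": ("MAX_SINK", ()),
    "zero": ("MIN_SINK", ()),
    "r": ("RAND", ("one", "zero")),
    "m": ("MIN", ("r", "one")),
    "x": ("MAX", ("m", "r2")),
    "r2": ("RAND", ("x", "zero")),
}

values = value_iteration(GAME, 50)
```

On this small game the iteration stabilizes after a handful of rounds; the exponential behaviour mentioned above needs specially constructed games.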

Simple Stochastic Games (SSGs): Payoff version [Shapley (1953)]


Limiting average version

Discounted version

Markov Decision Processes (MDPs)

Theorem [Epenoux (1964)]: Values and optimal strategies of an MDP can be found by solving an LP
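As an illustration of the theorem, one possible LP is sketched below for the simplest case: a terminating binary MDP of the reachability kind, with only MAX and RAND vertices. This particular formulation is an assumption chosen for illustration, not necessarily the LP shown in the talk; j and k denote the two successors of vertex i.

```latex
\begin{aligned}
\text{minimize}\quad   & \sum_i v_i \\
\text{subject to}\quad & v_i \ge v_j,\quad v_i \ge v_k          && \text{for every MAX vertex } i \\
                       & v_i = \tfrac{1}{2}(v_j + v_k)          && \text{for every RAND vertex } i \\
                       & v_{\text{MAX-sink}} = 1,\quad v_{\text{min-sink}} = 0
\end{aligned}
```

Every feasible vector dominates the true values componentwise when the game terminates with probability 1, so the minimum is attained exactly at the value vector. An optimal strategy picks, at each MAX vertex, a successor whose inequality is tight.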


SSG ∈ NP ∩ co-NP: Another proof

Deciding whether the value of a game is at least (at most) v is in NP ∩ co-NP

To show that value ≥ v, guess an optimal strategy for MAX

Find an optimal counter-strategy for min by solving the resulting MDP.

Is the problem in P?

Mean Payoff Games (MPGs) [Ehrenfeucht, Mycielski (1979)]

Non-terminating version

Discounted version

[Figure: MPGs, Payoff SSGs, Reachability SSGs]

Pseudo-polynomial algorithm [PZ’96]

Mean Payoff Games (MPGs) [Ehrenfeucht, Mycielski (1979)]

Value(σ,τ): average of the cycle formed

Again, both players have optimal positional strategies.
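Once both players fix positional strategies, every vertex has exactly one chosen successor, so the play from any start vertex runs into a cycle and the value is the average payoff on that cycle. The sketch below illustrates this; the function name, the `succ`/`weight` encoding, and the example numbers are assumptions made for illustration.

```python
from fractions import Fraction


def play_value(succ, weight, start):
    """Average edge payoff on the cycle reached from `start`.

    succ[v]        : the single successor chosen at v (by sigma or tau)
    weight[(u, v)] : integer payoff of edge (u, v)
    """
    seen = {}   # vertex -> position in `path`
    path = []
    v = start
    while v not in seen:
        seen[v] = len(path)
        path.append(v)
        v = succ[v]
    cycle = path[seen[v]:]          # the vertices visited forever
    total = sum(weight[(u, succ[u])] for u in cycle)
    return Fraction(total, len(cycle))


# Hypothetical example: a -> b -> c -> b -> c -> ...
SUCC = {"a": "b", "b": "c", "c": "b"}
WEIGHT = {("a", "b"): 10, ("b", "c"): 3, ("c", "b"): -1}

value = play_value(SUCC, WEIGHT, "a")   # payoff 10 on the prefix is irrelevant
```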

Selecting the second largest element with only four storage locations [PZ’96]

Parity Games (PGs): A simple example

[Figure: example game graph with vertex priorities 2, 1, 4, 1, 3, 2]

EVEN wins if the largest priority seen infinitely often is even

Parity Games (PGs)

[Figure: legend showing an EVEN vertex labeled 3 and an ODD vertex labeled 8]

EVEN wins if the largest priority seen infinitely often is even

Equivalent to many interesting problems in automata and verification:

Non-emptiness of ω-tree automata

Modal μ-calculus model checking

Parity Games (PGs) reduce to Mean Payoff Games (MPGs) [Stirling (1993)] [Puri (1995)]

Replace priority k by payoff $(-n)^k$

Move payoffs to outgoing edges
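The reduction can be sketched in a few lines: each edge leaving a vertex of priority k receives payoff $(-n)^k$, where n is the number of vertices, and EVEN plays the role of MAX. The function name and the three-vertex example are assumptions made for illustration.

```python
def parity_to_mean_payoff(edges, priority):
    """Give every edge (u, v) the payoff (-n)^priority[u], n = number of vertices."""
    n = len(priority)
    return {(u, v): (-n) ** priority[u] for (u, v) in edges}


# Hypothetical example with three vertices.
PRIORITY = {"a": 2, "b": 1, "c": 3}
EDGES = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "b")]

PAYOFF = parity_to_mean_payoff(EDGES, PRIORITY)

# Cycle a -> b -> a has largest priority 2 (even): total payoff is positive.
even_cycle = PAYOFF[("a", "b")] + PAYOFF[("b", "a")]
# Cycle b -> c -> b has largest priority 3 (odd): total payoff is negative.
odd_cycle = PAYOFF[("b", "c")] + PAYOFF[("c", "b")]
```

The largest priority on a simple cycle dominates the sum because the cycle has at most n vertices, so the sign of the cycle's mean payoff matches the parity of its largest priority.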

Switches

Value vector of strategy σ of MAX with respect to the optimal counter-strategy of min

Strategy/Policy Iteration

Start with some strategy σ (of MAX)

While there are improving switches, perform some of them

As each step is strictly improving and as there is a finite number of strategies, the algorithm must end with an optimal strategy

SSG ∈ PLS (Polynomial Local Search)
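The loop above can be sketched for a small terminating binary SSG as follows. This is an illustrative sketch under stated assumptions, not an efficient implementation: min's optimal counter-strategy is found by brute-force enumeration rather than by solving an MDP, all profitable switches are performed in each round, and the game encoding and example `GAME` are hypothetical.

```python
from fractions import Fraction
from itertools import product


def chain_values(game, choice):
    """Exact probabilities of reaching the MAX-sink once all choices are fixed.

    Solves the linear system by Gauss-Jordan elimination over Fractions;
    assumes the game terminates with probability 1, so the system is regular.
    """
    verts = list(game)
    idx = {u: k for k, u in enumerate(verts)}
    n = len(verts)
    rows = []
    for u in verts:
        kind, succ = game[u]
        row = [Fraction(0)] * (n + 1)      # last entry is the right-hand side
        row[idx[u]] += 1
        if kind == "MAX_SINK":
            row[n] = Fraction(1)
        elif kind == "RAND":
            for s in succ:
                row[idx[s]] -= Fraction(1, 2)
        elif kind in ("MAX", "MIN"):
            row[idx[choice[u]]] -= 1
        rows.append(row)
    for c in range(n):
        p = next(r for r in range(c, n) if rows[r][c] != 0)
        rows[c], rows[p] = rows[p], rows[c]
        pivot = rows[c][c]
        rows[c] = [x / pivot for x in rows[c]]
        for r in range(n):
            if r != c and rows[r][c] != 0:
                f = rows[r][c]
                rows[r] = [x - f * y for x, y in zip(rows[r], rows[c])]
    return {u: rows[idx[u]][n] for u in verts}


def best_response_values(game, sigma):
    """Values of sigma against an optimal counter-strategy of min (brute force)."""
    min_verts = [u for u, (kind, _) in game.items() if kind == "MIN"]
    best = None
    for picks in product(*(game[u][1] for u in min_verts)):
        choice = dict(sigma)
        choice.update(zip(min_verts, picks))
        vals = chain_values(game, choice)
        # An optimal counter-strategy minimizes every value, hence also the sum.
        if best is None or sum(vals.values()) < sum(best.values()):
            best = vals
    return best


def strategy_iteration(game, sigma):
    """Perform all improving switches until none remain."""
    sigma = dict(sigma)
    while True:
        vals = best_response_values(game, sigma)
        switches = {u: s for u, (kind, succ) in game.items() if kind == "MAX"
                    for s in succ if vals[s] > vals[sigma[u]]}
        if not switches:
            return sigma, vals
        sigma.update(switches)


# Hypothetical example game (terminates with probability 1).
GAME = {
    "one": ("MAX_SINK", ()),
    "zero": ("MIN_SINK", ()),
    "r": ("RAND", ("one", "zero")),
    "m": ("MIN", ("r", "one")),
    "x": ("MAX", ("m", "r2")),
    "r2": ("RAND", ("x", "zero")),
}

final_sigma, final_values = strategy_iteration(GAME, {"x": "r2"})
```

Starting from the poor choice `x -> r2`, a single switch to `x -> m` is enough here; exact rational arithmetic avoids spurious "improvements" caused by rounding.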

Strategy/Policy Iteration: Complexity?

Performing only one switch at a time may lead to exponentially many improvements, even for MDPs [Condon (1992)]

What happens if we perform all profitable switches [Hoffman-Karp (1966)]?

Not known to be polynomial. Best upper bound: $O(2^n/n)$ [Mansour-Singh (1999)]

No non-linear examples. Best lower bound: $2n - O(1)$ [Madani (2002)]

A randomized subexponential algorithm for simple stochastic games

A randomized subexponential algorithm for binary SSGs [Ludwig (1995)] [Kalai (1992)] [Matousek-Sharir-Welzl (1992)]

Start with an arbitrary strategy σ for MAX

Choose a random vertex $i \in V_{MAX}$

Find the optimal strategy σ’ for MAX in the game in which the only outgoing edge of i is (i,σ(i))

If switching σ’ at i is not profitable, then σ’ is optimal

Otherwise, let σ ← $(\sigma')^i$, the strategy obtained from σ’ by switching at i, and repeat

There is a hidden order of MAX vertices under which the optimal strategy returned by the first recursive call correctly fixes the strategy of MAX at vertices 1,2,…,i

All correct! Would never be switched!

The hidden order

$u_i(\sigma)$: the maximum sum of values of a strategy of MAX that agrees with σ on i

Order the MAX vertices such that $u_1(\sigma) \le u_2(\sigma) \le \cdots \le u_k(\sigma)$, where k is the number of MAX vertices

Positions 1,…,i were switched and would never be switched again

SSGs are LP-type problems [Björklund-Sandberg-Vorobyov (2002)] [Halman (2002)]

General (non-binary) SSGs can be solved in time

AUSO – Acyclic Unique Sink Orientations


Exponential algorithm for PGs [McNaughton (1993)] [Zielonka (1998)]

Vertices of highest priority (even)

Vertices from which EVEN can force the game to enter A

First recursive call

Lemma: (i)

(ii)

Second recursive call

In the worst case, both recursive calls are on games of size $n-1$
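The recursive scheme can be sketched as follows, in the form commonly attributed to Zielonka. Details are assumptions made for illustration: players are encoded as 0 (EVEN) and 1 (ODD), every vertex is assumed to have at least one successor, and the function names and example game are hypothetical.

```python
def attractor(verts, edges, owner, player, target):
    """Vertices of `verts` from which `player` can force the play into `target`."""
    attr = set(target)
    changed = True
    while changed:
        changed = False
        for u in verts - attr:
            succ = [w for w in edges[u] if w in verts]
            if owner[u] == player:
                forced = any(w in attr for w in succ)
            else:
                forced = all(w in attr for w in succ)
            if forced:
                attr.add(u)
                changed = True
    return attr


def zielonka(verts, edges, owner, priority):
    """Return (W0, W1), the winning sets of EVEN (0) and ODD (1)."""
    if not verts:
        return set(), set()
    d = max(priority[u] for u in verts)
    p = d % 2                                   # player favoured by priority d
    top = {u for u in verts if priority[u] == d}
    attr_top = attractor(verts, edges, owner, p, top)
    # First recursive call: the game without the attractor of the top vertices.
    win = list(zielonka(verts - attr_top, edges, owner, priority))
    if not win[1 - p]:
        win[p], win[1 - p] = set(verts), set()  # p wins everywhere
        return win[0], win[1]
    attr_opp = attractor(verts, edges, owner, 1 - p, win[1 - p])
    # Second recursive call: the game without what the opponent surely wins.
    win2 = list(zielonka(verts - attr_opp, edges, owner, priority))
    win2[1 - p] = win2[1 - p] | attr_opp
    return win2[0], win2[1]


# Hypothetical example: EVEN can stay at "a" forever (priority 2);
# from "b", ODD moves to "c", whose self-loop has priority 3.
VERTS = {"a", "b", "c"}
EDGES = {"a": ["a", "b"], "b": ["a", "c"], "c": ["c"]}
OWNER = {"a": 0, "b": 1, "c": 0}
PRIORITY = {"a": 2, "b": 1, "c": 3}

even_wins, odd_wins = zielonka(VERTS, EDGES, OWNER, PRIORITY)
```

Each of the two recursive calls removes at least one vertex, which is where the worst-case game size of $n-1$ per call comes from.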

Deterministic subexponential algorithm for PGs [Jurdzinski, Paterson, Zwick (2006)]

[Figure: second recursive call, dominion]

Idea: Look for small dominions!

Dominion: A (small) set from which one of the players can win without the play ever leaving this set

Dominions of size s can be found in $O(n^s)$ time

Open problems

● Polynomial algorithms?

● Is the Policy Improvement algorithm polynomial?

● Faster subexponential algorithms for parity games?

● Deterministic subexponential algorithms for MPGs and SSGs?

● Faster pseudo-polynomial algorithms for MPGs?