Transcript
Page 1: Games, Times, and Probabilities: Value Iteration in Verification and Control Krishnendu Chatterjee Tom Henzinger

Games, Times, and Probabilities: Value Iteration in Verification and Control

Krishnendu Chatterjee Tom Henzinger

Page 2:

Graph Models of Systems

vertices = states

edges = transitions

paths = behaviors

Page 3:

graph

Extended Graph Models

CONTROL: game graph

OBJECTIVE: ω-automaton

PROBABILITIES: Markov decision process

stochastic game

regular game

CLOCKS: timed automaton

stochastic hybrid system

Page 4:

Graphs vs. Games

[Figure: example graph and game with transitions labeled a and b.]

Page 5:

Games model Open Systems

Two players: environment / controller / input vs. system / plant / output

Multiple players: processes / components / agents

Stochastic players: nature / randomized algorithms

Page 6:

Example

P1:

init x := 0

loop

choice | x := x+1 mod 2| x := 0

end choice

end loop

φ1: (x = y)

P2:

init y := 0

loop

choice | y := x | y := x+1 mod 2

end choice

end loop

φ2: (y = 0)

Pages 7-8:

Graph Questions

∀ (x = y)

∃ (x = y)

[Figure: state graph over the states 00, 10, 11, 01, with an X mark.]

CTL

Pages 9-12:

Zero-Sum Game Questions

⟨⟨P1⟩⟩ (x = y)

⟨⟨P2⟩⟩ (y = 0)

[Figure: game graph over the states 00, 10, 01, 11, with an X mark.]

ATL [Alur/H/Kupferman]

Pages 13-14:

Nonzero-Sum Game Questions

⟨⟨P1⟩⟩ (x = y)

⟨⟨P2⟩⟩ (y = 0)

[Figure: game graph over the states 00, 10, 01, 11.]

Secure equilibria [Chatterjee/H/Jurdzinski]

Pages 15-17:

Strategies

Strategies x, y: Q* → Q

From a state q, a pair (x,y) of a player-1 strategy x ∈ Π1 and a player-2 strategy y ∈ Π2 gives a unique infinite path Outcome_{x,y}(q) ∈ Q^ω.

⟨⟨P1⟩⟩ φ1 = (∃ x ∈ Π1) (∀ y ∈ Π2) φ1(x,y)

Short for:

q ⊨ ⟨⟨P1⟩⟩ φ1 iff (∃ x ∈ Π1) (∀ y ∈ Π2) ( Outcome_{x,y}(q) ⊨ φ1 )

⟨⟨P1⟩⟩ φ1 ⟨⟨P2⟩⟩ φ2 = (∃ x ∈ Π1) (∃ y ∈ Π2) [ (φ1 ∧ φ2)(x,y) ∧ (∀ y’ ∈ Π2) (φ2 → φ1)(x,y’) ∧ (∀ x’ ∈ Π1) (φ1 → φ2)(x’,y) ]

Page 18:

Objectives φ1 and φ2

Qualitative: reachability; Buechi; parity (ω-regular)

Quantitative: max; lim sup; lim avg
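The three quantitative objectives can be illustrated on an ultimately periodic path. The following is a minimal sketch, assuming the path is given as a finite weight prefix followed by a cycle repeated forever; the function name `path_values` and the sample weights are hypothetical and not taken from the slides.

```python
from fractions import Fraction

def path_values(prefix, cycle):
    """Values of the infinite weight sequence prefix . cycle^omega."""
    if not cycle:
        raise ValueError("cycle must be non-empty")
    return {
        # max ranges over every weight that ever occurs
        "max": max(prefix + cycle),
        # lim sup only sees weights that occur infinitely often
        "lim sup": max(cycle),
        # lim avg is the mean weight of the repeated cycle
        "lim avg": Fraction(sum(cycle), len(cycle)),
    }

# Hypothetical path: weights 5, 1, then (0, 3) repeated forever.
values = path_values([5, 1], [0, 3])
```

On this sample path the finite prefix influences only the max value, which is why max is the simplest of the three objectives.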

Pages 19-21:

Normal Forms of ω-Regular Sets

Borel-1: Reachability ◇a; Safety □a = ¬◇¬a

Borel-2: Buechi □◇a; coBuechi ◇□a = ¬□◇¬a

Borel-2.5: Streett ∧ ( □◇a → □◇b ) = ∧ ( ◇□¬a ∨ □◇b ); Rabin ∨ ( ◇□a ∧ □◇b )

Parity: complement-closed subset of Streett/Rabin

Pages 22-23:

Buechi Game

[Figure: game graph with states q0, q1, q2, q3, q4 and labels G and B.]

• Secure equilibrium (x,y) at q0:

x: if q1 → q0, then q2 else q4.

y: if q3 → q1, then q0 else q4.

• Strategies require memory.

Page 24:

Zero-Sum Games: Determinacy

φ1 = ¬φ2

W1: ⟨⟨P1⟩⟩ φ1

W2: ⟨⟨P2⟩⟩ φ2

Page 25:

Nonzero-sum Games

W10: ⟨⟨P1⟩⟩ (φ1 ∧ ¬φ2)

W01: ⟨⟨P2⟩⟩ (φ2 ∧ ¬φ1)

W11: ⟨⟨P1⟩⟩φ1 ⟨⟨P2⟩⟩φ2

W00

Pages 26-27:

Objectives

Qualitative: reachability; Buechi; parity (ω-regular)

Quantitative: max; lim sup; lim avg

Borel-1; Borel-2; Borel-3

Pages 28-30:

Quantitative Games

⟨⟨P1⟩⟩ lim sup = 3

⟨⟨P1⟩⟩ lim avg = 1

[Figure: game graph with weights 4, 2, 2, 0, 2, 0, 0, 4, 3.]

Pages 31-32:

Solving Games by Value Iteration

Generalization of the μ-calculus: computing fixpoints of transfer functions (pre; post).

Generalization of dynamic programming: iterative optimization.

Region R: Q → V

[Figure: state q with a successor q’ that carries the value R(q’).]

R(q) := pre(R(q’))

Pages 33-34:

Graph

Q states

Σ transition labels

δ: Q × Σ → Q transition function

ℛ = [ Q → {0,1} ] regions with V = B

∃pre:

q ∈ ∃pre(R) iff (∃ σ ∈ Σ) δ(q,σ) ∈ R

∀pre:

q ∈ ∀pre(R) iff (∀ σ ∈ Σ) δ(q,σ) ∈ R
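The two predecessor operators on a graph can be sketched as follows. This is a minimal illustration, assuming the transition function is stored as a dictionary from states to a dictionary from labels to successors; the names `epre`, `apre` and the three-state example are hypothetical.

```python
def epre(delta, region):
    """States with SOME label leading into region (the ∃pre operator)."""
    return {q for q, moves in delta.items()
            if any(succ in region for succ in moves.values())}

def apre(delta, region):
    """States with ALL labels leading into region (the ∀pre operator)."""
    return {q for q, moves in delta.items()
            if all(succ in region for succ in moves.values())}

# Hypothetical graph: delta[q][label] is the successor of q under label.
delta = {
    "a": {"0": "a", "1": "b"},
    "b": {"0": "c", "1": "c"},
    "c": {"0": "c", "1": "c"},
}
```

Here a region with values in V = B is represented as the set of states mapped to 1.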

Pages 35-38:

Graph

[Figure: graph with states a, b, c.]

∃◇c = (μX) ( c ∨ ∃pre(X) )

∀◇c = (μX) ( c ∨ ∀pre(X) )

Pages 39-42:

Graph Reachability

Given R ⊆ Q, find the states from which some path leads to R.

∃◇R = (μX) (R ∨ ∃pre(X))

Iterates: R; R ∪ pre(R); R ∪ pre(R) ∪ pre²(R); . . .
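The least-fixpoint iteration for graph reachability can be sketched as below. This is only an illustration under assumptions: the graph is a dictionary from states to successor sets, and the names `epre`, `exists_reach` and the sample graph are hypothetical.

```python
def epre(succ, region):
    """States with at least one successor in region."""
    return {q for q, targets in succ.items() if targets & region}

def exists_reach(succ, target):
    """Least fixpoint of X = target ∪ epre(X), iterated from the empty set."""
    x = set()
    while True:
        nxt = set(target) | epre(succ, x)
        if nxt == x:          # no new states: the fixpoint is reached
            return x
        x = nxt

# Hypothetical graph: d can never reach c.
succ = {"a": {"b"}, "b": {"c"}, "c": {"c"}, "d": {"d"}}
reach_c = exists_reach(succ, {"c"})
```

On a finite graph the loop terminates after at most |Q| + 1 rounds, because each round either adds a state or stops.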

Page 43:

Graph Reachability

Given R ⊆ Q, find the states from which all paths lead to R.

∀◇R = (μX) (R ∨ ∀pre(X))

Iterates: R; R ∪ pre(R); R ∪ pre(R) ∪ pre²(R); . . .

Page 44:

Value Iteration Algorithms

consist of

A. LOCAL PART: ∃pre and ∀pre computation

B. GLOBAL PART: evaluation of a fixpoint expression

We need to generalize both parts to solve games.

Pages 45-47:

Turn-based Game

Q1, Q2 states ( Q = Q1 ∪ Q2 )

Σ transition labels

δ: Q × Σ → Q transition function

ℛ = [ Q → {0,1} ] regions with V = B

1pre:

q ∈ 1pre(R) iff q ∈ Q1 ∧ (∃ σ ∈ Σ) δ(q,σ) ∈ R or q ∈ Q2 ∧ (∀ σ ∈ Σ) δ(q,σ) ∈ R

2pre:

q ∈ 2pre(R) iff q ∈ Q1 ∧ (∀ σ ∈ Σ) δ(q,σ) ∈ R or q ∈ Q2 ∧ (∃ σ ∈ Σ) δ(q,σ) ∈ R
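The controllable predecessor operators of a turn-based game can be sketched as follows. The encoding is an assumption made for illustration: `q1` is the set of player-1 states, every other state belongs to player 2, and `succ` maps each state to its set of successors; all names and the example game are hypothetical.

```python
def pre1(q1, succ, region):
    """1pre: player 1 picks a successor at Q1 states, player 2 elsewhere."""
    return {q for q, targets in succ.items()
            if (q in q1 and any(t in region for t in targets))
            or (q not in q1 and all(t in region for t in targets))}

def pre2(q1, succ, region):
    """2pre: the same operator with the roles of the players swapped."""
    return {q for q, targets in succ.items()
            if (q in q1 and all(t in region for t in targets))
            or (q not in q1 and any(t in region for t in targets))}

# Hypothetical game: player 1 owns a, player 2 owns b and c.
q1 = {"a"}
succ = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"c"}}
```

With Q2 empty, `pre1` coincides with ∃pre; with Q1 empty, it coincides with ∀pre.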

Pages 48-53:

Turn-based Game

[Figure: turn-based game graph with states a, b, c.]

⟨⟨P1⟩⟩◇c = (μX) ( c ∨ 1pre(X) )

⟨⟨P2⟩⟩◇c = (μX) ( c ∨ 2pre(X) )

Pages 54-57:

Reachability Game

Given R ⊆ Q, find the states from which player 1 has a strategy to force the game to R.

⟨⟨P1⟩⟩◇R = (μX) (R ∨ 1pre(X))

Iterates: R; R ∪ 1pre(R); R ∪ 1pre(R) ∪ 1pre²(R); . . .
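The reachability game is solved by the same least-fixpoint loop as graph reachability, with 1pre in place of ∃pre. A minimal sketch, assuming the same hypothetical encoding as before (player-1 state set `q1`, successor sets `succ`):

```python
def pre1(q1, succ, region):
    """1pre: existential at player-1 states, universal at player-2 states."""
    return {q for q, targets in succ.items()
            if (q in q1 and any(t in region for t in targets))
            or (q not in q1 and all(t in region for t in targets))}

def win_reach(q1, succ, target):
    """Least fixpoint of X = target ∪ 1pre(X)."""
    x = set()
    while True:
        nxt = set(target) | pre1(q1, succ, x)
        if nxt == x:
            return x
        x = nxt

# Hypothetical game: player 1 owns a; player 2 owns b, c, d.
q1 = {"a"}
succ = {"a": {"b", "c"}, "b": {"a", "d"}, "c": {"c"}, "d": {"d"}}
winning = win_reach(q1, succ, {"c"})
```

In this example b is not winning for player 1, because player 2 can move from b to the trap state d.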

Pages 58-61:

Safety Game

Given R ⊆ Q, find the states from which player 1 has a strategy to keep the game in R.

⟨⟨P1⟩⟩□R = (νX) (R ∧ 1pre(X))

Iterates: R; R ∩ 1pre(R); R ∩ 1pre(R) ∩ 1pre²(R); . . .
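The safety game uses a greatest fixpoint, so the iteration starts from all states and shrinks. A minimal sketch under the same hypothetical encoding (player-1 state set `q1`, successor sets `succ`):

```python
def pre1(q1, succ, region):
    """1pre: existential at player-1 states, universal at player-2 states."""
    return {q for q, targets in succ.items()
            if (q in q1 and any(t in region for t in targets))
            or (q not in q1 and all(t in region for t in targets))}

def win_safe(q1, succ, safe):
    """Greatest fixpoint of X = safe ∩ 1pre(X), iterated from all states."""
    x = set(succ)
    while True:
        nxt = set(safe) & pre1(q1, succ, x)
        if nxt == x:
            return x
        x = nxt

# Hypothetical game: player 1 owns a; the unsafe state is d.
q1 = {"a"}
succ = {"a": {"b", "c"}, "b": {"a", "d"}, "c": {"c"}, "d": {"d"}}
winning = win_safe(q1, succ, {"a", "b", "c"})
```

State b is removed in the second round, since player 2 can leave the safe set from b in one step.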

Pages 62-64:

Quantitative Game

Q1, Q2 states ( Q = Q1 ∪ Q2 )

Σ transition labels

δ: Q × Σ → N × Q transition function, with components δ1 (weight) and δ2 (successor state)

ℛ = [ Q → N ] regions with V = N

1pre:

1pre(R)(q) = (max σ ∈ Σ) max( δ1(q,σ), R(δ2(q,σ)) ) if q ∈ Q1

1pre(R)(q) = (min σ ∈ Σ) max( δ1(q,σ), R(δ2(q,σ)) ) if q ∈ Q2

2pre:

2pre(R)(q) = (min σ ∈ Σ) max( δ1(q,σ), R(δ2(q,σ)) ) if q ∈ Q1

2pre(R)(q) = (max σ ∈ Σ) max( δ1(q,σ), R(δ2(q,σ)) ) if q ∈ Q2

Pages 65-69:

Maximizing Game

[Figure: game graph with states a, b, c and edge weights 0, 1, 2, 5, 3.]

⟨⟨P1⟩⟩◇0 = (μX) max( 0, 1pre(X) )

Successive values shown in the figure: 0 0 0; 1 0 0; 1 2 0; 2 2 0
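The quantitative fixpoint for the maximizing game can be sketched as follows. This is an illustration under assumptions: each state maps to a list of (weight, successor) moves, all states are updated simultaneously in every round (the slides appear to update one state at a time), and the example game is hypothetical rather than the one in the figure.

```python
def pre1(q1, moves, region):
    """Quantitative 1pre: player 1 maximizes, player 2 minimizes."""
    result = {}
    for q, options in moves.items():
        candidates = [max(w, region[t]) for w, t in options]
        result[q] = max(candidates) if q in q1 else min(candidates)
    return result

def max_value(q1, moves):
    """Least fixpoint of X = max(0, 1pre(X)), iterated from the zero region."""
    x = {q: 0 for q in moves}
    while True:
        step = pre1(q1, moves, x)
        nxt = {q: max(0, step[q]) for q in moves}
        if nxt == x:
            return x
        x = nxt

# Hypothetical game: player 1 owns a; moves are (weight, successor) pairs.
q1 = {"a"}
moves = {
    "a": [(1, "b"), (2, "c")],
    "b": [(5, "a"), (3, "c")],
    "c": [(0, "c")],
}
values = max_value(q1, moves)
```

In this example player 1 moves from a to b, where player 2 cannot avoid an edge of weight at least 3.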

Pages 70-75:

Buechi Graph

Given B ⊆ Q, find the states from which some path visits B infinitely often.

R1 = ◇pre(B), computed as pre(B); pre(B) ∪ pre²(B); . . .

R2 = ◇pre(B ∩ R1)

. . .

∃□◇B = (νY) ∃◇ (B ∧ ∃pre(Y))

∃□◇B = (νY) (μX) ((B ∧ ∃pre(Y)) ∨ ∃pre(X))
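The nested fixpoint for the Buechi condition on graphs can be sketched as an outer greatest fixpoint over Y with an inner least fixpoint over X. A minimal sketch, assuming successor sets stored in a dictionary; the names and the example graph are hypothetical.

```python
def epre(succ, region):
    """States with at least one successor in region."""
    return {q for q, targets in succ.items() if targets & region}

def exists_buchi(succ, b):
    """(νY)(μX)((B ∧ ∃pre(Y)) ∨ ∃pre(X)) by nested iteration."""
    y = set(succ)                          # outer iteration starts from Q
    while True:
        target = set(b) & epre(succ, y)    # B-states that can continue in Y
        x = set()                          # inner iteration starts from ∅
        while True:
            nxt = target | epre(succ, x)
            if nxt == x:
                break
            x = nxt
        if x == y:
            return y
        y = x

# Hypothetical graph: a and b form a cycle through B; d visits B only once.
succ = {"a": {"b"}, "b": {"a", "c"}, "c": {"c"}, "d": {"c"}}
result = exists_buchi(succ, {"a", "d"})
```

The outer loop discards d in its second round: d lies in B, but it has no successor that can reach B again.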

Pages 76-77:

Buechi Game

Given B ⊆ Q, find the states from which player 1 has a strategy to force the game to B infinitely often.

R1 = ⟨⟨P1⟩⟩◇ 1pre(B)

R2 = ⟨⟨P1⟩⟩◇ 1pre(B ∩ R1)

. . .

⟨⟨P1⟩⟩□◇B = (νY) (μX) ((B ∧ 1pre(Y)) ∨ 1pre(X))
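For the game version, the nested fixpoint keeps its shape and only the predecessor operator changes to 1pre. A minimal sketch, again assuming a hypothetical encoding with a player-1 state set `q1` and successor sets `succ`:

```python
def pre1(q1, succ, region):
    """1pre: existential at player-1 states, universal at player-2 states."""
    return {q for q, targets in succ.items()
            if (q in q1 and any(t in region for t in targets))
            or (q not in q1 and all(t in region for t in targets))}

def win_buchi(q1, succ, b):
    """(νY)(μX)((B ∧ 1pre(Y)) ∨ 1pre(X)) by nested iteration."""
    y = set(succ)
    while True:
        target = set(b) & pre1(q1, succ, y)
        x = set()
        while True:
            nxt = target | pre1(q1, succ, x)
            if nxt == x:
                break
            x = nxt
        if x == y:
            return y
        y = x

# Hypothetical game: player 1 owns a; c is the only Buechi state.
q1 = {"a"}
succ = {"a": {"b", "c"}, "b": {"a", "d"}, "c": {"a"}, "d": {"d"}}
winning = win_buchi(q1, succ, {"c"})
```

Player 1 wins from a and c by always moving from a to c, while player 2 can escape from b to the trap d.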

Page 78:

From Graphs to Games

Can we use the same value iteration scheme?

Yes, iff the fixpoint expression computes correctly on all single-player (player 1 and player 2) structures.

Reachability: ∃◇p = (μX) (p ∨ ∃pre(X)); ∀◇p = (μX) (p ∨ ∀pre(X))

Hence: ⟨⟨P1⟩⟩◇p = (μX) (p ∨ 1pre(X)); ⟨⟨P2⟩⟩◇p = (μX) (p ∨ 2pre(X))

Pages 79-80:

Complexity of Turn-based Games

1. Reachability, safety: linear time (P-complete)

2. Buechi: quadratic time (optimal ???); on graphs linear

3. Parity: NP ∩ coNP (in P ???); on graphs polynomial

Page 81:

Beyond Graphs as Finite Carrier Sets

Graph-based (finite-carrier) systems:

Q = B^m

ℛ = boolean formulas [e.g. BDDs]

∃pre = (∃ x ∈ B)

Timed and hybrid systems:

Q = B^m × R^n

ℛ = formulas of (ℚ, ≤, +) [e.g. polyhedral sets]

∃pre = (∃ x ∈ ℚ)

Pages 82-84:

Concurrent Game

Q states

Γ1, Γ2 moves of both players

δ: Q × Γ1 × Γ2 → Q transition function

ℛ = [ Q → {0,1} ] regions with V = B

1pre:

q ∈ 1pre(R) iff (∃ γ1 ∈ Γ1) (∀ γ2 ∈ Γ2) δ(q,γ1,γ2) ∈ R

2pre:

q ∈ 2pre(R) iff (∃ γ2 ∈ Γ2) (∀ γ1 ∈ Γ1) δ(q,γ1,γ2) ∈ R
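The concurrent predecessor operators quantify over pure moves of both players. A minimal sketch, assuming the transition function is a dictionary indexed by state and then by move pair; the names and the example game (a coin-matching game at a hypothetical state q) are not taken from the slides.

```python
MOVES = (1, 2)  # assumed move set, the same for both players

def cpre1(delta, region):
    """1pre: some move of player 1 works against every move of player 2."""
    return {q for q in delta
            if any(all(delta[q][(m1, m2)] in region for m2 in MOVES)
                   for m1 in MOVES)}

def cpre2(delta, region):
    """2pre: some move of player 2 works against every move of player 1."""
    return {q for q in delta
            if any(all(delta[q][(m1, m2)] in region for m1 in MOVES)
                   for m2 in MOVES)}

def absorbing(state):
    """Helper: every move pair loops back to the same state."""
    return {(m1, m2): state for m1 in MOVES for m2 in MOVES}

# Hypothetical game: at q, equal moves lead to a, different moves lead to b.
delta = {
    "q": {(1, 1): "a", (1, 2): "b", (2, 1): "b", (2, 2): "a"},
    "a": absorbing("a"),
    "b": absorbing("b"),
}
```

In this example neither player can force a successor from q with a pure move, which matches the remark in the talk that concurrent games are not determined for pure strategies.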

Pages 85-88:

Concurrent Game

[Figure: concurrent game graph with states a, b, c and edges labeled by the move pairs 1,1; 1,2; 2,1; 2,2.]

⟨⟨P2⟩⟩◇c = (μX) ( c ∨ 2pre(X) )

Pr(1): 0.5 Pr(2): 0.5

Page 89:

graph

Extended Graph Models

CONTROL: game graph

OBJECTIVE: ω-automaton

PROBABILITIES: Markov decision process

stochastic game

regular game

CLOCKS: timed automaton

stochastic hybrid system

Page 90:

Graph: 1 Player

Nondeterministic closed system.

[Figure: graph with states q1, q2, q3 and labels a, b.]

Page 91:

MDP: 1.5 Players

Probabilistic closed system.

[Figure: MDP with states q1 to q5, labels a, b, c, and a probabilistic branch with probabilities 0.4 and 0.6.]

Page 92:

Turn-based Game: 2 Players

Asynchronous open system.

[Figure: turn-based game with states q1 to q5 and labels a, b, c.]

Page 93:

Turn-based Stochastic Game: 2.5 Players

Probabilistic asynchronous open system.

[Figure: turn-based stochastic game with states q1 to q7, labels a, b, c, and a probabilistic branch with probabilities 0.4 and 0.6.]

Page 94:

Concurrent Game

Synchronous open system.

[Figure: concurrent game with states q1 to q5, labels a, b, and edges from q1 labeled by the move pairs 1,1; 1,2; 2,1; 2,2.]

Page 95:

Concurrent Stochastic Game

Probabilistic synchronous open system.

Matrix game at each vertex. At q1, each pair of moves (1 or 2 for each player) selects one of these successor distributions:

q2: 0.3 q3: 0.2 q4: 0.5 q5:

q2: 0.1 q3: 0.1 q4: 0.5 q5: 0.3

q2: q3: 0.2 q4: 0.1 q5: 0.7

q2: 1.0 q3: q4: q5:

[Figure: concurrent stochastic game with states q1 to q5 and labels a, b.]

Page 96:

Graph: nondeterministic generator of behaviors (possibly stochastic)

Strategy: deterministic selector of behaviors (possibly randomized)

Graph + Strategies for both players → Behavior

Page 97:

Model = graph

Pure behavior = path

Two pure strategies at q1: “left” and “right”. Two pure behaviors: ab; aa.

[Figure: graph with states q1, q2, q3 and labels a, b.]

Page 98:

Model = MDP

Pure behavior = probability distribution on paths = p-path

Two pure strategies at q1: “left” and “right”. Two pure behaviors: {ab: 1}; {aac: 0.4, aaa: 0.6}.

[Figure: MDP with states q1 to q5, labels a, b, c, and probabilities 0.4 and 0.6.]

Page 99:

Model = turn-based game

Pure behavior = path

Two pure pl. 1 strategies at q1: “left” and “right”. Two pure pl. 2 strategies at q3: “left” and “right”. Three pure behaviors: ab; aac; aaa.

[Figure: turn-based game with states q1 to q5 and labels a, b, c.]

Page 100:

Model = turn-based game

Pure behavior = path

General (randomized) behavior = p-path

Three pure behaviors: ab; aac; aaa. Infinitely many behaviors, e.g. {aac: 0.5, aaa: 0.5}.

[Figure: turn-based game with states q1 to q5 and labels a, b, c.]

Page 101:

The objective of each player is to find a strategy that optimizes the value of the resulting behavior.

How do we define “value”?

A. Assign a value to each path

B. Assign a value to each behavior (expected value of A.)

C. Assign a value to each state (strategy sup inf of B.)

Page 102:

A. Value of Paths

Qualitative value function: φ: Q^ω → {0,1}

e.g. ω-regular subsets of Q^ω

Page 103:

B. Value of Behaviors

path t: φ(T) = φ(t)

p-path T: φ(T) = Exp { φ(t) } (expected value)

Example:

T = {aaa: 0.2, aab: 0.7, bbb: 0.1 }

(◇b)(T) = 0.8
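The example value can be checked with a few lines of code. This is a minimal sketch, assuming a p-path is a dictionary from paths (written as short strings, as on the slide) to probabilities and a qualitative objective is a 0/1 function on paths; the names `eventually` and `value` are hypothetical.

```python
from fractions import Fraction

def eventually(letter):
    """Qualitative objective ◇letter: 1 if the letter occurs on the path."""
    return lambda path: 1 if letter in path else 0

def value(objective, behavior):
    """Expected value of a 0/1 path objective over a p-path."""
    return sum(prob * objective(path) for path, prob in behavior.items())

# The p-path T from the example, with exact rational probabilities.
T = {"aaa": Fraction(2, 10), "aab": Fraction(7, 10), "bbb": Fraction(1, 10)}
result = value(eventually("b"), T)
```

Only aab and bbb contain a b, so the expected value is 0.7 + 0.1 = 0.8.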

Page 104:

C. Value of States

⟨⟨1⟩⟩φ (q) = sup_x inf_y φ( Outcome_{x,y}(q) )

⟨⟨2⟩⟩φ (q) = sup_y inf_x φ( Outcome_{x,y}(q) )

Pages 105-107:

Concurrent Stochastic Game

Q states

Γ1, Γ2 moves of both players

δ: Q × Γ1 × Γ2 → Dist(Q) probabilistic transition function

ℛ = [ Q → [0,1] ] regions with V = [0,1]

1pre:

1pre(R)(q) = (sup γ1 ∈ Γ1) (inf γ2 ∈ Γ2) R(δ(q,γ1,γ2))

2pre:

2pre(R)(q) = (sup γ2 ∈ Γ2) (inf γ1 ∈ Γ1) R(δ(q,γ1,γ2))
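One way to evaluate 1pre at a single state is to solve the matrix game formed by the expected region values. The sketch below rests on several assumptions that the slides do not state explicitly: the sup and inf range over randomized (mixed) moves, each player has exactly two pure moves, and rows belong to player 1. It uses the standard closed form for 2x2 zero-sum matrix games; all names and the example state are hypothetical.

```python
from fractions import Fraction

def matrix_value_2x2(m):
    """Value of a 2x2 zero-sum matrix game; the row player maximizes."""
    (a, b), (c, d) = m
    maximin = max(min(a, b), min(c, d))
    minimax = min(max(a, c), max(b, d))
    if maximin == minimax:
        # Saddle point: pure moves are optimal.
        return maximin
    # No saddle point: both players mix; standard closed form.
    return (a * d - b * c) / (a + d - b - c)

def pre1(delta, region, q):
    """1pre(R)(q): value of the matrix game of expected R-values at q."""
    payoff = [[sum(p * region[s] for s, p in delta[q][(i, j)].items())
               for j in (1, 2)] for i in (1, 2)]
    return matrix_value_2x2(payoff)

# Hypothetical state q: equal moves lead to t, different moves lead to s.
one = Fraction(1)
delta = {"q": {(1, 1): {"t": one}, (1, 2): {"s": one},
               (2, 1): {"s": one}, (2, 2): {"t": one}}}
region = {"q": Fraction(0), "s": Fraction(0), "t": Fraction(1)}
value_at_q = pre1(delta, region, "q")
```

In this example both players randomize uniformly and the value at q is 1/2, whereas no pure move of player 1 guarantees more than 0. For more than two moves per player a linear program would be needed instead of the closed form.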

Pages 108-113:

Concurrent Stochastic Game

[Figure: states a, b, c. At a and at b, Pl.1 and Pl.2 each choose move 1 or 2, and each pair of moves selects a successor distribution.]

Successor distributions at a: {a: 0.6, b: 0.4}; {a: 0.1, b: 0.9}; {a: 0.5, b: 0.5}; {a: 0.2, b: 0.8}

Successor distributions at b: {a: 0.0, c: 1.0}; {a: 0.7, c: 0.3}; {a: 0.0, c: 1.0}; {a: 0.0, c: 1.0}

⟨⟨P1⟩⟩◇c = (μX) max( c, 1pre(X) )

Successive values of X at (a, b, c): (0, 0, 1); (0, 1, 1); (0.8, 1, 1); (0.96, 1, 1); limit (1, 1, 1)

Pages 114-115:

Solving Games by Value Iteration

Reachability / max:

Buechi / lim sup:

Parity: …

Many open questions: How do different evaluation orders compare? How fast do these algorithms converge? When are they optimal?

Page 116:

1. Number of players: 1, 1.5, 2, 2.5

2. Alternation: turn-based or concurrent

3. Strategies: pure or randomized

4. Value of a path: qualitative (boolean) or quantitative (real)

5. Objective: Borel 1, 2, 3

6. Zero-sum vs. nonzero-sum

Summary: Classification of Games

Page 117:

Summary: Zero-Sum Games

The two players have complementary path values: φ2(t) = 1 - φ1(t)

- reachability vs. safety / max vs. min

- Buechi vs. coBuechi / lim sup vs. lim inf

- Rabin vs. Streett

Main Theorem [Martin75, Martin98]: The concurrent stochastic games are determined for all Borel objectives, i.e., ⟨⟨1⟩⟩φ1(q) + ⟨⟨2⟩⟩φ2(q) = 1.

sup inf = inf sup

Page 118:

Summary: Zero-Sum Games

1.5 players

2 players

2.5 players

concurrent

parity

CY98, dAl97: polynomial

GH82, EJ88

dAM01

dAH00, CdAH06: NP ∩ coNP

Page 119:

Concurrent Games are Difficult

- optimal strategies may not exist

- limit values may not be rational

- ε-close strategies, for fixed ε, may require infinite memory

- no determinacy for pure strategies

[Figure: concurrent game at q1 with edges labeled by the move pairs 1,1; 1,2; 2,1; 2,2 and successors labeled a and b.]

⟨⟨P1⟩⟩ (◇a) (q1) = 0

⟨⟨P2⟩⟩ (◇b) (q1) = 0

Pages 120-121:

Turn-based Games are More Pleasant

- optimal strategies always exist [McIver/Morgan]

- in the non-stochastic case, pure finite-memory optimal strategies exist for ω-regular objectives [Gurevich/Harrington]

- for parity objectives, pure memoryless optimal strategies exist [Emerson/Jutla: non-stochastic Rabin; Condon: stochastic reachability; Chatterjee/deAlfaro/H: stochastic Rabin], hence NP ∩ coNP

Whether these games are solvable in P is open for non-stochastic parity games and for stochastic reachability games.

Page 122:

Summary

Verification and control are very special (boolean) cases of graph-based optimization problems.

They can be generalized to solve questions that involve multiple players, quantitative resources, probabilistic transitions, and continuous state spaces.

The theory and practice of this is still wide open …

