Nash Q-Learning for General-Sum Stochastic Games

Hu, J. and Wellman, M.P.
Lauri Lyly, [email protected]


Page 1: Nash Q-Learning for General-Sum Stochastic Games

   

Nash Q-Learning for General-Sum Stochastic Games

Hu, J. and Wellman, M.P.

Lauri Lyly, [email protected]

Page 2

   

Stochastic general-sum games

● Stochasticity: the environment is in part formed by the other agents
– nondeterministic, noncooperative in nature, no agreements between agents

● Arbitrary relation between the agents' rewards
– Extends the previous topic, zero-sum games

Page 3

   

Nash Equilibrium

● A best-response joint strategy
● The study is limited to stationary strategies (policies)
● The rewards of the other agents are perceived; their strategies are not
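In the paper's notation (reconstructed here; the slide's formula was an image), a joint policy (π¹*, …, πⁿ*) is a Nash equilibrium of the stochastic game if no agent can improve its discounted value by deviating unilaterally:

```latex
v^i(s, \pi^1_*, \ldots, \pi^n_*) \;\ge\; v^i(s, \pi^1_*, \ldots, \pi^i, \ldots, \pi^n_*)
\quad \text{for all } s,\ \text{all policies } \pi^i,\ \text{and all agents } i.
```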

Page 4

   

Nash Q-Values
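The formula on this slide survived only as an image; as defined in Hu and Wellman (2003), agent i's Nash Q-value is the current reward plus the discounted value of all agents playing a joint Nash equilibrium from the next period on:

```latex
Q^i_*(s, a^1, \ldots, a^n) \;=\; r^i(s, a^1, \ldots, a^n)
\;+\; \beta \sum_{s'} p(s' \mid s, a^1, \ldots, a^n)\, v^i(s', \pi^1_*, \ldots, \pi^n_*)
```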

Page 5

   

Stage game

● A one-period game, as opposed to a stochastic (multi-period) game

● Mainly used in the convergence proof
● A Nash equilibrium is defined for the stage game, where M is a “payoff function”:
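Reconstructed (the slide's formula was an image): a joint strategy (σ¹, …, σⁿ) is a Nash equilibrium of the stage game (M¹, …, Mⁿ) if no agent i gains by switching to an alternative strategy σ̂ⁱ:

```latex
M^i(\sigma^1, \ldots, \sigma^i, \ldots, \sigma^n) \;\ge\; M^i(\sigma^1, \ldots, \hat{\sigma}^i, \ldots, \sigma^n)
\quad \text{for all } \hat{\sigma}^i \text{ and all agents } i.
```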

Page 6

   

Update rule

● The same update rule is used for the agent itself and for its conjectures about the other agents' Q-functions

● The Q-functions can be initialized, for example, to 0
● Asynchronous updating: only the entries pertaining to the current state are updated
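The update rule itself was shown as an image on the slide; as given in the paper, each maintained Q-table j (the agent's own and its conjectures) is updated with

```latex
Q^j_{t+1}(s, a^1, \ldots, a^n) \;=\; (1 - \alpha_t)\, Q^j_t(s, a^1, \ldots, a^n)
\;+\; \alpha_t \left[ r^j_t + \beta\, \mathit{NashQ}^j_t(s') \right]
```

where NashQ^j_t(s') is agent j's payoff in a Nash equilibrium of the stage game defined by the current Q-tables at the next state s'.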

Page 7

   

The Nash Q-learning algorithm
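The algorithm listing on this slide was an image. Below is a minimal runnable sketch of one Nash-Q update step for two agents, assuming one tabular Q-function per state and restricting the stage-game solver to pure-strategy equilibria for simplicity (the paper computes mixed-strategy equilibria with the Lemke-Howson method); `pure_nash` and `nash_q_update` are names introduced here for illustration.

```python
import numpy as np

def pure_nash(M1, M2):
    """Return a pure-strategy Nash equilibrium (a1, a2) of the stage game
    with payoff matrices M1 (agent 1) and M2 (agent 2), or None."""
    n1, n2 = M1.shape
    for a1 in range(n1):
        for a2 in range(n2):
            # a1 is a best response to a2, and a2 is a best response to a1
            if M1[a1, a2] >= M1[:, a2].max() and M2[a1, a2] >= M2[a1, :].max():
                return a1, a2
    return None

def nash_q_update(Q1, Q2, s, a1, a2, r1, r2, s_next, alpha, beta):
    """One asynchronous Nash-Q step: only the entry for the current
    state-action tuple is changed, in both maintained Q-tables."""
    eq = pure_nash(Q1[s_next], Q2[s_next])  # stage game induced at s_next
    nash1, nash2 = Q1[s_next][eq], Q2[s_next][eq]
    Q1[s][a1, a2] = (1 - alpha) * Q1[s][a1, a2] + alpha * (r1 + beta * nash1)
    Q2[s][a1, a2] = (1 - alpha) * Q2[s][a1, a2] + alpha * (r2 + beta * nash2)
```

The pure-strategy search only succeeds when such an equilibrium exists; the full algorithm also needs an exploration policy and a decaying learning-rate schedule per state-action entry.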

Page 8

   

Convergence proof requirements

● Assumption 1: every state-action tuple is visited infinitely often

● Assumption 2: the learning rate alpha(t) satisfies:
– the sum of the learning rates diverges to infinity
– the sum of their squares converges
– alpha(t) = 0 if the entry being updated does not correspond to the current state-action tuple (asynchronous updating)
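Written out, the learning-rate conditions are the standard stochastic-approximation requirements:

```latex
\sum_{t=0}^{\infty} \alpha_t = \infty, \qquad \sum_{t=0}^{\infty} \alpha_t^2 < \infty
```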

Page 9

   

Proof basis and result

● The Q-learning process is updated by a pseudo-contraction operator P(t), using the usual form:

Q(t+1) = (1 - alpha(t)) Q(t) + alpha(t) [P(t) Q(t)]

– Contraction: the values approach the optimal Q
● This links the stage games to the stochastic game
● Goal: show that the NashQ operator is a pseudo-contraction operator
● It is in fact a real contraction operator under restricted conditions
– games with special types of Nash equilibrium points
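For reference (reconstructed from the standard convergence framework the paper builds on), a pseudo-contraction operator P_t satisfies, for some factor γ < 1 and a nonnegative sequence λ_t → 0,

```latex
\left\| P_t Q - P_t Q^* \right\| \;\le\; \gamma \left\| Q - Q^* \right\| + \lambda_t
```

with λ_t = 0 giving an ordinary contraction toward the fixed point Q*.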

Page 10

   

Different Nash equilibria

● Global optimal point (expressed in stage-game notation)

● Saddle point

● All equilibria chosen for the updates must be of the same type
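The defining conditions, reconstructed here since the slide showed them as formulas: a joint strategy σ is a global optimal point if it maximizes every agent's payoff, and a saddle point if it is a Nash equilibrium from which a deviation by the other agents can only help agent i:

```latex
\text{global optimum:}\quad M^i(\sigma) \ge M^i(\hat{\sigma}) \;\;\text{for all } i \text{ and all } \hat{\sigma};
\qquad
\text{saddle point:}\quad M^i(\sigma^i, \sigma^{-i}) \le M^i(\sigma^i, \hat{\sigma}^{-i}) \;\;\text{for all } \hat{\sigma}^{-i}.
```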

Page 11

   

Stage games compared

Two example stage games (payoff matrices shown only as images in the original slides): one whose equilibrium (Up, Left) is a global optimal point, and one whose equilibrium (Down, Right) is a saddle point.

Page 12

   

Experimentation framework

● Two grid-world games

● Motivation for grid games: state-specific action sets, qualitative state transitions, and both immediate and long-term rewards

Page 13

   

Learning process

● The experiments violate Assumption 3, the monotonic selection of global optima or saddle points

● Learning still converges in most cases regardless of the selection
● Offline and online learning are rated separately

Pages 14-16: experimental results (figures only; not reproduced in this transcript)

Conclusions

● No current method provides performance guarantees for general­sum stochastic games

● Nash Q-learning works as a starting point
– other promising variants exist
– the Nash equilibrium concept itself can be refined

Page 17

   

References

● Hu, J. and Wellman, M.P. (2003). Nash Q-Learning for General-Sum Stochastic Games. Journal of Machine Learning Research 4:1039–1069.