Towards Equilibrium Transfer in Markov Games
Hu Yujing (胡裕靖), 2013-9-9
Outline
Background
Preliminary Ideas
Some Results
Background
Multi-agent Reinforcement Learning
Single-agent RL: Mountain Car, path finding
RL in multi-agent tasks: robot soccer, IKEA furniture robot
Markov Games
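A Markov game is a tuple $\langle N, S, \{A_i\}_{i \in N}, \{r_i\}_{i \in N}, p \rangle$:
$N$: the set of agents. $S$: the discrete state space. $A = A_1 \times \cdots \times A_n$: the joint action space of the agents. $r_i: S \times A \to \mathbb{R}$ is the reward function of agent $i$. $p: S \times A \times S \to [0, 1]$ is the transition function.
Agents take joint actions.
Compare with a Markov decision process $\langle S, A, r, p \rangle$:
$S$: the discrete state space. $A$: the action space of the agent. $r: S \times A \to \mathbb{R}$ is the reward function. $p: S \times A \times S \to [0, 1]$ is the transition function.
A Markov game extends the MDP from one agent to more than one.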
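The components of a Markov game listed above can be sketched as a small container. This is only an illustrative sketch: the class name `MarkovGame`, its field names, and the toy two-agent example are assumptions, not something taken from the slides.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List, Tuple

JointAction = Tuple[str, ...]


@dataclass
class MarkovGame:
    agents: List[str]                                           # N
    states: List[str]                                           # S
    actions: Dict[str, List[str]]                               # A_i for each agent i
    reward: Callable[[str, str, JointAction], float]            # r_i(s, joint action)
    transition: Callable[[str, JointAction], Dict[str, float]]  # p(. | s, joint action)

    def joint_actions(self) -> List[JointAction]:
        """Enumerate the joint action space A = A_1 x ... x A_n."""
        return list(product(*(self.actions[i] for i in self.agents)))


# Toy two-agent, one-state game (all names and payoffs are illustrative).
game = MarkovGame(
    agents=["row", "col"],
    states=["s0"],
    actions={"row": ["up", "down"], "col": ["left", "right"]},
    reward=lambda agent, state, joint: 1.0 if joint == ("up", "left") else 0.0,
    transition=lambda state, joint: {"s0": 1.0},
)
joint = game.joint_actions()
```

With a single agent, `joint_actions()` reduces to that agent's own action list, which is the MDP case.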
Equilibrium-based MARL
Some equilibrium solution concepts in game theory can be adopted
Our Previous Work
Equilibrium-based MARL:
Multi-agent reinforcement learning with meta equilibrium []
Multi-agent reinforcement learning by negotiation with unshared value functions []
Focusing on combining MARL with equilibrium solution concepts
Problematic issues:
Equilibrium computation is complicated and time-consuming
A new complexity class: TFNP! []
For tasks with many agents, equilibrium-based MARL algorithms may take too much time
How to accelerate the learning process of equilibrium-based MARL?
Transfer Learning in RL
Reuse learnt knowledge to accelerate learning: $MDP \to MDP'$, transferring instance/policy/value function/model/…
Matthew E Taylor, Peter Stone. Transfer learning for reinforcement learning domains. Journal of Machine Learning Research, 2009.
Alessandra Lazaric. Transfer in reinforcement learning: a framework and a survey. Reinforcement Learning, Springer, 2012.
Transfer Learning in Markov Games?
$\mathit{MarkovGame} \to \mathit{MarkovGame}'$: instance/policy/value function/model/…
[Figure: over time $t$, each Markov game unfolds as a sequence of normal-form games $G(s), G(s'), G(s''), \ldots$]
Why not transfer between these normal-form games within a Markov game?
Inter-task transfer
Inner-task transfer
Inner-task Transfer
$Q_1^t(s, a, b)$, $Q_1^{t+1}(s, a, b)$, ……
Transfer equilibria between similar normal-form games during learning in a Markov game:
Reuse the equilibria computed in previous games
Reduce learning time
Key problems:
Which games are similar? For example: the games that occur on different visits of a state
How to transfer an equilibrium?
Preliminary Ideas
Game Similarity
Games with the same action space?
Games with different action spaces?
Similarity as payoff distance?
Equilibrium-based similarity or equilibrium-independent similarity?
Drew Fudenberg and David M. Kreps. A theory of learning, experimentation and equilibrium in games. 1990.
Game Similarity
Why not take in the second game?
Weird Cycle
Equilibrium transfer relies on equilibrium-based similarity.
Equilibrium-based similarity requires finding the equilibria of both games before computing the similarity.
But once both equilibria are known, transfer seems senseless!
Our Idea
Transfer equilibria between games that are thought to be similar.
Evaluate how much loss the equilibrium transfer incurs.
Transfer is acceptable when the loss is small.
$Q_1^t(s, a, b)$, $Q_1^{t+1}(s, a, b)$, ……
The two games differ in only one entry.
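To illustrate why consecutive games at a state differ in only one entry, here is a minimal sketch assuming a standard Q-learning style update applied to the visited entry $(s, a, b)$. The function names, the `target` argument (standing in for reward plus discounted equilibrium value), and the learning rate `alpha` are assumptions made for illustration.

```python
import copy


def q_update(Q, s, a, b, target, alpha):
    """Return a copy of Q in which only entry (s, a, b) moves toward `target`."""
    Q_next = copy.deepcopy(Q)
    Q_next[s][a][b] = (1 - alpha) * Q[s][a][b] + alpha * target
    return Q_next


def changed_entries(Q_old, Q_new, s):
    """List the joint actions (a, b) whose value at state s changed."""
    return [
        (a, b)
        for a in range(len(Q_old[s]))
        for b in range(len(Q_old[s][a]))
        if Q_old[s][a][b] != Q_new[s][a][b]
    ]


# Q_1^t at state "s" for a 2x2 game (illustrative numbers).
Q_t = {"s": [[1.0, 0.0], [0.0, 1.0]]}
Q_t1 = q_update(Q_t, "s", 0, 1, target=1.0, alpha=0.5)
diff = changed_entries(Q_t, Q_t1, "s")
```

Here `diff` contains the single joint action that was updated, so the normal-form games before and after the update share every other payoff.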
Problem Definition
$(G, p^*) \xrightarrow{\text{transfer method}} (G', ?)$
Can we find a transfer method that transfers the computed Nash equilibrium $p^*$ in game $G$ to a strategy profile $p'$ in game $G'$ such that for all $i \in N$ and $a_i \in A_i$ there holds
\[ U_i^{G'}(a_i, p'_{-i}) \le U_i^{G'}(p') + \epsilon, \]
where $\epsilon$ is close to $0$? In other words, given a transfer method, if $\epsilon$ is small enough, then the transfer method is acceptable.
Furthermore, such a $p'$ is an approximate Nash equilibrium of $G'$.
Problem Definition
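For all $i \in N$ and $a_i \in A_i$, define the transfer error
\[ \epsilon_i(a_i, p') = U_i^{G'}(a_i, p'_{-i}) - U_i^{G'}(p'). \]
Let $\epsilon_i = \max_{a_i \in A_i} \epsilon_i(a_i, p')$. Let $\epsilon = \max_{i \in N} \epsilon_i$.
Given a transfer method, we need to find the bound of $\epsilon$!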
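As a rough sketch of how the transfer error could be evaluated numerically, the snippet below assumes a two-player normal-form game and takes the error of a mixed profile to be the largest gain any player obtains from a unilateral pure deviation. The payoff layout `U[i][a][b]` and the function names are assumptions for illustration.

```python
def expected_payoff(U_i, p_row, p_col):
    """Expected payoff of one player under the mixed profile (p_row, p_col)."""
    return sum(
        p_row[a] * p_col[b] * U_i[a][b]
        for a in range(len(p_row))
        for b in range(len(p_col))
    )


def transfer_error(U, p_row, p_col):
    """Largest gain from a unilateral pure deviation, over both players."""
    n_a, n_b = len(p_row), len(p_col)
    # Best pure response value of each player against the other's mixed strategy.
    row_best = max(sum(p_col[b] * U[0][a][b] for b in range(n_b)) for a in range(n_a))
    col_best = max(sum(p_row[a] * U[1][a][b] for a in range(n_a)) for b in range(n_b))
    return max(
        row_best - expected_payoff(U[0], p_row, p_col),
        col_best - expected_payoff(U[1], p_row, p_col),
    )


# Matching pennies G and a game G_new that differs in one row-player payoff.
G = ([[1.0, -1.0], [-1.0, 1.0]], [[-1.0, 1.0], [1.0, -1.0]])
G_new = ([[1.2, -1.0], [-1.0, 1.0]], [[-1.0, 1.0], [1.0, -1.0]])
p_star = ([0.5, 0.5], [0.5, 0.5])  # Nash equilibrium of G

eps_old = transfer_error(G, *p_star)      # error of p_star in its own game
eps_new = transfer_error(G_new, *p_star)  # error after transferring p_star to G_new
```

In this toy case the uniform profile has zero error in `G` and an error of about 0.05 in `G_new`, so it would be an acceptable transfer under any tolerance above that value.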
A Naïve Transfer Method
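Define the difference of the two games $\delta_i$ such that for all $i \in N$ and $\vec{a} \in A$, $U_i^{G'}(\vec{a}) = U_i^{G}(\vec{a}) + \delta_i(\vec{a})$.
Examine the transfer error of direct transfer.
Direct Transfer
$(G, p^*) \to (G', p^*)$, that is, $p' = p^*$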
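Direct transfer can be sketched as a reuse-or-recompute step. To keep the example short it assumes a two-player game and pure-strategy profiles only; `pure_nash` is a hypothetical brute-force stand-in for whatever equilibrium solver is actually used, and `tolerance` is an assumed acceptance threshold.

```python
from itertools import product


def deviation_gain(U, profile):
    """Largest gain any player gets by unilaterally switching its pure action."""
    a, b = profile
    row_gain = max(U[0][x][b] for x in range(len(U[0]))) - U[0][a][b]
    col_gain = max(U[1][a][y] for y in range(len(U[1][a]))) - U[1][a][b]
    return max(row_gain, col_gain)


def pure_nash(U):
    """Hypothetical solver: first pure profile with no profitable deviation."""
    for profile in product(range(len(U[0])), range(len(U[0][0]))):
        if deviation_gain(U, profile) <= 0:
            return profile
    return None


def direct_transfer(U_new, cached, tolerance):
    """Reuse the cached equilibrium if its error in U_new is small, else solve."""
    if cached is not None and deviation_gain(U_new, cached) <= tolerance:
        return cached, True
    return pure_nash(U_new), False


# Coordination game G and a game G_new that differs in one row-player payoff.
G = ([[2.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 1.0]])
G_new = ([[2.0, 0.0], [2.05, 1.0]], [[2.0, 0.0], [0.0, 1.0]])

p_star = pure_nash(G)
reused = direct_transfer(G_new, p_star, tolerance=0.1)
recomputed = direct_transfer(G_new, p_star, tolerance=0.01)
```

With the looser tolerance the cached profile is kept and no solver call is made for `G_new`; with the tighter one the solver runs again, which is the cost the transfer is meant to avoid.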
A Naïve Transfer Method
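With direct transfer ($p' = p^*$), for each agent $i$ and action $a_i$:
\[
\begin{aligned}
\epsilon_i(a_i, p') &= U_i^{G'}(a_i, p^*_{-i}) - U_i^{G'}(p^*) \\
&= \sum_{\vec{a}_{-i}} p^*_{-i}(\vec{a}_{-i}) U_i^{G'}(a_i, \vec{a}_{-i}) - \sum_{a_i'} \sum_{\vec{a}_{-i}} p^*_i(a_i') p^*_{-i}(\vec{a}_{-i}) U_i^{G'}(a_i', \vec{a}_{-i}) \\
&= \sum_{\vec{a}_{-i}} p^*_{-i}(\vec{a}_{-i}) \Big[ U_i^{G'}(a_i, \vec{a}_{-i}) - \sum_{a_i'} p^*_i(a_i') U_i^{G'}(a_i', \vec{a}_{-i}) \Big] \\
&= \sum_{\vec{a}_{-i}} p^*_{-i}(\vec{a}_{-i}) \Big[ U_i^{G}(a_i, \vec{a}_{-i}) + \delta_i(a_i, \vec{a}_{-i}) - \sum_{a_i'} p^*_i(a_i') \big[ U_i^{G}(a_i', \vec{a}_{-i}) + \delta_i(a_i', \vec{a}_{-i}) \big] \Big] \\
&= \sum_{\vec{a}_{-i}} p^*_{-i}(\vec{a}_{-i}) \Big[ U_i^{G}(a_i, \vec{a}_{-i}) - \sum_{a_i'} p^*_i(a_i') U_i^{G}(a_i', \vec{a}_{-i}) \Big] + \sum_{\vec{a}_{-i}} p^*_{-i}(\vec{a}_{-i}) \Big[ \delta_i(a_i, \vec{a}_{-i}) - \sum_{a_i'} p^*_i(a_i') \delta_i(a_i', \vec{a}_{-i}) \Big] \\
&\le \sum_{\vec{a}_{-i}} p^*_{-i}(\vec{a}_{-i}) \Big[ \delta_i(a_i, \vec{a}_{-i}) - \sum_{a_i'} p^*_i(a_i') \delta_i(a_i', \vec{a}_{-i}) \Big] \\
&= \sum_{\vec{a}_{-i}} p^*_{-i}(\vec{a}_{-i}) \delta_i(a_i, \vec{a}_{-i}) - \sum_{\vec{a}} p^*(\vec{a}) \delta_i(\vec{a}) \\
&\le \sum_{\vec{a}_{-i}} p^*_{-i}(\vec{a}_{-i}) \delta_i^{+}(a_i, \vec{a}_{-i}) - \sum_{\vec{a}} p^*(\vec{a}) \delta_i(\vec{a}) \\
&\le \sum_{\vec{a}_{-i}} \delta_i^{+}(a_i, \vec{a}_{-i}) - \sum_{\vec{a}} p^*(\vec{a}) \delta_i(\vec{a})
\end{aligned}
\]
where $\delta_i^{+}(a_i, \vec{a}_{-i}) = \max(0, \delta_i(a_i, \vec{a}_{-i}))$.
The first inequality holds because $p^*$ is a Nash equilibrium of $G$, so no deviation $a_i$ is profitable in $G$ and the first bracketed sum is at most $0$. The last two inequalities use $\delta_i \le \delta_i^{+}$, $\delta_i^{+} \ge 0$ and $p^*_{-i}(\vec{a}_{-i}) \le 1$.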
A Naïve Transfer Method
\[ \epsilon_i(a_i, p') \le \sum_{\vec{a}_{-i}} \delta_i^{+}(a_i, \vec{a}_{-i}) - \sum_{\vec{a}} p^*(\vec{a}) \delta_i(\vec{a}) \]
Many entries of $\delta_i$ are zero if the two games are very similar.
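The bound can be checked numerically on a small case. The sketch below assumes a two-player game, looks only at the row player, and uses matching pennies with one perturbed entry as an illustrative example; the function name `error_and_bound` and the data layout are assumptions.

```python
def error_and_bound(U_row, delta_row, p_row, p_col, a):
    """Row player's transfer error for deviation `a` in G' = G + delta, and its bound."""
    n_a, n_b = len(p_row), len(p_col)
    # Payoffs of the row player in the new game G'.
    U_new = [[U_row[x][y] + delta_row[x][y] for y in range(n_b)] for x in range(n_a)]
    base = sum(p_row[x] * p_col[y] * U_new[x][y] for x in range(n_a) for y in range(n_b))
    error = sum(p_col[y] * U_new[a][y] for y in range(n_b)) - base
    # Bound: sum of positive parts of delta in row `a`, minus expected delta under p*.
    expected_delta = sum(
        p_row[x] * p_col[y] * delta_row[x][y] for x in range(n_a) for y in range(n_b)
    )
    bound = sum(max(0.0, delta_row[a][y]) for y in range(n_b)) - expected_delta
    return error, bound


# Row-player payoffs of matching pennies, its uniform equilibrium, and a sparse delta.
U_row = [[1.0, -1.0], [-1.0, 1.0]]
p_row, p_col = [0.5, 0.5], [0.5, 0.5]
delta_row = [[0.2, 0.0], [0.0, 0.0]]  # only one entry differs between the games

results = [error_and_bound(U_row, delta_row, p_row, p_col, a) for a in range(2)]
```

Each pair in `results` holds the error for one deviation and the corresponding bound. Because only one entry of `delta_row` is nonzero, both sums in the bound involve a single term, which is why sparse differences keep the bound small.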
Some Results
Future Work
Some problems:
Other transfer methods?
Only Nash equilibrium?
Equilibrium finding algorithms
Transfer between games with different action spaces
Transfer between games with different numbers of agents
Game abstraction
Thanks!