IN SEARCH OF VALUE EQUILIBRIA
By Christopher Kleven & Dustin Richwine
xkcd.com
Group
Mentor: Dr. Michael L. Littman, Chair of the Computer Science Dept., specializing in AI and Reinforcement Learning
Grad Student Mentor: Michael Wunder, PhD student studying with Dr. Littman
Game Theory
Study of interactions of rational utility-maximizing agents and prediction of their behavior
An action profile is a Nash Equilibrium of a game if every player’s action is a best response to the other players’ actions.
Normal Form Game
              Column
              A       B
Row    A      a, b    c, d
       B      e, f    g, h
Example
                   Child
                   Behave    Misbehave
Parent   Spoil     1, 2      0, 3
         Punish    0, 1      2, 0
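The best-response definition above can be checked mechanically on this example. The following is a minimal sketch, assuming each cell lists the Parent's payoff first and the Child's payoff second; the names `PAYOFFS` and `pure_nash_equilibria` are illustrative and not from the original slides. It tests every pure action profile and reports those where neither player gains by deviating alone.

```python
# Payoffs as (parent, child); rows are the Parent's actions, columns the Child's.
ROW_ACTIONS = ["Spoil", "Punish"]
COL_ACTIONS = ["Behave", "Misbehave"]
PAYOFFS = [
    [(1, 2), (0, 3)],  # Spoil
    [(0, 1), (2, 0)],  # Punish
]


def pure_nash_equilibria(payoffs):
    """Return all (row, col) index pairs where each action is a best response."""
    n_rows, n_cols = len(payoffs), len(payoffs[0])
    equilibria = []
    for r in range(n_rows):
        for c in range(n_cols):
            row_payoff, col_payoff = payoffs[r][c]
            # Row player cannot do better by switching rows in column c.
            row_is_best = all(payoffs[r2][c][0] <= row_payoff for r2 in range(n_rows))
            # Column player cannot do better by switching columns in row r.
            col_is_best = all(payoffs[r][c2][1] <= col_payoff for c2 in range(n_cols))
            if row_is_best and col_is_best:
                equilibria.append((r, c))
    return equilibria


print(pure_nash_equilibria(PAYOFFS))  # prints []
```

Under that payoff-ordering assumption the list is empty: in every cell one of the two players would rather switch, so any equilibrium of this game has to involve mixed strategies.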
Spoiled Child Game Analysis
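Let the Child be a reinforcement learner.
Parent’s intent to play towards the Nash Equilibrium outcome: (1/2) Spoil & (1/2) Punish, which gives the Child an expected payoff of 1.5.
Child’s intent to play towards the Nash Equilibrium outcome: (2/3) Behave & (1/3) Misbehave, which gives the Parent an expected payoff of 0.667.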
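The mixed strategies and the values 1.5 and 0.667 quoted above can be reproduced from the payoff table with the usual indifference argument: each player mixes so that the opponent is indifferent between their two actions. This is a rough sketch, assuming payoffs are listed as (Parent, Child) and that the game has an interior mixed equilibrium (otherwise the division fails); the function name `mixed_equilibrium_2x2` is hypothetical, and the letters a to h follow the generic normal form game table.

```python
from fractions import Fraction

# (parent payoff, child payoff); rows: Spoil, Punish; columns: Behave, Misbehave
PAYOFFS = [
    [(1, 2), (0, 3)],
    [(0, 1), (2, 0)],
]


def mixed_equilibrium_2x2(payoffs):
    """Solve the indifference conditions of a 2x2 game.

    Returns (p, q, row_value, col_value), where p is the probability of the
    first row and q is the probability of the first column.
    """
    (a, b), (c, d) = payoffs[0]
    (e, f), (g, h) = payoffs[1]
    # Row player mixes so the column player is indifferent:
    #   p*b + (1-p)*f == p*d + (1-p)*h
    p = Fraction(h - f, b - d - f + h)
    # Column player mixes so the row player is indifferent:
    #   q*a + (1-q)*c == q*e + (1-q)*g
    q = Fraction(g - c, a - c - e + g)
    row_value = q * a + (1 - q) * c  # row player's expected payoff
    col_value = p * b + (1 - p) * f  # column player's expected payoff
    return p, q, row_value, col_value


p, q, parent_value, child_value = mixed_equilibrium_2x2(PAYOFFS)
print(p, q, parent_value, child_value)  # prints 1/2 2/3 2/3 3/2
```

Here p = 1/2 is the Parent's probability of Spoil and q = 2/3 is the Child's probability of Behave; the Child's expected payoff is 3/2 and the Parent's is 2/3, roughly 0.667.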
Reinforcement Learning
Def: Subarea of machine learning concerned with how an agent ought to take actions so as to maximize some notion of long-term reward.
Reference: Michael Wunder, Michael Littman, and Monica Babes. Classes of Multiagent Q-learning Dynamics with epsilon-greedy Exploration.
Q-Learning
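Assign arbitrary Q-values to each strategy A and B. We will refer to these values as Q(A) and Q(B), respectively.
Update rule: Q(action) = (1 - α) Q(action) + αR, where α is the learning rate and R is the reward received.
ε-greedy exploration: With probability ε the Q-learner will choose a random action; otherwise it chooses the action with the highest Q-value.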
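The update rule and exploration scheme above can be sketched in a few lines. This is only an illustration under stated assumptions: the two actions A and B pay fixed, made-up rewards (1.0 and 2.0), the values of α and ε are arbitrary, and the function names `epsilon_greedy`, `q_update`, and `run` are hypothetical rather than taken from the slides or the cited paper.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a random action, otherwise a greedy one."""
    if rng.random() < epsilon:
        return rng.choice(list(q_values))
    return max(q_values, key=q_values.get)


def q_update(q_values, action, reward, alpha):
    """Apply Q(action) = (1 - alpha) * Q(action) + alpha * R in place."""
    q_values[action] = (1 - alpha) * q_values[action] + alpha * reward


def run(steps=5000, alpha=0.1, epsilon=0.1, seed=0):
    """Train a single Q-learner on two actions with fixed rewards."""
    rng = random.Random(seed)
    q = {"A": 0.0, "B": 0.0}         # arbitrary initial Q-values
    rewards = {"A": 1.0, "B": 2.0}   # assumed rewards, for illustration only
    for _ in range(steps):
        action = epsilon_greedy(q, epsilon, rng)
        q_update(q, action, rewards[action], alpha)
    return q


print(run())
```

With fixed rewards each Q-value should drift toward the reward of its own action, and the exploration step is what lets the learner discover that B pays more even though A is tried first. In a game, R would instead depend on the other player's action, which is where the dynamics become interesting.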
Goals
Understand the behavior of the Q-learning algorithm in games with more actions, more players, or more states.
Try to formalize the notion of "value based equilibria".
Develop new algorithms that learn effectively in a wide variety of games.
Find a machine learner that elicits different behavior from different learning agents for possible use in diagnosing how people and monkeys learn.
Importance
The internet serves as a place where learning robots can serve as a proxy for human interaction. Their use could be effective in auctions, making online purchases, tracking goods, or even playing online poker.
Learning the state that results from interactions of AI can lead us to predict the long-term value of these interactions.
A successful algorithm may prove conducive to the understanding of the brain’s ability to learn.