![Page 1: IN SEARCH OF VALUE EQUILIBRIA By Christopher Kleven & Dustin Richwine xkcd.com](https://reader034.vdocuments.mx/reader034/viewer/2022042821/56649d1a5503460f949efaa5/html5/thumbnails/1.jpg)
IN SEARCH OF VALUE EQUILIBRIA
By Christopher Kleven & Dustin Richwine
xkcd.com
Group
Mentor: Dr. Michael L. Littman
Chair of the Computer Science Dept.
Specializing in AI and Reinforcement Learning
Grad Student Mentor: Michael Wunder
PhD Student studying with Dr. Littman
Game Theory
Study of interactions of rational utility-maximizing agents and prediction of their behavior
An action profile is a Nash Equilibrium of a game if every player’s action is a best response to the other players’ actions.
Normal Form Game

| Row \ Column | A | B |
|---|---|---|
| A | a, b | c, d |
| B | e, f | g, h |
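
The best-response definition above can be checked mechanically for pure action profiles. The following is a minimal sketch, assuming each cell lists the Row player's payoff first and the Column player's payoff second. The function names and the payoff numbers in `EXAMPLE_GAME` are hypothetical and chosen only for illustration. They are not taken from the slides.

```python
from itertools import product


def is_nash_equilibrium(payoffs, profile):
    """Return True if neither player gains by deviating alone.

    payoffs maps (row_action, col_action) -> (row_payoff, col_payoff).
    """
    rows = {r for r, _ in payoffs}
    cols = {c for _, c in payoffs}
    r, c = profile
    # Row's action must be a best response to Column's action.
    row_best = all(payoffs[(r, c)][0] >= payoffs[(alt, c)][0] for alt in rows)
    # Column's action must be a best response to Row's action.
    col_best = all(payoffs[(r, c)][1] >= payoffs[(r, alt)][1] for alt in cols)
    return row_best and col_best


def pure_nash_equilibria(payoffs):
    """List every pure action profile that is a Nash equilibrium."""
    rows = sorted({r for r, _ in payoffs})
    cols = sorted({c for _, c in payoffs})
    return [p for p in product(rows, cols) if is_nash_equilibrium(payoffs, p)]


# Hypothetical payoffs (a Prisoner's Dilemma-style game), for illustration only.
EXAMPLE_GAME = {
    ("A", "A"): (3, 3), ("A", "B"): (0, 5),
    ("B", "A"): (5, 0), ("B", "B"): (1, 1),
}

if __name__ == "__main__":
    print(pure_nash_equilibria(EXAMPLE_GAME))
```

A game may have no pure equilibrium at all, in which case the list is empty and only a mixed equilibrium exists.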
Example

| Parent \ Child | Behave | Misbehave |
|---|---|---|
| Spoil | 1, 2 | 0, 3 |
| Punish | 0, 1 | 2, 0 |

Spoiled Child Game Analysis
Let Child be Reinforcement Learner
Parent’s intent to play towards Nash Equilibrium outcome: (1/2) Spoil & (1/2) Punish, giving the Child an expected payoff of 1.5
Child’s intent to play towards Nash Equilibrium outcome: (2/3) Behave & (1/3) Misbehave, giving the Parent an expected payoff of 0.667
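
The mixed strategies above can be reproduced from the payoff table with the usual indifference argument: each player mixes so that the other player is indifferent between their two actions. The sketch below assumes each cell lists the Parent's payoff first and the Child's second, which is consistent with the stated mixtures. The function name `mixed_equilibrium_2x2` is hypothetical, and the code assumes the game has a fully mixed equilibrium, so the denominators are nonzero.

```python
from fractions import Fraction


def mixed_equilibrium_2x2(row_payoffs, col_payoffs):
    """Solve a 2x2 game for its fully mixed equilibrium.

    row_payoffs[i][j] and col_payoffs[i][j] are the payoffs when Row plays
    action i and Column plays action j. Returns (p, q, row_value, col_value),
    where p is the probability Row plays action 0 and q is the probability
    Column plays action 0.
    """
    (a, c), (e, g) = row_payoffs
    (b, d), (f, h) = col_payoffs
    # Row picks p so Column is indifferent: p*b + (1-p)*f == p*d + (1-p)*h
    p = Fraction(h - f, b - d - f + h)
    # Column picks q so Row is indifferent: q*a + (1-q)*c == q*e + (1-q)*g
    q = Fraction(g - c, a - c - e + g)
    row_value = q * a + (1 - q) * c
    col_value = p * b + (1 - p) * f
    return p, q, row_value, col_value


# Spoiled Child game: rows are Parent (Spoil, Punish),
# columns are Child (Behave, Misbehave).
PARENT = [[1, 0], [0, 2]]
CHILD = [[2, 3], [1, 0]]

if __name__ == "__main__":
    p, q, parent_value, child_value = mixed_equilibrium_2x2(PARENT, CHILD)
    print("P(Spoil) =", p, " P(Behave) =", q)
    print("Parent value =", parent_value, " Child value =", child_value)
```

Exact fractions are used so that 2/3 is not rounded. Converting `parent_value` with `float()` gives the 0.667 shown on the slide.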
Reinforcement Learning
Def: Subarea of machine learning concerned with how an agent ought to take actions so as to maximize some notion of long-term reward.
Michael Wunder, Michael Littman, and Monica Babes. Classes of Multiagent Q-learning Dynamics with epsilon-greedy Exploration.
Q-Learning
Assign arbitrary Q-values to each strategy A and B. We will refer to these values as Q(A) and Q(B), respectively.
Q(action) = (1 − α) Q(action) + αR, where α is the learning rate and R is the reward received for the action.
ε-greedy exploration: With probability ε the Q-learner will choose a random action.
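
The update rule and exploration scheme on this slide can be sketched as a small stateless learner. This is an illustrative sketch only: the class name `QLearner`, its parameters, and the default values are assumptions, not taken from the slides. The initial Q-values are set to 0.0 as one arbitrary choice, and the greedy step (pick the action with the highest Q-value when not exploring) is the standard reading of ε-greedy.

```python
import random


class QLearner:
    """Stateless Q-learner with epsilon-greedy exploration (illustrative)."""

    def __init__(self, actions, alpha=0.1, epsilon=0.1, rng=None):
        # Arbitrary initial Q-values; 0.0 is just one possible choice.
        self.q = {action: 0.0 for action in actions}
        self.alpha = alpha      # learning rate
        self.epsilon = epsilon  # exploration probability
        self.rng = rng if rng is not None else random.Random()

    def choose(self):
        """With probability epsilon pick a random action, else the greedy one."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.q))
        return max(self.q, key=self.q.get)

    def update(self, action, reward):
        """Apply Q(action) = (1 - alpha) * Q(action) + alpha * R."""
        self.q[action] = (1 - self.alpha) * self.q[action] + self.alpha * reward


if __name__ == "__main__":
    learner = QLearner(["A", "B"], alpha=0.5, epsilon=0.1, rng=random.Random(0))
    action = learner.choose()
    learner.update(action, reward=1.0)
    print(action, learner.q)
```

Passing a seeded `random.Random` makes runs repeatable, which helps when comparing learning dynamics across games.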
Goals
Understand the behavior of the Q-learning algorithm in games with more actions, more players, or more states.
Try to formalize the notion of "value based equilibria".
Develop new algorithms that learn effectively in a wide variety of games.
Find a machine learner that elicits different behavior from different learning agents for possible use in diagnosing how people and monkeys learn.
Importance
The internet serves as a place where learning robots can act as a proxy for human interaction
Their use could be effective in auctions, making online purchases, tracking goods, or even playing online poker
Learning the state that results from interactions of AI can lead us to predict the long-term value of these interactions
A successful algorithm may prove conducive to the understanding of the brain’s ability to learn