IN SEARCH OF VALUE EQUILIBRIA By Christopher Kleven & Dustin Richwine xkcd.com

Upload: dora-mcbride

Post on 18-Dec-2015

IN SEARCH OF VALUE EQUILIBRIA

By Christopher Kleven & Dustin Richwine

xkcd.com

Group

Mentor: Dr. Michael L. Littman

Chair of the Computer Science Dept.

Specializing in AI and Reinforcement Learning

Grad Student Mentor: Michael Wunder

PhD Student studying with Dr. Littman

Game Theory

Study of interactions of rational utility-maximizing agents and prediction of their behavior

An action profile is a Nash Equilibrium of a game if every player’s action is a best response to the other players’ actions.

Normal Form Game

              Column
            A       B
Row   A   a, b    c, d
      B   e, f    g, h
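The best-response definition above can be checked mechanically for pure strategies. The following is a minimal sketch, not taken from the slides: the function name `pure_nash_equilibria` and the example payoffs (a Prisoner's Dilemma) are assumptions chosen for illustration. Each cell holds a (row payoff, column payoff) pair, matching the a, b / c, d layout of the table.

```python
from itertools import product


def pure_nash_equilibria(payoffs):
    """Return all pure-strategy Nash equilibria of a two-player game.

    payoffs[r][c] is a (row_payoff, column_payoff) pair.
    The result is a list of (row_action, column_action) index pairs.
    """
    n_rows, n_cols = len(payoffs), len(payoffs[0])
    equilibria = []
    for r, c in product(range(n_rows), range(n_cols)):
        row_payoff, col_payoff = payoffs[r][c]
        # Row's action is a best response if no other row does better against c.
        row_best = all(payoffs[r2][c][0] <= row_payoff for r2 in range(n_rows))
        # Column's action is a best response if no other column does better against r.
        col_best = all(payoffs[r][c2][1] <= col_payoff for c2 in range(n_cols))
        if row_best and col_best:
            equilibria.append((r, c))
    return equilibria


# Hypothetical example payoffs (a Prisoner's Dilemma), not from the slides.
prisoners_dilemma = [[(3, 3), (0, 5)],
                     [(5, 0), (1, 1)]]

print(pure_nash_equilibria(prisoners_dilemma))
```

A game may have no pure-strategy equilibrium at all, in which case the function returns an empty list and the players must mix.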

Example

                    Child
                Behave   Misbehave
Parent  Spoil    1, 2      0, 3
        Punish   0, 1      2, 0

Each cell lists the Parent’s payoff, then the Child’s.

Spoiled Child Game Analysis

Let the Child be a reinforcement learner.

Parent’s intent to play towards Nash Equilibrium outcome: (1/2) Spoil & (1/2) Punish, giving the Child an expected payoff of 1.5

Child’s intent to play towards Nash Equilibrium outcome: (2/3) Behave & (1/3) Misbehave, giving the Parent an expected payoff of 0.667
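The mixed strategies quoted above follow from the usual indifference conditions: each player mixes so that the opponent is indifferent between their two actions. The sketch below is an illustration under that assumption, not the authors' code. The function name `mixed_equilibrium_2x2` is hypothetical, the variables a through h follow the labels on the Normal Form Game slide, and the function assumes the game has an interior mixed equilibrium (otherwise a denominator is zero or a probability falls outside [0, 1]).

```python
from fractions import Fraction


def mixed_equilibrium_2x2(payoffs):
    """Solve the indifference conditions of a 2x2 game.

    payoffs[r][c] = (row_payoff, col_payoff).
    Returns (p, q, row_value, col_value), where p is the probability that
    the row player picks action 0 and q the probability that the column
    player picks action 0.
    """
    (a, b), (c, d) = payoffs[0]
    (e, f), (g, h) = payoffs[1]
    # Row mixes so that column is indifferent:
    #   p*b + (1-p)*f = p*d + (1-p)*h
    p = Fraction(h - f, (b - f) - (d - h))
    # Column mixes so that row is indifferent:
    #   q*a + (1-q)*c = q*e + (1-q)*g
    q = Fraction(g - c, (a - c) - (e - g))
    row_value = q * a + (1 - q) * c   # row's expected payoff at equilibrium
    col_value = p * b + (1 - p) * f   # column's expected payoff at equilibrium
    return p, q, row_value, col_value


# Spoiled Child Game: rows = Parent (Spoil, Punish), columns = Child (Behave, Misbehave).
spoiled_child = [[(1, 2), (0, 3)],
                 [(0, 1), (2, 0)]]

p, q, parent_value, child_value = mixed_equilibrium_2x2(spoiled_child)
print(p, q, parent_value, child_value)
```

For the payoffs on the slide this gives the Parent spoiling with probability 1/2 and the Child behaving with probability 2/3, with expected payoffs of 2/3 (about 0.667) for the Parent and 3/2 (1.5) for the Child.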

Reinforcement Learning

Def: Subarea of machine learning concerned with how an agent ought to take actions so as to maximize some notion of long-term reward.

Michael Wunder, Michael Littman, and Monica Babes. Classes of Multiagent Q-learning Dynamics with epsilon-greedy Exploration.

Q-Learning

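Assign arbitrary Q-values to each strategy, A and B. We will refer to these values as Q(A) and Q(B), respectively.

Q(action) = (1 - α) Q(action) + αR

where α is the learning rate and R is the reward received for the action.

ε-greedy exploration: With probability ε the Q-learner will choose a random action; otherwise it chooses the action with the highest Q-value.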
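The update rule and exploration scheme on this slide can be sketched in a few lines. This is a minimal illustration under simplifying assumptions, not the authors' implementation: there is a single learner with no state, each action pays a fixed reward, and the names `epsilon_greedy`, `q_update`, `run`, and the reward values are all hypothetical.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a random action, otherwise a greedy one."""
    if rng.random() < epsilon:
        return rng.choice(list(q_values))
    # Ties are broken in favour of the first action in the dict.
    return max(q_values, key=q_values.get)


def q_update(q_values, action, reward, alpha):
    """Apply Q(action) = (1 - alpha) * Q(action) + alpha * R in place."""
    q_values[action] = (1 - alpha) * q_values[action] + alpha * reward


def run(rewards, steps=1000, alpha=0.1, epsilon=0.1, seed=0):
    """Run a stateless Q-learner against fixed per-action rewards (an assumption)."""
    rng = random.Random(seed)
    q_values = {action: 0.0 for action in rewards}  # arbitrary initial Q-values
    for _ in range(steps):
        action = epsilon_greedy(q_values, epsilon, rng)
        q_update(q_values, action, rewards[action], alpha)
    return q_values


# Hypothetical rewards: strategy B pays more than strategy A.
learned = run({"A": 1.0, "B": 2.0})
print(learned)
```

In a game setting the reward for an action would instead depend on what the other player does at that step, which is what makes the multiagent dynamics harder to predict than this single-learner sketch suggests.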

Goals

Understand the behavior of the Q-learning algorithm in games with more actions, more players, or more states.

Try to formalize the notion of "value based equilibria".

Develop new algorithms that learn effectively in a wide variety of games.

Find a machine learner that elicits different behavior from different learning agents for possible use in diagnosing how people and monkeys learn.

Importance

The internet serves as a place where learning robots can act as a proxy for human interaction. Its use could be effective in auctions, making online purchases, tracking goods, or even playing online poker.

Learning the state that results from interactions of AI can lead us to predict the long-term value of these interactions.

A successful algorithm may prove conducive to the understanding of the brain’s ability to learn.