IN SEARCH OF VALUE EQUILIBRIA By Christopher Kleven & Dustin Richwine xkcd.com

Upload: dora-mcbride

Post on 18-Dec-2015

TRANSCRIPT

IN SEARCH OF VALUE EQUILIBRIA

By Christopher Kleven & Dustin Richwine

xkcd.com

Group

Mentor: Dr. Michael L. Littman, Chair of the Computer Science Dept., specializing in AI and Reinforcement Learning

Grad Student Mentor: Michael Wunder, PhD student studying with Dr. Littman

Game Theory

Study of interactions of rational utility-maximizing agents and prediction of their behavior

An action profile is a Nash Equilibrium of a game if every player’s action is a best response to the other players’ actions.

Normal Form Game

                Column
              A       B
Row     A   a, b    c, d
        B   e, f    g, h
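The best-response definition above can be checked mechanically for pure action profiles. The following is a minimal sketch, not taken from the slides: the function name `pure_nash_equilibria` and the payoff numbers in `EXAMPLE_GAME` are hypothetical, and each cell is assumed to hold (Row payoff, Column payoff) as in the table.

```python
from itertools import product


def pure_nash_equilibria(payoffs):
    """Return all pure action profiles (r, c) that are Nash equilibria.

    payoffs[r][c] is a pair (row_payoff, column_payoff).
    """
    n_rows = len(payoffs)
    n_cols = len(payoffs[0])
    equilibria = []
    for r, c in product(range(n_rows), range(n_cols)):
        row_u, col_u = payoffs[r][c]
        # Row's action is a best response if no other row does better against c.
        row_best = all(payoffs[r2][c][0] <= row_u for r2 in range(n_rows))
        # Column's action is a best response if no other column does better against r.
        col_best = all(payoffs[r][c2][1] <= col_u for c2 in range(n_cols))
        if row_best and col_best:
            equilibria.append((r, c))
    return equilibria


# Hypothetical coordination game (illustrative numbers, not from the slides).
EXAMPLE_GAME = [
    [(2, 1), (0, 0)],
    [(0, 0), (1, 2)],
]

if __name__ == "__main__":
    print(pure_nash_equilibria(EXAMPLE_GAME))
```

A game may have no pure equilibrium at all, in which case the function returns an empty list and only mixed strategies can be in equilibrium.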

Example

                     Child
                  Behave    Misbehave
Parent   Spoil     1, 2       0, 3
         Punish    0, 1       2, 0

Spoiled Child Game Analysis

Let Child be Reinforcement Learner

Parent’s intent to play towards Nash Equilibrium outcome: (1/2) Spoil & (1/2) Punish, which gives the Child an expected payoff of 1.5

Child’s intent to play towards Nash Equilibrium outcome: (2/3) Behave & (1/3) Misbehave, which gives the Parent an expected payoff of 0.667
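The mixed strategies above follow from the usual indifference conditions: each player mixes so that the other player is indifferent between their two actions. The sketch below works this out for the Spoiled Child game, assuming each cell is (Parent payoff, Child payoff) and that the game has a fully mixed equilibrium; the function name `mixed_equilibrium_2x2` is hypothetical.

```python
from fractions import Fraction

# Rows: Parent (Spoil, Punish). Columns: Child (Behave, Misbehave).
# Each cell is assumed to be (Parent payoff, Child payoff).
SPOILED_CHILD = [
    [(1, 2), (0, 3)],
    [(0, 1), (2, 0)],
]


def mixed_equilibrium_2x2(game):
    """Return (p, q) for a 2x2 game with a fully mixed equilibrium.

    p = probability Row plays its first action,
    q = probability Column plays its first action.
    """
    (a, b), (c, d) = game[0]
    (e, f), (g, h) = game[1]
    # Row mixes so Column is indifferent: p*b + (1-p)*f == p*d + (1-p)*h
    p = Fraction(h - f, (b - d) - (f - h))
    # Column mixes so Row is indifferent: q*a + (1-q)*c == q*e + (1-q)*g
    q = Fraction(g - c, (a - e) - (c - g))
    return p, q


def expected_payoffs(game, p, q):
    """Expected (Row, Column) payoffs when Row plays p and Column plays q."""
    probs = [[p * q, p * (1 - q)], [(1 - p) * q, (1 - p) * (1 - q)]]
    row_value = sum(probs[r][c] * game[r][c][0] for r in range(2) for c in range(2))
    col_value = sum(probs[r][c] * game[r][c][1] for r in range(2) for c in range(2))
    return row_value, col_value


if __name__ == "__main__":
    p, q = mixed_equilibrium_2x2(SPOILED_CHILD)
    print("P(Spoil) =", p, " P(Behave) =", q)
    print("Expected payoffs (Parent, Child):", expected_payoffs(SPOILED_CHILD, p, q))
```

Under these assumptions the Parent spoils with probability 1/2 and the Child behaves with probability 2/3; the Child's expected payoff is 3/2 and the Parent's is 2/3, which appears to match the figures 1.5 and 0.667 on the slide.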

Reinforcement Learning

Def: Sub-area of machine learning concerned with how an agent ought to take actions so as to maximize some notion of long-term reward.

Michael Wunder, Michael Littman, and Monica Babes. Classes of Multiagent Q-learning Dynamics with epsilon-greedy Exploration.

Q-Learning

Assign arbitrary Q-values to each strategy A and B. We will refer to these values as Q(A) and Q(B), respectively.

Q(action) = (1-α) Q(action) + αR, where α is the learning rate and R is the reward received.

ε-greedy exploration: With probability ε the Q-learner will choose a random action.
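As a rough illustration of the update rule and ε-greedy exploration, here is a minimal sketch of a stateless two-action Q-learner. The class name `QLearner`, the initial Q-values of 0, the values α = 0.1 and ε = 0.1, and the fixed reward table in `run_demo` are all assumptions made for the example, not details from the slides.

```python
import random


class QLearner:
    """Stateless Q-learner with epsilon-greedy exploration (illustrative sketch)."""

    def __init__(self, actions, alpha=0.1, epsilon=0.1, seed=None):
        # Arbitrary initial Q-values; 0.0 is an assumption for this example.
        self.q = {action: 0.0 for action in actions}
        self.alpha = alpha
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def choose(self):
        # With probability epsilon, explore by picking a random action.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.q))
        # Otherwise exploit the action with the highest Q-value.
        return max(self.q, key=self.q.get)

    def update(self, action, reward):
        # Q(action) = (1 - alpha) * Q(action) + alpha * R
        self.q[action] = (1 - self.alpha) * self.q[action] + self.alpha * reward


def run_demo(steps=200, seed=0):
    """Train against a hypothetical fixed reward table: A pays 1, B pays 0."""
    rewards = {"A": 1.0, "B": 0.0}
    learner = QLearner(["A", "B"], seed=seed)
    for _ in range(steps):
        action = learner.choose()
        learner.update(action, rewards[action])
    return learner


if __name__ == "__main__":
    print(run_demo().q)
```

In a game setting the reward would come from the opponent's simultaneous action rather than a fixed table, so each player's Q-values keep shifting as the other player learns.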

Goals

Understand the behavior of the Q-learning algorithm in games with more actions, more players, or more states.

Try to formalize the notion of "value based equilibria".

Develop new algorithms that learn effectively in a wide variety of games.

Find a machine learner that elicits different behavior from different learning agents for possible use in diagnosing how people and monkeys learn.

Importance

The internet serves as a place where learning robots can serve as a proxy for human interaction. Its use could be effective in auctions, making online purchases, tracking goods, or even playing online poker

Learning the state that results from interactions of AI can lead us to predict the long-term value of these interactions

A successful algorithm may prove conducive to the understanding of the brain’s ability to learn
