Evaluation Through Conflict
Martin Zinkevich, Yahoo! Inc.
Who was I
• Worked with U Alberta Computer Poker Research Group
  – Designed Counterfactual Regret Algorithm
  – Theory behind DIVAT
• Worked on AAAI Computer Poker Competition
  – 2006 as lead programmer, 2007 as chair
• Work used in Man Vs Machine
AAAI Computer Poker Competition
• 5 years running
• Now the ANNUAL Computer Poker Competition
• Latest: 11 universities et al.
OK, But Who Won?
• Online: Maximize total winnings
• Equilibrium: Maximize number of people I can win money from (or don’t lose against)
An Old Idea
• Think about learning in the presence of other intelligent agents.
• Prove cool stuff about your learning algorithm given:
  – constraints about the adversary
  – constraints about the game
Solving the Unsolvable
• In current competitions, people are often applying techniques that are effective in solvable games, even when the game is not solvable.
• In what competitions is it useless to approximate the game as solvable?
Axelrod’s Iterated Prisoner’s Dilemma
• A competition between many competitors.
• One entry: tit-for-tat (Anatol Rapoport)
  – Nice (initially)
  – Retaliating
  – Forgiving
  – Non-envious
• Learned that cooperation has value, but:
  – Cooperate with whom?
  – How do we cooperate?
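The tit-for-tat properties listed above can be seen in a few lines of Python. This is only a minimal sketch of the well-known rule (cooperate first, then copy the opponent's previous move); the function names, the "C"/"D" encoding, and the `play` helper are illustrative assumptions, not anything taken from Axelrod's tournament code.

```python
COOPERATE, DEFECT = "C", "D"  # assumed move encoding for this sketch


def tit_for_tat(opponent_history):
    """Nice: cooperate first. Retaliating and forgiving: copy the last move."""
    if not opponent_history:
        return COOPERATE
    return opponent_history[-1]


def always_defect(opponent_history):
    """A simple opponent that never cooperates."""
    return DEFECT


def defect_once(opponent_history):
    """An opponent that defects in the first round only."""
    return DEFECT if not opponent_history else COOPERATE


def play(strategy_a, strategy_b, rounds):
    """Run both strategies against each other and return their move lists."""
    moves_a, moves_b = [], []
    for _ in range(rounds):
        # Each strategy sees only the other player's past moves.
        a = strategy_a(moves_b)
        b = strategy_b(moves_a)
        moves_a.append(a)
        moves_b.append(b)
    return moves_a, moves_b
```

Against `always_defect`, tit-for-tat cooperates once and then retaliates every round; against `defect_once`, it retaliates a single time and then returns to cooperation.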
What Is The Lemonade Stand Game?
• Every round for 100 rounds:
  – each person selects an action privately
  – then, the actions are revealed
• The score of a player is the distance clockwise to the next player plus the distance counterclockwise.
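As a rough illustration of the scoring rule above, here is a minimal Python sketch for one round with three players. The slides do not say how many spots the circle has or how ties are handled, so the 12 spots, the equal split between players sharing a spot, and the names `N_SPOTS` and `round_scores` are all assumptions made for this example.

```python
N_SPOTS = 12  # assumption: number of positions on the circle


def round_scores(positions, n_spots=N_SPOTS):
    """Score one round: clockwise gap plus counterclockwise gap per player.

    Assumption: players on the same spot split that spot's score equally.
    """
    scores = []
    for i, p in enumerate(positions):
        others = [q for j, q in enumerate(positions) if j != i]
        distinct = [q for q in others if q != p]
        if not distinct:
            # Everyone shares one spot: the whole circle is split evenly.
            scores.append(2 * n_spots / len(positions))
            continue
        clockwise = min((q - p) % n_spots for q in distinct)
        counterclockwise = min((p - q) % n_spots for q in distinct)
        sharers = 1 + sum(q == p for q in others)
        scores.append((clockwise + counterclockwise) / sharers)
    return scores


print(round_scores((0, 4, 8)))  # evenly spread: [8.0, 8.0, 8.0]
print(round_scores((0, 6, 3)))  # 0 and 6 sit opposite: [9.0, 9.0, 6.0]
```

Under these assumptions every round hands out 24 points in total, so any gain for one player comes at the expense of another.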
Key Observations
• A constant-sum game between 3 players.
  – For every gain, someone has to lose.
• Possibilities For Cooperation
  – Opposite sides of the circle, “sandwiching”
• Not a “Solvable Game” (Nash, 1951)
  – Playing equilibrium strategies is not advisable
• Easy To Set “Table Image”
  – The constant strategy often evokes cooperative behavior
• Existing Techniques Fail
  – Experts algorithms lose to constant strategy
Strategy #1: Play Constant
Strategy #2: Play Opposite
Strategy #3: Sandwich
Competition Structure
• Every set of three players played 100 rounds 180 times (1.5 million rounds total)
• Highest Total Score Wins
• Mean, Standard Error can be calculated
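A quick check of the arithmetic, as a sketch: it assumes "every set of three players" means every triple drawn from the 9 team entries listed under Competitors, which the slides do not spell out. The constant names and the `mean_and_standard_error` helper are hypothetical and only show one common way to compute the two statistics.

```python
import math
import statistics

N_ENTRIES = 9           # assumption: one entry per team
ROUNDS_PER_MATCH = 100  # from the slide
REPEATS = 180           # from the slide

triples = math.comb(N_ENTRIES, 3)                     # 84 sets of three
total_rounds = triples * REPEATS * ROUNDS_PER_MATCH   # 1,512,000 rounds


def mean_and_standard_error(scores):
    """Return the sample mean and the standard error of that mean."""
    mean = statistics.fmean(scores)
    standard_error = statistics.stdev(scores) / math.sqrt(len(scores))
    return mean, standard_error
```

Under that assumption the count comes to 1,512,000 rounds, which matches the "1.5 million rounds total" on the slide.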
Competitors
• 28 players, 9 teams
  – University of Southampton/Imperial College London (Soton)
  – Yahoo! Inc. (Pujara)
  – Rutgers University (RL3)
  – Brown University (Brown)
  – Carnegie Mellon (2 teams: Waugh, ACTR)
  – University of Michigan (FrozenPontiac)
  – Princeton University (Schapire)
  – (Greg Kuhlmann)
Competition Results
[Bar chart: Score Per Round (axis 0 to 10) by Competitor: Soton, Pujara, RL3, Waugh, ACTR, Schapire, Brown, FrozenPontiac, Kuhlmann]
Results
[Bar chart: Score Per Round - 8 (axis -1.2 to 0.8) by Competitor: Soton, Pujara, RL3, Waugh, ACTR, Schapire, Brown, FrozenPontiac, Kuhlmann. Additional chart labels: Modified Constant, Uniformly Random]
Restricting to Top 6
[Bar chart: Score Per Round - 8 (axis -1.5 to 1) by Competitor: Pujara, Soton, RL3, Waugh, ACTR, Schapire]
The High Level
• Phenomenal Intelligence: the observed behavior used by a set of people at a point in time for some task.
Lofty Goals
• behavior: a fully specified strategy.
• used: actually leveraged
Practical Concessions
• Not any intelligent agent
• Not any time (people change)
• Not any task (context matters)