evaluation through conflict martin zinkevich yahoo! inc

Evaluation Through Conflict

Martin Zinkevich

Yahoo! Inc.

http://martin.zinkevich.org/lemonade

Who was I

• Worked with U Alberta Computer Poker Research Group
– Designed Counterfactual Regret Algorithm
– Theory behind DIVAT

• Worked on AAAI Computer Poker Competition
– 2006 as lead programmer, 2007 as chair

• Work used in Man Vs Machine

Who am I

• Run the Lemonade Stand Game Competition
• Work with Yahoo Anti-Abuse Team

AAAI Computer Poker Competition

• 5 years running
• Now the ANNUAL Computer Poker Competition
• Latest: 11 universities et al.

Competitions: Science vs Entertainment

AAAI Computer Poker Competition

May The Best Program Win!

And Win Again IF WE PLAYED AGAIN!

Head to Head

VS

for 1000 hands

All Combinations

7,-7 10,-10

-7,7 5,-5

-10,10 -5,5

OK, But Who Won?

• Online: Maximize total winnings
• Equilibrium: Maximize number of people I can win money from (or don’t lose against)
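The two winner definitions can be made concrete with the numbers from the All Combinations slide. The sketch below is an illustration only. It assumes those numbers form a cross table of three programs with the diagonal omitted; the labels `A`, `B`, `C` and the function names are hypothetical, since the slide text does not name the programs.

```python
# Winnings of the row program against the column program, read off the slide.
# Labels A, B, C are placeholders (assumption): the programs are not named.
CROSS_TABLE = {
    "A": {"B": 7, "C": 10},
    "B": {"A": -7, "C": 5},
    "C": {"A": -10, "B": -5},
}


def online_winner(table):
    """Program with the largest total winnings over all opponents."""
    return max(table, key=lambda p: sum(table[p].values()))


def equilibrium_winner(table):
    """Program that avoids losing against the largest number of opponents."""
    return max(table, key=lambda p: sum(1 for w in table[p].values() if w >= 0))


# Total winnings per program, for inspection.
totals = {p: sum(row.values()) for p, row in CROSS_TABLE.items()}
```

On these numbers both rules pick the same program. They can disagree when one program wins heavily from weak opponents but loses narrowly to strong ones.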

Why a New Competition?

Computing Equilibria ✓

Choosing Equilibria ?

Bach or Stravinsky

2,1 0,0

0,0 1,2
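The payoff matrix above has more than one equilibrium, which is what makes choosing among them a problem. As a minimal sketch, the following enumerates the pure-strategy Nash equilibria of a two-player matrix game. The function name and the mapping of action 0 to Bach and action 1 to Stravinsky are assumptions made for illustration.

```python
# Payoffs (row player, column player) from the Bach or Stravinsky slide.
# Assumption: action 0 = Bach, action 1 = Stravinsky for both players.
PAYOFFS = [
    [(2, 1), (0, 0)],
    [(0, 0), (1, 2)],
]


def pure_nash_equilibria(payoffs):
    """Return all (row, col) cells where neither player gains by deviating alone."""
    n_rows, n_cols = len(payoffs), len(payoffs[0])
    equilibria = []
    for r in range(n_rows):
        for c in range(n_cols):
            # Row player cannot do better by switching rows in column c.
            row_best = all(payoffs[r][c][0] >= payoffs[r2][c][0] for r2 in range(n_rows))
            # Column player cannot do better by switching columns in row r.
            col_best = all(payoffs[r][c][1] >= payoffs[r][c2][1] for c2 in range(n_cols))
            if row_best and col_best:
                equilibria.append((r, c))
    return equilibria
```

The sketch finds two pure equilibria, one favoring each player. The game also has a mixed equilibrium, which this enumeration does not search for.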

Big Question: How Do (or Would) People Get to Nash Equilibria?

Solvable Games

$

Unsolvable Games

∞

$ ?

An Old Idea

• Think about learning in the presence of other intelligent agents.

• Prove cool stuff about your learning algorithm given:
– constraints about the adversary
– constraints about the game

Solving the Unsolvable

• In current competitions, people are often applying techniques that are effective in solvable games, even when the game is not solvable.

• In what competitions is it useless to approximate the game as solvable?

Axelrod’s Iterated Prisoner’s Dilemma

• A competition between many competitors.
• One entry: tit-for-tat (Anatol Rapoport)
– Nice (initially)
– Retaliating
– Forgiving
– Non-envious

• Learned that cooperation has value, but:
– Cooperate with whom?
– How do we cooperate?
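The properties listed for tit-for-tat can be seen in a small simulation. This is a sketch, not the tournament code: the payoff values are the conventional prisoner's dilemma numbers (an assumption, they are not given on the slides), and the helper names `play` and `always_defect` are hypothetical.

```python
# Conventional prisoner's dilemma payoffs (assumption, not from the slides):
# mutual cooperation 3 each, mutual defection 1 each,
# lone defector 5, lone cooperator 0.
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def tit_for_tat(my_moves, their_moves):
    """Cooperate first (nice), then copy the opponent's previous move."""
    return "C" if not their_moves else their_moves[-1]


def always_defect(my_moves, their_moves):
    """Baseline opponent that never cooperates."""
    return "D"


def play(strategy_a, strategy_b, rounds=10):
    """Play an iterated game and return the two total scores."""
    moves_a, moves_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        a = strategy_a(moves_a, moves_b)
        b = strategy_b(moves_b, moves_a)
        pay_a, pay_b = PAYOFF[(a, b)]
        score_a += pay_a
        score_b += pay_b
        moves_a.append(a)
        moves_b.append(b)
    return score_a, score_b
```

Against itself, tit-for-tat cooperates throughout. Against a constant defector it loses only the first round and then retaliates, so it never outscores its opponent head to head.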

The Lemonade Stand Game

What Is The Lemonade Stand Game?

• Every round for 100 rounds:
– each person selects an action privately
– then, the actions are revealed

• The score of a player is the distance clockwise to the next player plus the distance counterclockwise.
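The scoring rule above can be sketched in code. This is a minimal sketch under two assumptions that the slides do not state: the circle has 12 positions (consistent with the results being reported as score per round minus 8), and players who share a position split that position's score evenly. The function name `lemonade_scores` is hypothetical.

```python
N_POSITIONS = 12  # assumption: number of positions on the circle


def lemonade_scores(actions, n=N_POSITIONS):
    """Score one round for players choosing positions 0..n-1 on a circle.

    Each player earns the clockwise distance to the next occupied position
    plus the counterclockwise distance, divided by the number of players
    standing on the same position (assumed tie rule).
    """
    scores = []
    for a in actions:
        distinct_others = [b for b in set(actions) if b != a]
        if not distinct_others:
            # Everyone stands on the same spot: the whole circle, both ways.
            span = 2 * n
        else:
            clockwise = min((b - a) % n for b in distinct_others)
            counterclockwise = min((a - b) % n for b in distinct_others)
            span = clockwise + counterclockwise
        scores.append(span / actions.count(a))
    return scores
```

With players at 0 and 6 and a third at 3, the sketch gives 9, 9, and 6: the two players on opposite sides gain at the third player's expense, and the total stays at 24.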

Key Observations

• A constant-sum game between 3 players.
– For every gain, someone has to lose.
• Possibilities For Cooperation
– Opposite sides of the circle, “sandwiching”
• Not a “Solvable Game” (Nash, 1951)
– Playing equilibrium strategies is not advisable
• Easy To Set “Table Image”
– The constant strategy often evokes cooperative behavior
• Existing Techniques Fail
– Experts algorithms lose to constant strategy

Strategy #1: Play Constant

Strategy #2: Play Opposite

Strategy #3: Sandwich
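The first two strategies are simple enough to sketch as functions of the round history. This is one possible reading, not the competitors' code: the 12-position circle, the history format (a list of per-round position tuples), and the names `play_constant` and `play_opposite` are all assumptions. Sandwiching is left out because the slide text gives only its name.

```python
N_POSITIONS = 12  # assumption: the slides do not state the number of positions


def play_constant(history, position=0):
    """Strategy #1: ignore the history and always choose the same position."""
    return position


def play_opposite(history, target, n=N_POSITIONS, first_move=0):
    """Strategy #2 (one reading): stand directly across the circle from where
    player `target` stood in the previous round.

    `history` is a list of tuples, one per round, holding every player's
    position. Before any round has been played, fall back to `first_move`.
    """
    if not history:
        return first_move
    return (history[-1][target] + n // 2) % n
```

For example, after a round in which the players stood at `(3, 9, 1)`, `play_opposite(history, target=0)` chooses position 9, directly across from player 0.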

Competition Structure

• Every set of three players played 100 rounds 180 times (1.5 million rounds total)

• Highest Total Score Wins
• Mean, Standard Error can be calculated
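The round count and the summary statistics can be checked with a few lines. This sketch assumes 9 entrants, matching the 9 competitors named in the results; the function name `mean_and_standard_error` is hypothetical, and the standard error here is the usual sample standard deviation divided by the square root of the sample size.

```python
import math
import statistics

# Assumption: 9 entrants, one per competitor named in the results.
n_entrants = 9
triples = math.comb(n_entrants, 3)   # distinct sets of three players
total_rounds = triples * 180 * 100   # 180 repetitions of 100 rounds each


def mean_and_standard_error(samples):
    """Return the sample mean and its standard error."""
    mean = statistics.fmean(samples)
    sem = statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, sem
```

Under that assumption there are 84 triples and 1,512,000 rounds, which agrees with the "1.5 million rounds" figure.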

Competitors

• 28 players, 9 teams
– University of Southampton/Imperial College London (Soton)
– Yahoo! Inc. (Pujara)
– Rutgers University (RL3)
– Brown University (Brown)
– Carnegie Mellon (2 teams: Waugh, ACTR)
– University of Michigan (FrozenPontiac)
– Princeton University (Schapire)
– (Greg Kuhlmann)

Competition Results

[Bar chart: Score Per Round (axis 0 to 10) by Competitor: Soton, Pujara, RL3, Waugh, ACTR, Schapire, Brown, FrozenPontiac, Kuhlmann]

Results

[Bar chart: Score Per Round minus 8 (axis -1.2 to 0.8) by Competitor: Soton, Pujara, RL3, Waugh, ACTR, Schapire, Brown, FrozenPontiac, Kuhlmann. Additional labels: Modified Constant, Uniformly Random]

Restricting to Top 6

[Bar chart: Score Per Round minus 8 (axis -1.5 to 1) by Competitor: Pujara, Soton, RL3, Waugh, ACTR, Schapire]

Restricting to Top 4

[Bar chart: axis -1.5 to 1, competitors: Pujara, RL3, Soton, Waugh]

Teach Simply!

EQUILIBRIUM FREE

[Diagram: “Learn” panels with pictorial equations; images not recoverable]

The High Level

• Phenomenal Intelligence: the observed behavior used by a set of people at a point in time for some task.

Lofty Goals


• behavior: a fully specified strategy.
• used: actually leveraged

Practical Concessions


• Not any intelligent agent
• Not any time (people change)
• Not any task (context matters)

Thank You

http://martin.zinkevich.org/lemonade
