
Page 1: CS 416 Artificial Intelligence

Lecture 21: Making Complex Decisions

Chapter 17

Page 2:

Game Theory

Multiagent games with simultaneous moves

• First, study games with one move:

– bankruptcy proceedings

– auctions

– economics

– war gaming

Page 3:

Definition of a game

• The players

• The actions

• The payoff matrix

– provides the utility to each player for each combination of actions

Two-finger Morra

Page 4:

Game theory strategies

Strategy == policy (as in policy iteration)

• What do you do?

– pure strategy: you do the same thing all the time

– mixed strategy: you rely on some randomized policy to select an action

• Strategy profile

– the assignment of strategies to players

Page 5:

Game theoretic solutions

What’s a solution to a game?

• All players select a “rational” strategy

• Note that we’re not analyzing one particular game, but the outcomes that accumulate over a series of played games

Page 6:

Prisoner’s Dilemma

Alice and Bob are caught red-handed at the scene of a crime

• both are interrogated separately by the police

• the penalty if they both testify against each other is 5 years for each

• the penalty if they both refuse to testify is 1 year for each

• if one testifies and the other doesn’t

– the one who testifies goes free (0 years)

– the one who refuses (stays silent) gets 10 years

What do you do to act selfishly?

Page 7:

Prisoner’s dilemma payoff matrix

Payoffs are (Alice, Bob), with years in prison as negative utility:

                 Bob: testify    Bob: refuse
Alice: testify     (-5, -5)       (0, -10)
Alice: refuse      (-10, 0)       (-1, -1)

Page 8:

Prisoner’s dilemma strategy

Alice’s strategy

• If Bob testifies

– best option is to testify (-5)

• If Bob refuses

– best option is to testify (0)

Testifying is a dominant strategy.


Page 9:

Prisoner’s dilemma strategy

Bob’s strategy

• If Alice testifies

– best option is to testify (-5)

• If Alice refuses

– best option is to testify (0)

Testifying is a dominant strategy.

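The dominant-strategy argument on the last two slides can be checked mechanically. A minimal sketch (the payoff dictionary reflects the matrix implied by the earlier slides; the helper name is mine):

```python
# Prisoner's dilemma payoffs as (Alice, Bob) utilities.
# "testify" = rat the other out; "refuse" = stay silent.
PAYOFF = {
    ("testify", "testify"): (-5, -5),
    ("testify", "refuse"):  (0, -10),
    ("refuse",  "testify"): (-10, 0),
    ("refuse",  "refuse"):  (-1, -1),
}
ACTIONS = ["testify", "refuse"]

def strongly_dominates(player, s, s_prime):
    """True if strategy s beats s_prime for `player` (0 = Alice, 1 = Bob)
    against every possible action of the opponent."""
    for opp in ACTIONS:
        profile_s = (s, opp) if player == 0 else (opp, s)
        profile_sp = (s_prime, opp) if player == 0 else (opp, s_prime)
        if PAYOFF[profile_s][player] <= PAYOFF[profile_sp][player]:
            return False
    return True

print(strongly_dominates(0, "testify", "refuse"))  # True for Alice
print(strongly_dominates(1, "testify", "refuse"))  # True for Bob
```

For both players, testifying strictly beats refusing against each opponent action, which is exactly the "dominant strategy" claim on the slides.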

Page 10:

Rationality

Both players seem to have clear strategies

• Both testify

– game outcome would be (-5, -5)

Page 11:

Dominance of strategies

Comparing strategies

• Strategy s can strongly dominate s′

– the outcome of s is always better than the outcome of s′, no matter what the other player does

– testifying strongly dominates refusing for both Bob and Alice

• Strategy s can weakly dominate s′

– the outcome of s is better than the outcome of s′ on at least one action of the opponent, and no worse on the others

Page 12:

Pareto Optimal

Pareto optimality comes from economics

• An outcome can be Pareto optimal

– textbook: no alternative outcome that all players would prefer

– I prefer: the best that could be accomplished without disadvantaging at least one group

Is the testify outcome (-5, -5) Pareto optimal?

Page 13:

Is (-5, -5) Pareto Optimal?

Is there an outcome that improves the result without disadvantaging any group?

How about (-1, -1) from (refuse, refuse)?
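The Pareto check can also be done in code: an outcome is Pareto dominated if some other outcome makes at least one player better off and nobody worse off. A sketch over the four prisoner's-dilemma outcomes (the function name is mine):

```python
# Outcomes of the prisoner's dilemma as (Alice, Bob) utilities.
OUTCOMES = [(-5, -5), (0, -10), (-10, 0), (-1, -1)]

def pareto_dominated(o, outcomes):
    """True if some other outcome makes at least one player strictly
    better off while making no player worse off."""
    return any(
        all(b >= a for a, b in zip(o, other))
        and any(b > a for a, b in zip(o, other))
        for other in outcomes
    )

print(pareto_dominated((-5, -5), OUTCOMES))  # True: (-1, -1) dominates it
print(pareto_dominated((-1, -1), OUTCOMES))  # False: it is Pareto optimal
```

So the equilibrium outcome (-5, -5) is not Pareto optimal, which is the point of the next two slides.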

Page 14:

Dominant strategy equilibrium

(-5, -5) represents a dominant strategy equilibrium

• neither player has an incentive to deviate from the dominant strategy

– if Alice assumes Bob keeps his current strategy, she will only lose more by switching

– likewise for Bob

• imagine this as a local optimum in outcome space

– each dimension of outcome space is a dimension of a player’s choice

– any movement away from the dominant strategy equilibrium in this space results in worse outcomes

Page 15:

Thus the dilemma…

Now we see the problem

• Outcome (-5, -5) is Pareto dominated by outcome (-1, -1)

– achieving the Pareto optimal outcome requires diverging from the local optimum at the strategy equilibrium

• Tough situation… Pareto optimal would be nice, but it is unlikely because each player risks losing more

Page 16:

Nash Equilibrium

John Nash studied game theory in the 1950s

• Proved that every game has an equilibrium

– if there is a set of strategies with the property that no player can benefit by changing her strategy while the other players keep theirs unchanged, then that set of strategies and the corresponding payoffs constitute a Nash equilibrium

• All dominant strategy equilibria are Nash equilibria
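For small matrix games, pure-strategy Nash equilibria can be found by brute force: a profile is an equilibrium if neither player gains by a unilateral deviation. A sketch (function name is mine; the payoffs are the prisoner's-dilemma values from the earlier slides):

```python
from itertools import product

def pure_nash(actions, payoff):
    """All pure-strategy Nash equilibria of a 2-player game.
    payoff[(a0, a1)] gives the (player0, player1) utilities."""
    eq = []
    for a0, a1 in product(actions, actions):
        u0, u1 = payoff[(a0, a1)]
        # No unilateral deviation may improve either player.
        best0 = all(payoff[(d, a1)][0] <= u0 for d in actions)
        best1 = all(payoff[(a0, d)][1] <= u1 for d in actions)
        if best0 and best1:
            eq.append((a0, a1))
    return eq

pd = {
    ("testify", "testify"): (-5, -5),
    ("testify", "refuse"):  (0, -10),
    ("refuse",  "testify"): (-10, 0),
    ("refuse",  "refuse"):  (-1, -1),
}
print(pure_nash(["testify", "refuse"], pd))  # [('testify', 'testify')]
```

Given the full Acme/Best payoff matrix from the next slides, the same function would return the two equilibria (DVD, DVD) and (CD, CD).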

Page 17:

Another game

• Acme: a hardware manufacturer, chooses between CD and DVD format for its next game platform

• Best: a software manufacturer, chooses between CD and DVD format for its next title

Page 18:

No dominant strategy

• Verify that there is no dominant strategy: each manufacturer’s best format depends on which format the other chooses

Page 19:

Yet two Nash equilibria exist

Outcome 1: (DVD, DVD) … (9, 9)

Outcome 2: (CD, CD) … (5, 5)

If either player unilaterally changes strategy, that player will be worse off.

Page 20:

We still have a problem

Two Nash equilibria, but which is selected?

• If the players fail to select the same strategy, both will lose

– they could “agree” to select the Pareto optimal solution

– that seems reasonable

– they could coordinate

Page 21:

Zero-sum games

Intro

• Payoffs in each cell of the payoff matrix sum to 0

• The Nash equilibrium in such cases may be a mixed strategy

Page 22:

Zero-sum games

Payoffs in each cell sum to zero

Two-finger Morra

• Two players (Odd and Even)

• Action

– each player simultaneously displays one or two fingers

• Evaluation

– f = total number of fingers

– if f is odd, Even gives f dollars to Odd

– if f is even, Odd gives f dollars to Even
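The evaluation rule translates directly into code. A minimal sketch (function name is mine) that derives Even's payoff matrix, which the next slides analyze:

```python
def even_payoff(e, o):
    """Even's winnings when Even shows e fingers and Odd shows o."""
    f = e + o
    # Odd total: Even pays f dollars; even total: Even collects f dollars.
    return -f if f % 2 == 1 else f

# Rows: Even shows 1 or 2 fingers; columns: Odd shows 1 or 2.
matrix = [[even_payoff(e, o) for o in (1, 2)] for e in (1, 2)]
print(matrix)  # [[2, -3], [-3, 4]]
```

Each cell is Even's payoff; Odd's is the negation, so every cell sums to zero as required.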

Page 23:

Optimal strategy

von Neumann (1928) developed the optimal mixed strategy for two-player, zero-sum games

• We need only keep track of what one player wins (because we then know what the other player loses)

– let’s pick the Even player

– assume this player wishes to maximize

• Maximin technique (note we studied minimax in Ch. 6)

– make the game a turn-taking game and analyze it

Page 24:

Maximin

Change the rules of Morra for analysis

• Force Even to reveal its strategy first

– apply the maximin algorithm

– Odd has an advantage, so the outcome is Even’s worst case; Even might do better in the real game

• The worst Even can do is to lose $3 in this modified game

Page 25:

Maximin

Force Odd to reveal its strategy first

– apply the minimax algorithm

– if Odd selects 1, Odd’s loss will be $2

– if Odd selects 2, Odd’s loss will be $4

– so Odd selects 1: Odd can guarantee losing no more than $2 in this modified game
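Both modified games can be checked in a few lines against Even's payoff matrix from the Morra rules ([[2, -3], [-3, 4]]); this is just the maximin/minimax computation the slides describe:

```python
# Even's payoffs: rows = Even shows 1 or 2 fingers, cols = Odd shows 1 or 2.
M = [[2, -3], [-3, 4]]

# Even reveals first: Odd then minimizes Even's payoff, so Even can
# guarantee only the max over rows of the row minimum (maximin).
even_first = max(min(row) for row in M)

# Odd reveals first: Even then maximizes, so Odd concedes the
# min over columns of the column maximum (minimax).
odd_first = min(max(M[r][c] for r in range(2)) for c in range(2))

print(even_first, odd_first)  # -3 2
```

This reproduces the slides' numbers: Even loses $3 when forced to move first, and Odd loses $2 (i.e., Even gains $2) when Odd moves first.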

Page 26:

Combining two games

Even’s combined utility

• Even’s winnings will be somewhere between

– the best case (MAX) in the game modified to its disadvantage

– the worst case (MIN) in the game modified to its advantage

• EvenFirst_Utility ≤ Even’s_Utility ≤ OddFirst_Utility

• -3 ≤ Even’s_Utility ≤ 2

Page 27:

Considering mixed strategies

• Mixed strategy

– select one finger with probability p

– select two fingers with probability 1 - p

• If one player reveals its strategy first, the second player will always use a pure strategy

– expected utility of a mixed response: U1 = p·u_one + (1 - p)·u_two

– expected utility of a pure response: U2 = max(u_one, u_two)

– U2 is always at least as large as U1 when your opponent reveals its action early
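The inequality U2 ≥ U1 is just the fact that a convex combination of two numbers can never exceed their maximum. A quick numeric sanity check, using Even's Morra payoffs when Odd has revealed one finger (u_one = 2, u_two = -3; the variable names mirror the slide):

```python
u_one, u_two = 2, -3  # Even's payoffs against a revealed Odd action

for p in [0.0, 0.25, 0.5, 0.75, 1.0]:
    U1 = p * u_one + (1 - p) * u_two  # mixed response
    U2 = max(u_one, u_two)            # pure response
    assert U2 >= U1                   # pure is never worse
print("pure response is always at least as good")
```

Equality holds only at p = 1 (or when the two payoffs coincide), which is why the second mover gains nothing from mixing.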

Page 28:

Modeling as a game tree

Because the second player will always use a pure strategy…

• Still pretending Even goes first

[game-tree figure omitted; the slide notes a typo in the book’s version of this figure]

Page 29:

What is outcome of this game?

Player Odd has a choice

• always pick the option that minimizes utility to Even

• represent Odd’s two choices as functions of p

• Odd picks whichever line is lower (the dark part of the figure)

• Even maximizes its utility by choosing p where the lines cross

– 5p - 3 = 4 - 7p

– p = 7/12  =>  E[utility] = -1/12
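The crossing-point calculation can be reproduced exactly with rational arithmetic (a quick check, not part of the original slides):

```python
from fractions import Fraction

# Even's expected payoff as a function of p (prob. Even shows one finger):
#   if Odd shows one:   2p - 3(1-p) = 5p - 3
#   if Odd shows two:  -3p + 4(1-p) = 4 - 7p
# The lines cross where 5p - 3 = 4 - 7p, i.e. 12p = 7.
p = Fraction(7, 12)
value = 5 * p - 3

print(p, value)  # 7/12 -1/12
```

Evaluating either line at p = 7/12 gives the same value, -1/12, confirming the slide's result.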

Page 30:

Pretend Odd must go first

Even’s outcome is decided by a pure strategy (dependent on q)

• Even will always pick the maximum of its two choices

• Odd will minimize the maximum of the two choices

– Odd chooses the intersection point

– 5q - 3 = 4 - 7q

– q = 7/12  =>  E[utility] = -1/12

Page 31:

Final results

Both players use the same mixed strategy

– p_one = 7/12

– p_two = 5/12

– the outcome of the game is -1/12 to Even

Page 32:

Generalization

Two players with nTwo players with n action choices action choices

• mixed strategy is not as simple as p, 1-pmixed strategy is not as simple as p, 1-p

– it is (pit is (p11, p, p22, …, p, …, pn-1n-1, 1-(p, 1-(p11+p+p22+…+p+…+pn-1n-1))))

• Solving for optimal Solving for optimal pp vector requires finding optimal point in (n-1)- vector requires finding optimal point in (n-1)-dimensional spacedimensional space

– lines become lines become hyperplaneshyperplanes

– some hyperplanes will be clearly worse for all some hyperplanes will be clearly worse for all pp

– find intersection among remaining hyperplanesfind intersection among remaining hyperplanes

– linear programming can solve this problemlinear programming can solve this problem

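As a concrete sketch of the linear-programming approach (assuming SciPy is available), the two-finger Morra game from earlier in the lecture can be solved as a maximin LP; it recovers p_one = 7/12 and game value -1/12 for Even:

```python
from scipy.optimize import linprog

# Payoff matrix to Even for two-finger Morra (rows/cols: show 1 or 2 fingers).
A = [[2, -3],
     [-3, 4]]

# Variables [p1, p2, v]: Even's mixed strategy plus the game value v.
# Maximize v subject to: for every Odd action j, sum_i p_i * A[i][j] >= v.
c = [0, 0, -1]                                       # linprog minimizes, so -v
A_ub = [[-A[0][j], -A[1][j], 1] for j in range(2)]   # v - sum_i p_i*A[i][j] <= 0
b_ub = [0, 0]
A_eq = [[1, 1, 0]]                                   # p1 + p2 = 1
b_eq = [1]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, 1), (0, 1), (None, None)])
p1, p2, v = res.x
print(round(p1, 4), round(p2, 4), round(v, 4))       # 0.5833 0.4167 -0.0833
```

The same formulation scales to n actions: the probability vector just grows, and each opponent action contributes one inequality (hyperplane).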

Page 33

Repeated games

Imagine the same game played multiple times

• payoffs accumulate for each player

• optimal strategy is a function of game history

– must select optimal action for each possible game history

• Strategies

– perpetual punishment

cross me once and I'll take us both down forever

– tit for tat

cross me once and I'll cross you the subsequent move

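Tit for tat is easy to express as a function of game history. The sketch below uses the standard iterated prisoner's dilemma payoffs (illustrative numbers, not from the lecture):

```python
# Each entry maps (my move, their move) -> (my payoff, their payoff).
PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
          ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

def tit_for_tat(history):
    """Cooperate on the first move; afterwards mirror the opponent's last move."""
    return 'C' if not history else history[-1][1]

def always_defect(history):
    return 'D'

def play(strat_a, strat_b, rounds=5):
    hist_a, hist_b = [], []           # each entry: (own move, opponent's move)
    score_a = score_b = 0
    for _ in range(rounds):
        a, b = strat_a(hist_a), strat_b(hist_b)
        pa, pb = PAYOFF[(a, b)]
        score_a, score_b = score_a + pa, score_b + pb
        hist_a.append((a, b))
        hist_b.append((b, a))
    return score_a, score_b

print(play(tit_for_tat, always_defect))   # (4, 9): exploited only on round 1
```

Against a defector, tit for tat loses only the opening round and then punishes every subsequent move; against itself, it cooperates forever.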

Page 34

The design of games

Let’s invert the strategy selection process to design fair/effective games

• Tragedy of the commons

– individual farmers bring their livestock to the town commons to graze

– commons is destroyed and all experience negative utility

– all behaved rationally – refraining would not have saved the commons, as someone else would eat it

Externalities are a way to place a value on changes in global utility

Power utilities pay for the utility they deprive neighboring communities (yet another Nobel prize in Econ for this – Coase (prof at UVa))


Page 35

Auctions

• English Auction

– auctioneer incrementally raises bid price until one bidder remains

bidder gets the item at the highest price of another bidder plus the increment (perhaps the highest bidder would have spent more?)

strategy is simple… keep bidding until price is higher than utility

strategy of other bidders is irrelevant

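A minimal simulation of that dominant strategy (bidder names and private values are hypothetical): each bidder stays in exactly while the standing price is within their private value.

```python
def english_auction(values, increment=1):
    """Raise the price until one bidder remains; each bidder stays in
    while the standing price is still within their private value."""
    price = 0
    active = dict(values)                    # bidder -> private value
    while len(active) > 1:
        price += increment
        active = {b: v for b, v in active.items() if v >= price}
    remaining = active or values             # everyone may drop at the same step
    winner = max(remaining, key=remaining.get)
    return winner, price

print(english_auction({'A': 10, 'B': 7, 'C': 12}))   # ('C', 11)
```

The winner pays roughly the runner-up's value plus one increment, matching the observation above that the highest bidder might have been willing to spend more.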

Page 36

Auctions

• Sealed bid auction

– place your bid in an envelope and highest bid is selected

say your highest bid is v

say you believe the highest competing bid is b

bid min(v, b + ε)

player with the highest value on the good may not win the good, and players must contemplate other players’ values


Page 37

Auctions

• Vickrey Auction (a sealed-bid auction)

– Winner pays the price of the second-highest bid

– Dominant strategy is to bid what the item is worth to you

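The payment rule fits in a few lines (bidder names and bids below are made up):

```python
def vickrey_winner(bids):
    """Second-price sealed-bid auction: the highest bidder wins,
    but pays the second-highest bid."""
    ranked = sorted(bids, key=bids.get, reverse=True)
    winner = ranked[0]
    price = bids[ranked[1]] if len(ranked) > 1 else bids[winner]
    return winner, price

print(vickrey_winner({'A': 10, 'B': 7, 'C': 12}))   # ('C', 10)
```

Because the price paid does not depend on the winner's own bid, shading your bid below your true value can only lose you auctions you wanted to win.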

Page 38

Auctions

• These auction algorithms can find their way into computer-controlled systems

– Networking

Routers

Ethernet

– Thermostat control in offices (Xerox PARC)


Page 39

Neural Networks

Read Section 20.5

Small program and homework assignment


Page 40

Model of Neurons

• Multiple inputs/dendrites (~10,000!!!)

• Cell body/soma performs computation

• Single output/axon

• Computation is typically modeled as linear

– a change Δ in input corresponds to a change kΔ in output (not kΔ² or sin Δ, …)


Page 41

Early History of Neural Nets

Eons ago: Neurons are invented

• 1868: J. C. Maxwell studies feedback mechanisms

• 1943: McCulloch-Pitts neurons

• 1949: Hebb indicates biological mechanism

• 1962: Rosenblatt’s Perceptron

• 1969: Minsky and Papert decompose perceptrons


Page 42

McCulloch-Pitts Neurons

• One or two inputs to the neuron

• Inputs are multiplied by weights

• If the sum of the products exceeds a threshold, the neuron fires

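A McCulloch-Pitts unit is small enough to sketch directly; the weights and threshold below are illustrative choices that make the unit compute AND:

```python
def mcp_neuron(inputs, weights, threshold):
    """McCulloch-Pitts unit: output 1 iff the weighted sum exceeds the threshold."""
    return 1 if sum(x * w for x, w in zip(inputs, weights)) > threshold else 0

# With weights (1, 1) and threshold 1.5, the unit fires only when both inputs are 1.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, '->', mcp_neuron((x1, x2), (1, 1), 1.5))
# 0 0 -> 0
# 0 1 -> 0
# 1 0 -> 0
# 1 1 -> 1
```

Changing the threshold to 0.5 turns the same unit into OR, which is the sense in which these units can model logic gates.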

Page 43

What can we model with these?

[Figure: McCulloch-Pitts units wired as logic gates; the version shown uses the values -0.5 and -1, and the slide flags an error in the book's figure.]

Page 44

Perceptrons

• Each input is binary and has a weight associated with it

• The inner product of the input and weight vectors is calculated

• If this sum exceeds a threshold, the perceptron fires


Page 45

Neuron thresholds (activation functions)

• It is desirable to have a differentiable activation function for automatic weight adjustment


http://www.csulb.edu/~cwallis/artificialn/History.htm
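For example, the logistic sigmoid is a common differentiable replacement for the hard threshold (a generic sketch, not tied to the linked figure); its derivative is conveniently expressible in terms of the function itself:

```python
import math

def sigmoid(z):
    """Smooth, differentiable alternative to a hard threshold."""
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    """Derivative: sigmoid(z) * (1 - sigmoid(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

print(sigmoid(0.0), sigmoid_prime(0.0))   # 0.5 0.25
```

The derivative is what gradient-based weight-adjustment rules need; a step-function threshold has no useful gradient.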

Page 46

Hebbian Modification

“When an axon of cell A is near enough to excite cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A’s efficiency, as one of the cells firing B, is increased”

from Hebb’s 1949 The Organization of Behavior, p. 62

Page 47

Error Correction

w_i ← w_i + c (t - o) x_i    (c: learning-rate constant; t: target output; o: actual output)

Only updates weights for non-zero inputs

For positive inputs

• If the perceptron should have fired but did not, the weight is increased

• If the perceptron fired but should not have, the weight is decreased


Page 48

Perceptron Example

• Example modified from “The Essence of Artificial Intelligence” by Alison Cawsey

• Initialize all weights to 0.2

• Let epsilon = 0.05 and threshold = 0.5


Name     Had 4.0   Male   Studious   Drinker   Gets 4.0
Richard     1        1        0          1         0
Alan        1        1        1          0         1
Alison      0        0        1          0         0
Jeff        0        1        0          1         0
Gail        1        0        1          1         1
Simon       0        1        1          1         0

Weights 0.2 0.2 0.2 0.2

Page 49

Perceptron Example

• First output is 1 since 0.2 + 0.2 + 0.2 > 0.5

• Should be 0, so weights with active connections are decremented by 0.05


Name     Had 4.0   Male   Studious   Drinker   Gets 4.0
Richard     1        1        0          1         0
Alan        1        1        1          0         1
Alison      0        0        1          0         0
Jeff        0        1        0          1         0
Gail        1        0        1          1         1
Simon       0        1        1          1         0

Old w   0.2    0.2    0.2   0.2
New w   0.15   0.15   0.2   0.15

Page 50

Perceptron Example

• Next output is 0 since 0.15 + 0.15 + 0.2 <= 0.5

• Should be 1, so weights with active connections are incremented by 0.05

• New weights work for Alison, Jeff, and Gail


Name     Had 4.0   Male   Studious   Drinker   Gets 4.0
Richard     1        1        0          1         0
Alan        1        1        1          0         1
Alison      0        0        1          0         0
Jeff        0        1        0          1         0
Gail        1        0        1          1         1
Simon       0        1        1          1         0

Old w   0.15   0.15   0.2    0.15
New w   0.2    0.2    0.25   0.15

Page 51

Perceptron Example

• Output for Simon is 1 (0.2 + 0.25 + 0.15 > 0.5)

• Should be 0, so weights with active connections are decremented by 0.05

• Are we finished?


Name     Had 4.0   Male   Studious   Drinker   Gets 4.0
Richard     1        1        0          1         0
Alan        1        1        1          0         1
Alison      0        0        1          0         0
Jeff        0        1        0          1         0
Gail        1        0        1          1         1
Simon       0        1        1          1         0

Old w   0.2    0.2    0.25   0.15
New w   0.2    0.15   0.2    0.1

Page 52

Perceptron Example

• After processing all the examples again we get weights that work for all examples

• What do these weights mean?

• In general, how often should we reprocess?


Name     Had 4.0   Male   Studious   Drinker   Gets 4.0
Richard     1        1        0          1         0
Alan        1        1        1          0         1
Alison      0        0        1          0         0
Jeff        0        1        0          1         0
Gail        1        0        1          1         1
Simon       0        1        1          1         0

Weights 0.25 0.1 0.2 0.1
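The whole worked example can be replayed in a few lines (sums are rounded to sidestep floating-point noise at the threshold); it reproduces the weight sequence shown on the preceding slides and converges on the third pass:

```python
# Data from the lecture's example (after Cawsey): inputs are
# Had 4.0, Male, Studious, Drinker; the target is Gets 4.0.
data = [
    ("Richard", (1, 1, 0, 1), 0),
    ("Alan",    (1, 1, 1, 0), 1),
    ("Alison",  (0, 0, 1, 0), 0),
    ("Jeff",    (0, 1, 0, 1), 0),
    ("Gail",    (1, 0, 1, 1), 1),
    ("Simon",   (0, 1, 1, 1), 0),
]
w = [0.2, 0.2, 0.2, 0.2]
eps, theta = 0.05, 0.5

def fires(x, w):
    # Round to avoid 0.5000000000000001-style float artifacts at the threshold.
    return 1 if round(sum(xi * wi for xi, wi in zip(x, w)), 6) > theta else 0

changed = True
while changed:                     # repeat passes until a full pass is clean
    changed = False
    for _, x, target in data:
        o = fires(x, w)
        if o != target:
            changed = True
            for i, xi in enumerate(x):
                if xi:             # only active connections are adjusted
                    w[i] = round(w[i] + eps * (target - o), 6)

print(w)   # [0.25, 0.1, 0.2, 0.1]
```

The `(target - o)` factor is +1 when the perceptron should have fired but did not, and -1 in the opposite case, matching the error-correction rule described earlier.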

Page 53

Perceptrons are linear classifiers

Consider a two-input neuron

• Two weights are “tuned” to fit the data

• The neuron uses the equation w1*x1 + w2*x2 to fire or not

– This is like the equation of a line, m*x + b - y


http://www.compapp.dcu.ie/~humphrys/Notes/Neural/single.neural.html

Page 54

Linearly separable

These single-layer perceptron networks can classify linearly separable systems

Page 55

For homework

Consider a system like XOR

x1   x2   x1 XOR x2
1    1    0
0    1    1
1    0    1
0    0    0


Page 56

Class Exercise

• Find w1, w2, and theta such that Theta(x1*w1 + x2*w2) = x1 xor x2

• Or, prove that it can’t be done

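One way to attack the exercise is brute force. The grid below is an arbitrary choice of candidate weights and thresholds, and the search coming back empty is consistent with the classic result that XOR is not linearly separable:

```python
from itertools import product

def theta_unit(x1, x2, w1, w2, theta):
    """Threshold unit: fire (1) iff x1*w1 + x2*w2 > theta."""
    return 1 if x1 * w1 + x2 * w2 > theta else 0

# Search a coarse grid of (w1, w2, theta) for a single unit computing XOR.
grid = [i / 4 for i in range(-8, 9)]     # -2.0 ... 2.0 in steps of 0.25
solutions = [
    (w1, w2, t)
    for w1, w2, t in product(grid, repeat=3)
    if all(theta_unit(x1, x2, w1, w2, t) == (x1 ^ x2)
           for x1, x2 in product((0, 1), repeat=2))
]
print(solutions)   # [] -- no single threshold unit on this grid computes XOR
```

The empty result is not a proof on its own, but the proof is short: the four XOR constraints force theta >= 0, w1 > theta, w2 > theta, and w1 + w2 <= theta, which is contradictory.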

Page 57

2nd Class Exercise

• x3 = ~x1, x4 = ~x2

• Find w1, w2, w3, w4, and theta such that Theta(x1*w1 + x2*w2 + x3*w3 + x4*w4) = x1 xor x2

• Or, prove that it can’t be done


Page 58

3rd Class Exercise

• Find w1, w2, and f() such that f(x1*w1 + x2*w2) = x1 xor x2

• Or, prove that it can’t be done
