Artificial Intelligence
Roman Barták, Department of Theoretical Computer Science and Mathematical Logic
Decisions with multiple agents
What if the uncertainty is due to other agents and the decisions they make? And what if the decisions of those agents are in turn influenced by our decisions?
• agent design: game theory can analyze the agent's decisions and compute the expected utility for each decision (under the assumption that other agents are acting optimally according to game theory)
• mechanism design: inverse game theory makes it possible to define the rules of the environment so that the collective good of all agents is maximized (when each agent adopts the game-theoretic solution that maximizes its own utility)
Single-move games
Consider a restricted set of games, where all players take action simultaneously and the result of the game is based on this single set of actions
– what matters is that no player has knowledge of the other players' choices
A single-move game is defined by three components:
– players (or agents), like O (odd) and E (even)
– actions that the players can choose, like one or two fingers
– a payoff function that gives the utility to each player for each combination of actions by all players; the payoff matrix for two-finger Morra is as follows:
          O: one        O: two
E: one    E=+2, O=-2    E=-3, O=+3
E: two    E=-3, O=+3    E=+4, O=-4
Single-move games: solution and strategy
Each player in a game must adopt and then execute a strategy (policy).
– a pure strategy is a deterministic policy; for a single-move game, it is just a single action
– a mixed strategy is a randomized policy that selects actions according to a probability distribution; for two actions, it is written [p, a; (1-p), b]
The game's outcome is a numeric value for each player. A solution to a game is a strategy profile (an assignment of a strategy to each player) in which each player adopts a rational strategy.
– What does "rational" mean when each agent chooses only part of the strategy profile that determines the outcome?
Prisoner's dilemma
Consider the following story:
– Two alleged burglars, Alice and Bob, are caught red-handed near the scene of a burglary and are interrogated separately.
– A prosecutor offers each a deal: if you testify against your partner as the leader of a burglary ring, you will go free while your partner will serve 10 years in prison.
– However, if you both testify against each other, you will both get 5 years.
– If you both refuse to testify, you will serve only 1 year each for the lesser charge of possessing stolen property.
Should they testify or refuse?
– The rational decision is to testify.
               Alice: testify    Alice: refuse
Bob: testify   A=-5, B=-5        A=-10, B=0
Bob: refuse    A=0, B=-10        A=-1, B=-1
Dominance
Testify is a dominant strategy for the prisoner's dilemma.
– a strategy s for player p strongly dominates strategy s' if the outcome for s is better for p than the outcome for s', for every choice of strategies by the other player(s)
– a strategy s weakly dominates s' if s is better on at least one strategy profile and no worse on any other
It is irrational to play a dominated strategy, and irrational not to play a dominant strategy if one exists.
When each player has a dominant strategy, the combination of those strategies is called a dominant strategy equilibrium.
A strategy profile forms an equilibrium if no player can benefit by switching strategies, given that every other player sticks with the same strategy. Every game has at least one equilibrium of this kind, called a Nash equilibrium (possibly in mixed strategies).
The outcome of a game is Pareto optimal if there is no other outcome that all players would prefer.
– An outcome is Pareto dominated by another outcome if all players would prefer the other outcome.
The dilemma in the prisoner's dilemma is that the dominant strategy equilibrium (testify, testify) is Pareto dominated by the outcome (refuse, refuse).
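The dominance and Pareto definitions can be checked mechanically on the prisoner's dilemma payoff table. The following is a minimal sketch, not part of the original slides; the names `PAYOFF`, `utility`, `strongly_dominates` and `pareto_dominates` are illustrative assumptions.

```python
ACTIONS = ("testify", "refuse")

# PAYOFF[(alice_action, bob_action)] = (alice_utility, bob_utility)
PAYOFF = {
    ("testify", "testify"): (-5, -5),
    ("refuse", "testify"): (-10, 0),
    ("testify", "refuse"): (0, -10),
    ("refuse", "refuse"): (-1, -1),
}


def utility(player, own, other):
    """Utility of player 0 (Alice) or 1 (Bob) for its own action and the other's."""
    profile = (own, other) if player == 0 else (other, own)
    return PAYOFF[profile][player]


def strongly_dominates(player, s, s_prime):
    """s is better than s_prime for every choice of the other player."""
    return all(utility(player, s, o) > utility(player, s_prime, o) for o in ACTIONS)


def pareto_dominates(outcome, other_outcome):
    """All players prefer `outcome` to `other_outcome`."""
    return all(u > v for u, v in zip(PAYOFF[outcome], PAYOFF[other_outcome]))


# Dominant strategy of each player; next() would fail for a game without one.
dominant = [
    next(s for s in ACTIONS
         if all(strongly_dominates(player, s, t) for t in ACTIONS if t != s))
    for player in (0, 1)
]
equilibrium = tuple(dominant)

print(equilibrium)
print(pareto_dominates(("refuse", "refuse"), equilibrium))
```

Both players end up with testify as their dominant strategy, yet the check on the last line confirms that both would prefer (refuse, refuse).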
No dominant strategy
Consider the following game
– Acme, a video game console manufacturer, has to decide whether its next game machine will use Blu-ray discs or DVDs.
– Meanwhile, the video game software producer Best needs to decide whether to produce its next game on Blu-ray or DVD.
– The profits of both will be positive if they agree and negative if they disagree.
There is no dominant strategy equilibrium for this game, but there are two Nash equilibria. There are multiple acceptable solutions, but if each agent aims for a different one, then both agents will suffer. How can they agree on a solution?
Both should choose the Pareto-optimal Nash equilibrium, provided that one exists; (bluray, bluray) is the Pareto-optimal solution.
What if there are multiple such solutions (for example, if (bluray, bluray) had payoff (5, 5))?
• agents can either guess or communicate
• coordination games (games in which players need to communicate)
               Acme: bluray      Acme: dvd
Best: bluray   A=+9, B=+9        A=-4, B=-1
Best: dvd      A=-3, B=-1        A=+5, B=+5
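The claims about this game (no dominant strategy, two Nash equilibria, one of them Pareto optimal) can be verified by enumerating all four strategy profiles. This is a small sketch under the payoff table above; the helper names are assumptions made for the example.

```python
from itertools import product

ACTIONS = ("bluray", "dvd")
PLAYERS = (0, 1)  # 0 = Acme, 1 = Best

# PAYOFF[(acme_action, best_action)] = (acme_utility, best_utility)
PAYOFF = {
    ("bluray", "bluray"): (9, 9),
    ("dvd", "bluray"): (-4, -1),
    ("bluray", "dvd"): (-3, -1),
    ("dvd", "dvd"): (5, 5),
}


def profile_of(player, own, other):
    """Build an (acme, best) profile from one player's point of view."""
    return (own, other) if player == 0 else (other, own)


def has_dominant_strategy(player):
    """True if one action is strictly best whatever the other player does."""
    return any(
        all(
            PAYOFF[profile_of(player, s, o)][player]
            > PAYOFF[profile_of(player, r, o)][player]
            for r in ACTIONS if r != s
            for o in ACTIONS
        )
        for s in ACTIONS
    )


def is_nash(profile):
    """No player gains by switching while the other sticks to the profile."""
    for player in PLAYERS:
        other = profile[1 - player]
        current = PAYOFF[profile][player]
        if any(PAYOFF[profile_of(player, alt, other)][player] > current
               for alt in ACTIONS):
            return False
    return True


nash = [p for p in product(ACTIONS, repeat=2) if is_nash(p)]
# Nash equilibria that no other Nash equilibrium Pareto dominates.
pareto_best = [
    p for p in nash
    if not any(all(PAYOFF[q][k] > PAYOFF[p][k] for k in PLAYERS) for q in nash)
]

print(nash)
print(pareto_best)
```

If (bluray, bluray) paid (5, 5) instead, `pareto_best` would contain both equilibria, which is exactly the coordination problem raised on the slide.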
Mixed strategies
Consider the two-finger Morra game
– no pure-strategy equilibrium exists
• if the total number of fingers is even, then O will want to switch
• if the total is odd, then E will want to switch
– we must look for mixed strategies instead
Von Neumann developed a method for finding the optimal mixed strategy for two-player, zero-sum games (games in which the sum of the payoffs is always zero).
– the maximin technique
– we need to consider the payoffs of only one player (E)
Maximin technique (pure strategies)
Suppose we change the rules as follows:
– First E picks her strategy and reveals it to O. Then O picks his strategy, with knowledge of E's strategy.
• This gives a turn-taking game to which we can apply the standard minimax algorithm.
• Clearly, this game favors O, so we get a lower bound for the true utility for E (-3).
– Now suppose we change the rules to force O to reveal his strategy first, followed by E.
• This gives an upper bound for the true utility of E (2).
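The two bounds follow directly from the Morra payoff matrix. A minimal sketch (the dictionary layout `E_PAYOFF` is an assumption of this example) computes both by one level of minimax:

```python
# E_PAYOFF[e_action][o_action] is the payoff to E; O receives the negation.
E_PAYOFF = {
    "one": {"one": 2, "two": -3},
    "two": {"one": -3, "two": 4},
}
ACTIONS = ("one", "two")

# E reveals first: O answers with the action that is worst for E.
lower_bound = max(min(E_PAYOFF[e][o] for o in ACTIONS) for e in ACTIONS)

# O reveals first: E answers with the action that is best for E.
upper_bound = min(max(E_PAYOFF[e][o] for e in ACTIONS) for o in ACTIONS)

print(lower_bound, upper_bound)
```

The gap between the two values shows that the order of revealing matters for pure strategies, which motivates the move to mixed strategies.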
Maximin technique (mixed strategies)
We need to turn our analysis to mixed strategies [p, one; (1-p), two]
Once the first player has revealed his or her strategy, the second player might as well choose a pure strategy.
What is the value of p to get the best utility for E?
– p = 7/12 and the payoff is -1/12
What is the value of q to get the best utility for O?
– q = 7/12 and the payoff is -1/12
The optimal strategy for both players is [7/12, one; 5/12, two]
– maximin equilibrium (it is also a Nash equilibrium)
– the two-finger Morra game favors the player O
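The values p = q = 7/12 come from making the opponent indifferent between his two pure replies. If E plays [p, one; (1-p), two], E's expected payoff is 2p - 3(1-p) when O answers one and -3p + 4(1-p) when O answers two; the best p equalizes the two. The sketch below solves this for a general 2x2 zero-sum game with exact fractions. The function name and the closed-form formula are assumptions of this example, and the formula is valid only when the game has no pure-strategy equilibrium.

```python
from fractions import Fraction


def solve_2x2_zero_sum(a, b, c, d):
    """Mixed maximin solution for the row player's payoff matrix [[a, b], [c, d]].

    Returns (p, q, value): p is the probability of the first row, q the
    probability of the first column. Assumes no pure-strategy equilibrium.
    """
    denom = Fraction(a - b - c + d)
    p = (d - c) / denom          # makes the column player indifferent
    q = (d - b) / denom          # makes the row player indifferent
    value = a * p + c * (1 - p)  # row player's expected payoff
    return p, q, value


# Rows: E plays one / two. Columns: O plays one / two. Entries: payoff to E.
p, q, value = solve_2x2_zero_sum(2, -3, -3, 4)
print(p, q, value)
```

The negative value is the payoff to E, so the game is slightly favorable to O.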
Repeated games
What if the same game is repeated more times?
A repeated game is the simplest kind of a multiple-move game:
• players face the same choice repeatedly, but each time with knowledge of the history of all players' previous choices
• payoffs are additive over time
Strategies for repeated games
The repeated version of the prisoner's dilemma:
1. the same players play 100 rounds
– the rational strategy is still to testify: the last round is not a repeated game, so both players testify in it; round 99 then cannot influence later play either, etc.
– this earns a total jail sentence of 500 years each
2. 99% chance that the players meet again
– the expected number of rounds is still 100, but neither player knows for sure which round will be the last
– perpetual punishment strategy: each player refuses unless the other player has ever played testify
– the expected future payoff is $\sum_{t=0}^{\infty} 0.99^t \cdot (-1) = -100$ if both players adopt this strategy
– a player who deviates from the strategy and chooses testify will gain a score of 0 in that round, but then both players will play testify and the total expected future payoff becomes $0 + \sum_{t=1}^{\infty} 0.99^t \cdot (-5) = -495$
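Both expected payoffs are geometric series and can be checked exactly. A short sketch, where `discounted_sum` is a helper name assumed for this example:

```python
from fractions import Fraction


def discounted_sum(per_round, continue_prob, first_round=0):
    """Closed form of sum_{t=first_round}^{inf} continue_prob**t * per_round."""
    return per_round * continue_prob ** first_round / (1 - continue_prob)


delta = Fraction(99, 100)  # probability that the players meet again

# Both players follow perpetual punishment and keep refusing: -1 per round.
stay = discounted_sum(-1, delta)

# One player testifies now (payoff 0), then both testify forever: -5 per round.
deviate = 0 + discounted_sum(-5, delta, first_round=1)

print(stay, deviate)
```

Since the payoff for staying is higher than the payoff for deviating, neither player has an incentive to be the first to testify.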
A famous strategy is called tit-for-tat:
• starting with refuse and then echoing the other player's previous move on all subsequent moves
• highly robust and effective against a wide variety of strategies
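A small simulation shows how tit-for-tat behaves against itself and against a player who always testifies. This is a sketch under the payoff table of the prisoner's dilemma; the strategy function signatures and the `play` helper are assumptions of this example.

```python
# PAYOFF[(my_move, other_move)] = my payoff (negative years in prison)
PAYOFF = {
    ("testify", "testify"): -5,
    ("testify", "refuse"): 0,
    ("refuse", "testify"): -10,
    ("refuse", "refuse"): -1,
}


def tit_for_tat(my_history, other_history):
    """Start with refuse, then echo the other player's previous move."""
    return other_history[-1] if other_history else "refuse"


def always_testify(my_history, other_history):
    return "testify"


def play(strategy_a, strategy_b, rounds):
    """Play a fixed number of rounds and return the additive payoffs."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        a = strategy_a(hist_a, hist_b)
        b = strategy_b(hist_b, hist_a)
        hist_a.append(a)
        hist_b.append(b)
        score_a += PAYOFF[(a, b)]
        score_b += PAYOFF[(b, a)]
    return score_a, score_b


print(play(tit_for_tat, tit_for_tat, 100))
print(play(tit_for_tat, always_testify, 100))
print(play(always_testify, always_testify, 100))
```

Against an always-testifying opponent, tit-for-tat loses only the first round and then matches the opponent, while two tit-for-tat players cooperate throughout.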
Mechanism design
So far we focused on the question "Given a game, what is a rational strategy?"
What if we ask "Given that agents pick rational strategies, what game should we design?"
We would like to design a game whose solutions, consisting of each agent pursuing its own rational strategy, result in the maximization of some global utility function.
This is called mechanism design or sometimes inverse game theory. It is used in economics and political science. In general it allows us to construct smart systems out of a collection of more limited (even uncooperative) systems.
A mechanism consists of:
– a language for describing the set of allowable strategies that agents may adopt,
– a distinguished agent, the center, that collects reports of strategy choices from agents in the game,
– an outcome rule, known to all agents, that the center uses to determine the payoffs of each agent given their strategy choices
Auctions
An auction is a mechanism for selling some goods to members of a pool of bidders.
For simplicity, we concentrate on auctions with a single item for sale.
Each bidder i has a utility value $v_i$ for having the item.
• In some cases, each bidder has a private value for the item.
– An old piece of furniture has a different value for a furniture collector than for a young family.
• In other cases, the item has a common value, but there is uncertainty as to what the actual value is.
– Different bidders have different information and hence different estimates of the item's true value.
Auction mechanism
– each bidder gets a chance to make a bid $b_i$
– the highest bid $b_{max}$ wins the item, but the price paid need not be $b_{max}$ (part of mechanism design)
English auction
The best-known auction mechanism is the ascending-bid, or English, auction.
– The center starts by asking for a minimum (or reserve) bid $b_{min}$.
– If some bidder is willing to pay that amount, the center then asks for $b_{min} + d$, for some increment $d$, and continues up from there.
– The auction ends when nobody is willing to bid anymore.
– Then the last bidder wins the item, paying the price he bid.
How do we know if this is a good mechanism?
– one goal is to maximize expected revenue for the seller; another goal is to maximize a notion of global utility
– we say an auction is efficient if the goods go to the agent who values them most
The English auction is usually both efficient and revenue maximizing if there is
– a sufficient number of bidders to enter the game
– no collusion (an unfair or illegal agreement by two or more bidders to manipulate prices)
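The ascending-bid procedure can be sketched as a simple loop. The bidder behavior used here (a bidder accepts the asked price whenever it does not exceed his private value, and the current leader does not bid against himself) is an assumption of this example, as are the function and parameter names.

```python
def english_auction(values, b_min, d):
    """Simulate an ascending-bid auction.

    values: dict mapping bidder name -> private value v_i
    b_min:  reserve bid asked first by the center
    d:      increment added after each accepted bid
    Returns (winner, price_paid), or (None, None) if nobody meets the reserve.
    """
    ask = b_min
    winner, paid = None, None
    while True:
        # Bidders other than the current leader who accept the asked price.
        willing = [i for i, v in values.items() if i != winner and v >= ask]
        if not willing:
            return winner, paid
        winner, paid = willing[0], ask
        ask += d


print(english_auction({"a": 10, "b": 7, "c": 4}, b_min=1, d=1))
print(english_auction({"a": 10}, b_min=3, d=1))
```

In the first call the bidder with the highest value wins (the auction is efficient) and pays about the second-highest value. In the second call a lone bidder wins at the reserve price, which illustrates why a sufficient number of bidders matters for revenue.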
Collusion
An unfair or illegal agreement by two or more bidders to manipulate prices. It can happen in secret backroom deals or tacitly, within the rules of the mechanism.
Example of price manipulation within the rules of the mechanism
– In 1999, Germany auctioned ten blocks of cell-phone spectrum with a simultaneous auction (bids were taken on all ten blocks at the same time) using the rule that any bid must be a minimum of a 10% raise over the previous bid on a block.
– There were only two credible bidders, Mannesman and T-Mobile.
– Mannesman entered the bid of 20 million DEM on blocks 1-5 and 18.18 million DEM on blocks 6-10.
– T-Mobile interpreted Mannesman's first bid as an offer: both parties could compute that a 10% raise on 18.18M is 19.99M. Mannesman's bid was interpreted as the offer "we can each get half of the blocks for 20M".
What to do with it?
– a higher reserve price
– a sealed-bid first-price auction
– bring in a third bidder
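The signal hidden in the 18.18M bid is plain arithmetic: the smallest legal raise on it lands just under 20M, the price already bid on the other five blocks. A quick check with exact decimals (variable names are assumptions of this example):

```python
from decimal import Decimal

previous_bid = Decimal("18.18")  # million DEM on blocks 6-10
min_raise = Decimal("0.10")      # any new bid must be at least 10% higher

next_bid = previous_bid * (1 + min_raise)
print(next_bid)  # just under 20, quoted on the slide as 19.99M
```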
Truth-revealing
In general, both the seller and the global utility function benefit if there are more bidders.
One way to encourage more bidders is to make the mechanism easier for them.
It is desirable that the bidders have a dominant strategy, a strategy that works against all other strategies.
– an agent with a dominant strategy can just bid, without wasting time contemplating the other agents' possible strategies
Usually such a strategy involves the bidders revealing their true value $v_i$; then it is called a truth-revealing, or truthful, auction.
Properties of English auction
The English auction has most of the desirable properties:
– bidders have a simple dominant strategy: keep bidding as long as the current cost is below your $v_i$
– this is not quite truth-revealing, because the winning bidder reveals only that his $v_i \geq b_0 + d$, where $b_0$ is the highest bid among the other bidders (we know only a lower bound on $v_i$)
Some disadvantages of the English auction:
– if there is one clearly stronger bidder who can always bid higher than any other bidder, then the competitors may not enter at all, and the strong bidder ends up winning at the reserve price (this discourages competition)
– high communication costs, as either the auction takes place in one room or all bidders have to have high-speed, secure communication lines
Sealed-bid auction
An alternative mechanism is the sealed-bid auction.
– each bidder makes a single bid and communicates it to the auctioneer without the other bidders seeing it
– the highest bid wins
There is no longer a simple dominant strategy
– the bid depends on the expected bids of the other agents
– let $v_i$ be your utility value and $b_0$ be the expected maximum of all the other agents' bids
– then you should bid $b_0 + \varepsilon$ (for some small $\varepsilon$), if that is less than $v_i$
Note that the agent with the highest $v_i$ might not win the auction, reducing the bias toward an advantaged bidder (the auction is more competitive).
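The bidding rule and its consequence (the bidder with the highest value may lose) can be illustrated with two bidders whose expectations differ. The numbers, the function names, and the choice to submit no bid when $b_0 + \varepsilon$ is not below $v_i$ are assumptions of this sketch; the slide does not say what to do in that case.

```python
def first_price_bid(v_i, expected_best_other, eps):
    """Bid slightly above the expected best competing bid, if that is below v_i.

    Returns None (no bid) otherwise; this fallback is an assumption.
    """
    bid = expected_best_other + eps
    return bid if bid < v_i else None


def sealed_bid_first_price(bids):
    """The highest bid wins and the winner pays his own bid."""
    winner = max(bids, key=bids.get)
    return winner, bids[winner]


# Bidder "a" values the item more but underestimates the competition.
bids = {
    "a": first_price_bid(v_i=10, expected_best_other=5, eps=0.5),
    "b": first_price_bid(v_i=8, expected_best_other=6, eps=0.5),
}
print(bids)
print(sealed_bid_first_price(bids))
```

Here bidder b wins although bidder a has the higher value, so the outcome is not efficient in the sense defined for auctions.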
Sealed-bid second-price auction
A small change in the mechanism for sealed-bid auctions produces the sealed-bid second-price auction, also known as a Vickrey auction.
– The winner pays the price of the second-highest bid, $b_0$, rather than paying his own bid.
– The dominant strategy is now simply to bid $v_i$; the mechanism is truth-revealing.
Why is this a dominant strategy?
The utility of agent i in terms of his bid $b_i$, his value $v_i$, and the best bid among the other agents $b_0$ is $(v_i - b_0)$ if $b_i > b_0$, otherwise 0
• when $(v_i - b_0) > 0$, then any bid that wins the auction is optimal, and bidding $v_i$ in particular wins the auction
• when $(v_i - b_0) < 0$, then any bid that loses the auction is optimal, and bidding $v_i$ in particular loses the auction
• so bidding $v_i$ is optimal for all possible values of $b_0$, and in fact, $v_i$ is the only bid that has this property
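The dominance argument can be confirmed by brute force over a grid of bids. In this sketch the value $v_i = 10$, the integer grid of own bids, and the half-step grid of competing bids are assumptions chosen so that competing bids can fall strictly between two candidate bids.

```python
def utility(b_i, v_i, b_0):
    """Second-price utility: agent i wins iff b_i > b_0 and then pays b_0."""
    return v_i - b_0 if b_i > b_0 else 0


v_i = 10
own_bids = range(0, 21)                      # candidate bids 0, 1, ..., 20
other_bids = [k / 2 for k in range(0, 41)]   # best competing bid 0, 0.5, ..., 20

# Bids that are optimal whatever the best competing bid turns out to be.
always_optimal = [
    b for b in own_bids
    if all(
        utility(b, v_i, b_0) >= max(utility(x, v_i, b_0) for x in own_bids)
        for b_0 in other_bids
    )
]
print(always_optimal)

# Overbidding can win at a loss; underbidding can forfeit a profitable win.
print(utility(15, v_i, 12), utility(v_i, v_i, 12))
print(utility(5, v_i, 7), utility(v_i, v_i, 7))
```

Only the truthful bid survives the check, matching the claim that $v_i$ is the only bid optimal for all values of $b_0$.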
Common goods
Consider another type of game, in which countries set their policy for controlling air pollution.
Each country has a choice
• they can reduce pollution at a cost of -10 points for implementing the necessary changes
• or they can continue to pollute, which gives them a net utility of -5 (in added health costs, etc.) and also contributes -1 point to every other country (because the air is shared across countries)
What is the strategy of each country?
• Clearly, the dominant strategy for each country is "continue to pollute".
• If there are 100 countries and each follows this policy, then each country gets a total utility of -104 (-5 of its own plus -1 from each of the 99 other countries).
• If every country reduced pollution, they would each have a utility of -10!
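The numbers on this slide can be reproduced with a one-line utility function. A minimal sketch; the function name and its parameters are assumptions of this example.

```python
def country_utility(pollutes, other_polluters):
    """Utility of one country given its choice and the number of other polluters."""
    own = -5 if pollutes else -10
    return own - 1 * other_polluters  # -1 point from every other polluting country


n = 100

# Polluting is better than reducing for any number of other polluters.
pollute_is_dominant = all(
    country_utility(True, k) > country_utility(False, k) for k in range(n)
)

all_pollute = country_utility(True, n - 1)
all_reduce = country_utility(False, 0)
print(pollute_is_dominant, all_pollute, all_reduce)
```

Each country gains 5 points by polluting regardless of what the others do, yet when all of them follow that logic everyone ends up far worse off than under universal reduction.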
Tragedy of the commons
Tragedy of the commons: if nobody has to pay for using a common resource, then it tends to be exploited in a way that leads to a lower total utility for all agents.
It is similar to the prisoner's dilemma: there is another solution to the game that is better for all parties, but there appears to be no way for rational agents to arrive at that solution.
Tragedy of the commons: taxes
A standard approach for dealing with the tragedy of the commons is to change the mechanism to one that charges each agent for using the commons (a carbon tax).
We need to ensure that all externalities (effects on global utility that are not recognized in the individual agents' transactions) are made explicit.
Another example:
– Suppose a city decides it wants to install some free wireless Internet transceivers. However, the number of transceivers they can afford is less than the number of neighborhoods that want them.
– The problem is that if they just ask each neighborhood council "how much do you value this free gift?", they would all have an incentive to lie and report a high value.
– A solution is to ask the neighborhoods to pay for it.
Vickrey-Clarke-Groves mechanism
1. the center asks each agent to report its value for receiving an item, $b_i$
2. the center allocates the goods to a subset A of the bidders. Let $b_i(A) = b_i$ if $i \in A$, otherwise 0. The center chooses A to maximize the total reported utility $B = \sum_i b_i(A)$
3. each agent pays a tax equal to $W_{-i} - B_{-i}$, where
– $B_{-i} = \sum_{j \neq i} b_j(A)$
– $W_{-i}$ is the total global utility if i were not in the game
– each winner would pay a tax equal to the highest reported value among the losers (losers pay nothing)
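The three steps can be sketched for the case of k identical items, such as the transceivers. The bid values, the agent names, and the restriction to non-negative bids with one item per agent are assumptions of this example.

```python
def vcg(bids, k):
    """VCG allocation and taxes for k identical items, one item per agent.

    bids: dict mapping agent -> reported value b_i (assumed non-negative)
    Returns (winners, taxes), where taxes[i] = W_{-i} - B_{-i}.
    """
    def best_total(agents):
        # Maximum total reported utility when the items go to at most k agents.
        return sum(sorted((bids[a] for a in agents), reverse=True)[:k])

    # Step 2: giving the items to the k highest bidders maximizes B.
    ranked = sorted(bids, key=bids.get, reverse=True)
    winners = set(ranked[:k])
    B = sum(bids[a] for a in winners)

    # Step 3: each agent pays W_{-i} - B_{-i}.
    taxes = {}
    for i in bids:
        B_minus_i = B - (bids[i] if i in winners else 0)
        W_minus_i = best_total([a for a in bids if a != i])
        taxes[i] = W_minus_i - B_minus_i
    return winners, taxes


winners, taxes = vcg({"a": 100, "b": 50, "c": 40, "d": 20, "e": 10}, k=3)
print(sorted(winners))
print(taxes)
```

With three items and these five bids, every winner pays 20, the highest reported value among the losers, and the losers pay nothing, as stated in step 3.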
Properties of Vickrey-Clarke-Groves mechanism
Why does the VCG mechanism make the agents happy?
– all winners should be happy because they pay a tax that is less than their value
– all losers are as happy as they can be, because they value the goods less than the required tax
Why is it that this mechanism is truth-revealing?
– each agent maximizes his payoff, which is the value of getting an item minus the tax: $v_i(A) - (W_{-i} - B_{-i})$
– agent i knows that the center will maximize global utility using the reported values: $\sum_j b_j(A) = b_i(A) + \sum_{j \neq i} b_j(A)$
– whereas agent i wants the center to maximize $v_i(A) + \sum_{j \neq i} b_j(A) - W_{-i}$
– since agent i cannot affect the value of $W_{-i}$ (it depends only on the other agents), the only way i can make the center optimize what i wants is to report the true utility $b_i = v_i$
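The truth-revealing property can also be checked numerically: for fixed reports of the other agents, no misreport gives an agent a higher payoff than reporting its true value. The sketch below assumes k identical items, the other agents' bids shown, and ties broken against the agent; all names are hypothetical.

```python
def vcg_payoff(true_value, report, other_bids, k):
    """Payoff of one agent in a k-item VCG auction: item value minus tax.

    The agent wins if its report beats the k-th highest competing bid
    (ties are broken against the agent, an assumption of this sketch).
    A winner pays the highest reported value among the losers.
    """
    others = sorted(other_bids, reverse=True)
    if len(others) < k:
        return true_value  # fewer competitors than items: wins, no tax
    threshold = others[k - 1]
    if report > threshold:
        return true_value - threshold
    return 0


others = [100, 50, 20, 10]
k = 3

# Reporting the true value is never worse than any other report on the grid.
truthful_never_worse = all(
    vcg_payoff(v, r, others, k) <= vcg_payoff(v, v, others, k)
    for v in range(0, 121)
    for r in range(0, 121)
)
print(truthful_never_worse)
print(vcg_payoff(40, 40, others, k), vcg_payoff(15, 25, others, k))
```

An agent with true value 40 wins and keeps a payoff of 20, while an agent with true value 15 that over-reports 25 wins an item but ends up with a negative payoff.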
© 2016 Roman Barták, Department of Theoretical Computer Science and Mathematical Logic