Artificial Intelligence

Roman Barták
Department of Theoretical Computer Science and Mathematical Logic

Decisions with multiple agents

What if the uncertainty is due to other agents and the decisions they make? And what if the decisions of those agents are in turn influenced by our decisions?

• agent design – game theory can analyze the agent's decisions and compute the expected utility for each decision (under the assumption that other agents are acting optimally according to game theory)

• mechanism design – inverse game theory makes it possible to define the rules of the environment so that the collective good of all agents is maximized (when each agent adopts the game-theoretic solution that maximizes its own utility)

Post on 07-Apr-2019



Single-move games

Consider a restricted set of games, where all players take action simultaneously and the result of the game is based on this single set of actions

– what matters is that no player has knowledge of the other players' choices

A single-move game is defined by three components:
– players (or agents), like O (odd) and E (even)
– actions that the players can choose, like one or two fingers
– a payoff function that gives the utility to each player for each combination of actions by all players; the payoff matrix for two-finger Morra is as follows:

          O: one        O: two
E: one    E=+2, O=-2    E=-3, O=+3
E: two    E=-3, O=+3    E=+4, O=-4
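The three components can be written down directly as data. The following is a minimal sketch in Python; the names `ACTIONS`, `MORRA`, and `is_zero_sum` are illustrative assumptions, not part of the slides. It stores the two-finger Morra payoffs keyed by (E's action, O's action) and checks that the payoffs cancel out in every outcome.

```python
# Payoffs for two-finger Morra, keyed by (E's action, O's action).
# Each value is the pair (utility for E, utility for O).
ACTIONS = ("one", "two")
MORRA = {
    ("one", "one"): (2, -2),
    ("one", "two"): (-3, 3),
    ("two", "one"): (-3, 3),
    ("two", "two"): (4, -4),
}

def is_zero_sum(game):
    """Return True if the players' payoffs sum to zero in every outcome."""
    return all(sum(payoffs) == 0 for payoffs in game.values())

print(is_zero_sum(MORRA))  # True
```

A dictionary keyed by action profiles is only one possible encoding; a nested list or an array would work equally well for larger games.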

Single-move games: solution and strategy

Each player in a game must adopt and then execute a strategy (policy).

– a pure strategy is a deterministic policy; for a single-move game, it is just a single action

– a mixed strategy is a randomized policy that selects actions according to a probability distribution; for two actions, it is written [p, a; (1-p), b]

The game's outcome is a numeric value for each player. A solution to a game is a strategy profile (an assignment of a strategy to each player) in which each player adopts a rational strategy.

– What does "rational" mean when each agent chooses only part of the strategy profile that determines the outcome?

Prisoner's dilemma

Consider the following story:
– Two alleged burglars, Alice and Bob, are caught red-handed near the scene of a burglary and are interrogated separately.
– A prosecutor offers each a deal: if you testify against your partner as the leader of a burglary ring, you will go free while your partner will serve 10 years in prison.
– However, if you both testify against each other, you will both get 5 years.
– If you both refuse to testify, you will serve only 1 year each for the lesser charge of possessing stolen property.

Should they testify or refuse?
– The rational decision is to testify.

                Alice: testify    Alice: refuse
Bob: testify    A=-5, B=-5        A=-10, B=0
Bob: refuse     A=0, B=-10        A=-1, B=-1

Dominance

Testify is a dominant strategy for the Prisoner's dilemma.
– a strategy s for player p strongly dominates strategy s' if the outcome for s is better for p than the outcome for s', for every choice of strategies by the other player(s)
– a strategy s weakly dominates s' if s is better on at least one strategy profile and no worse on any other

It is irrational to play a dominated strategy and not to play a dominant strategy if one exists.

When each player has a dominant strategy, the combination of those strategies is called a dominant strategy equilibrium. A strategy profile forms an equilibrium if no player can benefit by switching strategies, given that every other player sticks with the same strategy. Such a profile is called a Nash equilibrium; every finite game has at least one (possibly in mixed strategies).

The outcome of a game is Pareto optimal if there is no other outcome that all players would prefer.

– An outcome is Pareto dominated by another outcome if all players would prefer the other outcome.

The dilemma in the Prisoner's dilemma arises because the dominant strategy equilibrium (testify, testify) is Pareto dominated by the outcome (refuse, refuse).
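The two definitions above can be checked mechanically on the Prisoner's dilemma payoffs. This is a minimal sketch in Python; the helper names (`payoff`, `strongly_dominates`, `pareto_dominated`) and the encoding of the table as a dictionary keyed by (Alice's action, Bob's action) are assumptions made for illustration.

```python
# Prisoner's dilemma, keyed by (Alice's action, Bob's action).
# Each value is (utility for Alice, utility for Bob) in negative prison years.
ACTIONS = ("testify", "refuse")
PD = {
    ("testify", "testify"): (-5, -5),
    ("testify", "refuse"): (0, -10),
    ("refuse", "testify"): (-10, 0),
    ("refuse", "refuse"): (-1, -1),
}

def payoff(game, player, own, other):
    """Utility of `player` (0 = Alice, 1 = Bob) playing `own` against `other`."""
    profile = (own, other) if player == 0 else (other, own)
    return game[profile][player]

def strongly_dominates(game, player, s, s_prime):
    """True if s is strictly better than s_prime against every opposing action."""
    return all(payoff(game, player, s, o) > payoff(game, player, s_prime, o)
               for o in ACTIONS)

def pareto_dominated(game, profile):
    """True if some other outcome is strictly preferred by every player."""
    return any(all(u > v for u, v in zip(other, game[profile]))
               for other in game.values())

print(strongly_dominates(PD, 0, "testify", "refuse"))  # True
print(pareto_dominated(PD, ("testify", "testify")))    # True
```

Under these assumptions, testify strongly dominates refuse for both players, yet the resulting outcome is Pareto dominated by (refuse, refuse), which is exactly the tension the slide describes.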

No dominant strategy

Consider the following game:
– Acme, a video game console manufacturer, has to decide whether its next game machine will use Blu-ray discs or DVDs.
– Meanwhile, the video game software producer Best needs to decide whether to produce its next game on Blu-ray or DVD.
– The profits of both will be positive if they agree and negative if they disagree.

There is no dominant strategy equilibrium for this game, but there are two Nash equilibria. There are multiple acceptable solutions, but if each agent aims for a different one, then both agents will suffer. How can they agree on a solution?

Both should choose the Pareto-optimal Nash equilibrium, provided that one exists; (bluray, bluray) is the Pareto-optimal solution. What if there are several such solutions (for example, if (bluray, bluray) had payoff (5, 5), the same as (dvd, dvd))?

• agents can either guess or communicate
• coordination games (games in which players need to communicate)

                Acme: bluray    Acme: dvd
Best: bluray    A=+9, B=+9      A=-4, B=-1
Best: dvd       A=-3, B=-1      A=+5, B=+5
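For a game this small, the pure-strategy Nash equilibria can be found by checking every profile for profitable unilateral deviations. The sketch below is one way to do that in Python; `GAME` and `pure_nash_equilibria` are illustrative names, and the dictionary is keyed by (Acme's action, Best's action).

```python
from itertools import product

# Acme/Best game, keyed by (Acme's action, Best's action).
# Each value is (utility for Acme, utility for Best).
ACTIONS = ("bluray", "dvd")
GAME = {
    ("bluray", "bluray"): (9, 9),
    ("dvd", "bluray"): (-4, -1),
    ("bluray", "dvd"): (-3, -1),
    ("dvd", "dvd"): (5, 5),
}

def pure_nash_equilibria(game, actions):
    """Return all pure profiles from which no player gains by deviating alone."""
    equilibria = []
    for a, b in product(actions, repeat=2):
        u_a, u_b = game[(a, b)]
        # Acme cannot improve by changing its own action while Best keeps b.
        a_ok = all(game[(alt, b)][0] <= u_a for alt in actions)
        # Best cannot improve by changing its own action while Acme keeps a.
        b_ok = all(game[(a, alt)][1] <= u_b for alt in actions)
        if a_ok and b_ok:
            equilibria.append((a, b))
    return equilibria

print(pure_nash_equilibria(GAME, ACTIONS))
# [('bluray', 'bluray'), ('dvd', 'dvd')]
```

The enumeration finds both equilibria but says nothing about which one the players will coordinate on; that still requires guessing or communication.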

Mixed strategies

Consider the two-finger Morra game
– no pure-strategy equilibrium profile exists
• if the total number of fingers is even, then O will want to switch
• if the total is odd, then E will want to switch
– we must look for mixed strategies instead

Von Neumann developed a method for finding the optimal mixed strategy for two-player, zero-sum games (games in which the sum of the payoffs is always zero).
– the maximin technique
– we need to consider the payoffs of only one player (E)

Maximin technique (pure strategies)

Suppose we change the rules as follows:
– First E picks her strategy and reveals it to O. Then O picks his strategy, with knowledge of E's strategy.
• This gives a turn-taking game to which we can apply the standard minimax algorithm.
• Clearly, this game favors O, so we get a lower bound for the true utility for E (-3).
– Now suppose we change the rules to force O to reveal his strategy first, followed by E.
• This gives an upper bound for the true utility of E (2).
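Both bounds follow from a one-line max/min over E's payoffs. A minimal sketch in Python, assuming the payoff table is stored as a dictionary keyed by (E's action, O's action); the variable names are illustrative.

```python
# E's payoffs in two-finger Morra, keyed by (E's action, O's action).
# The game is zero-sum, so O's payoff is the negation of E's.
ACTIONS = ("one", "two")
E_PAYOFF = {
    ("one", "one"): 2,
    ("one", "two"): -3,
    ("two", "one"): -3,
    ("two", "two"): 4,
}

# E reveals first: O replies with E's worst case, so E maximizes the minimum.
lower_bound = max(min(E_PAYOFF[(e, o)] for o in ACTIONS) for e in ACTIONS)

# O reveals first: E replies with her best case, so O minimizes the maximum.
upper_bound = min(max(E_PAYOFF[(e, o)] for e in ACTIONS) for o in ACTIONS)

print(lower_bound, upper_bound)  # -3 2
```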

Maximin technique (mixed strategies)

We need to turn our analysis to mixed strategies [p, one; (1-p), two]

Once the first player has revealed his or her strategy, the second player might as well choose a pure strategy.

What is the value of p to get the best utility for E (left)?
– p = 7/12 and the payoff is -1/12

What is the value of q to get the best utility for O (right)?
– q = 7/12 and the payoff is -1/12

The optimal strategy for both players is [7/12, one; 5/12, two]
» maximin equilibrium (it is also a Nash equilibrium)
» the two-finger Morra game favors the player O
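The value p = 7/12 comes from making E indifferent between O's two pure replies: against "one" E expects 2p - 3(1-p) = 5p - 3, against "two" she expects -3p + 4(1-p) = 4 - 7p, and the two lines cross at p = 7/12. The sketch below solves this indifference condition for any 2x2 zero-sum game in Python; the function name `solve_2x2_zero_sum` is an assumption, and the formula is only valid when the game has no pure-strategy equilibrium.

```python
from fractions import Fraction

def solve_2x2_zero_sum(a, b, c, d):
    """Maximin mixed strategy for the row player of a 2x2 zero-sum game.

    Row player's payoffs: a = (row 1, col 1), b = (row 1, col 2),
                          c = (row 2, col 1), d = (row 2, col 2).
    Assumes there is no pure-strategy equilibrium, so the optimum makes the
    row player indifferent between the opponent's two columns.
    """
    # Solve p*a + (1-p)*c == p*b + (1-p)*d for p (probability of row 1).
    p = Fraction(d - c, (a - c) - (b - d))
    value = p * a + (1 - p) * c
    return p, value

# E as row player, with E's payoffs.
p, value_for_e = solve_2x2_zero_sum(2, -3, -3, 4)
# O as row player, with O's payoffs (the negated, transposed table).
q, value_for_o = solve_2x2_zero_sum(-2, 3, 3, -4)

print(p, value_for_e)  # 7/12 -1/12
print(q, value_for_o)  # 7/12 1/12
```

O's value of +1/12 is the same result seen from the other side: E's expected payoff is -1/12, so the game slightly favors O.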


Repeated games

What if the same game is repeated several times?

A repeated game is the simplest kind of multiple-move game:
• players face the same choice repeatedly, but each time with knowledge of the history of all players' previous choices
• payoffs are additive over time

Strategies for repeated games

The repeated version of the prisoner's dilemma:
1. the same players play 100 rounds
– the rational strategy is still to testify (the last round is not followed by another round, so it is played like a single game; the same reasoning then applies to the round before it, and so on)
– earning a total jail sentence of 500 years each
2. 99% chance that the players meet again
– the expected number of rounds is still 100, but neither player knows for sure which round will be the last
– perpetual punishment strategy: each player refuses unless the other player has ever played testify
– the expected future payoff is $\sum_{t=0}^{\infty} 0.99^t \cdot (-1) = -100$ if both players adopt this strategy
– a player who deviates from the strategy and chooses testify will gain a score of 0, but then both players will play testify and the total expected future payoff becomes $0 + \sum_{t=1}^{\infty} 0.99^t \cdot (-5) = -495$
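Both totals are geometric series and can be verified with exact arithmetic. A minimal sketch in Python; the helper `discounted_total` is a hypothetical name, and it uses the closed form p^k / (1 - p) for the sum of p^t over t ≥ k.

```python
from fractions import Fraction

def discounted_total(per_round, continue_prob, first_round=0):
    """Closed form of the sum over t >= first_round of continue_prob**t * per_round."""
    return per_round * continue_prob ** first_round / (1 - continue_prob)

P = Fraction(99, 100)  # probability that the players meet again

# Both players keep refusing: -1 in every round, starting now.
both_refuse = discounted_total(-1, P)

# One player testifies now (payoff 0), then both testify forever (-5 per round).
deviate_once = 0 + discounted_total(-5, P, first_round=1)

print(both_refuse, deviate_once)  # -100 -495
```

Since -100 is much better than -495, deviating from perpetual punishment does not pay under these assumptions.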

A famous strategy is called tit-for-tat:
• starting with refuse and then echoing the other player's previous move on all subsequent moves
• highly robust and effective against a wide variety of strategies
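Tit-for-tat is easy to simulate. The sketch below, in Python, plays the repeated prisoner's dilemma for a fixed number of rounds; the function names (`tit_for_tat`, `always_testify`, `play`) and the convention that a strategy receives the opponent's move history are assumptions chosen for illustration.

```python
# Per-round payoffs (player A, player B), keyed by (A's move, B's move).
PAYOFF = {
    ("testify", "testify"): (-5, -5),
    ("testify", "refuse"): (0, -10),
    ("refuse", "testify"): (-10, 0),
    ("refuse", "refuse"): (-1, -1),
}

def tit_for_tat(opponent_history):
    """Refuse first, then echo the opponent's previous move."""
    return opponent_history[-1] if opponent_history else "refuse"

def always_testify(opponent_history):
    """Play the single-round dominant strategy in every round."""
    return "testify"

def play(strategy_a, strategy_b, rounds):
    """Play the repeated game and return the total payoffs (A, B)."""
    moves_a, moves_b = [], []
    total_a = total_b = 0
    for _ in range(rounds):
        a = strategy_a(moves_b)  # each strategy sees only the other's history
        b = strategy_b(moves_a)
        pay_a, pay_b = PAYOFF[(a, b)]
        total_a += pay_a
        total_b += pay_b
        moves_a.append(a)
        moves_b.append(b)
    return total_a, total_b

print(play(tit_for_tat, tit_for_tat, 100))     # (-100, -100)
print(play(tit_for_tat, always_testify, 100))  # (-505, -495)
```

Against itself tit-for-tat sustains mutual refusal; against a player who always testifies it loses only the first round and then matches the opponent.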

Mechanism design

So far we have focused on the question "Given a game, what is a rational strategy?" What if we ask "Given that agents pick rational strategies, what game should we design?" We would like to design a game whose solutions, consisting of each agent pursuing its own rational strategy, result in the maximization of some global utility function.

This is called mechanism design or sometimes inverse game theory. It is used in economics and political science. In general, it allows us to construct smart systems out of a collection of more limited (even uncooperative) systems.

A mechanism consists of:
– a language for describing the set of allowable strategies that agents may adopt,
– a distinguished agent, the center, that collects reports of strategy choices from agents in the game,
– an outcome rule, known to all agents, that the center uses to determine the payoffs of each agent given their strategy choices

Auctions

An auction is a mechanism for selling some goods to members of a pool of bidders.

For simplicity, we concentrate on auctions with a single item for sale.

Each bidder i has a utility value $v_i$ for having the item.
• In some cases, each bidder has a private value for the item.
– An old piece of furniture has a different value for a furniture collector and for a young family.
• In other cases, the item has a common value, but there is uncertainty as to what the actual value is.
– Different bidders have different information and hence different estimates of the item's true value.

Auction mechanism
– each bidder gets a chance to make a bid $b_i$
– the highest bid $b_{max}$ wins the item, but the price paid need not be $b_{max}$ (part of mechanism design)

English auction

The best-known auction mechanism is the ascending-bid, or English, auction.
– The center starts by asking for a minimum (or reserve) bid $b_{min}$.
– If some bidder is willing to pay that amount, the center then asks for $b_{min} + d$, for some increment d, and continues up from there.
– The auction ends when nobody is willing to bid anymore.
– Then the last bidder wins the item, paying the price he bid.

How do we know if this is a good mechanism?
– one goal is to maximize expected revenue for the seller; another goal is to maximize a notion of global utility
– we say an auction is efficient if the goods go to the agent who values them most

The English auction is usually both efficient and revenue maximizing if there is
– a sufficient number of bidders to enter the game
– no collusion, i.e. no unfair or illegal agreement by two or more bidders to manipulate prices
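The ascending-bid procedure can be sketched as a short loop. The Python code below is a simplified model, not a description of any real auction house: it assumes every bidder accepts any asked price that does not exceed its private value, that the increment is positive, and that ties between willing bidders are broken in favor of the higher value. The name `english_auction` is illustrative.

```python
def english_auction(values, reserve, increment):
    """Simulate an ascending-bid auction.

    `values` maps bidder name -> private value v_i. Each bidder is assumed to
    accept any asked price that does not exceed its value.
    Returns (winner, price), or (None, None) if nobody meets the reserve.
    """
    winner, price = None, None
    asked = reserve
    while True:
        # Bidders other than the current leader who accept the asked price.
        takers = [b for b, v in values.items() if b != winner and v >= asked]
        if not takers:
            return winner, price
        # Tie-breaking assumption: the willing bidder with the highest value bids.
        winner, price = max(takers, key=values.get), asked
        asked += increment

print(english_auction({"a": 100, "b": 70, "c": 40}, reserve=10, increment=10))
# ('a', 70)
print(english_auction({"a": 100}, reserve=10, increment=10))
# ('a', 10)
```

In the first run the item goes to the bidder who values it most, at roughly the second-highest value, which illustrates efficiency. In the second run a lone bidder wins at the reserve price, which is why a sufficient number of bidders matters for revenue.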

Collusion

An unfair or illegal agreement by two or more bidders to manipulate prices. It can happen in secret backroom deals or tacitly, within the rules of the mechanism.

Example of price manipulation within the rules of the mechanism
– In 1999, Germany auctioned ten blocks of cell-phone spectrum with a simultaneous auction (bids were taken on all ten blocks at the same time), using the rule that any bid must be a minimum of a 10% raise over the previous bid on a block.
– There were only two credible bidders, Mannesman and T-Mobile.
– Mannesman entered the bid of 20 million DEM on blocks 1-5 and 18.18 million DEM on blocks 6-10.
– T-Mobile interpreted Mannesman's first bid as an offer: both parties could compute that a 10% raise on 18.18M is 19.99M. Mannesman's bid was interpreted as an offer "we can each get half of the blocks for 20M"

What to do with it?
– a higher reserve price
– a sealed-bid first-price auction
– bring in a third bidder
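The signal hidden in the 18.18 million bid is plain arithmetic, which can be checked exactly with decimal numbers. A small sketch in Python; the variable names are illustrative.

```python
from decimal import Decimal

MIN_RAISE = Decimal("1.10")             # any new bid must be at least 10% higher
bid_blocks_6_to_10 = Decimal("18.18")   # Mannesman's bid, in million DEM

# Smallest bid T-Mobile could place on blocks 6-10 under the 10% rule.
next_min_bid = bid_blocks_6_to_10 * MIN_RAISE
print(next_min_bid)  # 19.9980
```

The minimum raise lands just under 20 million, the price Mannesman itself bid for blocks 1-5, so both halves would sell for about the same amount.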

Truth-revealing

In general, both the seller and the global utility function benefit if there are more bidders. One way to encourage more bidders is to make the mechanism easier for them. It is desirable that the bidders have a dominant strategy, a strategy that works against all other strategies.
– an agent with a dominant strategy can just bid, without wasting time contemplating the other agents' possible strategies

Usually such a strategy involves the bidders revealing their true value $v_i$; then it is called a truth-revealing, or truthful, auction.

Properties of English auction

The English auction has most of the desirable properties:
– bidders have a simple dominant strategy: keep bidding as long as the current cost is below your $v_i$
– this is not quite truth-revealing, because the winning bidder reveals only that his $v_i \ge b_0 + d$, where $b_0$ is the highest bid among the other bidders (we know only a lower bound on $v_i$)

Some disadvantages of the English auction:
– if there is one clearly stronger bidder who can always bid higher than any other bidder, then the competitors may not enter at all, and the strong bidder ends up winning at the reserve price (this discourages competition)
– high communication costs, as the auction takes place in one room or all bidders have to have high-speed, secure communication lines

Sealed-bid auction

An alternative mechanism is the sealed-bid auction.
– each bidder makes a single bid and communicates it to the auctioneer without the other bidders seeing it
– the highest bid wins

There is no longer a simple dominant strategy
– the bid depends on the expected bids of the other agents
– let $v_i$ be your utility value and $b_0$ be the expected maximum of all the other agents' bids
– then you should bid $b_0 + \varepsilon$ (for some small $\varepsilon$), if that is less than $v_i$

Note that the agent with the highest $v_i$ might not win the auction, reducing the bias toward an advantaged bidder (the auction is more competitive).

Sealed-bid second-price auction

A small change in the mechanism for sealed-bid auctions produces the sealed-bid second-price auction, also known as a Vickrey auction.
– The winner pays the price of the second-highest bid, $b_0$, rather than paying his own bid.
– The dominant strategy is now simply to bid $v_i$; the mechanism is truth-revealing.

Why is this a dominant strategy?
The utility of agent i in terms of his bid $b_i$, his value $v_i$, and the best bid among the other agents $b_0$ is $(v_i - b_0)$ if $b_i > b_0$, otherwise 0
• when $(v_i - b_0) > 0$, then any bid that wins the auction is optimal, and bidding $v_i$ in particular wins the auction
• when $(v_i - b_0) < 0$, then any bid that loses the auction is optimal, and bidding $v_i$ in particular loses the auction
• so bidding $v_i$ is optimal for all possible values of $b_0$, and in fact, $v_i$ is the only bid that has this property
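The argument can be checked by brute force over a grid of bids. The sketch below, in Python, encodes the utility function from the slide and compares truthful bidding with every alternative bid against every competing bid; `vickrey_utility` and `truthful_is_optimal` are hypothetical helper names, and the integer grid is only an illustration, not a proof.

```python
def vickrey_utility(b_i, v_i, b_0):
    """Utility of agent i: wins and pays b_0 if b_i > b_0, otherwise gets 0."""
    return v_i - b_0 if b_i > b_0 else 0

def truthful_is_optimal(v_i, candidate_bids, competing_bids):
    """True if bidding v_i is never worse than any candidate bid."""
    return all(vickrey_utility(v_i, v_i, b_0) >= vickrey_utility(b, v_i, b_0)
               for b in candidate_bids for b_0 in competing_bids)

print(truthful_is_optimal(50, range(101), range(101)))  # True

# Overbidding can win at a loss; underbidding can forfeit a profitable win.
print(vickrey_utility(80, 50, 60))  # -10 (truthful bid would give 0)
print(vickrey_utility(30, 50, 40))  # 0   (truthful bid would give 10)
```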

Common goods

Consider another type of game, in which countries set their policy for controlling air pollution.

Each country has a choice
• they can reduce pollution at a cost of -10 points for implementing the necessary changes
• or they can continue to pollute, which gives them a net utility of -5 (in added health costs, etc.) and also contributes -1 point to every other country (because the air is shared across countries)

What is the strategy of each country?
• Clearly, the dominant strategy for each country is "continue to pollute".
• If there are 100 countries and each follows this policy, then each country gets a total utility of -104.
• If every country reduced pollution, they would each have a utility of -10!
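The numbers follow directly from the payoff rule: a polluting country pays 5 itself plus 1 for each of the 99 other polluters. A minimal sketch in Python, with `country_utility` as an illustrative name:

```python
def country_utility(pollutes, other_polluters):
    """Utility of one country: -5 if it pollutes, -10 if it reduces,
    minus 1 for each other country that pollutes."""
    own = -5 if pollutes else -10
    return own - other_polluters

N = 100
all_pollute = country_utility(True, N - 1)
all_reduce = country_utility(False, 0)

# Polluting is dominant: it is better by 5 points whatever the others do.
polluting_dominates = all(
    country_utility(True, k) > country_utility(False, k) for k in range(N)
)

print(all_pollute, all_reduce, polluting_dominates)  # -104 -10 True
```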

Tragedy of the commons

Tragedy of the commons: if nobody has to pay for using a common resource, then it tends to be exploited in a way that leads to a lower total utility for all agents.

It is similar to the prisoner's dilemma: there is another solution to the game that is better for all parties, but there appears to be no way for rational agents to arrive at that solution.

Tragedy of the commons: taxes

A standard approach for dealing with the tragedy of the commons is to change the mechanism to one that charges each agent for using the commons (a carbon tax).

We need to ensure that all externalities (effects on global utility that are not recognized in the individual agents' transactions) are made explicit.

Another example:
– Suppose a city decides it wants to install some free wireless Internet transceivers. However, the number of transceivers they can afford is less than the number of neighborhoods that want them.
– The problem is that if they just ask each neighborhood council "how much do you value this free gift?" they would all have an incentive to lie, and report a high value.
– A solution is to ask them to pay for it.

Vickrey-Clarke-Groves mechanism

1. the center asks each agent to report its value for receiving an item, $b_i$

2. the center allocates the goods to a subset A of the bidders. Let $b_i(A) = b_i$ if $i \in A$, otherwise 0. The center chooses A to maximize the total reported utility $B = \sum_i b_i(A)$

3. each agent pays a tax equal to $W_{-i} - B_{-i}$, where
$B_{-i} = \sum_{j \neq i} b_j(A)$
$W_{-i}$ = total global utility if i were not in the game
Each winner would pay a tax equal to the highest reported value among the losers (losers pay nothing).
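The three steps can be sketched for the simple case of k identical items with at most one item per bidder. The Python code below is an illustration under that assumption; the function name `vcg` and the example bids are hypothetical. With identical items, maximizing total reported utility amounts to picking the k highest bids.

```python
def vcg(bids, k):
    """VCG for k identical items, at most one per bidder.

    `bids` maps bidder -> reported value b_i.
    Returns (winners, taxes), where taxes[i] = W_{-i} - B_{-i}.
    """
    def best_total(reported):
        # Maximum total reported utility: give the items to the k highest bids.
        return sum(sorted(reported.values(), reverse=True)[:k])

    ranked = sorted(bids, key=bids.get, reverse=True)
    winners = set(ranked[:k])                      # the allocation A
    total = sum(bids[w] for w in winners)          # B
    taxes = {}
    for i in bids:
        others = {j: b for j, b in bids.items() if j != i}
        w_minus_i = best_total(others)                        # W_{-i}
        b_minus_i = total - (bids[i] if i in winners else 0)  # B_{-i}
        taxes[i] = w_minus_i - b_minus_i
    return winners, taxes

winners, taxes = vcg({"a": 10, "b": 8, "c": 5, "d": 3}, k=2)
print(sorted(winners))  # ['a', 'b']
print(taxes)            # {'a': 5, 'b': 5, 'c': 0, 'd': 0}
```

In this example both winners pay 5, the highest reported value among the losers, and the losers pay nothing, matching the statement on the slide.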

Properties of the Vickrey-Clarke-Groves mechanism

Why does the VCG mechanism make the agents happy?
– all winners should be happy because they pay a tax that is less than or equal to their value
– all losers are as happy as they can be, because they value the goods less than the required tax

Why is it that this mechanism is truth-revealing?
– each agent maximizes his payoff, which is the value of getting an item minus the tax: $v_i(A) - (W_{-i} - B_{-i})$
– agent i knows that the center will maximize global utility using the reported values: $\sum_j b_j(A) = b_i(A) + \sum_{j \neq i} b_j(A)$
– whereas agent i wants the center to maximize $v_i(A) + \sum_{j \neq i} b_j(A) - W_{-i}$
– since agent i cannot affect the value of $W_{-i}$ (it depends only on the other agents), the only way i can make the center optimize what i wants is to report the true utility $b_i = v_i$
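The truth-revealing property can be spot-checked numerically. The sketch below, in Python, computes agent i's payoff $v_i(A) - (W_{-i} - B_{-i})$ for k identical items (one per bidder) and compares truthful reporting against a grid of alternative reports. The helper names, the example bids, and the restriction to identical items are all assumptions for illustration; the check is an experiment, not a proof.

```python
def vcg_payoff(i, true_value, bids, k):
    """Payoff v_i(A) - (W_{-i} - B_{-i}) of agent i, k identical items."""
    ranked = sorted(bids, key=bids.get, reverse=True)
    winners = ranked[:k]                                   # allocation A
    others = sorted((b for j, b in bids.items() if j != i), reverse=True)
    w_minus_i = sum(others[:k])                            # W_{-i}
    b_minus_i = sum(bids[j] for j in winners if j != i)    # B_{-i}
    value = true_value if i in winners else 0              # v_i(A)
    return value - (w_minus_i - b_minus_i)

def truthful_is_best(i, true_value, other_bids, k, candidates):
    """True if reporting true_value is at least as good as every candidate."""
    def payoff(report):
        return vcg_payoff(i, true_value, {**other_bids, i: report}, k)
    return all(payoff(true_value) >= payoff(r) for r in candidates)

others = {"b": 8, "c": 5, "d": 3}
print(truthful_is_best("a", 6, others, k=2, candidates=range(21)))  # True
print(vcg_payoff("a", 4, {**others, "a": 9}, k=2))  # -1: overbidding wins at a loss
```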

© 2016 Roman Barták, Department of Theoretical Computer Science and Mathematical Logic
