
Page 1: TDDD10 AI Programming Multiagent Decision MakingTDDD10/lectures/05_multi_agent_decision_making.pdf · TDDD10 AI Programming Multiagent Decision Making Cyrille Berger 2 / 83 Labs new

TDDD10 AI Programming
Multiagent Decision Making

Cyrille Berger

2/83

Labs

New map: Kobe2013-stations
ChangeSet contains all the properties, not just new ones
In the AbstractAgent class:

    protected void processSense(KASense sense) {
        model.merge(sense.getChangeSet());
        Collection<Command> heard = sense.getHearing();
        think(sense.getTime(), sense.getChangeSet(), heard);
    }

You can override it:

    protected void processSense(KASense sense) {
        // send update to other agent
        // using world model before merge
        super.processSense(sense);
    }

3/83

Lectures

1 AI Programming: Introduction
2 Introduction to RoboRescue
3 Agents and Agents Architecture
4 Multi-Agent and Communication
5 Multi-Agent Decision Making
6 Cooperation And Coordination 1
7 Cooperation And Coordination 2
8 Machine Learning
9 Automated Planning
10 Putting It All Together

4/83

Lecture goals

Multi-agent decision in a competitive environment
Learn about the concept of utility, rational agents, voting and auctioning

5/83

Lecture content

Self-Interested Agents
Social Choice
Auctions
Single Dimension Auctions
Combinatorial Auctions

Self-Interested Agents

7

Utilities and Preferences

Assume we have just two agents: Ag = {i, j}
Agents are assumed to be self-interested: they have preferences over how the environment is
Assume Ω = {ω₁, ω₂, …} is the set of “outcomes” that agents have preferences over
We capture preferences by utility functions:
  uᵢ : Ω → ℝ
  uⱼ : Ω → ℝ

Utility functions lead to preference orderings over outcomes:
  ω ⪰ᵢ ω′ means uᵢ(ω) ≥ uᵢ(ω′)
  ω ≻ᵢ ω′ means uᵢ(ω) > uᵢ(ω′)

8

What is utility?

Utility is not money, but similar

9

Self-Interested Agents

If agents represent individuals or organizations, then we cannot make the benevolence assumption.

Agents will be assumed to act to further their own interests, possibly at the expense of others.

Potential for conflict.
May complicate the design task enormously.

10

Multiagent Encounters (1/2)

We need a model of the environment in which these agents will act…
  agents simultaneously choose an action to perform, and as a result of the actions they select, an outcome in Ω will result
  the actual outcome depends on the combination of actions
  assume each agent has just two possible actions that it can perform, C (“cooperate”) and D (“defect”)

Environment behavior is given by the state transformer function:
  τ : Acᵢ × Acⱼ → Ω

11

Multiagent Encounters (2/2)

Examples of a state transformer function
This environment is sensitive to the actions of both agents:
  τ(D,D) = ω₁   τ(D,C) = ω₂   τ(C,D) = ω₃   τ(C,C) = ω₄
Neither agent has any influence in this environment:
  τ(D,D) = ω₁   τ(D,C) = ω₁   τ(C,D) = ω₁   τ(C,C) = ω₁
This environment is controlled by j:
  τ(D,D) = ω₁   τ(D,C) = ω₂   τ(C,D) = ω₁   τ(C,C) = ω₂

12

Coordination game

Suppose we have the case where both agents can influence the outcome, and they have utility functions as follows:
  uᵢ(ω₁) = 2   uᵢ(ω₂) = 1   uᵢ(ω₃) = 3   uᵢ(ω₄) = 4
  uⱼ(ω₁) = 2   uⱼ(ω₂) = 3   uⱼ(ω₃) = 1   uⱼ(ω₄) = 4

This environment is sensitive to the actions of both agents:
  τ(D,D) = ω₁   τ(D,C) = ω₂   τ(C,D) = ω₃   τ(C,C) = ω₄

With a bit of abuse of notation:
  uᵢ(D,D) = 2   uᵢ(D,C) = 1   uᵢ(C,D) = 3   uᵢ(C,C) = 4
  uⱼ(D,D) = 2   uⱼ(D,C) = 3   uⱼ(C,D) = 1   uⱼ(C,C) = 4

Then agent i’s preferences are:
  C,C ≻ᵢ C,D ≻ᵢ D,D ≻ᵢ D,C

“C” is the rational choice for i: whatever j plays, i gets more by cooperating (4 or 3) than by defecting (1 or 2).

13

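Payoff Matrices

We can characterize the previous scenario in a payoff matrix:

                 i defects         i cooperates
  j defects      uᵢ = 2, uⱼ = 2    uᵢ = 3, uⱼ = 1
  j cooperates   uᵢ = 1, uⱼ = 3    uᵢ = 4, uⱼ = 4

Agent i is the column player
Agent j is the row player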

14

Disgrace of Gijón (World Cup 1982)

One game left: Germany - Austria
  uᵢ(≥3-0) = 2             uⱼ(≥3-0) = -1
  uᵢ(2-0) = uᵢ(1-0) = 2    uⱼ(2-0) = uⱼ(1-0) = 1
  uᵢ(a-a) = -1             uⱼ(a-a) = 2
  uᵢ(0-a) = -1             uⱼ(0-a) = 2   (a > 1)
Final score: Germany 1-0 Austria

15

The Prisoner’s Dilemma

Two men are collectively charged with a crime and held in separate cells, with no way of meeting or communicating.
They are told that:
  if one confesses and the other does not, the confessor will be freed, and the other will be jailed for three years
  if both confess, then each will be jailed for two years

Both prisoners know that if neither confesses, then they will each be jailed for one year

16

The Prisoner’s Dilemma

Payoff matrix for prisoner’s dilemma:

Top left: If both defect, then both get punishment for mutual defection
Top right: If i cooperates and j defects, i gets sucker’s payoff of 1, while j gets 4
Bottom left: If j cooperates and i defects, j gets sucker’s payoff of 1, while i gets 4
Bottom right: Reward for mutual cooperation

17

Solution Concepts

How will a rational agent behave in any given scenario?
Answered in solution concepts:
  dominant strategy;
  Nash equilibrium strategy;
  Pareto optimal strategies;
  strategies that maximize social welfare.

18

Dominant Strategies (1/2)

Given any particular strategy (either C or D) of agent i, there will be a number of possible outcomes
We say s₁ dominates s₂ if, for every strategy of the other agent, the outcome of i playing s₁ is preferred over the outcome of i playing s₂
A rational agent will never play a dominated strategy
So in deciding what to do, we can delete dominated strategies

Unfortunately, there is not always a unique undominated strategy

19

Dominant Strategies (2/2)

Coordination game:

Prisoner's Dilemma:

20

(Pure Strategy) Nash Equilibrium (1/2)

In general, we will say that two strategies s₁ and s₂ are in Nash equilibrium if:
  under the assumption that agent i plays s₁, agent j can do no better than play s₂; and
  under the assumption that agent j plays s₂, agent i can do no better than play s₁.

Neither agent has any incentive to deviate from a Nash equilibrium
Unfortunately:
  Not every interaction scenario has a pure strategy Nash equilibrium
  Some interaction scenarios have more than one Nash equilibrium

21

(Pure Strategy) Nash Equilibrium (2/2)

Coordination game:

Prisoner's Dilemma:

22

Pareto Optimality (1/2)

An outcome is said to be Pareto optimal (or Pareto efficient) if there is no other outcome that makes one agent better off without making another agent worse off.
If an outcome is Pareto optimal, then at least one agent will be reluctant to move away from it (because this agent will be worse off).

If an outcome ω is not Pareto optimal, then there is another outcome ω′ that makes everyone as happy, if not happier, than ω.
“Reasonable” agents would agree to move to ω′ in this case. (Even if I don’t directly benefit from ω′, you can benefit without me suffering.)

23

Pareto Optimality (2/2)

Coordination game:

Prisoner's Dilemma:

24

Social Welfare (1/2)

The social welfare of an outcome ω is the sum of the utilities that each agent gets from ω:
  sw(ω) = Σ_{i ∈ Ag} uᵢ(ω)

Think of it as the “total amount of utility in the system”.
As a solution concept, it may be appropriate when the whole system (all agents) has a single owner (then the overall benefit of the system is important, not individuals).

25

Social Welfare (2/2)

Coordination game:

Prisoner's Dilemma:

26

The Prisoner’s Dilemma

Solution concepts
  D is a dominant strategy.
  (D,D) is the only Nash equilibrium.
  All outcomes except (D,D) are Pareto optimal.
  (C,C) maximizes social welfare.

The individual rational action is defect

This guarantees a payoff of no worse than 2, whereas cooperating only guarantees a payoff of at least 1.
So defection is the best response to all possible strategies: both agents defect, and get payoff = 2
But intuition says this is not the best outcome: Surely they should both cooperate and each get payoff of 3!
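The solution concepts for the prisoner's dilemma can be checked by brute force. The sketch below is only an illustration: the class `TwoByTwoGame` and its method names are hypothetical, and the payoffs (3 for mutual cooperation, 2 for mutual defection, 4 for the lone defector, 1 for the sucker) are the ones quoted on the slides. The first array index is the action of agent i and the second the action of agent j.

```java
import java.util.ArrayList;
import java.util.List;

/** Brute-force solution concepts for a two-player, two-action game (illustrative sketch). */
final class TwoByTwoGame {
    // Action indices: 0 = C (cooperate), 1 = D (defect).
    // ui[a][b] and uj[a][b]: payoffs when agent i plays a and agent j plays b.
    final int[][] ui, uj;
    static final String[] NAMES = {"C", "D"};

    TwoByTwoGame(int[][] ui, int[][] uj) { this.ui = ui; this.uj = uj; }

    /** Pure-strategy Nash equilibria: neither agent gains by deviating alone. */
    List<String> nashEquilibria() {
        List<String> result = new ArrayList<>();
        for (int a = 0; a < 2; a++)
            for (int b = 0; b < 2; b++)
                if (ui[a][b] >= ui[1 - a][b] && uj[a][b] >= uj[a][1 - b])
                    result.add(label(a, b));
        return result;
    }

    /** Outcomes that no other outcome matches for both agents while improving one of them. */
    List<String> paretoOptimal() {
        List<String> result = new ArrayList<>();
        for (int a = 0; a < 2; a++)
            for (int b = 0; b < 2; b++) {
                boolean dominated = false;
                for (int c = 0; c < 2; c++)
                    for (int d = 0; d < 2; d++)
                        if (ui[c][d] >= ui[a][b] && uj[c][d] >= uj[a][b]
                                && (ui[c][d] > ui[a][b] || uj[c][d] > uj[a][b]))
                            dominated = true;
                if (!dominated) result.add(label(a, b));
            }
        return result;
    }

    /** Outcome with the largest sum of utilities. */
    String maxSocialWelfare() {
        int bestA = 0, bestB = 0;
        for (int a = 0; a < 2; a++)
            for (int b = 0; b < 2; b++)
                if (ui[a][b] + uj[a][b] > ui[bestA][bestB] + uj[bestA][bestB]) {
                    bestA = a;
                    bestB = b;
                }
        return label(bestA, bestB);
    }

    static String label(int a, int b) { return "(" + NAMES[a] + "," + NAMES[b] + ")"; }

    /** Prisoner's dilemma payoffs: rows are i's action (C, D), columns are j's action (C, D). */
    static TwoByTwoGame prisonersDilemma() {
        int[][] ui = {{3, 1}, {4, 2}};
        int[][] uj = {{3, 4}, {1, 2}};
        return new TwoByTwoGame(ui, uj);
    }
}
```

With these payoffs the check reports (D,D) as the only pure Nash equilibrium, (D,D) as the only outcome that is not Pareto optimal, and (C,C) as the welfare-maximizing outcome.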

27

The Prisoner’s Dilemma

This apparent paradox is the fundamental problem of multi-agent interactions.
It appears to imply that cooperation will not occur in societies of self-interested agents.
Real world examples:
  nuclear arms reduction (“why don’t I keep mine...”)
  free rider systems: public transport; television licenses.

Can we recover cooperation?

28

The Iterated Prisoner’s Dilemma

One answer: play the game more than once
If you know you will be meeting your opponent again, then the incentive to defect appears to evaporate
Cooperation is the rational choice in the infinitely repeated prisoner’s dilemma

29

Backwards Induction

But…, suppose you both know that you will play the game exactly n times
On round n-1, you have an incentive to defect, to gain that extra bit of payoff…
But this makes round n-2 the last “real” round, and so you have an incentive to defect there, too.
This is the backwards induction problem.

When playing the prisoner’s dilemma with a fixed, finite, pre-determined, commonly known number of rounds, defection is the best strategy

30

Axelrod’s Tournament

Suppose you play iterated prisoner’s dilemma against a range of opponents…
What strategy should you choose, so as to maximize your overall payoff?
Axelrod (1984) investigated this problem, with a computer tournament for programs playing the prisoner’s dilemma

31

Strategies in Axelrod’s Tournament

RANDOM
ALLD: “Always defect”, the hawk strategy;
TIT-FOR-TAT:
  On round u = 0, cooperate
  On round u > 0, do what your opponent did on round u-1

TESTER: On 1st round, defect. If the opponent retaliated, then play TIT-FOR-TAT. Otherwise intersperse cooperation and defection.

JOSS: As TIT-FOR-TAT, except periodically defect
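A minimal sketch of how two of these strategies could be pitted against each other. The class `IteratedDilemma` and its members are hypothetical names; the payoffs are assumed to be the prisoner's dilemma values used earlier (3, 2, 4, 1), and only the deterministic strategies ALLD and TIT-FOR-TAT are modelled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Iterated prisoner's dilemma between two deterministic strategies (illustrative sketch). */
final class IteratedDilemma {
    // A strategy maps the opponent's past moves ('C' or 'D') to its own next move.
    static final Function<List<Character>, Character> ALL_D = history -> 'D';
    static final Function<List<Character>, Character> TIT_FOR_TAT =
            history -> history.isEmpty() ? 'C' : history.get(history.size() - 1);

    /** Payoff to the player choosing `mine` against `theirs`. */
    static int payoff(char mine, char theirs) {
        if (mine == 'C') return theirs == 'C' ? 3 : 1;   // reward or sucker's payoff
        return theirs == 'C' ? 4 : 2;                    // temptation or punishment
    }

    /** Plays the given number of rounds and returns the total payoffs {first, second}. */
    static int[] play(Function<List<Character>, Character> first,
                      Function<List<Character>, Character> second, int rounds) {
        List<Character> firstMoves = new ArrayList<>();
        List<Character> secondMoves = new ArrayList<>();
        int[] totals = new int[2];
        for (int round = 0; round < rounds; round++) {
            char a = first.apply(secondMoves);    // each strategy only sees the opponent's history
            char b = second.apply(firstMoves);
            totals[0] += payoff(a, b);
            totals[1] += payoff(b, a);
            firstMoves.add(a);
            secondMoves.add(b);
        }
        return totals;
    }
}
```

Over 10 rounds, two TIT-FOR-TAT players collect 30 each, while TIT-FOR-TAT against ALLD yields 19 versus 22: TIT-FOR-TAT loses the single encounter by a small margin but scores far higher against cooperative opponents than ALLD can.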

32

Axelrod’s Tournament results

TIT-FOR-TAT won the first tournament
A second tournament was called
TIT-FOR-TAT won the second tournament as well

33

Recipes for Success in Axelrod’s Tournament

Axelrod suggests the following rules for succeeding in his tournament:
  Don’t be envious: Don’t play as if it were zero sum!
  Be nice: Start by cooperating, and reciprocate cooperation
  Retaliate appropriately: Always punish defection immediately, but use “measured” force; don’t overdo it
  Don’t hold grudges: Always reciprocate cooperation immediately

34

Competitive and Zero-Sum Interactions

Where preferences of agents are diametrically opposed we have strictly competitive scenarios
Zero-sum encounters are those where utilities sum to zero:
  uᵢ(ω) + uⱼ(ω) = 0 for all ω ∈ Ω

Zero sum implies strictly competitive
Zero sum encounters in real life are very rare, but people tend to act in many scenarios as if they were zero sum

35

Matching Pennies

Players i and j simultaneously choose the face of a coin, either “heads” or “tails”.
If they show the same face, then i wins, while if they show different faces, then j wins.

36

Mixed Strategies for Matching Pennies

No pair of strategies forms a pure strategy Nash Equilibrium: whatever pair of strategies is chosen, somebody will wish they had done something else.
The solution is to allow mixed strategies:
  play “heads” with probability 0.5
  play “tails” with probability 0.5.

This is a Nash Equilibrium strategy.
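Why 0.5/0.5 is an equilibrium can be checked numerically. The sketch below assumes, since the slides give no numbers, that i receives +1 on a match and -1 otherwise and that j receives the negation; the class `MatchingPennies` is a hypothetical name. Because expected payoff is linear in a player's own probability, it is enough to test deviations to the two pure strategies.

```java
/** Expected payoffs in matching pennies under mixed strategies (illustrative sketch). */
final class MatchingPennies {
    /**
     * Expected payoff of player i when i plays heads with probability p
     * and j plays heads with probability q. Assumes i wins 1 on a match
     * and loses 1 otherwise; j's payoff is the negation (zero sum).
     */
    static double expectedPayoffI(double p, double q) {
        double match = p * q + (1 - p) * (1 - q);   // probability that both faces agree
        return match - (1 - match);
    }

    /** True if neither player can gain more than eps by switching to a pure strategy. */
    static boolean isEquilibrium(double p, double q, double eps) {
        double current = expectedPayoffI(p, q);
        double bestForI = Math.max(expectedPayoffI(1, q), expectedPayoffI(0, q));
        double bestForJ = Math.max(-expectedPayoffI(p, 1), -expectedPayoffI(p, 0));
        return bestForI <= current + eps && bestForJ <= -current + eps;
    }
}
```

At p = q = 0.5 every deviation leaves the expected payoff at 0, so nobody can improve. At p = 0.7, q = 0.5 player j can gain by always playing tails, so that pair is not an equilibrium.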

37

Mixed Strategies

A mixed strategy has the form
  play α₁ with probability p₁
  play α₂ with probability p₂
  ...
  play αₖ with probability pₖ
such that p₁ + p₂ + … + pₖ = 1.
Nash proved that every finite game has a Nash equilibrium in mixed strategies.

Social Choice

39

Social Choice

Social choice theory is concerned with group decision making.
Classic example of social choice theory: voting.
Formally, the issue is combining preferences to derive a social outcome.

40

Components of a Social Choice Model

Assume a set Ag = {1, …, n} of voters.
These are the entities who express preferences.
Voters make group decisions with respect to a set Ω = {ω₁, ω₂, …} of outcomes.
Think of these as the candidates.
If |Ω| = 2, we have a pairwise election.

41

Preferences

Each voter has preferences over Ω: an ordering over the set of possible outcomes Ω.
Example: suppose
  Ω = {gin, rum, brandy, whisky}
then we might have agent i with preference order:
  ωᵢ = (brandy, rum, gin, whisky)

meaning:
  brandy ≻ᵢ rum ≻ᵢ gin ≻ᵢ whisky

42

Preference Aggregation

The fundamental problem of social choice theory:
  Given a collection of preference orders, one for each voter, how do we combine these to derive a group decision that reflects as closely as possible the preferences of voters?
Variants of preference aggregation:
  social welfare functions;
  social choice functions.

43

Social Welfare Functions

Let Π(Ω) be the set of preference orderings over Ω.
A social welfare function takes the voter preferences and produces a social preference order:
  f : Π(Ω) × … × Π(Ω) → Π(Ω)

We define ≻* as the outcome of a social welfare function
  whisky ≻* gin ≻* brandy ≻* rum ≻* gin
  S ≻* M ≻* SD ≻* MP ≻* C ≻* V ≻* FP ≻* KD ≻* FI ≻* PP

44

Social Choice Functions

Sometimes, we want just to select one of the possible candidates, rather than a social order.
This gives social choice functions:
  f : Π(Ω) × … × Π(Ω) → Ω

Example: presidential election.

45

Voting Procedures: Plurality

Social choice function: selects a single outcome.
Each voter submits preferences.
Each candidate gets one point for every preference order that ranks them first.
Winner is the one with the largest number of points.
Example: Political elections in UK, France, USA...

If we have only two candidates, then plurality is a simple majority election.

46

Anomalies with Plurality

Suppose |Ag| = 100 and Ω = {ω₁, ω₂, ω₃} with:
  40% of voters voting for ω₁
  30% of voters voting for ω₂
  30% of voters voting for ω₃

With plurality, ω₁ gets selected even though a clear majority (60%) prefer another candidate!
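The anomaly can be reproduced with a small tally. In this sketch the class `Plurality`, the ASCII names `w1`, `w2`, `w3` (standing for ω₁, ω₂, ω₃) and the full rankings are assumptions: the slide only gives first choices, so the 60 voters who do not put ω₁ first are assumed to rank it last.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plurality voting over ranked ballots, most preferred candidate first (illustrative sketch). */
final class Plurality {
    /** One point per ballot for the candidate ranked first. */
    static Map<String, Integer> tally(List<List<String>> ballots) {
        Map<String, Integer> points = new TreeMap<>();
        for (List<String> ballot : ballots) points.merge(ballot.get(0), 1, Integer::sum);
        return points;
    }

    /** Candidate with the most points; ties go to the alphabetically first name (an assumption). */
    static String winner(List<List<String>> ballots) {
        String best = null;
        int bestPoints = -1;
        for (Map.Entry<String, Integer> entry : tally(ballots).entrySet())
            if (entry.getValue() > bestPoints) {
                best = entry.getKey();
                bestPoints = entry.getValue();
            }
        return best;
    }

    /** Number of ballots that rank x above y. */
    static int preferring(List<List<String>> ballots, String x, String y) {
        int count = 0;
        for (List<String> ballot : ballots)
            if (ballot.indexOf(x) < ballot.indexOf(y)) count++;
        return count;
    }

    /** 100 voters: 40 put w1 first; the other 60 are assumed to rank w1 last. */
    static List<List<String>> example() {
        List<List<String>> ballots = new ArrayList<>();
        for (int k = 0; k < 40; k++) ballots.add(List.of("w1", "w2", "w3"));
        for (int k = 0; k < 30; k++) ballots.add(List.of("w2", "w3", "w1"));
        for (int k = 0; k < 30; k++) ballots.add(List.of("w3", "w2", "w1"));
        return ballots;
    }
}
```

Under these assumed rankings, `winner` returns `w1` with 40 points although 60 of the 100 ballots rank `w2` above `w1`.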

47

Strategic Manipulation by Tactical Voting

Suppose your preferences are
  ω₁ ≻ ω₂ ≻ ω₃

while you believe 49% of voters have preferences
  ω₂ ≻ ω₁ ≻ ω₃

and you believe 49% have preferences
  ω₃ ≻ ω₂ ≻ ω₁

You may do better voting for ω₂, even though this is not your true preference profile.
This is tactical voting: an example of strategic manipulation of the vote.
Especially a problem in two-leg (two-round) elections

48

Condorcet’s Paradox

Suppose Ag = {1, 2, 3} and Ω = {ω₁, ω₂, ω₃} with:
  ω₁ ≻₁ ω₂ ≻₁ ω₃
  ω₂ ≻₂ ω₃ ≻₂ ω₁
  ω₃ ≻₃ ω₁ ≻₃ ω₂

For every possible candidate, there is another candidate that is preferred by a majority of voters!
This is Condorcet’s paradox: there are situations in which, no matter which outcome we choose, a majority of voters will be unhappy with the outcome chosen.
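The cycle in this three-voter profile can be verified with pairwise majority comparisons. The class `Condorcet` and its method names are hypothetical, and `w1`, `w2`, `w3` are ASCII stand-ins for ω₁, ω₂, ω₃.

```java
import java.util.List;

/** Pairwise majority comparisons over ranked ballots (illustrative sketch). */
final class Condorcet {
    /** True if a strict majority of ballots rank x above y. */
    static boolean majorityPrefers(List<List<String>> ballots, String x, String y) {
        int count = 0;
        for (List<String> ballot : ballots)
            if (ballot.indexOf(x) < ballot.indexOf(y)) count++;
        return 2 * count > ballots.size();
    }

    /** Candidate that beats every other candidate pairwise, or null if there is none. */
    static String condorcetWinner(List<List<String>> ballots, List<String> candidates) {
        for (String x : candidates) {
            boolean beatsAll = true;
            for (String y : candidates)
                if (!x.equals(y) && !majorityPrefers(ballots, x, y)) beatsAll = false;
            if (beatsAll) return x;
        }
        return null;
    }

    /** The three ballots of voters 1, 2 and 3 from the paradox. */
    static List<List<String>> paradoxBallots() {
        return List.of(
            List.of("w1", "w2", "w3"),
            List.of("w2", "w3", "w1"),
            List.of("w3", "w1", "w2"));
    }
}
```

For the paradox ballots a majority prefers `w1` to `w2`, `w2` to `w3`, and `w3` to `w1`, so `condorcetWinner` returns `null`: whichever candidate is chosen, two of the three voters prefer someone else.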

49

Applications of social choice theory

Main application is for human choice and decision making
Results aggregation: aggregate the output of several search engines

Auctions

51

Application of auctions

With the rise of the Internet, auctions have become popular in many e-commerce applications (e.g. eBay)
Auctions are an efficient tool for reaching agreements in a society of self-interested agents
  For example, bandwidth allocation on a network, sponsored links

Auctions can be used for efficient resource allocation within decentralized computational systems
Frequently utilized for solving multi-agent and multi-robot coordination problems
  For example, team-based exploration of unknown terrain

52

What is an Auction?

An auction takes place between an agent known as the auctioneer and a collection of agents known as the bidders
The goal of the auction is for the auctioneer to allocate all goods to the bidders
The auctioneer desires to maximize the price and bidders desire to minimize the price

53

Limit Price

Each trader has a value or limit price that they place on the good.
A buyer who exchanges more than their limit price for a good makes a loss.
A seller who exchanges a good for less than their limit price makes a loss.

Limit prices clearly have an effect on the behavior of traders.
There are several models, embodying different assumptions about the nature of the good.

54

Limit Price

Private value
  The good has a value to me that is independent of what it is worth to you.
  Textbook gives the example of John Lennon’s last dollar bill.

Common value
  The good has the same value to all of us, but we have differing estimates of what it is.
  Winner’s curse

Correlated value
  Our values are related.
  The more you are prepared to pay, the more I should be prepared to pay.

55

Winner's curse

Termed in the 1950s:
  Oil companies bid for drilling rights in the Gulf of
  Problem was the bidding process given the uncertainties in estimating the potential value of an offshore oil field
  “Competitive bidding in high risk situations”, by Capen, Clapp and Campbell, Journal of Petroleum Technology, 1971

For example
  An oil field had an actual intrinsic value of $10 million
  Oil companies might guess its value to be anywhere from $5 million to $20 million
  The company who wrongly estimated at $20 million and placed a bid at that level would win the auction, and later find that it was not worth that much

In many cases the winner is the person who has overestimated the most ⇒ “The Winner’s curse”
Bid Shading: Offer a bid that is a certain amount below the valuation

56

Auction Characteristics

Auction procedure
  One shot: Only one bidding round
  Ascending: Auctioneer begins at minimum price, bidders increase the price
  Descending: Auctioneer begins at a price over the value of the good and lowers the price at each round
  Continuous: Internet

Auctions may be
  Standard Auction: One seller and multiple buyers
  Reverse Auction: One buyer and multiple sellers
  Double Auction: Multiple sellers and multiple buyers

Combinatorial Auctions
  Buyers and sellers may have combinatorial valuations for bundles of goods

57

Single versus Multi-dimensional

Single dimensional auctions
  The only content of an offer is the price and quantity of some specific type of good.
  “I’ll bid $200 for those 2 chairs”

Multi dimensional auctions
  Offers can relate to many different aspects of many different goods.
  “I’m prepared to pay $200 for those two red chairs, but $300 if you can deliver them tomorrow.”
  Frequency ranges for cell phones

Single Dimension Auctions

59

English Auction

An example of first-price open-cry ascending auctions
Protocol:
  Auctioneer starts by offering the good at a low price
  Auctioneer offers higher prices until no agent is willing to pay the proposed level
  The good is allocated to the agent that made the highest offer

Properties
  Generates competition between bidders (generates revenue for the seller when bidders are uncertain of their valuation)
  Dominant strategy: Bid slightly more than the current bid, withdraw if the bid reaches your personal valuation of the good
  Winner’s curse (for common value goods)

60

Dutch Auction

Dutch auctions are examples of first-price open-cry descending auctions
Protocol:
  Auctioneer starts by offering the good at an artificially high value
  Auctioneer lowers the offer price until some agent makes a bid equal to the current offer price
  The good is then allocated to the agent that made the offer

Properties
  Items are sold rapidly (can sell many lots within a single day)
  Intuitive strategy: wait for a little bit after your true valuation has been called and hope no one else gets in there before you (no general dominant strategy)
  Winner’s curse also possible

61

First-Price Sealed-Bid Auctions

First-price sealed-bid auctions are one-shot auctions:
Protocol:
  Within a single round bidders submit a sealed bid for the good
  The good is allocated to the agent that made the highest bid
  Winner pays the price of the highest bid
  Often used in commercial auctions, e.g., public building contracts etc.

Problem: the difference between the highest and second highest bid is “wasted money” (the winner could have offered less)
Intuitive strategy: bid a little bit less than your true valuation (no general dominant strategy)
The more bidders there are, the smaller the deviation should be!

62

Vickrey Auctions

Proposed by William Vickrey in 1961 (Nobel Prize in Economic Sciences in 1996)

Vickrey auctions are examples of second-price sealed-bid one-shot auctions
Protocol:
  within a single round bidders submit a sealed bid for the good
  good is allocated to the agent that made the highest bid
  winner pays the price of the second highest bid

Dominant strategy: bid your true valuation
  if you bid more, you risk paying too much
  if you bid less, you lower your chances of winning while still having to pay the same price in case you win

Antisocial behavior: bid more than your true valuation to make opponents suffer (not “rational”)

For private value auctions, strategically equivalent to the English auction mechanism
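The difference between the two sealed-bid pricing rules, and why truthful bidding is safe under the second-price rule, can be seen in a few lines. The class `SealedBidAuction`, the tie-breaking rule and the numbers in the usage note are illustrative assumptions; at least two bids are expected.

```java
/** First-price and second-price (Vickrey) sealed-bid auctions (illustrative sketch). */
final class SealedBidAuction {
    /** Index of the highest bid; ties go to the lowest index (an assumption). */
    static int winner(double[] bids) {
        int best = 0;
        for (int k = 1; k < bids.length; k++)
            if (bids[k] > bids[best]) best = k;
        return best;
    }

    /** First-price rule: the winner pays its own bid. */
    static double firstPrice(double[] bids) {
        return bids[winner(bids)];
    }

    /** Second-price rule: the winner pays the highest bid among the other bidders. */
    static double secondPrice(double[] bids) {
        int w = winner(bids);
        double second = Double.NEGATIVE_INFINITY;
        for (int k = 0; k < bids.length; k++)
            if (k != w) second = Math.max(second, bids[k]);
        return second;
    }

    /** Utility of bidder `me` with private value `value` under the second-price rule. */
    static double vickreyUtility(double[] bids, int me, double value) {
        return winner(bids) == me ? value - secondPrice(bids) : 0.0;
    }
}
```

Suppose bidder 0 values the good at 10. Bidding truthfully against 8 and 6 wins at price 8 (utility 2). Bidding 7 instead loses the good (utility 0). Bidding 12 against a rival bid of 11 wins but costs 11 (utility -1).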

63

Generalized first price auctions

Used by Yahoo for “sponsored links” auctions
Introduced in 1997 for selling Internet advertising by Yahoo/Overture (before there were only “banner ads”)
Advertisers submit a bid reporting the willingness to pay on a per-click basis for a particular keyword
  Cost-Per-Click (CPC) bid

Advertisers were billed for each “click” on sponsored links leading to their page
The links were arranged in descending order of bids, making the highest bids the most prominent
Auctions take place during each

However, the auction mechanism turned out to be unstable!
  Bidders revised their bids as often as possible

64

Generalized second price auctions

Introduced by Google for pricing sponsored links (AdWords Select)
Observation: Bidders generally do not want to pay much more than the rank below them
Therefore: 2nd price auction
Further modifications:
  Advertisers bid for keywords and keyword combinations
  Rank: CPC_BID × quality score
  Price: with respect to lower ranks

http://www.chipkin.com/google-adwords-actual-cpc-calculation/
After seeing Google’s success, Yahoo also switched to second price auctions in 2002

Combinatorial Auctions

66

Combinatorial Auctions

In a combinatorial auction, the auctioneer puts several goods on sale and the other agents submit bids for entire bundles of goods
Given a set of bids, the winner determination problem is the problem of deciding which of the bids to accept
The solution must be feasible (no good may be allocated to more than one agent)
Ideally, it should also be optimal (in the sense of maximizing revenue for the auctioneer)
A challenging algorithmic problem

67

Complements and Substitutes

The value an agent assigns to a bundle of goods may depend on the combination
Complements: The value assigned to a set is greater than the sum of the values assigned to its elements
  Example: “a pair of shoes” (a left shoe and a right shoe)

Substitutes: The value assigned to a set is lower than the sum of the values assigned to its elements
  Example: a ticket to the theatre and another one to a football match for the same night

In such cases an auction mechanism allocating one item at a time is problematic, since the best bidding strategy in one auction may depend on the outcome of other auctions

68

Protocol

One auctioneer, several bidders, and many items to be sold
Each bidder submits a number of package bids specifying the valuation (price) the bidder is prepared to pay for a particular bundle
The auctioneer announces a number of winning bids
The winning bids determine which bidder obtains which item, and how much each bidder has to pay
No item may be allocated to more than one bidder

Examples of package bids:
  Agent 1: ({a,b}, 5), ({b,c}, 7), ({c,d}, 6)
  Agent 2: ({a,d}, 7), ({a,c,d}, 8)
  Agent 3: ({b}, 5), ({a,b,c,d}, 12)

Generally, there are 2ⁿ − 1 non-empty bundles for n items; how to compute the optimal solution?

69

Optimal Winner Determination Algorithm

An auctioneer has a set of items M = {1, 2, …, m} to sell
There are N = {1, 2, …, n} buyers placing bids
Buyers submit a set of package bids B = {B₁, B₂, …, Bₙ}
A package bid is a tuple B = [S, vᵢ(S)], where S ⊆ M is a set of items (bundle) and vᵢ(S) > 0 is buyer i’s true valuation
x_{S,i} ∈ {0, 1} is a decision variable for assigning bundle S to buyer i
The winner determination problem (WDP) is to label the bids as winning or losing (by deciding each x_{S,i}) so as to maximize the sum of the accepted bid prices
This is NP-Complete!
Can be solved with an integer program solver, or heuristic search
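For very small instances the winner determination problem can be solved by trying every subset of bids. The sketch below is not the integer-programming or heuristic-search approach named above. It is a brute-force baseline, with a hypothetical class `WinnerDetermination`, applied to the package bids of agents 1 to 3 from the protocol example. It assumes that one bidder may win several of its bids, which the slides do not specify.

```java
import java.util.List;

/** Exhaustive winner determination for a small combinatorial auction (illustrative sketch). */
final class WinnerDetermination {
    /** A package bid: the bundle as a bit mask over items, and the offered price. */
    static final class Bid {
        final int bundle;
        final int price;

        Bid(String items, int price) {
            int mask = 0;
            for (char item : items.toCharArray()) mask |= 1 << (item - 'a');   // 'a' is bit 0, 'b' is bit 1, ...
            this.bundle = mask;
            this.price = price;
        }
    }

    /** Tries every subset of bids and keeps the best revenue among the feasible ones. */
    static int bestRevenue(List<Bid> bids) {
        int best = 0;
        for (int subset = 0; subset < (1 << bids.size()); subset++) {
            int used = 0, revenue = 0;
            boolean feasible = true;
            for (int k = 0; k < bids.size(); k++) {
                if ((subset & (1 << k)) == 0) continue;     // bid k is not accepted in this subset
                Bid bid = bids.get(k);
                if ((used & bid.bundle) != 0) {             // an item would be sold twice
                    feasible = false;
                    break;
                }
                used |= bid.bundle;
                revenue += bid.price;
            }
            if (feasible) best = Math.max(best, revenue);
        }
        return best;
    }

    /** The package bids of agents 1 to 3 from the protocol example. */
    static List<Bid> exampleBids() {
        return List.of(
            new Bid("ab", 5), new Bid("bc", 7), new Bid("cd", 6),   // agent 1
            new Bid("ad", 7), new Bid("acd", 8),                    // agent 2
            new Bid("b", 5), new Bid("abcd", 12));                  // agent 3
    }
}
```

On the example the best revenue is 14, obtained by accepting ({b,c}, 7) and ({a,d}, 7). The loop runs over 2^k subsets for k bids, which is exactly why this approach stops being practical beyond a few dozen bids.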

70

Solving WDPs by Heuristic Search

Two ways of representing the state
Branch-on-items:
  A state is a set of items for which an allocation decision has already been made
  Branching is carried out by adding a further item

Branch-on-bids:
  A state is a set of bids for which an acceptance decision has already been made
  Branching is carried out by adding a further bid

71

Branch-on-Items

Branching based on the question: “What bid should this item be assigned to?”
Each path in the search tree consists of a sequence of disjoint bids
  Bids that do not share items with each other
A path ends when no bid can be added to it

Costs at each node are the sum of the prices of the bids accepted on the path

72

Problem with branch-on-items

What if the auctioneer's revenue can increase by keeping items?
Example:
  There is no bid for 1, a $5 bid for 2, and a $3 bid for {1, 2}
  Thus, it is better to keep 1 and sell 2 than to sell both
The auctioneer's possibility of keeping items can be implemented by placing dummy bids of price zero on those items that received no 1-item bids (Sandholm 2002)

73

Example of branch-on-items

Bids: {1,2}, {2,3}, {3}, {1,3}
We add dummy bids: {1}, {2}

74

Branch-on-bids

Branching is based on the question: “Should this bid be accepted or rejected?”
  Binary tree

When branching on a bid, the children in the search tree are the world where that bid is accepted (IN), and the world where that bid is rejected (OUT)
No dummy bids are needed
First a bid graph is constructed that represents all constraints between the bids
Then, bids are accepted/rejected until all bids have been handled
  On accept: remove all constrained bids from the graph
  On reject: remove the bid itself from the graph

75

Branch-on-bids - Example

Bids: {1,2}, {2,3}, {3}, {1,3}

76

Heuristic Function

For any node N in the search tree, let g(N) be the revenue generated by the bids that were accepted on the path to N
The heuristic function h(N) estimates for every node N how much additional revenue can be expected on going from N
An upper bound on h(N) is given by the sum over the maximum contribution of the set of unallocated items A:
  h(N) ≤ Σ_{i ∈ A} c(i), where c(i) = max_{S ∋ i} v(S) / |S|

Tighter bounds can be obtained by solving the linear program relaxation of the remaining items (Sandholm 2006)
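A minimal sketch of branch-on-bids with bound-based pruning. Several things here are assumptions rather than the lecture's method: the class `BranchOnBids` is hypothetical, bids are branched on in the order given instead of via a bid graph, and the contribution of a free item is taken to be the largest price-per-item among the still-available bids that contain it. A branch is cut when g plus this bound cannot beat the best revenue found so far.

```java
/** Depth-first branch-on-bids search with an upper-bound heuristic (illustrative sketch). */
final class BranchOnBids {
    private final int[] bundles;     // bundles[k]: items of bid k as a bit mask
    private final double[] prices;   // prices[k]: price offered by bid k
    private final int itemCount;
    private double best;

    BranchOnBids(int[] bundles, double[] prices, int itemCount) {
        this.bundles = bundles;
        this.prices = prices;
        this.itemCount = itemCount;
    }

    /** Upper bound on the extra revenue from bids next..end, given the items already sold. */
    double upperBound(int next, int used) {
        double bound = 0;
        for (int item = 0; item < itemCount; item++) {
            if ((used & (1 << item)) != 0) continue;          // item already allocated
            double share = 0;
            for (int k = next; k < bundles.length; k++)
                if ((bundles[k] & (1 << item)) != 0 && (bundles[k] & used) == 0)
                    share = Math.max(share, prices[k] / Integer.bitCount(bundles[k]));
            bound += share;                                   // best per-item share for this item
        }
        return bound;
    }

    private void search(int next, int used, double g) {
        best = Math.max(best, g);
        if (next == bundles.length) return;
        if (g + upperBound(next, used) + 1e-9 < best) return;            // prune: cannot beat the incumbent
        if ((bundles[next] & used) == 0)
            search(next + 1, used | bundles[next], g + prices[next]);    // IN: accept the bid
        search(next + 1, used, g);                                       // OUT: reject the bid
    }

    /** Returns the maximum revenue over all feasible sets of bids. */
    double solve() {
        best = 0;
        search(0, 0, 0);
        return best;
    }
}
```

With items a to d encoded as bits 0 to 3, the package bids ({a,b},5), ({b,c},7), ({c,d},6), ({a,d},7), ({a,c,d},8), ({b},5), ({a,b,c,d},12) become the masks 3, 6, 12, 9, 13, 2, 15. The root bound is 15.5 and the search returns 14, so the bound overestimates but never underestimates the achievable revenue.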

77

Auctions for Multi-Robot Exploration

Consider a team of mobile robots that has to visit a number of given targets (locations) in initially partially unknown terrain
Examples of such tasks are cleaning missions, space exploration, surveillance, and search and rescue
Continuous re-allocation of targets to robots is necessary
  For example, robots might discover that they are separated by a blockage from their target

To allocate and re-allocate the targets among themselves, the robots can use auctions where they sell and buy targets
Team objective can be to minimize the sum of all path costs; hence, bidding prices are estimated travel costs
The path cost of a robot is the sum of the edge costs along its path, from its current location to the last target that it visits

78

Multi-Robot Exploration

Three robots exploring Mars. The robots’ task is to gather data around the four craters, e.g. to visit the highlighted target sites.
Source: N. Kalra

79

General Exploration

Each robot always follows a minimum cost path that visits all its allocated targets
Whenever a robot gains more information about the terrain, it shares this information with the other robots
If the remaining path of at least one robot is blocked, then all robots put their unvisited targets up for auction
The auction(s) close after a predetermined amount of time
Constraints: each robot wins at most one bundle and each target is contained in exactly one bundle

After each auction, robots have gained new targets or exchanged targets with other robots
Then, the cycle repeats

80

Single-Round Combinatorial Auction

Protocol:
  Every robot bids on all possible bundles of targets
  The valuation is the estimated smallest path cost needed to visit all targets in the bundle (TSP)
  A central auctioneer determines and informs the winning robots within one round

Optimal team performance:
  Combinatorial auctions take all positive and negative synergies between targets into account
  Minimization of the total path costs

Drawbacks:
  Robots cannot bid on all possible bundles of targets because the number of possible bundles is exponential in the number of targets
  Calculating the cost of each bundle requires computing the smallest path cost for visiting a set of targets (Traveling Salesman Problem)
  Winner determination is NP-hard

81

Parallel Single-Item Auctions

Protocol:
  Every robot bids on each target in parallel until all targets are assigned
  The valuation is the smallest path cost from the robot's current position to the target

Similar to Target Clustering

Advantage: Simple to implement, and computation and communication efficient

Disadvantage: The team performance can be highly suboptimal since it does not take any synergies between the targets into account

82

Sequential Single-Item Auctions

Protocol:
  Targets are auctioned one after another in the sequence T1, T2, T3, T4, …
  A robot's valuation is the increase in its smallest path cost that results from winning the auctioned target
  The robot with the overall smallest bid is allocated the corresponding target
  Finally, each robot calculates the minimum-cost path for visiting all of its targets and moves along this path

Advantages:
  Hill climbing search: some synergies between targets are taken into account (but not all of them)
  Simple to implement, and computation and communication efficient
  Since robots can determine the winners by listening to the bids (and identifying the smallest bid), the method can be executed in a decentralized way

Disadvantages:
  The order of the targets changes the result
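The sequential protocol can be sketched on a deliberately simple world. The assumptions are loud ones: robots and targets live on a one-dimensional line, so the smallest path cost can be computed exactly without solving a TSP. The class `SequentialAuction` and its methods are hypothetical, and ties go to the robot with the lowest index.

```java
import java.util.ArrayList;
import java.util.List;

/** Sequential single-item auction for robots and targets on a line (illustrative sketch). */
final class SequentialAuction {
    /** Smallest cost of a path on a line that starts at `start` and visits all targets. */
    static double pathCost(double start, List<Double> targets) {
        if (targets.isEmpty()) return 0;
        double lo = start, hi = start;
        for (double t : targets) {
            lo = Math.min(lo, t);
            hi = Math.max(hi, t);
        }
        // Sweep the whole interval once, after first walking to its nearer end.
        return (hi - lo) + Math.min(start - lo, hi - start);
    }

    /** Auctions the targets in the given order; returns each robot's list of won targets. */
    static List<List<Double>> allocate(double[] robots, double[] targets) {
        List<List<Double>> won = new ArrayList<>();
        for (int r = 0; r < robots.length; r++) won.add(new ArrayList<>());
        for (double target : targets) {
            int bestRobot = -1;
            double bestBid = Double.POSITIVE_INFINITY;
            for (int r = 0; r < robots.length; r++) {
                List<Double> extended = new ArrayList<>(won.get(r));
                extended.add(target);
                // Bid: increase in the robot's smallest path cost if it wins this target.
                double bid = pathCost(robots[r], extended) - pathCost(robots[r], won.get(r));
                if (bid < bestBid) {
                    bestBid = bid;
                    bestRobot = r;
                }
            }
            won.get(bestRobot).add(target);
        }
        return won;
    }

    /** Team objective: sum of the robots' smallest path costs. */
    static double totalCost(double[] robots, List<List<Double>> won) {
        double total = 0;
        for (int r = 0; r < robots.length; r++) total += pathCost(robots[r], won.get(r));
        return total;
    }
}
```

With robots at 0 and 10, auctioning the targets in the order 4, 6 gives both targets to the first robot, while the order 6, 4 gives both to the second. The total cost is 6 in both cases, compared with 8 if each target simply went to its nearest robot, which illustrates both the synergy effect and the dependence on the auction order.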

83/83

Summary

Utilities and competitive
Voting mechanism
We discussed English, Dutch, First-Price Sealed-Bid, and Vickrey auctions
Generalized second price auctions have shown good properties in practice; however, “truth telling” is not a dominant strategy
Combinatorial auctions are a mechanism to allocate a number of goods to a number of agents