lectures - department of computer and information science...

Post on 26-Mar-2020

Page 1: Lectures - Department of Computer and Information Science ...TDDD10/lectures/09_automated_planning.pdf · HSP [Bonet & Geffner] FastForward [Hoffmann] Configurable planners 28 Configurable

TDDD10 AI Programming

Automated Planning
Cyrille Berger

Planning slides borrowed from Dana Nau: http://www.cs.umd.edu/~nau/planning/slides/

Lectures

1 AI Programming:
2 Introduction to
3 Agents and Agents
4 Multi-Agent and
5 Multi-Agent Decision
6 Cooperation And Coordination
7 Cooperation And Coordination
8 Machine
9 Automated Planning
10 Putting It All

Lecture content

Automated Planning
  What is planning?
  Type of planners
  Domain-dependent planners
  Configurable planners
Reinforcement Learning
  Task allocation learning
Individual and Group Assignment

Automated Planning

What is planning?

Dictionary Definitions of “Plan”

1 A scheme, program, or method worked out beforehand for the accomplishment of an objective: a plan of attack.
2 A proposed or tentative project or course of action: had no plans for the evening.
3 A systematic arrangement of elements or important parts; a configuration or outline: a seating plan; the plan
4 A drawing or diagram made to scale showing the structure or arrangement of something.
5 A program or policy stipulating a service or benefit: a pension plan.

AI Definition of Plan

[a representation] of future behavior (...) usually a set of actions, with temporal and other constraints on them, for execution by some agent or agents.
– Austin Tate, MIT Encyclopedia of the Cognitive Sciences, 1999

State transition system

Real world is absurdly complex, need to approximate
  Only represent what the planner needs to reason about
State transition system Σ = (S, A, E, ɣ)
  S = {abstract states}
    e.g., states might include a robot’s location, but not its position and orientation
  A = {abstract actions}
    e.g., “move robot from loc2 to loc1” may need complex lower-level implementation
  E = {abstract exogenous events}
    Not under the agent’s control
  ɣ = state transition function
    Gives the next state, or possible next states, after an action or event
    ɣ: S × (A ∪ E) → S or ɣ: S × (A ∪ E) → {S₁, ...

Example

States S = {s₀, …, s₅}
A = {move1, move2, put, take, load, unload}
E = ∅
ɣ: S × A → S: defined on the right side

From plan to execution

Planning problem

Description of Σ = (S, A, E, ɣ)
Initial state or set of states
Objective
  Goal state, set of goal states, set of tasks, “trajectory” of states, objective function, …
Example:
  Initial state = s₀
  Goal state = s₅

Plan

Classical plan: a sequence of actions:
  ⟨take, move1, load, move2⟩
Policy: partial function from S into A
  {(s₀, take), (s₁, move1), (s₃, load), (s₄, move2)}
  {(s₀, move1), (s₂, take), (s₃, load), (s₄, move2)}
Both, if executed starting at s₀, produce s₅
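
A minimal Python sketch of executing the classical plan and the two policies from this slide. The transition table GAMMA is an assumption: the figure that defines ɣ is not reproduced in this transcript, so the six transitions are inferred from the two policies and from the statement that both reach s₅ from s₀. The names run_plan and run_policy are illustrative only.

```python
# Transitions inferred from the two policies on the slide (an assumption:
# the figure defining the full transition function is not available).
GAMMA = {
    ("s0", "take"): "s1",
    ("s0", "move1"): "s2",
    ("s1", "move1"): "s3",
    ("s2", "take"): "s3",
    ("s3", "load"): "s4",
    ("s4", "move2"): "s5",
}


def run_plan(state, plan):
    """Apply a fixed sequence of actions and return the final state."""
    for action in plan:
        state = GAMMA[(state, action)]
    return state


def run_policy(state, policy, max_steps=10):
    """Follow a partial function from states to actions until it is undefined."""
    for _ in range(max_steps):
        if state not in policy:
            break
        state = GAMMA[(state, policy[state])]
    return state


plan = ["take", "move1", "load", "move2"]
policy_1 = {"s0": "take", "s1": "move1", "s3": "load", "s4": "move2"}
policy_2 = {"s0": "move1", "s2": "take", "s3": "load", "s4": "move2"}

print(run_plan("s0", plan))
print(run_policy("s0", policy_1))
print(run_policy("s0", policy_2))
```

The plan is a fixed list that ignores the state it is in, while a policy looks up the action from the current state, which is why two different policies can describe two different routes to the same goal.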

Planning and Scheduling

Scheduling
  Decide when and how to perform a given set of actions
    Time and resource constraints and priorities
  e.g. the scheduler of your kernel
  NP-complete
Planning
  Decide what actions to use to achieve some set of objectives
  Can be much worse than NP-complete; worst case is undecidable

Applications

Robotics
  Sequence of actions
  Path and motion planning
Industrial
  Printer
  Production machines: sheet-metal bending
Forest fire
Packages
Games: playing bridge, ...

Type of planners

Type of planners

Domain-specific
  Made or tuned for a specific planning domain
  Won’t work well (if at all) in other planning domains
Domain-independent
  In principle, works in any planning domain
  In practice, need restrictions on what kind of planning domain
Configurable
  Domain-independent planning engine
  Input includes info about how to solve problems in some domain

Domain-Specific Planners

Most successful real-world planning systems work this way:
  Mars exploration, sheet-metal bending, playing bridge, etc.
Often use problem-specific techniques that are difficult to generalize to other planning domains

Domain-Specific Planner - Example

Bridge
After the cards are dealt, players make bids and prepare a plan that they will follow during the game
Domain specific:
  Remove states depending on the cards you are holding
  For instance, North will not choose "heart" as trump suit

Domain-Independent Planners

In principle, works in any planning domain
  No domain-specific knowledge except the description of the system Σ
In practice
  Not feasible to make domain-independent planners work well in all possible planning domains
Make simplifying assumptions to restrict the set of domains
  Classical planning
  Historical focus of most research on automated planning

Restrictive Assumptions

A0: Finite system: finitely many states, actions, events
A1: Fully observable: the controller always knows Σ’s current state
A2: Deterministic: each action has only one outcome
A3: Static (no exogenous events): no changes but the controller’s actions
A4: Attainment goals: a set of goal states Sg
A5: Sequential plans: a plan is a linearly ordered sequence of actions (a₁, a₂, ..., aₙ)
A6: Implicit time: no time durations; linear sequence of instantaneous states
A7: Off-line planning: planner doesn’t know the execution status

Domain-dependent planners

Classical Planning

Classical planning requires all eight restrictive assumptions
  Offline generation of action sequences for a deterministic, static, finite system, with complete knowledge, attainment goals, and implicit time
Reduces to the following
  Given a planning problem P = (Σ, s₀, Sg)
  Find a sequence of actions (a₁, a₂, ..., aₙ) that produces a sequence of state transitions (s₁, s₂, ..., sₙ) such that sₙ is in Sg.
This is just path-searching in a graph
  Nodes = states
  Edges = actions
Is this
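
The reduction to path search can be sketched as a breadth-first search over states. This is a minimal illustration, not a planner from the lecture: the transition table GAMMA is a hypothetical six-transition system, and find_plan is an illustrative name.

```python
from collections import deque

# Hypothetical deterministic transition table: (state, action) -> next state.
GAMMA = {
    ("s0", "take"): "s1",
    ("s0", "move1"): "s2",
    ("s1", "move1"): "s3",
    ("s2", "take"): "s3",
    ("s3", "load"): "s4",
    ("s4", "move2"): "s5",
}


def find_plan(gamma, s0, goals):
    """Breadth-first search from s0; returns a shortest action list or None."""
    frontier = deque([(s0, [])])
    visited = {s0}
    while frontier:
        state, plan = frontier.popleft()
        if state in goals:
            return plan
        # Expand every action applicable in this state (edges leaving the node).
        for (src, action), nxt in gamma.items():
            if src == state and nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, plan + [action]))
    return None


print(find_plan(GAMMA, "s0", {"s5"}))
```

The search enumerates states explicitly, so it only works while the state set is small enough to store in the visited set.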

Classical Planning

Generalize the earlier example:
  5 locations,
  3 robot vehicles,
  100 containers,
  3 pallets to stack containers on
  This is probably just a single boat...
Then there are 10²⁷⁷ states
  Number of particles in the universe is only about 10⁸⁷
  The example is more than 10¹⁹⁰ times as large
Automated-planning research has been heavily dominated by classical planning
  Dozens (hundreds?) of different algorithms

Plan-Space Planning

Decompose sets of goals into the individual goals
Plan for them separately
  Bookkeeping info to detect and resolve interactions
Produce a partially ordered plan that retains as much flexibility as possible
The Mars rovers used a temporal-planning extension of this

Planning Graphs

Rough idea:
  First, solve a relaxed problem
    Each “level” contains all effects of all applicable actions
    Even though the effects may contradict each other
  Next, do a state-space search within the planning graph
Graphplan, IPP, CGP, DGP, LGP, PGP, SGP, TGP, ...

Heuristic Search

Heuristic function like those in A*
  Created using techniques similar to planning graphs
Problem: A* quickly runs out of memory
  So do a greedy search instead
Greedy search can get trapped in local minima
  Greedy search plus local search at local minima
HSP [Bonet & Geffner]
FastForward [Hoffmann]
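
A minimal sketch of greedy best-first search, the kind of greedy search mentioned on this slide. It is not HSP or FastForward: the 4x4 grid world, the Manhattan-distance heuristic and all function names are assumptions chosen for illustration, whereas those planners derive their heuristics from relaxed planning problems.

```python
import heapq


def greedy_best_first(start, goal, neighbors, h):
    """Always expand the open node with the smallest heuristic value h."""
    frontier = [(h(start), start)]
    parent = {start: None}
    while frontier:
        _, node = heapq.heappop(frontier)
        if node == goal:
            # Rebuild the path by walking the parent links backwards.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in neighbors(node):
            if nxt not in parent:
                parent[nxt] = node
                heapq.heappush(frontier, (h(nxt), nxt))
    return None


GOAL = (3, 3)


def grid_neighbors(cell):
    """Four-connected moves inside a hypothetical 4x4 grid."""
    x, y = cell
    steps = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return [(a, b) for a, b in steps if 0 <= a < 4 and 0 <= b < 4]


def manhattan(cell):
    """Heuristic: grid distance to the goal, ignoring any obstacles."""
    return abs(GOAL[0] - cell[0]) + abs(GOAL[1] - cell[1])


path = greedy_best_first((0, 0), GOAL, grid_neighbors, manhattan)
print(path)
```

Unlike A*, the priority ignores the cost already paid, so less of the search space is kept open, at the price of possibly non-optimal paths.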

Configurable planners

Configurable planners

In any fixed planning domain, a domain-independent planner usually will not work as well as a domain-specific planner made specifically for that domain
  A domain-specific planner may be able to go directly toward a solution in situations where a domain-independent planner would explore many alternative paths
But we don’t want to write a whole new planner for every domain
Configurable planners
  Domain-independent planning engine
  Input includes info about how to solve problems in the domain
Generally this means one can write a planning engine with fewer restrictions than domain-independent planners
  Hierarchical Task Network (HTN) planning
  Planning with control formulas

Planning with Control Formulas

At each state s, we have a control formula written in temporal logic
  e.g., “never pick up x unless x needs to go on top of something else”
For each successor of s, derive a control formula using logical progression
Prune any successor state in which the progressed formula is false
  TLPlan, TALplanner, ...

HTN Planning (1/2)

Problem reduction
  Tasks (activities) rather than goals
  Methods to decompose tasks into subtasks
  Enforce constraints, backtrack if necessary
    E.g., taxi not good for long distances
Real-world applications
Noah, Nonlin, O-Plan, SIPE, SIPE-2, SHOP, SHOP2
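
A toy sketch of problem reduction in the HTN style, assuming a hypothetical travel domain. It is not the algorithm of any of the planners listed above: the task names, the distance thresholds and the decompose function are all invented for illustration. Each method either returns subtasks or None when its constraint fails, in which case the next method is tried.

```python
# Toy HTN-style decomposition (hypothetical domain and thresholds).
# A method returns a list of subtasks, or None if its constraint fails.

def travel_on_foot(dist):
    return [("walk", dist)] if dist <= 2 else None


def travel_by_taxi(dist):
    # Constraint from the slide: taxi is not good for long distances.
    if dist > 50:
        return None
    return [("call_taxi",), ("ride_taxi", dist), ("pay_driver",)]


def travel_by_train(dist):
    return [("buy_ticket",), ("ride_train", dist)]


METHODS = {"travel": [travel_on_foot, travel_by_taxi, travel_by_train]}


def decompose(task):
    """Expand a task into primitive actions, trying methods in order."""
    name, *args = task
    if name not in METHODS:          # primitive action: keep as is
        return [task]
    for method in METHODS[name]:
        subtasks = method(*args)
        if subtasks is None:         # constraint violated: try next method
            continue
        plan = []
        for sub in subtasks:
            sub_plan = decompose(sub)
            if sub_plan is None:     # a subtask failed: backtrack
                break
            plan.extend(sub_plan)
        else:
            return plan
    return None


print(decompose(("travel", 10)))
print(decompose(("travel", 300)))
```

The domain knowledge lives entirely in METHODS, so the same decompose engine could be reused with a different method table.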

HTN Planning (2/2)

Forward and Backward Search

In state-space planning, must choose whether to search forward or backward
In HTN planning, there are two choices to make about direction:
  forward or backward
  up or down

Limitation of Ordered-Task Planning

Problem of total order
This could be nicer
Solved with partial order method

Planning in an uncertain world

Until now, we have assumed that each action has only one possible outcome
  But often that’s unrealistic
In many situations, actions may have more than one possible outcome
  Action failures
    e.g., gripper drops its load
  Exogenous events
    e.g., road closed
Would like to be able to plan in such situations
One approach: Markov Decision Processes

Automated Planning - Summary

Domain-specific planner
  Write an entire computer program - lots of work
  Lots of domain-specific performance improvements
Domain-independent planner
  Just give it the basic actions - not much effort
  Not very efficient

Reinforcement Learning

Reinforcement Learning

Definition
Agents are given sensory inputs:
  State s ∊ S
  Reward R ∊ ℝ
At each step, agents select an output:
  Action a ∊ A

Naïve Approach

Use supervised learning to learn: f(s, a) = R
For any input state, pick the best action:
  a = argmax f(s, a), taken over a ∊ A
Will that work?

Markov Decision Process (1/2)

The agent needs to think ahead!
It needs a good sequence of actions.
Formalized in the Markov Decision Process framework!

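Markov Decision Process (2/2)

Finite set of states S, finite set of actions A
At each discrete time step, the agent observes state sₜ ∊ S,
chooses action aₜ ∊ A
and receives an immediate reward rₜ.
The state changes to sₜ₊₁ ∊ S
Markov assumption is sₜ₊₁ = δ(sₜ, aₜ) and rₜ = r(sₜ, aₜ): the next state and the reward depend only on the current state and action.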

Policy Function

The policy function decides which action to take in each state:
  aₜ = π(sₜ)
The policy function is what we want to learn!

Rewards

To think ahead, an agent looks at future rewards:
  r(sₜ₊₁, aₜ₊₁), r(sₜ₊₂, aₜ₊₂) ...
formalized as the sum of rewards (also called utility or value):
  V = ∑ ɣᵗ r(sₜ, aₜ)
ɣ is the discount factor making rewards far off into the future less valuable.
If we follow a specific policy π, the value of state sₜ is:
  Vπ(sₜ) = r(sₜ, π(sₜ)) + ɣ Vπ(sₜ₊₁)
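
A short Python sketch of the discounted sum of rewards, computed both as the direct sum and through the recursive form. The reward sequence [0, 0, 100] and the discount factor 0.9 are arbitrary values chosen for illustration, and the function names are not from the lecture.

```python
def discounted_return(rewards, gamma):
    """V = sum over t of gamma**t * r_t for a finite reward sequence."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))


def discounted_return_recursive(rewards, gamma):
    """Same value via V(s_t) = r_t + gamma * V(s_{t+1})."""
    if not rewards:
        return 0.0
    return rewards[0] + gamma * discounted_return_recursive(rewards[1:], gamma)


rewards = [0, 0, 100]   # hypothetical rewards along one trajectory
print(discounted_return(rewards, 0.9))
print(discounted_return_recursive(rewards, 0.9))
```

A reward of 100 received two steps in the future is worth 0.9² · 100 = 81 now, which is what both functions compute up to floating-point rounding.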

Value Function

Value function for random movement:
Optimal value function for optimal policy:

Optimal Policy

If we follow a specific policy π:
  Vπ(sₜ) = r(sₜ, π(sₜ)) + ɣ Vπ(sₜ₊₁)
If we know Vπ(sₜ) then the policy π is given by:
  π(s) = argmaxₐ (r(s, a) + ɣ Vπ(δ(s, a)))
Finding the optimal policy is about finding π(s) or Vπ(sₜ) or both.

Value Iteration

Initialize the function V(s) with random values V₀(s)
For each state sₜ and each iteration k do:
  compute Vₖ₊₁(sₜ) = maxₐ (r(sₜ, a) + ɣ Vₖ(sₜ₊₁)), where sₜ₊₁ = δ(sₜ, a)
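
A minimal sketch of value iteration for a deterministic world. The four-state chain, its reward of 100 for entering the goal, and the discount factor 0.9 are assumptions made up for this example. The sketch also initializes V₀ to zero rather than to random values, which is a simplification of the slide's description.

```python
STATES = [0, 1, 2, 3]          # hypothetical chain; state 3 is an absorbing goal
ACTIONS = ["left", "right"]
GAMMA = 0.9


def delta(s, a):
    """Deterministic transition function of the toy chain."""
    if s == 3:
        return 3
    return max(s - 1, 0) if a == "left" else s + 1


def reward(s, a):
    """Reward 100 for the step that enters the goal, 0 otherwise."""
    return 100.0 if s != 3 and delta(s, a) == 3 else 0.0


def value_iteration(iterations=50):
    v = {s: 0.0 for s in STATES}   # zeros instead of random values
    for _ in range(iterations):
        # V_{k+1}(s) = max_a ( r(s, a) + gamma * V_k(delta(s, a)) )
        v = {s: max(reward(s, a) + GAMMA * v[delta(s, a)] for a in ACTIONS)
             for s in STATES}
    return v


def greedy_policy(v):
    """pi(s) = argmax_a ( r(s, a) + gamma * V(delta(s, a)) )."""
    return {s: max(ACTIONS, key=lambda a: reward(s, a) + GAMMA * v[delta(s, a)])
            for s in STATES}


values = value_iteration()
print(values)
print(greedy_policy(values))
```

Each sweep needs r and δ for every state-action pair, so this approach presupposes a known model of the world.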

Q-Function

Learning in unknown environment
  Optimal policy π*(s) = argmaxₐ (r(s, a) + ɣ V*(δ(s, a)))
  What if we do not know δ(s, a)? or r(s, a)?
Q-Function
  Q(s, a) = r(s, a) + ɣ V*(δ(s, a))
  π*(s) = argmaxₐ Q(s, a)

Update the Q-Function

Q and V* are closely related:
  V*(s) = maxₐ' Q(s, a')
Q can be written:
  Q(sₜ, aₜ) = r(sₜ, aₜ) + ɣ V*(δ(sₜ, aₜ))
            = r(sₜ, aₜ) + ɣ maxₐ' Q(sₜ₊₁, a')
If Q^ denotes the current approximation of Q then it can be updated by:
  Q^(s, a) := r + ɣ maxₐ' Q^(s', a')

Q-Learning for Deterministic Worlds

For each s, a initialize table entry Q^(s, a) ⟵ 0.
Observe current state s.
Do forever:
  Select an action a and execute it
  Receive immediate reward r
  Observe the new state s'
  Update the table entry for Q^(s, a):
    Q^(s, a) := r + ɣ maxₐ' Q^(s', a')
  s ⟵ s'
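
A minimal sketch of the tabular algorithm above on a hypothetical four-state chain. Several details are assumptions not given on the slide: the world itself (step), the reward of 100 for entering the goal, ɣ = 0.9, uniformly random action selection, and the split into episodes that restart at state 0, which replaces the "do forever" loop so that the program terminates.

```python
import random

ACTIONS = ["left", "right"]
GOAL = 3
GAMMA = 0.9


def step(s, a):
    """Hypothetical deterministic world: returns (reward, next state)."""
    s_next = max(s - 1, 0) if a == "left" else s + 1
    return (100.0 if s_next == GOAL else 0.0), s_next


def q_learning(episodes=20, seed=0):
    rng = random.Random(seed)
    # Initialize every table entry to 0.
    q = {(s, a): 0.0 for s in range(GOAL + 1) for a in ACTIONS}
    for _ in range(episodes):
        s = 0
        while s != GOAL:
            a = rng.choice(ACTIONS)          # purely random exploration
            r, s_next = step(s, a)
            # Deterministic update: Q(s, a) := r + gamma * max_a' Q(s', a')
            q[(s, a)] = r + GAMMA * max(q[(s_next, b)] for b in ACTIONS)
            s = s_next
    return q


q = q_learning()
for s in range(GOAL):
    print(s, max(ACTIONS, key=lambda a: q[(s, a)]))
```

The agent only calls step and never reads δ or r directly, which is the point of learning Q instead of V when the environment is unknown.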

Q-Learning Example

Q-Learning for Non-Deterministic Worlds

What if the world is non-deterministic?
V and Q are then expected values:
  V = E[∑ ɣᵗ r(sₜ, aₜ)]
  Q(s, a) = E[r(s, a) + ɣ V*(δ(s, a))]

Q-Learning for Non-Deterministic Worlds

Learning Q becomes:
  Q^ₙ(s, a) := (1 - αₙ) Q^ₙ₋₁(s, a) + αₙ (r + ɣ maxₐ' Q^ₙ₋₁(s', a'))
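
A small sketch of this averaged update. The slide does not say how αₙ is chosen; the schedule αₙ = 1 / (1 + visitsₙ(s, a)) used here is one common choice and is an assumption, as are the function name make_learner and the default ɣ = 0.9.

```python
from collections import defaultdict


def make_learner(actions, gamma=0.9):
    """Return a Q table and an update function for non-deterministic worlds."""
    q = defaultdict(float)       # missing entries default to 0.0
    visits = defaultdict(int)    # how often each (state, action) was updated

    def update(s, a, r, s_next):
        visits[(s, a)] += 1
        alpha = 1.0 / (1 + visits[(s, a)])   # assumed learning-rate schedule
        target = r + gamma * max(q[(s_next, b)] for b in actions)
        # Blend the old estimate with the new sample instead of overwriting it.
        q[(s, a)] = (1 - alpha) * q[(s, a)] + alpha * target
        return q[(s, a)]

    return q, update


q_table, update = make_learner(["a"])
print(update("s", "a", 10.0, "t"))
print(update("s", "a", 10.0, "t"))
```

Because αₙ shrinks with the number of visits, a single unlucky outcome moves the estimate less and less as experience accumulates.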

Task allocation learning

Context

Most tasks require more than one agent
  Extinguish a fire
Which building to extinguish?
How many agents per task?

Task allocation learning

Fires are
Local decision for which building to
Selective
Decision tree with Q-Values
  At each step, use the tree to get the reward for extinguishing a specific building

Summary

Advantages
  Allows an agent to adapt to maximise rewards in a potentially unknown environment.
Disadvantages
  Requires computation polynomial in the number of states!
  The number of states grows exponentially with input dimensions!
  Reinforcement Learning assumes discrete state and action spaces.

Individual and Group Assignment

Project

A group of 4 to 6 students
Implement a RoboRescue team
Work individually on a subpart of the problem

Tasks

Foundation tasks
  Navigation
  Communication
Agents: police, ambulance, fire brigade
Exploration
Prediction
Task allocation

Reports

Individual plan
  Find around 4 related articles
  Write a one-page description
  Deadline: October 30th
Individual report
  Implement and evaluate the technique
  Write a report describing the technique, results and a discussion
  Deadline, draft: December 16th, final: January 6th
  Comments deadline: December 21st
Group report
  One per team!
  A description of the algorithms and strategies used
  Deadline, final: January 6th

What is a good Report

Grade based on the report quality, such as readability, language, pictures, structure and length, and the level of technical detail weighted with the difficulty of the chosen approaches
The reports should be 5-6 pages, but it is more important to make it possible for the reader to understand your work than to get the exact right number of pages.

Summary

Automated planning
  Classical planning problem
  HTN
Reinforcement learning
  Markov decision process
  Q-Learning