GSMDPs for Multi-Robot Sequential Decision Making
By: Messias, Spaan, Lima
Presented by: Mike Plasker
DMES – Ocean Engineering
Introduction
• Robotic planning under uncertainty
• MDP solutions
• Limited real-world application
Assumptions for Multi-Robot Teams
• Communication (inexpensive, free, or costly)
• Synchronous and steady-state transitions
• Discretization of the environment
A Different Approach
• States and actions are discrete (as in an MDP)
• Continuous measure of time
• State transitions regarded as random ‘events’
Advantages
• Non-Markovian effects of discretization are minimized
• Fully reactive to changes
• Communication only required for ‘events’
GSMDPs (Generalized Semi-Markov Decision Processes)
• Generic temporal probability distributions over events
• Can model concurrent (persistently enabled) events
• Solvable by discrete-time MDP algorithms by obtaining an equivalent (semi-)Markovian model
• Avoids negative effects of synchronous alternatives
Why GSMDPs for Robotics
Cooperative robotics requires:
• Operation in inherently continuous environments
• Handling uncertainty in actions (and observations)
• Joint decision making for optimization
• Reactive behavior
Definitions
Multiagent GSMDP: tuple <d, S, X, A, T, F, R, C, h>
• d = number of agents
• S = state space (contains state factors)
• X = state factors
• A = set of joint actions
• T = transition function
• F = time model
• R = instantaneous reward function
• C = cumulative reward rate
• h = planning horizon over continuous time
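
As a rough illustration, the tuple above could be held in a small container like the Python sketch below. The class name `MultiagentGSMDP`, the argument types chosen for T, F, R, and C, and the toy two-robot values are all assumptions made for this example; none of them come from the paper or the talk.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List, Tuple

State = Tuple[str, ...]        # one value per state factor
JointAction = Tuple[str, ...]  # one individual action per agent


@dataclass
class MultiagentGSMDP:
    """Hypothetical container for the tuple <d, S, X, A, T, F, R, C, h>."""
    d: int                                                  # number of agents
    X: Dict[str, List[str]]                                 # state factors and their values
    A: List[JointAction]                                    # set of joint actions
    T: Callable[[State, JointAction, State], float]         # transition function
    F: Callable[[State, JointAction, State, float], float]  # time model (density over durations)
    R: Callable[[State, JointAction], float]                # instantaneous reward
    C: Callable[[State, JointAction], float]                # cumulative reward rate
    h: float                                                # planning horizon (continuous time)

    @property
    def S(self) -> List[State]:
        # The state space is the Cartesian product of the state factors.
        return list(product(*self.X.values()))


# Toy instance with placeholder dynamics (illustrative values only).
toy = MultiagentGSMDP(
    d=2,
    X={"ball": ["robot1", "robot2"], "battery": ["ok", "low"]},
    A=[("pass", "wait"), ("wait", "pass")],
    T=lambda s, a, s2: 1.0 if s2 == s else 0.0,
    F=lambda s, a, s2, t: 1.0 if 0.0 <= t <= 1.0 else 0.0,
    R=lambda s, a: 0.0,
    C=lambda s, a: 0.0,
    h=float("inf"),
)
```

Deriving S from X keeps the two consistent, which matches the slide's note that the state space contains the state factors.
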
Definitions
• Event in a GSMDP: an abstraction over state transitions that share the same properties
• Persistently enabled events: events that are enabled from step ‘t’ to step ‘t+1’, but not triggered at step ‘t’
Common Approach
• Synchronous action
• Pre-defined time step
  • Performance
  • Reaction time
GSMDPs
• Persistently enabled events modeled by allowing their temporal distributions to depend on the time they were enabled
• Explicit modeling of non-Markovian effects from discretization
• Communication efficiency
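
To see why the time since enabling matters, the sketch below compares the probability that an event fires within the next `dt` seconds, given that it has already been enabled for `age` seconds without firing. For an exponential (Markovian) duration the answer does not depend on `age`; for a non-exponential one, such as the uniform duration used here, it does. The function names and the numeric parameters are assumptions chosen for illustration.

```python
import math


def prob_fire_within(cdf, age, dt):
    """P(T <= age + dt | T > age) for an event with duration CDF `cdf`."""
    survival = 1.0 - cdf(age)
    if survival <= 0.0:
        return 1.0  # the event must already have fired by `age`
    return (cdf(age + dt) - cdf(age)) / survival


def exponential_cdf(rate):
    def cdf(t):
        return 1.0 - math.exp(-rate * t) if t > 0 else 0.0
    return cdf


def uniform_cdf(lo, hi):
    def cdf(t):
        return min(1.0, max(0.0, (t - lo) / (hi - lo)))
    return cdf


exp_new = prob_fire_within(exponential_cdf(0.5), age=0.0, dt=1.0)
exp_old = prob_fire_within(exponential_cdf(0.5), age=4.0, dt=1.0)
uni_new = prob_fire_within(uniform_cdf(0.0, 10.0), age=0.0, dt=1.0)
uni_old = prob_fire_within(uniform_cdf(0.0, 10.0), age=8.0, dt=1.0)
```

Here `exp_new` and `exp_old` agree (memorylessness), while `uni_old` is larger than `uni_new`: an event that has stayed enabled for a long time becomes more likely to trigger, which a model with memoryless timing cannot express.
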
Modeling Events
• Group state transitions as events to minimize the number of temporal distributions and transition functions (e.g., a ‘battery low’ event)
• Transition function found by estimating the relative frequency of each transition in the event
• Time model found by timing the transition data
  • Approximated as a phase-type distribution
  • Replaces events with acyclic Markov chains
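
The two estimation steps above can be sketched as follows, assuming the data for one event is a list of `(next_state, duration)` observations. The transition function is the relative frequency of each outcome. For the time model, this sketch swaps in a simple moment-matched Erlang distribution, which is one member of the phase-type family (a chain of k exponential phases); the fitting procedure used in the paper may well differ. The function name, the `max_phases` cap, and the sample data are hypothetical.

```python
from collections import Counter
from statistics import mean, pvariance


def estimate_event_model(samples, max_phases=50):
    """Estimate a transition function and an Erlang(k, rate) time model.

    samples: list of (next_state, duration) pairs, with positive durations.
    """
    # Transition function: relative frequency of each observed next state.
    counts = Counter(next_state for next_state, _ in samples)
    total = sum(counts.values())
    transition = {s: c / total for s, c in counts.items()}

    # Time model: an Erlang-k has squared coefficient of variation 1/k,
    # so k is matched to mean^2 / variance and the rate to k / mean.
    durations = [t for _, t in samples]
    m = mean(durations)
    var = pvariance(durations)
    if var == 0:
        k = max_phases  # near-deterministic timing: use as many phases as allowed
    else:
        k = min(max_phases, max(1, round(m * m / var)))
    rate = k / m
    return transition, k, rate


# Hypothetical observations for a single event.
observations = [("a", 1.0), ("a", 3.0), ("b", 2.0), ("a", 2.0)]
transition, k, rate = estimate_event_model(observations)
```

The resulting Erlang is an acyclic chain of `k` phases, each left at rate `rate`, so the event can be replaced by `k` memoryless steps.
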
Events (cont.)
• A phase-type approximation is not always possible
• Decompose events with minimum duration into deterministically timed transitions
• Can then better approximate using a phase-type distribution
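
One way to picture this decomposition is a shifted distribution: a fixed delay followed by a random remainder. The sketch below is an assumption-laden illustration, not the paper's procedure. It takes the smallest observed duration as the deterministic part and fits a single exponential phase to what is left; the function name and the sample data are hypothetical.

```python
from statistics import mean


def split_min_duration(durations):
    """Split durations into a fixed delay plus an exponential remainder.

    Returns (delay, rate), where rate is None if there is no random part.
    """
    delay = min(durations)                # crude estimate of the minimum duration
    residual_mean = mean(durations) - delay
    if residual_mean <= 0:
        return delay, None                # purely deterministic timing
    return delay, 1.0 / residual_mean     # exponential rate for the remainder


# Hypothetical timings (seconds) of an event that never fires before 2 s.
delay, rate = split_min_duration([2.0, 2.5, 3.5])
```

Using the sample minimum is a rough estimator, but it shows the idea: once the deterministic delay is separated out, the remaining spread is much easier to match with a few exponential phases.
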
Solving a GSMDP
• Can be viewed as an equivalent discrete-time MDP
• Almost all solution algorithms for MDPs work
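
As an example of a standard algorithm that applies once the equivalent discrete-time MDP is available, here is a minimal value iteration sketch. It assumes the conversion from the GSMDP has already been done and does not show that step. The two-state toy problem, its rewards, and the discount factor are invented for illustration and are unrelated to the experiment in the talk.

```python
def value_iteration(states, actions, P, R, gamma=0.9, tol=1e-8):
    """P[s][a] is a list of (next_state, probability); R[s][a] is the reward."""
    V = {s: 0.0 for s in states}

    def q(s, a):
        # Expected discounted return of taking a in s, then following V.
        return R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])

    while True:
        delta = 0.0
        for s in states:
            best = max(q(s, a) for a in actions)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break

    policy = {s: max(actions, key=lambda a: q(s, a)) for s in states}
    return V, policy


# Toy MDP: only staying in s1 pays a reward of 1 per step.
states = ["s0", "s1"]
actions = ["stay", "go"]
P = {
    "s0": {"stay": [("s0", 1.0)], "go": [("s1", 1.0)]},
    "s1": {"stay": [("s1", 1.0)], "go": [("s0", 1.0)]},
}
R = {
    "s0": {"stay": 0.0, "go": 0.0},
    "s1": {"stay": 1.0, "go": 0.0},
}
V, policy = value_iteration(states, actions, P, R)
```

In this toy problem the resulting policy moves from `s0` to `s1` and then stays there.
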
Experiment
• Robotic soccer
• Score a goal (reward 150)
• Passing around obstacle (reward 60)
Results
• MDP: T = 4 s
• GSMDP
Results
• No idle time
• Reduced communication
• Improved scoring efficiency
• System failures (zero goals) independent of model
Example Video
Future Work
• Extend to partially observable domains
• Apply bilateral phase distributions to increase the class of non-Markovian events that can be modeled
Questions?
MESSIAS, J.; SPAAN, M.; LIMA, P. GSMDPs for Multi-Robot Sequential Decision-Making. AAAI Conference on Artificial Intelligence, North America, Jun. 2013. Available at: <http://www.aaai.org/ocs/index.php/AAAI/AAAI13/paper/view/6432/6843>. Date accessed: 06 Apr. 2014.