Posted on 16-Jan-2016





Multi-Agent Learning

For CS 7631: Multi Robot Systems

Azfar Aziz

The Papers
- On Optimizing Interdependent Skills: A Case Study in Simulated 3D Humanoid Robot Soccer (Urieli, MacAlpine, Kalyanakrishnan, Bentor, and Stone, AAMAS 2011)
- Cooperative Multi-Agent Learning: The State of the Art (Panait and Luke, 2005)

Optimizing Interdependent Skills: Problem Description
- How can one learn behaviors that comprise multiple interdependent skills?

Optimizing Interdependent Skills: Insights
- Automatic learning and optimization help by:
  - Improving and refining human intuition
  - Requiring significantly less labor to adapt to changes in the agent and environment
- Most complex systems naturally decompose into smaller sub-units
- It is convenient and beneficial to explicitly recognize this decomposition

Optimizing Interdependent Skills: Approach
- Demonstrate that individual skills can be parameterized and optimized
- Create a framework for optimizing skills in conjunction with one another
- Key idea: skills can be optimized while respecting the tight coupling induced over them by high-level behaviors

Optimizing Interdependent Skills: Domain
- RoboCup 3D simulation based on SimSpark, a generic physical multi-agent system simulator
- Robot agents:
  - Homogeneous
  - Modeled after the Aldebaran Nao
  - Height: 57 cm (~22.5 in), Mass: 4.5 kg (~10 lbs)

Optimizing Interdependent Skills: Domain cont'd.
- Agents interact with the simulator by:
  - Sending actuation commands
  - Receiving perceptual information

Optimizing Interdependent Skills: Agent Skills
- Lowest level of control: PID (Proportional-Integral-Derivative) controller
  - Input: target angle; Output: appropriate torque
- Set of skills:
  - Walking (forwards, backwards, sideways)
  - Turning
  - Kicking
  - Standing
  - Goalie-diving
  - Getting up
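The PID control loop at the lowest level can be sketched as follows; the gains, timestep, and angles are illustrative stand-ins, not the simulator's actual values:

```python
class PID:
    """Minimal PID controller mapping a joint-angle error to a torque.

    Gains and timestep below are illustrative, not the paper's settings.
    """

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, target_angle, current_angle):
        error = target_angle - current_angle
        self.integral += error * self.dt                     # accumulated error
        derivative = (error - self.prev_error) / self.dt     # error trend
        self.prev_error = error
        # Torque is a weighted sum of present, accumulated, and predicted error.
        return self.kp * error + self.ki * self.integral + self.kd * derivative

pid = PID(kp=2.0, ki=0.1, kd=0.05, dt=0.02)
torque = pid.step(target_angle=0.5, current_angle=0.0)
```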

Optimizing Interdependent Skills: Open-Loop Approach
- Implemented as a periodic state machine with multiple key frames
- Key frame = static pose of fixed joint positions

Optimizing Interdependent Skills: Individual Skills
- Initially hand-code skills: results in slow but stable skills
- Then optimize further
- Example: walking uses four key frames through which the agent periodically loops
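A minimal sketch of such a periodic key-frame state machine; the joint names, angles, and frame duration are hypothetical, since the paper's actual walk parameters are not given here:

```python
# Open-loop, key-frame walk: the agent cycles through a fixed sequence of
# static poses (key frames) with no sensory feedback. All values illustrative.

KEY_FRAMES = [  # one pose per key frame: joint name -> target angle (radians)
    {"left_hip": 0.3, "right_hip": -0.3, "left_knee": 0.5, "right_knee": 0.0},
    {"left_hip": 0.0, "right_hip": 0.0, "left_knee": 0.0, "right_knee": 0.0},
    {"left_hip": -0.3, "right_hip": 0.3, "left_knee": 0.0, "right_knee": 0.5},
    {"left_hip": 0.0, "right_hip": 0.0, "left_knee": 0.0, "right_knee": 0.0},
]
FRAME_DURATION = 0.25  # seconds spent in each key frame (illustrative)

def target_pose(t):
    """Periodic state machine: return the key frame active at time t."""
    index = int(t / FRAME_DURATION) % len(KEY_FRAMES)
    return KEY_FRAMES[index]
```

Each returned pose would then be handed, joint by joint, to the PID controllers as target angles.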

Optimizing Interdependent Skills: Optimize Individual Skills
- Evaluation: distance traveled in the forward direction in 15 seconds
- Encourage straight walks

Four machine learning algorithms:
- Hill Climbing (HC)
- Cross-Entropy Method (CEM)
- Genetic Algorithm (GA)
- Covariance Matrix Adaptation Evolution Strategy (CMA-ES)

Optimizing Interdependent Skills: Individual Skills Results
- CMA-ES performed best
- Additional advantage: low configuration overhead
  - Requires only an initial mean and standard deviation for each parameter
- Sample size = 15,000 simulation runs
  - Number of generations × population size × number of measurements
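One of the four optimizers, the Cross-Entropy Method, can be sketched as a generate-evaluate-refit loop whose total run count is exactly generations × population × measurements, mirroring the sample-size formula above. All hyperparameters and the toy fitness function are illustrative (with these settings the sketch uses 30 × 20 × 3 = 1,800 runs, not the paper's 15,000):

```python
import random

random.seed(0)  # reproducible run of this sketch

def cem_optimize(fitness, mean, std, generations=30, population=20,
                 elite_frac=0.25, measurements=3):
    """Cross-Entropy Method sketch: each generation samples candidates
    around the current mean, averages repeated fitness measurements per
    candidate (as one would for a noisy simulated walk), keeps the elite
    fraction, and refits the mean/std to the elites."""
    n, n_elite = len(mean), max(1, int(population * elite_frac))
    runs = 0
    for _ in range(generations):
        samples = [[random.gauss(mean[i], std[i]) for i in range(n)]
                   for _ in range(population)]
        scored = [(sum(fitness(s) for _ in range(measurements)) / measurements, s)
                  for s in samples]
        runs += population * measurements
        elites = [s for _, s in sorted(scored, reverse=True)[:n_elite]]
        mean = [sum(e[i] for e in elites) / n_elite for i in range(n)]
        std = [max(1e-6, (sum((e[i] - mean[i]) ** 2 for e in elites)
                          / n_elite) ** 0.5) for i in range(n)]
    return mean, runs  # runs = generations * population * measurements

# Toy deterministic fitness standing in for "forward distance in 15 s",
# maximized at the (assumed) optimum (0.5, -0.5).
best, total_runs = cem_optimize(
    fitness=lambda p: -((p[0] - 0.5) ** 2 + (p[1] + 0.5) ** 2),
    mean=[0.0, 0.0], std=[1.0, 1.0])
```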

Optimizing Interdependent Skills: Sequence of Skills
- Evaluation criterion: time taken to score a goal on an empty field

Optimizing Interdependent Skills: Optimize Sequence of Skills

Optimizing Interdependent Skills: Sequence of Skills Results

Optimizing Interdependent Skills: Results

Optimizing Interdependent Skills: Critique
- A lot of human intuition used
- The decomposition methodology shows promise
- CPU time spent optimizing skills: on the order of 100,000 hours

Cooperative Multi-Agent Learning: Categories
- Focus: machine learning in cooperative tasks
- Team learning: applying a single learner to discover joint solutions
- Concurrent learning: using multiple simultaneous learners, often one per agent

Cooperative Multi-Agent Learning: Agent vs. Multi-Agent
- Agent: a computational mechanism that exhibits a high degree of autonomy, performing actions in its environment based on information from the environment
- Multi-agent: an environment in which there is more than one agent, with the constraint that a single agent can't know everything about the world that other agents know

Cooperative Multi-Agent Learning: Features
- The search space can be unusually large
- Agent interactions can create unpredictable changes at the macro level
- May involve multiple learners, which introduces game-theoretic issues

Cooperative Multi-Agent Learning: Machine Learning Methods
- Three approaches: supervised, unsupervised, and reward-based
- Reward-based methods are used most:
  - Reinforcement learning: estimate value functions
  - Evolutionary computation: directly learn behaviors without appealing to value functions
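The contrast between the two reward-based families can be sketched with a single tabular Q-learning step; the states, actions, and numbers are toy values, not taken from the survey:

```python
# (1) Reinforcement learning estimates a value function, here via one
#     tabular Q-learning update on a toy two-state problem.

def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One Q-learning step: move Q(s, a) toward the bootstrapped target
    r + gamma * max_a' Q(s', a')."""
    target = reward + gamma * max(Q[next_state].values())
    Q[state][action] += alpha * (target - Q[state][action])

Q = {"s0": {"left": 0.0, "right": 0.0}, "s1": {"left": 0.0, "right": 0.0}}
q_update(Q, "s0", "right", reward=1.0, next_state="s1")

# (2) Evolutionary computation would instead perturb behavior parameters
#     directly and keep the variants with the highest total reward,
#     never estimating Q at all.
```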

Cooperative Multi-Agent Learning: Team Learning
- A single learner discovers a set of behaviors for a team of agents
- Emergent complexity: when agents interact with one another, the joint behavior can be unexpected
- Categories: homogeneous, heterogeneous

Cooperative Multi-Agent Learning: Team Learning Pros/Cons
- Pro:
  - A single learner, so standard machine learning techniques can be used
  - Concerned with the performance of the entire team, not of an individual
- Con:
  - Large state space
  - Requires centralization of the learning algorithm

Cooperative Multi-Agent Learning: Homogeneous Team Learning
- All agents are assigned identical behavior
- The search space is reduced drastically
- Can still act heterogeneously through sub-behaviors that specialize

Cooperative Multi-Agent Learning: Heterogeneous Team Learning
- Agents have different behaviors; a single learner tries to improve the team as a whole
- More diversity, but an increased search space
- The bulk of the research concerns the emergence of specialists
- Restricted breeding works better than no restriction

Cooperative Multi-Agent Learning: Hybrid Team Learning
- The set of agents is split into several squads; agents within each squad share the same behavior
- Not shown to work better than homogeneous teams in RoboCup
- Did exhibit initial offensive-defensive squad specialization

Cooperative Multi-Agent Learning: Concurrent Learning
- Multiple learning processes attempt to improve parts of the team
- Common approach: each agent has its own unique learning process

Cooperative Multi-Agent Learning: Concurrent Learning Pros/Cons
- Pro:
  - Good in domains where decomposition is possible
  - Projects the joint team space onto smaller, separate search spaces
  - Smaller learning chunks may be more flexible in their use of computational resources
- Con:
  - Each learner adapts its behavior in the context of the other, co-adapting learners
  - Can violate basic machine learning assumptions (e.g., a stationary environment)

Cooperative Multi-Agent Learning: Credit Assignment
- Global reward: all learners' rewards increase equally
  - Better for soccer
- Local reward: leads to faster learning rates, but not always better results
  - Better for foraging
- The choice of global vs. local reward has a significant impact on the dynamics of learning
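The two credit-assignment schemes can be sketched side by side; the per-agent contributions are toy numbers standing in for, say, items foraged:

```python
def global_reward(per_agent_contrib):
    """Every learner receives the same team-level reward signal."""
    team_reward = sum(per_agent_contrib)
    return [team_reward] * len(per_agent_contrib)

def local_reward(per_agent_contrib):
    """Each learner is rewarded only for its own contribution."""
    return list(per_agent_contrib)

contrib = [3.0, 1.0, 0.0]  # toy contributions of three agents
team_signal = global_reward(contrib)   # idle agent 2 is rewarded anyway
solo_signal = local_reward(contrib)    # sharper gradient, weaker teamwork
```

Global reward gives every learner the same, diluted signal; local reward gives a sharper per-agent gradient but can reward behavior that fights the team's interest.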

Cooperative Multi-Agent Learning: Dynamics of Learning
- The environment is dynamic: each agent must keep up with an environment that includes other adapting agents
- Nash equilibrium: no single agent has any rational incentive to change its strategy away from the equilibrium
  - Sometimes suboptimal
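The Nash-equilibrium condition, including its suboptimality, can be checked on a toy two-player Stag Hunt; the payoff numbers are illustrative, not from the survey:

```python
# A joint action is a Nash equilibrium if no single agent can improve its
# own payoff by unilaterally changing its action.

PAYOFFS = {  # (action_1, action_2) -> (reward_1, reward_2)
    ("stag", "stag"): (4, 4), ("stag", "hare"): (0, 3),
    ("hare", "stag"): (3, 0), ("hare", "hare"): (3, 3),
}
ACTIONS = ["stag", "hare"]

def is_nash(a1, a2):
    r1, r2 = PAYOFFS[(a1, a2)]
    no_better_1 = all(PAYOFFS[(alt, a2)][0] <= r1 for alt in ACTIONS)
    no_better_2 = all(PAYOFFS[(a1, alt)][1] <= r2 for alt in ACTIONS)
    return no_better_1 and no_better_2
```

Here both ("stag", "stag") and ("hare", "hare") are equilibria, but ("hare", "hare") is suboptimal: neither agent will deviate alone, yet both would gain by switching together.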

Cooperative Multi-Agent Learning: Dynamics of Learning cont'd.
- Fully cooperative: a global reward scheme divvies reinforcement equally among all agents
  - Global Nash equilibria
- Competitive learning: agents co-adapt to each other
  - One learner can dominate, causing no additional learning (loss of gradient)
  - Coevolution is difficult to monitor
  - Cyclical behavior can arise

Cooperative Multi-Agent Learning: Teammate Modeling
- Learn about other agents in order to predict their behavior and act accordingly
- Prone to infinite recursion: agent A is doing X because it thinks agent B thinks that agent A thinks that agent B thinks that...
- Remedy: bounded levels; an N-level agent models its teammates as (N-1)-level agents
- The best policy is to minimize assumptions about other agents' policies
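Bounded N-level modeling can be sketched as a recursion that terminates at level 0; the anti-coordination payoffs are hypothetical, chosen so the best action flips at each modeling level, which is exactly why unbounded mutual modeling never settles:

```python
# PAYOFF[(my_action, teammate_action)] -> my reward (toy anti-coordination game)
PAYOFF = {("a", "a"): 0, ("a", "b"): 1, ("b", "a"): 1, ("b", "b"): 0}
ACTIONS = ["a", "b"]

def choose(level):
    """An N-level agent best-responds to a teammate modeled at level N-1."""
    if level == 0:
        return "a"  # 0-level: fixed default action, no teammate model at all
    teammate = choose(level - 1)  # model the teammate one level shallower
    return max(ACTIONS, key=lambda act: PAYOFF[(act, teammate)])
```

A 1-level agent expects "a" and plays "b"; a 2-level agent expects that "b" and plays "a", and so on: the level cap is what cuts the otherwise infinite regress.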

Cooperative Multi-Agent Learning: Communication
- Altering the state of the environment such that other agents can perceive the modification and decode information from it
- Communicating everything effectively collapses the team into a single-agent system
- Explicit communication increases the search space

Cooperative Multi-Agent Learning: Direct Communication
- Shared blackboards, signaling, and message-passing
- Helps with reinforcement learning:
  - Sharing past experiences in the form of episodes
  - Joint utility/policy tables

Cooperative Multi-Agent Learning: Indirect Communication
- Leaving footsteps, trails of bread crumbs, placement of objects, pheromones
- Inspiration from social insects
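A pheromone-style sketch of indirect communication: agents write to a shared grid, evaporation decays old marks, and other agents read the grid instead of receiving messages. Grid size, deposit amount, and evaporation rate are illustrative:

```python
GRID_SIZE = 5
EVAPORATION = 0.9  # fraction of pheromone retained each tick
DEPOSIT = 1.0

pheromone = [[0.0] * GRID_SIZE for _ in range(GRID_SIZE)]

def deposit(x, y):
    """An agent marks its current cell (writes to the environment)."""
    pheromone[y][x] += DEPOSIT

def evaporate():
    """Old marks fade, so stale information disappears on its own."""
    for row in pheromone:
        for x in range(GRID_SIZE):
            row[x] *= EVAPORATION

def strongest_neighbor(x, y):
    """Another agent 'reads' the environment: follow the strongest mark."""
    neighbors = [(x + dx, y + dy)
                 for dx, dy in [(1, 0), (-1, 0), (0, 1), (0, -1)]
                 if 0 <= x + dx < GRID_SIZE and 0 <= y + dy < GRID_SIZE]
    return max(neighbors, key=lambda p: pheromone[p[1]][p[0]])
```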

Cooperative Multi-Agent Learning: Major Open Topics
- Scalability: the search space grows with the number and complexity of agent behaviors
- Adaptive dynamics: the "moving the goalposts" dilemma; suboptimal solutions
- Problem decomposition: done at various levels, assuming behaviors can be learned independently

Cooperative Multi-Agent Learning: Future Topics
- Multiple agents: more than two agents
- Team heterogeneity: agents with different capabilities
- Complex agents
- Dynamically changing teams and scenarios

Questions?

Optimizing Interdependent Skills: Individual Skills cont'd.
- Performance of various skills using CMA-ES

Optimizing Interdependent Skills: Open-Loop Approach
- Does not rely on corrective feedback
- Simpler to implement
- Yields faster walks
- Less robust to disturbances

