Posted on 16-Jan-2016





Multi-Agent Learning

For CS 7631: Multi Robot Systems

Azfar Aziz

The Papers
- On Optimizing Interdependent Skills: A Case Study in Simulated 3D Humanoid Robot Soccer (Urieli, MacAlpine, Kalyanakrishnan, Bentor, and Stone, AAMAS 2011)
- Cooperative Multi-Agent Learning: The State of the Art (Panait and Luke, 2005)

Optimizing Interdependent Skills: Problem Description
- How can one learn behaviors that comprise multiple interdependent skills?

Optimizing Interdependent Skills: Insights
- Automatic learning and optimization help by:
  - Improving and refining human intuition
  - Requiring significantly less labor to adapt to changes in the agent and environment
- Most complex systems naturally decompose into smaller sub-units
- It is convenient and beneficial to explicitly recognize this decomposition

Optimizing Interdependent Skills: Approach
- Demonstrate that individual skills can be parameterized and optimized
- Create a framework for optimizing skills in conjunction with one another
- Key idea: skills can be optimized while respecting the tight coupling induced over them by high-level behaviors

Optimizing Interdependent Skills: Domain
- RoboCup 3D simulation based on SimSpark, a generic physical multi-agent system simulator
- Robot agents:
  - Homogeneous
  - Modeled after the Aldebaran Nao
  - Height: 57 cm (~22.5 in), Mass: 4.5 kg (~10 lbs)

Optimizing Interdependent Skills: Domain cont'd.
- Agents interact with the simulator by:
  - Sending actuation commands
  - Receiving perceptual information

Optimizing Interdependent Skills: Agent Skills
- Lowest level of control: PID (Proportional-Integral-Derivative) controller
  - Input: target angle; Output: appropriate torque
- Set of skills:
  - Walking (forwards, backwards, sideways)
  - Turning
  - Kicking
  - Standing
  - Goalie-diving
  - Getting up
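The PID control loop at the lowest level can be sketched as follows; the gains, timestep, and angles are illustrative stand-ins, not the simulator's actual values:

```python
class PID:
    """Minimal PID controller mapping a joint-angle error to a torque.

    Gains and timestep below are illustrative, not the paper's settings.
    """

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, target_angle, current_angle):
        error = target_angle - current_angle
        self.integral += error * self.dt                     # accumulated error
        derivative = (error - self.prev_error) / self.dt     # error trend
        self.prev_error = error
        # Torque is a weighted sum of present, accumulated, and predicted error.
        return self.kp * error + self.ki * self.integral + self.kd * derivative

pid = PID(kp=2.0, ki=0.1, kd=0.05, dt=0.02)
torque = pid.step(target_angle=0.5, current_angle=0.0)
```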

Optimizing Interdependent Skills: Open-Loop Approach
- Implemented as a periodic state machine with multiple key frames
- Key frame = static pose of fixed joint positions

Optimizing Interdependent Skills: Individual Skills
- Initially hand-code skills: results in slow but stable skills
- Then optimize further
- Example: walking uses four key frames through which the agent periodically loops
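A minimal sketch of such a periodic key-frame state machine; the joint names, angles, and frame duration are hypothetical, since the paper's actual walk parameters are not given here:

```python
# Open-loop, key-frame walk: the agent cycles through a fixed sequence of
# static poses (key frames) with no sensory feedback. All values illustrative.

KEY_FRAMES = [  # one pose per key frame: joint name -> target angle (radians)
    {"left_hip": 0.3, "right_hip": -0.3, "left_knee": 0.5, "right_knee": 0.0},
    {"left_hip": 0.0, "right_hip": 0.0, "left_knee": 0.0, "right_knee": 0.0},
    {"left_hip": -0.3, "right_hip": 0.3, "left_knee": 0.0, "right_knee": 0.5},
    {"left_hip": 0.0, "right_hip": 0.0, "left_knee": 0.0, "right_knee": 0.0},
]
FRAME_DURATION = 0.25  # seconds spent in each key frame (illustrative)

def target_pose(t):
    """Periodic state machine: return the key frame active at time t."""
    index = int(t / FRAME_DURATION) % len(KEY_FRAMES)
    return KEY_FRAMES[index]
```

Each returned pose would then be handed, joint by joint, to the PID controllers as target angles.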

Optimizing Interdependent Skills: Optimize Individual Skills
- Evaluation: distance traveled in the forward direction in 15 seconds
- Encourage straight walks

Four machine learning algorithms:
- Hill Climbing (HC)
- Cross-Entropy Method (CEM)
- Genetic Algorithm (GA)
- Covariance Matrix Adaptation Evolution Strategy (CMA-ES)

Optimizing Interdependent Skills: Individual Skills Results
- CMA-ES performed best
- Additional advantage: low configuration overhead
  - Requires only an initial mean and standard deviation for each parameter
- Sample size = 15,000 simulation runs
  - Number of generations × population size × number of measurements
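One of the four optimizers, the Cross-Entropy Method, can be sketched as a generate-evaluate-refit loop whose total run count is exactly generations × population × measurements, mirroring the sample-size formula above. All hyperparameters and the toy fitness function are illustrative (with these settings the sketch uses 30 × 20 × 3 = 1,800 runs, not the paper's 15,000):

```python
import random

random.seed(0)  # reproducible run of this sketch

def cem_optimize(fitness, mean, std, generations=30, population=20,
                 elite_frac=0.25, measurements=3):
    """Cross-Entropy Method sketch: each generation samples candidates
    around the current mean, averages repeated fitness measurements per
    candidate (as one would for a noisy simulated walk), keeps the elite
    fraction, and refits the mean/std to the elites."""
    n, n_elite = len(mean), max(1, int(population * elite_frac))
    runs = 0
    for _ in range(generations):
        samples = [[random.gauss(mean[i], std[i]) for i in range(n)]
                   for _ in range(population)]
        scored = [(sum(fitness(s) for _ in range(measurements)) / measurements, s)
                  for s in samples]
        runs += population * measurements
        elites = [s for _, s in sorted(scored, reverse=True)[:n_elite]]
        mean = [sum(e[i] for e in elites) / n_elite for i in range(n)]
        std = [max(1e-6, (sum((e[i] - mean[i]) ** 2 for e in elites)
                          / n_elite) ** 0.5) for i in range(n)]
    return mean, runs  # runs = generations * population * measurements

# Toy deterministic fitness standing in for "forward distance in 15 s",
# maximized at the (assumed) optimum (0.5, -0.5).
best, total_runs = cem_optimize(
    fitness=lambda p: -((p[0] - 0.5) ** 2 + (p[1] + 0.5) ** 2),
    mean=[0.0, 0.0], std=[1.0, 1.0])
```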

Optimizing Interdependent Skills: Sequence of Skills
- Evaluation criterion: time taken to score a goal on an empty field

Optimizing Interdependent Skills: Optimize Sequence of Skills

Optimizing Interdependent Skills: Sequence of Skills Results

Optimizing Interdependent Skills: Results

Optimizing Interdependent Skills: Critique
- A lot of human intuition used
- The decomposition methodology shows promise
- CPU time spent optimizing skills: on the order of 100,000 hours

Cooperative Multi-Agent Learning: Categories
- Focus: machine learning in cooperative tasks
- Team learning: applying a single learner to discover joint solutions
- Concurrent learning: using multiple simultaneous learners, often one per agent

Cooperative Multi-Agent Learning: Agent vs. Multi-Agent
- Agent: a computational mechanism that exhibits a high degree of autonomy, performing actions in its environment based on information from the environment
- Multi-agent: an environment in which there is more than one agent, with the constraint that a single agent can't know everything about the world that other agents know

Cooperative Multi-Agent Learning: Features
- The search space can be unusually large
- Agent interactions can create unpredictable changes at the macro level
- May involve multiple learners, which introduces game-theoretic issues

Cooperative Multi-Agent Learning: Machine Learning Methods
- Three approaches: supervised, unsupervised, and reward-based
- Reward-based methods are used most:
  - Reinforcement learning: estimate value functions
  - Evolutionary computation: directly learn behaviors without appealing to value functions
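The contrast between the two reward-based families can be sketched with a single tabular Q-learning step; the states, actions, and numbers are toy values, not taken from the survey:

```python
# (1) Reinforcement learning estimates a value function, here via one
#     tabular Q-learning update on a toy two-state problem.

def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One Q-learning step: move Q(s, a) toward the bootstrapped target
    r + gamma * max_a' Q(s', a')."""
    target = reward + gamma * max(Q[next_state].values())
    Q[state][action] += alpha * (target - Q[state][action])

Q = {"s0": {"left": 0.0, "right": 0.0}, "s1": {"left": 0.0, "right": 0.0}}
q_update(Q, "s0", "right", reward=1.0, next_state="s1")

# (2) Evolutionary computation would instead perturb behavior parameters
#     directly and keep the variants with the highest total reward,
#     never estimating Q at all.
```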

Cooperative Multi-Agent Learning: Team Learning
- A single learner discovers a set of behaviors for a team of agents
- Emergent complexity: when agents interact with one another, the joint behavior can be unexpected
- Categories: homogeneous, heterogeneous

Cooperative Multi-Agent Learning: Team Learning Pros/Cons
- Pro:
  - A single learner, so standard machine learning techniques can be used
  - Concerned with the performance of the entire team, not of an individual
- Con:
  - Large state space
  - Requires centralization of the learning algorithm

Cooperative Multi-Agent Learning: Homogeneous Team Learning
- All agents are assigned identical behavior
- The search space is reduced drastically
- Can still act heterogeneously through sub-behaviors that specialize

Cooperative Multi-Agent Learning: Heterogeneous Team Learning
- Agents have different behaviors; a single learner tries to improve the team as a whole
- More diversity, but an increased search space
- The bulk of the research concerns the emergence of specialists
- Restricted breeding works better than no restriction

Cooperative Multi-Agent Learning: Hybrid Team Learning
- The set of agents is split into several squads; agents within each squad share the same behavior
- Not shown to work better than homogeneous teams in RoboCup
- Did exhibit initial offensive-defensive squad specialization

Cooperative Multi-Agent Learning: Concurrent Learning
- Multiple learning processes attempt to improve parts of the team
- Common approach: each agent has its own unique learning process

Cooperative Multi-Agent Learning: Concurrent Learning Pros/Cons
- Pro:
  - Good in domains where decomposition is possible
  - Projects the joint team space onto smaller, separate search spaces
  - Smaller learning chunks may be more flexible in their use of computational resources
- Con:
  - Each learner adapts its behavior in the context of the other, co-adapting learners
  - Can violate basic machine learning assumptions (e.g., a stationary environment)

Cooperative Multi-Agent Learning: Credit Assignment
- Global reward: all learners' rewards increase equally
  - Better for soccer
- Local reward: leads to faster learning rates, but not always better results
  - Better for foraging
- The choice of global vs. local reward has a significant impact on the dynamics of learning
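The two credit-assignment schemes can be sketched side by side; the per-agent contributions are toy numbers standing in for, say, items foraged:

```python
def global_reward(per_agent_contrib):
    """Every learner receives the same team-level reward signal."""
    team_reward = sum(per_agent_contrib)
    return [team_reward] * len(per_agent_contrib)

def local_reward(per_agent_contrib):
    """Each learner is rewarded only for its own contribution."""
    return list(per_agent_contrib)

contrib = [3.0, 1.0, 0.0]  # toy contributions of three agents
team_signal = global_reward(contrib)   # idle agent 2 is rewarded anyway
solo_signal = local_reward(contrib)    # sharper gradient, weaker teamwork
```

Global reward gives every learner the same, diluted signal; local reward gives a sharper per-agent gradient but can reward behavior that fights the team's interest.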

Cooperative Multi-Agent Learning: Dynamics of Learning
- The environment is dynamic: each agent must keep up with an environment that includes other adapting agents
- Nash equilibrium: no single agent has any rational incentive to change its strategy away from the equilibrium
  - Sometimes suboptimal
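The Nash-equilibrium condition, including its suboptimality, can be checked on a toy two-player Stag Hunt; the payoff numbers are illustrative, not from the survey:

```python
# A joint action is a Nash equilibrium if no single agent can improve its
# own payoff by unilaterally changing its action.

PAYOFFS = {  # (action_1, action_2) -> (reward_1, reward_2)
    ("stag", "stag"): (4, 4), ("stag", "hare"): (0, 3),
    ("hare", "stag"): (3, 0), ("hare", "hare"): (3, 3),
}
ACTIONS = ["stag", "hare"]

def is_nash(a1, a2):
    r1, r2 = PAYOFFS[(a1, a2)]
    no_better_1 = all(PAYOFFS[(alt, a2)][0] <= r1 for alt in ACTIONS)
    no_better_2 = all(PAYOFFS[(a1, alt)][1] <= r2 for alt in ACTIONS)
    return no_better_1 and no_better_2
```

Here both ("stag", "stag") and ("hare", "hare") are equilibria, but ("hare", "hare") is suboptimal: neither agent will deviate alone, yet both would gain by switching together.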

Cooperative Multi-Agent Learning: Dynamics of Learning cont'd.
- Fully cooperative: a global reward scheme divvies reinforcement equally among all agents
  - Global Nash equilibria
- Competitive learning: agents co-adapt to each other
  - One learner can dominate, causing no additional learning (loss of gradient)
  - Coevolution is difficult to monitor
  - Cyclical behavior can arise

Cooperative Multi-Agent Learning: Teammate Modeling
- Learn about other agents in order to predict their behavior and act accordingly
- Prone to infinite recursion: agent A is doing X because it thinks agent B thinks that agent A thinks that agent B thinks that...
- Remedy: bounded levels; an N-level agent models its teammates as (N-1)-level agents
- The best policy is to minimize assumptions about other agents' policies
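Bounded N-level modeling can be sketched as a recursion that terminates at level 0; the anti-coordination payoffs are hypothetical, chosen so the best action flips at each modeling level, which is exactly why unbounded mutual modeling never settles:

```python
# PAYOFF[(my_action, teammate_action)] -> my reward (toy anti-coordination game)
PAYOFF = {("a", "a"): 0, ("a", "b"): 1, ("b", "a"): 1, ("b", "b"): 0}
ACTIONS = ["a", "b"]

def choose(level):
    """An N-level agent best-responds to a teammate modeled at level N-1."""
    if level == 0:
        return "a"  # 0-level: fixed default action, no teammate model at all
    teammate = choose(level - 1)  # model the teammate one level shallower
    return max(ACTIONS, key=lambda act: PAYOFF[(act, teammate)])
```

A 1-level agent expects "a" and plays "b"; a 2-level agent expects that "b" and plays "a", and so on: the level cap is what cuts the otherwise infinite regress.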

Cooperative Multi-Agent Learning: Communication
- Altering the state of the environment such that other agents can perceive the modification and decode information from it
- Communicating everything effectively collapses the team into a single-agent system
- Explicit communication increases the search space

Cooperative Multi-Agent Learning: Direct Communication
- Shared blackboards, signaling, and message-passing
- Helps with reinforcement learning:
  - Sharing past experiences in the form of episodes
  - Joint utility/policy tables

Cooperative Multi-Agent Learning: Indirect Communication
- Leaving footsteps, trails of bread crumbs, placement of objects, pheromones
- Inspiration from social insects
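A pheromone-style sketch of indirect communication: agents write to a shared grid, evaporation decays old marks, and other agents read the grid instead of receiving messages. Grid size, deposit amount, and evaporation rate are illustrative:

```python
GRID_SIZE = 5
EVAPORATION = 0.9  # fraction of pheromone retained each tick
DEPOSIT = 1.0

pheromone = [[0.0] * GRID_SIZE for _ in range(GRID_SIZE)]

def deposit(x, y):
    """An agent marks its current cell (writes to the environment)."""
    pheromone[y][x] += DEPOSIT

def evaporate():
    """Old marks fade, so stale information disappears on its own."""
    for row in pheromone:
        for x in range(GRID_SIZE):
            row[x] *= EVAPORATION

def strongest_neighbor(x, y):
    """Another agent 'reads' the environment: follow the strongest mark."""
    neighbors = [(x + dx, y + dy)
                 for dx, dy in [(1, 0), (-1, 0), (0, 1), (0, -1)]
                 if 0 <= x + dx < GRID_SIZE and 0 <= y + dy < GRID_SIZE]
    return max(neighbors, key=lambda p: pheromone[p[1]][p[0]])
```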

Cooperative Multi-Agent Learning: Major Open Topics
- Scalability: the search space grows with the number and complexity of agent behaviors
- Adaptive dynamics: the "moving the goalposts" dilemma; suboptimal solutions
- Problem decomposition: done at various levels, assuming behaviors can be learned independently

Cooperative Multi-Agent Learning: Future Topics
- Multiple agents: more than two agents
- Team heterogeneity: agents with different capabilities
- Complex agents
- Dynamically changing teams and scenarios

Questions?

Optimizing Interdependent Skills: Individual Skills cont'd.
- Performance of various skills using CMA-ES

Optimizing Interdependent Skills: Open-Loop Approach
- Does not rely on corrective feedback
- Simpler to implement
- Yields faster walks
- Less robust to disturbances

