Multi-Agent Learning For CS 7631: Multi Robot Systems Azfar Aziz

Post on 16-Jan-2016


Multi-Agent Learning

For CS 7631: Multi Robot Systems

Azfar Aziz

The Papers

On Optimizing Interdependent Skills: A Case Study in Simulated 3D Humanoid Robot Soccer
  Urieli, MacAlpine, Kalyanakrishnan, Bentor, Stone
  AAMAS 2011

Cooperative Multi-Agent Learning: The State of the Art
  Panait and Luke
  2005

Optimizing Interdependent Skills: Problem Description

How can one learn behaviors that comprise multiple interdependent skills?

Optimizing Interdependent Skills: Insights

Automatic learning and optimization helps by
  Improving and refining human intuition
  Requiring significantly less labor to adapt to changes in agent and environment

Most complex systems naturally decompose into smaller sub-units
  It is convenient and beneficial to explicitly recognize their decomposition

Optimizing Interdependent Skills: Approach

Demonstrate that individual skills can be parameterized and optimized

Create a framework for optimizing skills in conjunction with one another

Key Idea
  Skills can be optimized while respecting the tight coupling induced over them by high-level behaviors

Optimizing Interdependent Skills: Domain

RoboCup 3D simulation based on SimSpark
  Generic physical multi-agent system simulator

Robot agents
  Homogeneous
  Modeled after Aldebaran Nao
  Height: 57 cm (~22.5 in)
  Mass: 4.5 kg (~10 lbs)

Optimizing Interdependent Skills: Domain contd.

Agents interact with the simulator by
  Sending actuation commands
  Receiving perceptual information

Optimizing Interdependent Skills: Agent Skills

Lowest level of control: PID controller
  Proportional-Integral-Derivative
  Input: target angle, Output: appropriate torque

Set of skills
  Walking (forwards, backwards, sideways)
  Turning
  Kicking
  Standing
  Goalie-diving
  Getting up
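
The PID controller at the lowest level of control can be sketched as follows. This is a textbook PID loop, not the agent's actual controller: the gains, the time step, and the toy joint model (unit inertia with viscous damping) are assumptions chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class PIDController:
    """Textbook PID controller; the gains are illustrative, not from the paper."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: float = 0.0

    def torque(self, target_angle: float, current_angle: float, dt: float) -> float:
        """Map a target joint angle to a torque command."""
        error = target_angle - current_angle
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate_joint(target_angle: float, steps: int = 2000, dt: float = 0.01) -> float:
    """Drive a hypothetical unit-inertia joint with damping toward target_angle."""
    pid = PIDController(kp=20.0, ki=1.0, kd=5.0)
    angle, velocity = 0.0, 0.0
    for _ in range(steps):
        tau = pid.torque(target_angle, angle, dt)
        accel = tau - 0.5 * velocity  # assumed damping coefficient of 0.5
        velocity += accel * dt
        angle += velocity * dt
    return angle
```

Calling `simulate_joint(1.0)` should return an angle close to the 1.0 rad target for these assumed gains.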

Optimizing Interdependent Skills: Open-Loop Approach

Implemented as a periodic state machine
  Multiple key frames
  Key frame = static pose of fixed joint positions
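
A minimal sketch of a periodic key-frame state machine is shown below. The joint names, angles, and the `hold_steps` parameter are hypothetical and only illustrate the idea of looping over static poses without feedback.

```python
from typing import Dict, List

KeyFrame = Dict[str, float]  # joint name -> target angle in degrees

# Hypothetical key frames; the real skills use many more joints.
WALK_KEY_FRAMES: List[KeyFrame] = [
    {"left_hip": 20.0, "right_hip": -20.0},
    {"left_hip": 0.0, "right_hip": 0.0},
    {"left_hip": -20.0, "right_hip": 20.0},
    {"left_hip": 0.0, "right_hip": 0.0},
]


def run_skill(key_frames: List[KeyFrame], hold_steps: int, total_steps: int) -> List[KeyFrame]:
    """Return the target pose for every control step.

    Each key frame is held for hold_steps steps, and the sequence wraps
    around periodically. No sensor feedback is used (open loop).
    """
    targets = []
    for step in range(total_steps):
        index = (step // hold_steps) % len(key_frames)
        targets.append(key_frames[index])
    return targets
```

Each returned pose would then be handed to the joint controllers as a set of target angles.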

Optimizing Interdependent Skills: Individual Skills

Initially hand-code skills
  Results in slow but stable skills

Further optimize
  Example: Walking
  Four key frames through which the agent periodically loops

Optimizing Interdependent Skills: Optimize Individual Skills

Evaluate:
  Distance traveled in the forward direction in 15 seconds
  Encourages straight walks

Four machine learning algorithms
  Hill Climbing (HC)
  Cross-Entropy Method (CEM)
  Genetic Algorithm (GA)
  Covariance Matrix Adaptation Evolution Strategy (CMA-ES)
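
To make the optimization loop concrete, here is a minimal sketch of the Cross-Entropy Method, the shortest of the four algorithms to write down. The fitness function `toy_walk_distance` is a made-up stand-in for a simulator run that measures forward distance, and the population size, elite count, and generation count are assumptions.

```python
import random
import statistics
from typing import Callable, List, Sequence


def cross_entropy_method(
    fitness: Callable[[Sequence[float]], float],
    mean: Sequence[float],
    std: Sequence[float],
    population: int = 30,
    elite: int = 6,
    generations: int = 40,
    seed: int = 0,
) -> List[float]:
    """Maximize fitness by repeatedly refitting a Gaussian to the best samples."""
    rng = random.Random(seed)
    mean, std = list(mean), list(std)
    for _ in range(generations):
        samples = [
            [rng.gauss(m, s) for m, s in zip(mean, std)] for _ in range(population)
        ]
        samples.sort(key=fitness, reverse=True)
        best = samples[:elite]
        for i in range(len(mean)):
            column = [sample[i] for sample in best]
            mean[i] = statistics.mean(column)
            # Small floor keeps the search from collapsing to zero variance.
            std[i] = statistics.pstdev(column) + 1e-3
    return mean


def toy_walk_distance(params: Sequence[float]) -> float:
    """Hypothetical fitness with its peak at (1.0, -2.0); not a real simulator."""
    x, y = params
    return -((x - 1.0) ** 2 + (y + 2.0) ** 2)
```

In the setting of the slides, each fitness call would be one or more 15-second simulated walks, which is why the number of evaluations dominates the cost.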

Optimizing Interdependent Skills: Individual Skills Results

CMA-ES performed best
  Additional advantage: low configuration overhead
    Initial mean and standard deviation for each parameter

Sample size = 15,000 simulation runs
  Number of generations * population size * number of measurements

Optimizing Interdependent Skills: Sequence of Skills

Evaluation criterion
  Time taken to score a goal on an empty field

Optimizing Interdependent Skills: Optimize Sequence of Skills

Optimizing Interdependent Skills: Sequence of Skills Results

Optimizing Interdependent Skills: Results

Optimizing Interdependent Skills: Critique

A lot of human intuition is used

Decomposition methodology shows promise

CPU time spent optimizing skills is on the order of 100,000 hours

Cooperative Multi-Agent Learning: Categories

Focus on machine learning and cooperative tasks

Team Learning
  Applying a single learner to discover joint solutions

Concurrent Learning
  Using multiple simultaneous learners
  Often one per agent

Cooperative Multi-Agent Learning: Agent vs. Multi-Agent

Agent
  Computational mechanism that exhibits a high degree of autonomy, performing actions in its environment based on information from the environment

Multi-Agent
  Environment in which there is more than one agent, with the constraint that a single agent can’t know everything about the world that other agents know

Cooperative Multi-Agent Learning: Features

Search space can be unusually large

Small changes in learned behaviors can create unpredictable changes at the macro level

May involve multiple learners
  Introduces game-theoretic issues

Cooperative Multi-Agent Learning: Machine Learning Methods

Three approaches: Supervised, Unsupervised, Reward-Based

Reward-based methods are used most
  Reinforcement Learning
    Estimates value functions
  Evolutionary Computation
    Directly learns behaviors without appealing to value functions
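
As a minimal sketch of what estimating a value function can look like, the update below is tabular Q-learning, one common reinforcement learning method. The slides do not name a specific algorithm; the state and action labels and the `alpha` and `gamma` values are illustrative assumptions.

```python
from typing import Dict, Hashable, Sequence, Tuple

QTable = Dict[Tuple[Hashable, Hashable], float]


def q_update(
    q: QTable,
    state: Hashable,
    action: Hashable,
    reward: float,
    next_state: Hashable,
    actions: Sequence[Hashable],
    alpha: float = 0.1,
    gamma: float = 0.9,
) -> float:
    """Apply one Q-learning update and return the new estimate.

    Unseen (state, action) pairs are treated as having value 0.0.
    """
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]
```

An evolutionary method, by contrast, would skip the table entirely and score whole candidate behaviors by their observed fitness.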

Cooperative Multi-Agent Learning: Team Learning

Single learner

Discovers a set of behaviors for a team of agents

Emergent complexity
  As agents interact with one another, the joint behavior can be unexpected

Categories: Homogeneous, Heterogeneous

Cooperative Multi-Agent Learning: Team Learning Pro/Con

Pro
  Single learner, so standard machine learning can be used
  Concerned with the performance of the entire team, not an individual

Con
  Large state space
  Requires centralization of the learning algorithm

Cooperative Multi-Agent Learning: Homogeneous Team Learning

All agents are assigned identical behavior
  Search space reduced drastically

Can act heterogeneously
  Use of sub-behaviors to “specialize”

Cooperative Multi-Agent Learning: Heterogeneous Team Learning

Agents with different behaviors
  Single learner trying to improve the team as a whole

More diversity
  Increased search space

Bulk of research concerns the emergence of specialists
  Use of restricted breeding works better than no restriction

Cooperative Multi-Agent Learning: Hybrid Team Learning

Set of agents is split into several squads
  Agents within each squad have the same behavior

Not shown to work well in RoboCup vs. homogeneous
  Did exhibit initial offensive-defensive squad specialization
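
The effect of homogeneous, hybrid, and heterogeneous team learning on the search space can be illustrated with a small counting sketch. It assumes, purely for illustration, that each distinct behavior is drawn from a finite set of `num_behaviors` candidates; real behavior spaces are rarely this simple.

```python
def team_search_space(num_behaviors: int, num_agents: int, num_squads: int) -> int:
    """Count joint team behaviors when agents in a squad share one behavior.

    num_squads == 1           -> homogeneous team (one shared behavior)
    1 < num_squads < agents   -> hybrid team (one behavior per squad)
    num_squads == num_agents  -> fully heterogeneous team
    """
    if not 1 <= num_squads <= num_agents:
        raise ValueError("num_squads must be between 1 and num_agents")
    return num_behaviors ** num_squads
```

With 10 candidate behaviors and 11 agents, this gives 10 joint behaviors for a homogeneous team, 100 for two squads, and 10**11 for a fully heterogeneous team.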

Cooperative Multi-Agent Learning: Concurrent Learning

Multiple learning processes attempt to improve parts of the team

Current approach: each agent has its own unique learning process

Cooperative Multi-Agent Learning: Concurrent Learning Pro/Con

Pro
  Good in domains in which decomposition is possible
  Projects the joint team space onto smaller separate search spaces
  Smaller learning chunks may be more flexible in their use of computational resources

Con
  Each learner adapts its behavior in the context of others
  Can violate basic ML assumptions

Cooperative Multi-Agent Learning: Credit Assignment

Global Reward
  All learners’ rewards increase equally
  Better for soccer

Local Reward
  Leads to faster learning rates but not always better results
  Better for foraging

The choice of global vs. local reward has a significant impact on the dynamics of learning
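
The difference between the two credit assignment schemes can be sketched as follows. The per-agent `contributions` list is a hypothetical measure of what each agent achieved (for example, items foraged); how such contributions are measured is an assumption of this sketch.

```python
from typing import List


def global_rewards(contributions: List[float]) -> List[float]:
    """Split the team total equally, so every learner's reward rises together."""
    share = sum(contributions) / len(contributions)
    return [share] * len(contributions)


def local_rewards(contributions: List[float]) -> List[float]:
    """Reward each learner only for its own contribution."""
    return list(contributions)
```

Under the global scheme an idle agent is rewarded as much as a productive one, which is one way to see why the choice changes what each learner is pushed toward.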

Cooperative Multi-Agent Learning: Dynamics of Learning

The environment is dynamic, so each agent must keep up with it

Nash equilibrium
  No single agent has any rational incentive to change its strategy away from equilibrium
  Sometimes suboptimal
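
A small sketch can show how a Nash equilibrium may be suboptimal. The function below finds pure-strategy equilibria of a two-player matrix game; the payoff table `TEAM_PAYOFF` is a made-up coordination game in which both agents receive the same reward.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def pure_nash_equilibria(payoff_a: Matrix, payoff_b: Matrix) -> List[Tuple[int, int]]:
    """Return all (row, col) pairs where neither player gains by deviating alone.

    payoff_a[r][c] is the row player's payoff, payoff_b[r][c] the column player's.
    """
    rows, cols = len(payoff_a), len(payoff_a[0])
    equilibria = []
    for r in range(rows):
        for c in range(cols):
            row_is_best = all(payoff_a[r][c] >= payoff_a[r2][c] for r2 in range(rows))
            col_is_best = all(payoff_b[r][c] >= payoff_b[r][c2] for c2 in range(cols))
            if row_is_best and col_is_best:
                equilibria.append((r, c))
    return equilibria


# Hypothetical shared payoff: both agents choosing action 0 is best (10),
# but both choosing action 1 (5) is also stable.
TEAM_PAYOFF: Matrix = [[10, 0], [0, 5]]
```

Here `pure_nash_equilibria(TEAM_PAYOFF, TEAM_PAYOFF)` returns `[(0, 0), (1, 1)]`: the joint action (1, 1) is an equilibrium even though it pays 5 instead of 10, because neither agent can improve by changing its own action alone.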

Cooperative Multi-Agent Learning: Dynamics of Learning contd.

Fully Cooperative
  Global reward scheme to divvy reinforcement equally among all agents
  Global Nash equilibria

Competitive Learning
  Agents co-adapt to each other
  One learner can dominate, causing no additional learning (loss of gradient)
  Difficult to monitor coevolution
  Cyclical behavior

Cooperative Multi-Agent Learning: Teammate Modeling

Learn about other agents to anticipate their behavior and act accordingly

Prone to infinite recursion
  Agent A is doing X because it thinks agent B thinks that agent A thinks that agent B thinks that…
  Create levels: an N-level agent models its teammates as (N-1)-level agents

Best policy is to minimize assumptions about other agents’ policies
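
The level-based way of cutting off the recursion can be sketched as below. This is one simple reading of the idea, assuming a shared payoff table and a level-0 agent that just plays a fixed default action; neither assumption comes from the slides.

```python
from typing import List

# Hypothetical shared payoff: SHARED_PAYOFF[my_action][teammate_action].
SHARED_PAYOFF: List[List[float]] = [[10, 0], [0, 5]]


def level_n_action(level: int, payoff: List[List[float]], default_action: int = 0) -> int:
    """Choose an action by modeling the teammate as a (level - 1) agent.

    A level-0 agent does no modeling and plays default_action, which is
    what stops the "A thinks that B thinks that..." recursion.
    """
    if level == 0:
        return default_action
    teammate_action = level_n_action(level - 1, payoff, default_action)
    return max(range(len(payoff)), key=lambda a: payoff[a][teammate_action])
```

The recursion depth equals the chosen level, so the model stays finite, at the cost of assuming something about how the teammate reasons.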

Cooperative Multi-Agent Learning: Communication

Altering the state of the environment such that other agents can perceive the modification and decode information from it

If agents communicate everything, the problem reduces to a single-agent system

Explicit communication increases the search space

Cooperative Multi-Agent Learning: Direct Communication

Shared blackboards, signaling, and message-passing

Helps with respect to Reinforcement Learning
  Share past experiences in the form of episodes
  Joint utility/policy table
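
Sharing past experiences through a blackboard can be sketched as follows. The `Blackboard` class, the agent identifiers, and the episode layout as (state, action, reward) steps are hypothetical names introduced only for this illustration.

```python
from typing import List, Tuple

Step = Tuple[str, str, float]  # (state, action, reward)
Episode = List[Step]


class Blackboard:
    """Shared store where agents post episodes and read those of teammates."""

    def __init__(self) -> None:
        self._episodes: List[Tuple[str, Episode]] = []

    def post(self, agent_id: str, episode: Episode) -> None:
        """Publish one finished episode under the posting agent's id."""
        self._episodes.append((agent_id, episode))

    def episodes_from_others(self, agent_id: str) -> List[Episode]:
        """Return every episode posted by agents other than agent_id."""
        return [episode for author, episode in self._episodes if author != agent_id]
```

A learner could replay the episodes it reads as if it had experienced them itself, which is the sense in which communication helps reinforcement learning here.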

Cooperative Multi-Agent Learning: Indirect Communication

Leaving footsteps, trail of bread crumbs, placement of objects, pheromones

Inspiration from social insects
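
Pheromone-style indirect communication can be sketched with a small grid that agents write to and read from. The grid size, evaporation rate, and the rule of moving toward the strongest neighboring cell are assumptions made for illustration.

```python
from typing import List, Tuple


class PheromoneGrid:
    """Environment state that agents modify and sense instead of messaging."""

    def __init__(self, width: int, height: int, evaporation: float = 0.1) -> None:
        self.width, self.height = width, height
        self.evaporation = evaporation
        self.levels: List[List[float]] = [[0.0] * width for _ in range(height)]

    def deposit(self, x: int, y: int, amount: float = 1.0) -> None:
        """An agent leaves a mark at cell (x, y)."""
        self.levels[y][x] += amount

    def evaporate(self) -> None:
        """Marks fade over time, so stale information disappears."""
        for row in self.levels:
            for x in range(self.width):
                row[x] *= 1.0 - self.evaporation

    def strongest_neighbor(self, x: int, y: int) -> Tuple[int, int]:
        """Return the 4-connected neighbor with the highest pheromone level."""
        candidates = [
            (x + dx, y + dy)
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 0 <= x + dx < self.width and 0 <= y + dy < self.height
        ]
        return max(candidates, key=lambda p: self.levels[p[1]][p[0]])
```

No agent addresses another directly; the trail itself carries the information, as with ants marking a path to food.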

Cooperative Multi-Agent Learning: Major Open Topics

Scalability
  Search space grows with the number and complexity of agent behaviors

Adaptive Dynamics
  “Moving the goalposts” dilemma
  Suboptimal solutions

Problem Decomposition
  Done at various levels, assuming behaviors can be learned independently

Cooperative Multi-Agent Learning: Future Topics

Multiple agents
  More than two agents

Team Heterogeneity
  Agents with different capabilities

Complex Agents

Dynamically Changing Teams and Scenarios

Questions?

Optimizing Interdependent Skills: Individual Skills contd.

Performance of various skills using CMA-ES

Page 41: Multi-Agent Learning For CS 7631: Multi Robot Systems Azfar Aziz

Optimizing Interdependent Skills: Open-Loop Approach

Does not rely on corrective feedback

Simpler to implement

Yields faster walks

Less robust to disturbance