
DCOPs Meet the Real World: Exploring Unknown Reward Matrices with Applications to Mobile Sensor Networks

Manish Jain, Matthew E. Taylor, Makoto Yokoo, Milind Tambe

Motivation

Real-world applications of mobile sensor networks:
◦ Robots in an urban setting
◦ Autonomous underwater vehicles

Challenges
◦ Rewards are unknown
◦ Limited time horizon
◦ Anytime performance is important


Existing Models
◦ Distributed Constraint Optimization for sensor networks [Lesser03, Zhang03, …]
◦ Mobile Sensor Nets for Communication [Cheng2005, Marden07, …]
◦ Factor Graphs [Farinelli08, …]
◦ Swarm Intelligence, Potential Games
◦ Other Robotic Approaches …

Contributions
◦ Propose new algorithms for DCOPs
◦ Seamlessly interleave distributed exploration and distributed exploitation
◦ Tests on physical hardware

Outline
◦ Background on DCOPs
◦ Solution Techniques
◦ Experimental Results
◦ Conclusions and Future Work

DCOP Framework

Agents a1, a2, a3 form a chain (a1 - a2 - a3). Each link (a1-a2 and a2-a3) carries a reward table whose four entries over the joint value choices of the two agents are 10, 0, 0, and 6; a sketch of this representation follows.
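To make the framework concrete, here is a minimal sketch of how this chain DCOP could be represented and evaluated. The binary domain {0, 1} and the ordering of the four reward entries are assumptions made for illustration; only the agent names and the reward values 10, 0, 0, 6 come from the slides.

```python
# Minimal sketch of the three-agent chain DCOP. The binary domain and the
# ordering of the reward entries are illustrative assumptions.
DOMAIN = (0, 1)

# One reward table per link; keys are joint value assignments of the two agents.
reward_tables = {
    ("a1", "a2"): {(0, 0): 10, (0, 1): 0, (1, 0): 0, (1, 1): 6},
    ("a2", "a3"): {(0, 0): 10, (0, 1): 0, (1, 0): 0, (1, 1): 6},
}

def total_reward(assignment):
    """Net reward of a complete assignment, summed over all links."""
    return sum(table[(assignment[i], assignment[j])]
               for (i, j), table in reward_tables.items())

print(total_reward({"a1": 0, "a2": 0, "a3": 0}))  # 20 under these assumed entries
```

In the sensor-network mapping on the next slide, the agents are robots, their values are locations, and the reward on a link is the measured signal strength between neighbors.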

Applying DCOP

DCOP Construct                  | Domain Equivalent
Agents                          | Robots
Agent values                    | Set of possible locations
Reward on a link                | Signal strength between neighbors
Objective: maximize net reward  | Objective: maximize net signal strength

k-Optimality [Pearce07]

Same chain of agents a1, a2, a3, with the reward table entries 10, 0, 0, 6 on each link.

1-optimal solutions: all agents take one value or all take the other, with rewards R<…> = 12 and R<…> = 6. (An assignment is 1-optimal when no single agent can improve the total reward by changing only its own value.)
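To make the 1-optimality condition concrete, the sketch below checks whether an assignment can be improved by any single agent deviating on its own. The reward values and domain are the same illustrative assumptions as in the previous sketch.

```python
# Checks 1-optimality: no single agent can raise the net reward by changing
# only its own value. Reward entries and domain are illustrative assumptions.
DOMAIN = (0, 1)
reward_tables = {
    ("a1", "a2"): {(0, 0): 10, (0, 1): 0, (1, 0): 0, (1, 1): 6},
    ("a2", "a3"): {(0, 0): 10, (0, 1): 0, (1, 0): 0, (1, 1): 6},
}

def total_reward(assignment):
    return sum(t[(assignment[i], assignment[j])]
               for (i, j), t in reward_tables.items())

def is_1_optimal(assignment):
    """True if no single agent can improve the net reward by deviating alone."""
    base = total_reward(assignment)
    for agent in assignment:
        for value in DOMAIN:
            candidate = dict(assignment)
            candidate[agent] = value
            if total_reward(candidate) > base:
                return False
    return True

# Under these assumed entries, both uniform assignments are 1-optimal even
# though only one of them is the global optimum.
print(is_1_optimal({"a1": 0, "a2": 0, "a3": 0}),
      is_1_optimal({"a1": 1, "a2": 1, "a3": 1}))
```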

MGM-Omniscient

The slides walk through successive rounds of MGM on the same example: agents a1, a2, a3 in a chain, with the reward table entries 10, 0, 0, 6 on each link.

◦ Only one agent per neighborhood is allowed to change its value in a round
◦ Monotonic algorithm: the total reward never decreases between rounds

A minimal sketch of one MGM round appears below.
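The following is a minimal sketch of one MGM round in this known-reward ("omniscient") setting. The binary domain, the reward-entry ordering, the starting assignment, and the strict tie-breaking rule are assumptions for illustration, not the authors' implementation; with these assumptions the per-agent gains happen to come out as 10, 12, 10 and then 0, 0, 0, matching the numbers that appear in the slide transcript.

```python
# One round of MGM in the "omniscient" setting (all link rewards are known).
# The chain, binary domain, reward values, and starting assignment are
# illustrative assumptions, not the authors' implementation.
REWARD = {(0, 0): 10, (0, 1): 0, (1, 0): 0, (1, 1): 6}
NEIGHBORS = {"a1": ["a2"], "a2": ["a1", "a3"], "a3": ["a2"]}

def local_reward(agent, value, values):
    """Sum of link rewards between `agent` (taking `value`) and its neighbors."""
    return sum(REWARD[(value, values[n])] for n in NEIGHBORS[agent])

def mgm_round(values):
    # 1. Each agent computes its best unilateral value and the resulting gain.
    gains, best_vals = {}, {}
    for a in values:
        current = local_reward(a, values[a], values)
        best_val = max((0, 1), key=lambda v: local_reward(a, v, values))
        gains[a] = local_reward(a, best_val, values) - current
        best_vals[a] = best_val
    # 2. Gains are exchanged with neighbors; an agent moves only if its gain is
    #    positive and strictly larger than every neighbor's gain, so at most one
    #    agent per neighborhood changes -- this is what keeps MGM monotonic.
    new_values = dict(values)
    for a in values:
        if gains[a] > 0 and all(gains[a] > gains[n] for n in NEIGHBORS[a]):
            new_values[a] = best_vals[a]
    return new_values

# Starting from (1, 0, 1), the gains work out to 10, 12, 10 under these assumed
# rewards; only a2 moves, after which every gain is 0 and the run has converged
# to a 1-optimum.
values = {"a1": 1, "a2": 0, "a3": 1}
values = mgm_round(values)
```

In the setting addressed by this talk the link rewards are initially unknown, which is what the static-estimation and balanced-exploration techniques below address.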

Solution Techniques

Static Estimation:
◦ SE-Optimistic
◦ SE-Realistic

Balanced Exploration using Decision Theory:
◦ BE-Backtrack
◦ BE-Rebid
◦ BE-Stay

Static Estimation Techniques

SE-Optimistic:
◦ Always assumes that exploration is better
◦ Greedy approach

SE-Realistic (labelled SE-Mean in the result figures):
◦ More conservative: assumes exploration yields the mean reward
◦ Faster convergence

A hedged sketch of both estimates follows.
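Below is a minimal sketch of how the two static estimates could stand in for an unknown link reward inside the MGM gain computation. The constants R_MAX and R_MEAN and the function names are placeholders for illustration, not values or APIs from the paper.

```python
# Sketch of the two static estimates for the reward of an unexplored location.
# R_MAX and R_MEAN stand in for the maximum and mean of the reward distribution;
# the concrete numbers are placeholders, not values from the paper.
R_MAX = 200.0    # assumed upper bound on a link reward (e.g. best signal strength)
R_MEAN = 100.0   # assumed mean link reward

def se_optimistic(known_reward):
    """SE-Optimistic: an unexplored value is assumed to pay the maximum reward,
    so exploring always looks at least as good as staying."""
    return known_reward if known_reward is not None else R_MAX

def se_realistic(known_reward):
    """SE-Realistic / SE-Mean: an unexplored value is assumed to pay the mean
    reward, which is more conservative and converges faster."""
    return known_reward if known_reward is not None else R_MEAN

# Either estimate can be dropped into the MGM gain computation in place of the
# true (unknown) reward: explored values keep their measured reward, unexplored
# ones get the estimate.
print(se_optimistic(None), se_realistic(None), se_optimistic(42.0))
```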


Balanced Exploration Techniques

BE-Backtrack:
◦ Decision-theoretic limit on exploration
◦ Tracks the previous best location Rb
◦ State of the agent: (Rb, T), where T is the remaining time
◦ Derives the utility of exploration, the utility of backtracking after successful exploration, and the utility of backtracking after unsuccessful exploration

An illustrative sketch of the comparison follows.
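The transcript preserves only the titles of the utility derivations, so the sketch below is an illustrative reconstruction of the comparison BE-Backtrack makes, assuming a uniform reward distribution and simple bookkeeping; it is not the paper's equations.

```python
# Hedged sketch of the BE-Backtrack idea: compare the expected value of staying
# at the best known location (reward Rb) for the remaining T rounds against
# exploring for te rounds and then backtracking to the best location found.
# The uniform reward model and the bookkeeping are illustrative assumptions.
import random

def utility_stay(Rb, T):
    """Collect Rb for every remaining round."""
    return Rb * T

def utility_explore_then_backtrack(Rb, T, te, samples=10_000, max_reward=200.0):
    """Monte-Carlo estimate: explore for te rounds (unknown rewards, modelled
    here as uniform on [0, max_reward]), then spend the remaining T - te rounds
    at the better of Rb and the best reward discovered."""
    total = 0.0
    for _ in range(samples):
        draws = [random.uniform(0.0, max_reward) for _ in range(te)]
        total += sum(draws) + (T - te) * max(Rb, max(draws))
    return total / samples

# An agent with a mediocre best-so-far reward and plenty of time left should
# prefer exploring under this model.
Rb, T = 80.0, 20
print(utility_stay(Rb, T), utility_explore_then_backtrack(Rb, T, te=3))
```

BE-Rebid, described next, repeats this kind of comparison at every time step instead of committing once.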

BE-Rebid:
◦ Allows agents to backtrack
◦ Re-evaluates at every time step
◦ Allows for on-the-fly reasoning
◦ Same equations as BE-Backtrack

BE-Stay:
◦ Agents are unable to backtrack
◦ Dynamic programming approach

A minimal sketch of the dynamic program follows.
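This is a minimal sketch of the kind of dynamic program BE-Stay could use when backtracking is impossible: with t rounds left at a location paying r per round, the agent either stays for all t rounds or moves once and faces the same decision one round later. The discrete reward distribution and the assumption that a move immediately pays the new location's reward are illustrative assumptions, not data from the paper.

```python
# Hedged sketch of the BE-Stay decision as a dynamic program over (r, t):
# reward of the current location and number of rounds remaining, no backtracking.
from functools import lru_cache

# Assumed distribution of the reward at a new, unexplored location.
REWARDS = [0, 50, 100, 150, 200]
PROBS   = [0.2, 0.2, 0.2, 0.2, 0.2]

@lru_cache(maxsize=None)
def value(r, t):
    """Best expected total reward over the remaining t rounds, starting at a
    location that pays r per round, when backtracking is impossible."""
    if t == 0:
        return 0.0
    stay = r * t
    move = sum(p * (r2 + value(r2, t - 1)) for r2, p in zip(REWARDS, PROBS))
    return max(stay, move)

def should_explore(r, t):
    """BE-Stay's decision: explore iff moving has higher expected value."""
    move = sum(p * (r2 + value(r2, t - 1)) for r2, p in zip(REWARDS, PROBS))
    return move > r * t

print(value(100, 5), should_explore(100, 5), should_explore(150, 5))
```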

Results


Figure: learning curve (20 agents, chain, 100 rounds).

Results (simulation)

Figure: varying number of robots (5, 15, 30, 50); scaled cumulative signal strength of SE-Optimistic, SE-Mean, BE-Stay, BE-Backtrack, and BE-Rebid (chain topology, 100 rounds).

Figure: varying total number of rounds (5, 25, 50, 75, 100); scaled cumulative signal strength of SE-Optimistic, SE-Mean, BE-Stay, BE-Backtrack, and BE-Rebid (10 agents, random graphs with 15-20 links).

Figure: varying topology (chain, density = 1/3, density = 2/3, full); scaled cumulative signal strength of SE-Optimistic, SE-Mean, BE-Stay, BE-Backtrack, and BE-Rebid (20 agents, 100 rounds).

Results (physical robots)


Figure: Physical Robot Results; absolute gain of SE-Mean and BE-Rebid on chain, random, and fully connected topologies (4 robots, 20 rounds).

Conclusions
◦ Provide algorithms for DCOPs addressing real-world challenges
◦ Demonstrated improvement with physical hardware

Future Work
◦ Scaling up the evaluation: different approaches, different parameter settings
◦ Examine alternate metrics: battery drain, throughput, cost of movement
◦ Verify the algorithms in other domains

Thank You

manish.jain@usc.edu
http://teamcore.usc.edu/manish

