two body problem: collaborativevisual task completion cvpr
TRANSCRIPT
Two Body Problem:Collaborative Visual Task Completion
CVPR 2019https://prior.allenai.org/projects/two-body-problem
�� �� ��������������������� ����
Adam (Iou-Jen) LiuUIUC
Ali FarhadiAI2, UW, XNOR.ai
Ani KembhaviAI2
Alex SchwingUIUC
Eric KolveAI2
Luca WeihsAI2
Mohammad RastegariAI2, XNOR.ai
Svetlana LazebnikUIUC
Collaborations
Typical visual tasks
Object detection[COCO]
Question AnsweringVQA
Redmon et al. YOLO9000: Better, Faster, StrongerAgarwal et al. VQA: Visual Question Answering
Visual embodied tasks
Zhu et al. Target-driven Visual Navigation in Indoor Scenes using Deep Reinforcement Learning
Multi-agent virtual world: Existing work
Lowe et al. Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments
Single-agent virtual world: Existing work
Kolve et al. AI2-THOR: An Interactive 3D Environment for Visual AI
Find and Lift Furniture
Agent 1 and Agent 21. Navigate to TV2. Collaborative pickup
Agent 1 quickly finds it
Agent 2 is on the wrongSide
Need for communication
WAL
L
Top-down view (for illustration)
Task Definition
Step 1: NavigationStep 2: Collaborative pickup
Three conditions for successful pickup, both agents need to:1. Be near the TV and looking at it2. Simultaneously execute PICKUP3. Be on different sides of the TV
Reward structure and Loss
• Rewards+1 for performing a successful joint pickup-0.01 for every time step-0.1 for a failed pickup action-0.02 for any other failed action
• Asynchronous advantage actor critic• Value loss
• Policy loss
Two Body Network
Mnih et al. Asynchronous Methods for Deep Reinforcement LearningRoss et al. Reduction of Imitation Learning to No-Regret Online Learning
Two Body Network
Mnih et al. Asynchronous Methods for Deep Reinforcement LearningRoss et al. Reduction of Imitation Learning to No-Regret Online Learning
Two Body Network
Mnih et al. Asynchronous Methods for Deep Reinforcement LearningRoss et al. Reduction of Imitation Learning to No-Regret Online Learning
Two Body Network
Mnih et al. Asynchronous Methods for Deep Reinforcement LearningRoss et al. Reduction of Imitation Learning to No-Regret Online Learning
Asynchronous advantage actor critic (A3C) algorithm+ Imitation learning (vs shortest path oracle agent)
Two Body Network
Mnih et al. Asynchronous Methods for Deep Reinforcement LearningRoss et al. Reduction of Imitation Learning to No-Regret Online Learning
Talk stage Reply stage
Two Body Network
Mnih et al. Asynchronous Methods for Deep Reinforcement LearningRoss et al. Reduction of Imitation Learning to No-Regret Online Learning
Talk stage Reply stage
Communication and Belief Refinement
, Softmax x =
Belief Refinement
Communication
TalkSymbols
Belief
, Softmax x
=
Belief Refinement
Communication
ReplySymbols Refined
Belief
Talk stage Reply stage
Refined Belief Belief
Output 1
Output 2
Agent 1Agent 2
Communication and Belief Refinement
, Softmax x =
Belief Refinement
Communication
TalkSymbols
Belief
, Softmax x
=
Belief Refinement
Communication
ReplySymbols Refined
Belief
Talk stage Reply stage
Refined Belief Belief
Output 1Agent 1Agent 2
Communication and Belief Refinement
, Softmax x =
Belief Refinement
Communication
TalkSymbols
Belief
, Softmax x
=
Belief Refinement
Communication
ReplySymbols Refined
Belief
Talk stage Reply stage
Refined Belief Belief
Output 1
Output 2
Agent 1Agent 2
Talk and reply modules
, Softmax x =
Belief Refinement
Communication
TalkSymbols
Belief
, Softmax x
=
Belief Refinement
Communication
ReplySymbols Refined
Belief
Talk stage Reply stage
Refined Belief Belief
Local belief(from LSTM)
Final belief(to actor-critic)
Agent 1Agent 2
Without explicit communication
Total steps: 165Unsuccessful pickups: 6
Top-down view (for illustration)
WAL
L
With explicit communication
Total steps: 86Unsuccessful pickups: 0
WAL
L
Top-down view (for illustration)
Interpretation of messages
, Softmax x =
Belief Refinement
Communication
TalkSymbols
Belief
, Softmax x
=
Belief Refinement
Communication
ReplySymbols Refined
Belief
Talk stage Reply stage
Refined Belief Belief
!"# (Ag 1)!"# (Ag 2)
!$# (Ag 1)!$# (Ag 2)
Interpretation of messages
, Softmax x =
Belief Refinement
Communication
TalkSymbols
Belief
, Softmax x
=
Belief Refinement
Communication
ReplySymbols Refined
Belief
Talk stage Reply stage
Refined Belief Belief
!"# (Ag 1)!"# (Ag 2)
!$# (Ag 1)!$# (Ag 2)
Interpretation of messages
, Softmax x =
Belief Refinement
Communication
TalkSymbols
Belief
, Softmax x
=
Belief Refinement
Communication
ReplySymbols Refined
Belief
Talk stage Reply stage
Refined Belief Belief
!"# (Ag 1)!"# (Ag 2)
!$# (Ag 1)!$# (Ag 2)
Interpretation of messages
, Softmax x =
Belief Refinement
Communication
TalkSymbols
Belief
, Softmax x
=
Belief Refinement
Communication
ReplySymbols Refined
Belief
Talk stage Reply stage
Refined Belief Belief
!"# (Ag 1)!"# (Ag 2)
!$# (Ag 1)!$# (Ag 2)
Interpretation of messages
, Softmax x =
Belief Refinement
Communication
TalkSymbols
Belief
, Softmax x
=
Belief Refinement
Communication
ReplySymbols Refined
Belief
Talk stage Reply stage
Refined Belief Belief
!"# (Ag 1)!"# (Ag 2)
!$# (Ag 1)!$# (Ag 2)
Communication symbols
, Softmax x =
Belief Refinement
Communication
TalkSymbols
Belief
, Softmax x
=
Belief Refinement
Communication
ReplySymbols Refined
Belief
Talk stage Reply stage
Refined Belief Belief
!"# (Ag 1)!"# (Ag 2)
!$# (Ag 1)!$# (Ag 2)
Summary1. Multi-agent simulation framework2. Collaborative find and lift task3. Reinforcement + Imitation Learning
Summary1. Multi-agent simulation framework2. Collaborative find and lift task3. Reinforcement + Imitation Learning 4. Two Body Network model
Summary1. Multi-agent simulation framework2. Collaborative find and lift task3. Reinforcement + Imitation Learning 4. Two Body Network model5. Comm. and BR module
, Softmax x =
Belief Refinement
Communication
TalkSymbols
Belief
, Softmax x
=
Belief Refinement
Communication
ReplySymbols Refined
Belief
Talk stage Reply stage
Refined Belief Belief
Summary1. Multi-agent simulation framework2. Collaborative find and lift task3. Reinforcement + Imitation Learning 4. Two Body Network model5. Comm. and BR module6. Interpretation of messages
Summary1. Multi-agent simulation framework2. Collaborative find and lift task3. Reinforcement + Imitation Learning 4. Two Body Network model5. Comm. and BR module6. Interpretation of messages7. Results on communication and
generalization
Summary1. Multi-agent simulation framework2. Collaborative find and lift task3. Reinforcement + Imitation Learning 4. Two Body Network model5. Comm. and BR module6. Interpretation of messages7. Results on communication and
generalization8. Visual vs Gridworld
Two Body Problem:Collaborative Visual Task Completion
CVPR 2019
Adam (Iou-Jen) LiuUIUC
Ali FarhadiAI2, UW, XNOR.ai
Ani KembhaviAI2
Alex SchwingUIUC
Eric KolveAI2
Luca WeihsAI2
M. RastegariAI2, XNOR.ai
Svetlana LazebnikUIUC
�� �� ��������������������� ����