Spatial References and Perspective in Natural Language Instructions for
Collaborative Manipulation
Rosario Scalise, Shen Li, Henny Admoni, Stephanie Rosenthal, Siddhartha S. Srinivasa
1
2
● Background, why tabletop is important
● Problem: object uniqueness
  ○ Solution 1: spatial reference
  ○ Solution 2: perspective
● Study 1
  ○ Image generation
  ○ Study design
  ○ Result
    ■ Human vs robot
    ■ Visual search + word frequencies
    ■ Difficulty
● Study 2
  ○ Data coding
  ○ Study design
  ○ Result
    ■ Block ambiguity
    ■ Perspective
● Discussion
  ○ 3 approaches to give instructions
  ○ Block ambiguity and perspective ambiguity
  ○ Neither perspective is the best
  ○ Future work - interactivity
3
Herb image courtesy of Pittsburgh Post-Gazette
4
5
“I am going to pick up the cup on the right!”
6
Key Issue: Ambiguity
Question by Jessica Lock from the Noun Project
8
Key Issue: Ambiguity
As scene complexity increases, so does the difficulty in specifying an object.
Natural language is inherently ambiguous.
Forms of Ambiguity
10
Visual Appearance
“Pick up the coffee cup.”
Which one?
Forms of Ambiguity
12
Perspective
“Pick up the coffee cup on the right.”
Whose right?
Forms of Ambiguity
14
Proximity
“Pick up the coffee cup next to the donuts.”
How close is ‘next to’?
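The visual-appearance case can be made concrete with a minimal sketch: a description is ambiguous when more than one object on the table matches it. The `TableItem` class, the `candidates` and `is_ambiguous` helpers, and the example scene are all hypothetical names introduced here for illustration; they are not part of the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableItem:
    label: str  # hypothetical identifier for one object on the table
    kind: str   # what a speaker would call it, e.g. "coffee cup"


def candidates(scene, kind):
    """Return every item in the scene that matches the described kind."""
    return [item for item in scene if item.kind == kind]


def is_ambiguous(scene, kind):
    """A description is ambiguous when it matches more than one item."""
    return len(candidates(scene, kind)) > 1


# Assumed example scene: two coffee cups and one plate of donuts.
scene = [
    TableItem("cup_1", "coffee cup"),
    TableItem("cup_2", "coffee cup"),
    TableItem("donuts", "donuts"),
]
```

With this scene, "Pick up the coffee cup" matches two items, which is why the speaker has to add a spatial reference such as "on the right" or "next to the donuts".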
15
16
Can you uniquely describe this block?
How can we best overcome ambiguity when grounding our references while keeping communication natural?
17
Approach
Learn by observing what humans do and extract best-practices from the examples that are most successful.
18
19
bender by Jordan Díaz Andrés from the Noun Project
24
Collect Corpus
Gain Insights
Evaluate Corpus
Extract Guidelines
+ Analysis Tools
25
26
Study 1 : Collecting Instructions for Corpus
27
person
person
Study 1 : Collecting Instructions for Corpus
30
robot
1400 Total
Evaluating
31
How do we tell how good any specific instruction is?
“Pick up the blue block”
Evaluating
32
Given an instruction and the stimulus it corresponds to, can people infer the correct block?
“Pick up the blue block”
Study 2 : Corpus Evaluation
34
Metrics
37
For each instruction, we calculate:
Accuracy: (# of successful block selections) / (total # of times instruction is shown)
Avg. Completion time: How long it takes to select the indicated block on average
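The two metrics can be sketched in a few lines. This is only an illustration under stated assumptions: the trial record layout (instruction id, selected block, target block, seconds) and the function name `instruction_metrics` are hypothetical, and the average time is taken over every trial of an instruction, correct or not, which the slides do not specify.

```python
from collections import defaultdict
from statistics import mean


def instruction_metrics(trials):
    """Compute per-instruction accuracy and average completion time.

    trials: iterable of (instruction_id, selected_block, target_block, seconds).
    This record layout is an assumption made for this sketch.
    """
    grouped = defaultdict(list)
    for instruction_id, selected, target, seconds in trials:
        grouped[instruction_id].append((selected == target, seconds))

    return {
        instruction_id: {
            # successful selections / times the instruction was shown
            "accuracy": sum(ok for ok, _ in rows) / len(rows),
            # assumption: averaged over all trials, not only correct ones
            "avg_completion_time": mean(sec for _, sec in rows),
        }
        for instruction_id, rows in grouped.items()
    }
```

For example, an instruction shown twice, selected correctly once, with completion times of 4 and 6 seconds, gets an accuracy of 0.5 and an average completion time of 5 seconds.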
38
Full investigation and results TBR in:
“Spatial References and Perspective in Natural Language Instructions for Collaborative Manipulation”
at IEEE RO-MAN 2016 (late August)
Perspectives
39
44
Types of Perspective:
Participant: “Pick up the blue block on my right”
Partner: “Pick up the blue block on your left”
Neither: “Pick up the blue block closest to the orange block.”
Unknown: “Pick up the blue block to the right of the orange block.”
Partner
Participant (Speaker)
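The four categories can be approximated with a keyword heuristic, sketched below. This is not the coding procedure used in the study; the function name `code_perspective` and the regular expressions are assumptions chosen so that the example instructions in this talk fall into the categories shown on the slides.

```python
import re

# "left"/"right"/"leftmost"/"rightmost" used as a direction.
# "right next to" is excluded because "right" is an intensifier there.
DIRECTION = r"\b(?:left|right)(?:most)?\b(?!\s+next\s+to)"


def code_perspective(instruction: str) -> str:
    """Heuristically label the perspective an instruction relies on."""
    text = instruction.lower()
    # Speaker's frame: "my right", "my far left", ...
    if re.search(r"\bmy\s+(?:\w+\s+)?(?:left|right)\b", text):
        return "participant"
    # Listener's frame: "your left", "your far right", ...
    if re.search(r"\byour\s+(?:\w+\s+)?(?:left|right)\b", text):
        return "partner"
    # A direction with no stated owner: whose left or right is unclear.
    if re.search(DIRECTION, text):
        return "unknown"
    # No left/right at all: the instruction does not depend on perspective.
    return "neither"
```

A rule set this small will misclassify many real instructions (for example, ones using "in front of" or "behind"), so it is best read as a statement of what distinguishes the categories rather than as a usable annotator.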
Perspective vs. Accuracy and Completion Time
45
46
Pick up the box furthest to your left.
Partner perspective
Partner
Participant
49
Pick up the orange block closest to my right hand side.
Participant perspective
Partner
Participant
52
Please pick up the orange block that is closest to me.
Partner
Participant
Neither perspective
54
Pick up the rightmost orange block
Partner
Participant
Right to ???
55
Pick up the rightmost orange block
Partner
Participant
Unknown perspective
Hypothesis:
Neither Perspective is better
56
57
58
Result:
Prefer Neither Perspective
59
Other Factors
60
61
Pick the blue block that is closer to you and right next to the yellow block
Partner
Participant
Neither perspective
64
Pick up the blue block on your far right.
Partner
Participant
Partner perspective
Tradeoff
66
Robot Partner vs Human Partner
67
68
Robot Partner
Human Partner
Pick up the third blue block from your left
69
70
Thank You!
Learn More @ Poster Session
Investigated
Visual features
Perspectives
Dataset will be made available soon!