![Page 1: Language and Reasoning Diversity in Grounded Natural ... · Language and Reasoning Diversity in Grounded Natural Language Understanding Yoav Artzi SiVL, NAACL 2019](https://reader033.vdocuments.mx/reader033/viewer/2022051917/60093bd5bc5d54320e4049e6/html5/thumbnails/1.jpg)
Language and Reasoning Diversity in Grounded Natural Language Understanding
Yoav Artzi
SiVL, NAACL 2019
![Page 2: Language and Reasoning Diversity in Grounded Natural ... · Language and Reasoning Diversity in Grounded Natural Language Understanding Yoav Artzi SiVL, NAACL 2019](https://reader033.vdocuments.mx/reader033/viewer/2022051917/60093bd5bc5d54320e4049e6/html5/thumbnails/2.jpg)
Today
Understanding: NLVR, NLVR2 (nlvr.ai)
Acting: Touchdown (touchdown.ai), DRIF
• Robustness to biases
• Language and reasoning diversity
• Real-life input
• Robotic agents
![Page 3: Language and Reasoning Diversity in Grounded Natural ... · Language and Reasoning Diversity in Grounded Natural Language Understanding Yoav Artzi SiVL, NAACL 2019](https://reader033.vdocuments.mx/reader033/viewer/2022051917/60093bd5bc5d54320e4049e6/html5/thumbnails/3.jpg)
Biases and Reasoning Diversity
VQA
• Implicit biases
• Relatively simple language
What is the dog carrying? Stick
CLEVR
Are there an equal number of large things and metal spheres? Yes
NLVR
there are exactly three squares not touching any edge: TRUE
![Page 5: Language and Reasoning Diversity in Grounded Natural ... · Language and Reasoning Diversity in Grounded Natural Language Understanding Yoav Artzi SiVL, NAACL 2019](https://reader033.vdocuments.mx/reader033/viewer/2022051917/60093bd5bc5d54320e4049e6/html5/thumbnails/5.jpg)
Natural Language Visual Reasoning (NLVR)
• Isolates compositional reasoning problem
• Box structure encourages set and comparison reasoning
• Controlled environment → focus sentences on specific phenomena
• Compare and contrast for balanced data
• But: synthetic vision and limited lexical diversity
there are exactly three squares not touching any edge
TRUE
FALSE
![Page 6: Language and Reasoning Diversity in Grounded Natural ... · Language and Reasoning Diversity in Grounded Natural Language Understanding Yoav Artzi SiVL, NAACL 2019](https://reader033.vdocuments.mx/reader033/viewer/2022051917/60093bd5bc5d54320e4049e6/html5/thumbnails/6.jpg)
Natural Language Visual Reasoning (NLVR)
• No control of image content
• No box structure for set reasoning
• Can’t generate images for compare and contrast
there are exactly three squares not touching any edge
TRUE
How to generalize this type of data to real images?
![Page 8: Language and Reasoning Diversity in Grounded Natural ... · Language and Reasoning Diversity in Grounded Natural Language Understanding Yoav Artzi SiVL, NAACL 2019](https://reader033.vdocuments.mx/reader033/viewer/2022051917/60093bd5bc5d54320e4049e6/html5/thumbnails/8.jpg)
Natural Language Visual Reasoning for Real (NLVR2)
One image shows exactly two brown acorns in back-to-back caps on green foliage
FALSE
Task: Determine whether the sentence is true or false about the pair of images
![Page 9: Language and Reasoning Diversity in Grounded Natural ... · Language and Reasoning Diversity in Grounded Natural Language Understanding Yoav Artzi SiVL, NAACL 2019](https://reader033.vdocuments.mx/reader033/viewer/2022051917/60093bd5bc5d54320e4049e6/html5/thumbnails/9.jpg)
NLVR2
• Re-creates the NLVR setup with real web images
• Natural language data
• Paired images analogous to boxes
• Compare and contrast to create balanced data
One image shows exactly two brown acorns in back-to-back caps on green foliage
FALSE
![Page 10: Language and Reasoning Diversity in Grounded Natural ... · Language and Reasoning Diversity in Grounded Natural Language Understanding Yoav Artzi SiVL, NAACL 2019](https://reader033.vdocuments.mx/reader033/viewer/2022051917/60093bd5bc5d54320e4049e6/html5/thumbnails/10.jpg)
NLVR2
One image shows exactly two brown acorns in back-to-back caps on green foliage
Labels for the four image pairs shown: FALSE, FALSE, TRUE, TRUE
![Page 11: Language and Reasoning Diversity in Grounded Natural ... · Language and Reasoning Diversity in Grounded Natural Language Understanding Yoav Artzi SiVL, NAACL 2019](https://reader033.vdocuments.mx/reader033/viewer/2022051917/60093bd5bc5d54320e4049e6/html5/thumbnails/11.jpg)
Data Collection
• Collecting images using search engines
• Sentence writing using compare and contrast
• Validation
![Page 12: Language and Reasoning Diversity in Grounded Natural ... · Language and Reasoning Diversity in Grounded Natural Language Understanding Yoav Artzi SiVL, NAACL 2019](https://reader033.vdocuments.mx/reader033/viewer/2022051917/60093bd5bc5d54320e4049e6/html5/thumbnails/12.jpg)
Image Collection
1. Pick 124 synsets from ImageNet
Chose synsets that would often appear multiple times in one image: e.g., acorn >> sump pump
- Allows use of ImageNet models and tools
- Allows for weak annotation of image content
🔍 acorn
![Page 13: Language and Reasoning Diversity in Grounded Natural ... · Language and Reasoning Diversity in Grounded Natural Language Understanding Yoav Artzi SiVL, NAACL 2019](https://reader033.vdocuments.mx/reader033/viewer/2022051917/60093bd5bc5d54320e4049e6/html5/thumbnails/13.jpg)
Image Collection
2. Generate and execute search queries and get similar images
Combine synset names with numerical phrases, hypernyms, and similar words
🔍 two acorns
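The query-generation step above can be sketched roughly as follows. This is only an illustration: the function name `build_queries`, the word lists, and the naive pluralization are assumptions made for the example, not the actual collection scripts.

```python
def build_queries(synset_name, numerical_phrases, related_words):
    """Combine a synset name with numerical phrases and related words.

    All inputs are hypothetical; the slide only says that synset names are
    combined with numerical phrases, hypernyms, and similar words.
    """
    plural = synset_name + "s"  # naive pluralization, for illustration only
    queries = [synset_name]
    queries += [f"{phrase} {plural}" for phrase in numerical_phrases]
    queries += [f"{synset_name} {word}" for word in related_words]
    return queries


# Hypothetical word lists for the synset "acorn".
queries = build_queries("acorn", ["two", "three", "group of"], ["nut", "oak"])
for query in queries:
    print(query)
```

Each resulting string would be sent to an image search engine; the slide's own example query, "two acorns", is one of the generated strings.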
![Page 14: Language and Reasoning Diversity in Grounded Natural ... · Language and Reasoning Diversity in Grounded Natural Language Understanding Yoav Artzi SiVL, NAACL 2019](https://reader033.vdocuments.mx/reader033/viewer/2022051917/60093bd5bc5d54320e4049e6/html5/thumbnails/14.jpg)
Image Collection
3. Remove low-quality images
Removed: images that don’t contain the synset, drawings, and inappropriate content
![Page 19: Language and Reasoning Diversity in Grounded Natural ... · Language and Reasoning Diversity in Grounded Natural Language Understanding Yoav Artzi SiVL, NAACL 2019](https://reader033.vdocuments.mx/reader033/viewer/2022051917/60093bd5bc5d54320e4049e6/html5/thumbnails/19.jpg)
Image Collection
4. Construct sets of eight images
Each set must contain at least three interesting images (e.g., multiple objects)
![Page 23: Language and Reasoning Diversity in Grounded Natural ... · Language and Reasoning Diversity in Grounded Natural Language Understanding Yoav Artzi SiVL, NAACL 2019](https://reader033.vdocuments.mx/reader033/viewer/2022051917/60093bd5bc5d54320e4049e6/html5/thumbnails/23.jpg)
Sentence Writing
5. Display a set of randomly paired images
6. Ask workers to select two pairs
7. Workers write a sentence true about the selected pairs, but false about the others
Example: One image shows exactly two brown acorns in back-to-back caps on green foliage
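A minimal sketch of how the pairing and selection steps turn one written sentence into labeled examples, assuming a set of eight images yields four pairs. The helper names and the image identifiers are hypothetical; the real annotation interface is not described at this level of detail.

```python
import random


def pair_images(image_ids, seed=0):
    """Randomly pair an even-sized list of image ids (step 5)."""
    rng = random.Random(seed)
    shuffled = list(image_ids)
    rng.shuffle(shuffled)
    return [(shuffled[i], shuffled[i + 1]) for i in range(0, len(shuffled), 2)]


def label_pairs(pairs, selected, sentence):
    """The sentence is TRUE for the selected pairs and FALSE for the rest (steps 6-7)."""
    return [
        {"sentence": sentence, "pair": pair, "label": index in selected}
        for index, pair in enumerate(pairs)
    ]


image_set = [f"img{i}" for i in range(8)]  # hypothetical set of eight images
pairs = pair_images(image_set)
sentence = "One image shows exactly two brown acorns in back-to-back caps on green foliage"
examples = label_pairs(pairs, selected={0, 2}, sentence=sentence)
print([example["label"] for example in examples])
```

Under these assumptions every sentence contributes two TRUE and two FALSE examples, which is how compare and contrast keeps the labels balanced.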
![Page 24: Language and Reasoning Diversity in Grounded Natural ... · Language and Reasoning Diversity in Grounded Natural Language Understanding Yoav Artzi SiVL, NAACL 2019](https://reader033.vdocuments.mx/reader033/viewer/2022051917/60093bd5bc5d54320e4049e6/html5/thumbnails/24.jpg)
Validation
8. Show each image pair and sentence to another worker and ask them to label it
One image shows exactly two brown acorns in back-to-back caps on green foliage
TRUE
FALSE ✔
![Page 25: Language and Reasoning Diversity in Grounded Natural ... · Language and Reasoning Diversity in Grounded Natural Language Understanding Yoav Artzi SiVL, NAACL 2019](https://reader033.vdocuments.mx/reader033/viewer/2022051917/60093bd5bc5d54320e4049e6/html5/thumbnails/25.jpg)
Statistics
• 107,296 total examples
• 29,680 unique sentences
• 127,506 unique images
• 80% train, 20% evenly split to dev and two test sets
• Agreement: near perfect (α = 0.912, κ = 0.889)
• Total cost: $19,282.99
• Average sentence length: 14.8 tokens
• Vocabulary size: ~7,500 word types
![Page 26: Language and Reasoning Diversity in Grounded Natural ... · Language and Reasoning Diversity in Grounded Natural Language Understanding Yoav Artzi SiVL, NAACL 2019](https://reader033.vdocuments.mx/reader033/viewer/2022051917/60093bd5bc5d54320e4049e6/html5/thumbnails/26.jpg)
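Related Resources

| Resource | Task | Real Images | Natural Language |
|---|---|---|---|
| VQA | QA | ✔ | ✔ |
| COCO Captions | Caption generation | ✔ | ✔ |
| CLEVR | QA | ✗ | ✗ |
| CLEVR-Humans | QA | ✗ | ✔ |
| GQA | QA | ✔ | ✗ |
| NLVR | Binary classification | ✗ | ✔ |
| NLVR2 | Binary classification | ✔ | ✔ |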
![Page 27: Language and Reasoning Diversity in Grounded Natural ... · Language and Reasoning Diversity in Grounded Natural Language Understanding Yoav Artzi SiVL, NAACL 2019](https://reader033.vdocuments.mx/reader033/viewer/2022051917/60093bd5bc5d54320e4049e6/html5/thumbnails/27.jpg)
Sentence Length
[Figure: sentence-length distributions for NLVR2, NLVR, GQA, CLEVR-Humans, VQA (real images), VQA (abstract images), MSCOCO, and CLEVR; x-axis ticks from 1 to 41, y-axis ticks from 0 to 30]
![Page 28: Language and Reasoning Diversity in Grounded Natural ... · Language and Reasoning Diversity in Grounded Natural Language Understanding Yoav Artzi SiVL, NAACL 2019](https://reader033.vdocuments.mx/reader033/viewer/2022051917/60093bd5bc5d54320e4049e6/html5/thumbnails/28.jpg)
Linguistic Analysis
• Analyze 13 semantic and syntactic categories
• Sampled 800 sentences
• Compare to 200 sentences from GQA, VQA, and NLVR
• Release scripts to break down system performance according to categories
![Page 29: Language and Reasoning Diversity in Grounded Natural ... · Language and Reasoning Diversity in Grounded Natural Language Understanding Yoav Artzi SiVL, NAACL 2019](https://reader033.vdocuments.mx/reader033/viewer/2022051917/60093bd5bc5d54320e4049e6/html5/thumbnails/29.jpg)
Linguistic Analysis
[Figure: bar charts comparing NLVR2, NLVR, VQA, and GQA, one panel per category: hard cardinality, soft cardinality, coordination, negation, universal quantifiers, coreference, spatial relations, comparisons]
![Page 30: Language and Reasoning Diversity in Grounded Natural ... · Language and Reasoning Diversity in Grounded Natural Language Understanding Yoav Artzi SiVL, NAACL 2019](https://reader033.vdocuments.mx/reader033/viewer/2022051917/60093bd5bc5d54320e4049e6/html5/thumbnails/30.jpg)
Soft Cardinality
[Figure: soft cardinality bar chart for NLVR2, NLVR, VQA, and GQA; bar labels 23.6, 22.5, 1, and 0 (dataset order not preserved)]
One image contains a single vulture in a standing pose with its head and body facing leftward, and the other image contains a group of at least eight vultures.
TRUE
FALSE
![Page 31: Language and Reasoning Diversity in Grounded Natural ... · Language and Reasoning Diversity in Grounded Natural Language Understanding Yoav Artzi SiVL, NAACL 2019](https://reader033.vdocuments.mx/reader033/viewer/2022051917/60093bd5bc5d54320e4049e6/html5/thumbnails/31.jpg)
Negation
[Figure: negation bar chart for NLVR2, NLVR, VQA, and GQA; bar labels 9.6, 9.5, 2.5, and 1 (dataset order not preserved)]
One dog sled team is moving and one is not
TRUE
FALSE
![Page 32: Language and Reasoning Diversity in Grounded Natural ... · Language and Reasoning Diversity in Grounded Natural Language Understanding Yoav Artzi SiVL, NAACL 2019](https://reader033.vdocuments.mx/reader033/viewer/2022051917/60093bd5bc5d54320e4049e6/html5/thumbnails/32.jpg)
Universal Quantifiers
[Figure: universal quantifiers bar chart for NLVR2, NLVR, VQA, and GQA; bar labels 16.8, 7.5, 4.5, and 1 (dataset order not preserved)]
All the chairs have backs
TRUE
FALSE
![Page 33: Language and Reasoning Diversity in Grounded Natural ... · Language and Reasoning Diversity in Grounded Natural Language Understanding Yoav Artzi SiVL, NAACL 2019](https://reader033.vdocuments.mx/reader033/viewer/2022051917/60093bd5bc5d54320e4049e6/html5/thumbnails/33.jpg)
Comparisons
[Figure: comparisons bar chart for NLVR2, NLVR, VQA, and GQA; bar labels 8, 3, 2, and 1 (dataset order not preserved)]
There are more birds in the image on the left than in the image on the right
TRUE
FALSE
![Page 34: Language and Reasoning Diversity in Grounded Natural ... · Language and Reasoning Diversity in Grounded Natural Language Understanding Yoav Artzi SiVL, NAACL 2019](https://reader033.vdocuments.mx/reader033/viewer/2022051917/60093bd5bc5d54320e4049e6/html5/thumbnails/34.jpg)
Evaluation
• Accuracy
• Consistency
• Proportion of unique sentences for which predictions are correct for all paired images
[Goldman et al. 2018]
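The two metrics can be sketched as below, assuming each example is a dictionary with a sentence, a gold label, and a predicted label. The field names are illustrative and are not taken from the released evaluation scripts.

```python
from collections import defaultdict


def accuracy(examples):
    """Fraction of examples whose prediction matches the gold label."""
    return sum(e["pred"] == e["label"] for e in examples) / len(examples)


def consistency(examples):
    """Fraction of unique sentences predicted correctly for all their image pairs."""
    by_sentence = defaultdict(list)
    for e in examples:
        by_sentence[e["sentence"]].append(e["pred"] == e["label"])
    return sum(all(results) for results in by_sentence.values()) / len(by_sentence)


# Toy data: sentence s1 is fully correct, sentence s2 has one wrong prediction.
examples = [
    {"sentence": "s1", "label": True, "pred": True},
    {"sentence": "s1", "label": False, "pred": False},
    {"sentence": "s2", "label": True, "pred": True},
    {"sentence": "s2", "label": False, "pred": True},
]
print(accuracy(examples), consistency(examples))
```

On the toy data, three of four predictions are right but only one of two sentences is right everywhere, which shows why consistency is the stricter measure.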
![Page 35: Language and Reasoning Diversity in Grounded Natural ... · Language and Reasoning Diversity in Grounded Natural Language Understanding Yoav Artzi SiVL, NAACL 2019](https://reader033.vdocuments.mx/reader033/viewer/2022051917/60093bd5bc5d54320e4049e6/html5/thumbnails/35.jpg)
Baselines
Unreleased test set
Models: Text only (RNN), Image only (CNN), CNN+RNN, Object Detection MaxEnt
Accuracy bar labels: 53.5, 53.2, 51.9, 51.4
Consistency bar labels: 12.0, 11.2, 7.1, 4.6
(bar labels are listed in extraction order, which may not match the model order)
Majority class: 51.4
Human: 96.1
• Robust to single-modality biases
• MaxEnt on top of detector does best
Accuracy and consistency are not to scale
![Page 36: Language and Reasoning Diversity in Grounded Natural ... · Language and Reasoning Diversity in Grounded Natural Language Understanding Yoav Artzi SiVL, NAACL 2019](https://reader033.vdocuments.mx/reader033/viewer/2022051917/60093bd5bc5d54320e4049e6/html5/thumbnails/36.jpg)
SOTA Visual Reasoning
Unreleased test set
Models: N2NMN [Hu et al. 2017], FiLM [Perez et al. 2017], MAC-Network [Hudson et al. 2018], Object Detection MaxEnt
CLEVR accuracy: N2NMN 83.7%, FiLM 97.7%, MAC-Network 98.9%
Accuracy bar labels: 53.5, 51.2, 53.0, 51.5
Consistency bar labels: 12.0, 11.2, 10.6, 5.0
(bar labels are listed in extraction order, which may not match the model order)
Majority class: 51.4
Human: 96.1
• SOTA methods perform poorly
• CLEVR-NLVR2 performance mismatch
Accuracy and consistency are not to scale
![Page 37: Language and Reasoning Diversity in Grounded Natural ... · Language and Reasoning Diversity in Grounded Natural Language Understanding Yoav Artzi SiVL, NAACL 2019](https://reader033.vdocuments.mx/reader033/viewer/2022051917/60093bd5bc5d54320e4049e6/html5/thumbnails/37.jpg)
Today
Understanding: NLVR, NLVR2 (nlvr.ai)
Acting: Touchdown (touchdown.ai), DRIF
• Robustness to biases
• Language and reasoning diversity
• Real-life input
• Robotic agents
![Page 38: Language and Reasoning Diversity in Grounded Natural ... · Language and Reasoning Diversity in Grounded Natural Language Understanding Yoav Artzi SiVL, NAACL 2019](https://reader033.vdocuments.mx/reader033/viewer/2022051917/60093bd5bc5d54320e4049e6/html5/thumbnails/38.jpg)
Realistic Environments
• Most research on instruction following uses simple simulated environments
• Existing physical environments are simple and built in the lab
• Real-life environments are both visually and distributionally different
Chalet [Yan et al. 2018; Misra et al. 2018]
Take watermelon, oranges, and cucumber from the counter. Put them …
![Page 39: Language and Reasoning Diversity in Grounded Natural ... · Language and Reasoning Diversity in Grounded Natural Language Understanding Yoav Artzi SiVL, NAACL 2019](https://reader033.vdocuments.mx/reader033/viewer/2022051917/60093bd5bc5d54320e4049e6/html5/thumbnails/39.jpg)
The Environment
• Google Street View panoramas
• 29,941 panoramas
• 61,319 edges
• 122,638 states for discrete navigation
[Figure: panorama graph diagram with headings labeled 90°, 145°, 31°, 270°, 325°, and 211°]
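The counts on this slide satisfy 122,638 = 2 × 61,319, which is consistent with one navigation state per panorama and heading along an edge. The sketch below builds states that way for a toy graph. This reading of "state", the coordinates, and the helper names are assumptions for illustration, not the released environment code.

```python
import math


def heading(src, dst):
    """Heading in degrees from src to dst (0 = +y direction, increasing clockwise)."""
    dx, dy = dst[0] - src[0], dst[1] - src[1]
    return math.degrees(math.atan2(dx, dy)) % 360


def navigation_states(positions, edges):
    """One (panorama, heading) state per direction of each undirected edge."""
    states = []
    for a, b in edges:
        states.append((a, round(heading(positions[a], positions[b]))))
        states.append((b, round(heading(positions[b], positions[a]))))
    return states


# Toy graph with three panoramas and two edges (hypothetical coordinates).
positions = {"p0": (0, 0), "p1": (0, 1), "p2": (1, 1)}
edges = [("p0", "p1"), ("p1", "p2")]
states = navigation_states(positions, edges)
print(states)
```

With this construction the number of states is always twice the number of edges, matching the ratio of the two figures reported above.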
![Page 40: Language and Reasoning Diversity in Grounded Natural ... · Language and Reasoning Diversity in Grounded Natural Language Understanding Yoav Artzi SiVL, NAACL 2019](https://reader033.vdocuments.mx/reader033/viewer/2022051917/60093bd5bc5d54320e4049e6/html5/thumbnails/40.jpg)
Task-focused Navigation
• Writing task: workers write instructions for following a path and describe the location of an object they hide
• The focused task makes the instruction more natural for the writer
• Guide workers not to count intersections and not to use text and store names
• What do we hide?
![Page 41: Language and Reasoning Diversity in Grounded Natural ... · Language and Reasoning Diversity in Grounded Natural Language Understanding Yoav Artzi SiVL, NAACL 2019](https://reader033.vdocuments.mx/reader033/viewer/2022051917/60093bd5bc5d54320e4049e6/html5/thumbnails/41.jpg)
![Page 42: Language and Reasoning Diversity in Grounded Natural ... · Language and Reasoning Diversity in Grounded Natural Language Understanding Yoav Artzi SiVL, NAACL 2019](https://reader033.vdocuments.mx/reader033/viewer/2022051917/60093bd5bc5d54320e4049e6/html5/thumbnails/42.jpg)
Example 1
Orient yourself so that the umbrellas are to the right. Go straight and take a right at the first intersection. At the next intersection there should be an old-fashioned store to the left. There is also a dinosaur mural to the right. Touchdown is on the back of the dinosaur.
![Page 43: Language and Reasoning Diversity in Grounded Natural ... · Language and Reasoning Diversity in Grounded Natural Language Understanding Yoav Artzi SiVL, NAACL 2019](https://reader033.vdocuments.mx/reader033/viewer/2022051917/60093bd5bc5d54320e4049e6/html5/thumbnails/43.jpg)
Task-focused Navigation
• This formulation allows for multiple tasks:
• Navigation only: given instruction and a starting point, navigate to the goal position
• Spatial description resolution (SDR) only: given a sentence and a panorama, find Touchdown
• The complete task: navigate first, and then find Touchdown
![Page 44: Language and Reasoning Diversity in Grounded Natural ... · Language and Reasoning Diversity in Grounded Natural Language Understanding Yoav Artzi SiVL, NAACL 2019](https://reader033.vdocuments.mx/reader033/viewer/2022051917/60093bd5bc5d54320e4049e6/html5/thumbnails/44.jpg)
Data Collection
• A sequence of four tasks on Mechanical Turk
• Writing, propagation, validation, and segmentation
• Workers use a customized StreetView environment
![Page 46: Language and Reasoning Diversity in Grounded Natural ... · Language and Reasoning Diversity in Grounded Natural Language Understanding Yoav Artzi SiVL, NAACL 2019](https://reader033.vdocuments.mx/reader033/viewer/2022051917/60093bd5bc5d54320e4049e6/html5/thumbnails/46.jpg)
Task I: Writing
[Interface buttons: Place Touchdown, Can’t Place Touchdown]
Turn so that the trees are to your left. At the first intersection, turn left and stop.
Turn so that the trees are to your left. At the first intersection, turn left and stop. Touchdown is on top of the blue mailbox on the right hand corner.
![Page 48: Language and Reasoning Diversity in Grounded Natural ... · Language and Reasoning Diversity in Grounded Natural Language Understanding Yoav Artzi SiVL, NAACL 2019](https://reader033.vdocuments.mx/reader033/viewer/2022051917/60093bd5bc5d54320e4049e6/html5/thumbnails/48.jpg)
Task II: Propagation
• Touchdown position may be visible from multiple panoramas
• We propagate the location to neighboring panoramas
Turn so that the trees are to your left. At the first intersection, turn left and stop. Touchdown is on top of the blue mailbox on the right hand corner.
Place Touchdown
Bear is Occluded
![Page 49: Language and Reasoning Diversity in Grounded Natural ... · Language and Reasoning Diversity in Grounded Natural Language Understanding Yoav Artzi SiVL, NAACL 2019](https://reader033.vdocuments.mx/reader033/viewer/2022051917/60093bd5bc5d54320e4049e6/html5/thumbnails/49.jpg)
Task III: Validation
• Validate instruction by finding Touchdown
• Easy to verify
• Give bonuses to original writer and follower if successful
Turn so that the trees are to your left. At the first intersection, turn left and stop. Touchdown is on top of the blue mailbox on the right hand corner.
Remaining Attempts: 2
You Found Touchdown!
![Page 50: Language and Reasoning Diversity in Grounded Natural ... · Language and Reasoning Diversity in Grounded Natural Language Understanding Yoav Artzi SiVL, NAACL 2019](https://reader033.vdocuments.mx/reader033/viewer/2022051917/60093bd5bc5d54320e4049e6/html5/thumbnails/50.jpg)
Task IV: Task Segmentation
• Segment the text into the two tasks: navigation and SDR
• Segments may overlap
Touchdown is on top of the blue mailbox on the right hand corner.
Turn so that the trees are to your left. At the first intersection, turn left and stop. Touchdown is on top of the blue mailbox on the right hand corner.
Target Location Instructions:
Submit
![Page 51: Language and Reasoning Diversity in Grounded Natural ... · Language and Reasoning Diversity in Grounded Natural Language Understanding Yoav Artzi SiVL, NAACL 2019](https://reader033.vdocuments.mx/reader033/viewer/2022051917/60093bd5bc5d54320e4049e6/html5/thumbnails/51.jpg)
What Did We Get?
• Over 200 people wrote and validated instructions
• Collected 9,326 examples, split into 6,526/1,391/1,409 for train/dev/test
![Page 52: Language and Reasoning Diversity in Grounded Natural ... · Language and Reasoning Diversity in Grounded Natural Language Understanding Yoav Artzi SiVL, NAACL 2019](https://reader033.vdocuments.mx/reader033/viewer/2022051917/60093bd5bc5d54320e4049e6/html5/thumbnails/52.jpg)
Analysis
• Average length is 108 tokens
• 89.6 for navigation, compared to 29.3 in R2R
• 29.8 for SDR, compared to 8.5 in Google RefExp and 4.4 in ReferItGame
• Relatively large vocabulary size of 5,625, compared to 3,156 for R2R
• Paths are on average 35.2 panoramas, compared to 6 in R2R
[R2R: Anderson et al. 2018; Google RefExp: Mao et al. 2016; ReferItGame: Kazemzadeh et al. 2014]
![Page 53: Language and Reasoning Diversity in Grounded Natural ... · Language and Reasoning Diversity in Grounded Natural Language Understanding Yoav Artzi SiVL, NAACL 2019](https://reader033.vdocuments.mx/reader033/viewer/2022051917/60093bd5bc5d54320e4049e6/html5/thumbnails/53.jpg)
Linguistic Analysis
• Sampled 25 examples from Touchdown and R2R
• Analyzed for 11 semantic categories
• Report the mean number of instances per example (more analysis in the paper)
![Page 54: Language and Reasoning Diversity in Grounded Natural ... · Language and Reasoning Diversity in Grounded Natural Language Understanding Yoav Artzi SiVL, NAACL 2019](https://reader033.vdocuments.mx/reader033/viewer/2022051917/60093bd5bc5d54320e4049e6/html5/thumbnails/54.jpg)
Linguistic Analysis
[Figure: one chart per category comparing Touchdown and R2R; the number after each category is the value printed on its chart]
• Entity reference (11.0): … You’ll pass three trashcans on your left …
• Allocentric spatial relation (3.0): … There is a fire hydrant, the bear is on top …
• Egocentric spatial relation (4.0): … up ahead there is some flag poles on your right hand side …
• Temporal condition (2.0): … Follow the road until you see a school on your right …
• State verification (2.0): … You should see a small bridge ahead …
• Coreference (2.5): … a brownish colored brick building with a black fence around it …
![Page 55: Language and Reasoning Diversity in Grounded Natural ... · Language and Reasoning Diversity in Grounded Natural Language Understanding Yoav Artzi SiVL, NAACL 2019](https://reader033.vdocuments.mx/reader033/viewer/2022051917/60093bd5bc5d54320e4049e6/html5/thumbnails/55.jpg)
Spatial Description Resolution
There is also a dinosaur mural to the right. Touchdown is on the back of the dinosaur.
Where is Touchdown?
![Page 56: Language and Reasoning Diversity in Grounded Natural ... · Language and Reasoning Diversity in Grounded Natural Language Understanding Yoav Artzi SiVL, NAACL 2019](https://reader033.vdocuments.mx/reader033/viewer/2022051917/60093bd5bc5d54320e4049e6/html5/thumbnails/56.jpg)
SDR Evaluation
• Accuracy: predicting the position close enough to the gold position (threshold: 80px)
• Consistency: consider a unique SDR as correct only if solved for all propagated panoramas
• Mean distance error: the distance of the predicted position from the gold position
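A rough sketch of the three SDR metrics, assuming predictions and gold positions are pixel coordinates and that "close enough" means a Euclidean distance of at most 80 pixels. Whether the threshold is inclusive, and the field names, are assumptions made for this example.

```python
import math
from collections import defaultdict

THRESHOLD_PX = 80  # threshold stated on the slide


def sdr_metrics(examples):
    """Accuracy, consistency, and mean distance error over SDR examples."""
    distances = [math.dist(e["pred"], e["gold"]) for e in examples]
    correct = [d <= THRESHOLD_PX for d in distances]

    # Group results by description: one description may be propagated
    # to several panoramas, and all of them must be solved.
    by_description = defaultdict(list)
    for e, ok in zip(examples, correct):
        by_description[e["description_id"]].append(ok)

    return {
        "accuracy": sum(correct) / len(correct),
        "consistency": sum(all(v) for v in by_description.values()) / len(by_description),
        "mean_distance": sum(distances) / len(distances),
    }


# Toy data: description "a" appears in two panoramas, "b" in one.
examples = [
    {"description_id": "a", "pred": (100, 100), "gold": (130, 140)},  # 50 px away
    {"description_id": "a", "pred": (0, 0), "gold": (300, 400)},      # 500 px away
    {"description_id": "b", "pred": (10, 10), "gold": (10, 10)},      # exact
]
metrics = sdr_metrics(examples)
print(metrics)
```

Description "a" is solved in one panorama but missed in the other, so it counts toward accuracy once and does not count toward consistency.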
![Page 57: Language and Reasoning Diversity in Grounded Natural ... · Language and Reasoning Diversity in Grounded Natural Language Understanding Yoav Artzi SiVL, NAACL 2019](https://reader033.vdocuments.mx/reader033/viewer/2022051917/60093bd5bc5d54320e4049e6/html5/thumbnails/57.jpg)
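Test Results
Models: Random, Average, Text2Conv, LingUNet
[Blukis et al. 2018]
Accuracy bar labels: 34.6, 30.4, 5.2, 0.8
Consistency bar labels: 14.6, 11.7, 0.6, 0.0
Distance bar labels: 708, 747, 744, 1,179
(bar labels are listed in extraction order, which may not match the model order)
• LingUNet is able to solve some cases
• But there is a lot of room for improvement
Accuracy, consistency, and distance are not to scale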
![Page 58: Language and Reasoning Diversity in Grounded Natural ... · Language and Reasoning Diversity in Grounded Natural Language Understanding Yoav Artzi SiVL, NAACL 2019](https://reader033.vdocuments.mx/reader033/viewer/2022051917/60093bd5bc5d54320e4049e6/html5/thumbnails/58.jpg)
Example: LingUNet
a black doorway with red brick to the right of it, and green brick to the left of it. it has a light just above the doorway, and on that light is where you find Touchdown
❌
![Page 59: Language and Reasoning Diversity in Grounded Natural ... · Language and Reasoning Diversity in Grounded Natural Language Understanding Yoav Artzi SiVL, NAACL 2019](https://reader033.vdocuments.mx/reader033/viewer/2022051917/60093bd5bc5d54320e4049e6/html5/thumbnails/59.jpg)
Navigation
Orient yourself so that the umbrellas are to the right. Go straight and take a right at the first intersection. At the next intersection there should be an old-fashioned store to the left. There is also a dinosaur mural to the right.
![Page 60: Language and Reasoning Diversity in Grounded Natural ... · Language and Reasoning Diversity in Grounded Natural Language Understanding Yoav Artzi SiVL, NAACL 2019](https://reader033.vdocuments.mx/reader033/viewer/2022051917/60093bd5bc5d54320e4049e6/html5/thumbnails/60.jpg)
Navigation Evaluation
• Accuracy: stopping at the annotated goal panorama, or at one of the propagated panoramas
• Mean distance error: the shortest-path distance between the stopping position and the goal
• Success-weighted by edit distance (SED)
![Page 61: Language and Reasoning Diversity in Grounded Natural ... · Language and Reasoning Diversity in Grounded Natural Language Understanding Yoav Artzi SiVL, NAACL 2019](https://reader033.vdocuments.mx/reader033/viewer/2022051917/60093bd5bc5d54320e4049e6/html5/thumbnails/61.jpg)
Success weighted by Edit Distance (SED)
• Measure edit distance between reference and prediction
• Weight success by distance
• The closer the agent’s path is to the correct execution, the more the success counts
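One way to turn the bullets above into a per-example score is sketched below. It assumes paths are sequences of panorama identifiers, uses Levenshtein distance, and normalizes by the longer path's length. The slide does not spell out the normalization, so treat that as an assumption.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution or match
        prev = curr
    return prev[-1]


def sed(reference, prediction, success):
    """Success weighted by edit distance for one example; 0 if the agent failed."""
    if not success:
        return 0.0
    longest = max(len(reference), len(prediction))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(reference, prediction) / longest


reference = ["p0", "p1", "p2"]  # hypothetical panorama ids
exact = sed(reference, ["p0", "p1", "p2"], success=True)
detour = sed(reference, ["p0", "p9", "p2"], success=True)
failed = sed(reference, ["p0", "p1"], success=False)
print(exact, detour, failed)
```

A corpus-level score would average this value over all examples, so a successful agent that took a detour earns less credit than one that followed the reference path.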
![Page 62: Language and Reasoning Diversity in Grounded Natural ... · Language and Reasoning Diversity in Grounded Natural Language Understanding Yoav Artzi SiVL, NAACL 2019](https://reader033.vdocuments.mx/reader033/viewer/2022051917/60093bd5bc5d54320e4049e6/html5/thumbnails/62.jpg)
Test Results
Models: Stop, Random, Gated-attention [Chaplot et al. 2018], Concat [Mirowski et al. 2018]
Accuracy bar labels: 10.7, 5.5, 0.2, 0.0
Distance bar labels: 19.5, 21.3, 26.9, 27.0
SED bar labels: 0.104, 0.054, 0.001, 0
(bar labels are listed in extraction order, which may not match the model order)
• Non-learning models show the task is challenging
• No model learns effectively
Accuracy, distance, and SED are not to scale
![Page 63: Language and Reasoning Diversity in Grounded Natural ... · Language and Reasoning Diversity in Grounded Natural Language Understanding Yoav Artzi SiVL, NAACL 2019](https://reader033.vdocuments.mx/reader033/viewer/2022051917/60093bd5bc5d54320e4049e6/html5/thumbnails/63.jpg)
Today
Understanding: NLVR, NLVR2 (nlvr.ai)
Acting: Touchdown (touchdown.ai), DRIF
• Robustness to biases
• Language and reasoning diversity
• Real-life input
• Robotic agents
![Page 64: Language and Reasoning Diversity in Grounded Natural ... · Language and Reasoning Diversity in Grounded Natural Language Understanding Yoav Artzi SiVL, NAACL 2019](https://reader033.vdocuments.mx/reader033/viewer/2022051917/60093bd5bc5d54320e4049e6/html5/thumbnails/64.jpg)
Dynamic Robot Instruction Following (DRIF)
Instruction: Go towards the blue fence passing the anvil and tree on the right
$f(\cdot) = (v_t, \omega_t, \mathrm{STOP})$
$v_t$: linear forward velocity
$\omega_t$: angular yaw rate
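To make the two outputs concrete, the sketch below integrates a linear forward velocity and an angular yaw rate with a generic planar unicycle model. This is a textbook kinematic update used for illustration only; it is an assumption and not the DRIF controller or simulator.

```python
import math


def step(x, y, yaw, v, omega, dt):
    """Advance a planar pose by one time step.

    v is the linear forward velocity and omega the angular yaw rate,
    i.e. the two quantities the policy outputs at each step.
    """
    x += v * math.cos(yaw) * dt
    y += v * math.sin(yaw) * dt
    yaw += omega * dt
    return x, y, yaw


# Drive straight for one second, then turn in place by a quarter turn.
pose = step(0.0, 0.0, 0.0, v=1.0, omega=0.0, dt=1.0)
pose = step(*pose, v=0.0, omega=math.pi / 2, dt=1.0)
print(pose)
```

A STOP output would simply end the rollout instead of producing another velocity command.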
![Page 65: Language and Reasoning Diversity in Grounded Natural ... · Language and Reasoning Diversity in Grounded Natural Language Understanding Yoav Artzi SiVL, NAACL 2019](https://reader033.vdocuments.mx/reader033/viewer/2022051917/60093bd5bc5d54320e4049e6/html5/thumbnails/65.jpg)
after the blue bale take a right towards the small white bush before the white bush take a right and head towards the right side of the banana
![Page 66: Language and Reasoning Diversity in Grounded Natural ... · Language and Reasoning Diversity in Grounded Natural Language Understanding Yoav Artzi SiVL, NAACL 2019](https://reader033.vdocuments.mx/reader033/viewer/2022051917/60093bd5bc5d54320e4049e6/html5/thumbnails/66.jpg)
Today
Understanding: NLVR, NLVR2 (nlvr.ai)
Acting: Touchdown (touchdown.ai), DRIF
Alane Suhr
Howard Chen
Valts Blukis
Stephanie Zhou
(now UMD)
Facebook AI Research
![Page 67: Language and Reasoning Diversity in Grounded Natural ... · Language and Reasoning Diversity in Grounded Natural Language Understanding Yoav Artzi SiVL, NAACL 2019](https://reader033.vdocuments.mx/reader033/viewer/2022051917/60093bd5bc5d54320e4049e6/html5/thumbnails/67.jpg)
Today
Understanding: NLVR, NLVR2 (nlvr.ai)
Acting: Touchdown (touchdown.ai), DRIF
• Robustness to biases
• Language and reasoning diversity
• Real-life input
• Robotic agents
Alane Suhr
Howard Chen
Valts Blukis
Stephanie Zhou
(now UMD)
![Page 68: Language and Reasoning Diversity in Grounded Natural ... · Language and Reasoning Diversity in Grounded Natural Language Understanding Yoav Artzi SiVL, NAACL 2019](https://reader033.vdocuments.mx/reader033/viewer/2022051917/60093bd5bc5d54320e4049e6/html5/thumbnails/68.jpg)
Resources: Visual Understanding to Interaction
DRIF
touchdown.ai
nlvr.ai
CHALET
LANI
![Page 69: Language and Reasoning Diversity in Grounded Natural ... · Language and Reasoning Diversity in Grounded Natural Language Understanding Yoav Artzi SiVL, NAACL 2019](https://reader033.vdocuments.mx/reader033/viewer/2022051917/60093bd5bc5d54320e4049e6/html5/thumbnails/69.jpg)
[fin]