international conference on computer vision (iccv 2017) li...

Presented by: Gursimran Singh, Borna Ghotbi ({msimar,bgotbi}@cs.ubc.ca). Inferring and Executing Programs for Visual Reasoning. Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Judy Hoffman, Li Fei-Fei, C. Lawrence Zitnick, Ross Girshick. Stanford University, Facebook Research. International Conference on Computer Vision (ICCV 2017). CPSC 532L presentation.

Post on 25-Sep-2019


Page 1: International Conference on Computer Vision (ICCV 2017) Li ...lsigal/532L/PP_ProgramsForVisualReasoning.pdf · Compositional visual reasoning Q: How many spheres are the left of the

Presented by: Gursimran Singh, Borna Ghotbi
{msimar,bgotbi}@cs.ubc.ca

Inferring and Executing Programs for Visual Reasoning

Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Judy Hoffman, Li Fei-Fei, C. Lawrence Zitnick, Ross Girshick

Stanford University, Facebook Research
International Conference on Computer Vision (ICCV 2017)

CPSC 532L presentation


Visual question answering

[Figure: CNN (image) and LSTM (question "Who is wearing glasses?") → MERGE → Predict → "Woman"]

● Generalizes well to new kinds of questions
  ○ Who is wearing spectacles? How many people?

Antol et al. VQA: Visual Question Answering. Proceedings of the IEEE International Conference on Computer Vision.


Compositional visual reasoning

Q: How many spheres are to the left of the big sphere and the same color as the small rubber cylinder?

Reasoning steps:

● Identify big sphere
● Spheres on left
● Rubber cylinder
● Sphere of same color
● Count

A: 1

Johnson, Justin, et al. CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning. CVPR, 2017.
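The reasoning steps for this question can be read as a small program whose functions are composed in sequence. The sketch below is only an illustration of that idea on a symbolic scene: the scene, the attribute names, and the helper functions (`filter_by`, `unique`, `left_of`, `same_color`) are all hypothetical and are not taken from CLEVR or from the paper's code.

```python
# Toy scene: hypothetical objects, not taken from a real CLEVR image.
# "x" is the horizontal position (smaller means further left).
SCENE = [
    {"shape": "sphere", "size": "large", "color": "red", "material": "metal", "x": 5},
    {"shape": "sphere", "size": "small", "color": "blue", "material": "rubber", "x": 1},
    {"shape": "sphere", "size": "small", "color": "green", "material": "metal", "x": 2},
    {"shape": "cylinder", "size": "small", "color": "blue", "material": "rubber", "x": 7},
    {"shape": "cube", "size": "large", "color": "blue", "material": "metal", "x": 3},
]


def filter_by(objects, **attrs):
    """Keep the objects whose attributes match every keyword given."""
    return [o for o in objects if all(o[k] == v for k, v in attrs.items())]


def unique(objects):
    """Return the single object in the list; fail if it is ambiguous."""
    assert len(objects) == 1, "expected exactly one matching object"
    return objects[0]


def left_of(scene, anchor):
    """Objects positioned to the left of the anchor object."""
    return [o for o in scene if o["x"] < anchor["x"]]


def same_color(objects, reference):
    """Objects (other than the reference) sharing the reference's color."""
    return [o for o in objects
            if o["color"] == reference["color"] and o is not reference]


def answer(scene):
    big_sphere = unique(filter_by(scene, shape="sphere", size="large"))   # identify big sphere
    spheres_on_left = filter_by(left_of(scene, big_sphere), shape="sphere")  # spheres on left
    rubber_cylinder = unique(
        filter_by(scene, shape="cylinder", size="small", material="rubber"))  # rubber cylinder
    matching = same_color(spheres_on_left, rubber_cylinder)              # sphere of same color
    return len(matching)                                                  # count


print(answer(SCENE))
```

Swapping `left_of` for a mirrored `right_of` would answer the "right of" variant of the question without touching the other steps, which is the kind of reuse a compositional model is meant to offer.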


Standard VQA?

Q: How many spheres are to the right of the big sphere and the same color as the small rubber cylinder?

Q: How many spheres are to the left of the big sphere and the same color as the small rubber cylinder?

[Figure: CNN and LSTM → MERGE → Predict → Answer?]

LIMITATIONS

● Can’t model complex questions
● Lacks composition


Standard VQA?

Q: How many spheres are to the right of the big sphere and the same color as the small rubber cylinder?

Q: How many spheres are to the left of the big sphere and the same color as the small rubber cylinder?

[Figure: modules labeled "Sphere?", "Cylinder?", "Move Left", "Spheres of same color" → Predict → Answer?]

LIMITATIONS

● Can’t model complex questions
● Lacks composition

Decompose the network into multiple modules


Standard VQA?

Q: How many spheres are to the right of the big sphere and the same color as the small rubber cylinder?

Q: How many spheres are to the left of the big sphere and the same color as the small rubber cylinder?

Q: How many objects are either red cylinders or metal objects?

[Figure: CNN and LSTM → MERGE → Predict → Answer?]

LIMITATIONS

● Can’t model complex questions
● Lacks composition
● Uses the same structure for every question


Standard VQA?

Q: How many spheres are to the right of the big sphere and the same color as the small rubber cylinder?

Q: How many spheres are to the left of the big sphere and the same color as the small rubber cylinder?

Q: How many objects are either red cylinders or metal objects?

LIMITATIONS

● Can’t model complex questions
● Lacks composition
● Uses the same structure for every question

Solution

● Use composition and structure

[Figure: example networks composed of modules labeled A, B, C, D, G, H]

Use separate networks for each question


Instead: consider a compositional model

Q: How many spheres are to the right of the big sphere and the same color as the small rubber cylinder?

Q: How many spheres are to the left of the big sphere and the same color as the small rubber cylinder?

Q: Is the big sphere the same material as the thing on the right of the cube?

Common operations

● Attribute identification
● Counting objects
● Comparisons
● Spatial relationships
● Logical operations

[Figure: network architecture corresponding to the third question]


Overview of approach

Graphics taken from https://www.youtube.com/watch?v=3pCLma2FqSk


Module networks

[Figure: pipeline with an NLP semantic parser]

Graphics taken from https://www.youtube.com/watch?v=3pCLma2FqSk


Modules recap

Andreas et al. Deep Compositional Question Answering with Neural Module Networks. arXiv, 2015.


Module networks - limitations

[Figure: pipeline with an NLP semantic parser]

Graphics taken from https://www.youtube.com/watch?v=3pCLma2FqSk

● Trained separately
● Uses some pre-trained parser


Inferring and executing programs

Graphics taken from https://www.youtube.com/watch?v=3pCLma2FqSk

Trained end-to-end!


Execution engine


Module architectures

a) Visual feature extraction

b.1) Unary modules

b.2) Binary modules

d) Classifier
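As a rough sketch of how the components listed above might fit together, the example below wires unary and binary modules into a tree that follows a program, then passes the result to a classifier. It makes several simplifying assumptions: plain Python lists of length `DIM` stand in for convolutional feature maps, dense matrices stand in for convolutions, a single linear layer stands in for the classifier, and all weights are random and untrained. The module names (`filter_large`, `relate_left`, `filter_sphere`, `intersect`) and the nested-tuple program format are hypothetical. Only the overall shapes (a residual block for unary modules; concatenate, project, then a residual block for binary modules) are meant to mirror the paper's description.

```python
import random

DIM = 4  # hypothetical feature size; the real modules act on conv feature maps


def relu(v):
    return [max(0.0, a) for a in v]


def matvec(W, v):
    return [sum(w * a for w, a in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class UnaryModule:
    """b.1) Residual block: relu(x + W2 @ relu(W1 @ x))."""

    def __init__(self, rng):
        self.W1 = rand_matrix(DIM, DIM, rng)
        self.W2 = rand_matrix(DIM, DIM, rng)

    def __call__(self, x):
        h = matvec(self.W2, relu(matvec(self.W1, x)))
        return relu([a + b for a, b in zip(x, h)])


class BinaryModule:
    """b.2) Concatenate both inputs, project back to DIM, then a residual block."""

    def __init__(self, rng):
        self.proj = rand_matrix(DIM, 2 * DIM, rng)
        self.block = UnaryModule(rng)

    def __call__(self, x, y):
        return self.block(relu(matvec(self.proj, x + y)))  # x + y concatenates lists


def execute(node, image_features, modules):
    """Recursively run a program given as nested tuples (name, *children)."""
    name, children = node[0], node[1:]
    if name == "scene":  # leaf: a) the extracted visual features
        return image_features
    inputs = [execute(child, image_features, modules) for child in children]
    return modules[name](*inputs)


def classify(features, W_out, answers):
    """d) Linear scores over a fixed answer set, then argmax."""
    scores = matvec(W_out, features)
    return answers[scores.index(max(scores))]


rng = random.Random(0)
modules = {
    "filter_large": UnaryModule(rng),
    "relate_left": UnaryModule(rng),
    "filter_sphere": UnaryModule(rng),
    "intersect": BinaryModule(rng),
}
program = (
    "intersect",
    ("relate_left", ("filter_large", ("scene",))),
    ("filter_sphere", ("scene",)),
)
answers = ["0", "1", "2"]
image_features = [rng.random() for _ in range(DIM)]  # stand-in for CNN features
W_out = rand_matrix(len(answers), DIM, rng)

output = execute(program, image_features, modules)
print(classify(output, W_out, answers))  # arbitrary: the weights are untrained
```

Because every module maps `DIM`-sized features to `DIM`-sized features, any program that respects the unary/binary arities yields a valid network, which is what lets a different network be assembled for each question.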


What do the modules learn?


Training

● Train Program Generator
● Freeze Program Generator, train Execution Engine
● Finetune jointly, using REINFORCE for the Program Generator
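To give a feel for the REINFORCE step, here is a minimal sketch of the score-function gradient update on a toy problem. It is not the paper's training loop: the policy here is a plain softmax over three hypothetical candidate programs rather than an LSTM that emits a program token by token, the reward function is made up (only candidate 2 is assumed to yield the correct answer), and the moving-average baseline and learning rate are assumptions chosen for illustration.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(logits, reward_fn, rng, baseline, lr=0.5):
    """One REINFORCE update for a categorical policy over candidate programs."""
    probs = softmax(logits)
    action = rng.choices(range(len(logits)), weights=probs)[0]  # sample a program
    reward = reward_fn(action)  # e.g. 1 if the executed program answers correctly
    advantage = reward - baseline
    # Gradient of log p(action) w.r.t. logit k is 1[k == action] - p_k.
    new_logits = [
        l + lr * advantage * ((1.0 if k == action else 0.0) - p)
        for k, (l, p) in enumerate(zip(logits, probs))
    ]
    return new_logits, reward


def toy_reward(program_index):
    # Toy assumption: only candidate program 2 leads to the correct answer.
    return 1.0 if program_index == 2 else 0.0


rng = random.Random(0)
logits = [0.0, 0.0, 0.0]
baseline = 0.0
for _ in range(200):
    logits, reward = reinforce_step(logits, toy_reward, rng, baseline)
    baseline = 0.9 * baseline + 0.1 * reward  # moving-average baseline (assumption)

final_probs = softmax(logits)
print(final_probs)
```

The point of the estimator is that the reward only needs to be observed, not differentiated, so the discrete choice of program does not block training of the generator.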


CLEVR dataset

● A training set of 70,000 images and 699,989 questions
● A validation set of 15,000 images and 149,991 questions
● A test set of 15,000 images and 14,988 questions
● Answers for all train and val questions
● Scene graph annotations for train and val images giving ground-truth locations, attributes, and relationships for objects
● Objects can be cubes, cylinders and spheres.


Experiments: Baselines


Experiments: Strongly and semi-supervised learning


Experiments: Generalizing to new attribute combinations

Experiments: Generalizing to new question types

Short: all questions whose question family has a mean program length less than 16

Long: all other questions
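The short/long split described here can be sketched as follows. The family names and program lengths are made up for illustration, and `split_families` is a hypothetical helper; only the threshold of 16 on the mean program length comes from the slide.

```python
from statistics import mean


def split_families(program_lengths_by_family, threshold=16):
    """Assign each question family to 'short' or 'long' by mean program length."""
    short, long_ = [], []
    for family, lengths in program_lengths_by_family.items():
        if mean(lengths) < threshold:
            short.append(family)
        else:
            long_.append(family)
    return sorted(short), sorted(long_)


# Hypothetical families with the program lengths of their questions.
families = {
    "count": [5, 7, 9],                # mean 7  -> short
    "compare_material": [15, 18, 21],  # mean 18 -> long
    "query_color": [12, 16, 17],       # mean 15 -> short
}

short_families, long_families = split_families(families)
print(short_families, long_families)
```

Note that the split is made per family rather than per question, so an individual question with a long program can still land in the short set if its family's mean is below the threshold.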


Experiments: The CLEVR-Humans Dataset

● Uses questions that are hard for a “smart robot” to answer
● Questions were filtered by asking three workers to answer them and removing those that a majority of workers answered incorrectly.
● About 17,000 training questions and 7,000 validation and test questions on CLEVR images.
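The filtering rule can be sketched in a few lines. This is only an illustration: the questions, answers, and the `keep_question` helper are hypothetical, and the sketch assumes each question comes with a reference answer against which the three workers' responses are compared.

```python
def keep_question(worker_answers, reference_answer):
    """Return False when a majority of workers answered incorrectly."""
    wrong = sum(1 for a in worker_answers if a != reference_answer)
    return wrong * 2 <= len(worker_answers)


# Hypothetical questions, reference answers, and worker responses.
candidates = [
    {"question": "Is the cube shiny?", "reference": "yes",
     "workers": ["yes", "yes", "no"]},
    {"question": "What is hidden behind the ball?", "reference": "cylinder",
     "workers": ["cube", "nothing", "cylinder"]},
]

kept = [c["question"] for c in candidates
        if keep_question(c["workers"], c["reference"])]
print(kept)
```

With three workers, a question is dropped as soon as two of them disagree with the reference answer.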


Experiments: Human-composed questions

Results


Other approaches

Andreas et al., ICCV 2017; Santoro et al., arXiv 2017

Strengths and weaknesses

Strengths

● Novel idea of using compositional reasoning to answer complex questions
● Trains the program generator on questions using LSTMs
● Trains the whole network end to end

Weaknesses

● Not enough results on real images
● More complex questions may not work properly


Future work / possible improvements

Ideas taken from the paper

● Adding ternary operations (if/then/else) and loops (for, do) to answer questions like “What color is the object with a unique shape?”
● Control-flow operators could be incorporated into the framework
● Learning programs with limited supervision

Our ideas

● Using TreeRNNs to synthesize programs
● Testing the whole framework on real images


Conclusion

● This method outperforms previous baselines.
● Neural module networks are a more natural way to reproduce reasoning steps.
● More flexibility in the composition of the neural module network, as modules have generic architectures.


References

● Inferring and Executing Programs for Visual Reasoning
● CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning
● Talk - https://www.youtube.com/watch?v=3pCLma2FqSk
● Learning to Reason: End-to-End Module Networks for Visual Question Answering
● https://github.com/facebookresearch/clevr-iep
● Deep Compositional Question Answering with Neural Module Networks


Thanks!


Visual question answering

[Figure: CNN (image) and LSTM (question "Who is wearing hat?") → MERGE → Predict → "Woman"]

● But does not really understand the question; same answer for
  ○ Who is wearing hat? Who is wearing? Wearing?

Antol et al. VQA: Visual Question Answering. Proceedings of the IEEE International Conference on Computer Vision.


Standard VQA?

Q: How many spheres are to the right of the big sphere and the same color as the small rubber cylinder?

Q: How many spheres are to the left of the big sphere and the same color as the small rubber cylinder?

[Figure: INPUT → "Identify Sphere" → "Left/Right" → Predict → Answer?]

Decompose the network!