Visual7W: Grounded Question Answering in Images
TRANSCRIPT
Visual7W: Grounded Question Answering in Images
Yuke Zhu, Oliver Groth, Michael Bernstein, Li Fei-Fei
Slides by Issey Masuda Mora
Computer Vision Reading Group (09/05/2016)
[arXiv] [web] [GitHub]
Context
Visual Question Answering
Goal: predict the answer to a given question about an image
Motivation
New Turing test? How to evaluate AI’s image understanding?
Visual7W
The 7W
WHAT
WHERE
WHEN
WHO
WHY
HOW
WHICH
Questions: multiple choice, 4 candidates, only one correct
Grounding: image-text correspondences. Exploit the relation between image regions and the nouns in the questions
The new answer is...
Question-answer types:
● Telling questions: the answer is text
● Pointing questions: a new QA type introduced by this work, where the answer is an image region (a bounding box); an illustrative record is sketched below
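To make this setup concrete, here is a minimal sketch of what a grounded, multiple-choice QA record could look like. The field names and layout are invented for illustration; they are not the actual Visual7W JSON schema.

```python
# Hypothetical record illustrating a grounded multiple-choice QA pair.
# Field names are invented for illustration; the real Visual7W data
# format may differ.
qa_pair = {
    "image_id": 12345,                       # COCO-style image identifier
    "type": "telling",                       # "telling" or "pointing"
    "question": "Who is under the umbrella?",
    "candidates": [                          # 4 candidates, only one correct
        "Two women",
        "A dog",
        "A street vendor",
        "Nobody",
    ],
    "answer_idx": 0,                         # index of the correct candidate
    "groundings": [                          # noun -> image-region links
        {"noun": "umbrella", "box": {"x": 120, "y": 40, "w": 180, "h": 90}},
        {"noun": "women", "box": {"x": 150, "y": 130, "w": 160, "h": 210}},
    ],
}
```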
Related work
Common approach
Example: "Who is under the umbrella?" → extract visual features from the image, embed the question, merge both representations, and predict the answer ("Two women"). A sketch of this pipeline follows.
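The snippet below is a rough sketch of that pipeline, assuming PyTorch; the module names, sizes, and the element-wise merge are illustrative choices, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SimpleVQA(nn.Module):
    """Minimal sketch of the common VQA pipeline (not the paper's model)."""

    def __init__(self, vocab_size, embed_dim=300, hidden_dim=512,
                 img_feat_dim=4096, num_answers=1000):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)      # word embeddings
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.img_proj = nn.Linear(img_feat_dim, hidden_dim)   # project CNN features
        self.classifier = nn.Linear(hidden_dim, num_answers)  # scores per answer

    def forward(self, img_feats, question_tokens):
        # img_feats: (batch, img_feat_dim) pre-extracted CNN features
        # question_tokens: (batch, seq_len) word indices
        _, (h, _) = self.lstm(self.embed(question_tokens))
        q = h[-1]                                  # final LSTM state: question embedding
        v = torch.tanh(self.img_proj(img_feats))   # embedded image
        merged = q * v                             # element-wise merge
        return self.classifier(merged)             # logits over candidate answers

model = SimpleVQA(vocab_size=10000)
logits = model(torch.randn(2, 4096), torch.randint(0, 10000, (2, 12)))
```

Concatenating the two vectors followed by a linear layer is an equally common merge; the element-wise product is used here only to keep the sketch short.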
The Dataset
Visual7W Dataset
Characteristics:
● 47,300 images from the COCO dataset
● 327,939 QA pairs
● 561,459 object bounding boxes spread across 36,579 categories
Creating the Dataset
Procedure:
● Write QA pairs
● 3 AMT workers rate each pair as good or bad
● Only pairs with at least 2 good ratings are kept (see the sketch after this list)
● Write the 3 wrong answers (given the right one)
● Extract object names and draw a bounding box for each one
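A minimal sketch of the 2-of-3 filtering step, with a hypothetical data layout (one boolean rating per worker) invented for illustration:

```python
def filter_qa_pairs(qa_pairs):
    """Keep QA pairs rated 'good' by at least 2 of the 3 AMT workers."""
    # `ratings` is a hypothetical field: one boolean per worker.
    return [qa for qa in qa_pairs if sum(qa["ratings"]) >= 2]

# Example: only the first pair passes the 2-of-3 threshold.
pairs = [
    {"question": "Who is under the umbrella?", "ratings": [True, True, False]},
    {"question": "What is behind the tree?", "ratings": [True, False, False]},
]
print(filter_qa_pairs(pairs))  # -> only the umbrella question survives
```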
The Model
Attention-based model
Pointing questions model
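A minimal sketch of such spatial attention, assuming PyTorch and illustrative dimensions (not the authors' Torch implementation): the question state scores every cell of a convolutional feature map, and a softmax-weighted sum gives a question-conditioned image summary; the weights can also be visualized to localize relevant regions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialAttention(nn.Module):
    """Sketch of attention over CNN feature-map regions, conditioned on the question."""

    def __init__(self, img_dim=512, q_dim=512, att_dim=256):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, att_dim)  # per-region projection
        self.q_proj = nn.Linear(q_dim, att_dim)      # question projection
        self.score = nn.Linear(att_dim, 1)           # scalar score per region

    def forward(self, feat_map, q_state):
        # feat_map: (batch, regions, img_dim), e.g. a 14x14 grid flattened to 196
        # q_state:  (batch, q_dim) final LSTM state of the question
        joint = torch.tanh(self.img_proj(feat_map)
                           + self.q_proj(q_state).unsqueeze(1))
        weights = F.softmax(self.score(joint).squeeze(-1), dim=1)  # (batch, regions)
        attended = (weights.unsqueeze(-1) * feat_map).sum(dim=1)   # weighted sum
        return attended, weights

att = SpatialAttention()
summary, w = att(torch.randn(2, 196, 512), torch.randn(2, 512))
```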
Experiments & Results
Experiments
Different experiments were conducted depending on the information given to the subject:
● Only the question
● Question + image
Subjects/models:
● Human
● Logistic regression
● LSTM
● LSTM + attention model
All of them answer the same multiple-choice questions, as sketched below.
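Since every question comes with 4 candidates, one natural evaluation is accuracy under a shared scoring interface: score each question-candidate pair and pick the best. The `score` callable below is a hypothetical stand-in for any of the models above.

```python
def multiple_choice_accuracy(questions, score):
    """Fraction of questions whose top-scoring candidate is the correct one.

    `score(question, candidate)` is a hypothetical callable standing in
    for any model (logistic regression, LSTM, LSTM + attention, ...).
    """
    correct = 0
    for q in questions:
        scores = [score(q["question"], c) for c in q["candidates"]]
        correct += scores.index(max(scores)) == q["answer_idx"]
    return correct / len(questions)

# Example with a trivial scorer that prefers shorter answers.
qs = [{"question": "Who is under the umbrella?",
       "candidates": ["Two women", "A dog", "A street vendor", "Nobody"],
       "answer_idx": 0}]
print(multiple_choice_accuracy(qs, lambda q, c: -len(c)))  # 0.0 for this scorer
```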
Results
Conclusions
● A visual QA model has been presented
● An attention model focuses on local regions of the image
● A dataset with groundings has been created