Aishwarya Agrawal Ph.D. Student Machine Learning and Perception Lab

Post on 28-Jul-2020

Page 1:

Aishwarya Agrawal

Ph.D. Student

Machine Learning and Perception Lab

Page 2:

Page 3:

Identify objects in scene

sky

bus

car

stop light

person

building

sidewalk

Page 4:

Identify attributes of objects

blue sky

red bus

many cars

green stop light

one bicycle

tall building

Page 5:

Identify activities in scene

person wearing a helmet riding bicycle

man walking on sidewalk

Page 6:

Identify the scene

street scene

Page 7:

Describe the scene

A person on bike going through green light with bus nearby

Page 8:

A giraffe standing in the grass next to a tree.

Page 9:

• Answer questions about the scene

– Q: How many buses are there?

– Q: What is the name of the street?

– Q: Is the man on bicycle wearing a helmet?

Page 10:

Page 11:

Visual Question Answering (VQA)

Task: Given an image and a natural language open-ended question, generate a natural language answer.

Page 12:

VQA Task

Page 13:

VQA CloudCV Demo

cloudcv.org/vqa/?useVoice=1&listenAnswer=1

Page 14:

Applications of VQA

• An aid to the visually impaired

Is it safe to cross the street now?

Page 15:

Applications of VQA

• Surveillance

What kind of car did the man in the red shirt leave in?

Page 16:

Applications of VQA

• Interacting with a robot

Is my laptop in my bedroom upstairs?

Page 17:

VQA Dataset

Page 18:

Real images (from MSCOCO)

Tsung-Yi Lin et al. “Microsoft COCO: Common Objects in COntext.” ECCV 2014.

http://mscoco.org/

Page 19:

Questions

Stump a smart robot!

Ask a question that a human can answer, but a smart robot probably can’t!

Page 20:

Two modalities of answering

• Open Ended

• Multiple Choice

Page 21:

Open Ended Task

What is the girl holding in her hand?

How many mirrors?

Why is the girl holding an umbrella?

Page 22:

Multiple Choice Task

What is the bus number?

a) 3 b) 1 c) green d) 4 e) window trim f) blue

g) m5 h) corn, carrots, onions, rice i) red j) 125 k) san antonio l) sign pen

m) 478 n) no o) 25 p) 2 q) yes r) white
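In the multiple-choice setting the answer must come from the listed candidates. One way to picture this, as a minimal sketch only: take whatever per-answer scores a model produces and keep the best-scoring candidate from the list. The function name `pick_choice` and the scores below are hypothetical and invented for illustration; they do not indicate the correct answer for this image.

```python
def pick_choice(scores, choices, default_score=float("-inf")):
    """Return the candidate answer with the highest score.

    Answers missing from `scores` get `default_score`; on ties the
    earliest candidate in `choices` wins (behaviour of built-in max).
    """
    return max(choices, key=lambda c: scores.get(c, default_score))


# The 18 candidates shown on the slide for "What is the bus number?"
choices = [
    "3", "1", "green", "4", "window trim", "blue",
    "m5", "corn, carrots, onions, rice", "red", "125", "san antonio", "sign pen",
    "478", "no", "25", "2", "yes", "white",
]

# Hypothetical model scores (made up). "10" scores highest overall but is
# not among the candidates, so it can never be selected.
scores = {"10": 0.40, "m5": 0.25, "2": 0.20, "yes": 0.15}

print(pick_choice(scores, choices))  # m5
```

Restricting the choice to the candidate list is one plausible reason the multiple-choice task is easier than the open-ended one: an answer outside the list is ruled out up front.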

Page 23:

Dataset Stats

• >250K images (MSCOCO + 50K Abstract Scenes)

• >750K questions (3 per image)

• ~10M answers (10 w/ image + 3 w/o image)
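These headline figures follow from simple multiplication. The sketch below redoes the arithmetic from the slide's rounded lower bounds (250K images, 3 questions per image, 10 + 3 answers per question). The variable names are illustrative, and the results are estimates rather than official counts.

```python
# Rough arithmetic behind the dataset statistics, using the slide's
# rounded lower bounds; the released dataset's exact counts may differ.
images = 250_000               # ">250K images"
questions_per_image = 3        # "3 per image"
answers_with_image = 10        # answers given while looking at the image
answers_without_image = 3      # answers given without seeing the image

questions = images * questions_per_image
answers = questions * (answers_with_image + answers_without_image)

print(f"questions: {questions:,}")  # questions: 750,000
print(f"answers:   {answers:,}")    # answers:   9,750,000
```

The 9.75M lower bound is consistent with the "~10M answers" quoted above.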

Page 24:

Please visit www.visualqa.org for more details.

Page 25:

Browse the Dataset

http://visualqa.org/browser/

Page 26:

Questions

Page 27:

Dataset Visualization

http://visualqa.org/visualize/

Page 28:

Answers

• 38.4% of questions are binary yes/no

• 98.97% of questions have answers of <= 3 words

– 23k unique 1 word answers
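Statistics of this kind can be computed directly from a list of answer strings. The following is a minimal sketch on a made-up toy list, not the VQA data. The function name `answer_stats` is hypothetical, and the normalization (lowercasing and whitespace splitting) is an assumption. The slide reports its first two figures per question, whereas this sketch counts per answer string.

```python
def answer_stats(answers):
    """Summarise a list of answer strings.

    Returns the share of yes/no answers, the share of answers with at
    most three words, and the number of distinct one-word answers.
    """
    if not answers:
        raise ValueError("need at least one answer")
    normalized = [a.strip().lower() for a in answers]
    total = len(normalized)
    yes_no = sum(a in ("yes", "no") for a in normalized)
    short = sum(len(a.split()) <= 3 for a in normalized)
    unique_one_word = {a for a in normalized if len(a.split()) == 1}
    return {
        "yes_no_share": yes_no / total,
        "short_share": short / total,
        "unique_one_word": len(unique_one_word),
    }


# Toy answers, invented for illustration only.
toy = ["yes", "No", "2", "red", "umbrella", "on the table", "yes",
       "because it is raining outside"]
stats = answer_stats(toy)
print(stats)
# {'yes_no_share': 0.375, 'short_share': 0.875, 'unique_one_word': 5}
```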

Page 29:

Answers

Page 30:

2-Channel VQA Model

[Figure: two-channel architecture. Image channel: Image -> (Convolution Layer + Non-Linearity -> Pooling Layer) x2 -> Fully-Connected MLP -> 4096-dim Embedding. Question channel: Question (“How many horses are in this image?”) -> 1024-dim Embedding. Both embeddings -> Neural Network -> Softmax over top K answers.]
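The data flow of the two-channel model can be sketched in a few lines of plain Python. This is a rough illustration under stated assumptions, not the authors' implementation. The slide shows only a "Neural Network" box between the embeddings and the softmax, so the common-space projection, the tanh non-linearity, and the element-wise product fusion below are assumptions. The embeddings are random stand-ins for real CNN and question-encoder outputs, and the dimensions are shrunk from the slide's 4096 and 1024 so the example runs instantly. All names are hypothetical.

```python
import math
import random


def linear(weights, bias, x):
    """Dense layer: `weights` holds one row of input weights per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(rng, n_out, n_in, scale=0.1):
    """Random weights and zero bias for an n_in -> n_out dense layer."""
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def two_channel_forward(params, image_emb, question_emb):
    """Map an image embedding and a question embedding to answer probabilities."""
    # Project each channel into a common space (assumption) and squash.
    img = [math.tanh(v) for v in linear(*params["image_proj"], image_emb)]
    qst = [math.tanh(v) for v in linear(*params["question_proj"], question_emb)]
    # Fuse by element-wise product (assumption; the slide does not say how).
    fused = [a * b for a, b in zip(img, qst)]
    # Softmax over the top K candidate answers.
    return softmax(linear(*params["classifier"], fused))


rng = random.Random(0)
# Demo sizes only; the slide uses 4096 (image) and 1024 (question).
IMAGE_DIM, QUESTION_DIM, COMMON_DIM, TOP_K = 8, 4, 6, 5
params = {
    "image_proj": init_layer(rng, COMMON_DIM, IMAGE_DIM),
    "question_proj": init_layer(rng, COMMON_DIM, QUESTION_DIM),
    "classifier": init_layer(rng, TOP_K, COMMON_DIM),
}
image_emb = [rng.random() for _ in range(IMAGE_DIM)]        # stand-in CNN output
question_emb = [rng.random() for _ in range(QUESTION_DIM)]  # stand-in question encoding

probs = two_channel_forward(params, image_emb, question_emb)
print(len(probs), round(sum(probs), 6))
```

With untrained random weights the probabilities are close to uniform. The point of the sketch is the shape of the computation: two embeddings in, one distribution over K answers out.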

Page 31:

Ablation #1: Language-alone

[Figure: the same architecture with only the question channel in use. Question (“How many horses are in this image?”) -> 1024-dim Question Embedding -> Neural Network -> Softmax over top K answers. The image channel (Convolution Layer + Non-Linearity, Pooling Layer, Fully-Connected MLP, 1k output units) is drawn but not used.]

Page 32:

Ablation #2: Vision-alone

[Figure: the same architecture with only the image channel in use. Image -> (Convolution Layer + Non-Linearity -> Pooling Layer) x2 -> Fully-Connected MLP -> 4096-dim Embedding -> Neural Network -> Softmax over top K answers. The Question Embedding for “How many horses are in this image?” is drawn but not used.]
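Both ablations amount to withholding one channel from the classifier. A minimal sketch of that idea follows. The function name `classifier_input` is hypothetical, and concatenation is used only to keep the example short; the deck does not state how the channels are combined. The short vectors stand in for the 4096-dim image embedding and the 1024-dim question embedding.

```python
def classifier_input(image_emb, question_emb, use_image=True, use_question=True):
    """Build the feature vector fed to the answer classifier.

    A disabled channel contributes nothing, which mimics the
    language-alone and vision-alone ablations.
    """
    if not (use_image or use_question):
        raise ValueError("at least one channel must be enabled")
    features = []
    if use_image:
        features.extend(image_emb)
    if use_question:
        features.extend(question_emb)
    return features


image_emb = [0.2, 0.7, 0.1]   # stand-in for the image embedding
question_emb = [0.5, 0.9]     # stand-in for the question embedding

full = classifier_input(image_emb, question_emb)
language_alone = classifier_input(image_emb, question_emb, use_image=False)
vision_alone = classifier_input(image_emb, question_emb, use_question=False)
print(len(full), len(language_alone), len(vision_alone))  # 5 2 3
```

Training the same classifier on each of the three inputs and comparing accuracies is what tells you how much each modality contributes.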

Page 33:

Accuracy Metric
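The formula on this slide did not survive extraction. The metric described in the VQA team's paper treats a predicted answer as fully correct when at least 3 of the 10 human annotators gave it: accuracy = min(number of humans who gave that answer / 3, 1). The sketch below assumes that definition. The function name `vqa_accuracy` is hypothetical, and the normalization here (lowercasing and trimming whitespace) is simpler than what an official evaluation script would do.

```python
def vqa_accuracy(predicted, human_answers):
    """Score one predicted answer against the human answers for a question.

    Assumed metric: min(#humans who gave the predicted answer / 3, 1).
    """
    norm = predicted.strip().lower()
    matches = sum(1 for a in human_answers if a.strip().lower() == norm)
    return min(matches / 3, 1.0)


# Ten made-up human answers to "How many buses are there?"
human_answers = ["2", "2", "two", "2", "3", "2", "2", "2", "3", "2"]

print(vqa_accuracy("2", human_answers))  # 1.0 (seven annotators agree)
print(vqa_accuracy("3", human_answers))  # two annotators agree -> 2/3
print(vqa_accuracy("4", human_answers))  # 0.0 (nobody gave this answer)
```

Partial credit for answers given by only one or two annotators makes the metric tolerant of disagreement among humans.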

Page 34:

Open-Ended Task Accuracies

[Figure: Human vs. Machine performance on the open-ended task, with human performance marked; the gap of 25.14 between Human and Machine is labeled “room for improvement”.]

Page 35:

Results

Code available!

• Multiple-Choice > Open-Ended

• Question alone does quite well

• Image helps

Page 36:

Commonsense

• Does this person have 20/20 vision?

Page 37:

Does this question need commonsense?

Q: How many calories are in this pizza?

Page 38:

How old does a person need to be?

Q: How many calories are in this pizza?

Page 39:

Most “commonsense” questions

Page 40:

Least “commonsense” questions

Page 41:

Spectrum

3-4 (15.3%)
– Is that a bird in the sky?
– What color is the shoe?
– How many zebras are there?
– Is there food on the table?
– Is this man wearing shoes?

5-8 (39.7%)
– How many pizzas are shown?
– What are the sheep eating?
– What color is his hair?
– What sport is being played?
– Name one ingredient in the skillet.

9-12 (28.4%)
– Where was this picture taken?
– What ceremony does the cake commemorate?
– Are these boats too tall to fit under the bridge?
– What is the name of the white shape under the batter?
– Is this at the stadium?

13-17 (11.2%)
– Is he likely to get mugged if he walked down a dark alleyway like this?
– Is this a vegetarian meal?
– What type of beverage is in the glass?
– Can you name the performer in the purple costume?
– Besides these humans, what other animals eat here?

18+ (5.5%)
– What type of architecture is this?
– Is this a Flemish bricklaying pattern?
– How many calories are in this pizza?
– What government document is needed to partake in this activity?
– What is the make and model of this vehicle?
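Accumulating the age-group percentages makes the spectrum easier to read. The sketch below assumes each percentage is the share of questions assigned to that age group, which is an interpretation of the slide rather than something it states outright. The variable names are illustrative.

```python
# Age groups and their shares of questions, copied from the slide.
age_groups = [("3-4", 15.3), ("5-8", 39.7), ("9-12", 28.4),
              ("13-17", 11.2), ("18+", 5.5)]

running = 0.0
cumulative = []
for label, share in age_groups:
    running += share
    cumulative.append((label, round(running, 1)))

for label, total in cumulative:
    print(f"up to {label}: {total}%")
```

Under that reading, 55.0% of questions fall in the two youngest groups and 83.4% in the first three. The shares add up to 100.1% because the slide's figures are rounded.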

Page 42:

Question       Average Age
what brand     12.5
why            11.18
what type      11.04
what kind      10.55
is this        10.13
what does      10.06
what time      9.81
who            9.58
where          9.54
which          9.32
does           9.29
do             9.23
what is        9.11
what are       9.04
are            8.65
is the         8.52
is there       8.24
what sport     8.06
how many       7.67
what animal    6.74
what color     6.6
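The table can be used as a lookup from a question's opening words to an average age. The sketch below copies the values from the slide and matches a question against the longest listed prefix. The function name `average_age_for` is hypothetical, and longest-prefix matching is an assumption about how questions map to rows; the deck does not describe the grouping procedure.

```python
# Average age per question prefix, copied from the slide.
AVERAGE_AGE = {
    "what brand": 12.5, "why": 11.18, "what type": 11.04, "what kind": 10.55,
    "is this": 10.13, "what does": 10.06, "what time": 9.81, "who": 9.58,
    "where": 9.54, "which": 9.32, "does": 9.29, "do": 9.23, "what is": 9.11,
    "what are": 9.04, "are": 8.65, "is the": 8.52, "is there": 8.24,
    "what sport": 8.06, "how many": 7.67, "what animal": 6.74, "what color": 6.6,
}


def average_age_for(question):
    """Return the average age for the longest matching prefix, or None."""
    q = question.strip().lower()
    # Require a word boundary after the prefix so "do" does not match "does".
    matches = [p for p in AVERAGE_AGE if q == p or q.startswith(p + " ")]
    if not matches:
        return None
    return AVERAGE_AGE[max(matches, key=len)]


print(average_age_for("What color is the shoe?"))       # 6.6
print(average_age_for("How many zebras are there?"))    # 7.67
print(average_age_for("Is this a vegetarian meal?"))    # 10.13
print(average_age_for("Can you name the performer?"))   # None
```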

Page 43:

VQA Age

• Average “age of questions” = 8.98 years.

• Our model* = 4.74 years old!

* age as estimated by untrained crowd-sourced workers

Page 44:

VQA Common sense

• Average common sense required = 31%.

• Our best algorithm has* 17% common sense!

* as estimated by untrained crowd-sourced workers

Page 45:

VQA Challenges on www.codalab.org

Page 46:

VQA Challenge @ CVPR16

Page 47:

VQA Challenge @ CVPR16

Code available!

Page 48:

VQA Workshop @ CVPR16

Page 49:

Papers using VQA

… and many more

Page 50:

Dataset: >1k downloads

Code: >1.5k views

Academia, industry, start ups

Page 51:

Conclusions

• VQA: Visual Question Answering

– The next “grand challenge” in vision, language, AI

• Spectrum: Easy to Difficult

– “What room is this?”: Scene Recognition

– “How many …”: Object Recognition

– …

– “Does this person have 20/20 vision?”: Common sense

• Exciting times ahead!

Page 52:

VQA Team

Aishwarya Agrawal, Virginia Tech

Meg Mitchell, Microsoft Research

Dhruv Batra, Virginia Tech

Larry Zitnick, Facebook AI Research

Jiasen Lu, Virginia Tech

Devi Parikh, Virginia Tech

Stanislaw Antol, Virginia Tech

Akrit Mohapatra, Virginia Tech (Webmaster)

Page 53:

Closing Remarks

• CloudCV VQA Exhibition: Booth 101

• Contact email: [email protected]

• Please complete the Presenter Evaluation sent to you by email or through the GTC Mobile App. Your feedback is important!

Page 54:

Thanks!

Questions?

Page 55:

Visual Question Answering (VQA)