cs 2770: computer vision · 2019-11-19 · segmentation object detection instance segmentation...

CS 1674: Intro to Computer Vision Object Recognition Prof. Adriana Kovashka University of Pittsburgh November 19, 2019


Page 1: CS 2770: Computer Vision · 2019-11-19 · Segmentation Object Detection Instance Segmentation GRASS, CAT, TREE, SKY CAT DOG, DOG, CAT DOG, DOG, CAT No objects, justpixels SingleObject

CS 1674: Intro to Computer Vision

Object Recognition

Prof. Adriana Kovashka

University of Pittsburgh

November 19, 2019

Different Flavors of Object Recognition

• Semantic Segmentation: GRASS, CAT, TREE, SKY (no objects, just pixels)

• Classification + Localization: CAT (single object)

• Object Detection: DOG, DOG, CAT (multiple objects)

• Instance Segmentation: DOG, DOG, CAT (multiple objects)

Fei-Fei Li & Justin Johnson & Serena Yeung, Lecture 11 - 2, May 10, 2017

Adapted from Justin Johnson

Plan for the next two lectures

• Detection approaches

– Pre-CNNs

• Detection with whole windows: Pedestrian detection

• Part-based detection: Deformable Part Models

– Post-CNNs

• Detection with region proposals: R-CNN, Fast R-CNN, Faster R-CNN

• Detection without region proposals: YOLO, SSD

• Segmentation approaches

– Semantic segmentation: FCN

– Instance segmentation: Mask R-CNN

Object Detection

Slide by: Justin Johnson


Object detection: basic framework

• Build/train object model

• Generate candidate regions in new image

• Score the candidates

Adapted from Kristen Grauman

Window-template-based models

Building an object model

Given the representation, train a binary classifier

Car/non-car classifier: “Yes, car.” / “No, not a car.”

Kristen Grauman

Window-template-based models

Generating and scoring candidates

Car/non-car classifier

Kristen Grauman

Window-template-based object detection: recap

Training:

1. Obtain training data

2. Define features

3. Define classifier

Given new image:

1. Slide window

2. Score by classifier

(Diagram labels: training examples, feature extraction, car/non-car classifier)

Kristen Grauman
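The test-time half of the recap (slide a window, score it with the classifier) can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's code: `score_window` is a hypothetical stand-in for feature extraction plus the trained classifier, `mean_brightness` is a toy scorer invented for the example, and a single window size is used (a real detector would also search over scales).

```python
def sliding_windows(img_w, img_h, win_w, win_h, stride):
    """Yield (x, y, w, h) boxes of a fixed-size window slid over the image."""
    for y in range(0, img_h - win_h + 1, stride):
        for x in range(0, img_w - win_w + 1, stride):
            yield (x, y, win_w, win_h)


def detect(image, score_window, win_w, win_h, stride, threshold):
    """Score every window and keep those whose score exceeds the threshold.

    `image` is a list of rows of pixel values; `score_window` is a
    hypothetical callable standing in for features + trained classifier.
    """
    img_h, img_w = len(image), len(image[0])
    detections = []
    for (x, y, w, h) in sliding_windows(img_w, img_h, win_w, win_h, stride):
        crop = [row[x:x + w] for row in image[y:y + h]]
        score = score_window(crop)
        if score > threshold:
            detections.append(((x, y, w, h), score))
    return detections


def mean_brightness(crop):
    """Toy scorer (assumption for the example): average pixel value."""
    total = sum(sum(row) for row in crop)
    return total / (len(crop) * len(crop[0]))


# 4x4 toy image with a bright 2x2 patch in the lower-right corner.
toy_image = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
toy_detections = detect(toy_image, mean_brightness, 2, 2, 1, 5.0)
print(toy_detections)  # only the window at (2, 2) passes the threshold
```

Neighbouring windows that partly cover the bright patch score lower and are rejected here; with a looser threshold they would all fire, which is why overlapping detections usually need to be pruned afterwards.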


Evaluating detection methods

• True Positive - TP(c): a predicted bounding box (pred_bb) was made for class c, there is a ground truth bounding box (gt_bb) of class c, and IoU(pred_bb, gt_bb) >= 0.5.

• False Positive - FP(c): a pred_bb was made for class c, and there is no gt_bb of class c. Or there is a gt_bb of class c, but IoU(pred_bb, gt_bb) < 0.5.
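The two definitions above rest on intersection over union (IoU). The sketch below assumes boxes are given as corner coordinates `(x1, y1, x2, y2)`, which is a convention chosen for the example, and `label_prediction` is a hypothetical helper name. It only applies the 0.5 overlap rule for one class; it does not model the additional bookkeeping used in full benchmark evaluation (for instance, matching each ground-truth box to at most one prediction).

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_prediction(pred_bb, gt_bbs, threshold=0.5):
    """Label one predicted box of class c against the gt boxes of class c.

    Returns "TP" if some gt_bb has IoU(pred_bb, gt_bb) >= threshold,
    otherwise "FP" (this also covers the case of no gt_bb at all).
    """
    best = max((iou(pred_bb, gt) for gt in gt_bbs), default=0.0)
    return "TP" if best >= threshold else "FP"


print(label_prediction((0, 0, 10, 10), [(0, 0, 10, 8)]))   # IoU 0.8
print(label_prediction((0, 0, 10, 10), [(5, 0, 15, 10)]))  # IoU 1/3
print(label_prediction((0, 0, 10, 10), []))                # no gt_bb
```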


Dalal-Triggs pedestrian detector

1. Extract fixed-sized (64x128 pixel) window at multiple positions and scales

2. Compute HOG (histogram of oriented gradients) features within each window

3. Score the window with a linear SVM classifier

4. Perform non-maxima suppression to remove overlapping detections with lower scores

Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05

Histograms of oriented gradients (HOG)

• Divide image into 8x8 regions

• Histograms in 8x8 pixel cells

• Orientation: 9 bins (for unsigned angles)

• Votes weighted by magnitude

Adapted from Pete Barnum

Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05
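To make the per-cell histogram concrete, here is a rough pure-Python sketch of the voting step for a single 8x8 cell: 9 unsigned orientation bins over 0 to 180 degrees, each pixel voting with its gradient magnitude. It is a simplification under stated assumptions: gradients are central differences on interior pixels only, each pixel votes into one bin, and the bin interpolation and block normalization used by the full descriptor are left out. The function name `cell_histogram` is made up for the example.

```python
import math


def cell_histogram(cell, n_bins=9):
    """Orientation histogram of one cell (list of rows of gray values).

    Each interior pixel votes for one unsigned-orientation bin
    (0-180 degrees), weighted by its gradient magnitude.
    """
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for y in range(1, len(cell) - 1):
        for x in range(1, len(cell[0]) - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]  # horizontal difference
            gy = cell[y + 1][x] - cell[y - 1][x]  # vertical difference
            magnitude = math.hypot(gx, gy)
            # Fold signed angles into [0, 180): opposite directions share a bin.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % n_bins] += magnitude
    return hist


# 8x8 cell containing a vertical edge: dark left half, bright right half.
edge_cell = [[0, 0, 0, 0, 10, 10, 10, 10] for _ in range(8)]
edge_hist = cell_histogram(edge_cell)
print(edge_hist)  # all of the gradient energy lands in the first bin
```

For the vertical edge, every non-zero gradient points horizontally, so only the bin covering 0 to 20 degrees receives votes.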

Train SVM for pedestrian detection using HoG

[Figure labels: pos w, pedestrian]

Adapted from Pete Barnum

Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05


Remove overlapping detections

Non-max suppression

Score = 0.1

Score = 0.8 Score = 0.8

Adapted from Derek Hoiem
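A common way to implement non-max suppression is the greedy procedure sketched below: visit detections from highest to lowest score and keep a box only if it does not overlap an already-kept box too much. The IoU threshold of 0.5, the `(x1, y1, x2, y2)` box format, and the example boxes are assumptions for illustration; the scores 0.8, 0.1, 0.8 echo the slide.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep box i only if it does not heavily overlap a better box.
        if all(box_iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.8, 0.1, 0.8]
kept = non_max_suppression(boxes, scores)
print(kept)  # the 0.1 box overlaps the first 0.8 box and is removed
```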

Are window templates enough?

• Many objects are articulated, or have parts that can vary in configuration

• Many object categories look very different from different viewpoints, or from instance to instance

Adapted from N. Snavely, D. Tran

Images from Caltech-256, D. Ramanan


Parts-based Models

Define object by collection of parts modeled by

1. Appearance

2. Spatial configuration

Slide credit: Rob Fergus


How to model spatial relations?

• One extreme: fixed template

Derek Hoiem

Fixed part-based template

• Object model = sum of scores of features at fixed positions

+3 +2 -2 -1 -2.5 = -0.5 > 7.5? Non-object

+4 +1 +0.5 +3 +0.5 = 9 > 7.5? Object

Derek Hoiem
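The fixed-template decision rule is just a sum followed by a threshold, as this small sketch shows. The function name is hypothetical; the feature scores and the 7.5 threshold are the ones on the slide.

```python
def classify_fixed_template(part_scores, threshold=7.5):
    """Sum the feature scores at the fixed positions and compare to a threshold."""
    total = sum(part_scores)
    label = "Object" if total > threshold else "Non-object"
    return total, label


print(classify_fixed_template([3, 2, -2, -1, -2.5]))  # total -0.5
print(classify_fixed_template([4, 1, 0.5, 3, 0.5]))   # total 9.0
```

Because each feature is scored at a fixed position, a part that shifts by a few pixels contributes a different score, which is the rigidity the following slides relax.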

How to model spatial relations?

• Another extreme: bag of words

Derek Hoiem

How to model spatial relations?

• Star-shaped model

Derek Hoiem


Parts-based Models

• Articulated parts model

– Object is configuration of parts

– Each part is detectable and can move around

Adapted from Derek Hoiem, images from Felzenszwalb

Deformable Part Models

• Root filter

• Part filters

• Deformation weights

P. Felzenszwalb, R. Girshick, D. McAllester, D. Ramanan, Object Detection with Discriminatively Trained Part Based Models, PAMI 32(9), 2010

Lana Lazebnik

Scoring an object hypothesis

• The score of a hypothesis is the sum of appearance scores minus the sum of deformation costs

• Appearance weights, applied to the part features

• Deformation weights, i.e. how much we’ll penalize the part pi for moving from its expected location

• Displacements, i.e. how much the part pi moved from its expected anchor location in the x, y directions

• Part loc: where we see the part; anchor loc: where we expect to see the part

Felzenszwalb et al.
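The scoring rule (appearance scores minus deformation costs) can be sketched as below. The slide's formula did not survive extraction, so the exact cost is an assumption here: each part's cost is taken as a weighted sum of (dx, dy, dx², dy²), which is how I recall the Felzenszwalb et al. formulation. The appearance scores are passed in precomputed rather than derived from filters and features, and all names and numbers are illustrative.

```python
def hypothesis_score(appearance_scores, displacements, deformation_weights):
    """Sum of appearance scores minus sum of deformation costs.

    appearance_scores: filter responses, root first, then one per part
    displacements: (dx, dy) of each part from its anchor location
    deformation_weights: (w_dx, w_dy, w_dx2, w_dy2) for each part
    """
    score = sum(appearance_scores)
    for (dx, dy), (w_dx, w_dy, w_dx2, w_dy2) in zip(displacements,
                                                    deformation_weights):
        # Assumed cost: linear and quadratic penalty on the displacement.
        score -= w_dx * dx + w_dy * dy + w_dx2 * dx * dx + w_dy2 * dy * dy
    return score


# Root response 2.0 and two parts; the second part sits 2 px right, 1 px up
# of where it is expected, so it pays a deformation cost.
example_score = hypothesis_score(
    appearance_scores=[2.0, 1.0, 1.5],
    displacements=[(0, 0), (2, -1)],
    deformation_weights=[(0.0, 0.0, 0.25, 0.25), (0.0, 0.0, 0.25, 0.25)],
)
print(example_score)  # 4.5 appearance minus 1.25 deformation
```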


Detection

Felzenszwalb et al.


Car model

Component 1

Component 2

Lana Lazebnik


Car detections

Lana Lazebnik


Person model

Lana Lazebnik


Person detections

Lana Lazebnik


Cat model

Lana Lazebnik


Cat detections

Lana Lazebnik


“Sliding window” detector

Plan for the next two lectures

• Detection approaches

– Pre-CNNs

• Detection with whole windows: Pedestrian detection

• Part-based detection: Deformable Part Models

– Post-CNNs

• Detection with region proposals: R-CNN, Fast R-CNN, Faster R-CNN

• Detection without region proposals: YOLO, SSD

• Segmentation approaches

– Semantic segmentation: FCN

– Instance segmentation: Mask R-CNN

Complexity and the plateau

PASCAL VOC challenge dataset: top competition results (2007 - 2012), mAP (%)

• VOC’07: DPM, 17%

• VOC’08: DPM, HOG+BOW, 23%

• VOC’09: DPM, MKL, 28%

• VOC’10: DPM++, 37%

• VOC’11: DPM++, MKL, Selective Search, 41%

• VOC’12: Selective Search, DPM++, MKL, 41%

plateau & increasing complexity

[Source: http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc20{07,08,09,10,11,12}/results/index.html]

Girshick et al., “Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation”, CVPR 2014

Impact of Deep Learning

Slide by: Justin Johnson

Before: Image Classification with CNNs

• Vector: 4096

• Fully-Connected: 4096 to 1000

• Class Scores: Cat: 0.9, Dog: 0.05, Car: 0.01, ...

Slide by: Justin Johnson

Classification + Localization

Slide by: Justin Johnson

Classification + Localization

• Vector: 4096

• Fully Connected: 4096 to 1000, giving class scores (Cat: 0.9, Dog: 0.05, Car: 0.01, ...); softmax loss against the correct label (Cat)

• Fully Connected: 4096 to 4, giving box coordinates (x, y, w, h); L2 loss against the correct box (x’, y’, w’, h’)

• Treat localization as a regression problem!

• Multitask loss: softmax loss + L2 loss

• Often pretrained on ImageNet (transfer learning)

Slide by: Justin Johnson
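The combined objective for classification + localization can be sketched as a softmax (cross-entropy) loss on the class scores plus an L2 loss on the box. This is an illustrative sketch only: the `box_weight` factor is an assumption (the slide simply adds the two losses), and the scores and boxes are made-up numbers.

```python
import math


def softmax_loss(class_scores, correct_index):
    """Cross-entropy of softmax(class_scores) against the correct label."""
    shift = max(class_scores)  # subtract the max for numerical stability
    log_sum = math.log(sum(math.exp(s - shift) for s in class_scores))
    return -(class_scores[correct_index] - shift - log_sum)


def l2_loss(pred_box, true_box):
    """Squared L2 distance between predicted and correct (x, y, w, h)."""
    return sum((p - t) ** 2 for p, t in zip(pred_box, true_box))


def multitask_loss(class_scores, correct_index, pred_box, true_box,
                   box_weight=1.0):
    """Softmax loss plus (optionally weighted) L2 box loss."""
    return (softmax_loss(class_scores, correct_index)
            + box_weight * l2_loss(pred_box, true_box))


loss = multitask_loss(
    class_scores=[2.0, 0.5, 0.1],   # e.g. raw scores for cat, dog, car
    correct_index=0,                # correct label: cat
    pred_box=(10, 10, 50, 40),      # predicted (x, y, w, h)
    true_box=(12, 10, 50, 44),      # correct (x', y', w', h')
)
print(loss)
```

In practice the two terms live on different scales, which is why a weighting factor between them is typically treated as a hyperparameter.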

Object Detection as Regression?

• CAT: (x, y, w, h): 4 numbers

• DOG: (x, y, w, h), DOG: (x, y, w, h), CAT: (x, y, w, h): 12 numbers

• DUCK: (x, y, w, h), DUCK: (x, y, w, h), ...: many numbers!

Each image needs a different number of outputs!

Slide by: Justin Johnson

Object Detection as Classification: Sliding Window

Apply a CNN to many different crops of the image, CNN classifies each crop as object or background

Example crop: Dog? NO, Cat? NO, Background? YES

Slide by: Justin Johnson

Object Detection as Classification: Sliding Window

Apply a CNN to many different crops of the image, CNN classifies each crop as object or background

Example crop: Dog? YES, Cat? NO, Background? NO

Example crop: Dog? NO, Cat? YES, Background? NO

Problem: Need to apply CNN to huge number of locations and scales, very computationally expensive!

Slide by: Justin Johnson
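A back-of-the-envelope count shows why exhaustive sliding windows are costly when each crop needs a CNN forward pass. The image size, window size, stride, and scales below are arbitrary assumptions picked for illustration, and only one square window shape is counted (varying the aspect ratio would multiply the total further).

```python
def count_windows(img_w, img_h, win, stride, scales):
    """Count square win x win windows over an image resized to each scale."""
    total = 0
    for s in scales:
        w, h = int(img_w * s), int(img_h * s)
        if w >= win and h >= win:
            per_row = (w - win) // stride + 1
            per_col = (h - win) // stride + 1
            total += per_row * per_col
    return total


n_windows = count_windows(640, 480, win=64, stride=4, scales=(1.0, 0.75, 0.5))
print(n_windows)  # 26025 crops for a single window shape
```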

Region Proposals

● Find “blobby” image regions that are likely to contain objects

● Relatively fast to run; e.g. Selective Search gives 1000 region proposals in a few seconds on CPU

Alexe et al, “Measuring the objectness of image windows”, TPAMI 2012

Uijlings et al, “Selective Search for Object Recognition”, IJCV 2013

Cheng et al, “BING: Binarized normed gradients for objectness estimation at 300fps”, CVPR 2014

Zitnick and Dollar, “Edge boxes: Locating object proposals from edges”, ECCV 2014

Slide by: Justin Johnson

Speeding up detection: Restrict set of windows we pass through SVM to those with high “objectness”

• Objectness cue #1: Where people look

• Objectness cue #2: color contrast at boundary

• Objectness cue #3: no segments “straddling” the object box

Alexe et al., CVPR 2010

R-CNN

Girshick et al., “Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation”, CVPR 2014


R-CNN: Regions with CNN features

• Input image

• Extract region proposals (~2k / image)

• Compute CNN features

• Classify regions (linear SVM): aeroplane? no. ... person? yes. ... tvmonitor? no.

Girshick et al., “Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation”, CVPR 2014

R-CNN at test time: Step 1

Extract region proposals (~2k / image)

Proposal-method agnostic, many choices

- Selective Search [van de Sande, Uijlings et al.] (Used in this work)

- Objectness [Alexe et al.]

- Category independent object proposals [Endres & Hoiem]

- CPMC [Carreira & Sminchisescu]

Girshick et al., “Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation”, CVPR 2014

R-CNN at test time: Step 2

Compute CNN features

• Dilate proposal

a. Crop

b. Scale (anisotropic) to 227 x 227

c. Forward propagate

Output: “fc7” features

Girshick et al., “Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation”, CVPR 2014
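The dilate, crop, and anisotropic scale steps can be sketched without any imaging library. The version below is a simplification under stated assumptions: the image is a list of rows, the box is `(x1, y1, x2, y2)` with exclusive right and bottom edges, dilation is a fixed hypothetical `pad` in source pixels clipped to the image, and resampling is nearest-neighbor. Width and height are scaled independently, so the aspect ratio is not preserved. The forward pass that produces the fc7 features is not shown.

```python
def crop_and_warp(image, box, out_size=227, pad=0):
    """Crop `box`, dilated by `pad` pixels, and warp it to out_size x out_size.

    Nearest-neighbor sampling; x and y are scaled independently
    (anisotropic), so the crop is stretched to a square.
    """
    img_h, img_w = len(image), len(image[0])
    x1 = max(0, box[0] - pad)
    y1 = max(0, box[1] - pad)
    x2 = min(img_w, box[2] + pad)
    y2 = min(img_h, box[3] + pad)
    crop_w, crop_h = x2 - x1, y2 - y1
    warped = []
    for out_y in range(out_size):
        src_y = y1 + min(crop_h - 1, out_y * crop_h // out_size)
        row = [image[src_y][x1 + min(crop_w - 1, out_x * crop_w // out_size)]
               for out_x in range(out_size)]
        warped.append(row)
    return warped


# Toy 4x6 image whose pixel value encodes its position (10 * y + x).
toy = [[10 * y + x for x in range(6)] for y in range(4)]
# A 4-wide, 2-tall proposal warped to 4x4: rows are duplicated, columns kept.
small = crop_and_warp(toy, (1, 1, 5, 3), out_size=4)
full = crop_and_warp(toy, (1, 1, 5, 3))  # default 227 x 227 network input
```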

R-CNN at test time: Step 3

Classify regions

• Input: 4096-dimensional fc7 feature vector of the proposal

• Linear classifiers (SVM or softmax)

• Example scores: person? 1.6, horse? -0.3, ...

Girshick et al., “Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation”, CVPR 2014
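Scoring a proposal with per-class linear classifiers amounts to one dot product per class. The sketch below uses a 3-dimensional feature vector instead of the 4096-dimensional fc7 vector, and the weights and biases are made-up values chosen so the scores match the slide's example (person 1.6, horse -0.3).

```python
def score_region(feature, weights, biases):
    """Return {class_name: w . feature + b} for one proposal's feature vector."""
    return {
        name: sum(w_i * f_i for w_i, f_i in zip(w, feature)) + biases[name]
        for name, w in weights.items()
    }


feature = [1.0, 2.0, 0.0]  # stand-in for a 4096-d fc7 vector
weights = {"person": [1.0, 0.5, 0.0], "horse": [-0.5, 0.0, 1.0]}
biases = {"person": -0.4, "horse": 0.2}
region_scores = score_region(feature, weights, biases)
best_class = max(region_scores, key=region_scores.get)
print(region_scores, best_class)
```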

Step 4: Object proposal refinement

• Bounding-box regression: linear regression on CNN features

• Maps the original proposal to the predicted object bounding box

Girshick et al., “Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation”, CVPR 2014
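One way to picture the refinement step: the regressor outputs four offsets that shift the proposal's center and rescale its width and height. The parameterization below (center shifts proportional to the proposal size, log-space scaling of width and height) is the one I recall from the R-CNN paper and should be read as an assumption, since the slide only names "linear regression on CNN features". The regressor itself is not shown, and the offsets are example values.

```python
import math


def apply_box_deltas(proposal, deltas):
    """Refine a proposal (cx, cy, w, h) with predicted offsets (dx, dy, dw, dh)."""
    px, py, pw, ph = proposal
    dx, dy, dw, dh = deltas
    return (
        px + pw * dx,        # shift center x by a fraction of the width
        py + ph * dy,        # shift center y by a fraction of the height
        pw * math.exp(dw),   # scale width in log space
        ph * math.exp(dh),   # scale height in log space
    )


original = (50.0, 50.0, 20.0, 40.0)
refined = apply_box_deltas(original, (0.1, 0.0, math.log(2.0), 0.0))
print(refined)  # center moves right by 2 px, width roughly doubles
```

Zero offsets leave the proposal unchanged, so the regressor only has to learn corrections relative to the proposal.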

R-CNN on ImageNet detection

ILSVRC2013 detection test set mAP, mean average precision (mAP) in % (* = post competition result, otherwise competition result)

• UIUC-IFP: 1.0%

• Delta: 6.1%

• GPU_UCLA: 9.8%

• SYSU_Vision: 10.5%

• Toronto A: 11.5%

• *OverFeat (1): 19.4%

• *NEC-MU: 20.9%

• UvA-Euvision: 22.6%

• *OverFeat (2): 24.3%

• *R-CNN BB: 31.4%

Girshick et al., “Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation”, CVPR 2014

R-CNN

• Post hoc component

Girshick et al., “Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation”, CVPR 2014

What’s wrong with slow R-CNN?

• Ad hoc training objectives

– Fine-tune network with softmax classifier (log loss)

– Train post-hoc linear SVMs (hinge loss)

– Train post-hoc bounding-box regressions (least squares)

• Training is slow (84h), takes a lot of disk space

• Inference (detection) is slow

– 47s / image with VGG16 [Simonyan & Zisserman, ICLR15]

– ~2000 ConvNet forward passes per image

Girshick, “Fast R-CNN”, ICCV 2015

Fast R-CNN

• One network, applied one time, not 2000 times

• Trained end-to-end (in one stage)

• Fast test time

• Higher mean average precision

Girshick, “Fast R-CNN”, ICCV 2015

Fast R-CNN

Girshick, “Fast R-CNN”, ICCV 2015

Page 72: CS 2770: Computer Vision · 2019-11-19 · Segmentation Object Detection Instance Segmentation GRASS, CAT, TREE, SKY CAT DOG, DOG, CAT DOG, DOG, CAT No objects, justpixels SingleObject

Fei-Fei Li & Justin Johnson &

SerenaYeungLecture 11 -

May 10, 2017

Fast R-CNN

Girshick, “Fast R-CNN”, ICCV 2015

Page 73: CS 2770: Computer Vision · 2019-11-19 · Segmentation Object Detection Instance Segmentation GRASS, CAT, TREE, SKY CAT DOG, DOG, CAT DOG, DOG, CAT No objects, justpixels SingleObject

Fei-Fei Li & Justin Johnson &

SerenaYeungLecture 11 -

May 10, 2017

Fast R-CNN

Girshick, “Fast R-CNN”, ICCV 2015

Page 74: CS 2770: Computer Vision · 2019-11-19 · Segmentation Object Detection Instance Segmentation GRASS, CAT, TREE, SKY CAT DOG, DOG, CAT DOG, DOG, CAT No objects, justpixels SingleObject

Fei-Fei Li & Justin Johnson &

SerenaYeungLecture 11 -

May 10, 2017

Fast R-CNN

Girshick, “Fast R-CNN”, ICCV 2015

Page 75

Fast R-CNN

Girshick, “Fast R-CNN”, ICCV 2015

Page 76

Fast R-CNN

Girshick, “Fast R-CNN”, ICCV 2015

Page 77

Fast R-CNN (Training)

Girshick, “Fast R-CNN”, ICCV 2015

Page 78

Fast R-CNN (Training)

Girshick, “Fast R-CNN”, ICCV 2015

Page 79

Fast R-CNN vs R-CNN

                    Fast R-CNN   R-CNN
Train time (h)      9.5          84
Train speedup       8.8x         1x
Test time / image   0.32s        47.0s
Test speedup        146x         1x
mAP                 66.9%        66.0%

Timings exclude object proposal time, which is equal for all methods. All methods use VGG16 from Simonyan and Zisserman.

Girshick, “Fast R-CNN”, ICCV 2015

Page 80

Faster R-CNN

Make CNN do proposals!

Insert Region Proposal Network (RPN) to predict proposals from features

Jointly train with 4 losses:

1. RPN classify object / not object

2. RPN regress box coordinates

3. Final classification score (object classes)

4. Final box coordinates

Ren et al, “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks”, NIPS 2015

Page 81

Accurate object detection is slow!

Method         Pascal 2007 mAP   Speed
DPM v5         33.7              .07 FPS   14 s/img
R-CNN          66.0              .05 FPS   20 s/img

⅓ Mile, 1760 feet

Redmon et al., “You Only Look Once: Unified, Real-Time Object Detection”, CVPR 2016

Page 82

Accurate object detection is slow!

Method         Pascal 2007 mAP   Speed
DPM v5         33.7              .07 FPS   14 s/img
R-CNN          66.0              .05 FPS   20 s/img
Fast R-CNN     70.0              .5 FPS    2 s/img
Faster R-CNN   73.2              7 FPS     140 ms/img
YOLO           69.0              45 FPS    22 ms/img

2 feet

Redmon et al., “You Only Look Once: Unified, Real-Time Object Detection”, CVPR 2016
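The distances quoted on these two slides (1760 feet for R-CNN, 2 feet for YOLO) appear to be how far a vehicle moves while one image is processed. A minimal sketch of that arithmetic follows. The speed of 60 mph is an assumption: the slides do not state it, but 1760 feet in 20 s works out to 88 ft/s, which is 60 mph. The name `feet_travelled` is illustrative only.

```python
# Assumption: vehicle speed of 60 mph, inferred from "1760 feet" at 20 s/img.
FEET_PER_MILE = 5280
SPEED_FT_PER_S = 60 * FEET_PER_MILE / 3600  # 60 mph expressed in feet per second

def feet_travelled(seconds_per_image, speed_ft_per_s=SPEED_FT_PER_S):
    """Distance covered while the detector processes a single image."""
    return seconds_per_image * speed_ft_per_s

# Per-image latencies taken from the table on the slide.
latency_s = {
    "DPM v5": 14.0,
    "R-CNN": 20.0,
    "Fast R-CNN": 2.0,
    "Faster R-CNN": 0.140,
    "YOLO": 0.022,
}

for method, seconds in latency_s.items():
    print(f"{method}: {feet_travelled(seconds):.1f} feet per image")
```

Under this assumption R-CNN gives 1760 feet (one third of a mile) and YOLO gives just under 2 feet, matching the figures on the slides.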

Page 83

Detection without Proposals: YOLO

Redmon et al., “You Only Look Once: Unified, Real-Time Object Detection”, CVPR 2016

Page 84

Each cell predicts boxes and confidences: P(Object)

Redmon et al., “You Only Look Once: Unified, Real-Time Object Detection”, CVPR 2016

Page 85

Each cell also predicts a probability P(Class | Object)

Figure labels: Dog, Bicycle, Car, Dining Table

Redmon et al., “You Only Look Once: Unified, Real-Time Object Detection”, CVPR 2016

Page 86

Combine the box and class predictions

Redmon et al., “You Only Look Once: Unified, Real-Time Object Detection”, CVPR 2016

Page 87

Finally do NMS and threshold detections

Redmon et al., “You Only Look Once: Unified, Real-Time Object Detection”, CVPR 2016
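The last two steps (combine box and class predictions, then NMS and thresholding) can be sketched roughly as below. This is a simplified, class-agnostic greedy NMS in NumPy, not the YOLO authors' code. The box format (x1, y1, x2, y2), the threshold values, and the toy numbers are all assumptions chosen for illustration.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, iou_thresh=0.5, score_thresh=0.2):
    """Greedy NMS: drop low scores, then keep the best box and suppress overlaps."""
    order = [int(i) for i in np.argsort(-scores) if scores[i] >= score_thresh]
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        if not order:
            break
        rest = np.array(order)
        overlaps = iou(boxes[best], boxes[rest])
        order = [int(i) for i, o in zip(rest, overlaps) if o <= iou_thresh]
    return keep

# Toy predictions (hypothetical values).
boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30], [0, 0, 5, 5]], dtype=float)
p_object = np.array([0.9, 0.8, 0.7, 0.3])   # box confidence, P(Object)
p_class = np.array([0.9, 0.9, 0.8, 0.5])    # P(Class | Object) for the top class
scores = p_object * p_class                 # combined score per box

kept = nms(boxes, scores)
print(kept)
```

Here box 1 is suppressed because it overlaps box 0 heavily, and box 3 falls below the score threshold. A per-class variant would run the same loop once for each class.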

Page 88

This parameterization fixes the output size

Each cell predicts:

- For each bounding box:
  - 4 coordinates (x, y, w, h)
  - 1 confidence value
- Some number of class probabilities

For Pascal VOC:

- 7x7 grid
- 2 bounding boxes / cell
- 20 classes

7 x 7 x (2 x 5 + 20) = 7 x 7 x 30 tensor = 1470 outputs

Redmon et al., “You Only Look Once: Unified, Real-Time Object Detection”, CVPR 2016
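The output-size arithmetic on this slide can be checked with a few lines of Python. This is only a sketch; the function name `yolo_output_shape` and its parameter names are made up for illustration, and the defaults are the Pascal VOC settings listed above.

```python
def yolo_output_shape(grid=7, boxes_per_cell=2, num_classes=20):
    """Shape of the prediction tensor for an S x S grid with B boxes per cell."""
    values_per_box = 4 + 1  # (x, y, w, h) plus one confidence value
    depth = boxes_per_cell * values_per_box + num_classes
    return (grid, grid, depth)

shape = yolo_output_shape()
total_outputs = shape[0] * shape[1] * shape[2]
print(shape, total_outputs)
```

Changing the grid size, the number of boxes per cell, or the number of classes changes the tensor depth or extent, but the output size stays fixed for any given configuration.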

Page 89

YOLO works across many natural images

Redmon et al., “You Only Look Once: Unified, Real-Time Object Detection”, CVPR 2016

Page 90

It also generalizes well to new domains

Redmon et al., “You Only Look Once: Unified, Real-Time Object Detection”, CVPR 2016

Page 91

Redmon and Farhadi, “YOLO9000: Better, Faster, Stronger”, CVPR 2017

Page 92

Redmon and Farhadi, “YOLO9000: Better, Faster, Stronger”, CVPR 2017

Page 93

Redmon and Farhadi, “YOLO9000: Better, Faster, Stronger”, CVPR 2017

Page 94

Redmon and Farhadi, “YOLO9000: Better, Faster, Stronger”, CVPR 2017

Page 95

Redmon and Farhadi, “YOLO9000: Better, Faster, Stronger”, CVPR 2017

Page 96

Redmon and Farhadi, “YOLO9000: Better, Faster, Stronger”, CVPR 2017

Page 97

Plan for the next two lectures

• Detection approaches

– Pre-CNNs

• Detection with whole windows: Pedestrian detection

• Part-based detection: Deformable Part Models

– Post-CNNs

• Detection with region proposals: R-CNN, Fast R-CNN, Faster R-CNN

• Detection without region proposals: YOLO, SSD

• Segmentation approaches

– Semantic segmentation: FCN

– Instance segmentation: Mask R-CNN

Page 98

Semantic Segmentation

Figure labels: Semantic Segmentation: GRASS, CAT, TREE, SKY (no objects, just pixels); Classification + Localization: CAT (single object); Object Detection and Instance Segmentation: DOG, DOG, CAT (multiple objects)

Slide by: Justin Johnson

Page 99

Semantic Segmentation

Label each pixel in the image with a category label

Don’t differentiate instances, only care about pixels

Figure labels: Sky, Cow, Grass; Grass, Cat, Sky

Slide by: Justin Johnson

Page 100

Semantic Segmentation Idea: Sliding Window

Full image

Extract patch

Classify center pixel with CNN

Figure labels: Cow, Cow, Grass

Farabet et al, “Learning Hierarchical Features for Scene Labeling,” TPAMI 2013

Pinheiro and Collobert, “Recurrent Convolutional Neural Networks for Scene Labeling”, ICML 2014

Slide by: Justin Johnson

Page 101

Semantic Segmentation Idea: Sliding Window

Full image

Extract patch

Classify center pixel with CNN

Figure labels: Cow, Cow, Grass

Problem: Very inefficient! Not reusing shared features between overlapping patches

Slide by: Justin Johnson

Farabet et al, “Learning Hierarchical Features for Scene Labeling,” TPAMI 2013

Pinheiro and Collobert, “Recurrent Convolutional Neural Networks for Scene Labeling”, ICML 2014

Page 102

Semantic Segmentation Idea: Fully Convolutional

Design a network as a bunch of convolutional layers to make predictions for pixels all at once!

Input: 3 x H x W

Convolutions (Conv, Conv, Conv, Conv): D x H x W

Scores: C x H x W

argmax

Predictions: H x W

Slide by: Justin Johnson
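The last two stages of this pipeline (features to per-class scores, then argmax) can be sketched in NumPy. This is a toy illustration with random values, not a trained network. The sizes `D`, `C`, `H`, `W` and the use of a 1 x 1 convolution for the scoring layer are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
D, C, H, W = 8, 3, 4, 5                 # feature depth, classes, height, width (toy sizes)

features = rng.normal(size=(D, H, W))   # stand-in for the output of the conv layers
weights = rng.normal(size=(C, D))       # a 1 x 1 convolution is a per-pixel linear map

# Scores: C x H x W, one score per class at every pixel.
scores = np.einsum("cd,dhw->chw", weights, features)

# Predictions: H x W, the highest-scoring class index at every pixel.
predictions = scores.argmax(axis=0)

print(scores.shape, predictions.shape)
```

Every pixel is classified in a single pass, which is the point of the fully convolutional design compared with classifying one patch at a time.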

Page 103

Semantic Segmentation Idea: Fully Convolutional

Design a network as a bunch of convolutional layers to make predictions for pixels all at once!

Input: 3 x H x W

Convolutions (Conv, Conv, Conv, Conv): D x H x W

Scores: C x H x W

argmax

Predictions: H x W

Problem: convolutions at original image resolution will be very expensive ...

Slide by: Justin Johnson

Page 104

Semantic Segmentation Idea: Fully Convolutional

Design network as a bunch of convolutional layers, with downsampling and upsampling inside the network!

Input: 3 x H x W

High-res: D1 x H/2 x W/2

Med-res: D2 x H/4 x W/4

Low-res: D3 x H/4 x W/4

Med-res: D2 x H/4 x W/4

High-res: D1 x H/2 x W/2

Predictions: H x W

Long, Shelhamer, and Darrell, “Fully Convolutional Networks for Semantic Segmentation”, CVPR 2015

Noh et al, “Learning Deconvolution Network for Semantic Segmentation”, ICCV 2015

Slide by: Justin Johnson

Page 105

Semantic Segmentation Idea: Fully Convolutional

Design network as a bunch of convolutional layers, with downsampling and upsampling inside the network!

Input: 3 x H x W

High-res: D1 x H/2 x W/2

Med-res: D2 x H/4 x W/4

Low-res: D3 x H/4 x W/4

Med-res: D2 x H/4 x W/4

High-res: D1 x H/2 x W/2

Predictions: H x W

Downsampling: Pooling, strided convolution

Upsampling: ???

Slide by: Justin Johnson

Long, Shelhamer, and Darrell, “Fully Convolutional Networks for Semantic Segmentation”, CVPR 2015

Noh et al, “Learning Deconvolution Network for Semantic Segmentation”, ICCV 2015

Page 106

In-Network upsampling: “Unpooling”

Nearest Neighbor

Input: 2 x 2

1 2
3 4

Output: 4 x 4

1 1 2 2
1 1 2 2
3 3 4 4
3 3 4 4

“Bed of Nails”

Input: 2 x 2

1 2
3 4

Output: 4 x 4

1 0 2 0
0 0 0 0
3 0 4 0
0 0 0 0

Slide by: Justin Johnson
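Both unpooling variants on this slide are short enough to write out in NumPy. The sketch below assumes a fixed upsampling factor of 2 and a single-channel 2 x 2 input; the variable names are illustrative.

```python
import numpy as np

x = np.array([[1, 2],
              [3, 4]])

# Nearest neighbor: copy each input value into a 2 x 2 block of the output.
nearest = np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

# "Bed of nails": put each value in the top-left corner of its block, zeros elsewhere.
bed_of_nails = np.zeros((4, 4), dtype=x.dtype)
bed_of_nails[::2, ::2] = x

print(nearest)
print(bed_of_nails)
```

Neither variant has learnable parameters; both are fixed functions of the input.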

Page 107

In-Network upsampling: “Max Unpooling”

Max Pooling: Remember which element was max!

Input: 4 x 4

1 2 6 3
3 5 2 1
1 2 2 1
7 3 4 8

Output: 2 x 2

5 6
7 8

… Rest of the network …

Max Unpooling: Use positions from pooling layer

Input: 2 x 2

1 2
3 4

Output: 4 x 4

0 0 2 0
0 1 0 0
0 0 0 0
3 0 0 4

Corresponding pairs of downsampling and upsampling layers

Slide by: Justin Johnson
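The max pooling / max unpooling pair on this slide can be reproduced with a small NumPy sketch. It assumes 2 x 2 windows with stride 2 on a single-channel input. The helper names `max_pool_2x2` and `max_unpool_2x2` are hypothetical and not taken from any library.

```python
import numpy as np

def max_pool_2x2(x):
    """2 x 2 max pooling that also records where each maximum came from."""
    h, w = x.shape
    pooled = np.zeros((h // 2, w // 2), dtype=x.dtype)
    positions = np.zeros((h // 2, w // 2, 2), dtype=int)
    for i in range(h // 2):
        for j in range(w // 2):
            window = x[2 * i:2 * i + 2, 2 * j:2 * j + 2]
            r, c = np.unravel_index(window.argmax(), window.shape)
            pooled[i, j] = window[r, c]
            positions[i, j] = (2 * i + r, 2 * j + c)
    return pooled, positions

def max_unpool_2x2(values, positions, out_shape):
    """Place each value at the position remembered by the matching pooling layer."""
    out = np.zeros(out_shape, dtype=values.dtype)
    for i in range(values.shape[0]):
        for j in range(values.shape[1]):
            r, c = positions[i, j]
            out[r, c] = values[i, j]
    return out

x = np.array([[1, 2, 6, 3],
              [3, 5, 2, 1],
              [1, 2, 2, 1],
              [7, 3, 4, 8]])
pooled, positions = max_pool_2x2(x)

# Values arriving later in the network, as on the slide.
later = np.array([[1, 2],
                  [3, 4]])
unpooled = max_unpool_2x2(later, positions, x.shape)
print(pooled)
print(unpooled)
```

The unpooled output has nonzero entries exactly where the maxima were found during pooling, which is why downsampling and upsampling layers are paired.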

Page 108

Learnable Upsampling: Transpose Convolution

3 x 3 transpose convolution, stride 2 pad 1

Input: 2 x 2

Output: 4 x 4

Slide by: Justin Johnson

Page 109

Learnable Upsampling: Transpose Convolution

3 x 3 transpose convolution, stride 2 pad 1

Input: 2 x 2

Output: 4 x 4

Input gives weight for filter

Slide by: Justin Johnson

Page 110

Learnable Upsampling: Transpose Convolution

3 x 3 transpose convolution, stride 2 pad 1

Input: 2 x 2

Output: 4 x 4

Input gives weight for filter

Sum where output overlaps

Filter moves 2 pixels in the output for every one pixel in the input

Stride gives ratio between movement in output and input

Slide by: Justin Johnson

Page 111

Transpose Convolution: 1D Example

Input: a, b

Filter: x, y, z

Output: ax, ay, az + bx, by, bz

Output contains copies of the filter weighted by the input, summing where they overlap in the output

Adapted from Justin Johnson
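The 1D example above can be checked numerically. The sketch below assumes stride 2, which is what the single overlap at the third output element implies, and no padding or cropping. The function name `transpose_conv1d` and the sample numbers are illustrative.

```python
import numpy as np

def transpose_conv1d(inputs, kernel, stride=2):
    """Add one copy of the kernel per input element, scaled by that element."""
    inputs = np.asarray(inputs, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    out = np.zeros((len(inputs) - 1) * stride + len(kernel))
    for i, value in enumerate(inputs):
        start = i * stride                     # filter moves `stride` steps in the output
        out[start:start + len(kernel)] += value * kernel
    return out

a, b = 2.0, 3.0               # input
x, y, z = 1.0, 10.0, 100.0    # filter
out = transpose_conv1d([a, b], [x, y, z])
print(out)
```

The third output element is the only place where the two scaled copies of the filter overlap, so it is the only sum.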

Page 112

Instance Segmentation

Figure labels: Semantic Segmentation: GRASS, CAT, TREE, SKY (no objects, just pixels); Classification + Localization: CAT (single object); Object Detection and Instance Segmentation: DOG, DOG, CAT (multiple objects)

Slide by: Justin Johnson

Page 113

Mask R-CNN

He et al, “Mask R-CNN”, ICCV 2017

Slide by: Kaiming He

Page 114

Mask R-CNN

He et al, “Mask R-CNN”, ICCV 2017

Diagram labels: CNN, RoI Align, Conv, Conv

Classification Scores: C

Box coordinates (per class): 4 * C

Predict a mask for each of C classes

Adapted from Justin Johnson