Page 1

Computer Vision Group, UC Berkeley

How should we combine high level and low level knowledge?

Jitendra Malik, UC Berkeley

Recognition using regions is joint work with Chunhui Gu, Joseph Lim & Pablo Arbelaez (CVPR 2009)

Page 2

The central problems of vision

Grouping/Segmentation

3D structure/Figure-Ground

Object and Scene Recognition

Page 3

Detection and Segmentation: Giraffes

[Figure: original images and their segmentations]

Page 4

Detection and Segmentation: Mugs

[Figure: original images and their segmentations]

Page 5

Outline

• Current paradigm: multi-scale scanning

• Our approach
  – Bottom-up region segmentation
  – Hough-transform-style voting (learned weights)
  – Top-down segmentation

• Results on ETHZ, Caltech 101, MSRC

Page 6

Detection: Is this an X?

Ask this question repeatedly, varying position, scale, category…

Paradigm introduced by Rowley, Baluja & Kanade 96 for face detection; subsequent work: Viola & Jones 01, Dalal & Triggs 05, Felzenszwalb, McAllester & Ramanan 08
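The scanning paradigm can be sketched in a few lines. This is a minimal illustration, not any of the cited detectors: the window size, stride, scale step and the `score` callback are hypothetical stand-ins for a trained classifier.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


def sliding_windows(img_w: int, img_h: int, win: int = 64, stride: int = 8,
                    scale_step: float = 1.2, n_scales: int = 10) -> Iterator[Box]:
    """Yield square windows at several scales (all parameters are illustrative)."""
    size = float(win)
    for _ in range(n_scales):
        s = int(round(size))
        if s > img_w or s > img_h:
            break  # window no longer fits in the image
        step = max(1, int(round(stride * size / win)))  # stride grows with scale
        for y in range(0, img_h - s + 1, step):
            for x in range(0, img_w - s + 1, step):
                yield (x, y, s, s)
        size *= scale_step


def detect(img_w: int, img_h: int, categories: Sequence[str],
           score: Callable[[Box, str], float],
           thresh: float = 0.0) -> List[Tuple[Box, str, float]]:
    """Ask "is this an X?" for every window and every category."""
    hits = []
    for box in sliding_windows(img_w, img_h):
        for category in categories:
            s = score(box, category)
            if s > thresh:
                hits.append((box, category, s))
    return hits
```

Every call to `score` is one instance of the question "is this an X?", so the number of calls is the number of windows summed over all scales, multiplied by the number of categories.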

Page 7

Problems with the multi-scale scanning paradigm

• Computational complexity
  – 10^6 windows, 10 scales, 10^4 categories

• Not natural for irregularly shaped objects

• Segmentation is delinked

• Context is delinked
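A quick order-of-magnitude check of the complexity bullet, assuming the 10^6 windows are counted at each of the 10 scales (the slide lists the three factors without saying how they combine):

```latex
\underbrace{10^{6}}_{\text{windows}} \times \underbrace{10}_{\text{scales}} \times \underbrace{10^{4}}_{\text{categories}} = 10^{11}\ \text{classifier evaluations per image}
```

Even at a hypothetical one microsecond per evaluation, that is about 10^5 seconds (roughly 28 hours) per image.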

Page 8

Our Approach

• Perceptual Organization provides the right primitives for visual recognition.

• After more than a decade of work, we finally have high-quality, generic detectors for contours and regions. We now only need to work with ~100 elements, each with its local scale estimate.

• In this talk, we demonstrate recognition using regions. Detection and segmentation happen in the same framework.

• There will always be some errors in the bottom-up grouping process; the recognition machinery needs to be robust to them.

Page 9

Contour Detection (CVPR 2008)

Page 10

Region Detection (CVPR 2009)

Page 11

Region detector wins on any measure!

Region Benchmarks on BSDS

[Figure panels: Probabilistic Rand Index on BSDS; Variation of Information on BSDS]

Region Benchmarks on MSRC/PASCAL08
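The BSDS region benchmarks named above compare a machine segmentation against human ground-truth segmentations. The sketch below follows the standard definitions: the Rand index is the fraction of pixel pairs on which two labelings agree (both in the same region, or both in different regions); the Probabilistic Rand Index averages it over several ground truths (higher is better); and the Variation of Information is VI(A, B) = H(A) + H(B) - 2I(A; B) (lower is better). Label maps are assumed to be flattened into equal-length integer sequences; this is an illustration, not the benchmark code behind the slide.

```python
import math
from collections import Counter
from typing import Iterable, Sequence


def _entropy(counts: Iterable[int], n: int) -> float:
    """Shannon entropy (nats) of a distribution given by counts summing to n."""
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def variation_of_information(a: Sequence[int], b: Sequence[int]) -> float:
    """VI(A, B) = H(A) + H(B) - 2 I(A; B); 0 means identical partitions."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("label maps must be non-empty and the same size")
    h_a = _entropy(Counter(a).values(), n)
    h_b = _entropy(Counter(b).values(), n)
    h_ab = _entropy(Counter(zip(a, b)).values(), n)  # joint entropy
    mutual_info = h_a + h_b - h_ab
    return h_a + h_b - 2 * mutual_info


def _pairs(c: int) -> int:
    return c * (c - 1) // 2


def rand_index(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of pixel pairs on which the two labelings agree."""
    n = len(a)
    if n < 2 or n != len(b):
        raise ValueError("label maps must have >= 2 pixels and the same size")
    total = _pairs(n)
    same_both = sum(_pairs(c) for c in Counter(zip(a, b)).values())
    same_a = sum(_pairs(c) for c in Counter(a).values())
    same_b = sum(_pairs(c) for c in Counter(b).values())
    # agreements = pairs together in both + pairs apart in both
    return (total + 2 * same_both - same_a - same_b) / total


def probabilistic_rand_index(seg: Sequence[int],
                             ground_truths: Sequence[Sequence[int]]) -> float:
    """Mean Rand index of `seg` against several human segmentations."""
    return sum(rand_index(seg, gt) for gt in ground_truths) / len(ground_truths)
```

Counting pairs through the contingency table keeps both measures linear in the number of pixels, instead of enumerating all pixel pairs. Both are invariant to how region ids are numbered.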

Page 12

Page 13

Page 14

Page 15

Page 16

Parallelizing Image Segmentation
Catanzaro et al., UC Berkeley, ICCV 09

• GTX 280 is an Nvidia graphics processor, a massively parallel, general-purpose computing platform
  – 30 cores, 8-wide SIMD = 240-way parallelism
  – 140 GB/s memory bandwidth (modern CPUs have ~10-20 GB/s)
  – Special memory subsystems for graphics processing

• Sequential implementation: 5 minutes per image

• Parallel, optimized implementation: 2 seconds
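The slide's figures combine as follows. The roughly 150× end-to-end gain reflects both parallel hardware and the optimizations the slide mentions, so it need not match the 240-way parallelism or the bandwidth ratio exactly:

```latex
\text{speedup} = \frac{5\ \text{min}}{2\ \text{s}} = \frac{300\ \text{s}}{2\ \text{s}} = 150\times,
\qquad
\frac{140\ \text{GB/s}}{20\ \text{GB/s}} = 7\times \;\text{to}\; \frac{140\ \text{GB/s}}{10\ \text{GB/s}} = 14\times\ \text{bandwidth}
```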

Page 17

Why Use Regions?

• Local estimate of scale; no search necessary

• Shape, color and texture in the same framework

• Hierarchy of regions (“partonomy”) represents scenes, objects and parts, which makes the use of context natural.

• Do not suffer from background clutter

• Reduce candidate windows on the detection task
  – 1,000 to 10,000 times fewer windows on the ETHZ dataset

• Need to be robust to segmentation errors
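Because each region comes with its own extent, candidate locations and scales can be read directly off a segmentation rather than searched for. A minimal sketch, assuming a label map given as a list of rows of integer region ids, and using the square root of a region's pixel area as its scale estimate (an assumption; the slides do not say which scale measure is used). `region_boxes` is a hypothetical helper name:

```python
import math
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), x1/y1 exclusive


def region_boxes(labels: List[List[int]]) -> Dict[int, Tuple[Box, float]]:
    """Return, for each region id, its bounding box and a sqrt(area) scale estimate."""
    extents: Dict[int, List[int]] = {}
    areas: Dict[int, int] = {}
    for y, row in enumerate(labels):
        for x, lab in enumerate(row):
            if lab not in extents:
                extents[lab] = [x, y, x + 1, y + 1]
                areas[lab] = 0
            e = extents[lab]
            e[0], e[1] = min(e[0], x), min(e[1], y)
            e[2], e[3] = max(e[2], x + 1), max(e[3], y + 1)
            areas[lab] += 1
    return {lab: ((e[0], e[1], e[2], e[3]), math.sqrt(areas[lab]))
            for lab, e in extents.items()}


# Region 0 covers a 2x2 block in the top-left corner.
print(region_boxes([[0, 0, 1], [0, 0, 1], [2, 2, 2]])[0])  # ((0, 0, 2, 2), 2.0)
```

This yields one candidate per region, so a segmentation with ~100 regions produces ~100 candidates with no scale search.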

Page 18

Object Representation using Regions

Bag of Regions

[Figure: region segmentation]

Page 19

Region Representation

Page 20

Region-based Hough Voting

• Recover transformation from matched regions

• Transform exemplar bounding box to query

[Figure: exemplar and query images, related by the transformation T(x, y, sx, sy)]
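The two steps on this slide can be made concrete under the assumption that T(x, y, sx, sy) is a translation (x, y) plus independent horizontal and vertical scale factors (sx, sy), estimated so that the matched exemplar region's bounding box maps exactly onto the query region's bounding box. `Transform` and `transform_from_match` are hypothetical names for illustration:

```python
from dataclasses import dataclass
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


@dataclass
class Transform:
    """Translation (x, y) plus axis-aligned scaling: p' = (sx*px + x, sy*py + y)."""
    x: float
    y: float
    sx: float
    sy: float

    def apply(self, box: Box) -> Box:
        x0, y0, x1, y1 = box
        return (self.sx * x0 + self.x, self.sy * y0 + self.y,
                self.sx * x1 + self.x, self.sy * y1 + self.y)


def transform_from_match(ex_region: Box, q_region: Box) -> Transform:
    """Estimate T so the exemplar region's box lands on the query region's box."""
    ex0, ey0, ex1, ey1 = ex_region
    qx0, qy0, qx1, qy1 = q_region
    sx = (qx1 - qx0) / (ex1 - ex0)
    sy = (qy1 - qy0) / (ey1 - ey0)
    return Transform(x=qx0 - sx * ex0, y=qy0 - sy * ey0, sx=sx, sy=sy)


# One region match predicts where the whole exemplar object lies in the query.
t = transform_from_match(ex_region=(30.0, 40.0, 70.0, 120.0),
                         q_region=(100.0, 50.0, 180.0, 90.0))
print(t.apply((10.0, 20.0, 110.0, 220.0)))  # (60.0, 40.0, 260.0, 140.0)
```

Each matched region pair thus casts one vote: a predicted bounding box for the object in the query image.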

Page 21

Region-based Voting

[Figure: exemplar 1 and query image]
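Region-based voting can then pool box predictions from many region matches. In this sketch, which illustrates Hough-style accumulation rather than the authors' implementation, each vote is a predicted box with a positive weight. The weight stands in for the learned weights mentioned in the outline, whose training is not shown. Votes are accumulated in bins over box center and log width/height. `hough_vote` and its bin sizes are hypothetical:

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def hough_vote(votes: List[Tuple[Box, float]], pos_bin: float = 16.0,
               log_scale_bin: float = 0.2) -> List[Tuple[Box, float]]:
    """Accumulate weighted box votes in a discretized (cx, cy, log w, log h) space.

    Returns one (weighted mean box, total weight) per occupied bin, best first.
    """
    acc: Dict[Tuple[int, int, int, int], List[Tuple[Box, float]]] = defaultdict(list)
    for box, weight in votes:
        x0, y0, x1, y1 = box
        if weight <= 0 or x1 <= x0 or y1 <= y0:
            continue  # ignore non-positive weights and degenerate boxes
        key = (math.floor((x0 + x1) / 2 / pos_bin),
               math.floor((y0 + y1) / 2 / pos_bin),
               math.floor(math.log(x1 - x0) / log_scale_bin),
               math.floor(math.log(y1 - y0) / log_scale_bin))
        acc[key].append((box, weight))
    hypotheses: List[Tuple[Box, float]] = []
    for members in acc.values():
        total = sum(w for _, w in members)
        # Weighted mean of the boxes that fell into this bin.
        mean = tuple(sum(b[i] * w for b, w in members) / total for i in range(4))
        hypotheses.append((mean, total))
    hypotheses.sort(key=lambda h: h[1], reverse=True)
    return hypotheses
```

Bins with large total weight are candidate detections. A fixed grid can split one object's votes across a bin boundary, so a fuller system would typically smooth over neighboring bins or use a mode-seeking method such as mean shift instead.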
