
Computer Vision Group, UC Berkeley

How should we combine high level and low level knowledge?

Jitendra Malik, UC Berkeley

Recognition using regions is joint work with Chunhui Gu, Joseph Lim & Pablo Arbelaez (CVPR 2009)


The central problems of vision

Grouping / Segmentation

3D Structure / Figure-Ground

Object and Scene Recognition


Detection and Segmentation: Giraffes

[Figure: two giraffe examples, each shown as original image and segmentation]


Detection and Segmentation: Mugs

[Figure: two mug examples, each shown as original image and segmentation]


Outline

• Current paradigm: Multiscale scanning

• Our approach
  – Bottom-up region segmentation
  – Hough transform style voting (learned weights)
  – Top-down segmentation

• Results on ETHZ, Caltech 101, MSRC


Detection: Is this an X?

Ask this question repeatedly, varying position, scale, category…

Paradigm introduced by Rowley, Baluja & Kanade 96 for face detection; see also Viola & Jones 01, Dalal & Triggs 05, Felzenszwalb, McAllester & Ramanan 08


Problems with the multi-scale scanning paradigm

• Computational complexity
  – 10^6 windows, 10 scales, 10^4 categories

• Not natural for irregularly shaped objects

• Segmentation is delinked

• Context is delinked
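The computational-complexity bullet above can be made concrete with a back-of-the-envelope count. This is only a sketch: it assumes the three figures quoted on the slide multiply, that is, every window is tested at every scale for every category, which is the worst case for exhaustive scanning. The variable names are illustrative.

```python
# Rough count of "Is this an X?" evaluations for exhaustive scanning.
# The three factors are the figures quoted on the slide; treating them
# as independent multiplicative factors is an assumption.
windows_per_scale = 10**6
scales = 10
categories = 10**4

evaluations = windows_per_scale * scales * categories
print(f"{evaluations:.0e} classifier evaluations per image")
```

Under that assumption the count is 10^11 evaluations per image, which is the cost the region-based approach is meant to avoid.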


Our Approach

• Perceptual Organization provides the right primitives for visual recognition.

• After more than a decade of work, we finally have high-quality, generic detectors for contours and regions. We now need to work with only ~100 elements, each with its own local scale estimate.

• In this talk, we demonstrate recognition using regions. Detection and segmentation happen in the same framework.

• There will always be some errors in the bottom-up grouping process; the recognition machinery needs to be robust to them.


Contour Detection (CVPR 2008)


Region Detection (CVPR 2009)


Region detector wins on any measure!

Region Benchmarks on BSDS

[Plots: Probabilistic Rand Index on BSDS; Variation of Information on BSDS]

Region Benchmarks on MSRC/PASCAL08


Parallelizing Image Segmentation

Catanzaro et al., UC Berkeley, ICCV 09

• GTX 280 is an Nvidia graphics processor, a massively parallel general-purpose computing platform
  – 30 cores, 8-wide SIMD = 240-way parallelism
  – 140 GB/s memory bandwidth (modern CPUs have ~10-20 GB/s)
  – Special memory subsystems for graphics processing

• Sequential Implementation: 5 minutes per image

• Parallel, Optimized Implementation: 2 seconds
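The figures on this slide can be checked with a few lines of arithmetic. This is a minimal sketch using only the numbers quoted above; the variable names are illustrative.

```python
# Hardware parallelism quoted for the GTX 280.
cores = 30
simd_width = 8
parallel_lanes = cores * simd_width

# Per-image segmentation time, sequential vs. parallel implementation.
sequential_seconds = 5 * 60
parallel_seconds = 2
speedup = sequential_seconds / parallel_seconds

print(f"{parallel_lanes}-way parallelism, {speedup:.0f}x speedup")
```

Going from 5 minutes to 2 seconds per image is a 150x reduction in running time.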


Why Use Regions?

• Local estimate of scale; no search necessary

• Shape, color and texture in the same framework

• Hierarchy of regions (“partonomy”) represents scenes, objects, parts. Makes the use of context natural.

• Do not suffer from background clutter

• Reduce candidate windows on detection task
  – 1000 to 10000 times fewer windows on the ETHZ dataset

• Need to be robust to segmentation errors


Object Representation using Regions

Bag of Regions

Region Segmentation


Region Representation

Region-based Hough Voting

• Recover transformation from matched regions

• Transform exemplar bounding box to query

[Figure: Exemplar and Query images, related by T(x, y, sx, sy)]
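The two steps above can be sketched as follows. This is an illustration under stated assumptions, not the method from the CVPR 2009 paper: it assumes each region is summarized by its axis-aligned bounding box, and that T(x, y, sx, sy) is a translation plus per-axis scaling that maps the exemplar region's box onto the matched query region's box. `Box`, `Transform`, `estimate_transform` and `apply_transform` are hypothetical names.

```python
from typing import NamedTuple


class Box(NamedTuple):
    x_min: float
    y_min: float
    x_max: float
    y_max: float


class Transform(NamedTuple):
    x: float   # translation along x
    y: float   # translation along y
    sx: float  # scale along x
    sy: float  # scale along y


def estimate_transform(exemplar_region: Box, query_region: Box) -> Transform:
    """Recover T so that T(exemplar_region) == query_region (assumed box model)."""
    sx = (query_region.x_max - query_region.x_min) / (
        exemplar_region.x_max - exemplar_region.x_min)
    sy = (query_region.y_max - query_region.y_min) / (
        exemplar_region.y_max - exemplar_region.y_min)
    x = query_region.x_min - sx * exemplar_region.x_min
    y = query_region.y_min - sy * exemplar_region.y_min
    return Transform(x, y, sx, sy)


def apply_transform(t: Transform, box: Box) -> Box:
    """Map a box from exemplar coordinates into query coordinates."""
    return Box(t.sx * box.x_min + t.x, t.sy * box.y_min + t.y,
               t.sx * box.x_max + t.x, t.sy * box.y_max + t.y)


# Made-up coordinates for one matched region pair.
exemplar_region = Box(10, 10, 30, 50)
query_region = Box(100, 40, 140, 120)
exemplar_object_box = Box(0, 0, 60, 60)

t = estimate_transform(exemplar_region, query_region)
predicted_box = apply_transform(t, exemplar_object_box)
print(t, predicted_box)
```

Each matched region pair yields one predicted object box in the query image; that prediction is what a Hough-style scheme would treat as a vote.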

Region-based Voting

Exemplar 1

Query
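A Hough-style accumulator for such votes might look like the sketch below. It assumes each matched region contributes a predicted object center in the query image together with a weight; the weights here are made-up numbers standing in for the learned weights mentioned in the outline, and the coarse grid with `bin_size` is an illustrative choice, not a detail from the talk.

```python
from collections import defaultdict


def accumulate_votes(votes, bin_size=20.0):
    """Sum vote weights in a coarse grid over predicted object centers.

    votes: iterable of (center_x, center_y, weight) tuples.
    Returns a dict mapping (bin_x, bin_y) to the accumulated weight.
    """
    accumulator = defaultdict(float)
    for cx, cy, weight in votes:
        key = (int(cx // bin_size), int(cy // bin_size))
        accumulator[key] += weight
    return accumulator


# Hypothetical votes: two regions agree on roughly the same location,
# a third (e.g. from a spurious match) points elsewhere.
votes = [(105.0, 62.0, 0.9), (110.0, 70.0, 0.6), (300.0, 40.0, 0.7)]
acc = accumulate_votes(votes)
best_bin, best_score = max(acc.items(), key=lambda kv: kv[1])
print(best_bin, best_score)
```

Votes that agree reinforce each other while isolated votes do not, which gives the scheme some tolerance to errors in the bottom-up grouping.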

