groups of adjacent contour segments for object detection

Groups of Adjacent Contour Segments for Object Detection

Vittorio Ferrari, Loic Fevrier

Frederic Jurie, Cordelia Schmid

Problem: object class detection & localization

Training

Testing

Focus: classes with characteristic shape

Features: pairs of adjacent segments (PAS)

Contour segment network [Ferrari et al. ECCV 2006]

1) edgels extracted with Berkeley boundary detector

2) edgel-chains partitioned into straight contour segments

3) segments connected at edgel-chains’ endpoints and junctions


segments connected in the network

PAS = groups of two connected segments
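The network construction and PAS enumeration above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each contour segment is given as a pair of endpoints, and that two segments are connected when any of their endpoints nearly coincide. The function names and the `tol` parameter are hypothetical, and junctions between edgel-chains are reduced to the same endpoint test.

```python
from itertools import combinations
import math


def build_segment_network(segments, tol=1.0):
    """Return adjacency sets: segment index -> indices of connected segments."""
    adjacency = {i: set() for i in range(len(segments))}
    for i, j in combinations(range(len(segments)), 2):
        # Assumption: segments are connected when two endpoints nearly coincide.
        if any(math.dist(p, q) <= tol for p in segments[i] for q in segments[j]):
            adjacency[i].add(j)
            adjacency[j].add(i)
    return adjacency


def enumerate_pas(adjacency):
    """A PAS is any unordered pair of connected segments."""
    return sorted((i, j) for i in adjacency for j in adjacency[i] if i < j)


# Three segments forming a chain, plus one isolated segment.
segments = [
    ((0, 0), (4, 0)),
    ((4, 0), (6, 3)),
    ((6, 3), (6, 8)),
    ((20, 20), (25, 20)),
]
network = build_segment_network(segments)
pas_list = enumerate_pas(network)
```

Because connectivity alone defines the groups, no neighborhood radius or grouping scale has to be chosen.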


PAS descriptor:

• encodes geometric properties of the PAS
• scale and translation invariant
• compact, 5D
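One way to obtain a 5D descriptor with these properties is sketched below. The slide only states that the descriptor is 5D and scale and translation invariant; the particular five values used here (direction of the vector between the segment midpoints, the two segment orientations, and the two lengths divided by the midpoint distance) are an assumption made for illustration, as is the function name `pas_descriptor`.

```python
import math


def _segment_features(segment):
    """Midpoint, length and undirected orientation in [0, pi) of a segment."""
    (x1, y1), (x2, y2) = segment
    midpoint = ((x1 + x2) / 2.0, (y1 + y2) / 2.0)
    length = math.hypot(x2 - x1, y2 - y1)
    theta = math.atan2(y2 - y1, x2 - x1) % math.pi
    return midpoint, length, theta


def pas_descriptor(seg_a, seg_b):
    """Hypothetical 5D descriptor: (direction, theta_a, theta_b, len_a/d, len_b/d)."""
    mid_a, len_a, theta_a = _segment_features(seg_a)
    mid_b, len_b, theta_b = _segment_features(seg_b)
    rx, ry = mid_b[0] - mid_a[0], mid_b[1] - mid_a[1]
    d = math.hypot(rx, ry)  # normalization factor: distance between midpoints
    if d == 0:
        raise ValueError("segment midpoints coincide")
    # Only differences and ratios are used, so translation and scale cancel out.
    return (math.atan2(ry, rx), theta_a, theta_b, len_a / d, len_b / d)


seg_a = ((0, 0), (4, 2))
seg_b = ((4, 2), (5, 6))
descriptor = pas_descriptor(seg_a, seg_b)
```

Scaling both segments by the same factor and shifting them by the same offset leaves every component unchanged, which is the invariance the slide refers to.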


Example PAS

Why PAS?

+ intermediate complexity: good repeatability-informativeness trade-off

+ scale-translation invariant

+ connected: natural grouping criterion (need not choose a grouping neighborhood or scale)

+ can cover pure portions of the object boundary

PAS codebook

Based on descriptors, cluster PAS into types

a few of the most frequent types based on 10 outdoor images (5 horses and 5 background).

types based on 15 indoor images (bottles)

• Frequently occurring PAS have intuitive, natural shapes
• As we add images, number of PAS types converges to just ~100
• Very similar codebooks come out, regardless of source images

+ general, simple features. We use a single, universal codebook (1st row) for all classes
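Clustering descriptors into types can be illustrated with a simple greedy scheme. The slides do not say which clustering algorithm is used, so the threshold-based procedure below is a stand-in, and `build_codebook` and `radius` are hypothetical names. It also compares descriptors with plain Euclidean distance, which ignores angle wrap-around.

```python
import math


def build_codebook(descriptors, radius):
    """Greedy clustering: returns (type_centers, type_index_per_descriptor)."""
    centers, counts, assignments = [], [], []
    for d in descriptors:
        best, best_dist = None, radius
        for t, center in enumerate(centers):
            dist = math.dist(d, center)
            if dist <= best_dist:
                best, best_dist = t, dist
        if best is None:
            # No existing type is close enough: start a new type.
            centers.append(list(d))
            counts.append(1)
            best = len(centers) - 1
        else:
            # Update the running mean of the matched type.
            counts[best] += 1
            n = counts[best]
            centers[best] = [c + (x - c) / n for c, x in zip(centers[best], d)]
        assignments.append(best)
    return centers, assignments


toy_descriptors = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 0.1)]
type_centers, type_ids = build_codebook(toy_descriptors, radius=1.0)
```

With a scheme like this, the number of types stops growing once new descriptors keep falling within `radius` of existing centers, which matches the convergence behaviour described above.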

Window descriptor

1. Subdivide window into tiles
2. Compute a separate bag of PAS per tile
3. Concatenate these semi-local bags

[Lazebnik et al. CVPR 2006]; [Dalal and Triggs CVPR 2005]

+ distinctive: records which PAS appear where; PAS are weighted by average edge strength

+ flexible: PAS are soft-assigned to types; the tiling is rather coarse

+ fast to compute using Integral Histograms
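The three steps above, together with edge-strength weighting and soft assignment, can be sketched as follows. The data layout (each PAS as a position, a dict of soft type weights, and a strength value) and the regular `tiles_x` by `tiles_y` grid are assumptions for illustration. The histogram is accumulated directly rather than through integral histograms.

```python
def window_descriptor(pas_list, window, tiles_x, tiles_y, n_types):
    """Concatenate one weighted bag of PAS types per tile of the window.

    pas_list items: (x, y, type_weights, strength), where type_weights maps
    a type index to its soft-assignment weight (hypothetical layout).
    window: (x0, y0, width, height).
    """
    x0, y0, width, height = window
    hist = [0.0] * (tiles_x * tiles_y * n_types)
    for x, y, type_weights, strength in pas_list:
        if not (x0 <= x < x0 + width and y0 <= y < y0 + height):
            continue  # PAS lies outside the window
        tx = min(int((x - x0) / width * tiles_x), tiles_x - 1)
        ty = min(int((y - y0) / height * tiles_y), tiles_y - 1)
        base = (ty * tiles_x + tx) * n_types
        for type_index, weight in type_weights.items():
            hist[base + type_index] += weight * strength
    return hist


toy_pas = [
    (10, 10, {0: 1.0}, 0.5),
    (75, 20, {1: 0.7, 2: 0.3}, 1.0),
    (200, 200, {0: 1.0}, 1.0),  # outside the window, ignored
]
desc = window_descriptor(toy_pas, (0, 0, 100, 100), tiles_x=2, tiles_y=2, n_types=3)
```

Each dimension of the result corresponds to one 'PAS type + tile' combination, so the vector records which PAS types appear where in the window.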

Training

1. Learn mean positive window dimensions
2. Determine number of tiles T
3. Collect positive example descriptors
4. Collect negative example descriptors: slide window over negative training images
5. Train a linear SVM

Here are a few of the top-weighted descriptor vector dimensions (= 'PAS + tile'):

+ lie on object boundary (= local shape structure common to many training examples)
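Training a linear SVM on window descriptors and reading off the top-weighted dimensions can be sketched as below. This is a toy sub-gradient solver written for illustration, not the solver used in the talk; the hyperparameters and the function names are assumptions, and the three-dimensional vectors stand in for real 'PAS + tile' descriptors.

```python
def train_linear_svm(samples, labels, epochs=200, lr=0.1, reg=0.01):
    """Sub-gradient descent on the L2-regularized hinge loss. Labels are +1 / -1."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            w = [wi * (1.0 - lr * reg) for wi in w]  # regularization shrinkage
            if margin < 1.0:
                # Sample violates the margin: move the hyperplane towards it.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def svm_score(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def top_weighted_dims(w, n):
    """Indices of the n dimensions with the largest positive weight."""
    return sorted(range(len(w)), key=lambda i: w[i], reverse=True)[:n]


positives = [[1.0, 0.0, 0.2], [0.9, 0.1, 0.1]]
negatives = [[0.0, 1.0, 0.2], [0.1, 0.9, 0.1]]
w, b = train_linear_svm(positives + negatives, [1, 1, -1, -1])
```

In this toy data, dimension 0 fires mostly on positives, so it receives the largest positive weight, in the same way that dimensions lying on the object boundary do in the slide.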

Testing

1. Slide window of the learned aspect ratio (from the mean positive window dimensions), at multiple scales

2. SVM classify each window + non-maxima suppression
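The two testing steps can be sketched as a window generator followed by greedy non-maxima suppression. The stride, the list of window heights, and the overlap threshold `max_iou` are illustrative assumptions; the slides do not give these values.

```python
def sliding_windows(img_w, img_h, aspect, heights, stride_frac=0.25):
    """Yield (x, y, w, h) windows with fixed aspect ratio w / h at several scales."""
    for h in heights:
        w = aspect * h
        step = max(1, int(stride_frac * h))
        y = 0
        while y + h <= img_h:
            x = 0
            while x + w <= img_w:
                yield (x, y, w, h)
                x += step
            y += step


def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def non_maxima_suppression(detections, max_iou=0.5):
    """detections: list of (score, window). Keep the best window of each overlapping group."""
    kept = []
    for score, win in sorted(detections, key=lambda d: d[0], reverse=True):
        if all(iou(win, other) <= max_iou for _, other in kept):
            kept.append((score, win))
    return kept


scored = [(0.9, (0, 0, 10, 10)), (0.8, (1, 1, 10, 10)), (0.7, (50, 50, 10, 10))]
final = non_maxima_suppression(scored)
```

In a full detector, the scores would come from the SVM applied to each window's descriptor.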

detections

Results – INRIA horses

+ tiling brings a substantial improvement; optimum at T=30 -> keep this setting on all other experiments
+ works well: 86% det-rate at 0.3 FPPI (with 50 pos + 50 neg training images)

Dataset: ~ Jurie and Schmid, CVPR 2004
170 positive + 170 negative images (training = 50 pos + 50 neg)
wide range of scales; clutter
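For readers unfamiliar with the metric, a detection rate at a given number of false positives per image (FPPI) can be computed as sketched below. This assumes detections have already been labeled correct or incorrect against ground truth, with at most one correct detection per object; the matching criterion itself is not specified on the slides, and the function name is hypothetical.

```python
def det_rate_at_fppi(detections, n_objects, n_images, target_fppi):
    """detections: list of (score, is_correct). Best det-rate with FPPI <= target."""
    best = 0.0
    true_pos = false_pos = 0
    # Lower the score threshold one detection at a time.
    for _, is_correct in sorted(detections, key=lambda d: d[0], reverse=True):
        if is_correct:
            true_pos += 1
        else:
            false_pos += 1
        if false_pos / n_images <= target_fppi:
            best = max(best, true_pos / n_objects)
    return best


toy_detections = [
    (0.9, True), (0.8, True), (0.7, False),
    (0.6, True), (0.5, False), (0.4, True),
]
rate_03 = det_rate_at_fppi(toy_detections, n_objects=5, n_images=5, target_fppi=0.3)
rate_04 = det_rate_at_fppi(toy_detections, n_objects=5, n_images=5, target_fppi=0.4)
```

In this toy case, allowing one more false positive over the five images raises the detection rate from 3 of 5 objects to 4 of 5.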

(missed and FP)


+ PAS better than any interest point (IP) detector

All IP comparisons use T=10 and 120 feature types (= optimum over INRIA horses and ETHZ Shape Classes); all IP codebooks are class-specific

(missed and FP)

Results – Weizmann-Shotton horses

Dataset: Shotton et al., ICCV 2005
327 positive + 327 negative images (training = 50 pos + 50 neg)
no scale changes; modest clutter

Shotton’s EER

- exact comparison to Shotton et al.: use their images and search at a single scale
- PAS same performance (~92% precision-recall EER), but:
  + no need for segmented training images (only bounding-boxes)
  + can detect objects at multiple scales (see other experiments)

Results – ETHZ Shape Classes

Dataset: Ferrari et al., ECCV 2006
255 images, over 5 classes
training = half of positive images for a class + same number from the other classes (1/4 from each)
testing = all other images
large scale changes; extensive clutter


Missed

Results – ETHZ Shape Classes

+ mean det-rate at 0.4 FPPI = 79%

+ PAS >> IP for apple logos, bottles, mugs
  PAS ~= IP for giraffes (texture!)
  PAS < IP for swans

+ overall best IP: Harris-Laplace

+ class specific IP codebooks

[Plots, one panel per class: Apple logos, Bottles, Giraffes, Mugs, Swans]

Results – Caltech 101

Dataset: Fei-Fei et al., GMBV 2004
42 anchor, 62 chair, 67 cup images
train = half + same number of caltech101 background
testing = other half pos + same number of background
scale changes; only little clutter


On caltech101’s anchor, chair, cup:
+ PAS better than Harris-Laplace
+ mean PAS det-rate at 0.4 FPPI: 85%

Comparison to Dalal and Triggs CVPR 2005

[Plots, one panel per dataset: Apple logos, Bottles, Giraffes, Mugs, Swans, Caltech anchors, Caltech chairs, Caltech cups, INRIA horses, Shotton horses]

+ overall mean det-rate at 0.4 FPPI: PAS 82% >> HoG 58%

PAS >> HoG for 6 datasets
PAS ~= HoG for 2 datasets
PAS < HoG for 2 datasets

Generalizing PAS to kAS

kAS: any path of length k through the contour segment network

segments connected in the network 3AS 4AS

• scale+translation invariant descriptor with dimensionality 4k-2
• k = feature complexity; higher k -> more informative, but less repeatable kAS
• overall mean det-rates (%)

            1AS   PAS   3AS   4AS
0.3 FPPI     69    77    64    57
0.4 FPPI     76    82    70    64

PAS do best!
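Enumerating kAS as paths through the network can be sketched as follows. The example assumes "path of length k" means a chain of k distinct, successively connected segments, and that a path and its reverse count as the same feature; the adjacency-dict layout and the function name are hypothetical.

```python
def enumerate_kas(adjacency, k):
    """Return all simple paths of k connected segments, ignoring direction."""
    found = set()

    def extend(path):
        if len(path) == k:
            # Store one canonical orientation so a path and its reverse coincide.
            found.add(min(path, tuple(reversed(path))))
            return
        for nxt in adjacency[path[-1]]:
            if nxt not in path:
                extend(path + (nxt,))

    for start in adjacency:
        extend((start,))
    return sorted(found)


# Four segments connected in a chain: 0 - 1 - 2 - 3.
chain = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
pas = enumerate_kas(chain, 2)
three_as = enumerate_kas(chain, 3)
four_as = enumerate_kas(chain, 4)
```

Longer paths are more specific but need every segment in the chain to be detected again, which is one way to see why repeatability drops as k grows.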

Conclusions

Connected local shape features for object class detection

Experiments on 10 diverse classes from 4 datasets show:

+ better suited than interest points for these shape-based classes

- fixed aspect-ratio window: sometimes inaccurate bounding-boxes

+ object detector deals with clutter, scale changes, intra-class variability

- single viewpoint

+ PAS have the best intermediate complexity among kAS

+ object detector compares favorably to HoG-based one

Current work: detecting object outlines

Training: learn the common boundaries from examples

Model
• collection of PAS and their spatial variability
• only common boundary

Detection on a new image

1. detect edges
2. match PAS based on descriptors
3. vote for translation + scale initializations
4. match deformable thin-plate spline based on deterministic annealing

Outline object in test image, without segmented training images!
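The voting step (step 3) can be illustrated with a small Hough-style accumulator. The slides give no details of the voting scheme, so everything here is an assumption: each PAS is reduced to a position and a size, model positions are taken relative to the model center, and `pos_bin` and `scale_base` are hypothetical bin sizes.

```python
from collections import Counter
import math


def vote_translation_scale(matches, pos_bin=10.0, scale_base=1.2):
    """matches: list of (model_pas, test_pas), each PAS given as (x, y, size).

    Returns ((x_bin, y_bin, scale_bin), votes) for the strongest hypothesis,
    or None when there are no matches.
    """
    votes = Counter()
    for (mx, my, msize), (tx, ty, tsize) in matches:
        scale = tsize / msize
        # Object center implied by this single match.
        cx, cy = tx - scale * mx, ty - scale * my
        key = (
            round(cx / pos_bin),
            round(cy / pos_bin),
            round(math.log(scale, scale_base)),
        )
        votes[key] += 1
    return votes.most_common(1)[0] if votes else None


# Three consistent matches (object centered at (100, 50), scale 2) and one outlier.
toy_matches = [
    ((10, 0, 5), (120, 50, 10)),
    ((-10, 0, 5), (80, 50, 10)),
    ((0, 20, 8), (100, 90, 16)),
    ((10, 0, 5), (300, 300, 5)),
]
best_hypothesis = vote_translation_scale(toy_matches)
```

The strongest bins would then serve as initializations for the thin-plate spline matching in step 4.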

A few preliminary results
