
Recognition by Parts

Visual Recognition

Lecture 12

“The whole is equal to the sum of its parts” Euclid

Main approaches to recognition:

Pattern recognition
Invariants
Alignment
Part decomposition
Functional description

Recognize!

“One of the most interesting aspects of the world is that it can be considered to be made up of patterns. A pattern is essentially an arrangement. It is characterized by the order of the elements of which it is made rather than by the intrinsic nature of the elements.”

Norbert Wiener

Nonsense Object

The description reflects the workings of a representational system
Segmentation at regions of deep concavity
Parts are described with common volumetric terms
The manner of segmentation and analysis into components does not depend on our familiarity with the object

Issues

Why parts? Why partition the shape?
How does the visual system decompose shapes into parts?
Are parts chosen arbitrarily by the visual system?
How are the 3D parts of an object inferred from its 2D projection delivered by the eye?
Etc.

Between Speech and Object Recognition (OR)

The number of categories rivals the number of words that can be identified from speech
Speech perception: by identification of primitive elements – phonemes
Small set of primitives (44 in English), each with a handful of attributes
The representational power derives from combinations of the primitives

OR – The Visual Domain

Primitives – a modest number of simple geometric components
Generally convex and volumetric (cylinders, blocks, cones, etc.)
Segmentation at regions of sharp concavity
Primitives derive from combinations of a few qualitative characteristics of the edges in the 2D image (straight vs. curved, symmetry, etc.)
These particular properties of edges are invariant over changes in orientation and can be determined from just a few points on each edge
Tolerance for variations of viewpoint, occlusion, noise
The representational power derives from the enormous number of combinations

Count vs. Mass Noun Objects

Categorization of isolated (unanticipated) objects
Modeling is limited to concrete entities with specified boundaries
Mass nouns (water, sand) do not have a simple volumetric description and are identified differently, primarily through surface characteristics (texture, color)

Unexpected Object Recognition

Is possible (not an obvious conclusion)
Can be done rapidly
When viewed from novel orientations
Under a moderate level of visual noise
When partially occluded
When it is a new exemplar of a category

Resulting Constraints

Access to the mental representation should not be dependent on absolute judgment of quantitative detail
The information that is the basis of recognition should be relatively invariant with respect to orientation and modest degradation
Partial matches should be computable

RBC: Recognition-By-Components

The contribution: a proposal for a particular vocabulary of components derived from perceptual mechanisms, and an account of how an arrangement of these components can access a representation of an object in memory

Issues

Stages up to and including the identification of components are assumed to be bottom-up
It is likely that top-down routes (e.g. from expectancy, object familiarity, scene constraints) will be observed at a number of the stages (e.g. segmentation, component definition, matching) – omitted in the interests of simplicity
Matching of the components occurs in parallel
Partial matches are possible (degree of match is proportional to the similarity in the components between image and representation)
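The partial-match idea can be illustrated with a small sketch. This is not the matching mechanism of RBC itself: it assumes each object is reduced to a multiset of component labels, and the label strings and the function name `match_degree` are hypothetical. The score is the fraction of the stored model's components that are also present in the image.

```python
from collections import Counter

def match_degree(image_parts, model_parts):
    """Fraction of the model's components that are also found in the image.

    Both arguments are lists of component labels (hypothetical names).
    Counter intersection keeps the minimum count of each shared label.
    """
    image, model = Counter(image_parts), Counter(model_parts)
    if not model:
        return 0.0
    shared = sum((image & model).values())
    return shared / sum(model.values())

# A stored model with four components, and an occluded view showing two of them.
lamp_model = ["cylinder", "cone", "block", "cylinder"]
occluded_view = ["cylinder", "cone"]

print(match_degree(occluded_view, lamp_model))  # 0.5
print(match_degree(lamp_model, lamp_model))     # 1.0
```

Because each label is scored independently, the components could in principle be compared in parallel, and an occluded view still yields a nonzero score.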

Geons - Units of Representation

Segmentation into separate regions at points of deep concavity (particularly at cusps where there are discontinuities in curvature)
Transversality – paired concavities arise whenever convex volumes are joined
Each segmented region is approximated by one of a possible set of simple components = geons (geometrical ions)
Can be modeled by generalized cones: volume swept out by a cross section moving along an axis
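A generalized cone can be sketched as a cross section swept along an axis. The example below is a minimal illustration under simplifying assumptions: the axis is straight (the z axis), the cross section is circular, and the hypothetical function `radius_at` controls how its size changes along the axis.

```python
import math

def generalized_cone(axis_length, radius_at, n_steps=5, n_around=8):
    """Sample surface points of a generalized cone with a straight axis.

    A circular cross section is swept along the z axis; radius_at(t) gives
    its radius at normalized position t in [0, 1] along the axis.
    """
    points = []
    for i in range(n_steps):
        t = i / (n_steps - 1)
        z = t * axis_length
        r = radius_at(t)
        for k in range(n_around):
            angle = 2 * math.pi * k / n_around
            points.append((r * math.cos(angle), r * math.sin(angle), z))
    return points

cylinder = generalized_cone(2.0, lambda t: 1.0)   # constant cross section
cone = generalized_cone(2.0, lambda t: 1.0 - t)   # cross section shrinks to a point
print(len(cylinder), len(cone))
```

Changing only the sweep function turns a cylinder into a cone, which is the kind of qualitative difference the geon attributes are meant to capture.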

Geons

Are hypothesized to be simple, typically symmetrical volumes lacking sharp concavities (e.g. blocks, cylinders, spheres)
Can be differentiated on the basis of perceptual properties in the 2D image that are readily detectable and relatively independent of viewing position and degradation (e.g. good continuation, symmetry)
Objects can be complex – the units are simple and regular

Relations Among the Geons

The arrangement of primitives is necessary for representing a particular object
Different arrangements of the same components can lead to different objects
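A small data-structure sketch makes this concrete. It uses the commonly cited cup versus pail illustration (a cylinder with a curved handle attached at the side or at the top); the class name, the labels, and the relation names are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ObjectDescription:
    geons: frozenset       # component labels
    relations: frozenset   # (part, relation, part) triples

cup = ObjectDescription(
    geons=frozenset({"cylinder", "curved_handle"}),
    relations=frozenset({("curved_handle", "side_attached_to", "cylinder")}),
)
pail = ObjectDescription(
    geons=frozenset({"cylinder", "curved_handle"}),
    relations=frozenset({("curved_handle", "top_attached_to", "cylinder")}),
)

print(cup.geons == pail.geons)  # True: same components
print(cup == pail)              # False: different arrangement
```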

Perceptual Basis for RBC

Certain properties of edges in 2D are taken by the visual system as strong evidence that the 3D edges contain those same properties
Nonaccidental properties – would only rarely be produced by accidental alignments of viewpoint and object features
Five nonaccidental properties:
  Collinearity – the edge in the 3D world is also straight
  Curvilinearity – smoothly curved elements in the image are inferred to arise from smoothly curved features in the 3D world
  Symmetry – the object projecting the image is also symmetrical
  Parallelism
  Cotermination
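Two of these properties, parallelism and cotermination, are easy to test on 2D line segments. The sketch below is only an illustration: the function names and the tolerance values are assumptions, and a real edge-grouping stage would have to cope with noise and curved contours as well.

```python
import math

def direction(seg):
    """Direction angle (radians) of a segment given as ((x1, y1), (x2, y2))."""
    (x1, y1), (x2, y2) = seg
    return math.atan2(y2 - y1, x2 - x1)

def are_parallel(seg_a, seg_b, tol_deg=5.0):
    """True if two 2D segments have the same orientation within tol_deg."""
    diff = abs(direction(seg_a) - direction(seg_b)) % math.pi
    diff = min(diff, math.pi - diff)  # orientation is defined modulo 180 degrees
    return math.degrees(diff) <= tol_deg

def coterminate(seg_a, seg_b, tol=1.0):
    """True if some endpoint of seg_a lies within tol of an endpoint of seg_b."""
    return any(math.dist(p, q) <= tol for p in seg_a for q in seg_b)

bottom = ((0.0, 0.0), (10.0, 0.0))
top = ((0.0, 5.0), (10.0, 5.2))     # nearly parallel to bottom
side = ((10.0, 0.0), (10.0, 8.0))   # shares an endpoint with bottom

print(are_parallel(bottom, top), are_parallel(bottom, side))
print(coterminate(bottom, side), coterminate(bottom, top))
```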

Nonaccidental Properties

Witkin & Tenenbaum 1983: the surface’s silhouette overrides the perceptual interpretation of the luminance gradient

Penrose Impossible Triangle

Cotermination – accidental alignment of the ends of noncoterminous segments

Müller-Lyer Illusion

Y, arrow, and L vertices allow inference as to the identity of the volume in the image

Generating Geons from Generalized Cones (GC)

The primitives should be rapidly identifiable and invariant over viewpoint and noise
Differences among components are based on differences in nonaccidental properties
Variation over the nonaccidental relations of four attributes of GC generates a set of 36 geons

Geon Set

The characteristics of the cross section: shape, symmetry, constancy of size along the axis (2 x 3 x 3)
The shape of the axis (x 2)
[Figures 6 and/or 7 here]
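The count of 36 follows from the product 2 x 3 x 3 x 2. The sketch below enumerates the combinations. The slide gives only the number of values per attribute; the value names used here are an assumption taken from Biederman's description of RBC and are not stated on the slide.

```python
from itertools import product

# Attribute values are assumed (following Biederman's RBC description);
# the slide itself only gives the counts 2 x 3 x 3 x 2.
EDGE = ["straight", "curved"]                                   # cross-section shape
SYMMETRY = ["rotational+reflective", "reflective", "asymmetric"]  # cross-section symmetry
SIZE = ["constant", "expanded", "expanded_and_contracted"]      # size along the axis
AXIS = ["straight", "curved"]                                   # shape of the axis

geons = list(product(EDGE, SYMMETRY, SIZE, AXIS))
print(len(geons))  # 36

# Under these assumed names, a cylinder would be described as:
cylinder = ("curved", "rotational+reflective", "constant", "straight")
print(cylinder in geons)  # True
```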

Nonaccidental 2D Contrasts Among Geons

The values of the 4 attributes can be directly detected as differences in nonaccidental properties, e.g.:
  Cross-section edges and curvature of the axis – collinearity or curvilinearity
  Constant vs. expanding size of the cross section – parallelism
Specification of the above is sufficient to uniquely classify a given arrangement of edges as one of the 36 geons

More Distinctive Nonaccidental Differences

The arrangement of vertices – a richer description

RBC - Summary

A specific set of primitives is derived from a small number of independent characteristics of the input
The perceptual system is designed to represent the free combination of a modest number of primitives based on simple perceptual contrasts
Geons are uniquely specified from their 2D image properties (-> 3D object-centered reconstruction is not needed)
The input is mapped onto this modest number of primitives. Then, using a representational system, we can code and access free combinations of these primitives

RBC – General Principles

A line drawing which represents discontinuities is an efficient description and sufficient for primal access
Objects are better represented and analyzed by decomposing them into their natural components – parts
A qualitative description of the components is necessary and sufficient to permit fast access to a DB of object models
Non-accidental instances of viewpoint-invariant features in the 2D line drawing are sufficient to permit fast access to the qualitative model of a 3D object
Primal access for visual OR is obtained by matching a description of the spatial structure of the components making up the object to an indexed DB of models in a similar representation

RBC – Computational Hypotheses

Five specific classes of 2D line groupings are sufficient to access the parts representation
Segmentation should happen at concavities in the outline of an object
The geons form an efficient qualitative shape representation for the parts which is suitable for primal access
The symbolic description for objects and models should include geon labels, aspect ratios, and relative sizes of parts

Implementations

PARVO – Bergevin and Levine 1988
OPTICA – Dickinson, Rosenfeld, Pentland 1989
Munck-Fairwood 1991
Pentland and Sclaroff 1991
Raja and Jain 1992

Example - Recovering Geons using Superquadrics

Lamé curves (1818):

\[
\left(\frac{x}{a}\right)^{m} + \left(\frac{y}{b}\right)^{m} = 1
\]

Superellipse (Hein 1960)

\[
m = \frac{p}{q} > 0
\]

where p is an even positive integer and q is an odd positive integer

Superellipse

From star-shape to a square in the limit
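The change of shape with the exponent can be checked numerically. The sketch below samples the curve through the usual angular parameterization; it is an illustration only, the function names are hypothetical, and absolute values are used so that the curve is covered in all four quadrants (for an exponent with even numerator and odd denominator this agrees with the plain power).

```python
import math

def signed_pow(value, exponent):
    """|value|**exponent with the sign of value, so all four quadrants are covered."""
    return math.copysign(abs(value) ** exponent, value)

def superellipse_point(t, a=1.0, b=1.0, m=2.0):
    """Point on |x/a|**m + |y/b|**m = 1 at parameter angle t (radians)."""
    x = a * signed_pow(math.cos(t), 2.0 / m)
    y = b * signed_pow(math.sin(t), 2.0 / m)
    return x, y

# The point at t = 45 degrees moves from near the origin (star shape)
# toward the corner (1, 1) of the bounding square as m grows.
for m in (0.4, 2.0 / 3.0, 2.0, 4.0, 50.0):
    x, y = superellipse_point(math.pi / 4, m=m)
    residual = abs(x) ** m + abs(y) ** m - 1.0
    print(f"m={m:6.3f}  x=y={x:.3f}  residual={residual:.1e}")
```

With m = 2 the curve is an ordinary ellipse; smaller exponents pull the curve toward the axes, larger ones push it toward the bounding rectangle.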

Superellipsoid

3D surface is obtained by the spherical product of two 2D curves

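\[
\mathbf{r}(\eta, \omega) =
\begin{pmatrix}
a_1 \cos^{\varepsilon_1}\eta \, \cos^{\varepsilon_2}\omega \\
a_2 \cos^{\varepsilon_1}\eta \, \sin^{\varepsilon_2}\omega \\
a_3 \sin^{\varepsilon_1}\eta
\end{pmatrix}
\]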

[Figure: superellipsoid shapes for shape exponents e1 = 0.1, 1, 2 and e2 = 0.1, 1, 2]
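The surface can be sampled directly from its parametric form. The sketch below is a minimal illustration: the names `e1` and `e2` stand for the two shape exponents, the angle names `eta` and `omega` and their ranges follow the usual convention and are an assumption here, and signed powers are used so that negative cosines and sines are handled.

```python
import math

def signed_pow(value, exponent):
    """|value|**exponent carrying the sign of value."""
    return math.copysign(abs(value) ** exponent, value)

def superellipsoid_point(eta, omega, a1=1.0, a2=1.0, a3=1.0, e1=1.0, e2=1.0):
    """Surface point r(eta, omega); eta in [-pi/2, pi/2], omega in [-pi, pi)."""
    x = a1 * signed_pow(math.cos(eta), e1) * signed_pow(math.cos(omega), e2)
    y = a2 * signed_pow(math.cos(eta), e1) * signed_pow(math.sin(omega), e2)
    z = a3 * signed_pow(math.sin(eta), e1)
    return x, y, z

# Same surface parameters (eta, omega), different shape exponents.
for e1 in (0.1, 1.0, 2.0):
    for e2 in (0.1, 1.0, 2.0):
        x, y, z = superellipsoid_point(math.pi / 4, math.pi / 4, e1=e1, e2=e2)
        print(f"e1={e1:3.1f} e2={e2:3.1f}  ->  ({x:.3f}, {y:.3f}, {z:.3f})")
```

With e1 = e2 = 1 the surface is an ordinary ellipsoid; exponents near 0 give box-like shapes and exponents of 2 give pinched, diamond-like shapes.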

Superquadrics

Barr 1981 – extension to include superhyperboloids (1-2 pieces) and supertoroids

Superquadrics in General Position

From world coordinates to SQ-centered coordinates (11 DOF: 3 size, 2 shape, 3 rotation, 3 translation)
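The change of coordinates can be sketched as a rigid transformation. The code below is an illustration under stated assumptions: the class and function names are hypothetical, and the rotation is parameterized by Z-Y-Z Euler angles, which is one common choice and not necessarily the one used in the lecture.

```python
import math
from dataclasses import dataclass

@dataclass
class Superquadric:
    # 3 size + 2 shape + 3 rotation + 3 translation = 11 parameters
    a1: float = 1.0
    a2: float = 1.0
    a3: float = 1.0
    e1: float = 1.0
    e2: float = 1.0
    phi: float = 0.0    # Z-Y-Z Euler angles in radians (assumed convention)
    theta: float = 0.0
    psi: float = 0.0
    px: float = 0.0     # position of the SQ origin in world coordinates
    py: float = 0.0
    pz: float = 0.0

def rot_z(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_y(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def world_to_sq(sq, point):
    """Express a world-coordinate point in the superquadric-centered frame."""
    r = matmul(matmul(rot_z(sq.phi), rot_y(sq.theta)), rot_z(sq.psi))
    d = (point[0] - sq.px, point[1] - sq.py, point[2] - sq.pz)
    # The inverse of a rotation is its transpose: x_sq = R^T (x_world - p)
    return tuple(sum(r[k][i] * d[k] for k in range(3)) for i in range(3))

sq = Superquadric(phi=math.pi / 2, px=1.0, py=2.0, pz=3.0)
print(world_to_sq(sq, (1.0, 3.0, 3.0)))
```

In this example the superquadric is rotated by 90 degrees about the world z axis, so a world point one unit along y from its origin lies on its local x axis.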

Issues

Domain: Suitable mainly for categorization.
Problems:
  Extracting parts from the image is often difficult and unreliable.
  Many objects cannot be distinguished by their part structure only.
  Metric information is essential in many cases.
