Post on 20-Dec-2015
Recognition by Parts
Visual Recognition
Lecture 12
“The whole is equal to the sum of its parts” Euclid
Main approaches to recognition:
Pattern recognition
Invariants
Alignment
Part decomposition
Functional description
Recognize!
“One of the most interesting aspects of the world is that it can be considered to be made up of patterns. A pattern is essentially an arrangement. It is characterized by the order of the elements of which it is made rather than by the intrinsic nature of the elements”
Norbert Wiener
Nonsense Object
The description reflects the workings of a representational system
Segmentation at regions of deep concavity
Parts are described with common volumetric terms
The manner of segmentation and analysis into components does not depend on our familiarity with the object
Issues
Why parts? Why partition the shape?
How does the visual system decompose shapes into parts?
Are parts chosen arbitrarily by the visual system?
How are the 3D parts of an object inferred from its 2D projection delivered by the eye?
Etc.
Between Speech and OR
Number of categories rivals the number of words that can be identified from speech
Speech perception: by identification of primitive elements – phonemes
Small set of primitives (English 44) each with a handful of attributes
The representational power derives from combinations of the primitives
OR – The Visual Domain
Primitives – modest number of simple geometric components
Generally convex and volumetric (cylinders, blocks, cones, etc.)
Segmentation at regions of sharp concavity
Primitives derive from combinations of a few qualitative characteristics of the edges in the 2D image (straight vs. curved, symmetry, etc.)
These particular properties of edges are invariant over changes in orientation and can be determined from just a few points on each edge
Tolerance for variations of viewpoint, occlusion, noise
The representational power derives from the enormous number of combinations
Count VS. Mass Noun Objects
Categorization of isolated (unanticipated) objects
Modeling is limited to concrete entities with specified boundaries
Mass nouns (water, sand) do not have a simple volumetric description and are identified differently, primarily through surface characteristics (texture, color)
Unexpected Object Recognition
Is possible (not an obvious conclusion)
Can be done rapidly
When viewed from novel orientations
Under moderate levels of visual noise
When partially occluded
When it is a new exemplar of a category
Resulting Constraints
Access to mental representation should not be dependent on absolute judgment of quantitative detail
The information that is the basis of recognition should be relatively invariant with respect to orientation and modest degradation
Partial matches should be computable
RBC: Recognition-By-Components
The contribution: a proposal for a particular vocabulary of components derived from perceptual mechanisms, and an account of how an arrangement of these components can access a representation of an object in memory
Issues
Stages up to and including the identification of components are assumed to be bottom-up
It is likely that top-down routes (e.g. from expectancy, object familiarity, scene constraints) will be observed at a number of the stages (e.g. segmentation, component definition, matching) – omitted in the interests of simplicity
Matching of the components occurs in parallel
Partial matches are possible (degree of match is proportional to the similarity in the components between image and representation)
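The partial-match idea above can be sketched as a score that grows with the number of components shared between the image and a stored model. This is only an illustration: the function `match_score`, the toy model database, and the component labels are hypothetical, and the sketch ignores the relations among components that RBC also requires.

```python
from collections import Counter

def match_score(image_parts, model_parts):
    """Fraction of the model's components that also appear in the image."""
    image_counts = Counter(image_parts)
    model_counts = Counter(model_parts)
    # Multiset intersection: each component counts at most as often as it occurs in both.
    shared = sum((image_counts & model_counts).values())
    total = sum(model_counts.values())
    return shared / total if total else 0.0

# Hypothetical model database: object name -> list of component labels.
models = {
    "mug": ["cylinder", "curved_cylinder"],
    "lamp": ["cone", "cylinder", "block"],
    "suitcase": ["block", "curved_cylinder"],
}

# A partially occluded lamp: the block-shaped base is hidden.
image = ["cone", "cylinder"]

scores = {name: match_score(image, parts) for name, parts in models.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

Each model is scored independently of the others, so the loop over `models` could run in parallel, in the spirit of the parallel matching assumed above.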
Geons - Units of Representation
Segmentation into separate regions at points of deep concavity (particularly at cusps where there are discontinuities in curvature)
Transversality – paired concavities arise whenever convex volumes are joined
Each segmented region is approximated by one of a possible set of simple components = geons (geometrical ions)
Can be modeled by generalized cones: volume swept out by a cross section moving along an axis
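A generalized cone can be sketched by sweeping a cross section along an axis and letting its size vary with position. The example below is a minimal illustration under simplifying assumptions that are not from the slides: the axis is a straight segment along z, the cross section is a circle, and the function name `generalized_cone` and its parameters are hypothetical.

```python
import math

def generalized_cone(axis_length, radius_at, n_axis=5, n_around=8):
    """Sample surface points of a generalized cone.

    The axis runs along z from 0 to axis_length; radius_at(t) gives the
    radius of the circular cross section at relative position t in [0, 1].
    """
    points = []
    for i in range(n_axis):
        t = i / (n_axis - 1)
        z = t * axis_length
        r = radius_at(t)
        for j in range(n_around):
            phi = 2 * math.pi * j / n_around
            points.append((r * math.cos(phi), r * math.sin(phi), z))
    return points

# Constant cross section gives a cylinder; a shrinking one gives a cone.
cylinder = generalized_cone(2.0, lambda t: 1.0)
cone = generalized_cone(2.0, lambda t: 1.0 - t)
print(len(cylinder), len(cone))
```

Changing the cross-section shape, its size function, or the axis shape is what distinguishes one volumetric primitive from another.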
Geons
Are hypothesized to be simple, typically symmetrical volumes lacking sharp concavities (e.g. blocks, cylinders, spheres)
Can be differentiated on the basis of perceptual properties in the 2D image that are readily detectable and relatively independent of viewing position and degradation (e.g. good continuation, symmetry)
Objects can be complex – the units are simple and regular
Relations Among the Geons
The arrangement of primitives is necessary for representing a particular object
Different arrangements of the same components can lead to different objects
Perceptual Basis for RBC
Certain properties of edges in 2D are taken by the visual system as strong evidence that the 3D edges contain those same properties
Nonaccidental properties – would only rarely be produced by accidental alignments of viewpoint and object features
Five nonaccidental properties:
Collinearity – the edge in the 3D world is also straight
Curvilinearity – smoothly curved elements in the image are inferred to arise from smoothly curved features in the 3D world
Symmetry – the object projecting the image is also symmetrical
Parallelism
Cotermination
Nonaccidental Properties
Witkin & Tenenbaum (1983): a surface’s silhouette can override the perceptual interpretation of the luminance gradient
Penrose Impossible Triangle
Cotermination – accidental alignment of the ends of noncoterminous segments
Muller-Lyer Illusion
Y, arrow, and L vertices allow inference as to the identity of the volume in the image
Generating Geons from GC
The primitives should be rapidly identifiable and invariant over viewpoint and noise
Differences among components are based on differences in nonaccidental properties
Variation over the nonaccidental relations of four attributes of GC generates a set of 36 geons
Geon Set
The characteristics of the cross section: shape, symmetry, constancy of size along the axis (2 x 3 x 3)
The shape of the axis (x 2)
[Figures 6 and/or 7 here]
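The count of 36 follows from freely combining the four attributes: 2 x 3 x 3 x 2 = 36. The sketch below enumerates the combinations. The slide gives only the number of values per attribute; the value labels used here follow the usual presentation of Biederman's RBC theory and should be read as assumptions.

```python
from itertools import product

# Value labels are assumptions; only the counts (2, 3, 3, 2) come from the slide.
CROSS_SECTION_EDGES = ("straight", "curved")
CROSS_SECTION_SYMMETRY = ("rotational_and_reflective", "reflective", "asymmetric")
CROSS_SECTION_SIZE = ("constant", "expanded", "expanded_and_contracted")
AXIS_SHAPE = ("straight", "curved")

geons = list(product(CROSS_SECTION_EDGES, CROSS_SECTION_SYMMETRY,
                     CROSS_SECTION_SIZE, AXIS_SHAPE))
print(len(geons))

# One familiar combination: a curved, fully symmetric cross section of
# constant size swept along a straight axis is a cylinder.
cylinder = ("curved", "rotational_and_reflective", "constant", "straight")
print(cylinder in geons)
```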
Nonaccidental 2D Contrasts Among Geons
The values of the 4 attributes can be directly detected as differences in nonaccidental properties, e.g.:
Cross-section edges and curvature of the axis – collinearity or curvilinearity
Constant vs. expanding size of the cross section – parallelism
Specification of the above is sufficient to uniquely classify a given arrangement of edges as one of the 36 geons
More Distinctive Nonaccidental Differences
The arrangement of vertices – a richer description
RBC - Summary
A specific set of primitives is derived from a small number of independent characteristics of the input
The perceptual system is designed to represent the free combination of a modest number of primitives based on simple perceptual contrasts
Geons are uniquely specified from their 2D image properties ( -> 3D object-centered reconstruction is not needed)
The input is mapped onto this modest number of primitives. Then using a representational system we can code and access free combinations of these primitives
RBC – General Principles
A line drawing which represents discontinuities is an efficient description and sufficient for primal access
Objects are better represented and analyzed by decomposing them into their natural components – parts
A qualitative description of the components is necessary and sufficient to permit fast access to DB of object models
Non-accidental instances of viewpoint invariant features in the 2D line drawing are sufficient to permit fast access to the qualitative model of a 3D object
Primal access for visual OR is obtained by matching a description of the spatial structure of components making up the object to an indexed DB of models in similar representation
RBC – Computational Hypotheses
Five specific classes of 2D line groupings are sufficient to access the parts representation
Segmentation should happen at concavities in the outline of an object
The geons form an efficient qualitative shape representation for the parts which is suitable for primal access
The symbolic description for objects and models should include geon labels, aspect ratios, and relative sizes of parts
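The segmentation hypothesis can be illustrated on a polygonal outline, where a concavity shows up as a reflex vertex. This is a simplified sketch rather than the method of any of the cited systems: it assumes a simple polygon given in counter-clockwise order, and the function name `concave_vertices` is hypothetical. Real outlines are curved, so an implementation would look for strong negative minima of curvature instead.

```python
def concave_vertices(polygon):
    """Indices of concave (reflex) vertices of a simple polygon.

    The polygon is a list of (x, y) tuples in counter-clockwise order.
    """
    n = len(polygon)
    result = []
    for i in range(n):
        ax, ay = polygon[i - 1]
        bx, by = polygon[i]
        cx, cy = polygon[(i + 1) % n]
        # z-component of the cross product of the incoming and outgoing edges.
        cross = (bx - ax) * (cy - by) - (by - ay) * (cx - bx)
        if cross < 0:  # right turn on a counter-clockwise outline
            result.append(i)
    return result

# Outline of a narrow block standing on a wide block (inverted T).
outline = [(0, 0), (4, 0), (4, 1), (3, 1), (3, 3), (1, 3), (1, 1), (0, 1)]
cuts = concave_vertices(outline)
print([outline[i] for i in cuts])
```

The two concave vertices found on this outline are paired, as transversality predicts where two convex volumes join, and the line between them separates the two blocks.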
Implementations
PARVO - Bergevin and Levine 1988
OPTICA – Dickinson, Rosenfeld, Pentland 1989
Munck-Fairwood 1991
Pentland and Sclaroff 1991
Raja and Jain 1992
Example - Recovering Geons using Superquadrics
Lamé curves (1818):
$$\left(\frac{x}{a}\right)^{m} + \left(\frac{y}{b}\right)^{m} = 1$$
Superellipse (Hein 1960)
$$m = \frac{p}{q} > 0$$
where p is an even positive integer and q is an odd positive integer
Superellipse
From star-shape to a square in the limit
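The change of shape with the exponent can be checked numerically. The sketch below evaluates the Lamé equation with absolute values, which agrees with the form using an even p and odd q and avoids fractional powers of negative numbers in floating point. The parametrization by an angle t and all function names are assumptions made for illustration.

```python
import math

def superellipse_value(x, y, a, b, m):
    """Left-hand side of the Lame equation; equals 1 on the curve."""
    return abs(x / a) ** m + abs(y / b) ** m

def signed_pow(base, exponent):
    """|base| ** exponent with the sign of base restored."""
    return math.copysign(abs(base) ** exponent, base)

def superellipse_point(t, a, b, m):
    """Point on the curve for the angle parameter t."""
    return (a * signed_pow(math.cos(t), 2 / m),
            b * signed_pow(math.sin(t), 2 / m))

# m = 2/3 gives a star shape, m = 2 an ellipse, large m approaches a rectangle.
for m in (2 / 3, 2, 50):
    x, y = superellipse_point(math.pi / 4, 1.0, 1.0, m)
    print(f"m = {m:.2f}: diagonal point = ({x:.3f}, {y:.3f})")
```

With a = b = 1 the diagonal point moves from roughly 0.35 for m = 2/3, through about 0.71 for the circle, to about 0.99 for m = 50, which is the corner of the limiting square.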
Superellipsoid
3D surface is obtained by the spherical product of two 2D curves
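$$\mathbf{r}(\eta, \omega) = \begin{bmatrix} a_1 \cos^{\varepsilon_1}\eta \, \cos^{\varepsilon_2}\omega \\ a_2 \cos^{\varepsilon_1}\eta \, \sin^{\varepsilon_2}\omega \\ a_3 \sin^{\varepsilon_1}\eta \end{bmatrix}$$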
[Figure: superellipsoid shapes for e1 ∈ {0.1, 1, 2} and e2 ∈ {0.1, 1, 2}]
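A minimal sketch of the superellipsoid surface follows, using the standard parametrization with size parameters a1, a2, a3 and shape exponents e1, e2. The angle ranges, the signed power used for negative cosines and sines, and the implicit inside-outside function are standard in the superquadric literature but are not stated on the slides, so treat them as assumptions.

```python
import math

def signed_pow(base, exponent):
    """|base| ** exponent with the sign of base restored."""
    return math.copysign(abs(base) ** exponent, base)

def superellipsoid_point(eta, omega, a1, a2, a3, e1, e2):
    """Surface point for eta in [-pi/2, pi/2] and omega in [-pi, pi)."""
    x = a1 * signed_pow(math.cos(eta), e1) * signed_pow(math.cos(omega), e2)
    y = a2 * signed_pow(math.cos(eta), e1) * signed_pow(math.sin(omega), e2)
    z = a3 * signed_pow(math.sin(eta), e1)
    return x, y, z

def inside_outside(x, y, z, a1, a2, a3, e1, e2):
    """Implicit function: 1 on the surface, < 1 inside, > 1 outside."""
    xy = abs(x / a1) ** (2 / e2) + abs(y / a2) ** (2 / e2)
    return xy ** (e2 / e1) + abs(z / a3) ** (2 / e1)

# The exponent values from the figure: 0.1 is box-like, 1 is ellipsoidal,
# 2 is pinched (octahedron-like when both exponents are 2).
for e1 in (0.1, 1, 2):
    for e2 in (0.1, 1, 2):
        p = superellipsoid_point(0.4, 0.7, 1.0, 1.0, 1.0, e1, e2)
        print(e1, e2, [round(c, 3) for c in p])
```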
Superquadrics
Barr 1981 – extension to include superhyperboloids (of one or two pieces) and supertoroids
Superquadrics in General Position
From world coordinates to SQ-centered coordinates (11 DOF)
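The 11 degrees of freedom are usually counted as three sizes (a1, a2, a3), two shape exponents (e1, e2), three rotation angles, and three translation components. The sketch below maps a world point into the superquadric-centered frame and evaluates the implicit function there. The ZYZ Euler-angle convention, the parameter ordering, and all function names are assumptions made for illustration.

```python
import math

def rot_z(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_y(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rotation(phi, theta, psi):
    """Assumed ZYZ Euler convention: R = Rz(phi) Ry(theta) Rz(psi)."""
    return matmul(matmul(rot_z(phi), rot_y(theta)), rot_z(psi))

def sq_to_world(point, pose):
    """Rotate an SQ-centered point, then translate it into the world frame."""
    phi, theta, psi, px, py, pz = pose
    R = rotation(phi, theta, psi)
    rotated = [sum(R[i][k] * point[k] for k in range(3)) for i in range(3)]
    return (rotated[0] + px, rotated[1] + py, rotated[2] + pz)

def world_to_sq(point, pose):
    """Undo the translation, then undo the rotation."""
    phi, theta, psi, px, py, pz = pose
    R = rotation(phi, theta, psi)
    d = (point[0] - px, point[1] - py, point[2] - pz)
    # R is orthonormal, so its transpose undoes the rotation.
    return tuple(sum(R[k][i] * d[k] for k in range(3)) for i in range(3))

def inside_outside(point, shape):
    """Implicit function in the SQ-centered frame: 1 on the surface."""
    a1, a2, a3, e1, e2 = shape
    x, y, z = point
    xy = abs(x / a1) ** (2 / e2) + abs(y / a2) ** (2 / e2)
    return xy ** (e2 / e1) + abs(z / a3) ** (2 / e1)

shape = (1.0, 2.0, 3.0, 1.0, 1.0)        # a1, a2, a3, e1, e2
pose = (0.3, 0.5, -0.2, 4.0, 5.0, 6.0)   # phi, theta, psi, px, py, pz
world_point = sq_to_world((1.0, 0.0, 0.0), pose)
print(inside_outside(world_to_sq(world_point, pose), shape))
```

Fitting a superquadric to range data then amounts to searching these 11 parameters for values that bring the implicit function close to 1 at the measured points.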
Issues
Domain: Suitable mainly for categorization.
Problems:
Extracting parts from the image is often difficult and unreliable.
Many objects cannot be distinguished by their part structure only.
Metric information is essential in many cases.