representation and matching pictorial...

26
67 IEEE TRANSACTIONS ON COMPUTERS, VOL. c-22, NO. 1, JANUARY 1973 [4] K. D. Senne and R. S. Bucy, "Digital realization of optimal discrete-time nonlinear estimators," in Proc. 4th Annu. Princeton Conf. Syst. Sci., 1970. [5] R. S. Bucy and K. D. Senne, "Digital synthesis of non-linear filters," Automatica, vol. 7, pp. 287-298, 1971. [6] E. C. Tacker and T. D. Linton, "Digital and hybrid simulation of a discrete-time optimal nonlinear filter," in Proc. 4th Hawaii Int. Conf. Syst. Sci., 1971, pp. 465-467. [7] -, "Digital and hybrid simulation of a Bayes-optimal non- linear filter," Studies in Digital Automata, Louisiana State Univ., Baton Rouge, Air Force Office of Scientific Research Contract F-44620-68-C-0021, Tech. Rep. LSU-T-TR-40, Sept. 1970. [8] R. E. Mortensen, "Mathematical problems of modeling sto- chastic nonlinear dynamic systems," J. Statist. Phys., vol. 1, no. 2, 1969. [9] C. W. Helstrom, "Markov processes and applications," in Com- munication Theory, A. V. Balakrishnan, Ed. New York: McGraw-Hill, 1968. [10] R. S. Bucy, M. J. Merritt, and D. S. Miller, "Hybrid computer synthesis of optimal discrete nonlinear dynamic systems," in Proc. 2nd Symp. Nonlinear Estimation Theory and Its Applica- tions (San Diego, Calif.), 1971. [11] ,"Hybrid synthesis of the optimal discrete nonlinear filter," Stochastics, vol. 1, Jan. 1973. [12] D. S. Miller, Ph.D. dissertation, Dep. Elec. Eng., Univ. South- ern California, Los Angeles, 1972. Edgar C. Tacker (S'59-M'64) was born in Savannah, Tenn., on September 26, 1935. He received the B.S. degree (with distinction) in electrical engineering from the University of Oklahoma, Norman, in 1960, the M.S. degree in electrical engineering from New York University, New York, N. Y., in 1962, and the Ph.D. degree from the University of Florida, Gainesville, in 1964. IN 01!31,11~5~Q~N, ~0 His industrial experience includes two :1_ i 5 years as a Systems Engineer with Bell Tele- 8 phone Laboratories, Inc. He is presently an Associate Professor in the Departments of Electrical and Chemical Engineering, Loui- siana State University, Baton Rouge. He has developed graduate courses in applied func- tional analysis, systems science, and digital and hybrid computation at LSU and Michigan State University, East Lansing. His current research interests are in the areas of stochastic control theory and multilevel system theory, with emphasis on com- putational aspects. He has applied these theories to problems involv- ing the control of chemical processes and interconnected electrical energy systems. Dr. Tacker is a member of Tau Beta Pi, Pi Mu Epsilon, and Eta Kappa Nu. Thomas D. Linton was born in Frost, Ohio, on November 9, 1944. He received the B.S. i degree in electrical engineering from the Uni- versity of Arkansas, Fayetteville, in 1967, and i% the M.S. degree in systems science from Michigan State University, East Lansing, in1969. He is presently working toward the Ph.D. degree in the Department of Electrical Engi- neering, Louisiana State University, Baton Rouge. Mr. Linton is a member of Eta Kappa Nii and Phi Kappa Phi. The Representation and Matching of Pictorial Structures MARTIN A. FISCHLER AND ROBERT A. ELSCHLAGER Abstract-The primary problem dealt with in this paper is the following. Given some description of a visual object, find that object in an actual photograph. Part of the solution to this problem is the specification of a descriptive scheme, and a metric on which to base the decision of "goodness" of matching or detection. We offer a combined descriptive scheme and decision metric which is general, intuitively satisfying, and which has led to promis- ing experimental results. We also present an algorithm which takes the above descriptions, together with a matrix representing the in- tensities of the actual photograph, and then finds the described object in the matrix. The algorithm uses a procedure similar to dynamic programming in order to cut down on the vast amount of computation otherwise necessary. One desirable feature of the approach is its generality. A new programming system does not need to be written for every new description; instead, one just specifies descriptions in terms of a certain set of primitives and parameters. There ate many areas of application: scene analysis and descrip- tion, map matching for navigation and guidance, optical tracking, Manuscript received November 30, 1971; revised Mav 22, 1972, and August 21, 1972. The authors are with the Lockheed Palo Alto Research Labora- tory, Lockheed Missiles & Space Company, Inc., Paln Alto, Calif. 94304. stereo compilation, and image change detection. In fact, the ability to describe, match, and register scenes is basic for almost any image processing task. Index Terms-Dynamic programming, heuristic optimization, picture description, picture matching, picture processing, represen- tation. INTRODUCTION T llHE PRIMARY PROBLEM dealt with in this T paper is the following. Given some description of a visual object, find that object in an actual photo- graph. The object might be simple, such as a line, or complicated, such as an ocean wave, and the description can be linguistic, pictorial, procedural, etc. The actual photograph will be called the "sensed scene," a two- dimensional array of gray-level values, while the object being sought is called the "reference." This ability to find a reference in a sensed scene, or, equivalently, to match or register the images of two scenes, is basic for almost any image processing task. Application to such areas as scene analysis and descrip- tion, map matching for navigation and guidance, optical

Upload: others

Post on 30-Jul-2020

8 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Representation and Matching Pictorial Structurespeople.csail.mit.edu/torralba/courses/6.870/papers/...The Representation and Matching of Pictorial Structures MARTINA. FISCHLER AND

67IEEE TRANSACTIONS ON COMPUTERS, VOL. c-22, NO. 1, JANUARY 1973

[4] K. D. Senne and R. S. Bucy, "Digital realization of optimaldiscrete-time nonlinear estimators," in Proc. 4th Annu. PrincetonConf. Syst. Sci., 1970.

[5] R. S. Bucy and K. D. Senne, "Digital synthesis of non-linearfilters," Automatica, vol. 7, pp. 287-298, 1971.

[6] E. C. Tacker and T. D. Linton, "Digital and hybrid simulationof a discrete-time optimal nonlinear filter," in Proc. 4th HawaiiInt. Conf. Syst. Sci., 1971, pp. 465-467.

[7] -, "Digital and hybrid simulation of a Bayes-optimal non-linear filter," Studies in Digital Automata, Louisiana StateUniv., Baton Rouge, Air Force Office of Scientific ResearchContract F-44620-68-C-0021, Tech. Rep. LSU-T-TR-40, Sept.1970.

[8] R. E. Mortensen, "Mathematical problems of modeling sto-chastic nonlinear dynamic systems," J. Statist. Phys., vol. 1, no.2, 1969.

[9] C. W. Helstrom, "Markov processes and applications," in Com-munication Theory, A. V. Balakrishnan, Ed. New York:McGraw-Hill, 1968.

[10] R. S. Bucy, M. J. Merritt, and D. S. Miller, "Hybrid computersynthesis of optimal discrete nonlinear dynamic systems," inProc. 2nd Symp. Nonlinear Estimation Theory and Its Applica-tions (San Diego, Calif.), 1971.

[11] ,"Hybrid synthesis of the optimal discrete nonlinear filter,"Stochastics, vol. 1, Jan. 1973.

[12] D. S. Miller, Ph.D. dissertation, Dep. Elec. Eng., Univ. South-ern California, Los Angeles, 1972.

Edgar C. Tacker (S'59-M'64) was born in Savannah, Tenn., onSeptember 26, 1935. He received the B.S. degree (with distinction) inelectrical engineering from the University of Oklahoma, Norman, in1960, the M.S. degree in electrical engineering from New YorkUniversity, New York, N. Y., in 1962, and the Ph.D. degree fromthe University of Florida, Gainesville, in 1964.

IN 01!31,11~5~Q~N, ~0 His industrial experience includes two

:1_ i 5years as a Systems Engineer with Bell Tele-8 phone Laboratories, Inc. He is presently an

Associate Professor in the Departments ofElectrical and Chemical Engineering, Loui-siana State University, Baton Rouge. He hasdeveloped graduate courses in applied func-tional analysis, systems science, and digitaland hybrid computation at LSU and MichiganState University, East Lansing. His currentresearch interests are in the areas of stochastic

control theory and multilevel system theory, with emphasis on com-putational aspects. He has applied these theories to problems involv-ing the control of chemical processes and interconnected electricalenergy systems.

Dr. Tacker is a member of Tau Beta Pi, Pi Mu Epsilon, and EtaKappa Nu.

Thomas D. Linton was born in Frost, Ohio,on November 9, 1944. He received the B.S.

i degree in electrical engineering from the Uni-versity of Arkansas, Fayetteville, in 1967, and

i% the M.S. degree in systems science fromMichigan State University, East Lansing,in1969.He is presently working toward the Ph.D.

degree in the Department of Electrical Engi-neering, Louisiana State University, BatonRouge.

Mr. Linton is a member of Eta Kappa Nii and Phi Kappa Phi.

The Representation and Matching of Pictorial StructuresMARTIN A. FISCHLER AND ROBERT A. ELSCHLAGER

Abstract-The primary problem dealt with in this paper is the

following. Given some description of a visual object, find that objectin an actual photograph. Part of the solution to this problem is thespecification of a descriptive scheme, and a metric on which to basethe decision of "goodness" of matching or detection.We offer a combined descriptive scheme and decision metric

which is general, intuitively satisfying, and which has led to promis-ing experimental results. We also present an algorithm which takesthe above descriptions, together with a matrix representing the in-tensities of the actual photograph, and then finds the describedobject in the matrix. The algorithm uses a procedure similar to

dynamic programming in order to cut down on the vast amount ofcomputation otherwise necessary.

One desirable feature of the approach is its generality. A new

programming system does not need to be written for every new

description; instead, one just specifies descriptions in terms of a

certain set of primitives and parameters.There ate many areas of application: scene analysis and descrip-

tion, map matching for navigation and guidance, optical tracking,

Manuscript received November 30, 1971; revised Mav 22, 1972,and August 21, 1972.

The authors are with the Lockheed Palo Alto Research Labora-tory, Lockheed Missiles & Space Company, Inc., Paln Alto, Calif.94304.

stereo compilation, and image change detection. In fact, the abilityto describe, match, and register scenes is basic for almost anyimage processing task.

Index Terms-Dynamic programming, heuristic optimization,picture description, picture matching, picture processing, represen-tation.

INTRODUCTIONTllHE PRIMARY PROBLEM dealt with in this

T paper is the following. Given some description ofa visual object, find that object in an actual photo-

graph. The object might be simple, such as a line, orcomplicated, such as an ocean wave, and the descriptioncan be linguistic, pictorial, procedural, etc. The actualphotograph will be called the "sensed scene," a two-dimensional array of gray-level values, while the objectbeing sought is called the "reference."

This ability to find a reference in a sensed scene, or,equivalently, to match or register the images of twoscenes, is basic for almost any image processing task.Application to such areas as scene analysis and descrip-tion, map matching for navigation and guidance, optical

Page 2: Representation and Matching Pictorial Structurespeople.csail.mit.edu/torralba/courses/6.870/papers/...The Representation and Matching of Pictorial Structures MARTINA. FISCHLER AND

1EEE TRANSACTIONS ON COMPUTERS, JANUARY 1973

tracking, stereo compilation, and image change detec-tion is direct and obvious.There are two basic approaches to solving the image-

matching problem as defined above.If we possess a precise description of the noise and dis-

tortion process which defines the mapping between thereference and its image in the sensed scene, we can em-

ploy statistical decision theory techniques to derive an

image-matching procedure which optimizes some objec-tive criterion (e.g., minimum error, or minimum risk,in determining the best embedding of the reference inthe sensed scene). A typical outcome of such an analysisis the use of a correlation-like matching procedure. How-ever, in most practical problem situations, the requirednoise and distortion model is not available, nor is itfeasible to attempt to construct one. For example, we

might consider all human faces to be perturbed versionsof some single ideal or reference face. However, to at-

tempt to define completely a valid noise and distortionmodel for this situation would be a hopeless task.A second and more general approach to the image-

matching problem bypasses the need for a noise and dis-tortion model by accepting an embedding metric with-out requiring its justification as being equivalent to a

minimum error (or minimum risk) procedure. In fact,without a noise and distortion model, there is no the-oretically valid way to derive or predict the error per-

formance of a selected procedure prior to its actualapplication. Our primary concern in this paper is withthis latter case.

We offer the following two sets of criteria for an em-

bedding metric not theoretically derived on the basisof error performance. First, it must be successful inapplication (i.e., its observed error performance must

be acceptable), intuitively satisfying (so we can havesome confidence in its ability to deal with as yet untriedapplications), and general enough so that it can be em-

ployed over a wide range of problems without significantmodification. Second, it must be possible to specify a

computationally feasible decision algorithm which se-

lects a suitable embedding based on the given em-

bedding metric. (The combination of embedding metricand corresponding decision algorithm will be called an

embedding model.) In the remainder of this paper, we

will present an embedding metric and correspondingdecision algorithm, present examples of the applicationof this embedding model to a number of distinct prob-lem areas, and show how this embedding model is rele-vant to the problem of scene representation as well as to

image matching.

AN EMBEDDING METRIC

To introduce the generic form of our embeddingmetric, and establish its intuitive validity, let us firstconsider the following process. Assume that the refer-ence is an image on a transparent rubber sheet. Wemove this sheet over the sensed image and, at each pos-

sible placement, we pull or push on the rubber sheet to

get the best possible alignment between the referenceimage on the sheet and the underlying sensed image.We evaluate each such embedding both by how good acorrespondence we were able to obtain and by how muchpushing and pulling we had to exert to obtain it.

Let us now consider a discrete version of the aboveprocess which is both more precise and more reasonablefrom an implementation standpoint. In a specific appli-cation, we might have some information on the rangeof permissible distortions that can occur between thereference and sensed images. For instance, some subsetof the items appearing in the sensed image might alwaysretain their internal shape even though their relativepositions might be subject to change with respect to

their locations in the reference scene. Further, wherechange of relative position is possible, we might be ableto bound the extent of such change; and, finally, wemight like to assign variable "costs" to the differenttypes of change of relative position or relative change insome nongeometric attribute.To achieve these capabilities, we replace the rubber

sheet by a reference image which is composed of a num-ber of rigid pieces (components) held together by"springs." A rigid piece of the reference image can be assmall as a single resolution cell, or as large as the entirereference image, and corresponds to a single coherententity in the reference image. The springs joining therigid pieces serve both to constrain relative movementand to measure the "cost" of the movement by howmuch they are "stretched." (Typically, the springs willbe highly nonlinear in their behavior.) In determiningthe cost of an embedding, we measure the "tension" oneach spring (the tension can be a function of directionas well as stretch or even a relative change in somelocally defined attribute), and also make a local evalua-tion of how well each coherent piece is embedded as anindependent entity.The above model permits two interesting dichotomies.

The first dichotomy is the separation of "syntactic" and"semantic" information. The semantic information,which is application dependent, is embodied in thespecific partitioning of the reference into coherentpieces, the placement and cost functions assigned to thesprings, and the cost functions associated with the inde-pendent embedding of the coherent pieces. The syn-tactic information, which is relatively independent ofthe particular application, defines the class of descrip-tions which the algorithm can process. These data are

embodied in the limits set on reference decomposition(e.g., number and maximum size of pieces, etc.); in theformats which must be employed to specify the globalconstraints and costs; and in the form of the embeddingmetric which evaluates "global" fit. The separation ofsemantic and syntactic information is essential to per-mit application of the model to a broad range of prob-lem areas without the necessity of making significantchanges in the implementation.The second dichotomy is the separation between the

68

Page 3: Representation and Matching Pictorial Structurespeople.csail.mit.edu/torralba/courses/6.870/papers/...The Representation and Matching of Pictorial Structures MARTINA. FISCHLER AND

FISCHLER AND ELSCHLAGER: PICTORIAL STRUCTURES

local and global evaluation functions. The global evalu-ation function, associated with the relative positioningof the coherent pieces as described previously, hasstrong syntactic controls on its form to permit its inte-gration directly into the decision algorithm. This is im-portant because the global evaluation produces the mostsevere combinatorial problems. A local evaluation func-tion, associated with how well a given coherent piece isindependently embedded, is easily changed from prob-lem to problem (based on problem-dependent considera-tions) without requiring any change in the core algo-rithms. Thus, the form of a local evaluation functioncan be a (conventional) correlation function togetherwith a pictorial reference component, or a procedurebased on linguistic concepts together with a formaldescription of a reference component,' or even a seriesof guesses in'serted interactively by a human evaluator.The decoupling of the local evaluation functions fromthe core algorithms provides a great deal of flexibilityin' making changes or improvements in the evaluationfunctions for a given problem, as well as when switchingfrom problem to problem. Further, because of the aboveseparation, the performance of the algorithms (bothlocal and global) can be independently evaluated in adirect and intuitively obvious manner. Such an evalua-tion then permits iterative improvement in performanceby selective alteration in the problem-dependent options.We are now in a position to present formally the pro-

posed embedding metric. Let the reference be composedof p components (i.e., p coherent, or primitive, pieces).For 1 <i<p, let xi be a variable ranging over the setof all locations of the sensed scene. xi is defined to be thepostion of the ith component. Suppose there is a mech-anism, either a computer program, or possibly a person,or some mechanical device, which, for location xi of theith component, outputs a numerical value l1(x2) thatindicates how strongly the ith component fits at locationxi of the sensed scene. The smaller li(xi), the betterthe fit.While not formally required, the intent is that li(x2)

measure the presence of the ith component at a locationin the sensed scene independent of any knowledge of the.locations of the other components. That is, li(xi) is apurely local and possibly imprecise measure of the pres-ence of the ith component at location xi.

In addition to the purely local measure li, 1<i.p,there are the following considerations: 1) how well thedifferent components are situated in the required spa-tial relations to each other; and 2) how relative valuesof attributes of the components compare with the cor-responding measured values in the sensed image (e.g.,we might want to specify that the ith component bethicker and more greenish than the jth component). The

I Note that we are now further generalizinig the coincept of "com-ponent." ITt no longer has to be a rigid entity defined pictorially, butrather may be anv information structure or decision procedure whichcan be used to define a real-valued function whose domain of defini-tion is the set of all locations in the sensed image.

extent to which the above specifications are not satisfiedis reflected in the "stretching" of the springs between thecorresponding components.Each location in the sensed image can be associated

with a two-dimensional vector (e.g., the components ofthe vector can be the row and column number of thelocation in the sensed scene). In that case, xi-xj (usualvector subtraction) is a vector pointing from xj to xi.We can now let gij(xi, xj) =gij(xi-xj) be the cost associ-ated with the spring joining the ith and jth components.If there is no spring between these components, thengij is identically zero.

If we set gij(xi, xj) =lI(x) when i =j; and let Xi= {Xi, x2, * , xi }, then the total cost of embedding pcomponents at locations X, is G(Xp).

p i

G(Xp) = E E gij(xi, xi).i=i j-1

Expression (1) can also be written asp

G(Xp) = E hi(Xi)i=j

(1)

(2)

where

hi(Xi) gAjxi xj) .j-l

hi(Xi) can be thought of as the cost of embedding theith component at location xi, given that the previousi-1 components are at the locations specified by X2.

COMPUTATIONAL PROCEDURESIn this section of the paper, we will present computa-

tional procedures for locating a suitable embedding ofone image in another, based on the embedding metricjust presented. A discussion of dynamic programing(DP) is included to place our proposed algorithm [the"linear embedding algorithm" (LEA)] in proper per-spective. In particular, a generic (but computationallyimpractical) approach to solving the embedding prob-lem is some form of DP. The specific form of our em-bedding metric permits a simplification of the generalDP formulation, and the LEA is offered as a computa-tionally feasible approximation to this restricted DPformulation. A graph theoretic interpretation is in-cluded to provide a better intuitive appreciation of theLEA in relation to DP.

Let us assume that the sensed image, designated bythe abbreviation SM, is composed of M resolution ele-ments; while the reference, designated by the abbrevia-tion RM, is composed of P pictorially defined com-ponents (coherent pieces) with a total of N= niresolution elements, ni being the number of resolutionelements in the ith component.The most direct procedure for locating a best em-

bedding is to select combinationally N resolution ele-ments at a time from the SMT, determine if each sucl- se-lection satisfies the coherent (intracomponent) and

69

Page 4: Representation and Matching Pictorial Structurespeople.csail.mit.edu/torralba/courses/6.870/papers/...The Representation and Matching of Pictorial Structures MARTINA. FISCHLER AND

IEEE TRANSACTIONS ON COMPUTERS, JANUARY 1973

global (intercomponent) constraints, and, if acceptable,then evaluate the embedding metric for the given selec-tion. Obviously, such an approach is completely imprac-tical even for small pictures. For example, a 50X 30 SM(M = 1500) and a 5 X 5 RM (N = 25) would require morethan 1054 selections and evaluations, a hopelessly largenumber. If we assume that the coherent and relationalconstraints provide us with P = 6 nonoverlapping com-ponents, each component sequentially constrained tostay within w = 10 locations referenced to the locationof the previously placed component, then we would stillbe required to perform on the order of 1500 X 105= 1.5 X 108 evaluations. Assuming 10-3 s per evaluation,we would require 1.5 X 105 s or approximately two daysof computation time. It is thus obvious that a moreeffective technique is required.

Dynamic Programming

Expressed in formal terms, the evaluation of the em-bedding metric for a typical picture results in a non-linear, integer programming problem with local optimadifferent from the global optimum (i.e., no particularregularity, such as unimodality). The only availableclass of computational procedures for finding the globaloptimum under the above conditions (other than the ex-haustive search techniques discussed earlier) is usuallydesignated by the generic name dynamic programming.DP is a multistage or iterative optimization proce-

dure which can be described in general terms as follows(see [6] and [7]). We wish to find

min G(X) = E h(Xi) (3)X iEI

where X= {X1, X2, , xp4, each xi, 1<i<p, rangesover a set of vectors with discrete components, I= {1, 2, * , p}, and Xt is the set of those variables(among X) upon which hi depends. hi 1, i< P, area given set of real-valued functions.2

Let XiI be the number of variables in Xi, and leteach xi, 1 <i < p, range over M values. Then each com-ponent hi(Xt) of the cost function is specified by meansof a table with IXiI +1 columns and Mlx I rows.The solution proceeds as follows. We select a variable

y1EX and compute the following expressions (thisgives us the minimization of G with respect to yi):

fl(F(y1)) = min E hi(Xi) (4)Yl iEIl

y,*(r(yl))= the value of yi which minimizes expression (4) (5)

where r(yi) is the set of all those variables (except yi)which occur in any one of those Xi which contains yi.In other words, F(yi) is the set of variables which in-teracts with yi. Ih is the set of those i such that Xi con-

TheI{hi defined in (2) are independenit of all1 xj for j>i; the

Ihs) defined in the general DP formulation can be a function ofany xi.

tains yi. yl* is the optimizing assignment for yi as a func-tion of the variables of F(y').The operation described by (4) is called the elimina-

tion of the variable y, and results in the following trans-formation of (3):

(6)min G(X) = min Ff(r(y1)) + Ex X-1v1 L i[I-I11]

Expression (6) has the same form as (3), but does notcontain yi. Thus, we can find an optimal assignment forX be sequentially "eliminating" all of the variables, andthen tracing back through the stored tables of yi*(P(yi)),where P(yi) can only contain yj such that j>i. That is,we must eventually reach a point where, for some s,

s

U Ij = Ij=1

and expression (6) has the form

r(Ya)min G(X) = min Lf8(Y8+i, YP)]X (Y,8+1,s ' '' Yp I

From (7) we can directly determine the global mini-mum cost for G and also the optimizing values forY= (y., Ys+1, , yp). Given the value for Y, we candetermine the value of y,,1* from the stored table forYsl*(r(y,s_1)), as indicated in expression (5). This "back-ward" recursive process is continued to provide us withthe complete optimizing assignment for X.The computational feasibility of the DP approach de-

pends on storage and computing time requirements.For a given objective function (in our case, the embed-ding metric), storage and computing time requirementsare a function of the order in which the variables areeliminated [3], [7] and the dimensionality of each ofthe eliminated variables. We will say that variable ythas dimensionality F(yi) |, where F(y) is the numberof variables in the set L(yi) as defined following (4) and(5). For many of the problems we shall be concernedwith, the dimensionality will be essentially constantover all variables (i.e., a constant number of springsattached to each component and a symmetric intercon-nection topology) and relatively independent of orderof elimination. Let us associate the dimensionality ofthe variables with the embedding function itself em-ploying the designation D(G) to denote the maximumdimensionality of any of the variables to be eliminatedfor a given order of elimination.Where the dimensionality of all variables is constant

for a given order of elimination, the complete embeddingprocedure will involve the iterative application of (4)and (5) [p-D(G) ] times, to evaluate the p arguments ofG corresponding to the embedding of the p compo-nents. In the kth iteration,3 we compute and store a

The kth iteration evaluates the "cost" of embedding the kthcomponent at yk*, given some specific embedding of the componentsassociated with the variables in r(y). This evaluation corresponds tothe elimination of Yk.

70

(7)

Page 5: Representation and Matching Pictorial Structurespeople.csail.mit.edu/torralba/courses/6.870/papers/...The Representation and Matching of Pictorial Structures MARTINA. FISCHLER AND

FISCHLER AND ELSCHLAGER: PICTORIAL STRUCTURES

table forfk and yk*. The number of lookups required forthe construction of this kth table will be proportional tothe number4 of its rows- [AMD(G)j] and (for each row)the constrained number of feasible locations of the-vari-able being eliminated (denoted Wk, whereWk<W.f). Thuswe have a storage requirement of 2 MD(G) elements pertable (the two entries stored in each row of the table arevalues of fk and yk*), and a computation requirement ofup to MD(G)+1 lookups (with one or more additions andcomparisons per lookup).

If Wk=W for all k, and all variables have dimension-ality D(G), the complete embedding procedure then hasa computation requirement proportional to

[p - D(G)] (w) [MD (G)] lookups (8)transient (fast store) storage requirement propor-tional to

2. [MD(G)] entries (9)

and a static (secondary store) storage requirement (forthe "backward" selection of the yi after obtaining theminimum global cost) proportional to

[p - D(G)][MD(G)] entries. (10)

For the earlier example where M=50X30 =1500 loca-tions, (for 1 < k < P)Wk =W = 10 locations, and p = 6 com-ponents, if these components are connected in a linearsequence where each interior component lhas only onespring attached to each of its two immediate neighborsand the end components only one spring each, the num-ber of computations5 would involve 75 X 103 lookups andevaluations, or 75 s of computing time at 10-3 s percomputation. This is certainly a much more reasonablerequirement than the two days of computing time for adirect evaluation. We pay for this speedup6 by having afast storage requirement of 3X103 entries (or words),and backup storage requirement of 7.5 X 103 entries,versus a storage requirement of only a few entries forthe direct evaluation.Now, however, note that, if we permit just one addi-

tional spring per component (or even a single springlinking the first and last components), D(G) = 2 and thenumber of computations increase by a factor of almostM= 1500 to 9 X 107 lookups and evaluations, or 9 X 104 s=25 h. The fast storage requirement increases by afactor of Al= 1500 to 4.5 X 106 entries, and the backupstorage requirement increases to 9 X 106 entries.

This exercise demonstrates that, for even a small in-crease over unit dimensionality, the utility of dynamic-programming as a computational technique is question-able for the embedding task. The next subsection intro-duces a heuristic modification to the DP type of sequen-

Relational constrainits canl redtuce the number of feasible rowsto a value considerably below the unconstrained case. However, thecomipuitational problems, in attempting to take "advantage" of thisreduction, may be prohibitive.

I Assuming the variables are eliminated in the order in which thevappear in the linear sequence, then D(G) = 1.

6 Dynamic programming cani be considered to he a way of tradingstorage for compuitation time.

tial optimization which eliminates the growth of dimen-sionality as more global constraints (springs) are per-mitted.

Linear Embedding

The sequential embedding technique which we pre-sent in this section will be called the linear embeddingalgorithm (LEA). The essential property of this algo-rithm is its ability to locate a suitable embedding witha linear, rather than exponential, growth of storage andcomputing time requirements as a function of thenumber of components in the reference. The algorithmis formally described as follows.

Given a reference with p components, for 1<i<P,li(x.) is the externally supplied local evaluation arrayfor the ith component as a function of xi, its embeddedlocation. Given that the ith component is embedded inlocation xi, wi(xi) is the constrained set of feasible loca-tions for xi-,.

If y is the sequence (1, 3, 2, 5), then y * 8 will be an-other way of writing (1, 3, 2, 5, 8). Si(xi, . . xi)= L_=1 gij(xi, xj), Where each gij is an externally sup-plied spring array, specified as a function of the relativeembeddings of the ith and jth components. This decom-position of Si into a sum of two-place functions (springs)is not required in the following LEA. Thus, if desired,we could extend the scope of the embedding metric toinclude more complex relational forms without increas-ing the computational complexity of the embeddingalgorithm (LEA).The LEA is the computation, in order, of the follow-

ing sequence of 2p+2 equations. (Note that hi=si+Ii.)

gi(xs) = 11(x1)

y1(x1) = XI

(11)(12)

g2(X2) = llun [S2(Xl, X2) + 1)(X2) + gl(Xl)] (13)x1 Eu, (X2 )

y2(x.,) = y1(x1) * X2, where x1 is that value whichminimizes the previous equation.

gi(xi)' =

(14)

min [S,(yi-l(xi l) + x,) * li(xi)xi _ I ! )

+ gi-1(xi--)] (15)

yi(Xi) = yi-(xi1) * xi, where xi-, is that value whichminimizes the previous equation. (16)

gp(xp) = . . .

yp(xp) = . .

G = min gp(xp).xp

Y = YP(xP),

(17)(18)

(19)

where x1, is that value whichminimizes the previous equation. (20)

As in the subsection on dynamic programming, G[see (19)] is the total embedding cost, and the cor-

71

Page 6: Representation and Matching Pictorial Structurespeople.csail.mit.edu/torralba/courses/6.870/papers/...The Representation and Matching of Pictorial Structures MARTINA. FISCHLER AND

IEEE TRANSACTIONS ON COMPUTERS, JANUARY 1973

responding locations of the embedded components aredetermined from Y [see (20)].7

Given the restriction that the hi in (3) are independentof all xj forj > i [this is consistent with the embeddingmetric as presented in (2) ], then dynamic programmingcan be viewed as a procedure for finding the shortestpath through a graph. We define this graph in the fol-lowing way. The nodes of the graph are arranged in pcolumns, where each node in the ith column is labeledby the values of the variables corresponding to a uniqueset of embeddings for the first i components; there areas many nodes in the ith column as there are uniqueembeddings of the first i components. The length of abranch between a node in column i -1 (with label Xi1)and a node in column i (with label Xi) is hi(Xi). Wedelete branches with infinite length.To determine the shortest path from the root node

(a single node placed in column zero) to some node incolumn p, we can proceed as follows. In the ith column,for each node, sum all the branch lengths correspondingto the label of the node. We now determine that set ofvariables (xi) which do not appear in any hj for j> i,and call this set of variables Zi (the variables in Zi cor-respond to components in columns j<i which do nothave spring connections to components in columnsj>i). We now place the nodes in the ith column intosets, such that, for each set, the nodes are identical intheir labels except for the variables in Zi. For each suchset, we retain the node with the shortest path lengthfrom the root node, and delete all the other nodes in theset as well as those nodes in columnsj>i which branchfrom deleted nodes. It is this pruning process whichgives DP its computational advantage over completeenumeration. After the above set of operations is carriedout through the Pth column, we select the node definingthe shortest path through the tree, and thus the lowestcost embedding for the P components. The LEA differsfrom DP in that in the sequential determination of theshortest path, a maximum of only mi nodes will be re-tained in the ith column (mi is the number of permissiblelocations for embedding the ith component). In pro-cessing the ith column, the nodes are grouped into setssuch that, for each set, the nodes are identical in theirlabels for xi. For each such set we then proceed as in theDP case. Thus, at the ith iteration we save only the mi"best" current embeddings, such that every possible

7The yi defined in (16) clarify the presentation of the LEA. How-ever, in a computer implementation one need Inot calculate these yi,which are arravs, each element of which is a sequence of locations.Rather, it is only necessary to compute a more restricted funiction Zidefined by

Zi (Xi) = Xi-l

where xi- is that value which minimizes the previous equiationThese zi are arrays, each element of which is a single locatioii

rather than a sequence of locations. Then in the compuitations in-volving the Si, if Si depends on more than the two rightmost loca-tions of y_ (xi-,)*xi, the additional locations may be retrieved fromthe Zk, 1 <k<i-1.

positioning of the ith component occurs in one of theembeddings. This approximation technique may fail tofind the best embedding (shortest path) if the com-ponents with low indicies (small i) incur a high em-bedding cost when placed in their optimal locations.

If the components of the reference are linked by asingle chain of springs (which are then called primarysprings), DP and the LEA provide identical solutions.When additional springs are present, the LEA no longerassures the optimal embedding, and these additionalsprings are called hueristic springs (hueristic in thesense that while these additional springs provide moreinformation and thus give an intuitively better matchthan in the single chain case, the best possible use of thisadditional information is not always assured). Theoperation of the LEA is illustrated by the examplegiven in Fig. 1.

Additional Computation Speedup and StorageReduction Techniques

By slowing the growth of the computation and stor-age requirements to a linear function of the size (M) ofthe pictures, the LEA establishes itself as a feasible em-bedding procedure. However, because of the large pro-portionality constants, the practicality of employing theLEA will probably depend on the efficiency of the asso-ciated computer programing, and on the employmentof additional (second-order) speedup and storage reduc-tion techniques. Two of the more important speeduptechniques, applicable to both DP and the LEA, arediscussed below.The dynamic programming and LEA formalisms pre-

sented earlier do not explicitly consider constraints onthe variables (xi). The simplest way of treating suchconstraints is to introduce into the objective function(embedding metric) cost terms which become infinite ifthe relational constraints are violated. This approachhandles the problem by increasing the computation timeneeded to evaluate the additional cost terms. A muchmore desirable technique is to employ the explicit rela-tional constraints (the wi) to limit both the table sizeand search requirements by ignoring or eliminatinginfeasible variable combinations. This technique isillustrated in [1]- [3], and was considered in derivingexpression (8). When the relational constraints are im-plicit (i.e., must be computed from the given data), it isnot clear whether any advantage can be gained fromfirst converting them to explicit form, and then applyingthe above technique.Dynamic programming and the linear embedding

algorithm previously described can both be speeded up bythe following (type of branch-and-bound) technique.We examine a number of complete embeddings, andselect the lowest cost corresponding to any one of thesetrial embeddings. (The embeddings themselves couldhave been obtained by random placements of the corn-

72

Page 7: Representation and Matching Pictorial Structurespeople.csail.mit.edu/torralba/courses/6.870/papers/...The Representation and Matching of Pictorial Structures MARTINA. FISCHLER AND

FISCHLER AND ELSCHLAGER: PICTORIAL STRUCTURES

YSM

44 5 2 8

3 7 5 1 3

2 8 1 5 7

1 4 3 2 4

1 2 3 4 -_z

C1 (1, 1)

c2 (1,1)

C3 (1, 1)

C4 (1,1)

4Q-L C2

2 3

= CI =36C2 4

=C3 =5=C4 =3

I(z Y) = I SM(z,Y) Cifor I i - 4

Spring definition when (i, j) = (2, 1) or (i,j) - (4,3)

Xi - Xj =(Zi - Zj J _ yj) gi (xi- X)

1,0 0

2,0 1

otherwise

Spring definition when (i,j) - (4, 1) or (i,j) = (3,2)

xi-=xj- (zi- zi Yi Yj) g.i(Xi-x.)0,1 0

0,2 1

otherwise

(b)

Evaluation of g2

x2 x1 61 s2z2 Y2 z1Y1 I1 12 g21 92

24 14 1 2 0 3

3 4 1 4 1 4 1 6

2 4 4 4 0

4 4 2 4 4 4 1

3 4 2 4 0 6

2 3 1 3 1 1 0 2

3 3 1:3 1 3 1

2 3 1 3 0 4

4 3 2 3 1 1 1 3

3 3 5 1 0

22 1 2 2 3 0 5

3 2 1 2 2 1 1 4

2 2 5 1 0

4 2 2 2 5 3 1

32 1 3 0 4

Evaluation of 93

x3 x2 x s3

z3 33 z2 Y2 zlyl 13 g32 g2 g3

2 3 2 4 1 4 0 0 3 3

3 3 3 4 1 4 4 0 6 10

4 3 4 4 3 4 2 0 6 8

2 2 2 3 1 3 4 0 2 6

2 4 4 1 3

3 2 3 3 2 3 0 0 4 4

3 4 0 1 6

j4 2 4 3 2 3 2 0 3 5

4 4 2 1 6

2 1 2 2 2 0 5

2 3 1 3 2 1 2 5

3 1 3 2 1 2 3 0 4 7

3 3 3 1 4

4 1 4 2 3 2 1 0 4 5

4 3 1 1 3

Evaluation of g4 = G

x4 23 xl 94Z4y4 Z y33 1 Y

6g43 g41 S4 14 g3 G

1 3 2 3 1 4 0 0 0 4 3 7

3 3 1 4 1 0 1 4 10 15

2 3 3 3 1 4 0 2

4 3 3 4 1 2

3 3 4 3 3 4 0 0 0 2 8 10

1 2 2 2 1 3 0 0 0 5 6 11

3 2 2 3 1 5

2 2 3 2 2 3 0 0 0 2 4 6

4 2 2 3 1 0 1 2 5 8

3 2 4 2 2 3 0 2

1 1 2 1 1 3 0 1 1 1 5 7

3 1 1 2 1 0 1 1 7 9

21 31 12 0 0

41 32 1 0

31' 4 1 3 2 0 0 0 1 5 6

(c)

Fig. 1. An example illustrating the operation of the linear embedding algorithm. The definitions of x, gij, I, are given on pages

z and y are the components of x; that is, x = (z, y). (a) The sensed image. (b) The reference description. (c) Linear embedding algorithm.

73

x= (z Y)

1,42,43,44,41,32,33,34,31,22,23,24,21,12,13,14,1

SM (Zty)

5288751381574324

(a)

-[ (2, 3), (3, 3), (3, 2), (2, 2)]

-I (3, 2), (4, 2), (4, 1)(3, 1)]

Page 8: Representation and Matching Pictorial Structurespeople.csail.mit.edu/torralba/courses/6.870/papers/...The Representation and Matching of Pictorial Structures MARTINA. FISCHLER AND

IEEE TRANSACTIONS ON COMPUTERS, JANUARY 1973

ponents, or perhaps by someone guessing at what a"good" embedding might actually look like.) Now, em-ploying the LEA (or DP) to place and evaluate thecumulative cost of placement of the components se-quentially, we can eliminate from further considerationany of those placements whose costs exceed the boundestablished by our best trial embedding. It should benoted that this technique is valid only if the cost asso-ciated with the embedding of each component is non-negative. However, this can almost always be the casefor the class of problems we are discussing in this paper.A heuristic embellishment of the above branch-and-

bound technique would be to use some fraction (say[k/N]') of the bound at the kth stage of an N-stageprocess as the threshold for eliminating a possible se-quence of embeddings.

Scale and Rotation (S&R) ConsiderationsIn attempting to match or register two images, we

frequently are faced with the problem of unknown rela-tive scale and orientation. While such variations areconceptually indistinct fromn any of a host of unwantedvariations between the reference and the image, they(S&R) can serve as a vehicle for clarifying some im-portant issues pertaining to the way the LEA is em-ployed.As noted in earlier sections, the embedding process is

carried out at two levels. First, the components of thereference are searched for as independent entities. Theparticular processes by which these searches are exe-cuted are not a direct issue of concern here; the im-portant point is that, regardless of the search mecha-nism, the outcome of the search for any individual com-ponent is presented to the LEA by a tabulation calledthe local evaluation array [L(EV)A]. Each entry in theL(EV)A corresponds to a possible embedding in theimage of the associated reference component, indexed bythe variables used to define the embedding. The entryconsists of a number related to the probability that the-component is actually present at the "location" specifiedby the indexing variables, and each entry can also con-tain the values of attributes of the component as mea-sured at the indexed location. The LEA has no knowl-edge of the component beyond what is presented to itin the L(EV)A for that component. The purpose of theLEA is to integrate global or structural knowledge withthe information provided in the L(EV)A's to find thebest overall embedding (or embeddings) of the referencein the image. The acceptability of the final embeddingselected by the LEA will thus be dependent on tlhe qual-ity of the information presented in the L(EV)A's,where the extent of this dependence is related to therelative importance of local (component definition) ver-sus global (intercomponent) information for the particu-lar problem. Thus, it is the responsibility of the localevaluation function, in attempting to gather evidenceabout the presence of some given reference component,to be able to deal with the various noise and distortion

Fig. 2. Reference description of a squiare.

processes (such as S&R) which might be encountered.To the extent that these same noise and distortion pro-cesses affect the global or structural relationships be-tween the reference components, the LEA provides themachinery necessary to deal with the resulting problems.

Ability to deal with variations at the global level isaccomplished by defining "attributes" which measure(or estimate) these variations, and then making the"spring" parameters functions of these attributes.Thus, in the case of S&R, if a component Pi has scale

and rotation attributes Si and 01, the springs (vectors)attached to P1 would be (conceptually) scaled as a func-tion of S1, and rotated as a function of 01. The followingexample illustrates some of the above comments.

Problem

Given a two-dimensional region in which there are krandomly oriented and positioned line segments, findthe four-line segments which best approximate a square.Each line is specified by a four-tuple of the type (x, y,0, 1) where the x, y coordinates locate the center of thesegment, 0 specifies the orientation, and I the length ofthe segment. To simplify this example, we will ignorethe detection problem and assume that the given valuesfor each segment are known with probability one. Weassign a cost Ci for each unit of positional disparity be-tween the sides of a candidate square, and a cost C2 foreach degree of rotational disparity between the sides(i.e., sides should meet at right angles).

Solution

Consider a single "local evaluation array" consistingof the given list of four-tuples (x, y, 0, 1). We can con-sider x, y to be the "location" indexing variables, and0, 1 to be attributes. All entries have unit probability,entries with zero probability are deleted. The "descrip-tion" of a square is shown schematically in Fig. 2. Each(spring) vector is rigidly attached to its line segment ata fixed angle of 45°. Costs (C1) associated with (x, y) dis-parity between the end of a vector and the center of thenext line segment are circularly symmetric (i.e., the"cost" function increases with distance from the tip ofthe vector) about the tip of the vector. We also assess acost (C2) proportional to the difference in measured

74

Page 9: Representation and Matching Pictorial Structurespeople.csail.mit.edu/torralba/courses/6.870/papers/...The Representation and Matching of Pictorial Structures MARTINA. FISCHLER AND

FISCHLER AND ELSCHLAGER: PICTORIAL STRUCTURES

(attribute value) orientation of more or less than 900for sequential line segments.The general form of the LEA, with orientation ad-

justed springs, and spring costs augmented by attribute(S&R) differences, is adequate to deal with the prob-lem as posed. A minor difficulty arises from the fact thatthe orientation of a line segment is actually two valued(i.e., 0 and 0+1800). If we list each line segment twicein the local evaluation array, once for each of its twoorientations, then the LEA can be applied withoutmodification.We can handle squares of differing size by using the

line-segment-length attribute in the same manner as theorientation attribute (except that double entries are notrequired here). Spring stretching is augmented by acost proportional to the difference in length attributefor sequential line segments. That is, when the LEAexamines a new line segment as a possible additionalside for a square already partially formed, it comparesthe length of the new line segment with the length of theline segment in the partial square to which the new seg-ment will be attached. The spring between the new andthe old line segments then is stretched (over and aboveany stretching due to angular and positional disparity)by an amount proportional to the difference in theselengths.

In the above example note that, because each entryin the local evaluation array had either zero (oo cost)or unit (0 cost) probability associated with its occur-rence, the size of this array could be reduced to listingonly those few coordinate combinations associated withthe feasible (nonzero probability) occurrence of a com-ponent (line segment). This procedure can be used inother situations where it is reasonable to reduce all lowprobability entries in the local evaluation array to zero.

PICTORIAL REPRESENTATION

A central problem in much of the work concernedwith the computer processing of pictorial data is that ofrepresentation. Since we cannot manipulate the realworld object (itself) within the computer, we attempt toconstruct a representation (or model) which can be usedin place of the actual object and which has the following(somewhat overlapping) properties.

Complete: Any question of interest which could beresolved by reference to the actual object should also becapable of being resolved by reference to the represen-tation.

Compact: The representation should be free of infor-mation redundant to the purposes for which it will beused. This is necessary to minimize computer storagerequirements.

Transformable: Much of the information contained ina representation will be implicit rather than explicit inform. The ability to manipulate easily the representa-tion to extract required information is essential. Forexample, if we represent a picture by an intensity matrixor raster, then a count of the number of isolated objects

appearing in the picture would be implicit informationwhich could be extracted from the representation afterconsiderable processing. However, if the representationconsisted of the contours of the object appearing in thepicture, then the required count could be obtainedrather simply.

Incrementally Changeable: If we observe a slightchange in the real world object, it should be a relativelysimple and straightforward task to alter the representa-tion. Further, from the standpoint of image matching,a small change in the real world should require only asmall change in the representation.

Accuracy and Simplicity of Translation: Given a realworld object, it should be relatively simple to derive anaccurate representation of the object.Over the past ten years or so, much of the work con-

cerned with pictorial representation has been restrictedto the domain of line type drawings, and the use of for-mal linguistic methods (see [8 ]- [10 ]). Very little successhas been achieved in attempts to extend this work toscenes of terrains, cloud covers, human faces, etc.,which can only be described meaningfully in terms ofpicture components which are not line elements, butwhich are regions with colors, textures, shadings, etc.8

Perhaps the most serious failing of the linguistic (andsimilar) techniques occurs with respect to the "transla-tion" property. These techniques build a representationby constructing a hierarchy based on picture primitives,assembled into linear expressions employing specifiedrelational forms, and satisfying a set of syntax rules. Theproblem arises from the fact that (usually) the onlydirect correspondence between the actual object and itsrepresentation occurs at the level of the primitive ele-ments (typically points, intensities, and lines), while inpractice there are pieces of a picture that are too in-volved or complicated to describe in terms of theseprimitives. Theoretically, the description is possiblesince the matrix representing the picture is finite. How-ever, such a description would be so complicated that,aside from the difficulty of composing it, there would bea considerable likelihood of error and inaccurate repre-sentation.

In the previous portions of this paper we have pre-sented a representational scheme for pictures, and wereprimarily concerned with its application to imagematching. We will now show that the representationalscheme has wide general applicability, and avoids manyof the problems of the linguistic approaches. First wenote that it is a hybrid type of representation in that itinvokes symbolic (numerical) elements as well as allow-ing actual picture segments to be part of the representa-tion. The ability to intermix picture segments and sym-bolic data in the same representation greatly simplifies

8 Specialized systems have been developed, highly tailored tospecific problems, which are exceptions to this assertion; e.g., seeKelly [11]. A number of papers, including Bledsoe [15], [161 and

* Goldstein and Harmon [17], effectively consider the problem of faceidentification based on feature measurements obtained manually.

75

Page 10: Representation and Matching Pictorial Structurespeople.csail.mit.edu/torralba/courses/6.870/papers/...The Representation and Matching of Pictorial Structures MARTINA. FISCHLER AND

IEEE TRANSACTIONS ON COMPUTERS, JANUARY 1973

the translation problem. Where a pictorial concept isdifficult to describe symbolically, we can use an actualpiece of the picture as part of the description. A secondaspect of our representation that simplifies the transla-tion problem is the fact that the components (picturepieces, local evaluation arrays, etc.) and the relationalforms (springs) are two-dimensional rather than one-

dimensional entities. We thus avoid the problem of hav-ing to construct a one-dimensional model for a two-

dimensional structure.Let us now examine some of the other representational

attributes. Incremental changeability follows directlyfrom ease of translation (although the inverse relation-ship would not necessarily hold). Transformability withrespect to image matching has certainly been estab-lished; the fact that the representation is already in

two-dimensional form, with metric, geometrical, andtopological relationships explicitly and quantitativelyexpressed, implies that transformability for many otherapplications is more than adequate.

In many respects, compactness and transformabilityare antithetical since data in explicit form are usuallymore extensive than the equivalent implicit information.This is the case with our representation. It requires con-

siderably more storage than might be required for a

linguistic representation.The one area where the linguistic approach has an

obvious advantage is with respect to completeness. Alinguistic representation can treat nonpictorial informa-tion (e.g., relations between items in a picture and otheritems not visible, but perhaps implied or normallyassociated with the pictured items) in a way that wouldbe extremely difficult in our representation. This addi-tional capability could, of course, be achieved by ap-

pending the necessary linguistic machinery to our cur-

rent scheme, although the final result might well be a

mixture of two representations rather than one inte-grated representation.

EXPERIMENTAL RESULTS

In order to evaluate the practical implications of thetechniques presented in this paper, we have initiated a

program involving experiments on a variety of line type

drawings and gray-level imagery. Over 400 experimentshave already been performed with the following generalresults.

1) On well-defined imagery (i.e., relatively noise-freeand unblurred -pictures), the embeddings produced bythe LEA were almost always in agreement with the bestembedding as predetermined by human evaluators.Where the few deviations did occur, they were reason-

able, and usually related to the crude component de-scriptions employed.

2) On noisy imagery (course resolution, additiverandom and coherent noise), the fall in performanceparalleled the difficulty human evaluators had in locat-

ing suitable embeddings. Where the components werediscernible in the image, the embeddings were usuallycorrect; those components which were significantlyaltered by the noise were sometimes missed, but thesubstitution error was usually in close proximity to thecorrect embedding location, and, even in error, thecorrect embedding almost always had a score close tothe best score.

Since the programmed version of the algorithm is stillevolving, and some of the discussed features have notyet been implemented (e.g., the "attribute" feature isnot operational as yet), most of the experiments wereinformal in nature. However, for the purposes of thispaper, two sets of controlled experiments were run andare described below.

Image-Matching Experiments Using Faces

The majority of the experiments we have run to datehad human faces as their subject. Reasons for this selec-tion include the following.

1) The availability of a set of digitized gray-scalepictures containing faces.

2) A single reference (face) could be tested on all thefaces in the data set. In the case of, say, terrain pictures,a separate reference (or, at best, a unique compositionof standard reference components) is necessary for eachpicture.

3) Our familiarity with faces and their components(eyes, nose, mouth, etc.) facilitates evaluation of per-formance as noise and distortion are introduced.The data set used in the face experiments consisted of

15 human faces,9 both men (some with beards) andwomen, digitized to approximately 16 true gray levels,and each face typically was contained in a picture fieldof from 2000 to 3000 resolution elements. Using a refer-ence as shown schematically in Fig. 3(a), with com,ponents as described in Fig. 3(b) and (c), almost 300formal and informal experiments were performed. Ineach experiment, additive (truncated) Gaussian randomnoise with zero-mean and standard deviation of either 0,10, or 15 units was added to each resolution element(relative to a pseudogray scale of 64 units for the noise-free pictures). In some of the informal experiments,coherent noise consisting of randomly placed lines wasalso inserted [see Fig. 4(a)].With no more than two or three exceptions, when the

reference was restricted to hair, eyes, and sides of face,correct embedding was achieved. These results aregratifying in view of the simple component descriptionsemployed, and the equivocation displayed by the re-sulting L(EV)A's. (See examples shown in Fig. 4.)Two series of formal experiments were run on the

I These data were obtained from the Staiiford Artificial Intelli-gence Laboratory and are a subset of the data employed in the ex-periments described by Kelly [11].

76

Page 11: Representation and Matching Pictorial Structurespeople.csail.mit.edu/torralba/courses/6.870/papers/...The Representation and Matching of Pictorial Structures MARTINA. FISCHLER AND

FISCHLER AND ELSCHLAGER: PICTORIAL STRUCTURES

LEFTEDGE

A1P1I

77

V~7$J~O~I RIGHT

NOSE EDGE

MOUTH

(a)

VALUE(X)=(E+F+G+H)-(A+B+C+D)

Note: VALUE(X) is the value assigned to theL(EV)A corresponding to the location Xas a function of the intensities of locationsA through H in the sensed scene.

(b)

K K2=CONSTANTSa=(C+D+E+F)/4p=(A+B+G+H+I+J)/6

p-(X+F)IF [X<(a-K}) OR. a < /3)THEN VALUE(X)=yFK2ELSE VALUE (X) = y

(c)Fig. 3. Reference description of a face. (a) Schematic representation

of face reference, indicating components and their linkages.(b) Reference description for left edge of face. (c) Referencedescription for eye.

(noisy) face pictures using two references which in-cluded, but differed in, the nose/mouth definitions. Inthe first series, consisting of 90 experiments, there were

83 completely correct embeddings, and 7 partially incor-rect embeddings. The errors involved six experimentsin which the nose/mouth complex was offset by three to

four resolution cells from its ideal location, and one ex-

periment in which both the eyes and the nose/mouthcomplex were improperly placed. In the second series,consisting of 45 experiments, the placement of the nose/mouth complex was judged incorrect in 3 experiments,while all the other components were always correctlyembedded.

Analysis of the face experiments led to the followingconclusions. In spite of almost perfect performance inembedding the hair, eyes, and sides of the face, preciseplacement of the nose/mouth complex based on strictlylocal evaluation was almost impossible in some of thenoisy pictures due to loss of detail [e.g., see Fig. 4(b) ].With the attribute feature of the LEA not yet opera-

tional, and with the arbitrary decision to use binary(rather than multivalued) weights in the spring arrays

for these experiments, the LEA restricted the feasibleregion over which an optimum value could be selectedfor embedding the nose/mouth complex, but did notbias the selection as would genetally be the case. In thepresence of heavy noise, the simple nose/mouth descrip-

tions used in these experiments were not always ade-quate to produce a local optimum in the L(EV)A at ornear the ideal embedding location. (A three-resolutioncell deviation was considered an error.)

Image-Matching Experiments Using Terrain Scenes

Approximately 40 experiments have been performedusing terrain scenes (including both aerial and groundscenes). The object in each case was to create a relativelysimple description of some portion of the scene and thenattempt to find the proper embedding of the descriptionin the image (or some distorted or alternate view ofthe image).The descriptions employed two basic types of com-

ponents: 1) texture components, in which- the "texturevalue" of a point was defined as a crude statistical func-tion of the intensity values and gradients in some localregion surrounding the point; and 2) shape components,which were defined by collections of "edge" points hav-ing specified gradients.

Fig. 5(a) shows an example of a terrain (reference)description. Fig. 5(b) shows its successful embeddingrelative to the computer-stored version of the photo-graph of the actual terrain segment as shown in Fig.5 (c). Each coherent piece in reference 5 (a) is representedby several points enclosed by a dotted line. In this ex-

ample, the points of each enclosure of the reference com-

Page 12: Representation and Matching Pictorial Structurespeople.csail.mit.edu/torralba/courses/6.870/papers/...The Representation and Matching of Pictorial Structures MARTINA. FISCHLER AND

IEEE TRANSACTIONS ON COMPUTERS, JANUARY 1973

1l34567800123456b890li34567890123456789.,

234 4XMeeAl5 Zvoy"9U34X_6 a-119-z8flfaOsx-

7 1%5i2[mMCW*UIZ3A8 Z7iZ- = IZXXXWIO

c so@x + --Zos FLc A33Z7 +A333)11 134 ZI7=-- -+AOUUS=T2 X' --- - *Ass -13 '36AZ 1+.= -+XEUKRX14 MUA7ZI- *744 _8_A15 XISXZ XX1= -.++ielaus+f16 )BSXM"MIlX -A44XXZMUUU*17 -X14tXAM-+4Mf4AZ 1M3US18 XIE)XAI AT) Z1T?AI19 XfX=-I77ZZ +* 1+++=la 120 134) +1 )- - +7X+)Zi +YAL+ Zi ++ M IT22 )AXl= IAZX) -Z7)+-23 ZXZ).Xl ZI- -1Z=24 + AY771TT--+Z)25 MXZ XMA XXZI += 1 +

26 AAXZXI)++Il))X-

28 A@IAXI )1IZZZ1Z=2ZS; =Amfif4XX1 IIXXL1 X_

3 1 - Z A;A X X X Z =+ 132 1M -I IXXZ 1I) ) ++ 1+=33 4.XWM 1iI71)+ + + = +X IX)34 +1XAM8=XX===++--= -= -AAX.+=ZX1+++35 I

1234567890123456 78Q3 12345678CC12 34567803

Original picture.

12345676 90 12 345o79l012345678 901234567890

2 ZZZI7 Z2ZZ177X )==4++.. lZl71 l)})ll)111111113 ZZ)1)X17 1+.+1- - )1 1 714Z21+ +)1111Z4 2711) 1Z2)Z1X1 )XEf3e'A' -==+I11111)llllllllZ5 771 )1 1)IZI)14*O93R6Z )=11ZZZZ11zz71111Zf: Z2XZZZ1) 1 I++XWiEll8!iYRO42 +)1 1 Z ZZZZ I ZZIZ7 7ZlZA1AZ )1.+?A334343tILZI 7 I1)1111ZZP Z)I Z1114Xf1)I+Z-Tf'.MM3VY =1127IXAZ1II1; Zll 11 l}ZMlX) 1++)11 }+++X@!X1+)11XZIZZ112

10 Z71)l3110Zf))l--)111++)1§YX-))1Z2ZZZZl1 ZI7Z11 Z4.3ZII)1+1 1 )ld M=Z*lfAXl ZZZ))Z1177

12 Z7Z7ZZZX3MXZ4M++1+ I 1 I7Z9MX)I I IZ L+=+ )Z13 ZZl 17@ xwXIBX+-l)?II )APMXXXX1)++) I ZZ1 4 Z 1.1 I ) Z-P LPZ1 1EMA==1) A19P.HMXX1I4.+++MIZZ1 5 ZZI 1Z AAAMAXXX IMM +)- A&JugMeA)1+=+M4.1 1Z16 ZXZ7XXX9878XM^9iX'1+MEX4I.EX+++A3SFli17 Z1 177Z t7 3A 3IRSG X3PG3R9AA14Z-=AlER) 7Z ZZ18 Z))1)ZIX9147XMMXX 114BAA7XZMA1++M.P7)+ZI7lZ19 ZIZ1)11X?+1XXMXl43:4-lXX3*MM33m)4.4.7lZXZ2C 714XzX1X3Z+=+11++4)el.zt1 IMS@l1t)ZXlIXXZ21 ZA@1MM1PAA1X1IXA1)1)M14++l.M3UA+4)111ZlZX22 ZlXAZ1X)4.7x6+ZAX1='m43IXUXMA+-)XZII)1723 Z)77)1l1))ZXM1AA111=tIi)A+PAEm4++1Z)1)724 ZlZl) EIieXAXXt4MW17AOIB11=+i))ZlllZZIZ25 17ZZZ7))11XXiM1XM)tXMEt1!14I)+)S3i)+XX47ZAZ

27 ZZZ7X11ZM7MZXAAX) Z7ZZAZZMX)+)1M1+ IXXIZ28 Z7XZ)11 Z7ZZ7AfAZZX7Z 1XMXXAXZ7 )l 4+M41 )1111)Z29 ZZXXZI)) 11 XlZAAX414XXXAZXA1 +-9lX111Z30 ZAZI,1)+117XX4tMA6AAAM1 1.ZZXXXlZZI 111)))11231 ZZ11) )+ 1l1)XX4MI'A11IXA)111)AX++41Z111117Z3Z LL1l1 1 1=A4llt3AZ17)Z) l)14.Ag4I=+X+i rT)733 Z=++Z7AZ17) lZZZ+=IlZZI+)ZZZm1)+)))1)+ZZ34 l1X1fmA)A6A1l1XX1X)YXl1A@Al1Z9gXXM48X1ZZZ35 Z 3E33333339zU535333*1337 7

123456789012 3456 789012 3456789012 3456789V)4. 4..= - I=

2 e =-- 1+ )-Z + - += --+J =1 - + - 43 + -.7 Z _- 1+ -Z+_- -_) 4+ + +

4 1 ) 1+ +ABP3A 1 - ==5 1 1 ) ASM8R1E571-=Z 1 - +6 -4.) - =+ =P&BSPVSEeIA+ - =4++*I4. =7 ) 11) = -3'"IZA8*3333Z= 1 1 -)8 = 334)1 4. M4314.3 -AX - XI-9 = += + 1-143s. s =+

1C == A3 z- 1 ==++An -+- =

11 4. I 3F41.+1 I -I 8==6fA 14. 4. 412 +-+ I)X3341=8 - - = *316,8SmZ- =+13 I Z4ASE"=U*- I -31361=X - +14 = ZMNA+ 1-se Z 338ij38X +*515 1 = ZZ71*AZAA 33 I - 363A3- 3 =16 =X 1+Z+XBAZAl46Ml+A6AX)IIA8F - 33 =17 = 4+ Z =*fHM361 ZA0§U2 1MYA8f oil +4118 = + Z M"A+XAX+X 33)iZ)ZA+U)A *(V=--=19 ++7 =)?M 1)61A1-Il A+AF3Z343 zZ AAA Z4+ XEX4 +++ 1-N§A+8Z.89jU3- + A -z-21 =AAZAA+.A7X +M) ) 3)-ZZ7O+ --= )-2Z -=ZZ AAAAA+ ='1XI 3)1 Of 3 - 4.)-2 3 Z 4 1t46A81 1)AA13M 31 z24 == Z 1WAIA1AAAAA+333) + C) 1 - 1 425 + Z 1 +I11733)X MZ)364AA -3) Z + 11Z6 zz Y1 AZ+XIg+ fg+=ZAAA -33 -27 Z 7 IfSv+%ZA1 1Z)AA+l1M +128 7Z + 1-F4)7ZZl+ A.I)XXZ 33=29 Z)) =#SA ) )M AX14-Al+X +- 3+1-34 LZ 1 X+AGARZZXMZX7XI 1 I - -31 1Z =+ 7X+XAAX)++IXA - --32 Z I XA64)+= XIFWM7Z=+4 +++4 311 +433 XXM+7)l+XlZ) AO=1Z= +H10111 + = +34 = 7MAIX=MX) +111)1 Zt - 1MIX=1+M'X35 314XKVVE33fI1ffI3U§3XX3389AXkA*1m3I7MXM)7I

12345678901 ?345678Q012345678901234567890

Noisy picture (sensed scene) as used in experiment.

HAIR WAS LOCATED AT (6, 18)L/EDGE WAS LOCATED AT (18, 10)R/EDGE WAS LOCATED AT (18, 25)L/EYE WAS LOCATED AT (17,13)R/EYE WAS LOCATED AT (17, 21)NOSE WAS LOCATtD AT (22,18)MOUTH WAS LOCATED AT (24,17)

123456 7°9'l23456 7890 12 34567 89 C12 3456 78q0

L(EV)A for eye. (Density at a point is proportional toprobability that an eye is present at that location.)

(a)

Fig. 4. Examples of image-matching experiments using faces. (a) Successful embedding under coherent nioise.

78

Page 13: Representation and Matching Pictorial Structurespeople.csail.mit.edu/torralba/courses/6.870/papers/...The Representation and Matching of Pictorial Structures MARTINA. FISCHLER AND

FISCHLER AND ELSCHLAGER: PICTORIAL STRUCTURES

12345678901234567890 123456789012345678901

3-4) I lZAMAI+65__ _ _m_ if8mZ-6 s+A6ffISlSOO@SSG 17 366 I{f3E66ISII§@7 -"e""e*[email protected] AOI@8866eUSS640531X

10 _ v3U6G9SMAA8G6I66Z11 _ __ XKSfiIIItl}+-__+11166.__ .-__12 XflftfE+- +met68+13 AIIIE X)- 1M6410-14 AGOM +XI+15 .11Z8 + 1116 8IMAMPMX+ =))++Si6X17 -XG8MXI XAI 3 ZXLxIS+Is 8 } SSXAMMZXZ =IAXI=+Zii+iS 3M6ZlZl=Xl =I)1A20 A81=+==Z) IX21 4-61- 4Z4 Zt22 -0Z) +4- Al23 6X81 )Z- =M+24 MA1= -ZZ== 6e25 *XZ++..+r- ZA26 )32111)+= 1t27 4+X11X2++=4+ -+28 +XZ8113= ++29 lAXZ3l)'=1+) +

30 lAMAZI)=) 331 +XMPAZ1+-+)1=32 =43XAMA211134 4+33 += =IZXXZI+-- 134 + +XZ12Z4 lXZ35 1Z362)33+- +AXW-36 11MMAI- 1AM637 511AMX+ XX88-3e =1 zzz) ZZZZ

1234567890123456789012345678901234567890

Original picture.

* e* ee w *

ssss ww.si{ 06@ 1.*S~@ i*I66M6e@I3 Z346060( 'U Z 5t)IIIIUL(5I4OO0SW U Z10 1CDO I

4 0* MiNI*@ ii 0@l00

0167 116.* $6 ii B @ * 661* 11600I@@

9~ 0600 1M166M#ei#@ B@ R s86 *!k I3*I0i10C10 0 1ti{ i i 61 1 CEf81M111666M01184I0S6001S666

1~2 - f 56BM 0g 6 10 6B 0-3 0O101i9*ElE6Ol0014 _00010631(46* 116116000*11160069100i

16 1{{0 0v 16{i! 91017 00!* il@X 4l@

19 * E*IEH 8l*80M 6088@ESMI 2E(8601 00O

2C *6*66M66<fMeE 1M 886MIM

22 0O00OOPf HAA MAAAA A889MM 6§6866#@EM 8i1*1023. * {e 88 AM688 M 8eMA686A6881*OIO"24 'AA 8AX 8M MMMs X 48MAX 00Jt@25 6606@OAMMXZMAhZA 86I8PRA M@868 XAP-EMMI1IO

26 16*fI1AdAM88M X2ZX2XAAZAM8MAA6AAMA1MI*627 0*iAA MM(86068M 66fS E Q >M'.4M> A7@28 OOOOO0AAXXXZ1XAA4AAAXXAAAAAPAI'AMYM-E010126 600060A KXXXAMAX 2AA 2X 8XXXAAXX 2182IOOI1C

3CSOOS06XXXA AZXAA8X2XXZ XXXZX XAAKA8O0O00Ci#31 006000XX2ZX )ZA66228821 ZAXX23Z) ZZ10000032 *06601XAA8XZX1 4-33))1XAXAXI8131Z ZX

34 oeoooooooooioooooooefio.oe2esoeoooooo35 "*0000000611 *4*068§*66600W100010000

37 6f f i 4 lcl 0 00al38 *06006000016000*@@e*1e*06116060000000000

1-2

12345678'C123456T8901234567893 1234557890

L(EV)A for nose. (Density at a point is proportional

to probability that nose is present at that loca-

tion.)

1234567$Ol123456780 12345678901234567Q9qO1 )M = +-I -+ + _=2 = 3- Z- + = 1 3 - == - X3 z =1 + + -A- -+ m4 - X 1 - ZZ+Z 1ZiW IA 1 1= - = X +5 Z 1 lIZMSS4lZ= X = -L 3--e = 8X A@* 14 ifiVMM - 1 X)7 1- + 1MMXs1Ul1.IiHX Z * 1 +8 1Z ) 1flSA.MZMZ 0Se + Z == +

-c -Z3AZ =-AUSmaoaHx4MIx44zsws ~~~--+)- -~~JC ZM = ,,W81-MPLZAl4§.MlMX3 2Z 111 z- - f*fEfRI13IX) +796S50) - =X +L12 1) 1 1 At4oGN X K -- 4A )RaiSo - +

13 3 =X-Pf8XI8 A + + + iV@ X X8 +X M14 1 =ZiHA9ZfS 1 1 Z Z61M -A X +

15 A I+ *I,RXX X M+ I= afXM=+ = Z16 3 Z,Z+@Z48 84 66-) xz--17 Z7-iXA61=AL Z 68-ZI 3%6F3M + ) 1218s-+-4+---+}1ISZ@1@= -lX-- -Al 17-- I-1X19 1 +AEIlZ4-IA = XM 4X+ - + - a2c += xx81z+3= Z= Z=+)l XSA= --1 z2I e3 11mx I M 1 + =R+ 122 Z I 1*X3Ll-A 1 * 3X3 1X)3I23 11 +e-) e-i1 - 13M A 1) r24 * 13 WI) e 1A I CZ =4 + -

25 A 1638-Z + ) 3 6 Z1 - -126 = 1 X1ZA6 L= )ZI X++ 1 1- 3

217 1-2+) 3M+X84 I2T3 TT1 A 8 =+28 +- = 1X AB3 Z -ZI + A - I =29 2 14 --ZAM+Zl ++Z 3 1 + -A -X13C = = + AAXA- A ++ -= =-1 - I +

31 I- 1+-361 )= 3 +- +X=1 + =X32 - 1 3 M 10A6X1388XX M4 A =M+ I +=33 J A - +WXEM3 X8JZ1 +A= 1+134 1 §ZA= ZZXAZ + ZA=) +4- 4 14+435 1 A I -+ MAU2 Z ABA 136 + == M6U)) -36AA=1 +1 1+) X37 I+) 1= 1 1xAZ1 =-4A++A)) 138 P1 -= =36M 1=XZ 1 3= 1 =

12345678901234567S93 1234567a9C1234567890

Noisy picture (sensed scene) as used in experiment.

HAIR WAS LOCATED AT (8,21)L/EDGE WAS LOCATED AT (17, 11)R/EDGE WAS LOCATED AT (17,25)L/EYE WAS LOCATED AT (17,14)R/EYE WAS LOCATED AT (17,20)NOSE WAS LOCATED AT (21,16)MOUTH WAS LOCATED AT (23,16)

(b)Fig. 4 (continued). (b) Incorrect embedding of nose under random noise.

79

Page 14: Representation and Matching Pictorial Structurespeople.csail.mit.edu/torralba/courses/6.870/papers/...The Representation and Matching of Pictorial Structures MARTINA. FISCHLER AND

IEEE TRANSACTIONS ON COMPUTERS, JANUARY 1973

1234567890123456789012345678901234561 IlltIllI IIZtzIItZZZzIZZZZLZZlZZ2ZLLZLLZZLZZZLZZZAAXLZZZLZZZZLLZL..3 ZZZZZZZZZZZZZZZA§SSMXZLXZZZZXXXXX4 Z ZZZL1ZZL ZZXM36SSSS6AXZXXX1LXXXX5 ZZtZZtZZZZMtSSASSSSSq9XZZZZZZZZZe 1111ZZZ I I1A5I0I?61555365111@2Z11Z2Z7 i1lZZ1itIIeIe8SSSSSUIsSSmLUZZzzzz

.. Z LIZ IZLZZI2"4{{||@@X Z Z Z7LZU_._9 11)11)11) 1SSISSIS6ISUMAXA%VA

10t. *ISS6SSE3GMAX29q4+11 *1AV"691MAXXXXZZK81M12 .-1gUmxxxxxX~x1lzmgS9v=13 IIAlXAAAXXAAAAZ XSSS1J . ___.tSx XMM§eAX8(Xt M A I *.__IS +8SXMSSSZZFMSEMZAfM)16 I AMZlIXAX I XMMAI IZOMX.17 .. ... MMII ZIIIXLL1Iz MIIS -A2111Z)IXZZlZX*A+19 xOzzzzz xxxxxzztze20 -ZA4GAX XXL.AMAXXX XAX.21 SBXXXZAMAXAXXBI+22 XIAXMEMM§OMXMSS

-.-AXXAMMXAA%9X24 AIMAAMMAAMGOSSI25 )W XXAAMSSSL2L _ SSq88etSS,_

27 M.s sseSSSMM-28 *XASSSI0SSMXJM2S AXXAMESeM.AAXGI30 IMXXXAMMMAXXXK6Z31 -XAMXXAAMAAAAMISXt4A- ____ )XXAOMA AAMM AMMBe§eXXXj pa..33 1XXXXAHSMMMM8M6eMeMXZXXAA1I34 SZXAAXXXAIBMAMMMPMAAMMXZXZXXMM

12345678901234567890123456?890123456

Original picture.

123456789012345678901234567890123456I MPMMMMMMMMMMMMMMMMMM MMMMMMMMMMMMMM

-2"'MM MMM MMMMMMMMMNMMMMMMMMMMMMMMMMMM3 M4MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM4 PPMMMMM4MMMMMMMMMMMMMMMMMMMMMMMMMMM

-MWMIMMKMMMMMMMMMMMMMMMMMMMMMMP4IPMMM6 MNMMMMMMMMMMMMM4MMMMMMMMMMMMMMMMM7 MMMMMMAX 2 MSISMA SIEMeMZZ XMMMM4MS MMMMMMAMOeMMMMMMMAMISSO@SSOBQMMMM9 MMMMMMMSS$SMA XMIIA ZA8O9MMSSMMMMMM

10 MMMMMMSS8MAAAMeXXAXAAZ1IZAt1tMMMMMMTT1MgMMWAX<XZ± AxiSO*M0IXI0SSSMMMMMM12 MMMMMMAAMMUSAZXBSMXXAMSftBASSMMMMMM13 PMMMMMM*384l*JAAMMGRMMMMMAAZAMMMMMM14 PDMMM"MMAXASS8XXAII 11I2AM4XMMMMMM15 4MMPmMXX) 12LXXMMSMSSSIISSIM'MMM4Mm16 MNMMMMm8SgISMXM9MSMMAMXXM?M'PMMMMM*17 pHM^M9XXXM HAMAxXM@Jj,XLX*MMi4M?1W-16 MMMA4XMAAXXAAXMMXMXMXAMAMMMMM'419 MMMMMAMAMMMX2ZAXZXMX1ZXAAAXMFMMMM2C MMMMMMXZXXZZXXXMEM4MXAMAXXZMMMMMM21 MMMMMMAZXAZXAAMZZAZl2 1 IZMMAMIOMMMM22 MMMMMMXAAXXAAAXX2XXAM0XZIIZ2MMMMMM2124 MMMMMMtZtXAXIILZXIt)IIZAXZXZMMMMMM25 MMMMMZ22ZItIltjAMX2I -IZM2IOmMMMM26 mMMMMM)IJ1ZXMeAt+)XAXMAiZZXXMMMMMM2? MMMMMMAX X111IXAII+leeAZI1ZMMMMMM28 MMMMMMILZIIIIXAA+ 1+111t1 IMMMMM429RMh1TgMRI4MMmmmmmmmmmMMMMMMMMMmm30 MMMMMMMMMMMMMMMMMMMPMMMMMMMMMMMMM31 MMMNMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM32 MMMMMMMMMMMMMMMMMMMMFDMMMMMMMMMMMMM33 MMMMMNMMMMMMMMMMMMMMMMMMMMMMMMMMMM34 MMMMMMMMMMMMMMMMMMMMMNNMMMMMMMMMm

123456789012345678901234567890123456L(EV)A for nose. (Density at a point is proportional

to probability that nose is present at that loca-tion.)

123456769012345678901234567890123456I tX)XAlA0*++AIX)AZAZl *XOA9I XNA12 101 AtM. XaAXelSAA4ZAZ)-MZ -XOAX+l3 *l+lZX+MM1 +*Xf ZlX6MAAXXIAA9saX+4 )ItIAA )AA I I IE0 AA2AlI I*P+

- 5 - )4MJ1 tZAt 4JMUUX Z AGAUM+fl M-K-X-6 IZtXAXZ w8+M"WUUISSASUU4X Z0.@A)7 ++ XASI ZZSMSZS@SMASUIASASO8AMXI IAI -ZI-IcX ZXX@SPu"$LS0A"eAqf *MZlUl9 120 XMl SS*zlS)MAA4IUAtOzvSl x

10 x 1+ *SZ000MIOO@) 1924 + 1

12 1 ) I lU0S-XUMM-+9* Px *-13 +*A BeUsSZ*MMXXMSSSmi -

14 4A 4 SS-UM+91 U9)"0ag z15 * lLluullGKpSMgelzll+ * M16 + ) *+ )+ -SAMZ8 ++A9SIA *M-17 T FITA JlZ -rMMsx*40Z)@M1iu -13 + X I X)IAMIIM* S 1256X) 119 2 -++ +MX)+MASIXS+AXMM I I +20 -a 5 4 9XAXZAIXXI AXI I1K --21 a ) I E4GfXOM 9S18IRZMSA+ a I22 + Z2Z+ I XOA)@-)XOSRAU OM I2T a AllIXX A-a MO Al A124 +4 IAXMM§XMU+ZS it25 2Z + - X 1INZ6eAA6SeS1) +A *A- 6 -xIt-i MSM300@|SSS X 1

2? ' +X tl@|lelvlen I +28 + Z+ 1XXAMSSUuSLAM- It + -+

-2Xqxn- }-- T-FVZM-- I,11Z-69-- Y --3C .1+ 4e1XZiZA *IXZGXI+31 -+ II 1MIOMSU)X AlZLSst A 132 . A== A)llXxagAI2zX@AZL-1)1 A33 1 1 +M+AMelZ4XMS1+14G+XXMA-Z34 M+ +*XfMA4XM8ASXXSl*lXZ+t I ASS

1234567890 12345678901234567890123456

Noisy picture (sensed scene) as used in experiment.

HAIR WAS LOCATED AT (7, 23)b/EDGE WAS LOCATED AT (17,13)R/EDGE WAS LOCATED AT (17, 26)L/EYE WAS LOCATED AT (14,17)R/EYE WAS LOCATED AT (14,23)NOSE WAS LOCATED AT (20, 20)MOUTH WAS LOCATED AT (22, 20)

(c)

Fig. 4 (continued). (c) Successful embedding under random noise.

80

Page 15: Representation and Matching Pictorial Structurespeople.csail.mit.edu/torralba/courses/6.870/papers/...The Representation and Matching of Pictorial Structures MARTINA. FISCHLER AND

FISCHLER AND ELSCHLAGER: PICTORIAL STRUCTURES

1.

15

16~

213 C- 4n-F tvA A VJ%?08 AXYU3ZS XX X1 2 AXJIAWI5 A

23 I I IX Z-11+ - LASI+ ZXXXXAA*AX

22 8~4ALm-AX AAAVXI IIXA1 1)112XAXI It I X4A4ZL"A&

2,3 AAeAXXXX "XXAAAlXXZZ1 -43261.3ZXX'C1&11i251 X{4'AAAVX IX. 31R(V-X*)AA43319111121Z2

28AAXXX6AAXVt.XX2I3'14A+A**AXXXXXX4

32 XXXZ2ZXim+8II"4A1 3124*.AE4 AXXXXA

34 XXXXASS4 335 XXAB8*Ut )4?MA.AMAXXXXAS)ASUE-OX4?$M~4

12345618901234567890123456789C123456789

Original picture.

1234567h90 l234567e401234567PQ'1I2345t7R90I 1 XXIIXflX72ZlKZZZ111+++1333XA"AZZXXlII2 III XAAX3KXIZXL,17VAXZZIIIZX;4AAX)XXXXM1113 11 SAM. .AAAAXXMXA?A1AXXZZZA'44imAXAXAlll4 111 4.i.1AAAMXXXX 6XLXXXXXX7 XA6AI-AAXXIII

II !(3t4A14AXZXA6XXAXAXXA%XXAAXAAAMMM6m 11 1l1pomXAXZ%1ZY4Z1f( Ix0 X3AKZ 4zX!A*AIOJ1

7 111 XAe wXXXAAXAAXXAXXAXX?ZZZAAMl482Z1 18 1 IA M4A MAXZ XA!A#vAA KP0-AX1IZXlK4AA6Z111I

19) 11IIXXXXAMAP43UN-OIMHROOAXXAA1IXXX1III

12 111AMA XZA At fSU3anWSeno*#yjf1113 1 I" 1fffffRSLaG901i 4cmfMMA4M, Ak4 I 114 111w.t. A ft.*" 3 AAXXA I NA M U+f3.4MA I I

16 I11AXA4XAXZ2Z21131*3331IhA4qAAZZ2ZZII17 11 1IXAXXKA X . I+3 I))I + -ZA4P I 0AXXAAII1I16 II1XX)KA1AX2113,?)) 34 =1ZAWfR'4XAAAXA111IC 111AM-M'*3Wi6X11*I ))lZZlXMA'MAXAAM'4,11120 1 gdlYU84 4aAyMAFM P IA21 11164 A3.~~~A43 8A4M8~~~O@~AX6A

X Xvl1122 111XlAA6XXY''-AAX!lXMOIMAAIXAMAAft41123 1 1ZZIZAZXXZXX IL II izIxI3ZzX1ZAAXA$OMAvI I24 111X2ZX211Z1+14*1 3IlIl3lXXZX'A4AXX4II125 1 1AAVuAM A 1I)3I ==+++.4111IZ31Z XI 1 1 126 1 1l'4vi*am4YtXi II IZ I I... I)ZII Z X IZ I I27 1110SA"A4XlP)7 A AXX6Z I 3+313)1117 IZ X 11t23 1ll2X1Z213 )lAXLs %Z73+)4331112q 1112 1'? 3,)zZ,,XQ,4iiX.7Z 7Zl3I34443.13=1113r m1i13+1i7,4j)jZIzizKXXXX1l3+3+ +111l31 1lIZ3311X63vA461))11X6&P14171134= + il132 1171133 1114-) IXAAAAAZ+41ZIXXZZZxlxAZIZZZI111134 111 3)IZZXZXKZZZX3)1131)ZIIZZZZIXZXAIII35 11131171.433Ij11I111 3*4)ZXAAAAAIIII

I i3457 C~Q 1 2345il 7;'~1 I234''- -i~r 343= 7 i1

1ljj IZX1 -1 V !it Y t- # f :#. .3

2 &PE- zYVERYAE1, r- and1l

6

13~

14~

160 8§K--3Z4 0 " SI 1 as1 SX=X +

08 AN8x XE + i3 Ar11)FjV g+X-i8 fSl

4C ** X31

i1 Z XAZAAA3SA-"llFA = Z1Xa1= Ai:t- xz inA fXX M Af.Y-1A3Xl31 V*=l3AX1 XX XlA A'+

21 A7 (A -1 4= )- 4A

24 ) z PM 2=+ 1,.+ 7

25 3Il14E*f4X VX'+2l3 =+g6'31 +A XXK

26 IBP IB A=f IA Z X = L 11IXX1 X1I1+

27

28 1 X 1AII?3Z= MAUVAV*X 1 ) A Z2)1 33I A XX-'

2~~M I Z X3-ilUSyUSZ IXk4AA,6 14 X AX+

31 )Z9PXZZA ML*&S kMIAt,! A+,-'S8K+,Xi=X I1O) )+32 Afg AAPRII8AU )I).S i 8 A YX Cf~~N'-43?3 E2X-St. VAII 1A0~34 1ZIAI

35 f 118218e Li- x 1 Klj1V+-,X y38lUXwy-* 83O

1234567A'~012345A7SQ0123456789012345678Q

Noisy picture (sensed scene) as used in experiment.

HAIR WAS LOCATED AT (11, 21)

L/EDGE WAS LOCATED AT (25, 11)

R/EDGE WAS LOCATED AT (25, 24)

L/EYE WAS LOCATED AT (21, 15)

R/EYE WAS LOCATED AT (21, 21)

NOSE WAS LOCATED AT (26, 18)

MOUTH WAS LOCATED AT (29, 17)

1234567593 123456789012345678901~234567P?0

L(EV)A for hair. (Density at a point is proportionalto probability that hair is present at that loca-

tion.)

(d)

Fig. 4 (conitinued). (d) Successful embedding under random noise.

81

Page 16: Representation and Matching Pictorial Structurespeople.csail.mit.edu/torralba/courses/6.870/papers/...The Representation and Matching of Pictorial Structures MARTINA. FISCHLER AND

IEEE TRANSACTIONS ON COMPUTERS, JANUARY 1973

3 tttt't.'*a.r6o @*'++# 4ez,i4 iJJ 4ij*-IiXtetV.fCC'kC- 4--1 £-=J'4I 1 ,,4140 +s

S i 6tet:*++oF 6:f'pt- -.0 ±.4Te4'4e1 }V

7~~~~~~~~~~~~~~~~~~~@**a;+'F4#---HIIIFi ;lp8 4*F-v:t;+ 0

f i t -It 1 V

t**+ffff~~~~~ Of~-;"_rE Jei.a.' 44&EjX;'qi -tit

r *^^ A1U1 0 t _ , .,3 A i!

1~ xX-- +=

124 f f- Fxl L-_ ,11

7.7 1 a x*

31 II )t 1 1i n-V VV4- X 'Y A y

| q {- * ~ W 4 S 4 ~ g il X x t. + I Z 1 =V Z 2X.* :X

23 P;'* g7 =+ + I~E Z I Xy, "A. .

c>2'': t^^ ': ;+@6y )=+ I y; -vXw} t Y e2 %4'AX )X= 1Y i I I -I fi)2 4 X+25 >9 y X> fil. Z +7k2 f .XX X X X X X Z 1+ ) 7 1 )--+ + X)Lx 111.7 X -TITITI I )II) I + +)7 A X 6XXYxx X XIYI Z Z ) + - li L + + II

29 AXY YYXyXXXZ IlZ, l. 1Z i lblZiIII 1+ )III-1-`-- XY7-77 7ZI Il I I Z711 1 ) ) )I}) + ~ ++ i+

31 YX7?Z1ii l11)=)11111 +++++l)- + iII)I)32 YZZ111111+- I11 1++++) 1 + II1 i i

+!tt7i11* +lF-4+ l l(134 IZI I X yI35 Z = =I 4;'isK

IZ 4 67 ^t. 1 2 14^ 7¢C- 2 5". 7 C;t1 2 4^

Original picttire.

21 7., tI 4'Sf 7 C Ia45678P 1 2 4 5 6 t3'4

i~'t, -K-.0';X 'K A- A*. o4-27 P_O"YK v;''"" ':FA ts/Q .; 3.Ati YMiA4 AP1e k4 X4P=i Ft isa

io9w O''r"f),:i,07? t-*= 4*~ .0-1i\*s&P r8F AiP '4fASE7"Vf ' ' *?4 te ii '-i A5.A 4 X 'AC

A Ye tIte fZ ..Z A,Kt I0f

IC "-",Ac,, *i6,@ tvI1 -E,,^f441~~~~~~~~~N k18*1* 4 Yt- Y I Zi*;X.7o . I A td V-1f.-\ 4^

V1 P tff9wF' -@'F f SIiAtv'A + X aif ;e P ., P-~ :

1 5 4(-X i4r- V43 if 5`,eliB A.J-" AA#9 i R 3AAW* -

I36 HPAAX SfOXX X 'd A z+Xtoa* Xi17 F#Y4@ f A'YAfR4*Tv XVIUX8vkeiSWAI A A G-9-?

1 8S FfA Xf Eft-i Z!'I§t R$8A{2f i0 Z=C ZA bFfU341I,R 11-Yt; f-3C9! A#k!wl X= AU '-' Z A-928-

2A F P-;4csJ*Asief E+t*6 C.aF*P 0YA- AA@

8xsA-

221 i"f:*^;E0F{1'~ afW W .i4& if '#'*A6,1- XXj

24 E-fI-f4'1C1 si:t8=*f it jjX4V-FfiVA I ml*f* a-Ait2 c fZ f;* A A R3iXA k V fFr-CXA A ZIf N X I'"Al K

26 f fTFr f-,4 !PA, i#44 P-P vt;V,.

2 F >:wA'WP- litJP9 tX% f t ve At-7 i P+i4Xi4 ' \H_~ ~~~~

l

3C @l*A2-fAsi*Z76-00ei1JO'tA+WkKAAA43

3n3 6w.i?{';.Z..8' X (AlzktfVAXRjj Aui;;34 E99WINPITAX196 I46'wN-e YPe{ 44Ae6AMXAIP) AR&35 F ioi1Pi1A3i iAAP'llox4il!V.^si^MAz<

* i 7 7 S4 r,s 4. -.'i1 4 3 4 rF ,7 ° 1 1 2 3 4,r 73 Q;;1 zAS'4w% 'ia8 Fot+Lt e1 v 1i1819 1 a- :9 i 6) i*2 A0=,ER08t ;,, 11":*k i+ 4d*aF- ItW RX A\BC 9#' 6eW

r *is,_,=s ; z_*6i t~~~~-.--4 A)0-fti -e*

aas*>^W xsaI ai 6C IF- C8 4Hw gi- i4'1 04+i=1 6,i0 0

~~~~~~~~~S : XZ ItiB IfE'A e',9A E4 -XAG8 v1)( ) #M&7 f'X ZE ^ 6A ~ " e0 umeXs*0Zi X t $h*WFI #s

_ 2 .1 .A .P. .4 a ..L.

c ;zA oA?.."" E>1 '"tKtZ3I' AtL'A 3*i"

1 X Ea 2E i' f'f'4 A - ) -A BiKZ}XV 6Z1L 'I

1 1 iELtIs21S~s _ 19 - 1 *A H41lX SVi# A' Z

1.3 f )6Yf* f J Z I A 1 *V6 ) tHAW) X14 VrC-t"fv")AZZ)1 )+ X XL +-+RA-17YAI1 5 +X fU 'V AV+ P ) X Z75b f@§frYK3ZAWAB916 *Af1 XX.* .- 114.Z==+ = A 5b1l*^Z ti

17 U 4#G49+*81X8ZA1S"+ IANS VA'81xx -4 I if I . X Z............... t A A ...A'-'.. .Z. '~I

19 A+"*-U 111114A8 ZlZ+ lZb= 'WAL4SX42' lZ?2 '9 >+'4-'l C ,7R 5 (;A (O Z +XIZi=*Glfl A *:*Z,2r *' t ) t )~ 1\ _ ) 1ZI

2 1 115 AAX XL- IC 1= i= 1 + ) ++ 1. )

22 S t4Z11AA1,+f ==;g A=++ z=61 =z23 X-FtAXSSlISx.4-1 X,+ ) i1 1 --* +A( !SZl6139'M24 IEM1 X Std XX' I9) XX(.eZ)S+H:VXEII 4X#*? +Ai25 IZ1Z e" AI1-itIP+ A)ZI I.I AZX)++3'-4)X2?6 P'14A)A APX 8AXt *\ Z) (*4+14 +_A1M)27 1i )1*=ve/+ )=f )A+4*) ?.)A Z+1 =0Z A28 X 1 )IgpWO+'#iAfX- Y91 XRI+ XI-A+ IZM29 %L)P'1I+r1XZZlZ1A)1 A+l+= + +a1 1

3C 6 Z-Z)1K 4+XtL ml CZ eZX= + I 1X=31 1 W f;Y1)= i+ + )ti Z ====3X3 2 Z+L'. Z== J 1 1XZ'l +Z+= + 7ZZ=)33 MXA + I 1'f1'A XX l4Z1A1Xm'M

34 A) +4A = 1 IX + M 4Z X=1-61A35 1 + - I _ A )M I m1@@1

12345o78Q')1234567031234567itOl23456789(

Noisy picture (sensed scene) as used in experiment.

HAIR WAS LOCATED AT (8, 20)L/EDGE WAS LOCATED AT (20, 11)R/EDGE WAS LOCATED AT (20, 27)L/EYE WAS LOCATED AT (18,15)R/EYE WAS LOCATED AT (18, 23)NOSE WAS LOCATED AT (23,18)MOUTH WAS LOCATED AT (25,18)

123456759C1234557P8n12345675C012345A 75)^.

L(EV)A for L/EDGE. (Density at a point is propor-tional -to probability that L/EDGE is present atthat location.)

(e)

Fig. 4 (continued). (e) Successful embedding under random noise.

82

Page 17: Representation and Matching Pictorial Structurespeople.csail.mit.edu/torralba/courses/6.870/papers/...The Representation and Matching of Pictorial Structures MARTINA. FISCHLER AND

FISCHLER AND ELSCHLAGER: PICTORIAL STRUCTURES

12 i4-rr 7 3C' 1? i4'r,,;7>lt4.1234546751'12 1456 710i'6hfVVYi;flE4.44f * *.eeswtienortngVI*

_2 _*IEVWWYfl6Me9*f6Owe*tfIt(t(itj.#,1eiguv.1i H9bUi?iwt* ent*ieifiri?ts e*e.4wx3e4 3flM ti etinX ZIttt ZvtrtftF9 X4t.t et44w5 '8 '1eX)k14t#4'fI 94v

17 IVE)FIRSSSI,SSRSm17.fefscXXIf;4*ueXes

_T )(teu0g8ES sSg s0 w,Ao6e ;

- IWtSISSSIhhhAHS1SESIR P3@JiX4i- --"

IL *&ff*MiHS"'RHuh0AX X =/*UInsmvXAtXX4A'sA13 *fIfEm 1 14) I=,+7vsErXs I-X1Ax An14 t**4:*4"lAA V W XZ Z4 7X * +) I z "ELjv5gjg 44 4.-3

I2f- 9" 'AA X L XA A YI 4) le I Ii X ILXAXI Ai A XN< S

16 *i"+IShIelGt:wX.X l)t)+ ++X..1xAIh Zhix7AAHXe;q

l 7 F'-M K Afo Ai= r . L I1 Xk61 +I 1Z7lllll X 7 Z Z7e

1EfI'1A5X3IRs813'IX)AZNSSS#t.t #S11mHm'YIX YI1i I15 **4XASISSffiIx V*(4'XVSSXKKKX IK IAM1PI-S14AZfEfE&Sfl.Y XXXfit'SSsZ?6X&-XX)4

3 2 tffAAXiv. :X il ) Z9EN IF X XZ AAA4

4: !4 Hai*>ti e1 4 VI i i:" A '1, et.L Ia=1 Y_tM/ZXX24 X X4X FMSOW4A X_f;x. "'A.x 4E1 f 131 0Jt XXX-AX\ Z

12?4Fl'7i.<t'^* 12X45c"7eg(:12345)7 >1 1234'>670

25

4

32X eXXKXXk'AESfl9>+'XAtK17;weez3)1111

?4xA X-X80f4XX.:.i .VB8X* XA34 A~~~~X/X~~~~*HhEgr ~ ~ WA

35 XX(X tf!I9SSUUVit'PAKKKxYKLAW*fESW9*3VSRS.X

12345eJ739212345o7tc 123450nyk'c 123455rv:

Original picture.

A A ,AKWA 7 *'77411'7I t'vx/4At;?2 AX)ALA"Vr4?K t'A i X4"%3 AZX;0AKAi@ZZ* , ?\\wK*--AiA9l)'. Xi<C

13 A) I;'," IK Z 7 ZAt Z t- -FM'A 4i 4 "A Al4}'

4 A 1A A'w''7)LZIX;. L *xlt1+ '-Kt"^

71 1 A r-'= X8E"- Lt,tI ZsX.Y A' X1 7 +;7 Z i 1.;jt1

17,AFrs-X2 A WA.t.Xi.rZi<~' 8sZ1N"y7~~~~~~~~~~~~~~~~~~~~~~"

8 AX)AFiAC?A^@'tA5^!'X%AA't &.5//l"'t^Zl'':#-4Ii AX IAZ*#L A-Z AL-AIi -*I V aI 1 Z X I* '/I

12 As,31 aJIYal71 YAA)b lSI.l77+IF-t8 ZII13 AAlAbz*#Ylx1 V~7ZAu?Z)fRS% 84A) a'"+!,'.KL4124 ,AE IV )I )Z .K L tL.*6A X ,+ AZ Xt 1A(M A15 A 1,Pt* I v; X 7 7-\73 7Y Z",:.) Iit.. AXZ"ZS<701/% V'tL7f?^JtHX7??r1Et?92'

-*I AX?-A-SV14 K1-X't?? 5-;X-i,4flL7!ZZ--A#+ -18 AztXgR,z 4lA X1A)ASa.K/2vARt1+KX+I*9H87419 x ' 4 Z Z

3lrrrr>r79 'r-17 ' i77TB n7 IJ K 5

21 A 1 'e/ 7Txt.X . / ?, Si-

f it. ( ZVT'22A4 P1.t'L 'ZK'tfE/I tcAI A r s A

2 412 --. 7` 1Y 2fZ; 3 4 9 b 7_ 2 4t kI4,45 7 _t ZI/r

27 AsAa 'Wt'? )f*AILY )1±4S*JAz1i,rie...2 EVxA 4Xo* / (XDens_Y tV tYa fpoXint is p

tionaw - lto prbailitythat R/EDGE'r;@is preen'at

30 AZ)locato.t?7)1?#"%t)Y?i rLf,,,31 AXAzz17v~*Ec=^1Ry9LeY'.s:?vZ/A+1I'AW8Xt1

1 23s45FLS C- |2 34" 7 l!|2 ,4~Ca3,7R,512734".t -

L(EV)A for R/EDGE. (Density at a point is propor-tional to probability that R/EDGE is present at-that location.)

83

j23j345675AcOj ?34¶,> 70 -)! 123456T7c0)I 23456791 .USGeSS*EQS6S ASSA@RR&SRSW&SSi%B AASWISA S2 *SE3SM8ECE6£G+SHAS.-a'-flMsS5KYEMi'AS4'a *V'8*WSSS9weertt&ltSS1'elSsSf+1SS'614*1xsc&feee:uh#.eeeesaaf1x hi1Axle-x+">i

'. erxu4#SSSetSwtflue**Sv-R ~tH 47e*;s ___9 tl8"'XtLlHSSStBS 'efllflfiS1dSf)ilSS4 el-+Sriz0

10 S*S6faSfl6kS+.B6&'1eiSASISvle1Sa*ntEtflL,lC SN(MhIHSSteHUSCS4GSWS@-1+=f'.Z±i%S12 4l"St41h5OS~LS: 11 7?XKHfft1l SiAI-C*1 3 USeXweISS*X#sY))XiI1-ZLVAWSESSNSxKzX8N0|SAV14 RISt*f11&1kkvlMZ 1 IY*I+'fSw51+Z1HtU1'15 XS'iE808SI\L1XAZ-+ AtS XSVRASX*N')16 *ISIAssixm 9Si e=)1271*?HSSX+ )1'tHU Z17 U,tR11*SZ: 1=- t+x vLsssrsIrazn,w7ev18 8leM ISE0SS? ZA.+ tEA"C4ZASSR#( 1=+t--&Y X19 4fSAttiS=SPHAahX~AXM?19#1# IE{AII*EF*F4I9) = f.S ^PYIS ^ a 1§11 X*'A,:A= EJ?C AJHN'vX l1I eNSRX' 6t -!'Z-IA4d 16 P 15<AX t21 iSXMKII'.S-X?*EA1F7-1'+BSKCI-z9-S22 Ma4S =XYitt)ISfe A ltA+) = )ltEiX-+Pt++23 tMhMSi ZXA%ZJZ0t + +X is EC#t ze4vjI A.243?1A1SA.1-1Z&-=.f+M)XA A x+il+Iii4102' 9tZ'SZSANs AKXf/Z+ -1XYi&1* i 1MAt72o XGIA-)|/*:11Sri' AF7vv+f27 1AIA %4vcK i\t*y* I I/ ZS L;-4 = ]sj IXX28 ZI = XXMK"1sAN5;KOX-7 ) "WS 1Z E+?LLX ++29 4AKXlZxY*vZE, ,+ 'V.-2%'VXAXA = ZA )+911301QiS M IkAiS1)k1 A.#(S+ieFv= G-1 lbZ'S31 8AItSl') ACWS%13X,AI "5B+-i= A=fl XAL32 teM,Z -lt-' SAleIAs ISSKtR 'S ASY88l l + AZ IA33 +g8:,* .sos 40 xs'. a^w I' A AN s;*oZ 'A X34 xY 1SASRSaAS*%/fx I A nI i f4aStZeSsAi.135 I Z-N11'SXA88 AA * z*XX S lIeFSSUZS9IK04X

1234 5675 93 12 3456 8QA1 2 3456a789$12 3456 78l

Noisy picture (sensed scene) as used in experiment.

HAIR WAS LOCATED AT (11, 15)L/EDGE WAS LOCATED AT (19,8)R/EDGE WAS LOCATED AT (19, 24)L/EYE WAS LOCATED AT (19,12)R/EYE WAS LOCATED AT (19, 20)NOSE WAS LOCATED AT (25,15)MOUTH WAS LOCATED AT (28,15)

(f)Fig. 4 (continued). (f) Successful embedding under random noise.

Page 18: Representation and Matching Pictorial Structurespeople.csail.mit.edu/torralba/courses/6.870/papers/...The Representation and Matching of Pictorial Structures MARTINA. FISCHLER AND

IEEE TRANSACTIONS ON COMPUTERS, JANUARY 1973

1234507b'd,1345678se 123456759012 345b73tf

2345 +11+

--++= )A649e1 -7 +A9**5q*i,"Y*Z =

I010 Ii wSf)ii oeZi** -a

17I1 )A6180OBO8{s3UHR6i8S666.+

1 4 ) [email protected]+ __15 6Q16 ZI8,2 Z Z I1 ZXL66s86s417 -VKR4i8 4A+-- +Z ' IARSA

T1GEx -iAi t1+ - -)U568B0U410Q +i 69 1'X7)1- 8 I71 em2C +QC66*A"MX)I Z )l AB866O_e_

22 =166'iZK"48s1 *X)[email protected] 'v,6R'*-IZZLZX2Z22-1)6PSA

25 zSSAl + 1- Z 88#+25 1iSt I 1z= l __"+2 =- - )W+2R AOXI+-1).- += ++29 -AX)+ XA )=--)l3~~~~ ~~~~1YT2T- _ -

31 +X+)}+ -32 Y5) ++

34 Ad>X AZ+ I=35 -YtiZ ZZ14= 1)+

37 =+11)1*=)= = + ) Z Z xZ 1 1)38 ==1)+ - -+4+) IZXZZZZ1)-

1234567 S01234567 c012345678qOl234567 1)n

Original picture.

133457 ri,7 '1-, 71 11.7ZZ7AX 7I. 1..Z 11 1 I I1 .)++ 1)IIi I2 1177 A )1)))) + 1 j)))4 11) +

4 l)flh/1eXS77 1)1)1_1- += -+711l7^Y)1lt'7 Ii51YZ>7frlT72 Y7)7^;. r-- .-+ 4+# 11_Tf'Ft 7.UTT1T) 1-6 1 51 A?)4+,+-+*t YXJ'-+*- 171111J1Al7 1)JZ?iAxl+!7 =). 1JH *^.?1IX4, !F 1 1 }>+9s4j g s++)-W)AZZLT

I Xz JZ) I?I? ) I B ti ^' t G 4 !,10 XZ '£ J IIZ x 4' .-'Ai- F--*''s )1

l' I -lz =n,a1-''-*;4Ll : -'t- .; n-}-.TT 1I Zt^ / *X.Z hi: i i i-i'fi if- Xt,Ytl'; 24 ,

1 I Z I ) 4 1 t4- n ?V .

J 4 4l1k.4 J.te'.:2)F -IIbt,1 ) + KAL^ ? *lE! f it I *I ) +) ,1w

16 1 )'I I $8f +*1 7 )*.ut;HwI +' I1: 1

311 VP.s7 Ix /1==74 1 T? 177''-' ¼ H331o.)1111- **. j > 11 ++- )l ''"7--s} +

34 171 7 7 --

3751TT-i- -TV.'k7.i'4x 7i'.^.'>71 .1'

?? I ) ) z I t:q= !', "zha %'Z,R."1 IF ".]= ,,I7 11T) !1.1s8-IR JF= F +_ ii - r."<e i )v -I\-4

7 4 } XY * I II IsE N:f 4 , ^* ) +I<ft AF!^'75 1 17 11 7) ) '-"! IL +1 1 'tO jt1; 7 i

37 f1 1)/ 11 ) ) Ii -''^'I I. I C, + I-_1:'2p 1 1 1 1 I 1Z )F<Z.;,t\, + t-)1t)=>t}

3 11 I I KV+ I X I1 - 1- 1'1 Y- I I I I*'1I I I I i *ZI.)34 1 1tt XI t 1 I Y-= r-= ~ I+I'11J)+?'}}' '

3 ' l I l I1 17 1T T1 FT1 1 I \ ,\ji15

'A 6 1 1 )7 IXi ) 1 7 tI > ^ X,

3 67 1+-1 7x. 1 1

I :Ii"1+>~-

3 XX1+ I J I I )y j Z1)>- iZ v) I I7

f 7"> y fTi.

12345676og&034'LI -'34 12345678?0123456 '??O1 ()-X, )2= A=-= A- 1 + = 1

2 +=l I - + _=_= _ =3 I +-+= 2 X=1 = 1= XZ4 = : A 1 1-' I ++1 += Z +Z45 Z7 -- = I 1=A+ 1 ) X-iA= )A I

7Z +.x = =+=-Z I AN + 7 1.+ A7 + I I) 1 4 Wjatt46S66X Z -X1)-=1 A8 t 7Z +Z+XYYw88i.z*v vw*69"1= I0 +1 -1 + =-zi8I-5Z6hStai0Z = =

10 Z 1. .+7Xw06+UA6tU':6,64Ul 6*8ii^E -=1 1 * 1 - x*H6OYSZ0H66'6s36RwwI1V%= 112 -XZ1+1 1+ tRsUu6eeezdv+ 113 + 4 D AlX elE6VHM.m860S# I- +14 X) - o'A5a I*68EX?'5I4O6'a*.Zpi,Z +15 4 1 A.iesEeBm+vs0 mg".Zs*1-16 + 11)WE3sssaki*A\x + +Sl*3SSSli =Z17 Z1) - fY*10"'A1 11 )lM M70S,.IAA )1s 1= f£I66Rt'f - lx'11 16l9=19 )t 66RUAi,'X+ + - *ilggitifrl ) X=2C 4 1 +.f-4*lBI1 - 2 =Z X0-5-E 8Z+ All21 +1 ti=oSi-* = 6) 1541A687V*iX+=Z I22 1 *IlleiAS"m>.j ff= X.e9;*+z 123 + - - *10+.7 49"A9Si+) Zfl *SUI!=L-24 -x *=8s0*' + 1 M 3 -X= IhaS5 L X25 L=L86"= 1) +i= I#4661) 1=26 * 8* 1 E<1==- 1 =+A1 +6 ,E A I Z27 -1 x L + )B'2 )IiA ) '=4)4- (+ =28 = + 1- x§A+ IAIt Z '1'- 11 l29 +X)+-= Z7- YX1 =f')l+- ZR -= -

31 -7Z=) '-4 ) ))4*} = Z 1 7'4 +=32 +- = + 11=x 7 X Z - =+?3 + - A 2 Ar*l=t.-) I =34 - 5 'UAA X7 -.+ X- 1 +A++)35_A 1 -t46X Z2X + 7 ZAZ)()- = 13t ++ 2 -+-+ R 15 + =+ ZtLV +* 15x37 Z7 12 =+ llf=A X x >'31 111 I A Z1 Z Z -> )Xi=) X-M 1)

12345,795C12345t7h0C1?2345O7Q9c1234567890

Noisy picture (sensed scene) as used in experiment.

HAIR WAS LOCATED AT (13,23)L/EDGE WAS LOCATED AT (25,13)R/EDGE WAS LOCATED AT (25,28)L/EYE WAS LOCATED AT (22,16)R/EYE WAS LOCATED AT (22,23)NOSE WAS LOCATED AT (27,20)MOUTH WAS LOCATED AT (29,19)

L(EV)A for eye. (Density at a point is proportional toprobability that eye is present at that location.)

(g)Fig. 4 (continued). (g) Successful embedding under random noise.

84

Page 19: Representation and Matching Pictorial Structurespeople.csail.mit.edu/torralba/courses/6.870/papers/...The Representation and Matching of Pictorial Structures MARTINA. FISCHLER AND

FISCHLER AND ELSCHLAGER: PICTORIAL STRUCTURES

l2347CPiq-1234,A7OI, 1234-57Qc'A,' Oi' 7 QP .On1 = +~+=_= , ==_+44= 44+4+ *44 4

2 z+=+++4 -+e+++,*++*++++#.....+...+...)P)2 ++++++ + +++,++ -++++-+++.-)*. . . . . . . . .

4 .+.+....4.) P1111. 1 . P P P P P . . . . . .

5 4+4*4pP) IP)) .S'V hA~5X P P P)P) P)IPX) P114

7 P P P Ati '4i,*@ 1R1 P P P P P P111 1A P P)))Z}1 /* *4*+6*~r') t))))P) IP1P1P9 PPP I l 1 ee*eti fi 7 P ) 1 P )1 Pil

IC PPP PPAt*ii*ti-iXt PXPIiliLl11I Iz Aiv)IIW+i9=@4++z*> I1!1 1 1 1_12 Z T)l*libi)+-+ZrfP21 1) 1 111 11113 =+-++lif*9XL-AZ17AAZ+14CI11 ).)1MIII14 2*e*)AePxiPX [email protected]===15 -+.M4LXN:-IAA2 Z%'AP) +9 .16 +X9I +1Z1P} =ZXZ 1 +>'*L---.---17 -XF1-44+4+ -++=--KA_+1 8 z VI PI1IMA XZ4-44XX4I Iq 19XXX X AAtAKAL 1 +WZ-20 +4MX.'NA5 AZMMXPP __ A_X t

222* .I r* _,4 x2 3 = v*aElt*X-2325 43FP0VZ26 +i: i3

2';28 23SB8Hfl{}XI.t2S ~~Auea5EBae*"__

'31 +- £PPSPIUPPSBP32 4 Y#8B86R8I'

34 -yMA A-.iAS35 + 1 1 + +vxz

12345678901234567?90 12'345676C123456 J?-;3

Original picture.

1?34-67; ';.'123~~'123447XZ 5018-A1245X 73A )I AALAXAX1'.A; X] .K'W-MVM 5t. 1A ZY A -1.!,

3 AAA.^itlS".?.X'"4A;1AAA^l_ ZY7 7A&Ai4 A.A %=X77.' t eA1eAIX A^ )'AX L YLYZ A AtA415 A A f, _4A rIL IV X UMI it i K I I X I I Z X '-O A -M3YX P!.A A,6 A AA A X 7++LXZ 1 7 7 2^) -XAZlt2 JA AXI A A A7 A5' ,'AY IZ ^. I '.:XFll8iFiWV* W _0 Z I I I Y P. I2 X A AA A;P ,^,.AtAZ Y 11 \VOGX,s yi 7;-@s.. 1+ I Z'f1F 32 AAI- ^5 A-AAXIX'Af-* 1l1XV'AwvA4:'1Y~X AX I AiA

I A 4) I I 42+eL.L 77 ZAP- IS Zl IIIX11AAAA

1!A4 -% Y. 8e-Yf )ir =2 Z x Y ?' ++ a, A --, Al13 At.t '! I =+ X -XX I A -L ,7Z ,t 'At I I X) X r-X ) -.tAA

I S A A A A t ' 1.X I X1 t.?!VaP3AX) X2sZ OAA7 tte1+A X7 I + It+t I)+1 1 Y A AV Yft A A

Ic2 .114A .1 Z X Z A. '-t:i- 1J 1 'V' I V X )_ A X V,'O Xt.A .% Pt?^ Att 14Szx1sZ Y I y J* '6I V16i:*" Z I ? A ,YXA f A...................EA21 4-A I I ! I I 7 Z Y*r:'.,A A ) 1ta I Z 4A XV^A" A4.22 t.:. xz7 Z 7 I A [IfA E, PfAAec -IthA . x7 .1A*^

7 I . .7 .k tI K7- 7-.......%-' -1 A .A N

2~~~~~~~~~~~~~~~~~ Y ,X A y> v:4:-Z' I7YYjz A A -I7; A*AA A I Y Zqt 'AJ-.^zYx X I.t>AaA

28$ A\A,;KKI1t' i;4'¢-e.8,>8 t X 7 I ZX Y X~ *:X P >A27: izA -I". A Z X,L X ''4ik- ~,;4 zj 7 X X A -A A e ,PA* XtX) i( I A^.7

.4AAt,^ .-'2 I

)

I1f 1 1Y.i~* XI ZXh A MA L A AX AA

't

A

^

at ' 'K .'I - A. 4sw;4,? I ~+IV X X I t)_' X A A!A .E.

3Z 4'I:5\)7,.'"e Y 7

XA

> a^^

3, X4 K*^.,, #K^* *vl ^

tt** 5\

4 ~ St ^E -Y Y .: s X X . 8 1 )+x x z 1. X-Zs7^i*5

itz'.7: i4 %; 74 1,c 144 7 ^92 54 t ,7 a9

L(EV)A for mouth. (Density at a. point is proportionalto probability that.mouth is presetit at that loca-tion.)

1 s4-. 7 -I , 1?'4 a 7 A Q-'v 2 34 7 u1 2 3 4 5j J7 j1 5t P _-K 4VZ4 -+.P = ) - 1+ Ii 1)2 +1XZ)+Z P=1 = Z I + -PZ A= * *t'1

A , + K I PP ' +A I PI.A4 P ) =v 1 *Z 1= = P +5 X 2z ) *t- 4-fE1X" 2 X 1=ZI +A 1 146 Z=-l.l- l P16R, A 17-0-- -ZAXX +1- PI7 Z-=-) A0XOU S*W-@wWSE@'P'U* P AlX 4-X 9A X -x P'4-EewefflemelzP-ZS) OP S9 1 Z A Ij 4U=9S=ROSaj19 ) Z13 - 1 PfHUIti.AAveB=.v x i=A * A11 P -eA-I:o1g6XQ1SIB ) X + +I*5P* 1 4 81 P0=..12 i1.AEi FtBXlsz= +-+. V'4110P) 11v) Z13 + X+Af XA9B4= OP) SOAIPI Li-E 'I14 1 P. -*'7 hh'. * 5- YX(1C 44 PtIA#+*e- LZLvAI-*vS A q16 All)XINA X)# .'1?U *VS-X Z =)I P17 P + z=1 )P==+-1+ 7 z = A 4 _Pd - = XZ-+ X )P44) I + A= A + K19 X + )XI = A7 Z - PS =+20 I PS PZ9I1 XI I9 I_______21 i6e+ t* '8P 641A1t -. 1-22 1P -tfigs'^.44 1 + 4+23 * - AIfU-'-e-i-o * PZ 1 p))24 + - P4 P'0SH'I-Pv'xv*lO Z =?5 14P-4OPK=E,I9I1.S 'M.1 ) -

-.?F z -11 4IQtIRE4ij.*ySitliH 1 PA =2=L .._.._*7 -. /IgFt8i'tlSl+ei+j X)=XK i- I

Pj| P /Jei8R5fleYsaI-p x =2q 1 -t 1I .sRE*8pEi=.-pp-) - -3C -- +*. e£H-Eef X * z A31 = I P 'O6t6R =x1= 1 * Z

3 1 1 -X 4s; +4-i-34 P= AP.¶.AP&'A x P*P ==Z A35 2 ] *4)-z 7 =X-+ = 2

I Z k4 56 7- c l 2 5 70 -;) I 214'67 4OI2 34557j%(

Noisy picture (sensed scene) as used in experiment.

HAIR WAS LOCATED AT (8, 19)L/EDGE WAS LOCATED AT (16,9)R/EDGE WAS LOCATED AT (16, 23)L/EYE WAS LOCATED AT (14,12)R/EYE WAS LOCATED AT (14,18)NOSE WAS LOCATED AT (18, 16)MOUTH WAS LOCATED AT (21,16)

(h)Fig. 4 (continued). (h) Successful embedding under random noise.

85

Page 20: Representation and Matching Pictorial Structurespeople.csail.mit.edu/torralba/courses/6.870/papers/...The Representation and Matching of Pictorial Structures MARTINA. FISCHLER AND

IEEE TRANSACTIONS ON COMPUTERS, JANUARY 1973

1234567,V1' 1?34';67$i9. 12 34517$30CIt -44 44';+; }4-eba in Q eEr a t e a c

2 *4D jf *44*If~

9*'}I I IV 0f 4f f E^eeb4-' 44441§i4 i*41

S (+i+-+.- te -FFlf- {,9f 4 t 0_,zFEv i,,u # If

*%- 8H-.HS86 QAt8-88f9 f- -;Vlff+E-.fFfkjffElSE 888jReSHEfigfSea8 f

ZAF" s E R 8X ' ti^ sS HA1Y Ii'l A'±4n12*94'^ t'@ l@Bfia|9 i0 ZiVAzII + Ig a$BRfs R a: " A

4

1 3 41 v. Y Z I +tIRE8a .A A AA14 (WAXAVURSR YXl )s--+ =411|0fl RaX X .AA

l

I1 54XX 448ftXX71 += I. (RISRA8 X XXX AA' M

16 M XXAA8101.Y 1 7 )- +A SxIAXAAA17 PA;4XXAUfJg4X'hAL+=-4)/Zi14HHH77XXYXXXXAA

--1--AXXXXXAR4Z888, XI XY

K A119 R AK X,4.Elff ie~k;2* It & IV'tF iMA74I I ZXXXXX X

2C AYXXAw'A f.AAXtA 1X IXX 1 11* AAZIXXZXXX7 T r C 1z7 F & x -,x1i k Y 'Al iXXXfX-X22 MXXXX*XI 1))X= 1I +X5PX?Z772Z7XX23 3 A XX X'i* X 1 4IAL 171- X '7 I1111I IZZZZXX

- 74A-' YWZX'/-XX t' A 18 1 j716--; S29I AY

)I I I I'I 1ZX X

25 A X XZ I X lA X7X0-MAX1XAXXX 4 ++ 11Z X26 AX2 X LZXIiL P vX XA VMA X AL.X Z II?v1--=I + +) I Z X X

7 AXYXA xL Z L r XFDI--= =iWT)TX-x2E XXZiZ2IZIAXXX(XXZZ/ZZZY)===+++))1IZZ29 77ZZ ZZZ Z 9V. AAXLZ ZKX X 'A +)1 I Izz3-37l7ZZZZ'7Z4- 127.AAJA-A+.-1)) II ZZZZZZ31 ZZt 71Z7Z PAE,q4AZ Xm4U'AX?- I 1 ZXKXXXXX A

32 ZZZZZX AX+AA7'`5X4AAAAXX 7 I -I ?MM¶4#-3 i 1LLZLXA41 X X XYWXTZZ - -1+Z1Xq34 XXXZIXXZ MXXAAIZZZI1IZ35 %IMXXI+ 4AZ1I 1I) +2- -I

123456789q123456786O 1234567BQO123456-7Vc

Original picture.

7

A-

2 P'NPJ AIIlV.j 74G A4+ Y'3 "94 .911: Yz X ,* 'o R

4 MA8t 8 t> 7 d. 7 , { 8t < 1 ASX t Y

73 -fWJ-5*t\ , .>+fi|j ++

i':P 'I3X i: I A X

12 M vX YV+ _+ !A!

7I v'@'A mm- m) I !A- .-ft,K t `- :3-y{FZ "' .'- AY .'1

14 MuBMv IX7*f-2 sm Zl S:A6 t. X z Z I Z t_41 7 if N4 t,;( -'lA .

~ ~ ~ ~ ~ ~

1 7 I+1A-,I I I7Y. 7 4Ij 1 '

~~~7~~' AaAk ,

W'" A A I MIRIP- }IJt" Z K >-f^"igdt ^ *I)W ,5A <

1 A v

_ X!,Q Z Y "AAX A| -i7 it Vf'Z t #0 + NI^X- 'K

L I W' -+ x

I M MA *

1" 3 17Y 7 Y ~ '. ||"-1' | IL * f 1 I .1YA . AIZL( V) fo nose.)((I Densty a a point is proportin

o pr' oa' bA t y Ytt no seis p esn1 at tha lo

ti on2 ) 5 x + + 1I Z I I I?21 MV9wh-M-X'7 i 7. w 1X.;l .'"YI I IX^271 '-tAl I 30'

24 9'fl'JE"*;3t9,* >, =+ ZZ -C tvP+X f-71i; 0

-2?, 1t'J 74}SiN ? A x"-#VYlk?7X.S1t4'

3 I Po v V it so- t 91~ 'P it 1 71X A I |

3 -J(Pt§-^9 ie"i YYY Y :, Z wiVA Vt.'7 x A'

b SYv p.L FCX A + X ct:¢-xL. s I ,,!"!, '. ,Iv..! E Y' 8

-3r1J;Y + + _ - ) l .!771 Y -5 '4 '- + I I I I Vy t'Z v-J'

i

1 2 4 5iTpC 2 4 12 -! 7j) 1 2 :4 F '7a,1t'5 h,7 7}

L(EV)A for nose. (Density at a point is proportionalto probability that nose is present at that loca-tion.)

19734,j7- "',, 2m7 -!1

3-a r

1

2 j

*i r

I f:1 o aY-sEl I Y YAl II -1 'l1 aR IEl- A-4At,#

2 UX4 7/1V t716Zfr F -e UIz s( =4 F +R '^AUE Tj@ 1 ERiAi0 ABAlUiYS#lii AS

4 0 Atift*41)1X /fE19 1 4f )'AV5~~~~~~~~~18884' X.8 P- -kW.*' 1-1a a^

6 v5Ht BASE '*9 8"E HRi Z A*IIX MJ6A x98H"b67 XA P, x Y HZ8e '

13 'fAXHUZ8-'8Hs 1k;-'5-

i 13Rsa1 I 1Xiiy 44

1 6 4I1f%lRHR8B/1 + X h0'J71X* X17t 1 I I tf A+ 8EE f A fJ 'A

3 f,XR" ig) +s'7AwSAO-A.AE 1t A I XI

21c 94- X)s o+ 17~x

116")? /I1 2 lk A-itfXZ'I 11Z WXSVIVUIX 8H

1 37:&' I I 9 f 1%+8XI I=F i xZ +* XI I7-24Fl Z d ZA AA4 xZ a- + X Z

25 fM *i^ " 1t AA + ' X I -X+11

2 + 7) A k a I, I+wR Ab 1 7 )4-.

2 7i4: 1X+ A Z 4 : 12 ='Zj/ 4Zi4fj Ajllj 4 Z + ) Y

A 1_ K Z7 A I 'Ai, I...A x--14) ) t77 A=I Z.

3 1 Z x 1 iAv AX ZYZ 1X 1AA )L is I I6.71 41*-'-

3?2 L X X71 , I I y.t<41X K I t22~ lA,Mt") )I +)I I -)+ X Xx Z I A

34 IAZA"71 A )Xv=A!f3Ul A + *)1

35R Z 8 -*= Z +A Z

123445 7r '4 12 345 r7, ?L1 2 3455 -78

Noisy pictuire (sensed scene) as uised in experiment.

HAIR WAS LOCATED AT (9,19)

L/EDGE WAS LOCATED AT (22 ,9)

R/EDGE WAS LOCATED AT (22, 22)

L/EYE WAS LOCATED AT (19, 12)

R/EYE WAS LOCATED AT (19, 19)

NOSE WAS LOCATED AT (24, 16)

MOUTH WAS LOCATED AT (26,15)

(i)

Fig. 4 (continued). (i) Successful embedding Illder ranidom noise.

86

Page 21: Representation and Matching Pictorial Structurespeople.csail.mit.edu/torralba/courses/6.870/papers/...The Representation and Matching of Pictorial Structures MARTINA. FISCHLER AND

FISCHLER AND ELSCHLAGER: PICTORIAL STRUCTURES

1~~~~~~~~'2344?i?(i?35h7°r!2 45 7Hq 1__234'_5h '7,iq

2

7 -HI~~~4beHllPe 0I4e414 + A4 i NeILRJ v I

5 -5P4RRflI29S99iI.

10 _A9ffsEReSs0*+1 gA7 B3

12 ;e980e8HR 71111 1sw8Sma13 f5SC9m*Hb5z1+++IlzE§RvK-14 4~?I117UU

19 +038649 11 1) IZAf4Ax%R8+17 __ 416LF-t-~l49*

1~ .X ASf 7 1 lX AX 7I A

201 39 X X X Z A Z 77 1+ I5II A'A

l20 1S FS0XX-X 7171) I +1X iAS =-

22 1 il,XI Z l+1++* + I#1-23 4,fX1 IS Z1 +--) 11X -1-24 (fZLi95Z ti625 +tEVAXYXAXZlI)1+1I A+26 XWIMHAPX1ZXZI1XI__

21t 1Ridb£tXZ7Xe'}f0+

28 +%S;!'13mL17I17X5x129<:___§ e11a6'- ~ 1i4.Xz }?71 -1 - - -

31 ) Kt98VS1X4Z X ZZ Z7 X1 I 1 A )131 7X6¢;Sa7-1X7ZLXY11Al

32 :sV'6S44XAX2l 1 I 1 1X+-

324 I X 7ZA'7AZ Z1Z AI+35 -- X+.1@9AXX111)41%1911+-

37 )44XX'lSmX1++) -+6Ex?Z+138 -*LXPflNWXi)=+lZXXllIII +++

12342S 789 12 3456 7X% 12 34 56 79C123}4 5675C

Original picture.

1?34¶'79C17347J1)12A,~b7~~)234,30'7R-)!31 11!9511X-+4 4 )+I1 3)=X) V1111117

3 Z LX XI X7 IZ416Ii; Y )I+1 1= -L--/6 711 xXlX1@ )++I I )1AXZ7111N+7 777 17' I i t1eR 4 +'* 1i + -11 1/I *

67 1 * X XI ERI' ; X 4 60T Of-- 4 A+ii1 11 yX I 1 7 1) Z

Z II X j2VP 4 F)=-= Ii X10F3j V 4 IA X ++A )

1;X4I69 12 345) R<; 12 34 56O =tl$C29)34-:1>7A X

77A/fl Ai4'B A Y.")A1 *,&/l I IlZ/

17XlZ)})++)+l

£ ++)1111155517

13 Z7 1}IZ 171 4 Hi4 PYAX 1t = == 744 lAT- X)lI X X X Z

35 Xl1155XxfXEX +U4v, =1X6X.6' 1 71 75/Z7

277@7LXA)lX"llAYA" l5"1/7 11+ 1 I IX1117II 71 1111771 Af 1A 711 - 11 IZ1?I 71)AIAl1 Rxx6'-e1-vzxx0YZI"/11 1 655)1IL

-.9- Z v;L-Xl/A,, Xt-*WX 10-Y- X IZ,'6x e81 -L A 10 l I11 ) Z2 27 11 1 1 )I )f4 I )iA,f iVA8{Ii ^XX -:Ab ^ZZ I 'Z ZAA1 Z

3 I t ZX1 1 1 )77A*A6/N'l4:H1^1H3A-T Z-*X

24 I7 Y JX "Al".f X )E"AhN1 71AZZ) VW)} e 1ZI XA 1 I 1L2

35 IUP.755sXXXx(N1XII44X+t A1* I II41 It/4Z1X 1736 Z71771 1 11Si M4fXiX/1+1A X I1 tZ1Af-I Z?L27Z7I AX I I I X 1 1 7 5 xS1 Z X 2 11 t1 I 7 l YI 5 X Z

l zIx,AI.1)t1N.g'4z.A t.IA =7vi tt.,veX 1 lI lZ 2X.,xx17 zi If IA7) AkyA6*61i4K4I +-YXA6f X I I I IIILXX Z

31 7AI I Z2A4PR RCIa'' 2 3 '455 7IZ912 134 X

38 21ZZ171 :-,TR i-j 3(I'Bff jZ'";YTI-Z71 11111'7

L(V2 fr e. ( tf at a pt i p or4c tFFx t ZAZ In1l1

34721 A'll I L)t.aZ 7'A}'S¢5t'- A P.R AA IxZ 127ZtI1 L

367 AIhehx ANAv1ar)}91Z4!lXXXZIXW889P. )A111479^Z... .,

.'_.._.I,

?3t x1I ItY IF10l'#MIfx Y,%A11)e.lF l l1x6,z

r)t1I 'F 9X 1 2 34' 7 9 0 12 3x4 0 1,7 1 231a# 7

to probability that eye is present at that loca-tion.)

12l 4567.i", 12245677CA?349 7 l?'57 771 X = _ = ) _

2- = X 7 i1Y * 7 _ lx1+--

4 x 1-5 Z7Q'FieA11= V 1 .A1+5 J .71 1 ) ^A1 Z I +6 Z 1 X 1I(t3E£HWBHEUi1 1 * A

7 -=-= 1~ 1 *8HS"Rf8wRU6081 == 1 13 1 - I .fs0se .9s z66 +1 -+ -

Q -r. *i6-eH*i*ua8 'lIlA .

10 - ++*.989-98VII 'N- 6I.6+ +11 -

12 1 76?91BNs- !- 7-7eSRE - 1 '11 3 -= 10880 --5X I X Z7'e .,si1 1 1 1

14 = *AUVU'+1-+ =:5,57wS9?= )A +

15 Xl I M +-'848.9.IY1)-i1A1XZ1 *14 Z

16 +* *=-StitIEUAA *1 7 = X*17. _ _ ZleSBmzeZ )*4'8;1,;+ =_1 V 1 frBSIf2Sl A6'lA '64+119 + I + 7 SZX1ZMAWI1A+AM) li- -* - -20 *4RZ A A+eU*l17=?XVZ7)=l75XXfX21 I ifl -A Z + 4 -AA !+ + X

22 1 X2 I=+ *ft)= I 16-A1s XKAFM ++ y

23 Z 7 *H651'l466l +Z -ZA - Z124 1L f5XY A?U8I 1+-X 16+ + 1 "A + 1

25 -= X ) 2 t'tl )C1Z7-= X M=+ X e 1 +26 X 7= +A41I,1X. lll*SlX-f- Z= -7177 V+)+X= A*5?7x511A -A 11 + 128 + + I +=:WAtEA4j7'A IIA Z29 1AIV= Y*YlW^AA1=fdZ7*A7 =+ Z-:74 -30 ) +4 -z A ?68fl8*'f h27A+ A l1a+ + I' I31 7Z 1sXIU *f4. HI HAMI-1 = It32 - - A +*A*.1U@X yiV'L+ZZ-= x 1- I3 --+ +fTl1 OX*YIB£ a/MXx1'3 w v =

34 X-) =YVZI.f+2Z3Z+ I" *! 4-iL35 +7Z+ Z A uMX 111=ISA= '4 Z117+ -A-) 7

36 +1 - M fi*EitZ.411+++A38llM14) 8 z)37

M

Z +I) =084 XA J=7.fX*11A ZXA )138 4 lI+Z1MLZ Z "661AA I+ f-+j1 14 M

1234 5 7s';312 3456 7800 12 3456780012 3456 7890

Noisy picture (sensed scene) as used in experiment.

HAIR WAS LOCATED AT (9, 20)L/EDGE WAS LOCATED AT (20 ,12)R/EDGE WAS LOCATED AT (20,27)L/EYE WAS LOCATED AT (18,15)R/EYE WAS LOCATED AT (18, 22)NOSE WAS LOCATED AT (24, 18)MOUTH WAS LOCATED AT (27,19)

(j)Fig. 4 (continued). (j) Successful embedding under random noise.

87

Page 22: Representation and Matching Pictorial Structurespeople.csail.mit.edu/torralba/courses/6.870/papers/...The Representation and Matching of Pictorial Structures MARTINA. FISCHLER AND

88 IEEE TRANSACTIONS ON COMPUTERS, JANUARY 1973

THIS PICTUJDE HAD lAD HlDS ADDO300 COCOIDNI 'HID PICTUREE DAD 160 DtVwS AND £30 CO)LU4NS

ItIt.

la

3D

SN

6'\'( }~'vregetation 145 S % D % ._

Vegetation 3\s

'Al

DIO

10~ ~ ~ ~ 7/

I5 I *

'U2

1.K.114

ID

I i14 Se~Seshre___ .. -

(a)Fig. S. Example of. image-matching experiment using a terraini scenie. (a) Reference for terrain.

Page 23: Representation and Matching Pictorial Structurespeople.csail.mit.edu/torralba/courses/6.870/papers/...The Representation and Matching of Pictorial Structures MARTINA. FISCHLER AND

FISCHLER AND ELSCHLAGER: PiCTORIAL STRUCTURES 89

IH000 PICYOIP HAS 100 0901 AND 000 CON.100NS lots PUctuOPF HSo I&O 0000S 0N0 200 CUL0.0400

02

30

41..

......:)veegettation1

40Si.

Is~~~~~~~~~~~~~~~~~~~~~

102

904

I0I00

li.tl/ e~tt~f

121~1.

140

144

Fig. 5 (continued). (b) Embedding of reference in sensed image for terrain.

Page 24: Representation and Matching Pictorial Structurespeople.csail.mit.edu/torralba/courses/6.870/papers/...The Representation and Matching of Pictorial Structures MARTINA. FISCHLER AND

IEEE TRANSACTIONS ON COMPUTERS, JANUARY 1973

3834

AI,3049'St

at-

To33

304

3,

3r,

i4.

143

14114.

St.

392

319

-------------------913i334'?,*p33F4~T03'I?1'~730T3T W 33l~'S3~'3t3

i3q,99940907333------ ,aOus0Asu I t

33e 4133333 I33W4333 IIn..Ij-3.I.004014461114193.33333333343.03333333 4I333It333333333NA334If333934t933*4It3373343.I*....uS03330 3440f33SS4340430~~~34.,3'33430334340333333333A33401333M3333433333333313333')13?~5jij~ji34034M4033b4AUiUU334430S'4A403UO' 9339433U303333S3334934403331 333330334333033 333333.3333.3------------333A3A4 --------31.

-------------3393333334 aa343330333133133333333333333433 3333333~MR040999900339033 13.".304 333333 3304336633 09 13333in033333333TT1lIIU

* -. 333~~~~~~~~------itoww"4"404 7334333xie -~473333333413333333434'.~~~~~~~~~~~~~~~~~~~~~~~~~~~mItpg4500117-1aorA II?a

46333333333333I---3333333333 VA333

anoless

333 333333333

3.4.3..930..33433.,3.3..333m393333330933a343333333333333133311

3- -. 333333333313333~~~~~~~22#0I3.333 0g483p3S3 i333 40.3~036340333 3403 3U33 13 13333 61 109900334 10 9699333430171i 03033343a

3 4 3 3 3 3 3O393403333.3-339433339034033334 33.3I333t%t"04093 3333I3333I3I33-393I3I3339633 33333330*3333933333333333333-* 333333333033 3390434333333333333I34

3.g33433933333333343O33----------333 333333334ct3g333939393g33390139696096933493Al3333333333333333334I 93341 y- I333333.3334999" s90900199333333333339,00004,000406 39333 333333330334343333333 333333333

,..3333393333333403060 33 ~3334@33 41113g4. 33333'I 33333 3333314111 3314011333933333AI333 A 4"asoose" o~09333339333333333*33316 33443 333333133033303V9333333093 3333I33

* 3333333 333 333633333333333 3333343333933S333309933393333~ 3333339333333333333 3333333303939433333333943Al*~9~333340 343 393333304033333VEGETATION3333333*1

33333333333333333344094333333333333334333334333343403333333.3333 3333333993333333333.3133333334933933333333339333 333930333333333333333333433933AA333333334.34

I e -ie foppae 33II3333IIast403333I"I333ANP4491VAIIIAT100419,1I op ..ol i s410 "

33333433433343363333366406403340q33333333'43343333333333333333333333334333333333~3 333333339334393403333 403344033333334334003 334.3333333

A 333333044333333333333333333433333333333333933 39333333333333 333 33333333333 3339 3333333A 3339333333443933333333133333333343333333333333333-*33433333333333333333333033333933399* 33333333*333333343393433333434VEGETATION333133333333334333333333A333333333334333433333333333 3333433 333333:V G ''''O 3333333

333333333333333333333 it ....... APO ...43AA34A333

3-I0161 "*Of*8333033033339 3 394333333 4333?333333333I3333333433333333333343-3 . ? . IIva 4q4A "-o-0 IIIII.wvqow 33343333333333333393333I333I333-3333-33333333333933933333333 .33 p"337133333333 I

t- 11"11333333 3A-M-- 101493333I 0333v3 99996 aaI 3W,ItaA333333333I I333I I33A3 3o3Re33 44333 3 III o oof13rlp3333v333II.....I3.......96 33 3.3333*s* 33004sop4of-

Ia333433303 -933333343304440333334333333 333333I31433333I33A 3 M337I33333343AI337 I41033V3St33I3333 93933333V3A394033A33A?I3aI3II99I99I33A39333 9 0I333333 I3.I333I33I..................3333 33433343

A '403334333333333333033333333333 43 34333A3A333393314333343A33333333A33313A33v333333 Po ""39333 033333433393333I33I 3I333II33344I433I333I34I336I...3.34393..3

3 793334033333333363403333333343 3834333333349333343 33334433g33933334333339393333393933334033 33333333333334333333333333.............3.3333333.39303OS40313fAg3314334333333333A3333333A3333333333933f9333333333t33333333334333333k333433MAP

333933333 33333333339333333403333333333334 33333333333333333.333333333333333333333333933333 .333333333333333333333333333433333 333343333

3 333333333333333343333333333333344333.333333333333333 .336333.4033333~~~~~~~~~~~041009S.A.-Asee W03333333333433330333A 43A 3333933 0033A333A3334333400333 A39It3I 33.3I3I.3I I33-33I3I3I33I3I3I39A3I3I33I3I3I33I3I......333333333333333

33333333839333333333334333333344003333333 3433 ..34 33333433333333333333333333333 30333433

£ 3333333333333333333334333333334 383333333433333333343333333333033393333333 3333333333333333333033333333833333333333333 3391333333333

3 3933334333333333340g333343333333333393436343333933339333333333333333333484333333333333393633333393.,33333333333333333333333333 33333333333333333.33333333333

3fIf%OVV"A333333393333q34-333333333.i3339330 "O 3 Al4033944 33A3343333333A3393k33I3A33All33Sit3v33A33333 .13333333P33333"333......3 )A&314*3q*43333033403

m- I GO--mmaAm"M A x 4- -4 I It 441 A.-Vffft I t Iso".33 -&t33333333333o333333433333333343343a,*33333333I.333334333333 333 3333333333394404033933333340333334333433333333333333404I33 333433333333333333v333f3333333333333333 333333333333334 333I3I3I433333-33A3It33la331 .33333 ...333333333..333343333433

* 333333433~AAA A 133333333403r43333A33 AA-RV3333433333.0-4-03A333333433333333330Ps'- 1.333333343333333I3veto 344 3333333333440333333334333333333.99.,000*90.4..33 9"433933..33X333333333333of

33333333 43333333433 0- I .40i0V W A-33A333431333333433333333334333 . 333P3333vel A33 .* #v I 3qqs$33 te33of 4

O1 -.4.3wo" A-ON333333334R4 6333333333333. .3333A X 0.33..3? -A.43J,IfI V9* II e-e 98.8 9996'j3 334A03333434AA3A93 3~T3 Ii3A933V3t3 ANA33..O3333433333333390433333434V33-3A43A3333430-coax33v3 I3333I343333333333333 3443333 3333333333 333433333.33.3)34.334333333e333333* 1111"OKA33333333AN34433333S33 VA33333343&O" 93I3S4Iv To& a It3343343333333 333333333333333333333333333333330333 3333 333 w3 4343933 4..... .w33.3333 4333 83333 33333343333333413f OWNA3333339433333Aw 3333AA9333331113303333033334033t333333333833333333333433A33333333333333a3 33330433333333333 .33.3333333393333333I333I34383T 914433333333 0493334033333 333433404111 333,73394 la33333333-V333V33AI3IVIt001 tT*-13333"t33.443434...0433333330.00 010 "-%-14'4 *-A909'A44OA

I3333333334343343I3333A3A" 933493333I333V33I3I333A4333333)A333333303333333 333333333333338 33333333333334333339333343334333333333333333a3.3333..63 3.O3.3333333 343333333333

* 33333333333933343333333343393333333343333333333933333333333333333333334333333 333333333333333333433336333 333333333333333333333333333380333333333333331 ..333334333334333333333333403A3I3I3 343333333333333333333333333333333433333333343333333333343333393333-33 33it 33333333333334I3334333I3V33V33 34333333333I3I33a3334333334343I3333333 34333 3333113333333333 I3a33my3I3I3A

* .33343333333333333333333334333333333334333 333334333333334333333 333338334343433~~~~~~Otl)'211111I3333333393333333333333333333333333334903 3333333043333343333333333333333333333333833 3334333333333333333338333333.3633333333343433 3*33-. 333393 3333333333

3.3333333.33333333333333333940334433333343333333344333334343333333333 3333333403343343333333333333333333333333333334334 3 .333.334 3433~ 33-331333-33333L3913

3.6333433393333333333433333.33334333343333 43 333333433433433333333333333333333333333333333433334333333333 3 333~33433..~3~33~ 33.3 333' .33439333.3433333333

,.3333333333393333344043333333333933403339333333 3343 33 333403339333833343333343343339333 3I3333393333343333333333333333333334333333333333333343333 ...33....3333 344344333434033333333 4033933333333

9333333333333343333333339333333333S3333330333333433ASHORE3~33 333333333 333334333343433333333333333333*.43343334334334336343333S3 333333333333333333333VE E35g±.~JLJ3 33333*3339:34333333

6- izzizz tr Ertittizirzftttzyrfrfr fit tj ~ ~ ~ ~~~~~~~~~ ~ ~ ~ ~ ~ ~~~~~~~~33430333333333333333333434343333 Ill335 333333334343343333333333333333433 33334333343.333.3333.333333333333I3330333

943333433333333333333333333433339333333~33~ 333133 133 3333333333033333333733333303333333333333 3433 333..333*333333 3333.43334333333433333333333 3433333333

St 4333333433433g333433333336333II I43333333433344363333303339333333333333 3333I333333333.3Z33.33.33333333Z333333?33 03334333OA3334333343333333633433333393 3343. 3 33333333333333333334433333333333433 44.33..33.3434334333333333 33S3393

" 114$$ I t n II ft i? TrtrTIT I zrzztwwrtitI tIrttzrwr z trz I "ZI?trzl I r?" itmmAv3.333 33..mvm33 II I 33 333 3333xxo333040W3RA0833333333339AA&3A333334333333111.41-48TIltit~ ~ ~ ~ ~ ~ ~ ~~~~~~~~~~~~~~~~3733333.33333333333333333339333 3393333"W119333340331. 303333033443303333143433333333i43333333333333393333333333333333333333333333333I34333333333333333l33333 33333333

*333334033439333303330334~3333~3333333333.3333433333349333433333333333333333 33333333333433393333333433033333333333c33333334333333333333333333333343333333333 Fig.335 (continued).33334333 333333333333334333S3e3n3333ed333image3333333for34994terrain33.333333

90

Page 25: Representation and Matching Pictorial Structurespeople.csail.mit.edu/torralba/courses/6.870/papers/...The Representation and Matching of Pictorial Structures MARTINA. FISCHLER AND

FISCHLER AND ELSCHLAGER: PICTORIAL STRUCTURES

posed a rigid template which is correlated at the dif-ferent positions of the sensed scene in order to get thelocal evaluation array. The correlation is performed onlyat the indicated points of the template. Reproductiondifficulties make it hard to see the intensities of theseenclosed points. For "Vegetation 1," all the points areof intensity 40, for "Vegetation 2," 20. For "Urban," thepoints were randomly assigned intensities of 25 and 35,and so on for the other pieces.The terrain-matching experiments represent a con-

tinuing area of investigation. Our current efforts areprimarily directed toward obtaining a satisfactory setof primitives which can be used as the basis for referencecomponent description. In low-noise experiments, withlimited geometric distortion, the simple descriptions(i.e., ad hoc shape and texture components linked bysprings) produced correct embeddings. Experimentsunder more severe conditions remain to be performed.

Implementation DetailsIn all of the face and terrain experiments presented

in this paper, the springs were assigned the values

0g

gij(Xj -xi) = q

t oo,

if A < row (xj-xi) < B and

C < column (xj- xi) < D

otherwise.

The values A, B, C, and D were typically set by takinga number of sample pictures and determining the small-est box encompassing the variation in the relative posi-tion between the components i and j. A subroutine was

written to perform this task and automatically set thederived parameters into the LEA.

In our current implementation, for a typical 35 X39(face) picture, we require 13 s to compute all theL(EV)A's, and 35 s to execute the LEA (these timesinclude picture input from a disk library, and storageof results on the disk). Most of the programming is inFortran, and the computer is-an IBM /360/40 (the addi-tion instruction on the 360/40 executes in 11.88 ,s).Total core and disk storage in bytes for all arrays is

M(4f+2h) and NM(f+2h), respectively, where M, N,f, h are the number of resolution cells in the sensedimage, the number of L(EV)A's, the amount of core

needed for a floating point number, and the amount ofcore required for a (small) integer. For a typical picturein the face experiments (M=1600, N=8, J=4, h=2),these requirements woi-k out to appr-oximately 32 Kbytes of core, and IOOK bytes of disk storage.

DISCUSSION

Many, though by no means all, visual objects can bedescribed by breaking down the object into a number ofmore "primitive parts," and by specifying an allowablerange of spatial relations which these "primitive parts"must satisfy for the object to be present.As an example, suppose we want to describe a frontal

view of a standing person. This visual object could be

decomposed into six primitive pieces: a head, two arms,a torso, and two legs. For this visual object to be presentin an actual picture, it is required that these six primi-tives occur (or at least that some significant subset ofthem occurs), and also that they occur within a certainspatial relationship one to the other-that is, the legsshould be next to each other, and below the torso; thetorso should be between and below the tops of the twoarms; and the head should be on top of the torso.

It may be noticed that in the previous two para-graphs, we implicitly separated the local aspects fromthe global aspects of the description; the local aspectsare the primitive parts of the picture, and the globalaspects are the spatial relations between these parts.While at first glance it does not seem unnatural to makethis separation, in practice there is a frequently encoun-tered difficulty, that is, the feedback between thWe localand the global.To illustrate this difficulty, let us go back to the ex-

ample of the view of a standing person. Any methodwhich detects torsos on a local level (that is, a methodwhich detects torsos without using any knowledge ofthe positions of nearby arms and legs) might very welldetect several torsos in a picture. In fact, the actual or"true" torso may be one of the weaker of the torsosdetected by the method; it may even happen that thetrue torso is not detected at all. What does determinethe position of the true torso is the position of the truearms, legs, and head. But, unfortunately, the reverse istrue. The positions of, say, the true arms depends onthe position of the true torso. Thus, the possible positionof each piece affects the possible position of each otherpiece, making for a circular type of dependency.

It seems that whatever the visual object is, wheneverwe try to separate the global and the local, the samecircular dependency occurs. Many times attempts torecognize visual objects described in such a way involvealternating between local and global analysis, heuristics,backup procedures, etc.One conceptual way of avoiding this circularity is to

evaluate simultaneously a complete interpretation of thepicture; e.g., in the example given above, we would lookat, and evaluate, complete configurations of head, arms,legs, torso, etc. The best complete interpretation couldthen be chosen. This approach, however, requires thatwe make an infeasibly large number of evaluations. Itwas just this computational problem that in the firstplace led to the decomposition approach.The implication of the above discussion is that, in

general, we cannot hope to decompose the global evalua-tion problem into a number of smaller independent prob-lems, but rather must use something akin to the simul-taneous evaluation, taking advantage of any reductionin total variable interdependency to reduce the requirednumber of such evaluations.

In this paper, we accomplish this through the follow-ing machinery. First, an embedding metric is presentedwhich sets the framework for evaluating how well any

91

Page 26: Representation and Matching Pictorial Structurespeople.csail.mit.edu/torralba/courses/6.870/papers/...The Representation and Matching of Pictorial Structures MARTINA. FISCHLER AND

IEEE TRANSACTIONS ON COMPUTERS, JANUARY 1973

composition of primitive picture pieces (parts of the

decomposed picture) matches the desired composite

picture. Second, a sequential optimization (dynamic

programing-type) algorithm is developed which takes

advantage of the decomposition to reduce drastically

the computational requirements (our computationalrequirements grow linearly with the size of the picture,

rather than exponentially). The contribution of this

paper is the simultaneous offering of the above two com-

ponents and their suitability for application to a wide

class of pictorial objects.In addition to the image-matching application, which

was the center of most of the development in this paper,

we have also attempted to establish the utility of the

representational aspects of the embedding metric for

general picture description applications.The work presented here is a continuation of the

investigation described in [1] and [2], where Fischler

uses sequential optimization for matching two-dimen-sional scenes, introduces the generic form of the em-

bedding metric elaborated on here, and presents the con-

cepts of coherent segmentation, arbitrary serialization,and sequential constraints. The relation of the heuristicembedding problem to formal decision theory is also

discussed. The only other paper in which sequentialoptimization is applied to a broad class of problems in-

volving two-dimensional scenes is where Martelli andMontanari [4] present a metric and matched algorithm

for smoothing pictures. Kovalewsky [5 ] and Montanari[3] have applied dynamic programing to the detectionof (one-dimensional) line-like pictures. Reference [3] isan outstanding paper, in which Montanari providesmuch insight into the characteristics of sequential opti-

mization. Both Kovalewsky [5 ] and Montanari [3]comment on the representational aspects associatedwith their optimization procedures. An embedding met-

ric conceptually very similar to the one given in [1 J, [2 ],and this paper is discussed in a broad and interestingwork by Bremermann'0 [121 with respect to its poten-

tial use in character recognition, speech recognition, and

control of effectors (e.g., manipulators).

ACKNOWLEDGMENT

The authors wish to thank 0. Firschein and J. Tenen-

baum for many constructive suggestions relative to theorganization of this paper.

REFERENCES[1] M. A. Fischler, "The detection of scene congruence," Lockheed

Missiles & Space Company, Inc., Palo Alto, Calif., Rep. 6-83-71-2, Jan. 1971.

[2] M. A. Fischler, "Aspects of the detection of scene congruence," inProc. 2nd Int. Joint Conf. Artificial Intelligence (Advance Paper),Sept. 1971, pp. 88-100.

[3] U. Montanari, "On the optimal detection of curves in noisypictures," Commun. Ass. Comput. Mach., vol. 14, pp. 335-345,May 1971.

[4] A. Martelli and U. Montanari, "Optimal smoothing in pictureprocessing," in Proc. IFIP Congr. Amsterdam, The Nether-lands: North-Holland, 1971.

10 Other relevant publications by Bremerman) incltude [131 and[14].

[5] V. A. Kovalewsky, "Sequential optimization in pattern recog-nition and pattern description," in Proc. IFIP Congr. Amster-dam, The Netherlands: North-Holland, 1968, pp. 146-151.

[6] R. Bellman and S. Dreyfus, Applied Dynamic Programming.Princeton, N. J.: Princeton Univ. Press, 1962.

[7] U. Bertele and F. Brioschi, "A new algorithm for the solutionof the secondary optimization problem in nonserial dynamicprogramming," J. Math. Anal. Appl., vol. 27, no. 3, pp. 565-574,1969.

[8] 0. Firschein and M. A. Fischler, "Describing and abstractingpictorial structures," Pattern Recognition, vol. 3, pp. 421-444,Nov. 1971.

[9] A. Rosenfeld, Picture Processing by Computer. New York:Academic, 1969.

[10] W. F. Miller and A. S. Shaw, "Linguistic methods in pictureprocessing-A survey," in 1968 Fall Joint Comput. Conf.,AFIPS Conf. Proc., vol. 33. Washington, D. C.: Thompson,1968, pp. 279-290.

[11] M. D. Kelly, "Edge detection in pictures by computer usingplanning," in Machine Intelligence 6, B. Meltzer and D. Michie,Ed. New York: Elsevier, 1971, pp. 397-410.

[12] H. J. Bremermann, "Cybernetic functionals and fuzzy sets," inAnn. Symp. Rec. 1971 IEEE Syst., Man Cybern. Group,Oct. 1971, pp. 248-253.

[13] "Pattern recognition, functionals, and entropy," IEEETrans. Bio-Med. Eng., vol. BME-15, pp. 201-207, July 1968.

[14] -, "What mathematics can and cannot do for patternrecognition," in Pattern Recognition in Biological and TechnicalSystems, 0. J. Grusser, Ed. Heidelberg, Germany: Springer,1971, pp.31-45.

[15] W. W. Bledsoe, "The model method in facial recognition,"Panoramic Research, Inc., Palo Alto, Calif., Rep. PRI:15,Aug. 1966.

116] , "Man-machine facial recognition," Panoramic Research,Inc., Palo Alto, Calif., Rep. PRI:22, Aug. 1966.

[17] A. J. Goldstein, L. D. Harmon, and A. B. Lesk, "Identificationof human faces, "Proc. IEEE, vol. 59, pp. 748-760, May 1971.

Martin A. Fischler (S'57-M'58) was born inNew York,I- Y., on February 15, 1932. Hereceived the B.E.E. degree from the City Col-lege of New York, New York, in 1954 and theM.S. and Ph.D. degrees in electrical engineer-ing from Stanford University, Stanford, Calif.,in 1958 and 1962, respectively.

He served in the U. S. Army for two yearsand held positions at the National Bureau ofStandards and at Hughes Aircraft Corpora-tion during the period 1954 to 1958. In 1958

he joined the technical staff of the Lockheed Missiles & Space Com-pany, Inc., at the Lockheed Palo Alto Research Laboratory, PaloAlto, Calif., and currently holds the title of Staff Scientist. He hasconducted research and published in the areas of artificial intelligence,picture processing, switching theory, computer organization, andinformation theory.

Dr. Fischler is a member of the Association for Computing Ma-chinery, the Pattern Recognition Society, the Mathematical Associa-tion of America, Tau Beta Pi, and Eta Kappa Nu. He is currentlyan Associate Editor of the journal Pattern Recognition and is a pastChairman of the San Francisco Chapter of the IEEE Society on Sys-tems, Man, and Cybernetics.

Robert A. Elschlager was born in Chicago,Ill., on May 25, 1943. He received the B.S.degree in mathematics from the University ofIllinois, Urbana, in 1964, and the M.S. degreein mathematics from the University of Cali-fornia, Berkeley, in 1969.

Since then he has been an AssociateScientist with the Lockheed Missiles & SpaceCompany, Inc., at the Lockheed Palo Alto Re-

Q< search Center, Palo Alto, Calif. His currentinterests are picture processing, operating

systems, computer languages, and computer understanding.Mr. Elschlager is a member of the American Mathematical

Society, the Mathematical Association of America, and the Associa-tion for Symbolic Logic.