discrete relaxation



Pattern Recognition, Vol. 23, No. 7, pp. 711-733, 1990.

0031-3203/90 $3.00 + .00
Printed in Great Britain.

Pergamon Press plc
1990 Pattern Recognition Society

DISCRETE RELAXATION

EDWIN R. HANCOCK*† and JOSEF KITTLER†

*S.E.R.C. Rutherford Appleton Laboratory, Chilton, Didcot, Oxfordshire, OX11 0QX, U.K.; †Department of Electronic and Electrical Engineering, University of Surrey, Guildford, GU2 5XH, U.K.

(Received 23 January 1989; in revised form 9 June 1989; received for publication 27 September 1989)

Abstract-In this paper we are concerned with the approach to discrete relaxation that regards the global consistent labelling of objects as maximum a posteriori probability (MAP) estimation. We commence by reviewing existing work and draw attention to some of the technical difficulties that limit the applicability of the technique. The difficulties originate from the distinct requirements that the final labelling is both a global optimum and globally consistent. We demonstrate how it is possible to achieve a globally consistent MAP estimate by iterative label replacement without sacrificing the representational capacity of the label process. The computational realisation of the proposed approach admits a conceptual ingredient hitherto not present in discrete relaxation schemes, namely, that of a label error process. This process naturally leads to a measure of congruency between initial inconsistent labellings and physically occurring dictionary items. The use of congruency provides a means of resolving labelling ambiguities and inconsistencies which would otherwise remain unresolved if the conventional models of the label process were employed. The performance of the technique is demonstrated for the highly structured problem of edge labelling.

Discrete relaxation

Maximum a posteriori probability estimation

Consistent labelling

Dictionary methods

Congruency

Edge-labelling

1. INTRODUCTION

The interpretation of sensory data is fundamental to many applications in the areas of character recognition, computer vision, image processing and speech recognition. The task is invariably addressed as one of pattern recognition in which it is necessary to assign a class identity to each of a set of perceptual entities which have been derived through a process of segmenting the available data. It has been consistently demonstrated that the performance of the classification procedure is vastly improved if contextual information is utilised.tl -1 ') Contextual information exists in two forms. Firstly, there is observational information contained within the environment of each object to be classified; this exists in the form of raw measurements or features derived from data for the segmental entities. Secondly, there is knowledge concerning the constraints that apply between objects of different class identity, i.e. a world model of the labelling process in-hand. Algorithms developed in connection with the applications described above have utilised a vast diversity of approaches to contextual classification. These approaches can be described within a taxonomy which distinguishes between the viewpoints in which the available contextual information is regarded and the limiting assumptions concerning the measurement or label processes necessary to formulate practical decision schemes.

To develop this taxonomy, and to set the work reported in this paper in context, we distinguish between two possible modalities of decision making. According to the global mode, referred to as the message-centred interpretation, a joint assignment of labels is sought which simultaneously best explains the mass of observations for all objects. This is to be contrasted with the local mode of decision making, referred to as the object-centred interpretation, in which attention is confined to the assignment of a label to a single object at a time based on the entirety of observational information for all objects. These two viewpoints lead to a variety of algorithms. Those belonging to the message-centred family include dictionary look-up, (18) discrete relaxation ( ') and the simulated annealing scheme of Geman and Geman. (8) The technique known as probabilistic relaxation ( ' ), on the other hand, derives from an object-centred viewpoint. (' )

As indicated above, the taxonomy of algorithms can be further extended to include the way in which the measurement and label processes are modelled. The measurement process is frequently assumed either to be Markovian, being modelled, for example, by an autonormal distribution, ( '-" ) or to be subject to independence assumptions as a consequence of a memoryless noise process. (7) The label process, on the other hand, encapsulates knowledge of constraints or structure which apply to the application in-hand. Model assumptions concerning the label process relate not just to the performance of the resulting contextual decision algorithm, but also to its representational capacity. Broadly speaking, the existing algorithms can be divided into those that make use of Markov assumptions (8,11) and those that utilise dictionaries of permissible labelling configurations to represent the label process. (')

In this paper we are concerned with the technique known as discrete relaxation. The term is used to describe a family of message-centred algorithms that aim at updating the symbolic label assignments used to represent objects in such a way as to achieve global consistency. The technique should be contrasted with probabilistic relaxation in which the objects are represented by a vector of label probabilities which can take on continuous values. The two relaxation strategies also place different demands upon the dictionary of permissible labellings. In the probabilistic case it is used in an evidence combining procedure to update the label probabilities, whereas in the discrete case it is used to gauge the consistency of label assignments.

The original discrete relaxation algorithm was reported by Waltz( r 1 and drew on earlier work concerning the consistent labelling problem by Huffman (15) and Clowes.0 1) Waltz's algorithm was developed in connection with the interpretation of line drawings. In this work the role of the dictionary was to generate the set of global scene labellings that were both unambiguous and consistent. Subsequent work by Hummel and Zucker (17) aimed at formulating discrete relaxation as a consistent labelling problem through the definition of a global heuristic consistency measure. Unambiguous label assignments were sought by optimising the consistency measure over the discrete space of label configurations. The compatibility of label assignments was represented by a vector of binary label probabilities that were updated in the optimisation procedure. If instead of binary label probabilities, continuous values were admitted, then the algorithm took on the features of probabilistic relaxation.

These early studies of discrete relaxation did not admit any observational evidence for label assignments. More recent work has aimed at incorporating measurement information by regarding the labelling procedure as global MAP estimation. The work was motivated by the need to extend the highly successful method of dictionary look-up"') to unsegmented arrangements of objects. The applicability of the resulting algorithm is limited by the need to optimise the a posteriori probability of label assignments over the discrete space of possible global configurations. To avoid the computational overhead associated with the generation of plausible global label configurations, several authors have suggested optimisation schemes that are based on the iterative replacement of single labels. This has the desirable effect of permitting a manifestly global process to be realised in terms of local computation. In the stochastic relaxation algorithm of Geman and Geman (8) global optimisation is achieved by the technique known as simulated annealing. Besag, (11) on the other hand, abandons the quest for a global optimum and develops a recursive label replacement scheme which is guaranteed only to find a local optimum. Kittler and Foglein (19) adopt an object-centred viewpoint in developing decision rules that are applicable to objects arranged in lattice configurations.

Notwithstanding the issues of global versus local optimisation, a second problem plagues the MAP approach to discrete relaxation. At each stage of the label replacement scheme it is necessary to compute the a priori probability of label configurations. In the early stages of iterative replacement, the labelling is likely to contain many highly inconsistent configurations obtained, for instance, by a non-contextual means. Such corrupted label configurations will invariably lie outside the dictionary of acceptable possibilities; the problem becomes more critical with increasing size of the representational units used to model the label process. If the strict dictionary model is adopted, i.e. that non-physical items occur with zero probability, then it is impossible to proceed with MAP estimation by single label replacements. In many applications the dictionary model is relaxed and non-physical labellings are admitted with finite likelihood. The majority of algorithms opt for the use of Markov models to describe the label process. This is true of Besag, (11) Geman and Geman, (8) and Kittler and Foglein. (3) Yu and Fu (9) actually assume complete independence of the label process. The only example of the use of a dictionary model is to be found in Kittler and Pairman 1O J who coarsely partition the space of label configurations into items considered likely and those considered less likely. With the expedient measures employed to make them practicable, the schemes described above have the dual disadvantages that they degrade the representational capacity of the label process and are not guaranteed to increase consistency.

In this paper we would like to pursue the development of discrete relaxation with the objective of achieving a MAP estimate of label assignments which is globally consistent. We will retain the condition that configurations lying outside the dictionary occur with zero probability. Our model of the label process derives from the observation that the distribution of probability over estimated label configurations is related to a consistency measure in some monotonic fashion. This is in the spirit of classical discrete relaxation algorithms where the role of the dictionary is to determine the consistency of candidate label configurations.

To meet the objectives set out above, we adopt as our starting point the MAP estimate of the labelling under the assumption of conditional independence of the measurement process. Our model of the label process is based upon the global heuristic criterion used by Hummel and Zucker, with the dictionary playing the role of consistency measure. With the binary assignment of label probabilities, the global criterion can be used as a coarse measure of consistency. However, it is not fine enough to represent the departures from full labelling congruency which are necessary to successfully model the label process in highly structured applications.

The problem of coarseness is overcome by introducing the concept that is novel to this paper, namely that inconsistent configurations have been arrived at from true and therefore consistent configurations through the action of a label corrupting process. Under this assumption we can expand the probability of inconsistent labellings over the space of dictionary items. This has the important consequence that it is only necessary to estimate the likelihood of inconsistent labellings given each dictionary item. If it is assumed that the label errors are independent and equiprobable for different objects, then the label error process can be modelled by a binomial distribution. The resulting criterion function is very finely graded according to consistency and can be used to adequately model the global label process.

Under certain simplifying assumptions we can formulate a number of specific realisations of the discrete relaxation scheme which can potentially reduce the computational burden. One such scheme implicitly searches the space of underlying label assignments for the label replacement which results in maximum congruency, i.e. it selects the dictionary item which is most nearly isomorphic with the observed configuration of labels. Another scheme corresponds to the dictionary partition scheme of Kittler and Pairman.

The outline of this paper is as follows. In Section 2, we develop the MAP estimation procedure and describe the difficulties associated with assigning probabilities to label configurations. In Section 3.1 we formally define the dictionary and introduce some assumptions concerning the interactions between the objects to be labelled. Section 3.2 introduces the formulation of discrete relaxation as an optimisation problem involving a criterion function. This idea is extended to accommodate the dictionary through the introduction of the concept of a label corrupting process in Section 3.3. Section 3.4 provides a physical model of the error process. This model is developed in Section 3.5 to provide some simplifications of the MAP estimation procedure. The ideas developed in Section 3 are applied to the edge labelling problem in Section 4. Section 5 provides some conclusions.

2. MAP ESTIMATION

We stated in the previous section that we are concerned with developing discrete relaxation schemes that can be applied to highly structured domains in which the label process is represented by a dictionary. To develop the necessary framework and to motivate the approach introduced in Section 3, we commence by reviewing the MAP estimation approach to discrete relaxation.

Consider the set of objects $a_j$, $j = 1, \ldots, N$, which are arranged in a network configuration and whose index set is denoted by $\mathcal{N} = \{1, \ldots, N\}$. We are concerned with finding the class identity $\theta_j$ of each object according to the set of class labels $\Omega = \{\omega_l, l = 1, \ldots, m\}$. Given observations on each of the objects in the network $\mathbf{x}_j$, $\forall j \in \mathcal{N}$, the message-centred viewpoint for interpreting the available contextual information involves finding the labelling of the entire network which maximises the a posteriori probability of the joint labelling, $P(\theta_1, \ldots, \theta_N \mid \mathbf{x}_1, \ldots, \mathbf{x}_N)$, i.e. the MAP estimate. The optimisation procedure over the space of label configurations can be stated as a global decision rule

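assign the global labelling $\theta_l = \omega_{\alpha_l}$, $\forall l \in \mathcal{N}$, to the objects in the network if

$$P(\theta_l = \omega_{\alpha_l}, \forall l \in \mathcal{N} \mid \mathbf{x}_l, \forall l \in \mathcal{N}) = \max_{\omega_{\theta_l} \in \Omega,\ \forall l \in \mathcal{N}} P(\theta_l = \omega_{\theta_l}, \forall l \in \mathcal{N} \mid \mathbf{x}_l, \forall l \in \mathcal{N}). \tag{1}$$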

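From the Bayes rule, the quantity of interest to the MAP estimation procedure becomes

$$P(\theta_l = \omega_{\theta_l}, \forall l \in \mathcal{N} \mid \mathbf{x}_l, \forall l \in \mathcal{N}) = \frac{p(\mathbf{x}_l, \forall l \in \mathcal{N} \mid \theta_l = \omega_{\theta_l}, \forall l \in \mathcal{N})\, P(\theta_l = \omega_{\theta_l}, \forall l \in \mathcal{N})}{p(\mathbf{x}_l, \forall l \in \mathcal{N})}. \tag{2}$$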

Ignoring the joint measurement density appearing in the denominator of (2) since it is a fixed property of the network, there are two quantities which influence the decision process. Firstly, there is the joint conditional density function $p(\mathbf{x}_l, \forall l \in \mathcal{N} \mid \theta_l = \omega_{\theta_l}, \forall l \in \mathcal{N})$ that models the observational process by which measurements have been obtained. It is not of primary interest to us here since it can be modelled by a variety of well established techniques. Secondly, and of more interest to us, there is the a priori probability of label configurations on the network, i.e. $P(\theta_l = \omega_{\theta_l}, \forall l \in \mathcal{N})$. It is the need to know the complete set of consistent global label configurations that renders the label MAP estimation procedure implied by (1) and (2) above of little practical use.
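To make the cost of the global decision rule concrete, the following Python sketch evaluates (1) and (2) by exhaustive enumeration. It is illustrative only: the function name `brute_force_map`, the per-object log-likelihood table `log_lik` (which presumes conditionally independent measurements) and the callable `log_prior` are assumptions of this sketch, not part of the original formulation. The denominator of (2) is dropped because it does not depend on the labelling.

```python
import itertools
import math


def brute_force_map(log_lik, log_prior, labels):
    """Exhaustively maximise log p(x | theta) + log P(theta).

    log_lik[j][w] : log p(x_j | theta_j = w); assumes conditionally
                    independent measurements (assumption of this sketch).
    log_prior     : hypothetical callable mapping a tuple of labels to the
                    log prior of the joint labelling (may return -inf).
    labels        : the label set Omega.
    """
    n = len(log_lik)
    best, best_score = None, -math.inf
    # The loop visits every one of the m**N candidate labellings.
    for config in itertools.product(labels, repeat=n):
        score = log_prior(config) + sum(
            log_lik[j][w] for j, w in enumerate(config)
        )
        if score > best_score:
            best, best_score = config, score
    return best, best_score


# Toy example: three objects, two labels, and a prior that only admits
# the two constant labellings (a stand-in for a dictionary).
allowed = {("a", "a", "a"), ("b", "b", "b")}


def toy_log_prior(config):
    return math.log(1.0 / len(allowed)) if config in allowed else -math.inf


toy_log_lik = [
    {"a": math.log(0.9), "b": math.log(0.1)},
    {"a": math.log(0.4), "b": math.log(0.6)},
    {"a": math.log(0.8), "b": math.log(0.2)},
]
best, score = brute_force_map(toy_log_lik, toy_log_prior, ["a", "b"])
print(best)
```

The second object on its own prefers label `b`, but the joint prior forces the globally consistent choice. The enumeration over $m^N$ labellings is what renders the direct procedure impractical for networks of realistic size.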

The generation of plausible global labellings is, after all, the aim of the original discrete relaxation algorithm of Waltz.rtl This algorithm commences by admitting the complete set of label assignments for every object. Each node in the network is systematically visited and the dictionary is used to discard inconsistent labels. This procedure terminates when no further labels can be discarded. Since each object can still admit several possibilities, the remaining set of object-label assignments, referred to as the greatest set of consistent labellings, is still potentially ambiguous. Unambiguous labellings were obtained by repeating the procedure used to generate the greatest set of consistent labellings with each object in turn admitting only one of its label possibilities. In practice, the effectiveness of the label discarding process is determined by the pruning capacity of the dictionary, i.e. its ability to reduce the number of plausible global labellings to a manageable size. The set of globally consistent and unambiguous configurations generated by the Waltz algorithm can be searched for the MAP label estimate.
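The label discarding step can be sketched as follows. This is a simplified filtering loop in the spirit of the procedure just described, not Waltz's original implementation. The names `waltz_filter`, `cliques` and `dictionary`, and the assumption that one dictionary of label tuples applies to every clique, belong to this sketch only.

```python
def waltz_filter(cliques, dictionary, labels):
    """Discard labels that no dictionary entry can support.

    cliques    : list of tuples of object indices (hypothetical layout).
    dictionary : set of label tuples permitted on any clique, ordered in
                 the same way as the objects of the clique.
    labels     : the complete label set admitted initially at every object.
    """
    objects = {obj for clique in cliques for obj in clique}
    candidates = {obj: set(labels) for obj in objects}
    changed = True
    while changed:  # terminate when no further label can be discarded
        changed = False
        for clique in cliques:
            # Dictionary entries compatible with the current candidate sets.
            live = [
                entry for entry in dictionary
                if all(entry[k] in candidates[obj]
                       for k, obj in enumerate(clique))
            ]
            for k, obj in enumerate(clique):
                supported = {entry[k] for entry in live}
                if supported != candidates[obj]:  # supported is a subset
                    candidates[obj] = supported
                    changed = True
    return candidates


# Toy chain of three objects with pairwise cliques.
result = waltz_filter(
    cliques=[(0, 1), (1, 2)],
    dictionary={("a", "b"), ("b", "c")},
    labels=["a", "b", "c"],
)
print(result)
```

In this toy case the dictionary prunes every object down to a single label. In general several candidates survive at each object, which is the residual ambiguity referred to above.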

In the work reported here our aim is different. Instead of globally assigning labels, we seek to iteratively replace labels for single objects. The label replacements are designed to increase both the MAP estimate and the consistency of labelling. This twofold objective is met by making certain assumptions concerning the interaction relations between objects in the network. The key to the approach is the development of a global model of the label process which measures consistency but does not rely upon generating the greatest set of consistent labellings. This model leads to a global MAP estimation procedure that is achieved by considering the effects of single label replacements on a restricted neighbourhood of the network. Although the algorithm is based on local computations, it retains the message-centred interpretation and therefore belongs to the discrete relaxation family.

Our proposed methodology outlined above should be contrasted with that described in the literature. Several authors have developed iterative label replacement schemes that attempt to reduce the global MAP estimation to a series of local subproblems. Besag, (11) Geman and Geman, (8) and Yu and Fu (9) seek to maximise the globally defined quantity $P(\theta_1, \ldots, \theta_N \mid \mathbf{x}_1, \ldots, \mathbf{x}_N)$ by making individual label replacements. The success of the optimisation procedures is limited by the models adopted for the label process. For instance, the scheme of Besag (11) locates a local optimum of the globally defined MAP. Unlike the work reported here, the label process is not modelled by a compound quantity; label replacements are determined on the basis of the most probable local update direction. However, under conditions of small initial labelling inconsistency, Besag's method is a special case of our approach. Kittler and Foglein, (19) and Kittler and Pairman, (4) on the other hand, adopt an object-centred viewpoint. Rather than seeking the global MAP estimate, they maximise the quantity $P(\theta_j \mid \mathbf{x}_j, \mathbf{x}_i, \theta_i, \forall i \in I_j, i \neq j)$, i.e. they perform label replacements which simultaneously best explain the available observations together with the current configuration of estimated labels for the local object neighbourhood $I_j$. With the exception of Kittler and Pairman, (4) and Kittler and Foglein, (19) the schemes listed above all opt for Markov field models.

To commence the development of our iterative label replacement scheme, we consider two realisations of the labelling of the network in which object $a_j$ takes on label values $\omega$ and $\lambda$ respectively. The ratio of the a posteriori probabilities of the two labellings is

$$\frac{P(\theta_j = \omega, \theta_i, \forall i \in \mathcal{N}, i \neq j \mid \mathbf{x}_i, \forall i \in \mathcal{N})}{P(\theta_j = \lambda, \theta_i, \forall i \in \mathcal{N}, i \neq j \mid \mathbf{x}_i, \forall i \in \mathcal{N})} = \frac{p(\mathbf{x}_i, \forall i \in \mathcal{N} \mid \theta_j = \omega, \theta_i, \forall i \in \mathcal{N}, i \neq j)}{p(\mathbf{x}_i, \forall i \in \mathcal{N} \mid \theta_j = \lambda, \theta_i, \forall i \in \mathcal{N}, i \neq j)} \times \frac{P(\theta_j = \omega, \theta_i, \forall i \in \mathcal{N}, i \neq j)}{P(\theta_j = \lambda, \theta_i, \forall i \in \mathcal{N}, i \neq j)}. \tag{3}$$

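Under the assumption that the measurement process is conditionally independent we can factorise the joint conditional density as follows

$$p(\mathbf{x}_i, \forall i \in \mathcal{N} \mid \theta_j = \omega, \theta_i, \forall i \in \mathcal{N}, i \neq j) = p(\mathbf{x}_j \mid \theta_j = \omega) \prod_{i \in \mathcal{N},\ i \neq j} p(\mathbf{x}_i \mid \theta_i). \tag{4}$$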

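The ratio relevant to the label replacement scheme now reduces to

$$\frac{P(\theta_j = \omega, \theta_i, \forall i \in \mathcal{N}, i \neq j \mid \mathbf{x}_i, \forall i \in \mathcal{N})}{P(\theta_j = \lambda, \theta_i, \forall i \in \mathcal{N}, i \neq j \mid \mathbf{x}_i, \forall i \in \mathcal{N})} = \frac{p(\mathbf{x}_j \mid \theta_j = \omega)\, P(\theta_j = \omega, \theta_i, \forall i \in \mathcal{N}, i \neq j)}{p(\mathbf{x}_j \mid \theta_j = \lambda)\, P(\theta_j = \lambda, \theta_i, \forall i \in \mathcal{N}, i \neq j)}. \tag{5}$$

This formula is the basis for optimising the a posteriori probability $P(\theta_1, \ldots, \theta_N \mid \mathbf{x}_1, \ldots, \mathbf{x}_N)$ over the space of possible label assignments to the network. The algorithm which suggests itself is to replace label $\lambda$ by label $\omega$ at site $a_j$ if the numerator exceeds the denominator in equation (5). By systematically repeating the procedure at all sites in the network the MAP estimate will monotonically increase.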
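The replacement rule based on the ratio of a posteriori probabilities can be sketched in Python as below. This is a minimal illustration under stated assumptions: `iterative_label_replacement`, `log_lik` and `log_prior` are hypothetical names, the comparison is carried out on logarithms, and the toy prior is a simple Markov-like penalty on neighbouring disagreements chosen only so that the example runs. It is not the dictionary model developed in this paper.

```python
def iterative_label_replacement(labelling, log_lik, log_prior, labels,
                                max_sweeps=50):
    """Greedy single-site updates that never decrease the log posterior.

    log_lik[j][w] : log p(x_j | theta_j = w)
    log_prior     : hypothetical callable giving the log joint prior of a
                    complete labelling (tuple of labels).
    """
    current = list(labelling)
    for _ in range(max_sweeps):
        changed = False
        for j in range(len(current)):
            def score(w):
                # Only the measurement at site j and the joint prior
                # change when the label at site j is replaced.
                trial = current[:j] + [w] + current[j + 1:]
                return log_lik[j][w] + log_prior(tuple(trial))

            best = max(labels, key=score)
            if score(best) > score(current[j]):
                current[j] = best
                changed = True
        if not changed:  # a full sweep without replacement: local optimum
            break
    return tuple(current)


def chain_log_prior(config, beta=1.0):
    # Toy prior (assumption): penalise each disagreeing neighbouring pair.
    return -beta * sum(a != b for a, b in zip(config, config[1:]))


site_log_lik = [
    {0: -0.1, 1: -2.3},
    {0: -0.2, 1: -1.6},
    {0: -0.9, 1: -0.5},
    {0: -0.1, 1: -2.3},
]
initial = (0, 0, 1, 0)  # non-contextual choice at each site
final = iterative_label_replacement(initial, site_log_lik,
                                    chain_log_prior, [0, 1])
print(final)
```

Each accepted replacement strictly increases the log posterior, so the sweep terminates at a local optimum. If `log_prior` returned minus infinity for every configuration reachable by one replacement, no update would be accepted, which is the deadlock that arises under a strict dictionary model.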

We have now met our first objective by elucidating a global optimisation procedure which can be realised by iterative label replacements. A means of meeting our second objective of improving consistency is more elusive for reasons that we will now discuss. For some neighbourhoods it is impossible to make a single label replacement that will create a configuration which is isomorphic with a dictionary item. Procedures which require knowledge of the a priori probability of any occurring label configuration $P(\theta_i, \forall i \in I_j)$, where $I_j \subset \mathcal{N}$, therefore become less amenable to dictionary representation as the size of the neighbourhood $I_j$ increases. This is equally true of global MAP estimation procedures and object-centred recursive decision rules. The limitation arises from the observation that in the presence of noise or some other label corrupting process, the potential number of initial configurations is vast in comparison with the number of physical dictionary items. For instance, in the edge labelling application, which is represented by a set of 5 labels, the label process is encapsulated in a dictionary which contains 181 configurations for the 3 x 3 pixel sublattice. By contrast the corrupted label configurations are drawn from the $5^9$ combinatorial possibilities. For a noisy image the likelihood of finding an initial labelling of the sublattice which belongs to the dictionary is very small indeed.
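The disparity between the two counts is easily checked. The short Python calculation below compares the 181 dictionary entries with the number of possible labellings of nine pixels by five labels. Reading the ratio as a probability presumes that corrupted labellings are drawn uniformly at random, which is an assumption made here for illustration only.

```python
n_labels, n_sites, n_dictionary = 5, 9, 181

# Number of distinct labellings of the 3 x 3 sublattice.
n_configurations = n_labels ** n_sites

# Fraction of those labellings that are dictionary items.
fraction = n_dictionary / n_configurations

print(n_configurations)    # 1953125
print(f"{fraction:.2e}")   # 9.27e-05
```

Fewer than one in ten thousand of the possible sublattice labellings is a dictionary item.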

An attempt at circumventing the problem of the inconsistency of the initial non-contextual labelling has been suggested by Kittler and Pairman. (4) Their method utilises a dictionary which implicitly admits all possibilities, but avoids exhaustively listing them by partitioning the label space into those configurations considered likely and those considered unlikely. The deficiency of this scheme lies in the fact that it assigns balanced likelihood to competing non-physical labellings irrespective of the degree to which they depart from configurations in the physical section of the dictionary, i.e. there is no congruency metric associated with the labellings. In short, the coarse partitioning of label configurations results in slow convergence or even deadlock and, as we shall demonstrate in Section 3, in some highly structured cases leads to an unacceptably high level of residual, unresolvable, ambiguity. This coarseness in the grading of consistency is also a feature of the Gibbs distributions used by Geman and Geman, and the heuristic global criterion of Hummel and Zucker.

Our aim in the next section of this paper is to develop a metric of consistency that is sufficiently fine to model the label process in highly structured labelling problems. In doing this we will retain the interpretation that configurations lying outside the dictionary are considered non-physical and occur with zero probability. Our approach to the problem of initial inconsistency is to utilise the fact that the joint prior satisfies the theorem of total probability and can therefore be expanded over the complete space of label configurations. Given that non-physical items occur with zero probability, instead of being of exponential complexity, the expansion satisfies closure over the space of label configurations in the dictionary. This property allows us to reformulate the MAP label replacement scheme in a way which utilises the transition probabilities between observed labellings and dictionary items. With the additional assumption that the label error process is independent for different objects, the transition probabilities are modelled by a geometric function which depends on a measure of congruency between observed labellings and dictionary items.
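The expansion of the joint prior over the dictionary, combined with an independent label error process, can be illustrated with a short Python sketch. The specific form used here is an assumption made for illustration: a configuration that differs from a dictionary item in $H$ of its $n$ positions is given likelihood $p^H (1 - p)^{n - H}$, and dictionary items are taken to be equiprobable. The names `config_prior` and `p_error` are hypothetical.

```python
def config_prior(observed, dictionary, p_error):
    """Expand the prior of an observed configuration over dictionary items.

    Each label is assumed to be in error independently with probability
    p_error, so the transition probability from a dictionary item depends
    only on the number of positions in which the two configurations differ.
    """
    n = len(observed)
    total = 0.0
    for item in dictionary:
        # Number of label errors needed to turn `item` into `observed`.
        h = sum(a != b for a, b in zip(observed, item))
        total += (p_error ** h) * ((1.0 - p_error) ** (n - h))
    return total / len(dictionary)  # equiprobable dictionary items


toy_dictionary = [("a", "a", "a"), ("b", "b", "b")]
p_consistent = config_prior(("a", "a", "a"), toy_dictionary, p_error=0.1)
p_one_error = config_prior(("a", "a", "b"), toy_dictionary, p_error=0.1)
print(p_consistent, p_one_error)
```

Unlike the strict dictionary model applied directly, the inconsistent configuration receives a small but non-zero weight that falls off with its distance from the nearest dictionary items, which is the fine grading of consistency sought here.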

3. MAP ESTIMATION AND DISCRETE RELAXATION

In this section we are concerned with demonstrating how the global MAP estimation procedure can be reduced to a series of local subproblems while retaining the message-centred interpretation and how this relates to conventional discrete relaxation schemes. This approach is based upon the assumption that the network can be described in terms of its superclique structure, i.e. that it can be considered as consisting of cliques of mutually interacting objects. We commence by introducing the concept of the superclique structure of the network and defining the corresponding dictionary. In order to measure the consistency of the superclique labellings we construct a criterion function. With a suitable choice of global consistency measure this criterion function leads to an iterative label replacement algorithm. By introducing the important conceptual ingredient of this paper, namely, that label estimates are obtained from true values through an error process, we can expand the criterion function over the dictionary of label configurations and avoid the problem of distributing the available mass of a priori probability among the non-physical labellings. This leads to a physical reinterpretation of the local compatibility measure as a label error probability.

3.1. Definition of superclique dictionaries

We assume that the network can be described in terms of groups of mutually interacting objects, referred to as cliques. In general, these cliques will be defined over a particular set of interaction relations. For instance, on the pixel lattice the set of interaction relations in a 3 x 3 neighbourhood includes pixel pairs, triplets and quartets; examples of the cliques are shown in Fig. 1(a). Each object in the network may be specified in terms of the set of cliques to which it belongs. In general, there will be a set, referred to as the superclique set, which contains all cliques of lower cardinality. In the case of the 5 x 5 neighbourhood on the pixel lattice the superclique set consists of all nonets which contain the pixel of interest; the cliques of pixel pairs and quartets are subsets of the objects contained in the superclique. Figure 1(b) shows the superclique-set of object $a_j$ on the pixel lattice. For convenience we denote the superclique set for object $a_j$ by $\mathcal{C}_j = \{C_r(j), \forall r \mid C_r(j) \subseteq I_j\}$, where $I_j$ is the index set of objects in the neighbourhood of object $a_j$.

Fig. 1(a). Examples of cliques on the pixel lattice.

Fig. 1(b). The superclique-set of object $a_j$ on the pixel lattice.
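The superclique set of a pixel on the lattice can be enumerated directly. The Python sketch below lists the 3 x 3 nonets that contain a given pixel; the function name `nonets_containing` is hypothetical and image borders are ignored for simplicity.

```python
def nonets_containing(row, col):
    """Return every 3 x 3 window (list of pixel coordinates) containing (row, col)."""
    windows = []
    # The top-left corner of a window containing the pixel lies at most
    # two rows above and two columns to the left of it.
    for top in range(row - 2, row + 1):
        for left in range(col - 2, col + 1):
            windows.append([
                (r, c)
                for r in range(top, top + 3)
                for c in range(left, left + 3)
            ])
    return windows


windows = nonets_containing(10, 10)
covered = {pixel for window in windows for pixel in window}
print(len(windows), len(covered))  # 9 25
```

The nine nonets together cover exactly the 5 x 5 neighbourhood of the pixel of interest.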

We can now define the dictionary of labellings over the superclique set. Suppose that the number of permissible labellings for the clique $C_r(j)$ is $Z(C_r(j))$ and that these labellings are listed in a dictionary, denoted by $\Theta_{C_r(j)}$. If $\lambda^k_l$ denotes the label on the object $a_l$ corresponding to the $k$th entry in the dictionary $\Theta_{C_r(j)}$, then we can introduce the following shorthand notation for entries in the dictionary

$$\Lambda^k_{r,j} = \{\theta_l = \lambda^k_l, \forall l \in C_r(j)\}$$

with the superclique dictionary denoted by

$$\Theta_{C_r(j)} = \{\Lambda^k_{r,j},\ k = 1, \ldots, Z(C_r(j))\}. \tag{6}$$

In what follows we assume that the superclique dictionaries are node invariant, i.e. $\Theta_{C_r(i)} = \Theta_{C_r(j)}, \forall i, j$ and $Z(C_r(j)) = Z(C_r(i)) = Z_r, \forall i, j$; to maintain internal consistency we demand that if $C_r(j) \equiv C_q(l)$ then $\Theta_{C_r(j)} \equiv \Theta_{C_q(l)}$. We also adopt the strict dictionary-model for the distribution of probability among label configurations. Accordingly, the probability measure associated with any physically impossible configuration $\{\theta_l = \omega_{\theta_l}, \forall l \in C_r(j)\} \notin \Theta_{C_r(j)}$ is zero, i.e.

$$P(\{\theta_l = \omega_{\theta_l}, \forall l \in C_r(j)\} \notin \Theta_{C_r(j)}) = 0. \tag{7}$$

The available probability mass is distributed uniformly between the label configurations in the dictionary, i.e. they are assumed to be equiprobable

$$P(\{\theta_l = \omega_{\theta_l}, \forall l \in C_r(j)\} \in \Theta_{C_r(j)}) = \frac{1}{Z_r}. \tag{8}$$
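The strict dictionary model of (7) and (8) can be sketched in a few lines. The three-object clique and the toy dictionary below are hypothetical and serve only to make the uniform prior concrete; they are not the dictionaries used in the paper.

```python
def strict_dictionary_prior(configuration, dictionary):
    """Prior of a superclique labelling: uniform over dictionary items, zero elsewhere."""
    items = {tuple(item) for item in dictionary}
    if tuple(configuration) in items:
        return 1.0 / len(items)
    return 0.0


# Hypothetical toy dictionary on a three-object clique: at most one edge label "e".
toy_dictionary = [("n", "n", "n"), ("e", "n", "n"), ("n", "e", "n"), ("n", "n", "e")]

print(strict_dictionary_prior(("n", "e", "n"), toy_dictionary))  # 0.25
print(strict_dictionary_prior(("e", "e", "n"), toy_dictionary))  # 0.0
```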

3.2. Definition of criterion function for discrete relaxation

Our objective is to find an unambiguous and consistent labelling of the network which best explains the given observations for the objects. As we demonstrated in (5), for a single label replacement the only measurement that influences the MAP estimate is that for the object under consideration. The behaviour of the joint probability of label assignments and its influence on the MAP estimate is more complex. We would therefore like to define a criterion function, $F$, that can be used to represent the joint probability of global label assignments, i.e. $P(\theta_i = \omega_{\theta_i}, \forall i \in N)$. We commence by making a choice which encapsulates the mathematics of classical discrete relaxation that has been suggested by Hummel and Zucker ( lt )

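$$F = \sum_{j \in N} \sum_{r=1}^{|\mathcal{C}_j|} \sum_{\Lambda^k_{r,j} \in \Theta_{C_r(j)}} P(\theta_j = \omega_{\theta_j}) \, P(\theta_l = \lambda^k_{\theta_l}, \forall l \in C_r(j)) \prod_{l \in C_r(j),\, l \neq j} P(\theta_l = \omega_{\theta_l}) \tag{9}$$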

where $P(\theta_l = \omega_{\theta_l})$ is the probability of label $\omega_{\theta_l}$ on object $a_l$. According to discrete relaxation there is a single unambiguous label assignment to each object. Consequently, the label probabilities have values of either zero or unity. Under the assumption that the superclique dictionary items are equiprobable the criterion function can be expressed in terms of Kronecker delta functions as

$$F = \sum_{j \in N} \sum_{r=1}^{|\mathcal{C}_j|} \frac{1}{Z_r} \sum_{\Lambda^k_{r,j} \in \Theta_{C_r(j)}} \prod_{l \in C_r(j)} \delta_{\omega_{\theta_l}, \lambda^k_{\theta_l}}. \tag{10}$$

The criterion function counts the number of supercliques whose labellings are isomorphic with a dictionary item, i.e. each superclique contributes a value of 1 or 0 to the criterion function depending on whether it is congruent or not. Using (7) and (8), and noting that under the node invariance assumption each clique features in $F$ $|\mathcal{C}_j|$ times, we can interpret the consistency function for a particular labelling realisation $Y_N = \{\theta_i = \omega_{\theta_i}, \forall i \in N\}$ as

$$F(Y_N) = |\mathcal{C}_j| \sum_{j \in N} P(\theta_l = \omega_{\theta_l}, \forall l \in C_p(j)) \tag{11}$$

i.e. a sum of those labelling combinations on the principal supercliques, $C_p(j)$, ensuring that each clique features only once in the criterion.
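The counting interpretation of the criterion function can be sketched as follows. This is a minimal illustration that assumes a one-dimensional chain of objects with overlapping three-object supercliques and a hypothetical toy dictionary; the paper itself works on the pixel lattice.

```python
def count_consistent_supercliques(labelling, supercliques, dictionary):
    """Count the supercliques whose current labelling matches some dictionary item."""
    items = {tuple(item) for item in dictionary}
    return sum(
        1
        for clique in supercliques
        if tuple(labelling[obj] for obj in clique) in items
    )


# Hypothetical setup: five objects on a chain, three overlapping supercliques.
supercliques = [(0, 1, 2), (1, 2, 3), (2, 3, 4)]
toy_dictionary = [("n", "n", "n"), ("e", "n", "n"), ("n", "e", "n"), ("n", "n", "e")]

print(count_consistent_supercliques(["n", "e", "n", "n", "e"], supercliques, toy_dictionary))  # 3
print(count_consistent_supercliques(["n", "e", "e", "n", "n"], supercliques, toy_dictionary))  # 1
```

The second labelling differs from the first by only two labels, yet the count drops from 3 to 1, which illustrates how coarsely this criterion grades inconsistency.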

If $F_{\max}$ is the maximum value of the criterion function then one way of realising our objective of relating the a priori probability of label configurations to the criterion function in (3) is to model the consistency of labelling using a Dirac delta function

$$P(\theta_i, \forall i \in N) = \begin{cases} \text{constant} & \text{if } F = F_{\max} \\ 0 & \text{otherwise.} \end{cases} \tag{12}$$

This choice is of limited computational advantage for the optimisation procedure since the binary nature of $P(\theta_i, \forall i \in N)$ makes it impossible to locate the MAP estimate by any means other than exhaustive search. However, if instead of making the binary assignment of probability given in (9), we allow consistent labellings to have a non-zero prior by making the definition $P(\theta_i, \forall i \in N) = \text{constant} \cdot F$, then when $F = F_{\max}$ the assigned labelling is consistent. Inconsistency is then graded by the number of inconsistent superclique labellings. Unfortunately even this grading is too coarse to be of any practical use. We seek a model of the label process which can measure the consistency of individual label placements to a finer resolution.

3.3. The criterion function for the label process

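In Section 2.1 we explained that the MAP criterion was satisfied by systematically replacing labels so as to maximize the ratio given in (5). In this section our aim is to obtain a procedure for finding a consistent labelling which maximises the ratio with $P(\theta_j = \omega, \theta_i = \omega_{\theta_i}, \forall i \in N, i \neq j)$ replaced by a criterion function in the spirit of (11), i.e. we seek to express the ratio in the form

$$\frac{P(\theta_j = \omega, \theta_i = \omega_{\theta_i}, \forall i \in N, i \neq j \mid x_i, \forall i \in N)}{P(\theta_j = \lambda, \theta_i = \omega_{\theta_i}, \forall i \in N, i \neq j \mid x_i, \forall i \in N)} = \frac{p(x_j \mid \theta_j = \omega) \, F(\theta_j = \omega, \theta_i = \omega_{\theta_i}, \forall i \in N, i \neq j)}{p(x_j \mid \theta_j = \lambda) \, F(\theta_j = \lambda, \theta_i = \omega_{\theta_i}, \forall i \in N, i \neq j)} \tag{13}$$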

Provided that the ratio is greater than unity, the MAP will monotonically increase in value. We can then choose that label $\omega$ which maximises the ratio. It should be noted that by changing one label, the current value of the consistency criterion $F(\theta_j = \omega, \theta_i = \omega_{\theta_i}, \forall i \in N, i \neq j)$ will change by an amount

$$\Delta F = F(\theta_j = \omega, \theta_i = \omega_{\theta_i}, \forall i \in N, i \neq j) - F(\theta_j = \lambda, \theta_i = \omega_{\theta_i}, \forall i \in N, i \neq j) \tag{14}$$

which involves only the superclique-set of object $a_j$. Thus the effect of a label change is relatively easy to evaluate. As it stands the proposed optimisation scheme has the dual drawbacks of legalising non-physical labellings and coarsely quantising the criterion function.

A more justifiable conceptual basis is to regard non-physical labellings as corrupted realisations of consistent labellings. It is the label corrupting process which gives rise to an erroneous and inconsistent interpretation of the network in terms of the labelling $\theta_i = \omega_{\theta_i}, \forall i \in N$. This places a new interpretation on the observed labelling. Previous interpretations allow globally inconsistent labellings but assign zero probability to locally inconsistent configurations; this is clearly contradictory since global inconsistency cannot be accommodated unless the observability of such configurations is admitted locally. We avoid this conflict by distinguishing between the underlying globally consistent labelling and its observable realisation. Any observable realisation has a finite probability of occurrence irrespective of whether or not it is consistent. Underlying inconsistent labellings do not occur; they have zero probability in compliance with the strict dictionary concept.

Let us consider the change in the criterion function due to the replacement of the label on object $a_j$. Clearly we can confine our attention only to the contributions from the supercliques that contain this object. At iteration $n$ of the replacement scheme, the observed labelling of the neighbourhood of object $a_j$ is denoted by $Y^n_{I_j} = \{\theta_i = \omega^n_{\theta_i}, \forall i \in I_j, i \neq j\}$. The criterion function can be expressed as a sum of two terms; the first term depends on the label replacement while the second does not, i.e.

$$F(\theta_j = \omega, Y^n_N) = F(\theta_j = \omega, Y^n_{I_j}) + F(\theta_i = \omega^n_{\theta_i}, \forall i \in N, i \notin I_j). \tag{15}$$

We therefore confine our attention to the first term, i.e. $F(\theta_j = \omega, Y^n_{I_j})$, in the further development of the label replacement scheme.

To develop further the idea introduced above that observed labellings are corrupted versions of configurations belonging to the dictionary, we denote the underlying, i.e. true, class identity of objects in the network by $\theta_i = \tau_{\theta_i}, \forall i \in N$. According to our new interpretation, the label configurations on individual supercliques are drawn from the relevant dictionary, i.e. $\{\theta_i = \tau_{\theta_i}, \forall i \in C_r(j)\} \in \Theta_{C_r(j)}$. A priori, we have no knowledge as to the identity of the true underlying consistent labelling. We must therefore admit all possibilities from the dictionary. Accordingly, we use the theorem of total probability to expand the criterion function over the complete space of consistent label assignments permitted by the dictionaries on the superclique-set of object $a_j$, i.e.

$$F(\theta_j = \omega, Y^n_{I_j}) = \sum_{r=1}^{|\mathcal{C}_j|} \sum_{\Lambda^k_{r,j} \in \Theta_{C_r(j)}} P(\theta_j = \omega, \theta_l = \omega^n_{\theta_l}, \theta_m = \lambda^k_{\theta_m}, \forall l, m \in C_r(j), l \neq j). \tag{16}$$

We can now re-express this result in a way which emphasises the important conceptual ingredient of this paper, namely, that the observed labelling of each neighbourhood is obtained from an undetermined dictionary item through a label error process

$$F(\theta_j = \omega, Y^n_{I_j}) = \sum_{r=1}^{|\mathcal{C}_j|} \sum_{\Lambda^k_{r,j} \in \Theta_{C_r(j)}} P(\theta_j = \omega, \theta_l = \omega^n_{\theta_l}, \forall l \in C_r(j), l \neq j \mid \theta_l = \lambda^k_{\theta_l}, \forall l \in C_r(j)) \times P(\theta_l = \lambda^k_{\theta_l}, \forall l \in C_r(j)) \tag{17}$$

where $P(\theta_j = \omega, \theta_l = \omega^n_{\theta_l}, \forall l \in C_r(j), l \neq j \mid \theta_l = \lambda^k_{\theta_l}, \forall l \in C_r(j))$ is the likelihood of the current estimate of the labelling on the superclique $C_r(j)$ given the dictionary item indexed $k$, i.e. the neighbourhood transition probability. Since the underlying labellings which do not belong to the dictionary occur with zero probability, the sum over object labels, which is potentially of exponential complexity, is restricted to a summation over superclique dictionary items.

This formulation of the criterion function has the advantage that potentially occurring non-physical labellings only appear in respect to physical dictionary items. It is no longer necessary to know a priori the probability of a non-physical label configuration. The realisation of the scheme depends only on the likelihood of non-physical configurations given each dictionary item. In the next section we will present a generally applicable model of the label error process.

3.4. The label error process

The novel concept introduced in the previous section was that observed inconsistent labellings arise from undetermined dictionary items through a label error process. To pursue the development of the discrete relaxation algorithm for label replacement implied by (17), we require a model of the label error process, i.e. a realisation of the neighbourhood transition probabilities. In this subsection we present such a model which is applicable to a variety of realistic labelling applications. Section 3.5 contains further developments which are based on more restrictive assumptions; these lead to considerable computational simplification of the discrete relaxation scheme.

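Our basic model assumption is that the label errors for each object are independent. This allows the neighbourhood transition probabilities to be factorized in the following manner

$$P(\theta_j = \omega, \theta_l = \omega^n_{\theta_l}, \forall l \in C_r(j), l \neq j \mid \theta_l = \lambda^k_{\theta_l}, \forall l \in C_r(j)) = P(\theta_j = \omega \mid \theta_j = \lambda^k_{\theta_j}) \times \prod_{l \in C_r(j),\, l \neq j} P(\theta_l = \omega^n_{\theta_l} \mid \theta_l = \lambda^k_{\theta_l}). \tag{18}$$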

Under the further assumption that the label error probabilities are class independent, we can naturally develop the model to include the concept of congruency between an observed neighbourhood labelling and a dictionary item. To this end we define the congruency measure $K(r, j, k)$ between the estimated labelling of the superclique $C_r(j)$ and the dictionary item $\Lambda^k_{r,j}$ to be

$$K(r, j, k) = \epsilon(\omega, \lambda^k_{\theta_j}) + \sum_{l \in C_r(j),\, l \neq j} \epsilon(\omega^n_{\theta_l}, \lambda^k_{\theta_l}) \tag{19}$$

where $\epsilon(i, j) = 0$ if $i = j$ and $\epsilon(i, j) = 1$ if $i \neq j$. If the label errors occur with equal probability, $P_e$, in each class and the correct labellings occur with a class-independent probability, $1 - P_e$, then the neighbourhood transition probability is

$$P(\theta_j = \omega, \theta_l = \omega^n_{\theta_l}, \forall l \in C_r(j), l \neq j \mid \theta_l = \lambda^k_{\theta_l}, \forall l \in C_r(j)) = (1 - P_e)^{|C_r(j)| - K(r,j,k)} P_e^{K(r,j,k)}. \tag{20}$$

Since $(1 - P_e)^{|C_r(j)|}$ is a fixed property of the superclique system, it does not enter the decision making process. Consequently, the labelling transition probabilities can be modelled in terms of geometric weights which depend on the ratio $\alpha = [P_e/(1 - P_e)]$. The final form of the neighbourhood transition probability is

$$P(\theta_j = \omega, \theta_l = \omega^n_{\theta_l}, \forall l \in C_r(j), l \neq j \mid \theta_l = \lambda^k_{\theta_l}, \forall l \in C_r(j)) = b \, \alpha^{K(r,j,k)} \tag{21}$$

where $b = (1 - P_e)^{|C_r(j)|}$.

Clearly, the model of the label error process given by equation (21) above can be directly incorporated into the recursive label replacement scheme implied by (17). In practical labelling problems it will be necessary to estimate the parameter of the model, i.e. $P_e$. Our specification of the model of the neighbourhood transition probabilities has all of the characteristics of a binomial distribution of label errors. If $t$ denotes the index of the dictionary item which is isomorphic with the true labelling of the superclique under consideration, then the probability that there are $s$ label errors is

$$P(K(r, j, t) = s) = \frac{|C_r(j)|!}{s! \, (|C_r(j)| - s)!} \times (1 - P_e)^{|C_r(j)|} \left[ \frac{P_e}{1 - P_e} \right]^s. \tag{22}$$
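The congruency measure and the two equivalent forms of the neighbourhood transition probability can be checked numerically. The sketch below assumes labellings are stored as tuples of label symbols; the function names and the example values are hypothetical.

```python
from math import comb, isclose


def congruency(observed, item):
    """Number of sites where the observed labelling differs from the dictionary item, as in (19)."""
    return sum(1 for o, d in zip(observed, item) if o != d)


def transition_probability(observed, item, p_e):
    """(1 - p_e)^(|C| - K) * p_e^K, the form of (20)."""
    k = congruency(observed, item)
    return (1.0 - p_e) ** (len(item) - k) * p_e ** k


def geometric_form(observed, item, p_e):
    """b * alpha^K with alpha = p_e / (1 - p_e) and b = (1 - p_e)^|C|, the form of (21)."""
    alpha = p_e / (1.0 - p_e)
    b = (1.0 - p_e) ** len(item)
    return b * alpha ** congruency(observed, item)


def error_count_pmf(s, size, p_e):
    """Binomial probability of s label errors in a superclique of the given size, as in (22)."""
    return comb(size, s) * (1.0 - p_e) ** size * (p_e / (1.0 - p_e)) ** s


observed, item, p_e = ("e", "e", "n"), ("e", "n", "n"), 0.1
print(congruency(observed, item))  # 1
print(isclose(transition_probability(observed, item, p_e),
              geometric_form(observed, item, p_e)))  # True
```

Because `b` is common to every dictionary item of a given superclique, only the geometric weight `alpha ** K` distinguishes one item from another.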

From this distribution, the label error probability can be estimated using a knowledge of the mean number of label errors in each superclique, $\langle K(r, j, t) \rangle$, i.e.

$$P_e = \frac{\langle K(r, j, t) \rangle}{|C_r(j)|}. \tag{23}$$

Although this formula suggests itself as the basis for a parameter estimation procedure, there are technical obstacles to its direct implementation. The immediate problem arises from the fact that a global consistent labelling is required to determine the true identities of objects. Since such a labelling is the goal rather than a prerequisite of the discrete relaxation procedure, we are unable to estimate $\langle K(r, j, t) \rangle$ in a straightforward and unbiased way. If, on the other hand, we try to estimate $\langle K(r, j, t) \rangle$ for each principal superclique using the dictionary of consistent labellings, then we encounter two additional difficulties. The first of these difficulties is associated with the ambiguity involved in determining the congruency measure of the true labelling given the many possibilities represented by the dictionary. It may not always be the case that the dictionary item of maximum congruency corresponds to the true labelling of a superclique. The second difficulty relates to the way in which the supercliques used for the calculation of the mean number of errors are selected. It is important to ensure that the supercliques form non-overlapping neighbourhoods in order to avoid yet another source of bias. Under the assumption that the true labelling of each non-overlapping superclique corresponds to the dictionary item with the minimum number of errors, i.e. of maximum congruency, we can make an optimistic estimate of $P_e$.
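The optimistic estimate described above can be sketched as follows. The sketch assumes that the caller has already selected non-overlapping supercliques and that each is given as a tuple of observed labels; the function name and the toy data are hypothetical.

```python
def optimistic_error_estimate(observed_cliques, dictionary):
    """Estimate the label error probability from the minimum congruency measure.

    Each observed superclique is matched to its most congruent dictionary item,
    and the resulting error counts are averaged over all sites, following (23).
    """
    total_errors = 0
    total_sites = 0
    for observed in observed_cliques:
        total_errors += min(
            sum(1 for o, d in zip(observed, item) if o != d)
            for item in dictionary
        )
        total_sites += len(observed)
    return total_errors / total_sites


# Hypothetical toy data: three non-overlapping three-object supercliques.
toy_dictionary = [("n", "n", "n"), ("e", "n", "n"), ("n", "e", "n"), ("n", "n", "e")]
observed_cliques = [("n", "e", "n"), ("e", "e", "n"), ("n", "n", "n")]
print(optimistic_error_estimate(observed_cliques, toy_dictionary))  # one error in nine sites
```

The estimate is optimistic because the most congruent item need not be the true labelling, so the true number of errors can only be equal or larger.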

We have now described all of the ingredients necessary to apply the transitive model of the label process to realistic problems. All that remains is to elucidate a label update rule. From (15) it is clear that the criterion consists of two parts. Our development has concentrated on the part that models the contextual effects of the label replacement. The remaining component models the non-contextual effect. The balance between the two terms determines the role of contextual and non-contextual decision processes. For instance, in the limit of very large networks the criterion given in (15) may become dominated by the non-contextual term. This is clearly undesirable behaviour. Since we are primarily concerned with contextual decision making here, we give the non-contextual criterion zero weight. The final form of the iterative label replacement scheme is therefore

assign $\theta_j$ to class $\omega^{n+1}_{\theta_j}$ if

$$p(x_j \mid \theta_j = \omega^{n+1}_{\theta_j}) \, F(\theta_j = \omega^{n+1}_{\theta_j}, Y^n_{I_j}) = \max_{\omega \in \Omega} \, p(x_j \mid \theta_j = \omega) \, F(\theta_j = \omega, Y^n_{I_j})$$

where

$$F(\theta_j = \omega, Y^n_{I_j}) = \sum_{r=1}^{|\mathcal{C}_j|} b \sum_{\Lambda^k_{r,j} \in \Theta_{C_r(j)}} \alpha^{K(r,j,k)} \times P(\theta_l = \lambda^k_{\theta_l}, \forall l \in C_r(j)). \tag{24}$$

We have now achieved our objective set out in Section 3.2. The criterion function is no longer binary in nature; it takes on continuous values depending on the label error probability and the degree of congruency of the existing superclique label estimates.
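A single label replacement of the kind given in (24) can be sketched as follows. This is a minimal illustration under stated assumptions: equiprobable dictionary items, a labelling stored as a list indexed by object, and a caller-supplied `likelihood(j, label)` function standing in for the measurement density. All names and the toy data are hypothetical.

```python
def criterion(candidate, j, labelling, supercliques_of_j, dictionary, p_e):
    """Sum over supercliques of b * sum_k alpha^K(r, j, k) * P(item k), items equiprobable."""
    alpha = p_e / (1.0 - p_e)
    prior = 1.0 / len(dictionary)
    total = 0.0
    for clique in supercliques_of_j:
        # Observed labelling of the superclique with the candidate placed on object j.
        observed = tuple(candidate if obj == j else labelling[obj] for obj in clique)
        b = (1.0 - p_e) ** len(clique)
        total += b * sum(
            alpha ** sum(1 for o, d in zip(observed, item) if o != d) * prior
            for item in dictionary
        )
    return total


def replace_label(j, labelling, supercliques_of_j, dictionary, likelihood, labels, p_e):
    """Return the label maximising likelihood times contextual criterion for object j."""
    return max(
        labels,
        key=lambda w: likelihood(j, w)
        * criterion(w, j, labelling, supercliques_of_j, dictionary, p_e),
    )


# Hypothetical toy problem: a chain of five objects, object 2 is under consideration.
toy_dictionary = [("n", "n", "n"), ("e", "n", "n"), ("n", "e", "n"), ("n", "n", "e")]
labelling = ["n", "e", "e", "n", "n"]
supercliques_of_2 = [(0, 1, 2), (1, 2, 3), (2, 3, 4)]
flat_likelihood = lambda j, label: 0.5  # measurements carry no information here
print(replace_label(2, labelling, supercliques_of_2, toy_dictionary,
                    flat_likelihood, ["e", "n"], p_e=0.1))  # n
```

With an uninformative likelihood the decision is driven entirely by context: replacing the second of two adjacent "e" labels by "n" makes all three supercliques congruent with a dictionary item.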

At this point it is worth considering the domain of applicability of (24). The independence assumptions which lead to the congruency measure only apply when the label corrupting process is memoryless. The discrete relaxation procedure given in (24) clearly has strict applicability to such problems. However, there are many label corrupting processes which are not memoryless. As a concrete example, in Section 4 we point out that the memoryless noise assumption does not accurately model all types of inconsistency present in the edge labelling application. In particular it cannot account for label corrupting effects brought about by oversampling.

Even if the model presented in this section does not strictly apply to all types of label corruption, the form of the criterion function given by (17) is applicable to a wide range of labelling problems. The application of (17) requires a model of the neighbourhood transition probabilities $P(\theta_j = \omega, \theta_l = \omega^n_{\theta_l}, \forall l \in C_r(j), l \neq j \mid \theta_l = \lambda^k_{\theta_l}, \forall l \in C_r(j))$. The model may be more elusive or complex to regulate than that which results under the assumption of label-error independence. For instance, when the label corruption is not memoryless, Markov models may offer a consistent means of representing the label-process. If the application of a Markov model is not feasible, the idea of congruency may still prove an attractive and powerful heuristic modelling ingredient. In Section 4 we exemplify this point by demonstrating the success of the edge labelling application when, rather than being a consequence of memoryless label corruption, the initial labelling inconsistency is due to oversampling.

3.5. Developments of the criterion function

Provided that the label error process satisfies the independence assumptions outlined in Section 2.3, the geometric weighting of dictionary items according to label congruency can be directly implemented. For reasons of computational efficiency we would like to further simplify the realisation of the criterion function. The motivations underlying the simplifications are the need to reduce the number of terms entering the criterion function, which is a compound quantity, and our desire to provide a justification for the dictionary partition scheme employed by Kittler and Pairman.(4)

We commence by making the assumption that the superclique dictionary items are node invariant and equiprobable, i.e.

$$P(\theta_l = \lambda^k_{\theta_l}, \forall l \in C_r(j)) = \frac{1}{Z_r}, \quad \forall C_r(j) \subset N. \tag{25}$$

We now exploit the fact that the sum over dictionary items can be replaced by a sum over congruency measure, with the result that the criterion function can be expressed explicitly as a power series in the parameter $\alpha$, i.e.

Scheme A.

$$F(\theta_j = \omega, Y^n_{I_j}) = \sum_{r=1}^{|\mathcal{C}_j|} \frac{b}{Z_r} \sum_{m=0}^{|C_r(j)|} \alpha^m \sum_{\Lambda^k_{r,j} \in \Theta_{C_r(j)}} \delta_{m, K(r,j,k)} \tag{26}$$
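The step from a sum over dictionary items to a power series in the geometric weight can be verified numerically. The sketch below assumes a single superclique, takes the congruency measure to be the number of differing sites, and uses a hypothetical toy dictionary.

```python
from collections import Counter
from math import isclose


def hamming(observed, item):
    """Congruency measure: number of sites where the two labellings differ."""
    return sum(1 for o, d in zip(observed, item) if o != d)


def direct_sum(observed, dictionary, alpha):
    """Sum of alpha^K over dictionary items."""
    return sum(alpha ** hamming(observed, item) for item in dictionary)


def power_series(observed, dictionary, alpha):
    """Same quantity, grouped by the value m of the congruency measure."""
    histogram = Counter(hamming(observed, item) for item in dictionary)
    return sum(count * alpha ** m for m, count in histogram.items())


toy_dictionary = [("n", "n", "n"), ("e", "n", "n"), ("n", "e", "n"), ("n", "n", "e")]
observed = ("e", "e", "n")
print(isclose(direct_sum(observed, toy_dictionary, 0.1),
              power_series(observed, toy_dictionary, 0.1)))  # True
```

The histogram of congruency values plays the role of the inner sum of Kronecker deltas: it records how many dictionary items lie at each distance `m` from the observed labelling.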

As the objective of the label replacement scheme is the improvement of labelling consistency, it is reasonable to expect the mean value of the congruency measure $K(r, j, k)$ to decrease with each iteration. According to equation (23), this is equivalent to reducing the label error probability with iteration number $n$. If $P_e^{(n)}$ denotes the label error probability at iteration $n$, then we can rewrite (26) to take explicit account of the fact that the consistency of labelling incrementally improves

Scheme B.

$$F(\theta_j = \omega, Y^n_{I_j}) = \sum_{r=1}^{|\mathcal{C}_j|} \frac{b}{Z_r} \sum_{m=0}^{|C_r(j)|} \left[ \frac{P_e^{(n)}}{1 - P_e^{(n)}} \right]^m \sum_{\Lambda^k_{r,j} \in \Theta_{C_r(j)}} \delta_{m, K(r,j,k)} \tag{27}$$

The value of $P_e^{(n)}$ can either be estimated using the procedure implied by (23) or reduced according to a deterministic empirical schedule. The estimation of $P_e^{(n)}$ clearly has the attractive feature that it permits the discrete relaxation procedure to adapt to different levels of label corruption. However, it is not necessarily guaranteed to decrease the label error probability so as to ensure that the final labelling is globally consistent. There may be a substantial residual labelling inconsistency. Reducing the label error probability according to a schedule, on the other hand, does enforce greater consistency on the labelling by demanding an appropriately small final value of $P_e^{(n)}$. In Section 4 we will demonstrate how both strategies for controlling the behaviour of $P_e^{(n)}$ perform.

Our first simplifying assumption concerns the likelihood of label errors. If each label error occurs with small probability, i.e. $\alpha \ll 1$, then only the term in the power series corresponding to items of maximum congruency is significant, i.e.

Scheme C.

$$F(\theta_j = \omega, Y^n_{I_j}) = \sum_{r=1}^{|\mathcal{C}_j|} \frac{b}{Z_r} \, \alpha^{M_r} \sum_{\Lambda^k_{r,j} \in \Theta_{C_r(j)}} \delta_{M_r, K(r,j,k)}$$

where

$$M_r = \min_k K(r, j, k). \tag{28}$$
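The effect of keeping only the most congruent dictionary items can be illustrated numerically. The sketch below assumes a single superclique and a hypothetical toy dictionary, and compares the dominant term with the full sum of geometric weights for a small value of the weight.

```python
def hamming(observed, item):
    """Congruency measure: number of sites where the two labellings differ."""
    return sum(1 for o, d in zip(observed, item) if o != d)


def full_sum(observed, dictionary, alpha):
    """Sum of alpha^K over every dictionary item."""
    return sum(alpha ** hamming(observed, item) for item in dictionary)


def dominant_term(observed, dictionary, alpha):
    """Contribution of the items of maximum congruency (minimum K) only."""
    distances = [hamming(observed, item) for item in dictionary]
    m_min = min(distances)
    return distances.count(m_min) * alpha ** m_min


toy_dictionary = [("n", "n", "n"), ("e", "n", "n"), ("n", "e", "n"), ("n", "n", "e")]
observed = ("e", "e", "n")
alpha = 0.01
ratio = dominant_term(observed, toy_dictionary, alpha) / full_sum(observed, toy_dictionary, alpha)
print(ratio > 0.99)  # True: the neglected terms are of higher order in alpha
```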

The simplified scheme described above is clearly of no great computational advantage since its implementation still requires the enumeration of the congruency measure for each dictionary item. The improvement in computational efficiency results from the observation that the label replacement decision is no longer based upon a compound quantity which must be evaluated over all dictionary items for the entire superclique set. The quantity of interest need only be evaluated over a relatively small number of superclique dictionary items. If the search of the dictionary can be organised in a non-exhaustive or suboptimal way, then a considerable gain in efficiency can be achieved. In the work reported here we will only demonstrate that the exhaustive search of dictionary items results in a satisfactory decision process.

Both Scheme A and Scheme B involve contributions to the criterion function from each superclique dictionary item. By contrast, Scheme C only involves contributions from the most congruent dictionary items, i.e. the number of terms appearing in the criterion function has been substantially reduced. For the purposes of comparison, we would like to develop a criterion function that corresponds to dictionary partition, i.e. one which utilises a binary measure of congruency. The advantage of this approach is that the computation of the criterion function is realised by table look-up to determine whether a label configuration belongs to the dictionary or not.

The formal derivation of such a criterion function within the framework developed in Section 3.4 is not a straightforward problem. The difficulties can be demonstrated by taking note of the fact that the criterion function contains contributions both from the single dictionary item that is isomorphic with the true labelling of the superclique and those that are not. Separating the true labelling, which is indexed by $t$, from the remainder of the power series given by equation (26) we get

$$F(\theta_j = \omega, Y^n_{I_j}) = \sum_{r=1}^{|\mathcal{C}_j|} \frac{b}{Z_r} \left[ \alpha^{K(r,j,t)} + \sum_{m=1}^{|C_r(j)|} \alpha^m \sum_{\Lambda^k_{r,j} \in \Theta_{C_r(j)},\, k \neq t} \delta_{m, K(r,j,k)} \right]. \tag{29}$$

According to the model of the label error process given in Section 3.4, the congruency measure for the true labelling, $K(r, j, t)$, is expected to follow a binomial distribution. The distribution of congruency measure for the remaining dictionary items, i.e. $K(r, j, k)$, $k \neq t$, is more complex. In the absence of a label corrupting process, it measures the number of label modifications required to transform dictionary items into each other. To understand the behaviour of the second term in (29) requires a detailed model of the transformational properties of generalised dictionaries; this is outside the scope of this paper.

We can however proceed to justify the dictionary partition scheme under some restrictive assumptions. Clearly, there exists a labelling indexed $t$ for which

$$K(r, j, t) = \min_{\Lambda^k_{r,j} \in \Theta_{C_r(j)}} K(r, j, k). \tag{30}$$

Since $K(r, j, t)$ is evaluated independently for each member of the superclique set, the most congruent dictionary items in the object neighbourhood may potentially be mutually incompatible. However, under conditions of low label-error probability the likelihood of this is very small. Under conditions of severe label corruption the need to enforce mutual compatibility on superclique dictionary items would lead to a different form of relaxation process. The development of such a scheme is outside the scope of the formalism presented in this paper. Notwithstanding this difficulty, when $K(r, j, t) = 0$, then $\alpha^{K(r,j,t)} = 1$. For convenience we can re-express (29) to make this fact explicit

$$F(\theta_j = \omega, Y^n_{I_j}) = \sum_{r=1}^{|\mathcal{C}_j|} \frac{b}{Z_r} \left[ \delta_{0, K(r,j,t)} + \sum_{m=1}^{|C_r(j)|} \alpha^m \sum_{\Lambda^k_{r,j} \in \Theta_{C_r(j)}} \delta_{m, K(r,j,k)} \right]. \tag{31}$$

If the label error probability is small, i.e. $\alpha \ll 1$, then the second term appearing in the criterion function (29) can be safely assumed to satisfy the following inequality

$$1 \gg \sum_{m=1}^{|C_r(j)|} \alpha^m \sum_{\Lambda^k_{r,j} \in \Theta_{C_r(j)}} \delta_{m, K(r,j,k)}. \tag{32}$$

Thus, in the case of complete congruency with one dictionary item, the value of $F(\theta_j = \omega, Y^n_{I_j})$ will be dominated by just the first term in the square bracket. On the other hand if $K(r, j, t) \neq 0$, i.e. there is incongruency, the first term in (31) will become zero and the second term non-negligible. However, if $\alpha$ is small then we can further assume that the contribution to the second term from second-order and higher-order powers of $\alpha$ can be neglected. In other words, the sum will be dominated by the first-order terms. Let the number of such terms be $\eta$, then we can write:

Scheme D.

$$F(\theta_j = \omega, Y^n_{I_j}) = \sum_{r=1}^{|\mathcal{C}_j|} \frac{b}{Z_r} \left[ \delta_{0, K(r,j,t)} + (1 - \delta_{0, K(r,j,t)}) \cdot \eta \cdot \alpha \right]. \tag{33}$$

When a labelling has complete congruency $F(\theta_j = \omega, Y^n_{I_j}) \sim 1$ and when there is incomplete congruency $F(\theta_j = \omega, Y^n_{I_j}) \sim \eta \cdot \alpha$. This is exactly equivalent to the dictionary partition scheme. For a particular labelling application $\eta$ must be determined for a specific dictionary. In the case of the edge labelling application described in the next section $\eta \approx 3$. It should be stressed that the applicability of (33) is limited to the case of small label error probability.
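The table look-up that realises dictionary partition can be sketched as follows. The sketch covers the bracketed term of a single superclique only; the parameter `eta` stands for the number of first-order terms, and the toy dictionary and parameter values are hypothetical.

```python
from math import isclose


def partition_term(observed, dictionary_items, alpha, eta):
    """Binary congruency: 1 for a dictionary item, eta * alpha for any other labelling."""
    return 1.0 if tuple(observed) in dictionary_items else eta * alpha


# A frozenset gives constant-time membership tests, i.e. the table look-up.
toy_items = frozenset({("n", "n", "n"), ("e", "n", "n"), ("n", "e", "n"), ("n", "n", "e")})

print(partition_term(("n", "e", "n"), toy_items, alpha=0.01, eta=3))  # 1.0
print(isclose(partition_term(("e", "e", "n"), toy_items, alpha=0.01, eta=3), 0.03))  # True
```

Unlike the schemes that enumerate the congruency measure, this look-up never compares the observed labelling with individual dictionary items, which is where the computational saving comes from.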

In the next section we will demonstrate the results of implementing the recursive decision rule given by Scheme A and the simplifications given by Schemes B and C. We will also provide comparison with the dictionary partition scheme which we have justified by Scheme D.

4. APPLICATION TO EDGE LABELLING

This section is concerned with the application of the discrete relaxation procedure to edge-labelling. This is a highly structured application, in which prior knowledge is represented by a dictionary which contains only a very small fraction of the number of the potential noisy configurations. For instance if five labels are used to describe the edge-process, then the dictionary for a 3 x 3 neighbourhood contains 181 items whereas there are $5^9$ admissible configurations. As we indicated in the introduction, it is most improbable that any of the existing MAP estimation approaches to discrete relaxation could successfully encapsulate this kind of representation of consistency and move from a highly inconsistent initial labelling to a global optimum. The main objective of this section is to substantiate our claims that the improved discrete relaxation procedure presented in this paper is capable of handling such a highly structured representation of consistency in a realistic application domain. Given the highly structured nature of the edge application and the number of objects that must be labelled in representative images, it can be regarded as a demanding test. This should by no means be regarded as the only domain of applicability of the iterative discrete relaxation methodology developed in this paper. Neither should the proposed implementation be considered as definitive; there is still scope for dictionary refinement and algorithm tuning which have not yet been fully explored.
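The size of the gap between the dictionary and the full configuration space is easy to make concrete. The arithmetic below assumes five labels on each of the nine sites of a 3 x 3 neighbourhood and uses the dictionary size of 181 quoted above.

```python
labels = 5
sites = 3 * 3                          # pixels in a 3 x 3 neighbourhood
configurations = labels ** sites       # every possible labelling of the neighbourhood
dictionary_size = 181                  # five-label dictionary size quoted in the text
fraction = dictionary_size / configurations

print(configurations)  # 1953125
print(fraction < 1e-4)  # True: fewer than one configuration in ten thousand is legal
```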

To meet the aim set out above we compare the use of the congruency measure with the conventional model of dictionary partition used by Kittler and Pairman.(4) The necessary representations of the measurement and label processes have been inferred in connection with a study of probabilistic relaxation!') To demonstrate the wider utility of the approach, we provide some limited additional comparison with the strategy proposed by Canny.(21,22)

Edge detection is usually conducted as a multistage process.(20-21) The first stages involve filtering the raw pixel grey scale values to remove the effects of noise or to produce a multiscale representation of the image luminance function. Intermediate stages are concerned with the determination of derivatives of the luminance function and the localisation of the associated maxima. Finally, the edges of physical objects are located by assigning labels to gradient maxima which exhibit properties such as strong spatial continuity. We are not primarily interested in the filtering stage here; it is concerned with the optimal characterisation of the edge information in the image luminance function. We are more interested in the localisation and labelling of edges. In the work of Canny this is achieved through the use of non-maximum suppression and hysteresis linking. The hysteresis linking process effectively draws on a dictionary of labellings. However, the way in which connected labellings are used is closer in character to the dictionary partition method than to the full discrete relaxation method which draws on congruency. Canny initially labels edge-pixels if their response exceeds a high threshold value. Pixels lying above a weaker response threshold are then admitted provided they belong to edge segments which are connected to the initially labelled pixels. Finally, unconnected high-response pixels are deleted. The comparison with Canny serves not only to provide a metric of performance but also to contrast the methodology adopted. We show that there are certain advantages in using the discrete relaxation approach.
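The first two steps of hysteresis linking as described above can be sketched with a breadth-first search. This is a simplified illustration, not Canny's implementation: it assumes 8-connectivity, takes the response as a list of rows, and omits both non-maximum suppression and the final deletion of unconnected high-response pixels. The function name and thresholds are hypothetical.

```python
from collections import deque


def hysteresis_link(response, low, high):
    """Label pixels above `high`, then grow along connected pixels above `low`."""
    rows, cols = len(response), len(response[0])
    edge = [[False] * cols for _ in range(rows)]
    queue = deque()
    # Seed with the pixels whose response exceeds the high threshold.
    for r in range(rows):
        for c in range(cols):
            if response[r][c] > high:
                edge[r][c] = True
                queue.append((r, c))
    # Admit weaker pixels only if they are connected to an already labelled pixel.
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (0 <= nr < rows and 0 <= nc < cols
                        and not edge[nr][nc] and response[nr][nc] > low):
                    edge[nr][nc] = True
                    queue.append((nr, nc))
    return edge


response = [
    [0.9, 0.5, 0.1],
    [0.1, 0.1, 0.1],
    [0.1, 0.1, 0.5],
]
edge = hysteresis_link(response, low=0.4, high=0.8)
print(edge[0][1], edge[2][2])  # True False
```

The weak pixel next to the strong seed is admitted while the isolated weak pixel is not, which is the binary in-or-out behaviour that the text likens to dictionary partition.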

4.1. The model of the edge process

We will not give complete details of the models of the measurement or label processes here; a full account can be found in Hancock and Kittler .i r i In summary, the model has the following features:

-The objects to be labelled are pixels which are arranged on a regular square lattice. The supercliques are 3 x 3 neighbourhoods on the pixel lattice. The superclique set for each object not on the boundary of the pixel lattice consists of nine such cliques. The neighbourhood that is appropriate to the label replacement scheme is a 5 x 5 pixel window.

-The edge process may be represented by several label sets which are appropriate for use on a regular square pixel lattice. The label set of minimal cardinality simply denotes whether a pixel belongs to the edge class or to the non-edge class, i.e. $\Omega_2 = \{e, \epsilon\}$. A more complex representation aims at incorporating directional information into the edge-states. This is achieved by assigning one of the five labels denoted by $\Omega_5 = \{\rightarrow, \uparrow, \leftarrow, \downarrow, \epsilon\}$ to pixel sites. According to this representation the four arrows are orthogonal to the gradient directions of physical edges and $\epsilon$ represents the possibility that a pixel belongs to the non-edge class. An even finer representation of directional information is possible using the nine-label set $\Omega_9 = \{\rightarrow, \nearrow, \uparrow, \nwarrow, \leftarrow, \swarrow, \downarrow, \searrow, \epsilon\}$.

-The raw measurements are derivatives of the image luminance function. The differentiation of the image luminance function has the effect of amplifying any thermal noise component present. Popular edge labelling schemes, such as those proposed by Canny(21,22) and Spacek,(20) suppress this effect by filtering the image luminance function prior to edge-labelling. Unfortunately, the filtering operation has a band limiting effect on genuine edge features of high spatial frequency such as corners. Our aim here is different; we remove the label corrupting effects of thermal noise through discrete relaxation. This has the particular advantage that rather than operating on the filtered image luminance function we operate on a symbolic representation of the edge information which retains genuine high frequency edge features. For this reason we aim at using differencing operators which rely on minimal support masks. In particular we use the two 2 x 1 difference operators shown in Fig. 2. Our philosophy here follows that of Spacek who argues that Canny's directional differential operators are needlessly band-limiting. A more effective way of extracting gradient information is to filter the raw image with a circularly symmetric filter and subsequently perform differentiation using difference operators of the smallest possible mask size. The postprocessing of edge information by discrete relaxation can in principle be applied to the output of any circularly symmetric noise suppressing filter. A concrete example is provided by Spacek's filter which satisfies certain optimality criteria for the detection of step-edge profiles.

Fig. 2. The differencing masks used to calculate label probabilities.

-The label probabilities are calculated from the raw derivative measurements, $c_x$ and $c_y$, using a model of the way in which noise manifests itself for non-edge pixels. According to this model, the density function for the derivatives associated with the non-edge class is multivariate Gaussian, being characterised by the standard deviation of an additive noise component in the image luminance function, $\sigma$. The a posteriori probability for the non-edge label derived from this density function is

P(Bi=e1c„cy)=exp-t(cj 3a2 c cr} • (34)

-In principle the parameter $\sigma$ can be adaptively estimated using a procedure similar to that suggested in reference (23). We have not pursued this estimation procedure here. The values of $\sigma$ used in experimentation were those that gave the best subjective performance. When experimenting with synthetic images these values were found to be greater than the additive noise component present; this is reconcilable with the fact that no attempt has been made at accounting for quantization errors. When the thermal noise contamination is low and the image noise is dominated by quantization errors, a better model of the measurement process would involve a uniform rather than a Gaussian density.

-The a posteriori probabilities for the edge labels are calculated by apportioning the residual probability mass, i.e. P_residual = 1 - P(θ_i = e | c_x, c_y). This is done on the basis of the responses c_x and c_y. It is interesting to note that the residual probability available to the edge classes is the same for all orientations of a step edge on the pixel lattice. This is an important property. It means that the label probabilities are not biased in favour of edges which propagate along the directions of the pixel lattice at the expense of those that propagate at angles of π/4 to these directions. It should be noted that this method of calculating edge-label probabilities does not require a detailed model of the appearance of the variety of edge profiles expected in natural imagery. From the modelling point of view this is a great advantage.

-Dictionaries for the 3 x 3 pixel neighbourhood have been generated for each of the three label sets described above. The criteria used to compile these dictionaries were that physical edges are a single pixel wide and undergo changes of direction infrequently. The cardinality of the dictionary for the two-label set is 29, for the five-label set it is 181 and for the nine-label set it is 161. All three dictionaries aim at encompassing the full structure of the edge application. To this end, edges propagating along the pixel lattice axes and at angles of π/4 to them are included. Although the five-label set is not particularly convenient for representing diagonal edges, it is well matched to the response of our minimal differencing operators. We have been at pains to ensure that the two-label and five-label dictionaries include all label configurations expected for diagonal edges; the representation of diagonal edges is by contrast straightforward for the nine-label dictionary. This accounts for the fact that the five-label dictionary is of rather larger cardinality than the nine-label dictionary. We have also obtained a pruned version of the five-label dictionary that contains only 37 items. In the experimentation described later we will demonstrate that this pruned dictionary successfully encapsulates the salient properties of physical edges and does not appear to have a detrimental effect on the representational capacity of the edge-model. The dictionary configurations were assumed to be equiprobable and node invariant in all four cases.
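The measurement model in the first items of the list above can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the forward-difference convention, the function names, and the exact form of the exponent, taken here as (c_x^2 + c_y^2 - c_x c_y)/(3σ^2) (what a bivariate Gaussian gives when the two differences share one noisy pixel), are assumptions for illustration and not a transcription of the authors' implementation.

```python
import numpy as np

def forward_differences(image):
    """Minimal-support 2 x 1 differences (assumed forward-difference convention).

    Returns c_x (difference along columns) and c_y (difference along rows),
    both defined on the (H-1) x (W-1) grid of pixels that have both neighbours.
    """
    img = np.asarray(image, dtype=float)
    c_x = img[:-1, 1:] - img[:-1, :-1]
    c_y = img[1:, :-1] - img[:-1, :-1]
    return c_x, c_y

def non_edge_posterior(c_x, c_y, sigma):
    """Assumed Gaussian non-edge model; sigma is the std of the additive luminance noise."""
    q = c_x**2 + c_y**2 - c_x * c_y
    return np.exp(-q / (3.0 * sigma**2))

def residual_edge_mass(c_x, c_y, sigma):
    """Probability mass left over for the edge labels."""
    return 1.0 - non_edge_posterior(c_x, c_y, sigma)
```

Under these assumptions a step of height h gives the same exponent, h^2/(3σ^2), for an axis-aligned edge (c_x = h, c_y = 0) and for a diagonal edge (c_x = c_y = h), so the residual edge mass is the same in both cases, in line with the orientation property noted in the list.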

In addition to the ingredients described above, the implementation of the discrete relaxation procedure requires a model of the label error process. The realisations of the criterion function presented in Section 3.5 were developed under specific assumptions relating to the independence of the label-error process. For these assumptions to be valid the label corrupting process must be memoryless. The edge labelling application is complex in nature and there are several processes which could potentially give rise to label corruption. Some of these are memoryless; some are not.

If we consider scenes that are composed entirely of step-edges then the image acquisition procedure may introduce label corruption through the action of thermal noise contamination. The resulting label error process is memoryless. Realistic scenes sampled by realistic imaging devices may give rise to additional sources of labelling inconsistency. For instance, a step profile may be sampled by an imaging device of finite spatial resolution which dilates the step-response over a width of several pixels. Similarly, natural scenes contain a variety of potential edge types which do not present step-profiles. These include delta-function edges which correspond to line structures, ridges at illumination maxima and ramp-profiles due to shadows or illumination gradients. If the detection of edges is pursued by postprocessing the directional derivatives of the image luminance function, then only step profiles sampled with perfect resolution will have edge response confined to a single pixel width. The resultant non-contextual labellings of other profiles will consist of thickened edge segments. Viewed according to the constraint that physical edges are of a single pixel width, this type of edge structure is regarded as inconsistent. It is clear that in these cases the source of label corruption does not satisfy the independence assumptions.

In the experimentation that follows we will demonstrate that such departures from the assumed model of label corruption do not impose a severe limitation on the utility of the discrete relaxation technique. It should be pointed out that certain types of structure in the image luminance function may in any case not be regarded as the bona fide goal of the edge-detection scheme outlined above. For instance, in the case of inconsistency produced through the imaging of step-edges with a device of very poor resolving power, it may be argued that the image luminance function is oversampled. In this case our method may be more usefully applied to a sub-sampled image or to the output of a non-maximum suppression operator. For some of the non-step structures listed above, the application of directional first-derivative operators to the raw image luminance function may be an inappropriate preprocessing step; the optimal detection of such features may require specialised additional pre-filtering of the raw image luminance function. It should be stressed therefore that we do not claim to have developed a complete solution to the edge-labelling problem. We have developed a discrete relaxation procedure that has utility for postprocessing certain kinds of edge information. A comparative study of the relative merits and the appropriate uses of different postprocessing techniques is in preparation and will appear elsewhere.


4.2. Experimentation

The purpose of this section is to demonstrate the utility of the discrete relaxation procedures outlined in the preceding sections. In this respect our aims are as follows:

-To show that the iterative label replacement scheme does incrementally improve the consistency of labelling.

-To demonstrate the comparative merits of different realisations of the discrete relaxation procedure which result from the different forms of the criterion function and from the different methods for controlling P_e.

-To evaluate the robustness of the discrete relaxation procedure to label corrupting noise.

-To demonstrate that the edge-labelling procedure can perform in a satisfactory way when the label error process departs from the model described in Section 3.4.

-To demonstrate experimentally that the five-label representation of the label process is adequate when used in conjunction with the minimal differencing operators described above.

-To compare the performance of the discrete relaxation procedure with other edge labelling strategies. For this purpose, we have chosen the edge labelling procedure of Canny.(21,22) We will demonstrate that the Canny edge-detector only has comparable labelling performance to our discrete relaxation scheme when a relatively large filtering mask is used; the band limiting effects of the filtering are evident on high curvature features such as corners.

-To demonstrate that the discrete relaxation procedures have satisfactory performance on natural images which contain features that do not necessarily correspond exactly to our model of the edge-process.

4.2.1. Imagery used for experimentation. In order to meet the aims set out above, we have used both synthetic and natural imagery. The synthetic images were of two types. Firstly, there were those containing step edges with a variety of orientations which were corrupted by different levels of additive Gaussian noise. Secondly, there were images containing non-step profiles, again corrupted by additive Gaussian noise; such profiles simulate the sampling of step edges by realistic imaging systems. In the first case the initial inconsistency of the labelling was due entirely to the corrupting effects of noise. In the second case additional inconsistency originates from the sampling procedure. It is important to demonstrate a competence to label both step and non-step profiles since both types are manifest in images of natural scenes. We will demonstrate that our discrete relaxation scheme can successfully label the variety of edge profiles caused by sampling and illumination in natural scenes.

Fig. 3. Labellings obtained with successive iterations.

4.2.2. Iterative behaviour. Figure 3 shows a sequence of labellings obtained after each successive iteration of the discrete relaxation procedure. The initial image consisted entirely of step edges corrupted by additive Gaussian noise of signal-to-noise ratio (SNR) equal to 10:3. For this example, we have used the full criterion function given by Scheme B with P_e controlled according to a schedule in which its value decreases by a factor of 2 with each iteration; the reason for restricting ourselves to this scheme will be evident to the reader from the following section. As P_e is reduced the number of inconsistent label configurations steadily decreases. The final value of P_e is 0.001 and there are very few residual inconsistencies.
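The deterministic schedule is simple enough to write down directly. The Python sketch below is illustrative only: the function name is hypothetical, and the initial value of 0.5 and the count of 10 iterations are the figures quoted for the deterministic schedule in the comparison of schemes (Section 4.2.3); the text does not state that the Fig. 3 run used the same initial value.

```python
def halving_schedule(p_initial=0.5, n_iterations=10):
    """Label error probability used at each iteration, halved after every iteration."""
    return [p_initial / 2**k for k in range(n_iterations)]

schedule = halving_schedule()
# Value used on the tenth iteration: 0.5 / 2**9 = 0.0009765625
final_p_e = schedule[-1]
```

With these assumed settings the tenth value is 0.5/512, which rounds to the final value of 0.001 quoted in the text.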

4.2.3. Comparison of the discrete relaxation schemes. Figure 4 shows the initial non-contextual labelling for a synthetic image of a circle circumscribed by a square. This image consists entirely of step profiles and has been corrupted by additive Gaussian noise of SNR = 25:8. Figures 5(a) to 5(h) show the results of applying discrete relaxation using the criterion functions developed in Section 3.5 to the initial non-contextual labelling. The result shown in Fig. 5(a) has been obtained with 10 iterations of Scheme A using a fixed value of the label error probability, P_e = 0.05. This is almost identical to the result shown in Fig. 5(b) which was obtained using Scheme C with the same value of P_e.

The iteration-dependent variation of the label error probability, i.e. Scheme B, has been used to obtain the results shown in Figs. 5(c), 5(d) and 5(e). In Fig. 5(c) the label error probability has been re-estimated at each iteration using the coding scheme outlined in Section 3.4. The initial value of P_e was 0.14 and after six iterations it had stabilised to within 5% at 0.05. Figure 5(d), on the other hand, has been obtained with 10 iterations of Scheme B in which P_e reduces by a factor of 2 with each iteration from an initial value of 0.5. The result obtained using the empirical deterministic schedule is more consistent than that obtained using the label error estimation procedure. This is a consequence of the different final values of P_e. In the case of estimation P_e = 0.05 while in the case of the deterministic schedule P_e = 0.001. Figure 5(e) demonstrates that the two strategies for controlling the behaviour of P_e can be combined in a complementary way to advantage. The estimation procedure was employed until the value of P_e stabilised to within 5%; subsequent iterations were obtained by reducing P_e according to the deterministic schedule.

Fig. 4. Initial labelling for a SNR = 25:8 synthetic image.

Fig. 5(a-f). Comparison of relaxation schemes.

Scheme D has been used to obtain the result shown in Fig. 5(f) with a = 0.05 and q = 3, as required by the edge labelling dictionary. The result is poor compared with those obtained using Schemes A, B and C. However, it should be borne in mind that Scheme D only applies under the condition of small label error probability. To demonstrate the performance of Scheme D under conditions in which it is expected to satisfy this prerequisite we have used a SNR = 25:4 circle-inside-square image. Figure 6 shows the result obtained using Scheme A while Fig. 7 shows that obtained using Scheme D. The results are comparable and very satisfactory.

Fig. 6. Result obtained using Scheme A on a low noise image.

Fig. 7. Result obtained using Scheme D on a low noise image.

From the experimentation described above we conclude that the use of the full congruency model of the criterion function, i.e. Schemes A and B, results in the best performance. There is little loss of performance if the simplification presented in Scheme C is used. Scheme D does not perform well at high levels of label corruption. When Scheme B is used the consistency of the final labelling is only guaranteed if P_e is controlled by a deterministic schedule; in the remainder of this section we will confine our attention to the use of this scheme.

4.2.4. Robustness to noise. The experimentation described above leads us to conclude that the use of Scheme B with P_e controlled by a deterministic schedule results in the most satisfactory performance. Figure 8 shows the result of applying this scheme to a sequence of synthetic images having different levels of noise corruption; moving from left to right the subimages correspond to signal-to-noise ratios of 10:1, 10:2, 10:3, 10:4 and 10:5. Each subimage was obtained after 10 iterations of Scheme B. The procedure appears robust up to values of SNR = 10:3.

It is encouraging to note that even at high levels of noise corruption the number of spuriously labelled edge pixels is small.

Fig. 8. Labellings obtained from SNR = 10:1, 10:2, 10:3, 10:4 and 10:5 images.

4.2.5. Departures from memoryless label corruption. The synthetic images described above simulate the sampling of a step-profile by a device of perfect spatial resolution which is subject to thermal noise contamination. This is neither a complete model of the appearance of step-edges produced by realistic imaging systems nor of the type of edge-profiles present in natural scenes. For the discrete relaxation procedure to be of any utility, it should be capable of labelling the variety of edge-types present in natural scenes sampled by imaging devices of finite resolving power. As indicated in Section 4.1, the effects of finite sampling width and illumination gradients in scenes introduce labelling inconsistencies which do not satisfy our assumptions of independence outlined in Section 3.4. It is therefore important to demonstrate that the discrete relaxation approach to edge-labelling is not limited by departures from the memoryless model of the label corrupting process.

Fig. 9(a-c). Labelling of noise free non-step profile.

To determine the robustness of the discrete relaxation procedure to label errors caused by sampling bias, we have studied synthetic non-step profiles. Following Nalwa,(24) we have used hyperbolic-tangent profiles to simulate the combined effects of camera sampling and illumination gradients. Figure 9(a) shows such a profile of width 3 pixels. In Fig. 9(b) it can be seen that the initial labelling of this profile assigns the same edge-label in a lane 6 pixels wide. The discrete relaxation procedure limits the final edge to a single pixel width (Fig. 9c).

To simulate the combined effects of finite sampling width and thermal noise contamination, we have added Gaussian noise to a profile which is two pixels wide; the SNR varies from 50:0 at the top of the image to 50:8 at the bottom. Figure 10(a) shows the synthetic profile; its initial labelling is shown in Fig. 10(b). After 10 iterations of the discrete relaxation procedure, the labelling shown in Fig. 10(c) is obtained. This is an encouraging observation since it indicates that the discrete relaxation procedure works well even when the source of inconsistency does not necessarily correspond to the model of the label error process presented in Section 3.4.
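A test image of this kind might be generated along the following lines. This is a hedged Python sketch rather than the authors' procedure: the function name and its parameters are hypothetical, the ratio "50:8" is read here as a step height of 50 against a noise standard deviation of 8, and the way the nominal profile width enters the tanh argument is an assumption.

```python
import numpy as np

def tanh_profile_image(height, width, step=50.0, profile_width=2.0,
                       sigma_top=0.0, sigma_bottom=8.0, seed=0):
    """Vertical hyperbolic-tangent edge with noise that grows from top to bottom."""
    rng = np.random.default_rng(seed)
    # Horizontal coordinate centred on the edge location.
    x = np.arange(width) - (width - 1) / 2.0
    # Smooth transition from 0 to `step`; the 2/profile_width scaling is an assumption.
    profile = 0.5 * step * (1.0 + np.tanh(2.0 * x / profile_width))
    # Noise standard deviation increases linearly down the rows.
    sigmas = np.linspace(sigma_top, sigma_bottom, height)
    noise = rng.standard_normal((height, width)) * sigmas[:, None]
    return profile[None, :] + noise

image = tanh_profile_image(64, 32)
```

The top row of the returned array is the noise-free profile, and each successive row carries slightly more Gaussian noise, which mimics the top-to-bottom variation of SNR described above.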

4.2.6. Dependence upon size of label-set. Figure 11 shows a comparison of the use of the four different dictionaries described above for a SNR = 25:4 synthetic image; Fig. 11(a) shows the result of using the two-label dictionary, Fig. 11(b) that using the nine-label dictionary, Fig. 11(c) that using the full five-label dictionary and Fig. 11(d) that using the pruned five-label dictionary. The result obtained using the two-label dictionary is the most prone to noise. This is understandable since this dictionary draws on a very limited representation of the edge-structure. There is little difference between the labellings obtained using the five-label and nine-label dictionaries. From the point of view of computational efficiency it is worth noting that the pruned five-label dictionary also performs in a satisfactory way.

It should be noted that both the five-label and nine-label dictionaries result in good identification of all edge orientations.

Fig. 10(a-c). Labelling of noisy non-step profile.

4.2.7. Comparison with Canny's algorithm. In Fig. 12 we show the best labelling that could be obtained for the SNR = 25:8 image using an implementation of Canny's algorithm. The implementation draws on antisymmetric derivative of Gaussian operators to determine gradient vectors from the image data. Comparability with the subjective performance of the discrete relaxation method was only achieved when the support mask for the operators was greater than seven pixels. The hysteresis thresholds were chosen to give the best subjective balance between genuine edge features and noise. The erosion of curvature at the corners of the square is most evident. By contrast, the results obtained using discrete relaxation (Fig. 5) faithfully reconstruct the corner features.

This is not intended as an exhaustive comparison with the work of Canny; rather it is intended to give an indication of the relative performance of the two approaches in conditions complying with our model of the edge-process. The discrete relaxation procedure clearly has a greater potential to represent noise corrupted edge features. Canny determines the derivatives of the image luminance function using much larger support masks than our model of the edge-process. After non-maximal suppression of the directional derivatives, Canny performs hysteresis linking by coarsely quantising the gradient information into three intervals depending on the connectivity of candidate edge-pixels; in the discrete relaxation scheme there is a much finer metric of consistency. It is therefore not surprising that the discrete relaxation algorithm is better at locating image features under severe levels of label corruption.

Fig. 11. Comparison of results obtained using different dictionaries.

Fig. 12. Result obtained using Canny's method on a SNR = 25:8 image.

Fig. 13. Initial labelling of a natural scene.

Fig. 14. Result obtained using Scheme B.

4.2.8. Experiments with natural images. Figure 13 shows the initial labelling of an image from the University of Southern California database. We have compared the results of Scheme B and Scheme D for the same initial labellings; the results are given in Figs. 14 and 15. Scheme B has been used since it appears to be the most successful of the congruency models for resolving the inconsistencies that arise through sampling bias. Scheme D, on the other hand, provides an indication of the results that are to be expected if congruency is not incorporated into the model of the label process. As expected, the consistency obtained with Scheme B is much greater than that resulting if Scheme D is used. The labelling obtained by Scheme B could certainly be used as the input to higher level image interpretation since the edges located have very good connectivity and contain important features such as corners and T-junctions. For the purposes of comparison the results obtained using Canny's method are shown in Fig. 16. The results are comparable with those obtained by discrete relaxation.

5. CONCLUSIONS

We have developed a discrete relaxation procedure that is realised as iterative local label replacement. This is to be contrasted with the exhaustive generation and search of global consistent labellings. The iterative labelling procedure is an improvement on existing relaxation schemes since it aims to locate a globally optimal MAP estimate with improved consistency properties. This represents a significant advance. Existing techniques invariably achieve the localisation of computation at the expense of representational capacity of the label process model. Our improvement is achieved by developing a model of the label process which is not just based on counting inconsistent labellings but upon measuring their congruency.

Fig. 15. Result obtained using Scheme D.

Our representation of the label process is based on dictionaries of admissible configurations for supercliques of the objects under consideration. We do not admit non-physical labellings on the supercliques. The legalisation of non-physical labellings, which is a feature of previous schemes, can result in inconsistency in the global labelling. We have overcome the problem of estimating the probability of non-physical labellings by introducing the idea of a label corrupting process. Inconsistent labellings are regarded as corrupted versions of consistent labellings which belong to the dictionary. Under certain non-restrictive assumptions, the label corrupting process can be modelled by a binomial distribution of label errors. The number of such errors can be measured by the congruency between dictionary items and inconsistent labellings; a consistent labelling being one that is isomorphic with a dictionary item. This is a new conceptual ingredient of relaxation schemes.

Under certain simplifying assumptions we have developed some specific realisations of the discrete relaxation algorithm. The advantages of these labelling schemes have been demonstrated for the highly structured edge application. The results indicate that the congruency model of the label process developed in this paper has a superior ability to resolve inconsistencies.

SUMMARY-DISCRETE RELAXATION

Section 1 of the paper is concerned with reviewing existing approaches to the use of contextual information in object-labelling and focuses in detail on the family of discrete relaxation algorithms. The term discrete relaxation refers to a technique which aims at finding global consistent interpretations of arrangements of image entities through the iterative updating of symbolic label assignments. In the original algorithms, the representation was purely symbolic, utilising a dictionary of consistent labellings to encapsulate knowledge of the application in hand. Recent work has aimed at admitting observational information by regarding the global assignment of labels as MAP estimation.

Section 2 pursues the specific issue of how discrete relaxation can be addressed as MAP estimation. It is demonstrated how a global optimum of the a posteriori probability may be obtained by iterative single label replacements. The realisation of this optimisation procedure requires a model of the global label process. We make the observation that existing models of the global label process abandon the powerful representation of constraints provided by the dictionary, which reduces their capacity to adequately model consistency.

Fig. 16. Result obtained using Canny's method.

Section 3 is aimed at overcoming this shortcoming. We commence with a global measure of consistency which is based on the heuristic criterion function proposed by Hummel and Zucker. However, we extend this formulation in an important respect; we regard the initial inconsistent labelling, which is usually obtained by non-contextual means, as a corrupted version of a consistent labelling in the dictionary. If the label corrupting process is assumed to act independently on each object, then the number of label-errors follows a binomial distribution; the parameter of the binomial distribution is the class-independent label error probability which can be estimated in a straightforward way. Each inconsistent label configuration contributes an amount to the criterion function which is determined by its congruency with the uncorrupted labellings belonging to the dictionary. Rather than coarsely counting the number of consistent labellings, the resulting criterion function finely measures the congruency of the global labelling with respect to the dictionary of consistent items, i.e. it has a greater capacity to model consistency.
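The congruency idea summarised above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' criterion function: the three-site toy dictionary, the label names and the function names are hypothetical, dictionary items are taken as equiprobable, and each object is assumed to be corrupted independently with probability P_e, so that a configuration at Hamming distance H from a dictionary item of n objects contributes P_e^H (1 - P_e)^(n - H).

```python
import math

# Hypothetical toy dictionary of consistent three-site configurations
# ("n" = non-edge, "e" = edge); far smaller than the 3 x 3 dictionaries in the text.
TOY_DICTIONARY = [("n", "n", "n"), ("n", "e", "n")]

def hamming_distance(config, item):
    """Number of sites at which the configuration disagrees with a dictionary item."""
    return sum(a != b for a, b in zip(config, item))

def congruency_score(config, dictionary, p_e):
    """Sum over the dictionary of exponentials of the Hamming distance.

    (1 - p_e)**n * exp(-k * H) equals p_e**H * (1 - p_e)**(n - H)
    when k = ln((1 - p_e) / p_e).
    """
    n = len(config)
    k = math.log((1.0 - p_e) / p_e)
    prior = 1.0 / len(dictionary)  # assumed equiprobable dictionary items
    return sum(
        prior * (1.0 - p_e) ** n * math.exp(-k * hamming_distance(config, item))
        for item in dictionary
    )
```

A configuration that matches a dictionary item scores higher than one that is several label errors away from every item, and the gap widens as P_e is reduced, which is the sense in which the score measures congruency rather than merely counting consistent configurations.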

Without simplification the criterion function is a compound quantity that is computed as a sum of exponential functions of the congruency measure over the entire dictionary of consistent labellings. Under the assumption that the label error probability is small, we present criterion functions that are of reduced complexity and which can be used to computational advantage. With an increasing number of iterations of the discrete relaxation procedure, the consistency of labelling is expected to increase, i.e. the label error probability is expected to decrease. To take advantage of this observation we present control strategies based on reducing the label error probability according to a deterministic schedule and also upon estimation.

Section 4 presents an application of discrete relaxation to the postprocessing of raw edge information. The necessary details of the measurement acquisition model for directional derivatives and the dictionary of consistent edge labellings are given. The discrete relaxation approach is contrasted with Canny's postprocessing of edge information. The origins of label corruption and the implied limitations of the assumed model of the label error process are discussed. An extensive comparative study of the discrete relaxation procedures resulting from the use of different criterion functions, control strategies and dictionaries is presented. This study is based on both synthetic and natural imagery. Some further comparison is provided by Canny's algorithm.

Finally, Section 5 presents some conclusions.

REFERENCES

1. A. Rosenfeld, R. A. Hummel and S. W. Zucker, Scene labelling by relaxation operations, IEEE SMC SMC-6, 420-433 (1976).

2. D. L. Waltz, Understanding line drawings of scenes with shadows, in The Psychology of Computer Vision, ed. P. H. Winston. McGraw-Hill, New York (1975).

3. J. Kittler and J. Foglein, Contextual classification of multispectral pixel data, Image and Vision Computing 2, 13-29 (1984).

4. J. Kittler and D. Pairman, Contextual pattern recognition applied to cloud detection and identification, IEEE Trans. Geosci. Remote Sensing GE-23, 825 (1985).

5. O. D. Faugeras and M. Berthod, Improving consistency and reducing ambiguity in stochastic labelling: an optimization approach, IEEE PAMI PAMI-3, 412-424 (1981).

6. J. Kittler and E. R. Hancock, Combining evidence in probabilistic relaxation, Int. J. Pattern Recognition Artif. Intell. 3, 29-52 (1989).

7. E. R. Hancock and J. Kittler, Edge-labelling using dictionary-based relaxation, IEEE PAMI PAMI-12, 165-181 (1990).

8. S. Geman and D. Geman, Stochastic relaxation, Gibbs distributions and Bayesian restoration of images, IEEE PAMI PAMI-6, 721-741 (1984).

9. T. S. Yu and K. S. Fu, Recursive contextual classification using a spatial stochastic model, Pattern Recognition 16, 89-108 (1983).

10. J. E. Besag, Spatial interaction and the statistical analysis of lattice systems (with discussion), J. Royal Stat. Soc., Series B 36, 192-236 (1974).

11. J. E. Besag, On the statistical analysis of dirty pictures, J. Royal Stat. Soc., Series B (in press).

12. P. H. Swain, S. B. Vardeman and J. C. Tilton, Contextual classification of multispectral data, Pattern Recognition 13, 429-441 (1981).

13. K. S. Fu and T. S. Yu, Statistical Pattern Classification using Contextual Information. Research Studies Press, John Wiley, New York (1980).

14. J. R. Welch and K. G. Salter, A context algorithm for pattern recognition and image interpretation, IEEE SMC SMC-1, 24-30 (1971).

15. D. A. Huffman, Impossible objects as nonsense sentences, Mach. Intell. 6, 295-323 (1971).

16. M. B. Clowes, On seeing things, Artif. Intell. 2, 79-116 (1971).

17. R. A. Hummel and S. W. Zucker, On the foundations of relaxation labelling processes, IEEE PAMI PAMI-5, 267-287 (1983).

18. W. W. Bledsoe and I. Browning, Pattern recognition and reading by machine, Proc. Eastern Joint Comput. Conf. 16, 225-232 (1959).

19. J. Kittler and J. Foglein, Contextual decision rule for objects in lattice configurations, Proc. 7th ICPR, Montreal 1, 270-272 (1984).

20. L. A. Spacek, Edge detection and motion detection, Image Vision Comput. 4, 43-56 (1986).

21. J. F. Canny, Finding edges and lines in images, M.I.T. Technical Report 720 (1983).

22. J. F. Canny, A computational approach to edge detection, IEEE PAMI PAMI-8, 679-700 (1986).

23. J. Kittler, J. Illingworth, J. Foglein and K. Paler, An automatic thresholding algorithm and its performance, Proc. 7th ICPR, Montreal (1984).

24. V. S. Nalwa, On detecting edges, IEEE PAMI PAMI-8, 701-714 (1986).

About the author-EDWIN HANCOCK received his B.Sc. in Physics in 1977 and Ph.D. degree for work in High Energy Physics in 1981, both from the University of Durham. From 1981 until 1985 he was employed as a Senior Research Associate at the Rutherford Appleton Laboratory. During this period, he was involved in an experiment at the Stanford Linear Accelerator Centre, U.S.A., to detect charmed particles using high resolution imaging techniques. Since 1985 Dr Hancock has been working in the fields of pattern recognition and computer vision, being employed in the Informatics Department of the Rutherford Appleton Laboratory. Dr Hancock is also an Associate Lecturer in the Department of Electrical Engineering at the University of Surrey. His research interests include the use of contextual information in pattern recognition and image labelling problems.

About the author-Dr JOSEF KITTLER has been working in the field of pattern recognition, image analysis and computer vision since 1970. In 1986 he joined the Department of Electronic and Electrical Engineering at Surrey University as Reader in Information Technology. He is a member of the Editorial Boards of IEEE Transactions on Pattern Analysis and Machine Intelligence, Pattern Recognition Letters, Pattern Recognition, the International Journal of Pattern Recognition and Artificial Intelligence, and Image and Vision Computing. He is the author or co-author of more than a hundred publications which include 5 books (one authored and four edited) and more than 40 papers in refereed international journals.