Photogrammetric Record, 16(95): 743–762 (April 2000)

OBJECT RECOGNITION IN DIGITAL PHOTOGRAMMETRY

By T. SCHENK

Ohio State University

(Paper read at a Technical Meeting of the Photogrammetric Society on 3rd November, 1998)

Abstract

Object recognition and image understanding have increasingly become major subjects of interest for research activity in digital photogrammetry. This paper provides an overview of object recognition in photogrammetry, beginning with a problem statement and brief paradigm description. In order to exemplify the concept, automatic interior orientation is presented as an object recognition problem. Subsequent sections discuss the current status of object recognition by identifying relevant criteria, such as modelling, system strategies and inference components. Such criteria are useful for comparing object recognition systems or proposed approaches. Strengths and weaknesses of current systems are summarized, followed by a more detailed analysis of the modelling problem. Finally, two new approaches (scale-space and fusion of multisensor/multispectral data) are mentioned. These approaches serve as examples of promising new trends which have the potential of advancing object recognition to a new level.

KEY WORDS: automatic interior orientation, digital photogrammetry, modelling, object recognition

INTRODUCTION

AN INCREASING AMOUNT of research activity in digital photogrammetry is being devoted to object recognition and image understanding. For example, two thirds of the papers presented at the ISPRS Commission III symposium held in Columbus, Ohio in July, 1998, addressed issues related to recognizing objects. Moreover, special workshops on this topic indicate growing interest. In view of these efforts, it is interesting to consider when research will reach operational maturity, perhaps to the extent of building a map machine that could compile maps automatically.

Nobody really believes that a general purpose automatic map making system is likely to be launched soon. Rather, the consensus of opinion is that map making is a difficult problem to solve, with the intriguing dichotomy that in practice a map can be produced very successfully, but apparently it is not known exactly how this end result is achieved, otherwise it should be possible to instruct a computer to carry out the process. This paper identifies some of the major obstacles encountered in the building of object recognition systems and it discusses promising approaches to circumvent these obstacles.

The next section introduces an object recognition paradigm. It should be considered as a type of master plan which many researchers follow in their attempts to recognize objects. It is then demonstrated how the paradigm can be applied to solve the relatively simple problem of interior orientation. The remaining sections summarize the experience gained with recognition systems, with an emphasis on modelling.

It is important to distinguish between automatic and autonomous systems. It has become customary to use the term automatic for a computer-supported process. Other terms, such as semi-automatic or 90 per cent automatic, indicate the degree to which a human operator is involved in the automatic process. However, where processes are entirely executed in an environment not supported by human operators, the term autonomous is used in this paper. Although emphasis is placed on autonomous processes, automatic systems still play an important role, because they will remain the only operational solution for some time to come.

OBJECT RECOGNITION PARADIGM

Fig. 1 gives a schematic depiction of a computer vision paradigm, which should not be taken too literally because in practice a uniformly acceptable paradigm does not exist. However, nearly everybody agrees that vision must be solved in a modular fashion, since the difference between input and output is so huge that a stepwise approach is required. The exact nature of modules representing an ordered decomposition of vision and their interrelationship are still a matter of considerable debate.

Vision begins with image formation. Since computer vision may be viewed as the inverse process of image formation, a thorough understanding of image formation is an obvious prerequisite.

Low level image processing tasks, such as grey level modifications, provide the transformation from the raw to the preprocessed image. Defects of the raw image, caused by the image acquisition system, can also be removed at this stage. The process of extracting useful information then begins. Examples of such information extraction include the detection of corner points and edges, both of which are discontinuities of the image function. Edges are likely to have been caused by events in the object space that are related to objects, for example boundaries, surface marks and surface discontinuities. The edge pixels are linked to edges and are eventually grouped into higher level entities, such as straight lines, arcs and parallel lines.

Segmentation is another means of discovering structures in the image which correspond to scene events. In this case, image regions are sought, based on similar characteristics of spatially coherent pixels. Texture, for example, can be used to define image regions. Aggregates of pixels with similar properties may be grouped into higher order regions, perhaps by taking domain knowledge into account.
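The grouping of spatially coherent pixels with similar properties can be sketched as a simple region-growing pass. The following is an illustrative sketch only; the 4-neighbourhood, the grey-value threshold and the toy image are assumptions, not from the paper:

```python
from collections import deque

def grow_region(image, seed, threshold):
    """Collect 4-connected pixels whose grey value lies within
    `threshold` of the seed pixel's value (simple region growing)."""
    rows, cols = len(image), len(image[0])
    seed_val = image[seed[0]][seed[1]]
    region, queue = {seed}, deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in region
                    and abs(image[nr][nc] - seed_val) <= threshold):
                region.add((nr, nc))
                queue.append((nr, nc))
    return region

# toy image with a dark region (top left) and a bright region
img = [[10, 11, 50],
       [12, 10, 52],
       [49, 51, 50]]
print(sorted(grow_region(img, (0, 0), 5)))  # → [(0, 0), (0, 1), (1, 0), (1, 1)]
```

Real systems would of course operate on texture or other derived properties rather than raw grey values, as the text notes.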

It has long been recognized that shape plays a fundamental role in the recognition of objects (consider the enormous feat of the human visual system in recognizing faces based on a few strokes of the artist). The shape of a surface is a vital component of object recognition and different cues are used to provide shape information. Probably stereoscopy first comes to mind to the photogrammetrist. However, there are equally important cues, such as shading, motion, texture and colour, which are collectively known as “shape from X”.

The 2·5D sketch is a representational collection of the results of the early vision processes. In the vision paradigm presented by Marr (1982), the 2·5D sketch should be compiled without any prior knowledge of the appearance of the scene, nor the potential application of the vision system (for example, recognizing buildings from an aerial scene, inspecting parts on a conveyor belt or navigating a robot). This recommendation is frequently violated in real vision systems where, for example, segmentation and grouping are performed with a particular application in mind. It should be appreciated that a system then becomes specific and application dependent, which is only acceptable where the system is not (mis)used for another application that would require different procedures. It is not surprising that the most successful vision systems are application dependent; the quest of building general purpose vision systems remains a dream.

[Fig. 1 diagram: raw image → (image processing) → preprocessed image → (edge detection, segmentation; grouping, shape from X) → 2·5D sketch → (domain knowledge, matching, model building) → scene description; the earlier stages constitute early vision, the later stages late vision.]

FIG. 1. Paradigm overview of computer vision.

It is important to realize that the data in the 2·5D sketch contain more explicit information about the scene than the raw image. For example, an edge could be an object boundary or a shadow. However, a single pixel can have any function. Depth and three dimensional (3D) shape information are particularly important. The 2·5D sketch is the transition from image space to object space. Subsequent processes, termed late vision, are scene oriented rather than image oriented.

The process of grouping (that is, the perceptual organization of extracted features) can also be carried out after the 2·5D sketch has been derived. Some researchers call grouping “middle level vision”, in order to emphasize the importance of perceptual organization and parameterization of features (Sarkar and Boyer, 1994).

If the aim of the vision system is object recognition, then a data base with models of objects is generated (model building). The grouped features are matched with the object library. Usually there is no perfect agreement between the model and the extracted features; in a complex setting, an inference process is then started, with the aim of minimizing and explaining the remaining differences.

INTERIOR ORIENTATION AS AN OBJECT RECOGNITION EXAMPLE

Background and Problem Statement

The object recognition paradigm can be employed for solving a relatively simple photogrammetric problem: interior orientation. The purpose of interior orientation is to establish a transformation from the pixel system to the image co-ordinate system, which has the perspective centre as origin. If images are obtained with a digital camera, the transformation parameters are known and the interior orientation reduces to a simple translation.

An alternative procedure is required if digitized aerial photographs are used. In this case, the transformation parameters are unknown and they must be determined by measuring the fiducial marks. The challenge is to compute the fiducial mark centres as reliably and accurately as possible. More specifically, autonomous interior orientation has the following objectives.

(1) Identification and sub-pixel localization of fiducial marks are the main aims. Identification includes the task of determining which individual fiducial mark has been detected. Sub-pixel localization is required because the pixel size is likely to be larger than the expected precision of the fiducial centres.

(2) The autonomous process requires a general and robust solution, accommodating different types of fiducial marks. A system is robust if it can cope with different problems as they may occur in a production environment. For example, the film may be placed upside down in the scanner or some fiducials may be only partially digitized. In addition to operator mistakes, there are film imperfections to consider, such as noise, blemishes or overexposure when the fiducial mark is projected on to the film. Fig. 2 shows some fiducial marks with different degrees of complexity; an automatic system is required to identify and locate them precisely and reliably.

(3) The system must accommodate provision of either diapositives or negatives.

(4) It must be possible to handle either colour or black and white film.

(5) The system must be able to accept coarse resolution imagery, where the fiducial centre may have been lost. Provided that the fiducial mark is still uniquely identifiable, the location may be determined from the descriptive features of a fiducial.
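The pixel-to-image transformation determined from measured fiducial marks can be illustrated with a small sketch. A full interior orientation would normally use an affine (or higher order) transformation to absorb film deformation; the four-parameter similarity fit below, with invented fiducial coordinates, merely shows how measured fiducials determine the transformation by least squares:

```python
def fit_similarity(pixel_pts, image_pts):
    """Closed-form least-squares 2D similarity transform (scale,
    rotation, shift) mapping pixel coordinates to image coordinates:
        x = a*u - b*v + tx,   y = b*u + a*v + ty
    """
    n = len(pixel_pts)
    um = sum(u for u, _ in pixel_pts) / n
    vm = sum(v for _, v in pixel_pts) / n
    xm = sum(x for x, _ in image_pts) / n
    ym = sum(y for _, y in image_pts) / n
    sa = sb = s = 0.0
    for (u, v), (x, y) in zip(pixel_pts, image_pts):
        du, dv, dx, dy = u - um, v - vm, x - xm, y - ym
        sa += du * dx + dv * dy      # normal-equation sums after centring
        sb += du * dy - dv * dx
        s += du * du + dv * dv
    a, b = sa / s, sb / s
    tx = xm - a * um + b * vm
    ty = ym - b * um - a * vm
    return a, b, tx, ty

def apply(params, u, v):
    a, b, tx, ty = params
    return a * u - b * v + tx, b * u + a * v + ty

# four fiducials: measured pixel positions vs. calibrated image coordinates
# (purely illustrative numbers)
pix_pts = [(100.0, 100.0), (900.0, 100.0), (900.0, 900.0), (100.0, 900.0)]
img_pts = [(-110.0, -110.0), (110.0, -110.0), (110.0, 110.0), (-110.0, 110.0)]
p = fit_similarity(pix_pts, img_pts)
print(apply(p, 500.0, 500.0))  # → (0.0, 0.0), the perspective centre
```

With redundant fiducials (four instead of the minimum two), the least squares fit also gives residuals from which blunders such as a misidentified fiducial can be detected.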


FIG. 2. Different types of fiducial marks, in various states of condition.

The chief problem to be solved is the identification of the fiducial marks and their precise and robust localization. It is reasonable to separate these two tasks. Approaches may pursue one of the following two strategies.

(1) Area-based Approach. The sub-image that contains a fiducial mark is binarized. The precise localization is performed by cross-correlating an exact copy of the fiducial mark with the foreground image. Most interior orientation systems described in the literature follow this approach, for example Lue (1997) and Schickler (1995).
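The cross-correlation step of the area-based approach can be sketched as follows. This is a minimal illustration only; the tiny cross-shaped template and image are invented, and real systems correlate a calibrated fiducial template and interpolate the peak to sub-pixel resolution:

```python
def ncc(a, b):
    """Normalized cross-correlation of two equally sized patches."""
    n = len(a) * len(a[0])
    fa = [v for row in a for v in row]
    fb = [v for row in b for v in row]
    ma, mb = sum(fa) / n, sum(fb) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(fa, fb))
    da = sum((x - ma) ** 2 for x in fa) ** 0.5
    db = sum((y - mb) ** 2 for y in fb) ** 0.5
    return num / (da * db)

def best_match(image, template):
    """Slide the template over the image; return the highest
    correlation and its offset (the hypothesized fiducial position)."""
    th, tw = len(template), len(template[0])
    best = (-2.0, None)
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            patch = [row[c:c + tw] for row in image[r:r + th]]
            best = max(best, (ncc(patch, template), (r, c)))
    return best

template = [[0, 255, 0],
            [255, 255, 255],
            [0, 255, 0]]                 # idealized cross-shaped fiducial
image = [[10, 10, 10, 10, 10],
         [10, 0, 255, 0, 10],
         [10, 255, 255, 255, 10],
         [10, 0, 255, 0, 10],
         [10, 10, 10, 10, 10]]
print(best_match(image, template))       # peak correlation at offset (1, 1)
```

Normalizing by the patch means and standard deviations makes the score insensitive to the overall brightness and contrast of the sub-image, which matters for the varying film conditions shown in Fig. 2.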


(2) Feature-based Approach. In this case, extracted features are matched with the features of the ideal fiducial mark. This approach categorizes interior orientation as an object recognition problem. Features to be extracted consist of elements of a fiducial mark, for example straight lines, circles, crosses and squares. Different matching methods may be employed, for example shape matching and relational matching (Schenk, 1999).

Feature-based Approach

Fiducial marks are artificial objects which are projected on to the film at the time of exposure. Nearly every fiducial mark has a simple, regular shape, which encourages the representation of fiducials as structural descriptions, requiring the detection of these structural elements in the image.

In the following discussion, the sample fiducial mark shown in Fig. 2(c) is used. This example is representative of typical fiducial marks, except perhaps for the blemish (a film scratch right through the fiducial mark), which illustrates one of the many problems that a robust system must be able to handle in an operational environment.

There are three structural elements (shape primitives) that are most likely to be present in any fiducial mark: straight lines, circles and discs. From the design data supplied by the camera manufacturer, the exact dimensions of the fiducial marks are known, such as the radii of circles or the lengths of lines and their spatial relationships.

Using the shape primitives and their relationships, higher structures can be built. For example, two concentric circles of different radii constitute an annulus; two parallel line segments build a line pair; line pairs with different orientations make up a cross. Continuing in this manner, it can be appreciated that by using only a few simple shape primitives, together with the multiple relationships between the primitives, it is possible to build up a powerful structural description of a fiducial mark. Such descriptions are quite general and can be used for many different fiducial mark types, provided that they are made up of circles, straight lines and, perhaps, discs.

Fig. 3 shows the shape primitives and derived structures, including spatial relationships. With these simple constructs, it is possible to devise a powerful strategy for detecting and localizing fiducial marks.

(1) Detect edge pixels as the original part of the shape primitives.

(2) Group edge pixels together that belong to a single shape primitive, for example straight line segment or circle.

(3) Check the relationships between the shape primitives and build higher order structures, for example annuli, line pairs and crosses.

(4) Compute the fiducial mark centre from all the shape primitives.

This approach may be considered as structural matching, although the matching of the shape primitives and their spatial relationships are not performed simultaneously. After each grouping process, the relationships are checked. If the results are satisfactory, the confidence level related to the recognition is increased.
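The relationship checks of step (3) might look like the following sketch, where detected circles are tested for concentricity and against the calibrated design radii in order to hypothesize the annulus of the fiducial mark. The tolerances and coordinates are illustrative assumptions, not values from the paper:

```python
def concentric(c1, c2, tol=2.0):
    """Two detected circles (cx, cy, r) are concentric if their
    centres agree within a tolerance (pixels)."""
    return abs(c1[0] - c2[0]) <= tol and abs(c1[1] - c2[1]) <= tol

def annulus(circles, design_radii, tol=2.0):
    """Search the detected circles for a concentric pair whose radii
    match the two calibrated design radii -- the annulus structure."""
    for a in circles:
        for b in circles:
            if (a is not b and concentric(a, b)
                    and abs(a[2] - design_radii[0]) <= tol
                    and abs(b[2] - design_radii[1]) <= tol):
                return a, b
    return None

# detected circles as (cx, cy, r); the third is a false detection
detected = [(64.2, 63.8, 30.1), (64.0, 64.1, 18.0), (10.0, 12.0, 30.0)]
print(annulus(detected, (30.0, 18.0)))
```

Each relationship that is verified (here, concentricity plus radius agreement) raises the confidence that the grouped primitives really belong to a fiducial mark, as described in the text.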

Circular shape primitives can be detected by switching to a suitable parameter representation (Hough space). A circle can be represented in the spatial domain by

(x − x0)² + (y − y0)² = r²,  (1)

where x0, y0 are the co-ordinates of the centre, r is the radius and x, y are the variables.


Structure              Relationships
Line segment           shape primitive (graphic primitive)
Circle                 shape primitive (graphic primitive)
Line pair              2 parallel line segments
Line pair with gap     2 parallel line pairs
Cross                  2 parallel line pairs with gap; symmetric; perpendicular
Annulus                2 concentric circles

FIG. 3. Structures which can be built from shape primitives for fiducial marks. Line segments and circles are the shape primitives. Based on these, annuli and crosses can be constructed and they form important parts of the structural description.

Let xi, yi be a point on the circle and let x0, y0, r be variables. Then the representation in the parameter space is again a circle

(x0 − xi)² + (y0 − yi)² = r².  (2)

A simple relationship has been established between spatial domain and parameter space. A point on a circle in the spatial domain transforms into a circle in the parameter space, where the centre is given by the co-ordinates of the point.

Fig. 4 illustrates the relationship. Points 1 to 5 lie on a circle in the spatial domain. These points all generate circles in the Hough space that intersect in a single point, the centre of the circle in the spatial domain.

This procedure can be applied to find the circles that are part of a fiducial mark. The radii of these circles are accurately known from design data. Hence the 3D parameter space can be reduced to two dimensions. Fig. 5(a) is a sub-image consisting of 512 × 512 pixels, with a pixel size of about 40 μm. This relatively large image size is selected to ensure that it will contain the fiducial mark. The first step entails detecting edges, as shown in Fig. 5(b). For every edge pixel, the respective circles are generated in the parameter space. Consider the parameter space to be represented as an image, called an accumulator array. The pixels turned on by the circles must then be incremented. After processing all edge pixels, a search is made in the accumulator array for peaks. The height of a peak indicates how many pixels lie on a circle and the row and column of a peak indicate the centre of the circle in the image. Fig. 5(c) shows the pixels which are part of the annulus of the fiducial mark.
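The accumulator procedure described above can be sketched directly. Since the radius is known from design data, the accumulator is two dimensional: each edge pixel votes for every candidate centre lying at the known radius from it, and the accumulator peak gives the circle centre. The image size, radius and synthetic edge pixels below are illustrative assumptions:

```python
import math

def hough_circle_centre(edge_pixels, radius, size):
    """2D Hough accumulator for circles of known radius: every edge
    pixel votes for all candidate centres at distance `radius`;
    the accumulator peak is the circle centre."""
    acc = [[0] * size for _ in range(size)]
    for (x, y) in edge_pixels:
        for t in range(360):                       # sample the vote circle
            cx = round(x - radius * math.cos(math.radians(t)))
            cy = round(y - radius * math.sin(math.radians(t)))
            if 0 <= cx < size and 0 <= cy < size:
                acc[cy][cx] += 1
    peak = max((acc[cy][cx], (cx, cy))
               for cy in range(size) for cx in range(size))
    return peak[1]

# synthetic edge pixels on a circle of radius 20 centred at (32, 32)
edges = [(round(32 + 20 * math.cos(math.radians(t))),
          round(32 + 20 * math.sin(math.radians(t))))
         for t in range(0, 360, 10)]
print(hough_circle_centre(edges, 20, 64))          # → (32, 32)
```

Every edge pixel's vote circle passes through the true centre, so that cell accumulates votes from all pixels, while spurious cells collect votes from only a few; this is what makes the method robust against the blemishes and partial fiducials discussed earlier.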

Using the Hough Transform approach, straight line segments can be detected in a similar fashion. Fig. 5(d) shows the pixels which are part of the cross of the fiducial mark.


[Fig. 4 diagram: (a) points 1 to 7 in the spatial domain (x axis, y axis); (b) the corresponding circles 1 to 7 in the parameter space (parameter a, parameter b).]

FIG. 4. The relationship between points on a circle in the spatial domain (a) and their representation in the parameter space (b). Points become circles, for example point 1 transforms into circle 1. Points on a circle in the spatial domain, for example points 1 to 5, generate circles in the parameter space that intersect in a single point, which is the centre of the circle in the spatial domain.

Precise Localization

Success with identifying the major structures of the fiducial mark suggests that for the determination of its centre a different approach can be taken. The centre can be computed from the structural elements, rather than directly from the centre pixels. In the example given in Fig. 2(c), two concentric circles and four straight line pairs indicate the centre. Hence, a large number of pixels contribute to the location of the centre.

With knowledge of the attitude of the fiducial mark, it is first possible to consider determining the edge pixels more accurately, using specialized edge operators for straight lines at a given orientation, or for circular shapes with a known centre and radius. Next, the centre can be computed with a least squares adjustment. The centre of the fiducial mark is determined by the centres of the two circles and the intersection of the line pairs. This geometrical configuration suggests fitting two circles of known radii through the pixels of the outer circle and the inner circle, respectively. Then, two straight lines through the cross pixels are forced to pass through the centre. Every edge pixel (circle or cross) contributes one equation. Due to the large redundancy, blunders can be detected and eliminated more easily. Moreover, the accuracy increases, as well as the reliability.
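The least squares idea can be illustrated on the simplest sub-problem: estimating a circle centre of known (calibrated) radius from its edge pixels, each of which contributes one distance residual. The sketch below uses a Gauss–Newton iteration; the paper's full adjustment also includes the second circle and the line pairs constrained through the common centre, plus blunder detection, all of which are omitted here:

```python
import math

def fit_centre(edge_pixels, radius, c0, iterations=10):
    """Gauss-Newton estimate of a circle centre from edge pixels,
    with the radius fixed to its calibrated value. Each pixel
    contributes one residual: (distance to centre) - radius."""
    cx, cy = c0
    for _ in range(iterations):
        n11 = n12 = n22 = b1 = b2 = 0.0
        for (x, y) in edge_pixels:
            d = math.hypot(x - cx, y - cy)
            jx, jy = -(x - cx) / d, -(y - cy) / d   # partials of d wrt cx, cy
            r = d - radius                          # residual
            n11 += jx * jx; n12 += jx * jy; n22 += jy * jy
            b1 -= jx * r;   b2 -= jy * r
        det = n11 * n22 - n12 * n12                 # solve 2x2 normal equations
        dx = (n22 * b1 - n12 * b2) / det
        dy = (n11 * b2 - n12 * b1) / det
        cx, cy = cx + dx, cy + dy
    return cx, cy

# synthetic edge pixels on a circle of radius 20 centred at (5, 7)
pts = [(5 + 20 * math.cos(math.radians(t)), 7 + 20 * math.sin(math.radians(t)))
       for t in range(0, 360, 30)]
print(fit_centre(pts, 20.0, (0.0, 0.0)))            # converges to ≈ (5.0, 7.0)
```

Because many edge pixels constrain two unknowns, the redundancy is large, which is exactly what allows the blunder detection and the accuracy and reliability gains noted in the text.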

Comments

Arguably, interior orientation is one of the simplest of photogrammetric procedures. Most attempts at automating interior orientation are based on matching the grey levels of the fiducial mark with an ideal template. In order to provide a more general solution, a feature-based approach has been suggested following the object recognition paradigm. It has been shown that even such a mundane task as interior orientation requires the full machinery of object recognition in order to arrive at a general, robust solution. How much more intricate must a solution be for solving the complex problem of recognizing and reconstructing objects, such as buildings and roads?


FIG. 5. Means of detecting circles and crosses. The sub-image in (a) has 512 × 512 pixels with a pixel size of 40 μm. A LoG operator with w = 2 was used for detecting edges (b). The result of detecting circles is shown in (c). Here, a new sub-image, of size 128 × 128, is chosen such that the hypothesized fiducial mark is centred. Pixels that have been found by the ±45° line detector are shown in (d).

CURRENT STATUS OF OBJECT RECOGNITION IN DIGITAL PHOTOGRAMMETRY

Background

Digital photogrammetry has a profound impact on most photogrammetric procedures, for example on the basic orientation tasks that can be performed with a high degree of automation. This statement is also true for the generation of digital orthophotographs and DEMs, although to a lesser extent. Of all the photogrammetric processes, map compilation is the most labour intensive. Completion may take several hours per model, in contrast to interior orientation (a few minutes) and relative orientation (perhaps up to 15 minutes). Hence, there is a major economic incentive to automate the map compilation process. In addition to the generation of topographic maps, new applications are emerging that call for automating the process of identifying and reconstructing man-made objects. For example, three dimensional city models are in great demand by planners and telecommunication specialists.

Object recognition comes under the auspices of image understanding. Researchers in computer vision and machine perception have long tried to devise systems capable of interpreting scenes based on information extracted from imagery. Image understanding or scene interpretation is application dependent. The same scene may be interpreted entirely differently. Imagine you are on a safari trip. Bored with waiting in the vehicle until the tyres are changed, you explore the neighbourhood. All of a sudden you are confronted with a tiger, which seems to be ready to leap at you. Your scene interpretation embraces awareness of danger, means of getting quickly back to the car, summoning help and so on. Now consider the same situation, except that you sit safely in a well protected vehicle and your interest is in analyzing the behaviour of an attacking tiger. You may admire its graceful movements, the colours of its coat or its impressive fangs, a completely different interpretation of the same scene.

Object recognition is considered to be a very difficult problem. Fig. 6 is an attempt to classify photogrammetric procedures in terms of automation and complexity. The class poorly understood means that systems have extreme difficulty in performing these tasks automatically, unlike human operators. What is the dilemma? It seems that there is insufficient knowledge concerning how humans really solve these problems. Although every lay person can identify buildings and roads on aerial imagery, no one knows exactly how. It is precisely for this reason that software engineers cannot be instructed how to program the task. Image understanding and object recognition defy general, reliable solutions.

Object recognition entails the two tasks of detection and reconstruction. These tasks are usually approached in a sequential manner, with detection followed by the reconstruction of the object. The term extraction is reserved in this paper for the process of determining features from the sensory input data, although it has become popular to replace object recognition by object extraction. Features, such as corners, edges and regions, rarely render a sufficient description of an object. Hence, objects cannot be directly extracted.

The reconstruction task needs to be examined in more detail. Suppose an object, say a building, is identified. Although straight edges may delineate the roof and perhaps some additional features on the ground, the reconstruction is still a formidable task, because it entails elements of generalization. The geometrical boundaries of objects, as stored in a GIS for example, are not identical with the physical boundaries. Physical boundaries are not exactly straight, nor perpendicular to one another. Moreover, unnecessary details are omitted. The differences between the rather abstract geometrical building description and its real physical boundaries add to the complexity of object recognition.

Criteria for Classifying Object Recognition Approaches

The recognition of objects, such as buildings and roads, from aerial and satellite imagery is a goal which has been pursued by researchers in computer vision and digital photogrammetry for a number of years. Several experimental systems have been developed. Rather than describing these systems, details are given of a few criteria that can be used to analyse them.


[Fig. 6 diagram: photogrammetric procedures arranged by degree of complexity, from well understood to poorly understood: interior orientation; relative orientation; digital orthophotograph production; automatic DEM generation; surface reconstruction; aerial triangulation; automatic map compilation (object recognition); photo-interpretation (image understanding).]

FIG. 6. Classification of photogrammetric procedures in terms of difficulty and complexity. Well understood means that the problem is sufficiently understood and solutions can be programmed, the only problem being the effort involved. General solutions of poorly understood procedures are not known. No matter how many software engineers are assigned to the problem, the degree of automation that can be achieved is limited.

(a) Object Models and Relationships. Individual pixels in an image do not carry any explicit information concerning association with an object. Information about objects must first be extracted from large numbers of pixels within varying spatial extents. However, these extracted and organized features will still lack explicit labels. It can only be hoped that the features are useful clues, allowing a hypothesis to be made concerning the existence of an object. Object modelling consists of the description of these clues, including their relationships. It follows that object models must take into account what can be extracted from the sensory input data; in other words, object models are sensor and application dependent. Recently, laser range data have been increasingly used to recognize buildings. In that case, the object model cannot include radiometric properties.

Man-made objects usually exhibit a considerable degree of geometrical regularity, which forms the basis for geometric modelling. Choice of the space within which an object is modelled is a critical decision. Modelling objects in image space has the distinct advantage that features extracted from an image can be compared with the model without further processing. The disadvantage of this approach is that the model must be projected from object to image space, taking into account the sensor model as well as the exterior orientation. Moreover, some information about the 3D shapes of objects will become lost when attempting to describe them in 2D image space.

Modelling in object space lends itself to a much richer set of descriptions. With regard to buildings, for example, edges are predominantly horizontal and vertical (with the exception of some roof edges). The same is true for building facades. In order to exploit the advantage of 3D models, the matching (that is, the comparison of extracted features with the models) must also be performed in object space. As a consequence, monocularly extracted features must be matched in two or more images, followed by computation of their 3D position. Not all features are conjugate, however. Some corner points or roof edges may be occluded in one image. Also, the matching process (identification of conjugate features) may fail in some instances. Therefore, 3D features in object space are only a subset of the monocularly extracted features.

Another problem related to object modelling is model representation. It is possible to distinguish between the following five general types of representation (Mayer, 1998).

(1) Fixed values. This is the most simple form of representation, but it only works if all object instantiations have the same values, or if the values are known for all types. An example would be a subdivision where only a few types of buildings are present.

(2) Parametric. In this case, the basic shape is fixed, but not the values (parameters). A rectangle is a good example, because it can be characterized by two parameters, length and width. Although much more flexible than fixed representation, parametric representations are not general enough for overall object recognition tasks.

(3) Generic. These models are usually based on part of a description, for example volumes are described by surfaces or elements of constructive solid geometry, whereas objects are constructed from parts by set operations.

(4) Functional. In contrast to the previous methods, functional models are not based on geometrical descriptions but on their functions. Such models are not particularly useful for recognizing objects from aerial scenes, except when used in conjunction with other models.

(5) Contextual. These models attempt to describe several objects and their relationships, because it may be important to interpret a scene. In its extreme development, the entire scene is modelled, for example a city (city models). Obviously, the combinatorial possibilities of describing relationships among objects increase dramatically and the success of scene modelling depends on reducing the descriptions and their combinations to a manageable set that is still useful.

(b) Strategies. The term “strategy” refers to the question of how an object recognition system implements the paradigm presented in the second section. The object recognition paradigm is in the form of a master plan that can be realized in many different ways. Most implementations take various shortcuts, because a complete realization may be impractical due to the complexity of the problem. The essential steps include early vision processes, such as feature extraction and segmentation, followed by grouping and the reconstruction of the visible surface. These data driven processes are supported by the top–down processes of model building, matching and the utilization of domain knowledge.

A comparison of real systems on the basis of the paradigm probably gives some information concerning the success rate. In the case of a building recognition system, for example, it is interesting to know how many buildings are correctly identified and reconstructed, how many buildings remain undetected and how many non-building objects have been incorrectly classified as buildings. The answers to these questions do not reveal how long it takes the system to analyse a scene, however. Real systems also differ in terms of how well they are implemented from the software engineering point of view.

(c) Inference Component. As mentioned in the previous sections, recognizing objects from aerial imagery is a difficult problem, involving signal processing tasks and perception. It is unlikely that a single step solution can be found. A common assumption is that the underlying information processing is modular (occurring in layers).

Computational models of information processing for vision follow an orderly progression of layers. Extracted and grouped features are compared (matched) with object models. However, it is very unlikely that an exact match is achieved. The larger the differences between extracted features and object models, the more complex the matching process will be. One way to approach the problem is to form hypotheses and to verify them. For example, straight line segments that are perpendicular to each other may trigger the hypothesis that they coincide with the boundaries of a man-made object. Further evidence is needed to verify the hypothesis.
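The hypothesize-and-verify idea can be sketched in code. The following minimal example is hypothetical (the paper prescribes no algorithm); it evokes a "man-made object" hypothesis whenever two extracted line segments are perpendicular within a tolerance, leaving verification to later evidence:

```python
import math

def angle(seg):
    """Orientation of a segment ((x1, y1), (x2, y2)) in radians, in [0, pi)."""
    (x1, y1), (x2, y2) = seg
    return math.atan2(y2 - y1, x2 - x1) % math.pi

def perpendicular_pairs(segments, tol_deg=5.0):
    """Return index pairs of segments perpendicular within tol_deg.

    Each such pair evokes the hypothesis that the segments bound a
    man-made object; further evidence would be needed to verify it.
    """
    tol = math.radians(tol_deg)
    pairs = []
    for i in range(len(segments)):
        for j in range(i + 1, len(segments)):
            diff = abs(angle(segments[i]) - angle(segments[j]))
            diff = min(diff, math.pi - diff)  # angular distance, in [0, pi/2]
            if abs(diff - math.pi / 2) <= tol:
                pairs.append((i, j))
    return pairs

segs = [((0, 0), (4, 0)),   # horizontal
        ((0, 0), (0, 3)),   # vertical: perpendicular to the first
        ((0, 0), (3, 3))]   # diagonal
print(perpendicular_pairs(segs))  # -> [(0, 1)]
```

A real system would of course combine such pairwise cues with grouping and domain knowledge before committing to an interpretation.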

Many researchers in philosophy and psychology associate perception with inference. Recently, it has been proposed that perception relies on abduction. Josephson and Josephson (1994) hypothesize that perception consists of abduction in layers. It appears that abductive inferences are widespread in scientific reasoning as well as in ordinary life.

Abduction is a pattern of reasoning that can be characterized as finding the best explanation for a set of data. An abduction problem can be described as follows:

D is a set of data, d1, d2, … , dn;
H is a set of individual hypotheses, h1, h2, … , hm;
h is an individual hypothesis that explains the data best;
therefore h is probably true.

The criteria for the partially ordered set of hypotheses, p(H), can be based on different measures, for example on probability, on fuzzy values, or on a degree of fit. The information processing is decomposed into the following three types of activities.

(1) Evocation of hypotheses is triggered by the data and is typically bottom–up. One set of data may give rise to several hypotheses. An example of top–down stimulated evocation is priming, that is, an expectation from a higher level.

(2) Instantiation of hypotheses is the act of evaluating and scoring each hypothesis independently. At the same time, a determination is made as to which data can be accounted for by the hypothesis. These are not necessarily identical to the data which evoked the hypothesis or which were used to score it. Hypotheses with low scores can be eliminated from further consideration, at least temporarily.

(3) Composition occurs when instantiated hypotheses are evaluated against an emerging coherent best interpretation. Knowledge of interactions between hypotheses is necessary either to increase or to reduce the confidence of initially similarly weighted hypotheses. A proven strategy from earlier abduction machines begins the composition process by tentatively accepting the highest confidence conclusion that can be identified and then propagating the consequences along known lines of hypothesis interaction. If this first attempt fails, the process is repeated at the next lower confidence level.
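The three activities can be illustrated with a toy abduction loop. All names, scores and the 0.2 elimination threshold below are invented for illustration; a real system would derive scores from image evidence:

```python
def abduce(data, hypotheses):
    """Minimal layered-abduction sketch (hypothetical structure).

    hypotheses: dict mapping name -> (score, set of data items it explains).
    Composition greedily accepts the highest-confidence hypothesis first;
    data explained by an accepted hypothesis need no further explanation.
    """
    # (1) Evocation: keep only hypotheses evoked by some observed datum.
    evoked = {h: (s, e) for h, (s, e) in hypotheses.items() if e & data}
    # (2) Instantiation: score each hypothesis independently; drop low scores.
    viable = {h: (s, e) for h, (s, e) in evoked.items() if s >= 0.2}
    # (3) Composition: accept best-first until the data are accounted for.
    accepted, unexplained = [], set(data)
    for h, (s, e) in sorted(viable.items(), key=lambda kv: -kv[1][0]):
        if e & unexplained:
            accepted.append(h)
            unexplained -= e
    return accepted, unexplained

data = {"edge1", "edge2", "shadow"}
hyps = {
    "building": (0.9, {"edge1", "edge2"}),
    "road":     (0.4, {"edge1"}),
    "noise":    (0.1, {"shadow"}),  # eliminated at instantiation
    "tree":     (0.5, {"shadow"}),
}
print(abduce(data, hyps))  # -> (['building', 'tree'], set())
```

The greedy best-first composition mirrors the "propagate consequences of the highest confidence conclusion" strategy described above, in the crudest possible form.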

ASSESSMENT

Current Systems for Object Recognition

In this section, brief comments are given on the status of recognizing objects from aerial scenes. For a more detailed account, the reader is referred to Mayer (1998). In view of the difficulty of the problem, increased research activity in this field is a very positive aspect. More and more researchers from different institutions are tackling the object recognition problem and some of them are using new, innovative concepts. It appears that experimental object recognition systems built on more complex strategies and models are considerably more successful on general scenes. Examples include systems that exploit additional clues, such as colour (Henricsson et al., 1996) or shadows (Shufelt and McKeown, 1993). Moreover, careful grouping of extracted features in object space is recommended.

Due to the complexity of the task, it is relatively easy to generate an impressive list of the deficiencies possessed by current systems. Arguably, the single most critical deficiency is the modelling of objects (discussed in the next section). Overall, models are not general enough and relationships among objects are absent.

Although Gulch (1992) promotes a knowledge-based approach for building reconstruction, most approaches lack a well developed inference component. Many systems take a shortcut by accepting a match between features and objects as the final result of the recognition task. This shortcut not only neglects the generation and evaluation of alternative hypotheses, but also misses the aspect of quality control.

Frequently, data from several sensors are available, for example, range information from laser scanning systems and pictorial information from cameras (Haala, 1994; Eckstein and Munkelt, 1995; Bordes et al., 1996). The synergism of disparate data sets is hardly exploited, however. Most object recognition systems deal with one data source only. It is interesting to note that in many projects that involve object recognition, prior information is available. A good example is a GIS. A priori information and knowledge are not used to the extent of supporting the object recognition and reconstruction task.

In this brief assessment, emphasis has been placed on building recognition. For an evaluation of recognizing roads, the reader is referred to Heipke et al. (1997). An example of a more general recognition system is described in Collins et al. (1995).

A World of Models

The conclusion was reached in the previous section that modelling topographic objects is a major problem and the lack of adequate solutions prevents object recognition systems from being general and robust enough for operational use. Some of the reasons why modelling is a difficult problem need to be examined. Fig. 7 illustrates several different kinds of models and their relationships.

At the outset, there is the physical world, or “reality”. It is essentially a world of particles, atoms, molecules, electromagnetic fields and much empty space. This world is indeed quite different from the world that we experience as human beings. Our world consists of objects, smells, tones, tastes and colours, which are perceptions (or mental constructions) that do not exist in the physical world. What we experience as sound comprises waves caused by vibrating objects. Such acoustic waves evoke a sensation perceived as sound. The visible portion of the electromagnetic spectrum stimulates the retina, the visual cortex and higher cortical areas. The reconstruction of the visual world differs considerably from the image on the retina. As a convincing example, take the case of the blind spot. Our retinal images have a hole where the optic nerve leaves (close one eye, move a pencil across the visual field and keep your eyes fixed on a stationary object; the pencil disappears when its image falls on the blind spot), but we perceive a continuous world.

Since our senses respond to a limited range of signals, our model of the “real” world is limited. Moreover, there are individual differences in sensitivity. Your model of the world differs from mine. You don’t have access to my mental reconstruction, nor do I know what you perceive. Based on communication, but perhaps more by the way we respond to the environment, I conclude that you see a world quite similar to the one I see. This result is not surprising; for us to survive in the real world, we need to perceive it veridically.

[Fig. 7 near here: a diagram linking the physical world, via humans and sensors, to our world, the model world, the data world, the GIS world and the reconstructed world.]

Sensors record electromagnetic radiation within various ranges of the spectrum. From the raw data, the world is reconstructed. As discussed earlier, the quest begins with extracting and grouping features, leading to the data world, which is a rudimentary model of the world obtained entirely from data. The next step is concerned with comparing the data world with the model world in the hope of establishing associations that would eventually lead to a useful scene interpretation, the reconstructed world in Fig. 7. Clearly, the model world (the library of object descriptions that we create) is much affected by how we perceive and experience the world. The fact that sensors lack perception and experience results in a gap between the model and the data world. This gap dooms object recognition to fail in the following situations.

(1) The gap between the data model and the object model is too big. This effect may be due to representational incompatibility. Also, unrealistic assumptions about what can be extracted and grouped from sensory input data can leave a gap too wide to be bridged in the matching.

(2) The object model is too simple. In order to minimize the gap between data and object model, it is possible to simplify the latter until the difference becomes manageable. By that time, however, the object model may have become too simple. An example with building recognition demonstrates the case. Suppose buildings of a particular shape and size are considered, say a rectangular shape of dimensions a and b. If a fixed orientation of the buildings is added, for instance they are all aligned north–south, the recognition task becomes simpler but at the cost of being so specific that it becomes unrealistic.
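The over-specific model can be made concrete. In this hypothetical sketch (all names and tolerances invented), a rectangle model of dimensions a × b with a fixed north–south orientation rejects a correctly sized but rotated building footprint:

```python
def matches_model(footprint, a, b, tol=0.1, fixed_orientation=True):
    """Check a building footprint against a rectangle model (sketch only).

    footprint: (width, height, orientation_deg) of an extracted region.
    The model expects dimensions a x b; with fixed_orientation it also
    insists on north-south alignment (orientation 0 degrees), which makes
    the model simpler but unrealistically specific.
    """
    w, h, theta = footprint
    dims_ok = abs(w - a) <= tol * a and abs(h - b) <= tol * b
    if fixed_orientation:
        return dims_ok and abs(theta) < 1.0
    return dims_ok

rotated = (10.0, 20.0, 30.0)  # right size, but rotated 30 degrees
print(matches_model(rotated, 10, 20))                           # False
print(matches_model(rotated, 10, 20, fixed_orientation=False))  # True
```

Relaxing the orientation constraint widens the model's coverage, but every relaxation also widens the gap that matching must bridge; this is exactly the trade-off described above.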

FIG. 7. A world of models. Our world is a mental reconstruction from signals received by our senses.


These worlds do not represent the entire story, however. Fig. 7 includes the world of GIS, which ought to be taken into account when it comes to incorporating a priori knowledge in the recognition process. It is most likely that the geometrical boundaries of objects stored in a GIS are not identical with the physical boundaries. Physical boundaries are not exactly straight, nor perpendicular to each other. Moreover, unnecessary details are left out. In addition to all these difficulties, generalization problems also have to be faced.

TRENDS

Scale-space

The scale-space theory provides the theoretical underpinning of the observation that objects in the world exist at a limited range of scales. For example, it is only meaningful to describe a building at, say, the metre to kilometre level. There is no point in describing a building at an astronomical scale or at the atomic level. Witkin (1983) proposed a multiscale representation of a measured signal. The derived signals (the scale-space) are obtained by convolving the original signal with a Gaussian kernel of ever increasing size. Hence, each new signal contains fewer details. A cornerstone of the scale-space theory is the fact that no new features can appear at coarser levels. That is, features at coarser levels are simplifications of finer scale signals. Tracing features through the scale-space provides clues about their relevance to object properties.
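A one-dimensional Gaussian scale-space is easy to sketch. The code below is illustrative only (the kernel radius and test signal are arbitrary choices): it smooths a signal with Gaussians of increasing sigma and counts local maxima, whose number should not increase with scale:

```python
import math

def gaussian_kernel(sigma):
    """Normalized discrete Gaussian, truncated at about 3 sigma."""
    r = max(1, int(3 * sigma))
    k = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-r, r + 1)]
    s = sum(k)
    return [v / s for v in k]

def smooth(signal, sigma):
    """Convolve a 1-D signal with a Gaussian (edges clamped)."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    n = len(signal)
    return [sum(k[j + r] * signal[min(max(i + j, 0), n - 1)]
                for j in range(-r, r + 1)) for i in range(n)]

def local_maxima(signal):
    """Indices of strict interior local maxima: the 'features' traced here."""
    return [i for i in range(1, len(signal) - 1)
            if signal[i] > signal[i - 1] and signal[i] > signal[i + 1]]

# A signal with fine detail riding on a coarse bump.
sig = [math.sin(0.1 * i) + 0.3 * math.sin(2.0 * i) for i in range(64)]
counts = [len(local_maxima(smooth(sig, s))) for s in (0.5, 2.0, 6.0)]
print(counts)  # feature count should not increase with scale
```

The decreasing counts mirror the causality property stated above: coarser levels only simplify the finer scale signal, they never introduce new extrema.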

Mayer (1998) proposed the inclusion of scale-space in object recognition. He demonstrates convincingly how unnecessary details disappear. For example, vehicles disappear at coarser scales and roads emerge in a less cluttered environment. This procedure greatly facilitates the recognition aspect, but increases the localization problem, because smoothing the original image not only suppresses details but also displaces features.

The scale-space theory also offers the possibility of modelling objects at different scales. This process, in turn, suggests performing the matching (associating features with objects) at different scales. The true potential of the scale-space theory in object recognition is in building two scale-spaces, one for the sensory input data and the other for representing the objects. Of course, the challenge is to formalize the relationship between these scale-spaces.

Multisensor and Multispectral Data

Before an object, say a building, can be measured, it must first be identified as such. This requirement involves understanding an image, at least to a certain extent, which is precisely what makes object recognition a hard problem. Humans are remarkably adept at object recognition. How the human visual system solves the problem is not known well enough to mimic the solution by computers. Instead of trying to unravel the image understanding abilities of humans, researchers attempt to improve the current status of recognizing objects from aerial scenes by increasing the sensory input sources. For example, laser altimeter data are used to generate DEMs (Haala, 1994); colour imagery is preferred to take advantage of colour information (Henricsson et al., 1996). Expanding this idea leads to the inclusion of multispectral or even hyperspectral sensor data in the object recognition process. This trend is facilitated by the fact that the spatial, spectral and temporal resolutions of different airborne and spaceborne sensors are rapidly increasing, whilst the cost associated with data collection is decreasing.

The cardinal question, now, is how to exploit the potential which these different data sources offer in order to tackle object recognition more effectively. Ideally, proven concepts and methods in remote sensing, digital photogrammetry and computer vision should be combined in a synergistic fashion. Csatho and Schenk (1998) argue that such attempts should first be launched on a conceptual level before specific algorithms are devised, modified or merged.

CONCLUDING REMARKS

Object recognition of urban scenes is an utterly ill-posed problem. Pleasing as the object recognition paradigm is on the conceptual level, its implementation on the algorithmic level is flawed. It is commonly agreed that the models currently used for describing objects are weak. Often, there is a representational incompatibility between data and object model which, in turn, causes the matching to fail. Moreover, object models are not general enough and usually lack relational information. It turns out that the successful recognition of objects in more difficult scenes even requires the inclusion of unspecified objects in the modelling. For example, to unambiguously recognize a building it may be necessary to include trucks in the process.

Researchers have come to realize that utilizing multisensor and multispectral data sources greatly increases the chances of making the recognition process more stable. As the availability and performance of airborne sensors rapidly increase and at the same time the cost of such systems decreases, multisensor data acquisition is becoming commercially feasible. With multispectral and multisensor data available, objects can be modelled more distinctly and, equally important, closer to what can be extracted from sensory input data. In order to exploit fully the potential offered by these data sources, proven concepts and methods from remote sensing, digital photogrammetry and computer vision should be combined in a synergistic fashion.

REFERENCES

BORDES, G., GUERIN, P., GIRAUDON, G. and MAITRE, H., 1996. Contribution of external data to aerial image analysis. International Archives of Photogrammetry and Remote Sensing, 31(4): 134–138.

COLLINS, R. T., JAYNES, C., STOLLE, F., WANG, X., CHENG, Y.-Q., HANSON, A. R. and RISEMAN, E. M., 1995. A system for automated site model acquisition. Integrating photogrammetric techniques with scene analysis and machine vision II. SPIE, 2486: 244–254.

CSATHO, B. and SCHENK, T., 1998. Multisensor data fusion for automatic scene interpretation. International Archives of Photogrammetry and Remote Sensing, 33(3/1): 429–434.

ECKSTEIN, W. and MUNKELT, O., 1995. Extracting objects from digital terrain models. Remote sensing and reconstruction for 3-D objects and scenes. SPIE, 2572: 43–51.

GULCH, E., 1992. A knowledge based approach to reconstruct buildings in digital aerial imagery. International Archives of Photogrammetry and Remote Sensing, 29(2): 410–417.

HAALA, N., 1994. Detection of buildings by fusion of range and image data. Ibid., 30(3/1): 341–346.

HEIPKE, C., MAYER, H., WIEDEMANN, C. and JAMET, O., 1997. Evaluation of automatic road extraction. Ibid., 32(3–4W2): 151–160.

HENRICSSON, O., BIGNONE, F., WILLUHN, W., ADE, F., KUBLER, O., BALTSAVIAS, E., MASON, S. and GRUN, A., 1996. Project AMOBE: strategies, current status and future work. Ibid., 31(3): 321–330.

JOSEPHSON, J. R. and JOSEPHSON, S. G. (Eds.), 1994. Abductive inference. Cambridge University Press, Cambridge. 306 pages.

LUE, Y., 1997. One step to a higher level of automation for softcopy photogrammetry. Automatic interior orientation. ISPRS Journal of Photogrammetry & Remote Sensing, 52(3): 103–109.

MARR, D., 1982. Vision: a computational investigation into the human representation and processing of visual information. W. H. Freeman, San Francisco. 397 pages.

MAYER, H., 1998. Automatische Objektextraktion aus digitalen Luftbildern. Deutsche Geodätische Kommission, Series C, 494: 132 pages.

SARKAR, S. and BOYER, K. L., 1994. Computing perceptual organization in computer vision. World Scientific, Singapore, New Jersey. Series in Machine Perception and Artificial Intelligence, 12: 232 pages.

SCHENK, T., 1999. Digital photogrammetry. TerraScience, Laurelville, Ohio. 428 pages.

SCHICKLER, W., 1995. Ein operationelles Verfahren zur automatischen inneren Orientierung von Luftbildern. Zeitschrift für Photogrammetrie und Fernerkundung, 63(3): 115–122.

SHUFELT, J. and MCKEOWN, D., 1993. Fusion of monocular cues to detect man-made structures in aerial imagery. Computer Vision, Graphics, and Image Processing, 57(3): 307–330.

WITKIN, A., 1983. Scale-space filtering. Proceedings 8th International Joint Conference on Artificial Intelligence, Karlsruhe, Germany. Pages 1019–1022.


Résumé

Object recognition and image interpretation have gradually become major subjects of interest in digital photogrammetry research. This article provides an overview of object recognition in digital photogrammetry, beginning with a problem statement and a brief description of a paradigm. To bring the concept down to the level of an example, automatic interior orientation is presented as an object recognition problem.

The following sections then analyse the current state of object recognition by highlighting significant criteria, such as modelling, system strategies and inference components. Such criteria are useful for comparing the various object recognition systems and the ways in which they approach the problems. The strengths and weaknesses of current systems are summarized, and the modelling problem is then examined in greater detail.

Finally, two new methods (scale-space and the fusion of multisensor/multispectral data) are mentioned. These methods illustrate promising new trends capable of advancing object recognition to a higher level.

Zusammenfassung

Object recognition and image understanding have increasingly become major subjects of interest in digital photogrammetry research. The article gives an overview of object recognition in photogrammetry, beginning with a problem statement and a brief description of an example. To clarify the task, automatic interior orientation is presented as an object recognition problem. In subsequent sections the current state of object recognition is discussed by identifying relevant criteria, such as modelling, system strategies and inference components. Such criteria are useful for comparing object recognition systems or proposed approaches. Strengths and weaknesses of current systems are summarized, followed by a detailed analysis of the modelling problem. Finally, two new approaches (scale-space and the fusion of multisensor or multispectral data) are mentioned. These approaches serve as examples of promising new trends which offer the possibility of raising object recognition to a new level.

DISCUSSION

Professor Cooper: The President spoke earlier about the Photogrammetric Society merging with another society, the Remote Sensing Society. From what our speaker has presented this evening, it seems to me that the Photogrammetric Society might, with profit, merge with a “Society for Cognitive Psychology” and also a “Society for Neuropsychology”. It is in these areas that I think progress is currently being made which has a direct impact on the modelling that was stated as being so difficult to do. In the final illustration shown, there was the physical world, the human sensor and then our world. Is it not arguable that “my world” should have been shown rather than “our world”, which of course complicates the problem?


Professor Schenk: I think that I alluded earlier to this problem by saying that everybody here perceives a slightly different world. I therefore totally agree with you. The problem becomes much more complicated with all these different worlds of different people, but I believe that everybody’s reconstruction agrees in the essential parts. The perceptions that we gain from sensory input, for example the visual input, are stable, giving a presentation with which most people would agree, apart, perhaps, from minor details concerning certain visual phenomena. In general, everybody would agree to label an appropriate object as a building or a road and so on. Scene labelling is usually quite consistent. I particularly liked your remark that cognitive science should be included, because this is a discipline which tries to give an understanding of how we understand the world and how we model it.

Professor Cooper: Neuropsychology is also making advances in a different way, in terms of modelling our understanding and reaction to the world by physical processes rather than by psychological processes, which are not directly observable. However, it also seems to me that the problem that is tractable at the moment is the one of using prior geometrical information of some kind, and you have illustrated this. Perhaps by the time that development is sorted out, neuropsychologists will have provided us with some more information that will enable us to take the next step. However, I do not see how the next step can be taken until we understand more about how we understand the real world.

Professor Schenk: We need to differentiate between an immediate representation that we gain about the world and the interpretation of that world, so that we can take appropriate action. For example, when we reconstruct a street scene, we interpret from which direction cars are approaching before we cross the street. Therefore we are already in the stage of planning and executing actions, which I believe is a step beyond what we really need in object recognition. In real life, first we need to recognize objects and then we begin to consider the implications of this recognition, for example with regard to motion. The latter stage is much more complicated, of course, and individual interpretation may vary. However, I believe that everybody basically agrees that we are at the very first stage, where we can recognize objects.

Professor Dowman: Your paradigm and, I think, most of your examples were oriented towards what you described as autonomous feature extraction or object extraction. However, I think that a lot of the current progress is being made in the more semi-automatic areas. You used the term “dream solution” for an autonomous system. How do you see the two approaches developing? Are we going to achieve efficient, robust, semi-automatic feature extraction within the near future and perhaps wait much longer before we achieve the autonomous solution?

Professor Schenk: I think that this is true. I am not impartial, of course, and my research interest lies more in autonomous systems, in terms of understanding the principles behind object recognition, not in building systems which can be used today, or even tomorrow. However, that does not mean that automatic systems, which can be used in softcopy workstations, should not be developed, because that is currently the only way that it is really possible to achieve object recognition at an operational level. There is an enormous difference between an autonomous and an automatic system, in that the latter has the safety net of human intervention, which can eventually fix any incapability of the system. I agree, though, that the only operational systems at present are semi-automatic systems installed on softcopy workstations. The disadvantage here is that it is easy to focus too much on implementations and not enough on the theoretical and conceptual aspects of the problem. I do not believe that we can solve the overall object recognition problem, to achieve an autonomous solution, by experimenting with different algorithms. For example, developing a new snake algorithm does not greatly advance the overall understanding of object recognition.


Mr. Newby: I was intrigued by the last illustration shown in your presentation. It seemed to me that you had proceeded through a discussion of a very difficult problem. You had admitted that it is indeed very difficult, but you told us that some progress is being made and that you have a viable approach. Then suddenly, at the end, you threw down these new challenges and told us that we must suddenly make the problem very much more complicated. You more or less implied that we are failing with the original simplistic approaches; trying to simplify is not the answer. Instead, we must make this task more complicated; in other words, we must retreat a bit before we can go forward. Is it really your conclusion that the approaches made so far are not going to work on their own, so that you will need to use multiple sensors and the other ideas introduced in the last illustration?

Professor Schenk: I firmly believe that we should include multisensor, multispectral information, because it makes modelling easier and gives a richer set of features which can be extracted. I just wanted to caution that use of this information won’t be the ultimate answer, either. New problems have to be solved, such as fusion. On what level do we merge data, information and knowledge?

Mr. Newby: It seemed to me that greater complexity was going to raise more problems than it solves, but I think that you are actually saying that this will make the ultimate goal easier to achieve.

Professor Schenk: Yes, certainly, I would trade the greater complexity of using multispectral or hyperspectral information against the easier modelling achieved, and I definitely feel that we should take on the challenge of fusion.

Dr. Robson: Everything that you have presented has been in the context of mapping from aerial photography. Softcopy workstations are used more and more now for close range applications as well. I can see that what you are suggesting is that initially a set of automated tools should be developed and then maybe an automatic system. Do you expect to see similar progress in the field of close range work?

Professor Schenk: Yes, I do.

Dr. Robson: Will similar tools be used or will major changes be necessary?

Professor Schenk: I tend to focus on the map making problem, but I entirely agree that close range applications are more similar to what computer vision or machine vision attempts to create. Perhaps with a more restricted scene, modelling is easier. On the other hand, oblique images may be used which can cause further problems. Overall, I think that close range applications are closer to solution by autonomous systems than topographic mapping.

Professor Cooper: Why would you say that is the case? The geometry is quite different and one usually knows more about the object in close range photogrammetry. Are there any other reasons why you think that an autonomous solution could be more successfully applied for close range work?

Professor Schenk: Close range applications usually exist in a more controlled environment. For example, one can probably influence the quality of the imagery better than in aerial applications. I feel that a major advantage lies in the modelling, because often there are only a few objects present, which can usually be modelled geometrically.

Dr. Robson: I agree with you with regard to the modelling advantage. However, as far as imaging is concerned, major problems can occur due to a massive range in tonality and very significant differences in obliquity and geometry. Different lighting conditions, which are often not controllable, can also be present. Certainly, if an imaging solution can be designed for a specific problem, then I think that you are right. However, achieving a general solution is much more difficult.
