Ballard, D. and Brown, C. M. 1982. Computer Vision.


  • COMPUTER VISION

    Dana H. Ballard
    Christopher M. Brown

    Department of Computer Science, University of Rochester, Rochester, New York

    PRENTICE-HALL, INC., Englewood Cliffs, New Jersey 07632

  • Library of Congress Cataloging in Publication Data

    BALLARD, DANA HARRY.
    Computer vision.
    Bibliography: p.
    Includes index.
    1. Image processing. I. Brown, Christopher M. II. Title.
    TA1632.B34 621.38'0414 81-20974
    ISBN 0-13-165316-4 AACR2

    Cover design by Robin Breite

    © 1982 by Prentice-Hall, Inc., Englewood Cliffs, New Jersey 07632

    All rights reserved. No part of this book may be reproduced in any form or by any means without permission in writing from the publisher.

    Printed in the United States of America

    10 9 8 7 6 5 4 3 2

    ISBN 0-13-165316-4

    PRENTICE-HALL INTERNATIONAL, INC., London
    PRENTICE-HALL OF AUSTRALIA PTY. LIMITED, Sydney
    PRENTICE-HALL OF CANADA, LTD., Toronto
    PRENTICE-HALL OF INDIA PRIVATE LIMITED, New Delhi
    PRENTICE-HALL OF JAPAN, INC., Tokyo
    PRENTICE-HALL OF SOUTHEAST ASIA PTE. LTD., Singapore
    WHITEHALL BOOKS LIMITED, Wellington, New Zealand

  • Preface xiii

    Acknowledgments xv

    Mnemonics for Proceedings and Special Collections Cited in References xix

    1 COMPUTER VISION
    1.1 Achieving Simple Vision Goals 1
    1.2 High-Level and Low-Level Capabilities 2
    1.3 A Range of Representations 6
    1.4 The Role of Computers 9
    1.5 Computer Vision Research and Applications 12

  • Part I GENERALIZED IMAGES 13

    2 IMAGE FORMATION
    2.1 Images 17
    2.2 Image Model 18
    2.2.1 Image Functions, 18
    2.2.2 Imaging Geometry, 19
    2.2.3 Reflectance, 22
    2.2.4 Spatial Properties, 24
    2.2.5 Color, 31
    2.2.6 Digital Images, 35
    2.3 Imaging Devices for Computer Vision 42
    2.3.1 Photographic Imaging, 44
    2.3.2 Sensing Range, 52
    2.3.3 Reconstruction Imaging, 56

    3 EARLY PROCESSING
    3.1 Recovering Intrinsic Structure 63
    3.2 Filtering the Image 65
    3.2.1 Template Matching, 65
    3.2.2 Histogram Transformations, 70
    3.2.3 Background Subtraction, 72
    3.2.4 Filtering and Reflectance Models, 73
    3.3 Finding Local Edges 75
    3.3.1 Types of Edge Operators, 76
    3.3.2 Edge Thresholding Strategies, 80
    3.3.3 Three-Dimensional Edge Operators, 81
    3.3.4 How Good Are Edge Operators? 83
    3.3.5 Edge Relaxation, 85
    3.4 Range Information from Geometry 88
    3.4.1 Stereo Vision and Triangulation, 88
    3.4.2 A Relaxation Algorithm for Stereo, 89
    3.5 Surface Orientation from Reflectance Models 93
    3.5.1 Reflectivity Functions, 93
    3.5.2 Surface Gradient, 95
    3.5.3 Photometric Stereo, 98
    3.5.4 Shape from Shading by Relaxation, 99
    3.6 Optical Flow 102
    3.6.1 The Fundamental Flow Constraint, 102
    3.6.2 Calculating Optical Flow by Relaxation, 103
    3.7 Resolution Pyramids 106
    3.7.1 Gray-Level Consolidation, 106
    3.7.2 Pyramidal Structures in Correlation, 107
    3.7.3 Pyramidal Structures in Edge Detection, 109

  • Part II SEGMENTED IMAGES 115

    4 BOUNDARY DETECTION
    4.1 On Associating Edge Elements 119
    4.2 Searching Near an Approximate Location 121
    4.2.1 Adjusting A Priori Boundaries, 121
    4.2.2 Non-Linear Correlation in Edge Space, 121
    4.2.3 Divide-and-Conquer Boundary Detection, 122
    4.3 The Hough Method for Curve Detection 123
    4.3.1 Use of the Gradient, 124
    4.3.2 Some Examples, 125
    4.3.3 Trading Off Work in Parameter Space for Work in Image Space, 126
    4.3.4 Generalizing the Hough Transform, 128
    4.4 Edge Following as Graph Searching 131
    4.4.1 Good Evaluation Functions, 133
    4.4.2 Finding All the Boundaries, 133
    4.4.3 Alternatives to the A Algorithm, 136
    4.5 Edge Following as Dynamic Programming 137
    4.5.1 Dynamic Programming, 137
    4.5.2 Dynamic Programming for Images, 139
    4.5.3 Lower Resolution Evaluation Functions, 141
    4.5.4 Theoretical Questions about Dynamic Programming, 143
    4.6 Contour Following 143
    4.6.1 Extension to Gray-Level Images, 144
    4.6.2 Generalization to Higher-Dimensional Image Data, 146

    5 REGION GROWING
    5.1 Regions 149
    5.2 A Local Technique: Blob Coloring 151
    5.3 Global Techniques: Region Growing via Thresholding 152
    5.3.1 Thresholding in Multidimensional Space, 153
    5.3.2 Hierarchical Refinement, 155
    5.4 Splitting and Merging 155
    5.4.1 State-Space Approach to Region Growing, 157
    5.4.2 Low-Level Boundary Data Structures, 158
    5.4.3 Graph-Oriented Region Structures, 159
    5.5 Incorporation of Semantics 160

    6 TEXTURE
    6.1 What Is Texture? 166
    6.2 Texture Primitives 169


  • 6.3 Structural Models of Texel Placement 170
    6.3.1 Grammatical Models, 172
    6.3.2 Shape Grammars, 173
    6.3.3 Tree Grammars, 175
    6.3.4 Array Grammars, 178
    6.4 Texture as a Pattern Recognition Problem 181
    6.4.1 Texture Energy, 184
    6.4.2 Spatial Gray-Level Dependence, 186
    6.4.3 Region Texels, 188
    6.5 The Texture Gradient 189

    7 MOTION
    7.1 Motion Understanding 195
    7.1.1 Domain-Independent Understanding, 196
    7.1.2 Domain-Dependent Understanding, 196
    7.2 Understanding Optical Flow 199
    7.2.1 Focus of Expansion, 199
    7.2.2 Adjacency, Depth, and Collision, 201
    7.2.3 Surface Orientation and Edge Detection, 202
    7.2.4 Egomotion, 206
    7.3 Understanding Image Sequences 207
    7.3.1 Calculating Flow from Discrete Images, 207
    7.3.2 Rigid Bodies from Motion, 210
    7.3.3 Interpretation of Moving Light Displays: A Domain-Independent Approach, 214
    7.3.4 Human Motion Understanding: A Model-Directed Approach, 217
    7.3.5 Segmented Images, 220

    Part III GEOMETRICAL STRUCTURES 227

    8 REPRESENTATION OF TWO-DIMENSIONAL GEOMETRIC STRUCTURES
    8.1 Two-Dimensional Geometric Structures 231
    8.2 Boundary Representations 232
    8.2.1 Polylines, 232
    8.2.2 Chain Codes, 235
    8.2.3 The ψ-s Curve, 237
    8.2.4 Fourier Descriptors, 238
    8.2.5 Conic Sections, 239
    8.2.6 B-Splines, 239
    8.2.7 Strip Trees, 244
    8.3 Region Representations 247
    8.3.1 Spatial Occupancy Array, 247
    8.3.2 y-Axis, 248
    8.3.3 Quad Trees, 249
    8.3.4 Medial Axis Transform, 252
    8.3.5 Decomposing Complex Areas, 253
    8.4 Simple Shape Properties 254
    8.4.1 Area, 254
    8.4.2 Eccentricity, 255
    8.4.3 Euler Number, 255
    8.4.4 Compactness, 256
    8.4.5 Slope Density Function, 256
    8.4.6 Signatures, 257
    8.4.7 Concavity Tree, 258
    8.4.8 Shape Numbers, 258

    9 REPRESENTATION OF THREE-DIMENSIONAL STRUCTURES
    9.1 Solids and Their Representation 264
    9.2 Surface Representations 265
    9.2.1 Surfaces with Faces, 265
    9.2.2 Surfaces Based on Splines, 268
    9.2.3 Surfaces That Are Functions on the Sphere, 270
    9.3 Generalized Cylinder Representations 274
    9.3.1 Generalized Cylinder Coordinate Systems and Properties, 275
    9.3.2 Extracting Generalized Cylinders, 278
    9.3.3 A Discrete Volumetric Version of the Skeleton, 279
    9.4 Volumetric Representations 280
    9.4.1 Spatial Occupancy, 280
    9.4.2 Cell Decomposition, 281
    9.4.3 Constructive Solid Geometry, 282
    9.4.4 Algorithms for Solid Representations, 284
    9.5 Understanding Line Drawings 291
    9.5.1 Matching Line Drawings to Three-Dimensional Primitives, 293
    9.5.2 Grouping Regions Into Bodies, 294
    9.5.3 Labeling Lines, 296
    9.5.4 Reasoning About Planes, 301

    Part IV RELATIONAL STRUCTURES 313

    10 KNOWLEDGE REPRESENTATION AND USE
    10.1 Representations 317
    10.1.1 The Knowledge Base: Models and Processes, 318
    10.1.2 Analogical and Propositional Representations, 319
    10.1.3 Procedural Knowledge, 321
    10.1.4 Computer Implementations, 322
    10.2 Semantic Nets 323
    10.2.1 Semantic Net Basics, 323
    10.2.2 Semantic Nets for Inference, 327
    10.3 Semantic Net Examples 334
    10.3.1 Frame Implementations, 334
    10.3.2 Location Networks, 335
    10.4 Control Issues in Complex Vision Systems 340
    10.4.1 Parallel and Serial Computation, 341
    10.4.2 Hierarchical and Heterarchical Control, 341
    10.4.3 Belief Maintenance and Goal Achievement, 346

    11 MATCHING 352
    11.1 Aspects of Matching 352
    11.1.1 Interpretation: Construction, Matching, and Labeling, 352
    11.1.2 Matching Iconic, Geometric, and Relational Structures, 353
    11.2 Graph-Theoretic Algorithms 355
    11.2.1 The Algorithms, 357
    11.2.2 Complexity, 359
    11.3 Implementing Graph-Theoretic Algorithms 360
    11.3.1 Matching Metrics, 360
    11.3.2 Backtrack Search, 363
    11.3.3 Association Graph Techniques, 365
    11.4 Matching in Practice 369
    11.4.1 Decision Trees, 370
    11.4.2 Decision Tree and Subgraph Isomorphism, 375
    11.4.3 Informal Feature Classification, 376
    11.4.4 A Complex Matcher, 378

    12 INFERENCE 383
    12.1 First-Order Predicate Calculus 384
    12.1.1 Clause-Form Syntax (Informal), 384
    12.1.2 Nonclausal Syntax and Logic Semantics (Informal), 385
    12.1.3 Converting Nonclausal Form to Clauses, 387
    12.1.4 Theorem Proving, 388
    12.1.5 Predicate Calculus and Semantic Networks, 390
    12.1.6 Predicate Calculus and Knowledge Representation, 392
    12.2 Computer Reasoning 395
    12.3 Production Systems 396
    12.3.1 Production System Details, 398
    12.3.2 Pattern Matching, 399


  • 12.3.3 An Example, 401
    12.3.4 Production System Pros and Cons, 406
    12.4 Scene Labeling and Constraint Relaxation 408
    12.4.1 Consistent and Optimal Labelings, 408
    12.4.2 Discrete Labeling Algorithms, 410
    12.4.3 A Linear Relaxation Operator and a Line Labeling Example, 415
    12.4.4 A Nonlinear Operator, 419
    12.4.5 Relaxation as Linear Programming, 420
    12.5 Active Knowledge 430
    12.5.1 Hypotheses, 431
    12.5.2 HOW-TO and SO-WHAT Processes, 431
    12.5.3 Control Primitives, 431
    12.5.4 Aspects of Active Knowledge, 433

    13 GOAL ACHIEVEMENT 438
    13.1 Symbolic Planning 439
    13.1.1 Representing the World, 439
    13.1.2 Representing Actions, 441
    13.1.3 Stacking Blocks, 442
    13.1.4 The Frame Problem, 444
    13.2 Planning with Costs 445
    13.2.1 Planning, Scoring, and Their Interaction, 446
    13.2.2 Scoring Simple Plans, 446
    13.2.3 Scoring Enhanced Plans, 451
    13.2.4 Practical Simplifications, 452
    13.2.5 A Vision System Based on Planning, 453

    APPENDICES 465

    A1 SOME MATHEMATICAL TOOLS 465
    A1.1 Coordinate Systems 465
    A1.1.1 Cartesian, 465
    A1.1.2 Polar and Polar Space, 465
    A1.1.3 Spherical and Cylindrical, 466
    A1.1.4 Homogeneous Coordinates, 467
    A1.2 Trigonometry 468
    A1.2.1 Plane Trigonometry, 468
    A1.2.2 Spherical Trigonometry, 469
    A1.3 Vectors 469
    A1.4 Matrices 471
    A1.5 Lines 474
    A1.5.1 Two Points, 474
    A1.5.2 Point and Direction, 474
    A1.5.3 Slope and Intercept, 474


  • A1.5.4 Ratios, 474
    A1.5.5 Normal and Distance from Origin (Line Equation), 475
    A1.5.6 Parametric, 476
    A1.6 Planes 476
    A1.7 Geometric Transformations 477
    A1.7.1 Rotation, 477
    A1.7.2 Scaling, 478
    A1.7.3 Skewing, 479
    A1.7.4 Translation, 479
    A1.7.5 Perspective, 479
    A1.7.6 Transforming Lines and Planes, 480
    A1.7.7 Summary, 480
    A1.8 Camera Calibration and Inverse Perspective 481
    A1.8.1 Camera Calibration, 482
    A1.8.2 Inverse Perspective, 483
    A1.9 Least-Squared-Error Fitting 484
    A1.9.1 Pseudo-Inverse Method, 485
    A1.9.2 Principal Axis Method, 486
    A1.9.3 Fitting Curves by the Pseudo-Inverse Method, 487
    A1.10 Conics 488
    A1.11 Interpolation 489
    A1.11.1 One-Dimensional, 489
    A1.11.2 Two-Dimensional, 490
    A1.12 The Fast Fourier Transform 490
    A1.13 The Icosahedron 492
    A1.14 Root Finding 493

    A2 ADVANCED CONTROL MECHANISMS 497
    A2.1 Standard Control Structures 497
    A2.1.1 Recursion, 498
    A2.1.2 Co-Routining, 498
    A2.2 Inherently Sequential Mechanisms 499
    A2.2.1 Automatic Backtracking, 499
    A2.2.2 Context Switching, 500
    A2.3 Sequential or Parallel Mechanisms 500
    A2.3.1 Modules and Messages, 500
    A2.3.2 Priority Job Queue, 502
    A2.3.3 Pattern-Directed Invocation, 504
    A2.3.4 Blackboard Systems, 505

    AUTHOR INDEX

    SUBJECT INDEX

  • Preface

    The dream of intelligent automata goes back to antiquity; its first major articulation in the context of digital computers was by Turing around 1950. Since then, this dream has been pursued primarily by workers in the field of artificial intelligence, whose goal is to endow computers with information-processing capabilities comparable to those of biological organisms. From the outset, one of the goals of artificial intelligence has been to equip machines with the capability of dealing with sensory inputs.

    Computer vision is the construction of explicit, meaningful descriptions of physical objects from images. Image understanding is very different from image processing, which studies image-to-image transformations, not explicit description building. Descriptions are a prerequisite for recognizing, manipulating, and thinking about objects.

    We perceive a world of coherent three-dimensional objects with many invariant properties. Objectively, the incoming visual data do not exhibit corresponding coherence or invariance; they contain much irrelevant or even misleading variation. Somehow our visual system, from the retinal to cognitive levels, understands, or imposes order on, chaotic visual input. It does so by using intrinsic information that may reliably be extracted from the input, and also through assumptions and knowledge that are applied at various levels in visual processing.

    The challenge of computer vision is one of explicitness. Exactly what information about scenes can be extracted from an image using only very basic assumptions about physics and optics? Explicitly, what computations must be performed? Then, at what stage must domain-dependent, prior knowledge about the world be incorporated into the understanding process? How are world models and knowledge represented and used? This book is about the representations and mechanisms that allow image information and prior knowledge to interact in image understanding.

    Computer vision is a relatively new and fast-growing field. The first experiments were conducted in the late 1950s, and many of the essential concepts have been developed during the last five years. With this rapid growth, crucial ideas have arisen in disparate areas such as artificial intelligence, psychology, computer graphics, and image processing. Our intent is to assemble a selection of this material in a form that will serve both as a senior/graduate-level academic text and as a useful reference to those building vision systems. This book has a strong artificial intelligence flavor, and we hope this will provoke thought. We believe that both the intrinsic image information and the internal model of the world are important in successful vision systems.

    The book is organized into four parts, based on descriptions of objects at four different levels of abstraction.

    1. Generalized images: images and image-like entities.
    2. Segmented images: images organized into subimages that are likely to correspond to "interesting objects."
    3. Geometric structures: quantitative models of image and world structures.
    4. Relational structures: complex symbolic descriptions of image and world structures.

    The parts follow a progression of increasing abstractness. Although the four parts are most naturally studied in succession, they are not tightly interdependent. Part I is a prerequisite for Part II, but Parts III and IV can be read independently.

    Parts of the book assume some mathematical and computing background (calculus, linear algebra, data structures, numerical methods). However, throughout the book mathematical rigor takes a back seat to concepts. Our intent is to transmit a set of ideas about a new field to the widest possible audience.

    In one book it is impossible to do justice to the scope and depth of prior work in computer vision. Further, we realize that in a fast-developing field, the rapid influx of new ideas will continue. We hope that our readers will be challenged to think, criticize, read further, and quickly go beyond the confines of this volume.


  • Acknowledgments

    Jerry Feldman and Herb Voelcker (and through them the University of Rochester) provided many resources for this work. One of the most important was a capable and forgiving staff (secretarial, technical, and administrative). For massive text editing, valuable advice, and good humor we are especially grateful to Rose Peet. Peggy Meeker, Jill Orioli, and Beth Zimmerman all helped at various stages.

    Several colleagues made suggestions on early drafts: thanks to James Allen, Norm Badler, Larry Davis, Takeo Kanade, John Kender, Daryl Lawton, Joseph O'Rourke, Ari Requicha, Ed Riseman, Azriel Rosenfeld, Mike Schneier, Ken Sloan, Steve Tanimoto, Marty Tenenbaum, and Steve Zucker.

    Graduate students helped in many different ways: thanks especially to Michel Denber, Alan Frisch, Lydia Hrechanyk, Mark Kahrs, Keith Lantz, Joe Maleson, Lee Moore, Mark Peairs, Don Perlis, Rick Rashid, Dan Russell, Dan Sabbah, Bob Schudy, Peter Selfridge, Uri Shani, and Bob Tilove. Bernhard Stuth deserves special mention for much careful and critical reading.

    Finally, thanks go to Jane Ballard, mostly for standing steadfast through the cycles of elation and depression and for numerous engineering-to-English translations.

    As Pat Winston put it: "A willingness to help is not an implied endorsement." The aid of others was invaluable, but we alone are responsible for the opinions, technical details, and faults of this book.

    Funding assistance was provided by the Sloan Foundation under Grant 78-4-15, by the National Institutes of Health under Grant HL21253, and by the Defense Advanced Research Projects Agency under Grant N00014-78-C-0164.

    The authors wish to credit the following sources for figures and tables. For complete citations given here in abbreviated form (as "from ..." or "after ..."), refer to the appropriate chapter-end references.

    Fig. 1.2 from Shani, U., "A 3-D model-driven system for the recognition of abdominal anatomy from CT scans," TR77, Dept. of Computer Science, University of Rochester, May 1980.


  • Fig. 1.4 courtesy of Allen Hanson and Ed Riseman, COINS Research Project, University of Massachusetts, Amherst, MA.
    Fig. 2.4 after Horn and Sjoberg, 1978.
    Figs. 2.5, 2.9, 2.10, 3.2, 3.6, and 3.7 courtesy of Bill Lampeter.
    Fig. 2.7a painting by Louis Condax; courtesy of Eastman Kodak Company and the Optical Society of America.
    Fig. 2.8a courtesy of D. Greenberg and G. Joblove, Cornell Program of Computer Graphics.
    Fig. 2.8b courtesy of Tom Check.
    Table 2.3 after Gonzalez and Wintz, 1977.
    Fig. 2.18 courtesy of EROS Data Center, Sioux Falls, SD.
    Figs. 2.19 and 2.20 from Herrick, C.N., Television Theory and Servicing: Black/White and Color, 2nd Ed. Reston, VA: Reston, 1976.
    Figs. 2.21, 2.22, 2.23, and 2.24 courtesy of Michel Denber.
    Fig. 2.25 from Popplestone et al., 1975.
    Fig. 2.26 courtesy of Production Automation Project, University of Rochester.
    Fig. 2.27 from Waag and Gramiak, 1976.
    Fig. 3.1 courtesy of Marty Tenenbaum.
    Fig. 3.8 after Horn, 1974.
    Figs. 3.14 and 3.15 after Frei and Chen, 1977.
    Figs. 3.17 and 3.18 from Zucker, S.W. and R.A. Hummel, "An optimal 3-D edge operator," IEEE Trans. PAMI-3, May 1981, pp. 324-331.
    Fig. 3.19 curves are based on data in Abdou, 1978.
    Figs. 3.20, 3.21, and 3.22 from Prager, J.M., "Extracting and labeling boundary segments in natural scenes," IEEE Trans. PAMI-2, 1, January 1980. © 1980 IEEE.
    Figs. 3.23, 3.28, 3.29, and 3.30 courtesy of Berthold Horn.
    Figs. 3.24 and 3.26 from Marr, D. and T. Poggio, "Cooperative computation of stereo disparity," Science, Vol. 194, 1976, pp. 283-287. © 1976 by the American Association for the Advancement of Science.
    Fig. 3.31 from Woodham, R.J., "Photometric stereo: A reflectance map technique for determining surface orientation from image intensity," Proc. SPIE, Vol. 155, August 1978.
    Figs. 3.33 and 3.34 after Horn and Schunck, 1980.
    Fig. 3.37 from Tanimoto, S. and T. Pavlidis, "A hierarchical data structure for picture processing," CGIP 4, 2, June 1975, pp. 104-119.
    Fig. 4.6 from Kimme et al., 1975.
    Figs. 4.7 and 4.16 from Ballard and Sklansky, 1976.
    Fig. 4.9 courtesy of Dana Ballard and Ken Sloan.
    Figs. 4.12 and 4.13 from Ramer, U., "Extraction of line structures from photographs of curved objects," CGIP 4, 2, June 1975, pp. 81-103.
    Fig. 4.14 courtesy of Jim Lester, Tufts/New England Medical Center.
    Fig. 4.17 from Chien, Y.P. and K.S. Fu, "A decision function method for boundary detection," CGIP 3, 2, June 1974, pp. 125-140.
    Fig. 5.3 from Ohlander, R., K. Price, and D.R. Reddy, "Picture segmentation using a recursive region splitting method," CGIP 8, 3, December 1979.
    Fig. 5.4 courtesy of Sam Kapilivsky.
    Figs. 6.1, 11.16, and A1.13 courtesy of Chris Brown.
    Fig. 6.3 courtesy of Joe Maleson and John Kender.
    Fig. 6.4 from Connors, 1979. Texture images by Phil Brodatz, in Brodatz, Textures. New York: Dover, 1966.
    Fig. 6.9 texture image by Phil Brodatz, in Brodatz, Textures. New York: Dover, 1966.
    Figs. 6.11, 6.12, and 6.13 from Lu, S.Y. and K.S. Fu, "A syntactic approach to texture analysis," CGIP 7, 3, June 1978, pp. 303-330.


  • Fig. 6.14 from Jayaramamurthy, S.N., "Multilevel array grammars for generating texture scenes," Proc. PRIP, August 1979, pp. 391-398. © 1979 IEEE.
    Fig. 6.20 from Laws, 1980.
    Figs. 6.21 and 6.22 from Maleson et al., 1977.
    Fig. 6.23 courtesy of Joe Maleson.
    Figs. 7.1 and 7.3 courtesy of Daryl Lawton.
    Fig. 7.2 after Prager, 1979.
    Figs. 7.4 and 7.5 from Clocksin, W.F., "Computer prediction of visual thresholds for surface slant and edge detection from optical flow fields," Ph.D. dissertation, University of Edinburgh, 1980.
    Fig. 7.7 courtesy of Steve Barnard and Bill Thompson.
    Figs. 7.8 and 7.9 from Rashid, 1980.
    Fig. 7.10 courtesy of Joseph O'Rourke.
    Figs. 7.11 and 7.12 after Aggarwal and Duda, 1975.
    Fig. 7.13 courtesy of Hans-Hellmut Nagel.
    Fig. 8.1d after Requicha, 1977.
    Figs. 8.2, 8.3, 8.21a, 8.22, and 8.26 after Pavlidis, 1977.
    Figs. 8.10, 8.11, 9.6, and 9.16 courtesy of Uri Shani.
    Figs. 8.12, 8.13, 8.14, 8.15, and 8.16 from Ballard, 1981.
    Fig. 8.21b from Preston, K., Jr., M.J.B. Duff, S. Levialdi, P.E. Norgren, and J-i. Toriwaki, "Basics of cellular logic with some applications in medical image processing," Proc. IEEE, Vol. 67, No. 5, May 1979, pp. 826-856.
    Figs. 8.25, 9.8, 9.9, 9.10, and 11.3 courtesy of Robert Schudy.
    Fig. 8.29 after Bribiesca and Guzman, 1979.
    Figs. 9.1, 9.18, 9.19, and 9.27 courtesy of Ari Requicha.
    Fig. 9.2 from Requicha, A.A.G., "Representations for rigid solids: theory, methods, systems," Computing Surveys 12, 4, December 1980.
    Fig. 9.3 courtesy of Lydia Hrechanyk.
    Figs. 9.4 and 9.5 after Baumgart, 1972.
    Fig. 9.7 courtesy of Peter Selfridge.
    Fig. 9.11 after Requicha, 1980.
    Figs. 9.14 and 9.15b from Agin, G.J. and T.O. Binford, "Computer description of curved objects," IEEE Trans. on Computers 25, 1, April 1976.
    Fig. 9.15a courtesy of Gerald Agin.
    Fig. 9.17 courtesy of A. Christensen; published as frontispiece of ACM SIGGRAPH 80 Proceedings.
    Fig. 9.20 from Marr and Nishihara, 1978.
    Fig. 9.21 after Tilove, 1980.
    Fig. 9.22b courtesy of Gene Hartquist.
    Figs. 9.24, 9.25, and 9.26 from Lee and Requicha, 1980.
    Figs. 9.28a, 9.29, 9.30, 9.31, 9.32, 9.35, and 9.37 and Table 9.1 from Brown, C. and R. Popplestone, "Cases in scene analysis," in Pattern Recognition, ed. B.G. Batchelor. New York: Plenum, 1978.
    Fig. 9.28b from Guzman, A., "Decomposition of a visual scene into three-dimensional bodies," in Automatic Interpretation and Classification of Images, A. Grasseli, ed., New York: Academic Press, 1969.
    Fig. 9.28c from Waltz, D., "Understanding line drawings of scenes with shadows," in The Psychology of Computer Vision, ed. P.H. Winston. New York: McGraw-Hill, 1975.
    Fig. 9.28d after Turner, 1974.
    Figs. 9.33, 9.38, 9.40, 9.42, 9.43, and 9.44 after Mackworth, 1973.


  • Figs. 9.39, 9.45, 9.46, and 9.47 and Table 9.2 after Kanade, 1978.
    Figs. 10.2 and A2.1 courtesy of Dana Ballard.
    Figs. 10.16, 10.17, and 10.18 after Russell, 1979.
    Fig. 11.5 after Fischler and Elschlager, 1973.
    Fig. 11.8 after Ambler et al., 1975.
    Fig. 11.10 from Winston, P.H., "Learning structural descriptions from examples," in The Psychology of Computer Vision, ed. P.H. Winston. New York: McGraw-Hill, 1975.
    Fig. 11.11 from Nevatia, 1974.
    Fig. 11.12 after Nevatia, 1974.
    Fig. 11.17 after Barrow and Popplestone, 1971.
    Fig. 11.18 from Davis, L.S., "Shape matching using relaxation techniques," IEEE Trans. PAMI-1, 4, January 1979, pp. 60-72.
    Figs. 12.4 and 12.5 from Sloan and Bajcsy, 1979.
    Fig. 12.6 after Barrow and Tenenbaum, 1976.
    Fig. 12.8 after Freuder, 1978.
    Fig. 12.10 from Rosenfeld, A., R.A. Hummel, and S.W. Zucker, "Scene labeling by relaxation operations," IEEE Trans. SMC 6, 6, June 1976, p. 420.
    Figs. 12.11, 12.12, 12.13, 12.14, and 12.15 after Hinton, 1979.
    Fig. 13.3 courtesy of Aaron Sloman.
    Figs. 13.6, 13.7, and 13.8 from Garvey, 1976.
    Fig. A1.11 after Duda and Hart, 1973.
    Figs. A2.2 and A2.3 from Hanson, A.R. and E.M. Riseman, "VISIONS: A computer system for interpreting scenes," in Computer Vision Systems, ed. A.R. Hanson and E.M. Riseman. New York: Academic Press, 1978.


  • Mnemonics for Proceedings and Special Collections Cited in the References

    CGIP
    Computer Graphics and Image Processing

    COMPSAC
    IEEE Computer Society's 3rd International Computer Software and Applications Conference, Chicago, November 1979.

    CVS
    Hanson, A. R. and E. M. Riseman (Eds.). Computer Vision Systems. New York: Academic Press, 1978.

    DARPA IU
    Defense Advanced Research Projects Agency Image Understanding Workshop, Minneapolis, MN, April 1977.
    Defense Advanced Research Projects Agency Image Understanding Workshop, Palo Alto, CA, October 1977.
    Defense Advanced Research Projects Agency Image Understanding Workshop, Cambridge, MA, May 1978.
    Defense Advanced Research Projects Agency Image Understanding Workshop, Carnegie-Mellon University, Pittsburgh, PA, November 1978.
    Defense Advanced Research Projects Agency Image Understanding Workshop, University of Maryland, College Park, MD, April 1980.

    IJCAI
    2nd International Joint Conference on Artificial Intelligence, Imperial College, London, September 1971.
    4th International Joint Conference on Artificial Intelligence, Tbilisi, Georgia, USSR, September 1975.
    5th International Joint Conference on Artificial Intelligence, MIT, Cambridge, MA, August 1977.
    6th International Joint Conference on Artificial Intelligence, Tokyo, August 1979.


  • IJCPR
    2nd International Joint Conference on Pattern Recognition, Copenhagen, August 1974.
    3rd International Joint Conference on Pattern Recognition, Coronado, CA, November 1976.
    4th International Joint Conference on Pattern Recognition, Kyoto, November 1978.
    5th International Joint Conference on Pattern Recognition, Miami Beach, FL, December 1980.

    MI4
    Meltzer, B. and D. Michie (Eds.). Machine Intelligence 4. Edinburgh: Edinburgh University Press, 1969.

    MI5
    Meltzer, B. and D. Michie (Eds.). Machine Intelligence 5. Edinburgh: Edinburgh University Press, 1970.

    MI6
    Meltzer, B. and D. Michie (Eds.). Machine Intelligence 6. Edinburgh: Edinburgh University Press, 1971.

    MI7
    Meltzer, B. and D. Michie (Eds.). Machine Intelligence 7. Edinburgh: Edinburgh University Press, 1972.

    PCV
    Winston, P. H. (Ed.). The Psychology of Computer Vision. New York: McGraw-Hill, 1975.

    PRIP
    IEEE Computer Society Conference on Pattern Recognition and Image Processing, Chicago, August 1979.


  • 1 Computer Vision Issues

    1.1 ACHIEVING SIMPLE VISION GOALS

    Suppose that you are given an aerial photo such as that of Fig. 1.1a and asked to locate ships in it. You may never have seen a naval vessel in an aerial photograph before, but you will have no trouble predicting generally how ships will appear. You might reason that you will find no ships inland, and so turn your attention to ocean areas. You might be momentarily distracted by the glare on the water, but realizing that it comes from reflected sunlight, you perceive the ocean as continuous and flat. Ships on the open ocean stand out easily (if you have seen ships from the air, you know to look for their wakes). Near the shore the image is more confusing, but you know that ships close to shore are either moored or docked. If you have a map (Fig. 1.1b), it can help locate the docks (Fig. 1.1c); in a low-quality photograph it can help you identify the shoreline. Thus it might be a good investment of your time to establish the correspondence between the map and the image. A search parallel to the shore in the dock areas reveals several ships (Fig. 1.1d).

    Again, suppose that you are presented with a set of computer-aided tomographic (CAT) scans showing "slices" of the human abdomen (Fig. 1.2a). These images are products of high technology, and give us views not normally available even with x-rays. Your job is to reconstruct from these cross sections the three-dimensional shape of the kidneys. This job may well seem harder than finding ships. You first need to know what to look for (Fig. 1.2b), where to find it in CAT scans, and how it looks in such scans. You need to be able to "stack up" the scans mentally and form an internal model of the shape of the kidney as revealed by its slices (Fig. 1.2c and 1.2d).

    This book is about computer vision. These two example tasks are typical computer vision tasks; both were solved by computers using the sorts of knowledge and techniques alluded to in the descriptive paragraphs. Computer vision is the enterprise of automating and integrating a wide range of processes and representations used for vision perception. It includes as parts many techniques that are useful by themselves, such as image processing (transforming, encoding, and transmitting images) and statistical pattern classification (statistical decision theory applied to general patterns, visual or otherwise). More importantly for us, it includes techniques for geometric modeling and cognitive processing.

    1.2 HIGH-LEVEL AND LOW-LEVEL CAPABILITIES

    The examples of Section 1.1 illustrate vision that uses cognitive processes, geometric models, goals, and plans. These high-level processes are very important; our examples only weakly illustrate their power and scope. There surely would be some overall purpose to finding ships; there might be collateral information that there were submarines, barges, or small craft in the harbor, and so forth. CAT scans would be used with several diagnostic goals in mind and an associated medical history available. Goals and knowledge are high-level capabilities that can guide visual activities, and a visual system should be able to take advantage of them.

    Fig. 1.1 Finding ships in an aerial photograph. (a) The photograph; (b) a corresponding map; (c) the dock area of the photograph; (d) registered map and image, with ship location.


    Even such elaborated tasks are very special ones and in their way easier to think about than the commonplace visual perceptions needed to pick up a baby, cross a busy street, or arrive at a party and quickly "see" who you know, your host's taste in decor, and how long the festivities have been going on. All these tasks require judgment and large amounts of knowledge of objects in the world, how they look, and how they behave. Such high-level powers are so well integrated into "vision" as to be effectively inseparable.

    Knowledge and goals are only part of the vision story. Vision requires many low-level capabilities we often take for granted; for example, our ability to extract intrinsic images of "lightness," "color," and "range." We perceive black as black in a complex scene even when the lighting is such that some black patches are reflecting more light than some white patches. Similarly, perceived colors are not related simply to the wavelengths of reflected light; if they were, we would consciously see colors changing with illumination. Stereo fusion (stereopsis) is a low-level facility basic to short-range three-dimensional perception.

    An important low-level capability is object perception: for our purposes it does not really matter if this talent is innate ("hard-wired"), or if it is developmental or even learned ("compiled-in"). The fact remains that mature biological vision systems are specialized and tuned to deal with the relevant objects in their environments. Further specialization can often be learned, but it is built on basic immutable assumptions about the world which underlie the vision system.

    Fig. 1.2 Finding a kidney in a computer-aided tomographic scan. (a) One slice of scan data; (b) prototype kidney model; (c) model fitting; (d) resulting kidney and spinal cord instances.

    A basic sort of object recognition capability is the "figure/ground" discrimination that separates objects from the "background." Other basic organizational predispositions are revealed by the "Gestalt laws" of clustering, which demonstrate rules our vision systems use to form simple arrays of stimuli into more coherent spatial groups. A dramatic example of specialized object perception for human beings is revealed in our "face recognition" capability, which seems to occupy a large volume of brain matter. Geometric visual illusions are more surprising symptoms of nonintuitive processing that is performed by our vision systems, either for some direct purpose or as a side effect of its specialized architecture. Some other illusions clearly reflect the intervention of high-level knowledge. For instance, the familiar "Necker cube reversal" is grounded in our three-dimensional models for cubes.

    Low-level processing capabilities are elusive; they are unconscious, and they are not well connected to other systems that allow direct introspection. For instance, our visual memory for images is quite impressive, yet our quantitative verbal descriptions of images are relatively primitive. The biological visual "hardware" has been developed, honed, and specialized over a very long period. However, its organization and functionality is not well understood except at extreme levels of detail and generality: the behavior of small sets of cat or monkey cortical cells and the behavior of human beings in psychophysical experiments.

    Computer vision is thus immediately faced with a very difficult problem; it must reinvent, with general digital hardware, the most basic and yet inaccessible talents of specialized, parallel, and partly analog biological visual systems. Figure 1.3 may give a feeling for the problem; it shows two visual renditions of a familiar subject. The inset is a normal image, the rest is a plot of the intensities (gray levels) in the image against the image coordinates. In other words, it displays information with "height" instead of "light." No information is lost, and the display is an image-like object, but we do not immediately see a face in it. The initial representation the computer has to work with is no better; it is typically just an array of numbers from which human beings could extract visual information only very painfully. Skipping the low-level processing we take for granted turns normally effortless perception into a very difficult puzzle.

    Fig. 1.3 Two representations of an image. One is directly accessible to our low-level processes; the other is not.

    Computer vision is vitally concerned with both low-level or "early processing" issues and with the high-level and "cognitive" use of knowledge. Where does vision leave off and reasoning and motivation begin? We do not know precisely, but we firmly believe (and hope to show) that powerful, cooperating, rich representations of the world are needed for any advanced vision system. Without them, no system can derive relevant and invariant information from input that is beset with ever-changing lighting and viewpoint, unimportant shape differences, noise, and other large but irrelevant variations. These representations can remove some computational load by predicting or assuming structure for the visual world.

    Finally, if a system is to be successful in a variety of tasks, it needs some "meta-level" capabilities: it must be able to model and reason about its own goals and capabilities, and the success of its approaches. These complex and related models must be manipulated by cognitive-like techniques, even though introspectively the perceptual process does not always "feel" to us like cognition.


    1.3 A RANGE OF REPRESENTATIONS

    Visual perception is the relation of visual input to previously existing models of the world. There is a large representational gap between the image and the models ("ideas," "concepts") which explain, describe, or abstract the image information. To bridge that gap, computer vision systems usually have a (loosely ordered) range of representations connecting the input and the "output" (a final description, decision, or interpretation). Computer vision then involves the design of these intermediate representations and the implementation of algorithms to construct them and relate them to one another.

    We broadly categorize the representations into four parts (Fig. 1.4) which correspond with the organization of this volume. Within each part there may be several layers of representation, or several cooperating representations. Although the sets of representations are loosely ordered from "early" and "low-level" signals to "late" and "cognitive" symbols, the actual flow of effort and information between them is not unidirectional. Of course, not all levels need to be used in each computer vision application; some may be skipped, or the processing may start partway up the hierarchy or end partway down it.

    Generalized images (Part I) are iconic (image-like) and analogical representations of the input data. Images may initially arise from several technologies. Domain-independent processing can produce other iconic representations more directly useful to later processing, such as arrays of edge elements (gray-level discontinuities). Intrinsic images can sometimes be produced at this level; they reveal physical properties of the imaged scene (such as surface orientations, range, or surface reflectance). Often parallel processing can produce generalized images. More generally, most "low-level" processes can be implemented with parallel computation.

    Fig. 1.4 Examples of the four categories of representation used in computer vision. (a) Iconic; (b) segmented; (c) geometric; (d) relational.

    Segmented images (Part II) are formed from the generalized image by gathering its elements into sets likely to be associated with meaningful objects in the scene. For instance, segmenting a scene of planar polyhedra (blocks) might result in a set of edge segments corresponding to polyhedral edges, or a set of two-dimensional regions in the image corresponding to polyhedral faces. In producing the segmented image, knowledge about the particular domain at issue begins to be important both to save computation and to overcome problems of noise and inadequate data. In the planar polyhedral example, it helps to know beforehand that the line segments must be straight. Texture and motion are known to be very important in segmentation, and are currently topics of active research; knowledge in these areas is developing very fast.

    Geometric representations (Part III) are used to capture the all-important idea of two-dimensional and three-dimensional shape. Quantifying shape is as important as it is difficult. These geometric representations must be powerful enough to support complex and general processing, such as "simulation" of the effects of lighting and motion. Geometric structures are as useful for encoding previously acquired knowledge as they are for re-representing current visual input. Computer vision requires some basic mathematics; Appendix 1 has a brief selection of useful techniques.

    Relational models (Part IV) are complex assemblages of representations used to support sophisticated high-level processing. An important tool in knowledge representation is semantic nets, which can be used simply as an organizational convenience or as a formalism in their own right. High-level processing often uses prior knowledge and models acquired prior to a perceptual experience. The basic mode of processing turns from constructing representations to matching them. At high levels, propositional representations become more important. They are made up of assertions that are true or false with respect to a model, and are manipulated by rules of inference. Inference-like techniques can also be used for planning, which models situations and actions through time, and thus must reason about temporally varying and hypothetical worlds. The higher the level of representation, the more marked is the flow of control (direction of attention, allocation of effort) downward to lower levels, and the greater the tendency of algorithms to exhibit serial processing. These issues of control are basic to complex information processing in general and computer vision in particular; Appendix 2 outlines some specific control mechanisms.

    Figure 1.5 illustrates the loose classification of the four categories into analogical and propositional representations. We consider generalized and segmented images as well as geometric structures to be analogical models. Analogical models capture directly the relevant characteristics of the represented objects, and are manipulated and interrogated by simulation-like processes. Relational models are generally a mix of analogical and propositional representations. We develop this distinction in more detail in Chapter 10.

    1.4 THE ROLE OF COMPUTERS

    The computer is a congenial tool for research into visual perception.

    Computers are versatile and forgiving experimental subjects. They are easily and ethically reconfigurable, not messy, and their workings can be scrutinized in the finest detail.

    Computers are demanding critics. Imprecision, vagueness, and oversights are not tolerated in the computer implementation of a theory.

    Computers offer new metaphors for perceptual psychology (also neurology, linguistics, and philosophy). Processes and entities from computer science provide powerful and influential conceptual tools for thinking about perception and cognition.

    Computers can give precise measurements of the amount of processing they


  • Fig. 1.5 The knowledge base of a complex computer vision system, showing four basic representational categories.

  • Table 1.1

    EXAMPLES OF IMAGE ANALYSIS TASKS

    Robotics
      Objects: three-dimensional outdoor scenes, indoor scenes; mechanical parts
      Modality: light, x-rays; light, structured light
      Tasks: identify or describe objects in scene; industrial tasks
      Knowledge sources: models of objects; models of the reflection of light from objects

    Aerial images
      Objects: terrain; buildings, etc.
      Modality: light, infrared, radar
      Tasks: improved images, resource analyses, weather prediction, spying, missile guidance, tactical analysis
      Knowledge sources: maps, geometrical models of shapes, models of image formation

    Astronomy
      Objects: stars, planets
      Modality: light
      Tasks: chemical composition, improved images
      Knowledge sources: geometrical models of shapes

    Medical (macro)
      Objects: body organs
      Modality: x-rays, ultrasound, isotopes, heat
      Tasks: diagnosis of abnormalities, operative and treatment planning
      Knowledge sources: anatomical models, models of image formation

    Medical (micro)
      Objects: cells, protein chains, chromosomes
      Modality: electron microscopy, light
      Tasks: pathology, cytology, karyotyping
      Knowledge sources: models of shape

    Chemistry
      Objects: molecules
      Modality: electron densities
      Tasks: analysis of molecular compositions
      Knowledge sources: chemical models, structured models

    Neuroanatomy
      Objects: neurons
      Modality: light, electron microscopy
      Tasks: determination of spatial orientation
      Knowledge sources: neural connectivity

    Physics
      Objects: particle tracks
      Modality: light
      Tasks: find new particles, identify tracks
      Knowledge sources: atomic physics

  • do. A computer implementation places an upper limit on the amount of computation necessary for a task.

    Computers may be used either to mimic what we understand about human perceptual architecture and processes, or to strike out in different directions to try to achieve similar ends by different means.

    Computer models may be judged either by their efficacy for applications and on-the-job performance, or by their internal organization, processes, and structures: the theory they embody.

    1.5 COMPUTER VISION RESEARCH AND APPLICATIONS

    "Pure" computer vision research often deals with relatively domain-independent considerations. The results are useful in a broad range of contexts. Almost always such work is demonstrated in one or more applications areas, and more often than not an initial application problem motivates consideration of the general problem. Applications of computer vision are exciting, and their number is growing as computer vision becomes better understood. Table 1.1 gives a partial list of "classical" and current applications areas.

    Within the organization outlined above, this book presents many specific ideas and techniques with general applicability. It is meant to provide enough basic knowledge and tools to support attacks on both applications and research topics.


  • GENERALIZED IMAGES

    The first step in the vision process is image formation. Images may arise from a variety of technologies. For example, most television-based systems convert reflected light intensity into an electronic signal which is then digitized; other systems use more exotic radiations, such as x-rays, laser light, ultrasound, and heat. The net result is usually an array of samples of some kind of energy.

    The vision system may be entirely passive, taking as input a digitized image from a microwave or infrared sensor, satellite scanner, or a planetary probe, but more likely involves some kind of active imaging. Automated active imaging systems may control the direction and resolution of sensors, or regulate and direct their own light sources. The light source itself may have special properties and structure designed to reveal the nature of the three-dimensional world; an example is to use a plane of light that falls on the scene in a stripe whose structure is closely related to the structure of opaque objects. Range data for the scene may be provided by stereo (two images), but also by triangulation using light-stripe techniques or by "spot ranging" using laser light. A single hardware device may deliver range and multispectral reflectivity ("color") information. The image-forming device may also perform various other operations. For example, it may automatically smooth or enhance the image or vary its resolution.

    The generalized image is a set of related image-like entities for the scene. This set may include related images from several modalities, but may also include the results of significant processing that can extract intrinsic images. An intrinsic image is an "image," or array, of representations of an important physical quantity such as surface orientation, occluding contours, velocity, or range. Object color, which is a different entity from sensed red-green-blue wavelengths, is an intrinsic quality. These intrinsic physical qualities are extremely useful; they can be related to physical objects far more easily than the original input values, which reveal the physical parameters only indirectly. An intrinsic image is a major step toward scene understanding and usually represents significant and interesting computations.

    The information necessary to compute an intrinsic image is contained in the input image itself, and is extracted by "inverting" the transformation wrought by the imaging process, the reflection of radiation from the scene, and other physical processes. An example is the fusion of two stereo images to yield an intrinsic range image. Many algorithms to recover intrinsic images can be realized with parallel implementations, mirroring computations that may take place in the lower neurological levels of biological image processing.

    All of the computations listed above benefit from the idea of resolution pyramids. A pyramid is a generalized image data structure consisting of the same image at several successively increasing levels of resolution. As the resolution increases, more samples are required to represent the increased information and hence the successive levels are larger, making the entire structure look like a pyramid. Pyramids allow the introduction of many different coarse-to-fine image-resolution algorithms which are vastly more efficient than their single-level, high-resolution-only counterparts.
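    To make the data structure concrete, here is a minimal sketch in Python with NumPy (an illustration of ours; the book of course predates such tools, and the name build_pyramid is not from the text). It forms each coarser level by averaging non-overlapping 2x2 blocks of the level below, in the spirit of the gray-level consolidation of Section 3.7.1:

```python
import numpy as np

def build_pyramid(image, levels):
    """Return a list of images, finest first; level k is formed by
    averaging the 2x2 blocks of level k-1 (gray-level consolidation)."""
    pyramid = [image]
    for _ in range(levels - 1):
        prev = pyramid[-1]
        h, w = (prev.shape[0] // 2) * 2, (prev.shape[1] // 2) * 2
        blocks = prev[:h, :w].reshape(h // 2, 2, w // 2, 2)
        pyramid.append(blocks.mean(axis=(1, 3)))
    return pyramid

# A coarse-to-fine algorithm starts at the small apex image and then
# refines its answer at each successively finer level.
levels = build_pyramid(np.random.rand(512, 512), 4)
print([p.shape for p in levels])  # (512, 512), (256, 256), (128, 128), (64, 64)
```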


  • 2 Image Formation

    2.1 IMAGES

    Image formation occurs when a sensor registers radiation that has interacted with physical objects. Section 2.2 deals with mathematical models of images and image formation. Section 2.3 describes several specific image-formation technologies.

    The mathematical model of imaging has several different components.

    1. An image function is the fundamental abstraction of an image.
    2. A geometrical model describes how three dimensions are projected into two.
    3. A radiometrical model shows how the imaging geometry, light sources, and reflectance properties of objects affect the light measurement at the sensor.
    4. A spatial frequency model describes how spatial variations of the image may be characterized in a transform domain.
    5. A color model describes how different spectral measurements are related to image colors.
    6. A digitizing model describes the process of obtaining discrete samples.

    This material forms the basis of much image-processing work and is developed in much more detail elsewhere, e.g., [Rosenfeld and Kak 1976; Pratt 1978]. Our goals are not those of image processing, so we limit our discussion to a summary of the essentials.

    The wide range of possible sources of samples and the resulting different implications for later processing motivate our overview of specific imaging techniques. Our goal is not to provide an exhaustive catalog, but rather to give an idea of the range of techniques available. Very different analysis techniques may be needed depending on how the image was formed. Two examples illustrate this point. If the image is formed by reflected light intensity, as in a photograph, the image records both light from primary light sources and (more usually) the light reflected off physical surfaces. We show in Chapter 3 that in certain cases we can use these kinds of images together with knowledge about physics to derive the orientation of the surfaces. If, on the other hand, the image is a computed tomogram of the human body (discussed in Section 2.3.4), the image represents tissue density of internal organs. Here orientation calculations are irrelevant, but general segmentation techniques of Chapters 4 and 5 (the agglomeration of neighboring samples of similar density into units representing organs) are appropriate.

    2.2 IMAGE MODEL

    Sophisticated image models of a statistical flavor are useful in image processing [Jain 1981]. Here we are concerned with more geometrical considerations.

    2.2.1 Image Functions

    An image function is a mathematical representation of an image. Generally, an image function is a vector-valued function of a small number of arguments. A special case of the image function is the digital (discrete) image function, where the arguments to and value of the function are all integers. Different image functions may be used to represent the same image, depending on which of its characteristics are important. For instance, a camera produces an image on black-and-white film which is usually thought of as a real-valued function (whose value could be the density of the photographic negative) of two real-valued arguments, one for each of two spatial dimensions. However, at a very small scale (the order of the film grain) the negative basically has only two densities, "opaque" and "transparent."

    Most images are presented by functions of two spatial variables f(x) = f(x, y), where f(x, y) is the brightness of the gray level of the image at a spatial coordinate (x, y). A multispectral image f is a vector-valued function with components (f_1, ..., f_n). One special multispectral image is a color image in which, for example, the components measure the brightness values of each of three wavelengths, that is,

    f(\mathbf{x}) = \left( f_{\mathrm{red}}(\mathbf{x}),\ f_{\mathrm{blue}}(\mathbf{x}),\ f_{\mathrm{green}}(\mathbf{x}) \right)

    Time-varying images f(x, t) have an added temporal argument. For special three-dimensional images, x = (x, y, z). Usually, both the domain and range of f are bounded.
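    As a small concrete illustration (ours, not the book's; Python with NumPy is assumed here and in the sketches that follow), a digital color image function is simply a vector-valued function sampled on an integer grid:

```python
import numpy as np

# A digital multispectral image function: integer arguments (x, y) and
# an integer-valued result vector (f_red, f_green, f_blue).
f = np.zeros((256, 256, 3), dtype=np.uint8)

f[100, 150] = (255, 0, 0)      # a pure red sample at x = 100, y = 150
f_red = f[..., 0]              # the component image f_red(x, y)
print(f_red[100, 150])         # -> 255
```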

    An important part of the formation process is the conversion of the image representation from a continuous function to a discrete function; we need some way of describing the images as samples at discrete points. The mathematical tool we shall use is the delta function.

    Formally, the delta function may be defined by


    \delta(x) = \begin{cases} 0 & \text{when } x \neq 0 \\ \infty & \text{when } x = 0 \end{cases}, \qquad \int_{-\infty}^{\infty} \delta(x)\,dx = 1    (2.1)

    If some care is exercised, the delta function may be interpreted as the limit of a set of functions:

    \delta(x) = \lim_{n \to \infty} \delta_n(x)

    where

    \delta_n(x) = \begin{cases} n & \text{if } |x| \leq \dfrac{1}{2n} \\ 0 & \text{otherwise} \end{cases}

  • Fig. 2.1 A geometric camera model.

    The case for x' is treated similarly:

    x' = \frac{fx}{f - z}    (2.5)

    The projected image has z = 0 everywhere. However, projecting away the z-component is best considered a separate transformation; the projective transform is usually thought to distort the z-component just as it does the x and y. Perspective distortion thus maps (x, y, z) to

    (x', y', z') = \left( \frac{fx}{f - z},\ \frac{fy}{f - z},\ \frac{fz}{f - z} \right)    (2.6)

    The perspective transformation yields orthographic projection as a special case when the viewpoint is the point at infinity in the z direction. Then all objects are projected onto the viewing plane with no distortion of their x and y coordinates.

    The perspective distortion yields a three-dimensional object that has been "pushed out of shape"; it is more shrunken the farther it is from the viewpoint. The z-component is not available directly from a two-dimensional image, being identically equal to zero. In our model, however, the distorted z-component has information about the distance of imaged points from the viewpoint. When this distorted object is projected orthographically onto the image plane, the result is a perspective picture. Thus, to achieve the effect of railroad tracks appearing to come together in the distance, the perspective distortion transforms the tracks so that they do come together (at a point at infinity)! The simple orthographic projection that projects away the z component unsurprisingly preserves this distortion. Several properties of the perspective transform are of interest and are investigated further in Appendix 1.
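    Eq. (2.6) is short enough to state in code (a sketch of ours, assuming the geometry of Fig. 2.1 with the viewpoint on the z axis and scene points at z < f):

```python
import numpy as np

def perspective_distort(points, f):
    """Apply the perspective distortion of Eq. (2.6) to an (N, 3) array
    of (x, y, z) points; orthographically dropping the z' column of the
    result then yields the perspective picture."""
    x, y, z = points.T
    s = f / (f - z)                      # the common factor f / (f - z)
    return np.stack([x * s, y * s, z * s], axis=1)

# Railroad rails 2 units apart: their images draw together with depth.
rails = np.array([[-1.0, 0.0, -10.0], [1.0, 0.0, -10.0],
                  [-1.0, 0.0, -100.0], [1.0, 0.0, -100.0]])
print(perspective_distort(rails, f=1.0)[:, :2])   # x' shrinks as -z grows
```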

    Fig. 2.2 (a) Camera model equivalent to that of Fig. 2.1; (b) definition of terms.

    Binocular Imaging

    Basic binocular imaging geometry is shown in Fig. 2.3a. For simplicity, we use a system with two viewpoints. In this model the eyes do not converge; they are aimed in parallel at the point at infinity in the z direction. The depth information about a point is then encoded only by its different positions (disparity) in the two image planes.

    Fig. 2.3 A nonconvergent binocular imaging system.

    With the stereo arrangement of Fig. 2.3,

    x' = \frac{(x - d)f}{f - z}, \qquad x'' = \frac{(x + d)f}{f - z}

    for the images formed through each eye. The baseline of the binocular system is 2d. Thus

    (f - z)\,x' = (x - d)f    (2.7)
    (f - z)\,x'' = (x + d)f    (2.8)

    Subtracting (2.7) from (2.8) gives

    (f - z)(x'' - x') = 2df

    or

    z = f - \frac{2df}{x'' - x'}    (2.9)

    Thus if points can be matched to determine the disparity (x'' - x') and the baseline and focal length are known, the z coordinate is simple to calculate.
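    In code, Eq. (2.9) is a one-liner (our sketch; x_left and x_right stand for the matched coordinates x' and x'', and d is half the baseline):

```python
def depth_from_disparity(x_left, x_right, f, d):
    """Recover z via Eq. (2.9): z = f - 2df / (x'' - x')."""
    disparity = x_right - x_left
    if disparity == 0:
        raise ValueError("zero disparity: the point is at infinity")
    return f - 2.0 * d * f / disparity

# A point at x = 0, z = -9 with f = 1 and d = 1 images at x' = -0.1 and
# x'' = +0.1 by Eqs. (2.7) and (2.8), and its depth is recovered exactly:
print(depth_from_disparity(-0.1, 0.1, f=1.0, d=1.0))   # -> -9.0
```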

    If the system can converge its directions of view to a finite distance, convergence angle may also be used to compute depth. The hardest part of extracting depth information from stereo is the matching of points for disparity calculations. "Light striping" is a way to maintain geometric simplicity and also simplify matching (Section 2.3.3).

    2.2.3 Reflectance

    Terminology

    A basic aspect of the imaging process is the physics of the reflectance of objects, which determines how their "brightness" in an image depends on their inherent characteristics and the geometry of the imaging situation. A clear presentation of the mathematics of reflectance is given in [Horn and Sjoberg 1978; Horn 1977]. Light energy flux is measured in watts; "brightness" is measured with respect to area and solid angle. The radiant intensity I of a source is the exitant flux per unit solid angle:

    I = \frac{d\Phi}{d\omega} \quad \text{watts/steradian}    (2.10)


    Here dω is an incremental solid angle. The solid angle of a small area dA measured perpendicular to a radius r is given by

    d\omega = \frac{dA}{r^2}    (2.11)

    in units of steradians. (The total solid angle of a sphere is 4π.) The irradiance is flux incident on a surface element dA:

    E = \frac{d\Phi}{dA} \quad \text{watts/meter}^2    (2.12)

    and the flux exitant from the surface is defined in terms of the radiance L, which is the flux emitted per unit foreshortened surface area per unit solid angle:

    L = \frac{d^2\Phi}{dA\,\cos\theta\,d\omega} \quad \text{watts/(meter}^2\text{-steradian)}    (2.13)

    where θ is the angle between the surface normal and the direction of emission.

    Image irradiance f is the "brightness" of the image at a point, and is proportional to scene radiance. A "gray level" is a quantized measurement of image irradiance. Image irradiance depends on the reflective properties of the imaged surfaces as well as on the illumination characteristics. How a surface reflects light depends on its microstructure and physical properties. Surfaces may be matte (dull, flat), specular (mirror-like), or have more complicated reflectivity characteristics (Section 3.5.1). The reflectance r of a surface is given quite generally by its Bidirectional Reflectance Distribution Function (BRDF) [Nicodemus et al. 1977]. The BRDF is the ratio of reflected radiance in the direction towards the viewer to the irradiance in the direction towards a small area of the source.

    Effects of Geometry on an Imaging System

    Let us now analyze a simple image-forming system, shown in Fig. 2.4, with the objective of showing how the gray levels are related to the radiance of imaged objects. Following [Horn and Sjoberg 1978], assume that the imaging device is properly focused: rays originating in the infinitesimal area dA_o on the object's surface are projected into some area dA_p in the image plane, and no rays from other portions of the object's surface reach this area of the image. The system is assumed to be an ideal one, obeying the laws of simple geometrical optics.

    The energy flux per unit area that impinges on the sensor is defined to be E_p. To show how E_p is related to the scene radiance L, first consider the flux arriving at the lens from a small surface area dA_o. From (2.13) this is given as

    d\Phi = dA_o \int L \cos\theta \, d\omega    (2.14)

    This flux is assumed to arrive at an area dA_p in the imaging plane. Hence the irradiance is given by [using Eq. (2.12)]

    E_p = \frac{d\Phi}{dA_p}    (2.15)

    Now relate dA_o to dA_p by equating the respective solid angles as seen from the lens; that is [making use of Eq. (2.11)],


  • Fig. 2.4 Geometryofanimageforming system.

    dA0 cos θ / f0² = dAp cos α / fp²    (2.16)

Substituting Eqs. (2.16) and (2.14) into (2.15) gives

    Ep = cos α (f0/fp)² ∫ L dω    (2.17)

The integral is over the solid angle seen by the lens. In most instances we can assume that L is constant over this angle and hence can be removed from the integral. Finally, approximate dω by the area of the lens foreshortened by cos α, that is, (π/4)D² cos α (where D is the lens diameter), divided by the distance f0/cos α squared:

    dω = (π/4)(D²/f0²) cos³ α    (2.18)

so that finally

    Ep = L (π/4)(D/fp)² cos⁴ α    (2.19)

The interesting results here are that (1) the image irradiance is proportional to the scene radiance L, and (2) the factor of proportionality includes the fourth power of the off-axis angle α. Ideally, an imaging device should be calibrated so that the variation in sensitivity as a function of α is removed.
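This calibration can be done digitally. The following is a minimal sketch (Python with numpy) of dividing out the cos⁴ α factor of Eq. (2.19), assuming a pinhole model with the optical axis through the image center and the focal distance fp expressed in pixel units; the names are illustrative.

    import numpy as np

    def remove_cos4_falloff(image, fp):
        # cos(alpha) at a pixel of radius r from the image center is
        # fp / sqrt(fp^2 + r^2).
        h, w = image.shape
        y, x = np.mgrid[0:h, 0:w]
        r2 = (x - w / 2.0) ** 2 + (y - h / 2.0) ** 2
        cos_a = fp / np.sqrt(fp * fp + r2)
        return image / cos_a ** 4      # undo the cos^4 sensitivity variation

    flat = remove_cos4_falloff(np.ones((480, 640)), fp=500.0)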

2.2.4 Spatial Properties

The Fourier Transform

An image is a spatially varying function. One way to analyze spatial variations is the decomposition of an image function into a set of orthogonal functions, one such set being the Fourier (sinusoidal) functions. The Fourier transform may be used to transform the intensity image into the domain of spatial frequency.


For notational convenience and intuition, we shall generally use as an example the continuous one-dimensional Fourier transform. The results can readily be extended to the discrete case and also to higher dimensions [Rosenfeld and Kak 1976]. In two dimensions we shall denote transform domain coordinates by (u, v). The one-dimensional Fourier transform, denoted ℱ, is defined by ℱ[f(x)] = F(u), where

    F(u) = ∫ f(x) exp(-j2πux) dx    (2.20)

(the integral, here and below, is taken from -∞ to +∞) and j = √(-1). Intuitively, Fourier analysis expresses a function as a sum of sine waves of different frequency and phase. The Fourier transform has an inverse ℱ⁻¹[F(u)] = f(x). This inverse is given by

    f(x) = ∫ F(u) exp(j2πux) du    (2.21)

The transform has many useful properties, some of which are summarized in Table 2.1. Common one-dimensional Fourier transform pairs are shown in Table 2.2.

The transform F(u) is simply another representation of the image function. Its meaning can be understood by interpreting Eq. (2.21) for a specific value of x, say x0:

    f(x0) = ∫ F(u) exp(j2πux0) du    (2.22)

This equation states that a particular point in the image can be represented by a weighted sum of complex exponentials (sinusoidal patterns) at different spatial frequencies u. F(u) is thus a weighting function for the different frequencies. Low spatial frequencies account for the "slowly" varying gray levels in an image, such as the variation of intensity over a continuous surface. High-frequency components are associated with "quickly varying" information, such as edges. Figure 2.5 shows the Fourier transform of an image of rectangles, together with the effects of removing low- and high-frequency components.

The Fourier transform is defined above to be a continuous transform. Although it may be performed instantly by optics, a discrete version of it, the "fast Fourier transform" (FFT), is almost universally used in image processing and computer vision. This is because of the relative versatility of manipulating the transform in the digital domain as compared to the optical domain. Image processing texts, e.g., [Pratt 1978; Gonzalez and Wintz 1977], discuss the FFT in some detail; we content ourselves with an algorithm for it (Appendix 1).
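As a small illustration of the discrete transform in the digital domain, the sketch below uses numpy's FFT routines rather than the book's Appendix 1 algorithm; the signal is an arbitrary example.

    import numpy as np

    f = np.array([1.0, 2.0, 1.0, 0.0, 0.0, 0.0, 1.0, 2.0])  # a 1-D "image"
    F = np.fft.fft(f)                  # forward transform, F(u)
    f_back = np.fft.ifft(F)            # inverse transform recovers f(x)
    assert np.allclose(f, f_back.real)

    # Low spatial frequencies lie near u = 0; np.fft.fftshift recenters
    # the spectrum so that u = 0 sits in the middle for display.
    print(np.abs(np.fft.fftshift(F)))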

The Convolution Theorem

Convolution is a very important image-processing operation, and is a basic operation of linear systems theory. The convolution of two functions f and g is a function h of a displacement y defined as

    h(y) = f*g = ∫ f(x) g(y - x) dx    (2.23)

Table 2.1

PROPERTIES OF THE FOURIER TRANSFORM

    Spatial Domain                                Frequency Domain
    f(x)                                          F(u) = ℱ[f(x)]
    g(x)                                          G(u) = ℱ[g(x)]

    (1) Linearity
        c1 f(x) + c2 g(x)  (c1, c2 scalars)       c1 F(u) + c2 G(u)
    (2) Scaling
        f(ax)                                     (1/|a|) F(u/a)
    (3) Shifting
        f(x - x0)                                 exp(-j2πux0) F(u)
    (4) Symmetry
        F(x)                                      f(-u)
    (5) Conjugation
        f*(x)                                     F*(-u)
    (6) Convolution
        h(x) = f*g = ∫ f(x') g(x - x') dx'        F(u) G(u)
    (7) Differentiation
        dⁿf(x)/dxⁿ                                (j2πu)ⁿ F(u)

    Parseval's theorem:
        ∫ |f(x)|² dx = ∫ |F(u)|² du
        ∫ f(x) g*(x) dx = ∫ F(u) G*(u) du

    Symmetry properties:
        f(x)                          F(u)
        Real (R)                      Real part even (RE), imaginary part odd (IO)
        Imaginary (I)                 RO, IE
        RE, IO                        Real
        RO, IE                        Imaginary
        Real and even (RE)            RE
        Real and odd (RO)             IO
        Imaginary and even (IE)       IE
        Imaginary and odd (IO)        RO
        Complex even (CE)             CE
        Complex odd (CO)              CO


Table 2.2

FOURIER TRANSFORM PAIRS

    f(x)                                    F(u)

    Rectangle function Rect(x)              Sinc(u) = sin(πu)/(πu)
    Triangle function                       Sinc²(u)
    Gaussian                                Gaussian
    Unit impulse δ(x)                       1
    Unit step                               (1/2)δ(u) + 1/(j2πu)
    Comb function Σₙ δ(x - nx0)             (1/x0) Σₙ δ(u - n/x0)
    cos 2πω0x                               (1/2)[δ(u - ω0) + δ(u + ω0)]
    sin 2πω0x                               (1/2j)[δ(u - ω0) - δ(u + ω0)]  (imaginary)

Intuitively, one function is "swept past" (in one dimension) or "rubbed over" (in two dimensions) the other. The value of the convolution at any displacement is the integral of the product of the (relatively displaced) function values. One common phenomenon that is well expressed by a convolution is the formation of an image by an optical system. The system (say a camera) has a "point-spread function," which is the image of a single point. (In linear systems theory, this is the "impulse response," or response to a delta-function input.) The ideal point-spread function is, of course, a point. A typical point-spread function is a two-dimensional Gaussian spatial distribution of intensities, but may include such phenomena as diffraction rings.

Fig. 2.5 (on facing page) (a) An image, f(x, y). (b) A rotated version of (a), filtered to enhance high spatial frequencies. (c) Similar to (b), but filtered to enhance low spatial frequencies. (d), (e), and (f) show the logarithm of the power spectrum of (a), (b), and (c). The power spectrum is the log-square modulus of the Fourier transform F(u, v). Considered in polar coordinates (ρ, θ), points of small ρ correspond to low spatial frequencies ("slowly varying" intensities), large ρ to high spatial frequencies contributed by "fast" variations such as step edges. The power at (ρ, θ) is determined by the amount of intensity variation at the frequency ρ occurring at the angle θ.



In any event, if the camera is modeled as a linear system (ignoring the added complexity that the point-spread function usually varies over the field of view), the image is the convolution of the point-spread function and the input signal. The point-spread function is rubbed over the perfect input image, thus blurring it.

Convolution is also a good model for the application of many other linear operators, such as line-detecting templates. It can be used in another guise (called correlation) to perform matching operations (Chapter 3) which detect instances of subimages or features in an image.

In the spatial domain, the obvious implementation of the convolution operation involves a shift-multiply-integrate operation which is hard to do efficiently. However, multiplication and convolution are "transform pairs," so that the calculation of the convolution in one domain (say the spatial) is simplified by first Fourier transforming to the other (the frequency) domain, performing a multiplication, and then transforming back.

The convolution of f and g in the spatial domain is equivalent to the pointwise product of F and G in the frequency domain,

    ℱ(f*g) = FG    (2.24)

We shall show this in a manner similar to [Duda and Hart 1973]. First we prove the shift theorem. If the Fourier transform of f(x) is F(u), defined as

    F(u) = ∫ f(x) exp(-j2πux) dx    (2.25)

then

    ℱ[f(x - a)] = ∫ f(x - a) exp(-j2πux) dx    (2.26)

Changing variables so that x' = x - a and dx = dx',

    ℱ[f(x - a)] = ∫ f(x') exp{-j2π[u(x' + a)]} dx'    (2.27)

Now exp[-j2πu(x' + a)] = exp(-j2πua) exp(-j2πux'), where the first term is a constant. This means that

    ℱ[f(x - a)] = exp(-j2πua) F(u)    (shift theorem)

Now we are ready to show that ℱ[f(x)*g(x)] = F(u)G(u):

    ℱ(f*g) = ∫∫ f(x) g(y - x) exp(-j2πuy) dx dy    (2.28)

           = ∫ f(x) { ∫ g(y - x) exp(-j2πuy) dy } dx    (2.29)

Recognizing that the term in braces represents ℱ[g(y - x)] and applying the shift theorem, we obtain

    ℱ(f*g) = ∫ f(x) exp(-j2πux) G(u) dx    (2.30)

           = F(u)G(u)    (2.31)
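The theorem is easy to verify numerically on finite arrays. In the sketch below (Python with numpy, arbitrary example data), zero-padding both arrays to length len(f) + len(g) - 1 makes the FFT's cyclic convolution agree with ordinary linear convolution.

    import numpy as np

    f = np.array([1.0, 2.0, 3.0])
    g = np.array([0.5, 1.0, 0.5, 0.25])
    n = len(f) + len(g) - 1

    direct = np.convolve(f, g)           # shift-multiply-integrate, discretized
    via_fft = np.fft.ifft(np.fft.fft(f, n) * np.fft.fft(g, n)).real
    assert np.allclose(direct, via_fft)  # Eq. (2.24) holds numerically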


2.2.5 Color

Not all images are monochromatic; in fact, applications using multispectral images are becoming increasingly common (Section 2.3.2). Further, human beings intuitively feel that color is an important part of their visual experience, and is useful or even necessary for powerful visual processing in the real world. Color vision provides a host of research issues, both for psychology and computer vision. We briefly discuss two aspects of color vision: color spaces and color perception. Several models of the human visual system not only include color but have proven useful in applications [Granrath 1981].

Color Spaces

Color spaces are a way of organizing the colors perceived by human beings. It happens that weighted combinations of stimuli at three principal wavelengths are sufficient to define almost all the colors we perceive. These wavelengths form a natural basis or coordinate system from which the color measurement process can be described. Color perception is not related in a simple way to color measurement, however.

Color is a perceptual phenomenon related to human response to different wavelengths in the visible electromagnetic spectrum [400 (blue) to 700 nanometers (red); a nanometer (nm) is 10⁻⁹ meter]. The sensation of color arises from the sensitivities of three types of neurochemical sensors in the retina to the visible spectrum. The relative response of these sensors is shown in Fig. 2.6. Note that each sensor responds to a range of wavelengths. The illumination source has its own spectral composition f(λ), which is modified by the reflecting surface. Let r(λ) be this reflectance function. Then the measurement R produced by the "red" sensor is given by

    R = ∫ f(λ) r(λ) hR(λ) dλ    (2.32)

So the sensor output is actually the integral of three different wavelength-dependent components: the source f, the surface reflectance r, and the sensor hR.

Surprisingly, only weighted combinations of three delta-function approximations to the different h(λ), that is, δ(λ - λR), δ(λ - λG), and δ(λ - λB), are necessary to produce the sensation of nearly all the colors.

Fig. 2.6 Spectral response of human color sensors.


This result is displayed on a chromaticity diagram. Such a diagram is obtained by first normalizing the three sensor measurements:

    r = R/(R + G + B)
    g = G/(R + G + B)    (2.33)
    b = B/(R + G + B)

and then plotting perceived color as a function of any two (usually red and green). Chromaticity explicitly ignores intensity or brightness; it is a section through the three-dimensional color space (Fig. 2.7). The choice of (λR, λG, λB) = (410, 530, 650) nm maximizes the realizable colors, but some colors still cannot be realized since they would require negative values for some of r, g, and b.

Another, more intuitive way of visualizing the possible colors from the RGB space is to view these measurements as Euclidean coordinates. Here any color can be visualized as a point in the unit cube. Other coordinate systems are useful for different applications; computer graphics has proved a strong stimulus for investigation of different color-space bases.

Color Perception

Color perception is complex, but the essential step is a transformation of three input intensity measurements into another basis.

Fig. 2.7 (a) An artist's conception of the chromaticity diagram (see color insert); (b) a more useful depiction. Spectral colors range along the curved boundary; the straight boundary is the line of purples.


The coordinates of the new basis are more directly related to human color judgments. Although the RGB basis is good for the acquisition or display of color information, it is not a particularly good basis to explain the perception of colors. Human vision systems can make good judgments about the relative surface reflectance r(λ) despite different illuminating wavelengths; this reflectance seems to be what we mean by surface color.

Another important feature of the color basis is revealed by an ability to perceive in "black and white," effectively deriving intensity information from the color measurements. From an evolutionary point of view, we might expect that color perception in animals would be compatible with preexisting noncolor perceptual mechanisms.

These two needs, the need to make good color judgments and the need to retain and use intensity information, imply that we use a transformed, non-RGB basis for color space. Of the different bases in use for color vision, all are variations on this theme: intensity forms one dimension and color is a two-dimensional subspace. The differences arise in how the color subspace is described. We categorize such bases into two groups.

1. Intensity/Saturation/Hue (IHS). In this basis, we compute intensity as

    intensity := R + G + B    (2.34)

The saturation measures the lack of whiteness in the color. Colors such as "fire-engine" red and "grass" green are saturated; pastels (e.g., pinks and pale blues) are desaturated. Saturation can be computed from RGB coordinates by the formula [Tenenbaum and Weyl 1975]

    saturation := 1 - 3 min(R, G, B) / intensity    (2.35)

Hue is roughly proportional to the average wavelength of the color. It can be defined using RGB by the following program fragment:

    hue := cos⁻¹ { (1/2)[(R - G) + (R - B)] / √[(R - G)² + (R - B)(G - B)] }    (2.36)
    if B > G then hue := 2π - hue

The IHS basis transforms the RGB basis in the following way. Thinking of the color cube, the diagonal from the origin to (1, 1, 1) becomes the intensity axis. Saturation is the distance of a point from that axis, and hue is the angle about that axis from some reference (Fig. 2.8).

This basis is essentially that used by artists [Munsell 1939], who term saturation chroma. Also, this basis has been used in graphics [Smith 1978; Joblove and Greenberg 1978].

One problem with the IHS basis, particularly as defined by (2.34) through (2.36), is that it contains essential singularities where it is impossible to define the color in a consistent manner [Kender 1976]. For example, hue has an essential singularity at all values of (R, G, B) where R = G = B. This means that special care must be taken in algorithms that use hue.
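A minimal sketch of Eqs. (2.34) through (2.36) in Python, with explicit guards at the two places the text warns about: zero intensity (black) and the gray axis R = G = B, where hue is undefined. The tolerance eps and the convention of returning None for an undefined hue are arbitrary choices.

    import math

    def rgb_to_ihs(R, G, B, eps=1e-12):
        intensity = R + G + B                                # Eq. (2.34)
        if intensity < eps:
            return 0.0, 0.0, None                            # black: saturation, hue undefined
        saturation = 1.0 - 3.0 * min(R, G, B) / intensity    # Eq. (2.35)
        d = math.sqrt((R - G) ** 2 + (R - B) * (G - B))
        if d < eps:
            return intensity, saturation, None               # gray axis: hue undefined
        hue = math.acos(0.5 * ((R - G) + (R - B)) / d)       # Eq. (2.36)
        if B > G:
            hue = 2.0 * math.pi - hue
        return intensity, saturation, hue

    print(rgb_to_ihs(1.0, 0.0, 0.0))   # saturated red: hue = 0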

2. Opponent processes.


Fig. 2.8 An IHS color space. (a) Cross-section at one intensity; (b) cross-section at one hue (see color inserts).

The opponent-process basis uses Cartesian rather than cylindrical coordinates for the color subspace, and was first proposed by Hering [Teevan and Birney 1961]. The simplest form of basis is a linear transformation from R, G, B coordinates. The new coordinates are termed "R - G", "Bl - Y", and "W - Bk":

    [ R - G  ]   [  1  -2   1 ] [ R ]
    [ Bl - Y ] = [ -1  -1   2 ] [ G ]
    [ W - Bk ]   [  1   1   1 ] [ B ]

The advocates of this representation, such as [Hurvich and Jameson 1957], theorize that this basis has neurological correlates and is in fact the way human beings represent ("name") colors. For example, in this basis it makes sense to talk about a "reddish blue" but not a "reddish green." Practical opponent-process models usually have more complex weights in the transform matrix to account for psychophysical data. Some startling experiments [Land 1977] show our ability to make correct color judgments even when the illumination consists of only two principal wavelengths. The opponent process, at the level at which we have developed it, does not demonstrate how such judgments are made, but does show how stimulus at only two wavelengths will project into the color subspace. Readers interested in the details of the theory should consult the references.

Commercial television transmission needs an intensity, or "W - Bk", component for black-and-white television sets while still spanning the color space. The National Television Systems Committee (NTSC) uses a "YIQ" basis extracted from RGB via

    [ I ]   [ 0.60  -0.28  -0.32 ] [ R ]
    [ Q ] = [ 0.21  -0.52   0.31 ] [ G ]
    [ Y ]   [ 0.30   0.59   0.11 ] [ B ]

This basis is a weighted form of (I, Q, Y) = ("R - cyan," "magenta - green," "W - Bk").
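Both opponent-style transforms are simple matrix multiplications. The sketch below applies the YIQ matrix as reconstructed above (Python with numpy); the sign pattern follows the standard NTSC primaries and should be checked against a current reference before serious use.

    import numpy as np

    RGB_TO_IQY = np.array([[0.60, -0.28, -0.32],
                           [0.21, -0.52,  0.31],
                           [0.30,  0.59,  0.11]])

    def rgb_to_iqy(rgb):
        return RGB_TO_IQY @ np.asarray(rgb, dtype=float)

    # A gray input carries no chromatic (I, Q) signal, only intensity Y.
    print(rgb_to_iqy([1.0, 1.0, 1.0]))   # -> [0., 0., 1.]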

    2.2.6 Digital Images

The digital images with which computer vision deals are represented by m-vector discrete-valued image functions f(x), usually of one, two, three, or four dimensions.

Usually m = 1, and both the domain and range of f(x) are discrete. The domain of f is finite, usually a rectangle, and the range of f is positive and bounded: 0 ≤ f(x) ≤ M for some maximum value M (with m-bit samples, M = 2ᵐ - 1).


Fig. 2.9 Using different numbers of samples. (a) N = 16; (b) N = 32; (c) N = 64; (d) N = 128; (e) N = 256; (f) N = 512.


Table 2.3

NUMBER OF 8-BIT BYTES OF STORAGE FOR VARIOUS VALUES OF N AND m

    m \ N      32      64     128      256      512
    1         128     512   2,048    8,192   32,768
    2         256   1,024   4,096   16,384   65,536
    3         512   2,048   8,192   32,768  131,072
    4         512   2,048   8,192   32,768  131,072
    5       1,024   4,096  16,384   65,536  262,144
    6       1,024   4,096  16,384   65,536  262,144
    7       1,024   4,096  16,384   65,536  262,144
    8       1,024   4,096  16,384   65,536  262,144

It is common in computer vision to have to balance the desire for increased resolution (both gray-scale and spatial) against its cost. Better data can often make algorithms easier to write, but a small amount of data can make processing more efficient. Of course, the image domain, choice of algorithms, and image characteristics all heavily influence the choice of resolutions.
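The figures in Table 2.3 follow from a simple packing rule. This sketch (Python) assumes each m-bit pixel is stored in the smallest 1-, 2-, 4-, or 8-bit field that holds it, which is what makes m = 3 cost the same as m = 4 in the table.

    def image_bytes(N, m):
        field = next(b for b in (1, 2, 4, 8) if b >= m)   # bits actually stored
        return N * N * field // 8

    for m in (1, 3, 8):
        print(m, [image_bytes(N, m) for N in (32, 64, 128, 256, 512)])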

Tesselations and Distance Metrics

Although the spatial samples for f(x) can be represented as points, it is more satisfying to the intuition, and a closer approximation to the acquisition process, to think of these samples as finite-sized cells of constant gray level partitioning the image. These cells are termed pixels, an acronym for picture elements. The pattern into which the plane is divided is called its tesselation. The most common regular tesselations of the plane are shown in Fig. 2.11.

Although rectangular tesselations are almost universally used in computer vision, they have a structural problem known as the "connectivity paradox." Given a pixel in a rectangular tesselation, how should we define the pixels to which it is connected? Two common ways are four-connectivity and eight-connectivity, shown in Fig. 2.12.

However, each of these schemes has complications. Consider Fig. 2.12c, consisting of a black object with a hole on a white background. If we use four-connectedness, the figure consists of four disconnected pieces, yet the hole is separated from the "outside" background. Alternatively, if we use eight-connectedness, the figure is one connected piece, yet the hole is now connected to the outside. This paradox poses complications for many geometric algorithms. Triangular and hexagonal tesselations do not suffer from connectivity difficulties (if we use three-connectedness for triangles); however, distance can be more difficult to compute on these arrays than for rectangular arrays.

The distance between two pixels in an image is an important measure that is fundamental to many algorithms. In general, a distance d is a metric. That is,


Fig. 2.10 Using different numbers of bits per sample. (a) m = 1; (b) m = 2; (c) m = 4; (d) m = 8.

    (1) d(x, y) = 0 iff x = y
    (2) d(x, y) = d(y, x)
    (3) d(x, y) + d(y, z) ≥ d(x, z)

For square arrays with unit spacing between pixels, we can use any of the following common distance metrics (Fig. 2.13) for two pixels x = (x1, y1) and y = (x2, y2):

Euclidean:

    de(x, y) = √[(x1 - x2)² + (y1 - y2)²]    (2.37)

City block:

    dcb(x, y) = |x1 - x2| + |y1 - y2|    (2.38)



Fig. 2.11 Different tesselations of the image plane. (a) Rectangular; (b) triangular; (c) hexagonal.

Chessboard:

    dch(x, y) = max(|x1 - x2|, |y1 - y2|)    (2.39)

Other definitions are possible, and all such measures extend to multiple dimensions. The tesselation of higher-dimensional space into pixels usually is confined to (n-dimensional) cubical pixels.
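Minimal Python implementations of Eqs. (2.37) through (2.39), for pixels given as (x, y) pairs on a unit-spaced square array:

    import math

    def d_euclidean(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])      # Eq. (2.37)

    def d_city_block(p, q):
        return abs(p[0] - q[0]) + abs(p[1] - q[1])       # Eq. (2.38)

    def d_chessboard(p, q):
        return max(abs(p[0] - q[0]), abs(p[1] - q[1]))   # Eq. (2.39)

    p, q = (0, 0), (3, 4)
    print(d_euclidean(p, q), d_city_block(p, q), d_chessboard(p, q))  # 5.0 7 4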

The Sampling Theorem

Consider the one-dimensional "image" shown in Fig. 2.14. To digitize this image one must sample the image function. These samples will usually be separated at regular intervals, as shown. How far apart should these samples be to allow reconstruction (to a given accuracy) of the underlying continuous image from its samples? This question is answered by the Shannon sampling theorem. An excellent rigorous presentation of the sampling theorem may be found in [Rosenfeld and Kak 1976]. Here we shall present a shorter graphical interpretation using the results of Table 2.2. For simplicity we consider the image to be periodic in order to avoid small edge effects introduced by the finite image domain.



Fig. 2.12 Connectivity paradox for rectangular tesselations. (a) A central pixel and its 4-connected neighbors; (b) a pixel and its 8-connected neighbors; (c) a figure with ambiguous connectivity.


Fig. 2.13 Equidistant contours for different metrics.

Fig. 2.14 One-dimensional image and its samples.

A more rigorous treatment, which considers these effects, is given in [Andrews and Hunt 1977].

Suppose that the image is sampled with a "comb" function of spacing x0 (see Table 2.2). Then the sampled image can be modeled by

    fs(x) = f(x) Σₙ δ(x - nx0)    (2.40)

where the image function modulates the comb function. Equivalently, this can be written as

    fs(x) = Σₙ f(nx0) δ(x - nx0)    (2.41)

The right-hand side of Eq. (2.40) is the product of two functions, so that property


(6) in Table 2.1 is appropriate. The Fourier transform of fs(x) is equal to the convolution of the transforms of each of the two functions. Using this result and the comb-function transform pair of Table 2.2 yields

    Fs(u) = F(u) * (1/x0) Σₙ δ(u - n/x0)    (2.42)

But from Eq. (2.23),

    F(u) * δ(u - n/x0) = F(u - n/x0)    (2.43)

so that

    Fs(u) = (1/x0) Σₙ F(u - n/x0)    (2.44)

Therefore, sampling the image function f(x) at intervals of x0 is equivalent in the frequency domain to replicating the transform of f at intervals of 1/x0. This limits the recovery of f(x) from its sampled representation, fs(x). There are two basic situations to consider. If the transform of f(x) is band-limited such that F(u) = 0 for |u| > 1/(2x0), then there is no overlap between successive replications of F(u) in the frequency domain. This is shown for the case of Fig. 2.15a, where we have arbitrarily used a triangular-shaped image transform to illustrate the effects of sampling. Incidentally, note that for this transform F(u) = F(-u) and that it has no imaginary part; from Table 2.2, the one-dimensional image must also be real and even. Now if F(u) is not band-limited, i.e., there are |u| > 1/(2x0) for which F(u) ≠ 0, then components of different replications of F(u) will interact to produce the composite function Fs(u), as shown in Fig. 2.15b. In the first case f(x) can be recovered from Fs(u) by multiplying Fs(u) by a suitable G(u):

    G(u) = x0  for |u| ≤ 1/(2x0);  0 otherwise


Fig. 2.15 (a) F(u) band-limited so that F(u) = 0 for |u| > 1/(2x0); (b) F(u) not band-limited as in (a); (c) reconstructed transform.

Low-pass filtering of this kind can be applied to the image before it is sampled, or the sampling process itself can incorporate such blurring (by integrating the image intensity over a finite area for each sample). Image blurring can bury irrelevant details, reduce certain forms of noise, and also reduce the effects of aliasing.
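Aliasing is easy to demonstrate numerically. In this sketch (Python with numpy, arbitrary parameters), a cosine at frequency w is sampled at spacing x0, and the peak of the discrete spectrum shows the apparent frequency; above the 1/(2x0) limit the peak appears at the wrong, aliased frequency.

    import numpy as np

    def apparent_frequency(w, x0, n=256):
        x = np.arange(n) * x0
        F = np.fft.rfft(np.cos(2 * np.pi * w * x))
        u = np.fft.rfftfreq(n, d=x0)
        return u[np.argmax(np.abs(F))]    # strongest spectral component

    print(apparent_frequency(w=10.0, x0=1.0 / 64))   # well sampled: ~10
    print(apparent_frequency(w=50.0, x0=1.0 / 64))   # undersampled: aliases to ~14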

2.3 IMAGING DEVICES FOR COMPUTER VISION

There is a vast array of methods for obtaining a digital image in a computer. In this section we have in mind only "traditional" images produced by various forms of radiation impinging on a sensor after having been affected by physical objects.

Many sensors are best modeled as an analog device whose response must be digitized for computer representation. The types of imaging devices possible are limited only by the technical ingenuity of their developers.



Fig. 2.16 Imaging devices (boxes), information structures (rectangles), and processes (circles).

Attempting a definitive taxonomy is probably unwise. Figure 2.16 is a flowchart of devices, information structures, and processes addressed in this and succeeding sections.

When the image already exists in some form, or physical considerations limit the choice of imaging technology, the choice of digitizing technology may still be open. Most images are carried on a permanent medium, such as film, or at least are available in (essentially) analog form to a digitizing device. Generally, the relevant technical characteristics of imaging or digitizing devices should be foremost in mind when a technique is being selected. Such considerations as the signal-to-noise ratio of the device, its resolution, the speed at which it works, and its expense are important issues.


2.3.1 Photographic Imaging

The camera is the most familiar producer of optical images on a permanent medium. We shall not address here the multitudes of still and movie camera options; rather, we briefly treat the characteristics of photographic film and of the digitizing devices that convert the image to machine-readable form. More on these topics is well presented in the References.

Photographic (black-and-white) film consists of an emulsion of silver halide crystals on a film base. (Several other layers are identifiable, but are not essential to an understanding of the relevant properties of film.) Upon exposure to light, the silver halide crystals form development centers, which are small grains of metallic silver. The photographic development process extends the formation of metallic silver to the entire silver halide crystal, which thus becomes a binary ("light" or "no light") detector. Subsequent processing removes undeveloped silver halide. The resulting film negative is dark where many crystals were developed and light where few were. The resolution of the film is determined by the grain size, which depends on the original halide crystals and on development techniques. Generally, the faster the film (the less light needed to expose it), the coarser the grain. Film exists that is sensitive to infrared radiation; x-ray film typically has two emulsion layers, giving it more gray-level range than that of normal film.

A repetition of the negative-forming process is used to obtain a photographic print. The negative is projected onto photographic paper, which responds roughly in the same way as the negative. Most photographic print paper cannot capture in one print the range of densities that can be present in a negative. Positive films do exist that do not require printing; the most common example is color slide film.

The response of film to light is not completely linear. The photographic density obtained by a negative is defined as the logarithm (base 10) of the ratio of incident light to transmitted light:

    D = log10 (incident light / transmitted light)

The exposure of a negative dictates (approximately) its response. Exposure is defined as the energy per unit area that exposed the film (in its sensitive spectral range). Thus exposure is the product of the intensity and the time of exposure. This mathematical model of the behavior of the photographic exposure process is correct for a wide operating range of the film, but reciprocity failure effects in the film keep one from being able always to trade light level for exposure time. At very low light levels, longer exposure times are needed than are predicted by the product rule.

The response of film to light is usually plotted in an "H & D curve" (named for Hurter and Driffield), which plots density versus exposure. The H & D curve of film displays many of its important characteristics. Figure 2.17 exhibits a typical H & D curve for a black-and-white film.

The toe of the curve is the lower region of low slope. It expresses reciprocity failure and the fact that the film has a certain bias, or fog response, which dominates its behavior at the lowest exposure levels. As one would expect, there is an upper limit to the density of the film, attained when a maximum number of silver halide crystals are rendered developable.


Fig. 2.17 Typical H & D curve (density versus log exposure).

Increasing exposure beyond this maximum level has little effect, accounting for the shoulder in the H & D curve, or its flattened upper end.

In between the toe and shoulder, there is typically a linear operating region of the curve. High-contrast films are those with high slope (traditionally called gamma); they respond dramatically to small changes in exposure. A high-contrast film may have a gamma between about 1.5 and 10. Films with gammas of approximately 10 are used in the graphic arts to copy line drawings. General-purpose films have gammas of about 0.5 to 1.0.

The resolution of a general film is about 40 lines/mm, which means that a 1400 × 1400 image may be digitized from a 35-mm slide. At any greater sampling frequency, the individual film grains will occupy more than a pixel, and the resolution will thus be grain-limited.

Image Digitizers (Scanners)

Accuracy and speed are the main considerations in converting an image on film into digital form. Accuracy has two aspects: spatial resolution, loosely the level of image spatial detail to which the digitizer can respond, and gray-level resolution, defined generally as the range of densities or reflectances to which the digitizer responds and how finely it divides the range. Speed is also important because usually many data are involved; images of 1 million samples are commonplace.

Digitizers broadly take two forms: mechanical and "flying spot." In a mechanical digitizer, the film and a sensing assembly are mechanically transported past one another while readings are made. In a flying-spot digitizer, the film and sensor are static. What moves is the "flying spot," which is a point of light on the face of a cathode-ray tube, or a laser beam directed by mirrors. In all digitizers a very narrow beam of light is directed through the film or onto the print at a known coordinate point. The light transmittance or reflectance is measured, transformed from analog to digital form, and made available to the computer through interfacing electronics. The location on the medium where density is being measured may also be transmitted with each reading, but it is usually determined by relative offset from positions transmitted less frequently. For example, a "new scan line" impulse is transmitted for TV output; the position along the current scan line yields an x position, and the number of scan lines yields a y position.



The mechanical scanners are mostly of two types, flatbed and drum. In a flatbed digitizer, the film is laid flat on a surface over which the light source and the sensor (usually a very accurate photoelectric cell) are transported in a raster fashion. In a drum digitizer, the film is fastened to a circular drum which revolves as the sensor and light source are transported down the drum parallel to its axis of rotation.

Color mechanical digitizers also exist; they work by using colored filters, effectively extracting in three scans three "color overlays" which when superimposed would yield the original color image. Extracting some "composite" color signal with one reading presents technical problems and would be difficult to do as accurately.

Satellite Imagery

LANDSAT and ERTS (Earth Resources Technology Satellites) have similar scanners which produce images of 2340 × 3380 7-bit pixels in four spectral bands, covering an area of 100 × 100 nautical miles. The scanner is mechanical, scanning six horizontal scan lines at a time; the rotation of the earth accounts for the advancement of the scan in the vertical direction.

A set of four images is shown in Fig. 2.18. The four spectral bands are numbered 4, 5, 6, and 7. Band 4 [0.5 to 0.6 μm (green)] accentuates sediment-laden water and shallow water; band 5 [0.6 to 0.7 μm (red)] emphasizes cultural features such as roads and cities; band 6 [0.7 to 0.8 μm (near infrared)] emphasizes vegetation and accentuates the contrast between land and water; band 7 [0.8 to 1.1 μm (near infrared)] is like band 6 except that it is better at penetrating atmospheric haze.

The LANDSAT images are available at nominal cost from the U.S. government (The EROS Data Center, Sioux Falls, South Dakota 57198). They are furnished on tape, and cover the entire surface of the earth (often the buyer has a choice of the amount of cloud cover). These images form a huge database of multispectral imagery, useful for land-use and geological studies; they furnish something of an image-analysis challenge, since one satellite can produce some 6 billion bits of image data per day.

Television Imaging

Television cameras are appealing devices for computer vision applications for several reasons. For one thing, the image is immediate; the camera can show events as they happen. For another, the image is already in electrical, if not digital, form. "Television camera" is basically a nontechnical term, because many different technologies produce video signals conforming to the standards set by the FCC and NTSC. Cameras exist with a wide variety of technical specifications.

Usually, TV cameras have associated electronics which scan an entire "picture" at a time. This operation is closely related to broadcast and receiver standards, and is more oriented to human viewing than to computer vision. An entire image (of some 525 scan lines in the United States) is called a frame, and consists of two fields, each made up of alternate scan lines from the frame. These fields are generated and transmitted sequentially by the camera electronics. The transmitted image is thus interlaced, with all odd-numbered scan lines being "painted" on the


Fig. 2.18 The stra