Ballard, D. and Brown, C. M. 1982. Computer Vision.
TRANSCRIPT
COMPUTER VISION

Dana H. Ballard
Christopher M. Brown

Department of Computer Science, University of Rochester, Rochester, New York

PRENTICE-HALL, INC., Englewood Cliffs, New Jersey 07632
Library of Congress Cataloging in Publication Data

BALLARD, DANA HARRY.
  Computer vision.
  Bibliography: p.
  Includes index.
  1. Image processing. I. Brown, Christopher M. II. Title.
TA1632.B34  621.38'0414  81-20974  ISBN 0-13-165316-4  AACR2

Cover design by Robin Breite

© 1982 by Prentice-Hall, Inc., Englewood Cliffs, New Jersey 07632

All rights reserved. No part of this book may be reproduced in any form or by any means without permission in writing from the publisher.

Printed in the United States of America

10 9 8 7 6 5 4 3 2

ISBN 0-13-165316-4

PRENTICE-HALL INTERNATIONAL, INC., London
PRENTICE-HALL OF AUSTRALIA PTY. LIMITED, Sydney
PRENTICE-HALL OF CANADA, LTD., Toronto
PRENTICE-HALL OF INDIA PRIVATE LIMITED, New Delhi
PRENTICE-HALL OF JAPAN, INC., Tokyo
PRENTICE-HALL OF SOUTHEAST ASIA PTE. LTD., Singapore
WHITEHALL BOOKS LIMITED, Wellington, New Zealand
Preface xiii

Acknowledgments xv

Mnemonics for Proceedings and Special Collections Cited in References xix

1 COMPUTER VISION
1.1 Achieving Simple Vision Goals 1
1.2 High-Level and Low-Level Capabilities 2
1.3 A Range of Representations 6
1.4 The Role of Computers 9
1.5 Computer Vision Research and Applications 12
Part I GENERALIZED IMAGES 13

2 IMAGE FORMATION
2.1 Images 17
2.2 Image Model 18
  2.2.1 Image Functions, 18
  2.2.2 Imaging Geometry, 19
  2.2.3 Reflectance, 22
  2.2.4 Spatial Properties, 24
  2.2.5 Color, 31
  2.2.6 Digital Images, 35
2.3 Imaging Devices for Computer Vision 42
  2.3.1 Photographic Imaging, 44
  2.3.2 Sensing Range, 52
  2.3.3 Reconstruction Imaging, 56
3 EARLY PROCESSING
3.1 Recovering Intrinsic Structure 63
3.2 Filtering the Image 65
  3.2.1 Template Matching, 65
  3.2.2 Histogram Transformations, 70
  3.2.3 Background Subtraction, 72
  3.2.4 Filtering and Reflectance Models, 73
3.3 Finding Local Edges 75
  3.3.1 Types of Edge Operators, 76
  3.3.2 Edge Thresholding Strategies, 80
  3.3.3 Three-Dimensional Edge Operators, 81
  3.3.4 How Good Are Edge Operators? 83
  3.3.5 Edge Relaxation, 85
3.4 Range Information from Geometry 88
  3.4.1 Stereo Vision and Triangulation, 88
  3.4.2 A Relaxation Algorithm for Stereo, 89
3.5 Surface Orientation from Reflectance Models 93
  3.5.1 Reflectivity Functions, 93
  3.5.2 Surface Gradient, 95
  3.5.3 Photometric Stereo, 98
  3.5.4 Shape from Shading by Relaxation, 99
3.6 Optical Flow 102
  3.6.1 The Fundamental Flow Constraint, 102
  3.6.2 Calculating Optical Flow by Relaxation, 103
3.7 Resolution Pyramids 106
  3.7.1 Gray-Level Consolidation, 106
  3.7.2 Pyramidal Structures in Correlation, 107
  3.7.3 Pyramidal Structures in Edge Detection, 109
Part II SEGMENTED IMAGES 115

4 BOUNDARY DETECTION
4.1 On Associating Edge Elements 119
4.2 Searching Near an Approximate Location 121
  4.2.1 Adjusting A Priori Boundaries, 121
  4.2.2 Non-Linear Correlation in Edge Space, 121
  4.2.3 Divide-and-Conquer Boundary Detection, 122
4.3 The Hough Method for Curve Detection 123
  4.3.1 Use of the Gradient, 124
  4.3.2 Some Examples, 125
  4.3.3 Trading Off Work in Parameter Space for Work in Image Space, 126
  4.3.4 Generalizing the Hough Transform, 128
4.4 Edge Following as Graph Searching 131
  4.4.1 Good Evaluation Functions, 133
  4.4.2 Finding All the Boundaries, 133
  4.4.3 Alternatives to the A Algorithm, 136
4.5 Edge Following as Dynamic Programming 137
  4.5.1 Dynamic Programming, 137
  4.5.2 Dynamic Programming for Images, 139
  4.5.3 Lower Resolution Evaluation Functions, 141
  4.5.4 Theoretical Questions about Dynamic Programming, 143
4.6 Contour Following 143
  4.6.1 Extension to Gray-Level Images, 144
  4.6.2 Generalization to Higher-Dimensional Image Data, 146
5 REGION GROWING
5.1 Regions 149
5.2 A Local Technique: Blob Coloring 151
5.3 Global Techniques: Region Growing via Thresholding
  5.3.1 Thresholding in Multidimensional Space, 153
  5.3.2 Hierarchical Refinement, 155
5.4 Splitting and Merging 155
  5.4.1 State-Space Approach to Region Growing, 157
  5.4.2 Low-Level Boundary Data Structures, 158
  5.4.3 Graph-Oriented Region Structures, 159
5.5 Incorporation of Semantics 160

6 TEXTURE
6.1 What Is Texture? 166
6.2 Texture Primitives 169
6.3 Structural Models of Texel Placement 170
  6.3.1 Grammatical Models, 172
  6.3.2 Shape Grammars, 173
  6.3.3 Tree Grammars, 175
  6.3.4 Array Grammars, 178
6.4 Texture as a Pattern Recognition Problem 181
  6.4.1 Texture Energy, 184
  6.4.2 Spatial Gray-Level Dependence, 186
  6.4.3 Region Texels, 188
6.5 The Texture Gradient 189
7 MOTION
7.1 Motion Understanding 195
  7.1.1 Domain-Independent Understanding, 196
  7.1.2 Domain-Dependent Understanding, 196
7.2 Understanding Optical Flow 199
  7.2.1 Focus of Expansion, 199
  7.2.2 Adjacency, Depth, and Collision, 201
  7.2.3 Surface Orientation and Edge Detection, 202
  7.2.4 Egomotion, 206
7.3 Understanding Image Sequences 207
  7.3.1 Calculating Flow from Discrete Images, 207
  7.3.2 Rigid Bodies from Motion, 210
  7.3.3 Interpretation of Moving Light Displays: A Domain-Independent Approach, 214
  7.3.4 Human Motion Understanding: A Model-Directed Approach, 217
  7.3.5 Segmented Images, 220
Part III GEOMETRICAL STRUCTURES 227

8 REPRESENTATION OF TWO-DIMENSIONAL GEOMETRIC STRUCTURES
8.1 Two-Dimensional Geometric Structures 231
8.2 Boundary Representations 232
  8.2.1 Polylines, 232
  8.2.2 Chain Codes, 235
  8.2.3 The ψ-s Curve, 237
  8.2.4 Fourier Descriptors, 238
  8.2.5 Conic Sections, 239
  8.2.6 B-Splines, 239
  8.2.7 Strip Trees, 244
8.3 Region Representations 247
  8.3.1 Spatial Occupancy Array, 247
  8.3.2 y-Axis, 248
  8.3.3 Quad Trees, 249
  8.3.4 Medial Axis Transform, 252
  8.3.5 Decomposing Complex Areas, 253
8.4 Simple Shape Properties 254
  8.4.1 Area, 254
  8.4.2 Eccentricity, 255
  8.4.3 Euler Number, 255
  8.4.4 Compactness, 256
  8.4.5 Slope Density Function, 256
  8.4.6 Signatures, 257
  8.4.7 Concavity Tree, 258
  8.4.8 Shape Numbers, 258
9 REPRESENTATION OF THREE-DIMENSIONAL STRUCTURES
9.1 Solids and Their Representation 264
9.2 Surface Representations 265
  9.2.1 Surfaces with Faces, 265
  9.2.2 Surfaces Based on Splines, 268
  9.2.3 Surfaces That Are Functions on the Sphere, 270
9.3 Generalized Cylinder Representations 274
  9.3.1 Generalized Cylinder Coordinate Systems and Properties, 275
  9.3.2 Extracting Generalized Cylinders, 278
  9.3.3 A Discrete Volumetric Version of the Skeleton, 279
9.4 Volumetric Representations 280
  9.4.1 Spatial Occupancy, 280
  9.4.2 Cell Decomposition, 281
  9.4.3 Constructive Solid Geometry, 282
  9.4.4 Algorithms for Solid Representations, 284
9.5 Understanding Line Drawings 291
  9.5.1 Matching Line Drawings to Three-Dimensional Primitives, 293
  9.5.2 Grouping Regions Into Bodies, 294
  9.5.3 Labeling Lines, 296
  9.5.4 Reasoning About Planes, 301
Part IV RELATIONAL STRUCTURES 313

10 KNOWLEDGE REPRESENTATION AND USE
10.1 Representations 317
  10.1.1 The Knowledge Base: Models and Processes, 318
  10.1.2 Analogical and Propositional Representations, 319
  10.1.3 Procedural Knowledge, 321
  10.1.4 Computer Implementations, 322
10.2 Semantic Nets 323
  10.2.1 Semantic Net Basics, 323
  10.2.2 Semantic Nets for Inference, 327
10.3 Semantic Net Examples 334
  10.3.1 Frame Implementations, 334
  10.3.2 Location Networks, 335
10.4 Control Issues in Complex Vision Systems 340
  10.4.1 Parallel and Serial Computation, 341
  10.4.2 Hierarchical and Heterarchical Control, 341
  10.4.3 Belief Maintenance and Goal Achievement, 346
11 MATCHING 352
11.1 Aspects of Matching 352
  11.1.1 Interpretation: Construction, Matching, and Labeling, 352
  11.1.2 Matching Iconic, Geometric, and Relational Structures, 353
11.2 Graph-Theoretic Algorithms 355
  11.2.1 The Algorithms, 357
  11.2.2 Complexity, 359
11.3 Implementing Graph-Theoretic Algorithms 360
  11.3.1 Matching Metrics, 360
  11.3.2 Backtrack Search, 363
  11.3.3 Association Graph Techniques, 365
11.4 Matching in Practice 369
  11.4.1 Decision Trees, 370
  11.4.2 Decision Tree and Subgraph Isomorphism, 375
  11.4.3 Informal Feature Classification, 376
  11.4.4 A Complex Matcher, 378
12 INFERENCE 383
12.1 First-Order Predicate Calculus 384
  12.1.1 Clause-Form Syntax (Informal), 384
  12.1.2 Nonclausal Syntax and Logic Semantics (Informal), 385
  12.1.3 Converting Nonclausal Form to Clauses, 387
  12.1.4 Theorem Proving, 388
  12.1.5 Predicate Calculus and Semantic Networks, 390
  12.1.6 Predicate Calculus and Knowledge Representation, 392
12.2 Computer Reasoning 395
12.3 Production Systems 396
  12.3.1 Production System Details, 398
  12.3.2 Pattern Matching, 399
  12.3.3 An Example, 401
  12.3.4 Production System Pros and Cons, 406
12.4 Scene Labeling and Constraint Relaxation 408
  12.4.1 Consistent and Optimal Labelings, 408
  12.4.2 Discrete Labeling Algorithms, 410
  12.4.3 A Linear Relaxation Operator and a Line Labeling Example, 415
  12.4.4 A Nonlinear Operator, 419
  12.4.5 Relaxation as Linear Programming, 420
12.5 Active Knowledge 430
  12.5.1 Hypotheses, 431
  12.5.2 HOW-TO and SO-WHAT Processes, 431
  12.5.3 Control Primitives, 431
  12.5.4 Aspects of Active Knowledge, 433

13 GOAL ACHIEVEMENT 438
13.1 Symbolic Planning 439
  13.1.1 Representing the World, 439
  13.1.2 Representing Actions, 441
  13.1.3 Stacking Blocks, 442
  13.1.4 The Frame Problem, 444
13.2 Planning with Costs 445
  13.2.1 Planning, Scoring, and Their Interaction, 446
  13.2.2 Scoring Simple Plans, 446
  13.2.3 Scoring Enhanced Plans, 451
  13.2.4 Practical Simplifications, 452
  13.2.5 A Vision System Based on Planning, 453
APPENDICES 465

A1 SOME MATHEMATICAL TOOLS 465
A1.1 Coordinate Systems 465
  A1.1.1 Cartesian, 465
  A1.1.2 Polar and Polar Space, 465
  A1.1.3 Spherical and Cylindrical, 466
  A1.1.4 Homogeneous Coordinates, 467
A1.2 Trigonometry 468
  A1.2.1 Plane Trigonometry, 468
  A1.2.2 Spherical Trigonometry, 469
A1.3 Vectors 469
A1.4 Matrices 471
A1.5 Lines 474
  A1.5.1 Two Points, 474
  A1.5.2 Point and Direction, 474
  A1.5.3 Slope and Intercept, 474
  A1.5.4 Ratios, 474
  A1.5.5 Normal and Distance from Origin (Line Equation), 475
  A1.5.6 Parametric, 476
A1.6 Planes 476
A1.7 Geometric Transformations 477
  A1.7.1 Rotation, 477
  A1.7.2 Scaling, 478
  A1.7.3 Skewing, 479
  A1.7.4 Translation, 479
  A1.7.5 Perspective, 479
  A1.7.6 Transforming Lines and Planes, 480
  A1.7.7 Summary, 480
A1.8 Camera Calibration and Inverse Perspective 481
  A1.8.1 Camera Calibration, 482
  A1.8.2 Inverse Perspective, 483
A1.9 Least-Squared-Error Fitting 484
  A1.9.1 Pseudo-Inverse Method, 485
  A1.9.2 Principal Axis Method, 486
  A1.9.3 Fitting Curves by the Pseudo-Inverse Method, 487
A1.10 Conics 488
A1.11 Interpolation 489
  A1.11.1 One-Dimensional, 489
  A1.11.2 Two-Dimensional, 490
A1.12 The Fast Fourier Transform 490
A1.13 The Icosahedron 492
A1.14 Root Finding 493
A2 ADVANCED CONTROL MECHANISMS
A2.1 Standard Control Structures 497
  A2.1.1 Recursion, 498
  A2.1.2 Co-Routining, 498
A2.2 Inherently Sequential Mechanisms 499
  A2.2.1 Automatic Backtracking, 499
  A2.2.2 Context Switching, 500
A2.3 Sequential or Parallel Mechanisms 500
  A2.3.1 Modules and Messages, 500
  A2.3.2 Priority Job Queue, 502
  A2.3.3 Pattern-Directed Invocation, 504
  A2.3.4 Blackboard Systems, 505

AUTHOR INDEX

SUBJECT INDEX
Preface
The dream of intelligent automata goes back to antiquity; its first major articulation in the context of digital computers was by Turing around 1950. Since then, this dream has been pursued primarily by workers in the field of artificial intelligence, whose goal is to endow computers with information-processing capabilities comparable to those of biological organisms. From the outset, one of the goals of artificial intelligence has been to equip machines with the capability of dealing with sensory inputs.
Computer vision is the construction of explicit, meaningful descriptions of physical objects from images. Image understanding is very different from image processing, which studies image-to-image transformations, not explicit description building. Descriptions are a prerequisite for recognizing, manipulating, and thinking about objects.

We perceive a world of coherent three-dimensional objects with many invariant properties. Objectively, the incoming visual data do not exhibit corresponding coherence or invariance; they contain much irrelevant or even misleading variation. Somehow our visual system, from the retinal to cognitive levels, understands, or imposes order on, chaotic visual input. It does so by using intrinsic information that may reliably be extracted from the input, and also through assumptions and knowledge that are applied at various levels in visual processing.

The challenge of computer vision is one of explicitness. Exactly what information about scenes can be extracted from an image using only very basic assumptions about physics and optics? Explicitly, what computations must be performed? Then, at what stage must domain-dependent, prior knowledge about the world be incorporated into the understanding process? How are world models and knowledge represented and used? This book is about the representations and mechanisms that allow image information and prior knowledge to interact in image understanding.
Computer vision is a relatively new and fast-growing field. The first experiments were conducted in the late 1950s, and many of the essential concepts have been developed during the last five years. With this rapid growth, crucial ideas have arisen in disparate areas such as artificial intelligence, psychology, computer graphics, and image processing. Our intent is to assemble a selection of this material in a form that will serve both as a senior/graduate-level academic text and as a useful reference to those building vision systems. This book has a strong artificial intelligence flavor, and we hope this will provoke thought. We believe that both the intrinsic image information and the internal model of the world are important in successful vision systems.
The book is organized into four parts, based on descriptions of objects at four different levels of abstraction.

1. Generalized images: images and image-like entities.
2. Segmented images: images organized into subimages that are likely to correspond to "interesting objects."
3. Geometric structures: quantitative models of image and world structures.
4. Relational structures: complex symbolic descriptions of image and world structures.
The parts follow a progression of increasing abstractness. Although the four parts are most naturally studied in succession, they are not tightly interdependent. Part I is a prerequisite for Part II, but Parts III and IV can be read independently.

Parts of the book assume some mathematical and computing background (calculus, linear algebra, data structures, numerical methods). However, throughout the book mathematical rigor takes a back seat to concepts. Our intent is to transmit a set of ideas about a new field to the widest possible audience.

In one book it is impossible to do justice to the scope and depth of prior work in computer vision. Further, we realize that in a fast-developing field, the rapid influx of new ideas will continue. We hope that our readers will be challenged to think, criticize, read further, and quickly go beyond the confines of this volume.
Acknowledgments
Jerry Feldman and Herb Voelcker (and through them the University of Rochester) provided many resources for this work. One of the most important was a capable and forgiving staff (secretarial, technical, and administrative). For massive text editing, valuable advice, and good humor we are especially grateful to Rose Peet. Peggy Meeker, Jill Orioli, and Beth Zimmerman all helped at various stages.

Several colleagues made suggestions on early drafts: thanks to James Allen, Norm Badler, Larry Davis, Takeo Kanade, John Kender, Daryl Lawton, Joseph O'Rourke, Ari Requicha, Ed Riseman, Azriel Rosenfeld, Mike Schneier, Ken Sloan, Steve Tanimoto, Marty Tenenbaum, and Steve Zucker.

Graduate students helped in many different ways: thanks especially to Michel Denber, Alan Frisch, Lydia Hrechanyk, Mark Kahrs, Keith Lantz, Joe Maleson, Lee Moore, Mark Peairs, Don Perlis, Rick Rashid, Dan Russell, Dan Sabbah, Bob Schudy, Peter Selfridge, Uri Shani, and Bob Tilove. Bernhard Stuth deserves special mention for much careful and critical reading.

Finally, thanks go to Jane Ballard, mostly for standing steadfast through the cycles of elation and depression and for numerous engineering-to-English translations.

As Pat Winston put it: "A willingness to help is not an implied endorsement." The aid of others was invaluable, but we alone are responsible for the opinions, technical details, and faults of this book.

Funding assistance was provided by the Sloan Foundation under Grant 78415, by the National Institutes of Health under Grant HL21253, and by the Defense Advanced Research Projects Agency under Grant N00014-78-C-0164.

The authors wish to credit the following sources for figures and tables. For complete citations given here in abbreviated form (as "from ..." or "after ..."), refer to the appropriate chapter-end references.

Fig. 1.2 from Shani, U., "A 3-D model-driven system for the recognition of abdominal anatomy from CT scans," TR77, Dept. of Computer Science, University of Rochester, May 1980.
Fig. 1.4 courtesy of Allen Hanson and Ed Riseman, COINS Research Project, University of Massachusetts, Amherst, MA.
Fig. 2.4 after Horn and Sjoberg, 1978.
Figs. 2.5, 2.9, 2.10, 3.2, 3.6, and 3.7 courtesy of Bill Lampeter.
Fig. 2.7a painting by Louis Condax; courtesy of Eastman Kodak Company and the Optical Society of America.
Fig. 2.8a courtesy of D. Greenberg and G. Joblove, Cornell Program of Computer Graphics.
Fig. 2.8b courtesy of Tom Check.
Table 2.3 after Gonzalez and Wintz, 1977.
Fig. 2.18 courtesy of EROS Data Center, Sioux Falls, SD.
Figs. 2.19 and 2.20 from Herrick, C.N., Television Theory and Servicing: Black/White and Color, 2nd Ed. Reston, VA: Reston, 1976.
Figs. 2.21, 2.22, 2.23, and 2.24 courtesy of Michel Denber.
Fig. 2.25 from Popplestone et al., 1975.
Fig. 2.26 courtesy of Production Automation Project, University of Rochester.
Fig. 2.27 from Waag and Gramiak, 1976.
Fig. 3.1 courtesy of Marty Tenenbaum.
Fig. 3.8 after Horn, 1974.
Figs. 3.14 and 3.15 after Frei and Chen, 1977.
Figs. 3.17 and 3.18 from Zucker, S.W. and R.A. Hummel, "An optimal 3-D edge operator," IEEE Trans. PAMI-3, May 1981, pp. 324-331.
Fig. 3.19 curves are based on data in Abdou, 1978.
Figs. 3.20, 3.21, and 3.22 from Prager, J.M., "Extracting and labeling boundary segments in natural scenes," IEEE Trans. PAMI-2, 1, January 1980. © 1980 IEEE.
Figs. 3.23, 3.28, 3.29, and 3.30 courtesy of Berthold Horn.
Figs. 3.24 and 3.26 from Marr, D. and T. Poggio, "Cooperative computation of stereo disparity," Science, Vol. 194, 1976, pp. 283-287. © 1976 by the American Association for the Advancement of Science.
Fig. 3.31 from Woodham, R.J., "Photometric stereo: A reflectance map technique for determining surface orientation from image intensity," Proc. SPIE, Vol. 155, August 1978.
Figs. 3.33 and 3.34 after Horn and Schunck, 1980.
Fig. 3.37 from Tanimoto, S. and T. Pavlidis, "A hierarchical data structure for picture processing," CGIP 4, 2, June 1975, pp. 104-119.
Fig. 4.6 from Kimme et al., 1975.
Figs. 4.7 and 4.16 from Ballard and Sklansky, 1976.
Fig. 4.9 courtesy of Dana Ballard and Ken Sloan.
Figs. 4.12 and 4.13 from Ramer, U., "Extraction of line structures from photographs of curved objects," CGIP 4, 2, June 1975, pp. 81-103.
Fig. 4.14 courtesy of Jim Lester, Tufts/New England Medical Center.
Fig. 4.17 from Chien, Y.P. and K.S. Fu, "A decision function method for boundary detection," CGIP 3, 2, June 1974, pp. 125-140.
Fig. 5.3 from Ohlander, R., K. Price, and D.R. Reddy, "Picture segmentation using a recursive region splitting method," CGIP 8, 3, December 1979.
Fig. 5.4 courtesy of Sam Kapilivsky.
Figs. 6.1, 11.16, and A1.13 courtesy of Chris Brown.
Fig. 6.3 courtesy of Joe Maleson and John Kender.
Fig. 6.4 from Connors, 1979. Texture images by Phil Brodatz, in Brodatz, Textures. New York: Dover, 1966.
Fig. 6.9 texture image by Phil Brodatz, in Brodatz, Textures. New York: Dover, 1966.
Figs. 6.11, 6.12, and 6.13 from Lu, S.Y. and K.S. Fu, "A syntactic approach to texture analysis," CGIP 7, 3, June 1978, pp. 303-330.
Fig. 6.14 from Jayaramamurthy, S.N., "Multilevel array grammars for generating texture scenes," Proc. PRIP, August 1979, pp. 391-398. © 1979 IEEE.
Fig. 6.20 from Laws, 1980.
Figs. 6.21 and 6.22 from Maleson et al., 1977.
Fig. 6.23 courtesy of Joe Maleson.
Figs. 7.1 and 7.3 courtesy of Daryl Lawton.
Fig. 7.2 after Prager, 1979.
Figs. 7.4 and 7.5 from Clocksin, W.F., "Computer prediction of visual thresholds for surface slant and edge detection from optical flow fields," Ph.D. dissertation, University of Edinburgh, 1980.
Fig. 7.7 courtesy of Steve Barnard and Bill Thompson.
Figs. 7.8 and 7.9 from Rashid, 1980.
Fig. 7.10 courtesy of Joseph O'Rourke.
Figs. 7.11 and 7.12 after Aggarwal and Duda, 1975.
Fig. 7.13 courtesy of Hans-Hellmut Nagel.
Fig. 8.1d after Requicha, 1977.
Figs. 8.2, 8.3, 8.21a, 8.22, and 8.26 after Pavlidis, 1977.
Figs. 8.10, 8.11, 9.6, and 9.16 courtesy of Uri Shani.
Figs. 8.12, 8.13, 8.14, 8.15, and 8.16 from Ballard, 1981.
Fig. 8.21b from Preston, K., Jr., M.J.B. Duff, S. Levialdi, P.E. Norgren, and J-i. Toriwaki, "Basics of cellular logic with some applications in medical image processing," Proc. IEEE, Vol. 67, No. 5, May 1979, pp. 826-856.
Figs. 8.25, 9.8, 9.9, 9.10, and 11.3 courtesy of Robert Schudy.
Fig. 8.29 after Bribiesca and Guzman, 1979.
Figs. 9.1, 9.18, 9.19, and 9.27 courtesy of Ari Requicha.
Fig. 9.2 from Requicha, A.A.G., "Representations for rigid solids: theory, methods, systems," Computing Surveys 12, 4, December 1980.
Fig. 9.3 courtesy of Lydia Hrechanyk.
Figs. 9.4 and 9.5 after Baumgart, 1972.
Fig. 9.7 courtesy of Peter Selfridge.
Fig. 9.11 after Requicha, 1980.
Figs. 9.14 and 9.15b from Agin, G.J. and T.O. Binford, "Computer description of curved objects," IEEE Trans. on Computers 25, 1, April 1976.
Fig. 9.15a courtesy of Gerald Agin.
Fig. 9.17 courtesy of A. Christensen; published as frontispiece of ACM SIGGRAPH 80 Proceedings.
Fig. 9.20 from Marr and Nishihara, 1978.
Fig. 9.21 after Tilove, 1980.
Fig. 9.22b courtesy of Gene Hartquist.
Figs. 9.24, 9.25, and 9.26 from Lee and Requicha, 1980.
Figs. 9.28a, 9.29, 9.30, 9.31, 9.32, 9.35, and 9.37 and Table 9.1 from Brown, C. and R. Popplestone, "Cases in scene analysis," in Pattern Recognition, ed. B.G. Batchelor. New York: Plenum, 1978.
Fig. 9.28b from Guzman, A., "Decomposition of a visual scene into three-dimensional bodies," in Automatic Interpretation and Classification of Images, A. Grasselli, ed., New York: Academic Press, 1969.
Fig. 9.28c from Waltz, D., "Understanding line drawings of scenes with shadows," in The Psychology of Computer Vision, ed. P.H. Winston. New York: McGraw-Hill, 1975.
Fig. 9.28d after Turner, 1974.
Figs. 9.33, 9.38, 9.40, 9.42, 9.43, and 9.44 after Mackworth, 1973.
Figs. 9.39, 9.45, 9.46, and 9.47 and Table 9.2 after Kanade, 1978.
Figs. 10.2 and A2.1 courtesy of Dana Ballard.
Figs. 10.16, 10.17, and 10.18 after Russell, 1979.
Fig. 11.5 after Fischler and Elschlager, 1973.
Fig. 11.8 after Ambler et al., 1975.
Fig. 11.10 from Winston, P.H., "Learning structural descriptions from examples," in The Psychology of Computer Vision, ed. P.H. Winston. New York: McGraw-Hill, 1975.
Fig. 11.11 from Nevatia, 1974.
Fig. 11.12 after Nevatia, 1974.
Fig. 11.17 after Barrow and Popplestone, 1971.
Fig. 11.18 from Davis, L.S., "Shape matching using relaxation techniques," IEEE Trans. PAMI-1, 4, January 1979, pp. 60-72.
Figs. 12.4 and 12.5 from Sloan and Bajcsy, 1979.
Fig. 12.6 after Barrow and Tenenbaum, 1976.
Fig. 12.8 after Freuder, 1978.
Fig. 12.10 from Rosenfeld, A., R.A. Hummel, and S.W. Zucker, "Scene labeling by relaxation operations," IEEE Trans. SMC-6, 6, June 1976, p. 420.
Figs. 12.11, 12.12, 12.13, 12.14, and 12.15 after Hinton, 1979.
Fig. 13.3 courtesy of Aaron Sloman.
Figs. 13.6, 13.7, and 13.8 from Garvey, 1976.
Fig. A1.11 after Duda and Hart, 1973.
Figs. A2.2 and A2.3 from Hanson, A.R. and E.M. Riseman, "VISIONS: A computer system for interpreting scenes," in Computer Vision Systems, ed. A.R. Hanson and E.M. Riseman. New York: Academic Press, 1978.
Mnemonics for Proceedings and Special Collections Cited in the References

CGIP
Computer Graphics and Image Processing

COMPSAC
IEEE Computer Society's 3rd International Computer Software and Applications Conference, Chicago, November 1979.

CVS
Hanson, A.R. and E.M. Riseman (Eds.). Computer Vision Systems. New York: Academic Press, 1978.

DARPA IU
Defense Advanced Research Projects Agency Image Understanding Workshop, Minneapolis, MN, April 1977.
Defense Advanced Research Projects Agency Image Understanding Workshop, Palo Alto, CA, October 1977.
Defense Advanced Research Projects Agency Image Understanding Workshop, Cambridge, MA, May 1978.
Defense Advanced Research Projects Agency Image Understanding Workshop, Carnegie-Mellon University, Pittsburgh, PA, November 1978.
Defense Advanced Research Projects Agency Image Understanding Workshop, University of Maryland, College Park, MD, April 1980.

IJCAI
2nd International Joint Conference on Artificial Intelligence, Imperial College, London, September 1971.
4th International Joint Conference on Artificial Intelligence, Tbilisi, Georgia, USSR, September 1975.
5th International Joint Conference on Artificial Intelligence, MIT, Cambridge, MA, August 1977.
6th International Joint Conference on Artificial Intelligence, Tokyo, August 1979.
IJCPR
2nd International Joint Conference on Pattern Recognition, Copenhagen, August 1974.
3rd International Joint Conference on Pattern Recognition, Coronado, CA, November 1976.
4th International Joint Conference on Pattern Recognition, Kyoto, November 1978.
5th International Joint Conference on Pattern Recognition, Miami Beach, FL, December 1980.

MI4
Meltzer, B. and D. Michie (Eds.). Machine Intelligence 4. Edinburgh: Edinburgh University Press, 1969.

MI5
Meltzer, B. and D. Michie (Eds.). Machine Intelligence 5. Edinburgh: Edinburgh University Press, 1970.

MI6
Meltzer, B. and D. Michie (Eds.). Machine Intelligence 6. Edinburgh: Edinburgh University Press, 1971.

MI7
Meltzer, B. and D. Michie (Eds.). Machine Intelligence 7. Edinburgh: Edinburgh University Press, 1972.

PCV
Winston, P.H. (Ed.). The Psychology of Computer Vision. New York: McGraw-Hill, 1975.

PRIP
IEEE Computer Society Conference on Pattern Recognition and Image Processing, Chicago, August 1979.
1

Computer Vision Issues

1.1 ACHIEVING SIMPLE VISION GOALS
Suppose that you are given an aerial photo such as that of Fig. 1.1a and asked to locate ships in it. You may never have seen a naval vessel in an aerial photograph before, but you will have no trouble predicting generally how ships will appear. You might reason that you will find no ships inland, and so turn your attention to ocean areas. You might be momentarily distracted by the glare on the water, but realizing that it comes from reflected sunlight, you perceive the ocean as continuous and flat. Ships on the open ocean stand out easily (if you have seen ships from the air, you know to look for their wakes). Near the shore the image is more confusing, but you know that ships close to shore are either moored or docked. If you have a map (Fig. 1.1b), it can help locate the docks (Fig. 1.1c); in a low-quality photograph it can help you identify the shoreline. Thus it might be a good investment of your time to establish the correspondence between the map and the image. A search parallel to the shore in the dock areas reveals several ships (Fig. 1.1d).
Again, suppose that you are presented with a set of computer-aided tomographic (CAT) scans showing "slices" of the human abdomen (Fig. 1.2a). These images are products of high technology, and give us views not normally available even with x-rays. Your job is to reconstruct from these cross sections the three-dimensional shape of the kidneys. This job may well seem harder than finding ships. You first need to know what to look for (Fig. 1.2b), where to find it in CAT scans, and how it looks in such scans. You need to be able to "stack up" the scans mentally and form an internal model of the shape of the kidney as revealed by its slices (Fig. 1.2c and 1.2d).
This book is about computer vision. These two example tasks are typical computer vision tasks; both were solved by computers using the sorts of knowledge and techniques alluded to in the descriptive paragraphs. Computer vision is the enterprise of automating and integrating a wide range of processes and representations used for vision perception. It includes as parts many techniques that are useful by themselves, such as image processing (transforming, encoding, and transmitting images) and statistical pattern classification (statistical decision theory applied to general patterns, visual or otherwise). More importantly for us, it includes techniques for geometric modeling and cognitive processing.
1.2 HIGH-LEVEL AND LOW-LEVEL CAPABILITIES
The examples of Section 1.1 illustrate vision that uses cognitive processes, geometric models, goals, and plans. These high-level processes are very important; our examples only weakly illustrate their power and scope. There surely would be some overall purpose to finding ships; there might be collateral information that there were submarines, barges, or small craft in the harbor, and so forth. CAT scans would be used with several diagnostic goals in mind and an associated medical history available. Goals and knowledge are high-level capabilities that can guide visual activities, and a visual system should be able to take advantage of them.

Fig. 1.1 Finding ships in an aerial photograph. (a) The photograph; (b) a corresponding map; (c) the dock area of the photograph; (d) registered map and image, with ship location.
Fig. 1.1 (cont.)
Even such elaborated tasks are very special ones and in their way easier to think about than the commonplace visual perceptions needed to pick up a baby, cross a busy street, or arrive at a party and quickly "see" who you know, your host's taste in decor, and how long the festivities have been going on. All these tasks require judgment and large amounts of knowledge of objects in the world, how they look, and how they behave. Such high-level powers are so well integrated into "vision" as to be effectively inseparable.
Knowledge and goals are only part of the vision story. Vision requires many low-level capabilities we often take for granted; for example, our ability to extract intrinsic images of "lightness," "color," and "range." We perceive black as black in a complex scene even when the lighting is such that some black patches are reflecting more light than some white patches. Similarly, perceived colors are not related simply to the wavelengths of reflected light; if they were, we would consciously see colors changing with illumination. Stereo fusion (stereopsis) is a low-level facility basic to short-range three-dimensional perception.
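The lightness point can be made concrete with a line of arithmetic. In the sketch below (illustrative only; the reflectance and illumination numbers are invented, and the bare product model ignores geometry and sensor effects), image intensity is modeled as reflectance times illumination, so a black patch in direct sun can send more light to the eye than a white patch in shade, even though we still perceive the black patch as black.

```python
# Illustrative sketch (numbers invented): sensed intensity modeled as
# reflectance x illumination for a matte patch, in arbitrary units.

def intensity(reflectance, illumination):
    """Light reaching the sensor from a matte patch."""
    return reflectance * illumination

black_in_sun = intensity(0.05, 1000.0)   # black paper, direct sunlight
white_in_shade = intensity(0.90, 20.0)   # white paper, deep shade

# The black patch sends far more light to the eye, yet looks black.
print(black_in_sun, white_in_shade)
```

Separating the reflectance factor from the unknown illumination factor is one job of the "intrinsic image" computations mentioned above.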
An important low-level capability is object perception: for our purposes it does not really matter if this talent is innate ("hard-wired"), or if it is developmental or even learned ("compiled-in"). The fact remains that mature biological vision systems are specialized and tuned to deal with the relevant objects in their environments. Further specialization can often be learned, but it is built on basic immutable assumptions about the world which underlie the vision system.

Fig. 1.2 Finding a kidney in a computer-aided tomographic scan. (a) One slice of scan data; (b) prototype kidney model; (c) model fitting; (d) resulting kidney and spinal cord instances.
A basic sort of object recognition capability is the "figure/ground" discrimination that separates objects from the "background." Other basic organizational predispositions are revealed by the "Gestalt laws" of clustering, which demonstrate rules our vision systems use to form simple arrays of stimuli into more coherent spatial groups. A dramatic example of specialized object perception for human beings is revealed in our "face recognition" capability, which seems to occupy a large volume of brain matter. Geometric visual illusions are more surprising symptoms of nonintuitive processing that is performed by our vision systems, either for some direct purpose or as a side effect of its specialized architecture. Some other illusions clearly reflect the intervention of high-level knowledge. For instance, the familiar "Necker cube reversal" is grounded in our three-dimensional models for cubes.

Low-level processing capabilities are elusive; they are unconscious, and they are not well connected to other systems that allow direct introspection. For instance, our visual memory for images is quite impressive, yet our quantitative verbal descriptions of images are relatively primitive. The biological visual "hardware" has been developed, honed, and specialized over a very long period. However, its organization and functionality is not well understood except at extreme levels of detail and generality: the behavior of small sets of cat or monkey cortical cells and the behavior of human beings in psychophysical experiments.
Computer vision is thus immediately faced with a very difficult problem; it must reinvent, with general digital hardware, the most basic and yet inaccessible talents of specialized, parallel, and partly analog biological visual systems. Figure 1.3 may give a feeling for the problem; it shows two visual renditions of a familiar subject. The inset is a normal image, the rest is a plot of the intensities (gray levels) in the image against the image coordinates. In other words, it displays information with "height" instead of "light." No information is lost, and the display is an image-like object, but we do not immediately see a face in it. The initial representation the computer has to work with is no better; it is typically just an array of numbers from which human beings could extract visual information only very painfully. Skipping the low-level processing we take for granted turns normally effortless perception into a very difficult puzzle.

Fig. 1.3 Two representations of an image. One is directly accessible to our low-level processes; the other is not.
Computer vision is vitally concerned both with low-level or "early processing" issues and with the high-level and "cognitive" use of knowledge. Where does vision leave off and reasoning and motivation begin? We do not know precisely, but we firmly believe (and hope to show) that powerful, cooperating, rich representations of the world are needed for any advanced vision system. Without them, no system can derive relevant and invariant information from input that is beset with ever-changing lighting and viewpoint, unimportant shape differences, noise, and other large but irrelevant variations. These representations can remove some computational load by predicting or assuming structure for the visual world.
Finally, if a system is to be successful in a variety of tasks, it needs some "meta-level" capabilities: it must be able to model and reason about its own goals and capabilities, and the success of its approaches. These complex and related models must be manipulated by cognitive-like techniques, even though introspectively the perceptual process does not always "feel" to us like cognition.
1.3 A RANGE OF REPRESENTATIONS
Visual perception is the relation of visual input to previously existing models of the world. There is a large representational gap between the image and the models ("ideas," "concepts") which explain, describe, or abstract the image information. To bridge that gap, computer vision systems usually have a (loosely ordered) range of representations connecting the input and the "output" (a final description, decision, or interpretation). Computer vision then involves the design of these intermediate representations and the implementation of algorithms to construct them and relate them to one another.
We broadly categorize the representations into four parts (Fig. 1.4) which correspond with the organization of this volume. Within each part there may be several layers of representation, or several cooperating representations. Although the sets of representations are loosely ordered from "early" and "low-level" signals to "late" and "cognitive" symbols, the actual flow of effort and information between them is not unidirectional. Of course, not all levels need to be used in each computer vision application; some may be skipped, or the processing may start partway up the hierarchy or end partway down it.
Generalized images (Part I) are iconic (imagelike) and analogical representations of the input data. Images may initially arise from several technologies.
Fig. 1.4 Examples of the four categories of representation used in computer vision. (a) Iconic; (b) segmented; (c) geometric; (d) relational.
Domain-independent processing can produce other iconic representations more directly useful to later processing, such as arrays of edge elements (gray-level discontinuities). Intrinsic images can sometimes be produced at this level; they reveal physical properties of the imaged scene (such as surface orientations, range, or surface reflectance). Often parallel processing can produce generalized images. More generally, most "low-level" processes can be implemented with parallel computation.
Segmented images (Part II) are formed from the generalized image by gathering its elements into sets likely to be associated with meaningful objects in the scene. For instance, segmenting a scene of planar polyhedra (blocks) might result in a set of edge segments corresponding to polyhedral edges, or a set of two-dimensional regions in the image corresponding to polyhedral faces. In producing the segmented image, knowledge about the particular domain at issue begins to be important both to save computation and to overcome problems of noise and inadequate data. In the planar polyhedral example, it helps to know beforehand that the line segments must be straight. Texture and motion are known to be very important in segmentation, and are currently topics of active research; knowledge in these areas is developing very fast.

Fig. 1.4 (cont.)
Geometric representations (Part III) are used to capture the all-important idea of two-dimensional and three-dimensional shape. Quantifying shape is as important as it is difficult. These geometric representations must be powerful enough to support complex and general processing, such as "simulation" of the effects of lighting and motion. Geometric structures are as useful for encoding previously acquired knowledge as they are for re-representing current visual input. Computer vision requires some basic mathematics; Appendix 1 has a brief selection of useful techniques.
Relational models (Part IV) are complex assemblages of representations used to support sophisticated high-level processing. An important tool in knowledge representation is semantic nets, which can be used simply as an organizational convenience or as a formalism in their own right. High-level processing often uses prior knowledge and models acquired prior to a perceptual experience. The basic mode of processing turns from constructing representations to matching them. At high levels, propositional representations become more important. They are made up of assertions that are true or false with respect to a model, and are manipulated by rules of inference. Inference-like techniques can also be used for planning, which models situations and actions through time, and thus must reason about temporally varying and hypothetical worlds. The higher the level of representation, the more marked is the flow of control (direction of attention, allocation of effort) downward to lower levels, and the greater the tendency of algorithms to exhibit serial processing. These issues of control are basic to complex information processing in general and computer vision in particular; Appendix 2 outlines some specific control mechanisms.
Figure 1.5 illustrates the loose classification of the four categories into analogical and propositional representations. We consider generalized and segmented images as well as geometric structures to be analogical models. Analogical models capture directly the relevant characteristics of the represented objects, and are manipulated and interrogated by simulation-like processes. Relational models are generally a mix of analogical and propositional representations. We develop this distinction in more detail in Chapter 10.
1.4 THE ROLE OF COMPUTERS
The computer is a congenial tool for research into visual perception.

Computers are versatile and forgiving experimental subjects. They are easily and ethically reconfigurable, not messy, and their workings can be scrutinized in the finest detail.

Computers are demanding critics. Imprecision, vagueness, and oversights are not tolerated in the computer implementation of a theory.

Computers offer new metaphors for perceptual psychology (also neurology, linguistics, and philosophy). Processes and entities from computer science provide powerful and influential conceptual tools for thinking about perception and cognition.

Computers can give precise measurements of the amount of processing they
Fig. 1.5 The knowledge base of a complex computer vision system, showing four basic representational categories.
Table 1.1  EXAMPLES OF IMAGE ANALYSIS TASKS

Robotics. Objects: three-dimensional outdoor scenes, indoor scenes; mechanical parts. Modalities: light, X-rays; light, structured light. Tasks: identify or describe objects in scene; industrial tasks. Knowledge sources: models of objects; models of the reflection of light from objects.

Aerial images. Objects: terrain; buildings, etc. Modalities: light, infrared, radar. Tasks: improved images, resource analyses, weather prediction, spying, missile guidance, tactical analysis. Knowledge sources: maps; geometrical models of shapes; models of image formation.

Astronomy. Objects: stars, planets. Modality: light. Tasks: chemical composition, improved images. Knowledge sources: geometrical models of shapes.

Medical (macro). Objects: body organs. Modalities: X-rays, ultrasound, isotopes, heat. Tasks: diagnosis of abnormalities, operative and treatment planning. Knowledge sources: anatomical models; models of image formation.

Medical (micro). Objects: cells, protein chains, chromosomes. Modalities: electron microscopy, light. Tasks: pathology, cytology; karyotyping. Knowledge sources: models of shape.

Chemistry. Objects: molecules. Modality: electron densities. Tasks: analysis of molecular compositions. Knowledge sources: chemical models; structured models.

Neuroanatomy. Objects: neurons. Modalities: light, electron microscopy. Tasks: determination of spatial orientation. Knowledge sources: neural connectivity.

Physics. Objects: particle tracks. Modality: light. Tasks: find new particles, identify tracks. Knowledge sources: atomic physics.
do. A computer implementation places an upper limit on the amount of computation necessary for a task.

Computers may be used either to mimic what we understand about human perceptual architecture and processes, or to strike out in different directions to try to achieve similar ends by different means.

Computer models may be judged either by their efficacy for applications and on-the-job performance or by their internal organization, processes, and structures: the theory they embody.
1.5 COMPUTER VISION RESEARCH AND APPLICATIONS
"Pure" computer vision research often deals with relatively domain-independent considerations. The results are useful in a broad range of contexts. Almost always such work is demonstrated in one or more applications areas, and more often than not an initial application problem motivates consideration of the general problem. Applications of computer vision are exciting, and their number is growing as computer vision becomes better understood. Table 1.1 gives a partial list of "classical" and current applications areas.
Within the organization outlined above, this book presents many specific ideas and techniques with general applicability. It is meant to provide enough basic knowledge and tools to support attacks on both applications and research topics.
Part I: GENERALIZED IMAGES
The first step in the vision process is image formation. Images may arise from a variety of technologies. For example, most television-based systems convert reflected light intensity into an electronic signal which is then digitized; other systems use more exotic radiations, such as x-rays, laser light, ultrasound, and heat. The net result is usually an array of samples of some kind of energy.
The vision system may be entirely passive, taking as input a digitized image from a microwave or infrared sensor, satellite scanner, or a planetary probe, but more likely involves some kind of active imaging. Automated active imaging systems may control the direction and resolution of sensors, or regulate and direct their own light sources. The light source itself may have special properties and structure designed to reveal the nature of the three-dimensional world; an example is to use a plane of light that falls on the scene in a stripe whose structure is closely related to the structure of opaque objects. Range data for the scene may be provided by stereo (two images), but also by triangulation using light-stripe techniques or by "spot ranging" using laser light. A single hardware device may deliver range and multispectral reflectivity ("color") information. The image-forming device may also perform various other operations. For example, it may automatically smooth or enhance the image or vary its resolution.
The generalized image is a set of related imagelike entities for the scene. This set may include related images from several modalities, but may also include the results of significant processing that can extract intrinsic images. An intrinsic image is an "image," or array, of representations of an important physical quantity such as surface orientation, occluding contours, velocity, or range. Object color, which is a different entity from sensed red-green-blue wavelengths, is an intrinsic quality. These intrinsic physical qualities are extremely useful; they can be related to physical objects far more easily than the original input values, which reveal the physical parameters only indirectly. An intrinsic image is a major step toward scene understanding and usually represents significant and interesting computations.
The information necessary to compute an intrinsic image is contained in the input image itself, and is extracted by "inverting" the transformation wrought by the imaging process, the reflection of radiation from the scene, and other physical processes. An example is the fusion of two stereo images to yield an intrinsic range image. Many algorithms to recover intrinsic images can be realized with parallel implementations, mirroring computations that may take place in the lower neurological levels of biological image processing.
All of the computations listed above benefit from the idea of resolution pyramids. A pyramid is a generalized image data structure consisting of the same image at several successively increasing levels of resolution. As the resolution increases, more samples are required to represent the increased information and hence the successive levels are larger, making the entire structure look like a pyramid. Pyramids allow the introduction of many different coarse-to-fine image-resolution algorithms which are vastly more efficient than their single-level, high-resolution-only counterparts.
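The pyramid idea can be sketched in a few lines. The 2x2 averaging used here is a hypothetical minimal reduction rule, standing in for the more careful low-pass filtering a real system would use, and a square, power-of-two-sized input is assumed:

```python
# Sketch of a resolution pyramid: each level halves the resolution of the
# previous one by averaging 2x2 blocks of samples. Minimal illustration
# only; assumes a square image whose side is a power of two.

def halve(image):
    """Return a half-resolution image by averaging 2x2 blocks."""
    h, w = len(image), len(image[0])
    return [[(image[2*r][2*c] + image[2*r][2*c+1] +
              image[2*r+1][2*c] + image[2*r+1][2*c+1]) / 4.0
             for c in range(w // 2)]
            for r in range(h // 2)]

def pyramid(image):
    """Build the full pyramid, from the input down to a single sample."""
    levels = [image]
    while len(levels[-1]) > 1:
        levels.append(halve(levels[-1]))
    return levels

if __name__ == "__main__":
    img = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
    print([len(level) for level in levels] if False else
          [len(level) for level in pyramid(img)])  # level sizes: [4, 2, 1]
```

A coarse-to-fine algorithm would start its search at the small top level and refine the result as it descends to the full-resolution base.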
2 IMAGE FORMATION
2.1 IMAGES
Image formation occurs when a sensor registers radiation that has interacted with physical objects. Section 2.2 deals with mathematical models of images and image formation. Section 2.3 describes several specific image formation technologies.
The mathematical model of imaging has several different components.

1. An image function is the fundamental abstraction of an image.
2. A geometrical model describes how three dimensions are projected into two.
3. A radiometrical model shows how the imaging geometry, light sources, and reflectance properties of objects affect the light measurement at the sensor.
4. A spatial frequency model describes how spatial variations of the image may be characterized in a transform domain.
5. A color model describes how different spectral measurements are related to image colors.
6. A digitizing model describes the process of obtaining discrete samples.
This material forms the basis of much image-processing work and is developed in much more detail elsewhere, e.g., [Rosenfeld and Kak 1976; Pratt 1978]. Our goals are not those of image processing, so we limit our discussion to a summary of the essentials.
The wide range of possible sources of samples and the resulting different implications for later processing motivate our overview of specific imaging techniques. Our goal is not to provide an exhaustive catalog, but rather to give an idea of the range of techniques available. Very different analysis techniques may be needed depending on how the image was formed. Two examples illustrate this point. If the image is formed by reflected light intensity, as in a photograph, the image records both light from primary light sources and (more usually) the light reflected off physical surfaces. We show in Chapter 3 that in certain cases we can use these kinds of images together with knowledge about physics to derive the orientation of the surfaces. If, on the other hand, the image is a computed tomogram of the human body (discussed in Section 2.3.4), the image represents tissue density of internal organs. Here orientation calculations are irrelevant, but general segmentation techniques of Chapters 4 and 5 (the agglomeration of neighboring samples of similar density into units representing organs) are appropriate.
2.2 IMAGE MODEL
Sophisticated image models of a statistical flavor are useful in image processing [Jain 1981]. Here we are concerned with more geometrical considerations.
2.2.1 Image Functions
An image function is a mathematical representation of an image. Generally, an image function is a vector-valued function of a small number of arguments. A special case of the image function is the digital (discrete) image function, where the arguments to and value of the function are all integers. Different image functions may be used to represent the same image, depending on which of its characteristics are important. For instance, a camera produces an image on black-and-white film which is usually thought of as a real-valued function (whose value could be the density of the photographic negative) of two real-valued arguments, one for each of two spatial dimensions. However, at a very small scale (the order of the film grain) the negative basically has only two densities, "opaque" and "transparent."
Most images are represented by functions of two spatial variables f(x) = f(x, y), where f(x, y) is the brightness or gray level of the image at a spatial coordinate (x, y). A multispectral image f is a vector-valued function with components (f1, ..., fn). One special multispectral image is a color image in which, for example, the components measure the brightness values of each of three wavelengths, that is,

f(x) = (f_red(x), f_blue(x), f_green(x))

Time-varying images f(x, t) have an added temporal argument. For special three-dimensional images, x = (x, y, z). Usually, both the domain and range of f are bounded.
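A digital image function in this sense is just a mapping from integer coordinates to (possibly vector-valued) samples. The tiny 2x2 image and its values below are made up purely for illustration:

```python
# Sketch of a digital (discrete) multispectral image function: integer
# arguments, integer values. The 2x2 image is a made-up example;
# components follow the text's (red, blue, green) order.

image = [
    [(255, 0, 0), (0, 255, 0)],      # row y = 0
    [(0, 0, 255), (128, 128, 128)],  # row y = 1
]

def f(x, y):
    """Vector-valued digital image function f(x, y)."""
    return image[y][x]

def f_red(x, y):
    """One scalar component of the vector-valued function."""
    return f(x, y)[0]

if __name__ == "__main__":
    print(f(0, 0))      # (255, 0, 0)
    print(f_red(1, 1))  # 128
```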
An important part of the formation process is the conversion of the image representation from a continuous function to a discrete function; we need some way of describing the images as samples at discrete points. The mathematical tool we shall use is the delta function.
Formally, the delta function may be defined by

δ(x) = 0 when x ≠ 0, ∞ when x = 0
∫ δ(x) dx = 1     (2.1)

If some care is exercised, the delta function may be interpreted as the limit of a set of functions:

δ(x) = lim_{n→∞} δ_n(x)

where

δ_n(x) = n if |x| ≤ 1/2n, and 0 otherwise
Fig. 2.1 A geometric camera model.
The case for x' is treated similarly:

x' = fx / (f - z)     (2.5)

The projected image has z = 0 everywhere. However, projecting away the z component is best considered a separate transformation; the projective transform is usually thought to distort the z component just as it does the x and y. Perspective distortion thus maps (x, y, z) to

(x', y', z') = (fx / (f - z), fy / (f - z), fz / (f - z))     (2.6)

The perspective transformation yields orthographic projection as a special case when the viewpoint is the point at infinity in the z direction. Then all objects are projected onto the viewing plane with no distortion of their x and y coordinates.

The perspective distortion yields a three-dimensional object that has been "pushed out of shape"; it is more shrunken the farther it is from the viewpoint. The z component is not available directly from a two-dimensional image, being identically equal to zero. In our model, however, the distorted z component has information about the distance of imaged points from the viewpoint. When this distorted object is projected orthographically onto the image plane, the result is a perspective picture. Thus, to achieve the effect of railroad tracks appearing to come together in the distance, the perspective distortion transforms the tracks so that they do come together (at a point at infinity)! The simple orthographic projection that projects away the z component unsurprisingly preserves this distortion. Several properties of the perspective transform are of interest and are investigated further in Appendix 1.
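Equation (2.6) is easy to exercise numerically. This sketch, with an arbitrary focal distance and made-up points, shows the railroad-track effect: the projected x' shrinks toward zero as the point recedes (z more negative in this coordinate convention):

```python
# Sketch of the perspective distortion of Eq. (2.6): (x, y, z) maps to
# (fx/(f - z), fy/(f - z), fz/(f - z)); orthographic projection then
# simply drops the distorted z component, yielding a perspective picture.

def perspective(point, f):
    """Apply the perspective transform with focal distance f (z != f)."""
    x, y, z = point
    s = f / (f - z)
    return (s * x, s * y, s * z)

def project(point, f):
    """Perspective picture: distort, then project away z."""
    xp, yp, _ = perspective(point, f)
    return (xp, yp)

if __name__ == "__main__":
    # A rail at x = 1 narrows toward the image center as it recedes.
    for z in (0.0, -10.0, -100.0):
        print(project((1.0, 0.0, z), f=1.0))
```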
Binocular Imaging

Basic binocular imaging geometry is shown in Fig. 2.3a. For simplicity, we
Fig. 2.2 (a) Camera model equivalent to that of Fig. 2.1; (b) definition of terms.
use a system with two viewpoints. In this model the eyes do not converge; they are aimed in parallel at the point at infinity in the z direction. The depth information about a point is then encoded only by its different positions (disparity) in the two image planes.
With the stereo arrangement of Fig. 2.3,

x' = (x - d) f / (f - z)
Fig. 2.3 A nonconvergent binocular imaging system.
through each eye. The baseline of the binocular system is 2d. Thus

(f - z) x' = (x - d) f     (2.7)
(f - z) x" = (x + d) f     (2.8)

Subtracting (2.7) from (2.8) gives

(f - z)(x" - x') = 2df

or

z = f - 2df / (x" - x')     (2.9)

Thus if points can be matched to determine the disparity (x" - x') and the baseline and focal length are known, the z coordinate is simple to calculate.
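Equations (2.7) through (2.9) can be checked with a small sketch using made-up scene coordinates: project a point through both viewpoints, then recover its depth from the disparity:

```python
# Sketch of depth from disparity: project a scene point through both
# viewpoints via (f - z)x' = (x - d)f and (f - z)x'' = (x + d)f, then
# invert with Eq. (2.9): z = f - 2df / (x'' - x').

def image_positions(x, z, d, f):
    """Image positions of scene point (x, ., z) in the two views."""
    xp = (x - d) * f / (f - z)
    xpp = (x + d) * f / (f - z)
    return xp, xpp

def depth_from_disparity(xp, xpp, d, f):
    """Recover z from the disparity (xpp - xp), Eq. (2.9)."""
    return f - 2.0 * d * f / (xpp - xp)

if __name__ == "__main__":
    x, z, d, f = 3.0, -40.0, 1.0, 2.0   # arbitrary illustrative values
    xp, xpp = image_positions(x, z, d, f)
    print(depth_from_disparity(xp, xpp, d, f))  # recovers z, up to rounding
```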
If the system can converge its directions of view to a finite distance, convergence angle may also be used to compute depth. The hardest part of extracting depth information from stereo is the matching of points for disparity calculations. "Light striping" is a way to maintain geometric simplicity and also simplify matching (Section 2.3.3).

2.2.3 Reflectance
Terminology

A basic aspect of the imaging process is the physics of the reflectance of objects, which determines how their "brightness" in an image depends on their inherent characteristics and the geometry of the imaging situation. A clear presentation of the mathematics of reflectance is given in [Horn and Sjoberg 1978; Horn 1977]. Light energy flux is measured in watts; "brightness" is measured with respect to area and solid angle. The radiant intensity I of a source is the exitant flux per unit solid angle:

I = dΦ / dω   watts/steradian     (2.10)
Here dω is an incremental solid angle. The solid angle of a small area dA measured perpendicular to a radius r is given by

dω = dA / r²     (2.11)

in units of steradians. (The total solid angle of a sphere is 4π.) The irradiance is flux incident on a surface element dA:

E = dΦ / dA   watts/meter²     (2.12)

and the flux exitant from the surface is defined in terms of the radiance L, which is the flux emitted per unit foreshortened surface area per unit solid angle:

L = dΦ / (dA cos θ dω)   watts/(meter² steradian)     (2.13)

where θ is the angle between the surface normal and the direction of emission. Image irradiance f is the "brightness" of the image at a point, and is proportional to scene radiance. A "gray level" is a quantized measurement of image irradiance. Image irradiance depends on the reflective properties of the imaged surfaces as well as on the illumination characteristics. How a surface reflects light depends on its microstructure and physical properties. Surfaces may be matte (dull, flat), specular (mirrorlike), or have more complicated reflectivity characteristics (Section 3.5.1). The reflectance r of a surface is given quite generally by its Bidirectional Reflectance Distribution Function (BRDF) [Nicodemus et al. 1977]. The BRDF is the ratio of reflected radiance in the direction towards the viewer to the irradiance in the direction towards a small area of the source.
Effects of Geometry on an Imaging System

Let us now analyze a simple image-forming system, shown in Fig. 2.4, with the objective of showing how the gray levels are related to the radiance of imaged objects. Following [Horn and Sjoberg 1978], assume that the imaging device is properly focused; rays originating in the infinitesimal area dA_o on the object's surface are projected into some area dA_p in the image plane, and no rays from other portions of the object's surface reach this area of the image. The system is assumed to be an ideal one, obeying the laws of simple geometrical optics.
The energy flux per unit area that impinges on the sensor is defined to be E_p. To show how E_p is related to the scene radiance L, first consider the flux arriving at the lens from a small surface area dA_o. From (2.13) this is given as

dΦ = dA_o ∫ L cos θ dω     (2.14)

This flux is assumed to arrive at an area dA_p in the imaging plane. Hence the irradiance is given by [using Eq. (2.12)]

E_p = dΦ / dA_p     (2.15)

Now relate dA_o to dA_p by equating the respective solid angles as seen from the lens; that is [making use of Eq. (2.11)],
Fig. 2.4 Geometry of an image-forming system.
dA_o cos θ / f_o² = dA_p cos α / f_p²     (2.16)

Substituting Eqs. (2.16) and (2.14) into (2.15) gives

E_p = cos α (f_o / f_p)² ∫ L dω     (2.17)

The integral is over the solid angle seen by the lens. In most instances we can assume that L is constant over this angle and hence can be removed from the integral. Finally, approximate dω by the area of the lens foreshortened by cos α, that is, (π/4)D² cos α divided by the distance f_o/cos α squared:

∫ dω = (π/4) D² cos³ α / f_o²     (2.18)

so that finally

E_p = L (π/4) (D / f_p)² cos⁴ α     (2.19)
The interesting results here are that (1) the image irradiance is proportional to the scene radiance L, and (2) the factor of proportionality includes the fourth power of the off-axis angle α. Ideally, an imaging device should be calibrated so that the variation in sensitivity as a function of α is removed.
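The cos⁴ α factor of Eq. (2.19) is simple to tabulate, and the calibration just mentioned amounts to dividing each measurement by it. A sketch, with illustrative angles:

```python
# Sketch of the cos^4(alpha) off-axis falloff in Eq. (2.19): image
# irradiance drops with the fourth power of cos(alpha), so a calibrated
# device divides each measurement by cos^4(alpha).

import math

def falloff(alpha):
    """Relative irradiance at off-axis angle alpha (radians)."""
    return math.cos(alpha) ** 4

def calibrate(measured, alpha):
    """Undo the cos^4 variation in a measured irradiance."""
    return measured / falloff(alpha)

if __name__ == "__main__":
    for deg in (0, 10, 20, 30):
        a = math.radians(deg)
        print(deg, round(falloff(a), 3))
```

At 30 degrees off axis the sensor already sees only (√3/2)⁴ = 9/16 of the on-axis irradiance, so the correction is far from negligible at wide field angles.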
2.2.4 Spatial Properties
The Fourier Transform

An image is a spatially varying function. One way to analyze spatial variations is the decomposition of an image function into a set of orthogonal functions, one such set being the Fourier (sinusoidal) functions. The Fourier transform may be used to transform the intensity image into the domain of spatial frequency. For notational convenience and intuition, we shall generally use as an example the continuous one-dimensional Fourier transform. The results can readily be extended to the discrete case and also to higher dimensions [Rosenfeld and Kak 1976]. In two dimensions we shall denote transform-domain coordinates by (u, v). The one-dimensional Fourier transform, denoted ℱ, is defined by ℱ[f(x)] = F(u), where

F(u) = ∫ from -∞ to +∞ of f(x) exp(-j2πux) dx     (2.20)

where j = √-1. Intuitively, Fourier analysis expresses a function as a sum of sine waves of different frequency and phase. The Fourier transform has an inverse ℱ⁻¹[F(u)] = f(x). This inverse is given by

f(x) = ∫ from -∞ to +∞ of F(u) exp(j2πux) du     (2.21)

The transform has many useful properties, some of which are summarized in Table 2.1. Common one-dimensional Fourier transform pairs are shown in Table 2.2.
The transform F(u) is simply another representation of the image function. Its meaning can be understood by interpreting Eq. (2.21) for a specific value of x, say x0:

f(x0) = ∫ F(u) exp(j2πux0) du     (2.22)

This equation states that a particular point in the image can be represented by a weighted sum of complex exponentials (sinusoidal patterns) at different spatial frequencies u. F(u) is thus a weighting function for the different frequencies. Low spatial frequencies account for the "slowly" varying gray levels in an image, such as the variation of intensity over a continuous surface. High-frequency components are associated with "quickly varying" information, such as edges. Figure 2.5 shows the Fourier transform of an image of rectangles, together with the effects of removing low- and high-frequency components.
The Fourier transform is defined above to be a continuous transform. Although it may be performed instantly by optics, a discrete version of it, the "fast Fourier transform," is almost universally used in image processing and computer vision. This is because of the relative versatility of manipulating the transform in the digital domain as compared to the optical domain. Image processing texts, e.g., [Pratt 1978; Gonzalez and Wintz 1977], discuss the FFT in some detail; we content ourselves with an algorithm for it (Appendix 1).
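For intuition, the discrete transform can be written directly from the definition (2.20). This is the O(N²) sum, not the fast O(N log N) algorithm of Appendix 1, but it computes the same values:

```python
# Sketch of the discrete Fourier transform, following Eq. (2.20) directly:
# F(u) = sum over x of f(x) exp(-j 2 pi u x / N). O(N^2) definition form;
# the FFT computes identical values faster.

import cmath

def dft(f):
    n = len(f)
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * u * x / n)
                for x in range(n))
            for u in range(n)]

def idft(F):
    n = len(F)
    return [sum(F[u] * cmath.exp(2j * cmath.pi * u * x / n)
                for u in range(n)) / n
            for x in range(n)]

if __name__ == "__main__":
    signal = [1.0, 2.0, 3.0, 4.0]
    recovered = idft(dft(signal))
    print([round(v.real, 6) for v in recovered])  # [1.0, 2.0, 3.0, 4.0]
```

The u = 0 ("zero-frequency") coefficient is just the sum of the samples, matching the intuition that low frequencies carry the slowly varying average intensity.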
The Convolution Theorem

Convolution is a very important image-processing operation, and is a basic operation of linear systems theory. The convolution of two functions f and g is a function h of a displacement y defined as

h(y) = f * g = ∫ f(x) g(y - x) dx     (2.23)
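In discrete form, Eq. (2.23) becomes a shift-multiply-sum loop; a sketch for finite one-dimensional signals:

```python
# Sketch of discrete 1-D convolution, the discrete analogue of Eq. (2.23):
# h(y) = sum over x of f(x) g(y - x). One function is "swept past" the
# other, multiplying and accumulating at each displacement.

def convolve(f, g):
    n = len(f) + len(g) - 1
    h = [0.0] * n
    for y in range(n):
        for x in range(len(f)):
            if 0 <= y - x < len(g):
                h[y] += f[x] * g[y - x]
    return h

if __name__ == "__main__":
    f = [1.0, 2.0, 3.0]
    g = [0.5, 0.5]          # a simple two-sample smoothing kernel
    print(convolve(f, g))   # [0.5, 1.5, 2.5, 1.5]
```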
Table 2.1  PROPERTIES OF THE FOURIER TRANSFORM

With f(x) and g(x) in the spatial domain, F(u) = ℱ[f(x)] and G(u) = ℱ[g(x)] in the frequency domain:

(1) Linearity: c1 f(x) + c2 g(x), c1, c2 scalars  ↔  c1 F(u) + c2 G(u)
(2) Scaling: f(ax)  ↔  (1/|a|) F(u/a)
(3) Shifting: f(x - x0)  ↔  exp(-j2πux0) F(u)
(4) Symmetry: F(x)  ↔  f(-u)
(5) Conjugation: f*(x)  ↔  F*(-u)
(6) Convolution: h(x) = f * g = ∫ f(x') g(x - x') dx'  ↔  F(u) G(u)
(7) Differentiation: dⁿf(x)/dxⁿ  ↔  (j2πu)ⁿ F(u)

Parseval's theorem:

∫ |f(x)|² dx = ∫ |F(u)|² du
∫ f(x) g*(x) dx = ∫ F(u) G*(u) du

Symmetry properties, f(x) ↔ F(u):

Real (R)  ↔  real part even (RE), imaginary part odd (IO)
Imaginary (I)  ↔  RO, IE
RE, IO  ↔  R
RE  ↔  RE
RO  ↔  IO
IE  ↔  IE
IO  ↔  RO
Complex even (CE)  ↔  CE
Complex odd (CO)  ↔  CO
Table 2.2  FOURIER TRANSFORM PAIRS

f(x)  ↔  F(u):

Rectangle function Rect(x)  ↔  Sinc(u)
Triangle function  ↔  Sinc²(u)
Gaussian exp(-πx²)  ↔  Gaussian exp(-πu²)
Unit impulse δ(x)  ↔  1
Unit step  ↔  (1/2) δ(u) + 1/(j2πu)
Comb function Σ_n δ(x - n x0)  ↔  (1/x0) Σ_n δ(u - n/x0)
cos 2πω0x  ↔  (1/2)[δ(u - ω0) + δ(u + ω0)]
sin 2πω0x  ↔  (1/2j)[δ(u - ω0) - δ(u + ω0)]   (imaginary)
Intuitively, one function is "swept past" (in one dimension) or "rubbed over" (in two dimensions) the other. The value of the convolution at any displacement is the integral of the product of the (relatively displaced) function values. One common phenomenon that is well expressed by a convolution is the formation of an image by an optical system. The system (say a camera) has a "point-spread function," which is the image of a single point. (In linear systems theory, this is the "impulse response," or response to a delta function input.) The ideal point-spread function is, of course, a point. A typical point-spread function is a two-dimensional Gaussian spatial distribution of intensities, but may include such phenomena as diffraction rings. In any event, if the camera is modeled as a linear system (ignoring the added complexity that the point-spread function usually varies over the field of view), the image is the convolution of the point-spread function and the input signal. The point-spread function is rubbed over the perfect input image, thus blurring it.

Fig. 2.5 (a) An image, f(x, y). (b) A rotated version of (a), filtered to enhance high spatial frequencies. (c) Similar to (b), but filtered to enhance low spatial frequencies. (d), (e), and (f) show the logarithm of the power spectrum of (a), (b), and (c). The power spectrum is the log squared modulus of the Fourier transform F(u, v). Considered in polar coordinates (ρ, θ), points of small ρ correspond to low spatial frequencies ("slowly varying" intensities), large ρ to high spatial frequencies contributed by "fast" variations such as step edges. The power at (ρ, θ) is determined by the amount of intensity variation at the frequency ρ occurring at the angle θ.
Convolution is also a good model for the application of many other linear operators, such as line-detecting templates. It can be used in another guise (called correlation) to perform matching operations (Chapter 3) which detect instances of subimages or features in an image.
In the spatial domain, the obvious implementation of the convolution operation involves a shift-multiply-integrate operation which is hard to do efficiently. However, multiplication and convolution are "transform pairs," so that the calculation of the convolution in one domain (say the spatial) is simplified by first Fourier transforming to the other (the frequency) domain, performing a multiplication, and then transforming back.

The convolution of f and g in the spatial domain is equivalent to the pointwise product of F and G in the frequency domain,

ℱ(f * g) = FG     (2.24)

We shall show this in a manner similar to [Duda and Hart 1973]. First we prove the shift theorem. If the Fourier transform of f(x) is F(u), defined as

F(u) = ∫ f(x) exp(-j2πux) dx     (2.25)

then

ℱ[f(x - a)] = ∫ f(x - a) exp(-j2πux) dx     (2.26)

Changing variables so that x' = x - a and dx = dx',

ℱ[f(x - a)] = ∫ f(x') exp{-j2π[u(x' + a)]} dx'     (2.27)

Now exp[-j2πu(x' + a)] = exp(-j2πua) exp(-j2πux'), where the first term is a constant. This means that

ℱ[f(x - a)] = exp(-j2πua) F(u)   (shift theorem)

Now we are ready to show that ℱ[f(x) * g(x)] = F(u)G(u):

ℱ(f * g) = ∫∫ f(x) g(y - x) exp(-j2πuy) dx dy     (2.28)

= ∫ f(x) {∫ g(y - x) exp(-j2πuy) dy} dx     (2.29)

Recognizing that the term in braces represents ℱ[g(y - x)] and applying the shift theorem, we obtain

ℱ(f * g) = ∫ f(x) exp(-j2πux) G(u) dx     (2.30)

= F(u)G(u)     (2.31)
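The result can be checked numerically in the discrete (circular) case, where the same relationship holds between the DFT of a circular convolution and the pointwise product of the DFTs. A sketch with arbitrary test signals:

```python
# Numerical check of the convolution theorem, Eq. (2.24), in discrete
# form: DFT(f circularly convolved with g) equals DFT(f) * DFT(g)
# pointwise, for equal-length signals.

import cmath

def dft(f):
    n = len(f)
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * u * x / n)
                for x in range(n))
            for u in range(n)]

def circular_convolve(f, g):
    n = len(f)
    return [sum(f[x] * g[(y - x) % n] for x in range(n)) for y in range(n)]

def max_error(f, g):
    """Largest difference between the two sides of Eq. (2.24)."""
    lhs = dft(circular_convolve(f, g))
    rhs = [a * b for a, b in zip(dft(f), dft(g))]
    return max(abs(a - b) for a, b in zip(lhs, rhs))

if __name__ == "__main__":
    f = [1.0, 2.0, 3.0, 4.0]
    g = [0.0, 1.0, 0.5, 0.25]
    print(max_error(f, g) < 1e-9)  # True
```

This is exactly why the FFT route (transform, multiply, transform back) computes convolutions so much faster than the direct shift-multiply-sum loop.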
2.2.5 Color
Not all images are monochromatic; in fact, applications using multispectral images are becoming increasingly common (Section 2.3.2). Further, human beings intuitively feel that color is an important part of their visual experience, and is useful or even necessary for powerful visual processing in the real world. Color vision provides a host of research issues, both for psychology and computer vision. We briefly discuss two aspects of color vision: color spaces and color perception. Several models of the human visual system not only include color but have proven useful in applications [Granrath 1981].
Color Spaces

Color spaces are a way of organizing the colors perceived by human beings. It happens that weighted combinations of stimuli at three principal wavelengths are sufficient to define almost all the colors we perceive. These wavelengths form a natural basis or coordinate system from which the color measurement process can be described. Color perception is not related in a simple way to color measurement, however.
Color is a perceptual phenomenon related to human response to different wavelengths in the visible electromagnetic spectrum [400 (blue) to 700 (red) nanometers; a nanometer (nm) is 10⁻⁹ meter]. The sensation of color arises from the sensitivities of three types of neurochemical sensors in the retina to the visible spectrum. The relative response of these sensors is shown in Fig. 2.6. Note that each sensor responds to a range of wavelengths. The illumination source has its own spectral composition f(λ), which is modified by the reflecting surface. Let r(λ) be this reflectance function. Then the measurement R produced by the "red" sensor is given by

R = ∫ f(λ) r(λ) h_R(λ) dλ     (2.32)

So the sensor output is actually the integral of three different wavelength-dependent components: the source f, the surface reflectance r, and the sensor h_R.
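Equation (2.32) can be discretized as a simple sum over sampled wavelengths. In this sketch the source, reflectance, and sensor-response curves are all made-up illustrative shapes, not measured human data:

```python
# Sketch of the sensor measurement of Eq. (2.32), discretized over sampled
# wavelengths: R ~= sum of f(l) * r(l) * h_R(l) * dl. All curves below are
# hypothetical illustrations, not real retinal sensitivities.

def sensor_measurement(source, reflectance, response, wavelengths, dl):
    """Discrete approximation to R = integral of f(l) r(l) h_R(l) dl."""
    return sum(source(l) * reflectance(l) * response(l) * dl
               for l in wavelengths)

def flat_source(l):        # equal-energy illuminant (hypothetical)
    return 1.0

def reddish_surface(l):    # reflects long wavelengths strongly (hypothetical)
    return 1.0 if l >= 600 else 0.2

def h_red(l):              # sensor peaked near 650 nm (hypothetical shape)
    return max(0.0, 1.0 - abs(l - 650) / 100.0)

def h_blue(l):             # sensor peaked near 450 nm (hypothetical shape)
    return max(0.0, 1.0 - abs(l - 450) / 100.0)

if __name__ == "__main__":
    wavelengths = range(400, 701, 10)   # nm, 400 (blue) to 700 (red)
    R = sensor_measurement(flat_source, reddish_surface, h_red,
                           wavelengths, 10.0)
    B = sensor_measurement(flat_source, reddish_surface, h_blue,
                           wavelengths, 10.0)
    print(R > B)  # the "red" sensor responds more strongly: True
```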
Surprisingly,onlyweightedcombinationsofthreedeltafunction approximationstothedifferentf(k)h 00 , thatis,8(A/?), 8(AG),and8(XB),arenecessaryto
i 1
1 P
1
a ^ /
400 500 600Wavelength,nm
700 Fig. 2.6 Spectral responseof humancolor sensors.
Sec.2.2 Image Model 31
-
producethesensationofnearlyallthecolors.Thisresultisdisplayedonachromaticitydiagram.Suchadiagramisobtainedbyfirstnormalizingthethreesensormeasurements:
r = R / (R + G + B)
g = G / (R + G + B)   (2.33)
b = B / (R + G + B)
and then plotting perceived color as a function of any two (usually red and green). Chromaticity explicitly ignores intensity or brightness; it is a section through the three-dimensional color space (Fig. 2.7). The choice of (λ_R, λ_G, λ_B) = (410, 530, 650) nm maximizes the realizable colors, but some colors still cannot be realized since they would require negative values for some of r, g, and b.
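Eq. (2.33) is a simple normalization; a minimal sketch, with the singular black point R = G = B = 0 guarded explicitly:

```python
def chromaticity(R, G, B):
    """Normalized chromaticity coordinates r, g, b of Eq. (2.33).

    Each coordinate is the sensor measurement divided by the total
    R + G + B, so r + g + b == 1 and intensity is factored out.
    """
    total = R + G + B
    if total == 0:
        raise ValueError("chromaticity undefined for black (R = G = B = 0)")
    return R / total, G / total, B / total

r, g, b = chromaticity(0.6, 0.3, 0.1)
```

Plotting (r, g) for all realizable colors traces out the chromaticity diagram of Fig. 2.7.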
Another more intuitive way of visualizing the possible colors from the RGB space is to view these measurements as Euclidean coordinates. Here any color can be visualized as a point in the unit cube. Other coordinate systems are useful for different applications; computer graphics has proved a strong stimulus for investigation of different color-space bases.
Color Perception

Color perception is complex, but the essential step is a transformation of three input intensity measurements into another basis. The coordinates of the new
Fig. 2.7 (a) An artist's conception of the chromaticity diagram (see color insert); (b) a more useful depiction. Spectral colors range along the curved boundary; the straight boundary is the line of purples.
32 Ch. 2 Image Formation
basis are more directly related to human color judgments. Although the RGB basis is good for the acquisition or display of color information, it is not a particularly good basis to explain the perception of colors. Human vision systems can make good judgments about the relative surface reflectance r(λ) despite different illuminating wavelengths; this reflectance seems to be what we mean by surface color.
Another important feature of the color basis is revealed by an ability to perceive in "black and white," effectively deriving intensity information from the color measurements. From an evolutionary point of view, we might expect that color perception in animals would be compatible with preexisting noncolor perceptual mechanisms.
These two needs, the need to make good color judgments and the need to retain and use intensity information, imply that we use a transformed, non-RGB basis for color space. Of the different bases in use for color vision, all are variations on this theme: intensity forms one dimension and color is a two-dimensional subspace. The differences arise in how the color subspace is described. We categorize such bases into two groups.
1. Intensity/Saturation/Hue (IHS). In this basis, we compute intensity as

intensity := R + G + B   (2.34)
The saturation measures the lack of whiteness in the color. Colors such as "fire-engine" red and "grass" green are saturated; pastels (e.g., pinks and pale blues) are desaturated. Saturation can be computed from RGB coordinates by the formula [Tenenbaum and Weyl 1975]
saturation := 1 - 3 min(R, G, B) / intensity   (2.35)

Hue is roughly proportional to the average wavelength of the color. It can be defined using RGB by the following program fragment:
hue := cos^-1 { (1/2)[(R - G) + (R - B)] / √((R - G)² + (R - B)(G - B)) }   (2.36)
if B > G then hue := 2π - hue

The IHS basis transforms the RGB basis in the following way. Thinking of the color cube, the diagonal from the origin to (1, 1, 1) becomes the intensity axis. Saturation is the distance of a point from that axis and hue is the angle with regard to the point about that axis from some reference (Fig. 2.8).
This basis is essentially that used by artists [Munsell 1939], who term saturation chroma. Also, this basis has been used in graphics [Smith 1978; Joblove and Greenberg 1978].
One problem with the IHS basis, particularly as defined by (2.34) through (2.36), is that it contains essential singularities where it is impossible to define the color in a consistent manner [Kender 1976]. For example, hue has an essential singularity for all values of (R, G, B) where R = G = B. This means that special care must be taken in algorithms that use hue.
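Eqs. (2.34) through (2.36) can be collected into one conversion routine. This sketch adds explicit guards for the singular cases noted above; returning None for an undefined hue is our convention, not the book's:

```python
import math

def rgb_to_ihs(R, G, B):
    """Intensity, hue, saturation per Eqs. (2.34)-(2.36).

    Guards the essential singularities: saturation needs intensity > 0,
    and hue is undefined (returned as None) when R = G = B.
    """
    intensity = R + G + B                              # Eq. (2.34)
    if intensity == 0:
        return 0.0, None, 0.0
    saturation = 1 - 3 * min(R, G, B) / intensity      # Eq. (2.35)
    denom = math.sqrt((R - G) ** 2 + (R - B) * (G - B))
    if denom == 0:                                     # R = G = B: hue singular
        return intensity, None, saturation
    arg = 0.5 * ((R - G) + (R - B)) / denom            # Eq. (2.36)
    hue = math.acos(max(-1.0, min(1.0, arg)))          # clamp for rounding
    if B > G:
        hue = 2 * math.pi - hue
    return intensity, hue, saturation

i, h, s = rgb_to_ihs(1.0, 0.0, 0.0)   # pure red: hue 0, saturation 1
```

Pure green and pure blue come out at hue 2π/3 and 4π/3 respectively, matching the angular picture of Fig. 2.8.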
2. Opponent processes. The opponent-process basis uses Cartesian rather than
Fig. 2.8 An IHS color space. (a) Cross-section at one intensity; (b) cross-section at one hue (see color inserts).
cylindrical coordinates for the color subspace, and was first proposed by Hering [Teevan and Birney 1961]. The simplest form of basis is a linear transformation from R, G, B coordinates. The new coordinates are termed "R - G", "Bl - Y", and "W - Bk":
[ R - G  ]   [  1  -2   1 ] [ R ]
[ Bl - Y ] = [ -1  -1   2 ] [ G ]
[ W - Bk ]   [  1   1   1 ] [ B ]

The advocates of this representation, such as [Hurvich and Jameson 1957], theorize that this basis has neurological correlates and is in fact the way human beings represent ("name") colors. For example, in this basis it makes sense to talk about a "reddish blue" but not a "reddish green." Practical opponent-process models usually have more complex weights in the transform matrix to account for psychophysical data. Some startling experiments [Land 1977] show our ability to make correct color judgments even when the illumination consists of only two principal wavelengths. The opponent process, at the level at which we have developed it, does not demonstrate how such judgments are made, but does show how stimulus at only two wavelengths will project into the color subspace. Readers interested in the details of the theory should consult the references.

Commercial television transmission needs an intensity, or "W - Bk," component for black-and-white television sets while still spanning the color space. The National Television Systems Committee (NTSC) uses a "YIQ" basis extracted from RGB via
[ I ]   [ 0.60  -0.28  -0.32 ] [ R ]
[ Q ] = [ 0.21  -0.52   0.31 ] [ G ]
[ Y ]   [ 0.30   0.59   0.11 ] [ B ]

This basis is a weighted form of (I, Q, Y) = ("R - cyan," "magenta - green," "W - Bk").
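Both changes of basis are plain matrix-vector products. In this sketch the sign conventions follow the standard opponent-process and NTSC definitions, and should be treated as assumptions rather than values quoted from the text:

```python
import numpy as np

# Opponent-process basis: rows give R-G, Bl-Y, and W-Bk as
# linear combinations of R, G, B (standard sign convention assumed).
OPPONENT = np.array([[ 1, -2,  1],
                     [-1, -1,  2],
                     [ 1,  1,  1]])

# NTSC YIQ basis: rows give I, Q, Y (standard NTSC coefficients).
YIQ = np.array([[0.60, -0.28, -0.32],
                [0.21, -0.52,  0.31],
                [0.30,  0.59,  0.11]])

rgb = np.array([1.0, 1.0, 1.0])    # white
opponent = OPPONENT @ rgb          # color channels vanish for gray input
iqy = YIQ @ rgb                    # I = Q = 0, Y = 1 for white
```

For any gray input the two color coordinates are zero in both bases, which is exactly the "intensity plus two-dimensional color subspace" structure described above.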
2.2.6 Digital Images
The digital images with which computer vision deals are represented by m-vector discrete-valued image functions f(x), usually of one, two, three, or four dimensions.

Usually m = 1, and both the domain and range of f(x) are discrete. The domain of f is finite, usually a rectangle, and the range of f is positive and bounded: 0 ≤ f(x) ≤ M for some maximum gray level M.
Fig. 2.9 Using different numbers of samples. (a) N = 16; (b) N = 32; (c) N = 64; (d) N = 128; (e) N = 256; (f) N = 512.
Table 2.3

NUMBER OF 8-BIT BYTES OF STORAGE FOR VARIOUS VALUES OF N AND m

                            N
  m       32      64     128      256      512
  1      128     512   2,048    8,192   32,768
  2      256   1,024   4,096   16,384   65,536
  3      512   2,048   8,192   32,768  131,072
  4      512   2,048   8,192   32,768  131,072
  5    1,024   4,096  16,384   65,536  262,144
  6    1,024   4,096  16,384   65,536  262,144
  7    1,024   4,096  16,384   65,536  262,144
  8    1,024   4,096  16,384   65,536  262,144
computer vision to have to balance the desire for increased resolution (both gray-scale and spatial) against its cost. Better data can often make algorithms easier to write, but a small amount of data can make processing more efficient. Of course, the image domain, choice of algorithms, and image characteristics all heavily influence the choice of resolutions.
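The entries of Table 2.3 follow from packing each pixel's m bits into the next power-of-two bit width (1, 2, 4, or 8 bits per pixel). The packing rule is inferred from the table rather than stated in the text, so treat this as a sketch:

```python
def storage_bytes(N, m):
    """8-bit bytes needed for an N x N image with m bits per pixel,
    assuming pixels are packed at 1-, 2-, 4-, or 8-bit boundaries
    (this assumption reproduces the entries of Table 2.3)."""
    packed_bits = next(b for b in (1, 2, 4, 8) if b >= m)
    return N * N * packed_bits // 8

bytes_needed = storage_bytes(256, 6)   # the 65,536 entry of Table 2.3
```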
Tesselations and Distance Metrics

Although the spatial samples for f(x) can be represented as points, it is more satisfying to the intuition, and a closer approximation to the acquisition process, to think of these samples as finite-sized cells of constant gray level partitioning the image. These cells are termed pixels, an acronym for picture elements. The pattern into which the plane is divided is called its tesselation. The most common regular tesselations of the plane are shown in Fig. 2.11.
Although rectangular tesselations are almost universally used in computer vision, they have a structural problem known as the "connectivity paradox." Given a pixel in a rectangular tesselation, how should we define the pixels to which it is connected? Two common ways are four-connectivity and eight-connectivity, shown in Fig. 2.12.
However, each of these schemes has complications. Consider Fig. 2.12c, consisting of a black object with a hole on a white background. If we use four-connectedness, the figure consists of four disconnected pieces, yet the hole is separated from the "outside" background. Alternatively, if we use eight-connectedness, the figure is one connected piece, yet the hole is now connected to the outside. This paradox poses complications for many geometric algorithms. Triangular and hexagonal tesselations do not suffer from connectivity difficulties (if we use three-connectedness for triangles); however, distance can be more difficult to compute on these arrays than for rectangular arrays.
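The paradox can be demonstrated directly by counting connected components of a small diagonal ring under each neighborhood definition. The breadth-first labeling below is a generic component counter sketched for this purpose, not an algorithm from the text:

```python
from collections import deque

N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]

def count_components(img, value, neighbors):
    """Count connected components of pixels equal to `value`
    under the given neighborhood (N4 or N8)."""
    rows, cols = len(img), len(img[0])
    seen, count = set(), 0
    for r in range(rows):
        for c in range(cols):
            if img[r][c] != value or (r, c) in seen:
                continue
            count += 1
            queue = deque([(r, c)])
            seen.add((r, c))
            while queue:
                cr, cc = queue.popleft()
                for dr, dc in neighbors:
                    nr, nc = cr + dr, cc + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and img[nr][nc] == value
                            and (nr, nc) not in seen):
                        seen.add((nr, nc))
                        queue.append((nr, nc))
    return count

# A diagonal ring of black (1) pixels around a white (0) hole,
# in the spirit of Fig. 2.12c.
img = [[0, 0, 0, 0, 0],
       [0, 0, 1, 0, 0],
       [0, 1, 0, 1, 0],
       [0, 0, 1, 0, 0],
       [0, 0, 0, 0, 0]]
```

Under four-connectedness the ring splits into four black pieces while the hole stays a separate white component; under eight-connectedness the ring is one piece but the hole merges with the outside background.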
The distance between two pixels in an image is an important measure that is fundamental to many algorithms. In general, a distance d is a metric. That is,
Fig. 2.10 Using different numbers of bits per sample. (a) m = 1; (b) m = 2; (c) m = 4; (d) m = 8.
(1) d(x, y) = 0 iff x = y
(2) d(x, y) = d(y, x)
(3) d(x, y) + d(y, z) ≥ d(x, z)
For square arrays with unit spacing between pixels, we can use any of the following common distance metrics (Fig. 2.13) for two pixels x = (x1, y1) and y = (x2, y2):
Euclidean:

d_e(x, y) = √((x1 - x2)² + (y1 - y2)²)   (2.37)

City block:

d_cb(x, y) = |x1 - x2| + |y1 - y2|   (2.38)
Fig. 2.11 Different tesselations of the image plane. (a) Rectangular; (b) triangular; (c) hexagonal.

Chessboard:

d_ch(x, y) = max(|x1 - x2|, |y1 - y2|)   (2.39)
Other definitions are possible, and all such measures extend to multiple dimensions. The tesselation of higher-dimensional space into pixels usually is confined to (n-dimensional) cubical pixels.
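Eqs. (2.37) through (2.39) translate directly into code (a straightforward sketch):

```python
import math

def d_euclidean(x, y):
    """Eq. (2.37): straight-line distance."""
    return math.sqrt((x[0] - y[0]) ** 2 + (x[1] - y[1]) ** 2)

def d_city_block(x, y):
    """Eq. (2.38): sum of absolute coordinate differences."""
    return abs(x[0] - y[0]) + abs(x[1] - y[1])

def d_chessboard(x, y):
    """Eq. (2.39): maximum absolute coordinate difference."""
    return max(abs(x[0] - y[0]), abs(x[1] - y[1]))

p, q = (0, 0), (3, 4)
e, cb, ch = d_euclidean(p, q), d_city_block(p, q), d_chessboard(p, q)
```

For these two pixels the three metrics give 5.0, 7, and 4 respectively, which is why their equidistant contours in Fig. 2.13 are circles, diamonds, and squares.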
The Sampling Theorem

Consider the one-dimensional "image" shown in Fig. 2.14. To digitize this image one must sample the image function. These samples will usually be separated at regular intervals as shown. How far apart should these samples be to allow reconstruction (to a given accuracy) of the underlying continuous image from its samples? This question is answered by the Shannon sampling theorem. An excellent rigorous presentation of the sampling theorem may be found in [Rosenfeld and Kak 1976]. Here we shall present a shorter graphical interpretation using the results of Table 2.2. For simplicity we consider the image to be periodic in order to avoid small edge effects introduced by the finite image domain. A more rigorous
Fig. 2.12 Connectivity paradox for rectangular tesselations. (a) A central pixel and its 4-connected neighbors; (b) a pixel and its 8-connected neighbors; (c) a figure with ambiguous connectivity.
Fig. 2.13 Equidistant contours for different metrics.
Fig. 2.14 One-dimensional image and its samples.
treatment, which considers these effects, is given in [Andrews and Hunt 1977].
Suppose that the image is sampled with a "comb" function of spacing x0 (see Table 2.2). Then the sampled image can be modeled by
f_s(x) = f(x) Σ_n δ(x - nx0)   (2.40)
where the image function modulates the comb function. Equivalently, this can be written as
f_s(x) = Σ_n f(nx0) δ(x - nx0)   (2.41)
The right-hand side of Eq. (2.40) is the product of two functions, so that property
(6) in Table 2.1 is appropriate. The Fourier transform of f_s(x) is equal to the convolution of the transforms of each of the two functions. Using this result yields
F_s(u) = F(u) * (1/x0) Σ_n δ(u - n/x0)   (2.42)

But from Eq. (2.3),

F(u) * δ(u - n/x0) = F(u - n/x0)   (2.43)

so that

F_s(u) = (1/x0) Σ_n F(u - n/x0)   (2.44)
Therefore, sampling the image function f(x) at intervals of x0 is equivalent in the frequency domain to replicating the transform of f at intervals of 1/x0. This limits the recovery of f(x) from its sampled representation f_s(x). There are two basic situations to consider. If the transform of f(x) is band-limited such that F(u) = 0 for |u| > 1/(2x0), then there is no overlap between successive replications of F(u) in the frequency domain. This is shown for the case of Fig. 2.15a, where we have arbitrarily used a triangular-shaped image transform to illustrate the effects of sampling. Incidentally, note that for this transform F(u) = F(-u) and that it has no imaginary part; from Table 2.2, the one-dimensional image must also be real and even. Now if F(u) is not band-limited, i.e., there are |u| > 1/(2x0) for which F(u) ≠ 0, then components of different replications of F(u) will interact to produce the composite function F_s(u), as shown in Fig. 2.15b. In the first case f(x) can be recovered from F_s(u) by multiplying F_s(u) by a suitable G(u):
G(u) = x0 for |u| < 1/(2x0), and 0 otherwise
Fig. 2.15 (a) F(u) band-limited so that F(u) = 0 for |u| > 1/(2x0); (b) F(u) not band-limited as in (a); (c) reconstructed transform.
the image, or can be incorporated with such blurring (by integrating the image intensity over a finite area for each sample). Image blurring can bury irrelevant details, reduce certain forms of noise, and also reduce the effects of aliasing.
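The replication of F(u) at intervals 1/x0 is what makes a sinusoid above the Nyquist limit indistinguishable at the sample points from a lower-frequency one. A minimal numerical sketch (the frequencies are chosen for illustration):

```python
import numpy as np

x0 = 0.1                      # sample spacing; Nyquist limit is 1/(2*x0) = 5
n = np.arange(32)             # sample indices

def sample_sine(freq):
    """Samples of sin(2*pi*freq*x) taken at x = n*x0."""
    return np.sin(2 * np.pi * freq * n * x0)

# A 7-cycle/unit sinusoid exceeds the Nyquist limit and produces the
# same samples (up to sign) as a 3-cycle/unit one, since 7 = 1/x0 - 3:
low = sample_sine(3.0)
high = sample_sine(7.0)
aliased = np.allclose(high, -low)
```

Once sampled, the two frequencies cannot be told apart, which is exactly the overlap of replications shown in Fig. 2.15b.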
2.3 IMAGING DEVICES FOR COMPUTER VISION
There is a vast array of methods for obtaining a digital image in a computer. In this section we have in mind only "traditional" images produced by various forms of radiation impinging on a sensor after having been affected by physical objects.
Many sensors are best modeled as an analog device whose response must be digitized for computer representation. The types of imaging devices possible are limited only by the technical ingenuity of their developers; attempting a definitive
42 Ch. 2 Image Formation
-
Fig. 2.16 Imaging devices (boxes), information structures (rectangles), and processes (circles).
taxonomy is probably unwise. Figure 2.16 is a flowchart of devices, information structures, and processes addressed in this and succeeding sections.
When the image already exists in some form, or physical considerations limit choice of imaging technology, the choice of digitizing technology may still be open. Most images are carried on a permanent medium, such as film, or at least are available in (essentially) analog form to a digitizing device. Generally, the relevant technical characteristics of imaging or digitizing devices should be foremost in mind when a technique is being selected. Such considerations as the signal-to-noise ratio of the device, its resolution, the speed at which it works, and its expense are important issues.
2.3.1 Photographic Imaging
The camera is the most familiar producer of optical images on a permanent medium. We shall not address here the multitudes of still and movie camera options; rather, we briefly treat the characteristics of the photographic film and of the digitizing devices that convert the image to machine-readable form. More on these topics is well presented in the References.
Photographic (black-and-white) film consists of an emulsion of silver halide crystals on a film base. (Several other layers are identifiable, but are not essential to an understanding of the relevant properties of film.) Upon exposure to light, the silver halide crystals form development centers, which are small grains of metallic silver. The photographic development process extends the formation of metallic silver to the entire silver halide crystal, which thus becomes a binary ("light" or "no light") detector. Subsequent processing removes undeveloped silver halide. The resulting film negative is dark where many crystals were developed and light where few were. The resolution of the film is determined by the grain size, which depends on the original halide crystals and on development techniques. Generally, the faster the film (the less light needed to expose it), the coarser the grain. Film exists that is sensitive to infrared radiation; x-ray film typically has two emulsion layers, giving it more gray-level range than that of normal film.
A repetition of the negative-forming process is used to obtain a photographic print. The negative is projected onto photographic paper, which responds roughly in the same way as the negative. Most photographic print paper cannot capture in one print the range of densities that can be present in a negative. Positive films do exist that do not require printing; the most common example is color slide film.
The response of film to light is not completely linear. The photographic density obtained by a negative is defined as the logarithm (base 10) of the ratio of incident light to transmitted light:
D = log10 (incident light / transmitted light)
The exposure of a negative dictates (approximately) its response. Exposure is defined as the energy per unit area that exposed the film (in its sensitive spectral range). Thus exposure is the product of the intensity and the time of exposure. This mathematical model of the behavior of the photographic exposure process is correct for a wide operating range of the film, but reciprocity failure effects in the film keep one from being able always to trade light level for exposure time. At very low light levels, longer exposure times are needed than are predicted by the product rule.
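The density and exposure definitions are direct to compute (a trivial sketch; the 100:1 transmission ratio and the exposure values are illustrative, not from the text):

```python
import math

def photographic_density(incident, transmitted):
    """D = log10(incident / transmitted). A negative transmitting
    1% of the incident light has density 2."""
    return math.log10(incident / transmitted)

def exposure(intensity, time):
    """Exposure = intensity x time; valid only within the film's
    reciprocity range, as noted above."""
    return intensity * time

d = photographic_density(100.0, 1.0)
```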
The response of film to light is usually plotted in an "H & D curve" (named for Hurter and Driffield), which plots density versus exposure. The H & D curve of film displays many of its important characteristics. Figure 2.17 exhibits a typical H & D curve for a black-and-white film.
The toe of the curve is the lower region of low slope. It expresses reciprocity failure and the fact that the film has a certain bias, or fog response, which dominates its behavior at the lowest exposure levels. As one would expect, there is an upper limit to the density of the film, attained when a maximum number of silver
Fig. 2.17 Typical H & D curve (density versus log exposure).
halide crystals are rendered developable. Increasing exposure beyond this maximum level has little effect, accounting for the shoulder in the H & D curve, or its flattened upper end.
In between the toe and shoulder, there is typically a linear operating region of the curve. High-contrast films are those with high slope (traditionally called gamma); they respond dramatically to small changes in exposure. A high-contrast film may have a gamma between about 1.5 and 10. Films with gammas of approximately 10 are used in graphic arts to copy line drawings. General-purpose films have gammas of about 0.5 to 1.0.
The resolution of a general film is about 40 lines/mm, which means that a 1400 x 1400 image may be digitized from a 35mm slide. At any greater sampling frequency, the individual film grains will occupy more than a pixel, and the resolution will thus be grain-limited.
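The 1400 x 1400 figure is a back-of-envelope product, assuming a nominal 35 mm frame dimension and one sample per resolvable line:

```python
resolution_lines_per_mm = 40   # typical general-purpose film, per the text
frame_mm = 35                  # nominal 35mm slide dimension (assumption)

max_samples = resolution_lines_per_mm * frame_mm
```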
Image Digitizers (Scanners)

Accuracy and speed are the main considerations in converting an image on film into digital form. Accuracy has two aspects: spatial resolution, loosely the level of image spatial detail to which the digitizer can respond, and gray-level resolution, defined generally as the range of densities or reflectances to which the digitizer responds and how finely it divides the range. Speed is also important because usually many data are involved; images of 1 million samples are commonplace.
Digitizers broadly take two forms: mechanical and "flying spot." In a mechanical digitizer, the film and a sensing assembly are mechanically transported past one another while readings are made. In a flying-spot digitizer, the film and sensor are static. What moves is the "flying spot," which is a point of light on the face of a cathode-ray tube, or a laser beam directed by mirrors. In all digitizers a very narrow beam of light is directed through the film or onto the print at a known coordinate point. The light transmittance or reflectance is measured, transformed from analog to digital form, and made available to the computer through interfacing electronics. The location on the medium where density is being measured may also be transmitted with each reading, but it is usually determined by relative offset from positions transmitted less frequently. For example, a "new scan line" impulse is transmitted for TV output; the position along the current scan line yields an x position, and the number of scan lines yields a y position.
The mechanical scanners are mostly of two types, flat-bed and drum. In a flat-bed digitizer, the film is laid flat on a surface over which the light source and the sensor (usually a very accurate photoelectric cell) are transported in a raster fashion. In a drum digitizer, the film is fastened to a circular drum which revolves as the sensor and light source are transported down the drum parallel to its axis of rotation.
Color mechanical digitizers also exist; they work by using colored filters, effectively extracting in three scans three "color overlays" which when superimposed would yield the original color image. Extracting some "composite" color signal with one reading presents technical problems and would be difficult to do as accurately.
Satellite Imagery

LANDSAT and ERTS (Earth Resources Technology Satellites) have similar scanners which produce images of 2340 x 3380 7-bit pixels in four spectral bands, covering an area of 100 x 100 nautical miles. The scanner is mechanical, scanning six horizontal scan lines at a time; the rotation of the earth accounts for the advancement of the scan in the vertical direction.
A set of four images is shown in Fig. 2.18. The four spectral bands are numbered 4, 5, 6, and 7. Band 4 [0.5 to 0.6 μm (green)] accentuates sediment-laden water and shallow water; band 5 [0.6 to 0.7 μm (red)] emphasizes cultural features such as roads and cities; band 6 [0.7 to 0.8 μm (near infrared)] emphasizes vegetation and accentuates the contrast between land and water; band 7 [0.8 to 1.1 μm (near infrared)] is like band 6 except that it is better at penetrating atmospheric haze.
The LANDSAT images are available at nominal cost from the U.S. government (The EROS Data Center, Sioux Falls, South Dakota 57198). They are furnished on tape, and cover the entire surface of the earth (often the buyer has a choice of the amount of cloud cover). These images form a huge data base of multispectral imagery, useful for land-use and geological studies; they furnish something of an image-analysis challenge, since one satellite can produce some 6 billion bits of image data per day.
Television Imaging

Television cameras are appealing devices for computer vision applications for several reasons. For one thing, the image is immediate; the camera can show events as they happen. For another, the image is already in electrical, if not digital, form. "Television camera" is basically a nontechnical term, because many different technologies produce video signals conforming to the standards set by the FCC and NTSC. Cameras exist with a wide variety of technical specifications.
Usually, TV cameras have associated electronics which scan an entire "picture" at a time. This operation is closely related to broadcast and receiver standards, and is more oriented to human viewing than to computer vision. An entire image (of some 525 scan lines in the United States) is called a frame, and consists of two fields, each made up of alternate scan lines from the frame. These fields are generated and transmitted sequentially by the camera electronics. The transmitted image is thus interlaced, with all odd-numbered scan lines being "painted" on the
Fig. 2.18 The stra