Ballard, D. and Brown, C. M. 1982. Computer Vision.
TRANSCRIPT
COMPUTER VISION

Dana H. Ballard
Christopher M. Brown

Department of Computer Science, University of Rochester, Rochester, New York

PRENTICE-HALL, INC., Englewood Cliffs, New Jersey 07632
Library of Congress Cataloging in Publication Data

BALLARD, DANA HARRY.
  Computer vision.
  Bibliography: p.
  Includes index.
  1. Image processing. I. Brown, Christopher M. II. Title.
TA1632.B34  621.38'0414  81-20974  ISBN 0-13-165316-4  AACR2

Cover design by Robin Breite

© 1982 by Prentice-Hall, Inc., Englewood Cliffs, New Jersey 07632

All rights reserved. No part of this book may be reproduced in any form or by any means without permission in writing from the publisher.

Printed in the United States of America

10 9 8 7 6 5 4 3 2

ISBN 0-13-165316-4

PRENTICE-HALL INTERNATIONAL, INC., London
PRENTICE-HALL OF AUSTRALIA PTY. LIMITED, Sydney
PRENTICE-HALL OF CANADA, LTD., Toronto
PRENTICE-HALL OF INDIA PRIVATE LIMITED, New Delhi
PRENTICE-HALL OF JAPAN, INC., Tokyo
PRENTICE-HALL OF SOUTHEAST ASIA PTE. LTD., Singapore
WHITEHALL BOOKS LIMITED, Wellington, New Zealand
Preface xiii

Acknowledgments xv

Mnemonics for Proceedings and Special Collections Cited in References xix

1 COMPUTER VISION
1.1 Achieving Simple Vision Goals 1
1.2 High-Level and Low-Level Capabilities 2
1.3 A Range of Representations 6
1.4 The Role of Computers 9
1.5 Computer Vision Research and Applications 12
Part I GENERALIZED IMAGES 13

2 IMAGE FORMATION
2.1 Images 17
2.2 Image Model 18
  2.2.1 Image Functions, 18
  2.2.2 Imaging Geometry, 19
  2.2.3 Reflectance, 22
  2.2.4 Spatial Properties, 24
  2.2.5 Color, 31
  2.2.6 Digital Images, 35
2.3 Imaging Devices for Computer Vision 42
  2.3.1 Photographic Imaging, 44
  2.3.2 Sensing Range, 52
  2.3.3 Reconstruction Imaging, 56
3 EARLY PROCESSING
3.1 Recovering Intrinsic Structure 63
3.2 Filtering the Image 65
  3.2.1 Template Matching, 65
  3.2.2 Histogram Transformations, 70
  3.2.3 Background Subtraction, 72
  3.2.4 Filtering and Reflectance Models, 73
3.3 Finding Local Edges 75
  3.3.1 Types of Edge Operators, 76
  3.3.2 Edge Thresholding Strategies, 80
  3.3.3 Three-Dimensional Edge Operators, 81
  3.3.4 How Good Are Edge Operators? 83
  3.3.5 Edge Relaxation, 85
3.4 Range Information from Geometry 88
  3.4.1 Stereo Vision and Triangulation, 88
  3.4.2 A Relaxation Algorithm for Stereo, 89
3.5 Surface Orientation from Reflectance Models 93
  3.5.1 Reflectivity Functions, 93
  3.5.2 Surface Gradient, 95
  3.5.3 Photometric Stereo, 98
  3.5.4 Shape from Shading by Relaxation, 99
3.6 Optical Flow 102
  3.6.1 The Fundamental Flow Constraint, 102
  3.6.2 Calculating Optical Flow by Relaxation, 103
3.7 Resolution Pyramids 106
  3.7.1 Gray-Level Consolidation, 106
  3.7.2 Pyramidal Structures in Correlation, 107
  3.7.3 Pyramidal Structures in Edge Detection, 109
Part II SEGMENTED IMAGES 115

4 BOUNDARY DETECTION
4.1 On Associating Edge Elements 119
4.2 Searching Near an Approximate Location 121
  4.2.1 Adjusting A Priori Boundaries, 121
  4.2.2 Non-Linear Correlation in Edge Space, 121
  4.2.3 Divide-and-Conquer Boundary Detection, 122
4.3 The Hough Method for Curve Detection 123
  4.3.1 Use of the Gradient, 124
  4.3.2 Some Examples, 125
  4.3.3 Trading Off Work in Parameter Space for Work in Image Space, 126
  4.3.4 Generalizing the Hough Transform, 128
4.4 Edge Following as Graph Searching 131
  4.4.1 Good Evaluation Functions, 133
  4.4.2 Finding All the Boundaries, 133
  4.4.3 Alternatives to the A Algorithm, 136
4.5 Edge Following as Dynamic Programming 137
  4.5.1 Dynamic Programming, 137
  4.5.2 Dynamic Programming for Images, 139
  4.5.3 Lower Resolution Evaluation Functions, 141
  4.5.4 Theoretical Questions about Dynamic Programming, 143
4.6 Contour Following 143
  4.6.1 Extension to Gray-Level Images, 144
  4.6.2 Generalization to Higher-Dimensional Image Data, 146
5 REGION GROWING
5.1 Regions 149
5.2 A Local Technique: Blob Coloring 151
5.3 Global Techniques: Region Growing via Thresholding
  5.3.1 Thresholding in Multidimensional Space, 153
  5.3.2 Hierarchical Refinement, 155
5.4 Splitting and Merging 155
  5.4.1 State-Space Approach to Region Growing, 157
  5.4.2 Low-Level Boundary Data Structures, 158
  5.4.3 Graph-Oriented Region Structures, 159
5.5 Incorporation of Semantics 160

6 TEXTURE
6.1 What Is Texture? 166
6.2 Texture Primitives 169
6.3 Structural Models of Texel Placement 170
  6.3.1 Grammatical Models, 172
  6.3.2 Shape Grammars, 173
  6.3.3 Tree Grammars, 175
  6.3.4 Array Grammars, 178
6.4 Texture as a Pattern Recognition Problem 181
  6.4.1 Texture Energy, 184
  6.4.2 Spatial Gray-Level Dependence, 186
  6.4.3 Region Texels, 188
6.5 The Texture Gradient 189
7 MOTION
7.1 Motion Understanding 195
  7.1.1 Domain-Independent Understanding, 196
  7.1.2 Domain-Dependent Understanding, 196
7.2 Understanding Optical Flow 199
  7.2.1 Focus of Expansion, 199
  7.2.2 Adjacency, Depth, and Collision, 201
  7.2.3 Surface Orientation and Edge Detection, 202
  7.2.4 Egomotion, 206
7.3 Understanding Image Sequences 207
  7.3.1 Calculating Flow from Discrete Images, 207
  7.3.2 Rigid Bodies from Motion, 210
  7.3.3 Interpretation of Moving Light Displays: A Domain-Independent Approach, 214
  7.3.4 Human Motion Understanding: A Model-Directed Approach, 217
  7.3.5 Segmented Images, 220
Part III GEOMETRICAL STRUCTURES 227

8 REPRESENTATION OF TWO-DIMENSIONAL GEOMETRIC STRUCTURES
8.1 Two-Dimensional Geometric Structures 231
8.2 Boundary Representations 232
  8.2.1 Polylines, 232
  8.2.2 Chain Codes, 235
  8.2.3 The ψ-s Curve, 237
  8.2.4 Fourier Descriptors, 238
  8.2.5 Conic Sections, 239
  8.2.6 B-Splines, 239
  8.2.7 Strip Trees, 244
8.3 Region Representations 247
  8.3.1 Spatial Occupancy Array, 247
  8.3.2 y-Axis, 248
  8.3.3 Quad Trees, 249
  8.3.4 Medial Axis Transform, 252
  8.3.5 Decomposing Complex Areas, 253
8.4 Simple Shape Properties 254
  8.4.1 Area, 254
  8.4.2 Eccentricity, 255
  8.4.3 Euler Number, 255
  8.4.4 Compactness, 256
  8.4.5 Slope Density Function, 256
  8.4.6 Signatures, 257
  8.4.7 Concavity Tree, 258
  8.4.8 Shape Numbers, 258
9 REPRESENTATION OF THREE-DIMENSIONAL STRUCTURES
9.1 Solids and Their Representation 264
9.2 Surface Representations 265
  9.2.1 Surfaces with Faces, 265
  9.2.2 Surfaces Based on Splines, 268
  9.2.3 Surfaces That Are Functions on the Sphere, 270
9.3 Generalized Cylinder Representations 274
  9.3.1 Generalized Cylinder Coordinate Systems and Properties, 275
  9.3.2 Extracting Generalized Cylinders, 278
  9.3.3 A Discrete Volumetric Version of the Skeleton, 279
9.4 Volumetric Representations 280
  9.4.1 Spatial Occupancy, 280
  9.4.2 Cell Decomposition, 281
  9.4.3 Constructive Solid Geometry, 282
  9.4.4 Algorithms for Solid Representations, 284
9.5 Understanding Line Drawings 291
  9.5.1 Matching Line Drawings to Three-Dimensional Primitives, 293
  9.5.2 Grouping Regions Into Bodies, 294
  9.5.3 Labeling Lines, 296
  9.5.4 Reasoning About Planes, 301
Part IV RELATIONAL STRUCTURES 313

10 KNOWLEDGE REPRESENTATION AND USE
10.1 Representations 317
  10.1.1 The Knowledge Base: Models and Processes, 318
  10.1.2 Analogical and Propositional Representations, 319
  10.1.3 Procedural Knowledge, 321
  10.1.4 Computer Implementations, 322
10.2 Semantic Nets 323
  10.2.1 Semantic Net Basics, 323
  10.2.2 Semantic Nets for Inference, 327
10.3 Semantic Net Examples 334
  10.3.1 Frame Implementations, 334
  10.3.2 Location Networks, 335
10.4 Control Issues in Complex Vision Systems 340
  10.4.1 Parallel and Serial Computation, 341
  10.4.2 Hierarchical and Heterarchical Control, 341
  10.4.3 Belief Maintenance and Goal Achievement, 346
11 MATCHING 352
11.1 Aspects of Matching 352
  11.1.1 Interpretation: Construction, Matching, and Labeling, 352
  11.1.2 Matching Iconic, Geometric, and Relational Structures, 353
11.2 Graph-Theoretic Algorithms 355
  11.2.1 The Algorithms, 357
  11.2.2 Complexity, 359
11.3 Implementing Graph-Theoretic Algorithms 360
  11.3.1 Matching Metrics, 360
  11.3.2 Backtrack Search, 363
  11.3.3 Association Graph Techniques, 365
11.4 Matching in Practice 369
  11.4.1 Decision Trees, 370
  11.4.2 Decision Tree and Subgraph Isomorphism, 375
  11.4.3 Informal Feature Classification, 376
  11.4.4 A Complex Matcher, 378
12 INFERENCE 383
12.1 First-Order Predicate Calculus 384
  12.1.1 Clause-Form Syntax (Informal), 384
  12.1.2 Nonclausal Syntax and Logic Semantics (Informal), 385
  12.1.3 Converting Nonclausal Form to Clauses, 387
  12.1.4 Theorem Proving, 388
  12.1.5 Predicate Calculus and Semantic Networks, 390
  12.1.6 Predicate Calculus and Knowledge Representation, 392
12.2 Computer Reasoning 395
12.3 Production Systems 396
  12.3.1 Production System Details, 398
  12.3.2 Pattern Matching, 399
  12.3.3 An Example, 401
  12.3.4 Production System Pros and Cons, 406
12.4 Scene Labeling and Constraint Relaxation 408
  12.4.1 Consistent and Optimal Labelings, 408
  12.4.2 Discrete Labeling Algorithms, 410
  12.4.3 A Linear Relaxation Operator and a Line Labeling Example, 415
  12.4.4 A Nonlinear Operator, 419
  12.4.5 Relaxation as Linear Programming, 420
12.5 Active Knowledge 430
  12.5.1 Hypotheses, 431
  12.5.2 HOW-TO and SO-WHAT Processes, 431
  12.5.3 Control Primitives, 431
  12.5.4 Aspects of Active Knowledge, 433

13 GOAL ACHIEVEMENT 438
13.1 Symbolic Planning 439
  13.1.1 Representing the World, 439
  13.1.2 Representing Actions, 441
  13.1.3 Stacking Blocks, 442
  13.1.4 The Frame Problem, 444
13.2 Planning with Costs 445
  13.2.1 Planning, Scoring, and Their Interaction, 446
  13.2.2 Scoring Simple Plans, 446
  13.2.3 Scoring Enhanced Plans, 451
  13.2.4 Practical Simplifications, 452
  13.2.5 A Vision System Based on Planning, 453
APPENDICES 465

A1 SOME MATHEMATICAL TOOLS 465
A1.1 Coordinate Systems 465
  A1.1.1 Cartesian, 465
  A1.1.2 Polar and Polar Space, 465
  A1.1.3 Spherical and Cylindrical, 466
  A1.1.4 Homogeneous Coordinates, 467
A1.2 Trigonometry 468
  A1.2.1 Plane Trigonometry, 468
  A1.2.2 Spherical Trigonometry, 469
A1.3 Vectors 469
A1.4 Matrices 471
A1.5 Lines 474
  A1.5.1 Two Points, 474
  A1.5.2 Point and Direction, 474
  A1.5.3 Slope and Intercept, 474
  A1.5.4 Ratios, 474
  A1.5.5 Normal and Distance from Origin (Line Equation), 475
  A1.5.6 Parametric, 476
A1.6 Planes 476
A1.7 Geometric Transformations 477
  A1.7.1 Rotation, 477
  A1.7.2 Scaling, 478
  A1.7.3 Skewing, 479
  A1.7.4 Translation, 479
  A1.7.5 Perspective, 479
  A1.7.6 Transforming Lines and Planes, 480
  A1.7.7 Summary, 480
A1.8 Camera Calibration and Inverse Perspective 481
  A1.8.1 Camera Calibration, 482
  A1.8.2 Inverse Perspective, 483
A1.9 Least-Squared-Error Fitting 484
  A1.9.1 Pseudo-Inverse Method, 485
  A1.9.2 Principal Axis Method, 486
  A1.9.3 Fitting Curves by the Pseudo-Inverse Method, 487
A1.10 Conics 488
A1.11 Interpolation 489
  A1.11.1 One-Dimensional, 489
  A1.11.2 Two-Dimensional, 490
A1.12 The Fast Fourier Transform 490
A1.13 The Icosahedron 492
A1.14 Root Finding 493
A2 ADVANCED CONTROL MECHANISMS
A2.1 Standard Control Structures 497
  A2.1.1 Recursion, 498
  A2.1.2 Co-Routining, 498
A2.2 Inherently Sequential Mechanisms 499
  A2.2.1 Automatic Backtracking, 499
  A2.2.2 Context Switching, 500
A2.3 Sequential or Parallel Mechanisms 500
  A2.3.1 Modules and Messages, 500
  A2.3.2 Priority Job Queue, 502
  A2.3.3 Pattern-Directed Invocation, 504
  A2.3.4 Blackboard Systems, 505

AUTHOR INDEX

SUBJECT INDEX
Preface
The dream of intelligent automata goes back to antiquity; its first major articulation in the context of digital computers was by Turing around 1950. Since then, this dream has been pursued primarily by workers in the field of artificial intelligence, whose goal is to endow computers with information-processing capabilities comparable to those of biological organisms. From the outset, one of the goals of artificial intelligence has been to equip machines with the capability of dealing with sensory inputs.
Computer vision is the construction of explicit, meaningful descriptions of physical objects from images. Image understanding is very different from image processing, which studies image-to-image transformations, not explicit description building. Descriptions are a prerequisite for recognizing, manipulating, and thinking about objects.

We perceive a world of coherent three-dimensional objects with many invariant properties. Objectively, the incoming visual data do not exhibit corresponding coherence or invariance; they contain much irrelevant or even misleading variation. Somehow our visual system, from the retinal to cognitive levels, understands, or imposes order on, chaotic visual input. It does so by using intrinsic information that may reliably be extracted from the input, and also through assumptions and knowledge that are applied at various levels in visual processing.

The challenge of computer vision is one of explicitness. Exactly what information about scenes can be extracted from an image using only very basic assumptions about physics and optics? Explicitly, what computations must be performed? Then, at what stage must domain-dependent, prior knowledge about the world be incorporated into the understanding process? How are world models and knowledge represented and used? This book is about the representations and mechanisms that allow image information and prior knowledge to interact in image understanding.
Computer vision is a relatively new and fast-growing field. The first experiments were conducted in the late 1950s, and many of the essential concepts have been developed during the last five years. With this rapid growth, crucial ideas have arisen in disparate areas such as artificial intelligence, psychology, computer graphics, and image processing. Our intent is to assemble a selection of this material in a form that will serve both as a senior/graduate-level academic text and as a useful reference to those building vision systems. This book has a strong artificial intelligence flavor, and we hope this will provoke thought. We believe that both the intrinsic image information and the internal model of the world are important in successful vision systems.
The book is organized into four parts, based on descriptions of objects at four different levels of abstraction.

1. Generalized images: images and image-like entities.
2. Segmented images: images organized into subimages that are likely to correspond to "interesting objects."
3. Geometric structures: quantitative models of image and world structures.
4. Relational structures: complex symbolic descriptions of image and world structures.
The parts follow a progression of increasing abstractness. Although the four parts are most naturally studied in succession, they are not tightly interdependent. Part I is a prerequisite for Part II, but Parts III and IV can be read independently.

Parts of the book assume some mathematical and computing background (calculus, linear algebra, data structures, numerical methods). However, throughout the book mathematical rigor takes a back seat to concepts. Our intent is to transmit a set of ideas about a new field to the widest possible audience.

In one book it is impossible to do justice to the scope and depth of prior work in computer vision. Further, we realize that in a fast-developing field, the rapid influx of new ideas will continue. We hope that our readers will be challenged to think, criticize, read further, and quickly go beyond the confines of this volume.
Acknowledgments
Jerry Feldman and Herb Voelcker (and through them the University of Rochester) provided many resources for this work. One of the most important was a capable and forgiving staff (secretarial, technical, and administrative). For massive text editing, valuable advice, and good humor we are especially grateful to Rose Peet. Peggy Meeker, Jill Orioli, and Beth Zimmerman all helped at various stages.

Several colleagues made suggestions on early drafts: thanks to James Allen, Norm Badler, Larry Davis, Takeo Kanade, John Kender, Daryl Lawton, Joseph O'Rourke, Ari Requicha, Ed Riseman, Azriel Rosenfeld, Mike Schneier, Ken Sloan, Steve Tanimoto, Marty Tenenbaum, and Steve Zucker.

Graduate students helped in many different ways: thanks especially to Michel Denber, Alan Frisch, Lydia Hrechanyk, Mark Kahrs, Keith Lantz, Joe Maleson, Lee Moore, Mark Peairs, Don Perlis, Rick Rashid, Dan Russell, Dan Sabbah, Bob Schudy, Peter Selfridge, Uri Shani, and Bob Tilove. Bernhard Stuth deserves special mention for much careful and critical reading.

Finally, thanks go to Jane Ballard, mostly for standing steadfast through the cycles of elation and depression and for numerous engineering-to-English translations.

As Pat Winston put it: "A willingness to help is not an implied endorsement." The aid of others was invaluable, but we alone are responsible for the opinions, technical details, and faults of this book.

Funding assistance was provided by the Sloan Foundation under Grant 78415, by the National Institutes of Health under Grant HL21253, and by the Defense Advanced Research Projects Agency under Grant N00014-78-C-0164.

The authors wish to credit the following sources for figures and tables. For complete citations given here in abbreviated form (as "from ..." or "after ..."), refer to the appropriate chapter-end references.

Fig. 1.2 from Shani, U., "A 3-D model-driven system for the recognition of abdominal anatomy from CT scans," TR77, Dept. of Computer Science, University of Rochester, May 1980.
Fig. 1.4 courtesy of Allen Hanson and Ed Riseman, COINS Research Project, University of Massachusetts, Amherst, MA.
Fig. 2.4 after Horn and Sjoberg, 1978.
Figs. 2.5, 2.9, 2.10, 3.2, 3.6, and 3.7 courtesy of Bill Lampeter.
Fig. 2.7a painting by Louis Condax; courtesy of Eastman Kodak Company and the Optical Society of America.
Fig. 2.8a courtesy of D. Greenberg and G. Joblove, Cornell Program of Computer Graphics.
Fig. 2.8b courtesy of Tom Check.
Table 2.3 after Gonzalez and Wintz, 1977.
Fig. 2.18 courtesy of EROS Data Center, Sioux Falls, SD.
Figs. 2.19 and 2.20 from Herrick, C.N., Television Theory and Servicing: Black/White and Color, 2nd Ed. Reston, VA: Reston, 1976.
Figs. 2.21, 2.22, 2.23, and 2.24 courtesy of Michel Denber.
Fig. 2.25 from Popplestone et al., 1975.
Fig. 2.26 courtesy of Production Automation Project, University of Rochester.
Fig. 2.27 from Waag and Gramiak, 1976.
Fig. 3.1 courtesy of Marty Tenenbaum.
Fig. 3.8 after Horn, 1974.
Figs. 3.14 and 3.15 after Frei and Chen, 1977.
Figs. 3.17 and 3.18 from Zucker, S.W. and R.A. Hummel, "An optimal 3-D edge operator," IEEE Trans. PAMI-3, May 1981, pp. 324-331.
Fig. 3.19 curves are based on data in Abdou, 1978.
Figs. 3.20, 3.21, and 3.22 from Prager, J.M., "Extracting and labeling boundary segments in natural scenes," IEEE Trans. PAMI-2, 1, January 1980. © 1980 IEEE.
Figs. 3.23, 3.28, 3.29, and 3.30 courtesy of Berthold Horn.
Figs. 3.24 and 3.26 from Marr, D. and T. Poggio, "Cooperative computation of stereo disparity," Science, Vol. 194, 1976, pp. 283-287. © 1976 by the American Association for the Advancement of Science.
Fig. 3.31 from Woodham, R.J., "Photometric stereo: A reflectance map technique for determining surface orientation from image intensity," Proc. SPIE, Vol. 155, August 1978.
Figs. 3.33 and 3.34 after Horn and Schunck, 1980.
Fig. 3.37 from Tanimoto, S. and T. Pavlidis, "A hierarchical data structure for picture processing," CGIP 4, 2, June 1975, pp. 104-119.
Fig. 4.6 from Kimme et al., 1975.
Figs. 4.7 and 4.16 from Ballard and Sklansky, 1976.
Fig. 4.9 courtesy of Dana Ballard and Ken Sloan.
Figs. 4.12 and 4.13 from Ramer, U., "Extraction of line structures from photographs of curved objects," CGIP 4, 2, June 1975, pp. 81-103.
Fig. 4.14 courtesy of Jim Lester, Tufts/New England Medical Center.
Fig. 4.17 from Chien, Y.P. and K.S. Fu, "A decision function method for boundary detection," CGIP 3, 2, June 1974, pp. 125-140.
Fig. 5.3 from Ohlander, R., K. Price, and D.R. Reddy, "Picture segmentation using a recursive region splitting method," CGIP 8, 3, December 1979.
Fig. 5.4 courtesy of Sam Kapilivsky.
Figs. 6.1, 11.16, and A1.13 courtesy of Chris Brown.
Fig. 6.3 courtesy of Joe Maleson and John Kender.
Fig. 6.4 from Connors, 1979. Texture images by Phil Brodatz, in Brodatz, Textures. New York: Dover, 1966.
Fig. 6.9 texture image by Phil Brodatz, in Brodatz, Textures. New York: Dover, 1966.
Figs. 6.11, 6.12, and 6.13 from Lu, S.Y. and K.S. Fu, "A syntactic approach to texture analysis," CGIP 7, 3, June 1978, pp. 303-330.
Fig. 6.14 from Jayaramamurthy, S.N., "Multilevel array grammars for generating texture scenes," Proc. PRIP, August 1979, pp. 391-398. © 1979 IEEE.
Fig. 6.20 from Laws, 1980.
Figs. 6.21 and 6.22 from Maleson et al., 1977.
Fig. 6.23 courtesy of Joe Maleson.
Figs. 7.1 and 7.3 courtesy of Daryl Lawton.
Fig. 7.2 after Prager, 1979.
Figs. 7.4 and 7.5 from Clocksin, W.F., "Computer prediction of visual thresholds for surface slant and edge detection from optical flow fields," Ph.D. dissertation, University of Edinburgh, 1980.
Fig. 7.7 courtesy of Steve Barnard and Bill Thompson.
Figs. 7.8 and 7.9 from Rashid, 1980.
Fig. 7.10 courtesy of Joseph O'Rourke.
Figs. 7.11 and 7.12 after Aggarwal and Duda, 1975.
Fig. 7.13 courtesy of Hans-Hellmut Nagel.
Fig. 8.1d after Requicha, 1977.
Figs. 8.2, 8.3, 8.21a, 8.22, and 8.26 after Pavlidis, 1977.
Figs. 8.10, 8.11, 9.6, and 9.16 courtesy of Uri Shani.
Figs. 8.12, 8.13, 8.14, 8.15, and 8.16 from Ballard, 1981.
Fig. 8.21b from Preston, K., Jr., M.J.B. Duff, S. Levialdi, P.E. Norgren, and J-i. Toriwaki, "Basics of cellular logic with some applications in medical image processing," Proc. IEEE, Vol. 67, No. 5, May 1979, pp. 826-856.
Figs. 8.25, 9.8, 9.9, 9.10, and 11.3 courtesy of Robert Schudy.
Fig. 8.29 after Bribiesca and Guzman, 1979.
Figs. 9.1, 9.18, 9.19, and 9.27 courtesy of Ari Requicha.
Fig. 9.2 from Requicha, A.A.G., "Representations for rigid solids: theory, methods, systems," Computing Surveys 12, 4, December 1980.
Fig. 9.3 courtesy of Lydia Hrechanyk.
Figs. 9.4 and 9.5 after Baumgart, 1972.
Fig. 9.7 courtesy of Peter Selfridge.
Fig. 9.11 after Requicha, 1980.
Figs. 9.14 and 9.15b from Agin, G.J. and T.O. Binford, "Computer description of curved objects," IEEE Trans. on Computers 25, 1, April 1976.
Fig. 9.15a courtesy of Gerald Agin.
Fig. 9.17 courtesy of A. Christensen; published as frontispiece of ACM SIGGRAPH 80 Proceedings.
Fig. 9.20 from Marr and Nishihara, 1978.
Fig. 9.21 after Tilove, 1980.
Fig. 9.22b courtesy of Gene Hartquist.
Figs. 9.24, 9.25, and 9.26 from Lee and Requicha, 1980.
Figs. 9.28a, 9.29, 9.30, 9.31, 9.32, 9.35, and 9.37 and Table 9.1 from Brown, C. and R. Popplestone, "Cases in scene analysis," in Pattern Recognition, ed. B.G. Batchelor. New York: Plenum, 1978.
Fig. 9.28b from Guzman, A., "Decomposition of a visual scene into three-dimensional bodies," in Automatic Interpretation and Classification of Images, A. Grasselli, ed., New York: Academic Press, 1969.
Fig. 9.28c from Waltz, D., "Understanding line drawings of scenes with shadows," in The Psychology of Computer Vision, ed. P.H. Winston. New York: McGraw-Hill, 1975.
Fig. 9.28d after Turner, 1974.
Figs. 9.33, 9.38, 9.40, 9.42, 9.43, and 9.44 after Mackworth, 1973.
Figs. 9.39, 9.45, 9.46, and 9.47 and Table 9.2 after Kanade, 1978.
Figs. 10.2 and A2.1 courtesy of Dana Ballard.
Figs. 10.16, 10.17, and 10.18 after Russell, 1979.
Fig. 11.5 after Fischler and Elschlager, 1973.
Fig. 11.8 after Ambler et al., 1975.
Fig. 11.10 from Winston, P.H., "Learning structural descriptions from examples," in The Psychology of Computer Vision, ed. P.H. Winston. New York: McGraw-Hill, 1975.
Fig. 11.11 from Nevatia, 1974.
Fig. 11.12 after Nevatia, 1974.
Fig. 11.17 after Barrow and Popplestone, 1971.
Fig. 11.18 from Davis, L.S., "Shape matching using relaxation techniques," IEEE Trans. PAMI-1, 4, January 1979, pp. 60-72.
Figs. 12.4 and 12.5 from Sloan and Bajcsy, 1979.
Fig. 12.6 after Barrow and Tenenbaum, 1976.
Fig. 12.8 after Freuder, 1978.
Fig. 12.10 from Rosenfeld, A., R.A. Hummel, and S.W. Zucker, "Scene labeling by relaxation operations," IEEE Trans. SMC-6, 6, June 1976, p. 420.
Figs. 12.11, 12.12, 12.13, 12.14, and 12.15 after Hinton, 1979.
Fig. 13.3 courtesy of Aaron Sloman.
Figs. 13.6, 13.7, and 13.8 from Garvey, 1976.
Fig. A1.11 after Duda and Hart, 1973.
Figs. A2.2 and A2.3 from Hanson, A.R. and E.M. Riseman, "VISIONS: A computer system for interpreting scenes," in Computer Vision Systems, ed. A.R. Hanson and E.M. Riseman. New York: Academic Press, 1978.
Mnemonics for Proceedings and Special Collections Cited in the References

CGIP
Computer Graphics and Image Processing

COMPSAC
IEEE Computer Society's 3rd International Computer Software and Applications Conference, Chicago, November 1979.

CVS
Hanson, A.R. and E.M. Riseman (Eds.). Computer Vision Systems. New York: Academic Press, 1978.

DARPA IU
Defense Advanced Research Projects Agency Image Understanding Workshop, Minneapolis, MN, April 1977.
Defense Advanced Research Projects Agency Image Understanding Workshop, Palo Alto, CA, October 1977.
Defense Advanced Research Projects Agency Image Understanding Workshop, Cambridge, MA, May 1978.
Defense Advanced Research Projects Agency Image Understanding Workshop, Carnegie-Mellon University, Pittsburgh, PA, November 1978.
Defense Advanced Research Projects Agency Image Understanding Workshop, University of Maryland, College Park, MD, April 1980.

IJCAI
2nd International Joint Conference on Artificial Intelligence, Imperial College, London, September 1971.
4th International Joint Conference on Artificial Intelligence, Tbilisi, Georgia, USSR, September 1975.
5th International Joint Conference on Artificial Intelligence, MIT, Cambridge, MA, August 1977.
6th International Joint Conference on Artificial Intelligence, Tokyo, August 1979.
IJCPR
2nd International Joint Conference on Pattern Recognition, Copenhagen, August 1974.
3rd International Joint Conference on Pattern Recognition, Coronado, CA, November 1976.
4th International Joint Conference on Pattern Recognition, Kyoto, November 1978.
5th International Joint Conference on Pattern Recognition, Miami Beach, FL, December 1980.

MI4
Meltzer, B. and D. Michie (Eds.). Machine Intelligence 4. Edinburgh: Edinburgh University Press, 1969.

MI5
Meltzer, B. and D. Michie (Eds.). Machine Intelligence 5. Edinburgh: Edinburgh University Press, 1970.

MI6
Meltzer, B. and D. Michie (Eds.). Machine Intelligence 6. Edinburgh: Edinburgh University Press, 1971.

MI7
Meltzer, B. and D. Michie (Eds.). Machine Intelligence 7. Edinburgh: Edinburgh University Press, 1972.

PCV
Winston, P.H. (Ed.). The Psychology of Computer Vision. New York: McGraw-Hill, 1975.

PRIP
IEEE Computer Society Conference on Pattern Recognition and Image Processing, Chicago, August 1979.
1

Computer Vision Issues

1.1 ACHIEVING SIMPLE VISION GOALS
Suppose that you are given an aerial photo such as that of Fig. 1.1a and asked to locate ships in it. You may never have seen a naval vessel in an aerial photograph before, but you will have no trouble predicting generally how ships will appear. You might reason that you will find no ships inland, and so turn your attention to ocean areas. You might be momentarily distracted by the glare on the water, but realizing that it comes from reflected sunlight, you perceive the ocean as continuous and flat. Ships on the open ocean stand out easily (if you have seen ships from the air, you know to look for their wakes). Near the shore the image is more confusing, but you know that ships close to shore are either moored or docked. If you have a map (Fig. 1.1b), it can help locate the docks (Fig. 1.1c); in a low-quality photograph it can help you identify the shoreline. Thus it might be a good investment of your time to establish the correspondence between the map and the image. A search parallel to the shore in the dock areas reveals several ships (Fig. 1.1d).
Again, suppose that you are presented with a set of computer-aided tomographic (CAT) scans showing "slices" of the human abdomen (Fig. 1.2a). These images are products of high technology, and give us views not normally available even with x-rays. Your job is to reconstruct from these cross sections the three-dimensional shape of the kidneys. This job may well seem harder than finding ships. You first need to know what to look for (Fig. 1.2b), where to find it in CAT scans, and how it looks in such scans. You need to be able to "stack up" the scans mentally and form an internal model of the shape of the kidney as revealed by its slices (Fig. 1.2c and 1.2d).
This book is about computer vision. These two example tasks are typical computer vision tasks; both were solved by computers using the sorts of knowledge and techniques alluded to in the descriptive paragraphs. Computer vision is the enterprise of automating and integrating a wide range of processes and representations used for vision perception. It includes as parts many techniques that are useful by themselves, such as image processing (transforming, encoding, and transmitting images) and statistical pattern classification (statistical decision theory applied to general patterns, visual or otherwise). More importantly for us, it includes techniques for geometric modeling and cognitive processing.
1.2 HIGH-LEVEL AND LOW-LEVEL CAPABILITIES
The examples of Section 1.1 illustrate vision that uses cognitive processes, geometric models, goals, and plans. These high-level processes are very important; our examples only weakly illustrate their power and scope. There surely would be some overall purpose to finding ships; there might be collateral information that there were submarines, barges, or small craft in the harbor, and so forth. CAT scans would be used with several diagnostic goals in mind and an associated medical history available. Goals and knowledge are high-level capabilities that can guide visual activities, and a visual system should be able to take advantage of them.

Fig. 1.1 Finding ships in an aerial photograph. (a) The photograph; (b) a corresponding map; (c) the dock area of the photograph; (d) registered map and image, with ship location.
Fig. 1.1 (cont.)
Even such elaborated tasks are very special ones and in their way easier to think about than the commonplace visual perceptions needed to pick up a baby, cross a busy street, or arrive at a party and quickly "see" who you know, your host's taste in decor, and how long the festivities have been going on. All these tasks require judgment and large amounts of knowledge of objects in the world, how they look, and how they behave. Such high-level powers are so well integrated into "vision" as to be effectively inseparable.
Knowledge and goals are only part of the vision story. Vision requires many low-level capabilities we often take for granted; for example, our ability to extract intrinsic images of "lightness," "color," and "range." We perceive black as black in a complex scene even when the lighting is such that some black patches are reflecting more light than some white patches. Similarly, perceived colors are not related simply to the wavelengths of reflected light; if they were, we would consciously see colors changing with illumination. Stereo fusion (stereopsis) is a low-level facility basic to short-range three-dimensional perception.
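The lightness point can be made concrete with a line of arithmetic. In the sketch below (illustrative only; the reflectance and illumination numbers are invented, and the bare product model ignores geometry and sensor effects), image intensity is modeled as reflectance times illumination, so a black patch in direct sun can send more light to the eye than a white patch in shade, even though we still perceive the black patch as black.

```python
# Illustrative sketch (numbers invented): sensed intensity modeled as
# reflectance x illumination for a matte patch, in arbitrary units.

def intensity(reflectance, illumination):
    """Light reaching the sensor from a matte patch."""
    return reflectance * illumination

black_in_sun = intensity(0.05, 1000.0)   # black paper, direct sunlight
white_in_shade = intensity(0.90, 20.0)   # white paper, deep shade

# The black patch sends far more light to the eye, yet looks black.
print(black_in_sun, white_in_shade)
```

Separating the reflectance factor from the unknown illumination factor is one job of the "intrinsic image" computations mentioned above.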
An important low-level capability is object perception: for our purposes it does not really matter if this talent is innate ("hard-wired"), or if it is developmental or even learned ("compiled-in"). The fact remains that mature biological vision systems are specialized and tuned to deal with the relevant objects in their environments. Further specialization can often be learned, but it is built on basic immutable assumptions about the world which underlie the vision system.

Fig. 1.2 Finding a kidney in a computer-aided tomographic scan. (a) One slice of scan data; (b) prototype kidney model; (c) model fitting; (d) resulting kidney and spinal cord instances.
A basic sort of object recognition capability is the "figure/ground" discrimination that separates objects from the "background." Other basic organizational predispositions are revealed by the "Gestalt laws" of clustering, which demonstrate rules our vision systems use to form simple arrays of stimuli into more coherent spatial groups. A dramatic example of specialized object perception for human beings is revealed in our "face recognition" capability, which seems to occupy a large volume of brain matter. Geometric visual illusions are more surprising symptoms of nonintuitive processing that is performed by our vision systems, either for some direct purpose or as a side effect of its specialized architecture. Some other illusions clearly reflect the intervention of high-level knowledge. For instance, the familiar "Necker cube reversal" is grounded in our three-dimensional models for cubes.

Low-level processing capabilities are elusive; they are unconscious, and they are not well connected to other systems that allow direct introspection. For instance, our visual memory for images is quite impressive, yet our quantitative verbal descriptions of images are relatively primitive. The biological visual "hardware" has been developed, honed, and specialized over a very long period. However, its organization and functionality is not well understood except at extreme levels of detail and generality: the behavior of small sets of cat or monkey cortical cells and the behavior of human beings in psychophysical experiments.
Computer vision is thus immediately faced with a very difficult problem; it must reinvent, with general digital hardware, the most basic and yet inaccessible talents of specialized, parallel, and partly analog biological visual systems. Figure 1.3 may give a feeling for the problem; it shows two visual renditions of a familiar subject. The inset is a normal image, the rest is a plot of the intensities (gray levels) in the image against the image coordinates. In other words, it displays information with "height" instead of "light." No information is lost, and the display is an image-like object, but we do not immediately see a face in it. The initial representation the computer has to work with is no better; it is typically just an array of numbers from which human beings could extract visual information only very painfully. Skipping the low-level processing we take for granted turns normally effortless perception into a very difficult puzzle.

Fig. 1.3 Two representations of an image. One is directly accessible to our low-level processes; the other is not.
Computer vision is vitally concerned both with low-level or "early processing" issues and with the high-level and "cognitive" use of knowledge. Where does vision leave off and reasoning and motivation begin? We do not know precisely, but we firmly believe (and hope to show) that powerful, cooperating, rich representations of the world are needed for any advanced vision system. Without them, no system can derive relevant and invariant information from input that is beset with ever-changing lighting and viewpoint, unimportant shape differences, noise, and other large but irrelevant variations. These representations can remove some computational load by predicting or assuming structure for the visual world.
Finally, if a system is to be successful in a variety of tasks, it needs some "meta-level" capabilities: it must be able to model and reason about its own goals and capabilities, and the success of its approaches. These complex and related models must be manipulated by cognitive-like techniques, even though introspectively the perceptual process does not always "feel" to us like cognition.
1.3 A RANGE OF REPRESENTATIONS
Visual perception is the relation of visual input to previously existing models of the world. There is a large representational gap between the image and the models ("ideas," "concepts") which explain, describe, or abstract the image information. To bridge that gap, computer vision systems usually have a (loosely ordered) range of representations connecting the input and the "output" (a final description, decision, or interpretation). Computer vision then involves the design of these intermediate representations and the implementation of algorithms to construct them and relate them to one another.
We broadly categorize the representations into four parts (Fig. 1.4) which correspond with the organization of this volume. Within each part there may be several layers of representation, or several cooperating representations. Although the sets of representations are loosely ordered from "early" and "low-level" signals to "late" and "cognitive" symbols, the actual flow of effort and information between them is not unidirectional. Of course, not all levels need to be used in each computer vision application; some may be skipped, or the processing may start partway up the hierarchy or end partway down it.
Generalized images (Part I) are iconic (imagelike) and analogical representations of the input data. Images may initially arise from several technologies.
Fig. 1.4 Examples of the four categories of representation used in computer vision. (a) Iconic; (b) segmented; (c) geometric; (d) relational.
Domain-independent processing can produce other iconic representations more directly useful to later processing, such as arrays of edge elements (gray-level discontinuities). Intrinsic images can sometimes be produced at this level; they reveal physical properties of the imaged scene (such as surface orientations, range, or surface reflectance). Often parallel processing can produce generalized images. More generally, most "low-level" processes can be implemented with parallel computation.
Segmented images (Part II) are formed from the generalized image by gathering its elements into sets likely to be associated with meaningful objects in the scene. For instance, segmenting a scene of planar polyhedra (blocks) might result in a set of edge segments corresponding to polyhedral edges, or a set of two-dimensional regions in the image corresponding to polyhedral faces. In producing the segmented image, knowledge about the particular domain at issue begins to be important both to save computation and to overcome problems of noise and inadequate data. In the planar polyhedral example, it helps to know beforehand that the line segments must be straight. Texture and motion are known to be very important in segmentation, and are currently topics of active research; knowledge in these areas is developing very fast.

Fig. 1.4 (cont.)
Geometric representations (Part III) are used to capture the all-important idea of two-dimensional and three-dimensional shape. Quantifying shape is as important as it is difficult. These geometric representations must be powerful enough to support complex and general processing, such as "simulation" of the effects of lighting and motion. Geometric structures are as useful for encoding previously acquired knowledge as they are for re-representing current visual input. Computer vision requires some basic mathematics; Appendix 1 has a brief selection of useful techniques.
Relational models (Part IV) are complex assemblages of representations used to support sophisticated high-level processing. An important tool in knowledge representation is semantic nets, which can be used simply as an organizational convenience or as a formalism in their own right. High-level processing often uses prior knowledge and models acquired prior to a perceptual experience. The basic mode of processing turns from constructing representations to matching them. At high levels, propositional representations become more important. They are made up of assertions that are true or false with respect to a model, and are manipulated by rules of inference. Inference-like techniques can also be used for planning, which models situations and actions through time, and thus must reason about temporally varying and hypothetical worlds. The higher the level of representation, the more marked is the flow of control (direction of attention, allocation of effort) downward to lower levels, and the greater the tendency of algorithms to exhibit serial processing. These issues of control are basic to complex information processing in general and computer vision in particular; Appendix 2 outlines some specific control mechanisms.
Figure 1.5 illustrates the loose classification of the four categories into analogical and propositional representations. We consider generalized and segmented images as well as geometric structures to be analogical models. Analogical models capture directly the relevant characteristics of the represented objects, and are manipulated and interrogated by simulation-like processes. Relational models are generally a mix of analogical and propositional representations. We develop this distinction in more detail in Chapter 10.
1.4 THE ROLE OF COMPUTERS
The computer is a congenial tool for research into visual perception.

Computers are versatile and forgiving experimental subjects. They are easily and ethically reconfigurable, not messy, and their workings can be scrutinized in the finest detail.

Computers are demanding critics. Imprecision, vagueness, and oversights are not tolerated in the computer implementation of a theory.

Computers offer new metaphors for perceptual psychology (also neurology, linguistics, and philosophy). Processes and entities from computer science provide powerful and influential conceptual tools for thinking about perception and cognition.

Computers can give precise measurements of the amount of processing they
Fig. 1.5 The knowledge base of a complex computer vision system, showing four basic representational categories.
Table 1.1  EXAMPLES OF IMAGE ANALYSIS TASKS

Robotics. Objects: three-dimensional outdoor scenes, indoor scenes; mechanical parts. Modalities: light, X-rays; light, structured light. Tasks: identify or describe objects in scene; industrial tasks. Knowledge sources: models of objects; models of the reflection of light from objects.

Aerial images. Objects: terrain; buildings, etc. Modalities: light, infrared, radar. Tasks: improved images, resource analyses, weather prediction, spying, missile guidance, tactical analysis. Knowledge sources: maps; geometrical models of shapes; models of image formation.

Astronomy. Objects: stars, planets. Modality: light. Tasks: chemical composition, improved images. Knowledge sources: geometrical models of shapes.

Medical (macro). Objects: body organs. Modalities: X-rays, ultrasound, isotopes, heat. Tasks: diagnosis of abnormalities, operative and treatment planning. Knowledge sources: anatomical models; models of image formation.

Medical (micro). Objects: cells, protein chains, chromosomes. Modalities: electron microscopy, light. Tasks: pathology, cytology; karyotyping. Knowledge sources: models of shape.

Chemistry. Objects: molecules. Modality: electron densities. Tasks: analysis of molecular compositions. Knowledge sources: chemical models; structured models.

Neuroanatomy. Objects: neurons. Modalities: light, electron microscopy. Tasks: determination of spatial orientation. Knowledge sources: neural connectivity.

Physics. Objects: particle tracks. Modality: light. Tasks: find new particles, identify tracks. Knowledge sources: atomic physics.
do. A computer implementation places an upper limit on the amount of computation necessary for a task.

Computers may be used either to mimic what we understand about human perceptual architecture and processes, or to strike out in different directions to try to achieve similar ends by different means.

Computer models may be judged either by their efficacy for applications and on-the-job performance or by their internal organization, processes, and structures: the theory they embody.
1.5 COMPUTER VISION RESEARCH AND APPLICATIONS
"Pure" computer vision research often deals with relatively domain-independent considerations. The results are useful in a broad range of contexts. Almost always such work is demonstrated in one or more applications areas, and more often than not an initial application problem motivates consideration of the general problem. Applications of computer vision are exciting, and their number is growing as computer vision becomes better understood. Table 1.1 gives a partial list of "classical" and current applications areas.
Within the organization outlined above, this book presents many specific ideas and techniques with general applicability. It is meant to provide enough basic knowledge and tools to support attacks on both applications and research topics.
Part I: GENERALIZED IMAGES
The first step in the vision process is image formation. Images may arise from a variety of technologies. For example, most television-based systems convert reflected light intensity into an electronic signal which is then digitized; other systems use more exotic radiations, such as x-rays, laser light, ultrasound, and heat. The net result is usually an array of samples of some kind of energy.
The vision system may be entirely passive, taking as input a digitized image from a microwave or infrared sensor, satellite scanner, or a planetary probe, but more likely involves some kind of active imaging. Automated active imaging systems may control the direction and resolution of sensors, or regulate and direct their own light sources. The light source itself may have special properties and structure designed to reveal the nature of the three-dimensional world; an example is to use a plane of light that falls on the scene in a stripe whose structure is closely related to the structure of opaque objects. Range data for the scene may be provided by stereo (two images), but also by triangulation using light-stripe techniques or by "spot ranging" using laser light. A single hardware device may deliver range and multispectral reflectivity ("color") information. The image-forming device may also perform various other operations. For example, it may automatically smooth or enhance the image or vary its resolution.
The generalized image is a set of related imagelike entities for the scene. This set may include related images from several modalities, but may also include the results of significant processing that can extract intrinsic images. An intrinsic image is an "image," or array, of representations of an important physical quantity such as surface orientation, occluding contours, velocity, or range. Object color, which is a different entity from sensed red-green-blue wavelengths, is an intrinsic quality. These intrinsic physical qualities are extremely useful; they can be related to physical objects far more easily than the original input values, which reveal the physical parameters only indirectly. An intrinsic image is a major step toward scene understanding and usually represents significant and interesting computations.
The information necessary to compute an intrinsic image is contained in the input image itself, and is extracted by "inverting" the transformation wrought by the imaging process, the reflection of radiation from the scene, and other physical processes. An example is the fusion of two stereo images to yield an intrinsic range image. Many algorithms to recover intrinsic images can be realized with parallel implementations, mirroring computations that may take place in the lower neurological levels of biological image processing.
All of the computations listed above benefit from the idea of resolution pyramids. A pyramid is a generalized image data structure consisting of the same image at several successively increasing levels of resolution. As the resolution increases, more samples are required to represent the increased information and hence the successive levels are larger, making the entire structure look like a pyramid. Pyramids allow the introduction of many different coarse-to-fine image-resolution algorithms which are vastly more efficient than their single-level, high-resolution-only counterparts.
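The pyramid idea can be sketched in a few lines. The 2x2 averaging used here is a hypothetical minimal reduction rule, standing in for the more careful low-pass filtering a real system would use, and a square, power-of-two-sized input is assumed:

```python
# Sketch of a resolution pyramid: each level halves the resolution of the
# previous one by averaging 2x2 blocks of samples. Minimal illustration
# only; assumes a square image whose side is a power of two.

def halve(image):
    """Return a half-resolution image by averaging 2x2 blocks."""
    h, w = len(image), len(image[0])
    return [[(image[2*r][2*c] + image[2*r][2*c+1] +
              image[2*r+1][2*c] + image[2*r+1][2*c+1]) / 4.0
             for c in range(w // 2)]
            for r in range(h // 2)]

def pyramid(image):
    """Build the full pyramid, from the input down to a single sample."""
    levels = [image]
    while len(levels[-1]) > 1:
        levels.append(halve(levels[-1]))
    return levels

if __name__ == "__main__":
    img = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
    print([len(level) for level in levels] if False else
          [len(level) for level in pyramid(img)])  # level sizes: [4, 2, 1]
```

A coarse-to-fine algorithm would start its search at the small top level and refine the result as it descends to the full-resolution base.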
2 IMAGE FORMATION
2.1 IMAGES
Image formation occurs when a sensor registers radiation that has interacted with physical objects. Section 2.2 deals with mathematical models of images and image formation. Section 2.3 describes several specific image formation technologies.
The mathematical model of imaging has several different components.

1. An image function is the fundamental abstraction of an image.
2. A geometrical model describes how three dimensions are projected into two.
3. A radiometrical model shows how the imaging geometry, light sources, and reflectance properties of objects affect the light measurement at the sensor.
4. A spatial frequency model describes how spatial variations of the image may be characterized in a transform domain.
5. A color model describes how different spectral measurements are related to image colors.
6. A digitizing model describes the process of obtaining discrete samples.
This material forms the basis of much image-processing work and is developed in much more detail elsewhere, e.g., [Rosenfeld and Kak 1976; Pratt 1978]. Our goals are not those of image processing, so we limit our discussion to a summary of the essentials.
The wide range of possible sources of samples and the resulting different implications for later processing motivate our overview of specific imaging techniques. Our goal is not to provide an exhaustive catalog, but rather to give an idea of the range of techniques available. Very different analysis techniques may be needed depending on how the image was formed. Two examples illustrate this point. If the image is formed by reflected light intensity, as in a photograph, the image records both light from primary light sources and (more usually) the light reflected off physical surfaces. We show in Chapter 3 that in certain cases we can use these kinds of images together with knowledge about physics to derive the orientation of the surfaces. If, on the other hand, the image is a computed tomogram of the human body (discussed in Section 2.3.4), the image represents tissue density of internal organs. Here orientation calculations are irrelevant, but general segmentation techniques of Chapters 4 and 5 (the agglomeration of neighboring samples of similar density into units representing organs) are appropriate.
2.2 IMAGE MODEL
Sophisticated image models of a statistical flavor are useful in image processing [Jain 1981]. Here we are concerned with more geometrical considerations.
2.2.1 Image Functions
An image function is a mathematical representation of an image. Generally, an image function is a vector-valued function of a small number of arguments. A special case of the image function is the digital (discrete) image function, where the arguments to and value of the function are all integers. Different image functions may be used to represent the same image, depending on which of its characteristics are important. For instance, a camera produces an image on black-and-white film which is usually thought of as a real-valued function (whose value could be the density of the photographic negative) of two real-valued arguments, one for each of two spatial dimensions. However, at a very small scale (the order of the film grain) the negative basically has only two densities, "opaque" and "transparent."
Most images are represented by functions of two spatial variables f(x) = f(x, y), where f(x, y) is the brightness or gray level of the image at a spatial coordinate (x, y). A multispectral image f is a vector-valued function with components (f1, ..., fn). One special multispectral image is a color image in which, for example, the components measure the brightness values of each of three wavelengths, that is,

f(x) = (f_red(x), f_blue(x), f_green(x))

Time-varying images f(x, t) have an added temporal argument. For special three-dimensional images, x = (x, y, z). Usually, both the domain and range of f are bounded.
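A digital image function in this sense is just a mapping from integer coordinates to (possibly vector-valued) samples. The tiny 2x2 image and its values below are made up purely for illustration:

```python
# Sketch of a digital (discrete) multispectral image function: integer
# arguments, integer values. The 2x2 image is a made-up example;
# components follow the text's (red, blue, green) order.

image = [
    [(255, 0, 0), (0, 255, 0)],      # row y = 0
    [(0, 0, 255), (128, 128, 128)],  # row y = 1
]

def f(x, y):
    """Vector-valued digital image function f(x, y)."""
    return image[y][x]

def f_red(x, y):
    """One scalar component of the vector-valued function."""
    return f(x, y)[0]

if __name__ == "__main__":
    print(f(0, 0))      # (255, 0, 0)
    print(f_red(1, 1))  # 128
```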
An important part of the formation process is the conversion of the image representation from a continuous function to a discrete function; we need some way of describing the images as samples at discrete points. The mathematical tool we shall use is the delta function.
Formally, the delta function may be defined by

δ(x) = 0 when x ≠ 0, ∞ when x = 0
∫ δ(x) dx = 1     (2.1)

If some care is exercised, the delta function may be interpreted as the limit of a set of functions:

δ(x) = lim_{n→∞} δ_n(x)

where

δ_n(x) = n if |x| ≤ 1/2n, and 0 otherwise
Fig. 2.1 A geometric camera model.
The case for x' is treated similarly:

x' = fx / (f - z)     (2.5)

The projected image has z = 0 everywhere. However, projecting away the z component is best considered a separate transformation; the projective transform is usually thought to distort the z component just as it does the x and y. Perspective distortion thus maps (x, y, z) to

(x', y', z') = (fx / (f - z), fy / (f - z), fz / (f - z))     (2.6)

The perspective transformation yields orthographic projection as a special case when the viewpoint is the point at infinity in the z direction. Then all objects are projected onto the viewing plane with no distortion of their x and y coordinates.

The perspective distortion yields a three-dimensional object that has been "pushed out of shape"; it is more shrunken the farther it is from the viewpoint. The z component is not available directly from a two-dimensional image, being identically equal to zero. In our model, however, the distorted z component has information about the distance of imaged points from the viewpoint. When this distorted object is projected orthographically onto the image plane, the result is a perspective picture. Thus, to achieve the effect of railroad tracks appearing to come together in the distance, the perspective distortion transforms the tracks so that they do come together (at a point at infinity)! The simple orthographic projection that projects away the z component unsurprisingly preserves this distortion. Several properties of the perspective transform are of interest and are investigated further in Appendix 1.
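Equation (2.6) is easy to exercise numerically. This sketch, with an arbitrary focal distance and made-up points, shows the railroad-track effect: the projected x' shrinks toward zero as the point recedes (z more negative in this coordinate convention):

```python
# Sketch of the perspective distortion of Eq. (2.6): (x, y, z) maps to
# (fx/(f - z), fy/(f - z), fz/(f - z)); orthographic projection then
# simply drops the distorted z component, yielding a perspective picture.

def perspective(point, f):
    """Apply the perspective transform with focal distance f (z != f)."""
    x, y, z = point
    s = f / (f - z)
    return (s * x, s * y, s * z)

def project(point, f):
    """Perspective picture: distort, then project away z."""
    xp, yp, _ = perspective(point, f)
    return (xp, yp)

if __name__ == "__main__":
    # A rail at x = 1 narrows toward the image center as it recedes.
    for z in (0.0, -10.0, -100.0):
        print(project((1.0, 0.0, z), f=1.0))
```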
Binocular Imaging

Basic binocular imaging geometry is shown in Fig. 2.3a. For simplicity, we
Fig. 2.2 (a) Camera model equivalent to that of Fig. 2.1; (b) definition of terms.
use a system with two viewpoints. In this model the eyes do not converge; they are aimed in parallel at the point at infinity in the z direction. The depth information about a point is then encoded only by its different positions (disparity) in the two image planes.
With the stereo arrangement of Fig. 2.3,

x' = (x - d) f / (f - z)
Fig. 2.3 A nonconvergent binocular imaging system.
through each eye. The baseline of the binocular system is 2d. Thus

(f - z) x' = (x - d) f     (2.7)
(f - z) x" = (x + d) f     (2.8)

Subtracting (2.7) from (2.8) gives

(f - z)(x" - x') = 2df

or

z = f - 2df / (x" - x')     (2.9)

Thus if points can be matched to determine the disparity (x" - x') and the baseline and focal length are known, the z coordinate is simple to calculate.
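Equations (2.7) through (2.9) can be checked with a small sketch using made-up scene coordinates: project a point through both viewpoints, then recover its depth from the disparity:

```python
# Sketch of depth from disparity: project a scene point through both
# viewpoints via (f - z)x' = (x - d)f and (f - z)x'' = (x + d)f, then
# invert with Eq. (2.9): z = f - 2df / (x'' - x').

def image_positions(x, z, d, f):
    """Image positions of scene point (x, ., z) in the two views."""
    xp = (x - d) * f / (f - z)
    xpp = (x + d) * f / (f - z)
    return xp, xpp

def depth_from_disparity(xp, xpp, d, f):
    """Recover z from the disparity (xpp - xp), Eq. (2.9)."""
    return f - 2.0 * d * f / (xpp - xp)

if __name__ == "__main__":
    x, z, d, f = 3.0, -40.0, 1.0, 2.0   # arbitrary illustrative values
    xp, xpp = image_positions(x, z, d, f)
    print(depth_from_disparity(xp, xpp, d, f))  # recovers z, up to rounding
```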
If the system can converge its directions of view to a finite distance, convergence angle may also be used to compute depth. The hardest part of extracting depth information from stereo is the matching of points for disparity calculations. "Light striping" is a way to maintain geometric simplicity and also simplify matching (Section 2.3.3).

2.2.3 Reflectance
Terminology

A basic aspect of the imaging process is the physics of the reflectance of objects, which determines how their "brightness" in an image depends on their inherent characteristics and the geometry of the imaging situation. A clear presentation of the mathematics of reflectance is given in [Horn and Sjoberg 1978; Horn 1977]. Light energy flux is measured in watts; "brightness" is measured with respect to area and solid angle. The radiant intensity I of a source is the exitant flux per unit solid angle:

I = dΦ / dω   watts/steradian     (2.10)
Here dω is an incremental solid angle. The solid angle of a small area dA measured perpendicular to a radius r is given by

dω = dA / r²     (2.11)

in units of steradians. (The total solid angle of a sphere is 4π.) The irradiance is flux incident on a surface element dA:

E = dΦ / dA   watts/meter²     (2.12)

and the flux exitant from the surface is defined in terms of the radiance L, which is the flux emitted per unit foreshortened surface area per unit solid angle:

L = dΦ / (dA cos θ dω)   watts/(meter² steradian)     (2.13)

where θ is the angle between the surface normal and the direction of emission. Image irradiance f is the "brightness" of the image at a point, and is proportional to scene radiance. A "gray level" is a quantized measurement of image irradiance. Image irradiance depends on the reflective properties of the imaged surfaces as well as on the illumination characteristics. How a surface reflects light depends on its microstructure and physical properties. Surfaces may be matte (dull, flat), specular (mirrorlike), or have more complicated reflectivity characteristics (Section 3.5.1). The reflectance r of a surface is given quite generally by its Bidirectional Reflectance Distribution Function (BRDF) [Nicodemus et al. 1977]. The BRDF is the ratio of reflected radiance in the direction towards the viewer to the irradiance in the direction towards a small area of the source.
Effects of Geometry on an Imaging System

Let us now analyze a simple image-forming system, shown in Fig. 2.4, with the objective of showing how the gray levels are related to the radiance of imaged objects. Following [Horn and Sjoberg 1978], assume that the imaging device is properly focused; rays originating in the infinitesimal area dA_o on the object's surface are projected into some area dA_p in the image plane, and no rays from other portions of the object's surface reach this area of the image. The system is assumed to be an ideal one, obeying the laws of simple geometrical optics.
The energy flux per unit area that impinges on the sensor is defined to be E_p. To show how E_p is related to the scene radiance L, first consider the flux arriving at the lens from a small surface area dA_o. From (2.13) this is given as

dΦ = dA_o ∫ L cos θ dω     (2.14)

This flux is assumed to arrive at an area dA_p in the imaging plane. Hence the irradiance is given by [using Eq. (2.12)]

E_p = dΦ / dA_p     (2.15)

Now relate dA_o to dA_p by equating the respective solid angles as seen from the lens; that is [making use of Eq. (2.11)],
Fig. 2.4 Geometry of an image-forming system.
dA_o cos θ / f_o² = dA_p cos α / f_p²     (2.16)

Substituting Eqs. (2.16) and (2.14) into (2.15) gives

E_p = cos α (f_o / f_p)² ∫ L dω     (2.17)

The integral is over the solid angle seen by the lens. In most instances we can assume that L is constant over this angle and hence can be removed from the integral. Finally, approximate dω by the area of the lens foreshortened by cos α, that is, (π/4)D² cos α divided by the distance f_o/cos α squared:

∫ dω = (π/4) D² cos³ α / f_o²     (2.18)

so that finally

E_p = L (π/4) (D / f_p)² cos⁴ α     (2.19)
The interesting results here are that (1) the image irradiance is proportional to the scene radiance L, and (2) the factor of proportionality includes the fourth power of the off-axis angle α. Ideally, an imaging device should be calibrated so that the variation in sensitivity as a function of α is removed.
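The cos⁴ α factor of Eq. (2.19) is simple to tabulate, and the calibration just mentioned amounts to dividing each measurement by it. A sketch, with illustrative angles:

```python
# Sketch of the cos^4(alpha) off-axis falloff in Eq. (2.19): image
# irradiance drops with the fourth power of cos(alpha), so a calibrated
# device divides each measurement by cos^4(alpha).

import math

def falloff(alpha):
    """Relative irradiance at off-axis angle alpha (radians)."""
    return math.cos(alpha) ** 4

def calibrate(measured, alpha):
    """Undo the cos^4 variation in a measured irradiance."""
    return measured / falloff(alpha)

if __name__ == "__main__":
    for deg in (0, 10, 20, 30):
        a = math.radians(deg)
        print(deg, round(falloff(a), 3))
```

At 30 degrees off axis the sensor already sees only (√3/2)⁴ = 9/16 of the on-axis irradiance, so the correction is far from negligible at wide field angles.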
2.2.4 Spatial Properties
The Fourier Transform

An image is a spatially varying function. One way to analyze spatial variations is the decomposition of an image function into a set of orthogonal functions, one such set being the Fourier (sinusoidal) functions. The Fourier transform may be used to transform the intensity image into the domain of spatial frequency. For notational convenience and intuition, we shall generally use as an example the continuous one-dimensional Fourier transform. The results can readily be extended to the discrete case and also to higher dimensions [Rosenfeld and Kak 1976]. In two dimensions we shall denote transform-domain coordinates by (u, v). The one-dimensional Fourier transform, denoted ℱ, is defined by ℱ[f(x)] = F(u), where

F(u) = ∫ from -∞ to +∞ of f(x) exp(-j2πux) dx     (2.20)

where j = √-1. Intuitively, Fourier analysis expresses a function as a sum of sine waves of different frequency and phase. The Fourier transform has an inverse ℱ⁻¹[F(u)] = f(x). This inverse is given by

f(x) = ∫ from -∞ to +∞ of F(u) exp(j2πux) du     (2.21)

The transform has many useful properties, some of which are summarized in Table 2.1. Common one-dimensional Fourier transform pairs are shown in Table 2.2.
The transform F(u) is simply another representation of the image function. Its meaning can be understood by interpreting Eq. (2.21) for a specific value of x, say x0:

f(x0) = ∫ F(u) exp(j2πux0) du     (2.22)

This equation states that a particular point in the image can be represented by a weighted sum of complex exponentials (sinusoidal patterns) at different spatial frequencies u. F(u) is thus a weighting function for the different frequencies. Low spatial frequencies account for the "slowly" varying gray levels in an image, such as the variation of intensity over a continuous surface. High-frequency components are associated with "quickly varying" information, such as edges. Figure 2.5 shows the Fourier transform of an image of rectangles, together with the effects of removing low- and high-frequency components.
The Fourier transform is defined above to be a continuous transform. Although it may be performed instantly by optics, a discrete version of it, the "fast Fourier transform," is almost universally used in image processing and computer vision. This is because of the relative versatility of manipulating the transform in the digital domain as compared to the optical domain. Image processing texts, e.g., [Pratt 1978; Gonzalez and Wintz 1977], discuss the FFT in some detail; we content ourselves with an algorithm for it (Appendix 1).
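For intuition, the discrete transform can be written directly from the definition (2.20). This is the O(N²) sum, not the fast O(N log N) algorithm of Appendix 1, but it computes the same values:

```python
# Sketch of the discrete Fourier transform, following Eq. (2.20) directly:
# F(u) = sum over x of f(x) exp(-j 2 pi u x / N). O(N^2) definition form;
# the FFT computes identical values faster.

import cmath

def dft(f):
    n = len(f)
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * u * x / n)
                for x in range(n))
            for u in range(n)]

def idft(F):
    n = len(F)
    return [sum(F[u] * cmath.exp(2j * cmath.pi * u * x / n)
                for u in range(n)) / n
            for x in range(n)]

if __name__ == "__main__":
    signal = [1.0, 2.0, 3.0, 4.0]
    recovered = idft(dft(signal))
    print([round(v.real, 6) for v in recovered])  # [1.0, 2.0, 3.0, 4.0]
```

The u = 0 ("zero-frequency") coefficient is just the sum of the samples, matching the intuition that low frequencies carry the slowly varying average intensity.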
The Convolution Theorem

Convolution is a very important image-processing operation, and is a basic operation of linear systems theory. The convolution of two functions f and g is a function h of a displacement y defined as

h(y) = f * g = ∫ f(x) g(y - x) dx     (2.23)
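In discrete form, Eq. (2.23) becomes a shift-multiply-sum loop; a sketch for finite one-dimensional signals:

```python
# Sketch of discrete 1-D convolution, the discrete analogue of Eq. (2.23):
# h(y) = sum over x of f(x) g(y - x). One function is "swept past" the
# other, multiplying and accumulating at each displacement.

def convolve(f, g):
    n = len(f) + len(g) - 1
    h = [0.0] * n
    for y in range(n):
        for x in range(len(f)):
            if 0 <= y - x < len(g):
                h[y] += f[x] * g[y - x]
    return h

if __name__ == "__main__":
    f = [1.0, 2.0, 3.0]
    g = [0.5, 0.5]          # a simple two-sample smoothing kernel
    print(convolve(f, g))   # [0.5, 1.5, 2.5, 1.5]
```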
Table 2.1  PROPERTIES OF THE FOURIER TRANSFORM

With f(x) and g(x) in the spatial domain, F(u) = ℱ[f(x)] and G(u) = ℱ[g(x)] in the frequency domain:

(1) Linearity: c1 f(x) + c2 g(x), c1, c2 scalars  ↔  c1 F(u) + c2 G(u)
(2) Scaling: f(ax)  ↔  (1/|a|) F(u/a)
(3) Shifting: f(x - x0)  ↔  exp(-j2πux0) F(u)
(4) Symmetry: F(x)  ↔  f(-u)
(5) Conjugation: f*(x)  ↔  F*(-u)
(6) Convolution: h(x) = f * g = ∫ f(x') g(x - x') dx'  ↔  F(u) G(u)
(7) Differentiation: dⁿf(x)/dxⁿ  ↔  (j2πu)ⁿ F(u)

Parseval's theorem:

∫ |f(x)|² dx = ∫ |F(u)|² du
∫ f(x) g*(x) dx = ∫ F(u) G*(u) du

Symmetry properties, f(x) ↔ F(u):

Real (R)  ↔  real part even (RE), imaginary part odd (IO)
Imaginary (I)  ↔  RO, IE
RE, IO  ↔  R
RE  ↔  RE
RO  ↔  IO
IE  ↔  IE
IO  ↔  RO
Complex even (CE)  ↔  CE
Complex odd (CO)  ↔  CO
Table 2.2  FOURIER TRANSFORM PAIRS

f(x)  ↔  F(u):

Rectangle function Rect(x)  ↔  Sinc(u)
Triangle function  ↔  Sinc²(u)
Gaussian exp(-πx²)  ↔  Gaussian exp(-πu²)
Unit impulse δ(x)  ↔  1
Unit step  ↔  (1/2) δ(u) + 1/(j2πu)
Comb function Σ_n δ(x - n x0)  ↔  (1/x0) Σ_n δ(u - n/x0)
cos 2πω0x  ↔  (1/2)[δ(u - ω0) + δ(u + ω0)]
sin 2πω0x  ↔  (1/2j)[δ(u - ω0) - δ(u + ω0)]   (imaginary)
Intuitively, one function is "swept past" (in one dimension) or "rubbed over" (in two dimensions) the other. The value of the convolution at any displacement is the integral of the product of the (relatively displaced) function values. One common phenomenon that is well expressed by a convolution is the formation of an image by an optical system. The system (say a camera) has a "point-spread function," which is the image of a single point. (In linear systems theory, this is the "impulse response," or response to a delta function input.) The ideal point-spread function is, of course, a point. A typical point-spread function is a two-dimensional Gaussian spatial distribution of intensities, but may include such phenomena as diffraction rings. In any event, if the camera is modeled as a linear system (ignoring the added complexity that the point-spread function usually varies over the field of view), the image is the convolution of the point-spread function and the input signal. The point-spread function is rubbed over the perfect input image, thus blurring it.

Fig. 2.5 (a) An image, f(x, y). (b) A rotated version of (a), filtered to enhance high spatial frequencies. (c) Similar to (b), but filtered to enhance low spatial frequencies. (d), (e), and (f) show the logarithm of the power spectrum of (a), (b), and (c). The power spectrum is the log squared modulus of the Fourier transform F(u, v). Considered in polar coordinates (ρ, θ), points of small ρ correspond to low spatial frequencies ("slowly varying" intensities), large ρ to high spatial frequencies contributed by "fast" variations such as step edges. The power at (ρ, θ) is determined by the amount of intensity variation at the frequency ρ occurring at the angle θ.
Convolution is also a good model for the application of many other linear operators, such as line-detecting templates. It can be used in another guise (called correlation) to perform matching operations (Chapter 3) which detect instances of subimages or features in an image.
In the spatial domain, the obvious implementation of the convolution operation involves a shift-multiply-integrate operation which is hard to do efficiently. However, multiplication and convolution are "transform pairs," so that the calculation of the convolution in one domain (say the spatial) is simplified by first Fourier transforming to the other (the frequency) domain, performing a multiplication, and then transforming back.

The convolution of f and g in the spatial domain is equivalent to the pointwise product of F and G in the frequency domain,

ℱ(f * g) = FG     (2.24)

We shall show this in a manner similar to [Duda and Hart 1973]. First we prove the shift theorem. If the Fourier transform of f(x) is F(u), defined as

F(u) = ∫ f(x) exp(-j2πux) dx     (2.25)

then

ℱ[f(x - a)] = ∫ f(x - a) exp(-j2πux) dx     (2.26)

Changing variables so that x' = x - a and dx = dx',

ℱ[f(x - a)] = ∫ f(x') exp{-j2π[u(x' + a)]} dx'     (2.27)

Now exp[-j2πu(x' + a)] = exp(-j2πua) exp(-j2πux'), where the first term is a constant. This means that

ℱ[f(x - a)] = exp(-j2πua) F(u)   (shift theorem)

Now we are ready to show that ℱ[f(x) * g(x)] = F(u)G(u):

ℱ(f * g) = ∫∫ f(x) g(y - x) exp(-j2πuy) dx dy     (2.28)

= ∫ f(x) {∫ g(y - x) exp(-j2πuy) dy} dx     (2.29)

Recognizing that the term in braces represents ℱ[g(y - x)] and applying the shift theorem, we obtain

ℱ(f * g) = ∫ f(x) exp(-j2πux) G(u) dx     (2.30)

= F(u)G(u)     (2.31)
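The result can be checked numerically in the discrete (circular) case, where the same relationship holds between the DFT of a circular convolution and the pointwise product of the DFTs. A sketch with arbitrary test signals:

```python
# Numerical check of the convolution theorem, Eq. (2.24), in discrete
# form: DFT(f circularly convolved with g) equals DFT(f) * DFT(g)
# pointwise, for equal-length signals.

import cmath

def dft(f):
    n = len(f)
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * u * x / n)
                for x in range(n))
            for u in range(n)]

def circular_convolve(f, g):
    n = len(f)
    return [sum(f[x] * g[(y - x) % n] for x in range(n)) for y in range(n)]

def max_error(f, g):
    """Largest difference between the two sides of Eq. (2.24)."""
    lhs = dft(circular_convolve(f, g))
    rhs = [a * b for a, b in zip(dft(f), dft(g))]
    return max(abs(a - b) for a, b in zip(lhs, rhs))

if __name__ == "__main__":
    f = [1.0, 2.0, 3.0, 4.0]
    g = [0.0, 1.0, 0.5, 0.25]
    print(max_error(f, g) < 1e-9)  # True
```

This is exactly why the FFT route (transform, multiply, transform back) computes convolutions so much faster than the direct shift-multiply-sum loop.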
2.2.5 Color
Not all images are monochromatic; in fact, applications using multispectral images are becoming increasingly common (Section 2.3.2). Further, human beings intuitively feel that color is an important part of their visual experience, and is useful or even necessary for powerful visual processing in the real world. Color vision provides a host of research issues, both for psychology and computer vision. We briefly discuss two aspects of color vision: color spaces and color perception. Several models of the human visual system not only include color but have proven useful in applications [Granrath 1981].
Color Spaces

Color spaces are a way of organizing the colors perceived by human beings. It happens that weighted combinations of stimuli at three principal wavelengths are sufficient to define almost all the colors we perceive. These wavelengths form a natural basis or coordinate system from which the color measurement process can be described. Color perception is not related in a simple way to color measurement, however.
Color is a perceptual phenomenon related to human response to different wavelengths in the visible electromagnetic spectrum [400 (blue) to 700 (red) nanometers; a nanometer (nm) is 10⁻⁹ meter]. The sensation of color arises from the sensitivities of three types of neurochemical sensors in the retina to the visible spectrum. The relative response of these sensors is shown in Fig. 2.6. Note that each sensor responds to a range of wavelengths. The illumination source has its own spectral composition f(λ), which is modified by the reflecting surface. Let r(λ) be this reflectance function. Then the measurement R produced by the "red" sensor is given by

R = ∫ f(λ) r(λ) h_R(λ) dλ     (2.32)

So the sensor output is actually the integral of three different wavelength-dependent components: the source f, the surface reflectance r, and the sensor h_R.
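Equation (2.32) can be discretized as a simple sum over sampled wavelengths. In this sketch the source, reflectance, and sensor-response curves are all made-up illustrative shapes, not measured human data:

```python
# Sketch of the sensor measurement of Eq. (2.32), discretized over sampled
# wavelengths: R ~= sum of f(l) * r(l) * h_R(l) * dl. All curves below are
# hypothetical illustrations, not real retinal sensitivities.

def sensor_measurement(source, reflectance, response, wavelengths, dl):
    """Discrete approximation to R = integral of f(l) r(l) h_R(l) dl."""
    return sum(source(l) * reflectance(l) * response(l) * dl
               for l in wavelengths)

def flat_source(l):        # equal-energy illuminant (hypothetical)
    return 1.0

def reddish_surface(l):    # reflects long wavelengths strongly (hypothetical)
    return 1.0 if l >= 600 else 0.2

def h_red(l):              # sensor peaked near 650 nm (hypothetical shape)
    return max(0.0, 1.0 - abs(l - 650) / 100.0)

def h_blue(l):             # sensor peaked near 450 nm (hypothetical shape)
    return max(0.0, 1.0 - abs(l - 450) / 100.0)

if __name__ == "__main__":
    wavelengths = range(400, 701, 10)   # nm, 400 (blue) to 700 (red)
    R = sensor_measurement(flat_source, reddish_surface, h_red,
                           wavelengths, 10.0)
    B = sensor_measurement(flat_source, reddish_surface, h_blue,
                           wavelengths, 10.0)
    print(R > B)  # the "red" sensor responds more strongly: True
```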
Surprisingly,onlyweightedcombinationsofthreedeltafunction approximationstothedifferentf(k)h 00 , thatis,8(A/?), 8(AG),and8(XB),arenecessaryto
i 1
1 P
1
a ^ /
400 500 600Wavelength,nm
700 Fig. 2.6 Spectral responseof humancolor sensors.
Sec.2.2 Image Model 31
-
producethesensationofnearlyallthecolors.Thisresultisdisplayedonachromaticitydiagram.Suchadiagramisobtainedbyfirstnormalizingthethreesensormeasurements:
r = R / (R + G + B)
g = G / (R + G + B)   (2.33)
b = B / (R + G + B)
and then plotting perceived color as a function of any two (usually red and green). Chromaticity explicitly ignores intensity or brightness; it is a section through the three-dimensional color space (Fig. 2.7). The choice of (λ_R, λ_G, λ_B) = (410, 530, 650) nm maximizes the realizable colors, but some colors still cannot be realized since they would require negative values for some of r, g, and b.
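Eq. (2.33) is a simple normalization; a minimal sketch, with the singular black point R = G = B = 0 guarded explicitly:

```python
def chromaticity(R, G, B):
    """Normalized chromaticity coordinates r, g, b of Eq. (2.33).

    Each coordinate is the sensor measurement divided by the total
    R + G + B, so r + g + b == 1 and intensity is factored out.
    """
    total = R + G + B
    if total == 0:
        raise ValueError("chromaticity undefined for black (R = G = B = 0)")
    return R / total, G / total, B / total

r, g, b = chromaticity(0.6, 0.3, 0.1)
```

Plotting (r, g) for all realizable colors traces out the chromaticity diagram of Fig. 2.7.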
Another more intuitive way of visualizing the possible colors from the RGB space is to view these measurements as Euclidean coordinates. Here any color can be visualized as a point in the unit cube. Other coordinate systems are useful for different applications; computer graphics has proved a strong stimulus for investigation of different color-space bases.
Color Perception

Color perception is complex, but the essential step is a transformation of three input intensity measurements into another basis. The coordinates of the new
Fig. 2.7 (a) An artist's conception of the chromaticity diagram (see color insert); (b) a more useful depiction. Spectral colors range along the curved boundary; the straight boundary is the line of purples.
32 Ch. 2 Image Formation
basis are more directly related to human color judgments. Although the RGB basis is good for the acquisition or display of color information, it is not a particularly good basis to explain the perception of colors. Human vision systems can make good judgments about the relative surface reflectance r(λ) despite different illuminating wavelengths; this reflectance seems to be what we mean by surface color.
Another important feature of the color basis is revealed by an ability to perceive in "black and white," effectively deriving intensity information from the color measurements. From an evolutionary point of view, we might expect that color perception in animals would be compatible with preexisting noncolor perceptual mechanisms.
These two needs, the need to make good color judgments and the need to retain and use intensity information, imply that we use a transformed, non-RGB basis for color space. Of the different bases in use for color vision, all are variations on this theme: intensity forms one dimension and color is a two-dimensional subspace. The differences arise in how the color subspace is described. We categorize such bases into two groups.
1. Intensity/Saturation/Hue (IHS). In this basis, we compute intensity as

intensity := R + G + B   (2.34)
The saturation measures the lack of whiteness in the color. Colors such as "fire-engine" red and "grass" green are saturated; pastels (e.g., pinks and pale blues) are desaturated. Saturation can be computed from RGB coordinates by the formula [Tenenbaum and Weyl 1975]
saturation := 1 - 3 min(R, G, B) / intensity   (2.35)

Hue is roughly proportional to the average wavelength of the color. It can be defined using RGB by the following program fragment:
hue := cos^-1 { (1/2)[(R - G) + (R - B)] / √((R - G)² + (R - B)(G - B)) }   (2.36)
if B > G then hue := 2π - hue

The IHS basis transforms the RGB basis in the following way. Thinking of the color cube, the diagonal from the origin to (1, 1, 1) becomes the intensity axis. Saturation is the distance of a point from that axis and hue is the angle with regard to the point about that axis from some reference (Fig. 2.8).
This basis is essentially that used by artists [Munsell 1939], who term saturation chroma. Also, this basis has been used in graphics [Smith 1978; Joblove and Greenberg 1978].
One problem with the IHS basis, particularly as defined by (2.34) through (2.36), is that it contains essential singularities where it is impossible to define the color in a consistent manner [Kender 1976]. For example, hue has an essential singularity for all values of (R, G, B) where R = G = B. This means that special care must be taken in algorithms that use hue.
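Eqs. (2.34) through (2.36) can be collected into one conversion routine. This sketch adds explicit guards for the singular cases noted above; returning None for an undefined hue is our convention, not the book's:

```python
import math

def rgb_to_ihs(R, G, B):
    """Intensity, hue, saturation per Eqs. (2.34)-(2.36).

    Guards the essential singularities: saturation needs intensity > 0,
    and hue is undefined (returned as None) when R = G = B.
    """
    intensity = R + G + B                              # Eq. (2.34)
    if intensity == 0:
        return 0.0, None, 0.0
    saturation = 1 - 3 * min(R, G, B) / intensity      # Eq. (2.35)
    denom = math.sqrt((R - G) ** 2 + (R - B) * (G - B))
    if denom == 0:                                     # R = G = B: hue singular
        return intensity, None, saturation
    arg = 0.5 * ((R - G) + (R - B)) / denom            # Eq. (2.36)
    hue = math.acos(max(-1.0, min(1.0, arg)))          # clamp for rounding
    if B > G:
        hue = 2 * math.pi - hue
    return intensity, hue, saturation

i, h, s = rgb_to_ihs(1.0, 0.0, 0.0)   # pure red: hue 0, saturation 1
```

Pure green and pure blue come out at hue 2π/3 and 4π/3 respectively, matching the angular picture of Fig. 2.8.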
2. Opponent processes. The opponent-process basis uses Cartesian rather than
Fig. 2.8 An IHS color space. (a) Cross-section at one intensity; (b) cross-section at one hue (see color inserts).
cylindrical coordinates for the color subspace, and was first proposed by Hering [Teevan and Birney 1961]. The simplest form of basis is a linear transformation from R, G, B coordinates. The new coordinates are termed "R - G", "Bl - Y", and "W - Bk":
[ R - G  ]   [  1  -2   1 ] [ R ]
[ Bl - Y ] = [ -1  -1   2 ] [ G ]
[ W - Bk ]   [  1   1   1 ] [ B ]

The advocates of this representation, such as [Hurvich and Jameson 1957], theorize that this basis has neurological correlates and is in fact the way human beings represent ("name") colors. For example, in this basis it makes sense to talk about a "reddish blue" but not a "reddish green." Practical opponent-process models usually have more complex weights in the transform matrix to account for psychophysical data. Some startling experiments [Land 1977] show our ability to make correct color judgments even when the illumination consists of only two principal wavelengths. The opponent process, at the level at which we have developed it, does not demonstrate how such judgments are made, but does show how stimulus at only two wavelengths will project into the color subspace. Readers interested in the details of the theory should consult the references.

Commercial television transmission needs an intensity, or "W - Bk," component for black-and-white television sets while still spanning the color space. The National Television Systems Committee (NTSC) uses a "YIQ" basis extracted from RGB via
[ I ]   [ 0.60  -0.28  -0.32 ] [ R ]
[ Q ] = [ 0.21  -0.52   0.31 ] [ G ]
[ Y ]   [ 0.30   0.59   0.11 ] [ B ]

This basis is a weighted form of (I, Q, Y) = ("R - cyan," "magenta - green," "W - Bk").
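Both changes of basis are plain matrix-vector products. In this sketch the sign conventions follow the standard opponent-process and NTSC definitions, and should be treated as assumptions rather than values quoted from the text:

```python
import numpy as np

# Opponent-process basis: rows give R-G, Bl-Y, and W-Bk as
# linear combinations of R, G, B (standard sign convention assumed).
OPPONENT = np.array([[ 1, -2,  1],
                     [-1, -1,  2],
                     [ 1,  1,  1]])

# NTSC YIQ basis: rows give I, Q, Y (standard NTSC coefficients).
YIQ = np.array([[0.60, -0.28, -0.32],
                [0.21, -0.52,  0.31],
                [0.30,  0.59,  0.11]])

rgb = np.array([1.0, 1.0, 1.0])    # white
opponent = OPPONENT @ rgb          # color channels vanish for gray input
iqy = YIQ @ rgb                    # I = Q = 0, Y = 1 for white
```

For any gray input the two color coordinates are zero in both bases, which is exactly the "intensity plus two-dimensional color subspace" structure described above.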
2.2.6 Digital Images
The digital images with which computer vision deals are represented by m-vector discrete-valued image functions f(x), usually of one, two, three, or four dimensions.

Usually m = 1, and both the domain and range of f(x) are discrete. The domain of f is finite, usually a rectangle, and the range of f is positive and bounded: 0 ≤ f(x) ≤ M for some maximum gray level M.
Fig. 2.9 Using different numbers of samples. (a) N = 16; (b) N = 32; (c) N = 64; (d) N = 128; (e) N = 256; (f) N = 512.
Table 2.3

NUMBER OF 8-BIT BYTES OF STORAGE FOR VARIOUS VALUES OF N AND m

                            N
  m       32      64     128      256      512
  1      128     512   2,048    8,192   32,768
  2      256   1,024   4,096   16,384   65,536
  3      512   2,048   8,192   32,768  131,072
  4      512   2,048   8,192   32,768  131,072
  5    1,024   4,096  16,384   65,536  262,144
  6    1,024   4,096  16,384   65,536  262,144
  7    1,024   4,096  16,384   65,536  262,144
  8    1,024   4,096  16,384   65,536  262,144
computer vision to have to balance the desire for increased resolution (both gray-scale and spatial) against its cost. Better data can often make algorithms easier to write, but a small amount of data can make processing more efficient. Of course, the image domain, choice of algorithms, and image characteristics all heavily influence the choice of resolutions.
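The entries of Table 2.3 follow from packing each pixel's m bits into the next power-of-two bit width (1, 2, 4, or 8 bits per pixel). The packing rule is inferred from the table rather than stated in the text, so treat this as a sketch:

```python
def storage_bytes(N, m):
    """8-bit bytes needed for an N x N image with m bits per pixel,
    assuming pixels are packed at 1-, 2-, 4-, or 8-bit boundaries
    (this assumption reproduces the entries of Table 2.3)."""
    packed_bits = next(b for b in (1, 2, 4, 8) if b >= m)
    return N * N * packed_bits // 8

bytes_needed = storage_bytes(256, 6)   # the 65,536 entry of Table 2.3
```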
Tesselations and Distance Metrics

Although the spatial samples for f(x) can be represented as points, it is more satisfying to the intuition, and a closer approximation to the acquisition process, to think of these samples as finite-sized cells of constant gray level partitioning the image. These cells are termed pixels, an acronym for picture elements. The pattern into which the plane is divided is called its tesselation. The most common regular tesselations of the plane are shown in Fig. 2.11.
Although rectangular tesselations are almost universally used in computer vision, they have a structural problem known as the "connectivity paradox." Given a pixel in a rectangular tesselation, how should we define the pixels to which it is connected? Two common ways are four-connectivity and eight-connectivity, shown in Fig. 2.12.
However, each of these schemes has complications. Consider Fig. 2.12c, consisting of a black object with a hole on a white background. If we use four-connectedness, the figure consists of four disconnected pieces, yet the hole is separated from the "outside" background. Alternatively, if we use eight-connectedness, the figure is one connected piece, yet the hole is now connected to the outside. This paradox poses complications for many geometric algorithms. Triangular and hexagonal tesselations do not suffer from connectivity difficulties (if we use three-connectedness for triangles); however, distance can be more difficult to compute on these arrays than for rectangular arrays.
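The paradox can be demonstrated directly by counting connected components of a small diagonal ring under each neighborhood definition. The breadth-first labeling below is a generic component counter sketched for this purpose, not an algorithm from the text:

```python
from collections import deque

N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]

def count_components(img, value, neighbors):
    """Count connected components of pixels equal to `value`
    under the given neighborhood (N4 or N8)."""
    rows, cols = len(img), len(img[0])
    seen, count = set(), 0
    for r in range(rows):
        for c in range(cols):
            if img[r][c] != value or (r, c) in seen:
                continue
            count += 1
            queue = deque([(r, c)])
            seen.add((r, c))
            while queue:
                cr, cc = queue.popleft()
                for dr, dc in neighbors:
                    nr, nc = cr + dr, cc + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and img[nr][nc] == value
                            and (nr, nc) not in seen):
                        seen.add((nr, nc))
                        queue.append((nr, nc))
    return count

# A diagonal ring of black (1) pixels around a white (0) hole,
# in the spirit of Fig. 2.12c.
img = [[0, 0, 0, 0, 0],
       [0, 0, 1, 0, 0],
       [0, 1, 0, 1, 0],
       [0, 0, 1, 0, 0],
       [0, 0, 0, 0, 0]]
```

Under four-connectedness the ring splits into four black pieces while the hole stays a separate white component; under eight-connectedness the ring is one piece but the hole merges with the outside background.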
The distance between two pixels in an image is an important measure that is fundamental to many algorithms. In general, a distance d is a metric. That is,
Fig. 2.10 Using different numbers of bits per sample. (a) m = 1; (b) m = 2; (c) m = 4; (d) m = 8.
(1) d(x, y) = 0 iff x = y
(2) d(x, y) = d(y, x)
(3) d(x, y) + d(y, z) ≥ d(x, z)
For square arrays with unit spacing between pixels, we can use any of the following common distance metrics (Fig. 2.13) for two pixels x = (x1, y1) and y = (x2, y2):
Euclidean:

d_e(x, y) = √((x1 - x2)² + (y1 - y2)²)   (2.37)

City block:

d_cb(x, y) = |x1 - x2| + |y1 - y2|   (2.38)
Fig. 2.11 Different tesselations of the image plane. (a) Rectangular; (b) triangular; (c) hexagonal.

Chessboard:

d_ch(x, y) = max(|x1 - x2|, |y1 - y2|)   (2.39)
Other definitions are possible, and all such measures extend to multiple dimensions. The tesselation of higher-dimensional space into pixels usually is confined to (n-dimensional) cubical pixels.
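Eqs. (2.37) through (2.39) translate directly into code (a straightforward sketch):

```python
import math

def d_euclidean(x, y):
    """Eq. (2.37): straight-line distance."""
    return math.sqrt((x[0] - y[0]) ** 2 + (x[1] - y[1]) ** 2)

def d_city_block(x, y):
    """Eq. (2.38): sum of absolute coordinate differences."""
    return abs(x[0] - y[0]) + abs(x[1] - y[1])

def d_chessboard(x, y):
    """Eq. (2.39): maximum absolute coordinate difference."""
    return max(abs(x[0] - y[0]), abs(x[1] - y[1]))

p, q = (0, 0), (3, 4)
e, cb, ch = d_euclidean(p, q), d_city_block(p, q), d_chessboard(p, q)
```

For these two pixels the three metrics give 5.0, 7, and 4 respectively, which is why their equidistant contours in Fig. 2.13 are circles, diamonds, and squares.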
The Sampling Theorem

Consider the one-dimensional "image" shown in Fig. 2.14. To digitize this image one must sample the image function. These samples will usually be separated at regular intervals as shown. How far apart should these samples be to allow reconstruction (to a given accuracy) of the underlying continuous image from its samples? This question is answered by the Shannon sampling theorem. An excellent rigorous presentation of the sampling theorem may be found in [Rosenfeld and Kak 1976]. Here we shall present a shorter graphical interpretation using the results of Table 2.2. For simplicity we consider the image to be periodic in order to avoid small edge effects introduced by the finite image domain. A more rigorous
Fig. 2.12 Connectivity paradox for rectangular tesselations. (a) A central pixel and its 4-connected neighbors; (b) a pixel and its 8-connected neighbors; (c) a figure with ambiguous connectivity.
Fig. 2.13 Equidistant contours for different metrics.
Fig. 2.14 One-dimensional image and its samples.
treatment, which considers these effects, is given in [Andrews and Hunt 1977].
Suppose that the image is sampled with a "comb" function of spacing x0 (see Table 2.2). Then the sampled image can be modeled by
f_s(x) = f(x) Σ_n δ(x - nx0)   (2.40)
where the image function modulates the comb function. Equivalently, this can be written as
f_s(x) = Σ_n f(nx0) δ(x - nx0)   (2.41)
The right-hand side of Eq. (2.40) is the product of two functions, so that property
(6) in Table 2.1 is appropriate. The Fourier transform of f_s(x) is equal to the convolution of the transforms of each of the two functions. Using this result yields
F_s(u) = F(u) * (1/x0) Σ_n δ(u - n/x0)   (2.42)

But from Eq. (2.3),

F(u) * δ(u - n/x0) = F(u - n/x0)   (2.43)

so that

F_s(u) = (1/x0) Σ_n F(u - n/x0)   (2.44)
Therefore, sampling the image function f(x) at intervals of x0 is equivalent in the frequency domain to replicating the transform of f at intervals of 1/x0. This limits the recovery of f(x) from its sampled representation f_s(x). There are two basic situations to consider. If the transform of f(x) is band-limited such that F(u) = 0 for |u| > 1/(2x0), then there is no overlap between successive replications of F(u) in the frequency domain. This is shown for the case of Fig. 2.15a, where we have arbitrarily used a triangular-shaped image transform to illustrate the effects of sampling. Incidentally, note that for this transform F(u) = F(-u) and that it has no imaginary part; from Table 2.2, the one-dimensional image must also be real and even. Now if F(u) is not band-limited, i.e., there are |u| > 1/(2x0) for which F(u) ≠ 0, then components of different replications of F(u) will interact to produce the composite function F_s(u), as shown in Fig. 2.15b. In the first case f(x) can be recovered from F_s(u) by multiplying F_s(u) by a suitable G(u):
G(u) = x0 for |u| < 1/(2x0), and 0 otherwise
Fig. 2.15 (a) F(u) band-limited so that F(u) = 0 for |u| > 1/(2x0); (b) F(u) not band-limited as in (a); (c) reconstructed transform.
the image, or can be incorporated with such blurring (by integrating the image intensity over a finite area for each sample). Image blurring can bury irrelevant details, reduce certain forms of noise, and also reduce the effects of aliasing.
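The replication of F(u) at intervals 1/x0 is what makes a sinusoid above the Nyquist limit indistinguishable at the sample points from a lower-frequency one. A minimal numerical sketch (the frequencies are chosen for illustration):

```python
import numpy as np

x0 = 0.1                      # sample spacing; Nyquist limit is 1/(2*x0) = 5
n = np.arange(32)             # sample indices

def sample_sine(freq):
    """Samples of sin(2*pi*freq*x) taken at x = n*x0."""
    return np.sin(2 * np.pi * freq * n * x0)

# A 7-cycle/unit sinusoid exceeds the Nyquist limit and produces the
# same samples (up to sign) as a 3-cycle/unit one, since 7 = 1/x0 - 3:
low = sample_sine(3.0)
high = sample_sine(7.0)
aliased = np.allclose(high, -low)
```

Once sampled, the two frequencies cannot be told apart, which is exactly the overlap of replications shown in Fig. 2.15b.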
2.3 IMAGING DEVICES FOR COMPUTER VISION
There is a vast array of methods for obtaining a digital image in a computer. In this section we have in mind only "traditional" images produced by various forms of radiation impinging on a sensor after having been affected by physical objects.
Many sensors are best modeled as an analog device whose response must be digitized for computer representation. The types of imaging devices possible are limited only by the technical ingenuity of their developers; attempting a definitive
42 Ch. 2 Image Formation
-
Fig. 2.16 Imaging devices (boxes), information structures (rectangles), and processes (circles).
taxonomy is probably unwise. Figure 2.16 is a flowchart of devices, information structures, and processes addressed in this and succeeding sections.
When the image already exists in some form, or physical considerations limit choice of imaging technology, the choice of digitizing technology may still be open. Most images are carried on a permanent medium, such as film, or at least are available in (essentially) analog form to a digitizing device. Generally, the relevant technical characteristics of imaging or digitizing devices should be foremost in mind when a technique is being selected. Such considerations as the signal-to-noise ratio of the device, its resolution, the speed at which it works, and its expense are important issues.
2.3.1 Photographic Imaging
The camera is the most familiar producer of optical images on a permanent medium. We shall not address here the multitudes of still and movie camera options; rather, we briefly treat the characteristics of the photographic film and of the digitizing devices that convert the image to machine-readable form. More on these topics is well presented in the References.
Photographic (black-and-white) film consists of an emulsion of silver halide crystals on a film base. (Several other layers are identifiable, but are not essential to an understanding of the relevant properties of film.) Upon exposure to light, the silver halide crystals form development centers, which are small grains of metallic silver. The photographic development process extends the formation of metallic silver to the entire silver halide crystal, which thus becomes a binary ("light" or "no light") detector. Subsequent processing removes undeveloped silver halide. The resulting film negative is dark where many crystals were developed and light where few were. The resolution of the film is determined by the grain size, which depends on the original halide crystals and on development techniques. Generally, the faster the film (the less light needed to expose it), the coarser the grain. Film exists that is sensitive to infrared radiation; x-ray film typically has two emulsion layers, giving it more gray-level range than that of normal film.
A repetition of the negative-forming process is used to obtain a photographic print. The negative is projected onto photographic paper, which responds roughly in the same way as the negative. Most photographic print paper cannot capture in one print the range of densities that can be present in a negative. Positive films do exist that do not require printing; the most common example is color slide film.
The response of film to light is not completely linear. The photographic density obtained by a negative is defined as the logarithm (base 10) of the ratio of incident light to transmitted light:
D = log10 (incident light / transmitted light)
The exposure of a negative dictates (approximately) its response. Exposure is defined as the energy per unit area that exposed the film (in its sensitive spectral range). Thus exposure is the product of the intensity and the time of exposure. This mathematical model of the behavior of the photographic exposure process is correct for a wide operating range of the film, but reciprocity failure effects in the film keep one from being able always to trade light level for exposure time. At very low light levels, longer exposure times are needed than are predicted by the product rule.
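The density and exposure definitions are direct to compute (a trivial sketch; the 100:1 transmission ratio and the exposure values are illustrative, not from the text):

```python
import math

def photographic_density(incident, transmitted):
    """D = log10(incident / transmitted). A negative transmitting
    1% of the incident light has density 2."""
    return math.log10(incident / transmitted)

def exposure(intensity, time):
    """Exposure = intensity x time; valid only within the film's
    reciprocity range, as noted above."""
    return intensity * time

d = photographic_density(100.0, 1.0)
```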
The response of film to light is usually plotted in an "H & D curve" (named for Hurter and Driffield), which plots density versus exposure. The H & D curve of film displays many of its important characteristics. Figure 2.17 exhibits a typical H & D curve for a black-and-white film.
The toe of the curve is the lower region of low slope. It expresses reciprocity failure and the fact that the film has a certain bias, or fog response, which dominates its behavior at the lowest exposure levels. As one would expect, there is an upper limit to the density of the film, attained when a maximum number of silver
Fig. 2.17 Typical H & D curve (density versus log exposure).
halide crystals are rendered developable. Increasing exposure beyond this maximum level has little effect, accounting for the shoulder in the H & D curve, or its flattened upper end.
In between the toe and shoulder, there is typically a linear operating region of the curve. High-contrast films are those with high slope (traditionally called gamma); they respond dramatically to small changes in exposure. A high-contrast film may have a gamma between about 1.5 and 10. Films with gammas of approximately 10 are used in graphic arts to copy line drawings. General-purpose films have gammas of about 0.5 to 1.0.
The resolution of a general film is about 40 lines/mm, which means that a 1400 x 1400 image may be digitized from a 35mm slide. At any greater sampling frequency, the individual film grains will occupy more than a pixel, and the resolution will thus be grain-limited.
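The 1400 x 1400 figure is a back-of-envelope product, assuming a nominal 35 mm frame dimension and one sample per resolvable line:

```python
resolution_lines_per_mm = 40   # typical general-purpose film, per the text
frame_mm = 35                  # nominal 35mm slide dimension (assumption)

max_samples = resolution_lines_per_mm * frame_mm
```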
Image Digitizers (Scanners)

Accuracy and speed are the main considerations in converting an image on film into digital form. Accuracy has two aspects: spatial resolution, loosely the level of image spatial detail to which the digitizer can respond, and gray-level resolution, defined generally as the range of densities or reflectances to which the digitizer responds and how finely it divides the range. Speed is also important because usually many data are involved; images of 1 million samples are commonplace.
Digitizers broadly take two forms: mechanical and "flying spot." In a mechanical digitizer, the film and a sensing assembly are mechanically transported past one another while readings are made. In a flying-spot digitizer, the film and sensor are static. What moves is the "flying spot," which is a point of light on the face of a cathode-ray tube, or a laser beam directed by mirrors. In all digitizers a very narrow beam of light is directed through the film or onto the print at a known coordinate point. The light transmittance or reflectance is measured, transformed from analog to digital form, and made available to the computer through interfacing electronics. The location on the medium where density is being measured may also be transmitted with each reading, but it is usually determined by relative offset from positions transmitted less frequently. For example, a "new scan line" impulse is transmitted for TV output; the position along the current scan line yields an x position, and the number of scan lines yields a y position.
The mechanical scanners are mostly of two types, flat-bed and drum. In a flat-bed digitizer, the film is laid flat on a surface over which the light source and the sensor (usually a very accurate photoelectric cell) are transported in a raster fashion. In a drum digitizer, the film is fastened to a circular drum which revolves as the sensor and light source are transported down the drum parallel to its axis of rotation.
Color mechanical digitizers also exist; they work by using colored filters, effectively extracting in three scans three "color overlays" which when superimposed would yield the original color image. Extracting some "composite" color signal with one reading presents technical problems and would be difficult to do as accurately.
Satellite Imagery

LANDSAT and ERTS (Earth Resources Technology Satellites) have similar scanners which produce images of 2340 x 3380 7-bit pixels in four spectral bands, covering an area of 100 x 100 nautical miles. The scanner is mechanical, scanning six horizontal scan lines at a time; the rotation of the earth accounts for the advancement of the scan in the vertical direction.
A set of four images is shown in Fig. 2.18. The four spectral bands are numbered 4, 5, 6, and 7. Band 4 [0.5 to 0.6 μm (green)] accentuates sediment-laden water and shallow water; band 5 [0.6 to 0.7 μm (red)] emphasizes cultural features such as roads and cities; band 6 [0.7 to 0.8 μm (near infrared)] emphasizes vegetation and accentuates the contrast between land and water; band 7 [0.8 to 1.1 μm (near infrared)] is like band 6 except that it is better at penetrating atmospheric haze.
The LANDSAT images are available at nominal cost from the U.S. government (The EROS Data Center, Sioux Falls, South Dakota 57198). They are furnished on tape, and cover the entire surface of the earth (often the buyer has a choice of the amount of cloud cover). These images form a huge data base of multispectral imagery, useful for land-use and geological studies; they furnish something of an image-analysis challenge, since one satellite can produce some 6 billion bits of image data per day.
Television Imaging

Television cameras are appealing devices for computer vision applications for several reasons. For one thing, the image is immediate; the camera can show events as they happen. For another, the image is already in electrical, if not digital, form. "Television camera" is basically a nontechnical term, because many different technologies produce video signals conforming to the standards set by the FCC and NTSC. Cameras exist with a wide variety of technical specifications.
Usually, TV cameras have associated electronics which scan an entire "picture" at a time. This operation is closely related to broadcast and receiver standards, and is more oriented to human viewing than to computer vision. An entire image (of some 525 scan lines in the United States) is called a frame, and consists of two fields, each made up of alternate scan lines from the frame. These fields are generated and transmitted sequentially by the camera electronics. The transmitted image is thus interlaced, with all odd-numbered scan lines being "painted" on the
Fig. 2.18 The stra