Handbook of Computer Vision Algorithms in Image Algebra

by Gerhard X. Ritter; Joseph N. Wilson

CRC Press LLC

ISBN: 0849326362   Pub Date: 05/01/96


Preface

Acknowledgments

Chapter 1—Image Algebra

1.1. Introduction

1.2. Point Sets

1.3. Value Sets

1.4. Images

1.5. Templates

1.6. Recursive Templates

1.7. Neighborhoods

1.8. The p-Product

1.9. References

Chapter 2—Image Enhancement Techniques

2.1. Introduction

2.2. Averaging of Multiple Images

2.3. Local Averaging

2.4. Variable Local Averaging

2.5. Iterative Conditional Local Averaging

2.6. Max-Min Sharpening Transform

2.7. Smoothing Binary Images by Association

2.8. Median Filter

2.9. Unsharp Masking

2.10. Local Area Contrast Enhancement

2.11. Histogram Equalization

2.12. Histogram Modification

2.13. Lowpass Filtering

2.14. Highpass Filtering

2.15. References

Chapter 3—Edge Detection and Boundary Finding Techniques

3.1. Introduction

3.2. Binary Image Boundaries

3.3. Edge Enhancement by Discrete Differencing

3.4. Roberts Edge Detector

3.5. Prewitt Edge Detector

3.6. Sobel Edge Detector

3.7. Wallis Logarithmic Edge Detection

3.8. Frei-Chen Edge and Line Detection

3.9. Kirsch Edge Detector

3.10. Directional Edge Detection

3.11. Product of the Difference of Averages

3.12. Crack Edge Detection

3.13. Local Edge Detection in Three-Dimensional Images

3.14. Hierarchical Edge Detection

3.15. Edge Detection Using K-Forms

3.16. Hueckel Edge Operator

3.17. Divide-and-Conquer Boundary Detection

3.18. Edge Following as Dynamic Programming

3.19. References

Chapter 4—Thresholding Techniques

4.1. Introduction

4.2. Global Thresholding

4.3. Semithresholding

4.4. Multilevel Thresholding

4.5. Variable Thresholding

4.6. Threshold Selection Using Mean and Standard Deviation

4.7. Threshold Selection by Maximizing Between-Class Variance

4.8. Threshold Selection Using a Simple Image Statistic

4.9. References

Chapter 5—Thinning and Skeletonizing

5.1. Introduction

5.2. Pavlidis Thinning Algorithm

5.3. Medial Axis Transform (MAT)

5.4. Distance Transforms

5.5. Zhang-Suen Skeletonizing

5.6. Zhang-Suen Transform — Modified to Preserve Homotopy

5.7. Thinning Edge Magnitude Images

5.8. References

Chapter 6—Connected Component Algorithms

6.1. Introduction

6.2. Component Labeling for Binary Images

6.3. Labeling Components with Sequential Labels

6.4. Counting Connected Components by Shrinking

6.5. Pruning of Connected Components

6.6. Hole Filling

6.7. References

Chapter 7—Morphological Transforms and Techniques

7.1. Introduction

7.2. Basic Morphological Operations: Boolean Dilations and Erosions

7.3. Opening and Closing

7.4. Salt and Pepper Noise Removal

7.5. The Hit-and-Miss Transform

7.6. Gray Value Dilations, Erosions, Openings, and Closings

7.7. The Rolling Ball Algorithm

7.8. References

Chapter 8—Linear Image Transforms

8.1. Introduction

8.2. Fourier Transform

8.3. Centering the Fourier Transform

8.4. Fast Fourier Transform

8.5. Discrete Cosine Transform

8.6. Walsh Transform

8.7. The Haar Wavelet Transform

8.8. Daubechies Wavelet Transforms

8.9. References

Chapter 9—Pattern Matching and Shape Detection

9.1. Introduction

9.2. Pattern Matching Using Correlation

9.3. Pattern Matching in the Frequency Domain

9.4. Rotation Invariant Pattern Matching

9.5. Rotation and Scale Invariant Pattern Matching

9.6. Line Detection Using the Hough Transform

9.7. Detecting Ellipses Using the Hough Transform

9.8. Generalized Hough Algorithm for Shape Detection

9.9. References

Chapter 10—Image Features and Descriptors

10.1. Introduction

10.2. Area and Perimeter

10.3. Euler Number

10.4. Chain Code Extraction and Correlation

10.5. Region Adjacency

10.6. Inclusion Relation

10.7. Quadtree Extraction

10.8. Position, Orientation, and Symmetry

10.9. Region Description Using Moments

10.10. Histogram

10.11. Cumulative Histogram

10.12. Texture Descriptors: Gray Level Spatial Dependence Statistics

10.13. References

Chapter 11—Neural Networks and Cellular Automata

11.1. Introduction

11.2. Hopfield Neural Network

11.3. Bidirectional Associative Memory (BAM)

11.4. Hamming Net

11.5. Single-Layer Perceptron (SLP)

11.6. Multilayer Perceptron (MLP)

11.7. Cellular Automata and Life

11.8. Solving Mazes Using Cellular Automata

11.9. References

Appendix A

Index


Preface

The aim of this book is to acquaint engineers, scientists, and students with the basic concepts of image algebra and its use in the concise representation of computer vision algorithms. In order to achieve this goal we provide a brief survey of commonly used computer vision algorithms that we believe represents a core of knowledge that all computer vision practitioners should have. This survey is not meant to be an encyclopedic summary of computer vision techniques as it is impossible to do justice to the scope and depth of the rapidly expanding field of computer vision.

The arrangement of the book is such that it can serve as a reference for computer vision algorithm developers in general as well as for algorithm developers using the image algebra C++ object library, iac++.¹ The techniques and algorithms presented in a given chapter follow a progression of increasing abstractness. Each technique is introduced by way of a brief discussion of its purpose and methodology. Since the intent of this text is to train the practitioner in formulating his algorithms and ideas in the succinct mathematical language provided by image algebra, an effort has been made to provide the precise mathematical formulation of each methodology. Thus, we suspect that practicing engineers and scientists will find this presentation somewhat more practical and perhaps a bit less esoteric than those found in research publications or various textbooks paraphrasing these publications.

¹The iac++ library supports the use of image algebra in the C++ programming language and is available for anonymous ftp from ftp://ftp.cis.ufl.edu/pub/src/ia/.

Chapter 1 provides a short introduction to the field of image algebra. Chapters 2-11 are devoted to particular techniques commonly used in computer vision algorithm development, ranging from early processing techniques to such higher level topics as image descriptors and artificial neural networks. Although the chapters on techniques are most naturally studied in succession, they are not tightly interdependent and can be studied according to the reader’s particular interest. In the Appendix we present iac++ computer programs of some of the techniques surveyed in this book. These programs reflect the image algebra pseudocode presented in the chapters and serve as examples of how image algebra pseudocode can be converted into efficient computer programs.


Acknowledgments

We wish to take this opportunity to express our thanks to our current and former students who have, in various ways, assisted in the preparation of this text. In particular, we wish to extend our appreciation to Dr. Paul Gader, Dr. Jennifer Davidson, Dr. Hongchi Shi, Ms. Brigitte Pracht, Mr. Mark Schmalz, Mr. Venugopal Subramaniam, Mr. Mike Rowlee, Dr. Dong Li, Dr. Huixia Zhu, Ms. Chuanxue Wang, Mr. Jaime Zapata, and Mr. Liang-Ming Chen. We are most deeply indebted to Dr. David Patching who assisted in the preparation of the text and contributed to the material by developing examples that enhanced the algorithmic exposition. Special thanks are due to Mr. Ralph Jackson, who skillfully implemented many of the algorithms herein, and to Mr. Robert Forsman, the primary implementor of the iac++ library.

We also want to express our gratitude to the Air Force Wright Laboratory for their encouragement and continuous support of image algebra research and development. This book would not have been written without the vision and support provided by numerous scientists of the Wright Laboratory at Eglin Air Force Base in Florida. These supporters include Dr. Lawrence Ankeney who started it all, Dr. Sam Lambert who championed the image algebra project since its inception, Mr. Nell Urquhart our first program manager, Ms. Karen Norris, and most especially Dr. Patrick Coffield who persuaded us to turn a technical report on computer vision algorithms in image algebra into this book.

Last but not least we would like to thank Dr. Robert Lyjack of ERIM for his friendship and enthusiastic support during the formative stages of image algebra.

Notation

The tables presented here provide a brief explanation of the notation used throughout this document. The reader is referred to Ritter [1] for a comprehensive treatise covering the mathematics of image algebra.

Logic

Symbol   Explanation
p ⇒ q    "p implies q." If p is true, then q is true.
p ⇔ q    "p if and only if q," which means that p and q are logically equivalent.
iff      "if and only if"
¬        "not"
∃        "there exists"
∄        "there does not exist"
∀        "for each"
s.t.     "such that"

Set-Theoretic Notation and Operations

Symbol        Explanation
X, Y, Z       Uppercase characters represent arbitrary sets.
x, y, z       Lowercase characters represent elements of an arbitrary set.
X, Y, Z       Bold, uppercase characters are used to represent point sets.
x, y, z       Bold, lowercase characters are used to represent points, i.e., elements of point sets.
              The set {0, 1, 2, 3, ...}.
ℤ, ℤ⁺, ℤ⁻     The set of integers, positive integers, and negative integers.
              The set {0, 1, ..., n - 1}.
              The set {1, 2, ..., n}.
              The set {-n+1, ..., -1, 0, 1, ..., n - 1}.
ℝ, ℝ⁺, ℝ⁻     The set of real numbers, positive real numbers, negative real numbers, and positive real numbers including 0.
ℂ             The set of complex numbers.
𝔽             An arbitrary set of values.
𝔽 ∪ {+∞}      The set 𝔽 unioned with {+∞}.
𝔽 ∪ {-∞}      The set 𝔽 unioned with {-∞}.
𝔽 ∪ {-∞, +∞}  The set 𝔽 unioned with {-∞, +∞}.
∅             The empty set (the set that has no elements).
2^X           The power set of X (the set of all subsets of X).
∈             "is an element of"
∉             "is not an element of"
⊆             "is a subset of"
X ∪ Y         Union: X ∪ Y = {z : z ∈ X or z ∈ Y}.
∪_{λ∈Λ} X_λ   Let {X_λ} be a family of sets indexed by an indexing set Λ; ∪_{λ∈Λ} X_λ = {x : x ∈ X_λ for at least one λ ∈ Λ}.
∪_{i=1}^n X_i = X1 ∪ X2 ∪ ... ∪ Xn = {x : x ∈ X_i for some i ∈ {1, ..., n}}.
X ∩ Y         Intersection: X ∩ Y = {z : z ∈ X and z ∈ Y}.
∩_{λ∈Λ} X_λ   Let {X_λ} be a family of sets indexed by an indexing set Λ; ∩_{λ∈Λ} X_λ = {x : x ∈ X_λ for all λ ∈ Λ}.
∩_{i=1}^n X_i = X1 ∩ X2 ∩ ... ∩ Xn = {x : x ∈ X_i for all i ∈ {1, ..., n}}.
X × Y         Cartesian product: X × Y = {(x, y) : x ∈ X, y ∈ Y}.
X1 × ... × Xn = {(x1, x2, ..., xn) : x_i ∈ X_i}.
X1 × X2 × ... = {(x1, x2, x3, ...) : x_i ∈ X_i}.
ℝⁿ            The Cartesian product of n copies of ℝ.
X \ Y         Set difference: let X and Y be subsets of some universal set U; X \ Y = {x ∈ X : x ∉ Y}.
X′            Complement: X′ = U \ X, where U is the universal set that contains X.
card(X)       The cardinality of the set X.
choice(X)     A function that randomly selects an element from the set X.

Point and Point Set Operations

Symbol      Explanation
x + y       If x, y ∈ ℝⁿ, then x + y = (x1 + y1, ..., xn + yn)
x - y       If x, y ∈ ℝⁿ, then x - y = (x1 - y1, ..., xn - yn)
x · y       If x, y ∈ ℝⁿ, then x · y = (x1y1, ..., xnyn)
x/y         If x, y ∈ ℝⁿ, then x/y = (x1/y1, ..., xn/yn)
x ∨ y       If x, y ∈ ℝⁿ, then x ∨ y = (x1 ∨ y1, ..., xn ∨ yn)
x ∧ y       If x, y ∈ ℝⁿ, then x ∧ y = (x1 ∧ y1, ..., xn ∧ yn)
x γ y       In general, if x, y ∈ ℝⁿ and γ is a binary operation, then x γ y = (x1γy1, ..., xnγyn)
k γ x       If k ∈ ℝ and x ∈ ℝⁿ, then k γ x = (kγx1, ..., kγxn)
x • y       If x, y ∈ ℝⁿ, then x • y = x1y1 + x2y2 + ··· + xnyn
x × y       If x, y ∈ ℝ³, then x × y = (x2y3 - x3y2, x3y1 - x1y3, x1y2 - x2y1)
            If x ∈ ℝⁿ and y ∈ ℝᵐ, then the concatenation of x and y is (x1, ..., xn, y1, ..., ym)
-x          If x ∈ ℝⁿ, then -x = (-x1, ..., -xn)
⌈x⌉         If x ∈ ℝⁿ, then ⌈x⌉ = (⌈x1⌉, ..., ⌈xn⌉)
⌊x⌋         If x ∈ ℝⁿ, then ⌊x⌋ = (⌊x1⌋, ..., ⌊xn⌋)
[x]         If x ∈ ℝⁿ, then [x] = ([x1], ..., [xn])
p_i(x)      If x = (x1, x2, ..., xn) ∈ ℝⁿ, then p_i(x) = x_i
Σx          If x ∈ ℝⁿ, then Σx = x1 + x2 + ··· + xn
Πx          If x ∈ ℝⁿ, then Πx = x1x2 ··· xn
∨x          If x ∈ ℝⁿ, then ∨x = x1 ∨ x2 ∨ ··· ∨ xn
∧x          If x ∈ ℝⁿ, then ∧x = x1 ∧ x2 ∧ ··· ∧ xn
||x||_2     If x ∈ ℝⁿ, then ||x||_2 = (x1² + x2² + ··· + xn²)^(1/2)
||x||_1     If x ∈ ℝⁿ, then ||x||_1 = |x1| + |x2| + ··· + |xn|
||x||_∞     If x ∈ ℝⁿ, then ||x||_∞ = |x1| ∨ |x2| ∨ ··· ∨ |xn|
dim(x)      If x ∈ ℝⁿ, then dim(x) = n
X + Y       If X, Y ⊆ ℝⁿ, then X + Y = {x + y : x ∈ X and y ∈ Y}
X - Y       If X, Y ⊆ ℝⁿ, then X - Y = {x - y : x ∈ X and y ∈ Y}
X + p       If X ⊆ ℝⁿ, then X + p = {x + p : x ∈ X}
X - p       If X ⊆ ℝⁿ, then X - p = {x - p : x ∈ X}
X ∪ Y       If X, Y ⊆ ℝⁿ, then X ∪ Y = {z : z ∈ X or z ∈ Y}
X \ Y       If X, Y ⊆ ℝⁿ, then X \ Y = {z : z ∈ X and z ∉ Y}
X Δ Y       If X, Y ⊆ ℝⁿ, then X Δ Y = {z : z ∈ X ∪ Y and z ∉ X ∩ Y}
X × Y       If X, Y ⊆ ℝⁿ, then X × Y = {(x, y) : x ∈ X and y ∈ Y}
-X          If X ⊆ ℝⁿ, then -X = {-x : x ∈ X}
X′          If X ⊆ ℝⁿ, then X′ = {z : z ∈ ℝⁿ and z ∉ X}
sup(X)      If X ⊆ ℝⁿ, then sup(X) is the supremum of X. If X = {x1, x2, ..., xn}, then sup(X) = x1 ∨ x2 ∨ ··· ∨ xn
⋁X          For a point set X with total order ≤, x0 = ⋁X ⇔ x ≤ x0, ∀x ∈ X \ {x0}
inf(X)      If X ⊆ ℝⁿ, then inf(X) is the infimum of X. If X = {x1, x2, ..., xn}, then inf(X) = x1 ∧ x2 ∧ ··· ∧ xn
⋀X          For a point set X with total order ≤, x0 = ⋀X ⇔ x0 ≤ x, ∀x ∈ X \ {x0}
choice(X)   If X ⊆ ℝⁿ, then choice(X) ∈ X (a randomly chosen element)
card(X)     If X ⊆ ℝⁿ, then card(X) is the cardinality of X

Morphology

In the following table A, B, D, and E denote subsets of ℝⁿ.

Symbol   Explanation
A*       The reflection of A across the origin 0 = (0, 0, ..., 0) ∈ ℝⁿ.
A′       The complement of A; i.e., A′ = {x ∈ ℝⁿ : x ∉ A}.
A_b      A_b = {a + b : a ∈ A}
A ⊕ B    Minkowski addition is defined as A ⊕ B = {a + b : a ∈ A, b ∈ B}. (Section 7.2)
A ⊖ B    Minkowski subtraction is defined as A ⊖ B = (A′ ⊕ B*)′. (Section 7.2)
A ○ B    The opening of A by B is denoted A ○ B and is defined by A ○ B = (A ⊖ B) ⊕ B. (Section 7.3)
A ● B    The closing of A by B is denoted A ● B and is defined by A ● B = (A ⊕ B) ⊖ B. (Section 7.3)
A ⊛ C    Let C = (D, E) be an ordered pair of structuring elements. The hit-and-miss transform of the set A is given by A ⊛ C = {p : D_p ⊆ A and E_p ⊆ A′}. (Section 7.5)
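Because the Boolean dilation, erosion, opening, and closing above are defined purely set-theoretically, they can be prototyped directly on finite subsets of ℤ². The sketch below is illustrative Python, not iac++; it assumes the structuring element contains the origin and uses the translate-and-test form of erosion, which for such structuring elements agrees with the complement-based Minkowski subtraction defined above.

    # Illustrative sketch: Boolean morphology on finite subsets of Z^2
    # represented as Python sets of (row, col) tuples.
    def dilate(A, B):
        # Minkowski addition: A (+) B = {a + b : a in A, b in B}.
        return {(a0 + b0, a1 + b1) for (a0, a1) in A for (b0, b1) in B}

    def erode(A, B):
        # Points p whose translated structuring element B + p lies inside A
        # (assumes the origin belongs to B, so every such p is in A).
        return {(a0, a1) for (a0, a1) in A
                if all((a0 + b0, a1 + b1) in A for (b0, b1) in B)}

    def opening(A, B):
        return dilate(erode(A, B), B)

    def closing(A, B):
        return erode(dilate(A, B), B)

    A = {(x, y) for x in range(4) for y in range(4)}
    B = {(0, 0), (1, 0), (0, 1)}
    print(len(dilate(A, B)), len(erode(A, B)), len(opening(A, B)))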

Functions and Scalar Operations

Symbol      Explanation
f : X → Y   f is a function from X into Y.
domain(f)   The domain of the function f : X → Y is the set X.
range(f)    The range of the function f : X → Y is the set {f(x) : x ∈ X}.
f⁻¹         The inverse of the function f.
Y^X         The set of all functions from X into Y, i.e., if f ∈ Y^X, then f : X → Y.
f|_A        Given a function f : X → Y and a subset A ⊆ X, the restriction of f to A, f|_A : A → Y, is defined by f|_A(a) = f(a) for a ∈ A.
f|g         Given f : A → Y and g : B → Y, the extension of f to g is the function on A ∪ B that agrees with f on A and with g on B \ A.
g ∘ f       Given two functions f : X → Y and g : Y → Z, the composition g ∘ f : X → Z is defined by (g ∘ f)(x) = g(f(x)), for every x ∈ X.
f + g       Let f and g be real- or complex-valued functions; then (f + g)(x) = f(x) + g(x).
f · g       Let f and g be real- or complex-valued functions; then (f · g)(x) = f(x) · g(x).
k · f       Let f be a real- or complex-valued function and k a real or complex number; then (k · f)(x) = k · (f(x)).
|f|         |f|(x) = |f(x)|, where f is a real- (or complex-) valued function and |f(x)| denotes the absolute value (or magnitude) of f(x).
1_X         The identity function 1_X : X → X is given by 1_X(x) = x.
p_j         The projection function p_j onto the jth coordinate is defined by p_j(x1, ..., x_j, ..., xn) = x_j.
card(X)     The cardinality of the set X.
choice(X)   A function which randomly selects an element from the set X.
x ∨ y       For x, y ∈ ℝ, x ∨ y is the maximum of x and y.
x ∧ y       For x, y ∈ ℝ, x ∧ y is the minimum of x and y.
⌈x⌉         For x ∈ ℝ the ceiling function ⌈x⌉ returns the smallest integer that is greater than or equal to x.
⌊x⌋         For x ∈ ℝ the floor function ⌊x⌋ returns the largest integer that is less than or equal to x.
[x]         For x ∈ ℝ the round function returns the nearest integer to x. If there are two such integers it yields the integer with greater magnitude.
x mod y     For x, y ∈ ℝ, x mod y = r if there exist k ∈ ℤ and r with 0 ≤ r < y such that x = yk + r.
χ_S(x)      The characteristic function χ_S is defined by χ_S(x) = 1 if x ∈ S and χ_S(x) = 0 if x ∉ S.

Images and Image Operations

Symbol      Explanation
a, b, c     Bold, lowercase characters are used to represent images. Image variables will usually be chosen from the beginning of the alphabet.
a ∈ 𝔽^X     The image a is an 𝔽-valued image on X. The set 𝔽 is called the value set of a and X the spatial domain of a.
1 ∈ 𝔽^X     Let 𝔽 be a set with unit 1. Then 1 denotes an image, all of whose pixel values are 1.
0 ∈ 𝔽^X     Let 𝔽 be a set with zero 0. Then 0 denotes an image, all of whose pixel values are 0.
a|_Z        The domain restriction of a ∈ 𝔽^X to a subset Z of X is defined by a|_Z = a ∩ (Z × 𝔽).
a||_S       The range restriction of a ∈ 𝔽^X to the subset S ⊆ 𝔽 is defined by a||_S = a ∩ (X × S). The double-bar notation is used to focus attention on the fact that the restriction is applied to the second coordinate of a ⊆ X × 𝔽.
a|_(Z,S)    If a ∈ 𝔽^X, Z ⊆ X, and S ⊆ 𝔽, then the restriction of a to Z and S is defined as a|_(Z,S) = a ∩ (Z × S).
a|b         Let X and Y be subsets of the same topological space. The extension of a ∈ 𝔽^X to b ∈ 𝔽^Y is the image on X ∪ Y that agrees with a on X and with b on Y \ X.
(a|b), (a1|a2| ··· |an)   Row concatenation of images a and b, respectively the row concatenation of images a1, a2, ..., an.
            Column concatenation of images a and b.
f(a)        If a ∈ 𝔽^X and f : 𝔽 → Y, then the image f(a) ∈ Y^X is given by f ∘ a, i.e., f(a) = {(x, c(x)) : c(x) = f(a(x)), x ∈ X}.
a ∘ f       If f : Y → X and a ∈ 𝔽^X, the induced image a ∘ f ∈ 𝔽^Y is defined by a ∘ f = {(y, a(f(y))) : y ∈ Y}.
a γ b       If γ is a binary operation on 𝔽, then an induced operation on 𝔽^X can be defined. Let a, b ∈ 𝔽^X; the induced operation is given by a γ b = {(x, c(x)) : c(x) = a(x) γ b(x), x ∈ X}.
k γ a       Let k ∈ 𝔽, a ∈ 𝔽^X, and γ be a binary operation on 𝔽. An induced scalar operation on images is defined by k γ a = {(x, c(x)) : c(x) = k γ a(x), x ∈ X}.
a^b         Let a, b ∈ ℝ^X; a^b = {(x, c(x)) : c(x) = a(x)^b(x), x ∈ X}.
log_b a     Let a, b ∈ ℝ^X; log_b a = {(x, c(x)) : c(x) = log_{b(x)} a(x), x ∈ X}.
a*          Pointwise complex conjugate of image a, a*(x) = (a(x))*.
Γa          Γa denotes reduction by a generic reduce operation Γ.

The following four items are specific examples of the global reduce operation. Each assumes a ∈ 𝔽^X and X = {x1, x2, ..., xn}.

Σa = a(x1) + a(x2) + ··· + a(xn)
Πa = a(x1) · a(x2) · ··· · a(xn)
∨a = a(x1) ∨ a(x2) ∨ ··· ∨ a(xn)
∧a = a(x1) ∧ a(x2) ∧ ··· ∧ a(xn)

a • b       Dot product, a • b = Σ(a · b) = the sum over x ∈ X of a(x) · b(x).
ã           Complementation of a set-valued image a.
a^c         Complementation of a Boolean image a.
a′          Transpose of image a.
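The induced pointwise operations a γ b and k γ a and the global reduce operations Σa, Πa, ∨a, ∧a translate directly into code once an image is stored as a mapping from points to values. The following Python sketch is illustrative only (not iac++; the helper names induced and scalar are ad hoc):

    # Images as dictionaries {point: value} on a common spatial domain X.
    def induced(op, a, b):
        # (a op b)(x) = op(a(x), b(x)) for every x in the domain of a.
        return {x: op(a[x], b[x]) for x in a}

    def scalar(op, k, a):
        # (k op a)(x) = op(k, a(x)).
        return {x: op(k, a[x]) for x in a}

    X = [(i, j) for i in range(2) for j in range(2)]
    a = {x: float(x[0] + x[1]) for x in X}
    b = {x: 2.0 for x in X}

    s = induced(lambda p, q: p + q, a, b)      # the image a + b
    m = scalar(lambda k, v: k * v, 3.0, a)     # the image 3 * a
    print(sum(a.values()), max(a.values()))    # global reduces: sum and maximum of a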

Templates and Template Operations

Symbol      Explanation
s, t, u     Bold, lowercase characters are used to represent templates. Usually characters from the middle of the alphabet are used as template variables.
t ∈ (𝔽^X)^Y   A template is an image whose pixel values are images. In particular, an 𝔽-valued template from Y to X is a function t : Y → 𝔽^X. Thus, t ∈ (𝔽^X)^Y and t is an 𝔽^X-valued image on Y.
t_y         Let t ∈ (𝔽^X)^Y. For each y ∈ Y, t_y = t(y). The image t_y ∈ 𝔽^X is given by t_y = {(x, t_y(x)) : x ∈ X}.
S(t_y)      If t ∈ (𝔽^X)^Y, then the support of t_y is denoted by S(t_y) and is defined by S(t_y) = {x ∈ X : t_y(x) ≠ 0}.
S_∞(t_y)    If t ∈ (𝔽^X)^Y, then S_∞(t_y) = {x ∈ X : t_y(x) ≠ ∞}.
S_-∞(t_y)   If t ∈ (𝔽^X)^Y, then S_-∞(t_y) = {x ∈ X : t_y(x) ≠ -∞}.
S_±∞(t_y)   If t ∈ (𝔽^X)^Y, then S_±∞(t_y) = {x ∈ X : t_y(x) ≠ ±∞}.
t(p)        A parameterized 𝔽-valued template from Y to X with parameters in P is a function of the form t : P → (𝔽^X)^Y.
t′          Let t ∈ (𝔽^X)^Y. The transpose t′ ∈ (𝔽^Y)^X is defined by t′_x(y) = t_y(x).

Image-Template Operations

In the table below, X is a finite subset of ℝⁿ.

Generic right product:  Let (𝔽, γ, ○) be a semiring and let a ∈ 𝔽^X and t ∈ (𝔽^X)^Y. The generic right product of a with t is the image on Y whose value at y is the γ-reduction, taken over all x ∈ X, of a(x) ○ t_y(x).

Generic left product:  With the conditions above, except that now t ∈ (𝔽^Y)^X, the generic left product of a with t is the image on Y whose value at y is the γ-reduction, taken over all x ∈ X, of t_x(y) ○ a(x).

Right linear product (convolution):  Let Y ⊆ ℝⁿ, a ∈ 𝔽^X, and t ∈ (𝔽^X)^Y. The right linear product of a with t is the image on Y whose value at y is the sum over x ∈ X of a(x) · t_y(x).

Left linear product (convolution):  With the conditions above, except that t ∈ (𝔽^Y)^X, the left linear product is the image on Y whose value at y is the sum over x ∈ X of t_x(y) · a(x).

Right additive maximum:  For a ∈ 𝔽^X and t ∈ (𝔽^X)^Y, the right additive maximum is the image on Y whose value at y is ⋁_{x∈X} (a(x) + t_y(x)).

Left additive maximum:  For t ∈ (𝔽^Y)^X, the left additive maximum is the image on Y whose value at y is ⋁_{x∈X} (t_x(y) + a(x)).

Right additive minimum:  For a ∈ 𝔽^X and t ∈ (𝔽^X)^Y, the right additive minimum is the image on Y whose value at y is ⋀_{x∈X} (a(x) + t_y(x)).

Left additive minimum:  For t ∈ (𝔽^Y)^X, the left additive minimum is the image on Y whose value at y is ⋀_{x∈X} (t_x(y) + a(x)).

Right multiplicative maximum:  For a ∈ 𝔽^X and t ∈ (𝔽^X)^Y, the right multiplicative maximum is the image on Y whose value at y is ⋁_{x∈X} (a(x) · t_y(x)).

Left multiplicative maximum:  For t ∈ (𝔽^Y)^X, the left multiplicative maximum is the image on Y whose value at y is ⋁_{x∈X} (t_x(y) · a(x)).

Right multiplicative minimum:  For a ∈ 𝔽^X and t ∈ (𝔽^X)^Y, the right multiplicative minimum is the image on Y whose value at y is ⋀_{x∈X} (a(x) · t_y(x)).

Left multiplicative minimum:  For t ∈ (𝔽^Y)^X, the left multiplicative minimum is the image on Y whose value at y is ⋀_{x∈X} (t_x(y) · a(x)).
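As an illustration of the right linear product (convolution), the sketch below evaluates the sum over x of a(x) · t_y(x) for a simple translation-invariant averaging template. It is plain Python, not iac++, and the helper names linear_product and t are ad hoc.

    # a is an image {point: value} on X; t(y) returns the image t_y on X as a
    # dictionary whose support is the 3x3 Moore neighborhood of y.
    def linear_product(a, t, Y):
        out = {}
        for y in Y:
            ty = t(y)
            out[y] = sum(a[x] * ty.get(x, 0.0) for x in a)   # sum over x in X
        return out

    def t(y):
        # 3x3 averaging template: weight 1/9 on the Moore neighborhood of y.
        return {(y[0] + j, y[1] + k): 1.0 / 9.0
                for j in (-1, 0, 1) for k in (-1, 0, 1)}

    X = [(i, j) for i in range(5) for j in range(5)]
    a = {x: 1.0 for x in X}
    b = linear_product(a, t, X)
    print(b[(2, 2)], b[(0, 0)])   # about 1.0 in the interior, smaller on the border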

Neighborhoods and Neighborhood Operations

Symbol        Explanation
M, N          Italic uppercase characters are used to denote neighborhoods.
N ∈ (2^X)^Y   A neighborhood is an image whose pixel values are sets of points. In particular, a neighborhood from Y to X is a function N : Y → 2^X.
N(p)          A parameterized neighborhood from Y to X with parameters in P is a function of the form N : P → (2^X)^Y.
N′            Let N ∈ (2^X)^Y; the transpose N′ ∈ (2^Y)^X is defined as N′(x) = {y ∈ Y : x ∈ N(y)}, that is, x ∈ N(y) iff y ∈ N′(x).
N1 • N2       The dilation of N1 by N2 is defined by (N1 • N2)(y) = ∪_{p∈N2(y)} (N1(y) + (p - y)).

Image-Neighborhood Operations

In the table below, X is a finite subset of ℝⁿ.

Generic right reduction:  Given a ∈ 𝔽^X, N ∈ (2^X)^Y, and a reduce operation Γ, the generic right reduction of a with N takes, at each point x, the value Γ(a|_{N(x)}), the Γ-reduction of a restricted to the neighborhood N(x).

Generic left reduction:  With the conditions above, except that now N ∈ (2^Y)^X, the generic left reduction of a with N at x equals the right reduction of a with the transpose N′ at x.

Neighborhood average:  Given a ∈ 𝔽^X and the image average function (yielding the average of its image argument), the neighborhood average of a with N takes, at each x, the average of a|_{N(x)}.

Neighborhood median:  Given a ∈ 𝔽^X and the image median function m (yielding the median of its image argument), the neighborhood median of a with N takes, at each x, the value m(a|_{N(x)}).

Matrix and Vector Operations

In the table below, A and B represent matrices.

Symbol      Explanation
A*          The conjugate of matrix A.
A′          The transpose of matrix A.
A × B, AB   The matrix product of matrices A and B.
A ⊗ B       The tensor product of matrices A and B.
            The p-product of matrices A and B.
            The dual p-product of matrices A and B.

References

[1]  G. Ritter, “Image algebra with applications.” Unpublished manuscript, available via anonymous ftp from ftp://ftp.cis.ufl.edu/pub/src/ia/documents, 1994.

Dedication

To our brothers, Friedrich Karl and Scott Winfield


Chapter 1—Image Algebra

1.1. Introduction

Since the field of image algebra is a recent development it will be instructive to provide some background information. In the broad sense, image algebra is a mathematical theory concerned with the transformation and analysis of images. Although much emphasis is focused on the analysis and transformation of digital images, the main goal is the establishment of a comprehensive and unifying theory of image transformations, image analysis, and image understanding in the discrete as well as the continuous domain [1].

The idea of establishing a unifying theory for the various concepts and operations encountered in image and signal processing is not new. Over thirty years ago, Unger proposed that many algorithms for image processing and image analysis could be implemented in parallel using cellular array computers [2]. These cellular array computers were inspired by the work of von Neumann in the 1950s [3, 4]. Realization of von Neumann’s cellular array machines was made possible with the advent of VLSI technology. NASA’s massively parallel processor or MPP and the CLIP series of computers developed by Duff and his colleagues represent the classic embodiment of von Neumann’s original automaton [5, 6, 7, 8, 9]. A more general class of cellular array computers are pyramids and Thinking Machines Corporation’s Connection Machines [10, 11, 12]. In an abstract sense, the various versions of Connection Machines are universal cellular automatons with an additional mechanism added for non-local communication.

Many operations performed by these cellular array machines can be expressed in terms of simple elementary operations. These elementary operations create a mathematical basis for the theoretical formalism capable of expressing a large number of algorithms for image processing and analysis. In fact, a common thread among designers of parallel image processing architectures is the belief that large classes of image transformations can be described by a small set of standard rules that induce these architectures. This belief led to the creation of mathematical formalisms that were used to aid in the design of special-purpose parallel architectures. Matheron and Serra’s Texture Analyzer [13], ERIM’s (Environmental Research Institute of Michigan) Cytocomputer [14, 15, 16], and Martin Marietta’s GAPP [17, 18, 19] are examples of this approach.

The formalism associated with these cellular architectures is that of pixel neighborhood arithmetic and mathematical morphology. Mathematical morphology is the part of image processing concerned with image filtering and analysis by structuring elements. It grew out of the early work of Minkowski and Hadwiger [20, 21, 22], and entered the modern era through the work of Matheron and Serra of the Ecole des Mines in Fontainebleau, France [23, 24, 25, 26]. Matheron and Serra not only formulated the modern concepts of morphological image transformations, but also designed and built the Texture Analyzer System. Since those early days, morphological operations have been applied from low-level, to intermediate, to high-level vision problems. Among some recent research papers on morphological image processing are Crimmins and Brown [27], Haralick et al. [28, 29], Maragos and Schafer [30, 31, 32], Davidson [33, 34], Dougherty [35], Goutsias [36, 37], and Koskinen and Astola [38].

Serra and Sternberg were the first to unify morphological concepts and methods into a coherent algebraic theory specifically designed for image processing and image analysis. Sternberg was also the first to use the term “image algebra” [39, 40]. In the mid 1980s, Maragos introduced a new theory unifying a large class of linear and nonlinear systems under the theory of mathematical morphology [41]. More recently, Davidson completed the mathematical foundation of mathematical morphology by formulating its embedding into the lattice algebra known as Mini-Max algebra [42, 43]. However, despite these profound accomplishments, morphological methods have some well-known limitations. For example, such fairly common image processing techniques as feature extraction based on convolution, Fourier-like transformations, chain coding, histogram equalization transforms, image rotation, and image registration and rectification are — with the exception of a few simple cases — either extremely difficult or impossible to express in terms of morphological operations. The failure of a morphologically based image algebra to express a fairly straightforward U.S. government-furnished FLIR (forward-looking infrared) algorithm was demonstrated by Miller of Perkin-Elmer [44].

The failure of an image algebra based solely on morphological operations to provide a universal image processing algebra is due to its set-theoretic formulation, which rests on the Minkowski addition and subtraction of sets [22]. These operations ignore the linear domain, transformations between different domains (spaces of different sizes and dimensionality), and transformations between different value sets (algebraic structures), e.g., sets consisting of real, complex, or vector valued numbers. The image algebra discussed in this text includes these concepts and extends the morphological operations [1].

The development of image algebra grew out of a need, by the U.S. Air Force Systems Command, for a common image-processing language. Defense contractors do not use a standardized, mathematically rigorous and efficient structure that is specifically designed for image manipulation. Documentation by contractors of algorithms for image processing and rationale underlying algorithm design is often accomplished via word description or analogies that are extremely cumbersome and often ambiguous. The result of these ad hoc approaches has been a proliferation of nonstandard notation and increased research and development cost. In response to this chaotic situation, the Air Force Armament Laboratory (AFATL — now known as Wright Laboratory MNGA) of the Air Force Systems Command, in conjunction with the Defense Advanced Research Project Agency (DARPA — now known as the Advanced Research Project Agency or ARPA), supported the early development of image algebra with the intent that the fully developed structure would subsequently form the basis of a common image-processing language. The goal of AFATL was the development of a complete, unified algebraic structure that provides a common mathematical environment for image-processing algorithm development, optimization, comparison, coding, and performance evaluation. The development of this structure proved highly successful, capable of fulfilling the tasks set forth by the government, and is now commonly known as image algebra.


Because of the goals set by the government, the theory of image algebra provides for a language which, if properly implemented as a standard image processing environment, can greatly reduce research and development costs. Since the foundation of this language is purely mathematical and independent of any future computer architecture or language, the longevity of an image algebra standard is assured. Furthermore, savings due to commonality of language and increased productivity could dwarf any reasonable initial investment for adapting image algebra as a standard environment for image processing.

Although commonality of language and cost savings are two major reasons for considering image algebra as a standard language for image processing, there exists a multitude of other reasons for desiring the broad acceptance of image algebra as a component of all image processing development systems. Premier among these is the predictable influence of an image algebra standard on future image processing technology. In this, it can be compared to the influence on scientific reasoning and the advancement of science due to the replacement of the myriad of different number systems (e.g., Roman, Syrian, Hebrew, Egyptian, Chinese, etc.) by the now common Indo-Arabic notation. Additional benefits provided by the use of image algebra are

• The elemental image algebra operations are small in number, translucent, simple, and provide a method of transforming images that is easily learned and used;

• Image algebra operations and operands provide the capability of expressing all image-to-image transformations;

• Theorems governing image algebra make computer programs based on image algebra notation amenable to both machine dependent and machine independent optimization techniques;

• The algebraic notation provides a deeper understanding of image manipulation operations due to conciseness and brevity of code and is capable of suggesting new techniques;

• The notational adaptability to programming languages allows the substitution of extremely short and concise image algebra expressions for equivalent blocks of code, and therefore increases programmer productivity;

• Image algebra provides a rich mathematical structure that can be exploited to relate image processing problems to other mathematical areas;

• Without image algebra, a programmer will never benefit from the bridge that exists between an image algebra programming language and the multitude of mathematical structures, theorems, and identities that are related to image algebra;

• There is no competing notation that adequately provides all these benefits.

The role of image algebra in computer vision and image processing tasks and theory should not be confused with the government’s Ada programming language effort. The goal of the development of the Ada programming language was to provide a single high-order language in which to implement embedded systems. The special architectures being developed nowadays for image processing applications are not often capable of directly executing Ada language programs, often due to support of parallel processing models not accommodated by Ada’s tasking mechanism. Hence, most applications designed for such processors are still written in special assembly or microcode languages. Image algebra, on the other hand, provides a level of specification, directly derived from the underlying mathematics on which image processing is based and that is compatible with both sequential and parallel architectures.

Enthusiasm for image algebra must be tempered by the knowledge that image algebra, like any other field of mathematics, will never be a finished product but remain a continuously evolving mathematical theory concerned with the unification of image processing and computer vision tasks. Much of the mathematics associated with image algebra and its implication to computer vision remains largely uncharted territory which awaits discovery. For example, very little work has been done in relating image algebra to computer vision techniques which employ tools from such diverse areas as knowledge representation, graph theory, and surface representation.

Several image algebra programming languages have been developed. These include image algebra Fortran (IAF) [45], an image algebra Ada (IAA) translator [46], image algebra Connection Machine *Lisp [47, 48], an image algebra language (IAL) implementation on transputers [49, 50], and an image algebra C++ class library (iac++) [51, 52]. Unfortunately, there is often a tendency among engineers to confuse or equate these languages with image algebra. An image algebra programming language is not image algebra, which is a mathematical theory. An image algebra-based programming language typically implements a particular subalgebra of the full image algebra. In addition, simplistic implementations can result in poor computational performance. Restrictions and limitations in implementation are usually due to a combination of factors, the most pertinent being development costs and hardware and software environment constraints. They are not limitations of image algebra, and they should not be confused with the capability of image algebra as a mathematical tool for image manipulation.

Image algebra is a heterogeneous or many-valued algebra in the sense of Birkhoff and Lipson [53, 1], with multiple sets of operands and operators. Manipulation of images for purposes of image enhancement, analysis, and understanding involves operations not only on images, but also on different types of values and quantities associated with these images. Thus, the basic operands of image algebra are images and the values and quantities associated with these images. Roughly speaking, an image consists of two things, a collection of points and a set of values associated with these points. Images are therefore endowed with two types of information, namely the spatial relationship of the points, and also some type of numeric or other descriptive information associated with these points. Consequently, the field of image algebra bridges two broad mathematical areas, the theory of point sets and the algebra of value sets, and investigates their interrelationship. In the sections that follow we discuss point and value sets as well as images, templates, and neighborhoods that characterize some of their interrelationships.


1.2. Point Sets

A point set is simply a topological space. Thus, a point set consists of two things, a collection of objects called points and a topology which provides for such notions as nearness of two points, the connectivity of a subset of the point set, the neighborhood of a point, boundary points, and curves and arcs. Point sets will be denoted by capital bold letters from the end of the alphabet, i.e., W, X, Y, and Z.

Points (elements of point sets) will be denoted by lower case bold letters from the end of the alphabet, namely x, y, z ∈ X. Note also that if X ⊆ ℝⁿ, then x is of the form x = (x1, x2, ..., xn), where for each i = 1, 2, ..., n, x_i denotes a real number called the ith coordinate of x.

The most common point sets occurring in image processing are discrete subsets of n-dimensional Euclidean space ℝⁿ with n = 1, 2, or 3 together with the discrete topology. However, other topologies such as the von Neumann topology and the product topology are also commonly used topologies in computer vision [1].

There is no restriction on the shape of the discrete subsets of ℝⁿ used in applications of image algebra to solve vision problems. Point sets can assume arbitrary shapes. In particular, shapes can be rectangular, circular, or snake-like. Some of the more pertinent point sets are the set of integer points ℤ (here we view ℤ ⊆ ℝ), the n-dimensional lattice ℤⁿ with n = 2 or n = 3, and rectangular subsets of ℤ². Two of the most often encountered rectangular point sets are rectangular arrays of lattice points; we follow standard practice and represent these rectangular point sets by listing the points in matrix form. Figure 1.2.1 provides a graphical representation of such a rectangular point set.

Figure 1.2.1  The rectangular point set.

Point Operations

As mentioned, some of the more pertinent point sets are discrete subsets of the vector space ℝⁿ. These point sets inherit the usual elementary vector space operations. Thus, for example, if X ⊆ ℤⁿ (or X ⊆ ℝⁿ) and x = (x1, ..., xn), y = (y1, ..., yn) ∈ X, then the sum of the points x and y is defined as

x + y = (x1 + y1, ..., xn + yn),

while the multiplication and addition of a scalar k ∈ ℤ (or k ∈ ℝ) and a point x is given by

k · x = (k · x1, ..., k · xn)

and

k + x = (k + x1, ..., k + xn),

respectively. Point subtraction is also defined in the usual way.

In addition to these standard vector space operations, image algebra also incorporates three basic types of point multiplication. These are the Hadamard product, the cross product (or vector product) for points in ℝ³ (or ℤ³), and the dot product which are defined by

x · y = (x1 · y1, ..., xn · yn),

x × y = (x2 · y3 - x3 · y2, x3 · y1 - x1 · y3, x1 · y2 - x2 · y1),

and

x • y = x1 · y1 + x2 · y2 + ··· + xn · yn,

respectively.
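A minimal Python sketch (illustrative only, not iac++) of these three point multiplications, with points represented as tuples:

    def hadamard(x, y):
        # Componentwise (Hadamard) product.
        return tuple(xi * yi for xi, yi in zip(x, y))

    def cross(x, y):
        # Cross product, defined only for points with three coordinates.
        return (x[1]*y[2] - x[2]*y[1],
                x[2]*y[0] - x[0]*y[2],
                x[0]*y[1] - x[1]*y[0])

    def dot(x, y):
        # Dot product x1*y1 + ... + xn*yn.
        return sum(xi * yi for xi, yi in zip(x, y))

    x, y = (1.0, 2.0, 3.0), (4.0, 5.0, 6.0)
    print(hadamard(x, y))   # (4.0, 10.0, 18.0)
    print(cross(x, y))      # (-3.0, 6.0, -3.0)
    print(dot(x, y))        # 32.0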

Note that the sum of two points, the Hadamard product, and the cross product are binary operations that take as input two points and produce another point. Therefore these operations can be viewed as mappings X × X → X whenever X is closed under these operations. In contrast, the binary operation of dot product is a scalar and not another vector. This provides an example of a mapping X × X → 𝔽, where 𝔽 denotes the appropriate field of scalars. Another such mapping, associated with metric spaces, is the distance function d : X × X → ℝ which assigns to each pair of points x and y the distance from x to y. The most common distance functions occurring in image processing are the Euclidean distance

d(x, y) = ((x1 - y1)² + ··· + (xn - yn)²)^(1/2),

the city block or diamond distance

ρ(x, y) = |x1 - y1| + ··· + |xn - yn|,

and the chessboard distance

max{|x_k - y_k| : 1 ≤ k ≤ n}.

Distances can be conveniently computed in terms of the norm of a point. The three norms of interest here are derived from the standard L_p norms

||x||_p = (|x1|^p + ··· + |xn|^p)^(1/p).

The L∞ norm is given by

||x||_∞ = |x1| ∨ |x2| ∨ ··· ∨ |xn|,

where ∨ denotes the maximum. Specifically, the Euclidean norm is given by ||x||_2 = (x1² + x2² + ··· + xn²)^(1/2). Thus, d(x, y) = ||x - y||_2. Similarly, the city block distance can be computed using the formulation ρ(x, y) = ||x - y||_1 and the chessboard distance by using ||x - y||_∞.
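For concreteness, the three distances can be computed directly from the corresponding norms; the Python sketch below is illustrative only (not iac++):

    def euclidean(x, y):
        return sum((xi - yi) ** 2 for xi, yi in zip(x, y)) ** 0.5   # ||x - y||_2

    def city_block(x, y):
        return sum(abs(xi - yi) for xi, yi in zip(x, y))            # ||x - y||_1

    def chessboard(x, y):
        return max(abs(xi - yi) for xi, yi in zip(x, y))            # ||x - y||_inf

    x, y = (0, 0), (3, 4)
    print(euclidean(x, y), city_block(x, y), chessboard(x, y))      # 5.0 7 4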


Note that the p-norm of a point x is a unary operation, namely a function ℝⁿ → ℝ. Another assemblage of functions which play a major role in various applications are the projection functions. Given X ⊆ ℝⁿ, then the ith projection on X, where i ∈ {1, ..., n}, is denoted by p_i and defined by p_i(x) = x_i, where x_i denotes the ith coordinate of x.

Characteristic functions and neighborhood functions are two of the most frequently occurring unary operations in image processing. In order to define these operations, we need to recall the notion of a power set of a set. The power set of a set S is defined as the set of all subsets of S and is denoted by 2^S. Thus, if Z is a point set, then 2^Z = {X : X ⊆ Z}.

Given X ∈ 2^Z (i.e., X ⊆ Z), then the characteristic function associated with X is the function

χ_X : Z → {0, 1}

defined by χ_X(z) = 1 if z ∈ X, and χ_X(z) = 0 if z ∉ X.

For a pair of point sets X and Z, a neighborhood system for X in Z, or equivalently, a neighborhood function from X to Z, is a function

N : X → 2^Z.

It follows that for each point x ∈ X, N(x) ⊆ Z. The set N(x) is called a neighborhood for x.

There are two neighborhood functions on subsets of ℤ² which are of particular importance in image processing. These are the von Neumann neighborhood and the Moore neighborhood. The von Neumann neighborhood is defined by

N(x) = {y : y = (x1 ± j, x2) or y = (x1, x2 ± k), j, k ∈ {0, 1}},

where x = (x1, x2), while the Moore neighborhood is defined by

M(x) = {y : y = (x1 ± j, x2 ± k), j, k ∈ {0, 1}}.

Figure 1.2.2 provides a pictorial representation of these two neighborhood functions; the hashed center area represents the point x and the adjacent cells represent the adjacent points. The von Neumann and Moore neighborhoods are also called the four neighborhood and eight neighborhood, respectively. They are local neighborhoods since they only include the directly adjacent points of a given point.

Figure 1.2.2  The von Neumann neighborhood N(x) and the Moore neighborhood M(x) of a point x.
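The two neighborhood functions can be enumerated directly from their definitions; the Python sketch below is illustrative only (the ± with j, k ∈ {0, 1} gives offsets in {-1, 0, 1}, so each neighborhood contains the point itself):

    def von_neumann(x):
        # 4-neighborhood: the point itself and its horizontal/vertical neighbours.
        x1, x2 = x
        return {(x1 + j, x2) for j in (-1, 0, 1)} | {(x1, x2 + k) for k in (-1, 0, 1)}

    def moore(x):
        # 8-neighborhood: the point itself and all eight surrounding points.
        x1, x2 = x
        return {(x1 + j, x2 + k) for j in (-1, 0, 1) for k in (-1, 0, 1)}

    print(sorted(von_neumann((0, 0))))   # 5 points
    print(sorted(moore((0, 0))))         # 9 points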

There are many other point operations that are useful in expressing computer vision algorithms in succinct algebraic form. For instance, in certain interpolation schemes it becomes necessary to switch from points with real-valued coordinates (floating point coordinates) to corresponding integer-valued coordinate points. One such method uses the induced floor operation defined by ⌊x⌋ = (⌊x1⌋, ⌊x2⌋, ..., ⌊xn⌋), where x ∈ ℝⁿ and ⌊x_i⌋ denotes the largest integer less than or equal to x_i (i.e., ⌊x_i⌋ ≤ x_i and if k ∈ ℤ with k ≤ x_i, then k ≤ ⌊x_i⌋).


Summary of Point Operations

We summarize some of the more pertinent point operations. Some image algebra implementations such as iac++ provide many additional point operations [54].

Binary operations.  Let x = (x1, x2, ..., xn) and y = (y1, y2, ..., yn) ∈ ℝⁿ.

addition x + y = (x1 + y1, ..., xn + yn)
subtraction x - y = (x1 - y1, ..., xn - yn)
multiplication x · y = (x1y1, ..., xnyn)
division x/y = (x1/y1, ..., xn/yn)
supremum sup(x, y) = (x1 ∨ y1, ..., xn ∨ yn)
infimum inf(x, y) = (x1 ∧ y1, ..., xn ∧ yn)
dot product x • y = x1y1 + x2y2 + ··· + xnyn
cross product (n = 3) x × y = (x2y3 - x3y2, x3y1 - x1y3, x1y2 - x2y1)
concatenation of x and y: (x1, ..., xn, y1, ..., ym)
scalar operations k γ x = (k γ x1, ..., k γ xn), where γ ∈ {+, -, *, ∨, ∧}

Unary operations.  In the following let x = (x1, ..., xn) ∈ ℝⁿ.

negation -x = (-x1, ..., -xn)
ceiling ⌈x⌉ = (⌈x1⌉, ..., ⌈xn⌉)
floor ⌊x⌋ = (⌊x1⌋, ..., ⌊xn⌋)
rounding [x] = ([x1], ..., [xn])
projection p_i(x) = x_i
sum Σx = x1 + x2 + ··· + xn
product Πx = x1x2 ··· xn
maximum ∨x = x1 ∨ x2 ∨ ··· ∨ xn
minimum ∧x = x1 ∧ x2 ∧ ··· ∧ xn
Euclidean norm ||x||_2 = (x1² + x2² + ··· + xn²)^(1/2)
L1 norm ||x||_1 = |x1| + |x2| + ··· + |xn|
L∞ norm ||x||_∞ = |x1| ∨ |x2| ∨ ··· ∨ |xn|
dimension dim(x) = n
neighborhood N(x)
characteristic function χ_X(x)

It is important to note that several of the above unary operations are special instances of spatial transformations X → Y. Spatial transforms play a vital role in many image processing and computer vision tasks.

In the above summary we only considered points with real- or integer-valued coordinates. Points of other spaces have their own induced operations. For example, typical operations on points of {0, 1}ⁿ (i.e., Boolean-valued points) are the usual logical operations of AND, OR, XOR, and complementation.

Point Set Operations

Point arithmetic leads in a natural way to the notion of set arithmetic. Given a vector space Z, then for X, Y ∈ 2^Z (i.e., X, Y ⊆ Z) and an arbitrary point p ∈ Z we define the following arithmetic operations:

addition X + Y = {x + y : x ∈ X and y ∈ Y}
subtraction X - Y = {x - y : x ∈ X and y ∈ Y}
point addition X + p = {x + p : x ∈ X}
point subtraction X - p = {x - p : x ∈ X}


Another set of operations on 2^Z are the usual set operations of union, intersection, set difference (or relative complement), symmetric difference, and Cartesian product as defined below.

union X ∪ Y = {z : z ∈ X or z ∈ Y}
intersection X ∩ Y = {z : z ∈ X and z ∈ Y}
set difference X \ Y = {z : z ∈ X and z ∉ Y}
symmetric difference X Δ Y = {z : z ∈ X ∪ Y and z ∉ X ∩ Y}
Cartesian product X × Y = {(x, y) : x ∈ X and y ∈ Y}

Note that with the exception of the Cartesian product, the set obtained for each of the above operations is again an element of 2^Z.

Another common set theoretic operation is set complementation. For X ∈ 2^Z, the complement of X is denoted by X′, and defined as X′ = {z ∈ Z : z ∉ X}. In contrast to the binary set operations defined above, set complementation is a unary operation. However, complementation can be computed in terms of the binary operation of set difference by observing that X′ = Z \ X.

In addition to complementation there are various other common unary operations which play a major role in algorithm development using image algebra. Among these is the cardinality of a set which, when applied to a finite point set, yields the number of elements in the set, and the choice function which, when applied to a set, selects a randomly chosen point from the set. The cardinality of a set X will be denoted by card(X). Note that card maps 2^Z into the non-negative integers, while

choice : 2^Z → Z.

That is, card(X) is a non-negative integer and choice(X) = x, where x is some randomly chosen element of X.

As was the case for operations on points, algebraic operations on point sets are too numerous to discuss at length in a treatise as short as this. Therefore, we again only summarize some of the more frequently occurring unary operations.

Summary of Unary Point Set Operations

In the following, X ⊆ ℝⁿ.

negation -X = {-x : x ∈ X}
complementation X′
supremum sup(X) (for finite point set X)
infimum inf(X) (for finite point set X)
choice function choice(X) ∈ X (randomly chosen element)
cardinality card(X) = the cardinality of X

The interpretation of sup(X) is as follows. Suppose X is finite, say X = {x1, x2, ..., xk}. Then sup(X) = sup(... sup(sup(sup(x1, x2), x3), x4), ..., xk), where sup(x_i, x_j) denotes the binary operation of the supremum of two points defined earlier. Equivalently, if x_i = (x_i, y_i) for i = 1, ..., k, then sup(X) = (x1 ∨ x2 ∨ ··· ∨ xk, y1 ∨ y2 ∨ ··· ∨ yk). More generally, sup(X) is defined to be the least upper bound of X (if it exists). The infimum of X is interpreted in a similar fashion.

If X is finite and has a total order, then we also define the maximum and minimum of X, denoted by ⋁X and ⋀X, respectively, as follows. Suppose X = {x1, x2, ..., xk} and x1 ≤ x2 ≤ ··· ≤ xk, where the symbol ≤ denotes the particular total order on X. Then ⋁X = xk and ⋀X = x1. The most commonly used order for a subset X of ℤ² is the row scanning order. Note also that in contrast to the supremum or infimum, the maximum and minimum of a (finite totally ordered) set is always a member of the set.
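When a finite point set is stored as a Python set of coordinate tuples, the point set operations above have equally direct implementations; the sketch below is illustrative only:

    def set_add(X, Y):
        # X + Y = {x + y : x in X and y in Y}.
        return {tuple(a + b for a, b in zip(x, y)) for x in X for y in Y}

    def translate(X, p):
        # X + p = {x + p : x in X}.
        return {tuple(a + b for a, b in zip(x, p)) for x in X}

    def supremum(X):
        # Componentwise supremum of a finite point set.
        return tuple(map(max, zip(*X)))

    X = {(0, 0), (1, 2)}
    Y = {(1, 0), (0, 1)}
    print(set_add(X, Y))         # {(1, 0), (0, 1), (2, 2), (1, 3)}
    print(translate(X, (5, 5)))  # {(5, 5), (6, 7)}
    print(supremum(X))           # (1, 2)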

1.3. Value Sets

A heterogeneous algebra is a collection of nonempty sets of possibly different types of elements together with a set of finitary operations which provide the rules of combining various elements in order to form a new element. For a precise definition of a heterogeneous algebra we refer the reader to Ritter [1]. Note that the collection of point sets, points, and scalars together with the operations described in the previous section form a heterogeneous algebra.

A homogeneous algebra is a heterogeneous algebra with only one set of operands. In other words, a homogeneous algebra is simply a set together with a finite number of operations. Homogeneous algebras will be referred to as value sets and will be denoted by capital blackboard font letters, e.g., ℤ, ℝ, and ℂ. There are several value sets that occur more often than others in digital image processing. These are the set of integers ℤ, the real numbers ℝ (floating point numbers), the complex numbers ℂ, binary numbers of fixed length k, the extended real numbers (which include the symbols +∞ and/or -∞), and the extended non-negative real numbers. The symbol ℝ⁺ denotes the set of positive real numbers.


Operations on Value Sets

The operations on and between elements of a given value set 𝔽 are the usual elementary operations associated with 𝔽. Thus, if 𝔽 is ℤ or ℝ, then the binary operations are the usual arithmetic and logic operations of addition, multiplication, and maximum, and the complementary operations of subtraction, division, and minimum. If 𝔽 = ℂ, then the binary operations are addition, subtraction, multiplication, and division. Similarly, we allow the usual elementary unary operations associated with these sets such as the absolute value, conjugation, as well as trigonometric, logarithmic, and exponential functions as these are available in all higher-level scientific programming languages.

For the set ℝ ∪ {-∞, +∞} we need to extend the arithmetic and logic operations of ℝ as follows:

a + (-∞) = (-∞) + a = -∞,  a ∨ (-∞) = (-∞) ∨ a = a,  and  a ∨ (+∞) = (+∞) ∨ a = +∞.

Note that the element -∞ acts as a null element in the system (ℝ ∪ {-∞, +∞}, ∨, +) if we view the operation + as multiplication and the operation ∨ as addition. The same cannot be said about the element +∞ in the system since (-∞) + ∞ = ∞ + (-∞) = -∞. In order to remedy this situation we define the dual structure (ℝ ∪ {-∞, +∞}, ∧, +′) as follows: the dual addition +′ agrees with + on ℝ, while

a +′ (+∞) = (+∞) +′ a = +∞.

Now the element +∞ acts as a null element in the system (ℝ ∪ {-∞, +∞}, ∧, +′). Observe, however, that the dual additions + and +′ introduce an asymmetry between -∞ and +∞. The resultant structure (ℝ ∪ {-∞, +∞}, ∨, ∧, +, +′) is known as a bounded lattice ordered group [1].

Dual structures provide for the notion of dual elements. For each r ∈ ℝ ∪ {-∞, +∞} we define its dual or conjugate r* by r* = -r, where -(-∞) = ∞. The following duality laws are a direct consequence of this definition:

(1)  (r*)* = r

(2)  (r ∧ t)* = r* ∨ t* and (r ∨ t)* = r* ∧ t*.
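The arithmetic of the bounded lattice ordered group can be prototyped with IEEE infinities standing in for ±∞; the Python sketch below is illustrative only (the names add, dual_add, and conjugate are ad hoc), with the two additions guarded so that -∞ absorbs under + while +∞ absorbs under the dual addition:

    INF, NINF = float('inf'), float('-inf')

    def add(r, t):
        # Primary addition: -infinity is absorbing, so (-inf) + (+inf) = -inf.
        return NINF if NINF in (r, t) else r + t

    def dual_add(r, t):
        # Dual addition: +infinity is absorbing, so (-inf) +' (+inf) = +inf.
        return INF if INF in (r, t) else r + t

    def conjugate(r):
        # r* = -r, with -(-inf) = +inf.
        return -r

    r, t = 3.0, NINF
    assert conjugate(conjugate(r)) == r                               # (r*)* = r
    assert conjugate(min(r, t)) == max(conjugate(r), conjugate(t))    # duality law (2)
    print(add(NINF, INF), dual_add(NINF, INF))                        # -inf inf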

Closely related to the additive bounded lattice ordered group described above is the multiplicative bounded lattice ordered group. Here the dual ×′ of ordinary multiplication is defined so that the element 0 acts as a null element in the system with operation × and the element +∞ acts as a null element in the dual system with operation ×′. The conjugate r* of an element of this value set is defined by r* = 1/r, with 0* = +∞ and (+∞)* = 0.

Another algebraic structure with duality which is of interest in image algebra is the value set ℤ₂ = {0, 1}. The logical operations ∨ and ∧ are the usual binary operations of max (or) and min (and), respectively, while the dual additive operations ⊕ and ⊕′ are defined by the tables shown in Figure 1.3.1.

Figure 1.3.1  The dual additive operations ⊕ and ⊕′.

Note that the addition ⊕ restricted to {0, 1} is the exclusive or operation xor, while its dual computes the values for the truth table of the biconditional statement p ⇔ q (i.e., p if and only if q).

The operations on the value set ℤ₂ can be easily generalized to its k-fold Cartesian product ℤ₂ × ··· × ℤ₂. Specifically, if m = (m1, ..., mk) and n = (n1, ..., nk), where m_i, n_i ∈ ℤ₂ for i = 1, ..., k, then

m ⊕ n = (m1 ⊕ n1, ..., mk ⊕ nk).

The addition ⊕ should not be confused with the usual addition mod 2ᵏ on {0, 1, ..., 2ᵏ - 1}: the operation ⊕ is carried out componentwise, without carries between components.

Many point sets are also value sets. For example, the point set ℝⁿ is a metric space as well as a vector space with the usual operation of vector addition. Thus, (ℝⁿ, +), where the symbol "+" denotes vector addition, will at various times be used both as a point set and as a value set. Confusion as to usage will not arise as usage should be clear from the discussion.

Summary of Pertinent Numeric Value Sets

In order to focus attention on the value sets most often used in this treatise we provide a listing of their algebraic structures:

(a)  

(b)  

(c)  

(d)  

(e)  

(f)  

(g)  

In contrast to structure c, the addition and multiplication in structure d are addition and multiplication mod 2ᵏ.

These listed structures represent the pertinent global structures. In various applications only certain subalgebras of these algebras are used. For example, the subalgebras and of play special roles in morphological processing. Similarly, the subalgebra of , where , is the only pertinent applicable algebra in certain cases.

The complementary binary operations, whenever they exist, are assumed to be part of the structures. Thus, for example, subtraction and division, which can be defined in terms of addition and multiplication, respectively, are assumed to be part of .

Value Set Operators

As for point sets, given a value set , the operations on are again the usual operations of union, intersection, set difference, etc. If, in addition, is a lattice, then the operations of infimum and supremum are also included. A brief summary of value set operators is given below.

For the following operations assume that A and B are subsets of some value set .

union  A ∪ B = {c : c ∈ A or c ∈ B}
intersection  A ∩ B = {c : c ∈ A and c ∈ B}
set difference  A\B = {c : c ∈ A and c ∉ B}
symmetric difference  AΔB = {c : c ∈ A ∪ B and c ∉ A ∩ B}
Cartesian product  A × B = {(a,b) : a ∈ A and b ∈ B}
choice function  choice(A) ∈ A
cardinality  card(A) = cardinality of A
supremum  sup(A) = supremum of A
infimum  inf(A) = infimum of A

1.4. Images

The primary operands in image algebra are images, templates, and neighborhoods. Of these three classes of operands, images are the most fundamental since templates and neighborhoods can be viewed as special cases of the general concept of an image. In order to provide a mathematically rigorous definition of an image that covers the plethora of objects called an “image” in signal processing and image understanding, we define an image in general terms, with a minimum of specification. In the following we use the notation A^B to denote the set of all functions B → A (i.e., A^B = {f : f is a function from B to A}).

Definition:  Let be a value set and X a point set. An -valued image on X is any element of . Given an -valued image (i.e., ), then is called the set of possible range values of a and X the spatial domain of a.

It is often convenient to let the graph of an image represent a. The graph of an image is also referred to as the data structure representation of the image. Given the data structure representation a = {(x, a(x)) : x ∈ X}, then an element (x, a(x)) of the data structure is called a picture element or pixel. The first coordinate x of a pixel is called the pixel location or image point, and the second coordinate a(x) is called the pixel value of a at location x.

The above definition of an image covers all mathematical images on topological spaces with range in an algebraic system. Requiring X to be a topological space provides us with the notion of nearness of pixels. Since X is not directly specified, we may substitute any space required for the analysis of an image or imposed by a particular sensor and scene. For example, X could be a subset of or with x ∈ X of form x = (x, y, t), where the first coordinates (x, y) denote spatial location and t a time variable.

Similarly, replacing the unspecified value set with or provides us with digital integer-valued and digital vector-valued images, respectively. An implication of these observations is that our image definition also characterizes any type of discrete or continuous physical image.
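A small illustration may help make the data structure representation concrete. The following Python sketch (not from the text; the grid and values are arbitrary) represents a real-valued image on a finite point set as a mapping from pixel locations to pixel values.

# A minimal sketch of the data structure representation a = {(x, a(x)) : x in X}
# of a real-valued image on a finite point set X, here a 3 x 3 grid.
X = [(i, j) for i in range(3) for j in range(3)]      # point set (spatial domain)
a = {x: float(x[0] + x[1]) for x in X}                # image: each pixel location maps to a value

for x, value in a.items():                            # pixels are pairs (location, value)
    print(f"pixel location {x} has pixel value {value}")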

Induced Operations on Images

Operations on and between -valued images are the natural induced operations of the algebraic system . For example, if γ is a binary operation on , then γ induces a binary operation — again denoted by γ — on defined as follows:

Let a, . Then

aγb = {(x, c(x)) : c(x) = a(x)γb(x), x ∈ X}.

For example, suppose a, and our value set is the algebraic structure of the real numbers . Replacing γ by the binary operations +, ·, ∨, and ∧ we obtain the basic binary operations

a + b = {(x, c(x)) : c(x) = a(x) + b(x), x ∈ X},
a · b = {(x, c(x)) : c(x) = a(x) · b(x), x ∈ X},
a ∨ b = {(x, c(x)) : c(x) = a(x) ∨ b(x), x ∈ X},

and

a ∧ b = {(x, c(x)) : c(x) = a(x) ∧ b(x), x ∈ X}

on real-valued images. Obviously, all four operations are commutative and associative.
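The induced operation can be stated directly in code. The sketch below is a hypothetical Python rendering of the definition above: a binary operation gamma on the value set is lifted to a pointwise operation on images sharing the same spatial domain.

# Hypothetical helper illustrating how a binary operation gamma on the value set
# induces a pointwise operation on images defined over the same domain X.
def induced(gamma, a, b):
    assert a.keys() == b.keys()                 # both images must be defined on the same X
    return {x: gamma(a[x], b[x]) for x in a}    # c(x) = a(x) gamma b(x)

X = [(i, j) for i in range(2) for j in range(2)]
a = {x: 1.0 * x[0] for x in X}
b = {x: 2.0 * x[1] for x in X}

c_sum = induced(lambda r, s: r + s, a, b)       # a + b
c_max = induced(max, a, b)                      # pointwise maximum (join)
c_min = induced(min, a, b)                      # pointwise minimum (meet)
print(c_sum, c_max, c_min)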

In addition to the binary operation between images, the binary operation γ on also induces the following scalar operations on images:

For and ,

kγa = {(x, c(x)) : c(x) = kγa(x), x ∈ X}

and

aγk = {(x, c(x)) : c(x) = a(x)γk, x ∈ X}.

Thus, for , we obtain the following scalar multiplication and addition of real-valued images:

k · a = {(x, c(x)) : c(x) = k · a(x), x ∈ X}

and

k + a = {(x, c(x)) : c(x) = k + a(x), x ∈ X}.

It follows from the commutativity of real numbers that

k · a = a · k and k + a = a + k.

Although much of image processing is accomplished using real-, integer-, binary-, or complex-valued images, many higher-level vision tasks require manipulation of vector and set-valued images. A set-valued image is of form . Here the underlying value set is , where the tilde symbol denotes complementation. Hence, the operations on set-valued images are those induced by the Boolean algebra of the value set. For example, if a, , then

a ∪ b = {(x, c(x)) : c(x) = a(x) ∪ b(x), x ∈ X},
a ∩ b = {(x, c(x)) : c(x) = a(x) ∩ b(x), x ∈ X},

and

where .

The operation of complementation is, of course, a unary operation. A particularly useful unary operation on images which is induced by a binary operation on a value set is known as the global reduce operation. More precisely, if γ is an associative and commutative binary operation on and X is finite, say X = {x1, x2, ..., xn}, then γ induces a unary operation

called the global reduce operation induced by γ, which is defined as

Thus, for example, if and γ is the operation of addition (γ = +) then Γ = Σ and

In all, the value set provides for four basic global reduce operations, namely

, and .
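As a hedged illustration, the global reduce operation amounts to folding an associative, commutative operation over all pixel values; the Python sketch below (names are illustrative) shows the four basic reductions for a real-valued image.

# Sketch of the global reduce operation induced by an associative, commutative
# binary operation gamma; for gamma = + this is the image sum.
from functools import reduce

def global_reduce(gamma, a):
    return reduce(gamma, a.values())            # a(x1) gamma a(x2) gamma ... gamma a(xn)

a = {(0, 0): 1.0, (0, 1): 4.0, (1, 0): 2.0, (1, 1): 3.0}
print(global_reduce(lambda r, s: r + s, a))     # image sum     = 10.0
print(global_reduce(lambda r, s: r * s, a))     # image product = 24.0
print(global_reduce(max, a))                    # image maximum = 4.0
print(global_reduce(min, a))                    # image minimum = 1.0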

Induced Unary Operations and Functional Composition

In the previous section we discussed unary operations on elements of induced by a binary operation γ on . Typically, however, unary image operations are induced directly by unary operations on . Given a unary operation , then the induced unary operation is again denoted by f and is defined by

f(a) = {(x, c(x)) : c(x) = f(a(x)), x ∈ X}.

Note that in this definition we view the composition as a unary operation on with operand a. This subtle distinction has the important consequence that f is viewed as a unary operation — namely a function from to — and a as an argument of f. For example, substituting for and the sine function for f, we obtain the induced operation , where

sin(a) = {(x, c(x)) : c(x) = sin(a(x)), x ∈ X}.

As another example, consider the characteristic function

Then for any is the Boolean (two-valued) image on X with value 1 at location x if a(x) ≥ k and value 0 if a(x) < k. An obvious application of this operation is the thresholding of an image. Given a floating point image a and using the characteristic function

then the image b in the image algebra expression

b := χ[j,k](a)

is given by

b = {(x, b(x)) : b(x) = a(x) if j ≤ a(x) ≤ k, otherwise b(x) = 0}.
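A possible Python rendering of this thresholding idea is sketched below; chi_ge corresponds to the characteristic image χ≥k(a), while band_keep is one reading of the interval example above (the exact image algebra expression involves a characteristic function whose precise form is not shown here, so treat it as an assumption).

# Sketch of characteristic-function thresholding: chi_ge(k, a) is the Boolean image
# with value 1 wherever a(x) >= k, and 0 elsewhere.
def chi_ge(k, a):
    return {x: 1 if v >= k else 0 for x, v in a.items()}

# Band version: keep the original value on [j, k], set 0 elsewhere (an assumed reading
# of the example above, not the book's exact expression).
def band_keep(j, k, a):
    return {x: v if j <= v <= k else 0 for x, v in a.items()}

a = {(0, 0): 0.2, (0, 1): 0.7, (1, 0): 0.5, (1, 1): 0.9}
print(chi_ge(0.5, a))          # {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1}
print(band_keep(0.4, 0.8, a))  # keeps 0.7 and 0.5, zeros the rest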

The unary operations on an image discussed thus far have resulted either in a scalar (an element of ) by use of the global reduction operation, or in another -valued image by use of the composition . More generally, given a function , then the composition provides for a unary operation which changes an -valued image into a -valued image f(a). Taking the same viewpoint, but using a function f between spatial domains instead, provides a scheme for realizing naturally induced operations for spatial manipulation of image data. In particular, if f : Y → X and , then we define the induced image by

Thus, the operation defined by the above equation transforms an -valued image defined over the space X into an -valued image defined over the space Y.

Examples of spatially based image transformations are affine and perspective transforms. For instance, suppose , where is a rectangular m × n array. If and f : X → X is defined as

then is a one-sided reflection of a across the line x = k. Further examples are provided by several of the algorithms presented in this text.

Simple shifts of an image can be achieved by using either a spatial transformation or point addition. In particular, given , and , we define a shift of a by y as

a + y = {(z, b(z)) : b(z) = a(z - y), z - y ∈ X}.

Note that a + y is an image on X + y since z - y ∈ X ⇔ z ∈ X + y, which provides for the equivalent formulation

a + y = {(z, b(z)) : b(z) = a(z - y), z ∈ X + y}.

Of course, one could just as well define a spatial transformation f : X + y → X by f(z) = z - y in order to obtain the identical shifted image .

Another simple unary image operation that can be defined in terms of a spatial map is image transposition. Given an image , then the transpose of a, denoted by a′, is defined as , where is given by f(x, y) = (y, x).

Binary Operations Induced by Unary Operations

Various unary image operations induced by functions can be generalized to binary operations on . As a simple illustration, consider the exponentiation function defined by f(r) = r^k, where k denotes some non-negative real number. Then f induces the exponentiation operation

where a is a non-negative real-valued image on X. We may extend this operation to a binary image operation as follows: if a, , then

The notion of exponentiation can be extended to negative-valued images as long as we follow the rules of arithmetic and restrict this binary operation to those pairs of real-valued images for which

. This avoids creation of complex, undefined, and indeterminate pixel values such as

, and 0^0, respectively. However, there is one exception to these rules of standard arithmetic. The algebra of images provides for the existence of pseudo inverses. For , the pseudo inverse of a, which for reason of simplicity is denoted by a^-1, is defined as

Note that if some pixel values of a are zero, then a·a^-1 ≠ 1, where 1 denotes the unit image all of whose pixel values are 1. However, the equality a·a^-1·a = a always holds. Hence the name “pseudo inverse.”
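The pseudo inverse is easy to state in code. The sketch below assumes the usual convention that a zero pixel is mapped to zero, which is consistent with the identity a·a⁻¹·a = a quoted above; the function names are illustrative.

# Sketch of the pseudo inverse: a^{-1}(x) = 1/a(x) where a(x) != 0 and 0 where a(x) = 0
# (the zero-pixel convention is an assumption consistent with a * a^{-1} * a = a).
def pseudo_inverse(a):
    return {x: (1.0 / v if v != 0 else 0.0) for x, v in a.items()}

def mult(a, b):
    return {x: a[x] * b[x] for x in a}

a = {(0, 0): 2.0, (0, 1): 0.0, (1, 0): 4.0}
a_inv = pseudo_inverse(a)
print(mult(mult(a, a_inv), a) == a)   # True: a * a^{-1} * a = a even with zero pixels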

The inverse of exponentiation is defined in the usual way by taking logarithms. Specifically,

log_b a = {(x, c(x)) : c(x) = log_{b(x)} a(x), x ∈ X}.

As for real numbers, log_b a is defined only for positive images; i.e., a, .

Another set of examples of binary operations induced by unary operations are the characteristic functions for comparing two images. For we define

χ_{≤b}(a) = {(x, c(x)) : c(x) = 1 if a(x) ≤ b(x), otherwise c(x) = 0}

χ_{<b}(a) = {(x, c(x)) : c(x) = 1 if a(x) < b(x), otherwise c(x) = 0}

χ_{=b}(a) = {(x, c(x)) : c(x) = 1 if a(x) = b(x), otherwise c(x) = 0}

χ_{≥b}(a) = {(x, c(x)) : c(x) = 1 if a(x) ≥ b(x), otherwise c(x) = 0}

χ_{>b}(a) = {(x, c(x)) : c(x) = 1 if a(x) > b(x), otherwise c(x) = 0}

χ_{≠b}(a) = {(x, c(x)) : c(x) = 1 if a(x) ≠ b(x), otherwise c(x) = 0}.

Functional Specification of Image Operations

The basic concepts of elementary function theory provide the underlying foundation of a functional specification of image processing techniques. This is a direct consequence of viewing images as functions. The most elementary concepts of function theory are the notions of domain, range, restriction, and extension of a function.

Image restrictions and extensions are used to restrict images to regions of particular interest and to embed images into larger images, respectively. Employing standard mathematical notation, the restriction of to a subset Z of X is denoted by a|Z, and defined by

Thus, . In practice, the user may specify Z explicitly by providing bounds for the coordinates of the points of Z.

There is nothing magical about restricting a to a subset Z of its domain X. We can just as well define restrictions of images to subsets of the range values. Specifically, if and , then the restriction of a to S is denoted by a||S and defined as

In terms of the pixel representation of a||S we have a||S = {(x, a(x)) : a(x) ∈ S}. The double-bar notation is used to focus attention on the fact that the restriction is applied to the second coordinate of .

Image restriction in terms of subsets of the value set is an extremely useful concept in computer vision, as many image processing tasks are restricted to image domains over which the image values satisfy certain properties. Of course, one can always write this type of restriction in terms of a first coordinate restriction by setting Z = {x ∈ X : a(x) ∈ S} so that a||S = a|Z. However, writing a program statement such as b := a|Z is of little value since Z is implicitly specified in terms of S; i.e., Z must be determined in terms of the property “a(x) ∈ S.” Thus, Z would have to be precomputed, adding to the computational overhead as well as increasing code size. In contrast, direct restriction of the second coordinate values to an explicitly specified set S avoids these problems and provides for easier implementation.

As mentioned, restrictions to the range set provide a useful tool for expressing various algorithmic procedures. For instance, if and S is the interval , where k denotes some given threshold value, then a||(k,∞) denotes the image a restricted to all those points of X where a(x) exceeds the value k. In order to reduce notation, we define a||>k ≡ a||(k,∞). Similarly,

a||≥k ≡ a||[k,∞),  a||<k ≡ a||(-∞,k),  a||k ≡ a||{k},  and  a||≤k ≡ a||(-∞,k].

As in the case of characteristic functions, a more general form of range restriction is given when S corresponds to a set-valued image ; i.e., . In this case we define

a||S = {(x, a(x)) : a(x) ∈ S(x)}.

For example, for a, we define

a||≤b ≡ {(x, a(x)) : a(x) ≤ b(x)},  a||<b ≡ {(x, a(x)) : a(x) < b(x)},

a||≥b ≡ {(x, a(x)) : a(x) ≥ b(x)},  a||>b ≡ {(x, a(x)) : a(x) > b(x)},

a||b ≡ {(x, a(x)) : a(x) = b(x)},  a||≠b ≡ {(x, a(x)) : a(x) ≠ b(x)}.
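The following Python sketch (illustrative names, with a membership predicate standing in for the set S) contrasts the domain restriction a|Z with the range restriction a||S described above.

# Sketch of domain and range restrictions: a|Z keeps pixels whose locations lie in Z,
# while a||S keeps pixels whose values lie in S (S given here as a predicate).
def restrict_domain(a, Z):
    return {x: v for x, v in a.items() if x in Z}

def restrict_range(a, in_S):
    return {x: v for x, v in a.items() if in_S(v)}

a = {(0, 0): 3.0, (0, 1): 7.0, (1, 0): 5.0, (1, 1): 1.0}
print(restrict_domain(a, {(0, 0), (1, 1)}))     # a|Z
print(restrict_range(a, lambda v: v > 4))       # a||_{>4}: pixels where a(x) exceeds 4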

Combining the concepts of first and second coordinate (domain and range) restrictions provides the general definition of an image restriction. If , Z ⊆ X, and , then the restriction of a to Z and S is defined as

a|(Z,S) = a ∩ (Z × S).

It follows that a|(Z,S) = {(x, a(x)) : x ∈ Z and a(x) ∈ S}, a|(X,S) = a||S, and .

The extension of to on Y, where X and Y are subsets of the same topological space, is denoted by a|b and defined by

In actual practice, the user will have to specify the function b.

Two of the most important concepts associated with a function are its domain and range. In the field of image understanding, it is convenient to view these concepts as functions that map images to sets associated with certain image properties. Specifically, we view the concept of range as a function

defined by .

Similarly, the concept of domain is viewed as the function

where

and domain is defined by

These mappings can be used to extract point sets and value sets from regions of images of particular interest. For example, the statement

s := domain(a||>k)

yields the set of all points (pixel locations) where a(x) exceeds k, namely s = {x ∈ X : a(x) > k}. The statement

s := range(a||>k),

on the other hand, results in a subset of instead of X.

Closely related to spatial transformations and functional composition is the notion of image concatenation. Concatenation serves as a tool for simplifying algorithm code, adding translucency to code, and providing a link to the usual block notation used in matrix algebra. Given and , then the row-order concatenation of a with b is denoted by (a | b) and is defined as

(a | b) ≡ a|b+(0,k).

Note that .

Assuming the correct dimensionality in the first coordinate, concatenation of any number of images is defined inductively using the formula (a | b | c) = ((a | b) | c) so that in general we have

Multi-Valued Image Operations

Although general image operations described in the previous sections apply to both single- and multi-valued images as long as there is no specific value type associated with the generic value set , there exist a large number of multi-valued image operations that are quite distinct from single-valued image operations. As the general theory of multi-valued image operations is beyond the scope of this treatise, we shall restrict our attention to some specific operations on vector-valued images while referring the reader interested in more intricate details to Ritter [1]. However, it is important to realize that vector-valued images are a special case of multi-valued images.

If and , then a(x) is a vector of form a(x) = (a1(x), …, an(x)), where for each i = 1, …, n, . Thus, an image is of form a = (a1, …, an) and with each vector value a(x) there are associated n real values ai(x).

Real-valued image operations generalize to the usual vector operations on . In particular, if a, , then

If , then we also have

r + a = (r1 + a1, …, rn + an),

r · a = (r1 · a1, …, rn · an),

etc. In the special case where r = (r, r, …, r), we simply use the scalar r and define r + a ≡ r + a, r · a ≡ r · a, and so on.

As before, binary operations on multi-valued images are induced by the corresponding binary operation on the value set . It turns out to be useful to generalize this concept by replacing the binary operation γ by a sequence of binary operations , where j = 1, …, n, and defining

aγb ≡ (aγ1b, aγ2b, …, aγnb).

For example, if is defined by

(x1, …, xn)γj(y1, …, yn) = max{xi ∨ yj : 1 ≤ i ≤ j},

then for a, and c = aγb, the components of c(x) = (c1(x), …, cn(x)) have values

cj(x) = a(x)γjb(x) = max{ai(x) ∨ bj(x) : 1 ≤ i ≤ j}

for j = 1, …, n.

As another example, suppose γ1 and γ2 are two binary operations defined by

(x1, x2)γ1(y1, y2) = x1y1 - x2y2

and

(x1, x2)γ2(y1, y2) = x1y2 + x2y1,

respectively. Now if a, represent two complex-valued images, then the product c = aγb represents pointwise complex multiplication, namely

c(x) = (a1(x)b1(x) - a2(x)b2(x), a1(x)b2(x) + a2(x)b1(x)).

Basic operations on single and multi-valued images can be combined to form image processing operations of arbitrary complexity. Two such operations that have proven to be extremely useful in processing real vector-valued images are the winner-take-all jth-coordinate maximum and minimum of two images.

Specifically, if a, , then the jth-coordinate maximum of a and b is defined as

a ∨|j b = {(x, c(x)) : c(x) = a(x) if aj(x) ≥ bj(x), otherwise c(x) = b(x)},

while the jth-coordinate minimum is defined as

a ∧|j b = {(x, c(x)) : c(x) = a(x) if aj(x) ≤ bj(x), otherwise c(x) = b(x)}.
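A direct Python sketch of these two operations follows; the helper names are hypothetical and the index j is zero-based here.

# Sketch of the winner-take-all jth-coordinate maximum and minimum of two vector-valued
# images: at each x the whole vector a(x) or b(x) is selected by comparing jth components.
def coord_max(a, b, j):
    return {x: (a[x] if a[x][j] >= b[x][j] else b[x]) for x in a}

def coord_min(a, b, j):
    return {x: (a[x] if a[x][j] <= b[x][j] else b[x]) for x in a}

a = {(0, 0): (1.0, 9.0), (0, 1): (5.0, 2.0)}
b = {(0, 0): (4.0, 3.0), (0, 1): (2.0, 8.0)}
print(coord_max(a, b, 0))   # compares first components:  {(0,0): (4.0, 3.0), (0,1): (5.0, 2.0)}
print(coord_min(a, b, 1))   # compares second components: {(0,0): (4.0, 3.0), (0,1): (5.0, 2.0)}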

Unary operations on vector-valued images are defined in a similar componentwise fashion. Given a function , then f induces a function , again denoted by f, which is defined by

f(x1, x2, …, xn) ≡ (f(x1), f(x2), …, f(xn)).

These functions provide for one type of unary operation on vector-valued images. In particular, if , then

Thus, if , then

sin(a) = (sin(a1), …, sin(an)).

Similarly, if f = χ≥k, then

χ≥k(a) = (χ≥k(a1), …, χ≥k(an)).

Any function gives rise to a sequence of functions , where j = 1, …, n. Conversely, given a sequence of functions , where j = 1, …, n, then we can define a function by

where . Such functions provide for a more complex type of unary image operation since by definition

which means that the construction of each new coordinate depends on all the original coordinates. To provide a specific example, define by f1(x, y) = sin(x)cosh(y) and by f2(x, y) = cos(x)sinh(y). Then the induced function is given by f = (f1, f2). Applying f to an image

results in the image

Thus, if we represent complex numbers as points in and a denotes a complex-valued image, then f(a) is a pointwise application of the complex sine function.
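The sketch below checks this claim numerically: applying f = (f1, f2) with f1(x, y) = sin(x)cosh(y) and f2(x, y) = cos(x)sinh(y) to a two-valued image agrees with the complex sine applied pixelwise (a minimal illustration, not code from the text).

# Coordinate-coupling unary operation: treating a two-valued image as a complex-valued
# image, f = (f1, f2) applies the complex sine pointwise, using the identity
# sin(x + iy) = sin(x)cosh(y) + i cos(x)sinh(y).
import cmath, math

def complex_sine(a):
    return {p: (math.sin(re) * math.cosh(im), math.cos(re) * math.sinh(im))
            for p, (re, im) in a.items()}

a = {(0, 0): (0.5, 0.25), (0, 1): (1.0, -0.5)}
for p, (re, im) in complex_sine(a).items():
    assert cmath.isclose(complex(re, im), cmath.sin(complex(*a[p])))  # matches cmath.sin
print("pointwise complex sine verified")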

Global reduce operations are also applied componentwise. For example, if , and k = card(X), then

In contrast, the summation since each . Note that the projection function p_i is a unary operation .

Similarly,

and

a = ( a1, …, an).

Summary of Image Operations

The lists below summarize some of the more significant image operations.

Binary image operations

It is assumed that only appropriately valued images are employed for the operations listed below. Thus, the operations of maximum and minimum apply to real- or integer-valued images but not to complex-valued images. Similarly, union and intersection apply only to set-valued images.

generic  aγb = {(x, c(x)) : c(x) = a(x)γb(x), x ∈ X}
addition  a + b = {(x, c(x)) : c(x) = a(x) + b(x), x ∈ X}
multiplication  a · b = {(x, c(x)) : c(x) = a(x) · b(x), x ∈ X}
maximum  a ∨ b = {(x, c(x)) : c(x) = a(x) ∨ b(x), x ∈ X}
minimum  a ∧ b = {(x, c(x)) : c(x) = a(x) ∧ b(x), x ∈ X}
scalar addition  k + a = {(x, c(x)) : c(x) = k + a(x), x ∈ X}
scalar multiplication  k · a = {(x, c(x)) : c(x) = k · a(x), x ∈ X}
point addition  a + y = {(z, b(z)) : b(z) = a(z - y), z ∈ X + y}
union  a ∪ b = {(x, c(x)) : c(x) = a(x) ∪ b(x), x ∈ X}
intersection  a ∩ b = {(x, c(x)) : c(x) = a(x) ∩ b(x), x ∈ X}

exponentiation

logarithm  log_b a = {(x, c(x)) : c(x) = log_{b(x)} a(x), x ∈ X}

concatenation

concatenation

characteristics  χ_{≤b}(a) = {(x, c(x)) : c(x) = 1 if a(x) ≤ b(x), otherwise c(x) = 0}

χ_{<b}(a) = {(x, c(x)) : c(x) = 1 if a(x) < b(x), otherwise c(x) = 0}

χ_{=b}(a) = {(x, c(x)) : c(x) = 1 if a(x) = b(x), otherwise c(x) = 0}

χ_{≥b}(a) = {(x, c(x)) : c(x) = 1 if a(x) ≥ b(x), otherwise c(x) = 0}

χ_{>b}(a) = {(x, c(x)) : c(x) = 1 if a(x) > b(x), otherwise c(x) = 0}

χ_{≠b}(a) = {(x, c(x)) : c(x) = 1 if a(x) ≠ b(x), otherwise c(x) = 0}

Whenever b is a constant image, say b = k (i.e., b(x) = k ∀x ∈ X), then we simply write a^k for a^b and log_k a for log_b a. Similarly, we have k + a, χ_{≤k}(a), χ_{<k}(a), etc.

Unary image operations

As in the case of binary operations, we again assume that only appropriately valued images are employed forthe operations listed below.

value transform

spatial transform

domain restriction a|Z = {(x, a(x)) : x � Z}

range restriction a||S = {(x, a(x)) : a(x) � S}

extension

domain

range

generic reduction  Γa = a(x1)γa(x2)γ ··· γa(xn)

image sum

image product

image maximum

image minimum

image complement

pseudo inverse

image transpose  a′ = {((x, y), a′(x, y)) : a′(x, y) = a(y, x), (y, x) ∈ X}

1.5. Templates

Templates are images whose values are images. The notion of a template, as used in image algebra, unifies and generalizes the usual concepts of templates, masks, windows, and neighborhood functions into one general mathematical entity. In addition, templates generalize the notion of structuring elements as used in mathematical morphology [26, 55].

Definition.  A template is an image whose pixel values are images (functions). In particular, an -valued template from Y to X is a function . Thus, and t is an -valued image on Y.

For notational convenience we define ty ≡ t(y) ∀y ∈ Y. The image ty has representation

ty = {(x, ty(x)) : x ∈ X}.

The pixel values ty(x) of this image are called the weights of the template at point y.

If t is a real- or complex-valued template from Y to X, then the support of ty is denoted by S(ty) and is defined as

S(ty) = {x ∈ X : ty(x) ≠ 0}.

More generally, if and is an algebraic structure with a zero element 0, then the support of ty will be defined as S(ty) = {x ∈ X : ty(x) ≠ 0}.

For extended real-valued templates we also define the following supports at infinity:

S∞(ty) = {x ∈ X : ty(x) ≠ ∞}

and

S-∞(ty) = {x ∈ X : ty(x) ≠ -∞}.

If X is a space with an operation + such that (X, +) is a group, then a template is said to be translation invariant (with respect to the operation +) if and only if for each triple x, y, z ∈ X we have that ty(x) = ty+z(x + z). Templates that are not translation invariant are called translation variant or, simply, variant templates. A large class of translation invariant templates with finite support have the nice property that they can be defined pictorially. For example, let and y = (x, y) be an arbitrary point of X. Set x1 = (x, y - 1), x2 = (x + 1, y), and x3 = (x + 1, y - 1). Define by defining the weights ty(y) = 1, ty(x1) = 3, ty(x2) = 2, ty(x3) = 4, and ty(x) = 0 whenever x is not an element of {y, x1, x2, x3}. Note that it follows from the definition of t that S(ty) = {y, x1, x2, x3}. Thus, at any arbitrary point y, the configuration of the support and weights of ty is as shown in Figure 1.5.1. The shaded cell in the pictorial representation of ty indicates the location of the point y.

Figure 1.5.1  Pictorial representation of a translation invariant template.
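A translation invariant template of this kind is conveniently coded as a function of the target point that returns the weight image t_y. The Python sketch below reproduces the weights 1, 3, 2, 4 used in Figure 1.5.1; the representation chosen here is an assumption for illustration, not the book's notation.

# Sketch of the translation invariant template defined above: for a target point
# y = (x, y) the support is {y, x1, x2, x3} with weights 1, 3, 2, 4 and weight 0 elsewhere.
def t(y):
    (i, j) = y
    weights = {(i, j): 1, (i, j - 1): 3, (i + 1, j): 2, (i + 1, j - 1): 4}
    return lambda x: weights.get(x, 0)          # t_y(x): the weight of the template at x

ty = t((5, 5))
print(ty((5, 5)), ty((6, 4)), ty((0, 0)))       # 1 4 0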

There are certain collections of templates that can be defined explicitly in terms of parameters. These parameterized templates are of great practical importance.

Definition.  A parameterized -valued template from Y to X with parameters in P is a function of form . The set P is called the set of parameters and each p ∈ P is called a parameter of t.

Thus, a parameterized -valued template from Y to X gives rise to a family of regular -valued templates from Y to X, namely .

Image-Template Products

The definition of an image-template product provides the rules for combining images with templates and templates with templates. The definition of this product includes the usual correlation and convolution products used in digital image processing. Suppose is a value set with two binary operations ◦ and γ, where ◦ distributes over γ, and γ is associative and commutative. If , then for each . Thus, if a , where X is finite, then a and Γ . It follows that the binary operations ◦ and γ induce a binary operation

where

is defined by

Therefore, if X = {x1, x2, …, xn}, then

The expression is called the right product of a with t. Note that while a is an image on X, the product is an image on Y. Thus, templates allow for the transformation of an image from one type of domain to an entirely different domain type.

Replacing by changes into

b = a • t,

the linear image-template product, where

, and .
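For the linear case the right product reduces to the familiar correlation sum b(y) = Σ_x a(x)·t_y(x). The sketch below is a minimal, unoptimized Python rendering of that sum (illustrative names), reusing the invariant template of Figure 1.5.1; template points outside the image simply contribute nothing.

# Sketch of the right linear image-template product: for each target point y,
# b(y) = sum over x of a(x) * t_y(x); restricting the sum to the support of t_y
# gives the usual correlation/convolution form.
def linear_product(a, t, Y):
    b = {}
    for y in Y:
        ty = t(y)                                           # the weight image t_y
        b[y] = sum(a[x] * ty(x) for x in a if ty(x) != 0)   # only the support contributes
    return b

X = [(i, j) for i in range(3) for j in range(3)]
a = {x: 1.0 for x in X}

def t(y):
    (i, j) = y
    w = {(i, j): 1, (i, j - 1): 3, (i + 1, j): 2, (i + 1, j - 1): 4}
    return lambda x: w.get(x, 0)

print(linear_product(a, t, X))    # interior points sum all four weights; border points fewer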

Every template has a transpose which is defined by . Obviously, (s′)′ = s and s′ reverses the mapping order from to . By definition, and , whenever and . Hence the binary operations ◦ and γ induce another product operation

where

is defined by

The expression is called the left product of a with s.

When computing , it is not necessary to use the transpose s′ since

This allows us to redefine the transformation as

For the remainder of this section we assume that is a monoid and let 0 denote the zero of under the operation γ. Suppose and , where X and Z are subsets of the same space. Since is a monoid, the operator can be extended to a mapping

where is defined by

The left product is defined in a similar fashion. Subsequent examples will demonstrate that the ability of replacing X with Z greatly simplifies the issue of template implementation and the use of templates in algorithm development.

Significant reduction in the number of computations involved in the image-template product can be achieved if is a commutative semiring. Recall that if t , then the support of t at a point y ∈ Y with respect to the operation γ is defined as S(ty) = {x ∈ Z : ty(x) ≠ 0}. Since ty(x) = 0 whenever x ∉ S(ty), we have that whenever x ∉ S(ty) and, therefore,

It follows that the computation of the new pixel value b(y) does not depend on the size of X, but on the size of S(ty). Therefore, if k = card(X ∩ S(ty)), then the computation of b(y) requires a total of 2k - 1 operations of type γ and ◦.

As pointed out earlier, substitution of different value sets and specific binary operations for γ and ◦ results in a wide variety of different image transforms. Our prime examples are the ring and the value sets and . The structure provides for two lattice products:

where

and

where

In order to distinguish between these two types of lattice transforms, we call the operator the additive maximum and the additive minimum. It follows from our earlier discussion that if , then the value of b(y) is -∞, the zero of under the operation of ∨. Similarly, if , then b(y) = ∞.

The left additive max and min operations are defined by

and

respectively. The relationship between the additive max and min is given in terms of lattice duality by

where the image a* is defined by a*(x) = [a(x)]*, and the conjugate (or dual) of is the template defined by . It follows that .

The value set also provides for two lattice products. Specifically, we have

where

and

where

Here 0 is the zero of under the operation of ∨, so that b(y) = 0 whenever . Similarly, b(y) = ∞ whenever .

The lattice products and are called the multiplicative maximum and multiplicative minimum, respectively. The left multiplicative max and left multiplicative min are defined as

and

respectively. The duality relation between the multiplicative max and min is given by

where a*(x) = (a(x))* and . Here r* denotes the conjugate of r in .

Summary of Image-Template Products

In the following list of pertinent image-template products and . Again, for each operation

we assume the appropriate value set .

right generic product

right linear product

right additive max

right additive min

right multiplicative max

right multiplicative min

right xor max

right xor min

In the next set of operations, .

left generic product

left linear product

left additive max

left additive min

left multiplicative max

left multiplicative min

Binary and Unary Template Operations

Since templates are images, all unary and binary image operations discussed earlier apply to templates as well. Any binary operation γ on induces a binary operation (again denoted by γ) on as follows: for each pair the induced operation sγt is defined in terms of the induced binary image operation on , namely (sγt)y ≡ syγty ∀y ∈ Y. Thus, if , and γ = +, then (s + t)y = sy + ty, where sy + ty denotes the pointwise sum of the two images and .

The unary template operations of prime importance are the global reduce operations. Suppose Y is a finite point set, say Y = {y1, y2, …, yn}, and . Any binary semigroup operation γ on induces a global reduce operation

which is defined by

Thus, for example, if and γ is the operation of addition (γ = +), then Γ = Σ and

Therefore, is an image, namely the sum of a finite number of images.

In all, the value set provides for four basic global reduce operations, namely

.

If the value set has two binary operations γ and ◦ so that is a ring (or semiring), then under the induced operations is also a ring (or semiring). Analogous to the image-template product, the binary operations ◦ and γ induce a template convolution product

defined as follows. Suppose , and X is a finite point set. Then the template product , where , is defined as

Thus, if and , then r = s • t is given by the formula

The lattice product is defined in a similar manner. For and , the product template r is given by

The following example provides a specific instance of the above product formulation.

Example: Suppose the following translation invariant templates:

Then the template product r = s • t is the template defined by

If are defined as above with values -∞ outside the support, then the template product

is the template defined by

The template t is not an -valued template. To provide an example of the template product , we redefine t as

Then is given by

The utility of template products stems from the fact that in semirings the equation

holds [1]. This equation can be utilized in order to reduce the computational burden associated with typical convolution problems. For example, if is defined by , then

where

The construction of the new image b := a • r requires nine multiplications and eight additions per pixel (if we ignore boundary pixels). In contrast, the computation of the image b := (a • s) • t requires only six multiplications and four additions per pixel. For large images (e.g., size 1024 × 1024) this amounts to significant savings in computation.
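The savings can be checked numerically. The sketch below assumes a 3 × 3 box template r that factors into a 3 × 1 column template s and a 1 × 3 row template t (a stand-in for the book's example, whose exact weights are not reproduced here) and verifies that a • r = (a • s) • t under zero padding.

# Sketch of the associativity-based savings a*r = (a*s)*t for linear products:
# two passes with 3 weights each replace one pass with 9 weights per pixel.
import numpy as np

def correlate(img, offsets_weights):
    """Right linear product with an invariant template given as (offset, weight) pairs;
    pixels outside the array are treated as 0 (the simplest boundary convention)."""
    h, w = img.shape
    padded = np.pad(img, 1)
    out = np.zeros_like(img)
    for (di, dj), wt in offsets_weights:
        out += wt * padded[1 + di:1 + di + h, 1 + dj:1 + dj + w]
    return out

a = np.arange(36, dtype=float).reshape(6, 6)
r = [((di, dj), 1.0) for di in (-1, 0, 1) for dj in (-1, 0, 1)]   # 3 x 3 template: 9 weights
s = [((di, 0), 1.0) for di in (-1, 0, 1)]                          # 3 x 1 column factor
t = [((0, dj), 1.0) for dj in (-1, 0, 1)]                          # 1 x 3 row factor

print(np.allclose(correlate(a, r), correlate(correlate(a, s), t)))  # True: a*r = (a*s)*t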

Summary of Unary and Binary Template Operations

In the following and denotes the appropriate value set.

generic binary operation  sγt : (sγt)y ≡ syγty

template sum  s + t : (s + t)y ≡ sy + ty

max of two templates  s ∨ t : (s ∨ t)y ≡ sy ∨ ty

min of two templates  s ∧ t : (s ∧ t)y ≡ sy ∧ ty

generic reduce operation

sum reduce

product reduce

max reduce

min reduce

In the next list, , X is a finite point set, and denotes the appropriate value set.

generic template product

linear template product

additive max product

additive min product

multiplicative max product

multiplicative min product

1.6. Recursive Templates

In this section we introduce the notions of recursive templates and recursive template operations, which are direct extensions of the notions of templates and the corresponding template operations discussed in the preceding section.

A recursive template is defined in terms of a regular template from some point set X to another point set Y with some partial order imposed on Y.

Definition.  A partially ordered set (or poset) is a set P together with a binary relation ≤ satisfying the following three axioms for arbitrary x, y, z ∈ P:

(i)  x ≤ x (reflexive)

(ii)  if x ≤ y and y ≤ x, then x = y (antisymmetric)

(iii)  if x ≤ y and y ≤ z, then x ≤ z (transitive)

Now suppose that X is a point set, Y is a partially ordered point set with partial order , and a monoid. An -valued recursive template t from Y to X is a function , where

and , such that

1.   and

2.  for each .

Thus, for each is an -valued image on X and is an -valued image on Y.

In most applications, the relation X ⊆ Y or X = Y usually holds. Also, for consistency of notation and for notational convenience, we define and so that . The support of t at a point y is defined as . The set of all -valued recursive templates from Y to X will be denoted by .

In analogy to our previous definition of translation invariant templates, if X is closed under the operation +, then a recursive template is called translation invariant if for each triple x, y, z ∈ X we have ty(x) = ty+z(x + z), or equivalently, and . An example of an invariant recursive template is shown in Figure 1.6.1.

Figure 1.6.1  An example of an integer-valued invariant recursive template from .

If t is an invariant recursive template and has only one pixel defined on the target point of its nonrecursive support , then t is called a simplified recursive template. Pictorially, a simplified recursive template can be drawn the same way as a nonrecursive template since the recursive part and the nonrecursive part do not overlap. In particular, the recursive template shown in Figure 1.6.1 can be redrawn as illustrated in Figure 1.6.2.

Figure 1.6.2  An example of an integer-valued simplified recursive template.

The notions of transpose and dual of a recursive template are defined in terms of those for nonrecursive templates. In particular, the transpose t′ of a recursive template t is defined as . Similarly, if , then the additive dual of t is defined by . The multiplicative dual for recursive -valued templates is defined in a likewise fashion.

Operations between Images and Recursive Templates

In order to facilitate the discussion of recursive template operations, we begin by extending the notions of the linear product •, the additive maximum , and the multiplicative maximum to the corresponding recursive operations , and , respectively.

Let X and Y be finite subsets of with Y partially ordered by and ,

then the recursive linear image-template product is defined by

The recursive template operation computes a new pixel value b(y) based on both the pixel values a(x) of the source image and some previously calculated new pixel values b(z), which are determined by the partial order and the region of support of the participating template. By definition of a recursive template, for every and . Therefore, b(y) is always recursively computable. Some partial orders that are commonly used in two-dimensional recursive transforms are forward and backward raster scanning and serpentine scanning.

It follows from the definition of that the computation of a new pixel b(y) can be done only after all its predecessors (ordered by ) have been computed. Thus, in contrast to nonrecursive template operations, recursive template operations are not computed in a globally parallel fashion.

Note that if the recursive template t is defined such that for all y ∈ Y, then one obtains the usual nonrecursive template operation

Hence, recursive template operations are natural extensions of nonrecursive template operations.
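As a minimal illustration of the recursive idea, the one-dimensional sketch below computes b under the forward (left-to-right) order, where the recursive part of the template contributes 0.5 times the previously computed value; the coefficients and names are arbitrary, not taken from the text.

# Sketch of a recursive template operation in one dimension under the forward order:
# the nonrecursive part reads a, the recursive part reuses the already computed value
# b(y - 1), giving b(y) = a(y) + 0.5 * b(y - 1).
def recursive_filter(a, alpha=0.5):
    b = []
    for y, value in enumerate(a):
        prev = b[y - 1] if y > 0 else 0.0      # predecessors of y are already computed
        b.append(value + alpha * prev)
    return b

print(recursive_filter([1.0, 0.0, 0.0, 4.0]))  # [1.0, 0.5, 0.25, 4.125]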

Recursive additive maximum and multiplicative maximum are defined in a similar fashion. Specifically, if

and , then

is defined by

For and ,

is defined by

The operations of the recursive additive minimum and multiplicative minimum ( and ) are defined in the same straightforward fashion.

Recursive additive maximum and minimum as well as recursive multiplicative maximum and minimum are nonlinear operations. However, the recursive linear product remains a linear operation.

The basic recursive template operations described above can be easily generalized to the generic recursive image-template product by simple substitution of the specific operations, such as multiplication and addition, by the generic operations ◦ and γ. More precisely, given a semiring with identity, then one can define the generic recursive product

by defining by

Again, in addition to the basic recursive template operations discussed earlier, a wide variety of recursive template operations can be derived from the generalized recursive rule by substituting different binary operations for ◦ and γ. Additionally, parameterized recursive templates are defined in the same manner as parametrized nonrecursive templates; namely as functions

where P denotes the set of parameters, and with and

.

Summary of Recursive Template Operations

In the following list of pertinent recursive image-template products and . As

before, for each operation we assume the appropriate value set .

recursive generic product

recursive linear product

recursive additive max

recursive additive min

recursive multiplicative max

recursive multiplicative min

The definition of the left recursive product is also straightforward. However, for the sake of brevity and since the different left products are not required for the remainder of this text, we dispense with their formulation. Additional facts about recursive products, their properties, and applications can be found in [1, 56, 57].

1.7. Neighborhoods

There are several types of template operations that are more easily implemented in terms of neighborhood operations. Typically, neighborhood operations replace template operations whenever the values in the support of a template consist only of the unit elements of the value set associated with the template. A template with the property that for each y ∈ Y, the values in the support of ty consist only of the unit of is called a unit template.

For example, the invariant template shown in Figure 1.7.1 is a unit template with respect to the

value set since the value 1 is the unit with respect to multiplication.

Figure 1.7.1  The unit Moore template for the value set .

Similarly, the template shown in Figure 1.7.2 is a unit template with respect to the value set

since the value 0 is the unit with respect to the operation +.

Figure 1.7.2  The unit von Neumann template for the value set .

If is an m × n array of points, , and is the 3 × 3 unit Moore template, then the values of the m × n image b obtained from the statement b := a • t are computed by using the equation

We need to point out that the difference between the mathematical equality b = a • t and the pseudocode statement b := a • t is that in the latter the new image is computed only for those points y for which . Observe that since a(x) · 1 = a(x) and M(y) = S(ty), where M(y) denotes the Moore neighborhood of y (see Figure 1.2.2), it follows that

This observation leads to the notion of neighborhood reduction. In implementation, neighborhood reduction avoids unnecessary multiplication by the unit element and, as we shall shortly demonstrate, neighborhood reduction also avoids some standard boundary problems associated with image-template products.

To precisely define the notion of neighborhood reduction we need a more general notion of the reduce operation Γ, which was defined in terms of a binary operation γ on . The more general form of Γ is a function

where .

For example, if , is an m × n array of points, then one such function could be defined as

where . Another example would be to define

as ; then Γ implements the averaging function, which we shall denote by average. Similarly, for integer-valued images, the median reduction

is defined as , where

.

Now suppose is a unit template with respect to the operation of the semiring , is a neighborhood system defined by N(y) = S(ty), and . It then follows that is given by

This observation leads to the following definition of an image-neighborhood product. Given X ⊆ Z, , a reduction function , and a neighborhood system , then the image-neighborhood product is defined by

for each y ∈ Y. Note that the product is similar to the image-template product in that is a function

In particular, if is the Moore neighborhood, and is the 3 × 3 unit Moore template defined earlier, then a 1t = a 1M.

Likewise, , where denotes the von Neumann unit template (Figure 1.7.2) and N denotes the von Neumann neighborhood (1.2.2). The latter equality stems from the fact that if and , then since ry(x) = 0 for all x ∈ X ∩ S-∞(ry) and S-∞(ry) = N(y) for all points , we have that

Unit templates act like characteristic functions in that they do not weigh a pixel, but simply note which pixels are in their support and which are not. When employed in the image-template operations of their semiring, they only serve to collect a number of values that need to be reduced by the gamma operation. For this reason, unit templates are also referred to as characteristic templates. Now suppose that we wish to describe a translation invariant unit template with a specific support such as the 3 × 3 support of the Moore template t shown in Figure 1.7.1. Suppose further that we would like this template to be used with a variety of reduction operations, for instance, summation and maximum. In fact, we cannot describe such an operand without regard to the image-template operation by which it will be used. For us to derive the expected results, the template must map all points in its support to the unitary value with respect to the combining operation ◦. Thus, for the reduce operation of summation , the unit values in the support must be 1, while for the maximum reduce operation , the values in the support must all be 0. Therefore, we cannot define a single template operand to characterize a neighborhood for reduction without regard to the image-template operation to be used to reduce the values within the neighborhood. However, we can capture exactly the information of interest in unit templates with the simple notion of a neighborhood function. Thus, for example, the Moore neighborhood M can be used to add the values in every 3 × 3 neighborhood as well as to find the maximum or minimum in such a neighborhood by using the statements a • M, , and , respectively. This is one advantage of replacing unit templates with neighborhoods.

Another advantage of using neighborhoods instead of templates can be seen by considering the simple example of image smoothing by local averaging. Suppose , where is an m × n array of points, and is the 3 × 3 unit Moore template with unit values 1. The image b obtained from the statement represents the image obtained from a by local averaging since the new pixel value b(y) is given by

Of course, there will be a boundary effect. In particular, if X = {(i, j) : 1 ≤ i ≤ m, 1 ≤ j ≤ n}, then

which is not the average of four points. One may either ignore this boundary effect (the most common choice), or one may use one of several schemes to prevent it [1]. However, each of these schemes adds to the computational burden. A simpler and more elegant way is to use the Moore neighborhood function M combined with the averaging reduction Γ = average. The simple statement provides for the desired locally averaged image without boundary effect.
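The boundary-free behavior is easy to see in code. The Python sketch below (illustrative, not the book's pseudocode) builds the Moore neighborhood restricted to X and averages only over the points that actually exist, so a corner pixel is averaged over four points and an interior pixel over nine.

# Sketch of local averaging with the Moore neighborhood function: at each point the
# average is taken only over neighbors that lie in X, so no special boundary scheme
# is needed.
def moore(y, X):
    (i, j) = y
    return [(i + di, j + dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)
            if (i + di, j + dj) in X]

def neighborhood_average(a):
    X = set(a)
    return {y: sum(a[x] for x in moore(y, X)) / len(moore(y, X)) for y in a}

X = [(i, j) for i in range(4) for j in range(4)]
a = {x: float(x[0] * 4 + x[1]) for x in X}
b = neighborhood_average(a)
print(b[(0, 0)])   # corner: average of 4 points, no boundary artifact
print(b[(1, 1)])   # interior: average of the full 3 x 3 Moore neighborhood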

Neighborhood composition plays an important role in algorithm optimization and simplification of algebraic expressions. Given two neighborhood functions , then the dilation of N1 by N2, denoted by N1 • N2, is a neighborhood function which is defined as

where N(y) + q ≡ {x + q : x ∈ N(y)}. Just as for template composition, algorithm optimization can be achieved by use of the equation for appropriate neighborhood functions and neighborhood reduction functions Γ. For , the kth iterate of a neighborhood is defined inductively as N^k = N^(k-1) • N, where .

Most neighborhood functions used in image processing are translation invariant subsets of (in particular, subsets of ). A neighborhood function is said to be translation invariant if N(y + p) = N(y) + p for every point . Given a translation invariant neighborhood N, we define its reflection or conjugate N* by N*(y) = N*(0) + y, where N*(0) = {-x : x ∈ N(0)} and denotes the origin. Conjugate neighborhoods play an important role in morphological image processing.

Note also that for a translation invariant neighborhood N, the kth iterate of N can be expressed in terms of the sum of sets

N^k(y) = N^(k-1)(y) + N(0).

Furthermore, since and , we have the symmetric relation

.

Summary of Image-Neighborhood Products

In the following list of pertinent image-neighborhood products , and

. Again, for each operation we assume the appropriate value set .

generic neighborhood reduction

neighborhood sum

neighborhood maximum

neighborhood minimum

Note that

and, therefore, . Similarly, .

Although we did not address the issues of parameterized neighborhoods and recursive neighborhood operations, it should be clear that these are defined in the usual way by simple substitution of the appropriate neighborhood function for the corresponding Boolean template. For example, a parameterized neighborhood with parameters in the set P is a function . Thus, for each parameter p ∈ P, N(p) is a neighborhood system for X in Z since N(p) : X → 2^Z. Similarly, a recursive neighborhood system for a partially ordered set is a function satisfying the conditions that for each x ∈ X, , and for each .

1.8. The p-Product

It is well known that in the linear domain template convolution products and image-template products are equivalent to matrix products and vector-matrix products, respectively [58, 1]. The notion of a generalized matrix product was developed in order to provide a general matrix theory approach to image-template products and template convolution products in both the linear and non-linear domains. This generalized matrix or p-product was first defined in Ritter [59]. This new matrix operation includes the matrix and vector products of linear algebra, the matrix product of minimax algebra [60], as well as generalized convolutions as special cases [59]. It provides for a transformation that combines the same or different types of values (or objects) into values of a possibly different type from those initially used in the combining operation. It has been shown that the p-product can be applied to express various image processing transforms in computing form [61, 62, 63]. In this document, however, we consider only products between matrices having the same type of values. In the subsequent discussion, and the set of all m × n matrices with entries from will be denoted by . We will follow the usual convention of setting and view as the set of all n-dimensional row vectors with entries from . Similarly, the set of all m-dimensional column vectors with entries from is given by .

Let m, n, and p be positive integers with p dividing both m and n. Define the following correspondences:

and

Since rp(i, k) < rp(i′, k′) ⇔ i < i′, or i = i′ and k < k′, rp linearizes the array using the row-scanning order as shown:

It follows that the row-scanning order on is given by

or, equivalently, by

We define the one-to-one correspondence

The one-to-one correspondence allows us to re-index the entries of a matrix in terms of a triple index a_s,(i,k) by using the convention

Example: Suppose l = 2, m = 6, and p = 2. Then m/p = 3, 1 ≤ k ≤ p = 2, and 1 ≤ i ≤ m/p = 3. Hence for

, we have

The factor of the Cartesian product is decomposed in a similar fashion. Here the row-scanning map is given by

This allows us to re-index the entries of a matrix in terms of a triple index b_(k,j),t by using the convention

Example: Suppose n = 4, q = 3, and p = 2. Then n/p = 2, 1 ≤ k ≤ p = 2, and 1 ≤ j ≤ n/p = 2. Hence for

, we have

Now let and . Using the maps rp and cp, A and B can be rewritten as

, where 1 ≤ s ≤ l, 1 ≤ rp(i, k) = j′ ≤ m, and

, where 1 ≤ cp(k, j) = i′ ≤ n and 1 ≤ t ≤ q.

The p-product or generalized matrix product of A and B is denoted by A •p B, and is the matrix defined by

where c_(s,j),(i,t) denotes the (s, j)th row and (i, t)th column entry of C. Here we use the lexicographical order (s, j) < (s′, j′) ⇔ s < s′, or if s = s′, j < j′. Thus, the matrix C has the following form:

The entry c_(s,j),(i,t) in the (s,j)-row and (i,t)-column is underlined for emphasis.
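A hedged Python sketch of the linear p-product is given below. It assumes the standard entry formula c_(s,j),(i,t) = Σ_{k=1..p} a_s,(i,k) · b_(k,j),t together with the row-scanning re-indexing described above (0-based in code); since parts of the definition above are shown only pictorially, treat this as an assumption rather than the book's exact formulation.

# Sketch of the p-product under the linear (+, *) interpretation, with
# a_{s,(i,k)} = A[s, i*p + k] and b_{(k,j),t} = B[k*(n//p) + j, t] (0-based).
import numpy as np

def p_product(A, B, p):
    l, m = A.shape
    n, q = B.shape
    assert m % p == 0 and n % p == 0
    C = np.zeros((l * (n // p), (m // p) * q))          # (l * n/p) x (m/p * q)
    for s in range(l):
        for j in range(n // p):
            for i in range(m // p):
                for t in range(q):
                    C[s * (n // p) + j, i * q + t] = sum(
                        A[s, i * p + k] * B[k * (n // p) + j, t] for k in range(p))
    return C

A = np.arange(12, dtype=float).reshape(2, 6)   # l = 2, m = 6
B = np.arange(12, dtype=float).reshape(4, 3)   # n = 4, q = 3
print(p_product(A, B, 2).shape)                # (4, 9), matching the dimensions above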

To provide an example, suppose that l = 2, m = 6, n = 4, and q = 3. Then for p = 2, one obtains m/p = 3, n/p = 2, and 1 ≤ k ≤ 2. Now let

and

Then the (2,1)-row and (2,3)-column element c_(2,1),(2,3) of the matrix

is given by

Thus, in order to compute c_(2,1),(2,3), the two underlined elements of A are combined with the two underlined elements of B as illustrated:

In particular,

If

then

This shows that the transpose property, which holds for the regular matrix product, is generally false for the p-product. The reason is that the p-product is not a dual operation in the transpose domain. In order to make the transpose property hold, we define the dual operation of •p by

It follows that

and the p-product is the dual operation of . In particular, we now have the transpose property

.


Since the dual operation is defined in terms of matrix transposition, the labeling of matrix indices is reversed. Specifically, if A = (a_st) is an l × m matrix, then A is reindexed as A = (a_{s,(k,j)}), using the convention

Similarly, if B = (b_st) is an n × q matrix, then the entries of B are relabeled as b_{(i,k),t}, using the convention

The product is then defined by the equation

Note that the dimension of the dual product of A and B is ((n/p) · l) × ((m/p) · q).

To provide a specific example of the dual operation, suppose that

In this case we have l = 3, m = 4, n = 6, and q = 2. Thus, for p = 2 and using the scheme described above, the reindexed matrices have the form

According to the dual product definition, the matrix is a 9 × 4 matrix given by

The underlined element c63 is obtained by using the formula:

Thus, in order to compute c_63, the two underlined elements of A are combined with the two underlined elements of B as illustrated:

As a final observation, note that the matrices A, B, and C in this example have the form of the transposes of the matrices B, A, and C, respectively, of the previous example.
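To make the indexing conventions above concrete, the following is a minimal NumPy sketch of the forward p-product A •p B over the ordinary (+, ×) operations. It assumes the row-scanning maps r_p(i,k) = (i − 1)p + k and c_p(k,j) = (k − 1)(n/p) + j and lexicographic ordering of the row index (s, j) and of the column index (i, t); it is an illustration of the definition, not the book's image algebra code.

import numpy as np

def p_product(A, B, p):
    # A is l x m, B is n x q, and p must divide both m and n
    l, m = A.shape
    n, q = B.shape
    assert m % p == 0 and n % p == 0, "p must divide both m and n"
    m1, n1 = m // p, n // p
    C = np.zeros((l * n1, m1 * q))
    for s in range(l):
        for j in range(n1):
            for i in range(m1):
                for t in range(q):
                    # c_{(s,j),(i,t)} = sum over k of a_{s,(i,k)} * b_{(k,j),t}
                    C[s * n1 + j, i * q + t] = sum(
                        A[s, i * p + k] * B[k * n1 + j, t] for k in range(p))
    return C

# Dimensions from the example in the text: l = 2, m = 6, n = 4, q = 3, p = 2.
A = np.arange(1, 13, dtype=float).reshape(2, 6)
B = np.arange(1, 13, dtype=float).reshape(4, 3)
print(p_product(A, B, 2).shape)   # (4, 9), i.e., (l*n/p) x ((m/p)*q)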

1.9. References

1  G. Ritter, “Image algebra with applications.” Unpublished manuscript, available via anonymous ftp from ftp://ftp.cis.ufl.edu/pub/src/ia/documents, 1994.

2  S. Unger, “A computer oriented toward spatial problems,” Proceedings of the IRE, vol. 46, pp. 1744-1750, 1958.

3  J. von Neumann, “The general logical theory of automata,” in Cerebral Mechanism in Behavior: The Hixon Symposium, New York, NY: John Wiley & Sons, 1951.

4  J. von Neumann, Theory of Self-Reproducing Automata. Urbana, IL: University of Illinois Press, 1966.

5  K. Batcher, “Design of a massively parallel processor,” IEEE Transactions on Computers, vol. 29, no. 9, pp. 836-840, 1980.

6  M. Duff, D. Watson, T. Fountain, and G. Shaw, “A cellular logic array for image processing,” Pattern Recognition, vol. 5, pp. 229-247, Sept. 1973.

7  M. J. B. Duff, “Review of CLIP image processing system,” in Proceedings of the National Computing Conference, pp. 1055-1060, AFIPS, 1978.

8  M. Duff, “Clip4,” in Special Computer Architectures for Pattern Processing (K. Fu and T. Ichikawa, eds.), ch. 4, pp. 65-86, Boca Raton, FL: CRC Press, 1982.

9  T. Fountain, K. Matthews, and M. Duff, “The CLIP7A image processor,” IEEE Pattern Analysis and Machine Intelligence, vol. 10, no. 3, pp. 310-319, 1988.

10  L. Uhr, “Pyramid multi-computer structures, and augmented pyramids,” in Computing Structures for Image Processing (M. Duff, ed.), pp. 95-112, London: Academic Press, 1983.

11  L. Uhr, Algorithm-Structured Computer Arrays and Networks. New York, NY: Academic Press, 1984.

12  W. Hillis, The Connection Machine. Cambridge, MA: The MIT Press, 1985.

13  J. Klein and J. Serra, “The texture analyzer,” Journal of Microscopy, vol. 95, 1972.

14  R. Lougheed and D. McCubbrey, “The cytocomputer: A practical pipelined image processor,” in Proceedings of the Seventh International Symposium on Computer Architecture, pp. 411-418, 1980.

15  S. Sternberg, “Biomedical image processing,” Computer, vol. 16, Jan. 1983.

16  R. Lougheed, “A high speed recirculating neighborhood processing architecture,” in Architectures and Algorithms for Digital Image Processing II, vol. 534 of Proceedings of SPIE, pp. 22-33, 1985.

17  E. Cloud and W. Holsztynski, “Higher efficiency for parallel processors,” in Proceedings IEEE Southcon 84, (Orlando, FL), pp. 416-422, Mar. 1984.

18  E. Cloud, “The geometric arithmetic parallel processor,” in Proceedings Frontiers of Massively Parallel Processing, George Mason University, Oct. 1988.

19  E. Cloud, “Geometric arithmetic parallel processor: Architecture and implementation,” in Parallel Architectures and Algorithms for Image Understanding (V. Prasanna, ed.), (Boston, MA), Academic Press, Inc., 1991.

20  H. Minkowski, “Volumen und Oberfläche,” Mathematische Annalen, vol. 57, pp. 447-495, 1903.

21  H. Minkowski, Gesammelte Abhandlungen. Leipzig-Berlin: Teubner Verlag, 1911.

22  H. Hadwiger, Vorlesungen über Inhalt, Oberfläche und Isoperimetrie. Berlin: Springer-Verlag, 1957.

23  G. Matheron, Random Sets and Integral Geometry. New York: Wiley, 1975.

24  J. Serra, “Introduction a la morphologie mathematique,” booklet no. 3, Cahiers du Centre de Morphologie Mathematique, Fontainebleau, France, 1969.

25  J. Serra, “Morphologie pour les fonctions “a peu pres en tout ou rien”,” technical report, Cahiers du Centre de Morphologie Mathematique, Fontainebleau, France, 1975.

26  J. Serra, Image Analysis and Mathematical Morphology. London: Academic Press, 1982.

27  T. Crimmins and W. Brown, “Image algebra and automatic shape recognition,” IEEE Transactions on Aerospace and Electronic Systems, vol. AES-21, pp. 60-69, Jan. 1985.

28  R. Haralick, S. Sternberg, and X. Zhuang, “Image analysis using mathematical morphology: Part I,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 9, pp. 532-550, July 1987.

29  R. Haralick, L. Shapiro, and J. Lee, “Morphological edge detection,” IEEE Journal of Robotics and Automation, vol. RA-3, pp. 142-157, Apr. 1987.

30  P. Maragos and R. Schafer, “Morphological skeleton representation and coding of binary images,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-34, pp. 1228-1244, Oct. 1986.

31  P. Maragos and R. Schafer, “Morphological filters Part II: Their relations to median, order-statistic, and stack filters,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-35, pp. 1170-1184, Aug. 1987.

32  P. Maragos and R. Schafer, “Morphological filters Part I: Their set-theoretic analysis and relations to linear shift-invariant filters,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-35, pp. 1153-1169, Aug. 1987.

33  J. Davidson and A. Talukder, “Template identification using simulated annealing in morphology neural networks,” in Second Annual Midwest Electro-Technology Conference, (Ames, IA), pp. 64-67, IEEE Central Iowa Section, Apr. 1993.

34  J. Davidson and F. Hummer, “Morphology neural networks: An introduction with applications,” IEEE Systems Signal Processing, vol. 12, no. 2, pp. 177-210, 1993.

35  E. Dougherty, “Unification of nonlinear filtering in the context of binary logical calculus, part II: Gray-scale filters,” Journal of Mathematical Imaging and Vision, vol. 2, pp. 185-192, Nov. 1992.

36  D. Schonfeld and J. Goutsias, “Optimal morphological pattern restoration from noisy binary images,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, pp. 14-29, Jan. 1991.

37  J. Goutsias, “On the morphological analysis of discrete random shapes,” Journal of Mathematical Imaging and Vision, vol. 2, pp. 193-216, Nov. 1992.

38  L. Koskinen and J. Astola, “Asymptotic behaviour of morphological filters,” Journal of Mathematical Imaging and Vision, vol. 2, pp. 117-136, Nov. 1992.

39  S. R. Sternberg, “Language and architecture for parallel image processing,” in Proceedings of the Conference on Pattern Recognition in Practice, (Amsterdam), May 1980.

40  S. Sternberg, “Overview of image algebra and related issues,” in Integrated Technology for Parallel Image Processing (S. Levialdi, ed.), London: Academic Press, 1985.

41  P. Maragos, A Unified Theory of Translation-Invariant Systems with Applications to Morphological Analysis and Coding of Images. Ph.D. dissertation, Georgia Institute of Technology, Atlanta, 1985.

42  J. Davidson, Lattice Structures in the Image Algebra and Applications to Image Processing. PhD thesis, University of Florida, Gainesville, FL, 1989.

43  J. Davidson, “Classification of lattice transformations in image processing,” Computer Vision, Graphics, and Image Processing: Image Understanding, vol. 57, pp. 283-306, May 1993.

44  P. Miller, “Development of a mathematical structure for image processing,” optical division tech. report, Perkin-Elmer, 1983.

45  J. Wilson, D. Wilson, G. Ritter, and D. Langhorne, “Image algebra FORTRAN language, version 3.0,” Tech. Rep. TR-89-03, University of Florida CIS Department, Gainesville, 1989.

46  J. Wilson, “An introduction to image algebra Ada,” in Image Algebra and Morphological Image Processing II, vol. 1568 of Proceedings of SPIE, (San Diego, CA), pp. 101-112, July 1991.

47  J. Wilson, G. Fischer, and G. Ritter, “Implementation and use of an image processing algebra for programming massively parallel computers,” in Frontiers ’88: The Second Symposium on the Frontiers of Massively Parallel Computation, (Fairfax, VA), pp. 587-594, 1988.

48  G. Fischer and M. Rowlee, “Computation of disparity in stereo images on the Connection Machine,” in Image Algebra and Morphological Image Processing, vol. 1350 of Proceedings of SPIE, pp. 328-334, 1990.

49  D. Crookes, P. Morrow, and P. McParland, “An algebra-based language for image processing on transputers,” IEE Third Annual Conference on Image Processing and its Applications, vol. 307, pp. 457-461, July 1989.

50  D. Crookes, P. Morrow, and P. McParland, “An implementation of image algebra on transputers,” tech. rep., Department of Computer Science, Queen’s University of Belfast, Northern Ireland, 1990.

51  J. Wilson, “Supporting image algebra in the C++ language,” in Image Algebra and Morphological Image Processing IV, vol. 2030 of Proceedings of SPIE, (San Diego, CA), pp. 315-326, July 1993.

52  University of Florida Center for Computer Vision and Visualization, Software User Manual for the iac++ Class Library, version 1.0 ed., 1994. Available via anonymous ftp from ftp.cis.ufl.edu in /pub/ia/documents.

53  G. Birkhoff and J. Lipson, “Heterogeneous algebras,” Journal of Combinatorial Theory, vol. 8, pp. 115-133, 1970.

54  University of Florida Center for Computer Vision and Visualization, Software User Manual for the iac++ Class Library, version 2.0 ed., 1995. Available via anonymous ftp from ftp.cis.ufl.edu in /pub/ia/documents.

55  G. Ritter, J. Davidson, and J. Wilson, “Beyond mathematical morphology,” in Visual Communication and Image Processing II, vol. 845 of Proceedings of SPIE, (Cambridge, MA), pp. 260-269, Oct. 1987.

56  D. Li, Recursive Operations in Image Algebra and Their Applications to Image Processing. PhD thesis, University of Florida, Gainesville, FL, 1990.

57  D. Li and G. Ritter, “Recursive operations in image algebra,” in Image Algebra and Morphological Image Processing, vol. 1350 of Proceedings of SPIE, (San Diego, CA), July 1990.

58  R. Gonzalez and P. Wintz, Digital Image Processing. Reading, MA: Addison-Wesley, second ed., 1987.

59  G. Ritter, “Heterogeneous matrix products,” in Image Algebra and Morphological Image Processing II, vol. 1568 of Proceedings of SPIE, (San Diego, CA), pp. 92-100, July 1991.

60  R. Cuninghame-Green, Minimax Algebra: Lecture Notes in Economics and Mathematical Systems 166. New York: Springer-Verlag, 1979.

61  G. Ritter and H. Zhu, “The generalized matrix product and its applications,” Journal of Mathematical Imaging and Vision, vol. 1, no. 3, pp. 201-213, 1992.

62  H. Zhu and G. X. Ritter, “The generalized matrix product and the wavelet transform,” Journal of Mathematical Imaging and Vision, vol. 3, no. 1, pp. 95-104, 1993.

63  H. Zhu and G. X. Ritter, “The p-product and its applications in signal processing,” SIAM Journal of Matrix Analysis and Applications, vol. 16, no. 2, pp. 579-601, 1995.


Chapter 2
Image Enhancement Techniques

2.1. Introduction

The purpose of image enhancement is to improve the visual appearance of an image, or to transform an image into a form that is better suited for human interpretation or machine analysis. Although there exists a multitude of image enhancement techniques, surprisingly, there does not exist a corresponding unifying theory of image enhancement. This is due to the absence of a general standard of image quality that could serve as a design criterion for image enhancement algorithms. In this chapter we consider several techniques that have proved useful in a wide variety of applications.

2.2. Averaging of Multiple Images

The purpose of averaging multiple images is to obtain an enhanced image by averaging the intensities of several images of the same scene. A detailed discussion concerning rationale and methodology can be found in Gonzalez and Wintz [1].

Image Algebra Formulation

For i = 1, …, k, let a_i be a family of images of the same scene. The enhanced image is given by

For actual implementation the summation will probably involve the loop

Figure 2.2.1  Averaging of multiple images for different values of k. Additional explanations are given in the comments section.

Comments and Observations

Averaging multiple images is applicable when several noise-degraded images, a1, a2, …, ak, of the same scene exist. Each ai is assumed to have pixel values of the form

where a0 is the true (uncorrupted by noise) image and ηi(x) is a random variable representing the introduction of noise (see Figure 2.2.1). The averaging multiple images technique assumes that the noise is uncorrelated and has zero mean. Under these assumptions the law of large numbers guarantees that as k increases,

approaches a0(x). Thus, by averaging multiple images, it may be possible to assuage degradation due to noise. Clearly, it is necessary that the noisy images be registered so that corresponding pixels line up correctly.
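A minimal NumPy sketch of the technique, assuming the k images are already registered and have identical dimensions (the loop structure of the book's pseudocode is not reproduced):

import numpy as np

def average_images(images):
    # stack the k registered images and average them pixelwise
    stack = np.stack([np.asarray(a, dtype=float) for a in images])
    return stack.mean(axis=0)

# example: 16 noisy copies of a synthetic scene
rng = np.random.default_rng(0)
scene = np.zeros((64, 64)); scene[16:48, 16:48] = 100.0
noisy = [scene + rng.normal(0.0, 20.0, scene.shape) for _ in range(16)]
enhanced = average_images(noisy)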

2.3. Local Averaging

Local averaging smooths an image by reducing the variation in intensities locally. This is done by replacing the intensity level at a point by the average of the intensities in a neighborhood of the point.

Specifically, if a denotes the source image and N(y) a neighborhood of y, then the enhanced image b is given by

Additional details about the effects of this simple technique can be found in Gonzalez and Wintz [1].

Image Algebra Formulation

Let a be the source image, N : X → 2^X a predefined neighborhood function, and let average be the function which, for an image over an arbitrary point set Y, yields the average pixel value of its argument. The result image b, derived by local averaging from a, is given by:

Comments and Observations

Local averaging traditionally imparts an artifact to the boundary of its result image. This is because the number of neighbors is smaller at the boundary of an image, so the average should be computed over fewer values. Simply dividing the sum of those neighbors by a fixed constant will not yield an accurate average. The image algebra specification does not yield such an artifact because the average of pixels is computed from the set of neighbors of each image pixel. No fixed divisor is specified.
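A sketch of local averaging over a square neighborhood, dividing by the number of neighbors actually inside the image rather than by a fixed constant, as the remarks above require. The neighborhood shape and the inclusion of the center pixel are assumptions.

import numpy as np

def local_average(a, radius=1):
    a = np.asarray(a, dtype=float)
    rows, cols = a.shape
    b = np.empty_like(a)
    for y in range(rows):
        for x in range(cols):
            y0, y1 = max(0, y - radius), min(rows, y + radius + 1)
            x0, x1 = max(0, x - radius), min(cols, x + radius + 1)
            b[y, x] = a[y0:y1, x0:x1].mean()   # average over the clipped neighborhood
    return b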

2.4. Variable Local Averaging

Variable local averaging smooths an image by reducing the variation in intensities locally. This is done by replacing the intensity level at a point by the average of the intensities in a neighborhood of the point. In contrast to local averaging, this technique allows the size of the neighborhood configuration to vary. This is desirable for images that exhibit higher noise degradation toward the edges of the image [2, 3].

The actual mathematical formulation of this method is as follows. Suppose a denotes the source image and N : X → 2^X a neighborhood function. If n_y denotes the number of points in N(y) ∩ X, then the enhanced image b is given by

Image Algebra Formulation

Let a denote the source image and N : X → 2^X the specific neighborhood configuration function. The enhanced image b is now given by

Comments and Observations

Although this technique is computationally more intense than local averaging, it may be more desirable if variations in noise degradation in different image regions can be determined beforehand by statistical or other methods. Note that if N is translation invariant, then the technique is reduced to local averaging.

2.5. Iterative Conditional Local Averaging

The goal of iterative conditional local averaging is to reduce additive noise in approximately piecewise constant images without blurring of edges. The method presented here is a simplified version of one of several methods proposed by Lev, Zucker, and Rosenfeld [4]. In this method, the value of the image a at location y, a(y), is replaced by the average of the pixel values in a neighborhood of y whose values are approximately the same as a(y). The method is iterated (usually four to six times) until the image assumes the right visual fidelity as judged by a human observer.

For the precise formulation, let a be the source image and, for y ∈ X, let N(y) denote the desired neighborhood of y. Usually, N(y) is a 3 × 3 Moore neighborhood. Define

where T denotes a user-defined threshold, and set n(y) = card(S(y)).

The conditional local averaging operation has the form

where ak(y) is the value at the kth iteration and a0 = a.

Image Algebra Formulation

Let a denote the source image and N : X → 2^X the desired neighborhood function. Select an appropriate threshold T and define the following parameterized neighborhood function:

The iterative conditional local averaging algorithm can now be written as

where a0 = a.
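A sketch of the simplified scheme, assuming a 3 × 3 Moore neighborhood and that the center pixel is included in the conditional average; the threshold T and the number of iterations are supplied by the caller.

import numpy as np

def conditional_local_average(a, T, iterations=5):
    a = np.asarray(a, dtype=float)
    rows, cols = a.shape
    for _ in range(iterations):
        b = a.copy()
        for y in range(rows):
            for x in range(cols):
                window = a[max(0, y - 1):y + 2, max(0, x - 1):x + 2]
                close = window[np.abs(window - a[y, x]) <= T]
                b[y, x] = close.mean()   # never empty: it always contains a[y, x]
        a = b
    return a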


2.6. Max-Min Sharpening Transform

The max-min sharpening transform is an image enhancement technique which sharpens fuzzy boundaries and brings fuzzy gray level objects into focus. It also smooths isolated peaks or valleys. It is an iterative technique that compares maximum and minimum values with respect to the central pixel value in a small neighborhood. The central pixel value is replaced by whichever of the extrema in its neighborhood is closest to its value.

The following specification of the max-min sharpening transform was formulated in Kramer and Bruchner [5]. Let a be the source image and N(y) denote a symmetric neighborhood of y ∈ X. Define

The sharpening transform s is defined as

The procedure can be iterated as

Image Algebra Formulation

Let a be the source image, and N(y) a desired neighborhood of y ∈ X. The max-min sharpened image s is given by the following algorithm:

The algorithm is usually iterated until s stabilizes or objects in the image have assumed desirable fidelity (as judged by a human observer).
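A sketch of one or more passes of the transform, assuming a 3 × 3 Moore neighborhood (which includes the center pixel) and ties broken in favor of the maximum:

import numpy as np

def max_min_sharpen(a, iterations=1):
    a = np.asarray(a, dtype=float)
    rows, cols = a.shape
    for _ in range(iterations):
        s = a.copy()
        for y in range(rows):
            for x in range(cols):
                window = a[max(0, y - 1):y + 2, max(0, x - 1):x + 2]
                lo, hi = window.min(), window.max()
                # replace the pixel by whichever extremum is closer to it
                s[y, x] = hi if (hi - a[y, x]) <= (a[y, x] - lo) else lo
        a = s
    return a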

Comments and Observations

Figure 2.6.1 is a blurred image of four Chinese characters. Figure 2.6.2 shows the results, after convergence, of applying the max-min sharpening algorithm to the blurred characters. Convergence required 128 iterations. The neighborhood N used is as pictured below.

Figure 2.6.1  Blurred characters.

Figure 2.6.2  Result of applying max-min sharpening to blurred characters.

2.7. Smoothing Binary Images by Association

The purpose of this smoothing method is to reduce the effects of noise in binary pictures. The basic idea is that the 1-elements due to noise are scattered uniformly while the 1-elements due to message information tend to be clustered together. The original image is partitioned into rectangular regions. If the number of 1’s in each region exceeds a given threshold, then the region is not changed; otherwise, the 1’s are set to zero. The regions are then treated as single cells, a cell being assigned a 1 if there is at least one 1 in the corresponding region and 0 otherwise. This new collection of cells can be viewed as a lower resolution image. The pixelwise minimum of the lower resolution image and the original image provides for the smoothed version of the original image. The smoothed version of the original image can again be partitioned by viewing the cells of the lower resolution image as pixels and partitioning these pixels into regions subject to the same threshold procedure. The precise specification of this algorithm is given by the image algebra formulation below.

Image Algebra Formulation

Let T denote a given threshold and a ∈ {0, 1}^X be the source image. For a fixed integer k ≥ 2, define a neighborhood function N(k) : X → 2^X by

Here means that if x = (x, y), then .

The smoothed image a1 ∈ {0, 1}^X is computed by using the statement

If recursion is desired, define

for i > 0, where a0 = a.

The recursion algorithm may reintroduce pixels with values 1 that had been eliminated at a previous stage. The following alternative recursion formulation avoids this phenomenon:

Comments and Observations

Figures 2.7.1 through 2.7.5 provide an example of this smoothing algorithm for k = 2 and T = 2. Note that N(k) partitions the point set X into disjoint subsets. Obviously, the larger the number k, the larger the size of the cells [N(k)](y). In the iteration, one views the cells [N(k)](y) as pixels forming the next partition [N(k + 1)](y). The effects of the two different iteration algorithms can be seen in Figures 2.7.4 and 2.7.5.

Figure 2.7.1  The binary source image a.

Figure 2.7.2  The lower-resolution image is shown on the left and the smoothed version of a on the right.

Figure 2.7.3  The lower-resolution image of the first iteration.

Figure 2.7.4  The smoothed version of a.

Figure 2.7.5  The image .

As can be ascertained from Figs. 2.7.1 through 2.7.5, several problems can arise when using this smoothing method. The technique as stated will not fill in holes caused by noise. It could be modified so that it fills in the rectangular regions if the number of 1’s exceeds the threshold, but this would cause distortion of the objects in the scene. Objects that split across boundaries of adjacent regions may be eliminated by this algorithm. Also, if the image cannot be broken into rectangular regions of uniform size, other boundary-sensitive techniques may need to be employed to avoid inconsistent results near the image boundary.

Additionally, the neighborhood N(k) is a translation-variant neighborhood function that needs to be computed at each pixel location y, resulting in possibly excessive computational overhead. For these reasons, morphological methods producing similar results may be preferable for image smoothing.
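A sketch of a single, non-recursive pass: the image is tiled with k × k blocks anchored at the origin, and the 1's in blocks whose count of 1's falls below the threshold are cleared. The block anchoring, the strictness of the comparison, and the requirement that the image dimensions be multiples of k are assumptions.

import numpy as np

def smooth_by_association(a, k=2, T=2):
    a = np.asarray(a, dtype=np.uint8)
    b = a.copy()
    rows, cols = a.shape
    for y in range(0, rows, k):
        for x in range(0, cols, k):
            if a[y:y + k, x:x + k].sum() < T:   # too few 1's: treat the block as noise
                b[y:y + k, x:x + k] = 0
    return b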

2.8. Median Filter

The median filter is a smoothing technique that causes minimal edge blurring. However, it will remove isolated spikes and may destroy fine lines [1, 2, 6]. The technique involves replacing the pixel value at each point in an image by the median of the pixel values in a neighborhood about the point.

The choice of neighborhood and median selection method distinguish the various median filter algorithms. Neighborhood selection is dependent on the source image. The machine architecture will determine the best way to select the median from the neighborhood.

Two median filter algorithms are presented in this section. The first is for an arbitrary neighborhood. It shows how an image-template operation can be defined that finds the median value by sorting lists. The second formulation shows how the familiar bubble sort can be used to select the median over a 3 × 3 neighborhood.


Image Algebra Formulation

Let a be the source image and N : X → 2^X a neighborhood function. Let median be the function which, for an image over an arbitrary point set Y, yields the median of its pixel values. The median filtered image m is given by

Alternate Image Algebra Formulation

The alternate formulation uses a bubble sort to compute the median value over a 3 × 3 neighborhood. Let a be the source image. The images ai are obtained by

where the functions are defined as follows:

The median image m is calculated with the following image algebra pseudocode:

Comments and Observations

The figures below offer a comparison between the results of applying an averaging filter and a median filter. Figure 2.8.1 is the source image of a noisy jet. Figures 2.8.2 and 2.8.3 show the results of applying the averaging and median filters, respectively, over 3 × 3 neighborhoods of the noisy jet image.

Figure 2.8.1  Noisy jet image.

Figure 2.8.2  Noisy jet smoothed with averaging filter over a 3 × 3 neighborhood.

Figure 2.8.3  Result of applying median filter over a 3 × 3 neighborhood to the noisy jet image.

Coffield [7] describes a stack comparator architecture which is particularly well suited for image processing tasks that involve order statistic filtering. His methodology is similar to the alternative image algebra formulation given above.
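A sketch of a 3 × 3 median filter built from nine shifted copies of the image. It uses a direct median rather than the bubble-sort selection of the alternate formulation, and replicates edge pixels at the image border (an assumption):

import numpy as np

def median_filter_3x3(a):
    a = np.asarray(a, dtype=float)
    padded = np.pad(a, 1, mode='edge')
    # gather the nine neighborhood values for every pixel
    shifts = [padded[dy:dy + a.shape[0], dx:dx + a.shape[1]]
              for dy in range(3) for dx in range(3)]
    return np.median(np.stack(shifts), axis=0)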

2.9. Unsharp Masking

Unsharp masking involves blending an image’s high-frequency components and low-frequency components to produce an enhanced image [2, 8, 9, 10]. The blending may sharpen or blur the source image depending on the proportion of each component in the enhanced image. Enhancement takes place in the spatial domain. The precise formulation of this procedure is given in the image algebra formulation below.

Image Algebra Formulation

Let a be the source image and let b be the image obtained from a by applying an averaging mask (Section 2.3). The image b is the low-frequency component of the source image and the high-frequency component is a - b. The enhanced image c produced by unsharp masking is given by

or, equivalently,

A γ between 0 and 1 results in a smoothing of the source image. A γ greater than 1 emphasizes the high-frequency components of the source image, which sharpens detail. Figure 2.9.1 shows the result of applying the unsharp masking technique to a mammogram for several values of γ. A 3 × 3 averaging neighborhood was used to produce the low-frequency component image b.

A more general formulation for unsharp masking is given by

where α is the weighting of the high-frequency component and β is the weighting of the low-frequency component.
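A sketch of unsharp masking with a box average as the low-frequency component, assuming the blend c = γ·a + (1 − γ)·b, which is equivalent to b + γ(a − b) and consistent with the statement that γ < 1 smooths while γ > 1 sharpens:

import numpy as np

def unsharp_mask(a, gamma=1.5, radius=1):
    a = np.asarray(a, dtype=float)
    k = 2 * radius + 1
    padded = np.pad(a, radius, mode='edge')
    # low-frequency component b: simple box average over a k x k neighborhood
    b = sum(padded[dy:dy + a.shape[0], dx:dx + a.shape[1]]
            for dy in range(k) for dx in range(k)) / (k * k)
    return gamma * a + (1.0 - gamma) * b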

Figure 2.9.1  Unsharp masking at various values of γ.

Comments and Observations

Unsharp masking can be accomplished by simply convolving the source image with the appropriate template. For example, the unsharp masking can be done for the mammogram in Figure 2.9.1 using

where t is the template defined in Figure 2.9.2. The values of v and w are and , respectively.

Figure 2.9.2  The Moore configuration template for unsharp masking using a simple convolution.

2.10. Local Area Contrast Enhancement

In this section we present two methods of local contrast enhancement from Harris [11] and Narendra and Fitch [12]. Each is a form of unsharp masking (Section 2.9) in which the weighting of the high-frequency component is a function of local standard deviation.

Image Algebra Formulation

Let a be the source image, and N be a neighborhood selected for local averaging (Section 2.3). The von Neumann or Moore neighborhoods are the most commonly used neighborhoods for this unsharp masking technique. The local mean image of a with respect to this neighborhood function is given by

The local standard deviation of a is given by the image

The image just defined actually represents the local standard deviation, while β > 0 is a lower bound applied to d in order to avoid problems with division by zero in the next step of this technique.

The enhancement technique of [11] is a high-frequency emphasis scheme which uses local standard deviationto control gain. The enhanced image b is given by

As seen in Section 2.9, a - m represents a highpass filtering of a in the spatial domain. The local gain factor 1/d is inversely proportional to the local standard deviation. Thus, a larger gain will be applied to regions with low contrast. This enhancement technique is useful for images whose information of interest lies in the high-frequency component and whose contrast levels vary widely from region to region.

The local contrast enhancement technique of Narendra and Fitch [12] is similar to the one above, except for a slightly different gain factor and the addition of a low-frequency component. In this technique, let ā denote the global mean of a. The enhanced image b is given by

where 0 < α < 1. The addition of the low-frequency component m is used to restore the average intensity level within local regions.

Note that in both techniques above it may be necessary to put limits on the gain factor to prevent excessive gain variations.
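A sketch of both schemes with a box neighborhood. The exact gain expressions below (Harris: (a − m)/d; Narendra-Fitch: α(ā/d)(a − m) + m, with ā the global mean) are reconstructions consistent with the description above and with the cited papers, not formulas copied from the book:

import numpy as np

def local_stats(a, radius=2):
    # local mean and standard deviation over a (2*radius+1)^2 box window
    a = np.asarray(a, dtype=float)
    k = 2 * radius + 1
    def box(img):
        padded = np.pad(img, radius, mode='edge')
        return sum(padded[dy:dy + a.shape[0], dx:dx + a.shape[1]]
                   for dy in range(k) for dx in range(k)) / (k * k)
    m = box(a)
    var = box(a * a) - m * m
    return m, np.sqrt(np.maximum(var, 0.0))

def harris_enhance(a, radius=2, beta=1.0):
    m, sd = local_stats(a, radius)
    d = np.maximum(sd, beta)            # beta keeps the gain 1/d bounded
    return (np.asarray(a, dtype=float) - m) / d

def narendra_fitch_enhance(a, radius=2, alpha=0.5, beta=1.0):
    a = np.asarray(a, dtype=float)
    m, sd = local_stats(a, radius)
    d = np.maximum(sd, beta)
    return alpha * (a.mean() / d) * (a - m) + m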


2.11. Histogram Equalization

Histogram equalization is a technique which rescales the range of an image’s pixel values to produce an enhanced image whose pixel values are more uniformly distributed [13, 1, 14]. The enhanced image tends to have higher contrast.

The mathematical formulation is as follows. Let a denote the source image, n = card(X), and nj be the number of times the gray level j occurs in the image a. The enhanced image b is given by

Image Algebra Formulation

Let a be the source image and let h denote the normalized cumulative histogram of a as defined in Section 10.11.

The enhanced image b is given by

Comments and Observations

Figure 2.11.1 is an illustration of the histogram equalization process. The original image and its histograms are on the left side of the figure. The original image appears dark (or underexposed). This darkness manifests itself in a bias toward the lower end of the gray scale in the original image’s histogram. On the right side of Figure 2.11.1 is the equalized image and its histograms. The equalized histogram is distributed more uniformly. This more uniform distribution of pixel values has resulted in an image with better contrast.

Figure 2.11.1  Left: Original image and histograms. Right: Equalized image and histograms.
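A sketch of histogram equalization through the normalized cumulative histogram, assuming integer gray levels 0, …, levels − 1; the exact rescaling convention may differ slightly from the book's:

import numpy as np

def histogram_equalize(a, levels=256):
    a = np.asarray(a, dtype=np.int64)
    hist = np.bincount(a.ravel(), minlength=levels)
    H = np.cumsum(hist) / a.size                       # normalized cumulative histogram
    lut = np.round((levels - 1) * H).astype(np.int64)
    return lut[a]                                      # map each gray level through the LUT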

2.12. Histogram Modification

The histogram equalization technique of Section 2.11 is a specific example of image enhancement using histogram modification. The goal of the histogram modification technique is to adjust the distribution of an image’s pixel values so as to produce an enhanced image in which certain gray level ranges are highlighted. In the case of histogram equalization, the goal is to enhance the contrast of an image by producing an enhanced image in which the pixel values are evenly distributed, i.e., the histogram of the enhanced image is flat.

Histogram modification is accomplished via a transfer function

where the variable z, z1 ≤ z ≤ zn, represents a gray level value of the source image and g, g1 ≤ g ≤ gm, is the variable that represents a gray level value of the enhanced image. The method of deriving a transfer function to produce an enhanced image with a prescribed distribution of pixel values can be found in either Gonzalez and Wintz [1] or Pratt [2].

Table 2.12.1 lists the transfer functions for several of the histogram modification examples featured in Pratt [2]. In the table, P(z) is the cumulative probability distribution of the input image and α is an empirically derived constant. The image algebra formulations for histogram modification based on the transfer functions in Table 2.12.1 are presented next.

Table 2.12.1 Histogram Transfer Functions

Name of modification Transfer function T(z)

Uniform modification: g = [gm − g1]P(z) + g1

Exponential modification:

Rayleigh modification:

Hyperbolic cube root modification:

Hyperbolic logarithmic modification:

Image Algebra Formulation

Let a denote the source image and h the normalized cumulative histogram of a as defined in Section 10.11. Let g1 and gm denote the gray value bounds for the enhanced image. Table 2.12.2 below describes the image algebra formulation for obtaining the enhanced image b using the histogram transform functions defined in Table 2.12.1.


2.13. Lowpass Filtering

Lowpass filtering is an image enhancement process used to attenuate the high-frequency components of an image’s Fourier transform [1]. Since high-frequency components are associated with sharp edges, lowpass filtering has the effect of smoothing the image.

Suppose a is the source image defined on a point set X. Let â denote the Fourier transform of a, and let the lowpass filter transfer function be as defined below. The enhanced image g is given by

the inverse Fourier transform of the product of â and the transfer function which attenuates high frequencies. Sections 8.2 and 8.4 present image algebra implementations of the Fourier transform and its inverse.

Table 2.12.2 Image Algebra Formulation of Histogram Transfer Functions

Name of modification Image algebra formulation

Uniform modification:

Exponential modification:

Rayleigh modification:

Hyperbolic cube root modification:

Hyperbolic logarithmic modification:

Let the distance from the point (u, v) to the origin of the frequency plane be given by

The transfer function of the ideal lowpass filter is given by

where d is a specified nonnegative quantity, which represents the cutoff frequency.

The transfer function of the Butterworth lowpass filter of order k is given by

where c is a scaling constant. Typical values for c are 1 and .

The transfer function of the exponential lowpass filter is given by

Typical values for a are 1 and .

The images that follow illustrate some of the properties of lowpass filtering. When filtering in the frequency domain, the origin of the image is assumed to be at the center of the display. The Fourier transform image has been shifted so that its origin appears at the center of the display (see Section 8.2).

Figure 2.13.1 is the original image of a noisy angiogram. The noise is in the form of a sinusoidal wave. Figure 2.13.2 represents the power spectrum image of the noisy angiogram. The noise component of the original image shows up as isolated spikes above and below the center of the frequency image. The noise spikes in the frequency domain are easily filtered out by an ideal lowpass filter whose cutoff frequency d falls within the distance from the center of the image to the spikes.

Figure 2.13.3 shows the result of applying an ideal lowpass filter to 2.13.2 and then mapping back to spatial coordinates via the inverse Fourier transform. Note how the washboard effect of the sinusoidal noise has been removed.

One artifact of lowpass filtering is the “ringing” which can be seen in Figure 2.13.3. Ringing is caused by the ideal filter’s sharp cutoff between the low frequencies it lets pass and the high frequencies it suppresses. The Butterworth filter offers a smooth discrimination between frequencies, which results in less severe ringing.

Figure 2.13.1  Noisy angiogram.

The lowpass filtering used above was successful in removing the noise from the angiogram; however, the filtering blurred the true image. The image of the angiogram before noise was added is seen in Figure 2.13.4. The blurring introduced by filtering is seen by comparing this image with Figure 2.13.3.

Lowpass filtering blurs an image because edges and other sharp transitions are associated with the high frequency content of an image’s Fourier spectrum. The degree of blurring is related to the proportion of the spectrum’s signal power that remains after filtering. The signal power P of the spectrum is defined by

The percentage of power that remains after filtering a using an ideal filter with cutoff frequency d is

Figure 2.13.2  Fourier transform of noisy angiogram.

Figure 2.13.3  Filtered angiogram using ideal filter.

As d increases, more and higher frequencies are encompassed by the circular region that makes up the support of the transfer function. Thus, as d increases, the signal power of the filtered image increases and blurring decreases. The top two images of Figure 2.13.5 are those of an original image (peppers) and its power spectrum. The lower four images show the blurring that results from filtering with ideal lowpass filters whose cutoff frequencies preserve 90, 93, 97, and 99% of the original image’s signal power.

The blurring caused by lowpass filtering is not always undesirable. In fact, lowpass filtering may be used as a smoothing technique.


Image Algebra Formulation

The image algebra formulation of the lowpass filter is roughly that of the mathematical formulation presented earlier. However, there are a few subtleties that a programmer needs to be aware of when implementing lowpass filters.

Figure 2.13.4  Angiogram before noise was introduced.

The transfer function

represents a disk of radius d of unit height centered at the origin (0, 0). On the other hand, the Fourier transform of an image a defined on X results in a complex-valued image â on X. In other words, the location of the origin with respect to â is not the center of X but the upper left-hand corner of â. Therefore, the product of the disk image and â is undefined. Simply shifting the image â so that the midpoint of X moves to the origin, or shifting the disk image so that the disk’s center will be located at the midpoint of X, will result in properly aligned images that can be multiplied, but will not result in a lowpass filter since the high frequencies, which are located at the corners of â, would be eliminated.

There are various options for the correct implementation of the lowpass filter. One such option is to center the spectrum of the Fourier transform at the midpoint of X (Section 8.3) and then to multiply the centered transform by the shifted disk image. The result of this process needs to be uncentered prior to applying the inverse Fourier transform. The exact specification of the ideal lowpass filter is given by the following algorithm.

Suppose a is defined on an m × n array X, where m and n are even integers. Define the point set

and set

Figure 2.13.5  Lowpass filtering with ideal filter at various power levels.

The algorithm now becomes

In this algorithm, 1 and 0 denote the complex-valued unit and zero image, respectively.

If U is instead defined as a smaller point set, then it becomes necessary to extend the transfer function to 0 on the array X prior to multiplying â by it. Figure 2.13.6 provides a graphical interpretation of this algorithm.
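A sketch of the first option (centering the spectrum, multiplying by the disk, uncentering, inverting), using NumPy's FFT routines rather than the book's image algebra pseudocode:

import numpy as np

def ideal_lowpass(a, cutoff):
    a = np.asarray(a, dtype=float)
    rows, cols = a.shape
    a_hat = np.fft.fftshift(np.fft.fft2(a))            # spectrum centered at the midpoint of X
    u = np.arange(rows) - rows // 2
    v = np.arange(cols) - cols // 2
    dist = np.sqrt(u[:, None] ** 2 + v[None, :] ** 2)
    disk = (dist <= cutoff).astype(float)              # unit-height disk of radius `cutoff`
    g = np.fft.ifft2(np.fft.ifftshift(a_hat * disk))   # uncenter, then invert the transform
    return g.real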

Another method, which avoids centering the Fourier transform, is to specify the set

and translate the image directly to the corners of X in order to obtain the desired multiplication result. Specifically, we obtain the following form of the lowpass filter algorithm:

Figure 2.13.6  Illustration of the basic steps involved in the ideal lowpass filter.

Here, the notation means restricting the image d to the point set X and then extending d on X to the zero image wherever it is not defined on X. The basic steps of this algorithm are shown in Figure 2.13.7.

Figure 2.13.7  Illustration of the basic steps as specified by the alternate version of the lowpass filter.

As for the ideal lowpass filter, there are several ways of implementing the kth-order Butterworth lowpass filter. The transfer function for this filter is defined over the point set specified above, and is given by

where c denotes the scaling constant and the value needs to be converted to a complex value. For correct multiplication with the image â, the transfer function needs to be shifted to the corners of X. The exact specification of the remainder of the algorithm is given by the following two lines of pseudocode:

The transfer function for exponential lowpass filtering is given by

where the value needs to be complex valued. The remainder of the algorithm is identical to that of the Butterworth filter algorithm.

2.14. Highpass Filtering

Highpass filtering enhances the edge elements of an image based on the fact that edges and abrupt changes in gray levels are associated with the high-frequency components of an image’s Fourier transform (Sections 8.2 and 8.4). As before, suppose â denotes the Fourier transform of the source image a. If the transfer function attenuates low frequencies and lets high frequencies pass, then the filtered enhancement of a is the inverse Fourier transform of the product of â and the transfer function. That is, the enhanced image g is given by

where denotes the inverse Fourier transform.

The formulation of the highpass transfer function is basically the complement of the lowpass transfer function. Specifically, the transfer function of the ideal highpass filter is given by

where d is a specified nonnegative quantity which represents the cutoff frequency and

.

The transfer function of the Butterworth highpass filter of order k is given by

where c is a scaling constant. Typical values for c are 1 and .

The transfer function of the exponential highpass filter is given by

Typical values for a are 1 and ln .

Image Algebra Formulation

Let denote the source image, where . Specify the point set ,

and define by

Once the transfer function is specified, the remainder of the algorithm is analogous to the lowpass filter algorithm. Thus, one specification would be

This pseudocode specification can also be used for the Butterworth and exponential highpass filter transfer functions, which are described next.

The Butterworth highpass filter transfer function of order k with scaling constant c is given by

where

The transfer function for exponential highpass filtering is given by


2.15. References

1  R. Gonzalez and P. Wintz, Digital Image Processing. Reading, MA: Addison-Wesley, second ed., 1987.

2  W. Pratt, Digital Image Processing. New York: John Wiley, 1978.

3  G. Ritter, J. Wilson, and J. Davidson, “Image algebra: An overview,” Computer Vision, Graphics, and Image Processing, vol. 49, pp. 297-331, Mar. 1990.

4  A. Lev, S. Zucker, and A. Rosenfeld, “Iterative enhancement of noisy images,” IEEE Transactions on Systems, Man, and Cybernetics, vol. SMC-7, pp. 435-442, June 1977.

5  H. Kramer and J. Bruchner, “Iterations of a non-linear transformation for the enhancement of digital images,” Pattern Recognition, vol. 7, pp. 53-58, 1975.

6  R. Haralick and L. Shapiro, Computer and Robot Vision, vol. I. New York: Addison-Wesley, 1992.

7  P. Coffield, “An architecture for processing image algebra operations,” in Image Algebra and Morphological Image Processing III, vol. 1769 of Proceedings of SPIE, (San Diego, CA), pp. 178-189, July 1992.

8  R. Gonzalez and R. Woods, Digital Image Processing. Reading, MA: Addison-Wesley, 1992.

9  L. Loo, K. Doi, and C. Metz, “Investigation of basic imaging properties in digital radiography 4. Effect of unsharp masking on the detectability of simple patterns,” Medical Physics, vol. 12, pp. 209-214, Mar. 1985.

10  R. Cromartie and S. Pizer, “Edge-affected context for adaptive contrast enhancement,” in Ninth Conference on Information Processing in Medical Imaging, 1991.

11  J. Harris, “Constant variance enhancement: a digital processing technique,” Applied Optics, vol. 16, pp. 1268-1271, May 1977.

12  P. Narendra and R. Fitch, “Real-time adaptive contrast enhancement,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-3, pp. 655-661, Nov. 1981.

13  D. Ballard and C. Brown, Computer Vision. Englewood Cliffs, NJ: Prentice-Hall, 1982.

14  A. Rosenfeld and A. Kak, Digital Picture Processing. New York, NY: Academic Press, 2nd ed., 1982.

       

  

     

    


Chapter 3
Edge Detection and Boundary Finding Techniques

3.1. Introduction

Edge detection is an important operation in a large number of image processing applications such as image segmentation, character recognition, and scene analysis. An edge in an image is a contour across which the brightness of the image changes abruptly in magnitude or in the rate of change of magnitude.

Generally, it is difficult to specify a priori which edges correspond to relevant boundaries in an image. Image transforms which enhance and/or detect edges are usually task domain dependent. We, therefore, present a wide variety of commonly used transforms. Most of these transforms can be categorized as belonging to one of the following four classes: (1) simple windowing techniques to find the boundaries of Boolean objects, (2) transforms that approximate the gradient, (3) transforms that use multiple templates at different orientations, and (4) transforms that fit local intensity values with parametric edge models.

The goal of boundary finding is somewhat different from that of the edge detection methods classified in the preceding paragraph, which are generally based on intensity information. Gradient methods for edge detection, followed by thresholding, typically produce a number of undesired artifacts such as missing edge pixels and parallel edge pixels, resulting in thick edges. Edge thinning processes and thresholding may result in disconnected edge elements. Additional processing is usually required in order to group edge pixels into coherent boundary structures. The goal of boundary finding is to provide coherent one-dimensional boundary features from the individual local edge pixels.

3.2. Binary Image Boundaries

A boundary point of an object in a binary image is a point whose 4-neighborhood (or 8-neighborhood, depending on the boundary classification) intersects the object and its complement. Boundaries for binary images are classified by their connectivity and whether they lie within the object or its complement. The four classifications will be expanded upon in the discussion involving the mathematical formulation.

Binary image boundary transforms are thinning methods. They do not preserve the homotopy of the original image. Boundary transforms can be especially useful when used inside of other algorithms that require location of the boundary to perform their tasks. Many of the other thinning transforms in this chapter fall into this category.

The techniques outlined below work by using the appropriate neighborhood to either enlarge or reduce the region with the corresponding dilation or erosion operation, respectively. After the object has been enlarged or reduced, it is intersected with its original complement to produce the boundary image.

For a ∈ {0, 1}^X, let A denote the support of a. The boundary image b ∈ {0, 1}^X of a is classified by its connectivity and whether B ⊆ A or B ⊆ A′, where B denotes the support of b and A′ denotes the complement of A.

(a)  The image b is an exterior 8-boundary image if B is 8-connected, B ⊆ A′, and B is the set of points in A′ whose 4-neighborhoods intersect A. That is,

(b)  The image b is an interior 8-boundary image if B is 8-connected, B ⊆ A, and B is the set of points in A whose 4-neighborhoods intersect A′. The interior 8-boundary b can be expressed as

(c)  The image b is an exterior 4-boundary image if B is 4-connected, B ⊆ A′, and B is the set of points in A′ whose 8-neighborhoods intersect A. That is, the image b is defined by

(d)  The image b is an interior 4-boundary image if B is 4-connected, B ⊆ A, and B is the set of points in A whose 8-neighborhoods intersect A′. Thus,

Figure 3.2.1 below illustrates the boundaries just described. The center image is the original image. The 8-boundaries are to the left, and the 4-boundaries are to the right. Exterior boundaries are black. Interior boundaries are gray.

Figure 3.2.1  Interior and exterior 8-boundaries (left), original image (center), and interior and exterior 4-boundaries (right).

Image Algebra Formulation

The von Neumann neighborhood function N is used in the image algebra formulation for detecting 8-boundaries, while the Moore neighborhood function M is used for detection of 4-boundaries. These neighborhoods are defined below.

Let a ∈ {0, 1}^X be the source image. The boundary image will be denoted by b.

(1)  Exterior 8-boundary —

(2)  Interior 8-boundary —

(3)  Exterior 4-boundary —

(4)  Interior 4-boundary —
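A sketch of the four formulations above, using a small hand-rolled binary dilation (pixels outside the image are ignored) in place of the image algebra operations:

import numpy as np

VON_NEUMANN = np.array([[0, 1, 0], [1, 1, 1], [0, 1, 0]], dtype=bool)   # 4-neighborhood
MOORE = np.ones((3, 3), dtype=bool)                                     # 8-neighborhood

def _dilate(a, nbhd):
    a = np.asarray(a, dtype=bool)
    padded = np.pad(a, 1, mode='constant', constant_values=False)
    out = np.zeros_like(a)
    for dy in range(3):
        for dx in range(3):
            if nbhd[dy, dx]:
                out |= padded[dy:dy + a.shape[0], dx:dx + a.shape[1]]
    return out

def boundaries(a):
    a = np.asarray(a, dtype=bool)
    return {
        'exterior_8': _dilate(a, VON_NEUMANN) & ~a,   # points of A' whose 4-neighborhood meets A
        'interior_8': _dilate(~a, VON_NEUMANN) & a,   # points of A whose 4-neighborhood meets A'
        'exterior_4': _dilate(a, MOORE) & ~a,         # points of A' whose 8-neighborhood meets A
        'interior_4': _dilate(~a, MOORE) & a,         # points of A whose 8-neighborhood meets A'
    }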

Comments and Observations

These transforms are designed for binary images only. More sophisticated algorithms must be used for gray level images.

Noise around the boundary may adversely affect results of the algorithm. An algorithm such as the salt and pepper noise removal transform may be useful in cleaning up the boundary before the boundary transform is applied.

3.3. Edge Enhancement by Discrete Differencing

Discrete differencing is a local edge enhancement technique. It is used to sharpen edge elements in an image by discrete differencing in either the vertical or horizontal direction, or in a combined fashion [1, 2, 3, 4].

Let a be the source image. The edge enhanced image b can be obtained by one of the following difference methods:

(1)  Horizontal differencing

or

(2)  Vertical differencing

or

(3)  Gradient approximation

or

Image Algebra Formulation

Given the source image, the edge enhanced image is obtained by the appropriate image-template convolution below.

(1)  Horizontal differencing

where the invariant enhancement template r is defined by

or

where

(2)  Vertical differencing

where the template t is defined by

or

where

(3)  Gradient approximation

or

where the templates t and r are defined as above and v, w are defined by

and

The templates can be described pictorially as


Comments and Observations

Figures 3.3.2 through 3.3.4 below are the edge enhanced images of the motorcycle image in Figure 3.3.1.

Figure 3.3.1  Original image.

Figure 3.3.2  Horizontal differencing: left |a •r|, right |a •s|.

Figure 3.3.3  Vertical differencing: left |a •t|, right |a •u|.

Figure 3.3.4  Gradient approximation: left |a •t| + |a •r|, right |a •v| + |a •w|.

3.4. Roberts Edge Detector

The Roberts edge detector is another example of an edge enhancement technique that uses discrete differencing [5, 1]. Let a be the source image. The edge enhanced image that the Roberts technique produces is defined by

Image Algebra Formulation

Given the source image , the edge enhanced image is given by

where the templates s and t are defined by

The templates s and t can be represented pictorially as

Comments and Observations

Figure 3.4.1 shows the result of applying the Roberts edge detector to the image of a motorcycle.

Figure 3.4.1  Motorcycle and its Roberts edge enhanced image.

3.5. Prewitt Edge Detector

The Prewitt edge detector calculates an edge gradient vector at each point of the source image. An edge enhanced image is produced from the magnitude of the gradient vector. An edge angle, which is equal to the angle the gradient makes to the horizontal axis, can also be assigned to each point of the source image [6, 7, 1, 8, 4].

Two masks are convolved with the source image to approximate the gradient vector. One mask represents the partial derivative with respect to x and the other the partial derivative with respect to y.

Let a be the source image, and a0, a1, …, a7 denote the pixel values of the eight neighbors of (i, j) enumerated in the counterclockwise direction as follows:

Let u = (a5 + a6 + a7) - (a1 + a2 + a3) and v = (a0 + a1 + a7) - (a3 + a4 + a5). The edge enhanced image is given by

and the edge direction image is given by

Image Algebra Formulation

Let be the source image and let s and t be defined as follows:

Pictorially we have:

The edge enhanced image is given by

The edge direction image is given by

Here we use the common programming language convention for the arctangent of two variables, which defines

Comments and Observations

A variety of masks may be used to approximate the partial derivatives.

Figure 3.5.1  Motorcycle and the image that represents the magnitude component of the Prewitt edge detector.

3.6. Sobel Edge Detector

The Sobel edge detector is a nonlinear edge enhancement technique. It is another simple variation of the discrete differencing scheme for enhancing edges [9, 10, 1, 8, 4].

Let a be the source image, and a0, a1, …, a7 denote the pixel values of the eight neighbors of (i, j) enumerated in the counterclockwise direction as follows:

The Sobel edge magnitude image is given by

where

and

The gradient direction image d is given by

Image Algebra Formulation

Let be the source image. The gradient magnitude image is given by

where the templates s and t are defined as follows:

The gradient direction image is given by

where

Comments and Observations

The Sobel edge detector emphasizes horizontal and vertical edges over skewed edges; it is relatively insensitive to off-axis edges. Figure 3.6.1 shows the Sobel edge image of the motorcycle test image.

Figure 3.6.1  Motorcycle and its Sobel edge enhanced image.
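
The following sketch, analogous to the Prewitt example above, uses the standard Sobel kernels; these are again an assumption in place of the handbook's pictorial templates s and t.

import numpy as np

SX = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SY = SX.T   # vertical-derivative mask

def sobel(a):
    a = np.asarray(a, dtype=float)
    n, m = a.shape
    gx = np.zeros_like(a)
    gy = np.zeros_like(a)
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            window = a[1 + di:n - 1 + di, 1 + dj:m - 1 + dj]
            gx[1:-1, 1:-1] += SX[di + 1, dj + 1] * window
            gy[1:-1, 1:-1] += SY[di + 1, dj + 1] * window
    magnitude = np.hypot(gx, gy)
    direction = np.arctan2(gy, gx)
    return magnitude, direction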

3.7. Wallis Logarithmic Edge Detection

Under the Wallis edge detection scheme a pixel is an edge element if the logarithm of its value exceeds the average of the logarithms of its 4-neighbors by a fixed threshold [2, 1]. Suppose denotes a source image containing only positive values, and a0, a1, a2, a3 denote the values of the 4-neighbors of (i, j) enumerated as follows:

The edge enhanced image is given by

or

Image Algebra Formulation

Let be the source image. The edge enhanced image is given by

where the invariant enhancement template is defined as follows:

Pictorially, t is given by

Comments and Observations

The Wallis edge detector is insensitive to a global multiplicative change in image values. The edge image of a will be the same as that of n · a for any .

Note that if the edge image is to be thresholded, it is not necessary to compute the logarithm of a. That is, if

and

c := a •t,

then

Figure 3.7.1 shows the result of applying the Wallis edge detector to the motorcycle image. The logarithm used for the edge image is base 2 (b = 2).

Figure 3.7.1  Motorcycle and its Wallis edge enhancement.
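
A minimal sketch of the Wallis scheme is given below; it subtracts the average of the logarithms of the four 4-neighbors from the logarithm of each pixel and leaves the signed difference to be thresholded afterwards. This particular arrangement is an assumption, since the formula and the template t are not reproduced above.

import numpy as np

def wallis(a, base=2.0):
    # a must contain only positive values; the one-pixel border stays zero.
    la = np.log(np.asarray(a, dtype=float)) / np.log(base)
    e = np.zeros_like(la)
    e[1:-1, 1:-1] = la[1:-1, 1:-1] - 0.25 * (la[:-2, 1:-1] + la[2:, 1:-1] +
                                             la[1:-1, :-2] + la[1:-1, 2:])
    return e

# edge elements: wallis(a) > threshold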

3.8. Frei-Chen Edge and Line Detection

A 3 × 3 subimage b of an image a may be thought of as a vector in . For example, the subimage shown in Figure 3.8.1 has vector representation

The mathematical structure of the vector space carries over to the vector space of 3 × 3 subimages in the obvious way.

Figure 3.8.1  A 3 × 3 subimage.

Let V denote the vector space of 3 × 3 subimages. An orthogonal basis for V, , that is used for the Frei-Chen method is the one shown in Figure 3.8.2. The subspace E of V that is spanned by the subimages v1, v2, v3, and v4 is called the edge subspace of V. The Frei-Chen edge detection method bases its determination of edge points on the size of the angle between the subimage b and its projection on the edge subspace [11, 12, 13].

Figure 3.8.2  The basis used for Frei-Chen feature detection.

The angle θE between b and its projection on the edge subspace is given by

The "·" operator in the formula above is the familiar dot (or scalar) product defined for vector spaces. The dot product of b, c ∈ V is given by

Small values of θE imply a better fit between b and the edge subspace.

For each point (x, y) in the source image a, the Frei-Chen algorithm for edge detection calculates the angle θE between the 3 × 3 subimage centered at (x, y) and its projection on the edge subspace of V. The smaller the value of θE at (x, y) is, the better edge point (x, y) is deemed to be. After θE is calculated for each image point, a threshold τ is applied to select points for which θE ≤ τ.

Figure 3.8.3 shows the results of applying the Frei-Chen edge detection algorithm to a source image of a variety of peppers. Thresholds were applied for angles θE of 18, 19, 20, 21, and 22°.

Equivalent results can be obtained by thresholding based on the statistic

which is easier to compute. In this case, a larger value indicates a stronger edge point.

The Frei-Chen method can also be used to detect lines. Subimages v5, v6, v7, and v8 form the basis of the line subspace of V. The determination of lines is based on the size of the angle between the subimage b and its projection on the line subspace. Thresholding for line detection is done using the statistic

Larger values indicate stronger line points.
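
The sketch below computes the Frei-Chen edge statistic (the squared cosine of the angle between a 3 × 3 subimage and the edge subspace) directly from the basis. The nine masks are the ones commonly published for the Frei-Chen method and are an assumption here, since Figure 3.8.2 is not reproduced; v1 through v4 span the edge subspace and v5 through v8 the line subspace, so the line statistic can be computed the same way.

import numpy as np

r2 = np.sqrt(2.0)
BASIS = [np.array(m, dtype=float) / s for m, s in [
    ([[1, r2, 1], [0, 0, 0], [-1, -r2, -1]], 2 * r2),
    ([[1, 0, -1], [r2, 0, -r2], [1, 0, -1]], 2 * r2),
    ([[0, -1, r2], [1, 0, -1], [-r2, 1, 0]], 2 * r2),
    ([[r2, -1, 0], [-1, 0, 1], [0, 1, -r2]], 2 * r2),
    ([[0, 1, 0], [-1, 0, -1], [0, 1, 0]], 2),
    ([[-1, 0, 1], [0, 0, 0], [1, 0, -1]], 2),
    ([[1, -2, 1], [-2, 4, -2], [1, -2, 1]], 6),
    ([[-2, 1, -2], [1, 4, 1], [-2, 1, -2]], 6),
    ([[1, 1, 1], [1, 1, 1], [1, 1, 1]], 3)]]

def frei_chen_edge_statistic(a):
    # For each interior pixel, the fraction of the subimage's energy that
    # lies in the edge subspace spanned by BASIS[0..3].
    a = np.asarray(a, dtype=float)
    stat = np.zeros_like(a)
    for i in range(1, a.shape[0] - 1):
        for j in range(1, a.shape[1] - 1):
            b = a[i - 1:i + 2, j - 1:j + 2].ravel()
            total = float(b @ b)
            if total > 0.0:
                edge = sum(float(b @ v.ravel()) ** 2 for v in BASIS[:4])
                stat[i, j] = edge / total
    return stat

# edge points: stat >= np.cos(np.radians(20.0)) ** 2, for a 20 degree threshold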

Image Algebra Formulation

Let be the source image and let v(i)y denote the parameterized template whose values are defined by the image vi of Figure 3.8.2. The center cell of the image vi is taken to be the location of y for the template. For a given threshold level τ, the Frei-Chen edge image e is given by

The Frei-Chen line image l is given by

Figure 3.8.3  Edge detection using the Frei-Chen method.

3.9. Kirsch Edge Detector

The Kirsch edge detector [14] applies eight masks at each point of an image in order to determine an edge gradient direction and magnitude. Let be the source image. For each (i, j) we denote a0, a1, …, a7 as the pixel values of the eight neighbors of (i, j) enumerated in the counterclockwise direction as follows:

The image that represents the magnitude of the edge gradient is given by

where sk = ak + ak+1 + ak+2 and tk = ak+3 + ak+4 + … + ak+7. The subscripts are evaluated modulo 8. Further details about this method of directional edge detection are given in [11, 8, 1, 14, 4].

The image d ∈ {0, 1, …, 7}X that represents the gradient direction is defined by

The image d · 45° can be used if it is desirable to express the gradient angle in degree measure. Figure 3.9.1 shows the magnitude image that results from applying the Kirsch edge detector to the motorcycle image.

Figure 3.9.1  Kirsch edge detector applied to the motorcycle image.

Image Algebra Formulation

Let M = {-1, 0, 1} × {-1, 0, 1} denote the Moore neighborhood about the origin, and M2 = M\{(0, 0)} denote the deleted Moore neighborhood.

Let f be the function that specifies the counterclockwise enumeration of M2 as shown below. Note that

and for each 8-neighbor x of a point (i, j), the point x2 defined by x2 = x - (i, j) is a member of M2.

For each k ∈ {0, 1, …, 7}, we define t(k) as follows:

Thus, t can be pictorially represented as follows:

For the source image let bk := |a•t(k)|. The Kirsch edge magnitude image m is given by

The gradient direction image d is given by

In actual implementation, the algorithm for computing the image m will probably involve the loop

A similar loop argument can be used to construct the direction image d.
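
In a conventional language the loop can be written as follows. The counterclockwise neighbor enumeration and the response |5sk - 3tk| are the commonly published Kirsch forms and are assumptions here, since the handbook's enumeration figure and magnitude formula are not reproduced.

import numpy as np

# Offsets of the eight neighbors of (i, j), enumerated counterclockwise
# (the starting neighbor is an assumption; rotating it only permutes k).
OFFSETS = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]

def kirsch(a):
    a = np.asarray(a, dtype=float)
    # n[k](i, j) = a(i + di, j + dj); np.roll wraps around at the border.
    n = [np.roll(np.roll(a, -di, axis=0), -dj, axis=1) for di, dj in OFFSETS]
    m = np.full(a.shape, -np.inf)
    d = np.zeros(a.shape, dtype=int)
    for k in range(8):
        s = n[k] + n[(k + 1) % 8] + n[(k + 2) % 8]
        t = sum(n[(k + j) % 8] for j in range(3, 8))
        g = np.abs(5 * s - 3 * t)          # commonly cited Kirsch response
        better = g > m
        m, d = np.where(better, g, m), np.where(better, k, d)
    return m, d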

3.10. Directional Edge Detection

The directional edge detector is an edge detection technique based on the use of directional derivatives [15, 16]. It identifies pixels as possible edge elements and assigns one of n directions to them. This is accomplished by convolving the image with a set of edge masks. Each mask has two directions associated with it. A pixel is assigned the direction associated with the largest magnitude obtained by the convolutions. Suppose that the convolution with the ith mask yielded the largest magnitude. If the result of the convolution was positive, then the direction θi is assigned; otherwise, the direction θi + 180° (modulo 360°) is assigned.

For a given source image , let ai = a * mi, where m0, m1, …, m(n/2)-1 denote the edge masks and * denotes convolution. The edge magnitude image is given by

Let θi be the direction associated with the mask mi. Suppose that m(x) = |ak(x)|. The direction image is given by

As an illustration, the directional edge technique is applied to the infrared image of a causeway across a bay as shown in Figure 3.10.1. The six masks of Figure 3.10.2 are used to determine one of twelve directions, 0°, 30°, 60°, …, or 330°, to be assigned to each point of the source image. The result of applying the directional edge detection algorithm is seen in Figure 3.10.3. In the figure, directions are presented as tangent vectors to the edge rather than gradient vectors (the tangent to the edge is orthogonal to the gradient). With this representation, the bridge and causeway can be characterized by two bands of vectors. The vectors in one band run along a line in one direction, while those in the other band run along a line in the opposite direction.

The edge image representation is different from the source image representation in that it is a plot of a vector field. The edge magnitude image was thresholded (see Section 4.2), and vectors were attached to the points that survived the thresholding process. The vector at each point has length proportional to the edge magnitude at the point, and points in the direction of the edge.

Figure 3.10.1  Source image of causeway with bridge.

Figure 3.10.2  Edge masks with their associated directions.

Figure 3.10.3  Edge points with their associated directions.

Image Algebra Formulation

Let be the source image. For , let

where f(r) is defined as

The result image b is given by

Note that the result b contains at each point both the maximal edge magnitude and the direction associated with that magnitude. As formulated above, the integers 0 through n - 1 are used to represent the n directions.

Comments and Observations

Trying to include the edge direction information in Figure 3.10.3 on one page has resulted in a rather “busy” display. However, edge direction information is often very useful. For example, the adjacent parallel edges in Figure 3.10.3 that run in opposite directions are characteristic of bridge-like structures.

The major problem with the display of Figure 3.10.3 is that, even after thresholding, thick (more than one pixel wide) edges remain. The thinning edge direction technique of Section 5.7 is one remedy to this problem. It uses edge magnitude and edge direction information to reduce edges to single-pixel thickness.

One other comment on the directional edge detection technique concerns the size of the edge masks. Selecting an appropriate size for the edge mask is important. The use of larger masks allows a larger range of edge directions. Also, larger masks are less sensitive to noise than smaller masks. The disadvantage of larger masks is that they are less sensitive to detail. The choice of mask size is application dependent.
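
The procedure can be sketched as follows. The masks and their associated angles are supplied by the caller; they are hypothetical stand-ins for the masks of Figure 3.10.2, which are not reproduced here.

import numpy as np

def directional_edges(a, masks, angles_deg):
    # Convolve a with each mask, keep the largest-magnitude response per
    # pixel, and assign theta_i or theta_i + 180 depending on its sign.
    a = np.asarray(a, dtype=float)
    responses = []
    for mask in masks:
        r = np.zeros_like(a)
        h, w = mask.shape
        ci, cj = h // 2, w // 2
        for u in range(h):
            for v in range(w):
                r += mask[u, v] * np.roll(np.roll(a, ci - u, axis=0), cj - v, axis=1)
        responses.append(r)
    responses = np.stack(responses)
    best = np.argmax(np.abs(responses), axis=0)
    magnitude = np.take_along_axis(np.abs(responses), best[None], axis=0)[0]
    signed = np.take_along_axis(responses, best[None], axis=0)[0]
    theta = np.asarray(angles_deg, dtype=float)[best]
    direction = np.where(signed >= 0, theta, (theta + 180.0) % 360.0)
    return magnitude, direction

# hypothetical usage: m, d = directional_edges(a, [mask0, mask1], [0.0, 90.0])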

3.11. Product of the Difference of Averages

If the neighborhoods over which differencing takes place are small, edge detectors based on discrete differencing will be sensitive to noise. However, the ability to precisely locate an edge suffers as the size of the neighborhood used for differencing increases. The product of the difference of averages technique [17] integrates both large and small differencing neighborhoods into its scheme to produce an edge detector that is insensitive to noise and that allows precise location of edges.

We first restrict our attention to the detection of horizontal edges. Let , where , be the source image. For define the image vk by

The product

will be large only if each of its factors is large. The averaging that takes place over the large neighborhoods makes the factors with large k less likely to pick up false edges due to noise. The factors from large neighborhoods contribute to the product by cancelling out false edge readings that may be picked up by the factors with small k. As (x1, x2) moves farther from an edge point, the factors with small k get smaller. Thus, the factors in the product for small neighborhoods serve to pinpoint the edge point's location.

The above scheme can be extended to two dimensions by defining another image h to be

where

The image h is sensitive to vertical edges. To produce an edge detector that is sensitive to both vertical and horizontal edges, one can simply take the maximum of h and v at each point in X.

Figure 3.11.1 compares the results of applying discrete differencing and the product of the difference of averages techniques. The source image (labeled 90/10) in the top left corner of the figure is of a circular region whose pixel values have a 90% probability of being white against a background whose pixel values have a 10% probability of being white. The corresponding regions of the image in the upper right corner have probabilities of white pixels equal to 70% and 30%. The center images show the result of taking the maximum from discrete differencing along the horizontal and vertical directions. The images at the bottom of the figure show the results of the product of the difference of averages algorithm. Specifically, the figure's edge enhanced image e produced by using the product of the difference of averages is given by

Image Algebra Formulation

Let be the source image. Define the parameterized templates h(k) and v(k) to be

and

where (y1, y2) = y. The edge enhanced image e produced by the product of the difference of averages algorithm is given by the image algebra statement

The templates used in the formulation above are designed to be sensitive to vertical and horizontal edges.
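
The following sketch gives one plausible reading of the scheme, since the definitions of vk and h above are not reproduced: at each pixel and each scale k it differences one-dimensional averages taken above and below the pixel (and to its left and right), multiplies the differences across scales, and takes the pointwise maximum of the two results.

import numpy as np

def product_of_difference_of_averages(a, sizes=(1, 2, 4, 8)):
    # One plausible reading: v responds to horizontal edges (averages above
    # vs. below each pixel), h to vertical edges (averages left vs. right);
    # the neighborhood shapes themselves are an assumption.
    a = np.asarray(a, dtype=float)
    rows, cols = a.shape
    kmax = max(sizes)
    v = np.ones((rows, cols))
    h = np.ones((rows, cols))
    for k in sizes:
        dv = np.zeros((rows, cols))
        dh = np.zeros((rows, cols))
        for i in range(kmax, rows - kmax):
            for j in range(kmax, cols - kmax):
                dv[i, j] = abs(a[i - k:i, j].mean() - a[i + 1:i + k + 1, j].mean())
                dh[i, j] = abs(a[i, j - k:j].mean() - a[i, j + 1:j + k + 1].mean())
        v *= dv
        h *= dh
    return np.maximum(v, h)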

3.12. Crack Edge Detection

Crack edge detection techniques comprise the family of algorithms that associate edge values with the points lying between neighboring image pixels [18]. More precisely, if one considers a 2-dimensional image with 4-neighbor connectivity, each spatial location lying exactly between two horizontally or vertically neighboring pixels corresponds to a crack. Each such point is assigned an edge value. The name crack edge stems from the view of pixels as square regions with cracks in between them.

Figure 3.11.1  The product of the difference of averages edge detection method vs. discrete differencing.

Any of the edge detection methods presented in this chapter can be converted into a crack edge detector. Below, we consider the particular case of discrete differencing.

For a given source image , pixel edge horizontal differencing will yield an image such that b(i, j) = a(i, j) - a(i, j + 1) for all (i, j) ∈ X. Thus if there is a change in value from point (i, j) to point (i, j + 1), the difference in those values is associated with point (i, j).

The horizontal crack edge differencing technique, given , will yield a result image where W = {(i, r) : (i, r - ), (i, r + ) ∈ X}, that is, b's point set corresponds to the cracks that lie between horizontally neighboring points in X. The values of b are given by

One can compute the vertical differences as well, and store both horizontal and vertical differences in a single image with scalar edge values. (This cannot be done using pixel edges since each pixel is associated with both a horizontal and a vertical edge.) That is, given , we can compute where

and

Image Algebra Formulation

Given source image , define spatial functions f1, f2, f3, f4 on as follows:

Next construct the point set

and define by

For a point with integral first coordinate, t can be pictured as follows:

and for a point with integral second coordinate, t can be pictured as

The crack edge image is given by

Alternate Image Algebra Formulation

Some implementations of image algebra will not permit one to specify points with non-integral coordinates as are required above. In those cases, several different techniques can be used to represent the crack edges as images. One may want to map each point (y1, y2) in Y to the point (2y1, 2y2). In such a case, the domain of the crack edge image does not cover a rectangular subset of ; all points involving two odd or two even coordinates are missing.

Another technique that can be used to transform the set of crack edges onto a less sparse subset of is to employ a spatial transformation that will rotate the crack edge points by angle π/4 and shift and scale appropriately to map them onto points with integral coordinates. The transformation f : X → Z defined by

is just such a function. The effect of applying f to a set of edge points is shown in Figure 3.12.1 below.

Figure 3.12.1  The effect of applying f to a set of edge points.

Comments and Observations

Crack edge finding can be used to find edges using a variety of underlying techniques. The primary benefit it provides is the ability to detect edges of features of single pixel width.

Representation of crack edges requires one to construct images over point sets that are either sparse, rotated, or contain fractional coordinates. Such images may be difficult to represent efficiently.
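
A small sketch of the doubled-coordinate representation discussed above follows; it stores the horizontal and vertical crack differences of a single image in one array whose even/odd index parity distinguishes pixels, horizontal cracks, and vertical cracks.

import numpy as np

def crack_edges(a):
    # Pixel (i, j) maps to (2i, 2j); the crack between (i, j) and (i, j+1)
    # maps to (2i, 2j+1) and the crack between (i, j) and (i+1, j) to
    # (2i+1, 2j).  Positions with two odd or two even coordinates stay zero.
    a = np.asarray(a, dtype=float)
    rows, cols = a.shape
    c = np.zeros((2 * rows - 1, 2 * cols - 1))
    c[0::2, 1::2] = a[:, :-1] - a[:, 1:]   # horizontal crack differences
    c[1::2, 0::2] = a[:-1, :] - a[1:, :]   # vertical crack differences
    return c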

3.13. Local Edge Detection in Three-Dimensional Images

This technique detects surface elements in three-dimensional data. It consists of approximating the surface normal vector at a point and thresholding the magnitude of the normal vector. In three-dimensional data, boundaries of objects are surfaces and the image gradient is the local surface normal [11, 19, 20].

For w = (x, y, z), let , and define g1(w) = x/||w||, g2(w) = y/||w||, and g3(w) = z/||w||.

Suppose a denotes the source image and denotes a three-dimensional m × m × m neighborhood about the point w0 = (x0, y0, z0). The components of the surface normal vector of the point w0 are given by

Thus, the surface normal vector of w0 is

The edge image e for a given threshold T0 is given by

Image Algebra Formulation

Let be the source image. The edge image e is defined as

where = gi(w - w0) if w ∈ M(w0). For example, consider M(w0) as a 3 × 3 × 3 domain. The figure below (Figure 3.13.1) shows the domain of the gi's.

Figure 3.13.1  Illustration of a three-dimensional 3 × 3 × 3 neighborhood.

Fixing z to have value z0, we obtain

Fixing z to have value z0 + 1, we obtain

Fixing z to have value z0 - 1, we obtain

Comments and Observations

The derivation uses a continuous model and the method minimizes the mean-square error with an ideal step edge.
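
A direct sketch of the surface-normal computation is given below; it weights each voxel in the m × m × m neighborhood by gi(w - w0) and thresholds the magnitude of the resulting normal vector. The mapping of the coordinates (x, y, z) onto the array axes is an assumption.

import numpy as np

def surface_normal_edges(a, m=3, threshold=1.0):
    # a is a 3-D array indexed (z, y, x); borders wrap around via np.roll.
    a = np.asarray(a, dtype=float)
    r = m // 2
    normal = np.zeros(a.shape + (3,))
    for dz in range(-r, r + 1):
        for dy in range(-r, r + 1):
            for dx in range(-r, r + 1):
                length = np.sqrt(dx * dx + dy * dy + dz * dz)
                if length == 0.0:
                    continue   # the weight is taken as zero at the center voxel
                shifted = np.roll(a, shift=(-dz, -dy, -dx), axis=(0, 1, 2))
                for i, c in enumerate((dx, dy, dz)):
                    normal[..., i] += (c / length) * shifted
    magnitude = np.sqrt((normal ** 2).sum(axis=-1))
    return magnitude > threshold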

3.14. Hierarchical Edge Detection

In this edge detection scheme, a pyramid structure is developed and used to detect edges. The idea is to consolidate neighborhoods of certain pixels into one pixel with a value obtained by measuring some local features, thus forming a new image. The new image will have lower resolution and fewer pixels. This procedure is iterated to obtain a pyramid of images. To detect an edge, one applies a local edge operator to an intermediate resolution image. If the magnitude of the response exceeds a given threshold at a pixel then one proceeds to the next higher resolution level and applies the edge operator to all the corresponding pixels in that level. The procedure is repeated until the source image resolution level is reached [21, 11].

For k ∈ {0, 1, …, l - 1}, let denote the images in a pyramid, where Xk = {(i, j) : 0 ≤ i, j ≤ 2k - 1}.

The source image is . Given al - 1, one can construct a0, …, al - 2 in the following manner:

The algorithm starts at some intermediate level, k, in the pyramid. This level is typically chosen to be .

The image containing boundary pixels at level k is formed by

bk(i, j) = |ak(i, j - 1) - ak(i, j + 1)| + |ak(i + 1, j) - ak(i - 1, j)|

for all (i, j) ∈ Xk.

If, for some given threshold T, bk(i, j) > T, then apply the boundary pixel formation technique at points (2i, 2j), (2i + 1, 2j), (2i, 2j + 1), (2i + 1, 2j + 1) in level k + 1. All other points at level k + 1 are assigned value 0. Boundary formation is repeated until level l - 1 is reached.

Image Algebra Formulation

For any non-negative integer k, let Xk denote the set {(i, j) : 0 ≤ i, j ≤ 2k - 1}. Given a source image , construct a pyramid of images for k ∈ {0, 1, …, l - 2} as follows:

where

For , construct the boundary images bk,…, bl - 1 using the statement:

bk := abs(ak • t1) + abs(ak • t2),

where

To construct the boundary images at level m where k < m ≤ l - 1, first define

Next let be defined by

g(i, j) = {(2i, 2j), (2i + 1, 2j), (2i, 2j + 1), (2i + 1, 2j + 1)}

and be defined by

The effect of this mapping is shown in Figure 3.14.1.

Figure 3.14.1  Illustration of the effect of the pyramidal map .

The boundary image bm is computed by

where 0 denotes the zero image on the point set Xm.

Alternate Image Algebra Formulation

The above image algebra formulation gives rise to a massively parallel process operating at each point in the pyramid. One can restrict the operator to only those points in levels m > k where the boundary threshold is satisfied, as described below.

Let g1(i, j) = {(i, j), (i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)}. This maps a point (i, j) into its von Neumann neighborhood. Let be defined by

Note that in contrast to the functions g and , g1 and are functions defined on the same pyramid level. We now compute bm by

Comments and Observations

This technique ignores edges that appear at high-resolution levels but do not appear at lower resolutions.

The method of obtaining intermediate-resolution images could easily blur or wipe out edges that are present at high resolutions.
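
The sketch below follows the scheme above for a square image whose side is a power of two. The pyramid is built by 2 × 2 averaging, which is an assumption (the handbook's construction of a0, …, al-2 is not reproduced); the boundary formula and the propagation to the four child pixels follow the description given earlier.

import numpy as np

def hierarchical_edges(a, T, levels_up=2):
    # Requires a square image with power-of-two side length.
    a = np.asarray(a, dtype=float)
    pyramid = [a]
    while pyramid[-1].shape[0] > 1:
        p = pyramid[-1]
        pyramid.append(0.25 * (p[0::2, 0::2] + p[0::2, 1::2] +
                               p[1::2, 0::2] + p[1::2, 1::2]))
    pyramid.reverse()                       # pyramid[k] has side 2**k
    l = len(pyramid)
    k = max(l - 1 - levels_up, 1)           # intermediate starting level

    def boundary(img, active):
        b = np.zeros_like(img)
        for i, j in np.argwhere(active[1:-1, 1:-1]) + 1:
            b[i, j] = (abs(img[i, j - 1] - img[i, j + 1]) +
                       abs(img[i + 1, j] - img[i - 1, j]))
        return b

    b = boundary(pyramid[k], np.ones(pyramid[k].shape, dtype=bool))
    for m in range(k + 1, l):
        children = np.kron((b > T).astype(int), np.ones((2, 2), dtype=int)) > 0
        b = boundary(pyramid[m], children)
    return b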

3.15. Edge Detection Using K-Forms

The K-forms technique encodes the local intensity difference information of an image. The codes are numbers expressed either in ternary or decimal form. Local intensity differences are calculated and are labeled 0, 1, or 2 depending on the value of the difference. The K-form is a linear function of these labels, where k denotes the neighborhood size. The specific formulation, first described in Kaced [22], is as follows.

Let a0 be a pixel value and let a1, …, a8 be the pixel values of the 8-neighbors of a0 represented pictorially as

Set

el = al - a0

for all l ∈ {1, …, 8} and for some positive threshold number T define p : as

Some common K-form neighborhoods are pictured in Figure 3.15.1. We use the notation

Figure 3.15.1  Common K-forms in a 3 × 3 window.

to denote the K-form with representation in base b, having a neighborhood containing n pixel neighbors of the reference pixel, and having orientation characterized by the roman letter c. Thus the horizontal decimal 2-form shown in Figure 3.15.1 is denoted , having base 10, 2 neighbors, and horizontal shape. Its value is given by

The value of is a single-digit decimal number encoding the topographical characteristics of the point in question. Figure 3.15.2 graphically represents each of the possible values for the horizontal 2-form, showing a pixel and the orientation of the surface to the left and right of that pixel. (The pixel is represented by a large dot, and the surface orientation on either side is shown by the orientation of the line segments to the left and right of the dot.)

Figure 3.15.2  The nine horizontal 2-forms and their meaning.

The vertical decimal 2-form is defined to be

The ternary K-forms assign a ternary number to each topographical configuration. The ternary horizontal 2-form is given by

and the ternary normal 4-form is

Note that the multiplicative constants are represented in base 3; thus these expressions form the ternary number yielded by concatenating the results of applying p.

The values of other forms are calculated in a similar manner.

Three different ways of using the decimal 2-forms to construct a binary edge image are the following:

(a)  An edge in relief is associated with any pixel whose K-form value is 0, 1, or 3.

(b)  An edge in depth is associated with any pixel whose K-form value is 5, 7, or 8.

(c)  An edge by gradient threshold is associated with a pixel whose K-form is not 0, 4, or 8.

Image Algebra Formulation

Let be the source image. We can compute images for l ∈ {1, …, 8}, representing the fundamental edge images from which the K-forms are computed, as follows:

where the are defined as shown in Figure 3.15.3.

Figure 3.15.3  K-form fundamental edge templates.

Thus, for example, t(1) is defined by

Let the function p be defined as in the mathematical formulation above. The decimal horizontal 2-form is defined by

Other K-forms are computed in a similar manner.

Edge in relief is computed as

Edge in depth is computed as

Edge in threshold is computed as

Comments and Observations

The K-forms technique captures the qualitative notion of topographical changes in an image surface.

The use of K-forms requires more computation than gradient formation to yield the same information, as multiple forms must be computed to extract edge directions.
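
The sketch below computes decimal horizontal 2-form values and the three binary edge images. The label function p and the digit ordering (3·p(eleft) + p(eright)) are assumptions, chosen so that the values 0, 4, and 8 correspond to configurations whose two labels agree (no net gradient), consistent with the classification rules above.

import numpy as np

def decimal_horizontal_2form(a, T):
    # p labels a difference as 0 (below -T), 1 (within the threshold band),
    # or 2 (above T); both the labeling and the digit order are assumptions.
    a = np.asarray(a, dtype=float)
    def p(e):
        return np.where(e > T, 2, np.where(e < -T, 0, 1))
    e_left = np.zeros_like(a)
    e_right = np.zeros_like(a)
    e_left[:, 1:-1] = a[:, :-2] - a[:, 1:-1]
    e_right[:, 1:-1] = a[:, 2:] - a[:, 1:-1]
    return 3 * p(e_left) + p(e_right)

def kform_edge_images(a, T):
    k = decimal_horizontal_2form(a, T)
    edge_in_relief = np.isin(k, (0, 1, 3))
    edge_in_depth = np.isin(k, (5, 7, 8))
    edge_by_gradient = ~np.isin(k, (0, 4, 8))
    return edge_in_relief, edge_in_depth, edge_by_gradient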

3.16. Hueckel Edge Operator

The Hueckel edge detection method is based on fitting image data to an ideal two-dimensional edge model [1, 23, 24]. In the one-dimensional case, the image a is fitted to a step function

If the fit is sufficiently accurate at a given location, an edge is assumed to exist with the same parameters as the ideal edge model. (See Figure 3.16.1.) An edge is assumed if

Figure 3.16.1  One-dimensional edge fitting.

the mean-square error

is below some threshold value.

In the two-dimensional formulation the ideal step edge is defined as

where ρ represents the distance from the center of a test disk D of radius R to the ideal step edge (ρ ≤ R), and θ denotes the angle of the normal to the edge as shown in Figure 3.16.2.

Figure 3.16.2  Two-dimensional edge fitting.

The edge fitting error is

In Hueckel's method, both image data and the ideal edge model are expressed as vectors in the vector space of continuous functions over the unit disk D = {(x, y) : x2 + y2 ≤ 1}. A basis for this vector space (also known as a Hilbert space) is any complete sequence of orthonormalized continuous functions {hi : i = 0, 1, …} with domain D. For such a basis, a and s have vector form a = (a1, a2, …) and s = (s1, s2, …), where

and

For application purposes, only a finite number of basis functions can be used. Hueckel truncates the infinite Hilbert basis to only eight functions, h0, h1, …, h7. This provides for increased computational efficiency. Furthermore, his sub-basis was chosen so as to have a lowpass filtering effect for inherent noise smoothing. Although Hueckel provides an explicit formulation for his eight basis functions, their actual derivation has never been published!

Having expressed the signal a and edge model s in terms of Hilbert vectors, minimization of the mean-square error of Equation 3.16.1 can be shown to be equivalent to minimization of . Hueckel has performed this minimization by using some simplifying approximations. Also, although a is expressed in terms of vector components a0, …, a7, s(x, y) is defined parametrically in terms of the parameters (b, h, ρ, θ).

The exact discrete formulation is given below.

Definition of the eight basis functions.  Let D(x0, y0) be a disk with center (x0, y0) and radius R. For each (x, y) ∈ D(x0, y0), define

and

Then

Note that when , then Q(r) = 0. Thus, on the boundary of D each of the functions hi is 0. In fact, the functions hi intersect the disk D as shown in the following figure:

Figure 3.16.3  The intersection D ∩ graph(hi).

In his algorithm, however, Hueckel uses the functions Hi (i = 0, 1, …, 7) instead of hi (i = 0, 1, …, 7) in order to increase computational efficiency. This is allowable since the Hi's are also linearly independent and span the same eight-dimensional subspace.

As a final remark, we note that the reason for normalizing r (i.e., instead of forcing 0 ≤ r ≤ 1) is to scale the disk D(x0, y0) back to the size of the unit disk, so that a restricted to D(x0, y0) can be expressed as a vector in the Hilbert space of continuous functions on D.

Hueckel's algorithm.  The algorithm proceeds as follows: Choose a digital disk of radius R (R is usually 4, 5, 6, 7, or 8, depending on the application) and a finite number of directions θj (j = 1, …, m), usually θ = 0°, 45°, 90°, 135°.

Step 1.  For i = 0, 1, …, 7, compute

Step 2.  For j = 0, 1, …, m, compute e0(θj) = a2cosθj + a3sinθj;

Step 3.  For j = 0, 1, …, m, compute e1(θj) = a4cosθj + a5sinθj;

Step 4.  For j = 0, 1, …, m, compute e2(θj) = a1 + a6(cos2θj - sin2θj) + 2a7sinθjcosθj;

Step 5.  For j = 0, 1, …, m, compute

Step 6.  For j = 0, 1, …, m, compute Λ(θj) = e0(θj) + u(θj);

Step 7.  Find k such that |Λ(θk)| ≥ |Λ(θj)|, j = 0, 1, …, m;

Step 8.  Calculate

Step 9.  Set

and define

Remarks.  The edge image e has either pixel value zero or the parameter (b, h, ρ, θk). Thus, with each edge pixel we obtain parameters describing the nature of the edge. These parameters could be useful in further analysis of the edges. If only the existence of an edge pixel is desired, then the algorithm can be shortened by dropping STEP 8 and defining

Note that in this case we also need only compute a1, …, a7 since a0 is only used in the calculation of b.

It can be shown that the constant K in STEP 9 is the cosine of the angle between the vectors a = (a0, a1, …, a7) and s = (s0, s1, …, s7). Thus, if the two vectors match exactly, then the angle is zero and K = 1; i.e., a is already an ideal edge. If K < 0.9, then the fit is not a good fit in the mean-square sense. For this reason the threshold T is usually chosen in the range 0.9 ≤ T < 1. In addition to thresholding with T, a further test is often made to determine if the edge constant factor h is greater than a threshold factor.

As a final observation we note that K would have remained the same whether the basis Hi or hi had been used to compute the components ai.

Image Algebra Formulation

In the algorithm, the user specifies the radius R of the disk over which the ideal step edge is to be fitted to the image data, a finite number of directions θ0, θ1, …, θm, and a threshold T.

Let denote the source image. Define a parameterized template , where i = 0, 1,…, 7, by

Here y = (y1, y2) corresponds to the center of the disk D(y) of radius R and x = (x1, x2). For j in {0, 1, …, m}, define by fj(r) = (r, |r|, θj).

The algorithm now proceeds as follows:

Remarks.  Note that e is a vector valued image with four components. If we only seek a Boolean edge image, then the formulation can be greatly simplified. The first two loops remain the same, and after the second loop the algorithm is modified as follows:

Furthermore, one convolution can be saved as a0 need not be computed.

Comments and Observations

Experimental results indicate that the Hueckel operator performs well in noisy and highly textured environments. The operator not only provides edge strength, but also information as to the height of the step edge, orientation, and distance from the edge pixel.

The Hueckel operator is highly complex and computationally very intensive. The complexity renders it difficult to analyze the results theoretically.

3.17. Divide-and-Conquer Boundary Detection

Divide-and-conquer boundary detection is used to find a boundary between two known edge points. If two edge points are known to be on a boundary, then one searches along the perpendiculars of the line segment joining the two points, looking for an edge point. The method is recursively applied to the resulting two pairs of points. The result is an ordered list of points approximating the boundary between the two given points.

The following method was described in [11]. Suppose a ∈ {0, 1}X is an edge image defined on a point set X, p and q denote two edge points, and L(p, q) denotes the line segment joining them. Given D > 0, define a rectangular point set S of size 2D × ||p - q|| as shown in Figure 3.17.1.

Figure 3.17.1  Divide-and-conquer region of search.

Let Y be the set of all the edge points in the rectangular point set S. Next, choose point r from Y such that the distance from r to L(p, q) is maximal. Apply the above procedure to the pairs (p, r) and (r, q).

Image Algebra Formulation

Let a ∈ {0, 1}X be an edge image, and p = (p1, p2), q = (q1, q2) ∈ X be the two edge points of a. Let L(x, y) denote the line segment connecting points x and y, and let d(x, L) denote the distance between point x and line L.

Find a rectangular point set S with width 2D about the line segment connecting p and q. Assume without loss of generality that p1 ≤ q1. Let . Let points s, t, u, v be defined as follows:

Figure 3.17.2 shows the relationships of s, t, u, v, and S to p and q.

Let f(j, x, y), where x = (x1, x2) and y = (y1, y2), denote the first coordinate of the closest point to x on L(x, y) having second coordinate j, that is,

Note that f is partial in that it is defined only if there is a point on L(x, y) with second coordinate j.

Furthermore, let F(j, w, x, y, z) denote the set ; that is, the set of points with second coordinate j bounded by L(w, x) and L(y, z).

Figure 3.17.2  Variables characterizing the region of search.

We can then compute the set S containing all integral points in the rectangle described by corner points s, t, u, and v as follows:

Let . If card(Y) = 0, the line segment joining p and q defines the boundary between p and q, and our work is finished.

If card(Y) > 0, define image by b(x) = d(x, L(p, q)). Let r = choice(domain ). Thus, as shown in Figure 3.17.3, r will be an edge point in set S farthest from the line segment L(p, q). The algorithm can be repeated with points p and r, and with points r and q.

Figure 3.17.3  Choice of edge point r.

Comments and Observations

This technique is useful in the case that a low curvature boundary is known to exist between edge elements and the noise levels in the image are low. It could be used for filling in gaps left by an edge detection-thresholding-thinning operation.

Difficulties arise in the use of this technique when there is more than one candidate point for the new edge point, or when these points do not lie on the boundary of the object being segmented. In our presentation of the algorithm, we make a non-deterministic choice of the new edge point if there are several candidates. We assume that the edge point selection method, which is used as a preprocessing step to boundary detection, is robust and accurate, and that all edge points fall on the boundary of the object of interest. In practice, this does not usually occur; thus the sequence of boundary points detected by this technique will often be inaccurate.
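
A recursive sketch of the procedure is given below. For simplicity the search region is taken to be all edge points within perpendicular distance D of the segment L(p, q) and between the perpendiculars at p and q, rather than the explicit corner-point construction of S; the termination tolerances are likewise assumptions.

import numpy as np

def dist_to_line(x, p, q):
    # Perpendicular distance from point x to the line through p and q.
    p, q, x = (np.asarray(v, dtype=float) for v in (p, q, x))
    d = q - p
    return abs(d[0] * (x[1] - p[1]) - d[1] * (x[0] - p[0])) / np.hypot(d[0], d[1])

def boundary_between(edges, p, q, D, min_len=1.5):
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if np.hypot(*(q - p)) <= min_len:
        return [tuple(np.round(p).astype(int)), tuple(np.round(q).astype(int))]
    candidates = []
    for point in zip(*np.nonzero(edges)):
        x = np.asarray(point, dtype=float)
        t = np.dot(x - p, q - p) / np.dot(q - p, q - p)
        if 0.0 < t < 1.0 and dist_to_line(x, p, q) <= D:
            candidates.append(point)
    if not candidates:
        return [tuple(np.round(p).astype(int)), tuple(np.round(q).astype(int))]
    r = max(candidates, key=lambda pt: dist_to_line(pt, p, q))
    if dist_to_line(r, p, q) < 1.0:        # r essentially lies on L(p, q); stop
        return [tuple(np.round(p).astype(int)), tuple(np.round(q).astype(int))]
    return boundary_between(edges, p, r, D, min_len)[:-1] + \
           boundary_between(edges, r, q, D, min_len)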

3.18. Edge Following as Dynamic Programming

The purpose of this particular technique is to extract one high-intensity line or curve of fixed length and locally low curvature from an image using dynamic programming. Dynamic programming in this particular case involves the definition and evaluation of a figure-of-merit (FOM) function that embodies a notion of "best curvature." The evaluation of the FOM function is accomplished by using the multistage optimization process described below (see also Ballard and Brown [11]).

Let be a finite point set and suppose g is a real-valued function of n discrete variables with each variable a point of X. That is,

The objective of the optimization process is to find an n-tuple ∈ Xn such that

It can be shown that if g can be expressed as a sum of functions

then a multistage optimization procedure using the following recursion process can be applied:

1.  Define a sequence of functions recursively by setting

for k = 1,…, n - 1.

2.  For each y ∈ X, determine the point mk+1(y) = xk ∈ X such that

that is, find the point xk for which the maximum of the function

is achieved.

The n-tuple can now be computed by using the recursion formula

3.  Find ∈ X such that .

4.  Set for k = n - 1, …, 1.

The input to this algorithm is an edge enhanced image with each pixel location x having edge magnitude e(x) ≥ 0 and edge direction . Thus, only the eight octagonal edge directions 0, 1, …, 7 are allowed (see Figure 3.18.2).

In addition, an a priori determined integer n, which represents the (desired) length of the curve, must be supplied by the user as input.

For two pixel locations x, y of the image, the directional difference q(x, y) is defined by

The directional difference q is a parameter of the FOM function g which is defined as

subject to the constraints

(i)  e(xk) > 0 (or e(xk) > k0 for some threshold k0),

(ii)   , and

(iii)  q(xk, xk+1) ≤ 1.

The solution obtained through the multistage optimization process will be the optimal curve with respect to the FOM function g.

Note that constraint (i) restricts the search to actual edge pixels. Constraint (ii) further restricts the search by requiring the point xk to be an 8-neighbor of the point xk+1. Constraint (iii), the low-curvature constraint, narrows the search for adjacent points even further by restricting the search to three points for each point on the curve. For example, given xk+1 with d(xk+1) = 0, then the point xk preceding xk+1 can only have directional values 0, 1, or 7. Thus, xk can only be one of the following 8-neighbors of xk+1 shown in Figure 3.18.1.

Figure 3.18.1  The three possible predecessors of xk+1.

In this example, the octagonal directions are assumed to be oriented as shown in Figure 3.18.2.

Figure 3.18.2  The eight possible directions d(x).

In the multistage optimization process discussed earlier, the FOM function g had no constraints. In order to incorporate the constraints (i), (ii), and (iii) into the process, we define two step functions s1 and s2 by

and

Redefining g by

where

for k = 1, …, n - 1, results in a function which has constraints (i), (ii), and (iii) incorporated into its definitionand has the required format for the optimization process. The algorithm reduces now to applying steps 1

through 4 of the optimization process to the function g. The output of this processrepresents the optimal curve of length n.
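
A compact sketch of the multistage optimization itself is given below. It assumes only that the FOM splits into a sum of per-stage terms g_pair(xk, xk+1); the constraints (i) through (iii) can be imposed by having g_pair return minus infinity for forbidden pairs. Restricting the inner loop to the 8-neighbors of y (constraint (ii)) is what makes the search tractable in practice.

import numpy as np

def multistage_optimize(points, g_pair, n):
    # f[y] holds the best merit of a partial curve ending at y; back[k][y]
    # records the predecessor chosen at stage k (the tables m_{k+1}).
    f = {y: 0.0 for y in points}
    back = []
    for _ in range(n - 1):
        new_f, choice = {}, {}
        for y in points:
            best_val, best_x = -np.inf, None
            for x in points:
                v = f[x] + g_pair(x, y)
                if v > best_val:
                    best_val, best_x = v, x
            new_f[y], choice[y] = best_val, best_x
        f, back = new_f, back + [choice]
    end = max(f, key=f.get)                 # step 3: best final point
    curve = [end]
    for choice in reversed(back):           # step 4: trace predecessors back
        curve.append(choice[curve[-1]])
    return list(reversed(curve))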

Image Algebra Formulation

The input consists of an edge magnitude image e, the corresponding direction image d, and a positive integer n representing the desired curve length. The edge magnitude/direction image (e, d) could be provided by the directional edge detection algorithm of Section 3.10.

Suppose . Define a parameterized template by

and a function by

The multistage edge following process can then be expressed as

Note that by restricting f1 to one may further reduce the search area for the optimal curve .

Comments and Observations

The algorithm provides the optimal curve of length n that satisfies the FOM function. The method is very flexible and general. For example, the length n of the curve could be incorporated as an additional constraint in the FOM function, or other desirable constraints could be added.

The method has several deficiencies. Computation time is much greater than that of many other local boundary following techniques. Storage requirements (the tables for mk+1) are high. The length n must somehow be predetermined. Including the length as an additional constraint in the FOM function drastically increases storage and computation requirements.

3.19. References

1  W. Pratt, Digital Image Processing. New York: John Wiley, 1978.

2  R. Gonzalez and P. Wintz, Digital Image Processing. Reading, MA: Addison-Wesley, second ed., 1987.

3  A. Rosenfeld and A. Kak, Digital Image Processing. New York: Academic Press, 1976.

4  R. Haralick and L. Shapiro, Computer and Robot Vision, vol. I. New York: Addison-Wesley, 1992.

5  L. G. Roberts, "Machine perception of three-dimensional solids," in Optical and Electro-Optical Information Processing (J. Tippett, ed.), pp. 159-197, Cambridge, MA: MIT Press, 1965.

6  M. Lineberry, "Image segmentation by edge tracing," Applications of Digital Image Processing IV, vol. 359, 1982.

7  J. Prewitt, "Object enhancement and extraction," in Picture Processing and Psychopictorics (B. Lipkin and A. Rosenfeld, eds.), New York: Academic Press, 1970.

8  A. Rosenfeld and A. Kak, Digital Picture Processing. New York, NY: Academic Press, 2nd ed., 1982.

9  I. Sobel, Camera Models and Machine Perception. PhD thesis, Stanford University, Stanford, CA, 1970.

10  R. Duda and P. Hart, Pattern Classification and Scene Analysis. New York, NY: John Wiley & Sons, 1973.

11  D. Ballard and C. Brown, Computer Vision. Englewood Cliffs, NJ: Prentice-Hall, 1982.

12  W. Frei and C. Chen, "Fast boundary detection: A generalization and a new algorithm," IEEE Transactions on Electronic Computers, vol. C-26, Oct. 1977.

13  R. Gonzalez and R. Woods, Digital Image Processing. Reading, MA: Addison-Wesley, 1992.

14  R. Kirsch, "Computer determination of the constituent structure of biological images," Computers and Biomedical Research, vol. 4, pp. 315-328, June 1971.

15  R. Nevatia and K. Babu, "Linear feature extraction and description," Computer Graphics and Image Processing, vol. 13, 1980.

16  G. Ritter, J. Wilson, and J. Davidson, "Image algebra: An overview," Computer Vision, Graphics, and Image Processing, vol. 49, pp. 297-331, Mar. 1990.

17  A. Rosenfeld, "A nonlinear edge detection technique," Proceedings of the IEEE, Letters, vol. 58, May 1970.

18  J. M. Prager, "Extracting and labeling boundary segments in natural scenes," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 1, Jan. 1980.

19  S. Zucker and R. Hummel, "A three-dimensional edge operator," Tech. Rep. TR-79-10, McGill University, 1979.

20  H. Liu, "Two- and three-dimensional boundary detection," Computer Graphics and Image Processing, vol. 6, no. 2, 1977.

21  S. Tanimoto and T. Pavlidis, "A hierarchical data structure for picture processing," Computer Graphics and Image Processing, vol. 4, no. 2, pp. 104-119, 1975.

22  A. Kaced, "The K-forms: A new technique and its applications in digital image processing," in IEEE Proceedings 5th International Conference on Pattern Recognition, 1980.

23  M. Hueckel, "An operator which locates edges in digitized pictures," Journal of the ACM, vol. 18, pp. 113-125, Jan. 1971.

24  M. Hueckel, "A local visual operator which recognizes edges and lines," Journal of the ACM, vol. 20, pp. 634-647, Oct. 1973.

Chapter 4—Thresholding Techniques

4.1. Introduction

This chapter describes several standard thresholding techniques. Thresholding is one of the simplest and most widely used image segmentation techniques. The goal of thresholding is to segment an image into regions of interest and to remove all other regions deemed inessential. The simplest thresholding methods use a single threshold in order to isolate objects of interest. In many cases, however, no single threshold provides a good segmentation result over an entire image. In such cases variable and multilevel threshold techniques based on various statistical measures are used. The material presented in this chapter provides some insight into different strategies of threshold selection.

4.2. Global Thresholding

The global thresholding technique is used to isolate objects of interest having values different from the background. Each pixel is classified as either belonging to an object of interest or to the background. This is accomplished by assigning to a pixel the value 1 if the source image value is within a given threshold range and 0 otherwise [1]. A sampling of algorithms used to determine threshold levels can be found in Sections 4.6, 4.7, and 4.8.

The global thresholding procedure is straightforward. Let be the source image and [h, k] be a given threshold range. The thresholded image b ∈ {0, 1}X is given by

for all x ∈ X.

Two special cases of this methodology are concerned with the isolation of uniformly high values or uniformly low values. In the first case the thresholded image b will be given by

while in the second case

where k denotes a suitable threshold value.
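
A minimal sketch of the three cases follows; pixels inside the range (or above or below the single threshold) are mapped to 1 and all others to 0.

import numpy as np

def threshold_range(a, h, k):
    # 1 where h <= a(x) <= k, 0 elsewhere
    return ((a >= h) & (a <= k)).astype(np.uint8)

def threshold_high(a, k):
    return (a >= k).astype(np.uint8)

def threshold_low(a, k):
    return (a <= k).astype(np.uint8)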

Image Algebra Formulation

Let be the source image and [h, k] be a given threshold range. The thresholded result image b ∈ {0, 1}X can be computed using the characteristic function

The characteristic functions

and

can be used to isolate objects of high values and low values, respectively.

Comments and Observations

Global thresholding is effective in isolating objects of uniform value placed against a background of different values. Practical problems occur when the background is nonuniform or when the object and background assume a broad range of values. Note also that

4.3. Semithresholding

Semithresholding is a useful variation of global thresholding [1]. Pixels whose values lie within a given threshold range retain their original values. Pixels with values lying outside of the threshold range are set to 0.

For a source image and a threshold range [h, k], the semithresholded image is given by

for all x ∈ X.

Regions of high values can be isolated using

and regions of low values can be isolated using

Image Algebra Formulation

The image algebra formulation for the semithresholded image over the range of values [h, k] is

The images semithresholded over the unbounded sets [k, ∞) and (-∞, k] are given by

and

respectively.

Alternate Image Algebra Formulation

The semithresholded image can also be derived by restricting the source image to those points whose values lie in the threshold range, and then extending the restriction to X with value 0. The image algebra formulation for this method of semithresholding is

If appropriate, instead of constructing a result over X, one can construct the subimage c of a containing only those pixels lying in the threshold range, that is,

Comments and Observations

Figures 4.3.2 and 4.3.3 below show the thresholded and semithresholded images of the original image of the Thunderbirds in Figure 4.3.1.

Figure 4.3.1  Image of Thunderbirds a.

Figure 4.3.2  Thresholded image of Thunderbirds b := χ≤190(a).

Figure 4.3.3  Semithresholded image of Thunderbirds b := a · χ≤190(a).

4.4. Multilevel Thresholding

Global thresholding and semithresholding techniques (Sections 4.2 and 4.3) segment an image based on the assumption that the image contains only two types of regions. Certainly, an image may contain more than two types of regions. Multilevel thresholding is an extension of the two earlier thresholding techniques that allows for segmentation of pixels into multiple classes [1].

For example, if the image histogram contains three peaks, then it is possible to segment the image using two thresholds. These thresholds divide the value set into three nonoverlapping ranges, each of which can be associated with a unique value in the resulting image. Methods for determining such threshold values are discussed in Sections 4.6, 4.7, and 4.8.

Let be the source image, and let k1, ..., kn be threshold values satisfying k1 > k2 > ... > kn. These values partition into n + 1 intervals which are associated with values v1, ..., vn+1 in the thresholded result image. A typical sequence of result values might be . The thresholded image is defined by

Image Algebra Formulation

Define the function by

The thresholded image can be computed by composing f with a, i.e.,
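
A small sketch of multilevel thresholding is shown below. The assignment of v1 to the topmost interval (values above k1) down to vn+1 for values at or below kn is an assumption about the missing formula; any other assignment only permutes the values list.

import numpy as np

def multilevel_threshold(a, thresholds, values):
    # thresholds k1 > k2 > ... > kn; values v1, ..., v(n+1) for the n+1
    # intervals, from the topmost interval down.
    assert len(values) == len(thresholds) + 1
    a = np.asarray(a)
    b = np.full(a.shape, values[-1], dtype=float)    # lowest interval
    for k, v in zip(sorted(thresholds), reversed(values[:-1])):
        b[a > k] = v
    return b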

4.5. Variable Thresholding

No single threshold level may produce good segmentation results over an entire image. Variable thresholding allows different threshold levels to be applied to different regions of an image.

For example, objects may contrast with the background throughout an image, but due to uneven illumination, objects and background may have lower values on one side of the image than on the other. In such instances, the image may be subdivided into smaller regions. Thresholds are then established for each region and global (or other) thresholding is applied to each subimage corresponding to a region [1].

The exact methodology is as follows. Let be the source image, and let image denote the region threshold value associated with each point in X, that is, d(x) is the threshold value associated with the region in which point x lies. The thresholded image b ∈ {0, 1}X is defined by

Image Algebra Formulation

The thresholded image b can be computed as follows:

Comments and Observations

Variable thresholding is effective for images with locally bimodal histograms. This method will produce the desired results if the objects are relatively small and are not clustered too close together. The subimages should be large enough to ensure that they contain both background and object pixels.

The same problems encountered in global thresholding can occur on a local level in variable thresholding. Thus, if an image has a locally nonuniform background, or large ranges of values in some regions, or if the multimodal histogram does not distinguish object from background, the technique will fail. Also, it is difficult to define the image d without some a priori information.
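
A block-wise sketch of variable thresholding is shown below; the regional threshold d(x) is taken here to be the block mean plus an optional offset, which is an assumption, since the handbook leaves the choice of regional thresholds to methods such as those of Sections 4.6 through 4.8.

import numpy as np

def variable_threshold(a, block=32, offset=0.0):
    # Each block x block region is thresholded at its own level d(x); pixels
    # at or above the regional level are set to 1.
    a = np.asarray(a, dtype=float)
    d = np.zeros_like(a)
    rows, cols = a.shape
    for i in range(0, rows, block):
        for j in range(0, cols, block):
            region = a[i:i + block, j:j + block]
            d[i:i + block, j:j + block] = region.mean() + offset
    return (a >= d).astype(np.uint8)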

4.6. Threshold Selection Using Mean and Standard Deviation

In this section we present the first of three automatic thresholding techniques. The particular threshold derived here is a linear combination, k1μ + k2σ, of the mean and standard deviation of the source image and was proposed by Hamadani [2]. The mean and standard deviation are intrinsic properties of the source image. The weights k1 and k2 are pre-selected, based on image type information, in order to optimize performance.

For a given image , where X is an m × n grid, the mean μ and standard deviation σ of a are given by

and

respectively. The threshold level τ is set at

where the constants k1 and k2 are image type dependent.

Image Algebra Formulation

Let , where . The mean and standard deviation of a are given by the image algebra statements

and

The threshold level is given by

Comments and Observations

For typical low-resolution IR images k1 = k2 = 1 seems to work fairly well for extracting "warm" objects. For higher resolutions k1 = 1 or k1 = 1.5 and k2 = 2 may yield better results.

Figure 4.6.1 shows an original IR image of a jeep (top) and three binary images that resulted from thresholding the original based on its mean and standard deviation for various k1 and k2.

Figure 4.6.1  Thresholding an image based on threshold level τ = k1μ + k2σ for various k1 and k2: (a) k1 = 1, k2 = 1, τ = 145, (b) k1 = 1, k2 = 2, τ = 174, (c) k1 = 1.5, k2 = 1, τ = 203. At the bottom of the figure is the histogram of the original image. The threshold levels corresponding to images (a), (b), and (c) have been marked with arrows.
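
The threshold selection itself reduces to two image statistics, as the short sketch below shows; mapping pixels at or above τ to 1 follows the "warm object" usage above and is otherwise an assumption.

import numpy as np

def hamadani_threshold(a, k1=1.0, k2=1.0):
    # tau = k1 * mean + k2 * standard deviation of the source image
    a = np.asarray(a, dtype=float)
    return k1 * a.mean() + k2 * a.std()

# e.g. binary = (a >= hamadani_threshold(a, 1.0, 2.0)).astype(np.uint8)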

4.7. Threshold Selection by Maximizing Between-Class Variance

In this section we present the method of Otsu [3, 4] for finding threshold parameters to be used in multithresholding schemes. For a given k the method finds thresholds 0 ≤ τ1 < τ2 < ... < τk-1 < l - 1 for partitioning the pixels of into the classes

by maximizing the separability between classes. Between-class variance is used as the measure of separability between classes. Its definition, which follows below, uses histogram information derived from the source image. After threshold values have been determined, they can be used as parameters in the multithresholding algorithm of Section 4.4.

Let and let be the normalized histogram of a (Section 10.10). The pixels of a are to be partitioned into k classes C0, C1, …, Ck-1 by selecting the τi's as stipulated below.

The probabilities of class occurrence Pr(Ci) are given by

where is the 0th-order cumulative moment of the histogram evaluated up to the τith level. The class mean levels are given by

where is the 1st-order cumulative moment of the histogram up to the τith level and

is the total mean level of a.

In order to evaluate the goodness of the thresholds at levels 0 ≤ τ1 < τ2 < ... < τk-1 < l - 1, the between-class variance is used as the discriminant criterion measure of class separability. This between-class variance is defined as

or, equivalently,

Note that is a function of τ1, τ2, …, τk-1 (if we substitute the corresponding expressions involving the τi's). The problem of determining the goodness of the τi's reduces to optimizing , i.e., to finding the maximum of or

To the left in Figure 4.7.1 is the blurred image of three nested squares with pixel values 80, 160, and 240. Its histogram below in Figure 4.7.2 shows the two threshold levels, 118 and 193, that result from maximizing the variance between three classes. The result of thresholding the blurred image at the levels above is seen in the image to the right of Figure 4.7.1.

Figure 4.7.1  Blurred image (left) and the result of thresholding it (right).

Figure 4.7.2  Histogram of blurred image with threshold levels marked.


Image Algebra Formulation

We will illustrate the image algebra formulation of the algorithm for k = 3 classes. It is easily generalized for other k with 1 ≤ k < l − 1.

Let a be the source image, and let its normalized histogram and normalized cumulative histogram be computed as in Sections 10.10 and 10.11, respectively. The 1st-order cumulative moment image u of the normalized histogram is given by

where t is the parameterized template defined by

The image algebra pseudocode for the threshold finding algorithm for k = 3 is given by the following formulation:

When the algorithm terminates, the desired thresholds are τ1 and τ2.
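The book's pseudocode is not reproduced above, so the following NumPy sketch illustrates the same computation for k = 3 by an exhaustive search over threshold pairs. The class convention C0 = [0, τ1), C1 = [τ1, τ2), C2 = [τ2, l − 1] and the helper name are assumptions made here, not taken from the book.

import numpy as np

def otsu_three_class(a, levels=256):
    """Return (tau1, tau2) maximizing the between-class variance for three classes."""
    h, _ = np.histogram(a, bins=levels, range=(0, levels))
    h = h / h.sum()                              # normalized histogram
    g = np.arange(levels)
    mu_T = (g * h).sum()                         # total mean level
    best, best_taus = -1.0, (1, 2)
    for t1 in range(1, levels - 1):
        for t2 in range(t1 + 1, levels):
            var_b = 0.0
            for m in (g < t1, (g >= t1) & (g < t2), g >= t2):
                p = h[m].sum()                   # Pr(Ci)
                if p > 0:
                    mu_i = (g[m] * h[m]).sum() / p
                    var_b += p * (mu_i - mu_T) ** 2
            if var_b > best:
                best, best_taus = var_b, (t1, t2)
    return best_taus

The exhaustive search is quadratic in the number of gray levels; restricting the same loop to a single threshold gives the familiar bi-level case used in the Comments and Observations below.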

Comments and Observations

This algorithm is an unsupervised algorithm. That is, it has the advantageous property of not requiring human intervention. For large k the method may fail unless the classes are extremely well separated.

In the figures below, the directional edge detection algorithm (Section 3.10) has been applied to the source image in Figure 4.7.3. The magnitude image produced by the directional edge detection algorithm is represented by Figure 4.7.4. The directional edge magnitude image has 256 gray levels. Its histogram can be seen in Figure 4.7.6.

Otsu’s algorithm has been applied to find a bi-level threshold τ1 that can be used to clean up Figure 4.7.4. The threshold level is determined to be 105, which lies in the valley between the broad peak to the left of the histogram and the spiked peak at 255. The threshold is marked by an arrow in Figure 4.7.6. The thresholded image can be seen in Figure 4.7.5.

Figure 4.7.3  Source image of a causeway with bridge.

Figure 4.7.4  Magnitude image after applying directional edge detection to Figure 4.7.3.

Figure 4.7.5  Result of thresholding Figure 4.7.4 at level specified by maximizing between-class variance.

Figure 4.7.6  Histogram of edge magnitude image (Figure 4.7.4).

4.8. Threshold Selection Using a Simple Image Statistic

The simple image statistic (SIS) algorithm [5, 4, 6] is an automatic bi-level threshold selection technique. The method is based on simple image statistics that do not require computing a histogram for the image. Let a be the source image over an m × n array X. The SIS algorithm assumes that a is an imperfect representation of an object and its background. The ideal object is composed of pixels with gray level a and the ideal background has gray level b.

Let e(i, j) be the maximum in the absolute sense of the gradient masks s and t (below) applied to a and centered at (i, j) ∈ X.

In [5], it is shown that

Σ_{(i,j) ∈ X} e(i, j)a(i, j) / Σ_{(i,j) ∈ X} e(i, j) = (a + b)/2.

The fraction on the right-hand side of the equality is the midpoint between the gray levels of the object and background. Intuitively, this midpoint is an appealing level for thresholding. Experimentation has shown that reasonably good performance is achieved by thresholding at this midpoint level [6]. Thus, the SIS algorithm sets its threshold τ equal to

τ = Σ_{(i,j) ∈ X} e(i, j)a(i, j) / Σ_{(i,j) ∈ X} e(i, j).

The image to the left in Figure 4.8.1 is that of a square region with pixel value 192 set against a background of pixel value 64 that has been blurred with a smoothing filter. Its histogram (seen in Figure 4.8.2) is bimodal. The threshold value 108 selected based on the SIS method divides the two modes of the histogram. The result of thresholding the blurred image at level 108 is seen in the image to the right of Figure 4.8.1.

Figure 4.8.1  Blurred image (left) and the result of thresholding (right) using the threshold level determined by the SIS method.

Image Algebra Formulation

Let a be the source image, where X is an m × n array. Let s and t be the templates pictured below.

Let e := |a•s| ∨ |a•t|. The SIS threshold is given by the image algebra statement τ := (Σ(e·a))/(Σe).
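A NumPy sketch of the SIS computation follows. The book's masks s and t are shown only pictorially, so the central-difference gradients used below are an assumption; only the gradient-weighted average itself comes from the formulation above.

import numpy as np

def sis_threshold(a):
    """SIS threshold: gradient-weighted mean gray level of the image."""
    a = a.astype(float)
    gx = np.zeros_like(a)
    gy = np.zeros_like(a)
    gx[:, 1:-1] = np.abs(a[:, 2:] - a[:, :-2])   # stand-in for |a•s|
    gy[1:-1, :] = np.abs(a[2:, :] - a[:-2, :])   # stand-in for |a•t|
    e = np.maximum(gx, gy)                       # e = |a•s| ∨ |a•t|
    return (e * a).sum() / e.sum()               # tau = Σ(e·a) / Σe

# Example: b = (a >= sis_threshold(a)).astype(np.uint8)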

Figure 4.8.2  Histogram of blurred image.

Comments and Observations

Figures 4.8.4 and 4.8.5 are the results of applying Otsu’s (Section 4.7) and the SIS algorithms, respectively, to the source image in Figure 4.8.3. The histogram of the source image with the threshold levels marked by arrows is shown in Figure 4.8.6. Note that the maximum between-class variance for Otsu’s threshold occurred at gray level values 170 through 186. Thus, the smallest gray level value at which the maximum between-class variance occurred was chosen as the Otsu threshold level for our illustration.

Figure 4.8.3  Original image.

Figure 4.8.4  Bi-level threshold image using Otsu’s algorithm.

Figure 4.8.5  Bi-level threshold image using the SIS algorithm.

Figure 4.8.6  Histogram of original image with threshold levels marked by arrows.

4.9. References

1  A. Rosenfeld and A. Kak, Digital Picture Processing. New York, NY: Academic Press, 2nd ed., 1982.

2  N. Hamadani, “Automatic target cueing in IR imagery,” Master’s thesis, Air Force Institute of Technology, WPAFB, Ohio, Dec. 1981.

3  N. Otsu, “A threshold selection method from gray-level histograms,” IEEE Transactions on Systems, Man, and Cybernetics, vol. SMC-9, pp. 62-66, Jan. 1979.

4  P. K. Sahoo, S. Soltani, A. C. Wong, and Y. C. Chen, “A survey of thresholding techniques,” Computer Vision, Graphics, and Image Processing, vol. 41, pp. 233-260, 1988.

5  J. Kittler, J. Illingworth, and J. Foglein, “Threshold selection based on a simple image statistic,” Computer Vision, Graphics, and Image Processing, vol. 30, pp. 125-147, 1985.

6  S. U. Lee, S. Y. Chung, and R. H. Park, “A comparative performance study of several global thresholding techniques for segmentation,” Computer Vision, Graphics, and Image Processing, vol. 52, pp. 171-190, 1990.


Chapter 5
Thinning and Skeletonizing

5.1. Introduction

Thick objects in a discrete binary image are often reduced to thinner representations called skeletons, which are similar to stick figures. Most skeletonizing algorithms iteratively erode the contours in a binary image until a thin skeleton or a single pixel remains. These algorithms typically examine the neighborhood of each contour pixel and identify those pixels that can be deleted and those that can be classified as skeletal pixels.

Thinning and skeletonizing algorithms have been used extensively for processing thresholded images, data reduction, pattern recognition, and counting and labeling of connected regions. Another thinning application is edge thinning, which is an essential step in the description of objects where boundary information is vital. Algorithms given in Chapter 3 describe how various transforms and operations applied to digital images yield primitive edge elements. These edge detection operations typically produce a number of undesired artifacts, including parallel edge pixels which result in thick edges. The aim of edge thinning is to remove the inherent edge broadening in the gradient image without destroying the edge continuity of the image.

5.2. Pavlidis Thinning Algorithm

The Pavlidis thinning transform is a simple thinning transform [1]. It provides an excellent illustration for translating set theoretic notation into an image algebra formulation.

Let a ∈ {0, 1}^X denote the source image and let A denote the support of a; i.e., A = {x ∈ X : a(x) = 1}. The inside boundary of A is denoted by IB(A). The set C(A) consists of those points of IB(A) whose only neighbors are in IB(A) or in the complement A′ of A.

The algorithm proceeds by starting with n = 1, setting A0 equal to A, and iterating the statement

until An = An-1.

It is important to note that the algorithm described may result in the thinned region having disconnected components. This situation can be remedied by replacing Eq. 5.2.1 with

where OB(R) is the outside boundary of region R. The trade-off for connectivity is higher computational cost and the possibility that the thinned image may not be reduced as much. Definitions for the various boundaries of a Boolean image and boundary detection algorithms can be found in Section 3.2.

Figure 5.2.1 below shows the disconnected and connected skeletons of the SR71 superimposed over the original image.

Figure 5.2.1  Disconnected and connected Pavlidis skeletons of the SR71.

Image Algebra Formulation

Figure 5.2.2  Neighborhoods used for the Pavlidis thinning algorithm.

Define neighborhoods M0 and M as shown in Figure 5.2.2. The image algebra formulation of the Pavlidis thinning algorithm is now as follows:

When the loop terminates, the thinned image will be contained in image variable a. The correspondences between the images in the code above and the mathematical formulation are

If connected components are desired then the last statement in the loop should be changed to

Comments and Observations

There are two approaches to this thinning algorithm. Note that the mathematical formulation of the algorithm is in terms of the underlying set (or domain) of the image a, while the image algebra formulation is written in terms of image operations. Since image algebra also includes the set theoretic operations of union and intersection, the image algebra formulation could just as well have been expressed in terms of set theoretic notation; i.e., A := domain(a||1), A′ := domain(a||0), and so on. However, deriving the sets IB(A) and C(A) would involve neighborhood or template operations, thus making the algorithm less transparent and more cumbersome. The image-based algorithm is much cleaner.

5.3. Medial Axis Transform (MAT)

Let a ∈ {0, 1}^X, and let A denote the support of a. The medial axis of A is the set consisting of those points x for which there exists a ball of radius r_x, centered at x, that is contained in A and intersects the boundary of A in at least two distinct points. The dotted line (center dot for the circle) of Figure 5.3.1 represents the medial axis of some simple regions.

Figure 5.3.1  Medial axes.

The medial axis transform m is a gray level image defined over A by

The medial axis transform is unique. The original image can be reconstructed from its medial axis transform.

The medial axis transform evolved through Blum’s work on animal vision systems [1-10]. His interest involved how animal vision systems extract geometric shape information. There exists a wide range of applications that finds a minimal representation of an image useful [7]. The medial axis transform is especially useful for image compression since reconstruction of an image from its medial axis transform is possible.

Let Br (x) denote the closed ball of radius r and centered at x. The medial axis of region A is the set

The medial axis transform is the function defined by

The reconstruction of the domain of the original image a in terms of the medial axis transform m is given by

The original image a is then


Image Algebra Formulation

Let a ∈ {0, 1}^X denote the source image. Usually, the neighborhood N is a digital disk of a specified radius. The shape of the digital disk will depend on the distance function used. In our example N is the von Neumann neighborhood shown in Figure 5.3.2 below.

Figure 5.3.2  von Neumann neighborhood.

The following image algebra pseudocode will generate the medial axis transform of a. When the loop terminates, the medial axis transform will be stored in the image variable m.

The result of applying the medial axis transform to the SR71 can be seen in Figure 5.3.3. The medial axis transformation is superimposed over the original image of the SR71, which is represented by dots. Hexadecimal numbering is used for the display of the medial axis transform.

Figure 5.3.3  MAT superimposed over the SR71.

The original image a can be recovered from the medial axis transform m using the image algebra expression

The reconstruction in this case will not be exact. Applying the algorithm to a small square block will show how the reconstruction fails to be exact.
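As a concrete stand-in for the pseudocode referenced above (which is not reproduced here), the following NumPy sketch computes a medial axis transform for the von Neumann (city block) case by repeated erosion: each feature pixel records the number of erosions it survives, and only local maxima of that record are retained. The erosion loop and the local-maximum test are our own choices and are not copied from the book's formulation.

import numpy as np

def erode4(a):
    """Erosion by the von Neumann neighborhood: a pixel survives only if it and
    all four of its 4-neighbors are feature pixels."""
    p = np.pad(a, 1, constant_values=False)
    return (p[1:-1, 1:-1] & p[:-2, 1:-1] & p[2:, 1:-1]
            & p[1:-1, :-2] & p[1:-1, 2:])

def medial_axis_transform(a):
    """City block distances accumulated by repeated erosion, kept at local maxima."""
    cur = a.astype(bool)
    d = np.zeros(cur.shape, dtype=int)
    while cur.any():
        d = d + cur                   # every surviving pixel gains one unit of distance
        cur = erode4(cur)
    p = np.pad(d, 1)
    nmax = np.maximum(np.maximum(p[:-2, 1:-1], p[2:, 1:-1]),
                      np.maximum(p[1:-1, :-2], p[1:-1, 2:]))
    return np.where((d > 0) & (d >= nmax), d, 0)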

Comments and Observations

The medial axis transform of a connected region may not be connected. Preserving connectivity is a desirable property for skeletonizing transforms.

The application of the medial axis on a discretized region may not represent a good approximation of the medial axis on the continuous region.

Different distance functions may be used for the medial axis transform. The choice of distance function is reflected by the template or neighborhood employed in this algorithm.

5.4. Distance Transforms

A distance transform assigns to each feature pixel of a binary image a value equal to its distance to the nearest non-feature pixel. The algorithm may be performed in parallel or sequentially [11]. In either the parallel or sequential case, global distances are propagated using templates that reflect local distances to the target point of the template [9, 11, 12, 13].

A thinned subset of the original image can be derived from the distance transform by extracting the image that consists of the local maxima of the distance transform. This derived subset is called the distance skeleton. The original distance transform, and thus the original binary image, can be reconstructed from the distance skeleton. The distance skeleton can be used as a method of image compression.

Let a be a binary image defined on X with value ∞ for feature pixels and 0 for non-feature pixels. The SR71 of Figure 5.4.1 will serve as our source image for illustration purposes. An asterisk represents a pixel value of ∞. Non-feature pixel values are not displayed.

Figure 5.4.1  Source binary image.

The distance transform of a is a gray level image b such that pixel value b(i, j) is the distance between the pixel location (i, j) and the nearest non-feature pixel. The distance can be measured in terms of Euclidean, city block, or chessboard distance, etc., subject to the application’s requirements. The result of applying the city block distance transform to the image of Figure 5.4.1 can be seen in Figure 5.4.2. Note that hexadecimal numbering is used in the figure.

The distance skeleton c of the distance transform b is the image whose nonzero pixel values consist of local maxima of b. Figure 5.4.3 represents the distance skeleton extracted from Figure 5.4.2 and superimposed over the original SR71 image.

The restoration transform is used to reconstruct the distance transform b from the distance skeleton c.
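Because the book's templates are given only pictorially, here is a hedged NumPy sketch of the three transforms described above for the city block distance: a forward and a backward raster scan compute the distance transform, local maxima give the distance skeleton, and two analogous max-propagation scans restore the transform. The scan structure is the standard one for recursive distance transforms and is not copied from the book's code.

import numpy as np

def cityblock_dt(a):
    """Recursive (two raster scans) city block distance transform of a binary image."""
    m, n = a.shape
    b = np.where(a > 0, m + n, 0).astype(int)        # large value plays the role of infinity
    for i in range(m):                               # forward scanning order
        for j in range(n):
            if b[i, j]:
                up = b[i - 1, j] if i else 0
                left = b[i, j - 1] if j else 0
                b[i, j] = min(b[i, j], min(up, left) + 1)
    for i in range(m - 1, -1, -1):                   # backward scanning order
        for j in range(n - 1, -1, -1):
            if b[i, j]:
                down = b[i + 1, j] if i < m - 1 else 0
                right = b[i, j + 1] if j < n - 1 else 0
                b[i, j] = min(b[i, j], min(down, right) + 1)
    return b

def distance_skeleton(b):
    """Nonzero local maxima of the distance transform."""
    p = np.pad(b, 1)
    nmax = np.maximum(np.maximum(p[:-2, 1:-1], p[2:, 1:-1]),
                      np.maximum(p[1:-1, :-2], p[1:-1, 2:]))
    return np.where((b > 0) & (b >= nmax), b, 0)

def restore_dt(c):
    """Rebuild the distance transform from the skeleton by max-propagation scans."""
    m, n = c.shape
    b = c.astype(int).copy()
    for i in range(m):
        for j in range(n):
            up = b[i - 1, j] if i else 0
            left = b[i, j - 1] if j else 0
            b[i, j] = max(b[i, j], up - 1, left - 1)
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            down = b[i + 1, j] if i < m - 1 else 0
            right = b[i, j + 1] if j < n - 1 else 0
            b[i, j] = max(b[i, j], down - 1, right - 1)
    return b

# The binary image is recovered as restore_dt(distance_skeleton(cityblock_dt(a))) > 0.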

Figure 5.4.2  Distance transform image.

Image Algebra Formulation

Let a ∈ {0, ∞}^X be the source image. Note that the templates, including u, may be defined according to the specific distance measure being used. The templates used for our city block distance example are defined pictorially below in Figure 5.4.4.

Distance transform

The distance transform b of a, computed recursively, is given by

where one template operation is applied in the forward scanning order defined on X and the other in the backward scanning order defined on X. The parallel image algebra formulation for the distance transform is given by

Figure 5.4.3  Distance skeleton.

Figure 5.4.4  Templates used for city block distance image algebra formulation.

Distance skeleton transform

The distance skeleton transform is formulated nonrecursively as

Restoration transform

The restoration of b from c can be done recursively by


5.5. Zhang-Suen Skeletonizing

The Zhang-Suen transform is one of many derivatives of Rutovitz’ thinning algorithm [6, 7, 14, 15, 16]. This class of thinning algorithms repeatedly removes boundary points from a region in a binary image until an irreducible skeleton remains. Deleting or removing a point in this context means to change its pixel value from 1 to 0. Not every boundary point qualifies for deletion. The 8-neighborhood of the boundary point is examined first. Only if the configuration of the 8-neighborhood satisfies certain criteria will the boundary point be removed.

It is the requirements placed on its 8-neighborhood that qualify a point for deletion, together with the order of deletion, that distinguish the various modifications of Rutovitz’ original algorithm. Some of the derivations of the original algorithm are applied sequentially, and some are parallel algorithms. One iteration of a parallel algorithm may consist of several subiterations, targeting different boundary points on each subiteration. It is the order of removal and the configuration of the 8-neighborhood that qualifies a boundary point for deletion that ultimately determine the topological properties of the skeleton that is produced.

The Zhang-Suen skeletonizing transform is a parallel algorithm that reduces regions of a Boolean image to 8-connected skeletons of unit thickness. Each iteration of the algorithm consists of two subiterations. The first subiteration examines a 3 × 3 neighborhood of every southeast boundary point and northwest corner point. The configuration of its 3 × 3 neighborhood will determine whether the point can be deleted without corrupting the ideal skeleton. On the second subiteration a similar process is carried out to select northwest boundary points and southeast corner points that can be removed without corrupting the ideal skeleton. What is meant by corrupting the ideal skeleton and a discussion of the neighborhood configurations that make a contour point eligible for deletion follows next.

Let a ∈ {0, 1}^X be the source image. Selected contour points are removed iteratively from a until a skeleton of unit thickness remains. A contour (boundary) point is a point that has pixel value 1 and has at least one 8-neighbor whose pixel value is 0. In order to preserve 8-connectivity, only contour points that will not disconnect the skeleton are removed.

Each iteration consists of two subiterations. The first subiteration removes southeast boundary points and northwest corner points. The second subiteration removes northwest boundary points and southeast corner points. The conditions that qualify a contour point for deletion are discussed next.

Let p1, p2, …, p8 be the 8-neighbors of p (see Figure 5.5.1 below). Recalling our convention that boldface characters are used to represent points and italics are used for pixel values, let pi = a(pi), i ∈ {1, …, 8}.

Figure 5.5.1  The 8-neighbors of p.

Let B(p) denote the number of nonzero 8-neighbors of p and let A(p) denote the number of zero-to-one transitions in the ordered sequence p1, p2, …, p8, p1. If p is a contour point and its 8-neighbors satisfy the four conditions listed below, then p will be removed on the first subiteration. That is, the new value associated with p will be 0. The conditions for boundary pixel removal are

(a)  2 ≤ B(p) ≤ 6

(b)  A(p) = 1

(c)  p1 · p3 · p5 = 0

(d)  p3 · p5 · p7 = 0

In the example shown in Figure 5.5.2, p is not eligible for deletion during the first subiteration. With this configuration, B(p) = 5 and A(p) = 3.

Figure 5.5.2  Example pixel values of the 8-neighborhood about p.

Condition (a) ensures that endpoints are preserved. Condition (b) prevents the deletion of points of the skeleton that lie between endpoints. Conditions (c) and (d) select southeast boundary points and northwest corner points for the first subiteration.

A contour point will be subject to deletion during the second subiteration provided its 8-neighbors satisfy conditions (a) and (b) above together with conditions (c2) and (d2), given by

(c2)  p1 · p3 · p7 = 0

(d2)  p1 · p5 · p7 = 0

Iteration continues until either subiteration produces no change in the image. Figure 5.5.3 shows the Zhang-Suen skeleton superimposed over the original image of the SR71.
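A direct NumPy transcription of the rules above might look as follows. Figure 5.5.1 is not reproduced here, so the neighbor ordering (p1 directly above p, then clockwise) is an assumption chosen so that conditions (c), (d), (c2), and (d2) select the stated boundary and corner points; a one-pixel background border around the image is also assumed.

import numpy as np

# Assumed ordering of the 8-neighbors: p1 = north, then clockwise
# (p2 = NE, p3 = E, p4 = SE, p5 = S, p6 = SW, p7 = W, p8 = NW).
OFFSETS = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]

def zhang_suen(a):
    """Two-subiteration Zhang-Suen thinning of a {0,1} image with a background border."""
    a = a.astype(np.uint8).copy()
    changed = True
    while changed:
        changed = False
        for phase in (0, 1):
            deletions = []
            for i in range(1, a.shape[0] - 1):
                for j in range(1, a.shape[1] - 1):
                    if not a[i, j]:
                        continue
                    p = [a[i + di, j + dj] for di, dj in OFFSETS]
                    B = sum(p)                                    # nonzero 8-neighbors
                    A = sum(p[k] == 0 and p[(k + 1) % 8] == 1 for k in range(8))
                    if not (2 <= B <= 6 and A == 1):
                        continue                                  # conditions (a), (b)
                    if phase == 0:                                # conditions (c), (d)
                        ok = p[0] * p[2] * p[4] == 0 and p[2] * p[4] * p[6] == 0
                    else:                                         # conditions (c2), (d2)
                        ok = p[0] * p[2] * p[6] == 0 and p[0] * p[4] * p[6] == 0
                    if ok:
                        deletions.append((i, j))
            for i, j in deletions:
                a[i, j] = 0
            changed = changed or bool(deletions)
    return a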

Figure 5.5.3  Zhang-Suen skeleton of the SR71.


Image Algebra Formulation

Figure 5.5.4  Census template used for Zhang-Suen thinning.

Let a ∈ {0, 1}^X be the source image. The template t, defined pictorially in Figure 5.5.4, is a census template used in conjunction with characteristic functions defined on the sets

S1 = {3, 6, 7, 12, 14, 15, 24, 28, 30, 48, 56, 60, 62, 96, 112, 120, 129, 131, 135, 143, 192, 193, 195, 199, 207, 224, 225, 227, 231, 240, 241, 243, 248, 249}

and

S2 = {3, 6, 7, 12, 14, 15, 24, 28, 30, 31, 48, 56, 60, 62, 63, 96, 112, 120, 124, 126, 129, 131, 135, 143, 159, 192, 193, 195, 224, 225, 227, 240, 248, 252}

to identify points that satisfy the conditions for deletion. S1 targets points that satisfy (a) to (d) for the first subiteration and S2 targets points that satisfy (a), (b), (c2), and (d2) for the second subiteration.

The image algebra pseudocode for the Zhang-Suen skeletonizing transform is

It may be more efficient to use lookup tables rather than the characteristic functions χS1 and χS2. The lookup tables that correspond to χS1 and χS2 are defined by

respectively.

Comments and Observations

The Zhang-Suen algorithm reduces the image to a skeleton of unit pixel width. Endpoints are preserved. The Zhang-Suen algorithm is also immune to contour noise. The transform does not allow reconstruction of the original image from the skeleton.

Two-pixel-wide diagonal lines may become excessively eroded. 2 × 2 blocks will be completely removed.

Finite sets A and B are homotopic if there exists a continuous Euler number preserving one-to-one correspondence between the connected components of A and B. The fact that 2 × 2 blocks are removed by the Zhang-Suen transform means that the transform does not preserve homotopy. Preservation of homotopy is considered to be an important topological property. We will provide an example of a thinning algorithm that preserves homotopy in Section 5.6.

5.6. Zhang-Suen Transform — Modified to Preserve Homotopy

The Zhang-Suen transform of Section 5.5 is an effective thinning algorithm. However, the fact that 2 × 2 blocks are completely eroded means that the transform does not preserve homotopy. Preservation of homotopy is a desirable property of a thinning algorithm. We take this opportunity to discuss what it means for a thinning transform to preserve homotopy.

Two finite sets A and B are homotopic if there exists a continuous Euler number preserving one-to-one correspondence between the connected components of A and B.

Figure 5.6.1 shows the results of applying the two versions of the Zhang-Suen thinning algorithm to an image. The original image is represented in gray. The largest portion of the original image is the SR71. A 3 × 3-pixel square hole has been punched out near the top center. An X has been punched out near the center of the SR71. A 2 × 2-pixel square has been added to the lower right-hand corner. The original Zhang-Suen transform has been applied to the image on the left. The Zhang-Suen transform modified to preserve homotopy has been applied to the image on the right. The resulting skeletons are represented in black.

The connectivity criterion for feature pixels is 8-connectivity. For the complement, 4-connectivity is the criterion. In analyzing the image, note that the white X in the center of the SR71 is not one 8-connected component of the complement of the image, but rather nine 4-connected components.

The feature pixels of the original image are contained in two 8-connected components. Its complement consists of eleven 4-connected components. The skeleton image produced by the original Zhang-Suen algorithm (left of Figure 5.6.1) has one 8-connected component. Its complement has eleven 4-connected components. Therefore, there is not a one-to-one correspondence between the connected components of the original image and its skeleton.

Notice that the modified Zhang-Suen transform (right of Figure 5.6.1) does preserve homotopy. This is because the 2 × 2 square in the lower right-hand corner of the original image was shrunk to a point rather than being erased.

To preserve homotopy, the conditions that make a point eligible for deletion must be made more stringent. The conditions that qualify a point for deletion in the original Zhang-Suen algorithm remain in place. However, the 4-neighbors (see Figure 5.5.1) of the point p are examined more closely. If the target point p has zero or one 4-neighbors with pixel value 1, then no change is made to the existing set of criteria for deletion. If p has two or three 4-neighbors with pixel value 1, it can be deleted on the first pass provided p3 · p5 = 0. It can be deleted on the second pass if p1 · p7 = 0. These changes ensure that 2 × 2 blocks do not get completely removed.

Figure 5.6.1  Original Zhang-Suen transform (left) and modified Zhang-Suen transform (right).

Image Algebra Formulation

As probably anticipated, the only effect on the image algebra formulation caused by modifying the Zhang-Suen transform to preserve homotopy shows up in the sets S1 and S2 of Section 5.5. The sets S′1 and S′2 that replace them are

S′1 = S1 \ {28, 30, 60, 62}

and

S′2 = S2 \ {193, 195, 225, 227}.


5.7. Thinning Edge Magnitude Images

The edge thinning algorithm discussed here uses edge magnitude and edge direction information to reduce the edge elements of an image to a set that is only one pixel wide. The algorithm was originally proposed by Nevatia and Babu [17]. The algorithm presented here is a variant of Nevatia and Babu’s algorithm. It was proposed in Ritter, Gader, and Davidson [18] and was implemented by Norris [19].

The input to the algorithm is an image of edge magnitudes and directions. The input image is also assumed to have been thresholded. The edge thinning algorithm retains a point as an edge point if its magnitude and direction satisfy certain heuristic criteria in relation to the magnitudes and directions of two of its 8-neighbors. The criteria that a point must satisfy to be retained as an edge point will be presented in the next subsection.

The edge image that was derived in Section 3.10 will serve as our example source image. Recall that even after thresholding, the thickness of the resulting edges was undesirable. Let a = (m, d) be an image with edge magnitudes m and edge directions d. The range of d in this variation of the edge thinning algorithm is the set of directions (in degrees) {0, 30, 60, …, 330}.

To understand the requirements that a point must satisfy to be an edge point, it is necessary to know what is meant by the expression “the two 8-neighbors normal to the direction at a point.” The two 8-neighbors that comprise the normal to a given horizontal or vertical edge direction are apparent. The question then arises about points in the 8-neighborhood of (i, j) that lie on the normal to, for instance, a 30° edge direction. For non-vertical and non-horizontal edge directions, the diagonal elements of the 8-neighborhood are assumed to make up the normals. For example, the points (i - 1, j - 1) and (i + 1, j + 1) are on the normal to a 30° edge direction. These two points are also on the normal for edge directions 60°, 210°, and 240° (see Figure 5.7.1). We are now ready to present the conditions a point must satisfy to qualify as an edge point.

Figure 5.7.1  Directions and their 8-neighborhood normals about (i, j).

An edge element is deemed to exist at the point (i, j) if any one of the following sets of conditions holds:

(1)  The edge magnitudes of points on the normal to d(i, j) are both zero.

(2)  The edge directions of the points on the normal to d(i, j) are both within 30° of d(i, j), and m(i, j) is greater than the magnitudes of its neighbors on the normal.

(3)  One neighbor on the normal has direction within 30° of d(i, j), the other neighbor on the normal has direction within 30° of d(i, j) + 180°, and m(i, j) is greater than the magnitude of the former of its two neighbors on the normal.

The result of applying the edge thinning algorithm to Figure 3.10.3 can be seen in Figure 5.7.2.
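As a rough illustration of these rules, here is a Python sketch operating on NumPy arrays. Since Figure 5.7.1 is not reproduced, the table mapping an edge direction to its two normal 8-neighbors is an assumption beyond what the text states for the 30°, 60°, 210°, and 240° cases; the three retention conditions themselves follow the list above.

def ang_diff(x, y):
    """Smallest absolute difference between two directions given in degrees."""
    d = abs(x - y) % 360
    return min(d, 360 - d)

def normal_offsets(direction):
    # Assumed normals: vertical pair for horizontal edge directions, horizontal
    # pair for vertical directions, and one of the two diagonals otherwise
    # (the text fixes only the 30, 60, 210, and 240 degree cases).
    if direction in (0, 180):
        return (-1, 0), (1, 0)
    if direction in (90, 270):
        return (0, -1), (0, 1)
    if direction in (30, 60, 210, 240):
        return (-1, -1), (1, 1)
    return (-1, 1), (1, -1)

def thin_edges(m, d):
    """Keep an edge point only if one of conditions (1)-(3) above holds."""
    out = m.copy()
    for i in range(1, m.shape[0] - 1):
        for j in range(1, m.shape[1] - 1):
            if m[i, j] == 0:
                continue
            (r1, c1), (r2, c2) = normal_offsets(d[i, j])
            m1, m2 = m[i + r1, j + c1], m[i + r2, j + c2]
            d1, d2 = d[i + r1, j + c1], d[i + r2, j + c2]
            opp = (d[i, j] + 180) % 360
            keep = (
                (m1 == 0 and m2 == 0)                                          # (1)
                or (ang_diff(d1, d[i, j]) <= 30 and ang_diff(d2, d[i, j]) <= 30
                    and m[i, j] > m1 and m[i, j] > m2)                         # (2)
                or (ang_diff(d1, d[i, j]) <= 30 and ang_diff(d2, opp) <= 30
                    and m[i, j] > m1)                                          # (3)
                or (ang_diff(d2, d[i, j]) <= 30 and ang_diff(d1, opp) <= 30
                    and m[i, j] > m2)                                          # (3), other role
            )
            if not keep:
                out[i, j] = 0
    return out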

Image Algebra Formulation

Let a = (m, d) be an image on the point set X, where m is the edge magnitude image and d is the edge direction image.

The parameterized template t(i) has support in the three-by-three 8-neighborhood about its target point. The ith neighbor (ordered clockwise starting from bottom center) has value 1. All other neighbors have value 0. Figure 5.7.3 provides an illustration for the case i = 4. The small number in the upper right-hand corner of the template cell indicates the point’s position in the neighborhood ordering.

The function g : {0, 30, …, 330} → {0, 1, …, 7} defined by

is used to define the image i = g(d) ∈ {0, …, 7}^X. The pixel value i(x) is equal to the positional value of one of the 8-neighbors of x that is on the normal to d(x).

The function f is used to discriminate between points that are to remain as edge points and those that are to be deleted (have their magnitude set to 0). It is defined by

The edge thinned image can now be expressed as

Figure 5.7.2  Thinned edges with direction vectors.

Figure 5.7.3  Parameterized template used for edge thinning.


5.8. References

1  T. Pavlidis, Structural Pattern Recognition. New York: Springer-Verlag, 1977.

2  H. Blum, “An associative machine for dealing with the visual field and some of its biological implications,” in Biological Prototypes and Synthetic Systems (Bernard and Kare, eds.), vol. 1, New York: Plenum Press, 1962.

3  H. Blum, “A transformation for extracting new descriptors of shape,” in Symposium on Models for Perception of Speech and Visual Form (W. Whaten-Dunn, ed.), Cambridge, MA: MIT Press, 1967.

4  H. Blum, “Biological shape and visual science (part I),” Journal of Theoretical Biology, vol. 38, 1973.

5  J. Davidson, “Thinning and skeletonizing: A tutorial and overview,” in Digital Image Processing: Fundamentals and Applications (E. Dougherty, ed.), New York: Marcel Dekker, Inc., 1991.

6  B. Jang and R. Chin, “Analysis of thinning algorithms using mathematical morphology,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12, pp. 541-551, June 1990.

7  L. Lam, S. Lee, and C. Suen, “Thinning methodologies — a comprehensive survey,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 14, pp. 868-885, Sept. 1992.

8  D. Marr, “Representing visual information,” tech. rep., MIT AI Lab, 1977. AI Memo 415.

9  A. Rosenfeld and J. Pfaltz, “Sequential operators in digital picture processing,” Journal of the ACM, vol. 13, 1966.

10  A. Rosenfeld and A. Kak, Digital Picture Processing. New York, NY: Academic Press, 2nd ed., 1982.

11  G. Borgefors, “Distance transformations in digital images,” Computer Vision, Graphics, and Image Processing, vol. 34, pp. 344-371, 1986.

12  D. Li and G. Ritter, “Recursive operations in image algebra,” in Image Algebra and Morphological Image Processing, vol. 1350 of Proceedings of SPIE, (San Diego, CA), July 1990.

13  G. Ritter, “Recent developments in image algebra,” in Advances in Electronics and Electron Physics (P. Hawkes, ed.), vol. 80, pp. 243-308, New York, NY: Academic Press, 1991.

14  U. Eckhardt, “A note on Rutovitz’ method for parallel thinning,” Pattern Recognition Letters, vol. 8, no. 1, pp. 35-38, 1988.

15  D. Rutovitz, “Pattern recognition,” Journal of the Royal Statistical Society, vol. 129, Series A, pp. 504-530, 1966.

16  T. Zhang and C. Suen, “A fast parallel algorithm for thinning digital patterns,” Communications of the ACM, vol. 27, pp. 236-239, Mar. 1984.

17  R. Nevatia and K. Babu, “Linear feature extraction and description,” Computer Graphics and Image Processing, vol. 13, 1980.

18  G. Ritter, P. Gader, and J. Davidson, “Automated bridge detection in FLIR images,” in Proceedings of the Eighth International Conference on Pattern Recognition, (Paris, France), 1986.

19  K. Norris, “An image algebra implementation of an autonomous bridge detection algorithm for forward looking infrared (FLIR) imagery,” Master’s thesis, University of Florida, Gainesville, FL, 1992.


Chapter 6
Connected Component Algorithms

6.1. Introduction

A wide variety of techniques employed in computer vision reduce gray level images to binary images. These binary images usually contain only objects deemed interesting and worthy of further analysis. Objects of interest are analyzed by computing various geometric properties such as size, shape, or position. Before such an analysis is done, it is often necessary to remove various undesirable artifacts, such as feelers, from components through pruning processes. Additionally, it may be desirable to label the objects so that each object can be analyzed separately and its properties listed under its label. As such, component labeling can be considered a fundamental segmentation technique.

The labeling of connected components also provides the number of components in an image. However, there are often faster methods for determining the number of components. For example, if components contain small holes, the holes can be rapidly filled. An application of the Euler characteristic to an image containing components with no holes provides one fast method for counting components. This chapter provides some standard labeling algorithms as well as examples of pruning, hole filling, and component counting algorithms.

6.2. Component Labeling for Binary Images

Component labeling algorithms segment the domain of a binary image into partitions that correspond to connected components. Component labeling is one of the fundamental segmentation techniques used in image processing. As well as being an important segmentation technique in itself, it is often an element of more complex image processing algorithms [1].

The set of interest for component labeling of an image a ∈ {0, 1}^X is the set Y of points that have pixel value 1. This set is partitioned into disjoint connected subsets. The partitioning is represented by an image b in which all points of Y that lie in the same connected component have the same pixel value. Distinct pixel values are assigned to distinct connected components.

Either 4- or 8-connectivity can be used for component labeling. For example, let the image to the left in Figure 6.2.1 be the binary source image (pixel values equal to 1 are displayed with black). The image in the center represents the labeling of 4-components. The image to the right represents the 8-components. Different colors (or gray levels) are often used to distinguish connected components. This is why component labeling is also referred to as blob coloring. From the different gray levels in Figure 6.2.1, we see that the source image has five 4-connected components and one 8-connected component.

Figure 6.2.1  Original image, labeled 4-component image, and labeled 8-component image.

Image Algebra Formulation

Let a ∈ {0, 1}^X be the source image, where X is an m × n grid. Let the image d be defined by

The algorithm starts by assigning each black pixel a unique label and uses c to keep track of component labels as it proceeds. The neighborhood N has one of the two configurations pictured below. The configuration to the left is used for 4-connectivity and the configuration to the right is used for 8-connectivity.

When the loop of the pseudocode below terminates, b will be the image that contains connected component information. That is, points of X that have the same pixel value in the image b lie in the same connected component of X.

Initially, each feature pixel of a has a corresponding unique label in the image c. The algorithm works by propagating the maximum label within a connected component to every point in the connected component.
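A hedged NumPy sketch of this propagation follows: unique initial labels are taken from each pixel's position in a forward scan, and the maximum label is then propagated repeatedly over the chosen neighborhood until the label image stops changing. The function name and flag are ours, not the book's.

import numpy as np

def label_components(a, eight_connected=False):
    """Propagate the maximum label within each connected component of a binary image."""
    m, n = a.shape
    c = np.arange(1, m * n + 1).reshape(m, n) * (a > 0)     # unique label per feature pixel
    offs = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if eight_connected:
        offs += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    while True:
        p = np.pad(c, 1)
        b = c.copy()
        for di, dj in offs:
            b = np.maximum(b, p[1 + di:1 + di + m, 1 + dj:1 + dj + n])
        b = b * (a > 0)                                      # labels stay inside components
        if np.array_equal(b, c):
            return b
        c = b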

Alternate Image Algebra Formulations

Let d be as defined above. Define the neighborhood function N(a) as follows:

The components of a are labeled using the image algebra pseudocode that follows. When the loop terminates, the image with the labeled components will be contained in image variable b.

Note that this formulation propagates the minimum value within a component to every point in the component. The variant template t as defined above is used to label 4-connected components. It is easy to see how the transition from invariant template to variant has been made. The same transition can be applied to the invariant template used for labeling 8-components.

Variant templates are not as efficient to implement as invariant templates. However, in this implementation of component labeling there is one less image multiplication for each iteration of the while loop.

Although the above algorithms are simple, their computational cost is high. The number of iterations is directly proportional to mn. A faster alternate labeling algorithm [2] proceeds in two phases. The number of iterations is only directly proportional to m + n. However, the price for decreasing computational complexity is an increase in space complexity.

In the first phase, the alternate algorithm applies the shrinking operation developed in Section 6.4 to the source image m + n times. The first phase results in m + n binary images a0, a1, …, am+n-1. Each ak represents an intermediate image in which connected components are shrinking toward a unique isolated pixel. This isolated pixel is the top left corner of the component’s bounding rectangle, thus it may not be in the connected component. The image am+n-1 consists of isolated black pixels.


In the second phase, the faster algorithm assigns labels to each of the isolated black pixels of the image am+n-1. These labels are then propagated to the pixels connected to them in am+n-2 by applying a label propagating operation, then to am+n-3, and so on until a0 is labeled. In the process of propagating labels, new isolated black pixels may be encountered in the intermediate images. When this occurs, the isolated pixels are assigned unique labels and label propagation continues.

Note that the label propagating operations are applied to the images in the reverse order in which they are generated by the shrinking operations. More than one connected component may be shrunk to the same isolated pixel, but at different iterations. A unique label assigned to an isolated pixel should include the iteration number at which it is generated.

The template s used for image shrinking and the neighborhood function P used for label propagation are defined in the following figure. The faster algorithm for labeling 8-components is as follows:

When the last loop terminates, the labeled image will be contained in the variable c. If 4-component labeling is preferred, the characteristic function in line 4 should be replaced with

Comments and Observations

The alternate algorithm must store m + n intermediate images. There are other fast algorithms proposed to reduce this storage requirement [2, 3, 4, 5]. It may also be more efficient to replace the characteristic function χ{5, 7, 9, …, 15} by a lookup table.

6.3. Labeling Components with Sequential Labels

The component labeling algorithms of Section 6.2 began by assigning each point in the domain of the source image the number that represents its position in a forward scan of the domain grid. The label assigned to a component is either the maximum or minimum from the set of point numbers encompassed by the component (or its bounding box). Consequently, label numbers are not used sequentially, that is, in 1, 2, 3, …, n order.

In this section two algorithms for labeling components with sequential labels are presented. The first locates connected components and assigns labels sequentially, and thus takes the source image as input. The second takes as input label images, such as those produced by the algorithms of Section 6.2, and reassigns new labels sequentially.

It is easy to determine the number of connected components in an image from its corresponding sequential label image; simply find the maximum pixel value of the label image. The set of label numbers is also known, namely {1, 2, 3, …, v}, where v is the maximum pixel value of the component image. If component labeling is one step in a larger image processing regimen, this information may facilitate later processing steps.

Labeling with sequential labels also offers a savings in storage space. Suppose one is working with 17 × 17 gray level images whose pixel values have eight-bit representations. Up to 255 labels can be assigned if labeling with sequential labels is used. It may not be possible to represent the label image at all if labeling with sequential labels is not used and a label value greater than 255 is assigned.

Image Algebra Formulation

Let a ∈ {0, 1}^X be the source image, where X is an m × n grid. The neighborhood function N used for sequential labeling has one of the two configurations pictured below. The configuration to the left is used for labeling 4-connected components, and the configuration to the right is used for labeling 8-connected components.

When the algorithm below ends, the sequential label image will be contained in image variable b.

Alternate Image Algebra Formulation

The alternate image algebra algorithm takes as input a label image a and reassigns labels sequentially. When the while loop in the pseudocode below terminates, the image variable b will contain the sequential label image. The component connectivity criterion (four or eight) for b will be the same as that used in generating the image a.
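For this second algorithm, the relabeling step can be sketched in a few lines of NumPy; the helper name is ours and the sketch stands in for the book's pseudocode.

import numpy as np

def relabel_sequentially(labels):
    """Map the distinct nonzero labels of a label image onto 1, 2, ..., v."""
    out = np.zeros_like(labels)
    for new, old in enumerate(np.unique(labels[labels > 0]), start=1):
        out[labels == old] = new
    return out

# The number of components is then out.max().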

Comments and Observations

Figure 6.3.1 shows a binary image (left) whose feature pixels are represented by an asterisk. The pixel values assigned to its 4- and 8-components after sequential labeling are seen in the center and right images, respectively, of Figure 6.3.1.

Figure 6.3.1  A binary image, its 4-labeled image (center) and 8-labeled image (right).


6.4. Counting Connected Components by Shrinking

The purpose of the shrinking algorithm presented in this section is to count the connected components in a Boolean image. The idea, based on Levialdi’s approach [6], is to shrink each component to a point and then count the number of points obtained. To express this idea explicitly, let a ∈ {0, 1}^X denote the source image and let h denote the Heaviside function defined by

One of four windows may be chosen as the shrinking pattern. Each is distinguished by the direction in which it compresses the image.

Shrinking toward top right —

Shrinking toward top left —

Shrinking toward bottom left —

Shrinking toward bottom right —

Each iteration of the algorithm consists of applying, in parallel, the selected shrinking window on every element of the image. Iteration continues until the original image has been reduced to the zero image. After each iteration the number of isolated points is counted and added to a running total of isolated points. Each isolated point corresponds to an 8-connected component of the original image.

This shrinking process is illustrated in the series (in left-to-right and top-to-bottom order) of images in Figure 6.4.1. The original image (in the top left corner) consists of five 8-connected patterns. By the fifth iteration the component in the upper left corner has been reduced to an isolated pixel. Then the isolated pixel is counted and removed in the following iteration. This process continues until each component has been reduced to an isolated pixel, counted, and then removed.

Figure 6.4.1  The parallel shrinking process.

Image Algebra Formulation

Let a ∈ {0, 1}^X be the source image. We will illustrate the image algebra formulation using the shrinking window that compresses toward the top right. This choice of shrinking window dictates the form of neighborhoods M and N below. The template u is used to find isolated points.

The image variable a is initialized with the original image. The integer variable p is initially set to zero. The algorithm consists of executing the following loop until the image variable a is reduced to the zero image.

When the algorithm terminates, the value of p will represent the number of 8-connected components of theoriginal image.

Alternate Image Algebra Formulation

The above image algebra formulation closely parallels the formulation presented in Levialdi [6]. However, it involves three convolutions in each iteration. We can reduce the number of convolutions in each iteration to only one by using the following census template. A binary census template (whose weights are powers of 2), when applied to a binary image, encodes the neighborhood configurations of target pixels. From the configuration of a pixel, we can decide whether to change it from 1 to 0 or from 0 to 1, and we can also decide whether or not it is an isolated pixel.

The algorithm using the binary census template is expressed as follows:

In order to count 8-components, the function f should be defined as

The 4-components of the image can be counted by defining f to be

In either case, the movement of the shrinking component is toward the pixel at the top right of the component.

Comments and Observations

The maximum number of iterations required to shrink a component to its corresponding isolated point is equal to the d1 distance of the element of the region farthest from the top rightmost corner of the rectangle circumscribing the region.

6.5. Pruning of Connected Components

Pruning of connected components is a common step in removing various undesirable artifacts created during preceding image processing steps. Pruning usually removes thin objects such as feelers from thick, blob-shaped components. There exists a wide variety of pruning algorithms. Most of these algorithms are usually structured to solve a particular task or problem. For example, after edge detection and thresholding, various objects of interest may be reduced to closed contours. However, in real situations there may also be many unwanted edge pixels left. In this case a pruning algorithm can be tailored so that only closed contours remain.

In this section we present a pruning algorithm that removes feelers and other objects of single-pixel width while preserving closed contours of up to single-pixel width, as well as the 4-connectivity of components consisting of thick bloblike objects that may be connected by thin lines. In this algorithm, a pixel is removed from an object if fewer than two of its 4-neighbors are object pixels. The procedure is iterative and continues until every object pixel has two or more object 4-neighbors.

Image Algebra Formulation

Let a ∈ {0, 1}^X be the source image and N denote the von Neumann neighborhood. The following simple algorithm will produce the pruned version of a described above.
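The image algebra statement itself is not reproduced above, so as a stand-in, the rule can be sketched in NumPy as follows: feature pixels are kept only while they have at least two feature 4-neighbors, and the update is repeated until nothing changes.

import numpy as np

def prune(a):
    """Iteratively remove feature pixels that have fewer than two feature 4-neighbors."""
    a = (a > 0).astype(np.uint8)
    while True:
        p = np.pad(a, 1)
        count4 = p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
        b = a * (count4 >= 2)
        if np.array_equal(a, b):
            return b
        a = b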

The results of applying this pruning algorithm are shown in Figure 6.5.1.

Figure 6.5.1  The source image a is shown at the left and the pruned version of a on the right.

Comments and Observations

The algorithm presented above was first used in an autonomous target detection scheme for detecting tanks and military vehicles in infrared images [7]. As such, it represents a specific pruning algorithm that removes only thin feelers and non-closed thin lines. Different size neighborhoods must be used for the removal of larger artifacts.

Since the algorithm presented here is an iterative algorithm that uses a small neighborhood, removal of long feelers may increase computational time requirements beyond acceptable levels, especially if processing is done on sequential architectures. In such cases, morphological filters may be more appropriate.

       

  

     

    


6.6. Hole Filling

As its name implies, the hole filling algorithm fills holes in binary images. A hole in this context is a region of 0-valued pixels bounded by an 8-connected set of 1-valued pixels. A hole is filled by changing the pixel value of points in the hole from 0 to 1. See Figure 6.6.1 for an illustration of hole filling.

Figure 6.6.1  Original image (left) and filled image (right).

Image Algebra Formulation

Let a ∈ {0, 1}^X be the source image, and let the template s and the neighborhood N be as pictured below. The hole filling image algebra code below is from the code derived by Ritter [8].

When the first while loop terminates, the image variable b will contain the objects in the original image a filled with 1’s and some extra 1’s attached to their exteriors. The second while loop “peels off” the extraneous 1’s to produce the final output image b. The algorithm presented above is efficient for filling small holes. The alternate version presented next is more efficient for filling large holes.

Alternate Image Algebra Formulation

Let a ∈ {0, 1}^X be the source image and t the template as previously defined. The holes in a are filled using the image algebra pseudocode below.

When the loop terminates, the “filled” image will be contained in the image variable c. The second method fills the whole domain of the source image with 1’s. Then, starting at the edges of the domain of the image, extraneous 1’s are peeled away until the exterior boundaries of the objects in the source image are reached.
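The effect of this second formulation can be sketched in NumPy as follows: the background is grown inward from the image border (our stand-in for the peeling controlled by the book's template t), and any background pixel never reached is a hole and is set to 1. Growing through 4-neighbors matches holes bounded by 8-connected object pixels.

import numpy as np

def fill_holes(a):
    """Fill holes: background not reachable from the image border becomes foreground."""
    a = a > 0
    outside = np.zeros_like(a)
    outside[0, :] = ~a[0, :]                  # border background pixels seed the outside
    outside[-1, :] = ~a[-1, :]
    outside[:, 0] = ~a[:, 0]
    outside[:, -1] = ~a[:, -1]
    while True:
        p = np.pad(outside, 1)
        grown = (outside | p[:-2, 1:-1] | p[2:, 1:-1]
                 | p[1:-1, :-2] | p[1:-1, 2:]) & ~a    # expand through background only
        if np.array_equal(grown, outside):
            break
        outside = grown
    return (a | ~outside).astype(np.uint8)             # unreached background is a hole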

Depending on how the template operation is implemented, it may be necessary to add the line

immediately after the statement that initializes c to 1. This is because template t needs to encounter a 0 before the peeling process can begin.

6.7. References

1  D. Ballard and C. Brown, Computer Vision. Englewood Cliffs, NJ: Prentice-Hall, 1982.

2  R. Cypher, J. Sanz, and L. Snyder, “Algorithms for image component labeling on SIMD mesh connected computers,” IEEE Transactions on Computers, vol. 39, no. 2, pp. 276-281, 1990.

3  H. Alnuweiri and V. Prasanna, “Fast image labeling using local operators on mesh-connected computers,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-13, no. 2, pp. 202-207, 1991.

4  H. Alnuweiri and V. Prasanna, “Parallel architectures and algorithms for image component labeling,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-14, no. 10, pp. 1014-1034, 1992.

5  H. Shi and G. Ritter, “Image component labeling using local operators,” in Image Algebra and Morphological Image Processing IV, vol. 2030 of Proceedings of SPIE, (San Diego, CA), pp. 303-314, July 1993.

6  S. Levialdi, “On shrinking binary picture patterns,” Communications of the ACM, vol. 15, no. 1, pp. 7-10, 1972.

7  N. Hamadani, “Automatic target cueing in IR imagery,” Master’s thesis, Air Force Institute of Technology, WPAFB, Ohio, Dec. 1981.

8  G. Ritter, “Boundary completion and hole filling,” Tech. Rep. TR 87-05, CCVR, University of Florida, Gainesville, 1987.


Chapter 7
Morphological Transforms and Techniques

7.1. Introduction

Mathematical morphology is that part of digital image processing that is concerned with image filtering and geometric analysis by structuring elements. It grew out of the early work of Minkowski and Hadwiger on geometric measure theory and integral geometry [1, 2, 3], and entered the modern era through the work of Matheron and Serra of the Ecole des Mines in Fontainebleau, France [4, 5]. Matheron and Serra not only formulated the modern concepts of morphological image transformations, but also designed and built the Texture Analyzer System, a parallel image processing architecture based on morphological operations [6]. In the U.S., research into mathematical morphology began with Sternberg at ERIM. Serra and Sternberg were the first to unify morphological concepts and methods into a coherent algebraic theory specifically designed for image processing and image analysis. Sternberg was also the first to use the term image algebra [7, 8].

Initially the main use of mathematical morphology was to describe Boolean image processing in the plane, but Sternberg and Serra extended the concepts to include gray valued images using the cumbersome notion of an umbra. During this time a divergence of definition of the basic operations of dilation and erosion occurred, with Sternberg adhering to Minkowski's original definition. Sternberg's definitions have been used more regularly in the literature, and, in fact, are used by Serra in his book on mathematical morphology [5].

Since those early days, morphological operations and techniques have been applied from low-level, to intermediate, to high-level vision problems. Among some recent research papers on morphological image processing are Crimmins and Brown [9], Haralick, et al. [10, 11], and Maragos and Schafer [12, 13, 14]. The rigorous mathematical foundation of morphology in terms of lattice algebra was independently established by Davidson and Heijmans [15, 16, 17]. Davidson's work differs from that of Heijmans' in that the foundation provided by Davidson is more general and extends classical morphology by allowing for shift variant structuring elements. Furthermore, Davidson's work establishes a connection between morphology, minimax algebra, and a subalgebra of image algebra.

7.2. Basic Morphological Operations: Boolean Dilations and Erosions

Dilation and erosion are the two fundamental operations that define the algebra of mathematical morphology. These two operations can be applied in different combinations in order to obtain more sophisticated operations. Noise removal in binary images provides one simple application example of the operations of dilation and erosion (Section 7.4).

The language of Boolean morphology is that of set theory. Those points in a set being morphologically transformed are considered the selected set of points, and those in the complement set are considered to be not selected. In Boolean (binary) images the selected set of pixels is the foreground and the set of pixels not selected is the background. The selected set of pixels is viewed as a set in Euclidean 2-space. For example, the set of all black pixels in a Boolean image constitutes a complete description of the image and is viewed as a set in Euclidean 2-space.

A dilation is a morphological transformation that combines two sets by using vector addition of set elements. In particular, a dilation of the set of black pixels in a binary image by another set (usually containing the origin), say B, is the set of all points obtained by adding the points of B to the points in the underlying point set of the black pixels. An erosion can be obtained by dilating the complement of the black pixels and then taking the complement of the resulting point set.

Dilations and erosions are based on the two classical operations of Minkowski addition and Minkowski subtraction of integral geometry. For any two sets A, B ⊆ ℝ^n, Minkowski addition is defined as

and Minkowski subtraction as

where B* = {-b : b ∈ B} and A′ = {x ∈ ℝ^n : x ∉ A}; i.e., B* denotes the reflection of B across the origin 0 = (0, 0, …, 0) ∈ ℝ^n, while A′ denotes the complement of A. Here we have used the original notation employed in Hadwiger's book [3].

Defining A_b = {a + b : a ∈ A}, one also obtains the relations

and

where B_p = {b + p : b ∈ B}. It is these last two equations that make morphology so appealing to many researchers. Equation 7.2.1 is the basis of morphological dilations, while Equation 7.2.2 provides the basis for morphological erosions. Suppose B contains the origin 0. Then Equation 7.2.1 says that A × B is the set of all points p such that the translate of B by the vector p intersects A. Figures 7.2.1 and 7.2.2 illustrate this situation.

Figure 7.2.1  The set A with structuring element B.

Figure 7.2.2  The dilated set A × B. Note: original set boundaries shown with thin white lines.

The set A/B, on the other hand, consists of all points p such that the translate of B by the vector p is completely contained inside A. This situation is illustrated in Figure 7.2.3.

Figure 7.2.3  The eroded set A/B. Note: original set boundaries shown with thin black lines.

In the terminology of mathematical morphology, when doing a dilation or erosion of A by B, it is assumed that A is the set to be analyzed and that B is the measuring stick, called a structuring element. To avoid anomalies without practical interest, the structuring element B is assumed to include the origin, and both A and B are assumed to be compact. The set A corresponds to either the support of a binary image a ∈ {0, 1}^X or to the complement of the support.

The dilation of the image a using the structuring element B results in another binary image b ∈ {0, 1}^X which is defined as

Similarly, the erosion of the image a ∈ {0, 1}^X by B is the binary image b ∈ {0, 1}^X defined by


The dilation and erosion of images as defined by Equations 7.2.3 and 7.2.4 can be realized by using max and min operations or, equivalently, by using OR and AND operations. In particular, it follows from Equations 7.2.3 and 7.2.1 that the dilated image b obtained from a is given by

Similarly, the eroded image b obtained from a is given by

Image Algebra Formulation

Let a ∈ {0, 1}^X denote a source image (usually, X is a rectangular array) and suppose we wish to dilate A ⊆ X, where A denotes the support of a, using a structuring element B containing the origin.

Define a neighborhood N : X → 2^X by

The image algebra formulation of the dilation of the image a by the structuring element B is given by

The image algebra equivalent of the erosion of a by the structuring element B is given by

Replacing a by its Boolean complement a^c in the above formulation for dilation (erosion) will dilate (erode) the complement of the support of a.
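A small NumPy sketch of these Boolean operations, under the Minkowski-style definitions used here, is given below: the dilation is the union of the translates of the support of a by the points of B, and the erosion keeps the points at which the translate of B fits entirely inside the support. The structuring element is a list of integer offsets containing the origin; the names are illustrative only.

import numpy as np

def shift(a, dy, dx):
    # Translate a binary array by (dy, dx), filling with 0 outside the domain.
    out = np.zeros_like(a)
    h, w = a.shape
    ys, yd = (slice(dy, h), slice(0, h - dy)) if dy >= 0 else (slice(0, h + dy), slice(-dy, h))
    xs, xd = (slice(dx, w), slice(0, w - dx)) if dx >= 0 else (slice(0, w + dx), slice(-dx, w))
    out[ys, xs] = a[yd, xd]
    return out

def dilate(a, B):
    # Dilation: union of the translates of the support of a by the points of B.
    out = np.zeros_like(a)
    for dy, dx in B:
        out |= shift(a, dy, dx)
    return out

def erode(a, B):
    # Erosion: points p such that the translate of B by p lies inside the support of a.
    out = np.ones_like(a)
    for dy, dx in B:
        out &= shift(a, -dy, -dx)
    return out

B = [(0, 0), (0, 1), (1, 0), (1, 1)]          # a small structuring element containing the origin
a = np.zeros((6, 6), dtype=np.uint8)
a[2:5, 2:5] = 1
print(dilate(a, B))
print(erode(a, B))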

Alternate Image Algebra Formulations

We present an alternate formulation in terms of image-template operations. The main rationale for the alternate formulation is its easy extension to the gray level case (see Section 7.6).

Let the value set be the 3-element bounded subgroup {-∞, 0, ∞} of the extended real numbers, and define a template by

The image algebra formulation of the dilation of the image a by the structuring element B is given by

The image algebra equivalent of the erosion of a by the structuring element B is given by

With a minor change in template definition, binary dilations and erosions can just as well be accomplished using the two lattice convolution operations (additive maximum and additive minimum), respectively.

In particular, let the value set be {0, 1, ∞} and define the template by

The dilation of the image a by the structuring element B is now given by

and the erosion of a by B is given by

Comments and Observations

It can be shown that

These equations constitute the basic laws that govern the algebra of mathematical morphology and provide the fundamental tools for geometric shape analysis of images.

The current notation used in morphology for a dilation of A by B is A ⊕ B, while an erosion of A by B is denoted by A ⊖ B. In order to avoid confusion with the linear image-template product ⊕, we use Hadwiger's notation [3] in order to describe morphological image transforms. In addition, we use the now more commonly employed definitions of Sternberg for dilation and erosion. A comparison of the different notation and definitions used to describe erosions and dilations is provided by Table 7.2.1.

Table 7.2.1 Notation and Definitions Used to Describe Dilations and Erosions

7.3. Opening and Closing

Dilations and erosions are usually employed in pairs; a dilation of an image is usually followed by an erosion of the dilated result, or vice versa. In either case, the result of successively applied dilations and erosions is the elimination of specific image detail smaller than the structuring element without the global geometric distortion of unsuppressed features.

An opening of an image is obtained by first eroding the image with a structuring element and then dilating the result using the same structuring element. The closing of an image is obtained by first dilating the image with a structuring element and then eroding the result using the same structuring element. The next section shows that opening and closing provide a particularly simple mechanism for shape filtering.

The operations of opening and closing are idempotent; their reapplication effects no further changes to the previously transformed results. In this sense openings and closings are to morphology what orthogonal projections are to linear algebra. An orthogonal projection operator is idempotent and selects the part of a vector that lies in a given subspace. Similarly, opening and closing provide the means by which given subshapes or supershapes of a complex geometric shape can be selected.

The opening of A by B is denoted by A ∘ B and defined as

The closing of A by B is denoted by A • B and defined as

Image Algebra Formulation

Let a ∈ {0, 1}^X denote a source image and B the desired structuring element containing the origin. The neighborhood N : X → 2^X is defined by

The image algebra formulation of the opening of the image a by the structuring element B is given by

The image algebra equivalent of the closing of a by the structuring element B is given by

Comments and Observations

It follows from the basic theorems that govern the algebra of erosions and dilations that (A ∘ B) ∘ B = A ∘ B and (A • B) • B = A • B. This shows the analogy between the morphological operations of opening and closing and the specification of a filter by its bandwidth. Morphologically filtering an image by an opening or closing operation corresponds to the ideal nonrealizable bandpass filters of conventional linear filters. Once an image is ideal bandpass filtered, further ideal bandpass filtering does not alter the result.
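A short sketch of opening and closing in terms of the dilate and erode helpers sketched in Section 7.2 follows; the idempotence property discussed above can be checked directly. The helpers and names are again illustrative.

import numpy as np

def opening(a, B):
    # Erode, then dilate, with the same structuring element (uses the Section 7.2 helpers).
    return dilate(erode(a, B), B)

def closing(a, B):
    # Dilate, then erode, with the same structuring element.
    return erode(dilate(a, B), B)

B = [(0, 0), (0, 1), (1, 0), (1, 1)]
a = np.zeros((8, 8), dtype=np.uint8)
a[2:6, 2:6] = 1
a[0, 0] = 1                                    # a speck smaller than B
print(opening(a, B))                           # the speck is removed, the square survives
assert np.array_equal(opening(opening(a, B), B), opening(a, B))   # idempotence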

7.4. Salt and Pepper Noise Removal

Opening an image with a disk-shaped structuring element smooths the contours, breaks narrow isthmuses, and eliminates small islands. Closing an image with a disk structuring element smooths the contours, fuses narrow breaks and long thin gulfs, eliminates small holes, and fills gaps in contours. Thus, a combination of openings and closings can be used to remove small holes and small speckles or islands in a binary image. These small holes and islands are usually caused by factors such as system noise, threshold selection, and preprocessing methodologies, and are referred to as salt and pepper noise.


Consider the image shown in Figure 7.4.1. Let A denote the set of all black pixels. Choosing the structuring element B shown in Figure 7.4.2, the opening of A by B

removes all the pepper noise (small black areas) from the input image. Doing a closing on A ∘ B,

closes the small white holes (salt noise) and results in the image shown in Figure 7.4.3.

Figure 7.4.1  SR71 with salt and pepper noise.

Figure 7.4.2  The structuring element B.

Figure 7.4.3  SR71 with salt and pepper noise removed.

Image Algebra Formulation

Let a ∈ {0, 1}^X denote a source image and N the von Neumann neighborhood

The image b derived from a using the morphological salt and pepper noise removal technique is given by

Comments and Observations

Salt and pepper noise removal can also be accomplished with the appropriate median filter. In fact, there is a close relationship between the morphological operations of opening and closing (gray level as well as binary) and the median filter. Images that remain unchanged after being median filtered are called median root images. To obtain the median root image of a given input image one simply repeatedly median filters the given image until there is no change. An image that is both opened and closed with respect to the same structuring element is a median root image.
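Assuming the opening and closing helpers sketched in Section 7.3, the noise removal step itself is a short composition: an opening with the von Neumann neighborhood removes the small foreground specks, and a closing of the result fills the small holes. This mirrors the formulation above; the names are illustrative.

N = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]     # von Neumann neighborhood

def remove_salt_and_pepper(a):
    # Opening removes small foreground islands (pepper, when the foreground is black);
    # closing the opened image fills small holes (salt).
    return closing(opening(a, N), N)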

7.5. The Hit-and-Miss Transform

The hit-and-miss transform (HMT) is a natural operation to select out pixels that have certain geometric properties such as corner points, isolated points, boundary points, etc. In addition, the HMT performs template matching, thinning, thickening, and centering.

Since an erosion or a dilation can be interpreted as a special case of the hit-and-miss transform, the HMT is considered to be the most general image transform in mathematical morphology. This transform is often viewed as the universal morphological transformation upon which mathematical morphology is based.

Let B = (D, E) be a pair of structuring elements. Then the hit-and-miss transform of the set A is given by the expression

For practical applications it is assumed that D ∩ E = Ø. The erosion of A by D is obtained by simply letting E = Ø, in which case Equation 7.5.1 becomes

Since a dilation can be obtained from an erosion via the duality A × B = (A′/B*)′, it follows that a dilation is also a special case of the HMT.
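A minimal sketch of the hit-and-miss transform in terms of the erode helper from Section 7.2 follows: a pixel is selected when the "hit" element D fits inside the foreground and the "miss" element E fits inside the background. With E empty the expression reduces to the erosion by D, as noted above. The pair (D, E) is given as two offset lists; names are illustrative.

import numpy as np

def hit_and_miss(a, D, E):
    # Selected points: D fits in the support of a and E fits in its complement.
    fg = a.astype(bool)
    return (erode(fg, D) & erode(~fg, E)).astype(a.dtype)

# Example: select isolated foreground pixels (hit the pixel itself, miss its 4-neighbors).
D = [(0, 0)]
E = [(1, 0), (-1, 0), (0, 1), (0, -1)]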

Image Algebra Formulation

Let a ∈ {0, 1}^X and B = (D, E). Define a template t by

The image algebra equivalent of the hit-and-miss transform applied to the image a using the structuring element B is given by

Alternate Image Algebra Formulation

Let the neighborhoods N, M : X → 2^X be defined by

and

An alternate image algebra formulation for the hit-and-miss transform is given by

Comments and Observations

Davidson proved that the HMT can also be accomplished using a linear convolution followed by a simple threshold [15]. Let , where the enumeration is such that and . Define an integer-valued template r from by

Then

is another image algebra equivalent formulation of the HMT.

Figure 7.5.1 shows how the hit-and-miss transform can be used to locate square regions of a certain size. The source image is to the left of the figure. The structuring element B = (D, E) is made up of the 3 × 3 solid square D and the 9 × 9 square border E. In this example, the hit-and-miss transform is designed to "hit" regions that cover D and "miss" E. The two smaller square regions satisfy the criteria of the example design. This is seen in the image to the right of Figure 7.5.1.

The template used for the image algebra formulation of the example HMT is shown in Figure 7.5.2. In a similar fashion, templates can be designed to locate any of the region configurations seen in the source image.

Figure 7.5.1  Hit-and-miss transform used to locate square regions. Region A (corresponding to source image a) is transformed to region B (image b) with the morphological hit-and-miss transform using structuring element B = (D, E). In image algebra notation, c = a t with t as shown in Figure 7.5.2.

Figure 7.5.2  Template used for the image algebra formulation of the hit-and-miss transform designed to locate square regions.

7.6. Gray Value Dilations, Erosions, Openings, and Closings

Although morphological operations on binary images provide useful analytical tools for image analysis and classification, they play only a very limited role in the processing and analysis of gray level images. In order to overcome this severe limitation, Sternberg and Serra extended binary morphology in the early 1980s to gray scale images via the notion of an umbra. As in the binary case, dilations and erosions are the basic operations that define the algebra of gray scale morphology.

While there have been several extensions of the Boolean dilation to the gray level case, Sternberg's formulas for computing the gray value dilation and erosion are the most straightforward, even though the underlying theory introduces the somewhat extraneous concept of an umbra. Let X ⊆ ℝ^n and let f : X → ℝ be a function. Then the umbra of f, denoted by U(f), is the set U(f) ⊆ ℝ^(n+1) defined by

Note that the notion of an unbounded set is exhibited in this definition; the value of x_(n+1) can approach -∞.

Since U(f) ⊆ ℝ^(n+1), we can dilate U(f) by any other subset of ℝ^(n+1). This observation provides the clue for dilation of gray-valued images. In general, the dilation of a function f by a function g : X → ℝ, where X ⊆ ℝ^n, is defined through the dilation of their umbras U(f) × U(g) as follows. Let U(d) = U(f) × U(g) and define the function d as the top surface of this umbra. We now define f × g ≡ d. The erosion of f by g is defined as the function f/g ≡ e, where U(e) = U(f)/U(g) and e is the top surface of U(e).


Openings and closings of f by g are defined in the same manner as in the binary case. Specifically, the opening of f by g is defined as f ∘ g = (f/g) × g, while the closing of f by g is defined as f • g = (f × g)/g.

When calculating the new functions d = f × g and e = f/g, the following formulas for two-dimensional dilations and erosions are actually being used:

for the dilation, and

for the erosion. In practice, the function f represents the image, while g represents the structuring element, where the support of g corresponds to a binary structuring element. That is, one starts by defining a binary structuring element B about the origin, and then adds gray values to the cells in B. This defines a real-valued function g whose support is B. Also, the support of g is, in general, much smaller than the array on which f is defined. Thus, in practice, the notion of an umbra need not be introduced at all.
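A sketch of these gray value formulas follows. The structuring function g is represented as a dictionary mapping the offsets of B to gray values; the dilation takes a pointwise maximum of shifted-and-raised copies of f, and the erosion a pointwise minimum of shifted-and-lowered copies. Points that fall outside the domain are handled with -inf / +inf padding, and all names are illustrative.

import numpy as np

def shift_pad(f, dy, dx, fill):
    # Translate f by (dy, dx), filling with the given value outside the domain.
    out = np.full(f.shape, fill, dtype=float)
    h, w = f.shape
    ys, yd = (slice(dy, h), slice(0, h - dy)) if dy >= 0 else (slice(0, h + dy), slice(-dy, h))
    xs, xd = (slice(dx, w), slice(0, w - dx)) if dx >= 0 else (slice(0, w + dx), slice(-dx, w))
    out[ys, xs] = f[yd, xd]
    return out

def gray_dilate(f, g):
    # d(x) = max over z in B of f(x - z) + g(z)
    return np.maximum.reduce([shift_pad(f, dy, dx, -np.inf) + v for (dy, dx), v in g.items()])

def gray_erode(f, g):
    # e(x) = min over z in B of f(x + z) - g(z)
    return np.minimum.reduce([shift_pad(f, -dy, -dx, np.inf) - v for (dy, dx), v in g.items()])

def gray_open(f, g):
    return gray_dilate(gray_erode(f, g), g)

def gray_close(f, g):
    return gray_erode(gray_dilate(f, g), g)

g = {(0, 0): 0.0, (1, 0): 0.0, (-1, 0): 0.0, (0, 1): 0.0, (0, -1): 0.0}   # a flat structuring function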

Since there are slight variations in the definitions of dilation and erosion in the literature, we again remind the reader that we are using formulations that coincide with Minkowski's addition and subtraction.

Image Algebra Formulation

Let a denote the gray scale source image and g the structuring element whose support is B. Define an extended real-valued template t by

Note that the condition y - x ∈ B is equivalent to x ∈ B_y, where B_y denotes the translation of B by the vector y.

The image algebra equivalents of a gray scale dilation and a gray scale erosion of a by the structuring element g are now given by

and

respectively. Here

The opening of a by g is given by

and the closing of a by g is given by

Comments and Observations

Davidson has shown that a subalgebra of the full image algebra contains the algebra of gray scale mathematical morphology as a special case [15]. It follows that all morphological transformations can be easily expressed in the language of image algebra.

That this subalgebra is more general than mathematical morphology should come as no surprise, as templates are more general objects than structuring elements. Since structuring elements correspond to translation invariant templates, morphology lacks the ability to implement translation variant lattice transforms effectively. In order to implement such transforms effectively, morphology needs to be extended to include the notion of translation variant structuring elements. Of course, this extension is already a part of image algebra.

7.7. The Rolling Ball Algorithm

As was made evident in the previous sections, morphological operations and transforms can be expressed in terms of image algebra by using the two lattice convolution operators. Conversely, any image transform that is based on these operators and uses only invariant templates can be considered a morphological transform. Thus, many of the transforms listed in this synopsis, such as skeletonizing and thinning, are morphological image transforms. We conclude this chapter by providing an additional morphological transform known as the rolling ball algorithm.

The rolling ball algorithm, also known as the top hat transform, is a geometric shape filter that corresponds to the residue of an opening of an image [5, 7, 8]. The ball used in this image transformation corresponds to a structuring element (shape) that does not fit into the geometric shape (mold) of interest, but fits well into the background clutter. Thus, by removing the object of interest, complementation will provide its location.

In order to illustrate the basic concept behind the rolling ball algorithm, let a be a surface in 3-space. For example, a could be a function whose values a(x) represent some physical measurement, such as reflectivity, at points of the (x, y)-plane. Figure 7.7.1 represents a one-dimensional analogue in terms of a two-dimensional slice perpendicular to the (x, y)-plane. Now suppose s is a ball of some radius r. Rolling this ball beneath the surface a in such a way that the boundary of the ball, and only the boundary of the ball, always touches a, results in another surface b which is determined by the set of points consisting of all possible locations of the center of the ball as it rolls below a. The tracing of the ball's center locations is illustrated in Figure 7.7.2.

Figure 7.7.1  The signal a.

It may be obvious that the generation of the surface b is equivalent to an erosion. To realize this equivalence, let s : X → ℝ be defined by s(x) = sqrt(r² - |x|²) for |x| ≤ r. Obviously, the graph of s corresponds to the upper surface (upper hemisphere) of a ball s of radius r with center at the origin of ℝ³. Using Equation 7.6.1, it is easy to show that b = a/s.

Figure 7.7.2  The surface generated by the center of a rolling ball.

Next, let s roll in the surface b such that the center of s is always a point of b and such that every point of b gets hit by the center of s. Then the top point of s traces another surface c above b. If a is flat or smooth, with local curvature never less than r, then c = a. However, if a contains crevasses into which s does not fit — that is, locations at which s is not tangent to a or points of a with curvature less than r, etc. — then c ≠ a. Figure 7.7.3 illustrates this situation. It should also be clear by now that c corresponds to a dilation of b by s. Therefore, c is the opening of a by s, namely

Hence, c ≤ a.

Figure 7.7.3  The surface generated by the top of a rolling ball.

In order to remove the background of a — that is, those locations where s fits well beneath a — one simply subtracts c from a in order to obtain the image d = a - c containing only areas of interest (Figure 7.7.4).

Figure 7.7.4  The result of the rolling ball algorithm.


Let a denote the digital source image and s a structuring element represented by a digital hemisphere of some desired radius r about the origin. Note that the support of s is a digital disk S of radius r. The rolling ball algorithm is given by the transformation a → d, which is defined by

Image Algebra Formulation

Let a denote the source image. Define an extended real-valued template s by

The image algebra formulation of the rolling ball algorithm is now given by

Comments and Observations

It should be clear from Figure 7.7.4 that such an algorithm could be used for the easy and fast detection of hot spots in IR images. There are many variants of this basic algorithm. Also, in many applications it is advantageous to use different size balls for the erosion and dilation steps, or to use a sequence of balls (i.e., a sequence of transforms a → d_i, i = 1, 2, …, k) in order to obtain different regions (different in shape) of interest. Furthermore, there is nothing to prevent an algorithm developer from using shapes different from disks or balls in the above algorithm. The shape of the structuring element is determined by a priori information and/or by the objects one seeks.
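A compact sketch of the rolling ball residue, built on the gray_erode and gray_dilate helpers sketched in Section 7.6, is given below. The hemisphere structuring function is one illustrative choice of s; the names are not the book's notation.

import numpy as np

def hemisphere(r):
    # Structuring function s(y) = sqrt(r^2 - |y|^2) on a digital disk of radius r.
    s = {}
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            if dy * dy + dx * dx <= r * r:
                s[(dy, dx)] = float(np.sqrt(r * r - dy * dy - dx * dx))
    return s

def rolling_ball(a, r):
    s = hemisphere(r)
    c = gray_dilate(gray_erode(a, s), s)    # c = opening of a by s (the rolled surface)
    return a - c                            # residue d = a - c: the regions of interest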

7.8. References

1  H. Minkowski, "Volumen und Oberfläche," Mathematische Annalen, vol. 57, pp. 447-495, 1903.

2  H. Minkowski, Gesammelte Abhandlungen. Leipzig-Berlin: Teubner Verlag, 1911.

3  H. Hadwiger, Vorlesungen über Inhalt, Oberfläche und Isoperimetrie. Berlin: Springer-Verlag, 1957.

4  G. Matheron, Random Sets and Integral Geometry. New York: Wiley, 1975.

5  J. Serra, Image Analysis and Mathematical Morphology. London: Academic Press, 1982.

6  J. Klein and J. Serra, "The texture analyzer," Journal of Microscopy, vol. 95, 1972.

7  S. Sternberg, "Biomedical image processing," Computer, vol. 16, Jan. 1983.

8  S. Sternberg, "Overview of image algebra and related issues," in Integrated Technology for Parallel Image Processing (S. Levialdi, ed.), London: Academic Press, 1985.

9  T. Crimmins and W. Brown, "Image algebra and automatic shape recognition," IEEE Transactions on Aerospace and Electronic Systems, vol. AES-21, pp. 60-69, Jan. 1985.

10  R. Haralick, L. Shapiro, and J. Lee, "Morphological edge detection," IEEE Journal of Robotics and Automation, vol. RA-3, pp. 142-157, Apr. 1987.

11  R. Haralick, S. Sternberg, and X. Zhuang, "Image analysis using mathematical morphology: Part I," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 9, pp. 532-550, July 1987.

12  P. Maragos and R. Schafer, "Morphological filters Part I: Their set-theoretic analysis and relations to linear shift-invariant filters," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-35, pp. 1153-1169, Aug. 1987.

13  P. Maragos and R. Schafer, "Morphological filters Part II: Their relations to median, order-statistic, and stack filters," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-35, pp. 1170-1184, Aug. 1987.

14  P. Maragos, A Unified Theory of Translation-Invariant Systems with Applications to Morphological Analysis and Coding of Images. Ph.D. dissertation, Georgia Institute of Technology, Atlanta, 1985.

15  J. Davidson, Lattice Structures in the Image Algebra and Applications to Image Processing. Ph.D. thesis, University of Florida, Gainesville, FL, 1989.

16  J. Davidson, "Foundation and applications of lattice transforms in image processing," in Advances in Electronics and Electron Physics (P. Hawkes, ed.), vol. 84, pp. 61-130, New York, NY: Academic Press, 1992.

17  H. Heijmans, "Theoretical aspects of gray-level morphology," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, no. 6, pp. 568-582, 1991.


Chapter 8
Linear Image Transforms

8.1. Introduction

A large class of image processing transformations is linear in nature; an output image is formed from linear combinations of pixels of an input image. Such transforms include convolutions, correlations, and unitary transforms. Applications of linear transforms in image processing are numerous. Linear transforms have been utilized to enhance images and to extract various features from images. For example, the Fourier transform is used in highpass and lowpass filtering (Chapter 2) as well as in texture analysis. Another application is image coding in which bandwidth reduction is achieved by deleting low-magnitude transform coefficients. In this chapter we provide some typical examples of linear transforms and their reformulations in the language of image algebra.

8.2. Fourier Transform

The Fourier transform is one of the most useful tools in image processing. It provides a realization of an image that is a composition of sinusoidal functions over an infinite band of frequencies. This realization facilitates many image processing techniques. Filtering, enhancement, encoding, restoration, texture analysis, feature classification, and pattern recognition are but a few of the many areas of image processing that utilize the Fourier transform.

The Fourier transform is defined over a function space. A function in the domain of the Fourier transform is said to be defined over a spatial domain. The corresponding element in the range of the Fourier transform is said to be defined over a frequency domain. We will discuss the significance of the frequency domain after the one-dimensional Fourier transform and its inverse have been defined.

The Fourier transform of the function f is defined by

Given the Fourier transform of f, the function f can be recovered by using the inverse Fourier transform, which is given by the equation

The function f and its Fourier transform are called a Fourier transform pair.

Substituting the integral definitions of the Fourier transform and its inverse into the above equation, the following equality is obtained

The inner integral (enclosed by square brackets) is the Fourier transform of f. It is a function of u alone.

Replacing the inner integral by the Fourier transform of f, we get

Euler's formula allows e^(2πiux) to be expressed as cos(2πux) + i sin(2πux). Thus, e^(2πiux) is a sum of a real and a complex sinusoidal function of frequency u. The integral is the continuous analog of summing over all frequencies u. The Fourier transform evaluated at u can be viewed as a weight applied to the real and complex sinusoidal functions at frequency u in a continuous summation.

Combining all these observations, the original function f(x) is seen as a continuous weighted sum of sinusoidal functions. The weight applied to the real and complex sinusoidal functions of frequency u is given by the Fourier transform evaluated at u. This explains the use of the term "frequency" as an adjective for the domain of the Fourier transform of a function.

For image processing, an image can be mapped into its frequency domain representation via the Fourier transform. In the frequency domain representation the weights assigned to the sinusoidal components of the image become accessible for manipulation. After the image has been processed in the frequency domain representation, the representation of the enhanced image in the spatial domain can be recovered using the inverse Fourier transform.

The discrete equivalent of the one-dimensional continuous Fourier transform pair is given by

and

In digital signal processing, a is usually viewed as having been obtained from a continuous function f by sampling f at some finite number of uniformly spaced points and setting a(k) = f(x_k).

For two-dimensional functions, the two-dimensional continuous Fourier transform pair is given by

and

For discrete functions we have the two-dimensional discrete Fourier transform

with the inverse transform specified by

Figure 8.2.1 shows the image of a jet and its Fourier transform image. The Fourier transform image is complex valued and, therefore, difficult to display. The value at each point in Figure 8.2.1 is actually the magnitude of its corresponding complex pixel value in the Fourier transform image. Figure 8.2.1 does show a full period of the transform; however, the origin of the transform does not appear at the center of the display. A representation of one full period of the transform image with its origin shifted to the center of the display can be achieved by multiplying each point a(x, y) by (-1)^(x+y) before applying the transform. The jet's Fourier transform image shown with the origin at the center of the display is seen in Figure 8.2.2.

Figure 8.2.1  The image of a jet (left) and its Fourier transform image (right).

Figure 8.2.2  Fourier transform of jet with origin at center of display.
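The display trick just described is easy to verify numerically; the following sketch uses numpy.fft purely for illustration (the image algebra and radix-2 formulations are given in Sections 8.2 and 8.4). Multiplying a(x, y) by (-1)^(x+y) before transforming moves the origin of the transform to the center of the array.

import numpy as np

a = np.random.rand(8, 8)
x, y = np.indices(a.shape)
premultiplied = np.fft.fft2(a * (-1.0) ** (x + y))    # transform of (-1)^(x+y) a(x, y)
centered = np.fft.fftshift(np.fft.fft2(a))            # transform with the origin moved to the center
print(np.allclose(premultiplied, centered))           # True for even array dimensions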

Image Algebra Formulation

The discrete Fourier transform of the one-dimensional image a is given by the image algebra expression

where f is the template defined by

The template f is called the one-dimensional Fourier template. It follows directly from the definition of f that f′ = f and (f*)′ = f*, where f* denotes the complex conjugate of f. Hence, the equivalent of the discrete inverse Fourier transform of â is given by

The image algebra equivalent formulation of the two-dimensional discrete Fourier transform pair is given by

and

where the two-dimensional Fourier template f is defined by

and

8.3. Centering the Fourier Transform

Centering the Fourier transform is a common operation in image processing. Centered Fourier transforms are useful for displaying the Fourier spectra as intensity functions, for interpreting Fourier spectra, and for the design of filters.

The discrete Fourier transform and its inverse exhibit the periodicity

for an m × n image a. Additionally, the magnitude of the Fourier transform exhibits the symmetry


Periodicity and symmetry are the key ingredients for understanding the need for centering the Fourier transform for interpretation purposes. The periodicity property indicates that the transform has a period of length m in the u direction and of length n in the v direction, while the symmetry shows that the magnitude is centered about the origin. This is shown in Figure 8.2.1, where the origin (0,0) is located at the upper left hand corner of the image. Since the discrete Fourier transform has been formulated for values of u in the interval [0, m - 1] and values of v in the interval [0, n - 1], the result of this formulation yields two half periods in these intervals that are back to back. This is illustrated in Figure 8.3.1 (a), which shows a one-dimensional slice along the u-axis of the magnitude function. Therefore, in order to display one full period in both the u and v directions, it is necessary to shift the origin to the midpoint and add points that shift to the outside of the image domain back into the image domain using modular arithmetic. Figures 8.2.2 and 8.3.1 illustrate the effect of this centering method. The reason for using modular arithmetic is that, in contrast to Figure 8.3.1 (a), when Fourier transforming a finite image there is no information available outside the intervals [0, m - 1] and [0, n - 1]. Thus, in order to display the full periodicity centered at the midpoint in the interval [0, m - 1], the intervals on either side of the midpoint need to be flipped and their ends glued together. The same gluing procedure needs to be performed in the v direction. This amounts to doing modular arithmetic. Figure 8.3.2 illustrates this process.

Figure 8.3.1  Periodicity of the Fourier transform. Figure (a) shows the two half periods, one on each side of the midpoint. Figure (b) shows the full period of the shifted Fourier transform on the interval [0, m].

Image Algebra Formulation

Let X be an m × n array, where m and n are even integers. Define the centering function center : X → X by

Note that the centering function is its own inverse, since center(center(p)) = p. Therefore, the centered version of â is given by

Figure 8.3.2 illustrates the mapping of points when applying center to an array X.

Figure 8.3.2  The centering function applied to a 16 × 16 array. The different shadings indicate the mapping of points under the center function.

Comments and Observations

If X is some arbitrary rectangular subset of ℤ², then the centering function needs to take into account the location of the midpoint of X with respect to the minimum of X whenever min(X) ≠ (0, 0). In particular, by defining the function

the centering function now becomes

Note that mid(X) does not correspond to the midpoint of X, but gives the midpoint of the set X - min(X). The actual midpoint of X is given by center(min(X)). This is illustrated by the example shown in Figure 8.3.3. In this example, min(X) = (4, 5), max(X) = (13, 18), and mid(X) = (5, 7). Also, if min(X) = (0, 0) and max(X) = (m - 1, n - 1), then mid(X) = (m/2, n/2). This shows that the general center function reduces to the previous center function.

Figure 8.3.3  The shift X - min(X) of a 10 × 14 array X. The point mid(X) = (5, 7) is shown in black.

If X is a square array, then centering can be accomplished by multiplying the image a by the image b defined by b(x, y) = (-1)^(x+y) prior to taking the Fourier transform. This follows from the simple fact expressed by the equation

which holds whenever . Thus, in this case we can compute by using the formulation
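The centering function itself can also be applied directly to the computed transform, as in the formulation above: each point (x, y) is sent to ((x + m/2) mod m, (y + n/2) mod n). The following sketch expresses this with numpy.roll and checks it against numpy.fft.fftshift; it assumes even m and n.

import numpy as np

def center_transform(ahat):
    # Permute pixel positions: (x, y) -> ((x + m/2) mod m, (y + n/2) mod n).
    m, n = ahat.shape
    return np.roll(ahat, (m // 2, n // 2), axis=(0, 1))

ahat = np.fft.fft2(np.random.rand(8, 8))
print(np.allclose(center_transform(ahat), np.fft.fftshift(ahat)))   # True for even m, n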

8.4. Fast Fourier Transform

In this section we present the image algebra formulation of the Cooley-Tukey radix-2 fast Fourier transform (FFT) [1, 2]. The image algebra expression for the Fourier transform of a is given in Section 8.2 by

where f is the Fourier template defined by

Expanding a • f, we get the following equation for â:

In this form it is easy to see that O(n²) complex adds and multiplications are required to compute the Fourier transform using the formulation of Section 8.2. For each 0 ≤ j ≤ n - 1, n complex multiplications of a(k) by the exponential factors are required, for a total of n² complex multiplications. For each 0 ≤ j ≤ n - 1, the sum requires n - 1 complex adds, for a total of n² - n complex adds. The complex arithmetic involved in the computation of the exponential factors does not enter into the measure of computational complexity that is optimized by the fast Fourier algorithm.

The number of complex adds and multiplications can be reduced to O(n log₂ n) by incorporating the Cooley-Tukey radix-2 fast Fourier algorithm into the image algebra formulation of the Fourier transform. It is assumed for the FFT optimization that n is a power of 2.

The separability of the Fourier transform can be exploited to obtain a similar computational savings on higher dimensional images. Separability allows the Fourier transform to be computed over an image as a succession of one-dimensional Fourier transforms along each dimension of the image. Separability will be discussed in more detail later.

For the mathematical foundation of the Cooley-Tukey fast Fourier algorithm the reader is referred to Cooley et al. [1, 2, 3]. A detailed exposition on the integration of the Cooley-Tukey algorithm into the image algebra formulation of the FFT can be found in Ritter [4]. The separability of the Fourier transform and the definition of the permutation function ρ_n used in the image algebra formulation of the FFT will be discussed here.

The separability of the Fourier transform is key to the decomposition of the two-dimensional FFT into two successive one-dimensional FFTs. The structure of the image algebra formulation of the two-dimensional FFT will reflect the utilization of separability.


The discrete Fourier transform, as defined in Section 8.2, is given by

The double summation can be rewritten as

or

where

For each 0 ≤ k ≤ n - 1, â(u, k) is the one-dimensional Fourier transform of the kth column of a. For each 0 ≤ u ≤ m - 1, â(u, v) is the one-dimensional Fourier transform of the uth row of â(u, k).

From the above it is seen that a two-dimensional Fourier transform can be computed by a two-stage application of one-dimensional Fourier transforms. First, a one-dimensional Fourier transform is applied to the columns of the image. In the second stage, another one-dimensional Fourier transform is applied to the rows of the result of the first stage. The same result is obtained if the one-dimensional Fourier transform is first applied to the rows of the image, then the columns. Computing the two-dimensional Fourier transform in this way has "separated" it into two one-dimensional Fourier transforms. By using separability, any saving in the computational complexity of the one-dimensional FFT can be passed on to higher dimensional Fourier transforms.
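The separability property is easy to check numerically; the sketch below uses numpy.fft only to illustrate that column transforms followed by row transforms reproduce the full two-dimensional transform.

import numpy as np

a = np.random.rand(8, 16)
cols_then_rows = np.fft.fft(np.fft.fft(a, axis=0), axis=1)   # 1-D DFTs down the columns, then along the rows
print(np.allclose(cols_then_rows, np.fft.fft2(a)))           # True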

For n = 2^k, the permutation ρ_n is the function that reverses the bit order of the k-bit binary representation of its input, and outputs the decimal representation of the value of the reversed bit order. For example, the table below represents ρ_8. The permutation function will be used in the image algebra formulation of the FFT. The algorithm used to compute the permutation function ρ_n is presented next.

Table 8.4.1 The Permutation ρ_8 and Its Corresponding Binary Evaluations

i    Binary i    Reversed binary i    ρ_8(i)
0    000         000                  0
1    001         100                  4
2    010         010                  2
3    011         110                  6
4    100         001                  1
5    101         101                  5
6    110         011                  3
7    111         111                  7

For n = 2^k and i = 0, 1, ..., n - 1, compute ρ_n(i) as follows:
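The book states this computation in image algebra pseudocode; an equivalent sketch in Python is shown below. The k-bit binary expansion of i is reversed bit by bit and re-read as an integer.

def rho(n, i):
    # Bit-reversal permutation for n = 2**k.
    k = n.bit_length() - 1
    r = 0
    for _ in range(k):
        r = (r << 1) | (i & 1)    # peel the low bit off i and push it onto r
        i >>= 1
    return r

print([rho(8, i) for i in range(8)])   # [0, 4, 2, 6, 1, 5, 3, 7], as in Table 8.4.1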

Image Algebra Formulation

One-Dimensional FFT

Let a be a one-dimensional complex-valued image of length n, where n = 2^k for some positive integer k. Let P = {2^i : i = 0, 1, ..., log₂ n - 1} and for p ∈ P define the parameterized template t(p) by

The following image algebra algorithm computes the Cooley-Tukey radix-2 FFT.

The function ρ_n is the permutation function defined earlier.

By the definition of the generalized convolution operator, a • t(2^(i-1)) is equal to

Notice that from the definition of the template t there are at most 2 values of l in the support of t for every 0 ≤ j ≤ n - 1. Thus, only 2n complex multiplications and n complex adds are required to evaluate a • t(2^(i-1)). Since the convolution a • t(2^(i-1)) is contained within a loop that consists of log₂ n iterations, there are O(n log n) complex adds and multiplications in the image algebra formulation of the FFT.

One-Dimensional Inverse FFT

The inverse Fourier transform can be computed in terms of the Fourier transform by simple conjugation. That is, the inverse transform of â is obtained by conjugating â, applying the forward transform, conjugating the result, and scaling by 1/n. The following algorithm computes the inverse FFT of â using the forward FFT and conjugation.
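The conjugation identity is easy to confirm numerically: the inverse transform equals the conjugate of the forward transform of the conjugate, divided by n. The check below uses numpy.fft only for illustration.

import numpy as np

n = 8
ahat = np.fft.fft(np.random.rand(n))
print(np.allclose(np.conj(np.fft.fft(np.conj(ahat))) / n, np.fft.ifft(ahat)))   # True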

Two-Dimensional FFT

In our earlier discussion of the separability of the Fourier transform, we noted that the two-dimensional DFT can be computed in two steps by successive applications of the one-dimensional DFT; first along each row, followed by a one-dimensional DFT along each column. Thus, to obtain a fast Fourier transform for two-dimensional images, we need to apply the image algebra formulation of the one-dimensional FFT in simple succession. However, in order to perform the permutation and the convolutions a • t(p) specified by the algorithm, it becomes necessary to extend the function ρ_n and the template t(p) to two-dimensional arrays. For this purpose, suppose that a is an m × n image, where m = 2^h and n = 2^k, and assume without loss of generality that n ≤ m.

Let P = {2^i : i = 0, 1, ..., log₂ m - 1} and for p ∈ P define the parameterized row template t(p) by

Note that for each p ∈ P, t(p) is a row template which is essentially identical to the template used in the one-dimensional case.

The permutation ρ is extended to a function r : X → X in a similar fashion by restricting its action to the rows of X. In particular, define

With the definitions of r and t completed, we are now in a position to specify the two-dimensional radix-2 FFT in terms of image algebra notation.

If X, r_m, and t are specified as above and a is an image on X, then the following algorithm computes the two-dimensional fast Fourier transform of a.

Two-Dimensional Inverse FFT

As in the one-dimensional case, the two-dimensional inverse FFT is also computed using the forward FFT and conjugation. Let â be the transformed image. The inverse FFT is given by the following algorithm:


Alternate Image Algebra Formulation

The formulation of the fast Fourier transform above assumes that the image-template operation • is only applied over the support of the template. If this is not the case, the "fast" formulation using templates will result in much poorer performance than the formulation of Section 8.2. The alternate image algebra formulation for the fast transform uses spatial transforms rather than templates. The alternate formulation is more appropriate if the implementation of • does not restrict its operations to the template's support.

For the alternate formulation, let the variables of the two-dimensional FFT above remain as defined. Define the functions f_i, g_i : X → X as follows:

The images used in the alternate formulation are defined as

The alternate formulation for the fast Fourier transform using spatial transforms is given by

Note that the use of the transpose a′ before and after the second loop in the code above is for notational simplicity, not computational efficiency. The use of transposes can be eliminated by using analogues of the functions f_i, g_i, and w_i in the second loop that are functions of the column coordinate of X.

Comments and Observations

The discussion presented here concerns a general algebraic optimization of the Fourier transform. Many important issues arise when optimizing the formulation for specific implementations and architectures [4].

The permutation function is not needed in the formulation of the Fourier transform of Section 8.2. The computation of the permutation function ρ_n does add to the computational cost of the FFT. For each i ∈ {0, 1, ..., n - 1}, ρ_n(i) requires O(log₂ n) integer operations. Thus, the evaluation of ρ_n will require a total of O(n log₂ n) integer operations. The amount of integer arithmetic involved in computing ρ_n is of the same order of magnitude as the amount of floating point arithmetic for the FFT. Hence the overhead associated with bit reversal is nontrivial in the computation of the FFT, often accounting for 10% to 30% of the total computation time.

8.5. Discrete Cosine Transform

Let a be a one-dimensional image of length n. The one-dimensional discrete cosine transform is defined by

where

The inverse one-dimensional discrete cosine transform is given by

The cosine transform provides the means of expressing an image as a weighted sum of cosine functions. The weights in the sum are given by the cosine transform. Figure 8.5.1 illustrates this by showing how a square wave is approximated using the first five terms of its discrete cosine expansion.
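A direct sketch of the one-dimensional cosine transform pair follows. The weights α(u) are assumed here to follow the common orthonormal convention α(0) = sqrt(1/n) and α(u) = sqrt(2/n) for u > 0, which makes the pair below mutually inverse; the book's normalization may differ in detail.

import numpy as np

def dct1(a):
    # Forward transform: C(u) = alpha(u) * sum_x a(x) cos((2x + 1) u pi / (2n)).
    n = len(a)
    alpha = np.where(np.arange(n) == 0, np.sqrt(1.0 / n), np.sqrt(2.0 / n))
    x = np.arange(n)
    return np.array([alpha[u] * np.sum(a * np.cos((2 * x + 1) * u * np.pi / (2 * n)))
                     for u in range(n)])

def idct1(c):
    # Inverse transform: a(x) = sum_u alpha(u) C(u) cos((2x + 1) u pi / (2n)).
    n = len(c)
    alpha = np.where(np.arange(n) == 0, np.sqrt(1.0 / n), np.sqrt(2.0 / n))
    u = np.arange(n)
    return np.array([np.sum(alpha * c * np.cos((2 * x + 1) * u * np.pi / (2 * n)))
                     for x in range(n)])

a = np.random.rand(8)
print(np.allclose(idct1(dct1(a)), a))   # True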

For two-dimensional images, the cosine transform and its inverse are given by

and

respectively.

Figure 8.5.2 shows the image of a jet and the image which represents the pixel magnitude of its cosine transform image.

Figure 8.5.1  Approximation of a square wave using the first five terms of the discrete cosine transform.

Image Algebra Formulation

The template t used for the two-dimensional cosine transform is specified by

The image algebra formulations of the two-dimensional transform and its inverse are

and

respectively.

Figure 8.5.2  Jet image (left) and its cosine transform image.

The one-dimensional transforms are formulated similarly. The template used for the one-dimensional case is

The image algebra formulations of the one-dimensional discrete cosine transform and its inverse are

and

respectively.

In [5], a method for computing the one-dimensional cosine transform using the fast Fourier transform is introduced. The method requires only a simple rearrangement of the input data into the FFT. The even and odd terms of the original image a, n = 2^k, are rearranged to form a new image b according to the formula

Using the new arrangement of terms, the cosine transform can be written as

Substituting x = n - 1 - x′ into the second sum and using the sum of two angles formula for cosine functions yields

The above can be rewritten in terms of the complex exponential function as

where denotes the inverse Fourier transform.

Let . Since b is real-valued we have that d(n - u) = id*(u).

Consequently,

Therefore, it is only necessary to compute half of the terms of the inverse Fourier transform in order to calculate the cosine transform for u = 0, 1, ..., n - 1.

The inverse cosine transform can be obtained from the real part of the inverse Fourier transform of

Let the images c and e be defined by

The even indexed terms are calculated using

The odd indexed terms are given by

With the information above it is easy to adapt the formulations of the one-dimensional fast Fourier transform (Section 8.4) for the implementation of fast one-dimensional cosine transforms. The cosine transform and its inverse are separable. Therefore, fast two-dimensional transform implementations are possible by taking one-dimensional transforms along the rows of the image, followed by one-dimensional transforms along the columns [5, 6, 7].


8.6. Walsh Transform

The Walsh transform was first defined in 1923 by Walsh [8], although in 1893 Hadamard [9] had achieved a similar result by the application of certain orthogonal matrices, generally called Hadamard matrices, which contain only the entries +1 and -1.

In 1931 Paley provided an entirely different definition of the Walsh transform, which is the one used most frequently by mathematicians [10] and is the one used in this discussion.

The one-dimensional Walsh transform of a, where n = 2^k, is given by

The term b_j(z) denotes the jth bit in the binary expansion of z. For example,

The inverse of the Walsh transform is

The set of functions created from the product

form the basis of the Walsh transform and, for each pair (x, u), g_u(x) represents the (x, u) entry of the Hadamard matrix mentioned above. Rewriting the expression for the inverse Walsh transform as

it is seen that a is a weighted sum of the basis functions g_u(x). The weights in the sum are given by the Walsh transform. Figure 8.6.1 shows the Walsh basis functions for n = 4 and how an image is written as a weighted sum of the basis functions.

Figure 8.6.1  The Walsh basis for n = 4.

Thus, the Walsh transform and the Fourier transform are similar in that they both provide the coefficients for the representation of an image as a weighted sum of basis functions. The basis functions for the Fourier transform are sinusoidal functions of varying frequencies. The basis functions for the Walsh transform are the elements of the set defined above. The rate of transition from negative to positive values in a Walsh basis function is analogous to the frequency of a Fourier basis function. The frequencies of the basis functions for the Fourier transform increase as u increases. However, the rate at which the Walsh basis functions change sign is not an increasing function of u.
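A direct sketch of the one-dimensional Walsh transform follows. The kernel is a product of signs determined by the bits of x and u; the pairing of bit j of x with bit k-1-j of u assumed below is a common convention, and since the resulting ±1 matrix is symmetric and orthogonal (up to the factor n), the inverse differs only by the absence of the 1/n factor.

import numpy as np

def bit(z, j):
    return (z >> j) & 1

def walsh(a):
    # W(u) = (1/n) * sum_x a(x) * prod_j (-1)^(b_j(x) b_{k-1-j}(u)), for n = 2**k.
    n = len(a)
    k = n.bit_length() - 1
    W = np.zeros(n)
    for u in range(n):
        for x in range(n):
            sign = (-1) ** sum(bit(x, j) * bit(u, k - 1 - j) for j in range(k))
            W[u] += a[x] * sign
    return W / n

def inverse_walsh(W):
    # Same kernel without the 1/n normalization.
    return walsh(W) * len(W)

a = np.random.rand(8)
print(np.allclose(inverse_walsh(walsh(a)), a))   # True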

For two-dimensional images, the forward and reverse Walsh transforms are given by

and

respectively. Here again, a can be represented as a weighted sum of the basis functions

with coefficients given by the Walsh transform. Figure 8.6.2 shows the two-dimensional Walsh basis for n = 4. The function g_(u,v)(x, y) is represented by the image g_uv in which pixel values of 1 are white and pixel values of -1 are black.

Figure 8.6.2  Two-dimensional Walsh basis for n = 4.

Figure 8.6.3 shows the magnitude image (right) of the two-dimensional Walsh transform of the image of a jet(left).

Figure 8.6.3  Jet image and the magnitude image of its Walsh transform image.

Image Algebra Formulation

The image algebra formulation of the fast Walsh transform is identical to that of the fast Fourier formulation (Section 8.4), with the exception that the template t used for the Walsh transform is

The Walsh transform shares the important property of separability with the Fourier transform. Thus, the two-dimensional Walsh transform can also be computed by taking one-dimensional Walsh transforms along each row of the image, followed by another one-dimensional Walsh transform along the columns.

Alternate Image Algebra Formulations

If the convolution • does not restrict its operations to the template's support, the spatial transform approach will be much more efficient. The only change that needs to be made to the alternate fast Fourier transform formulation of Section 8.4 is in the definition of the w_i images. For the spatial transform implementation of the fast Walsh transform, the images w_i ∈ {-1, 1}^X are defined as

In [11, 12], Zhu provides a fast version of the Walsh transform in terms of the p-product. Zhu’s method alsoeliminates the reordering process required in most fast versions of the Walsh transform. Specifically, given a

one-dimensional signal , where m = 2k, the Walsh transform of a is given by

where wi = (1, 1, 1, -1) for i = l, ..., k.

Note that the 2-product formulation of the Walsh transform involves only the values +1 and -1. Therefore

there is no multiplication involved except for final multiplication by the quantity .

For a two-dimensional image a, where m = 2^k and n = 2^l, the two-dimensional Walsh transform of a is given by

where the function φ_m, with r = q · h, is defined by

Thus, the function φ_m converts the vector [w_1 •_2 (w_2 •_2 ( ... (w_k •_2 a)))] back into matrix form.

Additional p-product formulations for signals whose lengths are not powers of two can be found in [11, 12].


8.7. The Haar Wavelet Transform

The simplest example of a family of functions yielding a multiresolution analysis of the space of all square-integrable functions on the real line is given by the Haar family of functions. These functions are well localized in space and are therefore appropriate for spatial domain analysis of signals and images. Furthermore, the Haar functions constitute an orthogonal basis. Hence, the discrete Haar transform benefits from the properties of orthogonal transformations. For example, these transformations preserve inner products when interpreted as linear operators from one space to another. The inverse of an orthogonal transformation is also particularly easy to implement, since it is simply the transpose of the direct transformation. Andrews [13] points out that orthogonal transformations are entropy preserving in an information theoretic sense. A particularly attractive feature of the Haar wavelet transform is its computational complexity, which is linear with respect to the length of the input signal. Moreover, this transform, as well as other wavelet representations of images (see Mallat [14]), discriminates oriented edges at different scales.

The appropriate theoretical background for signal and image analysis in the context of a multiresolution analysis is given by Mallat [14], where multiresolution wavelet representations are obtained by way of pyramidal algorithms. Daubechies provides a more general treatment of orthonormal wavelet bases and multiresolution analysis [15], where the Haar multiresolution analysis is presented. The Haar function, defined below, is the simplest example of an orthogonal wavelet. It may be viewed as the first of the family of compactly supported wavelets discovered by Daubechies.

The discrete Haar wavelet transform is a separable linear transform that is based on the scaling function

and the dyadic dilations and integer translations of the Haar function
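For reference, the scaling function and the Haar function are usually taken to be g(x) = 1 for 0 ≤ x < 1 and 0 otherwise, and h(x) = 1 for 0 ≤ x < 1/2, h(x) = -1 for 1/2 ≤ x < 1, and h(x) = 0 otherwise; the sign and half-open-interval conventions vary slightly between authors.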

A matrix formulation of the Haar transform using orthogonal matrices follows. A convenient factorization of these matrices leads to a pyramidal algorithm, which makes use of the generalized matrix product of order two. The algorithm has complexity O(n).

Let n = 2^k, and let a be a real-valued, one-dimensional signal of length n. Associate with a the column vector (a_0, a_1, ..., a_{n-1})′, where a_i = a(i). To determine an orthonormal basis for the vector space of such signals, define

for the appropriate range of integer indices p and 0 ≤ q < 2^p. For fixed p and q, the function h_pq(x) is a translation by q of the function h(2^p x), which is a dilation of the Haar wavelet function h(x). Furthermore, h_pq(x) is supported on an interval of length 2^-p. The infinite family of functions h_pq, together with the scaling function g(x), constitute an orthogonal basis, known as the Haar basis, for the space L^2[0, 1] of square-integrable functions on the unit interval. This basis can be extended to a basis for the square-integrable functions on the real line. The Haar basis was first described in 1910 [16].

To obtain a discrete version of the Haar basis, note that any positive integer i can be written uniquely as

where p is a nonnegative integer and 0 ≤ q < 2^p. Using this fact, define

and

where i = 2^p + q for i = 1, 2, ..., n - 1. The factor 2^((p-k)/2) in Equation 8.7.1 normalizes the Euclidean vector norm of the u_i. Hence, ||u_i||_2 = 1 for all i. The vector u_0 is a normalized, discrete version of the scaling function g(x). Similarly, the vectors u_i for i = 1, 2, ..., n - 1 are normalized, discrete versions of the wavelet functions h_pq(x). Furthermore, these vectors are mutually orthogonal, i.e., the dot product

Therefore, {u_0, u_1, ..., u_{n-1}} is an orthonormal basis for the vector space of one-dimensional signals of length n.

Now define the Haar matrices H_n for n = 2^k, k a positive integer, by letting u_i be the ith row vector of H_n. The orthonormality of the row vectors of H_n implies that H_n is an orthogonal matrix, i.e., H_n(H_n)′ = I_n, where I_n is the n × n identity matrix. Hence, H_n is invertible with inverse (H_n)′. Setting η = 2^(-1/2), the normalization factor may be written as

The Haar matrices for n = 2, 4, and 8, expressed in terms of powers of this normalization constant, are

and

The signal a may be written as a linear combination of the basis vectors u_i:

where the c_i (for i = 0, 1, ..., n - 1) are unknown coefficients to be determined. Equation 8.7.2 may be written as

where c is the vector of unknown coefficients, i.e., c(i) = c_i for each i. Using the orthogonality of H_n and solving for c, one obtains

Equation 8.7.4 defines the Haar transform HaarTransform1D for one-dimensional signals of length n = 2^k, i.e.,

Furthermore, an image may be reconstructed from the vector c of Haar coefficients using Equation 8.7.3. Hence, define the inverse Haar transform InverseHaarTransform1D for one-dimensional signals by

To define the Haar transform for two-dimensional images, let m = 2^k and n = 2^l, with k and l positive integers, and let a be a two-dimensional m × n image. The Haar transform of a yields an m × n matrix c = [c_ij] of coefficients given by

where a_ij = a(i, j) for all (i, j). Equation 8.7.5 defines the Haar transform HaarTransform2D of a two-dimensional m × n image:


The image a can be recovered from the m × n matrix c of Haar coefficients by the inverse Haar transform, InverseHaarTransform2D, for two-dimensional images:

i.e.,

Equation 8.7.6 is equivalent to the linear expansion

where u_i and v_j denote the row vectors of H_m and H_n, respectively. The outer product u_i ⊗ v_j may be interpreted as the image of a two-dimensional, discrete Haar wavelet. The sum over all combinations of the outer products, appropriately weighted by the coefficients c_ij, reconstructs the original image a.

Image Algebra Formulation

The image algebra formulation of the Haar transform is based on a convenient factorization of the Haar matrices [13]. To factor H_n for n = 2, 4, and 8, let

and

The Haar matrices H_2, H_4, and H_8 may be factored as follows:

and

From this factorization, it is clear that only pairwise sums and pairwise differences, together with multiplication by the normalization constant, are required to obtain the Haar wavelet transforms (direct and inverse) of an image. Furthermore, the one-dimensional Haar transform of a signal of length 2^k may be computed in k stages. The number of stages corresponds to the number of factors for the Haar matrix H_n. Computation of the Haar transform of a signal of length n requires 4(n - 1) multiplications and 2(n - 1) sums or differences. Hence, the computational complexity of the Haar transform is O(n). Further remarks concerning the complexity of the Haar transform can be found in Andrews [13] and Strang [17].

The image algebra formulation of the Haar wavelet transform may be expressed in terms of the normalization constant η = 2^(-1/2), the Haar scaling vector g = (η, η)′, and the Haar wavelet vector h = (η, -η)′, as described below.

One-Dimensional Haar Transform

Let a be a one-dimensional signal of length n, with n = 2^k and k a positive integer. The following pyramidal algorithm implements

c := HaarTransform1D(a, n),

where a is treated as a row vector.
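A Python sketch of one possible realization of HaarTransform1D is given below. It uses the normalization η = 2^(-1/2) and places the scaled global sum in position 0, but it is only an illustration under those assumptions, not the handbook's image algebra pseudocode.

import numpy as np

def haar_1d(a):
    # pyramidal Haar transform of a signal whose length is a power of two
    a = np.asarray(a, dtype=float).copy()
    eta = 2.0 ** -0.5
    n = len(a)
    while n > 1:
        half = n // 2
        evens, odds = a[0:n:2].copy(), a[1:n:2].copy()
        a[:half] = eta * (evens + odds)       # scaling (smooth) coefficients
        a[half:n] = eta * (evens - odds)      # wavelet (detail) coefficients
        n = half
    return a

Applied to a = (a_0, ..., a_7), the loop runs for three stages, matching the data flow of Figure 8.7.1.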

Example: Let a = (a_0, a_1, ..., a_7). The Haar transform

of a is computed in three stages, as shown in Figure 8.7.1.

Figure 8.7.1  Pyramidal algorithm data flow.

Note that the first entry of the resulting vector is a scaled global sum, since

Two-Dimensional Haar Transform

Let m = 2^k and n = 2^l, with k and l positive integers, and let a be a two-dimensional m × n image. The following algorithm implements

The one-dimensional Haar transform of each row of a is computed first as an intermediate image b. The second loop of the algorithm computes the Haar transform of each column of b. This procedure is equivalent to computing the matrix product H_m a (H_n)′.
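A corresponding, purely illustrative sketch of HaarTransform2D simply reuses the one-dimensional routine sketched above on rows and then on columns:

import numpy as np

def haar_2d(a):
    # separable 2-D Haar transform: rows first, then columns (haar_1d as sketched earlier)
    rows = np.array([haar_1d(r) for r in a])
    return np.array([haar_1d(c) for c in rows.T]).T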

Example: Consider the gray scale rendition of an input image of a 32 × 32 letter “A” as shown in Figure 8.7.2. In the gray scale images shown here, black corresponds to the lowest value in the image and white corresponds to the highest value. For the original image, black = 0 and white = 255.

Figure 8.7.2  Input image.

The row-by-row Haar transform of the original image is shown in Figure 8.7.3, and the column-by-column Haar transform of the original image is shown in Figure 8.7.4.

Figure 8.7.3  Row-by-row Haar transform of the input image.

Figure 8.7.4  Column-by-column Haar transform of the input image.

Computing the column-by-column Haar transform of the image in Figure 8.7.3, or computing the row-by-row Haar transform of the image in Figure 8.7.4, yields the Haar transform of the original image. The result is shown in Figure 8.7.5. Interpretation of Figure 8.7.5 is facilitated by recognizing that the 2^5 × 2^5 matrix of coefficients is partitioned into a multiresolution grid consisting of (5 + 1) × (5 + 1) subregions, as shown in Figure 8.7.6. Each subregion corresponds to a particular resolution or scale along the x- and y-axes. For example, the labeled subregion δ_2,3 contains a 2^2 × 2^3 = 4 × 8 version of the original image at a scale of 1/8 along the x-axis and a scale of 1/4 along the y-axis.

Figure 8.7.5  Gray scale Haar transform of the input image.

Figure 8.7.6  Haar transform multiresolution grid.

The large value in the upper left-hand pixel in Figure 8.7.5 corresponds to a scaled global sum. By eliminating the top row and the left column of pixels, the remaining pixels are rendered more clearly. The result is shown in Figure 8.7.7. The bright white pixel in the top row results from a dominant feature of the original image, namely, the right-hand diagonal of the letter “A”. The Haar wavelet transform is therefore sensitive to the presence and orientation of edges at different scales in the image.

Figure 8.7.7  Gray scale Haar transform without top row and left column.

8.8. Daubechies Wavelet Transforms

The Daubechies family of orthonormal wavelet bases for the space of square-integrable functions generalizes the Haar wavelet basis. Each member of this family consists of the dyadic dilations and integer translations

of a compactly supported wavelet function ψ(x). The wavelet function in turn is derived from a scaling function φ(x), which also has compact support. The scaling and wavelet functions have good localization in both the spatial and frequency domains [15, 18]. Daubechies was the first to describe compactly supported orthonormal wavelet bases [18].


The orthonormal wavelet bases were developed by Daubechies within the framework of a multiresolution analysis. Mallat [14] has exploited the features of multiresolution analysis to develop pyramidal decomposition and reconstruction algorithms for images. The image algebra algorithms presented here are based on these pyramidal schemes and build upon the work by Zhu and Ritter [12], in which the Daubechies wavelet transform and its inverse are expressed in terms of the p-product. The one-dimensional wavelet transforms (direct and inverse) described by Zhu and Ritter correspond to a single stage of the one-dimensional pyramidal image algebra algorithms. Computer routines and further aspects of the computation of the Daubechies wavelet transforms may be found in Press et al. [19].

For every positive integer g, a discrete wavelet system, denoted by D_2g, is defined by a finite number of coefficients a_0, a_1, ..., a_{2g-1}. These coefficients, known as scaling or wavelet filter coefficients, satisfy specific orthogonality conditions. The scaling function φ(x) is defined to be a solution of

Moreover, the set of coefficients

defines the wavelet function ψ(x) associated with the scaling function of the wavelet system:

To illustrate how the scaling coefficients are derived, consider the case g = 2. Let

and

Note that s and w satisfy the orthogonality relation s • w = 0. The coefficients a_0, a_1, a_2, and a_3 are uniquely determined by the following equations:

Equations 8.8.1 are equivalent to

Solving for the four unknowns in the four Equations 8.8.1 yields

In general, the coefficients for the discrete wavelet system D_2g are determined by 2g equations in 2g unknowns. Daubechies has tabulated the coefficients for g = 2, 3, ..., 10 [15, 18]. Higher-order wavelets are smoother but have broader supports.
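The g = 2 case is worth recording numerically. The widely tabulated D4 coefficients are a_0 = (1 + √3)/(4√2), a_1 = (3 + √3)/(4√2), a_2 = (3 - √3)/(4√2), a_3 = (1 - √3)/(4√2); the short Python check below verifies the orthogonality relation s • w = 0 and the usual normalizations. The sign convention used for the wavelet filter is one common choice, not necessarily the one fixed by Equations 8.8.1 to 8.8.3.

import numpy as np

s = np.array([1 + np.sqrt(3), 3 + np.sqrt(3), 3 - np.sqrt(3), 1 - np.sqrt(3)]) / (4 * np.sqrt(2))
w = np.array([s[3], -s[2], s[1], -s[0]])     # one common choice: b_k = (-1)^k a_{3-k}

assert abs(np.dot(s, w)) < 1e-12             # orthogonality of scaling and wavelet filters
assert abs(s.sum() - np.sqrt(2)) < 1e-12     # sum of the scaling coefficients
assert abs((s ** 2).sum() - 1.0) < 1e-12     # unit energy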

Image Algebra Formulation

Let g be a positive integer. For the direct and inverse D_2g wavelet transforms, let

and

denote the scaling vector and the wavelet vector, respectively, of the wavelet transform. Furthermore, let

and

In the computation of the wavelet transform of a signal or image, the scaling vector acts as a lowpass filter, while the wavelet vector acts as a bandpass filter.

The algorithms in this section make use of specific column and row representations of matrices. For an m × n matrix f, the column vector of f is defined by

and the row vector of f is defined by row(f) = (col(f))′.

When computing wavelet coefficients of a data vector near the end of the vector, the support of the scaling vector or wavelet vector may exceed the support of the data. In such cases, it is convenient to assume periodic boundary conditions on the data vector, i.e., the data vector is treated as a circular vector. Appropriate spatial maps are defined below to represent the required shifts for data addressing.

One-Dimensional Daubechies Wavelet Transform

Let N = 2^k, with k a positive integer, and let f be a one-dimensional signal of length N. Circular shifts of a row vector may be represented by composition with the spatial map K_q : X → X defined by

where the shift q is an integer.

The following pyramidal algorithm computes the one-dimensional wavelet transform of f, where f is treated as a row vector.
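A Python sketch of such a pyramidal scheme is shown below. It treats f as a row vector, uses periodic indexing at the boundary, and returns the final smooth vector followed by detail vectors from coarse to fine; it is an illustration under these assumptions, not the handbook's pseudocode.

import numpy as np

def dwt_stage(f, s, w):
    # one stage: lowpass (scaling) and bandpass (wavelet) outputs, decimated by two,
    # with circular (periodic) boundary handling
    f = np.asarray(f, dtype=float)
    n = len(f)
    idx = (2 * np.arange(n // 2)[:, None] + np.arange(len(s))[None, :]) % n
    return f[idx] @ s, f[idx] @ w

def dwt(f, s, w, stages):
    smooth = np.asarray(f, dtype=float)
    details = []
    for _ in range(stages):
        smooth, d = dwt_stage(smooth, s, w)
        details.insert(0, d)                  # coarser details end up first
    return np.concatenate([smooth] + details)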

Remarks

•  The wavelet transform is computed in R stages. Each stage corresponds to a scale or level of resolution. The number of stages required to compute the wavelet transform is a function of the length N = 2^k of the input data vector and the length 2g of the wavelet or scaling vector.

•  The two vectors produced at the final stage R are row vectors of length N/2^R each.

•  The case g = 1 yields the Haar wavelet transform with s = s_0 = g and w = w_0 = h, where g and h are the Haar scaling and wavelet vectors, respectively.

The original signal f may be reconstructed from the vector c by the inverse wavelet transform. The following algorithm uses the tensor product to obtain the inverse wavelet transform of c, where c is treated as a row vector.

Remarks

•  The reconstructed data vector d has dimensions 1 × N.

•  d_s and d_w are matrices having dimensions 2 × L.

Example: Let g = 2. The scaling coefficients for D4 are given by Equations 8.8.3. A typical wavelet in the D4 wavelet basis may be obtained by taking the inverse wavelet transform of a canonical unit vector of length N = 2^k. For k = 8, the graph of the inverse D4 wavelet transform of e_11 is shown in Figure 8.8.1.

Figure 8.8.1  Typical D4 wavelet.

Two-Dimensional Daubechies Wavelet Transform

Let M = 2^k and N = 2^l, with k and l positive integers. Circular column shifts of a two-dimensional image are easily represented by composition with the spatial map K_q : X → X defined by

Analogously, circular row shifts of two-dimensional images may be represented by composition with the spatial map ρ_q : X → X defined by

Let f be a two-dimensional M × N image. The following algorithm computes the two-dimensional wavelet transform of f, where f is treated as an M × N matrix. The one-dimensional transform of each row of f is computed first as an intermediate image c. The second loop of the algorithm computes the one-dimensional transform of each column of c.

       

  

     

    


Example: Consider the gray scale rendition of an input image of a 32 × 32 letter “A” as shown in Figure 8.8.2. In the gray scale images shown here, black corresponds to the lowest value in the image and white corresponds to the highest value. For the original image, black = 0 and white = 255.

Figure 8.8.2  Input image.

Suppose g = 2. The row-by-row D4 wavelet transform of the original image is shown in Figure 8.8.3, and the column-by-column D4 wavelet transform of the original image is shown in Figure 8.8.4.

Figure 8.8.3  Row-by-row wavelet transform of the input image.

Figure 8.8.4  Column-by-column wavelet transform of the input image.

Computing the column-by-column wavelet transform of the image in Figure 8.8.3, or computing the row-by-row wavelet transform of the image in Figure 8.8.4, yields the D4 wavelet transform of the original image. The result is shown in Figure 8.8.5. As in the Haar wavelet representation, the Daubechies wavelet representation discriminates the location, scale, and orientation of edges in the image.

Figure 8.8.5  Gray scale wavelet transform of the input image.

Alternate Image Algebra Formulation

The alternate image algebra formulation of the one-dimensional wavelet transform presented here uses the dual of the 2-product instead of the tensor product. Let

The following algorithm computes the inverse one-dimensional wavelet transform of the coefficient vector c, where c is treated as a row vector.

8.9. References

1  J. Cooley, P. Lewis, and P. Welch, “The fast Fourier transform and its applications,” IEEE Transactions on Education, vol. E-12, no. 1, pp. 27-34, 1969.

2  J. Cooley and J. Tukey, “An algorithm for the machine calculation of complex Fourier series,” Mathematics of Computation, vol. 19, pp. 297-301, 1965.

3  R. Gonzalez and P. Wintz, Digital Image Processing. Reading, MA: Addison-Wesley, second ed., 1987.

4  G. Ritter, “Image algebra with applications.” Unpublished manuscript, available via anonymous ftp from ftp://ftp.cis.ufl.edu/pub/src/ia/documents, 1994.

5  M. Narasimha and A. Peterson, “On the computation of the discrete cosine transform,” IEEE Transactions on Communications, vol. COM-26, no. 6, pp. 934-936, 1978.

6  W. Chen, C. Smith, and S. Fralick, “A fast computational algorithm for the discrete cosine transform,” IEEE Transactions on Communications, vol. COM-25, pp. 1004-1009, 1977.

7  N. Ahmed, T. Natarajan, and K. Rao, “Discrete cosine transform,” IEEE Transactions on Computers, vol. C-23, pp. 90-93, 1974.

8  J. Walsh, “A closed set of normal orthogonal functions,” American Journal of Mathematics, vol. 45, no. 1, pp. 5-24, 1923.

9  M. J. Hadamard, “Resolution d’une question relative aux determinants,” Bulletin des Sciences Mathematiques, vol. A17, pp. 240-246, 1893.

10  R. E. A. C. Paley, “A remarkable series of orthogonal functions,” Proceedings of the London Mathematical Society, vol. 34, pp. 241-279, 1932.

11  H. Zhu, The Generalized Matrix Product and its Applications in Signal Processing. Ph.D. dissertation, University of Florida, Gainesville, 1993.

12  H. Zhu and G. X. Ritter, “The p-product and its applications in signal processing,” SIAM Journal of Matrix Analysis and Applications, vol. 16, no. 2, pp. 579-601, 1995.

13  H. Andrews, Computer Techniques in Image Processing. New York: Academic Press, 1970.

14  S. Mallat, “A theory for multiresolution signal decomposition: the wavelet representation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 11, no. 7, pp. 674-693, 1989.

15  I. Daubechies, Ten Lectures on Wavelets. CBMS-NSF Regional Conference Series in Applied Mathematics, Philadelphia, PA: SIAM, 1992.

16  A. Haar, “Zur Theorie der orthogonalen Funktionensysteme,” Mathematische Annalen, vol. 69, pp. 331-371, 1910.

17  G. Strang, “Wavelet transforms versus Fourier transforms,” Bulletin of the American Mathematical Society (new series), vol. 28, pp. 288-305, Apr. 1993.

18  I. Daubechies, “Orthonormal bases of wavelets,” Communications on Pure and Applied Mathematics, vol. 41, pp. 909-996, 1988.

19  W. Press, S. Teukolsky, W. Vetterling, and B. Flannery, Numerical Recipes in C. Cambridge: Cambridge University Press, 1992.


Chapter 9
Pattern Matching and Shape Detection

9.1. Introduction

This chapter covers two related image analysis tasks: object detection by pattern matching and shape detection using Hough transform techniques. One of the most fundamental methods of detecting an object of interest is by pattern matching using templates. In template matching, a replica of an object of interest is compared to all objects in the image. If the pattern match between the template and an object in the image is sufficiently close (e.g., exceeding a given threshold), then the object is labeled as the template object.

The Hough transform provides for versatile methods for detecting shapes that can be described in terms of closed parametric equations or in tabular form. Examples of parameterizable shapes are lines, circles, and ellipses. Shapes that fail to have closed parametric equations can be detected by a generalized version of the Hough transform that employs lookup table techniques. The algorithms presented in this chapter address both parametric and non-parametric shape detection.

9.2. Pattern Matching Using Correlation

Pattern matching is used to locate an object of interest within a larger image. The pattern, which represents the object of interest, is itself an image. The image is scanned with the given pattern to locate sites on the image that match or bear a strong visual resemblance to the pattern. The determination of a good match between an image a and a pattern template p is usually given in terms of the metric

where the sum is over the support of p. The value of d will be small when a and p are almost identical and large when they differ significantly. It follows that the term Σap will have to be large whenever d is small. Therefore, a large value of Σap provides a good measure of a match. Shifting the pattern template p over all possible locations of a and computing the match Σap at each location can therefore provide a set of candidate pixels of a good match. Usually, thresholding determines the final locations of a possible good match.

The method just described is known as unnormalized correlation, matched filtering, or template matching. There are several major problems associated with unnormalized correlation. If the values of a are large over the template support at a particular location, then it is very likely that Σap is also large at that location, even if no good match exists at that location. Another problem is in regions where a and p have a large number of zeros in common (i.e., a good match of zeros). Since zeros do not contribute to an increase of the value Σap, a mismatch may be declared, even though a good match exists. To remedy this situation, several methods of normalized correlation have been proposed. One such method uses the formulation

where α = Σa and the sum is over the region of the support of p. Another method uses the factor

which keeps the values of the normalized correlation between -1 and 1. Values closer to 1 represent better matches [1, 2].
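As an illustration of this second normalization, the following Python sketch computes the correlation surface directly from the definition (a deliberately unoptimized double loop over valid placements; the function name and array handling are our choices):

import numpy as np

def normalized_correlation(a, p):
    # c(x, y) = sum(patch * p) / sqrt(sum(patch^2) * sum(p^2)), patch = a under p at (x, y)
    a, p = np.asarray(a, float), np.asarray(p, float)
    ph, pw = p.shape
    H, W = a.shape[0] - ph + 1, a.shape[1] - pw + 1
    c = np.zeros((H, W))
    p_norm = np.sqrt((p ** 2).sum())
    for y in range(H):
        for x in range(W):
            patch = a[y:y + ph, x:x + pw]
            denom = np.sqrt((patch ** 2).sum()) * p_norm
            c[y, x] = (patch * p).sum() / denom if denom > 0 else 0.0
    return c

Thresholding c then yields candidate match locations, as in the example that follows.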

The following figures illustrate pattern matching using normalized correlation. Figure 9.2.1 is an image of an industrial site with fuel storage tanks. The fuel storage tanks are the objects of interest for this example, thus the image of Figure 9.2.2 is used as the pattern. Figure 9.2.3 is the image representation for the values of positive normalized correlation between the storage tank pattern and the industrial site image for each point in the domain of the industrial site image. Note that the locations of the six fuel storage tanks show up as bright spots. There are also locations of strong correlation that are not locations of storage tanks. Referring back to the source image and pattern, it is understandable why there is relatively strong correlation at these false locations. Thresholding Figure 9.2.3 helps to pinpoint the locations of the storage tanks. Figure 9.2.4 represents the thresholded correlation image.

Figure 9.2.1  Industrial site with fuel storage tanks.

Figure 9.2.2  Pattern used to locate fuel storage tanks.

Figure 9.2.3  Image representation of positive normalized correlation resulting from applying the pattern ofFigure 9.2.2 to the image of Figure 9.2.1.

Figure 9.2.4  Thresholded correlation image.


Image Algebra Formulation

The exact formulation of a discrete correlation of an M × N image a with a pattern p of size (2m - 1) × (2n - 1) centered at the origin is given by

For (x + k, y + l) ∉ X, one assumes that a(x + k, y + l) = 0. It is also assumed that the pattern size is generally smaller than the sensed image size. Figure 9.2.5 illustrates the correlation as expressed by Equation 9.2.1.

Figure 9.2.5  Computation of the correlation value c(x, y) at a point (x, y) ∈ X.

To specify template matching in image algebra, define an invariant pattern template t, corresponding to the pattern p centered at the origin, by setting

The unnormalized correlation algorithm is then given by

The following simple computation shows that this agrees with the formulation given by Equation 9.2.1.

By definition of the operation •, we have that

Since t is translation invariant, t(x, y)(u, v) = t(0, 0)(u - x, v - y). Thus, Equation 9.2.2 can be written as

Now t(0, 0)(u - x, v - y) = 0 unless (u - x, v - y) ∈ S(t(0, 0)) or, equivalently, unless -(m - 1) ≤ u - x ≤ m - 1 and -(n - 1) ≤ v - y ≤ n - 1. Changing variables by letting k = u - x and l = v - y changes Equation 9.2.3 to

To compute the normalized correlation image c, let N denote the neighborhood function defined by N(y) = S(t_y). The normalized correlation image is then computed as

An alternate normalized correlation image is given by the statement

Note that Σt(0, 0) is simply the sum of all pixel values of the pattern template at the origin.

Comments and Observations

To be effective, pattern matching requires an accurate pattern. Even if an accurate pattern exists, slight variations in the size, shape, orientation, and gray level values of the object of interest will adversely affect performance. For this reason, pattern matching is usually limited to smaller local features which are more invariant to size and shape variations of an object.

9.3. Pattern Matching in the Frequency Domain

The purpose of this section is to present several approaches to template matching in the spectral or Fourier domain. Since convolutions and correlations in the spatial domain correspond to multiplications in the spectral domain, it is often advantageous to perform template matching in the spectral domain. This holds especially true for templates with large support as well as for various parallel and optical implementations of matched filters.

It follows from the convolution theorem [3] that the spatial correlation a • t corresponds to multiplication in the frequency domain. In particular,

where â denotes the Fourier transform of a, the second factor is the complex conjugate of the Fourier transform of t, and the outer operation is the inverse Fourier transform. Thus, simple pointwise multiplication of the image â with the conjugated pattern spectrum, followed by an inverse Fourier transform of the result, implements the spatial correlation a • t.

One limitation of the matched filter given by Equation 9.3.1 is that the output of the filter depends primarily on the gray values of the image a rather than on its spatial structures. This can be observed when considering the output image and its corresponding gray value surface shown in Figure 9.3.2. For example, the letter E in the input image (Figure 9.3.1) produced a high-energy output when correlated with the pattern letter B shown in Figure 9.3.1. Additionally, the filter output is proportional to its autocorrelation, and the shape of the filter output around its maximum match is fairly broad. Accurately locating this maximum can therefore be difficult in the presence of noise. Normalizing the correlated image, as done in the previous section, alleviates the problem for some mismatched patterns. Figure 9.3.3 provides an example of a normalized output. An approach to solving this problem in the Fourier domain is to use phase-only matched filters.

The transfer function of the phase-only matched filter is obtained by eliminating the amplitude of the pattern spectrum through factorization. As shown in Figure 9.3.4, the output of the phase-only matched filter provides a much sharper peak than the simple matched filter since the spectral phase preserves the location of objects but is insensitive to the image energy [4].

Further improvements of the phase-only matched filter can be achieved by correlating the phases of both a and t. Figure 9.3.5 shows that the resulting image approximates the Dirac δ-function at the center of the letter B, thus providing even sharper peaks than the phase-only matched filter. Note also the suppression of the enlarged B in the left-hand corner of the image. This filtering technique is known as the symmetric phase-only matched filter or SPOMF [5].

Figure 9.3.1  The input image a is shown on the left and the pattern template t on the right.

Figure 9.3.2  The correlated output image and its gray value surface.

Figure 9.3.3  The normalized correlation image, where N(y) = S(t_y), and its gray value surface.

Figure 9.3.4  The phase-only correlation image and its gray value surface.

Figure 9.3.5  The symmetric phase-only correlation image and its gray value surface.


Image Algebra Formulation

Although the methods for matched filtering in the frequency domain are easily formulated mathematically, the exact digital specification is a little more complicated. The multiplication implies that we must have two images, â and the transformed pattern, of the same size. Thus a first step is to create, from the pattern template t, an image p defined over the domain of a.

To create the image p, reflect the pattern template t across the origin by setting p(x) = t(0, 0)(-x). This will correspond to conjugation in the spectral domain. Since t is an invariant template defined on all integer coordinates, p is an image defined over the entire plane. However, what is needed is an image over domain(a) = X. As shown in Figure 9.2.5, the support of the pattern template intersects X in only the positive quadrant. Hence, simple restriction of p to X will not work. Additionally, the pattern image p needs to be centered with respect to the transformed image â for the appropriate multiplication in the Fourier domain. This is achieved by translating p to the corners of a and then restricting to the domain of a. This process is illustrated in Figure 9.3.6. Specifically, define

Note that there are two types of additions in the above formula. One is image addition and the other is image-point addition which results in a shift of the image. Also, the translation of the image p, achieved by vector addition, translates p one unit beyond the boundary of X in order to avoid duplication of the intersection of the boundary of X with p at the corner of the origin.

Figure 9.3.6  The image p created from the pattern template t using reflection, translations to the corners of the array X, and restriction to X. The values of p are zero except in the shaded area, where the values are equal to the corresponding gray level values in the support of t at (0, 0).

The image needed for the multiplication is now given by the Fourier transform of p. The correlation image c can therefore be obtained using the following algorithm:

Using the image p constructed in the above algorithm, the phase-only filter and the symmetric phase-only filter now have the following simple formulations:

and

respectively.
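Assuming a and p are already the same size, with p constructed from t as above, the three filters can be sketched in Python with the FFT; the small constant guards the divisions in the same spirit as the pseudoinverse mentioned in the comments below.

import numpy as np

def frequency_domain_filters(a, p, eps=1e-12):
    A, P = np.fft.fft2(a), np.fft.fft2(p)
    matched = np.real(np.fft.ifft2(A * np.conj(P)))                                   # plain matched filter
    phase_only = np.real(np.fft.ifft2(A * np.conj(P) / np.maximum(np.abs(P), eps)))   # phase-only filter
    spomf = np.real(np.fft.ifft2(A * np.conj(P) /
                                 np.maximum(np.abs(A) * np.abs(P), eps)))             # symmetric phase-only filter
    return matched, phase_only, spomf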

Comments and Observations

In order to achieve the phase-only matching component of the matched filter approach, we needed to divide the complex image by the amplitude image. Problems can occur if some pixel values of the amplitude image are equal to zero. However, in the image algebra pseudocode of the various matched filters we assume that the division is carried out using the pseudoinverse of the amplitude image, which is zero wherever the amplitude is zero. A similar comment holds for the quotient used in the symmetric phase-only filter.

Some further improvements of the symmetric phase-only matched filter can be achieved by processing the spectral phases [6, 7, 8, 9].

9.4. Rotation Invariant Pattern Matching

In Section 9.2 we noted that pattern matching using simple pattern correlation will be adversely affected if the pattern in the image differs in size or orientation from the template pattern. Rotation invariant pattern matching solves this problem for patterns varying in orientation. The technique presented here is a digital adaptation of optical methods of rotation invariant pattern matching [10, 11, 12, 13, 14].

Computing the Fourier transform of images and ignoring the phase provides for a pattern matching approach that is insensitive to position (Section 9.3) since a shift in a(x, y) does not affect |â(u, v)|. This follows from the Fourier transform pair relation

which implies that

where x_0 = y_0 = N/2 denote the midpoint coordinates of the N × N domain of â. However, rotation of a(x, y) rotates |â(u, v)| by the same amount. This rotational effect can be taken care of by transforming |â(u, v)| to polar form (u, v) → (r, θ). A rotation of a(x, y) will then manifest itself as a shift in the angle θ. After determining this shift, the pattern template can be rotated through the angle θ and then used in one of the standard correlation schemes in order to find the location of the pattern in the image.

The exact specification of this technique — which, in the digital domain, is by no means trivial — is provided by the image algebra formulation below.

       

  

     

    


Image Algebra Formulation

Let t denote the pattern template and let a, an N × N image where N = 2^n, denote the image containing the rotated pattern corresponding to t. The rotation invariant pattern matching scheme alluded to above can be broken down into seven basic steps.

Step 1. Extend the pattern template t to a pattern image over the domain of a.

This step can be achieved in a variety of ways. One way is to use the method defined in Section 9.3. Another method is to simply set

This is equivalent to extending t(N/2, N/2) outside of its support to the zero image on X.

Step 2. Fourier transform a and p.

Step 3. Center the Fourier transformed images (see Section 8.3).

Centering is a necessary step in order to avoid boundary effects when using bilinear interpolation at a subsequent stage of this algorithm.

Step 4. Scale the Fourier spectrum.

This step is vital for the success of the proposed method. Image spectra decrease rather rapidly as a function of increasing frequency, resulting in suppression of high-frequency terms. Taking the logarithm of the Fourier spectrum increases the amplitude of the side lobes and thus provides for more accurate results when employing the symmetric phase-only filter at a later stage of this algorithm.

Step 5. Convert â and the pattern spectrum to continuous images.

The conversion of â and the pattern spectrum to continuous images is accomplished by using bilinear interpolation. An image algebra formulation of bilinear interpolation can be found in [15]. Note that because of Step 4, both images are real valued. Thus, if â_b and p_b denote the interpolated images, then â_b and p_b are real-valued images over a point set X with real-valued coordinates.

Although nearest neighbor interpolation can be used, bilinear interpolation results in a more robust matching algorithm.

Step 6. Convert to polar coordinates.

Define the point set

and a spatial function f : Y → X by

Next compute the polar images.
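One way to realize this rectangular-to-polar step digitally is sketched below: each (r_i, θ_j) grid point is mapped back into the centered spectrum and sampled by bilinear interpolation. The radial spacing r_i = i/2 and the uniform angular grid over [-π, π) are assumptions consistent with the remark on the maximum value of r in the Comments and Observations below.

import numpy as np

def to_polar(mag):
    # mag: centered N x N (log-scaled) spectrum magnitude; output indexed by (radius, angle)
    n = mag.shape[0]
    cy = cx = n / 2.0
    out = np.zeros((n, n))
    for j in range(n):
        theta = -np.pi + 2.0 * np.pi * j / n
        for i in range(n):
            r = i / 2.0                                   # keeps r below n/2
            y, x = cy + r * np.sin(theta), cx + r * np.cos(theta)
            y0, x0 = int(np.floor(y)), int(np.floor(x))
            if 0 <= y0 < n - 1 and 0 <= x0 < n - 1:
                dy, dx = y - y0, x - x0                   # bilinear weights
                out[i, j] = ((1 - dy) * (1 - dx) * mag[y0, x0] +
                             (1 - dy) * dx * mag[y0, x0 + 1] +
                             dy * (1 - dx) * mag[y0 + 1, x0] +
                             dy * dx * mag[y0 + 1, x0 + 1])
    return out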

Step 7. Apply the SPOMF algorithm (Section 9.3).

Since the spectral magnitude is periodic and θ ranges over the interval [-π = θ_0, θ_N = π], the output of the SPOMF algorithm will produce two peaks along the θ axis, θ_j and θ_k for some j and k. Due to the periodicity, |θ_j| + |θ_k| = π and, hence, k = -(j + N/2). One of these two angles corresponds to the angle of rotation of the pattern in the image with respect to the template pattern. The complementary angle corresponds to the same image pattern rotated 180°.

To find the location of the rotated pattern in the spatial domain image, one must rotate the pattern template (or input image) through the angle θ_j as well as the angle θ_k. The two templates thus obtained can then be used in one of the previous correlation methods. Pixels with the highest correlation values will correspond to the pattern location.

Comments and Observations

The following example will help to further clarify the algorithm described above. The pattern image p and input image a are shown in Figure 9.4.1. The exemplar pattern is a rectangle rotated through an angle of 15° while the input image contains the pattern rotated through an angle of 70°. Figure 9.4.2 shows the output of Step 4 and Figure 9.4.3 illustrates the conversion to polar coordinates of the images shown in Figure 9.4.2. The output of the SPOMF process (before thresholding) is shown in Figure 9.4.4. The two high peaks appear on the θ axis (r = 0).

Figure 9.4.1  The input image a is shown on the left and the pattern template p on the right.

The reason for choosing this grid spacing in Step 6 is that the maximum value of r is

which prevents mapping the polar coordinates outside the set X. Finer sampling grids will further improve the accuracy of pattern detection; however, computational costs will increase proportionally. A major drawback of this method is that it works best only when a single object is present in the image, and when the image and template backgrounds are identical.

Figure 9.4.2  The log of the spectra of a (left) and p (right).

Figure 9.4.3  Rectangular to polar conversion of the spectra of a (left) and p (right).

Figure 9.4.4  SPOMF of image and pattern shown in Figure 9.4.3.

9.5. Rotation and Scale Invariant Pattern Matching

In this section we discuss a method of pattern matching which is invariant with respect to both rotation and scale. The two main components of this method are the Fourier transform and the Mellin transform. Rotation invariance is achieved by using the approach described in Section 9.4. For scale invariance we employ the Mellin transform. Since the Mellin transform of an image a is given by

it follows that if b(x, y) = a(αx, αy), then

Therefore,

which shows that the Mellin transform is scale invariant.

Implementation of the Mellin transform can be accomplished by use of the Fourier transform by rescaling the input function. Specifically, letting γ = log x and β = log y, we have

Therefore,

which is the desired result.

It follows that combining the Fourier and Mellin transform with a rectangular to polar conversion yields a rotation and scale invariant matching scheme. The approach takes advantage of the individual invariance properties of these two transforms as summarized by the following four basic steps:

(1)  Fourier transform

(2)  Rectangular to polar conversion

(3)  Logarithmic scaling of r

(4)  SPOMF


Image Algebra Formulation

The image algebra formulation of the rotation and scale invariant pattern matching algorithm follows the same steps as those listed in the rotation invariant pattern matching scheme in Section 9.4 with the exception of Step 6, in which the point set Y should be defined as follows:

The rescaling of r_i corresponds to the logarithmic scaling of r in the Fourier-Mellin transform.
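In the polar resampling sketched in Section 9.4, only the radial grid changes for this scale-invariant version; one hedged way to express the logarithmic spacing (the exact spacing used in the handbook's point set Y is not reproduced here) is:

import numpy as np

N = 256                                                # example image size, a power of two
r_log = np.exp(np.linspace(0.0, np.log(N / 2.0), N))   # log-spaced radii in [1, N/2],
                                                       # replacing the linear r = i / 2.0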

As in the rotation invariant filter, the output of the SPOMF will result in two peaks which will be a distance π apart in the θ direction and will be offset from the θ axis by an amount proportional to the scaling factor. For 0 ≤ i < N/2, this will correspond to an enlargement of the pattern, while for N/2 < i ≤ N - 1, the proportional scaling factor will correspond to a shrinking of the pattern [5]. The following example (Figures 9.5.1 to 9.5.3) illustrates the important steps of this algorithm.

Figure 9.5.1  The input image a is shown on the left and the pattern template p on the right.

Figure 9.5.2  The log of the spectra of a (left) and p (right).

Figure 9.5.3  Rectangular to polar-log conversion of the spectra of a (left) and p (right).

9.6. Line Detection Using the Hough Transform

The Hough transform is a mapping from the xy-plane into the function space of sinusoidal functions. It was first formulated in 1962 by Hough [16]. Since its early formulation, this transform has undergone intense investigations which have resulted in several generalizations and a variety of applications in computer vision and image processing [1, 2, 17, 18, 19]. In this section we present a method for finding straight lines using the Hough transform. The input for the Hough transform is an image that has been preprocessed by some type of edge detector and thresholded (see Chapters 3 and 4). Specifically, the input should be a binary edge image.

Figure 9.5.4  SPOMF of image and pattern shown in Figure 9.5.3

A straight “line” in the sense of the Hough algorithm is a colinear set of points. Thus, the number of points in a straight line could range from one to the number of pixels along the diagonal of the image. The quality of a straight “line” is judged by the number of points in it. It is assumed that the natural straight lines in an image correspond to digitized straight “lines” in the image with relatively large cardinality.

A brute force approach to finding straight lines in a binary image with N feature pixels would be to examine all N(N - 1)/2 possible straight lines between the feature pixels. For each of the possible lines, N - 2 tests for colinearity must be performed. Thus, the brute force approach has a computational complexity on the order of N^3. The Hough algorithm provides a method of reducing this computational cost.

To begin the description of the Hough algorithm, we first define the Hough transform and examine some of its properties. The Hough transform is a mapping h from the xy-plane into the function space of sinusoidal functions defined by

To see how the Hough transform can be used to find straight lines in an image, a few observations need to be made.

Any straight line l_0 in the xy-plane corresponds to a point (ρ_0, θ_0) in the ρθ-plane, where θ_0 ∈ [0, π) and ρ_0 is real. Let n_0 be the line normal to l_0 that passes through the origin of the xy-plane. The angle n_0 makes with the positive x-axis is θ_0. The distance from (0, 0) to l_0 along n_0 is |ρ_0|. Figure 9.6.1 below illustrates the relation between l_0, n_0, θ_0, and ρ_0. Note that the x-axis in the figure corresponds to the point (0, 0), while the y-axis corresponds to the point (0, π/2).

Figure 9.6.1  Relation of rectangular to polar representation of a line.

Suppose (x_i, y_i), 1 ≤ i ≤ n, are points in the xy-plane that lie along the straight line l_0 (see Figure 9.6.1). The line l_0 has a representation (ρ_0, θ_0) in the ρθ-plane. The Hough transform takes each of the points (x_i, y_i) to a sinusoidal curve ρ = x_i cos(θ) + y_i sin(θ) in the θρ-plane. The property that the Hough algorithm relies on is that each of the curves ρ = x_i cos(θ) + y_i sin(θ) has a common point of intersection, namely (ρ_0, θ_0). Conversely, the sinusoidal curve ρ = x cos(θ) + y sin(θ) passes through the point (ρ_0, θ_0) in the ρθ-plane only if (x, y) lies on the line (ρ_0, θ_0) in the xy-plane.

As an example, consider the points (1, 7), (3, 5), (5, 3), and (6, 2) in the xy-plane that lie along the line l_0 with θ and ρ representation θ_0 = π/4 and ρ_0 = 4√2, respectively. Figure 9.6.2 shows these points and the line l_0.

Figure 9.6.2  Polar parameters associated with points lying on a line.


The Hough transform maps the indicated points to the sinusoidal functions as follows:

The graphs of these sinusoidal functions can be seen in Figure 9.6.3. Notice how the four sinusoidal curves intersect at θ_0 = π/4 and ρ_0 = 4√2.

Figure 9.6.3  Sinusoidals in Hough space associated with points on a line.

Each point (x, y) at a feature pixel in the domain of the image maps to a sinusoidal function by the Hough transform. If the feature point (x_i, y_i) of an image lies on a line in the xy-plane parameterized by (ρ_0, θ_0), its corresponding representation as a sinusoidal curve in the ρθ-plane will intersect the point (ρ_0, θ_0). Also, the sinusoid ρ = x cos(θ) + y sin(θ) will intersect (ρ_0, θ_0) only if the feature pixel location (x, y) lies on the line (ρ_0, θ_0) in the xy-plane. Therefore, it is possible to count the number of feature pixel points that lie along the line (ρ_0, θ_0) in the xy-plane by counting the number of sinusoidal curves in the ρθ-plane that intersect at the point (ρ_0, θ_0). This observation is the basis of the Hough line detection algorithm which is described next.

Obviously, it is impossible to count the number of intersections of sinusoidal curves at every point in the ρθ-plane. The ρθ-plane for -R ≤ ρ ≤ R, 0 ≤ θ < π must be quantized. This quantization is represented as an accumulator array a(i, j). Suppose that for a particular application it is decided that the ρθ-plane should be quantized into an r × c accumulator array. Each column of the accumulator represents a π/c increment in the angle θ. Each row in the accumulator represents a 2R/r increment in ρ. The cell location a(i, j) of the accumulator is used as a counting bin for the point (ρ_i, θ_j) in the ρθ-plane (and the corresponding line in the xy-plane).

Initially, every cell of the accumulator is set to 0. The value a(i, j) of the accumulator is incremented by 1 for every feature pixel (x, y) location at which the inequality

is satisfied, where (ρ_i, θ_j) is the parameter point associated with cell (i, j) and ε is an error factor used to compensate for quantization and digitization. That is, if the point (ρ_i, θ_j) lies on the curve ρ = x cos(θ) + y sin(θ) (within a margin of error), the accumulator at cell location a(i, j) is incremented. Error analysis for the Hough transform is addressed in Shapiro's works [20, 21, 22].

When the process of incrementing cell values in the accumulator terminates, each cell value a(i, j) will be equal to the number of curves ρ = x cos(θ) + y sin(θ) that intersect the point (ρ_i, θ_j) in the ρθ-plane. As we have seen earlier, this is the number of feature pixels in the image that lie on the line (ρ_i, θ_j).

The criterion for a good line in the Hough algorithm sense is a large number of colinear points. Therefore, the larger entries in the accumulator are assumed to correspond to lines in the image.
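A direct Python sketch of the accumulator construction follows; it votes once per quantized θ per feature pixel, using a rounded ρ index in the spirit of the unique-ρ computation discussed in the Comments and Observations below (the array shapes and names are our choices).

import numpy as np

def hough_lines(edges, n_theta=180, n_rho=None):
    # edges: binary feature image; returns the accumulator and the quantized angles
    H, W = edges.shape
    R = np.hypot(H, W)                                    # bound on |rho|
    if n_rho is None:
        n_rho = 2 * int(np.ceil(R)) + 1
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    acc = np.zeros((n_rho, n_theta), dtype=int)
    ys, xs = np.nonzero(edges)
    for x, y in zip(xs, ys):
        rho = x * np.cos(thetas) + y * np.sin(thetas)
        ridx = np.round((rho + R) * (n_rho - 1) / (2.0 * R)).astype(int)
        acc[ridx, np.arange(n_theta)] += 1                # one vote per quantized theta
    return acc, thetas

Large accumulator entries are then read off as candidate (ρ, θ) lines, exactly as in the causeway example later in this section.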

Image Algebra Formulation

Let b ∈ {0, 1}^X be the source image and let the accumulator image a be defined over Y, where

Define the parametrized template by

The accumulator image is given by the image algebra expression

Computation of this variant template sum is computationally intensive and inefficient. A more efficient implementation is given below.


Comments and Observations

For the c quantized values of θ in the accumulator, the computation of x cos(θ) + y sin(θ) is carried out for each of the N feature pixel locations (x, y) in the image. Next, each of the rc cells of the accumulator is examined for high counts. The computational cost of the Hough algorithm is Nc + rc, or O(N). This is a substantial improvement over the O(N^3) complexity of the brute force approach mentioned at the beginning of this section. This complexity comparison may be a bit misleading. A true comparison would have to take into account the dimensions of the accumulator array. A smaller accumulator reduces computational complexity. However, better line detection performance can be achieved with a finer quantization of the ρθ-plane.

As presented above, the algorithm can increment more than one cell in the accumulator array for any choice of values for the point (x, y) and angle θ. We can ensure that at most one accumulator cell will be incremented by calculating the unique value ρ as a function of (x, y) and the quantized θ_j as follows:

where [z] denotes the rounding of z to the nearest integer.

The equation for ρ can be used to define the neighborhood function

by

The statement

computes the accumulator image since

A straight line l_0 in the xy-plane can also be represented by a point (m_0, c_0), where m_0 is the slope of l_0 and c_0 is its y-intercept. In the original formulation of the Hough algorithm [16], the Hough transform took points in the xy-plane to lines in the slope-intercept plane; i.e., h : (x_i, y_i) → y_i = m x_i + c. The slope-intercept representation of lines presents difficulties in the implementation of the algorithm because both the slope and the y-intercept of a line go to infinity as the line approaches the vertical. This difficulty is not encountered using the ρθ-representation of a line.

As an example, we have applied the Hough algorithm to the thresholded edge image of a causeway with a bridge (Figure 9.6.4). The ρθ-plane has been quantized using the 41 × 20 accumulator seen in Table 9.6.1. Accumulator values greater than 80 were deemed to correspond to lines. Three values in the accumulator satisfied this threshold; they are indicated within the accumulator by double underlining. The three detected lines are shown in Figure 9.6.5.

The lines produced by our example probably are not the lines that a human viewer would select. A finer quantization of θ and ρ would probably yield better results. All the parameters for our example were chosen arbitrarily. No conclusions on the performance of the algorithm should be drawn on the basis of our example. It serves simply to illustrate an accumulator array. However, it is instructive to apply a straight edge to the source image to see how the quantization of the ρθ-plane affected the accumulator values.

Figure 9.6.4  Source binary image.

Figure 9.6.5  Detected lines.

Table 9.6.1 Hough Space Accumulator Values

9.7. Detecting Ellipses Using the Hough Transform

The Hough algorithm can be easily extended to finding any curve in an image that can be expressed analytically in the form f(x, p) = 0 [23]. Here, x is a point in the domain of the image and p is a parameter vector. For example, the lines of Section 9.6 can be expressed in analytic form by letting g(x, p) = x cos(θ) + y sin(θ) - ρ, where p = (θ, ρ) and x = (x, y). We will first discuss how the Hough algorithm extends for any analytic curve, using circle location to illustrate the method.

The circle (x - χ)^2 + (y - ψ)^2 = ρ^2 in the xy-plane with center (χ, ψ) and radius ρ can be expressed as f(x, p) = (x - χ)^2 + (y - ψ)^2 - ρ^2 = 0, where p = (χ, ψ, ρ). Therefore, just as a line l_0 in the xy-plane can be parameterized by an angle θ_0 and a directed distance ρ_0, a circle c_0 in the xy-plane can be parametrized by the location of its center (x_0, y_0) and its radius ρ_0.

The Hough transform used for circle detection is a map defined over feature points in the domain of the image into the function space of conic surfaces. The Hough transform h used for circle detection is the map

       

  

     

    


Note that the Hough transform is only applied to the feature points in the domain of the image. The surface (χ - x_0)^2 + (ψ - y_0)^2 - ρ^2 = 0 is a cone in χψρ-space with χψ intercept (x_0, y_0, 0) (see Figure 9.7.1).

Figure 9.7.1  The surface (χ - x_0)^2 + (ψ - y_0)^2 - ρ^2 = 0.

The point (x_i, y_i) lies on the circle (x - χ_0)^2 + (y - ψ_0)^2 = ρ_0^2 in the xy-plane if and only if the conic surface (χ - x_i)^2 + (ψ - y_i)^2 - ρ^2 = 0 intersects the point (χ_0, ψ_0, ρ_0) in χψρ-space. For example, the points (-1, 0) and (1, 0) lie on the circle f((x, y), (0, 0, 1)) = x^2 + y^2 - 1 = 0 in the xy-plane. The Hough transform used for circle detection maps these points to conic surfaces in χψρ-space as follows

These two conic surfaces intersect the point (0, 0, 1) in χψρ-space (see Figure 9.7.2).

Figure 9.7.2  The two conics corresponding to the two points (-1, 0) and (1, 0) on the circle x^2 + y^2 = 1 under the Hough transform h.

The feature point (x_i, y_i) will lie on the circle (χ_0, ψ_0, ρ_0) in the xy-plane if and only if its image under the Hough transform intersects the point (χ_0, ψ_0, ρ_0) in χψρ-space. More generally, the point x_i will lie on the curve f(x, p_0) = 0 in the domain of the image if and only if the curve f(x_i, p) = 0 intersects the point p_0 in the parameter space. Therefore, the number of feature points in the domain of the image that lie on the curve f(x, p_0) = 0 can be counted by counting the number of elements in the range of the Hough transform that intersect p_0.

As in the case of line detection, the parameter space must be quantized. The accumulator matrix is therepresentation of the quantized parameter space. For circle detection the accumulator a will be athree-dimensional matrix with all entries initially set to 0. The entry a(Çr, Ès, Át) is incremented by 1 forevery feature point (xi, yi) in the domain of the image whose conic surface in ÇÈÁ-space passes through (Çr,Ès, Át). More precisely, a(Çr, Ès, Át) is incremented provided

where µ is used to compensate for digitization and quantization. Shapiro [20, 21, 22] discusses error analysiswhen using the Hough transform. If the above inequality holds, it implies that the conic surface (Ç - xi)2 + (È -yi)2 - Á = 0 passes trough the point (Çr, Ès, Át) (within a margin of error) in ÇÈÁ space. This means the point(xi, yi) lies on the circle (x - Çr)2 + (y - Ès)2 - Át = 0 in the xy-plane, and thus the accumulator value a(Çr, Ès,Át) should be incremented by 1.
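As a concrete illustration of this incrementing rule, the sketch below builds a three-dimensional accumulator over a quantized (χ, ψ, ρ) grid. The quantization values and the tolerance ε are assumed inputs, not the book's notation.

import numpy as np

def hough_circles(edges, chi_vals, psi_vals, rho_vals, eps=1.0):
    """Vote for circle parameters (chi, psi, rho) supported by the edge pixels."""
    acc = np.zeros((len(chi_vals), len(psi_vals), len(rho_vals)), dtype=int)
    ys, xs = np.nonzero(edges)
    for x, y in zip(xs, ys):
        for r, chi in enumerate(chi_vals):
            for s, psi in enumerate(psi_vals):
                for t, rho in enumerate(rho_vals):
                    # increment when |(chi - x)^2 + (psi - y)^2 - rho^2| is within eps
                    if abs((chi - x) ** 2 + (psi - y) ** 2 - rho ** 2) < eps:
                        acc[r, s, t] += 1
    return acc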


Circles are among the most commonly found objects in images. Circles are special cases of ellipses, and circles viewed at an angle appear as ellipses. We extend circle detection to ellipse detection next. The method for ellipse detection that we will be discussing is taken from Ballard [23].

The equation for an ellipse centered at (χ, ψ) with axes parallel to the coordinate axes (Figure 9.7.3) is

Differentiating with respect to x, the equation becomes

which gives us

Substituting into the original equation for the ellipse yields

Solving for ψ we get

It then follows by substituting for ψ in the original equation for the ellipse that

For ellipse detection we will assume that the original image has been preprocessed by a directional edge detector and thresholded based on edge magnitude (Chapters 3 and 4). Therefore, we assume that an edge direction image d ∈ [0, 2π)^X exists, where X is the domain of the original image. The direction d(x, y) is the direction of the gradient at the point (x, y) on the ellipse. Since the gradient is perpendicular to the tangent to the ellipse at (x, y), the following holds:

Figure 9.7.3  Parameters of an ellipse.

Recall that so far we have only been considering the equation for an ellipse whose axes are parallel to the axes of the coordinate system. Different orientations of the ellipse, corresponding to rotations by an angle θ about (χ, ψ), can be handled by adding a fifth parameter θ to the descriptors of an ellipse. This rotation factor manifests itself in the expression for the derivative, which becomes

With this edge direction and orientation information we can write χ and ψ as

and

respectively.

The accumulator array for ellipse detection will be a five-dimensional array a. Every entry of a is initially set to zero. For every feature point (x, y) of the edge direction image, the accumulator cell a(χr, ψs, θt, αu, βv) is incremented by 1 whenever

and

Larger accumulator entry values are assumed to correspond to better ellipses. If an accumulator entry is judged large enough, its coordinates are deemed to be the parameters of an ellipse in the original image.

It is important to note that gradient information is used in the preceding description of an ellipse. As a consequence, gradient information is used in determining whether a point lies on an ellipse. Gradient information shows up as the derivative term in the equations derived above. The incorporation of gradient information improves the accuracy and computational efficiency of the algorithm. Our original example of circle detection did not use gradient information. However, circles are special cases of ellipses, and circle detection using gradient information follows immediately from the description of the ellipse detection algorithm.
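The remark above is easy to make concrete for circles: at an edge point of a circle, the gradient points along the line through the center, so each feature point votes only for centers lying along that line, which collapses one dimension of the search. The sketch below illustrates this idea; it is not the book's formulation, and the candidate radius list, accumulator layout, and sign convention for the gradient are assumptions.

import numpy as np

def hough_circles_with_gradient(edges, grad_dir, radii, shape):
    """Vote for circle centers using edge points and their gradient directions."""
    acc = np.zeros((shape[0], shape[1], len(radii)), dtype=int)
    ys, xs = np.nonzero(edges)
    for x, y in zip(xs, ys):
        d = grad_dir[y, x]
        for t, rho in enumerate(radii):
            # candidate centers lie a distance rho from (x, y) along +/- the gradient
            for sign in (+1, -1):
                cx = int(round(x + sign * rho * np.cos(d)))
                cy = int(round(y + sign * rho * np.sin(d)))
                if 0 <= cx < shape[1] and 0 <= cy < shape[0]:
                    acc[cy, cx, t] += 1
    return acc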

Image Algebra Formulation

The input image b = (c, d) for the Hough algorithm is the result of preprocessing the original image by a directional edge detector and thresholding based on edge magnitude. The image c ∈ {0, 1}^X is defined by

The image d ∈ [0, 2π)^X contains edge direction information.

Let a be the accumulator image, where

Let C(x, y, χ, ψ, θ, α, β, ε1, ε2) denote the condition

Define the parameterized template t by

The accumulator array is constructed using the image algebra expression

Similar to the implementation of the Hough transform for line detection, efficient incrementing of accumulator cells can be obtained by defining the neighborhood function N : Y → 2^X by

The accumulator array can now be computed by using the following image algebra pseudocode:


9.8. Generalized Hough Algorithm for Shape Detection

In this section we show how the Hough algorithm can be generalized to detect non-analytic shapes. A non-analytic shape is one that does not have a representation of the form f(x, p) = 0 (see Section 9.7). The lack of an analytic representation for the shape is compensated for by the use of a lookup table. This lookup table is often referred to as the shape's R-table. We begin our discussion by showing how to construct an R-table for an arbitrary shape. Next we show how the R-table is used in the generalized Hough algorithm. The origin of this technique can be found in Ballard [23].

Let S be an arbitrary shape in the domain of the image (see Figure 9.8.1). Choose any point (χ, ψ) off the boundary of S to serve as a reference point for the shape. This reference point will serve to parameterize the shape S(χ, ψ). For any point (x, y) on the boundary of S(χ, ψ), let the gradient angle at (x, y) be measured relative to the xy coordinate axes. The vector from (x, y) to (χ, ψ) can be expressed as a magnitude-direction pair (r(x, y), α(x, y)), where r(x, y) is the distance from (x, y) to (χ, ψ) and α(x, y) is the angle this vector makes with the positive x axis.

The R-table is indexed by the values of the gradient angle for points on the boundary of S(χ, ψ). To each gradient angle in the R-table there corresponds a set of (r, α) magnitude-angle pairs. The pair (r, α) is an element of that set if and only if there exists a point (x, y) on the boundary of S(χ, ψ) whose gradient direction is the indexing angle and whose vector to the reference point has magnitude r and direction α. Thus,

which we will shorten for notational ease to

where the index runs over the quantized gradient angles. Table 9.8.1 illustrates the structure of an R-table. It provides a representation of the non-analytic shape S(χ, ψ) that can be used in the generalized Hough algorithm.

Figure 9.8.1  Non-analytic shape representation.

Table 9.8.1 The Structure of an R-Table

If (x, y) ∈ ∂S(χ, ψ), there is an angle entry in the R-table, indexed by some i with 1 ≤ i ≤ K, that matches the gradient direction at (x, y). For that entry there is a magnitude-angle pair in the corresponding set such that

and

An important observation to make about the R-table is that by indexing on the gradient direction, gradient information is incorporated into the shape's description. As in the case of ellipse detection (Section 9.7), this will make the Hough algorithm more accurate and more computationally efficient.
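A minimal sketch of R-table construction is given below. It assumes the boundary points and their gradient directions have already been extracted, and it quantizes the gradient angle into a fixed number of bins; the bin count and the data layout are illustrative choices, not the book's notation.

import math
from collections import defaultdict

def build_r_table(boundary_points, gradient_dirs, reference, n_bins=64):
    """Map quantized gradient angle -> list of (r, alpha) vectors to the reference point."""
    chi, psi = reference
    r_table = defaultdict(list)
    for (x, y), phi in zip(boundary_points, gradient_dirs):
        dx, dy = chi - x, psi - y
        r = math.hypot(dx, dy)                                   # distance to the reference point
        alpha = math.atan2(dy, dx)                               # direction of that vector
        k = int(phi % (2 * math.pi) / (2 * math.pi) * n_bins)    # gradient-angle bin
        r_table[k].append((r, alpha))
    return r_table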

As it stands now, the shape S(χ, ψ) is parameterized only by the location of its reference point. The R-table for S(χ, ψ) can only be used to detect shapes that are the result of translations of S(χ, ψ) in the xy-plane.

A scaling factor can be accommodated by noting that a scaling of S(χ, ψ) by s is represented by scaling all the vectors in each of the R-table entries, 1 ≤ i ≤ K, by s. If a set of vectors corresponds to a given gradient angle for the original shape, then the set of those same vectors scaled by s corresponds to that angle for the scaled shape. Thus, information from the R-table is easily adapted to take care of different scaling parameters.

A rotation of the shape through an angle θ is also easily represented through adaptation of R-table information. The set of vectors corresponding to a given gradient angle in an R-table for the rotated shape is equal to a set of vectors of the original shape, with the indexing angle shifted by θ mod 2π and each vector rotated through the angle θ. Therefore, information from an R-table representing the shape S(χ, ψ) parameterized only by the location of its reference point can be adapted to represent a more general shape parameterized by reference point, scale, and orientation.

Having shown how an R-table is constructed, we can discuss how it is used in the generalized Hough algorithm. The input for the generalized Hough algorithm is an edge image with gradient information that has been thresholded based on edge magnitude. Let d(x, y) denote the direction of the gradient at edge point (x, y).


The entries of the accumulator are indexed by a quantization of the shape's χψsθ parameter space. Thus, the accumulator is a four-dimensional array with entries

Entry a(χs, ψt, su, θw) is the counting bin for points in the xy-plane that lie on the boundary of the shape S(χs, ψt, su, θw). Initially all the entries of the accumulator are 0.

For each feature edge pixel (x, y), add 1 to accumulator entry a(χs, ψt, su, θw) if the following condition, denoted C(x, y, χs, ψt, su, θw, ε1, ε2, ε3), holds. For positive numbers ε1, ε2, and ε3, the condition C(x, y, χs, ψt, su, θw, ε1, ε2, ε3) is satisfied if there exists a k, 1 ≤ k ≤ Nj, such that

and

where the kth pair is taken from the R-table set selected by the gradient direction d(x, y). The ε1, ε2, and ε3 are error tolerances that allow for quantization and digitization.

Larger counts at an accumulator cell mean a higher probability that the indices of the accumulator cell are the parameters of a shape in the image.
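To make the voting step concrete, the sketch below casts votes for the reference-point location only, at a fixed scale and orientation, reusing the hypothetical build_r_table helper sketched earlier; looping over quantized scales and rotations would add the remaining two accumulator dimensions.

import math
import numpy as np

def generalized_hough_votes(edges, grad_dir, r_table, shape, n_bins=64):
    """Accumulate reference-point votes for a shape described by an R-table."""
    acc = np.zeros(shape, dtype=int)
    ys, xs = np.nonzero(edges)
    for x, y in zip(xs, ys):
        k = int(grad_dir[y, x] % (2 * math.pi) / (2 * math.pi) * n_bins)
        for r, alpha in r_table.get(k, ()):
            chi = int(round(x + r * math.cos(alpha)))   # candidate reference point
            psi = int(round(y + r * math.sin(alpha)))
            if 0 <= chi < shape[1] and 0 <= psi < shape[0]:
                acc[psi, chi] += 1
    return acc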

Image Algebra Formulation

The input image b = (c, d) for the generalized Hough algorithm is the result of preprocessing the original image by a directional edge detector and thresholding based on edge magnitude. The image c ∈ {0, 1}^X is defined by

The image d ∈ [0, 2π)^X contains edge gradient direction information. The accumulator a is defined over the quantized parameter space of the shape

Define the template t by

The set used by the template is the set from the R-table that corresponds to the gradient angle. The accumulator is constructed with the image algebra statement

Again, in actual implementation it will be more efficient to construct a neighborhood function N : Y → 2^X satisfying the above parameters, in a fashion similar to the construction of the neighborhood function in the previous section (Section 9.7), and then to use the three-line algorithm:

9.9. References

1  R. Gonzalez and P. Wintz, Digital Image Processing. Reading, MA: Addison-Wesley, second ed., 1987.

2  D. Ballard and C. Brown, Computer Vision. Englewood Cliffs, NJ: Prentice-Hall, 1982.

3  W. Rudin, Real and Complex Analysis. New York, NY: McGraw-Hill, 1974.

4  A. Oppenheim and J. Lim, "The importance of phase in signals," IEEE Proceedings, vol. 69, pp. 529-541, 1981.

5  Q. Chen, M. Defrise, and F. Deconick, "Symmetric phase-only matched filtering of Fourier-Mellin transforms for image registration and recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 16, pp. 1156-1168, July 1994.

6  V. K. P. Kumar and Z. Bahri, "Efficient algorithm for designing a ternary valued filter yielding maximum signal to noise ratio," Applied Optics, vol. 28, no. 10, 1989.

7  O. K. Ersoy and M. Zeng, "Nonlinear matched filtering," Journal of the Optical Society of America, vol. 6, no. 5, pp. 636-648, 1989.

8  J. Horver and P. Gianino, "Pattern recognition with binary phase-only filters," Applied Optics, vol. 24, pp. 609-611, 1985.

9  D. Flannery, J. Loomis, and M. Milkovich, "Transform-ratio ternary phase-amplitude filter formulation for improved correlation discrimination," Applied Optics, vol. 27, no. 19, pp. 4079-4083, 1988.

10  D. Casasent and D. Psaltis, "Position, rotation and scale-invariant optical correlation," Applied Optics, vol. 15, pp. 1793-1799, 1976.

11  D. Casasent and D. Psaltis, "New optical transform for pattern recognition," Proceedings of the IEEE, vol. 65, pp. 77-84, 1977.

12  Y. Hsu, H. Arsenault, and G. April, "Rotation-invariant digital pattern recognition using circular harmonic expansion," Applied Optics, vol. 21, no. 22, pp. 4012-4015, 1982.

13  Y. Sheng and H. Arsenault, "Experiments on pattern recognition using invariant Fourier-Mellin descriptors," Journal of the Optical Society of America, vol. 3, no. 6, pp. 771-776, 1986.

14  E. D. Castro and C. Morandi, "Registration of translated and rotated images using finite Fourier transforms," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 9, no. 5, pp. 700-703, 1987.

15  G. Ritter, "Image algebra with applications." Unpublished manuscript, available via anonymous ftp from ftp://ftp.cis.ufl.edu/pub/src/ia/documents, 1994.

16  P. Hough, "Method and means for recognizing complex patterns," Dec. 18, 1962. U.S. Patent 3,069,654.

17  R. Duda and P. Hart, "Use of the Hough transform to detect lines and curves in pictures," Communications of the ACM, vol. 15, no. 1, pp. 11-15, 1972.

18  E. R. Davies, Machine Vision: Theory, Algorithms, Practicalities. New York: Academic Press, 1990.

19  A. Rosenfeld, Picture Processing by Computer. New York: Academic Press, 1969.

20  S. D. Shapiro, "Transformations for the noisy computer detection of curves in noisy pictures," Computer Graphics and Image Processing, vol. 4, pp. 328-338, 1975.

21  S. D. Shapiro, "Properties of transforms for the detection of curves in noisy images," Computer Graphics and Image Processing, vol. 8, pp. 219-236, 1978.

22  S. D. Shapiro, "Feature space transforms for curve detection," Pattern Recognition, vol. 10, pp. 129-143, 1978.

23  D. Ballard, "Generalizing the Hough transform to detect arbitrary shapes," Pattern Recognition, vol. 13, no. 2, pp. 111-122, 1981.


Chapter 10
Image Features and Descriptors

10.1. Introduction

The purpose of this chapter is to provide examples of image feature and descriptor extraction in an image algebra setting. Image features are useful extractable attributes of images or regions within an image. Examples of image features are the histogram of pixel values and the symmetry of a region of interest. Histograms of pixel values are considered a primitive characteristic, or low-level feature, while such geometric descriptors as symmetry or orientation of regions of interest provide examples of high-level features. Significant computational effort may be required to extract high-level features from a raw gray-value image.

Generally, image features provide ways of describing image properties, or properties of image components, that are derived from a number of different information domains. Some features, such as histograms and texture features, are statistical in nature, while others, such as the Euler number or boundary descriptors of regions, fall in the geometric domain. Geometric features are key in providing structural descriptors of images. Structural descriptions of images are of significant interest. They are an important component of high-level image analysis and may also provide storage-efficient representations of image data. The objective of high-level image analysis is to interpret image information. This involves relating image gray levels to a variety of non-image-related knowledge underlying scene objects and reasoning about the contents and structures of complex scenes in images. Typically, high-level image analysis starts with structural descriptions of images, such as a coded representation of an image object or the relationships between objects in an image. The descriptive terms "inside of" and "adjacent to" denote relations between certain objects. Representation and manipulation of these kinds of relations is fundamental to image analysis. Here one assumes that a preprocessing algorithm has produced a regionally segmented image whose regions represent objects in the image, and considers extracting the relations involving the image regions from the regionally segmented image.

10.2. Area and Perimeter

Area and perimeter are commonly used descriptors for regions in the plane. In this section, image algebra formulations for the calculation of area and perimeter are presented.

Image Algebra Formulation

Let R denote the region in the image whose points have pixel value 1. One way to calculate the area A of R is simply to count the number of points in R. This can be accomplished with the image algebra statement

The perimeter P of R may be calculated by totaling the number of 0-valued 4-neighbors for each point in R. The image algebra statement for the perimeter of R using this definition is given by

where the templates t1 and t2 are defined pictorially as
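A direct NumPy sketch of these two counts is given below: area as the number of 1-pixels, and perimeter as the total number of 0-valued 4-neighbors over all region pixels. Treating everything outside the image as background via zero padding is an assumption of the sketch.

import numpy as np

def area_and_perimeter(a):
    """a is a binary image; return (area, perimeter) of its 1-valued region."""
    a = (np.asarray(a) > 0).astype(int)
    area = int(a.sum())                            # number of points in R
    padded = np.pad(a, 1)                          # zero border outside the image
    zero_neighbors = (
        (padded[:-2, 1:-1] == 0).astype(int) + (padded[2:, 1:-1] == 0) +
        (padded[1:-1, :-2] == 0) + (padded[1:-1, 2:] == 0)
    )
    perimeter = int((zero_neighbors * a).sum())    # 0-valued 4-neighbors of R points
    return area, perimeter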

Alternate Image Algebra Formulation

The formulas for area and perimeter that follow have been adapted from Pratt [1]. They provide more accurate measures for images of continuous objects that have been digitized. Both the formula for area and the formula for perimeter use the template t defined below.

Let b := a ⊕ t. The alternate formulations for area and perimeter are given by

and

respectively.

10.3. Euler Number

The Euler number is a topological descriptor for binary images. It is defined to be the number of connected components minus the number of holes inside the connected components [2].

There are two Euler numbers for a binary image a, which we denote e4(a) and e8(a). Each is distinguished by the connectivity used to determine the number of feature pixel components and the connectivity used to determine the number of non-feature pixel holes contained within the feature pixel connected components. The Euler number of a binary image is referenced by the connectivity used to determine the number of feature pixel components.

The 4-connected Euler number, e4(a), is defined to be the number of 4-connected feature pixel components minus the number of 8-connected holes, that is

e4(a) = c4(a) - h8(a).

Here, c4(a) denotes the number of 4-connected feature components of a and h8(a) is the number of 8-connected holes within the feature components.

The 8-connected Euler number, e8(a), of a is defined by

e8(a) = c8(a) - h4(a),

where c8(a) denotes the number of 8-connected feature components of a and h4(a) is the number of 4-connected holes within the feature components.
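One way to obtain these counts directly, and to cross-check the template formulation given later, is to label the components and holes explicitly. The sketch below uses SciPy's labeling routine and assumes that background connected to the image border is not counted as a hole.

import numpy as np
from scipy import ndimage

FOUR = ndimage.generate_binary_structure(2, 1)    # 4-connectivity
EIGHT = ndimage.generate_binary_structure(2, 2)   # 8-connectivity

def euler_numbers(a):
    """Return (e4, e8) for a binary image a by counting components and holes."""
    a = np.asarray(a) > 0
    bg = np.pad(~a, 1)                     # pad so the outer background is one piece
    c4 = ndimage.label(a, FOUR)[1]
    c8 = ndimage.label(a, EIGHT)[1]
    h8 = ndimage.label(bg, EIGHT)[1] - 1   # 8-connected background pieces minus the outside
    h4 = ndimage.label(bg, FOUR)[1] - 1    # 4-connected background pieces minus the outside
    return c4 - h8, c8 - h4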

Table 10.3.1 shows Euler numbers for some simple pixel configurations. Feature pixels are black.

Table 10.3.1 Examples of Pixel Configurations and Euler Numbers

No. a c4(a) c8(a) h4(a) h8(a) e4(a) e8(a)

1. 1 1 0 0 1 1

2. 5 1 0 0 5 1

3. 1 1 1 1 0 0

4. 4 1 1 0 4 0

5. 2 1 4 1 1 -3

6. 1 1 5 1 0 -4

7. 2 2 1 1 1 1


Image Algebra Formulation

Let a ∈ {0, 1}^X be a Boolean input image and let b := a ⊕ t, where t is the template represented in Figure 10.3.1. The 4-connected Euler number is expressed as

and the 8-connected Euler number is given by

Figure 10.3.1  Pictorial representation of the template used for determining the Euler number.

The image algebra expressions for the two types of Euler numbers were derived from the quad pattern formulations given in [1, 3].

Alternate Image Algebra Formulation

The formulation for the Euler numbers could also be expressed using lookup tables. Lookup tables may be more efficient. The lookup table for the 4-connected Euler number is

and the lookup table for the 8-connected Euler number is

The 4-connected and 8-connected Euler numbers can then be evaluated using lookup tables via the expressions

respectively.

10.4. Chain Code Extraction and Correlation

The chain code provides a storage-efficient representation for the boundary of an object in a Boolean image. The chain code representation incorporates such pertinent information as the length of the boundary of the encoded object, its area, and moments [4, 5]. Additionally, chain codes are invertible in that an object can be reconstructed from its chain code representation.

The basic idea behind the chain code is that each boundary pixel of an object has an adjacent boundary pixel neighbor whose direction from the given boundary pixel can be specified by a unique number between 0 and 7. Given a pixel, consider its eight neighboring pixels. Each 8-neighbor can be assigned a number from 0 to 7 representing one of eight possible directions from the given pixel (see Figure 10.4.1). This is done with the same orientation throughout the entire image.

Figure 10.4.1  The 8-neighborhood of x and the associated eight directions.

The chain code for the boundary of a Boolean image is a sequence of integers, c = {c(0), c(1), …, c(n - 1)}, from the set {0, 1, …, 7}. The number of elements in the sequence c is called the length of the chain code. The elements c(0) and c(n - 1) are called the initial and terminal points of the code, respectively. Starting at a given base point, the boundary of an object in a Boolean image can be traced out using the head-to-tail directions that the chain code provides. Figure 10.4.2 illustrates the process of tracing out the boundary of the SR71 by following direction vectors. The information of Figure 10.4.2 is then "flattened" to derive the chain code for its boundary. Suppose we choose the topmost left feature pixel of Figure 10.4.2 as the base point for the boundary encoding. The chain code for the boundary of the SR71 is the sequence

7, 6, 7, 7, 0, …, 1, 1, 1.

Given the base point and the chain code, the boundary of the SR71 can be completely reconstructed. The chain code is an efficient way of storing boundary information because it requires only three bits (2³ = 8) to determine any one of the eight directions.

The chain code correlation function provides a measure of similarity in shape and orientation between two objects via their chain codes. Let c and c2 be two chains of length m and n, respectively, with m ≤ n. The chain code correlation function is defined by

The value of j at which ρcc2 takes on its maximum is the offset of the segment of c2 that best matches c. The closer the maximum is to 1, the better the match.

Image Algebra Formulation

Chain Code Extraction

In this section we present an image algebra formulation for extracting the chain code from an 8-connected object in a Boolean image. This algorithm is capable of extracting the chain code from objects typically considered to be poor candidates for chain coding purposes. Specifically, this algorithm is capable of coping with pinch points and feelers (as shown in Figure 10.4.3).

Figure 10.4.2  Chain code directions with associated direction numbers.

Let a ∈ {0, 1}^X be the source image. The chain code extraction algorithm proceeds in three phases. The first phase uses the census template t shown in Figure 10.4.4 together with the linear image-template product operation ⊕ to assign each point x ∈ X a number between 0 and 511. This number represents the configuration of the pixel values of a in the 8-neighborhood about x. Neighborhood configuration information is stored in the image n, where

n := a ⊕ t.

This information will be used in the last phase of the algorithm to guide the selection of directions during the extraction of the chain code.

Figure 10.4.3  Boolean object with a pinch point and feelers.

Figure 10.4.4  Census template used for chain code extraction.

The next step of the algorithm extracts a starting point (the initial point) on the interior 8-boundary of the object and provides an initial direction. This is accomplished by the statement:

Here we assume the lexicographical (row scanning) order. Hence, the starting point x0 is the lexicographical minimum of the point set domain(a||1). For a clockwise traversal of the object's boundary the starting direction d is initially set to 0.

There are three functions, bk, δ, and f, that are used in the final phase of the algorithm. The function bk(i) returns the kth bit in the binary representation of the integer i. For example, b2(6) = 1, b1(6) = 1, and b0(6) = 0.

Given a direction, the function δ provides the increments required to move to the next point along the path of the chain code in X. The values of δ are given by

δ(0) = (0, 1),   δ(1) = (-1, 1),   δ(2) = (-1, 0),   δ(3) = (-1, -1),
δ(4) = (0, -1),  δ(5) = (1, -1),   δ(6) = (1, 0),    δ(7) = (1, 1).


The key to the algorithm is the function f. Given a previous direction d and a neighborhood characterization number c = n(x), the function f provides the next direction in the chain code. The value of f is computed using the following pseudocode:

f := -1
i := 0
while f = -1 loop
    f := (d + 2 - i) mod 8
    if bf(c) = 0 then
        f := -1
    end if
    i := i + 1
end loop.

Note that f as defined above is designed for a clockwise traversal of the object's boundary. For a counterclockwise traversal, the line

f := (d + 2 - i) mod 8

should be replaced by

f := (d + 6 + i) mod 8.

Also, the starting direction d for the algorithm should be set to 5 for the counterclockwise traversal.

Let a ∈ {0, 1}^X be the source image and let c be the image variable used to store the chain code. Combining the three phases outlined above, we arrive at the following image algebra pseudocode for chain code extraction from an 8-connected object contained in a:

n := a ⊕ t
x := x0
d := 0
d0 := f(d, n(x0))
i := 0
loop
    d := f(d, n(x))
    if d = d0 and x = x0 then
        break
    end if
    c(i) := d
    x := x + δ(d)
    i := i + 1
end loop.

In order to reconstruct the object from its chain code, the algorithm below can be used to generate the object's interior 8-boundary.

a := 0
a(x0) := 1
x := x0 + δ(c(0))
for i in 1..n - 1 loop
    a(x) := 1
    x := x + δ(c(i))
end loop.
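A small Python rendering of the reconstruction step is given below; the direction table follows the clockwise convention listed above, and the image shape and base point are assumed arguments.

import numpy as np

# DELTA[d] gives the (row, column) step for chain code direction d
DELTA = {0: (0, 1), 1: (-1, 1), 2: (-1, 0), 3: (-1, -1),
         4: (0, -1), 5: (1, -1), 6: (1, 0), 7: (1, 1)}

def boundary_from_chain_code(chain, base_point, shape):
    """Rebuild the interior 8-boundary of an object from its chain code."""
    a = np.zeros(shape, dtype=np.uint8)
    x = base_point
    a[x] = 1
    for d in chain:
        x = (x[0] + DELTA[d][0], x[1] + DELTA[d][1])
        a[x] = 1
    return a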

The region bounded by the 8-boundary can then be filled using one of the hole filling algorithms of Section 6.6.

Chain Code Correlation

Let c and c2 be two chains of lengths m and n, respectively, where m ≤ n. Let fj be the function defined by fj(i) = (i + j) mod n. The image that represents the chain code correlation between c and c2 is given by

Comments and Observations

The image algebra pseudocode algorithm presented here was developed by Robert H. Forsman at the Center for Computer Vision and Visualization of the University of Florida. To improve the performance of the chain code extraction algorithm, the function f used for chain code extraction should be implemented using a lookup table. Such an implementation is given in Ritter [6].

10.5. Region Adjacency

A region, for the purposes of this section, is a subset of the domain of an image a whose points are all mapped to the same (or similar) pixel value by a. Regions will be indexed by the pixel value of their members; that is, region Ri is the set {x ∈ X : a(x) = i}.

Adjacency of regions is an intuitive notion, which is formalized by the adj relation.

Here d denotes the Euclidean distance between x = (x1, x2) and y = (y1, y2). In other words, Ri and Rj are adjacent if and only if there is a point in Ri and a point in Rj that are a distance of one unit apart.

The adjacency relation, denoted by adj, can be represented by a set of ordered pairs, an image, or a graph. The ordered pair representation of the adj relation is the set

The adj relation is symmetric (Ri adj Rj ⇒ Rj adj Ri). Hence, the above set contains redundant information regarding the relation. This redundancy is eliminated by using the set {(i, j) : i > j and Ri adj Rj} to represent the adjacency relation.

The image b ∈ {0, 1}^Y defined by

where Y = {(i, j) : 2 ≤ i ≤ ∨a, 1 ≤ j < i} is the image representation of the adjacency relation among the regions in X.


Region numbers form the vertices of the graph representation of the adj relation; edges are the pairs (i, j) where Ri adj Rj. Thus, there are three easy ways to represent the notion of region adjacency. As an example, we provide the three representations for the adjacency relation defined over the regions in Figure 10.5.1.

Figure 10.5.1  Albert the alligator (Region 1) and environs.

(a)  Ordered pair representation —

adj = {(2, 1), (3, 2), (4, 2), (5, 2), (5, 4), (6, 1), (6, 2), (6, 5)}

(b)  Image representation —

i \ j   1   2   3   4   5
  2     1
  3     0   1
  4     0   1   0
  5     0   1   0   1
  6     1   1   0   0   1

(c)  Graph representation —

Image Algebra Formulation

Let a be the input image, where X is an m × n array. In order for the adjacency relation to be meaningful, the image a in most applications will be the result of preprocessing an original image with some type of segmentation technique, e.g., component labeling (Sections 6.2 and 6.3).

The adjacency relation for regions in X will be represented by the image b ∈ {0, 1}^Y, where Y = {(i, j) : 2 ≤ i ≤ ∨a, 1 ≤ j < i}. The parameterized template used to determine adjacency is defined by

The adjacency representation image is generated by the image algebra statement

For a fixed point (x0, y0) of X, the result of applying the template is an image in {0, 1}^Y. If regions Ri and Rj "touch" at the point (x0, y0), its value at (i, j) is equal to 1; otherwise it is 0. If regions Ri and Rj touch at any point in X, the maximum over all points of X will take on value 1; otherwise it will be 0. Therefore,

b(i, j) is equal to 1 if Ri and Rj are adjacent and 0 otherwise. The ordered pair representation of the adjacency relation is the set

domain(b||=1).
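For a labeled image held as a NumPy array, the same relation can be collected directly by comparing each pixel with its right and lower neighbors; this sketch returns the non-redundant pairs {(i, j) : i > j}.

import numpy as np

def adjacency_pairs(labels):
    """Return the set of pairs (i, j), i > j, of 4-adjacent regions in a labeled image."""
    labels = np.asarray(labels)
    pairs = set()
    left, right = labels[:, :-1], labels[:, 1:]    # horizontal neighbors
    up, down = labels[:-1, :], labels[1:, :]       # vertical neighbors
    for p, q in ((left, right), (up, down)):
        diff = p != q
        for i, j in zip(p[diff].ravel(), q[diff].ravel()):
            pairs.add((int(max(i, j)), int(min(i, j))))
    return pairs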

Comments and Observations

In the definition of adjacency above, two regions are adjacent if there is a point in one region and a point in the other region that are unit distance apart. That is, each point is in the 4-neighborhood of the other. Another definition of adjacency is that two regions are adjacent if they touch along a diagonal direction. With the second definition, two regions are adjacent if there is in each region a point which lies in the 8-neighborhood of the other. Another way to express the second adjacency relation is that there exist points in each region that are at distance √2 of each other.

If the second definition of adjacency is preferred, the template below should replace the template used in the original image algebra formulation.

The image algebra formulation of the algorithm presented above was first formulated in Ritter [6] and Shi and Ritter [7].

10.6. Inclusion Relation

The inclusion relation describes the hierarchical structure of regions in an image using the relation "inside of." Regions are defined as they were in Section 10.5. The inclusion relation can be represented by a set of ordered pairs, a directed graph, or an image. As an example, we present representations for the image shown in Figure 10.6.1.

Figure 10.6.1  Inclusion of smaller fishes inside of a bigger fish.

(a)  Ordered pairs — The ordered pair (i, j) is an element of the inside of relation iff region Ri is inside of region Rj. For Figure 10.6.1 the inside of relation is represented by the set of ordered pairs

{(6, 2), (6, 1), (5, 3), (5, 1), (4, 2), (4, 1), (3, 1), (2, 1), (1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6)}.

(b)  Directed graph — The vertices of the graph representation of the inclusion relation are region numbers. The directed edge (i, j), with tail at i and head at j, is an element of the graph iff Ri inside of Rj. The directed graph representation of the inclusion relation is shown below. The inside of relation is reflexive; thus, in the graph, each region is assumed to have an edge that points back to itself.

(c)  Image — The inclusion relation is represented by the image a defined by

The following is the image representation of the inclusion relation for Figure 10.6.1.

i \ j   1   2   3   4   5   6
  1     1   0   0   0   0   0
  2     1   1   0   0   0   0
  3     1   0   1   0   0   0
  4     1   1   0   1   0   0
  5     1   0   1   0   1   0
  6     1   1   0   0   0   1

Note that regions are sets; however, the inside of relation and the set inclusion (⊆) relation are distinct concepts. Two algorithms used to describe the inclusion relation among regions in an image are presented next.


Image Algebra Formulation

The first image algebra formulation bases its method on the tenet that Ri is inside of Rj iff the domain of Ri is contained (⊆) in the holes of Rj. The algorithm begins by making binary images ri = χ=i(a) and rj = χ=j(a) to represent the regions Ri and Rj, respectively, of the source image a. Next, one of the hole filling algorithms of Section 6.6 is used to fill the holes in rj to create the filled image. Region Ri is inside of Rj iff ri is contained in the filled image.

Let a be the source image defined on an m × n array X, and let k = ∨a. We shall construct an image b that will represent the inclusion relation. Initially, all pixel values of b are set equal to 1. The image algebra pseudocode used to create the image representation b of the inclusion relation among the regions of a is given by

The subroutine holefill is an implementation of one of the hole filling algorithms from Section 6.6.

Alternate Image Algebra Formulation

The algorithm above requires repeated applications of the hole filling algorithm. Each application of the hole filling algorithm is, in itself, computationally very expensive. The alternate image algebra formulation assumes that regions at the same level of the inside of graph are not adjacent. This assumption allows for greater computational efficiency at the cost of a loss of generality.

The alternate image algebra formulation makes its determination of inclusion based on pixel value transitions along a row, starting from a point (x0, y0) inside a region and moving in a direction toward the image's boundary. More specifically, if the image a is defined over an m × n grid, the number of transitions from region Ri to region Rj starting at (x0, y0) and moving along a row toward the right side of the image is given by

card{k : 1 ≤ k ≤ n - y0 and (i, j) = (a(x0, y0 + k - 1), a(x0, y0 + k))}.

For example, in the sequence

(x0, y0)                                   (x0, n)
a(x, y)   3  4  4  6  6  4  4  6  6  7  7  1
k         1  2  3  4  5  6  7  8  9 10 11 12

there are two transitions from 4 to 6 and one transition from 6 to 4.

It follows from elementary plane topology that if Ri is inside of Rj, then starting from a point in Ri the number of transitions from i to j is greater than the number of transitions from j to i. The image algebra formulation that follows bases its method on this fact.
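A minimal sketch of this transition-counting test follows; it scans one row to the right of a chosen start point lying in region i and compares the two transition counts. Selecting the start point and aggregating the test over all region pairs are left out.

import numpy as np

def inside_of(labels, start, i, j):
    """Test 'Ri inside of Rj' by counting i->j versus j->i transitions along a row."""
    row, col = start                      # start is assumed to lie in region i
    line = np.asarray(labels)[row, col:]
    i_to_j = np.sum((line[:-1] == i) & (line[1:] == j))
    j_to_i = np.sum((line[:-1] == j) & (line[1:] == i))
    return i_to_j > j_to_i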

Let a be the source image, defined on an m × n array, and let I be the ordered pair representation of the inclusion relation. Define the parameterized template by

Let f be the reflection f(x, y) = (y, x). The alternate image algebra formulation used to generate the ordered pair representation of the inclusion relation for regions in a is

Here h ∘ f denotes the composition of h and f. A nice property of this algorithm is that one execution of the for loop finds all the regions that contain the region whose index is the current value of the loop variable. The first algorithm presented required nested for loops.

Note that for each point (x0, y0) the template produces an image whose value at (i, j) indicates whether the line crosses the boundary from Ri to Rj at (x0, y0 + k - 1). Thus, summing over all such points gives the total number of boundary crossings from Ri to Rj, and h(i, j) > h(j, i) implies Ri inside of Rj.

Comments and Observations

Both formulations, which were first presented in Shi and Ritter [7], produce a preordering on the set of regions. That is, the inside of relation is reflexive and transitive. The first formulation is not a partial ordering because it fails to be antisymmetric. To see this, consider an image that consists of two regions, one of black pixels and the other of white pixels, that together make a checkerboard pattern of one-pixel squares. Under its assumption that regions at the same level are not adjacent, the second formulation does produce a partial ordering.


10.7. Quadtree Extraction

A quadtree is a hierarchical representation of a 2^n × 2^n binary image based on successive partitioning of the image into quadrants. Quadtrees provide effective structural descriptions of binary images, and the operations on quadtrees are simple and elegant. Gargantini [8] introduced a new representation of a quadtree called the linear quadtree. In a linear quadtree, each black node (foreground node) is encoded with a quaternary integer whose digits reflect successive quadrant subdivisions, and a special marker is needed to encode big nodes containing more than one pixel. Here, we encode a quadtree as a set of integers in base 5 as follows:

Each black node is encoded as an n-digit integer in base 5. Each successive digit represents the quadrant subdivision from which it originates. At the kth level, the NW quadrant is encoded with 1 at the kth digit, the NE with 2, the SW with 3, and the SE with 4. For a black node at the kth level, its last n - k digits are encoded as 0's.

Figure 10.7.1  A binary image and its quadtree.

Figure 10.7.2  Codes of black pixels.

To obtain the set of quadtree codes for a 2^n × 2^n binary image, we assign the individual black pixels of the image their corresponding quadtree codes and work upwards, merging 4 codes representing 4 black nodes at a lower level into a code representing the black node consisting of those 4 black nodes whenever possible. The merging process can be performed as follows: whenever 4 black nodes at the (k + 1)th level agree in their first k digits and carry the digits 1, 2, 3, and 4 at the (k + 1)th position, we merge those codes into the single code with 0 at the (k + 1)th position, which represents a kth-level black node consisting of the 4 black nodes at the (k + 1)th level. This scheme provides an algorithm that transforms a 2^n × 2^n binary image into its quadtree representation.
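A small recursive sketch of this base-5 encoding is given below; it assumes the image is a square NumPy array whose side is a power of two, and it returns the codes of maximal black quadrants directly rather than merging pixel codes bottom-up.

import numpy as np

def quadtree_codes(img, code=0, level=0, n=None):
    """Encode the black (nonzero) nodes of a 2^n x 2^n binary image as base-5 integers."""
    img = np.asarray(img)
    if n is None:
        n = int(np.log2(img.shape[0]))
    if img.all():                        # entirely black: one node at this level
        return {code}
    if not img.any():                    # entirely white: no codes
        return set()
    h = img.shape[0] // 2
    weight = 5 ** (n - level - 1)        # positional weight of the next base-5 digit
    quads = [(1, img[:h, :h]), (2, img[:h, h:]), (3, img[h:, :h]), (4, img[h:, h:])]
    codes = set()
    for digit, sub in quads:
        codes |= quadtree_codes(sub, code + digit * weight, level + 1, n)
    return codes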

Image Algebra Formulation

Let the source image a be an image on X = {(i, j) : 0 ≤ i, j ≤ 2^n - 1}. Define an image by

where

and

Next, define the template t(k) as follows:

Finally, define two functions f1 and f2 by

f1(y1, y2) = (2y1, 2y2)

and

Now, if Q denotes the set of quadtree codes to be constructed, then Q can be obtained by using the following algorithm.

In the above algorithm, which first appeared in Shi [7], ak is a 2^k × 2^k image in which any positive pixel encodes a kth-level black node which may be further merged to a big node at the (k - 1)th level. The image bk is a 2^(k-1) × 2^(k-1) image in which any positive pixel encodes a black node merged from the corresponding four black nodes at the kth level. The image ck is a 2^k × 2^k binary image in which any pixel with value 1 indicates that the corresponding node cannot be further merged. Thus, a positive pixel of ak · ck encodes a kth-level black node which cannot be further merged.

10.8. Position, Orientation, and Symmetry

In this section, image algebra formulations for position, orientation, and symmetry are presented. Position, orientation, and symmetry are useful descriptors of objects within images [9, 10].

Let a be an image that represents an object consisting of positive-valued pixels set against a background of 0-valued pixels. Position refers to the location of the object in the plane. The object's centroid (or center of mass) is the point that is used to specify its position. The centroid is the point whose coordinates are given by

For the digital image, the centroid's coordinates are given by

Figure 10.8.1 shows the location of the centroid for an SR71 object.

Orientation refers to how the object lies in the plane. The object's moment of inertia is used to determine its angle of orientation. The moment of inertia about the line with slope tan θ passing through the centroid is defined as

The angle θmin that minimizes Mθ is the direction of the axis of least inertia for the object. If θmin is unique, then the line in the direction θmin is the principal axis of inertia for the object. For a binary image the principal axis of inertia is the axis about which the object appears elongated. The principal axis of inertia in the direction θ is shown for the SR71 object in Figure 10.8.1. The line in the direction θmax is the axis about which the moment of inertia is greatest. In a binary image, the object is widest about this axis. The angles θmin and θmax are separated by 90°.

Let

Figure 10.8.1  Centroid and angle θ of orientation.

The moment of inertia about the line in the direction θ can be written as

Mθ = m20 sin²θ - 2 m11 sinθ cosθ + m02 cos²θ.

The moment of inertia is minimized by solving dMθ/dθ = 0 for critical values of θ. Using a double-angle identity, the search for critical values leads to the quadratic equation

Solving the quadratic equation leads to two solutions for tan θ, and hence two angles θmin and θmax. These are the angles of the axes about which the object has minimum and maximum moments of inertia, respectively. Determining which solution of the quadratic equation minimizes (or maximizes) the moment of inertia requires substitution back into the equation for Mθ.
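The critical angles can also be obtained directly from the second-order central moments with a two-argument arctangent, which sidesteps the quadratic. The sketch below computes the centroid, θmin, θmax, and the symmetry ratio defined in the continuation of this section, under the assumption that the first image index plays the role of x and the second of y.

import numpy as np

def orientation_descriptors(a):
    """Centroid, principal angles, and symmetry ratio of a nonnegative image a."""
    a = np.asarray(a, dtype=float)
    x, y = np.indices(a.shape)
    total = a.sum()
    xc, yc = (x * a).sum() / total, (y * a).sum() / total      # centroid
    m20 = ((x - xc) ** 2 * a).sum()
    m02 = ((y - yc) ** 2 * a).sum()
    m11 = ((x - xc) * (y - yc) * a).sum()
    M = lambda t: m20 * np.sin(t) ** 2 - 2 * m11 * np.sin(t) * np.cos(t) + m02 * np.cos(t) ** 2
    t0 = 0.5 * np.arctan2(2 * m11, m20 - m02)   # one critical angle of M(theta)
    t1 = t0 + np.pi / 2                         # the other critical angle, 90 degrees away
    theta_min, theta_max = (t0, t1) if M(t0) <= M(t1) else (t1, t0)
    symmetry = M(theta_min) / M(theta_max)      # ratio of least to greatest inertia
    return (xc, yc), theta_min, theta_max, symmetry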


For the digital image, the central moments of order 2 are given by

Symmetry is the ratio of the minimum moment of inertia to the maximum moment of inertia. For binary images, symmetry is a rough measure of how "elongated" an object is. For example, a circle has a symmetry ratio equal to 1; a straight line has a symmetry ratio equal to 0.

Image Algebra Formulation

Let a be an image that represents an object, and let p1 and p2 be the projection images defined by

p1(x, y) = x

p2(x, y) = y.

The image algebra formulations for the coordinates of the centroid are

The image algebra formulas for the central moments of order 2 used to compute the angle of orientation are given by

10.9. Region Description Using Moments

Moment invariants are image statistics that are independent of rotation, translation, and scale. Moment invariants are uniquely determined by an image and, conversely, uniquely determine the image (modulo rotation, translation, and scale). These properties of moment invariants facilitate pattern recognition in the visual field that is independent of size, position, and orientation. (See Hu [11] for experiments using moment invariants for pattern recognition.)

The moment invariants defined by Hu [11, 12] are derived from the definitions of moments, centralized moments, and normalized central moments. These statistics are defined as follows:

Let f be a continuous function defined over the plane. The moment of order (p, q) of f is defined by

where p, q ∈ {0, 1, 2, …}. It has been shown [13] that if f is a piecewise continuous function with bounded support, then moments of all orders exist. Furthermore, under the same conditions on f, mpq is uniquely determined by f, and f is uniquely determined by mpq.

The central moments of f are defined by

where

This point is called the image centroid. The center of gravity of an object is the physical analogue of the image centroid.

Let a be an image defined on an m × n array. The discrete counterpart of the centralized moment of order (p, q) is given by

The normalized central moment, ηpq, is defined by

We now present the seven moment invariants, φ1, φ2, …, φ7, developed in Hu [11]. For the continuous case, these values are independent of rotation, translation, and scaling. In the discrete case, some aberrations may exist due to the digitization process.
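A sketch of the discrete computation is given below. The normalized central moment follows the standard definition η_pq = μ_pq / μ_00^((p+q)/2 + 1), and the two invariants shown, φ1 = η20 + η02 and φ2 = (η20 - η02)² + 4η11², are the standard Hu forms; they are assumed to coincide with the book's φ1 and φ2, and the remaining five are built from third-order moments in the same fashion.

import numpy as np

def central_moment(a, p, q):
    """Discrete centralized moment of order (p, q) of image a."""
    a = np.asarray(a, dtype=float)
    x, y = np.indices(a.shape)
    m00 = a.sum()
    xc, yc = (x * a).sum() / m00, (y * a).sum() / m00   # image centroid
    return ((x - xc) ** p * (y - yc) ** q * a).sum()

def normalized_central_moment(a, p, q):
    mu00 = central_moment(a, 0, 0)
    return central_moment(a, p, q) / mu00 ** ((p + q) / 2 + 1)

def phi1_phi2(a):
    """First two Hu moment invariants of image a."""
    eta20 = normalized_central_moment(a, 2, 0)
    eta02 = normalized_central_moment(a, 0, 2)
    eta11 = normalized_central_moment(a, 1, 1)
    return eta20 + eta02, (eta20 - eta02) ** 2 + 4 * eta11 ** 2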

Table 10.9.1 lists moment invariants for transformations applied to the binary image of the character "A." There is some discrepancy within the φi due to digitization.

Table 10.9.1 Moment Invariants of an Image under Rigid-Body Transformations
φ1         φ2         φ3         φ4         φ5         φ6         φ7

0.472542 0.001869 0.059536 0.005188 -0.000090 0.000223 0.000018

0.472452 0.001869 0.059536 0.005188 -0.000090 0.000223 0.000015

0.472452 0.001869 0.059536 0.005188 -0.000090 0.000223 0.000018

0.321618 0.000519 0.016759 0.001812 -0.000009 0.000040 0.000004

0.453218 0.001731 0.058168 0.003495 -0.000047 0.000145 0.000031

Image Algebra Formulation

Let a be an image defined on an m × n array, and let pi denote the ith coordinate projection. The image algebra formulation of the (p, q)th moment is

The (p, q)th central moment is given by

where the coordinates of the image centroid are as defined above. Computation of the invariant moments follows immediately from their mathematical formulations.

10.10. Histogram

The histogram of an image is a function that provides the frequency of occurrence for each intensity level in the image. Image segmentation schemes often incorporate histogram information into their strategies. The histogram can also serve as a basis for measuring certain textural properties, and it is the major statistical tool for normalizing and requantizing an image [12, 14].

Let X be a rectangular m × n array and, for some fixed integer K, let P denote the set of images on X with values in {0, 1, …, K - 1}. The histogram image h that corresponds to a ∈ P is defined by

h(a, i) = card{x ∈ X : a(x) = i}.

That is, h(a, i) corresponds to the number of elements in the domain of a that are mapped to i by a. Figure 10.10.1 shows an image of a jet superimposed over a graphical representation of its histogram.

Figure 10.10.1  Image of a jet and its histogram.

The normalized histogram image of a is defined by

It is clear that its values sum to 1. The normalized histogram corresponds, in some ways, to a probability distribution function. Using the probability analogy, the normalized histogram value at i is viewed as the probability that a(x) takes on value i.


Image Algebra Formulation

Define a parameterized template by specifying, for each a ∈ P, the template t with parameter a, i.e., t(a), by

The image obtained from the code

h := Σt(a)

is the histogram of a. This follows from the observation that the template sum at gray level j counts the number of pixels having value j.

If one is interested in the histogram value at only one particular gray level j, then it would be more efficient to use the statement

since this counts the number of pixels having value j; i.e., χj(a(x)) = 1 ⇔ a(x) = j.

Comments and Observations

Note that the template formulation above is designed for parallel implementation. For serial computers it is much more efficient to iterate over the source image's domain and increment the value of the histogram image at gray level g whenever the gray level g is encountered in the source image.
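The serial strategy just described amounts to a single counting pass, which NumPy provides directly; the sketch assumes gray levels 0, …, K - 1.

import numpy as np

def histogram(a, K):
    """h[i] = number of pixels of a with gray level i, for i = 0, ..., K - 1."""
    return np.bincount(np.asarray(a).ravel(), minlength=K)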

10.11. Cumulative Histogram

For a given gray level of an image, the value of the cumulative histogram is equal to the number of elements in the image's domain that have gray level value less than or equal to the given gray level. The cumulative histogram can provide useful information about an image. For example, the cumulative histogram finds its way into each of the histogram specification formulas in Section 2.12. Additional uses can be found in [12, 15].

Let a ∈ P and i ∈ {0, 1, …, K - 1}. The cumulative histogram is defined by

c(a, i) = card{x ∈ X : a(x) ≤ i}.

The normalized cumulative histogram is defined by

The normalized cumulative histogram is analogous to the cumulative probability function of statistics.

Figure 10.11.1 shows an image of a jet superimposed over a graphical representation of its cumulative histogram.

Figure 10.11.1  Image of a jet and its cumulative histogram.

Image Algebra Formulation

The image algebra formulation for the cumulative histogram is very similar to the formulation for the histogram (Section 10.10). Only the template t needs to be modified. The equality in the case statement of the template becomes an inequality, so that the template t used for the cumulative histogram is

The cumulative histogram is then represented by the image algebra statement

c := Σt(a).

Comments and Observations

As with the histogram in Section 10.10, the template formulation for the cumulative histogram is designed for parallel computers. For serial computers the cumulative histogram c should be calculated by iterating over the domain of the source image. Whenever the gray level g is encountered, the value of c at all gray levels greater than or equal to g should be incremented.
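Equivalently, one can form the ordinary histogram first and take a running sum, which yields the same counts as the per-pixel update just described.

import numpy as np

def cumulative_histogram(a, K):
    """c[i] = number of pixels of a with gray level <= i."""
    h = np.bincount(np.asarray(a).ravel(), minlength=K)
    return np.cumsum(h)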

10.12. Texture Descriptors: Gray Level Spatial Dependence Statistics

Texture analysis is an important problem in image processing. Unfortunately, there is no precise definition of texture. In this section, the approach to texture analysis is based on the statistical properties of an image. In particular, statistical properties derived from spatial gray level dependence matrices of an image are formulated.

Spatial gray level dependence (SGLD) matrices have proven to be one of the most popular and effective sources of features in texture analysis. The SGLD approach computes an intermediate matrix of statistical measures from an image and then defines features as functions of this matrix. These features relate to texture directionality, coarseness, contrast, and homogeneity on a perceptual level. The values of an SGLD matrix contain frequency information about the local spatial distribution of gray level pairs. Various statistics [16] derived from gray level spatial dependence matrices have been proposed for use in classifying image textures. We will only present a subset of these statistics. Our main purpose is to illustrate how their derivations are formulated in image algebra. Information on the interpretation of spatial dependence statistics and how they can be used for classifying textures can be found in various sources [16, 17, 18].

The statistics examined here are energy, entropy, correlation, inertia, and inverse difference moment. For a given image a with gray values in {0, 1, …, G - 1}, where G denotes the number of expected gray levels, all second-order statistical measures are completely specified by the joint probability

s(Δx, Δy : i, j) = probability{a(x, y) = i and a(x + Δx, y + Δy) = j},

where (Δx, Δy) is a fixed intersample displacement. Thus, s(Δx, Δy : i, j) is the probability that an arbitrary pixel location (x, y) has gray level i, while pixel location (x + Δx, y + Δy) has gray level j.

By setting Δx = d cos θ and Δy = d sin θ, s(Δx, Δy : i, j) can be rewritten as

s(d, θ : i, j) = probability{a(x, y) = i and a(x + d cos θ, y + d sin θ) = j}.

This provides an alternate interpretation of s: the four-variable function s(d, θ : i, j) represents the probability that an arbitrary pixel location (x, y) has gray level i while, at an intersample spacing distance d in the angular direction θ, the pixel location (x + d cos θ, y + d sin θ) has gray level j.

It is common practice to restrict θ to the angles 0°, 45°, 90°, and 135°, although different angles could be used. Also, distance measures other than the Euclidean distance are often employed. Many researchers prefer the chessboard distance, while others use the city block distance.

For each given distance d and angle θ, the function s defines a G × G matrix s(d, θ) whose (i, j)th entry s(d, θ)ij is given by s(d, θ)ij = s(d, θ : i, j). The matrix s(d, θ) is called a gray level spatial dependence matrix (associated with a). It is important to note that, in contrast to regular matrix indexing, the indexing of s(d, θ) starts at zero; i.e., 0 ≤ i, j ≤ G - 1.


The standard computation of the matrix s(”x,”y) is accomplished by setting

s(”x, ”y : i,j) = card{(x,y) : a(x,y) = i and a(x + ”x,y + ”y) = j}.

For illustration purposes, let X be a 4 × 4 grid and be the image represented in Figure 10.12.1below. In this figure, we assume the usual matrix ordering of a with a(0,0) in the upper left-hand corner.

Figure 10.12.1  The 4 × 4 example image with gray values in .

Spatial dependence matrices for a, computed at various angles and chessboard distance d = max{|Δx|, |Δy|}, are presented below.
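
The counting itself is easy to express in conventional code. The following C++ sketch builds s(Δx, Δy) for a single displacement by applying the card formula above to every pixel; the array layout, bounds handling, and the name sgld are choices made for this illustration rather than part of the image algebra formulation developed later in this section.

#include <vector>

// Count gray level pairs (i, j) at displacement (dx, dy).  The image a is
// stored row by row with values in {0, ..., G-1}.  The result is the G x G
// matrix s with s[i][j] = card{(x,y) : a(x,y) = i and a(x+dx, y+dy) = j}.
std::vector<std::vector<int>>
sgld(const std::vector<std::vector<int>>& a, int G, int dx, int dy)
{
    int m = a.size(), n = a[0].size();
    std::vector<std::vector<int>> s(G, std::vector<int>(G, 0));
    for (int x = 0; x < m; ++x)
        for (int y = 0; y < n; ++y) {
            int xx = x + dx, yy = y + dy;          // displaced pixel location
            if (xx >= 0 && xx < m && yy >= 0 && yy < n)
                ++s[a[x][y]][a[xx][yy]];           // one more (i, j) pair
        }
    return s;
}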

For purposes of texture classification, many researchers do not distinguish between s(d, θ : i, j) and s(d, θ : j, i); that is, they do not distinguish which one of the two pixels that are (Δx, Δy) apart has gray value i and which one has gray value j. Therefore, comparison between two textures is often based on the texture co-occurrence matrix

c(d, θ) = s(d, θ) + s2(d, θ),

where s2 denotes the transpose of s. Using the SGLD matrices of the above example yields the following texture co-occurrence matrices associated with the image shown in Figure 10.12.1:

Since co-occurrence matrices have the property that c(d, θ) = c(d, θ + π) — which can be ascertained from their symmetry — the rationale for choosing only angles between 0 and π becomes apparent.

Co-occurrence matrices are used in defining the following more complex texture statistics. The marginal probability matrices cx(d, θ : i) and cy(d, θ : j) of c(d, θ) are defined by

and

respectively. The means and variances of cx and cy are

Five commonly used features for texture classification, which we will reformulate in terms of image algebra, are defined below; a small illustrative computation follows the list.

(a)  Energy:

(b)  Entropy:

(c)  Correlation:

(d)  Inverse Difference Moment:

(e)  Inertia:
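
As a concrete illustration, the following C++ fragment evaluates these five statistics from a G × G co-occurrence matrix whose entries have been normalized to sum to one, using the standard definitions of Haralick et al. [16]; the structure and function names are choices made for this sketch, not the image algebra formulation given below.

#include <cmath>
#include <vector>

struct TextureFeatures { double energy, entropy, correlation, idm, inertia; };

// c is a G x G co-occurrence matrix normalized so that its entries sum to 1.
TextureFeatures texture_features(const std::vector<std::vector<double>>& c)
{
    int G = c.size();
    std::vector<double> cx(G, 0.0), cy(G, 0.0);           // marginal distributions
    for (int i = 0; i < G; ++i)
        for (int j = 0; j < G; ++j) { cx[i] += c[i][j]; cy[j] += c[i][j]; }

    double mx = 0, my = 0, vx = 0, vy = 0;                // means and variances
    for (int i = 0; i < G; ++i) { mx += i * cx[i]; my += i * cy[i]; }
    for (int i = 0; i < G; ++i) {
        vx += (i - mx) * (i - mx) * cx[i];
        vy += (i - my) * (i - my) * cy[i];
    }

    TextureFeatures f = {0, 0, 0, 0, 0};
    for (int i = 0; i < G; ++i)
        for (int j = 0; j < G; ++j) {
            double p = c[i][j];
            f.energy  += p * p;                           // energy
            if (p > 0) f.entropy -= p * std::log(p);      // entropy
            f.correlation += (i - mx) * (j - my) * p;     // correlation (unnormalized)
            f.idm     += p / (1.0 + (i - j) * (i - j));   // inverse difference moment
            f.inertia += (i - j) * (i - j) * p;           // inertia
        }
    f.correlation /= std::sqrt(vx * vy);                  // divide by std. deviations
    return f;
}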

Image Algebra Formulation

Let a be an image defined on X, where X is an m × n grid and G represents the number of gray levels in the image. First, we show how to derive the spatial dependence matrices s(d, θ), where d is a given distance and θ ∈ {0°, 45°, 90°, 135°}.

By taking d to be the chessboard distance, s(d, θ) can be computed in terms of the intersample spacings Δx and Δy. In particular, let N denote the neighborhood function

N(x, y) = {(x + Δx, y + Δy)}.

The image s can now be computed using the following image algebra pseudocode:

Thus, for example, choosing Δx = 1 and Δy = 0 computes the image s(1, 0°), while choosing Δx = 1 and Δy = 1 computes s(1, 45°).

The co-occurrence image c is given by the statement

c(Δx, Δy) := s(Δx, Δy) + s2(Δx, Δy).

The co-occurrence image can also be computed directly by using the parametrized template

defined by

The computation of c(Δx, Δy) now reduces to the simple formula

To see how this formulation works, we compute c(Δx, Δy)(i, j) for i ≠ j. Let

By definition of template sum, we obtain

If i = j, then we obtain twice the corresponding value, which corresponds to the doubling effect along the diagonal when adding the matrices s + s2. Although the template method of computing c(Δx, Δy) can be stated as a simple one-line formula, actual computation using this method is very inefficient since the template is translation variant.

The marginal probability images cx(Δx, Δy) and cy(Δx, Δy) can be computed as follows:

The actual computation of these sums will probably be achieved by using the following loops:

and

Another version for computing cx(Δx, Δy) and cy(Δx, Δy), which does not involve loops, is obtained by defining the neighborhood functions Nx and Ny by

Nx(i) = {(i, j) : j = 0, 1, … , G − 1}

and

Ny(j) = {(i,j) : i = 0, 1, … , G - 1},

respectively. The marginal probability images can then be computed using the following statements:

This method of computing cx(Δx, Δy) and cy(Δx, Δy) is preferable when using special mesh-connected architectures.

Next, let the identity image be defined by

The means and variances of cx(Δx, Δy) and cy(Δx, Δy) are given by

The image algebra pseudocode for the different statistical features derived from the spatial dependence image is given below.

(a)  Energy:

(b)  Entropy:

(c)  Correlation:

where the coordinate projections for the G × G grid Y are defined by

p1(i,j) = i and

p2(i,j) = j.

(d)  Inverse Difference Moment:

(e)  Inertia:


Comments and Observations

If the images to be analyzed are of different dimensions, it will be necessary to normalize the spatial dependence matrix so that meaningful comparisons among statistics can be made. To normalize, each entry should be divided by the number c(d, θ) of point pairs in the domain of the image that satisfy the relation of being at distance d of each other in the angular direction θ. The normalized spatial dependence image

is thus given by

Suppose c(1, 0°) is the spatial dependence matrix (at distance 1, direction 0°) for an image defined over an m × n grid. There are 2n(m − 1) nearest horizontal neighbor pairs in the m × n grid. Thus, for this example
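
For instance, for the 4 × 4 image of Figure 10.12.1 (m = n = 4) there are 2 · 4 · (4 − 1) = 24 such nearest horizontal neighbor pairs, so each entry of c(1, 0°) would be divided by 24 to obtain the normalized image.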

10.13. References

1  W. Pratt, Digital Image Processing. New York: John Wiley, 1978.

2  G. Ritter, “Image algebra with applications.” Unpublished manuscript, available via anonymous ftp from ftp://ftp.cis.ufl.edu/pub/src/ia/documents, 1994.

3  A. Rosenfeld and A. Kak, Digital Picture Processing. New York, NY: Academic Press, 2nd ed., 1982.

4  H. Freeman, “On the encoding of arbitrary geometric configurations,” IRE Transactions on Electronic Computers, vol. 10, 1961.

5  H. Freeman, “Computer processing of line-drawing images,” Computing Surveys, vol. 6, pp. 1-41, Mar. 1974.

6  G. Ritter, “Recent developments in image algebra,” in Advances in Electronics and Electron Physics (P. Hawkes, ed.), vol. 80, pp. 243-308, New York, NY: Academic Press, 1991.

7  H. Shi and G. Ritter, “Structural description of images using image algebra,” in Image Algebra and Morphological Image Processing III, vol. 1769 of Proceedings of SPIE, (San Diego, CA), pp. 368-375, July 1992.

8  I. Gargantini, “An effective way to represent quadtrees,” Communications of the ACM, vol. 25, pp. 905-910, Dec. 1982.

9  A. Rosenfeld and A. Kak, Digital Image Processing. New York: Academic Press, 1976.

10  J. Russ, Computer-Assisted Microscopy: The Measurement and Analysis of Images. New York, NY: Plenum Press, 1990.

11  M. Hu, “Visual pattern recognition by moment invariants,” IRE Transactions on Information Theory, vol. IT-8, pp. 179-187, 1962.

12  R. Gonzalez and P. Wintz, Digital Image Processing. Reading, MA: Addison-Wesley, second ed., 1987.

13  A. Papoulis, Probability, Random Variables, and Stochastic Processes. New York: McGraw Hill, 1965.

14  E. Hall, “A survey of preprocessing and feature extraction techniques for radiographic images,” IEEE Transactions on Computers, vol. C-20, no. 9, 1971.

15  D. Ballard and C. Brown, Computer Vision. Englewood Cliffs, NJ: Prentice-Hall, 1982.

16  R. Haralick, K. Shanmugan, and I. Dinstein, “Textural features for image classification,” IEEE Transactions on Systems, Man, and Cybernetics, vol. SMC-3, pp. 610-621, Nov. 1973.

17  R. Haralick, “Statistical and structural approaches to textures,” Proceedings of the IEEE, vol. 67, pp. 786-804, May 1979.

18  J. Weszka, C. Dyer, and A. Rosenfeld, “A comparative study of texture measures for terrain classification,” IEEE Transactions on Systems, Man, and Cybernetics, vol. SMC-6, pp. 269-285, Apr. 1976.


Chapter 11
Neural Networks and Cellular Automata

11.1. Introduction

Artificial neural networks (ANNs) and cellular automata have been successfully employed to solve a variety of computer vision problems [1, 2]. One goal of this chapter is to demonstrate that image algebra provides an ideal environment for precisely expressing current popular neural network models and their computations. ANNs are systems of densely interconnected simple processing elements. There exist many different types of ANNs designed to address a wide range of problems in the primary areas of pattern recognition, signal processing, and robotics. The function of different types of ANNs is determined primarily by the processing elements’ pattern of connectivity, the strengths (weights) of the connecting links, the processing elements’ characteristics, and training or learning rules. These rules specify an initial set of weights and indicate how weights should be adapted during use to improve performance.

The theory and representation of the various ANNs is motivated by the functionality and representation of biological neural networks. For this reason, processing elements are usually referred to as neurons while interconnections are called axons and/or synaptic connections. Although representations and models may differ, all have the following basic components in common:

(a)  A finite set of neurons a(1), a(2), …, a(n) with each neuron a(i) having a specific neural value at time t which we will denote by at(i).

(b)  A finite set of axons or neural connections W = (wij), where wij denotes the strength of the connection of neuron a(i) with neuron a(j).

(c)  A propagation rule

(d)  An activation function f which takes τ as an input and produces the next state of the neuron

where θ is a threshold and f a hard limiter, threshold logic, or sigmoidal function which introduces a nonlinearity into the network.

It is worthwhile noting that image algebra has suggested a more general concept of neural computation than that given by the classical theory [3, 4, 5, 6].

Cellular automata and artificial neural networks share a common framework in that the new or next state of a neuron or cell depends on the states of other neurons or cells. However, there are major conceptual and physical differences between artificial neural networks and cellular automata. Specifically, an n-dimensional cellular automaton is a discrete set of cells (points, or sites) in n-dimensional space. At any given time a cell is in one of a finite number of states. The arrangement of cells in the automaton forms a regular array, e.g., a square or hexagonal grid.

As in ANNs, time is measured in discrete steps. The next state of a cell is determined by a spatially and temporally local rule. However, the new state of a cell depends only on the current and previous states of its neighbors. Also, the new state depends on the states of its neighbors only for a fixed number of steps back in time. The same update rule is applied to every cell of the automaton in synchrony.

Although the rules that govern the iteration locally among cells are very simple, the automaton as a whole can demonstrate very fascinating and complex behavior. Cellular automata are being studied as modeling tools in a wide range of scientific fields. As a discrete analogue to modeling with partial differential equations, cellular automata can be used to represent and study the behavior of natural dynamic systems. Cellular automata are also used as models of information processing.

In this chapter we present an example of a cellular automaton as well as an application of solving a problem using cellular automata. As it turns out, image algebra is well suited for representing cellular automata. The states of the automata (mapped to integers, if necessary) can be stored in an image variable as pixel values. Template-image operations can be used to capture the state configuration of a cell’s neighbors. The synchronization required for updating cell states is inherent in the parallelism of image algebra.

11.2. Hopfield Neural Network

A pattern, in the context of the N node Hopfield neural network to be presented here, is an N-dimensional vector P = (p1, p2, …, pN) from the space P = {-1, 1}N. A special subset of P is the set of exemplar patterns E = {ek : 1 ≤ k ≤ K}, where each ek ∈ P. The Hopfield net associates a vector from P with an exemplar pattern in E. In so doing, the net partitions P into classes whose members are in some way similar to the exemplar pattern that represents the class. For image processing applications the Hopfield net is best suited for binary image classification. Patterns were described as vectors, but they can just as easily be viewed as binary images and vice versa. A description of the components in a Hopfield net and how they interact follows next and is based on the description given by Lippmann [7]. An example will then be provided to illustrate image classification using a Hopfield net.


As mentioned in the introduction, neural networks have four common components. The specifics for the Hopfield net presented here are outlined below.

1.  Neurons

The Hopfield neural network has a finite set of neurons a(i), 1 ≤ i ≤ N, which serve as processing units. Each neuron has a value (or state) at time t denoted by at(i). A neuron in the Hopfield net can be in one of two states, either -1 or +1; i.e., at(i) ∈ {-1, +1}.

2.  Synaptic Connections

The permanent memory of a neural net resides within the interconnections between its neurons. For each pair of neurons, a(i) and a(j), there is a number wij called the strength of the synapse (or connection) between a(i) and a(j). The design specifications for this version of the Hopfield net require that wij = wji and wii = 0 (see Figure 11.2.1).

Figure 11.2.1  Synaptic connections for nodes a(i) and a(j) of the Hopfield neural network.

3.  Propagation Rule

The propagation rule (Figure 11.2.2) defines how states and synaptic strengths combine as input to a neuron. The propagation rule τt(i) for the Hopfield net is defined by

4.  Activation Function

The activation function f determines the next state of the neuron at+1(i) based on the value τt(i) calculated by the propagation rule and the current neuron value at(i) (see Figure 11.2.2). The activation function for the Hopfield net is the hard limiter defined below.

Figure 11.2.2  Propagation rule and activation function for the Hopfield network.

The patterns used for this version of the Hopfield net are N-dimensional vectors from the space P = {-1, 1}N.

Let ek denote the kth exemplar pattern, where 1 ≤ k ≤ K. The dimensionality of the pattern space determines the number of nodes in the net. In this case the net will have N nodes a(1), a(2), …, a(N). The Hopfield net algorithm proceeds as outlined below.

Step 1. Assign weights to synaptic connections.

This is the step in which the exemplar patterns are imprinted onto the permanent memory of the net. Assign weights wij to the synaptic connections as follows:

Note that wij = wji; therefore it is only necessary to perform the computation above for i < j.

Step 2. Initialize the net with the unknown pattern.

At this step the pattern that is to be classified is introduced to the net. If p = (p1, p2, …, pN) is the unknown pattern, set

Step 3. Iterate until convergence.

Calculate next state values for the neurons in the net using the propagation rule and activation function, that is,

Continue this process until further iteration produces no state change at any node. At convergence, the N-dimensional vector formed by the node states is the exemplar pattern that the net has associated with the input pattern.

Step 4. Continue the classification process.

To classify another pattern, repeat Steps 2 and 3.
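
The following C++ fragment illustrates Steps 1 through 3 directly. It is a plain serial sketch (neurons are updated one at a time in index order) rather than the image algebra formulation given later, and the function name and data layout are assumptions of this illustration.

#include <vector>

// One-shot Hopfield classification: exemplars holds K patterns of length N
// with entries in {-1, +1}; p is the unknown pattern.  Returns the converged state.
std::vector<int> hopfield(const std::vector<std::vector<int>>& exemplars,
                          std::vector<int> p)
{
    int N = p.size(), K = exemplars.size();

    // Step 1: w[i][j] = sum over k of e_k(i) * e_k(j), with w[i][i] = 0.
    std::vector<std::vector<int>> w(N, std::vector<int>(N, 0));
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            if (i != j)
                for (int k = 0; k < K; ++k)
                    w[i][j] += exemplars[k][i] * exemplars[k][j];

    // Steps 2 and 3: initialize with the unknown pattern and iterate until
    // no neuron changes state.
    std::vector<int>& a = p;
    bool changed = true;
    while (changed) {
        changed = false;
        for (int i = 0; i < N; ++i) {
            int tau = 0;
            for (int j = 0; j < N; ++j) tau += w[i][j] * a[j];  // propagation rule
            int next = (tau > 0) ? 1 : (tau < 0 ? -1 : a[i]);   // hard limiter
            if (next != a[i]) { a[i] = next; changed = true; }
        }
    }
    return a;
}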

Example

As an example, consider a communications system designed to transmit the six 12 × 10 binary images 1, 2, 3, 4, 9, and X (see Figure 11.2.3). Communications channels are subject to noise, and so an image may become garbled when it is transmitted. A Hopfield net can be used for error correction by matching the corrupted image with the exemplar pattern that it most resembles.

Figure 11.2.3  Exemplar patterns used to initialize synaptic connections wij.

A binary image b ∈ {0, 1}X, where X is a 12 × 10 grid, can be translated into a pattern p = (p1, p2, …, p120), and vice versa, using the relations

and

The six exemplar images are translated into exemplar patterns. A 120-node Hopfield net is then created by assigning connection weights using these exemplar patterns as outlined earlier.

The corrupted image is translated into its pattern representation p = (p1, p2, …, pN) and introduced to the Hopfield net (a0(i) is set equal to pi, 1 ≤ i ≤ N). The input pattern evolves through neuron state changes into the pattern of the neuron states at convergence. If the net converges to an exemplar pattern, it is assumed that the exemplar pattern represents the true (uncorrupted) image that was transmitted. Figure 11.2.4 pictorially summarizes the use of a Hopfield net for the binary pattern classification process.

Image Algebra Formulation

Let a be the image used to represent neuron states. Initialize a with the unknown image pattern. The weights of the synaptic connections are represented by the template t which is defined by

where ek(i) is the ith element of the exemplar for class k. Let f be the hard limiter function defined earlier. The image algebra formulation for the Hopfield net algorithm is as follows:

Asynchronous updating of neural states is required to guarantee convergence of the net. The formulation above does not update neural states asynchronously. The implementation above does allow more parallelism and hence increased processing speed. The convergence properties using the parallel implementation may be acceptable; if so, the parallelism can be used to speed up processing.


Alternate Image Algebra Formulation

If asynchronous behavior is desired in order to achieve convergence, then either the template t needs to be parameterized so that at each application of a • t only one randomly chosen neuron changes state, or the following modification to the formulation above can be used.

Figure 11.2.4  Example of a Hopfield network

Figure 11.2.5  Modes of convergence for the Hopfield net.

Comments and Observations

The Hopfield net is guaranteed to converge provided neuron states are updated asynchronously and the synaptic weights are assigned symmetrically, i.e., wij = wji. However, the network may not converge to the correct exemplar pattern, and may not even converge to an exemplar pattern. In Figure 11.2.5 three corrupted images have been created from the “1” image by reversing its pixel values with a probability of 0.35. The network is the same as in the earlier example. Each of the three corrupted images yields a different mode of convergence when input into the network. The first converges to the correct exemplar, the second to the incorrect exemplar, and the third to no exemplar.

The two major limitations of the Hopfield net manifest themselves in its convergence behavior. First, the number of patterns that can be stored and accurately recalled is a function of the number of nodes in the net. If too many exemplar patterns are stored (relative to the number of nodes) the net may converge to an arbitrary pattern which may not be any one of the exemplar patterns. Fortunately, this rarely happens if the number of exemplar patterns is small compared to the number of nodes in the net.

The second limitation is the difficulty that occurs when two exemplar patterns share too many pixel values in common. The symptom shows up when a corrupted pattern converges to an exemplar pattern, but to the wrong exemplar pattern. For example, the Hopfield net of the communication system tends to associate the “9” image with a corrupted “4” image. If the application allows, the second problem can be ameliorated by designing exemplar patterns that share few common pixel values.

11.3. Bidirectional Associative Memory (BAM)

An associative memory is a vector space transform which may or may not be linear. The Hopfield net (Section 11.2) is an associative memory. Ideally, the Hopfield net is designed so that it is equal to the identity transformation when restricted to its set of exemplars E, that is,

The Hopfield net restricted to its set of exemplars can be represented by the set of ordered pairs

A properly designed Hopfield net should also take an input pattern from P that is not an exemplar pattern to the exemplar that best matches the input pattern.

A bidirectional associative memory (BAM) is a generalization of the Hopfield net [8, 9]. The domain and the range of the BAM transformation need not be of the same dimension. A set of associations

is imprinted onto the memory of the BAM so that

That is, for 1 ≤ k ≤ K. For an input pattern a that is not an element of {ak : 1 ≤ k ≤ K} the BAM should converge to the association pair (ak, bk) for the ak that best matches a.

The components of the BAM and how they interact will be discussed next. The bidirectional nature of the BAM will be seen from the description of the algorithm and the example provided.

The three major components of a BAM are given below.

1.  Neurons

Unlike the Hopfield net, the domain and the range of the BAM need not have the same dimensionality. Therefore, it is necessary to have two sets of neurons Sa and Sb to serve as memory locations for the input and output of the BAM. The state values of a(m) and b(n) at time t are denoted at(m) and bt(n), respectively. To guarantee convergence, neuron state values will be either 1 or -1 for the BAM under consideration.

2.  Synaptic Connection Matrix

The associations (ak, bk), 1 ≤ k ≤ K, where ak ∈ {-1, 1}M and bk ∈ {-1, 1}N, are stored in the permanent memory of the BAM using an M × N weight (or synaptic connection) matrix w. The mnth entry of w is given by

Note the similarity in the way weights are assigned for the BAM and the Hopfield net.

3.  Activation Function

The activation function f used for the BAM under consideration is the hard limiter defined by

The next state of a neuron from Sb is given by

where at = (at(1),at(2),…,at(M)) and w·n is the nth column of w.

The next state of a(m) ∈ Sa is given by

where bt = (bt(1),bt(2),…,bt(N)) and is the mth column of the transpose of w.


The algorithm for the BAM is as outlined in the following four steps.

Step 1. Create the weight matrix w.

The first step in the BAM algorithm is to generate the weight matrix w using the formula

where and are from the association (ak, bk).

Step 2. Initialize neurons.

For the unknown input pattern p = (p1, p2, …, pM) initialize the neurons of Sa as follows:

The neurons in Sb should be assigned values randomly from the set {-1, 1}, i.e.,

Step 3. Iterate until convergence.

Calculate the next state values for the neurons in Sb using the formula

then calculate the next state values for the neurons in Sa using

The alternation between the sets Sb and Sa used for updating neuron values is why this type of associative memory neural net is referred to as “bidirectional.” The forward feed (update of Sb) followed by the feedback (update of Sa) improves the recall accuracy of the net.

Continue updating the neurons in Sb followed by those in Sa until further iteration produces no state change for any neuron. At time of convergence t = c, the association that the BAM has recalled is (a, b), where a = (ac(1), ac(2), …, ac(M)) and b = (bc(1), bc(2), …, bc(N)).

Step 4. Continue classification.

To classify another pattern repeat Steps 2 and 3.
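
A direct C++ sketch of these four steps is given below. It assumes associations stored as ±1 vectors of lengths M and N, initializes the Sb layer with pseudo-random ±1 values, and is meant only to illustrate the back-and-forth update, not the template formulation that follows.

#include <cstdlib>
#include <utility>
#include <vector>

// One BAM recall: assocs is a list of (a_k, b_k) pairs with entries in {-1,+1};
// p is the unknown input for the Sa layer.  Returns the recalled (a, b) pair.
std::pair<std::vector<int>, std::vector<int>>
bam_recall(const std::vector<std::pair<std::vector<int>, std::vector<int>>>& assocs,
           const std::vector<int>& p)
{
    int M = p.size(), N = assocs[0].second.size();

    // Step 1: w[m][n] = sum over k of a_k(m) * b_k(n).
    std::vector<std::vector<int>> w(M, std::vector<int>(N, 0));
    for (const auto& ab : assocs)
        for (int m = 0; m < M; ++m)
            for (int n = 0; n < N; ++n)
                w[m][n] += ab.first[m] * ab.second[n];

    // Step 2: Sa gets the input pattern, Sb gets random +/-1 values.
    std::vector<int> a = p, b(N);
    for (int n = 0; n < N; ++n) b[n] = (std::rand() % 2) ? 1 : -1;

    // Step 3: alternate updates of Sb and Sa until nothing changes.
    auto limiter = [](int s, int old) { return s > 0 ? 1 : (s < 0 ? -1 : old); };
    bool changed = true;
    while (changed) {
        changed = false;
        for (int n = 0; n < N; ++n) {                  // forward feed: update Sb
            int s = 0;
            for (int m = 0; m < M; ++m) s += a[m] * w[m][n];
            int next = limiter(s, b[n]);
            if (next != b[n]) { b[n] = next; changed = true; }
        }
        for (int m = 0; m < M; ++m) {                  // feedback: update Sa
            int s = 0;
            for (int n = 0; n < N; ++n) s += b[n] * w[m][n];
            int next = limiter(s, a[m]);
            if (next != a[m]) { a[m] = next; changed = true; }
        }
    }
    return {a, b};
}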

Example

Figure 11.3.1 shows the evolution of neuron states in a BAM from initialization to convergence. Three lowercase-uppercase character image associations (a, A), (b, B), and (c, C) were used to create the weight matrix w. The lowercase characters are 12 × 12 images and the uppercase characters are 16 × 16 images. The conversion from image to pattern and vice versa is done as in the example of Section 11.2. A corrupted “a” is input onto the Sa neurons of the net.

Image Algebra Formulation

Let a and b be the image variables used to represent the state for neurons in the sets Sa and Sb, respectively. Initialize a with the unknown pattern. The neuron values of Sb are initialized to either -1 or 1 randomly. The weight matrix W is represented by the template given by

where ak(m) is the mth component of ak and bk(n) is the nth component of bk in the association pair (ak, bk). The activation function is denoted f.

The image algebra pseudocode for the BAM is given by

Comments and Observations

The BAM has the same limitations that the Hopfield net does. First, the number of associations that can be programmed into the memory and effectively recalled is limited. Second, the BAM may have a tendency to converge to the wrong association pair if components of two association pairs have too many pixel values in common. Figure 11.3.2 illustrates convergence behavior for the BAM.

The complement dC of a pattern d is defined by

Note that the weight matrix w is defined by

Figure 11.3.1  Bidirectional nature of the BAM.

which is equivalent to

Therefore, imprinting the association pair (a, b) onto the memory of a BAM also imprints (aC, bC) onto the memory. The effect of this is seen in Block 5 of Figure 11.3.2.

Figure 11.3.2  Modes of convergence for the BAM.

11.4. Hamming Net

The Hamming distance h between two patterns a and b is equal to the number of components in a and b that do not match. More precisely,

or

The Hamming net [7, 10] partitions a pattern space into classes Cn, 1 ≤ n ≤ N. Each class is represented by an exemplar pattern en ∈ Cn. The Hamming net takes as input a pattern and assigns it to class Ck if and only if

That is, the input pattern is assigned to the class whose exemplar pattern is closest to it as measured by Hamming distance. The Hamming net algorithm is presented next.

There is a neuron a(n) in the Hamming net for each class Cn, 1 ≤ n ≤ N (see Figure 11.4.1). The weight tij assigned to the connection between neurons a(i) and a(j) is given by

where 0 < ε < 1/N and 1 ≤ i, j ≤ N. Assigning these weights is the first step in the Hamming algorithm.

When the input pattern p = (p1, p2, …, pM) is presented to the net, the neuron value a0(n) is set equal to the number of component matches between p and the exemplar en. If en is the nth exemplar, a0(n) is set using the formula

where

The next state of a neuron in the Hamming net is given by

where f is the activation function defined by


The next state of a neuron is calculated by first decreasing its current value by an amount proportional to the sum of the current values of all the other neuron values in the net. If the reduced value falls below zero then the new neuron value is set to 0; otherwise it assumes the reduced value. Eventually, the process of updating neuron values will lead to a state in which only one neuron has a value greater than zero. At that point the neuron with the nonzero value, say ac(k) ≠ 0, represents the class Ck that the net assigns to the input pattern. At time 0, the value of a(k) was greater than all the other neuron values. Therefore, k is the number of the class whose exemplar had the most matches with the input pattern.

Figure 11.4.1  Hamming network.

Figure 11.4.2 shows a six-neuron Hamming net whose classes are represented by the exemplars 1, 2, 3, 4, 9, and X. The Hamming distance between the input pattern p and each of the exemplars is displayed at the bottom of the figure. The number of matches between exemplar en and p is given by the neuron states a0(n), 1 ≤ n ≤ 6. At time of completion, only one neuron has a positive value, which is ac(5) = 37.5. The Hamming net example assigns the unknown pattern to Class 5. That is, the input pattern had the most matches with the “9” exemplar which represents Class 5.
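
The behavior described above can be sketched directly in C++: initialize each node with its number of matches against the corresponding exemplar, then repeatedly apply the lateral inhibition and thresholding until a single node remains positive. The value eps = 1/(2N) used below is one admissible choice of the inhibition constant (any value between 0 and 1/N works); the rest of the names are choices of this illustration.

#include <vector>

// Hamming net classification: exemplars holds N patterns of length M with
// entries in {-1,+1}; p is the unknown pattern.  Returns the (0-based) index
// of the winning class, or -1 if no single winner emerges (a tie).
int hamming_net(const std::vector<std::vector<int>>& exemplars,
                const std::vector<int>& p)
{
    int N = exemplars.size(), M = p.size();
    double eps = 1.0 / (2.0 * N);                     // 0 < eps < 1/N

    // Node n starts with the number of matches between p and exemplar n.
    std::vector<double> a(N);
    for (int n = 0; n < N; ++n) {
        int matches = 0;
        for (int m = 0; m < M; ++m)
            if (p[m] == exemplars[n][m]) ++matches;
        a[n] = matches;
    }

    // Subtract eps times the sum of the other nodes and clip at zero,
    // repeating until at most one node remains positive.
    while (true) {
        int positive = 0, winner = -1;
        for (int n = 0; n < N; ++n)
            if (a[n] > 0) { ++positive; winner = n; }
        if (positive <= 1) return winner;

        std::vector<double> next(N);
        for (int n = 0; n < N; ++n) {
            double inhibition = 0;
            for (int j = 0; j < N; ++j)
                if (j != n) inhibition += a[j];
            double v = a[n] - eps * inhibition;
            next[n] = (v > 0) ? v : 0;                // threshold-logic activation
        }
        a = next;
    }
}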

Image Algebra Formulation

Let a be the image variable used to store neural states. The activation function f is as defined earlier. The template defined by

Figure 11.4.2  Hamming net example.

where en is the nth exemplar pattern, is used to initialize the net. The template defined by

is used to implement the propagation rule for the net.

Given the input pattern, the image algebra formulation for the Hamming net is as follows:

The variable c contains the number of the class that the net assigns to the input pattern.

Comments and Observations

The goal of our example was to find the number of the exemplar that best matches the input pattern as measured by Hamming distance. The formulation above demonstrated how to use image algebra to implement the Hamming neural network approach to the problem. A simpler image algebra formula that accomplishes the same goal is given by the following two statements:

11.5. Single-Layer Perceptron (SLP)

A single-layer perceptron is used to classify a pattern p = (p1, p2, …, pm) into one of two classes. An SLP consists of a set of weights,

and a limiter function defined by

Let p = (p1, p2, …, pm) ∈ P. In order to classify p, the perceptron first calculates the sum of products

and then applies the limiter function. If g(p) < 0, the perceptron assigns p to class C0. The pattern p is assigned to class C1 if g(p) > 0. The SLP is represented in Figure 11.5.1.

Figure 11.5.1  Single-layer perceptron.

The graph of 0 = g(x) = w0 + w1x1 + ··· + wmxm is a hyperplane that divides the pattern space into two regions. This hyperplane is called the perceptron’s decision surface. Geometrically, the perceptron classifies a pattern based on which side the pattern (point) lies on. Patterns for which g(p) < 0 lie on one side of the decision surface and are assigned to C0. Patterns for which g(p) > 0 lie on the opposite side of the decision surface and are assigned to C1.

The perceptron operates in two modes — a classifying mode and a learning mode. The pattern classification mode is as described above. Before the perceptron can function as a classifier, the values of its weights must be determined. The learning mode of the perceptron is involved with the assignment of weights (or the determination of a decision surface).

If the application allows, i.e., if the decision surface is known a priori, weights can be assigned analytically. However, key to the concept of a perceptron is the ability to determine its own decision surface through a learning algorithm [7, 11, 12]. There are several algorithms for SLP learning. The one we present here can be found in Rosenblatt [11].

Let {(pk, yk)} be a training set, where pk ∈ P and yk ∈ {0, 1} is the class number associated with pk. Let wi(t) be the value of the ith weight after the tth training pattern has been presented to the SLP. The learning algorithm is presented below.

Step 1. Set each wi(0), 0 ≤ i ≤ m, equal to a random real number.

Step 2. Present pk to the SLP and compute the class number of pk, i.e.,

Step 3. Adjust weights using the formula

Step 4. Repeat Steps 2 and 3 for each element of the training set, recycling if necessary, until convergence or a predefined number of iterations.

The constant η, 0 < η ≤ 1, regulates the rate of weight adjustment. Small η results in slow learning. However, if η is too large the learning process may not be able to home in on a good set of weights.

Note that if the SLP classifies pk correctly, then there is no adjustment made to the weights. If pk is classified incorrectly, then the change in the weight vector is proportional to the input pattern.
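
A small C++ sketch of the learning loop follows. It uses the augmented weight vector (w0, w1, …, wm), a fixed learning rate eta, and cycles through the training set for a given number of epochs; all names are illustrative.

#include <utility>
#include <vector>

// Perceptron learning.  Each training example is a pair (pattern, class) with
// class in {0, 1}; w has m+1 entries, w[0] being the bias weight w0.
void slp_train(std::vector<double>& w,
               const std::vector<std::pair<std::vector<double>, int>>& training,
               double eta, int epochs)
{
    for (int e = 0; e < epochs; ++e)
        for (const auto& ex : training) {
            const std::vector<double>& p = ex.first;
            // g(p) = w0 + w1*p1 + ... + wm*pm
            double g = w[0];
            for (size_t i = 0; i < p.size(); ++i) g += w[i + 1] * p[i];
            int computed = (g > 0) ? 1 : 0;           // limiter
            int error = ex.second - computed;         // 0 if classified correctly
            // Step 3: adjust weights in proportion to the input pattern.
            w[0] += eta * error;
            for (size_t i = 0; i < p.size(); ++i) w[i + 1] += eta * error * p[i];
        }
}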


Figure 11.5.2 illustrates the learning process for distinguishing two classes in the plane. Class 1 points have been plotted with diamonds and Class 0 points have been plotted with crosses. The decision surface for this example is a line. The lines plotted in the figure represent decision surfaces after n = 0, 20, 40, and 80 training patterns have been presented to the SLP.

Figure 11.5.2  SLP learning — determination of the perceptron’s decision surface.

Image Algebra Formulation

Let w be the image variable whose components represent the weights of the SLP. Let p be the pattern that is to be classified. Augment p with a 1 to form (1, p1, p2, …, pm). The function f is the limiter defined earlier. The class number y of p is determined using the image algebra statement

To train the SLP, let {(pk, yk)} be a training set. Initially, each component of the weight image variable w is set to a random real number. The image algebra pseudocode for the SLP learning algorithm (iterating until convergence) is given by

Comments and Observations

If the set of input patterns cannot be divided into two linearly separable sets then an SLP will fail as a classifier. Consider the problem of designing an SLP for XOR classification. Such an SLP should be defined over the pattern space {(0, 0), (0, 1), (1, 0), (1, 1)}. The classes for the XOR problem are

The points in the domain of the problem are plotted in Figure 11.5.3. Points in Class 0 are plotted with open dots and points in Class 1 are plotted with solid dots. A decision surface in the plane is a line. There is no line in the plane that separates classes 0 and 1 for the XOR problem. Thus, an SLP is incapable of functioning as a classifier for the XOR problem.

Figure 11.5.3  Representation of domain for XOR.

11.6. Multilayer Perceptron (MLP)

As seen in Section 11.5, a single-layer perceptron is not capable of functioning as an XOR classifier. This is because the two classes of points for the XOR problem are not linearly separable. A single-layer perceptron is only able to partition the pattern space into regions separated by a hyperplane. Fortunately, more complex regions can be specified by feeding the output from several SLPs into another SLP designed to serve as a multivariable AND gate. The design of a multilayer perceptron for the XOR problem is presented in the following example.

In Figure 11.6.1, the plane has been divided into two regions. The shaded region R1 lies between the lines whose equations are -1 + 2x + 2y = 0 and 3 - 2x - 2y = 0. Region R0 consists of all points not in R1. The points of class C1 for the XOR problem lie in R1 and the points in C0 lie in R0. Region R1 is the intersection of the half-plane that lies above -1 + 2x + 2y = 0 and the half-plane that lies below 3 - 2x - 2y = 0. Single-layer perceptrons SLP1 and SLP2 can be designed with decision surfaces 0 = -1 + 2x + 2y and 0 = 3 - 2x - 2y, respectively. The intersection required to create R1 is achieved by sending the output of SLP1 and SLP2 through another SLP that acts as an AND gate.

Figure 11.6.1  Multilayer perceptron solution strategy for XOR problem.

One SLP implementation of an AND gate and its decision surface is shown in Figure 11.6.2. All the components of a two-layer perceptron for XOR classification can be seen in Figure 11.6.3. The first layer consists of SLP1 and SLP2. An input pattern (x, y) is presented to both SLP1 and SLP2. Whether the point lies above -1 + 2x + 2y = 0 is determined by SLP1; SLP2 determines if it lies below 3 - 2x - 2y = 0. The second layer of the XOR perceptron takes the output of SLP1 and SLP2 and determines if the input satisfies both the conditions of lying above -1 + 2x + 2y = 0 and below 3 - 2x - 2y = 0.

Figure 11.6.2  Single-layer perceptron for AND classification.

Figure 11.6.3  Two-layer perceptron for XOR classification.

By piping the output of SLPs through a multivariable AND gate, a two-layer perceptron can be designed for any class C1 whose points lie in a region that can be constructed from the intersection of half-planes. Adding a third layer to a perceptron consisting of a multivariable OR gate allows for the creation of even more complex regions for pattern classification. The outputs from the AND layer are fed to the OR layer which serves to union the regions created in the AND layer. An OR gate is easily implemented using an SLP.

Figure 11.6.4 shows pictorially how the XOR problem can be solved with a three-layer perceptron. The AND layer creates two quarter-planes from the half-planes created in the first layer. The third layer unions the two quarter-planes to create the desired classification region.

The example above is concerned with the analytic design of a multilayer perceptron. That is, it is assumed that the classification regions are known a priori. Analytic design is not in accord with the perceptron concept. The distinguishing feature of a perceptron is its ability to determine the proper classification regions, on its own, through a “learning by example” process. However, the preceding discussion does point out how the applicability of perceptrons can be extended by combining single-layer perceptrons. Also, by first approaching the problem analytically, insights will be gained into the design and operation of a “true” perceptron. The design of a feedforward multilayer perceptron is presented next.

A feedforward perceptron consists of an input layer of nodes, one or more hidden layers of nodes, and an output layer of nodes. We will focus on the two-layer perceptron of Figure 11.6.5. The algorithms for the two-layer perceptron are easily generalized to perceptrons of three or more layers.


A node in a hidden layer is connected to every node in the layer above and below it. In Figure 11.6.5 weight wij connects input node xi to hidden node hj and weight vjk connects hj to output node ok. Classification begins by presenting a pattern to the input nodes xi, 1 ≤ i ≤ l. From there data flows in one direction (as indicated by the arrows in Figure 11.6.5) through the perceptron until the output nodes ok, 1 ≤ k ≤ n, are reached. Output nodes will have a value of either 0 or 1. Thus, the perceptron is capable of partitioning its pattern space into 2n classes.

Figure 11.6.4  Three-layer perceptron implementation of XOR.

Figure 11.6.5  Two-layer perceptron.

The steps that govern data flow through the perceptron during classification are as follows:

Step 1. Present the pattern to the perceptron, i.e., set xi = pi for 1 ≤ i ≤ l.

Step 2. Compute the values of the hidden-layer nodes using the formula

Step 3. Calculate the values of the output nodes using the formula

Step 4. The class c = (c1, c2, …, cn) that the perceptron assigns p must be a binary vector which, when interpreted as a binary number, is the class number of p. Therefore, the ok must be thresholded at some level τ appropriate for the application. The class that p is assigned to is then c = (c1, c2, …, cn), where ck = χ≥τ(ok).

Step 5. Repeat Steps 1, 2, 3, and 4 for each pattern that is to be classified.

Above, when describing the classification mode of the perceptron, it was assumed that the values of the weights between nodes had already been determined. Before the perceptron can serve as a classifier, it must undergo a learning process in which its weights are adjusted to suit the application.

The learning mode of the perceptron requires a training set {(pt, ct)}, where pt is a pattern and ct ∈ {0, 1}n is a vector which represents the actual class number of pt. The perceptron learns (adjusts its weights) using elements of the training set as examples. The learning algorithm presented here is known as backpropagation learning.

For backpropagation learning, a forward pass and a backward pass are made through the perceptron. During the forward pass a training pattern is presented to the perceptron and classified. The backward pass recursively, level by level, determines error terms used to adjust the perceptron’s weights. The error terms at the first level of the recursion are a function of ct and the output of the perceptron (o1, o2, …, on). After all the error terms have been computed, weights are adjusted using the error terms that correspond to their level. The backpropagation algorithm for the two-layer perceptron of Figure 11.6.5 is detailed in the steps which follow.

Step 1. Initialize the weights of the perceptron randomly with numbers between -0.1 and 0.1; i.e.,

Step 2. Present pt from the training pair (pt, ct) to the perceptron and apply Steps 1, 2, and 3 of the perceptron’s classification algorithm outlined earlier. This completes the forward pass of the backpropagation algorithm.

Step 3. Compute the errors in the output layer using

where ct represents the correct class of pt. The vector (o1, o2, …, on) represents the output of the perceptron.

Step 4. Compute the errors in the hidden-layer nodes using

Step 5. Let vjk(t) denote the value of weight vjk after the tth training pattern has been presented to the perceptron. Adjust the weights between the output layer and the hidden layer according to the formula

The parameter η, 0 < η ≤ 1, governs the learning rate of the perceptron.

Step 6. Adjust the weights between the hidden layer and the input layer according to

Step 7. Repeat Steps 2 through 6 for each element of the training set. One cycle through the training set is called an epoch. The performance that results from the network’s training may be enhanced by repeating epochs.
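
To make Steps 1 through 7 concrete, here is a compact C++ sketch of a single training example passing through a two-layer perceptron with sigmoid hidden and output units, using the usual backpropagation error terms. The weight layout, struct name, and sigmoid choice are assumptions of this illustration, not the book's template formulation given below.

#include <cmath>
#include <vector>

// A two-layer perceptron with l inputs, m hidden nodes, and n outputs.
// w[j] holds the weights from the inputs to hidden node j (w[j][0] is a bias);
// v[k] holds the weights from the hidden layer to output node k (v[k][0] is a bias).
struct TwoLayerPerceptron {
    std::vector<std::vector<double>> w, v;

    static double sigmoid(double s) { return 1.0 / (1.0 + std::exp(-s)); }

    // Forward pass (classification Steps 2 and 3).
    void forward(const std::vector<double>& p,
                 std::vector<double>& h, std::vector<double>& o) const
    {
        h.assign(w.size(), 0.0);
        for (size_t j = 0; j < w.size(); ++j) {
            double s = w[j][0];
            for (size_t i = 0; i < p.size(); ++i) s += w[j][i + 1] * p[i];
            h[j] = sigmoid(s);
        }
        o.assign(v.size(), 0.0);
        for (size_t k = 0; k < v.size(); ++k) {
            double s = v[k][0];
            for (size_t j = 0; j < h.size(); ++j) s += v[k][j + 1] * h[j];
            o[k] = sigmoid(s);
        }
    }

    // One backpropagation step for a single training pair (p, c), c in {0,1}^n.
    void train_one(const std::vector<double>& p, const std::vector<double>& c,
                   double eta)
    {
        std::vector<double> h, o;
        forward(p, h, o);

        // Output-layer error terms: delta_k = o_k (1 - o_k)(c_k - o_k).
        std::vector<double> delta_o(o.size());
        for (size_t k = 0; k < o.size(); ++k)
            delta_o[k] = o[k] * (1.0 - o[k]) * (c[k] - o[k]);

        // Hidden-layer error terms: delta_j = h_j (1 - h_j) sum_k v_jk delta_k.
        std::vector<double> delta_h(h.size());
        for (size_t j = 0; j < h.size(); ++j) {
            double s = 0.0;
            for (size_t k = 0; k < o.size(); ++k) s += v[k][j + 1] * delta_o[k];
            delta_h[j] = h[j] * (1.0 - h[j]) * s;
        }

        // Adjust hidden-to-output weights, then input-to-hidden weights.
        for (size_t k = 0; k < v.size(); ++k) {
            v[k][0] += eta * delta_o[k];
            for (size_t j = 0; j < h.size(); ++j)
                v[k][j + 1] += eta * delta_o[k] * h[j];
        }
        for (size_t j = 0; j < w.size(); ++j) {
            w[j][0] += eta * delta_h[j];
            for (size_t i = 0; i < p.size(); ++i)
                w[j][i + 1] += eta * delta_h[j] * p[i];
        }
    }
};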

Image Algebra Formulation

Let x and o be the image variables used to store the values of the input layer and the output layer, respectively. The template w will be used to represent the weights between the input layer and the hidden layer. Initialize w as follows:

The template v represents the weights between the hidden layer and the output layer. Initialize v by setting

The activation function for the perceptron is

Let {(pt, ct)} be the training set for the two-layer perceptron, where pt is a pattern and ct ∈ {0, 1}n is a vector which represents the actual class number of pt. The parameterized templates t(d, h) and u(d, p) are defined by

and

The image algebra formulation of the backpropagation learning algorithm for one epoch is

       

  

     

    


To classify p = (p1, p2, …, pl), first augment it with a 1 to form (1, p1, p2, …, pl). The pattern p is then assigned to the class represented by the image c using the image algebra code:

Comments and Observations

The treatment of perceptrons presented here has been very cursory. Our purpose is to demonstrate techniques for the formulation of perceptron algorithms using image algebra. Other works present more comprehensive introductions to perceptrons [7, 10, 12, 13].

11.7. Cellular Automata and Life

Among the best known examples of cellular automata is John Conway’s game Life [2, 14, 15]. The life processes of the synthetic organisms in this biosphere are presented next. Although this automaton is referred to as a game, it will provide an illustration of how easily the workings of cellular automata are formulated in image algebra.

Life evolves on a grid of points. A cell (point) is either alive or not alive. Life and its absence are denoted by state values 1 and 0, respectively. The game begins with any configuration of living and nonliving cells on the grid. Three rules govern the interactions among cells.

(a)  Survivals — Every live cell with two or three live 8-neighbors survives for the next generation.

(b)  Deaths — A live cell that has four or more live 8-neighbors dies due to over-population. A live cell with one live 8-neighbor or none dies from loneliness.

(c)  Birth — A cell is brought to life if exactly three of its 8-neighbors are alive.
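
A plain C++ sketch of one generation under these three rules is shown below. It uses an 8-neighbor count on a grid whose border cells are treated as dead; it illustrates the rules directly and is independent of the template formulation given in the Image Algebra Formulation later in this section.

#include <vector>

// One generation of Life.  a[x][y] is 1 for a live cell and 0 otherwise;
// cells outside the grid are treated as dead.
std::vector<std::vector<int>> life_step(const std::vector<std::vector<int>>& a)
{
    int rows = a.size(), cols = a[0].size();
    std::vector<std::vector<int>> next(rows, std::vector<int>(cols, 0));
    for (int x = 0; x < rows; ++x)
        for (int y = 0; y < cols; ++y) {
            int live = 0;                              // count live 8-neighbors
            for (int dx = -1; dx <= 1; ++dx)
                for (int dy = -1; dy <= 1; ++dy) {
                    if (dx == 0 && dy == 0) continue;
                    int xx = x + dx, yy = y + dy;
                    if (xx >= 0 && xx < rows && yy >= 0 && yy < cols)
                        live += a[xx][yy];
                }
            if (a[x][y] == 1)
                next[x][y] = (live == 2 || live == 3) ? 1 : 0;   // survival or death
            else
                next[x][y] = (live == 3) ? 1 : 0;                // birth
        }
    return next;
}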

Figure 11.7.1 shows the life cycle of an organism. Grid 0 is the initial state of the life-form. After ten generations, the life-form cycles between the configurations in grids 11 and 12.

Figure 11.7.1  Life cycle of an artificial organism.

Depending on the initial configuration of live cells, the automaton will demonstrate one of four types of behavior [14]. The pattern of live cells may:

(a)  vanish in time

(b)  evolve to a fixed finite size

(c)  grow indefinitely at a fixed speed

(d)  enlarge and contract irregularly

Image Algebra Formulation

Let a ∈ {0, 1}X, where X is the grid of points on which Life evolves, be the image variable that contains the state values of cells in the automaton. The cell states at time t are denoted at. The template used to capture the states of neighboring cells is shown below.

Let a0 be the initial configuration of cells. The next state of the automaton is given by the image algebra formula

11.8. Solving Mazes Using Cellular Automata

In this section an unconventional solution to finding a path through a maze is presented. The method uses a cellular automaton (CA) approach. This approach has several advantageous characteristics. The memory required to implement the algorithm is essentially that which is needed to store the original maze image. The CA approach provides all possible solutions, and it can determine whether or not a solution exists [16]. The CA method is also remarkably simple to implement.

Conventional approaches to the maze problem use only local information about the maze. As the mouse proceeds through the maze, it marks each intersection it passes. If the corridor that the mouse is currently exploring leads to a dead end, the mouse backtracks to the last marker it placed. The mouse then tries another unexplored corridor leading from the marked intersection. This process is continued until the goal is reached. Thus, the conventional method uses a recursive depth-first search.

The price paid for using only local information with the conventional method is the memory needed for storing the mouse’s search tree, which may be substantial. The CA approach requires only the memory that is needed to store the original maze image. The CA approach must have the information that a viewer overlooking the whole maze would have. Depending on the application, the assumption of the availability of a global overview of the maze may or may not be reasonable.

The maze for this example is a binary image. Walls of the maze have pixel value 0 and are represented in black (Figure 11.8.1). The corridors of the maze are 4-connected and have pixel value 1. Corridors must be one pixel wide. Corridor points are white in Figure 11.8.1.

In some ways the CA approach to maze solving may be viewed as a poor thinning algorithm (Chapter 5). A desirable property of a thinning algorithm is the preservation of ends. To solve the maze the CA removes ends. Ends in the maze are corridor points that are at the terminus of a dead-end corridor. A corridor point at the end of a dead-end corridor is changed to a wall point by changing its pixel value from 1 to 0. At each iteration of the algorithm, the terminal point of each dead-end corridor is converted into a wall point. This process is continued until no dead-end corridors exist, which will also be the time at which further iteration produces no change in the image. When all the dead-end corridors are removed, only solution paths will remain. The result of this process applied to the maze of Figure 11.8.1 is seen in Figure 11.8.2.

Figure 11.8.1  Original maze.

For this example, the next state of a point is a function of the current states of the points in a von Neumann neighborhood of the target point. The next state rules that drive the maze-solving CA are

(a)  A corridor point that is surrounded by three or four wall points becomes a wall point.

(b)  Wall points always remain wall points.

(c)  A corridor point surrounded by two or fewer wall points remains a corridor point.
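
A C++ sketch of one application of these rules is given below. It examines the von Neumann (4-)neighborhood of every corridor point and converts dead-end terminals to wall points, with points outside the image treated as walls; iterating until no change occurs leaves only the solution paths. The function name and data layout are choices of this illustration, not the image algebra code that follows.

#include <vector>

// One iteration of the maze-solving cellular automaton.  a[x][y] is 1 for a
// corridor point and 0 for a wall point; points outside the image are walls.
// Returns true if any corridor point was converted to a wall point.
bool maze_step(std::vector<std::vector<int>>& a)
{
    int rows = a.size(), cols = a[0].size();
    std::vector<std::vector<int>> next = a;
    bool changed = false;
    const int dx[4] = {-1, 1, 0, 0}, dy[4] = {0, 0, -1, 1};  // von Neumann neighbors
    for (int x = 0; x < rows; ++x)
        for (int y = 0; y < cols; ++y) {
            if (a[x][y] == 0) continue;                // wall points remain wall points
            int walls = 0;
            for (int k = 0; k < 4; ++k) {
                int xx = x + dx[k], yy = y + dy[k];
                if (xx < 0 || xx >= rows || yy < 0 || yy >= cols || a[xx][yy] == 0)
                    ++walls;
            }
            if (walls >= 3) { next[x][y] = 0; changed = true; }  // dead-end terminal
        }
    a = next;
    return changed;
}

Calling maze_step repeatedly until it returns false removes all dead-end corridors.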

Image Algebra Formulation

Let a ∈ {0, 1}X be the maze image. The template t that will be used to capture neighborhood configurations is defined pictorially as

Figure 11.8.2  Maze solution.

The image algebra code for the maze solving cellular automaton is given by


11.9. References

1  S. Chen, ed., Neural and Stochastic Methods in Image and Signal Processing, vol. 1766 of Proceedings of SPIE, (San Diego, CA), July 1992.

2  S. Wolfram, Theory and Applications of Cellular Automata. Singapore: World Scientific Publishing Co., 1986.

3  C. P. S. Araujo and G. Ritter, “Morphological neural networks and image algebra in artificial perception systems,” in Image Algebra and Morphological Image Processing III (P. Gader, E. Dougherty, and J. Serra, eds.), vol. 1769 of Proceedings of SPIE, (San Diego, CA), pp. 128-142, July 1992.

4  J. Davidson, “Simulated annealing and morphological neural networks,” in Image Algebra and Morphological Image Processing III, vol. 1769 of Proceedings of SPIE, (San Diego, CA), pp. 119-127, July 1992.

5  J. Davidson and R. Srivastava, “Fuzzy image algebra neural network for template identification,” in Second Annual Midwest Electro-Technology Conference, (Ames, IA), pp. 68-71, IEEE Central Iowa Section, Apr. 1993.

6  J. Davidson and A. Talukder, “Template identification using simulated annealing in morphology neural networks,” in Second Annual Midwest Electro-Technology Conference, (Ames, IA), pp. 64-67, IEEE Central Iowa Section, Apr. 1993.

7  R. Lippmann, “An introduction to computing with neural nets,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-4, pp. 4-22, 1987.

8  B. Kosko, “Adaptive bidirectional associative memories,” in IEEE 16th Workshop on Applied Images and Pattern Recognition, (Washington, D.C.), pp. 1-49, Oct. 1987.

9  J. A. Freeman and D. M. Skapura, Neural Networks: Algorithms, Applications, and Programming Techniques. Reading, MA: Addison Wesley, 1991.

10  G. Ritter, D. Li, and J. Wilson, “Image algebra and its relationship to neural networks,” in Technical Symposium Southeast on Optics, Electro-Optics, and Sensors, Proceedings of SPIE, (Orlando, FL), Mar. 1989.

11  F. Rosenblatt, Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms. Washington, D.C.: Spartan Books, 1962.

12  K. Knight, “Connectionist ideas and algorithms,” Communications of the ACM, vol. 33, no. 11, pp. 59-74, 1990.

13  R. Gonzalez and R. Woods, Digital Image Processing. Reading, MA: Addison-Wesley, 1992.

14  S. Wolfram, “Cellular automata as models of complexity,” Nature, vol. 311, pp. 419-424, Oct. 1984.

15  M. Gardner, “The fantastic combinations of John Conway’s new solitaire game ‘life’,” Scientific American, vol. 223, pp. 120-123, Oct. 1970.

16  B. Nayfeh, “Cellular automata for solving mazes,” Dr. Dobb’s Journal, pp. 32-38, Feb. 1993.


Appendix
The Image Algebra C++ Library

A.1. Introduction

In 1992, the U.S. Air Force sponsored work at the University of Florida to implement a C++ class library to support image algebra, iac++. This appendix gives a brief tour of the library and then provides examples of programs implementing some of the algorithms provided in this text.

Current information on the most recent version of the iac++ class library is available via the World Wide Web from URL

http://www.cise.ufl.edu/projects/IACC/.

The distribution and some documentation are available for anonymous ftp from

ftp://ftp.cise.ufl.edu/pub/ia/iac++.

The class library has been developed in the C++ language [1, 2]. Executable versions of the library have been developed and tested for the following compiler/system combinations:

Table A.1.1 Tested iac++ platforms.

Compiler         System
g++ v 2.7.0      SunOS 4.1.3, SunOS 5.2.5, Silicon Graphics IRIX 5.3
Metrowerks v6    MacOS 7.5 (PowerPC)

The library is likely to be portable to other systems if one of the supported compilers is employed. Porting to other compilers might involve significant effort.

A.2. Classes in the iac++ Class Library

The iac++ class library provides a number of classes and related functions to implement a subset of the image algebra. At the time of this writing the library contains the following major classes to support image algebra concepts:

(a)  IA_Point<T>
(b)  IA_Set<T>
(c)  IA_Pixel<P,T>
(d)  IA_Image<P,T>
(e)  IA_Neighborhood<P,Q>
(f)  IA_DDTemplate<I>

All names introduced into the global name space by iac++ are prefixed by the string IA_ to lessen the likelihood of conflict with programmer-chosen names.

Points

The IA_Point<T> class describes objects that are homogeneous points with coordinates of type T. The library provides instances of the following point classes:

(a)  IA_Point<int>
(b)  IA_Point<double>

The dimensionality of a point variable is determined at the time of creation or assignment, and may change by assigning a point value having a different dimension to the variable. Users may create points directly with constructors. Functions and operations may also result in point values.
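For example (a minimal sketch, using the int point constructors listed in Table A.2.1 below):

IA_Point<int> p (1, 2, 3);   // p is created as a three-dimensional point
p = IA_Point<int> (7, 8);    // p is now a two-dimensional point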

One must #include "ia/IntPoint.h" to have access to the class of points with int coordinates and its associated operations. One must #include "ia/DblPoint.h" to have access to the class of points with double coordinates and its associated operations.

Point operations described in the sections that follow correspond to those operations presented in Section 1.2 of this book.

Point Constructors and Assignment

Let i, i1, i2, ..., represent int-valued expressions. Let iarray represent an array of int values with n elements. Let d, d1, d2, ..., represent double-valued expressions. Let darray represent an array of double values with n elements. And let ip and dp represent points with int and double coordinate values, respectively.

Table A.2.1 Point Constructors and Assignment

construct a copy                      IA_Point<int> (ip)
                                      IA_Point<double> (dp)

construct from coordinates            IA_Point<int> (i)
                                      IA_Point<int> (i1,i2)
                                      IA_Point<int> (i1,i2,i3)
                                      IA_Point<int> (i1,i2,i3,i4)
                                      IA_Point<int> (i1,i2,i3,i4,i5)
                                      IA_Point<double> (d)
                                      IA_Point<double> (d1,d2)
                                      IA_Point<double> (d1,d2,d3)
                                      IA_Point<double> (d1,d2,d3,d4)
                                      IA_Point<double> (d1,d2,d3,d4,d5)

construct from array of coordinates   IA_Point<int> (iarray, n)
                                      IA_Point<double> (darray, n)

assign a point                        ip1 = ip2
                                      dp1 = dp2

Binary Operations on Points

Let p1 and p2 represent two expressions both of type IA_Point<int> or IA_Point<double>.

Table A.2.2 Binary Operations on Points

addition p1 + p2

subtraction p1 - p2

multiplication p1 * p2

division p1 / p2

supremum sup (p1, p2)

infimum inf (p1, p2)

dot product dot (p1, p2)

cross product cross (p1, p2)

concatenation concat (p1, p2)

Unary Operations on Points

Let p represent an expression of type IA_Point<int> or IA_Point<double>.

Table A.2.3 Unary Operations on Points

negation -p

ceiling ceil (p)

floor floor (p)

rounding rint (p)

projection (subscripting) p(i)  p[i]

sum sum (p)

product product (p)

maximum max (p)

minimum min (p)

Euclidean norm enorm (p)

L1 norm mnorm (p)

L∞ norm inorm (p)

dimension p.dim( )

This function applies only to points of type IA_Point<double>. Subscripting with ( ) yields a coordinate value, but subscripting with [ ] yields a coordinate reference to which one may assign a value, thus changing a point's value.

Relations on Points

Let p1 and p2 represent two expressions both of type IA_Point<int> or IA_Point<double>. Note that a relation on points is said to be strict if it must be satisfied on all of the corresponding coordinates of two points to be satisfied on the points themselves.

Table A.2.4 Relations on Points

less than (strict) p1 < p2

less than or equal to (strict) p1 <= p2

greater than (strict) p1 > p2

greater than or equal to (strict) p1 >= p2

equal to (strict) p1 == p2

not equal to (complement of ==) p1 != p2

lexicographic comparison pointcmp (p1, p2)

Examples of Point Code Fragments

One can declare points using constructors.

IA_Point<int> point1 (0,0,0);       // point1 == origin in 3D integral Cartesian space
IA_Point<double> dpoint (1.3,2.7);  // dpoint is a point in 2D real Cartesian space

One may subscript the coordinates of a point. (Note that point coordinates use zero-based addressing in keeping with C vectors.)

point1[0] = point1[1] = 3; // point1 == IA_Point<int> (3,3,0)

One can manipulate such points using arithmetic operations.

point1 = point1 + IA_Point<int> (1,2,3); // point1 == IA_Point<int> (4,5,3)

And one may apply various functions to points.

point1 = floor (dpoint);   // point1 == IA_Point<int> (1,2)
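One can also compare points; recall that the relations of Table A.2.4 are strict, holding only when they hold on every coordinate. A minimal sketch:

IA_Point<int> a (1, 2), b (2, 3), c (0, 5);
// a < b is true: 1 < 2 and 2 < 3 both hold
// neither a < c nor a > c is true, since the coordinates disagree
if (a < b) cout << "a precedes b in every coordinate\n";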

Sets

The C++ template class IA_Set<T> provides an implementation of sets of elements of type T. The following instances of IA_Set are provided:

(a)  IA_Set<IA_Point<int> >
(b)  IA_Set<IA_Point<double> >
(c)  IA_Set<bool>
(d)  IA_Set<unsigned char>
(e)  IA_Set<int>
(f)  IA_Set<float>
(g)  IA_Set<IA_complex>
(h)  IA_Set<IA_RGB>

IA_Set<IA_Point<int> > and IA_Set<IA_Point<double> > provide some capabilities beyond those provided for other sets.

Sets of Points and Their Iterators

The point set classes IA_Set<IA_Point<int> > and IA_Set<IA_Point<double> > provide the programmer with the ability to define and manipulate sets of point objects all having the same type and dimension.

One must #include "ia/IntPS.h" to have access to the class of sets of int points and its associated operations. One must #include "ia/DblPS.h" to have access to the class of sets of double points and its associated operations. To gain access to point set iterator classes and their associated operations, one must #include "ia/PSIter.h".

Point set operations described in the sections that follow correspond to those operations presented in Section 1.2 of this book.

Point Set Constructors and Assignment

In Table A.2.5, let ip, ip1, ip2, ..., represent IA_Point<int>-valued expressions. Let iparray represent an array of IA_Point<int> values, each of dimension d, with n elements. Let dp, dp1, dp2, ..., represent IA_Point<double>-valued expressions. Let dparray represent an array of IA_Point<double> values, each of dimension d, with n elements. And let ips and dps represent sets with IA_Point<int> and IA_Point<double> elements, respectively.

The function IA_boxy_pset creates the set containing all points bounded by the infimum and supremum of its two point arguments. The type of set is determined by the point types of the arguments. The IA_universal_ipset and IA_empty_ipset functions create universal or empty sets of type IA_Point<int> of dimension specified by the argument. Sets of type IA_Point<double> are created in a corresponding fashion by IA_universal_dpset and IA_empty_dpset.

Table A.2.5 Point Set Constructors and Assignment

construct a copy                 IA_Set<IA_Point<int> > (ips)
                                 IA_Set<IA_Point<double> > (dps)

construct from points            IA_Set<IA_Point<int> > (ip)
                                 IA_Set<IA_Point<int> > (ip1,ip2)
                                 IA_Set<IA_Point<int> > (ip1,ip2,ip3)
                                 IA_Set<IA_Point<int> > (ip1,ip2,ip3,ip4)
                                 IA_Set<IA_Point<int> > (ip1,ip2,ip3,ip4,ip5)
                                 IA_Set<IA_Point<double> > (dp)
                                 IA_Set<IA_Point<double> > (dp1,dp2)
                                 IA_Set<IA_Point<double> > (dp1,dp2,dp3)
                                 IA_Set<IA_Point<double> > (dp1,dp2,dp3,dp4)
                                 IA_Set<IA_Point<double> > (dp1,dp2,dp3,dp4,dp5)

construct from array of points   IA_Set<IA_Point<int> > (d, iparray, n)
                                 IA_Set<IA_Point<double> > (d, dparray, n)

assign a point set               ips1 = ips2
                                 dps1 = dps2

functions returning point sets   IA_boxy_pset (ip1, ip2)
                                 IA_boxy_pset (dp1, dp2)
                                 IA_universal_ipset (dim)
                                 IA_universal_dpset (dim)
                                 IA_empty_ipset (dim)
                                 IA_empty_dpset (dim)
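For instance, the universal and empty point set functions take only a dimension. A minimal sketch:

IA_Set<IA_Point<int> > all2d  = IA_universal_ipset (2);   // every two-dimensional integer point
IA_Set<IA_Point<int> > none2d = IA_empty_ipset (2);       // no two-dimensional integer points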

Binary Operations on Point Sets

In Table A.2.6, let ps, ps1, and ps2 represent expressions all of type IA_Set<IA_Point<int> > or IA_Set<IA_Point<double> >. Let p represent an expression having the same type as the elements of ps.

Table A.2.6 Binary Operations on Point Sets

addition ps1 + ps2

subtraction ps1 - ps2

point addition ps + p
               p + ps

point subtraction ps - p
                  p - ps

union ps1 | ps2

intersection ps1 & ps2

set difference ps1 / ps2

symmetric difference ps1 ^ ps2

Unary Operations on Point Sets

Let ps represent an expression of type IA_Set<IA_Point<int> > or IA_Set<IA_Point<double> >.

Table A.2.7 Unary Operations on Point Sets

negation -ps

complementation ~ps

supremum sup (ps)

infimum inf (ps)

choice function ps.choice ( )

cardinality ps.card ( )

Relations on Point Sets

Let ps, ps1, and ps2 represent expressions all of type IA_Set<IA_Point<int> > or IA_Set<IA_Point<double> >. Let p represent an expression having the same type as the elements of ps.

Table A.2.8 Relations on Point Sets

containment test ps.contains(p)

proper subset ps1 < ps2

(improper) subset ps1 <= ps2

proper superset ps1 > ps2

(improper) superset ps1 >= ps2

equality ps1 == ps2

inequality ps1 != ps2

emptiness test ps.empty( )

Point Set Iterators

The class IA_PSIter<P> supports iteration over the elements of sets of type IA_Set<IA_Point<int> > and IA_Set<IA_Point<double> >. This class provides a single operation, namely function application (the parentheses operator), taking a reference argument of the associated point type and returning a bool. The first time an iterator object is applied as a function, it attempts to assign its argument the initial point in the set (according to an implementation-specified ordering). If the set is empty, the iterator returns a false value to indicate that it failed; otherwise it returns a true value. Each subsequent call assigns to the argument the value following that which was assigned on the immediately preceding call. If it fails (due to having no more points) the iterator returns a false value; otherwise it returns a true value.

Examples of Point Set Code Fragments

One can declare point sets using a variety of constructors.

IA_Set<IA_Point<int> > ps1; // uninitialized point set

IA_Set<IA_Point<int> > ps2(IA_IntPoint (0,0,0)); // point set with a single point

IA_Set<IA_Point<int> > ps3(IA_boxy_pset(IA_IntPoint (-1,-1),
                                        IA_IntPoint (9,9)));
                        // contains all points in the rectangle
                        // with corners (-1,-1) and (9,9)

// Introduce a predicate function
int pred(IA_IntPoint p) { return (p[0] >= 0) ? 1 : 0; }

IA_Set<IA_Point<int> > ps4(2, &pred);
                        // the half plane of 2D space having nonnegative 0th coord.

IA_Set<IA_Point<int> > ps5(3, &pred);
                        // the half space of 3D space having nonnegative 0th coord.

One can operate upon those sets.

ps1 = ps3 & ps4;   // intersection of ps3 and ps4 contains all points
                   // in the rectangle with corners (0,0) and (9,9)

ps1 = ps3 | ps4; // union of ps3 and ps4

ps1 = ps3 + IA_Point<int> (5,5); // pointset translation

ps1 = ps3 + ps3; // Minkowski (pairwise) addition

// relational operations and predicates
if ((ps2 <= ps3) || ps2.empty( ))
    ps2 = ~ps2;    // complementation

And one can iterate over the elements in those sets.

IA_PSIter<IA_Point<int> > iter(ps3);   // iter will be an iterator over the points
                                       // in set ps3
IA_Point<int> p;

// the while loop below iterates over all the points in the
// set ps3 and writes them to the standard output device
while (iter(p)) { cout << p << "\n"; }
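The unary operations of Table A.2.7 can be applied to these sets as well. A minimal sketch, reusing ps3 from above:

IA_Point<int> lo = inf (ps3);        // IA_Point<int> (-1,-1), the lower corner of ps3
IA_Point<int> hi = sup (ps3);        // IA_Point<int> (9,9), the upper corner of ps3
cout << ps3.card ( ) << "\n";        // number of points in ps3 (121 for the 11 x 11 box)
IA_Point<int> some = ps3.choice ( ); // an arbitrary point chosen from ps3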

Sets of Values and Their Iterators

In addition to point sets, the user may employ the C++ template class IA_Set<T> to construct and manipulate extensive sets of values of any C++ type T for which there is an ordering. Instances of sets with bool, unsigned char, int, float, IA_complex, and IA_RGB elements are instantiated in the iac++ library. The class IA_Set<T> imposes a total ordering on the elements of type T in the set, and supports operations max and min. If the value type does not provide underlying definitions for max and min (as with IA_complex), then the library chooses some arbitrary (but consistent) definition for them.

One must #include "ia/Set.h" to have access to instances of the value set class and associated operations. To gain access to the iterators and associated operations, one must #include "ia/SetIter.h".

Value Set Constructors and Assignment

Let v, v1, v2, ..., represent T-valued expressions. Let varray represent an array of T values with n elements, and let vs, vs1, and vs2 represent sets with T-valued elements.

Table A.2.9 Value Set Constructors and Assignment

construct a copy                 IA_Set<T> (vs)

construct from values            IA_Set<T> (v)
                                 IA_Set<T> (v1,v2)
                                 IA_Set<T> (v1,v2,v3)
                                 IA_Set<T> (v1,v2,v3,v4)
                                 IA_Set<T> (v1,v2,v3,v4,v5)

construct from array of values   IA_Set<T> (varray, n)

assign a value set               vs1 = vs2

Binary Operations on Value Sets

Let vs1 and vs2 represent two expressions, both of type IA_Set<T>.

Table A.2.10 Binary Operations on Value Sets

union vs1 | vs2

intersection vs1 & vs2

set difference vs1 / vs2

symmetric difference vs1 ^ vs2

Unary Operations on Value Sets

Let vs represent an expression of type IA_Set<T>.

Table A.2.11 Unary Operations on Value Sets

maximum max (vs)

minimum min (vs)

choice function vs.choice()

cardinality vs.card()

Relations on Value Sets

Let vs, vs1 and vs2 represent expressions all of type IA_Set<T>.

Table A.2.12 Relations on Value Sets

containment test vs.contains(v)

proper subset vs1 < vs2

(improper) subset vs1 <= vs2

proper superset vs1 > vs2

(improper) superset vs1 >= vs2

equality vs1 == vs2

inequality vs1 != vs2

emptiness test vs.empty()

Value Set Iterators

The class IA_SetIter<T> supports iteration over the elements of an IA_Set<T>. This class provides a single operation, namely function application (the parentheses operator), taking a reference argument of type T and returning a bool. The first time an iterator object is applied as a function, it attempts to assign its argument the initial element in the set (according to an implementation-specified ordering). If the set is empty, the iterator returns a false value to indicate that it failed; otherwise it returns a true value. Each subsequent call assigns to the argument the value following that which was assigned on the immediately preceding call. If it fails (due to having no more elements) the iterator returns a false value; otherwise it returns a true value.

Examples of Value Set Code Fragments

Value sets can be constructed by explicitly listing the contained values.

IA_Set<int> v1(1,2,3);   // constructs a set containing three integers.
                         // sets of up to 5 elements can be
                         // constructed in this way

float vec[] = {1.0, 2.0, 3.0,
               1.0, 2.0, 3.0,
               3.0, 2.0, 1.0};
IA_Set<float> v2( vec, 9 );   // constructs a set containing the three
                              // unique values 1.0, 2.0, and 3.0, that
                              // are contained in the 9 element vector vec

One can apply operations to those value sets.

v2 = v2 & IA_Set<float>(2.0, 3.0);   // the intersection of v2 and the
                                     // specified set is assigned to v2
v2 = v2 | 6.0;                       // the single element 6.0 is united with set v2

cout << v2.card() << "\n";           // writes the cardinality of the set

if (IA_Set<int>(1, 2) <= v1)         // test for subset
    cout << v1 << "\n";              // write the set

One can iterate over the values in those sets.

IA_SetIter<int> iter(v1);
int i;

// write all the elements of a set
while (iter(i)) { cout << i << "\n"; }
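The max and min reductions of Table A.2.11 apply to these sets as well. A minimal sketch, reusing v1 from above:

cout << max (v1) << " " << min (v1) << "\n";   // writes 3 and 1 for the set v1 constructed above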

Images, Pixels and Iterators Over Image Pixels and Values

The image classes defined in the iac++ library are instances of the template class IA_Image<P, T>, comprising the images defined over sets of points of type P having values of type T. The following instances of the IA_Image class are provided in the library:

(a)  IA_Image<IA_Point<int>, bool>
(b)  IA_Image<IA_Point<int>, unsigned char>
(c)  IA_Image<IA_Point<int>, int>
(d)  IA_Image<IA_Point<int>, float>
(e)  IA_Image<IA_Point<int>, IA_complex>
(f)  IA_Image<IA_Point<int>, IA_RGB>
(g)  IA_Image<IA_Point<double>, float>
(h)  IA_Image<IA_Point<double>, IA_complex>

To gain access to the class definitions and associated operations for each of these image types, one must #include the associated header file. The basic unary and binary operations on the supported image types are included in the following files, respectively:

(a)  ia/BoolDI.h
(b)  ia/UcharDI.h
(c)  ia/IntDI.h
(d)  ia/FloatDI.h
(e)  ia/CplxDI.h
(f)  ia/RGBDI.h
(g)  ia/FloatCI.h
(h)  ia/CplxCI.h

In these file names, the DI designation (discrete image) denotes images over sets of points with int coordinates and CI (continuous image) denotes images over sets of points with double coordinates.

A pixel is a point together with a value. Pixels drawn from images mapping points of type P into values of type T are supported with the class IA_Pixel<P, T>, which can be instantiated for any combination of P and T. Objects belonging to this class have the two publicly accessible fields point and value. An IA_Pixel object, while it does bring together a point and a value, is not in any way associated with a specific image. Thus, assigning to the value field of an IA_Pixel object does not change the actual value associated with a point in any image. The definition of the IA_Pixel class is contained in the header file ia/Pixel.h.
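A minimal sketch of IA_Pixel as a free-standing point/value record, assuming only a default constructor and the two public fields described above:

IA_Pixel<IA_Point<int>, int> pix;   // a pixel pairing an int point with an int value
pix.point = IA_Point<int> (2, 4);   // where the pixel lies
pix.value = 10;                     // what it holds; no image is modified by this assignment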

Image Constructors and Assignment

Given a point type P and a value type T one can create and assign images mapping points of type P into values of type T. In the following table img, img1, and img2 denote objects of type IA_Image<P, T>, ps denotes an object of type IA_Set<P>, t denotes a value of type T, pixarray denotes an array of n objects of type IA_Pixel<P, T>, and f denotes a function with signature T f (P) or T f (const P&).

Table A.2.13 Image Constructors and Assignment

construct an empty image                       IA_Image<P, T> ()

construct a copy                               IA_Image<P, T> (img)

construct a constant-valued image              IA_Image<P, T> (ps, t)

construct from an array of pixels              IA_Image<P, T> (pixarray, n)

construct from a point-to-value function       IA_Image<P, T> (ps, f)

assign an image                                img1 = img2

assign each pixel a constant value             img = t

assign to img1 the overlapping parts of img2   img1.restrict_assign(img2)

Binary Operations on Images

Let img1 and img2 represent two expressions both of type IA_Image<IA_Point<int>, T>.

Table A.2.14 Binary Operations on Images

addition img1 + img2

pointwise maximum max (img1, img2)

subtraction img1 - img2

multiplication img1 * img2

division img1 / img2

pseudo-division pseudo_div (img1, img2)

modulus img1 % img2

binary and img1 & img2

binary or img1 | img2

binary exclusive or img1 ^ img2

logical and img1 && img2

logical or img1 || img2

left arithmetic shift img1 << img2

right arithmetic shift img1 >> img2

pointwise minimum min (img1, img2)

characteristic less than                 chi_lt (img1, img2)

characteristic less than or equal to     chi_le (img1, img2)

characteristic equal                     chi_eq (img1, img2)

characteristic greater than              chi_gt (img1, img2)

characteristic greater than or equal to  chi_ge (img1, img2)

characteristic value set containment     chi_contains (img1, vset)

Available only for images for which T is an integer type (bool, unsigned char, or int).
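The characteristic operations yield mask-like images. A minimal sketch, assuming int images a and b defined over the same point set and using the to_int conversion that appears in the Hough transform example of Section A.3 (the exact result type of the chi_ functions is not shown here):

IA_Image<IA_Point<int>, int> mask = to_int (chi_lt (a, b));   // 1 where a(p) < b(p), 0 elsewhere
IA_Image<IA_Point<int>, int> kept = a * mask;                 // keeps a's values only where a < b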

Unary Operations on Images

Let img represent an expression of type IA_Image<IA_Point<int>, T>.

Table A.2.15 Unary Operations on Images

projection img(p)  img[p]

negation -img

pointwise one's complement ~img

pointwise logical not !img

cardinality img.card()

domain extraction img.domain()

range extraction img.range()

sum sum (img)

maximum max (img)

minimum min (img)

product prod (img)

absolute value abs (img)

ceiling ceil (img)

floor floor (img)

exponential exp (img)

natural logarithm log (img)

cosine cos (img)

sine sin (img)

tangent tan (img)

complex magnitude abs_f (img)

complex angle arg_f (img)

complex real part real_f (img)

complex imaginary part imag_f (img)

square root sqrt (img)

integer square root isqrt (img)

square sqr (img)

Subscripting with () yields the value associated with the point p, but subscripting with [] yields a reference to the value associated with p. One may assign such a reference a new value, thus changing the image. This operation is potentially expensive. This is discussed at greater length in the section providing example code fragments below.

Domain Transformation Operations on Images

Let img, img1, and img2 represent expressions of type IA_Image<P, T>. Let tset represent an expression of type IA_Set<T>. Let pset represent an expression of type IA_Set<P> and let p represent an element of such a point set.

Table A.2.16 Domain Transformation Operations on Images

translation                            translate (img, p)

restriction of domain to a point set   restrict (img, pset)

restriction of domain by a value set   restrict (img, tset)

extension of one image by another      extend (img1, img2)
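For example, translate shifts an image to a new position in its point space. A minimal sketch, assuming an int image img over a 2-D integer point set and that translation offsets the domain by the given point as point set translation does in Table A.2.6:

IA_Image<IA_Point<int>, int> shifted = translate (img, IA_Point<int> (5, 5));
// for each p in img.domain(), shifted(p + IA_Point<int> (5, 5)) == img(p)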

Relations on Images

Let img1 and img2 represent two expressions both of type IA_Image<P, T>.

Table A.2.17 Relations on Images

less than img1 < img2

less than or equal to img1 <= img2

equal to img1 == img2

greater than img1 > img2

greater than or equal to img1 >= img2

not equal to (complement of equal to) img1 != img2

strictly not equal to strict_ne (img1, img2)

Input/Output Operations on Images

The iac++ library supports the reading and writing of images in extended portable bitmap format (EPBM) [3]. This format supports the commonly used pbm, pgm, and ppm formats. The EPBM supports only two-dimensional rectangular images and does not represent an image's point set, thus not all images representable in the iac++ library can be directly read or written. When an EPBM image is read, it is assigned an int coordinate point set spanning from the origin to the point whose coordinates are one less than the number of rows and number of columns in the image. To ensure that we write only rectangular images, any image to be written is extended to the smallest enclosing rectangular domain with the value 0 used wherever the image was undefined.

Let istr represent an istream and let in_file_name represent a character string containing an input file name. Let max_ptr be a pointer to an int variable that will receive as its value the maximum value in the image read.

Table A.2.18 Input Operations on Images

read a bit image             read_PBM (in_file_name)
                             read_PBM (istr)

read an unsigned char image  read_uchar_PGM (in_file_name)
                             read_uchar_PGM (in_file_name, max_ptr)
                             read_uchar_PGM (istr)
                             read_uchar_PGM (istr, max_ptr)

read an integer image        read_int_PGM (in_file_name)
                             read_int_PGM (in_file_name, max_ptr)
                             read_int_PGM (istr)
                             read_int_PGM (istr, max_ptr)

read an RGB image            read_PPM (in_file_name)
                             read_PPM (in_file_name, max_ptr)
                             read_PPM (istr)
                             read_PPM (istr, max_ptr)

One may write either to a named file or to an ostream. If writing to a named file, the image output function returns no value. If writing to an ostream, the image output function returns the ostream.

Let ostr represent an ostream and let out_file_name represent a character string containing an output file name. Let bit_img denote an image with bool values, uchar_img denote an image with unsigned char values, int_img denote an image with int values, and rgb_img denote an image with IA_RGB values. Let maxval be an unsigned int representing the maximum value in an image.

Table A.2.19 Output Operations on Images

write a bit image             write_PBM (bit_img, out_file_name)
                              write_PBM (bit_img, ostr)

write an unsigned char image  write_PGM (uchar_img, out_file_name)
                              write_PGM (uchar_img, out_file_name, maxval)
                              write_PGM (uchar_img, ostr)
                              write_PGM (uchar_img, ostr, maxval)

write an integer image        write_PGM (int_img, out_file_name)
                              write_PGM (int_img, out_file_name, maxval)
                              write_PGM (int_img, ostr)
                              write_PGM (int_img, ostr, maxval)

write an RGB image            write_PPM (rgb_img, out_file_name)
                              write_PPM (rgb_img, out_file_name, maxval)
                              write_PPM (rgb_img, ostr)
                              write_PPM (rgb_img, ostr, maxval)
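A minimal round-trip sketch using these functions; the file names are placeholders only:

IA_Image<IA_Point<int>, u_char> im = read_uchar_PGM ("input.pgm");
write_PGM (im, "output.pgm");   // written over its smallest enclosing rectangle, 0 where undefined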

Image Iterators

The class IA_Pixel<P, T> supports storing of a point and value as a unit and provides the two field selectors point and value. Instances of the IA_Pixel class are provided for the same combinations of P and T on which IA_Image instances are provided.

Iterators over either the values or the pixels of an image can be constructed and used. The class IA_IVIter<P, T> supports image value iteration and the class IA_IPIter<P, T> supports pixel iteration. The value iterator provides an overloading of function call with a reference argument of the image's range type. The pixel iterator function call overloading takes a reference argument of type IA_Pixel<P, T>.

The first time an image value (pixel) iterator object is applied as a function, it attempts to assign its argument the value (pixel) that is associated with the first point in the image's domain point set. If the image is empty, a false value is returned to indicate that it failed, otherwise a true value is returned. Each subsequent call assigns to the argument the value (pixel) associated with the next point in the image's domain. If the iterator fails to find such a value (pixel) because it has exhausted the image's domain point set, the iterator returns a false value, otherwise it returns a true value.

Image Composition Operations

Two kinds of composition operations are supported for images. Since an image maps points to values, one may generate a new image by composing a value-to-value mapping with an image or by composing an image with a spatial transform (or point-to-point mapping). For composition of a value-to-value function with an image, one must #include "ia/ImgComp.h".

Let img be an image of type IA_Image<P, T>, let value_map be a function with signature T value_map (T), let point_map be a function pointer with signature P point_map (const P&), and let result_pset be the set of type IA_Set<P> over which the result of the composition is to be defined. The result of composing the point-to-point mapping function with the image at any point p in the result_pset is equal to img(point_map(p)). The result of composing the value-to-value mapping function with the image is value_map (img(p)).

Table A.2.20 Image Composition Operations

compose an image with a point-to-point function   compose (img, point_map, result_pset)

compose a value-to-value function with an image   compose (value_map, img)

Note that the composition of an image with a point-to-point mapping function takes a third argument, namely the point set over which the result is to be defined. This is necessary because determining the set of points over which the result image defines values would require us to solve the generally unsolvable problem of computing the inverse of the point_map function. Thus, we require the user to specify the domain of the result.

Examples of Image Code Fragments

One may construct images.

IA_Image<IA_Point<int>, int> i1;           // uninitialized image
IA_Image<IA_Point<int>, int> i2(ps3, 5);   // a constant image over point set ps3
                                           // having value 5 at each point

// we may use a function such as p0 below to create an
// image with value at each point specified by a function
int p0(IA_Point<int> p) { return p[0]; }
IA_Image<IA_Point<int>, int> i2(ps3, &p0);

It is important to note the significant difference in application of an image as a function (as in i2 (...)) versus subscripting of an image (as in i2 [...]). In the first case, the value of the image at a point is returned. In the second case, a reference to a pixel, the specific value mapping of a point within an image, is returned. An image pixel can have its value assigned through such a reference, thus modifying the image. Assigning an image to a variable can be quite time consuming. The iac++ library tries to avoid such copying wherever possible. Assigning an image to a variable usually does not copy storage. Instead, it uses a reference counting technique to keep track of multiple uses of a single image representation. If an image having multiple readers is subscripted (with operation []), a unique version of the image is created so that the subscripter (writer) will get a unique copy of the potentially modified image. While this insulates other readers of that image from future changes, it also imposes a significant penalty in both time and space. Thus, it is preferable to use function application notation rather than subscripting wherever possible.

i1 = i2;   // i1 and i2 now share the same value map

// write a point of image i2 (same as i1)
cout << i2(IA_IntPoint(3,5));

i1[IA_IntPoint(2,4)] = 10;   // Subscripting a pixel of i1 causes new
                             // storage for its value mapping to be allocated
                             // whether or not an assignment is actually made

The iac++ library also supports assignment to a subregion of an image with the restrict_assign member function.

i2.restrict_assign (i1);   // For each point p in both i1 and i2's domains
                           // this performs the assignment
                           // i2[p] = i1(p);

One may restrict the value of an image to a specified set of points or to the set of points containing values in a specified set, and one may extend an image to the domain of another image.

i1 = restrict(i2, ps4);   // i1's domain will be the intersection of
                          // ps4 and i2.domain()

i1 = restrict(i2, IA_Set<int>(1, 2, 3));
                          // i1's domain will be all those
                          // points in i2's domain associated with value
                          // 1, 2, or 3 by i2.

i1 = extend(i2, IA_Image<IA_Point<int>, int>(ps4, 1));
                          // i1's domain will be the union of i2.domain() and ps4
                          // and value 1 will be associated with those
                          // points of ps4 not in i2.domain().

One may reduce an image to a single value with a binary operation on the range type if the image has an extensive domain.

// binary operation defined by a function
int add(int i, int j) { return i + j; }

cout << i1.reduce(&add, 0) << "\n";   // writes the result of adding
                                      // all the pixel values of i1

One can compose a C++ function of one parameter with an image having elements of the same type as the function's parameter. This lets one efficiently apply a user-defined function to each element of an image.

int my_function(int i) { . . . }
i1 = compose (my_function, i2);   // This is equivalent to assigning i1 = i2
                                  // and for each point p in i1.domain() executing
                                  // i1[p] = my_function(i1(p))

Likewise, since an image is conceptually a function, composition of an image with a function mapping points to points will yield a new image. The point set of such an image cannot be effectively computed, thus one must specify the set of points in the resulting image.

IA_Point<int> reflect_through_origin(const IA_Point<int> &p) { return -p; }

i1 = compose (i2, reflect_through_origin,
              IA_boxy_pset (-max(i2.domain()), -min(i2.domain())));
                                  // After executing this,
                                  // i1(p) == i2(reflect_through_origin(p)) == i2(-p).

Input/output, arithmetic, bitwise, relational, and type conversion operations are provided on the image classes.

IA_Image<IA_Point<int>, int> i3, i4;
IA_Image<IA_Point<int>, float> f1, f2;

// image I/O is supported for pbm, pgm, and ppm image formats
i3 = read_int_PGM ("image1.pgm");
i4 = read_int_PGM ("image3.pgm");

// conversions mimic the behavior of C++ casts
f1 = to_float (i3*i3 + i4*i4);
f2 = sqrt (f1);

// All relational operations between images, with the
// exception of !=, return a true boolean value
// if the relation holds at every pixel.
// != is the complement of the == relation.
if (i3 < i4)
    write_PGM (max(i3) - i3);

Iteration over the values or pixels in an image is supported by the library.

IA_IVIter<IA_Point<int>, int> v_iter(i1);   // declares v_iter to be an iterator over the
                                            // values of image i1

IA_IPIter<IA_Point<int>, int> p_iter(i1);   // declares p_iter to be an iterator over the
                                            // pixels of image i1

int i, sum = 0;

// iterate over values in i1 and collect sum
while (v_iter(i)) { sum += i; }
cout << sum << "\n";   // prints the sum

IA_Pixel<IA_Point<int>, int> pix;
IA_Point<int> psum = extend_to_point(0, i1.domain().dim());

// sum together the value-weighted pixel locations in i1 while (p_iter(pix)) psum += pix.point * pix.value;

cout << (psum / i1.card()) << "\n";   // prints i1's centroid.

Neighborhoods and Image-Neighborhood Operations

A neighborhood is a mapping from a point to a set of points. The iac++ library template class IA_Neighborhood<P, Q> provides the functionality of image algebra neighborhoods. To specify an instance of this class, one gives the type of point in the neighborhood's domain and the type of point which is an element of the range. The library contains the single instance IA_Neighborhood<IA_Point<int>, IA_Point<int> >, mapping from points with integral coordinates to sets of points with integral coordinates. The library provides the following kinds of operations:

(a)  constructors and assignment,

(b)  function application of the neighborhood to a point,

(c)  domain extraction, and

(d)  image-neighborhood reduction operations.

Neighborhood Constructors and Assignment

Let Q be the argument point type of a neighborhood and P be the type of elements in the result point set. Let nbh, nbh1, and nbh2 denote objects of type IA_Neighborhood<P, Q>. Let result_set be a set of type IA_Set<P> containing points with dimensionality dim that is to be associated with the origin of a translation invariant neighborhood. Let domain_set be a set of type IA_Set<Q> over which a neighborhood shall be defined. Let point_to_set_map be a function pointer with prototype IA_Set<P> point_to_set_map (Q) or with prototype IA_Set<P> point_to_set_map (const Q&).

Table A.2.21 Neighborhood Constructors and Assignment

construct an uninitialized neighborhood                   IA_Neighborhood<P, Q> ()

construct a copy of a neighborhood                        IA_Neighborhood<P, Q> (nbh)

construct a translation invariant neighborhood            IA_Neighborhood<P, Q> (dim, result_set)

construct a translation variant neighborhood from a
function                                                  IA_Neighborhood<P, Q> (domain_set, point_to_set_map)

assign a neighborhood value                               nbh1 = nbh2

Neighborhood operations presented in the sections that follow are introduced in Section 1.7.

Image-Neighborhood Reduction Operations

The definitions for the neighborhood reduction operations are provided in the following header files:

(a)  ia/BoolNOps.h for images of type IA_Image<P, bool>

(b)  ia/UcharNOps.h for images of type IA_Image<P, unsigned char>

(c)  ia/IntNOps.h for images of type IA_Image<P, int>

(d)  ia/FloatNOps.h for images of type IA_Image<P, float>

(e)  ia/CplxNOps.h for images of type IA_Image<P, IA_Complex>

In the table following, img denotes an image object, nbh denotes a neighborhood, and result_pset denotes the point set over which the result image is to be defined. If no result point set is specified, the resulting image has the same domain as img. For the generic neighborhood reduction operation, the function with prototype T binary_reduce (T, T) is a commutative function with identity zero used to reduce all the elements of a neighborhood. Alternatively, the function T n_ary_reduce (T*, unsigned n), which takes an array of n values of type T and reduces them to a single value, can be used to specify a neighborhood reduction.

Table A.2.22 Image-Neighborhood Reduction Operations

calculate the right image-neighborhood sum       sum (img, nbhd, result_pset)
                                                 sum (img, nbhd)

calculate the right image-neighborhood product   prod (img, nbhd, result_pset)
                                                 prod (img, nbhd)

calculate the right image-neighborhood maximum   max (img, nbhd, result_pset)
                                                 max (img, nbhd)

calculate the right image-neighborhood minimum   min (img, nbhd, result_pset)
                                                 min (img, nbhd)

calculate a right image-neighborhood reduction
with a programmer-specified reduction function   neighborhood_reduction (img, nbhd, result_pset, binary_reduce, zero)
                                                 neighborhood_reduction (img, nbhd, result_pset, n_ary_reduce)

Examples of Neighborhood Code Fragments

The neighborhood constructors provided in the library support both translation invariant and variant neighborhoods. A translation invariant neighborhood is specified by giving the set the neighborhood associates with the origin of its domain.

IA_Neighborhood<IA_Point<int>, IA_Point<int> >
    n1 (2, IA_boxy_pset (IA_Point<int>(-1,-1),
                         IA_Point<int>( 1, 1)));
                       // n1 is a neighborhood defined for all
                       // 2 dimensional points,
                       // associating a 3x3 neighborhood with each point.

// A neighborhood mapping function.
IA_Set<IA_Point<int> > nfunc (IA_Point<int> p) { ... }

// A neighborhood constructed from a mapping function.
IA_Neighborhood<IA_Point<int>, IA_Point<int> >
    n2 (IA_boxy_pset (IA_Point<int>(0,0),
                      IA_Point<int>(511, 511)),
        nfunc);

Neighborhood reduction functions map an image and a neighborhood to an image by reducing the neighbors of a pixel to a single value. The built-in neighborhood reduction functions max, min, prod, and sum are provided, together with functions supporting a user-specified reduction function.

IA_Image<IA_Point<int>, int> i1, i2;

i1 = max (i2, n1);   // Finds the maximum value in the neighbors of each pixel
                     // in image i2 and assigns the resultant image to i1.

// Specify a reduction function.
int or (int x, int y) { return (x | y); }

// Reduce an image by giving the image, neighborhood,
// reduction function, and the identity of the reduction
// function as arguments to the generic reduction functional.

i1 = neighborhood_reduction (i2, n1, i1.domain(), or, 0);

Templates and Image-Template Product Operations

An image algebra template can be thought of as an image which has image values. Thus, image algebra templates concern themselves with two different point sets, the domain of the template itself, and the domain of the images in the template's range. The iac++ library currently supports the creation of templates — in which both of these point sets are discrete — with the C++ template class IA_DDTemplate<I>, which takes as a class parameter an image type. The iac++ class library provides the following instances of IA_DDTemplate:

(a)  IA_DDTemplate<IA_Image<IA_Point<int>, bool> >
(b)  IA_DDTemplate<IA_Image<IA_Point<int>, unsigned char> >
(c)  IA_DDTemplate<IA_Image<IA_Point<int>, int> >
(d)  IA_DDTemplate<IA_Image<IA_Point<int>, float> >
(e)  IA_DDTemplate<IA_Image<IA_Point<int>, IA_Complex> >

The operations defined upon these IA_DDTemplate instances are defined in the following include files:

(a)  ia/BoolProd.h
(b)  ia/UcharProd.h
(c)  ia/IntProd.h
(d)  ia/FloatProd.h
(e)  ia/CplxProd.h

Template Constructors

Let the argument image type I of an IA_DDTemplate class be IA_Image<P, T>. Let templ, templ1, and templ2 denote objects of type IA_DDTemplate<I>. Let img denote the image to be associated with the origin by an image algebra template. Let dim denote the dimensionality of the domain of img. Let domain_pset denote the point set over which an image algebra template is to be defined. Let the function I templ_func (IA_Point<int>) be the point-to-image mapping function of a translation variant template.

Table A.2.23 Template Constructors and Assignment

construct an uninitialized template                 IA_DDTemplate<I> ()

construct a copy of a template                      IA_DDTemplate<I> (templ)

construct a translation invariant template          IA_DDTemplate<I> (dim, img)
                                                    IA_DDTemplate<I> (domain_pset, img)

construct a translation variant template from a
function                                            IA_DDTemplate<I> (domain_pset, templ_func)

assign a template value                             templ1 = templ2

Template operations presented in the sections that follow are introduced in Section 1.5.

Image-Template Product Operations

In Table A.2.24, let img denote an image object of type I, which is IA_Image<P, T>. Let templ denote an object of type IA_DDTemplate<I>, and let result_pset denote the point set over which the result image is to be defined. If no result point set is specified, the resulting image has the same domain as img. For the generic template reduction operation, the function with prototype T pwise_op (T, T) is a commutative function with identity pwise_zero used as a pointwise operation on the source image and template image, and the function with prototype T binary_reduce (T, T) is a commutative function with identity reduce_zero used to reduce the image resulting from the pointwise combining. Alternatively, the function T n_ary_reduce (T*, unsigned n), which takes an array of n values of type T and reduces them to a single value, can be used to specify a template reduction.

In the product operations, the following correspondences may be drawn between iac++ library function names and the image algebra symbols used to denote them:

(a)  linear_product corresponds to the linear product •,

(b)  additive_maximum corresponds to the additive maximum product,

(c)  additive_minimum corresponds to the additive minimum product,

(d)  multiplicative_maximum corresponds to the multiplicative maximum product,

(e)  multiplicative_minimum corresponds to the multiplicative minimum product, and

(f)  generic_product corresponds to the generic image-template product, with pwise_op corresponding to the pointwise operation, binary_reduce corresponding to the binary reduction, and n_ary_reduce corresponding to the n-ary reduction.

Examples of Template Code Fragments

Constructors for IA_DDTemplate support both translation invariant and variant templates. The translation invariant templates are specified by giving the image result of applying the template function at the origin.

Table A.2.24 Image-Template Product Operations

calculate the right image-template linear product        linear_product (img, templ, result_pset)
                                                         linear_product (img, templ)

calculate the right image-template additive maximum      addmax_product (img, templ, result_pset)
                                                         addmax_product (img, templ)

calculate the right image-template additive minimum      addmin_product (img, templ, result_pset)
                                                         addmin_product (img, templ)

calculate the right image-template multiplicative
maximum                                                  multmax_product (img, templ, result_pset)
                                                         multmax_product (img, templ)

calculate the right image-template multiplicative
minimum                                                  multmin_product (img, templ, result_pset)
                                                         multmin_product (img, templ)

calculate a right image-template product with a
programmer-specified reduction function                  generic_product (img, templ, result_pset, pwise_op, binary_reduce, reduce_zero, pwise_zero)
                                                         generic_product (img, templ, result_pset, pwise_op, n_ary_reduce, pwise_zero)

calculate the left image-template linear product         linear_product (templ, img, result_pset)
                                                         linear_product (templ, img)

calculate the left image-template additive maximum       addmax_product (templ, img, result_pset)
                                                         addmax_product (templ, img)

calculate the left image-template additive minimum       addmin_product (templ, img, result_pset)
                                                         addmin_product (templ, img)

calculate the left image-template multiplicative
maximum                                                  multmax_product (templ, img, result_pset)
                                                         multmax_product (templ, img)

calculate the left image-template multiplicative
minimum                                                  multmin_product (templ, img, result_pset)
                                                         multmin_product (templ, img)

calculate a left image-template product with a
programmer-specified reduction function                  generic_product (templ, img, result_pset, pwise_op, binary_reduce, reduce_zero, pwise_zero)
                                                         generic_product (templ, img, result_pset, pwise_op, n_ary_reduce, pwise_zero)

IA_Image<IA_Point<int>, int> i1, i2;
IA_DDTemplate<IA_Image<IA_Point<int>, int> > t1, t2;

i1 = read_int_PGM ("temp-file.pgm");   // read the template image
i2 = read_int_PGM ("image.pgm");       // read an image to process

t1 = IA_DDTemplate<IA_Image<IA_Point<int>, int> > (i2.domain(), i1);
     // Define template t1 over the same domain as image i2.

t2 = IA_DDTemplate<IA_Image<IA_Point<int>, int> > (2, i1);
     // Define template t2 to apply anywhere
     // in 2D integer Cartesian space.

Templates can be defined by giving a function mapping points into their image values.

IA_DDTemplate<IA_Image<IA_Point<int>, int> > tv;
const IA_Set<IA_Point<int> > ps (IA_boxy_pset (IA_Point<int>(0,0),
                                               IA_Point<int>(511,511)));

// function from point to image
IA_Image<IA_Point<int>, int> f(const IA_IntPoint &p)
{
    return IA_Image<IA_Point<int>, int>(ps, p[0]);
}

tv = IA_DDTemplate<IA_Image<IA_Point<int>, int> >(ps, &f);
     // tv applied to any point p, yields the image
     // f(p) as its value.

Finally, any of these types of templates may be used with image-template product operations. The operations linear_product, addmax_product, addmin_product, multmax_product, and multmin_product are provided by the iac++ library together with a generic image-template product operation supporting any user-specifiable image-template product operations.

IA_DDTemplate<IA_Image<IA_Point<int>, int> > templ;
IA_Image<IA_Point<int>, int> source, result;
IA_Set<IA_Point<int> > ps;

// ... code to initialize source and templ

result = linear_product(source, templ, ps);
     // result will be the linear product of source and
     // templ defined over point set ps.

result = addmax_product(source, templ);
     // result will be the add_max product of source and
     // templ defined over default point
     // set source.domain()

result = multmin_product(source, templ);
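The generic product takes explicit pointwise and reduction functions. A minimal sketch, reusing the or function from the neighborhood examples above and the right-product argument order of Table A.2.24 (both identities are 0 for or):

result = generic_product (source, templ, ps,
                          or, or, 0, 0);
     // combine source and template values pointwise with or,
     // then reduce with or; reduce_zero and pwise_zero are both 0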

A.3. Examples of Programs Using iac++

The examples that follow present brief programs employing the iac++ library. The first example shows how to perform averaging of multiple images. This example involves the use of image input and output operations and binary operations upon images. The second example shows how to compose an image with a point-to-point function to yield a spatial/geometric modification of the original image. The third example presents local averaging using a neighborhood operation in two ways: using the sum reduction operation (and incurring an edge effect) and using an n-ary reduction operation. The fourth and final example presents the Hough transform.

Example 1. Averaging of Multiple Images

// example1.c -- Averaging Multiple Images
//
// usage: example1 file-1.pgm [file-2.pgm ... file-n.pgm]
//
// Read a sequence of images from files and average them.
// Display the result.
// Assumes input files are pgm images all having the same pointset
// and containing unsigned char values.
//

#include "ia/UcharDI.h"#include "ia/FloatDI.h"

int main(int argc, char **argv)
{
IA_Image<IA_Point<int>, float> accumulator;
IA_Image<IA_Point<int>, u_char> result;

if (argc < 2) {
    cerr << "usage: " << argv[0]
         << " file-1.pgm [file-2.pgm ... file-n.pgm]" << endl;
    abort();
}

// Get the first image
accumulator = to_float (read_uchar_PGM (argv[1]));

// Sum the first image together with the rest
for (int i = 1; i < argc - 1; i++) {
    cout << "Reading " << argv[i+1] << endl;
    accumulator += to_float (read_uchar_PGM (argv[i+1]));
}

result = to_uchar (accumulator / float (argc - 1));
display (result);
}

Example 2. Composing an Image with a Point-to-Point Function

//
// example2.c -- Composition of an Image with a Function
//

#include <iostream.h>
#include "ia/UcharDI.h"

//
// point-to-point mapping function
//

IA_Point<int>
reflect_through_origin (const IA_Point<int> &p)
{
    return -p;
}

int
main ()
{
// read in image from cin
IA_Image<IA_Point<int>, u_char> img = read_uchar_PGM (cin);

IA_Image<IA_Point<int>, u_char> result;

result = compose (img, reflect_through_origin,
                  IA_boxy_pset (-max (img.domain()),
                                -min (img.domain())));
display (result);

return 0;
}

Example 3. Using Neighborhood Reductions for Local Averaging

//
// example3.c -- Local Averaging
//
// usage: example3 < file.pgm
//
// Read an image from cin and average it locally
// with a 3x3 neighborhood.
//

#include "math.h"#include "ia/UcharDI.h"#include "ia/IntDI.h"#include "ia/Nbh.h"#include "ia/UcharNOps.h"u_charaverage (u_char *uchar_vector, unsigned num){ if (0 == num) { return 0; } else { int sum = 0; for (int i = 0; i < num; i++) { sum += uchar-vector[i]; } return u_char (irint (float (sum) / num)); }}

int main(int argc, char **argv)
{
IA_Image<IA_Point<int>, u_char> source_image = read_uchar_PGM (cin);

IA_Neighborhood <IA_Point<int>, IA_Point<int> >
    box (2, IA_boxy_pset (IA_Point<int> (-1, -1),
                          IA_Point<int> (1, 1)));

//
// Reduction with sum and division by nine yields a boundary
// effect at the limits of the image point set due to the
// lack of nine neighbors.
//
display (to_uchar (sum (source_image, box) / 9));

//
// Reduction with the n-ary average function correctly
// generates average values even at the boundary.
//
display (neighborhood_reduction (source_image, box,
                                 source_image.domain(), average));
}

Example 4. Hough Transform

main.c

//
// example4.c -- Hough Transform
//
// usage: example4 file.pgm
//
// Read an image containing linear features from a file,
// accumulate its Hough transform, and display the line
// associated with the largest accumulator cell.
//

#include "ia/IntDI.h" // integer valued images#include "ia/UcharDI.h" // unsigned character images#include "ia/IntNOps.h" // neighborhood operations

#include "hough.h" // Hough transform functions

int main(int argc, char **argv)
{
IA_Image<IA_Point<int>, int> source = read_int_PGM (argv[1]);

IA_Image<IA_Point<int>, int> result;

IA_Set<IA_Point<int> > accumulator_domain = source.domain();

//
// Initialize parameters for the Hough Neighborhood
//
hough_initialize (source.domain(), accumulator_domain);

IA_Neighborhood<IA_Point<int>, IA_Point<int> >
    hough_nbh (accumulator_domain, hough_function);

IA_Image<IA_Point<int>, int> accumulator;

display (to_uchar (source * 255 / max(source)));

//
// Map feature points to corresponding locations
// in the accumulator array.
//
accumulator = sum (source, hough_nbh, accumulator_domain);

display (to_uchar(accumulator * 255 / max(accumulator)));

// // Threshold the accumulator //

accumulator = to_int(chi_eq (accumulator, max(accumulator)));

display (to_uchar(accumulator*255));

accumulator = restrict (accumulator, IA_Set<int>(1));

//
// Map back to see the corresponding lines in the
// source domain.
//
result = sum (hough_nbh, accumulator, source.domain());

display (to_uchar(result * 255 / max(result)));
}

hough.h

// hough.h

//
// Copyright 1995, Center for Computer Vision and Visualization,
// University of Florida. All rights reserved.

#ifndef _hough_h_
#define _hough_h_

#include "ia/Nbh.h"#include "ia/IntPS.h"

void
hough_initialize (IA_Set<IA_Point<int> > image_domain,
                  IA_Set<IA_Point<int> > accumulator_domain);

IA_Set<IA_Point<int> >
hough_function (const IA_Point<int> &r_t);

// Accomplish the Hough Transform as follows:
//
// Given a binary image 'Source' containing linear features
//
// Call hough_initialize with first argument Source.domain()
// and with second argument being the accumulator domain
// (an r by t set of points with r equal to the number of
// accumulator cells for rho and t equal to the number of
// accumulator cells for theta).
//
// Create an IA_Neighborhood as follows:
// IA_Neighborhood<IA_Point<int>, IA_Point<int> >
//     HoughNbh (AccumulatorDomain, hough_function);
//
// Then calculate the Hough Transform of a (binary) image
// as follows:
//
// Accumulator = sum (Source, HoughNbh);

#endif

hough.c

// hough.c
//
// Copyright 1995, Center for Computer Vision and Visualization,
// University of Florida. All rights reserved.

#include "hough.h"#include "math.h"#include <iostream.h>

static const double HoughPi = atan(1.0)*4.0;
static double *HoughCos;
static double *HoughSin;

static int RhoCells;
static int ThetaCells;
static IA_Point<int> IterationMin;
static IA_Point<int> IterationMax;
static IA_Point<int> ImageSize;
static IA_Point<int> Delta;

//
// hough_initialize:
//
// Initializes parameters for HoughFunction to allow a neighborhood
// to be created for given image and accumulator image point sets.
//

void
hough_initialize (IA_Set<IA_Point<int> > image_domain,
                  IA_Set<IA_Point<int> > accumulator_domain)
{
//
// Check to make sure the image domain and accumulator domain
// are both 2 dimensional rectangular point sets.
//

if (!image_domain.boxy() || !accumulator_domain.boxy() ||
    image_domain.dim != 2 || accumulator_domain.dim != 2) {
    cerr << "Hough transform needs 2-D rectangular domains." << endl;
}

//
// Record data necessary to carry out rho,theta to
// source pointset transformation.
//

ImageSize = image_domain.sup() - image_domain.inf() + 1;
Delta = ImageSize / 2;
IterationMin = image_domain.inf() - Delta;
IterationMax = image_domain.sup() - Delta;

RhoCells = accumulator_domain.sup()(0) - accumulator_domain.inf()(0) + 1;
ThetaCells = accumulator_domain.sup()(1) - accumulator_domain.inf()(1) + 1;

//
// Create sine and cosine lookup tables for specified
// accumulator image domain.
//

HoughSin = new double [ThetaCells];
HoughCos = new double [ThetaCells];

double t = HoughPi / ThetaCells;
for (int i = accumulator_domain.inf()(1);
     i <= accumulator_domain.sup()(1); i++ ) {
    HoughSin[i] = sin (t * i);
    HoughCos[i] = cos (t * i);
}
}

//
// hough_function
//
// This function is used to construct a Hough transform
// neighborhood. It maps a single accumulator cell location
// into a corresponding set of (x,y) coordinates in the
// source image.
//

IA_Set<IA_Point<int> >
hough_function (const IA_Point<int> &r_t)
{
double theta, rho;

//
// Convert accumulator image pixel location to
// correct (rho, theta) location.
//

rho = double (r_t(0) - RhoCells/2) * enorm(ImageSize)/RhoCells;
theta = r_t(1) * HoughPi / ThetaCells;

IA_Point<int> *p_ptr, *pp;
int coord, i;
int num_points;

//
// Construct vector of (x,y) points associated with (rho,theta)
// We check theta to determine whether we should make
// x a function of y or vice versa.
//

if (theta > HoughPi/4.0 && theta < 3.0*HoughPi/4.0) {
//
// Scan across all 0th coordinate indices
// assigning corresponding 1st coordinate indices.
//
num_points = ImageSize(0);
p_ptr = new IA_Point<int> [num_points];

for (coord = IterationMin(0), pp = p_ptr; coord <= IterationMax(0); coord++, pp++) {

*pp = Delta + IA_Point<int> (coord,
        nint (rho - (coord * HoughCos [r_t(1)] / HoughSin [r_t(1)])));
}
} else {
//
// Scan across all 1st coordinate indices
// assigning corresponding 0th coordinate indices.
//
num_points = ImageSize(1);
p_ptr = new IA_Point<int> [num_points];

for (coord = IterationMin(1), pp = p_ptr;
     coord <= IterationMax(1); coord++, pp++) {

*pp = Delta + IA_Point<int> (
        nint (rho - (coord * HoughSin [r_t(1)] / HoughCos [r_t(1)])),
        coord);
}
}

//
// Turn vector of points into a point set
//

IA_Set<IA_Point<int> > result (2, p_ptr, num_points);
delete [] p_ptr;
return result;
}

Figure A.3.1  Source image (left) and associated accumulator array image (right).

Figure A.3.2  Binarized accumulator image (left) and the line associated with the identified cell (right).

A.4. References

1  B. Stroustrup, The C++ Programming Language, Second Edition. Reading, MA: Addison-Wesley, 1991.

2  M. Ellis and B. Stroustrup, The Annotated C++ Reference Manual. Reading, MA: Addison-Wesley, 1990.

3  J. Poskanzer, “Pbmplus: Extended portable bitmap toolkit.” Internet distribution, Dec. 1991.


INDEX

A
Additive maximum 27

Additive minimum 27

Averaging of multiple images 51, 347

Averaging, local 348

B
Bidirectional associative memory (BAM) 296

Binary image boundary finding 79

Binary image operations 16

Blob coloring 161

Bounded lattice ordered group 11

Butterworth highpass filter 76

C
Cellular array computer 1

Cellular automata

life 315

maze solving 316

Chain code 260

Characteristic function 6

Chessboard distance 6

City block distance 6

Closing 178, 183

Composition, functional 15, 337, 348

Concatenation of images 19

Connected component algorithms

binary image component labeling 161

counting components by shrinking 166

hole filling 170

pruning 169

sequential labeling 164

Cooley-Tukey radix-2 fast Fourier transform 195

Correlation 225

Cosine transform 201

Crack edge detection 99

Cross product 5

Cumulative histogram 280

D
Daubechies wavelet transform 217

Diamond distance 6

Dilation 173, 183

Directional edge detection 95

Discrete differencing 81

Distance 6

Distance transforms 147

Divide-and-conquer boundary detection 116

Domain, of an image 19

Dot product 5

Dynamic programming, edge following as 119

E
Edge detection and boundary finding
binary image boundaries 79

crack edge detection 99

directional edge detection 95

discrete differencing 81

divide-and-conquer boundary detection 116

dynamic programming, edge following as 119

Frei-Chen edge and line detection 90

hierarchical edge detection 104

Hueckel edge operator 110

k-forms edge detector 106

Kirsch edge detector 93

local edge detection in 3-D images 102

Prewitt edge detector 85

product of the difference of averages 98

Roberts edge detector 84

Sobel edge detector 87

Wallis edge detector 89

Ellipse detection 246

Erosion 173, 183

Euclidean distance 6

Euclidean norm 6

Euler number 258

Exponential highpass filter 76

Extension 18

F
Fast Fourier transform 195

Floor 7

Fourier transform 189, 195

Fourier transform, centered 192

Frei-Chen edge and line detection 90

Frequency domain pattern matching 229

G
Generalized matrix product (p-product) 41

Global thresholding 125

H
Haar wavelet transform 209

Hadamard product 5

Hamming net 301

Heterogeneous algebra 4, 10

Hierarchical edge detection 104

Highpass filtering 76

Histogram 278

cumulative 280

equalization 66

modification 67

Hit-and-miss transform 181

Hole filling 170, 265

Homogeneous algebra 10

Hopfield neural network 290

Hough transform 239, 246, 251, 349

Hueckel edge operator 110

I
iac++ 321

Image I/O 335

images 331

neighborhoods 340

pixels 331

point sets 325

points 322

sets 324

templates 343

value sets 329

Ideal highpass filter 76

Image 13, 331

Image algebra C++ library (iac++) 321

Image enhancement

averaging of multiple images 51

highpass filtering 76

histogram equalization 66

histogram modification 67

iterative conditional local averaging 54

local averaging 53

local contrast enhancement 65

lowpass filtering 68

max-min sharpening 55

median filter 60

smoothing binary images by association 56

unsharp masking 63

variable local averaging 53

Image features and descriptors

area and perimeter 257

chain code extraction and correlation 260

Euler number 258

histogram 278

histogram, cumulative 280

moment invariants 276

orientation 274

position 274

quadtree 271

region adjacency 265

region inclusion relation 268

spatial dependence matrix 281

symmetry 274

texture 281

Image operations 22

binary 16, 22, 333

concatenation 19

domain 19, 333

extension 18, 334

functional composition 15, 337, 348

induced 13

multi-valued 20

pseudo inverse 17

restriction 18, 334

unary 15, 23, 333

Image-neighborhood product 40, 340

Image-template product 25, 28

additive maximum 27

additive minimum 27

iac++ 343

left 26

linear 25

multiplicative maximum 28

multiplicative minimum 28

recursive 34

right 25

Inclusion relation on regions 268

Infimum (inf) 8

Iterative conditional local averaging 54

K

K-forms edge detector 106

Kirsch edge detector 93

L

Left image-template product 26

Life, game of 315

Line detection 239

Linear image transforms

cosine transform 201

Daubechies wavelet transform 217

fast Fourier transform 195

Fourier transform 189

Fourier transform, centered 192

Haar wavelet transform 209

separability 196

Walsh transform 205

Linear image-template product 25

Local averaging 53, 348

Local contrast enhancement 65

Local edge detection in 3-D images 102

Lowpass filtering 68

M

Matched filter 225

Mathematical morphology 1

Max-min sharpening 55

Maze solving 316

Medial axis transform 145

Median filter 60

Mellin transform 237

Moment invariants 276

Moore neighborhood 7

Morphological transforms

Boolean dilations and erosions 173

gray value dilations, erosions, openings, and closings 183

hit-and-miss transform 181

opening and closing 178

rolling ball algorithm 185

salt and pepper noise removal 179

Morphology 1

Multilevel thresholding 128

Multiplicative maximum 28

Multiplicative minimum 28

N

Neighborhood 7, 37, 340

Neural networks

bidirectional associative memory (BAM) 296

Hamming net 301

Hopfield neural network 290

multilayer perceptron (MLP) 308

single-layer perceptron (SLP) 305

Norm 6

O

Opening 178, 183

P

p-Product 41

Parameterized template 25

Partially ordered set (poset) 33

Pattern matching and shape detection

correlation 225

frequency domain matching 229

Hough transform ellipse detection 246

Hough transform for generalized shape detection 251

Hough transform line detection 239

matched filter 225

phase-only matched filter 229

rotation and scale invariant pattern matching 237

rotation invariant pattern matching 234

symmetric phase-only matched filter 229

template matching 225

Pavlidis thinning algorithm 143

Perceptron

multilayer 308

single-layer 305

Phase-only matched filter 229

Pixel 13, 331

Point 4, 322

Point operations 5, 7, 322

Point set 4, 325

Point set operations 8, 325

Power set 6

Prewitt edge detector 85

Product of the difference of averages 98

Projection 6

Pruning connected components 169

Pyramid 104

Q

Quadtree 271

R

Recursive template 33

Region

adjacency 265

area and perimeter 257

moment invariants 276

orientation 274

position 274

symmetry 274

Region inclusion relation 268

Restriction 18

Right image-template product 25

Right xor maximum 28

Right xor minimum 28

Roberts edge detector 84

Rolling ball algorithm 185

Rotation and scale invariant pattern matching 237

Rotation invariant pattern matching 234

S

Salt and pepper noise removal 179

Scale and rotation invariant pattern matching 237

Semithresholding 126

Smoothing binary images by association 56

Sobel edge detector 87

Spatial gray level dependence matrix 281

Spatial transform 337

Spatial transforms 8, 348

Supremum (sup) 8, 9

Symmetric phase-only matched filter 229

T

Template 23

iac++ 343

parameterized 25

recursive 33

Template matching 225

Template operations 29, 32

recursive 36

Template product (see Image-template product) 25

Texture descriptors 281

Thinning and skeletonizing

distance transforms 147

medial axis transform 145

Pavlidis thinning algorithm 143

thinning edge magnitude images 156

Zhang-Suen skeletonizing 151, 154

Thinning edge magnitude images 156

Threshold selection

maximizing between class variance 131

mean and standard deviation 129

simple image statistic 137

Thresholding techniques

global thresholding 125

multilevel thresholding 128

semithresholding 126

variable thresholding 129

U

Unsharp masking 63

V

Value set 10, 329

Variable local averaging 53

Variable thresholding 129

von Neumann neighborhood 7

W

Wallis edge detector 89

Walsh transform 205

Wavelet transform 209, 217

Z

Zhang-Suen skeletonizing 151, 154
