

2018 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 9, NO. 12, DECEMBER 2000

Real-Time Morphology Processing Using Highly Parallel 2-D Cellular Automata CAM²

Takeshi Ikenaga, Member, IEEE, and Takeshi Ogura, Member, IEEE

Abstract—Mathematical morphology is a promising computer paradigm based on set theory and has many applications in image processing. Although some architectures have been proposed, there are as yet no compact, practical computers that can handle a variety of morphological operations with large, complex structuring elements at video rates. This has prevented the great potential of morphology from being fully realized. This paper describes a morphology processing method that uses a highly parallel two-dimensional (2-D) cellular automaton architecture called CAM² (Cellular AutoMata on Content Addressable Memory). New mapping methods achieve high-throughput complex morphology processing. Evaluation results show that CAM² performs one morphological operation for basic structuring elements within 30 µs. Furthermore, CAM² can also handle an extremely large and complex structuring element of 100 × 100 at video rates. CAM² will increase the potential use of morphology and make a significant contribution to the development of various real-time image processing systems.

Index Terms—Cellular automaton, content addressable memory, mathematical morphology, pattern spectrum, real-time image processing.

I. INTRODUCTION

MATHEMATICAL morphology [1] is an image transformation technique that locally modifies geometric features through set operations. It is a powerful tool with various applications [2]–[6], such as nonlinear image filtering, noise suppression, smoothing, and shape recognition, and it is becoming very common in image processing. There are three prerequisites for the fuller realization of the potential of morphology:

• complex processing combining various morphological operations (including other operations, such as discrete-time cellular neural networks [7], linear filtering [8], and area calculation);

• processing with large and complex structuring elements;

• high-speed (real-time) processing.

The achievement of these goals requires hardware with extremely high performance and high-frequency memory accesses. That makes general-purpose sequential machines like personal computers (PCs) and workstations (WSs) unsuitable.

To address these problems, some special-purpose architectures for morphology have been proposed [9]–[13]. Most of them employ a pipeline technique in which a raster-scan image is sequentially fed into a processing element (PE) array and the morphological operations are carried out in parallel in each PE. Since the functions of the PEs and the network structure are fully tuned to morphology, other operations crucial to practical image processing cannot be performed. The fixed network structure also limits the size and shape of the structuring elements. Furthermore, there are at most several dozen PEs. This prevents the full use of the abundant parallelism (pixel order) of morphology, and, as a result, the processing speed of the pipeline type is not high enough for many real-time applications. Against this backdrop, it is clear that none of these conventional architectures is suitable for building a morphology processing platform that satisfies the above three prerequisites.

Manuscript received February 26, 1998; revised June 12, 2000. This work was supported by Y. Sakai, O. Karatsu, K. Takeya, and R. Kasai. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Sridhar Lakshmanan.

The authors are with NTT Lifestyle and Environmental Technology Laboratories, Kanagawa 243-0198, Japan (e-mail: [email protected]).

Publisher Item Identifier S 1057-7149(00)10069-7.

A two-dimensional (2-D) cellular array architecture [14]–[16], which consists of 2-D PEs and interconnection networks, is another candidate for the platform because it is the most natural architecture for morphology. The drawback of the conventional fully parallel approach is the huge amount of hardware involved. At most, only several dozen PEs can be embedded onto one VLSI chip. So, enormous numbers of VLSI chips are required to realize pixel-order parallelism, which is crucial for extracting the performance. Moreover, 2-D interconnection networks cause I/O bottlenecks, so it is difficult to increase the number of PEs even if state-of-the-art LSI technology is used.

This paper describes a morphology processing method that uses CAM². CAM² is a compact, high-performance, flexible, and highly parallel 2-D cellular automaton (CA) [17]. CAM² can attain pixel-order parallelism on a single PC board because it is composed of a content addressable memory (CAM), which makes it possible to embed great numbers of PEs, corresponding to CA cells, onto one VLSI chip. New mapping methods achieve high-throughput complex morphology processing. Evaluation results show that CAM² performs one morphological operation for basic structuring elements within 30 µs. This means that more than 1000 operations can be carried out on the whole image at video rates (33 ms). CAM² also handles an extremely large and complex structuring element of 100 × 100 at video rates. Furthermore, it performs practical image processing, such as pattern spectrum and multiple object tracking, through a combination of morphology and other algorithms.

Section II presents the features of CAM². This is followed by a description of the morphology processing method, including pattern spectrum processing, in Sections III and IV. After discussing the application development environment in Section V, performance evaluation results and some examples of image processing combining morphology and other algorithms are presented in Section VI.

1057–7149/00$10.00 © 2000 IEEE


IKENAGA AND OGURA: REAL-TIME MORPHOLOGY PROCESSING 2019

Fig. 1. Image processing system based on HiPIC.

II. FEATURES OF CAM²

A. Key Technologies: CAM and HiPIC

CAM² was established on our CAM LSI and CAM-based system technologies. As a CAM-based system model, a Highly-parallel Integrated Circuits and System (HiPIC) was proposed [18], [19] for real-time image processing, and various practical real-time image processing systems [20]–[24] have been developed. Fig. 1 illustrates a typical image processing system based on HiPIC. The configuration is very simple. It consists of a video camera, a personal computer, and add-on boards based on HiPIC. Using HiPIC, an application-specific system that achieves high performance and flexibility can be easily realized. That is why we also employed it for CAM².

Fig. 2 shows a block diagram of CAM². According to the HiPIC concept, CAM² consists of a highly parallel PE array, a reconfigurable logic element, a RISC processor or DSP, and some memory. The highly parallel PE array, a 2-D array of dedicated CAMs, executes SIMD (Single Instruction, Multiple Data stream) processing for high-volume image data. The logic element controls the PE array and interfaces with the image data and an external processor. The processor performs serial data processing. The memory stores images, microprograms, and temporary data.

The main feature of CAM² is a dedicated CAM for the highly parallel PE array. A CAM performs various types of parallel data processing for CA with words as the basic unit. Moreover, since its memory-based structure is the most suitable for implementation with LSI technology, several hundred thousand PEs, that is, CA cells, can be realized on a single PC board using state-of-the-art deep-submicron CMOS technology. Furthermore, multiple zigzag mapping [17] enables 2-D CA cells to be mapped into CAM words, even though physically a CAM has a one-dimensional structure, as shown in Fig. 2.

Another important feature is a control scheme that uses an FPGA, which is a reconfigurable logic element. Since an FPGA can easily generate various command sequences, CAM² efficiently performs practical image processing using not only binary but also gray-scale morphology in combination with other algorithms, such as discrete-time cellular neural networks [25].

B. Basic Functions of CAM

Fig. 2. Block diagram of CAM².

CA processing using CAM² is carried out by iterating two operations: CA-value transfer and CA-value update. In the former, the value of the original cell is transferred to its nearest neighboring cells. In the latter, the next value of the original cell is calculated by a particular transition rule. To carry these out, CAM has not only normal RAM operations, such as word reads and writes using addresses (3.1), but also the following three functions:

• maskable OR search (3.2);

• partial and parallel write (3.3);

• shift up/down mode of hit-flags (3.4).

For the search, the results are accumulated in hit-flag registers by means of OR logic. For the writes, the data are written into specific bit positions of multiple words for which the value of the hit-flag register is 1. For the shift, the hit flags are shifted to upper or lower words. Although these functions are very simple, any type of CA operation can be carried out in a bit-serial, word-parallel manner through the iteration of these operations.
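The three functions can be mimicked in a few lines of software. The following is only a toy model, not the CAM hardware or its command set: the class name `ToyCAM`, the method names, and the choice that "down" means toward higher word addresses are all assumptions made for illustration.

```python
class ToyCAM:
    """Toy model of a CAM: a list of integer words plus one hit flag per word."""

    def __init__(self, words):
        self.words = list(words)
        self.hits = [0] * len(self.words)

    def search_or(self, key, mask):
        """Maskable OR search: compare only the masked bits of every word with
        the key and OR each match into the corresponding hit flag."""
        for i, word in enumerate(self.words):
            if (word & mask) == (key & mask):
                self.hits[i] |= 1

    def parallel_write(self, data, mask):
        """Partial, parallel write: overwrite the masked bits of every word
        whose hit flag is 1."""
        for i, hit in enumerate(self.hits):
            if hit:
                self.words[i] = (self.words[i] & ~mask) | (data & mask)

    def shift_hits(self, down=True):
        """Shift all hit flags by one word (assumed: down = higher address)."""
        if down:
            self.hits = [0] + self.hits[:-1]
        else:
            self.hits = self.hits[1:] + [0]


cam = ToyCAM([0b0001, 0b0100, 0b0101, 0b0000])
cam.search_or(key=0b0001, mask=0b0001)        # hit words whose bit 0 is 1
cam.search_or(key=0b0100, mask=0b0100)        # OR in words whose bit 2 is 1
hits_after_search = list(cam.hits)
cam.shift_hits(down=True)                     # pass each flag to the next word
cam.parallel_write(data=0b1000, mask=0b1000)  # set bit 3 of the flagged words
```

In this model a search followed by a hit-flag shift and a parallel write moves one bit of information from each word to its neighbor, which is the kind of building block the CA-value transfer relies on.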

Since CAM has only simple functions (thus allowing high-density implementation of CAM), the processing power of each CA cell (PE) is much lower than that of conventional highly parallel machines, which support a variety of multibit arithmetic functions. Because of this drawback, processing time becomes longer as the complexity and bit length of operations increase. However, morphology requires only simple operations like logical OR and maximum, not complex ones like multiplication, which is commonly used in image processing filters. Furthermore, the dynamic range of morphological operations is fixed; for example, 1 bit and 8 bits for SP (set processing) and FSP (function and set processing), respectively. Therefore, the drawback mentioned above is not a serious obstacle to morphological processing. On the contrary, the simplicity is an advantage because it enables an enormous number of PEs to be built on a single CAM chip, which allows the parallelism of morphology to be more fully exploited.

III. MORPHOLOGY PROCESSING USING CAM²

A. Definition of Morphology

Morphology falls into three categories [2]: set processing (SP), function and set processing (FSP), and function processing (FP). Each has four basic operations: dilation, erosion, closing, and opening. Dilation (⊕) and erosion (⊖) are defined by

SP: dilation (1), erosion (2)

FSP: dilation (3), erosion (4)

FP: dilation (5), erosion (6)

where each operation takes the original image and the structuring element (SE) as its operands.

As shown in the equations, dilation in SP employs the Minkowski addition of the original image and the structuring element. For erosion, addition is replaced by subtraction. FSP and FP are used for gray-scale image processing, which employs maximum and minimum operations. Closing (•) and opening (∘) are combinations of dilation and erosion.
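For reference, the textbook forms of these operations can be written as follows. This is a sketch in the common notation of the morphology literature; the symbols $X$ and $f$ (original image), $B$ and $g$ (structuring element), and the reflection convention chosen here are assumptions and may differ from those used in equations (1) to (6).

```latex
\begin{align*}
\text{SP:}\quad  X \oplus B &= \{\, x + b : x \in X,\ b \in B \,\}, \\
                 X \ominus B &= \{\, z : z + b \in X \ \text{for all}\ b \in B \,\}, \\
\text{FSP:}\quad (f \oplus B)(x) &= \max_{b \in B} f(x - b), \\
                 (f \ominus B)(x) &= \min_{b \in B} f(x + b), \\
\text{FP:}\quad  (f \oplus g)(x) &= \max_{y} \{\, f(x - y) + g(y) \,\}, \\
                 (f \ominus g)(x) &= \min_{y} \{\, f(x + y) - g(y) \,\}, \\
\text{closing:}\quad X \bullet B &= (X \oplus B) \ominus B, \\
\text{opening:}\quad X \circ B &= (X \ominus B) \oplus B .
\end{align*}
```

Under this convention the SP dilation is exactly the Minkowski addition mentioned above, and the gray-scale forms replace the set union and intersection by maximum and minimum.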

B. Morphology Mapping to CAM²

CAM² is a 2-D cellular automaton defined as follows:

• a set of 2-D cells (PEs), each with its own value;

• all the cells update their values simultaneously in discrete steps by a transition rule that uses the values of the original cell and its nearest neighbors.

For efficient execution of morphological equations by CAM², the following mapping scheme was devised:

• each pixel of the original image is mapped to a CA cell (PE) of CAM²;

• the next value of each CA cell (the result of the morphological operation) is determined by a set operation (logical OR/AND, maximum, minimum, etc.) on the values of the original cell and its neighboring cells, where the locations of the neighboring cells are defined by the structuring element.

If this mapping is adopted, morphology can be regarded as a CA in which the neighbors are determined by the structuring element.

An example of dilation in FSP, in which the original image is gray scale and the structuring element is binary, is shown in Fig. 3. The set operation is maximum. For cell (7, 7), the value in the pixel below is the maximum of the values of the original and neighboring pixels, so this value is selected as the dilation result. For cell (7, 3), all the values are 0, so the dilation result is also 0. Any type of morphological processing can be done in the same way.
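The mapping can be illustrated with a small software sketch in which every pixel plays the role of one CA cell and takes the maximum over the neighbors selected by a binary structuring element. The function name `dilate_fsp`, the offset-list representation of the structuring element, and the rule of ignoring neighbors outside the image are assumptions for illustration; the loops run sequentially here, whereas the architecture updates all cells at once.

```python
def dilate_fsp(image, offsets):
    """FSP dilation: gray-scale image (list of rows), binary SE given as a list
    of (dy, dx) offsets. Each cell takes the maximum over its SE neighbors;
    neighbors that fall outside the image are ignored."""
    height, width = len(image), len(image[0])
    result = [[0] * width for _ in range(height)]
    for y in range(height):          # conceptually, all cells update in parallel
        for x in range(width):
            values = [
                image[y - dy][x - dx]
                for dy, dx in offsets
                if 0 <= y - dy < height and 0 <= x - dx < width
            ]
            result[y][x] = max(values, default=0)
    return result


# Rhombic SE: the cell itself plus its right, left, lower, and upper neighbors.
RHOMBUS = [(0, 0), (0, 1), (0, -1), (1, 0), (-1, 0)]

image = [
    [0, 0, 0],
    [0, 5, 0],
    [0, 0, 0],
]
dilated = dilate_fsp(image, RHOMBUS)
```

With the single bright pixel in the center, `dilated` takes the shape of the rhombus itself, and a corner cell whose neighbors are all 0 stays 0, mirroring the two cases discussed for Fig. 3.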

C. Morphology Processing Method

In this section, morphology processing using CAM² is explained in detail. As an example, dilation in FSP with a rhombic structuring element is used.

Fig. 4 shows the CAM word configuration for it. Each CAM word consists of an original image field, a dilation image field, neighboring cell fields, and a temporary field. The original image field and the dilation image field store the values of the original image and the dilation image, respectively. The neighboring cell fields store the values of the right (R), left (L), upper (U), and lower (D) cells. The temporary field is used for storing carries, flags, and so on.

Fig. 3. Example of dilation (FSP).

Fig. 4. CAM word configuration for dilation (rhombus).

Fig. 5. CAM word configuration for the P-bit “greater than” operation.

The dilation is executed in the following sequence:

1) Load all pixel data of the original gray-scale image into the original image field of the corresponding CA cells of CAM².

2) Transfer the data of the original image field to the dilation image field of the same cell.

3) Transfer the original image data of the four neighboring cells (right, left, upper, and lower cells) to the corresponding neighboring cell fields.

4) Find the maximum value among the data of the dilation image field and the neighboring cell fields, and store it in the dilation image field.

5) Read out the dilation result from the dilation image field.

The image data loading and retrieval processing in steps 1 and 5 can be done by the normal RAM operations (3.1). Steps 2 and 3 are also effectively performed by the combination of intra-CAM and inter-CAM transfer [17]. Step 4 is executed by the iteration of “greater than” operations. Next, the processing method is explained in more detail.

Fig. 5 shows the CAM word configuration for the “P-bit greater than” operation, in which the P-bit values of two fields are compared and the larger one is stored in the destination field.


The “greater than” operation is executed in the following sequence.

1) Set search and write mask, followed by a maskable search.

2) Maskable search.

3) Maskable OR search.

4) Parallel write to the hit cells.

5) Maskable search.

6) Parallel write to the hit cells.

7) Maskable search.

8) Parallel write to the hit cells.

9) Repeat 1–8 from the MSB to the LSB.

In the sequence, two flags stored in the temporary field are used. The initial values of both flags are 1. One flag condition indicates that the “greater than” relation between the compared fields has been determined, and the other condition indicates the opposite.

As this example shows, the operation is carried out through the iteration of the maskable search (3.2) and the parallel write (3.3). For example, in steps 2 and 3, CAM words in which the value of one compared field is greater than that of the other are detected, and “1” is written into a field of those words in step 4. Fig. 6 shows examples of the maskable search in step 7 and the parallel write in step 8.

In the operation, the processing time is proportional to the bit length: the greater-than operation takes nine cycles per bit. However, since all the words are processed in parallel, the operations can be finished in an extremely short period of time.
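The bit-serial, word-parallel idea behind this comparison can be sketched in software. The code below is not the nine-cycle command sequence itself: it uses a simplified flag scheme (one "undecided" flag and one "first is greater" flag per word), and the function name `bit_serial_max` and both flag names are assumptions made for illustration.

```python
def bit_serial_max(a_values, b_values, nbits=8):
    """Select the larger of two nbits-wide unsigned fields in every word,
    scanning one bit position at a time from the MSB to the LSB."""
    count = len(a_values)
    undecided = [1] * count   # 1 while the comparison is still open
    a_greater = [0] * count   # set to 1 once "a > b" has been determined
    for bit in range(nbits - 1, -1, -1):
        # The inner loop stands in for one word-parallel search/write step.
        for i in range(count):
            if not undecided[i]:
                continue
            a_bit = (a_values[i] >> bit) & 1
            b_bit = (b_values[i] >> bit) & 1
            if a_bit != b_bit:      # the first differing bit decides the result
                undecided[i] = 0
                a_greater[i] = a_bit
    return [a if g else b for a, b, g in zip(a_values, b_values, a_greater)]


maxima = bit_serial_max([3, 200, 7, 0], [5, 100, 7, 255])
```

The number of passes depends only on the bit length, not on the number of words, which is why the cost per operation stays constant as the image grows.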

D. Processing Method for Large and Complex SEs

The size and shape of the structuring element are important factors in increasing the potential use of morphology. To process a large or complex structuring element, the CA value must be transferred over a long distance and to arbitrarily positioned CA cells. To do this efficiently, we devised the following method.

The CAM word configuration is shown in Fig. 7. A CAM word consists of the original image field (C), a processed image field, and shift image fields (one for horizontal transfer and two for vertical transfer). The value of a distant cell is transferred efficiently and processed as follows.

1) Transfer the data of C of the horizontal cells to the horizontal shift field using intra-CAM transfer.

2) Execute the set operation on the processed image field and the horizontal shift field if the corresponding structuring element is defined.

3) Repeat steps 1 and 2 until the horizontally defined structuring element runs out.

4) Transfer the data of C of the vertical cells to a vertical shift field (after that, the two vertical shift fields are used alternately) using inter-CAM transfer. This step is carried out at the same time as step 1.

5) Repeat steps 1 to 4, using a vertical shift field instead of C, until the vertically defined structuring element runs out.

Fig. 6. Examples of maskable search and parallel write.

Fig. 7. CAM word configuration for large and complex SEs.

In this sequence, a structuring element of any shape can be handled by deciding, according to the structuring element, whether or not the set operation is executed at each position.
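A software analogue of this shift-and-combine scheme is sketched below. It does not model the intra-CAM and inter-CAM transfers or the alternating shift fields; it only shows how visiting every offset and applying the set operation where the structuring element is defined handles an arbitrary shape. The names `shift_image` and `dilate_by_shifting`, the 0/1 matrix representation of the structuring element with its origin at the center, and the zero fill at the borders are assumptions.

```python
def shift_image(image, dy, dx, fill=0):
    """Return the image moved by (dy, dx); vacated positions receive fill."""
    height, width = len(image), len(image[0])
    return [
        [
            image[y - dy][x - dx]
            if 0 <= y - dy < height and 0 <= x - dx < width
            else fill
            for x in range(width)
        ]
        for y in range(height)
    ]


def dilate_by_shifting(image, se):
    """Gray-scale dilation with a binary SE (odd-sized 0/1 matrix, origin at the
    center). Every offset is visited, but the maximum is applied only where the
    SE is defined, so any SE shape can be handled."""
    height, width = len(image), len(image[0])
    ry, rx = len(se) // 2, len(se[0]) // 2
    result = [[0] * width for _ in range(height)]
    for sy in range(-ry, ry + 1):            # vertical positions of the SE
        for sx in range(-rx, rx + 1):        # horizontal positions of the SE
            if not se[sy + ry][sx + rx]:
                continue                     # SE undefined here: skip the set operation
            shifted = shift_image(image, sy, sx)
            result = [
                [max(r, s) for r, s in zip(result_row, shifted_row)]
                for result_row, shifted_row in zip(result, shifted)
            ]
    return result


L_SHAPED_SE = [
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 1],
]
single_pixel = [[0] * 5 for _ in range(5)]
single_pixel[2][2] = 9
stamped = dilate_by_shifting(single_pixel, L_SHAPED_SE)
```

Dilating a single pixel stamps a copy of the structuring element around it, which makes it easy to check that a non-convex shape such as the L above is handled correctly.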

IV. PATTERN SPECTRUM PROCESSING

Pattern spectrum processing [26] has been proposed as a morphology-based algorithm. It is very useful for obtaining information on the global features of target objects, and some applications, such as gender recognition [27], have been devised.

Fig. 8 shows an example of pattern spectrum processing, together with the scales, the structuring elements, and the areas of the images. As shown in Fig. 8, to obtain a pattern spectrum, operations other than morphological ones are required. They are summarized as follows:

• pixel-by-pixel subtraction;

• area calculation (the number of black pixels in the subtracted image);

• null set (∅) assessment of the opened image.

Fig. 8. Examples of pattern spectrum processing.

The following mapping schemes provide an efficient way todo these.

A. Pixel-by-Pixel Subtraction

The first scheme is for pixel-by-pixel subtraction. Subtraction of arbitrary bit width can be performed at the rate of about ten cycles per bit by combining the maskable search (3.2) and parallel write (3.3), just as in the “greater than” operation described in Section III-C. However, making use of the containment relationship between the opened images of successive scales shortens the processing time still more. This can be done in a sequence that takes only two cycles:

1) set search mask;

2) maskable search for cells whose value is 1 in the opened image of the smaller scale and 0 in that of the larger scale.

A positive result for a particular cell is stored in its hit-flag register. Thus, the processing time can be shortened significantly by exploiting the features of target algorithms, even though the performance of each cell (PE) of CAM² is not very high, as mentioned before.

B. Area Calculation

To calculate an area, the pixel values stored in all the cells must be summed up. Generally speaking, however, highly parallel machines based on local operations, including cellular automata, cannot handle such global operations efficiently. Indeed, a normal CAM has only one global network, which is the data I/O. So, to perform the calculation, the data in each cell (word) must be retrieved through the I/O one by one and summed up using an external processor or some special circuits.

To address these problems, CAM² has both counters in each CAM block (to count the number of hit flags) and horizontal and vertical inter-CAM connection networks (for data transfer between adjacent CAM blocks), as shown in Fig. 9. These functions can be implemented just by changing the peripheral circuit of the CAM; that is, changing the memory cell part is unnecessary. Moreover, the counter can be shared with the pipeline register for the transfer. So, they can be implemented without degrading the high density of the CAM.

Fig. 9. CAM structure for area calculation.

The area calculation is done as follows.

1) Shift the hit-flag registers of the CA cells, in which the pixel values obtained by the pixel-by-pixel subtraction are stored, and count the number of hit flags using the counters. (This yields the area of each CAM block.)

2) Sum up the areas of all the CAM blocks by iterating the inter-block transfer using the connection networks and addition using the maskable searches (3.2) and the parallel writes (3.3). (This operation moves the area data in a tree-like fashion and stores the area of the whole image in a particular cell.)

Step 1 of this sequence is carried out in parallel in each block. Moreover, the number of iterations in step 2 varies logarithmically with the number of blocks. So, an area calculation takes only a short time.
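The logarithmic behavior of step 2 can be seen in a small sketch of a tree-like reduction over per-block counts. This is a one-dimensional illustration only: the function name `tree_sum`, the flat list of block counts, and the pairing pattern are assumptions, whereas the hardware uses horizontal and vertical inter-CAM networks.

```python
def tree_sum(block_counts):
    """Sum per-block areas with a tree-like reduction and report the number of
    transfer-and-add rounds. Assumes at least one block."""
    values = list(block_counts)
    stride, rounds = 1, 0
    while stride < len(values):
        # All pairs in one round are independent, so they can be added in parallel.
        for i in range(0, len(values) - stride, 2 * stride):
            values[i] += values[i + stride]
        stride *= 2
        rounds += 1
    return values[0], rounds


total_area, rounds_needed = tree_sum([3, 1, 4, 1, 5, 9, 2, 6])
```

Eight blocks need three rounds, and doubling the number of blocks adds only one more round.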

C. Null Set Assessment

The final mapping technique is for null set assessment. The scale at which the opened image becomes a null set changes according to the geometric features of the input image. So, null set assessment is required in order to eliminate redundant transitions. In the assessment, we must examine whether or not the opened-image values stored in all the cells have become “0.” To do this efficiently, we output a hit flag (HFO) that is the logical OR of all the hit-flag registers, as shown in Fig. 9.

TABLE I. PROCESSING TIME FOR BASIC SEs (µs)

The assessment using the HFO is executed as follows:

1) set search mask;

2) maskable search for cells whose opened-image value is 1;

3) null set assessment (if HFO = 0, then end the processing; otherwise, repeat the processing for a new scale).

When all the opened-image values become 0, the maskable search results in no hits for any cell, so HFO becomes 0. A null set is thus assessed by examining the value of HFO. Since CAM² performs these sequences in only several cycles, the null set assessment is also finished in an extremely short period of time.
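Putting the three auxiliary operations together, a pattern spectrum loop can be sketched as follows. This is a plain sequential illustration under stated assumptions: binary images are lists of 0/1 rows, the structuring element at scale n is taken to be a (2n+1) × (2n+1) square, pixels outside the image count as 0, and the names `erode`, `dilate`, `opening`, and `pattern_spectrum` are hypothetical.

```python
def erode(img, r):
    """Binary erosion by a (2r+1) x (2r+1) square; outside pixels count as 0."""
    h, w = len(img), len(img[0])
    return [[1 if all(0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                      for dy in range(-r, r + 1) for dx in range(-r, r + 1)) else 0
             for x in range(w)] for y in range(h)]


def dilate(img, r):
    """Binary dilation by a (2r+1) x (2r+1) square."""
    h, w = len(img), len(img[0])
    return [[1 if any(0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                      for dy in range(-r, r + 1) for dx in range(-r, r + 1)) else 0
             for x in range(w)] for y in range(h)]


def opening(img, r):
    return dilate(erode(img, r), r)


def pattern_spectrum(img, max_scale=64):
    """Area of the difference between openings at successive scales."""
    spectrum = []
    previous = img                      # the opening at scale 0 is the image itself
    scale = 0
    # Null set assessment: stop once the opened image has no pixel left.
    while any(any(row) for row in previous) and scale < max_scale:
        current = opening(img, scale + 1)
        # Subtraction as a search: pixels that are 1 before and 0 after.
        area = sum(1 for prev_row, cur_row in zip(previous, current)
                   for p, c in zip(prev_row, cur_row) if p == 1 and c == 0)
        spectrum.append(area)
        previous = current
        scale += 1
    return spectrum


# A 3 x 3 block plus one isolated pixel on a 7 x 7 canvas.
canvas = [[0] * 7 for _ in range(7)]
for y in range(1, 4):
    for x in range(1, 4):
        canvas[y][x] = 1
canvas[5][5] = 1
spectrum = pattern_spectrum(canvas)
```

The isolated pixel disappears at the first scale and the block at the second, so the spectrum entries add up to the area of the input, and the loop ends as soon as the opened image is empty.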

V. APPLICATION DEVELOPMENT ENVIRONMENT

To execute various types of processing including morphology using CAM², control logic for generating the various command sequences described in Sections III-C, III-D, and IV must be mapped into the FPGA on CAM². For efficient mapping, a programming language (CAM² PL) and an application development environment (CAM² ADE) for CAM² have been developed.

CAM² PL includes various arithmetic and logical operations, such as addition and logical OR, and various associative operations, such as maskable search and parallel write. Furthermore, to describe various morphological operations easily, the following dedicated operations are added:

• (dilation type se);

• (erosion type se);

where the morphology category (SP, FSP, or FP) and the shape of the structuring element (rhombus, square, etc.) are given in “type” and “se,” respectively. An example of CAM² PL is presented in Section VI-B3.

CAM² ADE consists of a compiler and a simulator. The compiler compiles CAM² PL and generates a microprogram for a CAM² board. The simulator reports simulation results, such as processing speed, for various input images and test data. They are used for debugging and evaluation. CAM² ADE should significantly speed up system development.

VI. EVALUATION

A. Processing Performance

Morphology processing performance is evaluated in this section. We have already finished the design of CAM² and have described it in Verilog HDL [28]. The data presented here come from the Verilog functional simulator. In the evaluation, the system clock of CAM² was assumed to be 40 MHz, and a single fixed image size was used.

Fig. 10. Basic structuring elements.

Fig. 11. Processing time for large SEs.

Table I shows the processing performance for one dilation with the basic SEs shown in Fig. 10. Erosion is performed in almost the same time. Through a combination of these structuring elements, morphological operations with regular-shaped structuring elements of various sizes, such as squares and circles, can be performed. As shown in the table, CAM² performs one dilation within 30 µs. This means that more than 1000 dilations can be carried out on the whole image at video rates (33 ms).
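The arithmetic behind these figures is easy to check. The sketch below only restates numbers quoted in the text (40 MHz clock, 30 µs per dilation, 33 ms per frame, nine cycles per bit for the greater-than operation); the variable names are illustrative.

```python
CLOCK_MHZ = 40          # assumed system clock, in cycles per microsecond
DILATION_US = 30        # upper bound for one dilation with a basic SE
FRAME_US = 33_000       # one video frame (33 ms)
GT_CYCLES_PER_BIT = 9   # cost of the "greater than" operation per bit

cycles_per_dilation = DILATION_US * CLOCK_MHZ   # clock cycles in 30 µs
dilations_per_frame = FRAME_US // DILATION_US   # dilations that fit in one frame
gt_cycles_8bit = GT_CYCLES_PER_BIT * 8          # one 8-bit comparison

print(cycles_per_dilation, dilations_per_frame, gt_cycles_8bit)
```

A 30 µs dilation therefore corresponds to a budget of 1200 clock cycles, 1100 such dilations fit into one 33 ms frame, and a single 8-bit comparison accounts for 72 of those cycles.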

Fig. 11 shows the processing performance for one dilation with large SEs. In the evaluation, regular square structuring elements with size L were used. The figure shows that the execution time for one dilation is almost proportional to the size of the structuring element. Therefore, an extremely large structuring element with a size of about 100 × 100 can be handled at video rates.

In view of these simulation results, we think CAM² satisfies the three prerequisites mentioned in Section I and, therefore, has great potential for morphology processing.

B. Image Processing

Some examples of image processing using CAMare shownin this section. These data were calculated by CAMADEbased on the Verilog functional simulator. Although theseexamples make it necessary to perform iterative morphologicaloperations, the Verilog simulator takes an extremely long timeto run. So, a CAM with CA cells and aimage were used here. Since CAMhas scalability, aimage can be processed in almost the same time (except for dataloading and retrieval processing) if a CAMwithCA cells is used.

1) Data Loading and Retrieval Processing:The completionof image processing requires not only morphological processingbut also data loading and retrieval processing. For loading, allthe pixel data of the input image are loaded into the corre-sponding CA cells of CAM. For retrieval, the processed dataare retrieved from the result fields of all CA cells.

Fig. 12. Pattern spectrum for images with and without crack.

TABLE II. PATTERN SPECTRUM PROCESSING TIME

Using parallel loading and partial word retrieval techniques [17], CAM² can also handle such processing effectively. It takes about 0.1 ms for both the data loading and retrieval of a image. The processing time needed for data loading and retrieval processing lengthens with image size. However, since it only takes 1.6 ms even for a relatively large image ( ), more than 30 ms is available in which to perform morphology or other CA algorithms for real-time, or video rate (33 ms), applications.
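The timing figures quoted in this section can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the constants are the values stated in the text (33 ms frame period, 30 µs upper bound per basic operation, 1.6 ms for loading and retrieval of the larger image), while the function name `ops_per_frame` is a hypothetical helper and not part of CAM² or its tools.

```python
# Values quoted in the text, expressed in microseconds.
FRAME_US = 33_000     # one video frame: 33 ms
OP_US = 30            # upper bound for one basic-SE morphological operation
IO_LARGE_US = 1_600   # data loading + retrieval for the larger image: 1.6 ms


def ops_per_frame(frame_us, io_us, op_us):
    """Whole operations that fit in one frame after I/O time is subtracted.

    Hypothetical helper for illustration; it assumes operations run
    back to back with no other overhead.
    """
    return (frame_us - io_us) // op_us


# Ignoring I/O, the "more than 1000 dilations per frame" claim:
print(ops_per_frame(FRAME_US, 0, OP_US))            # 1100
# With 1.6 ms of loading and retrieval, 31.4 ms remains for processing:
print(ops_per_frame(FRAME_US, IO_LARGE_US, OP_US))  # 1046
```

Under these assumptions, more than 1000 basic operations still fit in a frame even when the larger image's loading and retrieval time is included.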

2) Pattern Spectrum: Fig. 12 shows examples of pattern spectrum processing (SE: circle) for two different images, in which one object has a crack and the other one does not. As shown in Fig. 12, since the size and shape of the objects in image 1 are uniform, the spectrum concentrates on scales 5 and 6. In contrast, since the object in image 2 has cracks, the spectrum is scattered. Because the two spectra differ so clearly, the images are easy to distinguish.

Table II shows the processing time for the images. As shown in the table, the processing time for opening increases with the scale because the size of the structuring element becomes larger as the scale increases. In contrast, the processing time for the area calculation is constant, and is about that of the conventional method using the data I/O and the external processor. It takes only 0.6 µs per scale for the rest of the processing, such as pixel-by-pixel subtraction.
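The three steps named above (opening at each scale, area calculation, and subtraction between successive scales) can be sketched in software as follows. This is a minimal sequential illustration in plain Python, not the CAM² mapping: images are represented as sets of pixel coordinates, the spectrum follows the commonly used definition of [26] (area of the opening at scale n minus area at scale n + 1), and all function names are hypothetical.

```python
def square(n):
    """(2n+1) x (2n+1) square structuring element, as a set of offsets."""
    return {(dy, dx) for dy in range(-n, n + 1) for dx in range(-n, n + 1)}


def disk(n):
    """Digital disk of radius n (a 'circle' SE), as a set of offsets."""
    return {(dy, dx) for (dy, dx) in square(n) if dy * dy + dx * dx <= n * n}


def erode(X, B):
    """Keep pixels where B, translated there, fits entirely inside X.

    Scanning only pixels of X is valid because B contains the origin.
    """
    return {(y, x) for (y, x) in X
            if all((y + dy, x + dx) in X for (dy, dx) in B)}


def dilate(X, B):
    """Union of copies of B placed at every pixel of X."""
    return {(y + dy, x + dx) for (y, x) in X for (dy, dx) in B}


def opening(X, B):
    """Erosion followed by dilation with the same structuring element."""
    return dilate(erode(X, B), B)


def pattern_spectrum(X, se, max_scale):
    """PS[n] = area(X opened by se(n)) - area(X opened by se(n + 1))."""
    areas = [len(opening(X, se(n))) for n in range(max_scale + 2)]
    return [areas[n] - areas[n + 1] for n in range(max_scale + 1)]


# A 7 x 7 filled square: with square SEs it vanishes only at scale 4,
# so the whole area appears at scale 3.
X = {(y, x) for y in range(7) for x in range(7)}
print(pattern_spectrum(X, square, 3))  # [0, 0, 0, 49]
print(pattern_spectrum(X, disk, 3))    # spectrum with circular SEs
```

A uniform object concentrates its area in one or two scales, as in the square example, whereas a cracked object loses area at several smaller scales, which is the behavior described for Fig. 12.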

It takes about 1 ms for the whole pattern spectrum processing. When data loading time is included, the total time becomes 1.1 ms and 1.8 ms for and images, respectively. Thus, a pattern spectrum can also be obtained at video rates.

Fig. 13. Example of image processing (morphology).

Fig. 14. Example of CAM² PL.

3) Multiple Object Tracking: Another example of image processing using CAM² is shown in Fig. 13. By applying various morphological operations to perform line erasure, edge detection, hole filling, and noise reduction, a binary image of target objects can be obtained. The processing requires 15 morphological operations with various structuring elements. CAM² can do it in just 200 µs. As shown in Section VI-B1, the data loading and retrieval times are 0.1 ms and 1.6 ms for and images, respectively. The processing can be finished at video rates.

Fig. 14 shows an example of CAM² PL [25] for the processing in Fig. 13. Here, “copy_8” means the intra-word copy of 8 bits and “sub_data8” means pixel-by-pixel subtraction of 8 bits. “Dilation” and “erosion” are the dedicated operations for describing morphological operations with various structuring elements, as mentioned before. Using these operations, the processing is described in only 20 operations.
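To make the roles of these operations concrete, the sketch below combines a dilation, an erosion, and a pixel-by-pixel subtraction into a simple edge detector for 8-bit data. It is written in plain Python, not in CAM² PL, and it does not reproduce the program of Fig. 14. The flat 3 × 3 structuring element, the clipping of the window at the image border, and all function names are assumptions made for this illustration.

```python
def local_filter(img, reduce_fn):
    """Apply reduce_fn (max or min) over each pixel's 3 x 3 neighborhood.

    The window is clipped at the image border (an assumption of this sketch).
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [img[yy][xx]
                      for yy in range(max(0, y - 1), min(h, y + 2))
                      for xx in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = reduce_fn(window)
    return out


def dilation(img):
    """Gray-scale dilation with a flat 3 x 3 SE (local maximum)."""
    return local_filter(img, max)


def erosion(img):
    """Gray-scale erosion with a flat 3 x 3 SE (local minimum)."""
    return local_filter(img, min)


def sub_pixels(a, b):
    """Pixel-by-pixel subtraction a - b."""
    return [[pa - pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


def edge_map(img):
    """Dilation minus erosion: nonzero wherever the neighborhood is not flat."""
    return sub_pixels(dilation(img), erosion(img))


# 5 x 5 test image: a 3 x 3 bright block (value 200) on a dark background.
img = [[200 if 1 <= y <= 3 and 1 <= x <= 3 else 0 for x in range(5)]
       for y in range(5)]
edges = edge_map(img)
print(edges[2][2], edges[1][1], edges[0][0])  # 0 200 200
```

Because the dilation is never smaller than the erosion at any pixel, the subtraction result is always non-negative and fits in the same 8-bit range as the input.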

Fig. 15. Example of image processing (other CA).

Fig. 16. CAM² board with 256 × 256 CA cells.

As discussed above, CAM² efficiently performs not only morphology, but also other CA-based algorithms. Using these algorithms, the center points of target objects and a distance map for them can be obtained as shown in Fig. 15. By applying the processing in Figs. 13 and 15 to the input image and by finding the center points nearest those in the previous frame, multiple object tracking can be performed.
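The final association step, finding the center point nearest to each center of the previous frame, can be sketched as below. This is a minimal host-side illustration in Python under stated assumptions: centers are (row, column) tuples, distance is Euclidean, and each previous center is matched independently. The function name `match_centers` is hypothetical, and the text does not specify how the authors implement this step.

```python
import math


def match_centers(prev_centers, curr_centers):
    """Map each previous-frame center index to the nearest current center index.

    Each previous center is matched independently, so two objects may map
    to the same current center if they come close together.
    """
    if not curr_centers:
        return {}
    matches = {}
    for i, p in enumerate(prev_centers):
        matches[i] = min(range(len(curr_centers)),
                         key=lambda k: math.dist(p, curr_centers[k]))
    return matches


prev_frame = [(10, 10), (40, 40)]
curr_frame = [(42, 41), (11, 12)]   # same objects, listed in a different order
print(match_centers(prev_frame, curr_frame))  # {0: 1, 1: 0}
```

A one-to-one assignment would need an extra conflict-resolution step, which is outside the scope of this sketch.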

These examples demonstrate that CAM² is flexible enough to perform practical image processing employing a combination of morphology and other algorithms.

VII. CONCLUSION

This paper has described a morphology processing method based on a highly parallel 2-D cellular automaton architecture called CAM² and has presented some evaluation results. New mapping methods using maskable search, partial & parallel write, and hit-flag shift achieve high-throughput complex morphology processing. Evaluation results show that CAM² performs one morphological operation for basic structuring elements within 30 µs. This means that more than 1000 operations can be carried out on an entire pixel image at video rates (33 ms). CAM² can also handle an extremely large and complex structuring element at video rates. Furthermore, CAM² can perform practical image processing, such as pattern spectrum and multiple object tracking, through a combination of morphology and other algorithms. Thus, CAM² will enable fuller realization of the potential of morphology and make a significant contribution to the development of real-time image processing systems based on morphology and other algorithms.

APPENDIX

We have completed our development of a dedicated 1-Mb CAM LSI for CAM² [29], [30]. We fabricated a chip capable of operating at 56 MHz and 2.5 V using 0.25-µm full-custom CMOS technology with five aluminum layers. Since it has 16k words, or CA cells, a single chip can process 128 × 128 pixels in parallel. We have also developed a prototype CAM² board using the LSIs. Fig. 16 is a photograph of the board. The board consists of a highly parallel PE array, FPGAs, and some memory. The PE array is a 2 × 2 2-D array of 1-Mb CAM LSIs, and can handle a 256 × 256 image. (Since the CAM LSI is scalable, a larger image can be processed by increasing the number of chips.) The FPGA generates various command sequences to perform practical image processing based on morphology and other CA-based algorithms. In addition, PCI bus and NTSC video interfaces are also embedded in this board. So, a compact image-processing platform can be built simply by connecting the board to a personal computer and a video camera.
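The scalability remark can be illustrated with a short calculation. The sketch assumes that the 16k CA cells of one chip are arranged as a square tile of pixels and that chips are tiled in a rectangular grid; both are inferences made for this example rather than statements from the text, and `chips_needed` is a hypothetical helper. The 512 × 512 case is an arbitrary larger size chosen for illustration.

```python
import math

WORDS_PER_CHIP = 16 * 1024                  # 16k words, one CA cell per word
TILE_SIDE = math.isqrt(WORDS_PER_CHIP)      # 128, assuming a square tile


def chips_needed(height, width, tile=TILE_SIDE):
    """Number of chips in a rectangular grid covering a height x width image."""
    return math.ceil(height / tile) * math.ceil(width / tile)


print(TILE_SIDE)                 # 128
print(chips_needed(256, 256))    # 4, the board's 256 x 256 CA cells
print(chips_needed(512, 512))    # 16, an illustrative larger image
```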

This prototype board demonstrates that an economically feasible morphology platform can actually be obtained. Using this CAM² board, we plan to develop various real-time image processing applications based on morphology and other algorithms.

ACKNOWLEDGMENT

The authors would like to thank Y. Takahashi, Y. Fujino, T. Tsuchiya, T. Nakanishi, M. Nakanishi, and E. Hosoya for their many valuable suggestions and constructive discussions.

REFERENCES

[1] J. Serra, Image Analysis and Mathematical Morphology. New York: Academic, 1982.

[2] P. Maragos, “Tutorial on advances in morphological image processing and analysis,” Opt. Eng., vol. 26, 1987.

[3] R. M. Haralick, S. R. Sternberg, and X. Zhuang, “Image analysis using mathematical morphology,” IEEE Trans. Pattern Anal. Machine Intell., vol. 9, no. 4, pp. 532–550, 1987.

[4] L. Vincent, “Graphs and mathematical morphology,” Signal Process., vol. 16, no. 4, pp. 365–388, 1989.

[5] S. Yamamoto, M. Matsumoto, Y. Tateno, T. Iinuma, and T. Matsumoto, “Quoit filter - A new filter based on mathematical morphology to extract the isolated shadow, and its application to automatic detection of lung cancer in X-ray CT,” in Proc. 13th Int. Conf. Pattern Recognition (ICPR’96), vol. 2, 1996, pp. 3–7.

[6] Y. Takahashi, A. Shio, and K. Ishii, “Morphology based thresholding for character extraction,” IEICE Trans. Inform. Syst., vol. E76-D, no. 10, pp. 1208–1215, 1997.

[7] H. Harrer, “Multiple layer discrete-time cellular neural networks using time-variant templates,” IEEE Trans. Circuits Syst. II, vol. 40, pp. 191–199, Mar. 1993.

[8] E. R. Dougherty and P. A. Laplante, Real-Time Imaging. New York: IEEE Press, 1995.

[9] M. Hassoun, T. Meyer, P. Siqueira, J. Basart, and S. Gopalratnam, “A VLSI gray-scale morphology processor for real-time NDE image processing applications,” SPIE Image Algebra Morphological Image Processing, 1990.

[10] R. Lin and E. K. Wong, “Logic gate implementation for gray-scale morphology,” Pattern Recognit. Lett., vol. 13, no. 7, 1992.

[11] C. H. Chen and D. L. Yang, “Realization of morphological operations,” IEE Proc. Circuits Devices Systems, vol. 142, 1995.

[12] L. Lucke and C. Chakrabarti, “A digital-serial architecture for gray-scale morphological filtering,” IEEE Trans. Image Processing, vol. 4, pp. 387–391, Mar. 1995.

[13] E. R. Dougherty and D. Sinha, “Computational gray-scale mathematical morphology on lattices (a comparator-based image algebra) - Part I: Architecture,” Real-Time Imag., vol. 1, pp. 69–85, 1995.

[14] T. Kondo et al., “Pseudo MIMD array processor-AAP2,” in Proc. 13th Symp. Computer Architecture Conf., 1986, pp. 330–337.

[15] Thinking Machines Corp, Connection machine model, CM-2 tech. summary, Ver. 5.1, 1989.

[16] J. R. Nickolls, “The design of the MasPar MP-1: A cost-effective massively parallel computer,” in Proc. COMPCON Spring’90, 1990, pp. 25–28.

[17] T. Ikenaga and T. Ogura, “CAM²: A highly-parallel 2D cellular automata architecture,” IEEE Trans. Comput., vol. 47, pp. 788–801, July 1998.


[18] T. Ogura, M. Nakanishi, T. Baba, Y. Nakabayashi, and R. Kasai, “A 336-kbit content addressable memory for highly parallel image processing,” in Proc. Custom Integrated Circuits Conf. (CICC’96), 1996, pp. 273–276.

[19] T. Ogura and M. Nakanishi, “CAM-based highly-parallel image processing hardware,” IEICE Trans. Electron., vol. E80-C, no. 7, pp. 868–874, 1997.

[20] Y. Fujino, T. Ogura, and T. Tsuchiya, “Facial image tracking system architecture utilizing real-time labeling,” in Proc. SPIE VCIP’93, 1993.

[21] M. Nakanishi and T. Ogura, “A real-time CAM-based Hough transform algorithm and its performance evaluation,” 13th Int. Conf. Pattern Recognition (ICPR’96), vol. 2, pp. 516–521, 1996.

[22] M. Nakanishi and T. Ogura, “Real-time extraction using a highly parallel Hough transform board,” Proc. IEEE Int. Conf. Image Processing (ICIP’97), vol. 2, pp. 582–585, 1997.

[23] M. Meribout, M. Nakanishi, and T. Ogura, “Hough transform implementation on a reconfigurable highly parallel architecture,” in Proc. Computer Architectures Machine Perception (CAMP’97), 1997, pp. 276–279.

[24] E. Hosoya, T. Ogura, and M. Nakanishi, “Real-time 3D feature extraction hardware algorithm with feature point matching capability,” in Proc. IAPR Workshop Machine Vision Applications (MVA’96), 1996, pp. 430–433.

[25] T. Ikenaga and T. Ogura, “A DTCNN universal machine based on highly-parallel 2-D cellular automata CAM²,” IEEE Trans. Circuits Syst. I, vol. 45, pp. 538–546, May 1998.

[26] P. Maragos, “Pattern spectrum and multiscale shape representation,” IEEE Trans. Pattern Anal. Machine Intell., vol. 11, pp. 701–716, July 1989.

[27] K. Sudo, J. Yamato, and A. Tomono, “Determining gender using morphological pattern spectrum” (in Japanese), IEICE Trans. Inform. Syst., vol. J80-D-II, no. 5, pp. 1037–1045, 1997.

[28] E. Sternheim et al., Digital Design with Verilog HDL. New York: Automata, 1990.

[29] T. Ikenaga and T. Ogura, “A fully-parallel 1 Mb CAM LSI for real-time pixel-parallel image processing,” in IEEE Int. Solid-State Circuits Conf. (ISSCC99) Dig. Tech. Papers, 1999.

[30] T. Ikenaga and T. Ogura, “A fully parallel 1-Mb CAM LSI for real-time pixel-parallel image processing,” IEEE J. Solid State Circuits, vol. 35, pp. 536–544, Apr. 2000.

Takeshi Ikenaga (M’95) received the B.E. and M.E. degrees in electrical engineering from Waseda University, Tokyo, Japan, in 1988 and 1990, respectively.

He joined LSI Laboratories, Nippon Telegraph and Telephone Corporation (NTT), in 1990, where he has been working on the research of the design and test methodologies for high-performance ASICs. He is presently a Senior Research Engineer with the Parallel Processing Systems Research Group, NTT Lifestyle and Environmental Technology Laboratories, Kanagawa, Japan. His current interests are highly parallel system design and its applications to computer vision. In 1999–2000, he was a Visiting Researcher with the University of Massachusetts, Amherst.

Mr. Ikenaga is a member of the Institute of Electronics, Information, and Communication Engineers (IEICE) of Japan, and the Information Processing Society of Japan (IPSJ). He received the IEICE Research Encouragement Award in 1992 for his paper “A test pattern generation for arithmetic execution units.”

Takeshi Ogura (M’86) received the B.S., M.S., and Ph.D. degrees in electrical engineering from Osaka University, Osaka, Japan, in 1976, 1978, and 1991, respectively.

In 1978, he joined Musashino Electrical Communication Laboratories, Nippon Telegraph and Telephone Public Corporation (NTT), Tokyo, Japan. He is currently an Executive Manager with the Multimedia Electronics Laboratory of NTT Lifestyle and Environmental Technology Laboratories, Kanagawa, Japan. He is engaged in the research and development of CAM LSIs and their applications and is also engaged in the development of image processing and encoding LSIs.

Dr. Ogura is a member of the Institute of Electronics, Information, and Communication Engineers (IEICE) of Japan and the Information Processing Society of Japan (IPSJ).