

Signal Processing 86 (2006) 1670–1687

www.elsevier.com/locate/sigpro

Scalable multiresolution color image segmentation

Fardin Akhlaghian Tab a,*, Golshah Naghdy a, Alfred Mertins b

a School of Electrical, Computer and Telecommunications Engineering, University of Wollongong, Wollongong, NSW 2522, Australia
b Signal Processing Group, Institute of Physics, University of Oldenburg, Germany

Received 16 May 2005; received in revised form 20 September 2005

Available online 25 October 2005

Abstract

This paper presents a novel multiresolution image segmentation method based on the discrete wavelet transform and Markov Random Field (MRF) modeling. A major contribution of this work is to add spatial scalability to the segmentation algorithm, producing the same segmentation pattern at different resolutions. This property makes it suitable for scalable object-based wavelet coding. To optimize segmentation at all resolutions of the wavelet pyramid, with scalability as a constraint, a multiresolution analysis is incorporated into the objective function of the MRF segmentation algorithm. Examining the corresponding pixels at different resolutions simultaneously enables the algorithm to directly segment images in the YUV or similar color spaces where luminance is at full resolution and the chrominance components are at half resolution. Allowing for smoothness terms in the objective function at different resolutions improves border smoothness and creates visually more pleasing objects/regions, particularly at lower resolutions where down-sampling distortions are more visible. In addition to spatial scalability, the proposed algorithm outperforms the standard single and multiresolution segmentation algorithms, in both objective and subjective tests, yielding an effective segmentation that particularly supports scalable object-based wavelet coding.

© 2005 Elsevier B.V. All rights reserved.

Keywords: Multiresolution image segmentation; Markov random field; Scalability; Color image; Smoothness

1. Introduction

Image segmentation is the process of dividing an image into homogeneous regions, which is an essential step toward higher-level image processing such as image analysis, pattern recognition and computer vision.


In particular, effective segmentation is crucial for the emerging object-based image/video standards, such as the object-based coding standards defined by MPEG-4 [1] and the content-based shape descriptor used in MPEG-7 [2].

In scalable object-based coding, a single codestream can be sent to different users with different processing capabilities and network bandwidths by selectively transmitting and decoding the related parts of the codestream [3]. Some of the desirable scalable functionalities are signal-to-noise ratio (SNR) scalability, spatial scalability and temporal scalability [3]. A scalable bitstream includes embedded parts that offer increasingly better SNR, greater spatial resolution or higher frame rates.


Therefore, considering spatial scalability, it is necessary to extract and present the objects' shapes at different resolutions for scalable object-based encoder/decoder systems. For an effective scalable object-based coding algorithm, it is essential that the shapes of the extracted objects at different resolutions be similar or, equivalently, that the pattern of segmented regions be similar at different resolutions. We call a segmentation algorithm that produces similar patterns at different resolutions a scalable segmentation.

Multiresolution image segmentation algorithms analyze the image at different resolutions, which results in several advantages over single-resolution segmentation, such as

• less computational complexity,
• improvement in the convergence rate,
• reduction in over-segmentation cases,
• less sensitivity to noise,
• ability to capture the image structures at different resolutions,
• less dependence on initial segmentation.

These algorithms consider the inter-scale image data correlation in the segmentation procedure. In the most straightforward case, they consider inter-scale correlation by projecting the lower-resolution segmentation result to the next higher resolution as an initial segmentation estimate. The segmentation is then further refined at the current higher resolution by a single-resolution segmentation. This procedure continues progressively until the highest resolution is segmented [4–7]. One of the challenges arising from this approach is that higher-resolution segmentations are more rigorous than lower-resolution segmentations, and the segmentation maps at higher and lower resolutions are not quite identical. For example, some objects are not detected, or are only partly detected, at low resolutions while they are perfectly detected at higher resolutions. This makes the higher-resolution segmentation more reliable for object extraction applications. Therefore, semantic segmentation and object extraction algorithms extract objects from the highest-resolution segmentation.

In [4], Pappas presents an adaptive clustering algorithm based on maximum a posteriori (MAP) estimation which allows for slowly space-varying mean statistics for each region. Local intensity variations are considered in an iterative procedure which estimates the Markov Random Field parameters over a sliding window whose size decreases as the algorithm progresses. In the hierarchical mode, segmentation starts at the coarsest resolution. Once the image at this resolution is segmented, the result is used as an initial segmentation for the next finer resolution, until the finest resolution is segmented. In [5] the Pappas algorithm is updated by considering edge information to increase the accuracy of the segmentation and to reduce the under-segmentation produced by the Pappas algorithm. In [6] noisy images are decomposed by the redundant discrete wavelet transform (RDWT). In RDWT decomposition, the low-pass subband image is first filtered L times and down-sampling is then applied L times. Similarly, the segmentation starts from the lower resolution and the result is propagated to the higher resolution. One of the disadvantages of this algorithm is under-segmentation. In this algorithm the low-resolution segmentation can be significantly different from the highest-resolution segmentation, and scalability is not provided. Munoz et al. [7] propose a multiresolution image segmentation which integrates both region and boundary information. The image is decomposed into several resolutions. First, at the coarsest resolution, the most relevant edges are detected. Then seeds are placed far from the edges and region-growing algorithms obtain the regions. Using a global energy function and a greedy optimization algorithm, all the pixels are classified. The segmentation is subsequently projected to the next higher resolution, where non-border pixels are used to model the regions at the higher resolution, and the greedy optimization algorithm again obtains the regions. The algorithm cannot detect small objects/regions if they are not detected at the lowest resolutions. Considering the inter-scale correlation in the pixel classification procedure, or an extension to the scalable mode, is not easily possible.

In the second group of segmentation algorithms, the inter-scale correlations are considered in the statistical models, and the decision at each pixel/block is based on the information from the different resolutions [8–12]. However, often only the causal inter-scale correlation with the next lower resolution [8,9,11,12] or with the next higher resolution [10] is considered; considering the other resolutions results in a very complex model and increases the computational complexity. Bouman et al. [8] introduce hierarchical MRF variables defined as a coarse-to-fine Markov chain of levels.


The associated interaction structure is a quadtree which correlates the current node with its parent at the previous resolution. It does not include spatial correlation at the same resolution, and it is also not shift invariant, since two pixels that are adjacent on the spatial lattice may actually be far apart in the graph structure. In [9], Comer et al. consider the inter-scale correlation as well as the intra-scale correlation. They propose a multiresolution texture segmentation which fits an autoregressive model to the pyramid representation of the image. The MAP optimization criterion is replaced with the multiresolution maximization of the posterior marginal (MMPM) estimation, which facilitates the use of the EM algorithm to estimate parameters such as the autoregressive model coefficients and the prediction variances of the different textures. The coarsest resolution is segmented in single-resolution mode and the segmentation is propagated down to the other levels of the pyramid. In this approach, correlation exists only between adjacent resolutions, and low-resolution segmentation errors will propagate to the higher resolutions. Although some parameters are estimated, many parameters such as the spatial interaction or the number of segmentation classes are determined experimentally. Wilson et al. [12] update the statistical models used by Bouman et al. [8] with the view that the data at each scale or resolution is conditioned not only on its immediate predecessor but also depends directly on its neighbors at its own scale. Segmentation starts from the coarse resolution and is projected to the next resolution as an initial estimate. At each resolution, the image is divided into blocks and every block is classified by an MRF-based segmentation. Because the classification is block based, after the highest-resolution optimization a line-processing step refines the regions' borders to the actual borders. This method suffers from a number of shortcomings. Detecting new regions at higher resolutions, especially small regions, is not possible. The interactions between different resolutions are causal, from the lowest to the highest resolution. It needs a region boundary refinement, which increases the computational complexity, and this refinement does not interact with the region labeling. The method cannot easily be extended to the scalable mode. Kato et al. [10] introduce a novel hierarchical segmentation which includes a three-dimensional neighborhood system. In addition to spatial correlation, the interaction with both higher and lower resolutions is considered. The algorithm alternates between parameter estimation and segmentation.

Unfortunately, the resulting parameter estimation and segmentation procedures require considerable computing time. In addition, considering both higher and lower resolutions produces a complex model and increases the computational complexity.

None of these works, or similar ones in the literature, considers the inter-scale correlation between all pyramid resolutions. In addition, their extension to scalable segmentation, producing the same segmentation patterns at different resolutions, is nearly impossible or results in an algorithm with large computational complexity. In order to produce similar objects/regions at different resolutions, we present a novel MRF-based multiresolution grey/color image segmentation algorithm which extends the statistical model to consider the correlation between all the resolutions without overly increasing the computational complexity. It produces the same segmentation patterns at different resolutions, and it is applicable to object-based wavelet coding algorithms. Bearing in mind the down-sampling of pixels to lower resolutions in the wavelet decomposition, corresponding pixels at different resolutions are considered as a vector of pixels which are required to have the same segmentation label. A vector, or multiresolution/multidimensional, analysis is incorporated into the objective function of the MRF-based segmentation algorithm, which aligns the segmentation algorithm with the spatial scalability of object-based wavelet coding. This vector analysis keeps wavelet spatial scalability as a constraint, which fits the multiresolution MRF segmentation to wavelet-based scalable object coding.

The multiscale analysis uses the information of the different resolutions concurrently, which produces better results than regular single- and multiresolution Bayesian segmentation algorithms. It combines and processes the coarse global information of the low resolutions with the fine local information obtained from the higher resolutions of the wavelet decomposition pyramid. Therefore, it combines good features of both single- and multiresolution segmentation. While it is noise resistant, it detects objects/regions better than regular multiresolution segmentation and also results in a lower number of regions than single-level segmentation. To optimize the objective function of the segmentation algorithm, the Iterated Conditional Modes (ICM) algorithm according to [4], matched to the scalable multiscale analysis, is used.

In this work, a smoothness criterion is incorporated into the objective function of the segmentation algorithm, which results in more natural or visually pleasing objects/regions.


Fig. 1. Circles at different resolutions: (a) closer approximation of a digital circle at high resolution; (b) down-sampling to low resolution; (c) worse approximation of a digital circle at high resolution; (d) down-sampling of (c) to low resolution.


Many natural objects exhibit smooth borders/edges, and distortions such as down-sampling often result in rough borders/edges. Hence, to some extent there is a correlation between visually pleasing objects and smoothness. In order to accentuate smoothness at the lower resolutions, bigger smoothness coefficients are chosen for the lower resolutions. By considering different coefficients for the smoothness terms of the different resolutions, the distortion effects of down-sampling are reduced.

Color images carry more information than grey-level images, which results in a more reliable separation of foreground regions from the background in color image segmentation algorithms. It has been recognized that selection of an appropriate color space produces perceptually more effective segmentation results [13,14]. In particular, segmentation in the YUV or LUV spaces produces more favorable results than in the RGB space [13–15]. Many of the images and image sequences in databases are in YUV format, where Y is at full resolution while the U and V components are at half resolution. The fact that the Y, U, and V channels are presented at different resolutions is not considered in any of the existing regular single- or multiresolution color image segmentation algorithms. However, this fact calls for a fitted multiresolution algorithm to perform the segmentation task effectively. The proposed algorithm has enough flexibility to segment such color images directly. In the vector analysis, only the available components of the color data at the different resolutions are used to classify a vector to one of the segmentation labels. For this reason, the proposed algorithm can segment grey-level images as well.
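As a small illustration of this point, the sketch below (our own naming, not code from the paper) records which channels a vector can draw on at each pyramid level when Y is stored at full resolution and U, V only from half resolution down:

```python
def channels_at(level, chroma_offset=1):
    """Channels observable at a given pyramid level when Y exists at the
    finest level (level 0) and U, V only from `chroma_offset` onwards,
    as in common YUV material with half-resolution chrominance."""
    return ("Y",) if level < chroma_offset else ("Y", "U", "V")

if __name__ == "__main__":
    for lvl in range(3):
        print(lvl, channels_at(lvl))
    # 0 ('Y',)
    # 1 ('Y', 'U', 'V')
    # 2 ('Y', 'U', 'V')
```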

In order to produce similar objects/regions at different resolutions, an alternative method is single-resolution segmentation followed by down-sampling. In single-resolution region-based image/video segmentation algorithms, features such as intensity/color, texture, motion, etc. are considered at the highest resolution. With this method, good features of multiresolution segmentation, such as reduced noise sensitivity, are lost, and producing optimized and visually pleasing objects/regions at different resolutions is not considered as a criterion. Furthermore, down-sampling distorts shapes and cannot preserve their topology at lower resolutions for all possible shapes [16]. In other words, achieving visually pleasing objects/regions at the higher resolution does not necessarily ensure similar quality at the lower resolutions. For example, in Fig. 1 the down-sampling of two digital circles is compared. It can be seen that down-sampling of the better approximation of a digital circle at high resolution can result in a worse shape at the lower resolution.

This paper is organized as follows. Section 2 refers to scalability in object-based wavelet coding. In Section 3, the proposed scalable multiresolution segmentation algorithm, which includes the statistical image modeling and the optimization process, is explained. Experimental results and a discussion are presented in Section 4, and finally, conclusions are drawn in Section 5.

2. Object-based wavelet coding scalability

Scalability means the capability of decoding a compressed sequence at different data rates. It is useful for image/video communication over heterogeneous networks, which requires a high degree of flexibility from the coding system. Some of the desirable scalable functionalities are SNR scalability, spatial scalability and temporal scalability [3]. In particular, spatial scalability means that, depending on the end user's capabilities (bandwidth, display resolution, etc.), a resolution is selected and all the shape and texture information is sent and decoded at the appropriate resolution.

Fig. 2. Decomposition of a non-rectangular object with odd-length filters: (a) the object, shown in dark grey; (b) the decomposed object after horizontal filtering; (c) the decomposed object after vertical filtering into the LL, LH, HL and HH subbands. The letters "E" and "O" indicate the position (even or odd) of a pixel in the horizontal and vertical dimensions.


Scalable image/video coding also has various applications such as web browsing, image/video database systems, video telephony, etc.

In wavelet-based spatial scalability applications, due to the self-similarity feature of the wavelet transform, the shape at the lower scale is the shape in the lowpass (LL) subband. The exact relationship between the full-resolution shape and its low-resolution versions depends on the kind of wavelet transform used for the decomposition. In this paper we use an odd-length filter (e.g. 9/7), where all shape points with even indices (assuming indices start from zero or an even number) are down-sampled into the lowpass band [17]. Fig. 2 further illustrates the wavelet decomposition of arbitrarily shaped objects when using an odd-length filter; the final four-band decomposition is depicted in Fig. 2(c). As a result, every shape pixel with even indices has a corresponding pixel at the lower resolution, and every shape pixel at the lower level has a corresponding pixel at the next higher level. Considering the self-similarity of the wavelet transform, it is straightforward to require that the pixels of a shape with even indices have the same segmentation classification as the corresponding pixels at the lower level.

The wavelet self-similarity extends to all lowpass subband shapes of the different levels. Therefore, the discussed relationship between corresponding pixels extends to shapes at all scales: corresponding pixels at different resolutions have the same segmentation class.
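A minimal sketch of this correspondence, assuming zero-based indices and the even-index down-sampling described above (the function names are ours, not the authors'):

```python
import numpy as np

def coarse_of(labels_fine):
    """Low-resolution counterpart obtained by keeping the even-indexed
    rows and columns, mirroring the odd-length-filter down-sampling."""
    return labels_fine[::2, ::2]

def is_scalable(labels_fine, labels_coarse):
    """True if every even-indexed pixel carries the same segmentation
    label as its corresponding pixel at the next lower resolution."""
    return np.array_equal(coarse_of(labels_fine), labels_coarse)

if __name__ == "__main__":
    fine = np.random.randint(0, 4, size=(8, 8))
    coarse = coarse_of(fine)
    print(is_scalable(fine, coarse))   # True by construction
```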

3. Spatial segmentation algorithm

Markov Random Field statistical modeling is used in many image processing applications.


In order to solve an image processing problem with the MRF technique, a statistical image model has to be fitted to the application, one that captures the intrinsic character of the image in a few parameters. Image/video processing problems, including all their uncertainties and constraints, can then be converted into a mathematical parameter optimization problem [18].

3.1. Statistical color image model

The main challenge in multiresolution image segmentation for scalable object-based wavelet coding is to maintain between the extracted objects/regions at different resolutions the same relation that exists between the decomposed objects/regions at different resolutions in a shape-adaptive wavelet transform. The other constraint is border smoothness, particularly at the lower resolutions; different smoothness coefficients defined at the different resolutions give some degree of freedom to put more emphasis on the low-resolution smoothness. To meet these challenges, Markov random field modeling is selected, as it operates at the pixel level and has enough flexibility for defining objective functions matched to the problem at hand [18]. We first explain the statistical model of single-resolution grey/color image segmentation and then extend it to the scalable multiresolution segmentation mode. In regular single-level MRF-based image segmentation, the problem is formulated using a criterion such as the maximum a posteriori (MAP) probability. The desired segmentation is denoted by $X$, and $Y$ is the observed color image with three channels, written as the three-dimensional vector $Y = [Y_1, Y_2, Y_3]$. According to the Bayes rule, the a posteriori probability density of the segmentation variables can be written as

$$P(X \mid Y) \propto P(Y \mid X)\,P(X), \qquad (1)$$


where $P(X \mid Y)$ represents the conditional probability of the segmentation labels, given the observation $Y$. By assuming conditional independence of the channels given the segmentation field [13], we have $P(Y \mid X) = P(Y_1 \mid X)\,P(Y_2 \mid X)\,P(Y_3 \mid X)$, and the conditional probability in (1) becomes

$$P(X \mid Y) \propto P(Y_1 \mid X)\,P(Y_2 \mid X)\,P(Y_3 \mid X)\,P(X). \qquad (2)$$

The label field $X$ is normally modeled as an MRF stochastic variable. Spatial continuity is easily incorporated into the segmentation, because it is inherent to MRFs [19]. Using a four- or eight-pixel neighborhood system and considering only pairwise cliques, $P(X)$ is then a Gibbs distribution [4] and is defined by its energy function $U(X)$ such that

$$P(X) = \frac{1}{Z}\exp\!\left(-\frac{1}{T}U(X)\right), \qquad U(X) = \sum_{c \in C} V_c(X), \qquad (3)$$

where $C$ is the set of all cliques and $V_c$ is the clique potential function. A clique is a set of neighboring pixels, and a clique function depends only on the pixels that belong to the clique. In single-resolution segmentation, usually one- or two-pixel cliques are used, as shown in Fig. 3(a); for one-pixel cliques we assume that the clique potentials are zero, which means that all region types are equally likely [4]. Spatial connectivity of the segmentation is imposed by assigning the following clique function:

$$V_c(s,r) = \begin{cases} -\beta & \text{if } X(s) = X(r),\\ +\beta & \text{if } X(s) \neq X(r), \end{cases} \qquad (s,r) \in C. \qquad (4)$$

Herein $\beta$ is a positive number, and $s$ and $r$ are a pair of neighboring pixels. Note that a low potential or energy corresponds to a higher probability; pixel pairs with identical labels therefore receive a higher probability and pairs with different labels a lower one, which automatically encourages spatially connected regions.
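A direct transcription of this pairwise potential, assuming integer label maps and a 4-neighbourhood (the names and the toy example are ours):

```python
import numpy as np

def clique_potential(label_s, label_r, beta=1.0):
    """Eq. (4): -beta for equal labels, +beta for different labels."""
    return -beta if label_s == label_r else +beta

def prior_energy(labels, beta=1.0):
    """Sum of the pairwise potentials over horizontal and vertical
    two-pixel cliques, i.e. U(X) restricted to pairwise cliques."""
    same_h = labels[:, 1:] == labels[:, :-1]
    same_v = labels[1:, :] == labels[:-1, :]
    n_pairs = same_h.size + same_v.size
    n_same = same_h.sum() + same_v.sum()
    return beta * (n_pairs - 2 * n_same)   # (+beta)*different + (-beta)*same

if __name__ == "__main__":
    X = np.zeros((4, 4), dtype=int)
    X[:, 2:] = 1                      # two homogeneous halves
    print(prior_energy(X))            # -16.0: 4 boundary pairs, 20 equal pairs
```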

The conditional probability density $P(Y_i \mid X)$, $i = 1, 2, 3$, is modeled as a white Gaussian process with mean $\mu^i_{X(s)}(s)$ and variance $\sigma_i^2$ for channel $i$.


Fig. 3. (a) Normal one- and two-pixel clique sets. (b) A clique of two vectors with the vectors' dimension equal to two.

Each region is characterized by a mean vector $[\mu^1_{X(s)}(s), \mu^2_{X(s)}(s), \mu^3_{X(s)}(s)]$ which is a slowly varying function of $s$. Therefore $P(Y \mid X)$ can be described by the following equation:

$$P(Y \mid X) \propto \exp\!\left\{-\sum_{s}\sum_{i=1}^{3}\frac{1}{2\sigma_i^2}\bigl(Y_i(s) - \mu^i_{X(s)}(s)\bigr)^2\right\}. \qquad (5)$$

Considering Eqs. (1), (3) and (5), the conditional probability density of the segmentation variable becomes

$$P(X \mid Y) \propto \exp\!\left\{-\sum_{s}\left(\sum_{i=1}^{3}\frac{1}{2\sigma_i^2}\bigl(Y_i(s) - \mu^i_{X(s)}(s)\bigr)^2 + \frac{1}{T}\sum_{r \in \partial s} V_c(s,r)\right)\right\}. \qquad (6)$$

It is easy to see that the parameters $\sigma_i$, $i = 1, 2, 3$, $T$ and $\beta$ are interdependent. Therefore, to simplify the expression, the parameters $2\sigma_i^2$, $i = 1, 2, 3$, and $T$ are set to one, and the segmentation result is controlled by the value of $\beta$ in the $V_c$ function. The probability density function has two components: one forces the region intensity to be close to the data, and the other imposes spatial continuity. Considering the MAP criterion, we maximize the probability $P(X \mid Y)$, which is equivalent to minimizing the negative of the argument of the exponential function in Eq. (6). This results in the following cost or objective function, which has to be minimized with respect to $X(s)$:

$$E(X) = \sum_{s}\left(\sum_{i=1}^{3}\bigl(Y_i(s) - \mu^i_{X(s)}(s)\bigr)^2 + \frac{1}{T}\sum_{r \in \partial s} V_c(s,r)\right). \qquad (7)$$

In grey-scale images only the intensity channel exists, and the terms for $Y_2$ and $Y_3$ in Eq. (7) have to be dropped.
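As a hedged sketch of the resulting single-resolution cost in Eq. (7) with $T = 1$, assuming the caller supplies per-class, per-pixel mean images (all names are ours):

```python
import numpy as np

def single_res_energy(Y, X, class_means, beta=1.0):
    """Single-resolution cost of Eq. (7) with T = 1.
    Y           : (H, W, C) observed channel intensities
    X           : (H, W)    integer label field
    class_means : (K, H, W, C) slowly varying mean of each class/channel
    """
    rows, cols = np.indices(X.shape)
    mu = class_means[X, rows, cols]                  # mean of the assigned class
    data_term = np.sum((Y - mu) ** 2)
    same_h = X[:, 1:] == X[:, :-1]                   # horizontal two-pixel cliques
    same_v = X[1:, :] == X[:-1, :]                   # vertical two-pixel cliques
    prior = beta * ((same_h.size - 2 * same_h.sum())
                    + (same_v.size - 2 * same_v.sum()))
    return float(data_term + prior)
```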



To obtain the final segmentation, this objective function is minimized by one of several MRF objective minimization methods [18]. To tailor this objective function to scalable multiresolution color image segmentation, the wavelet transform is first applied to the original image and a pyramid of decomposed images at the various resolutions is created. Let $Y = [Y_1, Y_2, Y_3]$, where $Y_i$, $i = 1, 2, 3$, is the intensity of channel $i$ of the pyramid's pixels. The segmentation of the image into regions at the different resolutions will be denoted by $X$.

As mentioned earlier, to provide scalability, a pixel and its corresponding pixels at all other pyramid levels have the same segmentation label. Therefore they change together during the segmentation process, and to change the segmentation label of a pixel, the pixel and all its corresponding pixels at all other levels have to be analyzed together. As a result, an analysis of a set of pixels in a multidimensional space, instead of a single-resolution analysis, needs to be used. The term "vector" is used to refer to this multidimensional space. A vector comprises the corresponding pixels at the different resolutions of the pyramid, and the symbol $\{s\}$ denotes the vector which includes pixel $s$. The dimension of a vector is equal to the number of its pixels, which are located at different resolutions. If the corresponding pixels are determined according to the wavelet-transform down-sampling, the vector dimension depends on the index of the pixel, and it can be 1, 2 or more. Using these primary definitions, the clique concept is extended to the vector space: the extended cliques act on two vectors instead of two pixels. Fig. 3(a) shows the regular one- and two-pixel clique sets, and Fig. 3(b) shows the extension of one of these cliques to vectors of dimension two.
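The vector $\{s\}$ can be enumerated as follows, under the assumption of zero-based indices and even-index down-sampling (a sketch, not the authors' implementation):

```python
def vector_of(row, col, n_levels):
    """Return the pyramid coordinates of the vector {s} anchored at pixel
    (row, col) of the finest resolution (level 0).  A pixel joins the next
    coarser level only while both of its indices are even, so the vector
    dimension ranges from 1 to n_levels."""
    coords = [(0, row, col)]
    level, r, c = 0, row, col
    while level + 1 < n_levels and r % 2 == 0 and c % 2 == 0:
        level, r, c = level + 1, r // 2, c // 2
        coords.append((level, r, c))
    return coords

if __name__ == "__main__":
    print(vector_of(3, 5, 4))   # [(0, 3, 5)]                   -> dimension 1
    print(vector_of(4, 8, 4))   # [(0, 4, 8), (1, 2, 4), (2, 1, 2)] -> dimension 3
```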

The extension of the clique functions is achieved through the following steps. Eq. (4) is used for cliques of length two within a resolution, where the pixels $s$ and $r$ are two neighboring pixels at the same resolution level. Eq. (8) is defined for multiple levels, where $\{s\}$ and $\{r\}$ are the vectors corresponding to two neighboring pixels $s$ and $r$. The neighboring pixels of the two vectors $\{s\}$ and $\{r\}$ at level $k$ are denoted by $s_k$ and $r_k$. The lowest resolution which includes a pixel of vector $\{s\}$ is denoted by $M$, and $N$ is the dimension of the vectors $\{s\}$ and $\{r\}$. A positive value is assigned to the parameter $\beta$, so that adjacent pixels of two neighboring vectors are more likely to belong to the same region than to different regions; increasing the value of $\beta$ decreases the sensitivity to intensity changes [4].

$$V_c(\{s\},\{r\}) = \left(\frac{\sum_{k=M}^{M+N-1} L_k}{N}\right)\sum_{k=M}^{M+N-1}(-1)^{L_k}\,\beta, \qquad
L_k = \begin{cases} 1 & \text{if } X(s_k) = X(r_k),\\ 0 & \text{if } X(s_k) \neq X(r_k), \end{cases} \qquad s_k \in \{s\},\; r_k \in \{r\},\; r_k \in \partial s_k. \qquad (8)$$

It is notable that Eq. (8) extends the clique definition to the multiresolution mode. In a vector $\{s\}$, the corresponding pixels at different resolutions are determined according to the down-sampling relationship of the arbitrary-shape wavelet transform; in other words, the pixels of a lower resolution occur at the down-sampled positions of the higher-resolution pixels of the pyramid. The pixels of $\{r\}$, on the other hand, are the neighbors of $\{s\}$ at the different resolutions. As a result of the clique extension to the multiresolution space, the segmentation processing continues in the vector space, and therefore the intensity average and the segmentation label functions are also extended to the vector space. The intensities of the pixels of the different channels in the set $\{s\}$ form a vector $Y(\{s\}) = [Y_1(\{s\}), Y_2(\{s\}), Y_3(\{s\})]$, and similarly $\mu(\{s\}) = [\mu_1(\{s\}), \mu_2(\{s\}), \mu_3(\{s\})]$ is the mean vector. The objective function is therefore extended to the vector space as follows:

$$E(X) = \sum_{\{s\}}\left\{\sum_{i=1}^{3}\bigl\|Y_i(\{s\}) - \mu^i_{X(\{s\})}(\{s\})\bigr\|^2 + \sum_{\{r\} \in \partial\{s\}} V_c(\{s\},\{r\})\right\}. \qquad (9)$$

The outer summation is over the vectors, while the first inner summation is related to the distances of the pixel intensities from the estimated average for each channel of the color image. The second inner summation is over all neighboring vectors of the vector $\{s\}$; two vectors $\{s\}$ and $\{r\}$ are neighbors if the pixels of $\{s\}$ and $\{r\}$ located at the same resolution are neighbors. The approach used in this section, expressed by Eqs. (7)–(9), is a generalization of regular single-resolution segmentation to scalable multiresolution segmentation.

3.2. Smoothness criterion

Traditionally, region-based image/video segmentation algorithms have considered image features such as the pixels' grey level or color.


In most of these approaches, the emphasis is put on the accuracy of the segmentation. The delineation of object/region shapes, and the production of visually pleasing object/region shapes, has not attracted enough attention, due to the ill-posed nature of the segmentation task [20,21]. In contour/edge-based segmentation algorithms, another important criterion, related to the appearance of the extracted objects/regions, has been considered: in these algorithms, the borders of the extracted objects/regions are smoothed [22,23]. Ideally, borders are edges in the image, which are among the most important properties for visual perception. Because most natural objects exhibit smooth edges, and distortions such as down-sampling often create rough edges, there is a correlation between border smoothness and visually pleasing objects. Therefore, border smoothness terms corresponding to the different resolutions have been added to the objective function of our MRF-based segmentation approach.

The proposed smoothness definition is based on the border's curvature, which is the rate of change of the angle between a curve and the tangent line to the curve. In a digital environment an estimate of the curvature can be used; the estimate is explained in Fig. 4. Minimizing the proposed smoothness estimate prevents unwanted fluctuations in the border pixels.
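One possible reading of this estimate, consistent with the example values listed in the Fig. 4 caption (the angle bookkeeping is our assumption):

```python
import math

def segment_angles(points):
    """Angles (degrees) of the segments joining consecutive border pixels."""
    return [math.degrees(math.atan2(q[1] - p[1], q[0] - p[0]))
            for p, q in zip(points[:-1], points[1:])]

def curvature_at(points, i):
    """kappa(A) = 0.5 * |(alpha2 + alpha1) - (beta2 + beta1)|, where the betas
    are the angles of the two segments entering border pixel i and the alphas
    the angles of the two segments leaving it (cf. Fig. 4)."""
    angles = segment_angles(points)
    beta = angles[i - 2] + angles[i - 1]     # incoming pair
    alpha = angles[i] + angles[i + 1]        # outgoing pair
    return 0.5 * abs(alpha - beta)

if __name__ == "__main__":
    # a border traced right along x, then up along y: a right-angle corner
    corner = [(-3, 0), (-2, 0), (-1, 0), (0, 0), (0, 1), (0, 2), (0, 3)]
    print(curvature_at(corner, 3))   # 90.0 at the corner pixel
```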

The large number of pixels ensures border smoothness at the high resolutions; at the lower resolutions, however, the visual quality can suffer due to down-sampling distortion and the lack of sufficient information. To reduce this effect, smoothness can be enforced more rigorously at the lower resolutions than at the higher ones. This priority is realized by bigger coefficients for the lower-resolution smoothness terms.

Fig. 4. Curvature estimation, κ(A) = 0.5·|(α2 + α1) − (β2 + β1)|: (a) corner point, β1 = β2 = 45, α1 = α2 = −45, κ = 90; (b) change-of-direction point, β1 = β2 = 45, α1 = α2 = 0, κ = 45; (c) same direction, β1 = β2 = 45, α1 = α2 = 45, κ = 0.

Therefore, the objective function is updated according to the following equation:

$$E(X) = \sum_{\{s\}}\left\{\sum_{i=1}^{3}\bigl\|Y_i(\{s\}) - \mu^i_{X(\{s\})}(\{s\})\bigr\|^2 + \sum_{\{r\} \in \partial\{s\}} V_c(\{s\},\{r\}) + \sum_{q \in \{s\}} \lambda_{\mathrm{res}(q)}\,\nu(q)\right\}, \qquad (10)$$

where $\nu(q)$ is the curvature estimate at pixel $q$, a pixel of the vector $\{s\}$, and $\lambda_{\mathrm{res}(q)}$ is a coefficient which decreases as the resolution increases. In grey-level images, only the grey-intensity channel is available; it is therefore enough to consider only $Y_1$ in the second summation of Eq. (10), which gives the following objective function for grey-level images:

$$E(X) = \sum_{\{s\}}\left\{\bigl[Y(\{s\}) - \mu_{X(\{s\})}(\{s\})\bigr]^2 + \sum_{\{r\} \in \partial\{s\}} V_c(\{s\},\{r\}) + \sum_{q \in \{s\}} \lambda_{\mathrm{res}(q)}\,\nu(q)\right\}, \qquad (11)$$

where $Y$ is the grey-intensity function and $\mu$ is the grey-intensity average function.

The proposed smooth object extraction differs from a simple smoothing of the objects' borders as done in [24], which filters the extracted video object shape to remove the small elongations introduced during the segmentation process, in the following respects. (1) Our smoothing process takes part in the segmentation algorithm and changes the segmentation outcome. (2) With sufficient contrast, the proposed algorithm produces borders that are more faithful to the regions' shapes. (3) On some occasions, some background pixels are added to the foreground regions to produce better-looking shapes, especially at the different resolutions. (4) The smoothness factor can be adjusted for the different resolutions to produce visually pleasing shapes at the different resolutions, with scalability as a constraint.



Fig. 5. Scalable segmentation of a digital circle with emphasis on low-level smoothness: (a) original image; (b) noisy image; (c) segmentation at 20×20 resolution; (d) segmentation at 10×10 resolution.


As an example of the effect of smoothness in spatial segmentation, consider the circle in Fig. 5(a). It has two grey levels: 100 in the background area and 200 in the foreground area. We consider a uniform noise in the range (0, 50) that is added to the background and subtracted from the object intensity. This noise changes the image from binary to grey level and reduces the intensity difference between the foreground and the background pixels. The image is segmented by the proposed algorithm at the two resolutions 20×20 and 10×10. We emphasize low-resolution smoothness by setting the smoothness coefficient to zero for the highest level and increasing the smoothness coefficient for the lower resolution. The results are shown in Fig. 5(c) and (d). In this example, the smoothness criterion has deleted some pixels of the shape at the different resolutions. The results can be compared with Fig. 1(a) and (b) at low and high resolution, considered as regular segmentation. The proposed segmentation method extracts a more pleasing shape at the lower resolution, albeit sometimes adding some distortion at the higher resolution. However, the large number of pixels at the higher resolutions ensures sufficient smoothness and visually pleasing objects.
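The test image of this example can be reproduced roughly as follows; the radius and random seed are our own choices:

```python
import numpy as np

def noisy_circle(size=20, radius=7, rng=None):
    """Two-level disc (foreground 200, background 100) with uniform noise
    in (0, 50) added to the background and subtracted from the foreground,
    as in the Fig. 5 example."""
    rng = np.random.default_rng(0) if rng is None else rng
    y, x = np.indices((size, size))
    inside = (x - size / 2) ** 2 + (y - size / 2) ** 2 <= radius ** 2
    img = np.where(inside, 200.0, 100.0)
    noise = rng.uniform(0.0, 50.0, size=(size, size))
    return img + np.where(inside, -noise, noise)

if __name__ == "__main__":
    img = noisy_circle()
    print(img.shape, img.min() >= 100.0, img.max() <= 200.0)
```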

3.3. MAP estimation

The ICM optimization method [25] is used to minimize the objective function in Eq. (10). The segmentation is initialized with the k-means clustering algorithm for each channel separately; neighboring pixels with equal labels in all three channels then form a region. The segmentation estimate is improved using ICM optimization [25]. In single-resolution image segmentation, ICM optimizes the objective function pixel by pixel in a raster-scan order until convergence is achieved. At each pixel, the segmentation label of the processed pixel is updated given the current $X$ at all other pixels. Therefore only the terms of the objective function related to the current pixel need to be minimized:

$$E(X(s)) = \sum_{i=1}^{3}\bigl(Y_i(s) - \mu^i_{X(s)}(s)\bigr)^2 + \sum_{r \in \partial s} V_c(s,r). \qquad (12)$$

ICM, as used in the single-level segmentation algorithm for grey-level images by Pappas [4] and extended to color images by Chang et al. [13], is modified to fit the scalable multiresolution segmentation algorithm. Similarly, the objective-function terms corresponding to the current vector are optimized given the segmentation at all other vectors of the pyramid. The resulting objective-function terms related to the current vector are

$$E(X_{\{s\}}) = \sum_{i=1}^{3}\bigl\|Y_i(\{s\}) - \mu^i_{X(\{s\})}(\{s\})\bigr\|^2 + \sum_{\{r\} \in \partial\{s\}} V_c(\{s\},\{r\}) + \sum_{q \in \{s\}} \lambda_q\,\nu(q). \qquad (13)$$

For grey-level images there is only the intensity channel, and the objective function simplifies to

$$E(X_{\{s\}}) = \bigl(Y(\{s\}) - \mu_{X(\{s\})}(\{s\})\bigr)^2 + \sum_{\{r\} \in \partial\{s\}} V_c(\{s\},\{r\}) + \sum_{q \in \{s\}} \lambda_q\,\nu(q). \qquad (14)$$

During the optimization process, for each pixel $s$ of a vector $\{s\}$ the terms $\mu_i(s)$, $i = 1, 2, 3$, are estimated for each class by averaging the channel intensities of all pixels that belong to that class and lie inside a window of width $w$ centered at pixel $s$; the window size $w$ is doubled when moving to the next higher resolution. The averages of a pixel $s$ and of its correspondents at all other levels in $\{s\}$ are used to classify the pixels of $\{s\}$ to the label which minimizes Eq. (13). To reduce the computational complexity, it is enough to consider only the labels of $\{s\}$ and of its neighboring vectors when selecting the best label through the energy minimization of Eq. (13).


Table 1

Number of vectors with different lengths

Length of vectors    1         2          3          4           5
Number of vectors    3/4 MN    3/16 MN    3/64 MN    3/256 MN    1/256 MN


Therefore, for the pixels inside a region there is no computation, and the regions' borders are gradually refined. Furthermore, this border processing prevents isolated noise pixels from becoming a new cluster, resulting in fewer wrongly detected boundaries [26].

Let us now consider the overall optimization algorithm. As mentioned above, the initial segmentation of the pyramid is obtained with the k-means clustering algorithm [27,28]. The pyramid's pixels are processed progressively from the low to the high resolutions. At each resolution, the pixels are visited in raster-scan order. The intensity averages $\mu_i(s)$, $i = 1, 2, 3$, at each pixel $s$ and at its corresponding pixels at the other resolutions are estimated for all possible classes with a pre-determined window size $w$. We then update the estimate of $X_{\{s\}}$ using the ICM approach with the multi-level analysis of Eq. (14). By updating the segmentation labels of the pixels at the current resolution, the corresponding pixels at the other levels are updated as well. After convergence at the current resolution, the algorithm moves to the next higher resolution and updates the estimates of $\mu$ and $X$, and so on, until all resolutions are processed. The stopping criterion at each resolution is that the number of updates of $X$ falls below a pre-defined threshold; to reduce the number of iterations, other convergence criteria can also be used. The whole procedure is then repeated with a smaller window size, and the algorithm stops when the minimum window size for the lowest level is reached. We have set the minimum window size for the lowest level to eight.

The scalable, multidimensional analysis ties the pixels at high and low resolutions together, so that high-resolution refinement influences the low-resolution refinement, too. On the other hand, the optimization includes several stages of refinement from low to high resolution with decreasing window size. Therefore, the proposed segmentation algorithm, with its objective function and optimization routine, performs repeated low-to-high-resolution segmentation refinement with feedback from the high- to the low-resolution segmentation until convergence is reached. The combination of the proposed objective function and the optimization method provides effective low-to-high and high-to-low resolution interactions, and therefore a reliable and scalable segmentation algorithm.

To quantify the computational complexity of the proposed vector-based processing, the vector size as well as the number of iterations required for the optimization process have to be considered. Because the down-sampling from a given resolution to the next lower one reduces the number of pixels to one quarter, the average length of the vectors is not much larger than one. As an example, consider the segmentation of an image of size M×N using five levels of down-sampling; vectors then exist with lengths from one to five. Table 1 shows the number of vectors of each length, from which the average vector length is easily calculated to be 1.33. Since the number of iterations needed for convergence is image dependent, it is difficult to assign a value to it. However, from the average vector size it can be confidently asserted that vector-based processing increases the computational complexity only moderately. Experimental results confirm that vector processing, along with the proposed optimization process, increases the computational complexity by less than a factor of 2.5. The major part of this increase is related to the proposed optimization process, which includes several passes of low-to-high-resolution processing, and not to the vector length itself. On the other hand, as experiments have shown, imposing the smoothness criterion on the segmentation algorithm contributes more to the computational complexity than the vector-based processing: the smoothness estimate is re-evaluated at each pixel, which increases the computational cost considerably. In different examples, it has been observed that the smoothness criterion increases the complexity of the scalable segmentation by more than a factor of 6.

4. Experimental results and discussion

In this section, experimental results obtained using the algorithm introduced in Section 3 are presented.


The results are compared with the regular single-level and multiresolution segmentation algorithms [4,13]. In the first step, the image is decomposed into three resolutions using the 9/7 wavelet filter. Then, at each level of the decomposition, the image is segmented while the scalability between the regions at the different resolutions, as required for the arbitrary-shape wavelet transform, is enforced by the proposed algorithm. Some parameters, such as the number of labels, the continuity parameter $\beta$ and the smoothness coefficients, are entered into the algorithm. Automatic determination of these parameters is beyond the scope of this work and could be the subject of further research.
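The decomposition step can be reproduced with an off-the-shelf wavelet library; the sketch below uses PyWavelets, whose 'bior4.4' filter is the usual stand-in for the CDF 9/7 filter (the library choice is ours, and its symmetric boundary handling makes the sub-bands slightly larger than an exact half, unlike the paper's shape-adaptive even-index down-sampling):

```python
import numpy as np
import pywt

def luminance_pyramid(y, levels=2):
    """Low-pass (LL) images of a wavelet pyramid, finest first."""
    pyramid = [y]
    current = y
    for _ in range(levels):
        current, _ = pywt.dwt2(current, "bior4.4")   # keep only the LL band
        pyramid.append(current)
    return pyramid

if __name__ == "__main__":
    img = np.random.rand(240, 352)
    for level, ll in enumerate(luminance_pyramid(img)):
        print(level, ll.shape)
```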

As a first example, the proposed algorithm is tested on frame 5 of the SIF sequence Table Tennis. For grey-level images only the intensity of the grey channel is considered in Eq. (14). Fig. 6 presents the results achieved by the proposed multiresolution scalable, the regular single-resolution and the regular multiresolution segmentation algorithms. The result of the proposed scalable multiresolution segmentation algorithm is presented only at the finest resolution, because the lower-resolution results have the same patterns.

In regular multiresolution segmentation algorithms, coarse, filtered versions of the image are processed at the lower resolutions; therefore, some small or low-contrast regions are not detected. This drawback is called under-segmentation. In contrast, in the proposed algorithm the influence of the high resolutions on the low resolutions results in the detection of details that is not possible with regular multiresolution segmentation. In other words, the sensitivity to grey-level changes is increased, resulting in a better detection of small or low-contrast objects, especially at the low resolutions. Table 2 shows the number of detected regions of the Table Tennis image at three spatial resolutions for the different segmentation algorithms. The proposed scalable segmentation detects more relevant regions than the regular multiresolution algorithm.

Table 2

Number of regions in Table Tennis segmentation

Seg. algorithm     60×88    120×176    240×352
Multiresolution    19       55         164
Scalable           42       83         184
Single level       —        —          314

For example, consider the segmentation of the textured wall and the detection of the ball in the Table Tennis image, as presented in Fig. 6, by the proposed multiresolution scalable, the regular single-resolution and the regular multiresolution segmentation algorithms. The single-level segmentation detects the ball, but it also detects a number of spurious regions due to the textured background, as the number of regions in Table 2 shows; this drawback is called over-segmentation. The regular multiresolution algorithm misses the ball at the different resolutions altogether. The proposed algorithm, however, detects the ball while avoiding an unsightly segmentation of the textured background.

It is significant to note that while our algorithm has an improved sensitivity to grey-level variations, it still maintains its noise tolerance. To test the scalable segmentation algorithm on noisy images, a uniform noise in the range (−30, +30) was first added to the Table Tennis images, and then the different segmentation algorithms were run. The numbers of misclassified pixels for the Table Tennis object, comprising arm, racket and ball (11 033 pixels), as well as for the entire image were counted. The results in Table 3 confirm that the proposed algorithm can deal with noisy images as effectively as multiresolution segmentation and much better than single-level segmentation. This confirms that the introduced multilevel segmentation algorithm maintains most advantages of multiresolution segmentation over single-level segmentation, such as a better segmentation of noisy images.

The proposed segmentation can be used in general segmentation applications. However, it is especially suited to scalable wavelet-based image object coding, which allows only the pixels belonging to an arbitrarily shaped object to be coded [29,30]. To facilitate "object-of-interest" extraction, in a semi-automatic procedure the user determines the rough boundary of the object of interest through a graphical user interface (GUI). Subsequently, all regions with a predetermined percentage of their area inside this closed contour are selected as the regions belonging to the object of interest. Combining all the selected regions creates the final object.
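The region-selection rule can be sketched as follows; the mask representation and the threshold name are ours:

```python
import numpy as np

def select_object(labels, user_mask, min_fraction=0.5):
    """Binary object mask built from all segmented regions having at least
    `min_fraction` of their area inside the user-drawn contour mask."""
    obj = np.zeros_like(user_mask, dtype=bool)
    for region_id in np.unique(labels):
        region = labels == region_id
        inside = np.logical_and(region, user_mask).sum() / region.sum()
        if inside >= min_fraction:
            obj |= region
    return obj

if __name__ == "__main__":
    labels = np.zeros((8, 8), dtype=int); labels[:, 4:] = 1
    mask = np.zeros((8, 8), dtype=bool);  mask[:, 5:] = True
    print(select_object(labels, mask).sum())   # 32: the right-hand region is kept
```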

Table 3

Misclassified pixels in noisy image

Algorithm    Multiresolution    Scalable    Single resolution
Object       4.13%              3.04%       8.85%
Image        7.96%              6.20%       18.74%


As an example, a user has roughly outlined the objects of interest in Fig. 8. Subsequently, the exact borders of the object at the different resolutions are determined. The objects extracted by both the scalable segmentation and the regular single-resolution segmentation algorithms at three different resolutions are shown in Fig. 7. A comparison of the extracted objects confirms the superiority of the scalable segmentation algorithm in a subjective test. As a complementary, quantitative test, the border smoothness is measured at the different resolutions.

Fig. 6. Table Tennis image segmentation with k = 6 clusters and β = 100: (a) the main image; (b) segmentation by the proposed scalable algorithm at 240×352; (c) regular single-level segmentation; (d) regular multiresolution segmentation at 240×352; (e) 120×176 segmentation; (f) 60×88 segmentation.

Fig. 7. Table Tennis object extraction; objects are extracted from the scalable segmentation in the first row and from single-resolution segmentation in the second row: (a) 240×352; (b) 120×176; (c) 60×88; (d) 240×352; (e) 120×176; (f) 60×88.

Table 4 shows the average smoothness over all border pixels for the scalable segmentation and for the single-resolution segmentation down-sampled to the lower levels, at three different resolutions. The quantitative results, like the subjective test, show that the proposed algorithm ensures smoother edges compared to the down-sampled versions of the single-resolution segmentation. Finally, the extracted image object can be coded by scalable object-based coding algorithms [30].

In the second example, frame 34 of the Mother and Daughter sequence is segmented. The image is in qcif format and is given in the YUV color space.


As in many other sequences in databases, the chrominance components U and V are at half resolution. Regular color image segmentation needs all the information at the same resolution.

Table 4

Means of segmentation curvature estimation

Algorithm      60×88    120×176    240×352
Downsample     25.58    22.68      21.33
Scalable       17       17.3       15.17
Improvement    33.2%    23.65%     28.9%

Fig. 9. Frame 34 of the qcif-size Mother and Daughter image sequence, segmentation with k = 7, 2, 2 clusters and β = 40: (a) original grey-level image; (b) regular grey-level single-resolution segmentation; (c) color image of Mother and Daughter where U and V are in half resolution; (d) proposed scalable segmentation.

Fig. 8. Table Tennis object selection by user.

Therefore, as a first solution, the image is segmented in grey-level space by a single-resolution statistical image segmentation algorithm [4]. The result is shown in Fig. 9(b). The left area of the daughter's face has not been well separated from the background, because there is not enough grey-level contrast between face and background. The same shortcoming occurs for the other grey-level segmentation algorithms, except when there is over-segmentation with a large number of detected regions, which is not desirable for segmentation applications. To successfully separate the object's regions from the background, color segmentation is performed as an alternative solution. The proposed scalable segmentation algorithm can perform color segmentation using the half-resolution chrominance components. The result of the scalable color image segmentation is shown in Fig. 9(d). The number of regions in the grey-level segmentation is 273, while in the color segmentation it is 112, which indicates a reasonable color image segmentation.

In the third example, the color image of Lena in YUV space, where Y is at 256×256 resolution and the chrominance components U and V are at half resolution, is segmented. The original image is shown in Fig. 10(a).


Fig. 10. Lena image segmentation at 256×256 with k = 6, 4, 4 clusters and β = 100: (a) original image; (b) regular single-resolution segmentation where U and V are projected to the higher resolution; (c) proposed scalable segmentation where U and V are in half resolution.


In the first experiment, U and V are projected to the higher resolution by a 1:4 pixel transform and then single-resolution segmentation is performed [13]. The result is shown in Fig. 10(b); it can be seen that the top part of the hat is not well separated from the background. Finally, the result of the proposed multiresolution scalable segmentation algorithm, which uses the half-resolution U and V, is shown in Fig. 10(c). This algorithm separates all the foreground (Lena) regions from the background successfully. It is interesting to note that the single-resolution method divides the image into 578 regions, while the proposed scalable segmentation separates the image into 427 regions, a 26% reduction in the number of regions. This confirms that the proposed algorithm reduces the number of regions, i.e. the over-segmentation, compared to single-level segmentation while still separating the object's regions from the background. Similarly, single-resolution segmentation in RGB space cannot separate the hat from the background and divides the image into 779 regions with similar parameters. Considering the over-segmentation of single-resolution segmentation and the under-segmentation of multiresolution segmentation, a failure to separate an object's regions at a single resolution will certainly lead to an even bigger failure in a multiresolution segmentation algorithm.

In the next example, frame 30 of the qcif sequence Foreman is considered. The original image is in YUV format, where Y is at full resolution but U and V are at half resolution. In Fig. 11(a) the original image can be seen. The image is segmented with the proposed scalable multiresolution segmentation, and the result is compared with the other segmentation algorithms. To run the other algorithms, the U and V color components are again projected to full resolution by a 1:4 pixel transform, and the regular single-resolution, the regular multiresolution and the proposed scalable segmentation algorithms are performed. The initial segmentation estimate comes from k-means clustering of the different channels, and the numbers of classes were chosen as k = 10, 4, 2 for the YUV or RGB color channels, respectively. In the first experiment, the components U and V are projected to the next higher resolution and then the proposed scalable and the regular multiresolution segmentation algorithms are performed. The results can be seen in Figs. 11(b) and (c). It is clear that in this example the regular multiresolution segmentation cannot separate the foreground (the foreman) from the background regions. This is most pronounced in separating the left area of the hat from the background.


Fig. 11. Segmentation of frame 32 of the qcif-size Foreman sequence with K = 10, 4, 2 clusters for the different channels: (a) original image; (b) scalable segmentation where U and V are projected to the higher resolution; (c) regular multiresolution segmentation; (d) scalable segmentation where Y, U and V are in full resolution; (e) scalable segmentation in YUV space where U and V are in half resolution; (f) scalable segmentation in RGB space where all components are in full resolution.

Table 5

Misclassified pixels in Foreman image segmentation

Resolution     72×88    144×176    288×352
Algorithm A    50       186        786
Algorithm B    123      511        2118
Improvement    59%      64%        63%


Furthermore, some other details, such as the left eyebrow, have not been detected. Similarly, the scalable segmentation using the U and V components projected to the higher resolution could not detect the corner of the hat.

The image with real full resolution size of U andV is segmented and will be considered as a groundtruth for comparison in the following. The qcif sizeU and V components are taken from the availableYUV, CIF size image sequence. Segmentation bythe scalable segmentation which uses real fullresolution qcif size U and V are in Fig. 11(c).Fig. 11(d) shows the segmentation with the pro-posed algorithm which uses full resolution Y andhalf resolution U and V components. As can beseen, the proposed algorithm separates the fore-ground regions from the background successfully,as the scalable algorithm with the full-resolutioninformation does. As a statistical test, scalablesegmentation using half resolution U and V (Algo-rithm A) and scalable segmentation using projectedU and V (Algorithm B) are compared with theground truth. The number of misclassified pixels inthe proposed algorithm using half resolution U(Algorithm A) and V is about %30 of the ones ofthe algorithm which uses the projected U and Vcomponents in high resolution (Algorithm B). Thenumbers of misclassified pixels at different resolu-tions are shown in Table 5. In the Fig. 11(e) thesegmented image in the RGB space using the fullresolution information is shown. The right and toparea of the hat are not separated well. To remedythe problem, we have to increase the number of

Fig. 11(f) shows the segmented image in RGB space using the full-resolution information. The right and top areas of the hat are not separated well. To remedy this, the number of classes has to be increased from 10, 4, 2 to 10, 10, 10 in order to separate the hat, which increases the number of regions and thus the over-segmentation. Increasing the number of regions also increases the computational complexity of the segmentation algorithm. With K = 10, 4, 2 the number of regions is 279 for the proposed algorithm in YUV space and 337 in RGB space, and it increases to 739 regions for K = 10, 10, 10 in RGB space.

As the last segmentation example, the 256 × 256 Guitar image is segmented. The original grey-level image and the single-resolution segmentation are shown in Figs. 12(a) and (b). The segmentation result shows that many meaningful regions are not well detected and that border fluctuations occur in some areas of the image. Figs. 12(c)–(e) show the multiresolution segmentation results. Although the multiresolution segmentation has fewer border fluctuations, the resulting under-segmentation means that more semantic regions are missed than in the single-resolution segmentation. Table 6 shows the number of regions for the single- and multiresolution segmentation algorithms at the different resolutions.


Fig. 12. Guitar segmentation: (a) the 256 × 256 grey-level image; (b) single-resolution segmentation; (c) multiresolution 64 × 64 segmentation; (d) multiresolution 128 × 128 segmentation; (e) multiresolution 256 × 256 segmentation.

Table 6
Number of regions for segmentation of the grey Guitar image

Segmentation              64 × 64    128 × 128    256 × 256
Uni-resolution seg.            82          189          447
Multiresolution seg.           82           98          157


To enhance the segmentation process, in a further experiment, color information has been used in the segmentation algorithm. Fig. 13(a) shows the original 256 × 256 color Guitar image. The single-resolution segmentation of the color image in the YUV color space is shown in Fig. 13(b). 560 regions have been detected, with many non-meaningful regions, which indicates over-segmentation. Border roughness decreases for many regions, creating a better visual quality. Figs. 13(c)–(e) show the standard multiresolution segmentations of the color image at the different resolutions. While border smoothness is increased, many meaningful regions have not been detected; the over-segmentation of the single-resolution segmentation is therefore over-corrected, resulting in under-segmentation. Many meaningful regions of the guitar instrument and the filing cabinet have not been well detected and are mixed irreversibly with the background. The numbers of regions are shown in Table 7. Finally, the proposed scalable multiresolution segmentation with the smoothness constraint at the three different resolutions is shown in Figs. 13(f)–(h). It can be seen that the most important and meaningful regions have been extracted and that the segmentation maps at the different resolutions are similar. The borders are significantly smoother.

5. Conclusions

We have presented a multiresolution scalable grey-level/color image segmentation algorithm which extracts objects/regions with a similar segmentation pattern at different resolutions, which is useful for object-based wavelet coding applications. In addition to scalability, a new quantitative criterion is added to the segmentation algorithm. This criterion, a smoothness function based on the pixel segmentation labels, represents the visual quality of the objects/regions at the different resolutions.


Fig. 13. Guitar segmentation: (a) the 256 × 256 color Guitar image; (b) single-resolution segmentation; (c) multiresolution segmentation at 64 × 64; (d) multiresolution segmentation at 128 × 128; (e) multiresolution segmentation at 256 × 256; (f) scalable segmentation at 64 × 64; (g) scalable segmentation at 128 × 128; (h) scalable segmentation at 256 × 256.

Table 7
Number of regions for segmentation of the color Guitar image

Segmentation               64 × 64    128 × 128    256 × 256
Single-resolution seg.         139          308          447
Multiresolution seg.           139          143          172
Scalable seg.                  173          288          342


To reduce the down-sampling distortion in object-based wavelet transforms, different smoothness coefficients are used at the different resolutions. The proposed multiscale analysis, incorporated into the objective function of the Bayesian segmentation, improves the sensitivity to grey-level variations while maintaining high performance in noisy environments. The novel objective function also gives the proposed algorithm the flexibility to segment YUV color images in which Y is at full resolution but U and V are at half resolution.
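To make the structure of such an objective function concrete, a schematic multiresolution MRF energy with resolution-dependent smoothness weights and an inter-resolution consistency term is sketched below in our own notation; it illustrates the ingredients described above rather than reproducing the exact functional used in this paper.

    E(\mathbf{x}) \;=\; \sum_{r=1}^{R} \Big[ \sum_{s} D_r\big(x_s^{(r)}, y_s^{(r)}\big)
        \;+\; \lambda_r \sum_{(s,t)\in\mathcal{N}_r} V\big(x_s^{(r)}, x_t^{(r)}\big) \Big]
        \;+\; \beta \sum_{r=1}^{R-1} \sum_{s} C\big(x_s^{(r)}, x_{\pi(s)}^{(r+1)}\big)

Here x_s^(r) denotes the segmentation label of pixel s at resolution r, D_r is a data term derived from the observations y^(r) at that resolution, V penalizes label differences between neighbouring pixels in the neighbourhood system N_r with a resolution-dependent smoothness weight lambda_r, and C enforces consistency between corresponding pixels at adjacent resolutions (the scalability constraint), with pi(s) the parent of pixel s at the next coarser resolution.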

References

[1] F. Pereira, T. Ebrahimi, The MPEG-4 Book, Prentice-Hall PTR, Englewood Cliffs, NJ, 2003.
[2] B.S. Manjunath, P. Salembier, T. Sikora, Introduction to MPEG-7: Multimedia Content Description Interface, Wiley, New York, 2002.
[3] H. Danyali, Highly scalable wavelet image and video coding for transmission over heterogeneous networks, Ph.D. Thesis, School of Electrical, Computer and Telecommunications Engineering, University of Wollongong, Wollongong, NSW, Australia, 2003.
[4] T.N. Pappas, An adaptive clustering algorithm for image segmentation, IEEE Trans. Image Process. 40 (4) (April 1992) 901–914.
[5] Y.A. Tolias, N.A. Kanlis, S.M. Panas, A hierarchical edge-stressing algorithm for adaptive image segmentation, in: Electronics, Circuits, and Systems, 1996, ICECS '96, Proceedings of the Third IEEE International Conference, vol. 1, pp. 199–202.
[6] L. Zheng, J.C. Liu, A.K. Chan, W. Smith, Object-based image segmentation using DWT/RDWT multiresolution Markov random field, in: Acoustics, Speech, and Signal Processing, 1999, ICASSP '99, Proceedings, 1999 IEEE International Conference, vol. 6, pp. 3485–3488.
[7] X. Munoz, J. Marti, X. Cufi, J. Freixenet, Unsupervised active regions for multiresolution image segmentation, in: Pattern Recognition, 2002, Proceedings, 16th International Conference, vol. 2, pp. 905–908.
[8] C.A. Bouman, M. Shapiro, A multiscale random field model for Bayesian image segmentation, IEEE Trans. Image Process. 13 (1994) 162–177.
[9] M.L. Comer, E.J. Delp, Segmentation of textured images using a multiresolution Gaussian autoregressive model, IEEE Trans. Image Process. 8 (3) (1999) 408–420.
[10] Z. Kato, J. Zerubia, M. Berthod, Unsupervised parallel image classification using Markovian models, Pattern Recognition 32 (1999).
[11] C.T. Li, Multiresolution image segmentation integrating Gibbs sampler and region merging algorithm, Signal Process. 83 (2002).
[12] R. Wilson, C.T. Li, A class of discrete multiresolution random fields and its application to image segmentation, IEEE Trans. Pattern Anal. Machine Intell. 25 (2003).
[13] M.M. Chang, M.I. Sezan, A.M. Tekalp, Adaptive Bayesian segmentation of color images, J. Electronic Imaging 3 (4) (1994) 404–414.
[14] J. Luo, R.T. Gray, H.C. Lee, Towards physics-based segmentation of photographic color images, in: Image Processing, 1997, Proceedings, International Conference, 1997, vol. 3, pp. 58–61.
[15] J. Gao, J. Zhang, M.G. Fleming, A novel multiresolution color image segmentation technique and its application to dermatoscopic image segmentation, in: Image Processing, 2000, Proceedings, 2000 International Conference, vol. 3, pp. 408–411.
[16] G. Borgefors, G. Ramella, G. Sanniti di Baja, S. Svenson, On the multiscale representation of 2D and 3D shapes, Graphical Models and Image Processing 61 (1) (1999) 44–62.
[17] A. Mertins, S. Singh, Embedded wavelet coding of arbitrary shaped objects, in: Proceedings of SPIE 4076, VCIP'00, Perth, Australia, June 2000, pp. 357–367.
[18] S.Z. Li, Markov Random Field Modeling in Image Analysis, second ed., Springer, Tokyo, Japan, 2001.
[19] A. Murat Tekalp, Digital Video Processing, second ed., Prentice-Hall, USA, 1995.
[20] J. Gao, Object shape delineation during tracking process, in: Image Processing, 2003, Proceedings, 2003 International Conference, vol. 3, pp. III-969–III-972.
[21] M. Bertero, T.A. Poggio, V. Torre, Ill-posed problems in early vision, Proc. IEEE 76 (8) (1988) 869–889.
[22] M. Kass, A. Witkin, D. Terzopoulos, Snakes: active contour models, Int. J. Comput. Vis. 1 (4) (1987) 321–331.
[23] J. Canny, A computational approach to edge detection, IEEE Trans. Pattern Anal. Machine Intell. 8 (6) (1986) 679–698.
[24] F. Marques, J. Llach, Tracking of generic objects for video object generation, in: International Conference on Image Processing (ICIP), vol. 3, 1998, pp. 628–632.
[25] J. Besag, On the statistical analysis of lattice systems, J. Roy. Statist. Soc. Ser. B 48 (3) (1986) 259–279.
[26] T. Meier, K.N. Ngan, G. Grebbin, A robust Markovian segmentation based on highest confidence first (HCF), in: IEEE International Conference on Image Processing, vol. 1, 1997, pp. 216–219.
[27] S. Ray, R. Turi, Determination of number of clusters in k-means clustering and application in colour image segmentation, in: Proceedings of the Fourth International Conference on Advances in Pattern Recognition and Digital Techniques, 1999, pp. 137–143.
[28] C. Rosenberger, K. Chehdi, Unsupervised clustering method with optimal estimation of the number of clusters: application to image segmentation, in: Proceedings of the 15th International Conference on Pattern Recognition, vol. 1, 2000, pp. 656–659.
[29] A. Said, W.A. Pearlman, A new fast and efficient image codec based on set partitioning in hierarchical trees, IEEE Trans. Circuits Syst. Video Technol. 6 (1996) 243–250.
[30] H. Danyali, A. Mertins, Highly scalable image compression based on SPIHT for network applications, in: Proceedings of the IEEE International Conference on Image Processing, Rochester, NY, USA, September 2002, pp. 217–220.