


IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 23, NO. 9, SEPTEMBER 2014 4101

Hierarchical String Cuts: A Translation, Rotation, Scale, and Mirror Invariant Descriptor for Fast Shape Retrieval

Bin Wang and Yongsheng Gao, Senior Member, IEEE

Abstract— This paper presents a novel approach for fast and accurate retrieval of similar shapes. A hierarchical string cuts (HSC) method is proposed to partition a shape into multi-level curve segments of different lengths, taken from a point moving around the contour, to describe the shape gradually and completely from global information down to the finest details. At each hierarchical level, the curve segments are cut by strings to extract features that characterize the geometric and distribution properties at that particular level of detail. The translation, rotation, scale, and mirror invariant HSC descriptor enables fast metric-based matching to achieve the desired high accuracy. Encouraging experimental results on four databases demonstrate that the proposed method consistently achieves higher (or similar) retrieval accuracies than the state-of-the-art benchmarks at a more than 120 times faster speed. This may suggest a new way of developing shape retrieval techniques, in which high accuracy is achieved by a fast metric matching algorithm without the time-consuming correspondence optimization strategy.

Index Terms— Shape description, shape retrieval, hierarchical string cuts.

I. INTRODUCTION

HOW to quickly and effectively retrieve similar shapes from a large image database is an important and challenging problem that continues to attract the attention of many researchers in computer vision and pattern recognition. Its importance lies in the fact that more and more practical applications, such as plant leaf image retrieval [43], fish image retrieval [20], trademark image retrieval [21], and medical tumor shape retrieval [22], have encountered the speed-accuracy trade-off barrier. The challenges are twofold. (1) Image databases have been growing larger and larger, which makes the computational cost of shape retrieval prohibitively expensive (e.g. there are about 400,000 species of

Manuscript received December 17, 2012; revised December 30, 2013; accepted July 18, 2014. Date of publication August 7, 2014; date of current version August 18, 2014. This work was supported in part by the Australian Research Council under Grant DP0877929 and in part by the National Natural Science Foundation of China under Grant 61372158. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Gang Hua.

B. Wang is with the Key Laboratory of Electronic Business, Nanjing University of Finance and Economics, Nanjing 210046, China, and also with the School of Engineering, Griffith University, Brisbane, QLD 4111, Australia (e-mail: [email protected]; [email protected]).

Y. Gao is with the School of Engineering, Griffith University, Brisbane, QLD 4111, Australia (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TIP.2014.2343457

Fig. 1. Example leaves from the same plant species that have large intra-class shape variations.

Fig. 2. Five example leaves from different plant species that have small inter-class differences.

plants all over the world [23], and millions of plant leaf images will be stored in the database). On the other hand, many real-time retrieval tasks (e.g. online shape retrieval) require the retrieval system to respond to queries very quickly. Therefore, retrieval efficiency is becoming a more important issue to be addressed in real applications. (2) The shapes in databases usually have large intra-class variations (see Fig. 1) and small inter-class differences (see Fig. 2), which make it very difficult for a retrieval system to achieve a desirable retrieval accuracy.

Shape description and matching are the two key parts of shape retrieval. The former extracts effective and perceptually important shape features and organizes them in a data structure such as a vector, string, tree, or graph, while the latter uses the obtained shape descriptors to determine the similarity (or dissimilarity) of the two shapes being compared based on a shape distance measure. The performance of any shape retrieval method ultimately depends on the type of shape descriptor used and the matching algorithm applied [31]. According to MPEG-7, a favourable shape descriptor should have high discriminability so that it can group similar shapes together and separate dissimilar shapes into different groups, and a reliable shape descriptor should be rotation, scaling, and translation invariant [1].

In the past decade, various methods have been proposed for shape retrieval. To evaluate the performance of different methods, MPEG-7 set up a group of challenge datasets (MPEG-7 Core Experiment CE-shape-1) for objective experimental comparison [15], [32]. Many works of the past ten years [3]–[5], [12], [13], [18], [33]–[35], [37], [38] have reported good

1057-7149 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.



retrieval rates (>75%) on the MPEG-7 CE-1 Part B dataset, with some of the most recent ones [4], [5], [13], [38], [37] achieving very high retrieval rates (>85%). However, their high retrieval accuracies come at a high computational cost, as these algorithms mainly rely on primitive correspondence techniques such as dynamic programming (DP). For example, the inner-distance shape context (IDSC) [5] achieved a very high retrieval rate of 85.40% using DP matching, whereas it only obtained 68.83% when the shape context distance measure replaced the DP matching part. In theory, DP based shape matching methods have a time complexity of O(N³) for matching two shape contours with N sample points [3]. Take N = 100 as an example: DP based inner-distance shape context (IDSC + DP) [5] and shape tree [13] spend 8.6 hours (on the reported 2.8GHz PC) and 13.9 hours (on the reported 3GHz computer), respectively, to retrieve a shape in a large database with 100,000 images. Indeed, almost all existing methods that report a retrieval rate greater than 80% on the MPEG-7 CE-1 Part B dataset adopted time-consuming DP matching. This makes the current high-accuracy methods unsuitable for shape retrieval in large image databases and for online retrieval, presenting an accuracy-speed trade-off dilemma to users.

The above limitation of current algorithms motivates us to develop a novel shape description and matching method that considers both speed and accuracy in its design. A Hierarchical String Cuts (HSC) approach is proposed in this paper, which uses a hierarchical coarse-to-fine cutting and coding framework to describe and match shapes. Each subcurve of the shape is characterized by a set of geometric and distribution features that capture more discriminative information about the shape than using only a single geometric feature such as curvature [39], angle pattern [12], [48], [49], pairwise distances [46], [47], integral invariants [19], [42], or triangle area [4], [43], [53]. The proposed HSC achieved an 87.31% retrieval rate on the MPEG-7 CE-1 Part B dataset, the highest accuracy on leaf retrieval and leaf classification, and the second highest retrieval accuracy in the marine animal retrieval experiment, at a speed over 120 times faster than the state-of-the-art benchmarks.

The remainder of the paper is organized as follows. A brief review of related work is presented in Section 2. In Section 3, we describe the details of the proposed Hierarchical String Cuts (HSC) method. The computational complexity of HSC is analysed and compared to the state-of-the-art benchmark methods in Section 4. Four groups of experiments are presented and analysed in Section 5. Finally, we draw conclusions in Section 6.

II. RELATED WORK

Various shape description and matching methods have been proposed in the past decades [10], [31]. They can be broadly classified into two categories: feature metric measurement methods and primitive correspondence methods. A feature metric measurement method extracts discriminative features of the two shapes to be matched to form two numeric vectors of a specified length, whose similarity or dissimilarity is measured by a metric distance such as L1 or L2. On the contrary, a primitive correspondence method regards the shape as an aggregation of primitives, where primitives can be points, curve segments, line segments, etc. At the shape description stage, a set of features/attributes is extracted for each primitive, and at the shape matching stage, all pairs of primitives of the two shapes are first compared to generate a cost matrix (or distance matrix). The least matching cost obtained by the optimal correspondence of the primitives between the two shapes is taken as their dissimilarity.

A. Feature Metric Measurement Method

Fourier descriptors (FDs) [40] are a classical feature metric measurement method, in which a 2D shape contour is first represented as a 1D signature, such as complex coordinates [5], centroid distance [41], or farthest point distance [9]. A discrete Fourier transform is then applied to the signature to generate a feature vector from the Fourier coefficients. Zhang et al. [8] conducted a large number of shape retrieval experiments to evaluate the performance of FDs derived from various shape signatures and reported that the FDs derived from the centroid distance signature have higher retrieval performance and are more suitable for shape retrieval than those from other shape signatures. Most recently, El-ghazal et al. [26] treated the curvature-scale-space representation of a shape contour as a binary image and applied a 2D Fourier transform to it. The obtained descriptor captures the detailed dynamics of the shape curvature, and the Euclidean distance metric is used for shape matching.

Other spectral transform based shape descriptors have also been proposed. Chuang et al. [17] used the 1D continuous wavelet transform to create a multi-scale shape representation. Yang et al. [44] proposed a wavelet descriptor independent of the starting-point location of a contour. As the wavelet coefficients are not invariant to rotation of a shape, Kunttu et al. [7] applied the Fourier transform to the wavelet coefficients and used the obtained Fourier coefficients to describe the shape. Wang et al. [30] used the sequency-ordered complex Hadamard transform [27] to build a new shape descriptor.

Spectral transform based descriptors are very compact, because the energy of the signal concentrates in the low frequency components, and a few dozen low frequency coefficients suffice to describe the shape effectively. In addition, they are robust to noise, because noise predominantly resides in the high frequency coefficients, which are discarded in building the shape descriptor. However, these methods use only a single geometric feature, such as curvature, centroid distance, or farthest point distance, to construct the 1D shape signature, and lose the detailed information discarded in the transform domain. This often results in weak discrimination ability for this type of shape recognition method. According to the reported results on the MPEG-7 CE-1 Part B dataset [17], [30], which are also consistent with the results from our implementations of these algorithms (see Table II), their retrieval rates are no better than 70%.

Efforts have also been made to derive shape descriptors directly in the spatial domain. These methods can be classified into curvature based and boundary point relationship



based techniques. Curvature is a fundamental property of shape. The curvature scale space (CSS) method [52] is a well-known curvature based method, which has been recommended as one of the standard shape descriptors by MPEG-7 [11]. In CSS, a Gaussian kernel with increasing standard deviation is used to smooth the contour at different scales. The inflection points located at each scale form a binary image, termed the CSS image. The maxima of the CSS image's contours are used for shape matching. The typical way of computing curvature is by differential techniques; however, these amplify noise and are not stable. To address this issue, Manay et al. [19] used integral invariants instead of differential invariants to compute the curvature features of a shape contour. Most recently, Kumar et al. [42] combined area integral invariants with arc-length integral invariants for plant leaf identification.

The features of triangles [4], [53] formed by shape boundary points are also used as an alternative to curvature. Rubé et al. [53] used the signed triangle area (TAR) to reflect the concave/convex properties of the shape boundary. The wavelet transform is used for smoothing and decomposing the shape boundaries into multiscale levels. The TAR image at each scale, which is similar to the CSS image, is used for shape matching. Mouine et al. [43] recently extended the triangle area representation into two new descriptors, termed triangle oriented angles (TOA) and triangle side lengths and angle representation (TSLA), which provide more discriminative information than the area of triangles alone. The locality sensitive hashing [29] technique is used for matching.

Foteini et al. [49] used a multiscale angle code sequence incorporating the Mutual Nearest Point Distance (MNPD) [28] for shape matching. The limitation of this method is its expensive computational cost. Most recently, Hu et al. [48] used the angles formed by a boundary point and its two neighbor points at equal distance from it to form an angle pattern (AP). The relationships of neighboring APs are encoded into a binary string, termed the binary angular pattern (BAP). The multiscale AP and BAP are used to build histograms for shape matching.

Boundary point relationship based methods focus on capturing geometrical properties from the spatial relationships of pairs of boundary points. Hu et al. [47] recently proposed a novel shape descriptor, termed the multiscale distance matrix (MDM). In MDM, the distances between all possible pairs of shape boundary points are calculated to form a multiscale distance matrix, where the scale is the arc length. Each row of the obtained matrix captures a certain range of geometric properties. Through shifting and sorting operations, an invariant matrix is created for shape description. The L1 metric is used for shape similarity measurement. Biswas et al. [46] used, for each pair of landmark points, the inner distance between the two points, the relative angle, the contour distance, and the articulation-invariant center of mass associated with the pair of points to form a feature vector. Each feature vector is quantized and mapped to the appropriate bins of a hash table. The numbers of feature vectors hashed to the various bins form a sequence. The χ² distance metric is used for shape similarity measure.

B. Primitive Correspondence Method

In recent years, many researchers have dedicated themselves to developing effective shape description and matching methods using correspondence based techniques to improve recognition accuracy. The well-known shape context (SC) [18] builds a histogram primitive for each contour point to describe the relative distribution between that point and the remaining points, which provides rich shape information for finding the best point correspondence between two shapes in a point-by-point manner. It reported a 76.51% retrieval rate on the MPEG-7 CE-1 Part B dataset, and achieved 86.8% when dynamic programming (DP) was used in the matching process [14]. To make the shape context descriptor robust to articulation, Ling et al. [5] replaced the Euclidean distance with the inner-distance (ID), defined as the shortest path between two contour points within the shape boundary, to extend the SC method to a novel shape description method termed the inner-distance shape context (IDSC). IDSC achieved a promising retrieval rate of 85.4% on the MPEG-7 CE-1 Part B dataset.

Inspired by the observation that smoothing a closed contour makes the convex and concave points move inside and outside the contour, respectively, Adamek et al. [3] proposed a novel multi-scale convexity/concavity (MCC) method for shape matching. MCC uses the relative displacement of a contour point with respect to its position at the preceding scale level to measure the convexity and concavity properties at a particular scale. The relative displacements at all scale levels for each contour point are captured to calculate the distance between each pair of contour points. The resulting distance matrix is used to find the best one-to-one dense point correspondence. A retrieval rate of 84.93% was reported on the MPEG-7 CE-1 Part B dataset. Alajlan et al. [4] proposed another multi-scale shape descriptor, termed triangle area representation (TAR). It utilizes the areas of triangles formed by the boundary points to measure the convexity/concavity of each contour point at different scales, where scale denotes the length of the arc associated with the triangle formed by the contour point and its two neighbour points. The multi-scale triangle areas associated with each contour point are used to construct the distance matrix for finding the best one-to-one point correspondence. Tested on the MPEG-7 CE-1 Part B dataset, it reported a retrieval rate of 85.03%, which rises to 87.23% if three global shape features (aspect ratio, eccentricity, and solidity) are included in the shape matching.

To explicitly capture both the global and local geometric properties of the shape of an object, Felzenszwalb et al. [13] proposed a shape tree method, in which a shape contour is modelled as a full binary tree by recursively dividing it into two subcurves of the same length. The dividing points are taken as the nodes of the tree. Each node stores the midpoint location of its subcurve relative to the subcurve's start and end points. The nodes at the bottom of the tree capture local geometric properties, while the nodes near the root capture more global information. When matching curves A and B, a shape tree is built for A, and a mapping is sought from points in A to points in B such that the shape tree of A is deformed as little as possible.



Fig. 3. String and string cut. (a) A curve segment of a horse contour and its string. (b) The string cuts the curve segment into three groups; one falls on the string, and the other two fall to the two sides of the string, respectively.

The shape tree is an interesting method and achieved a very high retrieval rate (87.70%) on the MPEG-7 CE-1 Part B dataset.

Primitive correspondence methods, including [12], [33]–[38], [50], and [51], have reported higher discriminative capability and often achieve higher than 80% retrieval accuracy on the MPEG-7 CE-1 Part B dataset. However, primitive correspondence methods are computationally slow, making them less practical in real world applications, particularly for large database retrieval, as they use optimisation algorithms such as dynamic programming to seek the best primitive correspondence between two shapes. Their computational complexities range from O(N²) to O(N⁴).

III. THE PROPOSED METHOD

In this section, we first describe the proposed string cut process for characterizing a curve segment, and then introduce the hierarchical string cuts that encode a shape completely, from global and coarse information down to fine local details. Next, the invariances of the proposed hierarchical string cuts descriptor are discussed. Finally, the dissimilarity measure used for matching shapes is presented.

A. String Cut

A shape can be effectively represented by a sequence of points sampled from the contour of the object with uniform spacing [3]–[5], [13], [14]. The benefit of this representation is that it does not require seeking key-points, such as points of maximum curvature, and we can obtain as good an approximation to the underlying continuous shape as desired by picking a sufficiently large number of sample points (see Fig. 3(a) for an example). Therefore, a shape ϕ can be described as a sample point sequence ϕ = {pi(xi, yi), i = 1, …, N}, where i is the index of the sample point according to its order along the contour in the counter-clockwise direction, (xi, yi) are the coordinates of sample point pi, and N is the total number of sample points on the contour. In the design of the proposed HSC, N is required to be a power of two (N = 256 in our experiments). Since the contour is a closed curve, we have pN+i = pi and p−i = pN−i, for i = 1, …, N, i.e. the contour is treated as a periodic sequence with period N.
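The uniform-spacing representation above can be sketched in a few lines. The helper names below are ours, not the paper's; the resampling by linear interpolation over cumulative arc length is one common way to obtain uniformly spaced contour points, and the `point` helper implements the periodic indexing pN+i = pi:

```python
import numpy as np

def resample_contour(points, n=256):
    """Resample a closed contour to n points with uniform arc-length spacing.
    `points` is an (m, 2) array of ordered contour coordinates; n should be a
    power of two, as HSC requires. (Hypothetical helper, not from the paper.)"""
    pts = np.asarray(points, dtype=float)
    closed = np.vstack([pts, pts[:1]])            # close the contour
    seg = np.linalg.norm(np.diff(closed, axis=0), axis=1)
    cum = np.concatenate([[0.0], np.cumsum(seg)])  # cumulative arc length
    targets = np.linspace(0.0, cum[-1], n, endpoint=False)
    x = np.interp(targets, cum, closed[:, 0])
    y = np.interp(targets, cum, closed[:, 1])
    return np.stack([x, y], axis=1)

def point(contour, i):
    """Periodic indexing: the contour is a periodic sequence with period N."""
    return contour[i % len(contour)]
```

For instance, resampling a unit square given by its four corners to eight points places one point at the middle of each side in addition to the corners.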

1) Curve Partition by String: Let the sequence Aij = {pi, pi+1, …, pj} denote a curve segment of the contour of shape ϕ, which starts from point pi and ends at point pj. We define the straight line passing through the points pi and pj as the string ξij used to cut the curve segment. Aij can thus be cut into three groups Sr, Sl and So, where Sr is the set of points falling to the right side of the string ξij, Sl is the set of points falling to the left side of the string ξij, and So is the set of points falling on the string ξij. Obviously we have Sr ∪ Sl ∪ So = Aij, Sr ∩ Sl = ∅, Sr ∩ So = ∅ and Sl ∩ So = ∅, i.e. {Sr, Sl, So} is a string cut partition of the curve segment Aij. Fig. 3(b) illustrates an example of the string cut process on a curve segment of the horse shape from the MPEG-7 database.
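The partition can be computed directly with the sign of the 2D cross product against the string. A minimal sketch of the partition described above (the function name is ours, not the authors'):

```python
import numpy as np

def string_cut_partition(curve):
    """Partition the points of a curve segment, given as an (m, 2) array from
    p_i to p_j, into (S_r, S_l, S_o) relative to the string through p_i and
    p_j. The sign of the cross product of (p_j - p_i) with (p_k - p_i)
    decides the side; zero means the point lies on the string."""
    p_i, p_j = curve[0], curve[-1]
    d = p_j - p_i
    cross = d[0] * (curve[:, 1] - p_i[1]) - d[1] * (curve[:, 0] - p_i[0])
    s_l = curve[cross > 0]               # left of the directed string p_i -> p_j
    s_r = curve[cross < 0]               # right of the string
    s_o = curve[np.isclose(cross, 0.0)]  # on the string (endpoints included)
    return s_r, s_l, s_o
```

For a zig-zag segment from (0,0) to (4,0) passing through (1,1), (2,0), (3,−1), the string is the x-axis: one point falls on each side and three points (both endpoints plus (2,0)) lie on the string.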

2) String Cut Features: Given a curve segment Aij, let {Sr, Sl, So} be its string cut partition. The geometric properties of the curve segment Aij can be depicted by the distributions of these cuts, including the deviations to the string ξij on both sides (f1 and f2), the imbalance of the cut (f3), and the degree of bending (f4), as defined below.

f1 = max( (1/Tr) Σ_{pk∈Sr} h(pk, ξij), (1/Tl) Σ_{pk∈Sl} h(pk, ξij) ),   (1)

f2 = min( (1/Tr) Σ_{pk∈Sr} h(pk, ξij), (1/Tl) Σ_{pk∈Sl} h(pk, ξij) ),   (2)

f3 = Tr − Tl,   (3)

f4 = Lij / d(pi, pj),   (4)

where Tr, Tl and To are the numbers of sample points in Sr, Sl and So respectively, Lij is the length of the curve segment, and h(pk, ξij) is the perpendicular distance from point pk to the string ξij, which can be calculated by

h(pk, ξij) = |(xk − xi)(yj − yi) − (yk − yi)(xj − xi)| / √((xi − xj)² + (yi − yj)²),   (5)

and d(pi, pj) denotes the Euclidean distance between the points pi and pj.
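Eqs. (1)-(5) translate directly into code. A minimal sketch (the function name is ours; the paper does not define the mean deviation for an empty side, so we use 0 in that case as a labeled assumption):

```python
import numpy as np

def string_cut_features(curve):
    """String cut features f1..f4 of Eqs. (1)-(4) for a curve segment given
    as an (m, 2) array of points from p_i to p_j."""
    p_i, p_j = curve[0], curve[-1]
    d = p_j - p_i
    chord = np.hypot(d[0], d[1])                       # d(p_i, p_j)
    # Perpendicular distance h(p_k, xi_ij) to the string, Eq. (5); the signed
    # cross product also tells which side of the string each point is on.
    cross = d[0] * (curve[:, 1] - p_i[1]) - d[1] * (curve[:, 0] - p_i[0])
    h = np.abs(cross) / chord
    right, left = cross < 0, cross > 0
    t_r, t_l = int(right.sum()), int(left.sum())
    mean_r = h[right].sum() / t_r if t_r else 0.0      # assumption: 0 if empty
    mean_l = h[left].sum() / t_l if t_l else 0.0
    f1 = max(mean_r, mean_l)                           # Eq. (1)
    f2 = min(mean_r, mean_l)                           # Eq. (2)
    f3 = t_r - t_l                                     # Eq. (3): imbalance
    seg_len = np.linalg.norm(np.diff(curve, axis=0), axis=1).sum()  # L_ij
    f4 = seg_len / chord                               # Eq. (4): bending
    return f1, f2, f3, f4
```

For the wedge (0,0)-(1,1)-(2,0), the single interior point is at distance 1 to the left of the string, so f1 = 1, f2 = 0, f3 = −1, and f4 = 2√2 / 2 = √2.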

These string cut features as a whole express the configuration and behavior of the entire curve segment relative to the reference string, and also ensure invariance to the possible swapping of the start and end points of the curve segment (see the invariance analysis in Section C).



Fig. 4. An example result of hierarchical string cuts for a point S at hierarchical levels t = 1, …, 4.

B. Hierarchical String Cuts and Their Signatures

The string cuts on longer curve segments capture more global shape information, while those on shorter ones are associated with the finer details of the curves. Here we propose a Hierarchical String Cuts (HSC) method that divides a closed contour into curve segments of different lengths, to provide a multiple-level coarse-to-fine description.

At hierarchical level t, for each sample point pi, i = 1, …, N, we take a curve segment Aij = {pi, pi+1, …, pj} starting from pi and ending at pj. The location of the end point pj, and hence the length of the curve segment, is determined by the hierarchical level t using j = i + N/2^t. The string cut features for point pi at level t, {f1^t(i), f2^t(i), f3^t(i), f4^t(i)}, can be calculated using Eqs. (1)-(4). Thus, for each hierarchical level t, we obtain four N-dimensional String Cut Signatures f1^t(i), f2^t(i), f3^t(i), f4^t(i), i = 1, …, N, to describe the shape at a coarse/fine level controlled by t. Each string cut signature is a sequence (with a length of N) of string cut features. As the number of sample points on the contour ϕ is N, there are log N − 1 levels for t = 1, …, log N − 1, which divide ϕ into curve segments from the coarsest (N/2 + 1)-point length to the finest 3-point length. Fig. 4 gives an example of the hierarchical string cut process for a single contour point S that divides the shape into curve segments of different lengths at the coarsest hierarchical levels t = 1, …, 4.
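The level structure above can be sketched as a double loop over levels and start points, with indices wrapping around the closed contour. A self-contained sketch, assuming the string cut features of Eqs. (1)-(4) (`_features` is a compact inline transcription of those formulas; all names are ours):

```python
import numpy as np

def _features(curve):
    # Compact string cut features f1..f4, Eqs. (1)-(4); empty sides count as 0.
    p_i, p_j = curve[0], curve[-1]
    d = p_j - p_i
    chord = np.hypot(d[0], d[1])
    cross = d[0] * (curve[:, 1] - p_i[1]) - d[1] * (curve[:, 0] - p_i[0])
    h = np.abs(cross) / chord
    r, l = cross < 0, cross > 0
    mr = h[r].mean() if r.any() else 0.0
    ml = h[l].mean() if l.any() else 0.0
    seg = np.linalg.norm(np.diff(curve, axis=0), axis=1).sum()
    return max(mr, ml), min(mr, ml), int(r.sum()) - int(l.sum()), seg / chord

def hsc_signatures(contour):
    """Four string cut signatures at every level t = 1, ..., log2(N) - 1.
    At level t each point p_i starts a segment ending at p_{i + N/2^t};
    indices wrap around the closed contour. Returns {t: (N, 4) array}."""
    n = len(contour)
    sigs = {}
    for t in range(1, int(np.log2(n))):        # log N - 1 levels
        step = n // 2 ** t
        feats = np.empty((n, 4))
        for i in range(n):
            idx = np.arange(i, i + step + 1) % n   # p_i ... p_{i + N/2^t}
            feats[i] = _features(contour[idx])
        sigs[t] = feats
    return sigs
```

For a toy contour with N = 8 (a regular octagon) there are log₂(8) − 1 = 2 levels, each giving an 8-point signature per feature.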

C. Invariance of Shape Descriptor

A good shape descriptor should be translation, scale, rotation and mirror invariant. Since the string cut features of a curve segment are calculated solely with respect to its string, they have intrinsic invariance to translation, and so do the string cut signatures derived from these features. To make the string cut signatures invariant to scaling, each signature f_g^t(i), g = 1, …, 4 is locally normalised by dividing by its maximum value max_{i=1,…,N} f_g^t(i). It should be pointed out that since f3^t(i) can take negative values, it is normalized by max_{i=1,…,N} |f3^t(i)|. When max_{i=1,…,N} f_g^t(i) = 0, the above normalisation division cannot be executed. In this case, the normalization division is omitted in implementing the algorithm, as a signature of all 0's is already invariant to scaling.
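The normalisation, including the signed case for f3 and the all-zero guard, is a one-liner per signature. A sketch following the description above (the function name and the `signed` flag are ours):

```python
import numpy as np

def normalise_signature(sig, signed=False):
    """Scale normalisation of one string cut signature: divide by the maximum
    value, or by the maximum absolute value for the signed f3 signature.
    If the maximum is 0 the division is skipped, since an all-zero signature
    is already scale invariant."""
    sig = np.asarray(sig, dtype=float)
    m = np.max(np.abs(sig)) if signed else np.max(sig)
    return sig if m == 0 else sig / m
```

So [1, 2, 4] becomes [0.25, 0.5, 1], the signed [−2, 1] becomes [−1, 0.5], and an all-zero signature is returned unchanged.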

At hierarchical level t = 1, …, log N − 1, for each shape signature f_g^t(i), g = 1, …, 4, the magnitudes of its Fourier transform coefficients are calculated by

F_g^t(k) = (1/N) |∑_{i=0}^{N−1} f_g^t(i) exp(−j2πik/N)|, k = 0, …, N − 1.   (6)

In theory, the obtained F_g^t(k), k = 0, …, N − 1, are invariant to the location of the start point p_i of the string cut curve segments, and thus invariant to rotation of the whole shape. To make the generated shape descriptor compact and robust to noise, only the lowest M-order coefficients F_g^t(k), k = 0, …, M − 1, where M ≪ N, are used to describe the shape.
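Eq. (6) with the M-coefficient truncation can be sketched directly (a naive O(N^2) DFT for clarity; `fourier_magnitudes` is a hypothetical helper name):

```python
import cmath

def fourier_magnitudes(signature, m):
    """Eq. (6): 1/N-scaled DFT magnitudes of a string cut signature;
    only the lowest m coefficients (m << N) are kept for the descriptor."""
    n = len(signature)
    return [abs(sum(signature[i] * cmath.exp(-2j * cmath.pi * i * k / n)
                    for i in range(n))) / n
            for k in range(m)]

# The magnitudes are unchanged when the start point moves (rotation):
a = fourier_magnitudes([0.0, 1.0, 2.0, 3.0], 4)
b = fourier_magnitudes([3.0, 0.0, 1.0, 2.0], 4)
```

In practice an FFT would replace the inner sum, giving the O(N log N) cost assumed in the complexity analysis of Section IV.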

Finally, the hierarchical string cuts (HSC) shape descriptor ψ = {ψ_1, …, ψ_{log N − 1}} is a combination of the descriptors ψ_t from all hierarchical levels t, defined as

ψ_t = {F_g^t(k), σ_g^t | g = 1, …, 4; k = 0, …, M − 1},   (7)

where σ_g^t is the standard deviation of the string cut features. It contains information supplementary to F_g^t(0) (which is the mean value) for each signature f_g^t.

Mirror invariant matching is one of the requirements for planar shape recognition methods. Most existing techniques [3]–[5] address this invariance problem by performing the same shape matching algorithm twice, once between shapes A and B and once between shape A and the mirrored shape of B. To avoid the computational cost of applying the matching algorithm to the mirrored shape, the proposed HSC shape descriptor is designed to be intrinsically mirror invariant, that is, ψ = {ψ_1, …, ψ_{log N − 1}} remains the same if the start point p_i and the end point p_j are interchanged. In Eqs. (1) and (2), f_1 and f_2 are defined as the deviations from the string on both sides according to their values instead of their locations, ensuring their invariance to the interchange of p_i and p_j. The imbalance of cut f_3 changes sign when p_i and p_j are interchanged, so its shape signature f_3^t(i) only has a 180° phase difference, resulting in the same F_3^t(k), k = 0, …, N − 1. As f_4 is also independent of the direction of the shape contour, the HSC shape descriptor is mirror invariant.
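The invariance argument can be checked numerically: the DFT magnitudes of a real signature are unchanged by a circular shift of the start point (rotation) and by a sign-flipped reversal (the effect mirroring has on f_3). A minimal sketch with an arbitrary made-up signature:

```python
import cmath

def dft_mag(sig):
    """1/N-scaled DFT magnitudes, as used for F_g^t(k) in Eq. (6)."""
    n = len(sig)
    return [abs(sum(sig[i] * cmath.exp(-2j * cmath.pi * i * k / n)
                    for i in range(n))) / n for k in range(n)]

sig = [0.3, -1.2, 0.7, 2.0, -0.5, 0.1, 1.4, -0.9]  # made-up f3-like signature

rotated = sig[3:] + sig[:3]             # new start point: rotation of the shape
mirrored = [-v for v in reversed(sig)]  # f3 reverses and changes sign under mirroring

assert max(abs(x - y) for x, y in zip(dft_mag(sig), dft_mag(rotated))) < 1e-9
assert max(abs(x - y) for x, y in zip(dft_mag(sig), dft_mag(mirrored))) < 1e-9
```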

D. Shape Dissimilarity Measure

Given two HSC descriptors ψ(A) = {F_g^(A)t(k), σ_g^(A)t} and ψ(B) = {F_g^(B)t(k), σ_g^(B)t}, extracted from shapes A and B, respectively, where g = 1, …, 4, k = 0, …, M − 1, and t = 1, …, log N − 1, we first compare them at each hierarchical level t = 1, …, log N − 1 by calculating the sub-level dissimilarity at hierarchical level t using


TABLE I. COMPARISON OF THE COMPUTATIONAL COMPLEXITY OF DIFFERENT METHODS AT THE STAGE OF SHAPE DISSIMILARITY MEASUREMENT

Eq. (8):

D_t(A, B) = ∑_{g=1}^{4} w_g (∑_{k=0}^{M−1} |F_g^(A)t(k) − F_g^(B)t(k)| + |σ_g^(A)t − σ_g^(B)t|),   (8)

where w_g are weights that allow the contribution of each string cut feature to be adjusted. The dissimilarity between two shapes A and B can then be taken as the aggregation of the log N − 1 sub-level dissimilarities:

D(A, B) = ∑_{t=1}^{log N − 1} D_t(A, B).   (9)

However, the case of t = 0 is not yet considered in this design; there, the length of the subcurve equals N, i.e., the curve segment is the whole shape contour. Many global contour features, such as circularity, eccentricity, convexity, ratio of principal axes, elliptic variance, and circular variance, have already been developed for shape description. To complete the design of the proposed approach, we take two existing global contour features, eccentricity (E) and rectangularity (R), as the features of hierarchical level t = 0. Without loss of generality, other global contour features can be used as well. The dissimilarity at hierarchical level 0 is:

D_0 = |E^(A) − E^(B)| + |R^(A) − R^(B)|.   (10)

The overall dissimilarity between two shapes A and B then becomes

D(A, B) = ∑_{t=0}^{log N − 1} D_t(A, B).   (11)
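Under the definitions above, Eqs. (8)-(11) reduce to weighted L1 sums. A minimal sketch follows; the container layout of the descriptors (lists of per-signature magnitude vectors and standard deviations) is our own assumption, not the paper's:

```python
def sub_level_dissimilarity(FA, FB, sA, sB, w):
    """Eq. (8): weighted L1 distance at one hierarchical level.
    FA[g]/FB[g] hold the M lowest-order Fourier magnitudes of
    signature g; sA[g]/sB[g] are the standard deviations sigma_g^t."""
    return sum(w[g] * (sum(abs(a - b) for a, b in zip(FA[g], FB[g]))
                       + abs(sA[g] - sB[g]))
               for g in range(4))

def dissimilarity(levelsA, levelsB, ERA, ERB, w):
    """Eqs. (10)-(11): |E_A - E_B| + |R_A - R_B| for level 0, plus the
    sum of the sub-level dissimilarities over t = 1, ..., log N - 1."""
    d0 = abs(ERA[0] - ERB[0]) + abs(ERA[1] - ERB[1])
    return d0 + sum(sub_level_dissimilarity(FA, FB, sA, sB, w)
                    for (FA, sA), (FB, sB) in zip(levelsA, levelsB))
```

Because this is a plain metric comparison, it needs no correspondence search, which is the source of the speed advantage analysed in Section IV.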

IV. COMPUTATIONAL COMPLEXITY

The computational cost of shape retrieval consists of two parts: the time for computing the shape descriptor, and the time for performing the shape dissimilarity measurement. The latter part is more important and plays a more dominant role than the former in determining the retrieval speed, particularly as the size of the database grows. This is because, in a shape retrieval task, only the query shape requires its descriptor to be created online; all the descriptors of the gallery shapes can be computed offline and stored in the database beforehand, whereas shape dissimilarity measurements are conducted online between the query descriptor and all gallery descriptors in the database. In this section, we provide a theoretical computational complexity analysis of the proposed HSC in comparison with the state-of-the-art methods (experimental comparisons of computational time are also conducted in Section V).

In creating the HSC descriptor, for each hierarchical level t = 1, …, log N − 1, the complexity of calculating the string cut signatures for the whole contour is O(N^2/2^t), because the time to compute the string cut features for each curve segment A_ij = {p_i, p_{i+1}, …, p_j} is O(N/2^t), where N is the number of sample points of the contour and j = i + N/2^t. The complexity of calculating the signatures for all the hierarchical levels, with K = log N − 1 levels in total, is then

O(N^2 (1/2^1 + ⋯ + 1/2^K)) = O(N^2 (1 − 1/2^K)) = O(N^2).   (12)

For each obtained signature (4(log N − 1) signatures in total), the complexities of calculating its Fourier transform coefficients and its standard deviation are O(N log N) and O(N), respectively. Therefore, the time for computing the Fourier coefficients and standard deviations of all the signatures is

O(4(log N − 1)(N log N + N)) = O(N log^2 N).   (13)

The overall complexity of creating the proposed HSC shape descriptor is O(N^2 + N log^2 N).

At the dissimilarity measurement stage, the complexity of calculating Eq. (8) is O(4(M + 1)) = O(M). Therefore, the complexity of calculating Eq. (9) is

O(M(log N − 1)) = O(M log N).   (14)

Since calculating Eq. (10) only requires O(1) time, the time for computing Eq. (11), i.e., the overall computational complexity of the dissimilarity measurement, is O(M log N). In Table I, we compare our method with several recent representative methods that report high retrieval accuracies (>80%) on the MPEG-7 Part B retrieval test, where N is the number of sample points of the contour, and K (for IDSC + DP [5] and SC + DP [14]) denotes the number of possible starting points for alignment used in their dynamic programming part. From this table, we can see that the proposed HSC has the lowest order of computational complexity among the competing methods with high recognition accuracies.

V. EXPERIMENTS

To evaluate the effectiveness and efficiency of the proposed approach, an extensive experimental investigation is conducted on the MPEG-7 shape, plant leaf shape, and marine animal shape databases. The performance of the proposed method is compared with the state-of-the-art approaches in both accuracy and speed. In all the experiments, the same parameters (M = 7, w_1 = 1.4, w_2 = 0.5, w_3 = 0.5, w_4 = 0.4), which were heuristically determined, are used without tuning the system settings to best suit an individual dataset.


Fig. 5. MPEG-7 Part B shape database, which contains 70 classes with 20 images in each class.

† [14] replaced the thin plate spline matching process used in [18] with dynamic programming, which increased the retrieval rate from 76.51% [18] to 86.8%.

A. MPEG-7 Shape Database

The MPEG-7 CE-1 Part B shape database [15] is a widely used dataset for evaluating the performance of similarity-based shape retrieval. It contains 1400 shape images, consisting of 70 classes of various shapes with 20 images in each class (see Fig. 5). The retrieval accuracy is measured using the well-known "bulls-eye test" [3]–[5], [14], [15]. In this measurement, each shape is used in turn as a query and matched with all the shapes in the database. The number of correct matches (that is, the retrieved shape and the query shape belong to the same class) among the top 2 × 20 = 40 retrieved shapes with the smallest dissimilarity values is counted. Since the maximum number of correct matches for a single shape is 20, the total number of correct matches is 1400 × 20 = 28000. The percentage of matched shapes out of 28000 is the retrieval rate of the bulls-eye test. The computational time of each shape retrieval is the time of matching the query with all the 1400 shapes, including the feature extraction time of the query shape. To finely compare the behavior of the proposed HSC against the benchmarks, their Precision-Recall (PR) curves are provided.
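The bulls-eye protocol described above can be sketched as follows, assuming a precomputed dissimilarity matrix `dist` and class `labels` (both hypothetical inputs); as in the standard test, the query itself counts among its own retrievals:

```python
def bulls_eye_rate(dist, labels, per_class=20):
    """Bulls-eye test: every shape queries the database; correct matches
    (same class) among the top 2 * per_class nearest shapes are counted,
    and the count is divided by its maximum, len(labels) * per_class."""
    n, hits = len(labels), 0
    for q in range(n):
        ranked = sorted(range(n), key=lambda i: dist[q][i])
        hits += sum(labels[i] == labels[q] for i in ranked[:2 * per_class])
    return hits / (n * per_class)
```

For MPEG-7 Part B this corresponds to n = 1400, per_class = 20, and a maximum count of 28000.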

The retrieval rate and speed of the proposed Hierarchical String Cuts (HSC) approach, together with those of the state-of-the-art approaches, including the well-known inner distance (IDSC) [5], shape contexts (SC) [18], and their

Fig. 6. Precision-Recall (PR) curves of the proposed HSC and the state-of-the-art approaches, including TSDIZ [2], TAR + WT [53], MDM [47], CBFD [26], ASD&CCD [49], FD [6], [8], FPD + Global [9], CSS [52], TAR + DP [4], IDSC + DP [5], MCC [3], and SC + DP [4], [18], obtained on the MPEG-7 Part B shape database. The non-DP methods are plotted in red, while the DP methods are plotted in blue. ∗ indicates that the results are from the published papers.

Fig. 7. The comparative retrieval results of 4 shapes obtained by the proposed HSC, SC + DP [18], and IDSC + DP [5]. Each column shows the query shape (top image) and the 10 most similar shapes retrieved from the MPEG-7 database (the 2nd-11th images). The number below each retrieved shape is its dissimilarity score. The incorrect hits are circled in red colour.

variants, are tabulated in Table II. Their Precision-Recall (PR) curves are plotted in Fig. 6. Fig. 7 provides 4 detailed examples that list the top 10 retrieved shapes and their scores obtained by the proposed HSC, IDSC [5], and SC [18].


It can be seen that the proposed approach achieved retrieval accuracy similar to inner distance with dynamic programming (IDSC + DP), shape contexts with dynamic programming (SC + DP), and the other dynamic programming based algorithms with the highest reported accuracies. The 87.31% retrieval rate of the proposed HSC is slightly higher than that of the renowned IDSC + DP (85.40%) and SC + DP (86.80%), but lower than that of the Shape Tree method [13] by 0.30%, placing it among the most accurate shape retrieval algorithms. On the other hand, this accuracy is obtained at a speed over 120 times faster than those methods with comparable accuracies (only 0.61%, 0.51%, 0.77% and 0.65% of the retrieval time used by MCC [3], TAR + DP [4], IDSC + DP [5] and SC + DP [14], [18], respectively). In our experiments, all the algorithms are written in Matlab and run on a PC with an Intel Core-2 Duo 2.8 GHz CPU and 2 GB DDR2 RAM under Windows XP. Due to the very large computational cost of the dynamic programming (DP) part of the benchmark algorithms, the DP parts of MCC [3], TAR + DP [4], IDSC + DP [5] and SC + DP [14], [18] are implemented in C, allowing the comparative experiments to be completed in an acceptable time. Note that the retrieval speed of Shape Tree [13] was not reported. But, as its computational complexity is O(N^4) (see Table I), it is computationally more expensive than the other four benchmarks (whose complexities are O(N^3) and O(K N^2), respectively).

The performances of some fast non-DP shape matching algorithms are also listed in Table II for comparison purposes; their accuracies are around 20% lower than those of the proposed method and the state-of-the-art benchmarks.

Fig. 7 gives the retrieval results of four typical shapes obtained using SC + DP, IDSC + DP and the proposed approach. The results are listed and sorted in ascending order of dissimilarity (the first 10 ranked matches are shown). For the first query shape with small occlusions, the proposed method obtains 7 correct hits, more than the 5 for SC + DP and 2 for IDSC + DP, indicating that the proposed approach is more robust to small occlusions than SC and IDSC. For the second shape, which has larger occlusions, the performance of the proposed method decreases drastically (it obtains only 2 correct hits), while SC and IDSC remain stable, which indicates that SC and IDSC are more robust to larger occlusions than the proposed method. For the third and the fourth query shapes with large intra-class variations, the proposed method obtains 9 and 7 correct hits respectively, much better than the 0 and 4 for SC, and the 3 and 6 for IDSC. These comparative results show that the proposed approach, without using dynamic programming, can achieve similar or better accuracy than SC and IDSC at an over 120 times faster speed for large database online retrieval.

B. Plant Leaf Databases

In this section, we apply the proposed HSC to a real and challenging application, plant leaf image retrieval and classification, and compare its performance with the benchmark approaches. The challenge of this application lies in the high inter-class similarity between some of the leaf shapes and

Fig. 8. Plant leaf image database collected by ourselves, which contains 100 species with 12 images in each class.

TABLE III. RETRIEVAL RESULTS ON THE PLANT LEAF IMAGE DATABASE

the large intra-class variations of plant leaves from the same species.

1) Leaf Retrieval: The same performance evaluation method as used for the MPEG-7 shape dataset is applied to a plant leaf image database collected by ourselves. It contains 1200 leaf images collected from 100 plant species, with 12 different leaves from each class (see Fig. 8).

Table III shows the comparative results of the proposed Hierarchical String Cuts (HSC) approach and the state-of-the-art benchmarks, including the well-known inner distance (IDSC + DP) [5] and shape contexts (SC + DP) [14], [18]. The Precision-Recall (PR) curves of these methods are plotted in Fig. 9. It can be seen that the proposed approach achieves the highest retrieval rate and obtains the best Precision-Recall (PR) curve among all the competing methods. The 89.40% retrieval rate of the proposed HSC is 3.76% and 2.58% higher than those of the inner distance (IDSC) [5] and the shape contexts [14], [18] respectively, and is 12.30% and 11.74% higher than those of the other dynamic programming based algorithms MCC [3] and TAR + DP [4] respectively. It is worth noting that this high accuracy is achieved at a speed over 125 times faster than those dynamic programming based methods (only 0.61%, 0.54%, 0.80% and 0.70% of the retrieval time used by MCC [3], TAR + DP [4], IDSC + DP [5] and SC + DP [14], [18], respectively). The performances of some fast shape matching algorithms are also listed in Table III for comparison purposes; their accuracies are around 25% lower than that of the proposed method.


Fig. 9. The Precision-Recall (PR) curves of the proposed HSC and the state-of-the-art approaches on the plant leaf dataset collected by ourselves.

Fig. 10. Fifteen samples (one per species) from the Swedish leaf database.

TABLE IV. CLASSIFICATION RATES ON THE SWEDISH LEAF DATABASE. THE VALUES WITH ∗ ARE FROM THE PUBLISHED PAPERS

2) Leaf Classification: The publicly available Swedish leaf database, which comes from a leaf classification project at Linköping University and the Swedish Museum of Natural History [45], is also used in our experiments. The database contains 15 different Swedish tree species (see Fig. 10) with 75 samples per species. Due to the nature of this database (small number of classes and large number of samples in each class), the same leaf classification protocol and accuracy measurement as used in [5], [13], and [45] are adopted in our experiment. 25 images randomly selected from each species are used as models, and the remaining images are used as testing images. The classification rate is calculated using the nearest-neighbor classification rule. The comparative results of the proposed Hierarchical String Cuts (HSC) approach and the state-of-the-art approaches are listed in Table IV. From Table IV, it can be seen that the proposed approach maintained

Fig. 11. Samples from 200 bream fishes (a) and 1100 marine animals (b).

TABLE V. RETRIEVAL RESULTS ON THE MARINE DATABASE

the highest classification accuracy (96.91%), consistent with the previous results in the retrieval setting.
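The nearest-neighbor protocol used above can be sketched as follows, assuming a dissimilarity matrix between test and model samples (`nn_classification_rate` and its inputs are hypothetical names, not from the paper):

```python
def nn_classification_rate(dist, model_labels, test_labels):
    """Nearest-neighbour rule: each test sample takes the label of its
    closest model; returns the fraction of test samples labelled correctly."""
    correct = sum(model_labels[min(range(len(row)), key=row.__getitem__)]
                  == test_labels[q]
                  for q, row in enumerate(dist))
    return correct / len(test_labels)
```

For the Swedish leaf protocol, the models are the 25 randomly selected images per species and the tests are the remaining 50 per species.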

C. Marine Database

The marine database was originally built by Mokhtarian et al. [24]. It consists of 1100 marine animals (samples are shown in Fig. 11(b)) which are unclassified. In MPEG-7 (Part C), this dataset is combined with 200 bream fish images, which are frames extracted from a video clip of a swimming bream fish (examples are shown in Fig. 11(a)), to form a database consisting of 1100 + 200 = 1300 images for testing the robustness to shape changes caused by non-rigid motion and different viewing angles [15]. In our experiment, each of the 200 bream fish images is used as a query in turn to match against the 1300 images in the database. The number of bream fish images in the top 200 most similar shapes is counted, and the percentage of the matched shapes out of 200 is calculated as the retrieval rate (see also [12], [16], [17], [25], [26]).

The comparative results of the proposed Hierarchical String Cuts (HSC) approach with MCC [3], TAR + DP [4], IDSC + DP [5], and SC + DP [14], [18] are given in Table V. Considering the large shape variations caused by non-rigid motion and different viewing directions (see Fig. 11(a)), it is very encouraging to see that the proposed HSC possesses a similar level of tolerance to shape distortions as the best performing approaches, while its computational speed is more than 133 times faster than the competing algorithms (188.3, 184.7, 133.8 and 165.1 times faster than MCC, TAR + DP, IDSC + DP and SC + DP, respectively). The average retrieval rate (over 200 tests, using every bream fish shape as the query in each test) of HSC remains higher than those of MCC, TAR + DP and Shape Context. It is also worth commenting that our result shows the Inner Distance approach to be the most robust to distortions (which supports the claim in [5]), as we can see that IDSC + DP becomes the most accurate one in


Fig. 12. Retrieval accuracy of the proposed method on MPEG-7 Part B shapes corrupted by different levels of Gaussian noise.

this experiment, while its accuracy ranked second and third in the leaf database experiments.

D. Sensitivity to Noise

To evaluate the sensitivity to noise, a group of experiments with different levels of additive noise is conducted. We adopt the same noise perturbation scheme as used in [9], [19], [26], and [50]. For each shape image in the MPEG-7 Part B dataset, the coordinates (x, y) of its boundary points are randomly perturbed by Gaussian noise with different variances. The noise corruption of a shape is measured by the signal-to-noise ratio (SNR) as

SNR = 10 log(σ_s^2 / σ_n^2),   (15)

where σ_s^2 and σ_n^2 are the variances of the signal and noise sequences, respectively. In our experiments, various levels of Gaussian noise (SNR = 20 dB to 50 dB) are added to the query shapes in the MPEG-7 Part B dataset. Fig. 12 summarizes the accuracies of retrieving shapes corrupted by various levels of noise. It can be seen that the accuracy is not affected when the additive noise is at the level of SNR = 50 dB, and then drops gracefully as the noise increases to SNR = 30 dB. When the shapes are heavily corrupted to the level of SNR = 20 dB (see the leftmost corrupted shape), the proposed HSC still maintains an accuracy of over 70%.
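The perturbation scheme can be sketched by solving Eq. (15) for the noise variance at a target SNR. A minimal sketch, applied to one boundary coordinate sequence (x or y) at a time; `add_gaussian_noise` is a hypothetical helper name, not from the paper:

```python
import math
import random
import statistics

def add_gaussian_noise(coords, snr_db, rng=random):
    """Perturb one boundary coordinate sequence with zero-mean Gaussian
    noise whose variance follows from Eq. (15):
    sigma_n^2 = sigma_s^2 / 10 ** (snr_db / 10)."""
    sigma_n = math.sqrt(statistics.pvariance(coords) / 10 ** (snr_db / 10))
    return [c + rng.gauss(0.0, sigma_n) for c in coords]
```

Lower SNR values therefore mean stronger corruption: at 20 dB the noise variance is 1% of the signal variance, at 50 dB only 0.001%.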

VI. CONCLUSION

In this paper, we have presented a new hierarchical string cuts (HSC) approach to both quickly and effectively retrieve similar shapes. From a start point moving along the shape contour, the shape is partitioned into multiple level curve segments of different lengths, which are cut by their corresponding strings to extract their geometric and distribution properties. At the coarse hierarchical levels, the string cut features of the longer curve segments describe the global properties of the shape, while at the fine levels, the features extracted from the shorter ones depict the detailed information of the shape. The proposed HSC descriptor is a translation, rotation, scaling and mirror invariant shape descriptor. The dissimilarity of two shapes can be efficiently measured by

comparing their string cut signatures at all hierarchical levels using the L1 distance, without the need for the time-costly dynamic programming algorithm. The performance of the proposed HSC has been evaluated extensively on three well-known shape databases and a leaf database of 100 plant species collected by ourselves. The experimental results show that the proposed HSC method can consistently achieve the same level of highest retrieval accuracy as the state-of-the-art methods at an over 120 times faster speed. This indicates the potential of the metric based shape matching strategy in developing both fast and accurate algorithms for large database retrieval and/or online retrieval.

REFERENCES

[1] H. Kim and J. Kim, "Region-based shape descriptor invariant to rotation, scale and translation," Signal Process., Image Commun., vol. 16, nos. 1–2, pp. 87–93, 2000.

[2] F. A. Andaló, P. A. V. Miranda, R. da S. Torres, and A. X. Falcão, "Shape feature extraction and description based on tensor scale," Pattern Recognit., vol. 43, no. 1, pp. 26–36, 2010.

[3] T. Adamek and N. E. O'Connor, "A multiscale representation method for nonrigid shapes with a single closed contour," IEEE Trans. Circuits Syst. Video Technol., vol. 14, no. 5, pp. 742–743, May 2004.

[4] N. Alajlan, I. E. Rube, M. S. Kamel, and G. Freeman, "Shape retrieval using triangle-area representation and dynamic space warping," Pattern Recognit., vol. 40, no. 7, pp. 1911–1920, 2007.

[5] H. Ling and D. W. Jacobs, "Shape classification using the inner-distance," IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 2, pp. 286–299, Feb. 2007.

[6] T. P. Wallace and P. A. Wintz, "An efficient three-dimensional aircraft recognition algorithm using normalized Fourier descriptors," Comput. Graph. Image Process., vol. 13, no. 2, pp. 99–126, 1980.

[7] I. Kunttu, L. Lepistö, J. Rauhamaa, and A. Visa, "Multiscale Fourier descriptors for defect image retrieval," Pattern Recognit. Lett., vol. 27, no. 2, pp. 123–132, 2006.

[8] D. Zhang and G. Lu, "Study and evaluation of different Fourier methods for image retrieval," Image Vis. Comput., vol. 23, no. 1, pp. 33–49, 2005.

[9] A. El-ghazal, O. Basir, and S. Belkasim, "Farthest point distance: A new shape signature for Fourier descriptors," Signal Process., Image Commun., vol. 24, no. 7, pp. 572–586, 2009.

[10] D. Zhang and G. Lu, "Review of shape representation and description techniques," Pattern Recognit., vol. 37, no. 1, pp. 1–19, 2004.

[11] (2008). The MPEG Home Page. [Online]. Available: http://www.chiariglione.org/mpeg

[12] N. Arica and F. T. Y. Vural, "BAS: A perceptual shape descriptor based on the beam angle statistics," Pattern Recognit. Lett., vol. 24, nos. 9–10, pp. 1627–1639, 2003.

[13] P. F. Felzenszwalb and J. D. Schwartz, "Hierarchical matching of deformable shapes," in Proc. IEEE Int. Conf. Comput. Vis. Pattern Recognit., vol. 1, Jun. 2007, pp. 1–8.

[14] X. Bai, B. Wang, C. Yao, W. Liu, and Z. Tu, "Co-transduction for shape retrieval," IEEE Trans. Image Process., vol. 21, no. 5, pp. 2747–2757, May 2012.

[15] L. J. Latecki, R. Lakämper, and U. Eckhardt, "Shape descriptors for non-rigid shapes with a single closed contour," in Proc. IEEE Int. Conf. Comput. Vis. Pattern Recognit., vol. 1, 2000, pp. 424–429.

[16] L. J. Latecki and R. Lakämper, "Shape similarity measure based on correspondence of visual parts," IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 10, pp. 1185–1190, Oct. 2000.

[17] G. C.-H. Chuang and C.-C. J. Kuo, "Wavelet descriptor of planar curves: Theory and applications," IEEE Trans. Image Process., vol. 5, no. 1, pp. 56–70, Jan. 1996.

[18] S. Belongie, J. Malik, and J. Puzicha, "Shape matching and object recognition using shape contexts," IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 4, pp. 509–522, Apr. 2002.

[19] S. Manay, D. Cremers, B. Hong, A. J. Yezzi, Jr., and S. Soatto, "Integral invariants for shape matching," IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 10, pp. 1602–1618, Oct. 2006.

[20] E. Milios and E. G. M. Petrakis, "Shape retrieval based on dynamic programming," IEEE Trans. Image Process., vol. 9, no. 1, pp. 141–147, Jan. 2000.


[21] C. H. Wei, Y. Li, W. Y. Chau, and C. T. Li, "Trademark image retrieval using synthetic features for describing global shape and interior structure," Pattern Recognit., vol. 42, no. 3, pp. 386–394, 2009.

[22] P. Korn, N. Sidiropoulos, C. Faloutsos, E. Siegel, and Z. Protopapas, "Fast and effective retrieval of medical tumor shapes," IEEE Trans. Knowl. Data Eng., vol. 10, no. 6, pp. 889–904, Nov./Dec. 1998.

[23] J. X. Du, X. F. Wang, and G. J. Zhang, "Leaf shape based plant species recognition," Appl. Math. Comput., vol. 205, pp. 916–926, 2008.

[24] F. Mokhtarian, S. Abbasi, and J. Kittler, "Efficient and robust retrieval by shape content through curvature scale space," in Proc. Int. Workshop Image Database Multimedia Search, Amsterdam, The Netherlands, 1996, pp. 35–42.

[25] D. Zhang and G. Lu, "A comparative study of curvature scale space and Fourier descriptors for shape-based image retrieval," J. Vis. Commun. Image Represent., vol. 14, no. 1, pp. 41–60, 2003.

[26] A. El-ghazal, O. Basir, and S. Belkasim, "Invariant curvature-based Fourier shape descriptors," J. Vis. Commun. Image Represent., vol. 23, no. 4, pp. 622–633, 2012.

[27] A. Aung, B. P. Ng, and S. Rahardja, "Sequency-ordered complex Hadamard transform: Properties, computational complexity and applications," IEEE Trans. Signal Process., vol. 56, no. 8, pp. 3562–3571, Aug. 2008.

[28] S.-C. Fang and H.-L. Chan, "Human identification by quantifying similarity and dissimilarity in electrocardiogram phase space," Pattern Recognit., vol. 42, no. 9, pp. 1824–1832, 2009.

[29] L. Pauleve, H. Jegou, and L. Amsaleg, "Locality sensitive hashing: A comparison of hash function types and querying mechanisms," Pattern Recognit. Lett., vol. 31, no. 11, pp. 1348–1358, 2010.

[30] B. Wang, J. S. Wu, H. Z. Shu, and L. M. Luo, "Shape description using sequency-ordered complex Hadamard transform," Opt. Commun., vol. 284, pp. 2726–2729, 2011.

[31] S. Loncaric, "A survey of shape analysis techniques," Pattern Recognit., vol. 31, no. 8, pp. 983–1001, 1998.

[32] M. Bober, J. D. Kim, H. K. Kim, Y. S. Kim, W.-Y. Kim, and K. Muller, Summary of the Results in Shape Descriptor Core Experiment, document MPEG-7, ISO/IEC/JTC1/SC29/WG11/MPEG99/M4869, Vancouver, BC, Canada, Jul. 1999.

[33] C. Grigorescu and N. Petkov, "Distance sets for shape filters and shape recognition," IEEE Trans. Image Process., vol. 12, no. 10, pp. 1274–1286, Oct. 2003.

[34] J. Xie, P. A. Heng, and M. Shah, "Shape matching and modeling using skeletal context," Pattern Recognit., vol. 41, no. 5, pp. 1756–1767, 2008.

[35] T. B. Sebastian, P. N. Klein, and B. B. Kimia, "On aligning curves," IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no. 1, pp. 116–124, Jan. 2003.

[36] E. Attalla and P. Siy, "Robust shape similarity retrieval based on contour segmentation polygonal multiresolution and elastic matching," Pattern Recognit., vol. 38, no. 12, pp. 2229–2241, 2005.

[37] C. Xu, J. Liu, and X. Tang, "2D shape matching by contour flexibility," IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 1, pp. 180–186, Jan. 2009.

[38] G. Mcneill and S. Vijayakumar, "Hierarchical procrustes matching for shape retrieval," in Proc. IEEE Int. Conf. Comput. Vis. Pattern Recognit., vol. 1, Jun. 2006, pp. 885–894.

[39] F. Mokhtarian and A. K. Mackworth, "A theory of multiscale, curvature-based shape representation for planar curves," IEEE Trans. Pattern Anal. Mach. Intell., vol. 14, no. 8, pp. 789–805, Aug. 1992.

[40] C. T. Zahn and R. Z. Roskies, "Fourier descriptors for plane closed curves," IEEE Trans. Comput., vol. C-21, no. 3, pp. 269–281, Mar. 1972.

[41] C. C. Chang, S. M. Hwang, and D. J. Buehrer, "A shape recognition scheme based on relative distances of feature points from the centroid," Pattern Recognit., vol. 24, no. 11, pp. 1053–1063, 1991.

[42] N. Kumar et al., "Leafsnap: A computer vision system for automatic plant species identification," in Proc. ECCV, 2012, pp. 502–516.

[43] S. Mouine, I. Yahiaoui, and A. Verroust-Blondet, "A shape-based approach for leaf classification using multiscale triangular representation," in Proc. 3rd ACM Int. Conf. Multimedia Retr., 2013, pp. 127–134.

[44] H. S. Yang, S. U. Lee, and K. M. Lee, "Recognition of 2D object contours using starting-point-independent wavelet coefficient matching," J. Vis. Commun. Image Represent., vol. 9, no. 2, pp. 171–181, 1998.

[45] O. J. O. Söderkvist, "Computer vision classification of leaves from Swedish trees," M.S. thesis, Linköping Univ., Linköping, Sweden, 2001.

[46] S. Biswas, G. Aggarwal, and R. Chellappa, "An efficient and robust algorithm for shape indexing and retrieval," IEEE Trans. Multimedia, vol. 12, no. 5, pp. 372–385, Aug. 2010.

[47] R. Hu, W. Jia, H. Ling, and D. Huang, "Multiscale distance matrix for fast plant leaf recognition," IEEE Trans. Image Process., vol. 21, no. 11, pp. 4667–4672, Nov. 2012.

[48] R. Hu, W. Jia, H. Ling, Y. Zhao, and J. Gui, "Angular pattern and binary angular pattern for shape retrieval," IEEE Trans. Image Process., vol. 23, no. 3, pp. 1118–1127, Mar. 2014.

[49] F. Foteini and E. George, "Multivariate angle scale descriptor for shape retrieval," in Proc. Signal Process. Appl. Math. Electron. Commun. (SPAMEC), 2011, pp. 105–108.

[50] J. Wang, X. Bai, X. You, W. Liu, and L. J. Latecki, "Shape matching and classification using height functions," Pattern Recognit. Lett., vol. 33, no. 2, pp. 134–143, 2012.

[51] R. Gopalan, P. K. Turaga, and R. Chellappa, "Articulation-invariant representation of non-planar shapes," in Proc. 11th ECCV, 2010, pp. 286–299.

[52] F. Mokhtarian and M. Bober, Curvature Scale Space Representation: Theory, Applications, and MPEG-7 Standardization. Norwell, MA, USA: Kluwer, 2003.

[53] I. E. Rubé, N. Alajlan, and M. S. Kamel, "MATR: A robust 2D shape representation," Int. J. Image Graph., vol. 6, pp. 421–443, 2006.

Bin Wang received the Ph.D. degree in computer science from Fudan University, Shanghai, China, in 2007. Since 2007, he has been on the faculty of the Nanjing University of Finance and Economics, Nanjing, China, where he is currently an Associate Professor. From 2007 to 2011, he was a part-time Post-Doctoral Research Fellow with Southeast University, Nanjing. From 2011 to 2012, he was with the School of Engineering, Griffith University, Brisbane, QLD, Australia, as a Visiting Scholar. His main research interests include computer vision, image processing, and pattern recognition.

Yongsheng Gao received the B.Sc. and M.Sc. degrees in electronic engineering from Zhejiang University, Hangzhou, China, in 1985 and 1988, respectively, and the Ph.D. degree in computer engineering from Nanyang Technological University, Singapore. He is currently a Professor with the School of Engineering, Griffith University, Brisbane, QLD, Australia. His research interests include face recognition, biometrics, biosecurity, image retrieval, computer vision, pattern recognition, environmental informatics, and medical imaging.