
Effective Dimensionality Reduction in Multimedia Applications

Seungdo Jeong, Sang-Wook Kim, Whoi-Yul Kim, and Byung-Uk Choi

Abstract— In multimedia information retrieval, multimedia data such as images and videos are represented as vectors in high-dimensional space. To search these vectors efficiently, a variety of indexing methods have been proposed. However, the performance of these indexing methods degrades dramatically with increasing dimensionality, a phenomenon known as the dimensionality curse. To resolve the dimensionality curse, dimensionality reduction methods have been proposed. They map feature vectors in high-dimensional space into vectors in low-dimensional space before the data are indexed. This paper proposes an improvement to a previously proposed dimensionality reduction method. The previous method uses the norm and an approximated angle for every subvector; however, the multiple angle components require more storage space and a large number of cosine computations. In this paper, we propose an alternative method employing a single angle component instead of respective angles for all the subvectors. Because only one angle is kept for all the subvectors, the loss of information regarding the original data vector increases, which degrades the performance slightly; in return, we successfully reduce the storage space as well as the number of cosine computations. Finally, we verify the superiority of the proposed approach via extensive experiments with synthetic and real-life data sets.

I. INTRODUCTION

MULTIMEDIA information retrieval is the problem of searching for information satisfying a query condition from multimedia databases. In most previous studies, a multimedia object is represented as a feature vector, which quantifies its contents or features in the form of a vector. In order to express the original object sufficiently, feature vectors normally have several tens to a few hundred dimensions [4], [9], [10].

Many indexing methods have been proposed for efficient multimedia information retrieval [2], [3], [8], [11]. However, the performance of these indexing methods degrades dramatically with the increasing dimensionality of the feature vectors. This deficiency has been termed the dimensionality curse [10]. One solution is dimensionality reduction, which transforms feature vectors in high-dimensional space to those in low-dimensional space [1], [7], [12], [13]. For simplicity, feature vectors in the reduced low-dimensional space are called reduced-dimensional feature vectors.

Jeong et al. proposed a dimensionality reduction method based on angle approximation and dimension grouping [5], [6]. Angle approximation uses the angle component between two vectors in order to reduce the loss of information regarding the original feature vector, and computes the Euclidean distance efficiently [5]. To this end, the norm of a feature vector and the angle between the feature vector and the reference vector are stored in a database.

S. Jeong, S.-W. Kim, W.-Y. Kim, and B.-U. Choi are with the Department of Electronics and Computer Engineering, Hanyang University, Seoul, South Korea (email: {sdjeong, wook, wykim, buchoi}@hanyang.ac.kr).

The previously proposed dimensionality reduction by Jeong et al. [6] represents the high-dimensional space as a set of low-dimensional spaces to reduce the errors of angle approximation in high-dimensional space.

However, this method requires additional storage space compared with dimensionality reduction using the DCT or the PCA. That is, in order to employ angle approximation in every low-dimensional space, all the angle components for the respective groups must be stored. Moreover, cosine computations for all the subvectors incur large overhead, thus delaying the filtering step. This is inevitable if false dismissal is to be prevented. However, we can efficiently reduce the overhead of storage space and cosine computations by allowing some degradation of the performance.

II. RELATED WORK

A. Angle Approximation

Angle approximation was proposed by Jeong et al. to compute the Euclidean distance efficiently [5]. This method uses the angle component between two vectors. Equation (1) gives the Euclidean distance calculated using the angle between two vectors. However, the angle between two vectors can only be computed when both vectors are known; computing the angle between a data vector and a query vector during the query processing step would increase query processing time dramatically.

D(X, Q) = √( Σ_{i=1}^{n} (x_i − q_i)² ) = √( ‖X‖² + ‖Q‖² − 2‖X‖‖Q‖ cos θ )   (1)

To solve this problem, Jeong et al. introduced a reference vector. Figure 1 shows the concept of angle approximation. If we know the angle θ_rx_i between a reference vector R and the i-th data vector X_i, then by merely computing the angle θ_qr between a query vector and the reference vector, we can approximate the angle θ_qx_i between the query vector and the i-th data vector by the simple calculation given in equation (2). Equation (3) estimates the Euclidean distance using angle approximation. This distance function is used in the filtering step and is always less than or equal to the Euclidean distance; thus, no false dismissal occurs in multimedia information retrieval using this function.

θ_qx_i = |θ_qr − θ_rx_i|   (2)

D_A(Q, X_i) = √( ‖X_i‖² + ‖Q‖² − 2‖X_i‖‖Q‖ cos θ_qx_i )   (3)
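To make the filtering computation concrete, the following is a minimal Python sketch of angle approximation; the function names and the stored-tuple representation are our own illustration, not the authors' code.

```python
import numpy as np

def angle_to(v, r):
    """Angle (in radians) between a vector v and the reference vector r."""
    c = np.dot(v, r) / (np.linalg.norm(v) * np.linalg.norm(r))
    return np.arccos(np.clip(c, -1.0, 1.0))

def d_a(norm_x, theta_rx, q, r):
    """Estimated distance of equation (3). Only the stored pair
    (norm_x, theta_rx) = (‖X_i‖, θ_rx_i) is needed per data vector;
    θ_qr is computed once per query."""
    theta_qr = angle_to(q, r)
    theta_qx = abs(theta_qr - theta_rx)   # equation (2)
    norm_q = np.linalg.norm(q)
    return np.sqrt(norm_x ** 2 + norm_q ** 2
                   - 2.0 * norm_x * norm_q * np.cos(theta_qx))
```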



Fig. 1. Angle approximation using a reference vector. (Diagram: query vector Q, reference vector R, and data vectors X_1, X_2, with angles θ_qr, θ_rx_1, θ_rx_2 and the approximation θ_qx_i ≈ |θ_qr − θ_rx_i|.)

Angle approximation, however, represents a high-dimensional feature vector with only two components: a norm and an angle referred to the reference vector. Thus, the loss of information regarding the original vector is relatively high in high-dimensional space. As a result, the error between the original Euclidean distance and the estimated distance grows with the increasing dimensionality of the feature vectors.

B. Dimensionality Reduction based on Dimension Grouping

In angle approximation, the information loss is relatively high in high-dimensional space because it uses only the norm and the angle, rather than all the attributes, to estimate the Euclidean distance. To solve this problem, Jeong et al. proposed dimensionality reduction based on dimension grouping [6]. Dimension grouping represents a high-dimensional space as a set of low-dimensional spaces. We call a vector that comprises partial attributes of the original high-dimensional data vector a subvector. Angle approximation is applied independently in each low-dimensional space with its respective subvector, thus successfully reducing the loss of information. A high-dimensional data vector is therefore transformed into a set of low-dimensional vectors by regulating the number of groups. As shown in equation (4), a reduced-dimensional data vector comprises norms and angles referred to the reference vector, with respect to each subvector. A function to estimate the Euclidean distance is shown in equation (5), where Xs_i denotes the i-th subvector. This function lower-bounds the Euclidean distance; thus, no false dismissal occurs in multimedia information retrieval using D_GA [6].

X_GA = [‖Xs_1‖, θ_Xs_1, ‖Xs_2‖, θ_Xs_2, …, ‖Xs_k‖, θ_Xs_k]   (4)

D_GA(X, Q) = √( Σ_{i=1}^{k} ( ‖Xs_i‖² + ‖Qs_i‖² − 2‖Xs_i‖‖Qs_i‖ cos θ_{Xs_i Qs_i} ) )   (5)
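A corresponding sketch for the grouped estimate of equation (5); note the k cosine evaluations, one per subvector. The argument layout (parallel arrays of norms and angles) is our assumption for illustration.

```python
import numpy as np

def d_ga(x_norms, q_norms, thetas):
    """Estimated distance of equation (5).
    x_norms[i] = ‖Xs_i‖, q_norms[i] = ‖Qs_i‖, and thetas[i] is the
    approximated angle θ_{Xs_i Qs_i}, obtained per group as in equation (2).
    One cosine per group, i.e., k cosine computations in total."""
    total = 0.0
    for nx, nq, th in zip(x_norms, q_norms, thetas):
        total += nx ** 2 + nq ** 2 - 2.0 * nx * nq * np.cos(th)
    return np.sqrt(total)
```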

III. PROPOSED METHOD

A. Motivation

In the dimensionality reduction based on dimension grouping, norms and angles for each group are stored so that angle approximation can be adopted in every group [6]. This requires more complex computations in the filtering step. Moreover, it requires 25% more storage space for the angle components compared with dimensionality reduction using the DCT or the PCA. The storage and computational overheads of this dimensionality reduction are caused by the angle component for every subvector, which is stored and computed individually to estimate the Euclidean distance. However, if only one angle between a data vector and the reference vector is used to estimate the Euclidean distance, rather than respective angles for all the subvectors, we can significantly reduce storage space as well as computational overhead. In this case, some information regarding the original feature vectors is lost, which degrades the performance slightly. This issue is dealt with in more detail in Section III-C.

B. Dimensionality Reduction using Angle Approximation

In this paper, we propose an alternative dimensionality reduction method that uses a single angle component to reduce storage space as well as computational overhead. The proposed method uses only one angle for all the groups, each of which is represented by a subvector of the original data vector. First, a reference vector is selected [6]. Then, the angle between every data vector and the reference vector is computed. Let Xs_i and θ_X be the i-th subvector of a data vector and the angle between the data vector and the reference vector, respectively. A data vector is represented with k subvectors. Thus, the reduced-dimensional data vector X_GSA is denoted by equation (6). As a result, the reduced-dimensional data vector comprises k norms for the subvectors and one angle. Angle approximation in each subvector uses the same angle θ_X. Equation (7) denotes a function approximating the Euclidean distance, which is used in the filtering step to identify candidates. Here, Q and Qs_i denote a query vector and its i-th subvector, respectively. The value θ_XQ is simply computed as |θ_X − θ_Q|.

X_GSA = [‖Xs_1‖, ‖Xs_2‖, …, ‖Xs_k‖, θ_X]   (6)

D_GSA(X, Q) = √( Σ_{i=1}^{k} ( ‖Xs_i‖² + ‖Qs_i‖² − 2‖Xs_i‖‖Qs_i‖ cos θ_XQ ) )   (7)

In dimensionality reduction based on dimension grouping, cosine computation is performed k times in order to estimate the Euclidean distance. From equation (7), however, we note that cosine computation is performed only once in the proposed dimensionality reduction. Moreover, the storage space is reduced significantly because the proposed method uses a single angle rather than respective angles for all the groups.

In query processing, the reduced-dimensional vector for a query and the angle between the query vector and the reference vector are computed once. In the filtering step, a data vector whose estimated distance obtained from equation (7) is beyond the tolerance is filtered out. Then, in the post-processing step, we verify the correct answers among all the candidates by using the original Euclidean distance. A sketch of this pipeline is given below.
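The following sketch puts the pieces together under stated assumptions: the names, the NumPy representation, and the tolerance-based range filter are our illustration (the paper itself evaluates k-NN queries via precomputed tolerances).

```python
import numpy as np

def reduce_vector(x, r, k):
    """Offline transform of equation (6): k subvector norms plus one angle θ_X."""
    c = np.dot(x, r) / (np.linalg.norm(x) * np.linalg.norm(r))
    theta_x = np.arccos(np.clip(c, -1.0, 1.0))
    norms = np.array([np.linalg.norm(s) for s in np.array_split(x, k)])
    return norms, theta_x

def d_gsa(x_red, q_red):
    """Filtering distance of equation (7): exactly one cosine evaluation."""
    (x_norms, theta_x), (q_norms, theta_q) = x_red, q_red
    cos_xq = np.cos(abs(theta_x - theta_q))   # θ_XQ = |θ_X − θ_Q|
    s = np.sum(x_norms ** 2 + q_norms ** 2 - 2.0 * x_norms * q_norms * cos_xq)
    return np.sqrt(s)

def range_query(data, reduced, q, r, k, eps):
    """Filtering step with equation (7), then post-processing with the
    exact Euclidean distance to discard false alarms."""
    q_red = reduce_vector(q, r, k)
    candidates = [i for i, x_red in enumerate(reduced) if d_gsa(x_red, q_red) <= eps]
    return [i for i in candidates if np.linalg.norm(data[i] - q) <= eps]
```

Offline, `reduced = [reduce_vector(x, r, k) for x in data]` would be stored; at query time only one arccos (for the query) and one cosine per distance estimate are needed.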


C. Discussions

This section shows that the proposed method reduces the number of false alarms significantly compared with the previous method. However, we could not theoretically verify that the proposed method guarantees no false dismissal, because the proposed function D_GSA does not lower-bound the Euclidean distance.

Theorem 1: For two vectors X and Y, the following inequality always holds:

D_GSA(X, Y) ≥ D_A(X, Y)

Proof: The approximation function using angle approximation is D_A(X, Y) = √( ‖X‖² + ‖Y‖² − 2‖X‖‖Y‖ cos θ_XY ). For a vector X,

‖X‖² = Σ_{i=1}^{n} x_i² = Σ_{i=1}^{k} ( Σ_{j=1}^{l_i} x_{i,j}² ),

where the last expression is the representation using k subvectors, x_{i,j} denotes the j-th attribute of the i-th subvector, and l_i denotes the number of attributes of the i-th subvector. From this equation, D_A(X, Y) and D_GSA(X, Y) are identical to each other except for the terms that include angle components. The differing terms are ‖X‖‖Y‖ cos θ_XY and Σ_{i=1}^{k} (‖Xs_i‖‖Ys_i‖) cos θ_XY. Both terms include the same cos θ_XY; thus, the relationship between D_A(X, Y) and D_GSA(X, Y) depends on ‖X‖‖Y‖ versus Σ_{i=1}^{k} ‖Xs_i‖‖Ys_i‖.

Let

A = ‖X‖‖Y‖ = √(x_1² + ··· + x_n²) · √(y_1² + ··· + y_n²)

and

B = Σ_{i=1}^{k} ‖Xs_i‖‖Ys_i‖ = √(x_{1,1}² + ··· + x_{1,l_1}²) · √(y_{1,1}² + ··· + y_{1,l_1}²) + √(x_{2,1}² + ··· + x_{2,l_2}²) · √(y_{2,1}² + ··· + y_{2,l_2}²) + ···.

Because X = [x_1, ···, x_n] = [x_{1,1}, ···, x_{1,l_1}, x_{2,1}, ···, x_{2,l_2}, ···, x_{k,1}, ···, x_{k,l_k}], each product of subvector norms in B pairs attributes only within the same group, whereas A pairs all attributes; cross-group products such as x_{1,1}² y_{2,1}² contribute to A² but have no counterpart in B². This means that A ≥ B always holds; rigorously, A ≥ B is an instance of the Cauchy–Schwarz inequality (see the display after this proof). Since both distance functions subtract their respective product term multiplied by the same nonnegative cos θ_XY (the angle lies in [0°, 90°] because the normalized data and the reference vector lie in the positive orthant), A ≥ B implies that D_GSA(X, Y) ≥ D_A(X, Y) always holds.
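For completeness, the step A ≥ B stated compactly, as the Cauchy–Schwarz inequality applied to the k-dimensional vectors of subvector norms (this framing is ours, not the paper's):

Σ_{i=1}^{k} ‖Xs_i‖‖Ys_i‖ ≤ √( Σ_{i=1}^{k} ‖Xs_i‖² ) · √( Σ_{i=1}^{k} ‖Ys_i‖² ) = ‖X‖‖Y‖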

From Theorem 1, D_GSA(X, Y) approximates the Euclidean distance more tightly than D_A(X, Y). Thus, it is possible to filter out more false alarms with D_GSA(X, Y) than with D_A(X, Y).

Next, let us investigate the relationship among D_GSA(X, Y), D_GA(X, Y), and the Euclidean distance D(X, Y). By an argument similar to the proof above, D_GSA(X, Y) and D_GA(X, Y) are identical to each other except for the angles. However, we cannot relate θ_{Xs_i Ys_i} in D_GA(X, Y) and θ_XY in D_GSA(X, Y) by an inequality; θ_{Xs_i Ys_i} can be greater or less than θ_XY on a case-by-case basis. Thus, the relationship between D_GSA(X, Y) and D_GA(X, Y) depends on the situation. The relationship between D_GSA(X, Y) and D(X, Y) is similar.

For instance, let us consider the 4-dimensional vectors X = [0.6, 0.2, 0.3, 0.4], Y = [0.8, 0.35, 0.3, 0.35], and R = [0.9, 0.01, 0.01, 0.01], where R denotes the reference vector. The reference vector, which is very close to an axis, is selected to minimize the error of angle approximation. The reduced dimensionality is k = 2; that is, the 4-dimensional vector is represented by two subvectors of 2 dimensions each. In this example, D(X, Y) = 0.255, D_GA(X, Y) = 0.255, and D_GSA(X, Y) = 0.261. Thus, D_GSA(X, Y) is greater than both D(X, Y) and D_GA(X, Y), which means that information retrieval using D_GSA may incur false dismissals. If Y = [0.8, 0.35, 0.2, 0.35], then D(X, Y) = 0.240, D_GA(X, Y) = 0.240, and D_GSA(X, Y) = 0.238. Therefore, we cannot relate D_GSA(X, Y) and D(X, Y) by an inequality. In this example, however, vector X is relatively far from Y and thus is not included in the correct answer set. We note that no false dismissal appeared in our extensive experiments. The distances of the first example can be reproduced with the sketch below.
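The following sketch (our own illustration) recomputes D(X, Y) and D_GSA(X, Y) for the first example, approximating θ_XY as |θ_X − θ_Y| against the reference vector as in the text:

```python
import numpy as np

def angle(u, v):
    c = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(c, -1.0, 1.0))

def d_gsa(x, y, r, k):
    """Equation (7), with θ_XY approximated as |θ_X − θ_Y|."""
    theta = abs(angle(x, r) - angle(y, r))
    total = 0.0
    for xs, ys in zip(np.array_split(x, k), np.array_split(y, k)):
        nx, ny = np.linalg.norm(xs), np.linalg.norm(ys)
        total += nx ** 2 + ny ** 2 - 2.0 * nx * ny * np.cos(theta)
    return np.sqrt(total)

X = np.array([0.6, 0.2, 0.3, 0.4])
Y = np.array([0.8, 0.35, 0.3, 0.35])
R = np.array([0.9, 0.01, 0.01, 0.01])

print(round(float(np.linalg.norm(X - Y)), 3))  # D(X, Y)     = 0.255
print(round(float(d_gsa(X, Y, R, k=2))), 3)    # D_GSA(X, Y) = 0.261 > D(X, Y)
```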

IV. PERFORMANCE EVALUATION

This section describes the environment for the experiments conducted to evaluate the performance of the proposed method and presents the experimental results.

A. Environment for experiments

We used both synthetic and real-life data sets for the experiments. The synthetic data sets are composed of 20,000 to 100,000 data vectors of 25 to 200 dimensions. A data set consists of a number of clusters containing various numbers of data vectors. Data in each cluster are generated as follows (a sketch of this procedure is given after the list):
(1) Randomly select the number of data vectors in the cluster, from 5 to 50.
(2) Determine the axis system of the cluster.
(3) Create all attributes of the data vectors following a normal distribution with mean 0 and standard deviation between 1,000,000 and 100,000,000.
(4) Rotate the created data vectors in the cluster to match the axis system determined in stage (2).
(5) Randomly select the center of the cluster between -2,000,000,000 and 2,000,000,000, then translate the data to fit the selected center.
Finally, the attributes of the data are normalized to real numbers in the interval [0, 1].
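A sketch of this generation procedure, assuming a random orthogonal matrix (via QR decomposition) for the rotation step, which the text leaves unspecified:

```python
import numpy as np

rng = np.random.default_rng(0)

def synthetic_data(n_total, dim):
    """Generate clustered synthetic vectors following steps (1)-(5) above."""
    clusters = []
    count = 0
    while count < n_total:
        size = int(rng.integers(5, 51))                   # (1) 5 to 50 vectors per cluster
        q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))  # (2) random axis system
        std = rng.uniform(1e6, 1e8)                       # (3) std in [1e6, 1e8]
        cluster = rng.normal(0.0, std, size=(size, dim))
        cluster = cluster @ q.T                           # (4) rotate into the axis system
        center = rng.uniform(-2e9, 2e9, size=dim)         # (5) translate to a random center
        clusters.append(cluster + center)
        count += size
    x = np.vstack(clusters)[:n_total]
    # Finally, normalize each attribute into [0, 1].
    return (x - x.min(axis=0)) / (x.max(axis=0) - x.min(axis=0))
```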

The real-life Corel image data set consists of 68,040 images [14], each of which is described by a 32-dimensional feature vector.

We compared the performance of the proposed dimensionality reduction method with existing methods employing the PCA and the DCT. We used the number of candidates as the metric for query processing performance. Here, the candidates are the data remaining after the filtering step, which include the correct answers and the false alarms. The purpose of this experiment is to examine how many false alarms each method produces; we judge a method to be superior if its number of candidates is small. We performed each experiment with 100 random queries and measured the average performance of each method under various settings of the reduced dimensionality k.

The experiments were conducted on a PC equipped with a 2.8 GHz Pentium CPU and 512 MB of RAM. The software platform was MS Windows 2000 with Visual C++ 6.0.


Fig. 2. Experimental results with various reduced dimensionality. (Candidates versus reduced dimensionality, 1 to 5, for 5-NN search on 200-dimensional synthetic data; methods: DCT, PCA, respective angles, and the proposed method. (a) Original scale. (b) Log scale.)

B. Experimental results

As the reduced dimensionality increases, the information loss of the original data vectors decreases. As a result, the number of candidates decreases. To examine this quantitatively, we investigated the change in the number of candidates with respect to the reduced dimensionality.

Figure 2 shows the results under various reduced dimensionalities for a data set with 100,000 200-dimensional data vectors. Figure 2-(a) shows the number of candidates on the original scale. The performances of the four methods are nearly identical for reduced-dimensionality values of 4 and 5. In Figure 2-(b), we present the results on a logarithmic scale so that the ratios may be compared more easily. As the results show, the proposed method degrades slightly compared with the method using respective angles for all the groups. However, the storage space required by the proposed method is almost the same as that of the methods using the DCT and the PCA, unlike the method using respective angles for all the groups. In terms of the number of candidates, the proposed method outperforms the methods employing the DCT and the PCA by up to 2.7 and 2.3 times, respectively. In addition, no false dismissal was introduced by the proposed method in any of our experiments, even though the proposed distance function does not lower-bound the Euclidean distance.

Fig. 3. Experimental results with various tolerances. (Candidates versus nearest-neighbor count, 1-NN to 7-NN, for 200-dimensional synthetic data with reduced dimensionality k = 4; methods: DCT, PCA, respective angles, and the proposed method.)

Fig. 4. Experimental results with various dimensionalities. (Candidates versus original dimension, 25 to 200, for 5-NN search on synthetic data with reduced dimensionality k = 4; methods: DCT, PCA, respective angles, and the proposed method.)

As the tolerance ε increases, the number of candidates also increases. We performed experiments with 1-NN to 7-NN queries. To retrieve the k nearest neighbors, we pre-computed a suitable tolerance and then searched the database with the computed tolerance. The reduced dimensionality was fixed at 4. The original dimensionality was 200, and the number of data vectors was 100,000.

Figure 3 shows the results. The number of candidates obtained by each of the four methods increases slightly with increasing tolerance. The proposed method again performs below the method using respective angles for all the groups in this experiment. However, the proposed method produces 2.7 and 2.2 times fewer candidates than the methods using the DCT and the PCA, respectively.

We performed experiments with 25- to 200-dimensional data sets to examine the performance with varying dimensionality. The number of data vectors was 100,000, and k was fixed at 4. Figure 4 shows the results, which indicate that the number of candidates for the proposed method and for the methods using the PCA and the DCT increases with increasing dimensionality. Though the number of candidates given by the method using respective angles for all the groups remains nearly constant with increasing dimensionality, that method uses much more storage space and computation than the proposed method. The proposed method outperforms the previous methods using the DCT and the PCA by up to 3.2 times.


Fig. 5. Experimental results with various numbers of data vectors. (Candidates versus data size, 20,000 to 100,000, for 5-NN search on 200-dimensional synthetic data with reduced dimensionality k = 4; methods: DCT, PCA, respective angles, and the proposed method.)

Thus, the proposed method is shown to be quite desirable for information retrieval in multimedia databases with a large number of objects.

Next, we investigated the performance with varying numbers of data vectors. We performed 5-NN queries on 200-dimensional data sets, with k fixed at 4. Figure 5 shows the results. The number of candidates for the methods using the PCA and the DCT increases rapidly with the number of data vectors. The number of candidates for the proposed method also increases, but with a gentler slope. Though the method using respective angles for all the groups outperforms the others, it requires more storage space and computation. Our method outperforms the methods using the DCT and the PCA by 2.3 to 2.7 and 2 to 2.3 times, respectively.

Finally, we evaluated the performance of the four methods using the real-life Corel image data set. Figure 6 shows the results under various reduced dimensionalities. As in the results with the synthetic data sets, the method using respective angles for all the groups performs best on the real-life Corel image data set; however, it requires more storage space and computation than the proposed method. The proposed method produces about 7.5 and 2.7 times fewer candidates than the methods using the DCT and the PCA, respectively. These ratios are higher than those obtained with the synthetic data sets. Therefore, our method is notably well suited to real environments.

V. CONCLUSIONS

In multimedia information retrieval, multimedia objects are represented as vectors in high-dimensional space, and their similarity is generally measured by the Euclidean distance. In the case of high dimensionality, however, the performance of information retrieval degrades drastically. This problem is known as the dimensionality curse. The problem may be solved through dimensionality reduction, which transforms high-dimensional vectors to low-dimensional ones.

Dimensionality reduction based on dimension grouping and angle approximation has been proposed previously. However, this method requires more storage space and computation compared with the methods using the DCT and the PCA.

Fig. 6. Experimental results with varying k for the Corel image data set. (Candidates versus reduced dimensionality, 2 to 8, for 5-NN search on 32-dimensional Corel image data; methods: DCT, PCA, respective angles, and the proposed method. (a) Original scale. (b) Log scale.)

Thus, in this paper, we have proposed a novel dimensionality reduction method that mitigates these drawbacks. Our method is based on dimension grouping and a function approximating the Euclidean distance using angle approximation. We use only a single angle for angle approximation of all the groups instead of respective angles for each group; thus, we reduce the storage space as well as the number of cosine computations. Of course, the loss of information regarding the original data vector is larger in our method than in the previous method, which causes some degradation of the performance. However, our extensive performance evaluation shows that our method can be applied to multimedia databases with a large number of objects.

In the experiments with synthetic data sets, in terms of the number of candidates remaining after the filtering step, the proposed method outperforms the previous methods using the DCT and the PCA by 2.7 and 2.3 times, respectively. The results of the experiments with the real-life Corel image data set show that the proposed method outperforms the methods using the DCT and the PCA by 7.5 and 2.7 times, respectively. In addition, no false dismissal occurred in any of our experiments, even though the proposed approximating function does not lower-bound the Euclidean distance. We note that our method is a quite reasonable solution for multimedia information retrieval where a large number of high-dimensional objects are dealt with.


ACKNOWLEDGEMENT

This research was supported by the MKE (Ministry of Knowledge Economy), Korea, under the ITRC (Information Technology Research Center) support program supervised by the IITA (Institute of Information Technology Advancement) (Grant No. IITA-2008-C1090-0801-0040).

REFERENCES

[1] C. C. Aggarwal, "On the Effects of Dimensionality Reduction on High Dimensional Similarity Search," In Proc. Int'l Symp. on Principles of Database Systems, ACM SIGACT-SIGMOD-SIGART, pp. 256-266, 2001.
[2] N. Beckmann, H.-P. Kriegel, R. Schneider, and B. Seeger, "The R*-tree: An Efficient and Robust Access Method for Points and Rectangles," In Proc. Int'l Conf. on Management of Data, ACM SIGMOD, pp. 322-331, 1990.
[3] S. Berchtold, C. Bohm, B. Braunmuller, D. Keim, and H.-P. Kriegel, "Fast Parallel Similarity Search in Multimedia Databases," In Proc. Int'l Conf. on Management of Data, ACM SIGMOD, pp. 1-12, 1997.
[4] C. Bohm, S. Berchtold, and D. A. Keim, "Searching in High-Dimensional Spaces: Index Structures for Improving the Performance of Multimedia Databases," ACM Computing Surveys, vol. 33, no. 3, pp. 322-373, 2001.
[5] S. Jeong, S.-W. Kim, K. Kim, and B.-U. Choi, "An Effective Method for Approximating the Euclidean Distance in High-Dimensional Space," In Proc. Int'l Conf. on Databases and Expert Systems Applications, pp. 863-872, 2006.
[6] S. Jeong, S.-W. Kim, and B.-U. Choi, "Dimensionality Reduction in High-Dimensional Space for Multimedia Information Retrieval," In Proc. Int'l Conf. on Databases and Expert Systems Applications, pp. 404-413, 2007.
[7] K. V. R. Kanth, D. Agrawal, and A. Singh, "Dimensionality Reduction for Similarity Searching in Dynamic Databases," In Proc. Int'l Conf. on Management of Data, ACM SIGMOD, pp. 166-176, 1998.
[8] N. Katayama and S. Satoh, "The SR-tree: An Index Structure for High-Dimensional Nearest Neighbor Queries," In Proc. Int'l Conf. on Management of Data, ACM SIGMOD, pp. 369-380, 1997.
[9] T. Seidl and H.-P. Kriegel, "Optimal Multi-Step k-Nearest Neighbor Search," In Proc. Int'l Conf. on Management of Data, ACM SIGMOD, pp. 154-165, 1998.
[10] R. Weber, H.-J. Schek, and S. Blott, "A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces," In Proc. Int'l Conf. on Very Large Data Bases (VLDB), pp. 194-205, 1998.
[11] D. A. White and R. Jain, "Similarity Indexing with the SS-tree," In Proc. Int'l Conf. on Data Engineering, IEEE, pp. 516-523, 1996.
[12] Y. Fu, M. Liu, and T. S. Huang, "Conformal Embedding Analysis with Local Graph Modeling on the Unit Hypersphere," In Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 1st Workshop on Component Analysis, pp. 1-6, 2007.
[13] Y. Fu, S. Yan, and T. S. Huang, "Correlation Metric for Generalized Feature Extraction," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 12, pp. 2229-2235, 2008.
[14] Corel Image Features, University of California, 1999. http://kdd.ics.uci.edu/databases/CorelFeatures/CorelFeatures.html

