# Plant Leaf Classification Based on Weighted Locally Linear Embedding


Third International Workshop on Advanced Computational Intelligence, August 25-27, 2010, Suzhou, Jiangsu, China


Shanwen Zhang, Youqian Feng, and Jing Liu

*Abstract*—Locally linear embedding (LLE) is effective in discovering the geometrical structure of data. But when it is applied to real-world data it shows some weak points, such as being quite sensitive to noise points and outliers, and being unsupervised in nature. In this paper, we propose a weighted LLE. Experiments on synthetic data and real plant leaf data demonstrate that the proposed algorithm can efficiently maintain an accurate low-dimensional representation of noisy manifold data with less distortion, and achieves higher average plant leaf recognition rates than other dimensionality reduction methods.

I. INTRODUCTION

Manifold learning is an important dimensionality-reduction and data-visualization tool that can discover the intrinsic structure of high-dimensional data and provide understanding of multidimensional patterns in data mining, pattern recognition, and machine learning. Manifold learning algorithms extract the intrinsic features of observed data in a high-dimensional space by preserving local geometric characteristics. However, because of this local geometry preservation, most existing manifold learning methods, such as locally linear embedding (LLE) [1], are in general sensitive to noise and outliers. LLE may lose its efficiency in dimension reduction and data visualization since it builds its neighborhood information from Euclidean distance, and local Euclidean distance generally does not match the classification property: two sample points belonging to different classes may still have a short Euclidean distance, so the neighbors of a point may come from different classes. Such methods often fail in the presence of high-dimensional outliers or noise, because outliers and noise distort the local structure of the manifold; as the outlier or noise level increases, the mapping efficiency quickly becomes very poor. If the influence of outliers and noise is not eliminated or suppressed during the initial data mapping, the topological structure of the data will not be well kept in the low-dimensional space, which is essential to manifold learning. Recently, several measures have been proposed to improve the robustness of manifold learning on noisy data. Hadid et al. [2] proposed an efficient LLE. This method is based on the assumption that all outliers are very far away

from the data on the manifold, but this assumption is not always satisfied by real-world data. Based on weighted PCA, Zhang et al. [3] proposed a preprocessing method for removing outliers and suppressing noise before ordinary manifold learning is performed. However, their method for determining the weights is

This work was supported by the National Science Foundation of China under Grant No. 60975005.

Shanwen Zhang is with the Department of Engineering and Technology, Xijing University, Xi'an 710123, China (corresponding author; phone: 86-029-85628051; fax: 86-029-85628051; e-mail: wjdw716@163.com).

Youqian Feng and Jing Liu are with the Science Institute, Air Force Engineering University, Xi'an 710051, China (e-mail: fengyouqian123@163.com, liu_2007@126.com).

978-1-4244-6337-4/10/$26.00 ©2010 IEEE

heuristic in nature and lacks formal justification. Similarly, Park et al. [4] proposed a method for outlier handling and noise suppression using weighted local linear smoothing for a set of noisy points sampled from a nonlinear

manifold. Chang et al. [5] proposed a robust LLE (RLLE) based on RPCA, but RLLE fails when the sample density is low, the data points are unevenly sampled, or the data have small perturbations. Zhang et al. [7] proposed a modified LLE that solves the above problems of LLE, but it fails when the data points are not distributed on or close to a two- or three-dimensional nonlinear manifold. Yin et al. [8] proposed a neighborhood smoothing embedding (NSE) for noisy data, which can efficiently maintain an accurate low-dimensional representation of noisy data with less distortion; however, it introduces an additional parameter and ignores the statistical features of the data. Pan et al. [9] proposed a weighted LLE (WLLE) that optimizes the discovery of intrinsic structure by avoiding unreasonable neighbor searching and is robust to parameter changes, but in WLLE the weighted distance is not always the geodesic distance on real data.

The purpose of this paper is to improve the robustness and clustering ability of LLE by proposing a weighted LLE algorithm. The proposed method is tested on synthetic data and real plant leaf image data; the experimental results demonstrate that the method is not only robust to noisy data but also achieves high clustering performance.

The rest of this paper is organized as follows. Section II briefly introduces the LLE algorithm and the influence of noise upon it. Section III proposes a weighted LLE algorithm. Section IV evaluates the proposed method on synthetic data and plant leaf data. Finally, concluding remarks and future work are given in Section V.

II. LOCALLY LINEAR EMBEDDING (LLE)

A. LLE Algorithm

Let $X = [X_1, X_2, \ldots, X_n] \in \mathbb{R}^{D \times n}$ be a set of $n$ points in a high-dimensional input data space. The data points are well sampled from a nonlinear manifold whose intrinsic dimensionality is $d$ ($d \ll D$). The goal of LLE is to map the high-dimensional data into a low-dimensional embedding. Denote the corresponding set of $n$ points in the embedding space by $Y = [Y_1, Y_2, \ldots, Y_n] \in \mathbb{R}^{d \times n}$. The outline of LLE can be summarized as follows:

Step 1: For each data point $X_i$, find its $k$ nearest neighbors by the kNN algorithm or the $\varepsilon$-ball algorithm, denoted as $N(X_i)$;

Step 2: Compute the reconstruction weights that best reconstruct each $X_i$ from its neighbors;

Based on the neighbors found in Step 1, Step 2 aims to find the best reconstruction weights. Optimality is achieved by minimizing the local reconstruction error for $X_i$,

$$\varepsilon_i(W) = \Big\| X_i - \sum_{X_j \in N(X_i)} w_{ij} X_j \Big\|^2 \tag{1}$$

where $\varepsilon_i(W)$ is the squared distance between $X_i$ and its reconstruction, subject to the constraints

$$\sum_j w_{ij} = 1, \qquad w_{ij} = 0 \ \text{if}\ X_j \notin N(X_i). \tag{2}$$

Minimizing $\varepsilon_i(W)$ subject to these two constraints is a constrained least-squares problem. After repeating Step 2 for all $n$ data points, the reconstruction weights form a weight matrix $W$.

Step 3: Compute the best low-dimensional embedding $Y$ based on the weight matrix $W$. This corresponds to minimizing the cost function

$$\varepsilon(Y) = \sum_{i=1}^{n} \Big\| Y_i - \sum_{X_j \in N(X_i)} w_{ij} Y_j \Big\|^2 \tag{3}$$

subject to the constraints $Y \mathbf{1}_n = 0_d$ and $Y Y^T = n I_d$.

Based on the weight matrix $W$, we can define a sparse, symmetric, and positive semi-definite matrix $M$ as

$$M = (I - W)^T (I - W). \tag{4}$$

Note that Eq. (3) can be expressed in the quadratic form $\varepsilon(Y) = \sum_{ij} M_{ij} Y_i^T Y_j$, where $M = [M_{ij}]_{n \times n}$. By the Rayleigh-Ritz theorem, Eq. (3) can be minimized by finding the eigenvectors associated with the smallest (nonzero) eigenvalues of the sparse matrix $M$.
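The three steps above can be sketched compactly in NumPy. This is a minimal illustration rather than the authors' implementation; the function name `lle` and the regularization constant `reg` (added to keep the local Gram matrix invertible when $k > D$) are our own choices.

```python
import numpy as np

def lle(X, k=10, d=2, reg=1e-3):
    """Minimal LLE following Steps 1-3.

    X: (n, D) data matrix; returns an (n, d) embedding."""
    n = X.shape[0]
    # Step 1: k nearest neighbors by Euclidean distance.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)          # a point is not its own neighbor
    nbrs = np.argsort(dists, axis=1)[:, :k]
    # Step 2: reconstruction weights; each row of W sums to 1 (Eq. 2).
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[nbrs[i]] - X[i]                # neighbors centered on X_i
        G = Z @ Z.T                          # local Gram matrix
        G += reg * max(np.trace(G), 1e-12) * np.eye(k)  # regularize
        w = np.linalg.solve(G, np.ones(k))   # constrained least squares
        W[i, nbrs[i]] = w / w.sum()
    # Step 3: bottom eigenvectors of M = (I - W)^T (I - W) (Eq. 4).
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, 1:d + 1]                  # skip the constant eigenvector
```

Skipping the very first eigenvector discards the trivial constant solution that corresponds to eigenvalue zero.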

B. Noise Influence on LLE

The influence of noise on manifold learning has been described in the related literature. The following two examples on synthetic data indicate that noise seriously affects manifold dimensionality reduction. Fig. 1 shows how classical LLE performs in finding the low-dimensional embedding of the S-curve and Swiss-roll manifolds from $\mathbb{R}^3$ to $\mathbb{R}^2$. As can be seen clearly from Fig. 1(C), the LLE algorithm is topologically unstable: in the presence of noise it cannot preserve the local geometry of the data manifolds in the embedding space, and the overall topological structure of the data is destroyed. Further experiments show that small errors in the data connectivity (topology) can lead to large errors in the solution. From the embedding results of these two noisy


datasets, we can see that the influence of noise on manifold data mainly manifests as destruction of the topological structure, and such destruction means the failure of the manifold algorithm. Therefore, the key to using manifold learning successfully is how to validly restrain noise and outliers in the mapping process [5].

Fig. 1. Two-dimensional embedding results of the S-curve and Swiss-roll datasets by LLE (panels A-C).

III. WEIGHTED LOCALLY LINEAR EMBEDDING ALGORITHM

The similarity of any two points $X_i$ and $X_j$ can be characterized as follows:

$$H_{ij} = \begin{cases} \exp\!\big(-\|X_i - X_j\|^2 / \beta\big), & \text{if } X_i \in N(X_j) \text{ or } X_j \in N(X_i) \\ 0, & \text{otherwise} \end{cases} \tag{5}$$

where $\beta$ is a suitable constant and $N(X_i)$ is the $k$-nearest neighborhood of the point $X_i$. The function $\exp(-\|X_i - X_j\|^2/\beta)$ is the so-called heat kernel, which is intimately related to the manifold structure; $\|X_i - X_j\|$ is the Euclidean distance between the two data points. When label information is available, it can easily be incorporated into the graph as follows:

$$H_{ij} = \begin{cases} \exp\!\big(-\|X_i - X_j\|^2 / \beta\big), & \text{if } X_i \in N(X_j) \text{ or } X_j \in N(X_i), \text{ and } X_i, X_j \text{ share the same label} \\ 0, & \text{otherwise.} \end{cases} \tag{6}$$

For data clustering, the local density of point $X_i$ is defined as $D_{ii} = \sum_{j=1}^{k} H_{ij}$. $D_{ii}$ provides a natural measure on the data points: the bigger the value $D_{ii}$ (corresponding to $X_i$), the more "important" $X_i$ is, which means that outliers in $X$ tend to have small $D_{ii}$. Computing the local density $D_{ii}$ for each point $X_i$ yields the local density vector $D = [D_{11}, D_{22}, \ldots, D_{nn}]$ over all points of $X$, which we then normalize to $[p_1, p_2, \ldots, p_n]$. The smaller the value $p_i$ for a point $X_i$, the more likely it is that $X_i$ is an outlier.
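The heat-kernel affinities of Eq. (5) and the normalized density vector can be computed directly. The sketch below is our own illustration: the names `density_weights`, `k`, and `beta` are assumptions, and the normalization here simply divides each local density by the total.

```python
import numpy as np

def density_weights(X, k=5, beta=1.0):
    """Heat-kernel affinities H (Eq. 5) and normalized local densities p."""
    n = X.shape[0]
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    np.fill_diagonal(d2, np.inf)             # exclude self-affinity
    nbrs = np.argsort(d2, axis=1)[:, :k]
    H = np.zeros((n, n))
    for i in range(n):
        for j in nbrs[i]:                    # affinity only within the kNN graph
            H[i, j] = H[j, i] = np.exp(-d2[i, j] / beta)
    D = H.sum(axis=1)                        # local density D_ii
    p = D / D.sum()                          # normalized density vector
    return H, p
```

A point far from every cluster receives near-zero affinities to its neighbors and therefore the smallest $p_i$, which is exactly the outlier signal the weighting scheme relies on.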

To preserve the integrity of the data, all data points, including both clean points and outliers, are projected into the embedding space. The embedding is obtained by minimizing a weighted version of the cost function in Eq. (3), with the normalized density values serving as weights.

The weighted cost function is

$$\varepsilon(Y) = \sum_{i=1}^{n} p_i \Big\| Y_i - \sum_{X_j \in N(X_i)} w_{ij} Y_j \Big\|^2 \tag{7}$$

where $Y$ is subject to the two constraints $\sum_{i=1}^{n} Y_i = 0$ and $\sum_{i=1}^{n} Y_i Y_i^T = n I$ (normalized unit covariance); $0$ is a column vector of zeros and $I$ is an identity matrix. The two constraints remove the translational and rotational degrees of freedom, respectively. The weighted cost function in Eq. (7) can be regarded as a generalization of that of the classical LLE algorithm. It can be written in the quadratic form

$$\varepsilon(Y) = \sum_{ij} \tilde{M}_{ij} Y_i^T Y_j \tag{8}$$

where $\tilde{M} = [\tilde{M}_{ij}]_{n \times n}$ is a sparse matrix. Based on the weight matrix $W$ and the density weights, this matrix is expressed as

$$\tilde{M} = P (I - W)^T (I - W) = P M \tag{9}$$

where $P = \mathrm{diag}(p_1, p_2, \ldots, p_n)$ and $M = (I - W)^T (I - W)$ as in Eq. (4).

55

By the Rayleigh-Ritz theorem [10], Eq. (7) can be minimized by finding the eigenvectors associated with the smallest (nonzero) eigenvalues of the sparse matrix $\tilde{M}$, which is equivalent to solving the generalized eigenvalue problem

$$M a = \lambda P^{-1} a. \tag{10}$$

The weighted LLE algorithm is summarized as follows:

Step 1: For each data point $X_i$, use the $k$-nearest-neighbor algorithm to determine the $k$ nearest neighbors of $X_i$ in Euclidean space.

Step 2: Compute the normalized density value for every point.

Step 3: Compute the low-dimensional embedding $Y$ for $X$ that best preserves the local geometry by minimizing the weighted cost function in Eq. (7).
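The whole procedure can be sketched as below, under our own naming assumptions (`weighted_lle`, `beta`, `reg`). The generalized problem $Ma = \lambda P^{-1}a$ is solved here through the equivalent symmetric form $P^{1/2} M P^{1/2} b = \lambda b$ with $a = P^{1/2} b$, a standard change of variables rather than anything prescribed by the paper.

```python
import numpy as np

def weighted_lle(X, k=10, d=2, beta=1.0, reg=1e-3):
    """Weighted LLE sketch: density weights p_i down-weight outliers
    in the embedding cost of Eq. (7)."""
    n = X.shape[0]
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    np.fill_diagonal(d2, np.inf)
    nbrs = np.argsort(d2, axis=1)[:, :k]
    # Step 2: heat-kernel local densities, normalized to p (Section III).
    D = np.array([np.exp(-d2[i, nbrs[i]] / beta).sum() for i in range(n)])
    p = D / D.sum()
    # Classical LLE reconstruction weights (rows sum to 1).
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[nbrs[i]] - X[i]
        G = Z @ Z.T
        G += reg * max(np.trace(G), 1e-12) * np.eye(k)
        w = np.linalg.solve(G, np.ones(k))
        W[i, nbrs[i]] = w / w.sum()
    # Step 3: solve M a = lambda P^{-1} a via the symmetric form
    # P^{1/2} M P^{1/2} b = lambda b, then map back a = P^{1/2} b.
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    Ph = np.diag(np.sqrt(p))
    vals, vecs = np.linalg.eigh(Ph @ M @ Ph)
    a = Ph @ vecs
    return a[:, 1:d + 1]   # drop the trivial bottom eigenvector
```

Because $P^{1/2} M P^{1/2}$ is symmetric positive semi-definite, `eigh` returns eigenvalues in ascending order, so the columns after the first correspond to the smallest nonzero eigenvalues.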

IV. EXPERIMENTS AND RESULTS

In this section we design a series of experiments to test the effectiveness of the proposed approach on a noisy S-curve manifold and on a plant leaf dataset. Gaussian noise is added to the S-curve data, and the noisy data points are mapped onto a low-dimensional space by our weighted LLE. In the plant leaf classification experiment, we use the $k$-nearest-neighbor classifier to classify the plant leaves.

A. Noise Influence on S-Curve Data

The first experiment is on the S-curve. We take 1500 points arbitrarily from the S-curve. These points are unlabeled; that is, no label information $C_i$, $C_j$ is used for any two points $X_i$ and $X_j$. We then apply classical LLE and our weighted LLE to map the data; the neighborhood size $k$ is set to 15 and the tuning parameter $\alpha$ is set to 0.5. Fig. 2 shows the 2-D visualization results using LLE and weighted LLE. It can be clearly seen from Fig. 2 that for

(A) Mapping result by LLE. (B) Mapping result by weighted LLE.

Fig. 2. Mapping results on the S-curve by LLE and weighted LLE.

non-noisy data, the mapping results of LLE and weighted LLE are almost the same.

These 2-D visualization results verify that our method is by and large feasible.

B. Plant Leaf Classification

We use the Swedish leaf dataset [11], which contains isolated leaves from 15 different Swedish tree species, with 75 leaves per species. The original Swedish leaf images contain footstalks, which are not suitable characteristics for robust leaf shape classification, since the length and orientation of the footstalks may heavily depend on the collection process.

Fig. 3. Typical images from the Swedish leaf dataset, one image per species. Note that some species are quite similar, e.g., the second and the fourth species. (A) The original Swedish leaf dataset. (B) The processed Swedish leaf dataset, with the leaf petioles erased.

Fig. 3 shows some leaf images. Though the footstalks might provide some discriminative information between classes, we regard them as a kind of noise and cut them off. Fig. 3(B) shows the preprocessed leaf images corresponding to Fig. 3(A).

We test the proposed algorithm on this new dataset and report the results below.

We compare the proposed method with LLE, locality preserving projections (LPP) [12], and the inner-distance shape context (IDSC). LPP is a linearized version of LLE that aims to preserve the local structure of the image space. IDSC uses the inner-distance to build shape descriptors; the inner-distance is articulation-insensitive and well suited to complicated shapes with part structures, and it has demonstrated excellent retrieval results in comparison with several other shape algorithms [13].

First, we resize all leaf samples to 64×64 pixels and extract one HOG descriptor per sample. Images that are not square are padded with the background color to form squares before resizing. We randomly split the dataset into two parts, one for training and one for testing. The first experiment uses 10 training


samples per species and the second uses 15 training samples per species. The procedure is repeated 100 times. Table I lists the results of all experiments. S_ORI and S_NO denote the experiments on the original and the modified Swedish datasets, respectively.
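The preprocessing described above (pad to square with background color, resize to 64×64, extract HOG) can be sketched as follows. This is not the exact descriptor used in the paper: `pad_to_square`, `simple_hog`, the cell size, and the bin count are our own assumptions, and the HOG variant below omits block normalization for brevity.

```python
import numpy as np

def pad_to_square(img, fill=255):
    """Pad a grayscale leaf image with background color to a square."""
    h, w = img.shape
    s = max(h, w)
    out = np.full((s, s), fill, dtype=img.dtype)
    out[(s - h) // 2:(s - h) // 2 + h, (s - w) // 2:(s - w) // 2 + w] = img
    return out

def simple_hog(img, cell=8, bins=9):
    """Minimal HOG-style descriptor: per-cell histograms of gradient
    orientations weighted by gradient magnitude (no block normalization)."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180      # unsigned orientation
    h, w = img.shape
    feats = []
    for i in range(0, h - cell + 1, cell):
        for j in range(0, w - cell + 1, cell):
            m = mag[i:i + cell, j:j + cell].ravel()
            a = ang[i:i + cell, j:j + cell].ravel()
            hist, _ = np.histogram(a, bins=bins, range=(0, 180), weights=m)
            feats.append(hist)
    f = np.concatenate(feats)
    n = np.linalg.norm(f)
    return f / n if n > 0 else f
```

For a 64×64 image with 8×8 cells and 9 bins, the descriptor has 8·8·9 = 576 dimensions; these vectors would then feed the dimensionality-reduction step.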

Compared with LLE, LPP, and IDSC, the experiments demonstrate that our approach performs excellently on the Swedish dataset; the method efficiently captures leaf shape information.


TABLE I
THE MAXIMAL AVERAGE CLASSIFICATION RATES WITH STANDARD DEVIATIONS OF LLE, LPP, IDSC, AND WEIGHTED LLE

| Algorithm | Average | Fluctuation | Variance |
| --- | --- | --- | --- |
| LLE | 89.35% | 3.31% | 2.33E-04 |
| LPP | 90.27% | 3.23% | 4.48E-05 |
| IDSC | 92.73% | 3.52% | 2.34E-04 |
| Weighted LLE | 93.36% | 3.23% | 2.38E-05 |

V. CONCLUSION

In this paper, we proposed a weighted LLE for dimensionality reduction. The proposed method uses discriminant information to guide the extraction of intrinsic low-dimensional features. Its advantage is borne out by comparison with other widely used methods: in the experiments on the plant leaf image database, our method consistently outperforms the competing methods. Several directions remain for future work. The HOG descriptor is sensitive to petiole orientation, while the shape of the petiole carries some species information, so pre-processing to normalize the petiole orientation of all images would let HOG capture as much leaf information as possible. In addition, although weighted LLE is an effective dimensionality reduction method, it is defined only on the training data points, and how to evaluate the mapping on novel test points remains unclear. In future work we will improve the proposed method and further raise its recognition performance.

REFERENCES

[1] S. T. Roweis and L. K. Saul, "Nonlinear dimensionality reduction by locally linear embedding," Science, vol. 290, no. 5500, pp. 2323-2326, 2000.

[2] A. Hadid and M. Pietikainen, "Efficient locally linear embeddings of imperfect manifolds," in Proc. Third International Conference on Machine Learning and Data Mining in Pattern Recognition, Leipzig, Germany, pp. 188-201, 2003.

[3] Z. Zhang and H. Zha, "Local linear smoothing for nonlinear manifold learning," Tech. Rep. CSE-03-003, Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA, USA, 2003.

[4] J. Park, Z. Zhang, H. Zha, and R. Kasturi, "Local smoothing for manifold learning," in Proc. CVPR'04, vol. 2, pp. 452-459, 2004.

[5] H. Chang and D. Y. Yeung, "Robust locally linear embedding," Pattern Recognition, vol. 39, no. 6, pp. 1053-1065, 2006.

[6] M. Hein and M. Maier, "Manifold denoising," in Advances in Neural Information Processing Systems 20, Cambridge, MA, pp. 561-568, 2006.

[7] Z. Y. Zhang and J. Wang, "MLLE: Modified locally linear embedding using multiple weights," in B. Scholkopf, J. C. Platt, and T. Hoffman (Eds.), Advances in Neural Information Processing Systems, vol. 19, MIT Press, Cambridge, MA, pp. 171-181, 2007.

[8] J. Yin, D. Hua, and Z. Zhou, "Noisy manifold learning using neighborhood smoothing embedding," Pattern Recognition Letters, vol. 29, pp. 1613-1620, 2008.

[9] Y. Pan, S. S. Ge, and A. Al Mamun, "Weighted locally linear embedding for dimension reduction," Pattern Recognition, vol. 42, pp. 798-811, 2009.

[10] M. Balasubramanian and E. L. Schwartz, "The LLE algorithm and topological stability," Science, vol. 295, no. 5552, 2002.

[11] O. Söderkvist, "Computer Vision Classification of Leaves from Swedish Trees," Master Thesis, Linköping University, 2001.

[12] X. He and P. Niyogi, "Locality preserving projections," in Proc. Conference on Advances in Neural Information Processing Systems, 2003.

[13] H. Ling and D. Jacobs, "Using the inner distance for classification of articulated shapes," in Proc. IEEE International Conference on Computer Vision and Pattern Recognition, San Diego, CA, USA, 2005.
