

A Quantitative and Comparative Assessment of Unmixing-Based Feature Extraction Techniques for Hyperspectral Image Classification

Inmaculada Dópido, Alberto Villa, Member, IEEE, Antonio Plaza, Senior Member, IEEE, and Paolo Gamba, Senior Member, IEEE

Abstract—Over the last years, many feature extraction techniques have been integrated in processing chains intended for hyperspectral image classification. In the context of supervised classification, it has been shown that the good generalization capability of machine learning techniques such as the support vector machine (SVM) can still be enhanced by an adequate extraction of features prior to classification, thus mitigating the curse of dimensionality introduced by the Hughes effect. Recently, a new strategy for feature extraction prior to classification based on spectral unmixing concepts has been introduced. This strategy has shown success when the spatial resolution of the hyperspectral image is not enough to separate different spectral constituents at a sub-pixel level. Another advantage over statistical transformations such as principal component analysis (PCA) or the minimum noise fraction (MNF) is that unmixing-based features are physically meaningful since they can be interpreted as the abundance of spectral constituents. In turn, previously developed unmixing-based feature extraction chains do not include spatial information. In this paper, two new contributions are proposed. First, we develop a new unmixing-based feature extraction technique which integrates the spatial and the spectral information using a combination of unsupervised clustering and partial spectral unmixing. Second, we conduct a quantitative and comparative assessment of unmixing-based versus traditional (supervised and unsupervised) feature extraction techniques in the context of hyperspectral image classification. Our study, conducted using a variety of hyperspectral scenes collected by different instruments, provides practical observations regarding the utility and type of feature extraction techniques needed for different classification scenarios.

Index Terms—Hyperspectral image classification, spatial-spectral integration, spectral unmixing, support vector machines (SVMs), unmixing-based feature extraction.

Manuscript received July 19, 2011; revised October 24, 2011; accepted November 03, 2011. Date of publication April 09, 2012; date of current version May 23, 2012. This work was supported by the European Community's Marie Curie Research Training Networks Programme under reference MRTN-CT-2006-035927, Hyperspectral Imaging Network (HYPER-I-NET), by the Spanish Ministry of Science and Innovation (HYPERCOMP/EODIX project, reference AYA2008-05965-C04-02), and by the Junta de Extremadura (local government) under project PRI09A110.

I. Dópido is with the Hyperspectral Computing Laboratory, University of Extremadura, Cáceres, 10071 Extremadura, Spain.

A. Plaza is with the Hyperspectral Computing Laboratory, University of Extremadura, Cáceres, 10071 Extremadura, Spain (corresponding author, e-mail: [email protected]).

A. Villa is with the GIPSA-Lab, Signal and Image Department, Grenoble Institute of Technology, INP, 38042 Grenoble, France, and also with the Faculty of Electrical and Computer Engineering, University of Iceland, 101 Reykjavik, Iceland.

P. Gamba is with Telecommunications and Remote Sensing Laboratory, University of Pavia, 27100 Pavia, Italy.

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/JSTARS.2011.2176721

I. INTRODUCTION

THE rich spectral information available in remotely sensed hyperspectral images allows for the possibility to distinguish between spectrally similar materials [1]. However, supervised classification of hyperspectral images is a very challenging task due to the generally unfavorable ratio between the (large) number of spectral bands and the (limited) number of training samples available a priori, which results in the Hughes phenomenon [2]. As shown in [3], when the number of features considered for classification is larger than a threshold, the classification accuracy starts to decrease. The application of methods originally developed for the classification of lower dimensional data sets (such as multispectral images) therefore provides poor results when applied to hyperspectral images, especially in the case of small training sets [4]. On the other hand, the collection of reliable training samples is very expensive in terms of time and finance, and the possibility to exploit large ground truth information is not common [5]. To address this issue, a dimensionality reduction step is often performed prior to the classification process, in order to bring the information in the original space (which in the case of hyperspectral data is almost empty [4]) to the right subspace which allows separating the classes by discarding information that is useless for classification purposes. Several feature extraction techniques have been proposed to reduce the dimensionality of the data prior to classification, thus mitigating the Hughes phenomenon. These methods can be unsupervised (if no a priori information is available) or supervised (if available training samples are used to project the data onto a classification-optimized subspace [6], [7]). Classic unsupervised techniques include principal component analysis (PCA) [8], the minimum noise fraction (MNF) [9], or independent component analysis (ICA) [10]. Supervised approaches comprise discriminant analysis for feature extraction (DAFE), decision boundary feature extraction (DBFE), and non-parametric weighted feature extraction (NWFE), among many others [4], [11].

In the context of supervised classification, kernel methods have been widely used due to their insensitivity to the curse of dimensionality [12]. However, the good generalization capability of machine learning techniques such as the support vector machine (SVM) [13] can still be enhanced by an adequate extraction of relevant features to be used for classification purposes [14], especially if limited training sets are available a priori. Recently, we have investigated this issue by developing



Fig. 1. Block diagram illustrating an unsupervised clustering followed by MTMF technique for unmixing-based feature extraction.

Fig. 2. Block diagram illustrating a supervised clustering followed by MTMF technique for unmixing-based feature extraction.

a new set of feature extraction techniques based on spectral unmixing concepts [15]. These techniques are intended to take advantage of spectral unmixing models [16] in the characterization of training samples, thus including additional information about sub-pixel composition that can be exploited at the classification stage. Another advantage of unmixing-based techniques over statistical transformations such as PCA, MNF or ICA is the fact that the features derived by spectral unmixing are physically meaningful since they can be interpreted as the abundance of spectrally pure constituents. Although unmixing-based feature extraction offers an interesting alternative to classic (supervised and unsupervised) approaches, several important aspects deserve further attention [17]:

1) First, the unmixing-based chains discussed in [15] do not include spatial information, which is an important source of information since hyperspectral images exhibit spatial correlation between image features.

2) Second, the study in [15] suggested that partial unmixing [18], [19] could be an effective solution to deal with the likely fact that not all pure spectral constituents in the scene (needed for spectral unmixing purposes) are known a priori, but a more exhaustive investigation of partial unmixing (particularly in combination with spatial information) is needed.

3) Finally, the number of features to be extracted prior to classification was set in [15] to an empirical value given by the intrinsic dimensionality of the input data. However, in the context of supervised feature extraction the number of features to be retained is probably linked to the characteristics of the training set rather than the full hyperspectral image. Hence, a detailed investigation of the optimal number of features that need to be extracted prior to classification is highly desirable.

In this paper, we address the aforementioned issues by means of two highly innovative contributions. First, a new feature extraction technique exploiting sub-pixel information is proposed. This approach integrates spatial and spectral information using unsupervised clustering in order to define spatially homogeneous regions prior to the partial unmixing stage. A second contribution of this work is a detailed investigation on the issue of how many (and what type of) features should be extracted prior to SVM-based classification of hyperspectral data. For this purpose, different types of (classic and unmixing-based) feature extraction strategies, both unsupervised and supervised in nature, are considered.

The remainder of the paper is organized as follows. Section II

describes a new unmixing-based feature extraction technique which integrates the spatial and the spectral information. A supervised and an unsupervised version of this technique are developed.


TABLE I. NUMBER OF PIXELS IN EACH GROUND-TRUTH CLASS IN THE FOUR CONSIDERED HYPERSPECTRAL IMAGES. THE NUMBER OF TRAINING AND TEST PIXELS USED IN OUR EXPERIMENTS CAN BE DERIVED FROM THIS TABLE.

Fig. 3. (a) False color composition of the AVIRIS Indian Pines scene. (b) Ground truth-map containing 16 mutually exclusive land-cover classes (right).

Section III describes several representative hyperspectral scenes which have been used in our experiments. This includes three scenes collected by the Airborne Visible Infra-Red Imaging Spectrometer (AVIRIS) [20] system over the regions of Indian Pines, Indiana, Kennedy Space Center, Florida, and Salinas Valley, California, and also a hyperspectral scene collected by the Reflective Optics Spectrographic Imaging System (ROSIS) [21] over the city of Pavia, Italy. Section IV provides an experimental comparison of the proposed feature extraction chains with regards to other classic and unmixing-based approaches, using the four considered hyperspectral image scenes. Section V concludes the paper with some remarks and hints at plausible future research lines.

II. A NEW UNMIXING-BASED FEATURE EXTRACTION TECHNIQUE

This section is organized as follows. In Section II-A we fix notation and describe some general concepts about linear spectral unmixing, adopted as our baseline mixture model due to its simplicity and computational tractability. Section II-B describes an unsupervised feature extraction strategy based on spectral unmixing concepts. This strategy first performs k-means clustering, searching for as many classes as the number of features that need to be retained. The centroids of each cluster are considered as the endmembers, and then the features are obtained by applying spectral unmixing for abundance estimation. The main objective of this chain is to solve problems highlighted by endmember extraction based algorithms, which are sensitive to outliers and pixels with extreme values of reflectance. By using an unsupervised clustering method, the endmembers extracted are expected to be more spatially significant. Finally, Section II-C describes a modified version of the feature extraction technique in which the endmembers are searched in the available training set instead of the entire original image. Here, our assumption is that training samples may better represent the available land cover classes in the subsequent classification process.


Fig. 4. (a) False color composition of an AVIRIS hyperspectral image comprising several agricultural fields in Salinas Valley, California. (b) Ground truth-map containing 15 mutually exclusive land-cover classes. (c) Photographs taken at the site during data collection.

A. Linear Spectral Unmixing

Let us denote a remotely sensed hyperspectral scene with $n$ bands by $\mathbf{X}$, in which the pixel at the discrete spatial coordinates $(i,j)$ of the scene is represented by a vector $\mathbf{x}(i,j) = [x_1(i,j), x_2(i,j), \ldots, x_n(i,j)] \in \mathbb{R}^n$, where $\mathbb{R}$ denotes the set of real numbers in which the pixel's spectral response $x_k(i,j)$ at sensor channels $k = 1, \ldots, n$ is included. Under the linear mixture model assumption, each pixel vector in the original scene can be modeled using the following expression:

$$\mathbf{x}(i,j) = \sum_{z=1}^{p} \Phi_z(i,j) \cdot \mathbf{e}_z + \mathbf{n}(i,j) \quad (1)$$

where $\mathbf{e}_z$ denotes the spectral response of endmember $z$, $\Phi_z(i,j)$ is a scalar value designating the fractional abundance of the endmember $z$ at the pixel $\mathbf{x}(i,j)$, $p$ is the total number of endmembers, and $\mathbf{n}(i,j)$ is a noise vector. An unconstrained solution to (1) is simply given by the following expression [22]:

$$\hat{\boldsymbol{\Phi}}(i,j) = (\mathbf{E}^{T}\mathbf{E})^{-1}\mathbf{E}^{T}\mathbf{x}(i,j) \quad (2)$$

where $\mathbf{E} = [\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_p]$ and $\hat{\boldsymbol{\Phi}}(i,j) = [\hat{\Phi}_1(i,j), \ldots, \hat{\Phi}_p(i,j)]^{T}$ is the estimated abundance vector. Two physical constraints are generally imposed on the model described in (1): the abundance non-negativity constraint (ANC), i.e., $\Phi_z(i,j) \geq 0$, and the abundance sum-to-one constraint (ASC), i.e., $\sum_{z=1}^{p} \Phi_z(i,j) = 1$ [23]. Imposing the ASC constraint results in the following optimization problem:

$$\min_{\boldsymbol{\Phi}(i,j)} \left\| \mathbf{x}(i,j) - \mathbf{E}\,\boldsymbol{\Phi}(i,j) \right\|^2 \quad \text{subject to} \quad \sum_{z=1}^{p} \Phi_z(i,j) = 1. \quad (3)$$

Similarly, imposing the ANC constraint results in the following optimization problem:


Fig. 5. (a) False color composition of the ROSIS Pavia scene. (b) Ground truth-map containing 9 mutually exclusive land-cover classes. (c) Training set commonly used for the ROSIS Pavia scene.

$$\min_{\boldsymbol{\Phi}(i,j)} \left\| \mathbf{x}(i,j) - \mathbf{E}\,\boldsymbol{\Phi}(i,j) \right\|^2 \quad \text{subject to} \quad \Phi_z(i,j) \geq 0 \;\; \text{for all } z. \quad (4)$$

As indicated in [23], a fully constrained (i.e., ASC-constrained and ANC-constrained) estimate can be obtained in the least-squares sense by solving the optimization problems in (3) and (4) simultaneously. However, in order for such an estimate to be meaningful, it is required that the spectral signatures of all endmembers, i.e., $\{\mathbf{e}_z\}_{z=1}^{p}$, are available a priori, which is not always possible. Such a fully constrained linear spectral unmixing estimate is generally referred to in the literature by the acronym FCLSU. In the case where not all endmember signatures are available in advance, partial unmixing has emerged as a suitable alternative to solve the linear spectral unmixing problem [19].
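To make the estimators in (2)-(3) concrete, the following minimal NumPy sketch (not the authors' implementation; function and variable names are illustrative) computes the unconstrained least-squares abundances of (2) and the sum-to-one constrained solution of (3) for a single pixel; the non-negativity constrained case of (4) additionally requires a non-negative least-squares solver and is omitted here.

```python
# Minimal sketch (assumed notation: E is the n_bands x p endmember matrix,
# x a single pixel spectrum); not the authors' implementation.
import numpy as np

def unconstrained_abundances(E, x):
    """Unconstrained least-squares solution of (2): (E^T E)^{-1} E^T x."""
    return np.linalg.solve(E.T @ E, E.T @ x)

def sum_to_one_abundances(E, x):
    """ASC-constrained least squares (3), solved with a Lagrange multiplier."""
    G_inv = np.linalg.inv(E.T @ E)
    ones = np.ones(E.shape[1])
    a_u = G_inv @ (E.T @ x)                          # unconstrained estimate
    lam = (1.0 - ones @ a_u) / (ones @ G_inv @ ones)
    return a_u + lam * (G_inv @ ones)                # enforce the sum-to-one constraint

# Toy example: 50 bands, 3 endmembers, known abundances
rng = np.random.default_rng(0)
E = rng.random((50, 3))
x = E @ np.array([0.6, 0.3, 0.1]) + 0.001 * rng.standard_normal(50)
print(unconstrained_abundances(E, x))
print(sum_to_one_abundances(E, x))
```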

B. Unsupervised Unmixing-Based Feature Extraction

In this subsection we describe our first approach to design a new unmixing-based feature extraction technique which integrates spatial and spectral information. It can be summarized by the flowchart in Fig. 1. First, we apply the k-means algorithm [24] to the original hyperspectral image. Its goal is to determine a set of $k$ points, called centers, so as to minimize the mean squared distance from each pixel vector to its nearest center. The algorithm is based on the observation that the optimal placement of a center is at the centroid of the associated cluster. It starts with a random initial placement. At each stage, the algorithm moves every center point to the centroid of the set of pixel vectors for which the center is a nearest neighbor according to the spectral angle (SA) [16], and then updates the neighborhood by recomputing the SA from each pixel vector to its nearest center. These steps are repeated until the algorithm converges to a point that is a minimum for the distortion [24]. The output of k-means

is a set of spectral clusters, each made up of one or more spatially connected regions. In order to determine the number of clusters (endmembers) in advance, techniques used to estimate the number of endmembers like the virtual dimensionality (VD) [25] or the hyperspectral subspace identification by minimum error (HySime) [26] can be used. In our experiments we vary the number of clusters in a certain range in order to analyze the impact of this parameter. In fact, our main motivation for using a partial unmixing technique at this point is the fact that the estimation of the number of endmembers in the original image is a very challenging issue. It is possible that the actual number of endmembers in the original image, $p$, is larger than the number of clusters derived by k-means. In this case, in order to unmix the original image we need to address a situation in which not all endmembers may be available a priori. It has been shown in previous work that the FCLSU technique does not provide accurate results in this scenario [15]. In turn, it is also possible that the number of clusters equals or exceeds the actual number of endmembers. In this case, partial unmixing has shown great success [19] in abundance estimation. Following this line of reasoning, we have decided to resort to partial unmixing techniques in this work.

A successful technique to estimate abundance fractions in

such partial unmixing scenarios is mixture-tuned matched filtering (MTMF) [19]—also known in the literature as constrained energy minimization (CEM) [18], [22]—which combines the best parts of the linear spectral unmixing model and the statistical matched filter model while avoiding some drawbacks of each parent method. From matched filtering, it inherits the ability to map a single known target without knowing the other background endmember signatures, unlike the standard linear unmixing model. From spectral mixture modeling, it inherits the leverage arising from the mixed pixel model and the constraints on feasibility including the ASC and ANC requirements. It is essentially a target detection algorithm designed to identify the presence (or absence) of a specified material by producing a score of 1 for pixels wholly covered


TABLE II. OVERALL AND AVERAGE CLASSIFICATION ACCURACY (IN PERCENTAGE) OBTAINED BY THE CONSIDERED CLASSIFICATION SYSTEM FOR DIFFERENT HYPERSPECTRAL IMAGE SCENES USING THE ORIGINAL SPECTRAL INFORMATION, UNSUPERVISED FEATURE EXTRACTION TECHNIQUES, AND SUPERVISED FEATURE EXTRACTION TECHNIQUES. ONLY THE BEST CASE IS REPORTED FOR EACH CONSIDERED FEATURE EXTRACTION TECHNIQUE (WITH THE OPTIMAL NUMBER OF FEATURES IN THE PARENTHESES), AND THE BEST CLASSIFICATION RESULT ACROSS ALL METHODS IN EACH EXPERIMENT IS HIGHLIGHTED IN BOLD TYPEFACE.

by the material of interest, while keeping the average score over an image as small as possible. It uses just one endmember spectrum (that of the target of interest) and therefore behaves as a partial unmixing method that suppresses background noise and estimates the sub-pixel abundance of a single endmember material without assuming the presence of all endmembers in the scene, as is the case with FCLSU. If we assume that $\mathbf{e}_z$ is the endmember to be characterized, MTMF estimates the abundance fraction $\hat{\Phi}_z(i,j)$ of $\mathbf{e}_z$ in a specific pixel vector $\mathbf{x}(i,j)$ of the scene as follows:

$$\hat{\Phi}_z(i,j) = \frac{\mathbf{e}_z^{T}\,\mathbf{R}^{-1}\,\mathbf{x}(i,j)}{\mathbf{e}_z^{T}\,\mathbf{R}^{-1}\,\mathbf{e}_z} \quad (5)$$

where $\mathbf{R}$ is the matrix:

$$\mathbf{R} = \frac{1}{s \cdot l} \sum_{i=1}^{s} \sum_{j=1}^{l} \mathbf{x}(i,j)\,\mathbf{x}(i,j)^{T} \quad (6)$$

with $s$ and $l$ respectively denoting the number of samples and the number of lines in the original hyperspectral image. As shown by Fig. 1, the features resulting from the proposed unmixing-based technique, referred to hereinafter as unsupervised clustering followed by MTMF, are used to train

an SVM classifier with a few randomly selected labeled samples. The classifier is then tested using the remaining labeled samples.
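For readers who want to experiment with this chain, the following sketch outlines it end to end under two stated simplifications: scikit-learn's Euclidean k-means is used instead of the spectral-angle-based clustering described above, and a plain CEM matched filter stands in for the full MTMF implementation. All names are illustrative.

```python
# Sketch of the unsupervised chain in Fig. 1 (illustrative, simplified):
# k-means clustering -> cluster centroids as endmembers -> matched-filter
# abundances per endmember -> stacked features for the SVM stage.
import numpy as np
from sklearn.cluster import KMeans

def cem_abundance(X, e):
    """CEM-style abundance of signature e for every pixel in X (N x n_bands), cf. (5)-(6)."""
    R = (X.T @ X) / X.shape[0]      # sample correlation matrix, cf. (6)
    w = np.linalg.solve(R, e)
    return (X @ w) / (e @ w)        # cf. (5)

def unsupervised_unmixing_features(cube, n_clusters=16, seed=0):
    """cube: (rows, cols, bands) image -> (rows, cols, n_clusters) abundance features."""
    rows, cols, bands = cube.shape
    X = cube.reshape(-1, bands).astype(float)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(X)
    feats = np.stack([cem_abundance(X, e) for e in km.cluster_centers_], axis=1)
    return feats.reshape(rows, cols, n_clusters)
```

The resulting feature cube can then be fed, together with a small set of labeled pixels, to the SVM classifier described in Section IV.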

C. Supervised Unmixing-Based Feature Extraction

Fig. 2 describes a variation of the technique presented in the previous subsection in which the endmembers are extracted from the available (labeled) training samples instead of from the original image. This introduces two main properties with regards to the unsupervised version: 1) the number of endmembers to be extracted is given by the total number of different classes, $c$, in the labeled samples available in the training set, and 2) the endmembers (class centers) are obtained after clustering the training set, which reduces computational complexity significantly. The increase in computational performance comes at the expense of introducing an additional consideration. In this scenario, it is likely that the actual number of endmembers in the original image, $p$, is larger than the number of different classes comprised by the available labeled training samples, $c$. Therefore, in order to unmix the original image we again need to address a partial unmixing problem. Then, as shown by Fig. 2, standard SVM classification is performed on the stack of abundance fractions using randomly selected training samples. Hereinafter, we


Fig. 6. Classification results for the AVIRIS Indian Pines scene (obtained using an SVM classifier with Gaussian kernel, trained with 5% of the available samples). (a) Ground Truth; (b) PCA; (c) ICA; (d) MNF; (e) ; (f) ; (g) NWFE; (h) ; (i) ; (j) .

refer to the feature extraction technique described in Fig. 2 as supervised clustering followed by MTMF.
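A corresponding sketch of the supervised variant is given below. For brevity, the per-class mean spectrum of the training samples is used as the class prototype, whereas the chain described above obtains the prototypes by clustering the training set; names are again illustrative.

```python
# Sketch of the supervised counterpart in Fig. 2 (illustrative, simplified):
# class prototypes come from the labeled training samples, not the whole image.
import numpy as np

def matched_filter_abundance(X, e):
    """CEM-style abundance of signature e for every pixel in X (N x n_bands)."""
    R = (X.T @ X) / X.shape[0]
    w = np.linalg.solve(R, e)
    return (X @ w) / (e @ w)

def supervised_unmixing_features(X_all, X_train, y_train):
    """One abundance feature per labeled class, estimated over all pixels."""
    prototypes = [X_train[y_train == c].mean(axis=0) for c in np.unique(y_train)]
    return np.stack([matched_filter_abundance(X_all, e) for e in prototypes], axis=1)
```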

III. HYPERSPECTRAL DATA SETS

In order to have a fair experimental comparison between the proposed and available feature extraction approaches, several representative hyperspectral data sets are investigated. In this work, we have considered four different images captured by two different sensors: AVIRIS and ROSIS. The images span a wide range of land cover use, from agricultural areas of Indian Pines and Salinas, to urban zones in the town of Pavia and mixed vegetation/urban features in Kennedy Space Center. The number of ground-truth pixels per class for all the considered hyperspectral images is given in Table I. In the following, we briefly describe each of the data sets considered in our study.

A. AVIRIS Indian Pines

The first data set used in our experiments was collected by the AVIRIS sensor over the Indian Pines region in Northwestern Indiana in 1992. This scene, with a size of 145 lines by 145 samples, was acquired over a mixed agricultural/forest area, early in the growing season.


The scene comprises 202 spectral channels in the wavelength range from 0.4 to 2.5 μm, nominal spectral resolution of 10 nm, moderate spatial resolution of 20 meters by pixel, and 16-bit radiometric resolution. After an initial screening, several spectral bands were removed from the data set due to noise and water absorption phenomena, leaving a total of 164 radiance channels to be used in the experiments. For illustrative purposes, Fig. 3(a) shows a false color composition of the AVIRIS Indian Pines scene, while Fig. 3(b) shows the ground-truth map available for the scene, displayed in the form of a class assignment for each labeled pixel, with 16 mutually exclusive ground-truth classes. These data, including ground-truth information, are available online,1 a fact which has made this scene a widely used benchmark for testing the accuracy of hyperspectral data classification algorithms.

B. AVIRIS Salinas Valley

The second AVIRIS data set used in experiments was collected over the Valley of Salinas in Southern California. The full scene consists of 512 lines by 217 samples with 186 spectral bands (after removal of water absorption and noisy bands) from 0.4 to 2.5 μm, nominal spectral resolution of 10 nm, and 16-bit radiometric resolution. It was taken at low altitude with a pixel size of 3.7 meters (high spatial resolution). The data include vegetables, bare soils and vineyard fields. Fig. 4(a) shows a false color composition of the scene and Fig. 4(b) shows the available ground-truth regions for this scene, which cover about two thirds of the entire Salinas scene. Finally, Fig. 4(c) shows some pictures of selected land-cover classes taken on the imaged site at the same time as the data was being collected by the sensor. Of particular interest are the relevant differences in the romaine lettuce classes resulting from different soil cover proportions.

C. AVIRIS Kennedy Space Center

The third data set used in experiments was collected by the AVIRIS sensor over the Kennedy Space Center,2 Florida, in March 1996. The portion of this scene used in our experiments has dimensions of 292 × 383 pixels. After removing water absorption and low SNR bands, 176 bands were used for the analysis. The spatial resolution is 20 meters by pixel. 12 ground-truth classes were available, where the number of pixels in the smallest class is 105 while the number of pixels in the largest class is 761.

D. ROSIS Pavia

The fourth data set used in experiments was collected by the ROSIS optical sensor over the urban area of the University of Pavia, Italy. The flight was operated by the Deutsches Zentrum für Luft- und Raumfahrt (DLR, the German Aerospace Agency) in the framework of the HySens project, managed and sponsored by the European Union. The image size in pixels is 610 × 340, with very high spatial resolution of 1.3 meters per pixel. The number of data channels in the acquired image is 115 (with spectral range from 0.43 to 0.86 μm). Fig. 5(a) shows a false color

1http://dynamo.ecn.purdue.edu/biehl/MultiSpec.
2Available online: http://www.csr.utexas.edu/hyperspectral/data/KSC/.

composite of the image, while Fig. 5(b) shows nine ground-truth classes of interest, which comprise urban features, as well as soil and vegetation features. Finally, Fig. 5(c) shows a commonly used training set directly derived from the ground-truth in Fig. 5(b).

IV. EXPERIMENTAL RESULTS

In this section we conduct a quantitative and comparative analysis of different feature extraction techniques for hyperspectral image classification, including unmixing-based and more traditional (supervised and unsupervised) approaches. The main goal is to use spectral unmixing and classification as complementary techniques, since the latter are more suitable for the classification of pixels dominated by a single land cover class, while the former are devoted to the characterization of mixed pixels. Because hyperspectral images often contain areas with both pure and mixed pixels, the combination of these two analysis techniques provides a synergistic data processing approach that has been explored in previous contributions [15], [27]–[30]. Before describing the results obtained in experimental validation, we first describe the feature extraction techniques that will be used in our comparison in Section IV-A. Then, Section IV-B describes the adopted supervised classification system and the experimental setup. Finally, Section IV-C discusses the obtained results in comparative fashion.

A. Feature Extraction Techniques Used in the Comparison

In our classification system, relevant features are first extracted from the original image. Several types of input features have been considered in the classification experiments conducted in this work. In the following, we provide an overview of the techniques used to extract features from the original hyperspectral data. A detailed mathematical description of these techniques is out of the scope of this work, since most of them are algorithms well known in the remote sensing literature, so only a short description of the conceptual basics for each method is given here. The techniques are divided into unsupervised approaches, if the algorithm is applied on the whole data cube, or supervised techniques, if the information associated with the training set of the data is somehow exploited during the feature extraction step.

1) Unsupervised Feature Extraction Techniques: We consider five unsupervised feature extraction techniques in this work.

Three of them are classic algorithms available in the literature (PCA, MNF and ICA), and the two remaining ones are based on the exploitation of sub-pixel information through spectral unmixing, including the best unsupervised method in [15] and a newly proposed technique in this work. A brief summary of the considered unsupervised techniques follows:

• Principal component analysis (PCA) is an orthogonal linear transformation which projects the data into a new coordinate system, such that the greatest amount of variance of the original data is contained in the first principal components [11]. The resulting components are uncorrelated (a minimal sketch of this kind of projection is given after this list).

• Minimum noise fraction (MNF) differs from PCA in the fact that MNF ranks the obtained components according to their signal-to-noise ratio [9].


Fig. 7. Classification results for the AVIRIS Salinas Valley scene (obtained using an SVM classifier with Gaussian kernel, trained with 2% of the available samples). (a) Ground Truth; (b) PCA; (c) ICA; (d) MNF; (e) ; (f) ; (g) NWFE; (h) ; (i) ; (j) .

• Independent component analysis (ICA) tries to find components as statistically independent as possible, minimizing all the dependencies in the order up to fourth [10]. There are several strategies that can be adopted to define independence (e.g., minimization of mutual information, maximization of non-Gaussianity, etc.). In this work, among several possible implementations, we have chosen JADE [31], which provides a good tradeoff between performance and computational complexity when used for dimensionality reduction of hyperspectral images.

• Unsupervised Mixture-Tuned Matched Filtering, which first performs an MNF-based dimensionality reduction and then applies the MTMF method in order to estimate fractional abundances of spectral endmembers extracted from the original data using the orthogonal subspace projection (OSP) algorithm [32]. In [15] it is shown that MTMF outperforms other techniques for abundance estimation such as unconstrained and fully constrained linear spectral unmixing (FCLSU) [23], since it can provide meaningful abundance maps by means of partial unmixing in case not all endmembers are available a priori.

• Unsupervised Clustering followed by Mixture-Tuned Matched Filtering, developed in this work and intended to solve the problems highlighted by endmember extraction algorithms, which are sensitive to outliers and pixels with extreme values of reflectance. By using an unsupervised clustering method such as k-means to extract features, the endmembers extracted are expected to be more spatially significant.

• Unsupervised Fuzzy Clustering is an extension of the k-means clustering method [33] which provides soft clusters, where a particular pixel has a degree of membership in each cluster. This strategy is faster than the two previous strategies as it does not include a spectral unmixing step.
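As referenced in the PCA item above, a minimal sketch of the kind of orthogonal projection performed by PCA is given here (an assumed SVD-based implementation, not the toolbox used by the authors):

```python
# Minimal PCA sketch: project a hyperspectral cube onto its first n_comp
# principal components (computed from the mean-centered data via SVD).
import numpy as np

def pca_features(cube, n_comp=10):
    rows, cols, bands = cube.shape
    X = cube.reshape(-1, bands).astype(float)
    Xc = X - X.mean(axis=0)                          # remove the mean spectrum
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return (Xc @ Vt[:n_comp].T).reshape(rows, cols, n_comp)
```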

2) Supervised Feature Extraction Techniques: We consider several supervised feature extraction techniques in this work. The first techniques considered were discriminant analysis for feature extraction (DAFE) and decision boundary feature extraction (DBFE) [4]. However, DBFE could not be applied in the case of very limited training sets since it requires a number of samples (for each class) bigger than the number of dimensions of the original data set in order to estimate the statistics used to project the data. As will be shown in the next sections, this requirement was not satisfied for most of the experiments carried out in this work. In turn, the results provided by DAFE were poor compared to the other methods for a low number of training samples, hence we did not include them in our comparison. As a result, the supervised methods adopted in our comparison were NWFE and three sub-pixel techniques based on estimating fractional abundances. Two of them were already presented in [15], and the third one is the technique developed in this work. Although a number of supervised feature extraction techniques have been available in the literature [4], according to our experiments the advantages provided by supervised techniques are not always evident, especially in the case of limited training sets [34]. A brief summary of the considered supervised techniques follows:

• Non-parametric weighted feature extraction (NWFE) focuses on selecting samples near the eventual decision boundaries that best separate the classes. The main ideas of NWFE are: 1) assigning different weights to every training sample in order to compute local means, and 2) defining non-parametric between-class and within-class scatter matrices to perform feature extraction [4].

• Supervised Mixture-Tuned Matched Filtering is equivalent to its unsupervised counterpart, but assuming that the pure spectral components are searched by the OSP endmember extraction algorithm in the training set instead of in the entire hyperspectral image. Our assumption is that training samples may better represent the available land cover classes in the subsequent classification process [15].

• Averaged Mixture-Tuned Matched Filtering is equivalent to the previous technique, but assuming that the representative spectral signatures are obtained as the average of the signatures belonging to each class in the training set (here, the number of components to be retained by the MNF applied prior to the MTMF is varied in a given range). In this case, the OSP algorithm is not used to extract the spectral signatures, which are obtained in supervised fashion from the available training samples [15].

• Supervised Clustering followed by Mixture-Tuned Matched Filtering, developed in this work and acting as the supervised counterpart of the unsupervised clustering chain described above. It mainly differs with regards to that technique in the fact that the clustering process is performed in the training samples, and not in the full hyperspectral image.

B. Supervised Classification System and Experimental Setup

In our supervised classification system, different types of input features are extracted from the original hyperspectral image prior to classification. In addition to the unsupervised and supervised feature extraction techniques described in the previous subsection, we also use the (full) original spectral information available in the hyperspectral data as input to the proposed classification system. In the latter case, the dimensionality of the input features used for classification equals $n$, the number of spectral bands in the original data set. When using feature extraction techniques, the number of features was varied empirically in our experiments and only the best results are reported. In all cases, a supervised classification process was performed using the SVM classifier with Gaussian kernel (observed to perform better than other tested kernels, such as polynomial or linear). Kernel parameters were optimized by a grid search procedure, and the optimal parameters were selected using 10-fold cross-validation (selected after testing different configurations). The LIBSVM library3 was used in our experiments. In order to evaluate the ability of the tested methods to perform under training sets with different numbers of samples, we adopted the following training-test configurations:

• In our experiments with the AVIRIS Indian Pines data set in Fig. 3(a), we randomly selected 5% and 15% of the pixels in each ground-truth class in Table I and used them to build the training set. The remaining pixels were used as test pixels.

• In our experiments with the AVIRIS Salinas data set in Fig. 4(a), in which the size of the smaller classes is bigger when compared to those in the AVIRIS Indian Pines data set, we decided to reduce the training sets even more and selected only 2% and 5% of the available ground-truth pixels in Table I for training purposes.

3http://www.csie.ntu.edu.tw/cjlin/libsvm/.


• In our experiments with the AVIRIS Kennedy Space Center data set, we decided to reduce the training sets even more and selected only 1% and 5% of the available ground-truth pixels in Table I for training purposes.

• Finally, in our experiments with the ROSIS Pavia data set in Fig. 5(a), we used the training set in Fig. 5(c) and also a different training set made up of only 50 pixels for each class in Table I for comparative purposes.

Based on the aforementioned training sets, the overall (OA) and average (AA) classification accuracies were computed over the remaining test samples for each data set. This experiment was repeated ten times to guarantee statistical consistency, and the average results after ten runs are provided. An assessment of the obtained results is reported in the following subsection.
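The evaluation protocol can be summarized by the following sketch, which uses scikit-learn rather than the LIBSVM interface employed by the authors, with illustrative parameter grids and fractions: a stratified per-class random split, a grid-searched RBF SVM with cross-validation, and OA/AA computed from the confusion matrix over the test pixels.

```python
# Sketch of the experimental protocol (illustrative grids and fractions).
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix

def evaluate_features(X, y, train_fraction=0.05, seed=0):
    """X: (n_pixels, n_features) extracted features, y: class labels."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, train_size=train_fraction, stratify=y, random_state=seed)
    grid = {"C": [1, 10, 100, 1000], "gamma": [1e-3, 1e-2, 1e-1, 1]}
    svm = GridSearchCV(SVC(kernel="rbf"), grid, cv=10).fit(X_tr, y_tr)
    cm = confusion_matrix(y_te, svm.predict(X_te))
    oa = np.trace(cm) / cm.sum()                    # overall accuracy
    aa = np.mean(np.diag(cm) / cm.sum(axis=1))      # average (per-class) accuracy
    return oa, aa

# Averaging over ten random seeds mirrors the "ten runs" protocol:
# scores = [evaluate_features(X, y, 0.05, s) for s in range(10)]
```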

C. Analysis and Discussion of Results

Table II shows the OA and AA (in percentage) obtained by the considered classification system for different hyperspectral scenes using the original spectral information as input feature, and also the features provided by the unsupervised and supervised feature extraction techniques described in Section IV-A. It is important to emphasize that, in the tables, we only report the best case (meaning the one with highest OA) for each considered feature extraction technique, after testing numbers of extracted features ranging from 5 to 50. In all cases, this range was sufficient to observe a decline in classification OA after a certain number of features, so the number given in the parentheses in the tables corresponds to the optimal number of features for each considered feature extraction technique (in the case of the original spectral information, the number in the parentheses corresponds to the number of bands of the original hyperspectral image). Finally, in order to outline the best feature extraction technique in each considered experiment, we highlight in bold typeface the best classification result observed across all tested feature extraction methods. In previous work [15], the statistical significance of some of the processing chains considered in Table II was assessed using McNemar's test [35], concluding that the differences between the tested methods were statistically significant. Other similar tests are also available in the literature [36]. According to our experimental results, the same observations regarding statistical significance apply to the new processing chains included in this work.

From Table II, several conclusions can be drawn. First and

foremost, we can observe that the use of supervised techniques for feature extraction is not always beneficial to improve the OA and AA, especially in case of limited training sets and statistical feature extraction approaches. For example, NWFE exhibits better results when compared to traditional unsupervised techniques such as PCA or ICA. However, DAFE (not included in the tables) exhibited quite poor results. The low performances obtained by DAFE should therefore be attributed to the very small size of the training set and to the fact that the land cover classes can be spectrally very close (as in the case of the AVIRIS Indian Pines scene), thus making it very difficult to separate them by using spectral means and covariance matrices. Moreover, the importance of integrating the additional information provided by the training samples is strictly connected with the nature of the considered approach. This can be noticed when comparing

the MTMF versus the CMTMF chains. In the former case, the best results are generally provided by the supervised approach

since the supervised strategy for extracting spectral endmembers using the OSP approach benefits from the reduction of outliers and pixels with extreme values of reflectance, which negatively affect this endmember extraction algorithm. In the latter case, the best results are generally provided by the unsupervised approach, due to the fact that, when trying to identify clusters in a very small training set, several problems appear, such as the bad conditioning of matrices when computing the inverse (in the k-means clustering step) or the eventual selection of very similar clusters, leading to redundant information in class prototyping which ultimately affects the subsequent partial unmixing step and the obtained classification performances. In addition to the aforementioned observations, we emphasize that the supervised version derives the endmembers (via clustering) from a limited training set, while the unsupervised version derives the endmembers from the whole hyperspectral image. The former approach has the advantage of computational complexity, as the search for endmembers is only conducted in the small training set, but this comes at the expense of reduced modelling accuracy, as expected. Although in previous work we developed supervised unmixing-based chains in the hope of addressing these problems, our experimental results in this work indicate that CMTMF techniques in general, and the unsupervised clustering-based chain in particular (an unsupervised approach as opposed to a supervised one), perform a better job in characterizing the sub-pixel information prior to classification of hyperspectral data. Finally, it is also worth noting the good performance achieved in all experiments by MNF, another unsupervised feature extraction strategy. Figs. 6–8 show the results obtained in some of the experiments.

An arising question at this point is whether there is any

advantage of using unmixing chains versus the MNF transform. Since both feature extraction methods are unsupervised, with similar computational complexity and leading to similar classification results, it is not clear from the context if there exists any advantage of using an unmixing-based technique over a well-known statistical method such as the MNF. In order to address this issue, Fig. 9 shows the first nine components extracted by the MNF from the ROSIS Pavia University image. These components are ordered in terms of signal-to-noise ratio, with the first component providing the maximum amount of information. Here, noise can be clearly appreciated in the last three components. In turn, Fig. 10 shows the components extracted for the same image by the proposed unmixing-based technique. The components are arranged in no specific order, as spectral unmixing assigns the same priority to each endmember when deriving the associated abundance map. As shown by Fig. 10, the components provided by the unmixing-based technique can be interpreted in a physical manner (as the abundances of each spectral constituent in the scene) and, most importantly, these components can be related to the ground-truth classes in Fig. 5(b). This suggests that unmixing-based chains can provide an alternative strategy to classic feature extraction chains such as the MNF, with three main differences:

1) Unmixing-based feature extraction techniques incorporate information about mixed pixels, which are the dominant


Fig. 8. Classification results for the ROSIS Pavia University scene (obtained using an SVM classifier with Gaussian kernel, trained with 50 pixels of each available ground-truth class). (a) Ground Truth; (b) PCA; (c) ICA; (d) MNF; (e) ; (f) ; (g) NWFE; (h) ; (i) ; (j) .

type of pixel in hyperspectral images. Quite the opposite, standard feature extraction techniques such as the MNF do not incorporate the pure/mixed nature of the pixels in hyperspectral data, disregarding a source of information that could be useful for the final classification.

2) The components provided by unmixing-based feature extraction techniques can be interpreted as the abundance of spectral constituents in the scene, while the components provided by other classic feature extraction techniques such as the MNF do not necessarily have any physical meaning.

3) Unmixing-based feature extraction techniques do not penalize classes which are not relevant in terms of variance or signal-to-noise ratio, while some classic feature extraction techniques such as the MNF relegate variations of less significant size to low-order components. If such low-order components are not preserved, small classes may be affected.


Fig. 9. Components extracted by the MNF from the ROSIS Pavia University scene (ordered from left to right in terms of amount of information).

Fig. 10. Components extracted by the proposed unmixing-based feature extraction technique from the ROSIS Pavia University scene (in no specific order).

An additional aspect resulting from our experiments is that unmixing-based chains allow for a natural integration of the spatial information available in the original hyperspectral image (through the clustering strategy for endmember extraction designed in this work). Although the aforementioned aspects may offer important advantages in hyperspectral data classification, the true fact is that our comparative assessment (conducted in terms of OA and AA using four representative hyperspectral images) only indicates a moderate improvement (or comparable performance) of the best unmixing-guided feature extraction method with regards to the best statistical feature extraction method (MNF) reported in our experiments. This leads us to believe that further improvements to the integration of the information provided by spectral unmixing into the classification process are possible. With this in mind, we anticipate significant advances in the integration of spectral unmixing and classification of hyperspectral data in future developments.

V. CONCLUSIONS AND FUTURE LINES

In this paper, we have investigated the advantages that can be gained by including information about spectral mixing at sub-pixel levels in the feature extraction stage that is usually conducted prior to hyperspectral image classification. For this purpose, we have developed a new unmixing-based feature extraction technique that combines the spatial and the spectral information through a combination of unsupervised clustering and partial spectral unmixing. We have compared our newly developed technique (which can be applied in both unsupervised and supervised fashion) with other classic and unmixing-based techniques for feature extraction. Our detailed quantitative and comparative assessment has been conducted using four representative hyperspectral images collected by two different instruments (AVIRIS and ROSIS) over a variety of test sites and in the framework of supervised classification scenarios dominated by the limited availability of training samples. Our experimental results indicate that the unsupervised version of our newly developed technique provides components which are physically meaningful and significant from a spatial point of view, resulting in good classification accuracies (without penalizing very small classes) when compared to the other feature extraction techniques tested in this work. In turn, since our analysis scenarios are dominated by very limited training sets, we have experimentally observed that, in this context, the use of supervised feature extraction techniques can lead to lower classification accuracies, as the information considered for projecting the data into a lower-dimensional space is not representative of the thematic classes of the image.

Future developments of this work will include an investigation

of additional techniques for feature extraction from a spectral unmixing point of view, in order to fully substantiate the advantages that can be gained at the feature extraction stage by including additional information about mixed pixels (which are predominant in hyperspectral images) prior to classification. Another research line deserving future attention is the determination of automatic procedures to determine the optimal number of features to be extracted from each tested method. While methods for estimating the intrinsic dimensionality of hyperspectral images exist, the determination of the number of features suitable for classification purposes depends on each particular method and, in the case of supervised feature extraction methods, on the available training set. Although in this work we have investigated performance in a suitable range of extracted features, the automatic determination of the optimal number of features for each method should be investigated in future work for practical reasons. Finally, future work should also consider nonlinear feature extraction methods such as kernel PCA [37] in addition to the linear feature extraction methods considered in this work.

ACKNOWLEDGMENT

The authors would like to gratefully thank the Associate Editor and the anonymous reviewers for their outstanding suggestions, which greatly helped to improve the technical quality and presentation of this paper. The authors would also like to thank D. Landgrebe, M. Crawford, and L. F. Johnson for sharing the hyperspectral data sets used in this work.


REFERENCES

[1] A. Plaza, J. A. Benediktsson, J. Boardman, J. Brazile, L. Bruzzone, G. Camps-Valls, J. Chanussot, M. Fauvel, P. Gamba, J. Gualtieri, M. Marconcini, J. C. Tilton, and G. Trianni, “Recent advances in techniques for hyperspectral image processing,” Remote Sens. Environ., vol. 113, pp. 110–122, 2009.

[2] G. F. Hughes, “On the mean accuracy of statistical pattern recognizers,” IEEE Trans. Inf. Theory, vol. 14, pp. 55–63, Jan. 1968.

[3] K. Fukunaga, Introduction to Statistical Pattern Recognition. San Diego, CA: Academic Press, 1990.

[4] D. A. Landgrebe, Signal Theory Methods in Multispectral Remote Sensing. New York: Wiley, 2003.

[5] L. Bruzzone, M. Chi, and M. Marconcini, “A novel transductive SVM for the semisupervised classification of remote sensing images,” IEEE Trans. Geosci. Remote Sens., vol. 44, pp. 3363–3373, 2006.

[6] L. Jimenez and D. A. Landgrebe, “Supervised classification in high dimensional space: Geometrical, statistical and asymptotical properties of multivariate data,” IEEE Trans. Syst., Man, Cybern. B: Cybernetics, vol. 28, pp. 39–54, Feb. 1993.

[7] Q. Jackson and D. A. Landgrebe, “An adaptive classifier design for high dimensional data analysis with a limited training data set,” IEEE Trans. Geosci. Remote Sens., vol. 39, pp. 2664–2679, Dec. 2001.

[8] J. A. Richards and X. Jia, Remote Sensing Digital Image Analysis: An Introduction. New York: Springer, 2006.

[9] A. A. Green, M. Berman, P. Switzer, and M. D. Craig, “A transformation for ordering multispectral data in terms of image quality with implications for noise removal,” IEEE Trans. Geosci. Remote Sens., vol. GRS-26, pp. 65–74, 1988.

[10] P. Comon, “Independent component analysis, a new concept?,” Signal Process., vol. 36, no. 3, pp. 287–314, 1994.

[11] J. A. Richards, “Analysis of remotely sensed data: The formative decades and the future,” IEEE Trans. Geosci. Remote Sens., vol. 43, pp. 422–432, 2005.

[12] G. Camps-Valls and L. Bruzzone, “Kernel-based methods for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens., vol. 43, pp. 1351–1362, 2005.

[13] G. Camps-Valls, L. Gomez-Chova, J. Munoz-Mari, J. Vila-Frances, and J. Calpe-Maravilla, “Composite kernels for hyperspectral image classification,” IEEE Geosci. Remote Sens. Lett., vol. 3, pp. 93–97, 2006.

[14] A. Plaza, P. Martinez, J. Plaza, and R. Perez, “Dimensionality reduction and classification of hyperspectral image data using sequences of extended morphological transformations,” IEEE Trans. Geosci. Remote Sens., vol. 43, no. 3, pp. 466–479, 2005.

[15] I. Dopido, M. Zortea, A. Villa, A. Plaza, and P. Gamba, “Unmixing prior to supervised classification of remotely sensed hyperspectral images,” IEEE Geosci. Remote Sens. Lett., vol. 8, pp. 760–764, 2011.

[16] N. Keshava and J. F. Mustard, “Spectral unmixing,” IEEE Signal Process. Mag., vol. 19, no. 1, pp. 44–57, 2002.

[17] I. Dopido, A. Villa, A. Plaza, and P. Gamba, “A comparative assessment of several processing chains for hyperspectral image classification: What features to use?,” in Proc. IEEE/GRSS Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing, 2011, vol. 1.

[18] C.-I. Chang, J.-M. Liu, B.-C. Chieu, C.-M. Wang, C. S. Lo, P.-C. Chung, H. Ren, C.-W. Yang, and D.-J. Ma, “Generalized constrained energy minimization approach to subpixel target detection for multispectral imagery,” Opt. Eng., vol. 39, pp. 1275–1281, 2000.

[19] J. Boardman, “Leveraging the high dimensionality of AVIRIS data for improved subpixel target unmixing and rejection of false positives: Mixture tuned matched filtering,” in Proc. 5th JPL Geoscience Workshop, 1998, pp. 55–56.

[20] R. O. Green, M. L. Eastwood, C. M. Sarture, T. G. Chrien, M. Aronsson, B. J. Chippendale, J. A. Faust, B. E. Pavri, C. J. Chovit, and M. Solis et al., “Imaging spectroscopy and the airborne visible/infrared imaging spectrometer (AVIRIS),” Remote Sens. Environ., vol. 65, no. 3, pp. 227–248, 1998.

[21] P. Gamba, F. Dell’Acqua, A. Ferrari, J. A. Palmason, and J. A. Benediktsson, “Exploiting spectral and spatial information in hyperspectral urban data with high resolution,” IEEE Geosci. Remote Sens. Lett., vol. 1, pp. 322–326, 2004.

[22] C.-I. Chang, Hyperspectral Imaging: Techniques for Spectral Detection and Classification. New York: Kluwer Academic/Plenum Publishers, 2003.

[23] D. Heinz and C.-I. Chang, “Fully constrained least squares linear mixture analysis for material quantification in hyperspectral imagery,” IEEE Trans. Geosci. Remote Sens., vol. 39, pp. 529–545, 2001.

[24] J. A. Hartigan and M. A. Wong, “Algorithm AS 136: A k-means clustering algorithm,” J. Royal Statistical Society, Series C (Applied Statistics), vol. 28, pp. 100–108, 1979.

[25] C.-I. Chang and Q. Du, “Estimation of number of spectrally distinct signal sources in hyperspectral imagery,” IEEE Trans. Geosci. Remote Sens., vol. 42, no. 3, pp. 608–619, 2004.

[26] J. M. Bioucas-Dias and J. M. P. Nascimento, “Hyperspectral subspace identification,” IEEE Trans. Geosci. Remote Sens., vol. 46, no. 8, pp. 2435–2445, 2008.

[27] B. Luo and J. Chanussot, “Unsupervised classification of hyperspectral images by using linear unmixing algorithm,” in Proc. IEEE Int. Conf. Image Processing, 2009, pp. 2877–2880.

[28] L. Wang and X. Jia, “Integration of soft and hard classifications using extended support vector machines,” IEEE Geosci. Remote Sens. Lett., vol. 6, pp. 543–547, 2009.

[29] A. Villa, J. Chanussot, J. A. Benediktsson, and C. Jutten, “Spectral unmixing for the classification of hyperspectral images at a finer spatial resolution,” IEEE J. Sel. Topics Signal Process., vol. 5, pp. 521–533, 2011.

[30] F. A. Mianji and Y. Zhang, “SVM-based unmixing-to-classification conversion for hyperspectral abundance quantification,” IEEE Trans. Geosci. Remote Sens., vol. 49, no. 11, pp. 4318–4327, 2011.

[31] J.-F. Cardoso, “High-order contrasts for independent component analysis,” Neural Computation, vol. 11, pp. 157–192, 1999.

[32] J. C. Harsanyi and C.-I. Chang, “Hyperspectral image classification and dimensionality reduction: An orthogonal subspace projection,” IEEE Trans. Geosci. Remote Sens., vol. 32, pp. 779–785, 1994.

[33] J. C. Bezdek, Pattern Recognition With Fuzzy Objective Function Algorithms. New York: Plenum Press, 1981.

[34] B. Mojaradi, H. Abrishami-Moghaddam, M. Zoej, and R. Duin, “Dimensionality reduction of hyperspectral data via spectral feature extraction,” IEEE Trans. Geosci. Remote Sens., vol. 47, no. 7, pp. 2091–2105, Jul. 2009.

[35] G. Foody, “Thematic map comparison: Evaluating the statistical significance of differences in classification accuracy,” Photogramm. Eng. Remote Sens., vol. 70, no. 5, pp. 627–633, 2004.

[36] S. García, A. Fernández, J. Luengo, and F. Herrera, “Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: Experimental analysis of power,” Inf. Sci., vol. 180, pp. 2044–2064, 2010.

[37] B. Scholkopf, A. J. Smola, and K.-R. Muller, “Nonlinear component analysis as a kernel eigenvalue problem,” Neural Computation, vol. 10, pp. 1299–1319, 1998.

Inmaculada Dópido received the B.S. and M.S. degrees in telecommunications from the University of Extremadura, Caceres, Spain, where she is currently working towards the Ph.D. degree.

She is a member of the Hyperspectral Computing Laboratory (HyperComp) coordinated by Prof. Antonio Plaza. Her research interests include remotely sensed hyperspectral imaging, pattern recognition, and signal and image processing, with particular emphasis on the development of new techniques for unsupervised and supervised classification and spectral mixture analysis of hyperspectral data.

Ms. Dópido has been a manuscript reviewer for the IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING.

Alberto Villa (S’09–M’11) received the B.S. and M.S. degrees in electronic engineering from the University of Pavia, Pavia, Italy, in 2005 and 2008, respectively. In 2011, he received the Ph.D. degree (a joint degree) from the Grenoble Institute of Technology (Grenoble INP), Grenoble, France, and the University of Iceland, Reykjavik, Iceland.

He was a visiting researcher at the Hyperspectral Computing Laboratory (HyperComp), University of Extremadura, Spain, from September 2010 to February 2011. Since July 2011, he has been working as a research engineer for Aresys srl, a spin-off company of Politecnico di Milano dealing with SAR imaging and ground-based radar. His research interests are in the areas of SAR antenna modeling, spectral unmixing, machine learning, hyperspectral imaging, and signal and image processing.

Dr. Villa is a reviewer for the IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING and the IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING.

Antonio Plaza (M’05–SM’07) received the M.S. and Ph.D. degrees in computer engineering from the University of Extremadura, Caceres, Spain.

He was a Visiting Researcher with the Remote Sensing Signal and Image Processing Laboratory, University of Maryland Baltimore County, Baltimore, with the Applied Information Sciences Branch, Goddard Space Flight Center, Greenbelt, MD, and with the AVIRIS Data Facility, Jet Propulsion Laboratory, Pasadena, CA. He is currently an Associate Professor with the Department of Technology of Computers and Communications, University of Extremadura, Caceres, Spain, where he is the Head of the Hyperspectral Computing Laboratory (HyperComp). He was the Coordinator of the Hyperspectral Imaging Network (Hyper-I-Net), a European project designed to build an interdisciplinary research community focused on hyperspectral imaging activities. He has been a Proposal Reviewer with the European Commission, the European Space Agency, and the Spanish Government. He is the author or coauthor of around 300 publications on remotely sensed hyperspectral imaging, including more than 60 Journal Citation Report papers, 20 book chapters, and over 200 conference proceeding papers. His research interests include remotely sensed hyperspectral imaging, pattern recognition, signal and image processing, and efficient implementation of large-scale scientific problems on parallel and distributed computer architectures.

Dr. Plaza has coedited a book on high-performance computing in remote sensing and guest edited seven special issues on remotely sensed hyperspectral imaging for different journals, including the IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING (for which he has served as Associate Editor on hyperspectral image analysis and signal processing since 2007), the IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING (for which he has served as a member of the steering committee since 2011), the International Journal of High Performance Computing Applications, and the Journal of Real-Time Image Processing. He is also serving as an Associate Editor for the IEEE GEOSCIENCE AND REMOTE SENSING NEWSLETTER. He has served as a reviewer for more than 280 manuscripts submitted to more than 50 different journals, including more than 140 manuscripts reviewed for the IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING. He served as a Chair for the IEEE Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing in 2011. He has also been serving as a Chair for the SPIE Conference on Satellite Data Compression, Communications, and Processing since 2009, and for the SPIE Remote Sensing Europe Conference on High Performance Computing in Remote Sensing since 2011. Dr. Plaza is a recipient of the recognition of Best Reviewers of the IEEE GEOSCIENCE AND REMOTE SENSING LETTERS in 2009 and of the IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING in 2010. He is currently serving as Director of Education activities and a member of the Administrative Committee of the IEEE Geoscience and Remote Sensing Society.

Paolo Gamba (M’93–SM’00) is currently an Associate Professor of telecommunications at the University of Pavia, Italy. Since January 2009, he has served as Editor-in-Chief of the IEEE GEOSCIENCE AND REMOTE SENSING LETTERS. He also served as Technical Co-Chair of the 2010 IEEE Geoscience and Remote Sensing Symposium, Honolulu, Hawaii, July 2010. He has been the organizer and Technical Chair of the biennial GRSS/ISPRS Joint Workshops on “Remote Sensing and Data Fusion over Urban Areas” since 2001; the next conference in the series, JURSE 2013, will be held in Sao Paulo in 2013. He was Chair of Technical Committee 7 “Pattern Recognition in Remote Sensing” of the International Association for Pattern Recognition (IAPR) from October 2002 to October 2004 and Chair of the Data Fusion Committee of the IEEE Geoscience and Remote Sensing Society from October 2005 to May 2009.

Dr. Gamba has been the Guest Editor of special issues of the IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, the IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, the ISPRS Journal of Photogrammetry and Remote Sensing, the International Journal of Information Fusion, and Pattern Recognition Letters on the topics of Urban Remote Sensing, Remote Sensing for Disaster Management, and Pattern Recognition in Remote Sensing Applications. He has been invited to give keynote lectures and tutorials at many international conferences. He has published more than 80 papers in international peer-reviewed journals and presented more than 210 papers at workshops and conferences.