

IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 15, NO. 1, JANUARY 2004

Comparison of Different Classification Algorithms for Underwater Target Discrimination

Donghui Li, Mahmood R. Azimi-Sadjadi, and Marc Robinson

Abstract—Classification of underwater targets from the acoustic backscattered signals is considered here. Several different classification algorithms are tested and benchmarked, not only for their performance but also to gain insight into the properties of the feature space. Results on a wideband 80-kHz acoustic backscattered data set collected for six different objects are presented in terms of the receiver operating characteristic (ROC) and the robustness of the classifiers with respect to reverberation.

Index Terms—K-nearest neighbor (K-NN) classifier, neural networks, probabilistic neural networks (PNNs), support vector machines (SVMs), underwater target classification.

I. INTRODUCTION

THE problem of classifying underwater targets from the acoustic backscattered signals involves discrimination between targets and nontarget objects as well as the characterization of background clutter. Several factors complicate this process, including: nonrepeatability and variation of the target signature with aspect angle, range, and grazing angle; competing natural and man-made clutter; a highly variable and reverberant operating environment; and the lack of any a priori knowledge about the shape and geometry of the nontargets.

A number of different classification schemes have been developed in recent years. A good review of the previous methods is provided in [1]. The method in this reference uses a wavelet packet-based classification scheme to discriminate mine-like and nonmine-like objects from the acoustic backscattered signals. The classifier was a back-propagation neural network (BPNN). A separate multiaspect fusion system was also implemented to improve the classification accuracy by observing the properties of the returns at several consecutive aspects. Test results on an 80-kHz data set were presented, which showed the promise of the system. With the exception of the work by Carpenter and Streilein [2], which used an ARTMAP-based classification system and provided reasonable single-aspect and multiaspect results, all other schemes almost exclusively used a BPNN classifier with different sets of features.

This brief paper studies the performance of other classifiers that use different decision rules on this problem. The purpose is to determine an optimum classification scheme for this problem, and also to infer clues about the properties of the data and, more specifically, the features. These clues could potentially provide helpful insight into the feature space and the distribution of the features, which in turn leads to a better understanding of this complex problem.

Manuscript received May 9, 2001; revised June 5, 2002. This work was supported by the Office of Naval Research, Biosonar Program, under Grant N00014-99-1-0166 and Grant N00014-01-1-0307. Data and technical support were provided by the NSWC, Coastal Systems Station, Panama City, FL.

The authors are with the Department of Electrical and Computer Engineering, Colorado State University, Fort Collins, CO 80523 USA (e-mail: azimi@engr.colostate.edu).

Digital Object Identifier 10.1109/TNN.2003.820621

Several different classification algorithms are considered and benchmarked. Among these are the multivariate Gaussian classifier, the evidential K-nearest neighbor (K-NN) classifier [4], the probabilistic neural network (PNN) [5], [6], and support vector machines (SVMs) [7]–[9]. The performance of these systems is then compared with that of the BPNN [1] on the wideband (80-kHz) data set provided by the Coastal Systems Station (CSS), Panama City, FL.

II. BRIEF REVIEW OF DIFFERENT CLASSIFIERS

A brief review of the different classification methods that are considered in this study is given here.

A. Multivariate Gaussian Classifier

This simple classifier [3] was used to give a benchmark for comparison with the other, more complicated ones. This classifier works well when the data is assumed to be clustered in two distinct groups with normal distributions. In other words, the feature vectors are assumed to be simply randomly corrupted versions of two prototype vectors. Clearly, in the case of the acoustic return features, this assumption is not a valid one, as the change in the scattering and physical properties of the objects as a function of aspect and grazing angles, etc. cannot be modeled by a random process. In a two-class case (i.e., classes $\omega_1$ and $\omega_2$), a very simple discriminant function

$$g_i(\mathbf{x}) = p(\mathbf{x}\mid\omega_i)\,P(\omega_i), \qquad i = 1, 2$$

is used, where $p(\mathbf{x}\mid\omega_i)$ is the class-conditional probability density function (assumed to be normal) and $P(\omega_i)$ is the class prior probability.
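As an illustration of this baseline, a minimal sketch of the two-class Gaussian discriminant $g_i(\mathbf{x}) = p(\mathbf{x}\mid\omega_i)P(\omega_i)$ is given below, assuming the per-class means, covariances, and priors are estimated directly from the training feature vectors; the class and method names are illustrative and not the authors' implementation.

```python
import numpy as np
from scipy.stats import multivariate_normal


class GaussianClassifier:
    """Two-class multivariate Gaussian classifier: g_i(x) = p(x | w_i) * P(w_i)."""

    def fit(self, X, y):
        # Estimate mean vector, covariance matrix, and prior for each class label in y.
        self.classes_ = np.unique(y)
        self.models_, self.priors_ = {}, {}
        for c in self.classes_:
            Xc = X[y == c]
            self.models_[c] = multivariate_normal(mean=Xc.mean(axis=0),
                                                  cov=np.cov(Xc, rowvar=False))
            self.priors_[c] = len(Xc) / len(X)
        return self

    def decision(self, X):
        # Discriminant value g_i(x) for every class (columns follow self.classes_).
        return np.column_stack([self.models_[c].pdf(X) * self.priors_[c]
                                for c in self.classes_])

    def predict(self, X):
        return self.classes_[np.argmax(self.decision(X), axis=1)]
```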

B. Evidential K-Nearest Neighbor (K-NN) Classifier

In our two-class classification problem, the frame of discernment is $\Omega = \{\omega_1, \omega_2\}$, where the possible hypotheses are either target or nontarget. An evidential K-NN classifier [4] finds the $K$ nearest neighbors (training samples) of an unknown pattern and then prescribes a belief and plausibility measure to each neighbor or item of evidence. To each item of evidence, a basic belief assignment (BBA) is assigned. A common approach [4] is to find some type of distance measure from the unknown pattern to each piece of evidence and then use

$$m_i(\{\omega_q\}) = \alpha_0 \exp\!\left(-\gamma_q\, d(\mathbf{x}, \mathbf{x}^{(i)})^{\beta}\right)$$

where $m_i(\{\omega_q\})$ is the BBA for the $i$th item of evidence in the neighborhood, $\omega_q$ is the class label of this item of evidence, $\alpha_0$ is a constant, $\gamma_q$ is a class-dependent constant, $\beta$ is an integer constant (typically 1 or 2), and $d(\mathbf{x}, \mathbf{x}^{(i)})$ is a distance measure (e.g., the Euclidean distance) between the evidence $\mathbf{x}^{(i)}$ belonging to class $\omega_q$ and the unknown pattern $\mathbf{x}$.



The remaining belief not assigned to class $\omega_q$ is assigned to the set of classes $\Omega$, of which $\{\omega_q\}$ is a subset. Thus, $m_i(\Omega) = 1 - m_i(\{\omega_q\})$. Clearly, the farther away a piece of evidence is from the unknown sample, the less effect it will have on the belief in a class, and its mass will be assigned completely to the set of classes $\Omega$. For this two-class problem, the belief contributed by each item of evidence is placed in the class known from its training label. Then, the BBAs of all the $K$ neighbors are combined using Dempster's rule of combination [4]. Using this "orthogonal sum of the beliefs" rule, the BBAs $m_1$ and $m_2$ associated with two independent items of evidence can be combined to produce the BBA for the collective evidence, i.e., $m = m_1 \oplus m_2$, by

$$(m_1 \oplus m_2)(A) = \frac{\displaystyle\sum_{B \cap C = A} m_1(B)\, m_2(C)}{1 - \displaystyle\sum_{B \cap C = \emptyset} m_1(B)\, m_2(C)}, \qquad A \neq \emptyset. \tag{1}$$

Then, the class with the highest pignistic probability is declared as the winner. This probability is defined as

$$\mathrm{BetP}(\omega_q) = \sum_{A \subseteq \Omega,\ \omega_q \in A} \frac{m(A)}{|A|} \tag{2}$$

where $|A|$ represents the cardinality of $A$.
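The decision rule above can be pieced together as in the following sketch for the two-class case, assuming Euclidean distances and placeholder values for $\alpha_0$, $\beta$, and $\gamma_q$; it illustrates the BBA, Dempster combination, and pignistic steps rather than reproducing the authors' code.

```python
import numpy as np


def evidential_knn_predict(X_train, y_train, x, K=7, alpha0=0.95, beta=2, gamma=None):
    """Two-class evidential K-NN: returns the pignistic probability of the target class.

    y_train holds 0 (nontarget) or 1 (target). alpha0, beta, and the class-dependent
    gamma values are placeholders chosen for illustration only.
    """
    if gamma is None:
        gamma = np.ones(2)
    d = np.linalg.norm(X_train - x, axis=1)      # distances to all training vectors
    neighbors = np.argsort(d)[:K]                # indices of the K nearest neighbors

    # Masses over the focal elements [{nontarget}, {target}, Omega]; start vacuous.
    m = np.array([0.0, 0.0, 1.0])
    for i in neighbors:
        q = int(y_train[i])
        s = alpha0 * np.exp(-gamma[q] * d[i] ** beta)
        mi = np.array([0.0, 0.0, 1.0 - s])
        mi[q] = s
        # Dempster's rule of combination (1) for this small frame of discernment.
        conflict = m[0] * mi[1] + m[1] * mi[0]
        m = np.array([m[0] * mi[0] + m[0] * mi[2] + m[2] * mi[0],
                      m[1] * mi[1] + m[1] * mi[2] + m[2] * mi[1],
                      m[2] * mi[2]]) / (1.0 - conflict)

    # Pignistic probability (2): the mass on Omega is split equally between the classes.
    return m[1] + m[2] / 2.0
```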

C. PNN

The PNN was introduced [5], [6] to implement the Parzen nonparametric probability density function (pdf) estimation method. To overcome the structural and computational complexity (during the testing phase) of this network, Streit et al. [12] introduced a modified version of the PNN that substantially reduces the number of neurons in the recognition layer by using Gaussian mixture models and the expectation-maximization (EM) algorithm to estimate the network parameters. This algorithm not only reduces the computational costs during the training, but it also overcomes the issues with unbalanced training problems and the optimal kernel selection.

In this algorithm, for any class $\omega_i$, the class-conditional distribution is modeled approximately by a Gaussian mixture as

$$p(\mathbf{x}\mid\omega_i) \approx \sum_{j=1}^{G_i} \alpha_{ij}\, \mathcal{N}(\mathbf{x};\, \boldsymbol{\mu}_{ij},\, \boldsymbol{\Sigma}_{ij}) \tag{3}$$

where $G_i$ is the number of Gaussian components in class $\omega_i$, which is generally decided experimentally, the $\alpha_{ij}$'s are the weights of the components (note that $\sum_j \alpha_{ij} = 1$), and $\mathcal{N}(\mathbf{x};\, \boldsymbol{\mu}_{ij},\, \boldsymbol{\Sigma}_{ij})$ denotes the multivariate Gaussian density function of the $j$th component in class $\omega_i$, with $\boldsymbol{\mu}_{ij}$ and $\boldsymbol{\Sigma}_{ij}$ being its mean vector and covariance matrix, respectively. The algorithm in [12] achieves the maximum likelihood (ML) estimates of these parameters via iterative computation when the observations can be viewed as incomplete data.

If the Gaussian mixture model is a good assumption, the modified PNN classifier will achieve high classification accuracy. Although the number of Gaussian components can always be increased so that the mixture model better approximates the distribution, the computational cost may become unacceptable.
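For illustration, a minimal sketch of such a class-conditional Gaussian-mixture classifier is given below, using scikit-learn's EM-based GaussianMixture as a stand-in for the training procedure of [12]; the two-components-per-class default echoes the finding reported later in Section III, and everything else is an assumption.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


class MixturePNN:
    """Class-conditional Gaussian mixtures, as in (3), fitted with EM."""

    def __init__(self, n_components=2):
        self.n_components = n_components

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.gmms_, self.log_priors_ = {}, {}
        for c in self.classes_:
            Xc = X[y == c]
            self.gmms_[c] = GaussianMixture(n_components=self.n_components,
                                            covariance_type="full").fit(Xc)
            self.log_priors_[c] = np.log(len(Xc) / len(X))
        return self

    def predict_proba(self, X):
        # Posterior probabilities from the class-conditional mixture log-likelihoods.
        log_post = np.column_stack([self.gmms_[c].score_samples(X) + self.log_priors_[c]
                                    for c in self.classes_])
        log_post -= log_post.max(axis=1, keepdims=True)
        p = np.exp(log_post)
        return p / p.sum(axis=1, keepdims=True)

    def predict(self, X):
        return self.classes_[np.argmax(self.predict_proba(X), axis=1)]
```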

D. SVM

SVMs [7]–[9] are powerful learning systems, primarily for two-class problems. The SVM maps the input patterns into a higher dimensional feature space through some nonlinear mapping chosen a priori. A linear decision surface is then constructed in this high-dimensional feature space. Thus, the SVM is a linear classifier in the parameter space, but it becomes a nonlinear classifier as a result of the nonlinear mapping of the space of the input patterns into the high-dimensional feature space. The process involves solving a quadratic programming problem. The SVM has been shown [7] to provide high generalization ability.

For a two-class problem, assuming the optimal hyperplane in the feature space has been generated, the classification decision on an unknown pattern $\mathbf{x}$ will be made based on

$$f(\mathbf{x}) = \sum_{i=1}^{N} \alpha_i\, y_i\, K(\mathbf{x}_i, \mathbf{x}) + b \tag{4}$$

where the $\alpha_i$, $i = 1, \ldots, N$, are nonnegative Lagrange multipliers that satisfy $\sum_{i=1}^{N} \alpha_i y_i = 0$, the $y_i \in \{-1, +1\}$ are the class labels of the training patterns $\mathbf{x}_i$, and $K(\mathbf{x}_i, \mathbf{x})$, for $i = 1, \ldots, N$, represents a symmetric positive definite kernel function that defines an inner product in the feature space [9]. This shows that $f(\mathbf{x})$ is a linear combination of the inner products or kernels.

Note that the output $f(\mathbf{x})$ corresponds to the distance of the unknown pattern to the hyperplane in the feature space, where the distance of the support vectors to the hyperplane is presumed to be 1.

The kernel function enables the operations to be carried out in the input space rather than in the high-dimensional feature space. Some typical examples [7] of kernel functions are $K(\mathbf{x}_i, \mathbf{x}) = \mathbf{x}_i^{T}\mathbf{x}$ (linear SVM); $K(\mathbf{x}_i, \mathbf{x}) = (\mathbf{x}_i^{T}\mathbf{x} + 1)^{p}$ (polynomial SVM of degree $p$); $K(\mathbf{x}_i, \mathbf{x}) = \exp(-\|\mathbf{x} - \mathbf{x}_i\|^{2}/2\sigma^{2})$ (RBF SVM); and $K(\mathbf{x}_i, \mathbf{x}) = \tanh(\kappa\,\mathbf{x}_i^{T}\mathbf{x} + \theta)$ (two-layer neural SVM), where $p$, $\sigma$, $\kappa$, and $\theta$ are constants. However, a proper kernel function for a certain problem is dependent on the specific data, and to date there is no general method for choosing a kernel function. In this paper, the choice of the kernel function was studied empirically, and optimal results were achieved using the second-order polynomial kernel function $K(\mathbf{x}_i, \mathbf{x}) = (\mathbf{x}_i^{T}\mathbf{x} + 1)^{2}$.

The generalization ability of the SVM is controlled by two different factors: the training error rate and the capacity of the learning machine measured by its VC dimension [9]. The smaller the VC dimension of the function set of the learning machine, the larger the value of the training error rate. We can control the tradeoff between the complexity of the decision rule and the training error rate by changing the parameter $C$ [8] of the SVM.
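As a rough illustration of this setting, the sketch below trains a soft-margin SVM with the second-order polynomial kernel $(\mathbf{x}_i^{T}\mathbf{x} + 1)^2$ on toy stand-in data; the synthetic data, the scikit-learn implementation, and the particular value of $C$ are assumptions made for illustration and are not the authors' setup.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Toy stand-ins for 22-dimensional feature vectors with +/-1 labels (illustrative only).
X_train = rng.normal(size=(144, 22))
y_train = np.where(X_train[:, 0] + 0.5 * X_train[:, 1] ** 2 > 0.5, 1, -1)

# Second-order polynomial kernel K(x_i, x) = (x_i . x + 1)^2, i.e., degree=2, coef0=1.
# The small C value mirrors the soft-margin tradeoff discussed above (value assumed).
svm = SVC(kernel="poly", degree=2, coef0=1.0, gamma=1.0, C=0.001)
svm.fit(X_train, y_train)

print("number of support vectors:", svm.n_support_.sum())
print("signed decision values f(x):", svm.decision_function(X_train[:5]))
```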

III. RESULTS AND DISCUSSIONS

To examine the effectiveness of the above-mentioned classifiers, the algorithms were tested and benchmarked on the wideband 80-kHz data set provided by CSS. This data set contains backscattered signals corresponding to six different objects: two mine-like objects, namely a bullet-shaped metallic object and a truncated-cone-shaped plastic object; and four nonmine-like objects, namely a water-filled drum, an irregularly shaped limestone rock, a smooth granite rock, and a water-saturated wooden log.


Fig. 1. Feature extraction and reduction processes.

The transmit signal was a linear frequency modulated (LFM) up-sweep with a frequency range from 30 to 110 kHz. Each object was insonified at aspect angles from 0° to 355° with 5° separation. This resulted in 72 aspect angles, of which the even-angle samples were used in the training data set while the odd-angle samples were used as the testing samples. As a result, for each object there were 36 patterns (at different aspect angles) in the training or testing data sets. The training data set contained the feature vectors of backscattered data with synthesized reverberation effects at a signal-to-reverberation ratio (SRR) that corresponds to nominal operating conditions. The procedure for generating synthesized reverberation involves convolving the transmit signal with a random sequence and scaling the resultant signal according to the specified SRR [1]. The synthesized reverberation signal is then added to the backscattered signal to generate one "noisy realization." The process is repeated for every aspect angle multiple times in order to generate a statistically rich data set for determining the generalization ability of the classifier. The testing data in our study contained two sets of ten and 50 noisy realizations for each aspect angle in order to obtain statistically significant results.
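The noisy-realization procedure described above (convolve the transmit signal with a random sequence, scale to the specified SRR, and add the result to the backscattered return) can be sketched as follows; the function name, signal arguments, and the dB power-ratio convention are assumptions.

```python
import numpy as np


def add_synthetic_reverberation(backscatter, transmit, srr_db, rng=None):
    """Return one "noisy realization" of a backscattered signal at the requested SRR (dB)."""
    rng = np.random.default_rng() if rng is None else rng
    # Convolve the transmit signal with a random sequence to mimic diffuse reverberation.
    reverb = np.convolve(transmit, rng.standard_normal(backscatter.size), mode="same")
    reverb = reverb[:backscatter.size]
    # Scale the reverberation so that signal power / reverberation power equals the SRR.
    sig_pow = np.mean(backscatter ** 2)
    rev_pow = np.mean(reverb ** 2)
    scale = np.sqrt(sig_pow / (rev_pow * 10.0 ** (srr_db / 10.0)))
    return backscatter + scale * reverb
```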

The steps involved in the front-end feature extraction and selection system are shown in Fig. 1. A five-level wavelet packet (WP) decomposition [10] was applied to decompose the frequency spectra of the acoustic backscattered signals into several subbands. The subband features can provide sufficient discriminatory clues that aid target/nontarget discrimination. Only the 12 subbands that reside within the frequency range of the transmit signal (i.e., 30–110 kHz) were selected. The transmit signal was also decomposed using the same WP tree structure. A third-order linear autoregressive (AR) model was employed to represent the spectral behavior of the signals. The AR coefficients were then used as features for classification. This led to a total of 48 features. To select an appropriate set of features, a function can be used to measure the discriminatory power of the individual features. In this study, the Bhattacharyya distance [3] was used to evaluate the separation between the two classes for each feature and to select 22 (out of 48) features with high discriminatory power. This measure was chosen based upon the observation that the histograms of the features exhibit a unimodal Gaussian distribution.
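A minimal sketch of the per-feature ranking step is shown below, assuming the discriminant measure is the Bhattacharyya distance between two univariate Gaussian class-conditional densities (consistent with the unimodal histograms noted above); the wavelet-packet/AR feature extraction chain itself is omitted, and all names are illustrative.

```python
import numpy as np


def bhattacharyya_1d(f_target, f_nontarget):
    """Bhattacharyya distance between the two classes for a single feature,
    assuming each class is approximately unimodal Gaussian."""
    m1, m2 = np.mean(f_target), np.mean(f_nontarget)
    v1, v2 = np.var(f_target), np.var(f_nontarget)
    return (0.25 * (m1 - m2) ** 2 / (v1 + v2)
            + 0.5 * np.log((v1 + v2) / (2.0 * np.sqrt(v1 * v2))))


def select_features(X, y, n_keep=22):
    """Rank the candidate features (columns of X) by their Bhattacharyya distance
    between targets (y == 1) and nontargets (y == 0) and keep the n_keep best."""
    scores = np.array([bhattacharyya_1d(X[y == 1, j], X[y == 0, j])
                       for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:n_keep]
```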

A. ROC Analysis and Classification Error of Different Classifiers

The classifiers are first benchmarked on the data set containing ten noisy realizations per aspect angle, in terms of their receiver operating characteristic (ROC) curves and error location plots versus aspect angle.

Fig. 2. Classification results of the multivariate Gaussian.

The ROC curve is the plot of the probability of correct classification ($P_{cc}$) versus the probability of false alarm ($P_{fa}$). The "knee point" of this ROC is particularly important; it corresponds to a decision threshold that offers the best compromise between $P_{cc}$ and $P_{fa}$. The error location plot, on the other hand, gives the corresponding classification error locations and their frequencies versus aspect angle for each object. In this plot, each ring corresponds to a particular object (i.e., objects 1–6) and each grid on the ring represents an aspect angle. In the counter-clockwise direction, the aspect angles run from 5° to 355° with 10° separation, since there are 36 odd aspect angles for every object in the test set. The gray level in each grid, which can vary from 0 to 10, represents the frequency of classification error at that particular aspect angle.
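An empirical ROC and a knee point of the kind used below can be computed from soft classifier outputs along the following lines; taking the knee to be the threshold where $P_{cc}$ is closest to $1 - P_{fa}$ is one common convention adopted here as an assumption, and treating $P_{cc}$ as the correct-classification rate on targets is likewise an assumption.

```python
import numpy as np


def roc_points(scores, labels, n_thresholds=200):
    """Empirical ROC: sweep a threshold over the soft outputs and return (P_fa, P_cc).

    labels hold 1 for targets and 0 for nontargets; scores are the soft classifier outputs.
    """
    thresholds = np.linspace(scores.min(), scores.max(), n_thresholds)
    p_cc = np.array([np.mean(scores[labels == 1] >= t) for t in thresholds])
    p_fa = np.array([np.mean(scores[labels == 0] >= t) for t in thresholds])
    return p_fa, p_cc, thresholds


def knee_point(p_fa, p_cc, thresholds):
    """Pick the threshold where P_cc is closest to 1 - P_fa (one common convention)."""
    idx = np.argmin(np.abs(p_cc - (1.0 - p_fa)))
    return p_fa[idx], p_cc[idx], thresholds[idx]
```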

1) Multivariate Gaussian: As mentioned before, this classifier works well when the data is assumed to be clustered in two distinct groups with normal distributions. Fig. 2(a) shows the ROC curve for this classifier; the knee point of this curve gives the operating $P_{cc}$ and $P_{fa}$. Fig. 2(b) shows the corresponding error location plot. As evident from this plot, the misclassifications for this classifier occur only for the first three objects. The less than adequate results of this classifier on this data set can be blamed on several factors, including the invalidity of the Gaussian assumption, the inseparability of the clusters, and the fact that the patterns are not randomly corrupted versions of some prototype vectors.

2) Evidential K-NN: When using the evidential K-NN, several parameters need to be preselected. We chose fixed values for $\alpha_0$ and $\beta$, and $\gamma_q$ can be found heuristically for each class by estimating it from the mean of the distances between all combinations of two training vectors belonging to class $\omega_q$. A gradient descent algorithm [4] can also be used to find the optimum value of $\gamma_q$ over the training set. This method is based upon minimizing the classification error by finding the optimum $\gamma_q$. These methods were tested for various K ranging from 7 to 25, and the results indicated that the gradient-based approach performed slightly better than the heuristic-based method.



Fig. 3. Classification results of the evidential K-NN (K = 7, with optimized γ).


Fig. 4. Classification results of the original PNN (σ = 2).

This study also revealed that the best results are obtained for K = 7. The performance slightly degrades as K increases. Fig. 3(a) and (b) show the ROC and error location plots for this classifier; the knee point of the ROC gives the corresponding $P_{cc}$ and $P_{fa}$. From the error location plot, it is interesting to see that the errors are concentrated on only a few aspects of objects 1, 2, 5, and 6, with a high frequency of occurrence. This is in contrast to the multivariate Gaussian classifier, where the errors were more widely spread among the aspects of objects 1–3.

3) PNN: First, the original PNN [5] was implemented. The effect of the smoothing factor $\sigma$ on the performance of the PNN was studied first. It was empirically determined that the minimum classification error rate (6.16%) was achieved at $\sigma = 2$. Unlike the BPNN, whose output units directly provide the a posteriori class conditional probabilities, the output units of the PNN are only proportional to these probabilities. Thus, the outputs of the PNN had to be scaled so that the sum of the outputs equals 1. In this case, the outputs could be compared with those of the BPNN and SVM. Fig. 4(a) shows the ROC curve of this PNN ($\sigma = 2$); the knee point of the curve gives the corresponding $P_{cc}$ and $P_{fa}$. Fig. 4(b) shows the corresponding error location plot based upon the decision threshold at the knee point of the ROC. Comparing this figure with Fig. 3(b) for the evidential K-NN, it is clear that the PNN has a slightly inferior performance. In particular, object 3 (water-filled drum) has a large number of classification errors, while the evidential K-NN had none. Note that although the error location plot in Fig. 3(b) was generated using the hard-limiter decision, this comparison is fair, since we consider the optimal performance of each classifier irrespective of the choice of the decision threshold.

Since a PNN creates a pool of neurons for each class, the training data should be balanced to avoid disparity in approximating the different density functions. In this application, the training data was not balanced.

Fig. 5. Performance plots of the EM-PNN.

To study the effect of balanced data on the PNN, one half of the training samples for the nontargets were removed. Thus, the training set was left with 72 aspects of targets and 72 aspects of nontargets. For this balanced training set, the value of $\sigma$ giving the best results was again selected empirically. The minimum classification error for this choice of $\sigma$ was 6.25%, a difference of only 0.09% from the unbalanced case. On the ROC curve, the knee point improved by just over 2% relative to the unbalanced training set. These changes are very slight, and from this observation it can be concluded that the unbalanced training set has very little impact on these results and that the poor performance of the PNN must be attributed to other reasons.

The modified PNN using Gaussian mixture models and EM training was examined next. The purpose of the study was to determine if the relatively poor performance of the original PNN can indeed be attributed to the optimal kernel selection and/or the unbalanced training data problems. During the training phase, the EM-PNN algorithm automatically selects the number of Gaussian mixture components to model the distribution of the training data for each class. For this particular application, only two mixture components were selected for each class. Again, the outputs of the EM-PNN were scaled and normalized so that they could be compared with those of the other classifiers. Fig. 5(a) shows the ROC curve of this EM-PNN; the knee point of this curve indicates substantially worse performance compared to the original PNN. Fig. 5(b) shows the corresponding error location plot based upon a hard-limiting threshold. As can be seen from Fig. 5(b), a large number of errors are made for the targets. Since the EM-PNN does not require balanced training sets, one can conclude that this is not the reason for the poor performance on the targets in testing. To stay consistent, however, an EM-PNN was also trained on the balanced data set of 72 aspects of both targets and nontargets. As expected, the results were very similar. The unbalanced data set had a hard-limiter threshold correct classification rate of 88.5%, while the performance on the balanced data improved by half a percentage point. From these results, one can conclude that the poor performance of the PNN-type networks is not caused by the unbalanced data set or the choice of kernel.

4) SVM: The choice of the parameter $C$ [8] in the SVM is very critical in order to have a properly trained SVM. Our studies on this problem indicated that when the value of $C$ is in a range extending down from 0.001, the classification error rate of the SVM remains relatively constant. When the value of $C$ decreases further, the performance of the SVM degrades rapidly.



Fig. 6. Classification results of the SVM (C = 0.001).


Fig. 7. Performance plots of the BPNN.

Moreover, for the smaller values of $C$ in this range, the number of support vectors found in SVM training was between 25 and 27. For $C = 0.001$, the number of support vectors increased to 39 and the performance of the SVM was slightly improved. Also, the training error rate for $C = 0.001$ was 0, i.e., the SVM was perfectly trained. For larger values of $C$, the number of support vectors increased but the performance of the SVM deteriorated drastically. Thus, the SVM with $C = 0.001$ was chosen for the rest of this study.

Fig. 6(a) shows the ROC curve of the SVM for $C = 0.001$; the knee point of this curve gives the corresponding $P_{cc}$ and $P_{fa}$. As mentioned in Section II-D, the output of the original SVM relates to the distance, $d$, of the test pattern to the decision boundary, where the distance of the support vectors to the decision boundary is assumed to be 1. A transformation of $d$ to a new SVM output was used in order to generate output values that can be compared with those of the other classifiers, i.e., the PNN and BPNN, and to generate the ROC. Fig. 6(b) is the error location plot for the decision threshold corresponding to the ROC knee point. These results indicate excellent performance of the SVM on this data set. Most of the classification errors for this classifier occur for objects 2 (truncated-cone-shaped target) and 4 (limestone rock).

5) BPNN: As a benchmark, a two-layer BPNN with structure 22-42-2 was also used in this study. This network was trained using the standard BP training algorithm with an adaptive learning rate and a momentum factor. Ten different weight initialization trials were implemented, and the network with the best performance on the validation data set was chosen. Fig. 7(a) shows the ROC curve of the BPNN for ten noisy realizations; the knee point of the curve gives the corresponding $P_{cc}$ and $P_{fa}$. Fig. 7(b) is the corresponding error location plot at the ROC knee point. As can be observed, for this classifier most of the errors occurred for the mine-like object 1 and the nonmine-like objects 3 and 6. This is an interesting observation, since the errors of all these classifiers occur for different objects and at different aspect angles.

TABLE I. COMPARISON OF ROBUSTNESS OF DIFFERENT CLASSIFICATION ALGORITHMS FOR 50 NOISY REALIZATIONS (%)

This suggests the possibility that further improvements could be gained by fusing the results of these classifiers.

B. Robustness Analysis of Different Classifiers

To study the robustness and generalization of the different classifiers to reverberation, we have studied the error rate statistics over a larger number of trials. For a given set of input pattern vectors, the number of misclassifications can be viewed as a random variable, resulting in a randomly varying empirical error rate. In [15], it is shown that, irrespective of the pattern source and the type of classifier, this random variable follows a binomial distribution. For a very large number of patterns, this distribution approaches a normal distribution. To estimate the error rate statistics of the classifiers, 50 different Monte Carlo trials were performed. For each trial, a different testing set with different reverberation realizations was used. As before, each set contained 216 backscattered signals corresponding to the odd aspect angles and different synthesized reverberation sequences at the same SRR. The histograms of the classification errors for the multivariate Gaussian, evidential K-NN, PNN, EM-PNN, SVM, and BPNN classifiers were then formed from the 50 error measurements. These were obtained under the assumption that false positive and false negative errors have the same weighting, i.e., the false alarm and misclassification errors are summed into a total classification error. The classification decision was made based on a hard-limit threshold operation. The threshold associated with the "knee" point of the ROC curve was not used, since each classifier has a different knee point and hence a different threshold. Table I summarizes the classification error statistics, i.e., the mean ($\mu$) and the standard deviation ($\sigma$), extracted from the histogram associated with each classifier. The results in this table reinforce the fact that the SVM gave the best overall results on this wideband data set. Although the PNN gave inferior results compared to all other classifiers except the multivariate Gaussian, the online updating feature of this classifier makes it attractive for in situ underwater target classification applications.
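The error-rate statistics summarized in Table I can be estimated along the following lines; the sketch also reports the standard deviation implied by the binomial error model of [15] for comparison, and all variable names are illustrative.

```python
import numpy as np


def error_rate_statistics(error_counts, n_patterns=216):
    """Mean and standard deviation of the empirical error rate over Monte Carlo trials,
    plus the standard deviation predicted by a binomial error model [15]."""
    rates = np.asarray(error_counts, dtype=float) / n_patterns
    mean_rate = rates.mean()
    std_rate = rates.std(ddof=1)
    # Binomial model: each of the n_patterns decisions errs independently with prob. mean_rate.
    binomial_std = np.sqrt(mean_rate * (1.0 - mean_rate) / n_patterns)
    return mean_rate, std_rate, binomial_std
```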

C. Discussion and Analysis

Each of these classification schemes and their respective results give insights into the 22-dimensional feature space of this complex problem. Even though the features are fairly unimodal individually, the distribution of these features in the 22-dimensional space will not necessarily follow a multivariate Gaussian distribution for each class.


Fig. 8. Scatter plots of the first two features: (a) training data set and (b) testing data set.

Based upon the poor results of the multivariate Gaussian and the PNN classifiers, namely those of the EM-PNN, it can be argued that the distribution in each class cannot be represented by Gaussian mixtures. The fact that the original PNN provided somewhat better results is an indication that the data is not clustered; rather, it is scattered in the feature space. To support this belief, the scatter plots of the first two features with high discriminatory ability are presented in Fig. 8(a) and (b) for the training and testing data sets, respectively. It must be mentioned that there were several widely scattered points that could not be included in these plots. Although these plots present only the distributions of the first two features, they clearly show the scattered nature of this data set.

The evidential K-NN relies on the closeness of the applied pattern to the training samples in order to provide good performance. Although in certain regions of the feature space most of the training samples near an unknown pattern are of the representative class, in other regions the proximity measure does not necessarily translate into stronger evidence of class membership. This implies that the features are mixed together in the feature space, and that the closest neighbors may not always give the best clues pertaining to class membership. As the size of the neighborhood (i.e., K) increased, the performance decreased, suggesting that this mixing of features occurs throughout the space. The fact that the two-layer BPNN performs better suggests that the features are fairly separable with multidimensional convex-type decision regions. The SVM maps the features to a higher dimensional space and then uses an optimal hyperplane in the mapped space. This implies that, though the original features carry adequate information for good classification, mapping to a higher dimensional feature space could potentially provide better discriminatory clues that are not present in the original feature space. Thus, each of these classifiers provides hints about the properties of the data in the 22-dimensional space, and from these hints it may be possible to devise a classification paradigm that will give perfect classification results on this data.

IV. CONCLUSION

In this brief paper, four different classification algorithms, namely the multivariate Gaussian, evidential K-NN, PNN, and SVM, are examined on a wideband 80-kHz acoustic backscattered data set. The performance of these classifiers was then compared with one another and with that of the BPNN. The robustness and statistical confidence of these results were studied over a large number of trials. This study indicated that the wideband insonification provides much better robustness to reverberation when compared to the results on the 40-kHz data in [1]. Additionally, the results of these classifiers point to the interesting hypothesis that the data in this 22-dimensional feature space has a very complex scattered as well as clustered nature. This, to some extent, explains the strong performance of the SVM classifier, which maps the features to a higher dimensional space. Clearly, the choice of the kernel function has an important impact on the overall performance. The performance of the PNN and the multivariate Gaussian was not as good as that of the other classifiers. This may be attributed to several factors, including the scattered and mixed nature of the features, the invalidity of the Gaussian assumption, and the unbalanced number of training samples for the target and nontarget classes (for the original PNN). Certainly, the behavior of each classifier gave valuable insights into the properties of the feature space.

REFERENCES

[1] M. R. Azimi-Sadjadi, D. Yao, Q. Huang, and G. J. Dobeck, "Underwater target classification using wavelet packets and neural networks," IEEE Trans. Neural Networks, vol. 11, pp. 784–794, May 2000.

[2] G. A. Carpenter and W. W. Streilein, "ARTMAP-FTR: a neural network for fusion target recognition with application to sonar classification," in Proc. SPIE Int. Symp. Aerospace/Defense Sensing Control, vol. 3392, Orlando, FL, Apr. 1998, pp. 342–356.

[3] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification. New York: Wiley, 2001.

[4] T. Denoeux, "A k-nearest neighbor classification rule based on Dempster-Shafer theory," IEEE Trans. Syst., Man, Cybern., vol. 25, pp. 804–813, May 1995.

[5] D. F. Specht, "Probabilistic neural networks," Neural Networks, vol. 3, no. 1, pp. 109–118, 1990.

[6] D. F. Specht and P. D. Shapiro, "Generalization accuracy of probabilistic neural networks compared with backpropagation networks," in Proc. IJCNN'91, July 1991, pp. 458–461.

[7] S. Gunn, "Support vector machines for classification and regression," Univ. Southampton, Southampton, U.K., ISIS Tech. Rep., May 1998.

[8] C. Cortes and V. Vapnik, "Support vector networks," Mach. Learn., vol. 20, no. 3, pp. 273–297, 1995.

[9] V. Vapnik, The Nature of Statistical Learning Theory. New York: Springer-Verlag, 1995.

[10] M. Vetterli and J. Kovacevic, Wavelets and Subband Coding. Upper Saddle River, NJ: Prentice-Hall, Apr. 1995.

[11] L. Rabiner et al., Fundamentals of Speech Recognition. Upper Saddle River, NJ: Prentice-Hall, 1993.

[12] R. L. Streit and T. E. Luginbuhl, "Maximum likelihood training of probabilistic neural networks," IEEE Trans. Neural Networks, vol. 5, pp. 764–783, Sept. 1994.

[13] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," J. Royal Stat. Soc., ser. B, vol. 39, pp. 1–38, 1977.

[14] C. J. C. Burges, "A tutorial on support vector machines for pattern recognition," Data Mining Knowledge Discovery, vol. 2, no. 2, pp. 121–167, 1998.

[15] J. Schürmann, Pattern Classification: A Unified View of Statistical and Neural Approaches. New York: Wiley, 1996.