classification of iris data using kernel radial basis probabilistic neural network

Scientific Review

ISSN: 2412-2599 Vol. 1, No. 4, pp: 74-78, 2015

URL: http://arpgweb.com/?ic=journal&journal=10&info=aims

*Corresponding Author

74

Academic Research Publishing Group

Classification of Iris Data using Kernel Radial Basis Probabilistic

Neural Network

Lim Eng Aik* Institute of Engineering Mathematic Universiti Malaysia Perlis, 02600 Ulu Pauh, Perlis

Mohd. Syafarudy Abu Institute of Engineering Mathematic Universiti Malaysia Perlis, 02600 Ulu Pauh, Perlis

1. Introduction In recent years, data classification and assessment model is a very popular and important research topic. The so-

called classification and assessment means to use the related class information to evaluate the level of accuracy for

classification problem. However, the data classification accuracy is usually the most important key for predicting the

right subject [1]. Therefore, if the data classification accuracy can be improve, then public can use it for

classification in many fields with its error greatly reduced.

In the past literature, some researchers use conventional statistical model for model construction, for example,

the Discriminant Analysis of [2], the Probit of Ohlson [3]. However, in recent years, some researchers found that the

model constructed by data mining technique has better prediction accuracy than that of the conventional statistical

model; some of them are, for example, the Back-Propagation Networks model of Odom and Sharda [4] and the

decision tree classification model of [5]. Another recent approach done by Zarita, et al. [6] using backpropagation

method in classification of river water quality. The approach provided a good result in classification, but, it takes a

long times for training the network. In this paper, Radial Basis Probabilistic Neural Networks (RBPNN) which is

known to have good generalization properties and is trained faster than the backpropagation network Ganchev, et al.

[7] is adopted for the construction of classification prediction model.

Our proposed method, used kernel-function to increase the separability of data by working in a high dimensional

space. Thus, the proposed method is characterized by higher classification accuracy than the original RBPNN. The

kernel-based classification in the feature space not only preserves the inherent structure of groups in the input space,

but also simplifies the associated structure of the data [8]. Since Girolami first developed the kernel k-means

clustering algorithm for unsupervised classification [9], several studies have demonstrated the superiority of kernel

classification algorithms over other approaches to classification [10-14].

In this paper, we evaluate the performance of our proposed method, kernel RBPNN, with a comparison to three

well-known classification methods: the Back-Propagation (BP), Radial Basis Function Network (RBFN), and

RBPNN; meanwhile, comparison and analysis on the classification power is done to these three methods. These

methods performance are compared using Iris data sets.

2. Radial Basis Probabilistic Neural Networks Radial Basis Probabilistic Neural Network (RBPNN) architecture is based on based on Bayesian decision theory

and nonparametric technique to estimate Probability Density Function (PDF). The form of PDF is a Gaussian

distribution. Specht [15] had proposed this function [16]:

Abstract: Radial Basis Probabilistic Neural Network (RBPNN) has a broader generalized capability that been

successfully applied to multiple fields. In this paper, the Euclidean distance of each data point in RBPNN is

extended by calculating its kernel-induced distance instead of the conventional sum-of squares distance. The

kernel function is a generalization of the distance metric that measures the distance between two data points as the

data points are mapped into a high dimensional space. During the comparing of the four constructed classification

models with Kernel RBPNN, Radial Basis Function networks, RBPNN and Back-Propagation networks as

proposed, results showed that, model classification on Iris Data with Kernel RBPNN display an outstanding

performance in this regard.

Keywords: Kernel function; Radial Basis Probabilistic Neural Network; Iris Data; Classification.

http://arpgweb.com/?ic=journal&journal=10&info=aims

http://arpgweb.com/index.php

Scientific Review, 2015, 1(4): 74-78

75

2 2

1

2

1 1 1( ) exp

22

kN

k j

k m m

jk

X Xf X

N

(1)

Since RBPNN is applicable to general classification problem, and assume that the eigenvector to be classified

must belong to one of the known classifications, then the absolute probabilistic value of each classification is not

important and only relative value needs to be considered, hence, in equation (1),

2

1 1

2m m

Can be neglected and equation (1) can be simplified as

2

1

2

1( ) exp

2

kN

k j

k

jk

X Xf X

N

(2)

In equation (2), is the smoothing parameter of RBPNN. After network training is completed, the prediction

accuracy can be enhanced through the adjustment of the smoothing parameter , that is, the larger the value, the

smoother the approaching function. If the smoothing parameter is inappropriately selected, it will lead to an

excessive or insufficient neural units in the network design, and over fitting or inappropriate fitting will be the result

in the function approaching attempt; finally, the prediction power will be reduced.

Let 2

kj k jd X X

Be the square of the Euclidean distance of two points Xk and Xj in the sample space, and equation (2) can be re-

written as

2

1

1( ) exp

2

kN

k

jk

kjdf X

N

(3)

In equation (3), when smoothing parameter approaches zero,

1( )

k

k

f XN

If Xk = Xj, then

( ) 0k

f X

At this moment, RBPNN will depend fully on the non-classified sample which is closest to the classified sample

to decide its classification. When smoothing parameter approaches infinity,

( ) 1k

f X

At this moment, RBPNN is close to blind classification. Usually, the researchers need to try different in

certain range to obtain one that can reach the optimum accuracy. Specht [15] had proposed a method that can be

used to adjust smoothing parameter ,that is, assign each input neural unit a single ; during the test stage, that

have the optimum classification result are taken through the fine adjustment of each .

RBPNN is a three-layer feed-forward neural network (as in Figure 1). The first layer is the input layer and the

number of neural unit is the number of independent variable and receives the input data; the second hidden layer in

the middle is Pattern Layer, which stores each training data; the data sent out by Pattern Layer will pass through the

neural unit of the third layer Summation Layer to correspond to each possible category, in this layer, the calculation

of equation (3) will be performed. The fourth layer is Competitive Layer; the competitive transfer function of this

layer will pick up from the output of the last layer the maximum value from these probabilities and generate the

output value. If the output value is 1, it means it is the category you want; but if the output value is 0, it means it is

other unwanted category.

Figure-1. RBPNN architecture

http://arpgweb.com/?ic=journal&journal=10&info=archive


76

3. Kernelized Radial Basis Probabilistic Neural Networks 3.1. Kernel-Based Approach

Given an unlabeled data set 1, ,

nX x x in the d-dimensional space R

d, let :

dR H be a non-linear

mapping function from this input space to a high dimensional feature space H. By applying the non-linear mapping

function , the dot product xi•xj in the input space. The key notion in kernel-based learning is that the mapping

function need not be explicitly specified. The dot product ( () )i j

x x in the high dimensional feature space can

be calculated through the kernel function K(xi, xj) in the input space Rd [17].

( , ) ( ) ( )i j i j

K x x x x (4)

Three commonly used kernel functions (Scholkopf & Smola, 2002) are the polynomial kernel function,

( , ) ( )d

i j i jK x x x x c (5)

where 0, ;c d N the Gaussian kernel function,

2

2( , ) exp

2

i j

i j

x xK x x

(6)

where 0; and the sigmoidal kernel function,

( , ) tanh( ( , ) )i j i j

K x x x x (7)

where 0 and 0.

3.2. Formulation

Given a data point (1 )d

ix R i n and a non-linear mapping : ,

dR H the Probability Density Function

at data point xi is defined as

2

1

2( (1

( ) exp2

) )iN

i j

i

ji

x xf x

N

(8)

where2

( ) ( )i j

x x is the square of distance between ( )i

x and ( ).j

x Thus a higher value of ( )i

f x

indicates that xi has more data points xj near to it in the feature space. The distance in feature space is calculated

through the kernel in the input space as follows:

2

( ) ( ) ( ) ( ) ( ) ( )

( ) ( ) 2 ( ) ( ) ( ) ( )

( , ) 2 ( , ) ( , ) (9)

i j i j i j

i i i j j j

i i i j j j

x x x x x x

x x x x x x

K x x K x x K x x

Therefore, Equation (8) can be rewritten as

2

1

( , ) 2 ( , ) ( , )1( ) exp

2

iN

i i i j j j

i

ji

K x x K x x K x xf x

N

(10)

The rest of the computation procedure is similar to that of the RBPNN method.

4. Construction of Classification Prediction Model and Roc Curve Analysis A sequence of 150 rows of data consists of sepal length (in cm), sepal width (in cm), petal length (in cm) and

petal width (in cm) that constitutes to the dataset. The data to include in the database is taken from Machine

Learning Repository [18]. All of data sets of the four parameters have been Z-transformed for classification purposes

and the Gaussian kernel was used in training of Kernelized RBPNN. The Z-transform is performed as a

preprocessing step to minimize the gap of minimum and maximum of data values. Thus, we divided 150 rows of

data into two separate parts, 100 rows for training and 50 rows remaining for testing. So, the ratio of train to test is

approximately 2:1. All training is done in a desktop PC of Intel duo core 2GHz with 1Gb RAM memory. In the Kernel RBPNN, MATLAB is used to self-write the program for the model construction and we fixed

value to the Smoothing Parameter σ to 0.1 according to Sansone and Vento [19]. In the test phase, each classifier has

been tested with data sets mentioned above. Performance of different classifiers in the test phase can be observed in

Table 1. The proposed classifier, which referred as Kernel RBPNN, achieves the best result amongst all other



77

methods. It can be concluded from Table 1 that the performance of Kernel RBPNN is better than RBPNN, RBFN

and BP. In term of time, Kernel RBPNN is about 82 times faster than RBFN and 1931 times faster than GD. The

classification accuracy obtained using Kernel RBPNN is 6% higher than RBPNN. It is also obvious that Kernel

RBPNN consumed less time in comparison to RBPNN, RBFN and BP. RBPNN outperform RBFN in classification

accuracy due to its network architecture that implemented Bayesian decision and nonparametric technique which

make it more applicable to general classification problem, this bring RBFN has the worst performance among all

models. Although the classification accuracy of the well known classification method, BP, is high, but as we know,

high processing time of BP makes it undesirable for many on-line recognition applications.

The true positive rate in Figure 2 means the percentage of the number of class with prediction result of 0 to the

number of class with real value of 0; false positive rate means the percentage of the number of class with prediction

result of 1 to the number of class with real value of 1. In this paper, the mutual verification of the data among the

models is performed. The Kernel RBPNN is verified as the best performed model and there are 50 rows of data in

the test results. The result is drawn as Receiver Operator Characteristic (ROC) curve as in Figure 2. In the figure, the

farther the ROC curve above the reference line, the larger the area under the ROC curve (AUC), which also means

the higher the classification prediction power of this model [20]. Table 2 shows the mutual verification results of the

Iris dataset. Figure 2 shows that the area under the ROC curve of the Kernel RBPNN model of the Iris dataset are

larger than the other three models; from the observation of Table 2, it seems that the specificity of Kernel RBPNN

model is less than other three models, but the AUC value are higher than those of BP, RBPNN and RBFN models;

therefore, it can be concluded that Kernel RBPNN model has very good classification prediction capability.

Table-1. Classification Prediction accuracy of Iris dataset

Method Classification Accuracy (%) CPU Time (s) Epoch

Kernel RBPNN 89.12 0.057 9

RBPNN 82.44 0.068 11

RBFN 45.15 4.688 200

GD 85.66 110.082 30000

Table-2. Mutual Verification with AUC.

Method AUC

Kernelized RBPNN 0.7296

RBPNN 0.6554

RBFN 0.4827

GD 0.6349

Figure-2. The Mutual verified ROC curve of test data of Iris dataset

5. Conclusion In this paper, a new RBPNN which consists of kernel-based approach has been proposed. Not like the

conventional RBPNN that only based on Euclidean distance computation, the Kernel RBPNN applied the kernel-

induced distance computation which can implicitly map the input data to a high dimensional space in which data

classification is easier. The network is applied to river water classification problem and showed an increased in

classification accuracy in comparison with other well-known classifiers. The metrics for considering performance of

ROC Curve

Kernel RBPNN

RBPNN

RBFN

GD

Reference Line



78

a classifier in this work were the classification accuracy and processing time of the classifier. In future work, we plan

to improve the classification rate by employing different kernel function such as wavelet function.

References [1] Yamazaki, K., 2015. "Accuracy analysis of semi-supervised classification when the class balance changes."

Neurocomputing, vol. 160, pp. 132-140.

[2] Altman, E. I., 1968. "Financial ratios, discriminate analysis and the prediction of corporate bankruptcy."

Journal of Finance, vol. 23, pp. 589-609.

[3] Ohlson, J. A., 1980. "Financial ratios and the probabilistic prediction of bankruptcy." Journal of Accounting

Research, vol. 18, pp. 109-131.

[4] Odom, M. D. and Sharda, R., 1990. "A neural network model for bankruptcy prediction." IEEE INNS

IJCNN, vol. 2, pp. 163-168.

[5] Breiman, L., 1996. "Bagging predictors." Machine Learning, vol. 24, pp. 123-140.

[6] Zarita, Z., Sharifuddin, M. Z., Lim, E. A., and Hafizan, J., 2004. "Application of neural network in river

water quality classification." Ecological and Environmental Modeling, vol. 2, pp. 24-31.

[7] Ganchev, T., Tasoulis, D. K., and Vrahatis, M. N., 2003. "Locally Recurrent Probabilistic Neural Network

for Text-Independent Speaker Verification." Proc. of the Euro Speech, vol. 3, pp. 762-766.

[8] Muller, K. R., 2001. "An introduction to kernel-based learning algorithms." IEEE Trans. Neural Networks,

vol. 12, pp. 181-202.

[9] Girolami, M., 2002. "Mercer kernel-based clustering in feature space." IEEE Trans. Neural Networks, vol.

13, pp. 780-784.

[10] Zhang, R. and Rudnicky, A. I., 2002. "A large scale clustering scheme for kernel k-means," In The 16th

International Conference on Pattern Recognition. pp. 289-292.

[11] Wu, Z. D. and Xie, W. X., 2003. "Fuzzy c-means clustering algorithm based on kernel method," In The

Fifth International Conference on Computational Intelligent and Multimedia Applications. pp. 1-6.

[12] Kosic, D., 2015. "Fast clustered radial basis function network as an adaptive predictive controller." Neural

Networks, vol. 63, pp. 79-86.

[13] Ha, Q., Wahid, H., Duc, H., and Azzi, M., 2015. "Enhanced radial basis function neural networks for ozone

level estimation." Neurocomputing, vol. 155, pp. 62-70.

[14] Lee, Y. J., Micchelli, C. A., and Yoon, J., 2015. "A study on multivariate interpolation by increasingly flat

kernel functions." Journal of Mathematical Analysis and Applications, vol. 427, pp. 74-87.

[15] Specht, D. F., 1990. "Probabilistic neural network and the polynomial adaline as complementary techniques

for classification." IEEE Trans. Neural Networks, vol. 1, pp. 111-121.

[16] Yeh, Y. C., 1998. Application of neural network. Taiwan: Scholars Books

[17] Scholkopf, B. and Smola, A. J., 2002. Learning with Kernels. Cambridge: MIT Press

[18] Machine Learning Repository, 2015. https://archive.ics.uci.edu/ml/datasets/Iris [19] Sansone, C. and Vento, M., 2001. "A classification reliability drive reject rule for multi-expert system."

International Journal of Pattern Recognition and Artificial Intelligent, vol. 15, pp. 1-19.

[20] Bradley, A. P., 1977. "The use of the area under ROC curve in the evaluation of machine learning

algorithms." Pattern Recognition, vol. 30, pp. 1145-1159.