Scientific Review
ISSN: 2412-2599 Vol. 1, No. 4, pp: 74-78, 2015
URL: http://arpgweb.com/?ic=journal&journal=10&info=aims
*Corresponding Author
Academic Research Publishing Group
Classification of Iris Data using Kernel Radial Basis Probabilistic
Neural Network
Lim Eng Aik* Institute of Engineering Mathematic Universiti Malaysia Perlis, 02600 Ulu Pauh, Perlis
Mohd. Syafarudy Abu Institute of Engineering Mathematic Universiti Malaysia Perlis, 02600 Ulu Pauh, Perlis
1. Introduction
In recent years, data classification and assessment models have become a popular and important research topic. Here, classification and assessment means using the related class information to evaluate the level of accuracy for a classification problem. Classification accuracy is usually the most important factor in predicting the right subject [1]. Therefore, if classification accuracy can be improved, classification can be applied in many fields with greatly reduced error.
In the past literature, some researchers used conventional statistical models, for example the discriminant analysis of Altman [2] and the Probit model of Ohlson [3]. In recent years, however, some researchers have found that models constructed with data mining techniques have better prediction accuracy than conventional statistical models; examples include the Back-Propagation network model of Odom and Sharda [4] and the decision tree classification model of Breiman [5]. Another recent approach, by Zarita, et al. [6], used the backpropagation method to classify river water quality. That approach gave good classification results, but training the network took a long time. In this paper, the Radial Basis Probabilistic Neural Network (RBPNN), which is known to have good generalization properties and to train faster than the backpropagation network (Ganchev, et al. [7]), is adopted for the construction of the classification prediction model.
Our proposed method uses a kernel function to increase the separability of the data by working in a high dimensional space. As a result, the proposed method achieves higher classification accuracy than the original RBPNN. Kernel-based classification in the feature space not only preserves the inherent structure of groups in the input space, but also simplifies the associated structure of the data [8]. Since Girolami first developed the kernel k-means clustering algorithm for unsupervised classification [9], several studies have demonstrated the superiority of kernel classification algorithms over other approaches to classification [10-14].
In this paper, we evaluate the performance of the proposed method, Kernel RBPNN, by comparing it with three well-known classification methods: Back-Propagation (BP), Radial Basis Function Network (RBFN), and RBPNN. The classification power of the methods is compared and analysed using the Iris data set.
Abstract: Radial Basis Probabilistic Neural Network (RBPNN) has a broad generalization capability and has been successfully applied in multiple fields. In this paper, the Euclidean distance of each data point in RBPNN is replaced by its kernel-induced distance instead of the conventional sum-of-squares distance. The kernel function is a generalization of the distance metric that measures the distance between two data points as the data points are mapped into a high dimensional space. Four classification models were constructed and compared: the proposed Kernel RBPNN, Radial Basis Function networks, RBPNN and Back-Propagation networks. The results show that Kernel RBPNN gives outstanding classification performance on the Iris data.
Keywords: Kernel function; Radial Basis Probabilistic Neural Network; Iris Data; Classification.
2. Radial Basis Probabilistic Neural Networks
The Radial Basis Probabilistic Neural Network (RBPNN) architecture is based on Bayesian decision theory and a nonparametric technique for estimating the Probability Density Function (PDF). The PDF takes the form of a Gaussian distribution. Specht [15] proposed the following function [16]:
$$f(X_k) = \frac{1}{(2\pi)^{m/2}\sigma^{m}} \frac{1}{N_k} \sum_{j=1}^{N_k} \exp\left(-\frac{\|X_k - X_j\|^2}{2\sigma^2}\right) \tag{1}$$
Since RBPNN is applicable to general classification problems, and the feature vector to be classified is assumed to belong to one of the known classes, the absolute probability value of each class is not important and only the relative values need to be considered. Hence, in equation (1), the factor
$$\frac{1}{(2\pi)^{m/2}\sigma^{m}}$$
can be neglected and equation (1) can be simplified as
$$f(X_k) = \frac{1}{N_k} \sum_{j=1}^{N_k} \exp\left(-\frac{\|X_k - X_j\|^2}{2\sigma^2}\right) \tag{2}$$
In equation (2), $\sigma$ is the smoothing parameter of RBPNN. After network training is completed, the prediction accuracy can be improved by adjusting $\sigma$: the larger its value, the smoother the approximating function. If the smoothing parameter is chosen inappropriately, the network will have too many or too few neural units, the function approximation will overfit or underfit, and the prediction power will be reduced.
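The simplified density estimate of equation (2) can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' MATLAB program: the function name `class_density` and the toy samples are hypothetical, and the query point plays the role of the point being classified.

```python
import numpy as np

def class_density(x, class_samples, sigma):
    """Simplified Parzen estimate of equation (2) for one class."""
    x = np.asarray(x, dtype=float)
    class_samples = np.asarray(class_samples, dtype=float)
    # Squared Euclidean distance from x to every stored sample of the class.
    sq_dist = np.sum((class_samples - x) ** 2, axis=1)
    # Average of the Gaussian responses; the constant factor of (1) is dropped.
    return float(np.mean(np.exp(-sq_dist / (2.0 * sigma ** 2))))

# Hypothetical two-dimensional training samples for one class.
samples = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
query = np.array([0.0, 0.0])

# A larger sigma gives a smoother (flatter) estimate.
for sigma in (0.1, 1.0, 10.0):
    print(sigma, class_density(query, samples, sigma))
```

Running the loop shows the estimate rising towards 1 as the smoothing parameter grows, which is the smoothing effect described above.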
Let
$$d_{kj} = \|X_k - X_j\|^2$$
be the square of the Euclidean distance between two points $X_k$ and $X_j$ in the sample space; equation (2) can then be rewritten as
$$f(X_k) = \frac{1}{N_k} \sum_{j=1}^{N_k} \exp\left(-\frac{d_{kj}}{2\sigma^2}\right) \tag{3}$$
In equation (3), when the smoothing parameter $\sigma$ approaches zero,
$$f(X_k) \to \frac{1}{N_k}$$
if $X_k = X_j$ for one of the stored samples, and otherwise
$$f(X_k) \to 0.$$
In this limit, RBPNN relies entirely on the classified sample closest to the unclassified sample to decide its class. When the smoothing parameter approaches infinity,
$$f(X_k) \to 1.$$
In this limit, RBPNN is close to blind classification. Usually, researchers need to try different values of $\sigma$ within a certain range to find the one that gives the best accuracy. Specht [15] proposed a method for adjusting the smoothing parameter: each input neural unit is assigned its own $\sigma$, and during the test stage the values of $\sigma$ that give the best classification result are obtained by fine adjustment of each $\sigma$.
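The two limiting cases can be checked numerically. The sketch below evaluates the sum in equation (3) for a very small and a very large smoothing parameter; the helper name `density_from_sq_dists` and the squared distances are hypothetical values chosen only for illustration.

```python
import numpy as np

def density_from_sq_dists(sq_dists, sigma):
    """Equation (3): mean of exp(-d_kj / (2 sigma^2)) over one class."""
    d = np.asarray(sq_dists, dtype=float)
    return float(np.mean(np.exp(-d / (2.0 * sigma ** 2))))

# Hypothetical squared distances from a query to four stored samples.
# In the first list one stored sample coincides with the query (d = 0).
with_match = [0.0, 0.5, 1.0, 2.0]
without_match = [0.3, 0.5, 1.0, 2.0]

tiny, huge = 1e-3, 1e3
print(density_from_sq_dists(with_match, tiny))     # close to 1/N = 1/4
print(density_from_sq_dists(without_match, tiny))  # close to 0
print(density_from_sq_dists(with_match, huge))     # close to 1
```

With a very large smoothing parameter every class scores close to 1 regardless of the data, which is why the text calls this regime blind classification.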
RBPNN is a four-layer feed-forward neural network (see Figure 1). The first layer is the input layer; it receives the input data, and its number of neural units equals the number of independent variables. The second layer is the Pattern Layer, which stores each training sample. The outputs of the Pattern Layer pass to the neural units of the third layer, the Summation Layer, each of which corresponds to one possible category; the calculation of equation (3) is performed in this layer. The fourth layer is the Competitive Layer; its competitive transfer function picks the maximum of the probabilities output by the previous layer and generates the output value. An output value of 1 indicates the selected category, and an output value of 0 indicates any other category.
Figure-1. RBPNN architecture
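The flow of data through the pattern, summation and competitive layers can be sketched as follows. This is an illustrative outline only: `rbpnn_predict` and the toy data are hypothetical, a single smoothing parameter is assumed for all units, and the function returns the winning class label directly instead of the 1/0 output per category described in the text.

```python
import numpy as np

def rbpnn_predict(x, train_X, train_y, sigma):
    """Classify x by the largest class-wise density of equation (3)."""
    x = np.asarray(x, dtype=float)
    train_X = np.asarray(train_X, dtype=float)
    train_y = np.asarray(train_y)
    # Pattern layer: one Gaussian unit per stored training sample.
    sq_dist = np.sum((train_X - x) ** 2, axis=1)
    activations = np.exp(-sq_dist / (2.0 * sigma ** 2))
    # Summation layer: average the activations belonging to each class.
    classes = np.unique(train_y)
    scores = np.array([activations[train_y == c].mean() for c in classes])
    # Competitive layer: the class with the largest score wins.
    return classes[np.argmax(scores)]

# Hypothetical toy data: two classes in two dimensions.
train_X = np.array([[0.0, 0.0], [0.2, 0.1], [3.0, 3.0], [3.1, 2.9]])
train_y = np.array([0, 0, 1, 1])

print(rbpnn_predict([0.1, 0.0], train_X, train_y, sigma=0.5))
print(rbpnn_predict([2.8, 3.2], train_X, train_y, sigma=0.5))
```

No iterative weight update appears in this sketch: the pattern layer simply stores the training samples, so a prediction needs only one pass over them.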
3. Kernelized Radial Basis Probabilistic Neural Networks
3.1. Kernel-Based Approach
Given an unlabeled data set $X = \{x_1, \ldots, x_n\}$ in the d-dimensional space $R^d$, let $\Phi: R^d \to H$ be a non-linear mapping function from this input space to a high dimensional feature space H. By applying the non-linear mapping function $\Phi$, the dot product $x_i \cdot x_j$ in the input space is mapped to $\Phi(x_i) \cdot \Phi(x_j)$ in the feature space. The key notion in kernel-based learning is that the mapping function $\Phi$ need not be explicitly specified. The dot product $\Phi(x_i) \cdot \Phi(x_j)$ in the high dimensional feature space can be calculated through the kernel function $K(x_i, x_j)$ in the input space $R^d$ [17]:
$$K(x_i, x_j) = \Phi(x_i) \cdot \Phi(x_j) \tag{4}$$
Three commonly used kernel functions (Scholkopf and Smola [17]) are the polynomial kernel function,
$$K(x_i, x_j) = (x_i \cdot x_j + c)^d \tag{5}$$
where $c \geq 0$, $d \in N$; the Gaussian kernel function,
$$K(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right) \tag{6}$$
where $\sigma > 0$; and the sigmoidal kernel function,
$$K(x_i, x_j) = \tanh(\kappa (x_i \cdot x_j) + \theta) \tag{7}$$
where $\kappa > 0$ and $\theta < 0$.
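As a rough illustration, the three kernel functions can be written as small Python functions. The function names, the default parameter values and the parameter names `kappa` and `theta` for the sigmoidal kernel are choices made for this sketch, not taken from the paper.

```python
import numpy as np

def polynomial_kernel(x, y, c=1.0, degree=2):
    """Polynomial kernel: (x . y + c) ** degree."""
    return (np.dot(x, y) + c) ** degree

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel: exp(-||x - y||^2 / (2 sigma^2))."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2))

def sigmoid_kernel(x, y, kappa=1.0, theta=-1.0):
    """Sigmoidal kernel: tanh(kappa * (x . y) + theta)."""
    return np.tanh(kappa * np.dot(x, y) + theta)

# Hypothetical input vectors.
a = np.array([1.0, 2.0])
b = np.array([3.0, 4.0])

print(polynomial_kernel(a, b))  # (11 + 1) ** 2
print(gaussian_kernel(a, b))    # exp(-8 / 2)
print(sigmoid_kernel(a, b))     # tanh(11 - 1)
```

Each function takes two input-space vectors and returns a scalar, so the feature-space mapping itself is never computed.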
3.2. Formulation
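Given a data point $x_i \in R^d$ $(1 \leq i \leq n)$ and a non-linear mapping $\Phi: R^d \to H$, the Probability Density Function at data point $x_i$ is defined as
$$f(x_i) = \frac{1}{N_i} \sum_{j=1}^{N_i} \exp\left(-\frac{\|\Phi(x_i) - \Phi(x_j)\|^2}{2\sigma^2}\right) \tag{8}$$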
where $\|\Phi(x_i) - \Phi(x_j)\|^2$ is the square of the distance between $\Phi(x_i)$ and $\Phi(x_j)$. Thus a higher value of $f(x_i)$ indicates that $x_i$ has more data points $x_j$ near to it in the feature space. The distance in feature space is calculated through the kernel in the input space as follows:
$$\begin{aligned}
\|\Phi(x_i) - \Phi(x_j)\|^2 &= (\Phi(x_i) - \Phi(x_j)) \cdot (\Phi(x_i) - \Phi(x_j)) \\
&= \Phi(x_i) \cdot \Phi(x_i) - 2\,\Phi(x_i) \cdot \Phi(x_j) + \Phi(x_j) \cdot \Phi(x_j) \\
&= K(x_i, x_i) - 2K(x_i, x_j) + K(x_j, x_j)
\end{aligned} \tag{9}$$
Therefore, Equation (8) can be rewritten as
$$f(x_i) = \frac{1}{N_i} \sum_{j=1}^{N_i} \exp\left(-\frac{K(x_i, x_i) - 2K(x_i, x_j) + K(x_j, x_j)}{2\sigma^2}\right) \tag{10}$$
The rest of the computation procedure is similar to that of the RBPNN method.
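Equations (9) and (10) can be put together in a short sketch. This is a hedged illustration, not the authors' program: all function names and the toy data are hypothetical, and the kernel width `sigma_k` and the smoothing parameter `sigma` are kept as separate arguments, which is an assumption, since the paper uses the symbol σ in both places without saying whether they share a value.

```python
import numpy as np

def gaussian_kernel(x, y, sigma_k=1.0):
    """Gaussian kernel of equation (6); sigma_k is the kernel width."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * sigma_k ** 2)))

def kernel_sq_distance(x, y, kernel):
    """Equation (9): squared feature-space distance from kernel values."""
    return kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y)

def kernel_density(x, class_samples, kernel, sigma):
    """Equation (10): class density using the kernel-induced distance."""
    d = np.array([kernel_sq_distance(x, s, kernel) for s in class_samples])
    return float(np.mean(np.exp(-d / (2.0 * sigma ** 2))))

def kernel_rbpnn_predict(x, train_X, train_y, kernel, sigma):
    """Pick the class whose kernel-based density at x is largest."""
    train_X = np.asarray(train_X, dtype=float)
    train_y = np.asarray(train_y)
    classes = np.unique(train_y)
    scores = [kernel_density(x, train_X[train_y == c], kernel, sigma)
              for c in classes]
    return classes[int(np.argmax(scores))]

# Hypothetical toy data: two classes in two dimensions.
train_X = np.array([[0.0, 0.0], [0.2, 0.1], [3.0, 3.0], [3.1, 2.9]])
train_y = np.array([0, 0, 1, 1])

print(kernel_rbpnn_predict(np.array([0.1, 0.0]), train_X, train_y,
                           gaussian_kernel, sigma=0.5))
print(kernel_rbpnn_predict(np.array([2.8, 3.2]), train_X, train_y,
                           gaussian_kernel, sigma=0.5))
```

Passing `np.dot` as the kernel makes `kernel_sq_distance` return the ordinary squared Euclidean distance, which gives a quick sanity check on equation (9). For the Gaussian kernel, K(x, x) = 1, so the kernel-induced squared distance reduces to 2 - 2K(x_i, x_j).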
4. Construction of Classification Prediction Model and ROC Curve Analysis
The dataset consists of 150 rows, each containing sepal length, sepal width, petal length and petal width (all in cm). The data were taken from the UCI Machine Learning Repository [18]. All four parameters were Z-transformed (standardized) for classification purposes, and the Gaussian kernel was used in training the Kernelized RBPNN. The Z-transform is performed as a preprocessing step to reduce the gap between the minimum and maximum data values. We divided the 150 rows of data into two parts: 100 rows for training and the remaining 50 rows for testing, a train-to-test ratio of 2:1. All training was done on a desktop PC with an Intel dual-core 2 GHz processor and 1 GB of RAM.
For the Kernel RBPNN, the model was implemented in a self-written MATLAB program, and the smoothing parameter $\sigma$ was fixed at 0.1 following Sansone and Vento [19]. In the test phase, each classifier was tested on the data set described above. The performance of the classifiers in the test phase is shown in Table 1. The proposed classifier, referred to as Kernel RBPNN, achieves the best result among all methods: Table 1 shows that Kernel RBPNN performs better than RBPNN, RBFN and BP (labelled GD in the tables). In terms of time, Kernel RBPNN is about 82 times faster than RBFN and about 1931 times faster than BP. The classification accuracy obtained using Kernel RBPNN is 6.68 percentage points higher than that of RBPNN. Kernel RBPNN also consumed less time than RBPNN, RBFN and BP. RBPNN outperforms RBFN in classification accuracy because its network architecture implements Bayesian decision theory and a nonparametric technique, which makes it more applicable to general classification problems; RBFN has the worst performance among all models. Although the classification accuracy of the well-known BP method is high, its long processing time makes it undesirable for many on-line recognition applications.
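The preprocessing and split described at the start of this section can be sketched as below. Several details are assumptions: the paper does not say whether the 100 training rows were chosen at random or whether standardization was fitted on all 150 rows, and the array used here is a random placeholder with the shape of the Iris data, not the real measurements. The names `zscore` and `split_rows` are hypothetical.

```python
import numpy as np

def zscore(X):
    """Standardize each column to zero mean and unit standard deviation."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)

def split_rows(X, y, n_train, seed=0):
    """Shuffle the rows (assumed random split) and cut at n_train."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    train, test = order[:n_train], order[n_train:]
    return X[train], y[train], X[test], y[test]

# Placeholder with the Iris shape (150 rows, 4 features, 3 classes of 50).
# Real use would load the measurements from the repository file instead.
rng = np.random.default_rng(1)
X = rng.normal(loc=5.0, scale=2.0, size=(150, 4))
y = np.repeat([0, 1, 2], 50)

Xz = zscore(X)
X_train, y_train, X_test, y_test = split_rows(Xz, y, n_train=100)
print(X_train.shape, X_test.shape)
```

Standardizing puts the four measurements on a common scale, so that no single feature dominates the distance computations used by the classifiers.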
In Figure 2, the true positive rate is the percentage of samples that belong to a class and are predicted as belonging to it; the false positive rate is the percentage of samples that do not belong to the class but are predicted as belonging to it. In this paper, mutual verification of the data among the models is performed. The Kernel RBPNN is verified as the best-performing model on the 50 rows of test data. The result is drawn as a Receiver Operating Characteristic (ROC) curve in Figure 2. In the figure, the farther the ROC curve lies above the reference line, the larger the area under the ROC curve (AUC), and the higher the classification prediction power of the model [20]. Table 2 shows the mutual verification results for the Iris dataset. Figure 2 shows that the area under the ROC curve of the Kernel RBPNN model is larger than that of the other three models. Although the specificity of the Kernel RBPNN model appears to be lower than that of the other three models, Table 2 shows that its AUC value is higher than those of the BP, RBPNN and RBFN models; it can therefore be concluded that the Kernel RBPNN model has very good classification prediction capability.
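For readers unfamiliar with the AUC, the sketch below builds ROC points from classifier scores and integrates them with the trapezoidal rule. It is a generic illustration, not the procedure used to produce Table 2: the function names, scores and labels are hypothetical, the problem is treated as binary (label 1 is the positive class), and all scores are assumed to be distinct.

```python
import numpy as np

def roc_points(scores, labels):
    """Return (fpr, tpr) arrays from sweeping a threshold over the scores."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    # Sort samples from the highest score to the lowest.
    order = np.argsort(-scores, kind="stable")
    labels = labels[order]
    tp = np.cumsum(labels)        # true positives at each threshold
    fp = np.cumsum(1 - labels)    # false positives at each threshold
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    tpr = np.concatenate(([0.0], tp / n_pos))
    fpr = np.concatenate(([0.0], fp / n_neg))
    return fpr, tpr

def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

# Hypothetical scores for six test samples (1 = positive class).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]

fpr, tpr = roc_points(scores, labels)
print(auc(fpr, tpr))  # 8 of the 9 positive/negative pairs are ranked correctly
```

The reference line in an ROC plot corresponds to an AUC of 0.5, so values above 0.5 indicate ranking better than chance.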
Table-1. Classification prediction accuracy of Iris dataset
Method          Classification Accuracy (%)    CPU Time (s)    Epoch
Kernel RBPNN    89.12                          0.057           9
RBPNN           82.44                          0.068           11
RBFN            45.15                          4.688           200
GD              85.66                          110.082         30000
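The speed ratios and accuracy gap quoted in the text follow directly from Table 1, as the short check below shows. The dictionary names are arbitrary; the numbers are copied from the table.

```python
# Values copied from Table 1.
cpu_time = {"Kernel RBPNN": 0.057, "RBPNN": 0.068, "RBFN": 4.688, "GD": 110.082}
accuracy = {"Kernel RBPNN": 89.12, "RBPNN": 82.44, "RBFN": 45.15, "GD": 85.66}

# How many times faster Kernel RBPNN is than RBFN and than GD.
speedup_rbfn = cpu_time["RBFN"] / cpu_time["Kernel RBPNN"]
speedup_gd = cpu_time["GD"] / cpu_time["Kernel RBPNN"]

# Accuracy difference between Kernel RBPNN and RBPNN, in percentage points.
gain = accuracy["Kernel RBPNN"] - accuracy["RBPNN"]

print(round(speedup_rbfn, 1))  # 82.2
print(round(speedup_gd, 1))    # 1931.3
print(round(gain, 2))          # 6.68
```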
Table-2. Mutual Verification with AUC.
Method               AUC
Kernelized RBPNN     0.7296
RBPNN                0.6554
RBFN                 0.4827
GD                   0.6349
Figure-2. The mutually verified ROC curve of test data of Iris dataset (curves: Kernel RBPNN, RBPNN, RBFN, GD and the reference line)
5. Conclusion
In this paper, a new RBPNN that incorporates a kernel-based approach has been proposed. Unlike the conventional RBPNN, which is based only on Euclidean distance computation, the Kernel RBPNN applies a kernel-induced distance computation that implicitly maps the input data to a high dimensional space in which data classification is easier. The network was applied to the Iris data classification problem and showed an increase in classification accuracy in comparison with other well-known classifiers. The metrics used to assess classifier performance in this work were the classification accuracy and the processing time of the classifier. In future work, we plan to improve the classification rate by employing different kernel functions, such as wavelet functions.
References
[1] Yamazaki, K., 2015. "Accuracy analysis of semi-supervised classification when the class balance changes."
Neurocomputing, vol. 160, pp. 132-140.
[2] Altman, E. I., 1968. "Financial ratios, discriminant analysis and the prediction of corporate bankruptcy."
Journal of Finance, vol. 23, pp. 589-609.
[3] Ohlson, J. A., 1980. "Financial ratios and the probabilistic prediction of bankruptcy." Journal of Accounting
Research, vol. 18, pp. 109-131.
[4] Odom, M. D. and Sharda, R., 1990. "A neural network model for bankruptcy prediction." IEEE INNS
IJCNN, vol. 2, pp. 163-168.
[5] Breiman, L., 1996. "Bagging predictors." Machine Learning, vol. 24, pp. 123-140.
[6] Zarita, Z., Sharifuddin, M. Z., Lim, E. A., and Hafizan, J., 2004. "Application of neural network in river
water quality classification." Ecological and Environmental Modeling, vol. 2, pp. 24-31.
[7] Ganchev, T., Tasoulis, D. K., and Vrahatis, M. N., 2003. "Locally Recurrent Probabilistic Neural Network
for Text-Independent Speaker Verification." Proc. of the Euro Speech, vol. 3, pp. 762-766.
[8] Muller, K. R., 2001. "An introduction to kernel-based learning algorithms." IEEE Trans. Neural Networks,
vol. 12, pp. 181-202.
[9] Girolami, M., 2002. "Mercer kernel-based clustering in feature space." IEEE Trans. Neural Networks, vol.
13, pp. 780-784.
[10] Zhang, R. and Rudnicky, A. I., 2002. "A large scale clustering scheme for kernel k-means," In The 16th
International Conference on Pattern Recognition. pp. 289-292.
[11] Wu, Z. D. and Xie, W. X., 2003. "Fuzzy c-means clustering algorithm based on kernel method," In The
Fifth International Conference on Computational Intelligence and Multimedia Applications. pp. 1-6.
[12] Kosic, D., 2015. "Fast clustered radial basis function network as an adaptive predictive controller." Neural
Networks, vol. 63, pp. 79-86.
[13] Ha, Q., Wahid, H., Duc, H., and Azzi, M., 2015. "Enhanced radial basis function neural networks for ozone
level estimation." Neurocomputing, vol. 155, pp. 62-70.
[14] Lee, Y. J., Micchelli, C. A., and Yoon, J., 2015. "A study on multivariate interpolation by increasingly flat
kernel functions." Journal of Mathematical Analysis and Applications, vol. 427, pp. 74-87.
[15] Specht, D. F., 1990. "Probabilistic neural network and the polynomial adaline as complementary techniques
for classification." IEEE Trans. Neural Networks, vol. 1, pp. 111-121.
[16] Yeh, Y. C., 1998. Application of neural network. Taiwan: Scholars Books
[17] Scholkopf, B. and Smola, A. J., 2002. Learning with Kernels. Cambridge: MIT Press
[18] Machine Learning Repository, 2015. https://archive.ics.uci.edu/ml/datasets/Iris
[19] Sansone, C. and Vento, M., 2001. "A classification reliability driven reject rule for multi-expert systems."
International Journal of Pattern Recognition and Artificial Intelligence, vol. 15, pp. 1-19.
[20] Bradley, A. P., 1997. "The use of the area under the ROC curve in the evaluation of machine learning
algorithms." Pattern Recognition, vol. 30, pp. 1145-1159.