ORIGINAL ARTICLE
Gujarati character recognition using adaptive neuro fuzzy classifier with fuzzy hedges
Jayashree Rajesh Prasad • Uday Kulkarni
Received: 2 August 2013 / Accepted: 17 April 2014
© Springer-Verlag Berlin Heidelberg 2014
Abstract Recognition of Indian scripts is a challenging problem, and work towards the development of an OCR for handwritten Gujarati, an Indian script, is still in its infancy. This paper implements an Adaptive Neuro Fuzzy Classifier (ANFC) for Gujarati character recognition using fuzzy hedges (FHs). The FHs are trained together with the other network parameters by the scaled conjugate gradient (SCG) training algorithm. The tuned FH values improve the flexibility of the fuzzy sets; this property of FHs improves the distinguishability of overlapped classes. The work is further extended to feature selection based on FHs: the FH values indicate the importance degree of the fuzzy sets, so redundant and noisy features can be eliminated and significant features retained. An FH-based feature selection algorithm is implemented using the ANFC. This paper demonstrates the recognition performance of ANFC-FH and the improved results obtained with feature selection.
Keywords Concentration · Dilution · Feature selection (FS) · Fuzzy hedges (FHs) · Fuzzy surface transformers
1 Introduction
India is a land of many languages and Gujarati is an Indic
script similar in appearance to other Indo-Aryan scripts.
Gujarati script has a rich literary heritage. However,
research in the field of Gujarati script recognition faces
major problems mainly due to a large set of visually similar
characters, multi-component characters, and touching and broken characters.
This paper presents a pattern recognition system for Gujarati character recognition. The authors use a combination of four features. A novel Gabor phase XNOR pattern (GPXNP) and a pattern descriptor are proposed for the isolated handwritten character set of Gujarati. In addition to these two features, the authors use Contour Direction Probability Distribution Function (CDPDF) and autocorrelation features. Furthermore, the authors present the design and development of an ANFC for recognition of isolated handwritten Gujarati characters.
The authors exploit the method of employing adaptive networks to solve a fuzzy classification problem. System parameters, such as the membership functions (MFs) defined for each feature and the parameterized t-norms used to combine conjunctive conditions, are calibrated with backpropagation.
This paper is organized as follows: Sect. 2 surveys related work. The motivation behind this research is presented in Sect. 3. Section 4 justifies the significance of the present work on Gujarati script and focuses on the challenges and opportunities in Gujarati handwriting recognition research. Section 5 presents the core system architecture of the intended Gujarati character recognition system, including brief information about the data set, preprocessing, normalization and feature extraction methods. Section 6 describes the ANFC layers. Section 7 elaborates the feature selection and classification mechanism of the ANFC, followed by results in Sect. 8. The authors conclude in Sect. 9 and comment on the scope for future research in Sect. 10.
J. R. Prasad (✉)
Department of Computer Engineering, Vishwakarma Institute of
Information Technology, Pune, India
e-mail: [email protected]
U. Kulkarni
Department of Computer Engineering, SGGS Institute of
Engineering and Technology, Nanded, India
e-mail: [email protected]
Int. J. Mach. Learn. & Cyber.
DOI 10.1007/s13042-014-0259-8
2 Related work
A detailed discussion on applications of fuzzy systems
based on fuzzy rules for classification is available [1–7].
These systems employ linguistic rules, which are either provided by experts or extracted from a given training data set by a variety of methods such as clustering [1]. Fuzzy systems can be combined with neural networks, and such systems are called neuro-fuzzy systems [8–12]. Neuro-fuzzy systems define the class distributions and show the input–output relations [8, 13–15], whereas fuzzy systems employ natural language for developing fuzzy rules.
Neural networks are employed for tuning or training
the system parameters in neuro-fuzzy applications. In the
early 1970s, a class of powering modifiers was introduced
[7], which defined the concept of linguistic variables and
hedges. The concept of computing with words was
introduced as an extension of fuzzy sets and logic theory
[16, 17]. FHs change the meaning of primary term values.
Many researchers have contributed to the computing
with words and to the FH concepts with their theoretical
studies [18–21]. However, these contributions have rarely been used to solve real-world problems and applications. The FH concept has been employed to propose a particle swarm optimization-aided neuro-fuzzy classifier [13].
An FH-based fuzzification layer is added to the network, and the FHs are used to modify the piecewise membership sub-functions [13]. Furthermore, a fuzzy-genetic inference system based on the Mamdani model, applying FHs and power parameters to membership functions, is proposed [22]; in that study, FHs and power parameters are defined for the membership functions separately. The concept of extended hedge algebras and their application in approximate reasoning are discussed [23]. Modifying the existing FH models, a horizon-shifting model of FHs is proposed [24], by which the membership function (MF) can be shifted and its steepness modified. A current-mode circuit is designed using FHs for adaptive fuzzy logic controllers [25], while a study on approximate reasoning using FHs is presented [26]. Hedge operations are used to better qualify and emphasize crisp variables, mixing crisp and fuzzy logic in applications [27]. Several interesting properties of FHs are investigated [28], such as being compatible with simple symbolic rules, avoiding computations, being compatible with fuzzy logic, enhancing the comparison of various available fuzzy implications, and managing gradual rules in the context of deductive rules. Finally, the design of an adaptive fuzzy logic controller based on FH concepts is proposed [29].
3 Motivation
Motivation behind this research can be stated through
following objectives:
1. To improve classification efficiency for large vocabulary pattern recognition problems such as handwritten character recognition of Gujarati.
2. To handle overlapped classes with reliability.
3. To improve the meaning of classical fuzzy rules.
In the literature, many fuzzy classifiers have been proposed to solve classification problems [3–6, 8, 11, 13–15, 30–32]. However, some problems remain unsolved. One of them is discriminating overlapped classes with high reliability: overlapping classes can significantly decrease classifier performance. Generally, to overcome this problem, the input space is projected into a new space [1]; in that case, the meaning of the original features can be lost. Nozaki et al. [5] used rule weights to solve this problem. Instead of these solutions, some features of the input space that cause overlapping can be weakened during classification. Investigation of fuzzy classification rules shows that some fuzzy sets hurt the classification success due to overlapping. Furthermore, changing the width of the MFs in fuzzy-based classification is impossible because of other proximate MFs. To meet the first objective, the authors use the FH concept.
To meet the second objective, a mechanism to improve
the meaning of the classical fuzzy rules is suggested.
Therefore, this study implements an ANFC using FHs. To
accomplish this implementation, a layer is added to the
neuro-fuzzy classifier to indicate the effect of FHs.
Empirical studies show that FHs can improve or maintain the classification success of the neuro-fuzzy classifier. In particular, the implemented neuro-fuzzy classifier improves the distinguishability of overlapping classes. Discriminative features should be used for classification instead of all the features, as some of them can cause overlapping among the classes. Thus, the feature selection mechanism presented in this paper helps meet the third objective.
4 Significance of present work for Gujarati
Gujarati is also the name of the script used to write the Gujarati language, spoken by about 50 million people in western India [33]. Gujarati has 12 vowels and 34 consonants, as shown in Fig. 1. Gujarati belongs to the genre of languages that use variants of the Devanagari script [34, 35]. No significant work is found in the literature that addresses recognition of the Gujarati language [36, 37]. Some of the
Gujarati characters are very similar in appearance. With
sufficient noise these characters can easily be misclassified.
Often, these characters are misclassified even by humans
who then need to use context knowledge to correct the
error. The authors identify some issues that characterize the Gujarati script. Although these issues are discussed with reference to Gujarati, they are largely applicable to most Indian languages.
The authors further review earlier work on the Gujarati script. Development of an OCR for Gujarati was initiated in 2003 at M.S. University Baroda under Indian language technology solutions for Gujarati. Gujarati is a multilevel script, written in three zones: the base character zone, the upper modifier zone and the lower modifier zone. A sophisticated method for accurate zone detection in images of printed Gujarati is presented in [33]. Another approach to recognizing printed Gujarati characters [36] describes the design and implementation of a template-matching prototype system that recognizes a subset of the printed Gujarati script.
After these initial efforts, classification of a subset of printed or digitized Gujarati characters is proposed in [38], utilizing the Euclidean minimum distance and k-NN classifiers.
More recently, an OCR system for handwritten Gujarati numerals has been proposed [33]. A multilayered feed-forward neural network is suggested for classification of the digits, whose features are abstracted by four different profiles. There is another attempt at handwritten numeral recognition of Gujarati [34] that proposes an SVM-based recognition scheme.
According to the review in [35], recognition rate depends on the level of constraints on the handwriting, such as the type of handwriting, the number of writers, the size of the vocabulary and the spatial layout. The literature survey thus shows that there is no documentary evidence of research on a Gujarati OCR for a complete character set. This work is probably among the initial attempts towards recognition of the full character set of isolated Gujarati characters, even compared to previous work by the authors [39], which presents recognition of only 18 Gujarati characters.
The authors have exploited various feature extraction methods and recognition techniques to develop a character recognition system for isolated Gujarati characters. Their initial attempt was a template-matching algorithm that worked on a subset of Gujarati characters and yielded a recognition efficiency of 71.66 % [39]. Next, they developed a statistical pattern recognition algorithm [40] based on statistical features such as the Euler value, standard deviation and Euclidean distance; these statistical parameters were compared with the images of stored characters for feature matching. This technique yielded a recognition efficiency of 56.14 %. The work presented in [39, 40] represents experimentation on only 18 of the 46 Gujarati characters. Similarly, an implementation of an ANFC that yields around 62 % recognition efficiency is presented in [41]; the difference between that previous work and the present work is the use of FHs. Furthermore, the authors proposed a weighted k-NN algorithm with a novel distance measure, mean χ² [42], which yielded the highest recognition efficiency, 86.33 %, among all the techniques proposed by the authors. All the previous techniques except [42] work on a limited set of characters and deploy different features and classification techniques. The present work focuses mainly on the use of FHs to eliminate the ill effects of overlapped classes.
The facts that make this domain challenging are listed as follows:
1. Large character sets with different patterns as opposed to English.
2. The structure of Indian language scripts, characterized by curves, holes and strokes [43].
3. Recognition difficulty due to translation, rotation and scaling.
4. Unavailability of correct data sets, or absence of methods to generate appropriate data sets.
5. Selection of appropriate features for classification of characters.
These facts emphasize the necessity of and scope for developing efficient techniques for Gujarati OCR.
5 System architecture
This section presents the system architecture for handwritten Gujarati OCR, as shown in Fig. 2. The training phase comprises preprocessing and feature extraction. The proposed system extracts four different features. It is well known that statistical and structural approaches to OCR have
Fig. 1 Gujarati script with consonants and vowels
specific advantages and disadvantages. The authors have developed hybrid feature extraction techniques in an effort to leverage the advantages of both these approaches.
5.1 Data set description
The availability of a data set that captures the variations encountered in the real world is a critical part of any experimental research. Due to significant advances in OCR research in recent years, several data sets are available for English.
To the best of our knowledge, no handwritten Gujarati data sets exist [36]. Therefore, 360 samples from different writers were collected for each character in the Gujarati alphabet, i.e. 34 consonants and 12 vowels; this data set thus consists of 16,560 samples altogether. The characters were scanned at 300 dots per inch resolution. Experiments are executed on unconstrained handwritten characters: the characters have skew, may contain noise pixels, and some are broken at locations with fine links.
The data set developed by the authors is published as a ‘national repository’ to be used by future OCR researchers. It is available for free download at the portal of the Indian Language Technology Proliferation and Deployment Center, an undertaking of the Department of Electronics and Information Technology of the Government of India.
5.2 Preprocessing
Preprocessing serves the purpose of extracting regions of interest and enhancing and cleaning up the images, so that they can be directly and efficiently processed by the feature extraction stage. Digital scanners suffer from a number of limitations, e.g. geometrical distortions. Due to the absence of standard image acquisition procedures for OCR data sets, efficient preprocessing is required. Initially, the scanned images undergo a normalization operation.
The authors use moment-based normalization to obtain a normalized image from a geometric transformation procedure that is invariant to affine distortions of the image. This enhances the character recognition rate even when character samples from different writers exhibit affine geometric variations [37, 44].
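The moment-based normalization step can be illustrated with a minimal sketch: the character is translated by its centroid (first-order moments) and scaled by its second-order moments into a fixed frame. The function name, output size and 3-sigma scaling rule are illustrative assumptions, not the authors' exact procedure; a full affine normalization would additionally correct shear using the mixed moments.

```python
import numpy as np

def moment_normalize(img, out_size=32):
    """Centre a binary character image by its centroid and scale it by its
    second-order moments -- a minimal sketch of moment-based normalization."""
    ys, xs = np.nonzero(img)
    cy, cx = ys.mean(), xs.mean()                  # centroid (first moments)
    sy = np.sqrt(((ys - cy) ** 2).mean()) or 1.0   # vertical spread (second moment)
    sx = np.sqrt(((xs - cx) ** 2).mean()) or 1.0   # horizontal spread
    out = np.zeros((out_size, out_size), dtype=img.dtype)
    half = out_size / 2.0
    scale = out_size / 6.0                         # map ~3 sigma to the half-width
    for y, x in zip(ys, xs):
        ny = int(half + (y - cy) / sy * scale)
        nx = int(half + (x - cx) / sx * scale)
        if 0 <= ny < out_size and 0 <= nx < out_size:
            out[ny, nx] = 1
    return out
```

Because the output coordinates depend only on moment-normalized offsets, two samples of the same character that differ by translation and scale map to similar frames.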
5.3 Feature extraction
The authors present a novel combination of four feature extraction methods. As noted above, statistical and structural approaches to OCR have specific advantages and disadvantages; the proposed hybrid feature extraction techniques leverage the advantages of both and overcome the problems associated with statistical and structural methods when used independently.
The proposed system extracts four features listed as
follows:
1. Gabor Phase XNOR pattern (GPXNP).
2. Pattern descriptor.
3. Contour Direction Probability Distribution Function
(CDPDF).
4. Autocorrelation.
Among these four features, the first two are newly proposed for the isolated handwritten character set of Gujarati. The details of the proposed feature extraction algorithms are
Fig. 2 ANFC-FH with feature extraction methods, training and recognition of isolated Gujarati characters
available in [42]. Furthermore, GPXNP, CDPDF and autocorrelation represent statistical properties, whereas the pattern descriptor represents structural properties of an image.
6 ANFC with FHs
This paper explores various kinds of fuzzy set shape transformers and generators, known as hedges, to improve the discrimination power of the classifier. Hedges play the same role in a fuzzy modeling system as adverbs and adjectives do in English: they modify the nature of a fuzzy set. Hedges are important components of a fuzzy system, allowing us to closely model the semantics of the underlying knowledge [45].
A hedge modifies the shape of a fuzzy set's surface, causing a change in the membership truth function. Thus, a hedge transforms one fuzzy set into another, new fuzzy set. In a fuzzy reasoning system there are several different classes of hedge operations, each represented by a linguistic construct. To investigate the effects of FHs on the ANFC, the details of its layers are given below.
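The classical power hedges can be illustrated with a small sketch: concentration (e.g. "very", p = 2) lowers membership grades and sharpens the set, while dilation (e.g. "more or less", p = 0.5) raises them and widens it. A minimal illustration, assuming a Gaussian MF as used later in Eq. (1):

```python
import math

def gaussian_mf(x, c, sigma):
    """Gaussian membership grade of x for a fuzzy set centred at c with width sigma."""
    return math.exp(-0.5 * ((x - c) / sigma) ** 2)

def hedged(mu, p):
    """Power hedge: p > 1 concentrates the set, 0 < p < 1 dilates it."""
    return mu ** p

mu = gaussian_mf(1.0, 0.0, 1.0)   # grade at one width from the centre, ~0.607
very = hedged(mu, 2.0)            # concentration: grade drops to ~0.368
somewhat = hedged(mu, 0.5)        # dilation: grade rises to ~0.779
```

Since grades lie in [0, 1], any power p > 1 can only shrink them and any 0 < p < 1 can only grow them, which is exactly the reshaping a hedge performs on the set's surface.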
Figure 3 shows the architecture of the ANFC. Fuzzy classification is the task of partitioning a feature space into fuzzy classes. It is possible to describe the feature space with fuzzy regions, and to control each region with fuzzy rules [36, 45].
The ANFC under discussion with FHs is based on fuzzy rules. A fuzzy classification rule with inputs {x1, x2, …, xm} and an output defined with FHs is

IF x1 is A1 with p1 hedge AND x2 is A2 with p2 hedge THEN y is c1 class,

where A1 and A2 denote linguistic terms defined on the x1 and x2 feature spaces, and p1 and p2 denote their fuzzy hedges, respectively.
Fig. 3 Architecture of ANFC with FH
The ANFC used here is based on the type-3 Sugeno fuzzy model. The crisp outputs of the fuzzy rules are determined by a weighted-average operator [45]. In this classifier, nodes in the same layer have the same type of node function. The layers and their properties are given as follows:
Layer 1 The membership grade of each input to a specified fuzzy region is measured in layer 1, where generalized bell-shaped, Gaussian, triangular and trapezoidal functions can be used as the membership function (MF). The authors employ the Gaussian function because it has smooth partial derivatives with respect to its parameters and fewer parameters. The Gaussian MF is given as

$$\mu_{ij}(x_{sj}) = \exp\left(-0.5\,\frac{(x_{sj} - c_{ij})^2}{\sigma_{ij}^2}\right) \quad (1)$$

where $\mu_{ij}(x_{sj})$ represents the membership grade of the ith rule and the jth feature; $x_{sj}$ denotes the sth sample and the jth feature of the input matrix $X \in R^{N \times D}$; $c_{ij}$ and $\sigma_{ij}$ are the center and the width of the Gaussian function, respectively.
Layer 2 In this layer, the secondary meanings of the fuzzy sets are calculated with their FHs:

$$\alpha_{ijs} = \left(\mu_{ij}(x_{sj})\right)^{p_{ij}} \quad (2)$$

where $\alpha_{ijs}$ denotes the modified membership grade of $\mu_{ij}(x_{sj})$, and $p_{ij}$ denotes the FH value of the ith rule and the jth feature.
Layer 3 The degree of fulfillment of the fuzzy rule for the sample $x_s$ is determined in this layer; it is also called the firing strength of the rule. The firing strength $\beta_{is}$ of the ith rule is

$$\beta_{is} = \prod_{j=1}^{D} \alpha_{ijs} \quad (3)$$

where D represents the number of features.
Layer 4 In this layer, the weighted outputs are calculated, and every rule can affect each class according to its weight. If a rule controls a specific class region, the weight between this rule output and that class is bigger than the other class weights; otherwise, the class weights are fairly small:

$$O_{sk} = \sum_{i=1}^{U} \beta_{is} W_{ik} \quad (4)$$

where $W_{ik}$ represents the degree of belonging to the kth class controlled by the ith rule, $O_{sk}$ denotes the weighted output for the sth sample belonging to the kth class, and U is the number of rules.
Layer 5 Sometimes the summation of weights may be bigger than 1. Therefore, the outputs of the network should be normalized:

$$h_{sk} = \frac{O_{sk}}{\sum_{i=1}^{K} O_{si}} = \frac{O_{sk}}{d_s}, \qquad d_s = \sum_{i=1}^{K} O_{si} \quad (5)$$

where $h_{sk}$ represents the normalized degree of the sth sample belonging to the kth class, and K is the number of classes. The class label $c_s$ of the sth sample is then determined by the maximum $h_{sk}$ value.
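The five layers above can be sketched as a single vectorized forward pass over Eqs. (1)–(5). The shapes and names here are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def anfc_forward(X, C, Sigma, P, W):
    """One forward pass through the five ANFC-FH layers (Eqs. 1-5).
    X: (N, D) samples; C, Sigma, P: (U, D) rule centres, widths and hedges;
    W: (U, K) rule-to-class weights."""
    # Layer 1: Gaussian membership grades mu[s, i, j] (Eq. 1)
    diff = X[:, None, :] - C[None, :, :]
    mu = np.exp(-0.5 * diff ** 2 / (Sigma ** 2)[None, :, :])
    # Layer 2: apply the fuzzy hedges (Eq. 2)
    alpha = mu ** P[None, :, :]
    # Layer 3: firing strength of each rule (Eq. 3)
    beta = alpha.prod(axis=2)                    # shape (N, U)
    # Layer 4: weighted class outputs (Eq. 4)
    O = beta @ W                                 # shape (N, K)
    # Layer 5: normalize, then pick the class with the largest degree (Eq. 5)
    h = O / O.sum(axis=1, keepdims=True)
    return h, h.argmax(axis=1)
```

With two one-feature rules centred at 0 and 2, identity weights and unit hedges, a sample at 0 fires the first rule most strongly and is assigned class 0, while a sample at 2 is assigned class 1.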
6.1 Optimization of weight in ANFC layers
The antecedent parameters of the network, {c, σ, p}, can be adapted by any optimization method. In this study, the SCG method is used to adapt the network parameters [43]. The SCG is a second-order, supervised, derivative-based training method: it determines the second-order derivatives of the parameters from their first-order derivatives, which decreases the number of operations in each iteration. The SCG has a superlinear convergence rate, roughly two times faster than that of the backpropagation algorithm [43].
The last parameter, $W_{ik}$, can also be adapted with the SCG method. However, during training $W_{ik}$ can become bigger than 1; in such cases, the meanings of the weights may be lost among the same-class clusters. For that reason, either $W_{ik}$ should be constrained, or $W_{ik}$ is determined as the ratio of the number of kth-class samples in the ith fuzzy rule region to the total number of kth-class samples. In the latter case, $W_{ik}$ must be determined in every iteration of the optimization method. The weight parameter $W_{ik}$ is treated as a cluster weight, as in a Gaussian mixture density. According to this definition, the weight between the ith fuzzy rule and the kth class is
$$W_{ik} = \frac{S_i}{S_k} \quad (6)$$

where $S_i$ is the number of kth-class samples that belong to the ith fuzzy rule region, and $S_k$ is the number of all kth-class samples.
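Equation (6) can be computed directly from the rule and class assignments of the training samples; the following is a hypothetical helper, not the authors' code:

```python
import numpy as np

def rule_class_weights(rule_of_sample, class_of_sample, U, K):
    """W[i, k] = (number of class-k samples assigned to rule i) /
    (total number of class-k samples), per Eq. (6)."""
    W = np.zeros((U, K))
    for i, k in zip(rule_of_sample, class_of_sample):
        W[i, k] += 1                               # count samples per (rule, class)
    class_totals = np.maximum(W.sum(axis=0), 1)    # guard against empty classes
    return W / class_totals
```

Each column then sums to 1 over the rules covering that class, so the weights stay in [0, 1] as required.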
When the fuzzy classification rules are constructed as a network, these parameters can be adapted with neural networks. As a result, fuzzy classification systems and neural networks can be combined, retaining the superior properties of each. The combined system is called a neuro-fuzzy classifier, an adaptive network-based system with multiple inputs and multiple outputs [45].
6.2 Computing the cost function using the least-squares estimate
The cost function used in the SCG method is determined from the least mean squares of the difference between the target and the calculated class value [45]. According to this definition, the cost function E is

$$E = \frac{1}{N}\sum_{s=1}^{N} E_s, \qquad E_s = \frac{1}{2}\sum_{k=1}^{K}\left(t_{sk} - h_{sk}\right)^2 \quad (7)$$

where N represents the number of samples, and $t_{sk}$ and $h_{sk}$ are the target and calculated values of the sth sample belonging to the kth class, respectively. If the sth sample belongs to the kth class, the target value $t_{sk}$ is set to 1, and otherwise to 0. For example, if the sth sample belongs to the kth class, then $t_s = [O_{k-1}\;1\;O_{K-k}]$, where

$$O_{k-1} = [0\;0\;\cdots\;0]_{1\times(k-1)}. \quad (8)$$
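The cost of Eq. (7) and the target encoding of Eq. (8) can be sketched as follows (illustrative names, not the authors' code):

```python
import numpy as np

def one_hot(labels, K):
    """Target vectors of Eq. (8): 1 at the true class, 0 elsewhere."""
    t = np.zeros((len(labels), K))
    t[np.arange(len(labels)), labels] = 1.0
    return t

def anfc_cost(h, t):
    """Mean-squared cost of Eq. (7): E = (1/N) * sum_s 0.5 * sum_k (t_sk - h_sk)^2."""
    return 0.5 * ((t - h) ** 2).sum(axis=1).mean()
```

A perfectly classified sample contributes zero cost, while a maximally uncertain two-class output h = [0.5, 0.5] against t = [1, 0] contributes 0.5 * (0.25 + 0.25) = 0.25.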
The partial derivative of E with respect to $c_{ij}$ can be calculated using the chain rule:

$$\frac{\partial E}{\partial c_{ij}} = \sum_{s=1}^{N}\frac{\partial E}{\partial E_s}\sum_{k=1}^{K}\left(\frac{\partial E_s}{\partial h_{sk}}\cdot\frac{\partial h_{sk}}{\partial O_{sk}}\cdot\frac{\partial O_{sk}}{\partial \beta_{is}}\cdot\frac{\partial \beta_{is}}{\partial \alpha_{ijs}}\cdot\frac{\partial \alpha_{ijs}}{\partial \mu_{ijs}}\cdot\frac{\partial \mu_{ijs}}{\partial c_{ij}}\right). \quad (9)$$
The partial derivatives given in Eq. (9) can be clearly defined [45] as

$$\frac{\partial E}{\partial E_s} = \frac{1}{N}, \qquad \frac{\partial E_s}{\partial h_{sk}} = h_{sk} - t_{sk}, \qquad \frac{\partial h_{sk}}{\partial O_{sk}} = \frac{1 - h_{sk}}{d_s}, \qquad \frac{\partial O_{sk}}{\partial \beta_{is}} = W_{ik} \quad (10)$$

$$\frac{\partial \beta_{is}}{\partial \alpha_{ijs}} = \frac{\beta_{is}}{\alpha_{ijs}}, \qquad \frac{\partial \alpha_{ijs}}{\partial \mu_{ijs}} = \frac{p_{ij}}{\mu_{ijs}}\,\alpha_{ijs}, \qquad \frac{\partial \mu_{ijs}}{\partial c_{ij}} = \mu_{ijs}\,\frac{x_{sj} - c_{ij}}{\sigma_{ij}^2}.$$
Similarly, the partial derivatives of E with respect to $\sigma_{ij}$ and $p_{ij}$ can be defined. The ANFC-FH is trained with the SCG optimization method using the partial derivatives of E with respect to the parameters above.
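The analytic derivative terms of Eq. (10) can be checked numerically; here the last term, ∂μ/∂c = μ(x − c)/σ², is compared against a central finite difference (the sample values are chosen arbitrarily for illustration):

```python
import math

def mu_fn(x, c, sigma):
    """Gaussian MF of Eq. (1)."""
    return math.exp(-0.5 * ((x - c) / sigma) ** 2)

def dmu_dc(x, c, sigma):
    """Analytic derivative from Eq. (10): dmu/dc = mu * (x - c) / sigma**2."""
    return mu_fn(x, c, sigma) * (x - c) / sigma ** 2

# Central finite-difference check of the analytic chain-rule term
x, c, sigma, eps = 1.3, 0.4, 0.8, 1e-6
numeric = (mu_fn(x, c + eps, sigma) - mu_fn(x, c - eps, sigma)) / (2 * eps)
```

The same finite-difference pattern verifies the σ and p derivative terms before wiring them into the SCG update.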
7 Extending ANFC to feature selection
Due to the rapid advancement of computer and database technologies, it has become very important to extract true or desirable knowledge; this importance has given rise to scientific branches such as data mining, machine intelligence, knowledge discovery and statistics.
Dimension reduction and feature selection (FS) are common preprocessing steps in pattern recognition and classification applications [45]. In some problems, a large number of features can be used. If irrelevant features are used in combination with good features, the classifier will not perform as well as it would with only the good features. Therefore, the goal should be to choose a discriminative subset of features.
There are many potential benefits of dimensionality reduction and feature selection: facilitating data visualization and data understanding, reducing measurement and storage requirements, decreasing computational complexity, and reducing training and utilization times. Dimensionality reduction of a feature set is a preprocessing technique commonly used on multi-dimensional data.
7.1 Alternate methods for dimensionality reduction
There are two different approaches to feature reduction, namely feature extraction and feature selection [1, 45]. In the feature extraction approach, popular methods include principal component analysis (the Karhunen–Loève transform), independent component analysis, singular value decomposition, manifold learning, factor analysis, and Fisher linear discriminant analysis [36]. However, these have the disadvantage that measurements from all of the original features are used in the projection to the lower-dimensional space, so the meaning of the original features can be lost. In some applications, it is desirable to pick a subset of the original features rather than to find a mapping that uses all of them; in these cases, FS methods should be used instead of feature extraction methods.
In the FS approach, relevant features are selected from the original features without any projection. There are various well-known measures for obtaining the relevant features, such as heuristic stepwise analysis [38], statistical hypothesis testing [43], genetic algorithms, neural networks [38], support vector machines [43], and fuzzy systems [1, 45–49].
FS algorithms may also be categorized into two groups based on their evaluation procedure: filters and wrappers [14, 49]. If the FS algorithm runs without any learning algorithm, it is a filter approach. Filter-based approaches select features using an estimation criterion based on the statistics of the learning data and are independent of the induction classifier; essentially, irrelevant features are filtered out before induction. Filters tend to be applicable to most domains, as they are not tied to any particular induction algorithm. If the evaluation procedure is tied to the task of the learning algorithm, the FS algorithm employs the wrapper approach. Wrappers may produce better results, though they are expensive to run and can break down with very large numbers of features; this is due to the use of learning algorithms in the evaluation of subsets, some of which encounter problems when dealing with large data sets. Fuzzy systems such as entropy-based methods [43], fuzzy rough sets [38], optimal fuzzy-valued feature subset selection (OFSS) [38], and fuzzy-weight methods [43] have been used for FS in the scientific literature [50–55].
7.2 The effect of fuzzy hedges on feature selection
In this sub-section, the positive effect of FHs on the ANFC is presented. Feature selection is embedded in the fuzzy classification rules and adapted during the training of the system. Besides contributing meaning to the fuzzy classification, FH values can be used for feature selection [38]. Experimental results show that when the FH value of the fuzzy classification set of any feature is close to 1, that feature is relevant for the class; otherwise it may be irrelevant. The feature selection algorithm considerably decreases the number of features for classification problems, which helps simplify complex problems. Note that while a feature may be relevant for a particular class, it might be irrelevant for the other classes.
In that case, the fuzzy sets of this feature should be treated differently in the different fuzzy rules, and in this study that difference is provided by the fuzzy hedges. In some cases, the adaptive fuzzy hedges can also increase the classification accuracy rates. Experimental studies show that the algorithm successfully selects the relevant features and can also eliminate the irrelevant ones.
When the numerical values of FHs are replaced with words, the effects of features on classes are clearly represented, as in the classification of the handwritten Gujarati character data set. In this study, computing with words is applied to classification problems in progressive stages. In the future, computing with words is likely to emerge as a major field in its own right, and adaptive FHs make an important contribution to this concept [38].
7.3 Basic concept: Shannon's binary selection functions
Binary functions of two-valued variables were described by Shannon [38]. Two of them select the A1 or A2 input under every condition; these functions are given in Table 1. When the function F1 is investigated, it can be seen that it follows the variable A1, {F1 = f1(A1, A2) = A1}: F1 depends only on the variable A1, irrespective of the value of A2. A similar case holds for the function F2, which follows the variable A2, {F2 = f2(A1, A2) = A2}. These two functions are examples of feature selection and can be defined by the product and power operators:

F1 = A1^1 · A2^0 = A1 and F2 = A1^0 · A2^1 = A2.

If the power of a variable is zero, the powered value is always one; if the power is one, the variable is used with its original value. These conditions are given in Table 2.
These Boolean functions can also be defined in fuzzy algebra. Let A1 and A2 be fuzzy sets on the x1 and x2 features, and let y be the output. P1 and P2 represent the FH values of those fuzzy sets. In this case, a general fuzzy classification rule replacing the Boolean function can be defined as

IF x1 is A1 with P1 hedge AND x2 is A2 with P2 hedge THEN y is C1,

where C1 represents the class label of the output. According to this fuzzy rule, the functions F1 and F2 are redefined in fuzzy logic with a similar meaning.
The reduced rules contain only the selected features. For selection, the FH values play the active role: if the FH value of the fuzzy set of a feature for a class equals one, this feature is important for that class; otherwise, it is not.
This selection criterion is easily seen with binary values, but in real applications binary FH values cannot always be obtained: once tuned, the FHs take real values in a wide range. For that reason, there must be a crucial point at which to make the decision. In this study, this point is taken as 0.5. If the FH value of the fuzzy set of the jth feature is greater than or equal to the crucial point (Pj ≥ 0.5), the jth feature is important; if it is smaller (Pj < 0.5), the jth feature is not important.
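The resulting decision rule reduces to a simple threshold on the tuned hedge values; the following is a hypothetical helper sketching the criterion with the 0.5 crucial point from the text:

```python
def select_features(hedges, threshold=0.5):
    """Keep the indices of features whose tuned FH value reaches the crucial
    point; the others are treated as irrelevant for that class's rule."""
    return [j for j, p in enumerate(hedges) if p >= threshold]

# e.g. tuned hedges of one class rule over four features
kept = select_features([0.1, 0.9, 0.55, 0.2])   # keeps features 1 and 2
```

Applied per rule, this prunes each rule's antecedent down to its relevant features, which is how the reduced rules of Sect. 7.4 are obtained.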
Table 1 Shannon's binary selection functions

Inputs        Outputs
A1   A2       F1   F2
0    0        0    0
0    1        0    1
1    0        1    0
1    1        1    1
Table 2 The use of powers to describe the F1 and F2 functions

Inputs        F1 = A1^P1 · A2^P2     F2 = A1^P1 · A2^P2
A1   A2       P1   P2   F1           P1   P2   F2
0    0        1    0    0            0    1    0
0    1        1    0    0            0    1    1
1    0        1    0    1            0    1    0
1    1        1    0    1            0    1    1
In the fuzzy literature, FHs are generally employed with constant values that are also described by linguistic words. In this study, however, the FHs used for FS and classification are variables that can change within a determined range, so it is not possible to attach a word to every FH value. Nevertheless, the words ‘‘more recessive’’, ‘‘recessive’’, ‘‘neutral’’, ‘‘dominant’’, and ‘‘more dominant’’ can be used for the values and ranges Pj = 0, 0 < Pj < 0.5, Pj = 0.5, 0.5 < Pj < 1, and Pj = 1, respectively.
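The mapping from a hedge value to its linguistic label can be written down directly; this is a plain restatement of the ranges above, not an algorithm from the paper:

```python
def hedge_word(p):
    """Map a fuzzy hedge value in [0, 1] (or above, after unconstrained
    tuning) to the linguistic label used in the text."""
    if p == 0.0:
        return "more recessive"
    if p < 0.5:
        return "recessive"
    if p == 0.5:
        return "neutral"
    if p < 1.0:
        return "dominant"
    return "more dominant"

print([hedge_word(p) for p in (0.0, 0.2, 0.5, 0.8, 1.0)])
# → ['more recessive', 'recessive', 'neutral', 'dominant', 'more dominant']
```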
In classification problems, when a fuzzy rule with FHs is defined for every class, the FHs of a given feature take different values for different classes. This means that a feature relevant for one class may be irrelevant for the other classes; an FS algorithm can therefore be built on FHs.
R1: IF x1 is A1 with P1 = 1 hedge AND x2 is A2 with
P2 = 0 hedge THEN y is F1.
R2: IF x1 is A1 with P1 = 0 hedge AND x2 is A2 with
P2 = 1 hedge THEN y is F2.
These rules can be reduced to the following rules:
R1: IF x1 is A1 with P1 = 1 hedge THEN y is F1.
R2: IF x2 is A2 with P2 = 1 hedge THEN y is F2.
7.4 Development of classification rules
A few sample classification rules are stated as:
R1: IF CDPDF is A11 with P11 = 0.5 AND AUTOCORR
is A12 with P12 = 0.5 AND pattern descriptor is A13
with P13 = 0.5 AND GPXNP is A14 with P14 = 0.5
THEN class is ‘‘Ka’’.
R2: IF CDPDF is A21 with P21 = 0 AND AUTOCORR
is A22 with P22 = 0 AND pattern descriptor is A23
with P23 = 1 AND GPXNP is A24 with P24 = 1
THEN class is ‘‘Kha’’.
R3: IF CDPDF is A31 with P31 = 0 AND AUTOCORR
is A32 with P32 = 0 AND pattern descriptor is A33
with P33 = 1 AND GPXNP is A34 with P34 = 1
THEN class is ‘‘ga’’.
The FS rules could also be expressed with adjectives as:
R1: IF CDPDF is Neutral A11 AND AUTOCORR is
Neutral A12 AND pattern descriptor is Neutral A13
AND GPXNP is Neutral A14 THEN class is ‘‘Ka’’.
R2: IF CDPDF is more recessive A21 AND AUTOCORR
is more recessive A22 AND pattern descriptor is
more dominant A23 AND GPXNP is more dominant
A24 THEN class is ‘‘Kha’’.
R3: IF CDPDF is more recessive A31 AND AUTOCORR
is more recessive A32 AND pattern descriptor is
more dominant A33 AND GPXNP is more dominant
A34 THEN class is ‘‘ga’’.
After the FS and classification steps, these rules can be reduced to the following rules:
R1: IF pattern descriptor is A13 with P13 = 0.5 AND
GPXNP is A14 with P14 = 0.5 THEN class is ‘‘Ka’’.
R2: IF pattern descriptor is A23 with P23 = 1.1 AND
GPXNP is A24 with P24 = 1.1 THEN class is ‘‘Kha’’.
R3: IF pattern descriptor is A33 with P33 = 1.0 AND
GPXNP is A34 with P34 = 1.2 THEN class is ‘‘ga’’.
After the classification step, it can be seen that some of
the hedge values are bigger than 1, because the hedge
values are not constrained in the classification step. These
classification rules can also be expressed with adjectives as
shown in the following:
R1: IF pattern descriptor is minus A13 AND GPXNP is
minus A14 THEN class is ‘‘Ka’’.
R2: IF pattern descriptor is plus A23 AND GPXNP is plus
A24 THEN class is ‘‘Kha’’.
R3: IF pattern descriptor is plus A33 AND GPXNP is plus
A34 THEN class is ‘‘ga’’.
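The ‘‘plus’’/‘‘minus’’ reading follows from how powers act on membership degrees in [0, 1]: a hedge above 1 concentrates a fuzzy set (lowers memberships), while a hedge below 1 dilates it (raises memberships). A quick numerical check:

```python
mu = 0.8            # a membership degree in (0, 1)
plus = mu ** 1.2    # "plus" hedge (> 1): concentration, value drops
minus = mu ** 0.5   # "minus" hedge (< 1): dilation, value rises
print(plus < mu < minus)  # → True
```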
As a result, these fuzzy classification rules carry more meaning and a distinctive mark. In addition, one aim of fuzzy theory, the computing-with-words concept, is realized by using adaptive FHs in this study.
The results of classification with all four features using FHs are shown in Table 3. These results demonstrate the potential for eliminating features that do not contribute to recognition efficiency. The selected features increase the recognition rate on the test set, which means that some overlapping classes can be easily distinguished by the selected features. Based on the criterion stated above for feature selection, the system selects two features, namely GPXNP and the pattern descriptor. The results in Tables 3 and 4 show that feature selection improves recognition efficiency by about 10 %.
7.5 Feature selection and classification algorithm
7.5.1 Feature selection
Initialize:
1. One fuzzy classification rule for every class, using a Gaussian distribution.
2. Hedge_cf = 0.5, where the number of classes is c = 46 and the number of features is f = 4.
3. S = ∅, the set of selected features.
4. For 0 < Hedge_cf < 1, train the ANFC with FHs.
5. For i = 1 to c:
6. Add the jth feature to S, where the jth feature has the maximum Hedge_cf value.
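The per-class selection loop (steps 5 and 6) can be sketched as follows; the hedge matrix is assumed to come from the SCG-trained ANFC, and all names here are illustrative:

```python
def select_features(hedge_matrix):
    """hedge_matrix[c][f]: tuned hedge of feature f for class c.
    For every class, add the index of the feature with the maximum
    hedge value to the selected set S."""
    S = set()
    for class_hedges in hedge_matrix:
        best = max(range(len(class_hedges)), key=lambda f: class_hedges[f])
        S.add(best)
    return sorted(S)

# Two hypothetical classes over four features:
hedges = [
    [0.2, 0.1, 0.9, 0.8],  # class 1 -> feature 2 is strongest
    [0.1, 0.2, 0.7, 1.1],  # class 2 -> feature 3 is strongest
]
print(select_features(hedges))  # → [2, 3]
```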
7.5.2 Classification
1. Initialize Hedge_cf = 1 for c = 1, 2, ..., 46 and f = 1, 2, ..., 4.
2. Determine the center, width and weight matrix of the Gaussian MFs using K-means clustering. Train the ANFC with Hedge_cf on the S selected features and T_new, the new training set.
3. Obtain the training and testing classification results.
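A minimal sketch of the MF initialization in step 2, assuming one Gaussian rule per cluster: K-means centroids give the MF centers and the within-cluster standard deviations give the widths (a plain 1-D K-means, with illustrative names):

```python
import math
import random

def kmeans_mf_init(samples, k, iters=20, seed=0):
    """Initialize Gaussian MF (center, width) pairs for 1-D samples
    with a plain K-means pass."""
    rng = random.Random(seed)
    centers = rng.sample(samples, k)
    for _ in range(iters):
        # Assign each sample to its nearest center.
        clusters = [[] for _ in range(k)]
        for x in samples:
            j = min(range(k), key=lambda i: abs(x - centers[i]))
            clusters[j].append(x)
        # Recompute centers as cluster means (keep old center if empty).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    # Widths: within-cluster standard deviation, floored to stay positive.
    widths = []
    for i, c in enumerate(clusters):
        if len(c) > 1:
            var = sum((x - centers[i]) ** 2 for x in c) / len(c)
            widths.append(max(math.sqrt(var), 1e-6))
        else:
            widths.append(1e-6)
    return list(zip(centers, widths))

data = [0.1, 0.2, 0.15, 5.0, 5.1, 4.9]
print(sorted(kmeans_mf_init(data, 2)))
```

The two recovered centers sit near 0.15 and 5.0, i.e. one Gaussian MF per natural cluster of the feature values.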
Classification results for Gujarati character recognition are displayed in Tables 3 and 4. According to the feature selection algorithm there are two important features, GPXNP and the pattern descriptor, because their power values are the biggest and their FH values are the maximum for every class.
Table 3 Recognition rates for Gujarati characters with ANFC-FH without feature selection (average recognition efficiency η in %, one row per character)

GPXNP   Pattern descriptor   CDPDF   Autocorrelation
60.34 80.55 72.23 61.12
69.45 56.94 64.00 44.45
61.67 55.56 44.45 41.67
66.67 53.78 48.61 47.23
56.39 66.67 52.78 40.28
72.50 63.88 41.67 44.45
46.38 55.56 37.50 38.89
56.56 58.94 36.12 41.47
70.78 59.72 58.34 36.12
61.12 56.94 44.45 45.83
47.78 40.28 19.45 18.05
48.34 52.78 38.89 41.47
38.05 19.45 16.67 17.25
60.89 70.83 47.23 38.89
75.00 66.67 44.45 27.78
77.78 69.45 50.00 25.00
56.38 73.61 47.23 36.12
50.83 65.27 38.89 27.78
60.34 52.78 38.89 41.47
70.12 56.94 24.45 15.83
47.78 40.28 19.45 18.05
67.34 62.78 48.89 41.47
58.34 52.78 38.89 41.47
56.38 55.56 37.50 38.89
58.05 39.45 16.67 17.25
65.00 63.61 42.34 31.66
28.05 29.45 16.67 17.25
52.38 55.56 37.50 34.89
61.12 56.94 44.45 45.83
27.78 40.28 19.45 18.05
48.34 52.78 38.89 41.47
42.78 59.72 39.45 18.05
55.00 73.61 52.34 41.66
69.17 81.94 66.67 55.56
Total average η in %:  58.78   57.29   34.29   33.90
Table 4 Recognition rates for Gujarati characters with ANFC-FH using feature selection (average recognition efficiency η in %, one row per character)

GPXNP   Pattern descriptor   CDPDF   Autocorrelation
88.34 85.55 77.23 66.12
74.45 61.94 69.00 49.45
66.67 59.56 49.45 46.67
69.67 58.78 53.61 52.23
82.39 70.67 57.78 45.28
79.50 68.88 46.67 49.45
51.38 60.56 42.50 43.89
61.56 63.94 42.12 46.47
83.78 64.72 63.34 41.12
66.12 61.94 51.45 50.83
73.78 45.28 24.45 23.05
63.34 57.78 43.89 46.47
43.05 24.45 22.67 23.25
90.89 75.83 52.23 43.89
80.00 71.67 49.45 33.78
82.78 74.45 55.00 30.00
81.38 78.61 52.23 41.12
75.83 70.27 45.89 32.78
73.34 57.78 43.89 46.47
78.12 61.94 29.45 20.83
72.78 45.28 24.45 23.05
73.34 67.78 54.89 46.47
63.34 57.78 45.89 46.47
60.38 60.56 42.50 43.89
63.05 44.45 21.67 22.25
70.00 68.61 47.34 36.66
33.05 34.45 21.67 22.25
61.38 60.56 43.50 39.89
66.12 62.94 49.45 50.83
33.78 45.28 24.45 23.05
63.34 57.78 43.89 46.47
57.78 64.72 44.45 23.05
80.00 78.61 57.34 46.66
84.17 86.94 70.67 60.56
Total average η in %:  68.67   62.29   39.29   38.90
The CDPDF and autocorrelation features are found irrelevant based on their hedge values and are therefore rejected.
The K-means clustering method is employed to create the initial fuzzy classification rules from the input space. In the FS part of the experiments, ANFC-FH is trained on all data instances, without a testing set. For classification, the number of fuzzy rules is determined by the number of classes.
8 Results and discussion
Experiments show that ANFC-FH achieves a recognition rate of 58.78 %, as shown in Table 3. The feature selection algorithm selects the two most important features, GPXNP and the pattern descriptor, because their P values are the biggest and their FH values the maximum. The remaining features are considered irrelevant and therefore eliminated. Feature selection improves the overall recognition rate to 68.67 %, compared with ANFC-FH alone, as shown in Table 4.
The authors present a hybrid feature extraction framework that combines the strengths of both statistical and structural feature extractors [50]. The combined features describe both local and global properties of characters, providing a wide range of recognition clues.
The novel pattern descriptor uniquely describes character shapes with relatively large intra-class and inter-class variations. Next, this study indicates that the Gabor phase embodies good discriminating power if it is appropriately exploited. The authors also reveal experimentally [50] that the proposed GPXNP method, based on local Gabor patterns, works reasonably well under relatively complex testing scenarios. The third feature, CDPDF, refers to the probability distribution of handwritten characters, which captures the peculiarity of writers and trends in writing; this helps to stabilize recognition results across handwriting samples from different writers. The notion of self-matching is provided by the fourth feature, autocorrelation.
This work has been oriented towards the investigation of hybrid pattern recognition systems, especially applied to the character recognition task, which could further be deployed for people with visual impairments. This investigation provides the basis for fruitful future work; the points described next comment on lines of research that could be pursued.
The performance of the proposed classifiers is compared with an existing back-propagation neural network (NN) classifier, a conventional k–NN classifier, ANFC (without fuzzy hedges), and weighted k–NN with the novel mean χ² distance measure. This comparison is shown in Table 5.
It is seen from Table 5 that weighted k–NN [42] outperforms ANFC and ANFC-FH as far as recognition efficiency is concerned. The reason lies in the application of a different distance measure over the features. The range of the extracted feature values varies widely, and classifiers do not work properly without normalization; ANFC has to handle this broad range of feature values directly, so its results are not up to the mark. The solution to this problem is to normalize the range of all features so that each feature contributes significantly to classification.
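As a minimal sketch of the suggested remedy (not the paper's implementation), min-max scaling brings every feature into [0, 1] before distances are computed, so no feature dominates merely by its range:

```python
def min_max_normalize(columns):
    """Rescale each feature column to [0, 1]; constant columns
    are mapped to 0 to avoid division by zero."""
    normalized = []
    for col in columns:
        lo, hi = min(col), max(col)
        span = (hi - lo) or 1.0
        normalized.append([(x - lo) / span for x in col])
    return normalized

# Two hypothetical features with very different ranges:
features = [[1, 2, 3], [100, 200, 300]]
print(min_max_normalize(features))  # → [[0.0, 0.5, 1.0], [0.0, 0.5, 1.0]]
```

After scaling, both features contribute equally to any distance measure, which is the normalization step the comparison above calls for.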
The results further show that ANFC performs better than the back-propagation NN classifier. When the complexity of the classification problem and the number of samples in the data sets increase, the difference in training times between the ANFC and the NN grows in favor of the ANFC; that is, the ANFC trains faster than the NN. However, its recognition efficiency is not fully satisfying, because the majority of characters are similar or identical in shape, resulting in overlapped classes. This drawback of ANFC is eliminated to some extent by ANFC-FH.
The ANFC rules are improved with fuzzy hedges, and the meanings of these rules are also extended, a less-known concept in fuzzy systems. In this way, the effects of features on classification are demonstrated. Because of their variability, the FHs are not represented by words with a stable meaning, but it can easily be said that if the fuzzy hedge value of the fuzzy set of a feature for a class is close to 1, the feature is important for that class.
On the other hand, if the fuzzy hedge value of the fuzzy set of a feature for a class is near 0, the feature is not important for that class. The usage of fuzzy hedges in ANFC improves the classification accuracy.
9 Conclusion
This study presents a classifier with reasonable generalization capabilities. The authors use fuzzy hedges applied to the fuzzy sets of the rules and adapted by the SCG algorithm.
Table 5 Comparison of recognition efficiency of existing classifiers

Sr. no.  Classification method          Recognition η in %
1        k–NN                           16.09
2        Neural network                 24.38
3        ANFC                           55.67
4        ANFC (FH)                      58.78
5        ANFC using feature selection   68.67
6        Weighted k–NN                  86.33
The system creates a feature selection and rejection criterion using the power values of features: it emphasizes distinctive features and damps irrelevant ones. Feature selection with fuzzy hedges increases the overall recognition efficiency.
The fuzzy classification rules are improved with fuzzy hedges, and the meanings of these rules are also extended, a less-known concept in fuzzy systems. Tables 3 and 4 demonstrate the effect of feature selection on classification.
The fuzzy modifier has been used for the classification problem, and it is shown how it affects the fuzzy classification rules. Experimental results show that when the fuzzy hedge value of the fuzzy classification set of a feature is close to 1, the feature is relevant for that class; otherwise it may be irrelevant.
The feature selection algorithm considerably decreases the number of features in classification problems, which simplifies complex problems. The experimental studies show that the algorithm successfully selects the relevant features and eliminates the irrelevant ones, thereby improving classification results.
10 Scope for future research
Gujarati is a language with a rich cultural heritage. This, combined with the relatively high literacy rate of its speakers and the significant requirements of visually impaired people, makes the problem of OCR and handwriting recognition relevant and its solutions immediately useful.
The problem of Gujarati OCR and handwriting recognition is very challenging, and the authors attempt to understand the challenges and explore possible solutions to these problems. A large number of issues still remain to be solved, and active research in this area is required to bring this problem to useful levels, when products using the solutions would become available to the common man.
References
1. Jang JSR, Sun CT, Mizutani E (1997) Neuro-fuzzy and soft
computing. Prentice Hall, Upper Saddle River
2. Joshi A, Ramakrishman N, Houstis EN, Rice JR (1997) On
neurobiological, neuro-fuzzy, machine learning and statistical
pattern recognition techniques. IEEE Trans Neural Netw
8(1):18–31
3. Lin CT, Yeh CM, Liang SF, Chung JF, Kumar N (2006) Support
vector based fuzzy neural network for pattern classification. IEEE
Trans Fuzzy Syst 14(1):31–41
4. Mitra S, De RK, Pal SK (1997) Knowledge-based fuzzy MLP for
classification and rule generation. IEEE Trans Neural Netw
8(6):1338–1350
5. Nozaki K, Ishibuchi H, Tanaka H (1996) Adaptive fuzzy rule-
based classification systems. IEEE Trans Fuzzy Syst
4(3):238–250
6. Simpson PK (1992) Fuzzy min–max neural networks classifica-
tion. IEEE Trans Neural Netw 3(5):776–786
7. Zadeh LA (1972) A fuzzy set theoretic interpretation of linguistic
hedges. J Cybernet 2(3):4–34
8. Jang JSR (1993) ANFIS: adaptive network based fuzzy inference
systems. IEEE Trans Syst Man Cybernet 23(3):665–685
9. Juang CF, Lin CT (1998) An online self-constructing neural
fuzzy inference network and its applications. IEEE Trans Fuzzy
Syst 6(1):12–32
10. Kasabov NK (2001) Evolving fuzzy neural networks for super-
vised/unsupervised online knowledge-based learning. IEEE Trans
Syst Man Cybernet Part B 31(6):902–918
11. Kasabov NK, Song Q (2002) DENFIS: dynamic evolving neural-
fuzzy inference system and its application for time-series pre-
diction. IEEE Trans Fuzzy Syst 10(2):144–154
12. Nauck D, Kruse R (1999) Neuro-fuzzy systems for function
approximation. Fuzzy Sets Syst 101(2):261–271
13. Chatterjee A, Siarry P (2007) A PSO-aided neuro-fuzzy classifier
employing linguistic hedge concepts. Expert Syst Appl
33(4):1097–1109
14. Marin-Blazquez JG, Shen Q (2002) From approximative to
descriptive fuzzy classifiers. IEEE Trans Fuzzy Syst 10(4):484–497
15. Nauck D (2003) Fuzzy data analysis with NEFCLASS. Int J
Approx Reason 32(2–3):103–130
16. Zadeh LA (1996) Fuzzy logic = computing with words. IEEE
Trans Fuzzy Syst 4(2):103–111
17. Zadeh LA (1999) From computing with numbers to computing
with words—from manipulation of measurements to manipula-
tion of perceptions. IEEE Trans Circ Syst I: Fundam Theory Appl
45(1):105–119
18. De Cock M, Kerre EE (2004) Fuzzy modifiers based on fuzzy
relations. Inf Sci 160:173–199
19. Huynh VN, Ho TB, Nakamori Y (2002) A parametric represen-
tation of linguistic hedges in Zadeh’s fuzzy logic. Int J Approx
Reason 30:203–223
20. Rubin SH (1999) Computing with words. IEEE Trans Syst Man
Cybernet Part B 29(4):518–524
21. Turksen IB (2004) A foundation for CWW: meta-linguistic axioms. In: IEEE fuzzy information processing, NAFIPS’04, pp 395–400
22. Casillas J, Cordon O, Del Jesus MJ, Herrera F (2005) Genetic
tuning of fuzzy rule deep structures preserving interpretability
and its interaction with fuzzy rule set reduction. IEEE Trans
Fuzzy Syst 13(1):13–29
23. Ho NC, Wechler W (1992) Extended hedge algebras and their
application to fuzzy logic. Fuzzy Sets Syst 52(3):259–281
24. Novak V (1996) A horizon shifting model of linguistic hedges for
approximate reasoning. In: Proceedings of the fifth IEEE inter-
national conference on fuzzy systems, pp 423–427
25. Huang CY, Chen CY, Liu BD (1999) Current-mode fuzzy lin-
guistic hedge circuits. Analog Integr Circ Sig Process 19:255–278
26. Zadeh LA (1975) The concept of a linguistic variable and its
application to approximate reasoning, parts 1, 2 and 3. Inf Sci
8–9:199–249 (pp 301–357, 43–80)
27. Banks W (1994) Mixing crisp and fuzzy logic in applications. In:
WESCON’94 idea microelectronics Conference record, Ana-
heim, CA, pp 94–97
28. Bouchon-Meunier B (1992) Linguistic hedges and fuzzy logic.
In: Proceedings of the first IEEE international conference on
fuzzy systems, San Diego, CA, pp 247–254
29. Liu BD, Chen CY, Tsao JY (2001) Design of adaptive fuzzy logic
controller based on linguistic-hedge concepts and genetic algo-
rithms. IEEE Trans Syst Man Cybernet Part B 31(1):32–53
30. Chakraborty D, Pal NR (2004) A neuro-fuzzy scheme for
simultaneous feature selection and fuzzy rule-based classifica-
tion. IEEE Trans Neural Netw 15(1):110–123
31. Rutkowski L, Cpalka K (2003) Flexible neuro-fuzzy systems.
IEEE Trans Neural Netw 14(3):554–574
32. Shilton A, Lai DTH (2007) Iterative fuzzy support vector
machine classification. IEEE international fuzzy systems con-
ference, pp 1–6
33. Dholakia J, Negi A, Rama Mohan S (2005) Zone identification in
the printed Gujarati text. ICDAR, pp 272–276
34. Maloo M, Kale KV (2011) Support vector machine based
Gujarati numeral recognition. Int J Computer Sci Eng (IJCSE)
3(7):2595–2600
35. Maloo M, Kale KV (2011) Gujarati script recognition: a review.
Int J Computer Sci Eng (IJCSE) 4(1):1694–1814
36. Shah SK, Sharma A (2006) Design and implementation of optical
character recognition system to recognize Gujarati script using
template matching. IE(I) J ET 86:44–49
37. Antani S, Lalitha A (1999) Gujarati character recognition. IC-
DAR, pp 418–421
38. Shannon CE (1938) A symbolic analysis of relay and switching
circuits. Trans Am Inst Electr Eng 57:713–723
39. Prasad J, Kulkarni U (2009) Offline handwritten character rec-
ognition of Gujrati script using pattern matching. In: Proceedings
of IEEE ASID 2009, pp 611–615
40. Prasad J, Kulkarni U (2011) Statistical feature extraction and
recognition of isolated handwritten Gujrati characters. In: CiiT
Int J Digital Image Process 3(19). ISSN 0975-9691
(pii:DIP122011008)
41. Prasad J, Kulkarni U (2014) Gujarati character recognition using
adaptive neuro fuzzy classifier with fuzzy hedges. In: The pro-
ceedings of ICESC-2014, at RKNEC Nagpur, January 2014
(press)
42. Prasad J, Kulkarni U (2013) Gujarati character recognition using weighted k–NN with mean χ² distance measure. Int J Mach Learn Cybernet. ISSN 1868-8071. doi:10.1007/s13042-013-0187-z
43. Tsang ECC, Yeung DS, Wang XZ (2003) OFFSS: optimal fuzzy-
valued feature subset selection. IEEE Trans Fuzzy Syst
11(2):202–213
44. Desai A (2010) Gujarati handwritten numeral optical character
reorganization through neural network. In: Pattern recognition,
vol 43, issue 7. Elsevier Science Inc., New York, pp 2582–2589
45. Cetisli B (2010) Development of an adaptive neuro-fuzzy clas-
sifier using linguistic hedges: part 1. J Expert Syst Appl
37:6093–6101
46. Cord A, Ambroise C, Cocquerez JP (2006) Feature selection in
robust clustering based on Laplace mixture. Pattern Recogn Lett
27:627–635
47. Kwak N, Choi CCH (2002) Input feature selection for classifi-
cation problems. IEEE Trans Neural Netw 13(1):143–159
48. Sindhwani V, Rakshit S (2004) Feature selection in MLPs and
SVMs based on maximum output information. IEEE Trans
Neural Netw 15(4):937–948
49. Jensen R, Shen Q (2007) Fuzzy-rough sets assisted attribute
selection. IEEE Trans Fuzzy Syst 15(1):73–89
50. Lee HM, Chen CM, Chen JM, Jou YL (2001) An efficient fuzzy
classifier with feature selection based on fuzzy entropy. IEEE
Trans Syst Man Cybernet Part B 31(3):426–432
51. Møller M (1993) A scaled conjugate gradient algorithm for fast
supervised learning. Neural Netw 6(4):525–533. doi:10.1016/
S0893-6080(05)80056-5
52. Cetisli B (2010) The effect of linguistic hedges on feature
selection: part 2. Expert Syst Appl 37:6102–6108
53. Liu H et al (2005) Evolving feature selection. IEEE Intell Syst
20:64–76
54. Sankar KP, Rajat KD, Basak J (2000) Unsupervised feature
evaluation: a neuro-fuzzy approach. IEEE Trans Neural Netw
11(2):366–376
55. Uncu O, Turksen IB (2007) A novel feature selection approach: combining feature wrappers and filters. Inf Sci 177:449–466