ORIGINAL ARTICLE

Gujarati character recognition using adaptive neuro fuzzy classifier with fuzzy hedges

Jayashree Rajesh Prasad · Uday Kulkarni

Received: 2 August 2013 / Accepted: 17 April 2014

© Springer-Verlag Berlin Heidelberg 2014

Abstract  Recognition of Indian scripts is a challenging problem, and work towards the development of an OCR for handwritten Gujarati, an Indian script, is still in its infancy. This paper implements an Adaptive Neuro Fuzzy Classifier (ANFC) for Gujarati character recognition using fuzzy hedges (FHs). The FHs are trained along with the other network parameters by the scaled conjugate gradient (SCG) training algorithm. The tuned fuzzy hedge values improve the flexibility of the fuzzy sets; this property of FHs improves the distinguishability of overlapped classes. This work is further extended to feature selection based on FHs. The values of the fuzzy hedges can be used to show the degree of importance of the fuzzy sets. According to the FH value, redundant and noisy features can be eliminated, and significant features can be selected. An FH-based feature selection algorithm is implemented using the ANFC. This paper demonstrates the recognition performance of ANFC-FH and the improved results of the same with feature selection.

Keywords  Concentration · Dilution · Feature selection (FS) · Fuzzy hedges (FHs) · Fuzzy surface transformers

1 Introduction

India is a land of many languages, and Gujarati is an Indic script similar in appearance to other Indo-Aryan scripts. The Gujarati script has a rich literary heritage. However, research in the field of Gujarati script recognition faces major problems, mainly due to a large set of visually similar characters, multi-component characters, and touching and broken characters.

This paper presents a pattern recognition system for Gujarati character recognition. The authors use a combination of four features. A novel Gabor phase XNOR pattern (GPXNP) and a pattern descriptor are proposed for the isolated handwritten character set of Gujarati. In addition to these two features, the authors use Contour Direction Probability Distribution Function (CDPDF) and autocorrelation features. Furthermore, the authors present the design and development of an ANFC for recognition of isolated handwritten characters of Gujarati.

The authors exploit the method of employing adaptive networks to solve a fuzzy classification problem. System parameters, such as the membership functions (MFs) defined for each feature and the parameterized t-norms used to combine conjunctive conditions, are calibrated with backpropagation.

This paper is organized as follows: Sect. 2 surveys related work. The motivation behind this research is presented in Sect. 3. Section 4 justifies the significance of the present work on Gujarati script and focuses on challenges and opportunities in the research of Gujarati handwriting recognition. Section 5 presents the core system architecture of the intended Gujarati character recognition system, which encompasses brief information about the data set, preprocessing, normalization and feature extraction methods. Section 6 describes the ANFC layers. Section 7 elaborates the feature selection and classification mechanism of the said ANFC, followed by results in Sect. 8. The authors conclude the paper in Sect. 9 and comment on the scope for future research in Sect. 10.

J. R. Prasad (✉)
Department of Computer Engineering, Vishwakarma Institute of Information Technology, Pune, India
e-mail: [email protected]

U. Kulkarni
Department of Computer Engineering, SGGS Institute of Engineering and Technology, Nanded, India
e-mail: [email protected]

Int. J. Mach. Learn. & Cyber.
DOI 10.1007/s13042-014-0259-8

2 Related work

A detailed discussion on applications of fuzzy rule-based systems for classification is available [1–7]. These systems employ linguistic rules, which are either provided by experts or extracted from a given training data set by a variety of methods such as clustering [1]. Fuzzy systems can be combined with neural networks, and such systems are called neuro-fuzzy systems [8–12]. Neuro-fuzzy systems define the class distributions and show the input–output relations [8, 13–15], whereas fuzzy systems employ natural language for developing fuzzy rules.

Neural networks are employed for tuning or training the system parameters in neuro-fuzzy applications. In the early 1970s, a class of powering modifiers was introduced [7], which defined the concept of linguistic variables and hedges. The concept of computing with words was introduced as an extension of fuzzy set and logic theory [16, 17]. FHs change the meaning of primary term values.

Many researchers have contributed to computing with words and to the FH concept with their theoretical studies [18–21]. However, these contributions have rarely been used to solve real-world problems and applications. The FH concept is employed to propose a particle swarm optimization aided neuro-fuzzy classifier [13].

An FH-based fuzzification layer is added to the network, and the FHs are used to modify the piecewise membership sub-functions [13]. Furthermore, a fuzzy-genetic inference system based on the Mamdani model, applying FHs and power parameters to membership functions, is proposed [22]; in that study, FHs and power parameters are defined for membership functions separately. The concept of extended hedge algebras and their application in approximate reasoning are discussed [23]. Modifying the existing FH models, a horizon-shifting model of FHs is proposed [24], by which the membership function (MF) can be shifted and its steepness modified. A current-mode circuit is designed using FHs for adaptive fuzzy logic controllers [25], while a study on approximate reasoning using FHs is presented [26]. Hedge operations are used to better qualify and emphasize the crisp variables to mix crisp and fuzzy logic in applications [27]. Several interesting properties of FHs are investigated [28], such as being compatible with simple symbolic rules, avoiding computations, being compatible with fuzzy logic, enhancing the comparison of various available fuzzy implications, and managing gradual rules in the context of deductive rules. Finally, a design of an adaptive fuzzy logic controller based on FH concepts is proposed [29].

3 Motivation

The motivation behind this research can be stated through the following objectives:

1. To improve classification efficiency for large-vocabulary pattern recognition problems like handwritten character recognition of Gujarati.
2. To handle overlapped classes with reliability.
3. To improve the meaning of classical fuzzy rules.

In the literature, many fuzzy classifiers have been proposed to solve classification problems [3–6, 8, 11, 13–15, 30–32]. However, some problems remain unsolved. One of them is to discriminate overlapped classes with high reliability. Overlapping classes can significantly decrease classifier performance. Generally, to overcome this problem, the input space is projected into a new space [1]. In these cases, the meaning of the original features can be lost. Nozaki et al. [5] used rule weights to solve this problem. Instead of these solutions, some features of the input space that cause overlapping can be weakened in classification. Investigation of fuzzy classification rules shows that some fuzzy sets degrade the classification success due to overlapping. Furthermore, changing the width of the MFs in fuzzy-based classification is impossible due to other proximate MFs. In order to meet the first objective, the authors use the FH concept.

To meet the second objective, a mechanism to improve the meaning of the classical fuzzy rules is suggested. Therefore, this study implements an ANFC using FHs. To accomplish this implementation, a layer is added to the neuro-fuzzy classifier to indicate the effect of FHs.

The empirical studies show that FHs can improve or maintain the classification success of the neuro-fuzzy classifier. In particular, the implemented neuro-fuzzy classifier simplifies the distinguishability of overlapping classes. Discriminative features should be used for classification instead of all the features, as some of them can cause overlapping among the classes. Thus, the feature selection mechanism presented in this paper helps to meet the third objective.

4 Significance of the present work for Gujarati

Gujarati is also the name of the script used to write the Gujarati language, spoken by about 50 million people in western India [33]. Gujarati has 12 vowels and 34 consonants, as shown in Fig. 1. Gujarati belongs to the genre of languages that use variants of the Devanagari script [34, 35]. No significant work is found in the literature that addresses recognition of the Gujarati language [36, 37]. Some of the Gujarati characters are very similar in appearance. With sufficient noise, these characters can easily be misclassified. Often, these characters are misclassified even by humans, who then need to use context knowledge to correct the error. The authors identify some issues which characterize the Gujarati script. Although the issues are discussed with reference to Gujarati, they are largely applicable to most Indian languages.

The authors further focus on earlier work on the Gujarati script. Development of OCR for Gujarati was initiated in 2003 at M.S. University Baroda under Indian language technology solutions for Gujarati. Gujarati is a multilevel script, written in three zones: the base character zone, the upper modifier zone and the lower modifier zone. A sophisticated method for accurate zone detection in images of printed Gujarati is presented in [33]. Another approach to recognizing printed Gujarati characters [36] describes the design and implementation of a template matching prototype system to recognize a subset of the printed Gujarati script.

After these initial efforts for the Gujarati script, classification of a subset of printed or digitized Gujarati characters [38] was proposed that utilizes the Euclidean minimum distance and k-NN classifiers.

More recently, an OCR system for handwritten Gujarati numerals was proposed [33]. A multi-layered feed-forward neural network is suggested for classification of digits, whose features are abstracted by four different profiles of the digits. There is another attempt at handwritten numeral recognition of Gujarati [34] that proposes an SVM-based recognition scheme.

According to the review [35], the recognition rate depends on the level of constraints on handwriting, such as the type of handwriting, the number of writers, the size of the vocabulary and the spatial layout. The literature survey thus shows that there is no documentary evidence of research on Gujarati OCR for a complete character set. This work is probably among the initial attempts towards recognition of the full character set of isolated Gujarati characters, even as compared to previous work by the authors [39] that presents recognition of only 18 characters of the Gujarati script.

The authors have exploited various feature extraction methods and recognition techniques in order to develop a character recognition system for isolated Gujarati characters. Their initial attempt was the development of a template matching algorithm that worked on a subset of Gujarati characters and yielded a recognition efficiency of 71.66 % [39]. Next, the authors developed a statistical pattern recognition algorithm [40] based on statistical features such as the Euler number, standard deviation and Euclidean distance. These statistical parameters were compared with the images of stored characters for feature matching. This technique yielded a recognition efficiency of 56.14 %. The work presented in [39, 40] represents experimentation on only 18 of the 46 Gujarati characters. Similarly, an implementation of an ANFC that yields around 62 % recognition efficiency is presented in [41]; the difference between that previous work and the present work is the use of FHs. Furthermore, the authors proposed a weighted k-NN algorithm with a novel distance measure, mean χ² [42], which yielded the highest recognition efficiency, i.e. 86.33 %, among all the techniques proposed by the authors. All the previous techniques except [42] work on a limited set of characters and deploy different features and classification techniques. The present work focuses mainly on the use of FHs to eliminate the ill effects of overlapped classes.

The facts that make this domain challenging are listed as follows:

1. Large character sets with patterns different from English.
2. Structure of Indian language scripts as characterized by curves, holes, and strokes [43].
3. Recognition difficulty due to translation, rotation and scaling.
4. Unavailability of correct data sets or absence of methods to generate appropriate data sets.
5. Selection of appropriate features for classification of characters.

These facts emphasize the necessity and scope for development of efficient techniques towards Gujarati OCR.

5 System architecture

This section presents the system architecture for handwritten Gujarati OCR, as shown in Fig. 2. The training phase comprises preprocessing and feature extraction. The proposed system extracts four different features. It is well known that statistical and structural approaches to OCR have specific advantages and disadvantages. The authors have developed hybrid feature extraction techniques in an effort to leverage the advantages of both these approaches.

Fig. 1 Gujarati script with consonants and vowels

5.1 Data set description

The availability of a data set that captures the variations encountered in the real world is a critical part of any experimental research. Due to significant advances in OCR research in recent years, several data sets are available for English.

To the best of our knowledge, no handwritten Gujarati data sets exist [36]. Therefore, 360 samples from different writers were collected for each character of the Gujarati alphabet, i.e. 34 consonants and 12 vowels. This data set thus consists of 16,560 samples altogether. The characters are scanned at 300 dots per inch resolution. Experiments are executed on unconstrained handwritten characters. These characters have skew in them and may also have noise pixels. Some characters are also observed to be broken at locations which have fine links.

The data set developed by the authors is published as a 'national repository' to be used by future OCR researchers. This data set is available for free download at the portal of the Indian Language Technology Proliferation and Deployment Center, which is an undertaking of the Department of Electronics and Information Technology of the Government of India.

5.2 Preprocessing

Preprocessing serves the purpose of extracting regions of interest and enhancing and cleaning up the images, so that they can be directly and efficiently processed by the feature extraction stage. Digital scanners suffer from a number of limitations, e.g. geometrical distortions. Due to the absence of standard image acquisition procedures for OCR data sets, efficient preprocessing is required. Initially, the scanned images undergo a normalization operation.

The authors use moment-based normalization to obtain a normalized image from a geometric transformation procedure that is invariant to any affine distortion of the image. This enhances the recognition rate even when the character samples from different writers exhibit affine geometric variations [37, 44].
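To illustrate the idea, the following is a minimal sketch of moment-based normalization: the character is centered on its centroid (first-order moments) and rescaled by its per-axis spread (second-order moments). The exact procedure in the paper follows [37, 44] and may differ; the nearest-pixel remapping and the 3-sigma framing here are our own simplifying assumptions.

```python
import numpy as np

def moment_normalize(img, out_size=64):
    """Center a binary character image on its centroid and rescale it
    so its second-order moments match a fixed target spread.
    A minimal sketch, not the authors' exact procedure [37, 44]."""
    ys, xs = np.nonzero(img)                      # foreground pixel coordinates
    cy, cx = ys.mean(), xs.mean()                 # centroid (first-order moments)
    sy = np.sqrt(((ys - cy) ** 2).mean()) + 1e-9  # spread (second-order moments)
    sx = np.sqrt(((xs - cx) ** 2).mean()) + 1e-9
    target = out_size / 6.0                       # fit roughly 3 sigma in the frame
    out = np.zeros((out_size, out_size), dtype=img.dtype)
    for y, x in zip(ys, xs):                      # remap each foreground pixel
        ny = int(round((y - cy) / sy * target + out_size / 2))
        nx = int(round((x - cx) / sx * target + out_size / 2))
        if 0 <= ny < out_size and 0 <= nx < out_size:
            out[ny, nx] = 1
    return out
```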

5.3 Feature extraction

The authors present a novel combination of four feature extraction methods. It is well known that statistical and structural approaches to OCR have specific advantages and disadvantages. The authors propose hybrid feature extraction techniques in an effort to leverage the advantages of both approaches. Hybrid approaches overcome the problems associated with statistical and structural methods when utilized independently.

The proposed system extracts the following four features:

1. Gabor Phase XNOR Pattern (GPXNP).
2. Pattern descriptor.
3. Contour Direction Probability Distribution Function (CDPDF).
4. Autocorrelation.

Among these four features, the first two are newly proposed for the isolated handwritten character set of Gujarati. The details of the proposed feature extraction algorithms are available in [42]. GPXNP, CDPDF and autocorrelation represent statistical properties, whereas the pattern descriptor represents structural properties of an image.

Fig. 2 ANFC-FH with feature extraction methods, training and recognition of isolated Gujarati characters

6 ANFC with FHs

This paper explores various kinds of fuzzy set shape transformers and generators, known as hedges, to improve the discrimination power of the classifier. Hedges play the same role in a fuzzy modeling system as adverbs and adjectives do in English: they modify the nature of a fuzzy set. Hedges are important components of the fuzzy system, allowing us to model the semantics of the underlying knowledge closely [45].

A hedge modifies the shape of a fuzzy set's surface, causing a change in the membership truth function. Thus, a hedge transforms one fuzzy set into another, new fuzzy set. In a fuzzy reasoning system there are several different classes of hedge operations, each represented by a linguistic construct. To investigate the effects of FHs on the ANFC, the layers of the ANFC are detailed below.
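As a small illustration (our own snippet, not from the paper), the classical Zadeh powering hedges raise a membership grade to a power: a power greater than 1 concentrates a fuzzy set ("very"), a power less than 1 dilates it ("somewhat"), and a power of 1 leaves it unchanged:

```python
import numpy as np

# Membership grades of a sample fuzzy set.
mu = np.array([0.1, 0.4, 0.7, 0.9])

very = mu ** 2.0        # concentration: "very" sharpens the set
somewhat = mu ** 0.5    # dilation: "somewhat" broadens the set

print(very)      # [0.01 0.16 0.49 0.81] -- grades pushed down
print(somewhat)  # [0.316 0.632 0.837 0.949] -- grades pulled up
```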

Figure 3 shows the architecture of the ANFC. Fuzzy classification is the task of partitioning a feature space into fuzzy classes. It is possible to describe the feature space with fuzzy regions, and to control each region with fuzzy rules [36, 45].

The ANFC under discussion with FHs is based on fuzzy rules. A fuzzy classification rule that has inputs $\{x_1, x_2, \ldots, x_m\}$ and an output defined with FHs takes the form

IF $x_1$ is $A_1$ with $p_1$ hedge AND $x_2$ is $A_2$ with $p_2$ hedge THEN $y$ is $c_1$ class,

where $A_1$ and $A_2$ denote linguistic terms defined on the $x_1$ and $x_2$ feature spaces, and $p_1$ and $p_2$ denote their fuzzy hedges, respectively.

Fig. 3 Architecture of ANFC with FH

The ANFC used in this paper is based on the type-3 Sugeno fuzzy model. The crisp outputs of the fuzzy rules are determined by a weighted average operator [45]. In this classifier, the nodes in the same layer have the same type of node function. The layers and their properties are given as follows:

Layer 1 The membership grade of each input to a specified fuzzy region is measured in layer 1, where generalized bell-shaped, Gaussian, triangular, or trapezoidal functions can be used as the membership function (MF). The authors employ the Gaussian function because it has smooth partial derivatives with respect to its parameters and fewer parameters. The Gaussian MF is given as

given as

lij Xsj

� �¼ exp �0:5

Xsj � Cij

� �2

r2ij

!

ð1Þ

where lij Xsj

� �represents the membership grade of the

ith rule and the jth feature; Xsj denotes the sth sample

and the jth feature of input matrix X X 2 RN�Df g; Cij and

rij are the center and the width of Gaussian function,

respectively.

Layer 2 In this layer, the secondary meanings of the fuzzy sets are calculated with their FHs:

$$\alpha_{ijs} = \left(\mu_{ij}(x_{sj})\right)^{p_{ij}} \qquad (2)$$

where $\alpha_{ijs}$ denotes the modified membership grade of $\mu_{ij}(x_{sj})$, and $p_{ij}$ denotes the FH value of the $i$th rule and the $j$th feature.

Layer 3 The degree of fulfillment of the fuzzy rule for sample $x_s$ is determined in this layer; it is also called the firing strength of the rule. The firing strength $\beta_{is}$ of the $i$th rule is

$$\beta_{is} = \prod_{j=1}^{D} \alpha_{ijs} \qquad (3)$$

where $D$ represents the number of features.

Layer 4 In this layer, weighted outputs are calculated, and every rule can affect each class according to its weights. If a rule controls a specific class region, the weight between this rule's output and that class will be bigger than the other class weights; otherwise, the class weights are fairly small:

$$o_{sk} = \sum_{i=1}^{U} \beta_{is} w_{ik} \qquad (4)$$

where $w_{ik}$ represents the degree of belonging to the $k$th class that is controlled by the $i$th rule, $o_{sk}$ denotes the weighted output for the $s$th sample belonging to the $k$th class, and $U$ is the number of rules.

Layer 5 The summation of the weights may sometimes be bigger than 1; therefore, the outputs of the network should be normalized:

$$h_{sk} = \frac{o_{sk}}{\sum_{i=1}^{K} o_{si}} = \frac{o_{sk}}{d_s}, \qquad d_s = \sum_{i=1}^{K} o_{si} \qquad (5)$$

where $h_{sk}$ represents the normalized degree of the $s$th sample belonging to the $k$th class, and $K$ is the number of classes. The class label of the $s$th sample is then determined by the maximum $h_{sk}$ value, $c_s = \arg\max_k h_{sk}$, where $c_s$ is the class label.
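Putting the five layers together, the following is a minimal sketch of the ANFC-FH forward pass for a batch of samples, written directly from Eqs. (1)-(5); it is our own rendering, not the authors' code, and `centers`, `widths`, `hedges` and `weights` stand for $c_{ij}$, $\sigma_{ij}$, $p_{ij}$ and $w_{ik}$:

```python
import numpy as np

def anfc_fh_forward(X, centers, widths, hedges, weights):
    """Forward pass of the ANFC with fuzzy hedges (Eqs. 1-5).
    X: (N, D) samples; centers/widths/hedges: (U, D) per rule and
    feature; weights: (U, K) rule-to-class weights.
    A sketch based on the equations in the text, not the authors' code."""
    # Layer 1: Gaussian membership grades, Eq. (1) -> (N, U, D)
    diff = X[:, None, :] - centers[None, :, :]
    mu = np.exp(-0.5 * diff**2 / widths[None, :, :]**2)
    # Layer 2: apply fuzzy hedges, Eq. (2)
    alpha = mu ** hedges[None, :, :]
    # Layer 3: firing strength of each rule, Eq. (3) -> (N, U)
    beta = alpha.prod(axis=2)
    # Layer 4: weighted class outputs, Eq. (4) -> (N, K)
    o = beta @ weights
    # Layer 5: normalization, Eq. (5)
    h = o / o.sum(axis=1, keepdims=True)
    return h.argmax(axis=1), h   # predicted class labels and degrees
```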

6.1 Optimization of weights in the ANFC layers

The antecedent parameters of the network $\{c, \sigma, p\}$ can be adapted by any optimization method. In this study, the SCG method is used to adapt the network parameters [51]. The SCG is a second-order, supervised, derivative-based training method. It determines the second-order derivatives of the parameters from their first-order derivatives, which decreases the number of operations in each iteration. The SCG has a superlinear convergence rate, which is two times faster than that of the back-propagation algorithm [51].

The last parameter, $w_{ik}$, can also be adapted with the SCG method. However, during training $w_{ik}$ can become bigger than 1, in which case the meanings of the weights could be lost among the same-class clusters. For that reason, either $w_{ik}$ should be constrained, or $w_{ik}$ is determined from the ratio of the number of $k$th class samples in the $i$th fuzzy rule region to the total number of $k$th class samples. In the latter case, $w_{ik}$ must be determined in every iteration of the optimization method. The weight parameter $w_{ik}$ is treated as a cluster weight, as in a Gaussian mixture density. According to the definition above, the weight between the $i$th fuzzy rule and the $k$th class is

$$w_{ik} = \frac{S_i}{S_k} \qquad (6)$$

where $S_i$ is the number of $k$th class samples that belong to the $i$th fuzzy rule region, and $S_k$ is the number of all $k$th class samples.
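A small sketch (ours) of Eq. (6): given the rule region each sample falls in and its class label, the rule-to-class weights are the per-rule class counts divided by the total class counts:

```python
import numpy as np

def rule_class_weights(rule_of_sample, class_of_sample, U, K):
    """Eq. (6): w[i, k] = (# class-k samples in rule region i) /
    (# class-k samples overall). A sketch, not the authors' code."""
    counts = np.zeros((U, K))
    for i, k in zip(rule_of_sample, class_of_sample):
        counts[i, k] += 1
    totals = counts.sum(axis=0)            # S_k for each class
    return counts / np.maximum(totals, 1)  # guard against empty classes
```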

When the fuzzy classification rules are constructed as a network, these parameters can be adapted with neural networks. As a result, fuzzy classification systems and neural networks can be combined with their superior properties. The combined system is called a neuro-fuzzy classifier, which is an adaptive network-based system with multiple inputs and multiple outputs [45].

6.2 Computing the cost function using a least squares estimate

The cost function used in the SCG method is determined from the least mean squares of the difference between the target and the calculated class values [45]. According to this definition, the cost function $E$ is

$$E = \frac{1}{N} \sum_{s=1}^{N} E_s, \qquad E_s = \frac{1}{2} \sum_{k=1}^{K} (t_{sk} - h_{sk})^2 \qquad (7)$$

where $N$ represents the number of samples, and $t_{sk}$ and $h_{sk}$ are the target and calculated values of the $s$th sample belonging to the $k$th class, respectively. If the $s$th sample belongs to the $k$th class, the target value $t_{sk}$ is set to 1, and otherwise to 0. For example, if the $s$th sample belongs to the $k$th class, then $t_s = [\,O_{k-1} \;\; 1 \;\; O_{K-k}\,]$, where

$$O_{k-1} = [\,0 \; 0 \; \cdots \; 0\,]_{1 \times (k-1)} \qquad (8)$$

The partial derivative of $E$ with respect to $c_{ij}$ can be calculated using the chain rule:

$$\frac{\partial E}{\partial c_{ij}} = \sum_{s=1}^{N} \frac{\partial E}{\partial E_s} \sum_{k=1}^{K} \left( \frac{\partial E_s}{\partial h_{sk}} \cdot \frac{\partial h_{sk}}{\partial o_{sk}} \cdot \frac{\partial o_{sk}}{\partial \beta_{is}} \cdot \frac{\partial \beta_{is}}{\partial \alpha_{ijs}} \cdot \frac{\partial \alpha_{ijs}}{\partial \mu_{ijs}} \cdot \frac{\partial \mu_{ijs}}{\partial c_{ij}} \right) \qquad (9)$$

The partial derivatives that appear in Eq. (9) are defined [45] as

$$\frac{\partial E}{\partial E_s} = \frac{1}{N}, \quad \frac{\partial E_s}{\partial h_{sk}} = h_{sk} - t_{sk}, \quad \frac{\partial h_{sk}}{\partial o_{sk}} = \frac{1 - h_{sk}}{d_s}, \quad \frac{\partial o_{sk}}{\partial \beta_{is}} = w_{ik}, \qquad (10)$$

$$\frac{\partial \beta_{is}}{\partial \alpha_{ijs}} = \frac{\beta_{is}}{\alpha_{ijs}}, \quad \frac{\partial \alpha_{ijs}}{\partial \mu_{ijs}} = \frac{p_{ij}}{\mu_{ijs}}\,\alpha_{ijs}, \quad \frac{\partial \mu_{ijs}}{\partial c_{ij}} = \mu_{ijs}\,\frac{x_{sj} - c_{ij}}{\sigma_{ij}^2}$$

The partial derivatives of $E$ with respect to $\sigma_{ij}$ and $p_{ij}$ can be defined similarly. The ANFC-FH is trained with the SCG optimization method using the partial derivatives of $E$ with respect to the parameters above.
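For illustration, the chain rule of Eqs. (9)-(10) for the centers can be vectorized as below. This is our own sketch, assuming one-hot targets `T` of shape (N, K) and that the intermediate arrays `mu`, `alpha`, `beta`, `o`, `h` are kept from the forward pass shown earlier; it uses the same diagonal form of $\partial h_{sk}/\partial o_{sk}$ as the text:

```python
import numpy as np

def grad_centers(X, T, mu, alpha, beta, h, o, centers, widths, hedges, weights):
    """Gradient of the cost E (Eq. 7) w.r.t. the Gaussian centers c_ij,
    following the chain rule of Eqs. (9)-(10). A vectorized sketch.
    Shapes: X (N,D), T (N,K), mu/alpha (N,U,D), beta (N,U), h/o (N,K)."""
    N = X.shape[0]
    d = o.sum(axis=1, keepdims=True)                      # d_s
    dE_dh = (h - T) / N                                   # dE/dEs * dEs/dh_sk
    dh_do = (1.0 - h) / d                                 # dh_sk/do_sk
    dE_dbeta = (dE_dh * dh_do) @ weights.T                # sum over k -> (N, U)
    eps = 1e-12                                           # numerical guard
    dbeta_dalpha = beta[:, :, None] / (alpha + eps)       # (N, U, D)
    dalpha_dmu = hedges[None, :, :] / (mu + eps) * alpha  # (N, U, D)
    dmu_dc = mu * (X[:, None, :] - centers[None, :, :]) / widths[None, :, :]**2
    return (dE_dbeta[:, :, None] * dbeta_dalpha * dalpha_dmu * dmu_dc).sum(axis=0)
```

The derivatives with respect to the widths and hedges follow the same pattern, swapping in the corresponding last factor.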

7 Extending the ANFC to feature selection

Due to the rapid advancement of computer and database technologies, it has become very important to obtain true or desirable knowledge. The importance of knowledge is the reason why new scientific branches such as data mining, machine intelligence, knowledge discovery and statistics have appeared.

Dimension reduction and feature selection (FS) are common preprocessing steps used in pattern recognition and classification applications [45]. In some problems, a large number of features can be used. If irrelevant features are used in combination with good features, the classifier will not perform as well as it would with only the good features. Therefore, the goal should be to choose a discriminative subset of features.

There are many potential benefits of dimensionality reduction and feature selection: facilitating data visualization and data understanding, reducing the measurement and storage requirements, decreasing computational complexity, and reducing training and utilization times. Dimensionality reduction of a feature set is a preprocessing technique commonly used on multi-dimensional data.

7.1 Alternate methods for dimensionality reduction

There are two different approaches to achieving feature reduction, namely feature extraction and feature selection [1, 45]. In the feature extraction approach, the popular methods are principal component analysis (the Karhunen-Loève transform), independent component analysis, singular value decomposition, manifold learning, factor analysis, and Fisher linear discriminant analysis [36]. However, these have the disadvantage that measurements from all of the original features are used in the projection to the lower-dimensional space, due to which the meaning of the original features can be lost. In some applications, it is desirable to pick a subset of the original features rather than find a mapping that uses all of them. In these cases, FS methods should be used instead of feature extraction methods.

In the FS approach, relevant features are selected from the original features without any projection. There are various well-known measurements to obtain the relevant features, such as heuristic stepwise analysis [38], statistical hypothesis testing [43], genetic algorithms, neural networks [38], support vector machines [43], and fuzzy systems [1, 45–49].

FS algorithms may also be categorized into two groups based on their evaluation procedure: filters and wrappers [14, 49]. If the FS algorithm is run without any learning algorithm, it is a filter approach. Filter-based approaches select the features using an estimation criterion based on the statistics of the learning data, and are independent of the induction classifier; essentially, irrelevant features are filtered out before induction. Filters tend to be applicable to most domains, as they are not tied to any particular induction algorithm. If the evaluation procedure is tied to the task of the learning algorithm, the FS algorithm employs the wrapper approach. Wrappers may produce better results, though they are expensive to run and can break down with very large numbers of features. This is due to the use of learning algorithms in the evaluation of subsets, some of which can encounter problems when dealing with large data sets. Fuzzy systems such as entropy-based methods [50], fuzzy rough sets [49], optimal fuzzy-valued feature subset selection (OFFSS) [43], and fuzzy weights have been used for FS in the scientific literature [50–55].

7.2 The effect of fuzzy hedges on feature selection

This subsection presents the positive effect of FHs on the ANFC. Feature selection is used in the fuzzy classification rules and adapted during the training of the system. Besides contributing meaning to the fuzzy classification, FH values can be used for feature selection [52]. Experimental results show that when the FH value of the fuzzy classification set for any feature is close to 1, the feature is relevant for that class; otherwise, it may be irrelevant. The feature selection algorithm considerably decreases the number of features for classification problems. This characteristic of the method serves to simplify complex problems. It is concluded that while a feature may be relevant for a particular class, it might be irrelevant to the other classes.

In such cases, the fuzzy sets of this feature should be treated differently in the fuzzy rules, and in this study the difference is provided by the fuzzy hedges. In some cases, the adaptive fuzzy hedges can increase the classification accuracy rates. The experimental studies show that the algorithm is successful in selecting the relevant features, and that it can also eliminate the irrelevant features.

When the numerical values of FHs are replaced with words, the effects of features on classes are clearly represented, as in the classification of the handwritten Gujarati character data set. In this study, computing with words is applied to classification problems in progressive stages. In the future, computing with words is likely to emerge as a major field in its own right, and adaptive FHs make an important contribution to this concept [52].

7.3 Basic concept: Shannon's binary selection functions

Binary functions using two-valued variables were described by Shannon [38]. Two of them select the A1 or A2 input under every condition. These functions are given in Table 1. When the function F1 is investigated, it can be seen that it follows the variable A1: $F_1 = f_1(A_1, A_2) = A_1$. This means that the function F1 depends only on the variable A1, irrespective of the value of A2. A similar case holds for the function F2, which follows the variable A2: $F_2 = f_2(A_1, A_2) = A_2$. These two functions are examples of feature selection and can be defined by the product and power operators:

$$F_1 = A_1^1 \cdot A_2^0 = A_1 \quad \text{and} \quad F_2 = A_1^0 \cdot A_2^1 = A_2$$

If the power value of a variable is zero, then that factor is always one. If the power value of a variable is one, then the variable is used with its original value. These conditions are given in Table 2.

These Boolean functions can also be defined in fuzzy algebra. Let A1 and A2 be fuzzy sets on the x1 and x2 features, and let y be the output. P1 and P2 represent the FH values of those fuzzy sets. In this case, a general fuzzy classification rule corresponding to the Boolean function can be defined as

IF x1 is A1 with P1 hedge AND x2 is A2 with P2 hedge THEN y is C1,

where C1 represents the class label of the output. According to this fuzzy rule, the functions F1 and F2 are redefined in fuzzy logic with a similar meaning.

The reduced rules contain only the selected features. For selection, the FH values play the active role. It can easily be said that if the FH value of the fuzzy set of a feature for a class equals one, this feature is important for that class; otherwise, it is not.

This selection criterion is easy to apply with binary values. But in real applications, binary values of FHs cannot always be obtained: once the FHs have been tuned, they are real numbers in a wide range. For that reason, there must be a decision point, which in this study is taken as 0.5. If the FH value of the fuzzy set of the jth feature is greater than or equal to the decision point (Pj ≥ 0.5), then the jth feature is important. If the FH value is smaller than the decision point (Pj < 0.5), then the jth feature is not important.

Table 1 Shannon's binary selection functions

Inputs        Outputs
A1    A2      F1    F2
0     0       0     0
0     1       0     1
1     0       1     0
1     1       1     1

Table 2 The use of powers to describe the F1 and F2 functions

Inputs        Outputs
A1    A2      P1    P2    F1      P1    P2    F2
0     0       1     0     0       0     1     0
0     1       1     0     0       0     1     1
1     0       1     0     1       0     1     0
1     1       1     0     1       0     1     1
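A quick check (our own snippet) that the product-and-power form reproduces Tables 1 and 2: with powers (1, 0) the function selects A1, and with powers (0, 1) it selects A2:

```python
def select(a1, a2, p1, p2):
    # F = A1^P1 * A2^P2; a power of 0 turns that factor into 1.
    return (a1 ** p1) * (a2 ** p2)

for a1 in (0, 1):
    for a2 in (0, 1):
        f1 = select(a1, a2, 1, 0)   # follows A1 (Table 1, column F1)
        f2 = select(a1, a2, 0, 1)   # follows A2 (Table 1, column F2)
        print(a1, a2, "->", f1, f2)
```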


In the fuzzy literature, FHs are generally employed with constant values that are also described with linguistic words. In this study, however, the FHs for FS and classification are used as variables and can change within a determined range. For that reason, it is not possible to use a word for every FH value. But the words "more recessive", "recessive", "neutral", "dominant", and "more dominant" can be used for the specific ranges Pj = 0, 0 < Pj < 0.5, Pj = 0.5, 0.5 < Pj < 1, and Pj = 1, respectively.
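As a small helper (ours, not the paper's), the mapping from a tuned hedge value to these linguistic labels can be written directly; values above 1 (which the unconstrained classification step can produce) are treated here as "more dominant":

```python
def hedge_word(p):
    """Map a tuned hedge value to the linguistic label used in the text.
    Treating p > 1 as 'more dominant' is our own assumption."""
    if p <= 0:
        return "more recessive"
    if p < 0.5:
        return "recessive"
    if p == 0.5:
        return "neutral"
    if p < 1:
        return "dominant"
    return "more dominant"

print(hedge_word(0.0), "|", hedge_word(0.3), "|", hedge_word(1.2))
```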

In classification problems, when a fuzzy rule with FHs is defined for every class, the FHs of a given feature give different results for different classes. This means that a feature may be relevant for one class but irrelevant for the other classes. Therefore, the FS algorithm uses FHs.

R1: IF x1 is A1 with P1 = 1 hedge AND x2 is A2 with P2 = 0 hedge THEN y is F1.
R2: IF x1 is A1 with P1 = 0 hedge AND x2 is A2 with P2 = 1 hedge THEN y is F2.

These rules can be reduced to the following rules:

R1: IF x1 is A1 with P1 = 1 hedge THEN y is F1.
R2: IF x2 is A2 with P2 = 1 hedge THEN y is F2.

7.4 Development of classification rules

A few sample classification rules are stated as:

R1: IF CDPDF is A11 with P11 = 0.5 AND AUTOCORR is A12 with P12 = 0.5 AND pattern descriptor is A13 with P13 = 0.5 AND GPXNP is A14 with P14 = 0.5 THEN class is "Ka".
R2: IF CDPDF is A21 with P21 = 0 AND AUTOCORR is A22 with P22 = 0 AND pattern descriptor is A23 with P23 = 1 AND GPXNP is A24 with P24 = 1 THEN class is "Kha".
R3: IF CDPDF is A31 with P31 = 0 AND AUTOCORR is A32 with P32 = 0 AND pattern descriptor is A33 with P33 = 1 AND GPXNP is A34 with P34 = 1 THEN class is "ga".

The FS rules could also be expressed with adjectives as:

R1: IF CDPDF is neutral A11 AND AUTOCORR is neutral A12 AND pattern descriptor is neutral A13 AND GPXNP is neutral A14 THEN class is "Ka".
R2: IF CDPDF is more recessive A21 AND AUTOCORR is more recessive A22 AND pattern descriptor is more dominant A23 AND GPXNP is more dominant A24 THEN class is "Kha".
R3: IF CDPDF is more recessive A31 AND AUTOCORR is more recessive A32 AND pattern descriptor is more dominant A33 AND GPXNP is more dominant A34 THEN class is "ga".

After the FS and the classification steps, these rules can be reduced to the following:

R1: IF pattern descriptor is A13 with P13 = 0.5 AND GPXNP is A14 with P14 = 0.5 THEN class is "Ka".
R2: IF pattern descriptor is A23 with P23 = 1.1 AND GPXNP is A24 with P24 = 1.1 THEN class is "Kha".
R3: IF pattern descriptor is A33 with P33 = 1.0 AND GPXNP is A34 with P34 = 1.2 THEN class is "ga".

After the classification step, it can be seen that some of the hedge values are bigger than 1, because the hedge values are not constrained in the classification step. These classification rules can also be expressed with adjectives as shown in the following:

R1: IF pattern descriptor is minus A13 AND GPXNP is minus A14 THEN class is "Ka".
R2: IF pattern descriptor is plus A23 AND GPXNP is plus A24 THEN class is "Kha".
R3: IF pattern descriptor is plus A33 AND GPXNP is plus A34 THEN class is "ga".

As a result, these fuzzy classification rules carry more meaning and have a distinctive mark. In addition, one of the aims of fuzzy theory, the computing-with-words concept, is realized by using adaptive FHs in this study.

The results of classification with all four features using FHs are shown in Table 3. These results demonstrate the potential for elimination of features that do not contribute to the recognition efficiency. The selected features increase the recognition rate on the test set, which means that some overlapping classes can be easily distinguished by the selected features. Based on the criterion stated above for feature selection, the system selects two features, namely GPXNP and the pattern descriptor. The results shown in Tables 3 and 4 show that feature selection improves recognition efficiency by about 10 %.

7.5 Feature selection and classification algorithm

7.5.1 Feature selection

Initialize:

1. One fuzzy classification rule for every class, using the Gaussian distribution.
2. Hedge_cf = 0.5, where the number of classes is c = 46 and the number of features is f = 4.
3. S = ∅, the set of selected features.

Then:

4. For 0 < Hedge_cf < 1, train the ANFC with FHs.
5. For i = 1 to c:
6. Add the jth feature to S, where the jth feature has the maximum Hedge_cf value.
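A sketch of this FH-based selection loop (our own rendering): it assumes a hypothetical `train_anfc_fh(X, y, init_hedges)` that trains the ANFC-FH (e.g. the forward and gradient routines above wrapped in an SCG or gradient-descent loop) and returns the tuned hedge matrix of shape (classes, features):

```python
import numpy as np

def select_features(X, y, n_classes=46, n_features=4, threshold=0.5):
    """FH-based feature selection (Sect. 7.5.1), a sketch.
    train_anfc_fh is a hypothetical trainer returning tuned hedges,
    shape (n_classes, n_features)."""
    hedges = np.full((n_classes, n_features), 0.5)  # neutral start
    hedges = train_anfc_fh(X, y, hedges)            # tune with SCG
    selected = set()
    for c in range(n_classes):                      # per class, keep the
        selected.add(int(hedges[c].argmax()))       # most dominant feature
    # features whose hedges never reach the decision point are dropped
    return sorted(j for j in selected if hedges[:, j].max() >= threshold)
```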


7.5.2 Classification

1. Initialize Hedge_cf = 1, for c = 1, 2, …, 46 and f = 1, 2, …, 4.
2. Determine the center, width and weight matrices of the Gaussian MFs using k-means clustering. Train the ANFC with Hedge_cf on the selected features S and T_new, the new training set.
3. Obtain the training and testing classification results.
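A sketch of the initialization in step 2 (ours), using scikit-learn's KMeans as one plausible choice: each class's samples are clustered to place rule centers, per-cluster standard deviations serve as widths, and the weights then follow from Eq. (6):

```python
import numpy as np
from sklearn.cluster import KMeans

def init_rules(X, y, n_classes, rules_per_class=1):
    """Initialize Gaussian rule centers and widths with k-means,
    one (or more) rules per class. A sketch, not the authors' code."""
    centers, widths = [], []
    for c in range(n_classes):
        Xc = X[y == c]
        km = KMeans(n_clusters=rules_per_class, n_init=10).fit(Xc)
        for i in range(rules_per_class):
            members = Xc[km.labels_ == i]
            centers.append(km.cluster_centers_[i])
            widths.append(members.std(axis=0) + 1e-3)  # avoid zero width
    return np.array(centers), np.array(widths)
```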

Classification results for Gujarati character recognition are displayed in Tables 3 and 4. According to the feature selection algorithm, there are two important features, GPXNP and the pattern descriptor, because their power values are the biggest and their FH values are the maximum for every class.

Table 3 Recognition rates for Gujarati characters with ANFC-FH without feature selection
(Average recognition efficiency η in %; one row per character, glyphs not reproduced in this extraction)

GPXNP   Pattern descriptor   CDPDF   Autocorrelation
60.34   80.55                72.23   61.12
69.45   56.94                64.00   44.45
61.67   55.56                44.45   41.67
66.67   53.78                48.61   47.23
56.39   66.67                52.78   40.28
72.50   63.88                41.67   44.45
46.38   55.56                37.50   38.89
56.56   58.94                36.12   41.47
70.78   59.72                58.34   36.12
61.12   56.94                44.45   45.83
47.78   40.28                19.45   18.05
48.34   52.78                38.89   41.47
38.05   19.45                16.67   17.25
60.89   70.83                47.23   38.89
75.00   66.67                44.45   27.78
77.78   69.45                50.00   25.00
56.38   73.61                47.23   36.12
50.83   65.27                38.89   27.78
60.34   52.78                38.89   41.47
70.12   56.94                24.45   15.83
47.78   40.28                19.45   18.05
67.34   62.78                48.89   41.47
58.34   52.78                38.89   41.47
56.38   55.56                37.50   38.89
58.05   39.45                16.67   17.25
65.00   63.61                42.34   31.66
28.05   29.45                16.67   17.25
52.38   55.56                37.50   34.89
61.12   56.94                44.45   45.83
27.78   40.28                19.45   18.05
48.34   52.78                38.89   41.47
42.78   59.72                39.45   18.05
55.00   73.61                52.34   41.66
69.17   81.94                66.67   55.56

Total average η in %:
58.78   57.29                34.29   33.90

Table 4 Recognition rates for Gujarati characters with ANFC-FH using feature selection
(Average recognition efficiency η in %; one row per character, glyphs not reproduced in this extraction)

GPXNP   Pattern descriptor   CDPDF   Autocorrelation
88.34   85.55                77.23   66.12
74.45   61.94                69.00   49.45
66.67   59.56                49.45   46.67
69.67   58.78                53.61   52.23
82.39   70.67                57.78   45.28
79.50   68.88                46.67   49.45
51.38   60.56                42.50   43.89
61.56   63.94                42.12   46.47
83.78   64.72                63.34   41.12
66.12   61.94                51.45   50.83
73.78   45.28                24.45   23.05
63.34   57.78                43.89   46.47
43.05   24.45                22.67   23.25
90.89   75.83                52.23   43.89
80.00   71.67                49.45   33.78
82.78   74.45                55.00   30.00
81.38   78.61                52.23   41.12
75.83   70.27                45.89   32.78
73.34   57.78                43.89   46.47
78.12   61.94                29.45   20.83
72.78   45.28                24.45   23.05
73.34   67.78                54.89   46.47
63.34   57.78                45.89   46.47
60.38   60.56                42.50   43.89
63.05   44.45                21.67   22.25
70.00   68.61                47.34   36.66
33.05   34.45                21.67   22.25
61.38   60.56                43.50   39.89
66.12   62.94                49.45   50.83
33.78   45.28                24.45   23.05
63.34   57.78                43.89   46.47
57.78   64.72                44.45   23.05
80.00   78.61                57.34   46.66
84.17   86.94                70.67   60.56

Total average η in %:
68.67   62.29                39.29   38.90


The CDPDF and autocorrelation features are found irrelevant based on their hedge values and are therefore rejected.

The k-means clustering method is employed for creating the initial fuzzy classification rules from the input space. In the FS parts of the experiments, ANFC-FH is trained using all data instances, without a testing set. For classification, the number of fuzzy rules is determined according to the number of classes.

8 Results and discussion

Experiments show that ANFC-FH achieves a recognition rate of 58.78 %, as shown in Table 3. The feature selection algorithm selects two important features, GPXNP and the pattern descriptor, because their power values are the biggest, and so are their FH values. The remaining features are considered irrelevant and therefore eliminated. Feature selection improves the overall recognition rate to 68.67 %, compared with plain ANFC-FH, as shown in Table 4.

The authors present a hybrid feature extraction framework, which combines the strengths of both statistical and structural feature extractors [50]. The combination of features describes both local and global properties of characters, thus providing a wide range of recognition clues. The novel pattern descriptor uniquely describes character shapes with relatively large intra-class and inter-class variations. Next, this study indicates that the Gabor phase embodies good discriminating power if it is appropriately exploited. The authors also reveal experimentally [50] that the proposed GPXNP method, based on local Gabor patterns, works reasonably well under relatively complex testing scenarios. The third feature, CDPDF, refers to the probability distribution of handwritten characters, which captures the peculiarity of writers and trends in writing. This helps to provide stable recognition results for a variety of handwriting samples from different writers. The notion of self-matching is provided by the fourth feature, autocorrelation.

This work has been oriented towards the investigation of hybrid pattern recognition systems, especially applied to the character recognition task, which could further be deployed for people with visual impairments. This investigation has provided the basis for further fruitful work; the lines of research that could be pursued are commented on in Sect. 10.

The performance of the proposed classifiers is compared with an existing back-propagation neural network (NN) classifier, a conventional k-NN classifier, ANFC (without fuzzy hedges), and weighted k-NN with the novel mean χ² distance measure. This comparison is shown in Table 5.

Table 5 Comparison of recognition efficiency of existing classifiers

Sr. no.   Classification method            Recognition η in %
1         k-NN                             16.09
2         Neural network                   24.38
3         ANFC                             55.67
4         ANFC (FH)                        58.78
5         ANFC using feature selection     68.67
6         Weighted k-NN                    86.33

It is seen from Table 5 that the weighted k-NN [42] outperforms ANFC and ANFC-FH as far as recognition efficiency is concerned. The reason lies in the application of different distance measures for all features. The range of extracted feature values varies widely; therefore, classifiers will not work properly without normalization. The present classifier handles a broad range of feature values, and therefore its results are not up to the mark. The solution to this problem is to normalize the range of all features so that each feature contributes significantly to classification.

The results further show that the ANFC results are better than those of a back-propagation NN classifier, which shows that the proposed ANFC is more successful than the NN. When the complexity of the classification problem and the number of samples in the data sets are increased, the difference in training times between the ANFC and the NN increases in favor of the ANFC. This implies that the ANFC is faster than the NN algorithm in terms of training time. However, the recognition efficiency is not that satisfying. The reason is that the majority of characters are similar or identical in shape, resulting in overlapped classes. This drawback of the ANFC is eliminated to some extent by ANFC-FH.

The ANFC rules are improved with fuzzy hedges, and the meanings of these rules are also extended, which is a less-known concept in fuzzy systems. In this way, the effects of features on classification are demonstrated. The FHs are not represented by fixed words, due to their variability. But it can easily be said that if the fuzzy hedge value of the fuzzy set of any feature for any class is close to 1, this feature is important for that class.

On the other hand, if the fuzzy hedge value of any fuzzy set of any feature for any class is near 0, this feature is not important for that class. The usage of fuzzy hedges in the ANFC improves the classification accuracy.

9 Conclusion

This study presents a classifier that has the potential for reasonable generalization capabilities. The authors use fuzzy hedges that are applied to the fuzzy sets of the rules and are adapted by the SCG algorithm.

The system creates a feature selection and rejection criterion by using the power values of the features. It emphasizes distinctive features and damps irrelevant features through the power values. Feature selection employed with fuzzy hedges increases the overall recognition efficiency.

Fuzzy classification rules are improved with fuzzy hedges, and the meanings of these rules are also extended, which is a less-known concept in fuzzy systems. Tables 3 and 4 demonstrate the effect of feature selection on classification.

The fuzzy modifier has been applied to the classification problem, and it is shown how it affects the fuzzy classification rules. Experimental results show that when the fuzzy hedge value of the fuzzy classification set for any feature is close to 1, the feature is relevant for that class; otherwise, it may be irrelevant.

The feature selection algorithm considerably decreases the number of features for classification problems. This characteristic of the method serves to simplify complex problems. The experimental studies show that the algorithm is successful in selecting the relevant features, and that it can also eliminate the irrelevant features, thereby improving classification results.

10 Scope for future research

Gujarati is a language with a rich cultural heritage. This, combined with the relatively high literacy rate of its speakers and the significant requirements of visually impaired people, makes the problems of OCR and handwriting recognition relevant, and their solutions immediately useful.

The problem of Gujarati OCR and handwriting recognition is very challenging, and the authors have attempted to understand the challenges and explore possible solutions to these problems. A large number of issues still remain to be solved, and active research in this area is required to take this problem to useful levels, when products using the solutions would become available to the common man.

References

1. Jang JSR, Sun CT, Mizutani E (1997) Neuro-fuzzy and soft computing. Prentice Hall, Upper Saddle River
2. Joshi A, Ramakrishman N, Houstis EN, Rice JR (1997) On neurobiological, neuro-fuzzy, machine learning and statistical pattern recognition techniques. IEEE Trans Neural Netw 8(1):18–31
3. Lin CT, Yeh CM, Liang SF, Chung JF, Kumar N (2006) Support vector based fuzzy neural network for pattern classification. IEEE Trans Fuzzy Syst 14(1):31–41
4. Mitra S, De RK, Pal SK (1997) Knowledge-based fuzzy MLP for classification and rule generation. IEEE Trans Neural Netw 8(6):1338–1350
5. Nozaki K, Ishibuchi H, Tanaka H (1996) Adaptive fuzzy rule-based classification systems. IEEE Trans Fuzzy Syst 4(3):238–250
6. Simpson PK (1992) Fuzzy min–max neural networks classification. IEEE Trans Neural Netw 3(5):776–786
7. Zadeh LA (1972) A fuzzy set theoretic interpretation of linguistic hedges. J Cybernet 2(3):4–34
8. Jang JSR (1993) ANFIS: adaptive network based fuzzy inference systems. IEEE Trans Syst Man Cybernet 23(3):665–685
9. Juang CF, Lin CT (1998) An online self-constructing neural fuzzy inference network and its applications. IEEE Trans Fuzzy Syst 6(1):12–32
10. Kasabov NK (2001) Evolving fuzzy neural networks for supervised/unsupervised online knowledge-based learning. IEEE Trans Syst Man Cybernet Part B 31(6):902–918
11. Kasabov NK, Song Q (2002) DENFIS: dynamic evolving neural-fuzzy inference system and its application for time-series prediction. IEEE Trans Fuzzy Syst 10(2):144–154
12. Nauck D, Kruse R (1999) Neuro-fuzzy systems for function approximation. Fuzzy Sets Syst 101(2):261–271
13. Chatterjee A, Siarry P (2007) A PSO-aided neuro-fuzzy classifier employing linguistic hedge concepts. Expert Syst Appl 33(4):1097–1109
14. Marin-Blazquez JG, Shen Q (2002) From approximative to descriptive fuzzy classifiers. IEEE Trans Fuzzy Syst 10(4):484–497
15. Nauck D (2003) Fuzzy data analysis with NEFCLASS. Int J Approx Reason 32(2–3):103–130
16. Zadeh LA (1996) Fuzzy logic = computing with words. IEEE Trans Fuzzy Syst 4(2):103–111
17. Zadeh LA (1999) From computing with numbers to computing with words—from manipulation of measurements to manipulation of perceptions. IEEE Trans Circ Syst I: Fundam Theory Appl 45(1):105–119
18. De Cock M, Kerre EE (2004) Fuzzy modifiers based on fuzzy relations. Inf Sci 160:173–199
19. Huynh VN, Ho TB, Nakamori Y (2002) A parametric representation of linguistic hedges in Zadeh's fuzzy logic. Int J Approx Reason 30:203–223
20. Rubin SH (1999) Computing with words. IEEE Trans Syst Man Cybernet Part B 29(4):518–524
21. Turksen IB (2004) A foundation for CWW: meta-linguistic axioms. In: IEEE fuzzy information processing, NAFIPS'04, pp 395–400
22. Casillas J, Cordon O, Del Jesus MJ, Herrera F (2005) Genetic tuning of fuzzy rule deep structures preserving interpretability and its interaction with fuzzy rule set reduction. IEEE Trans Fuzzy Syst 13(1):13–29
23. Ho NC, Wechler W (1992) Extended hedge algebras and their application to fuzzy logic. Fuzzy Sets Syst 52(3):259–281
24. Novak V (1996) A horizon shifting model of linguistic hedges for approximate reasoning. In: Proceedings of the fifth IEEE international conference on fuzzy systems, pp 423–427
25. Huang CY, Chen CY, Liu BD (1999) Current-mode fuzzy linguistic hedge circuits. Analog Integr Circ Sig Process 19:255–278
26. Zadeh LA (1975) The concept of a linguistic variable and its application to approximate reasoning, parts 1, 2 and 3. Inf Sci 8–9:199–249 (pp 301–357, 43–80)
27. Banks W (1994) Mixing crisp and fuzzy logic in applications. In: WESCON'94 idea microelectronics conference record, Anaheim, CA, pp 94–97
28. Bouchon-Meunier B (1992) Linguistic hedges and fuzzy logic. In: Proceedings of the first IEEE international conference on fuzzy systems, San Diego, CA, pp 247–254
29. Liu BD, Chen CY, Tsao JY (2001) Design of adaptive fuzzy logic controller based on linguistic-hedge concepts and genetic algorithms. IEEE Trans Syst Man Cybernet Part B 31(1):32–53
30. Chakraborty D, Pal NR (2004) A neuro-fuzzy scheme for simultaneous feature selection and fuzzy rule-based classification. IEEE Trans Neural Netw 15(1):110–123
31. Rutkowski L, Cpalka K (2003) Flexible neuro-fuzzy systems. IEEE Trans Neural Netw 14(3):554–574
32. Shilton A, Lai DTH (2007) Iterative fuzzy support vector machine classification. In: IEEE international fuzzy systems conference, pp 1–6
33. Dholakia J, Negi A, Rama Mohan S (2005) Zone identification in the printed Gujarati text. In: ICDAR, pp 272–276
34. Maloo M, Kale KV (2011) Support vector machine based Gujarati numeral recognition. Int J Computer Sci Eng (IJCSE) 3(7):2595–2600
35. Maloo M, Kale KV (2011) Gujarati script recognition: a review. Int J Computer Sci Eng (IJCSE) 4(1):1694–1814
36. Shah SK, Sharma A (2006) Design and implementation of optical character recognition system to recognize Gujarati script using template matching. IE(I) J ET 86:44–49
37. Antani S, Lalitha A (1999) Gujarati character recognition. In: ICDAR, pp 418–421
38. Shannon CE (1938) A symbolic analysis of relay and switching circuits. Trans Am Inst Electr Eng 57:713–723
39. Prasad J, Kulkarni U (2009) Offline handwritten character recognition of Gujrati script using pattern matching. In: Proceedings of IEEE ASID 2009, pp 611–615
40. Prasad J, Kulkarni U (2011) Statistical feature extraction and recognition of isolated handwritten Gujrati characters. CiiT Int J Digital Image Process 3(19). ISSN 0975-9691
41. Prasad J, Kulkarni U (2014) Gujarati character recognition using adaptive neuro fuzzy classifier with fuzzy hedges. In: Proceedings of ICESC-2014, RKNEC Nagpur, January 2014 (in press)
42. Prasad J, Kulkarni U (2013) Gujarati character recognition using weighted k-NN with mean χ² distance measure. Int J Mach Learn Cybernet. doi:10.1007/s13042-013-0187-z
43. Tsang ECC, Yeung DS, Wang XZ (2003) OFFSS: optimal fuzzy-valued feature subset selection. IEEE Trans Fuzzy Syst 11(2):202–213
44. Desai A (2010) Gujarati handwritten numeral optical character reorganization through neural network. Pattern Recogn 43(7):2582–2589
45. Cetisli B (2010) Development of an adaptive neuro-fuzzy classifier using linguistic hedges: part 1. Expert Syst Appl 37:6093–6101
46. Cord A, Ambroise C, Cocquerez JP (2006) Feature selection in robust clustering based on Laplace mixture. Pattern Recogn Lett 27:627–635
47. Kwak N, Choi CH (2002) Input feature selection for classification problems. IEEE Trans Neural Netw 13(1):143–159
48. Sindhwani V, Rakshit S (2004) Feature selection in MLPs and SVMs based on maximum output information. IEEE Trans Neural Netw 15(4):937–948
49. Jensen R, Shen Q (2007) Fuzzy-rough sets assisted attribute selection. IEEE Trans Fuzzy Syst 15(1):73–89
50. Lee HM, Chen CM, Chen JM, Jou YL (2001) An efficient fuzzy classifier with feature selection based on fuzzy entropy. IEEE Trans Syst Man Cybernet Part B 31(3):426–432
51. Møller M (1993) A scaled conjugate gradient algorithm for fast supervised learning. Neural Netw 6(4):525–533. doi:10.1016/S0893-6080(05)80056-5
52. Cetisli B (2010) The effect of linguistic hedges on feature selection: part 2. Expert Syst Appl 37:6102–6108
53. Liu H et al (2005) Evolving feature selection. IEEE Intell Syst 20:64–76
54. Sankar KP, Rajat KD, Basak J (2000) Unsupervised feature evaluation: a neuro-fuzzy approach. IEEE Trans Neural Netw 11(2):366–376
55. Uncu O, Turksen IB (2007) A novel feature selection approach: combining feature wrappers and filters. Inf Sci 177:449–466