A design of granular fuzzy classifier
Seok-Beom Roh (a), Witold Pedrycz (b,c), Tae-Chon Ahn (a,*)

(a) Department of Electronics Convergence Engineering, Wonkwang University, 344-2, Shinyong-Dong, Iksan, Jeonbuk 570-749, South Korea
(b) Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Alberta T6G 2G7, Canada
(c) Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland

* Corresponding author. Tel.: +82 63 850 6344; fax: +82 63 853 2196. E-mail addresses: [email protected] (S.-B. Roh), [email protected] (W. Pedrycz), [email protected] (T.-C. Ahn).

http://dx.doi.org/10.1016/j.eswa.2014.04.040
0957-4174/© 2014 Published by Elsevier Ltd.
Keywords: Granular fuzzy classifier; Pattern classifier; Information granules; Weighting scheme
Abstract: In this paper, we propose a new design methodology of granular fuzzy classifiers based on the concepts of information granularity and information granules. The classifier uses the mechanism of information granulation, with the aid of which the entire input space is split into a collection of subspaces. When designing the proposed fuzzy classifier, these information granules are constructed in such a way that they are made reflective of the geometry of patterns belonging to individual classes. Although the elements involved in the generated information granules (clusters) seem to be homogeneous with respect to the distribution of patterns in the input (feature) space, they could still exhibit a significant level of heterogeneity when it comes to the class distribution within the individual clusters. To build an efficient classifier, we improve the class homogeneity of the originally constructed information granules (by adjusting the prototypes of the clusters) and use a weighting scheme as an aggregation mechanism.
1. Introduction
Classification is a method of supervised learning producing a mapping from a feature space onto the classes encountered in the classification problem. Classification problems are encountered in various domains, including medicine (Sun, Zhang, & Zhang, 2007), economics (Zhang, Lu, & Li, 2010), and fault detection (Guo, Jack, & Nandi, 2005). In order to improve classification performance, a large number of methods have been developed. There are various important categories of classification techniques, including statistical techniques, neural networks, and rule-based classification techniques (Pajares, Guijarro, & Ribeiro, 2010).
Among statistical techniques, we encounter various approaches such as the weighted voting scheme (Jahromi & Taheri, 2008), the naive Bayes approach (Tao, Li, Zhu, & Li, 2012), least squares and logistic regression (Pendharkar, 2012), and nearest neighbor classification (Chosh, 2012). Most "conventional" statistical classification techniques are based on Bayesian decision theory, where the class label of a given pattern is decided based on the posterior probability. This aspect of the statistical classification approach results in a certain drawback: if the underlying assumptions are not met, the efficiency of these classification techniques could be negatively impacted (Wu, Lin, & Lee, 2011).
There are numerous research activities in neural classification. In light of the existing developments, neural networks form a promising alternative to various conventional classification methods (Wu et al., 2011). Neural networks have been applied to various fields such as pattern recognition, modeling, and prediction.
Although various types of neural network classifiers have shown very good classification performance, there are several difficulties when using them. There is a large number of parameters (connections) to be estimated in neural network classifiers (Oh, Kim, Pedrycz, & Park, 2011), and neural networks are "black boxes" that lack interpretability (Wu et al., 2011).
Radial basis function neural networks (RBF NNs) came as a sound design alternative when it comes to the reduction of the number of parameters to be adjusted. RBF NNs exhibit some advantages, including global optimal approximation and classification capabilities and a rapid convergence of the learning process (Wu et al., 2011). Although RBF NNs exhibit powerful capabilities with respect to their classification performance and learning speed, they do not offer interpretability. On the other hand, it is well known that fuzzy logic can handle uncertainty and vagueness (Ganji & Abadeh, 2011). Subsequently, the use of "if-then" rules improves the interpretability of the results and provides better insight into the structure of the classifier (Alcala-Fdez, Alcala, & Herrera, 2011; Chacon-Murguia, Nevarez-Santana, & Sandoval-Rodriguez, 2009; Ishibuchi & Yamamoto, 2005; Juang & Chen, 2012) and the decision making process (Sh, Eberhart, & Chen, 1999).
These observations lead to the emergence of hybrid architectures bringing together the learning capabilities of neural networks and the interpretability of fuzzy systems, giving rise to neurofuzzy architectures (Nandedkar & Biswas, 2009).
In this study, we propose a new design methodology for fuzzy classifiers. The proposed design method is based on information granulation, originally introduced by Zadeh (1997). More specifically, information granulation is a process which decomposes a universe of discourse into some regions of high homogeneity (Liu, Xiong, & Fang, 2011).
Lin studied granular computing and neighborhood systems, mainly focusing on a granular computing model which included the binary relation, the granular structure, the granule's representation, and the applications of granular computing (Lin, 1988, 2000, 2005a, 2005b). Yao introduced rough sets to granular computing and discussed data mining methods, rule extraction methods, and machine learning methods based on granular computing in Yao (2001). Bargiela and Pedrycz (2003) established the fundamentals of granular computing. The most recent advancements, along with a comprehensive treatment of the subject area, are presented in the literature (Pedrycz, 2013).
We develop information granules on the basis of given numeric data (patterns) by exploiting two approaches: fuzzy clustering and a supervised optimization algorithm. The Fuzzy C-Means (FCM) clustering algorithm is used to form information granules in an unsupervised mode. In a supervised mode, Particle Swarm Optimization (PSO) searches for an optimal distribution of prototypes over the feature space.
After forming the information granules, we determine the distribution of patterns belonging to each class and allocated to a given cluster. This provides information about the heterogeneity of the clusters, which in the sequel is used to determine the class assignment of a pattern to be classified.
The paper is structured as follows. First, in Section 2, we introduce the new granular classifier. In Section 3, we present experimental results. Conclusions are covered in Section 4.
Fig. 1. Example 2-dimensional patterns and their prototypes.
2. Design of a granular fuzzy classifier
Information granulation is defined as a process which partitions a universe into several regions (Song, Yang, Soh, & Wang, 2010). Information granules are built through information granulation for the given patterns expressed in some feature space. Information granulation can be done from different viewpoints.
Information granules being reflective of the geometry of patterns can be realized by running a certain clustering algorithm. In our study, we are concerned with the Fuzzy C-Means (FCM). As noted earlier, the granulation process is realized in an unsupervised mode (via clustering), and subsequently the clustering results are improved by adjusting the positions of the prototypes already formed by the FCM method.
In this study, we adhere to the standard notation. A finite set of data (patterns) is denoted by $X = \{x_1, x_2, \ldots, x_N\}$, where $x_i \in \mathbb{R}^n$.
2.1. Information granulation realized in unsupervised mode and its refinement in supervised mode
The given patterns are clustered by the FCM method into "c" clusters. The method generates "c" prototypes $v_1, v_2, \ldots, v_c$ (centers of the clusters) and a partition matrix whose rows are membership functions of successive fuzzy sets.
2.1.1. The unsupervised mode for information granulation

Clustering is the assignment of a set of observations into clusters so that data located in the same cluster are similar in a certain sense. The FCM clustering algorithm can be succinctly described as follows. The FCM clustering method creates a collection of information granules in the form of fuzzy sets (Song et al., 2010).
To elaborate on the essence of the method, let us consider a set of patterns $X = \{x_1, x_2, \ldots, x_N\}$, $x_k \in \mathbb{R}^m$ (where $m$ stands for the dimensionality of the input space).
The objective function used in FCM clustering is defined as follows:

$$J = \sum_{i=1}^{c} \sum_{k=1}^{N} (u_{ik})^q \, \|x_k - v_i\|^2 \quad \text{s.t.} \quad \sum_{i=1}^{c} u_{ik} = 1 \qquad (1)$$
where $u_{ik}$ is the membership degree of the $k$th pattern in the $i$th cluster, $v_i$ is the center of the $i$th cluster, and $c$ is the number of clusters.
The optimization problem is expressed in the form

$$\min_{U, v} J \quad \text{subject to} \quad \sum_{i=1}^{c} u_{ik} = 1 \qquad (2)$$
The clustering procedure solves (2) iteratively through two update formulas, (3) and (4), which successively modify the partition matrix and the location of the prototypes.
$$R_i(x_k) = u_{ik} = \frac{1}{\sum_{j=1}^{c} \left( \dfrac{\|x_k - v_i\|}{\|x_k - v_j\|} \right)^{2/(q-1)}} \qquad (3)$$
where $R_i(x_k)$ denotes the membership function of the fuzzy set $R_i$. The center of the $i$th cluster is determined so as to minimize the objective function (1) as follows:
$$v_i = \frac{\sum_{k=1}^{N} (u_{ik})^q \, x_k}{\sum_{k=1}^{N} (u_{ik})^q} \qquad (4)$$
As an illustration, Fig. 1 shows the location of the prototypes generated by the FCM for synthetic two-dimensional data.
Fig. 2 shows the activation levels (membership functions) describing the fuzzy clusters.
For the $i$th cluster, we determine the data belonging to this cluster according to the following expression:

$$X_i = \{ x_k \mid R_i(x_k) = \max_j R_j(x_k) \}, \quad k = 1, 2, \ldots, N \qquad (5)$$
where $N$ is the number of patterns. The local areas described by (5) are depicted in Fig. 3.
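As a complement to Eqs. (1)-(5), the following is a minimal sketch of the FCM update loop in Python (written for this presentation; the function name, the random initialization, and the convergence tolerance are our illustrative choices, not prescribed by the paper).

```python
import numpy as np

def fcm(X, c, q=2.0, n_iter=100, tol=1e-6, seed=0):
    """Fuzzy C-Means: returns prototypes v (c x m) and partition matrix U (c x N)."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    U = rng.random((c, N))
    U /= U.sum(axis=0, keepdims=True)                # columns sum to 1, the constraint of Eq. (1)
    for _ in range(n_iter):
        Um = U ** q
        v = (Um @ X) / Um.sum(axis=1, keepdims=True)          # prototype update, Eq. (4)
        d = np.linalg.norm(X[None, :, :] - v[:, None, :], axis=2)
        d = np.fmax(d, 1e-12)                                  # guard against zero distances
        # membership update, Eq. (3): u_ik = 1 / sum_j (d_ik / d_jk)^(2/(q-1))
        U_new = 1.0 / ((d[:, None, :] / d[None, :, :]) ** (2.0 / (q - 1.0))).sum(axis=1)
        if np.abs(U_new - U).max() < tol:
            return v, U_new
        U = U_new
    return v, U

# Crisp local areas of Eq. (5): each pattern goes to its most activated cluster, e.g.
# v, U = fcm(X, c=4); labels = U.argmax(axis=0)
```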
2.1.2. The supervised mode in the refinement of information granulation

In the unsupervised mode, the prototypes of the clusters are determined by the clustering method, where we investigate the
Fig. 2. Contour plot of membership functions of the clusters.
Fig. 3. The local areas formed for the clusters.
Fig. 5. Fraction distribution of patterns located in each cluster.
distribution of the patterns without considering information about class labels, viz. not looking at the classes the patterns belong to.
In the supervised mode, the locations of the prototypes obtained so far are refined by running particle swarm optimization (PSO). The prototypes are adjusted so that the classification performance is maximized. In this section, we briefly elaborate on the essence of the PSO. The method is a bio-inspired optimization technique; the algorithm mimics the social behavior of a flock of birds. The underlying principle comes from a population-based search in which the individuals in the solution space (representing possible solutions) carry out a collective search by exchanging their individual findings while taking into consideration their own local experience and evaluating their own previous performance. The PSO method is becoming popular due to its simplicity of implementation and its ability to quickly converge to an optimal solution.

Fig. 4. The structure of the particle for information granulation.
The position of each particle in the solution space is updated as follows:

$$p_i(k+1) = p_i(k) + v_i(k+1) \qquad (6)$$
where $p_i(k)$ and $v_i(k)$ represent the position and velocity of the $i$th particle at the $k$th step, respectively.
The velocity vector reflects both the particle's own experiential knowledge and the social information exchanged with the other particles, which constitute a society.
The velocity of each particle at the sampling instant $k+1$ is calculated in the form

$$v_i(k+1) = w(k)\, v_i(k) + c_1 r_1 \left( pbest_i(k) - p_i(k) \right) + c_2 r_2 \left( gbest(k) - p_i(k) \right) \qquad (7)$$

where $r_1$ and $r_2$ are random numbers drawn from the uniform distribution over $[0, 1]$, and $c_1$ and $c_2$ are positive coefficients referred to as the cognitive and social parameter, respectively.
As PSO is an iterative search strategy, we iterate until there is no substantial improvement of the fitness function or we have exceeded the number of iterations allowed in this search. In this paper, the fitness function is defined as (19).
In general, the algorithm can be outlined as the following sequence of steps:
Step 1: Randomly generate "N" particles $p_i$ and their velocities $v_i$. Each particle in the initial swarm (population) is evaluated using the objective (fitness) function. For each particle, set $pbest_i = p_i$ and search for the best particle among the $pbest_i$. Set the best particle as the global best, $gbest$.
Step 2: Adjust the inertia weight $w$. Typically, its value decreases linearly over the time of search. We start with its maximal value, say $w_{max} = 0.9$, and reduce it to $w_{min} = 0.4$ at the end of the iterative process,
$$w(k) = w_{max} - \frac{w_{max} - w_{min}}{iter_{max}} \, k \qquad (8)$$
where $iter_{max}$ denotes the maximum number of iterations of the search and "k" stands for the current index of the iteration.
Step 3: Given the current values of $gbest$ and $pbest_i$, the velocity of the $i$th particle is adjusted by (7). If required, we clip the values, making sure that they are positioned within the required search region.
Step 4: Based on the updated velocities, each particle changes its position using expression (6). Furthermore, we keep the particle within the boundaries of the search space, that is
$$p_{min} \le p_i \le p_{max} \qquad (9)$$
Step 5: Move the particles in the search space and evaluate their fitness both in terms of $gbest$ and $pbest_i$.
Step 6: Repeat Steps 2-5 until the termination criterion has been met; then return $gbest$ as the solution that has been found.
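The steps above translate directly into code. Below is a compact sketch of the PSO loop of Eqs. (6)-(8) under the parameter settings of Table 1 (swarm size 70, 50 generations); the choices $c_1 = c_2 = 2.0$, the box bounds, and the function name are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def pso(fitness, dim, n_particles=70, n_iter=50, bounds=(0.0, 1.0),
        c1=2.0, c2=2.0, w_max=0.9, w_min=0.4, seed=0):
    """Minimize `fitness` over a box-bounded search space (Steps 1-6 above)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    p = rng.uniform(lo, hi, (n_particles, dim))          # Step 1: random particles
    vel = np.zeros_like(p)
    pbest = p.copy()
    pbest_f = np.array([fitness(x) for x in p])
    gbest = pbest[pbest_f.argmin()].copy()
    for k in range(n_iter):
        w = w_max - (w_max - w_min) / n_iter * k         # Step 2: inertia weight, Eq. (8)
        r1, r2 = rng.random((2, n_particles, 1))
        vel = (w * vel + c1 * r1 * (pbest - p)           # Step 3: velocity update, Eq. (7)
               + c2 * r2 * (gbest - p))
        p = np.clip(p + vel, lo, hi)                     # Step 4: position, Eq. (6), kept within Eq. (9)
        f = np.array([fitness(x) for x in p])            # Step 5: evaluate fitness
        better = f < pbest_f
        pbest[better], pbest_f[better] = p[better], f[better]
        gbest = pbest[pbest_f.argmin()].copy()           # Step 6: iterate, then return gbest
    return gbest, pbest_f.min()
```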
We adopt the PSO algorithm to optimize the information granules (i.e., the position of each prototype in the feature space) and to optimize the fuzzification coefficient of the FCM method, which impacts the shape of the fuzzy membership functions. The membership function of a cluster depends on the position of the prototypes and on the fuzzification coefficient.
Fig. 4 shows the structure of a particle used in the PSO algorithm for information granulation.
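Assuming the particle layout suggested by Fig. 4 (the fuzzification coefficient in the first slot, followed by the coordinates of the $c$ prototypes), a hypothetical encode/decode pair could look as follows; the helper names are ours, not the paper's.

```python
import numpy as np

def decode_particle(particle, c, m):
    """Split a particle into the fuzzification coefficient and the c prototypes in R^m."""
    q = particle[0]                  # fuzzification coefficient occupies the first slot (cf. Fig. 4)
    v = particle[1:].reshape(c, m)   # remaining entries: coordinates of the c prototypes
    return q, v

def encode_particle(q, v):
    """Inverse operation: pack q and the prototype matrix into one flat vector."""
    return np.concatenate(([q], v.ravel()))
```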
2.2. Analysis of information granules
As shown in Fig. 3, each cluster defined by a clustering approach (in this paper, we use the FCM to define several clusters) comes as a mixture of patterns belonging to different classes. The center point of a cluster may be considered a representative of that cluster from a geometrical viewpoint. We think that the center points defined by a clustering method can be considered as the code books of prototype-based classifiers such as k-Nearest Neighbor (kNN) and Learning Vector Quantization (LVQ). In general, a code book entry of a prototype-based classifier is associated with only one class label. However, in the case of a prototype determined by the unsupervised clustering method, the prototype is involved in several classes.

Fig. 6. Boundary surface of the proposed classifier with class fraction matrix: (a) crisp class fraction matrix [1 0; 0 1; 1 0; 0 1]; (b) fuzzy class fraction matrix [0.9 0.1; 0.3 0.7; 0.2 0.8; 0.7 0.3]; (c) fuzzy class fraction matrix [0.8 0.2; 0.3 0.7; 0.1 0.9; 0.6 0.4]; (d) fuzzy class fraction matrix [0.9 0.1; 0.7 0.3; 0.2 0.8; 0.7 0.3].

Fig. 8. Prototypes and the contour plots of the membership functions.
Therefore, in order to understand the characteristics of the prototypes, we calculate the contribution of each class to a given prototype.
The fraction of patterns in the $i$th cluster $X_i$ belonging to the $l$th class ($l = 1, 2, \ldots, p$, where "p" is the number of classes) is defined in the form

$$a_{il} = \frac{\Sigma\text{-count}(\{ x_k \in X_i,\; x_k \in H_l \})}{\Sigma\text{-count}(X_i)} \qquad (10)$$
where $H_l = \{ x_k \mid x_k \text{ belongs to class } l \}$. In (10), $\Sigma\text{-count}(X_i)$ is defined as (11-a) or (11-b). Zadeh's sigma-count, defined as (11-b), is a generalized version of the classical cardinality of the set $X_i$; the classical cardinality of $X_i$ is defined as (11-a).
$$\text{Type a:} \quad \Sigma\text{-count}(X_i) = \sum_{x} I_{X_i}(x), \qquad I_{X_i}(x) = \begin{cases} 1, & \text{if } x \in X_i \\ 0, & \text{if } x \notin X_i \end{cases} \qquad (11\text{-a})$$

$$\text{Type b:} \quad \Sigma\text{-count}(X_i) = \sum_{x \in X_i} R_i(x) \qquad (11\text{-b})$$
where $R_i(x)$ is the membership function of the $i$th cluster. In this paper, we use the two types of $\Sigma\text{-count}(X_i)$ described in (11) to calculate the fraction of patterns in the $i$th cluster $X_i$. Fig. 5 shows the fraction of data patterns located in each cluster. As shown in Fig. 5, the clusters are not homogeneous: the information granules are heterogeneous from the viewpoint of class labels.
Table 1
Selected numeric values of the parameters of the proposed technique.

FCM parameters:
  Number of rules in each class (r): 2, 4, 6, 8, 10, 15
  Fuzzification coefficient (p): in the range of 1.2-3.0, varying with a step of 0.2
PSO parameters:
  Swarm size: 70
  Maximum number of generations: 50
Fig. 7. Two-class synthetic dataset of a mixture of Gaussian distributions.
Fig. 9. The local area implied by the prototype.
Fig. 10. The distribution of classes in each cluster.
The overall feature space is split into several local areas through the extracted prototypes, and these areas are used to complete the classification.
The class fraction in the $i$th cluster forms a vector

$$a_i = [\, a_{i1} \;\; a_{i2} \;\; \ldots \;\; a_{ip} \,], \quad a_{il} \in [0, 1] \qquad (12)$$
$$\sum_{k=1}^{p} a_{ik} = 1 \qquad (13)$$
The class fraction should satisfy condition (13). The class fraction matrix is defined as (14):
$$M = \begin{bmatrix} a_1 \\ \vdots \\ a_c \end{bmatrix} = \begin{bmatrix} a_{11} & \cdots & a_{1p} \\ \vdots & \ddots & \vdots \\ a_{c1} & \cdots & a_{cp} \end{bmatrix} \qquad (14)$$
Table 2
Machine learning datasets used in the experiments.

Datasets    | Number of features | Number of patterns (data) | Number of classes
Australian  | 42 | 690  | 2
Balance     | 4  | 625  | 3
Diabetes    | 8  | 768  | 2
German      | 24 | 1000 | 2
Glass       | 9  | 214  | 6
Hayes       | 5  | 132  | 3
Ionosphere  | 34 | 351  | 2
Iris        | 4  | 150  | 3
Liver       | 6  | 345  | 2
Sonar       | 60 | 208  | 2
Thyroid     | 5  | 215  | 3
Vehicle     | 18 | 846  | 4
Wine        | 13 | 178  | 3
Zoo         | 16 | 101  | 7
Fig. 11. Classification error rate of the proposed classifier (Type a) according to the increase of clusters: (a) training data with unsupervised mode; (b) test data with unsupervised mode; (c) training data with supervised mode; (d) test data with supervised mode.
2.3. Aggregation of the information granules
In the previous section, we analyzed the input space by using a clustering method. After that, we calculated the class fraction distribution of the data patterns over each subspace (i.e., cluster) using (11). For example, let us assume that we obtain the class fraction matrix (15).
$$M = \begin{bmatrix} 0.75 & 0.25 \\ 0.6 & 0.4 \\ 0.2 & 0.8 \end{bmatrix} \qquad (15)$$
In the second cluster, the class fraction distribution is $a_2 = [0.6 \;\; 0.4]$; this distribution means that the possibility that a data pattern in the second cluster belongs to class 1 is 0.6, and the possibility related to class 2 is 0.4. In other words, the element $a_{ij}$ of the class fraction matrix expresses how possible it is that a data pattern in the $i$th cluster belongs to the $j$th class. The class fraction matrix thus conveys locally defined information related to each cluster. However, we are interested in the possibility that a data pattern belongs to each class over the whole input space, not only over a local region. Therefore, we have to aggregate the local information extracted from the class fraction matrix.
When a new pattern $x$ is provided, we determine the activation levels of the predefined clusters. The activation levels $u = [\, u_1 \;\; u_2 \;\; \ldots \;\; u_c \,]$ are calculated as in (3), where $u_1 = R_1(x),\; u_2 = R_2(x),\; \ldots,\; u_c = R_c(x)$.
The possibility that a new data pattern belongs to a class over the whole input space can be calculated as the linear combination of the elements of the class fraction matrix with the activation levels $u$, as in (16).
$$\bar{u} = u \cdot M = \left[ \; \sum_{i=1}^{c} u_i \, a_{i1} \quad \sum_{i=1}^{c} u_i \, a_{i2} \quad \ldots \quad \sum_{i=1}^{c} u_i \, a_{ip} \; \right] \qquad (16)$$
The overall results are described in a vector form as shown below:

$$\bar{u} = [\, \bar{u}_1 \;\; \bar{u}_2 \;\; \ldots \;\; \bar{u}_p \,] \qquad (17)$$
Ideally, if $x \in$ class $l$, one would like to have a result coming as (18), with the single 1 at the $l$th position:

$$\bar{u} = [\, 0 \; \cdots \; 0 \;\; 1 \;\; 0 \; \cdots \; 0 \,] \qquad (18)$$
The overall performance index is then the average distance between the estimation vector $\bar{u}$ and the target class vectors $u^* = [\, 0 \; \cdots \; 1 \; \cdots \; 0 \,]$, defined as

$$Q = \frac{\sum_{k=1}^{N} \| \bar{u}(x_k) - u^*(x_k) \|^2}{N} \qquad (19)$$
Table 3
Classification performance (error rate, %) of the proposed classifier.

Data set    | Metric         | Type A unsup. | Type A sup. | Type B unsup. | Type B sup.
Australian  | c              | 8     | 4     | 8     | 10
            | q              | 1.2   | N/A   | 1.2   | N/A
            | Error rate (%) | 14.49 | 14.20 | 14.64 | 14.35
Balance     | c              | 10    | 15    | 15    | 15
            | q              | 1.8   | N/A   | 1.8   | N/A
            | Error rate (%) | 14.24 | 9.12  | 13.13 | 8.96
Diabetes    | c              | 6     | 10    | 15    | 8
            | q              | 1.4   | N/A   | 1.4   | N/A
            | Error rate (%) | 29.82 | 23.84 | 29.56 | 23.95
German      | c              | 15    | 6     | 2     | 10
            | q              | 1.2   | N/A   | 1.2   | N/A
            | Error rate (%) | 29.70 | 27.40 | 30.0  | 28.10
Glass       | c              | 15    | 15    | 15    | 10
            | q              | 1.6   | N/A   | 1.6   | N/A
            | Error rate (%) | 37.38 | 35.13 | 37.36 | 35.04
Hayes       | c              | 45    | 15    | 45    | 6
            | q              | 1.2   | N/A   | 1.4   | N/A
            | Error rate (%) | 40.82 | 34.07 | 27.97 | 34.18
Ionosphere  | c              | 10    | 15    | 10    | 10
            | q              | 1.2   | N/A   | 1.2   | N/A
            | Error rate (%) | 14.20 | 12.77 | 11.61 | 13.08
Iris        | c              | 15    | 15    | 10    | 6
            | q              | 1.6   | N/A   | 1.4   | 2.0
            | Error rate (%) | 4.00  | 5.33  | 4.00  | 3.33
Liver       | c              | 10    | 8     | 10    | 10
            | q              | 1.2   | N/A   | 1.2   | N/A
            | Error rate (%) | 38.87 | 31.31 | 39.71 | 26.97
Sonar       | c              | 15    | 10    | 15    | 15
            | q              | 1.2   | N/A   | 1.2   | N/A
            | Error rate (%) | 29.79 | 24.60 | 27.86 | 24.52
Thyroid     | c              | 8     | 8     | 15    | 15
            | q              | 1.2   | N/A   | 1.2   | N/A
            | Error rate (%) | 7.94  | 2.79  | 6.88  | 4.11
Vehicle     | c              | 15    | 10    | 15    | 8
            | q              | 1.2   | N/A   | 1.2   | N/A
            | Error rate (%) | 44.08 | 36.64 | 43.60 | 31.92
Wine        | c              | 84    | 8     | 10    | 15
            | q              | 1.2   | N/A   | 1.4   | N/A
            | Error rate (%) | 2.22  | 2.29  | 1.70  | 2.84
Zoo         | c              | 15    | 15    | 15    | 8
            | q              | 1.2   | N/A   | 1.2   | N/A
            | Error rate (%) | 8.00  | 7.00  | 8.00  | 7.91

c — number of clusters; q — fuzzification coefficient.
So far, the overall performance index (19) expresses the performance of the classifier produced in an unsupervised mode.
Next, we have to optimize the position of the prototypes in the supervised mode.
Here, we minimize the performance index $Q$ by moving the originally computed prototypes $v_1, v_2, \ldots, v_c$. As stated, this optimization procedure is realized with the aid of the PSO.
2.4. The variation of the boundary surface under the influence of the class fraction matrix
As mentioned above, the generic prototypes of prototype-based classifiers carry only one class label. For generic prototype-based classifiers, the class label of a prototype can be described in a vector form as (20).
$$a_i = [\, a_{i1} \;\; a_{i2} \;\; \ldots \;\; a_{ip} \,], \quad a_{ik} \in \{0, 1\}, \quad \sum_{k=1}^{p} a_{ik} = 1 \qquad (20)$$
Eq. (20) is the crisp version of the class fraction vector defined in (12).
The difference between the crisp version of the class fraction vector and the fuzzy version is depicted in Fig. 6.
Fig. 6 shows the boundary surfaces of prototype-based classifiers with different class fraction matrices.
As shown in Fig. 6, the boundary surface of the proposed classifier is very flexible. For the conventional prototype-based classifier, only a variation of the location of the prototypes can change the boundary surface. For the proposed classifier, the boundary surface can be changed by varying either the position of the prototypes or the class fraction matrix.
3. Experimental studies
In this section, we report on a number of experiments that help evaluate the classification performance of the proposed granular fuzzy classifier.
In the experiments, we use a series of synthetic datasets and several machine learning datasets (http://www.ics.uci.edu/~mlearn/MLRepository.html). In the assessment of the performance of the classifiers, we use the error rate of the classifier. K-fold cross-validation has been applied to evaluate the classification results: the whole data set is divided into K blocks, K − 1 blocks are used as the training set, and the remaining block serves as the test set. Here, the number of blocks is set to 10.
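For concreteness, here is a generic sketch of the 10-fold protocol just described; `train_and_predict` is a placeholder for the full design pipeline (FCM granulation, PSO refinement, aggregation) and is not part of the paper.

```python
import numpy as np

def kfold_error_rate(X, y, train_and_predict, K=10, seed=0):
    """Average error rate (%) over K folds; one block is held out per fold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), K)
    errors = []
    for k in range(K):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(K) if j != k])
        y_hat = train_and_predict(X[train], y[train], X[test])
        errors.append(100.0 * np.mean(y_hat != y[test]))
    return float(np.mean(errors))
```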
We consider the values of the parameters reported in Table 1. The choice of these particular numeric values has been motivated by the need to investigate the performance of the model in a fairly comprehensive range of scenarios.

Fig. 12. Classification error rate of the proposed classifier (Type b) according to the increase of clusters: (a) training data with unsupervised mode; (b) test data with unsupervised mode; (c) training data with supervised mode; (d) test data with supervised mode.
3.1. Synthetic datasets
Two-dimensional synthetic examples are convenient for illustrating the design procedure of the fuzzy classifier.
We use a normally distributed data set composed of sub-groups described by their mean vectors $m$ and covariance matrices $U$, as shown in Fig. 7. There are two sub-groups; each class is composed of two clusters.
The mean vectors $m_i$ and the covariance matrix (being the same for all groups) are the following:

$$m_1 = \begin{bmatrix} -1 \\ 0 \end{bmatrix}, \quad m_2 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad U = \begin{bmatrix} 0.5 & 0.0 \\ 0.0 & 0.5 \end{bmatrix}$$
Each sub-group consists of 200 data points. We show the position of the prototypes determined with the use of the FCM in Fig. 8. The local areas related to the prototypes are depicted in Fig. 9; in this figure, the local areas are defined by (5). After determining the local areas, we calculate the fraction of classes included in each cluster. Fig. 10 shows the values of these fractions. For clusters 1 and 4, there are many more patterns belonging to class 1 than to the other class, whereas the patterns belonging to class 2 are predominantly visible in clusters 2 and 3.
From the fraction of classes and the apex defined in each cluster, we can build up a fuzzy classifier based on the granulation as follows:
if x is C1 then p1 = 0.8663, p2 = 0.1337
if x is C2 then p1 = 0.2028, p2 = 0.7972
if x is C3 then p1 = 0.1403, p2 = 0.8597
if x is C4 then p1 = 0.7972, p2 = 0.2028        (21)
When a new pattern $x_{new}$ is given, its class label is determined by calculating (22).
Table 4
Results of comparative analysis in terms of the number of prototypes and the classification performance.

Data set    | kNN: N | kNN K=1 | kNN K=2 | kNN K=3 | Proposed: N | Proposed classifier
Australian  | 621    | 20.0    | 16.38   | 15.22   | 4           | 14.20
Balance     | 562.5  | 20.97   | 13.77   | 12.16   | 15          | 8.96
Diabetes    | 691.2  | 29.82   | 27.35   | 26.82   | 10          | 23.84
German      | 900    | 32.20   | 27.90   | 27.60   | 6           | 27.40
Glass       | 192.6  | 29.5    | 28.05   | 32.21   | 10          | 35.04
Hayes       | 118.8  | 29.67   | 48.57   | 66.21   | 14          | 27.97
Ionosphere  | 315.9  | 13.67   | 13.40   | 15.10   | 10          | 11.61
Iris        | 135    | 4.67    | 4.67    | 4.67    | 6           | 3.33
Liver       | 310.5  | 37.08   | 38.27   | 40.90   | 10          | 26.97
Sonar       | 187.2  | 13.43   | 13.98   | 15.38   | 15          | 24.52
Thyroid     | 193.5  | 2.79    | 6.54    | 6.06    | 8           | 2.79
Vehicle     | 761.4  | 30.14   | 28.49   | 27.31   | 8           | 31.92
Wine        | 160.2  | 5.03    | 5.03    | 4.48    | 10          | 1.70
Zoo         | 90.9   | 2.91    | 7.82    | 4.91    | 15          | 7.00

N means the number of prototypes; error rates in %.
Table 5
Results of comparative analysis (error rate, %; the best results shown in boldface in the original).

Datasets    | Proposed classifier | PFARS (Wu et al., 2011) | CCP (Yao, 2001) | PART (WEKA) | Bayes networks (WEKA) | SMO (WEKA) | RBFNN (WEKA)
Australian  | 14.20 (1.91)  | 13.9 | N/A   | 15.55 | 22.14 | 15.12 | 20.45
Balance     | 8.96 (1.53)   | 33.1 | N/A   | 16.83 | 9.47  | 12.43 | 13.81
Diabetes    | 23.84 (4.69)  | 24.7 | N/A   | 26.55 | 24.25 | 23.2  | 25.96
German      | 27.40 (2.84)  | 30.0 | N/A   | 29.46 | 24.84 | 24.91 | 26.42
Glass       | 35.04 (6.67)  | N/A  | 28.51 | 31.25 | 50.55 | 42.64 | 35.08
Hayes       | 27.97 (10.62) | N/A  | N/A   | 22.03 | 39.34 | 44.07 | 25.99
Ionosphere  | 11.61         | N/A  | N/A   | 8.25  | 10.54 | 11.40 | 7.09
Iris        | 3.33 (3.51)   | 4.0  | N/A   | 5.8   | 4.47  | 3.73  | 4.0
Liver       | 26.97 (6.91)  | 32.4 | N/A   | 34.75 | 45.11 | 42.02 | 34.94
Sonar       | 24.52 (9.40)  | N/A  | 22.6  | 22.6  | 32.29 | 23.4  | 27.38
Thyroid     | 2.79          | N/A  | N/A   | 6.08  | 5.61  | 10.26 | 3.25
Vehicle     | 31.92 (5.96)  | N/A  | 30.5  | 27.79 | 55.32 | 25.92 | 34.64
Wine        | 1.70 (3.78)   | 4.0  | N/A   | N/A   | N/A   | N/A   | N/A
Zoo         | 7.0           | N/A  | N/A   | 7.82  | 3.0   | 6.91  | 5.82
$$y = \begin{cases} 1, & \text{if } \bar{u}_1 > \bar{u}_2 \\ 2, & \text{if } \bar{u}_1 < \bar{u}_2 \end{cases} \qquad (22)$$

where $\bar{u} = [\, \bar{u}_1 \;\; \bar{u}_2 \,]$ is computed by (16), $R_1(x_{new})$, $R_2(x_{new})$, $R_3(x_{new})$, and $R_4(x_{new})$ are the activation levels of the information granules, and

$$a = \begin{bmatrix} 0.8663 & 0.1337 \\ 0.2028 & 0.7972 \\ 0.1403 & 0.8597 \\ 0.7972 & 0.2028 \end{bmatrix}$$
The misclassification rate of the fuzzy classifier described by (21) is 8.5%, and the performance index defined as (19) is 0.1842.
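Putting rules (21) and decision (22) together, the classification of a new pattern could be sketched as follows; the prototypes and the fuzzification coefficient `q` are assumed to come from the FCM run of Fig. 8, and the function name is ours.

```python
import numpy as np

a = np.array([[0.8663, 0.1337],      # class fractions of rules (21), clusters C1..C4
              [0.2028, 0.7972],
              [0.1403, 0.8597],
              [0.7972, 0.2028]])

def predict(x_new, prototypes, q=2.0):
    """Activate the four granules by Eq. (3), aggregate by Eq. (16), decide by Eq. (22)."""
    d = np.fmax(np.linalg.norm(prototypes - x_new, axis=1), 1e-12)
    u = 1.0 / ((d[:, None] / d[None, :]) ** (2.0 / (q - 1.0))).sum(axis=1)   # Eq. (3)
    u_bar = u @ a                                                            # Eq. (16)
    return 1 + int(u_bar.argmax())   # class 1 or 2, as in Eq. (22)
```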
3.2. Machine learning datasets
In what follows, we report on several experiments using machine learning data sets coming from the Machine Learning Repository (http://www.ics.uci.edu/~mlearn/MLRepository.html).
Table 2 summarizes the pertinent details of the data, such as the number of features and the number of patterns.
Table 3 summarizes the classification performance (error rate, %) of the proposed classifier developed in the unsupervised mode and, subsequently, in the supervised mode.
From Table 3, we observe that the proposed classifier constructed in the supervised mode is superior to that produced in the unsupervised mode; this happens for most of the data. In the case of the Ionosphere and Wine data, the classifier with information granules formed in the unsupervised mode is superior. This phenomenon comes from the fact that the proposed fuzzy classifier with the optimization algorithm is overfitted to the training data set.
Fig. 11 shows the change of the classification error rate of the proposed classifier (Type a) according to the increase of the number of clusters.
Fig. 12 depicts the change of the classification error rate of the proposed classifier (Type b) according to the increase of the number of clusters.
From Figs. 11 and 12, we can see that for the training data set the misclassification rate tends to decrease as the number of clusters increases.
The proposed classifier can be considered a sort of prototype-based classifier. The nearest neighbor classifier is the representative prototype-based classifier. However, the drawbacks of the nearest neighbor classifier are that it needs a large memory in order to store all the training samples, and that there is a huge computational burden in calculating the distances between all the training samples and a test sample (Souza, Rittner, & Lotufo, 2014).
Table 4 shows the comparison of the proposed classifier with the k-nearest neighbor classifier in terms of the classification performance and the number of prototypes.
Table 5 contrasts the values of the classification error of the proposed classifier with the errors produced by some other methods (Bargiela & Pedrycz, 2003; Pedrycz, 2013). The classifiers PART, Bayes networks, SMO, and RBFNN were run within the framework of WEKA. To confirm the improvement of the performance, we used the t-test at the 95% confidence level.
4. Conclusion
In this paper, we have introduced a new granular fuzzy classifier in which information granulation plays a pivotal role. The information granulation used to design the information granules is realized in an unsupervised mode and a supervised mode. In the unsupervised mode, the FCM clustering algorithm is used to construct the information granules. In the supervised mode, the information granules are further refined, and this adjustment is completed with the use of a PSO algorithm. After forming and optimizing the information granules, we aggregate them.
By looking at the classification performance, it is not surprising that the supervised mode of design produces better results than the classifier constructed in the unsupervised mode. It is of interest, though, to investigate the quantitative differences present there and the impact of the level of granularity (the number of clusters).
In contrast with the generic prototypes of conventional prototype-based classifiers, which carry crisp class labels, we have studied prototypes with fuzzy class labels. For generic prototype-based classifiers, the usual route to improving classification performance is to optimize the positions of the prototypes. In this paper, we improved the classification performance of the prototype-based classifier by assigning fuzzy class labels to each predefined prototype. In the next research step, we will analyze each cluster geometrically and define sub-prototypes in a local space to further improve the classification performance.
References
Alcala-Fdez, J., Alcala, R., & Herrera, F. (2011). A fuzzy association rule-basedclassification model for high-dimensional problems with genetic rule selectionand lateral tuning. IEEE Transactions on Fuzzy Systems, 19(5), 857–872.
Bargiela, A., & Pedrycz, W. (2003). Granular computing: An introduction. Dordrecht:Kluwer Academic Publishers.
Chacon-Murguia, M. I., Nevarez-Santana, J. I., & Sandoval-Rodriguez, R. (2009).Multiblob cosmetic defect description/classification using a fuzzy hierarchicalclassifier. IEEE Transactions on Industrial Electronics, 56(4), 1292–1299.
Chosh, A. K. (2012). A probabilistic approach for semi-supervised nearest neighborclassification. Pattern Recognition Letters, 33, 1127–1133.
Please cite this article in press as: Roh, S.-B., et al. A design of granular fuzzy clj.eswa.2014.04.040
Ganji, M. F., & Abadeh, M. S. (2011). A fuzzy classification system based on antcolony optimization for diabetes disease diagnosis. Expert Systems withApplications, 38, 14650–14659.
Guo, H., Jack, L. B., & Nandi, A. K. (2005). Feature generation using geneticprogramming with application to fault classification. IEEE Transactions onSystems, Man, and Cybernetics, Part B, 35(1), 89–99.
Ishibuchi, H., & Yamamoto, T. (2005). Rule weight specification in fuzzy rule-based classification systems. IEEE Transactions on Fuzzy Systems, 13(4),428–435.
Jahromi, M. Z., & Taheri, M. (2008). A proposed method for learning rule weights infuzzy rule-based classification systems. Fuzzy Sets and Systems, 159, 449–459.
Juang, C.-F., & Chen, G.-C. (2012). A TS fuzzy system learned through a supportvector machine in principal component space for real-time object detection.IEEE Transactions on Industrial Electronics, 59(8), 3309–3320.
Lin, T. Y. (1988). Neighborhood systems and relational database. In Proceedings of the ACM Sixteenth Annual Conference on Computer Science, New York, February 23-25 (p. 725).
Lin, T. Y. (2000). Data mining and machine oriented modeling: a granularcomputing approach. Journal of Applied Intelligence, 13(2), 113–124.
Lin, T. Y. (2005a). Granular computing rough set perspective. The Newsletter of theIEEE Computational Intelligence Society, 4, 1543–4281.
Lin, T. Y. (2005b). Granular computing: a problem solving paradigm. In Proceedings of the IEEE International Conference on Fuzzy Systems, Reno, Nevada, USA, May 22-25 (pp. 132-137).
Liu, H., Xiong, S., & Fang, Z. (2011). FL-GrCCA: A granular computing algorithmbased on fuzzy lattices. Computers and Mathematics with Applications, 61,138–147.
Nandedkar, A. V., & Biswas, P. K. (2009). A granular reflex fuzzy min-max neuralnetwork for classification. IEEE Transactions on Neural Networks, 20(7),1117–1134.
Oh, S.-K., Kim, W.-D., Pedrycz, W., & Park, B.-J. (2011). Polynomial-based radial basisfunction neural networks (P-RBF NNs) realized with the aid of particle swarmoptimization. Fuzzy Sets and Systems, 163, 54–77.
Pajares, G., Guijarro, M., & Ribeiro, A. (2010). A hopfield neural network forcombining classifiers applied to textured images. Neural Networks, 23, 144–153.
Pedrycz, W. (2013). Analysis and design of intelligent systems: A framework of granularcomputing. Boca Raton, Fl: CRC Press.
Pendharkar, P. (2012). Fuzzy classification using the data envelopment analysis.Knowledge-Based Systems, 31, 183–192.
Sh, Y., Eberhart, R., & Chen, Y. (1999). Implementation of evolutionary fuzzysystems. IEEE Transactions on Fuzzy Systems, 7(2), 109–119.
Song, Q., Yang, X., Soh, Y. C., & Wang, Z. M. (2010). An information-theoretic fuzzy C-spherical shells clustering algorithm. Fuzzy Sets and Systems, 161, 1755-1773.
Souza, R., Rittner, L., & Lotufo, R. (2014). A comparison between k-optimum pathforest and k-nearest neighbors supervised classifiers. Pattern Recognition Letters,39, 2–10.
Sun, S., Zhang, C., & Zhang, D. (2007). An experimental evaluation of ensemblemethods for EEG signal classification. Pattern Recognition Letters, 28, 2157–2163.
Tao, J., Li, Q., Zhu, C., & Li, J. (2012). A hierarchical naive Bayesian network classifier embedded GMM for textural image. International Journal of Applied Earth Observation and Geoinformation, 14, 139-148.
Wu, C.-F., Lin, C.-J., & Lee, C.-Y. (2011). A functional neural fuzzy network forclassification applications. Expert Systems with Applications, 38, 6202–6208.
Yao, Y. Y. (2001). Information granulation and rough set approximation.International Journal of Intelligent Systems, 16(1), 87–104.
Zadeh, L. A. (1997). Toward a theory of fuzzy information granulation and itscentrality in human reasoning and fuzzy logic. Fuzzy Sets and Systems, 90,111–127.
Zhang, Y., Lu, Z., & Li, J. (2010). Fabric defect classification using radial basis functionnetwork. Pattern Recognition Letters, 31, 2033–2042.