A design of granular fuzzy classifier
Seok-Beom Roh (a), Witold Pedrycz (b,c), Tae-Chon Ahn (a,*)

(a) Department of Electronics Convergence Engineering, Wonkwang University, 344-2, Shinyong-Dong, Iksan, Jeonbuk 570-749, South Korea
(b) Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Alberta T6G 2G7, Canada
(c) Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland

* Corresponding author. Tel.: +82 63 850 6344; fax: +82 63 853 2196. E-mail addresses: [email protected] (S.-B. Roh), [email protected] (W. Pedrycz), [email protected] (T.-C. Ahn).

http://dx.doi.org/10.1016/j.eswa.2014.04.040
0957-4174/© 2014 Published by Elsevier Ltd.
Keywords: Granular fuzzy classifier; Pattern classifier; Information granules; Weighting scheme
Abstract: In this paper, we propose a new design methodology of granular fuzzy classifiers based on the concepts of information granularity and information granules. The classifier uses the mechanism of information granulation, with the aid of which the entire input space is split into a collection of subspaces. When designing the proposed fuzzy classifier, these information granules are constructed in such a way that they are made reflective of the geometry of patterns belonging to individual classes. Although the elements involved in the generated information granules (clusters) seem to be homogeneous with respect to the distribution of patterns in the input (feature) space, they could still exhibit a significant level of heterogeneity when it comes to the class distribution within the individual clusters. To build an efficient classifier, we improve the class homogeneity of the originally constructed information granules (by adjusting the prototypes of the clusters) and use a weighting scheme as an aggregation mechanism.
1. Introduction
Classification is a method of supervised learning producing a mapping from a feature space onto the classes encountered in the classification problem. Classification problems are encountered in various domains, including medicine (Sun, Zhang, & Zhang, 2007), economics (Zhang, Lu, & Li, 2010), and fault detection (Guo, Jack, & Nandi, 2005). In order to improve classification performance, a large number of methods have been developed. There are various important categories of classification techniques, including statistical techniques, neural networks, and rule-based classification techniques (Pajares, Guijarro, & Ribeiro, 2010).
Among statistical techniques, we encounter various approaches such as the weighted voting scheme (Jahromi & Taheri, 2008), the naive Bayes approach (Tao, Li, Zhu, & Li, 2012), least squares and logistic regression (Pendharkar, 2012), and nearest neighbor classification (Chosh, 2012). Most "conventional" statistical classification techniques are based on Bayesian decision theory, where the class label of a given pattern is decided based on the posterior probability. This aspect of the statistical classification approach results in a certain drawback: if the underlying assumptions are not met, the efficiency of these classification techniques could be negatively impacted (Wu, Lin, & Lee, 2011).
There are numerous research activities in neural classification. In light of the existing developments, neural networks form a promising alternative to various conventional classification methods (Wu et al., 2011). Neural networks have been applied to various fields such as pattern recognition, modeling, and prediction.
Although various types of neural network classifiers have shown very good classification performance, there are several difficulties when using them. There is a large number of parameters (connections) to be estimated in neural network classifiers (Oh, Kim, Pedrycz, & Park, 2011), and neural networks are "black boxes" that lack interpretability (Wu et al., 2011).
Radial basis function neural networks (RBF NNs) came as a sound design alternative when it comes to the reduction of the number of parameters to be adjusted. RBF NNs exhibit some advantages, including global optimal approximation and classification capabilities and a rapid convergence of the learning process (Wu et al., 2011). Although RBF NNs exhibit powerful capabilities with respect to their classification performance and learning speed, they do not offer interpretability. On the other hand, it is well known that fuzzy logic can handle uncertainty and vagueness (Ganji & Abadeh, 2011). Subsequently, the use of "if-then" rules improves the interpretability of the results and provides better insight into the structure of the classifier (Alcala-Fdez, Alcala, & Herrera, 2011; Chacon-Murguia, Nevarez-Santana, & Sandoval-Rodriguez, 2009; Ishibuchi & Yamamoto, 2005; Juang & Chen, 2012) and the decision making process (Sh, Eberhart, & Chen, 1999).
These observations lead to the emergence of hybrid architectures bringing together the learning capabilities of neural networks and the interpretability of fuzzy systems, giving rise to neurofuzzy architectures (Nandedkar & Biswas, 2009).
In this study, we propose a new design methodology for fuzzy classifiers. The proposed design method is based on information granulation, originally introduced by Zadeh (1997). More specifically, information granulation is a process which decomposes a universe of discourse into some regions of high homogeneity (Liu, Xiong, & Fang, 2011).
Lin studied granular computing and neighborhood systems, mainly focusing on a granular computing model which included the binary relation, the granular structure, the granule's representation, and the applications of granular computing (Lin, 1988, 2000, 2005a, 2005b). Yao introduced rough sets to granular computing and discussed data mining methods, rule extraction methods, and machine learning methods based on granular computing in Yao (2001). Bargiela and Pedrycz (2003) established the fundamentals of granular computing. The most recent advancements, along with a comprehensive treatment of the subject area, are presented in the literature (Pedrycz, 2013).
We develop information granules on the basis of given numeric data (patterns) by exploiting two approaches: fuzzy clustering and a supervised optimization algorithm. The Fuzzy C-Means (FCM) clustering algorithm is used to form information granules in an unsupervised mode. In a supervised mode, Particle Swarm Optimization (PSO) searches for an optimal distribution of prototypes over the feature space.
After forming the information granules, we determine the distribution of patterns belonging to each class and allocated to a given cluster. This provides information about the heterogeneity of the clusters, which in the sequel is used to determine the class assignment of a pattern to be classified.
The paper is structured as follows. First, in Section 2, we introduce the new granular classifier. In Section 3, we present experimental results. Conclusions are covered in Section 4.
Fig. 1. Example 2-dimensional patterns and their prototypes.
2. Design of a granular fuzzy classifier
Information granulation is defined as a process which partitions a universe into several regions (Song, Yang, Soh, & Wang, 2010). Information granules are built through information granulation for the given patterns expressed in some feature space. Information granulation can be done from different viewpoints.
Information granules being reflective of the geometry of patterns can be realized by running a certain clustering algorithm. In our study, we are concerned with the Fuzzy C-Means (FCM). As noted earlier, the granulation process is realized in an unsupervised mode (via clustering), and subsequently the clustering results are improved by adjusting the positions of the prototypes already formed by the FCM method.
In this study, we adhere to the standard notation. A finite set of data (patterns) is denoted by $X = \{x_1, x_2, \ldots, x_N\}$, where $x_i \in \mathbb{R}^n$.
2.1. Information granulation realized in unsupervised mode and its refinement in supervised mode
The given patterns are clustered by the FCM method into "c" clusters. The method generates "c" prototypes $v_1, v_2, \ldots, v_c$ (centers of the clusters) and a partition matrix whose rows are membership functions of successive fuzzy sets.
2.1.1. The unsupervised mode for information granulation

Clustering is the assignment of a set of observations into clusters so that data located in the same cluster are similar in a certain sense. The FCM clustering algorithm can be succinctly described as follows. The FCM clustering method creates a collection of information granules in the form of fuzzy sets (Song et al., 2010).
To elaborate on the essence of the method, let us consider a set of patterns $X = \{x_1, x_2, \ldots, x_N\}$, $x_k \in \mathbb{R}^m$ (where $m$ stands for the dimensionality of the input space).
The objective function used in FCM clustering is defined as follows:

$$J = \sum_{i=1}^{c} \sum_{k=1}^{N} (u_{ik})^q \, \|x_k - v_i\|^2 \quad \text{s.t.} \quad \sum_{i=1}^{c} u_{ik} = 1 \qquad (1)$$
where $u_{ik}$ is the membership degree of the $k$th pattern in the $i$th cluster, $v_i$ is the center of the $i$th cluster, and $c$ is the number of clusters.
The optimization problem is expressed in the form

$$\min_{U, v} J \quad \text{subject to} \quad \sum_{i=1}^{c} u_{ik} = 1 \qquad (2)$$
The clustering procedure solves (2) iteratively through two update formulas, (3) and (4), which successively modify the partition matrix and the location of the prototypes.
$$R_i(x_k) = u_{ik} = \frac{1}{\sum_{j=1}^{c} \left( \dfrac{\|x_k - v_i\|}{\|x_k - v_j\|} \right)^{2/(q-1)}} \qquad (3)$$
where $R_i(x_k)$ denotes the membership function of the fuzzy set $R_i$. The center of the $i$th cluster is determined so as to minimize the objective function (1) as follows:
$$v_i = \frac{\sum_{k=1}^{N} (u_{ik})^q \, x_k}{\sum_{k=1}^{N} (u_{ik})^q} \qquad (4)$$
As an illustration, Fig. 1 shows the location of the prototypes generated by the FCM for synthetic two-dimensional data.
Fig. 2 shows the activation levels (membership functions) describing the fuzzy clusters.
For the $i$th cluster, we determine the data belonging to this cluster according to the following expression:

$$X_i = \{ x_k \mid R_i(x_k) = \max_j R_j(x_k) \}, \quad k = 1, 2, \ldots, N \qquad (5)$$
where $N$ is the number of patterns. The local areas described by (5) are depicted in Fig. 3.
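As a complement to Eqs. (1)-(5), the following is a minimal sketch of the FCM update loop in Python (written for this presentation; the function name, the random initialization, and the convergence tolerance are our illustrative choices, not prescribed by the paper).

```python
import numpy as np

def fcm(X, c, q=2.0, n_iter=100, tol=1e-6, seed=0):
    """Fuzzy C-Means: returns prototypes v (c x m) and partition matrix U (c x N)."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    U = rng.random((c, N))
    U /= U.sum(axis=0, keepdims=True)                # columns sum to 1, the constraint of Eq. (1)
    for _ in range(n_iter):
        Um = U ** q
        v = (Um @ X) / Um.sum(axis=1, keepdims=True)          # prototype update, Eq. (4)
        d = np.linalg.norm(X[None, :, :] - v[:, None, :], axis=2)
        d = np.fmax(d, 1e-12)                                  # guard against zero distances
        # membership update, Eq. (3): u_ik = 1 / sum_j (d_ik / d_jk)^(2/(q-1))
        U_new = 1.0 / ((d[:, None, :] / d[None, :, :]) ** (2.0 / (q - 1.0))).sum(axis=1)
        if np.abs(U_new - U).max() < tol:
            return v, U_new
        U = U_new
    return v, U

# Crisp local areas of Eq. (5): each pattern goes to its most activated cluster, e.g.
# v, U = fcm(X, c=4); labels = U.argmax(axis=0)
```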
2.1.2. The supervised mode in the refinement of information granulation

In the unsupervised mode, the prototypes of the clusters are determined by the clustering method, where we investigate the
Fig. 2. Contour plot of membership functions of the clusters.
Fig. 3. The local areas formed for the clusters.
Fig. 5. Fraction distribution of patterns located in each cluster.
distribution of the patterns without considering information about class labels, viz. not looking at the classes the patterns belong to.
In the supervised mode, the locations of the prototypes obtained so far are refined by running particle swarm optimization (PSO). The prototypes are adjusted so that the classification performance is maximized. In this section, we briefly elaborate on the essence of the PSO. The method is a bio-inspired optimization technique; the algorithm mimics the social behavior of a flock of birds. The underlying principle comes from a population-based search in which the individuals in the solution space (representing possible solutions) carry out a collective search by exchanging their individual findings while taking into consideration their own local experience and evaluating their own previous performance. The PSO method is becoming popular due to its simplicity of implementation and its ability to quickly converge to an optimal solution.

Fig. 4. The structure of the particle for information granulation.
The position of each particle in the solution space is updated as follows:

$$p_i(k+1) = p_i(k) + v_i(k+1) \qquad (6)$$
where $p_i(k)$ and $v_i(k)$ represent the position and velocity of the $i$th particle at the $k$th step, respectively.
The velocity vector reflects both the particle's own experiential knowledge and the social information exchanged with the other particles, which constitute a society.
The velocity of each particle at the sampling instant $k+1$ is calculated in the form

$$v_i(k+1) = w(k)\, v_i(k) + c_1 r_1 \left( pbest_i(k) - p_i(k) \right) + c_2 r_2 \left( gbest(k) - p_i(k) \right) \qquad (7)$$

where $r_1$ and $r_2$ are random numbers drawn from the uniform distribution over $[0, 1]$, and $c_1$ and $c_2$ are positive coefficients referred to as the cognitive and social parameter, respectively.
As PSO is an iterative search strategy, we iterate until there is no substantial improvement of the fitness function or we have exceeded the number of iterations allowed in this search. In this paper, the fitness function is defined as (19).
In general, the algorithm can be outlined as the following sequence of steps:
Step 1: Randomly generate "N" particles $p_i$ and their velocities $v_i$. Each particle in the initial swarm (population) is evaluated using the objective (fitness) function. For each particle, set $pbest_i = p_i$ and search for the best particle among the $pbest_i$. Set the best particle as the global best, $gbest$.
Step 2: Adjust the inertia weight $w$. Typically, its value decreases linearly over the time of search. We start with its maximal value, say $w_{max} = 0.9$, and reduce it to $w_{min} = 0.4$ at the end of the iterative process,
$$w(k) = w_{max} - \frac{w_{max} - w_{min}}{iter_{max}} \, k \qquad (8)$$
where $iter_{max}$ denotes the maximum number of iterations of the search and "k" stands for the current index of the iteration.
Step 3: Given the current values of $gbest$ and $pbest_i$, the velocity of the $i$th particle is adjusted by (7). If required, we clip the values, making sure that they are positioned within the required search region.
Step 4: Based on the updated velocities, each particle changes its position using expression (6). Furthermore, we keep the particle within the boundaries of the search space, that is
$$p_{min} \le p_i \le p_{max} \qquad (9)$$
Step 5: Move the particles in the search space and evaluate their fitness both in terms of $gbest$ and $pbest_i$.
Step 6: Repeat Steps 2-5 until the termination criterion has been met; then return $gbest$ as the solution that has been found.
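The steps above translate directly into code. Below is a compact sketch of the PSO loop of Eqs. (6)-(8) under the parameter settings of Table 1 (swarm size 70, 50 generations); the choices $c_1 = c_2 = 2.0$, the box bounds, and the function name are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def pso(fitness, dim, n_particles=70, n_iter=50, bounds=(0.0, 1.0),
        c1=2.0, c2=2.0, w_max=0.9, w_min=0.4, seed=0):
    """Minimize `fitness` over a box-bounded search space (Steps 1-6 above)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    p = rng.uniform(lo, hi, (n_particles, dim))          # Step 1: random particles
    vel = np.zeros_like(p)
    pbest = p.copy()
    pbest_f = np.array([fitness(x) for x in p])
    gbest = pbest[pbest_f.argmin()].copy()
    for k in range(n_iter):
        w = w_max - (w_max - w_min) / n_iter * k         # Step 2: inertia weight, Eq. (8)
        r1, r2 = rng.random((2, n_particles, 1))
        vel = (w * vel + c1 * r1 * (pbest - p)           # Step 3: velocity update, Eq. (7)
               + c2 * r2 * (gbest - p))
        p = np.clip(p + vel, lo, hi)                     # Step 4: position, Eq. (6), kept within Eq. (9)
        f = np.array([fitness(x) for x in p])            # Step 5: evaluate fitness
        better = f < pbest_f
        pbest[better], pbest_f[better] = p[better], f[better]
        gbest = pbest[pbest_f.argmin()].copy()           # Step 6: iterate, then return gbest
    return gbest, pbest_f.min()
```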
We adopt the PSO algorithm to optimize the information granules (i.e., the position of each prototype in the feature space) and to optimize the fuzzification coefficient of the FCM method, which impacts the shape of the fuzzy membership functions. The membership function of a cluster depends on the position of the prototypes and on the fuzzification coefficient.
Fig. 4 shows the structure of a particle used in the PSO algorithm for information granulation.
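Assuming the particle layout suggested by Fig. 4 (the fuzzification coefficient in the first slot, followed by the coordinates of the $c$ prototypes), a hypothetical encode/decode pair could look as follows; the helper names are ours, not the paper's.

```python
import numpy as np

def decode_particle(particle, c, m):
    """Split a particle into the fuzzification coefficient and the c prototypes in R^m."""
    q = particle[0]                  # fuzzification coefficient occupies the first slot (cf. Fig. 4)
    v = particle[1:].reshape(c, m)   # remaining entries: coordinates of the c prototypes
    return q, v

def encode_particle(q, v):
    """Inverse operation: pack q and the prototype matrix into one flat vector."""
    return np.concatenate(([q], v.ravel()))
```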
2.2. Analysis of information granules
As shown in Fig. 3, each cluster defined by a clustering approach (in this paper, we use the FCM to define several clusters) comes as a mixture of patterns belonging to different classes. The center point of a cluster may be considered a representative of that cluster from a geometrical viewpoint. We think that the center points defined by a clustering method can be considered as the code books of prototype-based classifiers such as k-Nearest Neighbor (kNN) and Learning Vector Quantization (LVQ). In general, a code book entry of a prototype-based classifier is associated with only one class label. However, in the case of a prototype determined by the unsupervised clustering method, the prototype is involved in several classes.

Fig. 6. Boundary surface of the proposed classifier with class fraction matrix: (a) crisp class fraction matrix [1 0; 0 1; 1 0; 0 1]; (b) fuzzy class fraction matrix [0.9 0.1; 0.3 0.7; 0.2 0.8; 0.7 0.3]; (c) fuzzy class fraction matrix [0.8 0.2; 0.3 0.7; 0.1 0.9; 0.6 0.4]; (d) fuzzy class fraction matrix [0.9 0.1; 0.7 0.3; 0.2 0.8; 0.7 0.3].

Fig. 8. Prototypes and the contour plots of the membership functions.
Therefore, in order to understand the characteristics of the prototypes, we calculate the contribution of each class to a given prototype.
The fraction of patterns in the $i$th cluster $X_i$ belonging to the $l$th class ($l = 1, 2, \ldots, p$, where "p" is the number of classes) is defined in the form

$$a_{il} = \frac{\Sigma\text{-count}(\{ x_k \in X_i,\; x_k \in H_l \})}{\Sigma\text{-count}(X_i)} \qquad (10)$$
where $H_l = \{ x_k \mid x_k \text{ belongs to class } l \}$. In (10), $\Sigma\text{-count}(X_i)$ is defined as (11-a) or (11-b). Zadeh's sigma-count, defined as (11-b), is a generalized version of the classical cardinality of the set $X_i$; the classical cardinality of $X_i$ is defined as (11-a).
$$\text{Type a:} \quad \Sigma\text{-count}(X_i) = \sum_{x} I_{X_i}(x), \qquad I_{X_i}(x) = \begin{cases} 1, & \text{if } x \in X_i \\ 0, & \text{if } x \notin X_i \end{cases} \qquad (11\text{-a})$$

$$\text{Type b:} \quad \Sigma\text{-count}(X_i) = \sum_{x \in X_i} R_i(x) \qquad (11\text{-b})$$
where $R_i(x)$ is the membership function of the $i$th cluster. In this paper, we use the two types of $\Sigma\text{-count}(X_i)$ described in (11) to calculate the fraction of patterns in the $i$th cluster $X_i$. Fig. 5 shows the fraction of data patterns located in each cluster. As shown in Fig. 5, the clusters are not homogeneous: the information granules are heterogeneous from the viewpoint of class labels.
Table 1
Selected numeric values of the parameters of the proposed technique.

FCM parameters:
  Number of rules in each class (r): 2, 4, 6, 8, 10, 15
  Fuzzification coefficient (p): in the range of 1.2-3.0, varying with a step of 0.2
PSO parameters:
  Swarm size: 70
  Maximum number of generations: 50
Fig. 7. Two-class synthetic dataset of a mixture of Gaussian distributions.
Fig. 9. The local area implied by the prototype.
Fig. 10. The distribution of classes in each cluster.
The overall feature space is split into several local areas through the extracted prototypes, and these areas are used to complete the classification.
The class fraction in the $i$th cluster forms a vector

$$a_i = [\, a_{i1} \;\; a_{i2} \;\; \ldots \;\; a_{ip} \,], \quad a_{il} \in [0, 1] \qquad (12)$$
$$\sum_{k=1}^{p} a_{ik} = 1 \qquad (13)$$
The class fraction should satisfy condition (13). The class fraction matrix is defined as (14):
$$M = \begin{bmatrix} a_1 \\ \vdots \\ a_c \end{bmatrix} = \begin{bmatrix} a_{11} & \cdots & a_{1p} \\ \vdots & \ddots & \vdots \\ a_{c1} & \cdots & a_{cp} \end{bmatrix} \qquad (14)$$
Table 2
Machine learning datasets used in the experiments.

Datasets    | Number of features | Number of patterns (data) | Number of classes
Australian  | 42 | 690  | 2
Balance     | 4  | 625  | 3
Diabetes    | 8  | 768  | 2
German      | 24 | 1000 | 2
Glass       | 9  | 214  | 6
Hayes       | 5  | 132  | 3
Ionosphere  | 34 | 351  | 2
Iris        | 4  | 150  | 3
Liver       | 6  | 345  | 2
Sonar       | 60 | 208  | 2
Thyroid     | 5  | 215  | 3
Vehicle     | 18 | 846  | 4
Wine        | 13 | 178  | 3
Zoo         | 16 | 101  | 7
Fig. 11. Classification error rate of the proposed classifier (Type a) according to the increase of clusters: (a) training data with unsupervised mode; (b) test data with unsupervised mode; (c) training data with supervised mode; (d) test data with supervised mode.
2.3. Aggregation of the information granules
In the previous section, we analyzed the input space by using a clustering method. After that, we calculated the class fraction distribution of the data patterns over each subspace (i.e., cluster) using (11). For example, let us assume that we obtain the class fraction matrix (15).
$$M = \begin{bmatrix} 0.75 & 0.25 \\ 0.6 & 0.4 \\ 0.2 & 0.8 \end{bmatrix} \qquad (15)$$
In the second cluster, the class fraction distribution is $a_2 = [0.6 \;\; 0.4]$; this distribution means that the possibility that a data pattern in the second cluster belongs to class 1 is 0.6, and the possibility related to class 2 is 0.4. In other words, the element $a_{ij}$ of the class fraction matrix expresses how possible it is that a data pattern in the $i$th cluster belongs to the $j$th class. The class fraction matrix thus conveys locally defined information related to each cluster. However, we are interested in the possibility that a data pattern belongs to each class over the whole input space, not only over a local region. Therefore, we have to aggregate the local information extracted from the class fraction matrix.
When a new pattern $x$ is provided, we determine the activation levels of the predefined clusters. The activation levels $u = [\, u_1 \;\; u_2 \;\; \ldots \;\; u_c \,]$ are calculated as in (3), where $u_1 = R_1(x),\; u_2 = R_2(x),\; \ldots,\; u_c = R_c(x)$.
The possibility that a new data pattern belongs to a class over the whole input space can be calculated as the linear combination of the elements of the class fraction matrix with the activation levels $u$, as in (16).
$$\bar{u} = u \cdot M = \left[ \; \sum_{i=1}^{c} u_i \, a_{i1} \quad \sum_{i=1}^{c} u_i \, a_{i2} \quad \ldots \quad \sum_{i=1}^{c} u_i \, a_{ip} \; \right] \qquad (16)$$
The overall results are described in a vector form as shown below:

$$\bar{u} = [\, \bar{u}_1 \;\; \bar{u}_2 \;\; \ldots \;\; \bar{u}_p \,] \qquad (17)$$
Ideally, if $x \in$ class $l$, one would like to have a result coming as (18), with the single 1 at the $l$th position:

$$\bar{u} = [\, 0 \; \cdots \; 0 \;\; 1 \;\; 0 \; \cdots \; 0 \,] \qquad (18)$$
The overall performance index is then the average distance between the estimation vector $\bar{u}$ and the target class vectors $u^* = [\, 0 \; \cdots \; 1 \; \cdots \; 0 \,]$, defined as

$$Q = \frac{\sum_{k=1}^{N} \| \bar{u}(x_k) - u^*(x_k) \|^2}{N} \qquad (19)$$
Table 3
Classification performance (error rate, %) of the proposed classifier.

Data set    | Metric         | Type A unsup. | Type A sup. | Type B unsup. | Type B sup.
Australian  | c              | 8     | 4     | 8     | 10
            | q              | 1.2   | N/A   | 1.2   | N/A
            | Error rate (%) | 14.49 | 14.20 | 14.64 | 14.35
Balance     | c              | 10    | 15    | 15    | 15
            | q              | 1.8   | N/A   | 1.8   | N/A
            | Error rate (%) | 14.24 | 9.12  | 13.13 | 8.96
Diabetes    | c              | 6     | 10    | 15    | 8
            | q              | 1.4   | N/A   | 1.4   | N/A
            | Error rate (%) | 29.82 | 23.84 | 29.56 | 23.95
German      | c              | 15    | 6     | 2     | 10
            | q              | 1.2   | N/A   | 1.2   | N/A
            | Error rate (%) | 29.70 | 27.40 | 30.0  | 28.10
Glass       | c              | 15    | 15    | 15    | 10
            | q              | 1.6   | N/A   | 1.6   | N/A
            | Error rate (%) | 37.38 | 35.13 | 37.36 | 35.04
Hayes       | c              | 45    | 15    | 45    | 6
            | q              | 1.2   | N/A   | 1.4   | N/A
            | Error rate (%) | 40.82 | 34.07 | 27.97 | 34.18
Ionosphere  | c              | 10    | 15    | 10    | 10
            | q              | 1.2   | N/A   | 1.2   | N/A
            | Error rate (%) | 14.20 | 12.77 | 11.61 | 13.08
Iris        | c              | 15    | 15    | 10    | 6
            | q              | 1.6   | N/A   | 1.4   | 2.0
            | Error rate (%) | 4.00  | 5.33  | 4.00  | 3.33
Liver       | c              | 10    | 8     | 10    | 10
            | q              | 1.2   | N/A   | 1.2   | N/A
            | Error rate (%) | 38.87 | 31.31 | 39.71 | 26.97
Sonar       | c              | 15    | 10    | 15    | 15
            | q              | 1.2   | N/A   | 1.2   | N/A
            | Error rate (%) | 29.79 | 24.60 | 27.86 | 24.52
Thyroid     | c              | 8     | 8     | 15    | 15
            | q              | 1.2   | N/A   | 1.2   | N/A
            | Error rate (%) | 7.94  | 2.79  | 6.88  | 4.11
Vehicle     | c              | 15    | 10    | 15    | 8
            | q              | 1.2   | N/A   | 1.2   | N/A
            | Error rate (%) | 44.08 | 36.64 | 43.60 | 31.92
Wine        | c              | 84    | 8     | 10    | 15
            | q              | 1.2   | N/A   | 1.4   | N/A
            | Error rate (%) | 2.22  | 2.29  | 1.70  | 2.84
Zoo         | c              | 15    | 15    | 15    | 8
            | q              | 1.2   | N/A   | 1.2   | N/A
            | Error rate (%) | 8.00  | 7.00  | 8.00  | 7.91

c — number of clusters; q — fuzzification coefficient.
So far, the overall performance index (19) expresses the performance of the classifier produced in an unsupervised mode.
Next, we have to optimize the position of the prototypes in the supervised mode.
Here, we minimize the performance index $Q$ by moving the originally computed prototypes $v_1, v_2, \ldots, v_c$. As stated, this optimization procedure is realized with the aid of the PSO.
2.4. The variation of the boundary surface under the influence of the class fraction matrix
As mentioned above, the generic prototypes of prototype-based classifiers carry only one class label. For generic prototype-based classifiers, the class label of a prototype can be described in a vector form as (20).
$$a_i = [\, a_{i1} \;\; a_{i2} \;\; \ldots \;\; a_{ip} \,], \quad a_{ik} \in \{0, 1\}, \quad \sum_{k=1}^{p} a_{ik} = 1 \qquad (20)$$
Eq. (20) is the crisp version of the class fraction vector defined in (12).
The difference between the crisp version of the class fraction vector and the fuzzy version is depicted in Fig. 6.
Fig. 6 shows the boundary surfaces of prototype-based classifiers with different class fraction matrices.
As shown in Fig. 6, the boundary surface of the proposed classifier is very flexible. For the conventional prototype-based classifier, only a variation of the location of the prototypes can change the boundary surface. For the proposed classifier, the boundary surface can be changed by varying either the position of the prototypes or the class fraction matrix.
3. Experimental studies
In this section, we report on a number of experiments that help evaluate the classification performance of the proposed granular fuzzy classifier.
In the experiments, we use a series of synthetic datasets and several machine learning datasets (http://www.ics.uci.edu/~mlearn/MLRepository.html). In the assessment of the performance of the classifiers, we use the error rate of the classifier. K-fold cross-validation has been applied to evaluate the classification results: the whole data set is divided into K blocks, K − 1 blocks are used as the training set, and the remaining block serves as the test set. Here, the number of blocks is set to 10.
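For concreteness, here is a generic sketch of the 10-fold protocol just described; `train_and_predict` is a placeholder for the full design pipeline (FCM granulation, PSO refinement, aggregation) and is not part of the paper.

```python
import numpy as np

def kfold_error_rate(X, y, train_and_predict, K=10, seed=0):
    """Average error rate (%) over K folds; one block is held out per fold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), K)
    errors = []
    for k in range(K):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(K) if j != k])
        y_hat = train_and_predict(X[train], y[train], X[test])
        errors.append(100.0 * np.mean(y_hat != y[test]))
    return float(np.mean(errors))
```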
We consider the values of the parameters reported in Table 1. The choice of these particular numeric values has been motivated by the need to investigate the performance of the model in a fairly comprehensive range of scenarios.

Fig. 12. Classification error rate of the proposed classifier (Type b) according to the increase of clusters: (a) training data with unsupervised mode; (b) test data with unsupervised mode; (c) training data with supervised mode; (d) test data with supervised mode.
3.1. Synthetic datasets
Two-dimensional synthetic examples are convenient for illustrating the design procedure of the fuzzy classifier.
We use a normally distributed data set composed of sub-groups described by their mean vectors $m$ and covariance matrices $U$, as shown in Fig. 7. There are two sub-groups; each class is composed of two clusters.
The mean vectors $m_i$ and the covariance matrix (being the same for all groups) are the following:

$$m_1 = \begin{bmatrix} -1 \\ 0 \end{bmatrix}, \quad m_2 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad U = \begin{bmatrix} 0.5 & 0.0 \\ 0.0 & 0.5 \end{bmatrix}$$
Each sub-group consists of 200 data points. We show the position of the prototypes determined with the use of the FCM in Fig. 8. The local areas related to the prototypes are depicted in Fig. 9; in this figure, the local areas are defined by (5). After determining the local areas, we calculate the fraction of classes included in each cluster. Fig. 10 shows the values of these fractions. For clusters 1 and 4, there are many more patterns belonging to class 1 than to the other class, whereas the patterns belonging to class 2 are predominantly visible in clusters 2 and 3.
From the fraction of classes and the apex defined in each cluster, we can build up a fuzzy classifier based on the granulation as follows:
if x is C1 then p1 = 0.8663, p2 = 0.1337
if x is C2 then p1 = 0.2028, p2 = 0.7972
if x is C3 then p1 = 0.1403, p2 = 0.8597
if x is C4 then p1 = 0.7972, p2 = 0.2028        (21)
When a new pattern $x_{new}$ is given, its class label is determined by calculating (22).
Table 4
Results of comparative analysis in terms of the number of prototypes and the classification performance.

Data set    | kNN: N | kNN K=1 | kNN K=2 | kNN K=3 | Proposed: N | Proposed classifier
Australian  | 621    | 20.0    | 16.38   | 15.22   | 4           | 14.20
Balance     | 562.5  | 20.97   | 13.77   | 12.16   | 15          | 8.96
Diabetes    | 691.2  | 29.82   | 27.35   | 26.82   | 10          | 23.84
German      | 900    | 32.20   | 27.90   | 27.60   | 6           | 27.40
Glass       | 192.6  | 29.5    | 28.05   | 32.21   | 10          | 35.04
Hayes       | 118.8  | 29.67   | 48.57   | 66.21   | 14          | 27.97
Ionosphere  | 315.9  | 13.67   | 13.40   | 15.10   | 10          | 11.61
Iris        | 135    | 4.67    | 4.67    | 4.67    | 6           | 3.33
Liver       | 310.5  | 37.08   | 38.27   | 40.90   | 10          | 26.97
Sonar       | 187.2  | 13.43   | 13.98   | 15.38   | 15          | 24.52
Thyroid     | 193.5  | 2.79    | 6.54    | 6.06    | 8           | 2.79
Vehicle     | 761.4  | 30.14   | 28.49   | 27.31   | 8           | 31.92
Wine        | 160.2  | 5.03    | 5.03    | 4.48    | 10          | 1.70
Zoo         | 90.9   | 2.91    | 7.82    | 4.91    | 15          | 7.00

N means the number of prototypes; error rates in %.
Table 5
Results of comparative analysis (error rate, %; the best results shown in boldface in the original).

Datasets    | Proposed classifier | PFARS (Wu et al., 2011) | CCP (Yao, 2001) | PART (WEKA) | Bayes networks (WEKA) | SMO (WEKA) | RBFNN (WEKA)
Australian  | 14.20 (1.91)  | 13.9 | N/A   | 15.55 | 22.14 | 15.12 | 20.45
Balance     | 8.96 (1.53)   | 33.1 | N/A   | 16.83 | 9.47  | 12.43 | 13.81
Diabetes    | 23.84 (4.69)  | 24.7 | N/A   | 26.55 | 24.25 | 23.2  | 25.96
German      | 27.40 (2.84)  | 30.0 | N/A   | 29.46 | 24.84 | 24.91 | 26.42
Glass       | 35.04 (6.67)  | N/A  | 28.51 | 31.25 | 50.55 | 42.64 | 35.08
Hayes       | 27.97 (10.62) | N/A  | N/A   | 22.03 | 39.34 | 44.07 | 25.99
Ionosphere  | 11.61         | N/A  | N/A   | 8.25  | 10.54 | 11.40 | 7.09
Iris        | 3.33 (3.51)   | 4.0  | N/A   | 5.8   | 4.47  | 3.73  | 4.0
Liver       | 26.97 (6.91)  | 32.4 | N/A   | 34.75 | 45.11 | 42.02 | 34.94
Sonar       | 24.52 (9.40)  | N/A  | 22.6  | 22.6  | 32.29 | 23.4  | 27.38
Thyroid     | 2.79          | N/A  | N/A   | 6.08  | 5.61  | 10.26 | 3.25
Vehicle     | 31.92 (5.96)  | N/A  | 30.5  | 27.79 | 55.32 | 25.92 | 34.64
Wine        | 1.70 (3.78)   | 4.0  | N/A   | N/A   | N/A   | N/A   | N/A
Zoo         | 7.0           | N/A  | N/A   | 7.82  | 3.0   | 6.91  | 5.82
$$y = \begin{cases} 1, & \text{if } \bar{u}_1 > \bar{u}_2 \\ 2, & \text{if } \bar{u}_1 < \bar{u}_2 \end{cases} \qquad (22)$$

where $\bar{u} = [\, \bar{u}_1 \;\; \bar{u}_2 \,]$ is computed by (16), $R_1(x_{new})$, $R_2(x_{new})$, $R_3(x_{new})$, and $R_4(x_{new})$ are the activation levels of the information granules, and

$$a = \begin{bmatrix} 0.8663 & 0.1337 \\ 0.2028 & 0.7972 \\ 0.1403 & 0.8597 \\ 0.7972 & 0.2028 \end{bmatrix}$$
The misclassification rate of the fuzzy classifier described by (21) is 8.5%, and the performance index defined as (19) is 0.1842.
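Putting rules (21) and decision (22) together, the classification of a new pattern could be sketched as follows; the prototypes and the fuzzification coefficient `q` are assumed to come from the FCM run of Fig. 8, and the function name is ours.

```python
import numpy as np

a = np.array([[0.8663, 0.1337],      # class fractions of rules (21), clusters C1..C4
              [0.2028, 0.7972],
              [0.1403, 0.8597],
              [0.7972, 0.2028]])

def predict(x_new, prototypes, q=2.0):
    """Activate the four granules by Eq. (3), aggregate by Eq. (16), decide by Eq. (22)."""
    d = np.fmax(np.linalg.norm(prototypes - x_new, axis=1), 1e-12)
    u = 1.0 / ((d[:, None] / d[None, :]) ** (2.0 / (q - 1.0))).sum(axis=1)   # Eq. (3)
    u_bar = u @ a                                                            # Eq. (16)
    return 1 + int(u_bar.argmax())   # class 1 or 2, as in Eq. (22)
```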
3.2. Machine learning datasets
In what follows, we report on several experiments using machine learning data sets coming from the Machine Learning Repository (http://www.ics.uci.edu/~mlearn/MLRepository.html).
Table 2 summarizes the pertinent details of the data, such as the number of features and the number of patterns.
Table 3 summarizes the classification performance (error rate, %) of the proposed classifier developed in the unsupervised mode and, subsequently, in the supervised mode.
From Table 3, we observe that the proposed classifier constructed in the supervised mode is superior to that produced in the unsupervised mode; this happens for most of the data. In the case of the Ionosphere and Wine data, the classifier with information granules formed in the unsupervised mode is superior. This phenomenon comes from the fact that the proposed fuzzy classifier with the optimization algorithm is overfitted to the training data set.
Fig. 11 shows the change of the classification error rate of the proposed classifier (Type a) according to the increase of the number of clusters.
Fig. 12 depicts the change of the classification error rate of the proposed classifier (Type b) according to the increase of the number of clusters.
From Figs. 11 and 12, we can see that for the training data set the misclassification rate tends to decrease as the number of clusters increases.
The proposed classifier can be considered a sort of prototype-based classifier. The nearest neighbor classifier is the representative prototype-based classifier. However, the drawbacks of the nearest neighbor classifier are that it needs a large memory in order to store all the training samples, and that there is a huge computational burden in calculating the distances between all the training samples and a test sample (Souza, Rittner, & Lotufo, 2014).
Table 4 shows the comparison of the proposed classifier with the k-nearest neighbor classifier in terms of the classification performance and the number of prototypes.
Table 5 contrasts the values of the classification error of the proposed classifier with the errors produced by some other methods (Bargiela & Pedrycz, 2003; Pedrycz, 2013). The classifiers PART, Bayes networks, SMO, and RBFNN were run within the framework of WEKA. To confirm the improvement of the performance, we used the t-test at the 95% confidence level.
4. Conclusion
In this paper, we have introduced a new granular fuzzy classifier in which information granulation plays a pivotal role. The information granulation used to design the information granules is realized in an unsupervised mode and a supervised mode. In the unsupervised mode, the FCM clustering algorithm is used to construct the information granules. In the supervised mode, the information granules are further refined, and this adjustment is completed with the use of a PSO algorithm. After forming and optimizing the information granules, we aggregate them.
By looking at the classification performance, it is not surprising that the supervised mode of design produces better results than the classifier constructed in the unsupervised mode. It is of interest, though, to investigate the quantitative differences present there and the impact of the level of granularity (the number of clusters).
In contrast with the generic prototypes of conventional prototype-based classifiers, which carry crisp class labels, we have studied prototypes with fuzzy class labels. For generic prototype-based classifiers, the usual route to improving classification performance is to optimize the positions of the prototypes. In this paper, we improved the classification performance of the prototype-based classifier by assigning fuzzy class labels to each predefined prototype. In the next research step, we will analyze each cluster geometrically and define sub-prototypes in a local space to further improve the classification performance.
References
Alcala-Fdez, J., Alcala, R., & Herrera, F. (2011). A fuzzy association rule-basedclassification model for high-dimensional problems with genetic rule selectionand lateral tuning. IEEE Transactions on Fuzzy Systems, 19(5), 857–872.
Bargiela, A., & Pedrycz, W. (2003). Granular computing: An introduction. Dordrecht:Kluwer Academic Publishers.
Chacon-Murguia, M. I., Nevarez-Santana, J. I., & Sandoval-Rodriguez, R. (2009).Multiblob cosmetic defect description/classification using a fuzzy hierarchicalclassifier. IEEE Transactions on Industrial Electronics, 56(4), 1292–1299.
Chosh, A. K. (2012). A probabilistic approach for semi-supervised nearest neighborclassification. Pattern Recognition Letters, 33, 1127–1133.
Please cite this article in press as: Roh, S.-B., et al. A design of granular fuzzy clj.eswa.2014.04.040
Ganji, M. F., & Abadeh, M. S. (2011). A fuzzy classification system based on antcolony optimization for diabetes disease diagnosis. Expert Systems withApplications, 38, 14650–14659.
Guo, H., Jack, L. B., & Nandi, A. K. (2005). Feature generation using geneticprogramming with application to fault classification. IEEE Transactions onSystems, Man, and Cybernetics, Part B, 35(1), 89–99.
Ishibuchi, H., & Yamamoto, T. (2005). Rule weight specification in fuzzy rule-based classification systems. IEEE Transactions on Fuzzy Systems, 13(4),428–435.
Jahromi, M. Z., & Taheri, M. (2008). A proposed method for learning rule weights infuzzy rule-based classification systems. Fuzzy Sets and Systems, 159, 449–459.
Juang, C.-F., & Chen, G.-C. (2012). A TS fuzzy system learned through a supportvector machine in principal component space for real-time object detection.IEEE Transactions on Industrial Electronics, 59(8), 3309–3320.
Lin, T. Y. (1988). Neighborhood systems and relational database. In Proceedings of the ACM Sixteenth Annual Conference on Computer Science, New York, February 23-25 (p. 725).
Lin, T. Y. (2000). Data mining and machine oriented modeling: a granularcomputing approach. Journal of Applied Intelligence, 13(2), 113–124.
Lin, T. Y. (2005a). Granular computing rough set perspective. The Newsletter of theIEEE Computational Intelligence Society, 4, 1543–4281.
Lin, T. Y. (2005b). Granular computing: a problem solving paradigm. In Proceedings of the IEEE International Conference on Fuzzy Systems, Reno, Nevada, USA, May 22-25 (pp. 132-137).
Liu, H., Xiong, S., & Fang, Z. (2011). FL-GrCCA: A granular computing algorithmbased on fuzzy lattices. Computers and Mathematics with Applications, 61,138–147.
Nandedkar, A. V., & Biswas, P. K. (2009). A granular reflex fuzzy min-max neuralnetwork for classification. IEEE Transactions on Neural Networks, 20(7),1117–1134.
Oh, S.-K., Kim, W.-D., Pedrycz, W., & Park, B.-J. (2011). Polynomial-based radial basisfunction neural networks (P-RBF NNs) realized with the aid of particle swarmoptimization. Fuzzy Sets and Systems, 163, 54–77.
Pajares, G., Guijarro, M., & Ribeiro, A. (2010). A hopfield neural network forcombining classifiers applied to textured images. Neural Networks, 23, 144–153.
Pedrycz, W. (2013). Analysis and design of intelligent systems: A framework of granularcomputing. Boca Raton, Fl: CRC Press.
Pendharkar, P. (2012). Fuzzy classification using the data envelopment analysis.Knowledge-Based Systems, 31, 183–192.
Sh, Y., Eberhart, R., & Chen, Y. (1999). Implementation of evolutionary fuzzysystems. IEEE Transactions on Fuzzy Systems, 7(2), 109–119.
Song, Q., Yang, X., Soh, Y. C., & Wang, Z. M. (2010). An information-theoretic fuzzy C-spherical shells clustering algorithm. Fuzzy Sets and Systems, 161, 1755-1773.
Souza, R., Rittner, L., & Lotufo, R. (2014). A comparison between k-optimum pathforest and k-nearest neighbors supervised classifiers. Pattern Recognition Letters,39, 2–10.
Sun, S., Zhang, C., & Zhang, D. (2007). An experimental evaluation of ensemblemethods for EEG signal classification. Pattern Recognition Letters, 28, 2157–2163.
Tao, J., Li, Q., Zhu, C., & Li, J. (2012). A hierarchical naive Bayesian network classifier embedded GMM for textural image. International Journal of Applied Earth Observation and Geoinformation, 14, 139-148.
Wu, C.-F., Lin, C.-J., & Lee, C.-Y. (2011). A functional neural fuzzy network forclassification applications. Expert Systems with Applications, 38, 6202–6208.
Yao, Y. Y. (2001). Information granulation and rough set approximation.International Journal of Intelligent Systems, 16(1), 87–104.
Zadeh, L. A. (1997). Toward a theory of fuzzy information granulation and itscentrality in human reasoning and fuzzy logic. Fuzzy Sets and Systems, 90,111–127.
Zhang, Y., Lu, Z., & Li, J. (2010). Fabric defect classification using radial basis functionnetwork. Pattern Recognition Letters, 31, 2033–2042.