A fuzzy case based reasoning approach to value engineering


Expert Systems with Applications 38 (2011) 9334–9339


journal homepage: www.elsevier.com/locate/eswa


M.H. Fazel Zarandi ⇑, Zahra S. Razaee, M. Karbasian
Department of Industrial Engineering, Amirkabir University of Technology, Tehran, Iran

Article info

Keywords: Value engineering; Fuzzy case-based reasoning; Fuzzy clustering; Fuzzy data

0957-4174/$ - see front matter © 2011 Elsevier Ltd. All rights reserved.
doi:10.1016/j.eswa.2011.01.124

⇑ Corresponding author. Tel.: +98 21 641 3034; fax: +98 21 6641 3025.
E-mail addresses: [email protected] (M.H. Fazel Zarandi), [email protected] (Z.S. Razaee), [email protected] (M. Karbasian).

Abstract

This paper is intended to assist experts during the creativity phase of value engineering (VE) by utilizing past experiences in a specific domain, so that the same experience need not be repeated. To this end, a general fuzzy case-based reasoning (CBR) system is developed. The system uses a fuzzy clustering model for fuzzy data to facilitate case retrieval and reduce its time complexity. The inherently analogical nature of a CBR model, combined with fuzzy theory, facilitates access to more precise and systematically classified information during a VE workshop. To test the performance of the proposed system, it is applied to suburban highway design data extracted from National Cooperative Highway Research Program (NCHRP) Report 282.

© 2011 Elsevier Ltd. All rights reserved.

1. Introduction

Value engineering (VE) is an organized approach directed at analyzing the function of systems, facilities, services and supplies for the purpose of achieving their essential functions at the lowest life-cycle cost consistent with required performance, reliability, quality and safety (Mandelbaum & Reed, 2006). The VE process consists of several phases: the information phase, function analysis phase, creativity phase, evaluation phase, presentation phase and implementation phase. Creativity depends on the human brain and cannot easily be computerized by conventional programming. Case-based reasoning (CBR), an AI technique, can be used to improve the efficiency of this phase, since it exploits the specific knowledge contained in past experiences by retrieving similar past cases and adapting their solutions.

In the literature, existing models mainly involve conventional approaches, and less attention has been devoted to AI approaches. One of the earliest works was done by the US Army Corps of Engineers, which established an information retrieval system called VE-trieval. This program can be queried by keyword on a particular subject to obtain an abstract and other useful information (Degenhardt, 1985). Park (1994) developed VEPRO, a spreadsheet rule-based system with database features that consists of several models parallel to the VE job plan. Alcantara (1996) designed a support program for the information phase of VE, which assigned a data structure for representing and performing analytical tasks on rational data. A computer model for VE methodology emphasizing life-cycle cost calculations was developed by Assaf, Jannadi, and Al-Tamimi (2000). Dahim (2001) at the University of Pittsburgh developed an expert system for VE application in suburban highway design; it uses the analytical hierarchy process (AHP) for the evaluation phase of VE. Naderpajouh and Afshar (2008) proposed a conceptual expert case-based reasoning (CBR) framework that outlines knowledge entities and their relations in the VE workshop; it also uses a fuzzy approach to handle uncertainties in the evaluation phase of the job plan. In general, devising an expert system for the VE job plan has been recommended by several researchers (Al-Yousefi, 1991; Assaf et al., 2000; Shen & Brandon, 1991).

The main objective of this study is to assist experts during the creativity phase of VE by utilizing past experiences, so that the same experience is not repeated in a particular domain. To this end, a comprehensive fuzzy CBR system is proposed that involves a fuzzy representation of cases and a fuzzy clustering model for fuzzy data, used in similarity matching to facilitate case retrieval. The basic motivation for using fuzzy theory is that in the early stages of project development, where VE has the greatest payoffs (Dell'Isola, 1998), most parameters are uncertain (Naderpajouh, Afshar, & Mirmohammadsadeghi, 2006). In addition, many experts cannot express their judgments in precise numerical terms and use linguistic expressions instead. In these cases, fuzzy theory may be employed to handle uncertainties and support linguistic assessments. Thus, the inherently analogical nature of a CBR model, combined with fuzzy theory, facilitates access to more precise and systematically classified information during a VE workshop.

The rest of the paper is organized as follows. Section 2 summarizes the literature on the related areas. In Section 3 we propose a distance measure for fuzzy data based on the Wasserstein metric; using this distance and following Keller's approach, we propose a fuzzy clustering model for fuzzy data with outliers (Section 4). To determine the optimal number of clusters, we modify the Kwon (1998) validity index so that it can be used in a completely fuzzy framework and in noisy environments (Section 5). In Section 6, the main methodology is proposed. As an application, in Section 7 our system is tested on suburban highway design data provided in NCHRP Report 282 (NCHRP, 1986). Finally, conclusions and future work are presented in Section 8.

2. Background

This section briefly reviews the related literature on case-based reasoning, fuzzy case-based reasoning, clustering analysis, fuzzy data and metrics for fuzzy data.

2.1. Case-based reasoning

As described by Watson (1997), case-based reasoning is a problem-solving paradigm that solves new problems by searching a database of previously solved problems (called a case library) for one or more cases whose identifying features closely resemble the current problem. When such a case is found, the solution employed in the historical case(s) is retrieved and applied to the current problem. If the retrieved case is not a close match, the solution is revised, producing a new case that can be retained. Finally, the current problem with its new solution can be added to the case library to increase its robustness. Aamodt and Plaza (1994) regarded CBR as the following cycle (the CBR cycle) with four main steps:

• Retrieving the previously experienced cases whose problems are judged to be similar.
• Reusing the cases by copying or integrating the solutions from the retrieved cases.
• Revising or adapting the retrieved solution(s) in an attempt to solve the new problem.
• Retaining the new solution once it has been confirmed or validated.

The CBR procedure is shown in Fig. 1.

2.2. Fuzzy case-based reasoning

Incorporating fuzzy logic into conventional CBR methods can improve CBR performance. Fuzzy logic can be used in case representation to characterize imprecise and uncertain information; in other words, it allows us to represent cases whose attributes have imprecise and vague values. Moreover, measuring similarity is one of the major issues in fuzzy set theory for designing robust systems. The inherently fuzzy nature of similarity measurement in CBR is another motivation to use fuzzy theory in case retrieval (Burkhard & Richter, 2001). For related work in this area, see, for example, Hirota et al. (1998), Dvir, Langholz, and Schneider (1999), Liang and Shi (2003) and Wang (1997).

Fig. 1. CBR cycle (Aamodt & Plaza, 1994).

2.3. Clustering analysis

Clustering is the division of a given set of objects into subgroups, or clusters, such that objects in the same cluster are as similar as possible and objects in different clusters are as dissimilar as possible. From a machine learning perspective, clustering is unsupervised learning of a hidden data concept (Berkhin, 2002). In conventional (hard) clustering analysis, each datum belongs to exactly one cluster, whereas in fuzzy clustering a data point can belong to more than one cluster, and each datum is associated with a set of membership degrees.

Fuzzy data are imprecise data obtained from measurements, human judgments or linguistic assessments. In cluster analysis, when there is simultaneous uncertainty in both the partition and the data, a fuzzy clustering model for fuzzy data should be applied (D'Urso & Giordani, 2006a). In our CBR system, cases are fuzzy data. Thus, in Section 4 we propose a fuzzy clustering model for fuzzy data to cluster the cases, reducing the number of cases that must be searched and thereby saving time.

2.4. LR-type fuzzy data

LR-type fuzzy data represent a general class of fuzzy data. Univariate LR fuzzy data can be represented by a vector of LR fuzzy numbers; in the more general multivariate case, we have a matrix of LR fuzzy numbers (De Oliveira & Pedrycz, 2007). More specifically, let L (and R) be a decreasing shape function mapping $\mathbb{R}^+ \to [0, 1]$ with $L(0) = 1$; $L(x) < 1\ \forall x > 0$; $L(x) > 0\ \forall x < 1$; $L(1) = 0$ or ($L(x) > 0\ \forall x$ and $L(+\infty) = 0$) (Zimmermann, 2001). Then a fuzzy number $\tilde A$ is of LR type if, for $c \in \mathbb{R}$ and $l > 0$, $r > 0$,

$$\mu_{\tilde A}(x) = \begin{cases} L\!\left(\dfrac{c - x}{l}\right) & \text{for } x \le c,\\[6pt] R\!\left(\dfrac{x - c}{r}\right) & \text{for } x \ge c, \end{cases} \qquad (1)$$

where c, l and r are the center, left spread and right spread of $\tilde A$, respectively. Symbolically, we write $\tilde A = (c, l, r)_{LR}$.

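Among LR-type fuzzy numbers, triangular fuzzy numbers (TFNs) are the most commonly used. An LR-type fuzzy number $\tilde A$ is called a triangular fuzzy number if $L(x) = R(x) = 1 - x$ on $[0, 1]$ (and 0 beyond), which gives the following membership function:

$$\mu_{\tilde A}(x) = \begin{cases} 1 - \dfrac{c - x}{l} & \text{for } c - l \le x \le c,\\[6pt] 1 - \dfrac{x - c}{r} & \text{for } c \le x \le c + r, \end{cases} \qquad (2)$$

with $\mu_{\tilde A}(x) = 0$ outside $[c - l, c + r]$.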
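To make Eqs. (1) and (2) concrete, the following Python sketch evaluates an LR membership function for arbitrary shape functions L and R and recovers the triangular case with L(u) = R(u) = max(0, 1 − u). The function names, and the Gaussian-type shape exp(−u²) included only as a second admissible example, are illustrative assumptions rather than part of the original method.

```python
import math

def lr_membership(x, c, l, r, L, R):
    """Membership of x in the LR fuzzy number (c, l, r)_LR, Eq. (1)."""
    return L((c - x) / l) if x <= c else R((x - c) / r)

def triangular_shape(u):
    """L(u) = R(u) = 1 - u, clipped to 0 beyond the support."""
    return max(0.0, 1.0 - u)

def tri_membership(x, c, l, r):
    """Eq. (2): the triangular special case of Eq. (1)."""
    return lr_membership(x, c, l, r, triangular_shape, triangular_shape)

def gaussian_shape(u):
    """An illustrative alternative shape: exp(-u^2) is decreasing, equals 1 at 0 and tends to 0."""
    return math.exp(-u * u)

# A TFN with center 2, left spread 1 and right spread 3
print(tri_membership(1.5, 2, 1, 3), tri_membership(3.5, 2, 1, 3))
```

Both printed values lie halfway down their respective sides of the triangle, which illustrates how asymmetric spreads stretch the left and right branches independently.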

2.5. Metrics for fuzzy data

Several distance measures for fuzzy data have been proposed in the recent literature. We review some of them in this section.

Definition (The Hausdorff distance). Given two crisp sets $A, B \subseteq \mathbb{R}^k$ and a distance $d(x, y)$ with $x \in A$ and $y \in B$, the Hausdorff distance is defined as follows:

$$d_H(A, B) = \max\left\{ \sup_{x \in A} \inf_{y \in B} d(x, y),\; \sup_{y \in B} \inf_{x \in A} d(x, y) \right\}. \qquad (3)$$

Using the concept of α-cuts, the Hausdorff metric $d_H$ can be generalized to fuzzy numbers $\tilde F, \tilde G$, where $\tilde F$ (or $\tilde G$): $\mathbb{R}^k \to [0, 1]$:

$$d_q(\tilde F, \tilde G) = \begin{cases} \left[ \displaystyle\int_0^1 \big(d_H(F_\alpha, G_\alpha)\big)^q \, d\alpha \right]^{1/q} & \text{if } q \in [1, \infty),\\[8pt] \displaystyle\sup_{\alpha \in [0, 1]} d_H(F_\alpha, G_\alpha) & \text{if } q = \infty, \end{cases} \qquad (4)$$

where the crisp set $F_\alpha \equiv \{x \in \mathbb{R}^k : F(x) \ge \alpha\}$, $\alpha \in [0, 1]$, is called the α-cut of $\tilde F$ (Näther, 2000).
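The following sketch evaluates Eq. (4) for triangular fuzzy numbers (c, l, r), whose α-cuts are intervals. It assumes numerical integration over α (the text does not prescribe a numerical scheme) and uses the fact that, for real intervals, the Hausdorff distance of Eq. (3) reduces to max(|a1 − a2|, |b1 − b2|). All function names are hypothetical.

```python
import numpy as np

def alpha_cut(tfn, alpha):
    """Alpha-cut [c - (1 - alpha) l, c + (1 - alpha) r] of a triangular fuzzy number (c, l, r)."""
    c, l, r = tfn
    return c - (1.0 - alpha) * l, c + (1.0 - alpha) * r

def hausdorff_interval(a1, b1, a2, b2):
    """Hausdorff distance, Eq. (3), between real intervals [a1, b1] and [a2, b2]."""
    return np.maximum(np.abs(a1 - a2), np.abs(b1 - b2))

def d_q(F, G, q=2.0, n=10_000):
    """Generalized Hausdorff distance of Eq. (4) between two TFNs."""
    if np.isinf(q):
        # For TFNs, d_H(F_alpha, G_alpha) is convex in alpha, so its supremum is
        # attained at alpha = 0 or alpha = 1; this grid includes both endpoints.
        alphas = np.linspace(0.0, 1.0, n + 1)
        return float(np.max(hausdorff_interval(*alpha_cut(F, alphas), *alpha_cut(G, alphas))))
    alphas = (np.arange(n) + 0.5) / n          # midpoint rule on [0, 1]
    dh = hausdorff_interval(*alpha_cut(F, alphas), *alpha_cut(G, alphas))
    return float(np.mean(dh ** q) ** (1.0 / q))
```

For two symmetric TFNs with a common center and spreads 1 and 2, d_H of the α-cuts is 1 − α, so d_1 = 1/2 while d_∞ = 1; the choice of q therefore controls how strongly the wide lower cuts dominate.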

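Tran and Duckstein (2002) proposed the following distance between two intervals $A = [a, b]$ and $B = [u, v]$:

$$\begin{aligned} d_{TD}(A, B) &= \int_{-1/2}^{1/2} \int_{-1/2}^{1/2} \left[ \left( \frac{a + b}{2} + x(b - a) \right) - \left( \frac{u + v}{2} + y(v - u) \right) \right]^2 dx\, dy \\ &= \left( \frac{a + b}{2} - \frac{u + v}{2} \right)^2 + \frac{1}{3}\left[ \left( \frac{b - a}{2} \right)^2 + \left( \frac{v - u}{2} \right)^2 \right]. \end{aligned} \qquad (5)$$

They then used it to formulate their distance measure for fuzzy numbers. However, $d_{TD}$ does not satisfy the reflexivity property (Irpino & Verde, 2008):

$$\begin{aligned} d_{TD}(A, A) &= \left( \frac{a + b}{2} - \frac{a + b}{2} \right)^2 + \frac{1}{3}\left[ \left( \frac{b - a}{2} \right)^2 + \left( \frac{b - a}{2} \right)^2 \right] \\ &= \frac{2}{3}\left( \frac{b - a}{2} \right)^2 > 0 \quad \text{whenever } b > a. \end{aligned} \qquad (6)$$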
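A quick numerical check of Eqs. (5) and (6) can be sketched as follows: the closed form is compared with a midpoint-rule approximation of the double integral, and the distance of a non-degenerate interval to itself comes out positive. The grid size and function names are assumptions made for illustration.

```python
import numpy as np

def d_td(a, b, u, v):
    """Closed form of Eq. (5) for A = [a, b] and B = [u, v]."""
    return ((a + b) / 2 - (u + v) / 2) ** 2 + (((b - a) / 2) ** 2 + ((v - u) / 2) ** 2) / 3

def d_td_numeric(a, b, u, v, n=400):
    """Midpoint-rule approximation of the double integral in Eq. (5)."""
    s = (np.arange(n) + 0.5) / n - 0.5                   # grid midpoints on [-1/2, 1/2]
    x, y = np.meshgrid(s, s, indexing="ij")
    diff = ((a + b) / 2 + x * (b - a)) - ((u + v) / 2 + y * (v - u))
    return float(np.mean(diff ** 2))                     # the square has unit area, so mean = integral

# Non-reflexivity, Eq. (6): for A = [0, 2] the "distance" of A to itself is 2/3 rather than 0
self_distance = d_td(0, 2, 0, 2)
```

The positive self-distance arises because the two independent integration variables x and y compare every point of A with every point of A, not each point with itself.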

Yang and Ko (1996) defined a squared Euclidean distance between a pair of LR-type fuzzy data $\tilde A_1 = (c_1, l_1, r_1)$ and $\tilde A_2 = (c_2, l_2, r_2)$, where c denotes the center and l and r denote the left and right spreads, respectively:

$$d^2_{YK}(\lambda, \rho) = (c_1 - c_2)^2 + \big[(c_1 - \lambda l_1) - (c_2 - \lambda l_2)\big]^2 + \big[(c_1 + \rho r_1) - (c_2 + \rho r_2)\big]^2, \qquad (7)$$

where $\lambda = \int_0^1 L^{-1}(t)\,dt$ and $\rho = \int_0^1 R^{-1}(t)\,dt$ are parameters that summarize the shape of the left and right tails of the membership function, and L and R are the decreasing shape functions defined in Section 2.4.
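As a sketch, assuming triangular shape functions L(x) = R(x) = 1 − x, for which L⁻¹(t) = 1 − t and hence λ = ρ = 1/2, Eq. (7) can be evaluated as below; λ is obtained by numerically integrating the inverse shape function. The helper names are hypothetical.

```python
import numpy as np

def shape_parameter(inverse_shape, n=100_000):
    """lambda (or rho): integral over [0, 1] of L^{-1}(t) (or R^{-1}(t)), by the midpoint rule."""
    t = (np.arange(n) + 0.5) / n
    return float(np.mean(inverse_shape(t)))

def d2_yang_ko(A, B, lam, rho):
    """Squared Yang-Ko distance of Eq. (7) between A = (c1, l1, r1) and B = (c2, l2, r2)."""
    (c1, l1, r1), (c2, l2, r2) = A, B
    return ((c1 - c2) ** 2
            + ((c1 - lam * l1) - (c2 - lam * l2)) ** 2
            + ((c1 + rho * r1) - (c2 + rho * r2)) ** 2)

# Triangular shapes: L^{-1}(t) = R^{-1}(t) = 1 - t, so lambda = rho = 1/2
lam = rho = shape_parameter(lambda t: 1.0 - t)
```

Because λ and ρ compress each spread into a single representative point on either side of the center, Eq. (7) compares three points per fuzzy number rather than whole α-cuts.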

3. The proposed distance for fuzzy data

In this section, we first present a new distance measure for interval-valued data and then use it to formulate a distance measure for fuzzy data.

Let $I_i = [a_i, b_i]$, $i = 1, 2$, be intervals. We can parameterize $I_i$ as follows:

$$I_i(t) = a_i + t(b_i - a_i), \qquad 0 \le t \le 1. \qquad (8)$$

If we represent $I_i$ by its midpoint $m_i = \frac{a_i + b_i}{2}$ and radius $\delta_i = \frac{b_i - a_i}{2}$, Eq. (8) can be rewritten as follows:

$$I_i(t) = m_i + (2t - 1)\delta_i, \qquad 0 \le t \le 1. \qquad (9)$$

The distance between $I_1$ and $I_2$ can then be defined as follows:

$$\begin{aligned} d^2(I_1, I_2) &= \int_0^1 \big[I_1(t) - I_2(t)\big]^2 dt \\ &= \int_0^1 \big[(m_1 - m_2) + (\delta_1 - \delta_2)(2t - 1)\big]^2 dt \\ &= (m_1 - m_2)^2 + \frac{1}{3}(\delta_1 - \delta_2)^2. \end{aligned} \qquad (10)$$

The last step uses $\int_0^1 (2t - 1)\,dt = 0$ and $\int_0^1 (2t - 1)^2\,dt = 1/3$. This distance takes into account all the points in both intervals. Irpino and Verde (2008) derived Eq. (10) from another point of view, using the Wasserstein distance. More specifically, let $F_1$ and $F_2$ be distribution functions; the Wasserstein $L_2$ metric is defined as follows (Gibbs & Su, 2002):

$$d_{Wass}(F_1, F_2) = \left( \int_0^1 \big(F_1^{-1}(t) - F_2^{-1}(t)\big)^2 dt \right)^{1/2}, \qquad (11)$$

where $F_1^{-1}$ and $F_2^{-1}$ are the quantile functions of the two distributions. If we assume $F_i$, $i = 1, 2$, to be the uniform distribution function on $[a_i, b_i]$, then $F_i^{-1}(t)$ is the same as the parametric representation $I_i(t)$ in Eq. (8). Thus, the Wasserstein distance coincides with the distance defined in Eq. (10).
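The equivalence between Eq. (10) and the Wasserstein distance of Eq. (11) for uniform distributions can be checked numerically. The sketch below, an illustration rather than the authors' code, compares the closed form with a midpoint-rule integral of the squared difference of the two quantile functions; the function names are assumptions.

```python
import numpy as np

def d2_interval(I1, I2):
    """Squared interval distance of Eq. (10): (m1 - m2)^2 + (delta1 - delta2)^2 / 3."""
    (a1, b1), (a2, b2) = I1, I2
    m1, m2 = (a1 + b1) / 2, (a2 + b2) / 2            # midpoints
    r1, r2 = (b1 - a1) / 2, (b2 - a2) / 2            # radii
    return (m1 - m2) ** 2 + (r1 - r2) ** 2 / 3

def d2_wasserstein_uniform(I1, I2, n=100_000):
    """Squared Wasserstein L2 distance, Eq. (11), between U[a1, b1] and U[a2, b2]."""
    t = (np.arange(n) + 0.5) / n                     # midpoints of [0, 1]
    q1 = I1[0] + t * (I1[1] - I1[0])                 # quantile of U[a1, b1], i.e. I1(t) in Eq. (8)
    q2 = I2[0] + t * (I2[1] - I2[0])
    return float(np.mean((q1 - q2) ** 2))
```

For [0, 2] and [1, 5], both functions give (1 − 3)² + (1 − 2)²/3 = 13/3, up to the quadrature error of the numerical version.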

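We are now ready to construct a distance between fuzzy data. Using α-cuts, the Wasserstein distance $d_{Wass}$ can be generalized to fuzzy numbers $\tilde A_1$ and $\tilde A_2$:

$$d(\tilde A_1, \tilde A_2) = \left( \int_0^1 d^2_{Wass}\big((\tilde A_1)_\alpha, (\tilde A_2)_\alpha\big)\, d\alpha \right)^{1/2}. \qquad (12)$$

We calculate this distance for triangular fuzzy numbers. Let $\tilde A_i = (c_i, l_i, r_i)$, $i = 1, 2$, be triangular fuzzy numbers with α-cuts $(\tilde A_i)_\alpha = \big[\,l_i\alpha + (c_i - l_i),\; -r_i\alpha + (c_i + r_i)\,\big]$. The midpoint and the radius of $(\tilde A_i)_\alpha$ are as follows:

$$m_{(\tilde A_i)_\alpha} = c_i + \frac{1}{2}(1 - \alpha)(r_i - l_i), \qquad (13)$$

$$\delta_{(\tilde A_i)_\alpha} = \frac{1}{2}(1 - \alpha)(r_i + l_i). \qquad (14)$$

Then we have:

$$\begin{aligned} d^2(\tilde A_1, \tilde A_2) &= \int_0^1 d^2_{Wass}\big((\tilde A_1)_\alpha, (\tilde A_2)_\alpha\big)\, d\alpha \\ &= \int_0^1 \left\{ \big(m_{(\tilde A_1)_\alpha} - m_{(\tilde A_2)_\alpha}\big)^2 + \frac{1}{3}\big(\delta_{(\tilde A_1)_\alpha} - \delta_{(\tilde A_2)_\alpha}\big)^2 \right\} d\alpha \\ &= \int_0^1 \left\{ \Big( (c_1 - c_2) + \frac{1}{2}(1 - \alpha)\big[(r_1 - r_2) - (l_1 - l_2)\big] \Big)^2 + \frac{1}{12}(1 - \alpha)^2 \big[(r_1 - r_2) + (l_1 - l_2)\big]^2 \right\} d\alpha \\ &= (c_1 - c_2)^2 + \frac{1}{9}\Big[(l_1 - l_2)^2 + (r_1 - r_2)^2 - (l_1 - l_2)(r_1 - r_2)\Big] - \frac{1}{2}(c_1 - c_2)\big[(l_1 - l_2) - (r_1 - r_2)\big]. \end{aligned} \qquad (15)$$

We use this distance in the next section for fuzzy clustering of fuzzy data.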
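The closed form of Eq. (15) can be verified against a direct numerical evaluation of Eq. (12), which integrates the interval distance of Eq. (10) over α-cuts using Eqs. (13) and (14). The following Python sketch assumes triangular fuzzy numbers given as (center, left spread, right spread) tuples; the function names and grid size are illustrative choices.

```python
import numpy as np

def d2_tfn(A, B):
    """Closed-form squared distance of Eq. (15) for TFNs A = (c1, l1, r1), B = (c2, l2, r2)."""
    dc, dl, dr = A[0] - B[0], A[1] - B[1], A[2] - B[2]
    return dc**2 + (dl**2 + dr**2 - dl * dr) / 9 - 0.5 * dc * (dl - dr)

def d2_tfn_numeric(A, B, n=100_000):
    """Same quantity from Eq. (12): integrate Eq. (10) over alpha-cuts via Eqs. (13)-(14)."""
    s = 1.0 - (np.arange(n) + 0.5) / n                  # s = 1 - alpha at midpoints of [0, 1]

    def midpoint(T):                                     # Eq. (13)
        return T[0] + 0.5 * s * (T[2] - T[1])

    def radius(T):                                       # Eq. (14)
        return 0.5 * s * (T[2] + T[1])

    return float(np.mean((midpoint(A) - midpoint(B))**2 + (radius(A) - radius(B))**2 / 3))
```

For example, A = (5, 1, 2) and B = (3, 2, 1) give 4 + 1/3 + 2 = 19/3 from both functions; the cross term −(1/2)(c1 − c2)[(l1 − l2) − (r1 − r2)] is what lets skewed spreads shift the effective midpoints.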

4. Fuzzy clustering of fuzzy data with outliers

In this section, Keller's approach (Keller, 2000) is modified so that it can be used for fuzzy data. As in his approach, an additional weighting factor is assigned to each datum to identify outliers and reduce their effects. Before describing the procedure, let us introduce the following notation:

$U \equiv \{u_{ik} : i = 1, \ldots, c;\ k = 1, \ldots, n\}$ is the membership matrix of order $(c \times n)$, where c is the number of clusters and n is the number of data vectors; $u_{ik} \in [0, 1]$ denotes the membership degree of the kth object in the ith cluster. In contrast to Keller's approach, where data elements and cluster prototypes are crisp, we define them as triangular fuzzy data. Thus,

$$\tilde X \equiv \left\{ \tilde x_k^j = \big(c_{\tilde x_k^j}, l_{\tilde x_k^j}, r_{\tilde x_k^j}\big) : k = 1, \ldots, n;\ j = 1, \ldots, p \right\} \quad \text{and} \quad \tilde V \equiv \left\{ \tilde v_i^j = \big(c_{\tilde v_i^j}, l_{\tilde v_i^j}, r_{\tilde v_i^j}\big) : i = 1, \ldots, c;\ j = 1, \ldots, p \right\}$$

are the fuzzy data and fuzzy prototype matrices, respectively, where p is the number of features. Let us now introduce the objective function:

$$J(\tilde X, U, \tilde V) = \sum_{i=1}^{c} \sum_{k=1}^{n} u_{ik}^m \cdot \frac{1}{\omega_k^q} \cdot d^2(\tilde v_i, \tilde x_k) \qquad (16)$$

under the constraints

$$\sum_{k=1}^{n} \omega_k = \omega, \qquad (17)$$

$$\sum_{i=1}^{c} u_{ik} = 1, \qquad (18)$$

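where m is the degree of fuzziness, and $d^2(\tilde v_i, \tilde x_k)$ is given by

$$\begin{aligned} d^2(\tilde v_i, \tilde x_k) &= \sum_{j=1}^{p} d^2\big(\tilde v_i^j, \tilde x_k^j\big) \\ &= \sum_{j=1}^{p} \Big\{ \big(c_{\tilde v_i^j} - c_{\tilde x_k^j}\big)^2 + \frac{1}{9}\Big[ \big(l_{\tilde v_i^j} - l_{\tilde x_k^j}\big)^2 + \big(r_{\tilde v_i^j} - r_{\tilde x_k^j}\big)^2 - \big(l_{\tilde v_i^j} - l_{\tilde x_k^j}\big)\big(r_{\tilde v_i^j} - r_{\tilde x_k^j}\big) \Big] \\ &\qquad\quad - \frac{1}{2}\big(c_{\tilde v_i^j} - c_{\tilde x_k^j}\big)\Big[ \big(l_{\tilde v_i^j} - l_{\tilde x_k^j}\big) - \big(r_{\tilde v_i^j} - r_{\tilde x_k^j}\big) \Big] \Big\}. \end{aligned} \qquad (19)$$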

As Keller points out, the factor $\omega_k$ represents the weight of the kth datum, and ω is a constant real-valued parameter. The constant parameter q controls the influence of the outlier weighting factor. Outliers are assigned a large weight $\omega_k$, so that $1/\omega_k^q$ is small for them. The necessary conditions for minimizing the objective function are as follows:

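$$c_{\tilde v_i^j} = \frac{\displaystyle\sum_{k=1}^{n} u_{ik}^m \cdot \frac{1}{\omega_k^q} \left[ 2c_{\tilde x_k^j} + \frac{1}{2}\Big( \big(l_{\tilde v_i^j} - l_{\tilde x_k^j}\big) - \big(r_{\tilde v_i^j} - r_{\tilde x_k^j}\big) \Big) \right]}{\displaystyle 2 \sum_{k=1}^{n} u_{ik}^m \cdot \frac{1}{\omega_k^q}}, \qquad (20)$$

$$l_{\tilde v_i^j} = \frac{\displaystyle\sum_{k=1}^{n} u_{ik}^m \cdot \frac{1}{\omega_k^q} \left[ \frac{2}{9} l_{\tilde x_k^j} + \frac{1}{9}\big(r_{\tilde v_i^j} - r_{\tilde x_k^j}\big) + \frac{1}{2}\big(c_{\tilde v_i^j} - c_{\tilde x_k^j}\big) \right]}{\displaystyle \frac{2}{9} \sum_{k=1}^{n} u_{ik}^m \cdot \frac{1}{\omega_k^q}}, \qquad (21)$$

$$r_{\tilde v_i^j} = \frac{\displaystyle\sum_{k=1}^{n} u_{ik}^m \cdot \frac{1}{\omega_k^q} \left[ \frac{2}{9} r_{\tilde x_k^j} + \frac{1}{9}\big(l_{\tilde v_i^j} - l_{\tilde x_k^j}\big) - \frac{1}{2}\big(c_{\tilde v_i^j} - c_{\tilde x_k^j}\big) \right]}{\displaystyle \frac{2}{9} \sum_{k=1}^{n} u_{ik}^m \cdot \frac{1}{\omega_k^q}}, \qquad (22)$$

$$\omega_k = \frac{\Big[ \sum_{i=1}^{c} u_{ik}^m \cdot d^2(\tilde v_i, \tilde x_k) \Big]^{\frac{1}{q+1}}}{\sum_{l=1}^{n} \Big[ \sum_{i=1}^{c} u_{il}^m \cdot d^2(\tilde v_i, \tilde x_l) \Big]^{\frac{1}{q+1}}} \cdot \omega, \qquad (23)$$

$$u_{ik} = \frac{1}{\displaystyle\sum_{r=1}^{c} \left( \frac{d^2(\tilde v_i, \tilde x_k)}{d^2(\tilde v_r, \tilde x_k)} \right)^{\frac{1}{m-1}}}. \qquad (24)$$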

As can be observed, the membership update of Eq. (24) keeps the same form as in standard fuzzy c-means, because the common factor $1/\omega_k^q$ cancels, while the cluster centers take the weights into account; points with high representativeness therefore influence the prototypes more than outliers do. On the basis of these necessary conditions, we can construct the following iterative algorithm:

Algorithm.

Step 1: Fix the degree of fuzziness (m), the number of clusters (c), ω and q. Choose an initial fuzzy c-partition $U^{(0)}$. Also choose initial prototype spreads and initial weights for each datum subject to Eq. (17). Set t = 0.

Step 2: Calculate $\tilde V^{(t)} = \big(c_{\tilde v^{(t)}}, l_{\tilde v^{(t)}}, r_{\tilde v^{(t)}}\big)$ using $U^{(t)}$, the spreads, the weights and Eqs. (20)–(22).

Step 3: Update $\omega_k^{(t)}$, k = 1, ..., n, using Eq. (23), and compute $U^{(t+1)}$ from $\tilde V^{(t)}$ using Eq. (24).

Step 4: If $\|U^{(t+1)} - U^{(t)}\| < \varepsilon$, where ε is a small positive number fixed by the researcher, the algorithm has converged. Otherwise, set t = t + 1 and go to Step 2.
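A compact NumPy sketch of Steps 1 to 4 is given below, with the data stored as an (n, p, 3) array of (center, left spread, right spread). Two points are our own reading rather than statements from the text. First, because Eq. (19) is a positive-definite quadratic form in the differences of centers and spreads, Eqs. (20)–(22) are satisfied by the weighted means of the data centers and spreads with weights $u_{ik}^m/\omega_k^q$, so the sketch computes those means directly instead of iterating the three equations. Second, the defaults (random initial partition, equal initial weights, ω = n, q = 1, m = 2 and a 1e-12 floor on distances) are assumptions. Function names are hypothetical.

```python
import numpy as np

def pairwise_d2(V, X):
    """D[i, k] = d^2(v_i, x_k) from Eq. (19); V: (c, p, 3), X: (n, p, 3), last axis (center, left, right)."""
    diff = V[:, None, :, :] - X[None, :, :, :]              # (c, n, p, 3)
    dc, dl, dr = diff[..., 0], diff[..., 1], diff[..., 2]
    per_feature = dc**2 + (dl**2 + dr**2 - dl * dr) / 9 - 0.5 * dc * (dl - dr)
    return per_feature.sum(axis=2)                          # sum over the p features

def fuzzy_cluster_fuzzy_data(X, c, m=2.0, q=1.0, omega=None, eps=1e-6, max_iter=300, seed=0):
    """Sketch of the Section 4 algorithm; returns U (c, n), prototypes V (c, p, 3) and weights w (n,)."""
    n = X.shape[0]
    omega = float(n) if omega is None else float(omega)     # assumed default for the constant omega
    rng = np.random.default_rng(seed)
    U = rng.random((c, n))
    U /= U.sum(axis=0)                                      # Step 1: random fuzzy c-partition, Eq. (18)
    w = np.full(n, omega / n)                               # Step 1: equal weights satisfying Eq. (17)
    V = None
    for _ in range(max_iter):
        # Step 2: prototypes as weighted means, which solve Eqs. (20)-(22)
        coef = U**m / w**q                                  # u_ik^m / omega_k^q, shape (c, n)
        V = np.einsum("ik,kjs->ijs", coef, X) / coef.sum(axis=1)[:, None, None]
        # Step 3: outlier weights (Eq. 23), then memberships (Eq. 24)
        D = np.fmax(pairwise_d2(V, X), 1e-12)               # floor avoids division by zero
        a = ((U**m) * D).sum(axis=0) ** (1.0 / (q + 1.0))
        w = omega * a / a.sum()
        ratio = (D[:, None, :] / D[None, :, :]) ** (1.0 / (m - 1.0))   # [i, r, k] = (D_ik / D_rk)^(1/(m-1))
        U_new = 1.0 / ratio.sum(axis=1)
        # Step 4: stop when the partition matrix no longer changes
        converged = np.linalg.norm(U_new - U) < eps
        U = U_new
        if converged:
            break
    return U, V, w
```

On one-dimensional data with two tight groups and a single distant point, the distant point would be expected to receive the largest weight $\omega_k$, and hence the smallest factor $1/\omega_k^q$, so it barely moves the prototypes.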

5. Cluster validity index

As Pal and Bezdek (1995) pointed out, once clusters are found, it is necessary to validate them; this is the cluster validity problem. Many validity indices can be found in the literature. Early indices such as the partition coefficient and partition entropy (Bezdek, 1974a, 1974b) can be applied directly to the fuzzy clustering of fuzzy data, but they use only fuzzy memberships, which may not have a close connection to the geometrical structure of the data (Zhang, Wang, Zhang, & Li, 2008). Another class of indices takes fuzzy memberships and the data structure into consideration simultaneously. These indices cannot be applied directly to the fuzzy clustering of fuzzy data and must be extended to a completely fuzzy framework. The Kwon validity index (Kwon, 1998) belongs to this second class. It is a modification of the Xie and Beni validity index (Xie & Beni, 1991) with the added advantage of eliminating the monotonically decreasing tendency of that index as the number of clusters increases, but it has the disadvantage of not being robust to noise. Here, in order to obtain the number of clusters c in a completely fuzzy framework and in noisy environments, we modify the Kwon validity index as follows:

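$$F_{fr}(U, \tilde V, \tilde X) = \frac{\displaystyle\sum_{i=1}^{c} \sum_{k=1}^{n} u_{ik}^m \cdot \frac{1}{\omega_k^q} \cdot d^2(\tilde v_i, \tilde x_k) + \frac{1}{c} \sum_{i=1}^{c} d^2\big(\tilde v_i, \bar{\tilde v}_f\big)}{\displaystyle\min_{i \neq k} d^2(\tilde v_i, \tilde v_k)}, \qquad (25)$$

where $\bar{\tilde v}_f = \dfrac{\sum_i \sum_k u_{ik}^m \cdot \frac{1}{\omega_k^q}\, \tilde x_k}{\sum_i \sum_k u_{ik}^m \cdot \frac{1}{\omega_k^q}}$.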

Our goal is to find the fuzzy c-partition with the smallest value of $F_{fr}$. The modified version differs from the original Kwon validity index as follows:

• The modified validity index can be used in a completely fuzzy framework.
• The weighted fuzzy mean of the data is used instead of the crisp mean.
• The factor $1/\omega_k^q$ is added to the first term of the numerator, so that the index can be used in noisy environments.
• The weighting exponent is generalized from 2 to m.

Thus, this modified version of the Kwon validity index is designed to be robust to noise and can be used for fuzzy clustering of fuzzy data.
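The modified index of Eq. (25) can be sketched in NumPy as below, using the array layout (c clusters, n data, p features, last axis (center, left, right)). The weighted fuzzy mean is computed component-wise on centers and spreads, which is what a non-negatively weighted sum of TFNs yields; the function names are hypothetical.

```python
import numpy as np

def pairwise_d2(A, B):
    """Matrix of Eq. (19) distances between rows of A (a, p, 3) and rows of B (b, p, 3)."""
    diff = A[:, None] - B[None, :]                          # (a, b, p, 3)
    dc, dl, dr = diff[..., 0], diff[..., 1], diff[..., 2]
    return (dc**2 + (dl**2 + dr**2 - dl * dr) / 9 - 0.5 * dc * (dl - dr)).sum(axis=-1)

def modified_kwon_index(U, V, X, w, m=2.0, q=1.0):
    """F_fr of Eq. (25); U: (c, n), V: (c, p, 3), X: (n, p, 3), w: (n,) outlier weights."""
    c = V.shape[0]
    coef = U**m / w**q                                      # u_ik^m / omega_k^q
    compactness = (coef * pairwise_d2(V, X)).sum()
    v_bar = np.einsum("ik,kjs->js", coef, X) / coef.sum()  # weighted fuzzy mean of the data
    penalty = pairwise_d2(V, v_bar[None]).sum() / c
    DV = pairwise_d2(V, V)
    separation = DV[~np.eye(c, dtype=bool)].min()           # minimum over prototype pairs i != k
    return float((compactness + penalty) / separation)
```

In use, one would run the clustering for several candidate values of c and keep the c with the smallest $F_{fr}$; the penalty term keeps the index from simply decreasing as c grows.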

6. Methodology

This section describes the development of our system and presents its modules in detail.

6.1. Case representation and indexing

Each case is a project to which VE has been applied. One feature is the name of the part of the project on which the VE study was conducted; we use this feature as an index, and cases are classified according to it. The other features are project characteristics, which are triangular fuzzy numbers determined by experts; these are usually domain dependent. The features are weighted using a weighting method such as fuzzy AHP. Solutions are the practical ideas (or alternatives) generated by experts in the VE workshop. Each solution is a binary vector in which each entry corresponds to one idea: if the idea was generated, the corresponding entry equals one; otherwise, it equals zero. Fig. 2 illustrates case representation in the case library.
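A minimal sketch of this case structure is shown below, assuming Python dataclasses. The class name `VECase`, the helper `classify`, and the choice to keep the fuzzy AHP feature weights outside the individual cases are illustrative assumptions, not part of the original system.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

TFN = Tuple[float, float, float]   # (center, left spread, right spread)

@dataclass
class VECase:
    index: str            # name of the project part studied; used as the class index
    features: List[TFN]   # project characteristics as triangular fuzzy numbers
    solution: List[int]   # one entry per idea: 1 if the idea was generated, else 0

    def __post_init__(self) -> None:
        if any(s not in (0, 1) for s in self.solution):
            raise ValueError("solution must be a binary vector")

def classify(library: List[VECase]) -> Dict[str, List[VECase]]:
    """Group the case library by the index feature."""
    classes: Dict[str, List[VECase]] = {}
    for case in library:
        classes.setdefault(case.index, []).append(case)
    return classes
```

Grouping by the index feature first means the later clustering and similarity computations only ever run inside one class.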

Fig. 2. Case representation in the case library.

Fig. 3. Linguistic variables and the associated fuzzy numbers. [Membership functions of the linguistic terms bad, poor, good and excellent; both axes range from 0 to 1.]

Table 1
The average MSE for each class of the indexed cases.

Existing design    MSE
2U                 0.08
3T                 0.08
4U                 0.13
4D                 0.13
5T                 0.05
6D                 0.08
7T                 0.07

6.2. Case retrieval

The retrieval algorithm begins by deciding to which class the query case belongs. After the class of the query case is determined, similar cases are searched for within that class. For this purpose, the cases are clustered into several groups by the algorithm proposed in Section 4. Next, the degree of similarity between the query case and each cluster prototype $v_i$ is calculated using $d^{(W)}$, substituting $v_i$ for r in the similarity function defined as follows:

$$SM(q, r) = e^{-\beta\, d^{(W)}_{FR}(q, r)}, \qquad (26)$$

where β is a positive constant. The most similar cluster to the query case is the one satisfying $\arg\max_i SM(q, v_i)$, $i = 1, \ldots, c$. The cases within this cluster are then compared to the query case according to Eq. (26).
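The retrieval step can be sketched as follows. The text does not spell out $d^{(W)}_{FR}$ here, so the sketch ASSUMES it is the square root of a feature-weighted sum of Eq. (15) distances, with weights taken, for instance, from fuzzy AHP. It also assumes each case has already been assigned to a single cluster (for example, the cluster of highest membership). All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

TFN = Tuple[float, float, float]      # (center, left spread, right spread)
Case = Sequence[TFN]                  # one TFN per feature

def d2_tfn(a: TFN, b: TFN) -> float:
    """Squared distance of Eq. (15) between two triangular fuzzy numbers."""
    dc, dl, dr = a[0] - b[0], a[1] - b[1], a[2] - b[2]
    return dc**2 + (dl**2 + dr**2 - dl * dr) / 9 - 0.5 * dc * (dl - dr)

def weighted_distance(x: Case, y: Case, weights: Sequence[float]) -> float:
    # ASSUMPTION: d^(W) = sqrt of the feature-weighted sum of Eq. (15); max() guards against rounding
    return math.sqrt(max(0.0, sum(w * d2_tfn(a, b) for w, a, b in zip(weights, x, y))))

def similarity(q: Case, r: Case, weights: Sequence[float], beta: float = 1.0) -> float:
    """SM(q, r) = exp(-beta * d^(W)(q, r)), Eq. (26)."""
    return math.exp(-beta * weighted_distance(q, r, weights))

def retrieve(query: Case, prototypes: List[Case], clusters: List[List[Case]],
             weights: Sequence[float], beta: float = 1.0):
    """Pick the most similar cluster prototype, then rank that cluster's cases by Eq. (26)."""
    best = max(range(len(prototypes)), key=lambda i: similarity(query, prototypes[i], weights, beta))
    scored = [(case, similarity(query, case, weights, beta)) for case in clusters[best]]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return best, scored
```

Comparing the query with c prototypes and then with the members of one cluster replaces a full scan of the class, which is where the time saving described above comes from.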

6.3. Case adaptation

If one of the retrieved cases has a similarity degree equal to one, null adaptation is used: the solution of that retrieved case is applied to the query case without any modification. Otherwise, we suggest compositional adaptation, since the retrieved cases usually differ little in their similarity degrees. This allows the corresponding solutions to be combined efficiently into the final solution, as follows: for each entry of the solution vector (i.e., each idea), we take a weighted average over the retrieved cases with weights proportional to their similarity degrees and then apply a threshold; averages above the threshold are mapped to one, and the others to zero.

If we denote the solutions of the retrieved cases by $S_i$, their similarity with the query case by $SM_i$ and the threshold by h, then the solution of the query case, $S_q$, is as follows:

$$S_q = \left\lceil \frac{\sum_{i=1}^{k} SM_i \cdot S_i}{\sum_{i=1}^{k} SM_i} - h\mathbf{1} \right\rceil, \qquad (27)$$

where $\lceil \cdot \rceil$ is the ceiling function, applied entrywise, and $\mathbf{1}$ is the all-ones vector; for $0 \le h < 1$, each entry is 1 exactly when its weighted average exceeds h.
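Null and compositional adaptation can be sketched as below. The threshold h = 0.5 is an assumed default, and the function name `adapt` is hypothetical; the ceiling is applied entrywise, as in Eq. (27).

```python
import math
from typing import List, Sequence

def adapt(solutions: Sequence[Sequence[int]], sims: Sequence[float], h: float = 0.5) -> List[int]:
    """Null adaptation if some retrieved case has similarity 1, otherwise Eq. (27)."""
    # Null adaptation: reuse the solution of a perfectly matching case unchanged
    for s, sm in zip(solutions, sims):
        if sm == 1.0:
            return list(s)
    total = sum(sims)
    adapted = []
    for j in range(len(solutions[0])):
        # Similarity-weighted average of the j-th idea over the retrieved cases, in [0, 1]
        avg = sum(sm * s[j] for s, sm in zip(solutions, sims)) / total
        adapted.append(math.ceil(avg - h))   # 1 if avg > h, else 0 (for 0 <= h < 1)
    return adapted

# Three retrieved cases with similarities 0.9, 0.5 and 0.2
print(adapt([[1, 0, 1], [1, 1, 0], [0, 1, 0]], [0.9, 0.5, 0.2]))
```

Here the second idea is proposed by two of the three cases but only by the two less similar ones, so its weighted average (0.4375) stays below the threshold and it is dropped.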

6.4. Case retainment

Finally, if the experts find the solution (ideas/alternatives) acceptable, it is added to the case library as a new case; otherwise, they run a brainstorming session to generate ideas and add the practical ones to the case library as the solution of the query case.

7. Application

Our system was tested on suburban highway design data extracted from National Cooperative Highway Research Program (NCHRP) Report 282 (NCHRP, 1986). The features include the existing design, the maximum available width and the desirability of the operational and safety indices. The existing design can be one of the following options:

• Two-Lane Undivided, abbreviated as 2U.
• Three-Lane Divided with Center Two-Way Left-Turn Lane, abbreviated as 3T.
• Four-Lane Undivided, abbreviated as 4U.
• Four-Lane Divided with Raised Median, abbreviated as 4D.
• Five-Lane Divided with Center Two-Way Left-Turn Lane, abbreviated as 5T.
• Six-Lane Divided with Raised Median, abbreviated as 6D.
• Seven-Lane with Center Two-Way Left-Turn Lane, abbreviated as 7T.

There are 44 feasible alternatives, which we use as case solutions. We take the "existing design" feature as the index and classify the cases according to it; the class of the query case is then determined. The "maximum available width" is crisp, so we represent it as a fuzzy singleton. The other features are linguistic terms and are transformed into triangular fuzzy numbers according to Fig. 3.
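The feature encoding described above can be sketched as follows. The triangular parameters for bad, poor, good and excellent are HYPOTHETICAL evenly spaced triangles on [0, 1], since the actual values are given only graphically in Fig. 3; the feature names and the width value are also illustrative.

```python
from typing import Dict, Tuple

TFN = Tuple[float, float, float]   # (center, left spread, right spread)

# HYPOTHETICAL parameters: evenly spaced triangles on [0, 1]; Fig. 3 defines the actual ones
LINGUISTIC_TFN: Dict[str, TFN] = {
    "bad": (0.0, 0.0, 1 / 3),
    "poor": (1 / 3, 1 / 3, 1 / 3),
    "good": (2 / 3, 1 / 3, 1 / 3),
    "excellent": (1.0, 1 / 3, 0.0),
}

def singleton(x: float) -> TFN:
    """Crisp value as a fuzzy singleton: zero left and right spreads."""
    return (float(x), 0.0, 0.0)

def encode_case(width: float, ratings: Dict[str, str]) -> Dict[str, TFN]:
    """Crisp maximum available width -> singleton; linguistic ratings -> TFNs."""
    encoded = {"maximum available width": singleton(width)}
    for feature, term in ratings.items():
        encoded[feature] = LINGUISTIC_TFN[term.lower()]
    return encoded

query = encode_case(24.0, {"operational": "Good", "safety": "poor"})
```

Representing the crisp width as a singleton lets it pass through the same distance of Eq. (15) as the linguistic features, where it contributes only through its center.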

After the similar cases are retrieved through our clustering algorithm and their solutions are adapted as explained in Sections 6.2 and 6.3, the solution of the query case is generated. If this solution is acceptable, it is added to the case library.

The performance of our system was validated by leave-one-out cross-validation (LOOCV). LOOCV uses a single observation from the original sample as the validation data and the remaining observations as the training data; this is repeated so that each observation in the sample is used once as the validation data. Each time, the mean squared error (MSE) is computed. The average MSE for each class of the indexed cases is shown in Table 1.
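A generic leave-one-out loop of the kind described above can be sketched as below, computing the MSE between predicted and actual binary solution vectors. The `predict` argument stands in for the full retrieve-and-adapt pipeline; the nearest-neighbour predictor on one crisp feature is a HYPOTHETICAL placeholder used only to make the sketch runnable. To obtain per-class figures like those in Table 1, the loop would be run separately on the cases of each class.

```python
from typing import Callable, List, Sequence

def loocv_mse(cases: Sequence, solutions: Sequence[Sequence[int]],
              predict: Callable[[object, list, list], Sequence[int]]) -> float:
    """Average, over held-out cases, of the MSE between predicted and true binary solutions."""
    errors = []
    for i in range(len(cases)):
        train_x = [x for j, x in enumerate(cases) if j != i]       # all cases except the i-th
        train_y = [y for j, y in enumerate(solutions) if j != i]
        predicted = predict(cases[i], train_x, train_y)
        sq = [(p - t) ** 2 for p, t in zip(predicted, solutions[i])]
        errors.append(sum(sq) / len(sq))
    return sum(errors) / len(errors)

def nearest_neighbour(query: float, train_x: List[float], train_y: List[List[int]]) -> List[int]:
    """HYPOTHETICAL stand-in predictor: reuse the solution of the closest case on one crisp feature."""
    best = min(range(len(train_x)), key=lambda j: abs(train_x[j] - query))
    return train_y[best]
```

With binary solution vectors, the MSE of a single prediction equals the fraction of ideas predicted incorrectly, which makes the values in Table 1 easy to interpret.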

Since we wanted to develop a general system that can be used in all domains, we used compositional adaptation. For a system tailored to a specific domain, other adaptation methods, such as transformational and derivational adaptation, may reduce the error.

8. Conclusion and future works

This paper presented a fuzzy CBR system for value engineering. The system can contribute significantly to the efficiency of a value study by providing the VE team with an extensive memory of previous experiences. Since cases are fuzzy data, a fuzzy clustering model for fuzzy data, based on a new distance, is used to reduce the number of cases that must be searched and thus save time. In addition, the Kwon cluster validity index is modified to validate the number of clusters. Finally, to test the performance of our system, it was applied to suburban highway design data extracted from NCHRP Report 282. Another direction for future work is to develop a rough set-based case-based reasoner for value engineering; as mentioned before, trying other adaptation methods, such as transformational and derivational adaptation, may also reduce the error.

References

Aamodt, A., & Plaza, E. (1994). Case-based reasoning: Foundational issues, methodological variations, and system approaches. AI Communications, 7(1), 39–59.

Alcantara, P. Jr. (1996). Development of a computer understandable representation of a design rationale to support value engineering. Unpublished Ph.D. dissertation, School of Virginia Polytechnic Institute and State University.

Al-Yousefi, A. S. (1991). Expert system: A programmable approach to VE logic. In Proceedings of the 1991 SAVE international conference (pp. 155–167). Kansas City.

Assaf, S., Jannadi, O. A., & Al-Tamimi, A. (2000). Computerized system for application of value engineering methodology. ASCE Journal of Computing in Civil Engineering, 14(3), 206–214.

Berkhin, P. (2002). Survey of clustering data mining techniques. Accrue Software Inc. <http://www.accrue.com/products/researchpapers.html>.

Bezdek, J. C. (1974a). Numerical taxonomy with fuzzy sets. Journal of Mathematical Biology, 1, 57–71.

Bezdek, J. C. (1974b). Cluster validity with fuzzy sets. Journal of Cybernetics, 9, 58–72.

Burkhard, H. D., & Richter, M. M. (2001). On the notion of similarity in case based reasoning and fuzzy theory. In Soft computing in case-based reasoning (chap. 2). London: Springer.

Dahim, H., & Mohammad A. (2001). Value engineering expert system in suburban highway design (VEESSHD). Ph.D. thesis, University of Pittsburgh.

Degenhardt, G. (1985). VE-TRIEVAL: A Corps of Engineers value engineering information retrieval system. In Proceedings of the 1985 SAVE international conference (pp. 14–25). Texas.

Dell’Isola, A. J. (1998). Value engineering: Practical applications (BK and Disk ed.). R.S. Means Company.

De Oliveira, J. V., & Pedrycz, W. (2007). Advances in fuzzy clustering and its applications. San Francisco: Wiley.

D’Urso, P., & Giordani, P. (2006a). A weighted fuzzy c-means clustering model for fuzzy data. Computational Statistics & Data Analysis, 50(6), 1496–1523.

Dvir, G., Langholz, G., & Schneider, M. (1999). Matching attributes in a fuzzy case based reasoning. Fuzzy Information Processing Society, 33–36.

Gibbs, A. L., & Su, F. E. (2002). On choosing and bounding probability metrics. International Statistical Review, 70, 419.

Hirota, K., Yoshino, H., Xu, M. Q., Zhu, Y., Li, X. Y., & Horie, D. (1998). A fuzzy case based reasoning system for the legal inference. In Fuzzy systems proceedings, IEEE world congress on computational intelligence, the 1998 IEEE international conference (Vol. 2, pp. 1350–1354).

Irpino, A., & Verde, R. (2008). Dynamic clustering for interval data using a Wasserstein-based distance. Pattern Recognition Letters, 29, 1648–1658.

Keller, A. (2000). Fuzzy clustering with outliers. In T. Whalen (Ed.), Proceedings of the 19th international conference on the North American fuzzy information processing society, NAFIPS00 (pp. 143–147).

Kwon, S. H. (1998). Cluster validity index for fuzzy clustering. Electronics Letters, 34(22).

Liang, Z., & Shi, P. (2003). Similarity measures on intuitionistic fuzzy sets. Pattern Recognition Letters, 24, 2687–2693.

Mandelbaum, J., & Reed, D. L. (2006). Value engineering handbook (IDA Paper P-4114). Alexandria, VA: Institute for Defense Analyses.

Naderpajouh, N., & Afshar, A. (2008). A case-based reasoning approach to application of value engineering methodology in the construction industry. Construction Management and Economics, 26, 363–372.

Naderpajouh, N., Afshar, SA., & Mirmohammadsadeghi, A. (2006). Fuzzy decision support system for application of value engineering in construction industry. International Journal of Civil Engineering, 4(4), 261–273.

Näther, W. (2000). On random fuzzy variables of second order and their application to linear statistical inference with fuzzy data. Metrika, 51, 201–221.

National Cooperative Highway Research Program (NCHRP) Report 282 (1986). Multi-lane design alternatives for improving suburban highways. Washington, DC: Transportation Research Board.

Pal, N. R., & Bezdek, J. C. (1995). On cluster validity for the fuzzy c-means model. IEEE Transactions on Fuzzy Systems, 3, 370–379.

Park, C. (1994). An integrated value engineering computer system for construction projects. Unpublished Ph.D. dissertation, School of Engineering, University of Florida.

Shen, Q., & Brandon, P. S. (1991). Can expert systems improve VM implementation? In Proceedings of the 1991 SAVE international conference (pp. 168–176). Kansas City.

Tran, L., & Duckstein, L. (2002). Comparison of fuzzy numbers using a fuzzy distance measure. Fuzzy Sets and Systems, 130, 331–341.

Wang, W. J. (1997). New similarity measures on fuzzy sets and on elements. Fuzzy Sets and Systems, 85, 305–309.

Watson, I. (1997). Applying case-based reasoning: Techniques for enterprise systems. Morgan Kaufmann Publishers.

Xie, X. L., & Beni, G. (1991). A validity measure for fuzzy clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13, 841–847.

Yang, M. S., & Ko, C. H. (1996). On a class of fuzzy c-numbers clustering procedures for fuzzy data. Fuzzy Sets and Systems, 84, 49–60.

Zhang, Y., Wang, W., Zhang, X., & Li, Y. (2008). A cluster validity index for fuzzy clustering. Information Sciences, 178, 1205–1218.

Zimmermann, H. J. (2001). Fuzzy set theory and its applications. Dordrecht: Kluwer Academic Publishers.