research article ve bayesian classifier for selecting good...

13
Research Article Na\ve Bayesian Classifier for Selecting Good/Bad Projects during the Early Stage of International Construction Bidding Decisions Woosik Jang, 1 Jung Ki Lee, 1 Jaebum Lee, 2 and Seung Heon Han 1 1 School of Civil and Environmental Engineering, Yonsei University, Seoul 120-749, Republic of Korea 2 Department of Civil and infrastructure, Hyundai Engineering Co., Ltd., Seoul 140-2, Republic of Korea Correspondence should be addressed to Seung Heon Han; [email protected] Received 11 April 2015; Revised 17 July 2015; Accepted 26 July 2015 Academic Editor: Mohamed Marzouk Copyright © 2015 Woosik Jang et al. is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Since the 1970s, revenues generated by Korean contractors in international construction have increased rapidly, exceeding USD 70 billion per year in recent years. However, Korean contractors face significant risks from market uncertainty and sensitivity to economic volatility and technical difficulties. As the volatility of these risks threatens project profitability, approximately 15% of bad projects were found to account for 74% of losses from the same international construction sector. Anticipating bad projects via preemptive risk management can better prevent losses so that contractors can enhance the efficiency of bidding decisions during the early stages of a project cycle. In line with these objectives, this paper examines the effect of such factors on the degree of project profitability. e Na¨ ıve Bayesian classifier is applied to identify a good project screening tool, which increases practical applicability using binomial variables with limited information that is obtainable in the early stages. e proposed model produced superior classification results that adequately reflect contractor views of risk. It is anticipated that when users apply the proposed model based on their own knowledge and expertise, overall firm profit rates will increase as a result of early abandonment of bad projects as well as the prioritization of good projects before final bidding decisions are made. 1. Introduction In 2013, the international construction industry generated USD 8.9 trillion in annual construction spending [1]. e compound annual growth rate is predicted to reach 4.1% by 2021, suggesting future steady growth and thus implying that more contractors will enter the international construction market. Korean contractors have attained the highest rev- enues in this market, in which total international revenues over the last five decades have reached USD 600 billion [2]. In recent years, this volume has reached more than USD 70 billion per year, placing Korea sixth among countries gen- erating the highest international revenues. In achieving this level of success, large Korean contractors have aggressively entered the international market, competing against other contractors from around the world. However, given the presence of market volatility and other forms of uncertainty, particularly in developing and underdeveloped countries, international construction pro- jects still carry a high degree of risk exposure, which affects the overall financial soundness of the contractor. Korean international construction profit rate data show that the average dropped from 4.7% to 3.8% and then to 3.2% from 2010 to 2012. As Korean contractors have actively penetrated several countries and won bids for several international projects, the number of loss projects has also simultaneously increased, causing contractors to experience financial insta- bility. According to actual data related to Korean contractors, as shown in Figure 1, 15% of approximately 300 projects were severely underperforming projects, and these projects accounted for 74% of all losses [2], demonstrating that a small portion of loss projects causes the majority of financial losses among international projects. For instance, Samsung Engineering’s annual report [3] showed an increase in total revenues of KRW 9.8 trillion. However, the report revealed operating profit losses of KRW 1.03 trillion in 2013, which was mainly the result of a few major construction projects such as the UAE refinery project, Saudi gas projects, and the aluminum project. Despite the successful bidding that increases firms’ overall revenues, a small number of projects Hindawi Publishing Corporation Mathematical Problems in Engineering Volume 2015, Article ID 830781, 12 pages http://dx.doi.org/10.1155/2015/830781

Upload: others

Post on 11-Aug-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Research Article ve Bayesian Classifier for Selecting Good ...downloads.hindawi.com/journals/mpe/2015/830781.pdf · Na \ve Bayesian Classifier for Selecting Good/Bad Projects during

Research ArticleNave Bayesian Classifier for Selecting GoodBad Projects duringthe Early Stage of International Construction Bidding Decisions

Woosik Jang1 Jung Ki Lee1 Jaebum Lee2 and Seung Heon Han1

1School of Civil and Environmental Engineering Yonsei University Seoul 120-749 Republic of Korea2Department of Civil and infrastructure Hyundai Engineering Co Ltd Seoul 140-2 Republic of Korea

Correspondence should be addressed to Seung Heon Han shh6018yonseiackr

Received 11 April 2015 Revised 17 July 2015 Accepted 26 July 2015

Academic Editor Mohamed Marzouk

Copyright copy 2015 Woosik Jang et al This is an open access article distributed under the Creative Commons Attribution Licensewhich permits unrestricted use distribution and reproduction in any medium provided the original work is properly cited

Since the 1970s revenues generated by Korean contractors in international construction have increased rapidly exceeding USD70 billion per year in recent years However Korean contractors face significant risks from market uncertainty and sensitivity toeconomic volatility and technical difficulties As the volatility of these risks threatens project profitability approximately 15 of badprojects were found to account for 74 of losses from the same international construction sector Anticipating bad projects viapreemptive risk management can better prevent losses so that contractors can enhance the efficiency of bidding decisions duringthe early stages of a project cycle In line with these objectives this paper examines the effect of such factors on the degree of projectprofitabilityTheNaıve Bayesian classifier is applied to identify a good project screening tool which increases practical applicabilityusing binomial variables with limited information that is obtainable in the early stages The proposed model produced superiorclassification results that adequately reflect contractor views of risk It is anticipated that when users apply the proposed modelbased on their own knowledge and expertise overall firm profit rates will increase as a result of early abandonment of bad projectsas well as the prioritization of good projects before final bidding decisions are made

1 Introduction

In 2013 the international construction industry generatedUSD 89 trillion in annual construction spending [1] Thecompound annual growth rate is predicted to reach 41 by2021 suggesting future steady growth and thus implying thatmore contractors will enter the international constructionmarket Korean contractors have attained the highest rev-enues in this market in which total international revenuesover the last five decades have reached USD 600 billion [2]In recent years this volume has reached more than USD 70billion per year placing Korea sixth among countries gen-erating the highest international revenues In achieving thislevel of success large Korean contractors have aggressivelyentered the international market competing against othercontractors from around the world

However given the presence of market volatility andother forms of uncertainty particularly in developing andunderdeveloped countries international construction pro-jects still carry a high degree of risk exposure which affects

the overall financial soundness of the contractor Koreaninternational construction profit rate data show that theaverage dropped from 47 to 38 and then to 32 from2010 to 2012 As Korean contractors have actively penetratedseveral countries and won bids for several internationalprojects the number of loss projects has also simultaneouslyincreased causing contractors to experience financial insta-bility According to actual data related to Korean contractorsas shown in Figure 1 15 of approximately 300 projectswere severely underperforming projects and these projectsaccounted for 74 of all losses [2] demonstrating that asmall portion of loss projects causes the majority of financiallosses among international projects For instance SamsungEngineeringrsquos annual report [3] showed an increase in totalrevenues of KRW 98 trillion However the report revealedoperating profit losses of KRW 103 trillion in 2013 whichwas mainly the result of a few major construction projectssuch as the UAE refinery project Saudi gas projects andthe aluminum project Despite the successful bidding thatincreases firmsrsquo overall revenues a small number of projects

Hindawi Publishing CorporationMathematical Problems in EngineeringVolume 2015 Article ID 830781 12 pageshttpdxdoiorg1011552015830781

2 Mathematical Problems in Engineering

Negative profit projects(number of projects)

Negative profit (85)

Extreme negativeprofit(15)

Negative profit projects(summation of cost)

Extremenegative profit

(74)

Negative profit (26)

Figure 1 Distribution of bad projects in international construction

such as these which cause negative profits after constructionare completed Other Korean firms have also experiencedsignificant earnings shocks in recent years illustrating theimportance of screening projects that cause such losses

Risk management has become central to successfulproject management particularly in an uncertain interna-tional construction environment [4ndash7] However approachesto risk management focus primarily on the engineering andconstruction phases that is after a bidding decision hasbeen made [8ndash12] Moreover these approaches are oftenlimited by a reliance on historical data which largely relyon the subjective judgments of users and hence do notprovide comprehensive support for all levels of industry[13 14] To overcome these limitations this paper examinesretrospective data to elicit implicit meaning from the actualperformance of previous cases The paper also identifies riskfactors that affect the profitability of international projectsand examines the relationship between these factors andproject profitability To this end the authors employ theNaıveBayesian classifier using binomial variables which improvepractical applicability and thus the burden of screening outbad projects through simple yet accurate methodology andresults for industry practitioners that are similar to actualproject performances Through the lens of this relationshipthe proposed model evaluates whether a project is likely tobe good or bad based on limited information available beforefinal bidding decisions are made

The procedure for developing a project classificationmodel is shown schematically in Figure 2 First the contentof previous research is examined to enhance the theoret-ical aspects of this study Based on a review of previousapproaches the limitations of existing strategies are alsoidentified

Second risk attributes that affect the soundness of con-struction projects in the early stages are identified in inter-views with relevant experts risk types are categorized basedon this informationThird case-based surveys are conductedto relate risk attributes with project profitability Case projectswere performed from 1999 to 2010 by Korean internationalcontractors Questionnaires assessed risk attributes of a given

project and actual levels of profitability forming the basis ofthe proposed model

Third the correlations between each risk attribute leveland project profitability are analyzed using a probabilisticclassification model To enhance the practical applicability ofthe model the Naıve Bayesian classifier is used as a properprobabilistic tool to support the screening of goodbadprojects in the early stages of a project The advantages anddetailed structure of this approach are also described

Finally cross validation is employed to test its usability Tovalidate the project classification model actual project dataare used to compare results obtained from the model withactual performance data testing the accuracy and reliabilityof the proposed model

2 Research Background

Risk management has been widely investigated by variousresearchers focusing on different phases of construction forthe management scope For example Hastak and Shaked[15] developed a model based on country firm and riskassessment projects Han et al [16] also developed a fullyintegrated risk management system (FIRMS) that consid-ers the enterprise and individual risk management projectlevel Furthermore various methodologies have been usedto develop risk management systems including case basedreasoning [17] artificial neural networks [18] the total riskindex [19 20] and the fuzzy analytic hierarchy process [4 21]

As international construction volumes have grownrapidly in recent decades various studies have assessedproject conditions based on the assumption that a full riskassessment of a given project can be performed prior tothe bidding stage by analyzing the information collectedHowever in reality it is impossible for contractors to considerall of the risks that affect project profitability at this early stageAccordingly del Cano and de la Cruz [22] identified certainproject characteristics including complexity size and ownercapabilities and suggested that risk analysis results shouldbe derived based on project characteristics The authors alsoemphasized that an appropriate risk management method

Mathematical Problems in Engineering 3

Literature review

Profitability risk attributes derivation

Project classification model development

Model validation

(i) International construction projectrsquos characteristics(ii) International construction risk management system

(iii) Project prediction methodology

(i) International construction projectrsquos risk attributes derivation

(ii) Profitability related risk attributes derivation(iii) Risk attributes assessment for impact

Cross validity for model validation

Naiumlve Bayesian classifier model development

Figure 2 Research procedure

based on representing attributes should be employed foreffective project screening modeling Thus a systematicmethodology that predicts project profitability during thisearly stage is imperative

For developing the prediction models the various the-ories and tools can be categorized as statistical or learningmethods Several researchers have used regression modelsthrough which the impact of each risk attribute is assessedon a Likert scale by experts [23 24] However such assess-ment is usually plagued by ambiguousness frommaintainingconsistency with different standards depending on expertrsquoscognitive limitations or the cooperation cultures duringsurveys In addition input data usually include subjectiveassessments that often distort practitionerrsquos decision-makingand judgment [15 25] As such previous studies that haveattempted to classify profit levels by Likert or numeric scalehave not yet reached a statistical level of significance inprofit prediction due to their compound mechanism anda large number of inputs for understanding complicatedproject information [15 26] To develop a more reliableprediction model and improve its applicability for industrypractitioners expert assessments should be consistent and amore simplified risk assessment model is imperative

Data-based learning tools have also been widely appliedto support the reliability of classifications such as Matlabfor Artificial Neural Network (ANN) C50 for decision treeand Linear Discriminant Analysis (LDA) ANN and C50are considered representative data mining techniques fordetecting dependent variables from the loads of input datawith complex calculations [27 28] LDA utilizes the linearfunctions between various classes where the data can becategorized according to the appropriate number of classesand its relationships [29] Unfortunately these data-miningtools do not properly explain the underpinning rule ofprediction results In addition these methods strictly requirea large number of data for improved prediction Hence theNaıve Bayesian classifier is considered a suitable method forthis research that uses a relatively small amount of historicaldata to determine whether a given project is likely to be bador good

To sum up previous studies on profit prediction havebeen limited to simple Likert scale assessments basedon expertsrsquo subjective judgments of contractor experiencepotentially generating skewed results These not only containunrealistic assumptions and a high burden for practical appli-cability but also show limited accuracy when such complexcalculations with a large number of input data are utilized[29] The authors thus present a simple binary classificationmodel based on the updated conditional probability datafrom the actual projectrsquos performance This approach isexpected to reduce bias while improving practical applicabil-ity

3 Risk Attributes of Project Profitability

Project classification model development begins by incorpo-rating overall risk attributes for international projects thatare present prior to early bidding stages Risks identified asimportant in previous studies were selected and analyzedvia meta-analysis and frequency measurements Studies byHastak and Shaked [15] theAmericanConstruction IndustryInstitute [30] the International Construction Associationof Korea [2] and Han et al [16 25] identified risk factorsthat affect construction project profitability In additionto conducting a literature review the authors performedinterviews with eight industry experts as a result 20 riskfactors were added to the analysis Table 1 presents a total of 71risk attributes derived from the aforementioned procedures

A case-based questionnaire with refined risk attributeswas designed to identify relevant information for the pro-posedmodelThe questionnairemeasured the effect of 71 riskfactors on profit levels on a 7-point Likert scale in which minus2refers to a positive impact (in the sense of gain) 0 indicates noimpact 2 refers to a negative impact and 4 refers to a criticalimpact on project profitability Sample projects were chosenamong actual international projects performed by the mostprominent 20 Korean contractors since 2000 Approximately900 projects were available of these 140 projects whichincluded sufficient data for prediction model developmentwere compiled for further analysis The collected data were

4 Mathematical Problems in Engineering

Table 1 Risk attributes affecting project profitability

Category Risk code Risk attribute Cronbachrsquos 120572 value

Political and social risks

R1 High degree of corruption collusion and illegal transaction

0787R2 Lack of familiarity with local cultures customs and

traditionsR3 Frequent policy and postlegislative changesR4 Insufficient legal system

Economic and inflation risksR5 High material unit price volatility

0735R6 High equipment unit price volatilityR7 High local currency exchange rate volatility

Owner risks

R8 Unsatisfactory owner management capabilities

0869

R9 Insufficient design information from the owner

R10 Lacking specifications and bidding documents provided bythe owner

R11 Approval and permission delays on documentation andconstruction

R12 Illegal requirements without contracts

Bidding and contract risks

R13 Insufficient time for bidding and construction quotationpreparation

0834

R14 Inadequate consideration of variations in contract durationR15 High-level local content requirementsR16 Complex tax and tariff regulationsR17 Low-level inflation compensation conditionsR18 Inadequate warranty-related conditions

R19 Inadequate payment regulations relative to internationalcontracts

R20 Ineffective dispute resolution procedures and regulations

Procurement risks

R21 Inadequate local labor supplies and procurement conditions

0905

R22 Inadequate local labor supplies and demand conditions

R23 Inadequate material production supplies and procurementconditions

R24 Inadequate material production supplies and demandconditions

R25 Inadequate material production from third-world-countrysupplies and demand conditions

R26 Inadequate equipment from local country supplies andprocurement conditions

R27 Inadequate local equipment supply procurement anddemand conditions

R28 Inadequate local subcontractor procurement conditions

Geographic risks

R29 Inadequate geographical conditions of materials andequipment delivery

0853

R30 Inadequate logistical conditions (ie electricity water gasand communication)

R31 Inadequate business environment and welfare conditionsR32 Inadequate weather conditions for constructionR33 Inadequate geological site conditions

R34 Severe complaints regarding construction throughout projectimplementation

Mathematical Problems in Engineering 5

Table 1 Continued

Category Risk code Risk attribute Cronbachrsquos 120572 value

Construction performance risks

R35 Lacking design experience

0917

R36 Lacking bid estimation experienceR37 Unsatisfactory cash flow management capabilities

R38 Unsatisfactory site manager leadership and organizationmanagement capabilities

R39 Unsatisfactory site manager experience relative to similarconstruction projects

R40 Unsatisfactory international project planning andmanagement capabilities

R41 Unsatisfactory quality management capabilitiesR42 Unsatisfactory resource management capabilitiesR43 Unsatisfactory conflict management capabilitiesR44 Unsatisfactory contract management capabilitiesR45 Lacking connectivity between sites and headquartersR46 Unsatisfactory localizationR47 Lacking trust between project participantsR48 Poor communication between project participantsR49 Poor project management capabilities based on ITR50 Inadequate construction methods for the specific projectR51 Insufficient subcontractor project performance

first tested for reliability stability consistency predictabil-ity accuracy and dependency [31] Cronbachrsquos coefficientalpha internal consistency method was employed and resultsranging from 08 to 09 were considered reliable The PASWStatistics 21 program was used to analyze the data As Table 1shows missing values in the raw data were excluded fromthe reliability test to increase reliability As a result a total ofseven categories with 51 attributes were selected an averagereliability of 0843 was obtained indicating a high level ofreliability A total of 121 datasets for the 140 projects wereused to develop the Naıve Bayesian classifier indicating thatapproximately 20 of the responses were excluded due todata inconsistency

4 Nave Bayesian Classifier

A Naıve Bayesian classifier based on Bayesrsquo theorem isemployed to derive a probabilistic classification model frompreviously assessed attributes It is most commonly used forantispam mail filtering which trains the filter to automati-cally separate spam mail and legitimate messages in a binarymanner [32 33] In doing so each attribute correspondswith words included in the spam and legitimate mails thusdistinguishing between the two in turn the former areblocked by a trained filter

41 Conceptual Structure of a Proposed Model Because theobjective of this research is to screen bad or good projectsin the early stages of international construction projects theresult of the proposed model should also be binary (good orbad) Thus the authors transformed the original conditions

of each risk attribute (7-point Likert scale) into a binaryscale to develop the Naıve Bayesian model In additionthe early stages of the construction possess a high level ofuncertainties which prohibit analysis of exact ranges of profitrates or detailed numeric classification of project conditionsAccordingly a specific profit level or numeric number ofeach risk attribute is not essential particularly during thisearly phase To this end authors consider the unity of thebinary distribution of the dependent variable as well as thepreassessed risk attributes to improve its practical application

Thebenefits of employing theNaıve Bayesian classifier arethe following [34ndash36] First the model is less sensitive to out-liers Generally when an outlier exists in the data the resultsmay skew The reduction of outlier effects is thus requiredfor data analysis Because the Naıve Bayesian classifier usesa probabilistic distribution it enables minimizing the outliereffects of the original data and prediction results are lesssensitive to outliers Second less data is required to predictthe parameter By using the utility function relearning theprobability of a given conditional probability is not strictlyrequired in the Naıve Bayesian classifier thus minimizingdata collection efforts for risk prediction Third the classifieris more applicable to the industry practitioners by reducingthe burden of data collection Although the model utilizessimple assumptions and algorithms the classification resultsare powerful even though the specified assumptions maycontain errors Finally themodels allow for the considerationof additional attributes by combining existing characteris-tics of the attributes thus increasing the accuracy of themodelrsquos predictions Based on the aforementioned benefitsof the Naıve Bayesian method the authors have chosen this

6 Mathematical Problems in Engineering

RisknYes

No

Risk2Yes

No

Good

Bad

Risk1Yes

No

Real data profitability

P(r1 = yes | good)

Figure 3 Conceptual model of the Naıve Bayesian classifier

approach as themainmechanism for the proposed predictionmodel

To structure the Naıve Bayesian classifier we consideredthe unity of the binary distribution of the dependent vari-ables As mentioned earlier the effect of each risk attributeon the level of profit is assessed by using a 7-point Likertscale (minus2 to 4) to improve themodelrsquos practical application Inthis study a binary approach is used to identify the presenceof risk and final project profitability is predicted via theNaıve Bayesian classification Because this research is focusedon the early stages of the construction project screeningand recognizing a bad project are the priority rather thananalyzing the specific value of the project Therefore theassessment range from minus2 to 1 correlated with absence of riskor a ldquoNordquo answer and a score from 2 to 4 was correlated withthe presence of risk or a ldquoYesrdquo answer as shown in Figure 3

As the assessment approach is fairly simple users per-form fewer risk assessment tasks increasing the accuracy ofpredictions The benefit of a Naıve Bayesian classifier is thefact that all risk probabilities are assumed to be independentaccording to Murphy [35] although these assumptions maynot be correct developing a model using the Naıve Bayesianclassifier is feasible and generates highly reliable results forclassification compared to other methodologies

Equation (1) presents the classification group 119888 set (iegood or bad project) where 119903

1 119903

2 119903

3 119903

119863are vectors

ascribed to each independent risk variable in the NaıveBayesian classifier This categorization of attributes maxi-mizes the probability of each classification result Consider

119875 = (119888 | 119903

1 1199032 119903119863) (1)

Equation (2) shows the prediction mode that applies (1) toBayesrsquo theorem Consider

119875 (119888) times 119875 (1199031 1199032 119903119863 | 119888)

119875 (1199031 1199032 119903119863) (2)

As discussed above the Naıve Bayesian classifier assumesthat each attribute divided into each classification is indepen-dent thus (3) can be derived accordingly where V is the targetvalue of the group 119888 set (ie the profit rate) Consider

119875 (1199032 | 119888 1199031) = 119875 (1199032 | 119888)

119875 (119903 | V= 119888) =119863

prod

119894=1119875 (119903

119894| V119895= 119888)

(3)

Hence the Naıve Bayesian classifier is considered a suit-able method that uses a relatively small amount of historicaldata to determine whether projects are likely to be bador good The next section describes the proposed modelwhich uses theNaıve Bayesian classifier to examine the actualproject data collected previously

42 Naıve Bayesian Project Classification Model In devel-oping the Naıve Bayesian model the aforementioned riskattributes were used as relevant independent variables forpredicting good or bad projects An overview of the modelis presented in Figure 4

The dependent variable was predicted over two or threeclassification intervals and the two models are presentedaccordingly The model with two intervals is based on aprofitability of less than 0 for a bad project and morethan 0 for a good project The model with three intervalsrepresents projects with less than 0 profitabilitymdashthosegenerating a 0 to 45 rate of returnmdashand a profit rateabove 45 which is considered a good project A marginalprofit rate of 45 was chosen because the average profitrate for good projects out of the total sample of projectswas calculated at this range Because projects with negativeprofitability are all considered ldquobadrdquo ldquomoderaterdquo and ldquogoodrdquoprojects are differentiated based on the reference profit valueof 45 Table 2 represents intervals divided based on theresults

Using the training dataset each risk attribute is catego-rized into a specific class to develop the project-screeningmodelTherefore the conditional probability of each attribute

Mathematical Problems in Engineering 7

[Input]Profitability assessment

(i) Define the attributes for risk management (51 risk attributes)

(ii) Assess the impact on profitability (Likert scale)

[Function]Classification and analysis

(i) Determine the number of classification intervals (Naiumlve Bayesian classifier )

(ii) Selection of analysis methodrisk assessment

[Output]

Profitability assessment result

Figure 4 Overview of project classification model

Table 2 Classification of profitability

Bad project Moderate project Good project2 classificationintervals Below 0 mdash Above 0

3 classificationintervals Below 0 0ndash45 Above 45

given to the bad project in advance is shown in the followingequation

119875 (119903

119894| V119895) =

119899

119888

119899

(4)

119875 is estimated conditional probability 119903119894is risk attributes 119894

V119895is the condition of profit rate on Bad or Good based on the

training data (V=below0) 119899119888is number of bad projects under

the specific risk 119886119894in the training data (V = V

119895and 119903 = 119903

119894) and

119899 is number of bad projects in the training data (V = V119895)

However whereas this study examines 140 units of datathe number of training data is insufficient relative to thenumber of risk attributes Unbalanced datasets also lead to aninappropriate result in the statistical analysis [37] It is thusnecessary to augment the small datasets in the training setwhich decreases the accuracy of themodel To overcome thisthe authors borrowed the concept of the 119898-estimate from[38ndash40] By setting the value of a suitable ldquo119898rdquo number theshifting of probability toward parameter ldquo119901rdquo is controlledthis also further increases model accuracy as shown in thefollowing equation

119875

1015840(119903

119894| V119895) =

119899

119888+ 119898119901

119899 + 119898

(5)

119875

1015840 is estimated conditional probability using 119898-estimate 119898is equivalent data size and 119901 is prior probability based ontraining data (4)

Zadrozny and Elkan [38] recommend that the value of119898119901 be set to less than 10 thus using a probabilistic equationto cross-validate 140 data units 70 units are set as trainingdata and the remaining 70 units of119898 parameter are set to anequivalent data size to develop the project screening modelIn turn parameter 119901 becomes 1070 or 17 for (5) Of the 140data units collected 20 units were therefore classified as badprojects and 120 were deemed good projects Therefore thetwo datasetsmdashtraining and equivalent data sizemdashconsist of 10bad projects and 60 good projects that are chosen via randomselection

Five different datasets are created randomly to developand validate the model in view of cross validation Randomdata are selected using Microsoft Excel which generates datain a range from 0 to 1 for random probability In addition forthe training dataset the actual risk exposure level for eachattribute in a given project is used to develop the model Inthis paper only one set of training data results out of fivedifferent datasets is presented in detail as the other resultswere found to be almost equivalent

The Naıve Bayesian classifier consists of an 119898-estimatorparameter which gives a random value of 119898 to controlthe classification bias This study uses 119898 value from theldquomoderaterdquo interval of project profitability which had thehighest error percentage for further validation According tothe results of the first analysis119898 value in the Naıve Bayesianclassifier was reduced in the second analysis by modifying119898 value the moderate profitability project produces a morestable result In addition the weights given to the bad projectsincreased which results in both increased stability of theclassification and reduction of bad projectrsquos classificationerror Therefore the analysis of the moderate classificationresults was reanalyzed To identify a suitable 119898 value asensitivity analysis was also performed for the changes as 119898value decreases

8 Mathematical Problems in Engineering

Table 3 Part of the conditional probability of Naıve Bayesianclassifier model

Category Risk code 1198751015840(119903119894| BAD) 1198751015840(119903

119894| GOOD)

Political and socialrisks

R1 35 234R3 15 56

Economic andinflation risks

R5 15 89R7 20 242

Bidding and contractrisk

R15 10 274R19 15 153

Procurement risks R22 30 362R24 5 185

Geographic risks R29 40 226

By using the derived model using 119898-estimate each riskfactor is calculated for the conditional probability for Badand Good The two classification intervals model is usedto analyze each risk factor Table 3 shows the conditionalprobability of the key risk factors from each category thathas a severely negative impact (assessed as 3 or 4 points)on the total sample data (140 projects) For example theR24 risk had an 185 probability of occurring in a ldquoGoodrdquoproject and a 5 probability of occurring in a ldquoBadrdquo projectTo summarize Table 3 external (politic or economic) factorsmore frequently appear in the ldquobadrdquo projects while internal(bidding or procurement) factors more frequently occurr inthe ldquogoodrdquo projects

5 Model Analysis and Validation

51 Model Validation In employing the classifier model twodatasets (training and sample test datasets) for 70 projectswere used to compare actual and predicted risk impacts onthe dependent variable Actual risk effects in a given projectwere used to determine the profitability of the 70 projects Inaddition to evaluating the performance level of themodel themodelrsquos usability and accuracy were tested using a confusionmatrix of Bayesrsquo theorem To determine the accuracy ofthe model the dependent variable profitability included asample test dataset of 70 projects for comparison with theactual project performance and predicted performance ofa given project The 2 and 3 classification intervals modelswere also tested by classification into bad (PB PredictedBad) moderate (PM Predicted Moderate) and good (PGPredicted Good) categories the training dataset for the 70projects also employs the actual profit level which is classifiedas bad (AB Actual Bad) normal (AM Actual Moderate) orgood (AG Actual Good) Tables 4 and 5 present the results ofthe two models

The aforementioned fivemodels were randomly tested forcross validation In the test models a dataset of 70 projectswas randomly selected to determine the modelrsquos accuracy Asan initial measure each project was deemed bad or goodAs shown in Table 4 the overall correct level is found to be797 by dividing the bold numbers with the total number of

Table 4 Validation of profitability at 2 classification intervals

Model 1 out of fivecross validations

Classification levelBad (PB) Good (PG) Total

Actual LevelBad (AB) 5 5 10Good (AG) 9 51 60Total 14 56 70

Table 5 Validation of profitability at 3 classification intervals

Model 1 out of fivecross validations

Classification levelTotalBad

(PB)Moderate(PM)

Good(PG)

Actual LevelBad (AB) 3 5 2 10Moderate (AM) 3 16 13 32Good (AG) 3 13 12 28Total 9 34 27 70

samples Other cross validation models also showed highlyaccurate levels of 786 771 827 and 80 respectively

In contrast the model with three classified intervalspresented relatively low levels of 443 429 514 486and 500 with a mean percentage of 448 reflectingless accurate results (refer to Table 5) Models with 2 and3 classification intervals showed standard deviations of 207and 307 respectively this suggests the presence of a fewdifferences and consistent results between the two modelsThe 2 classification intervals model is considered an excellentvehicle for screening purposes as it presents highly accurateresults and less deviation Although the 3 classificationintervals model shows a relatively low level of accuracy ina real business environment the users also may apply the 3classification intervals model for screening out a moderaterange of projects by sacrificing the prediction accuracy

Additional analysis to validate the Naıve Bayesian classi-fication model is performed through comparing with linearregression model using SPSS 21 The authors used the samebinary method of dividing the profit rate into good andbad (2 classification intervals) By utilizing the same 140projectsrsquo units of data that is composed of 70 training unitsof data and 70 test units of data the accuracy result of linearregression model is found to be 522 This result shows asignificantly lower accuracy than the 2 classification intervalsNaıve Bayesian model (797) and slightly higher accuracythan the 3 classification intervals model (448)

In summary this study proposed two types of modelsdepending on the userrsquos need and degree of risk exposureThemodelswere validated through comparing the accuracy of theprediction result with other learning methods In additionthe user can flexibly select the proper model that best suitedto a given firm whether he or she requires an accurateclassification between good and bad project or is contend toidentify a moderate or good project

Mathematical Problems in Engineering 9

Profitability assessment

Method

Category Risk code Risk attributes Risk

Countryrsquos politicaland social risk

R1 Countryrsquos corruption collusion and illegal transaction O

R2 Culture customs and tradition difference X

R3 Policy and postlegislative changes X

R4 Countryrsquos overall legal systemrsquos maturity X

Countryrsquos economic and inflation risks

R5 Average material unit price change X

R6 Average equipment unit price change X

R7 Local currency exchange rate change X

Stakeholder risks

R8 Stakeholderrsquos understanding and management ability O

R9 Design error provided to stakeholder O

R10 Clarity of specifications and bidding documents provided to stakeholder X

R11 Stakeholderrsquos administrative and construction consent delay X

R12 Requirements regardless of contract from stakeholder X

Bidding and contract terms

R13 Bidding and quotation preparation time for construction X

R14 Contract of construction duration for construction X

R15 Required degree of local contents X

R16 Tax and tariff regulations X

R17 Inflation compensation conditions compared to international contract X

R18 Warranty-related conditions compared to international contract X

R19 Payment regulations compared to international contract X

R20 Dispute resolution procedures regulation compared to international contract X

1 Naiumlve Bayesian classifier

Figure 5 Excel-based risk assessments

52 Excel-Based Model Development Based on the afore-mentioned basic algorithm the authors developed an Excel-based system that serves as a continuous risk managementplatform for project screening for the international con-struction sector To initiate project screening risk attributesrelated to a given project are assessed for their impacton project profit depending on the amount of informationacquired As shown in Figure 5 system inputs for riskassessment are based on two scenarios the specified riskeither affects the project or does not In addition usersmay select their model depending on their objectives themore accurate Naıve Bayesian classifier with a higher projectclassification rate (2 classification intervals model) or the lessaccurate model with amore conservative classification rate (3classification intervals model)

The authors present a sample case in which the developedprogram is utilized to test the project screening modelthrough the user interface This case considers a Koreancontractor working on a harbor expansion project that lasts22 months Figure 5 shows step 1 which is the actual user

interface of the programgeneratedwith case study data inputsfor goodbad project screening As shown in Figure 5 theuser first evaluates the risk attribute using ldquoOrdquo for ldquoriskexistsrdquo indicating the presence of a given risk and ldquoXrdquo forldquorisk does not existrdquo indicating an absence of project riskTheuser then selects a model that is the number of classificationintervals Depending on the model selected the classificationmodel categorizes projects as bad moderate or good Themodel classifies the good projects as ranging from 0 to 45of the profit rate Project screening is conducted by selectingprofit interval ranges depending on firm risk behaviorModelresults are obtained from the training sample via randomselection through which classification results are selectedfrom the highest predicted value from 100 prediction cyclesThis process increases reliability and the results are shown inpercentage form for ease of comprehension

Figure 6 shows the procedural steps in the process of theExcel-based model Risk assessment is performed primarilyin step 1 Then in step 2 the model counts for the riskexistence based on the historical dataset of 140 real projects

10 Mathematical Problems in Engineering

R1 R2 R3 R4 R5 R6 R7 R8 R9 R10

Bad

10 10 10 10 10 10 10 10 10 10

1 1 0 0 2 3 1 0 0 0

0142857143 0142857143 0142857 0142857 0142857 0142857 0142857 0142857 0142857 0142857

001 001 001 001 001 001 001 001 001 001

01000 01000 00001 00001 01999 02998 01000 00001 00001 00001

Select

Result 1 1 1 1 1 1 1 1 1 1

0142857143

Moderate

31 31 31 31 31 31 31 31 31 31

0 0 0 0 0 0 0 0 0 0

0442857143 0442857143 0442857 0442857 0442857 0442857 0442857 0442857 0442857 0442857

001 001 001 001 001 001 001 001 001 001

0000142811 0000142811 0000143 0000143 0000143 0000143 0000143 0000143 0000143 0000143

Select 0 0 0 0 0 0 0 0 0 0

Result 1 1 1 1 1 1 1 1 1 1

0442857143

Good

29 29 29 29 29 29 29 29 29 29

17 10 15 12 29 12 10 19 14 9

0414285714 0414285714 0414286 0414286 0414286 0414286 0414286 0414286 0414286 0414286

001 001 001 001 001 001 001 001 001 001

0586147634 0344851529 0517206 0413793 0999798 0413793 0344852 0655089 0482735 0310381

Select 0 0 0 0 0 0 0 0 0 0

Result 1 1 1 1 1 1 1 1 1 1

0414285714

Result table Bad Good Moderate Final result Good

Max

Profitability assessment result

Project screening result Good project

Top 5 risks to be managed Probability

1 Average material unit price change 0999

2 Countryrsquos local subcontractorrsquos procurement conditions 0704

3 Complaints from construction during the project 0655

4 Stakeholderrsquos administrative and construction consent delay 0586

5 Clarity of specifications and bidding documents provided to stakeholders 0555

4

5

3

2

n

nc

p

m

p (good)

17238E minus 35

69489E minus 07

719685E minus 4769489E minus 07

n

nc

p

m

p (normal)

n

nc

p

m

p (bad)

p (ai | bad)

p (ai | normal)

p (ai | good)

Figure 6 Process for assessment result

Through the sum of its risk existence the conditional proba-bility is calculated given each classification of a project (GoodModerate or Bad) separately

Step 3 shows the result table for each classification thathas been selected from the profitability assessmentThis is thestep in which users are to perform the risk assessment fromFigure 6 where selected risks are calculated for conditionalprobability The result is calculated from the product ofeach probability from the 51 risks attributes Step 4 thenshows the final result by selecting the highest probabilityfrom each classification In the illustrative example ldquoGoodrdquohad the highest probability In addition as shown in step 5in the user interface the risks that most directly affectproject profitability can be prioritized which should be

continuouslymonitored and controlled by the userThese fiverisk attributes serve as the basis of a risk response strategy

The proposed model prioritizes the top 5 risks that affectoverall construction profitability According to Halawa et al[41] construction projects are dramatically influenced bythis financial evaluation and awareness of the financial statusimproves it offers a better chance of determining project fea-sibility Therefore this study incorporates the Naıve Bayesianclassifier with an Excel-based model to give practitioners aquantitative analysis result (statistical references) and quali-tative analysis results (the top risk attributes to be managed)It should also be noted that although the result of the modelshows an absolute value of bad moderate or good it is upto the managers and decision-makers to make a bad project

Mathematical Problems in Engineering 11

a moderate project or even good project by careful andproactive risk management as presented in the illustrativecase application

6 Conclusions

This study predicted the profitability of early-stage inter-national construction projects and identified good projectsthat may affect firmrsquos financial stability The current stateof the international construction market was first analyzedlosses were shown to account for 73 of the entire marketreflecting the importance of predicting project profitability inadvance Risk attributes that affect project profitability wereidentified and a risk management system was developed tomanage prioritize and monitor higher impact risks TheNaıve Bayesian classifier method was used to improve thepractical application of the project-screening model Whenrisk attribute assessments for determining risk exposurelevels are completed assessed risk exposure is classified intoa binary parameter that presents an approach to riskmanage-ment that is supported by a probabilistic methodology

This study used case study data to create a model usingMicrosoft Excel Depending on the Naıve Bayesian classifier119898-estimator value of the weight application and the userrsquosrisk behaviors users can choose the model that best suitsa given firm The results show that when a 70-unit trainingdataset and 70-unit sample dataset area are used (a 140-unitset in total) to cross-validate the Naıve Bayesian classifier themodel has on average a 797 prediction accuracy for the 2classification intervals model and a 448 accuracy for the 3classification intervals model Additional analysis to validatethe model is performed through linear regression analysisto show the higher precision result from the Naıve Bayesianclassification model

For project screening conditional probability is usedwithinformation acquired during early project stages Althoughrisk attributes were largely derived in the model riskattributes that more clearly affect projects can be furtherextracted elucidating the effects of bad projects

As for the limitations of this study analytical results forthe application of the developed program to a more diversescenario will be essential for example different types ofconstruction projects may be examined for superior resultsIn addition if contract types were categorized as contractorsubcontractor and joint ventures this may have altered theprofitability of each project A new-entrymarket scenariowillalso be analyzed separately due to the unique characteristicsof newmarkets Given a range of unique construction projectqualities a more realistic and practical application can beobtained in future research Furthermore if additional dataon both project risk assessment and the actual reliable profitrates are accumulated a diverse segmentation of the intervalscan be obtained to examine each project in a precisemeasure

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgment

This research was supported by a grant (14IFIP-B089065-01)funded by Ministry of Land Infrastructure and Transport ofKorean government

References

[1] Global Insight Global Construction Outlook Global InsightLexington Mass USA 2014

[2] International Contractors Association of Korea (ICAK) Inter-national Construction Information Service Construction Statis-tics 2013 httpwwwicakorkrstasta 0101php

[3] Samsung Engineering Business Results ReportThird Quarter of2013 Samsung Engineering Seoul Republic of Korea 2013

[4] M Abdelgawad and A R Fayek ldquoRisk management in theconstruction industry using combined fuzzy FMEA and fuzzyAHPrdquo Journal of Construction Engineering and Managementvol 136 no 9 pp 1028ndash1036 2010

[5] G Aydogan and A Koksal ldquoAn analysis of internationalconstruction risk factors on partner selection by applying ANPapproachrdquo in Proceedings of the International Conference onConstruction and Real Estate Management (ICCREM rsquo13) pp658ndash669 Karlsruhe Germany October 2013

[6] B-G Hwang X Zhao and L P Toh ldquoRisk management insmall construction projects in Singapore status barriers andimpactrdquo International Journal of Project Management vol 32no 1 pp 116ndash124 2014

[7] A E Yildiz I Dikmen M T Birgonul K Ercoskun andS Alten ldquoA knowledge-based risk mapping tool for costestimation of international construction projectsrdquo Automationin Construction vol 43 pp 144ndash155 2014

[8] O Taylan AO Bafail RM S Abdulaal andMRKabli ldquoCon-struction projects selection and risk assessment by fuzzy AHPand fuzzy TOPSISmethodologiesrdquoApplied Soft Computing vol17 pp 105ndash116 2014

[9] S Mu H Cheng M Chohr and W Peng ldquoAssessing riskmanagement capability of contractors in subway projects inmainland Chinardquo International Journal of Project Managementvol 32 no 3 pp 452ndash460 2014

[10] F Acebes J Pajares J M Galan and A Lopez-Paredes ldquoA newapproach for project control under uncertainty Going back tothe basicsrdquo International Journal of ProjectManagement vol 32no 3 pp 423ndash434 2014

[11] Y Zhang and Z-P Fan ldquoAn optimization method for selectingproject risk response strategiesrdquo International Journal of ProjectManagement vol 32 no 3 pp 412ndash422 2014

[12] M Rohaninejad andM Bagherpour ldquoApplication of risk analy-sis within value management a case study in dam engineeringrdquoJournal of Civil Engineering and Management vol 19 no 3 pp364ndash374 2013

[13] Q Yang and Y Zhang ldquoApplication research of AHP in projectrisk managementrdquo Advanced Materials Research vol 785-786pp 1480ndash1483 2013

[14] O Kaplinski ldquoThe utility theory in maintenance and repairstrategyrdquo Procedia Engineering vol 54 pp 604ndash614 2013

[15] M Hastak and A Shaked ldquoICRAM-1 model for internationalconstruction risk assessmentrdquo Journal of Management in Engi-neering vol 16 no 1 pp 59ndash69 2000

[16] S H Han W Jung D Y Kim and H Park ldquoA study for theimprovement of international construction risk managementrdquo

12 Mathematical Problems in Engineering

Conference of Korean Society of Civil Engineering (KSCE) no 10pp 791ndash794 2009

[17] V Kumar and N Viswanadham ldquoA CBR-based decision sup-port system framework for construction supply chain riskmanagementrdquo in Proceedings of the 3rd IEEE InternationalConference on Automation Science and Engineering (CASE rsquo07)pp 980ndash985 IEEE Scottsdale Ariz USA September 2007

[18] B Yang L X Li H Ji and J Xu ldquoAn early warning systemfor loan risk assessment using artificial neural networksrdquoKnowledge-Based Systems vol 14 no 5-6 pp 303ndash306 2001

[19] P Sanchez ldquoNeural-risk assessment system for constructionprojectsrdquo in Proceedings of the Construction Research Congresspp 1ndash11 San Diego Calif USA April 2005

[20] B R Fortunato IIIM RHallowellM Behm andKDewlaneyldquoIdentification of safety risks for high-performance sustainableconstruction projectsrdquo Journal of Construction Engineering andManagement vol 138 no 4 pp 499ndash508 2012

[21] J Li and P X W Zou ldquoFuzzy AHP-based risk assessmentmethodology for PPP projectsrdquo Journal of Construction Engi-neering and Management vol 137 no 12 pp 1205ndash1209 2011

[22] A del Cano and M P de la Cruz ldquoIntegrated methodology forproject risk managementrdquo Journal of Construction Engineeringand Management vol 128 no 6 pp 473ndash485 2002

[23] A P C Chan D C K Ho and C M Tam ldquoDesign andbuild project success factors multivariate analysisrdquo Journal ofConstruction Engineering and Management vol 127 no 2 pp93ndash100 2001

[24] D J Lowe and J Parvar ldquoA logistic regression approachto modelling the contractorrsquos decision to bidrdquo ConstructionManagement and Economics vol 22 no 6 pp 643ndash653 2004

[25] S H Han D Y Kim and H Kim ldquoPredicting profit per-formance for selecting candidate international constructionprojectsrdquo Journal of Construction Engineering andManagementvol 133 no 6 pp 425ndash436 2007

[26] A S Bu-Qammaz I Dikmen andM T Birgonul ldquoRisk assess-ment of international construction projects using the analyticnetwork processrdquoCanadian Journal of Civil Engineering vol 36no 7 pp 1170ndash1181 2009

[27] T Hegazy and A Ayed ldquoNeural network model for parametriccost estimation of highway projectsrdquo Journal of ConstructionEngineering and Management vol 124 no 3 pp 210ndash218 1998

[28] RuleQuest Data mining tools 2008 httpwwwrule-questcom

[29] W Jung Three-Staged Risk Evaluation Model for Bidding onInternational Construction Projects Graduate School YonseiUniversity School of Civil and Environmental EngineeringSeoul Republic of Korea 2011

[30] Construction Industry Institute (CII) ldquoRisk assessment forinternational projectsrdquo Research Report 181-11 ConstructionIndustry Institute (CII) Austin Tex USA 2003

[31] G Young ldquoGlossary and discussion of termsrdquo in MalingeringFeigning and Response Bias in PsychiatricPsychological Injuryvol 56 of International Library of Ethics Law and the NewMedicine pp 777ndash795 Springer Dordrecht The Netherlands2014

[32] M Sahami S Dumais D Heckerman and E Horvitz ldquoABayesian approach to filtering junk e-mailrdquo in Learning for TextCategorization Papers from the 1998 Workshop vol 62 pp 98ndash105 1998

[33] I Androutsopoulos J Koutsias K V Chandrinos G Paliourasand C Spyropoulos ldquoAn evaluation of Naıve Bayesian anti-spam filteringrdquo in Proceedings of the Workshop on Machine

Learning in the New Information Age 11th European Conferenceon Machine Learning G Potamias V Moustakis and M vanSomeren Eds pp 9ndash17 Barcelona Spain 2000 httparxivorgpdfcs0006013pdf

[34] H Zhang ldquoThe optimality of naıve Bayesrdquo in Proceedings of the17th International Florida Artificial Intelligence Research SocietyConference (FLAIRS rsquo04) Miami Beach Fla USA May 2004

[35] K P Murphy Naive Bayes Classifiers University of BritishColumbia Vancouver Canada 2006

[36] R R Yager ldquoAn extension of the naive Bayesian classifierrdquoInformation Sciences vol 176 no 5 pp 577ndash588 2006

[37] D Lowd and P Domingos ldquoNaive Bayes models for probabilityestimationrdquo in Proceedings of the 22nd International Conferenceon Machine Learning (ICML rsquo05) pp 529ndash536 ACM BonnGermany August 2005

[38] B Zadrozny and C Elkan ldquoLearning and making decisionswhen costs and probabilities are both unknownrdquo in Proceedingsof the 7th ACMSIGKDD International Conference on KnowledgeDiscovery and Data Mining pp 204ndash213 ACM August 2001

[39] L Jiang H Zhang Z H Cai and D Wang ldquoWeighted averageof one-dependence estimatorsrdquo Journal of Experimental andTheoretical Artificial Intelligence vol 24 no 2 pp 219ndash230 2012

[40] J Wu and Z Cai ldquoA naive Bayes probability estimation modelbased on self-adaptive differential evolutionrdquo Journal of Intelli-gent Information Systems vol 42 no 3 pp 671ndash694 2014

[41] W S Halawa A M K Abdelalim and I A ElrashedldquoFinancial evaluation program for construction projects at thepre-investment phase in developing countries a case studyrdquoInternational Journal of Project Management vol 31 no 6 pp912ndash923 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 2: Research Article ve Bayesian Classifier for Selecting Good ...downloads.hindawi.com/journals/mpe/2015/830781.pdf · Na \ve Bayesian Classifier for Selecting Good/Bad Projects during

2 Mathematical Problems in Engineering

Negative profit projects(number of projects)

Negative profit (85)

Extreme negativeprofit(15)

Negative profit projects(summation of cost)

Extremenegative profit

(74)

Negative profit (26)

Figure 1 Distribution of bad projects in international construction

such as these which cause negative profits after constructionare completed Other Korean firms have also experiencedsignificant earnings shocks in recent years illustrating theimportance of screening projects that cause such losses

Risk management has become central to successfulproject management particularly in an uncertain interna-tional construction environment [4ndash7] However approachesto risk management focus primarily on the engineering andconstruction phases that is after a bidding decision hasbeen made [8ndash12] Moreover these approaches are oftenlimited by a reliance on historical data which largely relyon the subjective judgments of users and hence do notprovide comprehensive support for all levels of industry[13 14] To overcome these limitations this paper examinesretrospective data to elicit implicit meaning from the actualperformance of previous cases The paper also identifies riskfactors that affect the profitability of international projectsand examines the relationship between these factors andproject profitability To this end the authors employ theNaıveBayesian classifier using binomial variables which improvepractical applicability and thus the burden of screening outbad projects through simple yet accurate methodology andresults for industry practitioners that are similar to actualproject performances Through the lens of this relationshipthe proposed model evaluates whether a project is likely tobe good or bad based on limited information available beforefinal bidding decisions are made

The procedure for developing a project classificationmodel is shown schematically in Figure 2 First the contentof previous research is examined to enhance the theoret-ical aspects of this study Based on a review of previousapproaches the limitations of existing strategies are alsoidentified

Second risk attributes that affect the soundness of con-struction projects in the early stages are identified in inter-views with relevant experts risk types are categorized basedon this informationThird case-based surveys are conductedto relate risk attributes with project profitability Case projectswere performed from 1999 to 2010 by Korean internationalcontractors Questionnaires assessed risk attributes of a given

project and actual levels of profitability forming the basis ofthe proposed model

Third the correlations between each risk attribute leveland project profitability are analyzed using a probabilisticclassification model To enhance the practical applicability ofthe model the Naıve Bayesian classifier is used as a properprobabilistic tool to support the screening of goodbadprojects in the early stages of a project The advantages anddetailed structure of this approach are also described

Finally cross validation is employed to test its usability Tovalidate the project classification model actual project dataare used to compare results obtained from the model withactual performance data testing the accuracy and reliabilityof the proposed model

2 Research Background

Risk management has been widely investigated by variousresearchers focusing on different phases of construction forthe management scope For example Hastak and Shaked[15] developed a model based on country firm and riskassessment projects Han et al [16] also developed a fullyintegrated risk management system (FIRMS) that consid-ers the enterprise and individual risk management projectlevel Furthermore various methodologies have been usedto develop risk management systems including case basedreasoning [17] artificial neural networks [18] the total riskindex [19 20] and the fuzzy analytic hierarchy process [4 21]

As international construction volumes have grownrapidly in recent decades various studies have assessedproject conditions based on the assumption that a full riskassessment of a given project can be performed prior tothe bidding stage by analyzing the information collectedHowever in reality it is impossible for contractors to considerall of the risks that affect project profitability at this early stageAccordingly del Cano and de la Cruz [22] identified certainproject characteristics including complexity size and ownercapabilities and suggested that risk analysis results shouldbe derived based on project characteristics The authors alsoemphasized that an appropriate risk management method

Mathematical Problems in Engineering 3

Literature review

Profitability risk attributes derivation

Project classification model development

Model validation

(i) International construction projectrsquos characteristics(ii) International construction risk management system

(iii) Project prediction methodology

(i) International construction projectrsquos risk attributes derivation

(ii) Profitability related risk attributes derivation(iii) Risk attributes assessment for impact

Cross validity for model validation

Naiumlve Bayesian classifier model development

Figure 2 Research procedure

based on representing attributes should be employed foreffective project screening modeling Thus a systematicmethodology that predicts project profitability during thisearly stage is imperative

For developing the prediction models the various the-ories and tools can be categorized as statistical or learningmethods Several researchers have used regression modelsthrough which the impact of each risk attribute is assessedon a Likert scale by experts [23 24] However such assess-ment is usually plagued by ambiguousness frommaintainingconsistency with different standards depending on expertrsquoscognitive limitations or the cooperation cultures duringsurveys In addition input data usually include subjectiveassessments that often distort practitionerrsquos decision-makingand judgment [15 25] As such previous studies that haveattempted to classify profit levels by Likert or numeric scalehave not yet reached a statistical level of significance inprofit prediction due to their compound mechanism anda large number of inputs for understanding complicatedproject information [15 26] To develop a more reliableprediction model and improve its applicability for industrypractitioners expert assessments should be consistent and amore simplified risk assessment model is imperative

Data-based learning tools have also been widely appliedto support the reliability of classifications such as Matlabfor Artificial Neural Network (ANN) C50 for decision treeand Linear Discriminant Analysis (LDA) ANN and C50are considered representative data mining techniques fordetecting dependent variables from the loads of input datawith complex calculations [27 28] LDA utilizes the linearfunctions between various classes where the data can becategorized according to the appropriate number of classesand its relationships [29] Unfortunately these data-miningtools do not properly explain the underpinning rule ofprediction results In addition these methods strictly requirea large number of data for improved prediction Hence theNaıve Bayesian classifier is considered a suitable method forthis research that uses a relatively small amount of historicaldata to determine whether a given project is likely to be bador good

To sum up previous studies on profit prediction havebeen limited to simple Likert scale assessments basedon expertsrsquo subjective judgments of contractor experiencepotentially generating skewed results These not only containunrealistic assumptions and a high burden for practical appli-cability but also show limited accuracy when such complexcalculations with a large number of input data are utilized[29] The authors thus present a simple binary classificationmodel based on the updated conditional probability datafrom the actual projectrsquos performance This approach isexpected to reduce bias while improving practical applicabil-ity

3 Risk Attributes of Project Profitability

Project classification model development begins by incorpo-rating overall risk attributes for international projects thatare present prior to early bidding stages Risks identified asimportant in previous studies were selected and analyzedvia meta-analysis and frequency measurements Studies byHastak and Shaked [15] theAmericanConstruction IndustryInstitute [30] the International Construction Associationof Korea [2] and Han et al [16 25] identified risk factorsthat affect construction project profitability In additionto conducting a literature review the authors performedinterviews with eight industry experts as a result 20 riskfactors were added to the analysis Table 1 presents a total of 71risk attributes derived from the aforementioned procedures

A case-based questionnaire with refined risk attributeswas designed to identify relevant information for the pro-posedmodelThe questionnairemeasured the effect of 71 riskfactors on profit levels on a 7-point Likert scale in which minus2refers to a positive impact (in the sense of gain) 0 indicates noimpact 2 refers to a negative impact and 4 refers to a criticalimpact on project profitability Sample projects were chosenamong actual international projects performed by the mostprominent 20 Korean contractors since 2000 Approximately900 projects were available of these 140 projects whichincluded sufficient data for prediction model developmentwere compiled for further analysis The collected data were

4 Mathematical Problems in Engineering

Table 1 Risk attributes affecting project profitability

Category Risk code Risk attribute Cronbachrsquos 120572 value

Political and social risks

R1 High degree of corruption collusion and illegal transaction

0787R2 Lack of familiarity with local cultures customs and

traditionsR3 Frequent policy and postlegislative changesR4 Insufficient legal system

Economic and inflation risksR5 High material unit price volatility

0735R6 High equipment unit price volatilityR7 High local currency exchange rate volatility

Owner risks

R8 Unsatisfactory owner management capabilities

0869

R9 Insufficient design information from the owner

R10 Lacking specifications and bidding documents provided bythe owner

R11 Approval and permission delays on documentation andconstruction

R12 Illegal requirements without contracts

Bidding and contract risks

R13 Insufficient time for bidding and construction quotationpreparation

0834

R14 Inadequate consideration of variations in contract durationR15 High-level local content requirementsR16 Complex tax and tariff regulationsR17 Low-level inflation compensation conditionsR18 Inadequate warranty-related conditions

R19 Inadequate payment regulations relative to internationalcontracts

R20 Ineffective dispute resolution procedures and regulations

Procurement risks

R21 Inadequate local labor supplies and procurement conditions

0905

R22 Inadequate local labor supplies and demand conditions

R23 Inadequate material production supplies and procurementconditions

R24 Inadequate material production supplies and demandconditions

R25 Inadequate material production from third-world-countrysupplies and demand conditions

R26 Inadequate equipment from local country supplies andprocurement conditions

R27 Inadequate local equipment supply procurement anddemand conditions

R28 Inadequate local subcontractor procurement conditions

Geographic risks

R29 Inadequate geographical conditions of materials andequipment delivery

0853

R30 Inadequate logistical conditions (ie electricity water gasand communication)

R31 Inadequate business environment and welfare conditionsR32 Inadequate weather conditions for constructionR33 Inadequate geological site conditions

R34 Severe complaints regarding construction throughout projectimplementation

Mathematical Problems in Engineering 5

Table 1 Continued

Category Risk code Risk attribute Cronbachrsquos 120572 value

Construction performance risks

R35 Lacking design experience

0917

R36 Lacking bid estimation experienceR37 Unsatisfactory cash flow management capabilities

R38 Unsatisfactory site manager leadership and organizationmanagement capabilities

R39 Unsatisfactory site manager experience relative to similarconstruction projects

R40 Unsatisfactory international project planning andmanagement capabilities

R41 Unsatisfactory quality management capabilitiesR42 Unsatisfactory resource management capabilitiesR43 Unsatisfactory conflict management capabilitiesR44 Unsatisfactory contract management capabilitiesR45 Lacking connectivity between sites and headquartersR46 Unsatisfactory localizationR47 Lacking trust between project participantsR48 Poor communication between project participantsR49 Poor project management capabilities based on ITR50 Inadequate construction methods for the specific projectR51 Insufficient subcontractor project performance

first tested for reliability stability consistency predictabil-ity accuracy and dependency [31] Cronbachrsquos coefficientalpha internal consistency method was employed and resultsranging from 08 to 09 were considered reliable The PASWStatistics 21 program was used to analyze the data As Table 1shows missing values in the raw data were excluded fromthe reliability test to increase reliability As a result a total ofseven categories with 51 attributes were selected an averagereliability of 0843 was obtained indicating a high level ofreliability A total of 121 datasets for the 140 projects wereused to develop the Naıve Bayesian classifier indicating thatapproximately 20 of the responses were excluded due todata inconsistency

4 Nave Bayesian Classifier

A Naıve Bayesian classifier based on Bayesrsquo theorem isemployed to derive a probabilistic classification model frompreviously assessed attributes It is most commonly used forantispam mail filtering which trains the filter to automati-cally separate spam mail and legitimate messages in a binarymanner [32 33] In doing so each attribute correspondswith words included in the spam and legitimate mails thusdistinguishing between the two in turn the former areblocked by a trained filter

41 Conceptual Structure of a Proposed Model Because theobjective of this research is to screen bad or good projectsin the early stages of international construction projects theresult of the proposed model should also be binary (good orbad) Thus the authors transformed the original conditions

of each risk attribute (7-point Likert scale) into a binaryscale to develop the Naıve Bayesian model In additionthe early stages of the construction possess a high level ofuncertainties which prohibit analysis of exact ranges of profitrates or detailed numeric classification of project conditionsAccordingly a specific profit level or numeric number ofeach risk attribute is not essential particularly during thisearly phase To this end authors consider the unity of thebinary distribution of the dependent variable as well as thepreassessed risk attributes to improve its practical application

Thebenefits of employing theNaıve Bayesian classifier arethe following [34ndash36] First the model is less sensitive to out-liers Generally when an outlier exists in the data the resultsmay skew The reduction of outlier effects is thus requiredfor data analysis Because the Naıve Bayesian classifier usesa probabilistic distribution it enables minimizing the outliereffects of the original data and prediction results are lesssensitive to outliers Second less data is required to predictthe parameter By using the utility function relearning theprobability of a given conditional probability is not strictlyrequired in the Naıve Bayesian classifier thus minimizingdata collection efforts for risk prediction Third the classifieris more applicable to the industry practitioners by reducingthe burden of data collection Although the model utilizessimple assumptions and algorithms the classification resultsare powerful even though the specified assumptions maycontain errors Finally themodels allow for the considerationof additional attributes by combining existing characteris-tics of the attributes thus increasing the accuracy of themodelrsquos predictions Based on the aforementioned benefitsof the Naıve Bayesian method the authors have chosen this

6 Mathematical Problems in Engineering

RisknYes

No

Risk2Yes

No

Good

Bad

Risk1Yes

No

Real data profitability

P(r1 = yes | good)

Figure 3 Conceptual model of the Naıve Bayesian classifier

approach as themainmechanism for the proposed predictionmodel

To structure the Naıve Bayesian classifier we consideredthe unity of the binary distribution of the dependent vari-ables As mentioned earlier the effect of each risk attributeon the level of profit is assessed by using a 7-point Likertscale (minus2 to 4) to improve themodelrsquos practical application Inthis study a binary approach is used to identify the presenceof risk and final project profitability is predicted via theNaıve Bayesian classification Because this research is focusedon the early stages of the construction project screeningand recognizing a bad project are the priority rather thananalyzing the specific value of the project Therefore theassessment range from minus2 to 1 correlated with absence of riskor a ldquoNordquo answer and a score from 2 to 4 was correlated withthe presence of risk or a ldquoYesrdquo answer as shown in Figure 3

As the assessment approach is fairly simple users per-form fewer risk assessment tasks increasing the accuracy ofpredictions The benefit of a Naıve Bayesian classifier is thefact that all risk probabilities are assumed to be independentaccording to Murphy [35] although these assumptions maynot be correct developing a model using the Naıve Bayesianclassifier is feasible and generates highly reliable results forclassification compared to other methodologies

Equation (1) presents the classification group 119888 set (iegood or bad project) where 119903

1 119903

2 119903

3 119903

119863are vectors

ascribed to each independent risk variable in the NaıveBayesian classifier This categorization of attributes maxi-mizes the probability of each classification result Consider

119875 = (119888 | 119903

1 1199032 119903119863) (1)

Equation (2) shows the prediction mode that applies (1) toBayesrsquo theorem Consider

119875 (119888) times 119875 (1199031 1199032 119903119863 | 119888)

119875 (1199031 1199032 119903119863) (2)

As discussed above the Naıve Bayesian classifier assumesthat each attribute divided into each classification is indepen-dent thus (3) can be derived accordingly where V is the targetvalue of the group 119888 set (ie the profit rate) Consider

119875 (1199032 | 119888 1199031) = 119875 (1199032 | 119888)

119875 (119903 | V= 119888) =119863

prod

119894=1119875 (119903

119894| V119895= 119888)

(3)

Hence the Naıve Bayesian classifier is considered a suit-able method that uses a relatively small amount of historicaldata to determine whether projects are likely to be bador good The next section describes the proposed modelwhich uses theNaıve Bayesian classifier to examine the actualproject data collected previously

42 Naıve Bayesian Project Classification Model In devel-oping the Naıve Bayesian model the aforementioned riskattributes were used as relevant independent variables forpredicting good or bad projects An overview of the modelis presented in Figure 4

The dependent variable was predicted over two or threeclassification intervals and the two models are presentedaccordingly The model with two intervals is based on aprofitability of less than 0 for a bad project and morethan 0 for a good project The model with three intervalsrepresents projects with less than 0 profitabilitymdashthosegenerating a 0 to 45 rate of returnmdashand a profit rateabove 45 which is considered a good project A marginalprofit rate of 45 was chosen because the average profitrate for good projects out of the total sample of projectswas calculated at this range Because projects with negativeprofitability are all considered ldquobadrdquo ldquomoderaterdquo and ldquogoodrdquoprojects are differentiated based on the reference profit valueof 45 Table 2 represents intervals divided based on theresults

Using the training dataset each risk attribute is catego-rized into a specific class to develop the project-screeningmodelTherefore the conditional probability of each attribute

Mathematical Problems in Engineering 7

[Input]Profitability assessment

(i) Define the attributes for risk management (51 risk attributes)

(ii) Assess the impact on profitability (Likert scale)

[Function]Classification and analysis

(i) Determine the number of classification intervals (Naiumlve Bayesian classifier )

(ii) Selection of analysis methodrisk assessment

[Output]

Profitability assessment result

Figure 4 Overview of project classification model

Table 2 Classification of profitability

Bad project Moderate project Good project2 classificationintervals Below 0 mdash Above 0

3 classificationintervals Below 0 0ndash45 Above 45

given to the bad project in advance is shown in the followingequation

119875 (119903

119894| V119895) =

119899

119888

119899

(4)

119875 is estimated conditional probability 119903119894is risk attributes 119894

V119895is the condition of profit rate on Bad or Good based on the

training data (V=below0) 119899119888is number of bad projects under

the specific risk 119886119894in the training data (V = V

119895and 119903 = 119903

119894) and

119899 is number of bad projects in the training data (V = V119895)

However whereas this study examines 140 units of datathe number of training data is insufficient relative to thenumber of risk attributes Unbalanced datasets also lead to aninappropriate result in the statistical analysis [37] It is thusnecessary to augment the small datasets in the training setwhich decreases the accuracy of themodel To overcome thisthe authors borrowed the concept of the 119898-estimate from[38ndash40] By setting the value of a suitable ldquo119898rdquo number theshifting of probability toward parameter ldquo119901rdquo is controlledthis also further increases model accuracy as shown in thefollowing equation

119875

1015840(119903

119894| V119895) =

119899

119888+ 119898119901

119899 + 119898

(5)

119875

1015840 is estimated conditional probability using 119898-estimate 119898is equivalent data size and 119901 is prior probability based ontraining data (4)

Zadrozny and Elkan [38] recommend that the value of119898119901 be set to less than 10 thus using a probabilistic equationto cross-validate 140 data units 70 units are set as trainingdata and the remaining 70 units of119898 parameter are set to anequivalent data size to develop the project screening modelIn turn parameter 119901 becomes 1070 or 17 for (5) Of the 140data units collected 20 units were therefore classified as badprojects and 120 were deemed good projects Therefore thetwo datasetsmdashtraining and equivalent data sizemdashconsist of 10bad projects and 60 good projects that are chosen via randomselection

Five different datasets are created randomly to developand validate the model in view of cross validation Randomdata are selected using Microsoft Excel which generates datain a range from 0 to 1 for random probability In addition forthe training dataset the actual risk exposure level for eachattribute in a given project is used to develop the model Inthis paper only one set of training data results out of fivedifferent datasets is presented in detail as the other resultswere found to be almost equivalent

The Naıve Bayesian classifier consists of an 119898-estimatorparameter which gives a random value of 119898 to controlthe classification bias This study uses 119898 value from theldquomoderaterdquo interval of project profitability which had thehighest error percentage for further validation According tothe results of the first analysis119898 value in the Naıve Bayesianclassifier was reduced in the second analysis by modifying119898 value the moderate profitability project produces a morestable result In addition the weights given to the bad projectsincreased which results in both increased stability of theclassification and reduction of bad projectrsquos classificationerror Therefore the analysis of the moderate classificationresults was reanalyzed To identify a suitable 119898 value asensitivity analysis was also performed for the changes as 119898value decreases

8 Mathematical Problems in Engineering

Table 3 Part of the conditional probability of Naıve Bayesianclassifier model

Category Risk code 1198751015840(119903119894| BAD) 1198751015840(119903

119894| GOOD)

Political and socialrisks

R1 35 234R3 15 56

Economic andinflation risks

R5 15 89R7 20 242

Bidding and contractrisk

R15 10 274R19 15 153

Procurement risks R22 30 362R24 5 185

Geographic risks R29 40 226

By using the derived model using 119898-estimate each riskfactor is calculated for the conditional probability for Badand Good The two classification intervals model is usedto analyze each risk factor Table 3 shows the conditionalprobability of the key risk factors from each category thathas a severely negative impact (assessed as 3 or 4 points)on the total sample data (140 projects) For example theR24 risk had an 185 probability of occurring in a ldquoGoodrdquoproject and a 5 probability of occurring in a ldquoBadrdquo projectTo summarize Table 3 external (politic or economic) factorsmore frequently appear in the ldquobadrdquo projects while internal(bidding or procurement) factors more frequently occurr inthe ldquogoodrdquo projects

5 Model Analysis and Validation

51 Model Validation In employing the classifier model twodatasets (training and sample test datasets) for 70 projectswere used to compare actual and predicted risk impacts onthe dependent variable Actual risk effects in a given projectwere used to determine the profitability of the 70 projects Inaddition to evaluating the performance level of themodel themodelrsquos usability and accuracy were tested using a confusionmatrix of Bayesrsquo theorem To determine the accuracy ofthe model the dependent variable profitability included asample test dataset of 70 projects for comparison with theactual project performance and predicted performance ofa given project The 2 and 3 classification intervals modelswere also tested by classification into bad (PB PredictedBad) moderate (PM Predicted Moderate) and good (PGPredicted Good) categories the training dataset for the 70projects also employs the actual profit level which is classifiedas bad (AB Actual Bad) normal (AM Actual Moderate) orgood (AG Actual Good) Tables 4 and 5 present the results ofthe two models

The aforementioned fivemodels were randomly tested forcross validation In the test models a dataset of 70 projectswas randomly selected to determine the modelrsquos accuracy Asan initial measure each project was deemed bad or goodAs shown in Table 4 the overall correct level is found to be797 by dividing the bold numbers with the total number of

Table 4 Validation of profitability at 2 classification intervals

Model 1 out of fivecross validations

Classification levelBad (PB) Good (PG) Total

Actual LevelBad (AB) 5 5 10Good (AG) 9 51 60Total 14 56 70

Table 5 Validation of profitability at 3 classification intervals

Model 1 out of fivecross validations

Classification levelTotalBad

(PB)Moderate(PM)

Good(PG)

Actual LevelBad (AB) 3 5 2 10Moderate (AM) 3 16 13 32Good (AG) 3 13 12 28Total 9 34 27 70

samples Other cross validation models also showed highlyaccurate levels of 786 771 827 and 80 respectively

In contrast the model with three classified intervalspresented relatively low levels of 443 429 514 486and 500 with a mean percentage of 448 reflectingless accurate results (refer to Table 5) Models with 2 and3 classification intervals showed standard deviations of 207and 307 respectively this suggests the presence of a fewdifferences and consistent results between the two modelsThe 2 classification intervals model is considered an excellentvehicle for screening purposes as it presents highly accurateresults and less deviation Although the 3 classificationintervals model shows a relatively low level of accuracy ina real business environment the users also may apply the 3classification intervals model for screening out a moderaterange of projects by sacrificing the prediction accuracy

Additional analysis to validate the Naıve Bayesian classi-fication model is performed through comparing with linearregression model using SPSS 21 The authors used the samebinary method of dividing the profit rate into good andbad (2 classification intervals) By utilizing the same 140projectsrsquo units of data that is composed of 70 training unitsof data and 70 test units of data the accuracy result of linearregression model is found to be 522 This result shows asignificantly lower accuracy than the 2 classification intervalsNaıve Bayesian model (797) and slightly higher accuracythan the 3 classification intervals model (448)

In summary this study proposed two types of modelsdepending on the userrsquos need and degree of risk exposureThemodelswere validated through comparing the accuracy of theprediction result with other learning methods In additionthe user can flexibly select the proper model that best suitedto a given firm whether he or she requires an accurateclassification between good and bad project or is contend toidentify a moderate or good project

Mathematical Problems in Engineering 9

Profitability assessment

Method

Category Risk code Risk attributes Risk

Countryrsquos politicaland social risk

R1 Countryrsquos corruption collusion and illegal transaction O

R2 Culture customs and tradition difference X

R3 Policy and postlegislative changes X

R4 Countryrsquos overall legal systemrsquos maturity X

Countryrsquos economic and inflation risks

R5 Average material unit price change X

R6 Average equipment unit price change X

R7 Local currency exchange rate change X

Stakeholder risks

R8 Stakeholderrsquos understanding and management ability O

R9 Design error provided to stakeholder O

R10 Clarity of specifications and bidding documents provided to stakeholder X

R11 Stakeholderrsquos administrative and construction consent delay X

R12 Requirements regardless of contract from stakeholder X

Bidding and contract terms

R13 Bidding and quotation preparation time for construction X

R14 Contract of construction duration for construction X

R15 Required degree of local contents X

R16 Tax and tariff regulations X

R17 Inflation compensation conditions compared to international contract X

R18 Warranty-related conditions compared to international contract X

R19 Payment regulations compared to international contract X

R20 Dispute resolution procedures regulation compared to international contract X

1 Naiumlve Bayesian classifier

Figure 5 Excel-based risk assessments

52 Excel-Based Model Development Based on the afore-mentioned basic algorithm the authors developed an Excel-based system that serves as a continuous risk managementplatform for project screening for the international con-struction sector To initiate project screening risk attributesrelated to a given project are assessed for their impacton project profit depending on the amount of informationacquired As shown in Figure 5 system inputs for riskassessment are based on two scenarios the specified riskeither affects the project or does not In addition usersmay select their model depending on their objectives themore accurate Naıve Bayesian classifier with a higher projectclassification rate (2 classification intervals model) or the lessaccurate model with amore conservative classification rate (3classification intervals model)

The authors present a sample case in which the developedprogram is utilized to test the project screening modelthrough the user interface This case considers a Koreancontractor working on a harbor expansion project that lasts22 months Figure 5 shows step 1 which is the actual user

interface of the programgeneratedwith case study data inputsfor goodbad project screening As shown in Figure 5 theuser first evaluates the risk attribute using ldquoOrdquo for ldquoriskexistsrdquo indicating the presence of a given risk and ldquoXrdquo forldquorisk does not existrdquo indicating an absence of project riskTheuser then selects a model that is the number of classificationintervals Depending on the model selected the classificationmodel categorizes projects as bad moderate or good Themodel classifies the good projects as ranging from 0 to 45of the profit rate Project screening is conducted by selectingprofit interval ranges depending on firm risk behaviorModelresults are obtained from the training sample via randomselection through which classification results are selectedfrom the highest predicted value from 100 prediction cyclesThis process increases reliability and the results are shown inpercentage form for ease of comprehension

Figure 6 shows the procedural steps in the process of theExcel-based model Risk assessment is performed primarilyin step 1 Then in step 2 the model counts for the riskexistence based on the historical dataset of 140 real projects

10 Mathematical Problems in Engineering

R1 R2 R3 R4 R5 R6 R7 R8 R9 R10

Bad

10 10 10 10 10 10 10 10 10 10

1 1 0 0 2 3 1 0 0 0

0142857143 0142857143 0142857 0142857 0142857 0142857 0142857 0142857 0142857 0142857

001 001 001 001 001 001 001 001 001 001

01000 01000 00001 00001 01999 02998 01000 00001 00001 00001

Select

Result 1 1 1 1 1 1 1 1 1 1

0142857143

Moderate

31 31 31 31 31 31 31 31 31 31

0 0 0 0 0 0 0 0 0 0

0442857143 0442857143 0442857 0442857 0442857 0442857 0442857 0442857 0442857 0442857

001 001 001 001 001 001 001 001 001 001

0000142811 0000142811 0000143 0000143 0000143 0000143 0000143 0000143 0000143 0000143

Select 0 0 0 0 0 0 0 0 0 0

Result 1 1 1 1 1 1 1 1 1 1

0442857143

Good

29 29 29 29 29 29 29 29 29 29

17 10 15 12 29 12 10 19 14 9

0414285714 0414285714 0414286 0414286 0414286 0414286 0414286 0414286 0414286 0414286

001 001 001 001 001 001 001 001 001 001

0586147634 0344851529 0517206 0413793 0999798 0413793 0344852 0655089 0482735 0310381

Select 0 0 0 0 0 0 0 0 0 0

Result 1 1 1 1 1 1 1 1 1 1

0414285714

Result table Bad Good Moderate Final result Good

Max

Profitability assessment result

Project screening result Good project

Top 5 risks to be managed Probability

1 Average material unit price change 0999

2 Countryrsquos local subcontractorrsquos procurement conditions 0704

3 Complaints from construction during the project 0655

4 Stakeholderrsquos administrative and construction consent delay 0586

5 Clarity of specifications and bidding documents provided to stakeholders 0555

4

5

3

2

n

nc

p

m

p (good)

17238E minus 35

69489E minus 07

719685E minus 4769489E minus 07

n

nc

p

m

p (normal)

n

nc

p

m

p (bad)

p (ai | bad)

p (ai | normal)

p (ai | good)

Figure 6 Process for assessment result

Through the sum of its risk existence the conditional proba-bility is calculated given each classification of a project (GoodModerate or Bad) separately

Step 3 shows the result table for each classification thathas been selected from the profitability assessmentThis is thestep in which users are to perform the risk assessment fromFigure 6 where selected risks are calculated for conditionalprobability The result is calculated from the product ofeach probability from the 51 risks attributes Step 4 thenshows the final result by selecting the highest probabilityfrom each classification In the illustrative example ldquoGoodrdquohad the highest probability In addition as shown in step 5in the user interface the risks that most directly affectproject profitability can be prioritized which should be

continuouslymonitored and controlled by the userThese fiverisk attributes serve as the basis of a risk response strategy

The proposed model prioritizes the top 5 risks that affectoverall construction profitability According to Halawa et al[41] construction projects are dramatically influenced bythis financial evaluation and awareness of the financial statusimproves it offers a better chance of determining project fea-sibility Therefore this study incorporates the Naıve Bayesianclassifier with an Excel-based model to give practitioners aquantitative analysis result (statistical references) and quali-tative analysis results (the top risk attributes to be managed)It should also be noted that although the result of the modelshows an absolute value of bad moderate or good it is upto the managers and decision-makers to make a bad project

Mathematical Problems in Engineering 11

a moderate project or even good project by careful andproactive risk management as presented in the illustrativecase application

6 Conclusions

This study predicted the profitability of early-stage inter-national construction projects and identified good projectsthat may affect firmrsquos financial stability The current stateof the international construction market was first analyzedlosses were shown to account for 73 of the entire marketreflecting the importance of predicting project profitability inadvance Risk attributes that affect project profitability wereidentified and a risk management system was developed tomanage prioritize and monitor higher impact risks TheNaıve Bayesian classifier method was used to improve thepractical application of the project-screening model Whenrisk attribute assessments for determining risk exposurelevels are completed assessed risk exposure is classified intoa binary parameter that presents an approach to riskmanage-ment that is supported by a probabilistic methodology

This study used case study data to create a model usingMicrosoft Excel Depending on the Naıve Bayesian classifier119898-estimator value of the weight application and the userrsquosrisk behaviors users can choose the model that best suitsa given firm The results show that when a 70-unit trainingdataset and 70-unit sample dataset area are used (a 140-unitset in total) to cross-validate the Naıve Bayesian classifier themodel has on average a 797 prediction accuracy for the 2classification intervals model and a 448 accuracy for the 3classification intervals model Additional analysis to validatethe model is performed through linear regression analysisto show the higher precision result from the Naıve Bayesianclassification model

For project screening conditional probability is usedwithinformation acquired during early project stages Althoughrisk attributes were largely derived in the model riskattributes that more clearly affect projects can be furtherextracted elucidating the effects of bad projects

As for the limitations of this study analytical results forthe application of the developed program to a more diversescenario will be essential for example different types ofconstruction projects may be examined for superior resultsIn addition if contract types were categorized as contractorsubcontractor and joint ventures this may have altered theprofitability of each project A new-entrymarket scenariowillalso be analyzed separately due to the unique characteristicsof newmarkets Given a range of unique construction projectqualities a more realistic and practical application can beobtained in future research Furthermore if additional dataon both project risk assessment and the actual reliable profitrates are accumulated a diverse segmentation of the intervalscan be obtained to examine each project in a precisemeasure

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgment

This research was supported by a grant (14IFIP-B089065-01)funded by Ministry of Land Infrastructure and Transport ofKorean government

References

[1] Global Insight Global Construction Outlook Global InsightLexington Mass USA 2014

[2] International Contractors Association of Korea (ICAK) Inter-national Construction Information Service Construction Statis-tics 2013 httpwwwicakorkrstasta 0101php

[3] Samsung Engineering Business Results ReportThird Quarter of2013 Samsung Engineering Seoul Republic of Korea 2013

[4] M Abdelgawad and A R Fayek ldquoRisk management in theconstruction industry using combined fuzzy FMEA and fuzzyAHPrdquo Journal of Construction Engineering and Managementvol 136 no 9 pp 1028ndash1036 2010

[5] G Aydogan and A Koksal ldquoAn analysis of internationalconstruction risk factors on partner selection by applying ANPapproachrdquo in Proceedings of the International Conference onConstruction and Real Estate Management (ICCREM rsquo13) pp658ndash669 Karlsruhe Germany October 2013

[6] B-G Hwang X Zhao and L P Toh ldquoRisk management insmall construction projects in Singapore status barriers andimpactrdquo International Journal of Project Management vol 32no 1 pp 116ndash124 2014

[7] A E Yildiz I Dikmen M T Birgonul K Ercoskun andS Alten ldquoA knowledge-based risk mapping tool for costestimation of international construction projectsrdquo Automationin Construction vol 43 pp 144ndash155 2014

[8] O Taylan AO Bafail RM S Abdulaal andMRKabli ldquoCon-struction projects selection and risk assessment by fuzzy AHPand fuzzy TOPSISmethodologiesrdquoApplied Soft Computing vol17 pp 105ndash116 2014

[9] S Mu H Cheng M Chohr and W Peng ldquoAssessing riskmanagement capability of contractors in subway projects inmainland Chinardquo International Journal of Project Managementvol 32 no 3 pp 452ndash460 2014

[10] F Acebes J Pajares J M Galan and A Lopez-Paredes ldquoA newapproach for project control under uncertainty Going back tothe basicsrdquo International Journal of ProjectManagement vol 32no 3 pp 423ndash434 2014

[11] Y Zhang and Z-P Fan ldquoAn optimization method for selectingproject risk response strategiesrdquo International Journal of ProjectManagement vol 32 no 3 pp 412ndash422 2014

[12] M Rohaninejad andM Bagherpour ldquoApplication of risk analy-sis within value management a case study in dam engineeringrdquoJournal of Civil Engineering and Management vol 19 no 3 pp364ndash374 2013

[13] Q Yang and Y Zhang ldquoApplication research of AHP in projectrisk managementrdquo Advanced Materials Research vol 785-786pp 1480ndash1483 2013

[14] O Kaplinski ldquoThe utility theory in maintenance and repairstrategyrdquo Procedia Engineering vol 54 pp 604ndash614 2013

[15] M Hastak and A Shaked ldquoICRAM-1 model for internationalconstruction risk assessmentrdquo Journal of Management in Engi-neering vol 16 no 1 pp 59ndash69 2000

[16] S H Han W Jung D Y Kim and H Park ldquoA study for theimprovement of international construction risk managementrdquo

12 Mathematical Problems in Engineering

Conference of Korean Society of Civil Engineering (KSCE) no 10pp 791ndash794 2009

[17] V Kumar and N Viswanadham ldquoA CBR-based decision sup-port system framework for construction supply chain riskmanagementrdquo in Proceedings of the 3rd IEEE InternationalConference on Automation Science and Engineering (CASE rsquo07)pp 980ndash985 IEEE Scottsdale Ariz USA September 2007

[18] B Yang L X Li H Ji and J Xu ldquoAn early warning systemfor loan risk assessment using artificial neural networksrdquoKnowledge-Based Systems vol 14 no 5-6 pp 303ndash306 2001

[19] P Sanchez ldquoNeural-risk assessment system for constructionprojectsrdquo in Proceedings of the Construction Research Congresspp 1ndash11 San Diego Calif USA April 2005

[20] B R Fortunato IIIM RHallowellM Behm andKDewlaneyldquoIdentification of safety risks for high-performance sustainableconstruction projectsrdquo Journal of Construction Engineering andManagement vol 138 no 4 pp 499ndash508 2012

[21] J Li and P X W Zou ldquoFuzzy AHP-based risk assessmentmethodology for PPP projectsrdquo Journal of Construction Engi-neering and Management vol 137 no 12 pp 1205ndash1209 2011

[22] A del Cano and M P de la Cruz ldquoIntegrated methodology forproject risk managementrdquo Journal of Construction Engineeringand Management vol 128 no 6 pp 473ndash485 2002

[23] A P C Chan D C K Ho and C M Tam ldquoDesign andbuild project success factors multivariate analysisrdquo Journal ofConstruction Engineering and Management vol 127 no 2 pp93ndash100 2001

[24] D J Lowe and J Parvar ldquoA logistic regression approachto modelling the contractorrsquos decision to bidrdquo ConstructionManagement and Economics vol 22 no 6 pp 643ndash653 2004

[25] S H Han D Y Kim and H Kim ldquoPredicting profit per-formance for selecting candidate international constructionprojectsrdquo Journal of Construction Engineering andManagementvol 133 no 6 pp 425ndash436 2007

[26] A S Bu-Qammaz I Dikmen andM T Birgonul ldquoRisk assess-ment of international construction projects using the analyticnetwork processrdquoCanadian Journal of Civil Engineering vol 36no 7 pp 1170ndash1181 2009

[27] T Hegazy and A Ayed ldquoNeural network model for parametriccost estimation of highway projectsrdquo Journal of ConstructionEngineering and Management vol 124 no 3 pp 210ndash218 1998

[28] RuleQuest Data mining tools 2008 httpwwwrule-questcom

[29] W Jung Three-Staged Risk Evaluation Model for Bidding onInternational Construction Projects Graduate School YonseiUniversity School of Civil and Environmental EngineeringSeoul Republic of Korea 2011

[30] Construction Industry Institute (CII) ldquoRisk assessment forinternational projectsrdquo Research Report 181-11 ConstructionIndustry Institute (CII) Austin Tex USA 2003

[31] G Young ldquoGlossary and discussion of termsrdquo in MalingeringFeigning and Response Bias in PsychiatricPsychological Injuryvol 56 of International Library of Ethics Law and the NewMedicine pp 777ndash795 Springer Dordrecht The Netherlands2014

[32] M Sahami S Dumais D Heckerman and E Horvitz ldquoABayesian approach to filtering junk e-mailrdquo in Learning for TextCategorization Papers from the 1998 Workshop vol 62 pp 98ndash105 1998

[33] I Androutsopoulos J Koutsias K V Chandrinos G Paliourasand C Spyropoulos ldquoAn evaluation of Naıve Bayesian anti-spam filteringrdquo in Proceedings of the Workshop on Machine

Learning in the New Information Age 11th European Conferenceon Machine Learning G Potamias V Moustakis and M vanSomeren Eds pp 9ndash17 Barcelona Spain 2000 httparxivorgpdfcs0006013pdf

[34] H Zhang ldquoThe optimality of naıve Bayesrdquo in Proceedings of the17th International Florida Artificial Intelligence Research SocietyConference (FLAIRS rsquo04) Miami Beach Fla USA May 2004

[35] K P Murphy Naive Bayes Classifiers University of BritishColumbia Vancouver Canada 2006

[36] R R Yager ldquoAn extension of the naive Bayesian classifierrdquoInformation Sciences vol 176 no 5 pp 577ndash588 2006

[37] D Lowd and P Domingos ldquoNaive Bayes models for probabilityestimationrdquo in Proceedings of the 22nd International Conferenceon Machine Learning (ICML rsquo05) pp 529ndash536 ACM BonnGermany August 2005

[38] B Zadrozny and C Elkan ldquoLearning and making decisionswhen costs and probabilities are both unknownrdquo in Proceedingsof the 7th ACMSIGKDD International Conference on KnowledgeDiscovery and Data Mining pp 204ndash213 ACM August 2001

[39] L Jiang H Zhang Z H Cai and D Wang ldquoWeighted averageof one-dependence estimatorsrdquo Journal of Experimental andTheoretical Artificial Intelligence vol 24 no 2 pp 219ndash230 2012

[40] J Wu and Z Cai ldquoA naive Bayes probability estimation modelbased on self-adaptive differential evolutionrdquo Journal of Intelli-gent Information Systems vol 42 no 3 pp 671ndash694 2014

[41] W S Halawa A M K Abdelalim and I A ElrashedldquoFinancial evaluation program for construction projects at thepre-investment phase in developing countries a case studyrdquoInternational Journal of Project Management vol 31 no 6 pp912ndash923 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 3: Research Article ve Bayesian Classifier for Selecting Good ...downloads.hindawi.com/journals/mpe/2015/830781.pdf · Na \ve Bayesian Classifier for Selecting Good/Bad Projects during

Mathematical Problems in Engineering 3

Literature review

Profitability risk attributes derivation

Project classification model development

Model validation

(i) International construction projectrsquos characteristics(ii) International construction risk management system

(iii) Project prediction methodology

(i) International construction projectrsquos risk attributes derivation

(ii) Profitability related risk attributes derivation(iii) Risk attributes assessment for impact

Cross validity for model validation

Naiumlve Bayesian classifier model development

Figure 2 Research procedure

based on representing attributes should be employed foreffective project screening modeling Thus a systematicmethodology that predicts project profitability during thisearly stage is imperative

For developing the prediction models the various the-ories and tools can be categorized as statistical or learningmethods Several researchers have used regression modelsthrough which the impact of each risk attribute is assessedon a Likert scale by experts [23 24] However such assess-ment is usually plagued by ambiguousness frommaintainingconsistency with different standards depending on expertrsquoscognitive limitations or the cooperation cultures duringsurveys In addition input data usually include subjectiveassessments that often distort practitionerrsquos decision-makingand judgment [15 25] As such previous studies that haveattempted to classify profit levels by Likert or numeric scalehave not yet reached a statistical level of significance inprofit prediction due to their compound mechanism anda large number of inputs for understanding complicatedproject information [15 26] To develop a more reliableprediction model and improve its applicability for industrypractitioners expert assessments should be consistent and amore simplified risk assessment model is imperative

Data-based learning tools have also been widely appliedto support the reliability of classifications such as Matlabfor Artificial Neural Network (ANN) C50 for decision treeand Linear Discriminant Analysis (LDA) ANN and C50are considered representative data mining techniques fordetecting dependent variables from the loads of input datawith complex calculations [27 28] LDA utilizes the linearfunctions between various classes where the data can becategorized according to the appropriate number of classesand its relationships [29] Unfortunately these data-miningtools do not properly explain the underpinning rule ofprediction results In addition these methods strictly requirea large number of data for improved prediction Hence theNaıve Bayesian classifier is considered a suitable method forthis research that uses a relatively small amount of historicaldata to determine whether a given project is likely to be bador good

To sum up previous studies on profit prediction havebeen limited to simple Likert scale assessments basedon expertsrsquo subjective judgments of contractor experiencepotentially generating skewed results These not only containunrealistic assumptions and a high burden for practical appli-cability but also show limited accuracy when such complexcalculations with a large number of input data are utilized[29] The authors thus present a simple binary classificationmodel based on the updated conditional probability datafrom the actual projectrsquos performance This approach isexpected to reduce bias while improving practical applicabil-ity

3 Risk Attributes of Project Profitability

Project classification model development begins by incorpo-rating overall risk attributes for international projects thatare present prior to early bidding stages Risks identified asimportant in previous studies were selected and analyzedvia meta-analysis and frequency measurements Studies byHastak and Shaked [15] theAmericanConstruction IndustryInstitute [30] the International Construction Associationof Korea [2] and Han et al [16 25] identified risk factorsthat affect construction project profitability In additionto conducting a literature review the authors performedinterviews with eight industry experts as a result 20 riskfactors were added to the analysis Table 1 presents a total of 71risk attributes derived from the aforementioned procedures

A case-based questionnaire with refined risk attributeswas designed to identify relevant information for the pro-posedmodelThe questionnairemeasured the effect of 71 riskfactors on profit levels on a 7-point Likert scale in which minus2refers to a positive impact (in the sense of gain) 0 indicates noimpact 2 refers to a negative impact and 4 refers to a criticalimpact on project profitability Sample projects were chosenamong actual international projects performed by the mostprominent 20 Korean contractors since 2000 Approximately900 projects were available of these 140 projects whichincluded sufficient data for prediction model developmentwere compiled for further analysis The collected data were

4 Mathematical Problems in Engineering

Table 1 Risk attributes affecting project profitability

Category Risk code Risk attribute Cronbachrsquos 120572 value

Political and social risks

R1 High degree of corruption collusion and illegal transaction

0787R2 Lack of familiarity with local cultures customs and

traditionsR3 Frequent policy and postlegislative changesR4 Insufficient legal system

Economic and inflation risksR5 High material unit price volatility

0735R6 High equipment unit price volatilityR7 High local currency exchange rate volatility

Owner risks

R8 Unsatisfactory owner management capabilities

0869

R9 Insufficient design information from the owner

R10 Lacking specifications and bidding documents provided bythe owner

R11 Approval and permission delays on documentation andconstruction

R12 Illegal requirements without contracts

Bidding and contract risks

R13 Insufficient time for bidding and construction quotationpreparation

0834

R14 Inadequate consideration of variations in contract durationR15 High-level local content requirementsR16 Complex tax and tariff regulationsR17 Low-level inflation compensation conditionsR18 Inadequate warranty-related conditions

R19 Inadequate payment regulations relative to internationalcontracts

R20 Ineffective dispute resolution procedures and regulations

Procurement risks

R21 Inadequate local labor supplies and procurement conditions

0905

R22 Inadequate local labor supplies and demand conditions

R23 Inadequate material production supplies and procurementconditions

R24 Inadequate material production supplies and demandconditions

R25 Inadequate material production from third-world-countrysupplies and demand conditions

R26 Inadequate equipment from local country supplies andprocurement conditions

R27 Inadequate local equipment supply procurement anddemand conditions

R28 Inadequate local subcontractor procurement conditions

Geographic risks

R29 Inadequate geographical conditions of materials andequipment delivery

0853

R30 Inadequate logistical conditions (ie electricity water gasand communication)

R31 Inadequate business environment and welfare conditionsR32 Inadequate weather conditions for constructionR33 Inadequate geological site conditions

R34 Severe complaints regarding construction throughout projectimplementation

Mathematical Problems in Engineering 5

Table 1 Continued

Category Risk code Risk attribute Cronbachrsquos 120572 value

Construction performance risks

R35 Lacking design experience

0917

R36 Lacking bid estimation experienceR37 Unsatisfactory cash flow management capabilities

R38 Unsatisfactory site manager leadership and organizationmanagement capabilities

R39 Unsatisfactory site manager experience relative to similarconstruction projects

R40 Unsatisfactory international project planning andmanagement capabilities

R41 Unsatisfactory quality management capabilitiesR42 Unsatisfactory resource management capabilitiesR43 Unsatisfactory conflict management capabilitiesR44 Unsatisfactory contract management capabilitiesR45 Lacking connectivity between sites and headquartersR46 Unsatisfactory localizationR47 Lacking trust between project participantsR48 Poor communication between project participantsR49 Poor project management capabilities based on ITR50 Inadequate construction methods for the specific projectR51 Insufficient subcontractor project performance

first tested for reliability stability consistency predictabil-ity accuracy and dependency [31] Cronbachrsquos coefficientalpha internal consistency method was employed and resultsranging from 08 to 09 were considered reliable The PASWStatistics 21 program was used to analyze the data As Table 1shows missing values in the raw data were excluded fromthe reliability test to increase reliability As a result a total ofseven categories with 51 attributes were selected an averagereliability of 0843 was obtained indicating a high level ofreliability A total of 121 datasets for the 140 projects wereused to develop the Naıve Bayesian classifier indicating thatapproximately 20 of the responses were excluded due todata inconsistency

4 Nave Bayesian Classifier

A Naıve Bayesian classifier based on Bayesrsquo theorem isemployed to derive a probabilistic classification model frompreviously assessed attributes It is most commonly used forantispam mail filtering which trains the filter to automati-cally separate spam mail and legitimate messages in a binarymanner [32 33] In doing so each attribute correspondswith words included in the spam and legitimate mails thusdistinguishing between the two in turn the former areblocked by a trained filter

41 Conceptual Structure of a Proposed Model Because theobjective of this research is to screen bad or good projectsin the early stages of international construction projects theresult of the proposed model should also be binary (good orbad) Thus the authors transformed the original conditions

of each risk attribute (7-point Likert scale) into a binaryscale to develop the Naıve Bayesian model In additionthe early stages of the construction possess a high level ofuncertainties which prohibit analysis of exact ranges of profitrates or detailed numeric classification of project conditionsAccordingly a specific profit level or numeric number ofeach risk attribute is not essential particularly during thisearly phase To this end authors consider the unity of thebinary distribution of the dependent variable as well as thepreassessed risk attributes to improve its practical application

Thebenefits of employing theNaıve Bayesian classifier arethe following [34ndash36] First the model is less sensitive to out-liers Generally when an outlier exists in the data the resultsmay skew The reduction of outlier effects is thus requiredfor data analysis Because the Naıve Bayesian classifier usesa probabilistic distribution it enables minimizing the outliereffects of the original data and prediction results are lesssensitive to outliers Second less data is required to predictthe parameter By using the utility function relearning theprobability of a given conditional probability is not strictlyrequired in the Naıve Bayesian classifier thus minimizingdata collection efforts for risk prediction Third the classifieris more applicable to the industry practitioners by reducingthe burden of data collection Although the model utilizessimple assumptions and algorithms the classification resultsare powerful even though the specified assumptions maycontain errors Finally themodels allow for the considerationof additional attributes by combining existing characteris-tics of the attributes thus increasing the accuracy of themodelrsquos predictions Based on the aforementioned benefitsof the Naıve Bayesian method the authors have chosen this

6 Mathematical Problems in Engineering

RisknYes

No

Risk2Yes

No

Good

Bad

Risk1Yes

No

Real data profitability

P(r1 = yes | good)

Figure 3 Conceptual model of the Naıve Bayesian classifier

approach as themainmechanism for the proposed predictionmodel

To structure the Naıve Bayesian classifier we consideredthe unity of the binary distribution of the dependent vari-ables As mentioned earlier the effect of each risk attributeon the level of profit is assessed by using a 7-point Likertscale (minus2 to 4) to improve themodelrsquos practical application Inthis study a binary approach is used to identify the presenceof risk and final project profitability is predicted via theNaıve Bayesian classification Because this research is focusedon the early stages of the construction project screeningand recognizing a bad project are the priority rather thananalyzing the specific value of the project Therefore theassessment range from minus2 to 1 correlated with absence of riskor a ldquoNordquo answer and a score from 2 to 4 was correlated withthe presence of risk or a ldquoYesrdquo answer as shown in Figure 3

As the assessment approach is fairly simple users per-form fewer risk assessment tasks increasing the accuracy ofpredictions The benefit of a Naıve Bayesian classifier is thefact that all risk probabilities are assumed to be independentaccording to Murphy [35] although these assumptions maynot be correct developing a model using the Naıve Bayesianclassifier is feasible and generates highly reliable results forclassification compared to other methodologies

Equation (1) presents the classification group 119888 set (iegood or bad project) where 119903

1 119903

2 119903

3 119903

119863are vectors

ascribed to each independent risk variable in the NaıveBayesian classifier This categorization of attributes maxi-mizes the probability of each classification result Consider

119875 = (119888 | 119903

1 1199032 119903119863) (1)

Equation (2) shows the prediction mode that applies (1) toBayesrsquo theorem Consider

119875 (119888) times 119875 (1199031 1199032 119903119863 | 119888)

119875 (1199031 1199032 119903119863) (2)

As discussed above the Naıve Bayesian classifier assumesthat each attribute divided into each classification is indepen-dent thus (3) can be derived accordingly where V is the targetvalue of the group 119888 set (ie the profit rate) Consider

119875 (1199032 | 119888 1199031) = 119875 (1199032 | 119888)

119875 (119903 | V= 119888) =119863

prod

119894=1119875 (119903

119894| V119895= 119888)

(3)

Hence the Naıve Bayesian classifier is considered a suit-able method that uses a relatively small amount of historicaldata to determine whether projects are likely to be bador good The next section describes the proposed modelwhich uses theNaıve Bayesian classifier to examine the actualproject data collected previously

42 Naıve Bayesian Project Classification Model In devel-oping the Naıve Bayesian model the aforementioned riskattributes were used as relevant independent variables forpredicting good or bad projects An overview of the modelis presented in Figure 4

The dependent variable was predicted over two or threeclassification intervals and the two models are presentedaccordingly The model with two intervals is based on aprofitability of less than 0 for a bad project and morethan 0 for a good project The model with three intervalsrepresents projects with less than 0 profitabilitymdashthosegenerating a 0 to 45 rate of returnmdashand a profit rateabove 45 which is considered a good project A marginalprofit rate of 45 was chosen because the average profitrate for good projects out of the total sample of projectswas calculated at this range Because projects with negativeprofitability are all considered ldquobadrdquo ldquomoderaterdquo and ldquogoodrdquoprojects are differentiated based on the reference profit valueof 45 Table 2 represents intervals divided based on theresults

Using the training dataset each risk attribute is catego-rized into a specific class to develop the project-screeningmodelTherefore the conditional probability of each attribute

Mathematical Problems in Engineering 7

[Input]Profitability assessment

(i) Define the attributes for risk management (51 risk attributes)

(ii) Assess the impact on profitability (Likert scale)

[Function]Classification and analysis

(i) Determine the number of classification intervals (Naiumlve Bayesian classifier )

(ii) Selection of analysis methodrisk assessment

[Output]

Profitability assessment result

Figure 4 Overview of project classification model

Table 2 Classification of profitability

Bad project Moderate project Good project2 classificationintervals Below 0 mdash Above 0

3 classificationintervals Below 0 0ndash45 Above 45

given to the bad project in advance is shown in the followingequation

119875 (119903

119894| V119895) =

119899

119888

119899

(4)

119875 is estimated conditional probability 119903119894is risk attributes 119894

V119895is the condition of profit rate on Bad or Good based on the

training data (V=below0) 119899119888is number of bad projects under

the specific risk 119886119894in the training data (V = V

119895and 119903 = 119903

119894) and

119899 is number of bad projects in the training data (V = V119895)

However whereas this study examines 140 units of datathe number of training data is insufficient relative to thenumber of risk attributes Unbalanced datasets also lead to aninappropriate result in the statistical analysis [37] It is thusnecessary to augment the small datasets in the training setwhich decreases the accuracy of themodel To overcome thisthe authors borrowed the concept of the 119898-estimate from[38ndash40] By setting the value of a suitable ldquo119898rdquo number theshifting of probability toward parameter ldquo119901rdquo is controlledthis also further increases model accuracy as shown in thefollowing equation

119875

1015840(119903

119894| V119895) =

119899

119888+ 119898119901

119899 + 119898

(5)

119875

1015840 is estimated conditional probability using 119898-estimate 119898is equivalent data size and 119901 is prior probability based ontraining data (4)

Zadrozny and Elkan [38] recommend that the value of119898119901 be set to less than 10 thus using a probabilistic equationto cross-validate 140 data units 70 units are set as trainingdata and the remaining 70 units of119898 parameter are set to anequivalent data size to develop the project screening modelIn turn parameter 119901 becomes 1070 or 17 for (5) Of the 140data units collected 20 units were therefore classified as badprojects and 120 were deemed good projects Therefore thetwo datasetsmdashtraining and equivalent data sizemdashconsist of 10bad projects and 60 good projects that are chosen via randomselection

Five different datasets are created randomly to developand validate the model in view of cross validation Randomdata are selected using Microsoft Excel which generates datain a range from 0 to 1 for random probability In addition forthe training dataset the actual risk exposure level for eachattribute in a given project is used to develop the model Inthis paper only one set of training data results out of fivedifferent datasets is presented in detail as the other resultswere found to be almost equivalent

The Naıve Bayesian classifier consists of an 119898-estimatorparameter which gives a random value of 119898 to controlthe classification bias This study uses 119898 value from theldquomoderaterdquo interval of project profitability which had thehighest error percentage for further validation According tothe results of the first analysis119898 value in the Naıve Bayesianclassifier was reduced in the second analysis by modifying119898 value the moderate profitability project produces a morestable result In addition the weights given to the bad projectsincreased which results in both increased stability of theclassification and reduction of bad projectrsquos classificationerror Therefore the analysis of the moderate classificationresults was reanalyzed To identify a suitable 119898 value asensitivity analysis was also performed for the changes as 119898value decreases

8 Mathematical Problems in Engineering

Table 3 Part of the conditional probability of Naıve Bayesianclassifier model

Category Risk code 1198751015840(119903119894| BAD) 1198751015840(119903

119894| GOOD)

Political and socialrisks

R1 35 234R3 15 56

Economic andinflation risks

R5 15 89R7 20 242

Bidding and contractrisk

R15 10 274R19 15 153

Procurement risks R22 30 362R24 5 185

Geographic risks R29 40 226

By using the derived model using 119898-estimate each riskfactor is calculated for the conditional probability for Badand Good The two classification intervals model is usedto analyze each risk factor Table 3 shows the conditionalprobability of the key risk factors from each category thathas a severely negative impact (assessed as 3 or 4 points)on the total sample data (140 projects) For example theR24 risk had an 185 probability of occurring in a ldquoGoodrdquoproject and a 5 probability of occurring in a ldquoBadrdquo projectTo summarize Table 3 external (politic or economic) factorsmore frequently appear in the ldquobadrdquo projects while internal(bidding or procurement) factors more frequently occurr inthe ldquogoodrdquo projects

5 Model Analysis and Validation

51 Model Validation In employing the classifier model twodatasets (training and sample test datasets) for 70 projectswere used to compare actual and predicted risk impacts onthe dependent variable Actual risk effects in a given projectwere used to determine the profitability of the 70 projects Inaddition to evaluating the performance level of themodel themodelrsquos usability and accuracy were tested using a confusionmatrix of Bayesrsquo theorem To determine the accuracy ofthe model the dependent variable profitability included asample test dataset of 70 projects for comparison with theactual project performance and predicted performance ofa given project The 2 and 3 classification intervals modelswere also tested by classification into bad (PB PredictedBad) moderate (PM Predicted Moderate) and good (PGPredicted Good) categories the training dataset for the 70projects also employs the actual profit level which is classifiedas bad (AB Actual Bad) normal (AM Actual Moderate) orgood (AG Actual Good) Tables 4 and 5 present the results ofthe two models

The aforementioned fivemodels were randomly tested forcross validation In the test models a dataset of 70 projectswas randomly selected to determine the modelrsquos accuracy Asan initial measure each project was deemed bad or goodAs shown in Table 4 the overall correct level is found to be797 by dividing the bold numbers with the total number of

Table 4 Validation of profitability at 2 classification intervals

Model 1 out of fivecross validations

Classification levelBad (PB) Good (PG) Total

Actual LevelBad (AB) 5 5 10Good (AG) 9 51 60Total 14 56 70

Table 5 Validation of profitability at 3 classification intervals

Model 1 out of fivecross validations

Classification levelTotalBad

(PB)Moderate(PM)

Good(PG)

Actual LevelBad (AB) 3 5 2 10Moderate (AM) 3 16 13 32Good (AG) 3 13 12 28Total 9 34 27 70

samples Other cross validation models also showed highlyaccurate levels of 786 771 827 and 80 respectively

In contrast the model with three classified intervalspresented relatively low levels of 443 429 514 486and 500 with a mean percentage of 448 reflectingless accurate results (refer to Table 5) Models with 2 and3 classification intervals showed standard deviations of 207and 307 respectively this suggests the presence of a fewdifferences and consistent results between the two modelsThe 2 classification intervals model is considered an excellentvehicle for screening purposes as it presents highly accurateresults and less deviation Although the 3 classificationintervals model shows a relatively low level of accuracy ina real business environment the users also may apply the 3classification intervals model for screening out a moderaterange of projects by sacrificing the prediction accuracy

Additional analysis to validate the Naıve Bayesian classi-fication model is performed through comparing with linearregression model using SPSS 21 The authors used the samebinary method of dividing the profit rate into good andbad (2 classification intervals) By utilizing the same 140projectsrsquo units of data that is composed of 70 training unitsof data and 70 test units of data the accuracy result of linearregression model is found to be 522 This result shows asignificantly lower accuracy than the 2 classification intervalsNaıve Bayesian model (797) and slightly higher accuracythan the 3 classification intervals model (448)

In summary this study proposed two types of modelsdepending on the userrsquos need and degree of risk exposureThemodelswere validated through comparing the accuracy of theprediction result with other learning methods In additionthe user can flexibly select the proper model that best suitedto a given firm whether he or she requires an accurateclassification between good and bad project or is contend toidentify a moderate or good project

Mathematical Problems in Engineering 9

Profitability assessment

Method

Category Risk code Risk attributes Risk

Countryrsquos politicaland social risk

R1 Countryrsquos corruption collusion and illegal transaction O

R2 Culture customs and tradition difference X

R3 Policy and postlegislative changes X

R4 Countryrsquos overall legal systemrsquos maturity X

Countryrsquos economic and inflation risks

R5 Average material unit price change X

R6 Average equipment unit price change X

R7 Local currency exchange rate change X

Stakeholder risks

R8 Stakeholderrsquos understanding and management ability O

R9 Design error provided to stakeholder O

R10 Clarity of specifications and bidding documents provided to stakeholder X

R11 Stakeholderrsquos administrative and construction consent delay X

R12 Requirements regardless of contract from stakeholder X

Bidding and contract terms

R13 Bidding and quotation preparation time for construction X

R14 Contract of construction duration for construction X

R15 Required degree of local contents X

R16 Tax and tariff regulations X

R17 Inflation compensation conditions compared to international contract X

R18 Warranty-related conditions compared to international contract X

R19 Payment regulations compared to international contract X

R20 Dispute resolution procedures regulation compared to international contract X

1 Naiumlve Bayesian classifier

Figure 5 Excel-based risk assessments

52 Excel-Based Model Development Based on the afore-mentioned basic algorithm the authors developed an Excel-based system that serves as a continuous risk managementplatform for project screening for the international con-struction sector To initiate project screening risk attributesrelated to a given project are assessed for their impacton project profit depending on the amount of informationacquired As shown in Figure 5 system inputs for riskassessment are based on two scenarios the specified riskeither affects the project or does not In addition usersmay select their model depending on their objectives themore accurate Naıve Bayesian classifier with a higher projectclassification rate (2 classification intervals model) or the lessaccurate model with amore conservative classification rate (3classification intervals model)

The authors present a sample case in which the developedprogram is utilized to test the project screening modelthrough the user interface This case considers a Koreancontractor working on a harbor expansion project that lasts22 months Figure 5 shows step 1 which is the actual user

interface of the programgeneratedwith case study data inputsfor goodbad project screening As shown in Figure 5 theuser first evaluates the risk attribute using ldquoOrdquo for ldquoriskexistsrdquo indicating the presence of a given risk and ldquoXrdquo forldquorisk does not existrdquo indicating an absence of project riskTheuser then selects a model that is the number of classificationintervals Depending on the model selected the classificationmodel categorizes projects as bad moderate or good Themodel classifies the good projects as ranging from 0 to 45of the profit rate Project screening is conducted by selectingprofit interval ranges depending on firm risk behaviorModelresults are obtained from the training sample via randomselection through which classification results are selectedfrom the highest predicted value from 100 prediction cyclesThis process increases reliability and the results are shown inpercentage form for ease of comprehension

Figure 6 shows the procedural steps in the process of theExcel-based model Risk assessment is performed primarilyin step 1 Then in step 2 the model counts for the riskexistence based on the historical dataset of 140 real projects

10 Mathematical Problems in Engineering

R1 R2 R3 R4 R5 R6 R7 R8 R9 R10

Bad

10 10 10 10 10 10 10 10 10 10

1 1 0 0 2 3 1 0 0 0

0142857143 0142857143 0142857 0142857 0142857 0142857 0142857 0142857 0142857 0142857

001 001 001 001 001 001 001 001 001 001

01000 01000 00001 00001 01999 02998 01000 00001 00001 00001

Select

Result 1 1 1 1 1 1 1 1 1 1

0142857143

Moderate

31 31 31 31 31 31 31 31 31 31

0 0 0 0 0 0 0 0 0 0

0442857143 0442857143 0442857 0442857 0442857 0442857 0442857 0442857 0442857 0442857

001 001 001 001 001 001 001 001 001 001

0000142811 0000142811 0000143 0000143 0000143 0000143 0000143 0000143 0000143 0000143

Select 0 0 0 0 0 0 0 0 0 0

Result 1 1 1 1 1 1 1 1 1 1

0442857143

Good

29 29 29 29 29 29 29 29 29 29

17 10 15 12 29 12 10 19 14 9

0414285714 0414285714 0414286 0414286 0414286 0414286 0414286 0414286 0414286 0414286

001 001 001 001 001 001 001 001 001 001

0586147634 0344851529 0517206 0413793 0999798 0413793 0344852 0655089 0482735 0310381

Select 0 0 0 0 0 0 0 0 0 0

Result 1 1 1 1 1 1 1 1 1 1

0414285714

Result table Bad Good Moderate Final result Good

Max

Profitability assessment result

Project screening result Good project

Top 5 risks to be managed Probability

1 Average material unit price change 0999

2 Countryrsquos local subcontractorrsquos procurement conditions 0704

3 Complaints from construction during the project 0655

4 Stakeholderrsquos administrative and construction consent delay 0586

5 Clarity of specifications and bidding documents provided to stakeholders 0555

4

5

3

2

n

nc

p

m

p (good)

17238E minus 35

69489E minus 07

719685E minus 4769489E minus 07

n

nc

p

m

p (normal)

n

nc

p

m

p (bad)

p (ai | bad)

p (ai | normal)

p (ai | good)

Figure 6 Process for assessment result

Through the sum of its risk existence the conditional proba-bility is calculated given each classification of a project (GoodModerate or Bad) separately

Step 3 shows the result table for each classification thathas been selected from the profitability assessmentThis is thestep in which users are to perform the risk assessment fromFigure 6 where selected risks are calculated for conditionalprobability The result is calculated from the product ofeach probability from the 51 risks attributes Step 4 thenshows the final result by selecting the highest probabilityfrom each classification In the illustrative example ldquoGoodrdquohad the highest probability In addition as shown in step 5in the user interface the risks that most directly affectproject profitability can be prioritized which should be

continuouslymonitored and controlled by the userThese fiverisk attributes serve as the basis of a risk response strategy

The proposed model prioritizes the top 5 risks that affectoverall construction profitability According to Halawa et al[41] construction projects are dramatically influenced bythis financial evaluation and awareness of the financial statusimproves it offers a better chance of determining project fea-sibility Therefore this study incorporates the Naıve Bayesianclassifier with an Excel-based model to give practitioners aquantitative analysis result (statistical references) and quali-tative analysis results (the top risk attributes to be managed)It should also be noted that although the result of the modelshows an absolute value of bad moderate or good it is upto the managers and decision-makers to make a bad project

Mathematical Problems in Engineering 11

a moderate project or even good project by careful andproactive risk management as presented in the illustrativecase application

6 Conclusions

This study predicted the profitability of early-stage inter-national construction projects and identified good projectsthat may affect firmrsquos financial stability The current stateof the international construction market was first analyzedlosses were shown to account for 73 of the entire marketreflecting the importance of predicting project profitability inadvance Risk attributes that affect project profitability wereidentified and a risk management system was developed tomanage prioritize and monitor higher impact risks TheNaıve Bayesian classifier method was used to improve thepractical application of the project-screening model Whenrisk attribute assessments for determining risk exposurelevels are completed assessed risk exposure is classified intoa binary parameter that presents an approach to riskmanage-ment that is supported by a probabilistic methodology

This study used case study data to create a model usingMicrosoft Excel Depending on the Naıve Bayesian classifier119898-estimator value of the weight application and the userrsquosrisk behaviors users can choose the model that best suitsa given firm The results show that when a 70-unit trainingdataset and 70-unit sample dataset area are used (a 140-unitset in total) to cross-validate the Naıve Bayesian classifier themodel has on average a 797 prediction accuracy for the 2classification intervals model and a 448 accuracy for the 3classification intervals model Additional analysis to validatethe model is performed through linear regression analysisto show the higher precision result from the Naıve Bayesianclassification model

For project screening conditional probability is usedwithinformation acquired during early project stages Althoughrisk attributes were largely derived in the model riskattributes that more clearly affect projects can be furtherextracted elucidating the effects of bad projects

As for the limitations of this study analytical results forthe application of the developed program to a more diversescenario will be essential for example different types ofconstruction projects may be examined for superior resultsIn addition if contract types were categorized as contractorsubcontractor and joint ventures this may have altered theprofitability of each project A new-entrymarket scenariowillalso be analyzed separately due to the unique characteristicsof newmarkets Given a range of unique construction projectqualities a more realistic and practical application can beobtained in future research Furthermore if additional dataon both project risk assessment and the actual reliable profitrates are accumulated a diverse segmentation of the intervalscan be obtained to examine each project in a precisemeasure

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgment

This research was supported by a grant (14IFIP-B089065-01)funded by Ministry of Land Infrastructure and Transport ofKorean government

References

[1] Global Insight Global Construction Outlook Global InsightLexington Mass USA 2014

[2] International Contractors Association of Korea (ICAK) Inter-national Construction Information Service Construction Statis-tics 2013 httpwwwicakorkrstasta 0101php

[3] Samsung Engineering Business Results ReportThird Quarter of2013 Samsung Engineering Seoul Republic of Korea 2013

[4] M Abdelgawad and A R Fayek ldquoRisk management in theconstruction industry using combined fuzzy FMEA and fuzzyAHPrdquo Journal of Construction Engineering and Managementvol 136 no 9 pp 1028ndash1036 2010

[5] G Aydogan and A Koksal ldquoAn analysis of internationalconstruction risk factors on partner selection by applying ANPapproachrdquo in Proceedings of the International Conference onConstruction and Real Estate Management (ICCREM rsquo13) pp658ndash669 Karlsruhe Germany October 2013

[6] B-G Hwang X Zhao and L P Toh ldquoRisk management insmall construction projects in Singapore status barriers andimpactrdquo International Journal of Project Management vol 32no 1 pp 116ndash124 2014

[7] A E Yildiz I Dikmen M T Birgonul K Ercoskun andS Alten ldquoA knowledge-based risk mapping tool for costestimation of international construction projectsrdquo Automationin Construction vol 43 pp 144ndash155 2014

[8] O Taylan AO Bafail RM S Abdulaal andMRKabli ldquoCon-struction projects selection and risk assessment by fuzzy AHPand fuzzy TOPSISmethodologiesrdquoApplied Soft Computing vol17 pp 105ndash116 2014

[9] S Mu H Cheng M Chohr and W Peng ldquoAssessing riskmanagement capability of contractors in subway projects inmainland Chinardquo International Journal of Project Managementvol 32 no 3 pp 452ndash460 2014

[10] F Acebes J Pajares J M Galan and A Lopez-Paredes ldquoA newapproach for project control under uncertainty Going back tothe basicsrdquo International Journal of ProjectManagement vol 32no 3 pp 423ndash434 2014

[11] Y Zhang and Z-P Fan ldquoAn optimization method for selectingproject risk response strategiesrdquo International Journal of ProjectManagement vol 32 no 3 pp 412ndash422 2014

[12] M Rohaninejad andM Bagherpour ldquoApplication of risk analy-sis within value management a case study in dam engineeringrdquoJournal of Civil Engineering and Management vol 19 no 3 pp364ndash374 2013

[13] Q Yang and Y Zhang ldquoApplication research of AHP in projectrisk managementrdquo Advanced Materials Research vol 785-786pp 1480ndash1483 2013

[14] O Kaplinski ldquoThe utility theory in maintenance and repairstrategyrdquo Procedia Engineering vol 54 pp 604ndash614 2013

[15] M Hastak and A Shaked ldquoICRAM-1 model for internationalconstruction risk assessmentrdquo Journal of Management in Engi-neering vol 16 no 1 pp 59ndash69 2000

[16] S H Han W Jung D Y Kim and H Park ldquoA study for theimprovement of international construction risk managementrdquo

12 Mathematical Problems in Engineering

Conference of Korean Society of Civil Engineering (KSCE) no 10pp 791ndash794 2009

[17] V Kumar and N Viswanadham ldquoA CBR-based decision sup-port system framework for construction supply chain riskmanagementrdquo in Proceedings of the 3rd IEEE InternationalConference on Automation Science and Engineering (CASE rsquo07)pp 980ndash985 IEEE Scottsdale Ariz USA September 2007

[18] B Yang L X Li H Ji and J Xu ldquoAn early warning systemfor loan risk assessment using artificial neural networksrdquoKnowledge-Based Systems vol 14 no 5-6 pp 303ndash306 2001

[19] P Sanchez ldquoNeural-risk assessment system for constructionprojectsrdquo in Proceedings of the Construction Research Congresspp 1ndash11 San Diego Calif USA April 2005

[20] B R Fortunato IIIM RHallowellM Behm andKDewlaneyldquoIdentification of safety risks for high-performance sustainableconstruction projectsrdquo Journal of Construction Engineering andManagement vol 138 no 4 pp 499ndash508 2012

[21] J Li and P X W Zou ldquoFuzzy AHP-based risk assessmentmethodology for PPP projectsrdquo Journal of Construction Engi-neering and Management vol 137 no 12 pp 1205ndash1209 2011

[22] A del Cano and M P de la Cruz ldquoIntegrated methodology forproject risk managementrdquo Journal of Construction Engineeringand Management vol 128 no 6 pp 473ndash485 2002

[23] A P C Chan D C K Ho and C M Tam ldquoDesign andbuild project success factors multivariate analysisrdquo Journal ofConstruction Engineering and Management vol 127 no 2 pp93ndash100 2001

[24] D J Lowe and J Parvar ldquoA logistic regression approachto modelling the contractorrsquos decision to bidrdquo ConstructionManagement and Economics vol 22 no 6 pp 643ndash653 2004

[25] S H Han D Y Kim and H Kim ldquoPredicting profit per-formance for selecting candidate international constructionprojectsrdquo Journal of Construction Engineering andManagementvol 133 no 6 pp 425ndash436 2007

[26] A S Bu-Qammaz I Dikmen andM T Birgonul ldquoRisk assess-ment of international construction projects using the analyticnetwork processrdquoCanadian Journal of Civil Engineering vol 36no 7 pp 1170ndash1181 2009

[27] T Hegazy and A Ayed ldquoNeural network model for parametriccost estimation of highway projectsrdquo Journal of ConstructionEngineering and Management vol 124 no 3 pp 210ndash218 1998

[28] RuleQuest Data mining tools 2008 httpwwwrule-questcom

[29] W Jung Three-Staged Risk Evaluation Model for Bidding onInternational Construction Projects Graduate School YonseiUniversity School of Civil and Environmental EngineeringSeoul Republic of Korea 2011

[30] Construction Industry Institute (CII) ldquoRisk assessment forinternational projectsrdquo Research Report 181-11 ConstructionIndustry Institute (CII) Austin Tex USA 2003

[31] G Young ldquoGlossary and discussion of termsrdquo in MalingeringFeigning and Response Bias in PsychiatricPsychological Injuryvol 56 of International Library of Ethics Law and the NewMedicine pp 777ndash795 Springer Dordrecht The Netherlands2014

[32] M Sahami S Dumais D Heckerman and E Horvitz ldquoABayesian approach to filtering junk e-mailrdquo in Learning for TextCategorization Papers from the 1998 Workshop vol 62 pp 98ndash105 1998

[33] I Androutsopoulos J Koutsias K V Chandrinos G Paliourasand C Spyropoulos ldquoAn evaluation of Naıve Bayesian anti-spam filteringrdquo in Proceedings of the Workshop on Machine

Learning in the New Information Age 11th European Conferenceon Machine Learning G Potamias V Moustakis and M vanSomeren Eds pp 9ndash17 Barcelona Spain 2000 httparxivorgpdfcs0006013pdf

[34] H Zhang ldquoThe optimality of naıve Bayesrdquo in Proceedings of the17th International Florida Artificial Intelligence Research SocietyConference (FLAIRS rsquo04) Miami Beach Fla USA May 2004

[35] K P Murphy Naive Bayes Classifiers University of BritishColumbia Vancouver Canada 2006

[36] R R Yager ldquoAn extension of the naive Bayesian classifierrdquoInformation Sciences vol 176 no 5 pp 577ndash588 2006

[37] D Lowd and P Domingos ldquoNaive Bayes models for probabilityestimationrdquo in Proceedings of the 22nd International Conferenceon Machine Learning (ICML rsquo05) pp 529ndash536 ACM BonnGermany August 2005

[38] B Zadrozny and C Elkan ldquoLearning and making decisionswhen costs and probabilities are both unknownrdquo in Proceedingsof the 7th ACMSIGKDD International Conference on KnowledgeDiscovery and Data Mining pp 204ndash213 ACM August 2001

[39] L Jiang H Zhang Z H Cai and D Wang ldquoWeighted averageof one-dependence estimatorsrdquo Journal of Experimental andTheoretical Artificial Intelligence vol 24 no 2 pp 219ndash230 2012

[40] J Wu and Z Cai ldquoA naive Bayes probability estimation modelbased on self-adaptive differential evolutionrdquo Journal of Intelli-gent Information Systems vol 42 no 3 pp 671ndash694 2014

[41] W S Halawa A M K Abdelalim and I A ElrashedldquoFinancial evaluation program for construction projects at thepre-investment phase in developing countries a case studyrdquoInternational Journal of Project Management vol 31 no 6 pp912ndash923 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 4: Research Article ve Bayesian Classifier for Selecting Good ...downloads.hindawi.com/journals/mpe/2015/830781.pdf · Na \ve Bayesian Classifier for Selecting Good/Bad Projects during

4 Mathematical Problems in Engineering

Table 1 Risk attributes affecting project profitability

Category Risk code Risk attribute Cronbachrsquos 120572 value

Political and social risks

R1 High degree of corruption collusion and illegal transaction

0787R2 Lack of familiarity with local cultures customs and

traditionsR3 Frequent policy and postlegislative changesR4 Insufficient legal system

Economic and inflation risksR5 High material unit price volatility

0735R6 High equipment unit price volatilityR7 High local currency exchange rate volatility

Owner risks

R8 Unsatisfactory owner management capabilities

0869

R9 Insufficient design information from the owner

R10 Lacking specifications and bidding documents provided bythe owner

R11 Approval and permission delays on documentation andconstruction

R12 Illegal requirements without contracts

Bidding and contract risks

R13 Insufficient time for bidding and construction quotationpreparation

0834

R14 Inadequate consideration of variations in contract durationR15 High-level local content requirementsR16 Complex tax and tariff regulationsR17 Low-level inflation compensation conditionsR18 Inadequate warranty-related conditions

R19 Inadequate payment regulations relative to internationalcontracts

R20 Ineffective dispute resolution procedures and regulations

Procurement risks

R21 Inadequate local labor supplies and procurement conditions

0905

R22 Inadequate local labor supplies and demand conditions

R23 Inadequate material production supplies and procurementconditions

R24 Inadequate material production supplies and demandconditions

R25 Inadequate material production from third-world-countrysupplies and demand conditions

R26 Inadequate equipment from local country supplies andprocurement conditions

R27 Inadequate local equipment supply procurement anddemand conditions

R28 Inadequate local subcontractor procurement conditions

Geographic risks

R29 Inadequate geographical conditions of materials andequipment delivery

0853

R30 Inadequate logistical conditions (ie electricity water gasand communication)

R31 Inadequate business environment and welfare conditionsR32 Inadequate weather conditions for constructionR33 Inadequate geological site conditions

R34 Severe complaints regarding construction throughout projectimplementation

Mathematical Problems in Engineering 5

Table 1 Continued

Category Risk code Risk attribute Cronbachrsquos 120572 value

Construction performance risks

R35 Lacking design experience

0917

R36 Lacking bid estimation experienceR37 Unsatisfactory cash flow management capabilities

R38 Unsatisfactory site manager leadership and organizationmanagement capabilities

R39 Unsatisfactory site manager experience relative to similarconstruction projects

R40 Unsatisfactory international project planning andmanagement capabilities

R41 Unsatisfactory quality management capabilitiesR42 Unsatisfactory resource management capabilitiesR43 Unsatisfactory conflict management capabilitiesR44 Unsatisfactory contract management capabilitiesR45 Lacking connectivity between sites and headquartersR46 Unsatisfactory localizationR47 Lacking trust between project participantsR48 Poor communication between project participantsR49 Poor project management capabilities based on ITR50 Inadequate construction methods for the specific projectR51 Insufficient subcontractor project performance

first tested for reliability stability consistency predictabil-ity accuracy and dependency [31] Cronbachrsquos coefficientalpha internal consistency method was employed and resultsranging from 08 to 09 were considered reliable The PASWStatistics 21 program was used to analyze the data As Table 1shows missing values in the raw data were excluded fromthe reliability test to increase reliability As a result a total ofseven categories with 51 attributes were selected an averagereliability of 0843 was obtained indicating a high level ofreliability A total of 121 datasets for the 140 projects wereused to develop the Naıve Bayesian classifier indicating thatapproximately 20 of the responses were excluded due todata inconsistency

4 Nave Bayesian Classifier

A Naıve Bayesian classifier based on Bayesrsquo theorem isemployed to derive a probabilistic classification model frompreviously assessed attributes It is most commonly used forantispam mail filtering which trains the filter to automati-cally separate spam mail and legitimate messages in a binarymanner [32 33] In doing so each attribute correspondswith words included in the spam and legitimate mails thusdistinguishing between the two in turn the former areblocked by a trained filter

41 Conceptual Structure of a Proposed Model Because theobjective of this research is to screen bad or good projectsin the early stages of international construction projects theresult of the proposed model should also be binary (good orbad) Thus the authors transformed the original conditions

of each risk attribute (7-point Likert scale) into a binaryscale to develop the Naıve Bayesian model In additionthe early stages of the construction possess a high level ofuncertainties which prohibit analysis of exact ranges of profitrates or detailed numeric classification of project conditionsAccordingly a specific profit level or numeric number ofeach risk attribute is not essential particularly during thisearly phase To this end authors consider the unity of thebinary distribution of the dependent variable as well as thepreassessed risk attributes to improve its practical application

Thebenefits of employing theNaıve Bayesian classifier arethe following [34ndash36] First the model is less sensitive to out-liers Generally when an outlier exists in the data the resultsmay skew The reduction of outlier effects is thus requiredfor data analysis Because the Naıve Bayesian classifier usesa probabilistic distribution it enables minimizing the outliereffects of the original data and prediction results are lesssensitive to outliers Second less data is required to predictthe parameter By using the utility function relearning theprobability of a given conditional probability is not strictlyrequired in the Naıve Bayesian classifier thus minimizingdata collection efforts for risk prediction Third the classifieris more applicable to the industry practitioners by reducingthe burden of data collection Although the model utilizessimple assumptions and algorithms the classification resultsare powerful even though the specified assumptions maycontain errors Finally themodels allow for the considerationof additional attributes by combining existing characteris-tics of the attributes thus increasing the accuracy of themodelrsquos predictions Based on the aforementioned benefitsof the Naıve Bayesian method the authors have chosen this

6 Mathematical Problems in Engineering

RisknYes

No

Risk2Yes

No

Good

Bad

Risk1Yes

No

Real data profitability

P(r1 = yes | good)

Figure 3 Conceptual model of the Naıve Bayesian classifier

approach as themainmechanism for the proposed predictionmodel

To structure the Naıve Bayesian classifier we consideredthe unity of the binary distribution of the dependent vari-ables As mentioned earlier the effect of each risk attributeon the level of profit is assessed by using a 7-point Likertscale (minus2 to 4) to improve themodelrsquos practical application Inthis study a binary approach is used to identify the presenceof risk and final project profitability is predicted via theNaıve Bayesian classification Because this research is focusedon the early stages of the construction project screeningand recognizing a bad project are the priority rather thananalyzing the specific value of the project Therefore theassessment range from minus2 to 1 correlated with absence of riskor a ldquoNordquo answer and a score from 2 to 4 was correlated withthe presence of risk or a ldquoYesrdquo answer as shown in Figure 3

As the assessment approach is fairly simple users per-form fewer risk assessment tasks increasing the accuracy ofpredictions The benefit of a Naıve Bayesian classifier is thefact that all risk probabilities are assumed to be independentaccording to Murphy [35] although these assumptions maynot be correct developing a model using the Naıve Bayesianclassifier is feasible and generates highly reliable results forclassification compared to other methodologies

Equation (1) presents the classification group 119888 set (iegood or bad project) where 119903

1 119903

2 119903

3 119903

119863are vectors

ascribed to each independent risk variable in the NaıveBayesian classifier This categorization of attributes maxi-mizes the probability of each classification result Consider

119875 = (119888 | 119903

1 1199032 119903119863) (1)

Equation (2) shows the prediction mode that applies (1) toBayesrsquo theorem Consider

119875 (119888) times 119875 (1199031 1199032 119903119863 | 119888)

119875 (1199031 1199032 119903119863) (2)

As discussed above the Naıve Bayesian classifier assumesthat each attribute divided into each classification is indepen-dent thus (3) can be derived accordingly where V is the targetvalue of the group 119888 set (ie the profit rate) Consider

119875 (1199032 | 119888 1199031) = 119875 (1199032 | 119888)

119875 (119903 | V= 119888) =119863

prod

119894=1119875 (119903

119894| V119895= 119888)

(3)

Hence the Naıve Bayesian classifier is considered a suit-able method that uses a relatively small amount of historicaldata to determine whether projects are likely to be bador good The next section describes the proposed modelwhich uses theNaıve Bayesian classifier to examine the actualproject data collected previously

42 Naıve Bayesian Project Classification Model In devel-oping the Naıve Bayesian model the aforementioned riskattributes were used as relevant independent variables forpredicting good or bad projects An overview of the modelis presented in Figure 4

The dependent variable was predicted over two or threeclassification intervals and the two models are presentedaccordingly The model with two intervals is based on aprofitability of less than 0 for a bad project and morethan 0 for a good project The model with three intervalsrepresents projects with less than 0 profitabilitymdashthosegenerating a 0 to 45 rate of returnmdashand a profit rateabove 45 which is considered a good project A marginalprofit rate of 45 was chosen because the average profitrate for good projects out of the total sample of projectswas calculated at this range Because projects with negativeprofitability are all considered ldquobadrdquo ldquomoderaterdquo and ldquogoodrdquoprojects are differentiated based on the reference profit valueof 45 Table 2 represents intervals divided based on theresults

Using the training dataset each risk attribute is catego-rized into a specific class to develop the project-screeningmodelTherefore the conditional probability of each attribute

Mathematical Problems in Engineering 7

[Input]Profitability assessment

(i) Define the attributes for risk management (51 risk attributes)

(ii) Assess the impact on profitability (Likert scale)

[Function]Classification and analysis

(i) Determine the number of classification intervals (Naiumlve Bayesian classifier )

(ii) Selection of analysis methodrisk assessment

[Output]

Profitability assessment result

Figure 4 Overview of project classification model

Table 2 Classification of profitability

Bad project Moderate project Good project2 classificationintervals Below 0 mdash Above 0

3 classificationintervals Below 0 0ndash45 Above 45

given to the bad project in advance is shown in the followingequation

119875 (119903

119894| V119895) =

119899

119888

119899

(4)

119875 is estimated conditional probability 119903119894is risk attributes 119894

V119895is the condition of profit rate on Bad or Good based on the

training data (V=below0) 119899119888is number of bad projects under

the specific risk 119886119894in the training data (V = V

119895and 119903 = 119903

119894) and

119899 is number of bad projects in the training data (V = V119895)

However whereas this study examines 140 units of datathe number of training data is insufficient relative to thenumber of risk attributes Unbalanced datasets also lead to aninappropriate result in the statistical analysis [37] It is thusnecessary to augment the small datasets in the training setwhich decreases the accuracy of themodel To overcome thisthe authors borrowed the concept of the 119898-estimate from[38ndash40] By setting the value of a suitable ldquo119898rdquo number theshifting of probability toward parameter ldquo119901rdquo is controlledthis also further increases model accuracy as shown in thefollowing equation

119875

1015840(119903

119894| V119895) =

119899

119888+ 119898119901

119899 + 119898

(5)

119875

1015840 is estimated conditional probability using 119898-estimate 119898is equivalent data size and 119901 is prior probability based ontraining data (4)

Zadrozny and Elkan [38] recommend that the value of119898119901 be set to less than 10 thus using a probabilistic equationto cross-validate 140 data units 70 units are set as trainingdata and the remaining 70 units of119898 parameter are set to anequivalent data size to develop the project screening modelIn turn parameter 119901 becomes 1070 or 17 for (5) Of the 140data units collected 20 units were therefore classified as badprojects and 120 were deemed good projects Therefore thetwo datasetsmdashtraining and equivalent data sizemdashconsist of 10bad projects and 60 good projects that are chosen via randomselection

Five different datasets are created randomly to developand validate the model in view of cross validation Randomdata are selected using Microsoft Excel which generates datain a range from 0 to 1 for random probability In addition forthe training dataset the actual risk exposure level for eachattribute in a given project is used to develop the model Inthis paper only one set of training data results out of fivedifferent datasets is presented in detail as the other resultswere found to be almost equivalent

The Naıve Bayesian classifier consists of an 119898-estimatorparameter which gives a random value of 119898 to controlthe classification bias This study uses 119898 value from theldquomoderaterdquo interval of project profitability which had thehighest error percentage for further validation According tothe results of the first analysis119898 value in the Naıve Bayesianclassifier was reduced in the second analysis by modifying119898 value the moderate profitability project produces a morestable result In addition the weights given to the bad projectsincreased which results in both increased stability of theclassification and reduction of bad projectrsquos classificationerror Therefore the analysis of the moderate classificationresults was reanalyzed To identify a suitable 119898 value asensitivity analysis was also performed for the changes as 119898value decreases

8 Mathematical Problems in Engineering

Table 3 Part of the conditional probability of Naıve Bayesianclassifier model

Category Risk code 1198751015840(119903119894| BAD) 1198751015840(119903

119894| GOOD)

Political and socialrisks

R1 35 234R3 15 56

Economic andinflation risks

R5 15 89R7 20 242

Bidding and contractrisk

R15 10 274R19 15 153

Procurement risks R22 30 362R24 5 185

Geographic risks R29 40 226

By using the derived model using 119898-estimate each riskfactor is calculated for the conditional probability for Badand Good The two classification intervals model is usedto analyze each risk factor Table 3 shows the conditionalprobability of the key risk factors from each category thathas a severely negative impact (assessed as 3 or 4 points)on the total sample data (140 projects) For example theR24 risk had an 185 probability of occurring in a ldquoGoodrdquoproject and a 5 probability of occurring in a ldquoBadrdquo projectTo summarize Table 3 external (politic or economic) factorsmore frequently appear in the ldquobadrdquo projects while internal(bidding or procurement) factors more frequently occurr inthe ldquogoodrdquo projects

5 Model Analysis and Validation

51 Model Validation In employing the classifier model twodatasets (training and sample test datasets) for 70 projectswere used to compare actual and predicted risk impacts onthe dependent variable Actual risk effects in a given projectwere used to determine the profitability of the 70 projects Inaddition to evaluating the performance level of themodel themodelrsquos usability and accuracy were tested using a confusionmatrix of Bayesrsquo theorem To determine the accuracy ofthe model the dependent variable profitability included asample test dataset of 70 projects for comparison with theactual project performance and predicted performance ofa given project The 2 and 3 classification intervals modelswere also tested by classification into bad (PB PredictedBad) moderate (PM Predicted Moderate) and good (PGPredicted Good) categories the training dataset for the 70projects also employs the actual profit level which is classifiedas bad (AB Actual Bad) normal (AM Actual Moderate) orgood (AG Actual Good) Tables 4 and 5 present the results ofthe two models

The aforementioned fivemodels were randomly tested forcross validation In the test models a dataset of 70 projectswas randomly selected to determine the modelrsquos accuracy Asan initial measure each project was deemed bad or goodAs shown in Table 4 the overall correct level is found to be797 by dividing the bold numbers with the total number of

Table 4 Validation of profitability at 2 classification intervals

Model 1 out of fivecross validations

Classification levelBad (PB) Good (PG) Total

Actual LevelBad (AB) 5 5 10Good (AG) 9 51 60Total 14 56 70

Table 5 Validation of profitability at 3 classification intervals

Model 1 out of fivecross validations

Classification levelTotalBad

(PB)Moderate(PM)

Good(PG)

Actual LevelBad (AB) 3 5 2 10Moderate (AM) 3 16 13 32Good (AG) 3 13 12 28Total 9 34 27 70

samples Other cross validation models also showed highlyaccurate levels of 786 771 827 and 80 respectively

In contrast the model with three classified intervalspresented relatively low levels of 443 429 514 486and 500 with a mean percentage of 448 reflectingless accurate results (refer to Table 5) Models with 2 and3 classification intervals showed standard deviations of 207and 307 respectively this suggests the presence of a fewdifferences and consistent results between the two modelsThe 2 classification intervals model is considered an excellentvehicle for screening purposes as it presents highly accurateresults and less deviation Although the 3 classificationintervals model shows a relatively low level of accuracy ina real business environment the users also may apply the 3classification intervals model for screening out a moderaterange of projects by sacrificing the prediction accuracy

Additional analysis to validate the Naıve Bayesian classi-fication model is performed through comparing with linearregression model using SPSS 21 The authors used the samebinary method of dividing the profit rate into good andbad (2 classification intervals) By utilizing the same 140projectsrsquo units of data that is composed of 70 training unitsof data and 70 test units of data the accuracy result of linearregression model is found to be 522 This result shows asignificantly lower accuracy than the 2 classification intervalsNaıve Bayesian model (797) and slightly higher accuracythan the 3 classification intervals model (448)

In summary this study proposed two types of modelsdepending on the userrsquos need and degree of risk exposureThemodelswere validated through comparing the accuracy of theprediction result with other learning methods In additionthe user can flexibly select the proper model that best suitedto a given firm whether he or she requires an accurateclassification between good and bad project or is contend toidentify a moderate or good project

Mathematical Problems in Engineering 9

Profitability assessment

Method

Category Risk code Risk attributes Risk

Countryrsquos politicaland social risk

R1 Countryrsquos corruption collusion and illegal transaction O

R2 Culture customs and tradition difference X

R3 Policy and postlegislative changes X

R4 Countryrsquos overall legal systemrsquos maturity X

Countryrsquos economic and inflation risks

R5 Average material unit price change X

R6 Average equipment unit price change X

R7 Local currency exchange rate change X

Stakeholder risks

R8 Stakeholderrsquos understanding and management ability O

R9 Design error provided to stakeholder O

R10 Clarity of specifications and bidding documents provided to stakeholder X

R11 Stakeholderrsquos administrative and construction consent delay X

R12 Requirements regardless of contract from stakeholder X

Bidding and contract terms

R13 Bidding and quotation preparation time for construction X

R14 Contract of construction duration for construction X

R15 Required degree of local contents X

R16 Tax and tariff regulations X

R17 Inflation compensation conditions compared to international contract X

R18 Warranty-related conditions compared to international contract X

R19 Payment regulations compared to international contract X

R20 Dispute resolution procedures regulation compared to international contract X

1 Naiumlve Bayesian classifier

Figure 5 Excel-based risk assessments

52 Excel-Based Model Development Based on the afore-mentioned basic algorithm the authors developed an Excel-based system that serves as a continuous risk managementplatform for project screening for the international con-struction sector To initiate project screening risk attributesrelated to a given project are assessed for their impacton project profit depending on the amount of informationacquired As shown in Figure 5 system inputs for riskassessment are based on two scenarios the specified riskeither affects the project or does not In addition usersmay select their model depending on their objectives themore accurate Naıve Bayesian classifier with a higher projectclassification rate (2 classification intervals model) or the lessaccurate model with amore conservative classification rate (3classification intervals model)

The authors present a sample case in which the developedprogram is utilized to test the project screening modelthrough the user interface This case considers a Koreancontractor working on a harbor expansion project that lasts22 months Figure 5 shows step 1 which is the actual user

interface of the programgeneratedwith case study data inputsfor goodbad project screening As shown in Figure 5 theuser first evaluates the risk attribute using ldquoOrdquo for ldquoriskexistsrdquo indicating the presence of a given risk and ldquoXrdquo forldquorisk does not existrdquo indicating an absence of project riskTheuser then selects a model that is the number of classificationintervals Depending on the model selected the classificationmodel categorizes projects as bad moderate or good Themodel classifies the good projects as ranging from 0 to 45of the profit rate Project screening is conducted by selectingprofit interval ranges depending on firm risk behaviorModelresults are obtained from the training sample via randomselection through which classification results are selectedfrom the highest predicted value from 100 prediction cyclesThis process increases reliability and the results are shown inpercentage form for ease of comprehension

Figure 6 shows the procedural steps in the process of theExcel-based model Risk assessment is performed primarilyin step 1 Then in step 2 the model counts for the riskexistence based on the historical dataset of 140 real projects

10 Mathematical Problems in Engineering

R1 R2 R3 R4 R5 R6 R7 R8 R9 R10

Bad

10 10 10 10 10 10 10 10 10 10

1 1 0 0 2 3 1 0 0 0

0142857143 0142857143 0142857 0142857 0142857 0142857 0142857 0142857 0142857 0142857

001 001 001 001 001 001 001 001 001 001

01000 01000 00001 00001 01999 02998 01000 00001 00001 00001

Select

Result 1 1 1 1 1 1 1 1 1 1

0142857143

Moderate

31 31 31 31 31 31 31 31 31 31

0 0 0 0 0 0 0 0 0 0

0442857143 0442857143 0442857 0442857 0442857 0442857 0442857 0442857 0442857 0442857

001 001 001 001 001 001 001 001 001 001

0000142811 0000142811 0000143 0000143 0000143 0000143 0000143 0000143 0000143 0000143

Select 0 0 0 0 0 0 0 0 0 0

Result 1 1 1 1 1 1 1 1 1 1

0442857143

Good

29 29 29 29 29 29 29 29 29 29

17 10 15 12 29 12 10 19 14 9

0414285714 0414285714 0414286 0414286 0414286 0414286 0414286 0414286 0414286 0414286

001 001 001 001 001 001 001 001 001 001

0586147634 0344851529 0517206 0413793 0999798 0413793 0344852 0655089 0482735 0310381

Select 0 0 0 0 0 0 0 0 0 0

Result 1 1 1 1 1 1 1 1 1 1

0414285714

Result table Bad Good Moderate Final result Good

Max

Profitability assessment result

Project screening result Good project

Top 5 risks to be managed Probability

1 Average material unit price change 0999

2 Countryrsquos local subcontractorrsquos procurement conditions 0704

3 Complaints from construction during the project 0655

4 Stakeholderrsquos administrative and construction consent delay 0586

5 Clarity of specifications and bidding documents provided to stakeholders 0555

4

5

3

2

n

nc

p

m

p (good)

17238E minus 35

69489E minus 07

719685E minus 4769489E minus 07

n

nc

p

m

p (normal)

n

nc

p

m

p (bad)

p (ai | bad)

p (ai | normal)

p (ai | good)

Figure 6 Process for assessment result

Through the sum of its risk existence the conditional proba-bility is calculated given each classification of a project (GoodModerate or Bad) separately

Step 3 shows the result table for each classification thathas been selected from the profitability assessmentThis is thestep in which users are to perform the risk assessment fromFigure 6 where selected risks are calculated for conditionalprobability The result is calculated from the product ofeach probability from the 51 risks attributes Step 4 thenshows the final result by selecting the highest probabilityfrom each classification In the illustrative example ldquoGoodrdquohad the highest probability In addition as shown in step 5in the user interface the risks that most directly affectproject profitability can be prioritized which should be

continuouslymonitored and controlled by the userThese fiverisk attributes serve as the basis of a risk response strategy

The proposed model prioritizes the top 5 risks that affectoverall construction profitability According to Halawa et al[41] construction projects are dramatically influenced bythis financial evaluation and awareness of the financial statusimproves it offers a better chance of determining project fea-sibility Therefore this study incorporates the Naıve Bayesianclassifier with an Excel-based model to give practitioners aquantitative analysis result (statistical references) and quali-tative analysis results (the top risk attributes to be managed)It should also be noted that although the result of the modelshows an absolute value of bad moderate or good it is upto the managers and decision-makers to make a bad project

Mathematical Problems in Engineering 11

a moderate project or even good project by careful andproactive risk management as presented in the illustrativecase application

6 Conclusions

This study predicted the profitability of early-stage inter-national construction projects and identified good projectsthat may affect firmrsquos financial stability The current stateof the international construction market was first analyzedlosses were shown to account for 73 of the entire marketreflecting the importance of predicting project profitability inadvance Risk attributes that affect project profitability wereidentified and a risk management system was developed tomanage prioritize and monitor higher impact risks TheNaıve Bayesian classifier method was used to improve thepractical application of the project-screening model Whenrisk attribute assessments for determining risk exposurelevels are completed assessed risk exposure is classified intoa binary parameter that presents an approach to riskmanage-ment that is supported by a probabilistic methodology

This study used case study data to create a model usingMicrosoft Excel Depending on the Naıve Bayesian classifier119898-estimator value of the weight application and the userrsquosrisk behaviors users can choose the model that best suitsa given firm The results show that when a 70-unit trainingdataset and 70-unit sample dataset area are used (a 140-unitset in total) to cross-validate the Naıve Bayesian classifier themodel has on average a 797 prediction accuracy for the 2classification intervals model and a 448 accuracy for the 3classification intervals model Additional analysis to validatethe model is performed through linear regression analysisto show the higher precision result from the Naıve Bayesianclassification model

For project screening conditional probability is usedwithinformation acquired during early project stages Althoughrisk attributes were largely derived in the model riskattributes that more clearly affect projects can be furtherextracted elucidating the effects of bad projects

As for the limitations of this study analytical results forthe application of the developed program to a more diversescenario will be essential for example different types ofconstruction projects may be examined for superior resultsIn addition if contract types were categorized as contractorsubcontractor and joint ventures this may have altered theprofitability of each project A new-entrymarket scenariowillalso be analyzed separately due to the unique characteristicsof newmarkets Given a range of unique construction projectqualities a more realistic and practical application can beobtained in future research Furthermore if additional dataon both project risk assessment and the actual reliable profitrates are accumulated a diverse segmentation of the intervalscan be obtained to examine each project in a precisemeasure

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgment

This research was supported by a grant (14IFIP-B089065-01)funded by Ministry of Land Infrastructure and Transport ofKorean government

References

[1] Global Insight Global Construction Outlook Global InsightLexington Mass USA 2014

[2] International Contractors Association of Korea (ICAK) Inter-national Construction Information Service Construction Statis-tics 2013 httpwwwicakorkrstasta 0101php

[3] Samsung Engineering Business Results ReportThird Quarter of2013 Samsung Engineering Seoul Republic of Korea 2013

[4] M Abdelgawad and A R Fayek ldquoRisk management in theconstruction industry using combined fuzzy FMEA and fuzzyAHPrdquo Journal of Construction Engineering and Managementvol 136 no 9 pp 1028ndash1036 2010

[5] G Aydogan and A Koksal ldquoAn analysis of internationalconstruction risk factors on partner selection by applying ANPapproachrdquo in Proceedings of the International Conference onConstruction and Real Estate Management (ICCREM rsquo13) pp658ndash669 Karlsruhe Germany October 2013

[6] B-G Hwang X Zhao and L P Toh ldquoRisk management insmall construction projects in Singapore status barriers andimpactrdquo International Journal of Project Management vol 32no 1 pp 116ndash124 2014

[7] A E Yildiz I Dikmen M T Birgonul K Ercoskun andS Alten ldquoA knowledge-based risk mapping tool for costestimation of international construction projectsrdquo Automationin Construction vol 43 pp 144ndash155 2014

[8] O Taylan AO Bafail RM S Abdulaal andMRKabli ldquoCon-struction projects selection and risk assessment by fuzzy AHPand fuzzy TOPSISmethodologiesrdquoApplied Soft Computing vol17 pp 105ndash116 2014

[9] S Mu H Cheng M Chohr and W Peng ldquoAssessing riskmanagement capability of contractors in subway projects inmainland Chinardquo International Journal of Project Managementvol 32 no 3 pp 452ndash460 2014

[10] F Acebes J Pajares J M Galan and A Lopez-Paredes ldquoA newapproach for project control under uncertainty Going back tothe basicsrdquo International Journal of ProjectManagement vol 32no 3 pp 423ndash434 2014

[11] Y Zhang and Z-P Fan ldquoAn optimization method for selectingproject risk response strategiesrdquo International Journal of ProjectManagement vol 32 no 3 pp 412ndash422 2014

[12] M Rohaninejad andM Bagherpour ldquoApplication of risk analy-sis within value management a case study in dam engineeringrdquoJournal of Civil Engineering and Management vol 19 no 3 pp364ndash374 2013

[13] Q Yang and Y Zhang ldquoApplication research of AHP in projectrisk managementrdquo Advanced Materials Research vol 785-786pp 1480ndash1483 2013

[14] O Kaplinski ldquoThe utility theory in maintenance and repairstrategyrdquo Procedia Engineering vol 54 pp 604ndash614 2013

[15] M Hastak and A Shaked ldquoICRAM-1 model for internationalconstruction risk assessmentrdquo Journal of Management in Engi-neering vol 16 no 1 pp 59ndash69 2000

[16] S H Han W Jung D Y Kim and H Park ldquoA study for theimprovement of international construction risk managementrdquo

12 Mathematical Problems in Engineering

Conference of Korean Society of Civil Engineering (KSCE) no 10pp 791ndash794 2009

[17] V Kumar and N Viswanadham ldquoA CBR-based decision sup-port system framework for construction supply chain riskmanagementrdquo in Proceedings of the 3rd IEEE InternationalConference on Automation Science and Engineering (CASE rsquo07)pp 980ndash985 IEEE Scottsdale Ariz USA September 2007

[18] B Yang L X Li H Ji and J Xu ldquoAn early warning systemfor loan risk assessment using artificial neural networksrdquoKnowledge-Based Systems vol 14 no 5-6 pp 303ndash306 2001

[19] P Sanchez ldquoNeural-risk assessment system for constructionprojectsrdquo in Proceedings of the Construction Research Congresspp 1ndash11 San Diego Calif USA April 2005

[20] B R Fortunato IIIM RHallowellM Behm andKDewlaneyldquoIdentification of safety risks for high-performance sustainableconstruction projectsrdquo Journal of Construction Engineering andManagement vol 138 no 4 pp 499ndash508 2012

[21] J Li and P X W Zou ldquoFuzzy AHP-based risk assessmentmethodology for PPP projectsrdquo Journal of Construction Engi-neering and Management vol 137 no 12 pp 1205ndash1209 2011

[22] A del Cano and M P de la Cruz ldquoIntegrated methodology forproject risk managementrdquo Journal of Construction Engineeringand Management vol 128 no 6 pp 473ndash485 2002

[23] A P C Chan D C K Ho and C M Tam ldquoDesign andbuild project success factors multivariate analysisrdquo Journal ofConstruction Engineering and Management vol 127 no 2 pp93ndash100 2001

[24] D J Lowe and J Parvar ldquoA logistic regression approachto modelling the contractorrsquos decision to bidrdquo ConstructionManagement and Economics vol 22 no 6 pp 643ndash653 2004

[25] S H Han D Y Kim and H Kim ldquoPredicting profit per-formance for selecting candidate international constructionprojectsrdquo Journal of Construction Engineering andManagementvol 133 no 6 pp 425ndash436 2007

[26] A S Bu-Qammaz I Dikmen andM T Birgonul ldquoRisk assess-ment of international construction projects using the analyticnetwork processrdquoCanadian Journal of Civil Engineering vol 36no 7 pp 1170ndash1181 2009

[27] T Hegazy and A Ayed ldquoNeural network model for parametriccost estimation of highway projectsrdquo Journal of ConstructionEngineering and Management vol 124 no 3 pp 210ndash218 1998

[28] RuleQuest Data mining tools 2008 httpwwwrule-questcom

[29] W Jung Three-Staged Risk Evaluation Model for Bidding onInternational Construction Projects Graduate School YonseiUniversity School of Civil and Environmental EngineeringSeoul Republic of Korea 2011

[30] Construction Industry Institute (CII) ldquoRisk assessment forinternational projectsrdquo Research Report 181-11 ConstructionIndustry Institute (CII) Austin Tex USA 2003

[31] G Young ldquoGlossary and discussion of termsrdquo in MalingeringFeigning and Response Bias in PsychiatricPsychological Injuryvol 56 of International Library of Ethics Law and the NewMedicine pp 777ndash795 Springer Dordrecht The Netherlands2014

[32] M Sahami S Dumais D Heckerman and E Horvitz ldquoABayesian approach to filtering junk e-mailrdquo in Learning for TextCategorization Papers from the 1998 Workshop vol 62 pp 98ndash105 1998

[33] I Androutsopoulos J Koutsias K V Chandrinos G Paliourasand C Spyropoulos ldquoAn evaluation of Naıve Bayesian anti-spam filteringrdquo in Proceedings of the Workshop on Machine

Learning in the New Information Age 11th European Conferenceon Machine Learning G Potamias V Moustakis and M vanSomeren Eds pp 9ndash17 Barcelona Spain 2000 httparxivorgpdfcs0006013pdf

[34] H Zhang ldquoThe optimality of naıve Bayesrdquo in Proceedings of the17th International Florida Artificial Intelligence Research SocietyConference (FLAIRS rsquo04) Miami Beach Fla USA May 2004

[35] K P Murphy Naive Bayes Classifiers University of BritishColumbia Vancouver Canada 2006

[36] R R Yager ldquoAn extension of the naive Bayesian classifierrdquoInformation Sciences vol 176 no 5 pp 577ndash588 2006

[37] D Lowd and P Domingos ldquoNaive Bayes models for probabilityestimationrdquo in Proceedings of the 22nd International Conferenceon Machine Learning (ICML rsquo05) pp 529ndash536 ACM BonnGermany August 2005

[38] B Zadrozny and C Elkan ldquoLearning and making decisionswhen costs and probabilities are both unknownrdquo in Proceedingsof the 7th ACMSIGKDD International Conference on KnowledgeDiscovery and Data Mining pp 204ndash213 ACM August 2001

[39] L Jiang H Zhang Z H Cai and D Wang ldquoWeighted averageof one-dependence estimatorsrdquo Journal of Experimental andTheoretical Artificial Intelligence vol 24 no 2 pp 219ndash230 2012

[40] J Wu and Z Cai ldquoA naive Bayes probability estimation modelbased on self-adaptive differential evolutionrdquo Journal of Intelli-gent Information Systems vol 42 no 3 pp 671ndash694 2014

[41] W S Halawa A M K Abdelalim and I A ElrashedldquoFinancial evaluation program for construction projects at thepre-investment phase in developing countries a case studyrdquoInternational Journal of Project Management vol 31 no 6 pp912ndash923 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 5: Research Article ve Bayesian Classifier for Selecting Good ...downloads.hindawi.com/journals/mpe/2015/830781.pdf · Na \ve Bayesian Classifier for Selecting Good/Bad Projects during

Mathematical Problems in Engineering 5

Table 1 Continued

Category Risk code Risk attribute Cronbachrsquos 120572 value

Construction performance risks

R35 Lacking design experience

0917

R36 Lacking bid estimation experienceR37 Unsatisfactory cash flow management capabilities

R38 Unsatisfactory site manager leadership and organizationmanagement capabilities

R39 Unsatisfactory site manager experience relative to similarconstruction projects

R40 Unsatisfactory international project planning andmanagement capabilities

R41 Unsatisfactory quality management capabilitiesR42 Unsatisfactory resource management capabilitiesR43 Unsatisfactory conflict management capabilitiesR44 Unsatisfactory contract management capabilitiesR45 Lacking connectivity between sites and headquartersR46 Unsatisfactory localizationR47 Lacking trust between project participantsR48 Poor communication between project participantsR49 Poor project management capabilities based on ITR50 Inadequate construction methods for the specific projectR51 Insufficient subcontractor project performance

first tested for reliability stability consistency predictabil-ity accuracy and dependency [31] Cronbachrsquos coefficientalpha internal consistency method was employed and resultsranging from 08 to 09 were considered reliable The PASWStatistics 21 program was used to analyze the data As Table 1shows missing values in the raw data were excluded fromthe reliability test to increase reliability As a result a total ofseven categories with 51 attributes were selected an averagereliability of 0843 was obtained indicating a high level ofreliability A total of 121 datasets for the 140 projects wereused to develop the Naıve Bayesian classifier indicating thatapproximately 20 of the responses were excluded due todata inconsistency

4 Nave Bayesian Classifier

A Naıve Bayesian classifier based on Bayesrsquo theorem isemployed to derive a probabilistic classification model frompreviously assessed attributes It is most commonly used forantispam mail filtering which trains the filter to automati-cally separate spam mail and legitimate messages in a binarymanner [32 33] In doing so each attribute correspondswith words included in the spam and legitimate mails thusdistinguishing between the two in turn the former areblocked by a trained filter

41 Conceptual Structure of a Proposed Model Because theobjective of this research is to screen bad or good projectsin the early stages of international construction projects theresult of the proposed model should also be binary (good orbad) Thus the authors transformed the original conditions

of each risk attribute (7-point Likert scale) into a binaryscale to develop the Naıve Bayesian model In additionthe early stages of the construction possess a high level ofuncertainties which prohibit analysis of exact ranges of profitrates or detailed numeric classification of project conditionsAccordingly a specific profit level or numeric number ofeach risk attribute is not essential particularly during thisearly phase To this end authors consider the unity of thebinary distribution of the dependent variable as well as thepreassessed risk attributes to improve its practical application

Thebenefits of employing theNaıve Bayesian classifier arethe following [34ndash36] First the model is less sensitive to out-liers Generally when an outlier exists in the data the resultsmay skew The reduction of outlier effects is thus requiredfor data analysis Because the Naıve Bayesian classifier usesa probabilistic distribution it enables minimizing the outliereffects of the original data and prediction results are lesssensitive to outliers Second less data is required to predictthe parameter By using the utility function relearning theprobability of a given conditional probability is not strictlyrequired in the Naıve Bayesian classifier thus minimizingdata collection efforts for risk prediction Third the classifieris more applicable to the industry practitioners by reducingthe burden of data collection Although the model utilizessimple assumptions and algorithms the classification resultsare powerful even though the specified assumptions maycontain errors Finally themodels allow for the considerationof additional attributes by combining existing characteris-tics of the attributes thus increasing the accuracy of themodelrsquos predictions Based on the aforementioned benefitsof the Naıve Bayesian method the authors have chosen this

6 Mathematical Problems in Engineering

RisknYes

No

Risk2Yes

No

Good

Bad

Risk1Yes

No

Real data profitability

P(r1 = yes | good)

Figure 3 Conceptual model of the Naıve Bayesian classifier

approach as themainmechanism for the proposed predictionmodel

To structure the Naıve Bayesian classifier we consideredthe unity of the binary distribution of the dependent vari-ables As mentioned earlier the effect of each risk attributeon the level of profit is assessed by using a 7-point Likertscale (minus2 to 4) to improve themodelrsquos practical application Inthis study a binary approach is used to identify the presenceof risk and final project profitability is predicted via theNaıve Bayesian classification Because this research is focusedon the early stages of the construction project screeningand recognizing a bad project are the priority rather thananalyzing the specific value of the project Therefore theassessment range from minus2 to 1 correlated with absence of riskor a ldquoNordquo answer and a score from 2 to 4 was correlated withthe presence of risk or a ldquoYesrdquo answer as shown in Figure 3

As the assessment approach is fairly simple users per-form fewer risk assessment tasks increasing the accuracy ofpredictions The benefit of a Naıve Bayesian classifier is thefact that all risk probabilities are assumed to be independentaccording to Murphy [35] although these assumptions maynot be correct developing a model using the Naıve Bayesianclassifier is feasible and generates highly reliable results forclassification compared to other methodologies

Equation (1) presents the classification group 119888 set (iegood or bad project) where 119903

1 119903

2 119903

3 119903

119863are vectors

ascribed to each independent risk variable in the NaıveBayesian classifier This categorization of attributes maxi-mizes the probability of each classification result Consider

119875 = (119888 | 119903

1 1199032 119903119863) (1)

Equation (2) shows the prediction mode that applies (1) toBayesrsquo theorem Consider

119875 (119888) times 119875 (1199031 1199032 119903119863 | 119888)

119875 (1199031 1199032 119903119863) (2)

As discussed above the Naıve Bayesian classifier assumesthat each attribute divided into each classification is indepen-dent thus (3) can be derived accordingly where V is the targetvalue of the group 119888 set (ie the profit rate) Consider

119875 (1199032 | 119888 1199031) = 119875 (1199032 | 119888)

119875 (119903 | V= 119888) =119863

prod

119894=1119875 (119903

119894| V119895= 119888)

(3)

Hence the Naıve Bayesian classifier is considered a suit-able method that uses a relatively small amount of historicaldata to determine whether projects are likely to be bador good The next section describes the proposed modelwhich uses theNaıve Bayesian classifier to examine the actualproject data collected previously

42 Naıve Bayesian Project Classification Model In devel-oping the Naıve Bayesian model the aforementioned riskattributes were used as relevant independent variables forpredicting good or bad projects An overview of the modelis presented in Figure 4

The dependent variable was predicted over two or threeclassification intervals and the two models are presentedaccordingly The model with two intervals is based on aprofitability of less than 0 for a bad project and morethan 0 for a good project The model with three intervalsrepresents projects with less than 0 profitabilitymdashthosegenerating a 0 to 45 rate of returnmdashand a profit rateabove 45 which is considered a good project A marginalprofit rate of 45 was chosen because the average profitrate for good projects out of the total sample of projectswas calculated at this range Because projects with negativeprofitability are all considered ldquobadrdquo ldquomoderaterdquo and ldquogoodrdquoprojects are differentiated based on the reference profit valueof 45 Table 2 represents intervals divided based on theresults

Using the training dataset each risk attribute is catego-rized into a specific class to develop the project-screeningmodelTherefore the conditional probability of each attribute

Mathematical Problems in Engineering 7

[Input]Profitability assessment

(i) Define the attributes for risk management (51 risk attributes)

(ii) Assess the impact on profitability (Likert scale)

[Function]Classification and analysis

(i) Determine the number of classification intervals (Naiumlve Bayesian classifier )

(ii) Selection of analysis methodrisk assessment

[Output]

Profitability assessment result

Figure 4 Overview of project classification model

Table 2 Classification of profitability

Bad project Moderate project Good project2 classificationintervals Below 0 mdash Above 0

3 classificationintervals Below 0 0ndash45 Above 45

given to the bad project in advance is shown in the followingequation

119875 (119903

119894| V119895) =

119899

119888

119899

(4)

119875 is estimated conditional probability 119903119894is risk attributes 119894

V119895is the condition of profit rate on Bad or Good based on the

training data (V=below0) 119899119888is number of bad projects under

the specific risk 119886119894in the training data (V = V

119895and 119903 = 119903

119894) and

119899 is number of bad projects in the training data (V = V119895)

However whereas this study examines 140 units of datathe number of training data is insufficient relative to thenumber of risk attributes Unbalanced datasets also lead to aninappropriate result in the statistical analysis [37] It is thusnecessary to augment the small datasets in the training setwhich decreases the accuracy of themodel To overcome thisthe authors borrowed the concept of the 119898-estimate from[38ndash40] By setting the value of a suitable ldquo119898rdquo number theshifting of probability toward parameter ldquo119901rdquo is controlledthis also further increases model accuracy as shown in thefollowing equation

119875

1015840(119903

119894| V119895) =

119899

119888+ 119898119901

119899 + 119898

(5)

119875

1015840 is estimated conditional probability using 119898-estimate 119898is equivalent data size and 119901 is prior probability based ontraining data (4)

Zadrozny and Elkan [38] recommend that the value of119898119901 be set to less than 10 thus using a probabilistic equationto cross-validate 140 data units 70 units are set as trainingdata and the remaining 70 units of119898 parameter are set to anequivalent data size to develop the project screening modelIn turn parameter 119901 becomes 1070 or 17 for (5) Of the 140data units collected 20 units were therefore classified as badprojects and 120 were deemed good projects Therefore thetwo datasetsmdashtraining and equivalent data sizemdashconsist of 10bad projects and 60 good projects that are chosen via randomselection

Five different datasets are created randomly to developand validate the model in view of cross validation Randomdata are selected using Microsoft Excel which generates datain a range from 0 to 1 for random probability In addition forthe training dataset the actual risk exposure level for eachattribute in a given project is used to develop the model Inthis paper only one set of training data results out of fivedifferent datasets is presented in detail as the other resultswere found to be almost equivalent

The Naıve Bayesian classifier consists of an 119898-estimatorparameter which gives a random value of 119898 to controlthe classification bias This study uses 119898 value from theldquomoderaterdquo interval of project profitability which had thehighest error percentage for further validation According tothe results of the first analysis119898 value in the Naıve Bayesianclassifier was reduced in the second analysis by modifying119898 value the moderate profitability project produces a morestable result In addition the weights given to the bad projectsincreased which results in both increased stability of theclassification and reduction of bad projectrsquos classificationerror Therefore the analysis of the moderate classificationresults was reanalyzed To identify a suitable 119898 value asensitivity analysis was also performed for the changes as 119898value decreases

8 Mathematical Problems in Engineering

Table 3 Part of the conditional probability of Naıve Bayesianclassifier model

Category Risk code 1198751015840(119903119894| BAD) 1198751015840(119903

119894| GOOD)

Political and socialrisks

R1 35 234R3 15 56

Economic andinflation risks

R5 15 89R7 20 242

Bidding and contractrisk

R15 10 274R19 15 153

Procurement risks R22 30 362R24 5 185

Geographic risks R29 40 226

By using the derived model using 119898-estimate each riskfactor is calculated for the conditional probability for Badand Good The two classification intervals model is usedto analyze each risk factor Table 3 shows the conditionalprobability of the key risk factors from each category thathas a severely negative impact (assessed as 3 or 4 points)on the total sample data (140 projects) For example theR24 risk had an 185 probability of occurring in a ldquoGoodrdquoproject and a 5 probability of occurring in a ldquoBadrdquo projectTo summarize Table 3 external (politic or economic) factorsmore frequently appear in the ldquobadrdquo projects while internal(bidding or procurement) factors more frequently occurr inthe ldquogoodrdquo projects

5 Model Analysis and Validation

51 Model Validation In employing the classifier model twodatasets (training and sample test datasets) for 70 projectswere used to compare actual and predicted risk impacts onthe dependent variable Actual risk effects in a given projectwere used to determine the profitability of the 70 projects Inaddition to evaluating the performance level of themodel themodelrsquos usability and accuracy were tested using a confusionmatrix of Bayesrsquo theorem To determine the accuracy ofthe model the dependent variable profitability included asample test dataset of 70 projects for comparison with theactual project performance and predicted performance ofa given project The 2 and 3 classification intervals modelswere also tested by classification into bad (PB PredictedBad) moderate (PM Predicted Moderate) and good (PGPredicted Good) categories the training dataset for the 70projects also employs the actual profit level which is classifiedas bad (AB Actual Bad) normal (AM Actual Moderate) orgood (AG Actual Good) Tables 4 and 5 present the results ofthe two models

The aforementioned fivemodels were randomly tested forcross validation In the test models a dataset of 70 projectswas randomly selected to determine the modelrsquos accuracy Asan initial measure each project was deemed bad or goodAs shown in Table 4 the overall correct level is found to be797 by dividing the bold numbers with the total number of

Table 4 Validation of profitability at 2 classification intervals

Model 1 out of fivecross validations

Classification levelBad (PB) Good (PG) Total

Actual LevelBad (AB) 5 5 10Good (AG) 9 51 60Total 14 56 70

Table 5 Validation of profitability at 3 classification intervals

Model 1 out of fivecross validations

Classification levelTotalBad

(PB)Moderate(PM)

Good(PG)

Actual LevelBad (AB) 3 5 2 10Moderate (AM) 3 16 13 32Good (AG) 3 13 12 28Total 9 34 27 70

samples Other cross validation models also showed highlyaccurate levels of 786 771 827 and 80 respectively

In contrast the model with three classified intervalspresented relatively low levels of 443 429 514 486and 500 with a mean percentage of 448 reflectingless accurate results (refer to Table 5) Models with 2 and3 classification intervals showed standard deviations of 207and 307 respectively this suggests the presence of a fewdifferences and consistent results between the two modelsThe 2 classification intervals model is considered an excellentvehicle for screening purposes as it presents highly accurateresults and less deviation Although the 3 classificationintervals model shows a relatively low level of accuracy ina real business environment the users also may apply the 3classification intervals model for screening out a moderaterange of projects by sacrificing the prediction accuracy

Additional analysis to validate the Naıve Bayesian classi-fication model is performed through comparing with linearregression model using SPSS 21 The authors used the samebinary method of dividing the profit rate into good andbad (2 classification intervals) By utilizing the same 140projectsrsquo units of data that is composed of 70 training unitsof data and 70 test units of data the accuracy result of linearregression model is found to be 522 This result shows asignificantly lower accuracy than the 2 classification intervalsNaıve Bayesian model (797) and slightly higher accuracythan the 3 classification intervals model (448)

In summary this study proposed two types of modelsdepending on the userrsquos need and degree of risk exposureThemodelswere validated through comparing the accuracy of theprediction result with other learning methods In additionthe user can flexibly select the proper model that best suitedto a given firm whether he or she requires an accurateclassification between good and bad project or is contend toidentify a moderate or good project

Mathematical Problems in Engineering 9

Profitability assessment

Method

Category Risk code Risk attributes Risk

Countryrsquos politicaland social risk

R1 Countryrsquos corruption collusion and illegal transaction O

R2 Culture customs and tradition difference X

R3 Policy and postlegislative changes X

R4 Countryrsquos overall legal systemrsquos maturity X

Countryrsquos economic and inflation risks

R5 Average material unit price change X

R6 Average equipment unit price change X

R7 Local currency exchange rate change X

Stakeholder risks

R8 Stakeholderrsquos understanding and management ability O

R9 Design error provided to stakeholder O

R10 Clarity of specifications and bidding documents provided to stakeholder X

R11 Stakeholderrsquos administrative and construction consent delay X

R12 Requirements regardless of contract from stakeholder X

Bidding and contract terms

R13 Bidding and quotation preparation time for construction X

R14 Contract of construction duration for construction X

R15 Required degree of local contents X

R16 Tax and tariff regulations X

R17 Inflation compensation conditions compared to international contract X

R18 Warranty-related conditions compared to international contract X

R19 Payment regulations compared to international contract X

R20 Dispute resolution procedures regulation compared to international contract X

1 Naiumlve Bayesian classifier

Figure 5 Excel-based risk assessments

52 Excel-Based Model Development Based on the afore-mentioned basic algorithm the authors developed an Excel-based system that serves as a continuous risk managementplatform for project screening for the international con-struction sector To initiate project screening risk attributesrelated to a given project are assessed for their impacton project profit depending on the amount of informationacquired As shown in Figure 5 system inputs for riskassessment are based on two scenarios the specified riskeither affects the project or does not In addition usersmay select their model depending on their objectives themore accurate Naıve Bayesian classifier with a higher projectclassification rate (2 classification intervals model) or the lessaccurate model with amore conservative classification rate (3classification intervals model)

The authors present a sample case in which the developedprogram is utilized to test the project screening modelthrough the user interface This case considers a Koreancontractor working on a harbor expansion project that lasts22 months Figure 5 shows step 1 which is the actual user

interface of the programgeneratedwith case study data inputsfor goodbad project screening As shown in Figure 5 theuser first evaluates the risk attribute using ldquoOrdquo for ldquoriskexistsrdquo indicating the presence of a given risk and ldquoXrdquo forldquorisk does not existrdquo indicating an absence of project riskTheuser then selects a model that is the number of classificationintervals Depending on the model selected the classificationmodel categorizes projects as bad moderate or good Themodel classifies the good projects as ranging from 0 to 45of the profit rate Project screening is conducted by selectingprofit interval ranges depending on firm risk behaviorModelresults are obtained from the training sample via randomselection through which classification results are selectedfrom the highest predicted value from 100 prediction cyclesThis process increases reliability and the results are shown inpercentage form for ease of comprehension

Figure 6 shows the procedural steps in the process of theExcel-based model Risk assessment is performed primarilyin step 1 Then in step 2 the model counts for the riskexistence based on the historical dataset of 140 real projects

10 Mathematical Problems in Engineering

R1 R2 R3 R4 R5 R6 R7 R8 R9 R10

Bad

10 10 10 10 10 10 10 10 10 10

1 1 0 0 2 3 1 0 0 0

0142857143 0142857143 0142857 0142857 0142857 0142857 0142857 0142857 0142857 0142857

001 001 001 001 001 001 001 001 001 001

01000 01000 00001 00001 01999 02998 01000 00001 00001 00001

Select

Result 1 1 1 1 1 1 1 1 1 1

0142857143

Moderate

31 31 31 31 31 31 31 31 31 31

0 0 0 0 0 0 0 0 0 0

0442857143 0442857143 0442857 0442857 0442857 0442857 0442857 0442857 0442857 0442857

001 001 001 001 001 001 001 001 001 001

0000142811 0000142811 0000143 0000143 0000143 0000143 0000143 0000143 0000143 0000143

Select 0 0 0 0 0 0 0 0 0 0

Result 1 1 1 1 1 1 1 1 1 1

0442857143

Good

29 29 29 29 29 29 29 29 29 29

17 10 15 12 29 12 10 19 14 9

0414285714 0414285714 0414286 0414286 0414286 0414286 0414286 0414286 0414286 0414286

001 001 001 001 001 001 001 001 001 001

0586147634 0344851529 0517206 0413793 0999798 0413793 0344852 0655089 0482735 0310381

Select 0 0 0 0 0 0 0 0 0 0

Result 1 1 1 1 1 1 1 1 1 1

0414285714

Result table Bad Good Moderate Final result Good

Max

Profitability assessment result

Project screening result Good project

Top 5 risks to be managed Probability

1 Average material unit price change 0999

2 Countryrsquos local subcontractorrsquos procurement conditions 0704

3 Complaints from construction during the project 0655

4 Stakeholderrsquos administrative and construction consent delay 0586

5 Clarity of specifications and bidding documents provided to stakeholders 0555

4

5

3

2

n

nc

p

m

p (good)

17238E minus 35

69489E minus 07

719685E minus 4769489E minus 07

n

nc

p

m

p (normal)

n

nc

p

m

p (bad)

p (ai | bad)

p (ai | normal)

p (ai | good)

Figure 6 Process for assessment result

Through the sum of its risk existence the conditional proba-bility is calculated given each classification of a project (GoodModerate or Bad) separately

Step 3 shows the result table for each classification thathas been selected from the profitability assessmentThis is thestep in which users are to perform the risk assessment fromFigure 6 where selected risks are calculated for conditionalprobability The result is calculated from the product ofeach probability from the 51 risks attributes Step 4 thenshows the final result by selecting the highest probabilityfrom each classification In the illustrative example ldquoGoodrdquohad the highest probability In addition as shown in step 5in the user interface the risks that most directly affectproject profitability can be prioritized which should be

continuouslymonitored and controlled by the userThese fiverisk attributes serve as the basis of a risk response strategy

The proposed model prioritizes the top 5 risks that affectoverall construction profitability According to Halawa et al[41] construction projects are dramatically influenced bythis financial evaluation and awareness of the financial statusimproves it offers a better chance of determining project fea-sibility Therefore this study incorporates the Naıve Bayesianclassifier with an Excel-based model to give practitioners aquantitative analysis result (statistical references) and quali-tative analysis results (the top risk attributes to be managed)It should also be noted that although the result of the modelshows an absolute value of bad moderate or good it is upto the managers and decision-makers to make a bad project

Mathematical Problems in Engineering 11

a moderate project or even good project by careful andproactive risk management as presented in the illustrativecase application

6 Conclusions

This study predicted the profitability of early-stage inter-national construction projects and identified good projectsthat may affect firmrsquos financial stability The current stateof the international construction market was first analyzedlosses were shown to account for 73 of the entire marketreflecting the importance of predicting project profitability inadvance Risk attributes that affect project profitability wereidentified and a risk management system was developed tomanage prioritize and monitor higher impact risks TheNaıve Bayesian classifier method was used to improve thepractical application of the project-screening model Whenrisk attribute assessments for determining risk exposurelevels are completed assessed risk exposure is classified intoa binary parameter that presents an approach to riskmanage-ment that is supported by a probabilistic methodology

This study used case study data to create a model usingMicrosoft Excel Depending on the Naıve Bayesian classifier119898-estimator value of the weight application and the userrsquosrisk behaviors users can choose the model that best suitsa given firm The results show that when a 70-unit trainingdataset and 70-unit sample dataset area are used (a 140-unitset in total) to cross-validate the Naıve Bayesian classifier themodel has on average a 797 prediction accuracy for the 2classification intervals model and a 448 accuracy for the 3classification intervals model Additional analysis to validatethe model is performed through linear regression analysisto show the higher precision result from the Naıve Bayesianclassification model

For project screening conditional probability is usedwithinformation acquired during early project stages Althoughrisk attributes were largely derived in the model riskattributes that more clearly affect projects can be furtherextracted elucidating the effects of bad projects

As for the limitations of this study analytical results forthe application of the developed program to a more diversescenario will be essential for example different types ofconstruction projects may be examined for superior resultsIn addition if contract types were categorized as contractorsubcontractor and joint ventures this may have altered theprofitability of each project A new-entrymarket scenariowillalso be analyzed separately due to the unique characteristicsof newmarkets Given a range of unique construction projectqualities a more realistic and practical application can beobtained in future research Furthermore if additional dataon both project risk assessment and the actual reliable profitrates are accumulated a diverse segmentation of the intervalscan be obtained to examine each project in a precisemeasure

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgment

This research was supported by a grant (14IFIP-B089065-01)funded by Ministry of Land Infrastructure and Transport ofKorean government

References

[1] Global Insight Global Construction Outlook Global InsightLexington Mass USA 2014

[2] International Contractors Association of Korea (ICAK) Inter-national Construction Information Service Construction Statis-tics 2013 httpwwwicakorkrstasta 0101php

[3] Samsung Engineering Business Results ReportThird Quarter of2013 Samsung Engineering Seoul Republic of Korea 2013

[4] M Abdelgawad and A R Fayek ldquoRisk management in theconstruction industry using combined fuzzy FMEA and fuzzyAHPrdquo Journal of Construction Engineering and Managementvol 136 no 9 pp 1028ndash1036 2010

[5] G Aydogan and A Koksal ldquoAn analysis of internationalconstruction risk factors on partner selection by applying ANPapproachrdquo in Proceedings of the International Conference onConstruction and Real Estate Management (ICCREM rsquo13) pp658ndash669 Karlsruhe Germany October 2013

[6] B-G Hwang X Zhao and L P Toh ldquoRisk management insmall construction projects in Singapore status barriers andimpactrdquo International Journal of Project Management vol 32no 1 pp 116ndash124 2014

[7] A E Yildiz I Dikmen M T Birgonul K Ercoskun andS Alten ldquoA knowledge-based risk mapping tool for costestimation of international construction projectsrdquo Automationin Construction vol 43 pp 144ndash155 2014

[8] O Taylan AO Bafail RM S Abdulaal andMRKabli ldquoCon-struction projects selection and risk assessment by fuzzy AHPand fuzzy TOPSISmethodologiesrdquoApplied Soft Computing vol17 pp 105ndash116 2014

[9] S Mu H Cheng M Chohr and W Peng ldquoAssessing riskmanagement capability of contractors in subway projects inmainland Chinardquo International Journal of Project Managementvol 32 no 3 pp 452ndash460 2014

[10] F Acebes J Pajares J M Galan and A Lopez-Paredes ldquoA newapproach for project control under uncertainty Going back tothe basicsrdquo International Journal of ProjectManagement vol 32no 3 pp 423ndash434 2014

[11] Y Zhang and Z-P Fan ldquoAn optimization method for selectingproject risk response strategiesrdquo International Journal of ProjectManagement vol 32 no 3 pp 412ndash422 2014

[12] M Rohaninejad andM Bagherpour ldquoApplication of risk analy-sis within value management a case study in dam engineeringrdquoJournal of Civil Engineering and Management vol 19 no 3 pp364ndash374 2013

[13] Q Yang and Y Zhang ldquoApplication research of AHP in projectrisk managementrdquo Advanced Materials Research vol 785-786pp 1480ndash1483 2013

[14] O Kaplinski ldquoThe utility theory in maintenance and repairstrategyrdquo Procedia Engineering vol 54 pp 604ndash614 2013

[15] M Hastak and A Shaked ldquoICRAM-1 model for internationalconstruction risk assessmentrdquo Journal of Management in Engi-neering vol 16 no 1 pp 59ndash69 2000

[16] S H Han W Jung D Y Kim and H Park ldquoA study for theimprovement of international construction risk managementrdquo

12 Mathematical Problems in Engineering

Conference of Korean Society of Civil Engineering (KSCE) no 10pp 791ndash794 2009

[17] V Kumar and N Viswanadham ldquoA CBR-based decision sup-port system framework for construction supply chain riskmanagementrdquo in Proceedings of the 3rd IEEE InternationalConference on Automation Science and Engineering (CASE rsquo07)pp 980ndash985 IEEE Scottsdale Ariz USA September 2007

[18] B Yang L X Li H Ji and J Xu ldquoAn early warning systemfor loan risk assessment using artificial neural networksrdquoKnowledge-Based Systems vol 14 no 5-6 pp 303ndash306 2001

[19] P Sanchez ldquoNeural-risk assessment system for constructionprojectsrdquo in Proceedings of the Construction Research Congresspp 1ndash11 San Diego Calif USA April 2005

[20] B R Fortunato IIIM RHallowellM Behm andKDewlaneyldquoIdentification of safety risks for high-performance sustainableconstruction projectsrdquo Journal of Construction Engineering andManagement vol 138 no 4 pp 499ndash508 2012

[21] J Li and P X W Zou ldquoFuzzy AHP-based risk assessmentmethodology for PPP projectsrdquo Journal of Construction Engi-neering and Management vol 137 no 12 pp 1205ndash1209 2011

[22] A del Cano and M P de la Cruz ldquoIntegrated methodology forproject risk managementrdquo Journal of Construction Engineeringand Management vol 128 no 6 pp 473ndash485 2002

[23] A P C Chan D C K Ho and C M Tam ldquoDesign andbuild project success factors multivariate analysisrdquo Journal ofConstruction Engineering and Management vol 127 no 2 pp93ndash100 2001

[24] D J Lowe and J Parvar ldquoA logistic regression approachto modelling the contractorrsquos decision to bidrdquo ConstructionManagement and Economics vol 22 no 6 pp 643ndash653 2004

[25] S H Han D Y Kim and H Kim ldquoPredicting profit per-formance for selecting candidate international constructionprojectsrdquo Journal of Construction Engineering andManagementvol 133 no 6 pp 425ndash436 2007

[26] A S Bu-Qammaz I Dikmen andM T Birgonul ldquoRisk assess-ment of international construction projects using the analyticnetwork processrdquoCanadian Journal of Civil Engineering vol 36no 7 pp 1170ndash1181 2009

[27] T Hegazy and A Ayed ldquoNeural network model for parametriccost estimation of highway projectsrdquo Journal of ConstructionEngineering and Management vol 124 no 3 pp 210ndash218 1998

[28] RuleQuest Data mining tools 2008 httpwwwrule-questcom

[29] W Jung Three-Staged Risk Evaluation Model for Bidding onInternational Construction Projects Graduate School YonseiUniversity School of Civil and Environmental EngineeringSeoul Republic of Korea 2011

[30] Construction Industry Institute (CII) ldquoRisk assessment forinternational projectsrdquo Research Report 181-11 ConstructionIndustry Institute (CII) Austin Tex USA 2003

[31] G Young ldquoGlossary and discussion of termsrdquo in MalingeringFeigning and Response Bias in PsychiatricPsychological Injuryvol 56 of International Library of Ethics Law and the NewMedicine pp 777ndash795 Springer Dordrecht The Netherlands2014

[32] M Sahami S Dumais D Heckerman and E Horvitz ldquoABayesian approach to filtering junk e-mailrdquo in Learning for TextCategorization Papers from the 1998 Workshop vol 62 pp 98ndash105 1998

[33] I Androutsopoulos J Koutsias K V Chandrinos G Paliourasand C Spyropoulos ldquoAn evaluation of Naıve Bayesian anti-spam filteringrdquo in Proceedings of the Workshop on Machine

Learning in the New Information Age 11th European Conferenceon Machine Learning G Potamias V Moustakis and M vanSomeren Eds pp 9ndash17 Barcelona Spain 2000 httparxivorgpdfcs0006013pdf

[34] H Zhang ldquoThe optimality of naıve Bayesrdquo in Proceedings of the17th International Florida Artificial Intelligence Research SocietyConference (FLAIRS rsquo04) Miami Beach Fla USA May 2004

[35] K P Murphy Naive Bayes Classifiers University of BritishColumbia Vancouver Canada 2006

[36] R R Yager ldquoAn extension of the naive Bayesian classifierrdquoInformation Sciences vol 176 no 5 pp 577ndash588 2006

[37] D Lowd and P Domingos ldquoNaive Bayes models for probabilityestimationrdquo in Proceedings of the 22nd International Conferenceon Machine Learning (ICML rsquo05) pp 529ndash536 ACM BonnGermany August 2005

[38] B Zadrozny and C Elkan ldquoLearning and making decisionswhen costs and probabilities are both unknownrdquo in Proceedingsof the 7th ACMSIGKDD International Conference on KnowledgeDiscovery and Data Mining pp 204ndash213 ACM August 2001

[39] L Jiang H Zhang Z H Cai and D Wang ldquoWeighted averageof one-dependence estimatorsrdquo Journal of Experimental andTheoretical Artificial Intelligence vol 24 no 2 pp 219ndash230 2012

[40] J Wu and Z Cai ldquoA naive Bayes probability estimation modelbased on self-adaptive differential evolutionrdquo Journal of Intelli-gent Information Systems vol 42 no 3 pp 671ndash694 2014

[41] W S Halawa A M K Abdelalim and I A ElrashedldquoFinancial evaluation program for construction projects at thepre-investment phase in developing countries a case studyrdquoInternational Journal of Project Management vol 31 no 6 pp912ndash923 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 6: Research Article ve Bayesian Classifier for Selecting Good ...downloads.hindawi.com/journals/mpe/2015/830781.pdf · Na \ve Bayesian Classifier for Selecting Good/Bad Projects during

6 Mathematical Problems in Engineering

RisknYes

No

Risk2Yes

No

Good

Bad

Risk1Yes

No

Real data profitability

P(r1 = yes | good)

Figure 3 Conceptual model of the Naıve Bayesian classifier

approach as themainmechanism for the proposed predictionmodel

To structure the Naıve Bayesian classifier we consideredthe unity of the binary distribution of the dependent vari-ables As mentioned earlier the effect of each risk attributeon the level of profit is assessed by using a 7-point Likertscale (minus2 to 4) to improve themodelrsquos practical application Inthis study a binary approach is used to identify the presenceof risk and final project profitability is predicted via theNaıve Bayesian classification Because this research is focusedon the early stages of the construction project screeningand recognizing a bad project are the priority rather thananalyzing the specific value of the project Therefore theassessment range from minus2 to 1 correlated with absence of riskor a ldquoNordquo answer and a score from 2 to 4 was correlated withthe presence of risk or a ldquoYesrdquo answer as shown in Figure 3

As the assessment approach is fairly simple users per-form fewer risk assessment tasks increasing the accuracy ofpredictions The benefit of a Naıve Bayesian classifier is thefact that all risk probabilities are assumed to be independentaccording to Murphy [35] although these assumptions maynot be correct developing a model using the Naıve Bayesianclassifier is feasible and generates highly reliable results forclassification compared to other methodologies

Equation (1) presents the classification group 119888 set (iegood or bad project) where 119903

1 119903

2 119903

3 119903

119863are vectors

ascribed to each independent risk variable in the NaıveBayesian classifier This categorization of attributes maxi-mizes the probability of each classification result Consider

119875 = (119888 | 119903

1 1199032 119903119863) (1)

Equation (2) shows the prediction mode that applies (1) toBayesrsquo theorem Consider

119875 (119888) times 119875 (1199031 1199032 119903119863 | 119888)

119875 (1199031 1199032 119903119863) (2)

As discussed above the Naıve Bayesian classifier assumesthat each attribute divided into each classification is indepen-dent thus (3) can be derived accordingly where V is the targetvalue of the group 119888 set (ie the profit rate) Consider

119875 (1199032 | 119888 1199031) = 119875 (1199032 | 119888)

119875 (119903 | V= 119888) =119863

prod

119894=1119875 (119903

119894| V119895= 119888)

(3)

Hence the Naıve Bayesian classifier is considered a suit-able method that uses a relatively small amount of historicaldata to determine whether projects are likely to be bador good The next section describes the proposed modelwhich uses theNaıve Bayesian classifier to examine the actualproject data collected previously

42 Naıve Bayesian Project Classification Model In devel-oping the Naıve Bayesian model the aforementioned riskattributes were used as relevant independent variables forpredicting good or bad projects An overview of the modelis presented in Figure 4

The dependent variable was predicted over two or threeclassification intervals and the two models are presentedaccordingly The model with two intervals is based on aprofitability of less than 0 for a bad project and morethan 0 for a good project The model with three intervalsrepresents projects with less than 0 profitabilitymdashthosegenerating a 0 to 45 rate of returnmdashand a profit rateabove 45 which is considered a good project A marginalprofit rate of 45 was chosen because the average profitrate for good projects out of the total sample of projectswas calculated at this range Because projects with negativeprofitability are all considered ldquobadrdquo ldquomoderaterdquo and ldquogoodrdquoprojects are differentiated based on the reference profit valueof 45 Table 2 represents intervals divided based on theresults

Using the training dataset each risk attribute is catego-rized into a specific class to develop the project-screeningmodelTherefore the conditional probability of each attribute

Mathematical Problems in Engineering 7

[Input]Profitability assessment

(i) Define the attributes for risk management (51 risk attributes)

(ii) Assess the impact on profitability (Likert scale)

[Function]Classification and analysis

(i) Determine the number of classification intervals (Naiumlve Bayesian classifier )

(ii) Selection of analysis methodrisk assessment

[Output]

Profitability assessment result

Figure 4 Overview of project classification model

Table 2 Classification of profitability

Bad project Moderate project Good project2 classificationintervals Below 0 mdash Above 0

3 classificationintervals Below 0 0ndash45 Above 45

given to the bad project in advance is shown in the followingequation

119875 (119903

119894| V119895) =

119899

119888

119899

(4)

119875 is estimated conditional probability 119903119894is risk attributes 119894

V119895is the condition of profit rate on Bad or Good based on the

training data (V=below0) 119899119888is number of bad projects under

the specific risk 119886119894in the training data (V = V

119895and 119903 = 119903

119894) and

119899 is number of bad projects in the training data (V = V119895)

However whereas this study examines 140 units of datathe number of training data is insufficient relative to thenumber of risk attributes Unbalanced datasets also lead to aninappropriate result in the statistical analysis [37] It is thusnecessary to augment the small datasets in the training setwhich decreases the accuracy of themodel To overcome thisthe authors borrowed the concept of the 119898-estimate from[38ndash40] By setting the value of a suitable ldquo119898rdquo number theshifting of probability toward parameter ldquo119901rdquo is controlledthis also further increases model accuracy as shown in thefollowing equation

119875

1015840(119903

119894| V119895) =

119899

119888+ 119898119901

119899 + 119898

(5)

119875

1015840 is estimated conditional probability using 119898-estimate 119898is equivalent data size and 119901 is prior probability based ontraining data (4)

Zadrozny and Elkan [38] recommend that the value of119898119901 be set to less than 10 thus using a probabilistic equationto cross-validate 140 data units 70 units are set as trainingdata and the remaining 70 units of119898 parameter are set to anequivalent data size to develop the project screening modelIn turn parameter 119901 becomes 1070 or 17 for (5) Of the 140data units collected 20 units were therefore classified as badprojects and 120 were deemed good projects Therefore thetwo datasetsmdashtraining and equivalent data sizemdashconsist of 10bad projects and 60 good projects that are chosen via randomselection

Five different datasets are created randomly to developand validate the model in view of cross validation Randomdata are selected using Microsoft Excel which generates datain a range from 0 to 1 for random probability In addition forthe training dataset the actual risk exposure level for eachattribute in a given project is used to develop the model Inthis paper only one set of training data results out of fivedifferent datasets is presented in detail as the other resultswere found to be almost equivalent

The Naıve Bayesian classifier consists of an 119898-estimatorparameter which gives a random value of 119898 to controlthe classification bias This study uses 119898 value from theldquomoderaterdquo interval of project profitability which had thehighest error percentage for further validation According tothe results of the first analysis119898 value in the Naıve Bayesianclassifier was reduced in the second analysis by modifying119898 value the moderate profitability project produces a morestable result In addition the weights given to the bad projectsincreased which results in both increased stability of theclassification and reduction of bad projectrsquos classificationerror Therefore the analysis of the moderate classificationresults was reanalyzed To identify a suitable 119898 value asensitivity analysis was also performed for the changes as 119898value decreases

8 Mathematical Problems in Engineering

Table 3 Part of the conditional probability of Naıve Bayesianclassifier model

Category Risk code 1198751015840(119903119894| BAD) 1198751015840(119903

119894| GOOD)

Political and socialrisks

R1 35 234R3 15 56

Economic andinflation risks

R5 15 89R7 20 242

Bidding and contractrisk

R15 10 274R19 15 153

Procurement risks R22 30 362R24 5 185

Geographic risks R29 40 226

By using the derived model using 119898-estimate each riskfactor is calculated for the conditional probability for Badand Good The two classification intervals model is usedto analyze each risk factor Table 3 shows the conditionalprobability of the key risk factors from each category thathas a severely negative impact (assessed as 3 or 4 points)on the total sample data (140 projects) For example theR24 risk had an 185 probability of occurring in a ldquoGoodrdquoproject and a 5 probability of occurring in a ldquoBadrdquo projectTo summarize Table 3 external (politic or economic) factorsmore frequently appear in the ldquobadrdquo projects while internal(bidding or procurement) factors more frequently occurr inthe ldquogoodrdquo projects

5 Model Analysis and Validation

51 Model Validation In employing the classifier model twodatasets (training and sample test datasets) for 70 projectswere used to compare actual and predicted risk impacts onthe dependent variable Actual risk effects in a given projectwere used to determine the profitability of the 70 projects Inaddition to evaluating the performance level of themodel themodelrsquos usability and accuracy were tested using a confusionmatrix of Bayesrsquo theorem To determine the accuracy ofthe model the dependent variable profitability included asample test dataset of 70 projects for comparison with theactual project performance and predicted performance ofa given project The 2 and 3 classification intervals modelswere also tested by classification into bad (PB PredictedBad) moderate (PM Predicted Moderate) and good (PGPredicted Good) categories the training dataset for the 70projects also employs the actual profit level which is classifiedas bad (AB Actual Bad) normal (AM Actual Moderate) orgood (AG Actual Good) Tables 4 and 5 present the results ofthe two models

The aforementioned fivemodels were randomly tested forcross validation In the test models a dataset of 70 projectswas randomly selected to determine the modelrsquos accuracy Asan initial measure each project was deemed bad or goodAs shown in Table 4 the overall correct level is found to be797 by dividing the bold numbers with the total number of

Table 4 Validation of profitability at 2 classification intervals

Model 1 out of fivecross validations

Classification levelBad (PB) Good (PG) Total

Actual LevelBad (AB) 5 5 10Good (AG) 9 51 60Total 14 56 70

Table 5 Validation of profitability at 3 classification intervals

Model 1 out of fivecross validations

Classification levelTotalBad

(PB)Moderate(PM)

Good(PG)

Actual LevelBad (AB) 3 5 2 10Moderate (AM) 3 16 13 32Good (AG) 3 13 12 28Total 9 34 27 70

samples Other cross validation models also showed highlyaccurate levels of 786 771 827 and 80 respectively

In contrast the model with three classified intervalspresented relatively low levels of 443 429 514 486and 500 with a mean percentage of 448 reflectingless accurate results (refer to Table 5) Models with 2 and3 classification intervals showed standard deviations of 207and 307 respectively this suggests the presence of a fewdifferences and consistent results between the two modelsThe 2 classification intervals model is considered an excellentvehicle for screening purposes as it presents highly accurateresults and less deviation Although the 3 classificationintervals model shows a relatively low level of accuracy ina real business environment the users also may apply the 3classification intervals model for screening out a moderaterange of projects by sacrificing the prediction accuracy

Additional analysis to validate the Naıve Bayesian classi-fication model is performed through comparing with linearregression model using SPSS 21 The authors used the samebinary method of dividing the profit rate into good andbad (2 classification intervals) By utilizing the same 140projectsrsquo units of data that is composed of 70 training unitsof data and 70 test units of data the accuracy result of linearregression model is found to be 522 This result shows asignificantly lower accuracy than the 2 classification intervalsNaıve Bayesian model (797) and slightly higher accuracythan the 3 classification intervals model (448)

In summary this study proposed two types of modelsdepending on the userrsquos need and degree of risk exposureThemodelswere validated through comparing the accuracy of theprediction result with other learning methods In additionthe user can flexibly select the proper model that best suitedto a given firm whether he or she requires an accurateclassification between good and bad project or is contend toidentify a moderate or good project

Mathematical Problems in Engineering 9

Profitability assessment

Method

Category Risk code Risk attributes Risk

Countryrsquos politicaland social risk

R1 Countryrsquos corruption collusion and illegal transaction O

R2 Culture customs and tradition difference X

R3 Policy and postlegislative changes X

R4 Countryrsquos overall legal systemrsquos maturity X

Countryrsquos economic and inflation risks

R5 Average material unit price change X

R6 Average equipment unit price change X

R7 Local currency exchange rate change X

Stakeholder risks

R8 Stakeholderrsquos understanding and management ability O

R9 Design error provided to stakeholder O

R10 Clarity of specifications and bidding documents provided to stakeholder X

R11 Stakeholderrsquos administrative and construction consent delay X

R12 Requirements regardless of contract from stakeholder X

Bidding and contract terms

R13 Bidding and quotation preparation time for construction X

R14 Contract of construction duration for construction X

R15 Required degree of local contents X

R16 Tax and tariff regulations X

R17 Inflation compensation conditions compared to international contract X

R18 Warranty-related conditions compared to international contract X

R19 Payment regulations compared to international contract X

R20 Dispute resolution procedures regulation compared to international contract X

1 Naiumlve Bayesian classifier

Figure 5 Excel-based risk assessments

52 Excel-Based Model Development Based on the afore-mentioned basic algorithm the authors developed an Excel-based system that serves as a continuous risk managementplatform for project screening for the international con-struction sector To initiate project screening risk attributesrelated to a given project are assessed for their impacton project profit depending on the amount of informationacquired As shown in Figure 5 system inputs for riskassessment are based on two scenarios the specified riskeither affects the project or does not In addition usersmay select their model depending on their objectives themore accurate Naıve Bayesian classifier with a higher projectclassification rate (2 classification intervals model) or the lessaccurate model with amore conservative classification rate (3classification intervals model)

The authors present a sample case in which the developedprogram is utilized to test the project screening modelthrough the user interface This case considers a Koreancontractor working on a harbor expansion project that lasts22 months Figure 5 shows step 1 which is the actual user

interface of the programgeneratedwith case study data inputsfor goodbad project screening As shown in Figure 5 theuser first evaluates the risk attribute using ldquoOrdquo for ldquoriskexistsrdquo indicating the presence of a given risk and ldquoXrdquo forldquorisk does not existrdquo indicating an absence of project riskTheuser then selects a model that is the number of classificationintervals Depending on the model selected the classificationmodel categorizes projects as bad moderate or good Themodel classifies the good projects as ranging from 0 to 45of the profit rate Project screening is conducted by selectingprofit interval ranges depending on firm risk behaviorModelresults are obtained from the training sample via randomselection through which classification results are selectedfrom the highest predicted value from 100 prediction cyclesThis process increases reliability and the results are shown inpercentage form for ease of comprehension

Figure 6 shows the procedural steps in the process of theExcel-based model Risk assessment is performed primarilyin step 1 Then in step 2 the model counts for the riskexistence based on the historical dataset of 140 real projects

10 Mathematical Problems in Engineering

R1 R2 R3 R4 R5 R6 R7 R8 R9 R10

Bad

10 10 10 10 10 10 10 10 10 10

1 1 0 0 2 3 1 0 0 0

0142857143 0142857143 0142857 0142857 0142857 0142857 0142857 0142857 0142857 0142857

001 001 001 001 001 001 001 001 001 001

01000 01000 00001 00001 01999 02998 01000 00001 00001 00001

Select

Result 1 1 1 1 1 1 1 1 1 1

0142857143

Moderate

31 31 31 31 31 31 31 31 31 31

0 0 0 0 0 0 0 0 0 0

0442857143 0442857143 0442857 0442857 0442857 0442857 0442857 0442857 0442857 0442857

001 001 001 001 001 001 001 001 001 001

0000142811 0000142811 0000143 0000143 0000143 0000143 0000143 0000143 0000143 0000143

Select 0 0 0 0 0 0 0 0 0 0

Result 1 1 1 1 1 1 1 1 1 1

0442857143

Good

29 29 29 29 29 29 29 29 29 29

17 10 15 12 29 12 10 19 14 9

0414285714 0414285714 0414286 0414286 0414286 0414286 0414286 0414286 0414286 0414286

001 001 001 001 001 001 001 001 001 001

0586147634 0344851529 0517206 0413793 0999798 0413793 0344852 0655089 0482735 0310381

Select 0 0 0 0 0 0 0 0 0 0

Result 1 1 1 1 1 1 1 1 1 1

0414285714

Result table Bad Good Moderate Final result Good

Max

Profitability assessment result

Project screening result Good project

Top 5 risks to be managed Probability

1 Average material unit price change 0999

2 Countryrsquos local subcontractorrsquos procurement conditions 0704

3 Complaints from construction during the project 0655

4 Stakeholderrsquos administrative and construction consent delay 0586

5 Clarity of specifications and bidding documents provided to stakeholders 0555

4

5

3

2

n

nc

p

m

p (good)

17238E minus 35

69489E minus 07

719685E minus 4769489E minus 07

n

nc

p

m

p (normal)

n

nc

p

m

p (bad)

p (ai | bad)

p (ai | normal)

p (ai | good)

Figure 6 Process for assessment result

Through the sum of its risk existence the conditional proba-bility is calculated given each classification of a project (GoodModerate or Bad) separately

Step 3 shows the result table for each classification thathas been selected from the profitability assessmentThis is thestep in which users are to perform the risk assessment fromFigure 6 where selected risks are calculated for conditionalprobability The result is calculated from the product ofeach probability from the 51 risks attributes Step 4 thenshows the final result by selecting the highest probabilityfrom each classification In the illustrative example ldquoGoodrdquohad the highest probability In addition as shown in step 5in the user interface the risks that most directly affectproject profitability can be prioritized which should be

continuouslymonitored and controlled by the userThese fiverisk attributes serve as the basis of a risk response strategy

The proposed model prioritizes the top 5 risks that affectoverall construction profitability According to Halawa et al[41] construction projects are dramatically influenced bythis financial evaluation and awareness of the financial statusimproves it offers a better chance of determining project fea-sibility Therefore this study incorporates the Naıve Bayesianclassifier with an Excel-based model to give practitioners aquantitative analysis result (statistical references) and quali-tative analysis results (the top risk attributes to be managed)It should also be noted that although the result of the modelshows an absolute value of bad moderate or good it is upto the managers and decision-makers to make a bad project

Mathematical Problems in Engineering 11

a moderate project or even good project by careful andproactive risk management as presented in the illustrativecase application

6 Conclusions

This study predicted the profitability of early-stage inter-national construction projects and identified good projectsthat may affect firmrsquos financial stability The current stateof the international construction market was first analyzedlosses were shown to account for 73 of the entire marketreflecting the importance of predicting project profitability inadvance Risk attributes that affect project profitability wereidentified and a risk management system was developed tomanage prioritize and monitor higher impact risks TheNaıve Bayesian classifier method was used to improve thepractical application of the project-screening model Whenrisk attribute assessments for determining risk exposurelevels are completed assessed risk exposure is classified intoa binary parameter that presents an approach to riskmanage-ment that is supported by a probabilistic methodology

This study used case study data to create a model usingMicrosoft Excel Depending on the Naıve Bayesian classifier119898-estimator value of the weight application and the userrsquosrisk behaviors users can choose the model that best suitsa given firm The results show that when a 70-unit trainingdataset and 70-unit sample dataset area are used (a 140-unitset in total) to cross-validate the Naıve Bayesian classifier themodel has on average a 797 prediction accuracy for the 2classification intervals model and a 448 accuracy for the 3classification intervals model Additional analysis to validatethe model is performed through linear regression analysisto show the higher precision result from the Naıve Bayesianclassification model

For project screening conditional probability is usedwithinformation acquired during early project stages Althoughrisk attributes were largely derived in the model riskattributes that more clearly affect projects can be furtherextracted elucidating the effects of bad projects

As for the limitations of this study analytical results forthe application of the developed program to a more diversescenario will be essential for example different types ofconstruction projects may be examined for superior resultsIn addition if contract types were categorized as contractorsubcontractor and joint ventures this may have altered theprofitability of each project A new-entrymarket scenariowillalso be analyzed separately due to the unique characteristicsof newmarkets Given a range of unique construction projectqualities a more realistic and practical application can beobtained in future research Furthermore if additional dataon both project risk assessment and the actual reliable profitrates are accumulated a diverse segmentation of the intervalscan be obtained to examine each project in a precisemeasure

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgment

This research was supported by a grant (14IFIP-B089065-01)funded by Ministry of Land Infrastructure and Transport ofKorean government

References

[1] Global Insight Global Construction Outlook Global InsightLexington Mass USA 2014

[2] International Contractors Association of Korea (ICAK) Inter-national Construction Information Service Construction Statis-tics 2013 httpwwwicakorkrstasta 0101php

[3] Samsung Engineering Business Results ReportThird Quarter of2013 Samsung Engineering Seoul Republic of Korea 2013

[4] M Abdelgawad and A R Fayek ldquoRisk management in theconstruction industry using combined fuzzy FMEA and fuzzyAHPrdquo Journal of Construction Engineering and Managementvol 136 no 9 pp 1028ndash1036 2010

[5] G Aydogan and A Koksal ldquoAn analysis of internationalconstruction risk factors on partner selection by applying ANPapproachrdquo in Proceedings of the International Conference onConstruction and Real Estate Management (ICCREM rsquo13) pp658ndash669 Karlsruhe Germany October 2013

[6] B-G Hwang X Zhao and L P Toh ldquoRisk management insmall construction projects in Singapore status barriers andimpactrdquo International Journal of Project Management vol 32no 1 pp 116ndash124 2014

[7] A E Yildiz I Dikmen M T Birgonul K Ercoskun andS Alten ldquoA knowledge-based risk mapping tool for costestimation of international construction projectsrdquo Automationin Construction vol 43 pp 144ndash155 2014

[8] O Taylan AO Bafail RM S Abdulaal andMRKabli ldquoCon-struction projects selection and risk assessment by fuzzy AHPand fuzzy TOPSISmethodologiesrdquoApplied Soft Computing vol17 pp 105ndash116 2014

[9] S Mu H Cheng M Chohr and W Peng ldquoAssessing riskmanagement capability of contractors in subway projects inmainland Chinardquo International Journal of Project Managementvol 32 no 3 pp 452ndash460 2014

[10] F Acebes J Pajares J M Galan and A Lopez-Paredes ldquoA newapproach for project control under uncertainty Going back tothe basicsrdquo International Journal of ProjectManagement vol 32no 3 pp 423ndash434 2014

[11] Y Zhang and Z-P Fan ldquoAn optimization method for selectingproject risk response strategiesrdquo International Journal of ProjectManagement vol 32 no 3 pp 412ndash422 2014

[12] M Rohaninejad andM Bagherpour ldquoApplication of risk analy-sis within value management a case study in dam engineeringrdquoJournal of Civil Engineering and Management vol 19 no 3 pp364ndash374 2013

[13] Q Yang and Y Zhang ldquoApplication research of AHP in projectrisk managementrdquo Advanced Materials Research vol 785-786pp 1480ndash1483 2013

[14] O Kaplinski ldquoThe utility theory in maintenance and repairstrategyrdquo Procedia Engineering vol 54 pp 604ndash614 2013

[15] M Hastak and A Shaked ldquoICRAM-1 model for internationalconstruction risk assessmentrdquo Journal of Management in Engi-neering vol 16 no 1 pp 59ndash69 2000

[16] S H Han W Jung D Y Kim and H Park ldquoA study for theimprovement of international construction risk managementrdquo

12 Mathematical Problems in Engineering

Conference of Korean Society of Civil Engineering (KSCE) no 10pp 791ndash794 2009

[17] V Kumar and N Viswanadham ldquoA CBR-based decision sup-port system framework for construction supply chain riskmanagementrdquo in Proceedings of the 3rd IEEE InternationalConference on Automation Science and Engineering (CASE rsquo07)pp 980ndash985 IEEE Scottsdale Ariz USA September 2007

[18] B Yang L X Li H Ji and J Xu ldquoAn early warning systemfor loan risk assessment using artificial neural networksrdquoKnowledge-Based Systems vol 14 no 5-6 pp 303ndash306 2001

[19] P Sanchez ldquoNeural-risk assessment system for constructionprojectsrdquo in Proceedings of the Construction Research Congresspp 1ndash11 San Diego Calif USA April 2005

[20] B R Fortunato IIIM RHallowellM Behm andKDewlaneyldquoIdentification of safety risks for high-performance sustainableconstruction projectsrdquo Journal of Construction Engineering andManagement vol 138 no 4 pp 499ndash508 2012

[21] J Li and P X W Zou ldquoFuzzy AHP-based risk assessmentmethodology for PPP projectsrdquo Journal of Construction Engi-neering and Management vol 137 no 12 pp 1205ndash1209 2011

[22] A del Cano and M P de la Cruz ldquoIntegrated methodology forproject risk managementrdquo Journal of Construction Engineeringand Management vol 128 no 6 pp 473ndash485 2002

[23] A P C Chan D C K Ho and C M Tam ldquoDesign andbuild project success factors multivariate analysisrdquo Journal ofConstruction Engineering and Management vol 127 no 2 pp93ndash100 2001

[24] D J Lowe and J Parvar ldquoA logistic regression approachto modelling the contractorrsquos decision to bidrdquo ConstructionManagement and Economics vol 22 no 6 pp 643ndash653 2004

[25] S H Han D Y Kim and H Kim ldquoPredicting profit per-formance for selecting candidate international constructionprojectsrdquo Journal of Construction Engineering andManagementvol 133 no 6 pp 425ndash436 2007

[26] A S Bu-Qammaz I Dikmen andM T Birgonul ldquoRisk assess-ment of international construction projects using the analyticnetwork processrdquoCanadian Journal of Civil Engineering vol 36no 7 pp 1170ndash1181 2009

[27] T Hegazy and A Ayed ldquoNeural network model for parametriccost estimation of highway projectsrdquo Journal of ConstructionEngineering and Management vol 124 no 3 pp 210ndash218 1998

[28] RuleQuest Data mining tools 2008 httpwwwrule-questcom

[29] W Jung Three-Staged Risk Evaluation Model for Bidding onInternational Construction Projects Graduate School YonseiUniversity School of Civil and Environmental EngineeringSeoul Republic of Korea 2011

[30] Construction Industry Institute (CII) ldquoRisk assessment forinternational projectsrdquo Research Report 181-11 ConstructionIndustry Institute (CII) Austin Tex USA 2003

[31] G Young ldquoGlossary and discussion of termsrdquo in MalingeringFeigning and Response Bias in PsychiatricPsychological Injuryvol 56 of International Library of Ethics Law and the NewMedicine pp 777ndash795 Springer Dordrecht The Netherlands2014

[32] M Sahami S Dumais D Heckerman and E Horvitz ldquoABayesian approach to filtering junk e-mailrdquo in Learning for TextCategorization Papers from the 1998 Workshop vol 62 pp 98ndash105 1998

[33] I Androutsopoulos J Koutsias K V Chandrinos G Paliourasand C Spyropoulos ldquoAn evaluation of Naıve Bayesian anti-spam filteringrdquo in Proceedings of the Workshop on Machine

Learning in the New Information Age 11th European Conferenceon Machine Learning G Potamias V Moustakis and M vanSomeren Eds pp 9ndash17 Barcelona Spain 2000 httparxivorgpdfcs0006013pdf

[34] H Zhang ldquoThe optimality of naıve Bayesrdquo in Proceedings of the17th International Florida Artificial Intelligence Research SocietyConference (FLAIRS rsquo04) Miami Beach Fla USA May 2004

[35] K P Murphy Naive Bayes Classifiers University of BritishColumbia Vancouver Canada 2006

[36] R R Yager ldquoAn extension of the naive Bayesian classifierrdquoInformation Sciences vol 176 no 5 pp 577ndash588 2006

[37] D Lowd and P Domingos ldquoNaive Bayes models for probabilityestimationrdquo in Proceedings of the 22nd International Conferenceon Machine Learning (ICML rsquo05) pp 529ndash536 ACM BonnGermany August 2005

[38] B Zadrozny and C Elkan ldquoLearning and making decisionswhen costs and probabilities are both unknownrdquo in Proceedingsof the 7th ACMSIGKDD International Conference on KnowledgeDiscovery and Data Mining pp 204ndash213 ACM August 2001

[39] L Jiang H Zhang Z H Cai and D Wang ldquoWeighted averageof one-dependence estimatorsrdquo Journal of Experimental andTheoretical Artificial Intelligence vol 24 no 2 pp 219ndash230 2012

[40] J Wu and Z Cai ldquoA naive Bayes probability estimation modelbased on self-adaptive differential evolutionrdquo Journal of Intelli-gent Information Systems vol 42 no 3 pp 671ndash694 2014

[41] W S Halawa A M K Abdelalim and I A ElrashedldquoFinancial evaluation program for construction projects at thepre-investment phase in developing countries a case studyrdquoInternational Journal of Project Management vol 31 no 6 pp912ndash923 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 7: Research Article ve Bayesian Classifier for Selecting Good ...downloads.hindawi.com/journals/mpe/2015/830781.pdf · Na \ve Bayesian Classifier for Selecting Good/Bad Projects during

Mathematical Problems in Engineering 7

[Input]Profitability assessment

(i) Define the attributes for risk management (51 risk attributes)

(ii) Assess the impact on profitability (Likert scale)

[Function]Classification and analysis

(i) Determine the number of classification intervals (Naiumlve Bayesian classifier )

(ii) Selection of analysis methodrisk assessment

[Output]

Profitability assessment result

Figure 4 Overview of project classification model

Table 2 Classification of profitability

Bad project Moderate project Good project2 classificationintervals Below 0 mdash Above 0

3 classificationintervals Below 0 0ndash45 Above 45

given to the bad project in advance is shown in the followingequation

119875 (119903

119894| V119895) =

119899

119888

119899

(4)

119875 is estimated conditional probability 119903119894is risk attributes 119894

V119895is the condition of profit rate on Bad or Good based on the

training data (V=below0) 119899119888is number of bad projects under

the specific risk 119886119894in the training data (V = V

119895and 119903 = 119903

119894) and

119899 is number of bad projects in the training data (V = V119895)

However whereas this study examines 140 units of datathe number of training data is insufficient relative to thenumber of risk attributes Unbalanced datasets also lead to aninappropriate result in the statistical analysis [37] It is thusnecessary to augment the small datasets in the training setwhich decreases the accuracy of themodel To overcome thisthe authors borrowed the concept of the 119898-estimate from[38ndash40] By setting the value of a suitable ldquo119898rdquo number theshifting of probability toward parameter ldquo119901rdquo is controlledthis also further increases model accuracy as shown in thefollowing equation

119875

1015840(119903

119894| V119895) =

119899

119888+ 119898119901

119899 + 119898

(5)

119875

1015840 is estimated conditional probability using 119898-estimate 119898is equivalent data size and 119901 is prior probability based ontraining data (4)

Zadrozny and Elkan [38] recommend that the value of119898119901 be set to less than 10 thus using a probabilistic equationto cross-validate 140 data units 70 units are set as trainingdata and the remaining 70 units of119898 parameter are set to anequivalent data size to develop the project screening modelIn turn parameter 119901 becomes 1070 or 17 for (5) Of the 140data units collected 20 units were therefore classified as badprojects and 120 were deemed good projects Therefore thetwo datasetsmdashtraining and equivalent data sizemdashconsist of 10bad projects and 60 good projects that are chosen via randomselection

Five different datasets are created randomly to developand validate the model in view of cross validation Randomdata are selected using Microsoft Excel which generates datain a range from 0 to 1 for random probability In addition forthe training dataset the actual risk exposure level for eachattribute in a given project is used to develop the model Inthis paper only one set of training data results out of fivedifferent datasets is presented in detail as the other resultswere found to be almost equivalent

The Naıve Bayesian classifier consists of an 119898-estimatorparameter which gives a random value of 119898 to controlthe classification bias This study uses 119898 value from theldquomoderaterdquo interval of project profitability which had thehighest error percentage for further validation According tothe results of the first analysis119898 value in the Naıve Bayesianclassifier was reduced in the second analysis by modifying119898 value the moderate profitability project produces a morestable result In addition the weights given to the bad projectsincreased which results in both increased stability of theclassification and reduction of bad projectrsquos classificationerror Therefore the analysis of the moderate classificationresults was reanalyzed To identify a suitable 119898 value asensitivity analysis was also performed for the changes as 119898value decreases

8 Mathematical Problems in Engineering

Table 3 Part of the conditional probability of Naıve Bayesianclassifier model

Category Risk code 1198751015840(119903119894| BAD) 1198751015840(119903

119894| GOOD)

Political and socialrisks

R1 35 234R3 15 56

Economic andinflation risks

R5 15 89R7 20 242

Bidding and contractrisk

R15 10 274R19 15 153

Procurement risks R22 30 362R24 5 185

Geographic risks R29 40 226

By using the derived model using 119898-estimate each riskfactor is calculated for the conditional probability for Badand Good The two classification intervals model is usedto analyze each risk factor Table 3 shows the conditionalprobability of the key risk factors from each category thathas a severely negative impact (assessed as 3 or 4 points)on the total sample data (140 projects) For example theR24 risk had an 185 probability of occurring in a ldquoGoodrdquoproject and a 5 probability of occurring in a ldquoBadrdquo projectTo summarize Table 3 external (politic or economic) factorsmore frequently appear in the ldquobadrdquo projects while internal(bidding or procurement) factors more frequently occurr inthe ldquogoodrdquo projects

5 Model Analysis and Validation

51 Model Validation In employing the classifier model twodatasets (training and sample test datasets) for 70 projectswere used to compare actual and predicted risk impacts onthe dependent variable Actual risk effects in a given projectwere used to determine the profitability of the 70 projects Inaddition to evaluating the performance level of themodel themodelrsquos usability and accuracy were tested using a confusionmatrix of Bayesrsquo theorem To determine the accuracy ofthe model the dependent variable profitability included asample test dataset of 70 projects for comparison with theactual project performance and predicted performance ofa given project The 2 and 3 classification intervals modelswere also tested by classification into bad (PB PredictedBad) moderate (PM Predicted Moderate) and good (PGPredicted Good) categories the training dataset for the 70projects also employs the actual profit level which is classifiedas bad (AB Actual Bad) normal (AM Actual Moderate) orgood (AG Actual Good) Tables 4 and 5 present the results ofthe two models

The aforementioned fivemodels were randomly tested forcross validation In the test models a dataset of 70 projectswas randomly selected to determine the modelrsquos accuracy Asan initial measure each project was deemed bad or goodAs shown in Table 4 the overall correct level is found to be797 by dividing the bold numbers with the total number of

Table 4 Validation of profitability at 2 classification intervals

Model 1 out of fivecross validations

Classification levelBad (PB) Good (PG) Total

Actual LevelBad (AB) 5 5 10Good (AG) 9 51 60Total 14 56 70

Table 5 Validation of profitability at 3 classification intervals

Model 1 out of fivecross validations

Classification levelTotalBad

(PB)Moderate(PM)

Good(PG)

Actual LevelBad (AB) 3 5 2 10Moderate (AM) 3 16 13 32Good (AG) 3 13 12 28Total 9 34 27 70

samples Other cross validation models also showed highlyaccurate levels of 786 771 827 and 80 respectively

In contrast the model with three classified intervalspresented relatively low levels of 443 429 514 486and 500 with a mean percentage of 448 reflectingless accurate results (refer to Table 5) Models with 2 and3 classification intervals showed standard deviations of 207and 307 respectively this suggests the presence of a fewdifferences and consistent results between the two modelsThe 2 classification intervals model is considered an excellentvehicle for screening purposes as it presents highly accurateresults and less deviation Although the 3 classificationintervals model shows a relatively low level of accuracy ina real business environment the users also may apply the 3classification intervals model for screening out a moderaterange of projects by sacrificing the prediction accuracy

Additional analysis to validate the Naıve Bayesian classi-fication model is performed through comparing with linearregression model using SPSS 21 The authors used the samebinary method of dividing the profit rate into good andbad (2 classification intervals) By utilizing the same 140projectsrsquo units of data that is composed of 70 training unitsof data and 70 test units of data the accuracy result of linearregression model is found to be 522 This result shows asignificantly lower accuracy than the 2 classification intervalsNaıve Bayesian model (797) and slightly higher accuracythan the 3 classification intervals model (448)

In summary this study proposed two types of modelsdepending on the userrsquos need and degree of risk exposureThemodelswere validated through comparing the accuracy of theprediction result with other learning methods In additionthe user can flexibly select the proper model that best suitedto a given firm whether he or she requires an accurateclassification between good and bad project or is contend toidentify a moderate or good project

Mathematical Problems in Engineering 9

Profitability assessment

Method

Category Risk code Risk attributes Risk

Countryrsquos politicaland social risk

R1 Countryrsquos corruption collusion and illegal transaction O

R2 Culture customs and tradition difference X

R3 Policy and postlegislative changes X

R4 Countryrsquos overall legal systemrsquos maturity X

Countryrsquos economic and inflation risks

R5 Average material unit price change X

R6 Average equipment unit price change X

R7 Local currency exchange rate change X

Stakeholder risks

R8 Stakeholderrsquos understanding and management ability O

R9 Design error provided to stakeholder O

R10 Clarity of specifications and bidding documents provided to stakeholder X

R11 Stakeholderrsquos administrative and construction consent delay X

R12 Requirements regardless of contract from stakeholder X

Bidding and contract terms

R13 Bidding and quotation preparation time for construction X

R14 Contract of construction duration for construction X

R15 Required degree of local contents X

R16 Tax and tariff regulations X

R17 Inflation compensation conditions compared to international contract X

R18 Warranty-related conditions compared to international contract X

R19 Payment regulations compared to international contract X

R20 Dispute resolution procedures regulation compared to international contract X

1 Naiumlve Bayesian classifier

Figure 5 Excel-based risk assessments

52 Excel-Based Model Development Based on the afore-mentioned basic algorithm the authors developed an Excel-based system that serves as a continuous risk managementplatform for project screening for the international con-struction sector To initiate project screening risk attributesrelated to a given project are assessed for their impacton project profit depending on the amount of informationacquired As shown in Figure 5 system inputs for riskassessment are based on two scenarios the specified riskeither affects the project or does not In addition usersmay select their model depending on their objectives themore accurate Naıve Bayesian classifier with a higher projectclassification rate (2 classification intervals model) or the lessaccurate model with amore conservative classification rate (3classification intervals model)

The authors present a sample case in which the developedprogram is utilized to test the project screening modelthrough the user interface This case considers a Koreancontractor working on a harbor expansion project that lasts22 months Figure 5 shows step 1 which is the actual user

interface of the programgeneratedwith case study data inputsfor goodbad project screening As shown in Figure 5 theuser first evaluates the risk attribute using ldquoOrdquo for ldquoriskexistsrdquo indicating the presence of a given risk and ldquoXrdquo forldquorisk does not existrdquo indicating an absence of project riskTheuser then selects a model that is the number of classificationintervals Depending on the model selected the classificationmodel categorizes projects as bad moderate or good Themodel classifies the good projects as ranging from 0 to 45of the profit rate Project screening is conducted by selectingprofit interval ranges depending on firm risk behaviorModelresults are obtained from the training sample via randomselection through which classification results are selectedfrom the highest predicted value from 100 prediction cyclesThis process increases reliability and the results are shown inpercentage form for ease of comprehension

Figure 6 shows the procedural steps in the process of theExcel-based model Risk assessment is performed primarilyin step 1 Then in step 2 the model counts for the riskexistence based on the historical dataset of 140 real projects

10 Mathematical Problems in Engineering

R1 R2 R3 R4 R5 R6 R7 R8 R9 R10

Bad

10 10 10 10 10 10 10 10 10 10

1 1 0 0 2 3 1 0 0 0

0142857143 0142857143 0142857 0142857 0142857 0142857 0142857 0142857 0142857 0142857

001 001 001 001 001 001 001 001 001 001

01000 01000 00001 00001 01999 02998 01000 00001 00001 00001

Select

Result 1 1 1 1 1 1 1 1 1 1

0142857143

Moderate

31 31 31 31 31 31 31 31 31 31

0 0 0 0 0 0 0 0 0 0

0442857143 0442857143 0442857 0442857 0442857 0442857 0442857 0442857 0442857 0442857

001 001 001 001 001 001 001 001 001 001

0000142811 0000142811 0000143 0000143 0000143 0000143 0000143 0000143 0000143 0000143

Select 0 0 0 0 0 0 0 0 0 0

Result 1 1 1 1 1 1 1 1 1 1

0442857143

Good

29 29 29 29 29 29 29 29 29 29

17 10 15 12 29 12 10 19 14 9

0414285714 0414285714 0414286 0414286 0414286 0414286 0414286 0414286 0414286 0414286

001 001 001 001 001 001 001 001 001 001

0586147634 0344851529 0517206 0413793 0999798 0413793 0344852 0655089 0482735 0310381

Select 0 0 0 0 0 0 0 0 0 0

Result 1 1 1 1 1 1 1 1 1 1

0414285714

Result table Bad Good Moderate Final result Good

Max

Profitability assessment result

Project screening result Good project

Top 5 risks to be managed Probability

1 Average material unit price change 0999

2 Countryrsquos local subcontractorrsquos procurement conditions 0704

3 Complaints from construction during the project 0655

4 Stakeholderrsquos administrative and construction consent delay 0586

5 Clarity of specifications and bidding documents provided to stakeholders 0555

4

5

3

2

n

nc

p

m

p (good)

17238E minus 35

69489E minus 07

719685E minus 4769489E minus 07

n

nc

p

m

p (normal)

n

nc

p

m

p (bad)

p (ai | bad)

p (ai | normal)

p (ai | good)

Figure 6 Process for assessment result

Through the sum of its risk existence the conditional proba-bility is calculated given each classification of a project (GoodModerate or Bad) separately

Step 3 shows the result table for each classification thathas been selected from the profitability assessmentThis is thestep in which users are to perform the risk assessment fromFigure 6 where selected risks are calculated for conditionalprobability The result is calculated from the product ofeach probability from the 51 risks attributes Step 4 thenshows the final result by selecting the highest probabilityfrom each classification In the illustrative example ldquoGoodrdquohad the highest probability In addition as shown in step 5in the user interface the risks that most directly affectproject profitability can be prioritized which should be

continuouslymonitored and controlled by the userThese fiverisk attributes serve as the basis of a risk response strategy

The proposed model prioritizes the top 5 risks that affectoverall construction profitability According to Halawa et al[41] construction projects are dramatically influenced bythis financial evaluation and awareness of the financial statusimproves it offers a better chance of determining project fea-sibility Therefore this study incorporates the Naıve Bayesianclassifier with an Excel-based model to give practitioners aquantitative analysis result (statistical references) and quali-tative analysis results (the top risk attributes to be managed)It should also be noted that although the result of the modelshows an absolute value of bad moderate or good it is upto the managers and decision-makers to make a bad project

Mathematical Problems in Engineering 11

a moderate project or even good project by careful andproactive risk management as presented in the illustrativecase application

6 Conclusions

This study predicted the profitability of early-stage inter-national construction projects and identified good projectsthat may affect firmrsquos financial stability The current stateof the international construction market was first analyzedlosses were shown to account for 73 of the entire marketreflecting the importance of predicting project profitability inadvance Risk attributes that affect project profitability wereidentified and a risk management system was developed tomanage prioritize and monitor higher impact risks TheNaıve Bayesian classifier method was used to improve thepractical application of the project-screening model Whenrisk attribute assessments for determining risk exposurelevels are completed assessed risk exposure is classified intoa binary parameter that presents an approach to riskmanage-ment that is supported by a probabilistic methodology

This study used case study data to create a model usingMicrosoft Excel Depending on the Naıve Bayesian classifier119898-estimator value of the weight application and the userrsquosrisk behaviors users can choose the model that best suitsa given firm The results show that when a 70-unit trainingdataset and 70-unit sample dataset area are used (a 140-unitset in total) to cross-validate the Naıve Bayesian classifier themodel has on average a 797 prediction accuracy for the 2classification intervals model and a 448 accuracy for the 3classification intervals model Additional analysis to validatethe model is performed through linear regression analysisto show the higher precision result from the Naıve Bayesianclassification model

For project screening conditional probability is usedwithinformation acquired during early project stages Althoughrisk attributes were largely derived in the model riskattributes that more clearly affect projects can be furtherextracted elucidating the effects of bad projects

As for the limitations of this study analytical results forthe application of the developed program to a more diversescenario will be essential for example different types ofconstruction projects may be examined for superior resultsIn addition if contract types were categorized as contractorsubcontractor and joint ventures this may have altered theprofitability of each project A new-entrymarket scenariowillalso be analyzed separately due to the unique characteristicsof newmarkets Given a range of unique construction projectqualities a more realistic and practical application can beobtained in future research Furthermore if additional dataon both project risk assessment and the actual reliable profitrates are accumulated a diverse segmentation of the intervalscan be obtained to examine each project in a precisemeasure

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgment

This research was supported by a grant (14IFIP-B089065-01)funded by Ministry of Land Infrastructure and Transport ofKorean government

References

[1] Global Insight Global Construction Outlook Global InsightLexington Mass USA 2014

[2] International Contractors Association of Korea (ICAK) Inter-national Construction Information Service Construction Statis-tics 2013 httpwwwicakorkrstasta 0101php

[3] Samsung Engineering Business Results ReportThird Quarter of2013 Samsung Engineering Seoul Republic of Korea 2013

[4] M Abdelgawad and A R Fayek ldquoRisk management in theconstruction industry using combined fuzzy FMEA and fuzzyAHPrdquo Journal of Construction Engineering and Managementvol 136 no 9 pp 1028ndash1036 2010

[5] G Aydogan and A Koksal ldquoAn analysis of internationalconstruction risk factors on partner selection by applying ANPapproachrdquo in Proceedings of the International Conference onConstruction and Real Estate Management (ICCREM rsquo13) pp658ndash669 Karlsruhe Germany October 2013

[6] B-G Hwang X Zhao and L P Toh ldquoRisk management insmall construction projects in Singapore status barriers andimpactrdquo International Journal of Project Management vol 32no 1 pp 116ndash124 2014

[7] A E Yildiz I Dikmen M T Birgonul K Ercoskun andS Alten ldquoA knowledge-based risk mapping tool for costestimation of international construction projectsrdquo Automationin Construction vol 43 pp 144ndash155 2014

[8] O Taylan AO Bafail RM S Abdulaal andMRKabli ldquoCon-struction projects selection and risk assessment by fuzzy AHPand fuzzy TOPSISmethodologiesrdquoApplied Soft Computing vol17 pp 105ndash116 2014

[9] S Mu H Cheng M Chohr and W Peng ldquoAssessing riskmanagement capability of contractors in subway projects inmainland Chinardquo International Journal of Project Managementvol 32 no 3 pp 452ndash460 2014

[10] F Acebes J Pajares J M Galan and A Lopez-Paredes ldquoA newapproach for project control under uncertainty Going back tothe basicsrdquo International Journal of ProjectManagement vol 32no 3 pp 423ndash434 2014

[11] Y Zhang and Z-P Fan ldquoAn optimization method for selectingproject risk response strategiesrdquo International Journal of ProjectManagement vol 32 no 3 pp 412ndash422 2014

[12] M Rohaninejad andM Bagherpour ldquoApplication of risk analy-sis within value management a case study in dam engineeringrdquoJournal of Civil Engineering and Management vol 19 no 3 pp364ndash374 2013

[13] Q Yang and Y Zhang ldquoApplication research of AHP in projectrisk managementrdquo Advanced Materials Research vol 785-786pp 1480ndash1483 2013

[14] O Kaplinski ldquoThe utility theory in maintenance and repairstrategyrdquo Procedia Engineering vol 54 pp 604ndash614 2013

[15] M Hastak and A Shaked ldquoICRAM-1 model for internationalconstruction risk assessmentrdquo Journal of Management in Engi-neering vol 16 no 1 pp 59ndash69 2000

[16] S H Han W Jung D Y Kim and H Park ldquoA study for theimprovement of international construction risk managementrdquo

12 Mathematical Problems in Engineering

Conference of Korean Society of Civil Engineering (KSCE) no 10pp 791ndash794 2009

[17] V Kumar and N Viswanadham ldquoA CBR-based decision sup-port system framework for construction supply chain riskmanagementrdquo in Proceedings of the 3rd IEEE InternationalConference on Automation Science and Engineering (CASE rsquo07)pp 980ndash985 IEEE Scottsdale Ariz USA September 2007

[18] B Yang L X Li H Ji and J Xu ldquoAn early warning systemfor loan risk assessment using artificial neural networksrdquoKnowledge-Based Systems vol 14 no 5-6 pp 303ndash306 2001

[19] P Sanchez ldquoNeural-risk assessment system for constructionprojectsrdquo in Proceedings of the Construction Research Congresspp 1ndash11 San Diego Calif USA April 2005

[20] B R Fortunato IIIM RHallowellM Behm andKDewlaneyldquoIdentification of safety risks for high-performance sustainableconstruction projectsrdquo Journal of Construction Engineering andManagement vol 138 no 4 pp 499ndash508 2012

[21] J Li and P X W Zou ldquoFuzzy AHP-based risk assessmentmethodology for PPP projectsrdquo Journal of Construction Engi-neering and Management vol 137 no 12 pp 1205ndash1209 2011

[22] A del Cano and M P de la Cruz ldquoIntegrated methodology forproject risk managementrdquo Journal of Construction Engineeringand Management vol 128 no 6 pp 473ndash485 2002

[23] A P C Chan D C K Ho and C M Tam ldquoDesign andbuild project success factors multivariate analysisrdquo Journal ofConstruction Engineering and Management vol 127 no 2 pp93ndash100 2001

[24] D J Lowe and J Parvar ldquoA logistic regression approachto modelling the contractorrsquos decision to bidrdquo ConstructionManagement and Economics vol 22 no 6 pp 643ndash653 2004

[25] S H Han D Y Kim and H Kim ldquoPredicting profit per-formance for selecting candidate international constructionprojectsrdquo Journal of Construction Engineering andManagementvol 133 no 6 pp 425ndash436 2007

[26] A S Bu-Qammaz I Dikmen andM T Birgonul ldquoRisk assess-ment of international construction projects using the analyticnetwork processrdquoCanadian Journal of Civil Engineering vol 36no 7 pp 1170ndash1181 2009

[27] T Hegazy and A Ayed ldquoNeural network model for parametriccost estimation of highway projectsrdquo Journal of ConstructionEngineering and Management vol 124 no 3 pp 210ndash218 1998

[28] RuleQuest Data mining tools 2008 httpwwwrule-questcom

[29] W Jung Three-Staged Risk Evaluation Model for Bidding onInternational Construction Projects Graduate School YonseiUniversity School of Civil and Environmental EngineeringSeoul Republic of Korea 2011

[30] Construction Industry Institute (CII) ldquoRisk assessment forinternational projectsrdquo Research Report 181-11 ConstructionIndustry Institute (CII) Austin Tex USA 2003

[31] G Young ldquoGlossary and discussion of termsrdquo in MalingeringFeigning and Response Bias in PsychiatricPsychological Injuryvol 56 of International Library of Ethics Law and the NewMedicine pp 777ndash795 Springer Dordrecht The Netherlands2014

[32] M Sahami S Dumais D Heckerman and E Horvitz ldquoABayesian approach to filtering junk e-mailrdquo in Learning for TextCategorization Papers from the 1998 Workshop vol 62 pp 98ndash105 1998

[33] I Androutsopoulos J Koutsias K V Chandrinos G Paliourasand C Spyropoulos ldquoAn evaluation of Naıve Bayesian anti-spam filteringrdquo in Proceedings of the Workshop on Machine

Learning in the New Information Age 11th European Conferenceon Machine Learning G Potamias V Moustakis and M vanSomeren Eds pp 9ndash17 Barcelona Spain 2000 httparxivorgpdfcs0006013pdf

[34] H Zhang ldquoThe optimality of naıve Bayesrdquo in Proceedings of the17th International Florida Artificial Intelligence Research SocietyConference (FLAIRS rsquo04) Miami Beach Fla USA May 2004

[35] K P Murphy Naive Bayes Classifiers University of BritishColumbia Vancouver Canada 2006

[36] R R Yager ldquoAn extension of the naive Bayesian classifierrdquoInformation Sciences vol 176 no 5 pp 577ndash588 2006

[37] D Lowd and P Domingos ldquoNaive Bayes models for probabilityestimationrdquo in Proceedings of the 22nd International Conferenceon Machine Learning (ICML rsquo05) pp 529ndash536 ACM BonnGermany August 2005

[38] B Zadrozny and C Elkan ldquoLearning and making decisionswhen costs and probabilities are both unknownrdquo in Proceedingsof the 7th ACMSIGKDD International Conference on KnowledgeDiscovery and Data Mining pp 204ndash213 ACM August 2001

[39] L Jiang H Zhang Z H Cai and D Wang ldquoWeighted averageof one-dependence estimatorsrdquo Journal of Experimental andTheoretical Artificial Intelligence vol 24 no 2 pp 219ndash230 2012

[40] J Wu and Z Cai ldquoA naive Bayes probability estimation modelbased on self-adaptive differential evolutionrdquo Journal of Intelli-gent Information Systems vol 42 no 3 pp 671ndash694 2014

[41] W S Halawa A M K Abdelalim and I A ElrashedldquoFinancial evaluation program for construction projects at thepre-investment phase in developing countries a case studyrdquoInternational Journal of Project Management vol 31 no 6 pp912ndash923 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 8: Research Article ve Bayesian Classifier for Selecting Good ...downloads.hindawi.com/journals/mpe/2015/830781.pdf · Na \ve Bayesian Classifier for Selecting Good/Bad Projects during

8 Mathematical Problems in Engineering

Table 3 Part of the conditional probability of Naıve Bayesianclassifier model

Category Risk code 1198751015840(119903119894| BAD) 1198751015840(119903

119894| GOOD)

Political and socialrisks

R1 35 234R3 15 56

Economic andinflation risks

R5 15 89R7 20 242

Bidding and contractrisk

R15 10 274R19 15 153

Procurement risks R22 30 362R24 5 185

Geographic risks R29 40 226

By using the derived model using 119898-estimate each riskfactor is calculated for the conditional probability for Badand Good The two classification intervals model is usedto analyze each risk factor Table 3 shows the conditionalprobability of the key risk factors from each category thathas a severely negative impact (assessed as 3 or 4 points)on the total sample data (140 projects) For example theR24 risk had an 185 probability of occurring in a ldquoGoodrdquoproject and a 5 probability of occurring in a ldquoBadrdquo projectTo summarize Table 3 external (politic or economic) factorsmore frequently appear in the ldquobadrdquo projects while internal(bidding or procurement) factors more frequently occurr inthe ldquogoodrdquo projects

5 Model Analysis and Validation

51 Model Validation In employing the classifier model twodatasets (training and sample test datasets) for 70 projectswere used to compare actual and predicted risk impacts onthe dependent variable Actual risk effects in a given projectwere used to determine the profitability of the 70 projects Inaddition to evaluating the performance level of themodel themodelrsquos usability and accuracy were tested using a confusionmatrix of Bayesrsquo theorem To determine the accuracy ofthe model the dependent variable profitability included asample test dataset of 70 projects for comparison with theactual project performance and predicted performance ofa given project The 2 and 3 classification intervals modelswere also tested by classification into bad (PB PredictedBad) moderate (PM Predicted Moderate) and good (PGPredicted Good) categories the training dataset for the 70projects also employs the actual profit level which is classifiedas bad (AB Actual Bad) normal (AM Actual Moderate) orgood (AG Actual Good) Tables 4 and 5 present the results ofthe two models

The aforementioned fivemodels were randomly tested forcross validation In the test models a dataset of 70 projectswas randomly selected to determine the modelrsquos accuracy Asan initial measure each project was deemed bad or goodAs shown in Table 4 the overall correct level is found to be797 by dividing the bold numbers with the total number of

Table 4 Validation of profitability at 2 classification intervals

Model 1 out of fivecross validations

Classification levelBad (PB) Good (PG) Total

Actual LevelBad (AB) 5 5 10Good (AG) 9 51 60Total 14 56 70

Table 5 Validation of profitability at 3 classification intervals

Model 1 out of fivecross validations

Classification levelTotalBad

(PB)Moderate(PM)

Good(PG)

Actual LevelBad (AB) 3 5 2 10Moderate (AM) 3 16 13 32Good (AG) 3 13 12 28Total 9 34 27 70

samples Other cross validation models also showed highlyaccurate levels of 786 771 827 and 80 respectively

In contrast the model with three classified intervalspresented relatively low levels of 443 429 514 486and 500 with a mean percentage of 448 reflectingless accurate results (refer to Table 5) Models with 2 and3 classification intervals showed standard deviations of 207and 307 respectively this suggests the presence of a fewdifferences and consistent results between the two modelsThe 2 classification intervals model is considered an excellentvehicle for screening purposes as it presents highly accurateresults and less deviation Although the 3 classificationintervals model shows a relatively low level of accuracy ina real business environment the users also may apply the 3classification intervals model for screening out a moderaterange of projects by sacrificing the prediction accuracy

Additional analysis to validate the Naıve Bayesian classi-fication model is performed through comparing with linearregression model using SPSS 21 The authors used the samebinary method of dividing the profit rate into good andbad (2 classification intervals) By utilizing the same 140projectsrsquo units of data that is composed of 70 training unitsof data and 70 test units of data the accuracy result of linearregression model is found to be 522 This result shows asignificantly lower accuracy than the 2 classification intervalsNaıve Bayesian model (797) and slightly higher accuracythan the 3 classification intervals model (448)

In summary this study proposed two types of modelsdepending on the userrsquos need and degree of risk exposureThemodelswere validated through comparing the accuracy of theprediction result with other learning methods In additionthe user can flexibly select the proper model that best suitedto a given firm whether he or she requires an accurateclassification between good and bad project or is contend toidentify a moderate or good project

Mathematical Problems in Engineering 9

Profitability assessment

Method

Category Risk code Risk attributes Risk

Countryrsquos politicaland social risk

R1 Countryrsquos corruption collusion and illegal transaction O

R2 Culture customs and tradition difference X

R3 Policy and postlegislative changes X

R4 Countryrsquos overall legal systemrsquos maturity X

Countryrsquos economic and inflation risks

R5 Average material unit price change X

R6 Average equipment unit price change X

R7 Local currency exchange rate change X

Stakeholder risks

R8 Stakeholderrsquos understanding and management ability O

R9 Design error provided to stakeholder O

R10 Clarity of specifications and bidding documents provided to stakeholder X

R11 Stakeholderrsquos administrative and construction consent delay X

R12 Requirements regardless of contract from stakeholder X

Bidding and contract terms

R13 Bidding and quotation preparation time for construction X

R14 Contract of construction duration for construction X

R15 Required degree of local contents X

R16 Tax and tariff regulations X

R17 Inflation compensation conditions compared to international contract X

R18 Warranty-related conditions compared to international contract X

R19 Payment regulations compared to international contract X

R20 Dispute resolution procedures regulation compared to international contract X

1 Naiumlve Bayesian classifier

Figure 5 Excel-based risk assessments

52 Excel-Based Model Development Based on the afore-mentioned basic algorithm the authors developed an Excel-based system that serves as a continuous risk managementplatform for project screening for the international con-struction sector To initiate project screening risk attributesrelated to a given project are assessed for their impacton project profit depending on the amount of informationacquired As shown in Figure 5 system inputs for riskassessment are based on two scenarios the specified riskeither affects the project or does not In addition usersmay select their model depending on their objectives themore accurate Naıve Bayesian classifier with a higher projectclassification rate (2 classification intervals model) or the lessaccurate model with amore conservative classification rate (3classification intervals model)

The authors present a sample case in which the developedprogram is utilized to test the project screening modelthrough the user interface This case considers a Koreancontractor working on a harbor expansion project that lasts22 months Figure 5 shows step 1 which is the actual user

interface of the programgeneratedwith case study data inputsfor goodbad project screening As shown in Figure 5 theuser first evaluates the risk attribute using ldquoOrdquo for ldquoriskexistsrdquo indicating the presence of a given risk and ldquoXrdquo forldquorisk does not existrdquo indicating an absence of project riskTheuser then selects a model that is the number of classificationintervals Depending on the model selected the classificationmodel categorizes projects as bad moderate or good Themodel classifies the good projects as ranging from 0 to 45of the profit rate Project screening is conducted by selectingprofit interval ranges depending on firm risk behaviorModelresults are obtained from the training sample via randomselection through which classification results are selectedfrom the highest predicted value from 100 prediction cyclesThis process increases reliability and the results are shown inpercentage form for ease of comprehension

Figure 6 shows the procedural steps in the process of theExcel-based model Risk assessment is performed primarilyin step 1 Then in step 2 the model counts for the riskexistence based on the historical dataset of 140 real projects

10 Mathematical Problems in Engineering

R1 R2 R3 R4 R5 R6 R7 R8 R9 R10

Bad

10 10 10 10 10 10 10 10 10 10

1 1 0 0 2 3 1 0 0 0

0142857143 0142857143 0142857 0142857 0142857 0142857 0142857 0142857 0142857 0142857

001 001 001 001 001 001 001 001 001 001

01000 01000 00001 00001 01999 02998 01000 00001 00001 00001

Select

Result 1 1 1 1 1 1 1 1 1 1

0142857143

Moderate

31 31 31 31 31 31 31 31 31 31

0 0 0 0 0 0 0 0 0 0

0442857143 0442857143 0442857 0442857 0442857 0442857 0442857 0442857 0442857 0442857

001 001 001 001 001 001 001 001 001 001

0000142811 0000142811 0000143 0000143 0000143 0000143 0000143 0000143 0000143 0000143

Select 0 0 0 0 0 0 0 0 0 0

Result 1 1 1 1 1 1 1 1 1 1

0442857143

Good

29 29 29 29 29 29 29 29 29 29

17 10 15 12 29 12 10 19 14 9

0414285714 0414285714 0414286 0414286 0414286 0414286 0414286 0414286 0414286 0414286

001 001 001 001 001 001 001 001 001 001

0586147634 0344851529 0517206 0413793 0999798 0413793 0344852 0655089 0482735 0310381

Select 0 0 0 0 0 0 0 0 0 0

Result 1 1 1 1 1 1 1 1 1 1

0414285714

Result table Bad Good Moderate Final result Good

Max

Profitability assessment result

Project screening result Good project

Top 5 risks to be managed Probability

1 Average material unit price change 0999

2 Countryrsquos local subcontractorrsquos procurement conditions 0704

3 Complaints from construction during the project 0655

4 Stakeholderrsquos administrative and construction consent delay 0586

5 Clarity of specifications and bidding documents provided to stakeholders 0555

4

5

3

2

n

nc

p

m

p (good)

17238E minus 35

69489E minus 07

719685E minus 4769489E minus 07

n

nc

p

m

p (normal)

n

nc

p

m

p (bad)

p (ai | bad)

p (ai | normal)

p (ai | good)

Figure 6 Process for assessment result

Through the sum of its risk existence the conditional proba-bility is calculated given each classification of a project (GoodModerate or Bad) separately

Step 3 shows the result table for each classification thathas been selected from the profitability assessmentThis is thestep in which users are to perform the risk assessment fromFigure 6 where selected risks are calculated for conditionalprobability The result is calculated from the product ofeach probability from the 51 risks attributes Step 4 thenshows the final result by selecting the highest probabilityfrom each classification In the illustrative example ldquoGoodrdquohad the highest probability In addition as shown in step 5in the user interface the risks that most directly affectproject profitability can be prioritized which should be

continuouslymonitored and controlled by the userThese fiverisk attributes serve as the basis of a risk response strategy

The proposed model prioritizes the top 5 risks that affectoverall construction profitability According to Halawa et al[41] construction projects are dramatically influenced bythis financial evaluation and awareness of the financial statusimproves it offers a better chance of determining project fea-sibility Therefore this study incorporates the Naıve Bayesianclassifier with an Excel-based model to give practitioners aquantitative analysis result (statistical references) and quali-tative analysis results (the top risk attributes to be managed)It should also be noted that although the result of the modelshows an absolute value of bad moderate or good it is upto the managers and decision-makers to make a bad project

Mathematical Problems in Engineering 11

a moderate project or even good project by careful andproactive risk management as presented in the illustrativecase application

6 Conclusions

This study predicted the profitability of early-stage inter-national construction projects and identified good projectsthat may affect firmrsquos financial stability The current stateof the international construction market was first analyzedlosses were shown to account for 73 of the entire marketreflecting the importance of predicting project profitability inadvance Risk attributes that affect project profitability wereidentified and a risk management system was developed tomanage prioritize and monitor higher impact risks TheNaıve Bayesian classifier method was used to improve thepractical application of the project-screening model Whenrisk attribute assessments for determining risk exposurelevels are completed assessed risk exposure is classified intoa binary parameter that presents an approach to riskmanage-ment that is supported by a probabilistic methodology

This study used case study data to create a model usingMicrosoft Excel Depending on the Naıve Bayesian classifier119898-estimator value of the weight application and the userrsquosrisk behaviors users can choose the model that best suitsa given firm The results show that when a 70-unit trainingdataset and 70-unit sample dataset area are used (a 140-unitset in total) to cross-validate the Naıve Bayesian classifier themodel has on average a 797 prediction accuracy for the 2classification intervals model and a 448 accuracy for the 3classification intervals model Additional analysis to validatethe model is performed through linear regression analysisto show the higher precision result from the Naıve Bayesianclassification model

For project screening conditional probability is usedwithinformation acquired during early project stages Althoughrisk attributes were largely derived in the model riskattributes that more clearly affect projects can be furtherextracted elucidating the effects of bad projects

As for the limitations of this study analytical results forthe application of the developed program to a more diversescenario will be essential for example different types ofconstruction projects may be examined for superior resultsIn addition if contract types were categorized as contractorsubcontractor and joint ventures this may have altered theprofitability of each project A new-entrymarket scenariowillalso be analyzed separately due to the unique characteristicsof newmarkets Given a range of unique construction projectqualities a more realistic and practical application can beobtained in future research Furthermore if additional dataon both project risk assessment and the actual reliable profitrates are accumulated a diverse segmentation of the intervalscan be obtained to examine each project in a precisemeasure

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgment

This research was supported by a grant (14IFIP-B089065-01)funded by Ministry of Land Infrastructure and Transport ofKorean government

References

[1] Global Insight Global Construction Outlook Global InsightLexington Mass USA 2014

[2] International Contractors Association of Korea (ICAK) Inter-national Construction Information Service Construction Statis-tics 2013 httpwwwicakorkrstasta 0101php

[3] Samsung Engineering Business Results ReportThird Quarter of2013 Samsung Engineering Seoul Republic of Korea 2013

[4] M Abdelgawad and A R Fayek ldquoRisk management in theconstruction industry using combined fuzzy FMEA and fuzzyAHPrdquo Journal of Construction Engineering and Managementvol 136 no 9 pp 1028ndash1036 2010

[5] G Aydogan and A Koksal ldquoAn analysis of internationalconstruction risk factors on partner selection by applying ANPapproachrdquo in Proceedings of the International Conference onConstruction and Real Estate Management (ICCREM rsquo13) pp658ndash669 Karlsruhe Germany October 2013

[6] B-G Hwang X Zhao and L P Toh ldquoRisk management insmall construction projects in Singapore status barriers andimpactrdquo International Journal of Project Management vol 32no 1 pp 116ndash124 2014

[7] A E Yildiz I Dikmen M T Birgonul K Ercoskun andS Alten ldquoA knowledge-based risk mapping tool for costestimation of international construction projectsrdquo Automationin Construction vol 43 pp 144ndash155 2014

[8] O Taylan AO Bafail RM S Abdulaal andMRKabli ldquoCon-struction projects selection and risk assessment by fuzzy AHPand fuzzy TOPSISmethodologiesrdquoApplied Soft Computing vol17 pp 105ndash116 2014

[9] S Mu H Cheng M Chohr and W Peng ldquoAssessing riskmanagement capability of contractors in subway projects inmainland Chinardquo International Journal of Project Managementvol 32 no 3 pp 452ndash460 2014

[10] F Acebes J Pajares J M Galan and A Lopez-Paredes ldquoA newapproach for project control under uncertainty Going back tothe basicsrdquo International Journal of ProjectManagement vol 32no 3 pp 423ndash434 2014

[11] Y Zhang and Z-P Fan ldquoAn optimization method for selectingproject risk response strategiesrdquo International Journal of ProjectManagement vol 32 no 3 pp 412ndash422 2014

[12] M Rohaninejad andM Bagherpour ldquoApplication of risk analy-sis within value management a case study in dam engineeringrdquoJournal of Civil Engineering and Management vol 19 no 3 pp364ndash374 2013

[13] Q Yang and Y Zhang ldquoApplication research of AHP in projectrisk managementrdquo Advanced Materials Research vol 785-786pp 1480ndash1483 2013

[14] O Kaplinski ldquoThe utility theory in maintenance and repairstrategyrdquo Procedia Engineering vol 54 pp 604ndash614 2013

[15] M Hastak and A Shaked ldquoICRAM-1 model for internationalconstruction risk assessmentrdquo Journal of Management in Engi-neering vol 16 no 1 pp 59ndash69 2000

[16] S H Han W Jung D Y Kim and H Park ldquoA study for theimprovement of international construction risk managementrdquo

12 Mathematical Problems in Engineering

Conference of Korean Society of Civil Engineering (KSCE) no 10pp 791ndash794 2009

[17] V Kumar and N Viswanadham ldquoA CBR-based decision sup-port system framework for construction supply chain riskmanagementrdquo in Proceedings of the 3rd IEEE InternationalConference on Automation Science and Engineering (CASE rsquo07)pp 980ndash985 IEEE Scottsdale Ariz USA September 2007

[18] B Yang L X Li H Ji and J Xu ldquoAn early warning systemfor loan risk assessment using artificial neural networksrdquoKnowledge-Based Systems vol 14 no 5-6 pp 303ndash306 2001

[19] P Sanchez ldquoNeural-risk assessment system for constructionprojectsrdquo in Proceedings of the Construction Research Congresspp 1ndash11 San Diego Calif USA April 2005

[20] B R Fortunato IIIM RHallowellM Behm andKDewlaneyldquoIdentification of safety risks for high-performance sustainableconstruction projectsrdquo Journal of Construction Engineering andManagement vol 138 no 4 pp 499ndash508 2012

[21] J Li and P X W Zou ldquoFuzzy AHP-based risk assessmentmethodology for PPP projectsrdquo Journal of Construction Engi-neering and Management vol 137 no 12 pp 1205ndash1209 2011

[22] A del Cano and M P de la Cruz ldquoIntegrated methodology forproject risk managementrdquo Journal of Construction Engineeringand Management vol 128 no 6 pp 473ndash485 2002

[23] A P C Chan D C K Ho and C M Tam ldquoDesign andbuild project success factors multivariate analysisrdquo Journal ofConstruction Engineering and Management vol 127 no 2 pp93ndash100 2001

[24] D J Lowe and J Parvar ldquoA logistic regression approachto modelling the contractorrsquos decision to bidrdquo ConstructionManagement and Economics vol 22 no 6 pp 643ndash653 2004

[25] S H Han D Y Kim and H Kim ldquoPredicting profit per-formance for selecting candidate international constructionprojectsrdquo Journal of Construction Engineering andManagementvol 133 no 6 pp 425ndash436 2007

[26] A S Bu-Qammaz I Dikmen andM T Birgonul ldquoRisk assess-ment of international construction projects using the analyticnetwork processrdquoCanadian Journal of Civil Engineering vol 36no 7 pp 1170ndash1181 2009

[27] T Hegazy and A Ayed ldquoNeural network model for parametriccost estimation of highway projectsrdquo Journal of ConstructionEngineering and Management vol 124 no 3 pp 210ndash218 1998

[28] RuleQuest Data mining tools 2008 httpwwwrule-questcom

[29] W Jung Three-Staged Risk Evaluation Model for Bidding onInternational Construction Projects Graduate School YonseiUniversity School of Civil and Environmental EngineeringSeoul Republic of Korea 2011

[30] Construction Industry Institute (CII) ldquoRisk assessment forinternational projectsrdquo Research Report 181-11 ConstructionIndustry Institute (CII) Austin Tex USA 2003

[31] G Young ldquoGlossary and discussion of termsrdquo in MalingeringFeigning and Response Bias in PsychiatricPsychological Injuryvol 56 of International Library of Ethics Law and the NewMedicine pp 777ndash795 Springer Dordrecht The Netherlands2014

[32] M Sahami S Dumais D Heckerman and E Horvitz ldquoABayesian approach to filtering junk e-mailrdquo in Learning for TextCategorization Papers from the 1998 Workshop vol 62 pp 98ndash105 1998

[33] I Androutsopoulos J Koutsias K V Chandrinos G Paliourasand C Spyropoulos ldquoAn evaluation of Naıve Bayesian anti-spam filteringrdquo in Proceedings of the Workshop on Machine

Learning in the New Information Age 11th European Conferenceon Machine Learning G Potamias V Moustakis and M vanSomeren Eds pp 9ndash17 Barcelona Spain 2000 httparxivorgpdfcs0006013pdf

[34] H Zhang ldquoThe optimality of naıve Bayesrdquo in Proceedings of the17th International Florida Artificial Intelligence Research SocietyConference (FLAIRS rsquo04) Miami Beach Fla USA May 2004

[35] K P Murphy Naive Bayes Classifiers University of BritishColumbia Vancouver Canada 2006

[36] R R Yager ldquoAn extension of the naive Bayesian classifierrdquoInformation Sciences vol 176 no 5 pp 577ndash588 2006

[37] D Lowd and P Domingos ldquoNaive Bayes models for probabilityestimationrdquo in Proceedings of the 22nd International Conferenceon Machine Learning (ICML rsquo05) pp 529ndash536 ACM BonnGermany August 2005

[38] B Zadrozny and C Elkan ldquoLearning and making decisionswhen costs and probabilities are both unknownrdquo in Proceedingsof the 7th ACMSIGKDD International Conference on KnowledgeDiscovery and Data Mining pp 204ndash213 ACM August 2001

[39] L Jiang H Zhang Z H Cai and D Wang ldquoWeighted averageof one-dependence estimatorsrdquo Journal of Experimental andTheoretical Artificial Intelligence vol 24 no 2 pp 219ndash230 2012

[40] J Wu and Z Cai ldquoA naive Bayes probability estimation modelbased on self-adaptive differential evolutionrdquo Journal of Intelli-gent Information Systems vol 42 no 3 pp 671ndash694 2014

[41] W S Halawa A M K Abdelalim and I A ElrashedldquoFinancial evaluation program for construction projects at thepre-investment phase in developing countries a case studyrdquoInternational Journal of Project Management vol 31 no 6 pp912ndash923 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 9: Research Article ve Bayesian Classifier for Selecting Good ...downloads.hindawi.com/journals/mpe/2015/830781.pdf · Na \ve Bayesian Classifier for Selecting Good/Bad Projects during

Mathematical Problems in Engineering 9

Profitability assessment

Method

Category Risk code Risk attributes Risk

Countryrsquos politicaland social risk

R1 Countryrsquos corruption collusion and illegal transaction O

R2 Culture customs and tradition difference X

R3 Policy and postlegislative changes X

R4 Countryrsquos overall legal systemrsquos maturity X

Countryrsquos economic and inflation risks

R5 Average material unit price change X

R6 Average equipment unit price change X

R7 Local currency exchange rate change X

Stakeholder risks

R8 Stakeholderrsquos understanding and management ability O

R9 Design error provided to stakeholder O

R10 Clarity of specifications and bidding documents provided to stakeholder X

R11 Stakeholderrsquos administrative and construction consent delay X

R12 Requirements regardless of contract from stakeholder X

Bidding and contract terms

R13 Bidding and quotation preparation time for construction X

R14 Contract of construction duration for construction X

R15 Required degree of local contents X

R16 Tax and tariff regulations X

R17 Inflation compensation conditions compared to international contract X

R18 Warranty-related conditions compared to international contract X

R19 Payment regulations compared to international contract X

R20 Dispute resolution procedures regulation compared to international contract X

1 Naiumlve Bayesian classifier

Figure 5 Excel-based risk assessments

52 Excel-Based Model Development Based on the afore-mentioned basic algorithm the authors developed an Excel-based system that serves as a continuous risk managementplatform for project screening for the international con-struction sector To initiate project screening risk attributesrelated to a given project are assessed for their impacton project profit depending on the amount of informationacquired As shown in Figure 5 system inputs for riskassessment are based on two scenarios the specified riskeither affects the project or does not In addition usersmay select their model depending on their objectives themore accurate Naıve Bayesian classifier with a higher projectclassification rate (2 classification intervals model) or the lessaccurate model with amore conservative classification rate (3classification intervals model)

The authors present a sample case in which the developedprogram is utilized to test the project screening modelthrough the user interface This case considers a Koreancontractor working on a harbor expansion project that lasts22 months Figure 5 shows step 1 which is the actual user

interface of the programgeneratedwith case study data inputsfor goodbad project screening As shown in Figure 5 theuser first evaluates the risk attribute using ldquoOrdquo for ldquoriskexistsrdquo indicating the presence of a given risk and ldquoXrdquo forldquorisk does not existrdquo indicating an absence of project riskTheuser then selects a model that is the number of classificationintervals Depending on the model selected the classificationmodel categorizes projects as bad moderate or good Themodel classifies the good projects as ranging from 0 to 45of the profit rate Project screening is conducted by selectingprofit interval ranges depending on firm risk behaviorModelresults are obtained from the training sample via randomselection through which classification results are selectedfrom the highest predicted value from 100 prediction cyclesThis process increases reliability and the results are shown inpercentage form for ease of comprehension

Figure 6 shows the procedural steps in the process of theExcel-based model Risk assessment is performed primarilyin step 1 Then in step 2 the model counts for the riskexistence based on the historical dataset of 140 real projects

10 Mathematical Problems in Engineering

R1 R2 R3 R4 R5 R6 R7 R8 R9 R10

Bad

10 10 10 10 10 10 10 10 10 10

1 1 0 0 2 3 1 0 0 0

0142857143 0142857143 0142857 0142857 0142857 0142857 0142857 0142857 0142857 0142857

001 001 001 001 001 001 001 001 001 001

01000 01000 00001 00001 01999 02998 01000 00001 00001 00001

Select

Result 1 1 1 1 1 1 1 1 1 1

0142857143

Moderate

31 31 31 31 31 31 31 31 31 31

0 0 0 0 0 0 0 0 0 0

0442857143 0442857143 0442857 0442857 0442857 0442857 0442857 0442857 0442857 0442857

001 001 001 001 001 001 001 001 001 001

0000142811 0000142811 0000143 0000143 0000143 0000143 0000143 0000143 0000143 0000143

Select 0 0 0 0 0 0 0 0 0 0

Result 1 1 1 1 1 1 1 1 1 1

0442857143

Good

29 29 29 29 29 29 29 29 29 29

17 10 15 12 29 12 10 19 14 9

0414285714 0414285714 0414286 0414286 0414286 0414286 0414286 0414286 0414286 0414286

001 001 001 001 001 001 001 001 001 001

0586147634 0344851529 0517206 0413793 0999798 0413793 0344852 0655089 0482735 0310381

Select 0 0 0 0 0 0 0 0 0 0

Result 1 1 1 1 1 1 1 1 1 1

0414285714

Result table Bad Good Moderate Final result Good

Max

Profitability assessment result

Project screening result Good project

Top 5 risks to be managed Probability

1 Average material unit price change 0999

2 Countryrsquos local subcontractorrsquos procurement conditions 0704

3 Complaints from construction during the project 0655

4 Stakeholderrsquos administrative and construction consent delay 0586

5 Clarity of specifications and bidding documents provided to stakeholders 0555

4

5

3

2

n

nc

p

m

p (good)

17238E minus 35

69489E minus 07

719685E minus 4769489E minus 07

n

nc

p

m

p (normal)

n

nc

p

m

p (bad)

p (ai | bad)

p (ai | normal)

p (ai | good)

Figure 6 Process for assessment result

Through the sum of its risk existence the conditional proba-bility is calculated given each classification of a project (GoodModerate or Bad) separately

Step 3 shows the result table for each classification thathas been selected from the profitability assessmentThis is thestep in which users are to perform the risk assessment fromFigure 6 where selected risks are calculated for conditionalprobability The result is calculated from the product ofeach probability from the 51 risks attributes Step 4 thenshows the final result by selecting the highest probabilityfrom each classification In the illustrative example ldquoGoodrdquohad the highest probability In addition as shown in step 5in the user interface the risks that most directly affectproject profitability can be prioritized which should be

continuouslymonitored and controlled by the userThese fiverisk attributes serve as the basis of a risk response strategy

The proposed model prioritizes the top 5 risks that affectoverall construction profitability According to Halawa et al[41] construction projects are dramatically influenced bythis financial evaluation and awareness of the financial statusimproves it offers a better chance of determining project fea-sibility Therefore this study incorporates the Naıve Bayesianclassifier with an Excel-based model to give practitioners aquantitative analysis result (statistical references) and quali-tative analysis results (the top risk attributes to be managed)It should also be noted that although the result of the modelshows an absolute value of bad moderate or good it is upto the managers and decision-makers to make a bad project

Mathematical Problems in Engineering 11

a moderate project or even good project by careful andproactive risk management as presented in the illustrativecase application

6 Conclusions

This study predicted the profitability of early-stage inter-national construction projects and identified good projectsthat may affect firmrsquos financial stability The current stateof the international construction market was first analyzedlosses were shown to account for 73 of the entire marketreflecting the importance of predicting project profitability inadvance Risk attributes that affect project profitability wereidentified and a risk management system was developed tomanage prioritize and monitor higher impact risks TheNaıve Bayesian classifier method was used to improve thepractical application of the project-screening model Whenrisk attribute assessments for determining risk exposurelevels are completed assessed risk exposure is classified intoa binary parameter that presents an approach to riskmanage-ment that is supported by a probabilistic methodology

This study used case study data to create a model usingMicrosoft Excel Depending on the Naıve Bayesian classifier119898-estimator value of the weight application and the userrsquosrisk behaviors users can choose the model that best suitsa given firm The results show that when a 70-unit trainingdataset and 70-unit sample dataset area are used (a 140-unitset in total) to cross-validate the Naıve Bayesian classifier themodel has on average a 797 prediction accuracy for the 2classification intervals model and a 448 accuracy for the 3classification intervals model Additional analysis to validatethe model is performed through linear regression analysisto show the higher precision result from the Naıve Bayesianclassification model

For project screening conditional probability is usedwithinformation acquired during early project stages Althoughrisk attributes were largely derived in the model riskattributes that more clearly affect projects can be furtherextracted elucidating the effects of bad projects

As for the limitations of this study analytical results forthe application of the developed program to a more diversescenario will be essential for example different types ofconstruction projects may be examined for superior resultsIn addition if contract types were categorized as contractorsubcontractor and joint ventures this may have altered theprofitability of each project A new-entrymarket scenariowillalso be analyzed separately due to the unique characteristicsof newmarkets Given a range of unique construction projectqualities a more realistic and practical application can beobtained in future research Furthermore if additional dataon both project risk assessment and the actual reliable profitrates are accumulated a diverse segmentation of the intervalscan be obtained to examine each project in a precisemeasure

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgment

This research was supported by a grant (14IFIP-B089065-01)funded by Ministry of Land Infrastructure and Transport ofKorean government

References

[1] Global Insight Global Construction Outlook Global InsightLexington Mass USA 2014

[2] International Contractors Association of Korea (ICAK) Inter-national Construction Information Service Construction Statis-tics 2013 httpwwwicakorkrstasta 0101php

[3] Samsung Engineering Business Results ReportThird Quarter of2013 Samsung Engineering Seoul Republic of Korea 2013

[4] M Abdelgawad and A R Fayek ldquoRisk management in theconstruction industry using combined fuzzy FMEA and fuzzyAHPrdquo Journal of Construction Engineering and Managementvol 136 no 9 pp 1028ndash1036 2010

[5] G Aydogan and A Koksal ldquoAn analysis of internationalconstruction risk factors on partner selection by applying ANPapproachrdquo in Proceedings of the International Conference onConstruction and Real Estate Management (ICCREM rsquo13) pp658ndash669 Karlsruhe Germany October 2013

[6] B-G Hwang X Zhao and L P Toh ldquoRisk management insmall construction projects in Singapore status barriers andimpactrdquo International Journal of Project Management vol 32no 1 pp 116ndash124 2014

[7] A E Yildiz I Dikmen M T Birgonul K Ercoskun andS Alten ldquoA knowledge-based risk mapping tool for costestimation of international construction projectsrdquo Automationin Construction vol 43 pp 144ndash155 2014

[8] O Taylan AO Bafail RM S Abdulaal andMRKabli ldquoCon-struction projects selection and risk assessment by fuzzy AHPand fuzzy TOPSISmethodologiesrdquoApplied Soft Computing vol17 pp 105ndash116 2014

[9] S Mu H Cheng M Chohr and W Peng ldquoAssessing riskmanagement capability of contractors in subway projects inmainland Chinardquo International Journal of Project Managementvol 32 no 3 pp 452ndash460 2014

[10] F Acebes J Pajares J M Galan and A Lopez-Paredes ldquoA newapproach for project control under uncertainty Going back tothe basicsrdquo International Journal of ProjectManagement vol 32no 3 pp 423ndash434 2014

[11] Y Zhang and Z-P Fan ldquoAn optimization method for selectingproject risk response strategiesrdquo International Journal of ProjectManagement vol 32 no 3 pp 412ndash422 2014

[12] M Rohaninejad andM Bagherpour ldquoApplication of risk analy-sis within value management a case study in dam engineeringrdquoJournal of Civil Engineering and Management vol 19 no 3 pp364ndash374 2013

[13] Q Yang and Y Zhang ldquoApplication research of AHP in projectrisk managementrdquo Advanced Materials Research vol 785-786pp 1480ndash1483 2013

[14] O Kaplinski ldquoThe utility theory in maintenance and repairstrategyrdquo Procedia Engineering vol 54 pp 604ndash614 2013

[15] M Hastak and A Shaked ldquoICRAM-1 model for internationalconstruction risk assessmentrdquo Journal of Management in Engi-neering vol 16 no 1 pp 59ndash69 2000

[16] S H Han W Jung D Y Kim and H Park ldquoA study for theimprovement of international construction risk managementrdquo

12 Mathematical Problems in Engineering

Conference of Korean Society of Civil Engineering (KSCE) no 10pp 791ndash794 2009

[17] V Kumar and N Viswanadham ldquoA CBR-based decision sup-port system framework for construction supply chain riskmanagementrdquo in Proceedings of the 3rd IEEE InternationalConference on Automation Science and Engineering (CASE rsquo07)pp 980ndash985 IEEE Scottsdale Ariz USA September 2007

[18] B Yang L X Li H Ji and J Xu ldquoAn early warning systemfor loan risk assessment using artificial neural networksrdquoKnowledge-Based Systems vol 14 no 5-6 pp 303ndash306 2001

[19] P Sanchez ldquoNeural-risk assessment system for constructionprojectsrdquo in Proceedings of the Construction Research Congresspp 1ndash11 San Diego Calif USA April 2005

[20] B R Fortunato IIIM RHallowellM Behm andKDewlaneyldquoIdentification of safety risks for high-performance sustainableconstruction projectsrdquo Journal of Construction Engineering andManagement vol 138 no 4 pp 499ndash508 2012

[21] J Li and P X W Zou ldquoFuzzy AHP-based risk assessmentmethodology for PPP projectsrdquo Journal of Construction Engi-neering and Management vol 137 no 12 pp 1205ndash1209 2011

[22] A del Cano and M P de la Cruz ldquoIntegrated methodology forproject risk managementrdquo Journal of Construction Engineeringand Management vol 128 no 6 pp 473ndash485 2002

[23] A P C Chan D C K Ho and C M Tam ldquoDesign andbuild project success factors multivariate analysisrdquo Journal ofConstruction Engineering and Management vol 127 no 2 pp93ndash100 2001

[24] D J Lowe and J Parvar ldquoA logistic regression approachto modelling the contractorrsquos decision to bidrdquo ConstructionManagement and Economics vol 22 no 6 pp 643ndash653 2004

[25] S H Han D Y Kim and H Kim ldquoPredicting profit per-formance for selecting candidate international constructionprojectsrdquo Journal of Construction Engineering andManagementvol 133 no 6 pp 425ndash436 2007

[26] A S Bu-Qammaz I Dikmen andM T Birgonul ldquoRisk assess-ment of international construction projects using the analyticnetwork processrdquoCanadian Journal of Civil Engineering vol 36no 7 pp 1170ndash1181 2009

[27] T Hegazy and A Ayed ldquoNeural network model for parametriccost estimation of highway projectsrdquo Journal of ConstructionEngineering and Management vol 124 no 3 pp 210ndash218 1998

[28] RuleQuest Data mining tools 2008 httpwwwrule-questcom

[29] W Jung Three-Staged Risk Evaluation Model for Bidding onInternational Construction Projects Graduate School YonseiUniversity School of Civil and Environmental EngineeringSeoul Republic of Korea 2011

[30] Construction Industry Institute (CII) ldquoRisk assessment forinternational projectsrdquo Research Report 181-11 ConstructionIndustry Institute (CII) Austin Tex USA 2003

[31] G Young ldquoGlossary and discussion of termsrdquo in MalingeringFeigning and Response Bias in PsychiatricPsychological Injuryvol 56 of International Library of Ethics Law and the NewMedicine pp 777ndash795 Springer Dordrecht The Netherlands2014

[32] M Sahami S Dumais D Heckerman and E Horvitz ldquoABayesian approach to filtering junk e-mailrdquo in Learning for TextCategorization Papers from the 1998 Workshop vol 62 pp 98ndash105 1998

[33] I Androutsopoulos J Koutsias K V Chandrinos G Paliourasand C Spyropoulos ldquoAn evaluation of Naıve Bayesian anti-spam filteringrdquo in Proceedings of the Workshop on Machine

Learning in the New Information Age 11th European Conferenceon Machine Learning G Potamias V Moustakis and M vanSomeren Eds pp 9ndash17 Barcelona Spain 2000 httparxivorgpdfcs0006013pdf

[34] H Zhang ldquoThe optimality of naıve Bayesrdquo in Proceedings of the17th International Florida Artificial Intelligence Research SocietyConference (FLAIRS rsquo04) Miami Beach Fla USA May 2004

[35] K P Murphy Naive Bayes Classifiers University of BritishColumbia Vancouver Canada 2006

[36] R R Yager ldquoAn extension of the naive Bayesian classifierrdquoInformation Sciences vol 176 no 5 pp 577ndash588 2006

[37] D Lowd and P Domingos ldquoNaive Bayes models for probabilityestimationrdquo in Proceedings of the 22nd International Conferenceon Machine Learning (ICML rsquo05) pp 529ndash536 ACM BonnGermany August 2005

[38] B Zadrozny and C Elkan ldquoLearning and making decisionswhen costs and probabilities are both unknownrdquo in Proceedingsof the 7th ACMSIGKDD International Conference on KnowledgeDiscovery and Data Mining pp 204ndash213 ACM August 2001

[39] L Jiang H Zhang Z H Cai and D Wang ldquoWeighted averageof one-dependence estimatorsrdquo Journal of Experimental andTheoretical Artificial Intelligence vol 24 no 2 pp 219ndash230 2012

[40] J Wu and Z Cai ldquoA naive Bayes probability estimation modelbased on self-adaptive differential evolutionrdquo Journal of Intelli-gent Information Systems vol 42 no 3 pp 671ndash694 2014

[41] W S Halawa A M K Abdelalim and I A ElrashedldquoFinancial evaluation program for construction projects at thepre-investment phase in developing countries a case studyrdquoInternational Journal of Project Management vol 31 no 6 pp912ndash923 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 10: Research Article ve Bayesian Classifier for Selecting Good ...downloads.hindawi.com/journals/mpe/2015/830781.pdf · Na \ve Bayesian Classifier for Selecting Good/Bad Projects during

10 Mathematical Problems in Engineering

R1 R2 R3 R4 R5 R6 R7 R8 R9 R10

Bad

10 10 10 10 10 10 10 10 10 10

1 1 0 0 2 3 1 0 0 0

0142857143 0142857143 0142857 0142857 0142857 0142857 0142857 0142857 0142857 0142857

001 001 001 001 001 001 001 001 001 001

01000 01000 00001 00001 01999 02998 01000 00001 00001 00001

Select

Result 1 1 1 1 1 1 1 1 1 1

0142857143

Moderate

31 31 31 31 31 31 31 31 31 31

0 0 0 0 0 0 0 0 0 0

0442857143 0442857143 0442857 0442857 0442857 0442857 0442857 0442857 0442857 0442857

001 001 001 001 001 001 001 001 001 001

0000142811 0000142811 0000143 0000143 0000143 0000143 0000143 0000143 0000143 0000143

Select 0 0 0 0 0 0 0 0 0 0

Result 1 1 1 1 1 1 1 1 1 1

0442857143

Good

29 29 29 29 29 29 29 29 29 29

17 10 15 12 29 12 10 19 14 9

0414285714 0414285714 0414286 0414286 0414286 0414286 0414286 0414286 0414286 0414286

001 001 001 001 001 001 001 001 001 001

0586147634 0344851529 0517206 0413793 0999798 0413793 0344852 0655089 0482735 0310381

Select 0 0 0 0 0 0 0 0 0 0

Result 1 1 1 1 1 1 1 1 1 1

0414285714

Result table Bad Good Moderate Final result Good

Max

Profitability assessment result

Project screening result Good project

Top 5 risks to be managed Probability

1 Average material unit price change 0999

2 Countryrsquos local subcontractorrsquos procurement conditions 0704

3 Complaints from construction during the project 0655

4 Stakeholderrsquos administrative and construction consent delay 0586

5 Clarity of specifications and bidding documents provided to stakeholders 0555

4

5

3

2

n

nc

p

m

p (good)

17238E minus 35

69489E minus 07

719685E minus 4769489E minus 07

n

nc

p

m

p (normal)

n

nc

p

m

p (bad)

p (ai | bad)

p (ai | normal)

p (ai | good)

Figure 6 Process for assessment result

Through the sum of its risk existence the conditional proba-bility is calculated given each classification of a project (GoodModerate or Bad) separately

Step 3 shows the result table for each classification thathas been selected from the profitability assessmentThis is thestep in which users are to perform the risk assessment fromFigure 6 where selected risks are calculated for conditionalprobability The result is calculated from the product ofeach probability from the 51 risks attributes Step 4 thenshows the final result by selecting the highest probabilityfrom each classification In the illustrative example ldquoGoodrdquohad the highest probability In addition as shown in step 5in the user interface the risks that most directly affectproject profitability can be prioritized which should be

continuouslymonitored and controlled by the userThese fiverisk attributes serve as the basis of a risk response strategy

The proposed model prioritizes the top 5 risks that affectoverall construction profitability According to Halawa et al[41] construction projects are dramatically influenced bythis financial evaluation and awareness of the financial statusimproves it offers a better chance of determining project fea-sibility Therefore this study incorporates the Naıve Bayesianclassifier with an Excel-based model to give practitioners aquantitative analysis result (statistical references) and quali-tative analysis results (the top risk attributes to be managed)It should also be noted that although the result of the modelshows an absolute value of bad moderate or good it is upto the managers and decision-makers to make a bad project

Mathematical Problems in Engineering 11

a moderate project or even good project by careful andproactive risk management as presented in the illustrativecase application

6 Conclusions

This study predicted the profitability of early-stage inter-national construction projects and identified good projectsthat may affect firmrsquos financial stability The current stateof the international construction market was first analyzedlosses were shown to account for 73 of the entire marketreflecting the importance of predicting project profitability inadvance Risk attributes that affect project profitability wereidentified and a risk management system was developed tomanage prioritize and monitor higher impact risks TheNaıve Bayesian classifier method was used to improve thepractical application of the project-screening model Whenrisk attribute assessments for determining risk exposurelevels are completed assessed risk exposure is classified intoa binary parameter that presents an approach to riskmanage-ment that is supported by a probabilistic methodology

This study used case study data to create a model usingMicrosoft Excel Depending on the Naıve Bayesian classifier119898-estimator value of the weight application and the userrsquosrisk behaviors users can choose the model that best suitsa given firm The results show that when a 70-unit trainingdataset and 70-unit sample dataset area are used (a 140-unitset in total) to cross-validate the Naıve Bayesian classifier themodel has on average a 797 prediction accuracy for the 2classification intervals model and a 448 accuracy for the 3classification intervals model Additional analysis to validatethe model is performed through linear regression analysisto show the higher precision result from the Naıve Bayesianclassification model

For project screening conditional probability is usedwithinformation acquired during early project stages Althoughrisk attributes were largely derived in the model riskattributes that more clearly affect projects can be furtherextracted elucidating the effects of bad projects

As for the limitations of this study analytical results forthe application of the developed program to a more diversescenario will be essential for example different types ofconstruction projects may be examined for superior resultsIn addition if contract types were categorized as contractorsubcontractor and joint ventures this may have altered theprofitability of each project A new-entrymarket scenariowillalso be analyzed separately due to the unique characteristicsof newmarkets Given a range of unique construction projectqualities a more realistic and practical application can beobtained in future research Furthermore if additional dataon both project risk assessment and the actual reliable profitrates are accumulated a diverse segmentation of the intervalscan be obtained to examine each project in a precisemeasure

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgment

This research was supported by a grant (14IFIP-B089065-01)funded by Ministry of Land Infrastructure and Transport ofKorean government

References

[1] Global Insight Global Construction Outlook Global InsightLexington Mass USA 2014

[2] International Contractors Association of Korea (ICAK) Inter-national Construction Information Service Construction Statis-tics 2013 httpwwwicakorkrstasta 0101php

[3] Samsung Engineering Business Results ReportThird Quarter of2013 Samsung Engineering Seoul Republic of Korea 2013

[4] M Abdelgawad and A R Fayek ldquoRisk management in theconstruction industry using combined fuzzy FMEA and fuzzyAHPrdquo Journal of Construction Engineering and Managementvol 136 no 9 pp 1028ndash1036 2010

[5] G Aydogan and A Koksal ldquoAn analysis of internationalconstruction risk factors on partner selection by applying ANPapproachrdquo in Proceedings of the International Conference onConstruction and Real Estate Management (ICCREM rsquo13) pp658ndash669 Karlsruhe Germany October 2013

[6] B-G Hwang X Zhao and L P Toh ldquoRisk management insmall construction projects in Singapore status barriers andimpactrdquo International Journal of Project Management vol 32no 1 pp 116ndash124 2014

[7] A E Yildiz I Dikmen M T Birgonul K Ercoskun andS Alten ldquoA knowledge-based risk mapping tool for costestimation of international construction projectsrdquo Automationin Construction vol 43 pp 144ndash155 2014

[8] O Taylan AO Bafail RM S Abdulaal andMRKabli ldquoCon-struction projects selection and risk assessment by fuzzy AHPand fuzzy TOPSISmethodologiesrdquoApplied Soft Computing vol17 pp 105ndash116 2014

[9] S Mu H Cheng M Chohr and W Peng ldquoAssessing riskmanagement capability of contractors in subway projects inmainland Chinardquo International Journal of Project Managementvol 32 no 3 pp 452ndash460 2014

[10] F Acebes J Pajares J M Galan and A Lopez-Paredes ldquoA newapproach for project control under uncertainty Going back tothe basicsrdquo International Journal of ProjectManagement vol 32no 3 pp 423ndash434 2014

[11] Y Zhang and Z-P Fan ldquoAn optimization method for selectingproject risk response strategiesrdquo International Journal of ProjectManagement vol 32 no 3 pp 412ndash422 2014

[12] M Rohaninejad andM Bagherpour ldquoApplication of risk analy-sis within value management a case study in dam engineeringrdquoJournal of Civil Engineering and Management vol 19 no 3 pp364ndash374 2013

[13] Q Yang and Y Zhang ldquoApplication research of AHP in projectrisk managementrdquo Advanced Materials Research vol 785-786pp 1480ndash1483 2013

[14] O Kaplinski ldquoThe utility theory in maintenance and repairstrategyrdquo Procedia Engineering vol 54 pp 604ndash614 2013

[15] M Hastak and A Shaked ldquoICRAM-1 model for internationalconstruction risk assessmentrdquo Journal of Management in Engi-neering vol 16 no 1 pp 59ndash69 2000

[16] S H Han W Jung D Y Kim and H Park ldquoA study for theimprovement of international construction risk managementrdquo

12 Mathematical Problems in Engineering

Conference of Korean Society of Civil Engineering (KSCE) no 10pp 791ndash794 2009

[17] V Kumar and N Viswanadham ldquoA CBR-based decision sup-port system framework for construction supply chain riskmanagementrdquo in Proceedings of the 3rd IEEE InternationalConference on Automation Science and Engineering (CASE rsquo07)pp 980ndash985 IEEE Scottsdale Ariz USA September 2007

[18] B Yang L X Li H Ji and J Xu ldquoAn early warning systemfor loan risk assessment using artificial neural networksrdquoKnowledge-Based Systems vol 14 no 5-6 pp 303ndash306 2001

[19] P Sanchez ldquoNeural-risk assessment system for constructionprojectsrdquo in Proceedings of the Construction Research Congresspp 1ndash11 San Diego Calif USA April 2005

[20] B R Fortunato IIIM RHallowellM Behm andKDewlaneyldquoIdentification of safety risks for high-performance sustainableconstruction projectsrdquo Journal of Construction Engineering andManagement vol 138 no 4 pp 499ndash508 2012

[21] J Li and P X W Zou ldquoFuzzy AHP-based risk assessmentmethodology for PPP projectsrdquo Journal of Construction Engi-neering and Management vol 137 no 12 pp 1205ndash1209 2011

[22] A del Cano and M P de la Cruz ldquoIntegrated methodology forproject risk managementrdquo Journal of Construction Engineeringand Management vol 128 no 6 pp 473ndash485 2002

[23] A P C Chan D C K Ho and C M Tam ldquoDesign andbuild project success factors multivariate analysisrdquo Journal ofConstruction Engineering and Management vol 127 no 2 pp93ndash100 2001

[24] D J Lowe and J Parvar ldquoA logistic regression approachto modelling the contractorrsquos decision to bidrdquo ConstructionManagement and Economics vol 22 no 6 pp 643ndash653 2004

[25] S H Han D Y Kim and H Kim ldquoPredicting profit per-formance for selecting candidate international constructionprojectsrdquo Journal of Construction Engineering andManagementvol 133 no 6 pp 425ndash436 2007

[26] A S Bu-Qammaz I Dikmen andM T Birgonul ldquoRisk assess-ment of international construction projects using the analyticnetwork processrdquoCanadian Journal of Civil Engineering vol 36no 7 pp 1170ndash1181 2009

[27] T Hegazy and A Ayed ldquoNeural network model for parametriccost estimation of highway projectsrdquo Journal of ConstructionEngineering and Management vol 124 no 3 pp 210ndash218 1998

[28] RuleQuest Data mining tools 2008 httpwwwrule-questcom

[29] W Jung Three-Staged Risk Evaluation Model for Bidding onInternational Construction Projects Graduate School YonseiUniversity School of Civil and Environmental EngineeringSeoul Republic of Korea 2011

[30] Construction Industry Institute (CII) ldquoRisk assessment forinternational projectsrdquo Research Report 181-11 ConstructionIndustry Institute (CII) Austin Tex USA 2003

[31] G Young ldquoGlossary and discussion of termsrdquo in MalingeringFeigning and Response Bias in PsychiatricPsychological Injuryvol 56 of International Library of Ethics Law and the NewMedicine pp 777ndash795 Springer Dordrecht The Netherlands2014

[32] M Sahami S Dumais D Heckerman and E Horvitz ldquoABayesian approach to filtering junk e-mailrdquo in Learning for TextCategorization Papers from the 1998 Workshop vol 62 pp 98ndash105 1998

[33] I Androutsopoulos J Koutsias K V Chandrinos G Paliourasand C Spyropoulos ldquoAn evaluation of Naıve Bayesian anti-spam filteringrdquo in Proceedings of the Workshop on Machine

Learning in the New Information Age 11th European Conferenceon Machine Learning G Potamias V Moustakis and M vanSomeren Eds pp 9ndash17 Barcelona Spain 2000 httparxivorgpdfcs0006013pdf

[34] H Zhang ldquoThe optimality of naıve Bayesrdquo in Proceedings of the17th International Florida Artificial Intelligence Research SocietyConference (FLAIRS rsquo04) Miami Beach Fla USA May 2004

[35] K P Murphy Naive Bayes Classifiers University of BritishColumbia Vancouver Canada 2006

[36] R R Yager ldquoAn extension of the naive Bayesian classifierrdquoInformation Sciences vol 176 no 5 pp 577ndash588 2006

[37] D Lowd and P Domingos ldquoNaive Bayes models for probabilityestimationrdquo in Proceedings of the 22nd International Conferenceon Machine Learning (ICML rsquo05) pp 529ndash536 ACM BonnGermany August 2005

[38] B Zadrozny and C Elkan ldquoLearning and making decisionswhen costs and probabilities are both unknownrdquo in Proceedingsof the 7th ACMSIGKDD International Conference on KnowledgeDiscovery and Data Mining pp 204ndash213 ACM August 2001

[39] L Jiang H Zhang Z H Cai and D Wang ldquoWeighted averageof one-dependence estimatorsrdquo Journal of Experimental andTheoretical Artificial Intelligence vol 24 no 2 pp 219ndash230 2012

[40] J Wu and Z Cai ldquoA naive Bayes probability estimation modelbased on self-adaptive differential evolutionrdquo Journal of Intelli-gent Information Systems vol 42 no 3 pp 671ndash694 2014

[41] W S Halawa A M K Abdelalim and I A ElrashedldquoFinancial evaluation program for construction projects at thepre-investment phase in developing countries a case studyrdquoInternational Journal of Project Management vol 31 no 6 pp912ndash923 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 11: Research Article ve Bayesian Classifier for Selecting Good ...downloads.hindawi.com/journals/mpe/2015/830781.pdf · Na \ve Bayesian Classifier for Selecting Good/Bad Projects during

Mathematical Problems in Engineering 11

a moderate project or even good project by careful andproactive risk management as presented in the illustrativecase application

6 Conclusions

This study predicted the profitability of early-stage inter-national construction projects and identified good projectsthat may affect firmrsquos financial stability The current stateof the international construction market was first analyzedlosses were shown to account for 73 of the entire marketreflecting the importance of predicting project profitability inadvance Risk attributes that affect project profitability wereidentified and a risk management system was developed tomanage prioritize and monitor higher impact risks TheNaıve Bayesian classifier method was used to improve thepractical application of the project-screening model Whenrisk attribute assessments for determining risk exposurelevels are completed assessed risk exposure is classified intoa binary parameter that presents an approach to riskmanage-ment that is supported by a probabilistic methodology

This study used case study data to create a model usingMicrosoft Excel Depending on the Naıve Bayesian classifier119898-estimator value of the weight application and the userrsquosrisk behaviors users can choose the model that best suitsa given firm The results show that when a 70-unit trainingdataset and 70-unit sample dataset area are used (a 140-unitset in total) to cross-validate the Naıve Bayesian classifier themodel has on average a 797 prediction accuracy for the 2classification intervals model and a 448 accuracy for the 3classification intervals model Additional analysis to validatethe model is performed through linear regression analysisto show the higher precision result from the Naıve Bayesianclassification model

For project screening conditional probability is usedwithinformation acquired during early project stages Althoughrisk attributes were largely derived in the model riskattributes that more clearly affect projects can be furtherextracted elucidating the effects of bad projects

As for the limitations of this study analytical results forthe application of the developed program to a more diversescenario will be essential for example different types ofconstruction projects may be examined for superior resultsIn addition if contract types were categorized as contractorsubcontractor and joint ventures this may have altered theprofitability of each project A new-entrymarket scenariowillalso be analyzed separately due to the unique characteristicsof newmarkets Given a range of unique construction projectqualities a more realistic and practical application can beobtained in future research Furthermore if additional dataon both project risk assessment and the actual reliable profitrates are accumulated a diverse segmentation of the intervalscan be obtained to examine each project in a precisemeasure

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgment

This research was supported by a grant (14IFIP-B089065-01)funded by Ministry of Land Infrastructure and Transport ofKorean government

References

[1] Global Insight Global Construction Outlook Global InsightLexington Mass USA 2014

[2] International Contractors Association of Korea (ICAK) Inter-national Construction Information Service Construction Statis-tics 2013 httpwwwicakorkrstasta 0101php

[3] Samsung Engineering Business Results ReportThird Quarter of2013 Samsung Engineering Seoul Republic of Korea 2013

[4] M Abdelgawad and A R Fayek ldquoRisk management in theconstruction industry using combined fuzzy FMEA and fuzzyAHPrdquo Journal of Construction Engineering and Managementvol 136 no 9 pp 1028ndash1036 2010

[5] G Aydogan and A Koksal ldquoAn analysis of internationalconstruction risk factors on partner selection by applying ANPapproachrdquo in Proceedings of the International Conference onConstruction and Real Estate Management (ICCREM rsquo13) pp658ndash669 Karlsruhe Germany October 2013

[6] B-G Hwang X Zhao and L P Toh ldquoRisk management insmall construction projects in Singapore status barriers andimpactrdquo International Journal of Project Management vol 32no 1 pp 116ndash124 2014

[7] A E Yildiz I Dikmen M T Birgonul K Ercoskun andS Alten ldquoA knowledge-based risk mapping tool for costestimation of international construction projectsrdquo Automationin Construction vol 43 pp 144ndash155 2014

[8] O Taylan AO Bafail RM S Abdulaal andMRKabli ldquoCon-struction projects selection and risk assessment by fuzzy AHPand fuzzy TOPSISmethodologiesrdquoApplied Soft Computing vol17 pp 105ndash116 2014

[9] S Mu H Cheng M Chohr and W Peng ldquoAssessing riskmanagement capability of contractors in subway projects inmainland Chinardquo International Journal of Project Managementvol 32 no 3 pp 452ndash460 2014

[10] F Acebes J Pajares J M Galan and A Lopez-Paredes ldquoA newapproach for project control under uncertainty Going back tothe basicsrdquo International Journal of ProjectManagement vol 32no 3 pp 423ndash434 2014

[11] Y Zhang and Z-P Fan ldquoAn optimization method for selectingproject risk response strategiesrdquo International Journal of ProjectManagement vol 32 no 3 pp 412ndash422 2014

[12] M Rohaninejad andM Bagherpour ldquoApplication of risk analy-sis within value management a case study in dam engineeringrdquoJournal of Civil Engineering and Management vol 19 no 3 pp364ndash374 2013

[13] Q Yang and Y Zhang ldquoApplication research of AHP in projectrisk managementrdquo Advanced Materials Research vol 785-786pp 1480ndash1483 2013

[14] O Kaplinski ldquoThe utility theory in maintenance and repairstrategyrdquo Procedia Engineering vol 54 pp 604ndash614 2013

[15] M Hastak and A Shaked ldquoICRAM-1 model for internationalconstruction risk assessmentrdquo Journal of Management in Engi-neering vol 16 no 1 pp 59ndash69 2000

[16] S H Han W Jung D Y Kim and H Park ldquoA study for theimprovement of international construction risk managementrdquo

12 Mathematical Problems in Engineering

Conference of Korean Society of Civil Engineering (KSCE) no 10pp 791ndash794 2009

[17] V Kumar and N Viswanadham ldquoA CBR-based decision sup-port system framework for construction supply chain riskmanagementrdquo in Proceedings of the 3rd IEEE InternationalConference on Automation Science and Engineering (CASE rsquo07)pp 980ndash985 IEEE Scottsdale Ariz USA September 2007

[18] B Yang L X Li H Ji and J Xu ldquoAn early warning systemfor loan risk assessment using artificial neural networksrdquoKnowledge-Based Systems vol 14 no 5-6 pp 303ndash306 2001

[19] P Sanchez ldquoNeural-risk assessment system for constructionprojectsrdquo in Proceedings of the Construction Research Congresspp 1ndash11 San Diego Calif USA April 2005

[20] B R Fortunato IIIM RHallowellM Behm andKDewlaneyldquoIdentification of safety risks for high-performance sustainableconstruction projectsrdquo Journal of Construction Engineering andManagement vol 138 no 4 pp 499ndash508 2012

[21] J Li and P X W Zou ldquoFuzzy AHP-based risk assessmentmethodology for PPP projectsrdquo Journal of Construction Engi-neering and Management vol 137 no 12 pp 1205ndash1209 2011

[22] A del Cano and M P de la Cruz ldquoIntegrated methodology forproject risk managementrdquo Journal of Construction Engineeringand Management vol 128 no 6 pp 473ndash485 2002

[23] A P C Chan D C K Ho and C M Tam ldquoDesign andbuild project success factors multivariate analysisrdquo Journal ofConstruction Engineering and Management vol 127 no 2 pp93ndash100 2001

[24] D J Lowe and J Parvar ldquoA logistic regression approachto modelling the contractorrsquos decision to bidrdquo ConstructionManagement and Economics vol 22 no 6 pp 643ndash653 2004

[25] S H Han D Y Kim and H Kim ldquoPredicting profit per-formance for selecting candidate international constructionprojectsrdquo Journal of Construction Engineering andManagementvol 133 no 6 pp 425ndash436 2007

[26] A S Bu-Qammaz I Dikmen andM T Birgonul ldquoRisk assess-ment of international construction projects using the analyticnetwork processrdquoCanadian Journal of Civil Engineering vol 36no 7 pp 1170ndash1181 2009

[27] T Hegazy and A Ayed ldquoNeural network model for parametriccost estimation of highway projectsrdquo Journal of ConstructionEngineering and Management vol 124 no 3 pp 210ndash218 1998

[28] RuleQuest Data mining tools 2008 httpwwwrule-questcom

[29] W Jung Three-Staged Risk Evaluation Model for Bidding onInternational Construction Projects Graduate School YonseiUniversity School of Civil and Environmental EngineeringSeoul Republic of Korea 2011

[30] Construction Industry Institute (CII) ldquoRisk assessment forinternational projectsrdquo Research Report 181-11 ConstructionIndustry Institute (CII) Austin Tex USA 2003

[31] G Young ldquoGlossary and discussion of termsrdquo in MalingeringFeigning and Response Bias in PsychiatricPsychological Injuryvol 56 of International Library of Ethics Law and the NewMedicine pp 777ndash795 Springer Dordrecht The Netherlands2014

[32] M Sahami S Dumais D Heckerman and E Horvitz ldquoABayesian approach to filtering junk e-mailrdquo in Learning for TextCategorization Papers from the 1998 Workshop vol 62 pp 98ndash105 1998

[33] I Androutsopoulos J Koutsias K V Chandrinos G Paliourasand C Spyropoulos ldquoAn evaluation of Naıve Bayesian anti-spam filteringrdquo in Proceedings of the Workshop on Machine

Learning in the New Information Age 11th European Conferenceon Machine Learning G Potamias V Moustakis and M vanSomeren Eds pp 9ndash17 Barcelona Spain 2000 httparxivorgpdfcs0006013pdf

[34] H Zhang ldquoThe optimality of naıve Bayesrdquo in Proceedings of the17th International Florida Artificial Intelligence Research SocietyConference (FLAIRS rsquo04) Miami Beach Fla USA May 2004

[35] K P Murphy Naive Bayes Classifiers University of BritishColumbia Vancouver Canada 2006

[36] R R Yager ldquoAn extension of the naive Bayesian classifierrdquoInformation Sciences vol 176 no 5 pp 577ndash588 2006

[37] D Lowd and P Domingos ldquoNaive Bayes models for probabilityestimationrdquo in Proceedings of the 22nd International Conferenceon Machine Learning (ICML rsquo05) pp 529ndash536 ACM BonnGermany August 2005

[38] B Zadrozny and C Elkan ldquoLearning and making decisionswhen costs and probabilities are both unknownrdquo in Proceedingsof the 7th ACMSIGKDD International Conference on KnowledgeDiscovery and Data Mining pp 204ndash213 ACM August 2001

[39] L Jiang H Zhang Z H Cai and D Wang ldquoWeighted averageof one-dependence estimatorsrdquo Journal of Experimental andTheoretical Artificial Intelligence vol 24 no 2 pp 219ndash230 2012

[40] J Wu and Z Cai ldquoA naive Bayes probability estimation modelbased on self-adaptive differential evolutionrdquo Journal of Intelli-gent Information Systems vol 42 no 3 pp 671ndash694 2014

[41] W S Halawa A M K Abdelalim and I A ElrashedldquoFinancial evaluation program for construction projects at thepre-investment phase in developing countries a case studyrdquoInternational Journal of Project Management vol 31 no 6 pp912ndash923 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 12: Research Article ve Bayesian Classifier for Selecting Good ...downloads.hindawi.com/journals/mpe/2015/830781.pdf · Na \ve Bayesian Classifier for Selecting Good/Bad Projects during

12 Mathematical Problems in Engineering

Conference of Korean Society of Civil Engineering (KSCE) no 10pp 791ndash794 2009

[17] V Kumar and N Viswanadham ldquoA CBR-based decision sup-port system framework for construction supply chain riskmanagementrdquo in Proceedings of the 3rd IEEE InternationalConference on Automation Science and Engineering (CASE rsquo07)pp 980ndash985 IEEE Scottsdale Ariz USA September 2007

[18] B Yang L X Li H Ji and J Xu ldquoAn early warning systemfor loan risk assessment using artificial neural networksrdquoKnowledge-Based Systems vol 14 no 5-6 pp 303ndash306 2001

[19] P Sanchez ldquoNeural-risk assessment system for constructionprojectsrdquo in Proceedings of the Construction Research Congresspp 1ndash11 San Diego Calif USA April 2005

[20] B R Fortunato IIIM RHallowellM Behm andKDewlaneyldquoIdentification of safety risks for high-performance sustainableconstruction projectsrdquo Journal of Construction Engineering andManagement vol 138 no 4 pp 499ndash508 2012

[21] J Li and P X W Zou ldquoFuzzy AHP-based risk assessmentmethodology for PPP projectsrdquo Journal of Construction Engi-neering and Management vol 137 no 12 pp 1205ndash1209 2011

[22] A del Cano and M P de la Cruz ldquoIntegrated methodology forproject risk managementrdquo Journal of Construction Engineeringand Management vol 128 no 6 pp 473ndash485 2002

[23] A P C Chan D C K Ho and C M Tam ldquoDesign andbuild project success factors multivariate analysisrdquo Journal ofConstruction Engineering and Management vol 127 no 2 pp93ndash100 2001

[24] D J Lowe and J Parvar ldquoA logistic regression approachto modelling the contractorrsquos decision to bidrdquo ConstructionManagement and Economics vol 22 no 6 pp 643ndash653 2004

[25] S H Han D Y Kim and H Kim ldquoPredicting profit per-formance for selecting candidate international constructionprojectsrdquo Journal of Construction Engineering andManagementvol 133 no 6 pp 425ndash436 2007

[26] A S Bu-Qammaz I Dikmen andM T Birgonul ldquoRisk assess-ment of international construction projects using the analyticnetwork processrdquoCanadian Journal of Civil Engineering vol 36no 7 pp 1170ndash1181 2009

[27] T Hegazy and A Ayed ldquoNeural network model for parametriccost estimation of highway projectsrdquo Journal of ConstructionEngineering and Management vol 124 no 3 pp 210ndash218 1998

[28] RuleQuest Data mining tools 2008 httpwwwrule-questcom

[29] W Jung Three-Staged Risk Evaluation Model for Bidding onInternational Construction Projects Graduate School YonseiUniversity School of Civil and Environmental EngineeringSeoul Republic of Korea 2011

[30] Construction Industry Institute (CII) ldquoRisk assessment forinternational projectsrdquo Research Report 181-11 ConstructionIndustry Institute (CII) Austin Tex USA 2003

[31] G Young ldquoGlossary and discussion of termsrdquo in MalingeringFeigning and Response Bias in PsychiatricPsychological Injuryvol 56 of International Library of Ethics Law and the NewMedicine pp 777ndash795 Springer Dordrecht The Netherlands2014

[32] M Sahami S Dumais D Heckerman and E Horvitz ldquoABayesian approach to filtering junk e-mailrdquo in Learning for TextCategorization Papers from the 1998 Workshop vol 62 pp 98ndash105 1998

[33] I Androutsopoulos J Koutsias K V Chandrinos G Paliourasand C Spyropoulos ldquoAn evaluation of Naıve Bayesian anti-spam filteringrdquo in Proceedings of the Workshop on Machine

Learning in the New Information Age 11th European Conferenceon Machine Learning G Potamias V Moustakis and M vanSomeren Eds pp 9ndash17 Barcelona Spain 2000 httparxivorgpdfcs0006013pdf

[34] H Zhang ldquoThe optimality of naıve Bayesrdquo in Proceedings of the17th International Florida Artificial Intelligence Research SocietyConference (FLAIRS rsquo04) Miami Beach Fla USA May 2004

[35] K P Murphy Naive Bayes Classifiers University of BritishColumbia Vancouver Canada 2006

[36] R R Yager ldquoAn extension of the naive Bayesian classifierrdquoInformation Sciences vol 176 no 5 pp 577ndash588 2006

[37] D Lowd and P Domingos ldquoNaive Bayes models for probabilityestimationrdquo in Proceedings of the 22nd International Conferenceon Machine Learning (ICML rsquo05) pp 529ndash536 ACM BonnGermany August 2005

[38] B Zadrozny and C Elkan ldquoLearning and making decisionswhen costs and probabilities are both unknownrdquo in Proceedingsof the 7th ACMSIGKDD International Conference on KnowledgeDiscovery and Data Mining pp 204ndash213 ACM August 2001

[39] L Jiang H Zhang Z H Cai and D Wang ldquoWeighted averageof one-dependence estimatorsrdquo Journal of Experimental andTheoretical Artificial Intelligence vol 24 no 2 pp 219ndash230 2012

[40] J Wu and Z Cai ldquoA naive Bayes probability estimation modelbased on self-adaptive differential evolutionrdquo Journal of Intelli-gent Information Systems vol 42 no 3 pp 671ndash694 2014

[41] W S Halawa A M K Abdelalim and I A ElrashedldquoFinancial evaluation program for construction projects at thepre-investment phase in developing countries a case studyrdquoInternational Journal of Project Management vol 31 no 6 pp912ndash923 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 13: Research Article ve Bayesian Classifier for Selecting Good ...downloads.hindawi.com/journals/mpe/2015/830781.pdf · Na \ve Bayesian Classifier for Selecting Good/Bad Projects during

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of