Expert Systems with Applications 38 (2011) 11078–11084

Contents lists available at ScienceDirect

Expert Systems with Applications

journal homepage: www.elsevier.com/locate/eswa

Analysis by data mining in the emergency medicine triage database at a Taiwanese regional hospital

W.T. Lin a, Y.C. Wu b,⇑, J.S. Zheng b, M.Y. Chen a

a Department of Industrial Engineering and Management, National Chin-Yi University of Technology, Taiwan
b Department of Industrial Engineering, Chung Yuan Christian University, Taiwan

Article info

Keywords: Cluster analysis; Data mining; Emergency medicine; Rough set; Triage

0957-4174/$ - see front matter © 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.eswa.2011.02.152

⇑ Corresponding author. Tel./fax: +886 4 23723808. E-mail addresses: [email protected], [email protected] (Y.C. Wu).

Abstract

"Emergency medicine" is the front line of the medical service a hospital provides, and it is the department from which people seek care immediately after an emergency occurs. Statistics from the Department of Health, Executive Yuan, indicate that the number of people visiting emergency departments has increased over the years. The US introduced the triage system into emergency medicine in 1960 to help emergency departments allocate patients: nurses and doctors quickly judge the seriousness of each patient's condition so that appropriate medical care can be given.

This study mines the knowledge contained in the massive data of unknown characteristics in the triage database of a Taiwanese regional hospital. Cluster analysis and rough set theory serve as the data mining tools. With the analysis software ROSE2 (Rough Sets Data Explorer) and a rule induction technique, rules are extracted from the imprecise, uncertain and vague information in the database, and a model is built that simplifies massive data while maintaining the accuracy of the classification rules. After analyzing and evaluating the knowledge mined from the hospital's past medical data on the consumption of emergency medical resources, this study proposes suggestions that hospitals can use as a reference for subsequently raising medical quality and reducing operating costs.

© 2011 Elsevier Ltd. All rights reserved.

1. Motivation and objectives of the research

1.1. Background and motivation of the research

The emergency department is the front line of a hospital in facing urgent patients. Its members include doctors, nurses, technicians, social workers, emergency medical technicians, administrative staff, employees and volunteers, who maintain a 24-h operation and can provide first aid, observation of held patients or surgical operation, so that the department functions like a hospital within the hospital (Shi, 2008). According to the 2007 statistics of the Department of Health, Executive Yuan, shown in Fig. 1, the daily emergency medical services provided by all hospitals in Taiwan increased from 14,405 person-visits in 1997 to 18,392 person-visits in 2007, a significant growth. Statistics from the US Centers for Disease Control and Prevention also showed an increase in the number of emergency patients from 94.9 million in 1997 to 175 million in 2001 (McCaig & Burt, 2003). Together these figures suggest a worldwide trend of continuously increasing visits to emergency departments, which keeps that environment as hectic as a battlefield.


To avoid delays in saving the truly urgent patients among the numerous visitors to the emergency room, the emergency triage system was established. The US introduced the triage system into emergency medicine in 1960 (Weiner & Edwards, 1964). The US Emergency Nurses Association published the "Standards of Emergency Nursing Practice", which specifically provides that emergency nurses should triage every patient who presents to the emergency room, from both physiological and psychological angles, to identify the priority of medical care among patients (Gilboy, Travers, & Wuerz, 1999).

Triage is the screening station set up in emergency medicine; its chief purpose is to "place the right person at the right time in the right place to use the right resources" (Chan, 2006).

This study investigates the current condition of emergency patients. Using data mining techniques, it extracts trends and reference data from the implicit and latent data on emergency patients in the hospital, and analyzes the correlation between triage, patient structure and the consumption of medical resources. The study then evaluates the mined results to present suggestions for improvement, as a reference for hospitals in subsequently raising medical quality and reducing operating costs. It is hoped that the results can serve as a basis of reference for the government's health agencies when they deliberate on manpower training and allocation in emergency medicine related units at hospitals while reviewing medical expenses and revising health insurance policies in the future. In addition, the medical patterns and trends obtained by data mining can be stored in the existing medical knowledge database to support the management of information and knowledge, which is very useful to medical institutions.

Fig. 1. Averaged daily emergency medical services provided by hospitals in Taiwan.

Fig. 2. Flow chart of knowledge discovery in database. Data source: organized by this study.

1.2. Research objectives

Taking a regional hospital as an example, this study explores patients' use of resources by analyzing the triage data of emergency patients. The study also discovers diagnostic knowledge by employing a knowledge discovery theory from the field of data mining, rough set theory (RST). Through the application of RST, data mining is conducted on the historical triage data of a Taiwanese regional hospital to uncover the implicit knowledge in the database and to build a model that can simplify massive data while maintaining the accuracy of the classification rules. The model serves as the tool for analyzing the original anamnesis data, which are massive, vague and full of uncertainty.

This study has the following objectives:

1. To use cluster analysis to group the routine triage cases and the triage modification cases in the triage database, reducing noise in the classification, and then to find the classifying model of routine triage and triage modification by classification.

2. To analyze the data and employ rough set theory to uncover the implicit knowledge in the database, and to build a model that can simplify massive data while maintaining the accuracy of the classification rules.

3. To analyze the triage database to identify the key attributes of triage and to summarize the important decision rules.

2. Literature review

This chapter comprises two parts. The first presents the definition of emergency medicine and then sorts the triage-related research conducted in Taiwan from 2000 to 2008, in the hope of clarifying the definition of triage; many studies, both domestic and foreign, also reveal the problems currently facing emergency triage. The second part describes the development of data mining, including its rise, definition, techniques and functions, and its medical applications and related research in the medical industry, so that data mining can be used as the research tool here once its techniques are understood in depth.

2.1. Definition of emergency medicine

Thanks to the convenience of its 24-7 service, people are able to make full use of the resources of emergency medicine. From the angle of medical management, however, the function of emergency medicine in a hospital is greatly different from what people think. Among the differences are the process of treating complex conditions and the urgency of care, which differ significantly from the treatment process in general in-patient service (Huang, 1993).

From the above summary, we can roughly understand the definitions and views of researchers in Taiwan and abroad about emergency medicine. The most important point they have in common is that emergency medicine broadly refers to the care of various urgent conditions that affect safety of life and health. This definition encompasses the general explanation of "emergency medicine" given by most scholars. However, in this era of rising consumer awareness, in which people strongly demand personal quality of life and physical health, patients seek assistance from emergency medicine merely because they feel under the weather or have slight pain, creating congestion in the emergency department and more workload for the medical staff there. The need to solve this problem and to help the medical staff work more efficiently gives rise to the work of "triage". To give a deeper understanding of triage, its purpose and methods are explained immediately below.

2.2. Data mining

Data mining is a technique that has emerged in recent years with the development of artificial intelligence and database techniques. It focuses on the re-analysis of databases, including the construction of models and the determination of data patterns, mainly with the purpose of discovering valuable information that is of interest to, yet unknown to, the owner of the database (Hand, Blunt, Kelly, & Adams, 2000). Data mining is a "process of automatically selecting, by computers, some important and potentially useful data types or knowledge from massive data or large database". The technique uses classification, relationship analysis, sequential analysis, cluster analysis and other statistical methods to find implicit, previously unknown yet very useful information for business operation in enormous databases. The historical data of most enterprises run to millions or tens of millions of records and are difficult to analyze; data mining tools make it possible to extract useful information from this mass of data.

Data mining is sometimes called knowledge discovery (KD); by definition, however, knowledge discovery is a non-trivial procedure for identifying valid and potentially useful patterns in data. As Fig. 2 shows, data mining is one of the important steps of knowledge discovery.

From the definitions given by scholars, it is clear that data mining is one analysis step within the series of steps that make up knowledge discovery. Over time, however, the term "data mining" has gradually replaced "knowledge discovery". In summary, the ultimate purpose of data mining is to uncover, from massive data, rules that are helpful to the decision process.

2.2.1. Theoretical techniques of data mining

The tools of data mining generally have two main functions. One is predicting future trends from the built models, to give decision makers reliable information; for example, a model built by classification can be applied to finding the financial clients most likely to incur frequent bad debts, in order to avoid granting excessive credit lines. The other is revealing unknown patterns in the data, that is, using data mining to identify patterns concealed in the database; for example, identifying from customers' online shopping records the combinations of merchandise that customers are likely to purchase, so that the decision maker can direct marketing only at certain subjects instead of spending heavily on printing and mailing that draws little response.

The theoretical techniques of data mining can be divided into conventional techniques and improved techniques. The conventional techniques are represented by statistical analysis, including descriptive statistics, probability theory, regression analysis and categorical data analysis. Because the subjects of data mining are mostly large data sets with multiple variables, the advanced methods of multivariate analysis are especially relevant: factor analysis, used to summarize variables; discriminant analysis, used to classify; and cluster analysis, used to separate groups.

The improved techniques draw on a wide range of artificial intelligence methods, including the popular decision trees, genetic algorithms, neural networks, rule induction, fuzzy logic and rough set theory.

The commonly used data mining techniques are briefly presented as follows.

1. Regression analysis: regression analysis is an analytic method offered by many statistical tools and is used especially in making economic and business decisions. Its purpose is to deal with the effects of multiple independent variables on a certain dependent variable. When using it, it is necessary to assume that the populations are independent of one another, that each follows a normal distribution, and that samples are drawn randomly from the population (Chen, 2000).

2. Discriminant analysis: this is a very suitable technique when the dependent variable of a problem is qualitative and the independent variables (predictors) are quantitative. Discriminant analysis is generally applied to solving classification problems. When the dependent variable is composed of two groups, it is called two-group discriminant analysis; when it is composed of multiple groups, it is called multiple discriminant analysis (Huang, 2003).

3. Cluster analysis: cluster analysis can be used to make a first rough grouping of data that are very complex and jumbled, or that contain too many variables or dimensions. Unlike discriminant analysis, cluster analysis uses no classification variable to divide the data. Cluster analysis is rarely used alone, because finding the groups is not the objective in itself; once the groups are detected, other methods are needed to understand the meaning of the clustering (Huang, 2003).

4. Market basket analysis: also called association rule analysis. Like cluster analysis, it is a form of grouping. It differs in that market basket analysis finds probable combinations of merchandise, which informs, e.g., purchase sequences, product display, the design of product combinations and merchandise promotion. The appeal of market basket analysis is that it applies association rules to explain the correlation between physical merchandise and why items are combined (Agrawal & Srikant, 1994).

5. Neural network: this is an information processing method that resembles a biological neural network. It uses a large number of simple, connected artificial neurons to simulate the capability of the biological neural network. With abilities of memory, learning, noise filtering and error correction in addition to high-speed computation, a neural network can solve many complex problems such as classification or prediction (Yeh, 1999).

6. Decision trees: one of the methods of creating classification models, decision trees use induction to create a tree-structured model for given data and to make predictive analysis on data of either discrete or continuous type. To classify the input data, each node in a decision tree is a test that determines whether a data record has a certain attribute value; every node thus splits the input data into several categories, forming a tree, e.g., CART (Classification and Regression Trees) or CHAID (Chi-Square Automatic Interaction Detector) (Quinlan, 1993).

7. Genetic algorithms: this is a search method over the solution space, very suitable for solving optimization problems. It employs mechanisms of natural selection and genetic evolution, such as selection, reproduction, crossover and mutation, to create new individuals. It creates an initial model in advance and then operates through a series of procedures similar to reproduction and generational propagation until the function converges to an optimal solution (Holland, 1975).

2.2.2. Medical related research using data mining techniques

At present, research on medical issues by data mining is relatively prevalent in Taiwan. The most common studies use the massive medical and patient data now available to investigate the causes of a certain disease, or use classification methods to induce the consumption of resources from the historical data in medical databases, specifically the itemization of medical fees. Data mining is also used to explore ways of reducing patient complaints that arise from improper treatment or inefficiencies, so as to improve medical quality and avoid wasting medical resources. Related literature includes Shi (2008), who improved emergency triage and physician shift scheduling by data mining analysis; in the analysis of triage accuracy, he used cluster analysis to group triage modification cases of similar nature, reducing noise in the classification, and then determined the classifying model for triage modification levels by classification.

Lai (2007) applied data mining to increase the consistency of triage classification in emergency medicine. He used three data mining techniques, and the results indicated that the back-propagation neural network performed better in predicting triage classification.

Chen (2008) constructed a management and planning system of triage knowledge, taking a medical center in Taiwan as an example. The study defined the key factors that affect triage and then employed principal components analysis, ontology and decision trees to uncover the implicit knowledge in the database.

2.3. Cluster analysis

Presently, a great number of unsupervised clustering methods have been developed, e.g., the K-Means algorithm of conventional multivariate statistics, the agglomerative clustering method, and Fuzzy C-Means (FCM), which introduces fuzzy theory into the K-Means algorithm. Additionally, there are the Self-Organizing Map (SOM) from neural networks, Fuzzy Adaptive Resonance Theory (Fuzzy ART) and the like, which work well in clustering. Nonetheless, a number of published studies have pointed out that mixed clustering methods, such as a framework that combines supervised and unsupervised learning, or a mixed framework of two unsupervised learning methods, achieve better grouping (Lin & Huang, 1999).

The Self-Organizing Map (SOM), proposed by Kohonen as an unsupervised learning algorithm (Kohonen, 1989, 1997; Kohonen, Raivio, Simula, Venta, & Henriksson, 1990), is a network model based on competitive learning. In the realm of neural networks, SOM is an outstanding data mining tool: it projects high-dimensional input patterns onto a topological grid of lower dimension, which lets the analyst see and examine the clustering properties of the data visually. When the volume of data increases, the researcher can also keep the quantitative analysis precise by adding topological nodes.

The improved two-stage clustering method proposed by Kuo, Ho, and Hu (2000) was developed by evaluating the conventional two-stage method; it uses SOM to determine the initial solutions and then substitutes them into K-Means to find the best solution.

In the first stage, the unsupervised SOM network finds the initial solution; the resulting initial groups are then substituted into K-Means, which in the second stage analyzes the nodes on the map using different distance measures. The experimental results indicate that, in terms of both efficiency and speed, this clustering method outperforms conventional direct clustering.
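The two-stage idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure of Kuo, Ho, and Hu (2000) or of SPSS Clementine: the SOM is a minimal hand-written one, the function names (`train_som`, `kmeans`, `inertia`), the learning-rate and radius schedules, and the synthetic two-dimensional data are all hypothetical choices made for the example.

```python
import math
import random

def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_som(data, rows, cols, epochs=20, seed=0):
    """Stage 1: fit a small rows x cols SOM and return its node weight vectors."""
    rng = random.Random(seed)
    coords = [(r, c) for r in range(rows) for c in range(cols)]
    nodes = [list(rng.choice(data)) for _ in coords]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):
            frac = step / total_steps
            lr = 0.5 * (1.0 - frac)  # assumed linearly decaying learning rate
            radius = 0.5 + (max(rows, cols) / 2.0) * (1.0 - frac)  # shrinking neighbourhood
            bmu = min(range(len(nodes)), key=lambda i: dist2(nodes[i], x))
            for i, w in enumerate(nodes):
                h = math.exp(-dist2(coords[i], coords[bmu]) / (2.0 * radius ** 2))
                for d in range(len(w)):
                    w[d] += lr * h * (x[d] - w[d])
            step += 1
    return nodes

def nearest(centroids, x):
    """Index of the centroid closest to x."""
    return min(range(len(centroids)), key=lambda k: dist2(centroids[k], x))

def kmeans(data, init_centroids, iters=100):
    """Stage 2: Lloyd's K-Means started from the given centroids."""
    centroids = [list(c) for c in init_centroids]
    labels = None
    for _ in range(iters):
        new_labels = [nearest(centroids, x) for x in data]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for k in range(len(centroids)):
            members = [x for x, lab in zip(data, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def inertia(data, labels, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(dist2(x, centroids[lab]) for x, lab in zip(data, labels))

# Synthetic, already-scaled data: three hypothetical patient groups in 2-D.
rng = random.Random(1)
centres = [(0.2, 0.2), (0.5, 0.8), (0.9, 0.3)]
data = [(cx + rng.gauss(0, 0.03), cy + rng.gauss(0, 0.03))
        for cx, cy in centres for _ in range(30)]

som_nodes = train_som(data, rows=1, cols=3)   # stage 1: SOM supplies the seeds
labels, centroids = kmeans(data, som_nodes)   # stage 2: K-Means refines them
print(sorted(labels.count(k) for k in range(len(centroids))))
```

The design point is that K-Means only ever lowers (or keeps) the within-cluster sum of squares relative to its starting centroids, so seeding it with SOM nodes means the second stage refines the first stage's grouping rather than discarding it.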

Fig. 3. Diagram of data mining. Data source: organized by this study.

Table 1
Synopsis of selected database fields for data mining.

Database | Selected fields
Database of registration query | Anamnesis number, triage, registration date, registration time, discharge date, discharge time, fees, health insurance
Database of physician order screen | Anamnesis number, age, triage, registration date, registration time, overstay date, overstay time
Database of triage screen | Gender, triage, chief complaint, past disease records, life sign, physician chief diagnosis (incl. codes), subject

2.4. Rough set theory

Rough set theory (RST), a mathematical method proposed by Pawlak of Poland in 1982, is used to analyze imprecise, vague and uncertain data. All the information it uses comes from the data themselves, and it requires no model assumptions; that is, an RST analysis need not satisfy any prior hypothesis about the data (Pawlak, 1991). It is also capable of unveiling the information and knowledge behind data; thus RST often works well in finding information and knowledge even when the data are vague or uncertain (Dimitras, Slowinski, Susmaga, & Zopounidis, 1999).

This study uses rough set theory as the data mining tool because of the advantages outlined below.

1. Capable of analyzing massive data.

2. Treating the data in every field as symbols, which avoids obtaining different analytic results from different magnitudes (sizes) of data values, as happens in conventional statistical analyses.

3. Capable of further dividing the affecting factors into core affecting factors and non-core affecting factors, which conventional statistical analyses cannot do.

4. Compared with other analytical techniques, capable of better predictive accuracy on data with fewer attribute factors.
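The symbolic treatment of attributes and the split into core and non-core factors can be illustrated with the standard rough set constructions: indiscernibility classes, lower and upper approximations, and the positive region. The sketch below is a minimal illustration only; it does not reproduce ROSE2, and the five-patient table, the attribute names (`age_band`, `arrival`, `period`) and the function names are hypothetical.

```python
from collections import defaultdict

def indiscernibility_classes(table, attrs):
    """Group objects that have identical values on every attribute in attrs."""
    classes = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[a] for a in attrs)].add(obj)
    return list(classes.values())

def approximations(table, attrs, target):
    """Return the (lower, upper) approximations of the set target."""
    lower, upper = set(), set()
    for cls in indiscernibility_classes(table, attrs):
        if cls <= target:   # class lies entirely inside the target: certain members
            lower |= cls
        if cls & target:    # class overlaps the target: possible members
            upper |= cls
    return lower, upper

def positive_region(table, attrs, decision):
    """Objects whose decision value is determined unambiguously by attrs."""
    pos = set()
    for value in {row[decision] for row in table.values()}:
        target = {o for o, row in table.items() if row[decision] == value}
        pos |= approximations(table, attrs, target)[0]
    return pos

def core_attributes(table, attrs, decision):
    """Attributes whose removal changes the positive region (the core)."""
    full = positive_region(table, attrs, decision)
    return [a for a in attrs
            if positive_region(table, [b for b in attrs if b != a], decision) != full]

# Hypothetical toy decision table; values are symbols, not magnitudes.
patients = {
    "p1": {"age_band": "old",   "arrival": "ambulance", "period": "night", "triage": 1},
    "p2": {"age_band": "old",   "arrival": "ambulance", "period": "night", "triage": 2},
    "p3": {"age_band": "old",   "arrival": "on_foot",   "period": "night", "triage": 3},
    "p4": {"age_band": "young", "arrival": "on_foot",   "period": "AM",    "triage": 3},
    "p5": {"age_band": "young", "arrival": "ambulance", "period": "AM",    "triage": 1},
}
conditions = ["age_band", "arrival", "period"]
level_1 = {o for o, row in patients.items() if row["triage"] == 1}

lower, upper = approximations(patients, conditions, level_1)
print(sorted(lower), sorted(upper))                    # certain vs. possible level-1 patients
print(core_attributes(patients, conditions, "triage"))
```

In this toy table p1 and p2 are indiscernible but carry different triage levels, so p1 is only a possible member of level 1, and `arrival` is the single core attribute because `age_band` and `period` duplicate each other's information.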

3. Methods

This study examines the operation mode, the allocation of emergency medical staff, and the process of patient visits and triage at the case hospital. The hospital's medical operation process shows that data can be obtained from the triage database to serve as the variables for data mining.

Once the data have been pre-processed, the focus of this study, data mining and analysis, can begin. First, the rules and models of interest are found by various mining tools and their algorithms; then, the rules are extended into concrete managerial decisions and recommendations to achieve the final objective of the research. Methodologically, this study uses two-stage cluster analysis and rough set theory.

The correspondence between the databases, the methods and the data mining algorithms used in the analysis step is shown in Fig. 3. Table 1 shows the selected database fields for data mining.

After the data sources are identified, the data to be mined are prepared. The preparation process includes selection, cleaning, establishment, integration and formatting of data.

1. Data selection: attributes are kept or dropped according to their relationship to the aim of the data mining. In this study, for example, gender, age, overstay length, insurance status, arrival, expenses, consultation, period, subject, admission and triage are picked from the database fields summarized in Table 1 as the patient's basic data.

2. Data cleaning: the second stage of data processing, which makes the data satisfy the format required by the analytic tool. Because the data source comprises three databases, which must be sorted individually and merged, Excel is used in this study to merge the data and to screen, delete and modify the data types so that they correspond to one another.

3. Data establishment: the data in the databases are converted into types that the analytic tools can process; for example, for gender, 1 and 2 are used to replace male and female.


4. Data integration: the information contained in multiple tables and data sources is integrated to generate new, complete records, and variables are converted. The databases are also merged to prepare for the subsequent data mining processes.

5. Data formatting: part of the data is converted in format for use by the analytic tools without changing the meaning of the initial data; in this study the analytic tools are used to rearrange every data attribute so that the data mining model can be constructed easily.
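The preparation steps above were carried out in Excel in this study; the following Python sketch only illustrates the same logic of merging, cleaning and recoding. It assumes that all three sources can be joined on the anamnesis number (Table 1 lists that field for two of them only), and the records, field names and function names are hypothetical. The gender coding (1 for male, 2 for female) is the one stated in the text.

```python
GENDER_CODES = {"male": 1, "female": 2}  # coding stated in the data establishment step

# Hypothetical extracts from the three source databases.
registration = [
    {"anamnesis_no": "A001", "triage": 3, "fees": 1200.0},
    {"anamnesis_no": "A002", "triage": 2, "fees": 3100.0},
]
physician_orders = [
    {"anamnesis_no": "A001", "age": 34},
    {"anamnesis_no": "A002", "age": 71},
]
triage_screen = [
    {"anamnesis_no": "A001", "gender": "Female", "subject": "internal"},
    {"anamnesis_no": "A002", "gender": None, "subject": "surgery"},  # missing value
]

def merge_sources(key, *sources):
    """Integration: combine rows that share the same key into one record."""
    merged = {}
    for source in sources:
        for row in source:
            merged.setdefault(row[key], {}).update(row)
    return merged

def clean_and_encode(merged, required):
    """Cleaning and establishment: drop incomplete records, recode gender."""
    prepared = []
    for record in merged.values():
        if any(record.get(field) is None for field in required):
            continue  # screen out records with missing required fields
        record = dict(record)  # copy so the merged data stay untouched
        record["gender"] = GENDER_CODES[record["gender"].strip().lower()]
        prepared.append(record)
    return prepared

merged = merge_sources("anamnesis_no", registration, physician_orders, triage_screen)
prepared = clean_and_encode(merged, required=("gender", "age", "triage", "fees"))
print(prepared)
```

Dropping incomplete records is one possible cleaning policy chosen for the example; the source does not say how missing values were handled.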

4. Empirical study and data analysis

4.1. Data extraction and sorting

In this study, data of the attributes of patient consultation in theemergency registration database at a hospital in central Taiwan,including visiting date, consultation, period, age, gender and thelike, were first screened and rearranged on Excel. Then, the clustertechnique of data mining and the two-stage clustering were com-bined to seek for the cluster mode of the patients consultation, and,by analysis, the research results related to emergency patient data-base and triage were obtained.

The patient data obtained from the emergency departmentwere in the number of 22,990, of which the patient distributionwas of triage. Of the total number of patients, that of triage level1 accounts for 5.94%, level 2 31.94%, level 3 61.62% and level 40.5%. It can be seen from the basic statistics that of the various lev-els of triage, levels 2 and 3 patients are predominant, and triagemostly falls on level 3, followed by levels 2, 1 and 4.

Of the attributes of patient data, some are in the type of nu-meric; some are text, which need to be converted into numerictype while the data of numeric type need segmentation to facilitatethe data analysis with subsequent software. Changing data fromtext type to numeric type is easier, which only takes defining eachtext datum by a value.

The attribute data were sorted; the attributes to be analyzed were age, gender, patient type, insurance status, period, admission, arrival, subject, triage, overstay length and expenses. Gender is divided into male and female; patient type into first visit and revisit; insurance status is either with health insurance or not (self-covered); period is divided into AM, PM and night; admission into yes and no; arrival into by ambulance, referral, on foot, outpatient, 119 and others; subject comprises internal, surgery, obstetrics and gynecology, pediatrics, dentistry and psychosomatic medicine; and triage is divided into levels 1, 2, 3 and 4.
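The text-to-numeric conversion described above can be sketched as a simple lookup. This is a minimal illustration, not the study's actual coding scheme: the field names and most of the integer codes are assumptions that follow the order of the listing. Only ambulance = 1, night = 3 and internal = 1 are suggested by Rule 1 and Table 3.

```python
# Hypothetical integer codes for the text-type attributes. The order follows
# the listing in the text; the codes actually used in the study are assumed,
# except where Rule 1 and Table 3 suggest them (ambulance = 1, night = 3,
# internal = 1).
CODEBOOK = {
    "gender": {"male": 1, "female": 2},
    "patient_type": {"first visit": 1, "revisit": 2},
    "insurance": {"insured": 1, "self-covered": 2},
    "period": {"AM": 1, "PM": 2, "night": 3},
    "admission": {"yes": 1, "no": 2},
    "arrival": {"ambulance": 1, "referral": 2, "on foot": 3,
                "outpatient": 4, "119": 5, "others": 6},
    "subject": {"internal": 1, "surgery": 2, "obstetrics and gynecology": 3,
                "pediatrics": 4, "dentistry": 5, "psychosomatic medicine": 6},
}


def encode(record):
    """Replace each text value with its code; other fields pass through."""
    return {k: CODEBOOK[k][v] if k in CODEBOOK else v for k, v in record.items()}


# An invented record, for illustration only.
example = {"gender": "female", "period": "night", "arrival": "ambulance",
           "subject": "internal", "age": 71}
encoded = encode(example)
```

Numeric fields such as age pass through unchanged here, since the study segments them separately by cluster analysis.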

Deciding how to segment each type of numeric data is a complex undertaking. To avoid the uncertainty that arbitrarily defined segmentation indices would introduce into the analysis, and to let the segmented data better represent the characteristics of every field of the database, cluster analysis was introduced to group the numeric-type data. The numeric-type fields in the basic structural data are age, medical expenses and overstay length.

4.2. Analysis by clustering technique

The statistical software SPSS Clementine 10.0 was used to group the 22,990 patient records obtained at the emergency department. As the samples in the same cluster have similar characteristics after grouping, the patient data were subjected to cluster analysis. In stage one, through training and learning with the SOM network, they were displayed visually as six groups. From the gradient shades of color in the graph, it is evident that the patient triage data are divided into six groups, as shown in Fig. 4. With the groups ascertained, it is possible to proceed to the second stage, K-Means cluster analysis.

In this study, the groups obtained by SOM were used as the initial groups of K-Means. The grouping by K-Means resulted in six clusters, as shown in Table 2: cluster 1 comprises 7513 records, with average medical expense at NT$1378.93, average age at 17.18 and average overstay length at 1.48 days; cluster 2 comprises 3205 records, with average medical expense at NT$3073.39, average age at 70.98 and average overstay length at 2.1 days; cluster 3 comprises 2433 records, with average medical expense at NT$2239.46, average age at 19.39 and average overstay length at 1.68 days; cluster 4 comprises 3626 records, with average medical expense at NT$1669.23, average age at 37.51 and average overstay length at 1.86 days; cluster 5 comprises 4336 records, with average medical expense at NT$1985.62, average age at 64.04 and average overstay length at 1.63 days; cluster 6 comprises 1877 records, with average medical expense at NT$2761.67, average age at 44.55 and average overstay length at 2.22 days.
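The second stage can be illustrated with a small K-Means routine that starts from centroids supplied by a first stage. This is a sketch under stated assumptions, not the SPSS Clementine procedure used in the study: the SOM stage is replaced by hand-picked prototype vectors, the toy records are invented, and the features are left unscaled for brevity.

```python
import math


def kmeans(points, init_centroids, max_iter=100):
    """Lloyd's K-Means started from given centroids (e.g. SOM prototypes)."""
    centroids = [tuple(c) for c in init_centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(len(centroids)),
                      key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean = tuple(sum(x) / len(members) for x in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


# Toy records (age, expense in NT$1000, overstay days); invented for illustration.
points = [(15, 1.2, 1.0), (19, 1.5, 1.5), (17, 1.4, 1.4),
          (70, 3.0, 2.0), (72, 3.2, 2.2), (68, 2.9, 2.1)]
# Stand-in for the prototypes a trained SOM would supply.
som_prototypes = [(20, 1.0, 1.0), (65, 3.0, 2.0)]
labels, centroids = kmeans(points, som_prototypes)
```

Seeding K-Means this way fixes the number of clusters in advance and makes the result less sensitive to a random initialization.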

This study further investigated the cluster with higher resource consumption, namely patients with excessive overstay length and those with high medical expenses. To fully understand the properties of emergency patients and reveal the potential consumers of emergency resources, Cluster 2 was subjected to RST analysis in order to find the decision rules.

4.3. Application of rough set theory

In this study, the software ROSE2 (Rough Sets Data Explorer), which was developed by Wilk Dari Poznan University of Technology, Poland, was used to conduct the empirical RST analysis on the 3205 records of data in Cluster 2. Out of these data, 2885 records (90%) were randomly chosen for rule induction, with the remaining 320 records (10%) serving for rule verification. This software allows users to conduct analysis in a Windows environment, and performs well in the discovery of data attributes. The attribute data were sorted, where the criteria analyzed included gender, age, period, subject, overstay length, insurance status, arrival, expenses, consultation and admission, with triage (D) as the decision attribute.
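The random 90%/10% split can be sketched as below. The function name, the fixed seed and the integer stand-ins for records are assumptions made for illustration; the study does not describe how ROSE2 or the authors drew the sample.

```python
import random


def split_records(records, n_test, seed=0):
    """Randomly hold out n_test records; the rest are used for rule induction."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    return shuffled[n_test:], shuffled[:n_test]


cluster2_ids = list(range(3205))  # stand-ins for the Cluster 2 records
train, test = split_records(cluster2_ids, n_test=320)
```

Holding out a fixed count of 320 records, rather than computing 10% of 3205, avoids any ambiguity about how 320.5 would be rounded.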

Using RST to single out the key attributes from the 3205 records and rule out unnecessary attributes can increase the precision of the analytic results and reduce the program's running time. Having integrated the data, we obtained a total of 10 criterion attributes and one decision attribute. The analysis by ROSE2 yields multiple sets of attributes. Sorting all the discovered sets gives their intersection, whose attributes are called core attributes, and their union, whose attributes, other than those in the intersection, are called non-core attributes. Utilizing RST's ability to simplify the attributes, we can find smaller sets of attributes to represent the original ones. In this study, we simplified the attributes in the decision information table.
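The distinction between core and non-core attributes reduces to two set operations over the discovered attribute sets. The sketch below uses two invented attribute sets purely for illustration; they are not the sets ROSE2 produced in this study.

```python
def core_and_noncore(reducts):
    """Core = attributes in every set; non-core = in some set but not all."""
    sets = [set(r) for r in reducts]
    core = set.intersection(*sets)
    non_core = set.union(*sets) - core
    return core, non_core


# Hypothetical attribute sets, invented for illustration only.
reducts = [
    {"period", "arrival", "subject", "expenses", "admission"},
    {"period", "arrival", "subject", "expenses", "overstay length"},
]
core, non_core = core_and_noncore(reducts)
```

With these inputs, the four attributes shared by both sets form the core, and the two attributes that appear in only one set are non-core.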

The LEM2 (Learning from Examples Module, version 2) algorithm (Pawlak, 1982) was adopted to generate the decision rules for classification in this study. This algorithm was combined with a number of induction methods to generate the decision rules, with every rule described by the equivalence class method, under the assumption that the rules form the smallest classification set; that is, the data can no longer be completely described if any rule is removed.

By the RST analysis, the decision rules were sorted; we induced the smallest set of rules from 11 attributes, and a total of 326 rules were induced from 2885 records of data, as shown in Table 3. Of these, 133 rules were generated from the patients of level 1 triage and 196 rules from those of level 2. Row 1 of Table 3 is Rule 1: If (arrival = in ambulance) & (overstay length = 1–2 days) & (period = night) & (medical expense = 4000–5000) & (subject = internal), then the triage is level 1, with 90 records found to meet this rule.
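A decision rule of this form can be represented as a set of attribute-value conditions plus a decision, and applied to an encoded record. The sketch below encodes Rule 1 with the integer codes of Table 3, assuming period = 3 stands for night; the dictionary layout, function names and sample records are hypothetical.

```python
# Rule 1, written with integer codes (period = 3 is assumed to mean night).
RULE_1 = {
    "conditions": {"arrival": 1, "overstay_length": 2, "period": 3,
                   "medical_expense": 5, "subject": 1},
    "decision": 1,  # triage level
}


def matches(rule, record):
    """True when the record satisfies every condition of the rule."""
    return all(record.get(attr) == val
               for attr, val in rule["conditions"].items())


def classify(rules, record):
    """Return the decision of the first matching rule, or None if uncovered."""
    for rule in rules:
        if matches(rule, record):
            return rule["decision"]
    return None


# Hypothetical encoded records.
patient = {"arrival": 1, "overstay_length": 2, "period": 3,
           "medical_expense": 5, "subject": 1, "gender": 2}
other = dict(patient, period=1)  # same patient, different period
```

A record that no rule matches returns `None`; such records are the ones that lower the coverage reported in the rule testing.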


Fig. 4. SOM clustering chart.

Table 2. Table of variable means in SOM + K-Means clustering.

Item                            Cluster 1  Cluster 2  Cluster 3  Cluster 4  Cluster 5  Cluster 6
Average expense (NT dollar)^a   1378.93    3073.39    2239.46    1669.23    1985.62    2761.67
Average age (year)              17.18      70.98      19.39      37.51      64.04      44.55
Average overstay length         1.48       2.10       1.68       1.86       1.63       2.22
Quantity                        7513       3205       2433       3626       4336       1877

a NT dollar = New Taiwan dollar.

Table 3. Table of RST decision rules.

Item  Record quantity  Decision rule
1     90               (arrival = 1) & (overstay length = 2) & (period = 3) & (medical expense = 5) & (subject = 1) [level 1]
2     79               (arrival = 1) & (period = 3) & (admission = 2) & (medical expense = 4) & (subject = 1) [level 1]
...   ...              ...
326   1                (arrival = 3) & (overstay length = 1) & (period = 2) & (medical expense = 4) & (subject = 1) & (gender = 2) [level 1]

Table 4. RST rule testing.

                    Predicted value
Actual value        1      2      Record quantity  Accuracy  Coverage
1                   120    8      135              0.936     0.948
2                   11     163    185              0.937     0.941
True positive rate  0.916  0.953

Total number of tested objects: 320
Total accuracy: 0.937
Total coverage: 0.944


In Table 4, the remaining 320 records of triage data were used to verify the rule induction. The quantities reported are as follows:

1. Record quantity: the number of records in the data table that are actually of a given triage level. For example, 135 means that 135 of the 320 records are actually of level 1 triage.

2. Accuracy: the rate at which the triage levels are induced correctly. For example, 0.936 = 120/(120 + 8), where 120 is the number of records that are actually of level 1 triage and were classed as level 1 by the rules, and 8 is the number of records determined as level 2 by the induction, which are induction errors.

3. Coverage: the proportion of the records of each actual triage level that the induced rules are able to identify. For example, 0.948 = (120 + 8)/135.

4. True positive rate: the rate of accurate induction ultimately achievable by the rules for each triage level. For example, 0.916 = 120/(120 + 11).


5. Total number of tested objects: the number of records tested.

6. Total accuracy: the rate of correct induction among the tested objects identifiable by the rules. For example, 0.937 = (120 + 163)/(120 + 8 + 11 + 163).

7. Total coverage: a value of 1 means that all tested objects can be identified by the rule induction (including the induction errors after identification); when the induced rules cannot identify a record that did not appear in the training stage, the total coverage becomes less than 1. For example, 0.944 = (120 + 8 + 11 + 163)/(135 + 185).
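The quantities defined in the list above can be recomputed from the counts in Table 4. The sketch below follows the paper's own definitions and formulas; the variable names are assumptions. The recomputed values agree with Table 4 to within about 0.002 (the level 1 accuracy, 120/128, evaluates to 0.9375).

```python
# Confusion counts from Table 4: confusion[actual][predicted].
confusion = {1: {1: 120, 2: 8}, 2: {1: 11, 2: 163}}
record_quantity = {1: 135, 2: 185}  # records of each actual triage level

levels = sorted(confusion)
# Records of each actual level that some rule was able to identify.
classified = {a: sum(confusion[a].values()) for a in levels}

# Per-level measures, as defined in items 2 to 4.
accuracy = {a: confusion[a][a] / classified[a] for a in levels}
coverage = {a: classified[a] / record_quantity[a] for a in levels}
true_positive_rate = {
    p: confusion[p][p] / sum(confusion[a][p] for a in levels) for p in levels
}

# Overall measures, as defined in items 6 and 7.
total_classified = sum(classified.values())
total_accuracy = sum(confusion[a][a] for a in levels) / total_classified
total_coverage = total_classified / sum(record_quantity.values())
```

Of the 320 test records, 302 were matched by some rule, so the 18 unmatched records account for the total coverage falling below 1.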

5. Conclusions

This study employed cluster analysis as the data mining tool to examine the emergency triage database of a local hospital in Taiwan. The implicit knowledge with unknown features in the huge database was analyzed by combining SOM and K-Means cluster analysis, since cluster analysis avoids the uncertainty that arbitrarily defined classification and clustering criteria introduce into the analysis of numeric-type data, and effectively segments data with different group characteristics. Then, rough set theory analysis made it possible to treat the uncertain, vague and rough data, with every field of data regarded as a symbol when read; the advantage is that, unlike conventional statistical analysis, RST analysis does not produce different analytical results from different sizes of data. It also allowed the affecting attributes to be classified into core attributes (period, arrival, gender, age, subject and medical expenses) and non-core attributes. This study combined these two techniques for data processing and as data mining tools, and achieved good results.

According to the results of this study, the patients "with longer overstay at emergency", "with higher consumption of medical expenses" and "of older average ages", identified by the two-stage cluster analysis, formed the high-risk group that consumes resources, and they shared certain similarities, as most of them were patients of level 1 and level 2 triage. In "triage", the medical expenses increased with the severity of the condition; in "subject", patients of internal medicine departments consumed the most; in "arrival", most patients arrived by ambulance. Apart from the overstay length at the emergency department, the classification of patient diseases is another key factor in monitoring emergency patients' medical expenses. Finally, rough set theory was used in this study to find the attributes of the decision rules for comparison with the original data. It was found that, in "disease classification", the triage types with high expenses were concentrated in rare diseases or severe casualties. Therefore, to control the medical expenses of emergency patients, the classification of patient diseases is, besides overstay length at emergency, also one of the key factors.

Acknowledgment

The authors would like to thank Mr. Wu Tsung-Ling for his assistance in collecting the material and helpful suggestions on an earlier version of this paper.

References

Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules. In Proceedings of 1994 international conference on very large data bases (pp. 487–499).

Chen, S. Y. (2000). Multivariate analysis. Taiwan: Hwa Tai.

Chan, C.-L. (2006). A analysis of the relationship between the triage of Emergency Department, patient structure and medical resource use. Master dissertation. Yuan Ze University.

Chen, W. Y. (2008). Construction of management and planning system for triage knowledge – an example of a Taiwanese medical center. Master dissertation, National Chin-Yi University of Technology.

Dimitras, A. I., Slowinski, R., Susmaga, R., & Zopounidis, C. (1999). Business failure prediction using rough sets. European Journal of Operational Research, 114(2), 263–280.

Gilboy, N., Travers, D. A., & Wuerz, R. (1999). Re-evaluating triage in the new millennium: A comprehensive look at the need for standardization and quality. Journal of Emergency Nursing, 25(6), 468–473.

Huang, H. N. (1993). A survey research on emergency service and patient satisfaction. Master dissertation. National Taiwan University.

Huang, J. Y. (2003). Marketing (2nd ed.). Taiwan: Book Zone.

Hand, D. J., Blunt, G., Kelly, M. G., & Adams, N. M. (2000). Data mining for fun and profit. Statistical Science, 15(2), 111–131.

Holland, J. H. (1975). Adaptation in natural and artificial systems. Ann Arbor: University of Michigan Press.

Kohonen, T. (1989). Self organization and associative memory (third ed.). Berlin: Springer-Verlag.

Kohonen, T. (1997). Self-organized maps. New York: Springer-Verlag.

Kohonen, T., Raivio, K., Simula, O., Venta, O., & Henriksson, J. (1990). Combining linear equalization and self-organizing adaptation in dynamic discrete-signal detection. In Proceedings of the international joint conference on neural networks, San Diego (pp. 223–228).

Kuo, R. J., Ho, L. M., & Hu, C. M. (2000). Integration of self-organizing feature map and K-means algorithm for marketing segmentation. Journal of Computers and Operation Research.

Lai, C. H. (2007). Data Mining applied to the predictive model of triage system in Emergency Department: A case of medical center in Taiwan. Master dissertation, National Chin-Yi University of Technology.

Lin, S. F., & Huang, C. A. (1999). Foundations of Neural Networks. Taiwan: Chuan-Hwa Books.

McCaig, L. F., & Burt, C. W. (2003). National hospital ambulatory medical care survey 2001 emergency department summary. Online Statistical of Centers for Disease Control and Prevention, available.

Pawlak, Z. (1982). Rough sets. International Journal of Computer and Information Sciences, 11, 341–356.

Pawlak, Z. (1991). Rough sets: Theoretical aspects of reasoning about data. London: Kluwer Academic Publishers.

Quinlan, J. R. (1993). C4.5: Programs for machine learning. Berlin: Springer.

Shi, Y. S. (2008). Using data mining techniques to analyze and improve for emergency triage and operation of doctor schedule. Master dissertation. National Chin-Yi University of Technology.

Weiner, E. R., & Edwards, H. R. (1964). Yales studies in ambulatory medical care changing patterns in hospital emergency services. Hospital, 38(1), 55–62.

Yeh, I. C. (1999). Application of artificial neural network model and implementation. Taiwan: Scholars Books.