
Artificial Intelligence in Medicine (2005) 34, 279—305

http://www.intl.elsevierhealth.com/journals/aiim

An optimized experimental protocol based on neuro-evolutionary algorithms
Application to the classification of dyspeptic patients and to the prediction of the effectiveness of their treatment

M. Buscema a,*, E. Grossi b, M. Intraligi a, N. Garbagna b, A. Andriulli c, M. Breda a

a Semeion Research Center for Sciences of Communication, Via Sersale 117, 00128 Rome, Italy
b Bracco Imaging S.p.A., Medical Affairs Europe, Via Egidio Folli 50, 20134 Milan, Italy
c Division of Gastroenterology, ‘‘Casa Sollievo della Sofferenza’’ Hospital, I.R.C.S.S., V.le Cappuccini, 71013 San Giovanni Rotondo (FG), Italy

Received 16 June 2004; received in revised form 6 December 2004; accepted 7 December 2004

KEYWORDS
Artificial adaptive systems; Dyspepsia; Evolutionary systems; Experimental protocol

Summary

Objective: This paper aims to present a specific optimized experimental protocol (EP) for classification and/or prediction problems. The neuro-evolutionary algorithms on which it is based and its application to two selected real cases are described in detail. The first application addresses the problem of classifying the functional (FD) or organic (OD) forms of dyspepsia; the second relates to the problem of predicting the 6-month follow-up outcome of dyspeptic patients treated by Helicobacter pylori (HP) eradication therapy.
Methods and material: The database built by the multicentre observational study performed in Italy by the NUD-look Study Group provided the material studied: a collection of data from 861 patients with previously uninvestigated dyspepsia, referred for upper gastrointestinal endoscopy to 42 Italian Endoscopic Services. The proposed EP makes use of techniques based on advanced neuro-evolutionary systems (NESs) and is structured in phases and steps. The use of specific input selection (IS) and training and testing (T&T) techniques together with the genetic doping (GenD) algorithm is described in detail, as well as the steps taken in the two benchmark and optimization protocol phases.

* Corresponding author. Tel.: +39 06 50652350; fax: +39 06 5060064. E-mail address: [email protected] (M. Buscema).

0933-3657/$ — see front matter © 2004 Elsevier B.V. All rights reserved. doi:10.1016/j.artmed.2004.12.001


Results: In terms of accuracy, a value of 79.64% was achieved during optimization for the classification task, with mean benchmark values of 64.90% for the linear discriminant analysis (LDA) and 68.15% for the multilayer perceptron (MLP). A value of 88.61% was achieved during optimization for the prediction task, with mean benchmark values of 49.32% for the LDA and 70.05% for the MLP.
Conclusions: The proposed EP has led to the construction of inductors that are viable and usable on medical data which is representative but highly nonlinear. In particular, for the classification problem, these new inductors may be effectively used on the basal examination data to support doctors in deciding whether to avoid endoscopic examinations; whereas, in the prediction problem, they may support doctors' decisions about the advisability of eradication therapy. In both cases the selected variables indicate the possibility of reducing the data collection effort and also of providing information that can be used for general investigations on symptom relevance.
© 2004 Elsevier B.V. All rights reserved.

1. Introduction

This paper describes, proposes and presents the use of advanced neural organisms and the application of a sophisticated experimental protocol (EP). The artificial neural networks (ANNs) used adopt several different highly effective learning laws and topologies, while the EP employs traditional probabilistic and intelligent non-probabilistic sampling techniques as well as intelligent input selection systems. Advanced evolutionary algorithms have been used for the intelligent sampling and input selection.

It is important to understand that when the relations between input and target data are sufficiently complex and we are interested in classification and/or prediction induction operations, the results obtained from the application of traditional statistical methods such as linear discriminant analysis (LDA) are unsatisfactory; nor is there any advantage in the use of simple ANNs. This is particularly relevant to the two cases presented in this paper.

To demonstrate the benefits of this approach the study has focused on the dyspepsia syndrome. Dyspepsia is an extremely frequent gastrointestinal disorder, found in almost a third of outpatients seen in gastroenterology departments [1] and in almost one fourth of patients encountered in a general practice setting [2].

Two distinct but significant problems arise when dealing with dyspepsia. Since dyspepsia is still an unsolved clinical enigma [3], it typically presents an initial classification problem. The majority of patients with dyspepsia do not present any definite anatomical or biochemical anomaly to explain their symptoms. These patients are usually classified as affected by functional dyspepsia (FD) [3—8], and often undergo unnecessary endoscopies, increasing the workload and waiting lists of Gastrointestinal Units.

Because of this, in the past there has been exhaustive research into the best predictors of organic dyspepsia (OD), and the following risk factors have been identified: age over 45 years, male gender, anemia, abdominal mass, weight loss, family history of gastric cancer or peptic ulcer, chronic use of non-steroidal anti-inflammatory drugs, dysphagia, cigarette smoking, jaundice and vomiting. Recently, Helicobacter pylori (HP) infection has been added to the list of risk factors, and non-invasive screenings have been proposed to decide which patients require endoscopies.

This strategy, however, does not seem to be cost-effective. When using conventional data analysis in dyspeptic patients, the unaided clinical diagnosis is often unreliable in predicting an underlying major endoscopic lesion.

A first aim of this study was to determine which signs and symptoms were predictors of organic disease in uninvestigated dyspepsia. It is very important to have viable means to directly classify the typology of disease based on data collected in the basal examination, before performing potentially useless endoscopic examinations [9]. From a social-medical point of view, this can prevent inconvenience and loss of time and money to both patients and health services. The classification system we propose in this article is able to accurately classify OD and FD following a learning phase using the basal examination data. This is described in the classification experiment (CE) [10].

In the case of functional dyspeptic patients, we also encounter a second prediction problem. For functional dyspeptic patients infected with Helicobacter pylori, a possible therapy consists of the eradication of the infection. The effectiveness of the therapy, in terms of disappearance of symptoms, is, however, a matter of debate [11—19]. In this second case it would, therefore, be very useful to have the means necessary to predict the follow-up success of a possible eradication therapy based on basal examination data [20]; from a social-medical point of view, this would support doctors in their prescription of a therapy with only probable success and it would help patients obtain the desired benefits.

The prediction system proposed in this article is able to foresee, with good accuracy, the 6-month follow-up outcomes of the therapy, based on the results of a previous learning phase in which the basal examination data is studied. This is described in the prediction experiment (PE).

It is of great medical interest to have data-driven automatic analysis able to identify the input variables fundamental to increasing the accuracy of the prediction. This is true in both cases described, in order to obtain indications about the scientific aspects of the pathology. The classification and prediction systems proposed here also perform input selections, specifically tailored to deal with this issue.

These specific classification and prediction experiments have been carried out using the NUD-look database [21]. It refers to a multicenter observational study performed in several Italian Gastrointestinal Units on outpatients manifesting symptoms of uninvestigated dyspepsia. Symptoms were scored and the HP status was determined. Patients with FD and HP infection were randomly given either a standard eradicating treatment or a 1-month course of empirical treatment. The latter was also given to functional dyspeptic patients without infection. Symptoms were re-assessed in functional dyspeptic patients during 2- and 6-month follow-up visits. Patients receiving eradicating treatment were re-tested for HP at the 2-month visit. A total of 861 patients were studied and 70.3% of them were found to be affected by FD. Helicobacter pylori infection was diagnosed in 71.8% of patients with OD and in 65.0% with FD. The data set was characterized by 98 input variables representing personal-socio-environmental, anamnestic, diagnostic and symptom data collected at the basal examination, and by two different dependent variables: (a) the verified type of dyspepsia (FD or OD); and (b) the relief of symptoms 6 months after successful eradication, in the subset of 150 HP-positive patients with FD undergoing eradication treatment.

This paper is organized as follows. Section 2 presents the material studied, describing the problems as well as the methods and techniques used to solve them; Section 3 presents the results obtained; and Section 4 discusses the results, draws conclusions, and notes suggestions for future research.

2. Material studied, methods and techniques

Section 2.1 formally defines the classification and prediction problems, making reference to the specific problems but using a general approach. Section 2.2 presents the neuro-evolutionary systems (NESs) at the basis of the general EP adopted to solve the problems described. Finally, Section 2.3 describes the optimized EP.

2.1. Clinical protocol and problem definition

The study was conducted using prospectively collected data from 861 patients (411 males, mean age of 46.2 years, range: 16—83 years) with previously uninvestigated dyspepsia, referred for upper gastrointestinal endoscopy to 42 Italian Endoscopic Services, affiliated to both district general and community hospitals. Exclusion criteria were a previous HP eradication treatment, and the concomitant occurrence of gastroesophageal reflux disease or symptoms of irritable bowel syndrome as predominant features.

At each center, data was collected before endoscopy by means of a structured questionnaire. HP status was determined on the basis of at least one of the following tests: histology, serology, rapid urease test, and urea breath test.

After endoscopy, 605 patients (70.3%) were diagnosed with FD, whereas in the remaining 256 patients a major endoscopic lesion was found. HP infection was ascertained in 393 and 183 patients with FD or OD, respectively. Patients with endoscopic lesions were treated appropriately. In HP-positive patients with FD the decision whether to eradicate the infection was left to the individual investigator participating in the study: of the 393 infected patients, 150 subjects received a PPI-based triple therapy, and the remaining ones received an empirical course of antisecretory or prokinetic drugs. The 212 uninfected patients were treated empirically. All patients with FD were followed up as outpatients with scheduled visits at 2 and 6 months.

We dealt with two distinct problems: the classification of the type of dyspepsia pathology (FD or OD), and the prediction of the 6-month follow-up outcome (based on the patient's opinion about the disappearance/persistence of most of the symptoms). Both the classification and the prediction were based on the basal examination data; the prediction concerned the patients suffering from FD, infected with HP, who were referred for and treated with eradication therapy.

Looking at the quantitative aspect, the original dataset D_V = {x_i} = {(x_{i1}, ..., x_{iV})} ⊂ R^V, i = 1, ..., N, is composed of N records of V real variables, with N = 861 and V = 98. However, during the pre-processing phase, some variables have been added in order to distribute, one per variable, the possible values of the categorical variables (maximization operation). The resulting number of variables is then C = 126 and the resulting set is D_C = {x_i} = {(x_{i1}, ..., x_{iC})} ⊂ R^C, i = 1, ..., N. The entire D_C dataset has been selected for the classification problem, together with the corresponding available target dataset T_C = {t_{C,i}} ⊂ {1, ..., K_C}, with i = 1, ..., N and K_C = 2, where the value t_{C,i} indicates the class of disease (FD or OD) of the ith patient x_i. Only some of the variables have been considered relevant for the prediction problem, and only some of the records have been used. The dataset D_P = {x_i} = {(x_{i1}, ..., x_{iP})} ⊂ R^P, i = 1, ..., M, has been extracted from the D_C dataset, with P = 111 (out of 126) maximized variables, corresponding to 89 (out of 98) non-maximized variables, and M = 150 records (out of 861). The corresponding target dataset T_P = {t_{P,i}} ⊂ {1, ..., K_P}, with i = 1, ..., M and K_P = 2, has been selected together with the D_P dataset, where the value t_{P,i} indicates the health status of the ith patient x_i (disappearance/persistence of most of the symptoms). In order to refer to either of the two problems, classification and prediction, without having to differentiate them, whenever possible a common subscript G is used instead of the C and P symbols.
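The "maximization operation" described above behaves like what is now commonly called one-hot encoding: each categorical variable is replaced by one binary variable per possible value, which is how 98 original variables become 126. A minimal sketch with toy data (the variable names are illustrative, not the actual NUD-look fields, and `pandas` is a modern stand-in for whatever preprocessing tool the authors used):

```python
import pandas as pd

# Toy stand-in for a few of the 98 basal-examination variables; the real
# NUD-look fields and codings are not reproduced here.
records = pd.DataFrame({
    "age": [46, 61, 33],                 # numeric: kept as-is
    "smoker": ["no", "yes", "no"],       # categorical: expanded
    "hp_status": ["pos", "neg", "pos"],  # categorical: expanded
})

# The "maximization operation": one binary column per possible value of
# each categorical variable, numeric variables left unchanged.
maximized = pd.get_dummies(records, columns=["smoker", "hp_status"])
print(list(maximized.columns))
# columns: ['age', 'smoker_no', 'smoker_yes', 'hp_status_neg', 'hp_status_pos']
```

The same expansion applied to all categorical fields of the real dataset would take the column count from V = 98 to C = 126.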

Multiple induction operations can be carried out for each of the two problems. The ith operation, with a specific sampling algorithm, considers a subset (learning set) D_G^[i] ⊆ D_G of records, together with the corresponding target subset T_G^[i] ⊆ T_G and, through a specific induction algorithm (inducer) A^[i], with configuration parameters Φ^[i] and initialization parameters Z^[i], builds a classifier/predictor V_{D_G^[i], A^[i], Φ^[i], Z^[i]}(·) capable of classifying/predicting all the records of a subspace S_G^[i], with D_G^[i] ⊆ S_G^[i] ⊆ R^G and:

V_{D_G^[i], A^[i], Φ^[i], Z^[i]}(·) : S_G^[i] → {1, ..., K_G},  x ↦ V_{D_G^[i], A^[i], Φ^[i], Z^[i]}(x)    (1)

The classification/prediction accuracy of each induction operation on the testing set can be evaluated with the standard arithmetic and weighted mean values, here indicated with Acc_arithmetic(V_{D_G^[i], A^[i], Φ^[i], Z^[i]}(·)) and Acc_weighted(V_{D_G^[i], A^[i], Φ^[i], Z^[i]}(·)). However, a single induction operation is not very significant. Several operations have to be performed on the same input dataset D_G and target dataset T_G, varying the training and testing sets D_G^[i] and D_G \ D_G^[i], or the induction algorithms A^[i], possibly modifying, in the latter case, their configuration and/or initialization parameters Φ^[i] and Z^[i]. Instead of a single classifier/predictor V_{D_G^[i], A^[i], Φ^[i], Z^[i]}(·), specific sets are then used, constituting a class of induction operations. Their results, statistically evaluated as a whole, make it possible to draw conclusions about the class of induction operations.
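The scheme above can be sketched as follows, with a deliberately trivial centroid inducer on synthetic data standing in for the paper's ANNs, several random train/test splits standing in for the sampling algorithm, and the weighted mean read as the mean of per-class accuracies (an assumption; the paper does not spell out its weighting):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in dataset D_G with two classes (K_G = 2); the real data has
# C = 126 maximized variables and N = 861 records.
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

def train_inducer(X_train, y_train):
    """A deliberately simple inducer A[i]: per-class centroids."""
    return {k: X_train[y_train == k].mean(axis=0) for k in (0, 1)}

def predict(model, X_test):
    # Assign each record to the class of the nearest centroid.
    d = np.stack([np.linalg.norm(X_test - model[k], axis=1) for k in (0, 1)])
    return d.argmin(axis=0)

def weighted_accuracy(y_true, y_pred):
    """Mean of per-class accuracies (one reading of the 'weighted mean')."""
    return np.mean([(y_pred[y_true == k] == k).mean() for k in (0, 1)])

# Several induction operations, varying the training/testing split D_G^[i].
acc_arith, acc_weight = [], []
for i in range(10):
    idx = rng.permutation(len(X))
    tr, te = idx[:100], idx[100:]
    model = train_inducer(X[tr], y[tr])
    pred = predict(model, X[te])
    acc_arith.append((pred == y[te]).mean())
    acc_weight.append(weighted_accuracy(y[te], pred))

print(f"arithmetic: {np.mean(acc_arith):.3f}, weighted: {np.mean(acc_weight):.3f}")
```

Statistics over the list of per-split accuracies, rather than any single value, are what characterize the class of induction operations.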

The problems assessed in the experiments described in this paper consist of classifying/predicting the specified variables through an EP that uses specific collections of classifiers/predictors, trained on the basis of deep DB information mining.

2.2. Neuro-evolutionary organisms

The proposed EP is characterized by the use of complex organisms, based on a strict interaction between several ANNs and evolutionary algorithms. In particular, two powerful NESs are exploited: the training and testing (T&T) system and the input selection (IS) system. The genetic doping (GenD) evolutionary algorithm constitutes the core of these systems. GenD, T&T and IS are presented in detail in the following three sections. GenD is described more thoroughly and more aspects are included for the purpose of review, even though not all its options are used in these specific T&T and IS applications.

2.2.1. Genetic doping
GenD is an evolutionary algorithm [22] conceived by Buscema in 1998. Unlike classic genetic algorithms [23—27], as we will see, the GenD system maintains an inner instability during the evolution. This produces a sort of continuous evolution of the evolution and a natural increase of biodiversity during the progress of the algorithm. This is mainly due to specific important characteristics, such as the definition of a species' health-aware evolutionary law (Section 2.2.1.1), the use of smart genetic operators (Section 2.2.1.2) and the adoption of a structured organization of individuals (Section 2.2.1.3).

2.2.1.1. Species' health-aware evolutionary law. The GenD behavior is based on considerations about the species' health, which can be defined as an overall level of fitness of the entire population P(k). In each kth generation the average F̄(k) of the population's fitness is calculated. Based on this, a subset V(k) of the population, called the vulnerability subset, can be defined as those individuals with a fitness value lower than or equal to the average. This subset has a cardinality N_V(k) depending on the current generation.


As we will see, two of the main genetic operators are the crossover between two individuals and the mutation of a single one. Also important, in addition to their specificities, is the law regulating the number of their occurrences in each generation. Whereas in traditional algorithms these numbers are often fixed to appropriate but arbitrary values, in GenD they are immediate functions of the species' health, represented by the average fitness. As will become clear in the following, a crossover genetic operation on two individuals may or may not be successful. The number N_ACO(k) of attempted crossover operations is defined for each generation proportional to the number of vulnerable individuals, by a factor 1/a:

N_ACO(k) = (1/a) N_V(k)    (2)

with a = 1 in the case of a crossover generating one output individual, and a = 2 in the case of a crossover generating two output individuals.

The results of these attempts will divide into a number N_SCO(k) of successful and a number N_UCO(k) of unsuccessful crossover operations:

N_ACO(k) = N_SCO(k) + N_UCO(k)    (3)

In each kth generation, all of the a·N_SCO(k) individuals obtained in output are inserted into the population P(k + 1), taking the place of the vulnerable individuals with a worse fitness value, according to a so-called weakest vulnerable substitution criterion.

The choice of which input individuals to select for a crossover is purely random. The choice of which input individuals to select for a mutation operation, as well as their number, is automatically determined by the results of the crossover operations: the individuals selected for mutation are the ‘‘vulnerables’’ that have not been substituted by new individuals produced by the crossover operations. The number N_M(k) of mutation operations, which are always successful, is consequently given by:

N_M(k) = a N_UCO(k)    (4)

This principle for selecting the individuals to which the mutation operation is applied gives a final opportunity to the ‘‘worst’’ (pathologized) individuals to contribute to the evolution of the population (last opportunity criterion).
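The per-generation bookkeeping of equations (2)–(4) can be sketched as follows; the integer rounding of N_V(k)/a and the toy fitness values are assumptions not specified in the text:

```python
def plan_generation(fitness, alpha=2):
    """Per-generation bookkeeping of GenD's vulnerability criterion.

    fitness: fitness values of the population P(k).
    alpha: 1 if a crossover yields one offspring, 2 if it yields two.
    Returns (vulnerable_indices, n_crossover_attempts).
    """
    avg = sum(fitness) / len(fitness)                            # average F(k)
    vulnerable = [i for i, f in enumerate(fitness) if f <= avg]  # subset V(k)
    n_aco = len(vulnerable) // alpha       # eq. (2), assuming integer rounding
    return vulnerable, n_aco

# Hypothetical population: average fitness 0.55, three vulnerable individuals.
fitness = [0.2, 0.9, 0.4, 0.7, 0.3, 0.8]
vulnerable, n_aco = plan_generation(fitness, alpha=2)

# Suppose one of the attempts then failed the matchability criterion:
# eq. (4) fixes the number of mutations from the unsuccessful crossovers.
n_uco = 1                    # hypothetical count of unsuccessful crossovers
n_mutations = 2 * n_uco      # eq. (4) with a = 2
print(vulnerable, n_aco, n_mutations)
```

The point to notice is the feedback: as average fitness rises, more individuals fall at or below it, so the operator counts are driven by the population's health rather than by fixed constants.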

The specification of the vulnerability subset, the dependence of the number of crossover attempts (and hence of mutations) on its cardinality, the individual selection criterion and the substitution criterion are the basis of a species' health-aware evolutionary law and, in combination, constitute the posited vulnerability criterion concept.

2.2.1.2. Smart genetic operators. The genetic operators used by GenD can be distinguished in two basic categories: one of global optimization and one of local optimization.

The former type comprises genetic operators working in each generation on selected individuals of the population, chosen according to a proper selection criterion. They are activated with occurrence numbers defined by appropriate criteria, such as the vulnerability criterion. Their success depends upon the fitness of the selected individuals, as specified by the matchability criterion, and they return new individuals who will substitute for other ones in the population, on the basis of a substitution criterion.

The latter type comprises genetic operators that, in each generation, try to transform each individual of the population. They are successful only if the fitness of the output individual is greater than that of the input individual and, if successful, substitute for the transformed input (local optimization criterion).

GenD uses two global genetic optimization operators: crossover and mutation. Both are used in all the real applications and their specificities will be described here in detail. On the other hand, many local optimization operators have been defined [28], but they are only used in specific cases not presented in this paper; their description and application are outside the scope of this paper.

Crossover. There are different possible kinds of genetic crossover operators, depending upon the type of solution space within which to operate and upon the specificities of the algorithms. GenD uses, in all cases, algorithms representing important common criteria. In addition to the previously described vulnerability criterion, linking the number of crossover attempts to average fitness, another specific criterion, called the matchability criterion, is available. According to this criterion the crossover between two individuals x^[in 1] and x^[in 2] can occur, returning one or two individuals, only if at least one of the input individuals has a fitness value equal to or greater than the average fitness value:

F(x^[in 1]) ≥ F̄  or  F(x^[in 2]) ≥ F̄    (5)

For each attempted crossover, two input individuals are randomly chosen, and they generate output (successful crossover operation) only if the matchability criterion is satisfied. The specific operation details, which define the genetic code of the output individuals, depend upon the type of solution space.

For an ordinary genetic coded problem, in which every gene can assume any value of the alphabet, GenD uses a crossover algorithm based on the usual crossover operation and on a specific rejection criterion. The former defines the pair of output individuals by exchanging the genes of the pair of input individuals before and after a crossover point. The latter, with reference to an alphabet F = {φ_1, φ_2, ..., φ_{N_F}}, defines the possible preliminary change of every gene of either of the two input individuals, only if that individual is vulnerable, to the alphabetically successive symbol (rejection) while migrating into the output individuals:

φ_i → φ_{(i+1) mod N_F}    (6)

More specifically: denoting by x^[in h]_j the jth gene of the hth input argument of the (crossover) genetic operator, by x̄^[in h]_j the rejection operation applied to the jth gene, and by # the randomly chosen crossover point, this type of crossover genetic operator between

x^[in 1] = [x^[in 1]_1, ..., x^[in 1]_{N_G}]    (7)

and

x^[in 2] = [x^[in 2]_1, ..., x^[in 2]_{N_G}]    (8)

can give rise to two other individuals only if (5) is verified (matchability criterion). In this case, supposing, for instance, the vulnerability of x^[in 2], the following individuals are generated:

x^[out 1] = [x^[in 1]_1, ..., x^[in 1]_#, x̄^[in 2]_{#+1}, ..., x̄^[in 2]_{N_G}]
and
x^[out 2] = [x̄^[in 2]_1, ..., x̄^[in 2]_#, x^[in 1]_{#+1}, ..., x^[in 1]_{N_G}]    (9)
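A sketch of one crossover attempt for the ordinary genetic coding, combining the matchability criterion (5), a random crossover point, and the rejection operator (6) applied to the genes of a vulnerable parent as they migrate into the offspring; the toy fitness function and all names are illustrative assumptions, not the authors' implementation:

```python
import random

def gend_crossover(x1, x2, fitness, avg_fitness, alphabet):
    """One GenD crossover attempt for ordinary genetic coding (sketch).

    Returns the two offspring of eq. (9), or None when the matchability
    criterion (5) fails (an unsuccessful crossover attempt).
    """
    f1, f2 = fitness(x1), fitness(x2)
    if f1 < avg_fitness and f2 < avg_fitness:
        return None                     # matchability criterion (5) fails

    def reject(gene):
        # eq. (6): move to the alphabetically next symbol, circularly
        return alphabet[(alphabet.index(gene) + 1) % len(alphabet)]

    def carry(genes, vulnerable):
        # rejection applied only to genes coming from a vulnerable parent
        return [reject(g) if vulnerable else g for g in genes]

    cut = random.randrange(1, len(x1))  # crossover point '#'
    off1 = carry(x1[:cut], f1 < avg_fitness) + carry(x2[cut:], f2 < avg_fitness)
    off2 = carry(x2[:cut], f2 < avg_fitness) + carry(x1[cut:], f1 < avg_fitness)
    return off1, off2

# Toy check: parent 1 is fit enough, parent 2 is vulnerable (rejected genes).
alphabet = ["a", "b", "c"]
fit = lambda x: x.count("a") / len(x)   # hypothetical fitness
out = gend_crossover(list("aabc"), list("bbca"), fit,
                     avg_fitness=0.3, alphabet=alphabet)
```

Whether rejection applies to all genes of the vulnerable parent or only to its exchanged tail is not fully recoverable from the text; the sketch takes the first reading.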

For the permutation genetic coded problems, in which the genes of each individual must form a permutation of the entire set of alphabet symbols, and in the special case where the fitness function can be evaluated by summing partial values F_j(x_j) over the single genes:

F(x) = Σ_{j=1}^{N_G} F_j(x_j)    (10)

GenD uses an effective specific algorithm, based on a rule preserving the permutation structure of the solution space. More specifically, having fixed the randomly chosen crossover point #, this type of crossover genetic operator between the two individuals (7) and (8) can give rise to one other individual only if (5) is verified (matchability criterion). In this case the output individual is generated according to the following algorithm steps:

(a) The crossover point # is used to define the first gene of the output individual:

x^[out]_1 = φ_#

(b) In each of the two input individuals, the unique gene having the same alphabet symbol as the output's first gene, fixed in the previous step, is found. The so-called current gene positions c_1(1) and c_2(1) of the input individuals are initialized to these positions:

x^[in 1]_{c_1(1)} = x^[in 2]_{c_2(1)} = x^[out]_1    (11)

(c) The genes of the output individual after the first are iteratively evaluated by a loop of steps. For the kth gene (k = 2, ..., N_G):

(c1) In each of the two input individuals, the current gene position is synchronously moved circularly one step forward:

c_1(i + 1) = (c_1(i) + 1) mod N_G  and  c_2(i + 1) = (c_2(i) + 1) mod N_G    (12)

until at least one of the two genes in the positions c_1(i_k) and c_2(i_k) corresponds to a symbol not already present in the output individual.

(c2) If only one input individual has, in the c_s(i_k) position, a symbol not present in the output individual, the kth gene is chosen from that input individual; otherwise, if both of them contain symbols not present in the output individual, the kth gene is chosen from the input individual that maximizes the first k composing terms of the fitness function for the output individual:

Σ_{i=1}^{k} F_i(x^[out]_i) = max_{s=1,2} { Σ_{i=1}^{k} F_i(x^[out]_i) |_{x^[out]_k = x^[in s]_{c_s(i_k)}} }    (13)
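Steps (a)–(c2) can be sketched as follows. Since the first k − 1 terms of (13) are already fixed when the kth gene is chosen, maximizing the cumulative sum reduces to maximizing the single term F_k; the `partial_fitness(j, symbol)` signature is an assumption:

```python
def permutation_crossover(p1, p2, partial_fitness, cut):
    """GenD's permutation-preserving crossover, steps (a)-(c2), as a sketch.

    p1, p2: parent permutations of the same symbol set.
    partial_fitness(j, symbol): the term F_j of eq. (10) for putting
    `symbol` at position j (hypothetical signature).
    cut: the crossover point '#', indexing the symbol that seeds the child.
    """
    n = len(p1)
    child = [p1[cut]]                                  # step (a)
    used = {p1[cut]}
    c1, c2 = p1.index(child[0]), p2.index(child[0])    # step (b)

    for k in range(1, n):                              # step (c)
        # (c1) advance both positions circularly until at least one of the
        # next genes carries a symbol not already present in the child
        while p1[(c1 + 1) % n] in used and p2[(c2 + 1) % n] in used:
            c1, c2 = (c1 + 1) % n, (c2 + 1) % n
        c1, c2 = (c1 + 1) % n, (c2 + 1) % n
        cand = [s for s in (p1[c1], p2[c2]) if s not in used]
        # (c2) take the unique fresh symbol, or the one maximizing F_k
        # (equivalent to maximizing the cumulative sum in eq. (13))
        best = max(cand, key=lambda s: partial_fitness(k, s))
        child.append(best)
        used.add(best)
    return child

# Toy check: with a constant partial fitness the child is still a permutation.
p1, p2 = [0, 1, 2, 3, 4], [3, 4, 0, 1, 2]
child = permutation_crossover(p1, p2, lambda j, s: 0.0, cut=2)
```

Because every symbol appended to the child is checked against the `used` set, the output is guaranteed to remain a valid permutation, which is the point of this specific operator.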

Mutation. The adopted genetic mutation operator follows the classic laws for ordinary and permutation genetic coded problems: random change of the alphabet value of a random gene, and exchange of two random genes, respectively. On the other hand, it follows the specific vulnerability criterion as far as the number of mutations is concerned, which is related to the average fitness as specified in (4), as well as in the selection of the individuals subject to mutation, selected as the vulnerables not substituted by the crossover.

2.2.1.3. Structured organization of individuals. In classical genetic algorithms the population is considered as a unique set. GenD can also adopt a structured organization of individuals, distinguishing them in N_T tribes:

P(k) = ∪_{t=1}^{N_T} P^[t](k)    (14)

In the tribe form of GenD, two types of genetic operators are used: the intra-tribe and the inter-tribe operators. The first type, the intra-tribe, includes global and local genetic operators as defined in the previous sections, but applied to the populations of the single tribes as if they were the complete population. All of the self-defined parameters regulating their behavior are calculated at the tribe level. The second type, the inter-tribe, includes only global genetic operators.

The evolution of the tribe form of GenD can then be broken down into two distinct phases: the intra-tribe phase and the inter-tribe phase, carried out in this order on each generation. In the first phase, parallel intra-tribe processes generate the new first-phase tribe populations P'^[t](k + 1) on the basis of the old tribe populations P^[t](k), using the intra-tribe genetic operators. In the second phase, for each pair of tribes, parallel inter-tribe processes generate the new final second-phase tribe populations P^[t](k + 1) on the basis of the first-phase tribe populations of all tribes P'^[t](k + 1), using the inter-tribe genetic operators. The intra-tribe genetic operators are the same as for the previously noted single-tribe algorithm. The inter-tribe genetic operators are derived from them, with some additional specificity necessary to consider the demands of the multi-tribe dimension. More specifically, two operators are used: the inter-crossover and the migration.

Inter-crossover. In the second, inter-tribe phase, one inter-crossover attempt is implemented on each generation for each unordered pair of tribes; an overall number N_AICO(k) of attempted inter-crossover operations is then fixed:

N_AICO(k) = N_T (N_T − 1) / 2    (15)

A random individual is selected in each attempt, both from the tth and the sth tribe: x^[t][in 1] and x^[s][in 2]. An inter-tribe matchability criterion is adopted, stating that the inter-crossover between x^[t][in 1] and x^[s][in 2] can occur only if both input individuals have fitness values equal to or greater than their respective average tribe-fitness values:

F(x^[t][in 1]) ≥ F̄^[t]  and  F(x^[s][in 2]) ≥ F̄^[s]    (16)

If this condition is verified, the inter-crossover is performed as a usual crossover, as described in the previous section, differently for the ordinary genetic coded problems and for the permutation genetic coded problems. In the first case two outputs are generated, and they have to replace one individual in each tribe of the pair. According to a solidarity criterion, the output individual with the best fitness substitutes for the individual with the worst fitness in the tribe with the worst average fitness, and the other output individual substitutes for the individual with the worst fitness in the other tribe. In the second case, that of the permutation genetic coded problems, one output individual is generated, and it has to replace one individual in only one of the two tribes. According to the single-output version of the solidarity criterion, the output individual substitutes for the individual manifesting the worst fitness in the tribe with the worst average fitness.

Migration. After the inter-crossover, one migration attempt is carried out on each generation for each ordered pair of tribes; an overall number NAMO(k) of attempted migration operations is then set:

NAMO(k) = NT(NT − 1)    (17)

In each attempt, a random individual is selected from the sth tribe: x[s][in]. A migration criterion is adopted, stating that the migration of the x[s][in] individual into the tth tribe can occur only if the fitness value of the individual to migrate is equal to or greater than the average fitness value of the tribe into which it migrates:

F(x[s][in]) ≥ F̄[t]    (18)

If this condition is verified, the migration is carried out by generating a copy of the source individual, operating a standard mutation on it, and substituting it for the individual with the worst fitness in the target tribe.
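A minimal sketch of one migration attempt, under the same caveats as above (`fitness` and `mutate` are hypothetical stand-ins; the real operator details are GenD-specific):

```python
import random

def migrate(tribe_s, tribe_t, fitness, mutate):
    """One migration attempt from tribe s into tribe t.

    A randomly chosen individual may migrate only if its fitness is at
    least the average fitness of the target tribe (Eq. 18); a mutated
    copy then replaces the target tribe's worst individual.
    """
    migrant = random.choice(tribe_s)
    avg_t = sum(fitness(x) for x in tribe_t) / len(tribe_t)
    # Migration criterion: migrant at least as fit as the target average.
    if fitness(migrant) < avg_t:
        return False
    worst = min(range(len(tribe_t)), key=lambda i: fitness(tribe_t[i]))
    tribe_t[worst] = mutate(migrant)  # copy + standard mutation
    return True
```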

2.2.1.4. GenD general performances. The performances of GenD are characterized by several definitive conditions, all of which are related to the described criteria and to the detailed operations upon which the evolutionary algorithm is based.

At the basis of all mechanisms there is the control exerted by the average fitness over the numbers of crossovers and mutations, in a feedback loop acting at the population level.

An important key feature is a type of inner instability, an evolution of the evolution, presented by GenD during its operation: the average fitness typically does not simply grow according to a quasi-monotonic law. Sudden limited drops in its value occur, after which it immediately restarts growing towards higher values. This happens more frequently as the average value increases. The reasons for this behavior are found mainly in the vulnerability criterion, based on the health-aware evolutionary law linking the number of crossover attempts and mutations to the average fitness: it enforces more genetic substitutions/changes on vulnerable individuals the more this average fitness increases. The benefit of that inner instability is significant, in that it allows the algorithm not to stall at low values and to regenerate itself continuously, evolving its own evolution.

Another significant mark is the natural increase of biodiversity that occurs as the algorithm progresses. During evolution, the genetic codes of individuals tend to differentiate and to adequately cover the solution space area near its optimum. This behavior originates from the combined effect of different aspects. The vulnerability criterion continuously increases the genetic operations on individuals as the optimum is approached. Intelligent use of the matchability criterion does not allow crossover between ‘‘bad’’, vulnerable individuals. These individuals can, however, genetically contribute to the biodiversity by effectively joining a non-vulnerable individual through the effects of the last opportunity criterion. The rejection criterion in the crossover operation improves the resulting fitness and, simultaneously, differentiates the genetic codes. Finally, the richness of operations existing in the set of local optimization genetic operators, when used, can have a significant effect on increasing biodiversity.

All of these points, together with other GenD characteristics that describe possible tribal organization, contribute to the program's most important quality: effectiveness in terms of its speed and capacity of convergence across a broad range of problems.

Detailed GenD benchmarks have been reported in [22] for both a simple ordinary genetic coded problem and a more challenging permutation genetic coded one. Here we give only an outline, showing the most important aspects. A first benchmark relates to the three colors problem [23], where an n × n matrix is to be colored avoiding color contiguity in rows and columns. A second benchmark is based on the traveling salesperson problem (TSP), in which the shortest loop between a set of distanced points of a Euclidean space has to be found.

The first problem has been used to analyse the behavior of the basic GenD engine. The classic genetic algorithm (GA) proposed by Davis [23] has been taken as reference. For these tests we have configured GenD with a single tribe and with no local optimization operators. In this way the two algorithms have the maximum possible similarity in their general structure and present major but fundamental differences only in the basic operators' logic and use. We measured the results in terms of the number of generations needed to obtain the first sub-optimum solution and the number of new sub-optimum solutions per generation.

For some chosen initialization values, when these values push the algorithm to its limit, the execution did not produce a solution (F) in a reasonable computational time (1000 generations). In such cases the average values have been computed over the successful executions only, and the relative percentage is indicated. For the classic GA the production of new solutions is equivalent to an algorithm restart. During the tests, the two algorithms proved equivalent in the production of generations per second, and the optimality of the solutions found by GenD was not inferior to that of GA. Therefore, the values reported in the following table indicate how much the essential GenD specificities described clearly result in a performance advantage for its basic engine in terms of speed (Table 1).

Table 1  Three color problem benchmark

                     GA: generations to          GenD: generations to        New solutions
                     obtain the first solution   obtain the first solution   per generation
Matrix  Individuals  Min   Av.         Max       Min   Av.        Max        Min    Av.         Max
5 × 5   50           152   295         549       50    89         129        3.6    8.9         11.6
5 × 5   100          29    145         627       30    60         81         2      6.4         17.5
6 × 6   100          104   183         273       42    54         62         0.62   0.81        1.08
7 × 7   200          510   750 (29%)   F         87    128 (85%)  F          —      1.18 (85%)  2.1
7 × 7   300          471   471 (14%)   F         84    141 (85%)  F          —      1.03 (85%)  2.02
7 × 7   400          341   535 (57%)   F         87    108        139       0.25    0.83        1.07
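As an illustration of the first benchmark, a minimal conflict-counting fitness for the three colors problem (penalizing equal-color contiguity along rows and columns of an n × n matrix) might be sketched as follows; this is a generic sketch, not the fitness actually used in [22,23].

```python
def color_conflicts(grid):
    """Count pairs of horizontally or vertically adjacent cells that
    share the same color in an n x n matrix of color indices (0-2).
    A perfect solution of the three colors problem has 0 conflicts."""
    n = len(grid)
    conflicts = 0
    for i in range(n):
        for j in range(n):
            if j + 1 < n and grid[i][j] == grid[i][j + 1]:
                conflicts += 1  # same color contiguous in a row
            if i + 1 < n and grid[i][j] == grid[i + 1][j]:
                conflicts += 1  # same color contiguous in a column
    return conflicts
```

A genetic algorithm over such matrices would then minimize `color_conflicts`, a solution being any coloring with zero conflicts.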


Table 2  TSP benchmark: fixed parameters

                 Len (path length)                                  Ind (%)
Dataset   Mst    Greedy    CGA       EGA       SA       GenD       Greedy  CGA    EGA    SA     GenD
City30    324    473       425.51    423.74    423.74   423.74     54.01   68.67  69.22  69.22  69.22
City100   606    892.05    902       782       800.34   761.53     52.80   51.16  70.96  67.93  74.33
Eil51     376    496.4     484       437       435.45   428.87     67.98   71.28  83.78  84.19  85.94
Eil76     441    606.77    632       566.95    567.79   544.37     62.41   56.69  71.44  71.25  76.56
Eil101    516    736.36    986.55    687.89    685.75   641.02     57.29   8.81   66.69  67.10  75.77
Berlin52  5988   8182.19   8618.92   7958.05   7544.36  7544.37    63.36   56.06  67.10  74.01  74.01
Bier127   93844  125023    161429    123063    123969   118562     66.78   27.98  68.86  67.90  73.66
Ch130     5072   7195.33   9425.41   6478.14   6457.17  6147.82    58.14   14.17  72.28  72.69  78.79

The second problem, the TSP, has been brought into play in order to investigate the maximum optimizing power of GenD compared to other known algorithms. In particular, four of these have been considered for comparison: (a) greedy (Reetz's version [29]); (b) classic genetic algorithm (CGA) (Klimasauskas' version [30]); (c) enhanced genetic algorithm (EGA) (Klimasauskas' version [30]); and (d) simulated annealing (SA) (Klimasauskas' version [31]). All the algorithms have been verified on eight well-known datasets: City30, City100 [29] and Eil51, Eil76, Eil101, Berlin52, Bier127, Ch130 [32]. We implemented two groups of tests: the first is based on a fixed number of generations and individuals for all algorithms, the second contains variable parameters in the attempt to obtain maximum fitness.

For the first group, we configured GenD with a high profile: five tribes, each composed of ten individuals, using local optimization operators. We configured the other algorithms with the same global number of 50 individuals. During the execution we set a limit of 10,000 generations, a number sufficient to stabilize an algorithm configured in this way.

The aim of this benchmark was to perform a global comparison of GenD with other available algorithms, rather than a deep algorithm comparison on each possible dataset. Therefore we performed only one run per algorithm and dataset pair. It is possible to obtain some answers about the power

of GenD through a global reading of the entire set of results. The results have been measured in terms of path length, both as a pure value (Len) and as a value related to the minimum spanning tree value (Mst), according to Reetz's version [29]. In the latter case the following index (Ind) has been used:

Ind = 1 − (Len − Mst)/Mst    (19)

The results show how, in these conditions, GenD always outperforms or equals the alternative algorithms (Table 2).

In the second group of tests on the TSP problem, we have varied the parameters independently for each algorithm and dataset. While in this case the results for the four reference algorithms were no better, GenD showed some improvements. In the following table we have reported all the best GenD results (Table 3).

Table 3  TSP benchmark: varied parameters

Dataset   Mst    Tribes  Individuals  Len        Ind (%)  GenD
                         per tribe                         generations
City30    324    5       10           423.74     69.22    141
City100   606    5       10           761.53     74.33    742
Eil51     376    5       10           428.87     85.94    682
Eil76     441    5       10           544.37     76.56    6014
Eil101    516    20      50           640.42     75.89    1730
Berlin52  5988   5       10           7544.37    74.01    189
Bier127   93844  5       50           118293.52  73.95    1617
Ch130     5072   50      30           6112.07    79.49    4246
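Equation (19) can be checked directly against the table values; for instance, for the Berlin52 row of Table 3, a tour of length 7544.37 over a minimum spanning tree of 5988 gives Ind of about 74.01%:

```python
def tsp_index(length, mst):
    """Quality index of Eq. (19): Ind = 1 - (Len - Mst) / Mst.
    Equals 1 (100%) when the tour length matches the minimum spanning
    tree length; lower values correspond to longer tours."""
    return 1.0 - (length - mst) / mst

# Berlin52 row: Len = 7544.37, Mst = 5988  ->  Ind ~= 74.01%
berlin52_ind = round(tsp_index(7544.37, 5988) * 100, 2)
```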

2.2.2. T&T
T&T is an evolutionary system conceived by M. Buscema in 2000. Based on the GenD evolutionary algorithm, it is able to use the different records of the available dataset intelligently in order to train and test inducers, rather than dividing training and testing set records randomly [33].

The T&T system extracts an intelligent training sample from the dataset following a non-probabilistic technique.


Given a dataset DG of N records, there are C(N, K) possible samples D_G[tr] of K records for the training, where C(N, K) denotes the binomial coefficient. Varying K, it is also Σ_{K=0..N} C(N, K) = 2^N, although only values of K with 0 < rN ≤ K ≤ (1 − r)N < N are of interest, with r usually being no smaller than 0.4.
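The size of this search space is easy to enumerate directly; a small sketch of the counting argument above (the function names are ours, not the paper's):

```python
import math

def admissible_training_sizes(n, r=0.4):
    """Sizes K of the training sample satisfying
    0 < r*N <= K <= (1 - r)*N < N, the range of interest for T&T."""
    lo = math.ceil(r * n)
    hi = math.floor((1 - r) * n)
    return range(max(lo, 1), hi + 1)

def count_candidate_splits(n, r=0.4):
    """Number of candidate training/testing partitions: the sum of
    C(N, K) over the admissible K, always strictly below 2**N."""
    return sum(math.comb(n, k) for k in admissible_training_sizes(n, r))
```

Even with the restriction on K, the number of candidate partitions grows combinatorially with N, which is why an evolutionary search rather than exhaustive enumeration is used.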

For each specific training sample D_G[tr], an inducer V(D_G[tr], A, F, Z) can be built by selecting an induction algorithm A, its configuration parameters F and its initialization parameters Z. The problem of which pair of training and testing sets should be used arises spontaneously. Single accuracy performances for each pair can be measured using the standard arithmetic and weighted mean values, whereas their mean performances, on different training sets, can be measured by performing a specific benchmark. However, since we are building and using a specific real classifier/predictor, we are more interested in the performance of the best one obtainable from the available data rather than in the performance of the class to which it belongs.

By varying the training set, we can therefore assess whether the goodness of the results obtained for each of the classifiers/predictors depends on whether the difficult parts of the dataset and the true outliers are selected in the training or not. If the dataset is representative of the real data, as assumed, these values will likely be included, giving the inducer the ability to successfully operate on this data.

Analyzing the records of the dataset before building the inducer constitutes a type of data mining aimed at extracting the maximum amount of information from the data itself. On real data, of which a sample is presented in this paper, the near-optimum inducer is expected to work with a similar level of accuracy, provided that the given data is representative.

From a more technical perspective, T&T is achieved with a GenD algorithm considering, as a space of possible solutions, the Σ_{K=rN..(1−r)N} C(N, K) < 2^N sensible partitions of records between the training and testing sets. An alphabet of two symbols is used to genetically code those solutions:

Φ_T&T = {φ_tr, φ_ts}    (20)

wherein the first symbol (φ_tr) represents those records belonging to the training set D_G[tr], and the second symbol (φ_ts) the records belonging to the testing set D_G[ts]. A single possible solution x = (D_G[tr], D_G[ts]) is then a pair of training and testing sets and is given by the vector:

x = (D_G[tr], D_G[ts]) = [x_1, x_2, ..., x_N] ∈ Φ_T&T^N,  x_i ∈ Φ_T&T    (21)
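Under Eqs. (20)-(21), a T&T individual is simply a length-N string over two symbols; a minimal decoding sketch (record indices stand in for the actual records):

```python
def decode_split(genome):
    """Decode a T&T genome into (training, testing) record indices.
    Each gene is 'tr' or 'ts', one per record of the dataset (Eq. 21)."""
    train = [i for i, g in enumerate(genome) if g == "tr"]
    test = [i for i, g in enumerate(genome) if g == "ts"]
    return train, test

train_idx, test_idx = decode_split(["tr", "ts", "tr", "tr", "ts"])
```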

T&T includes two main phases: the preliminary phase, which evaluates the parameters of the fitness function to be used on the given dataset, and the computational phase, which is used to extract the best training and testing sets that partition the dataset records.

During the preliminary phase an inducer V(D_G[tr], A, F, Z), used to evaluate the fitness of the evolving population individuals, is configured. T&T uses a standard back propagation ANN as an inducer. Its configuration parameters are evaluated during the preliminary phase: the number of hidden units, the number of layers, and some possible enhancements [34] of the standard learning law [35]. The total dataset DG is used during this phase.

Several training trials using the same total set allow us to stabilize the configuration suitable to the available dataset, as well as to establish the mean number of epochs (E0) necessary to reach total convergence. The back propagation inducer is used with both fixed configuration and initialization parameters (weights) during the subsequent computational phase.

The choice of this simple type of ANN over more complex ones is mainly due to its convergence speed and its ability to correctly address polarization issues. This is the simplest inducer used. A potential polarization side effect on the choice of the sets, potentially disturbing their use with inducers, would therefore result in a possible advantage for this simple inducer, which is otherwise usually the least successful, leaving the more useful ones unaffected.

In the computational phase, the initial population of possible solutions evolves following the GenD algorithm as applied to the specific case. The type of problem is an ordinary genetic coded one and only the simplest options are used: single tribe and global genetic operators (crossover and mutation). In such conditions GenD makes it possible to reach the desired evolution in a reasonably short time.

There are two main options in T&T behavior: basic and reverse. When using the basic behavior, in each GenD epoch the same fixed back propagation inducer is trained on the proposed D_G[tr] training set for E0 epochs and tested on the corresponding D_G[ts] testing set for each possible solution x = (D_G[tr], D_G[ts]). The arithmetic mean accuracy, calculated on the testing set, is used as the solution's fitness value.

When using the reverse behavior, in each GenD epoch the fixed back propagation inducer is first trained on the proposed D_G[tr] training set for E0 epochs and tested on the corresponding D_G[ts] testing set for each possible solution x = (D_G[tr], D_G[ts]). The inducer is then trained on the proposed D_G[ts] testing set for E0 epochs and tested on the corresponding D_G[tr] training set. The two arithmetic mean accuracies, calculated on the two sets used in the testing, are averaged and used as the solution's fitness value.

The basic behavior is faster and can be conveniently used when the dataset manifests good representativeness. The reverse behavior is indicated when there are specific representativeness problems in the dataset, in order to achieve a more precise accuracy measure.
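The two fitness options can be sketched as follows; `train_and_score(train, test)` is a hypothetical stand-in for training the fixed back propagation inducer on `train` for E0 epochs and returning its arithmetic mean accuracy on `test`.

```python
def basic_fitness(train, test, train_and_score):
    """T&T basic behavior: the fitness of a split is the accuracy on
    the testing set of an inducer trained on the training set."""
    return train_and_score(train, test)

def reverse_fitness(train, test, train_and_score):
    """T&T reverse behavior: train/test in both directions and average
    the two mean accuracies, for a less polarized measure."""
    forward = train_and_score(train, test)
    backward = train_and_score(test, train)
    return (forward + backward) / 2.0
```

The reverse variant costs two training runs per candidate split instead of one, which is why it is reserved for datasets with representativeness problems.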

Other kinds of accuracies can be considered as the fitness value produced by the back propagation inducer, for example a weighted one or a personalized formula, in addition to the arithmetic mean. These can be defined on the basis of specific industrial needs.

2.2.3. IS
IS is an evolutionary system conceived by M. Buscema in 2000, which is based on the GenD evolutionary algorithm. It is able to intelligently weigh the different variables of the available dataset, rather than burdening the inducer with learning which of them are relevant [33]. It operates as a specific evolutionary wrapper system for feature subset extraction. Unlike a filter system, which selects features on the basis of measures of the data alone, it uses the same learning algorithm for feature selection and for evaluation [36].

The IS optimized sets are built considering only a subset of variables. Given a dataset DG of N records and M variables, divided into a training set D_G[tr] and a testing set D_G[ts], there are C(M, H) possible samples D′_G, and correspondingly D′_G[tr] and D′_G[ts], of H extracted variables; varying H, it is also Σ_{H=0..M} C(M, H) = 2^M. An inducer V(D′_G[tr], A, F, Z) can be built with each sample, choosing an induction algorithm A, with its configuration parameters F and its initialization parameters Z.

It is possible to choose variables to achieve the best performance, in a mode transposed with respect to T&T, for a given pair of training and testing sets. Varying the input variables, we can consider that the goodness of the results obtained for each of the classifiers/predictors depends upon whether the chosen variables, selected together, are relevant or not. We cannot know, in principle, the exact causes of a variable's relevance or irrelevance. Variables can be considered irrelevant by the system because they manifest redundant information, already present in other variables, because they carry no information at all, or because they manifest confusing information or noise. In any case, we have no reason to include these variables when we can effectively eliminate them. As in T&T, the operation of analyzing the variables of the dataset before building the inducer constitutes another type of data mining, which also aims at extracting the maximum amount of information present in the variables.

From a more technical perspective, IS is achieved with a GenD algorithm considering, as space of solutions, the Σ_{H=0..M} C(M, H) = 2^M possible combinations of variables and using the following two-symbol alphabet:

Φ_IS = {φ_rel, φ_irrel}    (22)

where the first symbol (φ_rel) represents a variable's belonging to the set of relevant variables V_rel, and the second symbol (φ_irrel) represents a variable's belonging to the set of irrelevant variables V_irrel. A single possible solution x = (V_rel, V_irrel) is then a pair of relevant and irrelevant variable sets and is given by the vector:

x = (V_rel, V_irrel) = [x_1, x_2, ..., x_M] ∈ Φ_IS^M,  x_i ∈ Φ_IS    (23)
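An IS genome is thus a length-M mask over the variables; applying it to a record table might be sketched as follows (column indices stand in for the actual variables):

```python
def apply_feature_mask(records, genome):
    """Project each record onto the variables marked 'rel' in an IS
    genome (Eqs. 22-23), dropping the 'irrel' ones."""
    keep = [j for j, g in enumerate(genome) if g == "rel"]
    return [[row[j] for j in keep] for row in records]

rows = [[1, 2, 3], [4, 5, 6]]
reduced = apply_feature_mask(rows, ["rel", "irrel", "rel"])  # keeps columns 0 and 2
```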

In a similar way to T&T, a preliminary phase and a computational phase are used, respectively, to evaluate the parameters of the fitness function and to extract the most relevant training and testing set variables. A standard back propagation ANN is used as the inducer V(D_G[tr], A, F, Z) for the fitness function. It is configured during the preliminary phase and used during the successive computational phase with both fixed configuration and initialization parameters (weights).

In the preliminary phase, particular attention is given to configuring the ANN parameters in order to avoid potential overfitting problems, which can arise when only a small amount of data is available with a large number of variables [36,37]. The number E0 of epochs used to train the inducer is carefully evaluated so as to stop its learning phase on the training data before overfitting occurs.

In addition to selecting the best set of variables, GenD applied to IS usefully proposes other choices, with a fitness less than or equal to the best. These can be kept as a practical library of input selections, to be used when the best choice of variables is not accessible or not convenient for economic or other reasons.

As is typical for genetic wrapper systems, compared to other machine learning approaches, IS can exhibit a heavier computational time, but it also presents some useful advantages [36]. When compared to decision tree algorithms, such as ID3, C4.5 and CART, or to instance-based algorithms, like IBL, IS presents a greater performance robustness in the presence of many irrelevant features, whereas, in comparison with naive-Bayes algorithms, robustness in the presence of correlated features, even if relevant, can be observed. IS inherits the robustness of gradient-based ANNs, like back propagation [38], and the flexibility of evolutionary algorithms in exploring the space of the program solutions [25].

2.3. Experimental protocol

The EP used has two different macro phases: the benchmark phase and the optimization phase.

In the benchmark phase, two standard induction algorithms for classification/prediction are used in two series of induction operations: the standard LDA classical statistical algorithm (A[LDA]) [39-42] and an enhanced version [34] of the standard multi layer perceptron (MLP) [35] classical ANN algorithm (A[MLP]). The mean and the variance of the results reached in each series are evaluated and compared. This allows for a first assessment of the given dataset: its treatability with standard classification/prediction algorithms, its representativeness of the real data and its potential for being effectively processed during the second phase of the research protocol.

Whereas, during the benchmark phase, means and variances of accuracy values are determined using the usual inducers, during the optimization phase the accuracy maximum is searched for and estimated. This is achieved by acting on several factors of the classification/prediction problem. One of these factors is the quality of the component induction algorithms: a set of the best ANNs and organisms with powerful proprietary architectures is used to learn on the training sets and to classify and predict data. Another factor to take into consideration is the differential relevance of the input variables: the IS system is used to select the best input variables, excluding those whose information can be more confusing than useful.

Finally, optimization is also implemented through T&T, by considering the variability of the obtainable results when varying the training set D_G[i]. The accuracy value achieved is therefore an estimate of the maximum value achievable with the given dataset, using the best networks, choosing the most relevant input variables and extracting the maximum amount of information from the available records.

2.3.1. Benchmark phase
The two classification/prediction induction algorithms, respectively the standard LDA and the enhanced standard MLP, are selected during the benchmark phase.

Initially, a number B (typically B = 10) of training sets D_G[i] ⊂ DG, i = 1, ..., B, is randomly extracted, with a ratio |D_G[i]| / |DG| ≅ 0.5. Then, for each training set D_G[i], both an LDA and an MLP classifier/predictor are built. The first one, which has no specific configuration and initialization parameters, is named V(D_G[i], A[LDA], F[·], Z[·]). The second one is built with the same configuration parameters F[i′] for each i, and random initialization parameters Z[i], and is named V(D_G[i], A[MLP], F[i′], Z[i]). Each classifier/predictor is then used to classify/predict the records in the testing set DG \ D_G[i]. Their respective accuracies can then be evaluated: Acc_X(V(D_G[i], A[LDA], F[·], Z[·])) and Acc_X(V(D_G[i], A[MLP], F[i′], Z[i])), where X stands for arithmetic or weighted. The same ith sampling is also used to build another B symmetric classifiers/predictors, V(DG \ D_G[i], A[LDA], F[·], Z[·]) and V(DG \ D_G[i], A[MLP], F[i′], Z[i′]), inverting the training and the testing sets.

The accuracy values of the inverted sets could be analogous to those obtained with the non-inverted sets. In this case the selected sample is actually a good sample, being representative of a real phenomenon. On the other hand, it is also possible that this inversion contributes to depolarizing the measure. Therefore, for each induction algorithm, a performance measure can be given in terms of means (µ_X,A[LDA] and µ_X,A[MLP]) and standard deviations (σ_X,A[LDA] and σ_X,A[MLP]) of the accuracies obtained with the 2B induction operations.
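The 2B-run structure of the benchmark phase can be sketched as follows. This is only an illustration of the sampling scheme, under a loud assumption: a trivial nearest-centroid classifier stands in for the actual A[LDA] and A[MLP] inducers, and synthetic records replace the clinical dataset, which is not reproduced here.

```python
import random

def nearest_centroid_accuracy(train, test):
    """Toy inducer: classify by nearest class centroid.
    train/test: lists of (feature_tuple, label); returns accuracy on test."""
    groups = {}
    for x, label in train:
        groups.setdefault(label, []).append(x)
    centroids = {c: [sum(col) / len(col) for col in zip(*xs)]
                 for c, xs in groups.items()}
    def predict(x):
        return min(centroids, key=lambda c: sum(
            (a - b) ** 2 for a, b in zip(x, centroids[c])))
    return sum(predict(x) == label for x, label in test) / len(test)

def benchmark(dataset, B=10, seed=0):
    """B random ~50/50 splits; each split is used twice, swapping the
    training and testing roles (2B accuracy values in total).
    Returns the mean and standard deviation of the accuracies."""
    rng = random.Random(seed)
    accs = []
    for _ in range(B):
        records = dataset[:]
        rng.shuffle(records)
        half = len(records) // 2
        a, b = records[:half], records[half:]
        accs.append(nearest_centroid_accuracy(a, b))  # train a, test b
        accs.append(nearest_centroid_accuracy(b, a))  # inverted run
    mean = sum(accs) / len(accs)
    std = (sum((v - mean) ** 2 for v in accs) / len(accs)) ** 0.5
    return mean, std
```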

A schematic data flow diagram for this benchmark phase is given in Fig. 1.

These first induction operations, the LDAs, and the measures described give us some useful information about the dataset and the inductions on it, in terms of classification/prediction complexity, inducer stability on the different samples and, at the same time, dataset distribution uniformity under variation of the extracted sample. Analogous information is obtained from the second induction operations, the MLPs. Unlike LDA, however, more complex class regions in the sample space can be defined, as allowed by the MLP behavior, and it is then obviously expected that MLP will outperform LDA. The MLP mean accuracy constitutes an inferior limit for the classifications/predictions successively made in the optimization phase of the EP using more powerful inducers.

Figure 1  Benchmark phase: data flow diagram.

The MLP accuracy, when compared to that of the LDA, defines a difference whose statistical importance can be evaluated and used to obtain a sort of higher-order measure of the complexity of the dataset. A paired t-test [39] can be used in order to assess the significance of the difference between the two mean values. The paired test is preferred, with respect to the alternative unpaired test, since the comparison measure is able to take into account only the differences between the two algorithms, leaving out the variability due to the training sets. The resulting t value is compared with the critical level t_{α,d.f.} of the t-distribution, dependent on the d.f. = 2B − 1 degrees of freedom and on the desired level of significance α. In the present experimentation a level of significance α = 0.05 has been chosen. The greater the value |t_{X,A[LDA],A[MLP]}| with respect to t_{α,d.f.}, the more significant is the difference of the mean values of the accuracies measured in the benchmark phase.
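The paired t statistic over the 2B matched accuracy pairs can be computed directly; a minimal sketch (the critical value t_{α,d.f.} would still have to be looked up or computed from the t-distribution):

```python
def paired_t_statistic(acc_lda, acc_mlp):
    """Paired t statistic over matched accuracy pairs:
    t = mean(d) / (s_d / sqrt(n)), where d are the per-split
    differences and s_d their sample standard deviation. |t| is then
    compared with the critical level t_{alpha, n-1}."""
    diffs = [b - a for a, b in zip(acc_lda, acc_mlp)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean_d / (var_d / n) ** 0.5
```

Because the differences are taken split by split, the between-split variability cancels out, which is exactly why the paired form is preferred here over the unpaired test.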

When the experimentation described is carried out with insufficient data, the N-times two-fold cross validation scheme used (N × 2 CV) to handle training and testing sets [43] could create some difficulty in the learning phase for the inducers, due to the small size of the training set. Other possible schemes could be adopted, like the N-fold CV or the leave-one-out, which allow for easier learning. On the other hand, these other approaches create difficulties during the testing phase, due to the small size of the testing sets.

Our choice is therefore to configure the inducers with a learning power sufficient to understand the given data, so that the testing phase can maintain a good representativeness.

2.3.2. Optimization phase
Three sub-phases are considered in the optimization phase: the optimization of samples, the selection of the input variables and the training of the advanced ANNs.

The optimization of samples is carried out using the T&T system, based on the evolutionary algorithm GenD. T&T divides the dataset DG into two subsets, the training set D_G[tr]* and the testing set D_G[ts]*, trying to approximate the accuracy maximization of a generic neural classifier/predictor V(D_G[tr], A, F, Z):

Acc_X(V(D_G[tr]*, A, F, Z)) = Max_{D_G[tr] records} {Acc_X(V(D_G[tr], A, F, Z))}    (24)

The second sub-phase of the optimization phase operates on the training and testing sets D_G[tr]* and D_G[ts]* obtained with T&T, and performs a selection of the most relevant input variables. On this basis it generates two new training and testing sets D′_G[tr]** and D′_G[ts]**. This kind of optimization is carried out using the IS system.

From both the T&T optimized training and testing sets, generically indicated with D_G[tx]*, IS extracts the same variables, obtaining the D′_G[tx]** sets, trying to approximate the accuracy maximization of a generic neural classifier/predictor V(D′_G[tr]*, A, F, Z):

Acc_X(V(D′_G[tr]**, A, F, Z)) = Max_{D′_G[tx]* variables} {Acc_X(V(D′_G[tr]*, A, F, Z))}    (25)

Significant improvements are not expected on a good sample; it might even be impossible to successfully eliminate variables.

The third sub-phase of the optimization phase operates on the training and testing sets D′_G[tr]** and D′_G[ts]** obtained through IS, and performs the training of advanced ANNs. Several networks are trained: from classical ANNs, possibly enhanced in one or more aspects, to highly performing neural organisms. The choice of the set of networks, here called inducer algorithms A, and the choice of the relative configuration and initialization parameters F and Z are usually based on the researcher's experience and on the execution of batch processes that examine pre-configured sets of ANNs, rather than on the use of another NES. This is due to the small number of possibilities and to prior information about the behavior of the networks and about the problem.

The use of another NES could be justified in specific situations. In any case, the processing performed is again an optimization operation on the inducer algorithms A and the relative configuration and initialization parameters F and Z, in which the values A*, F* and Z* are optimized, trying to approximate the accuracy maximization of a generic neural classifier/predictor V(D′_G[tr]**, A, F, Z):

Acc_X(V(D′_G[tr]**, A*, F*, Z*)) = Max_{A,F,Z} {Acc_X(V(D′_G[tr]**, A, F, Z))}    (26)
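The three sub-phases chain together as successive arg-max searches, Eq. (24) feeding Eq. (25) feeding Eq. (26). A schematic sketch of that chaining (all three search functions are hypothetical stand-ins for the actual T&T, IS and network-selection runs):

```python
def optimization_phase(dataset, tt_search, is_search, net_search):
    """Chain the three optimization sub-phases of the EP.

    tt_search:  dataset -> (train*, test*)             (Eq. 24, T&T)
    is_search:  (train*, test*) -> (train**, test**)   (Eq. 25, IS)
    net_search: (train**, test**) -> best inducer      (Eq. 26)
    """
    train_s, test_s = tt_search(dataset)             # optimize the split
    train_ss, test_ss = is_search(train_s, test_s)   # select input variables
    best_inducer = net_search(train_ss, test_ss)     # pick A*, F*, Z*
    return best_inducer
```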

A schematic data flow diagram for this optimization phase is given in Fig. 2.

Figure 2  Optimization phase: data flow diagram.

3. Results

We applied the EP described in Section 2.3, composed of the two benchmark and optimization phases, to the two experiments which were the object of the study: the classification of the type of pathology (FD or OD) and the prediction of the 6-month follow-up outcome in patients suffering from FD, infected with HP and treated with eradication therapy.

3.1. Classification of FD and OD

The CE selected the original DV dataset, with N = 861 records and V = 98 variables, and transformed it, through a maximization process, into the DC dataset with N = 861 records and C = 126 variables. The 98/126 selected variables, inputs to the inducers, are reported in Table 4.

The N = 861 records were composed of unbalanced cases, in which:

|{x_j | x_j ∈ DC, t_C,j = 1}| = 256 cases of OD

and

|{x_j | x_j ∈ DC, t_C,j = 2}| = 605 cases of FD,

giving an OD versus FD cases ratio for the sample:

|{x_j | x_j ∈ DC, t_C,j = 1}| / |{x_j | x_j ∈ DC, t_C,j = 2}| = 256/605 ≅ 0.42    (27)

3.1.1. Benchmark phase
In the benchmark phase, B = 10 training sets D_C[i] ⊂ DC, i = 1, ..., B have been randomly extracted, with


Table 4  Variables for the experiments

Variables    Classification set / Classification selected set / Prediction set / Prediction selected set

Personal data: social environmental
1 Gender  Male  1 1 1 1
          Female  2 2 2 2
2 Body weight  In kg  3 3 3
3 Smoking  Never; no; yes  4 4 4 3
4 Alcohol  No; yes  5 5 5 4
5 Age  In years  6 6 6 5
6 Height  In cm  7 7 7 6
7 Origin  General practitioner  8 8 8 7
          Specialist doctor  9 9 9 8
          Other  10 10 10
          Not specified  11 11 11
8 Job  Manual work  12 12 12
       Professional  13 13 13 9
       Housewife  14 14 14 10
       Rural  15 15 15
       Employee  16 16 16
       Unemployed  17 17 17
       Retired  18 18 18
       Student  19 19 19
       Not specified  20 20 20
9 Education  Primary school  21 21
             Junior high school  22 21 22
             Senior high school  23 22 23
             University  24 23 24
             Not specified  25 25 11
10 Marital status  Married  26 26 12
                   Not married  27 24 27 13
                   Divorced  28 28
                   Widow/widower  29 25 29
                   Not specified  30 26 30 14
11 First gastroenterological visit  Yes/no  31 31 15

Previous gastrointestinal diagnosis
12 Duodenal ulcer  Yes/no  32 32
13 Gastric ulcer  Yes/no  33 27
14 GERD  Yes/no  34 28 33
15 Chronic gastritis A  Yes/no  35 29 34 16
16 Chronic gastritis B  Absent  36 30 35 17
                        Present  37 31
                        Hyperaemic  38 32
                        Erosive  39 33
                        Antral  40 34
                        Not erosive  41 35
17 IBD  Yes/no  42 36
18 Cholecystopathy  Yes/no  43 37
19 Irritable bowel  Yes/no  44 36
20 Previous melaena  Yes/no  45 38 37

Previous diagnostic tests
21 Clinical test  Yes/no  46 39 38
22 Direct Rx  Yes/no  47 40 39
23 Phmetry  Yes/no  48 41
24 Cholecystography  Yes/no  49 42 40
25 Muddy clyster  Yes/no  50 43 41
26 Colonoscopy  Yes/no  51 44 42

294 M. Buscema et al.

Table 4 (Continued )

Variables Classificationset

Classificationselected set

Predictionset

Predictionselectedset

27 Hepatic echography Yes/no 52 45 4328 Gastroscopy Yes/no 53 46 4429 Other tests Yes/no 54 47 45 18

Previous gastroenterologic therapy30 Anti H2 Yes/no 55 48 4631 IPP Yes/no 56 49 47 1932 Prokinetics Yes/no 57 50 4833 Cytoprotectors Yes/no 58 51 4934 Antiacids Yes/no 59 50 2035 Anxiolytics Yes/no 60 52 5136 Bile acids Yes/no 61 5337 Eradication therapy Yes/no 62 54 5238 Other medicines Yes/no 63 53

Last month therapy39 Antibiotic Yes/no 64 5440 Antisecretories Yes/no 65 55 5541 FANS Yes/no 66 56 56 2142 Cytoprotectors Yes/no 67 57 5743 Other therapy Yes/no 68 58 58

Known factors for organic aetiology44 Anaemia Yes/no 69 59 59 2245 Abdominal mass Yes/no 70 6046 Ponderal loss Yes/no 71 61 60 2347 Familiarity history

for gastric cancerYes/no 72 62 61

48 Familiarity historyfor peptic ulcer

Yes/no 73 63 62

49 Dysphagia Yes/no 74 6450 Smoking

(>10 cigarettes per day)Yes/no 75 63

51 Jaundice Yes/no 7652 Recurrent vomiting Yes/no 77 65 64

Diagnostic module53 Epigastric pain Yes/no 78 66 65 2454 High frequency pain Yes/no 79 67 66 2555 High intensity pain Yes/no 80 68 67 2656 Localized pain Yes/no 81 69 6857 Pain reduced by food Yes/no 82 70 69 2758 Pain reduced by antiacids Yes/no 83 70 2859 Pain present on an empty

stomachYes/no 84 71

60 Waking up pain at night Yes/no 85 71 72 2961 Epigastric pyrosis Yes/no 86 7362 High frequency pyrosis Yes/no 87 72 7463 High intensity pyrosis Yes/no 88 73 75 3064 Discomfort Yes/no 89 74 7665 High frequency discomfort Yes/no 90 75 77 3166 High intensity discomfort Yes/no 91 76 78 3267 Localized discomfort Yes/no 92 77 7968 Discomfort burst by food Yes/no 93 80 3369 Early satiety Yes/no 94 8170 Vomiting Yes/no 95 78 8271 Nausea Yes/no 96 79 83 34

Optimized experimental protocol based on neuro-evolutionary algorithms 295

Table 4 (Continued )

Variables Classificationset

Classificationselected set

Predictionset

Predictionselectedset

72 Bloating Yes/no 97 80 8473 Dyspepsia classification Ulcer like 98 81 85 35

Dismotility like 99 86 36Unspecified 100 82 87

Connected symptoms74 Retrosternal pain Yes/no 101 83 88 3775 High frequency

retrosternal painYes/no 102 89 38

76 High intensityretrosternal pain

Yes/no 103 84 90

77 Regurgitation Yes/no 104 91 4978 High frequency regurgitation Yes/no 105 85 92 4079 High intensity regurgitation Yes/no 106 93 4180 Abdominal widespread pain Yes/no 107 94 4281 High frequency

abdominal painYes/no 108 86 95

82 High intensityabdominal pain

Yes/no 109 87 96

83 Abdominal bloating Yes/no 110 88 97 4384 High frequency

abdominal bloatingYes/no 111 89 98

85 High intensity abdominalbloating

Yes/no 112 90 99 44

86 Diarrhea/constipation Yes/no 113 91 10087 High frequency diarrhea Yes/no 114 92 10188 High intensity diarrhea Yes/no 115 10289 Pain modified

after defecationYes/no 116 93 103

90 High frequencymodified pain

Yes/no 117 104

91 High intensitymodified pain

Yes/no 118 94 105 45

92 Urgency to defecation Yes/no 119 10693 High frequency defecation Yes/no 120 95 107 4694 High intensity defecation Yes/no 121 96 108 4795 Mucous per rectum Yes/no 122 97 10996 High frequency mucous Yes/no 123 98 110 4897 High intensity mucous Yes/no 124 99 11198 HP status HP negative 125

HP positive 126

balanced fixed ratios of samples between the training and testing sets:

$$\frac{|D_C^{[i]}|}{|D_C|} = \frac{431}{861} \simeq 0.50 \quad\text{for the training set}$$

and

$$\frac{|D_C \setminus D_C^{[i]}|}{|D_C|} = \frac{430}{861} \simeq 0.50 \quad\text{for the testing set.}$$

For each training and testing set we maintained a balanced fixed ratio of OD versus FD cases, almost identical to that of the sample given in (27):

$$\frac{|\{x_j \mid x_j \in D_C^{[i]},\ t_{C,j}=1\}|}{|\{x_j \mid x_j \in D_C^{[i]},\ t_{C,j}=2\}|} = \frac{128}{303} \simeq 0.42 \quad\text{for the training set}$$

and

$$\frac{|\{x_j \mid x_j \in D_C \setminus D_C^{[i]},\ t_{C,j}=1\}|}{|\{x_j \mid x_j \in D_C \setminus D_C^{[i]},\ t_{C,j}=2\}|} = \frac{128}{302} \simeq 0.42 \quad\text{for the testing set.}$$
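The balanced extraction described above amounts to a stratified random split. As an illustration only (the helper below is ours, not part of the paper's protocol), a stratified 50/50 split preserving the class ratio can be sketched in Python:

```python
import random

def stratified_split(labels, train_fraction=0.5, seed=0):
    """Split record indices into train/test sets, preserving the class
    ratio of `labels` in both halves (as in the benchmark extraction)."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    train, test = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        cut = round(len(idxs) * train_fraction)
        train.extend(idxs[:cut])
        test.extend(idxs[cut:])
    return sorted(train), sorted(test)

# The CE sample: 256 OD (class 1) and 605 FD (class 2) cases.
labels = [1] * 256 + [2] * 605
train, test = stratified_split(labels)
```

Each half then contains 128 OD cases and roughly half of the FD cases, reproducing the ≈0.42 OD/FD ratio of (27) in both subsets.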


Table 5  LDA benchmark results for the CE

Training set    | Organic: Tot Corr Err Corr.(%) | Functional: Tot Corr Err Corr.(%) | Global: Tot Corr Err Arith.Acc.(%) Weigh.Acc.(%)
D_C^[1]         | 128  70  58  54.69 | 303  200  103  66.01 | 431  270  161  60.35  62.65
D_C^[2]         | 128  69  59  53.91 | 303  211   92  69.64 | 431  280  151  61.77  64.97
D_C^[3]         | 128  76  52  59.38 | 303  214   89  70.63 | 431  290  141  65.00  67.29
D_C^[4]         | 128  82  46  64.06 | 303  201  102  66.34 | 431  283  148  65.20  65.66
D_C^[5]         | 128  75  53  58.59 | 303  205   98  67.66 | 431  280  151  63.13  64.97
D_C^[6]         | 128  82  46  64.06 | 303  211   92  69.64 | 431  293  138  66.85  67.98
D_C^[7]         | 128  74  54  57.81 | 303  184  119  60.73 | 431  258  173  59.27  59.86
D_C^[8]         | 128  79  49  61.72 | 303  203  100  67.00 | 431  282  149  64.36  65.43
D_C^[9]         | 128  82  46  64.06 | 303  197  106  65.02 | 431  279  152  64.54  64.73
D_C^[10]        | 128  83  45  64.84 | 303  196  107  64.69 | 431  279  152  64.77  64.73
D_C \ D_C^[1]   | 128  66  62  51.56 | 302  217   85  71.85 | 430  283  147  61.71  65.81
D_C \ D_C^[2]   | 128  67  61  52.34 | 302  210   92  69.54 | 430  277  153  60.94  64.42
D_C \ D_C^[3]   | 128  66  62  51.56 | 302  217   85  71.85 | 430  283  147  61.71  65.81
D_C \ D_C^[4]   | 128  74  54  57.81 | 302  230   72  76.16 | 430  304  126  66.99  70.70
D_C \ D_C^[5]   | 128  67  61  52.34 | 302  192  110  63.58 | 430  259  171  57.96  60.23
D_C \ D_C^[6]   | 128  78  50  60.94 | 302  192  110  63.58 | 430  270  160  62.26  62.79
D_C \ D_C^[7]   | 128  68  60  53.13 | 302  221   81  73.18 | 430  289  141  63.15  67.21
D_C \ D_C^[8]   | 128  68  60  53.13 | 302  195  107  64.57 | 430  263  167  58.85  61.16
D_C \ D_C^[9]   | 128  81  47  63.28 | 302  209   93  69.21 | 430  290  140  66.24  67.44
D_C \ D_C^[10]  | 128  70  58  54.69 | 302  206   96  68.21 | 430  276  154  61.45  64.19
σ^[LDA]_X,A     |  –    6.01   6.01   4.70 |  –   11.02  11.18   3.68 |  –   11.16  11.17   2.57   2.59
μ^[LDA]_X,A     |  –   73.85  54.15  57.70 |  –   205.6  96.95  67.95 |  –   279.4  151.1  62.82  64.90
Max             |  –   83  62  64.84 |  –   230  119  76.16 |  –   304  173  66.99  70.70
Min             |  –   66  45  51.56 |  –   184   72  60.73 |  –   258  126  57.96  59.86
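The two global accuracies reported in Tables 5 and 6 follow directly from the per-class counts: the arithmetic accuracy is the mean of the per-class correct rates, while the weighted accuracy is the overall fraction of correct classifications. A minimal sketch (the helper name is ours), reproducing the first row of Table 5:

```python
def accuracies(correct_by_class, total_by_class):
    """Arithmetic accuracy: mean of the per-class correct rates.
    Weighted accuracy: overall correct / overall total."""
    rates = [c / t for c, t in zip(correct_by_class, total_by_class)]
    arith = 100 * sum(rates) / len(rates)
    weigh = 100 * sum(correct_by_class) / sum(total_by_class)
    return arith, weigh

# First row of Table 5: 70/128 organic and 200/303 functional correct.
arith, weigh = accuracies([70, 200], [128, 303])
print(round(arith, 2), round(weigh, 2))  # 60.35 62.65
```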

The B training sets $D_C^{[i]}$ have been used to teach the benchmark inducers of the two classes, LDA and MLP: $V_{D_C^{[i]},A^{[\mathrm{MLP}]},F^{[i']},Z^{[i]}}(\cdot)$ and $V_{D_C^{[i]},A^{[\mathrm{LDA}]},F^{[\cdot]},Z^{[\cdot]}}(\cdot)$. Following the learning phase, the corresponding testing phase has been carried out with the relative testing sets $D_C \setminus D_C^{[i]}$.

The $D_C \setminus D_C^{[i]}$ have been used, in a symmetric way, to teach the other benchmark inducers LDA and MLP: $V_{D_C \setminus D_C^{[i]},A^{[\mathrm{LDA}]},F^{[\cdot]},Z^{[\cdot]}}(\cdot)$ and $V_{D_C \setminus D_C^{[i]},A^{[\mathrm{MLP}]},F^{[i']},Z^{[i']}}(\cdot)$. The corresponding testing sets in this case were the $D_C^{[i]}$ sets.

Table 5 reports the 2B results for the LDA inducers and Table 6 reports the 2B results for the MLP inducers. Both tables document the specific values for correct and erroneous classifications, as well as the inducer accuracies $Acc_X(V_{D_C^{[i]},A^{[\mathrm{LDA}]},F^{[\cdot]},Z^{[\cdot]}}(\cdot))$ and $Acc_X(V_{D_C^{[i]},A^{[\mathrm{MLP}]},F^{[i']},Z^{[i]}}(\cdot))$, in arithmetic and weighted form. The mean values $\mu_{X,A^{[\mathrm{LDA}]}}$ and $\mu_{X,A^{[\mathrm{MLP}]}}$ and the standard deviations $\sigma_{X,A^{[\mathrm{LDA}]}}$ and $\sigma_{X,A^{[\mathrm{MLP}]}}$ for the whole set of inducers are also reported.

The classification results show a substantial level of inaccuracy for both types of inducers, clearly denoting the complexity of the problem relative to the power of the two types of inducers. The MLP inducers nevertheless appear to work better. To evaluate the significance of that difference we performed the paired t-test. Applying the standard calculus to the results of Tables 5 and 6, for the weighted and the arithmetic accuracies we have:

$$t_{\mathrm{weighted},A^{[\mathrm{LDA}]},A^{[\mathrm{MLP}]}} = -3.605$$

$$t_{\mathrm{arithmetic},A^{[\mathrm{LDA}]},A^{[\mathrm{MLP}]}} = -4.248$$

The corresponding critical value $t_{\alpha,\mathrm{d.f.}} = 2.093$ of the t distribution is smaller than both the absolute values


Table 6  MLP benchmark results for the CE

Training set    | Organic: Tot Corr Err Corr.(%) | Functional: Tot Corr Err Corr.(%) | Global: Tot Corr Err Arith.Acc.(%) Weigh.Acc.(%)
D_C^[1]         | 128  67  61  52.34 | 303  232   71  76.57 | 431  299  132  64.46  69.37
D_C^[2]         | 128  74  54  57.81 | 303  222   81  73.27 | 431  296  135  65.54  68.68
D_C^[3]         | 128  82  46  64.06 | 303  194  109  64.03 | 431  276  155  64.04  64.04
D_C^[4]         | 128  73  55  57.03 | 303  231   72  76.24 | 431  304  127  66.63  70.53
D_C^[5]         | 128  56  72  43.75 | 303  254   49  83.83 | 431  310  121  63.79  71.93
D_C^[6]         | 128  70  58  54.69 | 303  242   61  79.87 | 431  312  119  67.28  72.39
D_C^[7]         | 128  73  55  57.03 | 303  219   84  72.28 | 431  292  139  64.65  67.75
D_C^[8]         | 128  82  46  64.06 | 303  219   84  72.28 | 431  301  130  68.17  69.84
D_C^[9]         | 128  86  42  67.19 | 303  199  104  65.68 | 431  285  146  66.43  66.13
D_C^[10]        | 128  80  48  62.50 | 303  204   99  67.33 | 431  284  147  64.91  65.89
D_C \ D_C^[1]   | 128  81  47  63.28 | 302  207   95  68.54 | 430  288  142  65.91  66.98
D_C \ D_C^[2]   | 128  68  60  53.13 | 302  215   87  71.19 | 430  283  147  62.16  65.81
D_C \ D_C^[3]   | 128  63  65  49.22 | 302  238   64  78.81 | 430  301  129  64.01  70.00
D_C \ D_C^[4]   | 128  86  42  67.19 | 302  202  100  66.89 | 430  288  142  67.04  66.98
D_C \ D_C^[5]   | 128  75  53  58.59 | 302  222   80  73.51 | 430  297  133  66.05  69.07
D_C \ D_C^[6]   | 128  89  39  69.53 | 302  185  117  61.26 | 430  274  156  65.39  63.72
D_C \ D_C^[7]   | 128  67  61  52.34 | 302  223   79  73.84 | 430  290  140  63.09  67.44
D_C \ D_C^[8]   | 128  60  68  46.88 | 302  242   60  80.13 | 430  302  128  63.50  70.23
D_C \ D_C^[9]   | 128  86  42  67.19 | 302  189  113  62.58 | 430  275  155  64.89  63.95
D_C \ D_C^[10]  | 128  55  73  42.97 | 302  256   46  84.77 | 430  311  119  63.87  72.33
σ^[MLP]_X,A     |  –   10.06  10.06   7.86 |  –   20.13  20.09   6.65 |  –   11.49  11.39   1.51   2.65
μ^[MLP]_X,A     |  –   73.65  54.35  57.54 |  –   219.8  82.75  72.64 |  –   293.4  137.1  65.09  68.15
Max             |  –   89  73  69.53 |  –   256  117  84.77 |  –   312  156  68.17  72.39
Min             |  –   55  39  42.97 |  –   185   46  61.26 |  –   274  119  62.16  63.72

of the measured t-values, using a level of significance α = 0.05 and d.f. = 19 degrees of freedom. This confirms a real difference between the two mean accuracy values and indicates that more complex inducers should be able to improve the classification results.
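The paired t statistic used above can be computed directly from the 20 paired accuracy values (one pair per training/testing set). A stdlib-only sketch (the four pairs below are just the first weighted accuracies of Tables 5 and 6, for illustration; the paper's test uses all 20 pairs):

```python
import math

def paired_t(xs, ys):
    """Paired t statistic: t = mean(d) / (sd(d) / sqrt(n)),
    where d are the per-pair differences; n - 1 degrees of freedom."""
    diffs = [x - y for x, y in zip(xs, ys)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n), n - 1

# First four weighted accuracies of Tables 5 and 6 (illustration only).
lda = [62.65, 64.97, 67.29, 65.66]
mlp = [69.37, 68.68, 64.04, 70.53]
t, df = paired_t(lda, mlp)
# |t| is then compared with the critical value t_{alpha, d.f.}.
```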

3.1.2. Optimization phase

The optimization phase entails a first sub-phase regarding the optimization of samples, which approximates the maximization operation defined in (24). For the classification problem, the NES T&T has produced an optimized subdivision of the dataset: $D_C^{[tr*]}$ and $D_C^{[ts*]} = D_C \setminus D_C^{[tr*]}$. The N = 861 records have been optimally distributed between the two sets in an almost balanced way:

$$\frac{|D_C^{[tr*]}|}{|D_C|} = \frac{419}{861} \simeq 0.49 \quad\text{for the training set}$$

and

$$\frac{|D_C^{[ts*]}|}{|D_C|} = \frac{442}{861} \simeq 0.51 \quad\text{for the testing set.}$$

The 256 records of OD and the 605 of FD have been distributed by the T&T system with specific ratios of OD versus FD cases in the training and testing sets:

$$\frac{|\{x_j \mid x_j \in D_C^{[tr*]},\ t_{C,j}=1\}|}{|\{x_j \mid x_j \in D_C^{[tr*]},\ t_{C,j}=2\}|} = \frac{151}{268} \simeq 0.56 \quad\text{for the training set}$$

and

$$\frac{|\{x_j \mid x_j \in D_C^{[ts*]},\ t_{C,j}=1\}|}{|\{x_j \mid x_j \in D_C^{[ts*]},\ t_{C,j}=2\}|} = \frac{105}{337} \simeq 0.31 \quad\text{for the testing set.}$$

The ratio asymmetries reflect the real distribution of information in the data space defined by the dataset: more OD cases are needed in the training set than the sample ratio (27) would suggest.
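T&T searches the space of training/testing subdivisions for one that maximizes inducer performance; the actual evolutionary algorithm (GenD) is described in the references. As a rough, hypothetical stand-in for that idea, a naive random search over splits could look like:

```python
import random

def search_split(n_records, score_fn, n_trials=50, seed=0):
    """Naive stand-in for the T&T optimization: try many random 50/50
    subdivisions and keep the one that `score_fn(train, test)` rates best."""
    rng = random.Random(seed)
    idxs = list(range(n_records))
    best_split, best_score = None, float("-inf")
    for _ in range(n_trials):
        rng.shuffle(idxs)
        cut = n_records // 2
        train, test = sorted(idxs[:cut]), sorted(idxs[cut:])
        score = score_fn(train, test)
        if score > best_score:
            best_split, best_score = (train, test), score
    return best_split

# Toy score favouring OD-rich training halves, mimicking the observed
# asymmetry (0.56 vs 0.31); a real score would train and test an inducer.
labels = [1] * 256 + [2] * 605
score = lambda tr, ts: sum(1 for i in tr if labels[i] == 1)
train, test = search_split(861, score)
```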


Table 7  Advanced ANNs results for the CE

Neural networks | Organic: Tot Corr Err Corr.(%) | Functional: Tot Corr Err Corr.(%) | Global: Tot Corr Err Arith.Acc.(%) Weigh.Acc.(%)
FF_Bp(10)       | 105  71  34  67.62 | 337  281  56  83.38 | 442  352   90  75.50  79.64
FF_Bp(12×3)     | 105  77  28  73.33 | 337  270  67  80.12 | 442  347   95  76.73  78.51
FF_Sn(12)       | 105  81  24  77.14 | 337  253  84  75.07 | 442  334  108  76.11  75.57
SDA             | 105  92  13  87.62 | 337  238  99  70.62 | 442  330  112  79.12  74.66

After the optimization of the sample, a second sub-phase of input selection allows the choice of the most relevant input variables and the elimination of the others. This phase approximates the maximization operation defined in (25). The NES IS has selected training and testing sets $D_C'^{[tr**]}$ and $D_C'^{[ts**]}$ for the classification problem, using 99 variables out of 126, as summarized in Table 4. Interesting indications can be deduced by analyzing the specific variables eliminated by the IS system: for example, it is worth noting that the indication of the HP status, taken together with the other variables, seems to be irrelevant.
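Input selection can be viewed as a search over binary masks on the 126 candidate variables. The following greedy bit-flip search is only a schematic stand-in for the evolutionary IS system (the function and the toy fitness are ours, not GenD):

```python
import random

def select_inputs(n_vars, fitness, n_steps=200, seed=0):
    """Greedy bit-flip search over binary feature masks; fitness(mask)
    would normally train and test an inducer restricted to the
    variables where mask[i] == 1 and return its accuracy."""
    rng = random.Random(seed)
    mask = [1] * n_vars            # start from all variables
    best = fitness(mask)
    for _ in range(n_steps):
        i = rng.randrange(n_vars)
        mask[i] ^= 1               # flip one variable in or out
        score = fitness(mask)
        if score >= best:
            best = score
        else:
            mask[i] ^= 1           # revert a worsening flip
    return mask, best

# Toy fitness pretending only the first 99 variables are informative.
toy_fitness = lambda m: sum(m[:99]) - 0.5 * sum(m[99:])
mask, score = select_inputs(126, toy_fitness)
```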

After the input selection, the third and last sub-phase, the training of the advanced ANNs, completes the optimization phase of the experimentation. This phase approximates the maximization operation defined in (26). Four networks have been used in this case; a description of their architectures, which is beyond the scope of this article, is available in the cited references.

The first two networks belong to the MLP family [35] and are characterized by the self-momentum modification in the transfer equations [34]. The first has one hidden layer with 10 hidden units and is indicated as FF_Bp(10); the second has two hidden layers of 12 and 3 hidden units and is indicated as FF_Bp(12×3). The third network is a SineNet [44] with one hidden layer of 12 processing units, indicated as FF_Sn(12). The fourth is a Sine Discriminant Analysis network, i.e. a SineNet with no hidden layer and a SoftMax output layer, indicated as SDA.

These four networks have been used on the optimized training and testing sets with the optimized choice of input variables. Their results are reported in Table 7.

The results show how the three-phase optimization process has been able to significantly increase classification accuracy. Applying the standard unpaired t-test to these optimization results, comparing them with those of the ANN benchmark, we obtain probability values of 0.000119 and 0.001336 for the arithmetic and weighted means, respectively. This shows that a real accuracy improvement was obtained.

3.2. Prediction of the follow-up outcome for helicobacter-eradicated patients

The PE data was obtained from the DC and DV datasets, considering only the records of patients who were treated with eradication therapy and for whom follow-up results are available, and considering only a significant subset of variables. The resulting dataset DP is composed of M = 150 records and P = 111 maximized variables, derived from 89 non-maximized variables. The 89/111 variables are reported in Table 4. The M = 150 records represent unbalanced cases, consisting of:

$$|\{x_j \mid x_j \in D_P,\ t_{P,j} = 1\}| = 51 \text{ cases of disappearance of the symptoms}$$

and

$$|\{x_j \mid x_j \in D_P,\ t_{P,j} = 2\}| = 99 \text{ cases of persistence of the symptoms},$$

giving a disappearance versus persistence cases ratio for the sample:

$$\frac{|\{x_j \mid x_j \in D_P,\ t_{P,j} = 1\}|}{|\{x_j \mid x_j \in D_P,\ t_{P,j} = 2\}|} = \frac{51}{99} \simeq 0.51. \tag{28}$$

3.2.1. Benchmark phase

During the benchmark phase, B = 10 training sets $D_P^{[i]} \subset D_P$, i = 1, ..., B, were randomly extracted for the classification operation, with balanced fixed ratios of the samples between the training and testing sets:

$$\frac{|D_P^{[i]}|}{|D_P|} = \frac{76}{150} \simeq 0.51 \quad\text{for the training set}$$

and

$$\frac{|D_P \setminus D_P^{[i]}|}{|D_P|} = \frac{74}{150} \simeq 0.49 \quad\text{for the testing set.}$$

A balanced fixed ratio of disappearance versus persistence cases was maintained, almost identical


Table 8  LDA benchmark results for the PE

Training set    | Disappearance: Tot Corr Err Corr.(%) | Persistence: Tot Corr Err Corr.(%) | Global: Tot Corr Err Arith.Acc.(%) Weigh.Acc.(%)
D_P^[1]         | 25  10  15  40.00 | 49  24  25  48.98 | 74  34  40  44.49  45.95
D_P^[2]         | 25   9  16  36.00 | 49  29  20  59.18 | 74  38  36  47.59  51.35
D_P^[3]         | 25   8  17  32.00 | 49  25  24  51.02 | 74  33  41  41.51  44.59
D_P^[4]         | 25  11  14  44.00 | 49  24  25  48.98 | 74  35  39  46.49  47.30
D_P^[5]         | 25  14  11  56.00 | 49  28  21  57.14 | 74  42  32  56.57  56.76
D_P^[6]         | 25  13  12  52.00 | 49  23  26  46.94 | 74  36  38  49.47  48.65
D_P^[7]         | 25   6  19  24.00 | 49  27  22  55.10 | 74  33  41  39.55  44.59
D_P^[8]         | 25  12  13  48.00 | 49  22  27  44.90 | 74  34  40  46.45  45.95
D_P^[9]         | 25   9  16  36.00 | 49  24  25  48.98 | 74  33  41  42.49  44.59
D_P^[10]        | 25  13  12  52.00 | 49  25  24  51.02 | 74  38  36  51.51  51.35
D_P \ D_P^[1]   | 26  12  14  46.15 | 50  29  21  58.00 | 76  41  35  52.08  53.95
D_P \ D_P^[2]   | 26  16  10  61.54 | 50  25  25  50.00 | 76  41  35  55.77  53.95
D_P \ D_P^[3]   | 26  11  15  42.31 | 50  23  27  46.00 | 76  34  42  44.15  44.74
D_P \ D_P^[4]   | 26  14  12  53.85 | 50  23  27  46.00 | 76  37  39  49.92  48.68
D_P \ D_P^[5]   | 26  12  14  46.15 | 50  21  29  42.00 | 76  33  43  44.08  43.42
D_P \ D_P^[6]   | 26  11  15  42.31 | 50  33  17  66.00 | 76  44  32  54.15  57.89
D_P \ D_P^[7]   | 26  13  13  50.00 | 50  26  24  52.00 | 76  39  37  51.00  51.32
D_P \ D_P^[8]   | 26  12  14  46.15 | 50  31  19  62.00 | 76  43  33  54.08  56.58
D_P \ D_P^[9]   | 26  12  14  46.15 | 50  25  25  50.00 | 76  37  39  48.08  48.68
D_P \ D_P^[10]  | 26  11  15  42.31 | 50  24  26  48.00 | 76  35  41  45.15  46.05
σ^[LDA]_X,A     |  –   2.22   2.06   8.45 |  –   3.01   2.97   5.99 |  –   3.52   3.26   4.75   4.46
μ^[LDA]_X,A     |  –  11.45  14.05  44.85 |  –  25.55  23.95  51.61 |  –  37     38    48.23  49.32
Max             |  –  16  19  61.54 |  –  33  29  66.00 |  –  44  43  56.57  57.89
Min             |  –   6  10  24.00 |  –  21  17  42.00 |  –  33  32  39.55  43.42

to that of the sample as given in (28), for each training and testing set:

$$\frac{|\{x_j \mid x_j \in D_P^{[i]},\ t_{P,j}=1\}|}{|\{x_j \mid x_j \in D_P^{[i]},\ t_{P,j}=2\}|} = \frac{26}{50} \simeq 0.52 \quad\text{for the training set}$$

and

$$\frac{|\{x_j \mid x_j \in D_P \setminus D_P^{[i]},\ t_{P,j}=1\}|}{|\{x_j \mid x_j \in D_P \setminus D_P^{[i]},\ t_{P,j}=2\}|} = \frac{25}{49} \simeq 0.51 \quad\text{for the testing set.}$$

We used the B training sets $D_P^{[i]}$ to teach the benchmark inducers of the two classes, LDA and MLP: $V_{D_P^{[i]},A^{[\mathrm{MLP}]},F^{[i']},Z^{[i]}}(\cdot)$ and $V_{D_P^{[i]},A^{[\mathrm{LDA}]},F^{[\cdot]},Z^{[\cdot]}}(\cdot)$. After the learning phase, the corresponding testing phase was carried out on the related testing sets $D_P \setminus D_P^{[i]}$.

The $D_P \setminus D_P^{[i]}$ were then used, in a symmetric way, to teach the other benchmark inducers LDA and MLP: $V_{D_P \setminus D_P^{[i]},A^{[\mathrm{LDA}]},F^{[\cdot]},Z^{[\cdot]}}(\cdot)$ and $V_{D_P \setminus D_P^{[i]},A^{[\mathrm{MLP}]},F^{[i']},Z^{[i']}}(\cdot)$. The corresponding testing sets, in this case, were the $D_P^{[i]}$ sets.

The resulting 2B results for the LDA inducers are reported in Table 8, and the 2B results for the MLP inducers in Table 9. The specific values for correct and erroneous classifications, as well as the inducer accuracies $Acc_X(V_{D_P^{[i]},A^{[\mathrm{LDA}]},F^{[\cdot]},Z^{[\cdot]}}(\cdot))$ and $Acc_X(V_{D_P^{[i]},A^{[\mathrm{MLP}]},F^{[i']},Z^{[i]}}(\cdot))$, are reported in both tables in arithmetic and weighted form. The mean values $\mu_{X,A^{[\mathrm{LDA}]}}$ and $\mu_{X,A^{[\mathrm{MLP}]}}$ and the standard deviations $\sigma_{X,A^{[\mathrm{LDA}]}}$ and $\sigma_{X,A^{[\mathrm{MLP}]}}$ for the whole set of inducers are also reported.

The prediction accuracies immediately appear to be different for the two types of inducers: the LDA inducers achieve useless results, whereas the MLPs exhibit reasonable


Table 9  MLP benchmark results for the PE

Training set    | Disappearance: Tot Corr Err Corr.(%) | Persistence: Tot Corr Err Corr.(%) | Global: Tot Corr Err Arith.Acc.(%) Weigh.Acc.(%)
D_P^[1]         | 25  14  11  56.00 | 49  35  14  71.43 | 74  49  25  63.71  66.22
D_P^[2]         | 25  13  12  52.00 | 49  38  11  77.55 | 74  51  23  64.78  68.92
D_P^[3]         | 25  14  11  56.00 | 49  35  14  71.43 | 74  49  25  63.71  66.22
D_P^[4]         | 25  13  12  52.00 | 49  37  12  75.51 | 74  50  24  63.76  67.57
D_P^[5]         | 25  13  12  52.00 | 49  42   7  85.71 | 74  55  19  68.86  74.32
D_P^[6]         | 25  14  11  56.00 | 49  36  13  73.47 | 74  50  24  64.73  67.57
D_P^[7]         | 25  14  11  56.00 | 49  36  13  73.47 | 74  50  24  64.73  67.57
D_P^[8]         | 25  14  11  56.00 | 49  38  11  77.55 | 74  52  22  66.78  70.27
D_P^[9]         | 25  13  12  52.00 | 49  38  11  77.55 | 74  51  23  64.78  68.92
D_P^[10]        | 25  15  10  60.00 | 49  36  13  73.47 | 74  51  23  66.73  68.92
D_P \ D_P^[1]   | 26  16  10  61.54 | 50  38  12  76.00 | 76  54  22  68.77  71.05
D_P \ D_P^[2]   | 26  15  11  57.69 | 50  39  11  78.00 | 76  54  22  67.85  71.05
D_P \ D_P^[3]   | 26  16  10  61.54 | 50  36  14  72.00 | 76  52  24  66.77  68.42
D_P \ D_P^[4]   | 26  16  10  61.54 | 50  37  13  74.00 | 76  53  23  67.77  69.74
D_P \ D_P^[5]   | 26  15  11  57.69 | 50  39  11  78.00 | 76  54  22  67.85  71.05
D_P \ D_P^[6]   | 26  15  11  57.69 | 50  39  11  78.00 | 76  54  22  67.85  71.05
D_P \ D_P^[7]   | 26  14  12  53.85 | 50  38  12  76.00 | 76  52  24  64.92  68.42
D_P \ D_P^[8]   | 26  14  12  53.85 | 50  44   6  88.00 | 76  58  18  70.92  76.32
D_P \ D_P^[9]   | 26  14  12  53.85 | 50  40  10  80.00 | 76  54  22  66.92  71.05
D_P \ D_P^[10]  | 26  18   8  69.23 | 50  40  10  80.00 | 76  58  18  74.62  76.32
σ^[MLP]_X,A     |  –   1.24   1.00   4.22 |  –   2.22   2.06   4.22 |  –   2.54   1.99   2.64   2.82
μ^[MLP]_X,A     |  –  14.50  11     56.82 |  –  38.05  11.45  76.86 |  –  52.55  22.45  66.84  70.05
Max             |  –  18  12  69.23 |  –  44  14  88.00 |  –  58  25  74.62  76.32
Min             |  –  13   8  52.00 |  –  35   6  71.43 |  –  49  18  63.71  66.22

behavior. The difference between the two is evident; the paired t-test can be carried out to measure its significance. Applying the standard calculus to the results of Tables 8 and 9, for the weighted and the arithmetic accuracies we have:

$$t_{\mathrm{weighted},A^{[\mathrm{LDA}]},A^{[\mathrm{MLP}]}} = -22.837$$

$$t_{\mathrm{arithmetic},A^{[\mathrm{LDA}]},A^{[\mathrm{MLP}]}} = -18.228$$

Setting the level of significance at α = 0.05 with d.f. = 19 degrees of freedom, the corresponding critical value $t_{\alpha,\mathrm{d.f.}} = 2.093$ of the t distribution is significantly smaller than both of the measured absolute t-values. This confirms a real and major difference between the two mean accuracy values, and indicates that more complex inducers should be capable of considerably improving the prediction results.

3.2.2. Optimization phase

The maximization operation defined in (24) is approximated in the first optimization sub-phase, the optimization of samples. T&T has produced an optimized subdivision of the dataset, $D_P^{[tr*]}$ and $D_P^{[ts*]} = D_P \setminus D_P^{[tr*]}$, distributing the M = 150 records in an almost balanced way:

$$\frac{|D_P^{[tr*]}|}{|D_P|} = \frac{71}{150} \simeq 0.47 \quad\text{for the training set}$$

and

$$\frac{|D_P^{[ts*]}|}{|D_P|} = \frac{79}{150} \simeq 0.53 \quad\text{for the testing set.}$$

The original 51 records of disappearance cases and the 99 of persistence cases have been distributed in the training and testing sets with


Table 10  Advanced ANNs results for the PE

Neural networks | Disappearance: Tot Corr Err Corr.(%) | Persistence: Tot Corr Err Corr.(%) | Global: Tot Corr Err Arith.Acc.(%) Weigh.Acc.(%)
SelfDASn(24)    | 20  20  0  100.00 | 59  50   9  84.75 | 79  70   9  92.37  88.61
FF_Bp(60)       | 20  19  1   95.00 | 59  50   9  84.75 | 79  69  10  89.87  87.34
FF_Sn(60)       | 20  19  1   95.00 | 59  49  10  83.05 | 79  68  11  89.03  86.08
TasmDASn(24)    | 20  19  1   95.00 | 59  49  10  83.05 | 79  68  11  89.03  86.08
FF_Sn(36)       | 20  20  0  100.00 | 59  47  12  79.66 | 79  67  12  89.83  84.81
TasmSASn(48)    | 20  20  0  100.00 | 59  47  12  79.66 | 79  67  12  89.83  84.81
FF_Sn(48)       | 20  20  0  100.00 | 59  46  13  77.97 | 79  66  13  88.98  83.54
FF_Bm(60)       | 20  19  1   95.00 | 59  47  12  79.66 | 79  66  13  87.33  83.54
SelfSABp(60)    | 20  20  0  100.00 | 59  46  13  77.97 | 79  66  13  88.98  83.54
SelfDABp(60)    | 20  20  0  100.00 | 59  46  13  77.97 | 79  66  13  88.98  83.54
SelfSASn(60)    | 20  20  0  100.00 | 59  46  13  77.97 | 79  66  13  88.98  83.54
SelfSABm(36)    | 20  20  0  100.00 | 59  46  13  77.97 | 79  66  13  88.98  83.54
SMDA            | 20  17  3   85.00 | 59  48  11  81.36 | 79  65  14  83.18  82.28
TasmSABp(60)    | 20  20  0  100.00 | 59  45  14  76.27 | 79  65  14  88.14  82.28
FF_Cm(12)       | 20  20  0  100.00 | 59  44  15  74.58 | 79  64  15  87.29  81.01
TasmDABp(60)    | 20  20  0  100.00 | 59  44  15  74.58 | 79  64  15  87.29  81.01
SDA             | 20  18  2   90.00 | 59  45  14  76.27 | 79  63  16  83.14  79.75

specific ratios of disappearance versus persistence cases:

$$\frac{|\{x_j \mid x_j \in D_P^{[tr*]},\ t_{P,j}=1\}|}{|\{x_j \mid x_j \in D_P^{[tr*]},\ t_{P,j}=2\}|} = \frac{31}{40} \simeq 0.77 \quad\text{for the training set}$$

and

$$\frac{|\{x_j \mid x_j \in D_P^{[ts*]},\ t_{P,j}=1\}|}{|\{x_j \mid x_j \in D_P^{[ts*]},\ t_{P,j}=2\}|} = \frac{20}{59} \simeq 0.34 \quad\text{for the testing set.}$$

Here the ratio asymmetries, compared with the sample ratio (28), again reflect the real distribution of information in the data space: many more disappearance cases are required in the training set than the sample ratio would suggest.

Once again, for this experiment, the second sub-phase of input selection has been carried out to approximate the maximization operation defined in (25). IS has selected the training and testing sets $D_P'^{[tr**]}$ and $D_P'^{[ts**]}$ with 48 variables out of 111 for the prediction case, as summarized in Table 4. The IS system has achieved a strong reduction of variables in this experiment.

The training of the advanced ANNs, whose architectures are described in the references, was the last sub-phase of the optimization; it approximated the maximization operation defined in (26) using 17 different networks. The first network, FF_Bp(60), is an MLP [35] with one hidden layer of 60 processing units which implements the self-momentum improvement [34]. The second, FF_Bm(60), is a Bi-modal network [45] with 60 hidden units. The third, FF_Cm(12), is a contractive map [46] with 12 hidden units. Three other networks, FF_Sn(36), FF_Sn(48) and FF_Sn(60), are SineNets [44] with 36, 48 and 60 hidden units, respectively. The seventh, SDA, is a Sine Discriminant Analysis network. The eighth, SMDA, is a SoftMax Discriminant Analysis: a back propagation network with no hidden layer and a SoftMax output layer. The other nine networks are more properly defined as neural organisms rather than simple networks, since they are complex compositions of identifiable simpler networks.

The first five organisms belong to the family of Self-Recurrent networks (SR) [47] and differ in the learning laws they use and in selected options controlling specific aspects of the organism's behavior. The five SR used are: (1) SelfSABp(60), an SR using a back propagation learning law, static and adaptive, with 60 hidden units; (2) SelfDABp(60), an SR using a back propagation learning law, dynamic and adaptive, with 60 hidden units; (3) SelfSASn(60), an SR using a sine propagation learning law, static and adaptive, with 60 hidden units; (4) SelfDASn(24), an SR using a sine propagation learning law, dynamic and adaptive, with 24 hidden units; (5) SelfSABm(36), an SR using a bi-modal learning law, static and adaptive, with 36 hidden units.

The last four organisms belong to the Temporal Associative Subjective Memory (TASM) family [47] and likewise differ in their learning laws and selected options. The four TASM used are: (1) TasmSABp(60), a TASM using a back propagation learning law, static and adaptive, with 60 hidden units; (2) TasmDABp(60), a TASM using a back propagation learning law, dynamic and adaptive, with 60 hidden units; (3) TasmSASn(48), a TASM using a sine propagation learning law, static and adaptive, with 48 hidden units; and finally (4) TasmDASn(24), a TASM using a sine propagation learning law, dynamic and adaptive, with 24 hidden units.

The results reported in Table 10 represent the use of these 17 networks on the optimized training and testing sets, with the optimized choice of input variables.

These results document very high prediction-capacity improvements after the optimization phase. Applying the standard unpaired t-test to these optimization results compared to the ANN benchmark results, we obtain near-null probability values: 1.37 × 10^-24 and 9.54 × 10^-18 for the arithmetic and weighted means, respectively, indicating the real effectiveness of the optimization process.

4. Discussion

Recent data in dyspepsia has led to a reassessment of the value of "alarm" symptoms as a clue for identifying patients who require further investigation, and of the effectiveness of HP eradication therapy for relieving symptoms.

When a condition such as dyspepsia has an unclear definition, many symptoms that can be observed in an uninvestigated patient may lie at the crossroads of a group of disorders. For instance, the symptoms could be an expression of FD as well as of OD. Several management guidelines recommend performing endoscopy in older patients (more than 45 years of age) or in those complaining of "alarm" symptoms such as vomiting, dysphagia, haematemesis or melaena, and unintentional weight loss [48,13]. However, recent reports challenge this assumption: although the symptom criteria did have an independent value in discriminating between OD and FD, their sensitivity and specificity in predicting major upper gastrointestinal pathology are rather limited [3,21].

Currently it is uncertain whether H. pylori plays a role in dyspepsia when an ulcer is not present. Recent meta-analyses evaluating the effectiveness of eradication therapies for relieving symptoms in HP-infected patients with non-ulcer dyspepsia have reached opposite conclusions. In one study the eradication therapy resolved symptoms in 9% more patients than would be expected with placebo [19], whereas in the other two studies the treated patients were not, at a statistical level, more likely to be relieved of their symptoms than the untreated control patients [16,17].

However, the previous analyses do not exclude a benefit for a subgroup of patients, albeit a minority. In patients manifesting acid secretory disorders that do not produce endoscopic lesions, eradication of HP could prevent ulcers and associated dyspepsia [15,18,49]. How to select these patients from the entire group of those with FD is unknown at the moment.

As to the distinction between OD and FD, our results suggest that, after an optimization process, ANN classifiers have potential for sorting patients with OD from those with FD on the basis of multiple clinical and demographic features. In a first benchmark phase we examined the possibility of using mathematical models, such as the LDA and simple ANNs, to find a relationship between different input variables and binomial output variables (i.e., clinical diagnoses). When the LDA models were validated against the testing subsets, sensitivity and specificity values of 57.70% and 67.95% were obtained for OD and FD, respectively, compared with corresponding values of 57.54% and 72.64% for the MLPs. After the optimization, advanced ANNs reached sensitivity and specificity values of up to 87.62% and 83.38%, respectively. Moreover, classification errors were reduced from 151 with conventional statistics to 90 with ANNs. Therefore, ANNs seem able to extract information from the data that is not apparent to the LDA.

From an etiopathogenetic standpoint, dyspepsia is a heterogeneous disorder in which various mechanisms operate differently in different patients. As a consequence, discriminant analysis, which assumes a linear function and reduces continuous variables to binary ones, might incorrectly represent the complex dynamics of the socio-demographic, biological and environmental features that may interact with each other in these patients. On the other hand, the simple ANNs are able to understand only a little more about the classification of FD. This shows how complex this classification problem actually is.

The input selection operated by the evolutionary algorithm GenD deserves special mention. It is interesting to note that the algorithm has selected a substantial number of variables belonging to the domain of present signs and symptoms (24 out of 46), and only a minority of variables belonging to the domain of past medical history and previous diagnostic treatment (only 8 out of 26). In particular, the variable "known presence of chronic gastritis" has been selected in this latter domain, while none of the variables referring to cumbersome and expensive diagnostic tests has been considered.

This point could potentially help reduce the need for a complete battery of diagnostic tests


in patients with dyspeptic symptoms. This couldsubstantially reduce the economic burden of theroutine work-up of these patients, in which thephysician typically is searching for negativeresults as exclusion criteria. It is also noteworthythat many of the diagnostic tests that could pos-sibly be avoided are those that can be very dis-tressing for our patients and could even causeharm. Examples of these tests are colonoscopy,gastroscopy, and cholecystography. On the otherhand, only two out of six alarming signs andsymptoms (anemia and weight loss) have beenselected. The potential impact of this particularsubset of variables could be high in routine diag-nostic practice. First of all, none of the selectedvariables was classified as ‘‘irrelevant’’ for theproblem by the gastroenterologists involved in thestudy. This should not be too surprising since thepreliminary selection of the variables to beincluded was done with extreme care. It is alsopossible to reduce by approximately 50% the num-ber of questions asked about present symptoms.This could reduce the length of the interview withobvious logistic and economic advantages.

We have discussed the results obtained with someof the investigators involved in the study. Most ofthem confirm that the authors’ system is promisingand could have a useful role in ambulatory medi-cine. In particular, using user-friendly software,able to classify patients with suspected FD, wouldhave a major impact in their everyday activity if thesystem performs as documented in this analysis.Their suggestion has been to further strengthenthe model through an external validation in patientsadmitted to independent institutions in otherEuropean countries.

Recent meta-analyses [16,17] have provided little support for the use of HP eradication therapy for symptomatic relief in the global setting of patients with FD. Consequently, the recommendation of the Maastricht consensus, which advises the eradication of HP infection in patients with FD, becomes a matter of debate [13]. On the other hand, it has been estimated that 16% of infected patients will develop a peptic ulcer and that 1—3% will develop a gastric malignancy [14]. Previous studies have outlined a few features of dyspeptic patients infected with HP which predict that the patient would be likely to benefit from empirical eradication therapy: male gender, heavy smoking, family history of peptic ulcer, and severe pain which awakens the patient at night and is ameliorated by food or antisecretory drugs. Unfortunately, the reported sensitivity and positive predictive value of these features were clearly unsatisfactory [9,11].

This study has shown that the use of traditional ANNs currently offers an advantage over linear discriminant analysis, obtaining sensitivity and specificity values of 56.86% and 76.87% for responders and non-responders, respectively, compared to 44.90% and 51.62% for the LDA. However, the use of optimization algorithms, coupled with newer and more powerful ANNs such as the SelfDASn model, brings the prediction to a very high level of accuracy. In fact, all of our patients who were asymptomatic after therapy were correctly classified, whereas the prediction of those who did not benefit from therapy was less accurate (84.8%), with nine cases being misclassified. The achieved sensitivity and specificity values reached 100% and 84.75%, respectively.
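The sensitivity and specificity figures quoted above follow directly from the confusion-matrix counts. The sketch below illustrates the computation; the non-responder counts (9 of 59 misclassified) are consistent with the 84.75% specificity reported, while the responder total (tp = 51) is a hypothetical placeholder, since the paper states only that every responder was correctly classified.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    Positive class here = responders to eradication therapy."""
    return tp / (tp + fn), tn / (tn + fp)

# Counts consistent with the figures above: no responder missed (fn = 0),
# 9 of 59 non-responders misclassified (fp = 9, tn = 50).
# tp = 51 is a hypothetical placeholder for the responder total.
sens, spec = sensitivity_specificity(tp=51, fn=0, tn=50, fp=9)
print(f"sensitivity = {sens:.2%}, specificity = {spec:.2%}")
# sensitivity = 100.00%, specificity = 84.75%
```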

This study has clearly demonstrated, from a medical perspective, that available data discarded by conventional assessment methods need not be considered ‘‘useless’’. Such accessible data offer precious information, providing effective support for the physicians’ decisions after appropriate processing.

From an Artificial Intelligence perspective, these data demonstrate that classical induction algorithms and classical EP cannot always provide satisfactory results. This is true for the classification operations, in which the mean benchmark accuracies were 64.90% for the LDA and 68.15% for the MLP, both inadequate. The case is similar for the prediction operations, in which the accuracies were only 49.32% for the LDA and 70.05% for the MLP, both poor to differing degrees. The three-phase optimization has demonstrated that deep mining of the data makes it possible to extract enough information to build a near-optimum classifier/predictor; in both cases, although with a different level of satisfaction, the results show reasonable levels of accuracy: 79.64% for classification and 88.61% for prediction.

Inducers can then be created which operate as the best possible classifier/predictor approximations using the given data and the given space of induction algorithms. As long as this data is truly representative, these inducers are expected to work with analogously excellent accuracy on all data.

We have also seen that improved results were reached in the optimization phase using more costly, complex systems, composed of ANNs and evolutionary sub-systems working together as hybrid organisms. Such a protocol requires a high level of expertise. Its use is recommended only when the input—output relationships are sufficiently complex, as in this paper’s application. When this criterion is not met, it is simpler and less expensive to use traditional methods, such as discriminant analysis.


References

[1] Adami HO, Agenas I, Gustavsson S, Loof L, Nyberg A, Nyren O, et al. The clinical diagnosis of ‘‘gastritis’’. Aspects of demographic, epidemiologic and health care consumption based on a nationwide sample survey. Scand J Gastroenterol 1984;19:216—9.

[2] Hartley R, Rathbone B. Dyspepsia: a dilemma for doctors. Lancet II 1987;779—82.

[3] Wallace M, Durkalski V, Vaughan J, Palesch Y, Libby E, Jowell P, et al. Age and alarm symptoms do not predict endoscopic findings among patients with dyspepsia: a multicenter database study. Gut 2001;49:29—34.

[4] Lundquist P, Seensalu R, Linden N, Nillson LH, Lindberg G. Symptom criteria do not distinguish between functional and organic dyspepsia. Eur J Surg 1998;164:345—52.

[5] Madsen L, Bytzer P. The value of alarm features in identifying organic causes of dyspepsia. Can J Gastroenterol 2000;14:713—20.

[6] Talley N, Colin-Jones D, Kock K. Functional dyspepsia: a classification with guidelines for diagnosis and management. Gastroenterol Int 1991;4(I):45—60.

[7] Talley N, Stanghellini V, Heading R, Koch K, Malagelada J, Tytgat G. Functional gastroduodenal disorders. Gut 1999;45(Suppl. 2):137—42.

[8] Talley N, Zinsmeister A, Schleck C, Melton L. Dyspepsia and dyspepsia subgroups: a population-based study. Gastroenterology 1992;102:1259—68.

[9] Halter F, Brignoli R. Epidemiology of dyspepsia: discriminant value of smoking and Helicobacter pylori status as predictors of peptic lesions in primary care. J Physiol Pharmacol 1997;48(Suppl. 4):75—83.

[10] Sonnenberg A. When William of Ocham meets Thomas Bayes: finding a few diagnoses among a great many symptoms. Aliment Pharmacol Ther 2001;15:1403—7.

[11] Asante M, Mendall M, Northfield T. Which Helicobacter pylori-positive dyspeptics are likely to respond symptomatically to empirical H. pylori eradication? Eur J Gastroenterol Hepatol 1998;10:265—8.

[12] Bytzer P, Møller Hansen J, Schaffalitzky de Muckadell OB. Empirical H2-blocker therapy or prompt endoscopy in management of dyspepsia. Lancet 1994;343:811—6.

[13] European Helicobacter pylori study group. Current European concepts in the management of Helicobacter pylori infection. The Maastricht consensus report. Gut 1997;41:8—13.

[14] Graham D. Can therapy even be denied for Helicobacter pylori infection? Gastroenterology 1997;113(Suppl.):S113—7.

[15] Hsu P, Lai K, Tseng H, Lo G, Lo C, Lin C, et al. Eradication of Helicobacter pylori prevents ulcer development in patients with ulcer-like functional dyspepsia. Aliment Pharmacol Ther 2001;15:195—201.

[16] Koch M, D’Ambrosio L, Gili L, Bianchi M, Dezi A, Capurso L. Helicobacter pylori and functional dyspepsia: eradication doesn’t improve symptoms. A cumulative meta-analysis updated to May 2001. Dig Liv Dis 2001.

[17] Laine L, Schoenfeld P, Fennerty MB. Therapy for Helicobacter pylori in patients with nonulcer dyspepsia. A meta-analysis of randomised, controlled trials. Ann Intern Med 2001;134:361—9.

[18] McColl K. Absence of benefit of eradicating Helicobacter pylori in patients with nonulcer dyspepsia (letter). N Engl J Med 2000;342:589—90.

[19] Moayyedi P, Soo S, Deeks J, Forman D, Mason J, Innes M, et al., On behalf of the Dyspepsia Review Group. Systematic review and economic evaluation of Helicobacter pylori eradication treatment for non-ulcer dyspepsia. Br Med J 2000;321:659—64.

[20] Bytzer P, Møller Hansen J, Havelund T, Malchow-Møller A, Schaffalitzky de Muckadell OB. Predicting endoscopic diagnosis in the dyspeptic patient: the value of clinical judgement. Eur J Gastroenterol Hepatol 1996;8:359—63.

[21] Perri F, Festa V, Grossi E, Garbagna N, Leandro G, Andriulli A, et al. Dyspepsia and Helicobacter pylori infection: a prospective, multicentre, observational study from Italy, submitted for publication.

[22] Buscema M. Genetic doping algorithm (GenD): theory andapplication. Expert Syst 2004;2(2):63—79.

[23] Davis L. Handbook of genetic algorithms. New York: Van Nostrand Reinhold; 1991.

[24] Harp S, Samad T, Guha A. Designing application-specific neural networks using the genetic algorithm. In: Touretzky D, editor. Advances in neural information processing systems, vol. 2. San Mateo, CA: Morgan Kaufman; 1990.

[25] Mitchell M. An introduction to genetic algorithms. Cambridge, MA: The MIT Press; 1996.

[26] Quagliarella D, Periaux J, Polani C, Winter G. Genetic algorithms and evolution strategies in engineering and computer science. England: John Wiley and Sons Ltd.; 1998.

[27] Rawling G. Foundations of genetic algorithms. San Mateo,CA: Morgan Kaufman; 1991.

[28] Buscema M, Breda M. Local optimization genetic operatorsin GenD. Semeion technical paper TP-31, Rome; 2003.

[29] Reetz B. Greedy solutions to the traveling salesperson problem. Adv Technol Dev May 1993;8—14.

[30] Klimasauskas C. Genetic algorithm optimizes city route in 21 minutes on a PC. Adv Technol Dev February 1993;9—17.

[31] Klimasauskas C. Simulated annealing and the traveling salesperson problem. Adv Technol Dev June 1993;6—16.

[32] Reinelt G. TSPLIB (http://www.crpc.rice.edu/softlib/tsplib/), Universität Heidelberg, Institut für Angewandte Mathematik, Germany.

[33] Buscema M. T&T: a new pre-processing tool for non-linear data set(s). Semeion technical paper TP-25, Rome; 2001.

[34] Buscema M, Sacco P. Feed forward networks in financial prediction: the future that modifies the present. Expert Syst 2000;17(3).

[35] Rumelhart D, McClelland J. Parallel distributed processing. Cambridge: The MIT Press; 1986.

[36] Kohavi R, John G. Wrappers for feature selection. Artif Intell 1997;1—2:273—324.

[37] Jain A, Zongker D. Feature selection: evaluation, application and small sample performance. IEEE Trans Pattern Anal Mach Intell 1997;19(2):153—8.

[38] Chauvin Y, Rumelhart DE, editors. Backpropagation: theory, architectures and applications. 365 Broadway, Hillsdale, New Jersey: Lawrence Erlbaum Associates Inc. Publishers; 1995.

[39] Morrison D. Multivariate statistical methods. New York: McGraw-Hill; 1990.

[40] Hair J, Anderson R, Tatham R, Black W. Multivariate data analysis with readings. New York: Prentice Hall; 1995.

[41] Fukunaga K. Introduction to statistical pattern recognition. San Diego, CA: Academic Press; 1990.

[42] Sharma S. Applied multivariate techniques. New York: John Wiley & Sons; 1995.

[43] Dietterich TG. Approximate statistical tests for comparing supervised classification learning algorithms. Neural Comput 1998;10(7):1895—924.

[44] Buscema M. Sine Net: a new learning rule for adaptive systems. Semeion technical paper TP-21, Rome; 2000.


[45] Buscema M. Bimodal networks. Semeion technical paper TP-29, Rome; 2003.

[46] Buscema M. Contractive maps. Semeion technical paper TP-30, Rome; 2003.

[47] Buscema M, Breda M. Reti neurali ricorrenti. In: Buscema M, Semeion Group, editors. Reti neurali artificiali e sistemi sociali complessi, vol. I. Milan: Franco Angeli; 1999. p. 440—64.

[48] Clinical practice and practice economics committee of the American Gastroenterological Association. American Gastroenterological Association medical position statement: evaluation of dyspepsia. Gastroenterology 1998;114:579—81.

[49] Samson M, Verhagen M, vanBerge Henegouwen GP, Smout A. Abnormal clearance of exogenous acid and increased acid sensitivity of the proximal duodenum in dyspeptic patients. Gastroenterology 1999;116:515—20.