Predicting Breast Screening Attendance Using Machine Learning Techniques
IEEE TRANSACTIONS ON INFORMATION TECHNOLOGY IN BIOMEDICINE, VOL. 15, NO. 2, MARCH 2011 251
Vikraman Baskaran, Aziz Guergachi, Member, IEEE, Rajeev K. Bali, and Raouf N. G. Naguib, Senior Member, IEEE
Abstract: Machine learning-based prediction has been effectively applied to many healthcare applications. Predicting breast screening attendance using machine learning (prior to the actual mammogram) is a new field. This paper presents new predictor attributes for such an algorithm. It describes a new hybrid algorithm that relies on back-propagation and radial basis function-based neural networks for prediction. The algorithm has been developed in an open source-based environment. The algorithm was tested on a 13-year dataset (1995–2008). This paper compares the algorithm and validates its accuracy and efficiency with different platforms. Nearly 80% accuracy and 88% positive predictive value and sensitivity were recorded for the algorithm. The results were encouraging; 40–50% of negative predictive value and specificity warrant further work. Preliminary results were promising and provided ample reasons for testing the algorithm on a larger scale.
Index Terms: Breast screening, cancer, machine learning, neural networks, prediction, screening attendance.
BREAST cancer is the most common cancer for women in North America. In the U.K., over 40 000 women are diagnosed with breast cancer each year. Mortality due to breast cancer is also one of the highest in the world, and is the second highest of all cancers in Canada. Breast cancer should ideally be diagnosed at the earlier stages of its development to considerably reduce mortality. Possible treatments include removing or destroying the cancer cells to avoid the spread of the affected cells. Breast self-examination is an effective and noninvasive way of checking for any lumps in the breast tissue. Unfortunately, its effectiveness greatly depends on the size of the lump and on the woman's technique and experience in carrying out a self-examination. An ultrasound test, which examines breast tissue using sound waves, can be utilized to detect lumps, but this is usually suited for women aged below 35 owing to the higher density of breast tissue.
Manuscript received April 11, 2010; revised October 12, 2010; accepted December 23, 2010. Date of publication January 6, 2011; date of current version March 4, 2011. This work was supported in part by the NHS Cancer Screening Programs, U.K.
V. Baskaran and A. Guergachi are with Ryerson University, TRSM, Toronto, ON M5B 2K3, Canada (e-mail: firstname.lastname@example.org; email@example.com).
R. K. Bali is with the KARMAH Group, Health Design and Technology Institute (HDTI), Coventry University, Coventry University Technology Park, Coventry CV1 2TT, U.K. (e-mail: firstname.lastname@example.org).
R. N. G. Naguib is with the Biomedical Computing and BIOCORE, HDTI, Coventry University, Coventry University Technology Park, Coventry CV1 2TT, U.K. (e-mail: email@example.com).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TITB.2010.2103954
Having a tissue biopsy via a fine needle aspiration or an excision is often used to examine the cells histopathologically and to diagnose whether the growth or lump is benign or cancerous. These investigations are mostly employed in treatment or post-treatment examination and as second-rung diagnostic confirmation methods. Performing a computed tomography or an MRI scan would result in a thorough examination of the breast tissue, but these techniques are not favored for reasons that include cost, the need for preparation, noise, time, and images that may not be clear.
Mammography is a technique for detecting breast tissue lumps using a low dosage of X-ray. This technique can detect a lump as small as 3 mm. The X-ray image of the breast tissue is captured and the image is thoroughly read by experienced radiologists and specialist mammogram readers. Preliminary research suggests that women aged 50 and above are more susceptible to breast cancer; mammography is more suited to women in this age range due to the lower density of breast tissue. Even though mammography has its critics, mainly due to its high rate of false positives and false negatives, it has become the standard procedure for screening women by the NHS National Breast Screening Program in the U.K. Mammography is the best and most viable tool for mass screening to detect cancer in the breast at an early stage; however, the effectiveness of diagnosis through screening is directly dependent on the percentage of women attending the screening program. The NHS Breast Screening Program, catering to the entire eligible female population, is funded by the Department of Health in the U.K. It covers 2.5 million women every year and detected nearly 16 500 cancers in the screened population for the year 2007–2008. Currently, the screening program routinely screens women between the ages of 50 and 70.
Early breast cancer detection through screening is fundamental for increasing the efficacy of cancer treatment. Mammography has been accepted as the best and most economically viable tool for population screening. Maximizing coverage for the target population is crucial for the success of such screening programs. Currently, the breast cancer screening attendance rates are below expectations in many countries that have publicly funded healthcare programs. This paper proposes a set of protocols to increase breast screening attendance for the U.K.'s NHS breast screening program. Based on these protocols, a new software prototype was created and tested. The prototype tests the prediction algorithm and shares the prediction results with multiple healthcare stakeholders for initiating opportunistic interventions on nonattendees. This prototype is a radical new idea that uses machine learning techniques for predicting screening attendance and shares this knowledge by adopting the health informatics initiative of the NHS.
1089-7771/$26.00 © 2011 IEEE
II. CHALLENGE
The NHS Breast Screening Program Annual Review (2008) states that, out of invited women, only 74% attend the screening program. This sizeable nonattendance could result in missed cancer detection for nearly 4 000 women (based on the cancer detection rate within screened women). This large percentage of nonattendance not only results in loss of life due to breast cancer but also results in loss of screening resources through costly imaging equipment lying idle, underutilization of specialist-imaging expertise, wasted screening slots, and so forth. Screening units are unable to arrange buffered attendees for the idle slots since the units do not know a priori which women will attend and which will not. In addition, there is a sizeable cost involved in sending repeat screening appointment letters to nonattending women.
Reasons for nonattendance may be largely attributed to disinterest in attending a mammography session, prior or current medical problems, and fear of X-rays. These reasons can be countered by proper education provided to women. Education has to be directed at explaining the advantages and importance of screening and at removing the sociocultural and personal barriers. Other possible options include offering women convenient times, places, and dates to encourage their attendance.
In spite of the expedient measures provided to the women, nonattendance has been a grave concern for the NHS National Screening Program. This scenario can be properly addressed if those women who are unlikely to attend a screening appointment can be identified in advance, so that additional resources can be directed at interventions that can increase screening attendance.
A proposal enumerating the complete software solution is summarized at the end of Section IV. The National Screening Program has been constantly striving to provide better services to the public, and one of the new enhancements offered by the screening services is to increase the screening age limit from 64 to 70. This effectively increases the number of screening episodes and augments the need for effective use of the already stretched NHS resources. All the aforementioned factors underline the need to increase breast screening attendance.
III. SOLUTION PROPOSED
To address these challenges, a set of protocols was developed as part of the ongoing research. The protocols are based on two components: 1) machine learning algorithms for knowledge creation; and 2) health informatics for knowledge sharing. This paper elaborates on how the prediction-based knowledge was created through a machine learning algorithm. Machine learning [Artificial Intelligence (AI)-based algorithm] was implemented through the creation of prototype software based on open source technologies. The prototype software was automated to produce the preprocessed data and eventually normalize the data for neural network (AI) assimilation. These activities were performed sequentially without human involvement for repeatability, reliability, and accuracy.
Fig. 1. Data filtering, preparation, and preprocessing.
The AI-based neural network incorporates all additional transformations that occurred within the screening process (including the change in the screening upper age limit). The prototype framework was called JAABS (Java-based attendance prediction by AI for breast screening). The prototype combines the demographic data pertaining to the nonattending women and information related to their family physician as a package. This package then triggers the generation of an electronic message based on the Health Level 7 (HL7) standards and utilizes web services as the message delivery technology. This paper focuses on the machine learning techniques used within the prototype and subsequent testing of the algorithm for its prediction accuracy.
A. Data Preprocessing Module
The prototype was constructed using two main modules: 1) data preprocessing module; and 2) AI module. The data preprocessing module (see Fig. 1) includes a screening office module that accomplishes data extraction from the screening unit's database. The demography details for the three-year call/recall were downloaded (extraction date: Jan 2008) from the local health care authority's database. The downloading is effected via the health link network onto a standalone system within the breast screening unit. The historical data related to screening, appointments, and results pertaining to screening women are retained within the screening unit's Massachusetts General Hospital Utility Multi-Programming System (MUMPS) database. MUMPS, also known as the Oxford system, is one of the earliest programming languages, in use since the 1960s. This language was extensively employed to write database applications explicitly for the healthcare domain.
Pseudo-code 1. Pseudo-code for filtering raw data and preprocessing it to generate predictor attributes and classify them based on their episode details.
The MUMPS database is based on the disk operating system (DOS) and employs a character-based user interface for database interrogation. The cumbersome DOS-based system is prone to erroneous data entry and hence warranted a change in the system. A new software package, the National Breast Screening Computer System (NBSS), was developed in 2002–2003 to address these issues. This NBSS consists of a Visual Basic (VB) front end connected to a Caché database, which is seamlessly integrated with the MUMPS database. Together, these factors produced an unstable environment that considerably complicated data extraction for the current research. The screening office module (see Fig. 1) is executed with the existing software programs available in the breast screening office.
The VB front end made data extraction from the MUMPS database straightforward through Structured Query Language (SQL) queries directed at the Caché database. Currently, the breast screening office employs Crystal Reports (CR) as part of the NBSS to generate reports for all the screening activities, including screening, administration, invitation, etc. Part of the data preprocessing was implemented through the CR software. The screening unit had earlier indicated that the routine functioning of the screening office should not be affected during the data extraction process.
Hence, prior to data extraction, a CR template was created to reflect the format of the data to be exported (see pseudo-code 1). This template was used to export the data as a flat file to avoid any system instability. All the screening units around the country were expected to have at least a minimal facility for creating datasets in a flat file format. Coupled with this, a low overhead on the existing IT system and minimal additional complexity were considered fundamental for the prototype. This rationale strengthened the case for a compromise strategy that exports data as a flat file, so that the mode of data transfer can be standardized across the country with minimal or no interrogation of the screening database.
The SQL query generated the demographic and episode details for all the women across multiple records. The demographic data were incomplete: only the first record of a particular woman held the complete demographic dataset, and her remaining records corresponded to the historical episode details (see Table I). The women's addresses and names were excluded from the study to address data protection and maintain anonymity. Although this personal information is needed by the messaging module, the complete dataset was generated without it. The postcode of the women is indispensable for the current study, as it generates important predictor variables in the form of the Townsend reference (the Townsend deprivation score denotes the socioeconomic status of a given postcode) and the post annum number.
TABLE I. THIRTEEN-YEAR DATASET DETAILS
To address this without compromising the research work, variables related to postcode, such as the Townsend score, post annum (an arbitrary number associated with the woman's postcode), and screening distance, were all processed to generate categorical variables within the screening unit, and then the data were ported to the AI module. The individual women were identified by their SX number (pseudo-anonymised unique identifier). The AI module generated the attendance prediction, which formed the core of the knowledge transfer. The recipient of the knowledge transfer is the woman's family physician; hence, family physician information in the form of surname, surgery address, and postcode was later collated for sending the HL7-based message.
Pseudo-code 2. Pseudo-code for the AI module and results collation for the final output.
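The conversion of postcode-derived values into categorical variables inside the screening unit can be pictured with a short sketch. It is written in Python for brevity and is not the authors' Java code; the band edges, the field names (`townsend`, `distance_km`), and the helper names `to_band` and `anonymise` are all assumptions, since the paper does not publish the cut-offs it used.

```python
from bisect import bisect_right

# Hypothetical band edges; the paper does not state the real cut-offs.
TOWNSEND_EDGES = [-2.0, 0.0, 2.0, 4.0]
DISTANCE_KM_EDGES = [2.0, 5.0, 10.0]


def to_band(value, edges):
    """Return the 0-based index of the band that contains value."""
    return bisect_right(edges, value)


def anonymise(row):
    """Keep the SX identifier and band indices only; the postcode is dropped."""
    return {
        "sx": row["sx"],
        "ts_band": to_band(row["townsend"], TOWNSEND_EDGES),
        "dist_band": to_band(row["distance_km"], DISTANCE_KM_EDGES),
    }
```

Only the banded values and the SX number leave this step, which mirrors the intent of keeping the raw postcode within the screening unit.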
One Record object was associated with one or more Episode objects (see Fig. 2). The gaps in the demographic record had to be filled, and the episode details had to be associated with the woman's demographic data. Exhaustive analyses of the data indicated that the CR report had duplicate episode details, which had to be removed before further processing (see Table I). Each record read from the CR report first has to be partitioned into episode details and stored as Episode objects. These are finally collated and associated with the woman's demographic details (represented as a Record object). In addition to this, all the records have to be automatically validated. The earlier work by Arochena had identified all the contributing predictor attributes through comprehensive statistical analyses. After generating the required attributes, the preprocessor module classifies the Record objects based on the number of Episode objects they contain (see Fig. 2). This dataset was then written as an in-process flat file for reference. All errors generated during the execution of the preprocessing module are written to an error log, which is also saved as a flat file for future reference.
Fig. 2. UML class diagram for data preprocessing module (with I/O processing submodule).
TABLE II. DATASET SPREAD ACROSS THE EPISODES AND ITS TRIFURCATED DATA
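The record handling described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' Java implementation: the row layout (`sx`, `age_band`, `episode_no`, `attended`) and the function name `build_records` are hypothetical, and only the Record and Episode names come from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Episode:
    number: int
    attended: bool


@dataclass
class Record:
    sx: str                    # pseudo-anonymised identifier
    age_band: str = ""         # stand-in for the demographic details
    episodes: list = field(default_factory=list)


def build_records(rows):
    """Group flat-file rows into Records, dropping bad and duplicate episodes."""
    records, errors = {}, []
    for row in rows:
        sx = row["sx"]
        rec = records.setdefault(sx, Record(sx))
        # Fill the demographic gap from whichever row carries the value.
        if row.get("age_band") and not rec.age_band:
            rec.age_band = row["age_band"]
        if row.get("episode_no") is None or row.get("attended") is None:
            errors.append(f"missing episode data for {sx}")
            continue
        episode = Episode(int(row["episode_no"]), bool(row["attended"]))
        if episode not in rec.episodes:   # skip duplicate episode details
            rec.episodes.append(episode)
    # Classify Records by the number of Episodes they contain.
    by_count = defaultdict(list)
    for rec in records.values():
        if rec.episodes:
            by_count[len(rec.episodes)].append(rec)
    return records, dict(by_count), errors
```

The returned `errors` list plays the role of the error log, and `by_count` corresponds to the classification of Record objects by episode count.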
The data preprocessing module identified episodes with missing data and removed them from the study. In total, 2% (9 799) were removed as records with missing data (see Table I). It further deleted almost 3% (15 778) of the total records due to duplicate entries. The valid records constituted 86% (159 412) of the extracted dataset; on average, each record had 3.2 episodes. Table II depicts the spread of data for each episode. The highest number of records was reached for the fourth episode. The first to fifth episodes had an average of 31 000 records. For the remaining episodes (sixth, seventh, and eighth) the average was only 800 records. This might have a significant impact on the actual prediction capacity of the JAABS algorithm for these episodes.
B. AI Module
JAABS is the new algorithm designed and developed in a Java environment. As the design process was largely evolutionary, a modular design strategy was selected. This assists in parallel development of the implementation and also enables testing as modules rather than as one single monolithic program. The modular design also ensured that any additions or changes happening within the screening unit's business logic can be implemented without affecting the other modules (see pseudo-code 2). The AI module encompasses the data normalizer, the neural networks, and the results collator (see Fig. 3). The Java-based algorithm implements two different neural networks: a feed-forward back-propagation neural network (BPNN) and a radial basis function neural network (RBFN).
Fig. 3. UML class diagram of JAABS algorithm showing back propagation-based neural network and radial-basis function-based neural network.
The neural network algorithm requires the input data vector to be encoded as binary values; hence, the input data are normalized. The input data in the RBFN are first passed through a radial basis function algorithm to identify the clusters and assign a radius for cluster classification. The cluster centers are calculated and the real-time data are checked against these established cluster centers. Once the distance is calculated, the input dataset is associated with its nearest cluster. These data then trigger a neural network for performing the prediction on attendance. Each episode has a different set of predictor attributes; hence, each episode is fed through separate neural networks that were trained with their respective training datasets.
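The normalisation and nearest-cluster steps can be illustrated with a small sketch. It assumes one-hot encoding for the binary input vector, Euclidean distance to the cluster centres, and a Gaussian basis function; the paper does not specify any of these choices, and the function names are hypothetical.

```python
import math


def one_hot(index, size):
    """Encode a category index as a binary vector of the given size."""
    vec = [0] * size
    vec[index] = 1
    return vec


def nearest_cluster(x, centres):
    """Return (index, distance) of the centre closest to x (Euclidean)."""
    dists = [math.dist(x, c) for c in centres]
    i = min(range(len(centres)), key=dists.__getitem__)
    return i, dists[i]


def rbf_activation(distance, radius):
    """Gaussian response: 1.0 at the centre, decaying with distance."""
    return math.exp(-(distance ** 2) / (2 * radius ** 2))
```

For example, a woman in category 1 of a three-level attribute and category 0 of a two-level attribute is encoded as `one_hot(1, 3) + one_hot(0, 2)`, and that vector is then matched against the trained cluster centres.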
The results module collects the collated prediction for each episode and submits it to a Pooler-based classifier (see Fig. 4). The Pooler finds the best prediction for the given episode and generates the final prediction output based on the confidence value of the prediction. This is fed into the prediction result collator for all the inputs (women) based on each episode. The consolidated result is used to generate the nonattendance list, which is written as a flat file for processing by the messaging module for message generation. The final output is associated with the woman's SX number so that general physician details can be added for knowledge sharing and to initiate physician intervention.
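The pooling step can be illustrated with a short sketch. The paper does not give the Pooler's exact rule beyond its reliance on prediction confidence, so the version below simply assumes that the prediction with the highest confidence wins; the tuple layout and the function names `pool` and `nonattendance_list` are hypothetical.

```python
def pool(predictions):
    """Pick the (model, label, confidence) tuple with the highest confidence."""
    return max(predictions, key=lambda p: p[2])


def nonattendance_list(per_woman):
    """Map SX number -> candidate predictions; return SX numbers pooled as nonattend."""
    flagged = []
    for sx, predictions in per_woman.items():
        _, label, _ = pool(predictions)
        if label == "nonattend":
            flagged.append(sx)
    return sorted(flagged)
```

In this sketch the sorted list of SX numbers stands in for the flat file that is handed to the messaging module.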
The predictor attributes (PA: post annum is an arbitrary number associated with the woman's postcode, TS: Townsend deprivation score denotes the socioeconomic status of a given postcode, AttBin: previous episodes attendance, NumTest: number of tests in the previous episodes, Cancer: denotes if cancer was diagnosed in previous episodes, FP: false positive in previous episodes, HFP: history of false positive, HC: history of cancer, AttTypeBin: type of attendance like first or later episodes, AgeBand: age categories, Slip: difference in days between screening appointment and actual screening date, ScrDist: distance traveled by the women for getting a mammogram) were initially verified for their association with the screening attendance (see Table III). The variables, being categorical, were analyzed through parameters such as Lambda, Uncertainty, Phi (φ), Cramér's V, and Contingency (confidence level at 95%).
Fig. 4. Machine learning algorithm containing artificial intelligence and results module.
TABLE III. PREDICTOR ATTRIBUTES AND THEIR ASSOCIATION TO THE SCREENING ATTENDANCE EPISODE WISE
These tests for association were conducted to establish some kind of linear relationship between the dependent and independent variables. Even where an association was not strong, it was used only to establish some form of relationship between the variables. This served as an indication and as a first step for resolving the real problem space, which is multispatial. This strategy assisted in filtering out the nonparticipating attributes and in reducing the introduction of background noise.
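As an illustration of one of the association measures named above, Cramér's V can be computed from a contingency table of a categorical predictor against attendance. The sketch below uses the standard chi-square-based definition; it assumes every row and column of the table has at least one observation, and it is not taken from the authors' analysis.

```python
import math


def chi_square(table):
    """Return (chi-square statistic, total count) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n


def cramers_v(table):
    """Cramér's V: 0 means no association, 1 means perfect association."""
    chi2, n = chi_square(table)
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2 / (n * k))
```

For a 2 x 2 table this value coincides with the absolute value of Phi, so the same routine covers both measures in the binary case.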
Episode 1 lacked the historical variables and had to rely only on demographic details. The rest of the episodes have both the demographic and historical attributes as predictors; in particular, the new attribute in the form of screening distance was found to increase the prediction efficiency for all the episodes. The JAABS algorithm and its predictor attributes were compared with its predecessor [AI-based attendance prediction algorithm (AI-ATT)] for validation. The AI-ATT algorithm was developed in a visual modeling environment, Clementine. This off-the-shelf software assisted in designing and implementing the algorithm rapidly, but created new functional challenges, such as the need to license the software for all the screening units, the need for a specialist to run the algorithm (as it was not automated), and its reliance on outdated data and semantics (1989–2001), to name just a few.
TABLE IV. ROC FOR ALL EPISODES: AI-ATT AND JAABS (JAVA AND CLEMENTINE)
AI-ATT provided a baseline for comparison and a reference for validating the JAABS algorithm. To make the validation more up-to-date, the same dataset that was applied to the JAABS algorithm was also tested on Clementine (version 12.0). The dataset was trifurcated into training, validating, and test sets (see Table II). The training set contained equal numbers of women categorized as attendees and nonattendees. The validating set contained data that were never exposed during the training and contained an equal number of attendees and nonattendees. The test set contained skewed data, where nonattendees were only a small proportion. This ensures that the test set reflects the real-time dataset, which would also be skewed (fewer nonattendees). The JAABS algorithm was tested with the complete set of episodes after appropriate training and validation.
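One way to obtain such a split is sketched below: the training and validating sets take equal numbers from each class, and everything left over forms the skewed test set. The per-class sizes, the fixed random seed, and the function name `trifurcate` are assumptions made for the example; the paper reports the actual set sizes only in Table II.

```python
import random


def trifurcate(attendees, nonattendees, n_train, n_valid, seed=0):
    """Split into balanced train/validation sets and a skewed test set.

    n_train and n_valid are per-class sizes; the remainder goes to test.
    """
    rng = random.Random(seed)
    att, non = list(attendees), list(nonattendees)
    rng.shuffle(att)
    rng.shuffle(non)
    need = n_train + n_valid
    if min(len(att), len(non)) < need:
        raise ValueError("not enough records in the smaller class")
    train = att[:n_train] + non[:n_train]
    valid = att[n_train:need] + non[n_train:need]
    test = att[need:] + non[need:]      # keeps the natural class imbalance
    return train, valid, test
```

Because the three slices of each shuffled list do not overlap, no record seen in training can reappear in the validating or test sets.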
Fig. 5. ROC curve for Episodes one to eight for the machine learning algorithm.
The receiver operating characteristic (ROC) results are summarized in Table IV (ACC: accuracy, NPV: negative predictive value, PPV: positive predictive value, SPC: specificity, SEN: sensitivity). The algorithm's final prediction of the screening attendance was based on a polling strategy that relies on the prediction confidence. The accuracy of the algorithm was around 68% for the first three episodes. Episode 4 had the maximum accuracy at 79%, closely followed by the fifth episode. The accuracies of the sixth and seventh episodes were lowest (57% and 51%, respectively). The NPV was the maximum at 51% for the fifth episode. The rest of the episodes had NPV values between 41% and 47%.
Episode 7 had the lowest NPV (30%). These lower NPVs were expected, as the proportion of nonattendees was smaller in the test set (unbalanced). The PPVs for the fourth and fifth episodes were higher, between 83% and 87%. The remaining episodes had values in the seventies range, except for the sixth episode where it was 64%. Specificity was highest for the seventh episode at 60%, but this may not be a true indicator as this episode had only 238 records in total. The next highest value was in the fifth episode at 49%. Episodes 1, 2, and 6 had values between 40% and 45%. Episodes 3 and 4 had lower values at 26% and 37%, respectively. The sensitivity was around 80% for the first four episodes, peaking at 85% for Episode 3. The larger the training set of records, the higher the sensitivity values. Since the previous algorithm (AI-ATT) had only four episodes, the averages for the first four episodes were used for comparing the JAABS and AI-ATT algorithms. The same set of attributes, when presented to commercial software (Clementine), generated improved results (see Table IV).
With the JAABS-Clementine model, the first three episodes show an almost 10% increase in accuracy. Similarly, the later episodes (Episodes 4 and 5), when predicted by the JAABS-Clementine model, on average do 6% better than the JAABS-Java algorithm, whereas Episodes 6 and 7 illustrated the maximum difference in accuracy (10–27%); this shows that the commercial software performed better even with a reduced training dataset. The NPV was lowest for the first episode, but was double that of AI-ATT and nearly 10% more than JAABS (Java). The NPV for the rest of the episodes (second to fifth) was around 73%. The remainder (sixth and seventh) were at 63% and 86%, respectively. The NPV is the metric that corresponds to the prediction of nonattendance, and this was much better than that achieved by the AI-ATT. Specificity is the next important measure, and tests on Clementine showed promising results for all the episodes except for the first one.
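For reference, the five measures reported in Table IV follow directly from the confusion-matrix counts. The sketch below uses the standard definitions and assumes that attendance is the positive class, which is consistent with the paper's statement that NPV corresponds to the prediction of nonattendance; the counts in the usage example are invented for illustration and are not results from the study.

```python
def roc_summary(tp, fp, tn, fn):
    """Standard confusion-matrix measures, with attendance as the positive class."""
    return {
        "ACC": (tp + tn) / (tp + fp + tn + fn),  # overall accuracy
        "PPV": tp / (tp + fp),                   # predicted attendees who attend
        "NPV": tn / (tn + fn),                   # predicted nonattendees who do not attend
        "SEN": tp / (tp + fn),                   # attendees correctly identified
        "SPC": tn / (tn + fp),                   # nonattendees correctly identified
    }


# Illustrative counts only: 100 attendees and 20 nonattendees in a skewed test set.
summary = roc_summary(tp=80, fp=10, tn=10, fn=20)
```

With these invented counts, misclassified attendees (`fn`) outnumber the correctly identified nonattendees (`tn`), which shows how a skewed test set can pull NPV down even when accuracy and sensitivity look acceptable.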
The ROC curves for JAABS (Clementine) showed good prediction characteristics for all episodes except for Episode 1 (see Fig. 5). From the model's performance perspective, all these prediction characteristics were positive. The AI model proposed (JAABS, implemented in both Java and Clementine) was consistent and even outperformed the earlier model (AI-ATT) in many aspects. This could be attributed to the larger database, the more complete attribute set, and the new predictor variable (screening distance) assisting in improving the algorithm's efficiency. The knowledge creation by applying AI (JAABS) is not only consistent, repeatable, and economical, but also ensures minimal human intervention. This is ideal for automating the whole process.
The proposed AI network (JAABS) for predicting screening nonattendance would be incorporated in a new breast screening software model that connects to the screening database to generate the screening batch. Based on the prediction, an automated message would be sent to the woman's healthcare stakeholders (GPs, nurses, and other clinical specialists). These messages would be assimilated by the clinical system used by the stakeholders and would eventually flag the woman as a nonattendee. When a woman's clinical record is opened, a flag/pop-up window would trigger opportunistic interventions that are aimed at educating the woman. This knowledge transfer would empower the woman to make an informed decision toward screening. This multistakeholder-based opportunistic intervention strategy would increase the overall breast screening attendance.
V. CONCLUSION
This paper discussed the details of how a machine learning-based prediction tool can be effectively applied to increase breast cancer screening attendance. The need for a high degree of automation was highlighted to simplify the algorithm's adoption; such automation would also reduce overheads and make integration as seamless as possible. From the model's performance perspective, all the prediction characteristics were positive. The proposed machine learning-based AI model (JAABS, implemented in both Java and Clementine) was consistent and even outperformed the earlier model (AI-ATT) in many aspects. The performance improvement could be attributed to the larger database, the more complete attribute set, and the new predictor variable (screening distance). The knowledge creation by applying AI (JAABS) is not only reliable, repeatable, and economical, but also ensures minimal human intervention. There is still scope for improving the prediction efficiency, and this can be achieved through better predictor attributes and/or improved machine learning techniques. The former would be difficult to achieve as the data source itself may not be available, but the latter would be possible, as better AI models, such as support vector machines, fuzzy logic, and genetic algorithms or a combination of these, would enable further investigation for increasing the efficiency.
The authors would like to thank J. Patnick CBE, Director, NHS Cancer Screening Programs (U.K.), for funding this research, Dr. M. Wallis, Consultant Radiologist, Cambridge Breast Unit team, and Margot Wheaton, Program Manager for the Warwickshire, Solihull and Coventry Breast Screening Service at Coventry and Warwickshire Hospital, for their excellent support and guidance throughout this research.
American Cancer Society. (2010, Feb. 10). Breast Cancer Facts & Figures 2009–2010 [Online]. Available: http://www.acsevents.org/downloads/STT/F861009_final%209-08-09.pdf
 Cancer Research U.K. (2010, Feb. 10). Breast CancerU.K.Mortality Statistics. [Online]. Available: http://info.cancerresearchuk.org/cancerstats/types/breast/mortality/index.htm.
 NHS Breast Screening ProgrammeCancer Screening ProgrammesAnnual Review 2009. (2010, Feb. 10). [Online]. Available: http://www.cancerscreening.nhs.uk / breastscreen / publications / nhsbsp-annualreview2009.pdf.
 K. Turner, J. Wilson, and J. Gilbert, Improving breast screening uptake:Persuading initial non-attenders to attend, J. Med. Screening, vol. 1,pp. 199202, 1994.
 A. Majeed, R. Given-Wilson, and E. Smith, Impact of follow up letterson non-attenders for breast screening: A general practice based study, J.Med. Screening, vol. 4, pp. 1920, 1997.
 J. P. Sin and A. S. Leger, Interventions to increase breast screeninguptake: Do they make any difference?, J. Med. Screening, vol. 6, no. 1,pp. 170181, 1999.
 Canadian Cancer Society. (2006). Canadian Researchers Find CommonBreast Cancer Chemotherapy Regime Inferior at Preventing DiseaseRecurrence [Online]. Available: http://www.cancer.ca/Canadawide/About%20us/Media%20centre/CW-Media%20releases/CW2006/Canadian%20Researchers% 20Find % 20Common % 20Breast % 20Cancer % 20Chemotherapy % 20Regime % 20Inferior % 20at%20Preventing%20Disease%20Recurrence.aspx?sc_lang=en.
 Canadian Cancer Society. (2008, Mar. 22). Canadian Cancer Statistics2008 [Online]. Available: http://www.cancer.ca/Canada-wide/About%20cancer/Cancer%20statistics//media/CCS/Canada%20wide/Files%20List/English%20files%20heading/pdf%20not%20in%20publications%20section/Canadian%20Cancer%20Society%20Statistics%20PDF%202008_614137951.ashx.
 A. Oikonomou, S. A. Amin, R. N. G. Naguib, A. Todman, and H.Al-Omishy, Breast self examination training through the use of mul-timedia: A prototype multimedia application, IEEE Eng. Med. Biol.Soc., vol. 2, no. 21, pp. 295298, 2003.
 B. V. Marcela, The system does work, J. Am. College Radiol., vol. 1,no. 6, pp. 438440, 2004.
 L. Wyld, Mammographic Breast Screening in Elderly Women, in Man-agement of Breast Cancer in Older Women, part 3, M. W. Reed and R.A. Audisio, Eds. London, U.K.: Springer, 2010, ch. 9, pp. 127142.
 R. G. Blanks, S. M. Moss, C. E. McGahan, M. J. Quinn, and P. J. Babb,Effect of NHS breast screening programme on mortality from breastcancer in England and Wales, 19901998: Comparison of observed withpredicted mortality, BMJ, vol. 321, no. 7262, pp. 665669, 2000.
S. S. Epstein, The Politics of Cancer. New York: Doubleday, 1979, pp. 537.
G. Burton, Alternative Medicine. Washington, DC: Future Medicine Publishing, 1997.
Cancer Research U.K. (2007, Jul. 14). Cancer Incidence - U.K. Statistics [Online]. Available: http://info.cancerresearchuk.org/cancerstats/incidence/index.htm
P. Forest, Breast Cancer Screening - A Report to the Health Ministers of England, Scotland, Wales and Northern Ireland. London, U.K.: HMSO, 1986.
Medicine net. (2010, Feb. 18). Breast Cancer [Online]. Available: http://www.medicinenet.com/breast_cancer/page3.htm
I. Pirjo, L. Kauhava, I. Parvinen, H. Helenius, and P. Klemi, "Customer fee and participation in breast cancer screening," The Lancet, vol. 358, p. 1425, 2001.
S. H. Woolf, "The 2009 Breast Cancer Screening Recommendations of the US Preventive Services Task Force," JAMA, vol. 303, no. 2, pp. 162–163, 2010.
American Cancer Society Inc. (2010, Feb. 18). Cancer Reference Information [Online]. Available: http://www.cancer.org/docroot/CRI/CRI_2_5x.asp?dt=5
D. P. Weller and C. Campbell, "Uptake in cancer screening programmes: A priority in cancer control," Brit. J. Cancer, vol. 101, pp. 55–59, 2009.
Y. Zheng, "Breast cancer detection with Gabor features from digital mammograms," Algorithms, vol. 3, pp. 44–62, 2010.
K. W. Eilbert, K. Carroll, J. Peach, S. Khatoon, I. Basnett, and N. McCulloch, "Approaches to improving breast screening uptake: Evidence and experience from Tower Hamlets," Brit. J. Cancer, vol. 101, no. 2, pp. 64–67, 2009.
D. Schopper and C. de Wolf, "How effective are breast cancer screening programmes by mammography? Review of the current evidence," Eur. J. Cancer, vol. 45, no. 11, pp. 1916–1923, Jul. 2009.
E. S. Cassandra, "Breast cancer screening: Cultural beliefs and diverse populations," Health Soc. Work, vol. 31, no. 1, pp. 36–43, 2006.
NHS Cancer Screening Programmes. (2007, Apr.). Disclosure of Audit Results in Cancer Screening Advice on Best Practice (Cancer Screening Series 3), J. Patnick, Ed. [Online]. Available: http://www.cancerscreening.nhs.uk/publications/cs3.pdf
K. Okane. (2005, Apr. 20). Mumps Language Bioinformatic Database Resources [Online]. Available: http://bioinformatics.org/forums/forum.php?forum_id=1035
V. Baskaran, R. K. Bali, R. N. G. Naguib, and H. Arochena, "A Knowledge Management approach to increase uptake in a breast screening programme," presented at the IEEE 2nd Humanoid, Nanotechnology, Information Technology, Communication and Control, Environment and Management (HNICEM) Int. Conf., Philippines, Mar. 2005.
S. Tarver, K. Cronin-Cowan, and M. E. Wheaton, "A pilot's life for us," Breast Cancer Res., vol. 6, suppl. 1, p. 52, 2004.
H. E. Arochena, "Modelling and prediction of parameters affecting attendance to the NHS breast cancer screening programme," Ph.D. dissertation, Dept. Comp. Sci., Coventry Univ., Coventry, U.K., 2003.
C. Bankhead, S. H. Richards, T. Peters, D. Sharp, R. Hobbs, J. Brown, L. Roberts, C. Tydeman, V. Redman, J. Formby, S. Wilson, and J. Austoker, "Improving attendance for breast screening among recent non-attenders: A randomised controlled trial of two interventions in primary care," J. Med. Screening, vol. 8, no. 2, pp. 99–105, 2001.
Vikraman Baskaran received the AMIE, M.Sc., and Ph.D. degrees.
He is currently an Assistant Professor at the School of Information Technology Management of Ryerson University, Toronto, ON, Canada. His research interests include finding viable applications of the knowledge management paradigm in healthcare. His special interest in developing HL7 messaging and health informatics has provided opportunities to excel in these fields. His current activities include KM, e-health, artificial intelligence, and healthcare informatics.
He is a member of HL7 U.K. and Canada.
Aziz Guergachi received the B.Eng., B.Sc., and Ph.D. degrees.
He is currently an Associate Professor at the Ted Rogers School of Information Technology Management of Ryerson University, Toronto, ON, Canada. Prior to becoming part of the Ryerson community, he was involved in the development of a large software system for trade promotion management and collaborative sales forecasting. His current research interests include advanced system modeling and machine learning with applications to business management and engineering systems.
He is the recipient of the New Opportunities Award of the Canada Foundation for Innovation and currently runs a research laboratory for advanced systems modeling.
Rajeev K. Bali (SM'03) received the B.Sc. (Hons.), M.Sc., Ph.D., PgC, and SMIEEE degrees.
He is currently a Reader in Healthcare Knowledge Management at Coventry University, U.K. His main research interests include clinical and healthcare knowledge management (from both technical and organisational perspectives). He has published in peer-reviewed journals and is the author/editor of several textbooks on healthcare knowledge management.
He serves on various editorial boards and conference committees and is regularly invited to deliver presentations and speeches internationally.
Raouf N. G. Naguib (SM'97) received the B.Sc., M.Sc., Ph.D., DIC, SMIEEE, MIET, MIPEM, CEng., and CSci. degrees.
He is currently a Professor of Biomedical Computing and Head of BIOCORE, Coventry, U.K. Prior to this appointment, he was a Lecturer at Newcastle University, Newcastle Upon Tyne, U.K. He has published more than 240 journal and conference papers and reports on many aspects of biomedical and digital signal processing, image processing, artificial intelligence, and evolutionary computation in cancer research.
Mr. Naguib was awarded the Fulbright Cancer Fellowship in 1995–1996, when he carried out research at the University of Hawaii, Manoa, on the applications of artificial neural networks in breast cancer diagnosis and prognosis. He is a member of several national and international research committees and boards.