
Chapter 17

An Advanced Probabilistic Framework for Assisting Screening Mammogram Interpretation

Marina Velikova1,*, Nivea Ferreira1, Maurice Samulski2, Peter J.F. Lucas1, and Nico Karssemeijer2

1 Institute for Computing and Information Sciences, Radboud University Nijmegen, 6525 ED Nijmegen, The Netherlands

2 Department of Radiology, Radboud University Nijmegen Medical Centre, 6525 GA Nijmegen, The Netherlands

Abstract. Breast cancer is the most common form of cancer among women world-wide. One in nine women will be diagnosed with a form of breast cancer in her lifetime. In an effort to diagnose cancer at an early stage, screening programs have been introduced that use periodic mammographic examinations in asymptomatic women. In evaluating screening cases, radiologists are usually presented with two mammographic images of each breast, as a cancerous lesion tends to be observed in different breast projections (views). Most computer-aided detection (CAD) systems, on the other hand, analyse single views independently and thus fail to account for the interaction between the views; as a result, breast cancer detection can be obscured by a lack of consistency in lesion marking. This limits the usability of and the trust in the performance of such systems. In this chapter, we propose a unified Bayesian network framework for exploiting multi-view dependencies between the suspicious regions detected by a single-view CAD system. The framework is based on a multi-stage scheme, which models the way radiologists interpret mammograms, at four different levels: region, view, breast and case. At each level, we combine all available image information for the patient obtained from a single-view CAD system using a special class of Bayesian networks: causal independence models. The results from experiments with actual screening data of 1063 cases, of which 383 were cancerous, show that our approach outperforms the single-view CAD system in distinguishing between normal and abnormal cases. This is a promising step towards the development of automated systems that can provide a valuable "second opinion" to screening radiologists for improved evaluation of breast cancer cases.

1 Introduction

According to the International Agency for Research on Cancer (IARC), breast cancer is the most common form of cancer among women world-wide, and evidence shows that early detection combined with appropriate treatment is currently the most effective strategy for reducing mortality rates.

* Corresponding author: [email protected]



Fig. 1. MLO and CC projections of a left and right breast (panels: MLO-Right, MLO-Left, CC-Right, CC-Left). The circle represents a lesion and the irregular areas represent regions, true positive and false positive, detected by a single-view CAD system

This is the reason why many countries have introduced breast cancer screening programs, which target women of a certain age, usually starting at the age of 50 years, aiming to detect the occurrence of breast cancer as early as possible. Breast cancer screening is widely seen as one of the keystones of breast cancer management, and an impressive decrease in breast cancer mortality in the age group invited for breast cancer screening has been shown [1].

The screening process involves carrying out an X-ray examination of the breasts and the reading of the resulting images, or mammograms, by trained radiologists. A screening mammographic exam usually consists of four images, corresponding to each breast scanned in two views: the mediolateral oblique (MLO) view and the craniocaudal (CC) view; see Figure 1. The MLO projection is taken at an angle of approximately 45° and shows part of the pectoral muscles. The CC projection is a top-down view of the breast.

Despite the high level of expertise of the radiologists involved in breast cancer screening, the number of misinterpretations in the screening process is still high. For instance, it is not unusual to encounter cases of women who have breast cancer but whose lesions are missed during the reading process. This is due to the fact that diagnosis is subject to interpretation: either a lesion representing the cancer is not detected by the radiologist, or the lesion is identified but incorrectly evaluated. In other words, mammogram reading is not a straightforward task: image quality, breast density and a vague definition of the characteristics of breast cancer lesions are some of the factors that influence such reading.

In order to support radiologists in their observations and interpretation, computer-aided detection (CAD) systems are used. Most CAD systems have the purpose of helping to avoid detection errors. However, CAD systems can also be exploited to increase the radiologist's performance by providing some interpretation of mammograms [2].


In reading mammograms, radiologists judge the presence of a lesion by comparing views. The general rule is that a lesion is to be observed in both views (whenever available). CAD systems, on the other hand, have mostly been developed to analyse each view independently. Hence, the correlations in the lesion characteristics are ignored, and breast cancer detection can be obscured due to the lack of consistency in lesion marking. This limits the usability of and the trust in the performance of such systems. That is the reason why we develop a Bayesian network model that combines the information from all the regions detected by a single-view CAD system in the MLO and CC views to obtain a single probabilistic measure of the suspiciousness of a case. We explore multi-view dependencies in order to improve the breast cancer detection rate, not in terms of individual regions of interest but at the case level.

In detail, the power of our model relies on

– Incorporating background knowledge with respect to mammogram reading done by experts: multiple views are not analysed separately and, as a consequence, the correlations among corresponding regions in different views are preserved;

– Analysing beyond the features of given regions: the interpretation is extended up to the case level. In the end, an interpretation of whether a given case can, or cannot, be considered cancerous is our ultimate goal;

– The design itself, which represents a step forward towards a better understanding of the domain, clearly establishing the qualitative and quantitative information about the relevant concepts in this domain.

Our design of a Bayesian network model in the context of image analysis has important implications with respect to the structure and content of that network, i.e., the understanding of the variables being used and how they relate to one another. This is seen as one of the major contributions of the present work.

The remainder of the chapter is organised as follows. In the next section we introduce terminology of the domain, which is used throughout this chapter. In Section 3 we briefly review previous research in multi-view breast cancer detection. In Section 4 we introduce basic definitions related to Bayesian networks, and in Section 5 we describe a general Bayesian network framework for multi-view detection. The proposed approach is evaluated in an application to breast cancer detection using actual screening data. The evaluation procedure and the results are presented in Section 6. Finally, conclusions and directions for extending our model are given in Section 7.

2 Terminology

To get the reader acquainted with the terminology used in the domain of breast cancer and throughout this chapter, we next introduce a number of concepts. By lesion we refer to a physical cancerous object detected in a patient. We call a contoured area on a mammogram a region. A region can be true positive (for example, a lesion marked manually by a radiologist or detected automatically


by a CAD system) or false positive; see Figure 1. A region detected by a CAD system is described by a number of continuous (real-valued) features describing, for example, the size, density and location of the region. By link we denote a matching (established correspondence) between two regions in the MLO and CC views. The term case refers to a patient who has undergone a mammographic exam.

In addition, two main types of mammographic abnormalities are distinguished: microcalcifications and masses. Microcalcifications are tiny deposits of calcium and are associated with extra cell activity in breast tissue. They can be scattered throughout the mammary gland, which usually is a benign sign, or occur in clusters, which might indicate breast cancer at an early stage.

According to the BI-RADS definition [3], "a mass is a space occupying lesion seen in two different projections." When visible in only one projection, it is referred to as a "mammographic density". However, the density may be a mass, perhaps obscured by overlying glandular tissue in the other view, and if it is characterised by enough malignant features then the patient would be referred for further examination. For instance, many cancerous tumours may present as relatively circumscribed masses. Some might be sharply delineated or may be partly irregular in outline, while other cancers (which induce a reactive fibrosis) have an irregular or stellated appearance, resulting in a characteristic radiographic feature, the so-called spiculation. Masses may vary in size, usually not exceeding 5 cm. In terms of density, they appear denser (brighter on the mammogram) when compared to breast fat tissue.

Masses are the more frequently occurring mammographic type of breast cancer. In the context of screening mammography, there is strong evidence that misinterpretation of masses is a more common cause of missing cancer than perceptual oversight. Furthermore, applications have shown that CAD tends to perform better in identifying malignant microcalcifications than masses [4]. Masses are more difficult to detect due to the great variability in their physical features and their similarity to breast tissue, especially at early stages of development. Hence, the prompts of current CAD systems comprise not only the cancer but also a large number of false positive (FP) locations, an undesired result in screening, where reading time is crucial. Given the challenge of true mass identification, in this work we focus only on the detection of malignant masses.

3 Previous Research

The rapid development of computer technology in the last three decades has brought unique opportunities for building and employing numerous automated intelligent systems in order to facilitate human experts in decision making, to increase human performance and efficiency, and to allow cost savings. One of the domains where intelligent systems have been applied with great success is healthcare [5]. Healthcare practitioners usually deal with large amounts of information, ranging from patient and demographic to clinical and billing data. These data need to be processed efficiently and correctly, which often creates enormous pressure for the human experts. Hence, healthcare systems have emerged to


facilitate clinicians as intelligent assistants in diagnostic and prognostic processes, laboratory analysis, treatment protocols, and the teaching of medical students.

The initial research efforts in this area were devoted to the development of medical expert systems. Their main components comprise a knowledge base, containing specific facts from the application area; a rule-based system to extract the knowledge; an explanation module to support and explain the decisions made to the end user; and a user interface to provide natural communication between the system and the human expert. One of the early prominent examples is MYCIN, developed in the mid-1970s to diagnose and treat patients with infectious blood diseases caused by bacteria in the blood [6]. To date, the number of medical expert systems developed has grown enormously, as discussed in [7].

With the extensive use of diagnostic imaging techniques such as X-ray, magnetic resonance imaging (MRI) and ultrasound, another promising type of medical intelligent paradigm has recently emerged, namely computer-aided detection, or CAD for short. As its name implies, the primary goal is to assist, rather than to substitute, the human expert in image analysis tasks, allowing a comprehensive evaluation in a short time. One typical application area of CAD systems is breast cancer detection and diagnosis. In [8], a general overview is presented of various tasks such as image feature extraction, classification of microcalcifications and masses, and prognosis of breast cancer, as well as the related techniques to tackle these tasks, including Bayesian networks, neural networks, statistical and Cox prognostic models. Next, we review some of the recent techniques used for the development of breast cancer CAD systems, and then we discuss a number of previous studies dealing with multi-view dependencies to facilitate mammographic analysis.

3.1 Computer-Aided Detection for Breast Cancer

In some previous research, Bayesian network technology has been used to model the intrinsic uncertainty of the breast cancer domain [9,10]. Such models incorporate background knowledge in terms of (causal) relations among variables. However, they use BI-RADS terms to describe a lesion, rather than numerical features automatically extracted from images. This requires the human expert to define and provide the input features a priori, which limits the automatic support of the system. Another research effort is the expert system for breast cancer detection proposed in [11]. The main techniques used are association rules for reducing the feature space and neural networks for classifying the cases. Application to the Wisconsin breast cancer dataset [12] showed that the proposed synthesis approach has the advantage of reducing the feature space, allowing for better discrimination between cancerous and normal cases. A limitation of this study, however, is the lack of any clinically relevant application of the proposed method with a larger dataset and an extensive validation procedure.

Automatic extraction of features is applied in [13], where the authors propose a CAD system for the automatic detection of microcalcifications on mammograms. The method uses least-squares support vector machines to fit the intensity surface of the digital X-ray image and then convolution of the fitted image with


a mixture of kernels to detect the maximum extremum points of microcalcifications. Experimental results with benchmark mammographic data of 22 mammograms demonstrated computational efficiency (detection per image in less than 1 second) and the potential reliability of the classification outcome. However, these results require further validation with a larger dataset in order to provide insight into the clinical applicability of the method.

In [14], a logistic generalized additive model with bivariate continuous interactions is developed and applied to automatic mammographic mass detection. The main goal is to determine the joint effect of the minimum and maximum gray-level values of the pixels belonging to each detected region on the probability of a malignant mass, depending on the type of breast tissue (fatty or dense). The results on a large dataset of detected mammographic regions showed that the breast tissue type plays a role in the analysis of detected regions. An advantage of this method is its insightful nature, allowing interpretation and understanding of the results obtained from the CAD system. However, this approach focuses only on lesion localisation on a mammogram rather than on case classification using multiple images.

Despite the potential of these studies, some general drawbacks can be observed. On the one hand, the methods using descriptors provided by the human expert allow evaluation at the patient level, a desired outcome in screening. However, they are highly dependent on the expertise level and subjective evaluation of the reader, which might lead to high variation in the classification results. On the other hand, the approaches developed to automatically extract features from the mammograms allow more consistent and efficient computer-aided detection. But they tend to analyse every image belonging to a patient's examination independently and prompt localised suspicious regions, rather than use the information from all the images to provide an overall probability of the patient having cancer. To overcome some of the limitations of these single-view CAD systems, a number of approaches that deal with multi-view breast cancer detection have been suggested, as discussed in the next section.

3.2 Multi-View Mammographic Analysis

In [15], Gupta et al. use linear discriminant analysis (LDA) to build a model that also makes use of BI-RADS descriptors from one or two views. The results of two classifiers, one for MLO and one for CC, are combined in order to improve the classification of the CAD system. However, an overall interpretation of a case based on its related images is not taken into account. In [16], Good et al. have proposed a probabilistic method for true matching of lesions detected in both views, based on a Bayesian network and multi-view features. The results from experiments demonstrate that their method can significantly distinguish between true and false positive linking of regions. Van Engeland et al. develop another linking method in [17], based on an LDA classifier and a set of view-link features, to compute a correspondence score for every possible region combination. For every region in the original view, the region in the other view with the highest


correspondence score is selected as the corresponding candidate region. The proposed approach demonstrates an ability to discriminate between true and false links.

In [18], Van Engeland and Karssemeijer extend this matching approach by building a cascaded multiple-classifier system for reclassifying the level of suspiciousness of an initially detected region based on the linked candidate region in the other view. Experiments show that the lesion-based detection performance of the two-view detection system is significantly better than that of the single-view detection method. Paquerault et al. also consider established correspondences between suspected regions in both views to improve lesion detection [19]. LDA is used to classify each object pair as a true or false mass pair. By combining the resulting correspondence score with the one-view detection score, the lesion detection improves and the number of false positives is reduced. Other studies on identifying corresponding regions in different mammographic views have also used LDA and Bayesian artificial neural networks in order to develop classifiers for lesions [20,21]. The aim is, once again, to identify a lesion rather than to give an overall interpretation of the case based on its related images.

In [22], the authors proposed a neural network approach to compute the likelihood of paired regions on the MLO and CC views that lie at a similar distance from the nipple (a common landmark used by radiologists for finding correspondences between view regions). Their system keeps the same case-based sensitivity (true detection rate) while significantly reducing the case-based false positives. In another study [23], some of the same authors further examined the effect of three methods for matching regions in both views on the performance of the CAD system. The matching was based on different search criteria for the corresponding region in the other view. Results showed that the straight-strip method required a smaller search area and achieved the highest level of CAD performance.

In a recent study [24], Wei et al. extended a previous dual system for mass detection, trained with average and subtle masses, with a two-view analysis to improve mass detection performance. Potential links between the mass candidates detected independently in both views are established using a regional registration technique, and a similarity measure is designed using paired morphological and texture features to distinguish between true and false links. Using a simple aggregation strategy of weighting the similarity measure with the cross-correlation of the object pair, the authors obtain a two-view score for malignancy, which is further combined, using LDA, with the single-view score to obtain the final likelihood of a malignant mass. Experimental results showed that the two-view dual system achieves significantly better performance than the dual and the single-view systems for average masses, and improves upon the single-view system only for subtle masses, which are by default more difficult to classify.

All these studies dedicated their attention to establishing whether a region in one view corresponds to a region in the other view and demonstrate improvements in the localised detection of breast cancer, mostly for prompting purposes.


Hence, in these approaches the likelihood of cancer in a case is often determined by the region with the maximum likelihood. However, while the correct detection and location of a suspicious region is important, in breast cancer screening the crucial decision based on the mammographic exam is whether or not a patient should be sent for further examination. Furthermore, most of these approaches are based on black-box models such as LDA and neural networks, and thus they lack explanatory power and the capability to interpret the classification results. Therefore, in contrast to previous research, in the current study we aim at building a probabilistic CAD system, using Bayesian networks, in order to (i) discriminate well between normal and cancerous cases by considering all available information (in terms of regions) in a case and (ii) provide insight into the automatic mammographic interpretation.

4 Bayesian Networks

4.1 Basic Definitions

Consider a finite set of random variables X, where each variable X_i in X takes on values from a finite domain dom(X_i). Let P be a joint probability distribution of X and let S, T, Q be subsets of X. We say that S and T are conditionally independent given Q, denoted by S ⊥⊥_P T | Q, if for all s ∈ dom(S), t ∈ dom(T), q ∈ dom(Q), the following holds:

P(s | t, q) = P(s | q), whenever P(t, q) > 0.

In short, we have P(S | T, Q) = P(S | Q).

Bayesian networks, or BNs for short, are used for modelling knowledge in various domains, from bioinformatics (e.g., gene regulatory networks) to image processing and decision support systems. A Bayesian network [25] is defined as a pair BN = (G, P), where G is an acyclic directed graph (ADG) and P is a joint probability distribution. The graph G = (V, A) is represented by a set of nodes V, corresponding one to one to the random variables in X, and a set of arcs A ⊆ (V × V), corresponding to direct causal relationships between the variables. Independence information is modelled in an ADG by blockage of paths between nodes in the graph by other nodes.

A path between a node u and a node v in an ADG is blocked by a node w if the node w does not have two incoming arcs, i.e., · → w ← ·, which is called a v-structure. If there is a v-structure for w on the path from u to v, then the path is unblocked by w or by one of the descendants of w, i.e., nodes that have a directed path to w. Let U, V, and W be sets of nodes; then, if every path between a node in U and a node in V is blocked by a node in W (possibly the empty set), U and V are said to be d-separated given W, denoted by U ⊥⊥_G V | W. The reader should observe the association of the subscript G with the relation ⊥⊥. The idea is that the nodes in the sets U and V are unable to 'exchange' information, as any communication path between the nodes in these sets is blocked. If the sets U and V are not d-separated given W, then U and


V are said to be d-connected given W. This implies that the sets can exchange information.

As the graphical part of a Bayesian network is meant to offer an explicit representation of independence information in the associated joint probability distribution P, we need to establish a relationship between ⊥⊥_G and ⊥⊥_P. This is done as follows. We say that G is an I-map of P if any independence represented in G by means of d-separation is satisfied by P, i.e.,

$$U \perp\!\!\!\perp_G V \mid W \;\Longrightarrow\; X_U \perp\!\!\!\perp_P X_V \mid X_W,$$

where U, V and W are sets of nodes of the ADG G, and X_U, X_V and X_W are the sets of random variables corresponding to the sets of nodes U, V and W, respectively. This definition implies that the graph G is allowed to represent more dependence information than the probability distribution P, but never more independence information.

A Bayesian network is an I-map model whose nodes represent random variables and whose missing arcs encode conditional independencies between the variables. In short, it allows a compact representation of independence information about the joint probability distribution P in terms of local conditional probability distributions (CPDs) or, in the discrete case, in terms of conditional probability tables (CPTs) associated with each random variable. Such a table describes the conditional distribution of the node given each possible combination of values of its parents. The joint probability can be computed by simply multiplying the CPTs.

In a BN model the local Markov property holds, i.e., each variable is conditionally independent of its non-descendants given its parents:

$$X_v \perp\!\!\!\perp X_{V \setminus \mathrm{de}(v)} \mid X_{pa(v)} \quad \text{for all } v \in V,$$

where de(v) is the set of descendants of v. The joint probability distribution defined by a Bayesian network is given by

$$P(X_V) = \prod_{v \in V} P(X_v \mid X_{pa(v)}),$$

where X_V denotes the set of random variables in the given model, X_v is a variable in this set and X_{pa(v)} represents the parent nodes of the variable X_v. BN is a Bayesian network model with respect to G if its joint probability density function can be written as a product of the individual density functions, conditional on their parents.

Bayesian networks are commonly used in the representation of causal relationships. However, this does not have to be the case: an arc from node u to node z does not require the variable X_z to be causally dependent on X_u. In fact, considering the graphical representation, it is the case that in a BN

$$u \longrightarrow z \longrightarrow w \quad \text{and} \quad u \longleftarrow z \longleftarrow w$$

are equivalent, i.e., the same conditional independence constraints are enforced by these graphs.


Inference in a Bayesian network refers to answering probabilistic queries regarding the (random) variables present in the model. For instance, when a subset of the variables in the model is observed (the so-called evidence variables), inference is used to update the probabilistic information about the remaining variables. In other words, once the state of some variables becomes known, the probabilities of the other variables (might) change, and the distributions need to be coherently revised in order to maintain the integrity of the knowledge encoded by the model. This process of computing the posterior distribution of variables given evidence is called probabilistic inference, in which essentially three computation rules from probability theory are used: marginalisation, conditioning and Bayes' theorem.

Next we present two specific Bayesian network architectures in which the notion of independence is used to represent the joint probability distribution compactly.

4.2 Naive Bayes

A naive Bayesian (NB) classifier is a very popular special case of a Bayesian network. These network models have a central node, called the class node Cl, and nodes representing feature variables F_i, i = 1, . . . , n, n ≥ 1; the arcs go from the class node Cl to each of the feature nodes F_i [26]. A schematic representation of a naive Bayes model is shown in Figure 2.

Fig. 2. A schematic representation of a naive Bayes model: the class node Cl with feature nodes F1, F2, . . . , Fn as its children

The conditional independencies that explain the adjective 'naive' determine that each feature F_i is conditionally independent of every other feature F_j, for j ≠ i, given the class variable Cl. In other words,

P(F_i | Cl, F_j) = P(F_i | Cl).

The joint probability distribution underlying the model is:

$$P(Cl, F_1, \ldots, F_n) = P(Cl) \prod_{i=1}^{n} P(F_i \mid Cl),$$

and the conditional distribution over the class variable Cl can be expressed as

$$P(Cl \mid F_1, \ldots, F_n) \propto P(Cl) \prod_{i=1}^{n} P(F_i \mid Cl).$$


In many practical applications, parameter estimation for naive Bayes models uses the method of maximum likelihood. Despite their naive design and apparently over-simplified assumptions, naive Bayes classifiers often demonstrate good performance in many complex real-world situations.

The naive Bayes classifier combines the above naive Bayesian network model with a decision rule [27]. A common rule is to choose the most probable hypothesis, i.e., a maximisation criterion. In this case, the classifier is defined by the function:

$$NB(f_1, \ldots, f_n) = \operatorname*{arg\,max}_{c} \; P(Cl = c) \prod_{i=1}^{n} P(F_i = f_i \mid Cl = c). \quad (1)$$
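As an illustration of the decision rule in Eq. (1), the sketch below implements a tiny discrete naive Bayes classifier; the class names, feature values and probabilities are hypothetical and merely show the maximisation over classes, carried out in log space.

```python
from math import log

# Hypothetical discrete naive Bayes model: class Cl in {"normal", "abnormal"},
# two binary features F1, F2. All probabilities below are made-up examples.
prior = {"normal": 0.7, "abnormal": 0.3}
cpt = {
    # cpt[feature_index][class][feature_value] = P(F_i = value | Cl = class)
    0: {"normal": {0: 0.9, 1: 0.1}, "abnormal": {0: 0.4, 1: 0.6}},
    1: {"normal": {0: 0.8, 1: 0.2}, "abnormal": {0: 0.3, 1: 0.7}},
}

def nb_classify(features):
    """Return argmax_c P(Cl = c) * prod_i P(F_i = f_i | Cl = c), cf. Eq. (1)."""
    scores = {}
    for c in prior:
        # Sum of logs instead of product of probabilities avoids underflow.
        scores[c] = log(prior[c]) + sum(log(cpt[i][c][f]) for i, f in enumerate(features))
    return max(scores, key=scores.get)

print(nb_classify([1, 1]))  # -> "abnormal" with these example numbers
```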

4.3 Causal Independence

It is known that the number of probabilities in a CPT for a certain variable grows exponentially in the number of parents in the ADG. Therefore, it is often infeasible to define the complete CPT for variables with many parents. One way to specify interactions among statistical variables in a compact fashion is offered by the notion of causal independence [28], where multiple causes (parent nodes) lead to a common effect (child node). The general structure of a causal-independence model is shown in Figure 3, and below we give a brief description of the notion, following the definitions in [29].

Fig. 3. Causal-independence model: causes C1, . . . , Cn influence the effect E through intermediate variables I1, . . . , In, combined by the interaction function f

A causal-independence model expresses the idea that causes C1, . . . , Cn influence a given common effect E through intermediate variables I1, . . . , In. We denote the assignment of a value to a variable by a lower-case letter, e.g., i_k stands for I_k = true and ī_k for I_k = false. The interaction function f represents the way in which the intermediate effects I_k, and indirectly also the causes C_k, interact. This function f is defined such that when a relationship between the I_k's and E = true is satisfied, it holds that f(I1, . . . , In) = e; otherwise, f(I1, . . . , In) = ē. Furthermore, it is assumed that if f(I1, . . . , In) = e then P(e | I1, . . . , In) = 1; otherwise, if f(I1, . . . , In) = ē, then P(e | I1, . . . , In) = 0. Using information from the topology of the network, the notion of causal independence can be formalised for the occurrence of the effect E, i.e., E = true, in terms of probability theory by:

382 M. Velikova et al.

$$P(e \mid C_1, \ldots, C_n) = \sum_{f(I_1, \ldots, I_n) = e} \; \prod_{k=1}^{n} P(I_k \mid C_k).$$

Finally, it is assumed that $P(i_k \mid \bar{c}_k) = 0$ (absent causes do not contribute to the effect); otherwise, $P(I_k \mid C_k) > 0$.

An important subclass of causal-independence models is obtained if the deterministic function f is defined in terms of separate binary functions g_k; it is then called a decomposable causal-independence model [28]. Usually, all functions g_k(I_k, I_{k+1}) are identical for each k. Typical examples of decomposable causal-independence models are the noisy-OR models [30], where the function f represents a logical OR. A simple example of the application of the logical OR in the domain of breast cancer is given in the Appendix, motivating the use of noisy-OR models in the general theoretical framework presented in the next section.
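The following sketch illustrates the noisy-OR combination described above: present causes contribute to the effect through P(i_k | c_k), absent causes contribute nothing, and the effect is false only if all intermediate variables are false. The parameter values are hypothetical.

```python
def noisy_or(cause_present, p_i_given_c):
    """P(effect = true | causes) for a noisy-OR causal-independence model.

    cause_present : list of booleans, the states of the causes C_1..C_n
    p_i_given_c   : list of P(I_k = true | C_k = true); for absent causes
                    P(I_k = true | C_k = false) = 0 by assumption.
    """
    p_all_intermediates_false = 1.0
    for present, p in zip(cause_present, p_i_given_c):
        if present:
            p_all_intermediates_false *= (1.0 - p)
    # The OR of the intermediate variables is false only if all of them are false.
    return 1.0 - p_all_intermediates_false

# Two hypothetical causes with contribution probabilities 0.8 and 0.5.
print(noisy_or([True, True], [0.8, 0.5]))   # 1 - 0.2 * 0.5 = 0.9
print(noisy_or([True, False], [0.8, 0.5]))  # 0.8
```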

5 Bayesian Multi-View Detection

The inputs for the proposed multi-view detection scheme are the regions detected by the single-view CAD system presented in [17]; for completeness, this system is briefly described here.

5.1 Single-View CAD System

As its name implies, the single-view CAD system analyses each mammographic view of each breast independently. Figure 4 depicts a schematic representation of the single-view system used in this study. It consists of the following four main steps:

Fig. 4. Stages in the single-view CAD system: original mammogram, mammogram segmentation, local maxima detection, region segmentation and region classification

1. Mammogram segmentation into background area, breast and, for the MLO view, the pectoral muscles.

2. Local maxima detection based on pixel-based locations. For each location in the breast area a number of features are computed that are related to tumour characteristics such as the presence of spicules (star-like structures) and a focal mass. Based on these features, a neural network (NN) classifier is then employed to compute the region likelihood for cancer. The locations with a likelihood above a certain threshold are selected as locations of interest.

3. Region segmentation with dynamic programming, using the detected locations as seed points. For each region a number of continuous features are computed based on breast and local area information, e.g., contrast, size and location.

4. Region classification as "normal" or "abnormal" based on the region features. A likelihood for cancer is computed based on supervised learning with an NN and converted into a normality score (NSc): the average number of normal regions in a view (image) with the same or higher cancer likelihood. Hence, the lower the normality score, the higher the likelihood for cancer (a sketch of this conversion is given after this list).
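The conversion from a cancer likelihood to a normality score in step 4 can be read as follows; this sketch is an illustrative interpretation of the definition above, assuming access to the region likelihoods of a reference set of normal images (all numbers are hypothetical).

```python
def normality_score(likelihood, normal_image_likelihoods):
    """Average number of regions per normal image whose cancer likelihood is
    at least `likelihood` (lower score = more suspicious).

    normal_image_likelihoods : list of lists, one inner list of region
                               likelihoods per normal reference image.
    """
    n_images = len(normal_image_likelihoods)
    n_at_least = sum(
        sum(1 for l in regions if l >= likelihood)
        for regions in normal_image_likelihoods
    )
    return n_at_least / n_images

# Hypothetical reference set of three normal images with their region likelihoods.
reference = [[0.10, 0.05], [0.30, 0.02, 0.01], [0.20]]
print(normality_score(0.25, reference))  # 1/3: only one reference region scores >= 0.25
```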

Based on the single-view CAD system, the likelihood for cancer for a view, breast or case is computed by taking the likelihood of the most suspicious region within the view, breast or case, respectively. Although this system demonstrates a good detection rate at the region level, its performance deteriorates at the case level. One of the main reasons is that the single-view processing fails to account for view interactions in computing the region likelihood for cancer. To overcome this limitation, we consider the problem of multi-view breast cancer detection, presented in the next section, and subsequently we propose a Bayesian network framework to model this problem.

5.2 Multi-View Problem Description

The objective of mammographic multi-view detection is to determine whether or not the breast, and respectively the patient, exhibits characteristics of abnormality by establishing correspondences between two-dimensional image features in multiple breast projections. Figure 5 depicts a schematic representation of multi-view detection.

Fig. 5. Schematic representation of mammographic multi-view analysis with automatically detected regions: regions A1 and A2 in the MLO view, regions B1 and B2 in the CC view, and the links L11, L12, L21 and L22 between them


A lesion projected in the MLO and CC views is displayed as a circle; thus, the whole breast, and hence the patient, is cancerous. An automatic single-view system detects regions in both views, and a number of real-valued features are extracted to describe every region. In the figure, regions A1 and B1 are correct detections of the lesion, i.e., they are true positive (TP) regions, whereas A2 and B2 are false positive (FP) regions. Since we deal with projections of the same physical object, we introduce links (L_ij) between the detected regions A_i and B_j in both views. Every link has a class (label) L_ij = ℓ_ij defined as follows:

$$\ell_{ij} = \begin{cases} \text{true} & \text{if } A_i \text{ OR } B_j \text{ is TP}, \\ \text{false} & \text{otherwise}. \end{cases} \quad (2)$$

This definition allows us to maintain information about the presence of cancer even if there is no cancer detection in one of the views. In contrast to clinical practice, in the screening setting the detected lesions are usually small and, due to the breast compression, they are sometimes difficult to observe in both views. A binary class C, with values true (presence of cancer) and false, for region, view, breast and patient is assumed to be provided by pathology or a human expert.
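A minimal sketch of the link labelling in Eq. (2): a link is labelled true whenever at least one of its two regions is a true positive. The region identifiers and TP flags below are hypothetical.

```python
# Hypothetical true-positive flags for regions detected in the two views.
mlo_is_tp = {"A1": True, "A2": False}   # regions in the MLO view
cc_is_tp  = {"B1": True, "B2": False}   # regions in the CC view

# Label every MLO-CC region pair according to Eq. (2): true if A_i OR B_j is TP.
link_labels = {
    (a, b): mlo_is_tp[a] or cc_is_tp[b]
    for a in mlo_is_tp
    for b in cc_is_tp
}
print(link_labels)  # only the link (A2, B2) between two false positives is labelled False
```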

In any case, multiple views corresponding to the same cancerous part contain correlated characteristics, whereas views corresponding to normal parts tend to be less correlated. For example, in mammography an artefactual density might appear in one view due to the superposition of normal tissue, whereas it disappears in the other view. To account for the interaction between the breast projections, in the next section we present a Bayesian network framework for multi-view mammographic analysis.

5.3 Model Description

The architecture of our model for two-view mammographic analysis is inspired by the way radiologists analyse images, where several levels of interpretation are distinguished. At the lowest, image level, radiologists look for regions suspicious for cancer. If suspicious regions are observed in both views of the same breast, then the individual suspiciousness of these regions increases, implying that a lesion is likely to be present. As a result, the whole breast, as well as the exam and the patient, is considered suspicious for cancer; otherwise, the breast, exam and patient are considered normal.

From a CAD point of view, the first step, identifying suspicious regions, has already been tackled by the single-view CAD system developed by our group. Furthermore, in [31,32] we proposed a two-step Bayesian network framework for multi-view detection using the regions from the single-view system. Here, we extend this system to create a unified framework that models the stages of mammographic analysis described above. Figure 6 represents the new advanced framework.

We first compute the probability that a region in one view is classified as true given its links to the regions in the other view. A straightforward way to model a link L_ij is to use the corresponding regions A_i and B_j as causes for the link class, i.e., creating the so-called v-structure A_i → L_ij ← B_j defined in Section 4.1.


Fig. 6. Bayesian network framework for representing the dependencies between multiple views of an object. The framework comprises four stages: RegNet (Stage 1, region level), ViewNet (Stage 2, view level), BreastNet (Stage 3, breast level) and CaseNet (Stage 4, case level); every region A_i or B_j is represented by its feature vector (x_1, x_2, . . . , x_n)

Since the link variable is discrete and the regions are represented by a vector of real-valued features (x_1, x_2, . . . , x_n) extracted from an automatic detection system, we apply logistic regression to compute P(L_ij = true | A_i, B_j):

$$P(L_{ij} = \text{true} \mid A_i, B_j) = \frac{\exp\!\left(\beta_0^{\ell_{ij}} + \beta_1^{\ell_{ij}} x_1 + \cdots + \beta_k^{\ell_{ij}} x_k\right)}{1 + \exp\!\left(\beta_0^{\ell_{ij}} + \beta_1^{\ell_{ij}} x_1 + \cdots + \beta_k^{\ell_{ij}} x_k\right)},$$

where the β's are the model parameters we optimise and k is the total number of features of regions A_i and B_j. Logistic regression ensures that the outputs P(L_ij = true | A_i, B_j) lie in the range [0, 1] and sum up to one. Next, we compute the probabilities P(C_Ai = true | L_ij = true) and P(C_Bj = true | L_ij = true), where C_Ai and C_Bj are the classes of regions A_i and B_j, respectively. Given our class definition in (2), we can easily model these relations through a causal independence model using the logical OR. The Bayesian network RegNet models this scheme.
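The sketch below mirrors the Stage 1 computation described above: a logistic regression over the concatenated feature vectors of a candidate region pair yields P(L_ij = true | A_i, B_j). The feature values and the coefficients β are hypothetical; in the actual system they are fitted on the training data.

```python
import numpy as np

def link_probability(features_a, features_b, beta0, beta):
    """P(L_ij = true | A_i, B_j) via logistic regression on the concatenated
    region feature vectors, as in the formula above."""
    x = np.concatenate([features_a, features_b])
    z = beta0 + np.dot(beta, x)
    return 1.0 / (1.0 + np.exp(-z))   # equivalent to exp(z) / (1 + exp(z))

# Hypothetical 3-feature regions and fitted coefficients.
a_i = np.array([0.9, 0.3, 0.5])   # e.g. NN output, spiculation, contrast for A_i
b_j = np.array([0.8, 0.4, 0.6])   # the corresponding features for B_j
beta0, beta = -2.0, np.array([1.5, 0.5, 0.2, 1.5, 0.5, 0.2])

p_link = link_probability(a_i, b_j, beta0, beta)
print(round(p_link, 3))  # probability that the (A_i, B_j) link is true (about 0.75 here)
```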

At the second stage of our Bayesian network framework, we combine the computed region probabilities from RegNet using a causal independence model with the logical OR to obtain the probability of cancer for the respective view. Thus we represent the knowledge that the detection of at least one cancerous region is sufficient to classify the whole view as cancerous, whereas the detection of more cancerous regions increases the probability of the view being cancerous. We call this Bayesian network ViewNet.

At the third stage, we combine the view probabilities obtained from ViewNet into a single probabilistic measure for the breast as a whole. As additional inputs we use the likelihoods for cancer (NSc) for each view computed by the single-view CAD system, which are already indicators of the level of suspiciousness of the view. To explicitly account for the view dependencies, we model the probabilities obtained from ViewNet and the single-view measures by two v-structures, whose outputs are combined using the logical OR function. We refer to this Bayesian network model as BreastNet.

In the last, fourth stage, we compute the probability of a case being cancerous using two combining schemes. Since in screening programs breast cancer mostly occurs in only one of the breasts, the first, simple combining technique is to take the maximum of the two breast probabilities obtained from BreastNet. As a second, more advanced scheme, we use a causal independence model with the MAX function to combine the breast probabilities from BreastNet and the single-view likelihoods for cancer for each breast. We refer to the latter as CaseNet and to the whole multi-view detection scheme as MV-CAD-Causal.

6 Application to Breast Cancer Data

6.1 Data Description

The proposed model was evaluated using a data set containing 1063 screening exams, of which 383 were cancerous. All exams contained both MLO and CC views. The total number of breasts was 2126, of which 385 had cancer. All cancerous breasts had one visible lesion in at least one view, which was verified by pathology reports to be malignant. Lesion contours were marked by, or under the supervision of, a mammogram reader.

For each image (mammogram) we selected the 5 most suspicious regions detected by the single-view CAD system. In total there were 10478 MLO regions and 10343 CC regions. Every region is described by 11 continuous features automatically extracted by the system, which tend to be relatively correlated across the views. These features include the neural network output from the single-view CAD system and lesion characteristics such as spiculation, focal mass, size, contrast, linear texture and location coordinates. Every region from the MLO view


was linked with every region in the CC view, thus obtaining 51088 links in total. We added the link class variable with binary values true and false, following the definition in equation (2). Finally, to every region, view, breast and case we assigned class values of true (presence of cancer) or false, according to the ground-truth information provided by pathology reports.

6.2 Evaluation

To train and evaluate the proposed multi-view CAD system, we used stratified two-fold cross-validation: the dataset is randomly split into two subsets with approximately equal numbers of observations and proportions of cancerous cases. The data for a whole case belong to only one of the folds. Each fold is used once as a training set and once as a test set. At every level (region, view, breast and case) the same data folds were used. Although we use the results from the single-view CAD system, we note that the random split for the multi-view CAD system is done independently. This follows from the fact that the current framework uses only a subset of the data (cases without CC views have been excluded).
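A sketch of the case-level stratified two-fold split described above, here using scikit-learn's StratifiedKFold; the case identifiers and labels are hypothetical, and the essential point is that all records of a case are routed to the same fold.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical case-level data: one label per case (1 = cancerous, 0 = normal).
case_ids = np.array([f"case{i:03d}" for i in range(10)])
case_labels = np.array([1, 0, 0, 1, 0, 1, 0, 0, 1, 0])

# Stratified two-fold split at the case level: roughly equal size and
# cancer proportion in each fold; a whole case belongs to exactly one fold.
skf = StratifiedKFold(n_splits=2, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(skf.split(case_ids.reshape(-1, 1), case_labels)):
    test_cases = set(case_ids[test_idx])
    # Region/view/breast records would be assigned to folds by their case id,
    # so that each fold is used once for training and once for testing.
    print(fold, sorted(test_cases))
```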

The Bayesian networks at all stages of our MV-CAD-Causal model have been built, trained and tested using the Bayesian Network Toolbox in Matlab [33]. The learning has been done using the EM algorithm, which is typically used to approximate a probability function given incomplete samples (in our networks the OR-nodes are not observed).

We compare the performance of our MV-CAD-Causal model with the performance of the single-view CAD system (SV-CAD). As additional benchmark methods for comparison at the breast and case levels, we use the naive Bayes classifier (MV-CAD-NB) and logistic regression (MV-CAD-LR). Both classifiers use the same inputs as MV-CAD-Causal at both levels. The model comparison analysis is done using the Receiver Operating Characteristic (ROC) curve [34], with the Area Under the Curve (AUC) as a performance measure. The significance of the differences in the AUC measures is tested using the ROCKIT software [35] for fully paired data: for each patient we have a pair of test results corresponding to MV-CAD-Causal and the benchmark systems.
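The ROC/AUC comparison can be reproduced in outline as follows, e.g., with scikit-learn; the labels and scores are hypothetical stand-ins for the case-level probabilities produced by the different systems, and the paired significance test (done with ROCKIT in this chapter) is not re-implemented here.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Hypothetical case labels and case-level suspiciousness scores of two systems.
y_true = np.array([1, 0, 1, 0, 0, 1, 0, 1, 0, 0])
scores_mv = np.array([0.9, 0.2, 0.7, 0.3, 0.1, 0.8, 0.4, 0.6, 0.2, 0.3])
scores_sv = np.array([0.6, 0.3, 0.5, 0.4, 0.2, 0.7, 0.5, 0.4, 0.1, 0.3])

for name, scores in [("MV-CAD", scores_mv), ("SV-CAD", scores_sv)]:
    auc = roc_auc_score(y_true, scores)
    fpr, tpr, _ = roc_curve(y_true, scores)   # points of the ROC curve
    print(name, round(auc, 3))
# Testing the significance of the paired AUC difference would additionally
# require the per-case pairing of the two systems' scores.
```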

6.3 Results

Classification accuracy. Based on the results from ViewNet, Figure 7 presents the classification outcome with the respective AUC measures for the MLO and CC views.

The results clearly indicate an overall improvement in the discrimination between suspicious and normal views for both the MLO and CC projections. Such an improvement is expected, as the classification of each view in our multi-view system takes into account region information not only from the view itself but also from the regions in the other view. To check the significance of the difference between the AUC measures, we test the hypothesis that the AUC measures for MV-CAD-Causal and SV-CAD are equal against the one-sided alternative hypothesis that the multi-view system yields higher AUCs for the MLO and CC views. The p-values obtained are 0.000 for the MLO view and 0.035 for the CC view, indicating a significant improvement in classification accuracy at the view level.

Fig. 7. ROC analysis for a) the MLO view (MV-CAD AUC = 0.850, SV-CAD AUC = 0.807) and b) the CC view (MV-CAD AUC = 0.873, SV-CAD AUC = 0.858)

Furthermore, we observe that for both MV-CAD and SV-CAD the performance for the CC view in terms of AUC is better than that for the MLO view. This can be explained by the fact that the classification of CC views is generally easier than that of MLO views due to the breast positioning. At the same time, our multi-view system improves considerably upon the single-view CAD system in distinguishing cancerous from normal MLO views, whereas for CC views the improvement is smaller.

While the view results are very promising, from a radiologist's point of view it is more important to look at the breast- and case-level performance. In Tables 1 and 2, the AUCs from the multi-view and single-view systems are presented, as well as the corresponding p-values and 95% confidence intervals obtained from the statistical tests for the differences between the AUC measures for MV-CAD-Causal and the benchmark methods.

Table 1. AUC and std. errors obtained from the single- and multi-view systems at the breast level, with the one-sided p-values and 95% confidence intervals for the differences

BREAST
Method          AUC ± std.err   p-value   Confidence interval
MV-CAD-Causal   0.876 ± 0.011   –         –
MV-CAD-LR       0.869 ± 0.011   2.9%      (0.000, 0.009)
MV-CAD-NB       0.867 ± 0.012   1.3%      (0.004, 0.011)
SV-CAD          0.849 ± 0.012   0.1%      (0.007, 0.032)


Table 2. AUC and std. errors obtained from the single- and multi-view systems at the case level, with the one-sided p-values and 95% confidence intervals for the differences

CASE
Scheme   Method          AUC ± std.err   p-value   Confidence interval
max      MV-CAD-Causal   0.833 ± 0.013   –         –
         MV-CAD-LR       0.835 ± 0.013   41%       (−0.007, 0.005)
         MV-CAD-NB       0.826 ± 0.014   21%       (−0.004, 0.010)
         SV-CAD          0.797 ± 0.014   1.1%      (0.003, 0.040)
train    MV-CAD-Causal   0.847 ± 0.013   –         –
         MV-CAD-LR       0.835 ± 0.014   1.3%      (0.002, 0.026)
         MV-CAD-NB       0.833 ± 0.014   0.1%      (0.005, 0.022)
         SV-CAD          0.797 ± 0.014   0.1%      (0.009, 0.043)

The results show that MV-CAD-Causal significantly outperforms SV-CAD at both the breast and case levels. With respect to the two multi-view benchmark methods, our causal model is superior at the breast level and at the case level when training is used as the combining scheme (CaseNet). As expected, the case results for MV-CAD-Causal using the maximum (MAX) of the two breast probabilities are less satisfactory than those obtained using training in the causal independence model. This shows that combining all available information in a proper way is beneficial for accurate case classification.

Although MV-CAD-NB and MV-CAD-LR are overall less accurate than the last two stages of the multi-view causal model, it is clear that all three MV-CAD models achieve superior performance with respect to SV-CAD. This result follows naturally from the explicit representation of multi-view dependencies at the region and view levels of the multi-view model, and it clearly indicates the importance of incorporating multi-view information in the development of CAD systems.

To get more insight into the areas of improvement at the case level, we plotted ROC curves for all systems. For all the plots we observed the same tendency of an increased true positive rate at low false positive rates (< 0.5), a result ultimately desired in screening practice, where the number of normal cases is considerably larger than the number of cancerous ones; Figure 8 presents the ROC plot for the best performing models.

Evaluation of the multi-view model fitness. To have a closer look at the quality of classification for the three multi-view models, we compute the average log-likelihood (ALL) of the probabilities for different units (link, region, MLO/CC view, breast and case) by:

$$ALL(Cl) = \frac{1}{N} \sum_{i=1}^{N} -\ln P(Cl_i \mid E_i), \quad (3)$$


where N is the number of records for a unit, and Cl_i and E_i are the class value and the feature vector of the i-th observation, respectively. Thus, the value of ALL(Cl) indicates how close the posterior probability distribution is to reality: when P(Cl_i | E_i) = 1 then ln P(Cl_i | E_i) = 0 (no extra information); otherwise −ln P(Cl_i | E_i) > 0.
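Equation (3) can be evaluated directly from the predicted class probabilities, as in the sketch below; the class values and probabilities are hypothetical.

```python
import math

def average_log_likelihood(class_values, predicted_probs):
    """Average negative log-likelihood, Eq. (3): lower values mean the
    posterior probabilities fit the true class values better.

    class_values    : list of booleans (true class of each observation)
    predicted_probs : list of P(class = true | evidence) for each observation
    """
    total = 0.0
    for cl, p_true in zip(class_values, predicted_probs):
        p_cl = p_true if cl else 1.0 - p_true   # P(Cl_i | E_i)
        total += -math.log(p_cl)
    return total / len(class_values)

# Hypothetical link classifications: well-calibrated probabilities give a low ALL.
print(round(average_log_likelihood([True, False, True], [0.9, 0.1, 0.8]), 3))  # about 0.14
```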

The log-likelihood results are given in Table 3. The lowest ALL(C) is achieved for the links, meaning that the estimated probabilities best fit the link probability distribution. A possible explanation is that in our Bayesian network framework the links depend directly on the original region features and are thus better fitted. The rest of the units, on the other hand, are based on combining estimated probabilities from previous levels, where noise could play a role. MV-CAD-Causal and MV-CAD-LR obtain comparable results for all units.

Fig. 8. ROC analysis per case (MV-CAD-Causal AUC = 0.847, MV-CAD-LR AUC = 0.835, MV-CAD-NB AUC = 0.833, SV-CAD AUC = 0.797)

Table 3. Average log-likelihood of the class based on the multi-view scheme for different units

Unit         MV-CAD-Causal   MV-CAD-LR   MV-CAD-NB
Link         0.19
Region       0.38
MLO/CC       0.34/0.31
Breast       0.300           0.235       0.553
Case-max     0.482           0.483       0.790
Case-train   0.476           0.477       0.543


MV-CAD-NB, on the other hand, leads to considerably larger average log-likelihoods, especially at the case level using the MAX combining scheme. The latter can be explained by the nature of the naive Bayes classifier, which assumes independence of the views and leads to less accurate breast probabilities, as indicated in Tables 1 and 2. With the TRAIN combining scheme, the case probabilities computed by MV-CAD-NB are better fitted to the truths.

7 Discussion and Conclusions

In this work, we proposed a unified Bayesian network framework for automated multi-view analysis of mammograms, which is an extension of the work presented in [31,32]. The new framework is based on a multi-stage scheme, which models the way radiologists interpret mammograms. More specifically, using causal independence models, we combined all available image information for the patient obtained from a single-view CAD system at four different levels of interpretation: region, view, breast and case. In comparison to our previous work, where the breast and case classification were done using logistic regression, the new scheme allows a better representation of the multi-view dependencies and better handling of the uncertainty in the estimation of the breast/case probabilities for suspiciousness.

Furthermore, based on experimental results with actual screening data, we demonstrated that the proposed Bayesian network framework improves the detection rate of breast cancer at lower false positive rates in comparison to a single-view CAD system. This improvement is achieved at the view, breast and case levels, and it is due to a number of factors. First, we built upon a single-view CAD system that already demonstrates relatively good detection performance. By applying a probabilistic causal model, we linked the original features extracted by the single-view CAD system for all the regions in the MLO and CC views, and we combined all the links for one breast to obtain a single measure of suspiciousness for a view, breast and case. Another factor for the improved classification is that our approach incorporates domain knowledge. Following radiologists' practice, we applied a straightforward scheme to account for multi-view dependencies such that (i) correlations between the regions in MLO and CC views are considered per breast as a whole and (ii) the classification of a breast/case as "suspicious" is implemented through the logical OR. Thus the proposed methodology can be applied to any domain (e.g., fault detection in manufacturing processes) where similar definitions and objectives hold.

We also note important advantages of our probabilistic system in comparison to previous approaches for automated mammographic analysis. On the one hand, most Bayesian network-based systems for breast cancer detection require that radiologists provide a categorical description of the findings prior to classification. In contrast, our causal approach is completely automated and is entirely based on the image processing results from the single-view CAD system. This implies that in a screening setting, where time is a crucial factor, the proposed multi-view approach is more realistically applicable and can be of great help to the human expert. On the other hand, fully automated approaches


based on LDA and neural networks fail to account for uncertainty and lack an explicit representation of view dependencies, as provided in our probabilistic approach. In addition, the former approaches aimed mostly at improving the detection of breast cancer at a regional level, whereas we focused on modelling the whole image and breast context information for a better distinction between cancerous and normal patients, the ultimate goal of a breast cancer screening program.

Although we demonstrated that the proposed framework has the potential to assist screening radiologists in the evaluation of breast cancer cases, a number of issues can be addressed to extend the proposed multi-view approach. Currently, we consider only cases where both MLO and CC views are present. In practice, however, for half of the patients undergoing subsequent mammographic exams, only MLO views are available. The question then is how to modify the system such that cases with missing CC views can also be analysed. Another interesting direction for extension concerns the region linking. Instead of linking all regions from the MLO view to all regions from the CC view of the same breast, as currently done, it would be natural to consider only the potentially true links and avoid a large number of false positive links. To do so, an important piece of domain knowledge used by radiologists in mammogram analysis is that a cancerous lesion projected on the MLO and CC views is located at approximately the same distance from the nipple. Thus, links to regions that clearly violate this rule can be ignored and the system can become more precise.
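A minimal sketch of how such a pruning rule could look, assuming each detected region is annotated with its distance to the nipple; the 20 mm tolerance, the data structures and the function name are purely illustrative and not part of the chapter's system.

def plausible_links(mlo_regions, cc_regions, tolerance_mm=20.0):
    # Keep only MLO-CC region pairs whose distances to the nipple agree within a
    # tolerance; all other links are discarded as implausible.
    # Each region is a (region_id, distance_to_nipple_mm) pair.
    links = []
    for mlo_id, d_mlo in mlo_regions:
        for cc_id, d_cc in cc_regions:
            if abs(d_mlo - d_cc) <= tolerance_mm:
                links.append((mlo_id, cc_id))
    return links

# Illustrative regions: ids with their distance to the nipple in mm.
mlo = [("m1", 34.0), ("m2", 71.0)]
cc = [("c1", 30.0), ("c2", 95.0)]
print(plausible_links(mlo, cc))   # [('m1', 'c1')] with the default tolerance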

In conclusion, we emphasise that in the coming years, the development and practical application of computer-aided detection systems is likely to expand considerably. This trend is expected to be most evident in screening programs, where the mammograms are to be digitised, the breast cancer incidence rates are very low, the workload is tremendous and the detection of breast cancer is difficult due to the small and subtle changes observed. By providing a "second opinion" in the image analysis, CAD can help radiologists increase cancer detection accuracy at an earlier stage. This can create both health and social benefits by saving women's lives, increasing families' well-being, reducing unnecessary check-up costs and compensating for the shortage of radiologists. To create reliable and widely applied CAD systems, however, it is crucial to integrate the knowledge and the working principles of radiologists into CAD development. The multi-view CAD system described in this chapter is a promising step forward in this direction.

Acknowledgement

This research is funded by the Netherlands Organisation for Scientific Research under BRICKS/FOCUS grant number 642.066.605.

Appendix

Figure 9 depicts an example of a causal-independence model for breast cancer prediction. We have two binary cause variables, mass (MASS) and microcalcifications (MCAL), which are the two main mammographic indicators for a patient having breast cancer (BC), the so-called effect variable. It is also known that masses are the more frequently occurring sign, which is reflected in the noisy probability distributions P(I1|MASS) and P(I2|MCAL). Finally, domain knowledge suggests that the combined occurrence of both signs increases the probability of having breast cancer, which is modelled by the logical OR as the interaction function f(I1, I2) for the effect BC.

Fig. 9. Example of a noisy-OR model for breast cancer prediction. The cause variables MASS and MCAL influence BC through the intermediate variables I1 and I2, with noisy CPTs P(I1 = true|MASS = true) = 0.85 and P(I2 = true|MCAL = true) = 0.6 (both 0 when the corresponding cause is absent); BC is the deterministic logical OR of I1 and I2.

Then the probability of having breast cancer given the states of MASS and MCAL is computed as follows:

P(BC = true|MASS, MCAL)
  = Σ_{f(I1,I2)=true} P(BC = true|I1, I2) P(I1|MASS) P(I2|MCAL)
  = P(I1 = true|MASS) P(I2 = true|MCAL)
  + P(I1 = true|MASS) P(I2 = false|MCAL)
  + P(I1 = false|MASS) P(I2 = true|MCAL).

For example, given the evidence MASS = true and MCAL = false, the probability of breast cancer is:

P(BC = true|MASS, MCAL) = 0.85 · 0 + 0.85 · 1 + 0.15 · 0 = 0.85.

Now suppose that MASS = true and MCAL = true. Then we obtain

P(BC = true|MASS, MCAL) = 0.85 · 0.6 + 0.85 · 0.4 + 0.15 · 0.6 = 0.94,


indicating the combined influence of both causes on the probability of having breast cancer.
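For illustration, this computation can be reproduced directly in code; the sketch below uses the CPT values from Fig. 9 and restricts the summation to the configurations where the OR interaction function is true (the function names are illustrative only).

from itertools import product

# Noisy CPTs from Fig. 9: P(Ik = true | cause) for a present/absent cause.
P_I1_TRUE = {True: 0.85, False: 0.0}   # MASS -> I1
P_I2_TRUE = {True: 0.60, False: 0.0}   # MCAL -> I2

def p_bc_true(mass, mcal):
    # P(BC = true | MASS, MCAL): sum over all (I1, I2) configurations for
    # which the interaction function OR(I1, I2) is true.
    total = 0.0
    for i1, i2 in product([True, False], repeat=2):
        if not (i1 or i2):                       # OR(I1, I2) must be true
            continue
        p_i1 = P_I1_TRUE[mass] if i1 else 1.0 - P_I1_TRUE[mass]
        p_i2 = P_I2_TRUE[mcal] if i2 else 1.0 - P_I2_TRUE[mcal]
        total += p_i1 * p_i2
    return total

print(p_bc_true(True, False))   # 0.85, as in the first example
print(p_bc_true(True, True))    # 0.94, combined influence of both causes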

References

1. Otten, J.D., Broeders, M.J., Fracheboud, J., Otto, S.J., de Koning, H.J., Verbeek, A.L.: Impressive time-related influence of the Dutch screening programme on breast cancer incidence and mortality, 1975-2006. International Journal of Cancer 123(8), 1929–1934 (2008)

2. Engeland, S.V.: Detection of mass lesions in mammograms by using multiple views. PhD Thesis (2006)

3. BI-RADS: Breast Imaging Reporting and Data System. American College of Radiology, Reston (1993)

4. Morton, M.J., Whaley, D.H., Brandt, K.R., Amrami, K.K.: Screening mammograms: interpretation with computer-aided detection - prospective evaluation. Radiology 239, 375–383 (2006)

5. Sardo, M., Vaidya, S., Jain, L.C.: Advanced Computational Intelligence Paradigms in Healthcare, vol. 3. Springer, Heidelberg (2008)

6. Buchanan, B.G., Shortliffe, E.H.: Rule-Based Expert Systems: The MYCIN Experiments of the Stanford Heuristic Programming Project. Addison-Wesley, Reading (1984)

7. Liebowitz, J.: The Handbook of Applied Expert Systems. CRC Press, Boca Raton (1997)

8. Jain, A., Jain, A., Jain, S., Jain, L.: Artificial Intelligence Techniques in Breast Cancer Diagnosis and Prognosis. World Scientific, Singapore (2000)

9. Burnside, E.S., Rubin, D.L., Fine, J.P., Shachter, R.D., Sisney, G.A., Leung, W.K.: Bayesian network to predict breast cancer risk of mammographic microcalcifications and reduce number of benign biopsy results. Journal of Radiology (2006)

10. Kahn, C.E., Roberts, L.M., Shaffer, K.A., Haddawy, P.: Construction of a Bayesian network for mammographic diagnosis of breast cancer. Computers in Biology and Medicine 27(1), 19–30 (1997)

11. Karabatak, M., Ince, M.C.: An expert system for detection of breast cancer based on association rules and neural network. Expert Systems with Applications: An International Journal 36(2), 3465–3469 (2009)

12. Wolberg, W.H., Street, W.N., Mangasarian, O.L.: Machine learning techniques to diagnose breast cancer from fine-needle aspirates. Cancer Letters 77, 163–171 (1994)

13. Ye, J., Zheng, S., Yang, C.: SVM-based microcalcification detection in digital mammograms. In: Proceedings of the 2008 International Conference on Computer Science and Software Engineering, vol. 6, pp. 89–92 (2008)

14. Roca-Pardinas, J., Cadarso-Suarez, C., Tahoces, P.G., Lado, M.J.: Assessing continuous bivariate effects among different groups through nonparametric regression models: An application to breast cancer detection. Computational Statistics & Data Analysis 52(4), 1958–1970 (2008)

15. Gupta, S., Chyn, P., Markey, M.: Breast cancer CADx based on BI-RADS descriptors from two mammographic views. Medical Physics 33(6), 1810–1817 (2006)

16. Good, W., Zheng, B., Chang, Y., Wang, X., Maitz, G., Gur, D.: Multi-image CAD employing features derived from ipsilateral mammographic views. In: Proceedings of SPIE, Medical Imaging, vol. 3661, pp. 474–485 (1999)


17. Engeland, S.V., Timp, S., Karssemeijer, N.: Finding corresponding regions of interest in mediolateral oblique and craniocaudal mammographic views. Medical Physics 33(9), 3203–3212 (2006)

18. Engeland, S.V., Karssemeijer, N.: Combining two mammographic projections in a computer aided mass detection method. Medical Physics 34(3), 898–905 (2007)

19. Paquerault, S., Petrick, N., Chan, H., Sahiner, B., Helvie, M.A.: Improvement of computerized mass detection on mammograms: Fusion of two-view information. Medical Physics 29(2), 238–247 (2002)

20. Yuan, Y., Giger, M., Li, H., Luan, L., Sennett, C.: Identifying corresponding lesions from CC and MLO views via correlative feature analysis. In: Krupinski, E.A. (ed.) IWDM 2008. LNCS, vol. 5116, pp. 323–328. Springer, Heidelberg (2008)

21. Gupta, S., Zhang, D., Sampat, M.P., Markey, M.K.: Combining texture features from the MLO and CC views for mammographic CADx. Progress in Biomedical Optics and Imaging 7(3) (2006)

22. Zheng, B., Leader, J.K., Abrams, G.S., Lu, A.H., Wallace, L.P., Maitz, G.S., Gur, D.: Multiview-based computer-aided detection scheme for breast masses. Medical Physics 33(9), 3135–3143 (2006)

23. Zheng, B., Tan, J., Ganott, M.A., Chough, D.M., Gur, D.: Matching breast masses depicted on different views: a comparison of three methods. Academic Radiology 16(11), 1338–1347 (2009)

24. Wei, J., Chan, H., Sahiner, B., Zhou, C., Hadjiiski, L.M., Roubidoux, M.A., Helvie, M.A.: Computer-aided detection of breast masses on mammograms: dual system approach with two-view analysis. Medical Physics 36(10), 4451–4460 (2009)

25. Pearl, J.: Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Francisco (1988)

26. Jensen, F., Nielsen, T.: Bayesian Networks and Decision Graphs. Springer, Heidelberg (2007)

27. Domingos, P., Pazzani, M.: On the optimality of the simple Bayesian classifier under zero-one loss. Machine Learning 29, 103–130 (1997)

28. Heckerman, D., Breese, J.S.: Causal independence for probability assessment and inference using Bayesian networks. IEEE Transactions on Systems, Man and Cybernetics, Part A 26(6), 826–831 (1996)

29. Lucas, P.J.F.: Bayesian network modelling through qualitative patterns. Artificial Intelligence 163, 233–263 (2005)

30. Diez, F.: Parameter adjustment in Bayes networks: The generalized noisy OR-gate. In: Proceedings of the Ninth Conference on UAI. Morgan Kaufmann, San Francisco (1993)

31. Velikova, M., Lucas, P., de Carvalho Ferreira, N., Samulski, M., Karssemeijer, N.: A decision support system for breast cancer detection in screening programs. In: Proceedings of the 18th biennial European Conference on Artificial Intelligence (ECAI), vol. 178, pp. 658–662 (2008)

32. Velikova, M., Samulski, M., Lucas, P., Karssemeijer, N.: Improved mammographic CAD performance using multi-view information: A Bayesian network framework. Physics in Medicine and Biology 54(5), 1131–1147 (2009)

33. Murphy, K.: Bayesian Network Toolbox for Matlab, BNT (2007), http://www.cs.ubc.ca/~murphyk/Software/BNT/bnt.html

34. Hanley, J.A., McNeil, B.J.: The meaning and use of the area under a Receiver Operating Characteristic (ROC) curve. Radiology 143, 29–36 (1982)

35. Metz, C., Wang, P., Kronman, H.: A new approach for testing the significance of differences between ROC curves measured from correlated data. In: Information Processing in Medical Imaging, Nijhoff (1984)