


Journal of Electronic Imaging 17(3), 031104 (Jul–Sep 2008)


Practical use of receiver operating characteristic analysis to assess the performances of defect detection algorithms

Yann Le Meur
Jean-Michel Vignolle

Trixell
38430 Moirans, France

E-mail: [email protected]

Jocelyn Chanussot
GIPSA-Lab

Department Image et Signaux
38402 Saint Martin d'Heres, France

Abstract. Defect detection in images is a current task in quality control and is often integrated in partially or fully automated systems. Assessing the performances of defect detection algorithms is thus of great interest. However, because this is application- and context-dependent, it remains a difficult task. We describe a methodology to measure the performances of such algorithms on large images in a semi-automated defect inspection situation. Considering standard problems occurring in real cases, we compare typical performance evaluation methods. This analysis leads to the construction of a simple and practical receiver operating characteristic (ROC) based method. This method extends the pixel-level ROC analysis to an object-based approach by dilating the ground truth and the set of detected pixels before calculating the true-positive and false-positive rates. These dilations are computed thanks to the a priori knowledge of a human-defined ground truth and give the true-positive and false-positive rates more consistent values in the semi-automated inspection context. Moreover, the dilation process is designed to be automatically suited to the object's shape in order to be applied to all types of defects without any parameter to be tuned. © 2008 SPIE and IS&T. [DOI: 10.1117/1.2952590]

1 Introduction

Quality-control tasks are some of the main application fields of digital image processing, particularly detection theory. The increasing number of new image processing techniques applied to industrial inspection is relevant proof of the interest taken by both the industrial and academic communities in this problem.

The leading applications of these techniques are defect detection on textile, wood, or other industrial materials by automated inspection of digital images.1,2 These images can be acquired by simple optical imaging, x-ray imaging, or nondestructive methods like ultrasound reflection on the surfaces to be inspected. This paper considers the retrieval of defects on digital x-ray detectors.

Paper 07160SSRR received Aug. 1, 2007; revised manuscript received Feb. 13, 2008; accepted for publication Feb. 13, 2008; published online Jul. 9, 2008. This paper is a revision of a paper presented at the SPIE conference on Quality Control by Artificial Vision, May 2007, Le Creusot, France. The paper presented there appears (unrefereed) in SPIE proceedings vol. 6356.

1017-9909/2008/17(3)/031104/14/$25.00 © 2008 SPIE and IS&T.


Digital detectors are now used in x-ray radiography to acquire digital images. The advantages of this fully digital system are obvious: Lower exposure is required than with film systems; the images have a better quality; the digital format enables easy storage and transmission; digital processing algorithms can be used in order to enhance diagnostic reliability; etc.

Since the production process of such devices is lengthy and requires human intervention, an important issue is to check the quality of the detector's output images, particularly to search for potential defects in these images. The detection and localization of such defects can be achieved by image processing algorithms. Defects from digital x-ray detectors produce spurious features in the output images, with various shapes and properties. Their detection thus remains a difficult problem, and several algorithms must consequently be considered, evaluated, and compared.

Different methods to quantify the detection performances of an algorithm have been described in the literature. In the context of text detection and recognition, Wolf and Jolion3 assess the performances by rectangle matching and performance graphs. Liu and Haralick4 proposed a simple method based on neighborhood inspection to evaluate edge detection performances. Nascimento and Marques5 classify types of detection errors in order to build a metric for the evaluation of object detection algorithms in surveillance applications. More general methods use common metrics merged in a basic way6 or with fuzzy logic.7 Even though they can be useful for specific applications, the reliability of these methods vanishes when considering the task of detecting different objects with various shapes.

We propose a practical view of how defect detection algorithms can be evaluated and address the assessment of these algorithms with a method based on well-known ROC analysis and object morphology. An overview of the inspection task is followed by a brief description of the ROC methodology, and an original method derived from this methodology is presented and discussed. Finally, we illustrate how to use such a method to process an automated thresholding of detection images. This paper is an extended


version of our presentation8 in 2007 at the Eighth International Conference on Quality Control by Artificial Vision (QCAV).

2 The Inspection Task

Digital x-ray detectors provide large grayscale images, larger than 3,000×3,000 pixels. The inspection task consists of finding defects in these images. The considered images are acquired with a nominal x-ray dose and without any subject between the x-ray generator and the detector: Consequently, the images are composed only of the acquisition noise, which we will consider as background, and of the potential defects we want to detect. In this case, the goal of detection algorithms is to localize defect areas so that a human expert can then inspect them more specifically. The quality-control task is not fully automated: The detection algorithms bring assistance to make human decisions faster and more reliable. This is a typical situation in an industrial context: The detection algorithms are designed to catch the attention of a human operator and focus it only on potential defect areas in order to make the inspection task less tedious. The evaluation of the algorithm performances must take this context into account and provide a measure suitable for various defects. The main assumptions introduced by this specific application are as follows:

• The perfect location of all the defective pixels is not required: Actually, the human expert just needs some pixels at each defect location to identify it. On the other hand, the whole set of defects must be identified by the detection of at least one pixel lying inside each target. A target, or a defect within the ground-truth map, is defined as a connected set of defective pixels.

• The borders of each target are approximately defined: Since the design of a ground truth remains subjective, the areas near the borders of a defect can be seen as either defective pixels or background pixels. Moreover, some defects have naturally fuzzy borders; a precise and certain ground truth can thus not be defined.

These observations imply particular definitions in order to use the ROC methodology, which we will describe in Section 4.

In our specific application, there are two kinds of defects to detect:

1. punctual defects: isolated pixels or small clusters of pixels with abnormal statistics (strong luminance);

2. extended defects: a spatially correlated set of pixels that are not statistically atypical when considered individually. They come in the shape of lines, columns, spots, or gathered clusters. They contain several pixels, which are not necessarily punctual defects.

Figure 1 shows examples of synthetic defects with the associated defect map designed by a human expert. These examples have been "handmade" to illustrate extreme cases of defects that could occur in any kind of imaging system.

This figure spotlights the various ways to build the defect maps, depending on the type of defect. For punctual defects, the defect map is defined precisely, whereas for extended defects, like the fuzzy spot, the defect map is more subjective. When several clusters are gathered (as on the right side of Fig. 1(a)), the defect map includes the clusters and some nondefective pixels in one single object. It also underlines the scale adaptation required to identify high-level structures of defects. As a matter of fact, the clusters of defective pixels on the right side of Fig. 1(a) are not identified as several punctual defects, but rather as a single defective area made of punctual defective pixels. These remarks should be considered when designing a method to assess the performance of defect detection algorithms in such a semi-automated inspection task.

3 The ROC Analysis

3.1 Definitions and ROC Curves

Initially developed for the evaluation of radar detection systems in the 1950s, ROC analysis was first described in terms of signal detection theory in the mid-1960s.9,10 Over the years, it has become a standard technique to evaluate detection performances.11,12 First used to measure diagnostic performances of medical imaging systems, especially in radiological imaging,13–15 the ROC methodology has since been extended to various detection systems.

For a single-target (a defective area in our case) problem, the ROC analysis consists of measuring the binary response of the detection system (target present or not) to one stimulus, in our case an image, by calculating the true-positive rate tpr and false-positive rate fpr with

\[
\mathit{tpr} = \frac{\text{true positives}}{\text{total positives}}, \tag{1}
\]
\[
\mathit{fpr} = \frac{\text{false positives}}{\text{total negatives}}. \tag{2}
\]

Figure 2 presents the classical representation of a confusion matrix.

A couple (fpr; tpr) corresponds to one point in the ROC plane. ROC curves are computed for varying parameters of the detection systems, and tpr and fpr are computed for each value of the parameter. The ROC analysis is an appropriate tool to deal with detection performances since it takes the prevalence of each class into account and provides two complementary and intuitive measures that are meaningful for the system calibration.

Fig. 1 Different kinds of defects and corresponding defect map drawn by a human expert. One can notice the subjectivity of this task: borders are delimited approximately, and clusters of defective pixels can be gathered to form a single defective cluster.


In a defect detection context, there can be many defects in one image. The true positives and false positives can be computed on this image following the Free-ROC methodology16 (FROC). Free-ROC is an extension of the ROC methodology to target localization, while ROC only deals with target detection. In our application, the advanced theory of Free-ROC is not needed, so we will only consider basic tools of the ROC methodology. Considering an image with several targets (the defects) and a defect detection algorithm providing a pixel-by-pixel classification with two classes (defect or no defect), there are four cases for each pixel pi of the image:

1. pi is classified as defect and is a defect in the ground-truth image; it is a true positive, also called a hit (the true-positive rate is also known as recall).

2. pi is classified as background (no defect) and is a background pixel in the ground-truth image; it is a true negative.

3. pi is classified as defect and is a background pixel in the ground-truth image; it is a false positive, also called a false alarm.

4. pi is classified as background and is a defect pixel in the ground-truth image; it is a false negative.
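As an illustration, the four cases can be tallied directly from the two masks. A minimal sketch in Python with NumPy, assuming binary masks in the sense of Section 4 (the function and variable names are ours):

```python
import numpy as np

def confusion_counts(test_mask, target_mask):
    """Tally of the four cases above over all pixels of the image."""
    det = test_mask.astype(bool)    # pixels classified as defect
    tgt = target_mask.astype(bool)  # ground-truth defect pixels
    tp = ( det &  tgt).sum()        # case 1: hits
    tn = (~det & ~tgt).sum()        # case 2: true negatives
    fp = ( det & ~tgt).sum()        # case 3: false alarms
    fn = (~det &  tgt).sum()        # case 4: missed defect pixels
    return tp, tn, fp, fn
```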

The main advantage of ROC analysis is that the two quantities tpr and fpr are normalized to the number of positive and negative samples, respectively. Thus, unlike traditional measures like accuracy (the percentage of pixels correctly classified), tpr and fpr cannot be biased by a small prevalence of one class compared to the other.

To spotlight this statement, let us consider a detection task on a 1,000×1,000-pixel image with one single defective pixel. An algorithm that systematically detects nothing is actually very accurate: 99.9999% of the pixels are correctly classified. On the other hand, both its fpr and tpr will be 0%, thus revealing really bad detection performances. As a conclusion, the sole accuracy of the algorithms does not provide enough information to ensure a reliable estimation and is biased in this case by the small prevalence of the defect class.
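A quick numerical check of this statement, under the same assumptions (a 1,000×1,000 image, one defective pixel, and a detector that flags nothing; the variable names are ours):

```python
import numpy as np

target = np.zeros((1000, 1000), dtype=bool)
target[500, 500] = True          # the single defective pixel
test = np.zeros_like(target)     # an algorithm that detects nothing

accuracy = (test == target).mean()               # 0.999999: misleadingly high
tpr = (test & target).sum() / target.sum()       # 0.0: the defect is missed
fpr = (test & ~target).sum() / (~target).sum()   # 0.0
```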

ROC analysis features in one curve the sensitivity (equivalent to the true-positive rate tpr) of the detection system versus the fpr, which are the two quantities of interest in a quality-control context. It indicates how many false alarms are generated by the system for a given detection sensitivity. Moreover, an ROC curve shows the dynamic behavior of the system with respect to a change in the decision threshold. This information can be used to choose between two detection systems: The ROC curve of the better detection system is always higher than the other curve in the ROC plane.

Fig. 2 The confusion matrix represents the true positives and the false positives for a defect detection task. These two quantities are plotted on ROC curves.



The ROC curves of four detection algorithms are displayed in Fig. 3. The perfect detection algorithm is "algo1," whose ROC curve is a step function (100% tpr for any fpr). In the case where the two classes, defects and background, are equally distributed, an algorithm corresponding to a random decision has the ROC curve "algo4" (the ascending diagonal of the ROC plane). Between random and ideal decision, "algo2" performs better than "algo3."

A practical measure of the global performance of an algorithm is given by the area under its ROC curve. This area under curve (AUC) is commonly used to quantify with one single number the overall performance of a detection algorithm.13,17,18
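In practice, an ROC curve is traced by sweeping the decision threshold over the grayscale output of a detection algorithm, computing one (fpr; tpr) point per threshold, and the AUC follows by trapezoidal integration. A minimal sketch reusing the confusion_counts tally above (detection_image is a hypothetical grayscale output; the implementation details are ours):

```python
import numpy as np

def roc_points(detection_image, target_mask, n_thresholds=256):
    """One (fpr, tpr) point per decision threshold, sorted by fpr."""
    points = []
    for t in np.linspace(detection_image.min(), detection_image.max(), n_thresholds):
        tp, tn, fp, fn = confusion_counts(detection_image >= t, target_mask)
        points.append((fp / (fp + tn), tp / (tp + fn)))
    return sorted(points)

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2.0
               for (x1, y1), (x2, y2) in zip(points[:-1], points[1:]))
```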

4 Comparison of the Masks

In Section 3, the ROC analysis was presented as a useful tool to assess the performances of a detection algorithm. However, a major problem remains: How can tpr and fpr be estimated? In other words, which pixels should be considered as true positives or false positives?

At each decision threshold, tpr and fpr are calculated by comparing a binary detection mask (with ones for defects and zeros for the background) with a ground-truth mask. In the following, the detected defect mask is called the test mask, Mi,j. It results from a pixel-wise decision produced by the detection algorithm. The manually designed ground-truth mask is called the target mask, Ti,j. In practice, the simplest way to compare these two masks is to make a pixel-level comparison, thus exactly fitting the definitions of false positive and true positive given in Section 3.

Fig. 3 Examples of ROC curves. In this example, four ROC curves have been plotted. The first one, from algo1, shows that this detection algorithm provides a perfect result (100% true positives for all decision thresholds). The curve from algo4 has been computed from a detection algorithm that decides randomly between the two possible classes (defect or no defect). The two other curves are those of two other detection algorithms. Algo2 performs a better detection since its curve always stays above algo3's curve.


As discussed in Section 2, the human expert does not need the detector to provide pixel-wise precision for the localization of the defects. Based on this assumption, alternative methods have been proposed to calculate tpr and fpr: Theiler et al.19 suggest transforming the test mask in order to consider each object of the target mask as 100% detected as soon as at least one pixel is detected in this object. In a similar way, Harvey and Theiler20 proposed dilating the test mask by a fixed factor, in order to include some of the "near-hit" pixels of the test mask in the tpr count.

Such alternative methods take on significance in our semi-automated inspection context, where we have to be more specific about the definition of a true positive and a false alarm. First, let's consider the true positives: In a multitarget situation (i.e., there is more than one target in the ground truth), the tpr should be 100% if the detected pixels lead to the visual localization of all targets, even if all the defective pixels are not detected. In the example in Fig. 4(a), about 66% of the defective pixels are detected (hatched areas), but the detection allows the expert to localize all the targets. The global tpr in Fig. 4(b) is also 66%, but for the human expert the detection is clearly worse since the bottom target has been completely missed. Now, considering false alarms, we should make a distinction between isolated false alarms, false alarms close to a target ("near hits"), and clusters of false alarms. For the human expert, the number of false alarms is the number of observation windows, called AOIs ("areas of interest"), that the system will display to be checked. Each window displayed with no true defect will be considered a false alarm. In the example in Fig. 4, the two images of Figs. 4(c) and 4(d) both have 10 false-alarm pixels. But for Fig. 4(c), the human expert will consider the detection to have only one false alarm: The few detected pixels near the target cannot be seen as a false alarm because the associated AOI will include a part of the target. The other detected pixels form a cluster that will be embedded in only one AOI. This detected AOI consequently raises the only false alarm of the image. In Fig. 4(d), there are only isolated false alarms: The expert has to check 10 AOIs, none of which includes a defect; the detection system then raises 10 false alarms.

Taking these observations into account, we propose a method to compare binary masks that is less penalizing for detected pixels near the object borders.

Fig. 4 tpr and fpr: In each image, the targets to detect are in gray and the detected pixels are in white. The first two images have the same tpr, and the last two have the same number of false-alarm pixels. Nevertheless, in our inspection context, the configuration corresponding to (a) and (c) is considered a better detection result since all the targets have been detected, and there is only one true false alarm. Indeed, the detected pixels around the target in (c) should not be counted as false alarms, as they are very close to the target.


These pixels are considered false alarms by all the previous methods. The main purpose of such a new method is to extend the pixel-based ROC analysis to an object-based analysis. In the meantime, it should remain adapted to the ROC framework. The proposed method does not require any extra parameter to be tuned and can be applied to cases where multiple targets of different sizes and shapes are to be found in a single image.

In the following parts, three comparison methods for masks are described and discussed, namely, the simple pixel-level comparison, Theiler's method, and our proposed method. The proposed method is focused on the problems raised by the semi-automated inspection task. Harvey's comparison method requires a strong a priori knowledge of the size of the targets. It is thus not further developed here.

4.1 Pixel-Level Mask Comparison

The first and most intuitive method is to compute the binary comparison between target and test masks, without any preprocessing. Considering the binary target mask Ti,j with P defective pixels (pixels with value 1) and N background pixels (with value 0), the pixel-level mask comparison is described by Fig. 5. The "pixel count" box returns the number of pixels with value 1 in the input image, and the "not" box stands for the binary complement operator. tpr and fpr are thus computed as follows:

\[
\mathit{tpr} = \frac{1}{P}\sum_{i,j} M_{i,j}\,T_{i,j}, \tag{3}
\]
\[
\mathit{fpr} = \frac{1}{N}\sum_{i,j} M_{i,j}\,(1 - T_{i,j}). \tag{4}
\]
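Read literally, Eqs. (3) and (4) are two mask products followed by the pixel counts of Fig. 5. A direct transcription, assuming M and T are NumPy arrays of zeros and ones (the casts and names are ours):

```python
import numpy as np

def pixel_level_rates(M, T):
    """Pixel-level comparison of test mask M and target mask T, Eqs. (3) and (4)."""
    M, T = M.astype(np.uint8), T.astype(np.uint8)
    P = T.sum()                    # defective pixels ("pixel count" box)
    N = (1 - T).sum()              # background pixels ("not" + "pixel count")
    tpr = (M * T).sum() / P        # Eq. (3)
    fpr = (M * (1 - T)).sum() / N  # Eq. (4)
    return tpr, fpr
```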

The pixel-level mask comparison is computed on the three synthetic defects in Figs. 6(a)–6(c). The corresponding ground truth is displayed in gray on the second line of this figure, with the test mask appearing in white. Missed pixels on these latter images are in gray, and detected pixels are always in white, whether they are false alarms or true positives. The three defects chosen are

• punctual defects: six isolated defective pixels;
• fuzzy spot defect: a bright spot with fuzzy borders;
• cluster defects: a cluster of defective pixels forming an area identified by the human expert as one single defective object.

For the purpose of clarity, the ground-truth pixels for the punctual defects are pointed out in Fig. 6(d) by arrows.

Table 1 presents the computed tpr and fpr for the three defects with this pixel-level mask comparison.

For the isolated pixels, the simple pixel-level mask comparison provides satisfactory results. For these kinds of defects, the exact location is required, and false alarms are raised even if the detection is close to the defect. In practice, the system must be very precise for this kind of defect in order to catch the human expert's attention on the exact pixel, because of the very small size of the defects. In this case, the pixel-level comparison performs well, giving a tpr of one out of two and an fpr that correctly represents the number of times the expert will focus on a nondefective pixel.



The situation of the fuzzy spot defect is more ambiguous. In the present example, some pixels inside the target are actually detected, but not all of them are. In the meantime, the ground truth is set as a circular area that includes the fuzzy spot. In this case, the algorithm's detection is almost perfect: The number of good detections and their location at the center of the defect are sufficient information for the human expert to properly identify the defect. But due to the ground-truth definition, the fpr and tpr computed at a pixel level are far from the fairly good expected values. Moreover, one can consider the detected pixel at the bottom left of the defect as a near hit, which then should not be treated as a false alarm. As a matter of fact, the assessment of detection performances faces the high subjectivity linked to the design of the ground truth, especially in this case where the defect's borders are fuzzy. This subjectivity is not integrated in the pixel-level comparison.

Fig. 5 The pixel-level mask comparison computes the tpr and fpr by direct comparison between target and test masks. "Not" stands for binary complement, and "count" returns the number of white pixels in the input image.

Fig. 6 Three kinds of defects and corresponding target and test masks. Pixels in gray are those from the target mask, drawn by a human expert. Pixels in white are the pixels detected as defective by a detection algorithm.





Similar remarks hold for the example of the cluster defect. Again, some pixels (left and right sides of the cluster) are counted as false alarms when they are too close to the borders to be objectively considered as such. Second, many pixels are detected inside the cluster (mainly the punctual defects inside this cluster). Nevertheless, the tpr barely reaches 2%, whereas a human expert would identify the whole defect area thanks to these detected pixels.

These examples underline the unsuitability of the pixel-level comparison to assess detection performances in a complex context. The main reason is the following: The method does not take into account the assumptions underlined in Section 2.

4.2 Theiler's Mask Comparison

As the simple pixel-level comparison leads to an inappropriate assessment of detection performances, there is a need to derive a technique that somehow mimics the human expert. In this way, Theiler proposed a metric to perform a higher-level interpretation of the test mask. This technique depends on a "filling-in" process: All of a target's pixels are considered detected if the detection algorithm detects at least one of them (see Fig. 7). tpr and fpr are thus computed as follows:

\[
\mathit{tpr} = \frac{1}{P}\sum_{i,j} \mathrm{Fill}(M_{i,j})\,T_{i,j}, \tag{5}
\]

Table 1 tpr and fpr computed with the pixel-level mask comparison.

Defect      fpr (%)   tpr (%)
Punctual    0.12      50.0
Spot        0.03      3.8
Cluster     0.12      1.5

Fig. 7 Theiler's mask comparison diagram. It adds a "fill-in" operator to the previous pixel-level method. Then, if at least one pixel is detected in a target, all the pixels of this target will be counted as true positives.


\[
\mathit{fpr} = \frac{1}{N}\sum_{i,j} M_{i,j}\,(1 - T_{i,j}), \tag{6}
\]

where Fill is the filling-in operator.

Following this approach, tpr and fpr are computed on the defects of Fig. 6. The corresponding results are displayed in Table 2.
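The Fill operator can be sketched with connected-component labeling; scipy.ndimage.label stands in here for the identification of the targets (the implementation is our assumption, not Theiler's original code):

```python
import numpy as np
from scipy import ndimage

def fill_in(M, T):
    """Theiler's filling-in: every target touched by at least one detected
    pixel is marked as entirely detected."""
    labels, n_targets = ndimage.label(T)              # one label per target
    hit = np.unique(labels[(M > 0) & (labels > 0)])   # labels hit by the test mask
    return (np.isin(labels, hit) & (T > 0)).astype(np.uint8)

# Eq. (5): tpr = (fill_in(M, T) * T).sum() / T.sum()
# Eq. (6): the fpr count is unchanged from Eq. (4).
```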

For punctual defects, Theiler's method provides satisfactory results, similar to those obtained with the pixel-level mask comparison.

For the other two examples, the fpr according to Theiler's comparison is unchanged, but the tpr is now 100%. As a matter of fact, in each case, at least one pixel is detected inside the target. This strategy is well suited in some cases: For the fuzzy spot defect, the detected pixels are centered on the defect and are distributed over an area that is not too different from the true defect. Then, the expert will consider all these detected pixels as one detection, allowing him to find the defect. The tpr of 100% is a correct measure of the detection performances.

On the other hand, the limits of this filling-in strategy arise when considering the cluster defect. In this case, mainly the bottom part of the defect is detected, while the top part has no detection. The expert can miss one part of the defect due to a lack of detected pixels on the whole area of the ground truth. This fact is not expressed, since Theiler's method attributes a tpr of 100% to the detection of the cluster defect. This problem may occur especially with extended defects and in a multitarget context.


Table 2 tpr and fpr for Theiler's mask comparison.

Defect      fpr (%)   tpr (%)
Punctual    0.12      50
Spot        0.03      100
Cluster     0.12      100




The fpr computation follows the same rule as the pixel-level comparison, with the drawbacks described in the previous section.

To conclude, Theiler's comparison extends the pixel-level approach to an object-based approach by simply adding a filling-in process to the test and target mask comparison. In a single-target context, and with small targets compared to the image size, this comparison performs an acceptable assessment in terms of inspection. Nevertheless, it does not take the distribution of the detected pixels inside the target into account, which is a criterion to watch before declaring the target 100% detected. Moreover, the false-alarm computation remains inappropriate.

4.3 The Proposed Soft Mask Comparison

In this section, we present an original method that provides a practical response to the performance assessment requirements and overcomes the limitations of the standard methods previously described. The principle of our method is illustrated by Fig. 8.

The method requires two new operators: target dilation and test dilation. The target dilation is applied on the target mask and aims at extending the areas of targets to take into account the subjectivity of the "ground-truthing" task and the fuzzy nature of target borders. The test dilation is applied on the test mask, which contains the pixels detected by the detection algorithm. This process mimics the way a human expert would analyze the detection result: She would focus her attention not only on the detected pixels, but also on the pixels around them. These dilation processes require the computation of a distance map, as explained in the following. This distance map needs to be computed only once for a given target mask. Then, the computation of several ROC points (varying test mask) requires only one distance map computation and one target dilation. The proposed method only adds a few operations compared to the pixel-level methods: the dilations on the test mask.

Fig. 8 Original soft mask comparison. Target and test masks are dilated by a dilation factor computed in accordance with the size of each target. This process aims to mimic the fpr and tpr count, which should be made by a human expert in an inspection context.


4.3.1 Distance map computation

The distance map is computed on the ground truth (target mask). It will provide a map with a dilation factor assigned to each target. The computation of the distance map is done in two steps. First, a Euclidean distance transform21 of the target mask is computed, and then the directed Hausdorff distance22 of each target is determined. These two steps are shown in Fig. 9.

1. Euclidean distance transform. The Euclidean distance between pixels (i, j) and (m, n) is defined as
\[
d_e\bigl((i,j),(m,n)\bigr) = \sqrt{(i-m)^2 + (j-n)^2}.
\]

The Euclidean distance transform of the target mask calculates the Euclidean distance between the pixels of the target and the background. The pixel values of the resulting image D(i, j) are computed as follows:
\[
D(i,j) = \min_{m,n}\,\bigl\{ d_e\bigl((i,j),(m,n)\bigr) \;\big|\; T(m,n) = 0 \bigr\}.
\]

This value represents the minimum distance of each pixel of the target mask to the background. Figure 9 illustrates an example of a Euclidean transform on an image with two targets. The Euclidean distance of a target to the background depends on the shape of the target.


Fig. 9 Distance map computation. The Euclidean transform of the target mask is computed first. It represents the Euclidean distance of each target pixel (in white) to the background (in black). For each target, the maximum of the Euclidean distance transform is taken; it is the directed Hausdorff distance of the target, which will be stored in the distance map.




2. Directed Hausdorff distance. The final step is to extend the values computed at a pixel level to the whole target. For each target, we take the maximum value in the distance transform image over the pixels belonging to the target. Let L(i, j) be a labeled version of the target mask: L(i, j) = k if the pixel (i, j) belongs to target k, k = 1, ..., n targets, and L(i, j) = 0 if (i, j) belongs to the background. The value stored in the final distance map K(i, j) is
\[
K(i,j) = \max_{m,n}\,\bigl\{ D(m,n) \;\big|\; L(m,n) = L(i,j) \bigr\}.
\]
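Both steps map onto standard operators: a Euclidean distance transform for D(i, j) and a per-label maximum for K(i, j). A sketch with SciPy (distance_transform_edt and label are standard scipy.ndimage functions; the assembly into a distance map is ours):

```python
import numpy as np
from scipy import ndimage

def distance_map(T):
    """Distance map K and labeled target mask L (Section 4.3.1)."""
    D = ndimage.distance_transform_edt(T)  # distance of each target pixel to the background
    L, n = ndimage.label(T)                # L(i, j) = k on target k, 0 on background
    K = np.zeros_like(D)
    for k in range(1, n + 1):
        K[L == k] = D[L == k].max()        # directed Hausdorff distance of target k
    return K, L
```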

4.3.2 Target dilation

Target dilation aims at dilating each target of the target mask T(i, j) according to the dilation factor computed by the process described earlier (see Fig. 10). For each target, a circular structuring element of radius Ki,j is used for the dilation. This target dilation permits us to create an area around each target where detected pixels won't be counted as false alarms. The target-adapted aspect of these dilations is one of the main contributions of the proposed method: It does not require any human intervention to set an arbitrary dilation factor, as proposed in previous articles on this subject (see, e.g., Ref. 20). Moreover, this dilation allows us to use the method in multitarget situations since the dilation factors are adapted to each target of T(i, j).

4.3.3 Test dilation

The test dilation is done on the test mask M(i, j). Each pixel of this mask is dilated by a circular structuring element whose radius is determined by the value K(i, j) stored in the distance map, as illustrated by Fig. 11. This test dilation expresses the fact that an exhaustive detection of the target is not required for our semi-automated inspection task, as discussed in the first point of Section 2. Thus, one detected pixel has more impact on the tpr in our proposed method or in Theiler's method than in the pixel-level method. But, contrary to Theiler's method, the test dilation does not permit a tpr of 100% to be reached in all situations. It is suited to the observation window, which we consider to have a size similar to the size of the object. This is why a dilation factor computed with a directed Hausdorff distance is used for the test dilation. This is a reasonable argument since, for the inspection of one detected pixel, the human expert will adapt his observation scale to the context surrounding the pixel, i.e., to the target.

Fig. 10 Target dilation process. (a) The two targets (in white) are dilated with respect to the dilation factors stored in the distance map. The structuring elements used for these dilations are displayed in gray. (b) The result of this dilation. The target dilation process allows a dilation of the target mask suited to the shape of each target, thanks to the distance map previously computed.



The tpr and fpr are thus computed as follows:

\[
\mathit{tpr} = \frac{1}{P}\sum_{i,j} \mathrm{TestDil}(M_{i,j}, K_{i,j})\,T_{i,j}, \tag{7}
\]
\[
\mathit{fpr} = \frac{1}{N}\sum_{i,j} M_{i,j}\,\bigl(1 - \mathrm{TargDil}(T_{i,j})\bigr), \tag{8}
\]

where TestDil stands for the test dilation process and TargDil for the target dilation process.
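Putting the pieces together, the soft comparison can be sketched as follows. Dilation by a circular structuring element is emulated here with a distance transform, and distance_map is the function sketched in Section 4.3.1; this is our reading of the method, not the authors' implementation:

```python
import numpy as np
from scipy import ndimage

def dilate_by(mask, radius):
    """Binary dilation of mask by a disk of the given radius."""
    if not mask.any():
        return mask.copy()
    return ndimage.distance_transform_edt(~mask) <= radius

def soft_rates(M, T):
    """tpr and fpr of Eqs. (7) and (8) with target and test dilations."""
    K, L = distance_map(T)
    # Target dilation (Section 4.3.2): each target grows by its own factor.
    T_dil = np.zeros(T.shape, dtype=bool)
    for k in range(1, L.max() + 1):
        T_dil |= dilate_by(L == k, K[L == k].max())
    # Test dilation (Section 4.3.3): detected pixels lying inside a target
    # grow by that target's factor; pixels that hit no target are not dilated.
    M_dil = (M > 0).copy()
    for k in range(1, L.max() + 1):
        hits = (M > 0) & (L == k)
        if hits.any():
            M_dil |= dilate_by(hits, K[L == k].max())
    tpr = (M_dil & (T > 0)).sum() / (T > 0).sum()     # Eq. (7)
    fpr = ((M > 0) & ~T_dil).sum() / (T == 0).sum()   # Eq. (8)
    return tpr, fpr
```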

The results of the proposed soft comparison method on the three test defects are shown in Fig. 12. The first line of this figure represents the target dilation required for the computation of fpr: The initial target is in light gray, and the dilated target is in dark gray, while the detected pixels are in white. The target dilation is designed to expand the target's borders while preserving the global shape of the target. Here, the dilation technique approximately doubles the extent of each target, since each one is dilated by its own Hausdorff distance. Two questions may then be raised:

1. Why does the dilation factor depend on the target's size?

2. Why should the dilation factor be set as described (i.e., one times the maximum distance to the borders of each target)?

The following answers can be given:

1. A large defect is observed at a larger scale. Consequently, the ground truth is less precise than for a very small defect. Thus, a larger error for the near-hit pixels (not counted as false alarms) should be allowed.

2. We can make the reasonable assumption that a defect is observed with a window whose dimension is approximately two times larger than the defect. The target dilation process mimics the observation window by allowing detected pixels in an area of that size around the defect. This dilation is an auto-adapted scheme that does not require any parameter.

Fig. 11 Test dilation process. Each pixel in the test mask (white pixels in (a)) is dilated by a circular structuring element (gray dashed lines) with respect to the dilation factor stored in the distance map. In (b), the target mask (ground truth) is superposed as a continuous gray line. Pixels that did not hit a target are not dilated.



The result of this target dilation is that some detected pixels near the cluster or spot defects are no longer considered false alarms since they now lie within the dilated target mask. The second line of Fig. 12 shows the result of the test dilation on detected pixels (in white), with the dilated target mask superposed in gray. Each detected pixel lying in an initial target (in light gray) is dilated before the computation of tpr. This dilation mimics the human expert's behavior: The focus is not only on the detected pixels, but also on the surrounding pixels. It is then consistent to consider the area around these detected pixels for the computation of tpr, as explained in the test dilation process (Section 4.3.3). A consequence of this test dilation is that only some central points have to be detected in order to reach 100% tpr for a target. Typically, the detection of the skeleton23,24 of the target is sufficient to reach such a tpr, which is in accordance with the visual inspection task. The central points of a target are sufficient to identify the full target.

4.3.4 ROC point computation by the soft method

The corresponding values of fpr and tpr are shown in Table 3. For punctual defects, there is no change compared to the previous methods. For spot defects, the tpr reaches 100%, which is in great accordance with a human interpretation. Since the defect is small, the pixels flagged by the detection algorithms will lead to the identification of the defect: The human expert will focus the observation window on these pixels and will see the whole defect. The fpr has slightly dropped due to the pixels at the bottom of the defect, which are excluded from the fpr; indeed, they are "near hits." For the cluster defect, the fpr has dropped for the same reason, while the tpr now reaches 80%. As a matter of fact, the top part of the defect is not considered detected by our method. This is a relevant interpretation since the cluster defect is extended and is made of two defective areas. Only the bottom one is actually detected by the algorithm. In this situation, Theiler's method gives a tpr of 100%. It does not take into account the missed top part of the defect.

Fig. 12 Dilated target mask superposed with test mask (first line) and dilated target mask superposed with dilated test mask (second line). The dilation of the test mask leads to an increase in the tpr in the case of pixels detected inside a compact target. For isolated pixels, as we require a high precision in their localization, the test mask dilation has no effect on the tpr.



In the presented cases, the proposed soft mask comparison gives tpr and fpr results that are consistent with the expert's requests for detection performance assessment. This method performs a morphological dilation of the target and test masks and is auto-adapted to the multitarget problem since the dilation factors are computed only once for each target, without any parameter.


Table 3 tpr and fpr for the soft mask comparison.

Defect      fpr (%)   tpr (%)
Punctual    0.12      50
Spot        0.03      100
Cluster     0.07      80



5 Application of Proposed Method

5.1 Performance Assessment of Detection Algorithms by ROC Curves

In this section, we explain in detail two possible uses of our soft ROC method in the field of detection performance assessment. First, our method is used for the performance assessment of two algorithms on one kind of defect (Section 5.1.1), and for comparing the detection results of one algorithm on two types of defect (Section 5.1.2). The second use of our soft method is focused on the calibration of a detection algorithm. We show how our method can be used as a framework to properly calibrate algorithms in accordance with the defect inspection application (Section 5.2).

Fig. 13 Comparison of the detection performances of two algorithms. The first column displays the original image of the defect, the second column is the output of the detection algorithms (grayscale image), and the third column is the thresholding of the previous image at a particular decision threshold, with the corresponding AOI (gray square windows) that the human expert will check. The last column shows the ground truth with the AOI to be checked. Algorithm A raises one false alarm.

Fig. 14 ROC curves computed with the pixel-level, Theiler's, and soft mask comparison methods for algorithm A (black curve) and algorithm B (dashed gray curve). The pixel-level curves suggest a better detection performance for algorithm A, whereas our soft method suggests a better detection performance for algorithm B. Theiler's mask comparison cannot discriminate between the two algorithms since they are said to be perfect with this method.


5.1.1 Assessment of the performances of two algorithms on one type of defect

ROC analysis is a simple tool to compare the performances of different detection algorithms by comparing their respective AUC. In this example, we want to compare two detection algorithms, namely "algorithm A" and "algorithm B," with ROC analysis. The algorithm outputs are displayed in the second column of Fig. 13. Pixels detected as defective take high gray-level values (bright pixels). The corresponding ROC curves are plotted in Fig. 14, with the corresponding AUC reported in Table 4. A thresholding of these outputs is shown in the third column of Fig. 13.

The ROC curves and AUC computed with the pixel-level method clearly show that algorithm A provides better detection performance than algorithm B. On the contrary, our soft method leads to the opposite conclusion. Theiler's method gives no information; due to its hit-or-miss strategy, it is extremely sensitive on the tpr, and in our application this method is not appropriate. To settle the disagreement between the pixel-level and soft ROC analyses, we show in Fig. 13 that algorithm B, as a matter of fact, performs better than algorithm A in this situation. The third column features the result of a threshold applied on the figures from the second column. The threshold was set as follows:


previous image at a particular decision thresh-s� that the human expert will check. The lastcked. Algorithm A raises one false alarm.

iler’s and soft mask comparison methods forcurve�. The pixel-level curves suggest a better

oft method suggests a better detection perfor-nnot discriminate between the two algorithms

nce ofn is theof the

windowbe che

l, Theed grays our sison caod.

Jul–Sep 2008/Vol. 17(3)0

Page 11: Practical use of receiver operating characteristic analysis to …jocelyn.chanussot/publis/... · 2009. 12. 22. · Practical use of receiver operating characteristic analysis to

ttarc

�aadagcan

Tm

D

A

A

Le Meur, Vignolle, and Chanussot: Practical use of receiver operating characteristic analysis…

J

he pixel-level and soft ROC analysis, we show in Fig. 13hat algorithm B, as a matter of fact, performs better thanlgorithm A in this situation. The third column features theesult of a threshold applied on the figures from the secondolumn. The threshold was set as follows:

• for algorithm A: the decision threshold chosen is theone that maximizes the tpr and raises only one falsealarm;

• for algorithm B: the decision threshold chosen is theone that maximizes the tpr without raising any falsealarms.

Comparing the test masks obtained with the ground truthfourth column of Fig. 13�, one can notice that algorithm Bllows a better detection of the defect. In the meantime,lgorithm A generates a false alarm, which is close to theefect but sufficiently disconnected to be considered a falselarm. In fact, for our semi-automated inspection task, al-orithm B gives better results than algorithm A. As a con-lusion, the proposed soft ROC analysis ensures an evalu-tion of detection algorithms that better meets the usereeds.

able 4 AUC of algorithms A and B computed with the three ROCeasures: pixel-level, Theiler’s, and our soft method.

efect

AUCpixel-level

ROC

AUCTheiler’s

ROC

AUCsoft

ROC

lgorithm A 0.81 1 0.945000

lgorithm B 0.72 1 0.999947

Fig. 15 Comparison of detection performanceplays the original image of the defects, the se�grayscale image�, and the third column is thdecision threshold, with the corresponding AOIThe last column shows the ground truth with ththose AOI are false alarms.

ournal of Electronic Imaging 031104-1

5.1.2 Assessment of one algorithm on two types ofdefect

As explained in Section 3, ROC analysis can be performedon data with unbalanced class repartition, as the tpr and fprare not sensitive to particular class prevalence. Then, ROCanalysis can be used to compare detection results of onealgorithm when facing different kinds of defect with vari-ous shapes and sizes. Figure 15 shows the results of a de-tection algorithm on the defects called “Lines” and “Spot,”respectively. The corresponding defect image, the detectionresult �grayscale image�, test mask �previous image thresh-olding�, and comparison with the target mask are displayed.

To know which defects the algorithm performs well,ROC curves are plotted for the pixel-level method, Theil-er’s method, and the proposed soft method �see Fig. 16�.The corresponding AUC are reported in Table 5. From theresults obtained by the pixel-level ROC analysis, the con-clusion would be that the Lines defect is detected betterthan the Spot defect �the corresponding AUC is larger�. Noconclusion can be drawn from the results of Theiler’smethod �both AUC are too close to 1�.

In many practical cases, we have observed that Theiler’smethod is not discriminant enough. Very often, detectionperformances are considered perfect, thus preventing anyuseful comparison. Finally, if we look at the results ob-tained by our soft method, a conflicting conclusion can bedrawn: Soft ROC analysis gives a better score to detectionon Spot than on Lines.

As explained in Fig. 15, Spot is actually better detectedthan Lines with this algorithm. In this figure, two testmasks have been extracted �third column�. Considering theSpot defect, some pixels have been detected at the center ofthe defect, without any false alarms. For the human expertwho will check the detected pixels with a visualization win-dow �gray square boxes in Fig. 15�, the Spot defect will befully detected. On the other hand, considering the Lines

Lines and Spot defects. The first column dis-olumn is the output of the detection algorithmholding of the previous image at a particularquare windows� The human expert will check.to be checked. For the Lines defect, some of

on thecond ce thres�gray se AOI

Jul–Sep 2008/Vol. 17(3)1

Page 12: Practical use of receiver operating characteristic analysis to …jocelyn.chanussot/publis/... · 2009. 12. 22. · Practical use of receiver operating characteristic analysis to

dtTaewcmas

5Ottaudgnd

masm

tdttd

Tt

D

L

S

Le Meur, Vignolle, and Chanussot: Practical use of receiver operating characteristic analysis…

J

efect, Fig. 15 shows that the defect is detected only par-ially. In the meantime, the detection raised false alarms.his pair of test masks demonstrates that, for our semi-utomated inspection task, the detection algorithm consid-red reaches a better performance on Spot than on Lines,hich is the conclusion given by soft ROC analysis. In this

ase, the pixel-level ROC analysis would cause an assess-ent mistake. As a conclusion, the proposed soft ROC

nalysis gives a better performance assessment than thetandard evaluation method.

.2 Automated Calibration of Detection Algorithmsur proposed soft mask comparison method to compute the

pr and fpr has been introduced in the previous section. Inhis section, we will show how to use this method to maken automatic thresholding of images. Detection algorithmssually provide grayscale images where bright pixels areefective pixels and dark ones are background pixels. Toet a test mask from this grayscale detection image, weeed to set a decision threshold in order to binarize theetection image.

When the target mask is known, we can use the ROCethodology to automatically set the decision threshold atvalue leading to a given tpr or fpr. In this situation, the

oft mask comparison allows us to get test masks that areore consistent with the given tpr and fpr.Considering the defects introduced in Fig. 6, we need

he thresholded image of a grayscale image provided by aetection algorithm. We want the thresholding image at apr of 100% �full target image� with the minimum value ofhe fpr. This is a typical image observed to evaluate theetection performances of an algorithm. Figure 17 shows

Fig. 16 ROC curves computed with pixel-level,Lines �black curve� and Spot �dashed gray cdetection for the Lines defect, whereas our softTheiler’s mask comparison cannot distinguish bperfect with this method.

able 5 AUC measured on the Lines and Spot defects and with thehree ROC measures: pixel-level, Theiler’s, and our soft method.

efect AUC pixel-level ROC AUC Theiler’s ROC AUC soft ROC

ines 0.885 1 0.986

pot 0.763 1 1.000



For the cluster defect, Fig. 17(b) spotlights the high sensitivity of the pixel-level method. To get tpr = 100% with our detection algorithms (images in the left column of Fig. 17), nearly all the pixels of the image should be detected. Consequently, numerous false alarms are raised, and one could believe that the algorithm performs pretty poorly on this defect. This is not the case, and the soft mask comparison (Fig. 17(c)) allows us to avoid such a mistake: The thresholded image shows perfect detection in our semi-automated inspection context (the detected pixels are sufficient to localize the whole defect) with only a few false alarms. The same conclusion can be made with the thresholded images of Figs. 17(e) and 17(f). In these two cases, the pixel-level comparison, due to its pixel sensitivity, leads to overdetection (too many false alarms), while the proposed method, using the new definition of tpr, gives relevant binarized images.

6 Possible Extensions

Our method is a first step toward a high-level mask comparison that will be fully adapted to the postprocessing inspection made by the human expert. Some immediate extensions of this method may be developed. First, for the fpr computation, we should take into account the size of the AOIs that will be presented to the expert for visual inspection. Our method makes a pixel-by-pixel count of false alarms, which corresponds to a pixel-by-pixel inspection of these alarms, i.e., a size of 1 pixel for the AOI. We should rather consider the real size of the inspection window to gather clusters of false alarms into one single false alarm. This could be done by dividing the image into square windows of the same size as the AOIs and by counting the number of windows where false alarms actually occur. The number of false alarms would then measure the number of times an AOI without any real defect nevertheless has to be checked. Following the same idea, we can also perform a dilation of the false alarms by the size of the AOIs in order to merge clusters of false alarms into one single false alarm.

r’s, and soft mask comparison methods for theefects. The pixel-level curves suggest better

d suggests better detection for the Spot defect.n the two detections since they are considered

Theileurve� dmethoetwee

Jul–Sep 2008/Vol. 17(3)2

Page 13: Practical use of receiver operating characteristic analysis to …jocelyn.chanussot/publis/... · 2009. 12. 22. · Practical use of receiver operating characteristic analysis to

Ii

�hta�raarpcfwMfit

mttr

7Tt

Le Meur, Vignolle, and Chanussot: Practical use of receiver operating characteristic analysis…

J

In this scheme, we have to manage the problem of normalization of the number of false alarms before computing the fpr.
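A sketch of this window-based false-alarm count (the AOI side length in pixels, `aoi_size`, is a hypothetical parameter of the inspection system):

```python
import numpy as np

def false_alarm_windows(false_alarm_mask, aoi_size):
    # Tile the image with non-overlapping AOI-sized windows and count
    # those containing at least one false-alarm pixel: each counted
    # window stands for one needless visual check by the expert.
    h, w = false_alarm_mask.shape
    count = 0
    for i in range(0, h, aoi_size):
        for j in range(0, w, aoi_size):
            if false_alarm_mask[i:i + aoi_size, j:j + aoi_size].any():
                count += 1
    return count
```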

Second, the tpr has been normalized by the number P (see Eq. (7)) of defective pixels in the image. This choice has been made in order to give a quick interpretation of the tpr, but it can be too restrictive in certain cases. Consider an image made of two defects, one large and one punctual (one single defective pixel). A detection algorithm that correctly detects the first defect but misses the second one will achieve a fairly high tpr, whereas one target out of two has actually been missed. To avoid such situations, we should rather do an object-based normalization: the tpr is computed for each target, and the overall tpr for the image is computed by averaging all these rates. Then targets of different sizes with the same share of well-detected pixels would have the same impact on the overall tpr value. Moreover, if the AOI size is known, we should rather define a constant dilation factor for the test dilation that fits this size.
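A sketch of this object-based normalization, assuming SciPy's connected-component labeling to separate the individual targets:

```python
import numpy as np
from scipy import ndimage

def object_based_tpr(test_mask, target_mask):
    # Label each connected target, compute its own detection rate, and
    # average the rates so that a punctual defect weighs as much as a
    # large one in the overall tpr.
    labels, n_targets = ndimage.label(target_mask)
    if n_targets == 0:
        return 1.0  # nothing to detect
    per_target = []
    for k in range(1, n_targets + 1):
        target_k = labels == k
        per_target.append(np.logical_and(test_mask, target_k).sum()
                          / target_k.sum())
    return float(np.mean(per_target))
```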

To conclude, several improvements to the proposed method can be made by using a priori knowledge potentially available for the different stages of the detection system. However, the global performance assessment scheme remains unchanged, since we still consider fuzzy areas for the fpr and extended detected areas to compute the tpr.

Fig. 17 Thresholded images at tpr=100% with respect to two mask comparison methods: pixel-level and our proposed soft mask comparison. The pixel-level method leads to overdetection, with a high resulting fpr. On the contrary, the soft method provides the expected images for tpr=100%. We do not need more pixels to localize and identify the cluster and Gauss defects.

7 Conclusion
The inspection of defects on large images is a very tedious task. Thus, in order to help the human expert, many automated processes and image processing algorithms have been developed to detect potentially defective areas. Assessing the actual quality and performance of these detection algorithms is then of the utmost importance and must be handled with respect to the inspection context. ROC analysis is a proven methodology to compare such algorithms, but it has some limitations when facing complex situations (various sizes/shapes/types of defects). To overcome these limitations, we propose a method to compute true-positive and false-positive rates in a way that is consistent with the semi-automated inspection application. This method uses simple object-based morphological dilations to extend the pixel-level definitions of ROC quantities to more object-related ones. Thus, fuzzy areas are automatically defined around each object to exclude near hits from the false-alarm count. In the meantime, true positives are linked to the visual inspection problem by defining a dilation scheme that mimics human expert inspection. This way of using the ROC methodology on practical cases provides a more reliable assessment of defect detection algorithms and allows a better calibration of semi-automated quality-control systems.

References

1. A. Kumar and G. K. Pang, "Defect detection in textured materials using optimized filters," IEEE Trans. Syst., Man, Cybern., Part B: Cybern. 32(5), 553–570 (2002).
2. D.-M. Tsai and T.-Y. Huang, "Automatic surface inspection for statistical textures," Image Vis. Comput. 21, 307–323 (2003).
3. C. Wolf and J.-M. Jolion, "Object count/area graphs for the evaluation of object detection and segmentation algorithms," Int. J. Doc. Anal. Recog. 8(4), 280–296 (2006).
4. G. Liu and R. Haralick, "Assignment problem in edge detection performance evaluation," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR 2000), pp. 1026–1031 (2000).
5. J. C. Nascimento and J. S. Marques, "Novel metrics for performance evaluation of object detection algorithms," in Proc. 1st ISR Workshop on Systems, Decision and Control: Robotic Monitoring and Surveillance (2005).
6. V. Y. Mariano, J. Min, J.-H. Park, R. Kasturi, D. Mihalcik, H. Li, D. Doermann, and T. Drayer, "Performance evaluation of object detection algorithms," in Proc. Int. Conf. Pattern Recognition 3, 30965 (2002).
7. J. M. Keller and P. Gader, "A fuzzy logic approach to detector scoring," in Proc. Fuzzy Information Processing Society (NAFIPS) 20, 339–344 (1998).
8. Y. Le Meur, J.-M. Vignolle, and J. Chanussot, "A practical use of ROC analysis to assess the performances of defects detection algorithms," Proc. SPIE 6356, 635616 (2007).
9. D. Green and J. Swets, Signal Detection Theory and Psychophysics, Wiley, New York (1966).
10. D. Dorfman and E. J. Alf, "Maximum likelihood estimation of parameters of signal detection theory: a direct solution," Psychometrika 33, 117–124 (1968).
11. J. Egan, Signal Detection Theory and ROC Analysis, Academic Press, New York (1975).
12. J. Hanley, "Receiver operating characteristic (ROC) methodology: the state of the art," Crit. Rev. Diagn. Imaging 29(3), 307–335 (1989).
13. J. Hanley and B. McNeil, "The meaning and use of the area under a receiver operating characteristic (ROC) curve," Radiology 143, 29–36 (1982).
14. C. E. Metz, "ROC methodology in radiologic imaging," Invest. Radiol. 21, 720–733 (1986).
15. T. Fawcett, "ROC graphs: Notes and practical considerations for researchers," Technical Report, HP Labs (2004), http://home.comcast.net/~tom.fawcett/public_html/papers/ROC101.pdf.
16. J. M. Irvine, "Assessing target search performance: The free-response operator characteristic model," Opt. Eng. 43, 2926–2934 (2004).
17. P. A. Flach, "Tutorial on the many faces of ROC analysis in machine learning," in Proc. Int. Conf. Machine Learning (2004), http://www.cs.bris.ac.uk/flach/ICML04tutorial/.
18. C. Cortes and M. Mohri, "AUC optimization vs. error rate minimization," in Advances in Neural Information Processing Systems (NIPS 2003) 16, MIT Press, Cambridge (2003).
19. J. Theiler, N. Harvey, and J. M. Irvine, "Approach to target detection based on relevant metric for scoring performance," in Proc. 33rd Applied Imagery Pattern Recognition Workshop (AIPR'04), pp. 184–189 (2004).
20. N. R. Harvey and J. Theiler, "Focus-of-attention strategies for finding discrete objects in multispectral imagery," Proc. SPIE 5546, 179–189 (2004).
21. A. Jain, Fundamentals of Digital Image Processing, Prentice Hall, Englewood Cliffs, NJ (1995).
22. W. Rucklidge, Efficient Visual Recognition Using the Hausdorff Distance, Springer-Verlag, New York (1996).
23. H. Blum, "A transformation for extracting new descriptors of shape," in Models for the Perception of Speech and Visual Form, W. Wathen-Dunn, Ed., pp. 362–380, MIT Press, Cambridge (1967).
24. R. C. Gonzalez and R. E. Woods, Digital Image Processing, 2nd ed., Prentice Hall, Englewood Cliffs, NJ (2002).

Yann Le Meur graduated in electrical engineering from the Grenoble Institute of Technology (INP Grenoble), France, in 2004 and received his MS degree in signal and image processing from INP Grenoble the same year. In 2004, he led a six-month MS thesis project at the Centre National d'Etudes Spatiales (French space agency), Toulouse, France, where he worked on multitemporal remote sensing images. He is now a PhD candidate at GIPSA-Lab (Grenoble Image Speech Signals and Automatics Laboratory) and Trixell, Moirans, France. His research interests include image processing, object detection, image statistical analysis, image quality assessment, and data fusion, especially kernel-based methods.


Jean-Michel Vignolle graduated in general engineering from Ecole Centrale Paris, France, in 1987, with a speciality in bioengineering, and in the same year received his MS degree in spectrochemical analysis methods from Paris VI University. In 1988–1989, he worked at Thomson Central Research Labs on materials for neural network hardware implementation, then on fiber optic sensors. In 1990, he joined Thales Avionics, where he was responsible for various LCD display designs for projection and direct view. His activities included microelectronics design, electrical design, mechanical design, and optical design. In 1998, he joined Trixell as technical project manager on various x-ray detector design projects. Since 2002, he has been in charge of the Image Group, a group of engineers dedicated to the development of image processing, image correction algorithms, image quality measurement tools, and methods.

Jocelyn Chanussot graduated in electrical engineering from the Grenoble Institute of Technology (INP Grenoble), France, in 1995. He received his PhD degree from Savoie University, Annecy, France, in 1998. He was with the Automatics and Industrial Micro-Computer Science Laboratory (LAMII). In 1999, he worked at the Geography Imagery Perception Laboratory (GIP) for the Délégation Générale de l'Armement (DGA, French National Defense Department). Since 1999, he has been with INP Grenoble as an assistant professor (1999–2005), associate professor (2005–2007), and professor (2007–) of signal and image processing. He conducts his research at GIPSA-Lab (Grenoble Image Speech Signals and Automatics Laboratory). His research interests include statistical modeling, classification, image processing, nonlinear filtering, remote sensing, and data fusion. Dr. Chanussot is currently serving as an associate editor for the IEEE Transactions on Geoscience and Remote Sensing and for Pattern Recognition. He is the co-chair of the GRS Data Fusion Technical Committee and a member of the Machine Learning for Signal Processing Technical Committee of the IEEE Signal Processing Society. He has authored or co-authored over 65 publications in international journals and conferences. He is a senior member of the IEEE.
