

AJR:172, May 1999 1311

Effect of an Artificial Neural Network on Radiologists' Performance in the Differential Diagnosis of Interstitial Lung Disease Using Chest Radiographs

Kazuto Ashizawa1,2
Heber MacMahon1
Takayuki Ishida
Katsumi Nakamura1
Carl J. Vyborny1
Shigehiko Katsuragawa1
Kunio Doi1

OBJECTIVE. We developed a new method that uses an artificial neural network to distinguish among various interstitial lung diseases. The network is based on features extracted from chest radiographs and on clinical parameters. The aim of our study was to evaluate the effect of the output from the artificial neural network on radiologists' diagnostic accuracy.

MATERIALS AND METHODS. The artificial neural network was designed to differentiate among 11 interstitial lung diseases using 10 clinical parameters and 16 radiologic findings. Thirty-three clinical cases (three cases for each lung disease) were selected. In the observer test, chest radiographs were viewed by eight radiologists (four attending physicians and four residents) with and without network output, which indicated the likelihood of each of the 11 possible diagnoses in each case. The radiologists' performance in distinguishing among the 11 interstitial lung diseases was evaluated by receiver operating characteristic (ROC) analysis with a continuous rating scale.

RESULTS. When chest radiographs were viewed in conjunction with network output, a statistically significant improvement in diagnostic accuracy was achieved (p < .0001). The average area under the ROC curve was .826 without network output and .911 with network output.

CONCLUSION. An artificial neural network can provide a useful "second opinion" to assist radiologists in the differential diagnosis of interstitial lung disease using chest radiographs.

Received September 9, 1998; accepted after revision November 16, 1998.

C. J. Vyborny, H. MacMahon, and K. Doi are shareholders of R2 Technology, Inc., Los Altos, CA.

Supported by United States Public Health Service grants CA24806 and CA62625. K. Ashizawa supported in part by a Japanese Board of Radiology grant and by the Konica Company, Tokyo, Japan.

1Kurt Rossmann Laboratories for Radiologic Image Research, Department of Radiology, The University of Chicago, 5841 S. Maryland Ave., Chicago, IL 60637. Address correspondence to K. Doi.

2Present address: Department of Radiology, Nagasaki University School of Medicine, Sakamoto 1-7-1, Nagasaki 852-8501, Japan.

AJR 1999;172:1311-1315

0361-803X/99/1725-1311

© American Roentgen Ray Society

Differential diagnosis of interstitial lung disease is a major subject in chest radiology. Although CT has greater diagnostic accuracy in the assessment of interstitial lung disease, chest radiography remains the imaging technique of choice for initial detection and diagnosis. However, differential diagnosis of interstitial lung disease using chest radiographs has always been difficult for radiologists because of the overlapping spectrum of radiographic appearances and the complexity of clinical parameters. Thus, one must often merge many radiologic features and clinical parameters to make a correct diagnosis.

Because of an ability to process large amounts of information simultaneously, artificial neural networks may be useful in the differential diagnosis of interstitial lung disease. In fact, artificial neural networks have been shown to be a powerful tool for pattern recognition and data classification in medical imaging [1-12]. In previous studies [1, 2], we applied an artificial neural network to the differential diagnosis of interstitial lung disease and showed the network to perform well. However, we have not compared the performance of the artificial neural network with radiologists' performance without and with network output. In this study, we used receiver operating characteristic (ROC) analysis to test the effect of network output on radiologists' performance in differentiating between certain interstitial lung diseases using chest radiographs.

Materials and Methods

Artificial Neural Network Scheme

The artificial neural network scheme and its performance for the differential diagnosis of interstitial lung disease have been described in detail [2]. A single three-layer, feed-forward artificial neural network with a back-propagation algorithm was used in this study. We designed the artificial neural network to distinguish among 11 types of interstitial lung disease on the basis of a given set of 26 clinical parameters and radiologic findings. The artificial neural network consisted of 26 input units for 10 clinical parameters and 16 radiologic findings, 11 output units corresponding to the 11 types of interstitial lung disease, and 18 hidden units. The 10 clinical parameters included the patient's age, sex, duration of symptoms, severity of symptoms, temperature, immune status, underlying malignancy, history of smoking, dust exposure, and drug treatment. The 16 radiologic findings (Table 1) were classified into three categories, namely, distribution of infiltrates (upper, middle, and lower zones of the right and left lungs; proximal or peripheral predominance), characteristics of infiltrates (homogeneity; fineness or coarseness; nodularity; degree of septal lines, honeycombing, and loss of lung volume), and three additional thoracic abnormalities (lymphadenopathy, pleural effusions, heart size). The interstitial lung diseases selected for differential diagnosis included sarcoidosis, miliary tuberculosis, lymphangitis carcinomatosa, interstitial lung edema, silicosis, Pneumocystis carinii pneumonia, scleroderma, eosinophilic granuloma, idiopathic pulmonary fibrosis, viral pneumonia, and pulmonary drug toxicity.

TABLE 1: One Radiologist's Ratingsa

Radiologic Finding             Subjective Rating
                               Case 1    Case 2
Infiltrate distribution
  Right upper field               4         5
  Right middle field              6         4
  Right lower field               8         3
  Left upper field                4         5
  Left middle field               6         4
  Left lower field                8         3
  Proximal/peripheral             8         5
Infiltrate characteristics
  Homogeneous                     7         8
  Fine/coarse                     4         8
  Nodular                         3         9
  Septal lines                    2         2
  Honeycombing                    8         0
  Loss of lung volume             2         0
Lymphadenopathy                   0         7
Pleural effusion                  0         0
Heart size                        1         1

Note: The ratings ranged from 0 to 10, with the exception of heart size, which ranged from 1 to 5. For proximal/peripheral, 10 = peripheral; for fine/coarse, 10 = coarse.
a For 16 radiologic findings in a 64-year-old man with idiopathic pulmonary fibrosis (case 1) and a 34-year-old woman with sarcoidosis (case 2).
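The network topology described above (26 input units, 18 hidden units, 11 output units, sigmoid activations trained by back-propagation) can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the learning rate and weight initialization shown here are assumptions, since the paper does not report them.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class ThreeLayerNet:
    """26 inputs (10 clinical + 16 radiologic), 18 hidden units, 11 outputs."""

    def __init__(self, n_in=26, n_hidden=18, n_out=11, lr=0.5):
        # small random weights; initialization scheme is an assumption
        self.W1 = rng.normal(0.0, 0.3, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.3, (n_hidden, n_out))
        self.b2 = np.zeros(n_out)
        self.lr = lr

    def forward(self, x):
        self.h = sigmoid(x @ self.W1 + self.b1)       # hidden activations
        self.y = sigmoid(self.h @ self.W2 + self.b2)  # likelihood per disease, 0-1
        return self.y

    def backward(self, x, target):
        # back-propagation of squared error through the sigmoid units
        dy = (self.y - target) * self.y * (1.0 - self.y)
        dh = (dy @ self.W2.T) * self.h * (1.0 - self.h)
        self.W2 -= self.lr * np.outer(self.h, dy)
        self.b2 -= self.lr * dy
        self.W1 -= self.lr * np.outer(x, dh)
        self.b1 -= self.lr * dh
```

The 26-element input vector holds one case's normalized clinical and radiologic values; the 11 outputs are read as the likelihood of each disease.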

Our databases for the artificial neural network included 150 actual clinical cases, 110 published cases, and 110 hypothetical cases. Diagnoses of actual clinical cases were based on a detailed clinical correlation (n = 55 [37%]) or on pathologic (n = 61 [40%]) or bacteriologic (n = 34 [23%]) proof of the pulmonary lesion. For clinical cases and published cases, subjective ratings for the 16 radiologic findings were provided independently by three experienced radiologists. Table 1 shows examples of one radiologist's ratings for two clinical cases (Fig. 1). Input data obtained from clinical parameters and subjective ratings for radiologic findings were normalized to a range from 0 to 1. We used a modified round-robin (leave-one-out) method [8] to evaluate the performance of the artificial neural network in distinguishing the actual clinical cases. With this method, although a round-robin method was applied to all databases for training, only clinical cases were used for testing. Output values ranging from 0 to 1 indicated the likelihood of each of the 11 possible diseases in each case.
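The modified round-robin protocol can be expressed compactly: every case in the combined databases joins each training pool, but only actual clinical cases are ever held out and scored. A sketch, with `train` and `predict` standing in for the network's fit and forward routines (both names are placeholders, not the paper's):

```python
def modified_round_robin(clinical, other, train, predict):
    """Leave-one-out over clinical cases only.

    clinical, other: lists of (inputs, label) pairs.
    Returns one prediction per clinical case, each produced by a model
    trained on every remaining case from both databases.
    """
    predictions = []
    for i, (x_test, _) in enumerate(clinical):
        # hold out the i-th clinical case; train on everything else
        train_set = clinical[:i] + clinical[i + 1:] + other
        model = train(train_set)
        predictions.append(predict(model, x_test))
    return predictions
```

Published and hypothetical cases thus strengthen every training pool without ever serving as test cases.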

The performance of the artificial neural network was evaluated using ROC analysis [13]. Binormal ROC curves for the diagnosis of each disease were estimated by use of the LABROC4 algorithm developed in our laboratories [14]. Az values representing the area under each of the 11 ROC curves were calculated. The average performance was estimated by averaging the two binormal parameters of the 11 individual ROC curves [15]. The average Az value as a measure of overall performance was .947. The performance of the artificial neural network was also assessed by comparing output values indicating the likelihood of each of the 11 diseases in each case. By the two largest outputs, both the sensitivity and the specificity of the artificial neural network for indicating the correct diagnosis were 89%.
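In the binormal model, the ROC curve is TPF = Φ(a + b·Φ⁻¹(FPF)) and the area under it is Az = Φ(a/√(1+b²)), so averaging curves reduces to averaging the two parameters a and b. A minimal sketch of these standard formulas (the bisection inverse is only an implementation convenience, not part of LABROC4):

```python
import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def phi_inv(p, lo=-8.0, hi=8.0):
    """Inverse normal CDF by bisection (adequate for plotting)."""
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def binormal_tpf(a, b, fpf):
    """True-positive fraction of the binormal ROC curve at a given FPF."""
    return phi(a + b * phi_inv(fpf))

def binormal_az(a, b):
    """Area under the binormal ROC curve."""
    return phi(a / math.sqrt(1.0 + b * b))

def average_curve(params):
    """Average ROC curve: mean of the binormal parameters (a, b)."""
    a = sum(p[0] for p in params) / len(params)
    b = sum(p[1] for p in params) / len(params)
    return a, b
```

Averaging (a, b) rather than pointwise TPF values keeps the average curve itself binormal.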

Case Selection

In the observer test, 33 actual clinical cases (three cases per disease) were selected from a database of 150 cases [2] by two experienced radiologists who did not participate in the observer test. The 33 clinical cases included 22 men and 11 women who were 21-84 years old (mean, 51 years). In each of the cases, the disease had only one cause.

Although each case initially had three output values based on the three input values provided by three radiologists, we averaged the output values for each case and presented these averages to the observers for the observer test. ROC analysis of the average output values for the artificial neural network found an Az value of .977 for the 33 cases. By the two largest outputs, the sensitivity and specificity of the artificial neural network for indicating the correct diagnosis were 91% and 89%, respectively. This level of performance of the artificial neural network, obtained with a subset of our database, was similar to that obtained with all 150 clinical cases.
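The "two largest outputs" criterion counts a case as correctly indicated when the true diagnosis falls within the network's top two output values. A sketch with invented output data (not the study's):

```python
import numpy as np

def top2_hit_rate(outputs, truth):
    """Fraction of cases whose true disease is among the two largest outputs.

    outputs: (n_cases, n_diseases) array of network likelihoods.
    truth: true disease index for each case.
    """
    top2 = np.argsort(outputs, axis=1)[:, -2:]  # indices of the two largest values
    hits = [t in row for row, t in zip(top2, truth)]
    return float(np.mean(hits))
```

With 11 candidate diseases, a top-two criterion is a natural reading of a differential-diagnosis aid that offers a short list rather than a single answer.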

Observer Test

An ROC observer test can be either of two types: independent or sequential [16]. Ours was sequential. Each chest radiograph and its clinical parameters were shown to an observer, who rated the likelihood of each of the 11 diseases (without network output). Subsequently, the network output (Fig. 2) was presented to the same observer, who rated the likelihood a second time (with network output). The observer could either change the initial ratings or leave them unchanged.

Eight radiologists (four experienced radiologists [attending physicians] and four radiology residents) who knew nothing about the cases participated as observers. Before the test, the observers were told that only one of the 11 possible diseases was the correct diagnosis for each case; that the reading condition was based on the sequential test; that the role of the network output was to provide a "second opinion"; and that, by the two largest outputs, the sensitivity and specificity of the artificial

Fig. 1. Examples of chest radiographs used in this study.
A, 64-year-old man with idiopathic pulmonary fibrosis (case 1 in Table 1).
B, 34-year-old woman with sarcoidosis (case 2 in Table 1).



TABLE 2: Az Values for ROC Curves of Diagnostic Accuracy With and Without ANN Output

Fig. 4. Average receiver operating characteristic curves for attending physicians without and with artificial neural network output. Observer performance with network output was significantly improved. ANN = artificial neural network.


neural network for indicating the correct diagnosis were 91% and 89%, respectively. Before the test, four training cases were shown to the observers to familiarize them with the rating method and with use of network output as a second opinion.

The observers' confidence about the likelihood of each of the 11 possible diseases was represented using an analog continuous-rating scale with a line-checking method [16]. For the initial ratings, the observers used a graphite pencil to mark their confidence levels along a 7-cm-long line. Ratings of "definitely absent" and "definitely present" were marked above the left and the right ends,

Fig. 2. Graph shows one example of artificial neural network output presented to eight observers. Output values were obtained for 64-year-old man in Figure 1A. Largest output value among 11 diseases corresponds to correct diagnosis. tbc = tuberculosis, ca. = carcinomatosa, PCP = Pneumocystis carinii pneumonia, EG = eosinophilic granuloma, IPF = idiopathic pulmonary fibrosis, ANN = artificial neural network.

Fig. 3. Comparison of average receiver operating characteristic curves for all observers without and with artificial neural network output and receiver operating characteristic curve for artificial neural network output alone. Observer performance with network output was significantly higher than that without network output but was still lower than performance of network output alone. ANN = artificial neural network.

respectively, of the line. For second ratings that were different from the initial ratings, observers used a red pencil to mark their confidence levels along the same line. For data analysis, the confidence level was scored by measuring the distance from the left end of the line to the marked point and converting the measurement to a scale from 0 to 100.
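The scoring step is a direct rescaling: the marked distance along the 7-cm line is mapped linearly onto 0-100. A trivial sketch of that conversion:

```python
LINE_LENGTH_CM = 7.0

def confidence_score(distance_cm):
    """Convert a mark's distance from the left end of the 7-cm line
    to a 0-100 confidence score."""
    if not 0.0 <= distance_cm <= LINE_LENGTH_CM:
        raise ValueError("mark must lie on the line")
    return 100.0 * distance_cm / LINE_LENGTH_CM
```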

Data Analysis

The radiologists' diagnostic performance with and without network output was evaluated using ROC analysis [13]. We defined confidence ratings data with the correct diagnosis as "actual positives" and those with any other diseases as "actual negatives." For each observer and each reading condition (with and without network output), we used a maximum likelihood estimation to fit a binormal ROC curve to the confidence ratings data for all 11 possible diseases in the 33 cases [14]. This combining of data for all diseases was done because of the small number of cases of each disease. The Az value was then calculated for each fitted curve. The statistical significance of differences between ROC curves for each reading condition was determined by applying a two-tailed t test for paired data to the reader-specific Az values. We also analyzed the statistical significance of differences between ROC curves for attending physicians and those for radiology residents using a two-tailed t test. To represent the overall performance for each group of observers, average ROC curves were generated for the four residents, the four attending physicians, and all radiologists by averaging the two binormal parameters of their individual ROC curves [15].

Another indication of performance was the number of correctly diagnosed cases for which the observer's ranking changed because of network output. Four rankings were used (1, 2, 3, and less than 3), with 1 corresponding to a case that the observer diagnosed correctly with the highest confidence rating, 2 corresponding to a case diagnosed correctly with the second highest confidence rating, and so on. An improvement in a ranking, such as a change from 2 to 1, indicated that network output benefited diagnostic performance; the opposite indicated a detrimental effect. Using a two-tailed t test for paired data, we analyzed the statistical significance of the difference between the number of cases benefited and the number not benefited. The same test was used to analyze differences between the number of attending physicians' cases affected and the number of residents' cases affected.
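The core comparison, a paired two-tailed t test on reader-specific Az values, can be sketched as follows. The eight Az pairs are invented for illustration (the paper reports only group averages); the resulting t statistic is referred to the t distribution with n - 1 degrees of freedom.

```python
import math
from statistics import mean, stdev

def paired_t(a, b):
    """Paired t statistic and degrees of freedom for two matched samples."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

# invented reader-specific Az values for eight observers
az_without = [0.80, 0.84, 0.83, 0.85, 0.79, 0.81, 0.83, 0.82]
az_with    = [0.90, 0.91, 0.89, 0.93, 0.92, 0.90, 0.92, 0.92]

t_stat, df = paired_t(az_with, az_without)
```

For a two-tailed test, |t| is then compared against the critical value of the t distribution with df degrees of freedom.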

Results

The overall performance by the three groups of observers is illustrated by the average ROC curves in Figures 3-5. Diagnostic performance improved when chest radiographs and clinical parameters were shown in conjunction with network output. The average Az value for all radiologists increased to a statistically significant degree, from .826 without network output to .911 with network output (p < .0001). However, for all radiologists, the average Az value with network output was still lower than for the network alone (Az = .977). For the four attending physicians, the average Az values without and with network output were .839 and .905, respectively, whereas for the four residents, the average Az values without and with network output were .812 and .916, respectively. These differences were also statistically significant (p = .0026 and p = .0074, respectively). Table 2 shows the Az values without and with network output for each radiologist. The performance of all improved when network output was used.

TABLE 3: Number of Cases (of 33 Total) for Which Ranking Was Affected by Artificial Neural Network Output

When we compared the average performance of the four attending physicians with that of the four residents, the difference in Az values failed to reach statistical significance for each of the two reading conditions (without [p = .360] and with [p = .565] network output). Although the gain in performance with use of network output was larger for residents than for attending physicians, the difference was not statistically significant (p = .077).

Table 3 shows the number of cases affected by network output for the individual radiologists. The net effect of network output was clearly an improvement in performance. The average number of cases affected beneficially and detrimentally by network output for all radiologists was 9.8 and 1.4, respectively, and this difference was statistically significant (p = .0002). When the network ranking was 1, all cases affected by network output showed a beneficial effect for all radiologists (Fig. 6). The difference between the average number of cases affected beneficially and detrimentally by network output was also statistically significant (p = .0034 and p = .0002, respectively) for the attending-physician group (6.6 and 1.0, respectively) and the resident group (13.0 and 1.8, respectively).

In a comparison of the average number of cases benefited by network output for the four attending physicians and for the four residents, the difference was statistically significant (p = .0001), whereas the difference in the detrimental effect was not statistically significant (p = .414) (Fig. 6). Performance improved significantly more for the residents than for the attending physicians.


Fig. 5. Average receiver operating characteristic curves for radiology residents without and with artificial neural network output. Observer performance with network output was significantly improved. ANN = artificial neural network.

Fig. 6. Graph shows number of correctly diagnosed cases for which observer's ranking (1 or 2) changed because of network output. Network output clearly improved performance of all observers. ANN = artificial neural network, white bars = ranking of 1, gray bars = ranking of 2.

Discussion

The results of our observer test indicate that network output can significantly improve radiologists' performance in the differential diagnosis of interstitial lung disease using chest radiographs. Three radiologists (one attending physician and two residents) participated in a pilot observer test before this study. Another set of 33 clinical cases, which do not overlap the cases used in this study, was selected from our database and used for the pilot test. The average performance of the three radiologists was significantly greater with network output (Az = .930) than without it (Az = .867) (p < .05). The pilot results support those of the main study.

The performance of the artificial neural network (Az = .977) was substantially better than the radiologists' performance (Az = .826), and the difference was statistically significant (p < .0001). This finding can be interpreted as follows. The differential diagnosis of interstitial lung disease using chest radiographs is generally considered to require extraction of image features and subsequent merging of extracted features and clinical parameters. However, the features used by one radiologist may differ from those used by another and may vary from case to case. The approach tends to be less than systematic and influenced by anecdotal experience. Therefore, the 16 image features used by the network would likely not be identical to those used by the radiologists but, probably, would be more comprehensive. Thus, the network's information would consistently be more complete than the radiologists'. In addition, the network would likely be better able than the radiologists to combine features. To verify this assumption rigorously, however, one would need to compare the ability of the network and the radiologists to merge the same features. In practice, performance differences between the network and the radiologists may have resulted from differences in both the extraction and the merging.



Studies on the detection of abnormalities such as lung nodules on chest radiographs and microcalcifications on mammograms have shown that the computer helped radiologists detect findings even if its performance was lower than that of the radiologists [16, 17]. This result was probably possible because the computer output alerted radiologists to the locations of some lesions that might be missed by radiologists. When the computer scheme indicates whether a lesion exists by using an arrow in a detection task, it is relatively easy for radiologists to decide either to agree with or to disregard the computer output. In other words, radiologists would be able to correct their mistakes if they recognize that, for example, they might have overlooked an obvious lesion. Our study, on the other hand, found that the performance of radiologists with network output was lower than the performance of the artificial neural network alone. Because the performance of the network was much higher than that of the radiologists alone, one might expect that the performance of radiologists with network output would be at least equal to that of the network. However, the radiologists could not take full advantage of network output because they were not familiar with using it. Network output indicates the likelihood of each of the 11 possible diseases, information that radiologists may find difficult to assimilate. In addition, although the radiologists knew that the network performed well, they might not have had confidence in all of its output, some of which may have disagreed with their own knowledge and experience. To gain confidence in network output and familiarity with using it, radiologists need to apply it prospectively to differential diagnosis in actual clinical situations.

The performance of one of the attending physicians without and with network output was lower than that of the other attending physicians, possibly because for almost all the cases he did not use the clinical parameters when interpreting chest radiographs. Therefore, even for experienced observers, consideration of these parameters would seem important. His low performance probably led to the comparability between the average Az values of the attending-physician group and those of the resident group. However, the improvement in his performance with network output was similar to that of the other attending physicians, so the comparison of gains for the two groups might be meaningful. In fact, the performance gain for residents, as measured by both Az value and number of cases benefited, was larger than that for attending physicians. This finding indicates that network output can improve the performance of radiologists, especially those less experienced. We believe further study is needed to determine the true difference in performance between the two groups of observers under the two reading conditions.

Independent ROC observer tests, which have been used widely, have two sessions. In the first, each observer interprets half the cases with network output and the other half without. In the second, the first half is interpreted without and the second half with network output. The general assumption is that observers do not remember the cases interpreted in the first session. This assumption is correct for most simple tests, such as the detection of lung nodules, but questionable for our study because the task required was differential diagnosis, which is more complicated than detection. Thus, radiologists would likely remember details of some cases, especially first-session cases with network output.

Unlike independent tests, a sequential test, which we used, measures in one session the effect of network output on diagnostic decisions. Concern about variations in the observer's memory is thus eliminated. The sequential test is not, however, an established method for ROC observer studies and is inherently biased because the observer is always first tested without network output. However, one study [16] indicated that these two types of observer tests reach similar conclusions. Therefore, we believe that the sequential test can be used to evaluate the effect of computer output, especially for differential diagnosis.
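The quantity compared under the two reading conditions, the area under the ROC curve (Az), can also be understood operationally. The study itself fitted binormal ROC curves to continuous ratings by maximum likelihood [14]; the sketch below is not that method but a simpler empirical stand-in, the Mann-Whitney estimate of the area, shown with entirely hypothetical ratings for illustration.

```python
# Illustrative sketch only (not the authors' software): the empirical area
# under the ROC curve equals the probability that a randomly chosen
# positive case receives a higher confidence rating than a randomly
# chosen negative case, with ties counted as one half.

def empirical_auc(pos_ratings, neg_ratings):
    """Empirical ROC area from continuous ratings of positive and
    negative cases (Mann-Whitney / trapezoidal estimate)."""
    wins = 0.0
    for p in pos_ratings:
        for n in neg_ratings:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_ratings) * len(neg_ratings))

# Hypothetical ratings (scale 0-100) for one observer and one diagnosis,
# rated first without and then with network output, as in a sequential test:
auc_without = empirical_auc([60, 75, 80], [30, 55, 40, 65, 20])
auc_with = empirical_auc([70, 85, 90], [30, 50, 40, 60, 20])
print(auc_without, auc_with)
```

In a sequential observer test, the paired per-observer areas from the two conditions are what a significance test of the improvement operates on.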

In conclusion, our results indicate that artificial neural network output can significantly improve the accuracy of radiologists in the differential diagnosis of interstitial lung disease using chest radiographs. We believe that network output, when used as a second opinion, can help radiologists with decision making.

Acknowledgments

We thank Hajime Nakata (University of Occupational and Environmental Health, School of Medicine, Fukuoka, Japan) and Kuniaki Hayashi (Nagasaki University School of Medicine, Nagasaki, Japan) for supplying valuable clinical cases. We also thank John J. Fennessy, Laurence Monnier, Walter Cannon, Thomas Woo, Scott Stacy, Bruce Lin, Dixson Gilbert, and Shawn Kenney for participating as observers; Charles E. Metz for useful suggestions and discussions about receiver operating characteristic analysis; Hiroyuki Yoshida and Yulei Jiang for helpful discussions; and E. Lanzl for improving the manuscript.

References

1. Asada N, Doi K, MacMahon H, et al. Potential usefulness of an artificial neural network for differential diagnosis of interstitial lung disease: pilot study. Radiology 1990;177:857-860
2. Ashizawa K, Ishida T, MacMahon H, Vyborny CJ, Katsuragawa S, Doi K. Artificial neural networks in chest radiography: application to the differential diagnosis of interstitial lung disease. Acad Radiol 1999;6:2-9
3. Ishida T, Katsuragawa S, Ashizawa K, MacMahon H, Doi K. Artificial neural networks in chest radiographs: detection and characterization of interstitial lung disease. Proc SPIE 1997;3034:931-937
4. Gross GW, Boone JM, Greco-Hunt V, Greenberg B. Neural networks in radiologic diagnosis. II. Interpretation of neonatal chest radiographs. Invest Radiol 1990;25:1017-1023
5. Lo SC, Freedman MT, Lin JS, Mun SK. Automatic lung nodule detection using profile matching and back-propagation neural network techniques. J Digit Imaging 1993;6:48-54
6. Gurney JW, Swensen SJ. Solitary pulmonary nodules: determining the likelihood of malignancy with neural network analysis. Radiology 1995;196:823-829
7. Wu Y, Doi K, Giger ML, Nishikawa RM. Computerized detection of clustered microcalcifications in digital mammograms: application of artificial neural networks. Med Phys 1992;19:555-560
8. Wu Y, Giger ML, Doi K, Vyborny CJ, Schmidt RA, Metz CE. Artificial neural networks in mammography: application to decision making in the diagnosis of breast cancer. Radiology 1993;187:81-87
9. Heitmann KR, Kauczor H, Mildenberger P, et al. Automatic detection of ground glass opacities on lung HRCT using multiple neural networks. Eur Radiol 1997;7:1463-1472
10. Henschke CI, Yankelevitz DF, Mateescu I, et al. Neural networks for the analysis of small pulmonary nodules. Clin Imaging 1997;21:390-399
11. Bocchi L, Coppini G, De Dominicis R, et al. Tissue characterization from X-ray images. Med Eng Phys 1997;19:336-342
12. Lin JS, Hasegawa A, Freedman MT, et al. Differentiation between nodules and end-on vessels using a convolution neural network architecture. J Digit Imaging 1995;8:132-141
13. Metz CE. ROC methodology in radiologic imaging. Invest Radiol 1986;21:720-733
14. Metz CE, Herman BA, Shen JH. Maximum likelihood estimation of receiver operating characteristic (ROC) curves from continuously-distributed data. Stat Med 1998;17:1033-1053
15. Metz CE. Some practical issues of experimental design and data analysis in radiological ROC studies. Invest Radiol 1989;24:234-245
16. Kobayashi T, Xu XW, MacMahon H, Metz CE, Doi K. Effect of a computer-aided diagnosis scheme on radiologists' performance in detection of lung nodules on radiographs. Radiology 1996;199:843-848
17. Chan H-P, Doi K, Vyborny CJ, et al. Improvement in radiologists' detection of clustered microcalcifications on mammograms: the potential of computer-aided diagnosis. Invest Radiol 1990;25:1102-1110
