a novel computer-aided diagnosis scheme on small annotated...

15
Research Article A Novel Computer-Aided Diagnosis Scheme on Small Annotated Set: G2C-CAD Guangyuan Zheng , 1,2 Guanghui Han , 3 Nouman Q. Soomro , 4 Linjuan Ma, 1 Fuquan Zhang , 5,6 Yanfeng Zhao, 7 Xinming Zhao, 7 and Chunwu Zhou 7 1 Beijing Key Laboratory of Intelligent Information Technology, School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China 2 School of Information and Technology, Shangqiu Normal University, No. 55 Pingyuan Road, Shangqiu, Henan Province, China 3 School of Biomedical Engineering, Sun Yat-sen University, Guangzhou 510006, China 4 Department of Soſtware Engineering, Mehran University of Engineering and Technology, SZAB Campus, Khairpur Mirs 76062, Pakistan 5 Fujian Provincial Key Laboratory of Information Processing and Intelligent Control, Minjiang University, Fuzhou 350121, China 6 Digital Performance and Simulation Technology Lab., School of Computer Science and Technology, Beijing Institute of Technology, 100081 Beijing, China 7 Department of Imaging Diagnosis, Cancer Institute and Hospital, Chinese Academy of Medical Sciences, Beijing 100021, China Correspondence should be addressed to Nouman Q. Soomro; [email protected] Received 29 November 2018; Accepted 5 March 2019; Published 15 April 2019 Academic Editor: Giandomenico Roviello Copyright © 2019 Guangyuan Zheng et al. is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Purpose. Computer-aided diagnosis (CAD) can aid in improving diagnostic level; however, the main problem currently faced by CAD is that it cannot obtain sufficient labeled samples. To solve this problem, in this study, we adopt a generative adversarial network (GAN) approach and design a semisupervised learning algorithm, named G2C-CAD. Methods. From the National Cancer Institute (NCI) Lung Image Database Consortium (LIDC) dataset, we extracted four types of pulmonary nodule sign images closely related to lung cancer: noncentral calcification, lobulation, spiculation, and nonsolid/ground-glass opacity (GGO) texture, obtaining a total of 3,196 samples. In addition, we randomly selected 2,000 non-lesion image blocks as negative samples. We split the data 90% for training and 10% for testing. We designed a DCGAN generative adversarial framework and trained it on the small sample set. We also trained our designed CNN-based fuzzy Co-forest on the labeled small sample set and obtained a preliminary classifier. en, coupled with the simulated unlabeled samples generated by the trained DCGAN, we conducted iterative semisupervised learning, which continually improved the classification performance of the fuzzy Co-forest until the termination condition was reached. Finally, we tested the fuzzy Co-forest and compared its performance with that of a C4.5 random decision forest and the G2C-CAD system without the fuzzy scheme, using ROC and confusion matrix for evaluation. Results. Four different types of lung cancer- related signs were used in the classification experiment: noncentral calcification, lobulation, spiculation, and nonsolid/ground-glass opacity (GGO) texture, along with negative image samples. For these five classes, the G2C-CAD system obtained AUCs of 0.946, 0.912, 0.908, 0.887, and 0.939, respectively. e average accuracy of G2C-CAD exceeded that of the C4.5 random decision tree by 14%. G2C-CAD also obtained promising test results on the LISS signs dataset; its AUCs for GGO, lobulation, spiculation, pleural indentation, and negative image samples were 0.972, 0.964, 0.941, 0.967, and 0.953, respectively. Conclusion. e experimental results show that G2C-CAD is an appropriate method for addressing the problem of insufficient labeled samples in the medical image analysis field. Moreover, our system can be used to establish a training sample library for CAD classification diagnosis, which is important for future medical image analysis. Hindawi BioMed Research International Volume 2019, Article ID 6425963, 14 pages https://doi.org/10.1155/2019/6425963

Upload: others

Post on 25-Jun-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: A Novel Computer-Aided Diagnosis Scheme on Small Annotated ...downloads.hindawi.com/journals/bmri/2019/6425963.pdf · BioMedResearchInternational 100 Reshape Deconv Deconv Deconv

Research ArticleA Novel Computer-Aided Diagnosis Scheme on Small AnnotatedSet G2C-CAD

Guangyuan Zheng 12 Guanghui Han 3 Nouman Q Soomro 4 LinjuanMa1

Fuquan Zhang 56 Yanfeng Zhao7 Xinming Zhao7 and Chunwu Zhou7

1Beijing Key Laboratory of Intelligent Information Technology School of Computer Science and TechnologyBeijing Institute of Technology Beijing 100081 China

2School of Information and Technology Shangqiu Normal University No 55 Pingyuan Road ShangqiuHenan Province China

3School of Biomedical Engineering Sun Yat-sen University Guangzhou 510006 China4Department of Software Engineering Mehran University of Engineering and Technology SZAB CampusKhairpur Mirs 76062 Pakistan

5Fujian Provincial Key Laboratory of Information Processing and Intelligent Control Minjiang UniversityFuzhou 350121 China

6Digital Performance and Simulation Technology Lab School of Computer Science and TechnologyBeijing Institute of Technology 100081 Beijing China

7Department of Imaging Diagnosis Cancer Institute and Hospital Chinese Academy of Medical SciencesBeijing 100021 China

Correspondence should be addressed to Nouman Q Soomro noumansoomrogmailcom

Received 29 November 2018 Accepted 5 March 2019 Published 15 April 2019

Academic Editor Giandomenico Roviello

Copyright copy 2019 Guangyuan Zheng et al This is an open access article distributed under the Creative Commons AttributionLicense which permits unrestricted use distribution and reproduction in any medium provided the original work is properlycited

Purpose Computer-aided diagnosis (CAD) can aid in improving diagnostic level however the main problem currently faced byCAD is that it cannot obtain sufficient labeled samples To solve this problem in this studywe adopt a generative adversarial network(GAN) approach and design a semisupervised learning algorithm namedG2C-CADMethods From the National Cancer Institute(NCI) Lung Image Database Consortium (LIDC) dataset we extracted four types of pulmonary nodule sign images closely relatedto lung cancer noncentral calcification lobulation spiculation and nonsolidground-glass opacity (GGO) texture obtaining a totalof 3196 samples In addition we randomly selected 2000 non-lesion image blocks as negative samples We split the data 90 fortraining and 10 for testing We designed a DCGAN generative adversarial framework and trained it on the small sample set Wealso trained our designed CNN-based fuzzy Co-forest on the labeled small sample set and obtained a preliminary classifier Thencoupled with the simulated unlabeled samples generated by the trained DCGAN we conducted iterative semisupervised learningwhich continually improved the classification performance of the fuzzy Co-forest until the termination condition was reachedFinally we tested the fuzzy Co-forest and compared its performance with that of a C45 random decision forest and the G2C-CADsystem without the fuzzy scheme using ROC and confusion matrix for evaluation Results Four different types of lung cancer-related signs were used in the classification experiment noncentral calcification lobulation spiculation and nonsolidground-glassopacity (GGO) texture along with negative image samples For these five classes the G2C-CAD system obtained AUCs of 09460912 0908 0887 and 0939 respectively The average accuracy of G2C-CAD exceeded that of the C45 random decision tree by14 G2C-CAD also obtained promising test results on the LISS signs dataset its AUCs for GGO lobulation spiculation pleuralindentation and negative image samples were 0972 0964 0941 0967 and 0953 respectively Conclusion The experimentalresults show that G2C-CAD is an appropriate method for addressing the problem of insufficient labeled samples in the medicalimage analysis field Moreover our system can be used to establish a training sample library for CAD classification diagnosiswhich is important for future medical image analysis

HindawiBioMed Research InternationalVolume 2019 Article ID 6425963 14 pageshttpsdoiorg10115520196425963

2 BioMed Research International

1 Introduction

Pulmonary carcinomas are the most lethal disease in theworld Approximately 15 million people die due to pul-monary cancer every yearmdashfar higher than the mortalityrate of other diseases [1] In the United States alone thelung cancer deaths in 2018 will reach 154050 [2] Howevermost pulmonary tumors cause no symptoms Therefore thedisease is usually diagnosed at advanced stages resulting in alow overall 5-year survival rate of approximately 14 [3] Incontrast the 5-year survival rate of patientswith stage IAnon-small cell lung cancer that has been pathologically confirmedand resected or precision treated [4ndash6] can reach 83 [7ndash9] Thus early lung cancer detection can sharply decreasethe lung cancer mortality rate [10 11] Pulmonary primarycancers manifest as nodules in the early stage Comparedto chest X rays computed tomography (CT) has shownhigher sensitivity in detecting small lung nodules [3 12]Currently CT screening is the most recommended methodfor finding nodules [13 14] The associated increase in spiralCT screening has led to a growing burden on radiologists[15] Despite the higher resolution available today it is stilldifficult for radiologists to distinguish malignant nodulesfrom benign ones in low-dose CT (LDCT) images The ratesof resected benign pulmonary nodules can reach 50 duringsurgery [16ndash19] These unnecessary surgeries cause physicaland mental pain and impose additional financial burdens onpatients

A ldquosignrdquo in a CT lung scans refers to a radiologic findingthat suggests a specific disease process Understanding themeaning of a sign implies an understanding of the findingson the CT scan [20] Signs are also called ldquoCT featuresrdquo ldquoCTmanifestationrdquo ldquoCT patternsrdquo or sometimes ldquoCT findingsrdquo[21] Lobulation signs [22 23] spiculation signs [15 24ndash28] and some texture signs [29 30] play crucial roles inradiologistsrsquo ability to differentiate benign from malignantnodules [31 32] Noncentral calcification such as punctatesign or eccentric sign calcification usually indicates that anodule is malignant [33ndash35]Therefore it is highly importantto study methods for identifying the signs of pulmonarynodules automatically to assist radiologists in diagnosingpulmonary malignant nodules

One of the main methods is using Computer-AidedDetectionDiagnosis (CAD)The earliest conception of CADappeared in the 1960s [36 37] Its early idea was attemptingto ldquofully automate the chest examrdquo Over the decades thisexpectation has subsided (which seems to have happened tothe early enthusiasm regarding the capabilities of artificialintelligence systems in general) Currently the general agree-ment is that the focus should be on making useful computer-generated information available to physicians for decisionsupport rather than trying to make a computer act like adiagnostician [38] Various works on the CAD based on CTsigns have been published Han G et al [22] designed asliding-window-based framework to detect lobulation signSuzuki K et al [39] developed a computer-aided diagnostic(CAD) scheme to distinguish benign frommalignant nodulesin LDCT scans using a massive training artificial neural

network (MTANN) and concluded that spiculation signis a highly differentiated feature for distinguishing benignfrom malignant nodules CAD systems based on texturesign features have been investigated in several studies [40ndash42] Since AlexNet won the ImageNet challenge by a largemargin in 2012 deep learning techniques have flourishedrapidly in the image detection field Compared to traditionalclassification algorithms deep learning can extract mostdistinctive features automatically and can implement end-to-end operations However the size of the dataset required totrain a high capacity deep learning framework is quite largewhile generating labeled training data in the medical imageanalysis field is very expensive [43]

To address the dilemma of having only a small annotationset available for lung nodule sign recognition in this paperwe propose a semisupervised generative adversarial network(GAN) and a convolutional neural network (CNN)-basedCo-forest CAD scheme We apply the designed scheme toclassify four types of nodule signs that are highly relatedto lung cancer We call this proposed scheme G2C-CAD inabbreviation

Overall workflow of G2C-CAD is illustrated in Figure 1In the stage A we train a GAN and a Co-forest on theavailable small sample setThen in stage B we use the trainedGAN to generate a synthetic nodule patch and transfer it tothe trained CNN discriminator From the discriminator wegain the CNN-extracted features of the synthesized nodulepatch In stage C the CNN features are provided to the fuzzyCo-forest pretrained on the original sample set to conductsemisupervised learning for the five types of ROI patchesFinally the process iterates between stages B and C until atermination condition is met

The rest of this paper is organized as follows In Section 2we review the existing lung nodule classification algorithmsSection 3 presents our G2C-CAD algorithm We introducethe experimental method in Section 4 and present an analysisin Section 5 Section 6 concludes this paper

2 Related Work

Discriminating between benign or malignant nodules hasattracted the interest of a large number of researchersThe early discrimination methods were based primarily ontraditional machine learning algorithms such as k-nearestneighbors (k-NN) linear discriminant analysis (LDA) Bayes[44] rule based schemes decision trees (DT) and the sup-port vector machine (SVM) Krewer et al [45] tested severalclassifiers including DT k-NN and SVM on extractedtexture and shape features to discriminate malignant frombenign nodules By analyzing the experimental results theyfound that partly solid and nonsolid nodules have a highermalignancy rate than do solid nodules Colin Jacobs et al[46] developed and evaluated a computer aided diagnos-tic system for classifying lung nodules into solid partlysolid and nonsolid nodules This CADx system performsstatistical classification on the nodulesrsquo intensity- texture-and segmentation-based features using the k-NN algorithmXiabi Liu et al [47] proposed a feature selection method

BioMed Research International 3

GAN

Fuzzy Co-forest

Nodulepatches

Generator

Discriminator

Loss

Noise

100

CNNlayers

Leaky ReLU

middotmiddot

Feature Vector

Voting

k1

k2

kn

K

retraining

Figure 1 GAN- and CNN-based Co-forest computer aided diagnosis (G2C-CAD) system workflow

based on the FIsher criterion and genetic optimization (FIG)to address Common CT Imaging Signs of Lung (CISL)disease recognition problems They applied the FIG featureselection algorithm to bag-of-visual-words features wavelettransform-based features the Local Binary Pattern CT ValueHistogram features and others Their results showed thatFIG achieved high computational efficiency and was highlyeffective Tao Sun et al [48] investigated an SVM-basedCADx system for lung cancer classification using a total of488 input features that included textural features patientcharacteristics and morphological features to train the clas-sifier Hidetaka Arimura et al [49] developed a computerizedscheme to automatically detect lung nodules in LDCT imagesfor lung cancer screening They extracted possible noduleimages using a ring average filter identified a set of nodulecandidates by applying a multiple-gray-level thresholdingtechnique and removed false positives by using two rule-based schemes on the localized image features related tomorphology and gray levels Tao Sun et al [50] proposeda CADx system to predict the characteristics of solitarypulmonary nodules in lung CT to diagnose early stage lungcancer In their CADx systeman SVMmodelwas constructedthat exploited curvelet transform texture features 3 patientdemographic features and 9 morphological features Fang-fang Han et al [51] constructed a CADx system based on 50categories of 3D textural features extracted from gray levels acurvature cooccurrence matrix and gradients as well as othernodule volume data derivatives

The above CAD systemsmainly utilized features obtainedby traditional feature extraction algorithms Traditional fea-ture acquisition is based on manual design and selection

which requires experts with specialized heuristic knowledgeThe features obtained in this way are low-level features nearthe pixel level and the work scope is relatively small In theimage analysis workflow the final performance of the systemalso depends on the quality of prior preprocessing or seg-mentation stagesTherefore in the traditional CAD solutionstuning the classification performance is both complicated andarduous [52]

The emergence of the convolutional neural network(CNN) [53] solved this dilemma In 2012 the emergenceof AlexNet sparked a revival in the image detection fieldthrough deep learning techniques based on CNN featuresand CNNs have subsequently been used extensively inpulmonary imaging analysis Compared to traditional algo-rithms CNNs can extract more distinctive features auto-matically Two studies [52 54] demonstrated that CNNsare a promising technique in lung nodule identificationDeep learning techniques have the inherent superiority ofbeing able to automatically extract features and adjust theperformance seamlessly

In traditional CAD algorithms training only needs toseek an optimal discriminant surface from the manuallydesigned feature space In contrast deep learning networkssimultaneously attempt to find both the most significantdiscriminate features among large numbers of high-levelfeatures and an optimal classification surface As a resulttraining a deep learning network requires a massive amountof labeled samples [55]mdasha requirement that cannot be met inthe medical image analysis field because professional anno-tation is too expensive [56] Data augmentation techniquesproduce only limited effects Although transfer learning can

4 BioMed Research International

100Reshape Deconv Deconv Deconv

4times4times128 8times8times64 16times16times3232times32times3

Generator

Conv Conv Conv Full Connection

32times32times3 16times16times32 8times8times64 4times4times1281

DiscriminatorFigure 2 The architecture of the generator and the discriminator in the DCGANmodel

alleviate the problem of the lack of training examples to acertain extent the significant feature sequences vary betweendifferent classification tasks Thus transfer learning is not themost suitable method to cope with medical image analysistasks

Generative adversarial networks (GANs) [57] havedemonstrated the promising ability to generate visually real-istic images A GAN trained with limited annotated samplescan generate large numbers of realistic images ChuquicusmaMJ et al [58] conducted visual Turing tests to evaluate thedegree of realism in nodule images generated by a DCGANand showed that the generated samples can be used to boostthe diagnostic power by mining high-level discriminativeimage features and that the resulting features can be used totrain both radiologists and deep networks

Semisupervised learning is a type of machine learningmethod that combines supervised and unsupervised learn-ing It can be applied when only a small number of labeleddata exist but a large number of unlabeled data are availableSemisupervised learning is important for reducing the cost ofacquiring labeled data and improving classifier performanceCommonly used methods include EM with generative mix-ture models self-training cotraining transductive supportvector machines and graph-based methods [59] Co-forestis a cotraining based semisupervised learning algorithm thatfirst learns an initial classifier from a small amount of labeleddata and then refines the classifier by further exploitinga larger number of unlabeled data to boost the classifierrsquosperformance When applying micro calcification detectionfor breast cancer diagnosis Ming Li et al [60] showed thatCo-forest can successfully enhance the performance of amodel trained on only a small amount of diagnosed samplesby utilizing the available undiagnosed samples

The above works inspired us to exploit a GAN to enlargeand enrich a training set of pulmonary nodules It alsomotivated us to implement the designed G2C-CAD

3 Materials and Methods

31 Experimentation Materials Insufficient labeled samplesrepresents a barrier to CAD progressThe emergence of GAN[57] begun to change this situation A GAN consists of twomain parts a generator G and a discriminator D The G isused to learn the distribution of real imagesThen it generatesrealistic images to attempt to fool the D The D attempts toperform true and false discriminations concerning receivedimages Throughout the process the G strives to make thegenerated image more realistic while the D tries to identifythe true and false images This process is equivalent to agame with two opponents Over time the G and D eventuallyachieve a dynamic equilibrium in which images generated bythe generator are highly similar to the real image distributionand the discriminator cannot determine whether a sampleis drawn from the true data or generated by the generatorDCGAN [61] is a GAN extension in which a CNN isintroduced to conduct unsupervised training The ability ofthe CNN to extract features is used to enhance the training ofthe generation network Building on this idea we constructeda 32 times 32 input scale DCGANThe discriminator architecturein DCGAN has 4 layers as shown in Figure 2

We tested a GAN trained from 9 samples to generateunlabeled-nodule sign patches and the result is gratifying asshown in Figure 3

32 CNN Feature-Based Fuzzy Co-Forest Method As dis-cussed in Section 31 whenwe gain a trainedGANwe also geta trained CNN discriminator simultaneously For each imagepatch passed though the discriminator we obtain a 128-dimensional CNN feature vector from the last convolutionallayer Features of 4 times 4 is hard for eyes to discern inFigure 4 we show an example of 32-dimensional 16 times 16CNN features of a sign patch extracted from layer 1 ofD

BioMed Research International 5

(a) Original sign patches (b) Patches generated by DCGAN

Figure 3 Samples of DCGAN-generated sign patches and real samples The image on the left shows examples of the seed samples used fortraining Most of the generated samples in (b) are realistic although some of the generated patches can be recognized as being fake (the 5thone in row 3) they do help to boost the CAD systemrsquos performance

(a) Input patch (b) Corresponding features

Figure 4 (a) An original input sign patch (b) the corresponding 32 CNN features abstracted from the layer 1 of the discriminator D

Cotraining random forest (Co-forest) is an upgradedalgorithm for the cotraining paradigmThe standard cotrain-ing algorithm has two strong assumptions (1) the samplesdistribution is consistent with that of the target functionsand (2) the different features extracted from the same datashould be conditionally independent Inmost cases howeverthese two strong assumptions are difficult to satisfy Co-forestuses an ensemble consisting of multiple classifiers to avoidthe constraints of standard cotraining The specific structureof random forests enables the Co-forest to take advantageof semisupervised learning and ensemble learning to betterlearn the distribution of the training data Existing Co-forestworks are based on traditional manually designed features[60 62] Here we try to extend the Co-forest approach todeep neural networks by utilizing the CNN features obtainedfrom the GANrsquos discriminator

For each generated realistic sign from DCGAN we canextract a 128-element 4 times 4 CNN feature vector Each 4 times4 CNN feature can be transformed to a 1 times 16 vector Ifwe assume that DCGAN runs N times then we will obtainN CNN feature vectors from the discriminator These CNNfeature vectors for the N image patches can be expressed by amatrix

Original F =[[[[[[[

11989110 119891112711989120 11989121271198911198730 119891119873127

]]]]]]]

(1)

119891 = [1198900 11989015] (2)

where in (1) f is a 16-element vector transformed from a 4 times4 feature patch Every row in Original F represents the 128-dimensional CNN features from one input image patch Thefeatures in each column are produced from the same filterThe n in 119891119899119898 represents the 119899th sign in N and m representsthemth feature of a signWe build a complete reference vector119903 = [1 1 1] |119903| = 16 The cosine similarity of any featuref to r can be calculated as a

119886 = 1198900 lowast 1 + 1198901 lowast 1 + sdot sdot sdot + 11989015 lowast 1radic11989002 + 11989012 + sdot sdot sdot + 119890152 times radic12 + 12 + sdot sdot sdot + 12 (3)

where 119890119894 is an element of f We use a distance matrix A ofthe relative cosine distances to r to replace the feature matrixOriginal F

A =[[[[[[[

11988610 119886112711988620 11988621271198861198730 119886119873127

]]]]]]] (4)

From matrices (1) and (4) we can see that for any twoelements 119886i and 119886j a smaller difference between the valuesof 119886i and 119886j indicates that their corresponding features 119891i and119891j are more similar The elements in each column of A area series of continuous-valued data To build a Co-forest the

6 BioMed Research International

Child node t2

a1=0973a1=0977

t1=0975

c3

(a)

t2=0826

c1 c2

(b)

Figure 5 Traditional classification process

2

Child node t2

a1=0973

t1=0975

a=0977

c3

(a)

t2=0826

c1 c2

2

(b)

lt0975gt0975

lt0826gt0826

c1c2

c3t2

t1

(c)

Figure 6 Here (a) and (b) are fuzzy classification processes and (c) is the decision tree

first step is to construct decision trees by utilizing the labeledsamples [63] Assume that the total number of samples is NLand that the classes are c1 c2 119888k k To build a decision treewe randomly select S samples These S samples are sorted onA[119909] by the values of 119886i where x represents the xth column ofA We calculate the middle value between 119886ix and 119886i+1x as asplit point 119905ix The class information entropy of S is

Info (S) = minus 119896sum119895=1

119901119895 log2 (119901119895) (5)

where Pj represents the proportion of the category j samplesrelative to all samples and Sj is a subset of S constructed byall the elements of S belonging to class cj 119901j = |Sj||S|

When selecting feature 119905ix as the splitting node of thedecision tree the information entropy of S is

Info (A [x] 119905ix S) =10038161003816100381610038161003816119878119895lt119905119894119909 10038161003816100381610038161003816|119878| times Info (119878119895lt119905119894119909) +

10038161003816100381610038161003816119878119895gt119905119894119909 10038161003816100381610038161003816|119878|times Info (119878119895gt119905119894119909)

(6)

Then we compute the information gain

Gain (A [x] 119905ix S) = Info(S)-Info (A [x] 119905ix S) (7)

The split information entropy is

Split (A [x] 119905ix S) = minus(10038161003816100381610038161003816119878119886lt119905119894119909 10038161003816100381610038161003816|119878| times log2 (

10038161003816100381610038161003816119878119886lt119905119894119909 10038161003816100381610038161003816|119878| )

+10038161003816100381610038161003816119878119886gt119905119894 10038161003816100381610038161003816|119878| times log2(

10038161003816100381610038161003816119878119886gt119905119894119909 10038161003816100381610038161003816|119878| )) (8)

and the information gain ratio is

IGR = Gain (A [x] 119905119894119909 S)Split (A [x] 119905119894119909 S) (9)

In a column of attributes A[x] a suitable threshold 119905maxto divide A[x] into two intervals is A[x]1(119886i lt 119905max 119886i isinA[x]1) A[x]2(119886i gt 119905max 119886i isin A[x]2) This split producesthe maximal information gain ratio This 119905max is then selectedas the parent node to generate two children This processis conducted recursively to build a decision tree until atermination criterion is matched

In the reasoning process if a samplersquos attribute value fallsinto a small region around 119905ix after it is disturbed by noiseit can easily be misclassified As shown in Figure 5 supposewe have a sample whose real attribute values are a1=0977and a2=0827 however due to noise during collection a1 ischanged to 0973 In a traditional decision tree this samplewould be misclassified as 1198882

To avoid this fragility when constructing a decision treewe utilize a fuzzy scheme [63] In a traditional decision treewhen a samplersquos attribute a is greater than 119905ix the samplebelongs to either 119888m or 119888n We modify this crisp classifyingmethod by selecting a neighborhood threshold 120576 around thesplit point 119905ix as shown in Figures 6(a) and 6(b) so that theclassification function becomes

C (119904) = 119905119894 + 120576 minus a2120576 times c119898 119886 minus 119905119894 + 1205762120576 times c119899 (10)

where C(s) is a classification function that maps sample s toone of two weighted classes

BioMed Research International 7

Input the labeled set LBThe threshold 120599 of the confidence T is random treesrsquo number

ProcessInitialize a random forest containing T treesfor i in 1 T

ei0 = 05Wi0 = 0

endfort = 0Loop until all of the trees in Random Forest unchanged

t = t+1z = a new random vectorfor i in 1 T119890119894119905 = ClassificationErrorRate(119867119894L)1198711198611015840119894119905 = 120601

if(119890it lt 119890it-1)1198801015840119894119905 = 119878119886119898119901119897119890119889(119866 (119911) 119890119894119905minus1119882119894119905minus1119890119894119905 )for 119909u in1198801015840119894119905

if (119872119886119909119862119900119899119891119894119889119890119899119888119890(119864i 119909u) gt 120579)1198711198611015840119894119905 = cup(119909u 119864i(119909u))119882119894119905 = 119882119894119905 +119872119886119909119862119900119899119891119894119889119890119899119888119890(119864i 119909u)endfor

endforfor i in 1 T

if(119890119894119905119882119894119905 lt 119890119894119905minus1119882119894119905minus1)119890i = LearnRandomTree(119871119861 cup 1198711198611015840119894119905)endfor

endLoopOutput E lowast (x) = argmaxsumfuzzy classes set119910 isin 119897119886119887119890119897 i 119890i(119909) = fuzzy classes set

Algorithm 1 G2C-CAD

As shown in Figure 6(c) along one branch the finalclassification 119888i is calculated as follows

119888i = 119898prod119895=1

119905119895 + 120576 minus a1198952120576 left branch of 119905119895119886119895 minus 119905119895 + 1205762120576 right branch of 119905119895

(11)

where 119888i is one class from 1198881 1198882 119888kWhen 120576 = 01 under the fuzzy classifying scheme

the classification result of disturbed example s is C(119904) =0253lowast1198881 0258lowast1198882 049lowast1198883 It is obvious that the maximalprobability of the final decision result is max(C(s)) = 1198883Based on this classification example we can see that the fuzzydecision scheme is more robust to noise

The classification result of the fuzzy Co-forest is amultiple-label probability distribution of the union of thedecision treesrsquo output we consider the class with the highestprobability as the final output

Let LB denote the labeled set G(z) denotes the process ofgenerating a new fake sign-image patch utilizing the trainedDCGAN There are N classifiers in the Co-forest ensemble119864lowast 119890i (i = 1 N) We denote one of the classifiers inensemble 119864i as the concomitant ensemble of 119890i and createa subensemble that includes all the classifiers except 119890i For anewly generated sample from G if the max fuzzy vote sum

of the classifiers in concomitant ensemble 119864i exceeds a presetthreshold 120579 the sample will be copied to a newly labeled set1198711198611015840119894 with the new assigned label Based on [64] the processiterates until 119890119894119905minus1119882119894119905minus1119890119894119905 is larger than 119882119894119905Then the set119871119861 cup 1198711198611015840119894119905 is used to refine 119890i 119890 denotes the classificationerror rate and 119882119894119905 = sum119898119894119905119895=0119908119894119905119895 where 119908119894119905119895 is the predictedconfidence of 119864119894 on 1198711198611015840119894119905 and119898119894119905 is the size of set 1198711198611015840119894119905

Based on the fuzzy decision tree scheme we construct thefuzzy Co-forest as shown in Algorithm 1

4 Experiments

41 Datasets We collected sample instances from both theLIDC-IDRI and LISS datasets LIDC-IDRI [65] consists ofpulmonary medical image files (such as CT scans and X-rays) with corresponding pathological annotations The datawere collected by the National Cancer Institute to studyearly cancer detection in high-risk populations LISS [21]consists a set of CISLs collected by the Cancer Institute andHospital at the Chinese Academy ofMedical Sciences and theBeijing Institute of Technology intended for computer-aideddetection and diagnosis research andmedical education LISScontains 271 CT scans and 677 abnormal regions includingnine categories of CISLs

8 BioMed Research International

3 6 9 12 15 18 21 24 27 30 33 36 39 42 45 48 54 101

Diameter length summary

0100200300400500600700

Figure 7 The diameter-size distribution of the nodules in the LIDC-IDRI dataset

Table 1 The scheme of sign selection

Used as signs AbandonedSubtlety - -Internal structure - -Calcification =4 ltgt4Sphericity - -Margin - -Lobulation gt=4 lt4Spiculation gt=4 lt4Texture lt=2 gt2

42 LIDC-IDRI Instances In the LIDC-IDRI CT imagingslices most of the annotated nodules have diameters less than32 pixels as shown in Figure 7 Therefore we choose 32 times 32as the input ROI size

A higher degree of lobulation speculation and nonsolidtexture signs indicates a greater probability of malignantnodules [66] A calcification sign usually indicates a benignnodule [66ndash68] except when it has a noncentral appearanceSigns of subtlety [69] internal structure [27] sphericity andmargin [33] have not been clearly proven to have a strongrelationship with malignancy To simplify the comparisonsin this experiment we selected nodules from LIDC withcalcification =4 (noncentral calcification) lobulation gt=4spiculation gt=4 texture lt=2 and malignancy gt=3 as exper-imental instances based on the selection rules shown inTable 1

LIDC-IDRI contains 21057 annotated nodules Weselected the 4 category nodules with signs such as noncen-tral calcification lobulation speculation and nonsolidGGOtexture that have a high prevalence of malignancy as theexperimental objects For the probability of malignancy inthese nodules we adopted the average value of 4 radiologistsrsquoscores The noncentral calcification lobulation spiculationand nonsolidGGO texture signs are illustrated in Table 2 Inaddition we randomly extracted 32 times 32-pixel image patchesfrom slices not annotated by any radiologist as negativesamples In total we used five types of image blocks in ourexperiment

Based on the center point of the merged regions anno-tated by 4 experts we extracted 32 times 32-pixel image blocksas the experimental input by following the selection criteriashown in Table 1 Among these samples we ensured that

a given category of ROI patches for a single patient doesnot appear in any two subsets simultaneously which helpedensure that the specificity of the trained individual networksis as high as possible

43 LISS Instances From LISS we selected signs of lobula-tion spiculation pleural indentation and GGO associatedwith malignant lung cancer [70ndash74] as the experimentalobjects The number of samples in each category is shown inTable 3

44 Evaluation Criteria To evaluate the performance ofthe algorithm presented in this paper we considered thefollowing criteria

(1) ROC The Receiver Operator Characteristic curve(ROC) is a method that comprehensively and simul-taneously reflects the sensitivity and specificity of theclassification result By comparing the classificationresults of different samples with the annotation labelsa series of sensitivity and specificity scores is cal-culated Then a curve is drawn using sensitivity asthe ordinate and 1 minus specificity as the abscissa Alarger area under the curve (AUC) indicates a higherdiagnostic accuracy On the ROC curve the pointclosest to the top left of the coordinate diagram is thecritical value that reflects the highest sensitivity andspecificity

(2) Confusion matrix A confusion matrix is also calledan error matrix and it is a visual representation of theclassification effect A confusion matrix can be usedto describe the relationship between the real categoryattribute of the sample data and the recognition resultIt is a method for evaluating classifier performanceand iswidely used in pattern recognition A confusionmatrix is also a performance evaluation method thatscholars often use when solving practical applicationproblems

45 Experimentation From the LIDC-IDRI database weacquired 590 noncentral calcification 565 lobulation 576spiculation 545 nonsolidGGO texture sign patches and2500 negative image patchesThe experiment was conductedaccording to the following steps

(1) Set the initial values of T 119890 and 120579 to 6 05 and 06respectively

BioMed Research International 9

Table 2 Sign samples of calcification lobulation spiculation and nonsolidGGO texture

Noncentral calcification Lobulation Spiculation NonsolidGGO texture

Subtlety 467 50 50 40 40 50 367 30Internal structure 10 10 10 10 10 10 10 10Calcification 40 40 60 60 60 60 60 60Sphericity 333 267 425 40 50 50 433 40Margin 367 467 45 40 20 40 267 10Lobulation 167 20 425 40 40 10 267 10Spiculation 133 133 225 40 40 50 167 20Texture 50 467 475 40 50 40 133 10

Table 3 Number of selected instances of LISS

CISL Category of Lesion RegionsGGO 45Lobulation 41Spiculation 29Pleural Indentation 45Negative 80

(2) Train the GAN until the discrepancy cost reaches abalance as illustrated in Figure 8

(3) Train a primary fuzzy Co-forest based on the originalsamples utilizing the features exported from thetrained DCGAN discriminator in Step (2)

(4) Input a random vector to DCGAN and transmit thegenerated features to the concomitant random fuzzydecision forest119864i in the primary Co-forest until119882119894119905 ge119890119894119905minus1119882119894119905minus1119890119894119905 for all classesIn this process if the maximal weight of the fuzzylabel from 119864i exceeds the threshold 120579 store bothfeatureswith thematched label otherwise discard thegenerated image and generate a new image

(5) Retrain the corresponding tree 119890i of 119864i(6) Test the performance of the system

As a comparison we trained a C45 random forest accordingto the same schemewhich has been used byG2C-CAD as thebaseline method

We also conducted experiments on samples obtainedfrom LISS

5 Results

We divided the dataset into two parts 90 for training and10 for validation In this way the nodule distribution in thevalidation subsets is consistent with the nodule distributionof the original dataset according to the radiologistsrsquo consensusof their evaluations at the nodule levelThe number of trees intheCo-forest classificationmodelwas 6 primary trainingwasconducted on the training data and finally the system was

further validated on the remaining 10 The sensitivity andspecificity of each class instance was calculated and comparedusing the ROC curves shown in Figure 9

In Figure 9 theAUCsof noncentral calcification negativeimage samples lobulation spiculation and nonsolidGGOtexture are 0946 0939 0912 0908 and 0887 respectivelyFrom the curves in Figure 9 we can see that the G2C-CADsystem achieves the highest overall classification accuracy onthe calcification sign Compared to noncentral calcificationand negative image samples the AUCs of lobulation spicula-tion and nonsolidGGO texture are relatively lower

To show the underlying classification error distribution aconfusion matrix of the 5 classification results is presented inFigure 10

The diagonal numbers in the confusion matrix representthe recognition accuracy rates of the corresponding categoryand the nondiagonal elements are misclassification rates iethe ratio of other category test samples that weremisclassifiedto this class From Figure 10 we can see that the noncentralcalcification sign has the highest classification accuracywhile the nonsolidGGO texture sign has the highest mis-classification rate Misclassified spiculation signs are mostlyrecognized as lobulation By comparing the test sampleswe find that many of these two types of samples have verysimilar textures From the confusion matrix we can alsosee that most misclassifications occur between lobulationspiculation and nonsolidGGO texture signs

6 Discussion

We performed a performance comparison on each categorybetween G2C-CAD and the C45 random forest modelusing the confusion matrix First we constructed confusionmatrices reflecting the accuracies of G2C-CAD and C45Then we calculated the difference matrix for those twoconfusion matrices as shown in Figure 11 The float numberson the diagonal represent the differences between G2C-CAD and C45 regarding their classification accuracy forthe corresponding category while the float numbers on thenondiagonal represent their differences in the classificationerror rate showing the numbers for each class that weremisclassified into the corresponding category

10 BioMed Research International

100000 200000 300000 4000000iteration

minus70

minus60

minus50

minus40

minus30

minus20

minus10

0

trai

n di

sc co

st

Figure 8 Training discrepancy cost of DCGAN

ROC Curves

AUCNon-central Calcification A = 0946Negative A = 0939Lobulation A = 0912Spiculation A = 0908Non-solidGGO Texture A = 0887

00

2

4

6

8

10

Sens

itivi

ty

2 4 6 8 10001 - Specificity

Figure 9 ROC curves for the 5 categories of sign ROI recognition results

Consequently positive numbers on the diagonal andlarger values indicate that G2C-CAD achieved a betterperformance than that of C45 For the elements that arenot on the diagonal the situation is the opposite As shownin Figure 11 no positive values occur in the nondiagonalelements which means that G2C-CAD possesses greaterdiscrimination ability between each category The elementvalues on the diagonal are all positive and the average valueof the diagonal numbers in the difference confusion matrixis 0144 demonstrating that our method has a better overallperformance

To verify the effectiveness of the fuzzy algorithm we alsoconducted multiclassification experiments using G2C-CADwithout the fuzzy algorithm and compared the performancesof the two algorithms using a difference confusion matrix asshown in Figure 12

FromFigure 12 we can see that the discrimination perfor-mance of the CAD system employing the fuzzy algorithm isobviously better than that of the nonfuzzy one Furthermoreas shown in Figure 12 the CAD performance of the fuzzyalgorithm better distinguishes the difficult lobulation signfrom the spiculation sign

BioMed Research International 11

Cal 0933 005 0011 0002 0004

Lob 0 0907 006 0026 0007

Spic 0 0077 0891 0032 0

Tex 001 0064 0046 0878 0002

NegCa

l

Lob

Spic

Tex

Neg

0 0034 0028 0012 0926

Figure 10 Confusion matrix of the predicted results

Cal 0119 minus0051 minus00039 minus00623 minus00018

Lob minus00183 01286 minus00994 minus00101 minus00008

Spic minus00067 minus01065 01317 minus0028 minus0024

Tex minus00422 minus00627 minus01119 02175 minus00007

Neg

Cal

Lob

Spic

Tex

Neg

minus00002 minus006 minus00135 minus00495 01232

Figure 11 Performance difference matrix of G2C-CAD and C45

The experimental result on LISS shows higher perfor-mance the areas under the ROC curve of GGO lobulationspiculation pleural indentation and negative image samplesare 0972 0964 0941 0967 and 0953 respectively Bycomparing the training dataset and test samples we foundthat the main reason may be that the samples in differentcategories are more separable visually

7 Conclusion

In this paper by coupling a GAN with a semisupervisedlearning approach we proposed a G2C-CAD method todetect signs that are highly correlated with malignant pul-monary nodules We first trained a DCGAN on a smallsample set Then we extracted the features from the CNNdiscriminator trained with the training samples and usedthem to train a primary fuzzy Co-forest classifier Then weuse the trained DCGAN to generate large amounts of realistic

fake samples Based on these fake samples we conductedsemisupervised learning with the fuzzy Co-forest and finallyobtained a classifierwith excellent performance By validatingon the LIDC dataset the area under the ROC curve for fivesign types noncentral calcification negative image sampleslobulation spiculation and nonsolidGGO texture reached0946 0939 0912 0908 and 0887 respectively On the LISSdataset the proposed system showed comprehensively higherclassification performances than those of a trained C45classifier The experimental results show that the proposedG2C-CAD is an appropriate method for solving the problemof insufficient samples in the medical image analysis fieldMoreover our system can also be used to establish a trainingsample library for CAD classification diagnosis which holdsgreat significance for future medical image analysis

In future work we plan to combine this method withmultiple-instance learning to performweak supervised learn-ing directly on CT slice images or to extend the algorithm

12 BioMed Research International

00083 minus0002 minus00023 minus00032 minus00008

minus00013 00159 minus00074 minus00061 minus00011

00011 minus00044 00122 minus00042 minus00025

minus00002 minus00031 minus00029 00074 minus00012

minus00008 minus00036 minus00013 minus00039 00096

Cal

Lob

Spic

Tex

NegCa

l

Lob

Spic

Tex

Neg

Figure 12 Performance difference matrix of fuzzy G2C-CAD and nonfuzzy G2C-CAD

making it suitable for use in the 3D medical image classifica-tion field

Data Availability

The data used to support the findings of this study areavailable from the corresponding author upon request

Disclosure

The publicly accessible medical images database of LIDCIDRI is used

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work was supported in part by National Natural ScienceFoundation of China (Grant nos 60973059 and 81171407) andProgram for New Century Excellent Talents in University ofChina (Grant no NCET-10-0044)

References

[1] ldquoCancersrdquo httpwwwwhointennews-roomfact-sheetsde-tailcancer

[2] ldquoCancer stat facts common cancer sitesrdquo httpsseercancergovstatfactshtmlcommonhtml

[3] S Diederich D Wormanns M Semik et al ldquoScreening forearly lung cancer with low-dose spiral CT prevalence in 817asymptomatic smokersrdquo Radiology vol 222 no 3 pp 773ndash7812002

[4] J F Williamson ldquoBrachytherapy technology and physics prac-tice since 1950 A half-century of progressrdquo Physics in Medicineand Biology vol 51 no 13 article no R18 pp R303ndashR325 2006

[5] J F Williamson S K Das M S Goodsitt and J O DeasyldquoIntroducing the medical physics dataset articlerdquo MedicalPhysics vol 44 no 2 pp 349-350 2017

[6] S Balik E Weiss N Jan et al ldquoEvaluation of 4-dimensionalcomputed tomography to 4-dimensional cone-beam computedtomography deformable image registration for lung canceradaptive radiation therapyrdquo International Journal of RadiationOncology ∙ Biology ∙ Physics vol 86 no 2 pp 372ndash379 2013

[7] C F Mountain ldquoRevisions in the international system forstaging lung cancerrdquo Chest vol 111 no 6 pp 1710ndash1717 1997

[8] M T van Rens J M van den Bosch A Brutel de la Riviereand H R Elbers ldquoPrognostic assessment of 2361 patients whounderwent pulmonary resection for non-small cell lung cancerstage I II and IIIArdquo CHEST vol 117 no 2 pp 374ndash379 2000

[9] E F Patz S Rossi D H Harpole J E Herndon and P CGoodman ldquoCorrelation of tumor size and survival in patientswith stage IA non-small cell lung cancerrdquo Chest vol 117 no 6pp 1568ndash1571 2000

[10] C I Henschke P Boffetta O Gorlova R Yip J O DeLanceyand M Foy ldquoAssessment of lung-cancer mortality reductionfrom CT screeningrdquo Lung Cancer vol 71 no 3 pp 328ndash3322011

[11] D Kumar A Wong and D A Clausi ldquoLung nodule classifi-cation using deep features in CT imagesrdquo in Proceedings of the12th Conference on Computer and Robot Vision CRV 2015 pp133ndash138 Canada June 2015

[12] M Kaneko K Eguchi and H Ohmatsu ldquoPeripheral lungcancer screening and detection with low-dose spiral CT versusradiographyrdquo Radiology vol 201 no 3 pp 798ndash802 1996

[13] S Swensen J Jett J Sloan et al ldquoScreening for lung cancerwith low-dose spiral computed tomographyrdquo American Journalof Respiratory and Critical Care Medicine vol 165 no 4 pp508ndash513 2002

[14] E Paci D Puliti A Lopes Pegna et al ldquoMortality survivaland incidence rates in the ITALUNG randomised lung cancerscreening trialrdquoThorax vol 72 no 9 pp 825ndash831 2017

[15] S Iwano TNakamura Y Kamioka andT Ishigaki ldquoComputer-aided diagnosis A shape classification of pulmonary nodulesimaged by high-resolution CTrdquo Computerized Medical Imagingand Graphics vol 29 no 7 pp 565ndash570 2005

BioMed Research International 13

[16] A Bernard and H Shennib ldquoResection of pulmonary nodulesusing video-assisted thoracic surgeryrdquo The Annals of ThoracicSurgery vol 61 no 1 pp 202ndash205 1996

[17] M J Mack S R Hazelrigg R J Landreneau and T E AcuffldquoThoracoscopy for the diagnosis of the indeterminate solitarypulmonary nodulerdquoThe Annals of Thoracic Surgery vol 56 no4 pp 825ndash832 1993

[18] S Siegelman E Zerhouni F Leo N Khouri and F StitikldquoCT of the solitary pulmonary nodulerdquo American Journal ofRoentgenology vol 135 no 1 pp 1ndash13 1980

[19] B A Keagy P J K Starek G F Murray J W Battaglini ME Lores and B R Wilcox ldquoMajor pulmonary resection forsuspected but unconfirmedmalignancyrdquoTheAnnals ofThoracicSurgery vol 38 no 4 pp 314ndash316 1984

[20] J Collins ldquoCT signs and patterns of lung diseaserdquo RadiologicClinics of North America vol 39 no 6 pp 1115ndash1135 2001

[21] G Han X Liu F Han et al ldquoThe LISSmdasha public databaseof common imaging signs of lung diseases for computer-aideddetection and diagnosis research and medical educationrdquo IEEETransactions on Biomedical Engineering vol 62 no 2 pp 648ndash656 2015

[22] G Han X LiuN Q Soomro et al ldquoEmpirical driven automaticdetection of lobulation imaging signs in lung CTrdquo BioMedResearch International vol 2017 2017

[23] C-Z Shi Q Zhao L-P Luo and J-X He ldquoSize of solitarypulmonary nodule was the risk factor of malignancyrdquo Journalof Thoracic Disease vol 6 no 6 pp 668ndash676 2014

[24] J J Erasmus J E Connolly H P McAdams and V L RogglildquoSolitary pulmonary nodules part I Morphologic evaluationfor differentiation of benign and malignant lesionsrdquo Radio-Graphics vol 20 no 1 pp 43ndash58 2000

[25] K Furuya S Murayama H Soeda et al ldquoNew classification ofsmall pulmonary nodules by margin characteristics on high-resolution CTrdquo Acta Radiologica vol 40 no 5 pp 496ndash5042016

[26] Z Yang S Sone S Takashima et al ldquoHigh-resolution CTanalysis of small peripheral lung adenocarcinomas revealed onscreening helical CTrdquo American Journal of Roentgenology vol176 no 6 pp 1399ndash1407 2001

[27] M F McNitt-Gray E M Hart N Wyckoff J W Sayre J GGoldin and D R Aberle ldquoA pattern classification approachto characterizing solitary pulmonary nodules imaged on highresolution CT Preliminary resultsrdquoMedical Physics vol 26 no6 pp 880ndash888 1999

[28] V Ambrosini S Nicolini P Caroli et al ldquoPETCT imaging indifferent types of lung cancer an overviewrdquoEuropean Journal ofRadiology vol 81 no 5 pp 988ndash1001 2012

[29] F Li S Sone H Abe H MacMahon and K Doi ldquoMalignantversus benign nodules at CT screening for lung cancer com-parison of thin-section CT findingsrdquo Radiology vol 233 no 3pp 793ndash798 2004

[30] B Ganeshan E Panayiotou K Burnand S Dizdarevic and KMiles ldquoTumour heterogeneity in non-small cell lung carcinomaassessed by CT texture analysis a potential marker of survivalrdquoEuropean Radiology vol 22 no 4 pp 796ndash802 2012

[31] F Li S Sone H Abe H MacMahon S G Armato and KDoi ldquoLung cancers missed at low-dose helical CT screening ina general population comparison of clinical histopathologicand imaging findingsrdquo Radiology vol 225 no 3 pp 673ndash6832002

[32] H Chen Y Xu Y Ma and B Ma ldquoNeural network ensemble-based computer-aided diagnosis for differentiation of lungnodules on CT images clinical evaluationrdquoAcademic Radiologyvol 17 no 5 pp 595ndash602 2010

[33] C V Zwirewich S Vedal R R Miller and N L Muller ldquoSoli-tary pulmonary nodule high-resolution CT and radiologic-pathologic correlationrdquo Radiology vol 179 no 2 pp 469ndash4761991

[34] Y J Jeong C A Yi and K S Lee ldquoSolitary pulmonary nodulesdetection characterization and guidance for further diagnosticworkup and treatmentrdquoAmerican Journal of Roentgenology vol188 no 1 pp 57ndash68 2007

[35] A N Khan H H Al-Jahdali C M Allen K L Irion S AlGhanem and S S Koteyar ldquoThe calcified lung nodule whatdoes it meanrdquo Annals of Thoracic Medicine vol 5 no 2 pp67ndash79 2010

[36] H C BeckerW J Nettleton P HMeyers JW Sweeney and CMNice ldquoDigital computer determination of amedical diagnos-tic index directly from chest X-ray imagesrdquo IEEE Transactionson Biomedical Engineering vol BME-11 no 3 pp 67ndash72 1964

[37] P H Meyers C M Nice H C Becker W J Nettleton J WSweeney and G R Meckstroth ldquoAutomated computer analysisof radiographic imagesrdquo Radiology vol 83 no 6 pp 1029ndash10341964

[38] B Van Ginneken B Ter Haar Romeny and M ViergeverldquoComputer-aided diagnosis in chest radiography a surveyrdquoIEEE Transactions on Medical Imaging vol 20 no 12 pp 1228ndash1241 2001

[39] K Suzuki F Li S Sone and K Doi ldquoComputer-aided diagnos-tic scheme for distinction between benign and malignant nod-ules in thoracic low-dose CT by use ofmassive training artificialneural networkrdquo IEEE Transactions on Medical Imaging vol 24no 9 pp 1138ndash1150 2005

[40] K Kanazawa Y Kawata N Niki et al ldquoComputer-aideddiagnosis for pulmonary nodules based on helical CT imagesrdquoComputerized Medical Imaging and Graphics vol 22 no 2 pp157ndash167 1998

[41] F Han H Wang G Zhang et al ldquoTexture feature analysis forcomputer-aided diagnosis on pulmonary nodulesrdquo Journal ofDigital Imaging vol 28 no 1 pp 99ndash115 2014

[42] H Hu Q Wang H Tang L Xiong and Q Lin ldquoMulti-slicecomputed tomography characteristics of solitary pulmonaryground-glass nodules Differences between malignant andbenignrdquoThoracic Cancer vol 7 no 1 pp 80ndash87 2016

[43] G-Y Zheng X-B Liu and G-H Han ldquoSurvey on medicalimage computer aided detection and diagnosis systemsrdquo RuanJian Xue BaoJournal of Software vol 29 no 5 pp 1471ndash15142018

[44] F H Edwards P S Schaefer S Callahan G M Graeber andR A Albus ldquoBayesian statistical theory in the preoperativediagnosis of pulmonary lesionsrdquoCHEST vol 92 no 5 pp 888ndash891 1987

[45] H Krewer B Geiger L O Hall et al ldquoEffect of texture featuresin computer aided diagnosis of pulmonary nodules in low-dosecomputed tomographyrdquo in Proceedings of the IEEE InternationalConference on Systems Man and Cybernetics (SMC rsquo13) pp3887ndash3891 IEEE Manchester UK October 2013

[46] C Jacobs E M van Rikxoort E T Scholten et al ldquoSolidpart-solid or non-solid classification of pulmonary nodulesin low-dose chest computed tomography by a computer-aideddiagnosis systemrdquo Investigative Radiology vol 50 no 3 pp 168ndash173 2015

14 BioMed Research International

[47] X Liu L Ma L Song Y Zhao X Zhao and C Zhou ldquoRec-ognizing common CT imaging signs of lung diseases througha new feature selection method based on fisher criterion andgenetic optimizationrdquo IEEE Journal of Biomedical and HealthInformatics vol 19 no 2 pp 635ndash647 2015

[48] T Sun J Wang X Li et al ldquoComparative evaluation of supportvectormachines for computer aided diagnosis of lung cancer inCT based on a multi-dimensional data setrdquo Computer Methodsand Programs in Biomedicine vol 111 no 2 pp 519ndash524 2013

[49] H Arimura S Katsuragawa K Suzuki et al ldquoComputerizedscheme for automated detection of lung nodules in low-dose computed tomography images for lung cancer screeningrdquoAcademic Radiology vol 11 no 6 pp 617ndash629 2004

[50] T Sun R Zhang J Wang X Li and X Guo ldquoComputer-aideddiagnosis for early-stage lung cancer based on longitudinal andbalanced datardquo PLoS ONE vol 8 no 5 Article ID e63559 2013

[51] F Han C L Novak S Aylward et al ldquoA new 3D texture featurebased computer-aided diagnosis approach to differentiate pul-monary nodulesrdquo Proceedings of the SPIE Article ID 86702Z2013

[52] K-L Hua C-H Hsu S C Hidayati W-H Cheng and Y-J Chen ldquoComputer-aided classification of lung nodules oncomputed tomography images via deep learning techniquerdquoOncoTargets andTherapy 2015

[53] Y LeCun L Bottou Y Bengio and P Haffner ldquoGradient-basedlearning applied to document recognitionrdquo Proceedings of theIEEE vol 86 no 11 pp 2278ndash2323 1998

[54] W Shen M Zhou F Yang C Yang and J Tian ldquoMulti-scaleconvolutional neural networks for lung nodule classificationrdquoin Proceedings of the International Conference on InformationProcessing in Medical Imaging vol 24 pp 588ndash599 Springer2015

[55] N Tajbakhsh and K Suzuki ldquoComparing two classes of end-to-end machine-learning models in lung nodule detection andclassificationMTANNs vs CNNsrdquo Pattern Recognition vol 63pp 476ndash486 2017

[56] H Greenspan B Van Ginneken and R M Summers ldquoGuesteditorial deep learning inmedical imaging overview and futurepromise of an exciting new techniquerdquo IEEE Transactions onMedical Imaging vol 35 no 5 pp 1153ndash1159 2016

[57] I J Goodfellow J Pouget-Abadie M Mirza et al ldquoGenerativeadversarial netsrdquo in Proceedings of the 28th Annual Conferenceon Neural Information Processing Systems 2014 NIPS 2014 pp2672ndash2680 Canada December 2014

[58] M J Chuquicusma S Hussein J Burt and U Bagci ldquoHowto fool radiologists with generative adversarial networks Avisual turing test for lung cancer diagnosisrdquo in Proceedings of the2018 IEEE 15th International Symposium Biomedical Imagingpp 240ndash244 IEEE 2018

[59] X Zhu ldquoSemi-supervised learning literature surveyrdquo ComputerScience University ofWisconsin-Madison vol 2 no 3 p 4 2006

[60] M Li andZ-H Zhou ldquoImprove computer-aided diagnosis withmachine learning techniques usingundiagnosed samplesrdquo IEEETransactions on Systems Man and Cybernetics Systems vol 37no 6 pp 1088ndash1098 2007

[61] A Radford L Metz and S Chintala ldquoUnsupervised represen-tation learning with deep convolutional generative adversarialnetworksrdquo 2015 httpsarxivorgabs07123011

[62] Y Liu Z Xing C Deng P Li and M Guo ldquoAutomaticallydetecting lung nodules based on shape descriptor and semi-supervised learningrdquo in Proceedings of the 2010 International

Conference on Computer Application and System ModelingICCASM 2010 pp V1647ndashV1650 China October 2010

[63] Y Peng and P Flach ldquoSoft discretization to enhance thecontinuous decision tree inductionrdquo Integrating Aspects of DataMining Decision Support and Meta-Learning vol 1 no 34 pp109ndash118 2001

[64] N Settouti M E Habib Daho M E Amine Lazouni and MA Chikh ldquoRandom forest in semi-supervised learning (Co-Forest)rdquo in Proceedings of the 2013 8th International Workshopon Systems Signal Processing and Their Applications WoSSPA2013 pp 326ndash329 Algeria May 2013

[65] S G Armato III G McLennan L Bidaut et al ldquoThe lungimage database consortium (lidc) and image database resourceinitiative (idri) a completed reference database of lung noduleson ct scansrdquoMedical Physics vol 38 no 2 pp 915ndash931 2011

[66] D E Ost and M K Gould ldquoDecision making in patientswith pulmonary nodulesrdquo American Journal of Respiratory andCritical Care Medicine vol 185 no 4 pp 363ndash372 2012

[67] M M Goodsitt H Chan T W Way S C Larson E GChristodoulou and J Kim ldquoAccuracy of the CT numbersof simulated lung nodules imaged with multi-detector CTscannersrdquoMedical Physics vol 33 no 8 pp 3006ndash3017 2006

[68] MMGoodsittHChanTWWayM J Schipper S C Larsonand E G Christodoulou ldquoQuantitative CT of lung nodulesdependence of calibration on patient body size anatomicregion and calibration nodule size for single- and dual-energytechniquesrdquoMedical Physics vol 36 no 7 pp 3107ndash3121 2009

[69] P Opulencia D S Channin D S Raicu and J D FurstldquoRadLex and lung nodule image featuresrdquo Journal of DigitalImaging vol 24 no 2 pp 256ndash270 2011

[70] J-C Wang S Sone L Feng et al ldquoRapidly growing smallperipheral lung cancers detected by screening CT Correlationbetween radiological appearance and pathological featuresrdquoBritish Journal of Radiology vol 73 no 873 pp 930ndash937 2000

[71] S Takashima Y Maruyama M Hasegawa A Saito M Hani-uda and M Kadoya ldquoHigh-resolution CT features prognosticsignificance in peripheral lung adenocarcinoma with bronchi-oloalveolar carcinoma componentsrdquo Respiration vol 70 no 1pp 36ndash42 2003

[72] R M Lindell T E Hartman S J Swensen et al ldquoFive-yearlung cancer screening experience CT appearance growth ratelocation and histologic features of 61 lung cancersrdquo Radiologyvol 242 no 2 pp 555ndash562 2007

[73] T J Kim D H Han K N Jin and K Won Lee ldquoLung cancerdetected at cardiac CT prevalence clinicoradiologic featuresand importance of fullndashfield-of-view imagesrdquo Radiology vol255 no 2 pp 369ndash376 2010

[74] MVazquez D Carter E Brambilla et al ldquoSolitary andmultipleresected adenocarcinomas after CT screening for lung cancerhistopathologic features and their prognostic implicationsrdquoLung Cancer vol 64 no 2 pp 148ndash154 2009

Stem Cells International

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

MEDIATORSINFLAMMATION

of

EndocrinologyInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Disease Markers

Hindawiwwwhindawicom Volume 2018

BioMed Research International

OncologyJournal of

Hindawiwwwhindawicom Volume 2013

Hindawiwwwhindawicom Volume 2018

Oxidative Medicine and Cellular Longevity

Hindawiwwwhindawicom Volume 2018

PPAR Research

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Immunology ResearchHindawiwwwhindawicom Volume 2018

Journal of

ObesityJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational and Mathematical Methods in Medicine

Hindawiwwwhindawicom Volume 2018

Behavioural Neurology

OphthalmologyJournal of

Hindawiwwwhindawicom Volume 2018

Diabetes ResearchJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Research and TreatmentAIDS

Hindawiwwwhindawicom Volume 2018

Gastroenterology Research and Practice

Hindawiwwwhindawicom Volume 2018

Parkinsonrsquos Disease

Evidence-Based Complementary andAlternative Medicine

Volume 2018Hindawiwwwhindawicom

Submit your manuscripts atwwwhindawicom

Page 2: A Novel Computer-Aided Diagnosis Scheme on Small Annotated ...downloads.hindawi.com/journals/bmri/2019/6425963.pdf · BioMedResearchInternational 100 Reshape Deconv Deconv Deconv

2 BioMed Research International

1 Introduction

Pulmonary carcinomas are the most lethal disease in theworld Approximately 15 million people die due to pul-monary cancer every yearmdashfar higher than the mortalityrate of other diseases [1] In the United States alone thelung cancer deaths in 2018 will reach 154050 [2] Howevermost pulmonary tumors cause no symptoms Therefore thedisease is usually diagnosed at advanced stages resulting in alow overall 5-year survival rate of approximately 14 [3] Incontrast the 5-year survival rate of patientswith stage IAnon-small cell lung cancer that has been pathologically confirmedand resected or precision treated [4ndash6] can reach 83 [7ndash9] Thus early lung cancer detection can sharply decreasethe lung cancer mortality rate [10 11] Pulmonary primarycancers manifest as nodules in the early stage Comparedto chest X rays computed tomography (CT) has shownhigher sensitivity in detecting small lung nodules [3 12]Currently CT screening is the most recommended methodfor finding nodules [13 14] The associated increase in spiralCT screening has led to a growing burden on radiologists[15] Despite the higher resolution available today it is stilldifficult for radiologists to distinguish malignant nodulesfrom benign ones in low-dose CT (LDCT) images The ratesof resected benign pulmonary nodules can reach 50 duringsurgery [16ndash19] These unnecessary surgeries cause physicaland mental pain and impose additional financial burdens onpatients

A ldquosignrdquo in a CT lung scans refers to a radiologic findingthat suggests a specific disease process Understanding themeaning of a sign implies an understanding of the findingson the CT scan [20] Signs are also called ldquoCT featuresrdquo ldquoCTmanifestationrdquo ldquoCT patternsrdquo or sometimes ldquoCT findingsrdquo[21] Lobulation signs [22 23] spiculation signs [15 24ndash28] and some texture signs [29 30] play crucial roles inradiologistsrsquo ability to differentiate benign from malignantnodules [31 32] Noncentral calcification such as punctatesign or eccentric sign calcification usually indicates that anodule is malignant [33ndash35]Therefore it is highly importantto study methods for identifying the signs of pulmonarynodules automatically to assist radiologists in diagnosingpulmonary malignant nodules

One of the main methods is using Computer-AidedDetectionDiagnosis (CAD)The earliest conception of CADappeared in the 1960s [36 37] Its early idea was attemptingto ldquofully automate the chest examrdquo Over the decades thisexpectation has subsided (which seems to have happened tothe early enthusiasm regarding the capabilities of artificialintelligence systems in general) Currently the general agree-ment is that the focus should be on making useful computer-generated information available to physicians for decisionsupport rather than trying to make a computer act like adiagnostician [38] Various works on the CAD based on CTsigns have been published Han G et al [22] designed asliding-window-based framework to detect lobulation signSuzuki K et al [39] developed a computer-aided diagnostic(CAD) scheme to distinguish benign frommalignant nodulesin LDCT scans using a massive training artificial neural

network (MTANN) and concluded that spiculation signis a highly differentiated feature for distinguishing benignfrom malignant nodules CAD systems based on texturesign features have been investigated in several studies [40ndash42] Since AlexNet won the ImageNet challenge by a largemargin in 2012 deep learning techniques have flourishedrapidly in the image detection field Compared to traditionalclassification algorithms deep learning can extract mostdistinctive features automatically and can implement end-to-end operations However the size of the dataset required totrain a high capacity deep learning framework is quite largewhile generating labeled training data in the medical imageanalysis field is very expensive [43]

To address the dilemma of having only a small annotationset available for lung nodule sign recognition in this paperwe propose a semisupervised generative adversarial network(GAN) and a convolutional neural network (CNN)-basedCo-forest CAD scheme We apply the designed scheme toclassify four types of nodule signs that are highly relatedto lung cancer We call this proposed scheme G2C-CAD inabbreviation

Overall workflow of G2C-CAD is illustrated in Figure 1In the stage A we train a GAN and a Co-forest on theavailable small sample setThen in stage B we use the trainedGAN to generate a synthetic nodule patch and transfer it tothe trained CNN discriminator From the discriminator wegain the CNN-extracted features of the synthesized nodulepatch In stage C the CNN features are provided to the fuzzyCo-forest pretrained on the original sample set to conductsemisupervised learning for the five types of ROI patchesFinally the process iterates between stages B and C until atermination condition is met

The rest of this paper is organized as follows In Section 2we review the existing lung nodule classification algorithmsSection 3 presents our G2C-CAD algorithm We introducethe experimental method in Section 4 and present an analysisin Section 5 Section 6 concludes this paper

2 Related Work

Discriminating between benign or malignant nodules hasattracted the interest of a large number of researchersThe early discrimination methods were based primarily ontraditional machine learning algorithms such as k-nearestneighbors (k-NN) linear discriminant analysis (LDA) Bayes[44] rule based schemes decision trees (DT) and the sup-port vector machine (SVM) Krewer et al [45] tested severalclassifiers including DT k-NN and SVM on extractedtexture and shape features to discriminate malignant frombenign nodules By analyzing the experimental results theyfound that partly solid and nonsolid nodules have a highermalignancy rate than do solid nodules Colin Jacobs et al[46] developed and evaluated a computer aided diagnos-tic system for classifying lung nodules into solid partlysolid and nonsolid nodules This CADx system performsstatistical classification on the nodulesrsquo intensity- texture-and segmentation-based features using the k-NN algorithmXiabi Liu et al [47] proposed a feature selection method

BioMed Research International 3

GAN

Fuzzy Co-forest

Nodulepatches

Generator

Discriminator

Loss

Noise

100

CNNlayers

Leaky ReLU

middotmiddot

Feature Vector

Voting

k1

k2

kn

K

retraining

Figure 1 GAN- and CNN-based Co-forest computer aided diagnosis (G2C-CAD) system workflow

based on the FIsher criterion and genetic optimization (FIG)to address Common CT Imaging Signs of Lung (CISL)disease recognition problems They applied the FIG featureselection algorithm to bag-of-visual-words features wavelettransform-based features the Local Binary Pattern CT ValueHistogram features and others Their results showed thatFIG achieved high computational efficiency and was highlyeffective Tao Sun et al [48] investigated an SVM-basedCADx system for lung cancer classification using a total of488 input features that included textural features patientcharacteristics and morphological features to train the clas-sifier Hidetaka Arimura et al [49] developed a computerizedscheme to automatically detect lung nodules in LDCT imagesfor lung cancer screening They extracted possible noduleimages using a ring average filter identified a set of nodulecandidates by applying a multiple-gray-level thresholdingtechnique and removed false positives by using two rule-based schemes on the localized image features related tomorphology and gray levels Tao Sun et al [50] proposeda CADx system to predict the characteristics of solitarypulmonary nodules in lung CT to diagnose early stage lungcancer In their CADx systeman SVMmodelwas constructedthat exploited curvelet transform texture features 3 patientdemographic features and 9 morphological features Fang-fang Han et al [51] constructed a CADx system based on 50categories of 3D textural features extracted from gray levels acurvature cooccurrence matrix and gradients as well as othernodule volume data derivatives

The above CAD systemsmainly utilized features obtainedby traditional feature extraction algorithms Traditional fea-ture acquisition is based on manual design and selection

which requires experts with specialized heuristic knowledgeThe features obtained in this way are low-level features nearthe pixel level and the work scope is relatively small In theimage analysis workflow the final performance of the systemalso depends on the quality of prior preprocessing or seg-mentation stagesTherefore in the traditional CAD solutionstuning the classification performance is both complicated andarduous [52]

The emergence of the convolutional neural network(CNN) [53] solved this dilemma In 2012 the emergenceof AlexNet sparked a revival in the image detection fieldthrough deep learning techniques based on CNN featuresand CNNs have subsequently been used extensively inpulmonary imaging analysis Compared to traditional algo-rithms CNNs can extract more distinctive features auto-matically Two studies [52 54] demonstrated that CNNsare a promising technique in lung nodule identificationDeep learning techniques have the inherent superiority ofbeing able to automatically extract features and adjust theperformance seamlessly

In traditional CAD algorithms training only needs toseek an optimal discriminant surface from the manuallydesigned feature space In contrast deep learning networkssimultaneously attempt to find both the most significantdiscriminate features among large numbers of high-levelfeatures and an optimal classification surface As a resulttraining a deep learning network requires a massive amountof labeled samples [55]mdasha requirement that cannot be met inthe medical image analysis field because professional anno-tation is too expensive [56] Data augmentation techniquesproduce only limited effects Although transfer learning can

4 BioMed Research International

100Reshape Deconv Deconv Deconv

4times4times128 8times8times64 16times16times3232times32times3

Generator

Conv Conv Conv Full Connection

32times32times3 16times16times32 8times8times64 4times4times1281

DiscriminatorFigure 2 The architecture of the generator and the discriminator in the DCGANmodel

alleviate the problem of the lack of training examples to acertain extent the significant feature sequences vary betweendifferent classification tasks Thus transfer learning is not themost suitable method to cope with medical image analysistasks

Generative adversarial networks (GANs) [57] havedemonstrated the promising ability to generate visually real-istic images A GAN trained with limited annotated samplescan generate large numbers of realistic images ChuquicusmaMJ et al [58] conducted visual Turing tests to evaluate thedegree of realism in nodule images generated by a DCGANand showed that the generated samples can be used to boostthe diagnostic power by mining high-level discriminativeimage features and that the resulting features can be used totrain both radiologists and deep networks

Semisupervised learning is a type of machine learningmethod that combines supervised and unsupervised learn-ing It can be applied when only a small number of labeleddata exist but a large number of unlabeled data are availableSemisupervised learning is important for reducing the cost ofacquiring labeled data and improving classifier performanceCommonly used methods include EM with generative mix-ture models self-training cotraining transductive supportvector machines and graph-based methods [59] Co-forestis a cotraining based semisupervised learning algorithm thatfirst learns an initial classifier from a small amount of labeleddata and then refines the classifier by further exploitinga larger number of unlabeled data to boost the classifierrsquosperformance When applying micro calcification detectionfor breast cancer diagnosis Ming Li et al [60] showed thatCo-forest can successfully enhance the performance of amodel trained on only a small amount of diagnosed samplesby utilizing the available undiagnosed samples

The above works inspired us to exploit a GAN to enlargeand enrich a training set of pulmonary nodules It alsomotivated us to implement the designed G2C-CAD

3 Materials and Methods

31 Experimentation Materials Insufficient labeled samplesrepresents a barrier to CAD progressThe emergence of GAN[57] begun to change this situation A GAN consists of twomain parts a generator G and a discriminator D The G isused to learn the distribution of real imagesThen it generatesrealistic images to attempt to fool the D The D attempts toperform true and false discriminations concerning receivedimages Throughout the process the G strives to make thegenerated image more realistic while the D tries to identifythe true and false images This process is equivalent to agame with two opponents Over time the G and D eventuallyachieve a dynamic equilibrium in which images generated bythe generator are highly similar to the real image distributionand the discriminator cannot determine whether a sampleis drawn from the true data or generated by the generatorDCGAN [61] is a GAN extension in which a CNN isintroduced to conduct unsupervised training The ability ofthe CNN to extract features is used to enhance the training ofthe generation network Building on this idea we constructeda 32 times 32 input scale DCGANThe discriminator architecturein DCGAN has 4 layers as shown in Figure 2

We tested a GAN trained from 9 samples to generateunlabeled-nodule sign patches and the result is gratifying asshown in Figure 3

32 CNN Feature-Based Fuzzy Co-Forest Method As dis-cussed in Section 31 whenwe gain a trainedGANwe also geta trained CNN discriminator simultaneously For each imagepatch passed though the discriminator we obtain a 128-dimensional CNN feature vector from the last convolutionallayer Features of 4 times 4 is hard for eyes to discern inFigure 4 we show an example of 32-dimensional 16 times 16CNN features of a sign patch extracted from layer 1 ofD

BioMed Research International 5

(a) Original sign patches (b) Patches generated by DCGAN

Figure 3 Samples of DCGAN-generated sign patches and real samples The image on the left shows examples of the seed samples used fortraining Most of the generated samples in (b) are realistic although some of the generated patches can be recognized as being fake (the 5thone in row 3) they do help to boost the CAD systemrsquos performance

(a) Input patch (b) Corresponding features

Figure 4 (a) An original input sign patch (b) the corresponding 32 CNN features abstracted from the layer 1 of the discriminator D

Cotraining random forest (Co-forest) is an upgradedalgorithm for the cotraining paradigmThe standard cotrain-ing algorithm has two strong assumptions (1) the samplesdistribution is consistent with that of the target functionsand (2) the different features extracted from the same datashould be conditionally independent Inmost cases howeverthese two strong assumptions are difficult to satisfy Co-forestuses an ensemble consisting of multiple classifiers to avoidthe constraints of standard cotraining The specific structureof random forests enables the Co-forest to take advantageof semisupervised learning and ensemble learning to betterlearn the distribution of the training data Existing Co-forestworks are based on traditional manually designed features[60 62] Here we try to extend the Co-forest approach todeep neural networks by utilizing the CNN features obtainedfrom the GANrsquos discriminator

For each generated realistic sign from DCGAN we canextract a 128-element 4 times 4 CNN feature vector Each 4 times4 CNN feature can be transformed to a 1 times 16 vector Ifwe assume that DCGAN runs N times then we will obtainN CNN feature vectors from the discriminator These CNNfeature vectors for the N image patches can be expressed by amatrix

Original F =[[[[[[[

11989110 119891112711989120 11989121271198911198730 119891119873127

]]]]]]]

(1)

119891 = [1198900 11989015] (2)

where in (1) f is a 16-element vector transformed from a 4 times4 feature patch Every row in Original F represents the 128-dimensional CNN features from one input image patch Thefeatures in each column are produced from the same filterThe n in 119891119899119898 represents the 119899th sign in N and m representsthemth feature of a signWe build a complete reference vector119903 = [1 1 1] |119903| = 16 The cosine similarity of any featuref to r can be calculated as a

119886 = 1198900 lowast 1 + 1198901 lowast 1 + sdot sdot sdot + 11989015 lowast 1radic11989002 + 11989012 + sdot sdot sdot + 119890152 times radic12 + 12 + sdot sdot sdot + 12 (3)

where 119890119894 is an element of f We use a distance matrix A ofthe relative cosine distances to r to replace the feature matrixOriginal F

A =[[[[[[[

11988610 119886112711988620 11988621271198861198730 119886119873127

]]]]]]] (4)

From matrices (1) and (4) we can see that for any twoelements 119886i and 119886j a smaller difference between the valuesof 119886i and 119886j indicates that their corresponding features 119891i and119891j are more similar The elements in each column of A area series of continuous-valued data To build a Co-forest the

6 BioMed Research International

Child node t2

a1=0973a1=0977

t1=0975

c3

(a)

t2=0826

c1 c2

(b)

Figure 5 Traditional classification process

2

Child node t2

a1=0973

t1=0975

a=0977

c3

(a)

t2=0826

c1 c2

2

(b)

lt0975gt0975

lt0826gt0826

c1c2

c3t2

t1

(c)

Figure 6 Here (a) and (b) are fuzzy classification processes and (c) is the decision tree

first step is to construct decision trees by utilizing the labeledsamples [63] Assume that the total number of samples is NLand that the classes are c1 c2 119888k k To build a decision treewe randomly select S samples These S samples are sorted onA[119909] by the values of 119886i where x represents the xth column ofA We calculate the middle value between 119886ix and 119886i+1x as asplit point 119905ix The class information entropy of S is

Info (S) = minus 119896sum119895=1

119901119895 log2 (119901119895) (5)

where Pj represents the proportion of the category j samplesrelative to all samples and Sj is a subset of S constructed byall the elements of S belonging to class cj 119901j = |Sj||S|

When selecting feature 119905ix as the splitting node of thedecision tree the information entropy of S is

Info (A [x] 119905ix S) =10038161003816100381610038161003816119878119895lt119905119894119909 10038161003816100381610038161003816|119878| times Info (119878119895lt119905119894119909) +

10038161003816100381610038161003816119878119895gt119905119894119909 10038161003816100381610038161003816|119878|times Info (119878119895gt119905119894119909)

(6)

Then we compute the information gain

Gain (A [x] 119905ix S) = Info(S)-Info (A [x] 119905ix S) (7)

The split information entropy is

Split (A [x] 119905ix S) = minus(10038161003816100381610038161003816119878119886lt119905119894119909 10038161003816100381610038161003816|119878| times log2 (

10038161003816100381610038161003816119878119886lt119905119894119909 10038161003816100381610038161003816|119878| )

+10038161003816100381610038161003816119878119886gt119905119894 10038161003816100381610038161003816|119878| times log2(

10038161003816100381610038161003816119878119886gt119905119894119909 10038161003816100381610038161003816|119878| )) (8)

and the information gain ratio is

IGR = Gain (A [x] 119905119894119909 S)Split (A [x] 119905119894119909 S) (9)

In a column of attributes A[x] a suitable threshold 119905maxto divide A[x] into two intervals is A[x]1(119886i lt 119905max 119886i isinA[x]1) A[x]2(119886i gt 119905max 119886i isin A[x]2) This split producesthe maximal information gain ratio This 119905max is then selectedas the parent node to generate two children This processis conducted recursively to build a decision tree until atermination criterion is matched

In the reasoning process if a samplersquos attribute value fallsinto a small region around 119905ix after it is disturbed by noiseit can easily be misclassified As shown in Figure 5 supposewe have a sample whose real attribute values are a1=0977and a2=0827 however due to noise during collection a1 ischanged to 0973 In a traditional decision tree this samplewould be misclassified as 1198882

To avoid this fragility when constructing a decision treewe utilize a fuzzy scheme [63] In a traditional decision treewhen a samplersquos attribute a is greater than 119905ix the samplebelongs to either 119888m or 119888n We modify this crisp classifyingmethod by selecting a neighborhood threshold 120576 around thesplit point 119905ix as shown in Figures 6(a) and 6(b) so that theclassification function becomes

C (119904) = 119905119894 + 120576 minus a2120576 times c119898 119886 minus 119905119894 + 1205762120576 times c119899 (10)

where C(s) is a classification function that maps sample s toone of two weighted classes

BioMed Research International 7

Input the labeled set LBThe threshold 120599 of the confidence T is random treesrsquo number

ProcessInitialize a random forest containing T treesfor i in 1 T

ei0 = 05Wi0 = 0

endfort = 0Loop until all of the trees in Random Forest unchanged

t = t+1z = a new random vectorfor i in 1 T119890119894119905 = ClassificationErrorRate(119867119894L)1198711198611015840119894119905 = 120601

if(119890it lt 119890it-1)1198801015840119894119905 = 119878119886119898119901119897119890119889(119866 (119911) 119890119894119905minus1119882119894119905minus1119890119894119905 )for 119909u in1198801015840119894119905

if (119872119886119909119862119900119899119891119894119889119890119899119888119890(119864i 119909u) gt 120579)1198711198611015840119894119905 = cup(119909u 119864i(119909u))119882119894119905 = 119882119894119905 +119872119886119909119862119900119899119891119894119889119890119899119888119890(119864i 119909u)endfor

endforfor i in 1 T

if(119890119894119905119882119894119905 lt 119890119894119905minus1119882119894119905minus1)119890i = LearnRandomTree(119871119861 cup 1198711198611015840119894119905)endfor

endLoopOutput E lowast (x) = argmaxsumfuzzy classes set119910 isin 119897119886119887119890119897 i 119890i(119909) = fuzzy classes set

Algorithm 1 G2C-CAD

As shown in Figure 6(c) along one branch the finalclassification 119888i is calculated as follows

119888i = 119898prod119895=1

119905119895 + 120576 minus a1198952120576 left branch of 119905119895119886119895 minus 119905119895 + 1205762120576 right branch of 119905119895

(11)

where 119888i is one class from 1198881 1198882 119888kWhen 120576 = 01 under the fuzzy classifying scheme

the classification result of disturbed example s is C(119904) =0253lowast1198881 0258lowast1198882 049lowast1198883 It is obvious that the maximalprobability of the final decision result is max(C(s)) = 1198883Based on this classification example we can see that the fuzzydecision scheme is more robust to noise

The classification result of the fuzzy Co-forest is amultiple-label probability distribution of the union of thedecision treesrsquo output we consider the class with the highestprobability as the final output

Let LB denote the labeled set G(z) denotes the process ofgenerating a new fake sign-image patch utilizing the trainedDCGAN There are N classifiers in the Co-forest ensemble119864lowast 119890i (i = 1 N) We denote one of the classifiers inensemble 119864i as the concomitant ensemble of 119890i and createa subensemble that includes all the classifiers except 119890i For anewly generated sample from G if the max fuzzy vote sum

of the classifiers in concomitant ensemble 119864i exceeds a presetthreshold 120579 the sample will be copied to a newly labeled set1198711198611015840119894 with the new assigned label Based on [64] the processiterates until 119890119894119905minus1119882119894119905minus1119890119894119905 is larger than 119882119894119905Then the set119871119861 cup 1198711198611015840119894119905 is used to refine 119890i 119890 denotes the classificationerror rate and 119882119894119905 = sum119898119894119905119895=0119908119894119905119895 where 119908119894119905119895 is the predictedconfidence of 119864119894 on 1198711198611015840119894119905 and119898119894119905 is the size of set 1198711198611015840119894119905

Based on the fuzzy decision tree scheme we construct thefuzzy Co-forest as shown in Algorithm 1

4 Experiments

41 Datasets We collected sample instances from both theLIDC-IDRI and LISS datasets LIDC-IDRI [65] consists ofpulmonary medical image files (such as CT scans and X-rays) with corresponding pathological annotations The datawere collected by the National Cancer Institute to studyearly cancer detection in high-risk populations LISS [21]consists a set of CISLs collected by the Cancer Institute andHospital at the Chinese Academy ofMedical Sciences and theBeijing Institute of Technology intended for computer-aideddetection and diagnosis research andmedical education LISScontains 271 CT scans and 677 abnormal regions includingnine categories of CISLs

8 BioMed Research International

3 6 9 12 15 18 21 24 27 30 33 36 39 42 45 48 54 101

Diameter length summary

0100200300400500600700

Figure 7 The diameter-size distribution of the nodules in the LIDC-IDRI dataset

Table 1 The scheme of sign selection

Used as signs AbandonedSubtlety - -Internal structure - -Calcification =4 ltgt4Sphericity - -Margin - -Lobulation gt=4 lt4Spiculation gt=4 lt4Texture lt=2 gt2

42 LIDC-IDRI Instances In the LIDC-IDRI CT imagingslices most of the annotated nodules have diameters less than32 pixels as shown in Figure 7 Therefore we choose 32 times 32as the input ROI size

A higher degree of lobulation speculation and nonsolidtexture signs indicates a greater probability of malignantnodules [66] A calcification sign usually indicates a benignnodule [66ndash68] except when it has a noncentral appearanceSigns of subtlety [69] internal structure [27] sphericity andmargin [33] have not been clearly proven to have a strongrelationship with malignancy To simplify the comparisonsin this experiment we selected nodules from LIDC withcalcification =4 (noncentral calcification) lobulation gt=4spiculation gt=4 texture lt=2 and malignancy gt=3 as exper-imental instances based on the selection rules shown inTable 1

LIDC-IDRI contains 21057 annotated nodules Weselected the 4 category nodules with signs such as noncen-tral calcification lobulation speculation and nonsolidGGOtexture that have a high prevalence of malignancy as theexperimental objects For the probability of malignancy inthese nodules we adopted the average value of 4 radiologistsrsquoscores The noncentral calcification lobulation spiculationand nonsolidGGO texture signs are illustrated in Table 2 Inaddition we randomly extracted 32 times 32-pixel image patchesfrom slices not annotated by any radiologist as negativesamples In total we used five types of image blocks in ourexperiment

Based on the center point of the merged regions anno-tated by 4 experts we extracted 32 times 32-pixel image blocksas the experimental input by following the selection criteriashown in Table 1 Among these samples we ensured that

a given category of ROI patches for a single patient doesnot appear in any two subsets simultaneously which helpedensure that the specificity of the trained individual networksis as high as possible

43 LISS Instances From LISS we selected signs of lobula-tion spiculation pleural indentation and GGO associatedwith malignant lung cancer [70ndash74] as the experimentalobjects The number of samples in each category is shown inTable 3

44 Evaluation Criteria To evaluate the performance ofthe algorithm presented in this paper we considered thefollowing criteria

(1) ROC The Receiver Operator Characteristic curve(ROC) is a method that comprehensively and simul-taneously reflects the sensitivity and specificity of theclassification result By comparing the classificationresults of different samples with the annotation labelsa series of sensitivity and specificity scores is cal-culated Then a curve is drawn using sensitivity asthe ordinate and 1 minus specificity as the abscissa Alarger area under the curve (AUC) indicates a higherdiagnostic accuracy On the ROC curve the pointclosest to the top left of the coordinate diagram is thecritical value that reflects the highest sensitivity andspecificity

(2) Confusion matrix A confusion matrix is also calledan error matrix and it is a visual representation of theclassification effect A confusion matrix can be usedto describe the relationship between the real categoryattribute of the sample data and the recognition resultIt is a method for evaluating classifier performanceand iswidely used in pattern recognition A confusionmatrix is also a performance evaluation method thatscholars often use when solving practical applicationproblems

45 Experimentation From the LIDC-IDRI database weacquired 590 noncentral calcification 565 lobulation 576spiculation 545 nonsolidGGO texture sign patches and2500 negative image patchesThe experiment was conductedaccording to the following steps

(1) Set the initial values of T 119890 and 120579 to 6 05 and 06respectively

BioMed Research International 9

Table 2 Sign samples of calcification lobulation spiculation and nonsolidGGO texture

Noncentral calcification Lobulation Spiculation NonsolidGGO texture

Subtlety 467 50 50 40 40 50 367 30Internal structure 10 10 10 10 10 10 10 10Calcification 40 40 60 60 60 60 60 60Sphericity 333 267 425 40 50 50 433 40Margin 367 467 45 40 20 40 267 10Lobulation 167 20 425 40 40 10 267 10Spiculation 133 133 225 40 40 50 167 20Texture 50 467 475 40 50 40 133 10

Table 3 Number of selected instances of LISS

CISL Category of Lesion RegionsGGO 45Lobulation 41Spiculation 29Pleural Indentation 45Negative 80

(2) Train the GAN until the discrepancy cost reaches abalance as illustrated in Figure 8

(3) Train a primary fuzzy Co-forest based on the originalsamples utilizing the features exported from thetrained DCGAN discriminator in Step (2)

(4) Input a random vector to DCGAN and transmit thegenerated features to the concomitant random fuzzydecision forest119864i in the primary Co-forest until119882119894119905 ge119890119894119905minus1119882119894119905minus1119890119894119905 for all classesIn this process if the maximal weight of the fuzzylabel from 119864i exceeds the threshold 120579 store bothfeatureswith thematched label otherwise discard thegenerated image and generate a new image

(5) Retrain the corresponding tree 119890i of 119864i(6) Test the performance of the system

As a comparison we trained a C45 random forest accordingto the same schemewhich has been used byG2C-CAD as thebaseline method

We also conducted experiments on samples obtainedfrom LISS

5 Results

We divided the dataset into two parts 90 for training and10 for validation In this way the nodule distribution in thevalidation subsets is consistent with the nodule distributionof the original dataset according to the radiologistsrsquo consensusof their evaluations at the nodule levelThe number of trees intheCo-forest classificationmodelwas 6 primary trainingwasconducted on the training data and finally the system was

further validated on the remaining 10 The sensitivity andspecificity of each class instance was calculated and comparedusing the ROC curves shown in Figure 9

In Figure 9 theAUCsof noncentral calcification negativeimage samples lobulation spiculation and nonsolidGGOtexture are 0946 0939 0912 0908 and 0887 respectivelyFrom the curves in Figure 9 we can see that the G2C-CADsystem achieves the highest overall classification accuracy onthe calcification sign Compared to noncentral calcificationand negative image samples the AUCs of lobulation spicula-tion and nonsolidGGO texture are relatively lower

To show the underlying classification error distribution aconfusion matrix of the 5 classification results is presented inFigure 10

The diagonal numbers in the confusion matrix representthe recognition accuracy rates of the corresponding categoryand the nondiagonal elements are misclassification rates iethe ratio of other category test samples that weremisclassifiedto this class From Figure 10 we can see that the noncentralcalcification sign has the highest classification accuracywhile the nonsolidGGO texture sign has the highest mis-classification rate Misclassified spiculation signs are mostlyrecognized as lobulation By comparing the test sampleswe find that many of these two types of samples have verysimilar textures From the confusion matrix we can alsosee that most misclassifications occur between lobulationspiculation and nonsolidGGO texture signs

6 Discussion

We performed a performance comparison on each categorybetween G2C-CAD and the C45 random forest modelusing the confusion matrix First we constructed confusionmatrices reflecting the accuracies of G2C-CAD and C45Then we calculated the difference matrix for those twoconfusion matrices as shown in Figure 11 The float numberson the diagonal represent the differences between G2C-CAD and C45 regarding their classification accuracy forthe corresponding category while the float numbers on thenondiagonal represent their differences in the classificationerror rate showing the numbers for each class that weremisclassified into the corresponding category

10 BioMed Research International

100000 200000 300000 4000000iteration

minus70

minus60

minus50

minus40

minus30

minus20

minus10

0

trai

n di

sc co

st

Figure 8 Training discrepancy cost of DCGAN

ROC Curves

AUCNon-central Calcification A = 0946Negative A = 0939Lobulation A = 0912Spiculation A = 0908Non-solidGGO Texture A = 0887

00

2

4

6

8

10

Sens

itivi

ty

2 4 6 8 10001 - Specificity

Figure 9 ROC curves for the 5 categories of sign ROI recognition results

Consequently positive numbers on the diagonal andlarger values indicate that G2C-CAD achieved a betterperformance than that of C45 For the elements that arenot on the diagonal the situation is the opposite As shownin Figure 11 no positive values occur in the nondiagonalelements which means that G2C-CAD possesses greaterdiscrimination ability between each category The elementvalues on the diagonal are all positive and the average valueof the diagonal numbers in the difference confusion matrixis 0144 demonstrating that our method has a better overallperformance

To verify the effectiveness of the fuzzy algorithm we alsoconducted multiclassification experiments using G2C-CADwithout the fuzzy algorithm and compared the performancesof the two algorithms using a difference confusion matrix asshown in Figure 12

FromFigure 12 we can see that the discrimination perfor-mance of the CAD system employing the fuzzy algorithm isobviously better than that of the nonfuzzy one Furthermoreas shown in Figure 12 the CAD performance of the fuzzyalgorithm better distinguishes the difficult lobulation signfrom the spiculation sign

BioMed Research International 11

Cal 0933 005 0011 0002 0004

Lob 0 0907 006 0026 0007

Spic 0 0077 0891 0032 0

Tex 001 0064 0046 0878 0002

NegCa

l

Lob

Spic

Tex

Neg

0 0034 0028 0012 0926

Figure 10 Confusion matrix of the predicted results

Cal 0119 minus0051 minus00039 minus00623 minus00018

Lob minus00183 01286 minus00994 minus00101 minus00008

Spic minus00067 minus01065 01317 minus0028 minus0024

Tex minus00422 minus00627 minus01119 02175 minus00007

Neg

Cal

Lob

Spic

Tex

Neg

minus00002 minus006 minus00135 minus00495 01232

Figure 11 Performance difference matrix of G2C-CAD and C45

The experimental result on LISS shows higher perfor-mance the areas under the ROC curve of GGO lobulationspiculation pleural indentation and negative image samplesare 0972 0964 0941 0967 and 0953 respectively Bycomparing the training dataset and test samples we foundthat the main reason may be that the samples in differentcategories are more separable visually

7 Conclusion

In this paper by coupling a GAN with a semisupervisedlearning approach we proposed a G2C-CAD method todetect signs that are highly correlated with malignant pul-monary nodules We first trained a DCGAN on a smallsample set Then we extracted the features from the CNNdiscriminator trained with the training samples and usedthem to train a primary fuzzy Co-forest classifier Then weuse the trained DCGAN to generate large amounts of realistic

fake samples Based on these fake samples we conductedsemisupervised learning with the fuzzy Co-forest and finallyobtained a classifierwith excellent performance By validatingon the LIDC dataset the area under the ROC curve for fivesign types noncentral calcification negative image sampleslobulation spiculation and nonsolidGGO texture reached0946 0939 0912 0908 and 0887 respectively On the LISSdataset the proposed system showed comprehensively higherclassification performances than those of a trained C45classifier The experimental results show that the proposedG2C-CAD is an appropriate method for solving the problemof insufficient samples in the medical image analysis fieldMoreover our system can also be used to establish a trainingsample library for CAD classification diagnosis which holdsgreat significance for future medical image analysis

In future work we plan to combine this method withmultiple-instance learning to performweak supervised learn-ing directly on CT slice images or to extend the algorithm

12 BioMed Research International

00083 minus0002 minus00023 minus00032 minus00008

minus00013 00159 minus00074 minus00061 minus00011

00011 minus00044 00122 minus00042 minus00025

minus00002 minus00031 minus00029 00074 minus00012

minus00008 minus00036 minus00013 minus00039 00096

Cal

Lob

Spic

Tex

NegCa

l

Lob

Spic

Tex

Neg

Figure 12 Performance difference matrix of fuzzy G2C-CAD and nonfuzzy G2C-CAD

making it suitable for use in the 3D medical image classifica-tion field

Data Availability

The data used to support the findings of this study areavailable from the corresponding author upon request

Disclosure

The publicly accessible medical images database of LIDCIDRI is used

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work was supported in part by National Natural ScienceFoundation of China (Grant nos 60973059 and 81171407) andProgram for New Century Excellent Talents in University ofChina (Grant no NCET-10-0044)

References

[1] ldquoCancersrdquo httpwwwwhointennews-roomfact-sheetsde-tailcancer

[2] ldquoCancer stat facts common cancer sitesrdquo httpsseercancergovstatfactshtmlcommonhtml

[3] S Diederich D Wormanns M Semik et al ldquoScreening forearly lung cancer with low-dose spiral CT prevalence in 817asymptomatic smokersrdquo Radiology vol 222 no 3 pp 773ndash7812002

[4] J F Williamson ldquoBrachytherapy technology and physics prac-tice since 1950 A half-century of progressrdquo Physics in Medicineand Biology vol 51 no 13 article no R18 pp R303ndashR325 2006

[5] J F Williamson S K Das M S Goodsitt and J O DeasyldquoIntroducing the medical physics dataset articlerdquo MedicalPhysics vol 44 no 2 pp 349-350 2017

[6] S Balik E Weiss N Jan et al ldquoEvaluation of 4-dimensionalcomputed tomography to 4-dimensional cone-beam computedtomography deformable image registration for lung canceradaptive radiation therapyrdquo International Journal of RadiationOncology ∙ Biology ∙ Physics vol 86 no 2 pp 372ndash379 2013

[7] C F Mountain ldquoRevisions in the international system forstaging lung cancerrdquo Chest vol 111 no 6 pp 1710ndash1717 1997

[8] M T van Rens J M van den Bosch A Brutel de la Riviereand H R Elbers ldquoPrognostic assessment of 2361 patients whounderwent pulmonary resection for non-small cell lung cancerstage I II and IIIArdquo CHEST vol 117 no 2 pp 374ndash379 2000

[9] E F Patz S Rossi D H Harpole J E Herndon and P CGoodman ldquoCorrelation of tumor size and survival in patientswith stage IA non-small cell lung cancerrdquo Chest vol 117 no 6pp 1568ndash1571 2000

[10] C I Henschke P Boffetta O Gorlova R Yip J O DeLanceyand M Foy ldquoAssessment of lung-cancer mortality reductionfrom CT screeningrdquo Lung Cancer vol 71 no 3 pp 328ndash3322011

[11] D Kumar A Wong and D A Clausi ldquoLung nodule classifi-cation using deep features in CT imagesrdquo in Proceedings of the12th Conference on Computer and Robot Vision CRV 2015 pp133ndash138 Canada June 2015

[12] M Kaneko K Eguchi and H Ohmatsu ldquoPeripheral lungcancer screening and detection with low-dose spiral CT versusradiographyrdquo Radiology vol 201 no 3 pp 798ndash802 1996

[13] S Swensen J Jett J Sloan et al ldquoScreening for lung cancerwith low-dose spiral computed tomographyrdquo American Journalof Respiratory and Critical Care Medicine vol 165 no 4 pp508ndash513 2002

[14] E Paci D Puliti A Lopes Pegna et al ldquoMortality survivaland incidence rates in the ITALUNG randomised lung cancerscreening trialrdquoThorax vol 72 no 9 pp 825ndash831 2017

[15] S Iwano TNakamura Y Kamioka andT Ishigaki ldquoComputer-aided diagnosis A shape classification of pulmonary nodulesimaged by high-resolution CTrdquo Computerized Medical Imagingand Graphics vol 29 no 7 pp 565ndash570 2005

BioMed Research International 13

[16] A Bernard and H Shennib ldquoResection of pulmonary nodulesusing video-assisted thoracic surgeryrdquo The Annals of ThoracicSurgery vol 61 no 1 pp 202ndash205 1996

[17] M J Mack S R Hazelrigg R J Landreneau and T E AcuffldquoThoracoscopy for the diagnosis of the indeterminate solitarypulmonary nodulerdquoThe Annals of Thoracic Surgery vol 56 no4 pp 825ndash832 1993

[18] S Siegelman E Zerhouni F Leo N Khouri and F StitikldquoCT of the solitary pulmonary nodulerdquo American Journal ofRoentgenology vol 135 no 1 pp 1ndash13 1980

[19] B A Keagy P J K Starek G F Murray J W Battaglini ME Lores and B R Wilcox ldquoMajor pulmonary resection forsuspected but unconfirmedmalignancyrdquoTheAnnals ofThoracicSurgery vol 38 no 4 pp 314ndash316 1984

[20] J Collins ldquoCT signs and patterns of lung diseaserdquo RadiologicClinics of North America vol 39 no 6 pp 1115ndash1135 2001

[21] G Han X Liu F Han et al ldquoThe LISSmdasha public databaseof common imaging signs of lung diseases for computer-aideddetection and diagnosis research and medical educationrdquo IEEETransactions on Biomedical Engineering vol 62 no 2 pp 648ndash656 2015

[22] G Han X LiuN Q Soomro et al ldquoEmpirical driven automaticdetection of lobulation imaging signs in lung CTrdquo BioMedResearch International vol 2017 2017

[23] C-Z Shi Q Zhao L-P Luo and J-X He ldquoSize of solitarypulmonary nodule was the risk factor of malignancyrdquo Journalof Thoracic Disease vol 6 no 6 pp 668ndash676 2014

[24] J J Erasmus J E Connolly H P McAdams and V L RogglildquoSolitary pulmonary nodules part I Morphologic evaluationfor differentiation of benign and malignant lesionsrdquo Radio-Graphics vol 20 no 1 pp 43ndash58 2000

[25] K Furuya S Murayama H Soeda et al ldquoNew classification ofsmall pulmonary nodules by margin characteristics on high-resolution CTrdquo Acta Radiologica vol 40 no 5 pp 496ndash5042016

[26] Z Yang S Sone S Takashima et al ldquoHigh-resolution CTanalysis of small peripheral lung adenocarcinomas revealed onscreening helical CTrdquo American Journal of Roentgenology vol176 no 6 pp 1399ndash1407 2001

[27] M F McNitt-Gray E M Hart N Wyckoff J W Sayre J GGoldin and D R Aberle ldquoA pattern classification approachto characterizing solitary pulmonary nodules imaged on highresolution CT Preliminary resultsrdquoMedical Physics vol 26 no6 pp 880ndash888 1999

[28] V Ambrosini S Nicolini P Caroli et al ldquoPETCT imaging indifferent types of lung cancer an overviewrdquoEuropean Journal ofRadiology vol 81 no 5 pp 988ndash1001 2012

[29] F Li S Sone H Abe H MacMahon and K Doi ldquoMalignantversus benign nodules at CT screening for lung cancer com-parison of thin-section CT findingsrdquo Radiology vol 233 no 3pp 793ndash798 2004

[30] B Ganeshan E Panayiotou K Burnand S Dizdarevic and KMiles ldquoTumour heterogeneity in non-small cell lung carcinomaassessed by CT texture analysis a potential marker of survivalrdquoEuropean Radiology vol 22 no 4 pp 796ndash802 2012

[31] F Li S Sone H Abe H MacMahon S G Armato and KDoi ldquoLung cancers missed at low-dose helical CT screening ina general population comparison of clinical histopathologicand imaging findingsrdquo Radiology vol 225 no 3 pp 673ndash6832002

[32] H Chen Y Xu Y Ma and B Ma ldquoNeural network ensemble-based computer-aided diagnosis for differentiation of lungnodules on CT images clinical evaluationrdquoAcademic Radiologyvol 17 no 5 pp 595ndash602 2010

[33] C V Zwirewich S Vedal R R Miller and N L Muller ldquoSoli-tary pulmonary nodule high-resolution CT and radiologic-pathologic correlationrdquo Radiology vol 179 no 2 pp 469ndash4761991

[34] Y J Jeong C A Yi and K S Lee ldquoSolitary pulmonary nodulesdetection characterization and guidance for further diagnosticworkup and treatmentrdquoAmerican Journal of Roentgenology vol188 no 1 pp 57ndash68 2007

[35] A N Khan H H Al-Jahdali C M Allen K L Irion S AlGhanem and S S Koteyar ldquoThe calcified lung nodule whatdoes it meanrdquo Annals of Thoracic Medicine vol 5 no 2 pp67ndash79 2010

[36] H C BeckerW J Nettleton P HMeyers JW Sweeney and CMNice ldquoDigital computer determination of amedical diagnos-tic index directly from chest X-ray imagesrdquo IEEE Transactionson Biomedical Engineering vol BME-11 no 3 pp 67ndash72 1964

[37] P H Meyers C M Nice H C Becker W J Nettleton J WSweeney and G R Meckstroth ldquoAutomated computer analysisof radiographic imagesrdquo Radiology vol 83 no 6 pp 1029ndash10341964

[38] B Van Ginneken B Ter Haar Romeny and M ViergeverldquoComputer-aided diagnosis in chest radiography a surveyrdquoIEEE Transactions on Medical Imaging vol 20 no 12 pp 1228ndash1241 2001

[39] K Suzuki F Li S Sone and K Doi ldquoComputer-aided diagnos-tic scheme for distinction between benign and malignant nod-ules in thoracic low-dose CT by use ofmassive training artificialneural networkrdquo IEEE Transactions on Medical Imaging vol 24no 9 pp 1138ndash1150 2005

[40] K Kanazawa Y Kawata N Niki et al ldquoComputer-aideddiagnosis for pulmonary nodules based on helical CT imagesrdquoComputerized Medical Imaging and Graphics vol 22 no 2 pp157ndash167 1998

[41] F Han H Wang G Zhang et al ldquoTexture feature analysis forcomputer-aided diagnosis on pulmonary nodulesrdquo Journal ofDigital Imaging vol 28 no 1 pp 99ndash115 2014

[42] H Hu Q Wang H Tang L Xiong and Q Lin ldquoMulti-slicecomputed tomography characteristics of solitary pulmonaryground-glass nodules Differences between malignant andbenignrdquoThoracic Cancer vol 7 no 1 pp 80ndash87 2016

[43] G-Y Zheng X-B Liu and G-H Han ldquoSurvey on medicalimage computer aided detection and diagnosis systemsrdquo RuanJian Xue BaoJournal of Software vol 29 no 5 pp 1471ndash15142018

[44] F H Edwards P S Schaefer S Callahan G M Graeber andR A Albus ldquoBayesian statistical theory in the preoperativediagnosis of pulmonary lesionsrdquoCHEST vol 92 no 5 pp 888ndash891 1987

[45] H Krewer B Geiger L O Hall et al ldquoEffect of texture featuresin computer aided diagnosis of pulmonary nodules in low-dosecomputed tomographyrdquo in Proceedings of the IEEE InternationalConference on Systems Man and Cybernetics (SMC rsquo13) pp3887ndash3891 IEEE Manchester UK October 2013

[46] C Jacobs E M van Rikxoort E T Scholten et al ldquoSolidpart-solid or non-solid classification of pulmonary nodulesin low-dose chest computed tomography by a computer-aideddiagnosis systemrdquo Investigative Radiology vol 50 no 3 pp 168ndash173 2015

14 BioMed Research International

[47] X Liu L Ma L Song Y Zhao X Zhao and C Zhou ldquoRec-ognizing common CT imaging signs of lung diseases througha new feature selection method based on fisher criterion andgenetic optimizationrdquo IEEE Journal of Biomedical and HealthInformatics vol 19 no 2 pp 635ndash647 2015

[48] T Sun J Wang X Li et al ldquoComparative evaluation of supportvectormachines for computer aided diagnosis of lung cancer inCT based on a multi-dimensional data setrdquo Computer Methodsand Programs in Biomedicine vol 111 no 2 pp 519ndash524 2013

[49] H Arimura S Katsuragawa K Suzuki et al ldquoComputerizedscheme for automated detection of lung nodules in low-dose computed tomography images for lung cancer screeningrdquoAcademic Radiology vol 11 no 6 pp 617ndash629 2004

[50] T Sun R Zhang J Wang X Li and X Guo ldquoComputer-aideddiagnosis for early-stage lung cancer based on longitudinal andbalanced datardquo PLoS ONE vol 8 no 5 Article ID e63559 2013

[51] F Han C L Novak S Aylward et al ldquoA new 3D texture featurebased computer-aided diagnosis approach to differentiate pul-monary nodulesrdquo Proceedings of the SPIE Article ID 86702Z2013

[52] K-L Hua C-H Hsu S C Hidayati W-H Cheng and Y-J Chen ldquoComputer-aided classification of lung nodules oncomputed tomography images via deep learning techniquerdquoOncoTargets andTherapy 2015

[53] Y LeCun L Bottou Y Bengio and P Haffner ldquoGradient-basedlearning applied to document recognitionrdquo Proceedings of theIEEE vol 86 no 11 pp 2278ndash2323 1998

[54] W Shen M Zhou F Yang C Yang and J Tian ldquoMulti-scaleconvolutional neural networks for lung nodule classificationrdquoin Proceedings of the International Conference on InformationProcessing in Medical Imaging vol 24 pp 588ndash599 Springer2015

[55] N Tajbakhsh and K Suzuki ldquoComparing two classes of end-to-end machine-learning models in lung nodule detection andclassificationMTANNs vs CNNsrdquo Pattern Recognition vol 63pp 476ndash486 2017

[56] H Greenspan B Van Ginneken and R M Summers ldquoGuesteditorial deep learning inmedical imaging overview and futurepromise of an exciting new techniquerdquo IEEE Transactions onMedical Imaging vol 35 no 5 pp 1153ndash1159 2016

[57] I J Goodfellow J Pouget-Abadie M Mirza et al ldquoGenerativeadversarial netsrdquo in Proceedings of the 28th Annual Conferenceon Neural Information Processing Systems 2014 NIPS 2014 pp2672ndash2680 Canada December 2014

[58] M J Chuquicusma S Hussein J Burt and U Bagci ldquoHowto fool radiologists with generative adversarial networks Avisual turing test for lung cancer diagnosisrdquo in Proceedings of the2018 IEEE 15th International Symposium Biomedical Imagingpp 240ndash244 IEEE 2018

[59] X Zhu ldquoSemi-supervised learning literature surveyrdquo ComputerScience University ofWisconsin-Madison vol 2 no 3 p 4 2006

[60] M Li andZ-H Zhou ldquoImprove computer-aided diagnosis withmachine learning techniques usingundiagnosed samplesrdquo IEEETransactions on Systems Man and Cybernetics Systems vol 37no 6 pp 1088ndash1098 2007

[61] A Radford L Metz and S Chintala ldquoUnsupervised represen-tation learning with deep convolutional generative adversarialnetworksrdquo 2015 httpsarxivorgabs07123011

[62] Y Liu Z Xing C Deng P Li and M Guo ldquoAutomaticallydetecting lung nodules based on shape descriptor and semi-supervised learningrdquo in Proceedings of the 2010 International

Conference on Computer Application and System ModelingICCASM 2010 pp V1647ndashV1650 China October 2010

[63] Y Peng and P Flach ldquoSoft discretization to enhance thecontinuous decision tree inductionrdquo Integrating Aspects of DataMining Decision Support and Meta-Learning vol 1 no 34 pp109ndash118 2001

[64] N Settouti M E Habib Daho M E Amine Lazouni and MA Chikh ldquoRandom forest in semi-supervised learning (Co-Forest)rdquo in Proceedings of the 2013 8th International Workshopon Systems Signal Processing and Their Applications WoSSPA2013 pp 326ndash329 Algeria May 2013

[65] S G Armato III G McLennan L Bidaut et al ldquoThe lungimage database consortium (lidc) and image database resourceinitiative (idri) a completed reference database of lung noduleson ct scansrdquoMedical Physics vol 38 no 2 pp 915ndash931 2011

[66] D E Ost and M K Gould ldquoDecision making in patientswith pulmonary nodulesrdquo American Journal of Respiratory andCritical Care Medicine vol 185 no 4 pp 363ndash372 2012

[67] M M Goodsitt H Chan T W Way S C Larson E GChristodoulou and J Kim ldquoAccuracy of the CT numbersof simulated lung nodules imaged with multi-detector CTscannersrdquoMedical Physics vol 33 no 8 pp 3006ndash3017 2006

[68] MMGoodsittHChanTWWayM J Schipper S C Larsonand E G Christodoulou ldquoQuantitative CT of lung nodulesdependence of calibration on patient body size anatomicregion and calibration nodule size for single- and dual-energytechniquesrdquoMedical Physics vol 36 no 7 pp 3107ndash3121 2009

[69] P Opulencia D S Channin D S Raicu and J D FurstldquoRadLex and lung nodule image featuresrdquo Journal of DigitalImaging vol 24 no 2 pp 256ndash270 2011

[70] J-C Wang S Sone L Feng et al ldquoRapidly growing smallperipheral lung cancers detected by screening CT Correlationbetween radiological appearance and pathological featuresrdquoBritish Journal of Radiology vol 73 no 873 pp 930ndash937 2000

[71] S Takashima Y Maruyama M Hasegawa A Saito M Hani-uda and M Kadoya ldquoHigh-resolution CT features prognosticsignificance in peripheral lung adenocarcinoma with bronchi-oloalveolar carcinoma componentsrdquo Respiration vol 70 no 1pp 36ndash42 2003

[72] R M Lindell T E Hartman S J Swensen et al ldquoFive-yearlung cancer screening experience CT appearance growth ratelocation and histologic features of 61 lung cancersrdquo Radiologyvol 242 no 2 pp 555ndash562 2007

[73] T J Kim D H Han K N Jin and K Won Lee ldquoLung cancerdetected at cardiac CT prevalence clinicoradiologic featuresand importance of fullndashfield-of-view imagesrdquo Radiology vol255 no 2 pp 369ndash376 2010

[74] MVazquez D Carter E Brambilla et al ldquoSolitary andmultipleresected adenocarcinomas after CT screening for lung cancerhistopathologic features and their prognostic implicationsrdquoLung Cancer vol 64 no 2 pp 148ndash154 2009

Stem Cells International

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

MEDIATORSINFLAMMATION

of

EndocrinologyInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Disease Markers

Hindawiwwwhindawicom Volume 2018

BioMed Research International

OncologyJournal of

Hindawiwwwhindawicom Volume 2013

Hindawiwwwhindawicom Volume 2018

Oxidative Medicine and Cellular Longevity

Hindawiwwwhindawicom Volume 2018

PPAR Research

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Immunology ResearchHindawiwwwhindawicom Volume 2018

Journal of

ObesityJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational and Mathematical Methods in Medicine

Hindawiwwwhindawicom Volume 2018

Behavioural Neurology

OphthalmologyJournal of

Hindawiwwwhindawicom Volume 2018

Diabetes ResearchJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Research and TreatmentAIDS

Hindawiwwwhindawicom Volume 2018

Gastroenterology Research and Practice

Hindawiwwwhindawicom Volume 2018

Parkinsonrsquos Disease

Evidence-Based Complementary andAlternative Medicine

Volume 2018Hindawiwwwhindawicom

Submit your manuscripts atwwwhindawicom

Page 3: A Novel Computer-Aided Diagnosis Scheme on Small Annotated ...downloads.hindawi.com/journals/bmri/2019/6425963.pdf · BioMedResearchInternational 100 Reshape Deconv Deconv Deconv

BioMed Research International 3

GAN

Fuzzy Co-forest

Nodulepatches

Generator

Discriminator

Loss

Noise

100

CNNlayers

Leaky ReLU

middotmiddot

Feature Vector

Voting

k1

k2

kn

K

retraining

Figure 1 GAN- and CNN-based Co-forest computer aided diagnosis (G2C-CAD) system workflow

based on the FIsher criterion and genetic optimization (FIG)to address Common CT Imaging Signs of Lung (CISL)disease recognition problems They applied the FIG featureselection algorithm to bag-of-visual-words features wavelettransform-based features the Local Binary Pattern CT ValueHistogram features and others Their results showed thatFIG achieved high computational efficiency and was highlyeffective Tao Sun et al [48] investigated an SVM-basedCADx system for lung cancer classification using a total of488 input features that included textural features patientcharacteristics and morphological features to train the clas-sifier Hidetaka Arimura et al [49] developed a computerizedscheme to automatically detect lung nodules in LDCT imagesfor lung cancer screening They extracted possible noduleimages using a ring average filter identified a set of nodulecandidates by applying a multiple-gray-level thresholdingtechnique and removed false positives by using two rule-based schemes on the localized image features related tomorphology and gray levels Tao Sun et al [50] proposeda CADx system to predict the characteristics of solitarypulmonary nodules in lung CT to diagnose early stage lungcancer In their CADx systeman SVMmodelwas constructedthat exploited curvelet transform texture features 3 patientdemographic features and 9 morphological features Fang-fang Han et al [51] constructed a CADx system based on 50categories of 3D textural features extracted from gray levels acurvature cooccurrence matrix and gradients as well as othernodule volume data derivatives

The above CAD systemsmainly utilized features obtainedby traditional feature extraction algorithms Traditional fea-ture acquisition is based on manual design and selection

which requires experts with specialized heuristic knowledgeThe features obtained in this way are low-level features nearthe pixel level and the work scope is relatively small In theimage analysis workflow the final performance of the systemalso depends on the quality of prior preprocessing or seg-mentation stagesTherefore in the traditional CAD solutionstuning the classification performance is both complicated andarduous [52]

The emergence of the convolutional neural network(CNN) [53] solved this dilemma In 2012 the emergenceof AlexNet sparked a revival in the image detection fieldthrough deep learning techniques based on CNN featuresand CNNs have subsequently been used extensively inpulmonary imaging analysis Compared to traditional algo-rithms CNNs can extract more distinctive features auto-matically Two studies [52 54] demonstrated that CNNsare a promising technique in lung nodule identificationDeep learning techniques have the inherent superiority ofbeing able to automatically extract features and adjust theperformance seamlessly

In traditional CAD algorithms training only needs toseek an optimal discriminant surface from the manuallydesigned feature space In contrast deep learning networkssimultaneously attempt to find both the most significantdiscriminate features among large numbers of high-levelfeatures and an optimal classification surface As a resulttraining a deep learning network requires a massive amountof labeled samples [55]mdasha requirement that cannot be met inthe medical image analysis field because professional anno-tation is too expensive [56] Data augmentation techniquesproduce only limited effects Although transfer learning can

4 BioMed Research International

100Reshape Deconv Deconv Deconv

4times4times128 8times8times64 16times16times3232times32times3

Generator

Conv Conv Conv Full Connection

32times32times3 16times16times32 8times8times64 4times4times1281

DiscriminatorFigure 2 The architecture of the generator and the discriminator in the DCGANmodel

alleviate the problem of the lack of training examples to acertain extent the significant feature sequences vary betweendifferent classification tasks Thus transfer learning is not themost suitable method to cope with medical image analysistasks

Generative adversarial networks (GANs) [57] havedemonstrated the promising ability to generate visually real-istic images A GAN trained with limited annotated samplescan generate large numbers of realistic images ChuquicusmaMJ et al [58] conducted visual Turing tests to evaluate thedegree of realism in nodule images generated by a DCGANand showed that the generated samples can be used to boostthe diagnostic power by mining high-level discriminativeimage features and that the resulting features can be used totrain both radiologists and deep networks

Semisupervised learning is a type of machine learningmethod that combines supervised and unsupervised learn-ing It can be applied when only a small number of labeleddata exist but a large number of unlabeled data are availableSemisupervised learning is important for reducing the cost ofacquiring labeled data and improving classifier performanceCommonly used methods include EM with generative mix-ture models self-training cotraining transductive supportvector machines and graph-based methods [59] Co-forestis a cotraining based semisupervised learning algorithm thatfirst learns an initial classifier from a small amount of labeleddata and then refines the classifier by further exploitinga larger number of unlabeled data to boost the classifierrsquosperformance When applying micro calcification detectionfor breast cancer diagnosis Ming Li et al [60] showed thatCo-forest can successfully enhance the performance of amodel trained on only a small amount of diagnosed samplesby utilizing the available undiagnosed samples

The above works inspired us to exploit a GAN to enlargeand enrich a training set of pulmonary nodules It alsomotivated us to implement the designed G2C-CAD

3 Materials and Methods

31 Experimentation Materials Insufficient labeled samplesrepresents a barrier to CAD progressThe emergence of GAN[57] begun to change this situation A GAN consists of twomain parts a generator G and a discriminator D The G isused to learn the distribution of real imagesThen it generatesrealistic images to attempt to fool the D The D attempts toperform true and false discriminations concerning receivedimages Throughout the process the G strives to make thegenerated image more realistic while the D tries to identifythe true and false images This process is equivalent to agame with two opponents Over time the G and D eventuallyachieve a dynamic equilibrium in which images generated bythe generator are highly similar to the real image distributionand the discriminator cannot determine whether a sampleis drawn from the true data or generated by the generatorDCGAN [61] is a GAN extension in which a CNN isintroduced to conduct unsupervised training The ability ofthe CNN to extract features is used to enhance the training ofthe generation network Building on this idea we constructeda 32 times 32 input scale DCGANThe discriminator architecturein DCGAN has 4 layers as shown in Figure 2

We tested a GAN trained from 9 samples to generateunlabeled-nodule sign patches and the result is gratifying asshown in Figure 3

32 CNN Feature-Based Fuzzy Co-Forest Method As dis-cussed in Section 31 whenwe gain a trainedGANwe also geta trained CNN discriminator simultaneously For each imagepatch passed though the discriminator we obtain a 128-dimensional CNN feature vector from the last convolutionallayer Features of 4 times 4 is hard for eyes to discern inFigure 4 we show an example of 32-dimensional 16 times 16CNN features of a sign patch extracted from layer 1 ofD

BioMed Research International 5

(a) Original sign patches (b) Patches generated by DCGAN

Figure 3 Samples of DCGAN-generated sign patches and real samples The image on the left shows examples of the seed samples used fortraining Most of the generated samples in (b) are realistic although some of the generated patches can be recognized as being fake (the 5thone in row 3) they do help to boost the CAD systemrsquos performance

(a) Input patch (b) Corresponding features

Figure 4 (a) An original input sign patch (b) the corresponding 32 CNN features abstracted from the layer 1 of the discriminator D

Cotraining random forest (Co-forest) is an upgradedalgorithm for the cotraining paradigmThe standard cotrain-ing algorithm has two strong assumptions (1) the samplesdistribution is consistent with that of the target functionsand (2) the different features extracted from the same datashould be conditionally independent Inmost cases howeverthese two strong assumptions are difficult to satisfy Co-forestuses an ensemble consisting of multiple classifiers to avoidthe constraints of standard cotraining The specific structureof random forests enables the Co-forest to take advantageof semisupervised learning and ensemble learning to betterlearn the distribution of the training data Existing Co-forestworks are based on traditional manually designed features[60 62] Here we try to extend the Co-forest approach todeep neural networks by utilizing the CNN features obtainedfrom the GANrsquos discriminator

For each generated realistic sign from DCGAN we canextract a 128-element 4 times 4 CNN feature vector Each 4 times4 CNN feature can be transformed to a 1 times 16 vector Ifwe assume that DCGAN runs N times then we will obtainN CNN feature vectors from the discriminator These CNNfeature vectors for the N image patches can be expressed by amatrix

Original F =[[[[[[[

11989110 119891112711989120 11989121271198911198730 119891119873127

]]]]]]]

(1)

119891 = [1198900 11989015] (2)

where in (1) f is a 16-element vector transformed from a 4 times4 feature patch Every row in Original F represents the 128-dimensional CNN features from one input image patch Thefeatures in each column are produced from the same filterThe n in 119891119899119898 represents the 119899th sign in N and m representsthemth feature of a signWe build a complete reference vector119903 = [1 1 1] |119903| = 16 The cosine similarity of any featuref to r can be calculated as a

119886 = 1198900 lowast 1 + 1198901 lowast 1 + sdot sdot sdot + 11989015 lowast 1radic11989002 + 11989012 + sdot sdot sdot + 119890152 times radic12 + 12 + sdot sdot sdot + 12 (3)

where 119890119894 is an element of f We use a distance matrix A ofthe relative cosine distances to r to replace the feature matrixOriginal F

A =[[[[[[[

11988610 119886112711988620 11988621271198861198730 119886119873127

]]]]]]] (4)

From matrices (1) and (4) we can see that for any twoelements 119886i and 119886j a smaller difference between the valuesof 119886i and 119886j indicates that their corresponding features 119891i and119891j are more similar The elements in each column of A area series of continuous-valued data To build a Co-forest the

6 BioMed Research International

Child node t2

a1=0973a1=0977

t1=0975

c3

(a)

t2=0826

c1 c2

(b)

Figure 5 Traditional classification process

2

Child node t2

a1=0973

t1=0975

a=0977

c3

(a)

t2=0826

c1 c2

2

(b)

lt0975gt0975

lt0826gt0826

c1c2

c3t2

t1

(c)

Figure 6 Here (a) and (b) are fuzzy classification processes and (c) is the decision tree

first step is to construct decision trees by utilizing the labeledsamples [63] Assume that the total number of samples is NLand that the classes are c1 c2 119888k k To build a decision treewe randomly select S samples These S samples are sorted onA[119909] by the values of 119886i where x represents the xth column ofA We calculate the middle value between 119886ix and 119886i+1x as asplit point 119905ix The class information entropy of S is

Info (S) = minus 119896sum119895=1

119901119895 log2 (119901119895) (5)

where Pj represents the proportion of the category j samplesrelative to all samples and Sj is a subset of S constructed byall the elements of S belonging to class cj 119901j = |Sj||S|

When selecting feature 119905ix as the splitting node of thedecision tree the information entropy of S is

Info (A [x] 119905ix S) =10038161003816100381610038161003816119878119895lt119905119894119909 10038161003816100381610038161003816|119878| times Info (119878119895lt119905119894119909) +

10038161003816100381610038161003816119878119895gt119905119894119909 10038161003816100381610038161003816|119878|times Info (119878119895gt119905119894119909)

(6)

Then we compute the information gain

Gain (A [x] 119905ix S) = Info(S)-Info (A [x] 119905ix S) (7)

The split information entropy is

Split (A [x] 119905ix S) = minus(10038161003816100381610038161003816119878119886lt119905119894119909 10038161003816100381610038161003816|119878| times log2 (

10038161003816100381610038161003816119878119886lt119905119894119909 10038161003816100381610038161003816|119878| )

+10038161003816100381610038161003816119878119886gt119905119894 10038161003816100381610038161003816|119878| times log2(

10038161003816100381610038161003816119878119886gt119905119894119909 10038161003816100381610038161003816|119878| )) (8)

and the information gain ratio is

IGR = Gain (A [x] 119905119894119909 S)Split (A [x] 119905119894119909 S) (9)

In a column of attributes A[x] a suitable threshold 119905maxto divide A[x] into two intervals is A[x]1(119886i lt 119905max 119886i isinA[x]1) A[x]2(119886i gt 119905max 119886i isin A[x]2) This split producesthe maximal information gain ratio This 119905max is then selectedas the parent node to generate two children This processis conducted recursively to build a decision tree until atermination criterion is matched

In the reasoning process if a samplersquos attribute value fallsinto a small region around 119905ix after it is disturbed by noiseit can easily be misclassified As shown in Figure 5 supposewe have a sample whose real attribute values are a1=0977and a2=0827 however due to noise during collection a1 ischanged to 0973 In a traditional decision tree this samplewould be misclassified as 1198882

To avoid this fragility when constructing a decision treewe utilize a fuzzy scheme [63] In a traditional decision treewhen a samplersquos attribute a is greater than 119905ix the samplebelongs to either 119888m or 119888n We modify this crisp classifyingmethod by selecting a neighborhood threshold 120576 around thesplit point 119905ix as shown in Figures 6(a) and 6(b) so that theclassification function becomes

C (119904) = 119905119894 + 120576 minus a2120576 times c119898 119886 minus 119905119894 + 1205762120576 times c119899 (10)

where C(s) is a classification function that maps sample s toone of two weighted classes

BioMed Research International 7

Input the labeled set LBThe threshold 120599 of the confidence T is random treesrsquo number

ProcessInitialize a random forest containing T treesfor i in 1 T

ei0 = 05Wi0 = 0

endfort = 0Loop until all of the trees in Random Forest unchanged

t = t+1z = a new random vectorfor i in 1 T119890119894119905 = ClassificationErrorRate(119867119894L)1198711198611015840119894119905 = 120601

if(119890it lt 119890it-1)1198801015840119894119905 = 119878119886119898119901119897119890119889(119866 (119911) 119890119894119905minus1119882119894119905minus1119890119894119905 )for 119909u in1198801015840119894119905

if (119872119886119909119862119900119899119891119894119889119890119899119888119890(119864i 119909u) gt 120579)1198711198611015840119894119905 = cup(119909u 119864i(119909u))119882119894119905 = 119882119894119905 +119872119886119909119862119900119899119891119894119889119890119899119888119890(119864i 119909u)endfor

endforfor i in 1 T

if(119890119894119905119882119894119905 lt 119890119894119905minus1119882119894119905minus1)119890i = LearnRandomTree(119871119861 cup 1198711198611015840119894119905)endfor

endLoopOutput E lowast (x) = argmaxsumfuzzy classes set119910 isin 119897119886119887119890119897 i 119890i(119909) = fuzzy classes set

Algorithm 1 G2C-CAD

As shown in Figure 6(c) along one branch the finalclassification 119888i is calculated as follows

119888i = 119898prod119895=1

119905119895 + 120576 minus a1198952120576 left branch of 119905119895119886119895 minus 119905119895 + 1205762120576 right branch of 119905119895

(11)

where 119888i is one class from 1198881 1198882 119888kWhen 120576 = 01 under the fuzzy classifying scheme

the classification result of disturbed example s is C(119904) =0253lowast1198881 0258lowast1198882 049lowast1198883 It is obvious that the maximalprobability of the final decision result is max(C(s)) = 1198883Based on this classification example we can see that the fuzzydecision scheme is more robust to noise

The classification result of the fuzzy Co-forest is amultiple-label probability distribution of the union of thedecision treesrsquo output we consider the class with the highestprobability as the final output

Let LB denote the labeled set G(z) denotes the process ofgenerating a new fake sign-image patch utilizing the trainedDCGAN There are N classifiers in the Co-forest ensemble119864lowast 119890i (i = 1 N) We denote one of the classifiers inensemble 119864i as the concomitant ensemble of 119890i and createa subensemble that includes all the classifiers except 119890i For anewly generated sample from G if the max fuzzy vote sum

of the classifiers in concomitant ensemble 119864i exceeds a presetthreshold 120579 the sample will be copied to a newly labeled set1198711198611015840119894 with the new assigned label Based on [64] the processiterates until 119890119894119905minus1119882119894119905minus1119890119894119905 is larger than 119882119894119905Then the set119871119861 cup 1198711198611015840119894119905 is used to refine 119890i 119890 denotes the classificationerror rate and 119882119894119905 = sum119898119894119905119895=0119908119894119905119895 where 119908119894119905119895 is the predictedconfidence of 119864119894 on 1198711198611015840119894119905 and119898119894119905 is the size of set 1198711198611015840119894119905

Based on the fuzzy decision tree scheme we construct thefuzzy Co-forest as shown in Algorithm 1

4 Experiments

41 Datasets We collected sample instances from both theLIDC-IDRI and LISS datasets LIDC-IDRI [65] consists ofpulmonary medical image files (such as CT scans and X-rays) with corresponding pathological annotations The datawere collected by the National Cancer Institute to studyearly cancer detection in high-risk populations LISS [21]consists a set of CISLs collected by the Cancer Institute andHospital at the Chinese Academy ofMedical Sciences and theBeijing Institute of Technology intended for computer-aideddetection and diagnosis research andmedical education LISScontains 271 CT scans and 677 abnormal regions includingnine categories of CISLs

8 BioMed Research International

3 6 9 12 15 18 21 24 27 30 33 36 39 42 45 48 54 101

Diameter length summary

0100200300400500600700

Figure 7 The diameter-size distribution of the nodules in the LIDC-IDRI dataset

Table 1 The scheme of sign selection

Used as signs AbandonedSubtlety - -Internal structure - -Calcification =4 ltgt4Sphericity - -Margin - -Lobulation gt=4 lt4Spiculation gt=4 lt4Texture lt=2 gt2

42 LIDC-IDRI Instances In the LIDC-IDRI CT imagingslices most of the annotated nodules have diameters less than32 pixels as shown in Figure 7 Therefore we choose 32 times 32as the input ROI size

A higher degree of lobulation speculation and nonsolidtexture signs indicates a greater probability of malignantnodules [66] A calcification sign usually indicates a benignnodule [66ndash68] except when it has a noncentral appearanceSigns of subtlety [69] internal structure [27] sphericity andmargin [33] have not been clearly proven to have a strongrelationship with malignancy To simplify the comparisonsin this experiment we selected nodules from LIDC withcalcification =4 (noncentral calcification) lobulation gt=4spiculation gt=4 texture lt=2 and malignancy gt=3 as exper-imental instances based on the selection rules shown inTable 1

LIDC-IDRI contains 21057 annotated nodules Weselected the 4 category nodules with signs such as noncen-tral calcification lobulation speculation and nonsolidGGOtexture that have a high prevalence of malignancy as theexperimental objects For the probability of malignancy inthese nodules we adopted the average value of 4 radiologistsrsquoscores The noncentral calcification lobulation spiculationand nonsolidGGO texture signs are illustrated in Table 2 Inaddition we randomly extracted 32 times 32-pixel image patchesfrom slices not annotated by any radiologist as negativesamples In total we used five types of image blocks in ourexperiment

Based on the center point of the merged regions anno-tated by 4 experts we extracted 32 times 32-pixel image blocksas the experimental input by following the selection criteriashown in Table 1 Among these samples we ensured that

a given category of ROI patches for a single patient doesnot appear in any two subsets simultaneously which helpedensure that the specificity of the trained individual networksis as high as possible

43 LISS Instances From LISS we selected signs of lobula-tion spiculation pleural indentation and GGO associatedwith malignant lung cancer [70ndash74] as the experimentalobjects The number of samples in each category is shown inTable 3

44 Evaluation Criteria To evaluate the performance ofthe algorithm presented in this paper we considered thefollowing criteria

(1) ROC The Receiver Operator Characteristic curve(ROC) is a method that comprehensively and simul-taneously reflects the sensitivity and specificity of theclassification result By comparing the classificationresults of different samples with the annotation labelsa series of sensitivity and specificity scores is cal-culated Then a curve is drawn using sensitivity asthe ordinate and 1 minus specificity as the abscissa Alarger area under the curve (AUC) indicates a higherdiagnostic accuracy On the ROC curve the pointclosest to the top left of the coordinate diagram is thecritical value that reflects the highest sensitivity andspecificity

(2) Confusion matrix A confusion matrix is also calledan error matrix and it is a visual representation of theclassification effect A confusion matrix can be usedto describe the relationship between the real categoryattribute of the sample data and the recognition resultIt is a method for evaluating classifier performanceand iswidely used in pattern recognition A confusionmatrix is also a performance evaluation method thatscholars often use when solving practical applicationproblems

45 Experimentation From the LIDC-IDRI database weacquired 590 noncentral calcification 565 lobulation 576spiculation 545 nonsolidGGO texture sign patches and2500 negative image patchesThe experiment was conductedaccording to the following steps

(1) Set the initial values of T 119890 and 120579 to 6 05 and 06respectively

BioMed Research International 9

Table 2 Sign samples of calcification lobulation spiculation and nonsolidGGO texture

Noncentral calcification Lobulation Spiculation NonsolidGGO texture

Subtlety 467 50 50 40 40 50 367 30Internal structure 10 10 10 10 10 10 10 10Calcification 40 40 60 60 60 60 60 60Sphericity 333 267 425 40 50 50 433 40Margin 367 467 45 40 20 40 267 10Lobulation 167 20 425 40 40 10 267 10Spiculation 133 133 225 40 40 50 167 20Texture 50 467 475 40 50 40 133 10

Table 3 Number of selected instances of LISS

CISL Category of Lesion RegionsGGO 45Lobulation 41Spiculation 29Pleural Indentation 45Negative 80

(2) Train the GAN until the discrepancy cost reaches abalance as illustrated in Figure 8

(3) Train a primary fuzzy Co-forest based on the originalsamples utilizing the features exported from thetrained DCGAN discriminator in Step (2)

(4) Input a random vector to DCGAN and transmit thegenerated features to the concomitant random fuzzydecision forest119864i in the primary Co-forest until119882119894119905 ge119890119894119905minus1119882119894119905minus1119890119894119905 for all classesIn this process if the maximal weight of the fuzzylabel from 119864i exceeds the threshold 120579 store bothfeatureswith thematched label otherwise discard thegenerated image and generate a new image

(5) Retrain the corresponding tree 119890i of 119864i(6) Test the performance of the system

As a comparison we trained a C45 random forest accordingto the same schemewhich has been used byG2C-CAD as thebaseline method

We also conducted experiments on samples obtainedfrom LISS

5 Results

We divided the dataset into two parts 90 for training and10 for validation In this way the nodule distribution in thevalidation subsets is consistent with the nodule distributionof the original dataset according to the radiologistsrsquo consensusof their evaluations at the nodule levelThe number of trees intheCo-forest classificationmodelwas 6 primary trainingwasconducted on the training data and finally the system was

further validated on the remaining 10 The sensitivity andspecificity of each class instance was calculated and comparedusing the ROC curves shown in Figure 9

In Figure 9 theAUCsof noncentral calcification negativeimage samples lobulation spiculation and nonsolidGGOtexture are 0946 0939 0912 0908 and 0887 respectivelyFrom the curves in Figure 9 we can see that the G2C-CADsystem achieves the highest overall classification accuracy onthe calcification sign Compared to noncentral calcificationand negative image samples the AUCs of lobulation spicula-tion and nonsolidGGO texture are relatively lower

To show the underlying classification error distribution aconfusion matrix of the 5 classification results is presented inFigure 10

The diagonal numbers in the confusion matrix representthe recognition accuracy rates of the corresponding categoryand the nondiagonal elements are misclassification rates iethe ratio of other category test samples that weremisclassifiedto this class From Figure 10 we can see that the noncentralcalcification sign has the highest classification accuracywhile the nonsolidGGO texture sign has the highest mis-classification rate Misclassified spiculation signs are mostlyrecognized as lobulation By comparing the test sampleswe find that many of these two types of samples have verysimilar textures From the confusion matrix we can alsosee that most misclassifications occur between lobulationspiculation and nonsolidGGO texture signs

6 Discussion

We performed a performance comparison on each categorybetween G2C-CAD and the C45 random forest modelusing the confusion matrix First we constructed confusionmatrices reflecting the accuracies of G2C-CAD and C45Then we calculated the difference matrix for those twoconfusion matrices as shown in Figure 11 The float numberson the diagonal represent the differences between G2C-CAD and C45 regarding their classification accuracy forthe corresponding category while the float numbers on thenondiagonal represent their differences in the classificationerror rate showing the numbers for each class that weremisclassified into the corresponding category

10 BioMed Research International

100000 200000 300000 4000000iteration

minus70

minus60

minus50

minus40

minus30

minus20

minus10

0

trai

n di

sc co

st

Figure 8 Training discrepancy cost of DCGAN

ROC Curves

AUCNon-central Calcification A = 0946Negative A = 0939Lobulation A = 0912Spiculation A = 0908Non-solidGGO Texture A = 0887

00

2

4

6

8

10

Sens

itivi

ty

2 4 6 8 10001 - Specificity

Figure 9 ROC curves for the 5 categories of sign ROI recognition results

Consequently positive numbers on the diagonal andlarger values indicate that G2C-CAD achieved a betterperformance than that of C45 For the elements that arenot on the diagonal the situation is the opposite As shownin Figure 11 no positive values occur in the nondiagonalelements which means that G2C-CAD possesses greaterdiscrimination ability between each category The elementvalues on the diagonal are all positive and the average valueof the diagonal numbers in the difference confusion matrixis 0144 demonstrating that our method has a better overallperformance

To verify the effectiveness of the fuzzy algorithm we alsoconducted multiclassification experiments using G2C-CADwithout the fuzzy algorithm and compared the performancesof the two algorithms using a difference confusion matrix asshown in Figure 12

FromFigure 12 we can see that the discrimination perfor-mance of the CAD system employing the fuzzy algorithm isobviously better than that of the nonfuzzy one Furthermoreas shown in Figure 12 the CAD performance of the fuzzyalgorithm better distinguishes the difficult lobulation signfrom the spiculation sign

BioMed Research International 11

Cal 0933 005 0011 0002 0004

Lob 0 0907 006 0026 0007

Spic 0 0077 0891 0032 0

Tex 001 0064 0046 0878 0002

NegCa

l

Lob

Spic

Tex

Neg

0 0034 0028 0012 0926

Figure 10 Confusion matrix of the predicted results

Cal 0119 minus0051 minus00039 minus00623 minus00018

Lob minus00183 01286 minus00994 minus00101 minus00008

Spic minus00067 minus01065 01317 minus0028 minus0024

Tex minus00422 minus00627 minus01119 02175 minus00007

Neg

Cal

Lob

Spic

Tex

Neg

minus00002 minus006 minus00135 minus00495 01232

Figure 11 Performance difference matrix of G2C-CAD and C45

The experimental result on LISS shows higher perfor-mance the areas under the ROC curve of GGO lobulationspiculation pleural indentation and negative image samplesare 0972 0964 0941 0967 and 0953 respectively Bycomparing the training dataset and test samples we foundthat the main reason may be that the samples in differentcategories are more separable visually

7 Conclusion

In this paper by coupling a GAN with a semisupervisedlearning approach we proposed a G2C-CAD method todetect signs that are highly correlated with malignant pul-monary nodules We first trained a DCGAN on a smallsample set Then we extracted the features from the CNNdiscriminator trained with the training samples and usedthem to train a primary fuzzy Co-forest classifier Then weuse the trained DCGAN to generate large amounts of realistic

fake samples Based on these fake samples we conductedsemisupervised learning with the fuzzy Co-forest and finallyobtained a classifierwith excellent performance By validatingon the LIDC dataset the area under the ROC curve for fivesign types noncentral calcification negative image sampleslobulation spiculation and nonsolidGGO texture reached0946 0939 0912 0908 and 0887 respectively On the LISSdataset the proposed system showed comprehensively higherclassification performances than those of a trained C45classifier The experimental results show that the proposedG2C-CAD is an appropriate method for solving the problemof insufficient samples in the medical image analysis fieldMoreover our system can also be used to establish a trainingsample library for CAD classification diagnosis which holdsgreat significance for future medical image analysis

In future work we plan to combine this method withmultiple-instance learning to performweak supervised learn-ing directly on CT slice images or to extend the algorithm

12 BioMed Research International

00083 minus0002 minus00023 minus00032 minus00008

minus00013 00159 minus00074 minus00061 minus00011

00011 minus00044 00122 minus00042 minus00025

minus00002 minus00031 minus00029 00074 minus00012

minus00008 minus00036 minus00013 minus00039 00096

Cal

Lob

Spic

Tex

NegCa

l

Lob

Spic

Tex

Neg

Figure 12 Performance difference matrix of fuzzy G2C-CAD and nonfuzzy G2C-CAD

making it suitable for use in the 3D medical image classifica-tion field

Data Availability

The data used to support the findings of this study areavailable from the corresponding author upon request

Disclosure

The publicly accessible medical images database of LIDCIDRI is used

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work was supported in part by National Natural ScienceFoundation of China (Grant nos 60973059 and 81171407) andProgram for New Century Excellent Talents in University ofChina (Grant no NCET-10-0044)

References

[1] ldquoCancersrdquo httpwwwwhointennews-roomfact-sheetsde-tailcancer

[2] ldquoCancer stat facts common cancer sitesrdquo httpsseercancergovstatfactshtmlcommonhtml

[3] S Diederich D Wormanns M Semik et al ldquoScreening forearly lung cancer with low-dose spiral CT prevalence in 817asymptomatic smokersrdquo Radiology vol 222 no 3 pp 773ndash7812002

[4] J F Williamson ldquoBrachytherapy technology and physics prac-tice since 1950 A half-century of progressrdquo Physics in Medicineand Biology vol 51 no 13 article no R18 pp R303ndashR325 2006

[5] J F Williamson S K Das M S Goodsitt and J O DeasyldquoIntroducing the medical physics dataset articlerdquo MedicalPhysics vol 44 no 2 pp 349-350 2017

[6] S Balik E Weiss N Jan et al ldquoEvaluation of 4-dimensionalcomputed tomography to 4-dimensional cone-beam computedtomography deformable image registration for lung canceradaptive radiation therapyrdquo International Journal of RadiationOncology ∙ Biology ∙ Physics vol 86 no 2 pp 372ndash379 2013

[7] C F Mountain ldquoRevisions in the international system forstaging lung cancerrdquo Chest vol 111 no 6 pp 1710ndash1717 1997

[8] M T van Rens J M van den Bosch A Brutel de la Riviereand H R Elbers ldquoPrognostic assessment of 2361 patients whounderwent pulmonary resection for non-small cell lung cancerstage I II and IIIArdquo CHEST vol 117 no 2 pp 374ndash379 2000

[9] E F Patz S Rossi D H Harpole J E Herndon and P CGoodman ldquoCorrelation of tumor size and survival in patientswith stage IA non-small cell lung cancerrdquo Chest vol 117 no 6pp 1568ndash1571 2000

[10] C I Henschke P Boffetta O Gorlova R Yip J O DeLanceyand M Foy ldquoAssessment of lung-cancer mortality reductionfrom CT screeningrdquo Lung Cancer vol 71 no 3 pp 328ndash3322011

[11] D Kumar A Wong and D A Clausi ldquoLung nodule classifi-cation using deep features in CT imagesrdquo in Proceedings of the12th Conference on Computer and Robot Vision CRV 2015 pp133ndash138 Canada June 2015

[12] M Kaneko K Eguchi and H Ohmatsu ldquoPeripheral lungcancer screening and detection with low-dose spiral CT versusradiographyrdquo Radiology vol 201 no 3 pp 798ndash802 1996

[13] S Swensen J Jett J Sloan et al ldquoScreening for lung cancerwith low-dose spiral computed tomographyrdquo American Journalof Respiratory and Critical Care Medicine vol 165 no 4 pp508ndash513 2002

[14] E Paci D Puliti A Lopes Pegna et al ldquoMortality survivaland incidence rates in the ITALUNG randomised lung cancerscreening trialrdquoThorax vol 72 no 9 pp 825ndash831 2017

[15] S Iwano TNakamura Y Kamioka andT Ishigaki ldquoComputer-aided diagnosis A shape classification of pulmonary nodulesimaged by high-resolution CTrdquo Computerized Medical Imagingand Graphics vol 29 no 7 pp 565ndash570 2005

BioMed Research International 13

[16] A Bernard and H Shennib ldquoResection of pulmonary nodulesusing video-assisted thoracic surgeryrdquo The Annals of ThoracicSurgery vol 61 no 1 pp 202ndash205 1996

[17] M J Mack S R Hazelrigg R J Landreneau and T E AcuffldquoThoracoscopy for the diagnosis of the indeterminate solitarypulmonary nodulerdquoThe Annals of Thoracic Surgery vol 56 no4 pp 825ndash832 1993

[18] S Siegelman E Zerhouni F Leo N Khouri and F StitikldquoCT of the solitary pulmonary nodulerdquo American Journal ofRoentgenology vol 135 no 1 pp 1ndash13 1980

[19] B A Keagy P J K Starek G F Murray J W Battaglini ME Lores and B R Wilcox ldquoMajor pulmonary resection forsuspected but unconfirmedmalignancyrdquoTheAnnals ofThoracicSurgery vol 38 no 4 pp 314ndash316 1984

[20] J Collins ldquoCT signs and patterns of lung diseaserdquo RadiologicClinics of North America vol 39 no 6 pp 1115ndash1135 2001

[21] G Han X Liu F Han et al ldquoThe LISSmdasha public databaseof common imaging signs of lung diseases for computer-aideddetection and diagnosis research and medical educationrdquo IEEETransactions on Biomedical Engineering vol 62 no 2 pp 648ndash656 2015

[22] G Han X LiuN Q Soomro et al ldquoEmpirical driven automaticdetection of lobulation imaging signs in lung CTrdquo BioMedResearch International vol 2017 2017

[23] C-Z Shi Q Zhao L-P Luo and J-X He ldquoSize of solitarypulmonary nodule was the risk factor of malignancyrdquo Journalof Thoracic Disease vol 6 no 6 pp 668ndash676 2014

[24] J J Erasmus J E Connolly H P McAdams and V L RogglildquoSolitary pulmonary nodules part I Morphologic evaluationfor differentiation of benign and malignant lesionsrdquo Radio-Graphics vol 20 no 1 pp 43ndash58 2000

[25] K Furuya S Murayama H Soeda et al ldquoNew classification ofsmall pulmonary nodules by margin characteristics on high-resolution CTrdquo Acta Radiologica vol 40 no 5 pp 496ndash5042016

[26] Z Yang S Sone S Takashima et al ldquoHigh-resolution CTanalysis of small peripheral lung adenocarcinomas revealed onscreening helical CTrdquo American Journal of Roentgenology vol176 no 6 pp 1399ndash1407 2001

[27] M F McNitt-Gray E M Hart N Wyckoff J W Sayre J GGoldin and D R Aberle ldquoA pattern classification approachto characterizing solitary pulmonary nodules imaged on highresolution CT Preliminary resultsrdquoMedical Physics vol 26 no6 pp 880ndash888 1999

[28] V Ambrosini S Nicolini P Caroli et al ldquoPETCT imaging indifferent types of lung cancer an overviewrdquoEuropean Journal ofRadiology vol 81 no 5 pp 988ndash1001 2012

[29] F Li S Sone H Abe H MacMahon and K Doi ldquoMalignantversus benign nodules at CT screening for lung cancer com-parison of thin-section CT findingsrdquo Radiology vol 233 no 3pp 793ndash798 2004

[30] B Ganeshan E Panayiotou K Burnand S Dizdarevic and KMiles ldquoTumour heterogeneity in non-small cell lung carcinomaassessed by CT texture analysis a potential marker of survivalrdquoEuropean Radiology vol 22 no 4 pp 796ndash802 2012

[31] F Li S Sone H Abe H MacMahon S G Armato and KDoi ldquoLung cancers missed at low-dose helical CT screening ina general population comparison of clinical histopathologicand imaging findingsrdquo Radiology vol 225 no 3 pp 673ndash6832002

[32] H Chen Y Xu Y Ma and B Ma ldquoNeural network ensemble-based computer-aided diagnosis for differentiation of lungnodules on CT images clinical evaluationrdquoAcademic Radiologyvol 17 no 5 pp 595ndash602 2010

[33] C V Zwirewich S Vedal R R Miller and N L Muller ldquoSoli-tary pulmonary nodule high-resolution CT and radiologic-pathologic correlationrdquo Radiology vol 179 no 2 pp 469ndash4761991

[34] Y J Jeong C A Yi and K S Lee ldquoSolitary pulmonary nodulesdetection characterization and guidance for further diagnosticworkup and treatmentrdquoAmerican Journal of Roentgenology vol188 no 1 pp 57ndash68 2007

[35] A N Khan H H Al-Jahdali C M Allen K L Irion S AlGhanem and S S Koteyar ldquoThe calcified lung nodule whatdoes it meanrdquo Annals of Thoracic Medicine vol 5 no 2 pp67ndash79 2010

[36] H C BeckerW J Nettleton P HMeyers JW Sweeney and CMNice ldquoDigital computer determination of amedical diagnos-tic index directly from chest X-ray imagesrdquo IEEE Transactionson Biomedical Engineering vol BME-11 no 3 pp 67ndash72 1964

[37] P H Meyers C M Nice H C Becker W J Nettleton J WSweeney and G R Meckstroth ldquoAutomated computer analysisof radiographic imagesrdquo Radiology vol 83 no 6 pp 1029ndash10341964

[38] B Van Ginneken B Ter Haar Romeny and M ViergeverldquoComputer-aided diagnosis in chest radiography a surveyrdquoIEEE Transactions on Medical Imaging vol 20 no 12 pp 1228ndash1241 2001

[39] K Suzuki F Li S Sone and K Doi ldquoComputer-aided diagnos-tic scheme for distinction between benign and malignant nod-ules in thoracic low-dose CT by use ofmassive training artificialneural networkrdquo IEEE Transactions on Medical Imaging vol 24no 9 pp 1138ndash1150 2005

[40] K Kanazawa Y Kawata N Niki et al ldquoComputer-aideddiagnosis for pulmonary nodules based on helical CT imagesrdquoComputerized Medical Imaging and Graphics vol 22 no 2 pp157ndash167 1998

[41] F Han H Wang G Zhang et al ldquoTexture feature analysis forcomputer-aided diagnosis on pulmonary nodulesrdquo Journal ofDigital Imaging vol 28 no 1 pp 99ndash115 2014

[42] H Hu Q Wang H Tang L Xiong and Q Lin ldquoMulti-slicecomputed tomography characteristics of solitary pulmonaryground-glass nodules Differences between malignant andbenignrdquoThoracic Cancer vol 7 no 1 pp 80ndash87 2016

[43] G-Y Zheng X-B Liu and G-H Han ldquoSurvey on medicalimage computer aided detection and diagnosis systemsrdquo RuanJian Xue BaoJournal of Software vol 29 no 5 pp 1471ndash15142018

[44] F H Edwards P S Schaefer S Callahan G M Graeber andR A Albus ldquoBayesian statistical theory in the preoperativediagnosis of pulmonary lesionsrdquoCHEST vol 92 no 5 pp 888ndash891 1987

[45] H Krewer B Geiger L O Hall et al ldquoEffect of texture featuresin computer aided diagnosis of pulmonary nodules in low-dosecomputed tomographyrdquo in Proceedings of the IEEE InternationalConference on Systems Man and Cybernetics (SMC rsquo13) pp3887ndash3891 IEEE Manchester UK October 2013

[46] C Jacobs E M van Rikxoort E T Scholten et al ldquoSolidpart-solid or non-solid classification of pulmonary nodulesin low-dose chest computed tomography by a computer-aideddiagnosis systemrdquo Investigative Radiology vol 50 no 3 pp 168ndash173 2015

14 BioMed Research International

[47] X Liu L Ma L Song Y Zhao X Zhao and C Zhou ldquoRec-ognizing common CT imaging signs of lung diseases througha new feature selection method based on fisher criterion andgenetic optimizationrdquo IEEE Journal of Biomedical and HealthInformatics vol 19 no 2 pp 635ndash647 2015

[48] T Sun J Wang X Li et al ldquoComparative evaluation of supportvectormachines for computer aided diagnosis of lung cancer inCT based on a multi-dimensional data setrdquo Computer Methodsand Programs in Biomedicine vol 111 no 2 pp 519ndash524 2013

[49] H Arimura S Katsuragawa K Suzuki et al ldquoComputerizedscheme for automated detection of lung nodules in low-dose computed tomography images for lung cancer screeningrdquoAcademic Radiology vol 11 no 6 pp 617ndash629 2004

[50] T Sun R Zhang J Wang X Li and X Guo ldquoComputer-aideddiagnosis for early-stage lung cancer based on longitudinal andbalanced datardquo PLoS ONE vol 8 no 5 Article ID e63559 2013

[51] F Han C L Novak S Aylward et al ldquoA new 3D texture featurebased computer-aided diagnosis approach to differentiate pul-monary nodulesrdquo Proceedings of the SPIE Article ID 86702Z2013

[52] K-L Hua C-H Hsu S C Hidayati W-H Cheng and Y-J Chen ldquoComputer-aided classification of lung nodules oncomputed tomography images via deep learning techniquerdquoOncoTargets andTherapy 2015

[53] Y LeCun L Bottou Y Bengio and P Haffner ldquoGradient-basedlearning applied to document recognitionrdquo Proceedings of theIEEE vol 86 no 11 pp 2278ndash2323 1998

[54] W Shen M Zhou F Yang C Yang and J Tian ldquoMulti-scaleconvolutional neural networks for lung nodule classificationrdquoin Proceedings of the International Conference on InformationProcessing in Medical Imaging vol 24 pp 588ndash599 Springer2015

[55] N Tajbakhsh and K Suzuki ldquoComparing two classes of end-to-end machine-learning models in lung nodule detection andclassificationMTANNs vs CNNsrdquo Pattern Recognition vol 63pp 476ndash486 2017

[56] H Greenspan B Van Ginneken and R M Summers ldquoGuesteditorial deep learning inmedical imaging overview and futurepromise of an exciting new techniquerdquo IEEE Transactions onMedical Imaging vol 35 no 5 pp 1153ndash1159 2016

[57] I J Goodfellow J Pouget-Abadie M Mirza et al ldquoGenerativeadversarial netsrdquo in Proceedings of the 28th Annual Conferenceon Neural Information Processing Systems 2014 NIPS 2014 pp2672ndash2680 Canada December 2014

[58] M J Chuquicusma S Hussein J Burt and U Bagci ldquoHowto fool radiologists with generative adversarial networks Avisual turing test for lung cancer diagnosisrdquo in Proceedings of the2018 IEEE 15th International Symposium Biomedical Imagingpp 240ndash244 IEEE 2018

[59] X Zhu ldquoSemi-supervised learning literature surveyrdquo ComputerScience University ofWisconsin-Madison vol 2 no 3 p 4 2006

[60] M Li andZ-H Zhou ldquoImprove computer-aided diagnosis withmachine learning techniques usingundiagnosed samplesrdquo IEEETransactions on Systems Man and Cybernetics Systems vol 37no 6 pp 1088ndash1098 2007

[61] A Radford L Metz and S Chintala ldquoUnsupervised represen-tation learning with deep convolutional generative adversarialnetworksrdquo 2015 httpsarxivorgabs07123011

[62] Y Liu Z Xing C Deng P Li and M Guo ldquoAutomaticallydetecting lung nodules based on shape descriptor and semi-supervised learningrdquo in Proceedings of the 2010 International

Conference on Computer Application and System ModelingICCASM 2010 pp V1647ndashV1650 China October 2010

[63] Y Peng and P Flach ldquoSoft discretization to enhance thecontinuous decision tree inductionrdquo Integrating Aspects of DataMining Decision Support and Meta-Learning vol 1 no 34 pp109ndash118 2001

[64] N Settouti M E Habib Daho M E Amine Lazouni and MA Chikh ldquoRandom forest in semi-supervised learning (Co-Forest)rdquo in Proceedings of the 2013 8th International Workshopon Systems Signal Processing and Their Applications WoSSPA2013 pp 326ndash329 Algeria May 2013

[65] S G Armato III G McLennan L Bidaut et al ldquoThe lungimage database consortium (lidc) and image database resourceinitiative (idri) a completed reference database of lung noduleson ct scansrdquoMedical Physics vol 38 no 2 pp 915ndash931 2011

[66] D E Ost and M K Gould ldquoDecision making in patientswith pulmonary nodulesrdquo American Journal of Respiratory andCritical Care Medicine vol 185 no 4 pp 363ndash372 2012

[67] M M Goodsitt H Chan T W Way S C Larson E GChristodoulou and J Kim ldquoAccuracy of the CT numbersof simulated lung nodules imaged with multi-detector CTscannersrdquoMedical Physics vol 33 no 8 pp 3006ndash3017 2006

[68] MMGoodsittHChanTWWayM J Schipper S C Larsonand E G Christodoulou ldquoQuantitative CT of lung nodulesdependence of calibration on patient body size anatomicregion and calibration nodule size for single- and dual-energytechniquesrdquoMedical Physics vol 36 no 7 pp 3107ndash3121 2009

[69] P Opulencia D S Channin D S Raicu and J D FurstldquoRadLex and lung nodule image featuresrdquo Journal of DigitalImaging vol 24 no 2 pp 256ndash270 2011

[70] J-C Wang S Sone L Feng et al ldquoRapidly growing smallperipheral lung cancers detected by screening CT Correlationbetween radiological appearance and pathological featuresrdquoBritish Journal of Radiology vol 73 no 873 pp 930ndash937 2000

[71] S Takashima Y Maruyama M Hasegawa A Saito M Hani-uda and M Kadoya ldquoHigh-resolution CT features prognosticsignificance in peripheral lung adenocarcinoma with bronchi-oloalveolar carcinoma componentsrdquo Respiration vol 70 no 1pp 36ndash42 2003

[72] R M Lindell T E Hartman S J Swensen et al ldquoFive-yearlung cancer screening experience CT appearance growth ratelocation and histologic features of 61 lung cancersrdquo Radiologyvol 242 no 2 pp 555ndash562 2007

[73] T J Kim D H Han K N Jin and K Won Lee ldquoLung cancerdetected at cardiac CT prevalence clinicoradiologic featuresand importance of fullndashfield-of-view imagesrdquo Radiology vol255 no 2 pp 369ndash376 2010

[74] MVazquez D Carter E Brambilla et al ldquoSolitary andmultipleresected adenocarcinomas after CT screening for lung cancerhistopathologic features and their prognostic implicationsrdquoLung Cancer vol 64 no 2 pp 148ndash154 2009

Stem Cells International

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

MEDIATORSINFLAMMATION

of

EndocrinologyInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Disease Markers

Hindawiwwwhindawicom Volume 2018

BioMed Research International

OncologyJournal of

Hindawiwwwhindawicom Volume 2013

Hindawiwwwhindawicom Volume 2018

Oxidative Medicine and Cellular Longevity

Hindawiwwwhindawicom Volume 2018

PPAR Research

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Immunology ResearchHindawiwwwhindawicom Volume 2018

Journal of

ObesityJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational and Mathematical Methods in Medicine

Hindawiwwwhindawicom Volume 2018

Behavioural Neurology

OphthalmologyJournal of

Hindawiwwwhindawicom Volume 2018

Diabetes ResearchJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Research and TreatmentAIDS

Hindawiwwwhindawicom Volume 2018

Gastroenterology Research and Practice

Hindawiwwwhindawicom Volume 2018

Parkinsonrsquos Disease

Evidence-Based Complementary andAlternative Medicine

Volume 2018Hindawiwwwhindawicom

Submit your manuscripts atwwwhindawicom

Page 4: A Novel Computer-Aided Diagnosis Scheme on Small Annotated ...downloads.hindawi.com/journals/bmri/2019/6425963.pdf · BioMedResearchInternational 100 Reshape Deconv Deconv Deconv

4 BioMed Research International

100Reshape Deconv Deconv Deconv

4times4times128 8times8times64 16times16times3232times32times3

Generator

Conv Conv Conv Full Connection

32times32times3 16times16times32 8times8times64 4times4times1281

DiscriminatorFigure 2 The architecture of the generator and the discriminator in the DCGANmodel

alleviate the problem of the lack of training examples to acertain extent the significant feature sequences vary betweendifferent classification tasks Thus transfer learning is not themost suitable method to cope with medical image analysistasks

Generative adversarial networks (GANs) [57] havedemonstrated the promising ability to generate visually real-istic images A GAN trained with limited annotated samplescan generate large numbers of realistic images ChuquicusmaMJ et al [58] conducted visual Turing tests to evaluate thedegree of realism in nodule images generated by a DCGANand showed that the generated samples can be used to boostthe diagnostic power by mining high-level discriminativeimage features and that the resulting features can be used totrain both radiologists and deep networks

Semisupervised learning is a type of machine learningmethod that combines supervised and unsupervised learn-ing It can be applied when only a small number of labeleddata exist but a large number of unlabeled data are availableSemisupervised learning is important for reducing the cost ofacquiring labeled data and improving classifier performanceCommonly used methods include EM with generative mix-ture models self-training cotraining transductive supportvector machines and graph-based methods [59] Co-forestis a cotraining based semisupervised learning algorithm thatfirst learns an initial classifier from a small amount of labeleddata and then refines the classifier by further exploitinga larger number of unlabeled data to boost the classifierrsquosperformance When applying micro calcification detectionfor breast cancer diagnosis Ming Li et al [60] showed thatCo-forest can successfully enhance the performance of amodel trained on only a small amount of diagnosed samplesby utilizing the available undiagnosed samples

The above works inspired us to exploit a GAN to enlargeand enrich a training set of pulmonary nodules It alsomotivated us to implement the designed G2C-CAD

3 Materials and Methods

31 Experimentation Materials Insufficient labeled samplesrepresents a barrier to CAD progressThe emergence of GAN[57] begun to change this situation A GAN consists of twomain parts a generator G and a discriminator D The G isused to learn the distribution of real imagesThen it generatesrealistic images to attempt to fool the D The D attempts toperform true and false discriminations concerning receivedimages Throughout the process the G strives to make thegenerated image more realistic while the D tries to identifythe true and false images This process is equivalent to agame with two opponents Over time the G and D eventuallyachieve a dynamic equilibrium in which images generated bythe generator are highly similar to the real image distributionand the discriminator cannot determine whether a sampleis drawn from the true data or generated by the generatorDCGAN [61] is a GAN extension in which a CNN isintroduced to conduct unsupervised training The ability ofthe CNN to extract features is used to enhance the training ofthe generation network Building on this idea we constructeda 32 times 32 input scale DCGANThe discriminator architecturein DCGAN has 4 layers as shown in Figure 2

We tested a GAN trained from 9 samples to generateunlabeled-nodule sign patches and the result is gratifying asshown in Figure 3

32 CNN Feature-Based Fuzzy Co-Forest Method As dis-cussed in Section 31 whenwe gain a trainedGANwe also geta trained CNN discriminator simultaneously For each imagepatch passed though the discriminator we obtain a 128-dimensional CNN feature vector from the last convolutionallayer Features of 4 times 4 is hard for eyes to discern inFigure 4 we show an example of 32-dimensional 16 times 16CNN features of a sign patch extracted from layer 1 ofD

BioMed Research International 5

(a) Original sign patches (b) Patches generated by DCGAN

Figure 3 Samples of DCGAN-generated sign patches and real samples The image on the left shows examples of the seed samples used fortraining Most of the generated samples in (b) are realistic although some of the generated patches can be recognized as being fake (the 5thone in row 3) they do help to boost the CAD systemrsquos performance

(a) Input patch (b) Corresponding features

Figure 4 (a) An original input sign patch (b) the corresponding 32 CNN features abstracted from the layer 1 of the discriminator D

Cotraining random forest (Co-forest) is an upgradedalgorithm for the cotraining paradigmThe standard cotrain-ing algorithm has two strong assumptions (1) the samplesdistribution is consistent with that of the target functionsand (2) the different features extracted from the same datashould be conditionally independent Inmost cases howeverthese two strong assumptions are difficult to satisfy Co-forestuses an ensemble consisting of multiple classifiers to avoidthe constraints of standard cotraining The specific structureof random forests enables the Co-forest to take advantageof semisupervised learning and ensemble learning to betterlearn the distribution of the training data Existing Co-forestworks are based on traditional manually designed features[60 62] Here we try to extend the Co-forest approach todeep neural networks by utilizing the CNN features obtainedfrom the GANrsquos discriminator

For each generated realistic sign from DCGAN we canextract a 128-element 4 times 4 CNN feature vector Each 4 times4 CNN feature can be transformed to a 1 times 16 vector Ifwe assume that DCGAN runs N times then we will obtainN CNN feature vectors from the discriminator These CNNfeature vectors for the N image patches can be expressed by amatrix

Original F =[[[[[[[

11989110 119891112711989120 11989121271198911198730 119891119873127

]]]]]]]

(1)

119891 = [1198900 11989015] (2)

where in (1) f is a 16-element vector transformed from a 4 times4 feature patch Every row in Original F represents the 128-dimensional CNN features from one input image patch Thefeatures in each column are produced from the same filterThe n in 119891119899119898 represents the 119899th sign in N and m representsthemth feature of a signWe build a complete reference vector119903 = [1 1 1] |119903| = 16 The cosine similarity of any featuref to r can be calculated as a

119886 = 1198900 lowast 1 + 1198901 lowast 1 + sdot sdot sdot + 11989015 lowast 1radic11989002 + 11989012 + sdot sdot sdot + 119890152 times radic12 + 12 + sdot sdot sdot + 12 (3)

where 119890119894 is an element of f We use a distance matrix A ofthe relative cosine distances to r to replace the feature matrixOriginal F

A =[[[[[[[

11988610 119886112711988620 11988621271198861198730 119886119873127

]]]]]]] (4)

From matrices (1) and (4) we can see that for any twoelements 119886i and 119886j a smaller difference between the valuesof 119886i and 119886j indicates that their corresponding features 119891i and119891j are more similar The elements in each column of A area series of continuous-valued data To build a Co-forest the

6 BioMed Research International

Child node t2

a1=0973a1=0977

t1=0975

c3

(a)

t2=0826

c1 c2

(b)

Figure 5 Traditional classification process

2

Child node t2

a1=0973

t1=0975

a=0977

c3

(a)

t2=0826

c1 c2

2

(b)

lt0975gt0975

lt0826gt0826

c1c2

c3t2

t1

(c)

Figure 6 Here (a) and (b) are fuzzy classification processes and (c) is the decision tree

first step is to construct decision trees by utilizing the labeledsamples [63] Assume that the total number of samples is NLand that the classes are c1 c2 119888k k To build a decision treewe randomly select S samples These S samples are sorted onA[119909] by the values of 119886i where x represents the xth column ofA We calculate the middle value between 119886ix and 119886i+1x as asplit point 119905ix The class information entropy of S is

Info (S) = minus 119896sum119895=1

119901119895 log2 (119901119895) (5)

where Pj represents the proportion of the category j samplesrelative to all samples and Sj is a subset of S constructed byall the elements of S belonging to class cj 119901j = |Sj||S|

When selecting feature 119905ix as the splitting node of thedecision tree the information entropy of S is

Info (A [x] 119905ix S) =10038161003816100381610038161003816119878119895lt119905119894119909 10038161003816100381610038161003816|119878| times Info (119878119895lt119905119894119909) +

10038161003816100381610038161003816119878119895gt119905119894119909 10038161003816100381610038161003816|119878|times Info (119878119895gt119905119894119909)

(6)

Then we compute the information gain

Gain (A [x] 119905ix S) = Info(S)-Info (A [x] 119905ix S) (7)

The split information entropy is

Split (A [x] 119905ix S) = minus(10038161003816100381610038161003816119878119886lt119905119894119909 10038161003816100381610038161003816|119878| times log2 (

10038161003816100381610038161003816119878119886lt119905119894119909 10038161003816100381610038161003816|119878| )

+10038161003816100381610038161003816119878119886gt119905119894 10038161003816100381610038161003816|119878| times log2(

10038161003816100381610038161003816119878119886gt119905119894119909 10038161003816100381610038161003816|119878| )) (8)

and the information gain ratio is

IGR = Gain (A [x] 119905119894119909 S)Split (A [x] 119905119894119909 S) (9)

In a column of attributes A[x] a suitable threshold 119905maxto divide A[x] into two intervals is A[x]1(119886i lt 119905max 119886i isinA[x]1) A[x]2(119886i gt 119905max 119886i isin A[x]2) This split producesthe maximal information gain ratio This 119905max is then selectedas the parent node to generate two children This processis conducted recursively to build a decision tree until atermination criterion is matched

In the reasoning process if a samplersquos attribute value fallsinto a small region around 119905ix after it is disturbed by noiseit can easily be misclassified As shown in Figure 5 supposewe have a sample whose real attribute values are a1=0977and a2=0827 however due to noise during collection a1 ischanged to 0973 In a traditional decision tree this samplewould be misclassified as 1198882

To avoid this fragility when constructing a decision treewe utilize a fuzzy scheme [63] In a traditional decision treewhen a samplersquos attribute a is greater than 119905ix the samplebelongs to either 119888m or 119888n We modify this crisp classifyingmethod by selecting a neighborhood threshold 120576 around thesplit point 119905ix as shown in Figures 6(a) and 6(b) so that theclassification function becomes

C (119904) = 119905119894 + 120576 minus a2120576 times c119898 119886 minus 119905119894 + 1205762120576 times c119899 (10)

where C(s) is a classification function that maps sample s toone of two weighted classes

BioMed Research International 7

Input the labeled set LBThe threshold 120599 of the confidence T is random treesrsquo number

ProcessInitialize a random forest containing T treesfor i in 1 T

ei0 = 05Wi0 = 0

endfort = 0Loop until all of the trees in Random Forest unchanged

t = t+1z = a new random vectorfor i in 1 T119890119894119905 = ClassificationErrorRate(119867119894L)1198711198611015840119894119905 = 120601

if(119890it lt 119890it-1)1198801015840119894119905 = 119878119886119898119901119897119890119889(119866 (119911) 119890119894119905minus1119882119894119905minus1119890119894119905 )for 119909u in1198801015840119894119905

if (119872119886119909119862119900119899119891119894119889119890119899119888119890(119864i 119909u) gt 120579)1198711198611015840119894119905 = cup(119909u 119864i(119909u))119882119894119905 = 119882119894119905 +119872119886119909119862119900119899119891119894119889119890119899119888119890(119864i 119909u)endfor

endforfor i in 1 T

if(119890119894119905119882119894119905 lt 119890119894119905minus1119882119894119905minus1)119890i = LearnRandomTree(119871119861 cup 1198711198611015840119894119905)endfor

endLoopOutput E lowast (x) = argmaxsumfuzzy classes set119910 isin 119897119886119887119890119897 i 119890i(119909) = fuzzy classes set

Algorithm 1 G2C-CAD

As shown in Figure 6(c) along one branch the finalclassification 119888i is calculated as follows

119888i = 119898prod119895=1

119905119895 + 120576 minus a1198952120576 left branch of 119905119895119886119895 minus 119905119895 + 1205762120576 right branch of 119905119895

(11)

where 119888i is one class from 1198881 1198882 119888kWhen 120576 = 01 under the fuzzy classifying scheme

the classification result of disturbed example s is C(119904) =0253lowast1198881 0258lowast1198882 049lowast1198883 It is obvious that the maximalprobability of the final decision result is max(C(s)) = 1198883Based on this classification example we can see that the fuzzydecision scheme is more robust to noise

The classification result of the fuzzy Co-forest is amultiple-label probability distribution of the union of thedecision treesrsquo output we consider the class with the highestprobability as the final output

Let LB denote the labeled set G(z) denotes the process ofgenerating a new fake sign-image patch utilizing the trainedDCGAN There are N classifiers in the Co-forest ensemble119864lowast 119890i (i = 1 N) We denote one of the classifiers inensemble 119864i as the concomitant ensemble of 119890i and createa subensemble that includes all the classifiers except 119890i For anewly generated sample from G if the max fuzzy vote sum

of the classifiers in concomitant ensemble 119864i exceeds a presetthreshold 120579 the sample will be copied to a newly labeled set1198711198611015840119894 with the new assigned label Based on [64] the processiterates until 119890119894119905minus1119882119894119905minus1119890119894119905 is larger than 119882119894119905Then the set119871119861 cup 1198711198611015840119894119905 is used to refine 119890i 119890 denotes the classificationerror rate and 119882119894119905 = sum119898119894119905119895=0119908119894119905119895 where 119908119894119905119895 is the predictedconfidence of 119864119894 on 1198711198611015840119894119905 and119898119894119905 is the size of set 1198711198611015840119894119905

Based on the fuzzy decision tree scheme we construct thefuzzy Co-forest as shown in Algorithm 1

4 Experiments

41 Datasets We collected sample instances from both theLIDC-IDRI and LISS datasets LIDC-IDRI [65] consists ofpulmonary medical image files (such as CT scans and X-rays) with corresponding pathological annotations The datawere collected by the National Cancer Institute to studyearly cancer detection in high-risk populations LISS [21]consists a set of CISLs collected by the Cancer Institute andHospital at the Chinese Academy ofMedical Sciences and theBeijing Institute of Technology intended for computer-aideddetection and diagnosis research andmedical education LISScontains 271 CT scans and 677 abnormal regions includingnine categories of CISLs

8 BioMed Research International

3 6 9 12 15 18 21 24 27 30 33 36 39 42 45 48 54 101

Diameter length summary

0100200300400500600700

Figure 7 The diameter-size distribution of the nodules in the LIDC-IDRI dataset

Table 1 The scheme of sign selection

Used as signs AbandonedSubtlety - -Internal structure - -Calcification =4 ltgt4Sphericity - -Margin - -Lobulation gt=4 lt4Spiculation gt=4 lt4Texture lt=2 gt2

42 LIDC-IDRI Instances In the LIDC-IDRI CT imagingslices most of the annotated nodules have diameters less than32 pixels as shown in Figure 7 Therefore we choose 32 times 32as the input ROI size

A higher degree of lobulation speculation and nonsolidtexture signs indicates a greater probability of malignantnodules [66] A calcification sign usually indicates a benignnodule [66ndash68] except when it has a noncentral appearanceSigns of subtlety [69] internal structure [27] sphericity andmargin [33] have not been clearly proven to have a strongrelationship with malignancy To simplify the comparisonsin this experiment we selected nodules from LIDC withcalcification =4 (noncentral calcification) lobulation gt=4spiculation gt=4 texture lt=2 and malignancy gt=3 as exper-imental instances based on the selection rules shown inTable 1

LIDC-IDRI contains 21057 annotated nodules Weselected the 4 category nodules with signs such as noncen-tral calcification lobulation speculation and nonsolidGGOtexture that have a high prevalence of malignancy as theexperimental objects For the probability of malignancy inthese nodules we adopted the average value of 4 radiologistsrsquoscores The noncentral calcification lobulation spiculationand nonsolidGGO texture signs are illustrated in Table 2 Inaddition we randomly extracted 32 times 32-pixel image patchesfrom slices not annotated by any radiologist as negativesamples In total we used five types of image blocks in ourexperiment

Based on the center point of the merged regions anno-tated by 4 experts we extracted 32 times 32-pixel image blocksas the experimental input by following the selection criteriashown in Table 1 Among these samples we ensured that

a given category of ROI patches for a single patient doesnot appear in any two subsets simultaneously which helpedensure that the specificity of the trained individual networksis as high as possible

43 LISS Instances From LISS we selected signs of lobula-tion spiculation pleural indentation and GGO associatedwith malignant lung cancer [70ndash74] as the experimentalobjects The number of samples in each category is shown inTable 3

44 Evaluation Criteria To evaluate the performance ofthe algorithm presented in this paper we considered thefollowing criteria

(1) ROC The Receiver Operator Characteristic curve(ROC) is a method that comprehensively and simul-taneously reflects the sensitivity and specificity of theclassification result By comparing the classificationresults of different samples with the annotation labelsa series of sensitivity and specificity scores is cal-culated Then a curve is drawn using sensitivity asthe ordinate and 1 minus specificity as the abscissa Alarger area under the curve (AUC) indicates a higherdiagnostic accuracy On the ROC curve the pointclosest to the top left of the coordinate diagram is thecritical value that reflects the highest sensitivity andspecificity

(2) Confusion matrix A confusion matrix is also calledan error matrix and it is a visual representation of theclassification effect A confusion matrix can be usedto describe the relationship between the real categoryattribute of the sample data and the recognition resultIt is a method for evaluating classifier performanceand iswidely used in pattern recognition A confusionmatrix is also a performance evaluation method thatscholars often use when solving practical applicationproblems

45 Experimentation From the LIDC-IDRI database weacquired 590 noncentral calcification 565 lobulation 576spiculation 545 nonsolidGGO texture sign patches and2500 negative image patchesThe experiment was conductedaccording to the following steps

(1) Set the initial values of T 119890 and 120579 to 6 05 and 06respectively

BioMed Research International 9

Table 2 Sign samples of calcification lobulation spiculation and nonsolidGGO texture

Noncentral calcification Lobulation Spiculation NonsolidGGO texture

Subtlety 467 50 50 40 40 50 367 30Internal structure 10 10 10 10 10 10 10 10Calcification 40 40 60 60 60 60 60 60Sphericity 333 267 425 40 50 50 433 40Margin 367 467 45 40 20 40 267 10Lobulation 167 20 425 40 40 10 267 10Spiculation 133 133 225 40 40 50 167 20Texture 50 467 475 40 50 40 133 10

Table 3 Number of selected instances of LISS

CISL Category of Lesion RegionsGGO 45Lobulation 41Spiculation 29Pleural Indentation 45Negative 80

(2) Train the GAN until the discrepancy cost reaches abalance as illustrated in Figure 8

(3) Train a primary fuzzy Co-forest based on the originalsamples utilizing the features exported from thetrained DCGAN discriminator in Step (2)

(4) Input a random vector to DCGAN and transmit thegenerated features to the concomitant random fuzzydecision forest119864i in the primary Co-forest until119882119894119905 ge119890119894119905minus1119882119894119905minus1119890119894119905 for all classesIn this process if the maximal weight of the fuzzylabel from 119864i exceeds the threshold 120579 store bothfeatureswith thematched label otherwise discard thegenerated image and generate a new image

(5) Retrain the corresponding tree 119890i of 119864i(6) Test the performance of the system

As a comparison we trained a C45 random forest accordingto the same schemewhich has been used byG2C-CAD as thebaseline method

We also conducted experiments on samples obtainedfrom LISS

5 Results

We divided the dataset into two parts 90 for training and10 for validation In this way the nodule distribution in thevalidation subsets is consistent with the nodule distributionof the original dataset according to the radiologistsrsquo consensusof their evaluations at the nodule levelThe number of trees intheCo-forest classificationmodelwas 6 primary trainingwasconducted on the training data and finally the system was

further validated on the remaining 10 The sensitivity andspecificity of each class instance was calculated and comparedusing the ROC curves shown in Figure 9

In Figure 9 theAUCsof noncentral calcification negativeimage samples lobulation spiculation and nonsolidGGOtexture are 0946 0939 0912 0908 and 0887 respectivelyFrom the curves in Figure 9 we can see that the G2C-CADsystem achieves the highest overall classification accuracy onthe calcification sign Compared to noncentral calcificationand negative image samples the AUCs of lobulation spicula-tion and nonsolidGGO texture are relatively lower

To show the underlying classification error distribution aconfusion matrix of the 5 classification results is presented inFigure 10

The diagonal numbers in the confusion matrix representthe recognition accuracy rates of the corresponding categoryand the nondiagonal elements are misclassification rates iethe ratio of other category test samples that weremisclassifiedto this class From Figure 10 we can see that the noncentralcalcification sign has the highest classification accuracywhile the nonsolidGGO texture sign has the highest mis-classification rate Misclassified spiculation signs are mostlyrecognized as lobulation By comparing the test sampleswe find that many of these two types of samples have verysimilar textures From the confusion matrix we can alsosee that most misclassifications occur between lobulationspiculation and nonsolidGGO texture signs

6 Discussion

We performed a performance comparison on each categorybetween G2C-CAD and the C45 random forest modelusing the confusion matrix First we constructed confusionmatrices reflecting the accuracies of G2C-CAD and C45Then we calculated the difference matrix for those twoconfusion matrices as shown in Figure 11 The float numberson the diagonal represent the differences between G2C-CAD and C45 regarding their classification accuracy forthe corresponding category while the float numbers on thenondiagonal represent their differences in the classificationerror rate showing the numbers for each class that weremisclassified into the corresponding category

10 BioMed Research International

100000 200000 300000 4000000iteration

minus70

minus60

minus50

minus40

minus30

minus20

minus10

0

trai

n di

sc co

st

Figure 8 Training discrepancy cost of DCGAN

ROC Curves

AUCNon-central Calcification A = 0946Negative A = 0939Lobulation A = 0912Spiculation A = 0908Non-solidGGO Texture A = 0887

00

2

4

6

8

10

Sens

itivi

ty

2 4 6 8 10001 - Specificity

Figure 9 ROC curves for the 5 categories of sign ROI recognition results

Consequently positive numbers on the diagonal andlarger values indicate that G2C-CAD achieved a betterperformance than that of C45 For the elements that arenot on the diagonal the situation is the opposite As shownin Figure 11 no positive values occur in the nondiagonalelements which means that G2C-CAD possesses greaterdiscrimination ability between each category The elementvalues on the diagonal are all positive and the average valueof the diagonal numbers in the difference confusion matrixis 0144 demonstrating that our method has a better overallperformance

To verify the effectiveness of the fuzzy algorithm we alsoconducted multiclassification experiments using G2C-CADwithout the fuzzy algorithm and compared the performancesof the two algorithms using a difference confusion matrix asshown in Figure 12

FromFigure 12 we can see that the discrimination perfor-mance of the CAD system employing the fuzzy algorithm isobviously better than that of the nonfuzzy one Furthermoreas shown in Figure 12 the CAD performance of the fuzzyalgorithm better distinguishes the difficult lobulation signfrom the spiculation sign

BioMed Research International 11

Cal 0933 005 0011 0002 0004

Lob 0 0907 006 0026 0007

Spic 0 0077 0891 0032 0

Tex 001 0064 0046 0878 0002

NegCa

l

Lob

Spic

Tex

Neg

0 0034 0028 0012 0926

Figure 10 Confusion matrix of the predicted results

Cal 0119 minus0051 minus00039 minus00623 minus00018

Lob minus00183 01286 minus00994 minus00101 minus00008

Spic minus00067 minus01065 01317 minus0028 minus0024

Tex minus00422 minus00627 minus01119 02175 minus00007

Neg

Cal

Lob

Spic

Tex

Neg

minus00002 minus006 minus00135 minus00495 01232

Figure 11 Performance difference matrix of G2C-CAD and C45

The experimental result on LISS shows higher perfor-mance the areas under the ROC curve of GGO lobulationspiculation pleural indentation and negative image samplesare 0972 0964 0941 0967 and 0953 respectively Bycomparing the training dataset and test samples we foundthat the main reason may be that the samples in differentcategories are more separable visually

7 Conclusion

In this paper by coupling a GAN with a semisupervisedlearning approach we proposed a G2C-CAD method todetect signs that are highly correlated with malignant pul-monary nodules We first trained a DCGAN on a smallsample set Then we extracted the features from the CNNdiscriminator trained with the training samples and usedthem to train a primary fuzzy Co-forest classifier Then weuse the trained DCGAN to generate large amounts of realistic

fake samples Based on these fake samples we conductedsemisupervised learning with the fuzzy Co-forest and finallyobtained a classifierwith excellent performance By validatingon the LIDC dataset the area under the ROC curve for fivesign types noncentral calcification negative image sampleslobulation spiculation and nonsolidGGO texture reached0946 0939 0912 0908 and 0887 respectively On the LISSdataset the proposed system showed comprehensively higherclassification performances than those of a trained C45classifier The experimental results show that the proposedG2C-CAD is an appropriate method for solving the problemof insufficient samples in the medical image analysis fieldMoreover our system can also be used to establish a trainingsample library for CAD classification diagnosis which holdsgreat significance for future medical image analysis

In future work we plan to combine this method withmultiple-instance learning to performweak supervised learn-ing directly on CT slice images or to extend the algorithm

12 BioMed Research International

00083 minus0002 minus00023 minus00032 minus00008

minus00013 00159 minus00074 minus00061 minus00011

00011 minus00044 00122 minus00042 minus00025

minus00002 minus00031 minus00029 00074 minus00012

minus00008 minus00036 minus00013 minus00039 00096

Cal

Lob

Spic

Tex

NegCa

l

Lob

Spic

Tex

Neg

Figure 12 Performance difference matrix of fuzzy G2C-CAD and nonfuzzy G2C-CAD

making it suitable for use in the 3D medical image classifica-tion field

Data Availability

The data used to support the findings of this study areavailable from the corresponding author upon request

Disclosure

The publicly accessible medical images database of LIDCIDRI is used

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work was supported in part by National Natural ScienceFoundation of China (Grant nos 60973059 and 81171407) andProgram for New Century Excellent Talents in University ofChina (Grant no NCET-10-0044)

References

[1] ldquoCancersrdquo httpwwwwhointennews-roomfact-sheetsde-tailcancer

[2] ldquoCancer stat facts common cancer sitesrdquo httpsseercancergovstatfactshtmlcommonhtml

[3] S Diederich D Wormanns M Semik et al ldquoScreening forearly lung cancer with low-dose spiral CT prevalence in 817asymptomatic smokersrdquo Radiology vol 222 no 3 pp 773ndash7812002

[4] J F Williamson ldquoBrachytherapy technology and physics prac-tice since 1950 A half-century of progressrdquo Physics in Medicineand Biology vol 51 no 13 article no R18 pp R303ndashR325 2006

[5] J F Williamson S K Das M S Goodsitt and J O DeasyldquoIntroducing the medical physics dataset articlerdquo MedicalPhysics vol 44 no 2 pp 349-350 2017

[6] S Balik E Weiss N Jan et al ldquoEvaluation of 4-dimensionalcomputed tomography to 4-dimensional cone-beam computedtomography deformable image registration for lung canceradaptive radiation therapyrdquo International Journal of RadiationOncology ∙ Biology ∙ Physics vol 86 no 2 pp 372ndash379 2013

[7] C F Mountain ldquoRevisions in the international system forstaging lung cancerrdquo Chest vol 111 no 6 pp 1710ndash1717 1997

[8] M T van Rens J M van den Bosch A Brutel de la Riviereand H R Elbers ldquoPrognostic assessment of 2361 patients whounderwent pulmonary resection for non-small cell lung cancerstage I II and IIIArdquo CHEST vol 117 no 2 pp 374ndash379 2000

[9] E F Patz S Rossi D H Harpole J E Herndon and P CGoodman ldquoCorrelation of tumor size and survival in patientswith stage IA non-small cell lung cancerrdquo Chest vol 117 no 6pp 1568ndash1571 2000

[10] C I Henschke P Boffetta O Gorlova R Yip J O DeLanceyand M Foy ldquoAssessment of lung-cancer mortality reductionfrom CT screeningrdquo Lung Cancer vol 71 no 3 pp 328ndash3322011

[11] D Kumar A Wong and D A Clausi ldquoLung nodule classifi-cation using deep features in CT imagesrdquo in Proceedings of the12th Conference on Computer and Robot Vision CRV 2015 pp133ndash138 Canada June 2015

[12] M Kaneko K Eguchi and H Ohmatsu ldquoPeripheral lungcancer screening and detection with low-dose spiral CT versusradiographyrdquo Radiology vol 201 no 3 pp 798ndash802 1996

[13] S Swensen J Jett J Sloan et al ldquoScreening for lung cancerwith low-dose spiral computed tomographyrdquo American Journalof Respiratory and Critical Care Medicine vol 165 no 4 pp508ndash513 2002

[14] E Paci D Puliti A Lopes Pegna et al ldquoMortality survivaland incidence rates in the ITALUNG randomised lung cancerscreening trialrdquoThorax vol 72 no 9 pp 825ndash831 2017

[15] S Iwano TNakamura Y Kamioka andT Ishigaki ldquoComputer-aided diagnosis A shape classification of pulmonary nodulesimaged by high-resolution CTrdquo Computerized Medical Imagingand Graphics vol 29 no 7 pp 565ndash570 2005

BioMed Research International 13

[16] A Bernard and H Shennib ldquoResection of pulmonary nodulesusing video-assisted thoracic surgeryrdquo The Annals of ThoracicSurgery vol 61 no 1 pp 202ndash205 1996

[17] M J Mack S R Hazelrigg R J Landreneau and T E AcuffldquoThoracoscopy for the diagnosis of the indeterminate solitarypulmonary nodulerdquoThe Annals of Thoracic Surgery vol 56 no4 pp 825ndash832 1993

[18] S Siegelman E Zerhouni F Leo N Khouri and F StitikldquoCT of the solitary pulmonary nodulerdquo American Journal ofRoentgenology vol 135 no 1 pp 1ndash13 1980

[19] B A Keagy P J K Starek G F Murray J W Battaglini ME Lores and B R Wilcox ldquoMajor pulmonary resection forsuspected but unconfirmedmalignancyrdquoTheAnnals ofThoracicSurgery vol 38 no 4 pp 314ndash316 1984

[20] J Collins ldquoCT signs and patterns of lung diseaserdquo RadiologicClinics of North America vol 39 no 6 pp 1115ndash1135 2001

[21] G Han X Liu F Han et al ldquoThe LISSmdasha public databaseof common imaging signs of lung diseases for computer-aideddetection and diagnosis research and medical educationrdquo IEEETransactions on Biomedical Engineering vol 62 no 2 pp 648ndash656 2015

[22] G Han X LiuN Q Soomro et al ldquoEmpirical driven automaticdetection of lobulation imaging signs in lung CTrdquo BioMedResearch International vol 2017 2017

[23] C-Z Shi Q Zhao L-P Luo and J-X He ldquoSize of solitarypulmonary nodule was the risk factor of malignancyrdquo Journalof Thoracic Disease vol 6 no 6 pp 668ndash676 2014

[24] J J Erasmus J E Connolly H P McAdams and V L RogglildquoSolitary pulmonary nodules part I Morphologic evaluationfor differentiation of benign and malignant lesionsrdquo Radio-Graphics vol 20 no 1 pp 43ndash58 2000

[25] K Furuya S Murayama H Soeda et al ldquoNew classification ofsmall pulmonary nodules by margin characteristics on high-resolution CTrdquo Acta Radiologica vol 40 no 5 pp 496ndash5042016

[26] Z Yang S Sone S Takashima et al ldquoHigh-resolution CTanalysis of small peripheral lung adenocarcinomas revealed onscreening helical CTrdquo American Journal of Roentgenology vol176 no 6 pp 1399ndash1407 2001

[27] M F McNitt-Gray E M Hart N Wyckoff J W Sayre J GGoldin and D R Aberle ldquoA pattern classification approachto characterizing solitary pulmonary nodules imaged on highresolution CT Preliminary resultsrdquoMedical Physics vol 26 no6 pp 880ndash888 1999

[28] V Ambrosini S Nicolini P Caroli et al ldquoPETCT imaging indifferent types of lung cancer an overviewrdquoEuropean Journal ofRadiology vol 81 no 5 pp 988ndash1001 2012

[29] F Li S Sone H Abe H MacMahon and K Doi ldquoMalignantversus benign nodules at CT screening for lung cancer com-parison of thin-section CT findingsrdquo Radiology vol 233 no 3pp 793ndash798 2004

[30] B Ganeshan E Panayiotou K Burnand S Dizdarevic and KMiles ldquoTumour heterogeneity in non-small cell lung carcinomaassessed by CT texture analysis a potential marker of survivalrdquoEuropean Radiology vol 22 no 4 pp 796ndash802 2012

[31] F Li S Sone H Abe H MacMahon S G Armato and KDoi ldquoLung cancers missed at low-dose helical CT screening ina general population comparison of clinical histopathologicand imaging findingsrdquo Radiology vol 225 no 3 pp 673ndash6832002

[32] H Chen Y Xu Y Ma and B Ma ldquoNeural network ensemble-based computer-aided diagnosis for differentiation of lungnodules on CT images clinical evaluationrdquoAcademic Radiologyvol 17 no 5 pp 595ndash602 2010

[33] C V Zwirewich S Vedal R R Miller and N L Muller ldquoSoli-tary pulmonary nodule high-resolution CT and radiologic-pathologic correlationrdquo Radiology vol 179 no 2 pp 469ndash4761991

[34] Y J Jeong C A Yi and K S Lee ldquoSolitary pulmonary nodulesdetection characterization and guidance for further diagnosticworkup and treatmentrdquoAmerican Journal of Roentgenology vol188 no 1 pp 57ndash68 2007

[35] A N Khan H H Al-Jahdali C M Allen K L Irion S AlGhanem and S S Koteyar ldquoThe calcified lung nodule whatdoes it meanrdquo Annals of Thoracic Medicine vol 5 no 2 pp67ndash79 2010

[36] H C BeckerW J Nettleton P HMeyers JW Sweeney and CMNice ldquoDigital computer determination of amedical diagnos-tic index directly from chest X-ray imagesrdquo IEEE Transactionson Biomedical Engineering vol BME-11 no 3 pp 67ndash72 1964

[37] P H Meyers C M Nice H C Becker W J Nettleton J WSweeney and G R Meckstroth ldquoAutomated computer analysisof radiographic imagesrdquo Radiology vol 83 no 6 pp 1029ndash10341964

[38] B Van Ginneken B Ter Haar Romeny and M ViergeverldquoComputer-aided diagnosis in chest radiography a surveyrdquoIEEE Transactions on Medical Imaging vol 20 no 12 pp 1228ndash1241 2001

[39] K Suzuki F Li S Sone and K Doi ldquoComputer-aided diagnos-tic scheme for distinction between benign and malignant nod-ules in thoracic low-dose CT by use ofmassive training artificialneural networkrdquo IEEE Transactions on Medical Imaging vol 24no 9 pp 1138ndash1150 2005

[40] K Kanazawa Y Kawata N Niki et al ldquoComputer-aideddiagnosis for pulmonary nodules based on helical CT imagesrdquoComputerized Medical Imaging and Graphics vol 22 no 2 pp157ndash167 1998

[41] F Han H Wang G Zhang et al ldquoTexture feature analysis forcomputer-aided diagnosis on pulmonary nodulesrdquo Journal ofDigital Imaging vol 28 no 1 pp 99ndash115 2014

[42] H Hu Q Wang H Tang L Xiong and Q Lin ldquoMulti-slicecomputed tomography characteristics of solitary pulmonaryground-glass nodules Differences between malignant andbenignrdquoThoracic Cancer vol 7 no 1 pp 80ndash87 2016

[43] G-Y Zheng X-B Liu and G-H Han ldquoSurvey on medicalimage computer aided detection and diagnosis systemsrdquo RuanJian Xue BaoJournal of Software vol 29 no 5 pp 1471ndash15142018

[44] F H Edwards P S Schaefer S Callahan G M Graeber andR A Albus ldquoBayesian statistical theory in the preoperativediagnosis of pulmonary lesionsrdquoCHEST vol 92 no 5 pp 888ndash891 1987

[45] H Krewer B Geiger L O Hall et al ldquoEffect of texture featuresin computer aided diagnosis of pulmonary nodules in low-dosecomputed tomographyrdquo in Proceedings of the IEEE InternationalConference on Systems Man and Cybernetics (SMC rsquo13) pp3887ndash3891 IEEE Manchester UK October 2013

[46] C Jacobs E M van Rikxoort E T Scholten et al ldquoSolidpart-solid or non-solid classification of pulmonary nodulesin low-dose chest computed tomography by a computer-aideddiagnosis systemrdquo Investigative Radiology vol 50 no 3 pp 168ndash173 2015

14 BioMed Research International

[47] X Liu L Ma L Song Y Zhao X Zhao and C Zhou ldquoRec-ognizing common CT imaging signs of lung diseases througha new feature selection method based on fisher criterion andgenetic optimizationrdquo IEEE Journal of Biomedical and HealthInformatics vol 19 no 2 pp 635ndash647 2015

[48] T Sun J Wang X Li et al ldquoComparative evaluation of supportvectormachines for computer aided diagnosis of lung cancer inCT based on a multi-dimensional data setrdquo Computer Methodsand Programs in Biomedicine vol 111 no 2 pp 519ndash524 2013

[49] H Arimura S Katsuragawa K Suzuki et al ldquoComputerizedscheme for automated detection of lung nodules in low-dose computed tomography images for lung cancer screeningrdquoAcademic Radiology vol 11 no 6 pp 617ndash629 2004

[50] T Sun R Zhang J Wang X Li and X Guo ldquoComputer-aideddiagnosis for early-stage lung cancer based on longitudinal andbalanced datardquo PLoS ONE vol 8 no 5 Article ID e63559 2013

[51] F Han C L Novak S Aylward et al ldquoA new 3D texture featurebased computer-aided diagnosis approach to differentiate pul-monary nodulesrdquo Proceedings of the SPIE Article ID 86702Z2013

[52] K-L Hua C-H Hsu S C Hidayati W-H Cheng and Y-J Chen ldquoComputer-aided classification of lung nodules oncomputed tomography images via deep learning techniquerdquoOncoTargets andTherapy 2015

[53] Y LeCun L Bottou Y Bengio and P Haffner ldquoGradient-basedlearning applied to document recognitionrdquo Proceedings of theIEEE vol 86 no 11 pp 2278ndash2323 1998

[54] W Shen M Zhou F Yang C Yang and J Tian ldquoMulti-scaleconvolutional neural networks for lung nodule classificationrdquoin Proceedings of the International Conference on InformationProcessing in Medical Imaging vol 24 pp 588ndash599 Springer2015

[55] N Tajbakhsh and K Suzuki ldquoComparing two classes of end-to-end machine-learning models in lung nodule detection andclassificationMTANNs vs CNNsrdquo Pattern Recognition vol 63pp 476ndash486 2017

[56] H Greenspan B Van Ginneken and R M Summers ldquoGuesteditorial deep learning inmedical imaging overview and futurepromise of an exciting new techniquerdquo IEEE Transactions onMedical Imaging vol 35 no 5 pp 1153ndash1159 2016

[57] I J Goodfellow J Pouget-Abadie M Mirza et al ldquoGenerativeadversarial netsrdquo in Proceedings of the 28th Annual Conferenceon Neural Information Processing Systems 2014 NIPS 2014 pp2672ndash2680 Canada December 2014

[58] M J Chuquicusma S Hussein J Burt and U Bagci ldquoHowto fool radiologists with generative adversarial networks Avisual turing test for lung cancer diagnosisrdquo in Proceedings of the2018 IEEE 15th International Symposium Biomedical Imagingpp 240ndash244 IEEE 2018

[59] X Zhu ldquoSemi-supervised learning literature surveyrdquo ComputerScience University ofWisconsin-Madison vol 2 no 3 p 4 2006

[60] M Li andZ-H Zhou ldquoImprove computer-aided diagnosis withmachine learning techniques usingundiagnosed samplesrdquo IEEETransactions on Systems Man and Cybernetics Systems vol 37no 6 pp 1088ndash1098 2007

[61] A Radford L Metz and S Chintala ldquoUnsupervised represen-tation learning with deep convolutional generative adversarialnetworksrdquo 2015 httpsarxivorgabs07123011

[62] Y Liu Z Xing C Deng P Li and M Guo ldquoAutomaticallydetecting lung nodules based on shape descriptor and semi-supervised learningrdquo in Proceedings of the 2010 International

Conference on Computer Application and System ModelingICCASM 2010 pp V1647ndashV1650 China October 2010

[63] Y Peng and P Flach ldquoSoft discretization to enhance thecontinuous decision tree inductionrdquo Integrating Aspects of DataMining Decision Support and Meta-Learning vol 1 no 34 pp109ndash118 2001

[64] N Settouti M E Habib Daho M E Amine Lazouni and MA Chikh ldquoRandom forest in semi-supervised learning (Co-Forest)rdquo in Proceedings of the 2013 8th International Workshopon Systems Signal Processing and Their Applications WoSSPA2013 pp 326ndash329 Algeria May 2013

[65] S G Armato III G McLennan L Bidaut et al ldquoThe lungimage database consortium (lidc) and image database resourceinitiative (idri) a completed reference database of lung noduleson ct scansrdquoMedical Physics vol 38 no 2 pp 915ndash931 2011

[66] D E Ost and M K Gould ldquoDecision making in patientswith pulmonary nodulesrdquo American Journal of Respiratory andCritical Care Medicine vol 185 no 4 pp 363ndash372 2012

[67] M M Goodsitt H Chan T W Way S C Larson E GChristodoulou and J Kim ldquoAccuracy of the CT numbersof simulated lung nodules imaged with multi-detector CTscannersrdquoMedical Physics vol 33 no 8 pp 3006ndash3017 2006

[68] MMGoodsittHChanTWWayM J Schipper S C Larsonand E G Christodoulou ldquoQuantitative CT of lung nodulesdependence of calibration on patient body size anatomicregion and calibration nodule size for single- and dual-energytechniquesrdquoMedical Physics vol 36 no 7 pp 3107ndash3121 2009

[69] P Opulencia D S Channin D S Raicu and J D FurstldquoRadLex and lung nodule image featuresrdquo Journal of DigitalImaging vol 24 no 2 pp 256ndash270 2011

[70] J-C Wang S Sone L Feng et al ldquoRapidly growing smallperipheral lung cancers detected by screening CT Correlationbetween radiological appearance and pathological featuresrdquoBritish Journal of Radiology vol 73 no 873 pp 930ndash937 2000

[71] S Takashima Y Maruyama M Hasegawa A Saito M Hani-uda and M Kadoya ldquoHigh-resolution CT features prognosticsignificance in peripheral lung adenocarcinoma with bronchi-oloalveolar carcinoma componentsrdquo Respiration vol 70 no 1pp 36ndash42 2003

[72] R M Lindell T E Hartman S J Swensen et al ldquoFive-yearlung cancer screening experience CT appearance growth ratelocation and histologic features of 61 lung cancersrdquo Radiologyvol 242 no 2 pp 555ndash562 2007

[73] T J Kim D H Han K N Jin and K Won Lee ldquoLung cancerdetected at cardiac CT prevalence clinicoradiologic featuresand importance of fullndashfield-of-view imagesrdquo Radiology vol255 no 2 pp 369ndash376 2010

[74] MVazquez D Carter E Brambilla et al ldquoSolitary andmultipleresected adenocarcinomas after CT screening for lung cancerhistopathologic features and their prognostic implicationsrdquoLung Cancer vol 64 no 2 pp 148ndash154 2009

Stem Cells International

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

MEDIATORSINFLAMMATION

of

EndocrinologyInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Disease Markers

Hindawiwwwhindawicom Volume 2018

BioMed Research International

OncologyJournal of

Hindawiwwwhindawicom Volume 2013

Hindawiwwwhindawicom Volume 2018

Oxidative Medicine and Cellular Longevity

Hindawiwwwhindawicom Volume 2018

PPAR Research

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Immunology ResearchHindawiwwwhindawicom Volume 2018

Journal of

ObesityJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational and Mathematical Methods in Medicine

Hindawiwwwhindawicom Volume 2018

Behavioural Neurology

OphthalmologyJournal of

Hindawiwwwhindawicom Volume 2018

Diabetes ResearchJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Research and TreatmentAIDS

Hindawiwwwhindawicom Volume 2018

Gastroenterology Research and Practice

Hindawiwwwhindawicom Volume 2018

Parkinsonrsquos Disease

Evidence-Based Complementary andAlternative Medicine

Volume 2018Hindawiwwwhindawicom

Submit your manuscripts atwwwhindawicom

Page 5: A Novel Computer-Aided Diagnosis Scheme on Small Annotated ...downloads.hindawi.com/journals/bmri/2019/6425963.pdf · BioMedResearchInternational 100 Reshape Deconv Deconv Deconv

BioMed Research International 5

(a) Original sign patches (b) Patches generated by DCGAN

Figure 3 Samples of DCGAN-generated sign patches and real samples The image on the left shows examples of the seed samples used fortraining Most of the generated samples in (b) are realistic although some of the generated patches can be recognized as being fake (the 5thone in row 3) they do help to boost the CAD systemrsquos performance

(a) Input patch (b) Corresponding features

Figure 4 (a) An original input sign patch (b) the corresponding 32 CNN features abstracted from the layer 1 of the discriminator D

Cotraining random forest (Co-forest) is an upgradedalgorithm for the cotraining paradigmThe standard cotrain-ing algorithm has two strong assumptions (1) the samplesdistribution is consistent with that of the target functionsand (2) the different features extracted from the same datashould be conditionally independent Inmost cases howeverthese two strong assumptions are difficult to satisfy Co-forestuses an ensemble consisting of multiple classifiers to avoidthe constraints of standard cotraining The specific structureof random forests enables the Co-forest to take advantageof semisupervised learning and ensemble learning to betterlearn the distribution of the training data Existing Co-forestworks are based on traditional manually designed features[60 62] Here we try to extend the Co-forest approach todeep neural networks by utilizing the CNN features obtainedfrom the GANrsquos discriminator

For each generated realistic sign from DCGAN we canextract a 128-element 4 times 4 CNN feature vector Each 4 times4 CNN feature can be transformed to a 1 times 16 vector Ifwe assume that DCGAN runs N times then we will obtainN CNN feature vectors from the discriminator These CNNfeature vectors for the N image patches can be expressed by amatrix

Original F =[[[[[[[

11989110 119891112711989120 11989121271198911198730 119891119873127

]]]]]]]

(1)

119891 = [1198900 11989015] (2)

where in (1) f is a 16-element vector transformed from a 4 times4 feature patch Every row in Original F represents the 128-dimensional CNN features from one input image patch Thefeatures in each column are produced from the same filterThe n in 119891119899119898 represents the 119899th sign in N and m representsthemth feature of a signWe build a complete reference vector119903 = [1 1 1] |119903| = 16 The cosine similarity of any featuref to r can be calculated as a

119886 = 1198900 lowast 1 + 1198901 lowast 1 + sdot sdot sdot + 11989015 lowast 1radic11989002 + 11989012 + sdot sdot sdot + 119890152 times radic12 + 12 + sdot sdot sdot + 12 (3)

where 119890119894 is an element of f We use a distance matrix A ofthe relative cosine distances to r to replace the feature matrixOriginal F

A =[[[[[[[

11988610 119886112711988620 11988621271198861198730 119886119873127

]]]]]]] (4)

From matrices (1) and (4) we can see that for any twoelements 119886i and 119886j a smaller difference between the valuesof 119886i and 119886j indicates that their corresponding features 119891i and119891j are more similar The elements in each column of A area series of continuous-valued data To build a Co-forest the

6 BioMed Research International

Child node t2

a1=0973a1=0977

t1=0975

c3

(a)

t2=0826

c1 c2

(b)

Figure 5 Traditional classification process

2

Child node t2

a1=0973

t1=0975

a=0977

c3

(a)

t2=0826

c1 c2

2

(b)

lt0975gt0975

lt0826gt0826

c1c2

c3t2

t1

(c)

Figure 6 Here (a) and (b) are fuzzy classification processes and (c) is the decision tree

first step is to construct decision trees by utilizing the labeledsamples [63] Assume that the total number of samples is NLand that the classes are c1 c2 119888k k To build a decision treewe randomly select S samples These S samples are sorted onA[119909] by the values of 119886i where x represents the xth column ofA We calculate the middle value between 119886ix and 119886i+1x as asplit point 119905ix The class information entropy of S is

Info (S) = minus 119896sum119895=1

119901119895 log2 (119901119895) (5)

where Pj represents the proportion of the category j samplesrelative to all samples and Sj is a subset of S constructed byall the elements of S belonging to class cj 119901j = |Sj||S|

When selecting feature 119905ix as the splitting node of thedecision tree the information entropy of S is

Info (A [x] 119905ix S) =10038161003816100381610038161003816119878119895lt119905119894119909 10038161003816100381610038161003816|119878| times Info (119878119895lt119905119894119909) +

10038161003816100381610038161003816119878119895gt119905119894119909 10038161003816100381610038161003816|119878|times Info (119878119895gt119905119894119909)

(6)

Then we compute the information gain

Gain (A [x] 119905ix S) = Info(S)-Info (A [x] 119905ix S) (7)

The split information entropy is

Split (A [x] 119905ix S) = minus(10038161003816100381610038161003816119878119886lt119905119894119909 10038161003816100381610038161003816|119878| times log2 (

10038161003816100381610038161003816119878119886lt119905119894119909 10038161003816100381610038161003816|119878| )

+10038161003816100381610038161003816119878119886gt119905119894 10038161003816100381610038161003816|119878| times log2(

10038161003816100381610038161003816119878119886gt119905119894119909 10038161003816100381610038161003816|119878| )) (8)

and the information gain ratio is

IGR = Gain (A [x] 119905119894119909 S)Split (A [x] 119905119894119909 S) (9)

In a column of attributes A[x] a suitable threshold 119905maxto divide A[x] into two intervals is A[x]1(119886i lt 119905max 119886i isinA[x]1) A[x]2(119886i gt 119905max 119886i isin A[x]2) This split producesthe maximal information gain ratio This 119905max is then selectedas the parent node to generate two children This processis conducted recursively to build a decision tree until atermination criterion is matched

In the reasoning process if a samplersquos attribute value fallsinto a small region around 119905ix after it is disturbed by noiseit can easily be misclassified As shown in Figure 5 supposewe have a sample whose real attribute values are a1=0977and a2=0827 however due to noise during collection a1 ischanged to 0973 In a traditional decision tree this samplewould be misclassified as 1198882

To avoid this fragility when constructing a decision treewe utilize a fuzzy scheme [63] In a traditional decision treewhen a samplersquos attribute a is greater than 119905ix the samplebelongs to either 119888m or 119888n We modify this crisp classifyingmethod by selecting a neighborhood threshold 120576 around thesplit point 119905ix as shown in Figures 6(a) and 6(b) so that theclassification function becomes

C (119904) = 119905119894 + 120576 minus a2120576 times c119898 119886 minus 119905119894 + 1205762120576 times c119899 (10)

where C(s) is a classification function that maps sample s toone of two weighted classes

BioMed Research International 7

Input the labeled set LBThe threshold 120599 of the confidence T is random treesrsquo number

ProcessInitialize a random forest containing T treesfor i in 1 T

ei0 = 05Wi0 = 0

endfort = 0Loop until all of the trees in Random Forest unchanged

t = t+1z = a new random vectorfor i in 1 T119890119894119905 = ClassificationErrorRate(119867119894L)1198711198611015840119894119905 = 120601

if(119890it lt 119890it-1)1198801015840119894119905 = 119878119886119898119901119897119890119889(119866 (119911) 119890119894119905minus1119882119894119905minus1119890119894119905 )for 119909u in1198801015840119894119905

if (119872119886119909119862119900119899119891119894119889119890119899119888119890(119864i 119909u) gt 120579)1198711198611015840119894119905 = cup(119909u 119864i(119909u))119882119894119905 = 119882119894119905 +119872119886119909119862119900119899119891119894119889119890119899119888119890(119864i 119909u)endfor

endforfor i in 1 T

if(119890119894119905119882119894119905 lt 119890119894119905minus1119882119894119905minus1)119890i = LearnRandomTree(119871119861 cup 1198711198611015840119894119905)endfor

endLoopOutput E lowast (x) = argmaxsumfuzzy classes set119910 isin 119897119886119887119890119897 i 119890i(119909) = fuzzy classes set

Algorithm 1 G2C-CAD

As shown in Figure 6(c) along one branch the finalclassification 119888i is calculated as follows

119888i = 119898prod119895=1

119905119895 + 120576 minus a1198952120576 left branch of 119905119895119886119895 minus 119905119895 + 1205762120576 right branch of 119905119895

(11)

where 119888i is one class from 1198881 1198882 119888kWhen 120576 = 01 under the fuzzy classifying scheme

the classification result of disturbed example s is C(119904) =0253lowast1198881 0258lowast1198882 049lowast1198883 It is obvious that the maximalprobability of the final decision result is max(C(s)) = 1198883Based on this classification example we can see that the fuzzydecision scheme is more robust to noise

The classification result of the fuzzy Co-forest is amultiple-label probability distribution of the union of thedecision treesrsquo output we consider the class with the highestprobability as the final output

Let LB denote the labeled set G(z) denotes the process ofgenerating a new fake sign-image patch utilizing the trainedDCGAN There are N classifiers in the Co-forest ensemble119864lowast 119890i (i = 1 N) We denote one of the classifiers inensemble 119864i as the concomitant ensemble of 119890i and createa subensemble that includes all the classifiers except 119890i For anewly generated sample from G if the max fuzzy vote sum

of the classifiers in concomitant ensemble 119864i exceeds a presetthreshold 120579 the sample will be copied to a newly labeled set1198711198611015840119894 with the new assigned label Based on [64] the processiterates until 119890119894119905minus1119882119894119905minus1119890119894119905 is larger than 119882119894119905Then the set119871119861 cup 1198711198611015840119894119905 is used to refine 119890i 119890 denotes the classificationerror rate and 119882119894119905 = sum119898119894119905119895=0119908119894119905119895 where 119908119894119905119895 is the predictedconfidence of 119864119894 on 1198711198611015840119894119905 and119898119894119905 is the size of set 1198711198611015840119894119905

Based on the fuzzy decision tree scheme we construct thefuzzy Co-forest as shown in Algorithm 1

4 Experiments

41 Datasets We collected sample instances from both theLIDC-IDRI and LISS datasets LIDC-IDRI [65] consists ofpulmonary medical image files (such as CT scans and X-rays) with corresponding pathological annotations The datawere collected by the National Cancer Institute to studyearly cancer detection in high-risk populations LISS [21]consists a set of CISLs collected by the Cancer Institute andHospital at the Chinese Academy ofMedical Sciences and theBeijing Institute of Technology intended for computer-aideddetection and diagnosis research andmedical education LISScontains 271 CT scans and 677 abnormal regions includingnine categories of CISLs

8 BioMed Research International

3 6 9 12 15 18 21 24 27 30 33 36 39 42 45 48 54 101

Diameter length summary

0100200300400500600700

Figure 7 The diameter-size distribution of the nodules in the LIDC-IDRI dataset

Table 1 The scheme of sign selection

Used as signs AbandonedSubtlety - -Internal structure - -Calcification =4 ltgt4Sphericity - -Margin - -Lobulation gt=4 lt4Spiculation gt=4 lt4Texture lt=2 gt2

42 LIDC-IDRI Instances In the LIDC-IDRI CT imagingslices most of the annotated nodules have diameters less than32 pixels as shown in Figure 7 Therefore we choose 32 times 32as the input ROI size

A higher degree of lobulation speculation and nonsolidtexture signs indicates a greater probability of malignantnodules [66] A calcification sign usually indicates a benignnodule [66ndash68] except when it has a noncentral appearanceSigns of subtlety [69] internal structure [27] sphericity andmargin [33] have not been clearly proven to have a strongrelationship with malignancy To simplify the comparisonsin this experiment we selected nodules from LIDC withcalcification =4 (noncentral calcification) lobulation gt=4spiculation gt=4 texture lt=2 and malignancy gt=3 as exper-imental instances based on the selection rules shown inTable 1

LIDC-IDRI contains 21057 annotated nodules Weselected the 4 category nodules with signs such as noncen-tral calcification lobulation speculation and nonsolidGGOtexture that have a high prevalence of malignancy as theexperimental objects For the probability of malignancy inthese nodules we adopted the average value of 4 radiologistsrsquoscores The noncentral calcification lobulation spiculationand nonsolidGGO texture signs are illustrated in Table 2 Inaddition we randomly extracted 32 times 32-pixel image patchesfrom slices not annotated by any radiologist as negativesamples In total we used five types of image blocks in ourexperiment

Based on the center point of the merged regions anno-tated by 4 experts we extracted 32 times 32-pixel image blocksas the experimental input by following the selection criteriashown in Table 1 Among these samples we ensured that

a given category of ROI patches for a single patient doesnot appear in any two subsets simultaneously which helpedensure that the specificity of the trained individual networksis as high as possible

43 LISS Instances From LISS we selected signs of lobula-tion spiculation pleural indentation and GGO associatedwith malignant lung cancer [70ndash74] as the experimentalobjects The number of samples in each category is shown inTable 3

44 Evaluation Criteria To evaluate the performance ofthe algorithm presented in this paper we considered thefollowing criteria

(1) ROC The Receiver Operator Characteristic curve(ROC) is a method that comprehensively and simul-taneously reflects the sensitivity and specificity of theclassification result By comparing the classificationresults of different samples with the annotation labelsa series of sensitivity and specificity scores is cal-culated Then a curve is drawn using sensitivity asthe ordinate and 1 minus specificity as the abscissa Alarger area under the curve (AUC) indicates a higherdiagnostic accuracy On the ROC curve the pointclosest to the top left of the coordinate diagram is thecritical value that reflects the highest sensitivity andspecificity

(2) Confusion matrix A confusion matrix is also calledan error matrix and it is a visual representation of theclassification effect A confusion matrix can be usedto describe the relationship between the real categoryattribute of the sample data and the recognition resultIt is a method for evaluating classifier performanceand iswidely used in pattern recognition A confusionmatrix is also a performance evaluation method thatscholars often use when solving practical applicationproblems

45 Experimentation From the LIDC-IDRI database weacquired 590 noncentral calcification 565 lobulation 576spiculation 545 nonsolidGGO texture sign patches and2500 negative image patchesThe experiment was conductedaccording to the following steps

(1) Set the initial values of T 119890 and 120579 to 6 05 and 06respectively

BioMed Research International 9

Table 2 Sign samples of calcification lobulation spiculation and nonsolidGGO texture

Noncentral calcification Lobulation Spiculation NonsolidGGO texture

Subtlety 467 50 50 40 40 50 367 30Internal structure 10 10 10 10 10 10 10 10Calcification 40 40 60 60 60 60 60 60Sphericity 333 267 425 40 50 50 433 40Margin 367 467 45 40 20 40 267 10Lobulation 167 20 425 40 40 10 267 10Spiculation 133 133 225 40 40 50 167 20Texture 50 467 475 40 50 40 133 10

Table 3 Number of selected instances of LISS

CISL Category of Lesion RegionsGGO 45Lobulation 41Spiculation 29Pleural Indentation 45Negative 80

(2) Train the GAN until the discrepancy cost reaches abalance as illustrated in Figure 8

(3) Train a primary fuzzy Co-forest based on the originalsamples utilizing the features exported from thetrained DCGAN discriminator in Step (2)

(4) Input a random vector to DCGAN and transmit thegenerated features to the concomitant random fuzzydecision forest119864i in the primary Co-forest until119882119894119905 ge119890119894119905minus1119882119894119905minus1119890119894119905 for all classesIn this process if the maximal weight of the fuzzylabel from 119864i exceeds the threshold 120579 store bothfeatureswith thematched label otherwise discard thegenerated image and generate a new image

(5) Retrain the corresponding tree 119890i of 119864i(6) Test the performance of the system

As a comparison we trained a C45 random forest accordingto the same schemewhich has been used byG2C-CAD as thebaseline method

We also conducted experiments on samples obtainedfrom LISS

5 Results

We divided the dataset into two parts 90 for training and10 for validation In this way the nodule distribution in thevalidation subsets is consistent with the nodule distributionof the original dataset according to the radiologistsrsquo consensusof their evaluations at the nodule levelThe number of trees intheCo-forest classificationmodelwas 6 primary trainingwasconducted on the training data and finally the system was

further validated on the remaining 10 The sensitivity andspecificity of each class instance was calculated and comparedusing the ROC curves shown in Figure 9

In Figure 9 theAUCsof noncentral calcification negativeimage samples lobulation spiculation and nonsolidGGOtexture are 0946 0939 0912 0908 and 0887 respectivelyFrom the curves in Figure 9 we can see that the G2C-CADsystem achieves the highest overall classification accuracy onthe calcification sign Compared to noncentral calcificationand negative image samples the AUCs of lobulation spicula-tion and nonsolidGGO texture are relatively lower

To show the underlying classification error distribution aconfusion matrix of the 5 classification results is presented inFigure 10

The diagonal numbers in the confusion matrix representthe recognition accuracy rates of the corresponding categoryand the nondiagonal elements are misclassification rates iethe ratio of other category test samples that weremisclassifiedto this class From Figure 10 we can see that the noncentralcalcification sign has the highest classification accuracywhile the nonsolidGGO texture sign has the highest mis-classification rate Misclassified spiculation signs are mostlyrecognized as lobulation By comparing the test sampleswe find that many of these two types of samples have verysimilar textures From the confusion matrix we can alsosee that most misclassifications occur between lobulationspiculation and nonsolidGGO texture signs

6 Discussion

We performed a performance comparison on each categorybetween G2C-CAD and the C45 random forest modelusing the confusion matrix First we constructed confusionmatrices reflecting the accuracies of G2C-CAD and C45Then we calculated the difference matrix for those twoconfusion matrices as shown in Figure 11 The float numberson the diagonal represent the differences between G2C-CAD and C45 regarding their classification accuracy forthe corresponding category while the float numbers on thenondiagonal represent their differences in the classificationerror rate showing the numbers for each class that weremisclassified into the corresponding category

10 BioMed Research International

100000 200000 300000 4000000iteration

minus70

minus60

minus50

minus40

minus30

minus20

minus10

0

trai

n di

sc co

st

Figure 8 Training discrepancy cost of DCGAN

ROC Curves

AUCNon-central Calcification A = 0946Negative A = 0939Lobulation A = 0912Spiculation A = 0908Non-solidGGO Texture A = 0887

00

2

4

6

8

10

Sens

itivi

ty

2 4 6 8 10001 - Specificity

Figure 9 ROC curves for the 5 categories of sign ROI recognition results

Consequently positive numbers on the diagonal andlarger values indicate that G2C-CAD achieved a betterperformance than that of C45 For the elements that arenot on the diagonal the situation is the opposite As shownin Figure 11 no positive values occur in the nondiagonalelements which means that G2C-CAD possesses greaterdiscrimination ability between each category The elementvalues on the diagonal are all positive and the average valueof the diagonal numbers in the difference confusion matrixis 0144 demonstrating that our method has a better overallperformance

To verify the effectiveness of the fuzzy algorithm we alsoconducted multiclassification experiments using G2C-CADwithout the fuzzy algorithm and compared the performancesof the two algorithms using a difference confusion matrix asshown in Figure 12

FromFigure 12 we can see that the discrimination perfor-mance of the CAD system employing the fuzzy algorithm isobviously better than that of the nonfuzzy one Furthermoreas shown in Figure 12 the CAD performance of the fuzzyalgorithm better distinguishes the difficult lobulation signfrom the spiculation sign

BioMed Research International 11

Cal 0933 005 0011 0002 0004

Lob 0 0907 006 0026 0007

Spic 0 0077 0891 0032 0

Tex 001 0064 0046 0878 0002

NegCa

l

Lob

Spic

Tex

Neg

0 0034 0028 0012 0926

Figure 10 Confusion matrix of the predicted results

Cal 0119 minus0051 minus00039 minus00623 minus00018

Lob minus00183 01286 minus00994 minus00101 minus00008

Spic minus00067 minus01065 01317 minus0028 minus0024

Tex minus00422 minus00627 minus01119 02175 minus00007

Neg

Cal

Lob

Spic

Tex

Neg

minus00002 minus006 minus00135 minus00495 01232

Figure 11 Performance difference matrix of G2C-CAD and C45

The experimental result on LISS shows higher perfor-mance the areas under the ROC curve of GGO lobulationspiculation pleural indentation and negative image samplesare 0972 0964 0941 0967 and 0953 respectively Bycomparing the training dataset and test samples we foundthat the main reason may be that the samples in differentcategories are more separable visually

7 Conclusion

In this paper by coupling a GAN with a semisupervisedlearning approach we proposed a G2C-CAD method todetect signs that are highly correlated with malignant pul-monary nodules We first trained a DCGAN on a smallsample set Then we extracted the features from the CNNdiscriminator trained with the training samples and usedthem to train a primary fuzzy Co-forest classifier Then weuse the trained DCGAN to generate large amounts of realistic

fake samples Based on these fake samples we conductedsemisupervised learning with the fuzzy Co-forest and finallyobtained a classifierwith excellent performance By validatingon the LIDC dataset the area under the ROC curve for fivesign types noncentral calcification negative image sampleslobulation spiculation and nonsolidGGO texture reached0946 0939 0912 0908 and 0887 respectively On the LISSdataset the proposed system showed comprehensively higherclassification performances than those of a trained C45classifier The experimental results show that the proposedG2C-CAD is an appropriate method for solving the problemof insufficient samples in the medical image analysis fieldMoreover our system can also be used to establish a trainingsample library for CAD classification diagnosis which holdsgreat significance for future medical image analysis

In future work we plan to combine this method withmultiple-instance learning to performweak supervised learn-ing directly on CT slice images or to extend the algorithm

12 BioMed Research International

00083 minus0002 minus00023 minus00032 minus00008

minus00013 00159 minus00074 minus00061 minus00011

00011 minus00044 00122 minus00042 minus00025

minus00002 minus00031 minus00029 00074 minus00012

minus00008 minus00036 minus00013 minus00039 00096

Cal

Lob

Spic

Tex

NegCa

l

Lob

Spic

Tex

Neg

Figure 12 Performance difference matrix of fuzzy G2C-CAD and nonfuzzy G2C-CAD

making it suitable for use in the 3D medical image classifica-tion field

Data Availability

The data used to support the findings of this study areavailable from the corresponding author upon request

Disclosure

The publicly accessible medical images database of LIDCIDRI is used

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work was supported in part by National Natural ScienceFoundation of China (Grant nos 60973059 and 81171407) andProgram for New Century Excellent Talents in University ofChina (Grant no NCET-10-0044)

References

[1] ldquoCancersrdquo httpwwwwhointennews-roomfact-sheetsde-tailcancer

[2] ldquoCancer stat facts common cancer sitesrdquo httpsseercancergovstatfactshtmlcommonhtml

[3] S Diederich D Wormanns M Semik et al ldquoScreening forearly lung cancer with low-dose spiral CT prevalence in 817asymptomatic smokersrdquo Radiology vol 222 no 3 pp 773ndash7812002

[4] J F Williamson ldquoBrachytherapy technology and physics prac-tice since 1950 A half-century of progressrdquo Physics in Medicineand Biology vol 51 no 13 article no R18 pp R303ndashR325 2006

[5] J F Williamson S K Das M S Goodsitt and J O DeasyldquoIntroducing the medical physics dataset articlerdquo MedicalPhysics vol 44 no 2 pp 349-350 2017

[6] S Balik E Weiss N Jan et al ldquoEvaluation of 4-dimensionalcomputed tomography to 4-dimensional cone-beam computedtomography deformable image registration for lung canceradaptive radiation therapyrdquo International Journal of RadiationOncology ∙ Biology ∙ Physics vol 86 no 2 pp 372ndash379 2013

[7] C F Mountain ldquoRevisions in the international system forstaging lung cancerrdquo Chest vol 111 no 6 pp 1710ndash1717 1997

[8] M T van Rens J M van den Bosch A Brutel de la Riviereand H R Elbers ldquoPrognostic assessment of 2361 patients whounderwent pulmonary resection for non-small cell lung cancerstage I II and IIIArdquo CHEST vol 117 no 2 pp 374ndash379 2000

[9] E F Patz S Rossi D H Harpole J E Herndon and P CGoodman ldquoCorrelation of tumor size and survival in patientswith stage IA non-small cell lung cancerrdquo Chest vol 117 no 6pp 1568ndash1571 2000

[10] C I Henschke P Boffetta O Gorlova R Yip J O DeLanceyand M Foy ldquoAssessment of lung-cancer mortality reductionfrom CT screeningrdquo Lung Cancer vol 71 no 3 pp 328ndash3322011

[11] D Kumar A Wong and D A Clausi ldquoLung nodule classifi-cation using deep features in CT imagesrdquo in Proceedings of the12th Conference on Computer and Robot Vision CRV 2015 pp133ndash138 Canada June 2015

[12] M Kaneko K Eguchi and H Ohmatsu ldquoPeripheral lungcancer screening and detection with low-dose spiral CT versusradiographyrdquo Radiology vol 201 no 3 pp 798ndash802 1996

[13] S Swensen J Jett J Sloan et al ldquoScreening for lung cancerwith low-dose spiral computed tomographyrdquo American Journalof Respiratory and Critical Care Medicine vol 165 no 4 pp508ndash513 2002

[14] E Paci D Puliti A Lopes Pegna et al ldquoMortality survivaland incidence rates in the ITALUNG randomised lung cancerscreening trialrdquoThorax vol 72 no 9 pp 825ndash831 2017

[15] S Iwano TNakamura Y Kamioka andT Ishigaki ldquoComputer-aided diagnosis A shape classification of pulmonary nodulesimaged by high-resolution CTrdquo Computerized Medical Imagingand Graphics vol 29 no 7 pp 565ndash570 2005

BioMed Research International 13

[16] A Bernard and H Shennib ldquoResection of pulmonary nodulesusing video-assisted thoracic surgeryrdquo The Annals of ThoracicSurgery vol 61 no 1 pp 202ndash205 1996

[17] M J Mack S R Hazelrigg R J Landreneau and T E AcuffldquoThoracoscopy for the diagnosis of the indeterminate solitarypulmonary nodulerdquoThe Annals of Thoracic Surgery vol 56 no4 pp 825ndash832 1993

[18] S Siegelman E Zerhouni F Leo N Khouri and F StitikldquoCT of the solitary pulmonary nodulerdquo American Journal ofRoentgenology vol 135 no 1 pp 1ndash13 1980

[19] B A Keagy P J K Starek G F Murray J W Battaglini ME Lores and B R Wilcox ldquoMajor pulmonary resection forsuspected but unconfirmedmalignancyrdquoTheAnnals ofThoracicSurgery vol 38 no 4 pp 314ndash316 1984

[20] J Collins ldquoCT signs and patterns of lung diseaserdquo RadiologicClinics of North America vol 39 no 6 pp 1115ndash1135 2001

[21] G Han X Liu F Han et al ldquoThe LISSmdasha public databaseof common imaging signs of lung diseases for computer-aideddetection and diagnosis research and medical educationrdquo IEEETransactions on Biomedical Engineering vol 62 no 2 pp 648ndash656 2015

[22] G Han X LiuN Q Soomro et al ldquoEmpirical driven automaticdetection of lobulation imaging signs in lung CTrdquo BioMedResearch International vol 2017 2017

[23] C-Z Shi Q Zhao L-P Luo and J-X He ldquoSize of solitarypulmonary nodule was the risk factor of malignancyrdquo Journalof Thoracic Disease vol 6 no 6 pp 668ndash676 2014

[24] J J Erasmus J E Connolly H P McAdams and V L RogglildquoSolitary pulmonary nodules part I Morphologic evaluationfor differentiation of benign and malignant lesionsrdquo Radio-Graphics vol 20 no 1 pp 43ndash58 2000

[25] K Furuya S Murayama H Soeda et al ldquoNew classification ofsmall pulmonary nodules by margin characteristics on high-resolution CTrdquo Acta Radiologica vol 40 no 5 pp 496ndash5042016

[26] Z Yang S Sone S Takashima et al ldquoHigh-resolution CTanalysis of small peripheral lung adenocarcinomas revealed onscreening helical CTrdquo American Journal of Roentgenology vol176 no 6 pp 1399ndash1407 2001

[27] M F McNitt-Gray E M Hart N Wyckoff J W Sayre J GGoldin and D R Aberle ldquoA pattern classification approachto characterizing solitary pulmonary nodules imaged on highresolution CT Preliminary resultsrdquoMedical Physics vol 26 no6 pp 880ndash888 1999

[28] V Ambrosini S Nicolini P Caroli et al ldquoPETCT imaging indifferent types of lung cancer an overviewrdquoEuropean Journal ofRadiology vol 81 no 5 pp 988ndash1001 2012

[29] F Li S Sone H Abe H MacMahon and K Doi ldquoMalignantversus benign nodules at CT screening for lung cancer com-parison of thin-section CT findingsrdquo Radiology vol 233 no 3pp 793ndash798 2004

[30] B Ganeshan E Panayiotou K Burnand S Dizdarevic and KMiles ldquoTumour heterogeneity in non-small cell lung carcinomaassessed by CT texture analysis a potential marker of survivalrdquoEuropean Radiology vol 22 no 4 pp 796ndash802 2012

[31] F Li S Sone H Abe H MacMahon S G Armato and KDoi ldquoLung cancers missed at low-dose helical CT screening ina general population comparison of clinical histopathologicand imaging findingsrdquo Radiology vol 225 no 3 pp 673ndash6832002

[32] H Chen Y Xu Y Ma and B Ma ldquoNeural network ensemble-based computer-aided diagnosis for differentiation of lungnodules on CT images clinical evaluationrdquoAcademic Radiologyvol 17 no 5 pp 595ndash602 2010

[33] C V Zwirewich S Vedal R R Miller and N L Muller ldquoSoli-tary pulmonary nodule high-resolution CT and radiologic-pathologic correlationrdquo Radiology vol 179 no 2 pp 469ndash4761991

[34] Y J Jeong C A Yi and K S Lee ldquoSolitary pulmonary nodulesdetection characterization and guidance for further diagnosticworkup and treatmentrdquoAmerican Journal of Roentgenology vol188 no 1 pp 57ndash68 2007

[35] A N Khan H H Al-Jahdali C M Allen K L Irion S AlGhanem and S S Koteyar ldquoThe calcified lung nodule whatdoes it meanrdquo Annals of Thoracic Medicine vol 5 no 2 pp67ndash79 2010

[36] H C BeckerW J Nettleton P HMeyers JW Sweeney and CMNice ldquoDigital computer determination of amedical diagnos-tic index directly from chest X-ray imagesrdquo IEEE Transactionson Biomedical Engineering vol BME-11 no 3 pp 67ndash72 1964

[37] P H Meyers C M Nice H C Becker W J Nettleton J WSweeney and G R Meckstroth ldquoAutomated computer analysisof radiographic imagesrdquo Radiology vol 83 no 6 pp 1029ndash10341964

[38] B Van Ginneken B Ter Haar Romeny and M ViergeverldquoComputer-aided diagnosis in chest radiography a surveyrdquoIEEE Transactions on Medical Imaging vol 20 no 12 pp 1228ndash1241 2001

[39] K Suzuki F Li S Sone and K Doi ldquoComputer-aided diagnos-tic scheme for distinction between benign and malignant nod-ules in thoracic low-dose CT by use ofmassive training artificialneural networkrdquo IEEE Transactions on Medical Imaging vol 24no 9 pp 1138ndash1150 2005

[40] K Kanazawa Y Kawata N Niki et al ldquoComputer-aideddiagnosis for pulmonary nodules based on helical CT imagesrdquoComputerized Medical Imaging and Graphics vol 22 no 2 pp157ndash167 1998

[41] F Han H Wang G Zhang et al ldquoTexture feature analysis forcomputer-aided diagnosis on pulmonary nodulesrdquo Journal ofDigital Imaging vol 28 no 1 pp 99ndash115 2014

[42] H Hu Q Wang H Tang L Xiong and Q Lin ldquoMulti-slicecomputed tomography characteristics of solitary pulmonaryground-glass nodules Differences between malignant andbenignrdquoThoracic Cancer vol 7 no 1 pp 80ndash87 2016

[43] G-Y Zheng X-B Liu and G-H Han ldquoSurvey on medicalimage computer aided detection and diagnosis systemsrdquo RuanJian Xue BaoJournal of Software vol 29 no 5 pp 1471ndash15142018

[44] F H Edwards P S Schaefer S Callahan G M Graeber andR A Albus ldquoBayesian statistical theory in the preoperativediagnosis of pulmonary lesionsrdquoCHEST vol 92 no 5 pp 888ndash891 1987

[45] H Krewer B Geiger L O Hall et al ldquoEffect of texture featuresin computer aided diagnosis of pulmonary nodules in low-dosecomputed tomographyrdquo in Proceedings of the IEEE InternationalConference on Systems Man and Cybernetics (SMC rsquo13) pp3887ndash3891 IEEE Manchester UK October 2013

[46] C Jacobs E M van Rikxoort E T Scholten et al ldquoSolidpart-solid or non-solid classification of pulmonary nodulesin low-dose chest computed tomography by a computer-aideddiagnosis systemrdquo Investigative Radiology vol 50 no 3 pp 168ndash173 2015

14 BioMed Research International

[47] X Liu L Ma L Song Y Zhao X Zhao and C Zhou ldquoRec-ognizing common CT imaging signs of lung diseases througha new feature selection method based on fisher criterion andgenetic optimizationrdquo IEEE Journal of Biomedical and HealthInformatics vol 19 no 2 pp 635ndash647 2015

[48] T Sun J Wang X Li et al ldquoComparative evaluation of supportvectormachines for computer aided diagnosis of lung cancer inCT based on a multi-dimensional data setrdquo Computer Methodsand Programs in Biomedicine vol 111 no 2 pp 519ndash524 2013

[49] H Arimura S Katsuragawa K Suzuki et al ldquoComputerizedscheme for automated detection of lung nodules in low-dose computed tomography images for lung cancer screeningrdquoAcademic Radiology vol 11 no 6 pp 617ndash629 2004

[50] T Sun R Zhang J Wang X Li and X Guo ldquoComputer-aideddiagnosis for early-stage lung cancer based on longitudinal andbalanced datardquo PLoS ONE vol 8 no 5 Article ID e63559 2013

[51] F Han C L Novak S Aylward et al ldquoA new 3D texture featurebased computer-aided diagnosis approach to differentiate pul-monary nodulesrdquo Proceedings of the SPIE Article ID 86702Z2013

[52] K-L Hua C-H Hsu S C Hidayati W-H Cheng and Y-J Chen ldquoComputer-aided classification of lung nodules oncomputed tomography images via deep learning techniquerdquoOncoTargets andTherapy 2015

[53] Y LeCun L Bottou Y Bengio and P Haffner ldquoGradient-basedlearning applied to document recognitionrdquo Proceedings of theIEEE vol 86 no 11 pp 2278ndash2323 1998

[54] W Shen M Zhou F Yang C Yang and J Tian ldquoMulti-scaleconvolutional neural networks for lung nodule classificationrdquoin Proceedings of the International Conference on InformationProcessing in Medical Imaging vol 24 pp 588ndash599 Springer2015

[55] N Tajbakhsh and K Suzuki ldquoComparing two classes of end-to-end machine-learning models in lung nodule detection andclassificationMTANNs vs CNNsrdquo Pattern Recognition vol 63pp 476ndash486 2017

[56] H Greenspan B Van Ginneken and R M Summers ldquoGuesteditorial deep learning inmedical imaging overview and futurepromise of an exciting new techniquerdquo IEEE Transactions onMedical Imaging vol 35 no 5 pp 1153ndash1159 2016

[57] I J Goodfellow J Pouget-Abadie M Mirza et al ldquoGenerativeadversarial netsrdquo in Proceedings of the 28th Annual Conferenceon Neural Information Processing Systems 2014 NIPS 2014 pp2672ndash2680 Canada December 2014

[58] M J Chuquicusma S Hussein J Burt and U Bagci ldquoHowto fool radiologists with generative adversarial networks Avisual turing test for lung cancer diagnosisrdquo in Proceedings of the2018 IEEE 15th International Symposium Biomedical Imagingpp 240ndash244 IEEE 2018

[59] X Zhu ldquoSemi-supervised learning literature surveyrdquo ComputerScience University ofWisconsin-Madison vol 2 no 3 p 4 2006

[60] M Li andZ-H Zhou ldquoImprove computer-aided diagnosis withmachine learning techniques usingundiagnosed samplesrdquo IEEETransactions on Systems Man and Cybernetics Systems vol 37no 6 pp 1088ndash1098 2007

[61] A Radford L Metz and S Chintala ldquoUnsupervised represen-tation learning with deep convolutional generative adversarialnetworksrdquo 2015 httpsarxivorgabs07123011

[62] Y Liu Z Xing C Deng P Li and M Guo ldquoAutomaticallydetecting lung nodules based on shape descriptor and semi-supervised learningrdquo in Proceedings of the 2010 International

Conference on Computer Application and System ModelingICCASM 2010 pp V1647ndashV1650 China October 2010

[63] Y Peng and P Flach ldquoSoft discretization to enhance thecontinuous decision tree inductionrdquo Integrating Aspects of DataMining Decision Support and Meta-Learning vol 1 no 34 pp109ndash118 2001

[64] N Settouti M E Habib Daho M E Amine Lazouni and MA Chikh ldquoRandom forest in semi-supervised learning (Co-Forest)rdquo in Proceedings of the 2013 8th International Workshopon Systems Signal Processing and Their Applications WoSSPA2013 pp 326ndash329 Algeria May 2013

[65] S G Armato III G McLennan L Bidaut et al ldquoThe lungimage database consortium (lidc) and image database resourceinitiative (idri) a completed reference database of lung noduleson ct scansrdquoMedical Physics vol 38 no 2 pp 915ndash931 2011

[66] D E Ost and M K Gould ldquoDecision making in patientswith pulmonary nodulesrdquo American Journal of Respiratory andCritical Care Medicine vol 185 no 4 pp 363ndash372 2012

[67] M M Goodsitt H Chan T W Way S C Larson E GChristodoulou and J Kim ldquoAccuracy of the CT numbersof simulated lung nodules imaged with multi-detector CTscannersrdquoMedical Physics vol 33 no 8 pp 3006ndash3017 2006

[68] MMGoodsittHChanTWWayM J Schipper S C Larsonand E G Christodoulou ldquoQuantitative CT of lung nodulesdependence of calibration on patient body size anatomicregion and calibration nodule size for single- and dual-energytechniquesrdquoMedical Physics vol 36 no 7 pp 3107ndash3121 2009

[69] P Opulencia D S Channin D S Raicu and J D FurstldquoRadLex and lung nodule image featuresrdquo Journal of DigitalImaging vol 24 no 2 pp 256ndash270 2011

[70] J-C Wang S Sone L Feng et al ldquoRapidly growing smallperipheral lung cancers detected by screening CT Correlationbetween radiological appearance and pathological featuresrdquoBritish Journal of Radiology vol 73 no 873 pp 930ndash937 2000

[71] S Takashima Y Maruyama M Hasegawa A Saito M Hani-uda and M Kadoya ldquoHigh-resolution CT features prognosticsignificance in peripheral lung adenocarcinoma with bronchi-oloalveolar carcinoma componentsrdquo Respiration vol 70 no 1pp 36ndash42 2003

[72] R M Lindell T E Hartman S J Swensen et al ldquoFive-yearlung cancer screening experience CT appearance growth ratelocation and histologic features of 61 lung cancersrdquo Radiologyvol 242 no 2 pp 555ndash562 2007

[73] T J Kim D H Han K N Jin and K Won Lee ldquoLung cancerdetected at cardiac CT prevalence clinicoradiologic featuresand importance of fullndashfield-of-view imagesrdquo Radiology vol255 no 2 pp 369ndash376 2010

[74] MVazquez D Carter E Brambilla et al ldquoSolitary andmultipleresected adenocarcinomas after CT screening for lung cancerhistopathologic features and their prognostic implicationsrdquoLung Cancer vol 64 no 2 pp 148ndash154 2009

Stem Cells International

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

MEDIATORSINFLAMMATION

of

EndocrinologyInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Disease Markers

Hindawiwwwhindawicom Volume 2018

BioMed Research International

OncologyJournal of

Hindawiwwwhindawicom Volume 2013

Hindawiwwwhindawicom Volume 2018

Oxidative Medicine and Cellular Longevity

Hindawiwwwhindawicom Volume 2018

PPAR Research

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Immunology ResearchHindawiwwwhindawicom Volume 2018

Journal of

ObesityJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational and Mathematical Methods in Medicine

Hindawiwwwhindawicom Volume 2018

Behavioural Neurology

OphthalmologyJournal of

Hindawiwwwhindawicom Volume 2018

Diabetes ResearchJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Research and TreatmentAIDS

Hindawiwwwhindawicom Volume 2018

Gastroenterology Research and Practice

Hindawiwwwhindawicom Volume 2018

Parkinsonrsquos Disease

Evidence-Based Complementary andAlternative Medicine

Volume 2018Hindawiwwwhindawicom

Submit your manuscripts atwwwhindawicom

Page 6: A Novel Computer-Aided Diagnosis Scheme on Small Annotated ...downloads.hindawi.com/journals/bmri/2019/6425963.pdf · BioMedResearchInternational 100 Reshape Deconv Deconv Deconv

6 BioMed Research International

Child node t2

a1=0973a1=0977

t1=0975

c3

(a)

t2=0826

c1 c2

(b)

Figure 5 Traditional classification process

2

Child node t2

a1=0973

t1=0975

a=0977

c3

(a)

t2=0826

c1 c2

2

(b)

lt0975gt0975

lt0826gt0826

c1c2

c3t2

t1

(c)

Figure 6 Here (a) and (b) are fuzzy classification processes and (c) is the decision tree

first step is to construct decision trees by utilizing the labeledsamples [63] Assume that the total number of samples is NLand that the classes are c1 c2 119888k k To build a decision treewe randomly select S samples These S samples are sorted onA[119909] by the values of 119886i where x represents the xth column ofA We calculate the middle value between 119886ix and 119886i+1x as asplit point 119905ix The class information entropy of S is

Info (S) = minus 119896sum119895=1

119901119895 log2 (119901119895) (5)

where Pj represents the proportion of the category j samplesrelative to all samples and Sj is a subset of S constructed byall the elements of S belonging to class cj 119901j = |Sj||S|

When selecting feature 119905ix as the splitting node of thedecision tree the information entropy of S is

Info (A [x] 119905ix S) =10038161003816100381610038161003816119878119895lt119905119894119909 10038161003816100381610038161003816|119878| times Info (119878119895lt119905119894119909) +

10038161003816100381610038161003816119878119895gt119905119894119909 10038161003816100381610038161003816|119878|times Info (119878119895gt119905119894119909)

(6)

Then we compute the information gain

Gain (A [x] 119905ix S) = Info(S)-Info (A [x] 119905ix S) (7)

The split information entropy is

Split (A [x] 119905ix S) = minus(10038161003816100381610038161003816119878119886lt119905119894119909 10038161003816100381610038161003816|119878| times log2 (

10038161003816100381610038161003816119878119886lt119905119894119909 10038161003816100381610038161003816|119878| )

+10038161003816100381610038161003816119878119886gt119905119894 10038161003816100381610038161003816|119878| times log2(

10038161003816100381610038161003816119878119886gt119905119894119909 10038161003816100381610038161003816|119878| )) (8)

and the information gain ratio is

IGR = Gain (A [x] 119905119894119909 S)Split (A [x] 119905119894119909 S) (9)

In a column of attributes A[x] a suitable threshold 119905maxto divide A[x] into two intervals is A[x]1(119886i lt 119905max 119886i isinA[x]1) A[x]2(119886i gt 119905max 119886i isin A[x]2) This split producesthe maximal information gain ratio This 119905max is then selectedas the parent node to generate two children This processis conducted recursively to build a decision tree until atermination criterion is matched

In the reasoning process if a samplersquos attribute value fallsinto a small region around 119905ix after it is disturbed by noiseit can easily be misclassified As shown in Figure 5 supposewe have a sample whose real attribute values are a1=0977and a2=0827 however due to noise during collection a1 ischanged to 0973 In a traditional decision tree this samplewould be misclassified as 1198882

To avoid this fragility when constructing a decision treewe utilize a fuzzy scheme [63] In a traditional decision treewhen a samplersquos attribute a is greater than 119905ix the samplebelongs to either 119888m or 119888n We modify this crisp classifyingmethod by selecting a neighborhood threshold 120576 around thesplit point 119905ix as shown in Figures 6(a) and 6(b) so that theclassification function becomes

C (119904) = 119905119894 + 120576 minus a2120576 times c119898 119886 minus 119905119894 + 1205762120576 times c119899 (10)

where C(s) is a classification function that maps sample s toone of two weighted classes

BioMed Research International 7

Input the labeled set LBThe threshold 120599 of the confidence T is random treesrsquo number

ProcessInitialize a random forest containing T treesfor i in 1 T

ei0 = 05Wi0 = 0

endfort = 0Loop until all of the trees in Random Forest unchanged

t = t+1z = a new random vectorfor i in 1 T119890119894119905 = ClassificationErrorRate(119867119894L)1198711198611015840119894119905 = 120601

if(119890it lt 119890it-1)1198801015840119894119905 = 119878119886119898119901119897119890119889(119866 (119911) 119890119894119905minus1119882119894119905minus1119890119894119905 )for 119909u in1198801015840119894119905

if (119872119886119909119862119900119899119891119894119889119890119899119888119890(119864i 119909u) gt 120579)1198711198611015840119894119905 = cup(119909u 119864i(119909u))119882119894119905 = 119882119894119905 +119872119886119909119862119900119899119891119894119889119890119899119888119890(119864i 119909u)endfor

endforfor i in 1 T

if(119890119894119905119882119894119905 lt 119890119894119905minus1119882119894119905minus1)119890i = LearnRandomTree(119871119861 cup 1198711198611015840119894119905)endfor

endLoopOutput E lowast (x) = argmaxsumfuzzy classes set119910 isin 119897119886119887119890119897 i 119890i(119909) = fuzzy classes set

Algorithm 1 G2C-CAD

As shown in Figure 6(c) along one branch the finalclassification 119888i is calculated as follows

119888i = 119898prod119895=1

119905119895 + 120576 minus a1198952120576 left branch of 119905119895119886119895 minus 119905119895 + 1205762120576 right branch of 119905119895

(11)

where 119888i is one class from 1198881 1198882 119888kWhen 120576 = 01 under the fuzzy classifying scheme

the classification result of disturbed example s is C(119904) =0253lowast1198881 0258lowast1198882 049lowast1198883 It is obvious that the maximalprobability of the final decision result is max(C(s)) = 1198883Based on this classification example we can see that the fuzzydecision scheme is more robust to noise

The classification result of the fuzzy Co-forest is amultiple-label probability distribution of the union of thedecision treesrsquo output we consider the class with the highestprobability as the final output

Let LB denote the labeled set G(z) denotes the process ofgenerating a new fake sign-image patch utilizing the trainedDCGAN There are N classifiers in the Co-forest ensemble119864lowast 119890i (i = 1 N) We denote one of the classifiers inensemble 119864i as the concomitant ensemble of 119890i and createa subensemble that includes all the classifiers except 119890i For anewly generated sample from G if the max fuzzy vote sum

of the classifiers in concomitant ensemble 119864i exceeds a presetthreshold 120579 the sample will be copied to a newly labeled set1198711198611015840119894 with the new assigned label Based on [64] the processiterates until 119890119894119905minus1119882119894119905minus1119890119894119905 is larger than 119882119894119905Then the set119871119861 cup 1198711198611015840119894119905 is used to refine 119890i 119890 denotes the classificationerror rate and 119882119894119905 = sum119898119894119905119895=0119908119894119905119895 where 119908119894119905119895 is the predictedconfidence of 119864119894 on 1198711198611015840119894119905 and119898119894119905 is the size of set 1198711198611015840119894119905

Based on the fuzzy decision tree scheme we construct thefuzzy Co-forest as shown in Algorithm 1

4 Experiments

41 Datasets We collected sample instances from both theLIDC-IDRI and LISS datasets LIDC-IDRI [65] consists ofpulmonary medical image files (such as CT scans and X-rays) with corresponding pathological annotations The datawere collected by the National Cancer Institute to studyearly cancer detection in high-risk populations LISS [21]consists a set of CISLs collected by the Cancer Institute andHospital at the Chinese Academy ofMedical Sciences and theBeijing Institute of Technology intended for computer-aideddetection and diagnosis research andmedical education LISScontains 271 CT scans and 677 abnormal regions includingnine categories of CISLs

8 BioMed Research International

3 6 9 12 15 18 21 24 27 30 33 36 39 42 45 48 54 101

Diameter length summary

0100200300400500600700

Figure 7 The diameter-size distribution of the nodules in the LIDC-IDRI dataset

Table 1 The scheme of sign selection

Used as signs AbandonedSubtlety - -Internal structure - -Calcification =4 ltgt4Sphericity - -Margin - -Lobulation gt=4 lt4Spiculation gt=4 lt4Texture lt=2 gt2

42 LIDC-IDRI Instances In the LIDC-IDRI CT imagingslices most of the annotated nodules have diameters less than32 pixels as shown in Figure 7 Therefore we choose 32 times 32as the input ROI size

A higher degree of lobulation speculation and nonsolidtexture signs indicates a greater probability of malignantnodules [66] A calcification sign usually indicates a benignnodule [66ndash68] except when it has a noncentral appearanceSigns of subtlety [69] internal structure [27] sphericity andmargin [33] have not been clearly proven to have a strongrelationship with malignancy To simplify the comparisonsin this experiment we selected nodules from LIDC withcalcification =4 (noncentral calcification) lobulation gt=4spiculation gt=4 texture lt=2 and malignancy gt=3 as exper-imental instances based on the selection rules shown inTable 1

LIDC-IDRI contains 21057 annotated nodules Weselected the 4 category nodules with signs such as noncen-tral calcification lobulation speculation and nonsolidGGOtexture that have a high prevalence of malignancy as theexperimental objects For the probability of malignancy inthese nodules we adopted the average value of 4 radiologistsrsquoscores The noncentral calcification lobulation spiculationand nonsolidGGO texture signs are illustrated in Table 2 Inaddition we randomly extracted 32 times 32-pixel image patchesfrom slices not annotated by any radiologist as negativesamples In total we used five types of image blocks in ourexperiment

Based on the center point of the merged regions anno-tated by 4 experts we extracted 32 times 32-pixel image blocksas the experimental input by following the selection criteriashown in Table 1 Among these samples we ensured that

a given category of ROI patches for a single patient doesnot appear in any two subsets simultaneously which helpedensure that the specificity of the trained individual networksis as high as possible

43 LISS Instances From LISS we selected signs of lobula-tion spiculation pleural indentation and GGO associatedwith malignant lung cancer [70ndash74] as the experimentalobjects The number of samples in each category is shown inTable 3

44 Evaluation Criteria To evaluate the performance ofthe algorithm presented in this paper we considered thefollowing criteria

(1) ROC The Receiver Operator Characteristic curve(ROC) is a method that comprehensively and simul-taneously reflects the sensitivity and specificity of theclassification result By comparing the classificationresults of different samples with the annotation labelsa series of sensitivity and specificity scores is cal-culated Then a curve is drawn using sensitivity asthe ordinate and 1 minus specificity as the abscissa Alarger area under the curve (AUC) indicates a higherdiagnostic accuracy On the ROC curve the pointclosest to the top left of the coordinate diagram is thecritical value that reflects the highest sensitivity andspecificity

(2) Confusion matrix A confusion matrix is also calledan error matrix and it is a visual representation of theclassification effect A confusion matrix can be usedto describe the relationship between the real categoryattribute of the sample data and the recognition resultIt is a method for evaluating classifier performanceand iswidely used in pattern recognition A confusionmatrix is also a performance evaluation method thatscholars often use when solving practical applicationproblems

45 Experimentation From the LIDC-IDRI database weacquired 590 noncentral calcification 565 lobulation 576spiculation 545 nonsolidGGO texture sign patches and2500 negative image patchesThe experiment was conductedaccording to the following steps

(1) Set the initial values of T 119890 and 120579 to 6 05 and 06respectively

BioMed Research International 9

Table 2 Sign samples of calcification lobulation spiculation and nonsolidGGO texture

Noncentral calcification Lobulation Spiculation NonsolidGGO texture

Subtlety 467 50 50 40 40 50 367 30Internal structure 10 10 10 10 10 10 10 10Calcification 40 40 60 60 60 60 60 60Sphericity 333 267 425 40 50 50 433 40Margin 367 467 45 40 20 40 267 10Lobulation 167 20 425 40 40 10 267 10Spiculation 133 133 225 40 40 50 167 20Texture 50 467 475 40 50 40 133 10

Table 3 Number of selected instances of LISS

CISL Category of Lesion RegionsGGO 45Lobulation 41Spiculation 29Pleural Indentation 45Negative 80

(2) Train the GAN until the discrepancy cost reaches abalance as illustrated in Figure 8

(3) Train a primary fuzzy Co-forest based on the originalsamples utilizing the features exported from thetrained DCGAN discriminator in Step (2)

(4) Input a random vector to DCGAN and transmit thegenerated features to the concomitant random fuzzydecision forest119864i in the primary Co-forest until119882119894119905 ge119890119894119905minus1119882119894119905minus1119890119894119905 for all classesIn this process if the maximal weight of the fuzzylabel from 119864i exceeds the threshold 120579 store bothfeatureswith thematched label otherwise discard thegenerated image and generate a new image

(5) Retrain the corresponding tree 119890i of 119864i(6) Test the performance of the system

As a comparison we trained a C45 random forest accordingto the same schemewhich has been used byG2C-CAD as thebaseline method

We also conducted experiments on samples obtainedfrom LISS

5 Results

We divided the dataset into two parts 90 for training and10 for validation In this way the nodule distribution in thevalidation subsets is consistent with the nodule distributionof the original dataset according to the radiologistsrsquo consensusof their evaluations at the nodule levelThe number of trees intheCo-forest classificationmodelwas 6 primary trainingwasconducted on the training data and finally the system was

further validated on the remaining 10 The sensitivity andspecificity of each class instance was calculated and comparedusing the ROC curves shown in Figure 9

In Figure 9 theAUCsof noncentral calcification negativeimage samples lobulation spiculation and nonsolidGGOtexture are 0946 0939 0912 0908 and 0887 respectivelyFrom the curves in Figure 9 we can see that the G2C-CADsystem achieves the highest overall classification accuracy onthe calcification sign Compared to noncentral calcificationand negative image samples the AUCs of lobulation spicula-tion and nonsolidGGO texture are relatively lower

To show the underlying classification error distribution aconfusion matrix of the 5 classification results is presented inFigure 10

The diagonal numbers in the confusion matrix representthe recognition accuracy rates of the corresponding categoryand the nondiagonal elements are misclassification rates iethe ratio of other category test samples that weremisclassifiedto this class From Figure 10 we can see that the noncentralcalcification sign has the highest classification accuracywhile the nonsolidGGO texture sign has the highest mis-classification rate Misclassified spiculation signs are mostlyrecognized as lobulation By comparing the test sampleswe find that many of these two types of samples have verysimilar textures From the confusion matrix we can alsosee that most misclassifications occur between lobulationspiculation and nonsolidGGO texture signs

6 Discussion

We performed a performance comparison on each categorybetween G2C-CAD and the C45 random forest modelusing the confusion matrix First we constructed confusionmatrices reflecting the accuracies of G2C-CAD and C45Then we calculated the difference matrix for those twoconfusion matrices as shown in Figure 11 The float numberson the diagonal represent the differences between G2C-CAD and C45 regarding their classification accuracy forthe corresponding category while the float numbers on thenondiagonal represent their differences in the classificationerror rate showing the numbers for each class that weremisclassified into the corresponding category

10 BioMed Research International

100000 200000 300000 4000000iteration

minus70

minus60

minus50

minus40

minus30

minus20

minus10

0

trai

n di

sc co

st

Figure 8 Training discrepancy cost of DCGAN

ROC Curves

AUCNon-central Calcification A = 0946Negative A = 0939Lobulation A = 0912Spiculation A = 0908Non-solidGGO Texture A = 0887

00

2

4

6

8

10

Sens

itivi

ty

2 4 6 8 10001 - Specificity

Figure 9 ROC curves for the 5 categories of sign ROI recognition results

Consequently positive numbers on the diagonal andlarger values indicate that G2C-CAD achieved a betterperformance than that of C45 For the elements that arenot on the diagonal the situation is the opposite As shownin Figure 11 no positive values occur in the nondiagonalelements which means that G2C-CAD possesses greaterdiscrimination ability between each category The elementvalues on the diagonal are all positive and the average valueof the diagonal numbers in the difference confusion matrixis 0144 demonstrating that our method has a better overallperformance

To verify the effectiveness of the fuzzy algorithm we alsoconducted multiclassification experiments using G2C-CADwithout the fuzzy algorithm and compared the performancesof the two algorithms using a difference confusion matrix asshown in Figure 12

FromFigure 12 we can see that the discrimination perfor-mance of the CAD system employing the fuzzy algorithm isobviously better than that of the nonfuzzy one Furthermoreas shown in Figure 12 the CAD performance of the fuzzyalgorithm better distinguishes the difficult lobulation signfrom the spiculation sign

BioMed Research International 11

Cal 0933 005 0011 0002 0004

Lob 0 0907 006 0026 0007

Spic 0 0077 0891 0032 0

Tex 001 0064 0046 0878 0002

NegCa

l

Lob

Spic

Tex

Neg

0 0034 0028 0012 0926

Figure 10 Confusion matrix of the predicted results

Cal 0119 minus0051 minus00039 minus00623 minus00018

Lob minus00183 01286 minus00994 minus00101 minus00008

Spic minus00067 minus01065 01317 minus0028 minus0024

Tex minus00422 minus00627 minus01119 02175 minus00007

Neg

Cal

Lob

Spic

Tex

Neg

minus00002 minus006 minus00135 minus00495 01232

Figure 11 Performance difference matrix of G2C-CAD and C45

The experimental result on LISS shows higher perfor-mance the areas under the ROC curve of GGO lobulationspiculation pleural indentation and negative image samplesare 0972 0964 0941 0967 and 0953 respectively Bycomparing the training dataset and test samples we foundthat the main reason may be that the samples in differentcategories are more separable visually

7 Conclusion

In this paper by coupling a GAN with a semisupervisedlearning approach we proposed a G2C-CAD method todetect signs that are highly correlated with malignant pul-monary nodules We first trained a DCGAN on a smallsample set Then we extracted the features from the CNNdiscriminator trained with the training samples and usedthem to train a primary fuzzy Co-forest classifier Then weuse the trained DCGAN to generate large amounts of realistic

fake samples Based on these fake samples we conductedsemisupervised learning with the fuzzy Co-forest and finallyobtained a classifierwith excellent performance By validatingon the LIDC dataset the area under the ROC curve for fivesign types noncentral calcification negative image sampleslobulation spiculation and nonsolidGGO texture reached0946 0939 0912 0908 and 0887 respectively On the LISSdataset the proposed system showed comprehensively higherclassification performances than those of a trained C45classifier The experimental results show that the proposedG2C-CAD is an appropriate method for solving the problemof insufficient samples in the medical image analysis fieldMoreover our system can also be used to establish a trainingsample library for CAD classification diagnosis which holdsgreat significance for future medical image analysis

In future work we plan to combine this method withmultiple-instance learning to performweak supervised learn-ing directly on CT slice images or to extend the algorithm

12 BioMed Research International

00083 minus0002 minus00023 minus00032 minus00008

minus00013 00159 minus00074 minus00061 minus00011

00011 minus00044 00122 minus00042 minus00025

minus00002 minus00031 minus00029 00074 minus00012

minus00008 minus00036 minus00013 minus00039 00096

Cal

Lob

Spic

Tex

NegCa

l

Lob

Spic

Tex

Neg

Figure 12 Performance difference matrix of fuzzy G2C-CAD and nonfuzzy G2C-CAD

making it suitable for use in the 3D medical image classifica-tion field

Data Availability

The data used to support the findings of this study areavailable from the corresponding author upon request

Disclosure

The publicly accessible medical images database of LIDCIDRI is used

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work was supported in part by National Natural ScienceFoundation of China (Grant nos 60973059 and 81171407) andProgram for New Century Excellent Talents in University ofChina (Grant no NCET-10-0044)

References

[1] ldquoCancersrdquo httpwwwwhointennews-roomfact-sheetsde-tailcancer

[2] ldquoCancer stat facts common cancer sitesrdquo httpsseercancergovstatfactshtmlcommonhtml

[3] S Diederich D Wormanns M Semik et al ldquoScreening forearly lung cancer with low-dose spiral CT prevalence in 817asymptomatic smokersrdquo Radiology vol 222 no 3 pp 773ndash7812002

[4] J F Williamson ldquoBrachytherapy technology and physics prac-tice since 1950 A half-century of progressrdquo Physics in Medicineand Biology vol 51 no 13 article no R18 pp R303ndashR325 2006

[5] J F Williamson S K Das M S Goodsitt and J O DeasyldquoIntroducing the medical physics dataset articlerdquo MedicalPhysics vol 44 no 2 pp 349-350 2017

[6] S Balik E Weiss N Jan et al ldquoEvaluation of 4-dimensionalcomputed tomography to 4-dimensional cone-beam computedtomography deformable image registration for lung canceradaptive radiation therapyrdquo International Journal of RadiationOncology ∙ Biology ∙ Physics vol 86 no 2 pp 372ndash379 2013

[7] C F Mountain ldquoRevisions in the international system forstaging lung cancerrdquo Chest vol 111 no 6 pp 1710ndash1717 1997

[8] M T van Rens J M van den Bosch A Brutel de la Riviereand H R Elbers ldquoPrognostic assessment of 2361 patients whounderwent pulmonary resection for non-small cell lung cancerstage I II and IIIArdquo CHEST vol 117 no 2 pp 374ndash379 2000

[9] E F Patz S Rossi D H Harpole J E Herndon and P CGoodman ldquoCorrelation of tumor size and survival in patientswith stage IA non-small cell lung cancerrdquo Chest vol 117 no 6pp 1568ndash1571 2000

[10] C I Henschke P Boffetta O Gorlova R Yip J O DeLanceyand M Foy ldquoAssessment of lung-cancer mortality reductionfrom CT screeningrdquo Lung Cancer vol 71 no 3 pp 328ndash3322011

[11] D Kumar A Wong and D A Clausi ldquoLung nodule classifi-cation using deep features in CT imagesrdquo in Proceedings of the12th Conference on Computer and Robot Vision CRV 2015 pp133ndash138 Canada June 2015

[12] M Kaneko K Eguchi and H Ohmatsu ldquoPeripheral lungcancer screening and detection with low-dose spiral CT versusradiographyrdquo Radiology vol 201 no 3 pp 798ndash802 1996

[13] S Swensen J Jett J Sloan et al ldquoScreening for lung cancerwith low-dose spiral computed tomographyrdquo American Journalof Respiratory and Critical Care Medicine vol 165 no 4 pp508ndash513 2002

[14] E Paci D Puliti A Lopes Pegna et al ldquoMortality survivaland incidence rates in the ITALUNG randomised lung cancerscreening trialrdquoThorax vol 72 no 9 pp 825ndash831 2017

[15] S Iwano TNakamura Y Kamioka andT Ishigaki ldquoComputer-aided diagnosis A shape classification of pulmonary nodulesimaged by high-resolution CTrdquo Computerized Medical Imagingand Graphics vol 29 no 7 pp 565ndash570 2005

BioMed Research International 13

[16] A Bernard and H Shennib ldquoResection of pulmonary nodulesusing video-assisted thoracic surgeryrdquo The Annals of ThoracicSurgery vol 61 no 1 pp 202ndash205 1996

[17] M J Mack S R Hazelrigg R J Landreneau and T E AcuffldquoThoracoscopy for the diagnosis of the indeterminate solitarypulmonary nodulerdquoThe Annals of Thoracic Surgery vol 56 no4 pp 825ndash832 1993

[18] S Siegelman E Zerhouni F Leo N Khouri and F StitikldquoCT of the solitary pulmonary nodulerdquo American Journal ofRoentgenology vol 135 no 1 pp 1ndash13 1980

[19] B A Keagy P J K Starek G F Murray J W Battaglini ME Lores and B R Wilcox ldquoMajor pulmonary resection forsuspected but unconfirmedmalignancyrdquoTheAnnals ofThoracicSurgery vol 38 no 4 pp 314ndash316 1984

[20] J Collins ldquoCT signs and patterns of lung diseaserdquo RadiologicClinics of North America vol 39 no 6 pp 1115ndash1135 2001

[21] G Han X Liu F Han et al ldquoThe LISSmdasha public databaseof common imaging signs of lung diseases for computer-aideddetection and diagnosis research and medical educationrdquo IEEETransactions on Biomedical Engineering vol 62 no 2 pp 648ndash656 2015

[22] G Han X LiuN Q Soomro et al ldquoEmpirical driven automaticdetection of lobulation imaging signs in lung CTrdquo BioMedResearch International vol 2017 2017

[23] C-Z Shi Q Zhao L-P Luo and J-X He ldquoSize of solitarypulmonary nodule was the risk factor of malignancyrdquo Journalof Thoracic Disease vol 6 no 6 pp 668ndash676 2014

[24] J J Erasmus J E Connolly H P McAdams and V L RogglildquoSolitary pulmonary nodules part I Morphologic evaluationfor differentiation of benign and malignant lesionsrdquo Radio-Graphics vol 20 no 1 pp 43ndash58 2000

[25] K Furuya S Murayama H Soeda et al ldquoNew classification ofsmall pulmonary nodules by margin characteristics on high-resolution CTrdquo Acta Radiologica vol 40 no 5 pp 496ndash5042016

[26] Z Yang S Sone S Takashima et al ldquoHigh-resolution CTanalysis of small peripheral lung adenocarcinomas revealed onscreening helical CTrdquo American Journal of Roentgenology vol176 no 6 pp 1399ndash1407 2001

[27] M F McNitt-Gray E M Hart N Wyckoff J W Sayre J GGoldin and D R Aberle ldquoA pattern classification approachto characterizing solitary pulmonary nodules imaged on highresolution CT Preliminary resultsrdquoMedical Physics vol 26 no6 pp 880ndash888 1999

[28] V Ambrosini S Nicolini P Caroli et al ldquoPETCT imaging indifferent types of lung cancer an overviewrdquoEuropean Journal ofRadiology vol 81 no 5 pp 988ndash1001 2012

[29] F Li S Sone H Abe H MacMahon and K Doi ldquoMalignantversus benign nodules at CT screening for lung cancer com-parison of thin-section CT findingsrdquo Radiology vol 233 no 3pp 793ndash798 2004

[30] B Ganeshan E Panayiotou K Burnand S Dizdarevic and KMiles ldquoTumour heterogeneity in non-small cell lung carcinomaassessed by CT texture analysis a potential marker of survivalrdquoEuropean Radiology vol 22 no 4 pp 796ndash802 2012

[31] F Li S Sone H Abe H MacMahon S G Armato and KDoi ldquoLung cancers missed at low-dose helical CT screening ina general population comparison of clinical histopathologicand imaging findingsrdquo Radiology vol 225 no 3 pp 673ndash6832002

[32] H Chen Y Xu Y Ma and B Ma ldquoNeural network ensemble-based computer-aided diagnosis for differentiation of lungnodules on CT images clinical evaluationrdquoAcademic Radiologyvol 17 no 5 pp 595ndash602 2010

[33] C V Zwirewich S Vedal R R Miller and N L Muller ldquoSoli-tary pulmonary nodule high-resolution CT and radiologic-pathologic correlationrdquo Radiology vol 179 no 2 pp 469ndash4761991

[34] Y J Jeong C A Yi and K S Lee ldquoSolitary pulmonary nodulesdetection characterization and guidance for further diagnosticworkup and treatmentrdquoAmerican Journal of Roentgenology vol188 no 1 pp 57ndash68 2007

[35] A N Khan H H Al-Jahdali C M Allen K L Irion S AlGhanem and S S Koteyar ldquoThe calcified lung nodule whatdoes it meanrdquo Annals of Thoracic Medicine vol 5 no 2 pp67ndash79 2010

[36] H C BeckerW J Nettleton P HMeyers JW Sweeney and CMNice ldquoDigital computer determination of amedical diagnos-tic index directly from chest X-ray imagesrdquo IEEE Transactionson Biomedical Engineering vol BME-11 no 3 pp 67ndash72 1964

[37] P H Meyers C M Nice H C Becker W J Nettleton J WSweeney and G R Meckstroth ldquoAutomated computer analysisof radiographic imagesrdquo Radiology vol 83 no 6 pp 1029ndash10341964

[38] B Van Ginneken B Ter Haar Romeny and M ViergeverldquoComputer-aided diagnosis in chest radiography a surveyrdquoIEEE Transactions on Medical Imaging vol 20 no 12 pp 1228ndash1241 2001

[39] K Suzuki F Li S Sone and K Doi ldquoComputer-aided diagnos-tic scheme for distinction between benign and malignant nod-ules in thoracic low-dose CT by use ofmassive training artificialneural networkrdquo IEEE Transactions on Medical Imaging vol 24no 9 pp 1138ndash1150 2005

[40] K Kanazawa Y Kawata N Niki et al ldquoComputer-aideddiagnosis for pulmonary nodules based on helical CT imagesrdquoComputerized Medical Imaging and Graphics vol 22 no 2 pp157ndash167 1998

[41] F Han H Wang G Zhang et al ldquoTexture feature analysis forcomputer-aided diagnosis on pulmonary nodulesrdquo Journal ofDigital Imaging vol 28 no 1 pp 99ndash115 2014

[42] H Hu Q Wang H Tang L Xiong and Q Lin ldquoMulti-slicecomputed tomography characteristics of solitary pulmonaryground-glass nodules Differences between malignant andbenignrdquoThoracic Cancer vol 7 no 1 pp 80ndash87 2016

[43] G-Y Zheng X-B Liu and G-H Han ldquoSurvey on medicalimage computer aided detection and diagnosis systemsrdquo RuanJian Xue BaoJournal of Software vol 29 no 5 pp 1471ndash15142018

[44] F H Edwards P S Schaefer S Callahan G M Graeber andR A Albus ldquoBayesian statistical theory in the preoperativediagnosis of pulmonary lesionsrdquoCHEST vol 92 no 5 pp 888ndash891 1987

[45] H Krewer B Geiger L O Hall et al ldquoEffect of texture featuresin computer aided diagnosis of pulmonary nodules in low-dosecomputed tomographyrdquo in Proceedings of the IEEE InternationalConference on Systems Man and Cybernetics (SMC rsquo13) pp3887ndash3891 IEEE Manchester UK October 2013

[46] C Jacobs E M van Rikxoort E T Scholten et al ldquoSolidpart-solid or non-solid classification of pulmonary nodulesin low-dose chest computed tomography by a computer-aideddiagnosis systemrdquo Investigative Radiology vol 50 no 3 pp 168ndash173 2015

14 BioMed Research International

[47] X Liu L Ma L Song Y Zhao X Zhao and C Zhou ldquoRec-ognizing common CT imaging signs of lung diseases througha new feature selection method based on fisher criterion andgenetic optimizationrdquo IEEE Journal of Biomedical and HealthInformatics vol 19 no 2 pp 635ndash647 2015

[48] T Sun J Wang X Li et al ldquoComparative evaluation of supportvectormachines for computer aided diagnosis of lung cancer inCT based on a multi-dimensional data setrdquo Computer Methodsand Programs in Biomedicine vol 111 no 2 pp 519ndash524 2013

[49] H Arimura S Katsuragawa K Suzuki et al ldquoComputerizedscheme for automated detection of lung nodules in low-dose computed tomography images for lung cancer screeningrdquoAcademic Radiology vol 11 no 6 pp 617ndash629 2004

[50] T Sun R Zhang J Wang X Li and X Guo ldquoComputer-aideddiagnosis for early-stage lung cancer based on longitudinal andbalanced datardquo PLoS ONE vol 8 no 5 Article ID e63559 2013

[51] F Han C L Novak S Aylward et al ldquoA new 3D texture featurebased computer-aided diagnosis approach to differentiate pul-monary nodulesrdquo Proceedings of the SPIE Article ID 86702Z2013

[52] K-L Hua C-H Hsu S C Hidayati W-H Cheng and Y-J Chen ldquoComputer-aided classification of lung nodules oncomputed tomography images via deep learning techniquerdquoOncoTargets andTherapy 2015

[53] Y LeCun L Bottou Y Bengio and P Haffner ldquoGradient-basedlearning applied to document recognitionrdquo Proceedings of theIEEE vol 86 no 11 pp 2278ndash2323 1998

[54] W Shen M Zhou F Yang C Yang and J Tian ldquoMulti-scaleconvolutional neural networks for lung nodule classificationrdquoin Proceedings of the International Conference on InformationProcessing in Medical Imaging vol 24 pp 588ndash599 Springer2015

[55] N Tajbakhsh and K Suzuki ldquoComparing two classes of end-to-end machine-learning models in lung nodule detection andclassificationMTANNs vs CNNsrdquo Pattern Recognition vol 63pp 476ndash486 2017

[56] H Greenspan B Van Ginneken and R M Summers ldquoGuesteditorial deep learning inmedical imaging overview and futurepromise of an exciting new techniquerdquo IEEE Transactions onMedical Imaging vol 35 no 5 pp 1153ndash1159 2016

[57] I J Goodfellow J Pouget-Abadie M Mirza et al ldquoGenerativeadversarial netsrdquo in Proceedings of the 28th Annual Conferenceon Neural Information Processing Systems 2014 NIPS 2014 pp2672ndash2680 Canada December 2014

[58] M J Chuquicusma S Hussein J Burt and U Bagci ldquoHowto fool radiologists with generative adversarial networks Avisual turing test for lung cancer diagnosisrdquo in Proceedings of the2018 IEEE 15th International Symposium Biomedical Imagingpp 240ndash244 IEEE 2018

[59] X Zhu ldquoSemi-supervised learning literature surveyrdquo ComputerScience University ofWisconsin-Madison vol 2 no 3 p 4 2006

[60] M Li andZ-H Zhou ldquoImprove computer-aided diagnosis withmachine learning techniques usingundiagnosed samplesrdquo IEEETransactions on Systems Man and Cybernetics Systems vol 37no 6 pp 1088ndash1098 2007

[61] A Radford L Metz and S Chintala ldquoUnsupervised represen-tation learning with deep convolutional generative adversarialnetworksrdquo 2015 httpsarxivorgabs07123011

[62] Y Liu Z Xing C Deng P Li and M Guo ldquoAutomaticallydetecting lung nodules based on shape descriptor and semi-supervised learningrdquo in Proceedings of the 2010 International

Conference on Computer Application and System ModelingICCASM 2010 pp V1647ndashV1650 China October 2010

[63] Y Peng and P Flach ldquoSoft discretization to enhance thecontinuous decision tree inductionrdquo Integrating Aspects of DataMining Decision Support and Meta-Learning vol 1 no 34 pp109ndash118 2001

[64] N Settouti M E Habib Daho M E Amine Lazouni and MA Chikh ldquoRandom forest in semi-supervised learning (Co-Forest)rdquo in Proceedings of the 2013 8th International Workshopon Systems Signal Processing and Their Applications WoSSPA2013 pp 326ndash329 Algeria May 2013

[65] S G Armato III G McLennan L Bidaut et al ldquoThe lungimage database consortium (lidc) and image database resourceinitiative (idri) a completed reference database of lung noduleson ct scansrdquoMedical Physics vol 38 no 2 pp 915ndash931 2011

[66] D E Ost and M K Gould ldquoDecision making in patientswith pulmonary nodulesrdquo American Journal of Respiratory andCritical Care Medicine vol 185 no 4 pp 363ndash372 2012

[67] M M Goodsitt H Chan T W Way S C Larson E GChristodoulou and J Kim ldquoAccuracy of the CT numbersof simulated lung nodules imaged with multi-detector CTscannersrdquoMedical Physics vol 33 no 8 pp 3006ndash3017 2006

[68] MMGoodsittHChanTWWayM J Schipper S C Larsonand E G Christodoulou ldquoQuantitative CT of lung nodulesdependence of calibration on patient body size anatomicregion and calibration nodule size for single- and dual-energytechniquesrdquoMedical Physics vol 36 no 7 pp 3107ndash3121 2009

[69] P Opulencia D S Channin D S Raicu and J D FurstldquoRadLex and lung nodule image featuresrdquo Journal of DigitalImaging vol 24 no 2 pp 256ndash270 2011

[70] J-C Wang S Sone L Feng et al ldquoRapidly growing smallperipheral lung cancers detected by screening CT Correlationbetween radiological appearance and pathological featuresrdquoBritish Journal of Radiology vol 73 no 873 pp 930ndash937 2000

[71] S Takashima Y Maruyama M Hasegawa A Saito M Hani-uda and M Kadoya ldquoHigh-resolution CT features prognosticsignificance in peripheral lung adenocarcinoma with bronchi-oloalveolar carcinoma componentsrdquo Respiration vol 70 no 1pp 36ndash42 2003

[72] R M Lindell T E Hartman S J Swensen et al ldquoFive-yearlung cancer screening experience CT appearance growth ratelocation and histologic features of 61 lung cancersrdquo Radiologyvol 242 no 2 pp 555ndash562 2007

[73] T J Kim D H Han K N Jin and K Won Lee ldquoLung cancerdetected at cardiac CT prevalence clinicoradiologic featuresand importance of fullndashfield-of-view imagesrdquo Radiology vol255 no 2 pp 369ndash376 2010

[74] MVazquez D Carter E Brambilla et al ldquoSolitary andmultipleresected adenocarcinomas after CT screening for lung cancerhistopathologic features and their prognostic implicationsrdquoLung Cancer vol 64 no 2 pp 148ndash154 2009

Stem Cells International

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

MEDIATORSINFLAMMATION

of

EndocrinologyInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Disease Markers

Hindawiwwwhindawicom Volume 2018

BioMed Research International

OncologyJournal of

Hindawiwwwhindawicom Volume 2013

Hindawiwwwhindawicom Volume 2018

Oxidative Medicine and Cellular Longevity

Hindawiwwwhindawicom Volume 2018

PPAR Research

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Immunology ResearchHindawiwwwhindawicom Volume 2018

Journal of

ObesityJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational and Mathematical Methods in Medicine

Hindawiwwwhindawicom Volume 2018

Behavioural Neurology

OphthalmologyJournal of

Hindawiwwwhindawicom Volume 2018

Diabetes ResearchJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Research and TreatmentAIDS

Hindawiwwwhindawicom Volume 2018

Gastroenterology Research and Practice

Hindawiwwwhindawicom Volume 2018

Parkinsonrsquos Disease

Evidence-Based Complementary andAlternative Medicine

Volume 2018Hindawiwwwhindawicom

Submit your manuscripts atwwwhindawicom

Page 7: A Novel Computer-Aided Diagnosis Scheme on Small Annotated ...downloads.hindawi.com/journals/bmri/2019/6425963.pdf · BioMedResearchInternational 100 Reshape Deconv Deconv Deconv

BioMed Research International 7

Input the labeled set LBThe threshold 120599 of the confidence T is random treesrsquo number

ProcessInitialize a random forest containing T treesfor i in 1 T

ei0 = 05Wi0 = 0

endfort = 0Loop until all of the trees in Random Forest unchanged

t = t+1z = a new random vectorfor i in 1 T119890119894119905 = ClassificationErrorRate(119867119894L)1198711198611015840119894119905 = 120601

if(119890it lt 119890it-1)1198801015840119894119905 = 119878119886119898119901119897119890119889(119866 (119911) 119890119894119905minus1119882119894119905minus1119890119894119905 )for 119909u in1198801015840119894119905

if (119872119886119909119862119900119899119891119894119889119890119899119888119890(119864i 119909u) gt 120579)1198711198611015840119894119905 = cup(119909u 119864i(119909u))119882119894119905 = 119882119894119905 +119872119886119909119862119900119899119891119894119889119890119899119888119890(119864i 119909u)endfor

endforfor i in 1 T

if(119890119894119905119882119894119905 lt 119890119894119905minus1119882119894119905minus1)119890i = LearnRandomTree(119871119861 cup 1198711198611015840119894119905)endfor

endLoopOutput E lowast (x) = argmaxsumfuzzy classes set119910 isin 119897119886119887119890119897 i 119890i(119909) = fuzzy classes set

Algorithm 1 G2C-CAD

As shown in Figure 6(c) along one branch the finalclassification 119888i is calculated as follows

119888i = 119898prod119895=1

119905119895 + 120576 minus a1198952120576 left branch of 119905119895119886119895 minus 119905119895 + 1205762120576 right branch of 119905119895

(11)

where 119888i is one class from 1198881 1198882 119888kWhen 120576 = 01 under the fuzzy classifying scheme

the classification result of disturbed example s is C(119904) =0253lowast1198881 0258lowast1198882 049lowast1198883 It is obvious that the maximalprobability of the final decision result is max(C(s)) = 1198883Based on this classification example we can see that the fuzzydecision scheme is more robust to noise

The classification result of the fuzzy Co-forest is amultiple-label probability distribution of the union of thedecision treesrsquo output we consider the class with the highestprobability as the final output

Let LB denote the labeled set G(z) denotes the process ofgenerating a new fake sign-image patch utilizing the trainedDCGAN There are N classifiers in the Co-forest ensemble119864lowast 119890i (i = 1 N) We denote one of the classifiers inensemble 119864i as the concomitant ensemble of 119890i and createa subensemble that includes all the classifiers except 119890i For anewly generated sample from G if the max fuzzy vote sum

of the classifiers in concomitant ensemble 119864i exceeds a presetthreshold 120579 the sample will be copied to a newly labeled set1198711198611015840119894 with the new assigned label Based on [64] the processiterates until 119890119894119905minus1119882119894119905minus1119890119894119905 is larger than 119882119894119905Then the set119871119861 cup 1198711198611015840119894119905 is used to refine 119890i 119890 denotes the classificationerror rate and 119882119894119905 = sum119898119894119905119895=0119908119894119905119895 where 119908119894119905119895 is the predictedconfidence of 119864119894 on 1198711198611015840119894119905 and119898119894119905 is the size of set 1198711198611015840119894119905

Based on the fuzzy decision tree scheme we construct thefuzzy Co-forest as shown in Algorithm 1

4 Experiments

41 Datasets We collected sample instances from both theLIDC-IDRI and LISS datasets LIDC-IDRI [65] consists ofpulmonary medical image files (such as CT scans and X-rays) with corresponding pathological annotations The datawere collected by the National Cancer Institute to studyearly cancer detection in high-risk populations LISS [21]consists a set of CISLs collected by the Cancer Institute andHospital at the Chinese Academy ofMedical Sciences and theBeijing Institute of Technology intended for computer-aideddetection and diagnosis research andmedical education LISScontains 271 CT scans and 677 abnormal regions includingnine categories of CISLs

8 BioMed Research International

3 6 9 12 15 18 21 24 27 30 33 36 39 42 45 48 54 101

Diameter length summary

0100200300400500600700

Figure 7 The diameter-size distribution of the nodules in the LIDC-IDRI dataset

Table 1 The scheme of sign selection

Used as signs AbandonedSubtlety - -Internal structure - -Calcification =4 ltgt4Sphericity - -Margin - -Lobulation gt=4 lt4Spiculation gt=4 lt4Texture lt=2 gt2

42 LIDC-IDRI Instances In the LIDC-IDRI CT imagingslices most of the annotated nodules have diameters less than32 pixels as shown in Figure 7 Therefore we choose 32 times 32as the input ROI size

A higher degree of lobulation speculation and nonsolidtexture signs indicates a greater probability of malignantnodules [66] A calcification sign usually indicates a benignnodule [66ndash68] except when it has a noncentral appearanceSigns of subtlety [69] internal structure [27] sphericity andmargin [33] have not been clearly proven to have a strongrelationship with malignancy To simplify the comparisonsin this experiment we selected nodules from LIDC withcalcification =4 (noncentral calcification) lobulation gt=4spiculation gt=4 texture lt=2 and malignancy gt=3 as exper-imental instances based on the selection rules shown inTable 1

LIDC-IDRI contains 21057 annotated nodules Weselected the 4 category nodules with signs such as noncen-tral calcification lobulation speculation and nonsolidGGOtexture that have a high prevalence of malignancy as theexperimental objects For the probability of malignancy inthese nodules we adopted the average value of 4 radiologistsrsquoscores The noncentral calcification lobulation spiculationand nonsolidGGO texture signs are illustrated in Table 2 Inaddition we randomly extracted 32 times 32-pixel image patchesfrom slices not annotated by any radiologist as negativesamples In total we used five types of image blocks in ourexperiment

Based on the center point of the merged regions anno-tated by 4 experts we extracted 32 times 32-pixel image blocksas the experimental input by following the selection criteriashown in Table 1 Among these samples we ensured that

a given category of ROI patches for a single patient doesnot appear in any two subsets simultaneously which helpedensure that the specificity of the trained individual networksis as high as possible

43 LISS Instances From LISS we selected signs of lobula-tion spiculation pleural indentation and GGO associatedwith malignant lung cancer [70ndash74] as the experimentalobjects The number of samples in each category is shown inTable 3

44 Evaluation Criteria To evaluate the performance ofthe algorithm presented in this paper we considered thefollowing criteria

(1) ROC The Receiver Operator Characteristic curve(ROC) is a method that comprehensively and simul-taneously reflects the sensitivity and specificity of theclassification result By comparing the classificationresults of different samples with the annotation labelsa series of sensitivity and specificity scores is cal-culated Then a curve is drawn using sensitivity asthe ordinate and 1 minus specificity as the abscissa Alarger area under the curve (AUC) indicates a higherdiagnostic accuracy On the ROC curve the pointclosest to the top left of the coordinate diagram is thecritical value that reflects the highest sensitivity andspecificity

(2) Confusion matrix A confusion matrix is also calledan error matrix and it is a visual representation of theclassification effect A confusion matrix can be usedto describe the relationship between the real categoryattribute of the sample data and the recognition resultIt is a method for evaluating classifier performanceand iswidely used in pattern recognition A confusionmatrix is also a performance evaluation method thatscholars often use when solving practical applicationproblems

45 Experimentation From the LIDC-IDRI database weacquired 590 noncentral calcification 565 lobulation 576spiculation 545 nonsolidGGO texture sign patches and2500 negative image patchesThe experiment was conductedaccording to the following steps

(1) Set the initial values of T 119890 and 120579 to 6 05 and 06respectively

BioMed Research International 9

Table 2 Sign samples of calcification lobulation spiculation and nonsolidGGO texture

Noncentral calcification Lobulation Spiculation NonsolidGGO texture

Subtlety 467 50 50 40 40 50 367 30Internal structure 10 10 10 10 10 10 10 10Calcification 40 40 60 60 60 60 60 60Sphericity 333 267 425 40 50 50 433 40Margin 367 467 45 40 20 40 267 10Lobulation 167 20 425 40 40 10 267 10Spiculation 133 133 225 40 40 50 167 20Texture 50 467 475 40 50 40 133 10

Table 3 Number of selected instances of LISS

CISL Category of Lesion RegionsGGO 45Lobulation 41Spiculation 29Pleural Indentation 45Negative 80

(2) Train the GAN until the discrepancy cost reaches abalance as illustrated in Figure 8

(3) Train a primary fuzzy Co-forest based on the originalsamples utilizing the features exported from thetrained DCGAN discriminator in Step (2)

(4) Input a random vector to DCGAN and transmit thegenerated features to the concomitant random fuzzydecision forest119864i in the primary Co-forest until119882119894119905 ge119890119894119905minus1119882119894119905minus1119890119894119905 for all classesIn this process if the maximal weight of the fuzzylabel from 119864i exceeds the threshold 120579 store bothfeatureswith thematched label otherwise discard thegenerated image and generate a new image

(5) Retrain the corresponding tree 119890i of 119864i(6) Test the performance of the system

As a comparison we trained a C45 random forest accordingto the same schemewhich has been used byG2C-CAD as thebaseline method

We also conducted experiments on samples obtainedfrom LISS

5 Results

We divided the dataset into two parts 90 for training and10 for validation In this way the nodule distribution in thevalidation subsets is consistent with the nodule distributionof the original dataset according to the radiologistsrsquo consensusof their evaluations at the nodule levelThe number of trees intheCo-forest classificationmodelwas 6 primary trainingwasconducted on the training data and finally the system was

further validated on the remaining 10 The sensitivity andspecificity of each class instance was calculated and comparedusing the ROC curves shown in Figure 9

In Figure 9 theAUCsof noncentral calcification negativeimage samples lobulation spiculation and nonsolidGGOtexture are 0946 0939 0912 0908 and 0887 respectivelyFrom the curves in Figure 9 we can see that the G2C-CADsystem achieves the highest overall classification accuracy onthe calcification sign Compared to noncentral calcificationand negative image samples the AUCs of lobulation spicula-tion and nonsolidGGO texture are relatively lower

To show the underlying classification error distribution aconfusion matrix of the 5 classification results is presented inFigure 10

The diagonal numbers in the confusion matrix representthe recognition accuracy rates of the corresponding categoryand the nondiagonal elements are misclassification rates iethe ratio of other category test samples that weremisclassifiedto this class From Figure 10 we can see that the noncentralcalcification sign has the highest classification accuracywhile the nonsolidGGO texture sign has the highest mis-classification rate Misclassified spiculation signs are mostlyrecognized as lobulation By comparing the test sampleswe find that many of these two types of samples have verysimilar textures From the confusion matrix we can alsosee that most misclassifications occur between lobulationspiculation and nonsolidGGO texture signs

6 Discussion

We performed a performance comparison on each categorybetween G2C-CAD and the C45 random forest modelusing the confusion matrix First we constructed confusionmatrices reflecting the accuracies of G2C-CAD and C45Then we calculated the difference matrix for those twoconfusion matrices as shown in Figure 11 The float numberson the diagonal represent the differences between G2C-CAD and C45 regarding their classification accuracy forthe corresponding category while the float numbers on thenondiagonal represent their differences in the classificationerror rate showing the numbers for each class that weremisclassified into the corresponding category

10 BioMed Research International

100000 200000 300000 4000000iteration

minus70

minus60

minus50

minus40

minus30

minus20

minus10

0

trai

n di

sc co

st

Figure 8 Training discrepancy cost of DCGAN

ROC Curves

AUCNon-central Calcification A = 0946Negative A = 0939Lobulation A = 0912Spiculation A = 0908Non-solidGGO Texture A = 0887

00

2

4

6

8

10

Sens

itivi

ty

2 4 6 8 10001 - Specificity

Figure 9 ROC curves for the 5 categories of sign ROI recognition results

Consequently positive numbers on the diagonal andlarger values indicate that G2C-CAD achieved a betterperformance than that of C45 For the elements that arenot on the diagonal the situation is the opposite As shownin Figure 11 no positive values occur in the nondiagonalelements which means that G2C-CAD possesses greaterdiscrimination ability between each category The elementvalues on the diagonal are all positive and the average valueof the diagonal numbers in the difference confusion matrixis 0144 demonstrating that our method has a better overallperformance

To verify the effectiveness of the fuzzy algorithm we alsoconducted multiclassification experiments using G2C-CADwithout the fuzzy algorithm and compared the performancesof the two algorithms using a difference confusion matrix asshown in Figure 12

FromFigure 12 we can see that the discrimination perfor-mance of the CAD system employing the fuzzy algorithm isobviously better than that of the nonfuzzy one Furthermoreas shown in Figure 12 the CAD performance of the fuzzyalgorithm better distinguishes the difficult lobulation signfrom the spiculation sign

BioMed Research International 11

Cal 0933 005 0011 0002 0004

Lob 0 0907 006 0026 0007

Spic 0 0077 0891 0032 0

Tex 001 0064 0046 0878 0002

NegCa

l

Lob

Spic

Tex

Neg

0 0034 0028 0012 0926

Figure 10 Confusion matrix of the predicted results

Cal 0119 minus0051 minus00039 minus00623 minus00018

Lob minus00183 01286 minus00994 minus00101 minus00008

Spic minus00067 minus01065 01317 minus0028 minus0024

Tex minus00422 minus00627 minus01119 02175 minus00007

Neg

Cal

Lob

Spic

Tex

Neg

minus00002 minus006 minus00135 minus00495 01232

Figure 11 Performance difference matrix of G2C-CAD and C45

The experimental result on LISS shows higher perfor-mance the areas under the ROC curve of GGO lobulationspiculation pleural indentation and negative image samplesare 0972 0964 0941 0967 and 0953 respectively Bycomparing the training dataset and test samples we foundthat the main reason may be that the samples in differentcategories are more separable visually

7 Conclusion

In this paper by coupling a GAN with a semisupervisedlearning approach we proposed a G2C-CAD method todetect signs that are highly correlated with malignant pul-monary nodules We first trained a DCGAN on a smallsample set Then we extracted the features from the CNNdiscriminator trained with the training samples and usedthem to train a primary fuzzy Co-forest classifier Then weuse the trained DCGAN to generate large amounts of realistic

fake samples Based on these fake samples we conductedsemisupervised learning with the fuzzy Co-forest and finallyobtained a classifierwith excellent performance By validatingon the LIDC dataset the area under the ROC curve for fivesign types noncentral calcification negative image sampleslobulation spiculation and nonsolidGGO texture reached0946 0939 0912 0908 and 0887 respectively On the LISSdataset the proposed system showed comprehensively higherclassification performances than those of a trained C45classifier The experimental results show that the proposedG2C-CAD is an appropriate method for solving the problemof insufficient samples in the medical image analysis fieldMoreover our system can also be used to establish a trainingsample library for CAD classification diagnosis which holdsgreat significance for future medical image analysis

In future work we plan to combine this method withmultiple-instance learning to performweak supervised learn-ing directly on CT slice images or to extend the algorithm

12 BioMed Research International

00083 minus0002 minus00023 minus00032 minus00008

minus00013 00159 minus00074 minus00061 minus00011

00011 minus00044 00122 minus00042 minus00025

minus00002 minus00031 minus00029 00074 minus00012

minus00008 minus00036 minus00013 minus00039 00096

Cal

Lob

Spic

Tex

NegCa

l

Lob

Spic

Tex

Neg

Figure 12 Performance difference matrix of fuzzy G2C-CAD and nonfuzzy G2C-CAD

making it suitable for use in the 3D medical image classifica-tion field

Data Availability

The data used to support the findings of this study areavailable from the corresponding author upon request

Disclosure

The publicly accessible medical images database of LIDCIDRI is used

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work was supported in part by National Natural ScienceFoundation of China (Grant nos 60973059 and 81171407) andProgram for New Century Excellent Talents in University ofChina (Grant no NCET-10-0044)

References

[1] ldquoCancersrdquo httpwwwwhointennews-roomfact-sheetsde-tailcancer

[2] ldquoCancer stat facts common cancer sitesrdquo httpsseercancergovstatfactshtmlcommonhtml

[3] S Diederich D Wormanns M Semik et al ldquoScreening forearly lung cancer with low-dose spiral CT prevalence in 817asymptomatic smokersrdquo Radiology vol 222 no 3 pp 773ndash7812002

[4] J F Williamson ldquoBrachytherapy technology and physics prac-tice since 1950 A half-century of progressrdquo Physics in Medicineand Biology vol 51 no 13 article no R18 pp R303ndashR325 2006

[5] J F Williamson S K Das M S Goodsitt and J O DeasyldquoIntroducing the medical physics dataset articlerdquo MedicalPhysics vol 44 no 2 pp 349-350 2017

[6] S Balik E Weiss N Jan et al ldquoEvaluation of 4-dimensionalcomputed tomography to 4-dimensional cone-beam computedtomography deformable image registration for lung canceradaptive radiation therapyrdquo International Journal of RadiationOncology ∙ Biology ∙ Physics vol 86 no 2 pp 372ndash379 2013

[7] C F Mountain ldquoRevisions in the international system forstaging lung cancerrdquo Chest vol 111 no 6 pp 1710ndash1717 1997

[8] M T van Rens J M van den Bosch A Brutel de la Riviereand H R Elbers ldquoPrognostic assessment of 2361 patients whounderwent pulmonary resection for non-small cell lung cancerstage I II and IIIArdquo CHEST vol 117 no 2 pp 374ndash379 2000

[9] E F Patz S Rossi D H Harpole J E Herndon and P CGoodman ldquoCorrelation of tumor size and survival in patientswith stage IA non-small cell lung cancerrdquo Chest vol 117 no 6pp 1568ndash1571 2000

[10] C I Henschke P Boffetta O Gorlova R Yip J O DeLanceyand M Foy ldquoAssessment of lung-cancer mortality reductionfrom CT screeningrdquo Lung Cancer vol 71 no 3 pp 328ndash3322011

[11] D Kumar A Wong and D A Clausi ldquoLung nodule classifi-cation using deep features in CT imagesrdquo in Proceedings of the12th Conference on Computer and Robot Vision CRV 2015 pp133ndash138 Canada June 2015

[12] M Kaneko K Eguchi and H Ohmatsu ldquoPeripheral lungcancer screening and detection with low-dose spiral CT versusradiographyrdquo Radiology vol 201 no 3 pp 798ndash802 1996

[13] S Swensen J Jett J Sloan et al ldquoScreening for lung cancerwith low-dose spiral computed tomographyrdquo American Journalof Respiratory and Critical Care Medicine vol 165 no 4 pp508ndash513 2002

[14] E Paci D Puliti A Lopes Pegna et al ldquoMortality survivaland incidence rates in the ITALUNG randomised lung cancerscreening trialrdquoThorax vol 72 no 9 pp 825ndash831 2017

[15] S Iwano TNakamura Y Kamioka andT Ishigaki ldquoComputer-aided diagnosis A shape classification of pulmonary nodulesimaged by high-resolution CTrdquo Computerized Medical Imagingand Graphics vol 29 no 7 pp 565ndash570 2005

BioMed Research International 13

[16] A Bernard and H Shennib ldquoResection of pulmonary nodulesusing video-assisted thoracic surgeryrdquo The Annals of ThoracicSurgery vol 61 no 1 pp 202ndash205 1996

[17] M J Mack S R Hazelrigg R J Landreneau and T E AcuffldquoThoracoscopy for the diagnosis of the indeterminate solitarypulmonary nodulerdquoThe Annals of Thoracic Surgery vol 56 no4 pp 825ndash832 1993

[18] S Siegelman E Zerhouni F Leo N Khouri and F StitikldquoCT of the solitary pulmonary nodulerdquo American Journal ofRoentgenology vol 135 no 1 pp 1ndash13 1980

[19] B A Keagy P J K Starek G F Murray J W Battaglini ME Lores and B R Wilcox ldquoMajor pulmonary resection forsuspected but unconfirmedmalignancyrdquoTheAnnals ofThoracicSurgery vol 38 no 4 pp 314ndash316 1984

[20] J Collins ldquoCT signs and patterns of lung diseaserdquo RadiologicClinics of North America vol 39 no 6 pp 1115ndash1135 2001

[21] G Han X Liu F Han et al ldquoThe LISSmdasha public databaseof common imaging signs of lung diseases for computer-aideddetection and diagnosis research and medical educationrdquo IEEETransactions on Biomedical Engineering vol 62 no 2 pp 648ndash656 2015

[22] G Han X LiuN Q Soomro et al ldquoEmpirical driven automaticdetection of lobulation imaging signs in lung CTrdquo BioMedResearch International vol 2017 2017

[23] C-Z Shi Q Zhao L-P Luo and J-X He ldquoSize of solitarypulmonary nodule was the risk factor of malignancyrdquo Journalof Thoracic Disease vol 6 no 6 pp 668ndash676 2014

[24] J J Erasmus J E Connolly H P McAdams and V L RogglildquoSolitary pulmonary nodules part I Morphologic evaluationfor differentiation of benign and malignant lesionsrdquo Radio-Graphics vol 20 no 1 pp 43ndash58 2000

[25] K Furuya S Murayama H Soeda et al ldquoNew classification ofsmall pulmonary nodules by margin characteristics on high-resolution CTrdquo Acta Radiologica vol 40 no 5 pp 496ndash5042016

[26] Z Yang S Sone S Takashima et al ldquoHigh-resolution CTanalysis of small peripheral lung adenocarcinomas revealed onscreening helical CTrdquo American Journal of Roentgenology vol176 no 6 pp 1399ndash1407 2001

[27] M F McNitt-Gray E M Hart N Wyckoff J W Sayre J GGoldin and D R Aberle ldquoA pattern classification approachto characterizing solitary pulmonary nodules imaged on highresolution CT Preliminary resultsrdquoMedical Physics vol 26 no6 pp 880ndash888 1999

[28] V Ambrosini S Nicolini P Caroli et al ldquoPETCT imaging indifferent types of lung cancer an overviewrdquoEuropean Journal ofRadiology vol 81 no 5 pp 988ndash1001 2012

[29] F Li S Sone H Abe H MacMahon and K Doi ldquoMalignantversus benign nodules at CT screening for lung cancer com-parison of thin-section CT findingsrdquo Radiology vol 233 no 3pp 793ndash798 2004

[30] B Ganeshan E Panayiotou K Burnand S Dizdarevic and KMiles ldquoTumour heterogeneity in non-small cell lung carcinomaassessed by CT texture analysis a potential marker of survivalrdquoEuropean Radiology vol 22 no 4 pp 796ndash802 2012

[31] F Li S Sone H Abe H MacMahon S G Armato and KDoi ldquoLung cancers missed at low-dose helical CT screening ina general population comparison of clinical histopathologicand imaging findingsrdquo Radiology vol 225 no 3 pp 673ndash6832002

[32] H Chen Y Xu Y Ma and B Ma ldquoNeural network ensemble-based computer-aided diagnosis for differentiation of lungnodules on CT images clinical evaluationrdquoAcademic Radiologyvol 17 no 5 pp 595ndash602 2010

[33] C V Zwirewich S Vedal R R Miller and N L Muller ldquoSoli-tary pulmonary nodule high-resolution CT and radiologic-pathologic correlationrdquo Radiology vol 179 no 2 pp 469ndash4761991

[34] Y J Jeong C A Yi and K S Lee ldquoSolitary pulmonary nodulesdetection characterization and guidance for further diagnosticworkup and treatmentrdquoAmerican Journal of Roentgenology vol188 no 1 pp 57ndash68 2007

[35] A N Khan H H Al-Jahdali C M Allen K L Irion S AlGhanem and S S Koteyar ldquoThe calcified lung nodule whatdoes it meanrdquo Annals of Thoracic Medicine vol 5 no 2 pp67ndash79 2010

[36] H C BeckerW J Nettleton P HMeyers JW Sweeney and CMNice ldquoDigital computer determination of amedical diagnos-tic index directly from chest X-ray imagesrdquo IEEE Transactionson Biomedical Engineering vol BME-11 no 3 pp 67ndash72 1964

[37] P H Meyers C M Nice H C Becker W J Nettleton J WSweeney and G R Meckstroth ldquoAutomated computer analysisof radiographic imagesrdquo Radiology vol 83 no 6 pp 1029ndash10341964

[38] B Van Ginneken B Ter Haar Romeny and M ViergeverldquoComputer-aided diagnosis in chest radiography a surveyrdquoIEEE Transactions on Medical Imaging vol 20 no 12 pp 1228ndash1241 2001

[39] K Suzuki F Li S Sone and K Doi ldquoComputer-aided diagnos-tic scheme for distinction between benign and malignant nod-ules in thoracic low-dose CT by use ofmassive training artificialneural networkrdquo IEEE Transactions on Medical Imaging vol 24no 9 pp 1138ndash1150 2005

[40] K Kanazawa Y Kawata N Niki et al ldquoComputer-aideddiagnosis for pulmonary nodules based on helical CT imagesrdquoComputerized Medical Imaging and Graphics vol 22 no 2 pp157ndash167 1998

[41] F Han H Wang G Zhang et al ldquoTexture feature analysis forcomputer-aided diagnosis on pulmonary nodulesrdquo Journal ofDigital Imaging vol 28 no 1 pp 99ndash115 2014

[42] H Hu Q Wang H Tang L Xiong and Q Lin ldquoMulti-slicecomputed tomography characteristics of solitary pulmonaryground-glass nodules Differences between malignant andbenignrdquoThoracic Cancer vol 7 no 1 pp 80ndash87 2016

[43] G-Y Zheng X-B Liu and G-H Han ldquoSurvey on medicalimage computer aided detection and diagnosis systemsrdquo RuanJian Xue BaoJournal of Software vol 29 no 5 pp 1471ndash15142018

[44] F H Edwards P S Schaefer S Callahan G M Graeber andR A Albus ldquoBayesian statistical theory in the preoperativediagnosis of pulmonary lesionsrdquoCHEST vol 92 no 5 pp 888ndash891 1987

[45] H Krewer B Geiger L O Hall et al ldquoEffect of texture featuresin computer aided diagnosis of pulmonary nodules in low-dosecomputed tomographyrdquo in Proceedings of the IEEE InternationalConference on Systems Man and Cybernetics (SMC rsquo13) pp3887ndash3891 IEEE Manchester UK October 2013

[46] C Jacobs E M van Rikxoort E T Scholten et al ldquoSolidpart-solid or non-solid classification of pulmonary nodulesin low-dose chest computed tomography by a computer-aideddiagnosis systemrdquo Investigative Radiology vol 50 no 3 pp 168ndash173 2015

14 BioMed Research International

[47] X Liu L Ma L Song Y Zhao X Zhao and C Zhou ldquoRec-ognizing common CT imaging signs of lung diseases througha new feature selection method based on fisher criterion andgenetic optimizationrdquo IEEE Journal of Biomedical and HealthInformatics vol 19 no 2 pp 635ndash647 2015

[48] T Sun J Wang X Li et al ldquoComparative evaluation of supportvectormachines for computer aided diagnosis of lung cancer inCT based on a multi-dimensional data setrdquo Computer Methodsand Programs in Biomedicine vol 111 no 2 pp 519ndash524 2013

[49] H Arimura S Katsuragawa K Suzuki et al ldquoComputerizedscheme for automated detection of lung nodules in low-dose computed tomography images for lung cancer screeningrdquoAcademic Radiology vol 11 no 6 pp 617ndash629 2004

[50] T Sun R Zhang J Wang X Li and X Guo ldquoComputer-aideddiagnosis for early-stage lung cancer based on longitudinal andbalanced datardquo PLoS ONE vol 8 no 5 Article ID e63559 2013

[51] F Han C L Novak S Aylward et al ldquoA new 3D texture featurebased computer-aided diagnosis approach to differentiate pul-monary nodulesrdquo Proceedings of the SPIE Article ID 86702Z2013

[52] K-L Hua C-H Hsu S C Hidayati W-H Cheng and Y-J Chen ldquoComputer-aided classification of lung nodules oncomputed tomography images via deep learning techniquerdquoOncoTargets andTherapy 2015

[53] Y LeCun L Bottou Y Bengio and P Haffner ldquoGradient-basedlearning applied to document recognitionrdquo Proceedings of theIEEE vol 86 no 11 pp 2278ndash2323 1998

[54] W Shen M Zhou F Yang C Yang and J Tian ldquoMulti-scaleconvolutional neural networks for lung nodule classificationrdquoin Proceedings of the International Conference on InformationProcessing in Medical Imaging vol 24 pp 588ndash599 Springer2015

[55] N Tajbakhsh and K Suzuki ldquoComparing two classes of end-to-end machine-learning models in lung nodule detection andclassificationMTANNs vs CNNsrdquo Pattern Recognition vol 63pp 476ndash486 2017

[56] H Greenspan B Van Ginneken and R M Summers ldquoGuesteditorial deep learning inmedical imaging overview and futurepromise of an exciting new techniquerdquo IEEE Transactions onMedical Imaging vol 35 no 5 pp 1153ndash1159 2016

[57] I J Goodfellow J Pouget-Abadie M Mirza et al ldquoGenerativeadversarial netsrdquo in Proceedings of the 28th Annual Conferenceon Neural Information Processing Systems 2014 NIPS 2014 pp2672ndash2680 Canada December 2014

[58] M J Chuquicusma S Hussein J Burt and U Bagci ldquoHowto fool radiologists with generative adversarial networks Avisual turing test for lung cancer diagnosisrdquo in Proceedings of the2018 IEEE 15th International Symposium Biomedical Imagingpp 240ndash244 IEEE 2018

[59] X Zhu ldquoSemi-supervised learning literature surveyrdquo ComputerScience University ofWisconsin-Madison vol 2 no 3 p 4 2006

[60] M Li andZ-H Zhou ldquoImprove computer-aided diagnosis withmachine learning techniques usingundiagnosed samplesrdquo IEEETransactions on Systems Man and Cybernetics Systems vol 37no 6 pp 1088ndash1098 2007

[61] A Radford L Metz and S Chintala ldquoUnsupervised represen-tation learning with deep convolutional generative adversarialnetworksrdquo 2015 httpsarxivorgabs07123011

[62] Y Liu Z Xing C Deng P Li and M Guo ldquoAutomaticallydetecting lung nodules based on shape descriptor and semi-supervised learningrdquo in Proceedings of the 2010 International

Conference on Computer Application and System ModelingICCASM 2010 pp V1647ndashV1650 China October 2010

[63] Y Peng and P Flach ldquoSoft discretization to enhance thecontinuous decision tree inductionrdquo Integrating Aspects of DataMining Decision Support and Meta-Learning vol 1 no 34 pp109ndash118 2001

[64] N Settouti M E Habib Daho M E Amine Lazouni and MA Chikh ldquoRandom forest in semi-supervised learning (Co-Forest)rdquo in Proceedings of the 2013 8th International Workshopon Systems Signal Processing and Their Applications WoSSPA2013 pp 326ndash329 Algeria May 2013

[65] S G Armato III G McLennan L Bidaut et al ldquoThe lungimage database consortium (lidc) and image database resourceinitiative (idri) a completed reference database of lung noduleson ct scansrdquoMedical Physics vol 38 no 2 pp 915ndash931 2011

[66] D E Ost and M K Gould ldquoDecision making in patientswith pulmonary nodulesrdquo American Journal of Respiratory andCritical Care Medicine vol 185 no 4 pp 363ndash372 2012

[67] M M Goodsitt H Chan T W Way S C Larson E GChristodoulou and J Kim ldquoAccuracy of the CT numbersof simulated lung nodules imaged with multi-detector CTscannersrdquoMedical Physics vol 33 no 8 pp 3006ndash3017 2006

[68] MMGoodsittHChanTWWayM J Schipper S C Larsonand E G Christodoulou ldquoQuantitative CT of lung nodulesdependence of calibration on patient body size anatomicregion and calibration nodule size for single- and dual-energytechniquesrdquoMedical Physics vol 36 no 7 pp 3107ndash3121 2009

[69] P Opulencia D S Channin D S Raicu and J D FurstldquoRadLex and lung nodule image featuresrdquo Journal of DigitalImaging vol 24 no 2 pp 256ndash270 2011

[70] J-C Wang S Sone L Feng et al ldquoRapidly growing smallperipheral lung cancers detected by screening CT Correlationbetween radiological appearance and pathological featuresrdquoBritish Journal of Radiology vol 73 no 873 pp 930ndash937 2000

[71] S Takashima Y Maruyama M Hasegawa A Saito M Hani-uda and M Kadoya ldquoHigh-resolution CT features prognosticsignificance in peripheral lung adenocarcinoma with bronchi-oloalveolar carcinoma componentsrdquo Respiration vol 70 no 1pp 36ndash42 2003

[72] R M Lindell T E Hartman S J Swensen et al ldquoFive-yearlung cancer screening experience CT appearance growth ratelocation and histologic features of 61 lung cancersrdquo Radiologyvol 242 no 2 pp 555ndash562 2007

[73] T J Kim D H Han K N Jin and K Won Lee ldquoLung cancerdetected at cardiac CT prevalence clinicoradiologic featuresand importance of fullndashfield-of-view imagesrdquo Radiology vol255 no 2 pp 369ndash376 2010

[74] MVazquez D Carter E Brambilla et al ldquoSolitary andmultipleresected adenocarcinomas after CT screening for lung cancerhistopathologic features and their prognostic implicationsrdquoLung Cancer vol 64 no 2 pp 148ndash154 2009

Stem Cells International

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

MEDIATORSINFLAMMATION

of

EndocrinologyInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Disease Markers

Hindawiwwwhindawicom Volume 2018

BioMed Research International

OncologyJournal of

Hindawiwwwhindawicom Volume 2013

Hindawiwwwhindawicom Volume 2018

Oxidative Medicine and Cellular Longevity

Hindawiwwwhindawicom Volume 2018

PPAR Research

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Immunology ResearchHindawiwwwhindawicom Volume 2018

Journal of

ObesityJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational and Mathematical Methods in Medicine

Hindawiwwwhindawicom Volume 2018

Behavioural Neurology

OphthalmologyJournal of

Hindawiwwwhindawicom Volume 2018

Diabetes ResearchJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Research and TreatmentAIDS

Hindawiwwwhindawicom Volume 2018

Gastroenterology Research and Practice

Hindawiwwwhindawicom Volume 2018

Parkinsonrsquos Disease

Evidence-Based Complementary andAlternative Medicine

Volume 2018Hindawiwwwhindawicom

Submit your manuscripts atwwwhindawicom

Page 8: A Novel Computer-Aided Diagnosis Scheme on Small Annotated ...downloads.hindawi.com/journals/bmri/2019/6425963.pdf · BioMedResearchInternational 100 Reshape Deconv Deconv Deconv

8 BioMed Research International

3 6 9 12 15 18 21 24 27 30 33 36 39 42 45 48 54 101

Diameter length summary

0100200300400500600700

Figure 7 The diameter-size distribution of the nodules in the LIDC-IDRI dataset

Table 1 The scheme of sign selection

Used as signs AbandonedSubtlety - -Internal structure - -Calcification =4 ltgt4Sphericity - -Margin - -Lobulation gt=4 lt4Spiculation gt=4 lt4Texture lt=2 gt2

42 LIDC-IDRI Instances In the LIDC-IDRI CT imagingslices most of the annotated nodules have diameters less than32 pixels as shown in Figure 7 Therefore we choose 32 times 32as the input ROI size

A higher degree of lobulation speculation and nonsolidtexture signs indicates a greater probability of malignantnodules [66] A calcification sign usually indicates a benignnodule [66ndash68] except when it has a noncentral appearanceSigns of subtlety [69] internal structure [27] sphericity andmargin [33] have not been clearly proven to have a strongrelationship with malignancy To simplify the comparisonsin this experiment we selected nodules from LIDC withcalcification =4 (noncentral calcification) lobulation gt=4spiculation gt=4 texture lt=2 and malignancy gt=3 as exper-imental instances based on the selection rules shown inTable 1

LIDC-IDRI contains 21057 annotated nodules Weselected the 4 category nodules with signs such as noncen-tral calcification lobulation speculation and nonsolidGGOtexture that have a high prevalence of malignancy as theexperimental objects For the probability of malignancy inthese nodules we adopted the average value of 4 radiologistsrsquoscores The noncentral calcification lobulation spiculationand nonsolidGGO texture signs are illustrated in Table 2 Inaddition we randomly extracted 32 times 32-pixel image patchesfrom slices not annotated by any radiologist as negativesamples In total we used five types of image blocks in ourexperiment

Based on the center point of the merged regions anno-tated by 4 experts we extracted 32 times 32-pixel image blocksas the experimental input by following the selection criteriashown in Table 1 Among these samples we ensured that

a given category of ROI patches for a single patient doesnot appear in any two subsets simultaneously which helpedensure that the specificity of the trained individual networksis as high as possible

43 LISS Instances From LISS we selected signs of lobula-tion spiculation pleural indentation and GGO associatedwith malignant lung cancer [70ndash74] as the experimentalobjects The number of samples in each category is shown inTable 3

44 Evaluation Criteria To evaluate the performance ofthe algorithm presented in this paper we considered thefollowing criteria

(1) ROC The Receiver Operator Characteristic curve(ROC) is a method that comprehensively and simul-taneously reflects the sensitivity and specificity of theclassification result By comparing the classificationresults of different samples with the annotation labelsa series of sensitivity and specificity scores is cal-culated Then a curve is drawn using sensitivity asthe ordinate and 1 minus specificity as the abscissa Alarger area under the curve (AUC) indicates a higherdiagnostic accuracy On the ROC curve the pointclosest to the top left of the coordinate diagram is thecritical value that reflects the highest sensitivity andspecificity

(2) Confusion matrix A confusion matrix is also calledan error matrix and it is a visual representation of theclassification effect A confusion matrix can be usedto describe the relationship between the real categoryattribute of the sample data and the recognition resultIt is a method for evaluating classifier performanceand iswidely used in pattern recognition A confusionmatrix is also a performance evaluation method thatscholars often use when solving practical applicationproblems

45 Experimentation From the LIDC-IDRI database weacquired 590 noncentral calcification 565 lobulation 576spiculation 545 nonsolidGGO texture sign patches and2500 negative image patchesThe experiment was conductedaccording to the following steps

(1) Set the initial values of T 119890 and 120579 to 6 05 and 06respectively

BioMed Research International 9

Table 2 Sign samples of calcification lobulation spiculation and nonsolidGGO texture

Noncentral calcification Lobulation Spiculation NonsolidGGO texture

Subtlety 467 50 50 40 40 50 367 30Internal structure 10 10 10 10 10 10 10 10Calcification 40 40 60 60 60 60 60 60Sphericity 333 267 425 40 50 50 433 40Margin 367 467 45 40 20 40 267 10Lobulation 167 20 425 40 40 10 267 10Spiculation 133 133 225 40 40 50 167 20Texture 50 467 475 40 50 40 133 10

Table 3 Number of selected instances of LISS

CISL Category of Lesion RegionsGGO 45Lobulation 41Spiculation 29Pleural Indentation 45Negative 80

(2) Train the GAN until the discrepancy cost reaches abalance as illustrated in Figure 8

(3) Train a primary fuzzy Co-forest based on the originalsamples utilizing the features exported from thetrained DCGAN discriminator in Step (2)

(4) Input a random vector to DCGAN and transmit thegenerated features to the concomitant random fuzzydecision forest119864i in the primary Co-forest until119882119894119905 ge119890119894119905minus1119882119894119905minus1119890119894119905 for all classesIn this process if the maximal weight of the fuzzylabel from 119864i exceeds the threshold 120579 store bothfeatureswith thematched label otherwise discard thegenerated image and generate a new image

(5) Retrain the corresponding tree 119890i of 119864i(6) Test the performance of the system

As a comparison we trained a C45 random forest accordingto the same schemewhich has been used byG2C-CAD as thebaseline method

We also conducted experiments on samples obtainedfrom LISS

5 Results

We divided the dataset into two parts 90 for training and10 for validation In this way the nodule distribution in thevalidation subsets is consistent with the nodule distributionof the original dataset according to the radiologistsrsquo consensusof their evaluations at the nodule levelThe number of trees intheCo-forest classificationmodelwas 6 primary trainingwasconducted on the training data and finally the system was

further validated on the remaining 10 The sensitivity andspecificity of each class instance was calculated and comparedusing the ROC curves shown in Figure 9

In Figure 9 theAUCsof noncentral calcification negativeimage samples lobulation spiculation and nonsolidGGOtexture are 0946 0939 0912 0908 and 0887 respectivelyFrom the curves in Figure 9 we can see that the G2C-CADsystem achieves the highest overall classification accuracy onthe calcification sign Compared to noncentral calcificationand negative image samples the AUCs of lobulation spicula-tion and nonsolidGGO texture are relatively lower

To show the underlying classification error distribution aconfusion matrix of the 5 classification results is presented inFigure 10

The diagonal numbers in the confusion matrix representthe recognition accuracy rates of the corresponding categoryand the nondiagonal elements are misclassification rates iethe ratio of other category test samples that weremisclassifiedto this class From Figure 10 we can see that the noncentralcalcification sign has the highest classification accuracywhile the nonsolidGGO texture sign has the highest mis-classification rate Misclassified spiculation signs are mostlyrecognized as lobulation By comparing the test sampleswe find that many of these two types of samples have verysimilar textures From the confusion matrix we can alsosee that most misclassifications occur between lobulationspiculation and nonsolidGGO texture signs

6 Discussion

We performed a performance comparison on each categorybetween G2C-CAD and the C45 random forest modelusing the confusion matrix First we constructed confusionmatrices reflecting the accuracies of G2C-CAD and C45Then we calculated the difference matrix for those twoconfusion matrices as shown in Figure 11 The float numberson the diagonal represent the differences between G2C-CAD and C45 regarding their classification accuracy forthe corresponding category while the float numbers on thenondiagonal represent their differences in the classificationerror rate showing the numbers for each class that weremisclassified into the corresponding category

10 BioMed Research International

100000 200000 300000 4000000iteration

minus70

minus60

minus50

minus40

minus30

minus20

minus10

0

trai

n di

sc co

st

Figure 8 Training discrepancy cost of DCGAN

ROC Curves

AUCNon-central Calcification A = 0946Negative A = 0939Lobulation A = 0912Spiculation A = 0908Non-solidGGO Texture A = 0887

00

2

4

6

8

10

Sens

itivi

ty

2 4 6 8 10001 - Specificity

Figure 9 ROC curves for the 5 categories of sign ROI recognition results

Consequently positive numbers on the diagonal andlarger values indicate that G2C-CAD achieved a betterperformance than that of C45 For the elements that arenot on the diagonal the situation is the opposite As shownin Figure 11 no positive values occur in the nondiagonalelements which means that G2C-CAD possesses greaterdiscrimination ability between each category The elementvalues on the diagonal are all positive and the average valueof the diagonal numbers in the difference confusion matrixis 0144 demonstrating that our method has a better overallperformance

To verify the effectiveness of the fuzzy algorithm we alsoconducted multiclassification experiments using G2C-CADwithout the fuzzy algorithm and compared the performancesof the two algorithms using a difference confusion matrix asshown in Figure 12

FromFigure 12 we can see that the discrimination perfor-mance of the CAD system employing the fuzzy algorithm isobviously better than that of the nonfuzzy one Furthermoreas shown in Figure 12 the CAD performance of the fuzzyalgorithm better distinguishes the difficult lobulation signfrom the spiculation sign

BioMed Research International 11

Cal 0933 005 0011 0002 0004

Lob 0 0907 006 0026 0007

Spic 0 0077 0891 0032 0

Tex 001 0064 0046 0878 0002

NegCa

l

Lob

Spic

Tex

Neg

0 0034 0028 0012 0926

Figure 10 Confusion matrix of the predicted results

Cal 0119 minus0051 minus00039 minus00623 minus00018

Lob minus00183 01286 minus00994 minus00101 minus00008

Spic minus00067 minus01065 01317 minus0028 minus0024

Tex minus00422 minus00627 minus01119 02175 minus00007

Neg

Cal

Lob

Spic

Tex

Neg

minus00002 minus006 minus00135 minus00495 01232

Figure 11 Performance difference matrix of G2C-CAD and C45

The experimental result on LISS shows higher perfor-mance the areas under the ROC curve of GGO lobulationspiculation pleural indentation and negative image samplesare 0972 0964 0941 0967 and 0953 respectively Bycomparing the training dataset and test samples we foundthat the main reason may be that the samples in differentcategories are more separable visually

7 Conclusion

In this paper by coupling a GAN with a semisupervisedlearning approach we proposed a G2C-CAD method todetect signs that are highly correlated with malignant pul-monary nodules We first trained a DCGAN on a smallsample set Then we extracted the features from the CNNdiscriminator trained with the training samples and usedthem to train a primary fuzzy Co-forest classifier Then weuse the trained DCGAN to generate large amounts of realistic

fake samples Based on these fake samples we conductedsemisupervised learning with the fuzzy Co-forest and finallyobtained a classifierwith excellent performance By validatingon the LIDC dataset the area under the ROC curve for fivesign types noncentral calcification negative image sampleslobulation spiculation and nonsolidGGO texture reached0946 0939 0912 0908 and 0887 respectively On the LISSdataset the proposed system showed comprehensively higherclassification performances than those of a trained C45classifier The experimental results show that the proposedG2C-CAD is an appropriate method for solving the problemof insufficient samples in the medical image analysis fieldMoreover our system can also be used to establish a trainingsample library for CAD classification diagnosis which holdsgreat significance for future medical image analysis

In future work we plan to combine this method withmultiple-instance learning to performweak supervised learn-ing directly on CT slice images or to extend the algorithm

12 BioMed Research International

00083 minus0002 minus00023 minus00032 minus00008

minus00013 00159 minus00074 minus00061 minus00011

00011 minus00044 00122 minus00042 minus00025

minus00002 minus00031 minus00029 00074 minus00012

minus00008 minus00036 minus00013 minus00039 00096

Cal

Lob

Spic

Tex

NegCa

l

Lob

Spic

Tex

Neg

Figure 12 Performance difference matrix of fuzzy G2C-CAD and nonfuzzy G2C-CAD

making it suitable for use in the 3D medical image classifica-tion field

Data Availability

The data used to support the findings of this study areavailable from the corresponding author upon request

Disclosure

The publicly accessible medical images database of LIDCIDRI is used

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work was supported in part by National Natural ScienceFoundation of China (Grant nos 60973059 and 81171407) andProgram for New Century Excellent Talents in University ofChina (Grant no NCET-10-0044)

References

[1] ldquoCancersrdquo httpwwwwhointennews-roomfact-sheetsde-tailcancer

[2] ldquoCancer stat facts common cancer sitesrdquo httpsseercancergovstatfactshtmlcommonhtml

[3] S Diederich D Wormanns M Semik et al ldquoScreening forearly lung cancer with low-dose spiral CT prevalence in 817asymptomatic smokersrdquo Radiology vol 222 no 3 pp 773ndash7812002

[4] J F Williamson ldquoBrachytherapy technology and physics prac-tice since 1950 A half-century of progressrdquo Physics in Medicineand Biology vol 51 no 13 article no R18 pp R303ndashR325 2006

[5] J F Williamson S K Das M S Goodsitt and J O DeasyldquoIntroducing the medical physics dataset articlerdquo MedicalPhysics vol 44 no 2 pp 349-350 2017

[6] S Balik E Weiss N Jan et al ldquoEvaluation of 4-dimensionalcomputed tomography to 4-dimensional cone-beam computedtomography deformable image registration for lung canceradaptive radiation therapyrdquo International Journal of RadiationOncology ∙ Biology ∙ Physics vol 86 no 2 pp 372ndash379 2013

[7] C F Mountain ldquoRevisions in the international system forstaging lung cancerrdquo Chest vol 111 no 6 pp 1710ndash1717 1997

[8] M T van Rens J M van den Bosch A Brutel de la Riviereand H R Elbers ldquoPrognostic assessment of 2361 patients whounderwent pulmonary resection for non-small cell lung cancerstage I II and IIIArdquo CHEST vol 117 no 2 pp 374ndash379 2000

[9] E F Patz S Rossi D H Harpole J E Herndon and P CGoodman ldquoCorrelation of tumor size and survival in patientswith stage IA non-small cell lung cancerrdquo Chest vol 117 no 6pp 1568ndash1571 2000

[10] C I Henschke P Boffetta O Gorlova R Yip J O DeLanceyand M Foy ldquoAssessment of lung-cancer mortality reductionfrom CT screeningrdquo Lung Cancer vol 71 no 3 pp 328ndash3322011

[11] D Kumar A Wong and D A Clausi ldquoLung nodule classifi-cation using deep features in CT imagesrdquo in Proceedings of the12th Conference on Computer and Robot Vision CRV 2015 pp133ndash138 Canada June 2015

[12] M Kaneko K Eguchi and H Ohmatsu ldquoPeripheral lungcancer screening and detection with low-dose spiral CT versusradiographyrdquo Radiology vol 201 no 3 pp 798ndash802 1996

[13] S Swensen J Jett J Sloan et al ldquoScreening for lung cancerwith low-dose spiral computed tomographyrdquo American Journalof Respiratory and Critical Care Medicine vol 165 no 4 pp508ndash513 2002

[14] E Paci D Puliti A Lopes Pegna et al ldquoMortality survivaland incidence rates in the ITALUNG randomised lung cancerscreening trialrdquoThorax vol 72 no 9 pp 825ndash831 2017

[15] S Iwano TNakamura Y Kamioka andT Ishigaki ldquoComputer-aided diagnosis A shape classification of pulmonary nodulesimaged by high-resolution CTrdquo Computerized Medical Imagingand Graphics vol 29 no 7 pp 565ndash570 2005

BioMed Research International 13

[16] A Bernard and H Shennib ldquoResection of pulmonary nodulesusing video-assisted thoracic surgeryrdquo The Annals of ThoracicSurgery vol 61 no 1 pp 202ndash205 1996

[17] M J Mack S R Hazelrigg R J Landreneau and T E AcuffldquoThoracoscopy for the diagnosis of the indeterminate solitarypulmonary nodulerdquoThe Annals of Thoracic Surgery vol 56 no4 pp 825ndash832 1993

[18] S Siegelman E Zerhouni F Leo N Khouri and F StitikldquoCT of the solitary pulmonary nodulerdquo American Journal ofRoentgenology vol 135 no 1 pp 1ndash13 1980

[19] B A Keagy P J K Starek G F Murray J W Battaglini ME Lores and B R Wilcox ldquoMajor pulmonary resection forsuspected but unconfirmedmalignancyrdquoTheAnnals ofThoracicSurgery vol 38 no 4 pp 314ndash316 1984

[20] J Collins ldquoCT signs and patterns of lung diseaserdquo RadiologicClinics of North America vol 39 no 6 pp 1115ndash1135 2001

[21] G Han X Liu F Han et al ldquoThe LISSmdasha public databaseof common imaging signs of lung diseases for computer-aideddetection and diagnosis research and medical educationrdquo IEEETransactions on Biomedical Engineering vol 62 no 2 pp 648ndash656 2015

[22] G Han X LiuN Q Soomro et al ldquoEmpirical driven automaticdetection of lobulation imaging signs in lung CTrdquo BioMedResearch International vol 2017 2017

[23] C-Z Shi Q Zhao L-P Luo and J-X He ldquoSize of solitarypulmonary nodule was the risk factor of malignancyrdquo Journalof Thoracic Disease vol 6 no 6 pp 668ndash676 2014

[24] J J Erasmus J E Connolly H P McAdams and V L RogglildquoSolitary pulmonary nodules part I Morphologic evaluationfor differentiation of benign and malignant lesionsrdquo Radio-Graphics vol 20 no 1 pp 43ndash58 2000

[25] K Furuya S Murayama H Soeda et al ldquoNew classification ofsmall pulmonary nodules by margin characteristics on high-resolution CTrdquo Acta Radiologica vol 40 no 5 pp 496ndash5042016

[26] Z Yang S Sone S Takashima et al ldquoHigh-resolution CTanalysis of small peripheral lung adenocarcinomas revealed onscreening helical CTrdquo American Journal of Roentgenology vol176 no 6 pp 1399ndash1407 2001

[27] M F McNitt-Gray E M Hart N Wyckoff J W Sayre J GGoldin and D R Aberle ldquoA pattern classification approachto characterizing solitary pulmonary nodules imaged on highresolution CT Preliminary resultsrdquoMedical Physics vol 26 no6 pp 880ndash888 1999

[28] V Ambrosini S Nicolini P Caroli et al ldquoPETCT imaging indifferent types of lung cancer an overviewrdquoEuropean Journal ofRadiology vol 81 no 5 pp 988ndash1001 2012

[29] F Li S Sone H Abe H MacMahon and K Doi ldquoMalignantversus benign nodules at CT screening for lung cancer com-parison of thin-section CT findingsrdquo Radiology vol 233 no 3pp 793ndash798 2004

[30] B Ganeshan E Panayiotou K Burnand S Dizdarevic and KMiles ldquoTumour heterogeneity in non-small cell lung carcinomaassessed by CT texture analysis a potential marker of survivalrdquoEuropean Radiology vol 22 no 4 pp 796ndash802 2012

[31] F Li S Sone H Abe H MacMahon S G Armato and KDoi ldquoLung cancers missed at low-dose helical CT screening ina general population comparison of clinical histopathologicand imaging findingsrdquo Radiology vol 225 no 3 pp 673ndash6832002

[32] H Chen Y Xu Y Ma and B Ma ldquoNeural network ensemble-based computer-aided diagnosis for differentiation of lungnodules on CT images clinical evaluationrdquoAcademic Radiologyvol 17 no 5 pp 595ndash602 2010

[33] C V Zwirewich S Vedal R R Miller and N L Muller ldquoSoli-tary pulmonary nodule high-resolution CT and radiologic-pathologic correlationrdquo Radiology vol 179 no 2 pp 469ndash4761991

[34] Y J Jeong C A Yi and K S Lee ldquoSolitary pulmonary nodulesdetection characterization and guidance for further diagnosticworkup and treatmentrdquoAmerican Journal of Roentgenology vol188 no 1 pp 57ndash68 2007

[35] A N Khan H H Al-Jahdali C M Allen K L Irion S AlGhanem and S S Koteyar ldquoThe calcified lung nodule whatdoes it meanrdquo Annals of Thoracic Medicine vol 5 no 2 pp67ndash79 2010

[36] H C BeckerW J Nettleton P HMeyers JW Sweeney and CMNice ldquoDigital computer determination of amedical diagnos-tic index directly from chest X-ray imagesrdquo IEEE Transactionson Biomedical Engineering vol BME-11 no 3 pp 67ndash72 1964

[37] P H Meyers C M Nice H C Becker W J Nettleton J WSweeney and G R Meckstroth ldquoAutomated computer analysisof radiographic imagesrdquo Radiology vol 83 no 6 pp 1029ndash10341964

[38] B Van Ginneken B Ter Haar Romeny and M ViergeverldquoComputer-aided diagnosis in chest radiography a surveyrdquoIEEE Transactions on Medical Imaging vol 20 no 12 pp 1228ndash1241 2001

[39] K Suzuki F Li S Sone and K Doi ldquoComputer-aided diagnos-tic scheme for distinction between benign and malignant nod-ules in thoracic low-dose CT by use ofmassive training artificialneural networkrdquo IEEE Transactions on Medical Imaging vol 24no 9 pp 1138ndash1150 2005

[40] K Kanazawa Y Kawata N Niki et al ldquoComputer-aideddiagnosis for pulmonary nodules based on helical CT imagesrdquoComputerized Medical Imaging and Graphics vol 22 no 2 pp157ndash167 1998

[41] F Han H Wang G Zhang et al ldquoTexture feature analysis forcomputer-aided diagnosis on pulmonary nodulesrdquo Journal ofDigital Imaging vol 28 no 1 pp 99ndash115 2014

[42] H Hu Q Wang H Tang L Xiong and Q Lin ldquoMulti-slicecomputed tomography characteristics of solitary pulmonaryground-glass nodules Differences between malignant andbenignrdquoThoracic Cancer vol 7 no 1 pp 80ndash87 2016

[43] G-Y Zheng X-B Liu and G-H Han ldquoSurvey on medicalimage computer aided detection and diagnosis systemsrdquo RuanJian Xue BaoJournal of Software vol 29 no 5 pp 1471ndash15142018

[44] F H Edwards P S Schaefer S Callahan G M Graeber andR A Albus ldquoBayesian statistical theory in the preoperativediagnosis of pulmonary lesionsrdquoCHEST vol 92 no 5 pp 888ndash891 1987

[45] H Krewer B Geiger L O Hall et al ldquoEffect of texture featuresin computer aided diagnosis of pulmonary nodules in low-dosecomputed tomographyrdquo in Proceedings of the IEEE InternationalConference on Systems Man and Cybernetics (SMC rsquo13) pp3887ndash3891 IEEE Manchester UK October 2013

[46] C Jacobs E M van Rikxoort E T Scholten et al ldquoSolidpart-solid or non-solid classification of pulmonary nodulesin low-dose chest computed tomography by a computer-aideddiagnosis systemrdquo Investigative Radiology vol 50 no 3 pp 168ndash173 2015

14 BioMed Research International

[47] X Liu L Ma L Song Y Zhao X Zhao and C Zhou ldquoRec-ognizing common CT imaging signs of lung diseases througha new feature selection method based on fisher criterion andgenetic optimizationrdquo IEEE Journal of Biomedical and HealthInformatics vol 19 no 2 pp 635ndash647 2015

[48] T Sun J Wang X Li et al ldquoComparative evaluation of supportvectormachines for computer aided diagnosis of lung cancer inCT based on a multi-dimensional data setrdquo Computer Methodsand Programs in Biomedicine vol 111 no 2 pp 519ndash524 2013

[49] H Arimura S Katsuragawa K Suzuki et al ldquoComputerizedscheme for automated detection of lung nodules in low-dose computed tomography images for lung cancer screeningrdquoAcademic Radiology vol 11 no 6 pp 617ndash629 2004

[50] T Sun R Zhang J Wang X Li and X Guo ldquoComputer-aideddiagnosis for early-stage lung cancer based on longitudinal andbalanced datardquo PLoS ONE vol 8 no 5 Article ID e63559 2013

[51] F Han C L Novak S Aylward et al ldquoA new 3D texture featurebased computer-aided diagnosis approach to differentiate pul-monary nodulesrdquo Proceedings of the SPIE Article ID 86702Z2013

[52] K-L Hua C-H Hsu S C Hidayati W-H Cheng and Y-J Chen ldquoComputer-aided classification of lung nodules oncomputed tomography images via deep learning techniquerdquoOncoTargets andTherapy 2015

[53] Y LeCun L Bottou Y Bengio and P Haffner ldquoGradient-basedlearning applied to document recognitionrdquo Proceedings of theIEEE vol 86 no 11 pp 2278ndash2323 1998

[54] W Shen M Zhou F Yang C Yang and J Tian ldquoMulti-scaleconvolutional neural networks for lung nodule classificationrdquoin Proceedings of the International Conference on InformationProcessing in Medical Imaging vol 24 pp 588ndash599 Springer2015

[55] N Tajbakhsh and K Suzuki ldquoComparing two classes of end-to-end machine-learning models in lung nodule detection andclassificationMTANNs vs CNNsrdquo Pattern Recognition vol 63pp 476ndash486 2017

[56] H Greenspan B Van Ginneken and R M Summers ldquoGuesteditorial deep learning inmedical imaging overview and futurepromise of an exciting new techniquerdquo IEEE Transactions onMedical Imaging vol 35 no 5 pp 1153ndash1159 2016

[57] I J Goodfellow J Pouget-Abadie M Mirza et al ldquoGenerativeadversarial netsrdquo in Proceedings of the 28th Annual Conferenceon Neural Information Processing Systems 2014 NIPS 2014 pp2672ndash2680 Canada December 2014

[58] M J Chuquicusma S Hussein J Burt and U Bagci ldquoHowto fool radiologists with generative adversarial networks Avisual turing test for lung cancer diagnosisrdquo in Proceedings of the2018 IEEE 15th International Symposium Biomedical Imagingpp 240ndash244 IEEE 2018

[59] X Zhu ldquoSemi-supervised learning literature surveyrdquo ComputerScience University ofWisconsin-Madison vol 2 no 3 p 4 2006

[60] M Li andZ-H Zhou ldquoImprove computer-aided diagnosis withmachine learning techniques usingundiagnosed samplesrdquo IEEETransactions on Systems Man and Cybernetics Systems vol 37no 6 pp 1088ndash1098 2007

[61] A Radford L Metz and S Chintala ldquoUnsupervised represen-tation learning with deep convolutional generative adversarialnetworksrdquo 2015 httpsarxivorgabs07123011

[62] Y Liu Z Xing C Deng P Li and M Guo ldquoAutomaticallydetecting lung nodules based on shape descriptor and semi-supervised learningrdquo in Proceedings of the 2010 International

Conference on Computer Application and System ModelingICCASM 2010 pp V1647ndashV1650 China October 2010

[63] Y Peng and P Flach ldquoSoft discretization to enhance thecontinuous decision tree inductionrdquo Integrating Aspects of DataMining Decision Support and Meta-Learning vol 1 no 34 pp109ndash118 2001

[64] N Settouti M E Habib Daho M E Amine Lazouni and MA Chikh ldquoRandom forest in semi-supervised learning (Co-Forest)rdquo in Proceedings of the 2013 8th International Workshopon Systems Signal Processing and Their Applications WoSSPA2013 pp 326ndash329 Algeria May 2013

[65] S G Armato III G McLennan L Bidaut et al ldquoThe lungimage database consortium (lidc) and image database resourceinitiative (idri) a completed reference database of lung noduleson ct scansrdquoMedical Physics vol 38 no 2 pp 915ndash931 2011

[66] D E Ost and M K Gould ldquoDecision making in patientswith pulmonary nodulesrdquo American Journal of Respiratory andCritical Care Medicine vol 185 no 4 pp 363ndash372 2012

[67] M M Goodsitt H Chan T W Way S C Larson E GChristodoulou and J Kim ldquoAccuracy of the CT numbersof simulated lung nodules imaged with multi-detector CTscannersrdquoMedical Physics vol 33 no 8 pp 3006ndash3017 2006

[68] MMGoodsittHChanTWWayM J Schipper S C Larsonand E G Christodoulou ldquoQuantitative CT of lung nodulesdependence of calibration on patient body size anatomicregion and calibration nodule size for single- and dual-energytechniquesrdquoMedical Physics vol 36 no 7 pp 3107ndash3121 2009

[69] P Opulencia D S Channin D S Raicu and J D FurstldquoRadLex and lung nodule image featuresrdquo Journal of DigitalImaging vol 24 no 2 pp 256ndash270 2011

[70] J-C Wang S Sone L Feng et al ldquoRapidly growing smallperipheral lung cancers detected by screening CT Correlationbetween radiological appearance and pathological featuresrdquoBritish Journal of Radiology vol 73 no 873 pp 930ndash937 2000

[71] S Takashima Y Maruyama M Hasegawa A Saito M Hani-uda and M Kadoya ldquoHigh-resolution CT features prognosticsignificance in peripheral lung adenocarcinoma with bronchi-oloalveolar carcinoma componentsrdquo Respiration vol 70 no 1pp 36ndash42 2003

[72] R M Lindell T E Hartman S J Swensen et al ldquoFive-yearlung cancer screening experience CT appearance growth ratelocation and histologic features of 61 lung cancersrdquo Radiologyvol 242 no 2 pp 555ndash562 2007

[73] T J Kim D H Han K N Jin and K Won Lee ldquoLung cancerdetected at cardiac CT prevalence clinicoradiologic featuresand importance of fullndashfield-of-view imagesrdquo Radiology vol255 no 2 pp 369ndash376 2010

[74] MVazquez D Carter E Brambilla et al ldquoSolitary andmultipleresected adenocarcinomas after CT screening for lung cancerhistopathologic features and their prognostic implicationsrdquoLung Cancer vol 64 no 2 pp 148ndash154 2009

Stem Cells International

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

MEDIATORSINFLAMMATION

of

EndocrinologyInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Disease Markers

Hindawiwwwhindawicom Volume 2018

BioMed Research International

OncologyJournal of

Hindawiwwwhindawicom Volume 2013

Hindawiwwwhindawicom Volume 2018

Oxidative Medicine and Cellular Longevity

Hindawiwwwhindawicom Volume 2018

PPAR Research

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Immunology ResearchHindawiwwwhindawicom Volume 2018

Journal of

ObesityJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational and Mathematical Methods in Medicine

Hindawiwwwhindawicom Volume 2018

Behavioural Neurology

OphthalmologyJournal of

Hindawiwwwhindawicom Volume 2018

Diabetes ResearchJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Research and TreatmentAIDS

Hindawiwwwhindawicom Volume 2018

Gastroenterology Research and Practice

Hindawiwwwhindawicom Volume 2018

Parkinsonrsquos Disease

Evidence-Based Complementary andAlternative Medicine

Volume 2018Hindawiwwwhindawicom

Submit your manuscripts atwwwhindawicom

Page 9: A Novel Computer-Aided Diagnosis Scheme on Small Annotated ...downloads.hindawi.com/journals/bmri/2019/6425963.pdf · BioMedResearchInternational 100 Reshape Deconv Deconv Deconv

BioMed Research International 9

Table 2 Sign samples of calcification lobulation spiculation and nonsolidGGO texture

Noncentral calcification Lobulation Spiculation NonsolidGGO texture

Subtlety 467 50 50 40 40 50 367 30Internal structure 10 10 10 10 10 10 10 10Calcification 40 40 60 60 60 60 60 60Sphericity 333 267 425 40 50 50 433 40Margin 367 467 45 40 20 40 267 10Lobulation 167 20 425 40 40 10 267 10Spiculation 133 133 225 40 40 50 167 20Texture 50 467 475 40 50 40 133 10

Table 3 Number of selected instances of LISS

CISL Category of Lesion RegionsGGO 45Lobulation 41Spiculation 29Pleural Indentation 45Negative 80

(2) Train the GAN until the discrepancy cost reaches abalance as illustrated in Figure 8

(3) Train a primary fuzzy Co-forest based on the originalsamples utilizing the features exported from thetrained DCGAN discriminator in Step (2)

(4) Input a random vector to DCGAN and transmit thegenerated features to the concomitant random fuzzydecision forest119864i in the primary Co-forest until119882119894119905 ge119890119894119905minus1119882119894119905minus1119890119894119905 for all classesIn this process if the maximal weight of the fuzzylabel from 119864i exceeds the threshold 120579 store bothfeatureswith thematched label otherwise discard thegenerated image and generate a new image

(5) Retrain the corresponding tree 119890i of 119864i(6) Test the performance of the system

As a comparison we trained a C45 random forest accordingto the same schemewhich has been used byG2C-CAD as thebaseline method

We also conducted experiments on samples obtainedfrom LISS

5 Results

We divided the dataset into two parts 90 for training and10 for validation In this way the nodule distribution in thevalidation subsets is consistent with the nodule distributionof the original dataset according to the radiologistsrsquo consensusof their evaluations at the nodule levelThe number of trees intheCo-forest classificationmodelwas 6 primary trainingwasconducted on the training data and finally the system was

further validated on the remaining 10 The sensitivity andspecificity of each class instance was calculated and comparedusing the ROC curves shown in Figure 9

In Figure 9 theAUCsof noncentral calcification negativeimage samples lobulation spiculation and nonsolidGGOtexture are 0946 0939 0912 0908 and 0887 respectivelyFrom the curves in Figure 9 we can see that the G2C-CADsystem achieves the highest overall classification accuracy onthe calcification sign Compared to noncentral calcificationand negative image samples the AUCs of lobulation spicula-tion and nonsolidGGO texture are relatively lower

To show the underlying classification error distribution aconfusion matrix of the 5 classification results is presented inFigure 10

The diagonal numbers in the confusion matrix representthe recognition accuracy rates of the corresponding categoryand the nondiagonal elements are misclassification rates iethe ratio of other category test samples that weremisclassifiedto this class From Figure 10 we can see that the noncentralcalcification sign has the highest classification accuracywhile the nonsolidGGO texture sign has the highest mis-classification rate Misclassified spiculation signs are mostlyrecognized as lobulation By comparing the test sampleswe find that many of these two types of samples have verysimilar textures From the confusion matrix we can alsosee that most misclassifications occur between lobulationspiculation and nonsolidGGO texture signs

6 Discussion

We performed a performance comparison on each categorybetween G2C-CAD and the C45 random forest modelusing the confusion matrix First we constructed confusionmatrices reflecting the accuracies of G2C-CAD and C45Then we calculated the difference matrix for those twoconfusion matrices as shown in Figure 11 The float numberson the diagonal represent the differences between G2C-CAD and C45 regarding their classification accuracy forthe corresponding category while the float numbers on thenondiagonal represent their differences in the classificationerror rate showing the numbers for each class that weremisclassified into the corresponding category

10 BioMed Research International

100000 200000 300000 4000000iteration

minus70

minus60

minus50

minus40

minus30

minus20

minus10

0

trai

n di

sc co

st

Figure 8 Training discrepancy cost of DCGAN

ROC Curves

AUCNon-central Calcification A = 0946Negative A = 0939Lobulation A = 0912Spiculation A = 0908Non-solidGGO Texture A = 0887

00

2

4

6

8

10

Sens

itivi

ty

2 4 6 8 10001 - Specificity

Figure 9 ROC curves for the 5 categories of sign ROI recognition results

Consequently positive numbers on the diagonal andlarger values indicate that G2C-CAD achieved a betterperformance than that of C45 For the elements that arenot on the diagonal the situation is the opposite As shownin Figure 11 no positive values occur in the nondiagonalelements which means that G2C-CAD possesses greaterdiscrimination ability between each category The elementvalues on the diagonal are all positive and the average valueof the diagonal numbers in the difference confusion matrixis 0144 demonstrating that our method has a better overallperformance

To verify the effectiveness of the fuzzy algorithm we alsoconducted multiclassification experiments using G2C-CADwithout the fuzzy algorithm and compared the performancesof the two algorithms using a difference confusion matrix asshown in Figure 12

FromFigure 12 we can see that the discrimination perfor-mance of the CAD system employing the fuzzy algorithm isobviously better than that of the nonfuzzy one Furthermoreas shown in Figure 12 the CAD performance of the fuzzyalgorithm better distinguishes the difficult lobulation signfrom the spiculation sign

BioMed Research International 11

Cal 0933 005 0011 0002 0004

Lob 0 0907 006 0026 0007

Spic 0 0077 0891 0032 0

Tex 001 0064 0046 0878 0002

NegCa

l

Lob

Spic

Tex

Neg

0 0034 0028 0012 0926

Figure 10 Confusion matrix of the predicted results

Cal 0119 minus0051 minus00039 minus00623 minus00018

Lob minus00183 01286 minus00994 minus00101 minus00008

Spic minus00067 minus01065 01317 minus0028 minus0024

Tex minus00422 minus00627 minus01119 02175 minus00007

Neg

Cal

Lob

Spic

Tex

Neg

minus00002 minus006 minus00135 minus00495 01232

Figure 11 Performance difference matrix of G2C-CAD and C45

The experimental result on LISS shows higher perfor-mance the areas under the ROC curve of GGO lobulationspiculation pleural indentation and negative image samplesare 0972 0964 0941 0967 and 0953 respectively Bycomparing the training dataset and test samples we foundthat the main reason may be that the samples in differentcategories are more separable visually

7 Conclusion

In this paper by coupling a GAN with a semisupervisedlearning approach we proposed a G2C-CAD method todetect signs that are highly correlated with malignant pul-monary nodules We first trained a DCGAN on a smallsample set Then we extracted the features from the CNNdiscriminator trained with the training samples and usedthem to train a primary fuzzy Co-forest classifier Then weuse the trained DCGAN to generate large amounts of realistic

fake samples Based on these fake samples we conductedsemisupervised learning with the fuzzy Co-forest and finallyobtained a classifierwith excellent performance By validatingon the LIDC dataset the area under the ROC curve for fivesign types noncentral calcification negative image sampleslobulation spiculation and nonsolidGGO texture reached0946 0939 0912 0908 and 0887 respectively On the LISSdataset the proposed system showed comprehensively higherclassification performances than those of a trained C45classifier The experimental results show that the proposedG2C-CAD is an appropriate method for solving the problemof insufficient samples in the medical image analysis fieldMoreover our system can also be used to establish a trainingsample library for CAD classification diagnosis which holdsgreat significance for future medical image analysis

In future work we plan to combine this method withmultiple-instance learning to performweak supervised learn-ing directly on CT slice images or to extend the algorithm

12 BioMed Research International

00083 minus0002 minus00023 minus00032 minus00008

minus00013 00159 minus00074 minus00061 minus00011

00011 minus00044 00122 minus00042 minus00025

minus00002 minus00031 minus00029 00074 minus00012

minus00008 minus00036 minus00013 minus00039 00096

Cal

Lob

Spic

Tex

NegCa

l

Lob

Spic

Tex

Neg

Figure 12 Performance difference matrix of fuzzy G2C-CAD and nonfuzzy G2C-CAD

making it suitable for use in the 3D medical image classifica-tion field

Data Availability

The data used to support the findings of this study areavailable from the corresponding author upon request

Disclosure

The publicly accessible medical images database of LIDCIDRI is used

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work was supported in part by National Natural ScienceFoundation of China (Grant nos 60973059 and 81171407) andProgram for New Century Excellent Talents in University ofChina (Grant no NCET-10-0044)

References

[1] ldquoCancersrdquo httpwwwwhointennews-roomfact-sheetsde-tailcancer

[2] ldquoCancer stat facts common cancer sitesrdquo httpsseercancergovstatfactshtmlcommonhtml

[3] S Diederich D Wormanns M Semik et al ldquoScreening forearly lung cancer with low-dose spiral CT prevalence in 817asymptomatic smokersrdquo Radiology vol 222 no 3 pp 773ndash7812002

[4] J F Williamson ldquoBrachytherapy technology and physics prac-tice since 1950 A half-century of progressrdquo Physics in Medicineand Biology vol 51 no 13 article no R18 pp R303ndashR325 2006

[5] J F Williamson S K Das M S Goodsitt and J O DeasyldquoIntroducing the medical physics dataset articlerdquo MedicalPhysics vol 44 no 2 pp 349-350 2017

[6] S Balik E Weiss N Jan et al ldquoEvaluation of 4-dimensionalcomputed tomography to 4-dimensional cone-beam computedtomography deformable image registration for lung canceradaptive radiation therapyrdquo International Journal of RadiationOncology ∙ Biology ∙ Physics vol 86 no 2 pp 372ndash379 2013

[7] C F Mountain ldquoRevisions in the international system forstaging lung cancerrdquo Chest vol 111 no 6 pp 1710ndash1717 1997

[8] M T van Rens J M van den Bosch A Brutel de la Riviereand H R Elbers ldquoPrognostic assessment of 2361 patients whounderwent pulmonary resection for non-small cell lung cancerstage I II and IIIArdquo CHEST vol 117 no 2 pp 374ndash379 2000

[9] E F Patz S Rossi D H Harpole J E Herndon and P CGoodman ldquoCorrelation of tumor size and survival in patientswith stage IA non-small cell lung cancerrdquo Chest vol 117 no 6pp 1568ndash1571 2000

[10] C I Henschke P Boffetta O Gorlova R Yip J O DeLanceyand M Foy ldquoAssessment of lung-cancer mortality reductionfrom CT screeningrdquo Lung Cancer vol 71 no 3 pp 328ndash3322011

[11] D Kumar A Wong and D A Clausi ldquoLung nodule classifi-cation using deep features in CT imagesrdquo in Proceedings of the12th Conference on Computer and Robot Vision CRV 2015 pp133ndash138 Canada June 2015

[12] M Kaneko K Eguchi and H Ohmatsu ldquoPeripheral lungcancer screening and detection with low-dose spiral CT versusradiographyrdquo Radiology vol 201 no 3 pp 798ndash802 1996

[13] S Swensen J Jett J Sloan et al ldquoScreening for lung cancerwith low-dose spiral computed tomographyrdquo American Journalof Respiratory and Critical Care Medicine vol 165 no 4 pp508ndash513 2002

[14] E Paci D Puliti A Lopes Pegna et al ldquoMortality survivaland incidence rates in the ITALUNG randomised lung cancerscreening trialrdquoThorax vol 72 no 9 pp 825ndash831 2017

[15] S Iwano TNakamura Y Kamioka andT Ishigaki ldquoComputer-aided diagnosis A shape classification of pulmonary nodulesimaged by high-resolution CTrdquo Computerized Medical Imagingand Graphics vol 29 no 7 pp 565ndash570 2005

BioMed Research International 13

[16] A Bernard and H Shennib ldquoResection of pulmonary nodulesusing video-assisted thoracic surgeryrdquo The Annals of ThoracicSurgery vol 61 no 1 pp 202ndash205 1996

[17] M J Mack S R Hazelrigg R J Landreneau and T E AcuffldquoThoracoscopy for the diagnosis of the indeterminate solitarypulmonary nodulerdquoThe Annals of Thoracic Surgery vol 56 no4 pp 825ndash832 1993

[18] S Siegelman E Zerhouni F Leo N Khouri and F StitikldquoCT of the solitary pulmonary nodulerdquo American Journal ofRoentgenology vol 135 no 1 pp 1ndash13 1980

[19] B A Keagy P J K Starek G F Murray J W Battaglini ME Lores and B R Wilcox ldquoMajor pulmonary resection forsuspected but unconfirmedmalignancyrdquoTheAnnals ofThoracicSurgery vol 38 no 4 pp 314ndash316 1984

[20] J Collins ldquoCT signs and patterns of lung diseaserdquo RadiologicClinics of North America vol 39 no 6 pp 1115ndash1135 2001

[21] G Han X Liu F Han et al ldquoThe LISSmdasha public databaseof common imaging signs of lung diseases for computer-aideddetection and diagnosis research and medical educationrdquo IEEETransactions on Biomedical Engineering vol 62 no 2 pp 648ndash656 2015

[22] G Han X LiuN Q Soomro et al ldquoEmpirical driven automaticdetection of lobulation imaging signs in lung CTrdquo BioMedResearch International vol 2017 2017

[23] C-Z Shi Q Zhao L-P Luo and J-X He ldquoSize of solitarypulmonary nodule was the risk factor of malignancyrdquo Journalof Thoracic Disease vol 6 no 6 pp 668ndash676 2014

[24] J J Erasmus J E Connolly H P McAdams and V L RogglildquoSolitary pulmonary nodules part I Morphologic evaluationfor differentiation of benign and malignant lesionsrdquo Radio-Graphics vol 20 no 1 pp 43ndash58 2000

[25] K Furuya S Murayama H Soeda et al ldquoNew classification ofsmall pulmonary nodules by margin characteristics on high-resolution CTrdquo Acta Radiologica vol 40 no 5 pp 496ndash5042016

[26] Z Yang S Sone S Takashima et al ldquoHigh-resolution CTanalysis of small peripheral lung adenocarcinomas revealed onscreening helical CTrdquo American Journal of Roentgenology vol176 no 6 pp 1399ndash1407 2001

[27] M F McNitt-Gray E M Hart N Wyckoff J W Sayre J GGoldin and D R Aberle ldquoA pattern classification approachto characterizing solitary pulmonary nodules imaged on highresolution CT Preliminary resultsrdquoMedical Physics vol 26 no6 pp 880ndash888 1999

[28] V Ambrosini S Nicolini P Caroli et al ldquoPETCT imaging indifferent types of lung cancer an overviewrdquoEuropean Journal ofRadiology vol 81 no 5 pp 988ndash1001 2012

[29] F Li S Sone H Abe H MacMahon and K Doi ldquoMalignantversus benign nodules at CT screening for lung cancer com-parison of thin-section CT findingsrdquo Radiology vol 233 no 3pp 793ndash798 2004

[30] B Ganeshan E Panayiotou K Burnand S Dizdarevic and KMiles ldquoTumour heterogeneity in non-small cell lung carcinomaassessed by CT texture analysis a potential marker of survivalrdquoEuropean Radiology vol 22 no 4 pp 796ndash802 2012

[31] F Li S Sone H Abe H MacMahon S G Armato and KDoi ldquoLung cancers missed at low-dose helical CT screening ina general population comparison of clinical histopathologicand imaging findingsrdquo Radiology vol 225 no 3 pp 673ndash6832002

[32] H Chen Y Xu Y Ma and B Ma ldquoNeural network ensemble-based computer-aided diagnosis for differentiation of lungnodules on CT images clinical evaluationrdquoAcademic Radiologyvol 17 no 5 pp 595ndash602 2010

[33] C V Zwirewich S Vedal R R Miller and N L Muller ldquoSoli-tary pulmonary nodule high-resolution CT and radiologic-pathologic correlationrdquo Radiology vol 179 no 2 pp 469ndash4761991

[34] Y J Jeong C A Yi and K S Lee ldquoSolitary pulmonary nodulesdetection characterization and guidance for further diagnosticworkup and treatmentrdquoAmerican Journal of Roentgenology vol188 no 1 pp 57ndash68 2007

[35] A N Khan H H Al-Jahdali C M Allen K L Irion S AlGhanem and S S Koteyar ldquoThe calcified lung nodule whatdoes it meanrdquo Annals of Thoracic Medicine vol 5 no 2 pp67ndash79 2010

[36] H C BeckerW J Nettleton P HMeyers JW Sweeney and CMNice ldquoDigital computer determination of amedical diagnos-tic index directly from chest X-ray imagesrdquo IEEE Transactionson Biomedical Engineering vol BME-11 no 3 pp 67ndash72 1964

[37] P H Meyers C M Nice H C Becker W J Nettleton J WSweeney and G R Meckstroth ldquoAutomated computer analysisof radiographic imagesrdquo Radiology vol 83 no 6 pp 1029ndash10341964

[38] B Van Ginneken B Ter Haar Romeny and M ViergeverldquoComputer-aided diagnosis in chest radiography a surveyrdquoIEEE Transactions on Medical Imaging vol 20 no 12 pp 1228ndash1241 2001

[39] K Suzuki F Li S Sone and K Doi ldquoComputer-aided diagnos-tic scheme for distinction between benign and malignant nod-ules in thoracic low-dose CT by use ofmassive training artificialneural networkrdquo IEEE Transactions on Medical Imaging vol 24no 9 pp 1138ndash1150 2005

[40] K Kanazawa Y Kawata N Niki et al ldquoComputer-aideddiagnosis for pulmonary nodules based on helical CT imagesrdquoComputerized Medical Imaging and Graphics vol 22 no 2 pp157ndash167 1998

[41] F Han H Wang G Zhang et al ldquoTexture feature analysis forcomputer-aided diagnosis on pulmonary nodulesrdquo Journal ofDigital Imaging vol 28 no 1 pp 99ndash115 2014

[42] H Hu Q Wang H Tang L Xiong and Q Lin ldquoMulti-slicecomputed tomography characteristics of solitary pulmonaryground-glass nodules Differences between malignant andbenignrdquoThoracic Cancer vol 7 no 1 pp 80ndash87 2016

[43] G-Y Zheng X-B Liu and G-H Han ldquoSurvey on medicalimage computer aided detection and diagnosis systemsrdquo RuanJian Xue BaoJournal of Software vol 29 no 5 pp 1471ndash15142018

[44] F H Edwards P S Schaefer S Callahan G M Graeber andR A Albus ldquoBayesian statistical theory in the preoperativediagnosis of pulmonary lesionsrdquoCHEST vol 92 no 5 pp 888ndash891 1987

[45] H Krewer B Geiger L O Hall et al ldquoEffect of texture featuresin computer aided diagnosis of pulmonary nodules in low-dosecomputed tomographyrdquo in Proceedings of the IEEE InternationalConference on Systems Man and Cybernetics (SMC rsquo13) pp3887ndash3891 IEEE Manchester UK October 2013

[46] C Jacobs E M van Rikxoort E T Scholten et al ldquoSolidpart-solid or non-solid classification of pulmonary nodulesin low-dose chest computed tomography by a computer-aideddiagnosis systemrdquo Investigative Radiology vol 50 no 3 pp 168ndash173 2015

14 BioMed Research International

[47] X Liu L Ma L Song Y Zhao X Zhao and C Zhou ldquoRec-ognizing common CT imaging signs of lung diseases througha new feature selection method based on fisher criterion andgenetic optimizationrdquo IEEE Journal of Biomedical and HealthInformatics vol 19 no 2 pp 635ndash647 2015

[48] T Sun J Wang X Li et al ldquoComparative evaluation of supportvectormachines for computer aided diagnosis of lung cancer inCT based on a multi-dimensional data setrdquo Computer Methodsand Programs in Biomedicine vol 111 no 2 pp 519ndash524 2013

[49] H Arimura S Katsuragawa K Suzuki et al ldquoComputerizedscheme for automated detection of lung nodules in low-dose computed tomography images for lung cancer screeningrdquoAcademic Radiology vol 11 no 6 pp 617ndash629 2004

[50] T Sun R Zhang J Wang X Li and X Guo ldquoComputer-aideddiagnosis for early-stage lung cancer based on longitudinal andbalanced datardquo PLoS ONE vol 8 no 5 Article ID e63559 2013

[51] F Han C L Novak S Aylward et al ldquoA new 3D texture featurebased computer-aided diagnosis approach to differentiate pul-monary nodulesrdquo Proceedings of the SPIE Article ID 86702Z2013

[52] K-L Hua C-H Hsu S C Hidayati W-H Cheng and Y-J Chen ldquoComputer-aided classification of lung nodules oncomputed tomography images via deep learning techniquerdquoOncoTargets andTherapy 2015

[53] Y LeCun L Bottou Y Bengio and P Haffner ldquoGradient-basedlearning applied to document recognitionrdquo Proceedings of theIEEE vol 86 no 11 pp 2278ndash2323 1998

[54] W Shen M Zhou F Yang C Yang and J Tian ldquoMulti-scaleconvolutional neural networks for lung nodule classificationrdquoin Proceedings of the International Conference on InformationProcessing in Medical Imaging vol 24 pp 588ndash599 Springer2015

[55] N Tajbakhsh and K Suzuki ldquoComparing two classes of end-to-end machine-learning models in lung nodule detection andclassificationMTANNs vs CNNsrdquo Pattern Recognition vol 63pp 476ndash486 2017

[56] H Greenspan B Van Ginneken and R M Summers ldquoGuesteditorial deep learning inmedical imaging overview and futurepromise of an exciting new techniquerdquo IEEE Transactions onMedical Imaging vol 35 no 5 pp 1153ndash1159 2016

[57] I J Goodfellow J Pouget-Abadie M Mirza et al ldquoGenerativeadversarial netsrdquo in Proceedings of the 28th Annual Conferenceon Neural Information Processing Systems 2014 NIPS 2014 pp2672ndash2680 Canada December 2014

[58] M J Chuquicusma S Hussein J Burt and U Bagci ldquoHowto fool radiologists with generative adversarial networks Avisual turing test for lung cancer diagnosisrdquo in Proceedings of the2018 IEEE 15th International Symposium Biomedical Imagingpp 240ndash244 IEEE 2018

[59] X Zhu ldquoSemi-supervised learning literature surveyrdquo ComputerScience University ofWisconsin-Madison vol 2 no 3 p 4 2006

[60] M Li andZ-H Zhou ldquoImprove computer-aided diagnosis withmachine learning techniques usingundiagnosed samplesrdquo IEEETransactions on Systems Man and Cybernetics Systems vol 37no 6 pp 1088ndash1098 2007

[61] A Radford L Metz and S Chintala ldquoUnsupervised represen-tation learning with deep convolutional generative adversarialnetworksrdquo 2015 httpsarxivorgabs07123011

[62] Y Liu Z Xing C Deng P Li and M Guo ldquoAutomaticallydetecting lung nodules based on shape descriptor and semi-supervised learningrdquo in Proceedings of the 2010 International

Conference on Computer Application and System ModelingICCASM 2010 pp V1647ndashV1650 China October 2010

[63] Y Peng and P Flach ldquoSoft discretization to enhance thecontinuous decision tree inductionrdquo Integrating Aspects of DataMining Decision Support and Meta-Learning vol 1 no 34 pp109ndash118 2001

[64] N Settouti M E Habib Daho M E Amine Lazouni and MA Chikh ldquoRandom forest in semi-supervised learning (Co-Forest)rdquo in Proceedings of the 2013 8th International Workshopon Systems Signal Processing and Their Applications WoSSPA2013 pp 326ndash329 Algeria May 2013

[65] S G Armato III G McLennan L Bidaut et al ldquoThe lungimage database consortium (lidc) and image database resourceinitiative (idri) a completed reference database of lung noduleson ct scansrdquoMedical Physics vol 38 no 2 pp 915ndash931 2011

[66] D E Ost and M K Gould ldquoDecision making in patientswith pulmonary nodulesrdquo American Journal of Respiratory andCritical Care Medicine vol 185 no 4 pp 363ndash372 2012

[67] M M Goodsitt H Chan T W Way S C Larson E GChristodoulou and J Kim ldquoAccuracy of the CT numbersof simulated lung nodules imaged with multi-detector CTscannersrdquoMedical Physics vol 33 no 8 pp 3006ndash3017 2006

[68] MMGoodsittHChanTWWayM J Schipper S C Larsonand E G Christodoulou ldquoQuantitative CT of lung nodulesdependence of calibration on patient body size anatomicregion and calibration nodule size for single- and dual-energytechniquesrdquoMedical Physics vol 36 no 7 pp 3107ndash3121 2009

[69] P Opulencia D S Channin D S Raicu and J D FurstldquoRadLex and lung nodule image featuresrdquo Journal of DigitalImaging vol 24 no 2 pp 256ndash270 2011

[70] J-C Wang S Sone L Feng et al ldquoRapidly growing smallperipheral lung cancers detected by screening CT Correlationbetween radiological appearance and pathological featuresrdquoBritish Journal of Radiology vol 73 no 873 pp 930ndash937 2000

[71] S Takashima Y Maruyama M Hasegawa A Saito M Hani-uda and M Kadoya ldquoHigh-resolution CT features prognosticsignificance in peripheral lung adenocarcinoma with bronchi-oloalveolar carcinoma componentsrdquo Respiration vol 70 no 1pp 36ndash42 2003

[72] R M Lindell T E Hartman S J Swensen et al ldquoFive-yearlung cancer screening experience CT appearance growth ratelocation and histologic features of 61 lung cancersrdquo Radiologyvol 242 no 2 pp 555ndash562 2007

[73] T J Kim D H Han K N Jin and K Won Lee ldquoLung cancerdetected at cardiac CT prevalence clinicoradiologic featuresand importance of fullndashfield-of-view imagesrdquo Radiology vol255 no 2 pp 369ndash376 2010

[74] MVazquez D Carter E Brambilla et al ldquoSolitary andmultipleresected adenocarcinomas after CT screening for lung cancerhistopathologic features and their prognostic implicationsrdquoLung Cancer vol 64 no 2 pp 148ndash154 2009

Stem Cells International

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

MEDIATORSINFLAMMATION

of

EndocrinologyInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Disease Markers

Hindawiwwwhindawicom Volume 2018

BioMed Research International

OncologyJournal of

Hindawiwwwhindawicom Volume 2013

Hindawiwwwhindawicom Volume 2018

Oxidative Medicine and Cellular Longevity

Hindawiwwwhindawicom Volume 2018

PPAR Research

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Immunology ResearchHindawiwwwhindawicom Volume 2018

Journal of

ObesityJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational and Mathematical Methods in Medicine

Hindawiwwwhindawicom Volume 2018

Behavioural Neurology

OphthalmologyJournal of

Hindawiwwwhindawicom Volume 2018

Diabetes ResearchJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Research and TreatmentAIDS

Hindawiwwwhindawicom Volume 2018

Gastroenterology Research and Practice

Hindawiwwwhindawicom Volume 2018

Parkinsonrsquos Disease

Evidence-Based Complementary andAlternative Medicine

Volume 2018Hindawiwwwhindawicom

Submit your manuscripts atwwwhindawicom

Page 10: A Novel Computer-Aided Diagnosis Scheme on Small Annotated ...downloads.hindawi.com/journals/bmri/2019/6425963.pdf · BioMedResearchInternational 100 Reshape Deconv Deconv Deconv

10 BioMed Research International

100000 200000 300000 4000000iteration

minus70

minus60

minus50

minus40

minus30

minus20

minus10

0

trai

n di

sc co

st

Figure 8 Training discrepancy cost of DCGAN

ROC Curves

AUCNon-central Calcification A = 0946Negative A = 0939Lobulation A = 0912Spiculation A = 0908Non-solidGGO Texture A = 0887

00

2

4

6

8

10

Sens

itivi

ty

2 4 6 8 10001 - Specificity

Figure 9 ROC curves for the 5 categories of sign ROI recognition results

Consequently positive numbers on the diagonal andlarger values indicate that G2C-CAD achieved a betterperformance than that of C45 For the elements that arenot on the diagonal the situation is the opposite As shownin Figure 11 no positive values occur in the nondiagonalelements which means that G2C-CAD possesses greaterdiscrimination ability between each category The elementvalues on the diagonal are all positive and the average valueof the diagonal numbers in the difference confusion matrixis 0144 demonstrating that our method has a better overallperformance

To verify the effectiveness of the fuzzy algorithm we alsoconducted multiclassification experiments using G2C-CADwithout the fuzzy algorithm and compared the performancesof the two algorithms using a difference confusion matrix asshown in Figure 12

FromFigure 12 we can see that the discrimination perfor-mance of the CAD system employing the fuzzy algorithm isobviously better than that of the nonfuzzy one Furthermoreas shown in Figure 12 the CAD performance of the fuzzyalgorithm better distinguishes the difficult lobulation signfrom the spiculation sign

BioMed Research International 11

Cal 0933 005 0011 0002 0004

Lob 0 0907 006 0026 0007

Spic 0 0077 0891 0032 0

Tex 001 0064 0046 0878 0002

NegCa

l

Lob

Spic

Tex

Neg

0 0034 0028 0012 0926

Figure 10 Confusion matrix of the predicted results

Cal 0119 minus0051 minus00039 minus00623 minus00018

Lob minus00183 01286 minus00994 minus00101 minus00008

Spic minus00067 minus01065 01317 minus0028 minus0024

Tex minus00422 minus00627 minus01119 02175 minus00007

Neg

Cal

Lob

Spic

Tex

Neg

minus00002 minus006 minus00135 minus00495 01232

Figure 11 Performance difference matrix of G2C-CAD and C45

The experimental result on LISS shows higher perfor-mance the areas under the ROC curve of GGO lobulationspiculation pleural indentation and negative image samplesare 0972 0964 0941 0967 and 0953 respectively Bycomparing the training dataset and test samples we foundthat the main reason may be that the samples in differentcategories are more separable visually

7 Conclusion

In this paper by coupling a GAN with a semisupervisedlearning approach we proposed a G2C-CAD method todetect signs that are highly correlated with malignant pul-monary nodules We first trained a DCGAN on a smallsample set Then we extracted the features from the CNNdiscriminator trained with the training samples and usedthem to train a primary fuzzy Co-forest classifier Then weuse the trained DCGAN to generate large amounts of realistic

fake samples Based on these fake samples we conductedsemisupervised learning with the fuzzy Co-forest and finallyobtained a classifierwith excellent performance By validatingon the LIDC dataset the area under the ROC curve for fivesign types noncentral calcification negative image sampleslobulation spiculation and nonsolidGGO texture reached0946 0939 0912 0908 and 0887 respectively On the LISSdataset the proposed system showed comprehensively higherclassification performances than those of a trained C45classifier The experimental results show that the proposedG2C-CAD is an appropriate method for solving the problemof insufficient samples in the medical image analysis fieldMoreover our system can also be used to establish a trainingsample library for CAD classification diagnosis which holdsgreat significance for future medical image analysis

In future work we plan to combine this method withmultiple-instance learning to performweak supervised learn-ing directly on CT slice images or to extend the algorithm

12 BioMed Research International

00083 minus0002 minus00023 minus00032 minus00008

minus00013 00159 minus00074 minus00061 minus00011

00011 minus00044 00122 minus00042 minus00025

minus00002 minus00031 minus00029 00074 minus00012

minus00008 minus00036 minus00013 minus00039 00096

Cal

Lob

Spic

Tex

NegCa

l

Lob

Spic

Tex

Neg

Figure 12 Performance difference matrix of fuzzy G2C-CAD and nonfuzzy G2C-CAD

making it suitable for use in the 3D medical image classifica-tion field

Data Availability

The data used to support the findings of this study areavailable from the corresponding author upon request

Disclosure

The publicly accessible medical images database of LIDCIDRI is used

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work was supported in part by National Natural ScienceFoundation of China (Grant nos 60973059 and 81171407) andProgram for New Century Excellent Talents in University ofChina (Grant no NCET-10-0044)

References

[1] ldquoCancersrdquo httpwwwwhointennews-roomfact-sheetsde-tailcancer

[2] ldquoCancer stat facts common cancer sitesrdquo httpsseercancergovstatfactshtmlcommonhtml

[3] S Diederich D Wormanns M Semik et al ldquoScreening forearly lung cancer with low-dose spiral CT prevalence in 817asymptomatic smokersrdquo Radiology vol 222 no 3 pp 773ndash7812002

[4] J F Williamson ldquoBrachytherapy technology and physics prac-tice since 1950 A half-century of progressrdquo Physics in Medicineand Biology vol 51 no 13 article no R18 pp R303ndashR325 2006

[5] J F Williamson S K Das M S Goodsitt and J O DeasyldquoIntroducing the medical physics dataset articlerdquo MedicalPhysics vol 44 no 2 pp 349-350 2017

[6] S Balik E Weiss N Jan et al ldquoEvaluation of 4-dimensionalcomputed tomography to 4-dimensional cone-beam computedtomography deformable image registration for lung canceradaptive radiation therapyrdquo International Journal of RadiationOncology ∙ Biology ∙ Physics vol 86 no 2 pp 372ndash379 2013

[7] C F Mountain ldquoRevisions in the international system forstaging lung cancerrdquo Chest vol 111 no 6 pp 1710ndash1717 1997

[8] M T van Rens J M van den Bosch A Brutel de la Riviereand H R Elbers ldquoPrognostic assessment of 2361 patients whounderwent pulmonary resection for non-small cell lung cancerstage I II and IIIArdquo CHEST vol 117 no 2 pp 374ndash379 2000

[9] E F Patz S Rossi D H Harpole J E Herndon and P CGoodman ldquoCorrelation of tumor size and survival in patientswith stage IA non-small cell lung cancerrdquo Chest vol 117 no 6pp 1568ndash1571 2000

[10] C I Henschke P Boffetta O Gorlova R Yip J O DeLanceyand M Foy ldquoAssessment of lung-cancer mortality reductionfrom CT screeningrdquo Lung Cancer vol 71 no 3 pp 328ndash3322011

[11] D Kumar A Wong and D A Clausi ldquoLung nodule classifi-cation using deep features in CT imagesrdquo in Proceedings of the12th Conference on Computer and Robot Vision CRV 2015 pp133ndash138 Canada June 2015

[12] M Kaneko K Eguchi and H Ohmatsu ldquoPeripheral lungcancer screening and detection with low-dose spiral CT versusradiographyrdquo Radiology vol 201 no 3 pp 798ndash802 1996

[13] S Swensen J Jett J Sloan et al ldquoScreening for lung cancerwith low-dose spiral computed tomographyrdquo American Journalof Respiratory and Critical Care Medicine vol 165 no 4 pp508ndash513 2002

[14] E Paci D Puliti A Lopes Pegna et al ldquoMortality survivaland incidence rates in the ITALUNG randomised lung cancerscreening trialrdquoThorax vol 72 no 9 pp 825ndash831 2017

[15] S Iwano TNakamura Y Kamioka andT Ishigaki ldquoComputer-aided diagnosis A shape classification of pulmonary nodulesimaged by high-resolution CTrdquo Computerized Medical Imagingand Graphics vol 29 no 7 pp 565ndash570 2005

BioMed Research International 13

[16] A Bernard and H Shennib ldquoResection of pulmonary nodulesusing video-assisted thoracic surgeryrdquo The Annals of ThoracicSurgery vol 61 no 1 pp 202ndash205 1996

[17] M J Mack S R Hazelrigg R J Landreneau and T E AcuffldquoThoracoscopy for the diagnosis of the indeterminate solitarypulmonary nodulerdquoThe Annals of Thoracic Surgery vol 56 no4 pp 825ndash832 1993

[18] S Siegelman E Zerhouni F Leo N Khouri and F StitikldquoCT of the solitary pulmonary nodulerdquo American Journal ofRoentgenology vol 135 no 1 pp 1ndash13 1980

[19] B A Keagy P J K Starek G F Murray J W Battaglini ME Lores and B R Wilcox ldquoMajor pulmonary resection forsuspected but unconfirmedmalignancyrdquoTheAnnals ofThoracicSurgery vol 38 no 4 pp 314ndash316 1984

[20] J Collins ldquoCT signs and patterns of lung diseaserdquo RadiologicClinics of North America vol 39 no 6 pp 1115ndash1135 2001

[21] G Han X Liu F Han et al ldquoThe LISSmdasha public databaseof common imaging signs of lung diseases for computer-aideddetection and diagnosis research and medical educationrdquo IEEETransactions on Biomedical Engineering vol 62 no 2 pp 648ndash656 2015

[22] G Han X LiuN Q Soomro et al ldquoEmpirical driven automaticdetection of lobulation imaging signs in lung CTrdquo BioMedResearch International vol 2017 2017

[23] C-Z Shi Q Zhao L-P Luo and J-X He ldquoSize of solitarypulmonary nodule was the risk factor of malignancyrdquo Journalof Thoracic Disease vol 6 no 6 pp 668ndash676 2014

[24] J J Erasmus J E Connolly H P McAdams and V L RogglildquoSolitary pulmonary nodules part I Morphologic evaluationfor differentiation of benign and malignant lesionsrdquo Radio-Graphics vol 20 no 1 pp 43ndash58 2000

[25] K Furuya S Murayama H Soeda et al ldquoNew classification ofsmall pulmonary nodules by margin characteristics on high-resolution CTrdquo Acta Radiologica vol 40 no 5 pp 496ndash5042016

[26] Z Yang S Sone S Takashima et al ldquoHigh-resolution CTanalysis of small peripheral lung adenocarcinomas revealed onscreening helical CTrdquo American Journal of Roentgenology vol176 no 6 pp 1399ndash1407 2001

[27] M F McNitt-Gray E M Hart N Wyckoff J W Sayre J GGoldin and D R Aberle ldquoA pattern classification approachto characterizing solitary pulmonary nodules imaged on highresolution CT Preliminary resultsrdquoMedical Physics vol 26 no6 pp 880ndash888 1999

[28] V Ambrosini S Nicolini P Caroli et al ldquoPETCT imaging indifferent types of lung cancer an overviewrdquoEuropean Journal ofRadiology vol 81 no 5 pp 988ndash1001 2012

[29] F Li S Sone H Abe H MacMahon and K Doi ldquoMalignantversus benign nodules at CT screening for lung cancer com-parison of thin-section CT findingsrdquo Radiology vol 233 no 3pp 793ndash798 2004

[30] B Ganeshan E Panayiotou K Burnand S Dizdarevic and KMiles ldquoTumour heterogeneity in non-small cell lung carcinomaassessed by CT texture analysis a potential marker of survivalrdquoEuropean Radiology vol 22 no 4 pp 796ndash802 2012

[31] F Li S Sone H Abe H MacMahon S G Armato and KDoi ldquoLung cancers missed at low-dose helical CT screening ina general population comparison of clinical histopathologicand imaging findingsrdquo Radiology vol 225 no 3 pp 673ndash6832002

[32] H Chen Y Xu Y Ma and B Ma ldquoNeural network ensemble-based computer-aided diagnosis for differentiation of lungnodules on CT images clinical evaluationrdquoAcademic Radiologyvol 17 no 5 pp 595ndash602 2010

[33] C V Zwirewich S Vedal R R Miller and N L Muller ldquoSoli-tary pulmonary nodule high-resolution CT and radiologic-pathologic correlationrdquo Radiology vol 179 no 2 pp 469ndash4761991

[34] Y J Jeong C A Yi and K S Lee ldquoSolitary pulmonary nodulesdetection characterization and guidance for further diagnosticworkup and treatmentrdquoAmerican Journal of Roentgenology vol188 no 1 pp 57ndash68 2007

[35] A N Khan H H Al-Jahdali C M Allen K L Irion S AlGhanem and S S Koteyar ldquoThe calcified lung nodule whatdoes it meanrdquo Annals of Thoracic Medicine vol 5 no 2 pp67ndash79 2010

[36] H C BeckerW J Nettleton P HMeyers JW Sweeney and CMNice ldquoDigital computer determination of amedical diagnos-tic index directly from chest X-ray imagesrdquo IEEE Transactionson Biomedical Engineering vol BME-11 no 3 pp 67ndash72 1964

[37] P H Meyers C M Nice H C Becker W J Nettleton J WSweeney and G R Meckstroth ldquoAutomated computer analysisof radiographic imagesrdquo Radiology vol 83 no 6 pp 1029ndash10341964

[38] B Van Ginneken B Ter Haar Romeny and M ViergeverldquoComputer-aided diagnosis in chest radiography a surveyrdquoIEEE Transactions on Medical Imaging vol 20 no 12 pp 1228ndash1241 2001

[39] K Suzuki F Li S Sone and K Doi ldquoComputer-aided diagnos-tic scheme for distinction between benign and malignant nod-ules in thoracic low-dose CT by use ofmassive training artificialneural networkrdquo IEEE Transactions on Medical Imaging vol 24no 9 pp 1138ndash1150 2005

[40] K Kanazawa Y Kawata N Niki et al ldquoComputer-aideddiagnosis for pulmonary nodules based on helical CT imagesrdquoComputerized Medical Imaging and Graphics vol 22 no 2 pp157ndash167 1998

[41] F Han H Wang G Zhang et al ldquoTexture feature analysis forcomputer-aided diagnosis on pulmonary nodulesrdquo Journal ofDigital Imaging vol 28 no 1 pp 99ndash115 2014

[42] H Hu Q Wang H Tang L Xiong and Q Lin ldquoMulti-slicecomputed tomography characteristics of solitary pulmonaryground-glass nodules Differences between malignant andbenignrdquoThoracic Cancer vol 7 no 1 pp 80ndash87 2016

[43] G-Y Zheng X-B Liu and G-H Han ldquoSurvey on medicalimage computer aided detection and diagnosis systemsrdquo RuanJian Xue BaoJournal of Software vol 29 no 5 pp 1471ndash15142018

[44] F H Edwards P S Schaefer S Callahan G M Graeber andR A Albus ldquoBayesian statistical theory in the preoperativediagnosis of pulmonary lesionsrdquoCHEST vol 92 no 5 pp 888ndash891 1987

[45] H Krewer B Geiger L O Hall et al ldquoEffect of texture featuresin computer aided diagnosis of pulmonary nodules in low-dosecomputed tomographyrdquo in Proceedings of the IEEE InternationalConference on Systems Man and Cybernetics (SMC rsquo13) pp3887ndash3891 IEEE Manchester UK October 2013

[46] C Jacobs E M van Rikxoort E T Scholten et al ldquoSolidpart-solid or non-solid classification of pulmonary nodulesin low-dose chest computed tomography by a computer-aideddiagnosis systemrdquo Investigative Radiology vol 50 no 3 pp 168ndash173 2015

14 BioMed Research International

[47] X Liu L Ma L Song Y Zhao X Zhao and C Zhou ldquoRec-ognizing common CT imaging signs of lung diseases througha new feature selection method based on fisher criterion andgenetic optimizationrdquo IEEE Journal of Biomedical and HealthInformatics vol 19 no 2 pp 635ndash647 2015

[48] T Sun J Wang X Li et al ldquoComparative evaluation of supportvectormachines for computer aided diagnosis of lung cancer inCT based on a multi-dimensional data setrdquo Computer Methodsand Programs in Biomedicine vol 111 no 2 pp 519ndash524 2013

[49] H Arimura S Katsuragawa K Suzuki et al ldquoComputerizedscheme for automated detection of lung nodules in low-dose computed tomography images for lung cancer screeningrdquoAcademic Radiology vol 11 no 6 pp 617ndash629 2004

[50] T Sun R Zhang J Wang X Li and X Guo ldquoComputer-aideddiagnosis for early-stage lung cancer based on longitudinal andbalanced datardquo PLoS ONE vol 8 no 5 Article ID e63559 2013

[51] F Han C L Novak S Aylward et al ldquoA new 3D texture featurebased computer-aided diagnosis approach to differentiate pul-monary nodulesrdquo Proceedings of the SPIE Article ID 86702Z2013

[52] K-L Hua C-H Hsu S C Hidayati W-H Cheng and Y-J Chen ldquoComputer-aided classification of lung nodules oncomputed tomography images via deep learning techniquerdquoOncoTargets andTherapy 2015

[53] Y LeCun L Bottou Y Bengio and P Haffner ldquoGradient-basedlearning applied to document recognitionrdquo Proceedings of theIEEE vol 86 no 11 pp 2278ndash2323 1998

[54] W Shen M Zhou F Yang C Yang and J Tian ldquoMulti-scaleconvolutional neural networks for lung nodule classificationrdquoin Proceedings of the International Conference on InformationProcessing in Medical Imaging vol 24 pp 588ndash599 Springer2015

[55] N Tajbakhsh and K Suzuki ldquoComparing two classes of end-to-end machine-learning models in lung nodule detection andclassificationMTANNs vs CNNsrdquo Pattern Recognition vol 63pp 476ndash486 2017

[56] H Greenspan B Van Ginneken and R M Summers ldquoGuesteditorial deep learning inmedical imaging overview and futurepromise of an exciting new techniquerdquo IEEE Transactions onMedical Imaging vol 35 no 5 pp 1153ndash1159 2016

[57] I J Goodfellow J Pouget-Abadie M Mirza et al ldquoGenerativeadversarial netsrdquo in Proceedings of the 28th Annual Conferenceon Neural Information Processing Systems 2014 NIPS 2014 pp2672ndash2680 Canada December 2014

[58] M J Chuquicusma S Hussein J Burt and U Bagci ldquoHowto fool radiologists with generative adversarial networks Avisual turing test for lung cancer diagnosisrdquo in Proceedings of the2018 IEEE 15th International Symposium Biomedical Imagingpp 240ndash244 IEEE 2018

[59] X Zhu ldquoSemi-supervised learning literature surveyrdquo ComputerScience University ofWisconsin-Madison vol 2 no 3 p 4 2006

[60] M Li andZ-H Zhou ldquoImprove computer-aided diagnosis withmachine learning techniques usingundiagnosed samplesrdquo IEEETransactions on Systems Man and Cybernetics Systems vol 37no 6 pp 1088ndash1098 2007

[61] A Radford L Metz and S Chintala ldquoUnsupervised represen-tation learning with deep convolutional generative adversarialnetworksrdquo 2015 httpsarxivorgabs07123011

[62] Y Liu Z Xing C Deng P Li and M Guo ldquoAutomaticallydetecting lung nodules based on shape descriptor and semi-supervised learningrdquo in Proceedings of the 2010 International

Conference on Computer Application and System ModelingICCASM 2010 pp V1647ndashV1650 China October 2010

[63] Y Peng and P Flach ldquoSoft discretization to enhance thecontinuous decision tree inductionrdquo Integrating Aspects of DataMining Decision Support and Meta-Learning vol 1 no 34 pp109ndash118 2001

[64] N Settouti M E Habib Daho M E Amine Lazouni and MA Chikh ldquoRandom forest in semi-supervised learning (Co-Forest)rdquo in Proceedings of the 2013 8th International Workshopon Systems Signal Processing and Their Applications WoSSPA2013 pp 326ndash329 Algeria May 2013

[65] S G Armato III G McLennan L Bidaut et al ldquoThe lungimage database consortium (lidc) and image database resourceinitiative (idri) a completed reference database of lung noduleson ct scansrdquoMedical Physics vol 38 no 2 pp 915ndash931 2011

[66] D E Ost and M K Gould ldquoDecision making in patientswith pulmonary nodulesrdquo American Journal of Respiratory andCritical Care Medicine vol 185 no 4 pp 363ndash372 2012

[67] M M Goodsitt H Chan T W Way S C Larson E GChristodoulou and J Kim ldquoAccuracy of the CT numbersof simulated lung nodules imaged with multi-detector CTscannersrdquoMedical Physics vol 33 no 8 pp 3006ndash3017 2006

[68] MMGoodsittHChanTWWayM J Schipper S C Larsonand E G Christodoulou ldquoQuantitative CT of lung nodulesdependence of calibration on patient body size anatomicregion and calibration nodule size for single- and dual-energytechniquesrdquoMedical Physics vol 36 no 7 pp 3107ndash3121 2009

[69] P Opulencia D S Channin D S Raicu and J D FurstldquoRadLex and lung nodule image featuresrdquo Journal of DigitalImaging vol 24 no 2 pp 256ndash270 2011

[70] J-C Wang S Sone L Feng et al ldquoRapidly growing smallperipheral lung cancers detected by screening CT Correlationbetween radiological appearance and pathological featuresrdquoBritish Journal of Radiology vol 73 no 873 pp 930ndash937 2000

[71] S Takashima Y Maruyama M Hasegawa A Saito M Hani-uda and M Kadoya ldquoHigh-resolution CT features prognosticsignificance in peripheral lung adenocarcinoma with bronchi-oloalveolar carcinoma componentsrdquo Respiration vol 70 no 1pp 36ndash42 2003

[72] R M Lindell T E Hartman S J Swensen et al ldquoFive-yearlung cancer screening experience CT appearance growth ratelocation and histologic features of 61 lung cancersrdquo Radiologyvol 242 no 2 pp 555ndash562 2007

[73] T J Kim D H Han K N Jin and K Won Lee ldquoLung cancerdetected at cardiac CT prevalence clinicoradiologic featuresand importance of fullndashfield-of-view imagesrdquo Radiology vol255 no 2 pp 369ndash376 2010

[74] MVazquez D Carter E Brambilla et al ldquoSolitary andmultipleresected adenocarcinomas after CT screening for lung cancerhistopathologic features and their prognostic implicationsrdquoLung Cancer vol 64 no 2 pp 148ndash154 2009

Stem Cells International

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

MEDIATORSINFLAMMATION

of

EndocrinologyInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Disease Markers

Hindawiwwwhindawicom Volume 2018

BioMed Research International

OncologyJournal of

Hindawiwwwhindawicom Volume 2013

Hindawiwwwhindawicom Volume 2018

Oxidative Medicine and Cellular Longevity

Hindawiwwwhindawicom Volume 2018

PPAR Research

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Immunology ResearchHindawiwwwhindawicom Volume 2018

Journal of

ObesityJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational and Mathematical Methods in Medicine

Hindawiwwwhindawicom Volume 2018

Behavioural Neurology

OphthalmologyJournal of

Hindawiwwwhindawicom Volume 2018

Diabetes ResearchJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Research and TreatmentAIDS

Hindawiwwwhindawicom Volume 2018

Gastroenterology Research and Practice

Hindawiwwwhindawicom Volume 2018

Parkinsonrsquos Disease

Evidence-Based Complementary andAlternative Medicine

Volume 2018Hindawiwwwhindawicom

Submit your manuscripts atwwwhindawicom

Page 11: A Novel Computer-Aided Diagnosis Scheme on Small Annotated ...downloads.hindawi.com/journals/bmri/2019/6425963.pdf · BioMedResearchInternational 100 Reshape Deconv Deconv Deconv

BioMed Research International 11

Cal 0933 005 0011 0002 0004

Lob 0 0907 006 0026 0007

Spic 0 0077 0891 0032 0

Tex 001 0064 0046 0878 0002

NegCa

l

Lob

Spic

Tex

Neg

0 0034 0028 0012 0926

Figure 10 Confusion matrix of the predicted results

Cal 0119 minus0051 minus00039 minus00623 minus00018

Lob minus00183 01286 minus00994 minus00101 minus00008

Spic minus00067 minus01065 01317 minus0028 minus0024

Tex minus00422 minus00627 minus01119 02175 minus00007

Neg

Cal

Lob

Spic

Tex

Neg

minus00002 minus006 minus00135 minus00495 01232

Figure 11 Performance difference matrix of G2C-CAD and C45

The experimental result on LISS shows higher perfor-mance the areas under the ROC curve of GGO lobulationspiculation pleural indentation and negative image samplesare 0972 0964 0941 0967 and 0953 respectively Bycomparing the training dataset and test samples we foundthat the main reason may be that the samples in differentcategories are more separable visually

7 Conclusion

In this paper by coupling a GAN with a semisupervisedlearning approach we proposed a G2C-CAD method todetect signs that are highly correlated with malignant pul-monary nodules We first trained a DCGAN on a smallsample set Then we extracted the features from the CNNdiscriminator trained with the training samples and usedthem to train a primary fuzzy Co-forest classifier Then weuse the trained DCGAN to generate large amounts of realistic

fake samples Based on these fake samples we conductedsemisupervised learning with the fuzzy Co-forest and finallyobtained a classifierwith excellent performance By validatingon the LIDC dataset the area under the ROC curve for fivesign types noncentral calcification negative image sampleslobulation spiculation and nonsolidGGO texture reached0946 0939 0912 0908 and 0887 respectively On the LISSdataset the proposed system showed comprehensively higherclassification performances than those of a trained C45classifier The experimental results show that the proposedG2C-CAD is an appropriate method for solving the problemof insufficient samples in the medical image analysis fieldMoreover our system can also be used to establish a trainingsample library for CAD classification diagnosis which holdsgreat significance for future medical image analysis

In future work we plan to combine this method withmultiple-instance learning to performweak supervised learn-ing directly on CT slice images or to extend the algorithm

12 BioMed Research International

00083 minus0002 minus00023 minus00032 minus00008

minus00013 00159 minus00074 minus00061 minus00011

00011 minus00044 00122 minus00042 minus00025

minus00002 minus00031 minus00029 00074 minus00012

minus00008 minus00036 minus00013 minus00039 00096

Cal

Lob

Spic

Tex

NegCa

l

Lob

Spic

Tex

Neg

Figure 12 Performance difference matrix of fuzzy G2C-CAD and nonfuzzy G2C-CAD

making it suitable for use in the 3D medical image classifica-tion field

Data Availability

The data used to support the findings of this study areavailable from the corresponding author upon request

Disclosure

The publicly accessible medical images database of LIDCIDRI is used

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work was supported in part by National Natural ScienceFoundation of China (Grant nos 60973059 and 81171407) andProgram for New Century Excellent Talents in University ofChina (Grant no NCET-10-0044)

References

[1] ldquoCancersrdquo httpwwwwhointennews-roomfact-sheetsde-tailcancer

[2] ldquoCancer stat facts common cancer sitesrdquo httpsseercancergovstatfactshtmlcommonhtml

[3] S Diederich D Wormanns M Semik et al ldquoScreening forearly lung cancer with low-dose spiral CT prevalence in 817asymptomatic smokersrdquo Radiology vol 222 no 3 pp 773ndash7812002

[4] J F Williamson ldquoBrachytherapy technology and physics prac-tice since 1950 A half-century of progressrdquo Physics in Medicineand Biology vol 51 no 13 article no R18 pp R303ndashR325 2006

[5] J F Williamson S K Das M S Goodsitt and J O DeasyldquoIntroducing the medical physics dataset articlerdquo MedicalPhysics vol 44 no 2 pp 349-350 2017

[6] S Balik E Weiss N Jan et al ldquoEvaluation of 4-dimensionalcomputed tomography to 4-dimensional cone-beam computedtomography deformable image registration for lung canceradaptive radiation therapyrdquo International Journal of RadiationOncology ∙ Biology ∙ Physics vol 86 no 2 pp 372ndash379 2013

[7] C F Mountain ldquoRevisions in the international system forstaging lung cancerrdquo Chest vol 111 no 6 pp 1710ndash1717 1997

[8] M T van Rens J M van den Bosch A Brutel de la Riviereand H R Elbers ldquoPrognostic assessment of 2361 patients whounderwent pulmonary resection for non-small cell lung cancerstage I II and IIIArdquo CHEST vol 117 no 2 pp 374ndash379 2000

[9] E F Patz S Rossi D H Harpole J E Herndon and P CGoodman ldquoCorrelation of tumor size and survival in patientswith stage IA non-small cell lung cancerrdquo Chest vol 117 no 6pp 1568ndash1571 2000

[10] C I Henschke P Boffetta O Gorlova R Yip J O DeLanceyand M Foy ldquoAssessment of lung-cancer mortality reductionfrom CT screeningrdquo Lung Cancer vol 71 no 3 pp 328ndash3322011

[11] D Kumar A Wong and D A Clausi ldquoLung nodule classifi-cation using deep features in CT imagesrdquo in Proceedings of the12th Conference on Computer and Robot Vision CRV 2015 pp133ndash138 Canada June 2015

[12] M Kaneko K Eguchi and H Ohmatsu ldquoPeripheral lungcancer screening and detection with low-dose spiral CT versusradiographyrdquo Radiology vol 201 no 3 pp 798ndash802 1996

[13] S Swensen J Jett J Sloan et al ldquoScreening for lung cancerwith low-dose spiral computed tomographyrdquo American Journalof Respiratory and Critical Care Medicine vol 165 no 4 pp508ndash513 2002

[14] E Paci D Puliti A Lopes Pegna et al ldquoMortality survivaland incidence rates in the ITALUNG randomised lung cancerscreening trialrdquoThorax vol 72 no 9 pp 825ndash831 2017

[15] S Iwano TNakamura Y Kamioka andT Ishigaki ldquoComputer-aided diagnosis A shape classification of pulmonary nodulesimaged by high-resolution CTrdquo Computerized Medical Imagingand Graphics vol 29 no 7 pp 565ndash570 2005

BioMed Research International 13

[16] A Bernard and H Shennib ldquoResection of pulmonary nodulesusing video-assisted thoracic surgeryrdquo The Annals of ThoracicSurgery vol 61 no 1 pp 202ndash205 1996

[17] M J Mack S R Hazelrigg R J Landreneau and T E AcuffldquoThoracoscopy for the diagnosis of the indeterminate solitarypulmonary nodulerdquoThe Annals of Thoracic Surgery vol 56 no4 pp 825ndash832 1993

[18] S Siegelman E Zerhouni F Leo N Khouri and F StitikldquoCT of the solitary pulmonary nodulerdquo American Journal ofRoentgenology vol 135 no 1 pp 1ndash13 1980

[19] B A Keagy P J K Starek G F Murray J W Battaglini ME Lores and B R Wilcox ldquoMajor pulmonary resection forsuspected but unconfirmedmalignancyrdquoTheAnnals ofThoracicSurgery vol 38 no 4 pp 314ndash316 1984

[20] J Collins ldquoCT signs and patterns of lung diseaserdquo RadiologicClinics of North America vol 39 no 6 pp 1115ndash1135 2001

[21] G Han X Liu F Han et al ldquoThe LISSmdasha public databaseof common imaging signs of lung diseases for computer-aideddetection and diagnosis research and medical educationrdquo IEEETransactions on Biomedical Engineering vol 62 no 2 pp 648ndash656 2015

[22] G Han X LiuN Q Soomro et al ldquoEmpirical driven automaticdetection of lobulation imaging signs in lung CTrdquo BioMedResearch International vol 2017 2017

[23] C-Z Shi Q Zhao L-P Luo and J-X He ldquoSize of solitarypulmonary nodule was the risk factor of malignancyrdquo Journalof Thoracic Disease vol 6 no 6 pp 668ndash676 2014

[24] J J Erasmus J E Connolly H P McAdams and V L RogglildquoSolitary pulmonary nodules part I Morphologic evaluationfor differentiation of benign and malignant lesionsrdquo Radio-Graphics vol 20 no 1 pp 43ndash58 2000

[25] K Furuya S Murayama H Soeda et al ldquoNew classification ofsmall pulmonary nodules by margin characteristics on high-resolution CTrdquo Acta Radiologica vol 40 no 5 pp 496ndash5042016

[26] Z Yang S Sone S Takashima et al ldquoHigh-resolution CTanalysis of small peripheral lung adenocarcinomas revealed onscreening helical CTrdquo American Journal of Roentgenology vol176 no 6 pp 1399ndash1407 2001

[27] M F McNitt-Gray E M Hart N Wyckoff J W Sayre J GGoldin and D R Aberle ldquoA pattern classification approachto characterizing solitary pulmonary nodules imaged on highresolution CT Preliminary resultsrdquoMedical Physics vol 26 no6 pp 880ndash888 1999

[28] V Ambrosini S Nicolini P Caroli et al ldquoPETCT imaging indifferent types of lung cancer an overviewrdquoEuropean Journal ofRadiology vol 81 no 5 pp 988ndash1001 2012

[29] F Li S Sone H Abe H MacMahon and K Doi ldquoMalignantversus benign nodules at CT screening for lung cancer com-parison of thin-section CT findingsrdquo Radiology vol 233 no 3pp 793ndash798 2004

[30] B Ganeshan E Panayiotou K Burnand S Dizdarevic and KMiles ldquoTumour heterogeneity in non-small cell lung carcinomaassessed by CT texture analysis a potential marker of survivalrdquoEuropean Radiology vol 22 no 4 pp 796ndash802 2012

[31] F Li S Sone H Abe H MacMahon S G Armato and KDoi ldquoLung cancers missed at low-dose helical CT screening ina general population comparison of clinical histopathologicand imaging findingsrdquo Radiology vol 225 no 3 pp 673ndash6832002

[32] H Chen Y Xu Y Ma and B Ma ldquoNeural network ensemble-based computer-aided diagnosis for differentiation of lungnodules on CT images clinical evaluationrdquoAcademic Radiologyvol 17 no 5 pp 595ndash602 2010

[33] C V Zwirewich S Vedal R R Miller and N L Muller ldquoSoli-tary pulmonary nodule high-resolution CT and radiologic-pathologic correlationrdquo Radiology vol 179 no 2 pp 469ndash4761991

[34] Y J Jeong C A Yi and K S Lee ldquoSolitary pulmonary nodulesdetection characterization and guidance for further diagnosticworkup and treatmentrdquoAmerican Journal of Roentgenology vol188 no 1 pp 57ndash68 2007

[35] A N Khan H H Al-Jahdali C M Allen K L Irion S AlGhanem and S S Koteyar ldquoThe calcified lung nodule whatdoes it meanrdquo Annals of Thoracic Medicine vol 5 no 2 pp67ndash79 2010

[36] H C BeckerW J Nettleton P HMeyers JW Sweeney and CMNice ldquoDigital computer determination of amedical diagnos-tic index directly from chest X-ray imagesrdquo IEEE Transactionson Biomedical Engineering vol BME-11 no 3 pp 67ndash72 1964

[37] P H Meyers C M Nice H C Becker W J Nettleton J WSweeney and G R Meckstroth ldquoAutomated computer analysisof radiographic imagesrdquo Radiology vol 83 no 6 pp 1029ndash10341964

[38] B Van Ginneken B Ter Haar Romeny and M ViergeverldquoComputer-aided diagnosis in chest radiography a surveyrdquoIEEE Transactions on Medical Imaging vol 20 no 12 pp 1228ndash1241 2001

[39] K Suzuki F Li S Sone and K Doi ldquoComputer-aided diagnos-tic scheme for distinction between benign and malignant nod-ules in thoracic low-dose CT by use ofmassive training artificialneural networkrdquo IEEE Transactions on Medical Imaging vol 24no 9 pp 1138ndash1150 2005

[40] K Kanazawa Y Kawata N Niki et al ldquoComputer-aideddiagnosis for pulmonary nodules based on helical CT imagesrdquoComputerized Medical Imaging and Graphics vol 22 no 2 pp157ndash167 1998

[41] F Han H Wang G Zhang et al ldquoTexture feature analysis forcomputer-aided diagnosis on pulmonary nodulesrdquo Journal ofDigital Imaging vol 28 no 1 pp 99ndash115 2014

[42] H Hu Q Wang H Tang L Xiong and Q Lin ldquoMulti-slicecomputed tomography characteristics of solitary pulmonaryground-glass nodules Differences between malignant andbenignrdquoThoracic Cancer vol 7 no 1 pp 80ndash87 2016

[43] G-Y Zheng X-B Liu and G-H Han ldquoSurvey on medicalimage computer aided detection and diagnosis systemsrdquo RuanJian Xue BaoJournal of Software vol 29 no 5 pp 1471ndash15142018

[44] F H Edwards P S Schaefer S Callahan G M Graeber andR A Albus ldquoBayesian statistical theory in the preoperativediagnosis of pulmonary lesionsrdquoCHEST vol 92 no 5 pp 888ndash891 1987

[45] H Krewer B Geiger L O Hall et al ldquoEffect of texture featuresin computer aided diagnosis of pulmonary nodules in low-dosecomputed tomographyrdquo in Proceedings of the IEEE InternationalConference on Systems Man and Cybernetics (SMC rsquo13) pp3887ndash3891 IEEE Manchester UK October 2013

[46] C Jacobs E M van Rikxoort E T Scholten et al ldquoSolidpart-solid or non-solid classification of pulmonary nodulesin low-dose chest computed tomography by a computer-aideddiagnosis systemrdquo Investigative Radiology vol 50 no 3 pp 168ndash173 2015

14 BioMed Research International

[47] X Liu L Ma L Song Y Zhao X Zhao and C Zhou ldquoRec-ognizing common CT imaging signs of lung diseases througha new feature selection method based on fisher criterion andgenetic optimizationrdquo IEEE Journal of Biomedical and HealthInformatics vol 19 no 2 pp 635ndash647 2015

[48] T Sun J Wang X Li et al ldquoComparative evaluation of supportvectormachines for computer aided diagnosis of lung cancer inCT based on a multi-dimensional data setrdquo Computer Methodsand Programs in Biomedicine vol 111 no 2 pp 519ndash524 2013

[49] H Arimura S Katsuragawa K Suzuki et al ldquoComputerizedscheme for automated detection of lung nodules in low-dose computed tomography images for lung cancer screeningrdquoAcademic Radiology vol 11 no 6 pp 617ndash629 2004

[50] T Sun R Zhang J Wang X Li and X Guo ldquoComputer-aideddiagnosis for early-stage lung cancer based on longitudinal andbalanced datardquo PLoS ONE vol 8 no 5 Article ID e63559 2013

[51] F Han C L Novak S Aylward et al ldquoA new 3D texture featurebased computer-aided diagnosis approach to differentiate pul-monary nodulesrdquo Proceedings of the SPIE Article ID 86702Z2013

[52] K-L Hua C-H Hsu S C Hidayati W-H Cheng and Y-J Chen ldquoComputer-aided classification of lung nodules oncomputed tomography images via deep learning techniquerdquoOncoTargets andTherapy 2015

[53] Y LeCun L Bottou Y Bengio and P Haffner ldquoGradient-basedlearning applied to document recognitionrdquo Proceedings of theIEEE vol 86 no 11 pp 2278ndash2323 1998

[54] W Shen M Zhou F Yang C Yang and J Tian ldquoMulti-scaleconvolutional neural networks for lung nodule classificationrdquoin Proceedings of the International Conference on InformationProcessing in Medical Imaging vol 24 pp 588ndash599 Springer2015

[55] N Tajbakhsh and K Suzuki ldquoComparing two classes of end-to-end machine-learning models in lung nodule detection andclassificationMTANNs vs CNNsrdquo Pattern Recognition vol 63pp 476ndash486 2017

[56] H Greenspan B Van Ginneken and R M Summers ldquoGuesteditorial deep learning inmedical imaging overview and futurepromise of an exciting new techniquerdquo IEEE Transactions onMedical Imaging vol 35 no 5 pp 1153ndash1159 2016

[57] I J Goodfellow J Pouget-Abadie M Mirza et al ldquoGenerativeadversarial netsrdquo in Proceedings of the 28th Annual Conferenceon Neural Information Processing Systems 2014 NIPS 2014 pp2672ndash2680 Canada December 2014

[58] M J Chuquicusma S Hussein J Burt and U Bagci ldquoHowto fool radiologists with generative adversarial networks Avisual turing test for lung cancer diagnosisrdquo in Proceedings of the2018 IEEE 15th International Symposium Biomedical Imagingpp 240ndash244 IEEE 2018

[59] X Zhu ldquoSemi-supervised learning literature surveyrdquo ComputerScience University ofWisconsin-Madison vol 2 no 3 p 4 2006

[60] M Li andZ-H Zhou ldquoImprove computer-aided diagnosis withmachine learning techniques usingundiagnosed samplesrdquo IEEETransactions on Systems Man and Cybernetics Systems vol 37no 6 pp 1088ndash1098 2007

[61] A Radford L Metz and S Chintala ldquoUnsupervised represen-tation learning with deep convolutional generative adversarialnetworksrdquo 2015 httpsarxivorgabs07123011

[62] Y Liu Z Xing C Deng P Li and M Guo ldquoAutomaticallydetecting lung nodules based on shape descriptor and semi-supervised learningrdquo in Proceedings of the 2010 International

Conference on Computer Application and System ModelingICCASM 2010 pp V1647ndashV1650 China October 2010

[63] Y Peng and P Flach ldquoSoft discretization to enhance thecontinuous decision tree inductionrdquo Integrating Aspects of DataMining Decision Support and Meta-Learning vol 1 no 34 pp109ndash118 2001

[64] N Settouti M E Habib Daho M E Amine Lazouni and MA Chikh ldquoRandom forest in semi-supervised learning (Co-Forest)rdquo in Proceedings of the 2013 8th International Workshopon Systems Signal Processing and Their Applications WoSSPA2013 pp 326ndash329 Algeria May 2013

[65] S G Armato III G McLennan L Bidaut et al ldquoThe lungimage database consortium (lidc) and image database resourceinitiative (idri) a completed reference database of lung noduleson ct scansrdquoMedical Physics vol 38 no 2 pp 915ndash931 2011

[66] D E Ost and M K Gould ldquoDecision making in patientswith pulmonary nodulesrdquo American Journal of Respiratory andCritical Care Medicine vol 185 no 4 pp 363ndash372 2012

[67] M M Goodsitt H Chan T W Way S C Larson E GChristodoulou and J Kim ldquoAccuracy of the CT numbersof simulated lung nodules imaged with multi-detector CTscannersrdquoMedical Physics vol 33 no 8 pp 3006ndash3017 2006

[68] MMGoodsittHChanTWWayM J Schipper S C Larsonand E G Christodoulou ldquoQuantitative CT of lung nodulesdependence of calibration on patient body size anatomicregion and calibration nodule size for single- and dual-energytechniquesrdquoMedical Physics vol 36 no 7 pp 3107ndash3121 2009

[69] P Opulencia D S Channin D S Raicu and J D FurstldquoRadLex and lung nodule image featuresrdquo Journal of DigitalImaging vol 24 no 2 pp 256ndash270 2011

[70] J-C Wang S Sone L Feng et al ldquoRapidly growing smallperipheral lung cancers detected by screening CT Correlationbetween radiological appearance and pathological featuresrdquoBritish Journal of Radiology vol 73 no 873 pp 930ndash937 2000

[71] S Takashima Y Maruyama M Hasegawa A Saito M Hani-uda and M Kadoya ldquoHigh-resolution CT features prognosticsignificance in peripheral lung adenocarcinoma with bronchi-oloalveolar carcinoma componentsrdquo Respiration vol 70 no 1pp 36ndash42 2003

[72] R M Lindell T E Hartman S J Swensen et al ldquoFive-yearlung cancer screening experience CT appearance growth ratelocation and histologic features of 61 lung cancersrdquo Radiologyvol 242 no 2 pp 555ndash562 2007

[73] T J Kim D H Han K N Jin and K Won Lee ldquoLung cancerdetected at cardiac CT prevalence clinicoradiologic featuresand importance of fullndashfield-of-view imagesrdquo Radiology vol255 no 2 pp 369ndash376 2010

[74] MVazquez D Carter E Brambilla et al ldquoSolitary andmultipleresected adenocarcinomas after CT screening for lung cancerhistopathologic features and their prognostic implicationsrdquoLung Cancer vol 64 no 2 pp 148ndash154 2009

Stem Cells International

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

MEDIATORSINFLAMMATION

of

EndocrinologyInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Disease Markers

Hindawiwwwhindawicom Volume 2018

BioMed Research International

OncologyJournal of

Hindawiwwwhindawicom Volume 2013

Hindawiwwwhindawicom Volume 2018

Oxidative Medicine and Cellular Longevity

Hindawiwwwhindawicom Volume 2018

PPAR Research

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Immunology ResearchHindawiwwwhindawicom Volume 2018

Journal of

ObesityJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational and Mathematical Methods in Medicine

Hindawiwwwhindawicom Volume 2018

Behavioural Neurology

OphthalmologyJournal of

Hindawiwwwhindawicom Volume 2018

Diabetes ResearchJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Research and TreatmentAIDS

Hindawiwwwhindawicom Volume 2018

Gastroenterology Research and Practice

Hindawiwwwhindawicom Volume 2018

Parkinsonrsquos Disease

Evidence-Based Complementary andAlternative Medicine

Volume 2018Hindawiwwwhindawicom

Submit your manuscripts atwwwhindawicom

Page 12: A Novel Computer-Aided Diagnosis Scheme on Small Annotated ...downloads.hindawi.com/journals/bmri/2019/6425963.pdf · BioMedResearchInternational 100 Reshape Deconv Deconv Deconv

12 BioMed Research International

00083 minus0002 minus00023 minus00032 minus00008

minus00013 00159 minus00074 minus00061 minus00011

00011 minus00044 00122 minus00042 minus00025

minus00002 minus00031 minus00029 00074 minus00012

minus00008 minus00036 minus00013 minus00039 00096

Cal

Lob

Spic

Tex

NegCa

l

Lob

Spic

Tex

Neg

Figure 12 Performance difference matrix of fuzzy G2C-CAD and nonfuzzy G2C-CAD

making it suitable for use in the 3D medical image classifica-tion field

Data Availability

The data used to support the findings of this study areavailable from the corresponding author upon request

Disclosure

The publicly accessible medical images database of LIDCIDRI is used

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work was supported in part by National Natural ScienceFoundation of China (Grant nos 60973059 and 81171407) andProgram for New Century Excellent Talents in University ofChina (Grant no NCET-10-0044)

References

[1] ldquoCancersrdquo httpwwwwhointennews-roomfact-sheetsde-tailcancer

[2] ldquoCancer stat facts common cancer sitesrdquo httpsseercancergovstatfactshtmlcommonhtml

[3] S Diederich D Wormanns M Semik et al ldquoScreening forearly lung cancer with low-dose spiral CT prevalence in 817asymptomatic smokersrdquo Radiology vol 222 no 3 pp 773ndash7812002

[4] J F Williamson ldquoBrachytherapy technology and physics prac-tice since 1950 A half-century of progressrdquo Physics in Medicineand Biology vol 51 no 13 article no R18 pp R303ndashR325 2006

[5] J F Williamson S K Das M S Goodsitt and J O DeasyldquoIntroducing the medical physics dataset articlerdquo MedicalPhysics vol 44 no 2 pp 349-350 2017

[6] S Balik E Weiss N Jan et al ldquoEvaluation of 4-dimensionalcomputed tomography to 4-dimensional cone-beam computedtomography deformable image registration for lung canceradaptive radiation therapyrdquo International Journal of RadiationOncology ∙ Biology ∙ Physics vol 86 no 2 pp 372ndash379 2013

[7] C F Mountain ldquoRevisions in the international system forstaging lung cancerrdquo Chest vol 111 no 6 pp 1710ndash1717 1997

[8] M T van Rens J M van den Bosch A Brutel de la Riviereand H R Elbers ldquoPrognostic assessment of 2361 patients whounderwent pulmonary resection for non-small cell lung cancerstage I II and IIIArdquo CHEST vol 117 no 2 pp 374ndash379 2000

[9] E F Patz S Rossi D H Harpole J E Herndon and P CGoodman ldquoCorrelation of tumor size and survival in patientswith stage IA non-small cell lung cancerrdquo Chest vol 117 no 6pp 1568ndash1571 2000

[10] C I Henschke P Boffetta O Gorlova R Yip J O DeLanceyand M Foy ldquoAssessment of lung-cancer mortality reductionfrom CT screeningrdquo Lung Cancer vol 71 no 3 pp 328ndash3322011

[11] D Kumar A Wong and D A Clausi ldquoLung nodule classifi-cation using deep features in CT imagesrdquo in Proceedings of the12th Conference on Computer and Robot Vision CRV 2015 pp133ndash138 Canada June 2015

[12] M Kaneko K Eguchi and H Ohmatsu ldquoPeripheral lungcancer screening and detection with low-dose spiral CT versusradiographyrdquo Radiology vol 201 no 3 pp 798ndash802 1996

[13] S Swensen J Jett J Sloan et al ldquoScreening for lung cancerwith low-dose spiral computed tomographyrdquo American Journalof Respiratory and Critical Care Medicine vol 165 no 4 pp508ndash513 2002

[14] E Paci D Puliti A Lopes Pegna et al ldquoMortality survivaland incidence rates in the ITALUNG randomised lung cancerscreening trialrdquoThorax vol 72 no 9 pp 825ndash831 2017

[15] S Iwano TNakamura Y Kamioka andT Ishigaki ldquoComputer-aided diagnosis A shape classification of pulmonary nodulesimaged by high-resolution CTrdquo Computerized Medical Imagingand Graphics vol 29 no 7 pp 565ndash570 2005

BioMed Research International 13

[16] A Bernard and H Shennib ldquoResection of pulmonary nodulesusing video-assisted thoracic surgeryrdquo The Annals of ThoracicSurgery vol 61 no 1 pp 202ndash205 1996

[17] M J Mack S R Hazelrigg R J Landreneau and T E AcuffldquoThoracoscopy for the diagnosis of the indeterminate solitarypulmonary nodulerdquoThe Annals of Thoracic Surgery vol 56 no4 pp 825ndash832 1993

[18] S Siegelman E Zerhouni F Leo N Khouri and F StitikldquoCT of the solitary pulmonary nodulerdquo American Journal ofRoentgenology vol 135 no 1 pp 1ndash13 1980

[19] B A Keagy P J K Starek G F Murray J W Battaglini ME Lores and B R Wilcox ldquoMajor pulmonary resection forsuspected but unconfirmedmalignancyrdquoTheAnnals ofThoracicSurgery vol 38 no 4 pp 314ndash316 1984

[20] J Collins ldquoCT signs and patterns of lung diseaserdquo RadiologicClinics of North America vol 39 no 6 pp 1115ndash1135 2001

[21] G Han X Liu F Han et al ldquoThe LISSmdasha public databaseof common imaging signs of lung diseases for computer-aideddetection and diagnosis research and medical educationrdquo IEEETransactions on Biomedical Engineering vol 62 no 2 pp 648ndash656 2015

[22] G Han X LiuN Q Soomro et al ldquoEmpirical driven automaticdetection of lobulation imaging signs in lung CTrdquo BioMedResearch International vol 2017 2017

[23] C-Z Shi Q Zhao L-P Luo and J-X He ldquoSize of solitarypulmonary nodule was the risk factor of malignancyrdquo Journalof Thoracic Disease vol 6 no 6 pp 668ndash676 2014

[24] J J Erasmus J E Connolly H P McAdams and V L RogglildquoSolitary pulmonary nodules part I Morphologic evaluationfor differentiation of benign and malignant lesionsrdquo Radio-Graphics vol 20 no 1 pp 43ndash58 2000

[25] K Furuya S Murayama H Soeda et al ldquoNew classification ofsmall pulmonary nodules by margin characteristics on high-resolution CTrdquo Acta Radiologica vol 40 no 5 pp 496ndash5042016

[26] Z Yang S Sone S Takashima et al ldquoHigh-resolution CTanalysis of small peripheral lung adenocarcinomas revealed onscreening helical CTrdquo American Journal of Roentgenology vol176 no 6 pp 1399ndash1407 2001

[27] M F McNitt-Gray E M Hart N Wyckoff J W Sayre J GGoldin and D R Aberle ldquoA pattern classification approachto characterizing solitary pulmonary nodules imaged on highresolution CT Preliminary resultsrdquoMedical Physics vol 26 no6 pp 880ndash888 1999

[28] V Ambrosini S Nicolini P Caroli et al ldquoPETCT imaging indifferent types of lung cancer an overviewrdquoEuropean Journal ofRadiology vol 81 no 5 pp 988ndash1001 2012

[29] F Li S Sone H Abe H MacMahon and K Doi ldquoMalignantversus benign nodules at CT screening for lung cancer com-parison of thin-section CT findingsrdquo Radiology vol 233 no 3pp 793ndash798 2004

[30] B Ganeshan E Panayiotou K Burnand S Dizdarevic and KMiles ldquoTumour heterogeneity in non-small cell lung carcinomaassessed by CT texture analysis a potential marker of survivalrdquoEuropean Radiology vol 22 no 4 pp 796ndash802 2012

[31] F Li S Sone H Abe H MacMahon S G Armato and KDoi ldquoLung cancers missed at low-dose helical CT screening ina general population comparison of clinical histopathologicand imaging findingsrdquo Radiology vol 225 no 3 pp 673ndash6832002

[32] H Chen Y Xu Y Ma and B Ma ldquoNeural network ensemble-based computer-aided diagnosis for differentiation of lungnodules on CT images clinical evaluationrdquoAcademic Radiologyvol 17 no 5 pp 595ndash602 2010

[33] C V Zwirewich S Vedal R R Miller and N L Muller ldquoSoli-tary pulmonary nodule high-resolution CT and radiologic-pathologic correlationrdquo Radiology vol 179 no 2 pp 469ndash4761991

[34] Y J Jeong C A Yi and K S Lee ldquoSolitary pulmonary nodulesdetection characterization and guidance for further diagnosticworkup and treatmentrdquoAmerican Journal of Roentgenology vol188 no 1 pp 57ndash68 2007

[35] A N Khan H H Al-Jahdali C M Allen K L Irion S AlGhanem and S S Koteyar ldquoThe calcified lung nodule whatdoes it meanrdquo Annals of Thoracic Medicine vol 5 no 2 pp67ndash79 2010

[36] H C BeckerW J Nettleton P HMeyers JW Sweeney and CMNice ldquoDigital computer determination of amedical diagnos-tic index directly from chest X-ray imagesrdquo IEEE Transactionson Biomedical Engineering vol BME-11 no 3 pp 67ndash72 1964

[37] P H Meyers C M Nice H C Becker W J Nettleton J WSweeney and G R Meckstroth ldquoAutomated computer analysisof radiographic imagesrdquo Radiology vol 83 no 6 pp 1029ndash10341964

[38] B Van Ginneken B Ter Haar Romeny and M ViergeverldquoComputer-aided diagnosis in chest radiography a surveyrdquoIEEE Transactions on Medical Imaging vol 20 no 12 pp 1228ndash1241 2001

[39] K Suzuki F Li S Sone and K Doi ldquoComputer-aided diagnos-tic scheme for distinction between benign and malignant nod-ules in thoracic low-dose CT by use ofmassive training artificialneural networkrdquo IEEE Transactions on Medical Imaging vol 24no 9 pp 1138ndash1150 2005

[40] K Kanazawa Y Kawata N Niki et al ldquoComputer-aideddiagnosis for pulmonary nodules based on helical CT imagesrdquoComputerized Medical Imaging and Graphics vol 22 no 2 pp157ndash167 1998

[41] F Han H Wang G Zhang et al ldquoTexture feature analysis forcomputer-aided diagnosis on pulmonary nodulesrdquo Journal ofDigital Imaging vol 28 no 1 pp 99ndash115 2014

[42] H Hu Q Wang H Tang L Xiong and Q Lin ldquoMulti-slicecomputed tomography characteristics of solitary pulmonaryground-glass nodules Differences between malignant andbenignrdquoThoracic Cancer vol 7 no 1 pp 80ndash87 2016

[43] G-Y Zheng X-B Liu and G-H Han ldquoSurvey on medicalimage computer aided detection and diagnosis systemsrdquo RuanJian Xue BaoJournal of Software vol 29 no 5 pp 1471ndash15142018

[44] F H Edwards P S Schaefer S Callahan G M Graeber andR A Albus ldquoBayesian statistical theory in the preoperativediagnosis of pulmonary lesionsrdquoCHEST vol 92 no 5 pp 888ndash891 1987

[45] H Krewer B Geiger L O Hall et al ldquoEffect of texture featuresin computer aided diagnosis of pulmonary nodules in low-dosecomputed tomographyrdquo in Proceedings of the IEEE InternationalConference on Systems Man and Cybernetics (SMC rsquo13) pp3887ndash3891 IEEE Manchester UK October 2013

[46] C Jacobs E M van Rikxoort E T Scholten et al ldquoSolidpart-solid or non-solid classification of pulmonary nodulesin low-dose chest computed tomography by a computer-aideddiagnosis systemrdquo Investigative Radiology vol 50 no 3 pp 168ndash173 2015

14 BioMed Research International

[47] X Liu L Ma L Song Y Zhao X Zhao and C Zhou ldquoRec-ognizing common CT imaging signs of lung diseases througha new feature selection method based on fisher criterion andgenetic optimizationrdquo IEEE Journal of Biomedical and HealthInformatics vol 19 no 2 pp 635ndash647 2015

[48] T Sun J Wang X Li et al ldquoComparative evaluation of supportvectormachines for computer aided diagnosis of lung cancer inCT based on a multi-dimensional data setrdquo Computer Methodsand Programs in Biomedicine vol 111 no 2 pp 519ndash524 2013

[49] H Arimura S Katsuragawa K Suzuki et al ldquoComputerizedscheme for automated detection of lung nodules in low-dose computed tomography images for lung cancer screeningrdquoAcademic Radiology vol 11 no 6 pp 617ndash629 2004

[50] T Sun R Zhang J Wang X Li and X Guo ldquoComputer-aideddiagnosis for early-stage lung cancer based on longitudinal andbalanced datardquo PLoS ONE vol 8 no 5 Article ID e63559 2013

[51] F Han C L Novak S Aylward et al ldquoA new 3D texture featurebased computer-aided diagnosis approach to differentiate pul-monary nodulesrdquo Proceedings of the SPIE Article ID 86702Z2013

[52] K-L Hua C-H Hsu S C Hidayati W-H Cheng and Y-J Chen ldquoComputer-aided classification of lung nodules oncomputed tomography images via deep learning techniquerdquoOncoTargets andTherapy 2015

[53] Y LeCun L Bottou Y Bengio and P Haffner ldquoGradient-basedlearning applied to document recognitionrdquo Proceedings of theIEEE vol 86 no 11 pp 2278ndash2323 1998

[54] W Shen M Zhou F Yang C Yang and J Tian ldquoMulti-scaleconvolutional neural networks for lung nodule classificationrdquoin Proceedings of the International Conference on InformationProcessing in Medical Imaging vol 24 pp 588ndash599 Springer2015

[55] N Tajbakhsh and K Suzuki ldquoComparing two classes of end-to-end machine-learning models in lung nodule detection andclassificationMTANNs vs CNNsrdquo Pattern Recognition vol 63pp 476ndash486 2017

[56] H Greenspan B Van Ginneken and R M Summers ldquoGuesteditorial deep learning inmedical imaging overview and futurepromise of an exciting new techniquerdquo IEEE Transactions onMedical Imaging vol 35 no 5 pp 1153ndash1159 2016

[57] I J Goodfellow J Pouget-Abadie M Mirza et al ldquoGenerativeadversarial netsrdquo in Proceedings of the 28th Annual Conferenceon Neural Information Processing Systems 2014 NIPS 2014 pp2672ndash2680 Canada December 2014

[58] M J Chuquicusma S Hussein J Burt and U Bagci ldquoHowto fool radiologists with generative adversarial networks Avisual turing test for lung cancer diagnosisrdquo in Proceedings of the2018 IEEE 15th International Symposium Biomedical Imagingpp 240ndash244 IEEE 2018

[59] X Zhu ldquoSemi-supervised learning literature surveyrdquo ComputerScience University ofWisconsin-Madison vol 2 no 3 p 4 2006

[60] M Li andZ-H Zhou ldquoImprove computer-aided diagnosis withmachine learning techniques usingundiagnosed samplesrdquo IEEETransactions on Systems Man and Cybernetics Systems vol 37no 6 pp 1088ndash1098 2007

[61] A Radford L Metz and S Chintala ldquoUnsupervised represen-tation learning with deep convolutional generative adversarialnetworksrdquo 2015 httpsarxivorgabs07123011

[62] Y Liu Z Xing C Deng P Li and M Guo ldquoAutomaticallydetecting lung nodules based on shape descriptor and semi-supervised learningrdquo in Proceedings of the 2010 International

Conference on Computer Application and System ModelingICCASM 2010 pp V1647ndashV1650 China October 2010

[63] Y Peng and P Flach ldquoSoft discretization to enhance thecontinuous decision tree inductionrdquo Integrating Aspects of DataMining Decision Support and Meta-Learning vol 1 no 34 pp109ndash118 2001

[64] N Settouti M E Habib Daho M E Amine Lazouni and MA Chikh ldquoRandom forest in semi-supervised learning (Co-Forest)rdquo in Proceedings of the 2013 8th International Workshopon Systems Signal Processing and Their Applications WoSSPA2013 pp 326ndash329 Algeria May 2013

[65] S G Armato III G McLennan L Bidaut et al ldquoThe lungimage database consortium (lidc) and image database resourceinitiative (idri) a completed reference database of lung noduleson ct scansrdquoMedical Physics vol 38 no 2 pp 915ndash931 2011

[66] D E Ost and M K Gould ldquoDecision making in patientswith pulmonary nodulesrdquo American Journal of Respiratory andCritical Care Medicine vol 185 no 4 pp 363ndash372 2012

[67] M M Goodsitt H Chan T W Way S C Larson E GChristodoulou and J Kim ldquoAccuracy of the CT numbersof simulated lung nodules imaged with multi-detector CTscannersrdquoMedical Physics vol 33 no 8 pp 3006ndash3017 2006

[68] MMGoodsittHChanTWWayM J Schipper S C Larsonand E G Christodoulou ldquoQuantitative CT of lung nodulesdependence of calibration on patient body size anatomicregion and calibration nodule size for single- and dual-energytechniquesrdquoMedical Physics vol 36 no 7 pp 3107ndash3121 2009

[69] P Opulencia D S Channin D S Raicu and J D FurstldquoRadLex and lung nodule image featuresrdquo Journal of DigitalImaging vol 24 no 2 pp 256ndash270 2011

[70] J-C Wang S Sone L Feng et al ldquoRapidly growing smallperipheral lung cancers detected by screening CT Correlationbetween radiological appearance and pathological featuresrdquoBritish Journal of Radiology vol 73 no 873 pp 930ndash937 2000

[71] S Takashima Y Maruyama M Hasegawa A Saito M Hani-uda and M Kadoya ldquoHigh-resolution CT features prognosticsignificance in peripheral lung adenocarcinoma with bronchi-oloalveolar carcinoma componentsrdquo Respiration vol 70 no 1pp 36ndash42 2003

[72] R M Lindell T E Hartman S J Swensen et al ldquoFive-yearlung cancer screening experience CT appearance growth ratelocation and histologic features of 61 lung cancersrdquo Radiologyvol 242 no 2 pp 555ndash562 2007

[73] T J Kim D H Han K N Jin and K Won Lee ldquoLung cancerdetected at cardiac CT prevalence clinicoradiologic featuresand importance of fullndashfield-of-view imagesrdquo Radiology vol255 no 2 pp 369ndash376 2010

[74] MVazquez D Carter E Brambilla et al ldquoSolitary andmultipleresected adenocarcinomas after CT screening for lung cancerhistopathologic features and their prognostic implicationsrdquoLung Cancer vol 64 no 2 pp 148ndash154 2009

Stem Cells International

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

MEDIATORSINFLAMMATION

of

EndocrinologyInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Disease Markers

Hindawiwwwhindawicom Volume 2018

BioMed Research International

OncologyJournal of

Hindawiwwwhindawicom Volume 2013

Hindawiwwwhindawicom Volume 2018

Oxidative Medicine and Cellular Longevity

Hindawiwwwhindawicom Volume 2018

PPAR Research

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Immunology ResearchHindawiwwwhindawicom Volume 2018

Journal of

ObesityJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational and Mathematical Methods in Medicine

Hindawiwwwhindawicom Volume 2018

Behavioural Neurology

OphthalmologyJournal of

Hindawiwwwhindawicom Volume 2018

Diabetes ResearchJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Research and TreatmentAIDS

Hindawiwwwhindawicom Volume 2018

Gastroenterology Research and Practice

Hindawiwwwhindawicom Volume 2018

Parkinsonrsquos Disease

Evidence-Based Complementary andAlternative Medicine

Volume 2018Hindawiwwwhindawicom

Submit your manuscripts atwwwhindawicom

Page 13: A Novel Computer-Aided Diagnosis Scheme on Small Annotated ...downloads.hindawi.com/journals/bmri/2019/6425963.pdf · BioMedResearchInternational 100 Reshape Deconv Deconv Deconv

BioMed Research International 13

[16] A Bernard and H Shennib ldquoResection of pulmonary nodulesusing video-assisted thoracic surgeryrdquo The Annals of ThoracicSurgery vol 61 no 1 pp 202ndash205 1996

[17] M J Mack S R Hazelrigg R J Landreneau and T E AcuffldquoThoracoscopy for the diagnosis of the indeterminate solitarypulmonary nodulerdquoThe Annals of Thoracic Surgery vol 56 no4 pp 825ndash832 1993

[18] S Siegelman E Zerhouni F Leo N Khouri and F StitikldquoCT of the solitary pulmonary nodulerdquo American Journal ofRoentgenology vol 135 no 1 pp 1ndash13 1980

[19] B A Keagy P J K Starek G F Murray J W Battaglini ME Lores and B R Wilcox ldquoMajor pulmonary resection forsuspected but unconfirmedmalignancyrdquoTheAnnals ofThoracicSurgery vol 38 no 4 pp 314ndash316 1984

[20] J Collins ldquoCT signs and patterns of lung diseaserdquo RadiologicClinics of North America vol 39 no 6 pp 1115ndash1135 2001

[21] G Han X Liu F Han et al ldquoThe LISSmdasha public databaseof common imaging signs of lung diseases for computer-aideddetection and diagnosis research and medical educationrdquo IEEETransactions on Biomedical Engineering vol 62 no 2 pp 648ndash656 2015

[22] G Han X LiuN Q Soomro et al ldquoEmpirical driven automaticdetection of lobulation imaging signs in lung CTrdquo BioMedResearch International vol 2017 2017

[23] C-Z Shi Q Zhao L-P Luo and J-X He ldquoSize of solitarypulmonary nodule was the risk factor of malignancyrdquo Journalof Thoracic Disease vol 6 no 6 pp 668ndash676 2014

[24] J J Erasmus J E Connolly H P McAdams and V L RogglildquoSolitary pulmonary nodules part I Morphologic evaluationfor differentiation of benign and malignant lesionsrdquo Radio-Graphics vol 20 no 1 pp 43ndash58 2000

[25] K Furuya S Murayama H Soeda et al ldquoNew classification ofsmall pulmonary nodules by margin characteristics on high-resolution CTrdquo Acta Radiologica vol 40 no 5 pp 496ndash5042016

[26] Z Yang S Sone S Takashima et al ldquoHigh-resolution CTanalysis of small peripheral lung adenocarcinomas revealed onscreening helical CTrdquo American Journal of Roentgenology vol176 no 6 pp 1399ndash1407 2001

[27] M F McNitt-Gray E M Hart N Wyckoff J W Sayre J GGoldin and D R Aberle ldquoA pattern classification approachto characterizing solitary pulmonary nodules imaged on highresolution CT Preliminary resultsrdquoMedical Physics vol 26 no6 pp 880ndash888 1999

[28] V Ambrosini S Nicolini P Caroli et al ldquoPETCT imaging indifferent types of lung cancer an overviewrdquoEuropean Journal ofRadiology vol 81 no 5 pp 988ndash1001 2012

[29] F Li S Sone H Abe H MacMahon and K Doi ldquoMalignantversus benign nodules at CT screening for lung cancer com-parison of thin-section CT findingsrdquo Radiology vol 233 no 3pp 793ndash798 2004

[30] B Ganeshan E Panayiotou K Burnand S Dizdarevic and KMiles ldquoTumour heterogeneity in non-small cell lung carcinomaassessed by CT texture analysis a potential marker of survivalrdquoEuropean Radiology vol 22 no 4 pp 796ndash802 2012

[31] F Li S Sone H Abe H MacMahon S G Armato and KDoi ldquoLung cancers missed at low-dose helical CT screening ina general population comparison of clinical histopathologicand imaging findingsrdquo Radiology vol 225 no 3 pp 673ndash6832002

[32] H Chen Y Xu Y Ma and B Ma ldquoNeural network ensemble-based computer-aided diagnosis for differentiation of lungnodules on CT images clinical evaluationrdquoAcademic Radiologyvol 17 no 5 pp 595ndash602 2010

[33] C V Zwirewich S Vedal R R Miller and N L Muller ldquoSoli-tary pulmonary nodule high-resolution CT and radiologic-pathologic correlationrdquo Radiology vol 179 no 2 pp 469ndash4761991

[34] Y J Jeong C A Yi and K S Lee ldquoSolitary pulmonary nodulesdetection characterization and guidance for further diagnosticworkup and treatmentrdquoAmerican Journal of Roentgenology vol188 no 1 pp 57ndash68 2007

[35] A N Khan H H Al-Jahdali C M Allen K L Irion S AlGhanem and S S Koteyar ldquoThe calcified lung nodule whatdoes it meanrdquo Annals of Thoracic Medicine vol 5 no 2 pp67ndash79 2010

[36] H C BeckerW J Nettleton P HMeyers JW Sweeney and CMNice ldquoDigital computer determination of amedical diagnos-tic index directly from chest X-ray imagesrdquo IEEE Transactionson Biomedical Engineering vol BME-11 no 3 pp 67ndash72 1964

[37] P H Meyers C M Nice H C Becker W J Nettleton J WSweeney and G R Meckstroth ldquoAutomated computer analysisof radiographic imagesrdquo Radiology vol 83 no 6 pp 1029ndash10341964

[38] B Van Ginneken B Ter Haar Romeny and M ViergeverldquoComputer-aided diagnosis in chest radiography a surveyrdquoIEEE Transactions on Medical Imaging vol 20 no 12 pp 1228ndash1241 2001

[39] K Suzuki F Li S Sone and K Doi ldquoComputer-aided diagnos-tic scheme for distinction between benign and malignant nod-ules in thoracic low-dose CT by use ofmassive training artificialneural networkrdquo IEEE Transactions on Medical Imaging vol 24no 9 pp 1138ndash1150 2005

[40] K Kanazawa Y Kawata N Niki et al ldquoComputer-aideddiagnosis for pulmonary nodules based on helical CT imagesrdquoComputerized Medical Imaging and Graphics vol 22 no 2 pp157ndash167 1998

[41] F Han H Wang G Zhang et al ldquoTexture feature analysis forcomputer-aided diagnosis on pulmonary nodulesrdquo Journal ofDigital Imaging vol 28 no 1 pp 99ndash115 2014

[42] H Hu Q Wang H Tang L Xiong and Q Lin ldquoMulti-slicecomputed tomography characteristics of solitary pulmonaryground-glass nodules Differences between malignant andbenignrdquoThoracic Cancer vol 7 no 1 pp 80ndash87 2016

[43] G-Y Zheng X-B Liu and G-H Han ldquoSurvey on medicalimage computer aided detection and diagnosis systemsrdquo RuanJian Xue BaoJournal of Software vol 29 no 5 pp 1471ndash15142018

[44] F H Edwards P S Schaefer S Callahan G M Graeber andR A Albus ldquoBayesian statistical theory in the preoperativediagnosis of pulmonary lesionsrdquoCHEST vol 92 no 5 pp 888ndash891 1987

[45] H Krewer B Geiger L O Hall et al ldquoEffect of texture featuresin computer aided diagnosis of pulmonary nodules in low-dosecomputed tomographyrdquo in Proceedings of the IEEE InternationalConference on Systems Man and Cybernetics (SMC rsquo13) pp3887ndash3891 IEEE Manchester UK October 2013

[46] C Jacobs E M van Rikxoort E T Scholten et al ldquoSolidpart-solid or non-solid classification of pulmonary nodulesin low-dose chest computed tomography by a computer-aideddiagnosis systemrdquo Investigative Radiology vol 50 no 3 pp 168ndash173 2015

14 BioMed Research International

[47] X Liu L Ma L Song Y Zhao X Zhao and C Zhou ldquoRec-ognizing common CT imaging signs of lung diseases througha new feature selection method based on fisher criterion andgenetic optimizationrdquo IEEE Journal of Biomedical and HealthInformatics vol 19 no 2 pp 635ndash647 2015

[48] T Sun J Wang X Li et al ldquoComparative evaluation of supportvectormachines for computer aided diagnosis of lung cancer inCT based on a multi-dimensional data setrdquo Computer Methodsand Programs in Biomedicine vol 111 no 2 pp 519ndash524 2013

[49] H Arimura S Katsuragawa K Suzuki et al ldquoComputerizedscheme for automated detection of lung nodules in low-dose computed tomography images for lung cancer screeningrdquoAcademic Radiology vol 11 no 6 pp 617ndash629 2004

[50] T Sun R Zhang J Wang X Li and X Guo ldquoComputer-aideddiagnosis for early-stage lung cancer based on longitudinal andbalanced datardquo PLoS ONE vol 8 no 5 Article ID e63559 2013

[51] F Han C L Novak S Aylward et al ldquoA new 3D texture featurebased computer-aided diagnosis approach to differentiate pul-monary nodulesrdquo Proceedings of the SPIE Article ID 86702Z2013

[52] K-L Hua C-H Hsu S C Hidayati W-H Cheng and Y-J Chen ldquoComputer-aided classification of lung nodules oncomputed tomography images via deep learning techniquerdquoOncoTargets andTherapy 2015

[53] Y LeCun L Bottou Y Bengio and P Haffner ldquoGradient-basedlearning applied to document recognitionrdquo Proceedings of theIEEE vol 86 no 11 pp 2278ndash2323 1998

[54] W Shen M Zhou F Yang C Yang and J Tian ldquoMulti-scaleconvolutional neural networks for lung nodule classificationrdquoin Proceedings of the International Conference on InformationProcessing in Medical Imaging vol 24 pp 588ndash599 Springer2015

[55] N Tajbakhsh and K Suzuki ldquoComparing two classes of end-to-end machine-learning models in lung nodule detection andclassificationMTANNs vs CNNsrdquo Pattern Recognition vol 63pp 476ndash486 2017

[56] H Greenspan B Van Ginneken and R M Summers ldquoGuesteditorial deep learning inmedical imaging overview and futurepromise of an exciting new techniquerdquo IEEE Transactions onMedical Imaging vol 35 no 5 pp 1153ndash1159 2016

[57] I J Goodfellow J Pouget-Abadie M Mirza et al ldquoGenerativeadversarial netsrdquo in Proceedings of the 28th Annual Conferenceon Neural Information Processing Systems 2014 NIPS 2014 pp2672ndash2680 Canada December 2014

[58] M J Chuquicusma S Hussein J Burt and U Bagci ldquoHowto fool radiologists with generative adversarial networks Avisual turing test for lung cancer diagnosisrdquo in Proceedings of the2018 IEEE 15th International Symposium Biomedical Imagingpp 240ndash244 IEEE 2018

[59] X Zhu ldquoSemi-supervised learning literature surveyrdquo ComputerScience University ofWisconsin-Madison vol 2 no 3 p 4 2006

[60] M Li andZ-H Zhou ldquoImprove computer-aided diagnosis withmachine learning techniques usingundiagnosed samplesrdquo IEEETransactions on Systems Man and Cybernetics Systems vol 37no 6 pp 1088ndash1098 2007

[61] A Radford L Metz and S Chintala ldquoUnsupervised represen-tation learning with deep convolutional generative adversarialnetworksrdquo 2015 httpsarxivorgabs07123011

[62] Y Liu Z Xing C Deng P Li and M Guo ldquoAutomaticallydetecting lung nodules based on shape descriptor and semi-supervised learningrdquo in Proceedings of the 2010 International

Conference on Computer Application and System ModelingICCASM 2010 pp V1647ndashV1650 China October 2010

[63] Y Peng and P Flach ldquoSoft discretization to enhance thecontinuous decision tree inductionrdquo Integrating Aspects of DataMining Decision Support and Meta-Learning vol 1 no 34 pp109ndash118 2001

[64] N Settouti M E Habib Daho M E Amine Lazouni and MA Chikh ldquoRandom forest in semi-supervised learning (Co-Forest)rdquo in Proceedings of the 2013 8th International Workshopon Systems Signal Processing and Their Applications WoSSPA2013 pp 326ndash329 Algeria May 2013

[65] S G Armato III G McLennan L Bidaut et al ldquoThe lungimage database consortium (lidc) and image database resourceinitiative (idri) a completed reference database of lung noduleson ct scansrdquoMedical Physics vol 38 no 2 pp 915ndash931 2011

[66] D E Ost and M K Gould ldquoDecision making in patientswith pulmonary nodulesrdquo American Journal of Respiratory andCritical Care Medicine vol 185 no 4 pp 363ndash372 2012

[67] M M Goodsitt H Chan T W Way S C Larson E GChristodoulou and J Kim ldquoAccuracy of the CT numbersof simulated lung nodules imaged with multi-detector CTscannersrdquoMedical Physics vol 33 no 8 pp 3006ndash3017 2006

[68] MMGoodsittHChanTWWayM J Schipper S C Larsonand E G Christodoulou ldquoQuantitative CT of lung nodulesdependence of calibration on patient body size anatomicregion and calibration nodule size for single- and dual-energytechniquesrdquoMedical Physics vol 36 no 7 pp 3107ndash3121 2009

[69] P Opulencia D S Channin D S Raicu and J D FurstldquoRadLex and lung nodule image featuresrdquo Journal of DigitalImaging vol 24 no 2 pp 256ndash270 2011

[70] J-C Wang S Sone L Feng et al ldquoRapidly growing smallperipheral lung cancers detected by screening CT Correlationbetween radiological appearance and pathological featuresrdquoBritish Journal of Radiology vol 73 no 873 pp 930ndash937 2000

[71] S Takashima Y Maruyama M Hasegawa A Saito M Hani-uda and M Kadoya ldquoHigh-resolution CT features prognosticsignificance in peripheral lung adenocarcinoma with bronchi-oloalveolar carcinoma componentsrdquo Respiration vol 70 no 1pp 36ndash42 2003

[72] R M Lindell T E Hartman S J Swensen et al ldquoFive-yearlung cancer screening experience CT appearance growth ratelocation and histologic features of 61 lung cancersrdquo Radiologyvol 242 no 2 pp 555ndash562 2007

[73] T J Kim D H Han K N Jin and K Won Lee ldquoLung cancerdetected at cardiac CT prevalence clinicoradiologic featuresand importance of fullndashfield-of-view imagesrdquo Radiology vol255 no 2 pp 369ndash376 2010

[74] MVazquez D Carter E Brambilla et al ldquoSolitary andmultipleresected adenocarcinomas after CT screening for lung cancerhistopathologic features and their prognostic implicationsrdquoLung Cancer vol 64 no 2 pp 148ndash154 2009

Stem Cells International

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

MEDIATORSINFLAMMATION

of

EndocrinologyInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Disease Markers

Hindawiwwwhindawicom Volume 2018

BioMed Research International

OncologyJournal of

Hindawiwwwhindawicom Volume 2013

Hindawiwwwhindawicom Volume 2018

Oxidative Medicine and Cellular Longevity

Hindawiwwwhindawicom Volume 2018

PPAR Research

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Immunology ResearchHindawiwwwhindawicom Volume 2018

Journal of

ObesityJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational and Mathematical Methods in Medicine

Hindawiwwwhindawicom Volume 2018

Behavioural Neurology

OphthalmologyJournal of

Hindawiwwwhindawicom Volume 2018

Diabetes ResearchJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Research and TreatmentAIDS

Hindawiwwwhindawicom Volume 2018

Gastroenterology Research and Practice

Hindawiwwwhindawicom Volume 2018

Parkinsonrsquos Disease

Evidence-Based Complementary andAlternative Medicine

Volume 2018Hindawiwwwhindawicom

Submit your manuscripts atwwwhindawicom

Page 14: A Novel Computer-Aided Diagnosis Scheme on Small Annotated ...downloads.hindawi.com/journals/bmri/2019/6425963.pdf · BioMedResearchInternational 100 Reshape Deconv Deconv Deconv

14 BioMed Research International

[47] X Liu L Ma L Song Y Zhao X Zhao and C Zhou ldquoRec-ognizing common CT imaging signs of lung diseases througha new feature selection method based on fisher criterion andgenetic optimizationrdquo IEEE Journal of Biomedical and HealthInformatics vol 19 no 2 pp 635ndash647 2015

[48] T Sun J Wang X Li et al ldquoComparative evaluation of supportvectormachines for computer aided diagnosis of lung cancer inCT based on a multi-dimensional data setrdquo Computer Methodsand Programs in Biomedicine vol 111 no 2 pp 519ndash524 2013

[49] H Arimura S Katsuragawa K Suzuki et al ldquoComputerizedscheme for automated detection of lung nodules in low-dose computed tomography images for lung cancer screeningrdquoAcademic Radiology vol 11 no 6 pp 617ndash629 2004

[50] T Sun R Zhang J Wang X Li and X Guo ldquoComputer-aideddiagnosis for early-stage lung cancer based on longitudinal andbalanced datardquo PLoS ONE vol 8 no 5 Article ID e63559 2013

[51] F Han C L Novak S Aylward et al ldquoA new 3D texture featurebased computer-aided diagnosis approach to differentiate pul-monary nodulesrdquo Proceedings of the SPIE Article ID 86702Z2013

[52] K-L Hua C-H Hsu S C Hidayati W-H Cheng and Y-J Chen ldquoComputer-aided classification of lung nodules oncomputed tomography images via deep learning techniquerdquoOncoTargets andTherapy 2015

[53] Y LeCun L Bottou Y Bengio and P Haffner ldquoGradient-basedlearning applied to document recognitionrdquo Proceedings of theIEEE vol 86 no 11 pp 2278ndash2323 1998

[54] W Shen M Zhou F Yang C Yang and J Tian ldquoMulti-scaleconvolutional neural networks for lung nodule classificationrdquoin Proceedings of the International Conference on InformationProcessing in Medical Imaging vol 24 pp 588ndash599 Springer2015

[55] N Tajbakhsh and K Suzuki ldquoComparing two classes of end-to-end machine-learning models in lung nodule detection andclassificationMTANNs vs CNNsrdquo Pattern Recognition vol 63pp 476ndash486 2017

[56] H Greenspan B Van Ginneken and R M Summers ldquoGuesteditorial deep learning inmedical imaging overview and futurepromise of an exciting new techniquerdquo IEEE Transactions onMedical Imaging vol 35 no 5 pp 1153ndash1159 2016

[57] I J Goodfellow J Pouget-Abadie M Mirza et al ldquoGenerativeadversarial netsrdquo in Proceedings of the 28th Annual Conferenceon Neural Information Processing Systems 2014 NIPS 2014 pp2672ndash2680 Canada December 2014

[58] M J Chuquicusma S Hussein J Burt and U Bagci ldquoHowto fool radiologists with generative adversarial networks Avisual turing test for lung cancer diagnosisrdquo in Proceedings of the2018 IEEE 15th International Symposium Biomedical Imagingpp 240ndash244 IEEE 2018

[59] X Zhu ldquoSemi-supervised learning literature surveyrdquo ComputerScience University ofWisconsin-Madison vol 2 no 3 p 4 2006

[60] M Li andZ-H Zhou ldquoImprove computer-aided diagnosis withmachine learning techniques usingundiagnosed samplesrdquo IEEETransactions on Systems Man and Cybernetics Systems vol 37no 6 pp 1088ndash1098 2007

[61] A Radford L Metz and S Chintala ldquoUnsupervised represen-tation learning with deep convolutional generative adversarialnetworksrdquo 2015 httpsarxivorgabs07123011

[62] Y Liu Z Xing C Deng P Li and M Guo ldquoAutomaticallydetecting lung nodules based on shape descriptor and semi-supervised learningrdquo in Proceedings of the 2010 International

Conference on Computer Application and System ModelingICCASM 2010 pp V1647ndashV1650 China October 2010

[63] Y Peng and P Flach ldquoSoft discretization to enhance thecontinuous decision tree inductionrdquo Integrating Aspects of DataMining Decision Support and Meta-Learning vol 1 no 34 pp109ndash118 2001

[64] N Settouti M E Habib Daho M E Amine Lazouni and MA Chikh ldquoRandom forest in semi-supervised learning (Co-Forest)rdquo in Proceedings of the 2013 8th International Workshopon Systems Signal Processing and Their Applications WoSSPA2013 pp 326ndash329 Algeria May 2013

[65] S G Armato III G McLennan L Bidaut et al ldquoThe lungimage database consortium (lidc) and image database resourceinitiative (idri) a completed reference database of lung noduleson ct scansrdquoMedical Physics vol 38 no 2 pp 915ndash931 2011

[66] D E Ost and M K Gould ldquoDecision making in patientswith pulmonary nodulesrdquo American Journal of Respiratory andCritical Care Medicine vol 185 no 4 pp 363ndash372 2012

[67] M M Goodsitt H Chan T W Way S C Larson E GChristodoulou and J Kim ldquoAccuracy of the CT numbersof simulated lung nodules imaged with multi-detector CTscannersrdquoMedical Physics vol 33 no 8 pp 3006ndash3017 2006

[68] MMGoodsittHChanTWWayM J Schipper S C Larsonand E G Christodoulou ldquoQuantitative CT of lung nodulesdependence of calibration on patient body size anatomicregion and calibration nodule size for single- and dual-energytechniquesrdquoMedical Physics vol 36 no 7 pp 3107ndash3121 2009

[69] P Opulencia D S Channin D S Raicu and J D FurstldquoRadLex and lung nodule image featuresrdquo Journal of DigitalImaging vol 24 no 2 pp 256ndash270 2011

[70] J-C Wang S Sone L Feng et al ldquoRapidly growing smallperipheral lung cancers detected by screening CT Correlationbetween radiological appearance and pathological featuresrdquoBritish Journal of Radiology vol 73 no 873 pp 930ndash937 2000

[71] S Takashima Y Maruyama M Hasegawa A Saito M Hani-uda and M Kadoya ldquoHigh-resolution CT features prognosticsignificance in peripheral lung adenocarcinoma with bronchi-oloalveolar carcinoma componentsrdquo Respiration vol 70 no 1pp 36ndash42 2003

[72] R M Lindell T E Hartman S J Swensen et al ldquoFive-yearlung cancer screening experience CT appearance growth ratelocation and histologic features of 61 lung cancersrdquo Radiologyvol 242 no 2 pp 555ndash562 2007

[73] T J Kim D H Han K N Jin and K Won Lee ldquoLung cancerdetected at cardiac CT prevalence clinicoradiologic featuresand importance of fullndashfield-of-view imagesrdquo Radiology vol255 no 2 pp 369ndash376 2010

[74] MVazquez D Carter E Brambilla et al ldquoSolitary andmultipleresected adenocarcinomas after CT screening for lung cancerhistopathologic features and their prognostic implicationsrdquoLung Cancer vol 64 no 2 pp 148ndash154 2009

Stem Cells International

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

MEDIATORSINFLAMMATION

of

EndocrinologyInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Disease Markers

Hindawiwwwhindawicom Volume 2018

BioMed Research International

OncologyJournal of

Hindawiwwwhindawicom Volume 2013

Hindawiwwwhindawicom Volume 2018

Oxidative Medicine and Cellular Longevity

Hindawiwwwhindawicom Volume 2018

PPAR Research

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Immunology ResearchHindawiwwwhindawicom Volume 2018

Journal of

ObesityJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational and Mathematical Methods in Medicine

Hindawiwwwhindawicom Volume 2018

Behavioural Neurology

OphthalmologyJournal of

Hindawiwwwhindawicom Volume 2018

Diabetes ResearchJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Research and TreatmentAIDS

Hindawiwwwhindawicom Volume 2018

Gastroenterology Research and Practice

Hindawiwwwhindawicom Volume 2018

Parkinsonrsquos Disease

Evidence-Based Complementary andAlternative Medicine

Volume 2018Hindawiwwwhindawicom

Submit your manuscripts atwwwhindawicom

Page 15: A Novel Computer-Aided Diagnosis Scheme on Small Annotated ...downloads.hindawi.com/journals/bmri/2019/6425963.pdf · BioMedResearchInternational 100 Reshape Deconv Deconv Deconv

Stem Cells International

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

MEDIATORSINFLAMMATION

of

EndocrinologyInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Disease Markers

Hindawiwwwhindawicom Volume 2018

BioMed Research International

OncologyJournal of

Hindawiwwwhindawicom Volume 2013

Hindawiwwwhindawicom Volume 2018

Oxidative Medicine and Cellular Longevity

Hindawiwwwhindawicom Volume 2018

PPAR Research

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Immunology ResearchHindawiwwwhindawicom Volume 2018

Journal of

ObesityJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational and Mathematical Methods in Medicine

Hindawiwwwhindawicom Volume 2018

Behavioural Neurology

OphthalmologyJournal of

Hindawiwwwhindawicom Volume 2018

Diabetes ResearchJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Research and TreatmentAIDS

Hindawiwwwhindawicom Volume 2018

Gastroenterology Research and Practice

Hindawiwwwhindawicom Volume 2018

Parkinsonrsquos Disease

Evidence-Based Complementary andAlternative Medicine

Volume 2018Hindawiwwwhindawicom

Submit your manuscripts atwwwhindawicom