
Proceedings of Machine Learning Research 85:1–15, 2018 Machine Learning for Healthcare

Automated Treatment Planning in Radiation Therapy using Generative Adversarial Networks

Rafid Mahmood [email protected]

Department of Mechanical and Industrial Engineering, University of Toronto, Toronto, ON, Canada

Aaron Babier [email protected]

Department of Mechanical and Industrial Engineering, University of Toronto, Toronto, ON, Canada

Andrea McNiven [email protected]

Radiation Medicine Program, Princess Margaret Cancer Centre, Toronto, ON, Canada

Adam Diamant [email protected]

Schulich School of Business, York University, Toronto, ON, Canada

Timothy C. Y. Chan [email protected]

Department of Mechanical and Industrial Engineering, University of Toronto, Toronto, ON, Canada

Abstract

Knowledge-based planning (KBP) is an automated approach to radiation therapy treatment planning that involves predicting desirable treatment plans before they are then corrected to deliverable ones. We propose a generative adversarial network (GAN) approach for predicting desirable 3D dose distributions that eschews the previous paradigms of site-specific feature engineering and predicting low-dimensional representations of the plan. Experiments on a dataset of oropharyngeal cancer patients show that our approach significantly outperforms previous methods on several clinical satisfaction criteria and similarity metrics.

1. Introduction

Radiation therapy (RT) is one of the primary methods for treating cancer and is recommended for over 50% of all cancer patients (Delaney et al., 2005). In RT, a linear accelerator (linac) outputs high-energy x-ray beams from multiple angles around a patient to deliver a prescribed dose of radiation to a tumor while minimizing dose to the healthy tissue. An RT treatment plan is the result of a complex design process involving multiple medical professionals and several software systems. This includes specialized optimization software that determines the beam characteristics (e.g., aperture shapes for each beam angle, dose delivered from each aperture) required to deliver the final dose distribution. The optimization model takes as input a set of computed tomography (CT) images of the patient, various dosimetric objectives and constraints, and other parameters that guide the optimization process. The model outputs a treatment plan that is subsequently evaluated by an oncologist. The oncologist usually proposes modifications to the plan, which then requires the treatment planner to re-solve the optimization model using updated parameters. The total process is labor intensive, time-consuming, and costly, as the back-and-forth between the planner and oncologist is often repeated multiple times until the plan is finally approved.

© 2018 R. Mahmood, A. Babier, A. McNiven, A. Diamant & T.C.Y. Chan.

The significant manual effort associated with the current treatment planning paradigm, along with the fact that RT plans are generally quite similar for patients with similar geometries, has motivated researchers to investigate how automation can be used in the planning process (Sharpe et al., 2014). A key enabler of automation is known as knowledge-based planning (KBP), which leverages historically delivered treatments to generate new plans for similar patients. Figure 1 depicts the two main components of a KBP-driven automated planning system: (i) a machine learning model that uses CT-derived patient geometric features to predict a clinically acceptable three-dimensional dose distribution (Appenzoller et al., 2012; Yang et al., 2013; Shiraishi et al., 2015; Younge et al., 2018); and (ii) an optimization model that converts the prediction into a “deliverable” plan (McIntosh and Purdie, 2017; Wu et al., 2017; Babier et al., 2018b). The second step is needed to ensure the treatment plan produced by the machine learning model satisfies the physical delivery constraints imposed by the linac.

A major drawback of most existing KBP prediction methods is their reliance on low-dimensional hand-tailored features derived from patient geometry to predict new dose distributions. In contrast, we propose a new paradigm for generating KBP predictions that automatically learns to predict a 3D dose distribution directly from a CT image. More specifically, we recast the dose prediction problem as an image colorization problem, which we solve using a generative adversarial network (GAN) (Goodfellow et al., 2014). GANs, which have produced impressive results in other image colorization applications (Isola et al., 2017; Zhu et al., 2017), involve a pair of neural networks: a generator that performs a task and a discriminator that evaluates how well the task is performed. In our application, the generator serves as a treatment planner that designs a treatment, while the discriminator plays the role of the oncologist who critiques the generated dose distribution by comparing it to the real treatment plan. Both neural networks train simultaneously on historical data, effectively replicating and aggregating the combined knowledge gained during the iterative manual process used to design clinically acceptable treatments.

In this paper, we develop a novel automated treatment planning pipeline for oropharyngeal cancer that uses a GAN to predict 3D dose distributions. In contrast to previous machine learning methods, our approach does not require the pre-specification of an extensive set of feature variables for prediction. Instead, our model learns what features are important to produce clinically acceptable treatment plans. We apply our KBP methodology to a dataset consisting of 26,279 CT images from 217 patients with oropharyngeal cancer that have undergone radiation therapy. Approximately 60% of these images are used to train the GAN, which is used to predict high quality dose distributions for the remaining out-of-sample patients. These predictions are used as input into an optimization model to produce deliverable plans. We compare our approach to several other techniques, including three feature-based machine learning models and a standard convolutional neural network (CNN). We demonstrate that our approach outperforms all other models in achieving several clinically relevant criteria and in matching the clinical (benchmark) plans.



Figure 1: Overview of KBP-driven automated treatment planning pipeline. Stages, in order: new patient geometry, KBP prediction, predicted dose distribution, optimization, deliverable plan.

Technical Significance We demonstrate the first use of GANs for generating radiation treatment plans in cancer. We recast KBP prediction as an image colorization problem for which GANs are known to perform well. Moreover, we provide the first full pipeline comparison between different KBP prediction methods by optimizing the predicted dose distribution and comparing the final result to deliverable plans. We find that, in this setting, our GAN approach outperforms all other methods, including the latest in machine learning-based KBP approaches, in meeting clinical criteria.

Clinical Relevance Oropharyngeal cancer is one of the most difficult cancers to plan a treatment for, and as a result, generating deliverable treatment plans is particularly time consuming (Das et al., 2009). Our GAN approach automates the planning process, producing, on average, plans that are superior to clinical ones in several key metrics. Our site-independent method suggests similar performance for simpler sites, such as prostate and stomach cancers, while showing that high-quality oropharynx treatment plans can be automatically generated.

2. Related work

2.1. Knowledge-based planning

Many different approaches have been tested for the machine learning component of a KBP-driven automated planning pipeline (cf. Figure 1). Query-based methods identify previously treated patients who are sufficiently similar to the new patient, and use the historically achieved dose metrics as predictions for the new patient (Wu et al., 2009, 2011). Another common approach uses principal component analysis (PCA), in conjunction with linear regression, to predict dose metrics for new patients (Zhu et al., 2011; Yuan et al., 2012). However, these well-established techniques only predict two-dimensional dose metrics. Recent research has shown that 3D dose distribution predictions can also be generated using random forest or neural network-based models (Shiraishi and Moore, 2016; McIntosh et al., 2017; Nguyen et al., 2017). Nevertheless, for many of these approaches to work effectively, significant effort must be spent in feature engineering, i.e., introducing features specific to the cancer site. Furthermore, some of these approaches compare the predicted dose distributions, rather than deliverable plans post-optimization, to the clinical plans.

For the optimization phase of KBP, there are two main approaches for turning predictions into treatments: dose mimicking (Petersson et al., 2016) and inverse optimization (Chan et al., 2014). The dose mimicking model minimizes the L2 loss between the predicted dose distribution and one that satisfies all physical constraints. Alternatively, inverse optimization (IO) is a methodology that estimates parameters of an optimization problem from its observed solutions (Ahuja and Orlin, 2001). In the RT context, IO finds parameters, e.g., objective function weights, that allow a deliverable treatment plan to re-create the predicted dose distribution as closely as possible (Chan et al., 2014). A key advantage of inverse optimization is that it better replicates the trade-offs implicit in clinical treatment plans (Chan and Lee, 2018).
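As a point of reference, dose mimicking can be written as a constrained least-squares problem. The notation here is an assumption introduced for illustration and is not taken from the cited works: $\hat{d}$ is the predicted dose vector (one entry per voxel), $w$ is the vector of beamlet intensities, $A$ is a dose-influence matrix mapping intensities to voxel doses, and $\mathcal{W}$ is the set of intensities that satisfy the physical delivery constraints.

$$\min_{w \in \mathcal{W}} \; \lVert A w - \hat{d} \rVert_2^2$$

Inverse optimization, by contrast, keeps a multi-objective forward model and searches over its parameters (e.g., objective function weights) rather than fitting the dose directly.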

2.2. Generative adversarial networks

GANs are a well-studied class of deep learning algorithms used in generative modeling, i.e., in the creation of new data (Goodfellow et al., 2014). Although initially used to artificially generate 2D images, and later 3D models (Wu et al., 2016), their success has garnered increasing interest for healthcare applications. GANs have been used for medical drug discovery (Kadurin et al., 2017), generating artificial patient records (Choi et al., 2017; Esteban et al., 2017), the detection of brain lesions (Alex et al., 2017), and image augmentation for improved liver lesion classification (Frid-Adar et al., 2018).

A GAN consists of two neural networks, a generator and a discriminator, working in tandem. The generator $G(\cdot)$ takes an initial random input $z \sim p_z$ and attempts to generate an artificial data sample $x = G(z)$ (i.e., the 3D dose distribution). The discriminator $D(\cdot)$ is a classifier that takes generated and real data samples, and tries to identify which is which, i.e., $D(x) \in [0, 1]$ where 1 suggests the generated sample is satisfactory. The interaction between the networks can be formalized mathematically as a minimax game. If $p_{\text{data}}$ is the probability distribution over the real data samples $x$, then the game is defined as

$$\min_G \max_D \Big\{ V(G, D) = \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big] \Big\}.$$

GANs have been proven effective in style transfer problems, where the generator input z is a data sample corresponding to one style (or characteristic) and the output x is a mapping to a different style (Isola et al., 2017; Zhu et al., 2017). For example, style transfer can be used to transform grayscale images to colored photos (Sangkloy et al., 2017), in facial recognition for surveillance-based law enforcement (Wang et al., 2017), and in 3D reconstruction of damaged artifacts (Hermoza and Sipiran, 2017). Here, the generator G(z) learns the mapping between styles that generates samples resembling the ground truth. Since key structures in the output may be entangled with noise from the generator, the desired output is often achieved by modifying the original minimax game with a penalty term on large deviations between the real and generated samples:

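$$\min_G \max_D \Big\{ V(G, D) + \lambda\, \mathbb{E}_{x \sim p_{\text{data}},\, z \sim p_z}\big[\lVert x - G(z) \rVert_1\big] \Big\}, \qquad (1)$$

where $\lambda$ is a regularizer that balances the trade-off between learning style and the real data.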
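To make the pieces of the penalized objective in (1) concrete, the following is a minimal NumPy sketch that estimates each term from a batch of samples. The function name `gan_objective` and its arguments are hypothetical and chosen for illustration; this is not the authors' training code, and it only evaluates the objective rather than optimizing it.

```python
import numpy as np

def gan_objective(d_real, d_fake, x_real, x_fake, lam=90.0, eps=1e-12):
    """Sample estimate of V(G, D) and the L1 penalty in (1).

    d_real: discriminator outputs D(x) on real samples, values in [0, 1]
    d_fake: discriminator outputs D(G(z)) on generated samples
    x_real, x_fake: paired real and generated samples, shape (n, ...)
    lam: weight on the L1 penalty (hypothetical default for this sketch)
    """
    # Clip to avoid log(0) when the discriminator saturates.
    d_real = np.clip(np.asarray(d_real, dtype=float), eps, 1.0 - eps)
    d_fake = np.clip(np.asarray(d_fake, dtype=float), eps, 1.0 - eps)
    value = np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))
    # L1 norm per sample, averaged over the batch.
    diff = np.abs(np.asarray(x_real, dtype=float) - np.asarray(x_fake, dtype=float))
    l1 = np.mean(diff.reshape(diff.shape[0], -1).sum(axis=1))
    return value, l1, value + lam * l1
```

The generator seeks to decrease the returned total while the discriminator seeks to increase `value`; a larger `lam` pushes generated samples toward the paired real samples.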

3. Methods

We used contoured CT images and clinically acceptable dose distributions from the treatment plans of past oropharyngeal cancer patients to train a style transfer GAN. We then passed out-of-sample predicted dose distributions through an IO pipeline (Babier et al., 2018b) to generate the final treatment plans. For baseline comparisons, we also implemented several methods from the literature using the complete pipeline. Figure 2 shows a high-level overview of this automated planning pipeline.



Figure 2: A schematic of our KBP-based automated planning pipeline. Stages: preprocessing (contoured CT image, clinical dose distribution); predictions (baseline models, GAN); IO pipeline; plans (baseline plans, GAN plans).

3.1. Data

We obtained treatment plans from 217 oropharyngeal cancer patients treated at a single institution with a 6 MV, step-and-shoot, intensity-modulated radiation therapy machine. All plans were for a prescription of 70 Gy, 63 Gy, and 56 Gy in 35 fractions to the gross disease, intermediate risk, and elective target volumes, respectively.

For each patient, we identified a set of targets and healthy organs-at-risk (OARs). Targets were denoted as planning target volumes (PTVs) along with the oncologist-prescribed dose (e.g., PTV70 corresponds to the target with the highest dose prescription). OARs included the brainstem, spinal cord, right and left parotids, larynx, esophagus, and mandible. Every voxel (a 3D pixel of size 4 mm × 4 mm × 2 mm) of a CT image was classified by its clinically drawn contours. All voxels were assigned a structure-specific color, and in cases where the voxel was classified as both target and OAR, we reverted to target. All unclassified tissue was left as the original CT image grayscale.
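The target-over-OAR rule described above can be sketched as follows. This is an illustrative assumption about how such a labeling might be coded: the integer label codes, the `UNCLASSIFIED` constant, and the function `label_voxels` are hypothetical, whereas the paper assigns structure-specific colors and keeps the CT grayscale for unclassified tissue.

```python
import numpy as np

# Hypothetical code meaning "no contour; keep the original CT grayscale".
UNCLASSIFIED = 0

def label_voxels(shape, oar_masks, target_masks):
    """Build an integer label volume in which targets override OARs.

    oar_masks, target_masks: dicts mapping a label code to a boolean mask
    of the given shape (True where the contour covers the voxel).
    """
    labels = np.full(shape, UNCLASSIFIED, dtype=np.int32)
    for code, mask in oar_masks.items():
        labels[np.asarray(mask, dtype=bool)] = code
    # Targets are written last, so a voxel in both a target and an OAR
    # ends up with the target label.
    for code, mask in target_masks.items():
        labels[np.asarray(mask, dtype=bool)] = code
    return labels
```

Writing the target masks last is the simplest way to encode the precedence rule; if targets themselves overlap, the iteration order of `target_masks` decides the winner, which this sketch leaves to the caller.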

3.2. GAN model

We first divided each 3D CT image into 2D slices of 128 × 128 pixels. The generator used a single CT image slice to predict the dose distribution along that same plane without considering the vertical relationship between different slices. This process was repeated for every slice until a full 3D dose distribution was produced. Our training set consisted of all 2D slices from the 3D CT images for 130 patients, totaling 15,657 images. The CT images from the remaining 87 patients were used for out-of-sample evaluation.
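The slice-by-slice procedure can be sketched as below. The function `predict_volume` and the callable `predict_slice` are hypothetical names standing in for a trained generator, and the sketch assumes that the first array axis indexes the slices.

```python
import numpy as np

def predict_volume(ct_volume, predict_slice):
    """Apply a 2D predictor to every slice and restack the results into 3D.

    ct_volume: array of shape (num_slices, height, width)
    predict_slice: callable mapping one (height, width) slice to a dose slice
    """
    # Each slice is handled independently; no information is shared
    # between neighboring slices.
    dose_slices = [predict_slice(ct_volume[k]) for k in range(ct_volume.shape[0])]
    return np.stack(dose_slices, axis=0)
```

For example, `predict_volume(ct, generator)` on a `(num_slices, 128, 128)` array returns a dose volume of the same shape whenever `generator` preserves the slice dimensions.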

Our GAN learning model was built on the pix2pix style transfer architecture of Isola et al. (2017). We used a U-net generator that passed a 2D contoured CT image slice through consecutive convolution layers, a bottleneck layer, and then through several deconvolution layers. The U-net also employed skip connections, i.e., the output of each convolution layer was concatenated to the input of a corresponding deconvolution layer. This allowed the generator to easily pass “high dimensional” information (e.g., structural outlines) between the inputted CT image slice and the outputted dose slice. The discriminator passed a 2D slice of the dose distribution along several consecutive convolution layers, outputting a single scalar value. In the training phase, the discriminator received one real and one generated dose distribution before backpropagation. We disconnected the discriminator after training, at which point the generator only received a contoured CT slice. We refer the reader to Appendix A for additional details regarding the network architectures.
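The data flow of the skip connections can be illustrated with a weight-free NumPy sketch. Everything here is an assumption made for illustration: average pooling and nearest-neighbor repetition stand in for the learned convolution and deconvolution layers, and the functions `down`, `up`, and `tiny_unet` are hypothetical. It shows only how encoder outputs are saved and concatenated along the channel axis on the way back up, not the architecture in Appendix A.

```python
import numpy as np

def down(x):
    """Halve height and width by 2x2 average pooling (stand-in for a strided convolution)."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def up(x):
    """Double height and width by nearest-neighbor repetition (stand-in for a deconvolution)."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def tiny_unet(x, depth=3):
    """Encoder-decoder pass with skip connections on a (channels, height, width) array."""
    skips = []
    for _ in range(depth):
        skips.append(x)          # keep the feature map at this resolution
        x = down(x)
    for skip in reversed(skips):
        x = up(x)
        # Skip connection: concatenate the saved map along the channel axis.
        x = np.concatenate([x, skip], axis=0)
    return x
```

With a 3-channel 128 × 128 input and `depth=3`, the output has 12 channels at 128 × 128, since each decoder level appends 3 saved channels; in a learned network, a convolution after each concatenation would mix these channels.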

We used the loss function given by (1) with $\lambda = 90$, and trained using Adam (Kingma and Ba, 2014), with learning rate 0.0002 and $\beta_1 = 0.5$ and $\beta_2 = 0.999$ for 25 epochs. We used the default Adam settings from Isola et al. (2017), as they were proven to be good for a variety of different style transfer problems. While we swept through various values for $\lambda$ and the number of epochs, we found these default settings to be sufficient, with minimal subsequent improvement. We found it useful to stop training when the loss functions were roughly equal; if the loss from the $\ell_1$ penalty fell too low, the GAN began to simply memorize the dataset. The code for all experiments, along with the parameter settings, is provided at http://github.com/rafidrm/gancer.
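For readers unfamiliar with the optimizer, a single Adam update (Kingma and Ba, 2014) with the hyperparameters quoted above can be sketched in NumPy. The function `adam_step` is a hypothetical helper written for illustration, and the value of `eps` is an assumed common default; in practice a deep learning framework's built-in optimizer would be used.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.0002, beta1=0.5, beta2=0.999, eps=1e-8):
    """One Adam update of parameters theta given a gradient.

    m, v: running first and second moment estimates (start at zero)
    t: 1-based step counter used for bias correction
    """
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad ** 2
    # Bias-corrected moment estimates.
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

On the first step the bias correction makes the update size approximately equal to the learning rate regardless of the gradient scale, which is one reason the same settings transfer across problems.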

3.3. Plan generation

Predicted dose distributions were inputted into an IO pipeline to generate optimized plans. The IO model determined the weights of a parametric “forward” optimization model given a predicted dose distribution. The objective of the forward model was to minimize the sum of 65 objective functions: seven per OAR and three per target. Terms for the OARs included the mean dose, max dose, and the percentile (0.25, 0.50, 0.75, 0.90, and 0.975) above the maximum predicted dose to the OAR. Similarly, terms for the target included the maximum dose, average dose below prescription, and average dose above prescription. The complexity of the KBP-generated treatment plan was constrained to match the clinical treatment (Craft et al., 2007), where complexity represents a (convex) surrogate measure for the physical deliverability of a plan. We note that in reality, there are additional constraints in the IO pipeline that we omit for tractability. Thus, our notion of a deliverable plan does not include all physical constraints. Physical parameters for the optimization model were derived from A Computational Environment for Radiotherapy Research (Deasy et al., 2003). To replicate the clinical plans, all KBP-generated plans were delivered from nine equidistant coplanar beams at angles 0°, 40°, ..., 320°. We used Gurobi 7.5 to solve the inverse and forward optimization problems associated with the IO pipeline. Additional details of the IO model can be found in Babier et al. (2018a).

3.4. Baseline approaches

We compared our GAN approach to generating predicted dose distributions with several state-of-the-art techniques. We briefly describe the baseline approaches here.

• Bagging query (BQ): A look-up method identifies patients with similar geometries who have undergone radiation therapy and outputs their doses as predictions. This approach predicts dose volume histograms (DVHs), i.e., 2D summaries of the 3D dose delivered to specific targets and OARs (e.g., Wu et al. (2009); Babier et al. (2018a)).

• Generalized PCA (gPCA): A method combining PCA with linear regression using patient geometry features. Similar to BQ, this method also predicts DVHs (e.g., Yuan et al. (2012); Babier et al. (2018a)).



• Random forest (RF): Predicts dose to each voxel (3D dose prediction) using ten customized features based on patient geometry (inspired by McIntosh et al. (2017)). Additional details can be found in Appendix B.

• U-net (CNN): Predicts dose to each voxel in 2D slices from a CT image using a U-net convolutional neural network architecture (e.g., Nguyen et al. (2017)).

All baseline predictions were fed into the same IO pipeline as the GAN approach to ensure a fair comparison between deliverable plans.

4. Results

4.1. Sample generated dose distributions

We observed that the style transfer function mapping the CT image to the predicted dose distribution appeared easy to learn. This is because the GAN-generated dose distributions had the hallmarks of a deliverable plan, like the sharp dose gradients that are generated by individual beams. However, there were subtle deliverability characteristics that the GAN could not always identify. The optimization step enforced these physical deliverability constraints to correct for these idiosyncrasies. This result can be observed in Figure 3, where five sample slices of a clinical, predicted, and optimized plan are presented.

4.2. Clinical criteria satisfaction

We measured plan quality by evaluating how frequently the plans satisfied the standard clinical criteria for oropharyngeal cancer treatment plans; see Table 1. Clinicians commonly use criteria satisfaction as a metric to evaluate plan quality and approve a treatment plan after it satisfies a sufficient number of the criteria. Thus, each criterion (one per OAR and target) was measured on a pass-fail basis depending on whether the mean dose $D_{\text{mean}}$, maximum dose $D_{\max}$, or the dose to 99% of the volume of that structure $D_{99}$, was above or below a given threshold. To facilitate the comparisons, we scaled the GAN and baseline treatment plans so that their PTV $D_{99}$ was equal to the PTV $D_{99}$ of the corresponding clinical plan.
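The pass-fail evaluation and the $D_{99}$ scaling can be sketched in NumPy as follows. The helper names `d99`, `scale_to_clinical`, and `passes` are hypothetical, and the sketch assumes equal-volume voxels so that $D_{99}$ can be read off as the 1st percentile of the voxel doses in a structure.

```python
import numpy as np

def d99(dose):
    """Dose received by at least 99% of the voxels, i.e., the 1st percentile."""
    return np.percentile(dose, 1)

def scale_to_clinical(kbp_dose, kbp_ptv_dose, clinical_ptv_dose):
    """Rescale a KBP dose array so its PTV D99 matches the clinical PTV D99."""
    return kbp_dose * d99(clinical_ptv_dose) / d99(kbp_ptv_dose)

def passes(dose, stat, threshold, upper_bound=True):
    """Evaluate one criterion on the voxel doses of one structure.

    stat: "mean", "max", or "d99"
    upper_bound: True for criteria of the form stat <= threshold (OARs),
                 False for criteria of the form stat >= threshold (targets)
    """
    value = {"mean": np.mean, "max": np.max, "d99": d99}[stat](dose)
    return bool(value <= threshold) if upper_bound else bool(value >= threshold)
```

For instance, `passes(brainstem_dose, "max", 54.0)` would check a 54 Gy maximum-dose limit, and `passes(ptv70_dose, "d99", 66.5, upper_bound=False)` a 66.5 Gy coverage requirement, where `brainstem_dose` and `ptv70_dose` are placeholder arrays of voxel doses.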

Table 2 presents the percentage of the GAN and baseline treatment plans that satisfied the clinical criteria. We note that clinically acceptable plans typically cannot satisfy all criteria simultaneously because of the proximity of the targets to the OARs and the complexity of the head-and-neck site in general. We observed that the BQ and gPCA plans tended to satisfy PTV criteria more frequently, which suggested that they may recommend delivering a higher dose to the target relative to the clinical plan. However, they failed to achieve mean and maximum dose criteria to the OARs (note: there are more than triple the number of OAR criteria as PTV criteria once all plans are normalized to $D_{99}$ of the PTV70). On the other hand, the RF plans appeared to satisfy fewer clinical criteria associated with the target as compared to the clinical plans. The CNN plans achieved the closest level of performance to the clinical plans. However, the GAN plans had the best overall performance among all approaches. They offered a balanced trade-off between the OARs and targets, and even outperformed the clinical plans on clinical criteria satisfaction.

The previous results focused on pass-fail performance with respect to the clinical criteria. We also examined the magnitude of passing or failing via head-to-head comparisons of the GAN/baseline plans to the clinical plans, and between the GAN and CNN plans (see Figure 4). The x-axis in each figure is the difference in Gray (Gy) between the KBP and the clinical plans (KBP minus clinical) for the criterion on the corresponding y-axis. We found that for each criterion, the majority of GAN plans outperformed their clinical counterparts by several Gy (Figure 4(e)). This is a significant result given that the clinical plans were heavily optimized and delivered to actual patients. The BQ, gPCA, and RF plans displayed substantial variability in performance when compared to the clinical plan. Consistent with Table 2, performance of the CNN plans was closest to the GAN plans although, as shown in Figure 4(f), the GAN plans maintained a small, yet consistent, advantage.

Figure 3: Sample of slices from a test patient. From top to bottom: contoured CT image (generator input), clinical plan (ground truth), GAN prediction, and GAN plan (post optimization).

Finally, we compared the KBP plans against the clinical plans using the gamma passing rate (GPR) metric. GPR measures the similarity between two dose distributions on a voxel-by-voxel basis, computing, for each voxel, a pass-fail test. We considered the standard choice of GPR, i.e., a 3%/3 mm tolerance (Low et al., 1998), which roughly means a voxel in the evaluated dose distribution (KBP) “passes” if there is at least one voxel in the reference dose distribution (clinical) within 3 mm that receives a dose that is within ±3% of the reference dose. Table 3 summarizes the average GPR achieved over all KBP-generated plans. A score of 1.0 means that every voxel has passed the criteria; in other words, the two dose distributions were considered identical (within the tolerance). Overall, we observed that the GAN plans generated dose distributions that most closely resembled the clinical dose distributions, followed by the CNN, and then the gPCA plans. Notably, the GAN dose distributions best resembled the clinical dose distribution around the target, which is of primary importance. The GAN plans performed less well on the OARs, but this result was expected given the results from Table 2, which indicated that the GAN plans achieved more OAR clinical criteria than the clinical plan (i.e., the GAN was able to deliver a lower dose to the OARs as compared to the clinical dose distribution).

Structure        Criteria
Brainstem        $D_{\max} \le 54$ Gy
Spinal Cord      $D_{\max} \le 48$ Gy
Right Parotid    $D_{\text{mean}} \le 26$ Gy
Left Parotid     $D_{\text{mean}} \le 26$ Gy
Larynx           $D_{\text{mean}} \le 45$ Gy
Esophagus        $D_{\text{mean}} \le 45$ Gy
Mandible         $D_{\max} \le 73.5$ Gy
PTV56            $D_{99} \ge 53.2$ Gy
PTV63            $D_{99} \ge 59.9$ Gy
PTV70            $D_{99} \ge 66.5$ Gy

Table 1: Clinical criteria used to evaluate all plans. $D_{\text{mean}}$ refers to the mean dose, $D_{\max}$ the maximum dose, and $D_{99}$ the dose to 99% of the structure.

                BQ       gPCA     RF       CNN      GAN      Clinical
OAR criteria    61.6%    65.8%    71.5%    72.5%    72.8%    72.0%
PTV criteria    83.5%    85.7%    68.0%    76.3%    81.3%    76.8%
All criteria    67.6%    71.2%    70.7%    73.6%    75.2%    73.3%

Table 2: Frequency of clinical criteria satisfaction.

                  BQ       gPCA     RF       CNN      GAN
All OARs          0.548    0.584    0.535    0.566    0.549
All PTVs          0.533    0.728    0.503    0.741    0.761
All Structures    0.536    0.669    0.518    0.670    0.675

Table 3: Average GPR for each population of KBP plans compared to clinical plans.
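The rough 3%/3 mm pass-fail test described in the text for the GPR can be sketched as a brute-force NumPy routine. This is a simplification for illustration and not the full gamma index of Low et al. (1998), which combines distance and dose difference into a single continuous measure; the function `simplified_gpr` and its arguments are hypothetical names.

```python
import itertools
import numpy as np

def simplified_gpr(evaluated, reference, spacing_mm, dist_mm=3.0, dose_tol=0.03):
    """Fraction of evaluated voxels that pass the simplified test.

    A voxel passes if some reference voxel within dist_mm has a dose such
    that the evaluated dose lies within dose_tol (relative) of it.
    spacing_mm: voxel size along each array axis, in millimeters.
    """
    evaluated = np.asarray(evaluated, dtype=float)
    reference = np.asarray(reference, dtype=float)
    # Integer voxel offsets whose physical length is at most dist_mm.
    reach = [int(dist_mm // s) for s in spacing_mm]
    offsets = [
        off for off in itertools.product(*[range(-r, r + 1) for r in reach])
        if np.sqrt(sum((k * s) ** 2 for k, s in zip(off, spacing_mm))) <= dist_mm
    ]
    passed = np.zeros(evaluated.shape, dtype=bool)
    for idx in np.ndindex(evaluated.shape):
        for off in offsets:
            j = tuple(i + k for i, k in zip(idx, off))
            if any(a < 0 or a >= n for a, n in zip(j, evaluated.shape)):
                continue  # neighbor falls outside the grid
            if abs(evaluated[idx] - reference[j]) <= dose_tol * reference[j]:
                passed[idx] = True
                break
    return float(passed.mean())
```

The loop is quadratic in the neighborhood size and is intended only for small arrays; restricting `evaluated` and `reference` to the voxels of one structure would give a per-structure rate of the kind reported in Table 3.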

5. Discussion and Future Work

In this paper, we proposed the first GAN-based KBP method to generate radiation therapy treatment plans. We trained our complete pipeline on 130 patients, tested on 87 out-of-sample patients diagnosed with oropharyngeal cancer, and compared our technique with several state-of-the-art planning methods including a query-based approach, a PCA-based method, a random forest, and a CNN. All methods were evaluated on standard clinical criteria for plan evaluation (i.e., OAR sparing and target coverage), showing that the GAN plans outperformed all baseline KBP methods. We also demonstrated that the GAN plans outperformed the clinical plans by satisfying additional criteria on OAR dose sparing and target dose coverage. Finally, we used the gamma passing rate, a standard metric in the radiation therapy literature, to evaluate the similarity of the full 3D dose distribution between the KBP and clinical plans, demonstrating that the GAN plans were the most similar to clinical plans on average. Note that the performance of automated planning methods should be measured based on their ability to re-create clinical quality plans with minimal manual effort. Of course, if the auto-generated plans manage to improve upon the clinical plans, that would be even better.

Figure 4: Head-to-head comparisons: (a)–(e) the plans from each KBP-generated model versus their clinical counterparts where positive difference implies the KBP-generated plans were better; (f) the plans from the GAN versus the CNN. Upper and lower boundaries of each box represent the 75th and 25th percentiles respectively, and the vertical line in the box depicts the median. Whiskers extend to 1.5 times the interquartile range. The line across each plot provides a reference for zero difference. Panels: (a) BQ - clinical, (b) gPCA - clinical, (c) RF - clinical, (d) CNN - clinical, (e) GAN - clinical, (f) GAN - CNN. The x-axis of each panel is the clinical criteria difference (Gy), with ticks from -10 to 20; the y-axis lists the structures Brainstem, Spinal Cord, Right Parotid, Left Parotid, Larynx, Esophagus, Mandible, PTV56, PTV63, and PTV70.

Our approach eschews the classical paradigm of predicting low-dimensional representations, or engineering features, by training a generic neural network to learn desirable dose distributions. Specifically, the GAN recasts KBP prediction as an image colorization problem. Moreover, the GAN is trained by mimicking the iterative process between the treatment planner and oncologist; the generator network acts as the treatment planner by designing dose distributions while the discriminator acts as the oncologist by determining whether the plans are good or bad. The implication is that selecting the appropriate neural network architecture may be sufficient when creating an automated KBP pipeline that generates deliverable plans. Further, our approach does not add site-specific feature variables, which suggests that the good performance we observe may not be limited to patients with oropharyngeal cancer. Finally, since the GAN plans improve upon the clinical plans, it may be useful to analyze the results to generate useful insights for practitioners.

We envision two interesting directions for future work. First, we plan to explore how GANs can develop treatment plans for different cancer sites. By adding site labels, we expect that a GAN can learn from the augmented training set of different cancer sites to better develop plans for specific sites. Second, we hope to automate the preprocessing stage by using uncontoured CT images. As neural networks show increasing promise for automated image segmentation (i.e., tumor and healthy organ identification), we hope to leverage this work to improve our treatment plan prediction model.

Acknowledgments

This study was approved by the institutional research ethics board. Support for this research was provided by the Natural Sciences and Engineering Research Council of Canada.

References

R. K. Ahuja and J. B. Orlin. Inverse optimization. Operations Research, 49(5):771–783, 2001.

V. Alex, M. S. KP, S. S. Chennamsetty, and G. Krishnamurthi. Generative adversarial networks for brain lesion detection. In Medical Imaging 2017: Image Processing, volume 10133, page 101330G. International Society for Optics and Photonics, 2017.

L. M. Appenzoller, J. M. Michalski, W. L. Thorstad, S. Mutic, and K. L. Moore. Predicting dose-volume histograms for organs-at-risk in imrt planning. Medical physics, 39(12):7446–7461, 2012.

A. Babier, J. J. Boutilier, A. L. McNiven, and T. C. Y. Chan. Knowledge-based automated planning for oropharyngeal cancer. Med Phys, 45(7):2875–2883, Jul 2018a.

A. Babier, J. J. Boutilier, M. B. Sharpe, A. L. McNiven, and T. C. Y. Chan. Inverse optimization of objective function weights for treatment planning using clinical dose-volume histograms. Phys Med Biol, 63(10):105004, May 2018b.

T. C. Y. Chan and T. Lee. Trade-off preservation in inverse multi-objective convex optimization. accepted to European Journal of Operations Research, 2018.

T. C. Y. Chan, T. Craig, T. Lee, and M. B. Sharpe. Generalized inverse multiobjective optimization with application to cancer therapy. Oper. Res., 62(3):680–95, 2014.



E. Choi, S. Biswal, B. Malin, J. Duke, W. F. Stewart, and J. Sun. Generating multi-labeldiscrete electronic health records using generative adversarial networks. arXiv preprintarXiv:1703.06490, 2017.

D. Craft, P. Suss, and T. Bortfeld. The tradeo↵ between treatment plan quality and requirednumber of monitor units in intensity-modulated radiotherapy. Int. J. Radiat. Oncol. Biol.Phys., 67(5):1596–605, 2007.

I. J. Das, V. Moskvin, and P. A. Johnstone. Analysis of treatment planning time amongsystems and planners for intensity-modulated radiation therapy. J Am Coll Radiol, 6(7):514–7, Jul 2009. doi: 10.1016/j.jacr.2008.12.013.

J. O. Deasy, A. I. Blanco, and V. H. Clark. CERR: a computational environment forradiotherapy research. Med. Phys., 30(5):979–85, 2003.

G. Delaney, S. Jacob, C. Featherstone, and M. Barton. The role of radiotherapy in cancertreatment. Cancer, 104(6):1129–1137, 2005.

C. Esteban, S. L. Hyland, and G. Ratsch. Real-valued (medical) time series generation withrecurrent conditional gans. arXiv preprint arXiv:1706.02633, 2017.

M. Frid-Adar, I. Diamant, E. Klang, M. Amitai, J. Goldberger, and H. Greenspan. Gan-based synthetic medical image augmentation for increased cnn performance in liver lesionclassification. arXiv preprint arXiv:1803.01229, 2018.

I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Advances in neural information processing systems, pages 2672–2680, 2014.

R. Hermoza and I. Sipiran. 3d reconstruction of incomplete archaeological objects using a generative adversary network. arXiv preprint arXiv:1711.06363, 2017.

P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros. Image-to-image translation with conditional adversarial networks. arXiv preprint, 2017.

A. Kadurin, S. Nikolenko, K. Khrabrov, A. Aliper, and A. Zhavoronkov. druGAN: An Advanced Generative Adversarial Autoencoder Model for de Novo Generation of New Molecules with Desired Molecular Properties in Silico. Molecular Pharmaceutics, 14(9):3098–3104, 2017.

D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

D. A. Low, W. B. Harms, S. Mutic, and J. A. Purdy. A technique for the quantitative evaluation of dose distributions. Medical physics, 25(5):656–661, 1998.

C. McIntosh and T. G. Purdie. Voxel-based dose prediction with multi-patient atlas selection for automated radiotherapy treatment planning. Phys Med Biol, 62(2):415–431, Jan 2017. doi: 10.1088/1361-6560/62/2/415.

C. McIntosh, M. Welch, A. McNiven, D. A. Jaffray, and T. G. Purdie. Fully automated treatment planning for head and neck radiotherapy using a voxel-based dose prediction and dose mimicking method. Phys. Med. Biol., 62(15):5926–5944, 2017.

D. Nguyen, T. Long, X. Jia, W. Lu, X. Gu, Z. Iqbal, and S. Jiang. Dose prediction with u-net: A feasibility study for predicting dose distributions from contours using deep learning on prostate imrt patients. arXiv preprint arXiv:1709.09233, 2017.

K. Petersson, P. Nilsson, P. Engstrom, T. Knoos, and C. Ceberg. Evaluation of dual-arc vmat radiotherapy treatment plans automatically generated via dose mimicking. Acta Oncologica, 55(4):523–525, 2016.

P. Sangkloy, J. Lu, C. Fang, F. Yu, and J. Hays. Scribbler: Controlling deep image synthesis with sketch and color. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), volume 2, 2017.

M. B. Sharpe, K. L. Moore, and C. G. Orton. Within the next ten years treatment planning will become fully automated without the need for human intervention. Medical physics, 41(12), 2014.

S. Shiraishi and K. L. Moore. Knowledge-based prediction of three-dimensional dose distributions for external beam radiotherapy. Med. Phys., 43(1):378, 2016.

S. Shiraishi, J. Tan, L. A. Olsen, and K. L. Moore. Knowledge-based prediction of plan quality metrics in intracranial stereotactic radiosurgery. Med. Phys., 42(2):908, 2015.

N. Wang, W. Zha, J. Li, and X. Gao. Back projection: an effective postprocessing method for gan-based face sketch synthesis. Pattern Recognition Letters, 2017.

B. Wu, F. Ricchetti, G. Sanguineti, M. Kazhdan, P. Simari, M. Chuang, R. Taylor, R. Jacques, and T. McNutt. Patient geometry-driven information retrieval for IMRT treatment plan quality control. Med. Phys., 36(12):5497–505, 2009.

B. Wu, F. Ricchetti, G. Sanguineti, M. Kazhdan, P. Simari, R. Jacques, R. Taylor, and T. McNutt. Data-driven approach to generating achievable dose-volume histogram objectives in intensity-modulated radiotherapy planning. Int. J. Radiat. Oncol. Biol. Phys., 79(4):1241–7, 2011.

B. Wu, M. Kusters, M. Kunze-Busch, T. Dijkema, T. McNutt, G. Sanguineti, K. Bzdusek, A. Dritschilo, and D. Pang. Cross-institutional knowledge-based planning (KBP) implementation and its performance comparison to auto-planning engine (APE). Radiother. Oncol., 123(1):57–62, 2017.

J. Wu, C. Zhang, T. Xue, B. Freeman, and J. Tenenbaum. Learning a probabilistic latent space of object shapes via 3d generative-adversarial modeling. In Advances in Neural Information Processing Systems, pages 82–90, 2016.

T. Yang, E. C. Ford, B. Wu, M. Pinkawa, B. van Triest, P. Campbell, D. Y. Song, and T. R. McNutt. An overlap-volume-histogram based method for rectal dose prediction and automated treatment planning in the external beam prostate radiotherapy following hydrogel injection. Med. Phys., 40(1):011709, 2013.

K. C. Younge, R. B. Marsh, D. Owen, H. Geng, Y. Xiao, D. E. Spratt, J. Foy, K. Suresh, Q. J. Wu, F. Yin, S. Ryu, and M. M. Matuszak. Improving quality and consistency in nrg oncology radiation therapy oncology group 0631 for spine radiosurgery via knowledge-based planning. Int J Radiat Oncol Biol Phys, 100(4):1067–1074, Mar 2018. doi: 10.1016/j.ijrobp.2017.12.276.

L. Yuan, Y. Ge, W. R. Lee, F. F. Yin, J. P. Kirkpatrick, and Q. J. Wu. Quantitative analysis of the factors which affect the interpatient organ-at-risk dose sparing variation in IMRT plans. Med. Phys., 39(11):6868–78, 2012.

J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. arXiv preprint arXiv:1703.10593, 2017.

X. Zhu, Y. Ge, T. Li, D. Thongphiew, F. Yin, and Q. J. Wu. A planning quality evaluation tool for prostate adaptive IMRT based on machine learning. Med. Phys., 38(2):719–26, 2011.
