home | cancer immunology research - pvactools: …...cancer vaccines. this is a cross-disciplinary...

CANCER IMMUNOLOGY RESEARCH | RESEARCH ARTICLE

pVACtools: A Computational Toolkit to Identify andVisualize Cancer Neoantigens A C

Jasreet Hundal1, Susanna Kiwala1, Joshua McMichael1, Christopher A. Miller1,2,3, Huiming Xia1,Alexander T.Wollam1, Connor J. Liu1, Sidi Zhao1, Yang-Yang Feng1, Aaron P. Graubert1, Amber Z.Wollam1,Jonas Neichin1, Megan Neveau1, Jason Walker1, William E. Gillanders3,4, Elaine R. Mardis5,Obi L. Griffith1,2,3,6, and Malachi Griffith1,2,3,6

ABSTRACT◥

Identification of neoantigens is a critical step in predictingresponse to checkpoint blockade therapy and design of personalizedcancer vaccines. This is a cross-disciplinary challenge, involvinggenomics, proteomics, immunology, and computational approach-es. We have built a computational framework called pVACtoolsthat, when paired with a well-established genomics pipeline,produces an end-to-end solution for neoantigen characteriza-tion. pVACtools supports identification of altered peptidesfrom different mechanisms, including point mutations, in-frame and frameshift insertions and deletions, and gene fusions.Prediction of peptide:MHC binding is accomplished by support-ing an ensemble of MHC Class I and II binding algorithmswithin a framework designed to facilitate the incorporation ofadditional algorithms. Prioritization of predicted peptides occursby integrating diverse data, including mutant allele expression,peptide binding affinities, and determination whether a mutation

is clonal or subclonal. Interactive visualization via a Web inter-face allows clinical users to efficiently generate, review, andinterpret results, selecting candidate peptides for individualpatient vaccine designs. Additional modules support designchoices needed for competing vaccine delivery approaches. Onesuch module optimizes peptide ordering to minimize junctionalepitopes in DNA vector vaccines. Downstream analysis com-mands for synthetic long peptide vaccines are available toassess candidates for factors that influence peptide synthesis.All of the aforementioned steps are executed via a modularworkflow consisting of tools for neoantigen prediction fromsomatic alterations (pVACseq and pVACfuse), prioritization,and selection using a graphical Web-based interface (pVACviz),and design of DNA vector–based vaccines (pVACvector)and synthetic long peptide vaccines. pVACtools is available athttp://www.pvactools.org.

IntroductionThe increasing use of cancer immunotherapies has spurred interest

in identifying and characterizing predicted neoantigens encoded by atumor's genome. The facility and precision of computational toolsfor predicting neoantigens have become increasingly important (1),and several resources aiding in these efforts have been published (2–4).Typically, these tools start with a list of somatic variants [in VariantCall Format (VCF) or other formats such as Browser Extensible

Data (BED)] with annotated protein changes and predict thestrongest MHC-binding peptides (8–11-mer for Class I MHC and13–25-mer for Class II) using one or more predictionalgorithms (5–7). The predicted neoantigens are then filtered andranked based on defined metrics, including sequencing read cov-erage, variant allele fraction (VAF), gene expression, and differen-tial binding compared with the wild-type peptide (agretopicityindex score; ref. 8). However, of the prediction tools overviewedin Supplementary Table S1 [Vaxrank (3), MuPeXI (2), Cloud-Neo (4), FRED2 (9), Epi-Seq (8), and ProTECT (10)], most lackkey functionality, including predicting neoantigens from genefusions, aiding optimized vaccine design for DNA cassette vaccines,and explicitly incorporating nearby germline or somatic alterationsinto the candidate neoantigens (11). In addition, none of these toolsoffer an intuitive graphical user interface for visualizing and effi-ciently selecting the most promising candidates—a key feature forfacilitating involvement of clinicians and other researchers in theprocess of neoantigen prediction and evaluation.

To address these limitations, we created a comprehensive andextensible toolkit for computational identification, selection, prioriti-zation, and visualization of neoantigens—pVACtools—which faci-litates each of the major components of neoantigen identificationand prioritization. This computational framework can be used toidentify neoantigens from a variety of somatic alterations, includinggene fusions and insertion/deletion frameshift variants, both ofwhich potentially create strong immunogenic neoantigens (12–14).pVACtools can facilitate both MHC Class I and II predictions andprovides an interactive display of predicted neoantigens for review bythe end user.

1McDonnell Genome Institute, Washington University School of Medicine, St.Louis, Missouri. 2Division of Oncology, Department of Medicine, WashingtonUniversity School of Medicine, St. Louis, Missouri. 3Siteman Cancer Center,Washington University School of Medicine, St. Louis, Missouri. 4Department ofSurgery,WashingtonUniversity School ofMedicine, St. Louis, Missouri. 5Institutefor Genomic Medicine, Nationwide Children's Hospital, Columbus, Ohio.6Department of Genetics, Washington University School of Medicine, St. Louis,Missouri.

Note: Supplementary data for this article are available at Cancer ImmunologyResearch Online (http://cancerimmunolres.aacrjournals.org/).

J. Hundal and S. Kiwala contributed equally to and are co-first authors of thisarticle.

Corresponding Authors: Malachi Griffith, Washington University School ofMedicine, Campus Box 8501, 4444 Forest Park Avenue, St. Louis, MO 63108.Phone: 314-286-1274; E-mail: [email protected]; andObi L. Griffith. Phone: 314-747-9248; E-mail: [email protected]

Cancer Immunol Res 2020;8:409–20

doi: 10.1158/2326-6066.CIR-19-0401

�2020 American Association for Cancer Research.

AACRJournals.org | 409

on December 11, 2020. © 2020 American Association for Cancer Research. cancerimmunolres.aacrjournals.org Downloaded from

Published OnlineFirst January 6, 2020; DOI: 10.1158/2326-6066.CIR-19-0401

http://crossmark.crossref.org/dialog/?doi=10.1158/2326-6066.CIR-19-0401&domain=pdf&date_stamp=2020-2-11

http://crossmark.crossref.org/dialog/?doi=10.1158/2326-6066.CIR-19-0401&domain=pdf&date_stamp=2020-2-11

http://www.pvactools.org


http://cancerimmunolres.aacrjournals.org/

Materials and MethodsThe Cancer Genome Atlas data preprocessing

Aligned (build GRCh38) tumor and normal BAMs from BWA(ref. 15; version 0.7.12-r1039) as well as somatic variant calls fromVarScan2 (refs. 16, 17; in VCF format) were downloaded from theGenomic Data Commons (GDC, https://gdc.cancer.gov/). Becausethe GDC does not provide germline variant calls for The CancerGenome Atlas (TCGA) data, we used GATK's (18) HaplotypeCallerto perform germline variant calling using default parameters. Thesecalls were refined using VariantRecalibrator in accordance withGATK Best Practices (19). Somatic and germline missense variantcalls from each sample were then combined using GATK's Combi-neVariants, and the variants were subsequently phased usingGATK's ReadBackedPhasing algorithm.

Phased Somatic VCF files were annotated with RNA depth andexpression information using VAtools (http://vatools.org). Werestricted our analysis to only consider “PASS” variants in theseVCFs, as these are higher confidence than the raw set, and thevariants were annotated using the “--pick” option in VEP (Ensemblversion 88).

Existing in silico HLA typing information was obtained from TheCancer Immunome Atlas (TCIA) database (20).

Neoantigen predictionThe VEP-annotated VCF files were then analyzed with

pVACseq using all eight Class I prediction algorithms for neoanti-gen peptide lengths 8–11. The current MHC Class I algorithmssupported by pVACseq are NetMHCpan (21), NetMHC (7, 21),NetMHCcons (22), PickPocket (23), SMM (24), SMMPMBEC(25), MHCflurry (26), and MHCnuggets (27). The four MHCClass II algorithms that are supported are NetMHCIIpan (28),SMMalign (29), NNalign (30), and MHCnuggets (31). For thedemonstration analysis, we limited our prediction to only MHCClass I alleles due to availability of HLA typing information fromTCIA, though binding predictions for Class II alleles can also begenerated using pVACtools.

Ranking of neoantigensTo help prioritize neoantigens, a rank is assigned to all neoantigens

that pass initial filters. Each of the following four criteria is assigned arank-ordered value (where the best is assigned rank ¼ 1):

B ¼ Rank of binding affinityF ¼ Rank of fold change between mutant and wild-type allelesM¼ Rank of mutant allele expression, calculated as (rank of gene

expression� rank ofmutant allele RNA variant allele fraction)D ¼ Rank of DNA variant allele fractionA final ranking is based on a score obtained from combining these

values: B þ F þ (M � 2) þ (D/2).Whereas agretopicity is considered in ranking, we do not rec-

ommend filtering candidates solely based on mutant versus wild-type binding value (default fold change value is 0). A previous studyon the binding properties of known neoantigen candidates showedthat most changes reside in T-cell receptor contact regions and notin anchor residues (32). To help end users make an informeddecision while selecting candidates, we also provide the positionof the variant in the final report. We do not rely on hard rules about“anchor positions” (e.g., position 2 and C-terminal) because thesemay vary by MHC allele and the peptide length. As we gain moreempirical results and validation data, we hope that future versions

of pVACtools may include more options for prioritizing based onthese factors.

The rank is not meant to be a definitive metric of peptide suitabilityfor vaccines, but was designed to be a useful first step in the peptideselection process.

ImplementationpVACtools is written in Python3. The individual tools are imple-

mented as separate command line entry points that can be run usingthe “pvacseq,” “pvacfuse,” “pvacvector,” “pvacapi,” and “pvacviz”commands for each respective tool. pVACapi is required to runpVACviz so both the “pvacapi” and “pvacviz” commands need to beexecuted in separate terminals.

The code test suite is implemented using the Python unittestframework, and GitHub integration tests are run using travis-ci(https://travis-ci.org). Code changes are integrated using GitHubpull requests (https://github.com/griffithlab/pVACtools/pulls). Fea-ture additions, user requests, and bug reports are managed using theGitHub issue tracking (https://github.com/griffithlab/pVACtools/issues). User documentation is written using the reStructuredTextmarkup language and the Sphinx documentation framework(https://www.sphinx-doc.org/en/master). Documentation is hostedon Read the Docs (https://readthedocs.org) and can be viewed athttp://www.pvactools.org.

Multithreading and parallelizationAs many prediction algorithms are CPU intensive, pVACseq,

pVACfuse, and pVACvector support using multiple cores to improveruntime. Using this feature, calls to the Immune Epitope Database(IEDB) and other prediction algorithms are made in parallel over auser-defined number of processes.

The pymp-pypi package was used to add support for parallelprocessing. The number of processes is controlled by the --n-threadsparameter.

pVACseqFor pVACseq, the pyvcf package is first used for parsing the input

VCF file and extracting information about the supported missense, in-frame insertion, in-frame deletion, and frameshift variants into TSVformat.

This output is then used to determine the wild-type peptidesequence by extracting a region around the somatic variant accordingto the --peptide-sequence-length specified by the user. The variant'samino acid change is incorporated in this peptide sequence to deter-mine the mutant peptide sequence. For frameshift variants, the newdownstream protein sequence calculated by VEP is reported from thevariant position onward. The number of downstream amino acids toinclude is controlled by the --downstream-sequence-length para-meter. If a phased VCF with proximal variants is provided, proximalmissense variants that are in phase with the somatic variant of interestare incorporated into the mutant and wild-type peptide sequences, asappropriate. The mutant and wild-type sequences are stored in aFASTA file. The FASTA file is then submitted to the individualprediction algorithms for binding affinity predictions. For algorithmsincluded in the IEDB, we use either the IEDB API or a standaloneinstallation, if an installation path is provided by the user (--iedb-install-directory). Themhcflurry andmhcnuggets packages are used torun theMHCflurry (26) andMHCnuggets (27) prediction algorithms,respectively.

The predicted mutant antigens are then parsed into a TSV reportformat, and, for each mutant antigen, the closest wild-type antigen is

Hundal et al.

Cancer Immunol Res; 8(3) March 2020 CANCER IMMUNOLOGY RESEARCH410



https://gdc.cancer.gov/

http://vatools.org

https://travis-ci.org

https://github.com/griffithlab/pVACtools/pulls

https://github.com/griffithlab/pVACtools/issues



https://www.sphinx-doc.org/en/master

https://readthedocs.org




determined and reported. Predictions for each mutant antigen/neoan-tigen from multiple algorithms are aggregated into the “best” (lowest)and median binding scores. The resulting TSV is processed throughmultiple filtering steps (as described below in Results: Comparison offiltering criteria): (i) Binding filter: this filter selects the strongestbinding candidates based on the mutant binding score and the foldchange (wild-type score/mutant score). Depending on the --top-score-metric parameter setting, thisfilter is applied to either themedian scoreacross all chosen prediction algorithms (default) or the best (lowest)score among the chosen prediction algorithms. (ii) Coverage filter:this filter accepts VAF and coverage information from the tumorDNA, tumor RNA, and normal DNA if these values are available in theinput VCF. (iii) Transcript-support-level (TSL) filter: this filterevaluates each transcript's support level if this information wasprovided by VEP in the VCF. (iv) Top-score filter: the filter picks thetop mutant peptide for each variant using the binding affinity asthe determining factor. This filter is implemented to only select thebest candidate from amongmultiple candidates that could result froma single variant due to different peptide lengths, variant registers,transcript sequences, andHLA alleles. The result of these filtering stepsis reported in a filtered report TSV. The remaining neoantigens arethen annotated with cleavage site and stability predictions byNetChopandNetMHCStabPan, respectively, and a relative rank (as described inthe Materials and Methods: Ranking of neoantigens) is assigned. Therank-ordered final output is reported in a condensed file. The pandaspackage is used for data management while filtering and ranking theneoantigen candidates.

pVACvectorWhen running pVACvector with a pVACseq output file, the

original input VCF must also be provided (--input-vcf parameter).The VCF is used to extract a larger peptide sequence around the targetneoantigen (length determined by the --input-n-mer parameter).Alternatively, a list of target peptide sequences can be provided in aFASTA file. The set of peptide sequences is then combined in allpossible pairs, and an ordering of peptides for the vector is produced asfollows:

To determine the optimal order of peptide–spacer–peptide combi-nations, binding predictions are made for all peptide registers over-lapping the junction. A directed graph is then constructed, with nodesdefined as target peptides, and edges representing junctions. The scoreof each edge is defined as the lowest binding score of its junctionalpeptides (a conservativemetric). Edgeswith scores below the thresholdare removed, and if heuristics indicate that a valid graph may exist, asimulated annealing procedure is used to identify a path through thenodes that maximizes junction scores (preserving the weakest overallpredicted binding for junctional epitopes). If no valid ordering isfound, additional “spacer” amino acids are added to each junction,binding affinities are recalculated, and a new graph is constructed andtested, setting edge weights equal to that of the best performing(highest binding score) peptide–junction–spacer combination.

The spacers used for pVACvector are set by the user with the--spacers parameter. This parameter defaults to None, AAY, HHHH,GGS, GPGPG, HHAA, AAL, HH, HHC, HHH, HHHD, HHL, andHHHC, where None is the placeholder for testing junctions without aspacer sequence. Spacers are tested iteratively, starting with the firstspacer in the list and adding subsequent spacers if no valid path isfound. If the user wishes to specify a custom spacer such as the furincleavage site (e.g., R-X-(R/K)-R), they can do so (e.g., --spacers RKKR).

If no result is found after testing with the full set of spacers, the endsof “problematic” peptides, where all junctions contain at least onewell-

binding epitope, will be clipped by removing one amino acid at a timeand then repeating the above binding and graph-building process. Thisclipping may be repeated up to the number of times specified in the--max-clip-length parameter.

It may be necessary to explore the parameter space when runningpVACvector. As binding predictions for some sites vary substantiallyacross algorithms, themost conservative settingsmay result in no validpaths, often due to one "outlier" prediction. Carefully selecting whichpredictors to run may help ameliorate this issue as well. In general,setting a higher binding threshold (e.g., 1,000 nmol/L) and usingthe median binding value (--top-score-metric median) will lead togreater possibility of a design, whereas more conservative settings of500 nmol/L and lowest/best binding value (--top-score-metric lowest)will give more confidence that there are no junctional neoepitopes.

Our current recommendation is to run pVACvector several differ-ent ways and choose the path resulting from the most conservative setof parameters that produces a vector design containing all candidateneoantigens.

pVACapi and pVACvizpVACapi is implemented using the Python libraries Flask and

Bokeh. The pVACviz client is written in TypeScript using the AngularWeb application framework, theClarityUI component library, and thengrx library for managing application state.

Data availabilityData from 100 cases each of melanoma, hepatocellular carcinoma,

and lung squamous cell carcinoma were obtained from TCGA (33)and downloaded via the GDC. This data can be accessed under dbGaPstudy accession phs000178. Data for demonstration and analysisof fusion neoantigens were downloaded from the GitHub repofor Integrate (https://github.com/ChrisMaherLab/INTEGRATE-Vis/tree/master/example).

Software availabilityThe pVACtools codebase is hosted publicly on GitHub at https://

github.com/griffithlab/pVACtools and https://github.com/griffithlab/BGA-interface-projects (pVACviz). User documentation is availableat http://www.pvactools.org. This project is licensed under the Non-Profit Open Software License version 3.0 (NPOSL-3.0, https://opensource.org/licenses/NPOSL-3.0). pVACtools has been packaged anduploaded to PyPI under the “pvactools” package name and can beinstalled on Linux systems by running the “pip install pvactools”command. Installation requires a Python 3.5 environment whichcan be emulated by using Conda. Versioned Docker images are avail-able on DockerHub (https://hub.docker.com/r/griffithlab/pvactools/).Releases are also made available on GitHub (https://github.com/griffithlab/pVACtools/releases).

Example pipeline for creation of pVACtools input filespVACtools was designed to support a standard VCF variant file

format and thus should be compatible with many existing variantcalling pipelines. However, as a reference, we provide the followingdescription of our current somatic and expression analysis pipeline(manuscript in preparation) which has been implemented usingdocker, common workflow language (CWL; ref. 34), and Crom-well (35). The pipeline consists of workflows for alignment of exomeDNA and RNA sequencing (RNA-seq) data, somatic and germlinevariant detection, RNA-seq expression estimation, and HLA typing.

This pipeline begins with raw patient tumor exome and cDNAcapture (36) or RNA-seq data and leads to the production of annotatedVCFs for neoantigen identification and prioritizationwith pVACtools.

pVACtools: Toolkit for Tumor Neoantigen Characterization

AACRJournals.org Cancer Immunol Res; 8(3) March 2020 411



https://github.com/ChrisMaherLab/INTEGRATE-Vis/tree/master/example



https://github.com/griffithlab/pVACtools



https://github.com/griffithlab/BGA-interface-projects



http://www.pvactools.org.

https://opensource.org/licenses/NPOSL-3.0



https://hub.docker.com/r/griffithlab/pvactools/

https://github.com/griffithlab/pVACtools/releases




Our pipeline consists of several modular components: DNA align-ment, HLA typing, germline and somatic variant detection, variantannotation, and RNA-seq analysis. More specifically, we use BWA-MEM (15) for aligning the patient's tumor and normal exome data.For germline variant calling, the output BAM then undergoesmerging [merging of separate instrument data; Samtools (37) Merge],query name sorting (Picard SortSam), duplicate marking (PicardMarkDuplicates), position sorting (Picard SortSam), and base qualityrecalibration (GATKBaseRecalibrator). GATK'sHaplotypeCaller (18)is used for germline variant calling, and the output variants wereannotated using VEP (38) and filtered for coding sequence variants.

For somatic variant calling, the pipeline is consistent with aboveexcept that the variant calling step combines the output of four variantdetection algorithms: Mutect2 (39), Strelka (40), Varscan (16, 17), andPindel (41). The combined variants are normalized using GATK'sLeftAlignAndTrimVariants to left-align the indels and trim commonbases. Vt (42) is used to split multiallelic variants. Several filters such asgnomAD allele frequency (maximum population allele frequency),percentage of mapq0 reads, as well as pass-only variants are appliedprior to annotation of the VCF using VEP. We use a combinationof custom and standard plugins for VEP annotation (parameters:--format VCF --plugin Downstream --plugin Wildtype --symbol--term SO --transcript_version --tsl --coding_only --flag_pick --hgvs).Variant coverage is assessed using bam-readcount (https://github.com/genome/bam-readcount) for both the tumor and normal DNA

exome data, and this information is also annotated into the VCFoutput using VAtools (http://vatools.org).

Our pipeline also generates a phased-VCF file by combining boththe somatic and germline variants and running the sorted combinedvariants through GATK ReadBackedPhasing.

For RNA-seq data, the pipeline first trims adapter sequences usingFlexbar (43) and aligns the patient's tumor RNA-seq data usingHISAT2 (44). Two different methods, Stringtie (45) and Kallisto (46),are employed for evaluating both the transcript and gene expressionvalues. In addition, coverage support for variants in RNA-seq data isdetermined with bam-readcount. This information is added to theVCF using VAtools and serves as an input for neoantigen prioritiza-tion with pVACtools.

Optionally, our pipeline can also run HLA-typing in silico usingOptiType (47) when clinical HLA typing is not available.

We aim to support various versions of these pipelines (via the use ofstandardized file formats) and are also actively developing an end-to-end public CWL workflow starting from sequence reads to pVACseqoutput (https://github.com/genome/analysis-workflows/blob/master/definitions/pipelines/immuno.cwl).

ResultsThe pVACtools workflow (Fig. 1) is divided into modular compo-

nents that can be run independently. The main tools in the workflow

Figure 1.

Overview of pVACtools workflow. The pVACtools workflow is highly modularized and is divided into flexible components that can be run independently. The maintools under the workflow include pVACseq for identifying and prioritizing neoantigens from a variety of somatic alterations (red inset box); pVACfuse (green) fordetecting neoantigens resulting from gene fusions; pVACviz (blue) for process management, visualization, and selection of results; and pVACvector (orange) foroptimizing design of neoantigens and nucleotide spacers in aDNAvector. All of these tools interact via the pVACapi (purple), anOpenAPI HTTPREST interface to thepVACtools suite.

Hundal et al.




https://github.com/genome/bam-readcount



http://vatools.org

https://github.com/genome/analysis-workflows/blob/master/definitions/pipelines/immuno.cwl




are: (i) pVACseq: a significantly enhanced and reengineered version ofour previous tool (48) for identifying and prioritizing neoantigensfrom a variety of tumor-specific alterations, (ii) pVACfuse: a tool fordetecting neoantigens resulting from gene fusions, (iii) pVACviz: agraphical user interface Web client for process management, visual-ization, and selection of results from pVACseq, (iv) pVACvector: atool for optimizing design of neoantigens and nucleotide spacers ina DNA vector that prevents high-affinity junctional neoantigens, and(v) pVACapi: an OpenAPI HTTP REST interface to the pVACtoolssuite.

pVACseqpVACseq (48) has been reimplemented in Python3 and extended to

include many new features since our initial report of its release.pVACseq no longer requires a custom input format for variants, andnow uses a standard VCF file annotated with VEP (38). In our ownneoantigen identification pipeline, this VCF is obtained by mergingresults frommultiple somatic variant callers and RNA expression tools(as described inMaterials andMethods: Example pipeline for creationof pVACtools input files). Information that is not natively available inthe VCF output from somatic variant callers (such as coverage andVAFs for RNA and DNA, as well as gene and transcript expressionvalues) can be added to the VCF using VAtools (http://vatools.org), asuite of accessory routines that we created to accompany pVACtools.pVACseq queries these features directly from the VCF, enablingprioritization and filtering of neoantigen candidates based on sequencecoverage and expression information. In addition, pVACseq nowmakes use of phasing information taking into account variants prox-imal to somatic variants of interest. Because proximal variants canchange the peptide sequence and affect neoantigen-binding predic-tions, this is important for ensuring that the selected neoantigenscorrectly represent the individual's genome (11).Wehave also expand-ed the supported variant types for neoantigen predictions to includein-frame indels and frameshift variants. These capabilities expand thepotential number of targetable neoantigens several-fold in manytumors (refs. 12, 14; Supplementary Fig. S1).

To prioritize neoantigens, pVACseq now offers support for eightdifferent MHC Class I antigen prediction algorithms and four MHCClass II prediction algorithms. The tool does this in part byleveraging the IEDB (49) and their suite of six different MHCClass I prediction algorithms, as well as three MHC Class IIalgorithms (as described in Materials and Methods: Neoantigenprediction). pVACseq supports local installation of these tools forhigh-throughput users, access through a docker container (https://hub.docker.com/r/griffithlab/pvactools), and ready-to-go accessvia the IEDB RESTful Web interface. In addition, pVACseq nowcontains an extensible framework for supporting new neoantigenprediction algorithms that has been used to add support fortwo new non-IEDB algorithms—MHCflurry (26) and MHCnuggets(27). By creating a framework that integrates many tools, we allowfor (i) a broader ensemble approach than IEDB and (ii) a systemthat other users can leverage to develop improved ensemble rank-ing, or to integrate proprietary or not-yet-public prediction soft-ware. This framework enables non-informatics expert users topredict neoantigens from sequence variant data sets.

Once neoantigens have been predicted, binding affinity values fromthe selected prediction algorithms are aggregated into a medianbinding score and a report file is generated. Finally, a rank is calculatedthat can be used to prioritize the predicted neoantigens. The rank takesinto account gene expression, sequence read coverage, binding affinitypredictions, and agretopicity (as described in the Materials and

Methods: Ranking of neoantigens). The pVACseq rank allows usersto pick the exemplar peptide for each variant from a large number oflengths, registers, alternative transcripts, and HLA alleles. In additionto applying strict binding affinity cutoffs, the pipeline also offerssupport for MHC allele-specific cutoffs (50). The allele-specific bind-ing filter incorporates allele-specific thresholds from IEDB for the38 most common HLA-A and HLA-B alleles, representative of thenine major supertypes, instead of the 500 nmol/L default cutoff.Because these allele-specific thresholds only include cutoffs forHLA-A and HLA-B alleles, we recommend evaluating binding pre-dictions for HLA-C alleles separately with a less stringent cutoff (suchas 1,000 nmol/L) and merging results when picking final candidates.

We also offer cleavage position predictions via optional processingthrough NetChop (51) as well as stability predictions made byNetMHCstabPan (52).

pVACfusePrevious studies show that the novel protein sequences pro-

duced by known driver gene fusions frequently produce neo-antigen candidates (53). pVACfuse provides support for predictingneoantigens from such gene fusions. One possible input topVACfuse is a set of annotated fusion files from AGFusion (54),thus enabling support for a variety of fusion callers such as STAR-Fusion (55), FusionCatcher (56), JAFFA (57), and TopHat-Fusion (58). pVACfuse also supports annotated BEDPE formatfrom any fusion caller which can, for example, be generated usingINTEGRATE-Neo (53). These variants are then assessed forpresence of fusion neoantigens using predictions from any of thepVACseq-supported binding prediction algorithms.

pVACviz and pVACapiImplementing cancer vaccines in a clinical setting requires

multidisciplinary teams, which may not include informatics experts.To support this growing community of users, we developed pVAC-viz, which is a browser-based user interface that assists in launching,managing, reviewing, and visualizing the results of pVACtoolsprocesses. Instead of interacting with the tools via terminal/shellcommands, the pVACviz client provides a modern Web-based userexperience. Users complete a pVACseq (Fig. 2) process setup formthat provides helpful documentation and suggests valid values forinputs. The client also provides views showing ongoing processes,their logs, and interim data files to aid in managing and trouble-shooting. After a process has completed, users may examine theresults as a filtered data table or as a scatterplot visualization—allowing them to curate results and save them as a CSV file forfurther analysis. Extensive documentation of the visualizationinterface can be found in the online documentation (https://pvactools.readthedocs.io/en/latest/pvacviz.html).

To support informatics groups that want to incorporate or buildupon the pVACtools features, we developed pVACapi, whichprovides an HTTP REST interface to the pVACtools suite. Itprovides the API that pVACviz uses to interact with the pVACtoolssuite. Advanced users could develop their own user interfaces oruse the API to control multiple pVACtools installations remotelyover an HTTP network.

pVACvectorOnce a list of neoantigen candidates has been prioritized and

selected, the pVACvector utility can be used to aid in the construc-tion of DNA-based cancer vaccines. The input is either the outputfile from pVACseq or a FASTA file containing peptide sequences.





http://vatools.org

https://hub.docker.com/r/griffithlab/pvactools



https://pvactools.readthedocs.io/en/latest/pvacviz.html




pVACvector returns a neoantigen sequence ordering that mini-mizes the effects of junctional peptides (which may create novelantigens) between the sequences (Fig. 3). This is accomplished byusing the core pVACseq module to predict the binding scores foreach junctional peptide and by modifying junctions with spac-er (59, 60) amino acid sequences, or by trimming the ends of thepeptides in order to reduce reactivity. The final vaccine ordering isachieved through a simulated annealing procedure (61) that returns

a near-optimal solution, when one exists (details can be found in theMaterials and Methods: Implementation).

User communitypVACtools has been used to predict and prioritize neoantigens

for several immunotherapy studies (62–64) and cancer vaccineclinical trials (e.g., NCT02348320, NCT03121677, NCT03122106,NCT03092453, and NCT03532217). We also have a large external

Figure 2.

pVACviz GUI client. pVACtools offers a browser-based graphical user interface, pVACviz, which provides an intuitive means to launch pipeline processes, monitortheir execution, and analyze, export, or archive their results. To launch a process, users navigate to the Start Page (A) and complete a form containing all of therelevant inputs and settings for a pVACseq process. Each form field includes help text and provides typeahead completionwhere applicable. For instance, the allelesfield provides a typeahead dropdown menu that filters available alleles to display only those alleles that match the text that the user has entered. Once a process islaunched, a usermaymonitor its progress on theManage Page (B), which lists all running, stopped, and completed processes. TheDetails Page (C) shows a process’scurrent log, attributes, and any results files aswell as provides controls for stopping, restarting, exporting, and archiving the process. The results of pipeline processesmay be analyzed on the Visualize Page (D), which displays a customizable scatterplot of a file's rows. The x- and y-axes may be set to any column in the resultset, and filtersmay be applied to values in any column. In addition, pointsmaybe selected on the scatter plot or data grid (not visible in this figure) for further analysisor export as CSV files.

Figure 3.

An example pVACvector output showing the optimumarrangement of candidate neoantigens for aDNAvector–based vaccine design. The figure depicts a circularizedDNA insert carrying 10 encoded neoantigenic peptidesequences to be synthesized and encoded/cloned into aDNAplasmid. DNA sequences encoding each peptide areordered (with use of spacer sequenceswhere needed) toensure there are no strong-binding junctional epitopes.Each neoantigenic peptide candidate is shown in blue,green, red, orange, purple, and brown. To minimizejunctional epitope affinity, pVACvector adds spacersequences where needed. These are depicted in black,along with the binding affinity value of the junctionalepitope. Labels represent the gene name and amino acidchange for each candidate.

Hundal et al.





user community that has been actively evaluating and using thesepackages for their neoantigen analysis and has aided in subsequentrefinement of pVACtools through extensive feedback. The original“pvacseq” package has been downloaded over 48,000 times from PyPI,the “pvactools” package has been downloaded over 34,000 times, andthe “pvactools” docker image has been pulled over 37,000 times.

Analysis of TCGA data using pVACtoolsTo demonstrate the utility and performance of the pVACtools

package, we downloaded exome sequencing and RNA-seq data fromTCGA (33) from 100 cases each of melanoma, hepatocellular carci-noma, and lung squamous cell carcinoma and used patient-specificMHC Class I alleles (Supplementary Fig. S2) to determine neoantigencandidates for each patient. There were a total of 64,422 VEP-annotated variants reported across 300 samples, with an average of214 variants per sample. Of these, 61,486 were single nucleotidevariants (SNV), 479 were in-frame insertions and deletions, and2,465 were frameshift mutations (Supplementary Fig. S1). We usedthis annotated list of variants as input to the pVACseq component ofpVACtools to predict neoantigenic peptides. pVACseq reported14,599,993 unfiltered neoantigen candidates. The original version ofpVACseq (20) reported 10,284,467 neoantigens. This demonstratedthat, by extending support for additional variant types as well asprediction algorithms (due to support for additional alleles), weproduced 42% more raw candidate neoantigens.

After applying our default median binding affinity cutoff of 500nmol/L across all eight MHC Class I prediction algorithms, therewere 96,235 predicted strong-binding neoantigens, derived from34,552 somatic variants (32,788 missense SNVs, 1,603 frameshiftvariants, and 131 in-frame indels). This set of strong binders wasfurther reduced by filtering out mutant peptides with medianpredicted binding affinities (across all prediction algorithms) great-er than that of the corresponding wild-type peptide (i.e., mutant/wild-type binding affinity fold change >1), resulting in 70,628neoantigens from 28,588 variants (26,880 SNVs, 1,583 frameshift,and 125 in-frame indels).

This set was subsequently filtered by evaluation of exome sequenc-ing data coverage and our recommended defaults as follows: byapplying the default criteria of VAF cutoff of >25% in tumors and<2% in normal samples, with at least 10� tumor coverage and at least5� normal coverage, 10,730 neoantigens from 4,891 associated var-iants (4,826 SNVs, 56 frameshift, and 9 in-frame indels) were obtained,with an average of 36 neoantigens predicted per case. Because RNA-seq data also were available, the filtering criteria included RNA-basedcoverage filters (tumor RNA VAF >25% and tumor RNA coverage>10�) as well as a gene expression filter (FPKM >1). To condense theresults even further, only the top ranked neoantigen was selected pervariant (“top-score filter”) across all alleles, lengths, and registers(position of amino acid mutation within peptide sequence), resultingin 4,891 total neoantigens with an average of 16 neoantigens per case.This list was then processed with pVACvector to determine theoptimum arrangement of the predicted high-quality neoantigens fora DNA vector–based vaccine design (Fig. 3).

Correlations and bias among binding prediction toolsBecause we offer support for as many as eight different Class I

binding prediction tools, we assessed agreement in binding affinitypredictions (IC50) between these algorithms from a random subset of100,000 neoantigen peptides fromourTCGAanalysis, described above(Fig. 4). The highest correlation was observed between the twoalgorithms that are based on a stabilization matrix method (SMM)

—SMM and SMMPMBEC. The next best correlation was observedbetween NetMHC and MHCflurry, possibly due to both being allele-specific predictors employing neural network-based models. Overall,the correlation between prediction algorithms is low (mean correlationof 0.388 and range of 0.18–0.89 between all pairwise comparisons ofalgorithms).

We also evaluated if there were any biases among the algorithms topredict strong-binding epitopes (i.e., binding affinity�500 nmol/L) byplotting the number of predicted peptides based on their respectivestrong-predicting algorithms (Fig. 5; Supplementary Fig. S3). Wefound that MHCnuggets (v2.2) predicted the highest number ofstrong-binding candidates alone (Fig. 5A). Of the total number ofstrong-binding candidates predicted, 64.7% of these candidates werepredicted by a single algorithm (any one of the eight algorithms),35.2% were predicted as strong-binders by two to seven algorithms,and only 1.8% of the strong-binding candidates were predicted asstrong binders by the combination of all eight algorithms (Supple-mentary Fig. S3). In fact, as shown in Fig. 6, even if one (or more)algorithms predict a peptide to be a strong binder, often anotheralgorithmwill disagree by a largemargin, in some cases predicting thatsame peptide as a very weak binder. This remarkable lack of agreementunderscores the potential value of an ensemble approach that con-siders multiple algorithms.

Next we determined if the number of humanHLA alleles supportedby these eight algorithms differed. As shown (Supplementary Fig. S4),MHCnuggets supports the highest number of human HLA alleles.

Comparison of filtering criteriaBecause pVACtools offers a multitude of ways to filter a list of

predicted neoantigens, we evaluated and compared the effect of eachfilter for selecting high-quality neoantigen candidates. There were14,599,993 unfiltered neoantigen candidates predicted by pVACseqresulting from 64,422 VEP-annotated variants reported across300 samples.

We first compared the effect of running pVACtools with thecommonly used standard binding score cutoff of �500 nmol/L(parameters: -b 500) to the newly added allele-specific score filter(parameters: -a). These cutoffs were, by default, applied to the

Figure 4.

Correlation between prediction algorithms. Spearman correlation betweenprediction values from all eight Class I prediction algorithms (MHCflurry,MHCnuggets, NetMHC, NetMHCcons, NetMHCpan, PickPocket, SMM, andSMMPMBEC) generated from a random subsample of 100,000 peptides.






“median” binding score of all prediction algorithms. In total, 96,235neoantigens (average 320.8 per patient) were predicted using the 500nmol/L binding score cutoff compared with 94,068 neoantigens(average 313.6 per patient) using allele-specific filters. About 79% ofneoantigens were shared between the two sets.

We also further narrowed these sets to include only thosepredictions where the default “median” predicted binding affinities

(�500 nmol/L) were lower than each correspondingmedian wild-typepeptide affinity (parameters: -b 500 -c 1; i.e., a binding affinity mutant/wild-type ratio or differential agretopicity value indicating that themutant version of the peptide is a stronger binder; ref. 21). Using theagretopicity value filter, 70,628 neoantigens (average 235.43 perpatient) were predicted versus the previously reported 96,235 neoanti-gens without this filter.

Figure 5.

Intersection of peptide sequences pre-dicted by different algorithms(MHCflurry, MHCnuggets, NetMHC,NetMHCcons, NetMHCpan, PickPock-et, SMM, and SMMPMBEC). The y-axisdisplays the number of overlappingunique neoantigenic peptides pre-dicted for each combination of algo-rithm depicted on the x-axis. Eachcollection of connected circles showsthe set contained in an exclusive inter-section (i.e., the identity of each algo-rithm), whereas the light gray circlesrepresent the algorithm(s) that doesnot participate in this exclusive inter-section. A, Upset plot for the top20 algorithm combinations ranked bythe number of peptides predictedto be a good binder (mutant IC50 score<500 nmol/L). The combination of alleight algorithms (highlighted orange)ranks eighth highest. B, Upset plotfor algorithm combinations where atleast six algorithms agree on predict-ing a peptide to be a good binder(mutant IC50 score <500 nmol/L).The combination of all eight algo-rithms (highlighted orange) ranksthe highest.

Hundal et al.





We also evaluated the effect of the agretopicity value filter whenapplied to the set of peptides filtered on “lowest” binding score of�500 nmol/L (parameters: -b 500 -c 1 -m lowest). This filter limitspeptides to those where at least one of the algorithms predicts astrong binder, instead of calculating a median score and requiringthat to meet the 500 nmol/L threshold (Fig. 6). Using the lowestbinding score filter resulted in an 11-fold increase in the number ofcandidates predicted (827,423 candidates, average 2,758.08 perpatient). This approach may be useful for finding candidates intumors with low mutation burden (but may also presumably lead toa higher rate of false positives).

Using the median (default) binding affinity filtering criteria,we next applied coverage and expression-based filters. First, wefiltered using the recommended defaults, i.e., greater than 5�normal DNA coverage, less than 2% normal VAF, greater than10� tumor RNA and DNA coverage, and greater than 25% tumorRNA and DNA VAF, along with FPKM >1 for transcript levelexpression (parameters: --normal-cov 5 --tdna-cov 10 --trna-cov10 --normal-vaf 0.02 --tdna-vaf 0.25 --trna-vaf 0.25 --expn-val 1).A total of 10,730 neoantigens were shortlisted across all sampleswith an average of 35 neoantigens per case. We then comparedthis set with slightly more stringent criteria using tumor DNA andRNA VAF of 40% (parameters: --tdna-vaf 0.40 --trna-vaf 0.40).This shortened our list of predicted neoantigens to 4,073 candidateswith an average of 13 candidates per patient.

Demonstration of neoantigen analysis using pVACfuseTo demonstrate the potential of neoantigens resulting from gene

fusions, we analyzed TCGA prostate cancer RNA-seq data from 302patients. This dataset was previously used as a demonstration set forthe fusion neoantigen prediction supported by INTEGRATE-

Neo (22). We wanted to assess the difference (if any) in neoantigencandidates reported by INTEGRATE-Neo using the one MHC Class Iprediction algorithm it supports (NetMHC) versus an ensemble ofeight Class I prediction algorithms supported in pVACfuse.

Using 1,619 gene fusions across 302 samples as input, pVACfusereported 2,104 strong-binding neoantigens (binding affinity�500 nmol/L) resulting from 739 gene fusions. On average, therewere about seven neoantigens per sample resulting from an averageof two fusions per case. This is an 8-fold increase in the number ofstrong-binding neoantigens predicted by pVACfuse versus thosereported by INTEGRATE-Neo alone, which reported 261 neoanti-gens across 210 fusions.

Validation of pVACtools resultsTo assess the real-world validity of pVACtools results, we reana-

lyzed raw sequencing data from published studies that had performedimmunologic validation of candidate neoantigens (SupplementaryTables S2 and S3). Analysis of a patient with melanoma (pt3713;ref. 65) showed that pVACtools was able to recapitulate eight (80%) ofthe positively validated peptides in the filtered list of neoantigensusing our suggested filtering methodology (median binding predictedIC50 <500 nmol/L, >5� DNA and RNA coverage, >10% DNA andRNA VAF, >1 TPM gene expression). Two of the positively validatedpeptides were filtered out due to low RNA depth and VAF values buthad median binding affinity predictions of 13.707 and 4.799 nmol/L.We were also able to eliminate 68% of negatively validated peptidesusing these filters. In an additional analysis of a patient with chroniclymphocytic leukemia (pt5002; ref. 66), we were able to recapitulatetwo (100%) of the positively validated peptides in our filtered list ofneoantigens using our suggested filtering methodology (median bind-ing predicted IC50 <500 nmol/L, >5� DNA coverage, >10% DNA

Figure 6.

Overall distribution of binding affinityscores (nmol/L) for peptideswhere atleast oneof the algorithmspredicted astrong binder. Out of 14,599,993 pep-tides predicted, 126,648 peptideswere predicted to be strong bindersby at least one of the following algo-rithms: MHCflurry, MHCnuggets,NetMHC, NetMHCcons, NetMHCpan,PickPocket, SMM, or SMMPMBEC. Todefine the set of peptides that werestrong binders according to at leastone algorithm, HLA allele subtype–specific thresholdswere appliedwhenavailable; otherwise, the default cutoffbinding affinity of 500 nmol/L wasused. The peptides were further fil-tered using the default coverage-based filters (normal coverage >5�and VAF <2%, tumor DNA coverage>10� and VAF >25%, tumor RNA cov-erage >10� and VAF >25%, transcriptexpression >1 FPKM). Peptides withpredicted mutant IC50 scores lowerthan their respective cutoff scores arehighlighted in orange, whereas thosehigher are plotted in gray. Themedianmutant IC50 scores of each algorithm'sprediction (purple line) and the500 nmol/L binding affinity cutoff(red dotted line) are indicated forreference.






VAF). No RNA data were available for this sample. We were also ableto eliminate 57% of negatively validated peptides using these filters.

DiscussionAs reported from our demonstration analysis, a typical tumor has

too many possible neoantigen candidates to be practical for a vaccine.There is therefore a critical need for a tool that takes in the input from atumor sequencing analysis pipeline and reports a filtered and prior-itized list of neoantigens. pVACtools enables a streamlined, accurate,and user-friendly analysis of neoantigenic peptides from NGS cancerdatasets. This suite offers a complete and easily configurable end-to-end analysis, starting from somatic variants and gene fusions (pVAC-seq and pVACfuse respectively), through filtering, prioritization, andvisualization of candidates (pVACviz), and determining the bestarrangement of candidates for a DNA vector vaccine (pVACvector).By supporting additional classes of variants as well as gene fusions, wepredict an increased number of predicted neoantigens which may becritical for low mutational burden tumors. Finally, by extendingsupport for multiple binding prediction algorithms, we allow for aconsensus approach. The need for this integrated approach is madeabundantly clear by the disagreement between the candidate neoanti-gens identified by these algorithms, as observed in our demonstrationanalyses.

To support a wider range of variant events not easily representablein VCF format such as alternatively spliced transcripts that may createtumor antigens (67), upcoming versions of pVACtools will add a newtool, pVACbind, which can be used to execute our prediction pipelinefrom sequences contained in a FASTA input file. Another majorplanned feature is the addition of reference proteome similarityassessment of predicted peptides in order to exclude peptides thatare not true neoantigens because they occur in other parts of theproteome. Upcoming versions will also include the calculation ofvarious metrics for peptide manufacturability for use in synthetic longpeptide-based vaccines. In addition, we are planning on addingbinding affinity percentile ranks to our prediction outputs for thoseprediction algorithms that support it. This would also allow us tosupport additional IEDBprediction algorithms, such asCombinatorialLibrary (68) and Sturniolo (69). The IEDB software added support forvarious Class II epitope lengths starting with version 2.22. We plan onupdating pVACtools in an upcoming version to take advantage of thisnew functionality. Lastly, we are working on a CWL immunotherapypipeline that will execute all steps starting with alignment, somaticand germline variant calling, HLA-typing, and RNA-seq analysis, in-cluding fusion calling, followed by running pVACseq and pVACfuse.This will support users that would like a full end-to-end solution notsupported by pVACtools itself.

The results from pVACtools analyses are already being used indozens of cancer immunology studies, including studying the rela-tionship between tumor mutation burden and neoantigen load topredict response to immunotherapies (64, 70, 71), and the design of

cancer vaccines in ongoing clinical trials (e.g., NCT02348320,NCT03121677, NCT03122106, NCT03092453, and NCT03532217).Whether the predicted neoantigens will induce T-cell responsesremains an open question for the field (72–74). Pipelines like pVAC-tools are meant to assist these groups to obtain a prioritized set ofneoantigen candidates from among a myriad of possible epitopepredictions. Pursuant to that, they should perform additional func-tional validations according to the standards in this field (73).Whereasit may not be possible to evaluate a de novo response to all predictedcandidate neoantigens prior to vaccination, we highly recommendadding a similar validation step as part of the clinical workflow todetermine those candidates that have a preexisting immune response.Validation data from the studies currently using pVACtools will beincorporated into our toolkit as they become available, and we hopethat over time this will improve the prioritization process. We antic-ipate that pVACtools will make such analyses more robust, repro-ducible, and facile as these efforts continue.

Disclosure of Potential Conflicts of InterestNo potential conflicts of interest were disclosed.

Authors’ ContributionsConception and design: J. Hundal, S. Kiwala, J. McMichael, W.E. Gillanders,E.R. Mardis, O.L. Griffith, M. GriffithDevelopment of methodology: J. Hundal, S. Kiwala, J. McMichael, C.A. Miller,Y.-Y. Feng, A.P. Graubert, A.Z. Wollam, J. Neichin, O.L. Griffith, M. GriffithAcquisition of data (provided animals, acquired and managed patients, providedfacilities, etc.): J. Hundal, C.J. LiuAnalysis and interpretation of data (e.g., statistical analysis, biostatistics,computational analysis): J. Hundal, H. Xia, C.J. Liu, S. Zhao, M. GriffithWriting, review, and/or revision of the manuscript: J. Hundal, S. Kiwala,C.A. Miller, H. Xia, C.J. Liu, W.E. Gillanders, E.R. Mardis, O.L. Griffith, M. GriffithAdministrative, technical, or material support (i.e., reporting or organizingdata, constructing databases): J. Hundal, S. Kiwala, J. McMichael, A.T. Wollam,Y.-Y. Feng, A.Z. Wollam, M. Neveau, J. WalkerStudy supervision: J. Walker, E.R. Mardis, O.L. Griffith, M. Griffith

AcknowledgmentsWe thank the patients and their families for donation of their samples and

participation in clinical trials. We also thank our growing user community for testingthe software and providing useful input, critical bug reports, as well as suggestions forimprovement and new features. We are grateful to Drs. Robert Schreiber, GavinDunn, and Beatriz Carreno for their expertise and guidance on foundational work oncancer immunology using neoantigens and suggestions on improving the software.O.L. Griffith was supported by the NCI of the NIH under award numbersU01CA209936, U01CA231844, and U24CA237719. M. Griffith was supported bythe National Human Genome Research Institute (NHGRI) of the NIH under awardnumber R00HG007940 and the V Foundation for Cancer Research under awardnumber V2018-007.

The costs of publication of this article were defrayed in part by the payment of pagecharges. This article must therefore be hereby marked advertisement in accordancewith 18 U.S.C. Section 1734 solely to indicate this fact.

Received May 29, 2019; revised October 6, 2019; accepted December 30, 2019;published first January 6, 2020.

References1. Liu XS, Shirley Liu X, Mardis ER. Applications of immunogenomics to cancer.

Cell 2017;168:600–12.2. Bjerregaard A-M, Nielsen M, Hadrup SR, Szallasi Z, Eklund AC. MuPeXI:

prediction of neo-epitopes from tumor sequencing data. Cancer ImmunolImmunother 2017;66:1123–30.

3. Rubinsteyn A, Hodes I, Kodysh J, Hammerbacher J. Vaxrank: a computationaltool for designing personalized cancer vaccines. BioRxiv 142919 [Preprint].2017. Available from: http://dx.doi.org/10.1101/142919.

4. Bais P, Namburi S, Gatti DM, Zhang X, Chuang JH. CloudNeo: a cloud pipelinefor identifying patient-specific tumor neoantigens. Bioinformatics 2017;33:3110–2.

5. Andreatta M, Nielsen M. Gapped sequence alignment using artificial neuralnetworks: application to theMHC class I system. Bioinformatics 2016;32:511–7.

6. Jurtz V, Paul S, AndreattaM,Marcatili P, Peters B, NielsenM. NetMHCpan-4.0:improved peptide–MHC class I interaction predictions integrating eluted ligandand peptide binding affinity data. J Immunol 2017;199:3360–8.

Hundal et al.




http://dx.doi.org/10.1101/142919


7. NielsenM, Lundegaard C,Worning P, Lauemøller SL, Lamberth K, Buus S, et al.Reliable prediction of T-cell epitopes using neural networks with novel sequencerepresentations. Protein Sci 2003;12:1007–17.

8. Duan F, Duitama J, Al Seesi S, Ayres CM, Corcelli SA, Pawashe AP, et al.Genomic and bioinformatic profiling of mutational neoepitopes revealsnew rules to predict anticancer immunogenicity. J Exp Med 2014;211:2231–48.

9. Schubert B, Walzer M, Brachvogel H-P, Szolek A, Mohr C, Kohlbacher O.FRED 2: an immunoinformatics framework for Python. Bioinformatics 2016;32:2044–6.

10. Rao AA, Madejska AA, Pfeil J, Paten B, Salama SR, Haussler D. ProTECT –

prediction of T-cell epitopes for cancer therapy. BioRxiv 696526 [Preprint].2019. Available from: https://doi.org/10.1101/696526.

11. Hundal J, Kiwala S, Feng Y-Y, Liu CJ, Govindan R, Chapman WC, et al.Accounting for proximal variants improves neoantigen prediction. Nat Genet2019;51:175–9.

12. Turajlic S, Litchfield K, Xu H, Rosenthal R, McGranahan N, Reading JL, et al.Insertion-and-deletion-derived tumor-specific neoantigens and the immuno-genic phenotype: a pan-cancer analysis. Lancet Oncol 2017;18:1009–21.

13. Zamora AE, Crawford JC, Allen EK, Guo X-ZJ, Bakke J, Carter RA, et al.Pediatric patients with acute lymphoblastic leukemia generate abundantand functional neoantigen-specific CD8þ T cell responses. Sci Transl Med2019;11. pii: eaat8549.

14. Yang W, Lee K-W, Srivastava RM, Kuo F, Krishna C, Chowell D, et al.Immunogenic neoantigens derived from gene fusions stimulate T cell responses.Nat Med 2019;25:767–75.

15. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheelertransform. Bioinformatics 2009;25:1754–60.

16. Koboldt DC, Larson DE, Wilson RK. Using VarScan 2 for germline variantcalling and somatic mutation detection. Curr Protoc Bioinformatics 2013;44:15.4.1–17.

17. Koboldt DC, ZhangQ, LarsonDE, ShenD,McLellanMD, Lin L, et al. VarScan 2:Somatic mutation and copy number alteration discovery in cancer by exomesequencing. Genome Res 2012;22:568–76.

18. DePristo MA, Banks E, Poplin R, Garimella KV, Maguire JR, Hartl C, et al. Aframework for variation discovery and genotyping using next-generation DNAsequencing data. Nat Genet 2011;43:491–8.

19. Van der Auwera GA, Carneiro MO, Hartl C, Poplin R, Del Angel G, Levy-Moonshine A, et al. From FastQ data to high confidence variant calls: theGenome Analysis Toolkit best practices pipeline. Curr Protoc Bioinformatics2013;43:11.10.1–33.

20. Charoentong P, Finotello F, Angelova M, Mayer C, Efremova M, Rieder D, et al.Pan-cancer immunogenomic analyses reveal genotype-immunophenotype rela-tionships and predictors of response to checkpoint blockade. Cell Rep 2017;18:248–62.

21. Hoof I, Peters B, Sidney J, Pedersen LE, Sette A, Lund O, et al. NetMHCpan, amethod for MHC class I binding prediction beyond humans. Immunogenetics2009;61:1–13.

22. Karosiene E, Lundegaard C, Lund O, Nielsen M. NetMHCcons: a consensusmethod for the major histocompatibility complex class I predictions. Immu-nogenetics 2011;64:177–86.

23. Zhang H, Lund O, Nielsen M. The PickPocket method for predicting bindingspecificities for receptors based on receptor pocket similarities: application toMHC-peptide binding. Bioinformatics 2009;25:1293–9.

24. Peters B, Sette A. Generating quantitative models describing the sequencespecificity of biological processes with the stabilized matrix method.BMC Bioinformatics 2005;6:132.

25. Kim Y, Sidney J, Pinilla C, Sette A, Peters B. Derivation of an amino acidsimilarity matrix for peptide: MHC binding and its application as a Bayesianprior. BMC Bioinformatics 2009;10:394.

26. O’Donnell TJ, Rubinsteyn A, Bonsack M, Riemer AB, Laserson U,Hammerbacher J. MHCflurry: open-source class I MHC binding affinityprediction. Cell Syst 2018;7:129–32.e4.

27. Bhattacharya R, Sivakumar A, Tokheim C, Guthrie VB, Anagnostou V,Velculescu VE, et al. Evaluation of machine learning methods to predictpeptide binding to MHC class I proteins. BioRxiv 154757 [Preprint]. 2017.Available from: https://doi.org/10.1101/154757.

28. Nielsen M, Lundegaard C, Blicher T, Peters B, Sette A, Justesen S,et al. Quantitative predictions of peptide binding to any HLA-DRmolecule of known sequence: NetMHCIIpan. PLoS Comput Biol 2008;4:e1000107.

29. Nielsen M, Lundegaard C, Lund O. Prediction of MHC class II binding affinityusing SMM-align, a novel stabilization matrix alignment method.BMC Bioinformatics 2007;8:238.

30. Nielsen M, Lund O. NN-align. An artificial neural network-based alignmentalgorithm for MHC class II peptide binding prediction. BMC Bioinformatics2009;10:1–10.

31. Shao XM, Bhattacharya R, Huang J, Sivakumar IKA, Tokheim C, Zheng L, et al.High-throughput prediction of MHC class I and class II neoantigens withMHCnuggets. Cancer Immunol Res 2020;8:396–408.

32. Fritsch EF, Rajasagi M, Ott PA, Brusic V, Hacohen N, Wu CJ. HLA-bindingproperties of tumor neoepitopes in humans. Cancer Immunol Res 2014;2:522–9.

33. Cancer Genome Atlas Research Network, Weinstein JN, Collisson EA,Mills GB,Shaw KRM, Ozenberger BA, et al. The Cancer Genome Atlas Pan-Canceranalysis project. Nat Genet 2013;45:1113–20.

34. Amstutz P, Crusoe MR, Tijani�c N, Chapman B, Chilton J, Heuer M, et al.Common Workflow Language, v1.0. 2016. Available from: http://dx.doi.org/10.6084/m9.figshare.3115156.v2.

35. Voss K, Gentry J, Van der Auwera G. Full-stack genomics pipelining withGATK4 þ WDL þ Cromwell [version 1; not peer reviewed]. 2017. Availablefrom: http://dx.doi.org/10.7490/f1000research.1114631.1.

36. Cabanski CR, Magrini V, Griffith M, Griffith OL, McGrath S, Zhang J, et al.cDNA hybrid capture improves transcriptome analysis on low-input andarchived samples. J Mol Diagn 2014;16:440–51.

37. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The sequencealignment/map format and SAMtools. Bioinformatics 2009;25:2078–9.

38. McLaren W, Gil L, Hunt SE, Riat HS, Ritchie GRS, Thormann A, et al. TheEnsembl Variant Effect Predictor. Genome Biol 2016;17:122.

39. Cibulskis K, Lawrence MS, Carter SL, Sivachenko A, Jaffe D, Sougnez C, et al.Sensitive detection of somatic point mutations in impure and heterogeneouscancer samples. Nat Biotechnol 2013;31:213–9.

40. Saunders CT, Wong WSW, Swamy S, Becq J, Murray LJ, Cheetham RK. Strelka:accurate somatic small-variant calling from sequenced tumor-normal samplepairs. Bioinformatics 2012;28:1811–7.

41. YeK, SchulzMH, LongQ,Apweiler R,NingZ. Pindel: a pattern growth approachto detect break points of large deletions and medium sized insertions frompaired-end short reads. Bioinformatics 2009;25:2865–71.

42. Tan A, Abecasis GR, Kang HM. Unified representation of genetic variants.Bioinformatics 2015;31:2202–4.

43. Roehr JT, Dieterich C, Reinert K. Flexbar 3.0 - SIMD and multicore paralleliza-tion. Bioinformatics 2017;33:2941–2.

44. KimD, Langmead B, Salzberg SL. HISAT: a fast spliced aligner with lowmemoryrequirements. Nat Methods 2015;12:357–60.

45. Pertea M, Pertea GM, Antonescu CM, Chang T-C, Mendell JT, Salzberg SL.StringTie enables improved reconstruction of a transcriptome from RNA-seqreads. Nat Biotechnol 2015;33:290–5.

46. Bray NL, Pimentel H,Melsted P, Pachter L. Near-optimal probabilistic RNA-seqquantification. Nat Biotechnol 2016;34:525–7.

47. Szolek A, Schubert B, Mohr C, SturmM, FeldhahnM, Kohlbacher O. OptiType:precision HLA typing from next-generation sequencing data. Bioinformatics2014;30:3310–6.

48. Hundal J, Carreno BM, Petti AA, Linette GP, Griffith OL, Mardis ER, et al.pVAC-Seq: a genome-guided in silico approach to identifying tumor neoanti-gens. Genome Med 2016;8:11.

49. Vita R, Zarebski L, Greenbaum JA, Emami H, Hoof I, Salimi N, et al. TheImmune Epitope Database 2.0. Nucleic Acids Res 2009;38:D854–62.

50. Paul S,Weiskopf D,AngeloMA, Sidney J, Peters B, Sette A.HLA class I alleles areassociated with peptide-binding repertoires of different size, affinity, and immu-nogenicity. J Immunol 2013;191:5831–9.

51. Kesmir C, Nussbaum AK, Schild H, Detours V, Brunak S. Prediction ofproteasome cleavage motifs by neural networks. Protein Eng 2002;15:287–96.

52. Rasmussen M, Fenoy E, Harndahl M. Pan-specific prediction of peptide–MHCclass I complex stability, a correlate of T cell immunogenicity. J Immunol 2016;197:1517–24.

53. Zhang J, Mardis ER, Maher CA. INTEGRATE-neo: a pipeline for personalizedgene fusion neoantigen discovery. Bioinformatics 2017;33:555–7.

54. Murphy C, Elemento O. AGFusion: annotate and visualize gene fusions.BioRxiv 080903 [Preprint]. 2016. Available from: https://doi.org/10.1101/080903.

55. Haas BJ, DobinA, StranskyN, Li B, YangX, Tickle T, et al. STAR-Fusion: fast andaccurate fusion transcript detection from RNA-Seq. BioRxiv 120295 [Preprint].2017. Available from: https://doi.org/10.1101/120295.





https://doi.org/10.1101/696526

https://doi.org/10.1101/154757

http://dx.doi.org/10.6084/m9.figshare.3115156.v2



http://dx.doi.org/10.7490/f1000research.1114631.1

https://doi.org/10.1101/080903

https://doi.org/10.1101/080903

https://doi.org/10.1101/080903

https://doi.org/10.1101/120295


56. Nicorici D, S atalan M, Edgren H, Kangaspeska S, Murum€agi A, Kallioniemi O,et al. FusionCatcher – a tool for finding somatic fusion genes in paired-end RNA-sequencing data. BioRxiv 011650 [Preprint]. 2014. Available from: https://doi.org/10.1101/011650.

57. Davidson NM, Majewski IJ, Oshlack A. JAFFA: high sensitivity transcriptome-focused fusion gene detection. Genome Med 2015;7:43.

58. Kim D, Salzberg SL. TopHat-Fusion: an algorithm for discovery of novel fusiontranscripts. Genome Biol 2011;12:R72.

59. Schubert B, Kohlbacher O. Designing string-of-beads vaccines with optimalspacers. Genome Med 2016;8:9.

60. Negahdaripour M, Nezafat N, Eslami M, GhoshoonMB, Shoolian E, NajafipourS, et al. Structural vaccinology considerations for in silico designing of a multi-epitope vaccine. Infect Genet Evol 2018;58:96–109.

61. Kirkpatrick S, Gelatt CD Jr, Vecchi MP. Optimization by simulated annealing.Science 1983;220:671–80.

62. Miller A, AsmannY,Cattaneo L, Braggio E, Keats J, AuclairD, et al. High somaticmutation and neoantigen burden are correlated with decreased progression-freesurvival in multiple myeloma. Blood Cancer J 2017;7:e612.

63. Balachandran VP, èuksza M, Zhao JN, Makarov V, Moral JA, Remark R, et al.Identification of uniqueneoantigen qualities in long-term survivors of pancreaticcancer. Nature 2017;551:512.

64. Formenti SC, Rudqvist N-P, Golden E, Cooper B, Wennerberg E, Lhuillier C,et al. Radiotherapy induces responses of lung cancer to CTLA-4 blockade.Nat Med 2018;24:1845.

65. Prickett TD, Crystal JS, Cohen CJ, Pasetto A, Parkhurst MR, Gartner JJ,et al. Durable complete response from metastatic melanoma after transfer

of autologous T cells recognizing 10 mutated tumor antigens. CancerImmunol Res 2016;4:669–78.

66. Rajasagi M, Shukla SA, Fritsch EF, Keskin DB, DeLuca D, Carmona E, et al.Systematic identification of personal tumor-specific neoantigens in chroniclymphocytic leukemia. Blood 2014;124:453–62.

67. Kahles A, Lehmann K-V, Toussaint NC,H€userM, Stark SG, Sachsenberg T, et al.Comprehensive analysis of alternative splicing across tumors from 8,705patients. Cancer Cell 2018;34:211–24.e6.

68. Sidney J, Peters B, FrahmN, Brander C, Sette A. HLA class I supertypes: a revisedand updated classification. BMC Immunol 2008;9:1.

69. Sturniolo T, Bono E, Ding J, Raddrizzani L, Tuereci O, Sahin U, et al.Generation of tissue-specific and promiscuous HLA ligand databases usingDNA microarrays and virtual HLA class II matrices. Nat Biotechnol 1999;17:555–61.

70. Johanns TM, Miller CA, Dorward IG, Tsien C, Chang E, Perry A, et al.Immunogenomics of hypermutated glioblastoma: a patient with germline POLEdeficiency treated with checkpoint blockade immunotherapy. Cancer Discov2016;6:1230–6.

71. LaussM,DoniaM,Harbst K, Andersen R,Mitra S, Rosengren F, et al.Mutationaland putative neoantigen load predict clinical benefit of adoptive T cell therapy inmelanoma. Nat Commun 2017;8:1738.

72. Linette GP, Carreno BM. Neoantigen vaccines pass the immunogenicity test.Trends Mol Med 2017;869–71.

73. Vitiello A, Zanetti M. Neoantigen prediction and the need for validation.Nat Biotechnol 2017;35:815–7.

74. The problem with neoantigen prediction. Nat Biotechnol 2017;35:97.


Hundal et al.



https://doi.org/10.1101/011650

https://doi.org/10.1101/011650

https://doi.org/10.1101/011650


2020;8:409-420. Published OnlineFirst January 6, 2020.Cancer Immunol Res Jasreet Hundal, Susanna Kiwala, Joshua McMichael, et al. Cancer NeoantigenspVACtools: A Computational Toolkit to Identify and Visualize

Updated version

10.1158/2326-6066.CIR-19-0401doi:

Access the most recent version of this article at:

Material

Supplementary

http://cancerimmunolres.aacrjournals.org/content/suppl/2020/01/04/2326-6066.CIR-19-0401.DC1

Access the most recent supplemental material at:

Cited articles

http://cancerimmunolres.aacrjournals.org/content/8/3/409.full#ref-list-1

This article cites 64 articles, 11 of which you can access for free at:

Citing articles

http://cancerimmunolres.aacrjournals.org/content/8/3/409.full#related-urls

This article has been cited by 6 HighWire-hosted articles. Access the articles at:

E-mail alerts related to this article or journal.Sign up to receive free email-alerts

Subscriptions

Reprints and

[email protected]

To order reprints of this article or to subscribe to the journal, contact the AACR Publications Department

Permissions

Rightslink site. Click on "Request Permissions" which will take you to the Copyright Clearance Center's (CCC)

.http://cancerimmunolres.aacrjournals.org/content/8/3/409To request permission to re-use all or part of this article, use this link



http://cancerimmunolres.aacrjournals.org/lookup/doi/10.1158/2326-6066.CIR-19-0401

http://cancerimmunolres.aacrjournals.org/content/suppl/2020/01/04/2326-6066.CIR-19-0401.DC1

http://cancerimmunolres.aacrjournals.org/content/8/3/409.full#ref-list-1

http://cancerimmunolres.aacrjournals.org/content/8/3/409.full#related-urls

http://cancerimmunolres.aacrjournals.org/cgi/alerts

mailto:[email protected]

http://cancerimmunolres.aacrjournals.org/content/8/3/409


home | cancer immunology research - pvactools: …...cancer vaccines. this is a cross-disciplinary...

Documents