

Synthetic Circulating Cell-free DNA as Quality Control Materials for Somatic Mutation Detection in Liquid Biopsy for Cancer

Rui Zhang,1,3† Rongxue Peng,1,2,3† Ziyang Li,1,2,3† Peng Gao,1,2,3 Shiyu Jia,1,3,4 Xin Yang,1,2,3 Jiansheng Ding,1,2,3 Yanxi Han,1,3 Jiehong Xie,1,3 and Jinming Li1,2,3*

BACKGROUND: Detection of somatic genomic alterations in tumor-derived cell-free DNA (cfDNA) in the plasma is challenging owing to the low concentrations of cfDNA, variable detection methods, and complex workflows. Moreover, no proper quality control materials are available currently.

METHODS: We developed a set of synthetic cfDNA quality control materials (SCQCMs) containing spike-in cfDNA on the basis of micrococcal nuclease digestion carrying somatic mutations as simulated cfDNA and matched genomic DNA as genetic background to emulate paired tumor-normal samples in real clinical tests. Site-directed mutagenesis DNA that contained 1500–2000 bases with single-nucleotide variants or indels and genomic DNA from CRISPR/Cas9-edited cells with EML4-ALK rearrangements were fragmented, quantified, and added into micrococcal nuclease-digested DNA derived from HEK293T cells. To prove their suitability, the SCQCMs were compared with patient-derived plasma samples and validated in a collaborative study that encompassed 11 laboratories.

RESULTS: The results of SCQCM analysis by next-generation sequencing showed strong agreement with those of patient-derived plasma samples, including the size profile of cfDNA and the quality control metrics of the sequencing data. More than 95% of laboratories correctly detected the SCQCMs with EGFR T790M, L858R, KRAS G12D, and a deletion in exon 19, as well as with EML4-ALK variant 2.

CONCLUSIONS: The SCQCMs were successfully applied in a broad range of settings, methodologies, and informatics techniques. We conclude that SCQCMs can be used as optimal quality controls in test performance assessments for circulating tumor DNA somatic mutation detection.

© 2017 American Association for Clinical Chemistry

Recent advances have led to the development of “liquid biopsies” that allow for genotyping circulating tumor DNA (ctDNA)⁵ concurrently found in tumors by the detection of somatic genomic alterations in plasma (1–3). Current approaches for the detection of tumor aberrations in ctDNA can be divided into 2 categories: methods targeting specific changes, including the beads, emulsions, amplification, and magnetics (BEAMing) process built on emulsion PCR (4), digital PCR (dPCR) (5), and amplification refractory mutation system (ARMS) PCR (6), and methods detecting all possible aberrations in DNA, such as targeted and whole exome/genome sequencing (7, 8). However, the detection of specific ctDNA mutations is still challenging owing to the extremely low concentration and highly fragmented size of ctDNA and numerous variables influencing the interlaboratory concordance of variant detection in ctDNA, such as different method sensitivities, different efficiencies of target enrichment strategies from cell-free DNA (cfDNA), and stochastic noise of next-generation sequencing (NGS) (2, 9–11). Therefore, well-characterized quality control materials of known variants at known concentrations will be valuable in assay development, test validation, internal quality control, and proficiency tests (12).

Currently, there are no proper standardized materials that can be used as controls for ctDNA somatic variation analysis. Ideally, such quality control materials should be obtained from plasma of patients with cancer, but this is not feasible because of the limited number of clinical samples. Characterized materials could be developed from synthetic or spike-in samples, but there are 2 major concerns. First, the synthetic or spike-in materials should be suitable for all methods, especially for NGS. In contrast to methods targeting specific changes, NGS needs to distinguish germline polymorphisms from somatic ones at the sites containing variants and identify somatic mutations that appear in ctDNA. Because germline mutations are mostly uncatalogued by public databases, they would be falsely identified as somatic mutations (13, 14). Currently, some NGS laboratories detect somatic mutations by parallel analysis of cfDNA in the plasma and genomic DNA in mononuclear cells from the same individual (15). Therefore, optimal materials should represent paired tumor-normal materials containing total cfDNA with somatic mutations and matched genomic DNA as genetic background. Second, the features of synthetic materials should be similar to the biological properties of cfDNA in the plasma of patients with cancer. In previous studies, biomimetic materials for ctDNA assays were prepared by shearing mixtures of abnormal and normal genomic DNA (16). However, cfDNA in patients with cancer is different from sheared fragments. It has been demonstrated that cfDNA presents some features of nucleosomal DNA and positioning, whereas randomly sheared fragments do not possess such properties (17). Therefore, the fragments created by ultrasonication might not be the preferred material because of different fragmentation patterns.

1 National Center for Clinical Laboratories, Beijing Hospital, National Center of Gerontology, Beijing, People’s Republic of China; 2 Graduate School, Peking Union Medical College, Chinese Academy of Medical Sciences, Beijing, People’s Republic of China; 3 Beijing Engineering Research Center of Laboratory Medicine, Beijing Hospital, Beijing, People’s Republic of China; 4 Peking University Fifth School of Clinical Medicine, Beijing, People’s Republic of China.

* Address correspondence to this author at: National Center for Clinical Laboratories, Beijing Hospital, No.1 Dahua Road, Dongdan, Beijing 100730, People’s Republic of China. Fax +86-10-65212064; e-mail [email protected].

† Rui Zhang, Rongxue Peng, and Ziyang Li contributed equally to the work.

Received February 3, 2017; accepted May 1, 2017.
Previously published online at DOI: 10.1373/clinchem.2017.272559

5 Nonstandard abbreviations: ctDNA, circulating tumor DNA; SCQCM, synthetic cfDNA quality control material; dPCR, digital PCR; ARMS, amplification refractory mutation system; cfDNA, cell-free DNA; NGS, next-generation sequencing; MNase, micrococcal nuclease; M-cfDNA, cfDNA based on micrococcal nuclease digestion; SNV, single-nucleotide variant; indel, insertion and deletion; CRISPR/Cas9, clustered regularly interspaced short palindromic repeat and CRISPR associated protein 9 system; AF, allele frequency.

Clinical Chemistry 63:9 1465–1475 (2017)
Cancer Diagnostics

In the current study, we set out to develop a set of synthetic cfDNA quality control materials (SCQCMs) that comprises spike-in cfDNA based on micrococcal nuclease (MNase) digestion and matched genomic DNA. The spike-in cfDNA needed to emulate cfDNA bearing somatic mutations in the plasma of patients with cancer, including single-nucleotide variants (SNVs), insertions and deletions (indels), and fusion genes. To evaluate their reliability as an effective substitute for clinical samples, we compared the SCQCMs with patient-derived plasma. We also evaluated the generalizability of the SCQCMs in a collaborative study that encompassed 11 laboratories and a broad range of methodologies and informatics techniques.

Materials and Methods

GENERATION OF CFDNA BASED ON MNASE DIGESTION

Cell-free DNA based on MNase digestion (M-cfDNA) without genomic variation was prepared from the HEK293T cell line. HEK293T cells were purchased from China Infrastructure of Cell Line Resources and cultured in DMEM supplemented with 10% fetal bovine serum at 37 °C and 5% CO2. For cell lysis, 1 × 10⁷ HEK293T cells were pelleted and rinsed twice in PBS at 4 °C. Cells were resuspended in 5 mL of ice-cold NP-40 and incubated on ice for 5 min. For MNase digestion, the nuclei were rinsed twice in MNase digestion buffer [1 mmol/L Tris-HCl, 5 mmol/L CaCl2, 5 mmol/L 2-mercaptoethanol with protease inhibitors (Roche), and BSA (New England Biolabs)] and finally resuspended in 100 µL of MNase digestion buffer. Further, 1700 U of MNase (New England Biolabs) was added to the nuclei suspension and incubated at 37 °C for 15 min. Digestion was halted by adding 20 µL of the stop buffer (250 mmol/L EDTA, 250 mmol/L EGTA). The digested nuclei were diluted 200-fold with water. Then, 30 µL of 20 g/L proteinase K solution and 100 µL of 20% SDS solution were added to 2 mL of the diluted nuclei preparation; the mixture was then incubated at 60 °C for 20 min. DNA was extracted with the MagMAX™ Cell-Free DNA Isolation Kit (Thermo Fisher Scientific), quantified with the BR dsDNA Qubit Assay (Invitrogen), and verified with the Agilent High Sensitivity DNA Kit (Agilent Technologies).
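The final reagent concentrations in the proteinase K step are not stated in the text, but they follow from the volumes given above. The sketch below works them out with the usual C1·V1 = C2·V2 relation; it assumes that volumes are simply additive, and the function name `final_concentration` is introduced here only for illustration.

```python
def final_concentration(stock_conc, stock_volume_ul, total_volume_ul):
    """Solve C1 * V1 = C2 * V2 for C2 (same units as stock_conc)."""
    if total_volume_ul <= 0:
        raise ValueError("total volume must be positive")
    return stock_conc * stock_volume_ul / total_volume_ul


# Volumes taken from the protocol text.
diluted_nuclei_ul = 2000.0  # 2 mL of the 200-fold diluted nuclei preparation
proteinase_k_ul = 30.0      # stock at 20 g/L
sds_ul = 100.0              # stock at 20%

# Assumption: volumes are additive when the solutions are combined.
total_ul = diluted_nuclei_ul + proteinase_k_ul + sds_ul

proteinase_k_final = final_concentration(20.0, proteinase_k_ul, total_ul)  # g/L
sds_final = final_concentration(20.0, sds_ul, total_ul)                    # percent

print(f"total volume: {total_ul:.0f} uL")
print(f"proteinase K: {proteinase_k_final:.3f} g/L")
print(f"SDS: {sds_final:.2f} %")
```

Under these assumptions the mixture contains roughly 0.28 g/L proteinase K and 0.94% SDS.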

DESIGN AND PREPARATION OF SPIKE-IN M-CFDNA WITH SNVS AND INDELS

We generated spike-in M-cfDNA with SNVs and indels by mixing sheared DNA fragments obtained by site-directed mutagenesis with M-cfDNA from HEK293T cells. First, M-cfDNA was prepared from HEK293T cells by MNase digestion and verified on an Agilent 2100 Bioanalyzer. Second, PCR-based site-directed mutagenesis was performed by using genomic DNA extracted from HEK293T cells as the template (Fig. 1A). Five mutations were selected to represent SNVs and indels (Table 1). The positions of primers and mutations are shown in Supplemental Table 1 in the Data Supplement that accompanies the online version of this article at http://www.clinchem.org/content/vol63/issue9. The sizes of the designed sequences were 1500–2000 bp. The expected mutations were located in the central parts of these fragments. Each fragment was inserted into the pMD8-T vector to construct the plasmids. The sequences of the purified plasmids were validated with Sanger sequencing; the fragments were then obtained by cleavage with restriction enzymes. DNA samples were purified using a Gel Purification Kit (QIAGEN). The fragments were sheared to 100–200 bp and verified on an Agilent 2100 Bioanalyzer. Lastly, the sheared mutated fragments were serially diluted into a balanced pool by mixing with M-cfDNA, and the QuantStudio 3D dPCR System (Thermo Fisher Scientific) was used for allele frequency (AF) detection. Full protocols are described in the online Data Supplement.


Fig. 1. Schematic illustration of the preparation and validation of SCQCMs.
The workflow of the preparation of SCQCMs. Left panel: HEK293T cells were edited to introduce EML4-ALK rearrangements (variant 2 and variant 3) by the CRISPR/Cas9 system. Target sequences were selected on the basis of the following criteria: proximity to intronic breakpoints reported in patients containing EML4-ALK rearrangements (variant 2 and variant 3) and presence of the protospacer adjacent motif sequences on the 3’-end and a G appended on the 5’-end. After gene editing, the individual positive clones were blended with HEK293T cells and then digested by MNase to produce approximately 147-bp DNA fragments. Right panel: The genomic DNA was extracted from the HEK293T cells. PCR-based site-directed mutagenesis was performed to introduce KRAS G12D and 4 kinds of EGFR mutations, including T790M, L858R, deletion in exon 19, and insertion in exon 20. The fragments were sheared to 100–200 bp by sonication. Middle panel: MNase enzyme was also used to digest HEK293T cells in order to produce wild-type DNA pieces. The DNA fragments were mixed with DNA segments containing EML4-ALK rearrangements and DNA segments with SNVs and indels, respectively, to generate the spike-in M-cfDNA (A). RT-PCR detection of the fusion transcript from HEK293T cells in which Cas9 was expressed with both EML4 sgRNA and ALK sgRNA, EML4 sgRNA alone, ALK sgRNA alone, no sgRNA (px330 vector), and water. The expected fusion transcript was detected in the cells transfected with both EML4 sgRNA and ALK sgRNA and verified by sequencing (B). The sheared DNA fragments with EGFR T790M and exon 19 deletion were subjected to serial dilutions in M-cfDNA to determine linearity (C).


Table 1. The intended results, validation results, and results from 24 data sets of the SCQCMs.

| Sample No. | Gene | Transcript ID | Variant | Intended AF | Validated using NGS: variant | Validated using NGS: AF | No. correct/total challenges | Errors |
|---|---|---|---|---|---|---|---|---|
| A | EGFR | NM_005228.3 | c.2369C>T (p.Thr790Met) | 2.5% | NM_005228.3(EGFR):c.2369C>T (p.Thr790Met) | 1.64% | 95.9% (23/24) | 2 FNs (T790M, L858R); 1 FP (V600E) |
| A | EGFR | NM_005228.3 | c.2573T>G (p.Leu858Arg) | 2.5% | NM_005228.3(EGFR):c.2573T>G (p.Leu858Arg) | 1.85% | | |
| B | EGFR | NM_005228.3 | c.2369C>T (p.Thr790Met) | 2.5% | NM_005228.3(EGFR):c.2369C>T (p.Thr790Met) | 3.04% | 100% (24/24) | 0 |
| C | EGFR | NM_005228.3 | c.2235_2249del15 (p.Glu746_Ala750del) | 2.5% | NM_005228.3(EGFR):c.2235_2249del15 (p.Glu746_Ala750del) | 1.32% | 95.9% (23/24) | 1 FP (T790M) |
| D | EGFR | NM_005228.3 | c.2310_2311insGGT (p.Asp770_Asn771insGly) | 1.0% | NM_005228.3(EGFR):c.2310_2311insGGT (p.Asp770_Asn771insGly) | 1.60% | 87.5% (21/24) | 3 FNs |
| E | EGFR | NM_005228.3 | c.2369C>T (p.Thr790Met) | 5% | NM_005228.3(EGFR):c.2369C>T (p.Thr790Met) | 6.64% | 95.9% (23/24) | 1 FP (KRAS G12C) |
| F | KRAS | NM_033360.3 | c.35G>A (p.Gly12Asp) | 2.5% | NM_033360.3(KRAS):c.35G>A (p.Gly12Asp) | 2.64% | 95.9% (23/24) | 1 FP (KRAS G12S) |
| G | EGFR | NM_005228.3 | c.2235_2249del15 (p.Glu746_Ala750del) | 2.5% | NM_005228.3(EGFR):c.2235_2249del15 (p.Glu746_Ala750del) | 1.07% | 100% (24/24) | 0 |
| H | EML4-ALK | NR | EML4 exon 20-ALK exon 20 fusion | 1.0% | EML4 exon 20-ALK exon 20 fusion | 0.72% | 95.9% (23/24) | 1 FN |
| I | EML4-ALK | NR | EML4 exon 6-ALK exon 20 fusion | 1.0% | EML4 exon 6-ALK exon 20 fusion | 0.74% | 79.2% (19/24) | 1 FN; 4 FP (L858R; 19 del; 20 ins) |
| J | NR | NR | Wild-type | NR | Wild-type | NR | 100% (24/24) | 0 |

NR: not related. FN, false negative; FP, false positive.


DESIGN AND PREPARATION OF SPIKE-IN M-CFDNA WITH FUSION GENES

Before preparing spike-in M-cfDNA with fusion genes, the edited HEK293T cells with EML4-ALK⁶ rearrangements (variant 2 and variant 3) were prepared using the clustered regularly interspaced short palindromic repeat and CRISPR-associated protein 9 (CRISPR/Cas9) system (the detailed protocol is provided in the online Data Supplement). The edited HEK293T cells with EML4-ALK rearrangements (variant 2 and variant 3) were analyzed by NGS to calculate AFs of the edited cells. Then, we mixed the edited cells with HEK293T cells at an expected ratio. The mixtures of cells were then digested to yield M-cfDNA.
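The mixing ratio itself is not given in the text. A minimal sketch of how such a ratio could be derived is shown below, assuming that unedited HEK293T cells contribute no fusion allele and that edited and unedited cells contribute equal amounts of DNA per cell. The clone AF of 25% and the total of 1 × 10⁷ cells are hypothetical inputs chosen for illustration; only the 1.0% target comes from Table 1.

```python
def edited_cell_fraction(target_af, clone_af):
    """Fraction of edited cells needed so that the mixture reaches target_af.

    Assumes wild-type cells carry no fusion allele and that every cell
    contributes the same amount of DNA.
    """
    if not 0 < clone_af <= 1:
        raise ValueError("clone_af must be in (0, 1]")
    if not 0 <= target_af <= clone_af:
        raise ValueError("target_af must be between 0 and clone_af")
    return target_af / clone_af


def cells_to_mix(total_cells, target_af, clone_af):
    """Return (edited_cells, wild_type_cells) for a mixture of total_cells."""
    fraction = edited_cell_fraction(target_af, clone_af)
    edited = round(total_cells * fraction)
    return edited, total_cells - edited


# Hypothetical inputs: clone AF of 25% measured by NGS, 1e7 cells in total.
edited, wild_type = cells_to_mix(total_cells=10_000_000,
                                 target_af=0.01,   # intended AF from Table 1
                                 clone_af=0.25)    # assumed, not from the paper
print(edited, wild_type)
```

With these assumed inputs, 4% of the cells in the mixture would be edited cells.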

TARGETED CAPTURE NGS

The spike-in M-cfDNA samples and plasma cfDNA were subjected to targeted capture massively parallel sequencing. Full protocols are described in the online Data Supplement.

ANALYSIS OF NGS DATA

Sequencing data were processed using a custom bioinformatics pipeline, including quality control, read collapsing, read alignment, and variant calling. Details are described in the online Data Supplement.

EVALUATION OF THE SUITABILITY OF SCQCMS

Ten vials of SCQCMs were shipped to each laboratory. Detailed instructions for storage conditions and assay procedures were provided. The negative control (NC) sample was specially described as normal genomic DNA extracted from normal tissues or blood cells. Laboratories were required to submit their results, including the variants and corresponding AFs, within 4 weeks. In addition, other important information required included the methods, reagents, and instruments used for mutation analysis; detectable ranges and minimum detection limits; enrichment strategy and bioinformatics tools for the alignment to the reference human genome, variant calling, and variant annotation; databases against which the reads were searched; and quality metrics.

Accession codes: Targeted capture massively parallel sequencing data have been deposited in the Sequence Read Archive (SRA) and are available via the BioProject PRJNA361443.

Results

VALIDATION OF SPIKE-IN M-CFDNA SAMPLES

The site-directed mutagenesis DNA and fusion transcripts were confirmed by Sanger sequencing and matched with the predicted sequences by BLAST (Fig. 1B). To ensure the quality of our spike-in M-cfDNA, a spectrophotometer was used to determine the purity (A260/A280 ratio) of the spike-in M-cfDNA. The results revealed that all our samples had high purity [mean (SD) A260/A280 ratio: 1.75 (0.1)]. The results of capillary electrophoresis also confirmed that no contamination of large DNA fragments existed. All the spike-in M-cfDNA samples prepared were analyzed by NGS to confirm the designed mutations and corresponding AFs. Table 1 summarizes the composition of the sample panel: 10 of the samples included tumor-derived mutations, whereas 1 sample included only genomic DNA from HEK293T cells and acted as a control to filter out irrelevant mutations.

We further evaluated the performance of our spike-in M-cfDNA preparation in terms of linear detection. Relationships between AFs obtained by theoretical calculations and by dPCR were also formulated. Sheared DNA fragments with EGFR T790M and a deletion in exon 19 were serially diluted into balanced pools and then blended with M-cfDNA at precise ratios of 20%, 10%, 5%, 2.5%, 1.25%, and 0.625%. For EGFR T790M, the AFs of serially diluted artificial spike-in M-cfDNA samples were linearly related to those detected by dPCR (slope = 0.46, R² = 0.998). In the case of the exon 19 deletion fragment, a similar linear relationship was observed (slope = 0.73, R² = 0.988; Fig. 1C). Using these fitted relationships, we could estimate the AF in our spike-in M-cfDNA in a roughly predictable manner.
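The dilution series and the way a fitted slope can be used to anticipate the dPCR readout can be sketched as follows. The measured dPCR values are not reproduced in the text, so the sketch only applies the reported slopes to the theoretical ratios; the intercept is not reported and is assumed to be zero here. The helper `fit_line` is an ordinary least-squares fit included to show how slope and R² would be obtained from paired values.

```python
def dilution_series(start_percent, steps, factor=2.0):
    """Theoretical AFs of a serial dilution, e.g. 20, 10, 5, ... percent."""
    return [start_percent / factor ** i for i in range(steps)]


def fit_line(x, y):
    """Ordinary least squares; returns (slope, intercept, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Theoretical ratios stated in the text: 20% down to 0.625%.
expected_af = dilution_series(20.0, 6)

# Slopes reported in the text; a zero intercept is an assumption.
REPORTED_SLOPES = {"EGFR T790M": 0.46, "EGFR exon 19 deletion": 0.73}

predicted_dpcr_af = {
    name: [slope * af for af in expected_af]
    for name, slope in REPORTED_SLOPES.items()
}

for name, values in predicted_dpcr_af.items():
    print(name, [round(v, 3) for v in values])
```

Fitting real paired measurements with `fit_line(expected_af, measured_af)` would return the slope and R² in the same form as the values quoted above.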

COMPARISON OF THE PERFORMANCE OF SCQCMS WITH PLASMA SAMPLES

To demonstrate the utility of SCQCMs as quality control materials, we compared the results of the spike-in M-cfDNA analysis by NGS with those of plasma samples, as variations might arise from intrinsic sequence-specific biases during library preparation, sequencing, and the data analysis step owing to sample origin differences. Ten prepared spike-in M-cfDNA samples, plasma samples from 5 patients with tumors, and samples from 5 healthy individuals were subjected to library preparation and target sequencing using a custom target panel on a NextSeq 500 machine (Illumina).

Recent studies have shown that cfDNA fragments are mainly generated by apoptosis-associated enzymatic digestion and that the overall size profile of plasma cfDNA is a unique origin signature. The nuclear architecture, gene structure, and expression level correlate well with cfDNA nucleosome occupancy and are directly footprinted by cfDNA (18), which makes cfDNA naturally different from genomic DNA or mechanical shearing-derived DNA fragments. The size distributions of the spike-in M-cfDNA, plasma cfDNA of patients with breast cancer, and plasma of healthy individuals are shown in Fig. 2A. The most prominent peak was observed at 147 bp in the size distribution of the spike-in M-cfDNA, bearing correspondence to precisely trimmed core particles (19, 20). The spike-in M-cfDNA also exhibited fragments shorter than 147 bp, with minor peaks around 130 bp, corresponding to internally digested nucleosomes, as well as fragments longer than 147 bp, with a weak hump at approximately 166 bp, related to particles with protruding linker DNA. The size distribution profile of our spike-in M-cfDNA obtained by MNase digestion was shifted slightly shorter than the most common fragment length (approximately 167 bp) of cfDNA from patients with tumors and from the plasma of healthy individuals (Fig. 2A). Although previous studies have shown that circulating DNA of patients with cancer is more fragmented than that of healthy individuals (21–23), the distribution profiles of cfDNA for both the patients with early-stage breast cancer and the healthy individuals chosen in our study were comparable. The reason for this discrepancy might be the fluctuating release of tumor DNA into plasma depending on individual-specific factors and stages of tumors (24, 25). This situation was also observed in the plasma of patients with hepatocellular carcinoma (HCC) (26).

Fig. 2. Sequencing quality of sequenced reads derived from the spike-in M-cfDNA samples and plasma cfDNA samples.
The fragment lengths derived from M-cfDNA (red) were generally shorter than the fragment lengths present in the patients and healthy controls (green and blue lines, respectively) (A). FastQC package reports for the spike-in M-cfDNA and for plasma cfDNA sequence reads from a patient with breast cancer, showing Phred quality scores as a function of nucleotide position within sequenced reads and on a per-read basis (B). The introduced genetic variations (including SNV, indel, gene fusion) were identified in the spike-in M-cfDNA samples with the expected mutant AF (C). The AF of each mutation in the spike-in M-cfDNA samples and tumor plasma is shown.

6 Human genes: EML4, Echinoderm microtubule associated protein like 4; ALK, ALK receptor tyrosine kinase; EGFR, epidermal growth factor receptor; KRAS, KRAS proto-oncogene, GTPase.

Quality control metrics of the sequencing data could provide us with a quick insight into sample quality (27). In our study, quality control statistics of the raw sequencing data showed that the spike-in M-cfDNA-derived reads were similar to reads derived from plasma cfDNA of a patient with tumor in the sequencing quality pattern (Fig. 2B). The overall alignment rate of reads generated from the spike-in M-cfDNA samples was almost identical to that of reads from the plasma samples (both >98%), and the average target capture efficiency of the spike-in M-cfDNA samples was similar to that of the plasma samples (51.5% vs 49.0%, respectively; see Table 4 in the online Data Supplement). In addition, the distributions of sequencing depth values in the spike-in M-cfDNA samples and cfDNA from plasma samples were also similar (see Table 4 in the online Data Supplement).

We also explored variant identification by NGS using our SCQCMs. Our SCQCMs contained different types of clinically relevant genetic variations, ranging from SNVs to indels and fusion genes. We successfully identified all the introduced variants by using NGS (Fig. 2C). This demonstrates that introduced spike-in M-cfDNA variants could be precisely detected, verifying their suitability as a quality evaluation standard for a variety of methods of ctDNA mutation detection.

CONSISTENCY OF THE SCQCMS FOR SOMATIC MUTATION DETECTION ACROSS LABORATORIES AND METHODS

We next invited 11 laboratories to participate in a collaborative study to evaluate the suitability of our SCQCMs. In total, 24 data sets were received because 9 laboratories returned data generated by 2 different methods, and 2 laboratories returned data generated by 3 methods. A wide range of detection and informatics methods were used by the laboratories, and no 2 laboratories used identical workflows (see Table 5 in the online Data Supplement).

All the laboratories specified the required quantity of cfDNA extracted from the plasma based on their methods (see Table 5 in the online Data Supplement). Five NGS laboratories determined the cfDNA size distribution to examine whether genomic DNA or long-fragment DNA was present (see Table 5 in the online Data Supplement). It has been demonstrated that in high-quality plasma, long DNA fragments represent only 20% of cfDNA (28, 29). Thus, determination of the fragment size distribution is essential for revealing the contamination of larger normal genomic DNA fragments from cellular DNA. All the laboratories declared that the quantity and size profiles of the SCQCMs met their method-specific requirements for testing ctDNA from clinical samples.

We next evaluated the consistency of SCQCMs from each of the 24 data sets. The results from the laboratories were compared with expected reference results. For the samples with mutations out of the specific detectable range of laboratories, negative results were considered to be correct. More than 95% of laboratories correctly detected the SCQCMs with KRAS G12D SNVs, EGFR T790M, L858R, and a deletion in exon 19, as well as with EML4-ALK variant 2 (Table 1 and Fig. 3). The majority of laboratories correctly detected SCQCMs with the insertion in exon 20 and EML4-ALK variant 3 (87.5% or 21/24 and 79.2% or 19/24, respectively). We also investigated the performance of the laboratories according to the methodologies applied (Fig. 4A). More than 80% of the laboratories using NGS, ARMS, and dPCR detected all of the SCQCMs correctly except for the 71.5% (5/7) concordance of EML4-ALK variant 3 reported by laboratories using ARMS. Because distinct enrichment methods, sequencing platforms, and bioinformatics pipelines of NGS were used in the 11 participating laboratories, the obtained results demonstrated the overall suitability of SCQCMs as quality control materials for those frequently used NGS, ARMS, and dPCR assays.
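The scoring rule and the concordance figures in this paragraph can be illustrated with a short sketch. The function names and the limit-of-detection values in the example calls are hypothetical; only the data set counts and the 21/24 and 19/24 results are taken from the text.

```python
def concordance_percent(correct, total):
    """Share of data sets that reported the expected result, in percent."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * correct / total


def score_result(expected_af, limit_of_detection, reported):
    """Score one data set for a sample that carries the variant.

    A missed variant counts as correct when its expected AF lies below the
    laboratory's stated limit of detection, as described in the text.
    """
    if reported:
        return "correct"
    if expected_af < limit_of_detection:
        return "correct (outside detectable range)"
    return "false negative"


# 9 laboratories with 2 methods each and 2 laboratories with 3 methods each.
data_sets = 9 * 2 + 2 * 3

print(data_sets)
print(round(concordance_percent(21, 24), 1))   # insertion in exon 20
print(round(concordance_percent(19, 24), 1))   # EML4-ALK variant 3

# Hypothetical examples: AFs as fractions, limits of detection assumed.
print(score_result(0.010, 0.020, reported=False))
print(score_result(0.025, 0.010, reported=False))
```

The first example call is scored as correct because a 1% variant is below an assumed 2% detection limit, whereas the second is a false negative.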

To rule out the possibility that the mistakes were due to the SCQCMs, we sought to explore the reasons for the discrepant results (Table 1). In total, 7 false-negative and 8 false-positive results were reported. Laboratory 04 missed 6 mutations, including EML4-ALK variants 2 and 3, T790M, L858R, and the insertion in exon 20. We speculate that the low AFs of the insertion in exon 20 and EML4-ALK rearrangements in the SCQCMs may have contributed to the failure to detect these mutations owing to the 1% limit of detection at that laboratory. However, this reason cannot account for the omission of T790M and L858R because all other laboratories detected sample A correctly. This suggests that laboratory 04 might not have properly validated the process to establish the expected performance characteristics before using the method. Further, 50% (4/8) of false-positive results were generated by laboratory 04 using the ARMS method, whereas other false-positive results appeared to be randomly distributed and varied between the laboratories. Therefore, the errors were unlikely to have been caused by the preparation of the SCQCMs.
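The error totals quoted in this paragraph can be re-derived from the Errors column of Table 1. The sketch below encodes the per-sample false-negative (FN) and false-positive (FP) counts as read from that column and sums them; the dictionary layout is an illustrative choice, not part of the study.

```python
# Per-sample error counts transcribed from the Errors column of Table 1.
TABLE1_ERRORS = {
    "A": {"FN": 2, "FP": 1},
    "B": {"FN": 0, "FP": 0},
    "C": {"FN": 0, "FP": 1},
    "D": {"FN": 3, "FP": 0},
    "E": {"FN": 0, "FP": 1},
    "F": {"FN": 0, "FP": 1},
    "G": {"FN": 0, "FP": 0},
    "H": {"FN": 1, "FP": 0},
    "I": {"FN": 1, "FP": 4},
    "J": {"FN": 0, "FP": 0},
}

total_fn = sum(counts["FN"] for counts in TABLE1_ERRORS.values())
total_fp = sum(counts["FP"] for counts in TABLE1_ERRORS.values())

# The text attributes 4 of the false positives to laboratory 04.
lab04_fp_share = 4 / total_fp

print(total_fn, total_fp, f"{lab04_fp_share:.0%}")
```

The sums reproduce the 7 false-negative and 8 false-positive results, and the 50% share attributed to laboratory 04.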

We also observed that our SCQCMs were suitable for monitoring the ability of NGS assays to differentiate somatic mutations from germline polymorphisms. The detectable genes of the NGS laboratories in this study were variable (see Table 5 in the online Data Supplement). No germline SNVs or indels were reported in any of the data sets (Fig. 3). This indicated that paired tumor-normal SCQCMs, which were based on M-cfDNA and genomic DNA, were identical at each matched site except for the designed mutations.
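The idea behind paired tumor-normal analysis can be sketched as a simple subtraction: any variant also seen in the matched genomic DNA is treated as germline and removed from the cfDNA calls. This is a simplified illustration and not the pipeline used by any participating laboratory; the `Variant` type and the placeholder polymorphism `GENE_X c.100A>G` are hypothetical.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    gene: str
    hgvs_c: str


def somatic_calls(cfdna_variants, genomic_variants):
    """Keep cfDNA variants that are absent from the matched genomic DNA."""
    germline = set(genomic_variants)
    return [v for v in cfdna_variants if v not in germline]


# Hypothetical call sets for one paired sample.
shared_polymorphism = Variant("GENE_X", "c.100A>G")   # placeholder, not real data
spiked_mutation = Variant("EGFR", "c.2369C>T")        # T790M, as in Table 1

cfdna_calls = [shared_polymorphism, spiked_mutation]
genomic_calls = [shared_polymorphism]

somatic = somatic_calls(cfdna_calls, genomic_calls)
print(somatic)
```

Because the SCQCM pair shares the HEK293T background, only the designed mutation should remain after this subtraction.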

CONSISTENCY OF SCQCMS FOR SOMATIC MUTATION AFS ACROSS LABORATORIES AND METHODS

We observed that the mean values of mutant allele percentage reported by the laboratories were close to those expected when the samples were prepared (Fig. 4B). Furthermore, the AFs of SNVs in the NGS group were similar to those in the dPCR group (Fig. 4C). For the deletion in exon 19, the mean AF value in the dPCR group was higher than that in the NGS group. Similar variability of the AFs of the deletion in exon 19 was also found in testing ctDNA from patients with lung adenocarcinoma detected by dPCR and NGS (30). We also noticed that all the coefficients of variation generated by NGS were higher than those of the dPCR approach for each SCQCM (Fig. 4D). We speculated that the simpler procedure of dPCR might have contributed to its high interlaboratory reproducibility.
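For readers who want to reproduce this kind of comparison, the coefficient of variation (CV) of reported AFs can be computed as sketched below. The AF values are invented for illustration and are not data from this study. The sketch also assumes the sample standard deviation, because the text does not state which estimator was used.

```python
from statistics import mean, stdev
from typing import Sequence


def coefficient_of_variation(afs: Sequence[float]) -> float:
    """CV (%) = sample standard deviation / mean * 100."""
    if len(afs) < 2:
        raise ValueError("at least two AF values are needed")
    m = mean(afs)
    if m == 0:
        raise ValueError("mean AF is zero; CV is undefined")
    return 100.0 * stdev(afs) / m


# Hypothetical AFs (%) reported for one variant by two groups of laboratories.
ngs_afs = [1.6, 2.4, 2.0, 2.9, 1.1]
dpcr_afs = [1.9, 2.1, 2.0, 2.2, 1.8]

print(round(coefficient_of_variation(ngs_afs), 1))
print(round(coefficient_of_variation(dpcr_afs), 1))
```

Because the CV is normalized by the mean, it allows variants prepared at different expected AFs to be compared on the same scale.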

Discussion

In this study, we generated synthetic cfDNA as quality control materials with somatic mutations by 2 spike-in approaches based on the use of M-cfDNA: CRISPR/Cas9-edited HEK293T cells and sheared site-directed mutagenesis of DNA fragments engineered to contain previously reported cancer-related mutations, which have a genetic background identical to that of HEK293T cells. We report that the use of SCQCMs has several distinct advantages.

First, the SCQCMs based on M-cfDNA have biological properties similar to those of plasma cfDNA species, such as fragments associated with nucleosome core particles and trimmed nucleosomes (18, 25). Because MNase can digest the DNA between nucleosomes, it can produce a characteristic size profile with a fairly broad peak at 147 bp. The analysis of the spike-in M-cfDNA by NGS also showed a consistent size profile analogous to cfDNA in patients with cancer, as demonstrated in a previous study (31). Thus, SCQCMs should be preferred quality control materials for ctDNA testing over the DNA fragments created by mechanical shearing because the latter have different fragmentation patterns.

Fig. 3. Testing results of the 11 laboratories. The distributions of the results are indicated by the columns of samples between the darkest vertical lines. A blue box indicates that the expected variants were correctly reported with or without false-positive results; a gray box indicates that expected variants were not reported because the expected variant fell outside the specific detectable range; a green box indicates a false-negative result; a red box indicates a false-positive result. The allele frequencies reported are shown inside the boxes.

Second, our SCQCMs are widely applicable for various ctDNA testing methods. NGS tests demonstrated that the spike-in M-cfDNA samples performed analogously to clinical samples from patients with tumor and samples from healthy people, verifying their suitability as highly multiplex control materials for NGS clinical assays. Most importantly, the spike-in M-cfDNA can generate paired tumor-normal SCQCMs with many different mutations using the same genetic background. Therefore, SCQCMs can be applied in all NGS laboratories to identify somatic mutations. NGS-based methods (7, 8) also vary by their multiple enrichment strategies, the different characteristics of commercial sequencing platforms based on different chemistry approaches, dissimilar bioinformatics pipelines, and distinct databases. By applying the spike-in M-cfDNA for the evaluation of different capturing and enrichment strategies and over different platforms by use of diverse ctDNA analysis pipelines, we illustrated the broad utility of the spike-in M-cfDNA.

Third, the spike-in M-cfDNA can be mixed at precise ratios to generate a standard to assess quantitative features such as AFs. When total cfDNA is degraded, DNA is released from both normal and tumor cells, and so the variant allelic fraction in ctDNA can be as low as 0.01% (9, 32), which requires highly sensitive methodologies. Hence, such a standard is important for the validation of analytical sensitivity in clinical ctDNA assays. Our future studies will apply SCQCMs in the evaluation of different methods, and variants with AFs down to 0.01% will be included to evaluate method sensitivity.
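The arithmetic behind mixing at a target AF, and the reason a 0.01% AF is so demanding, can be sketched as follows. This is an illustrative calculation rather than the preparation protocol used in the study. The function names are hypothetical, the AF is defined as mutant copies divided by total copies at the locus, and the figure of about 3.3 pg of DNA per haploid human genome is a commonly used approximation that is assumed here, not a value taken from the text.

```python
PG_PER_HAPLOID_GENOME = 3.3  # assumed approximation, in picograms


def spike_in_copies(wild_type_copies: float, target_af: float) -> float:
    """Mutant copies m needed so that m / (m + w) equals target_af (a fraction)."""
    if not 0.0 <= target_af < 1.0:
        raise ValueError("target_af must be in [0, 1)")
    return target_af * wild_type_copies / (1.0 - target_af)


def expected_mutant_copies(input_ng: float, af: float) -> float:
    """Expected mutant copies in a given DNA input at allele fraction af."""
    haploid_copies = input_ng * 1000.0 / PG_PER_HAPLOID_GENOME
    return haploid_copies * af


# 1% AF against 9900 wild-type copies needs about 100 mutant copies.
print(round(spike_in_copies(9900, 0.01), 3))

# At 0.01% AF, a hypothetical 30 ng input carries under one mutant copy on average.
print(round(expected_mutant_copies(30, 0.0001), 3))
```

The second result shows that, under these assumptions, sampling alone limits detection at very low AFs regardless of assay chemistry.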

Finally, SCQCMs can be applied for monitoring false-positive and false-negative results in the quality control of ctDNA testing workflow in laboratories. Mu et al. (33) showed that 1.3% of the variants detected by NGS were revealed to be false-positives by Sanger sequencing. Other studies also reported discordant NGS results from different laboratories (34, 35). The errors observed in our study demonstrated that it is essential to have suitable quality control materials to ensure that different assays generate comparable results.

Despite the proven suitability of our SCQCMs as quality evaluation standards, there are 2 limitations to our approach. First, our samples did not involve a cfDNA extraction process. Isolation of cfDNA from small amounts of plasma remains one of the greatest technical challenges that we still face with regard to ctDNA genotyping (36, 37). However, it is nearly impossible for plasma to be used as a quality control material for ctDNA somatic mutation testing. On the one hand, a mutation in ctDNA is difficult to characterize because of plasma DNA complexity and low frequency (0.01%) of variant AFs in ctDNA (38). On the other hand, it is impractical to prepare mimic quality control materials by mixing fragmented DNA with the plasma because the genetic backgrounds of plasma DNA and fragmented DNA are different. In contrast, our SCQCMs based on M-cfDNA can be well characterized, thereby eliminating the need to obtain limited plasma specimens. Second, recent studies have shown that some tumor-derived cfDNA is highly fragmented and mainly consists of fragments <147 bp and even <100 bp (21–23, 26), revealing that mechanisms other than apoptosis might exist to explain the origin of cfDNA. In this situation, the quality control materials might not be optimal. However, as apoptosis is the main driver for the release of DNA in the circulation, SCQCMs should be desirable as quality control materials for ctDNA testing in most cases.

Fig. 4. Summary of the results from the spike-in M-cfDNA carrying different somatic mutations. The correct rates of samples A–J reported by the laboratories using NGS, ARMS, and dPCR (A). The maximum, minimum, and mean values of AFs of each variant reported by all the laboratories and the expected AFs of each variant (B). The AFs of each variant reported by the laboratories using NGS and dPCR (C). The mean values and CVs of the AFs of each variant reported by the laboratories using NGS and dPCR (D). NR: not related.

Author Contributions: All authors confirmed they have contributed to the intellectual content of this paper and have met the following 3 requirements: (a) significant contributions to the conception and design, acquisition of data, or analysis and interpretation of data; (b) drafting or revising the article for intellectual content; and (c) final approval of the published article.

Authors’ Disclosures or Potential Conflicts of Interest: Upon manuscript submission, all authors completed the author disclosure form. Disclosures and/or potential conflicts of interest:

Employment or Leadership: None declared.
Consultant or Advisory Role: None declared.
Stock Ownership: None declared.
Honoraria: None declared.
Research Funding: R. Zhang, Beijing Natural Science Foundation (Grant 7174345) to Beijing Hospital and National Natural Science Foundation of China (Grant 81601848) to Beijing Hospital; J. Li, the Special Fund for Health Scientific Research in the Public Interest from National Population and Family Planning Commission of the People’s Republic of China (No. 201402018) to Beijing Hospital.
Expert Testimony: None declared.
Patents: None declared.

Role of Sponsor: The funding organizations played no role in the design of study, review and interpretation of data, or preparation or approval of manuscript.

References

1. Alix-Panabieres C, Pantel K. Clinical applications of circulating tumor cells and circulating tumor DNA as liquid biopsy. Cancer Discov 2016;6:479–91.

2. Heitzer E, Ulz P, Geigl JB. Circulating tumor DNA as a liquid biopsy for cancer. Clin Chem 2015;61:112–23.

3. Garcia-Murillas I, Schiavon G, Weigelt B, Ng C, Hrebien S, Cutts RJ, et al. Mutation tracking in circulating tumor DNA predicts relapse in early breast cancer. Sci Transl Med 2015;7:302ra133.

4. Diehl F, Schmidt K, Choti MA, Romans K, Goodman S, Li M, et al. Circulating mutant DNA to assess tumor dynamics. Nat Med 2008;14:985–90.

5. Brychta N, Krahn T, von Ahsen O. Detection of KRAS mutations in circulating tumor DNA by digital PCR in early stages of pancreatic cancer. Clin Chem 2016;62:1482–91.

6. Aung KL, Donald E, Ellison G, Bujac S, Fletcher L, Cantarini M, et al. Analytical validation of BRAF mutation testing from circulating free DNA using the amplification refractory mutation testing system. J Mol Diagn 2014;16:343–9.

7. Newman AM, Bratman SV, To J, Wynne JF, Eclov NC, Modlin LA, et al. An ultrasensitive method for quantitating circulating tumor DNA with broad patient coverage. Nat Med 2014;20:548–54.

8. Newman AM, Lovejoy AF, Klass DM, Kurtz DM, Chabon JJ, Scherer F, et al. Integrated digital error suppression for improved detection of circulating tumor DNA. Nat Biotechnol 2016;34:547–55.

9. Volik S, Alcaide M, Morin RD, Collins C. Cell-free DNA (cfDNA): clinical significance and utility in cancer shaped by emerging technologies. Mol Cancer Res 2016;14:898–908.

10. Robasky K, Lewis NE, Church GM. The role of replicates for error mitigation in next-generation sequencing. Nat Rev Genet 2014;15:56–62.

11. Sims D, Sudbery I, Ilott NE, Heger A, Ponting CP. Sequencing depth and coverage: key considerations in genomic analyses. Nat Rev Genet 2014;15:121–32.

12. Gargis AS, Kalman L, Berry MW, Bick DP, Dimmock DP, Hambuch T, et al. Assuring the quality of next-generation sequencing in clinical laboratory practice. Nat Biotechnol 2012;30:1033–6.

13. Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res 2001;29:308–11.

14. 1000 Genomes Project Consortium, Abecasis GR, Auton A, Brooks LD, DePristo MA, Durbin RM, et al. An integrated map of genetic variation from 1,092 human genomes. Nature 2012;491:56–65.

15. Liu Y, Loewer M, Aluru S, Schmidt B. SNVSniffer: an integrated caller for germline and somatic single-nucleotide and indel mutations. BMC Syst Biol 2016;10:47.

16. Harkins SB, Tomson FL, Anekella B, Garlick R. Methodological considerations in the preparation of biomimetic reference materials for ctDNA assays. Cancer Res 2016;76:3961.

17. Fan HC, Blumenfeld YJ, Chitkara U, Hudgins L, Quake SR. Noninvasive diagnosis of fetal aneuploidy by shotgun sequencing DNA from maternal blood. Proc Natl Acad Sci U S A 2008;105:16266–71.

18. Snyder MW, Kircher M, Hill AJ, Daza RM, Shendure J. Cell-free DNA comprises an in vivo nucleosome footprint that informs its tissues-of-origin. Cell 2016;164:57–68.

19. Cole HA, Cui F, Ocampo J, Burke TL, Nikitina T, Nagarajavel V, et al. Novel nucleosomal particles containing core histones and linker DNA but no histone h1. Nucleic Acids Res 2016;44:573–81.

20. Henikoff JG, Belsky JA, Krassovsky K, MacAlpine DM, Henikoff S. Epigenome characterization at single base-pair resolution. Proc Natl Acad Sci U S A 2011;108:18318–23.

21. Mouliere F, Robert B, Arnau Peyrotte E, Del Rio M, Ychou M, Molina F, et al. High fragmentation characterizes tumour-derived circulating DNA. PloS One 2011;6:e23418.

22. Thierry AR, El Messaoudi S, Gahan PB, Anker P, Stroun M. Origins, structures, and functions of circulating DNA in oncology. Cancer Metastasis Rev 2016;35:347–76.

23. Thierry AR, Mouliere F, Gongora C, Ollier J, Robert B, Ychou M, et al. Origin and quantification of circulating DNA in mice with human colorectal cancer xenografts. Nucleic Acids Res 2010;38:6159–75.

24. Garcia-Olmo DC, Picazo MG, Toboso I, Asensio AI, Garcia-Olmo D. Quantitation of cell-free DNA and RNA in plasma during tumor progression in rats. Mol Cancer 2013;12:8.

25. Jiang P, Lo YM. The long and short of circulating cell-free DNA and the ins and outs of molecular diagnostics. Trends Genet 2016;32:360–71.

26. Jiang P, Chan CW, Chan KC, Cheng SH, Wong J, Wong VW, et al. Lengthening and shortening of plasma DNA in hepatocellular carcinoma patients. Proc Natl Acad Sci U S A 2015;112:E1317–25.

27. Guo Y, Ye F, Sheng Q, Clark T, Samuels DC. Three-stage quality control strategies for DNA re-sequencing data. Brief Bioinform 2014;15:879–89.

28. Chan KC, Yeung SW, Lui WB, Rainer TH, Lo YM. Effects of preanalytical factors on the molecular size of cell-free DNA in blood. Clin Chem 2005;51:781–4.

29. Medina Diaz I, Nocon A, Mehnert DH, Fredebohm J, Diehl F, Holtrup F. Performance of Streck cfDNA blood collection tubes for liquid biopsy testing. PloS One 2016;11:e0166354.

30. Yang X, Zhuo M, Ye X, Bai H, Wang Z, Sun Y, et al. Quantification of mutant alleles in circulating tumor DNA can predict survival in lung cancer. Oncotarget 2016;7:20810–24.

31. Kwiatkowski DJ, Underhill HR, Kitzman JO, Hellwig S, Welker NC, Daza R, et al. Fragment length of circulating tumor DNA. PLoS Genet 2016;12:e1006162.


32. Diaz LA, Jr., Bardelli A. Liquid biopsies: genotyping circulating tumor DNA. J Clin Oncol 2014;32:579–86.

33. Mu W, Lu H-M, Chen J, Li S, Elliott AM. Sanger confirmation is required to achieve optimal sensitivity and specificity in next-generation sequencing panel testing. J Mol Diagn 2016;18:923–32.

34. Kuderer NM, Burton KA, Blau S, Rose AL, Parker S, Lyman GH, Blau CA. Comparison of 2 commercially available next-generation sequencing platforms in oncology. JAMA Oncol 2017;3:996–8.

35. Zhang R, Ding J, Han Y, Yi L, Xie J, Yang X, et al. The reliable assurance of detecting somatic mutations in cancer-related genes by next-generation sequencing: the results of external quality assessment in China. Oncotarget 2016;7:58500–15.

36. Devonshire AS, Whale AS, Gutteridge A, Jones G, Cowen S, Foy CA, Huggett JF. Towards standardisation of cell-free DNA measurement in plasma: controls for extraction efficiency, fragment size bias and quantification. Anal Bioanal Chem 2014;406:6499–512.

37. Malentacchi F, Pizzamiglio S, Verderio P, Pazzagli M, Orlando C, Ciniselli CM, et al. Influence of storage conditions and extraction methods on the quantity and quality of circulating cell-free DNA (ccfDNA): the SPIDIA-DNAplas external quality assessment experience. Clin Chem Lab Med 2015;53:1935–42.

38. Crowley E, Di Nicolantonio F, Loupakis F, Bardelli A. Liquid biopsy: monitoring cancer-genetics in the blood. Nat Rev Clin Oncol 2013;10:472–84.
