

CERGENTIS B.V. YALELAAN 62 3584 CM UTRECHT THE NETHERLANDS 0031 - (0)30 - 760 16 36 [email protected] WWW.CERGENTIS.COM

TLA TECHNOLOGY & TARGETED COMPLETE NGS SEQUENCING

Features & applications of the human genomic DNA TLA protocol

INTRODUCTION

Cergentis’ TLA Technology:

• enables targeted complete gene sequencing.
• requires one primer pair complementary to a short locus specific sequence.
• detects all Single Nucleotide Variants and Structural Variants.
• enables haplotyping.
• is easy to execute with standard laboratory equipment.

The original TLA protocol (Nature Biotechnology 2014¹) requires cells as input material. This application note describes the features and applications of the TLA protocol for isolated human genomic DNA.

LABORATORY AND INPUT DNA REQUIREMENTS

The gDNA TLA protocol only requires standard laboratory equipment. The protocol is straightforward and can be performed in 2 days.

TLA analyses on DNA isolated with standard protocols enable the amplification and sequencing of >70 kb per primer pair. High Molecular Weight DNA yields coverage across a larger region per TLA amplification. Multiplex TLA amplifications can be performed to sequence multiple or larger loci.

The current protocol requires 5 µg of DNA. Smaller amounts of DNA (>10 ng) can be amplified with Whole Genome Amplification prior to a TLA analysis. TLA primers can be designed and ordered quickly from any oligonucleotide manufacturer. TLA thus enables both routine screening and the flexible targeted sequencing of individual loci in individual samples.

TLA TECHNOLOGY

1 http://www.nature.com/nbt/journal/v32/n10/full/nbt.2959.html

Figure 1. Overview of TLA-based amplification and sequencing of a locus of interest. TLA amplifications use one primer pair complementary to a short locus specific sequence. NGS sequencing coverage (i.e. the number of NGS sequencing reads) is highest in the immediate vicinity of the locus specific sequence and declines with greater physical distance from it.

[Figure 1 labels: sequencing coverage; PCR primers; locus specific sequence; 20 - 50 kb on either side of the locus specific sequence; locus.]

Figure 2. A summary of the TLA Technology.

First, genomic DNA is crosslinked. Crosslinking preferentially occurs between sequences in extreme physical proximity and therefore joins sequences from the same locus (depicted in red).

The crosslinked DNA is fragmented, religated with a ligase enzyme and then decrosslinked. This results in TLA template: long stretches of DNA consisting of religated DNA fragments originating from the same locus.

This template is fragmented and circularised. Stochastic variation in the folding, crosslinking and religation of DNA fragments in individual copies of a locus results in a repertoire of DNA circles that are composed of unique combinations of DNA fragments from that locus.

Circular fragments originating from the locus of interest are amplified with inverse primers complementary to a short locus-specific sequence. As a result, the complete locus is amplified and can be sequenced using Next Generation Sequencing technologies.

In this manner the TLA Technology enables targeted hypothesis-neutral sequencing. It detects all sequence and structural variants in loci of interest, including in heterogeneous samples such as tumours.

The TLA Technology permits multiplexing. Multiple loci can be amplified in multiplex and/or in multiple individual amplifications.


COMPLETE BRCA1 AND SERPINA1 GENE SEQUENCING

Figure 3 shows the results of TLA based targeted sequencing of the BRCA1 and SERPINA1 genes on DNA isolated from the NA12878 cell line².

Four TLA amplifications were performed across the BRCA1 gene and two across the SERPINA1 gene. TLA amplicons were prepared as libraries with Illumina® Nextera XT™ and sequenced on the Illumina MiniSeq® with 150 bp paired-end reads. The resulting reads were mapped with BWA-SW, and Single Nucleotide Variants (SNVs) were called with samtools mpileup. Identified variants were compared to the public NA12878 genome sequence.

2 http://www.nist.gov/mml/bbd/ppgenomeinabottle2.cfm Data release version Pedigreev0.2

[Figure 3 plot residue. Panels: BRCA1 (position on chr17) and SERPINA1 (position on chr14); y-axes: NGS coverage depth (A) and allele frequency [%], 0 - 100 (B); scale bars: 10 Kb and 2 Kb.]

Table 1

                        BRCA1                 SERPINA1
                        Exons      Introns    Exons      Introns
Region size [bp]        7,362      73,827     3,687      10,259
Bases covered [%]       100        99.533     100        100
Bases >10X [%]          100        99.149     100        100
Bases >20X [%]          100        98.843     100        100
Bases >30X [%]          99.946     98.639     100        100
Bases >50X [%]          98.343     98.148     100        100
Bases >100X [%]         84.651     93.813     97.261     97.895
Min coverage            22x        0x         75x        68x
Median coverage         744.5x     435x       1017x      317x

Table 2

                        BRCA1          SERPINA1
Total reads             3,000,000      500,000
Mapped reads            2,976,887      496,580
Mapped reads [%]        99.230         99.316
Total mapped bases      722,614,711    129,059,094
Bases on target         398,756,872    70,990,884
Bases on target [%]     55.182         55.006
Gene size [bp]          81,189         13,946

Tables 1 and 2: Statistics of generated sequencing data across BRCA1 and SERPINA1.
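The percentages in Table 2 follow directly from the read and base counts. As a quick check, the minimal Python sketch below recomputes them from the tabulated counts; the helper name `percent` and the dictionary layout are illustrative choices, not part of the TLA analysis pipeline. The recomputed values agree with Table 2 to within rounding of the last digit.

```python
def percent(part: int, whole: int, digits: int = 3) -> float:
    """Return part/whole as a percentage rounded to `digits` decimals."""
    return round(100.0 * part / whole, digits)


# Counts copied from Table 2.
table2 = {
    "BRCA1": {
        "total_reads": 3_000_000,
        "mapped_reads": 2_976_887,
        "mapped_bases": 722_614_711,
        "on_target_bases": 398_756_872,
    },
    "SERPINA1": {
        "total_reads": 500_000,
        "mapped_reads": 496_580,
        "mapped_bases": 129_059_094,
        "on_target_bases": 70_990_884,
    },
}

for gene, counts in table2.items():
    mapped_pct = percent(counts["mapped_reads"], counts["total_reads"])
    on_target_pct = percent(counts["on_target_bases"], counts["mapped_bases"])
    print(f"{gene}: mapped reads {mapped_pct}%, bases on target {on_target_pct}%")
```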

Figure 3. A) Coverage profiles generated across BRCA1 & SERPINA1. White arrows indicate the positions of the TLA primer pairs. B) Allelic ratios of all known SNVs in BRCA1 and SERPINA1 (blue = heterozygous, red = homozygous).

One AT-rich sequence of 347 bp in BRCA1 was not sequenced because it is not successfully prepped and/or sequenced with the cited combination of Illumina library prep and sequencing. Otherwise, complete sequence information was obtained across the entire human BRCA1 and SERPINA1 genes, and all previously identified SNVs (110 in BRCA1 and 27 in SERPINA1) were identified with the correct zygosity (Figure 3). Coverage statistics are shown in Tables 1 and 2.
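Statistics of the kind listed in Table 1 can be derived from a list of per-base read depths across a region. The sketch below is one possible way to do so in Python and is not the method used to produce Table 1. It assumes that "Bases >NX" means strictly more than N reads, and that the depths come from a per-base depth report (for example the depth column of `samtools depth -a` output). The function name and the toy depths are made up for illustration.

```python
from statistics import median


def coverage_summary(depths, thresholds=(10, 20, 30, 50, 100)):
    """Summarise per-base depths of one region in the style of Table 1."""
    n = len(depths)
    if n == 0:
        raise ValueError("depths must not be empty")
    summary = {
        "region_size_bp": n,
        # A base counts as covered when at least one read overlaps it.
        "bases_covered_pct": 100.0 * sum(d > 0 for d in depths) / n,
        "min_coverage": min(depths),
        "median_coverage": median(depths),
    }
    for t in thresholds:
        # Assumption: ">NX" is read as strictly greater than N.
        summary[f"bases_gt_{t}x_pct"] = 100.0 * sum(d > t for d in depths) / n
    return summary


# Toy depths for a 10 bp region (made-up numbers, not TLA data).
toy_depths = [0, 5, 12, 25, 40, 60, 120, 150, 200, 300]
stats = coverage_summary(toy_depths)
for name, value in stats.items():
    print(f"{name}: {value}")
```

Exons and introns would be summarised separately by passing the depths of each annotated interval set to the function.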


COMPLETE SEQUENCING OF SMALL AND LARGE STRUCTURAL VARIANTS

TLA is highly suited to completely sequencing both small and large structural changes in genes of interest. Figure 4 shows the results of a TLA analysis across the T cell receptor alpha-delta (TCRAD) locus in a sample harbouring a chr8-chr14 translocation. Figure 5 shows the results of targeted sequencing of BRCA1 and NRXN1 in patient samples³,⁴.

3 Kind permission to share generated data was given by Dr. Andreas Rump of the Institut für Klinische Genetik, Dresden, Germany.

4 Kind permission to share generated data was given by Prof. Dr. Hilde Peeters of the Centre for Human Genetics, Leuven, Belgium.

[Figure 4 plot residue. A) Whole genome coverage per chromosome (chr1 to chr22 and chrX) against genomic position [Mb]. B) NGS coverage depth across PVT1 (chr8, 129.10 - 129.16 Mb) and DAD1 (chr14, 22.98 - 23.04 Mb); scale bars: 10 Kb. Breakpoint sequence as shown in the figure: CATGTAAGTGATGAGAGGAGAT GAACCTTGGGGGGCA GGATAGCAACTATCAGTTAATCTGG]

Figure 4. A) Whole genome coverage plot and B) locus specific coverage plots generated with a TLA primer pair (white arrow) in a sample with a chr8-chr14 translocation. Peaks in coverage across the TCRAD locus and fusion partner are encircled in red. The identified breakpoint sequence is specified.
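The idea behind the whole genome coverage plot in Figure 4 A can be illustrated with a small Python sketch: reads are counted in fixed-size genomic bins, and bins on chromosomes other than the one carrying the TLA primer pair are reported when they hold many reads. This is a toy illustration rather than the analysis used for Figure 4. The function name, bin size, read threshold and all read positions are assumptions chosen for the example.

```python
from collections import defaultdict


def candidate_partner_bins(read_positions, target_chrom,
                           bin_size=10_000, min_reads=50):
    """Return (chrom, bin_start, read_count) for read-rich bins off the target chromosome."""
    counts = defaultdict(int)
    for chrom, pos in read_positions:
        counts[(chrom, pos // bin_size)] += 1
    hits = [
        (chrom, bin_index * bin_size, n)
        for (chrom, bin_index), n in counts.items()
        if chrom != target_chrom and n >= min_reads
    ]
    # Strongest candidate first.
    return sorted(hits, key=lambda hit: -hit[2])


# Made-up read positions: many reads around the primer site on chr14,
# a cluster on chr8 (the hypothetical fusion partner), sparse background on chr2.
toy_reads = (
    [("chr14", 23_000_000 + i * 10) for i in range(500)]
    + [("chr8", 129_120_000 + i * 10) for i in range(200)]
    + [("chr2", 50_000_000 + i * 1_000_000) for i in range(20)]
)

hits = candidate_partner_bins(toy_reads, target_chrom="chr14")
for chrom, start, n in hits:
    print(f"{chrom}:{start}-{start + 10_000} holds {n} reads")
```

In real data the threshold would need to be set relative to the background coverage, and the exact breakpoint would still have to be read from the sequences spanning the fusion.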

[Figure 5 plot residue. A) NGS coverage depth against genomic position on chr17 [Mb] (41.16 - 41.28; genes labelled IFI35, VAT1, RND2, BRCA1) and on chr2 [Mb] (50.93 - 50.98; NRXN1 intron); scale bars: 10 Kb. B) Log R ratio (−0.8 to 0.2) across NRXN1. Breakpoint sequences as shown in the figure: ACCCCCGCCTCCCAGGTTCAGGCGATTCTCC and GAGATTTTTAAATCAGAGT ... AGATTTAA AAACAGAGATT]

Figure 5. A) IGV coverage profiles across deletions in the NRXN1 and BRCA1 genes, generated with one TLA primer pair per sample (white arrows). The positions of the deletions and the sequences of the identified deletion breakpoints are shown. The breakpoint of the NRXN1 deletion contained a sequence insertion, which is shown in black. B) Log R ratios across the same deletion in the NRXN1 gene generated using a SNV microarray.


[Figure 7 C plot residue: NGS coverage against position on chr17 (41200000 - 41260000); panel title: ILLUMINA PAIRED-END READS.]

Figure 7 D) Coverage statistics for the coverage assigned to each allele:

                                >0x      >=10x
Allele 1   exons + introns      97.4     95.3
Allele 1   exons                100      99.6
Allele 2   exons + introns      98.1     96.4
Allele 2   exons                99.9     99.4

PHASING USING SHORT READ PAIRED-END SEQUENCING

In combination with paired-end Illumina sequencing, the unique composition of the TLA reads enables phasing across large distances and the assembly of sequencing reads in their allele of origin.

Figure 7 shows the results of sequencing and phasing of the BRCA1 gene in the NA12878 cell-line. The sample was sequenced using paired-end Illumina sequencing (4 million reads, 2 X 150 bp). Of the 110 known SNVs, 109 are heterozygous. All 109 heterozygous SNVs were phased using TLA data.

After the heterozygous SNVs have been phased, each NGS read containing one of these SNVs can be assigned to one of the two alleles. Dividing the reads by their allele of origin resulted in complete coverage of both alleles (apart from the AT stretch that is missed in each experiment). This shows that both alleles of the gene were captured and sequenced.
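The assignment step described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the Cergentis implementation: the phased haplotypes are taken as given, each read (or read pair) is represented only by the bases it shows at phased SNV positions, and a simple majority vote decides the allele. The function name, SNV positions and bases are hypothetical.

```python
def assign_read(read_alleles, haplotypes):
    """Assign one read (or read pair) to 'allele1', 'allele2', or None.

    read_alleles: {snv_position: observed_base} for the SNVs the read covers.
    haplotypes:   {snv_position: (base_on_allele1, base_on_allele2)}.
    """
    votes = [0, 0]
    for pos, base in read_alleles.items():
        if pos not in haplotypes:
            continue  # not a phased heterozygous SNV
        base1, base2 = haplotypes[pos]
        if base == base1:
            votes[0] += 1
        elif base == base2:
            votes[1] += 1
    if votes[0] > votes[1]:
        return "allele1"
    if votes[1] > votes[0]:
        return "allele2"
    return None  # no informative SNV, or conflicting evidence


# Hypothetical phased SNVs: position -> (base on allele 1, base on allele 2).
haplotypes = {101: ("A", "G"), 250: ("C", "T"), 900: ("G", "A")}

reads = [
    {101: "A", 250: "C"},  # both SNVs match allele 1
    {250: "T", 900: "A"},  # both SNVs match allele 2
    {500: "C"},            # covers no phased SNV
    {101: "A", 250: "T"},  # conflicting evidence
]
assignments = [assign_read(read, haplotypes) for read in reads]
print(assignments)
```

Reads that return `None` stay unassigned. Per-allele coverage can then be computed from the assigned reads only.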

Figure 7. A) Principle of TLA based phasing using paired-end Illumina sequencing data: SNVs found in the same read pair can be assigned to the same allele. B) Resulting phasing of heterozygous SNVs across the entire BRCA1 gene. C) Sequencing coverage and identified SNVs across the BRCA1 locus in the complete NGS data and in reads assigned to each of the individual alleles. D) Coverage statistics across each allele.

[Figure 7 panel labels A) to D) and nucleotide colour legend (A, C, G, T).]

[Figure 6 plot residue: allele frequency [%] (5, 25, 50, 75, 95) against genomic position on chr17 [Mb] (41.21 - 41.26) for the five mixing ratios; scale bar: 10 Kb.]

Figure 6. Allele frequencies of SNVs across BRCA1 in five serial dilutions of two homozygous cell lines. The blue arrow indicates the position of the single TLA primer pair.

TARGETED DEEP SEQUENCING

TLA based deep sequencing enables the detection of rare variants. Figure 6 shows the allelic frequency of SNVs in homozygous cell lines mixed in 95/5, 75/25, 50/50, 25/75, and 5/95 ratios, in which BRCA1 was sequenced with a single TLA primer pair.
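How much depth a rare variant needs can be estimated with a simple binomial model. The Python sketch below is a back-of-the-envelope illustration, not part of the TLA protocol: it assumes that every read covering a position samples the variant allele independently with probability equal to the allele frequency, and it ignores sequencing errors and amplification bias. The function name and the example depths are assumptions.

```python
from math import comb


def prob_at_least(k, depth, freq):
    """P(X >= k) for X ~ Binomial(depth, freq): chance of seeing k or more variant reads."""
    below = sum(
        comb(depth, i) * freq**i * (1.0 - freq) ** (depth - i)
        for i in range(k)
    )
    return 1.0 - below


# Chance of observing at least 5 variant reads for a variant present at 5%
# (as in the 95/5 mixture) at a few example depths.
for depth in (50, 100, 200, 400):
    p = prob_at_least(5, depth, 0.05)
    print(f"depth {depth}x: P(at least 5 variant reads) = {p:.3f}")
```

Under these assumptions the probability rises with depth, which is why deep coverage across the whole locus matters for detecting low-frequency variants.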


[Figure 8 B plot residue: NGS coverage against position on chr17 (41200000 - 41260000); panel title: SMRT READS.]

CONCLUSION

TLA based targeted sequencing presents unique opportunities for the targeted complete sequencing of genes of interest. It enables the (deep) sequencing of entire exonic and intronic regions and, as such, the detection of all Single Nucleotide Variants and Structural Variants. In combination with both short and long read Next Generation Sequencing technologies, TLA also enables the phasing of regions of interest.

The TLA protocol is easy to execute and is suited to both routine analyses and the highly flexible targeted sequencing of individual regions of interest in individual samples.

Links:

www.cergentis.com

http://www.nature.com/nbt/journal/v32/n10/full/nbt.2959.html

http://www.pacb.com/wp-content/uploads/AppNote-Targeted-Sequencing-Chromosomal-Haplotype-Assembly-Cergentis-TLA-Technology-SMRT-Sequencing.pdf

Illumina, Nextera and MiniSeq are trademarks or registered trademarks of Illumina, Inc. Pacific Biosciences, PacBio, and SMRT are trademarks of Pacific Biosciences. All other trademarks are the sole property of their respective owners. For Research Use Only. Not for use in diagnostic procedures. Information in this document is subject to change without notice. Cergentis assumes no responsibility for any errors or omissions in this document.

TLA, SMRT® SEQUENCING AND PHASING

Phasing is particularly effective in combination with long read sequencing technologies. Pacific Biosciences SMRT based sequencing enables the sequencing of entire TLA amplicons and therefore the phasing of all sequences that occur within one TLA amplicon (Figure 8).

In this experiment a single TLA amplification of the BRCA1 gene was performed in the NA12878 cell-line and sequenced using SMRT sequencing (79,323 CCS reads). 107/109 heterozygous SNVs in BRCA1 were captured and phased. Based on the phasing information the SMRT reads could be assigned to their allele of origin. This showed that 99% of both alleles were sequenced.

Figure 8. A) Principle of TLA based phasing using Pacific Biosciences SMRT Sequencing. B) Sequencing coverage and identified SNVs across the BRCA1 locus in all data and in reads assigned to each of the individual alleles.
