CERGENTIS B.V. YALELAAN 62 3584 CM UTRECHT THE NETHERLANDS 0031 - (0)30 - 760 16 36 [email protected] WWW.CERGENTIS.COM

TLA-BASED COMPLETE TRANSGENE & INTEGRATION SITE SEQUENCING IN CHO CELL DEVELOPMENT AND SELECTION

INTRODUCTION

Cergentis’ TLA Technology (Nature Biotech 2014; see footnote 1) uniquely enables the efficient, targeted and complete Next Generation Sequencing (NGS) of transgenes and their integration sites. TLA analyses are therefore very useful in CHO cell development and selection processes.

1 http://www.nature.com/nbt/journal/vaop/ncurrent/full/nbt.2959.html
2 www.chogenome.org

TLA analyses can determine:
• The genomic position(s) of integration site(s).
• The number of integration sites.
• Single Nucleotide Variants in the transgene sequence.
• Structural changes in the transgene sequence.
• Sequence changes resulting from targeted (trans)gene editing.

This information can answer the following questions in different stages of clone selection:

Daughter Clones
• Are all daughter clones genetically identical to the original mother clone?
• Are daughter clones genetically stable or do sequence and/or structural changes occur?

TLA TECHNOLOGY

The TLA Technology enables the targeted amplification and NGS of any locus of interest using just one primer pair complementary to a short sequence unique to the locus (Figure 1).

EXPERIMENTAL SET-UP

In typical transgene sequencing analyses, primer sets complementary to short transgene-specific sequences are used. Such TLA analyses provide sequence information across the entire transgene sequence and across the loci in the CHO genome where the transgene has integrated (Figure 2).

TLA products are library prepped and sequenced using NGS. NGS reads are mapped using BWA-SW, a Smith-Waterman alignment tool. BWA-SW allows partial mapping of reads, which makes it well suited for identifying breakpoint-spanning reads. Unless another CHO genome sequence is available, the CHO-K1-v1 reference sequence (see footnote 2) is used for mapping.
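As an illustration of how partially mapped reads can flag candidate breakpoints, the sketch below inspects the CIGAR string of an alignment record and reports reads with a long clipped end. It assumes the aligner output is in SAM format, where unaligned read ends appear as S (soft clip) or H (hard clip) operations. The function names and the 20-base threshold are hypothetical choices made for this example; they do not describe the Cergentis analysis pipeline.

```python
import re

# One CIGAR operation: a length followed by a SAM operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def clipped_ends(cigar):
    """Return (left_clip, right_clip): clipped bases at each end of a read."""
    ops = [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]
    left = right = 0
    # Clipping only occurs at the ends of a CIGAR string (H may precede S).
    for length, op in ops:
        if op not in "SH":
            break
        left += length
    for length, op in reversed(ops):
        if op not in "SH":
            break
        right += length
    return left, right


def is_breakpoint_candidate(cigar, min_clip=20):
    """True if at least `min_clip` bases at one read end did not align.

    `min_clip` is an assumed threshold for this sketch, not a published value.
    """
    return max(clipped_ends(cigar)) >= min_clip


for cigar in ["100M", "60M40S", "35S65M"]:
    print(cigar, clipped_ends(cigar), is_breakpoint_candidate(cigar))
```

A read reported as `60M40S` aligns over its first 60 bases only; the remaining 40 bases may map elsewhere, for example in the transgene, and are worth re-examining as a possible junction.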

Figure 1: Overview of TLA-based amplification and sequencing of a locus of interest. TLA amplifications use one primer pair complementary to a short locus-specific sequence. Generated NGS sequencing coverage (i.e. the number of NGS sequencing reads) is highest in the immediate vicinity of the locus-specific sequence and declines with greater physical distance from it.

[Figure 1 labels: sequencing coverage; PCR primers; locus-specific sequence; 50 - 100 kb on either side within the locus]

Mother Clones
• Which mother clones have clean (single) integration sites?
• Which mother clones should be discarded due to the presence of genetic alterations in the protein coding sequence?

Pool Recombinant Cells
• Which pools contain high quality integrations and should be prioritized in the pipeline?
• How does a transfection method perform?

[Diagram labels: naïve CHO cell; pool of recombinant cells; mother clones; daughter clones]

Figure 2: TLA-based transgene sequencing. Using one TLA primer pair complementary to a sequence unique to the transgene, complete sequence information is generated across the transgene and its integration site(s) in the CHO genome.

[Figure 2 labels: CHO genome; transgene; CHO genome; TLA primer pair; sequenced locus]

INTEGRATION SITES

Integration sites are detected by analysing the coverage profile across contigs in the CHO genome sequence and by identifying breakpoint sequences between the CHO genome and the transgene sequence.

Partial integrations can also be detected by performing multiple TLA amplifications with different primer pairs specific for different positions in the transgene sequence.

Identified breakpoint sequences are specified in tables (Figure 3).

In these tables:
- Seq1 specifies the transgene sequence name.
- Pos1 specifies the position in the transgene sequence at which the breakpoint occurs.
- Ori1 specifies the orientation of the read:
  + indicates that the fusion read continues with increasing positions (downstream) across the transgene.
  - indicates that the fusion read continues with decreasing positions (upstream) across the transgene.
- Seq2 specifies the CHO K1 contig in which the integration breakpoint occurs.
- Pos2 specifies the position in the contig at which the breakpoint occurs.
- Ori2 specifies the orientation of the read in the contig (same + and - convention as Ori1).

3 More detailed information about this approach can be found on www.cergentis.com and in our Nature Biotechnology publication: http://www.nature.com/nbt/journal/v32/n10/full/nbt.2959.html.

Frequently, structural changes occur as a result of an integration of the transgene. Such structural changes include deletions, as shown in Figure 3, as well as more complex rearrangements involving sequences originating from different contigs. Paired-end sequencing analyses can be used to determine which breakpoints occur in physical proximity to each other and therefore whether two breakpoints that occur on different contigs constitute the same integration site (see footnote 3).

TLA analyses with primers specific for identified integration sites can be performed on wild-type samples to further characterise any rearrangements resulting from transgene integrations.

[Figure 3 graphic: CHO genome sequencing coverage across a contig, plotted from 80 kb to 240 kb, with the transgene inserted between two stretches of CHO genome]

seq1       pos1  ori1  seq2                              pos2    ori2
Transgene  200   +     gi|351517213|ref|NW_003614336.1|  167543  -
Transgene  9000  -     gi|351517213|ref|NW_003614336.1|  171831  +

Figure 3: An example of generated sequencing coverage across a contig where a transgene integration has occurred. The table specifies the breakpoint sequences resulting from the integration. As is apparent from the coverage profile as well as from the positions in the CHO contig where the breakpoints occur, CHO sequence has been deleted as a result of the transgene integration.
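The way a deletion follows from the two breakpoints in Figure 3 can be sketched in a few lines. The example parses the two table rows and counts the CHO bases missing between the junctions. It assumes that Pos2 is the last CHO base retained at each junction, so the deleted stretch excludes both reported positions; the source does not state this convention. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Breakpoint:
    seq1: str   # transgene sequence name
    pos1: int   # breakpoint position in the transgene
    ori1: str   # "+" or "-": direction the read continues in the transgene
    seq2: str   # CHO contig name
    pos2: int   # breakpoint position in the contig
    ori2: str   # "+" or "-": direction the read continues in the contig


def parse_row(line):
    """Parse one whitespace-separated row of the breakpoint table."""
    seq1, pos1, ori1, seq2, pos2, ori2 = line.split()
    return Breakpoint(seq1, int(pos1), ori1, seq2, int(pos2), ori2)


def deleted_bases(left, right):
    """Count CHO bases missing between two junctions on one contig.

    Assumption (not from the source): pos2 is the last CHO base kept at each
    junction, so the gap excludes both reported positions.
    """
    if left.seq2 != right.seq2:
        raise ValueError("breakpoints lie on different contigs")
    if left.ori2 != "-" or right.ori2 != "+":
        raise ValueError("expected CHO orientations '-' (left) and '+' (right)")
    return max(right.pos2 - left.pos2 - 1, 0)


rows = [
    "Transgene 200 + gi|351517213|ref|NW_003614336.1| 167543 -",
    "Transgene 9000 - gi|351517213|ref|NW_003614336.1| 171831 +",
]
left, right = (parse_row(row) for row in rows)
print(deleted_bases(left, right))  # 4287
```

Under that assumption, roughly 4.3 kb of CHO sequence between the two junctions is absent from the integrated allele, which matches the gap described in the caption.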

TRANSGENE SEQUENCE: SINGLE NUCLEOTIDE VARIANTS AND INDELS

Identified Single Nucleotide Variants (SNVs) and InDels (insertions or deletions) are specified in tables (Figure 4).

In these tables:
- Seq1 specifies the transgene sequence analysed.
- Pos specifies the position at which the mutation is detected.
- Ref specifies the reference sequence at that position.
- Alt specifies the identified mutation.
- Hom specifies which two positions in the transgene have sequence homology to each other. The mutation likely occurs in one of the two positions rather than in both.
- %SNV specifies the percentage of reads that contain the mutation.
- Cov specifies the sequencing coverage at the position of the mutation.
- Cov and %SNV are specified for data generated in two individual TLA amplifications of the same sample using different transgene-specific primer pairs.

SNVs and InDels are reported when they are found to occur with a frequency of at least 1% in two independent TLA amplifications. The %SNV value provides a good estimate of the percentage of transgene copies in the cell line that contain the specified mutation (see footnote 4).

4 Three factors determine the sensitivity of NGS analyses:
• Sequencing coverage: the detection of rare sequence variants requires sufficient sequencing coverage across the region in which these rare alleles are to be detected.
• Sequencing errors: <1% of reads will contain sequencing errors. No absolute figure can be given as sequencing errors are context-dependent (see for instance http://genomebiology.com/2013/14/5/R51).
• Mapping errors: the analysis of NGS data is based on the high-throughput mapping and processing of large numbers of sequencing reads. Errors in mapping can result in false positives. Potential false positives can be further analyzed with a more detailed inspection of generated sequences and the analysis of control samples.

Restricting the report to mutations identified with both transgene-specific primer sets increases the reliability of their detection.
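The reporting rule described above (at least 1% in both independent amplifications) can be sketched as a simple filter. The first three rows are taken from the example table in Figure 4; the last row is a made-up variant added only to show a rejection. The `Variant` type and the function name are hypothetical, and the sketch does not reproduce the actual variant-calling pipeline.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    pos: int
    ref: str
    alt: str
    pct_set1: float  # %SNV in the amplification with primer set 1
    pct_set2: float  # %SNV in the amplification with primer set 2


def is_reportable(variant, min_pct=1.0):
    """Apply the rule from the text: >= 1% in both TLA amplifications."""
    return variant.pct_set1 >= min_pct and variant.pct_set2 >= min_pct


variants = [
    Variant(141, "A", "C", 30, 28),      # from the Figure 4 table
    Variant(489, "A", "G", 1, 1),        # from the Figure 4 table
    Variant(1013, "T", "C", 100, 100),   # from the Figure 4 table
    Variant(7000, "G", "T", 2.0, 0.4),   # hypothetical: seen in one set only
]

for v in variants:
    status = "report" if is_reportable(v) else "discard"
    print(v.pos, f"{v.ref}>{v.alt}", status)
```

Requiring agreement between the two primer sets means a variant caused by an amplification-specific or mapping artifact in one data set is less likely to be reported.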

[Figure 5 labels: CHO genome; transgene-transgene fusion (twice); CHO genome; three TLA primer pairs; sequenced locus]

Figure 5: TLA-based sequencing of a transgene concatamer. Since each locus-specific sequence will contribute to sequencing coverage, the entire concatamer will be sequenced. The positions of transgene-transgene fusions are highlighted with the red arrows. An example of a table specifying the position and orientation of breakpoints between different copies of a transgene is shown in Figure 6.

                                  primer-set 1   primer-set 2
seq1       pos    ref  alt  hom   cov   %SNV     cov   %SNV
Transgene  141    A    C          610   30       464   28
Transgene  489    A    G          740   1        570   1
Transgene  816    T    G    5698  462   1        360   2
Transgene  1013   T    C          780   100      202   100
Transgene  1304   A    C          970   100      380   100
Transgene  1305   G    C          960   100      380   100
Transgene  2956   T    C          1220  1        490   2
Transgene  3561   C    A          897   100      200   100
Transgene  4638   G    A          972   100      482   99
Transgene  5698   A    C    816   986   1        195   2
Transgene  8836   T    G          812   1        650   1
Transgene  9487   T    C          1278  1        850   1
Transgene  11037  A    G          870   20       850   21

Figure 4: An example of a coverage profile across a transgene. In this case, a deletion has occurred in the transgene sequence between positions 8400 and 9100. Most sequences within the transgene have been sequenced with > 1000x coverage (i.e. with more than 1000 NGS reads). The table provides an example of SNVs identified in a transgene sequence.

TRANSGENE-TRANSGENE FUSIONS

Often transgenes concatamerise and multiple copies will integrate in one integration site. Frequently, such concatamerisation will include partial copies of the transgene that have fused in different orientations. Depending on where these fusions occur, they can result in the expression of undesired aberrant proteins. Changes in (the number of) transgene fusions indicate genomic instability of the concatamer and integration site.

If concatamerisation has occurred, each copy will contribute to sequencing coverage. TLA thus provides comprehensive sequence information across the entire concatamer (Figure 5).

[Figure 4 plot: NGS coverage (axis 0 to 1000) against position in T-DNA (axis 0 to 12000), with the deletion marked]

TRANSGENE COPY NUMBER ANALYSES

An exact copy number cannot be determined using TLA. However, a good estimate can be made based on the number of integration sites, the number of fusion reads and the ratio of the coverage on the transgene to the coverage on its integration site in the CHO genome.
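The coverage ratio mentioned above can be computed as sketched below. The source does not give a formula, so this is only one plausible reading: the median per-base coverage on the transgene divided by the median coverage on the flanking CHO sequence. The function name and all input numbers are invented for illustration, and the result is a rough indicator rather than a copy number, since TLA coverage also depends on the distance from the primer pair.

```python
from statistics import median


def coverage_ratio(transgene_cov, flank_cov):
    """Median transgene coverage divided by median flanking CHO coverage.

    Hypothetical helper: the source names this ratio as one input to a copy
    number estimate but does not define how it is calculated.
    """
    flank = median(flank_cov)
    if flank == 0:
        raise ValueError("no coverage on the flanking CHO sequence")
    return median(transgene_cov) / flank


# Invented per-base coverage values, for illustration only.
transgene_cov = [900, 1000, 1100]
flank_cov = [240, 250, 260]
print(coverage_ratio(transgene_cov, flank_cov))  # 4.0
```

The median is used here instead of the mean so that a local coverage spike next to the primer site has less influence on the ratio; that choice is also an assumption of this sketch.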

SEQUENCING OF TARGETED GENE EDITING EVENTS

As is sketched in Figure 7, TLA analyses can also be used to perform targeted sequencing of loci in which genetic alterations have been introduced using a targeted gene editing method (e.g. targeted knock-outs or integrations using CRISPR/Cas9).

TLA can thus be used to assess whether genetic alterations have been generated successfully. This approach can also be used to further characterise individual transgene integration sites and assess which variants (i.e. which mutations and transgene fusions) occur in which integration site.

Transgene fusions are specified in tables (Figure 6).

In these tables:
- Pos1 specifies the position in the transgene sequence at which the first breakpoint occurs.
- Ori1 specifies the orientation of the read:
  + indicates that the fusion read continues with increasing positions across the transgene.
  - indicates that the fusion read continues with decreasing positions across the transgene.
- Pos2 specifies the position in the transgene at which the second breakpoint occurs.
- Ori2 specifies the orientation of the read (same + and - convention as Ori1).

CONCLUSION

TLA-based CHO transgene sequencing provides comprehensive information about the integrated transgene sequences and their integration site(s). TLA is ideally suited to select CHO cells for the production of the desired protein, to assess the genetic stability of CHO production strains, and to analyze CHO cells when optimizing CHO cell generation protocols.

[Figure 6 graphic: transgene sequences drawn on a scale from 0 to 10,000, with breakpoints at positions 2000, 2500, 5000 and 5500 joined as Fusion 1 to Fusion 4]

Fusions   pos1  ori1  pos2  ori2
Fusion 1  2500  -     5000  +
Fusion 2  2500  -     5500  -
Fusion 3  2000  +     5000  +
Fusion 4  2000  +     5500  -

Figure 6: A graphical depiction of different fusions that can occur between two transgene sequences and how these different fusion events are shown in the table.
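One way to read the orientation columns of the fusion table is sketched below. It follows from the definitions of Ori1 and Ori2: if one side of the junction continues towards lower positions and the other towards higher positions, a single read can run through both segments on the same strand; if both sides continue in the same direction, one segment must be reverse-complemented relative to the other. This interpretation and the labels "same orientation" and "inverted" are inferences made for this example, not terminology from the source.

```python
VALID_ORIENTATIONS = {"+", "-"}


def fusion_orientation(ori1, ori2):
    """Classify a transgene-transgene fusion from its two read orientations.

    Opposite signs: both segments lie on the same strand.
    Equal signs: the segments are fused in opposite (inverted) orientations.
    """
    if ori1 not in VALID_ORIENTATIONS or ori2 not in VALID_ORIENTATIONS:
        raise ValueError("orientation must be '+' or '-'")
    return "same orientation" if ori1 != ori2 else "inverted"


# Rows of the Figure 6 table: (pos1, ori1, pos2, ori2).
fusions = {
    "Fusion 1": (2500, "-", 5000, "+"),
    "Fusion 2": (2500, "-", 5500, "-"),
    "Fusion 3": (2000, "+", 5000, "+"),
    "Fusion 4": (2000, "+", 5500, "-"),
}

for name, (pos1, ori1, pos2, ori2) in fusions.items():
    print(name, pos1, pos2, fusion_orientation(ori1, ori2))
```

Under this reading, Fusions 1 and 4 join segments in the same orientation, while Fusions 2 and 3 join segments in opposite orientations.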

[Figure 7 labels: CHO genome; targeted site; CHO genome; TLA primer pair; sequenced locus]

Figure 7: TLA-based analyses of targeted genetic modifications. A TLA analysis with a primer pair in proximity to the targeted site provides sequence information across this locus.