RNA Sequencing for Full Length Transcript Discovery

RNA-Sequencing for Full-length Transcript Discovery Lab Meeting 2/10/14 Anne Deslattes Mays Mentor: Anton Wellstein, MD, PhD Special Recognition: Marcel Schmidt, PhD 06/22/2022 Wellstein/Riegel Laboratory, Lombardi Cancer Center, Washington DC 20007 1

Upload: anne-deslattes-mays

Post on 10-May-2015

DESCRIPTION

Use of second and third generation sequencing technology platforms to create a dataset for the discovery of full length transcripts

TRANSCRIPT

Page 1: RNA Sequencing for Full Length Transcript Discovery

04/11/2023 1

RNA-Sequencing for Full-length Transcript Discovery

Lab Meeting 2/10/14

Anne Deslattes Mays

Mentor: Anton Wellstein, MD, PhD

Special Recognition: Marcel Schmidt, PhD

Wellstein/Riegel Laboratory, Lombardi Cancer Center, Washington DC 20007

Page 2: RNA Sequencing for Full Length Transcript Discovery

Discovery of homing gene fragments using bone marrow-derived monocytes

Questions:
1. Which proteins drive organ homing of hematopoietic cells?
2. Are there distinct homing proteins for diseased organs (cancer, wound healing, ischemia, infection)?

Approaches:
1. Use a human bone marrow (BM) cDNA library that displays large proteins from bone marrow & precursor cells on the phage surface
2. In vivo selection of homing proteins from target organs or vessels in animal models (normal or diseased)
3. This approach selects for gene fragments coding for homing proteins

Full-length transcripts from source material

Page 3: RNA Sequencing for Full Length Transcript Discovery

Experimental Objective

We aim to identify, using 2nd and 3rd generation sequencing methods, the full-length transcripts for genes whose fragments were discovered through the phage display experiments nearly a decade ago.

Page 4: RNA Sequencing for Full Length Transcript Discovery

MedStar Georgetown University Hospital Cell Processing Unit

Objective: Obtain healthy donor bone marrow bags

Page 5: RNA Sequencing for Full Length Transcript Discovery

Objective: RNA Isolation from Total Bone Marrow

Step 1: Total Bone Marrow Isolation

Page 6: RNA Sequencing for Full Length Transcript Discovery
Page 7: RNA Sequencing for Full Length Transcript Discovery

Four Sequencing Experiments

Second Generation Sequencing

Page 8: RNA Sequencing for Full Length Transcript Discovery

2nd Generation Sequencing with Illumina HiSeq 2000

Page 9: RNA Sequencing for Full Length Transcript Discovery

Four Sequencing Experiments

Second Generation Sequencing
1. Total.bm.random – total bone marrow, sequenced mate-paired, non-strand-specific, randomly primed, ~180 million reads

Page 10: RNA Sequencing for Full Length Transcript Discovery

Page 11: RNA Sequencing for Full Length Transcript Discovery

Experiment 1 Results

Genome aligned (TopHat (Bowtie2)/Cufflinks) and de novo assemblies (Trinity (GSNAP & BLAT)) using the read information.

Wellstein Genome – created a sub-genome with excised regions around the phage, in the hope of discovering the underlying isoform and gene structure.

BLAT/BLASTed the short reads against this region, and still:
• Results gave ambiguous information regarding isoforms and gene structure for hits which included phage
• Structure of transcript was not clear
• Strand information regarding the aligned reads was not clear

Next Steps
• Design another experiment, same cell population, this time targeted (including original phage primers used often in experiments in both lineage negative and total bone marrow experiments) and strand specific
• Create a custom long transcript library primed to include full length phage transcripts

Page 12: RNA Sequencing for Full Length Transcript Discovery

Page 13: RNA Sequencing for Full Length Transcript Discovery

Random RNA-Sequencing vs Strand-specific Targeted RNA-sequencing

Page 14: RNA Sequencing for Full Length Transcript Discovery

Targeted RNA-Sequencing Workflow

Page 15: RNA Sequencing for Full Length Transcript Discovery

Initial G12 Gene Model from the Total Bone Marrow

Page 16: RNA Sequencing for Full Length Transcript Discovery

Design targeted primers and create custom long reaction cDNA library

Page 17: RNA Sequencing for Full Length Transcript Discovery

Results and pre-sequencing fragmentation

Page 18: RNA Sequencing for Full Length Transcript Discovery

Experiment 2 Results

Genome aligned (TopHat (Bowtie2)/Cufflinks) and de novo assemblies (Trinity (GSNAP & BLAT)) using the read information.

Wellstein Genome – created a sub-genome with excised regions around the phage, in the hope of discovering the underlying isoform and gene structure.

BLAT/BLASTed the short reads against this region, and still:
• Results gave ambiguous information regarding isoforms and gene structure for hits which included phage
• Strand information was known, yet the structure of the transcript was still not clear
• Was it the depth? Was it the cell population? Was it mistargeted regions?

Next Steps
• Design another experiment, now looking at only the lineage negative cell population where it is known the phage are enriched
• Return to randomly primed reads
• Sequence at a depth similar to the original total bone marrow experiment (100 million reads)

Page 19: RNA Sequencing for Full Length Transcript Discovery

Four Sequencing Experiments

Second Generation Sequencing
1. Total.bm.random – total bone marrow, non-strand-specific, randomly primed, ~180 million reads
2. Total.bm.ss.targeted – total bone marrow, strand-specific, targeted primed, to a depth of ~20 million reads
3. Lin.neg.ss.random – lineage-negative, strand-specific, randomly primed, ~111 million reads

Page 20: RNA Sequencing for Full Length Transcript Discovery

Page 21: RNA Sequencing for Full Length Transcript Discovery

Negative Selection: Human Progenitor Cell Enrichment Kit with Platelet Depletion to isolate the lineage-negative subpopulation from total bone marrow

Page 22: RNA Sequencing for Full Length Transcript Discovery

Loading and Negative Controls

class     gene           total.bm.ss  lin.neg.ss
loading   ACTB           2933         12,643
loading   B2M            1500         8473
loading   GAPDH          622          44,413
negative  CD11B          231          1193
negative  CD11C          132          689
negative  CD14           21           49
negative  CD16a          418          1312
negative  CD19           8            36
negative  CD2            7            16
negative  CD24           142          177
negative  CD3EAP         28           243
negative  CD56           197          2039
negative  CD61           24           480
negative  CD66B          207          208
negative  glycophorin.A  49           80
negative  mir155         2            20

Phage and Positive Controls

class     gene    total.bm.ss  lin.neg.ss
phage     _b9     203          2298
phage     a1      0            0
phage     A12     0            0
phage     A5      186          553
phage     a8      76           789
phage     b3      439          4731
phage     b6      68           331
phage     B9      171          2354
phage     C1      9            139
phage     C12     42           10,657
phage     C2      147          1757
phage     c3      163          453
phage     C7      170          1419
phage     d5      236          744
phage     E12.1   34           459
phage     E7      106          300
phage     E9      236          2723
phage     F6      120          2556
phage     G12     292          925
phage     H3      64           1060
phage     h4      179          658
phage     h6      0            0
phage     h7      126          1302
positive  BST1    32           1616
positive  CD133   0            0
positive  CD34    9            398
positive  THY1    2            4

3 loading controls
13 negative controls
27 positive controls and phage
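The two libraries were sequenced to very different depths (roughly 20 million versus 111 million reads according to the experiment list), so raw counts in the tables are not directly comparable. A minimal sketch of one common way to put them on the same scale, counts per million (CPM), follows. The depth figures are the approximate values from the slides, the gene subset is picked for illustration, and the function names are hypothetical. Because the total bone marrow library was also targeted rather than randomly primed, the resulting ratios are indicative only.

```python
# Approximate library depths taken from the experiment list (assumption: used as-is).
LIBRARY_DEPTH = {"total.bm.ss": 20_000_000, "lin.neg.ss": 111_000_000}

# Raw read counts copied from the tables: gene -> (total.bm.ss, lin.neg.ss).
RAW_COUNTS = {
    "ACTB": (2933, 12643),   # loading control
    "CD14": (21, 49),        # negative control
    "CD34": (9, 398),        # positive control
    "BST1": (32, 1616),      # positive control
    "C12": (42, 10657),      # phage
}


def counts_per_million(count, depth):
    """Scale a raw read count to reads per million sequenced reads."""
    return count / depth * 1_000_000


def fold_change(gene):
    """CPM in lineage-negative divided by CPM in total bone marrow."""
    total, lin_neg = RAW_COUNTS[gene]
    cpm_total = counts_per_million(total, LIBRARY_DEPTH["total.bm.ss"])
    cpm_lin_neg = counts_per_million(lin_neg, LIBRARY_DEPTH["lin.neg.ss"])
    return cpm_lin_neg / cpm_total


for gene in RAW_COUNTS:
    print(f"{gene:5s} fold change (lin.neg / total.bm): {fold_change(gene):.2f}")
```

Under these assumptions a value below 1 means the gene is depleted in the lineage-negative library after depth normalization, and a value above 1 means it is enriched.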

Page 23: RNA Sequencing for Full Length Transcript Discovery

ACTB

Peak read counts (one per track, in slide order): 45,701; 52,626; 12,570; 200

Page 24: RNA Sequencing for Full Length Transcript Discovery

Negative Control: CD14 (should be highest in Total Bone Marrow)

Peak read counts (one per track, in slide order): 109; 6318; 48; 21

Page 25: RNA Sequencing for Full Length Transcript Discovery

Positive Control: CD34 (should be highest in Lineage Negative)

Peak read counts (one per track, in slide order): 169; 43; 386; 10

Page 26: RNA Sequencing for Full Length Transcript Discovery

What’s Wrong With Illumina Reads

Uniformity of Read Coverage*

• An aligned read can be represented as an integer point in R^2 as follows: The ‘t-coordinate’ corresponding to the read is its left-end point while the ‘l-coordinate’ is the length of the fragment. In Evans et al. (2010), it is shown that for any choice of fragment length distribution, the collection of points {(t, l)} from a sequencing experiment forms a two-dimensional Poisson process. This principle guides our further analysis of these points {(t, l)}, as we test for uniformity in both the t and l coordinates. The output of ReadSpy is a list of test statistics and P-values for each transcript. A statistically significant (low) P-value means we reject the fact that the dataset is uniform on that transcript. Thus, a higher P-value corresponds to a set of reads sampled uniformly, which is desired. In the next two sections, we describe the statistical test applied to each transcript. The test is formulated in terms of the genomic segment [a, b].

*Hower, Valerie, Richard Starfield, Adam Roberts, and Lior Pachter. "Quantifying uniformity of mapped reads." Bioinformatics 28, no. 20 (2012): 2680-2682.
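The idea of testing read positions for uniformity can be sketched in a few lines. This is a simplified illustration and not ReadSpy's implementation: it looks only at the t-coordinate (read start), uses a plain Pearson chi-square statistic over equal-width bins on the segment [a, b), and compares the result against a tabulated 5% critical value. The function name, the choice of 20 bins, and the toy read positions are all assumptions made for this example.

```python
def chi_square_uniformity(starts, a, b, n_bins=20):
    """Pearson chi-square statistic for read start positions on [a, b).

    Returns (statistic, degrees_of_freedom). Under uniform coverage every
    bin is expected to hold the same number of read starts.
    """
    width = (b - a) / n_bins
    counts = [0] * n_bins
    for t in starts:
        if a <= t < b:
            counts[min(int((t - a) / width), n_bins - 1)] += 1
    n = sum(counts)
    if n == 0:
        raise ValueError("no reads fall inside the segment")
    expected = n / n_bins
    stat = sum((c - expected) ** 2 / expected for c in counts)
    return stat, n_bins - 1


# Tabulated upper 5% critical value of the chi-square distribution, 19 d.o.f.
CRITICAL_05_DF19 = 30.144

# Toy data (hypothetical): evenly spread starts vs. starts piled into one bin.
even_starts = list(range(0, 10_000, 2))
piled_starts = list(range(0, 500)) * 10

even_stat, df = chi_square_uniformity(even_starts, 0, 10_000)
piled_stat, _ = chi_square_uniformity(piled_starts, 0, 10_000)

print(f"even:  stat={even_stat:.1f}, reject uniformity: {even_stat > CRITICAL_05_DF19}")
print(f"piled: stat={piled_stat:.1f}, reject uniformity: {piled_stat > CRITICAL_05_DF19}")
```

In this toy setup the evenly spread reads give a statistic of zero, while reads stacked in a single bin give a very large statistic and uniformity is rejected.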

Page 27: RNA Sequencing for Full Length Transcript Discovery

Lior Pachter’s ReadSpy Results

Total BM Targeted Strand Specific (20 million reads)

target_id  length     df    pair_counts_0  test_stat_0  p_value_0
chr19      49129131   19    226            3948.34      0.00E+00
chr4       191038775  19    227            1760.40      0.00E+00
chr11      135006716  19    304            2811.79      0.00E+00
chr2       243199471  19    361            6859.00      0.00E+00
chr16      90354953   38    402            7638.00      0.00E+00
chr9       141354337  38    436            2754.92      0.00E+00
chr12      133851995  57    797            15143.00     0.00E+00
chr15      102531492  76    841            15979.00     0.00E+00
chr1       249250866  247   2739           20184.43     0.00E+00
chr7       159138908  285   3325           54980.68     0.00E+00

Lineage Negative Strand Specific Random (110 million reads)

target_id  length     df    pair_counts_0  test_stat_0  p_value_0
chrY       59373664   19    224            4256.00      0.00E+00
chr21      48130091   19    284            2951.63      0.00E+00
chr19      49129131   57    663            10583.74     0.00E+00
chr8       146364218  57    751            5478.61      0.00E+00
chr10      135534897  76    902            8655.73      0.00E+00
chr3       198022577  76    957            12936.24     0.00E+00
chr16      90354953   133   1439           27341.00     0.00E+00
chr11      135006716  190   2067           23431.41     0.00E+00
chr2       243199471  190   2260           42940.00     0.00E+00
chr4       191038775  285   3236           40639.91     0.00E+00
chr9       141354337  304   3423           23574.66     0.00E+00
chr15      102531492  380   5735           108965.00    0.00E+00
chr1       249250866  912   10322          97596.23     0.00E+00
chr7       159138908  2394  29726          504209.24    0.00E+00
chr12      133851995  5605  84272          1601168.00   0.00E+00

Our reads all have low p-values, indicating the non-uniform nature of their read coverage.
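One way to read the size of these statistics: for a Pearson chi-square over 20 equal bins, the largest possible value for n reads is 19 × n, reached when every read falls into a single bin. The sketch below checks which rows of the lineage-negative table sit exactly at that ceiling. This is an observation about the numbers on the slide under a simplified 20-bin model, not a description of how ReadSpy computes its statistic; the variable names are hypothetical.

```python
# Rows copied from the lineage-negative table: (target_id, df, pair_counts, test_stat).
LIN_NEG_ROWS = [
    ("chrY", 19, 224, 4256.00),
    ("chr21", 19, 284, 2951.63),
    ("chr19", 57, 663, 10583.74),
    ("chr8", 57, 751, 5478.61),
    ("chr10", 76, 902, 8655.73),
    ("chr3", 76, 957, 12936.24),
    ("chr16", 133, 1439, 27341.00),
    ("chr11", 190, 2067, 23431.41),
    ("chr2", 190, 2260, 42940.00),
    ("chr4", 285, 3236, 40639.91),
    ("chr9", 304, 3423, 23574.66),
    ("chr15", 380, 5735, 108965.00),
    ("chr1", 912, 10322, 97596.23),
    ("chr7", 2394, 29726, 504209.24),
    ("chr12", 5605, 84272, 1601168.00),
]

BINS_MINUS_ONE = 19  # assumption: 20 bins per tested segment

# Rows whose statistic equals the 20-bin maximum, 19 * pair_counts.
saturated = [
    target
    for target, _df, pairs, stat in LIN_NEG_ROWS
    if abs(stat - BINS_MINUS_ONE * pairs) < 0.005
]

# Every df value in the table is a multiple of 19.
all_df_multiples_of_19 = all(df % BINS_MINUS_ONE == 0 for _t, df, _p, _s in LIN_NEG_ROWS)

print("rows at the 19 x n ceiling:", saturated)
print("all df are multiples of 19:", all_df_multiples_of_19)
```

Under this reading, the rows at the ceiling are consistent with reads that are as concentrated as the binning can register, which matches the slide's conclusion that coverage is far from uniform.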

Page 28: RNA Sequencing for Full Length Transcript Discovery

Experiment 3 Results

Genome aligned (TopHat (Bowtie2)/Cufflinks) and de novo assemblies (Trinity (GSNAP & BLAT)) using the read information.

Wellstein Genome – created a sub-genome with excised regions around the phage, in the hope of discovering the underlying isoform and gene structure.

BLAT/BLASTed the short reads against this region, and still:
• Results gave ambiguous information regarding isoforms and gene structure for hits which included phage
• Strand information was known
• Enrichment in the population is evident
• Yet an unambiguous structure of the phage transcripts is still not clear
• Finding known genes can be done; even de novo assembly of novel transcripts is done on a regular basis
• But with these phage, a fragment is known: how do we find the full length structure of this phage?
• What if we had the phage transcripts in the targeted full length library, but they were lost in the fragmentation? Is there a way to do sequencing without fragmentation?

Next Steps
• Use new 3rd generation technology to do full length transcript sequencing without fragmentation

Page 29: RNA Sequencing for Full Length Transcript Discovery

Source: Iso-seq webinar by Liz Tseng, Pacific Biosciences
https://github.com/PacificBiosciences/cDNA_primer/wiki/Understanding-PacBio-transcriptome-data

Page 30: RNA Sequencing for Full Length Transcript Discovery

Four Sequencing Experiments

Second Generation Sequencing
1. Total.bm.random – total bone marrow, sequenced non-strand-specific, randomly primed, ~180 million reads
2. Total.bm.ss.targeted – total bone marrow, sequenced strand-specific, targeted primed, to a depth of ~20 million reads
3. Lin.neg.ss.random – lin-, sequenced strand-specific, randomly primed, ~111 million reads

Third Generation Sequencing
4. Lin.neg Pac Bio Long reads – 6 million CCS Filtered SubReads, ~277,000 readsOfInserts

Page 31: RNA Sequencing for Full Length Transcript Discovery

Source: http://www.pacificbiosciences.com/products/smrt-technology/

Page 32: RNA Sequencing for Full Length Transcript Discovery

Source: https://github.com/PacificBiosciences/cDNA_primer/wiki/Understanding-PacBio-transcriptome-data#wiki-roiexplained

Page 33: RNA Sequencing for Full Length Transcript Discovery

Source: https://github.com/PacificBiosciences/cDNA_primer/wiki/Understanding-PacBio-transcriptome-data#wiki-roiexplained

Page 34: RNA Sequencing for Full Length Transcript Discovery

Source: Bobby Sebra – smrt portal analysis results

Page 35: RNA Sequencing for Full Length Transcript Discovery

ACTB

Peak read counts (one per track, in slide order): 45,701; 52,626; 12,570; 10

Page 36: RNA Sequencing for Full Length Transcript Discovery

Negative Control: CD14 (should be highest in Total Bone Marrow)

Peak read counts (one per track, in slide order): 109; 6318; 48; 21

Page 37: RNA Sequencing for Full Length Transcript Discovery

Positive Control: CD34 (should be highest in Lineage Negative)

Peak read counts (one per track, in slide order): 169; 43; 386; 10

Page 38: RNA Sequencing for Full Length Transcript Discovery

Phage: B9 – only the phage (953 bp)

Peak read counts (one per track, in slide order): 10; 10; 10; 10

Page 39: RNA Sequencing for Full Length Transcript Discovery

Phage: B9, 10x larger region (~9kb) centered on phage evidence

Peak read counts (one per track, in slide order): 10; 16; 10; 10

Page 40: RNA Sequencing for Full Length Transcript Discovery

Source: self-install smrt portal – reads of insert

Page 41: RNA Sequencing for Full Length Transcript Discovery

Transcript Size Distribution

Size classes: 1 to 2k; 2 to 3k; over 3k
Shares (as listed on the slide): 87%; 11%; 2%

Page 42: RNA Sequencing for Full Length Transcript Discovery

Summary of reads.

------ 5' primer seen summary ----
Per subread: 258835/277161 (93.4%)
Per ZMW: 258835/277161 (93.4%)
Per ZMW first-pass: 258835/277161 (93.4%)
------ 3' primer seen summary ----
Per subread: 1361/277161 (0.5%)
Per ZMW: 1361/277161 (0.5%)
Per ZMW first-pass: 1361/277161 (0.5%)
------ 5'&3' primer seen summary ----
Per subread: 1341/277161 (0.5%)
Per ZMW: 1341/277161 (0.5%)
Per ZMW first-pass: 1341/277161 (0.5%)
------ 5'&3'&polyA primer seen summary ----
Per subread: 18/277161 (0.0%)
Per ZMW: 18/277161 (0.0%)
Per ZMW first-pass: 18/277161 (0.0%)
------ Primer Match breakdown ----
F0/R0: 258855 (100.0%)

Source: output of summarize_results.py (Liz Tseng)

Page 43: RNA Sequencing for Full Length Transcript Discovery

But this is not good – it turns out that the primers were incorrectly chosen. The best way to find the primers actually used is as follows:

>cat reads_of_insert.fasta | grep -A1 "AAAAAAAAAAAAAAAAA" | more
GGCTTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGTACTCTGCGTTGATACCACTGCTT
--
AACATTGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGTAACTCTGCGTTGATACCACTGCTT
--
TGTTTTATAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGTACTCTGCGTTGATACCACTGCTT
--
TTACAATTTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGTACTCTGCGTTGATACCACTGCTT
--
GAGCCCTTACCGAAAAAAAAAAAAAAAAAAAAAAAAAAGTACTCTGCGTTGATACCACTGCTT
--
GTGGTGATTGTTTACTAAAAAAAAAAAAAAAAAAAAAAGTACTCTGCGTTGATACCACTGCTT
--
GACAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGTACTCTGCGTTGATACCACTGCTT
--
TTTCCCGCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGTACTCTGCGTTGATACCACTGCTT
--
CTTACTTACGTAAAAAAAAAAAAAAAAAAAAAAAAAAGTACTCTGCGTTGATACCACTGCTT
--
GCCCCATCTCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGTACTCTGCGTTGATACCACTGCTT

>cat reads_of_insert.fasta | grep -A1 "TTTTTTTTTTTT" | more
AAGCAGTGGTATCAACGCAGAGTACTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTATTTGGCTTGAT
--
AAGCAGTTGGTATCAACGCAGAGTACTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTGTTTTGATTTCCAT
--
AAGCAGTGGTATCAACGCAGAGTACTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTACTTGGGATCTTT
--
AAGCAGTGGTATCAACGCAGAGTACTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTATTTTTTTTTTTTTT
--
AAGCAGTGGTATCAACGCAGAGTACTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTACCCATCAGCG
--
AAGCAGTGGTATCAACGCAGAGTACTTTTTTTTTTTTTTTTTTTTTTTTTTGGTATTTGTTTGTTTCTG
--
AAGCAGTGGTATCAACGCAGAGTACTTTTTTTTTTTTTTTTTTTTTTTTTTTATTTTTTTTTTTTTTTTT
--
AAGCAGTGGTATCAACGCAGAGTACTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTGACATAAACAC
--
AAGCAGTGGTATCAACGCAGAGTACTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTACTAAGCATATTT

Now my primers are:

>F0
AAGCAGTGGTATCAACGCAGAGTAC
>R0
GTAACTCTGCGTTGATACCACTGCTT
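The manual grep inspection above can be sketched as a small script: take the sequence that follows the poly-A run in each read, tally the tails, and compare the most common one with the reverse complement of the 5' primer. The helper names, the 12-base poly-A threshold, and the three toy reads (built from read prefixes shown on the slide, with an arbitrary 30-base poly-A run) are assumptions for illustration, not part of the original analysis.

```python
import re
from collections import Counter


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def tail_after_poly_a(read, min_a=12):
    """Sequence following the last run of at least `min_a` A's, or None."""
    runs = list(re.finditer("A{%d,}" % min_a, read))
    if not runs:
        return None
    return read[runs[-1].end():]


F0 = "AAGCAGTGGTATCAACGCAGAGTAC"  # 5' primer as given on the slide
EXPECTED_3PRIME = reverse_complement(F0)

# Toy reads (hypothetical): slide-like prefixes, 30 A's, then the adapter.
toy_reads = [
    prefix + "A" * 30 + EXPECTED_3PRIME
    for prefix in ("GGCTT", "TGTTTTAT", "TTTCCCGC")
]
toy_reads.append(F0 + "T" * 30 + "ATTTGGCTTGAT")  # a read with no poly-A run

tail_counts = Counter(
    tail for tail in map(tail_after_poly_a, toy_reads) if tail is not None
)
most_common_tail, n_reads = tail_counts.most_common(1)[0]

print("reverse complement of F0:", EXPECTED_3PRIME)
print("most common tail after poly-A:", most_common_tail, f"({n_reads} reads)")
```

Note that the reverse complement of F0 is the 25-base tail seen in most of the reads listed above, whereas the R0 shown on the slide contains one additional A (GTAACTC...), as in the second read of the listing. The sketch does not try to decide which form is intended.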

Page 44: RNA Sequencing for Full Length Transcript Discovery

------ 5' primer seen summary ----
Per subread: 256672/277161 (92.6%)
Per ZMW: 256672/277161 (92.6%)
Per ZMW first-pass: 256672/277161 (92.6%)
------ 3' primer seen summary ----
Per subread: 208877/277161 (75.4%)
Per ZMW: 208877/277161 (75.4%)
Per ZMW first-pass: 208877/277161 (75.4%)
------ 5'&3' primer seen summary ----
Per subread: 207111/277161 (74.7%)
Per ZMW: 207111/277161 (74.7%)
Per ZMW first-pass: 207111/277161 (74.7%)
------ 5'&3'&polyA primer seen summary ----
Per subread: 100863/277161 (36.4%)
Per ZMW: 100863/277161 (36.4%)
Per ZMW first-pass: 100863/277161 (36.4%)
------ Primer Match breakdown ----
F0/R0: 258438 (100.0%)

Source: output of summarize_results.py (Liz Tseng)
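To make the effect of the primer correction explicit, the counts from the two summaries can be put side by side. The short sketch below only recomputes the percentages from the reported counts; the dictionary and function names are hypothetical and the script is not part of summarize_results.py.

```python
TOTAL_READS = 277161  # reads of insert, from both summaries

# Per-subread counts copied from the two summarize_results.py outputs.
BEFORE_FIX = {"5' primer": 258835, "3' primer": 1361, "5'&3'": 1341, "5'&3'&polyA": 18}
AFTER_FIX = {"5' primer": 256672, "3' primer": 208877, "5'&3'": 207111, "5'&3'&polyA": 100863}


def percent(count, total=TOTAL_READS):
    """Share of reads, in percent, rounded to one decimal place."""
    return round(100 * count / total, 1)


for label in BEFORE_FIX:
    before = percent(BEFORE_FIX[label])
    after = percent(AFTER_FIX[label])
    print(f"{label:12s} {before:5.1f}% -> {after:5.1f}%")
```

With the corrected primers, the share of reads in which both the 5' and 3' primers are seen rises from 0.5% to 74.7%, and the share that also shows a poly-A tail rises from 0.0% to 36.4%.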

Page 45: RNA Sequencing for Full Length Transcript Discovery

Negative Control: CD14 (should be highest in Total Bone Marrow)

Peak read counts (one per track, in slide order): 109; 6318; 48; 21

Page 46: RNA Sequencing for Full Length Transcript Discovery

Positive Control: CD34 (should be highest in Lineage Negative)

Peak read counts (one per track, in slide order): 169; 43; 386; 10

Page 47: RNA Sequencing for Full Length Transcript Discovery

Phage: B9 – only the phage (953 bp)

Peak read counts (one per track, in slide order): 10; 10; 10; 10

Page 48: RNA Sequencing for Full Length Transcript Discovery

Phage: B9, 10x larger region (~9kb) centered on phage evidence

Peak read counts (one per track, in slide order): 10; 16; 10; 10

Page 49: RNA Sequencing for Full Length Transcript Discovery

Phage 14-10: 100% identity and alignment to 19 full-length reads of insert

Page 50: RNA Sequencing for Full Length Transcript Discovery

Phage 14-10: 100% aligned to CTSD; 2, possibly 3, splice variants in the lineage negative cell population – structure fully resolved

Page 51: RNA Sequencing for Full Length Transcript Discovery

Conclusions:
• Full-length transcript discovery is achieved with the Pacific Biosciences RS sequencer, using size selection in library preparation prior to sequencing and the Reads Of Insert algorithm
• Even before the release of the ReadsOfInsert approach, the subreads available from the sequencing could still reveal the structure of the complete transcript
• An error rate of 15% seems daunting, but the random nature of the error and the length of the read provided the complete structure in a way that no short-read second generation sequence could
• When one is searching for the complete structure, perfection in the parts is of no consequence
• NO ASSEMBLY is REQUIRED
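Why a random 15% error is less alarming than it sounds can be illustrated with a small piece of arithmetic. The sketch below assumes, as a deliberate simplification, that each pass over a position is wrong independently with probability 0.15, and asks how often a strict majority of passes is wrong. It is not a model of the Reads Of Insert algorithm, and because it counts any wrong majority even when the wrong calls disagree with each other, it is a pessimistic bound under those assumptions. The function name is hypothetical.

```python
from math import comb


def majority_error_probability(per_pass_error, passes):
    """Probability that more than half of `passes` independent observations
    of one position are wrong (use an odd number of passes to avoid ties)."""
    needed = passes // 2 + 1
    return sum(
        comb(passes, k) * per_pass_error**k * (1 - per_pass_error) ** (passes - k)
        for k in range(needed, passes + 1)
    )


PER_PASS_ERROR = 0.15  # error rate quoted in the conclusions

for passes in (1, 3, 5, 7):
    p_wrong = majority_error_probability(PER_PASS_ERROR, passes)
    print(f"{passes} pass(es): P(majority wrong) = {p_wrong:.4f}")
```

Under these assumptions the per-position error drops from 15% with one pass to about 6% with three passes, and keeps falling as passes are added, which is the intuition behind combining several noisy reads of the same molecule.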

Page 52: RNA Sequencing for Full Length Transcript Discovery

Next Steps:

1. Complete the reads of insert approach with 75% accuracy and minimum 1 pass
2. Identify additional full length structure (if possible with the sample reads)
3. Write up the results
4. (next paper) If no additional phage found, sequence an enriched population with confirmed phage evidence at full length with another PacBio sequencing run
5. Use Illumina reads to correct for errors and recover more reads
6. Use greater PacBio sequencing depth

Page 53: RNA Sequencing for Full Length Transcript Discovery

Acknowledgements

Dr. Anton Wellstein
Dr. Anna Riegel
Dr. Elena Tassi
Dr. Marcel Schmidt

The entire lab: Elena, Virginie, Ghada, Ivana, Eveline, Khalid, Khaled, Eric, Nitya, the entire Wellstein/Riegel laboratory

My Committee
Dr. Yuri Gusev
Dr. Anatoly Dritschilo
Dr. Michael Johnson
Dr. Christopher Loffredo
Dr. Habtom Ressom
Dr. Terry Ryan (external committee member)

Robert Sebra, Mt. Sinai PacBio Sequencing
Liz Tseng, Pacific Biosciences
Eric Schadt, Mt. Sinai PacBio Sequencing
Brian Haas, Author Trinity Suite

Page 54: RNA Sequencing for Full Length Transcript Discovery

CD11: New Evidence of an Exon From all Samples, confirmed by PacBio

Peak read counts (one per track, in slide order): 16; 1925; 639; 121

Page 55: RNA Sequencing for Full Length Transcript Discovery

PASA assembly (Trinity Pipeline) Denovo + Genome Guided

Evidence of a new exon – not found in annotation for CD11