![Page 1: ModENCODE August 20-21, 2007 Drosophila Transcriptome: Aim 2.2](https://reader035.vdocuments.mx/reader035/viewer/2022062408/56649ed85503460f94be6419/html5/thumbnails/1.jpg)
modENCODE
August 20-21, 2007
Drosophila Transcriptome: Aim 2.2
![Page 2: ModENCODE August 20-21, 2007 Drosophila Transcriptome: Aim 2.2](https://reader035.vdocuments.mx/reader035/viewer/2022062408/56649ed85503460f94be6419/html5/thumbnails/2.jpg)
Aim 2.2 Experimental Validation of Transcript Models
1. Experimental verification of selected splice sites in transcript models (short RT-PCR)
2. Mapping transcript ends using RACE
3. Screening cDNA libraries for transcripts
4. Recovering cDNA clones using long RT-PCR
5. High-throughput sequencing of small RNAs
6. Submitting sequence data to databases
7. Reviewing the transcriptome annotation
![Page 3: ModENCODE August 20-21, 2007 Drosophila Transcriptome: Aim 2.2](https://reader035.vdocuments.mx/reader035/viewer/2022062408/56649ed85503460f94be6419/html5/thumbnails/3.jpg)
Experiments at LBNL
Transcript Ends
TSSs: 20,000 targeted 5’ RACE experiments
poly-A: 1,000 targeted 3’ RACE experiments
Full-Length Transcript Structures
6,000 cDNA screens and full-insert sequencing
3,000 long RT-PCRs and full-insert sequencing
Small RNA Sequencing
15 runs on 454 Life Sciences device
Size fractionate < 500 nt (larger range than Eric Lai)
![Page 4: ModENCODE August 20-21, 2007 Drosophila Transcriptome: Aim 2.2](https://reader035.vdocuments.mx/reader035/viewer/2022062408/56649ed85503460f94be6419/html5/thumbnails/4.jpg)
Mapping TSSs
• 5’ RLM-RACE is a simple, scalable method
• RLM primer replaces the 5’ CAP structure
• Gene specific primers are nested & near 5’ end
• Sequence 8 clones
• Direct sequencing is also proposed but is difficult
• We are prioritizing transcripts and tissues using our 5’ EST data
![Page 5: ModENCODE August 20-21, 2007 Drosophila Transcriptome: Aim 2.2](https://reader035.vdocuments.mx/reader035/viewer/2022062408/56649ed85503460f94be6419/html5/thumbnails/5.jpg)
TSSs: Slippery vs Discrete
head RACE products
larval RACE products
cDNAs
![Page 6: ModENCODE August 20-21, 2007 Drosophila Transcriptome: Aim 2.2](https://reader035.vdocuments.mx/reader035/viewer/2022062408/56649ed85503460f94be6419/html5/thumbnails/6.jpg)
Cap-Trapped 5’ ESTs Define Discrete…
![Page 7: ModENCODE August 20-21, 2007 Drosophila Transcriptome: Aim 2.2](https://reader035.vdocuments.mx/reader035/viewer/2022062408/56649ed85503460f94be6419/html5/thumbnails/7.jpg)
…and Slippery Transcription Start Sites
![Page 8: ModENCODE August 20-21, 2007 Drosophila Transcriptome: Aim 2.2](https://reader035.vdocuments.mx/reader035/viewer/2022062408/56649ed85503460f94be6419/html5/thumbnails/8.jpg)
How Many TSSs Does bowl Have?
![Page 9: ModENCODE August 20-21, 2007 Drosophila Transcriptome: Aim 2.2](https://reader035.vdocuments.mx/reader035/viewer/2022062408/56649ed85503460f94be6419/html5/thumbnails/9.jpg)
5’ RACE Plans
• Identify TSSs that are well mapped by 5’ EST data
• Test RLM-RACE production protocol on 96 well-mapped TSSs to measure experimental success rate
• Prioritize 5’ RACE experiments:
1. Transcripts with < 8 RE ESTs, using mixed embryo RNA
2. Transcripts with ESTs from other embryo-derived libraries
3. Transcripts with < 8 RH/TA ESTs
4. Transcripts with larval/pupal ESTs
5. Transcripts without ESTs. Use appropriate RNA samples.
• Develop statistical description of “slipperiness”
• Biological validation with microarrays & P elements
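The five priority tiers above can be sketched as a small lookup. This is only an illustration: the `EstSupport` record, its field names, and the rule that any library group with 8 or more ESTs counts as "well mapped" are assumptions, not part of the plan as presented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EstSupport:
    """Hypothetical per-transcript 5' EST counts, grouped by library."""
    re: int = 0            # RE library
    other_embryo: int = 0  # other embryo-derived libraries
    rh_ta: int = 0         # RH and TA libraries
    larval_pupal: int = 0  # larval/pupal libraries


def race_priority(s: EstSupport, threshold: int = 8) -> Optional[int]:
    """Return the 5' RACE priority tier (1-5), or None if already well mapped.

    Assumption: a transcript with >= threshold ESTs in any library group
    is treated as well mapped and is not queued for RACE.
    """
    counts = (s.re, s.other_embryo, s.rh_ta, s.larval_pupal)
    if sum(counts) == 0:
        return 5  # no ESTs at all: pick an appropriate RNA sample
    if max(counts) >= threshold:
        return None  # assumed well mapped by existing EST data
    if s.re > 0:
        return 1  # fewer than 8 RE ESTs: mixed embryo RNA
    if s.other_embryo > 0:
        return 2  # ESTs only from other embryo-derived libraries
    if s.rh_ta > 0:
        return 3  # fewer than 8 RH/TA ESTs
    return 4      # only larval/pupal ESTs remain


print(race_priority(EstSupport(re=3)))     # tier 1
print(race_priority(EstSupport(rh_ta=5)))  # tier 3
```

Evaluating the tiers in order means a transcript with weak support in several libraries lands in the earliest matching tier.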
![Page 10: ModENCODE August 20-21, 2007 Drosophila Transcriptome: Aim 2.2](https://reader035.vdocuments.mx/reader035/viewer/2022062408/56649ed85503460f94be6419/html5/thumbnails/10.jpg)
Computationally predicted conserved exons validated by cDNA screening and sequencing
I. Gene modifications
II. Identification of New Genes
![Page 11: ModENCODE August 20-21, 2007 Drosophila Transcriptome: Aim 2.2](https://reader035.vdocuments.mx/reader035/viewer/2022062408/56649ed85503460f94be6419/html5/thumbnails/11.jpg)
cDNA and Long RT-PCR Plans
• Identify all transcripts that are well defined by cDNA sequence
- complete & spliced ORF, poly-A tail (not necessarily a defined TSS)
• Identify targets for cDNA screening (DGC goals in parentheses)
(Transcripts with a community cDNA but no BDGP cDNA)
(Transcripts with truncated ORFs)
(Alternative transcripts that encode alternative coding sequences)
1. Conserved ORFs that failed on the first SLIP attempt: choose best RNA
2. Transfrags & RACEfrags that are not captured in sequenced transcripts
• Identify targets for long RT-PCR
- targets that fail in SLIP screening on the best RNA sample
- RT-PCR is probably more sensitive than SLIP but seems limited to ~2 kb
• cDNA and RT-PCR design depends on Aim 1 & Aim 2.1 and should be an iterative process.
• Biological validation using integrated description of all data
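The hand-off from cDNA screening to long RT-PCR described above might be tracked with a triage function like the one below. It is a sketch under stated assumptions: the `Step` names, the function arguments, and the hard 2,000 bp cutoff (standing in for "~2 kb") are all hypothetical.

```python
from enum import Enum


class Step(Enum):
    DONE = "well defined by cDNA sequence"
    SLIP_SCREEN = "screen cDNA library on best RNA sample"
    LONG_RT_PCR = "attempt long RT-PCR"
    UNRESOLVED = "too long for RT-PCR; needs another approach"


def next_step(well_defined: bool,
              slip_failed_on_best_rna: bool,
              expected_length_bp: int,
              rtpcr_limit_bp: int = 2000) -> Step:
    """Pick the next experiment for one transcript model.

    Assumption: rtpcr_limit_bp approximates the ~2 kb practical limit
    noted for long RT-PCR; the real limit is not a sharp threshold.
    """
    if well_defined:
        return Step.DONE
    if not slip_failed_on_best_rna:
        return Step.SLIP_SCREEN
    if expected_length_bp <= rtpcr_limit_bp:
        return Step.LONG_RT_PCR
    return Step.UNRESOLVED


print(next_step(False, True, 1500).value)
```

Because design depends on Aim 1 and Aim 2.1 data, the inputs would be recomputed on each iteration rather than fixed once.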
![Page 12: ModENCODE August 20-21, 2007 Drosophila Transcriptome: Aim 2.2](https://reader035.vdocuments.mx/reader035/viewer/2022062408/56649ed85503460f94be6419/html5/thumbnails/12.jpg)
An Unannotated Transfrag
![Page 13: ModENCODE August 20-21, 2007 Drosophila Transcriptome: Aim 2.2](https://reader035.vdocuments.mx/reader035/viewer/2022062408/56649ed85503460f94be6419/html5/thumbnails/13.jpg)
A Relatively Rare Transcript
CG31036: chordotonal neurons, lateral and head sensory neurons
![Page 14: ModENCODE August 20-21, 2007 Drosophila Transcriptome: Aim 2.2](https://reader035.vdocuments.mx/reader035/viewer/2022062408/56649ed85503460f94be6419/html5/thumbnails/14.jpg)
High Throughput Sequencing Plan
• Pyrosequence RNA samples on 454 Life Sciences device
- consider alternative platforms, e.g. Solexa
• Select 15 target tissues for analysis
• Define a transcript size range to target
- avoid redundancy with Eric Lai: < 50 bases vs 50-500 bases
- consider avoiding tRNAs
• Align transcript sequences and integrate with models
• Biological validation:
- Compare to microarray data
- Conservation in other species, including structure for ncRNAs
- Functional genomics in Aim 3
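The proposed size split (< 50 bases vs 50-500 bases) can be checked on sequenced reads with a simple length binning. This is a minimal sketch: the class labels are made up, and treating both 50 and 500 as inside the target range is an assumption, since the plan does not state how the boundaries are handled.

```python
from collections import Counter
from typing import Iterable


def size_class(length: int, lower: int = 50, upper: int = 500) -> str:
    """Classify a read by length; bounds are inclusive by assumption."""
    if length < lower:
        return "short"   # range left to the other small RNA project
    if length <= upper:
        return "target"  # 50-500 base range proposed here
    return "long"        # above the size-fractionation cutoff


def summarize(reads: Iterable[str]) -> Counter:
    """Count reads per size class."""
    return Counter(size_class(len(r)) for r in reads)


# Toy reads of 22, 75 and 600 bases (sequence content is arbitrary).
reads = ["A" * 22, "C" * 75, "G" * 600]
print(summarize(reads))
```

A high "short" or "long" fraction in a run would suggest the size fractionation did not hold to the intended range.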
![Page 15: ModENCODE August 20-21, 2007 Drosophila Transcriptome: Aim 2.2](https://reader035.vdocuments.mx/reader035/viewer/2022062408/56649ed85503460f94be6419/html5/thumbnails/15.jpg)
Some Questions for Discussion
• How many genes & transcripts in Drosophila?
• How many genes with multiple transcripts? CDSs?
• Are these expressed in different cell types?
• Can we segregate them in different RNA samples to avoid mixed RACE, cDNA and RT-PCR products?
• How do we prioritize screening?
• What will we miss?
• How do we know when we’re done?
![Page 16: ModENCODE August 20-21, 2007 Drosophila Transcriptome: Aim 2.2](https://reader035.vdocuments.mx/reader035/viewer/2022062408/56649ed85503460f94be6419/html5/thumbnails/16.jpg)
Future Directions
• Do different promoter motifs correlate with “slipperiness”, tissue, stage?
• Confidence scores associated with exons, transcripts and gene models:
- How do we measure confidence?
- How confident can we be?
- How much data do we need per gene?
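Correlating promoter motifs with “slipperiness” needs a number per TSS cluster. One candidate, offered only as an assumption and not as the project's chosen statistic, is the Shannon entropy of the 5’ end positions observed among RACE clones or 5’ ESTs: 0 bits when every read starts at the same base, higher as starts spread out.

```python
import math
from collections import Counter
from typing import Iterable


def tss_entropy(positions: Iterable[int]) -> float:
    """Shannon entropy (bits) of observed 5' end positions.

    Hypothetical slipperiness score: 0.0 means a discrete TSS,
    larger values mean start sites are spread over more positions.
    """
    counts = Counter(positions)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("need at least one 5' end position")
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


# Eight clones, as in the "sequence 8 clones" plan; coordinates are invented.
discrete = [1000] * 8
slippery = [1000, 1003, 1007, 1012, 1015, 1021, 1030, 1042]
print(tss_entropy(discrete))
print(tss_entropy(slippery))
```

With only 8 clones per TSS the score tops out at 3 bits, so comparisons across genes would need similar read depth.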