The Drosophila Gene Collection
Mark Stapleton
Berkeley Drosophila Genome Project, Lawrence Berkeley National Lab
Mature protein-coding transcript features
[Figure: transcript diagram labeled with transcription start, start codon, stop codon, and poly(A) signal]
Generate high-quality cDNA libraries: head, 0-22 hr embryo, larval/pupal, S2 cell line, testes, ovary
Random-sample end sequencing: ~80k 5’ ESTs (Science ’00)
~180k 5’ ESTs (Gen Res ’02)
Cluster, full-length sequence, and analyze, both inter se and using the genome sequence (Gen Res ’02, Gen Bio ’02)
EST / cDNA Project
Annotation Experiments
cDNA library methodology
cDNA library technologies
1) Ling’s libraries (Rubin lab)
* From embryo, head, larval/pupal, S2, ovary, and testes.
* “Vanilla” libraries made with the oligo-dT-primed Stratagene kit.
2) Carninci libraries (RIKEN)
* From embryo and head tissues.
* Cap-trapped, oligo-dT primed, trehalose-stabilized RT.
3) BDGP libraries
* From whole adult.
Advantages/disadvantages of each method
Ling’s libraries are not enriched for full-length clones, but they were sampled from many tissues and exist as plasmid libraries.
RIKEN libraries were cap-trapped, but contain many SNPs due to the conditions used for first- and second-strand synthesis. They exist only as phagemid libraries.
The RLM method has produced only one library so far but holds great promise. Its potential weakness is RNA ligation to incompletely dephosphorylated transcripts.
Assessment of the new adult library compared with the cap-trapped RIKEN head library
Rate of diminishing returns for the normalized RIKEN embryonic cDNA library
1%
SLIP - Self Ligation of Inverse PCR products
Summary
• Attempts: 3,829
• Recovered: 2,047
• Success rate: 53%
Advantages over RT-PCR
• Captures 5’ and 3’ UTRs
• Captures splice variants
• Extends predictions
Hoskins et al. (2005) NAR 33(21):e185
Wan et al. (2006) Nat Protoc 1:624
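The SLIP success rate quoted in the summary follows from the attempt and recovery counts. A minimal check in Python, using only the two numbers from the slide (the variable names are illustrative, not from the source):

```python
# Counts taken from the SLIP summary slide; the variable names are illustrative.
attempts = 3829
recovered = 2047

# Fraction of attempts that were recovered.
success_rate = recovered / attempts

print(f"SLIP success rate: {success_rate:.1%}")
```

Rounded to a whole percent, this matches the 53% stated on the slide.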
cDNA Sequencing Corrects Gene Models
Extends gene model at both 5’ and 3’ ends
Merges three separate gene models
Libraries and 5’ ESTs
Library   Source                 5’ ESTs
LD        0-22 hr embryo          35,257
GM        Ovaries                 13,570
HL        Adult head               3,293
GH        Adult head              21,059
LP        Mixed larval/pupal      14,976
SD        S2 cells                20,154
AT, UT    Testes                  23,294
RE        0-22 hr embryo          61,181
RH        Adult head              55,816
TA        Adult                      871
Total                            249,471
Full-length sequenced
12,581 from the random approach, representing 9,423 genes.
3,064 from the directed SLIP approach, representing 1,813 genes.
Together these represent ~75% of the 14,549 predicted genes.
About half of the remaining 25% are in process, which leaves ~1,500 genes.
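The coverage figures above can be roughly checked with a few lines of arithmetic. The sketch below assumes that the genes reached by the random and SLIP approaches do not overlap. The slide does not say this, so the result is an upper bound, and all variable names are illustrative:

```python
# Figures from the slide; the variable names are illustrative.
predicted_genes = 14549
genes_random = 9423  # genes represented by the random approach
genes_slip = 1813    # genes represented by the directed SLIP approach

# ASSUMPTION: the two gene sets are disjoint, so this sum is an upper bound.
represented = genes_random + genes_slip
coverage = represented / predicted_genes
remaining = predicted_genes - represented

print(f"Genes represented (upper bound): {represented}")
print(f"Coverage of predicted genes: {coverage:.1%}")
print(f"Genes not yet represented: {remaining}")
print(f"About half of those in process: {remaining // 2}")
```

Under the no-overlap assumption the coverage comes out a little above the quoted ~75%, so the slide's rounded figures are best read as approximate.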
Towards completion of the DGC
RACE to define the ends of ORF-short transcripts followed by RT-PCR.
Generate cDNA libraries from complex tissues: total disc and total adult.
Perform SLIP-directed screens on new libraries.
Purpose of the DGC