The Drosophila Gene Collection Mark Stapleton Berkeley Drosophila Genome Project Lawrence Berkeley National Lab


Post on 04-Jan-2016


Page 1

The Drosophila Gene Collection

Mark Stapleton

Berkeley Drosophila Genome Project

Lawrence Berkeley National Lab

Page 2

Mature protein-coding transcript features

Figure labels: transcription start, start codon, stop codon, poly(A) signal.

Page 3

Generate high-quality cDNA libraries: head, 0-22hr embryo, larval/pupal, S2 cell line, testes, ovary

Random sample end sequence: ~80k 5’ ESTs (Science: ’00)

~180k 5’ ESTs (Gen Res: ’02)

Cluster, full-length sequence, and analyze: inter se and using the genome sequence (Gen Res: ’02, Gen Bio: ’02)

EST / cDNA Project

Annotation Experiments

Page 4

cDNA library methodology

Figure labels: transcription start, start codon, stop codon, poly(A) signal.

Page 5

cDNA library technologies

1) Ling’s libraries (Rubin lab)

* From embryo, head, larval/pupal, S2, ovary, and testes.
* “Vanilla” libraries using oligo dT primed Stratagene kit.

2) Carninci libraries (RIKEN)

* From embryo and head tissues.
* Cap-trapped, oligo dT primed, trehalose-stabilized RT.

3) BDGP libraries

* From whole adult.

Page 6

Advantages/disadvantages of each method

Ling’s libraries are not enriched for full-length clones, but they were sampled from many tissues and exist as plasmid libraries.

RIKEN libraries were cap-trapped, but contain many SNPs due to the conditions used for 1st and 2nd strand synthesis. They exist only as phagemid libraries.

The RLM method has produced only one library so far but holds great promise. However, RNA could potentially be ligated to incompletely dephosphorylated transcripts.

Page 7

Assessment of the new adult library compared to the cap-trapped RIKEN head library

Page 8

Rate of diminishing returns for the normalized RIKEN embryonic cDNA library

1%

Page 9

SLIP - Self Ligation of Inverse PCR products

Summary
• Attempts: 3,829
• Recovered: 2,047
• Success rate: 53%
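The success rate in the summary follows directly from the two counts. A minimal arithmetic check, with variable names chosen here purely for illustration:

```python
# Counts taken from the SLIP summary above.
attempts = 3829
recovered = 2047

# Fraction of attempted SLIP recoveries that yielded a clone.
success_rate = recovered / attempts

print(f"SLIP success rate: {success_rate:.0%}")
```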

Advantages over RT-PCR
• Captures 5’ and 3’ UTRs
• Captures splice variants
• Extends predictions

Hoskins et al., (2005) NAR 33(21):e185

Wan et al., (2006) Nat Proto 1:624

Page 10

cDNA Sequencing Corrects Gene Models

Extends gene model at both 5’ and 3’ ends

Merges three separate gene models

Page 11

Libraries and 5’ ESTs

LD (0-22hr embryo)          35,257
GM (Ovaries)                13,570
HL (Adult head)              3,293
GH (Adult head)             21,059
LP (Mixed larval/pupal)     14,976
SD (S2 cells)               20,154
AT,UT (testes library)      23,294
RE (0-22hr embryo)          61,181
RH (Adult head)             55,816
TA (Adult)                     871

Total                      249,471

Full-length sequenced

12,581 from random approach representing 9,423 genes.

3,064 from directed SLIP approach representing 1,813 genes.

Represents ~ 75% of the 14,549 predicted genes.

~ Half of the remaining 25% are in process, which leaves ~1,500 genes.
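The figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names are invented here, and it assumes that the genes recovered by the random and SLIP approaches do not overlap, which the slide does not state. Under that assumption the coverage comes out slightly above the slide's rounded ~75%.

```python
# 5' EST counts per library, as listed on the slide.
est_counts = {
    "LD": 35257, "GM": 13570, "HL": 3293, "GH": 21059, "LP": 14976,
    "SD": 20154, "AT,UT": 23294, "RE": 61181, "RH": 55816, "TA": 871,
}
total_ests = sum(est_counts.values())
print(f"Total 5' ESTs: {total_ests:,}")

# Genes represented by full-length sequenced clones.
random_genes = 9423      # from the random approach
slip_genes = 1813        # from the directed SLIP approach
predicted_genes = 14549  # predicted gene count quoted on the slide

# ASSUMPTION: the two gene sets are disjoint, so they can simply be added.
covered_genes = random_genes + slip_genes
coverage = covered_genes / predicted_genes
remaining_genes = predicted_genes - covered_genes

print(f"Genes covered: {covered_genes:,} ({coverage:.0%})")
print(f"Genes remaining: {remaining_genes:,}")
print(f"Remaining if half are in process: about {remaining_genes // 2:,}")
```

The per-library counts sum to the slide's stated total of 249,471. The gene figures are approximations on the slide, so small differences from the computed values are expected.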

Page 12

Towards completion of the DGC

RACE to define the ends of ORF-short transcripts followed by RT-PCR.

Generate cDNA libraries from complex tissues: total disc and total adult.

Perform SLIP-directed screens on new libraries.

Page 13

Purpose of the DGC

Page 14