1. The genome of D. pseudoobscura was completed by the Baylor HGSC in 2005, and while their...


Upload: patience-nelson

Post on 11-Jan-2016


Page 1: 1. The genome of D. pseudoobscura was completed by the Baylor HGSC in 2005, and while their manuscript focused on analyses of chromosome rearrangements

1. The genome of D. pseudoobscura was completed by the Baylor HGSC in 2005. While their manuscript focused on analyses of chromosome rearrangements (about 900 were estimated), it also suggested that the comparison supports ~2,000 new genes, plus refinement of thousands of others. Thus it is clear that there are many more genes in the fly genome than originally annotated, perhaps ~15,000.
2. Their analyses of Ka/Ks ratios indicated relatively few genes with signatures of positive selection, but even this congeneric comparison might be too distant to detect them.
3. An interesting analysis was a summary of the kinds of change seen in different parts of a “typical/averaged” gene, taken across all ~10,000 confident 1:1 orthologs identified. To read the figure, focus on the top of the green band - this is effectively the percent nucleotide identity.

C. 50 bp before transcript start site.
D. Entire 5’ UTR aligned at TS.
E. Entire 5’ UTR aligned at ATG.
F. 5’ end of first exon aligned at ATG.
G. 3’ end of coding exons aligned at intron donor site.
H. Intron aligned at donor site.
I. Intron aligned at acceptor site.
J. 5’ end of internal coding exons aligned at intron acceptor site.
K. 3’ end of coding region aligned at STOP codon.
L. 3’ UTR aligned at STOP.
M-O. 3’ UTR and flanking DNA.
P. Genome-wide average.
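As a rough illustration of what "percent nucleotide identity" means for one aligned pair of sequences, here is a minimal Python sketch. It is not the method used in the paper, which averaged over ~10,000 orthologs anchored at gene landmarks; the function names, the window size, and the choice to skip gap columns are all assumptions made for this example.

```python
def percent_identity(seq_a, seq_b):
    """Percent of aligned, non-gap columns where the two sequences match.

    Assumes seq_a and seq_b are rows of the same pairwise alignment,
    with '-' marking gaps. Gap columns are skipped (an assumption).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore columns containing a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def windowed_identity(seq_a, seq_b, window=10):
    """Percent identity in consecutive, non-overlapping alignment windows."""
    return [
        percent_identity(seq_a[i:i + window], seq_b[i:i + window])
        for i in range(0, len(seq_a), window)
    ]


if __name__ == "__main__":
    # Toy alignment, not real Drosophila sequence.
    print(percent_identity("ATGC-A", "ATGAAA"))
    print(windowed_identity("AAAA", "AATT", window=2))
```

Plotting the windowed values along a gene, anchored at a landmark such as the ATG, gives a single-gene version of the kind of profile the figure summarizes.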

IB404 - 11 - D. melanogaster 3 - 22 Feb

Page 2

12 Drosophila in 2007. To refine these analyses along the lines of the ~10 yeast species comparisons, an additional 10 Drosophila genomes were sequenced. They range from very close sibling species of D. melanogaster, like D. simulans (also cosmopolitan) and D. sechellia (restricted to one fruit on the Seychelles islands in the Indian Ocean), all the way out to three species in the sister subgenus, confusingly also called Drosophila, including D. grimshawi, a representative of the ~800 Hawaiian species. It is worthwhile to learn most of these species names, as they are intensively studied.

Page 3

Cosmopolitan human commensal, Out-of-Africa generalist

Specialist in Seychelles islands

Cosmopolitan human commensal, Out-of-Africa generalist

Afro-tropical savannah generalist

West African tropical specialist

Pan-tropical species - Japanese

Western USA, sibling species favorites of Theodosius Dobzhansky for microadaptation

SW desert specialist on cactus

South/Central American generalist

Larger dark flies, from Asia, but now cosmopolitan with humans

Representative of the ~800 picture-winged Hawaiian flies

Page 4

These genomes are all about the same size (~200 Mbp) and contain similar numbers of genes, around 15,000-20,000. The analyses focused on patterns of change. The simplest result is that there are around 7,000 1:1 single-copy orthologs and around 5,000 conserved homologs. The remaining genes are more patchily distributed across species or evolve too rapidly to show convincing similarities (like my chemoreceptors, of which each species has a couple hundred). Note that the category of “lineage-specific” genes in D. melanogaster is tiny because the initial annotation was conservative, while the “patchy homologues” category is absent because all comparisons are with D. mel.
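The gene categories above can be pictured as a simple rule applied to per-species copy counts for each gene family. The sketch below is only an illustration: the category rules, the function name, and the toy species set are assumptions, and the real analysis relied on sequence similarity and phylogeny rather than counts alone.

```python
def classify_family(copies):
    """Classify a gene family from its copy number in each species.

    copies: dict mapping species name -> number of copies found.
    The rules below are illustrative assumptions, not the paper's criteria.
    """
    present = [n for n in copies.values() if n > 0]
    if not present:
        raise ValueError("family is absent from every species")
    if len(present) == len(copies):
        # Found in every species: single-copy everywhere, or not.
        if all(n == 1 for n in present):
            return "single-copy ortholog"
        return "conserved homolog"
    if len(present) == 1:
        return "lineage-specific"
    return "patchy"


if __name__ == "__main__":
    # Toy counts for three species; not real data.
    print(classify_family({"mel": 1, "sim": 1, "gri": 1}))
    print(classify_family({"mel": 2, "sim": 1, "gri": 3}))
    print(classify_family({"mel": 1, "sim": 0, "gri": 1}))
    print(classify_family({"mel": 0, "sim": 0, "gri": 4}))
```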

Page 5

Recall that the comparison of D. melanogaster with D. pseudoobscura indicated that there had been many hundreds of rearrangements between their chromosomes. Here they show how this progresses with increasing phylogenetic distance. These are “synteny” plots comparing two chromosome arms (the left and right arms of chromosome 2 in D. mel., called Muller elements B and C to make them equivalent across all species, which have different numbering conventions). Amazingly, these are almost all intra-chromosomal rearrangements (mostly inversions), with precious few inter-chromosomal rearrangements (transpositions and translocations), presumably because the latter are more disruptive. Even inversions crossing a centromere are rare (blue lines).
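To make the distinction between intra- and inter-chromosomal rearrangement concrete, here is a small Python sketch with two summaries one might compute from ortholog positions: how many genes stay on the same Muller element, and how many breakpoints interrupt gene order within an element. The data layout and function names are assumptions for illustration; this is not the pipeline behind the synteny plots.

```python
from collections import Counter


def element_conservation(orthologs):
    """Count orthologs that stay on the same Muller element.

    orthologs: list of (gene, element_in_species_A, element_in_species_B).
    Returns (same, different).
    """
    counts = Counter(
        "same" if elem_a == elem_b else "different"
        for _, elem_a, elem_b in orthologs
    )
    return counts["same"], counts["different"]


def count_breakpoints(order_in_b):
    """Count adjacency breaks in gene order.

    order_in_b: ranks in species B of genes listed in species A order.
    Neighbouring genes whose ranks differ by more than 1 mark a breakpoint.
    """
    return sum(
        1 for x, y in zip(order_in_b, order_in_b[1:]) if abs(x - y) != 1
    )


if __name__ == "__main__":
    # Toy data: one gene moved from element B to C.
    toy = [("g1", "B", "B"), ("g2", "B", "C"), ("g3", "C", "C")]
    print(element_conservation(toy))
    # A single inversion of genes 1..3 creates two breakpoints.
    print(count_breakpoints([0, 3, 2, 1, 4]))
```

In a synteny dot plot, an inversion shows up as a run of points with reversed slope, bounded by two such breakpoints.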

Page 6

They attempted to look for positively selected genes, using the phylogeny and multiple species comparisons to improve the power of the analysis. However, there are two major caveats. First, these species evolve too rapidly to include all of them, so only the six melanogaster group species with D. ananassae as root could be used. Second, few genes actually showed Ka/Ks ratios above 1, with, of course, the vast majority being far less than 1. This is because even if there are some positively selected amino acid changes in a protein, they will be swamped by all the other conserved positions.

This figure sorts the 8,510 1:1 orthologs in these six species by GO category, showing more rapid evolution in “Defense response” at the top, as well as “Unknown” and “Other biological process”. Presumably the latter include many environmentally relevant genes that we don’t yet know much about.
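The "swamping" argument can be illustrated with a toy calculation. The sketch below treats the gene-wide Ka/Ks as a site-fraction-weighted mean of per-class Ka/Ks values, which is a deliberate simplification (it assumes the same synonymous rate at all sites). The function name and the example numbers are made up for illustration.

```python
def gene_wide_omega(site_classes):
    """Approximate gene-wide Ka/Ks as a weighted mean over site classes.

    site_classes: list of (fraction_of_sites, omega) pairs whose fractions
    sum to 1. This is a simplification, not a substitute for a
    likelihood-based site model.
    """
    total = sum(fraction for fraction, _ in site_classes)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("site fractions must sum to 1")
    return sum(fraction * omega for fraction, omega in site_classes)


if __name__ == "__main__":
    # Hypothetical gene: 95% of sites strongly constrained (omega = 0.05),
    # 5% under positive selection (omega = 3.0).
    omega = gene_wide_omega([(0.95, 0.05), (0.05, 3.0)])
    print(round(omega, 4))  # about 0.2, well below 1
```

Even with 5% of sites evolving three times faster than neutral, the averaged ratio stays far below 1, which is why whole-gene Ka/Ks rarely flags positive selection.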

Page 7

By undertaking detailed studies of the patterns of conservation and divergence across subsets or all of these species they could identify evolutionary signatures allowing recognition of refined features of these genomes, using again D. melanogaster as reference:
1. ~150 new protein genes recognized by patterns of synonymous codon positions changing while non-synonymous codon positions generally stay the same, prevalence of conservative amino acid changes (e.g. I-L-M), plus conservation of reading frame (indels are multiples of three).
2. ~500 new exons of existing genes, especially alternatively-spliced and small exons.
3. ~300 spurious gene predictions in D. melanogaster with the reverse kind of evidence, that is, frameshifting indels in other species and changes in all three codon positions equally frequent.
4. Many unusual gene structures, the most remarkable being ~150 instances of conserved codons after a stop codon, which they show result from read-through of some stop codons.
5. ~300 new candidate non-coding RNA genes, by virtue of conservation of folding of the RNA.
6. 35 new microRNAs to add to the 75 known, by conserved hairpin structures.
7. Many pre-transcriptional regulatory motifs (enhancers and silencers) with high confidence.
8. Post-transcriptional regulatory motifs, e.g. miRNA-binding regions in 3’ UTRs of transcripts.
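Two of the protein-coding signatures in the list above (indels in multiples of three, and substitutions concentrated at particular codon positions) are easy to sketch for a single pairwise alignment. The Python below is a simplified illustration, not the scoring scheme used in the paper. It assumes the reference row starts in frame at codon position 1, and the function names are invented for this example.

```python
import re


def gap_run_lengths(aligned):
    """Lengths of each run of '-' in one row of an alignment."""
    return [len(m.group()) for m in re.finditer(r"-+", aligned)]


def indels_preserve_frame(ref, other):
    """True if every gap run in either row is a multiple of three."""
    runs = gap_run_lengths(ref) + gap_run_lengths(other)
    return all(n % 3 == 0 for n in runs)


def substitutions_by_codon_position(ref, other):
    """Count mismatches at codon positions 1, 2 and 3 of the reference.

    Assumes ref begins at codon position 1. Columns where ref has a gap
    do not advance the codon position; gaps in other are not counted.
    """
    counts = [0, 0, 0]
    pos = 0  # index within the ungapped reference coding sequence
    for r, o in zip(ref.upper(), other.upper()):
        if r == "-":
            continue
        if o != "-" and r != o:
            counts[pos % 3] += 1
        pos += 1
    return counts


if __name__ == "__main__":
    # Toy sequences, not real genes.
    print(indels_preserve_frame("ATG---AAA", "ATGCCCAAA"))
    print(indels_preserve_frame("ATG-AAA", "ATGCAAA"))
    print(substitutions_by_codon_position("ATGGCCAAA", "ATGGCTAAA"))
```

A real coding region would tend to show frame-preserving gaps and an excess of third-position changes, while a spurious prediction would show frameshifting gaps and roughly equal counts at all three positions.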

Page 8

Here are actual examples of: A. protein-coding exon conservation or lack thereof; B. conservation of base-pairing in folded portions of a non-coding RNA; C. the same for a microRNA hairpin.

Page 9

Scaling of comparative genomics power. Finally, given such huge sets of analyses with 12 species of varying phylogenetic distance, they were able to ask which types of comparisons work best for which features of genomes. The three plots below ask how well pairwise and multiple-species comparisons, plotted against phylogenetic distance or total tree branch length respectively, recover known ncRNAs, miRNAs, and regulatory motifs. Clearly pairwise comparisons (blue) are less powerful than multiple-species comparisons, more so for ncRNAs and less so for miRNAs (apparently because the latter are so highly conserved that they are easier to find). Notice that very close comparisons are essentially useless for motif discovery, presumably because there are not enough background changes for conserved motifs to stand out against. They also looked at exon discovery and found that for long exons even close pairwise comparisons are fine, but for short exons distant and multiple comparisons are needed. Nine more genomes have been sequenced to date.

Page 10

Flies versus vertebrates. These two schematics show the levels of molecular divergence between these 12 fly species, compared with vertebrates back to fish. The measure is essentially synonymous changes, which are presumably evolving at close to neutral rates. The top line shows pairwise comparisons: comparisons across just the melanogaster species subgroup span about half the distance across all placental mammals, comparisons out to D. ananassae are equivalent to placental versus marsupial or even monotreme, and comparisons across the two subgenera are similar to those across all tetrapods. Going out to the next fly genus would be equivalent to spanning all vertebrates, which is roughly 50 Myr versus 500 Myr; thus flies evolved molecularly ~10X faster, as indicated in the schematic at the end of the C. elegans 2 lecture.

The second line shows multi-species comparisons: including all 12 flies is like including all 20 species of mammals out to the monotremes. Taking this further in the next two lectures, comparing orders of endopterygote (metamorphosing) insects is equivalent to going through all chordates and indeed all deuterostomes. Thus insects evolve roughly 10X faster molecularly.
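The ~10X figure follows from simple arithmetic. As a sketch, suppose the same amount of neutral (synonymous) divergence d has accumulated over roughly 50 Myr in flies and roughly 500 Myr in vertebrates. The symbols d, T, and r are notation introduced here, not taken from the slides.

```latex
r_{\mathrm{fly}} = \frac{d}{T_{\mathrm{fly}}}, \qquad
r_{\mathrm{vert}} = \frac{d}{T_{\mathrm{vert}}}, \qquad
\frac{r_{\mathrm{fly}}}{r_{\mathrm{vert}}}
  = \frac{T_{\mathrm{vert}}}{T_{\mathrm{fly}}}
  = \frac{500\ \mathrm{Myr}}{50\ \mathrm{Myr}} = 10
```

The divergence d cancels, so the ratio depends only on the two time spans being compared.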

Page 11

modENCODE 2010. An exhaustive effort was made to confirm and extend these inferences by experimental work, called the modENCODE project for Model Organism Encyclopedia of DNA Elements, following a similar study of 1% of the human genome. It was done for both D. melanogaster and C. elegans, and covered chromatin structures, epigenetics and histone modifications, DNA replication, and RNA transcription. It succeeded in tripling the number of nucleotides in the genome for which a role is known, largely by adding regulatory regions of various kinds. Just as a few examples, they added another ~2000 protein-coding and non-coding genes, modified or added ~53,000 exons in existing gene models, recognized ~2000 non-coding transcripts that might represent additional genes, and extensively documented all the short- and micro-RNAs being produced from this genome. Because the functions of these non-coding transcripts are seldom known, they are commonly called the “dark matter” of genomes.

Page 12

modENCODE totals. Bars show the contribution of sub-regions of the genome, with the line above showing cumulative coverage of the genome. While coding exons (left) occupy 23% of the unique non-repetitive regions of the genome (blue), and 34% of the region of the genome conserved across these 12 Drosophila species and mosquitoes (red), when you add in all the other regions they document, including 5’ and 3’ UTRs, non-coding RNAs, TF and other protein-bound regions, chromatin domains, and introns, 90% of the genome is accounted for!
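A cumulative-coverage line of this kind can be computed by adding annotation classes one at a time and measuring the union of all intervals seen so far, so that overlapping annotations are not double-counted. The Python sketch below shows the idea. The interval coordinates are invented toy values chosen only to echo the 23% and 90% figures, and the function names are assumptions, not modENCODE code.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open intervals (start, end)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def cumulative_coverage(classes, genome_length):
    """Percent of the genome covered as each annotation class is added.

    classes: ordered list of (name, [(start, end), ...]).
    Returns a list of (name, cumulative_percent).
    """
    seen = []
    result = []
    for name, intervals in classes:
        seen.extend(intervals)
        covered = sum(end - start for start, end in merge_intervals(seen))
        result.append((name, 100.0 * covered / genome_length))
    return result


if __name__ == "__main__":
    # Toy 100 bp "genome" with made-up annotations.
    toy_classes = [
        ("coding exons", [(0, 23)]),
        ("UTRs", [(20, 40)]),     # overlaps the exons by 3 bp
        ("introns", [(40, 90)]),
    ]
    for name, pct in cumulative_coverage(toy_classes, genome_length=100):
        print(name, pct)
```

Because the union is recomputed at each step, the order in which classes are added changes the height of each bar but not the final total.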