hertweck bbl2012

Post on 10-May-2015


DESCRIPTION

Annual presentation to NESCent scientists in Dec 2012 about my postdoctoral research projects.

TRANSCRIPT

Genome-wide effects of transposable element evolution

Kate L Hertweck

National Evolutionary Synthesis Center (NESCent)

But first... a teaching interlude

● Teaching half time for Duke Bio 202 (genetics and evolution)

● Responsible for one lab section, lab development, and lecturing

● Interesting integration of Duke course with Coursera next semester

Overview

1. Transposable elements as a model system

2. Genomic contributions to life history evolution in Asparagales

3. TEs and aging in Drosophila

What is in a genome?

● The first step in analyzing genomes is usually to mask or filter repetitive sequences, which often comprise a large portion of the nuclear genome

● Repetitive sequences include satellites, telomeres, and other “junk” DNA elements

● “Selfish” DNA (mobile genetic elements) is a category of repetitive sequences comprising transposable elements (parasitic, self-replicating sequences derived from viruses)

● Growing evidence (including ENCODE) suggests that “junk” DNA has essential functions and provides material for evolutionary innovation

TEs Asparagales Drosophila

Class I: Retrotransposons (LTR, LINE, SINE, ERV, SVA)

Class II: DNA transposons (TIR, Crypton, Helitron, Maverick)

www.virtualsciencefair.org

TEs directly affect organisms as they move throughout a genome

Kate Hertweck, Genomic effects of repetitive DNA

● TEs interact with genes

● TE insertion within a gene disrupts function

● Exaptation of TEs into genes: Alu elements contributed to evolution of three color vision (Dulai, 1999)

● Gene expression and regulatory changes

● TEs affect molecular evolution

● Indels

● Increased recombination (chromosomal restructuring)

● Links between TEs and adaptation/speciation


TEs indirectly affect organisms through changes in genome size

Changes in overall genome size

Physical-mechanical effects of nuclear size and mass

Many historical hypotheses about relationships between genome size and life history (complexity, mean generation time, ecology, growth form)


Research questions and goals

● What are patterns of genome expansion and contraction throughout the evolutionary history of organisms?

● Patterns in genome size change

● Proliferation of TEs within lineages

Evolutionnews.org

● Do genomic patterns correlate with changes in life history?

● Improving methods for comparative genomics across broad taxonomic levels

● Application of phylogenetic comparative methods to genomic data


Overview

Collaborators: J. Chris Pires and lab (U of Missouri), Patrick Edger, Dustin Mayfield

1. Transposable elements as a model system

2. Genomic contributions to life history evolution in Asparagales

3. TEs and aging in Drosophila

Genomic evolution in Asparagales

● Many edible species (onion, asparagus, agave) and ornamentals (orchid, amaryllis, yucca)

● Lots of variation in life history traits: physiology, growth habit, habitat

● Interesting patterns of genomic evolution

• Wide variation in genome size

• Bimodal karyotypes

● Despite possessing some of the largest angiosperm genomes, we know little about the TEs in Asparagales

● Possibility to test hypotheses of correlations between genomic changes and life history traits

ag.arizona.edu Naturehills.com


Our data

● Illumina (80-120 bp single end), 6 taxa per lane

● GSS (Genome Survey Sequences): total genomic DNA!

● Data originally collected for systematics

● Assembled plastomes, mtDNA genes, and nrDNA genes from less than 10% of data (Steele et al 2012)

● Poaceae (family of grasses, model system)

• Medium-sized genomes

• Well-annotated library of repeats

● Asparagales (order of petaloid monocots, non-model system)

• Very large genomes

• Discovery of novel repeats


● Is there a way to characterize repeats when the genome is a big black box?


Bioinformatics approach

● Sequence assembly:

● Ab initio repeat construction: use raw sequence reads to build pseudomolecules or ancestral sequences

● De novo sequence assembly: standard genome assembly methods, screen resulting contigs


● Annotation method:

● Motif searching

● Reference library


Sidenote: improving the ontology for transposable elements (classification and annotation): Sequence Ontology (SO), Comparative Data Analysis Ontology (CDAO)


Pipeline


Raw fastq files

De novo genome assembly (MSR-CA)

Filter out scaffolds that BLAST to reference organellar genomes

Map raw reads back to scaffolds to estimate relative proportion of TE

Run RepeatMasker to identify similarity to known repeats (3110 repeats, 98.7% are from grasses)

Discard unknown scaffolds and “unimportant” repeats, categorize others by type

Scripts available on GitHub: AsparagalesTEscripts
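The read-mapping step of the pipeline can be illustrated with a small sketch. This is not the code from the GitHub scripts; it only shows one plausible way to turn per-scaffold read counts into relative TE proportions. The function name, the input dictionaries, and the choice to drop organellar reads from the denominator are all assumptions made for illustration.

```python
from collections import defaultdict


def te_proportions(mapped_reads, repeat_class, organellar, total_reads):
    """Return the percentage of nuclear reads mapping to each repeat class.

    mapped_reads: dict of scaffold id -> number of raw reads mapped to it
    repeat_class: dict of scaffold id -> repeat class label
                  (scaffolds missing from this dict count as unknown)
    organellar:   set of scaffold ids that matched organellar genomes
    total_reads:  total number of raw reads in the sample
    """
    # Assumption: reads on organellar scaffolds are removed from the
    # denominator, so percentages describe the nuclear genome only.
    organellar_reads = sum(mapped_reads.get(s, 0) for s in organellar)
    nuclear_reads = total_reads - organellar_reads
    if nuclear_reads <= 0:
        raise ValueError("no nuclear reads left after filtering")

    per_class = defaultdict(int)
    for scaffold, n_reads in mapped_reads.items():
        if scaffold in organellar:
            continue  # filtered out, as in the pipeline
        label = repeat_class.get(scaffold)
        if label is None:
            continue  # unknown scaffolds are discarded
        per_class[label] += n_reads

    return {label: 100.0 * n / nuclear_reads for label, n in per_class.items()}


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    reads = {"s1": 300, "s2": 100, "s3": 50, "cp1": 500, "s4": 25}
    classes = {"s1": "Gypsy LTR", "s2": "Copia LTR", "s3": "Gypsy LTR"}
    print(te_proportions(reads, classes, {"cp1"}, total_reads=1500))
```

Because every taxon passes through the same calculation, any bias in the estimate is shared across taxa, which is the rationale given in the quality-control slide.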


Quality control: Poaceae

● Largest scaffolds with deepest coverage are from the chloroplast and mitochondrial genomes, but are easily identified for exclusion

● All relevant classes of repeats are present in scaffolds from a single genome

● Even long repeats can be reconstructed into a single scaffold

● Characterization of repeats is not dependent on sequence coverage

● Estimates of repeat quantity are not very accurate, but there is little consensus on TE quantification in the published literature!

● Decision: use a dataset constructed from similar data and analyzed in the same pipeline so any error is systematic and shared among all taxa

● How well do these methods work for non-model systems?


Example: LTR from Hosta

● Reads map across scaffold: assembly is reliable

● Some divergence in reads: measure of diversity?


REs in Core Asparagales


Genome size varies among core Asparagales

[Figure. Series: genome size (Gb), number of reads (billions)]

Number of scaffolds varies among taxa

[Figure. Series: total scaffolds, nuclear scaffolds]

Proportion of TEs varies among taxa

[Figure. Series (%): Copia LTRs, Gypsy LTRs, LINEs, DNA TEs, other (RC, satellite, low complexity, simple repeats)]

Very large genomes in Core Asparagales


Small genomes contain variation


Developing genomic traits for comparative biology


● Genomic traits can be treated just like any other phenotype

• Number of gene copies of a single family

• Genome size, intron size, GC content, number of chromosomes, polyploidy, karyotype (sex chromosomes)

• Sometimes genomic traits evolve in such a way that models need to be altered to accommodate their variation

● We finally have enough information to be able to apply these methods across robust phylogenies of organisms!

● What about transposable elements?

So what?

● You can peek into the black box of large plant genomes with even very limited genomic sequence data

● There is a great deal of variation in TE complements among closely related plant species

● These methods can easily be applied to extant datasets to summarize TEs


So what?

● Data available for most plants are low coverage, with little known about the TEs present and their direct effects on the genome and organism

● Plant genomes tolerate more plasticity than animal genomes

• Polyploidy, chromosomal restructuring more common in plants

• Repetitive complement comprises a higher proportion of plant genomes

• Differences in gene silencing

● Pretty plants are great, but what if we want a more applied approach?


Overview

Collaborators: Joseph Graves (UNCG, NC A&T), Michael Rose (UC Irvine), Mira Han (NESCent)

1. Transposable elements as a model system

2. Genomic contributions to life history evolution in Asparagales

3. TEs and aging in Drosophila

Genomics of aging

● Aging as “detuning” of adaptation

● Age-related genes and expression patterns

● Does the movement of TEs throughout a genome correspond to how long an organism lives?

● Previously discussed life history traits only involve TE proliferation in gametic tissue

● Questions about aging involve changes in organisms throughout lifespan, especially if results can be transferred to human research


Experimental data

● Replicate populations of fruit flies selected for both short and long life spans (Burke et al 2010)

● Next-gen sequencing of pooled populations

● SNP analysis indicates allele frequency changes at many loci, but little evidence for selective sweeps

● Extensive gene expression change


Experimental approach

FB, MITE, LINE, LTR, TIR

● Does the frequency of a TE differ between control and treatment populations?

● Are there patterns consistent with type of TE?

● T-lex: Perl script for identifying presence and absence of annotated transposable elements

● 2947 transposable elements from publicly available genome sequence


Scripts available on GitHub: flyTEscripts

Preliminary results

● Controls and populations selected for shorter lifespan

● All population pairs are statistically the same (Kruskal-Wallis, p=0.9414)
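For readers unfamiliar with the Kruskal-Wallis test cited above, here is a minimal pure-Python sketch of how the H statistic and its chi-square p-value are computed. It is not the analysis code behind the slide, and the input numbers in the example are invented. The function names `kruskal_wallis` and `chi2_sf` are illustrative; in practice a library routine such as `scipy.stats.kruskal` would normally be used.

```python
import math


def chi2_sf(x, df):
    """Upper tail probability of a chi-square distribution (series expansion)."""
    if x <= 0:
        return 1.0
    a, half_x = df / 2.0, x / 2.0
    # First term of the series for the regularized lower incomplete gamma.
    term = math.exp(-half_x + a * math.log(half_x) - math.lgamma(a + 1))
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= half_x / (a + n)
        total += term
        n += 1
    return max(0.0, 1.0 - total)


def kruskal_wallis(groups):
    """Return (H, p) for a list of samples, each a list of numbers."""
    # Pool all observations, remembering which group each came from.
    pooled = sorted((v, g) for g, sample in enumerate(groups) for v in sample)
    n = len(pooled)

    rank_sums = [0.0] * len(groups)
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j for tied values
        for k in range(i, j):
            rank_sums[pooled[k][1]] += avg_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j

    h = 12.0 / (n * (n + 1)) * sum(
        r * r / len(s) for r, s in zip(rank_sums, groups)
    ) - 3 * (n + 1)
    correction = 1.0 - tie_term / float(n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    h /= correction  # standard correction for ties
    return h, chi2_sf(h, len(groups) - 1)


if __name__ == "__main__":
    # Invented TE counts for three hypothetical populations.
    h, p = kruskal_wallis([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
    print(h, p)
```

A large p-value, like the 0.9414 reported on the slide, means the test finds no evidence that the groups differ.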

[Figure. x-axis: population (1 to 5); y-axis: number of TEs]


● 153 TEs vary in one or more population

● 70 TEs vary in all five populations

● Some TE frequencies move to fixation
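Counts like those above could be tallied from a table of TE frequencies per population pair. The sketch below is only an illustration: the slides do not say how "vary" or "fixation" were defined, so the threshold `min_diff`, the fixation rule, the function name, and the TE identifiers are all assumptions.

```python
def summarize_te_variation(freqs, min_diff=0.0):
    """Classify TEs by how their frequency differs between paired populations.

    freqs: dict of TE id -> list of (control, selected) frequency pairs,
           one pair per population pair.
    min_diff: hypothetical threshold; a TE "varies" in a pair when the
              absolute frequency difference exceeds it.
    """
    varies_any, varies_all, fixed = [], [], []
    for te, pairs in freqs.items():
        differs = [abs(c - s) > min_diff for c, s in pairs]
        if any(differs):
            varies_any.append(te)
        if pairs and all(differs):
            varies_all.append(te)
        # Assumed definition of fixation: frequency reaches 1.0 in a
        # selected population while its control is below 1.0.
        if any(s == 1.0 and c < 1.0 for c, s in pairs):
            fixed.append(te)
    return {
        "any": sorted(varies_any),
        "all": sorted(varies_all),
        "fixed": sorted(fixed),
    }


if __name__ == "__main__":
    # Invented frequencies for three hypothetical TEs in two population pairs.
    example = {
        "TE_A": [(0.2, 0.2), (0.5, 0.5)],
        "TE_B": [(0.1, 0.4), (0.3, 0.3)],
        "TE_C": [(0.6, 1.0), (0.7, 0.9)],
    }
    print(summarize_te_variation(example))
```

With real pooled-sequencing data a nonzero `min_diff` would probably be needed, since estimated frequencies rarely match exactly even when nothing has changed.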

Finishing the job...

● What are patterns from other population pairs (selection for longer lifespan)?

● Formal statistical testing for variation

● Where are TEs of interest located in the genome? What genes are located nearby?

● T-lex de novo: searching for unannotated insertions

– Are there unique TE insertions related to longer life spans?


Conclusions

● What are general patterns of TE evolution?

● Different TEs contribute to genome size obesity.

● We still need better methods to compare genomes.

● Are there common patterns between TEs and life history trait evolution?

● Yes, very specific insertions, at least in Drosophila.

● How can comparative methods be appropriated for genomic characteristics?

● Does TE proliferation contribute to diversification or shifts in rates of molecular evolution?

● We are getting closer to possessing enough data to answer these questions.


Conclusions

● There are many interesting questions to be investigated using other folks' genomic trash!

● A little sequencing data can tell you a lot about a genome.

● Many markers for systematic purposes

● You can characterize major groups of repeats even in the absence of a robust reference library for the species.

● Informatics tools and resources abound!


Acknowledgements

Kate Hertweck, Evolutionary effects of junk DNA

Kate Hertweck, TE ontology

NESCent (National Evolutionary Synthesis Center)

Allen Rodrigo

Karen Cranston (and bioinformatics group!)

www.nescent.org

k8hert.blogspot.com

Find me: Twitter @k8hert, Google+ k8hertweck@gmail.com
