hertweck bbl2012

Genome-wide effects of transposable element evolution Kate L Hertweck National Evolutionary Synthesis Center (NESCent)

Upload: kate-hertweck

Post on 10-May-2015


DESCRIPTION

Annual presentation to NESCent scientists in Dec 2012 about my postdoctoral research projects.

TRANSCRIPT

Page 1: Hertweck bbl2012

Genome-wide effects of transposable element evolution

Kate L Hertweck

National Evolutionary Synthesis Center (NESCent)

Page 2: Hertweck bbl2012

But first... a teaching interlude

● Teaching half time for Duke Bio 202 (genetics and evolution)

● Responsible for one lab section, lab development, and lecturing

● Interesting integration of Duke course with Coursera next semester

Page 3: Hertweck bbl2012

Overview

1. Transposable elements as a model system

2. Genomic contributions to life history evolution in Asparagales

3. TEs and aging in Drosophila

Page 4: Hertweck bbl2012

What is in a genome?

● The first step in analyzing genomes is usually to mask or filter repetitive sequences, which often comprise a large portion of the nuclear genome

● Repetitive sequences include satellites, telomeres, and other “junk” DNA elements

● “Selfish” DNA (or mobile genetic elements) is the category of repetitive sequences that contains transposable elements (parasitic, self-replicating elements derived from viruses)

● Growing evidence (including ENCODE) suggests that “junk” DNA has essential functions and provides material for evolutionary innovation

TEs Asparagales Drosophila

Class I: Retrotransposons (LTR, LINE, SINE, ERV, SVA)

Class II: DNA transposons (TIR, Crypton, Helitron, Maverick)

www.virtualsciencefair.org

Page 5: Hertweck bbl2012

TEs directly affect organisms as they move throughout a genome

● TEs interact with genes

● TE insertion within a gene disrupts function

● Exaptation of TEs into genes: Alu elements contributed to the evolution of three-color vision (Dulai, 1999)

● Gene expression and regulatory changes

● TEs affect molecular evolution

● Indels

● Increased recombination (chromosomal restructuring)

● Links between TEs and adaptation/speciation

Page 6: Hertweck bbl2012

TEs indirectly affect organisms through changes in genome size

Changes in overall genome size

Physical-mechanical effects of nuclear size and mass

Many historical hypotheses about relationships between genome size and life history (complexity, mean generation time, ecology, growth form)

Page 8: Hertweck bbl2012

Research questions and goals

● What are patterns of genome expansion and contraction throughout the evolutionary history of organisms?

● Patterns in genome size change

● Proliferation of TEs within lineages

Evolutionnews.org

● Do genomic patterns correlate with changes in life history?

● Improving methods for comparative genomics across broad taxonomic levels

● Application of phylogenetic comparative methods to genomic data

Page 9: Hertweck bbl2012

Overview

Collaborators: J. Chris Pires and lab (U of Missouri), Patrick Edger, Dustin Mayfield

1. Transposable elements as a model system

2. Genomic contributions to life history evolution in Asparagales

3. TEs and aging in Drosophila

Page 10: Hertweck bbl2012

Genomic evolution in Asparagales

● Many edible species (onion, asparagus, agave) and ornamentals (orchid, amaryllis, yucca)

● Lots of variation in life history traits: physiology, growth habit, habitat

● Interesting patterns of genomic evolution

● Wide variation in genome size

● Bimodal karyotypes

● Despite possessing some of the largest angiosperm genomes, we know little about the TEs in Asparagales

● Possibility to test hypotheses of correlations between genomic changes and life history traits

ag.arizona.edu Naturehills.com

Page 16: Hertweck bbl2012

Our data

● Illumina (80-120 bp single end), 6 taxa per lane

● GSS (Genome Survey Sequences): total genomic DNA!

● Data originally collected for systematics

● Assembled plastomes, mtDNA genes, and nrDNA genes from less than 10% of data (Steele et al 2012)

● Poaceae (family of grasses, model system)

● Medium-sized genomes

● Well-annotated library of repeats

● Asparagales (order of petaloid monocots, non-model system)

● Very large genomes

● Discovery of novel repeats

● Is there a way to characterize repeats when the genome is a big black box?

Page 19: Hertweck bbl2012

Bioinformatics approach

Sidenote: improving the ontology for transposable elements (classification and annotation)

● Sequence Ontology (SO)

● Comparative Data Analysis Ontology (CDAO)

● Sequence assembly:

● Ab initio repeat construction: use raw sequence reads to build pseudomolecules or ancestral sequences

● De novo sequence assembly: standard genome assembly methods, screen resulting contigs

● Annotation method:

● Motif searching

● Reference library

Page 20: Hertweck bbl2012

Pipeline

Raw fastq files

De novo genome assembly (MSR-CA)

Filter out scaffolds that BLAST to reference organellar genomes

Map raw reads back to scaffolds to estimate relative proportion of TE

Run RepeatMasker to identify similarity to known repeats (3110 repeats, 98.7% are from grasses)

Discard unknown scaffolds and “unimportant” repeats, categorize others by type

Scripts available on GitHub: AsparagalesTEscripts
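
The read-mapping step in this pipeline can be sketched roughly as follows. This is an illustration only, not the code from the AsparagalesTEscripts repository. It assumes that per-scaffold mapped-read counts, RepeatMasker-style class labels, and the list of organellar scaffolds have already been extracted into plain Python dictionaries and sets. The function name, scaffold names, labels, and counts are all hypothetical, and using nuclear reads as the denominator is a design choice made here rather than something stated on the slide.

```python
from collections import defaultdict


def te_proportions(mapped_reads, scaffold_class, total_reads, organellar=frozenset()):
    """Return the percent of nuclear reads that map to each repeat class.

    mapped_reads:   {scaffold: number of reads mapped to that scaffold}
    scaffold_class: {scaffold: repeat class label}; scaffolds without a
                    label (unknown) are left out of the result
    total_reads:    total number of reads in the sample
    organellar:     scaffolds that matched plastid or mitochondrial references
    """
    # Reads from organellar scaffolds are removed from the denominator
    # (an assumption of this sketch).
    organellar_reads = sum(n for s, n in mapped_reads.items() if s in organellar)
    nuclear_total = total_reads - organellar_reads
    if nuclear_total <= 0:
        raise ValueError("no nuclear reads left after removing organellar reads")

    per_class = defaultdict(int)
    for scaffold, n_reads in mapped_reads.items():
        if scaffold in organellar:
            continue  # filtered out, as in the BLAST step
        label = scaffold_class.get(scaffold)
        if label is None:
            continue  # unknown scaffold: discarded
        per_class[label] += n_reads

    return {label: 100.0 * n / nuclear_total for label, n in per_class.items()}


# Hypothetical example data (not from the talk).
mapped = {"scaf1": 5000, "scaf2": 1200, "scaf3": 800, "scaf4": 300, "scaf5": 700}
classes = {"scaf2": "LTR/Gypsy", "scaf3": "LTR/Copia", "scaf4": "LTR/Gypsy"}
props = te_proportions(mapped, classes, total_reads=20000, organellar={"scaf1"})
for label in sorted(props):
    print(f"{label}: {props[label]:.1f}%")
```

Because every taxon goes through the same calculation, any bias in the estimate is shared across taxa, which matches the quality-control decision described for Poaceae.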

Page 22: Hertweck bbl2012

Quality control: Poaceae

● Largest scaffolds with deepest coverage are from the chloroplast and mitochondrial genomes, but are easily identified for exclusion

● All relevant classes of repeats are present in scaffolds from a single genome

● Even long repeats can be reconstructed into a single scaffold

● Characterization of repeats is not dependent on sequence coverage

● Estimates of repeat quantity are not very accurate, but there is little consensus on TE quantification in the published literature!

● Decision: use a dataset constructed from similar data and analyzed in the same pipeline so any error is systematic and shared among all taxa

● How well do these methods work for non-model systems?

Page 23: Hertweck bbl2012

Example: LTR from Hosta

● Reads map across scaffold: assembly is reliable

● Some divergence in reads: measure of diversity?

Page 24: Hertweck bbl2012

REs in Core Asparagales

Page 25: Hertweck bbl2012

Genome size varies among core Asparagales

[Figure: genome size (Gb) and number of reads (billions); axis 0 to 25]

Page 26: Hertweck bbl2012

Number of scaffolds varies among taxa

[Figure: total scaffolds and nuclear scaffolds per taxon; axis 0 to 3000]

Page 27: Hertweck bbl2012

Proportion of TEs varies among taxa

[Figure: percentage of each repeat type per taxon (axis 0 to 60); categories: DNA TEs, LINEs, Gypsy LTRs, Copia LTRs, other (RC, satellite, low complexity, simple repeats)]

Page 28: Hertweck bbl2012

Very large genomes in Core Asparagales

Page 29: Hertweck bbl2012

Small genomes contain variation

Page 30: Hertweck bbl2012

Developing genomic traits for comparative biology

● Genomic traits can be treated just like any other phenotype

• Number of gene copies of a single family

• Genome size, intron size, GC content, number of chromosomes, polyploidy, karyotype (sex chromosomes)

• Sometimes genomic traits evolve in such a way that models need to be altered to accommodate their variation

● We finally have enough information to be able to apply these methods across robust phylogenies of organisms!

● What about transposable elements?

Page 31: Hertweck bbl2012

So what?

● You can peek into the black box of large plant genomes with even very limited genomic sequence data

● There is a great deal of variation in TE complements among closely related plant species

● These methods can easily be applied to extant datasets to summarize TEs

Page 32: Hertweck bbl2012

So what?

● Data available for most plants are low coverage, with little known about the TEs present and their direct effects on the genome and organism

● Plant genomes tolerate more plasticity than animal genomes

• Polyploidy, chromosomal restructuring more common in plants

• Repetitive complement comprises a higher proportion of plant genomes

• Differences in gene silencing

● Pretty plants are great, but what if we want a more applied approach?

Page 33: Hertweck bbl2012

Overview

Collaborators: Joseph Graves (UNCG, NC A&T), Michael Rose (UC Irvine), Mira Han (NESCent)

1. Transposable elements as a model system

2. Genomic contributions to life history evolution in Asparagales

3. TEs and aging in Drosophila

Page 34: Hertweck bbl2012

Genomics of aging

● Aging as “detuning” of adaptation

● Age-related genes and expression patterns

● Does the movement of TEs throughout a genome correspond to how long an organism lives?

● Previously discussed life history traits only involve TE proliferation in gametic tissue

● Questions about aging involve changes in organisms throughout lifespan, especially if results can be transferred to human research

Page 35: Hertweck bbl2012

Experimental data

● Replicate populations of fruit flies selected for both short and long life spans (Burke et al 2010)

● Next-gen sequencing of pooled populations

● SNP analysis indicates allele frequency changes at many loci, but little evidence for selective sweeps

● Extensive gene expression change

Page 36: Hertweck bbl2012

Experimental approach

[Figure legend: FB, MITE, LINE, LTR, TIR]

● Does the frequency of a TE differ between control and treatment populations?

● Are there patterns consistent with type of TE?

● T-lex: Perl script for identifying presence and absence of annotated transposable elements

● 2947 transposable elements from publicly available genome sequence

Scripts available on GitHub: flyTEscripts

Page 37: Hertweck bbl2012

Preliminary results

● Controls and populations selected for shorter lifespan

● No statistically significant difference among population pairs (Kruskal-Wallis, p = 0.9414)

[Figure: number of TEs (y-axis, 0 to 700) for populations 1 to 5 (x-axis); legend: NA0100final]
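
For readers unfamiliar with the Kruskal-Wallis test cited on this slide, here is a minimal standard-library sketch of how the statistic is computed. It is not the analysis from the flyTEscripts repository, and the TE counts are invented for illustration. The p-value uses the chi-square approximation with a closed form that holds only for an even number of degrees of freedom, which covers five groups (df = 4).

```python
import math


def rank_with_ties(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis(groups):
    """Return (H, p) for a list of samples, with the usual tie correction."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = rank_with_ties(pooled)

    # Sum of (rank sum squared / group size) over groups.
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)

    # Tie correction: divide H by 1 - sum(t^3 - t) / (n^3 - n).
    counts = {}
    for x in pooled:
        counts[x] = counts.get(x, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are identical")
    h /= correction

    df = len(groups) - 1
    if df < 2 or df % 2 != 0:
        raise ValueError("this sketch only handles an even, positive df")
    # Chi-square survival function for even df = 2m:
    # exp(-x/2) * sum_{i<m} (x/2)^i / i!
    half = h / 2
    p = math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))
    return h, p


# Hypothetical TE counts for five populations (not data from the talk).
populations = [
    [612, 598, 640],
    [605, 621, 633],
    [590, 644, 617],
    [628, 601, 615],
    [609, 636, 596],
]
h_stat, p_value = kruskal_wallis(populations)
print(f"H = {h_stat:.3f}, p = {p_value:.4f}")
```

If SciPy is installed, `scipy.stats.kruskal` performs the same test for any number of groups and is the more practical choice.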

Page 38: Hertweck bbl2012

Preliminary results

● Controls and populations selected for shorter lifespan

● 153 TEs vary in one or more populations

● 70 TEs vary in all five populations

● Some TE frequencies move to fixation
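
A tally like the one above could be produced along these lines. This is a sketch under stated assumptions: the slide does not say what counts as "varying", so the `min_diff` threshold, the input layout (one control/selected frequency pair per population for each TE), the function name, and the example values are all hypothetical.

```python
def summarize_te_variation(freqs, min_diff=0.1):
    """Classify TEs by how their frequency differs between paired populations.

    freqs:    {te_id: [(control_freq, selected_freq), ...]} with one pair
              per population and frequencies between 0 and 1
    min_diff: smallest absolute difference treated as "varying"
              (an arbitrary threshold chosen for this sketch)
    """
    summary = {"any": [], "all": [], "fixed": []}
    for te_id, pairs in freqs.items():
        differs = [abs(control - selected) >= min_diff for control, selected in pairs]
        if any(differs):
            summary["any"].append(te_id)
        if pairs and all(differs):
            summary["all"].append(te_id)
        # "Fixed": present in every sampled copy of a selected population
        # while still polymorphic in its control.
        if any(selected == 1.0 and control < 1.0 for control, selected in pairs):
            summary["fixed"].append(te_id)
    return summary


# Hypothetical frequencies for three TEs in two population pairs.
example = {
    "te_A": [(0.5, 0.5), (0.4, 0.45)],
    "te_B": [(0.2, 0.6), (0.5, 0.5)],
    "te_C": [(0.6, 1.0), (0.7, 1.0)],
}
result = summarize_te_variation(example)
print(len(result["any"]), "TEs vary in one or more populations")
print(len(result["all"]), "TEs vary in all populations")
print(len(result["fixed"]), "TEs reach fixation in a selected population")
```

A fixed threshold is only a first pass; the formal statistical testing listed as a next step would replace it.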

Page 39: Hertweck bbl2012

Finishing the job...

● What are patterns from other population pairs (selection for longer lifespan)?

● Formal statistical testing for variation

● Where are TEs of interest located in the genome? What genes are located nearby?

● T-lex de novo: searching for unannotated insertions

– Are there unique TE insertions related to longer life spans?

Page 40: Hertweck bbl2012

Conclusions

● What are general patterns of TE evolution?

● Different TEs contribute to genome size obesity.

● We still need better methods to compare genomes.

● Are there common patterns between TEs and life history trait evolution?

● Yes, very specific insertions, at least in Drosophila.

● How can comparative methods be appropriated for genomic characteristics?

● Does TE proliferation contribute to diversification or shifts in rates of molecular evolution?

● We are getting closer to possessing enough data to answer these questions.

Page 41: Hertweck bbl2012

Conclusions

● There are many interesting questions to be investigated using other folks' genomic trash!

● A little sequencing data can tell you a lot about a genome.

● Many markers for systematic purposes

● You can characterize major groups of repeats even in the absence of a robust reference library for the species.

● Informatics tools and resources abound!

Page 42: Hertweck bbl2012

Acknowledgements

NESCent (National Evolutionary Synthesis Center)

Allen Roderigo

Karen Cranston (and bioinformatics group!)

www.nescent.org

k8hert.blogspot.com

Find me: Twitter @k8hert, Google+ [email protected]