computational analysis of transposable element evolution...

27
Casey M. Bergman Faculty of Life Sciences University of Manchester [email protected] http://bioinf.manchester.ac.uk/bergman Computational analysis of transposable element evolution in Drosophila genomes.

Upload: others

Post on 14-Jul-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Computational analysis of transposable element evolution ...bergmanlab.genetics.uga.edu/wp-content/uploads/2010/02/BergmanICG2008.pdfLong Terminal Repeat (LTR) (A)n. Why is the discovery

Casey M. Bergman

Faculty of Life SciencesUniversity of Manchester

[email protected]://bioinf.manchester.ac.uk/bergman

Computational analysis of transposable element evolution in Drosophila genomes.

Page 2: Computational analysis of transposable element evolution ...bergmanlab.genetics.uga.edu/wp-content/uploads/2010/02/BergmanICG2008.pdfLong Terminal Repeat (LTR) (A)n. Why is the discovery

Overview of Talk

• Demographic history of TEs in D. melanogaster

• TE population genomics using 454 sequencing

• Discovery and detection of TEs in genomes

• Abundance and distribution of TEs in Drosophila genomes

• Noncoding DNA (ncDNA) & transposable elements (TEs)

Page 3: Computational analysis of transposable element evolution ...bergmanlab.genetics.uga.edu/wp-content/uploads/2010/02/BergmanICG2008.pdfLong Terminal Repeat (LTR) (A)n. Why is the discovery

Higher organisms have ahigher proportion of ncDNA

Bacteria15 %

Yeast30 %

Page 4: Computational analysis of transposable element evolution ...bergmanlab.genetics.uga.edu/wp-content/uploads/2010/02/BergmanICG2008.pdfLong Terminal Repeat (LTR) (A)n. Why is the discovery

Higher organisms have ahigher proportion of ncDNA

Bacteria15 %

Yeast30 %

Human98 %

Fly75 %

Page 5: Computational analysis of transposable element evolution ...bergmanlab.genetics.uga.edu/wp-content/uploads/2010/02/BergmanICG2008.pdfLong Terminal Repeat (LTR) (A)n. Why is the discovery

TEs are significant & variable part of ncDNA

Fly

chr2R:

Gap

SINELINELTRDNA

5440000 5450000 5460000 5470000 5480000 5490000 5500000 5510000 5520000 5530000 5540000 5550000 5560000 5570000Gap Locations

FlyBase Protein-Coding Genes

Repeating Elements by RepeatMasker

Mef2Mef2Mef2Mef2Mef2

Mef2

CG15863CG12130

CG1418CG12133

AdamCG12134

CG12134

eve TER94TER94

Pka-R2Pka-R2Pka-R2

CG12128CG12128

CG1407CG1407

CG12129

CG18445

CG2249CG2249

CG12918CG2264CG2264CG2264CG2264

Def

CG2264

CG2292

Human

chrX:

SINELINELTRDNA

152500000 152550000

UCSC Gene Predictions Based on RefSeq, UniProt, GenBank, and Comparative Genomics

Repeating Elements by RepeatMasker

ATP2B3ATP2B3

ATP2B3FAM58AFAM58A

DUSP9DUSP9

PNCKPNCK

Page 6: Computational analysis of transposable element evolution ...bergmanlab.genetics.uga.edu/wp-content/uploads/2010/02/BergmanICG2008.pdfLong Terminal Repeat (LTR) (A)n. Why is the discovery

(A)n

DNA transposons (cut+paste)

RNA retrotransposons (copy+paste)

3 major types of transposable element (TE)

Terminal Inverted Repeat (TIR)

LINE-like (non-LTR)

Long Terminal Repeat (LTR)(A)n

Page 7: Computational analysis of transposable element evolution ...bergmanlab.genetics.uga.edu/wp-content/uploads/2010/02/BergmanICG2008.pdfLong Terminal Repeat (LTR) (A)n. Why is the discovery

Why is the discovery and detection of TE sequences in genomes important?

• Genome alignment

• Genome evolution

• Population genomics

• Genome organization

• Genome assembly

Page 8: Computational analysis of transposable element evolution ...bergmanlab.genetics.uga.edu/wp-content/uploads/2010/02/BergmanICG2008.pdfLong Terminal Repeat (LTR) (A)n. Why is the discovery

Discovery of new TE families

• Homology to TE proteins (e.g. transposase) => HMMer, tBLASTx

gagPBS

3’TSD 5’LTR 3’LTR 3’TSD

PPTpol env

Dmin ! (b3 – b5) ! Dmax

Lmin ! (e5 – b5), (e3 – b3) ! Lmax

b5 e5 b3 e5

• structural motifs (e.g. LTRs)=> LTRstruc, LTRharvest

• comparative genomics=> compTE

• dispersed repeats (all-by-all, k-mers)=> RECON, PILER, RepeatScout, ReAS

Reviewed in: Bergman & Quesneville (2007) Brief Bioinf. 8:382-92

Page 9: Computational analysis of transposable element evolution ...bergmanlab.genetics.uga.edu/wp-content/uploads/2010/02/BergmanICG2008.pdfLong Terminal Repeat (LTR) (A)n. Why is the discovery

hmmall-by-allRECON

BLASTER

RepeatMasker

TBLASTX

RMBLR

Release 3

Release 4

Quesneville, Bergman et al. (2005) PLoS Comp. Biol. 1:e22

Detection of individual TE copies

Page 10: Computational analysis of transposable element evolution ...bergmanlab.genetics.uga.edu/wp-content/uploads/2010/02/BergmanICG2008.pdfLong Terminal Repeat (LTR) (A)n. Why is the discovery

Genomic TE distribution in D. melanogaster

~3% ~20%

genome-wideaverage ~5.5%

10

20

30

40

50

5 10 15 20

X# TEs per 50kb

~ centromere~ high-low rec.

Bergman, Quesneville et al. (2006) Genome Biology 7:R112

Page 11: Computational analysis of transposable element evolution ...bergmanlab.genetics.uga.edu/wp-content/uploads/2010/02/BergmanICG2008.pdfLong Terminal Repeat (LTR) (A)n. Why is the discovery

Duplication generates regions of high TE densityHDR 1

HDR 1

D. melanogaster

D. melanogaster

D.

yaku

ba

D. m

ela

no

gaste

r

10

20

30

40

50

5 10 15 20

X# TEs per 50kb

~ centromere~ high-low rec.

Bergman, Quesneville et al. (2006) Genome Biology 7:R112

Page 12: Computational analysis of transposable element evolution ...bergmanlab.genetics.uga.edu/wp-content/uploads/2010/02/BergmanICG2008.pdfLong Terminal Repeat (LTR) (A)n. Why is the discovery

Chimeric gene-TE transcripts create nested genes

genes

TEEST

Ptp61F

312ru

CG32320CG9168

FucTD CG9173

FBti0020016

LP01280

1340000 1360000 1380000 1400000

Chrom. Arm 3L

Lipatov, Lenkov, Petrov & Bergman (2005) BMC Biology 3:24

Page 13: Computational analysis of transposable element evolution ...bergmanlab.genetics.uga.edu/wp-content/uploads/2010/02/BergmanICG2008.pdfLong Terminal Repeat (LTR) (A)n. Why is the discovery

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

D.mel D.sim D.sec D.yak D.ere D.ana D.pse D.per D.wil D.vir D.moj D.gri

Proport

ion T

E/R

epea

t in

Sca

ffold

s >

200 K

b

BLASTER-tx+Repbase-NoDros

BLASTER-tx+BDGP

BLASTER-tx+PILER

RepeatMasker+ReAS

RepeatRunner+PILER

CompTE

TE abundance in 12 Drosophila genomes

Clark, Eisen, Smith, Bergman, et al (2007) Nature 450:203-18

5.3%

2.7% 3.7%

12.0

%

24.9

%

6.9%

15.6

%

2.8%2.7%

8.5%

13.9

%

8.9%

Page 14: Computational analysis of transposable element evolution ...bergmanlab.genetics.uga.edu/wp-content/uploads/2010/02/BergmanICG2008.pdfLong Terminal Repeat (LTR) (A)n. Why is the discovery

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

D.mel D.sim D.sec D.yak D.ere D.ana D.pse D.per D.wil D.vir D.moj D.gri

Proport

ion T

E C

lass

in S

caffold

s >

200 K

b

LTR

LINE

TIR

OTHER

Abundance of major TE types is conserved across genus Drosophila

non-LTR

LTR

TIR

Clark, Eisen, Smith, Bergman, et al (2007) Nature 450:203-18

Page 15: Computational analysis of transposable element evolution ...bergmanlab.genetics.uga.edu/wp-content/uploads/2010/02/BergmanICG2008.pdfLong Terminal Repeat (LTR) (A)n. Why is the discovery

Is the genomic distribution of TEs in D. melanogaster affected by historical activity?

Page 16: Computational analysis of transposable element evolution ...bergmanlab.genetics.uga.edu/wp-content/uploads/2010/02/BergmanICG2008.pdfLong Terminal Repeat (LTR) (A)n. Why is the discovery

non-LTR retrotransposons exhibit a ‘pseudogene-like’ mode of sequence evolution

Petrov & Hartl (1998) Mol. Biol. Evol. 15:293-302

Alignment of paralogous TEs

Page 17: Computational analysis of transposable element evolution ...bergmanlab.genetics.uga.edu/wp-content/uploads/2010/02/BergmanICG2008.pdfLong Terminal Repeat (LTR) (A)n. Why is the discovery

non-LTR retrotransposons exhibit a ‘pseudogene-like’ mode of sequence evolution

Petrov & Hartl (1998) Mol. Biol. Evol. 15:293-302

Page 18: Computational analysis of transposable element evolution ...bergmanlab.genetics.uga.edu/wp-content/uploads/2010/02/BergmanICG2008.pdfLong Terminal Repeat (LTR) (A)n. Why is the discovery

D. mel - D. sim speciation

Bergman & Bensasson (2007) PNAS 104:11340-5

a_

inva

de

r2_

6

b_

mic

rop

ia_

4

c_

Ta

bo

r_3

d_

17

.6_

11

e_

Sta

lke

r_4

f_ro

ve

r_3

g_

fle

a_

16

h_

co

pia

_2

8

i_m

dg

3_

10

j_ro

o_

86

k_

Tra

nsp

ac_

4

l_o

pu

s_

16

m_

blo

od

_2

2

n_

41

2_

24

o_

Bu

rdo

ck_

13

p_

div

er_

9

q_

Tira

nt_

20

r_jo

cke

y2

_7

s_

He

len

a_

7

t_C

r1a

_3

6

u_

ba

gg

ins_

6

v_

G4

_1

0

w_

Do

c3

_7

x_

G5

_8

y_

BS

_1

5

z_

Ju

an

_9

zz_

Do

c_

53

0.00

0.02

0.04

0.06

0.08

0.10

0.12

Div

erg

en

ce

(su

b/s

ite

)

0

1.80

3.60

5.41

7.21

9.01

10.81

Ag

e (

Mya

)

Retrotransposon demographics in D. melanogaster

Page 19: Computational analysis of transposable element evolution ...bergmanlab.genetics.uga.edu/wp-content/uploads/2010/02/BergmanICG2008.pdfLong Terminal Repeat (LTR) (A)n. Why is the discovery

Summary of retrotransposon demographicsinferred from intra-genomic comparisons

• LTR elements systematically younger than non-LTR elements

• Low frequency of LTR insertions may not be due to selection

• non-LTR families inserted in waves since speciation

• most LTR families inserted since colonization of non-African habitats

• LTR insertions not at transposition-selection equilibrium

Bergman & Bensasson (2007) PNAS 104:11340-5

Page 20: Computational analysis of transposable element evolution ...bergmanlab.genetics.uga.edu/wp-content/uploads/2010/02/BergmanICG2008.pdfLong Terminal Repeat (LTR) (A)n. Why is the discovery

From evolutionary statics to dynamics: population genomics of TEs using 454 sequencing

Page 21: Computational analysis of transposable element evolution ...bergmanlab.genetics.uga.edu/wp-content/uploads/2010/02/BergmanICG2008.pdfLong Terminal Repeat (LTR) (A)n. Why is the discovery

Population genomics of TEs using 454 sequencing

Hybrid TE-unique reads“Unique Flank Tags”

Strain X

454 Reads

TEs

Page 22: Computational analysis of transposable element evolution ...bergmanlab.genetics.uga.edu/wp-content/uploads/2010/02/BergmanICG2008.pdfLong Terminal Repeat (LTR) (A)n. Why is the discovery

Population genomics of TEs using 454 sequencing

Hybrid TE-unique reads“Unique Flank Tags”

Strain X

454 Reads

TEs

KNOWN ✓ReferenceNEW !

Page 23: Computational analysis of transposable element evolution ...bergmanlab.genetics.uga.edu/wp-content/uploads/2010/02/BergmanICG2008.pdfLong Terminal Repeat (LTR) (A)n. Why is the discovery

Population genomics of TEs using 454 sequencing

TEs in reference sequence

Page 24: Computational analysis of transposable element evolution ...bergmanlab.genetics.uga.edu/wp-content/uploads/2010/02/BergmanICG2008.pdfLong Terminal Repeat (LTR) (A)n. Why is the discovery

Population genomics of TEs using 454 sequencing

TEs in reference sequence

Known INE-1 insertion present in NC and AF strains

Novel jockey insertion present in >1 strain

Page 25: Computational analysis of transposable element evolution ...bergmanlab.genetics.uga.edu/wp-content/uploads/2010/02/BergmanICG2008.pdfLong Terminal Repeat (LTR) (A)n. Why is the discovery

Preliminary findings using 454 sequencing

• 10 strains of D. melanogaster: 6 USA, 4 Malawi

• ~1/3 of INE-1 found in nature, so estimate ~3900 annotated TEs present in >=1 wild strain

• ~24% of annotated TEs found in >=1 wild strain (1300/5400)

• ~72% found in nature in low recomb. regions (950/1300)

• DNA transposons (~30%) found in nature more often than LTR/non-LTR retrotransposons (~10%)

• consistent with all TEs fixed in low recomb. regions plus few hundred segregating in high recomb. regions

Page 26: Computational analysis of transposable element evolution ...bergmanlab.genetics.uga.edu/wp-content/uploads/2010/02/BergmanICG2008.pdfLong Terminal Repeat (LTR) (A)n. Why is the discovery

Summary

• Mature methods exist for analysis of TEs in genomeshttp://www.bioinf.manchester.ac.uk/bergman/te-tools.html

• Structural classes of TEs have different genome dynamics

• Recent LTR insertion has implications for transposition-selection balance paradigm

• Population genomics using next generation sequencing will help resolve forces controlling TE evolution

Page 27: Computational analysis of transposable element evolution ...bergmanlab.genetics.uga.edu/wp-content/uploads/2010/02/BergmanICG2008.pdfLong Terminal Repeat (LTR) (A)n. Why is the discovery

Hadi QuesnevilleRuqiang Li

Douda Bensasson

2 Post-doctoral positions (available ~Sep 2008)

Andy Clark