Casey M. Bergman
Faculty of Life Sciences, University of Manchester
[email protected]
http://bioinf.manchester.ac.uk/bergman
Computational analysis of transposable element evolution in Drosophila genomes.
Overview of Talk
• Demographic history of TEs in D. melanogaster
• TE population genomics using 454 sequencing
• Discovery and detection of TEs in genomes
• Abundance and distribution of TEs in Drosophila genomes
• Noncoding DNA (ncDNA) & transposable elements (TEs)
Higher organisms have a higher proportion of ncDNA
Bacteria: 15%
Yeast: 30%
Fly: 75%
Human: 98%
TEs are significant & variable part of ncDNA
[Figure: genome browser views with RepeatMasker repeat tracks (SINE, LINE, LTR, DNA). Fly, chr2R:5440000-5570000, with FlyBase protein-coding genes including Mef2, eve, TER94, Pka-R2 and Def. Human, chrX:152500000-152550000, with UCSC gene predictions ATP2B3, FAM58A, DUSP9 and PNCK.]
3 major types of transposable element (TE)
DNA transposons (cut+paste)
• Terminal Inverted Repeat (TIR)
RNA retrotransposons (copy+paste)
• LINE-like (non-LTR), ending in a poly(A) tail, (A)n
• Long Terminal Repeat (LTR)
Why is the discovery and detection of TE sequences in genomes important?
• Genome alignment
• Genome evolution
• Population genomics
• Genome organization
• Genome assembly
Discovery of new TE families
• Homology to TE proteins (e.g. transposase) => HMMer, tBLASTx
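[Figure: structure of an LTR retrotransposon: 5’TSD, 5’LTR, PBS, gag, pol, env, PPT, 3’LTR, 3’TSD. The 5’LTR spans b5 to e5 and the 3’LTR spans b3 to e3.]
Dmin ≤ (b3 – b5) ≤ Dmax
Lmin ≤ (e5 – b5), (e3 – b3) ≤ Lmax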
• structural motifs (e.g. LTRs)=> LTRstruc, LTRharvest
• comparative genomics=> compTE
• dispersed repeats (all-by-all, k-mers)=> RECON, PILER, RepeatScout, ReAS
Reviewed in: Bergman & Quesneville (2007) Brief Bioinf. 8:382-92
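The structural constraints on the slide, read as Dmin ≤ (b3 – b5) ≤ Dmax and Lmin ≤ (e5 – b5), (e3 – b3) ≤ Lmax, amount to a filter on candidate LTR pairs. The following is a minimal Python sketch of that filter only. The names `LtrPair` and `passes_ltr_constraints` and the threshold values are hypothetical and are not taken from LTRstruc or LTRharvest.

```python
from typing import NamedTuple


class LtrPair(NamedTuple):
    """Candidate pair of LTRs; the coordinate layout is an assumption."""
    b5: int  # begin of 5' LTR
    e5: int  # end of 5' LTR
    b3: int  # begin of 3' LTR
    e3: int  # end of 3' LTR


def passes_ltr_constraints(p: LtrPair, d_min: int, d_max: int,
                           l_min: int, l_max: int) -> bool:
    """Return True if the pair satisfies the distance and length bounds."""
    distance_ok = d_min <= (p.b3 - p.b5) <= d_max
    len5_ok = l_min <= (p.e5 - p.b5) <= l_max
    len3_ok = l_min <= (p.e3 - p.b3) <= l_max
    return distance_ok and len5_ok and len3_ok


# Hypothetical thresholds and coordinates chosen only for this example.
candidate = LtrPair(b5=1000, e5=1300, b3=6000, e3=6310)
print(passes_ltr_constraints(candidate, 1000, 15000, 100, 1000))
```

Real tools also score LTR similarity and look for TSDs, PBS and PPT motifs; this sketch covers only the coordinate bounds.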
Detection of individual TE copies
[Figure: annotation pipeline with components labelled hmm, all-by-all, RECON, BLASTER, RepeatMasker, TBLASTX and RMBLR, shown for Release 3 and Release 4.]
Quesneville, Bergman et al. (2005) PLoS Comp. Biol. 1:e22
Genomic TE distribution in D. melanogaster
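[Figure: number of TEs per 50 kb along chromosome X (x-axis ticks 5 to 20, y-axis ticks 10 to 50), with the approximate positions of the centromere and of the high-to-low recombination transition marked. Regional TE densities of ~3% and ~20% are annotated; the genome-wide average is ~5.5%.]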
Bergman, Quesneville et al. (2006) Genome Biology 7:R112
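The "# TEs per 50kb" axis implies a windowed count of TE annotations along the chromosome. The sketch below shows one way to compute such counts in Python. The function name, the 0-based coordinates and the choice to assign each TE to the window containing its start are assumptions for illustration; the slides do not say how TEs spanning a window boundary were handled.

```python
from collections import Counter
from typing import Iterable, List, Tuple

WINDOW = 50_000  # 50 kb, as in the figure's axis label


def tes_per_window(te_intervals: Iterable[Tuple[int, int]],
                   chrom_length: int, window: int = WINDOW) -> List[int]:
    """Count TEs per fixed-size window, assigning each TE by its start.

    Coordinates are assumed 0-based; assigning by start position is a
    simplification made for this sketch.
    """
    n_windows = -(-chrom_length // window)  # ceiling division
    counts = Counter(start // window for start, _end in te_intervals)
    return [counts.get(i, 0) for i in range(n_windows)]


# Made-up annotations on a 200 kb toy chromosome.
toy_tes = [(1_000, 6_000), (48_000, 52_000), (120_000, 121_500), (199_000, 199_900)]
print(tes_per_window(toy_tes, chrom_length=200_000))
```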
Duplication generates regions of high TE density
[Figure: HDR 1 compared within D. melanogaster and between D. yakuba and D. melanogaster, shown alongside the plot of number of TEs per 50 kb along chromosome X with the approximate centromere and high-to-low recombination transition marked.]
Bergman, Quesneville et al. (2006) Genome Biology 7:R112
Chimeric gene-TE transcripts create nested genes
[Figure: chromosome arm 3L, 1340000 to 1400000, with tracks for genes (Ptp61F, 312, ru, CG32320, CG9168, FucTD, CG9173), TE (FBti0020016) and EST (LP01280).]
Lipatov, Lenkov, Petrov & Bergman (2005) BMC Biology 3:24
TE abundance in 12 Drosophila genomes
[Figure: proportion of TE/repeat sequence in scaffolds > 200 Kb (y-axis, 0 to 0.35) for D.mel, D.sim, D.sec, D.yak, D.ere, D.ana, D.pse, D.per, D.wil, D.vir, D.moj and D.gri, estimated with six methods: BLASTER-tx+Repbase-NoDros, BLASTER-tx+BDGP, BLASTER-tx+PILER, RepeatMasker+ReAS, RepeatRunner+PILER and CompTE.]
Clark, Eisen, Smith, Bergman, et al (2007) Nature 450:203-18
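The quantity plotted, the proportion of TE/repeat sequence in scaffolds > 200 Kb, can be sketched as a length-filtered ratio. The Python below is an illustration only: the input layout (scaffold name mapped to length and annotated TE bases), the function name and the strict ">" cutoff are assumptions, and the numbers are made up.

```python
from typing import Dict, Tuple

MIN_SCAFFOLD = 200_000  # "Scaffolds > 200 Kb" on the figure axis


def te_proportion(scaffolds: Dict[str, Tuple[int, int]],
                  min_length: int = MIN_SCAFFOLD) -> float:
    """Proportion of bases annotated as TE/repeat in long scaffolds.

    `scaffolds` maps a scaffold name to (length, te_bases); both the
    input layout and the strict '>' cutoff are assumptions of this sketch.
    """
    total = te = 0
    for length, te_bases in scaffolds.values():
        if length > min_length:
            total += length
            te += te_bases
    return te / total if total else 0.0


toy = {
    "scaffold_1": (1_000_000, 60_000),
    "scaffold_2": (400_000, 10_000),
    "scaffold_3": (150_000, 90_000),  # excluded: not longer than 200 kb
}
print(round(te_proportion(toy), 3))
```

Restricting to long scaffolds keeps short, repeat-rich scaffolds from dominating the estimate, which matters when assemblies differ in quality across species.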
Abundance of major TE types is conserved across genus Drosophila
[Figure: proportion of each TE class (LTR, LINE, TIR, OTHER) in scaffolds > 200 Kb (y-axis, 0 to 0.8) for D.mel, D.sim, D.sec, D.yak, D.ere, D.ana, D.pse, D.per, D.wil, D.vir, D.moj and D.gri, with TE types labelled non-LTR, LTR and TIR. Percentage labels on the slide: 5.3%, 2.7%, 3.7%, 12.0%, 24.9%, 6.9%, 15.6%, 2.8%, 2.7%, 8.5%, 13.9%, 8.9%.]
Clark, Eisen, Smith, Bergman, et al (2007) Nature 450:203-18
Is the genomic distribution of TEs in D. melanogaster affected by historical activity?
non-LTR retrotransposons exhibit a ‘pseudogene-like’ mode of sequence evolution
Petrov & Hartl (1998) Mol. Biol. Evol. 15:293-302
Alignment of paralogous TEs
Retrotransposon demographics in D. melanogaster
[Figure: divergence (sub/site, 0.00 to 0.12) and the corresponding age (0 to 10.81 Mya) for each retrotransposon family, with the D. mel - D. sim speciation marked. Family labels as shown on the axis: invader2_6, micropia_4, Tabor_3, 17.6_11, Stalker_4, rover_3, flea_16, copia_28, mdg3_10, roo_86, Transpac_4, opus_16, blood_22, 412_24, Burdock_13, diver_9, Tirant_20, jockey2_7, Helena_7, Cr1a_36, baggins_6, G4_10, Doc3_7, G5_8, BS_15, Juan_9, Doc_53.]
Bergman & Bensasson (2007) PNAS 104:11340-5
Summary of retrotransposon demographics inferred from intra-genomic comparisons
• LTR elements systematically younger than non-LTR elements
• Low frequency of LTR insertions may not be due to selection
• non-LTR families inserted in waves since speciation
• most LTR families inserted since colonization of non-African habitats
• LTR insertions not at transposition-selection equilibrium
Bergman & Bensasson (2007) PNAS 104:11340-5
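The retrotransposon demographics figure pairs a divergence axis (0.02 to 0.12 substitutions/site) with an age axis (1.80 to 10.81 Mya). These ticks are consistent with a linear conversion, age = divergence / rate, at a rate of about 0.0111 substitutions per site per Myr. That rate is inferred here from the tick values and is not stated on the slides. A small Python sketch under that assumption:

```python
RATE_PER_SITE_PER_MYR = 0.0111  # inferred from the axis ticks; an assumption


def age_mya(divergence: float, rate: float = RATE_PER_SITE_PER_MYR) -> float:
    """Convert divergence (substitutions/site) to age in Myr: age = d / rate."""
    return divergence / rate


# Reproduce the tick pairs from the figure's two axes.
for d in (0.02, 0.04, 0.06, 0.08, 0.10, 0.12):
    print(f"{d:.2f} sub/site -> {age_mya(d):.2f} Mya")
```

Under this conversion, families whose divergence falls below the value marked for the D. mel - D. sim speciation are read as having inserted after the split.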
From evolutionary statics to dynamics: population genomics of TEs using 454 sequencing
Population genomics of TEs using 454 sequencing
Hybrid TE-unique reads: “Unique Flank Tags”
[Figure: 454 reads from Strain X that span a TE and its unique flanking sequence are compared with the reference; insertions matching a TE in the reference are marked KNOWN, others NEW.]
TEs in reference sequence
[Figure: examples showing a known INE-1 insertion present in NC and AF strains and a novel jockey insertion present in >1 strain.]
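The "Unique Flank Tag" idea, a read that is part TE and part unique sequence, lets each insertion be placed on the reference and labelled KNOWN or NEW. The Python below is a minimal sketch of the labelling step only. It assumes the junction coordinate has already been obtained by mapping the unique flank; the function name, the data layout and the `tolerance` parameter are hypothetical and not taken from the talk.

```python
from typing import Dict, List, Tuple

# Reference TE annotations: chromosome -> list of (start, end); toy values.
RefTEs = Dict[str, List[Tuple[int, int]]]


def classify_junction(chrom: str, junction: int, ref_tes: RefTEs,
                      tolerance: int = 20) -> str:
    """Label a TE/flank junction as 'KNOWN' or 'NEW'.

    `junction` is the reference coordinate where the unique flank of a
    hybrid read meets TE sequence. It is 'KNOWN' if it lies within
    `tolerance` bp of either boundary of an annotated reference TE.
    The tolerance is a hypothetical parameter for this sketch.
    """
    for start, end in ref_tes.get(chrom, []):
        if abs(junction - start) <= tolerance or abs(junction - end) <= tolerance:
            return "KNOWN"
    return "NEW"


reference = {"2R": [(5_000, 5_400)], "X": [(120_000, 125_000)]}
print(classify_junction("2R", 5_395, reference))
print(classify_junction("2R", 80_000, reference))
```

Pooling the labels across strains then gives, for each insertion site, the number of strains in which it was observed.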
Preliminary findings using 454 sequencing
• 10 strains of D. melanogaster: 6 USA, 4 Malawi
• ~1/3 of INE-1 found in nature, so estimate ~3900 annotated TEs present in >=1 wild strain
• ~24% of annotated TEs found in >=1 wild strain (1300/5400)
• ~72% found in nature in low recomb. regions (950/1300)
• DNA transposons (~30%) found in nature more often than LTR/non-LTR retrotransposons (~10%)
• consistent with all TEs fixed in low recomb. regions plus few hundred segregating in high recomb. regions
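The percentages in these bullets can be recomputed from the counts given in parentheses. The Python below does that arithmetic and also shows one reading of the ~3900 estimate: if only ~1/3 of INE-1 copies are recovered, and that fraction is taken as the detection rate for all TEs, then 1300 observed insertions scale to about 3900. That reading is an assumption, not something the slide states. The slide's counts are rounded, so the recomputed values only approximately match the quoted percentages.

```python
annotated_tes = 5400      # annotated TEs in the reference (rounded on the slide)
found_in_wild = 1300      # found in >=1 wild strain
found_low_rec = 950       # of those, in low recombination regions
ine1_detection = 1 / 3    # fraction of INE-1 copies recovered

frac_found = found_in_wild / annotated_tes
frac_low_rec = found_low_rec / found_in_wild
# Assumption: INE-1 recovery approximates the detection rate for all TEs.
estimated_present = found_in_wild / ine1_detection

print(f"found in >=1 strain: {frac_found:.0%}")
print(f"of those, low recombination: {frac_low_rec:.0%}")
print(f"estimated present: {estimated_present:.0f}")
```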
Summary
• Mature methods exist for analysis of TEs in genomes: http://www.bioinf.manchester.ac.uk/bergman/te-tools.html
• Structural classes of TEs have different genome dynamics
• Recent LTR insertion has implications for transposition-selection balance paradigm
• Population genomics using next generation sequencing will help resolve forces controlling TE evolution
Hadi Quesneville
Ruqiang Li
Douda Bensasson
Andy Clark
2 Post-doctoral positions (available ~Sep 2008)