Intron number evolution and alternative splicing functioning as a bridge in evolution
Kemin Zhou, Ph.D.
April 22, 2011
Splicing pre-mRNA
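[Diagram: pre-mRNA intron showing the 5’ splice site (5’SS), the branch point (A), the polypyrimidine tract, and the 3’ splice site (3’SS); splicing removes the intron to give mRNA]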
Input Data: 16 Fungal Genomes
Database  Species                          Phylum
Aspni1    Aspergillus niger                Ascomycota
Mycfi1    Mycosphaerella fijiensis         Ascomycota
Mycgr1    Mycosphaerella graminicola       Ascomycota
Necha2    Nectria haematococca             Ascomycota
Picst3    Pichia stipitis                  Ascomycota
Trire2    Trichoderma reesei               Ascomycota
Trive1    Trichoderma virens               Ascomycota
copci1    Coprinus cinereus                Basidiomycota
cryneo1   Cryptococcus neoformans          Basidiomycota
Lacbi1    Laccaria bicolor                 Basidiomycota
Phchr1    Phanerochaete chrysosporium      Basidiomycota
Pospl1    Postia placenta                  Basidiomycota
Sporo1    Sporobolomyces roseus            Basidiomycota
ustma1    Ustilago maydis                  Basidiomycota
Phybl1    Phycomyces blakesleeanus         Zygomycota
Batde5    Batrachochytrium dendrobatidis   Chytridiomycota
[Figure: phylogenetic tree of the 16 genomes annotated with numeric values on branches, nodes, and tips. Clade labels: Ascomycota (Pezizomycotina: Eurotiomycetes, Sordariomycetes, Dothideomycetes; Saccharomycotina), Basidiomycota (Ustilaginomycotina; Pucciniomycotina; Agaricomycotina: Tremellomycetes, Agaricomycetes), Zygomycota, Chytridiomycota]
Conservative Estimated Number of Exons
[Figure: phylum-level tree (Ascomycota, Basidiomycota, Zygomycota, Chytridiomycota) annotated with estimated exon numbers (7.25, 6.18, 5.90, 3.77) and branch changes (-3.48, -1.07, 0)]
Reverse Transcriptase
RNA-dependent DNA polymerase: an enzyme that transcribes single-stranded RNA into single-stranded complementary DNA (cDNA). It also helps form double-stranded DNA once the RNA has been reverse transcribed into cDNA.
[Diagram: RNA to DNA]
Reverse Transcriptase Enzymology
Template      Enzyme   Rate (per nt)
1094-nt RNA   R2 RT    4.0 x 10^-3
1094-nt RNA   AMV RT   9.3 x 10^-3
590-nt RNA    R2 RT    2.6 x 10^-3
590-nt RNA    AMV RT   5.4 x 10^-3
poly(rA)      R2 RT    0.4 x 10^-3
poly(rA)      AMV RT   1.5 x 10^-3
[Figure: exponential distribution, density vs. nucleotides (0 to 2000), for lambda = 0.0026, 0.001, and 0.0005; fall-off rate = lambda]
Arkadiusz Bibillo and Thomas H. Eickbush, J. Biol. Chem. 2002
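The exponential fall-off model on this slide can be sketched in a few lines. This is an illustrative sketch, not the author's code: it assumes the reverse transcriptase dissociates at a constant per-nucleotide rate lambda, so the copied length x has density lambda * exp(-lambda * x). The function names are hypothetical; the lambda values are the three shown on the plot.

```python
import math

def rt_density(x, lam):
    """Exponential density of copied length x (nt) for fall-off rate lam (per nt)."""
    return lam * math.exp(-lam * x)

def prob_reach(x, lam):
    """Probability that the RT copies at least x nucleotides before falling off."""
    return math.exp(-lam * x)

def mean_length(lam):
    """Mean copied length of an exponential distribution, 1 / lam."""
    return 1.0 / lam

if __name__ == "__main__":
    # The three lambda values shown on the slide's density plot.
    for lam in (0.0026, 0.001, 0.0005):
        print(f"lambda={lam}: mean={mean_length(lam):.0f} nt, "
              f"P(reach 1000 nt)={prob_reach(1000, lam):.3f}")
```

Under this assumption a smaller lambda means longer cDNA copies, which is why a low fall-off rate lets intron loss by recombination extend further from the 3’-end.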
Intron Loss by Homologous Recombination
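[Diagram: genomic DNA is transcribed by RNA polymerase II into mRNA; reverse transcriptase copies the mRNA into a partial cDNA; homologous recombination between the partial cDNA and the genomic DNA removes introns]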
RT Footprints (RTFP)
[Figure: histograms of RTFP length, one panel per genome (Aspni1, Mycfi1, Mycgr1, Necha2, Picst3, Trire2, Trive1, copci1, cryneo1, Lacbi1, Phchr1, Pospl1, Sporo1, ustma1, Batde5, Phybl1). x-axis: Length (Nucleotides), 0 to 6000; y-axis: Counts]
Intron Relative Location
[Figure: histograms of intron relative location, one panel per genome (Aspni1, Mycfi1, Mycgr1, Necha2, Picst3, Trive1, Trire2, copci1, cryneo1, Lacbi1, Phchr1, Pospl1, Sporo1, ustma1, Batde5, Phybl1). x-axis: Percent Relative Location from 5’-End, 0 to 100; y-axis: Count]
Number of Exons of Ancestor
[Figure: scatter plot, one point per genome. x-axis: Mean Relative Intron Location, 0.40 to 0.50; y-axis: Mean Number of Exons, 1 to 8; annotated value: 7.66]
Exon Length As a Function of Intron Number
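[Figure: average exon length (0 to 1000) and total exon length (2000 to 8000) plotted against number of introns x (0 to 70). Annotations: a fitted curve for average exon length L with an exponential term in x (coefficients garbled in the source: "5.1988961.11.1060 7812.0"), and total exon length G = L(x + 1)]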
Fungal Exon Length Distribution
[Figure: histogram of exon length. x-axis: Exon Length, 0 to 500, with marks at 80 and 160; y-axis: Count, 0 to 3000]
Sandro Jose de Souza, Manyuan Long, Lloyd Schoenbach, Scott William Roy, and Walter Gilbert. Intron positions correlate with module boundaries in ancient proteins (intron evolution / introns-early). Proc. Natl. Acad. Sci. USA, Vol. 93, pp. 14632–14636, December 1996. Department of Molecular and Cellular Biology, Biological Laboratories, Harvard University, 16 Divinity Avenue, Cambridge, MA 02138
Alexei Fedorov, Scott Roy, Xiaohong Cao, and Walter Gilbert. Phylogenetically Older Introns Strongly Correlate With Module Boundaries in Ancient Proteins. 2003. Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts 02138, USA
Average Module Size: 25 aa (1996)
Protein Length 400 aa
dbname    meanl   stdl    Mean(log)  Std(log)  Median len
Aspni1    499.2   379.8   6.010      0.631     419
Batde5    465.7   405.0   5.882      0.719     359
copci1    476.6   355.6   5.933      0.691     385
cryneo1   547.6   368.6   6.104      0.654     467
Lacbi1    388.6   318.1   5.679      0.769     308
Mycfi1    446.9   345.3   5.850      0.745     377
Mycgr1    457.3   343.0   5.892      0.701     379
Necha2    487.5   336.3   6.014      0.591     421
Phchr1    460.9   328.0   5.929      0.639     378
Phybl1    408.8   331.1   5.777      0.679     326
Picst3    504.1   351.1   6.020      0.647     429
Pospl1    440.3   310.4   5.883      0.638     357
Sporo1    575.5   479.8   6.070      0.763     438
Trire2    503.8   455.5   6.005      0.657     418
Trive1    489.3   461.3   5.976      0.647     402
ustma1    643.0   456.1   6.243      0.682     534
mean      487.2   376.6   6.0        0.7       399.8
median    482.0   353.3   6.0        0.7       393.5
F-test on log(protein length): p-value = 2.2e-16
Exp(6) = 403.4
Ancestral number of exons
75 nt/exon
1200 nt/gene= 16 exons/gene
Daryi Wang, Mufen Hsieh, and Wen-Hsiung Li. A General Tendency for Conservation of Protein Length Across Eukaryotic Kingdoms. 2005. Computational and Evolutionary Genomics, Center for Genomics Research, Academia Sinica, Taipei, Taiwan; and Department of Ecology and Evolution, University of Chicago
Protein length: 400 aa
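The arithmetic behind the 16 exons/gene estimate can be written out as a short sketch. It assumes, as the slides imply, one 25-aa module per ancestral exon and a typical protein length of 400 aa, and it counts coding nucleotides only. The function name and parameters are illustrative.

```python
import math

NT_PER_CODON = 3

def ancestral_exons(module_aa=25, protein_aa=400):
    """Return (nt per exon, nt per gene, exons per gene) under the
    one-module-per-exon assumption."""
    exon_nt = module_aa * NT_PER_CODON
    gene_nt = protein_aa * NT_PER_CODON
    return exon_nt, gene_nt, gene_nt / exon_nt

if __name__ == "__main__":
    exon_nt, gene_nt, n_exons = ancestral_exons()
    print(exon_nt, gene_nt, n_exons)   # 75 1200 16.0
    # Back-transforming the mean log protein length of 6.0 gives about 400 aa.
    print(round(math.exp(6.0), 1))     # 403.4
```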
Intron number by RT effect
[Figure: scatter plot, one point per genome, with two fitted lines. x-axis: log(Total RTFP Length), 9 to 14; y-axis: Average NEPG, 2 to 7; intercepts 9.69 and 4.04 marked]
Fit 1: Intercept: 9.69 ± 1.99, Slope: -0.30 ± 0.16
Fit 2: Intercept: 4.04 ± 0.35, Slope: -0.11 ± 0.03
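As a rough sketch, the two fitted lines can be evaluated across the plotted range. The assumption here is that each intercept is read as the predicted exon number at log(total RTFP length) = 0, which is how the 9.69 figure would arise as an ancestral estimate. The slide text does not say which genomes belong to which fit, so the labels below are placeholders.

```python
def fitted(intercept, slope, x):
    """Value of a fitted straight line at x = log(total RTFP length)."""
    return intercept + slope * x

# (intercept, slope) pairs as printed on the slide; the names are placeholders.
FITS = {"fit 1": (9.69, -0.30), "fit 2": (4.04, -0.11)}

if __name__ == "__main__":
    for name, (a, b) in FITS.items():
        values = [round(fitted(a, b, x), 2) for x in (0, 9, 14)]
        print(name, values)
```

Within the observed range (x from 9 to 14) both lines stay inside the plotted 2 to 7 exon band; the intercepts are extrapolations well outside the data.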
Conserved Genes Have More Introns
[Figure: scatter plot, one point per genome. x-axis: Exon Number Conserved in All Species, 2 to 8; y-axis: Exon Number Species-specific, 2 to 7]
Fit: y = 0.503 x + 1.172 (Sporo1 excluded), p-value = 8.196e-07
Number of Exons vs. Genome Size
[Figure: scatter plot, one point per genome. x-axis: log(genome size), 16.5 to 18.0; y-axis: number of exons, 2 to 8; legend: all, between, Phylum, Species]
p-value: 0.0006588
Introns are getting longer for less conserved genes except for genomes with very few introns.
[Figure: bar chart, one bar per genome (Aspni1, Batde5, copci1, cryneo1, Lacbi1, Mycfi1, Mycgr1, Necha2, Phchr1, Phybl1, Picst3, Pospl1, Sporo1, Trire2, Trive1, ustma1). y-axis: Intron Length Difference (SSG - GCAS), -20 to 70]
Chi-square Test Younger – Older Genes
[Figure: one panel per genome. x-axis: Relative Location 1/10 (floor(reloc*10)); y-axis: Difference of frequency, -0.04 to 0.04. Chi-square p-value per panel:]
Genome    p-value
Aspni1    1.204e-07
Mycfi1    1.482e-18
Mycgr1    2.228e-10
Necha2    1.390e-25
Picst3    1.303e-03
Trive1    1.397e-35
Trire2    5.121e-18
copci1    4.021e-08
cryneo1   5.447e-01
Lacbi1    1.770e-35
Phchr1    1.330e-01
Pospl1    4.135e-02
Sporo1    8.968e-02
ustma1    2.997e-03
Batde5    8.028e-11
Phybl1    1.047e-22
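The binning named on this slide, floor(reloc*10), and the younger-minus-older frequency difference can be sketched as follows. This is an illustration under assumptions: reloc is taken to be the intron position divided by coding length (0 at the 5’-end, 1 at the 3’-end), and a value of exactly 1.0 is clamped into the last bin. The function names are hypothetical.

```python
import math

def relative_location(intron_pos, cds_len):
    """Fraction of the coding sequence lying 5' of the intron."""
    if cds_len <= 0 or not 0 <= intron_pos <= cds_len:
        raise ValueError("intron_pos must lie within a positive-length CDS")
    return intron_pos / cds_len

def decile_bin(reloc):
    """floor(reloc * 10); reloc == 1.0 is clamped into bin 9 (an assumption)."""
    return min(math.floor(reloc * 10), 9)

def bin_frequencies(relocs):
    """Fraction of introns falling in each of the ten location bins."""
    counts = [0] * 10
    for r in relocs:
        counts[decile_bin(r)] += 1
    total = sum(counts)
    return [c / total for c in counts]

def frequency_difference(younger, older):
    """Per-bin frequency of younger-gene introns minus older-gene introns."""
    return [y - o for y, o in zip(bin_frequencies(younger), bin_frequencies(older))]
```

A positive value in the high bins would correspond to younger genes carrying relatively more introns near the 3’-end.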
Timing of Intron Loss
• Dramatic intron loss happened during the early evolution of the ancestor of Ascomycota.
• Basidiomycota:
  – Most genomes had little intron gain or loss since divergence from the common ancestor
  – Lacbi1 younger genes have more introns located toward both ends, an indication of modern exon shuffling
• Two yeasts, Picst3 and ustma1: younger genes have more introns near the 3’-end relative to older genes
Number of exons in ancestor
• Previous results: about 5.8 exons/gene
• This study: 7.25, 7.66, 9.69, and 16
• The first three methods underestimate
• 16 is the least biased estimate
Gene Birth Big Bang
• Previous evolution generated short modules of about 25 aa on average
• On a very short time scale, genes were formed by a large-scale exon-shuffling process
• This ancient gene pool had about 16 exons per gene on average
• Subsequent evolution is dominated by intron loss
Ancient nature and bridging function of alternative splicing
Evidence-based Alternative Splicing
[Diagram: Genome + EST, processed by COMBEST, gives Gene Model with AS]
Characteristics of Input
Genomes            Chlre4     Agabi2      Aspca3      Spoth1
EST
  Source           Sanger     454         454         Solexa+454
  Count            309,185    1,140,141   2,466,463   42,173,117
  Average len.     927.3      221.6       401.8       40.8
  min/max len.     15/5159    50/1479     47/961      26/1014
  Fraction Mapped  0.66       0.87        0.92        0.95
Genome
  Size (Mb)        112        30          36          39
  Gap fraction     0.075      0.007       0.056       0.003
  Num. models      16,696     10,443      11,624      8,808
  Exons/model      7.37       5.99        3.47        3.02
  Coding fraction  0.62       0.82        0.91        0.91
  GC Content       0.64       0.46        0.52        0.52
EST Coverage       2.87x      8.94x       29.52x      46.30x
Intron length and Splice Sites
Database    Chlre4   Agabi2   Aspca3   Spoth1
Canonical   98.87%   99.04%   97.54%   99.11%
[Figure: distribution for Chlre4, Agabi2, Aspca3, and Spoth1; x-axis 32 to 1024 (log scale), y-axis 0 to 0.7]
[Figure: intron length distribution for Chlre4; x-axis: Intron length, 64 to 1024 (log scale); y-axis: Frequency, 0 to 0.05]
Quantity of AS
[Fig 1: Distribution of Number of Alternatively Spliced Forms. x-axis: Number of Models per Gene; y-axis: Count]
Higher Coverage, Longer Assembly, More Alternative Splicing and Antisense
Genome Chlre4 Agabi2 Aspca3 Spoth1
Coverage* 2.87 8.94 29.52 46.30
Num EST per Assembly 22.1 134.0 252.2 5567.4
mRNA length 1054.1 1085.3 1739.0 1829.0
Alt. of all/multiexon 0.08/0.15 0.28/0.35 0.33/0.52 0.33/0.63
Antisense fraction 0.0675 0.1433 0.2375 0.2880
*Coverage is the normalized mapped total EST length over Genomic length
Aspergillus aculeatus: 70% of multiexon genes show AS (338,255,050 ESTs)
Alternative Splicing Correlates with Number of Exons, Expression Level, and Length of Longest Intron
Genome Chlre4 Agabi2 Aspca3 Spoth1
Factor coefficient p-value coefficient p-value coefficient p-value coefficient p-value
intercept 1.012e+00 <2e-16 1.043e+00 <2e-16 5.364e-01 <2e-16 5.599e-01 <2e-16
numexon 2.508e-02 <2e-16 9.289e-02 <2e-16 3.686e-01 <2e-16 3.629e-01 <2e-16
profmaxh 1.267e-03 <2e-16 1.038e-03 <2e-16 1.113e-03 <2e-16 2.180e-04 <2e-16
maxintronlen 7.174e-05 1.33e-4 1.154e-03 7.12e-09 1.553e-03 <2e-16 6.165e-04 3.12e-07
mRNAlen -8.125e-05 2.05e-05 9.177e-05 3.36e-16
overall <2.2e-16 <2.2e-16 <2.2e-16 <2.2e-16
Linear regression analysis of number of AS against number of exons, profmaxh, max intron length, and mRNA length
Restriction on Contributing Factors
• Intrinsic Property
  – Contribution from each intron is smaller for intron-rich genomes
  – Long introns are more predictive of AS in genomes with short average intron length
• External Measure
  – Exceedingly high EST coverage lowers the contribution from expression level (saturation effect)
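The first point can be checked against numbers given elsewhere in this deck. The sketch below pairs each genome's exons/model (from the input table) with its numexon regression coefficient (from the regression table) and computes a Pearson correlation by hand. With only four genomes this is a descriptive check, not a statistical test; the helper name is illustrative.

```python
import math

# Values copied from the input-characteristics and regression tables.
EXONS_PER_MODEL = {"Chlre4": 7.37, "Agabi2": 5.99, "Aspca3": 3.47, "Spoth1": 3.02}
NUMEXON_COEF = {"Chlre4": 2.508e-02, "Agabi2": 9.289e-02,
                "Aspca3": 3.686e-01, "Spoth1": 3.629e-01}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

if __name__ == "__main__":
    genomes = sorted(EXONS_PER_MODEL, key=EXONS_PER_MODEL.get, reverse=True)
    for g in genomes:
        print(f"{g}: exons/model={EXONS_PER_MODEL[g]}, numexon coef={NUMEXON_COEF[g]}")
    r = pearson([EXONS_PER_MODEL[g] for g in genomes],
                [NUMEXON_COEF[g] for g in genomes])
    print(f"Pearson r = {r:.2f}")
```

The strongly negative correlation is consistent with each additional exon contributing less to AS in intron-rich genomes.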
Ancient Measure by Conservation Pattern
[Diagram: a fungal protein compared against Archaea, Bacteria, and Eukaryota without Fungi; labels P and U]
Ancient proteins should be conserved in all three kingdoms
Top 10 Isoforms from Aspca3
NISO Peplen Blast Hit Definition
20 337 glyceraldehyde-3-phosphate dehydrogenase gpdA
18 331 malate dehydrogenase, NAD-dependent
16 193 zinc knuckle domain protein
16 226 60S ribosomal protein L13
15 522 extracellular alpha-amylase
15 137 60S ribosomal protein L35a
15 395 conserved hypothetical protein
15 25 NOHIT, most interesting
14 107 60S ribosomal protein L30
14 179 nucleosome-binding protein (Nhp6a)
Manual examination of the top 100 suggests the same pattern.
Genes with AS Tend to be more Ancient
[Figure: % of hits in all three kingdoms, AS proteins vs. non-AS proteins]
BLAST search against nr minus Fungi
Genome AS NoAS P-value
Agabi2 9.81% 5.62% 1.60E-14
Aspca3 10.17% 7.15% 1.86E-10
Spoth1 10.23% 4.51% <2.2e-16
Ancient => Conservation
Genes with AS Are More Likely to Be Conserved within Ascomycota
Protein sequence best hits between Aspca3 and Spoth1
Χ-squared = 4280.4, df = 3, p-value < 2.2e-16
E: expected fraction, O: observed fraction, C: count

                       Aspca3 AS (0.3344)             Aspca3 NoAS (0.6656)
Spoth1 AS (0.3208)     E: 0.110  O: 0.3561  C: 2296   E: 0.219  O: 0.2250  C: 1451
Spoth1 NoAS (0.6714)   E: 0.225  O: 0.1906  C: 1229   E: 0.447  O: 0.2283  C: 1472
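The reported statistic appears to be a chi-square goodness-of-fit test of the four best-hit pair counts against the fractions expected if AS status were independent in the two genomes. The sketch below reproduces that calculation in plain Python. Two assumptions are made: the expected fractions are products of the per-genome AS and NoAS fractions, and each AS fraction is taken as 1 minus the printed NoAS fraction so that the expected fractions sum to one. Function names are illustrative.

```python
def expected_fractions(aspca3_as, other_noas):
    """Expected fractions under independence, ordered
    (other AS, Aspca3 AS), (other AS, Aspca3 NoAS),
    (other NoAS, Aspca3 AS), (other NoAS, Aspca3 NoAS)."""
    other_as = 1.0 - other_noas          # assumption: AS + NoAS = 1
    aspca3_noas = 1.0 - aspca3_as
    return [aspca3_as * other_as, aspca3_noas * other_as,
            aspca3_as * other_noas, aspca3_noas * other_noas]

def chi_square_gof(counts, fractions):
    """Chi-square goodness-of-fit statistic of counts against expected fractions."""
    total = sum(counts)
    norm = sum(fractions)
    stat = 0.0
    for c, f in zip(counts, fractions):
        expected = total * f / norm
        stat += (c - expected) ** 2 / expected
    return stat

if __name__ == "__main__":
    # Counts (C) as printed on the slides, in the order documented above.
    spoth1 = chi_square_gof([2296, 1451, 1229, 1472],
                            expected_fractions(0.3344, 0.6714))
    agabi2 = chi_square_gof([1283, 765, 1444, 1089],
                            expected_fractions(0.3344, 0.7190))
    print(f"Aspca3 vs Spoth1: {spoth1:.1f}")
    print(f"Aspca3 vs Agabi2: {agabi2:.1f}")
```

With four cells and fully specified expected fractions the test has 3 degrees of freedom, matching the df = 3 on the slides.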
Genes with AS Are More Conserved between Ascomycota and Basidiomycota
Χ-squared = 2360.138, df = 3, p-value < 2.2e-16
E: expected fraction, O: observed fraction, C: count

                       Aspca3 AS (0.3344)               Aspca3 NoAS (0.6656)
Agabi2 AS (0.2810)     E: 0.0940  O: 0.2801  C: 1283    E: 0.1870  O: 0.1670  C: 765
Agabi2 NoAS (0.7190)   E: 0.2404  O: 0.3152  C: 1444    E: 0.4786  O: 0.2377  C: 1089
AS Profile Not Conserved
Exon Alignment Quality (count | fraction of row total)
AP      0             1             2              3              4              total
Aspca3Spoth1
nn      42 | 0.029    246 | 0.167   226 | 0.154    222 | 0.151    736 | 0.500    1472
ny      68 | 0.047    122 | 0.084   247 | 0.170    419 | 0.289    595 | 0.410    1451
yn      46 | 0.037    144 | 0.117   264 | 0.215    338 | 0.275    437 | 0.356    1229
yy      179 | 0.078   211 | 0.092   518 | 0.226    930 | 0.405    458 | 0.199    2296
total   335 | 0.052   723 | 0.112   1255 | 0.195   1909 | 0.296   2226 | 0.345   6448
Agabi2Aspca3
nn      2 | 0.002     14 | 0.013    66 | 0.061     298 | 0.274    709 | 0.651    1089
ny      5 | 0.003     9 | 0.006     64 | 0.044     667 | 0.462    699 | 0.484    1444
yn      0 | 0.000     4 | 0.005     37 | 0.048     260 | 0.340    464 | 0.607    765
yy      1 | 0.001     12 | 0.009    102 | 0.080    710 | 0.553    458 | 0.357    1283
total   8 | 0.002     39 | 0.009    269 | 0.059    1935 | 0.422   2330 | 0.509   4581
AP: AS x AS pair
0: perfect, 1: almost perfect, 2: indels, 3: partial, and 4: no alignment
Basic Types of Alternative Splicing
AD: 5’ splice site selection (Alternative Donor)
AA: 3’ splice site selection (Alternative Acceptor)
CE: Cassette Exons (exon skipping / retention), one or more
RI: Intron retention
Composite Types
AP
AD+AA
CE+AD
AA or CE: alternative 3’ exons, no overlap
3’ ends for cassette exons; may not be the end for others
Between CE and AA
CE variants: AD, AA, ADAA
CE+AA
CE+AP
Intron Retention Variants
Two or more intron retentions: an indication of genomic contamination?
Mutually exclusive exons (ME)
Significant overlap of middle: AA, AD
AS Type Distribution
[Figure: bar chart of AS type distribution for Spoth1, Aspca3, Agabi2, and Chlre4. x-axis: AS Category (AD, AA, CE, CE_AD, CE_AA, CE_ADAA, IR, ADAA, CEorAAend, ME, Other); y-axis: Fraction of Total, 0 to 0.9]
RI Types
• No RI: Intron Not retained
• Minor: Retained in a minor isoform
• Major: Retained in a major isoform
Major is the most abundant isoform; Minor is not the most abundant isoform
Intron length classes: { 3n, 3n+1, 3n+2 }
3n+1, 3n+2: retention shifts the reading frame
NRI or Minor RI favors a premature termination codon (PTC); Major RI avoids PTC
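A minimal sketch of the length-class logic: a retained intron whose length is not a multiple of three shifts the downstream reading frame, and any retained intron can also carry an in-frame stop codon. The chi-square helper tests observed counts of the three length classes against equal thirds. All names are illustrative, and only codons that lie completely within the supplied sequence are checked.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def length_class(intron_len):
    """Return '3n', '3n+1' or '3n+2' for an intron length in nucleotides."""
    return ("3n", "3n+1", "3n+2")[intron_len % 3]

def shifts_frame(intron_len):
    """True if retaining the intron shifts the downstream reading frame."""
    return intron_len % 3 != 0

def retained_has_stop(upstream_partial, intron_seq):
    """True if reading through the retained intron meets a stop codon.
    upstream_partial holds the 0 to 2 exon nucleotides of the codon that
    the intron interrupts (empty for a phase-0 intron)."""
    frame = (upstream_partial + intron_seq).upper().replace("U", "T")
    codons = (frame[i:i + 3] for i in range(0, len(frame) - 2, 3))
    return any(c in STOP_CODONS for c in codons)

def chi_square_uniform(counts):
    """Chi-square statistic of counts against equal expected proportions."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)
```

For example, `chi_square_uniform([n_3n, n_3n1, n_3n2])` gives the statistic for the test against 1/3; with three classes it has 2 degrees of freedom.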
[Figure: fractions of retained-intron length classes (3n, 3n+1, 3n+2; y-axis 0 to 0.7) for NRI, Minor, and Major introns in Agabi2, Aspca3, Chlre4, and Spoth1, split by presence of a stop codon (No / Yes)]
P-values of chi-square test against 1/3 (24 bar groups, in slide order): < 2.2E-16, 8.18E-08, 2.01E-04, 0.03164, 4.91E-09, 0.03877, < 2.2E-16, 3.488E-4, 5.29E-11, 0.981, 2.12E-10, 0.003041, 0.001705, 0.575, 0.607, 0.005329, 1.06E-02, 0.5647, < 2.2E-16, 1.02E-03, 0.0177, 0.03213, < 2.2E-16, 0.0983

Genome   RI Type   Stopless Frac.   RI Type Frac
Agabi2   NRI       0.327            0.936
Agabi2   Minor     0.358            0.054
Agabi2   Major     0.748            0.010
Aspca3   NRI       0.260            0.783
Aspca3   Minor     0.262            0.183
Aspca3   Major     0.689            0.034
Chlre4   NRI       0.190            0.985
Chlre4   Minor     0.238            0.013
Chlre4   Major     0.512            0.002
Spoth1   NRI       0.258            0.751
Spoth1   Minor     0.316            0.189
Spoth1   Major     0.568            0.060
Intron Phase
UUU GCA AUU CUA GAA GAC
 F   A   I   L   E   D
[Diagram: intron insertion points labeled phase 0, 1, and 2]
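A short sketch of the phase convention, using the example sequence from this slide: the phase of an intron is the number of coding nucleotides upstream of it, modulo 3. The compact codon table below is the standard genetic code in TCAG order; the function names are illustrative.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codon -> one-letter amino acid ('*' marks a stop).
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(rna):
    """Translate an RNA or DNA coding sequence, ignoring a trailing partial codon."""
    dna = rna.upper().replace("U", "T")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))

def intron_phase(coding_nt_before):
    """Phase 0: between codons; phase 1: after the first base of a codon;
    phase 2: after the second base."""
    return coding_nt_before % 3

if __name__ == "__main__":
    print(translate("UUUGCAAUUCUAGAAGAC"))       # FAILED
    print([intron_phase(n) for n in (3, 4, 5)])  # [0, 1, 2]
```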
Stopless Introns Favor Phase 0 More
[Figure, panel A: fractions of the three phases (ph0, ph1, ph2; y-axis 0 to 0.7) for introns without and with a stop codon (Stop: No / Yes) in Agabi2, Aspca3, Chlre4, and Spoth1. Panel B: difference against population, -0.150 to 0.150]
P-values from chi-square test against 1/3 (8 bar groups, in slide order): < 2.2e-16, < 2.2e-16, < 2.2e-16, 1.959E-14, < 2.2e-16, < 2.2e-16, < 2.2e-16, 1.09E-04
Stopless 3n favors Phase 1
[Figure, panel A: fractions of the three phases (ph0, ph1, ph2; y-axis 0 to 0.7) for stopless introns of length class 3n, 3n+1, and 3n+2 in Agabi2, Aspca3, Chlre4, and Spoth1. Panel B: difference against population, -0.2 to 0.15]
P-values from chi-square test against the stopless population (12 bar groups, in slide order): < 2.2e-16, < 2.2e-16, < 2.2e-16, < 2.2e-16, 1.06E-07, < 2.2e-16, 0.989, 0.615, 0.6596, 0.00177, 0.166, 0.000178
Explanation of Phase for 3n Introns
NNN|GTNNN…NNNAG|NNNNN (the same GT…AG intron read in phase 0, 1, or 2)

Phase  5’-end residues                3’-end residues          # of Stops  Flexibility
0      V                              E,Q,K (1*)               1           Limited
1      C,R,S,G                        12 AA (2*); V,A,D,E,G    2           Best
2      13 AA (1*); F,L,S,Y,C,W (3*)   S,R                      4           Limited
* = stop codon

1. Splicing of phase-0 introns produces the most certain outcome.
2. Phase 0 has the fewest stop codons, and thus the least filtering of stopless introns.
3. Phase 1 introns are favored in major RI because of the flexibility and less bulky residues permitted on both ends, especially G, S, C, and A.
4. Phase 2 requires the most hydrophobic residues, and at the 3’ end only S and R are permitted.
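The per-phase stop counts (1, 2, and 4) can be reproduced by enumerating the codons that overlap the fixed GT and AG dinucleotides of a retained 3n intron. This is a sketch under that reading of the slide: the boundary codon patterns for each phase are my derivation from the GT…AG constraint, not something printed in the deck, and the names are illustrative.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codon -> one-letter amino acid ('*' marks a stop).
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Codons that overlap the intron's GT start or AG end, by intron phase
# (derived here from the GT...AG constraint; N is any base).
BOUNDARY_PATTERNS = {
    0: ["GTN", "NAG"],
    1: ["NGT", "NNA", "GNN"],
    2: ["NNG", "TNN", "AGN"],
}

def expand(pattern):
    """All codons matching a 3-letter pattern in which N is any base."""
    choices = [BASES if ch == "N" else ch for ch in pattern]
    return ["".join(c) for c in product(*choices)]

def summarize(pattern):
    """Return (number of stop codons, sorted distinct residues) for a pattern."""
    aas = [CODON_TABLE[c] for c in expand(pattern)]
    return aas.count("*"), sorted(set(aas) - {"*"})

def stops_per_phase():
    """Total stop codons possible at the intron boundaries for each phase."""
    return {ph: sum(summarize(p)[0] for p in pats)
            for ph, pats in BOUNDARY_PATTERNS.items()}

if __name__ == "__main__":
    print(stops_per_phase())
    for phase, patterns in BOUNDARY_PATTERNS.items():
        for p in patterns:
            stops, residues = summarize(p)
            print(phase, p, stops, ",".join(residues))
```

The enumeration also recovers the residue sets shown above, for example S and R for the phase-2 3’ codon AGN and 12 distinct residues for NNA.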
RI Evolution: Minor, Major, loss
[Diagram: evolution of a retained intron within a gene. Stages: NRI, Minor RI, Major RI. Labels: Splicing Error; NMD (stop codon, 3n+1, 3n+2): complete elimination or partial elimination; Evolution (3n, no stop); Major RI: fixed in population, becomes dominant form; New Function]
Minor RI: avoids 3n, has stop
NRI: avoids 3n, has stop
What We See
A new Intron loss mechanism
exonization
Vertebrate Evolution
Byrappa Venkatesh, Mark V. Erdmann, and Sydney Brenner. Molecular synapomorphies resolve evolutionary relationships of extant jawed vertebrates. Proc Natl Acad Sci U S A. 2001 September 25; 98(20): 11382–11387.
Proteolipid protein (PLP)
Ancestor: RI + NRI
Fish: NRI
Amphibians: RI
Other tetrapods (reptiles, birds, mammals): NRI + RI
Conclusion
• Fungi and green algae have abundant AS
• AS is ancient
• AS is conserved, but the exact AS profile is not conserved
• Number of exons, expression level, and long introns contribute to AS
• RI dominates AS and bridges the evolution of new protein functions
Transcription overlap
Number of genes per congregation
Genome Chlre4 Agabi2 Aspca3 Spoth1
Congreg. > 1 gene 29.2% 21.1% 39.3% 34.9%
Congregated Genes 48.0% 37.2% 63.5% 58.4%
[Figure: histogram. x-axis: Number of Genes per Congregation; y-axis: Count]
Overlapping AS as a bridge to novel gene
Manyuan Long. A New Function Evolved from Gene Fusion. 2000. Department of Ecology and Evolution, The University of Chicago, Chicago, Illinois 60637, USA
Novel Human Gene
Acknowledgement
• Annotation
  – Igor Grigoriev
  – Alan Kuo
  – Andrea Aerts
  – Bobby Otillar
• R&D
  – Asaf Salamov
• Annotation pipeline
  – Frank Korzeniewski
  – Xueling Zhao
• EST Group
  – Erika Linquist
  – Jasmyn Pangilinan
  – Zhong Zhang
• IT Group
Funding: Department of Energy
Statistical and mathematical
Mingkun Li
GDS Group
Sydney Brenner
Introns Boost Expression
Database         Chlre4        Agabi2        Aspca3         Spoth1
Exon Structure   S      M      S      M      S      M       S       M
Mean profmaxh    10.4   24.1   10.7   51.2   42.8   111.7   200.4   372.9
T-test on gene expression level, measured as the maximum height of the base-coverage profile (profmaxh). S: single-exon genes; M: multiple-exon genes.
All p-values < 2.2e-16.