Evolution of alternative splicing Mikhail Gelfand Institute for Information Transmission Problems, Russian Academy of Sciences Workshop “Gene Annotation Analysis and Alternative Splicing” Berlin, December 2004

Evolution of alternative splicing

Mikhail Gelfand
Institute for Information Transmission Problems, Russian Academy of Sciences

Workshop “Gene Annotation Analysis and Alternative Splicing”
Berlin, December 2004

Overview

• Exon-intron structure of orthologous genes
– human–mouse
– Drosophila–Anopheles
• Sequence divergence in alternative and constitutive regions
• Evolution of splicing and regulatory sites
• Alternative splicing and protein structure

Alternative splicing of human (and mouse) genes

5% Sharp, 1994 (Nobel lecture)
35% Mironov-Fickett-Gelfand, 1999
38% Brett-…-Bork, 2000 (ESTs/mRNA)
22% Croft et al., 2000 (ISIS database)
55% Kan et al., 2001 (11% AS patterns conserved in mouse ESTs)
42% Modrek et al., 2001 (HASDB)
~33% CELERA, 2001
59% Human Genome Consortium, 2001
28% Clark and Thanaraj, 2002
all? Kan et al., 2002 (17-28% with total minor isoform frequency > 5%)
41% (mouse) FANTOM & RIKEN, 2002
60% (mouse) Zavolan et al., 2003

• Exon-intron structure of orthologous genes
– human–mouse
– Drosophila–Anopheles
• Sequence divergence in alternative and constitutive regions
• Alternative splicing and protein structure

Data

• known alternative splicing
– HASDB (human, ESTs+mRNAs)
– ASMamDB (mouse, mRNAs+genes)
• additional variants
– UniGene (human and mouse EST clusters)
• complete genes and genomic DNA
– GenBank (full-length mouse genes)
– human genome

Methods

• TBLASTN (initial identification of orthologs: mRNAs against genomic DNA)
• BLASTN (human mRNAs against genome)
• Pro-EST (spliced alignment, ESTs and mRNA against genomic DNA)
• Pro-Frame (spliced alignment, proteins against genomic DNA)
– confirmation of orthology
  • same exon-intron structure
  • >70% identity over the entire protein length
– analysis of conservation of alternative splicing
  • conservation of exons or parts of exons
  • conservation of sites
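The two orthology-confirmation criteria can be written as a simple predicate. This is a minimal sketch, not the pipeline used in the study: the `SplicedAlignment` record and its fields are hypothetical, and "same exon-intron structure" is approximated here by equal exon counts, which is an assumption.

```python
from dataclasses import dataclass


@dataclass
class SplicedAlignment:
    """Hypothetical summary of one protein-to-genome spliced alignment."""
    exon_count_query: int    # exons in the gene the protein comes from
    exon_count_target: int   # exons recovered in the aligned genomic locus
    identical_residues: int  # aligned positions with identical amino acids
    protein_length: int      # full length of the query protein


def is_confirmed_ortholog(aln: SplicedAlignment, min_identity: float = 0.70) -> bool:
    """Apply both criteria: same exon-intron structure and >70% identity."""
    # Assumption: equal exon counts stand in for "same exon-intron structure".
    same_structure = aln.exon_count_query == aln.exon_count_target
    # Identity is measured over the entire protein length, as on the slide.
    identity = aln.identical_residues / aln.protein_length
    return same_structure and identity > min_identity
```

The threshold is strict (">70%"), so an alignment at exactly 70% identity is rejected in this sketch.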

166 gene pairs

Known alternative splicing:
• human only: 42
• both human and mouse: 84
• mouse only: 40
• total: 126 human, 124 mouse

Elementary alternatives

Cassette exon

Alternative donor site

Alternative acceptor site

Retained intron

Human genes

                    mRNA                EST
                    cons.   non-cons.   cons.   non-cons.
Cassette exons      56      25          74      26
Alt. donors         18      7           16      10
Alt. acceptors      13      5           19      15
Retained introns    4       3           5       0
Total               96      30          114     51
Total genes         45      28          41      44

Conserved elementary alternatives: 69% (EST) - 76% (mRNA)

Genes with all isoforms conserved: 57 (45%)

Mouse genes

                    mRNA                EST
                    cons.   non-cons.   cons.   non-cons.
Cassette exons      70      5           39      9
Alt. donors         24      6           17      6
Alt. acceptors      15      6           16      9
Retained introns    8       7           10      4
Total               117     24          82      28
Total genes         68      22          30      26

Conserved elementary alternatives: 75% (EST) - 83% (mRNA)

Genes with all isoforms conserved: 79 (64%)
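The "conserved elementary alternatives" percentages on the human and mouse slides follow directly from the "Total" rows. A short worked check, using only the totals as printed (the dictionary layout and function name are illustrative):

```python
# (conserved, non-conserved) totals of elementary alternatives,
# copied from the "Total" rows of the human and mouse tables.
totals = {
    ("human", "mRNA"): (96, 30),
    ("human", "EST"): (114, 51),
    ("mouse", "mRNA"): (117, 24),
    ("mouse", "EST"): (82, 28),
}


def conserved_percent(conserved: int, non_conserved: int) -> float:
    """Share of elementary alternatives that are conserved, in percent."""
    return 100.0 * conserved / (conserved + non_conserved)


for (species, evidence), (cons, non_cons) in totals.items():
    print(f"{species} {evidence}: {conserved_percent(cons, non_cons):.1f}% conserved")
```

Rounded to whole percent, these reproduce the 69-76% (human) and 75-83% (mouse) ranges quoted on the slides.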

Real or aberrant non-conserved AS?

• 24-31% human vs. 17-25% mouse elementary alternatives are not conserved
• 55% human vs. 36% mouse genes have at least one non-conserved variant
• denser coverage of human genes by ESTs:
– pick up rare (tissue- and stage-specific) => younger variants
– pick up aberrant (non-functional) variants
• 17-24% mRNA-derived elementary alternatives are non-conserved (compared to 25-32% EST-derived ones)

smoothelin

[Figure: exon-intron structure of smoothelin isoforms (human, common, mouse), showing a human-specific donor site and a mouse-specific cassette exon]

autoimmune regulator

[Figure: isoforms (human, common, mouse) with a retained intron; downstream exons read in two frames]

Na/K-ATPase gamma subunit (Fxyd2)

[Figure: isoforms (human, common, mouse), showing a (deleted) intron and an alternative acceptor site within the (inserted) intron]

Comparison to other studies. Modrek and Lee, 2003: skipped exons

• 98% constitutive exons are conserved
• 98% major form exons are conserved
• 28% minor form exons are conserved
• inclusion level is a good predictor of conservation
• inclusion level of conserved exons in human and mouse is highly correlated

Are minor non-conserved form exons errors? No:

• minor form exons are supported by multiple ESTs

• 28% of minor form exons are upregulated in one specific tissue

• 70% of tissue-specific exons are not conserved

• splicing signals of conserved and non-conserved exons are similar

Thanaraj et al., 2003: extrapolation from EST comparisons

• 61% (47-86%) alternative splice junctions are conserved

• 74% (71-78%) constitutive splice junctions are conserved

• the former number is consistent with other studies, whereas the latter seems to be an underestimate

Regulation of alternative splicing: introns

• Brudno et al., 2001: UGCAUG is over-represented downstream of tissue-specific exons (brain, muscle).

• Sorek and Ast, 2003: Enhanced conservation (between human and mouse) in intronic sequences flanking alternatively spliced exons. UGCAUG is over-represented in conserved regions.
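Over-representation of a hexamer such as UGCAUG comes down to comparing its frequency in one set of intron fragments against a control set. A minimal counting sketch follows; the function names are illustrative and this is not the statistical procedure of either cited study.

```python
def count_motif(seq: str, motif: str = "TGCATG") -> int:
    """Count occurrences of motif in seq, including overlapping ones.

    RNA input is accepted: U is converted to T before matching.
    """
    seq = seq.upper().replace("U", "T")
    motif = motif.upper().replace("U", "T")
    return sum(1 for i in range(len(seq) - len(motif) + 1) if seq.startswith(motif, i))


def motif_frequency(fragments: list[str], motif: str = "TGCATG") -> float:
    """Occurrences per possible start position across a set of fragments."""
    positions = sum(max(len(s) - len(motif) + 1, 0) for s in fragments)
    hits = sum(count_motif(s, motif) for s in fragments)
    return hits / positions if positions else 0.0
```

In use, one would compute `motif_frequency` for fragments downstream of tissue-specific exons and for a control set, then compare the two values; choosing the control set and a significance test is left open here.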

• Exon-intron structure of orthologous genes
– human–mouse
– Drosophila–Anopheles
• Sequence divergence in alternative and constitutive regions
• Alternative splicing and protein structure

Fruit fly and mosquito

• Technically more difficult than human-mouse:
– incomplete genomes
– difficulties in alignment, especially at gene termini
– changes in exon-intron structure irrespective of alternative splicing (~4.7 introns per gene in Drosophila vs. ~3.5 introns per gene in Anopheles)

Filtering of the dataset

• FlyBase: alternatively spliced fruit fly gene and all its protein isoforms
– Non-canonical sites: exclude isoform
• Pro-Frame alignment of all isoforms with the fruit fly genome
– Frameshift or in-frame stop for at least one isoform: exclude gene
– No constitutive segments inside gene: exclude gene
• Result: list of filtered fruit fly genes
• ENSEMBL: mosquito genes; list of orthologous pairs
• Pro-Frame alignment of all fruit fly isoforms with the mosquito genome
– Similarity for all isoforms <30%: exclude orthologous pair
– Poly-N within aligned region in the mosquito genome for at least one isoform: exclude orthologous pair
• Result: set of filtered orthologous pairs
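The filtering steps can be sketched as two predicates, one per alignment stage. The `Isoform` and `Gene` records and their fields are hypothetical stand-ins for whatever the alignment step reports. Two rules in the sketch are assumptions not stated on the slide: dropping a gene when no isoform survives, and checking for frameshifts only among the surviving isoforms.

```python
from dataclasses import dataclass, field


@dataclass
class Isoform:
    canonical_sites: bool = True       # all splice sites are canonical
    frameshift_or_stop: bool = False   # frameshift or in-frame stop in the fly alignment
    mosquito_similarity: float = 0.0   # similarity of the alignment to the mosquito genome
    poly_n_in_alignment: bool = False  # poly-N run inside the aligned mosquito region


@dataclass
class Gene:
    name: str
    isoforms: list = field(default_factory=list)
    has_constitutive_segment: bool = True


def filter_fly_gene(gene):
    """Return the gene without non-canonical isoforms, or None if it is excluded."""
    isoforms = [i for i in gene.isoforms if i.canonical_sites]
    if not isoforms:  # assumption: a gene with no remaining isoforms is dropped
        return None
    if any(i.frameshift_or_stop for i in isoforms):
        return None
    if not gene.has_constitutive_segment:
        return None
    return Gene(gene.name, isoforms, gene.has_constitutive_segment)


def keep_orthologous_pair(gene, min_similarity=0.30):
    """Apply the two mosquito-side exclusion rules to a filtered fly gene."""
    if all(i.mosquito_similarity < min_similarity for i in gene.isoforms):
        return False
    if any(i.poly_n_in_alignment for i in gene.isoforms):
        return False
    return True
```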

Classification of exons and coding segments

• for each pair of isoforms define: mutually Exclusive exon, Cassette exon, retained Intron, alternative Acceptor site, alternative Donor site; then merge these definitions over all pairs for a gene

[Figure: three isoforms (isoform 1, isoform 2, isoform 3) with exons and coding segments labeled by five-character codes over the positions E, C, I, A, D (for example ----D, ---AD, -C---, EC---, --I--, ---A-, E----; ----- marks a constitutive exon). Legend: exon, alternative coding segment, constitutive coding segment. Exons and coding segments are grouped as left marginal, internal, and right marginal.]
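The pairwise labeling and the merge over all isoform pairs can be illustrated with a small sketch. It is a simplified reading of the slide under stated assumptions, not the authors' procedure: exons are half-open `(start, end)` intervals on the forward strand, only internal exons are labeled, mutually exclusive exons (E) are not detected, and an exon involved in a retained intron receives only the label I.

```python
from itertools import permutations

Exon = tuple[int, int]  # (start, end), end exclusive, forward strand


def overlaps(a: Exon, b: Exon) -> bool:
    return a[0] < b[1] and b[0] < a[1]


def classify_exon(exon: Exon, own: list[Exon], other: list[Exon]) -> set[str]:
    """Label one exon of isoform `own` relative to isoform `other`."""
    if exon in other:
        return set()  # identical exon in both isoforms: no event
    hits = [o for o in other if overlaps(exon, o)]
    if not hits:
        # No overlap: a cassette exon if it lies within an intron of the other isoform.
        in_intron = bool(other) and other[0][1] <= exon[0] and exon[1] <= other[-1][0]
        return {"C"} if in_intron else set()
    # Retained intron: one exon spans two or more exons of the other isoform.
    bridged = len(hits) >= 2 or sum(overlaps(hits[0], e) for e in own) >= 2
    if bridged:
        return {"I"}
    labels = set()
    if hits[0][0] != exon[0]:
        labels.add("A")  # different 5' boundary: alternative acceptor site
    if hits[0][1] != exon[1]:
        labels.add("D")  # different 3' boundary: alternative donor site
    return labels


def classify_gene(isoforms: list[list[Exon]]) -> dict[Exon, set[str]]:
    """Merge pairwise labels over all ordered pairs of isoforms of one gene."""
    labels: dict[Exon, set[str]] = {}
    for own, other in permutations(isoforms, 2):
        for exon in own[1:-1]:  # internal exons only
            labels.setdefault(exon, set()).update(classify_exon(exon, own, other))
    return labels
```

For example, `classify_gene([[(0, 50), (100, 200), (300, 400)], [(0, 50), (300, 400)]])` labels the middle exon of the first isoform as a cassette exon.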

How to define conservation of fruit fly alternative exons

• Alignment of an exon may depend on the isoform. In the cases listed below, shorter exons are assumed to be conserved, whereas longer ones are considered missing

[Figure: isoform 1 and isoform 2 aligned to the mosquito genome, with missing exons marked. Legend: one shading marks segments where similarity in alignments of all isoforms including this segment was less than 35%; the other marks segments where similarity in alignment of at least one isoform including this segment was greater than 35%.]
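The 35% rule from the legend reduces to a one-line test over the isoforms that include a segment. A minimal sketch; the function name and the input format (one similarity value per isoform alignment containing the segment) are assumptions.

```python
def segment_is_conserved(similarities, threshold=0.35):
    """True if at least one isoform alignment containing the segment exceeds the threshold.

    `similarities` holds one value in [0, 1] per isoform whose alignment
    includes the segment; a segment with no such alignment counts as missing.
    """
    return any(s > threshold for s in similarities)
```

The slide does not say how a similarity of exactly 35% is treated; this sketch counts it as missing.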

Conservation of fruit fly coding segments in the mosquito genome. Small (curated) sample

Type of segment                Missing      Conserved    Total
left marginal (alternative)    46 (77%)     14 (23%)     60 (12%)
internal alternative           22 (55%)     18 (45%)     40 (8%)
internal constitutive          83 (24%)     264 (76%)    347 (69%)
right marginal (alternative)   31 (56%)     24 (44%)     55 (11%)
Total                          182 (36%)    320 (64%)    502 (100%)

Conservation of fruit fly coding segments in the mosquito genome. Large (non-curated) sample

Type of segment                Missing       Conserved     Total
left marginal (alternative)    858 (57%)     639 (43%)     1497 (23%)
internal alternative           215 (55%)     178 (45%)     393 (6%)
internal constitutive          903 (23%)     2999 (77%)    3902 (59%)
right marginal (alternative)   414 (53%)     369 (47%)     783 (12%)
Total                          2390 (36%)    4185 (64%)    6575 (100%)

Classification of slice events for fruit fly exons

• divided exon (d)
• joined exon (j)
• exactly conserved exon (e)
• mixed (m)

[Figure: Drosophila (Dr) exons aligned to Anopheles (An), each exon labeled d, e, j, or m. Legend: slice, exon.]

Different types of events for the same exon dependent on an isoform

[Figure: the same Drosophila (Dr) exon aligned to Anopheles (An) in isoform 1 and in isoform 2, receiving different labels (d, j, e) depending on the isoform. Legend: slice, exon.]

Types of elementary alternatives and conservation of fruit fly exons in the mosquito genome. Large (non-curated) sample, internal exons

                   missing      mixed       joined      divided     exact
constitutive       728 (23%)    212 (7%)    754 (23%)   407 (13%)   1356 (42%)
Donor site         229 (50%)    21 (5%)     52 (11%)    47 (10%)    130 (28%)
Acceptor site      390 (43%)    45 (5%)     133 (15%)   124 (14%)   250 (28%)
retained Intron    37 (70%)     3 (6%)      2 (4%)      8 (15%)     6 (11%)
Cassette exon      90 (59%)     4 (3%)      9 (6%)      6 (4%)      50 (33%)
Exclusive exon     10 (15%)     1 (1%)      1 (1%)      1 (1%)      55 (82%)

Types of elementary alternatives and conservation of fruit fly exons in the mosquito genome. Large (non-curated) sample, internal exons

[Figure: stacked bar chart (0-100%) for constant exon, Donor site, Acceptor site, retained Intron, Cassette exon, and Exclusive exon; categories: exact, divided, joined, mixed, missing]

Fruit fly and mosquito

• The general results are the same as for the human-mouse comparison: more conservation of constitutive segments than alternative ones:
– 75% const. and 45% alt. segments are conserved
– constitutive exons: >50% conserved exactly, ~25% intron in Drosophila, ~8% intron in Anopheles
– conservation of alternatives: 36% cassette exons, 51% donor sites, 63% acceptor sites, 83% mutually exclusive exons

• Exon-intron structure of orthologous genes
– human–mouse
– Drosophila–Anopheles
• Sequence divergence in alternative and constitutive regions
• Alternative splicing and protein structure

Concatenates of constitutive and alternative regions in all genes: different evolutionary rates

Columns (left-to-right) – (1) constitutive regions; (2–4) alternative regions: N-end, internal, C-end

[Figure: two bar charts over the columns Constitutive, N-end alternative, Internal alternative, C-end alternative.
dN/dS (axis 0.00-0.30); bar labels: 0.176, 0.199, 0.187, 0.301.
Amino-acid identity (axis 0.7-0.9); bar labels: 0.886, 0.874, 0.878, 0.807.]

• Relatively more non-synonymous substitutions in alternative regions (higher dN/dS ratio)
• Less amino acid identity in alternative regions
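To make the distinction between synonymous and non-synonymous substitutions concrete, the sketch below counts codon differences between two aligned coding sequences using the standard genetic code. It produces raw counts only. It is not a dN/dS estimator, which would also normalize by the numbers of synonymous and non-synonymous sites and correct for multiple substitutions. The function name is illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base slowest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def count_codon_differences(seq1, seq2):
    """Count synonymous and non-synonymous codon differences.

    Both inputs must be gap-free, in-frame DNA coding sequences of equal length.
    Returns (synonymous, non_synonymous).
    """
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned, equal-length and in frame")
    syn = nonsyn = 0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if c1 == c2:
            continue
        if CODON_TABLE[c1] == CODON_TABLE[c2]:
            syn += 1      # codon changed, amino acid unchanged
        else:
            nonsyn += 1   # codon changed, amino acid changed
    return syn, nonsyn
```

Running it separately on concatenated constitutive and alternative regions gives a crude first look at the contrast the slide reports.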

Genes with length of both const. and alt. reg. > 80 nt

• Horizontal axis: difference in dN/dS in const. and alt. regions
• Vertical axis: number of genes
• Violet: dN/dS in const. regions > dN/dS in alt. regions
• Yellow: dN/dS in const. regions < dN/dS in alt. regions

[Figure: histogram over the bins 0.0-0.1, 0.1-0.2, 0.2-0.3, 0.3-0.4, 0.4-0.5, >0.5 (vertical axis 0-900). Bar labels, first series: 658, 207, 79, 27, 19, 27; second series: 773, 333, 140, 71, 44, 58.]

279 proteins from SwissProt+TREMBL with “varsplic” features

              constitutive   alternative   % alt. to all
length        199270         66054         25%
all SNPs      1126           368           25%
synonymous    576 (51%)      167 (45%)     22%
benign        401 (36%)      141 (38%)     26%
damaging      149 (13%)      60 (16%)      29%

Again, there is some evidence of positive selection towards diversity. This is not due to aberrant ESTs (only protein data are considered).
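The "% alt. to all" column is the share of each row that falls in alternative regions. It can be re-derived from the counts as printed; the dictionary layout and function name below are illustrative.

```python
# (constitutive, alternative) values copied from the SNP table.
counts = {
    "length": (199270, 66054),
    "all SNPs": (1126, 368),
    "synonymous": (576, 167),
    "benign": (401, 141),
    "damaging": (149, 60),
}


def alt_share(constitutive: int, alternative: int) -> float:
    """Percentage of the row total that falls in alternative regions."""
    return 100.0 * alternative / (constitutive + alternative)


for row, (const, alt) in counts.items():
    print(f"{row}: {alt_share(const, alt):.1f}% in alternative regions")
```

Alternative regions hold about 25% of the sequence length and of all SNPs, but about 22% of synonymous and about 29% of damaging SNPs, which is the asymmetry the slide points to.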

• Exon-intron structure of orthologous genes
– human–mouse
– Drosophila–Anopheles
• Sequence divergence in alternative and constitutive regions
• Alternative splicing and protein structure

Data

• Alternatively spliced genes (proteins) from SwissProt
– human
– mouse
• Protein structures from PDB
• Domains from InterPro
– SMART
– Pfam
– Prosite
– etc.

Alternative splicing avoids disrupting domains (and non-domain units)

Control: fix the domain structure; randomly place alternative regions

[Figure a): stacked bars (0-100%), Expected vs. Observed. Categories: Non-domain functional units partially; Domains partially; No annotated unit affected; Non-domain functional units completely; Domains completely. Bar labels as extracted: 6%, 10%, 15%, 37%, 40%, 34%, 21%, 19%, 6%, 13%.]
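The control (fixed domain structure, randomly placed alternative regions) can be sketched as a simple randomization. Assumptions in this sketch: coordinates are half-open amino-acid intervals, region lengths are preserved, placement is uniform along the protein, and a region that cuts into any domain counts as "partial" even if it also covers another domain completely. All names are hypothetical.

```python
import random


def classify_region(region, domains):
    """Return 'partial', 'complete', or 'none' for one alternative region."""
    start, end = region
    labels = set()
    for d_start, d_end in domains:
        if start < d_end and d_start < end:  # region overlaps this domain
            covers = start <= d_start and d_end <= end
            labels.add("complete" if covers else "partial")
    if "partial" in labels:
        return "partial"
    return "complete" if labels else "none"


def expected_distribution(protein_length, domains, region_lengths, trials=1000, seed=0):
    """Estimate the expected class frequencies under random placement."""
    rng = random.Random(seed)
    counts = {"partial": 0, "complete": 0, "none": 0}
    for _ in range(trials):
        for length in region_lengths:
            # Uniform start such that the region stays inside the protein.
            start = rng.randint(0, protein_length - length)
            counts[classify_region((start, start + length), domains)] += 1
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}
```

The observed frequencies come from applying `classify_region` to the real alternative regions; comparing them with `expected_distribution` gives the Expected vs. Observed contrast shown in the figure.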

… and this is not simply a consequence of the (disputed) exon-domain correlation

[Figure: AS & exon boundaries and SMART domains. Ratio (observed/expected), axis 0-1, for nonAS_Exons, AS_Exons, and AS in mouse and human; bars for inside domains and outside domains.]

Positive selection towards domain shuffling (not simply avoidance of disrupting domains)

[Figure, panel a): the same Expected vs. Observed stacked bars as on the slide “Alternative splicing avoids disrupting domains (and non-domain units)”. Panel b): Expected vs. Observed for Domains completely; Non-domain units completely; No annotated units affected.]

Short (<50 aa) alternative splicing events within domains target protein functional sites

[Figure, panel a): the same Expected vs. Observed stacked bars as on the slide “Alternative splicing avoids disrupting domains (and non-domain units)”. Panel c): Expected vs. Observed for Prosite patterns unaffected; Prosite patterns affected; FT positions unaffected; FT positions affected.]

An attempt at integration

• AS is often young (as opposed to degenerating)
• young AS isoforms are often minor and tissue-specific
• … but still functional
– although unique isoforms may be the result of aberrant splicing
• AS regions show evidence for positive selection
– excess damaging SNPs
– excess non-synonymous codon substitutions

What to do

• Each isoform (alternative region) can be characterized:
– by conservation (between genomes)
– if conserved, by selection (positive vs negative)
  • human-mouse, also add rat; compare species of Drosophila and Caenorhabditis
– pattern of SNPs (synonymous, benign, damaging)
– tissue-specificity
  • in particular, whether it is cancer-specific
– degree of inclusion (major/minor)
– functionality (for isoforms)
  • whether it generates a frameshift
  • how bad it is (the distance between the stop-codon and the last exon-exon junction)
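The last measure, the distance between the stop codon and the last exon-exon junction, is easy to compute in spliced-transcript coordinates. A minimal sketch, assuming exon lengths are given in transcript order and the stop codon position is the 0-based coordinate just past its third base; the function name and sign convention are illustrative.

```python
def stop_to_last_junction(exon_lengths, stop_codon_end):
    """Distance in nucleotides from the stop codon to the last exon-exon junction.

    Positive: the stop codon ends upstream of the last junction.
    Zero or negative: the stop codon ends at or beyond the start of the last exon.
    """
    if len(exon_lengths) < 2:
        raise ValueError("a single-exon transcript has no exon-exon junction")
    # Transcript coordinate at which the last exon begins.
    last_junction = sum(exon_lengths[:-1])
    return last_junction - stop_codon_end
```

For exons of 100, 200, and 300 nt, a stop codon ending at position 150 lies 150 nt upstream of the last junction, while one ending at position 450 lies within the last exon.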

What to expect

• Cancer-specific isoforms will be less functional and more often non-conserved

• Set of non-conserved isoforms will contain a larger fraction of non-functional isoforms; and this may influence evolutionary conclusions on the sequence level

• Still, after removal of non-functional isoforms, one would see positive selection in alternative regions (more non-synonymous substitutions compared to constant regions etc.), especially in tissue-specific ones

References

Nurtdinov RN, Artamonova II, Mironov AA, Gelfand MS (2003) Low conservation of alternative splicing patterns in the human and mouse genomes. Human Molecular Genetics 12: 1313-1320.

Kriventseva EV, Koch I, Apweiler R, Vingron M, Bork P, Gelfand MS, Sunyaev S (2003) Increase of functional diversity by alternative splicing. Trends in Genetics 19: 124-128.

Brudno M, Gelfand MS, Spengler S, Zorn M, Dubchak I, Conboy JG (2001) Computational analysis of candidate intron regulatory elements for tissue-specific alternative pre-mRNA splicing. Nucleic Acids Research 29: 2338-2348.

Mironov AA, Fickett JW, Gelfand MS (1999) Frequent alternative splicing of human genes. Genome Research 9: 1288-1293.

Acknowledgements

• Discussions
– Vsevolod Makeev (GosNIIGenetika)
– Eugene Koonin (NCBI)
– Igor Rogozin (NCBI)
– Dmitry Petrov (Stanford)
• Support
– Ludwig Institute of Cancer Research
– Howard Hughes Medical Institute
– Russian Fund of Basic Research
– Russian Academy of Sciences

Authors

• Andrei Mironov (Moscow State University) – spliced alignment
• Ramil Nurtdinov (Moscow State University) – human/mouse comparison
• Irena Artamonova (Institute of Bioorganic Chemistry, now Institute of Bioinformatics, GSF) – human/mouse comparison, MAGEA family
• Dmitry Malko (GosNIIGenetika) – Drosophila/Anopheles comparison
• Inna Dubchak (Lawrence Berkeley Lab) – sites
• Michael Brudno (UC Berkeley, now Stanford) – sites
• Ekaterina Ermakova (Moscow State University) – evolution of alternative/constitutive regions
• Vasily Ramensky (Institute of Molecular Biology) – SNPs
• Eugenia Kriventseva (EBI, now BASF) – protein structure
• Shamil Sunyaev (EMBL, now Harvard University Medical School) – protein structure