Progress report Yiming Zhang 02/10/2012


Page 1: Progress report Yiming Zhang 02/10/2012. All AS events in ASIP Intron retention Exon skipping Alternative Acceptor site NAGNAG AltA Alternative Donor

Progress report

Yiming Zhang, 02/10/2012

All AS events in ASIP

• Intron retention
• Exon skipping
• Alternative acceptor site (AltA), including NAGNAG
• Alternative donor site (AltD), including GYNGYN
• Alternative both sites (AltP)

NAGNAG alternative splicing

Figure 1. NAGNAG alternative splicing with E and I sites and isoforms.

NAGNAG alternative splicing can result in one of three possibilities (Figure 1): constitutive use of the first acceptor (the so-called exonic, or “E” variant), constitutive use of the second acceptor (the so-called intronic, or “I” variant), or use of both acceptors, that is, alternative splicing (the “EI” variant). Sinha et al. 2010
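To make the E and I variants concrete, here is a minimal sketch of the two spliced products. The function name, the argument layout (upstream exon, six-nucleotide NAGNAG motif, downstream exon) and the example sequences are assumptions made for illustration, not taken from ASIP.

```python
import re

# N stands for any nucleotide, so NAGNAG is written as [ACGT]AG[ACGT]AG.
NAGNAG = re.compile(r"[ACGT]AG[ACGT]AG")


def nagnag_isoforms(upstream_exon, motif, downstream_exon):
    """Return the spliced sequences for the E and I acceptor choices.

    Hypothetical layout: the intron body sits between upstream_exon and
    motif and is removed in both variants, so it is not passed in.
    """
    if not NAGNAG.fullmatch(motif):
        raise ValueError("motif must match NAGNAG")
    # E variant: the intron ends after the first NAG, so the second NAG is exonic.
    e_form = upstream_exon + motif[3:] + downstream_exon
    # I variant: the intron ends after the second NAG, so both NAGs are intronic.
    i_form = upstream_exon + downstream_exon
    return e_form, i_form


# Made-up example sequences.
e_form, i_form = nagnag_isoforms("ATGGCC", "CAGGAG", "TTCGAA")
print(e_form)                     # ATGGCCGAGTTCGAA
print(i_form)                     # ATGGCCTTCGAA
print(len(e_form) - len(i_form))  # 3
```

The two products differ by exactly three nucleotides, which is why an EI intron changes the transcript by one codon's length without shifting the reading frame.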

GYNGYN alternative splicing

Figure 2. GYNGYN alternative splicing with e and i sites and isoforms.

Hiller et al. 2006

All introns

• Constitutive: NAGNAG-E, NAGNAG-I, GYNGYN-e, GYNGYN-i, ……
• Alternative: IntronR; ExonS; AltA (NAGNAG-EI, ……); AltD (GYNGYN-ei, ……); Multiple AS; ……
• Unclear

Intron statistics from ASIP

Group         Type         AT    BD    GM    LJ    MT    OS    PP    PT    SB    SL    VV  Total
Cons. EST>=4  all       36040   760 16203  1056  6351 32393 15306  3347  9014  1681  9000 131151
Cons. EST>=4  NAG-E      1494    30   583    39   195  1299   533   102   317    69   349   5010
Cons. EST>=4  NAG-I       522     5   212    13    74   400   331    36    96    26   139   1854
Cons. EST>=4  GYN-e      1283    40   621    34   246  1406   748   135   403    58   382   5356
Cons. EST>=4  GYN-i       489    14   296    13    89   705   241    71   168    27   208   2321
Cons. EST>=10 all       13366   220  5697   406  2509 14409  6606  1129  3366   689  4091  52488
Cons. EST>=10 NAG-E       574    10   189     7    85   568   220    40   117    24   148   1982
Cons. EST>=10 NAG-I       199     1    74     3    32   160   107    11    31    12    55    685
Cons. EST>=10 GYN-e       467     5   232    17   100   637   330    36   163    28   177   2192
Cons. EST>=10 GYN-i       152     7    94     7    39   303    88    25    46    14    88    863
Alt. EST>=2   IntronR    4197    50  1669    76   327  8233  2004   256  1671    91   806  19380
Alt. EST>=2   AltA_all    648    10   189    11    49   926   404    38    92    24    97   2488
Alt. EST>=2   AltA_NAG    128     4    31     3     6   178    44     3    27     4    30    458
Alt. EST>=2   AltD_all    305     2    99     5    47   575   453    15    64    16    71   1652
Alt. EST>=2   AltD_GYN      9     0     0     1     1    10     3     0     1     0     2     27
Alt. EST>=2   AltP         51     0    32     1     0   188    27     2    39     0    16    356
Alt. EST>=2   ExonS       476     5   371    11    72  2100   429    79   304    59   238   4144

Table 1. Intron statistics from ASIP. Four species with little data are not listed. All statistics are intron-based rather than event-based, which means redundancy has been removed. The most common type of alternative intron is IntronR, followed by ExonS. NAGNAG AS accounts for a much larger share of AltA introns (458 of 2488) than GYNGYN AS does of AltD introns (27 of 1652).
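The comparison between NAGNAG and GYNGYN in the caption can be expressed as percentages. A small sketch using only the "Total" column of the Alt. rows in Table 1; the dictionary and the helper name are made up for this example.

```python
# Totals copied from Table 1 (Alt., EST>=2 rows, "Total" column).
alt_totals = {
    "AltA_all": 2488,
    "AltA_NAG": 458,
    "AltD_all": 1652,
    "AltD_GYN": 27,
}


def share(part, whole):
    """Percentage of `whole` made up by `part`."""
    return 100.0 * part / whole


nag_share = share(alt_totals["AltA_NAG"], alt_totals["AltA_all"])
gyn_share = share(alt_totals["AltD_GYN"], alt_totals["AltD_all"])
print(f"NAGNAG share of AltA: {nag_share:.1f}%")  # 18.4%
print(f"GYNGYN share of AltD: {gyn_share:.1f}%")  # 1.6%
```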

Background

NAGNAG alternative splicing, which can insert or delete a single amino acid in the protein, is very common and well studied in animals.

• The NAGNAG motif is present in 30% of human genes and is functional in at least 5% of the genes. Hiller et al. 2004

• Because NAGNAG AS is frame-preserving, the vast majority of cases should lead to different proteins. Studies so far have found both cases in which such proteins vary in function and cases in which there is no noticeable difference. Akerman et al. 2006; Iida et al. 2008

• GO analyses in some studies show that the GO term DNA binding is statistically significant among the affected genes, and that more than half of all AS-NAGNAG events affect polar amino acid residues. Iida et al. 2008; Sinha et al. 2010

Background

Few studies of NAGNAG AS in plants exist so far, covering only 3 species: Arabidopsis, rice and Physcomitrella.

• One study found 321 and 372 AS-NAGNAG events in Arabidopsis and rice, respectively. Another study found that 6% of all introns and 21% of all annotated genes in Arabidopsis harbor a genomic NAGNAG acceptor motif. Iida et al. 2008; Schindler et al. 2008

• In addition, the GO analysis agrees with the previous study in human: the GO term DNA binding is statistically significant. Another study indicates that NAGNAG acceptors occur frequently in the Arabidopsis genome and are particularly prevalent in SR and SR-related protein-coding genes. Sinha et al. 2010; Schindler et al. 2008

Background

The state-of-the-art in silico studies on predicting NAGNAG splice sites were done by Sinha's group, who achieved high balanced specificity and sensitivity for both human and plant species.

• The most informative features they found are the nucleotides in the NAGNAG and in its immediate vicinity, along with the splice site scores.

• The model they trained on human data also achieves a high AUC on plant data, which indicates that NAGNAG splicing in plants is similar to that in animals.

Sinha et al. 2009, 2010

NAGNAG dataset

I tried to predict NAGNAG events (that is, the EI, I or E isoform) with Random Forest, based on the dataset I generated from ASIP.

• Strict criteria were used to identify NAGNAG events in the ASIP database: E and I events need at least 10 supporting ESTs or cDNAs, and EI events need at least 2 ESTs or cDNAs supporting each isoform.

• After removing redundancy, I got 458 EI form alternative NAGNAG introns, 1988 E form constitutive introns and 685 I form constitutive introns in 15 plant species.
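The support thresholds above can be sketched as a small labelling rule. The function name and arguments are hypothetical, and one point is an assumption rather than something the criteria state: a constitutive E or I call here also requires zero transcripts for the other acceptor.

```python
def label_nagnag_intron(e_support, i_support, min_constitutive=10, min_alternative=2):
    """Label an intron from EST/cDNA support counts for each acceptor.

    Assumption: a constitutive call also requires zero transcripts for the
    other acceptor; the stated criteria only give the minimum support.
    """
    if e_support >= min_alternative and i_support >= min_alternative:
        return "EI"
    if e_support >= min_constitutive and i_support == 0:
        return "E"
    if i_support >= min_constitutive and e_support == 0:
        return "I"
    return None  # too little or conflicting evidence: left out of the dataset


print(label_nagnag_intron(12, 0))  # E
print(label_nagnag_intron(0, 15))  # I
print(label_nagnag_intron(3, 2))   # EI
print(label_nagnag_intron(5, 1))   # None
```

Returning None for ambiguous introns keeps weakly supported cases out of all three classes instead of forcing them into one.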

Features

Figure 3. A total of 28 features each represent a nucleotide and thus have four possible values (A, C, G, T). U1, U2, U3 are the first three nucleotides in the upstream exon. D1, D2, D3 are the first three nucleotides in the downstream exon. Because a weak polypyrimidine tract (PPT) can contribute to AS, P1-P20 cover the PPT upstream of the NAGNAG. Finally, intron length is used as an additional feature.
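A rough sketch of how one feature row could be assembled from pre-cut sequence segments. The function name and input layout are assumptions. So is the reading that N1 and N2 (named in Figure 5) are the two variable N positions of the motif, which would bring U1-U3, P1-P20 and D1-D3 up to 28 nucleotide features.

```python
NUCLEOTIDES = set("ACGT")


def nagnag_features(u, ppt, motif, d, intron_length):
    """Build one feature row from pre-cut sequence segments.

    u:     3 upstream-exon nucleotides (U1-U3)
    ppt:   20 nucleotides immediately upstream of the NAGNAG, 5' to 3' (P20-P1)
    motif: the 6-nucleotide NAGNAG
    d:     3 downstream-exon nucleotides (D1-D3)
    N1 and N2 are assumed to be the two variable N positions of the motif.
    """
    if (len(u), len(ppt), len(motif), len(d)) != (3, 20, 6, 3):
        raise ValueError("expected segment lengths 3, 20, 6 and 3")
    if motif[1:3] != "AG" or motif[4:6] != "AG":
        raise ValueError("motif must match NAGNAG")
    if not set(u + ppt + motif + d) <= NUCLEOTIDES:
        raise ValueError("only A, C, G and T are allowed")

    row = {f"U{i + 1}": nt for i, nt in enumerate(u)}
    # ppt is written 5' to 3', so its first character is P20 and its last is P1.
    row.update({f"P{20 - i}": nt for i, nt in enumerate(ppt)})
    row["N1"], row["N2"] = motif[0], motif[3]
    row.update({f"D{i + 1}": nt for i, nt in enumerate(d)})
    row["intron_length"] = intron_length
    return row


# Made-up example segments.
row = nagnag_features("AAG", "TTTTCTTTTCTTTTCTTTTC", "CAGGAG", "GTT", 412)
print(len(row))                         # 29
print(row["P1"], row["N1"], row["N2"])  # C C G
```

The row holds 28 categorical values plus the numeric intron length; a tree-based learner can use the categorical values directly or after one-hot encoding.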

Classifier evaluation

A Random Forest with 200 trees was used, evaluated with 5-fold cross-validation.

TP rate  FP rate  Precision  Recall  F-measure  ROC area  Class
0.992    0.089    0.951      0.992   0.971      0.995     E
0.953    0.023    0.92       0.953   0.936      0.995     I
0.657    0.017    0.87       0.657   0.749      0.967     EI

The evaluation results agree closely with Sinha’s paper (for Physcomitrella), in which AUC = 0.96, 0.99 and 0.98 for the EI, E and I forms, respectively.
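The per-class columns of the evaluation table can be recomputed from a confusion matrix in one-vs-rest fashion. A minimal sketch: the confusion counts are made up for illustration, and only the final consistency check uses precision and recall values copied from the table.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def per_class_metrics(confusion, classes):
    """One-vs-rest metrics; confusion[actual][predicted] holds counts."""
    total = sum(sum(row.values()) for row in confusion.values())
    results = {}
    for c in classes:
        tp = confusion[c][c]
        fn = sum(confusion[c].values()) - tp
        fp = sum(confusion[a][c] for a in classes) - tp
        tn = total - tp - fn - fp
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)  # recall is the same quantity as the TP rate
        results[c] = {
            "tp_rate": recall,
            "fp_rate": fp / (fp + tn),
            "precision": precision,
            "f_measure": f_measure(precision, recall),
        }
    return results


# Hypothetical counts, not the real evaluation output.
toy = {
    "E": {"E": 90, "I": 2, "EI": 8},
    "I": {"E": 1, "I": 45, "EI": 4},
    "EI": {"E": 9, "I": 3, "EI": 38},
}
metrics = per_class_metrics(toy, ["E", "I", "EI"])
print(metrics["E"])

# Consistency check on the reported table: F-measure from precision and recall.
reported = {"E": (0.951, 0.992), "I": (0.92, 0.953), "EI": (0.87, 0.657)}
for name, (p, r) in reported.items():
    print(name, round(f_measure(p, r), 3))  # E 0.971, I 0.936, EI 0.749
```

The recomputed F-measures match the table, and the low recall of the EI class (0.657) is what pulls its F-measure down.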

Figure 4. The EI class, that is, AS, is harder to predict (AUC = 0.967) than the two constitutive variants, E and I (AUC = 0.995 for both).
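For reference, a per-class AUC of this kind can be computed one-vs-rest as the probability that a randomly chosen positive example gets a higher score than a randomly chosen negative one. The sketch below uses that rank-based definition; the scores and labels are invented and do not come from the actual classifier.

```python
def auc(scores, labels):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities for the EI class; label 1 means "is EI".
ei_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
is_ei = [1, 1, 1, 0, 0, 0]
ei_auc = auc(ei_scores, is_ei)
print(round(ei_auc, 3))  # 0.889
```

The pairwise loop is quadratic in the number of examples, which is fine for a dataset of a few thousand introns; a sort-based version would scale better.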

Most informative features

Features on the x-axis, as plotted: N2, N1, P_-1, P_-2, D1, P_-7, D20. The y-axis ticks run from 0.1 to 0.9.

Figure 5. Most informative features according to information gain.
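Information gain for a categorical feature is the class entropy minus the class entropy that remains once the feature value is known. A minimal sketch with toy data; the feature columns and labels are invented for the example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy of a label list, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """H(class) - H(class | feature) for one categorical feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Toy data: one feature separates the classes perfectly, the other not at all.
labels = ["E", "E", "I", "I"]
separating = ["C", "C", "G", "G"]
uninformative = ["T", "C", "T", "C"]
print(information_gain(separating, labels))     # 1.0
print(information_gain(uninformative, labels))  # 0.0
```

A feature that fully determines the class recovers the whole class entropy, while a feature independent of the class scores zero.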

Sequence Logos

Figure 6a-6d. Sequence logos of NAGNAG splice sites. 6a: E sites; 6b: I sites; 6c: EI sites; 6d: all splice sites. Positions 1-3 are U1-U3. Positions 4-23 are P20-P1. Positions 24-29 are the NAGNAG. Positions 30-32 are D1-D3.
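The letter-stack height in a DNA sequence logo is the information content of the column, 2 bits minus the column entropy. A simplified sketch that omits the small-sample correction logo tools usually apply; the aligned sequences are made up.

```python
from collections import Counter
from math import log2


def column_information(sequences):
    """Bits per position: 2 - H(column), without small-sample correction."""
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")
    bits = []
    for pos in range(length):
        counts = Counter(s[pos] for s in sequences)
        n = sum(counts.values())
        h = -sum((c / n) * log2(c / n) for c in counts.values())
        bits.append(2.0 - h)
    return bits


# Made-up NAGNAG motifs: the AG positions are fixed, the N positions vary.
motifs = ["CAGGAG", "TAGCAG", "CAGAAG", "AAGTAG"]
print(column_information(motifs))  # [0.5, 2.0, 2.0, 0.0, 2.0, 2.0]
```

Invariant AG positions reach the full 2 bits, while a position where all four nucleotides are equally frequent contributes nothing.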

Conclusion

• NAGNAG-AS can be predicted with high accuracy. Using carefully constructed training and test datasets, an in silico performance of AUC = 0.967, 0.995 and 0.995 was achieved for the EI, E and I forms, respectively.

• The most informative features are the nucleotides in the NAGNAG and in its immediate vicinity.

• NAGNAG AS in plants is similar to that in animals and is largely dependent on the splice site and its immediate neighborhood.