
Regulation of Alternative Splicing

Jihye Kim

Oral Preliminary Exam (May 7, 2007)

Outline

• Alternative Splicing Overview
• Goal : Investigate “regulation” of AS
• Method : Association Rule Mining
• Part I : Finding association rules of cis-regulatory elements involved in alternative splicing
• Part II : Cis-regulatory Motif Combinations Associated with Tissue-specific Alternative Splicing
• Summary
• Future Work

Splicing

• Introns are removed and flanking exons are concatenated

• Spliceosome

- snRNPs and other proteins

[image from http://fig.cox.miami.edu/~cmallery/150/gene/c7.17.11.spliceosome.jpg]

Splice Sites

• Recognized by the spliceosome
• Splice-site signals alone are too weak to predict intron locations accurately

[image from http://web-books.com/MoBio/Free/Ch5A4.htm]

5’ 3’

Splicing Factors and Binding Sites

• Help the spliceosome identify splice sites
• Splicing factors
  – SR (serine/arginine-rich) proteins
• Exonic and intronic enhancers and silencers (cis-acting)
  – ESE (A/G rich motifs), ESS (hnRNP), ISE (G triples, UGCAUG), ISS

[Source from Katherina Kechris in Rocky’05 Conference]

Exon Exon 2

Alternative Splicing

• Occurs in over 70% of human genes
• Major mechanism to generate protein diversity
• Highly relevant to disease
  – 15% of disease-causing mutations affect splicing [Krawczak 1992]

[Krawczak 1992] Krawczak, M., Reiss, J., and Cooper, D.N. 1992 Hum. Genet. 90: 41-54

protein

Pre-mRNA

mRNA

Types of Alternative Splicing

[Source from Cartegni et al. 2002]

Cassette Exon

Investigating Alternative Splicing

• Traditionally, align ESTs and mRNAs to genomic sequences

• Recently, microarray technology (splice arrays)
  – Exon skipping is measured
  – Hard to measure other types of AS

Previous Work on AS Regulation

• Most methods
  – use only sequence data
  – focus on the effect of individual motifs
• Brain-specific exon skipping [Brudno 2001]
  – 25 brain-specific cassette exons from literature
  – Over-representation of UGCAUG in downstream intron
• RESCUE-ESE [Fairbrother 2002]
  – Hexamers frequent in exons flanked by weak splice sites
  – 10 ESE motifs show enhancer activity in experiments

[Brudno 2001] Brudno M., Gelfand M.S., et al., 2001 NAR 29(11):2338-2348
[Fairbrother 2002] Fairbrother W.G., et al., 2002 Science 297(5583):1007-1013

What We Have Done So Far

• Investigate cis-regulatory motifs that influence amount of AS or tissue-specific AS
  – Use mouse splice array data
  – Apply Association Rule Mining
  – Investigate motif combinations involved in tissue-specific AS

[Jihye Kim, Sihui Zhao, Steffen Heber, “Finding association rules of cis-regulatory elements involved in alternative splicing”, Proceedings of the 45th annual southeast regional conference (ACM-SE) pp. 232 – 237]

[Jihye Kim, Sihui Zhao, Steffen Heber, “Cis-regulatory Motif Combinations Associated with Tissue-specific Alternative Splicing”, 7th workshop on Algorithms in Bioinformatics (WABI 2007) (submitted)]

AS Datasets in Mouse

• Dataset
  – Splice Array [Pan 2004] with 6 probes
  – 3126 exon skipping genes in mouse
  – %ASex : percentage of exon skipping in 10 tissues

[Pan 2004] Pan, Q., et al., 2004 Mol Cell 16(6):929-942

Aim I-I : representing data context

Association Rule Mining

• Introduced by Agrawal et al. in 1993
• Initially used for Market Basket Analysis
• An association rule is a pattern stating that when X occurs, Y occurs with a certain probability
• X : antecedent (left-hand side, lhs), Y : consequent (right-hand side, rhs)
• Goal: Find all rules that satisfy the user-specified minimum support (minsup) and minimum confidence (minconf)

X => Y

Rule Strength Measures

• Given a rule X => Y,
  – Support = Pr(X∧Y)
  – Confidence = Pr(Y | X)
  – Lift = Pr(X∧Y) / (Pr(X) Pr(Y))
    • Measures the dependency between lhs and rhs
    • Generally, lhs and rhs are positively dependent if lift > 1.0

ARM Example

Cart 1 : Milk, Bread, Diaper, Beer, Jam, Banana

Cart 2 : Beer, Nuts, Tissue, Diaper

Cart 3 : Apple, Beer

Cart 4 : Jam, Beer, Diaper

Cart 5 : Bread, Butter, Tissue, Jam

Min supp = 0.5 Min conf = 0.7

Frequent Itemset = itemset whose support ≥ Min supp

Frequent Itemsets (support)

Bread (2/5 < 0.5): not frequent

Beer (0.8), Jam (0.6), Diaper (0.6)

{Beer, Diaper} (0.6)

Association Rules (confidence)

Beer => Jam (2/4 < 0.7): rejected

Beer => Diaper (0.75)
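The numbers in this example can be checked by computing the three rule measures directly from the five carts. The sketch below is only an illustration: the helper names `count`, `support`, `confidence`, and `lift` are made up here and do not come from any ARM package.

```python
# Minimal sketch: rule measures on the five example carts.
# All helper names are illustrative assumptions, not a library API.
carts = [
    {"Milk", "Bread", "Diaper", "Beer", "Jam", "Banana"},
    {"Beer", "Nuts", "Tissue", "Diaper"},
    {"Apple", "Beer"},
    {"Jam", "Beer", "Diaper"},
    {"Bread", "Butter", "Tissue", "Jam"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)


def support(itemset, transactions):
    """Pr(itemset): fraction of transactions containing the itemset."""
    return count(itemset, transactions) / len(transactions)


def confidence(lhs, rhs, transactions):
    """Pr(rhs | lhs), computed from raw counts."""
    both = count(set(lhs) | set(rhs), transactions)
    return both / count(lhs, transactions)


def lift(lhs, rhs, transactions):
    """Pr(lhs and rhs) / (Pr(lhs) * Pr(rhs)) = confidence / Pr(rhs)."""
    return confidence(lhs, rhs, transactions) / support(rhs, transactions)


for lhs, rhs in [({"Beer"}, {"Jam"}), ({"Beer"}, {"Diaper"})]:
    print(sorted(lhs), "=>", sorted(rhs),
          "supp=%.2f conf=%.2f lift=%.2f" % (
              support(lhs | rhs, carts),
              confidence(lhs, rhs, carts),
              lift(lhs, rhs, carts)))
```

With these carts, Beer => Diaper has support 0.6 and confidence 0.75, so it passes both thresholds, while Beer => Jam has support 0.4 and confidence 0.5 and fails both.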

Apriori Algorithm

• Most widely used ARM algorithm
• Two steps:
  – Find all itemsets that satisfy min_supp (frequent itemsets)
    • Any subset of a frequent itemset is also frequent
    • Find all 1-item frequent itemsets; then all 2-item frequent itemsets, and so on
  – Generate rules
    • A => B is an association rule if Confidence(A => B) ≥ min_conf
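The two steps can be sketched in a few lines of Python. This is a small teaching version under stated assumptions, not the implementation used in this work: the function names `apriori` and `generate_rules` are hypothetical, support is recounted by scanning all transactions, and no attempt is made at efficiency.

```python
from itertools import combinations

# The five carts from the market-basket example.
carts = [
    {"Milk", "Bread", "Diaper", "Beer", "Jam", "Banana"},
    {"Beer", "Nuts", "Tissue", "Diaper"},
    {"Apple", "Beer"},
    {"Jam", "Beer", "Diaper"},
    {"Bread", "Butter", "Tissue", "Jam"},
]


def apriori(transactions, min_supp):
    """Step 1: return {itemset: support} for itemsets with support >= min_supp."""
    n = len(transactions)

    def supp(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    level = {frozenset([i]) for i in items}
    level = {s for s in level if supp(s) >= min_supp}
    frequent = {}
    k = 1
    while level:
        frequent.update((s, supp(s)) for s in level)
        # Join frequent k-itemsets into (k+1)-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Prune candidates that have an infrequent k-item subset.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k))
        }
        level = {c for c in candidates if supp(c) >= min_supp}
        k += 1
    return frequent


def generate_rules(frequent, min_conf):
    """Step 2: return (lhs, rhs, confidence) for rules with confidence >= min_conf."""
    rules = []
    for itemset, s in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = s / frequent[lhs]  # lhs is frequent by downward closure
                if conf >= min_conf:
                    rules.append((lhs, itemset - lhs, conf))
    return rules


frequent = apriori(carts, min_supp=0.5)
for lhs, rhs, conf in generate_rules(frequent, min_conf=0.7):
    print(sorted(lhs), "=>", sorted(rhs), round(conf, 2))
```

On the five carts this sketch finds the frequent itemsets Beer, Jam, Diaper, and {Beer, Diaper}, and it reports both Beer => Diaper and the reverse rule Diaper => Beer.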

Part I : Finding association rules of cis-regulatory elements involved in alternative splicing

[Proceedings of the 45th annual southeast regional conference (ACM-SE) Winston-Salem, North Carolina pp. 232 – 237]

K-mers Around Cassette Exon (items)

• Pre-mRNA sequences
  – Transcripts from NCBI
  – BLAT to align transcripts to mouse genome
  – 200 bps from 7 regions around cassette exon
  – 2565 genes in total
• Items (6-mers) : AAAAAA to TTTTTT in region 1 … 7

Aim I-I : representing data context
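One way to turn the seven region sequences of a gene into a transaction of items is sketched below. It is an assumption-laden illustration: the function name `region_items`, the `region_kmer` label format (modeled on labels such as 7_TGAAGA that appear in the results), and the rule of skipping k-mers with ambiguous bases are choices made for this example, and the toy sequences are invented.

```python
def region_items(regions, k=6):
    """Return the set of region-tagged k-mers for one gene.

    `regions` is a list of sequences, one per region, in region order
    (seven 200-bp regions in the setting of this talk). Each item is
    labeled "<region number>_<k-mer>".
    """
    items = set()
    for idx, seq in enumerate(regions, start=1):
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Assumption: k-mers containing ambiguous bases (e.g. N) are skipped.
            if set(kmer) <= set("ACGT"):
                items.add(f"{idx}_{kmer}")
    return items


# Toy input with two short, made-up regions instead of seven 200-bp ones.
toy_gene = ["AAAAAAT", "ACGTNACGTAC"]
print(sorted(region_items(toy_gene)))
```

Each gene then contributes one transaction, and the same hexamer in two different regions counts as two different items.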

ARM in Finding AS Motif Rule

• Items : all possible hexamers (motifs)
• Transactions : 2565 AS genes
• Goal : finding motif association rules in AS genes (e.g., AGGATA => TTAGCT)
• By Apriori algorithm [Agrawal 1993]

Find All Frequent Hexamers

Generate Hexamer Rules

[Agrawal 1993] Agrawal R., Imielinski T., Swami AN., 1993 SIGMOD 22(2):207-216

ARM Example

[Example]

Seq 1 : ACGATTAGG

Seq 2 : GAATAGG

Seq 3 : TGCAGG

Seq 4 : GGATTAGG

Seq 5 : CAGAT

Min support = 0.5

Min confidence = 0.7

- Frequent 3-mer sets (support): AGG (0.8), GAT (0.6), TAG (0.6), {AGG, TAG} (0.6)

- Rules (confidence)
  AGG => GAT : conf = 2 / 4 = 0.5 < minconf
  AGG => TAG (0.75)
  TAG => AGG (1.0)
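The toy example can be reproduced with a short script that treats each sequence as a transaction whose items are its distinct 3-mers. This is a brute-force sketch limited to single motifs and motif pairs; the helper name `kmers` and the variable names are assumptions made for the illustration.

```python
from collections import Counter
from itertools import combinations

MIN_SUPP, MIN_CONF = 0.5, 0.7
seqs = ["ACGATTAGG", "GAATAGG", "TGCAGG", "GGATTAGG", "CAGAT"]


def kmers(seq, k=3):
    """Set of distinct k-mers occurring in `seq`."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


transactions = [kmers(s) for s in seqs]
n = len(transactions)

# Count in how many sequences each 3-mer and each pair of 3-mers occurs.
single = Counter(m for t in transactions for m in t)
pair = Counter(frozenset(p) for t in transactions
               for p in combinations(sorted(t), 2))

frequent_single = {m: c / n for m, c in single.items() if c / n >= MIN_SUPP}
frequent_pair = {p: c / n for p, c in pair.items() if c / n >= MIN_SUPP}

# Rules lhs => rhs from frequent pairs, kept if confidence >= MIN_CONF.
rules = {}
for p, c in pair.items():
    if c / n < MIN_SUPP:
        continue
    for lhs in p:
        (rhs,) = p - {lhs}
        conf = c / single[lhs]
        if conf >= MIN_CONF:
            rules[(lhs, rhs)] = conf

print(frequent_single)
print(frequent_pair)
print(rules)
```

The only frequent pair is {AGG, TAG}, which yields AGG => TAG (0.75) and TAG => AGG (1.0); AGG and GAT co-occur in only two of five sequences, so no rule between them is generated.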

Motif Association Rules from AS Genes

1 2 3 4 5 6 7

Frequent 6-mers, Minsup = 0.05 (129 genes)

- 7_TGAAGA, 7_GAAGAA (ASF/SF2, SRp55)
- 6_TTTTCT, 6_AATAAA, …
- Among 6,000 6-mers, 1/3 are in AEDB
- Candidates of regulatory motifs

Association Rules, Minconf = 0.4

- 7_AAAAAT => 7_TGAAGA, 7_AAAGGA => 7_AGAAGA,
- 7_GAAAAA => 7_AAGAAG, 7_CTGCCT => 7_CTGGAG,
- 7_AGGAAA => 7_AAGAAG, 7_AATAAA => 7_AAGAAG
- Candidates of regulatory combinations for AS

Aim I-II : finding motif association rules for all AS genes

Clustering by AS Pattern in 10 Tissues

• Hypothesis : Motif combinations “cause” AS profile
• Cluster genes based on AS profile. We use
  – Euclidean distance / Correlation
  – Average linkage clustering
• Frequent 6-mers in cluster are motif candidates

Aim I-III : finding motif association rules for cluster

Association Rules from Clusters

1 2 3 4 5 6 7

• Lift (X => Y) > 2.0
• Comparison with outside the cluster (p-value < 2.13e-10)
• Association rules are candidates of motif combinations for the corresponding AS pattern

Correlation based clusters

Aim I-III : finding motif association rules for cluster

Part II : Cis-regulatory Motif Combinations Associated with Tissue-specific Alternative Splicing

[7th workshop on Algorithms in Bioinformatics (WABI 2007) (submitted)]

Finding Motifs Involved in Tissue-Specific AS

• Items :
  – hexamers in gene regions and
  – exon skipping rate in tissues
• Transactions :
  – 2565 genes from Pan’s data set
• Goal : find associations, e.g. AGGATA in cassette exon => High exon skipping in Brain
• We focus on complex rules, e.g. {AGGATA in cassette exon, CCTGCG in downstream intron} => High exon skipping in Brain

Aim II-I : finding motif association rules for tissue-specific AS

AS profile items

• Use quartiles to convert numeric %ASex values to categorical AS profile items
  – BrainLow : The first %ASex quartile in Brain
  – BrainHigh : The last %ASex quartile in Brain

[Figure labels: BrainLow, BrainHigh]
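A possible discretization along these lines is sketched below. It is not the procedure from the paper: the function name `as_profile_items`, the inclusive boundary handling (values equal to a quartile cut point are assigned to Low or High), the default quartile method of `statistics.quantiles`, and the toy %ASex values are all assumptions made for illustration.

```python
from statistics import quantiles


def as_profile_items(pct_asex):
    """Convert per-tissue %ASex values into categorical items per gene.

    `pct_asex` maps a tissue name to a list of %ASex values, one per gene,
    with the same gene order in every tissue. Returns one set of items per
    gene, e.g. {"BrainLow", "HeartHigh"}; genes in the middle two quartiles
    of a tissue get no item for that tissue.
    """
    n_genes = len(next(iter(pct_asex.values())))
    items = [set() for _ in range(n_genes)]
    for tissue, values in pct_asex.items():
        q1, _median, q3 = quantiles(values, n=4)  # three quartile cut points
        for gene, value in enumerate(values):
            if value <= q1:
                items[gene].add(f"{tissue}Low")
            elif value >= q3:
                items[gene].add(f"{tissue}High")
    return items


# Toy data: made-up %ASex values for 8 genes in two tissues.
toy = {
    "Brain": [5, 10, 20, 30, 40, 50, 60, 80],
    "Heart": [80, 60, 50, 40, 30, 20, 10, 5],
}
for gene, gene_items in enumerate(as_profile_items(toy), start=1):
    print(gene, sorted(gene_items))
```

The resulting items can be added to each gene's transaction alongside its hexamer items, so that rules with a profile item on the right-hand side can be mined.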

Motif Combination ARM Example

[Sequence] + [AS profile]

Seq 1 : ACGATTAGG   BH, HH
Seq 2 : GAATAGG     BH, HL
Seq 3 : TGCAGG      BH, HH
Seq 4 : GGATTAGG    BL, HH
Seq 5 : CAGAT       BH, HL

BH : BrainHigh, BL : BrainLow, HH : HeartHigh, HL : HeartLow

Min support = 0.5

Min confidence = 0.7

Motif Combination ARM Example
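The figure for this slide did not survive in the transcript, so the following is only a sketch of how the combined example could be worked through, not a reconstruction of what the slide showed. It assumes that the i-th AS profile belongs to the i-th sequence, restricts the left-hand side to one or two 3-mers and the right-hand side to a single AS profile item, and uses made-up helper names (`kmers`, `count`).

```python
from itertools import combinations

MIN_SUPP, MIN_CONF = 0.5, 0.7
seqs = ["ACGATTAGG", "GAATAGG", "TGCAGG", "GGATTAGG", "CAGAT"]
# Assumption: profile i belongs to sequence i (BH/BL = Brain, HH/HL = Heart).
profiles = [{"BH", "HH"}, {"BH", "HL"}, {"BH", "HH"}, {"BL", "HH"}, {"BH", "HL"}]


def kmers(seq, k=3):
    """Set of distinct k-mers occurring in `seq`."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


motif_sets = [kmers(s) for s in seqs]
n = len(seqs)


def count(motifs, profile_item=None):
    """Genes containing all `motifs` and, if given, the AS profile item."""
    return sum(1 for m, p in zip(motif_sets, profiles)
               if motifs <= m and (profile_item is None or profile_item in p))


all_motifs = sorted(set().union(*motif_sets))
all_profile_items = sorted(set().union(*profiles))

# rules[(lhs motifs, rhs item)] = (support, confidence, lift)
rules = {}
for size in (1, 2):
    for lhs in combinations(all_motifs, size):
        lhs_count = count(frozenset(lhs))
        if lhs_count == 0:
            continue
        for rhs in all_profile_items:
            both = count(frozenset(lhs), rhs)
            supp, conf = both / n, both / lhs_count
            if supp >= MIN_SUPP and conf >= MIN_CONF:
                rhs_supp = count(frozenset(), rhs) / n
                rules[(lhs, rhs)] = (supp, conf, conf / rhs_supp)

for (lhs, rhs), (supp, conf, lift) in sorted(rules.items()):
    print(set(lhs), "=>", rhs, f"supp={supp:.2f} conf={conf:.2f} lift={lift:.2f}")
```

Under these assumptions only AGG => BH and AGG => HH pass the support and confidence thresholds. The first has lift below 1 because BH is already present in four of the five genes, which illustrates why a minimum lift is a useful additional filter.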

Tissue-Specific AS Motif Combinations

• With strict thresholds
  – Min_supp = 0.01, Min_conf = 0.5, Min_lift = 1.2
  – MinLen of lhs = 2 (for complex rule)
• Rule appearance
  – lhs : hexamers, rhs : AS profile items
• 197 association rules are found in total
• 27 complex rules are found
  – lhs : combinations of 34 frequent hexamers, rhs : AS profile items in tissues
  – All rules have >1.9 lift
  – 23 rules show motif combinations in different regions

Aim II-I : finding motif association rules for tissue-specific AS

Antecedent Consequent Support Confidence Lift

{X4_GCTGGA, X4_TGCTGG} {IntestineLow} 0.016 0.519 2.006

{X4_GCTGGA, X4_TGCTGG} {LungLow} 0.016 0.506 1.961

{X4_TGCTGG, X4_CTGGAG} {IntestineLow} 0.011 0.539 2.083

{X4_TGCTGG, X4_CTGGAG} {LungLow} 0.010 0.5 1.937

{X5_TTTTTA, X7_AGAGGA} {HeartHigh} 0.010 0.510 2.043

{X1_AGCAGC, X5_TTTTTA} {MuscleHigh} 0.010 0.54 2.220

{X1_GAGCAG, X3_TTTTAA} {MuscleHigh} 0.010 0.510 2.096

{X1_GAGCAG, X3_TTCTTT} {LiverHigh} 0.013 0.508 2.048

{X4_AGAAGA, X5_TTATTT} {SalivaryLow} 0.011 0.528 2.066

{X4_AGAAGA, X5_TTATTT} {HeartLow} 0.011 0.528 2.075

{X4_AGAAGA, X5_TTATTT} {KidneyLow} 0.011 0.528 2.023

{X4_AGAAGA, X5_TTATTT} {LiverLow} 0.011 0.528 2.041

{X3_ATTTTT, X6_TTCCTG} {SalivaryHigh} 0.011 0.509 2.031

{X3_TTGTTT, X6_TGTCTC} {LiverHigh} 0.011 0.5 2.017

{X2_GCCTGG, X3_CCTCTG} {LiverLow} 0.011 0.542 2.092

{X2_GTGGGG, X5_TTGTTT} {MuscleHigh} 0.013 0.516 2.120

{X5_ATTTTA, X6_TGCTGT} {SalivaryHigh} 0.010 0.510 2.034

{X5_TCTTTT, X6_TTGTCT} {SalivaryHigh} 0.010 0.634 2.530

{X3_TCTGTT, X6_TTGTCT} {HeartHigh} 0.012 0.527 2.110

{X5_TTTTTA, X6_TTGTCT} {HeartHigh} 0.014 0.507 2.032

{X3_CTCTTT, X5_TTAAAA} {KidneyHigh} 0.010 0.5 2.042

{X2_GGGTGG, X5_TTATTT} {SalivaryHigh} 0.011 0.510 2.032

{X5_TCTTTT, X6_TTTTCA} {IntestineHigh} 0.011 0.5 2.007

{X3_TTTATT, X6_TTTCCT} {IntestineHigh} 0.014 0.522 2.094

{X5_TCTTTT, X5_TTATTT, X5_TTTTTA} {HeartHigh} 0.010 0.5 2.004

{X5_TTCTTT, X5_TATTTT, X5_TTTTCT} {SalivaryHigh} 0.011 0.527 2.104

{X3_TATTTT, X3_ATTTTT, X5_TTGTTT} {BrainHigh} 0.011 0.510 2.084

1 2 3 4 5 6 7

Aim II-I : finding motif association rules for tissue-specific AS

{5_TTTTTA, 7_AGAGGA} => {HeartHigh}

AS Profile of Motif Combinations

Aim II-II : analyzing motif combination

1 2 3 4 5 6 7

Summary of Graphs

• In some cases, genes containing a single motif show no difference in AS profile from all AS genes
• Often, however, genes containing all motifs of a combination show significantly changed exon skipping levels
• Combinations of cis-regulatory motifs can influence the AS profile in tissues

Comparison with AEDB

• AEDB in EBI
  – Transcript regulatory sequences from literature
  – 292 enhancers and silencers
• >60% of extracted frequent hexamers are part of AEDB motifs
• >97% of hexamers involved in complex rules are part of AEDB motifs

Summary

• Association rule mining (ARM) applied

• Finding motif association rules for AS

• Finding motif association rules for AS clusters

• Finding motif combinations for tissue-specific AS

Future Work

Improve method

• Improve motif representation, e.g.
  – variable motif length, gapped k-mers
  – results from motif finding tools
• Improve AS profile representation
• Add more features, e.g.
  – position and distance between motifs
  – splice site
  – exon / intron length
  – conservation, gene information
• Statistical analysis
  – Thresholds
  – Multiple testing

Future Work

• Systematic analysis of simple & complex motifs
• Other data sources
  – Human splice array [Johnson 2003]
  – ESTs
• Investigate discovered motifs
  – Apply motif discovery tools
  – Analyze genome occurrence
  – Analyze gene and protein structure
• Build predictive model and apply it (if I have enough time)
• Experimental verification

[Johnson 2003] Science. 2003 Dec 19;302(5653):2141-4

Acknowledgements

• Dr. Steffen Heber

• Dr. Eric A. Stone

• Dr. Zhao-Bang Zeng

• Dr. Barbara Sherry

• Sihui Zhao

• Li Zhang

• Hyunmin Kim

THANK YOU