Overrepresented Segment Strings (Aug/8/2011)
![Page 1: O verrepresented Segment Strings (Aug/8/2011)](https://reader036.vdocuments.mx/reader036/viewer/2022062816/56815232550346895dc0797b/html5/thumbnails/1.jpg)
1
Overrepresented Segment Strings (Aug/8/2011)
Bob Harris
Penn State Center for Comparative Genomics and Bioinformatics
![Page 2: O verrepresented Segment Strings (Aug/8/2011)](https://reader036.vdocuments.mx/reader036/viewer/2022062816/56815232550346895dc0797b/html5/thumbnails/2.jpg)
2
Overview
• Analysis of segmentation sequences, incorporating longer local context
• Update of previous enrichment/depletion plots
– For the round 8 segmentations
![Page 3: O verrepresented Segment Strings (Aug/8/2011)](https://reader036.vdocuments.mx/reader036/viewer/2022062816/56815232550346895dc0797b/html5/thumbnails/3.jpg)
3
Motivation
> segway.k562.coordinated chr10:812820-872329
AOUNDKAGAGXGRXNXNCDUXNYNUNCNCYCYCYNYCYCYCNCYNCYCNXCNCYCNXNYNYNCNCYCNDCYNDYCYCYCNCICICDNXCICIWTMJMTWICYCYNCBDUXRNCURDXNUDUVRGVUAVAGKUVUXGAVARXRDKDVXKXAGAXDXRAXRVKPBPIQBQBQVBQBQLQHQHLQVKQVQVLTLBVUVQVKVLQVQBVLVQVOVQLQLQLQLQLQLHLVUVQLVLQLQLQVLQLQHQLVLQVL
Quick eyeball test using one-character class encoding: A = class 0, B = class 1, … so classes 2, 13, 24 are C, N, Y
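The one-character encoding can be sketched in a few lines of Python. This is an illustration only, assuming class numbers run from 0 to at most 25 so that each maps to a single uppercase letter; the function names `encode_classes` and `decode_classes` are hypothetical and not taken from the slides.

```python
import string

def encode_classes(class_ids):
    """Map segment class numbers to letters: 0 -> 'A', 1 -> 'B', and so on."""
    letters = string.ascii_uppercase
    encoded = []
    for class_id in class_ids:
        if not 0 <= class_id < len(letters):
            raise ValueError(f"class {class_id} has no single-letter code")
        encoded.append(letters[class_id])
    return "".join(encoded)

def decode_classes(text):
    """Inverse mapping; case is ignored so lowercase markers still decode."""
    return [ord(ch) - ord("A") for ch in text.upper()]

print(encode_classes([2, 13, 24]))
print(decode_classes("CNY"))
```

With this mapping the classes 2, 13 and 24 come out as C, N and Y, matching the legend on the slide.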
![Page 4: O verrepresented Segment Strings (Aug/8/2011)](https://reader036.vdocuments.mx/reader036/viewer/2022062816/56815232550346895dc0797b/html5/thumbnails/4.jpg)
4
Redundancy Apparent, but…
• How surprising are the C,N,Y (2,13,24) groups?
– Together these classes have only average probability
– But 1st and 2nd order probabilities favor continuing in this group
> segway.k562.coordinated chr10:812820-872329
AOUNDKAGAGXGRXNXNCDUXNYNUNCNCYCYCYNYCYCYCNCYNCYCNXCNCYCNXNYNYNCNCYCNDCYNDYCYCYCNCICICDNXCICIWTMJMTWICYCYNCBDUXRNCURDXNUDUVRGVUAVAGKUVUXGAVARXRDKDVXKXAGAXDXRAXRVKPBPIQBQBQVBQBQLQHQHLQVKQVQVLTLBVUVQVKVLQVQBVLVQVOVQLQLQLQLQLQLHLVUVQLVLQLQLQVLQLQHQLVLQVL
![Page 5: O verrepresented Segment Strings (Aug/8/2011)](https://reader036.vdocuments.mx/reader036/viewer/2022062816/56815232550346895dc0797b/html5/thumbnails/5.jpg)
5
Overrepresented Strings
• String of 2N segments
• Estimate expected probability with Nth order model
– e.g. pr(ABCD) = pr(AB) pr(C|AB) pr(D|BC)
• “Evaluate” strings with high observed:expected ratio
– Comparison to “features”. In this case RNAseq contigs
• Caveat(?): length of segments ignored
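The observed:expected calculation for N = 2 (length-4 strings, 2nd order model) might look like the sketch below. It rests on several assumptions not stated on the slide: probabilities are maximum-likelihood estimates from the same class sequence, the expected count is the model probability times the number of length-4 windows, and segment lengths are ignored as the caveat says. The names `ngram_counts`, `obs_exp_ratios` and `min_obs` are hypothetical.

```python
from collections import Counter

def ngram_counts(seq, n):
    """Count every run of n consecutive segment classes in seq."""
    return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))

def obs_exp_ratios(seq, min_obs=1):
    """Return {string: (observed, expected, observed/expected)} for length-4 strings.

    Expected uses pr(ABCD) = pr(AB) * pr(C|AB) * pr(D|BC).
    """
    c2, c3, c4 = (ngram_counts(seq, n) for n in (2, 3, 4))
    # Number of trigrams that start with each pair; denominator of pr(C|AB).
    pair_as_prefix = Counter()
    for (a, b, _c), count in c3.items():
        pair_as_prefix[(a, b)] += count
    total_pairs = sum(c2.values())
    total_windows = sum(c4.values())

    results = {}
    for (a, b, c, d), observed in c4.items():
        if observed < min_obs:  # drop rare observations
            continue
        prob = (c2[(a, b)] / total_pairs
                * c3[(a, b, c)] / pair_as_prefix[(a, b)]
                * c3[(b, c, d)] / pair_as_prefix[(b, c)])
        expected = prob * total_windows
        results[(a, b, c, d)] = (observed, expected, observed / expected)
    return results

# Toy sequence of class numbers, for illustration only.
toy = [0, 1, 0, 1, 0, 1, 0, 1]
for string, (obs, exp, ratio) in sorted(obs_exp_ratios(toy).items()):
    print("-".join(map(str, string)), obs, round(exp, 2), round(ratio, 3))
```

Every observed length-4 string contains both of its trigrams, so the conditional-probability denominators are never zero.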
![Page 6: O verrepresented Segment Strings (Aug/8/2011)](https://reader036.vdocuments.mx/reader036/viewer/2022062816/56815232550346895dc0797b/html5/thumbnails/6.jpg)
6
Overrepresented Strings, Example
• Length-4 strings in segway.k562.coordinated
– Highest obs/exp ratio, after eliminating rare observations

| string | #obs’d | #exp’d | obs/exp |
|---|---|---|---|
| 21-10-0-21 | 3761 | 970.80 | 3.874112 |
| 21-0-10-21 | 3561 | 966.65 | 3.683865 |
| 13-23-20-13 | 5227 | 2386.44 | 2.190296 |
| 13-20-23-13 | 5177 | 2371.56 | 2.182953 |
| 13-23-17-13 | 3205 | 1530.04 | 2.094711 |
| 13-17-23-13 | 3156 | 1535.76 | 2.055004 |
| 16-21-11-16 | 4833 | 2466.86 | 1.959174 |
| 14-23-17-14 | 3263 | 1711.13 | 1.906928 |
| 16-11-21-16 | 4629 | 2443.15 | 1.894687 |
| 10-6-0-10 | 6980 | 3686.84 | 1.893222 |
| 14-17-23-14 | 3180 | 1686.41 | 1.885658 |
| 10-0-6-10 | 6846 | 3632.72 | 1.884536 |
| 23-0-6-23 | 3265 | 1748.77 | 1.867023 |
| 23-6-0-23 | 3254 | 1749.80 | 1.859644 |
| 23-6-14-23 | 8780 | 4821.21 | 1.821121 |
| 23-14-6-23 | 8933 | 4927.23 | 1.812985 |
| 24-13-3-24 | 5419 | 3007.67 | 1.801727 |
| 23-0-14-23 | 7142 | 4023.34 | 1.775141 |
| 24-3-13-24 | 5270 | 2987.69 | 1.763906 |
| 23-6-10-3 | 3045 | 1734.93 | 1.755115 |
| 24-3-10-3 | 3192 | 1832.07 | 1.742287 |
| 3-10-6-23 | 3046 | 1751.86 | 1.738724 |
| 23-14-0-23 | 7000 | 4028.87 | 1.737461 |
| 3-10-3-24 | 3126 | 1809.36 | 1.727681 |

…
![Page 7: O verrepresented Segment Strings (Aug/8/2011)](https://reader036.vdocuments.mx/reader036/viewer/2022062816/56815232550346895dc0797b/html5/thumbnails/7.jpg)
7
CSHL RNAseq contigs
• CSHL RNAseq contigs
– ftp://genome.crg.es/pub/Encode/data_analysis/ForDeadZones/Contigs_IDR0.1_CSHL.tar.gz
• Differentiated by cell line (14), compartment (6), RNA fraction (4)
• and attributed to 11 biotypes (gencode v7 exons)
– non coding, protein coding, etc.
– and a 12th type: empty, or “no exon”
• From Sarah Djebali, Felix Schlesinger, Wei Lin
![Page 8: O verrepresented Segment Strings (Aug/8/2011)](https://reader036.vdocuments.mx/reader036/viewer/2022062816/56815232550346895dc0797b/html5/thumbnails/8.jpg)
8
Measuring Enrichment
• V_{f,s} = enrichment of string s for feature f

$$V_{f,s} = \frac{\#\{fs\} \,/\, \#\{s\}}{\#\{f\} \,/\, \#\{F\}}$$

{s} = set of bases covered by string s (in either direction)
{f} = set of bases covered by feature f
{fs} = intersection of {f} and {s}
{F} = union of {f’} for all features f’
# = size of set
• I plot log2(V_{f,s}), fold enrichment
– Or, if negative, fold depletion
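As a worked illustration of the enrichment measure, the sketch below reads the formula as (#{fs} / #{s}) divided by (#{f} / #{F}) and represents each set as explicit base positions. That representation is an assumption made for brevity (a real run over a genome would more likely use interval arithmetic), and the names `enrichment` and `log2_enrichment` are hypothetical.

```python
import math

def enrichment(string_bases, feature_bases, all_feature_bases):
    """V_{f,s}: share of the string's bases in feature f, relative to f's share of F."""
    s = set(string_bases)
    f = set(feature_bases)
    big_f = set(all_feature_bases)
    if not s or not f or not big_f:
        raise ValueError("each of {s}, {f} and {F} must be non-empty")
    fs = f & s
    return (len(fs) / len(s)) / (len(f) / len(big_f))

def log2_enrichment(v):
    """log2 fold enrichment; None when the string never overlaps the feature."""
    return math.log2(v) if v > 0 else None

# Toy numbers: 100 string bases, half of them inside a 100-base feature,
# with all features together covering 400 bases.
v = enrichment(range(0, 100), range(50, 150), range(0, 400))
print(v, log2_enrichment(v))
```

Returning `None` for a zero overlap is one way to represent a cell with no occurrences, since log2(0) is undefined.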
![Page 9: O verrepresented Segment Strings (Aug/8/2011)](https://reader036.vdocuments.mx/reader036/viewer/2022062816/56815232550346895dc0797b/html5/thumbnails/9.jpg)
9
Single-segment Enrichment
segway.k562.coordinated vs CSHL RNAseq contigs
white = no occurrences
![Page 10: O verrepresented Segment Strings (Aug/8/2011)](https://reader036.vdocuments.mx/reader036/viewer/2022062816/56815232550346895dc0797b/html5/thumbnails/10.jpg)
10
Length-4 Strings Enrichment
segway.k562.coordinated vs CSHL RNAseq contigs (highest observed/expected strings)
white = no occurrences
![Page 11: O verrepresented Segment Strings (Aug/8/2011)](https://reader036.vdocuments.mx/reader036/viewer/2022062816/56815232550346895dc0797b/html5/thumbnails/11.jpg)
11
Length-4 Strings Enrichment
segway.k562.coordinated vs CSHL RNAseq contigs (highest observed/expected strings)
![Page 12: O verrepresented Segment Strings (Aug/8/2011)](https://reader036.vdocuments.mx/reader036/viewer/2022062816/56815232550346895dc0797b/html5/thumbnails/12.jpg)
12
To Do
• Incorporate single-segment enrichment into evaluation of multi-segment strings
• Longer strings
• Run on all 14 round 8 segmentations
– And the bake-off composites
![Page 13: O verrepresented Segment Strings (Aug/8/2011)](https://reader036.vdocuments.mx/reader036/viewer/2022062816/56815232550346895dc0797b/html5/thumbnails/13.jpg)
13
Aligning Class Sequences
• Work in progress, with these questions…
• Do longer, highly similar sequences indicate similar function?
segway.k562.coordinated chr10:88422790-88427017 CYCNCYNCNYNCNCNCNCN
segway.k562.coordinated chr13:113696011-113701344 CYCNCYNCNYNCNCNCNCN
• Or do small changes indicate functional differences?
segway.k562.coordinated chr10:133868081-133875219 NCNXnXNXNXNCYNCNCNCNXNCN
segway.k562.coordinated chr13:113638232-113645027- NCNXoXNXNXNCYNCNCNCNXNCN
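The single-class difference in the second pair can be made explicit with a position-by-position comparison. This is a minimal sketch, assuming the two class strings already have equal length and that the lowercase letters only mark the position of interest; `class_differences` is a hypothetical helper, not part of the work described on the slides.

```python
def class_differences(a, b):
    """List (index, class_in_a, class_in_b) wherever two equal-length class strings differ."""
    if len(a) != len(b):
        raise ValueError("class strings must have the same length")
    # Compare case-insensitively, since lowercase is used only as a visual marker.
    return [(i, x, y) for i, (x, y) in enumerate(zip(a.upper(), b.upper())) if x != y]

first = "NCNXnXNXNXNCYNCNCNCNXNCN"
second = "NCNXoXNXNXNCYNCNCNCNXNCN"
for index, x, y in class_differences(first, second):
    # Letters map back to class numbers via A = 0, B = 1, ...
    print(index, x, ord(x) - ord("A"), y, ord(y) - ord("A"))
```

For this pair the only difference is at index 4, where class N (13) in the chr10 string is replaced by class O (14) in the chr13 string.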
![Page 14: O verrepresented Segment Strings (Aug/8/2011)](https://reader036.vdocuments.mx/reader036/viewer/2022062816/56815232550346895dc0797b/html5/thumbnails/14.jpg)
14
Aligning Class Sequences
• Do longer, highly similar sequences indicate similar function?
![Page 15: O verrepresented Segment Strings (Aug/8/2011)](https://reader036.vdocuments.mx/reader036/viewer/2022062816/56815232550346895dc0797b/html5/thumbnails/15.jpg)
15
Aligning Class Sequences
• Or do small changes indicate functional differences?
![Page 16: O verrepresented Segment Strings (Aug/8/2011)](https://reader036.vdocuments.mx/reader036/viewer/2022062816/56815232550346895dc0797b/html5/thumbnails/16.jpg)
16
Alignments
• Confounded by presence of 2- and 3-segment cycles
– Implement separate search for short repeated cycles
– Then align with those masked
• Should incorporate segment lengths
• May be better to align in peak space
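The search for short repeated cycles, followed by masking, could be sketched as below. This is one possible approach under stated assumptions, not the planned implementation: a cycle is taken to be a run in which every class equals the class `period` positions later, a run must span at least `min_repeats` full periods to count, and masked positions are replaced by `.`. The names `find_cycles` and `mask_cycles` and all default values are hypothetical.

```python
def find_cycles(seq, period, min_repeats=3):
    """Return half-open (start, end) spans where seq repeats with the given period."""
    spans = []
    n = len(seq)
    i = 0
    while i + period < n:
        if seq[i] != seq[i + period]:
            i += 1
            continue
        # Extend the run for as long as each class matches the one a period later.
        j = i
        while j + period < n and seq[j] == seq[j + period]:
            j += 1
        end = j + period
        if end - i >= period * min_repeats:
            spans.append((i, end))
        i = j + 1
    return spans

def mask_cycles(seq, periods=(2, 3), min_repeats=3, mask_char="."):
    """Replace every position inside a detected 2- or 3-segment cycle with mask_char."""
    chars = list(seq)
    for period in periods:
        for start, end in find_cycles(seq, period, min_repeats):
            for k in range(start, end):
                chars[k] = mask_char
    return "".join(chars)

print(mask_cycles("ABQLQLQLQLXY"))
print(mask_cycles("XABCABCABCY"))
```

The masked strings could then be passed to whatever aligner is used, so that long QLQL-style runs no longer dominate the alignment score.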
![Page 17: O verrepresented Segment Strings (Aug/8/2011)](https://reader036.vdocuments.mx/reader036/viewer/2022062816/56815232550346895dc0797b/html5/thumbnails/17.jpg)
17
Appendix
• The following slides show single-segment enrichment heatmaps for all 14 round 8 segmentations
![Page 18: O verrepresented Segment Strings (Aug/8/2011)](https://reader036.vdocuments.mx/reader036/viewer/2022062816/56815232550346895dc0797b/html5/thumbnails/18.jpg)
18
Single-segment Enrichment
segway.gm12878.coordinated vs CSHL RNAseq contigs
white = no occurrences
![Page 19: O verrepresented Segment Strings (Aug/8/2011)](https://reader036.vdocuments.mx/reader036/viewer/2022062816/56815232550346895dc0797b/html5/thumbnails/19.jpg)
19
Single-segment Enrichment
segway.h1hesc.coordinated vs CSHL RNAseq contigs
white = no occurrences
![Page 20: O verrepresented Segment Strings (Aug/8/2011)](https://reader036.vdocuments.mx/reader036/viewer/2022062816/56815232550346895dc0797b/html5/thumbnails/20.jpg)
20
Single-segment Enrichment
segway.helas3.coordinated vs CSHL RNAseq contigs
white = no occurrences
![Page 21: O verrepresented Segment Strings (Aug/8/2011)](https://reader036.vdocuments.mx/reader036/viewer/2022062816/56815232550346895dc0797b/html5/thumbnails/21.jpg)
21
Single-segment Enrichment
segway.hepg2.coordinated vs CSHL RNAseq contigs
white = no occurrences
![Page 22: O verrepresented Segment Strings (Aug/8/2011)](https://reader036.vdocuments.mx/reader036/viewer/2022062816/56815232550346895dc0797b/html5/thumbnails/22.jpg)
22
Single-segment Enrichment
segway.huvec.coordinated vs CSHL RNAseq contigs
white = no occurrences
![Page 23: O verrepresented Segment Strings (Aug/8/2011)](https://reader036.vdocuments.mx/reader036/viewer/2022062816/56815232550346895dc0797b/html5/thumbnails/23.jpg)
23
Single-segment Enrichment
segway.k562.all vs CSHL RNAseq contigs
white = no occurrences
![Page 24: O verrepresented Segment Strings (Aug/8/2011)](https://reader036.vdocuments.mx/reader036/viewer/2022062816/56815232550346895dc0797b/html5/thumbnails/24.jpg)
24
Single-segment Enrichment
segway.k562.coordinated vs CSHL RNAseq contigs
white = no occurrences
![Page 25: O verrepresented Segment Strings (Aug/8/2011)](https://reader036.vdocuments.mx/reader036/viewer/2022062816/56815232550346895dc0797b/html5/thumbnails/25.jpg)
25
Single-segment Enrichment
segway.tier1-2.coordinated vs CSHL RNAseq contigs
white = no occurrences
![Page 26: O verrepresented Segment Strings (Aug/8/2011)](https://reader036.vdocuments.mx/reader036/viewer/2022062816/56815232550346895dc0797b/html5/thumbnails/26.jpg)
26
Single-segment Enrichment
chromhmm.GM12878_concatenate_25 vs CSHL RNAseq contigs
white = no occurrences
![Page 27: O verrepresented Segment Strings (Aug/8/2011)](https://reader036.vdocuments.mx/reader036/viewer/2022062816/56815232550346895dc0797b/html5/thumbnails/27.jpg)
27
Single-segment Enrichment
chromhmm.H1_concatenate_25 vs CSHL RNAseq contigs
white = no occurrences
![Page 28: O verrepresented Segment Strings (Aug/8/2011)](https://reader036.vdocuments.mx/reader036/viewer/2022062816/56815232550346895dc0797b/html5/thumbnails/28.jpg)
28
Single-segment Enrichment
chromhmm.HELA_concatenate_25 vs CSHL RNAseq contigs
white = no occurrences
![Page 29: O verrepresented Segment Strings (Aug/8/2011)](https://reader036.vdocuments.mx/reader036/viewer/2022062816/56815232550346895dc0797b/html5/thumbnails/29.jpg)
29
Single-segment Enrichment
chromhmm.HEPG2_concatenate_25 vs CSHL RNAseq contigs
white = no occurrences
![Page 30: O verrepresented Segment Strings (Aug/8/2011)](https://reader036.vdocuments.mx/reader036/viewer/2022062816/56815232550346895dc0797b/html5/thumbnails/30.jpg)
30
Single-segment Enrichment
chromhmm.HUVEC_concatenate_25 vs CSHL RNAseq contigs
white = no occurrences
![Page 31: O verrepresented Segment Strings (Aug/8/2011)](https://reader036.vdocuments.mx/reader036/viewer/2022062816/56815232550346895dc0797b/html5/thumbnails/31.jpg)
31
Single-segment Enrichment
chromhmm.K562_concatenate_25 vs CSHL RNAseq contigs
white = no occurrences