Presenting results
Laura Biggins
v1.0
1
TRANSCRIPT
2
I have my results in a table… what next?
Plot everything?
3
Artefacts
Artefacts in the data can arise for many reasons at any stage, from library preparation to the final analysis step where the gene lists are produced.
• RNA-seq – transcript length, expression level
– Ribosomal, cytoskeleton, extracellular, secreted
• Multi-mapping reads – multi vs single
– Ribosomal, translation
• Bisulphite – CpG density
• GC content – low and high GC fragments are underrepresented in libraries
• Location, average copy number
• Starting population of cells – remember to include background
• Completely random genes…
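The point about the starting population can be made concrete with a small sketch. It uses a plain hypergeometric test, which is not necessarily the statistic any particular tool reports, and entirely made-up counts. It shows how the same gene list can look strongly or weakly enriched depending on the background supplied.

```python
from math import comb

def hypergeom_pvalue(list_hits, list_total, pop_hits, pop_total):
    """P(X >= list_hits) when list_total genes are drawn from pop_total genes,
    of which pop_hits carry the annotation."""
    upper = min(list_total, pop_hits)
    tail = sum(
        comb(pop_hits, k) * comb(pop_total - pop_hits, list_total - k)
        for k in range(list_hits, upper + 1)
    )
    return tail / comb(pop_total, list_total)

# Hypothetical counts: 10 of 100 listed genes carry the annotation.
# Background 1: whole genome, 400 of 20000 genes annotated.
p_genome = hypergeom_pvalue(10, 100, 400, 20000)
# Background 2: only genes present in the starting cells, 400 of 5000 annotated.
p_expressed = hypergeom_pvalue(10, 100, 400, 5000)

print(f"genome background:    p = {p_genome:.3g}")
print(f"expressed background: p = {p_expressed:.3g}")
```

With the larger background the annotation looks rare, so ten hits appear surprising; with the restricted background the same ten hits are close to what chance would give.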
4
Differential power
• RNA-seq – transcript length, expression level
• Bisulphite – CpG density
• Non-random distribution
– CpG density
5
• Mapping
– multi-mapping
– genome
• Splice variants
– Analysis at transcript vs gene level
6
Copy number variation
7
Categories to be wary of
• ribosomal
• cytoskeleton
• extracellular
• secreted
• translation
• glycoprotein
8
Beware…
GC < 0.35
9
GC > 0.6
10
All genes on chr 2, 8, 13
11
Number of transcripts > 4
Random sets of 1000 genes put through DAVID
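One way to build such a control is sketched here under stated assumptions: draw random lists of 1000 genes from the background and submit each one to the enrichment tool. The background names are placeholders, and the helper `random_gene_sets` is hypothetical, not part of DAVID.

```python
import random

def random_gene_sets(background, set_size, n_sets, seed=0):
    """Draw n_sets random gene lists, each sampled without replacement
    from the background."""
    rng = random.Random(seed)  # fixed seed so the control lists are reproducible
    return [rng.sample(background, set_size) for _ in range(n_sets)]

# Placeholder background; in practice, read the gene names from your own file.
background = [f"gene{i}" for i in range(20000)]
control_sets = random_gene_sets(background, set_size=1000, n_sets=5)

for i, genes in enumerate(control_sets, start=1):
    print(f"random set {i}: {len(genes)} genes, first three: {genes[:3]}")
```

Any category that comes up as enriched in lists like these is a candidate artefact rather than biology.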
Artefacts – checking your gene list
12
• Make sure background is appropriate
• Be suspicious of some ontology categories – ribosomal, cytoskeleton, extracellular, secreted, translation
http://www.bioinformatics.babraham.ac.uk/shiny/gene_screen/
gene_screen – Shiny app to check for obvious differences in target genes compared to background population
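The idea of comparing target genes with the background can be sketched in a few lines. This is not how gene_screen is implemented; it is a minimal illustration using a permutation test on one property (GC fraction). All values are simulated, and the function name `median_shift_pvalue` is made up for this example.

```python
import random
from statistics import median

def median_shift_pvalue(target_values, background_values, n_perm=2000, seed=0):
    """Permutation test: how often does a random draw of the same size from the
    background have a median at least as far from the background median as the
    target list does?"""
    rng = random.Random(seed)
    bg_median = median(background_values)
    observed = abs(median(target_values) - bg_median)
    n = len(target_values)
    extreme = sum(
        abs(median(rng.sample(background_values, n)) - bg_median) >= observed
        for _ in range(n_perm)
    )
    return (extreme + 1) / (n_perm + 1)

# Simulated GC fraction per gene; replace with real annotation for your genome.
rng = random.Random(1)
background_gc = [rng.gauss(0.45, 0.06) for _ in range(5000)]
# A deliberately biased hit list: 100 background genes with GC above 0.55.
target_gc = [g for g in background_gc if g > 0.55][:100]

p = median_shift_pvalue(target_gc, background_gc)
print(f"target median GC {median(target_gc):.2f}, "
      f"background median GC {median(background_gc):.2f}, p = {p:.4f}")
```

A small p-value here says the hit list differs from the background in GC content, which is a reason to be suspicious of any enrichment that follows. The same function could be pointed at transcript length or number of transcripts.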
13
What next?
14
Figure examples
15
Figure examples
16
GO graph
Genes are often annotated with many functions
17
Displaying Results
Interpreting and exploring results
• How can the results be displayed so that I can interpret and explore them most easily?
– Understanding the functional terms (incl. GO hierarchy)
– Finding relevant information amongst the masses (GOslim, redundant terms, clustering)
Presenting results
• How should I present my results?
• What information should I include?
18
Interpreting and Exploring Results
• How can the results be displayed so that I can interpret them most easily?
• Understanding the functional categories
– GOrilla – hierarchical map
– Panther – interactive pie charts
• Reducing redundancy
– DAVID – clusters of similar functions
– REVIGO – semantic similarity
– GOslims
19
GOrilla
cbl-gorilla.cs.technion.ac.il/
20
Panther
21
GOrilla
cbl-gorilla.cs.technion.ac.il/
22
Exploring Results
• How can the results be displayed so that I can interpret them most easily?
• Understanding the functional categories
– GOrilla – hierarchical map
– Panther – interactive pie charts
• Reducing redundancy
– DAVID – clusters of similar functions
– REVIGO – semantic similarity
– GOslims
23
GOrilla
cbl-gorilla.cs.technion.ac.il/
24
Exploring results
25
Reducing redundancy
http://revigo.irb.hr/
26
Reducing redundancy
27
Reducing redundancy
Giraph.jar
genelist3.txt
mouse_genes_seqmonk.txt
28
Reducing redundancy
• Use a clustering tool
• Use a GOslim – various versions available, may lose the interesting detail
• Select non-redundant terms yourself – be consistent
– P-value filter, top x number of categories, largest categories, most enriched
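Selecting non-redundant terms by hand is easier to keep consistent if the rule is written down as code. The sketch below is one possible rule, not a reimplementation of DAVID clustering or REVIGO: keep significant terms in order of FDR and skip any term whose gene set mostly overlaps a term already kept. The thresholds, the function names, and the toy gene sets are all assumptions made for illustration.

```python
def jaccard(a, b):
    """Shared genes divided by all genes in either set (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def collapse_redundant(terms, max_fdr=0.05, min_overlap=0.7, top_n=10):
    """Keep significant terms, best FDR first, skipping any term whose gene set
    overlaps an already kept term by at least min_overlap."""
    kept = []
    for term in sorted(terms, key=lambda t: t["fdr"]):
        if term["fdr"] > max_fdr or len(kept) == top_n:
            break
        if all(jaccard(term["genes"], k["genes"]) < min_overlap for k in kept):
            kept.append(term)
    return kept

# Toy input: term names are real GO terms, gene sets and FDR values are made up.
terms = [
    {"name": "GTP binding", "fdr": 1.6e-8, "genes": {"A", "B", "C", "D"}},
    {"name": "guanyl nucleotide binding", "fdr": 2.4e-8, "genes": {"A", "B", "C", "D"}},
    {"name": "immune response", "fdr": 2.9e-19, "genes": {"E", "F", "G"}},
    {"name": "defense response", "fdr": 0.017, "genes": {"E", "H", "I", "J"}},
    {"name": "weak term", "fdr": 0.2, "genes": {"K", "L"}},
]

kept_names = [t["name"] for t in collapse_redundant(terms)]
print(kept_names)
```

Whatever rule is chosen, applying the same thresholds to every gene list in a study is what keeps the comparison fair.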
What information should be included?
29
30
Figure examples
31
Figure examples
32
Figure examples
33
Summary
• Beware of artefacts – if something looks too good to be true it probably is…
• Remember your background population
• Do not try to plot absolutely everything
• Choose a method to deal with redundant terms
• Think about what you’re plotting and whether it makes sense
• Do not be afraid of including tables
34
Exercise 2
Category | Term | Count | % | PValue | Genes | List Total | Pop Hits | Pop Total | Fold Enrichment | Benjamini | FDR
GOTERM_BP_FAT | GO:0006955~immune response | 30 | 29 | 1.86E-22 | CSF2, C3, LY86, H2-D1, OAS3, OAS2, CD74, B2M, LIF, OASL2, OASL1, GBP10, H2-K1, CIITA, ICAM1, H2-Q10, GBP6, GBP5, GBP9, H2-Q6, H2-Q7, PSMB9, SERPINA3G, H2-EB1, IRF8, H2-T22, TGTP1, TGTP2, OAS1A, GBP4, GBP3, GBP2 | 81 | 471 | | 10.68 | 1.59E-19 | 2.88E-19
GOTERM_MF_FAT | GO:0005525~GTP binding | 18 | 17 | 1.34E-11 | GBP6, GM12185, EIF2S3Y, GBP5, GIMAP7, GBP9, IFI47, IGTP, GVIN1, GM4841, GBP10, IIGP1, TGTP1, TGTP2, GBP4, GBP3, GM4951, GBP2, GM4070 | 78 | 354 | | 8.662 | 2.32E-09 | 1.64E-08
GOTERM_MF_FAT | GO:0032561~guanyl ribonucleotide binding | 18 | 17 | 2.00E-11 | GBP6, GM12185, EIF2S3Y, GBP5, GIMAP7, GBP9, IFI47, IGTP, GVIN1, GM4841, GBP10, IIGP1, TGTP1, TGTP2, GBP4, GBP3, GM4951, GBP2, GM4070 | 78 | 363 | | 8.448 | 1.73E-09 | 2.44E-08
GOTERM_MF_FAT | GO:0019001~guanyl nucleotide binding | 18 | 17 | 2.00E-11 | GBP6, GM12185, EIF2S3Y, GBP5, GIMAP7, GBP9, IFI47, IGTP, GVIN1, GM4841, GBP10, IIGP1, TGTP1, TGTP2, GBP4, GBP3, GM4951, GBP2, GM4070 | 78 | 363 | | 8.448 | 1.73E-09 | 2.44E-08
GOTERM_BP_FAT | GO:0019882~antigen processing and presentation | 10 | 9.5 | 1.90E-09 | H2-K1, ICAM1, H2-Q10, H2-EB1, H2-D1, H2-T22, H2-Q6, H2-Q7, CD74, B2M, PSMB9 | 81 | 87 | | 19.28 | 8.10E-07 | 2.94E-06
GOTERM_BP_FAT | GO:0048002~antigen processing and presentation of peptide antigen | 7 | 6.7 | 4.88E-08 | H2-K1, H2-Q10, H2-EB1, H2-D1, H2-Q6, H2-Q7, CD74, B2M | 81 | 35 | | 33.55 | 1.39E-05 | 7.55E-05
GOTERM_BP_FAT | GO:0001916~positive regulation of T cell mediated cytotoxicity | 4 | 3.8 | 1.61E-05 | H2-K1, P2RX7, H2-Q6, H2-Q7, B2M | 81 | 9 | | 74.56 | 0.002288 | 0.024911
GOTERM_BP_FAT | GO:0006952~defense response | 13 | 12 | 1.12E-05 | CIITA, H2-K1, LYZ2, C3, LY86, H2-D1, IFI47, H2-Q6, H2-Q7, CD74, B2M, P2RX7, CD44, IRF8 | 81 | 448 | | 4.868 | 0.001916 | 0.017378
GOTERM_BP_FAT | GO:0001914~regulation of T cell mediated cytotoxicity | 4 | 3.8 | 3.13E-05 | H2-K1, P2RX7, H2-Q6, H2-Q7, B2M | 81 | 11 | | 61 | 0.003817 | 0.048513
GOTERM_MF_FAT | GO:0032555~purine ribonucleotide binding | 32 | 30 | 5.16E-09 | OAS3, HSPA1A, HSPA1B, OAS2, CKB, IGTP, OASL2, GM4841, OASL1, DDX3Y, GBP10, IIGP1, TOP2A, GM4070, CIITA, GM12185, GBP6, EIF2S3Y, MYO6, GBP5, GIMAP7, GBP9, IFI47, PSMB9, MYO10, P2RX7, GVIN1, TGTP1, TGTP2, OAS1A, GBP4, GM4951, GBP3, GBP2 | 78 | 1796 | | 3.035 | 2.23E-07 | 6.31E-06
GOTERM_MF_FAT | GO:0003924~GTPase activity | 11 | 10 | 3.07E-09 | GBP6, IGTP, GBP5, EIF2S3Y, GBP9, GBP10, IIGP1, TGTP1, TGTP2, GBP4, GBP3, GBP2 | 78 | 128 | | 14.64 | 1.77E-07 | 3.75E-06
GOTERM_CC_FAT | GO:0009897~external side of plasma membrane | 12 | 11 | 3.14E-09 | H2-K1, LY6A, LY6C1, ICAM1, P2RX7, IL12RB1, S1PR1, CD44, CD274, H2-D1, H2-Q6, H2-Q7, CD74 | 61 | 206 | | 11.94 | 3.46E-07 | 3.55E-06
35
[Figure: plots with axes Count (0–35), -log(FDR) (0–20), and Fold Enrichment (0–80)]
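The three plotted quantities can be derived from the Exercise 2 table with very little code. The sketch below assumes the logarithm is base 10 and uses values copied from three rows of that table. The function `fold_enrichment` states the usual definition, (Count / List Total) divided by (Pop Hits / Pop Total), and is demonstrated with made-up round numbers.

```python
from math import log10

def fold_enrichment(count, list_total, pop_hits, pop_total):
    """(Count / List Total) divided by (Pop Hits / Pop Total)."""
    return (count / list_total) / (pop_hits / pop_total)

# (term, Count, Fold Enrichment, FDR) copied from three rows of the Exercise 2 table.
rows = [
    ("immune response", 30, 10.68, 2.88e-19),
    ("GTP binding", 18, 8.662, 1.64e-08),
    ("defense response", 13, 4.868, 0.017378),
]

# -log10(FDR) turns very small FDR values into bar heights that are easy to compare.
neg_log_fdr = {term: -log10(fdr) for term, _, _, fdr in rows}

for term, count, fold, _ in rows:
    print(f"{term:<18} count={count:<3} fold={fold:<7} "
          f"-log10(FDR)={neg_log_fdr[term]:.2f}")

# Made-up round numbers to show the formula: 10 of 100 listed genes carry a term
# that annotates 400 of 20000 background genes.
example_fold = fold_enrichment(10, 100, 400, 20000)
print(f"example fold enrichment: {example_fold:.1f}")
```

Count, fold enrichment, and significance each tell a different story about the same term, so it is worth deciding which one the figure is meant to communicate before plotting.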
36
37
Panther plots