1 | Natalia Pietrosemoli | Introduction to FGSA | 13-21/09/2018
or using gene lists to better understand underlying biology
Natalia Pietrosemoli, Rachel Legendre
Introduction to Functional Gene Set Analysis
28/11/2018
7e école de bioinformatique IFB-AVIESAN
• High-throughput experiments per se do not produce biological findings
• Genes do not work alone, but in an intricate network of interactions
• Better interpretation of the data in the context of biological processes, pathways and networks
• Global perspective on the data and posed problem
Analysing gene-sets vs analysing single genes
[Diagram labels: Why, What, How; biomarkers; gene signatures]
What, why, how?
How do we interpret a long list of seemingly unrelated Differentially Expressed Genes (DEGs)?
How do we gain insight into biological mechanisms and phenotypic differences that cannot be inferred directly from the list of DEGs?
What happens to genes with small but coordinated expression changes?
Analysing Gene Expression Profiles (GEPs) in gene-set space vs in gene space:
✔ Complexity reduction
✔ Increases explanatory/interpretative power
✔ Allows comparison among different datasets/methods
✔ Allows comparison of different OMIC data
What do we obtain from FGSA?
Gene-sets: any a priori classification of genes into "biologically relevant" groups
No need for gene sets to be exhaustive or disjoint!
Functional analysis, pathway analysis, gene set enrichment analysis (GSEA), knowledge-driven pathway analysis, … :
FGSA: Identify sets of genes that are significantly overrepresented in a list of genes with respect to the background set of genes
FUNCTION: members of the same biochemical pathway
REGULATION: targets of the same regulatory elements
LOCALIZATION: proteins expressed in the same cellular compartment
PHENOTYPE: proteins co-expressed under certain conditions
WHATEVER: user-defined relevant classification
Functional Gene Set Analysis
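The definition above (a gene set overrepresented in a gene list with respect to a background) is commonly tested with a one-sided hypergeometric test. The following is a minimal Python sketch of that test, not the method of any particular tool named in these slides; the function name and all numbers are made up for illustration.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_in_set, n_selected, n_overlap):
    """One-sided hypergeometric p-value P(X >= n_overlap).

    n_background: genes in the background (universe)
    n_in_set:     background genes that belong to the gene set
    n_selected:   genes in the list of interest (e.g. DEGs)
    n_overlap:    genes that are both in the list and in the gene set
    """
    max_overlap = min(n_in_set, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_in_set, k) * comb(n_background - n_in_set, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Toy numbers (illustrative only): 10,000 background genes, a gene set of 100,
# and 200 selected genes, so about 2 overlapping genes are expected by chance.
p_enriched = hypergeom_enrichment_p(10000, 100, 200, 10)  # 10 observed
p_expected = hypergeom_enrichment_p(10000, 100, 200, 2)   # 2 observed
```

The result depends directly on `n_background`, so the choice of background set matters as much as the gene list itself.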
Gene annotations: speaking the same language
Gene Ontology (GO)
Controlled vocabulary (fixed terms) for annotating genes (i.e. describing gene function) according to:
Molecular Function: molecular-level activities performed by gene products
Cellular Component: locations relative to cell compartments & structures
Biological Process: larger processes accomplished by multiple molecular activities
GO in a nutshell:
• Terms are organism independent
• Annotation can be manually or electronically assigned
• Curation of annotation is largely manual*
[Figure: GO terms range from general to specific]
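Because GO terms are organised from general to specific, a gene annotated to a specific term is implicitly annotated to its more general ancestor terms as well. The sketch below illustrates that propagation on a toy hierarchy; the term names, gene names and helper functions are hypothetical and are not real GO identifiers or part of any GO library.

```python
# Toy hierarchy: child term -> list of parent terms.
# A term may have several parents (the structure is a graph, not a tree).
PARENTS = {
    "root": [],
    "metabolism": ["root"],
    "stress response": ["root"],
    "lipid metabolism": ["metabolism"],
    "stress-induced lipid metabolism": ["lipid metabolism", "stress response"],
}


def ancestors(term, parents=PARENTS):
    """Return every more general term reachable from `term`."""
    seen = set()
    stack = list(parents[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def propagate(direct_annotations, parents=PARENTS):
    """Expand each gene's direct annotations with all ancestor terms."""
    full = {}
    for gene, terms in direct_annotations.items():
        expanded = set(terms)
        for term in terms:
            expanded |= ancestors(term, parents)
        full[gene] = expanded
    return full


direct = {
    "geneA": {"stress-induced lipid metabolism"},
    "geneB": {"metabolism"},
}
full = propagate(direct)
```

One consequence is that general terms collect many genes and specific terms few, which affects how easily each can reach significance.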
• Predicting unknown functions (guilt by association, location in network)
• Drafting "maps" and adding detail to pathways/interaction networks
[Figure labels: DNA, RNA; gene level, protein level]
1 protein = n functions = n networks!
PPIs are useful for understanding functional relationships between proteins and the biology of the cell
Protein – Protein Interactions (PPIs)
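"Guilt by association" can be sketched as a simple vote: an uncharacterised protein is assigned the functions most frequent among its interaction partners. The example below is a toy illustration only; the protein names, functions and the `neighbour_votes` helper are hypothetical, and real predictors use far richer evidence.

```python
from collections import Counter


def neighbour_votes(protein, interactions, annotations):
    """Count the functions annotated to the direct interaction partners of `protein`."""
    neighbours = set()
    for a, b in interactions:
        if a == protein:
            neighbours.add(b)
        elif b == protein:
            neighbours.add(a)
    # One protein can carry several functions, so each function gets its own vote.
    return Counter(
        function
        for neighbour in neighbours
        for function in annotations.get(neighbour, ())
    )


# Toy network: P1 has no annotation of its own.
interactions = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3")]
annotations = {
    "P2": {"DNA repair"},
    "P3": {"DNA repair", "cell cycle"},
    "P4": {"transport"},
}
votes = neighbour_votes("P1", interactions, annotations)
best_guess = votes.most_common(1)[0][0]
```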
EGFR-centered pathway
EGFR-centered network
Adapted from: Pathway and network analysis of cancer genomes, Nature Methods (2015)
Protein pathways vs Protein networks
[Diagram] Input (1: dataset; 2: annotations, from a gene annotation database, gene set database, pathway database or ad hoc annotations) → Method: Functional Gene Set Analysis (assess gene set significance) → Output
General framework for FGSA
Hallmark gene sets
By chromosome
Canonical pathways
Motifs: MIR targets, TF targets, etc.
Cancer-related signatures
Oncogenic signatures
Immunogenic signatures
Gene ontologies
KEGG pathways
Biocarta pathways
Reactome pathways
Gene Ontology
…
[Side labels: *Generic; Organism specific]
FGSA: Annotations
[Diagram] Input (1: dataset; 2: annotations, from a gene annotation database, gene set database, pathway database or ad hoc annotations) → Method: Functional Gene Set Analysis (assess gene set significance) → Output
General framework for FGSA
[Figure labels: Cond A, Cond B; FDR < 0.05 (shown twice); Biological meaning?; Gene Set 1, Gene Set 2, Gene Set N; gene set enriched in Cond A; gene set enriched in Cond B]
If a gene set (GS) is relevant to a given phenotype, a good proportion of its genes will show some amount of differential expression between the conditions
FGSA: Principle
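The FDR < 0.05 cut-off shown on this slide implies a multiple-testing correction, since many genes or gene sets are tested at once. A common choice is the Benjamini-Hochberg procedure; the sketch below is a plain Python version with made-up p-values, and the slides do not state which correction was actually used.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy p-values for four gene sets (illustrative only).
raw_p = [0.01, 0.04, 0.03, 0.005]
fdr = benjamini_hochberg(raw_p)
significant = [i for i, q in enumerate(fdr) if q < 0.05]
```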
Adapted from: Ten Years of Pathway Analysis: Current Approaches and Outstanding Challenges, PLoS Comput Biol (2012)
Input: high-throughput data + gene set DB → Functional Gene Set Analysis: assess gene set significance
1st gen: Over-Representation Analysis (ORA)
Differential expression analysis → differentially expressed genes (DEGs) → number of DEGs in each gene set
- Threshold dependent
- Independence assumption
2nd gen: Functional Class Scoring (FCS)
Gene-level statistics → gene-set statistics
- Independence assumption (often)
3rd gen: Pathway Topology / structure (PT)
DEGs or gene-level statistics + pathway topology (number of reactions, position of gene, type of reaction) → pathway impact factor
- Reduced info available on topology/structure
FGSA: Overall method classification
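To illustrate why Functional Class Scoring can find what ORA misses, here is a minimal sketch in which no gene passes a DEG threshold, yet a gene set with small but coordinated changes scores clearly higher than random sets of the same size. It uses a mean gene-level statistic and random gene sampling, which is a simplification and not the algorithm of GSEA or any other tool; all names, statistics and thresholds are invented for the example.

```python
import random


def set_score(stats, genes):
    """Gene-set statistic: mean of the gene-level statistics."""
    return sum(stats[g] for g in genes) / len(genes)


def gene_sampling_p(stats, gene_set, n_perm=1000, seed=0):
    """Empirical p-value: how often a random set of equal size scores as high."""
    rng = random.Random(seed)
    universe = list(stats)
    observed = set_score(stats, gene_set)
    hits = sum(
        set_score(stats, rng.sample(universe, len(gene_set))) >= observed
        for _ in range(n_perm)
    )
    return (hits + 1) / (n_perm + 1)


# Toy data: 970 background genes with statistics spread over [-1.5, 1.5],
# plus a 30-gene set whose members all show a modest shift of 1.0.
stats = {f"bg{i}": -1.5 + 3.0 * i / 969 for i in range(970)}
gene_set = [f"gs{i}" for i in range(30)]
stats.update({g: 1.0 for g in gene_set})

# ORA view: with a threshold of |statistic| > 2 there are no DEGs at all.
n_degs = sum(abs(s) > 2.0 for s in stats.values())

# FCS view: the coordinated shift makes the set stand out.
p_set = gene_sampling_p(stats, gene_set)
```

Sampling genes at random treats them as independent, which is the independence assumption listed above as a limitation.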
TOOLS
1st gen: Over-Representation Analysis (ORA)
Input: list of DE genes
R packages: goana, goseq
Stand-alone: Ingenuity Pathway Analysis, GSEA
Web tools: g:Profiler, Reactome, DAVID
2nd gen: Functional Class Scoring (FCS)
Input: gene-level stats
R packages: GAGE, SeqGSEA, romer, (m)roast, camera
Stand-alone: Ingenuity Pathway Analysis, GSEA
Web tools: g:Profiler, Reactome
3rd gen: Pathway Topology / structure (PTS)
Input: list of DE genes or gene-level stats + PT: num. of reactions, position of gene, type of reaction
R packages: SPIA, CePa
Web tools: DAVID
Why R packages?
- We can usually couple the chosen methods to chosen annotations
- Methods & annotation packages are (usually!) well maintained and documented
- To keep track of all parameters used: reproducible research!!!
FGSA: Methods and tools
Question: Summarization and visualization of Gene Ontology terms
Data: List of Gene Ontology terms (e.g. goseq results)
Method: REVIGO
Input:
- GO categories, or
- GO categories + p-value, or
- GO categories + other quantity
Category              Num. of terms
Biological Process    40 over / 29 under
Molecular Function    8 over / 2 under
Cellular Component    18 over / 26 under
Result visualization & summarization
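The "GO categories + p-value" input can be prepared as a two-column text: one GO ID per line followed by its p-value. The sketch below assumes that layout is accepted by the tool, which should be checked against its documentation; the helper name, the cut-off and the p-values are invented for illustration (the GO IDs themselves are real terms).

```python
def to_go_pvalue_text(results, max_p=0.05):
    """Format {GO ID: p-value} as tab-separated lines, most significant first."""
    kept = sorted(
        ((go_id, p) for go_id, p in results.items() if p <= max_p),
        key=lambda item: item[1],
    )
    return "\n".join(f"{go_id}\t{p:.3g}" for go_id, p in kept)


# Toy enrichment results (p-values are made up).
results = {
    "GO:0006955": 0.012,   # immune response
    "GO:0008152": 0.3,     # metabolic process, dropped by the cut-off
    "GO:0006915": 0.0004,  # apoptotic process
}
revigo_text = to_go_pvalue_text(results)
```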
Output: table (main terms, daughter terms)
Result visualization & summarization
Output: interactive graph
bubble color: user-provided p-value
bubble size: frequency of GO term
Result visualization & summarization
Summary of the 40 over-represented biological process terms by "metacategory"
output: Tree map
Result visualization & summarization
KEGG Pathway visualization
Method: Pathview
Input:
- Gene expression profiles of the organism and the differential expression values for condition A vs condition B
- KEGG pathway IDs you want to visualize
Pathway visualization
Output: Pathview
https://www.patricbrc.org/
80+ species (mammals, plants, fungi, insects, from Ensembl and Ensembl Genomes)
FREE
25 species
Some stand alone FGSA tools
Identify the type of input you have: list of (DE) genes, stats on all genes, pathway topology
Identify the annotation resources available for your organism: GO, KEGG, MSigDB, …
Remember Occam’s Razor principle! Choose a method you understand, with results you can interpret from the biological point of view
Always, always keep in mind your knowledge of the system
FGSA Good practices
• Your organism needs to be annotated!
• Not every gene belongs to a pathway
• Resulting enriched pathways -> statistical probability rather than a biological certainty
• Context -> crucial in pathway analysis
• Findings should be validated experimentally
Caveats and limitations
Summary
• You need a (good) annotation for your organism
• Selection of the method has a large impact on the results
• Proper visualization is crucial to the result interpretation
• Weird pathways appearing? Keep in mind that gene sets overlap
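One way to check whether an unexpected pathway is significant only because it shares genes with another enriched set is to measure the overlap between gene sets. The sketch below uses the Jaccard index on toy gene sets; the set names, gene names and the 0.5 threshold are arbitrary choices for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index: shared genes divided by all genes in either set."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlapping_pairs(gene_sets, min_jaccard=0.5):
    """List pairs of gene sets whose overlap reaches the threshold."""
    pairs = []
    for name_a, name_b in combinations(sorted(gene_sets), 2):
        score = jaccard(gene_sets[name_a], gene_sets[name_b])
        if score >= min_jaccard:
            pairs.append((name_a, name_b, score))
    return pairs


# Toy gene sets (hypothetical names).
gene_sets = {
    "set_A": {"g1", "g2", "g3", "g4"},
    "set_B": {"g2", "g3", "g4", "g5"},
    "set_C": {"g9", "g10"},
}
pairs = overlapping_pairs(gene_sets)
```

Here `set_A` and `set_B` share three of five distinct genes, so enrichment of one will often drag the other along.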