7e école de bioinformatique ifb-aviesan gene set analysis ... - fgsa... · fgsa: identify sets of...

Introduction to Functional Gene Set Analysis
or using gene lists to better understand underlying biology

Natalia Pietrosemoli, Rachel Legendre

28/11/2018

7e école de bioinformatique IFB-AVIESAN

1 | Natalia Pietrosemoli | Introduction to FGSA | 13-21/09/2018




What, why, how?

• High-throughput experiments per se do not produce biological findings

• Genes do not work alone, but in an intricate network of interactions

• Better interpretation of the data in the context of biological processes, pathways and networks

• Global perspective on the data and posed problem

Analysing gene-sets vs analysing single genes

[Diagram labels: Why, What, How; biomarkers; gene signatures]


What do we obtain from FGSA?

How do we interpret a long list of seemingly unrelated Differentially Expressed Genes (DEGs)?

How do we gain insights into the biological mechanisms and phenotypic differences that are not directly inferable from the list of DEGs?

What happens to genes with small but coordinated expression changes?

Analysing Gene Expression Profiles (GEPs) in gene-set space vs in gene space:

✔ Complexity reduction

✔ Increases explanatory/interpretative power

✔ Allows comparison among different datasets/methods

✔ Allows comparison of different OMIC data


Functional Gene Set Analysis

Functional analysis, pathway analysis, gene set enrichment analysis (GSEA), knowledge-driven pathway analysis, … :

FGSA: Identify sets of genes that are significantly overrepresented in a list of genes with respect to the background set of genes

Gene-sets: Any a priori classification of genes into « biologically relevant » groups

• FUNCTION: members of same biochemical pathway
• REGULATION: targets of the same regulatory elements
• LOCALIZATION: proteins expressed in the same cellular compartment
• PHENOTYPE: proteins co-expressed under certain conditions
• WHATEVER: user defined relevant classification

No need for gene sets to be exhaustive or disjoint!
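
The "significantly overrepresented with respect to the background" idea in the FGSA definition is often tested with an upper-tail hypergeometric probability. The slides do not name a specific test, so the following is only a minimal sketch under that assumption; the function name and the toy counts are illustrative and do not come from the slides.

```python
from math import comb


def ora_p_value(n_background, n_in_set, n_selected, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_background: number of genes in the background set
    n_in_set:     background genes that belong to the gene set
    n_selected:   genes in the list of interest (e.g. DEGs)
    n_overlap:    genes of interest that belong to the gene set
    """
    max_overlap = min(n_in_set, n_selected)
    total = comb(n_background, n_selected)
    # Count the ways to draw at least n_overlap gene-set members.
    tail = sum(
        comb(n_in_set, k) * comb(n_background - n_in_set, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Toy counts (hypothetical): 10,000 background genes, a gene set of 200,
# 500 genes of interest, 25 of which fall in the gene set.
p = ora_p_value(n_background=10_000, n_in_set=200, n_selected=500, n_overlap=25)
print(f"over-representation p-value: {p:.3g}")
```

By chance alone one would expect 500 × 200 / 10,000 = 10 genes of the list to fall in the set, so an overlap of 25 yields a small p-value. The choice of background set changes every count in the formula, which is why the definition insists on it.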


Gene annotations: speaking the same language

Gene Ontology (GO): Controlled vocabulary (fixed terms) for annotating genes (i.e. describing gene function) according to:

Molecular Function: molecular-level activities performed by gene products

Cellular Component: locations relative to cell compartments & structures

Biological Process: larger processes accomplished by multiple molecular activities

GO in a nutshell:

• Terms are organism independent
• Annotation can be manually or electronically assigned
• Curation of annotation is largely manual*

[Figure: GO term hierarchy, from general terms to specific terms]
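
Because GO terms run from general to specific, a gene annotated to a specific term is conventionally also counted as annotated to every more general ancestor of that term. The slides do not spell this out, so the sketch below is an illustration under that assumption. The four-term hierarchy is a simplified toy, and the gene names are hypothetical.

```python
def ancestors(term, parents):
    """Collect every more general term reachable from `term`."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def propagate(annotations, parents):
    """Add all ancestor terms to each gene's direct annotations."""
    full = {}
    for gene, terms in annotations.items():
        expanded = set(terms)
        for term in terms:
            expanded |= ancestors(term, parents)
        full[gene] = expanded
    return full


# Toy hierarchy (child -> parents), simplified for illustration only.
parents = {
    "apoptotic process": ["programmed cell death"],
    "programmed cell death": ["cell death"],
    "cell death": ["biological_process"],
}
# Hypothetical direct annotations.
annotations = {"geneA": {"apoptotic process"}, "geneB": {"cell death"}}

for gene, terms in propagate(annotations, parents).items():
    print(gene, sorted(terms))
```

One consequence for FGSA is that general terms collect many genes and specific terms few, so the size of a gene set depends on its depth in the hierarchy.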


Protein – Protein Interactions (PPIs)

PPIs are useful for understanding functional relationships between proteins and the biology of the cell

• Predicting unknown functions (guilt by association, location in network)
• Drafting « maps » and adding detail to pathways/interaction networks

1 protein = n functions = n networks!

[Figure labels: DNA, RNA; Gene level, Protein level]


Protein pathways vs Protein networks

[Figure panels: EGFR-centered pathway; EGFR-centered network]

Adapted from: Nature Methods. Pathway and network analysis of cancer genomes (2015)


General framework for FGSA

Input:
1. Dataset
2. Annotations: Gene ann. Database, Gene Set Database, Pathway Database, ad hoc annotations

Method: Functional Gene Set Analysis

Output: Assess gene set significance


FGSA: Annotations

Hallmark gene sets
By chromosome
Canonical pathways
Motifs: MIR targets, TF targets, etc
Cancer-related sigs
Oncogenic sigs
Immunogenic sigs
Gene ontologies

KEGG pathways
Biocarta pathways
Reactome pathways
Gene Ontology
…

[Side labels: *Generic; Organism specific]


General framework for FGSA

Input:
1. Dataset
2. Annotations: Gene ann. Database, Gene Set Database, Pathway Database, ad hoc annotations

Method: Functional Gene Set Analysis

Output: Assess gene set significance


FGSA: Principle

If a gene set (GS) is relevant to a given phenotype, a good proportion of its genes will show some amount of differential expression between the conditions

[Figure labels: Cond A, Cond B; FDR < 0.05; Biological meaning?; Gene Set 1, Gene Set 2, …, Gene Set N; Gene set enriched in Cond A; Gene set enriched in Cond B]
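
The FDR < 0.05 cut-off on this slide presupposes p-values adjusted for multiple testing. The slides do not say which correction is meant; assuming the widely used Benjamini-Hochberg procedure, a minimal sketch looks like this. The gene set names and p-values are made up for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values, one per tested gene set.
raw = {"Gene Set 1": 0.005, "Gene Set 2": 0.03, "Gene Set N": 0.2}
fdr = dict(zip(raw, benjamini_hochberg(list(raw.values()))))
significant = [name for name, q in fdr.items() if q < 0.05]
print(fdr)
print(significant)
```

The same adjustment applies at both levels shown in the figure: across genes when calling DEGs, and across gene sets when calling enriched sets.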


FGSA: Overall method classification

Adapted from: Ten Years of Pathway Analysis: Current Approaches and Outstanding Challenges, PLoS Comput Biol (2012)

Input: High-throughput data, Gene set DB

Method: Functional Gene Set Analysis

Output: Assess gene set significance

1st gen: Over-Representation Analysis (ORA)
• Differential expression analysis -> Differentially expressed genes (DEGs) -> Number of DEGs in each gene set
• Limitations: threshold dependent; independence assumption

2nd gen: Functional Class Scoring (FCS)
• Gene-level statistics -> Gene-set statistics
• Limitations: independence assumption (often)

3rd gen: Pathway Topology / structure (PT)
• DEGs or gene-level statistics + pathway topology (number of reactions, position of gene, type of reaction) -> Pathway impact factor
• Limitations: reduced info available on topology/structure
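
To make the second generation concrete: an FCS method keeps a statistic for every gene (no DEG threshold), summarizes it per gene set, and asks whether that summary is unusual. The sketch below uses the mean statistic and random gene sets of the same size as a reference. This is one simple choice made for illustration, not the algorithm of GSEA or of any tool named in the slides, and sampling genes at random relies on the independence assumption listed above as a limitation. All names and values are hypothetical.

```python
import random


def gene_set_score(gene_stats, gene_set):
    """Mean gene-level statistic over the measured members of a gene set."""
    values = [gene_stats[g] for g in gene_set if g in gene_stats]
    return sum(values) / len(values)


def permutation_p_value(gene_stats, gene_set, n_perm=1000, seed=0):
    """Share of random same-size gene sets scoring at least as high."""
    rng = random.Random(seed)
    members = [g for g in gene_set if g in gene_stats]
    observed = gene_set_score(gene_stats, members)
    genes = sorted(gene_stats)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(genes, len(members))
        if gene_set_score(gene_stats, sample) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Toy gene-level statistics (hypothetical): 200 genes with noise around 0,
# plus a small coordinated shift for the 20 genes of one gene set.
rng = random.Random(1)
gene_stats = {f"gene{i}": rng.gauss(0.0, 1.0) for i in range(200)}
shifted_set = [f"gene{i}" for i in range(20)]
for g in shifted_set:
    gene_stats[g] += 1.0

print(permutation_p_value(gene_stats, shifted_set))
```

This is the situation raised earlier in the deck: genes with small but coordinated changes may each miss a DEG cut-off, yet still move the gene-set statistic.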


FGSA: Methods and tools

1st gen: Over-Representation Analysis (ORA)
Input: list of DE genes
R packages: GOANA, goseq
Stand alone: Ingenuity Pathway Analysis, GSEA
Web tools: g:Profiler, Reactome, DAVID

2nd gen: Functional Class Scoring (FCS)
Input: gene-level stats
R packages: GAGE, SeqGSEA, romer, (m)roast, camera
Stand alone: Ingenuity Pathway Analysis, GSEA
Web tools: g:Profiler, Reactome

3rd gen: Pathway Topology / structure (PT)
Input: list of DE genes or gene-level stats + PT: num. of reactions, position of gene, type of reaction
R packages: SPIA, CePa
Web tools: DAVID

Why R packages?
- We can usually couple the chosen methods to chosen annotations
- Methods & annotation packages are (usually!) well maintained and documented
- To keep track of all parameters used: reproducible research!!!


Result visualization & summarization

Question: Summarization and visualization of Gene Ontology terms

Data: List of Gene Ontology terms (e.g. goseq results)

Method: ReViGO

Input:
- GO categories, or
- GO categories + p-value, or
- GO categories + other quant.

Category                Num. of terms
Biological Process      40 over / 29 under
Molecular Function      8 over / 2 under
Cellular Compartment    18 over / 26 under
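
The "GO categories + p-value" input can be prepared as a plain two-column text, one term per line. The sketch below assumes a tab-separated layout, which should be checked against the tool's own documentation; the helper name is hypothetical and the p-values are invented.

```python
def format_go_list(results, with_values=True):
    """Render (GO id, value) pairs as text, one term per line."""
    lines = []
    for go_id, value in results:
        if with_values:
            lines.append(f"{go_id}\t{value:.3g}")
        else:
            lines.append(go_id)
    return "\n".join(lines)


# Hypothetical enrichment results: (GO term id, p-value).
results = [
    ("GO:0006915", 0.0021),
    ("GO:0006955", 0.00004),
    ("GO:0007049", 0.013),
]

print(format_go_list(results))                     # GO categories + p-value
print(format_go_list(results, with_values=False))  # GO categories only
```

Keeping this step in a script, rather than copying terms by hand, also records which terms and values went into the summary.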

Result visualization & summarization

Output: Table

[Table column labels: Main terms, Daughter terms]


Result visualization & summarization

Output: Interactive graph

Bubble color: user-provided p-value
Bubble size: frequency of GO term


Result visualization & summarization

Output: Tree map

Summary of the 40 over-represented biological process terms by « metacategory »


KEGG Pathway visualization

Method: Pathview

Input:
- Gene expression profiles of the organism and the differential expression values for condition A vs condition B
- KEGG pathway IDs you want to visualize


Pathway visualization

Output: Pathview


Some stand alone FGSA tools

[Tool logos not preserved in the transcript; surviving labels:]

https://www.patricbrc.org/

80+ species (mammals, plants, fungi, insects, from Ensembl and Ensembl Genomes)

FREE

25 species


FGSA Good practices

• Identify the type of input you have: list of (DE) genes, stats on all genes, pathway topology

• Identify the annotation resources available for your organism: GO, KEGG, MSigDB, …

• Remember Occam’s Razor principle! Choose a method you understand, with results you can interpret from the biological point of view

• Always, always keep in mind your knowledge of the system


Caveats and limitations

• Your organism needs to be annotated!

• Not every gene belongs to a pathway

• Resulting enriched pathways -> statistical probability rather than a biological certainty

• Context -> crucial in pathway analysis

• Findings should be validated experimentally


Summary

• You need a (good) annotation for your organism

• Selection of the method has a large impact on the results

• Proper visualization is crucial to the result interpretation

• Weird pathways appearing? Keep in mind that gene sets overlap
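
One way to check whether an unexpected pathway is only riding on genes it shares with an expected one is to measure the overlap between gene sets. The sketch below uses the Jaccard index, which is a choice made for illustration; the gene set names, gene names and the 0.5 threshold are all hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Shared genes divided by all genes found in either set."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlapping_pairs(gene_sets, threshold=0.5):
    """List pairs of gene sets whose Jaccard index reaches the threshold."""
    pairs = []
    for name_a, name_b in combinations(sorted(gene_sets), 2):
        score = jaccard(gene_sets[name_a], gene_sets[name_b])
        if score >= threshold:
            pairs.append((name_a, name_b, score))
    return pairs


# Hypothetical enriched gene sets sharing most of their members.
gene_sets = {
    "expected pathway": {"gene1", "gene2", "gene3", "gene4"},
    "weird pathway": {"gene2", "gene3", "gene4", "gene5"},
    "unrelated pathway": {"gene8", "gene9"},
}

for name_a, name_b, score in overlapping_pairs(gene_sets):
    print(f"{name_a} / {name_b}: Jaccard = {score:.2f}")
```

Pairs with a high overlap are best interpreted together, since a single group of shared genes can make both appear enriched.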