Comparison of tools to detect differential expression in RNA-seq studies
Fatemeh Seyednasrollah, Asta Laiho, Laura L. Elo
University of Turku
Turku Center for Biotechnology
Computational Biomedicine Group
Contents
Biological Background
Computational Background
Experimental design
Results
Conclusions
For more information, please refer to: Fatemeh Seyednasrollah, Asta Laiho, Laura L. Elo, “Comparison of software packages for detecting differential expression in RNA-seq studies”, Briefings in Bioinformatics.
Genome
• Genetic material is encoded by DNA molecules (RNA in some viruses) and carries the instructions for the functional elements of living organisms.
• Coding elements: the four nucleotides A, T, C, G (4^n possible sequences of length n)
Source: http://ndla.no/nb/node/127042?fag=52234
RNA: Phenotype indicator
• Complementary base pairing: A pairs with T (with U in RNA) and C pairs with G
Source: http://hyperphysics.phy-astr.gsu.edu/hbase/organic/transcription.html
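The complementary rule can be illustrated with a short Python sketch that transcribes a DNA template strand into mRNA. It assumes the template is written 3'->5' so that the output reads 5'->3'; the function name `transcribe` is hypothetical, and uracil (U) takes the place of thymine (T) on the RNA side.

```python
# Pairing rules used during transcription: each base on the DNA template
# strand pairs with an RNA base; uracil (U) takes the place of thymine (T).
DNA_TO_RNA_PAIR = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe(template_3_to_5: str) -> str:
    """Return the mRNA (read 5'->3') for a DNA template written 3'->5'."""
    return "".join(DNA_TO_RNA_PAIR[base] for base in template_3_to_5.upper())


print(transcribe("TACGGCAT"))  # AUGCCGUA
```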
Gene Expression: From DNA to Proteins
Source: 1999 Addison Wesley Longman Inc.
Sequencing
• The process of determining the precise order of nucleotides within a DNA strand.
• Next-generation sequencing (NGS): massively parallel sequencing
RNA-sequencing
• RNA sequencing (RNA-seq): the most recent powerful technology for gene expression profiling (transcriptomics) studies.
• Overall mechanism: any genomic features of interest are detected and quantified in terms of “read counts”.
RNA-seq in a Nutshell
Millions of short reads are produced from transcripts by sequencing; the reads are then reassembled and/or mapped back to the reference they originate from using mapping algorithms.
Computational Background
• There is a linear relation between the read counts and the expression level of the biological feature of interest.
• Counts are non-negative integers, so they follow a discrete distribution.
• Library size: the total number of reads for a given sample, determined by the sequencing depth.
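Because counts scale with sequencing depth, raw counts from samples with different library sizes are not directly comparable. A minimal sketch of the simplest correction, counts per million (CPM), which divides each count by the library size; the function name is hypothetical, and this only illustrates library-size scaling, not the normalization used in the study.

```python
def counts_per_million(counts: list[int]) -> list[float]:
    """Scale raw read counts by library size (total reads in the sample)."""
    library_size = sum(counts)
    if library_size == 0:
        raise ValueError("sample has no reads")
    return [c * 1e6 / library_size for c in counts]


# Same relative expression, sequenced at two different depths.
shallow = [10, 90]   # library size 100
deep = [20, 180]     # library size 200
print(counts_per_million(shallow))  # [100000.0, 900000.0]
print(counts_per_million(deep))     # [100000.0, 900000.0]
```

Total-count scaling like this is sensitive to RNA composition: a few very highly expressed genes inflate the library size and make every other gene look lower, which is what methods such as TMM are designed to correct.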
Analysis Pipeline
• Quality control (FASTQ files): FastQC
• Alignment: TopHat2
• Expression level quantification: HTSeq
• Normalization: package default/TMM
• Statistical analysis: eight state-of-the-art methods
Flowchart: millions of short reads -> quality control -> mapping/read reassembly -> summarization (table of counts) -> normalization -> DE testing
Flowchart is based on: “From RNA-seq reads to differential expression results”, Alicia Oshlack et al., 2010, Genome Biology
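The normalization step lists TMM (trimmed mean of M-values), the normalization used by edgeR. The sketch below is a simplified, unweighted version for illustration only: it computes per-gene log ratios (M) and average log abundances (A) of a sample against a reference, trims the extremes (30% of M and 5% of A, chosen here to mirror edgeR's default `logratioTrim` and `sumTrim`), and averages the remaining M-values. edgeR's `calcNormFactors` additionally weights genes by precision, so its factors will differ; the function name here is hypothetical.

```python
import math


def tmm_factor_simplified(sample, reference, m_trim=0.30, a_trim=0.05):
    """Unweighted TMM-style scaling factor of `sample` relative to `reference`."""
    n_sample, n_ref = sum(sample), sum(reference)
    m_vals, a_vals = [], []
    for y_s, y_r in zip(sample, reference):
        if y_s == 0 or y_r == 0:
            continue  # log ratios are undefined for zero counts
        p_s, p_r = y_s / n_sample, y_r / n_ref
        m_vals.append(math.log2(p_s / p_r))        # M: log fold change
        a_vals.append(0.5 * math.log2(p_s * p_r))  # A: average log abundance
    n = len(m_vals)
    by_m = sorted(range(n), key=lambda i: m_vals[i])
    by_a = sorted(range(n), key=lambda i: a_vals[i])
    k_m, k_a = int(n * m_trim), int(n * a_trim)
    # Keep only genes that survive trimming on both M and A.
    keep = set(by_m[k_m:n - k_m]) & set(by_a[k_a:n - k_a])
    if not keep:
        raise ValueError("no genes left after trimming")
    return 2 ** (sum(m_vals[i] for i in keep) / len(keep))


reference = [100] * 10
sample = [100] * 9 + [10_000]  # one gene dominates the sample's library
print(tmm_factor_simplified(sample, reference))  # about 0.0917 (= 1000 / 10900)
```

Nine genes are unchanged here, yet their raw proportions look depleted because one gene takes most of the sample's reads. Multiplying the sample's library size by the factor (10900 × 0.0917 ≈ 1000) gives an effective library size under which those nine genes match the reference.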
Table of Counts?
• A matrix of data with genomic features as rows and experimental samples as columns.
• Is the difference across the conditions greater than what we expect from normal biological variation?
• Can we detect reliable differentially expressed biomarkers?
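A table of counts like this is typically assembled from one HTSeq-count output per sample. HTSeq-count writes one tab-separated `feature<TAB>count` line per feature, followed by summary lines whose names start with `__` (for example `__no_feature`). The sketch below merges such outputs into a genes × samples matrix and drops the summary lines; the function names are hypothetical, and in-memory strings stand in for the per-sample files.

```python
def parse_htseq_counts(text: str) -> dict[str, int]:
    """Parse HTSeq-count output, skipping the '__' summary rows."""
    counts = {}
    for line in text.strip().splitlines():
        feature, value = line.split("\t")
        if feature.startswith("__"):
            continue
        counts[feature] = int(value)
    return counts


def build_count_table(outputs: dict[str, str]):
    """Return (genes, samples, matrix) with genes as rows and samples as columns."""
    parsed = {sample: parse_htseq_counts(text) for sample, text in outputs.items()}
    samples = list(parsed)
    genes = sorted(set().union(*(c.keys() for c in parsed.values())))
    # A gene missing from one sample's output is recorded as a zero count.
    matrix = [[parsed[s].get(g, 0) for s in samples] for g in genes]
    return genes, samples, matrix


outputs = {  # hypothetical per-sample HTSeq-count outputs
    "female_1": "geneA\t12\ngeneB\t0\n__no_feature\t5\n",
    "male_1": "geneA\t30\ngeneB\t7\n__no_feature\t3\n",
}
genes, samples, matrix = build_count_table(outputs)
for gene, row in zip(genes, matrix):
    print(gene, row)
# geneA [12, 30]
# geneB [0, 7]
```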
Differential Expression Analysis
• Normalization
  – Technical biases: different platforms
  – Length biases
  – RNA composition biases
• Statistical modeling
  – Parametric or non-parametric
• Testing for differential expression
  – Investigating the significance of differences across conditions
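To make the normalization step concrete: DESeq's default correction is the median-of-ratios size factor (Anders and Huber, 2010), which is designed to be robust to RNA composition bias. Each sample's factor is the median, over genes, of the ratio between its count and the gene's geometric mean across samples. A minimal sketch with hypothetical names; genes with a zero count in any sample are skipped because their geometric mean is zero.

```python
import math
from statistics import median


def size_factors(table):
    """Median-of-ratios size factors; rows of `table` are genes, columns samples."""
    n_samples = len(table[0])
    usable = [row for row in table if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("every gene has a zero count in some sample")
    # Log geometric mean of each gene across samples (the pseudo-reference).
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - lg for row, lg in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


table = [[10, 20], [100, 200], [5, 10], [0, 3]]  # last gene is skipped
print([round(f, 4) for f in size_factors(table)])  # [0.7071, 1.4142]
```

Normalized counts are then the raw counts divided by each sample's size factor.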
Data Complexity
• Accurate estimation of variance is required
• Data sampling levels: individual, RNA extraction, RNA sequencing
• Small number of samples
• Biological replicates lead to “overdispersion” (variance larger than the Poisson model predicts)
[Figure: Poisson vs. negative binomial model]
Anders and Huber, Genome Biology 2010, 11:R106, doi:10.1186/gb-2010-11-10-r106
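The difference between the two models comes down to the variance. Under a Poisson model the variance equals the mean μ; the negative binomial adds a dispersion parameter φ, giving variance μ + φμ². A rough method-of-moments sketch (hypothetical names; DE packages use more elaborate estimators that also share information across genes, which matters when replicates are few) estimates φ for a single gene from its replicate counts:

```python
from statistics import mean, variance


def poisson_variance(mu: float) -> float:
    return mu


def nb_variance(mu: float, phi: float) -> float:
    """Negative binomial variance with dispersion phi (phi = 0 gives Poisson)."""
    return mu + phi * mu ** 2


def dispersion_moments(counts: list[int]) -> float:
    """Method-of-moments dispersion: solve s^2 = m + phi * m^2 for phi."""
    m, s2 = mean(counts), variance(counts)  # sample variance (n - 1 denominator)
    return max((s2 - m) / m ** 2, 0.0)      # clamp: no under-dispersion


replicates = [10, 20, 30, 40]  # one gene, four biological replicates
phi = dispersion_moments(replicates)
print(round(phi, 4))  # 0.2267
print(poisson_variance(25), round(nb_variance(25, phi), 2))  # 25 166.67
```

With a mean of 25, a Poisson model expects a variance of 25, while these replicates show about 167; that gap is the overdispersion.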
Variance and Statistical Testing
• Null hypothesis: the expression of gene i is equal across the different conditions.
• If the null hypothesis is rejected, is it because of natural biological variation (the estimated variance) or because of the difference between the experimental conditions?
• Main challenge: accurate estimation of the different types of variance.
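One way to see why the number of replicates matters when testing this null hypothesis, especially for non-parametric approaches, is an exact permutation test for a single gene: it compares the observed difference in group means with the differences obtained under every possible relabelling of the samples. This is a generic illustration with hypothetical names, not the algorithm of SAMseq or any other package compared in the study.

```python
from itertools import combinations
from statistics import mean


def permutation_pvalue(group_a, group_b):
    """Two-sided exact permutation p-value for a difference in means."""
    pooled = list(group_a) + list(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = total = 0
    # Enumerate every way of assigning len(group_a) samples to the first group.
    for chosen in combinations(range(len(pooled)), len(group_a)):
        in_a = set(chosen)
        a = [pooled[i] for i in in_a]
        b = [pooled[i] for i in range(len(pooled)) if i not in in_a]
        total += 1
        if abs(mean(a) - mean(b)) >= observed - 1e-12:  # tolerance for float ties
            extreme += 1
    return extreme / total


# Two vs. two replicates: only 6 relabellings exist, so p can never drop below 2/6.
print(permutation_pvalue([1, 2], [100, 200]))  # 0.3333333333333333
# Five vs. five replicates: 252 relabellings, so small p-values become possible.
print(permutation_pvalue([1, 2, 3, 4, 5], [100, 110, 120, 130, 140]))  # 2/252, about 0.0079
```

With two replicates per group, even a gene whose expression differs a hundredfold cannot reach a 0.05 threshold under this kind of test, before any multiple-testing correction.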
Methods and Algorithms
Research Question
Various statistical methods are available, yet neither a method that is optimal for all datasets nor clear guidance for choosing the best one has been reported.
To help make a biologically and statistically meaningful decision, we present a systematic, practical comparison of eight state-of-the-art computational methods.
Data sets
• Generated on the Illumina platform
• Publicly available, to make the analysis reproducible
• Different levels of heterogeneity
• Different organisms
• Large numbers of samples
Human: 28 female, 28 male
Mouse: 10 C57BL/6J strain, 10 DBA/2J strain
Human data set: lymphoblastoid cell lines from 56 unrelated Nigerian individuals
Performance Criteria
Evaluation criteria:
• Number of detections and their consistency
• False positive discoveries
• Correlation between methods
• Run times
Significance threshold: FDR < 0.05
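An FDR threshold such as 0.05 is commonly applied to p-values adjusted with the Benjamini-Hochberg procedure; several of the compared packages report such adjusted values, while Bayesian methods estimate FDR from posterior probabilities instead. A minimal sketch of the Benjamini-Hochberg adjustment, with hypothetical names and made-up p-values:

```python
def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping the adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.20]  # hypothetical raw p-values for four genes
adj = benjamini_hochberg(pvals)
print([round(q, 4) for q in adj])                  # [0.04, 0.0533, 0.0533, 0.2]
print([i for i, q in enumerate(adj) if q < 0.05])  # [0]
```

Gene 1 has a raw p-value of 0.04 but does not pass FDR < 0.05 once the adjustment accounts for testing four genes.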
Experimental Design
• Randomly select an initial N samples from each distinct group
• Run the statistical analysis
• Add x more samples to the input and rerun, until all samples are used
• Repeat the whole procedure ten times
For the false discoveries, we did the same, but the samples were drawn from within a single group (for example, sampling only among the female samples).
Experimental Design
Mouse data set: samples are drawn at random from each strain and more are added step by step, to simulate a wide range of possible situations.
• D2 strain: 2, 3, 6, 8 and 10 samples
• B6 strain: 2, 3, 6, 8 and 10 samples
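The subsampling scheme can be sketched in Python. All names are hypothetical, and the seed, the use of `random.Random`, and taking prefixes of one shuffled order (so each larger subset contains the smaller one, as in "add more samples") are assumptions about how the procedure could be implemented. The mock comparisons draw both pseudo-groups from one biological group, so any gene reported as DE there counts as a false discovery.

```python
import random


def nested_subsets(group, sizes, rng):
    """One random ordering of `group`; each size takes a prefix, so subsets are nested."""
    shuffled = list(group)
    rng.shuffle(shuffled)
    return {n: shuffled[:n] for n in sizes}


def between_group_runs(group_a, group_b, sizes, repeats=10, seed=0):
    """Real comparisons: n samples from each group, for every size and repeat."""
    rng = random.Random(seed)
    runs = []
    for rep in range(repeats):
        a = nested_subsets(group_a, sizes, rng)
        b = nested_subsets(group_b, sizes, rng)
        runs.extend((rep, n, a[n], b[n]) for n in sizes)
    return runs


def mock_runs(group, sizes, repeats=10, seed=0):
    """Mock comparisons: two disjoint, nested sets drawn from the same group."""
    if 2 * max(sizes) > len(group):
        raise ValueError("group too small for disjoint mock sets")
    rng = random.Random(seed)
    runs = []
    for rep in range(repeats):
        shuffled = list(group)
        rng.shuffle(shuffled)
        # Even positions form pseudo-group 1, odd positions pseudo-group 2.
        runs.extend((rep, n, shuffled[0:2 * n:2], shuffled[1:2 * n:2]) for n in sizes)
    return runs


d2 = [f"D2_{i}" for i in range(1, 11)]  # hypothetical sample IDs
b6 = [f"B6_{i}" for i in range(1, 11)]
runs = between_group_runs(d2, b6, sizes=[2, 3, 6, 8, 10])
print(len(runs))  # 50 = 10 repeats x 5 sample sizes
```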
Data Sets: Intrinsic Properties
• Mouse data set: more homogeneous
• Human data set: heterogeneous, with outliers (male)
[Figure: Spearman correlation]
Results: Number of Detections
• The number of detections increases as the sample size increases; exceptions: NOISeq and Cuffdiff 2.0.0 (poor detection).
• At some point the curve starts to level off, especially in the human data set.
• Moderate methods: DESeq (more conservative) and Limma.
• Data-dependent methods: EBSeq and baySeq.
Results: Method Consistency
• Limma and DESeq are among the top methods.
Results: Relative False Discoveries
• The number of false discoveries decreases as the sample size increases, especially in the less heterogeneous (mouse) data set.
• In general, Limma, DESeq and baySeq perform well in the mock comparisons, whereas EBSeq, SAMseq and edgeR find more false positives.
Results: Method Overlaps (mouse)
• Do you need to run a combinatorial analysis?
• Only significantly differentially expressed genes (FDR < 0.05) are considered.
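A combinatorial analysis of this kind can be as simple as counting, for each gene, how many methods call it significant and keeping the genes supported by a chosen number of methods. A minimal sketch; the function name, method labels, gene sets and voting threshold are all made up for illustration.

```python
from collections import Counter


def consensus(de_sets: dict[str, set[str]], min_methods: int) -> set[str]:
    """Genes called DE (e.g. FDR < 0.05) by at least `min_methods` methods."""
    votes = Counter(gene for genes in de_sets.values() for gene in genes)
    return {gene for gene, k in votes.items() if k >= min_methods}


de_sets = {  # hypothetical DE calls from three methods
    "method_A": {"g1", "g2", "g3"},
    "method_B": {"g1", "g2", "g4"},
    "method_C": {"g1", "g3", "g4", "g5"},
}
print(sorted(consensus(de_sets, 3)))  # ['g1']  (called by every method)
print(sorted(consensus(de_sets, 2)))  # ['g1', 'g2', 'g3', 'g4']
```

Requiring agreement from every method gives the intersection of the DE lists; lowering the threshold trades confidence for sensitivity.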
Results: Method Ranking Correlations
The results consider the 1952 genes that were among the top 1000 ranked genes in at least one method for the mouse data set, together with the corresponding Spearman rank correlations between methods.
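The ranking comparison can be sketched as follows: take the union of each method's top-k genes, then compute the Spearman correlation between two methods' scores on that shared gene set, assuming the 1952 genes are such a union for k = 1000. The function names, the use of p-values as the ranking score, and the toy numbers are assumptions for illustration.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    return cov / sqrt(sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry))


def top_k_union(pvals_by_method, k):
    """Genes ranked in the top k (smallest p-values) by at least one method."""
    union = set()
    for pvals in pvals_by_method.values():
        union.update(sorted(pvals, key=pvals.get)[:k])
    return sorted(union)


pvals_by_method = {  # hypothetical p-values for five genes
    "method_A": {"g1": 0.001, "g2": 0.01, "g3": 0.2, "g4": 0.5, "g5": 0.9},
    "method_B": {"g1": 0.002, "g2": 0.3, "g3": 0.02, "g4": 0.6, "g5": 0.8},
}
genes = top_k_union(pvals_by_method, k=2)
a = [pvals_by_method["method_A"][g] for g in genes]
b = [pvals_by_method["method_B"][g] for g in genes]
print(genes, round(spearman(a, b), 2))  # ['g1', 'g2', 'g3'] 0.5
```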
Results: Run Time
• Efficiency is measured by run time: the most efficient method is the one with the shortest run time.
Conclusions
Which methods can you rely on?
• Limma and DESeq show higher performance and consistency under different conditions.
Do you have a small number of biological replicates?
• We do not recommend non-parametric approaches such as SAMseq.
• Try a combinatorial analysis to verify the results.
Do you have more than five replicates?
• Avoid using NOISeq and Cuffdiff.
Do you use edgeR in your analyses?
• Be aware of possible inconsistency between results and a high risk of false discoveries.
Do you have a heterogeneous data set?
• baySeq can be a powerful method.
And, most important of all: investigate the properties of the input data set thoroughly in advance, and only then choose the statistical method.