![Page 1: Computational Methods for Analysis of Single Cell RNA-Seq Data Ion Măndoiu Computer Science & Engineering Department University of Connecticut ion@engr.uconn.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649ed85503460f94be64a1/html5/thumbnails/1.jpg)
Computational Methods for Analysis of Single Cell RNA-Seq Data
Ion Măndoiu
Computer Science & Engineering Department
University of Connecticut
ion@engr.uconn.edu
![Page 2: Computational Methods for Analysis of Single Cell RNA-Seq Data Ion Măndoiu Computer Science & Engineering Department University of Connecticut ion@engr.uconn.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649ed85503460f94be64a1/html5/thumbnails/2.jpg)
Outline
• Intro to RNA-Seq
  – Next-generation sequencing technologies
  – RNA-Seq applications
  – Analysis challenges for single cell data
• Typical analysis pipeline for single-cell RNA-Seq
  – Primary analysis: reads QC, mapping, and quantification
  – Secondary analysis: cells QC, normalization, clustering, and differential expression
  – Tertiary analysis: functional annotation
• Conclusions
![Page 3: Computational Methods for Analysis of Single Cell RNA-Seq Data Ion Măndoiu Computer Science & Engineering Department University of Connecticut ion@engr.uconn.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649ed85503460f94be64a1/html5/thumbnails/3.jpg)
2nd Gen. Sequencing: Illumina
![Page 4: Computational Methods for Analysis of Single Cell RNA-Seq Data Ion Măndoiu Computer Science & Engineering Department University of Connecticut ion@engr.uconn.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649ed85503460f94be64a1/html5/thumbnails/4.jpg)
2nd Gen. Sequencing: Illumina
![Page 5: Computational Methods for Analysis of Single Cell RNA-Seq Data Ion Măndoiu Computer Science & Engineering Department University of Connecticut ion@engr.uconn.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649ed85503460f94be64a1/html5/thumbnails/5.jpg)
2nd Gen. Sequencing: ION Torrent
• ION Torrent uses a high-density array of micro-machined wells to perform this biochemical process in a massively parallel way
• Each well holds a different DNA template generated by emulsion PCR. Beneath the wells is an ion-sensitive layer and beneath that a proprietary ION sensor
• The sequencer sequentially floods the chip with one nucleotide after another; in each cycle the voltage change recorded at a well is proportional to the number of incorporated bases
![Page 6: Computational Methods for Analysis of Single Cell RNA-Seq Data Ion Măndoiu Computer Science & Engineering Department University of Connecticut ion@engr.uconn.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649ed85503460f94be64a1/html5/thumbnails/6.jpg)
3rd Gen. Sequencing: PacBio SMRT
![Page 7: Computational Methods for Analysis of Single Cell RNA-Seq Data Ion Măndoiu Computer Science & Engineering Department University of Connecticut ion@engr.uconn.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649ed85503460f94be64a1/html5/thumbnails/7.jpg)
3rd Gen. Sequencing: PacBio SMRT
![Page 8: Computational Methods for Analysis of Single Cell RNA-Seq Data Ion Măndoiu Computer Science & Engineering Department University of Connecticut ion@engr.uconn.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649ed85503460f94be64a1/html5/thumbnails/8.jpg)
3rd Gen. Sequencing: Oxford Nanopore
http://www.technologyreview.com/article/427677/nanopore-sequencing/
![Page 9: Computational Methods for Analysis of Single Cell RNA-Seq Data Ion Măndoiu Computer Science & Engineering Department University of Connecticut ion@engr.uconn.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649ed85503460f94be64a1/html5/thumbnails/9.jpg)
Standard (Bulk) RNA-Seq
[Figure: bulk RNA-Seq workflow on a gene with exons A B C D E]
1. Reverse transcribe into cDNA & shatter into fragments
2. Sequence fragment ends
3. Map reads
4. Downstream analyses: gene expression quantification, isoform expression quantification, transcriptome reconstruction
![Page 10: Computational Methods for Analysis of Single Cell RNA-Seq Data Ion Măndoiu Computer Science & Engineering Department University of Connecticut ion@engr.uconn.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649ed85503460f94be64a1/html5/thumbnails/10.jpg)
Alternative Splicing
Pal S. et al., Genome Research, June 2011
![Page 11: Computational Methods for Analysis of Single Cell RNA-Seq Data Ion Măndoiu Computer Science & Engineering Department University of Connecticut ion@engr.uconn.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649ed85503460f94be64a1/html5/thumbnails/11.jpg)
Transcriptome Reconstruction
![Page 12: Computational Methods for Analysis of Single Cell RNA-Seq Data Ion Măndoiu Computer Science & Engineering Department University of Connecticut ion@engr.uconn.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649ed85503460f94be64a1/html5/thumbnails/12.jpg)
Common Approaches
• De novo (genome-independent reconstruction)
  – Trinity, Oases, Trans-ABySS
    • de Bruijn k-mer graph
• Genome guided
  – Scripture
    • Reports “all” transcripts
  – Cufflinks, IsoLasso, SLIDE
    • Minimize set of transcripts explaining reads
• Annotation guided
  – RABT
    • Simulate reads from annotated transcripts
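The de Bruijn k-mer graph behind the de novo tools listed above can be illustrated with a minimal sketch. The function name `de_bruijn_edges` and the toy reads are assumptions made for illustration; real assemblers such as Trinity add error correction, coverage tracking, and graph simplification on top of this idea.

```python
from collections import defaultdict

def de_bruijn_edges(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer contributes one edge: prefix (k-1)-mer -> suffix (k-1)-mer.
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)

# Two hypothetical overlapping reads; walking the edges spells ACGTCA.
graph = de_bruijn_edges(["ACGTC", "CGTCA"], k=3)
```

Transcripts correspond to paths in this graph, which is why alternative isoforms show up as branches.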
![Page 13: Computational Methods for Analysis of Single Cell RNA-Seq Data Ion Măndoiu Computer Science & Engineering Department University of Connecticut ion@engr.uconn.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649ed85503460f94be64a1/html5/thumbnails/13.jpg)
Genome-Guided Transcriptome Reconstruction – Multiple Solutions
[Figure: splice graph over exons 1–7 and four candidate transcripts]
t1: 1 2 3 4 5 6 7
t2: 1 3 4 5 6 7
t3: 1 2 3 4 5 7
t4: 1 3 4 5 7
![Page 14: Computational Methods for Analysis of Single Cell RNA-Seq Data Ion Măndoiu Computer Science & Engineering Department University of Connecticut ion@engr.uconn.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649ed85503460f94be64a1/html5/thumbnails/14.jpg)
Which Solution is Most Likely?
• TRIP: select smallest set of transcripts with good statistical fit between fragment length distributions
  – empirically determined during library preparation
  – implied by “mapping” read pairs
[Figure: a read pair implies a fragment length of 500 on transcript 1-2-3 and 300 on transcript 1-3 (exons of length 200)]
![Page 15: Computational Methods for Analysis of Single Cell RNA-Seq Data Ion Măndoiu Computer Science & Engineering Department University of Connecticut ion@engr.uconn.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649ed85503460f94be64a1/html5/thumbnails/15.jpg)
TRIP Results
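• 100x coverage, 2×100 bp paired-end reads; annotations for genes

$$\mathrm{PPV} = \frac{TP}{TP + FP}$$

$$\mathrm{Sens} = \frac{TP}{TP + FN}$$

$$\mathrm{F\text{-}Score} = \frac{2 \cdot \mathrm{PPV} \cdot \mathrm{Sens}}{\mathrm{PPV} + \mathrm{Sens}}$$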
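The three measures on this slide can be computed as in the sketch below. The function name `accuracy_metrics` and the counts are hypothetical and are not taken from the TRIP evaluation.

```python
def accuracy_metrics(tp, fp, fn):
    """Return (PPV, sensitivity, F-score) from TP, FP, and FN counts."""
    ppv = tp / (tp + fp) if tp + fp else 0.0
    sens = tp / (tp + fn) if tp + fn else 0.0
    # F-score is the harmonic mean of PPV and sensitivity.
    f_score = 2 * ppv * sens / (ppv + sens) if ppv + sens else 0.0
    return ppv, sens, f_score

# Hypothetical counts: 80 correctly reconstructed transcripts, 20 spurious, 20 missed.
ppv, sens, f_score = accuracy_metrics(tp=80, fp=20, fn=20)
```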
![Page 16: Computational Methods for Analysis of Single Cell RNA-Seq Data Ion Măndoiu Computer Science & Engineering Department University of Connecticut ion@engr.uconn.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649ed85503460f94be64a1/html5/thumbnails/16.jpg)
Why Single Cell RNA-Seq?
Macaulay and Voet, PLOS Genetics, 2014
![Page 17: Computational Methods for Analysis of Single Cell RNA-Seq Data Ion Măndoiu Computer Science & Engineering Department University of Connecticut ion@engr.uconn.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649ed85503460f94be64a1/html5/thumbnails/17.jpg)
Challenges
• Low RNA input + low RT efficiency
  – Especially problematic for low expression genes
Macaulay and Voet, PLOS Genetics, 2014
![Page 18: Computational Methods for Analysis of Single Cell RNA-Seq Data Ion Măndoiu Computer Science & Engineering Department University of Connecticut ion@engr.uconn.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649ed85503460f94be64a1/html5/thumbnails/18.jpg)
Challenges
• Stochastic effects (e.g., transcriptional bursting) hard to distinguish from regulated transcriptional heterogeneity
• PCR amplification bias results in distortion of transcript abundances
![Page 19: Computational Methods for Analysis of Single Cell RNA-Seq Data Ion Măndoiu Computer Science & Engineering Department University of Connecticut ion@engr.uconn.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649ed85503460f94be64a1/html5/thumbnails/19.jpg)
SMARTer RNA-Seq Protocol
![Page 20: Computational Methods for Analysis of Single Cell RNA-Seq Data Ion Măndoiu Computer Science & Engineering Department University of Connecticut ion@engr.uconn.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649ed85503460f94be64a1/html5/thumbnails/20.jpg)
Islam et al. http://www.nature.com/nmeth/journal/v11/n2/full/nmeth.2772.html
Correcting PCR Bias using UMIs (STRT-C1)
![Page 21: Computational Methods for Analysis of Single Cell RNA-Seq Data Ion Măndoiu Computer Science & Engineering Department University of Connecticut ion@engr.uconn.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649ed85503460f94be64a1/html5/thumbnails/21.jpg)
Outline
• Intro to RNA-Seq
  – RNA-Seq applications
  – Analysis challenges for single cell data
• Typical analysis pipeline for single-cell RNA-Seq
  – Primary analysis: reads QC, mapping, and quantification
  – Secondary analysis: cells QC, normalization, clustering, and differential expression
  – Tertiary analysis: functional annotation
• Conclusions
![Page 22: Computational Methods for Analysis of Single Cell RNA-Seq Data Ion Măndoiu Computer Science & Engineering Department University of Connecticut ion@engr.uconn.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649ed85503460f94be64a1/html5/thumbnails/22.jpg)
Reads QC → Read mapping → Quantification → Cells QC → Normalization → Clustering → Differential expression → Functional analysis
[Figure: percentage of reads with mismatches (y axis, 0–2.5) by read position (x axis, 1–71) for Lane 1, Lane 2, and Lane 3]
![Page 23: Computational Methods for Analysis of Single Cell RNA-Seq Data Ion Măndoiu Computer Science & Engineering Department University of Connecticut ion@engr.uconn.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649ed85503460f94be64a1/html5/thumbnails/23.jpg)
Tools to analyze and preprocess fastq files
• FASTX (http://hannonlab.cshl.edu/fastx_toolkit/)
  – Charts quality statistics
  – Filters sequences based on quality
  – Trims sequences based on quality
  – Collapses identical sequences into a single sequence
• PRINSEQ (http://prinseq.sourceforge.net/)
  – Generates read length and quality statistics
  – Filters reads based on length, quality, GC content and other criteria
  – Trims reads based on length/position or quality scores
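The kind of quality trimming and filtering these tools perform can be sketched in plain Python. This is not the FASTX or PRINSEQ interface. The function names, the thresholds, and the Phred+33 quality encoding are assumptions made for illustration.

```python
def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string (assumed Phred+33) into integer scores."""
    return [ord(ch) - offset for ch in quality_line]

def trim_3prime(seq, qual, min_q=20):
    """Drop bases from the 3' end while their quality is below min_q."""
    scores = phred_scores(qual)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]

def passes_filter(qual, min_mean_q=20, min_len=4):
    """Keep reads that are long enough and have a sufficient mean quality."""
    scores = phred_scores(qual)
    return len(scores) >= min_len and sum(scores) / len(scores) >= min_mean_q

def clean_reads(records, min_q=20, min_mean_q=20, min_len=4):
    """records: iterable of (name, sequence, quality) tuples."""
    kept = []
    for name, seq, qual in records:
        seq, qual = trim_3prime(seq, qual, min_q)
        if passes_filter(qual, min_mean_q, min_len):
            kept.append((name, seq, qual))
    return kept

# Hypothetical reads: 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
records = [
    ("read1", "ACGTACGT", "IIIIII##"),
    ("read2", "ACGTACGT", "########"),
]
cleaned = clean_reads(records)
```

Here read1 loses its two low-quality 3' bases and is kept, while read2 is trimmed to nothing and dropped.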
![Page 24: Computational Methods for Analysis of Single Cell RNA-Seq Data Ion Măndoiu Computer Science & Engineering Department University of Connecticut ion@engr.uconn.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649ed85503460f94be64a1/html5/thumbnails/24.jpg)
![Page 25: Computational Methods for Analysis of Single Cell RNA-Seq Data Ion Măndoiu Computer Science & Engineering Department University of Connecticut ion@engr.uconn.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649ed85503460f94be64a1/html5/thumbnails/25.jpg)
![Page 26: Computational Methods for Analysis of Single Cell RNA-Seq Data Ion Măndoiu Computer Science & Engineering Department University of Connecticut ion@engr.uconn.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649ed85503460f94be64a1/html5/thumbnails/26.jpg)
http://en.wikipedia.org/wiki/File:RNA-Seq-alignment.png
![Page 27: Computational Methods for Analysis of Single Cell RNA-Seq Data Ion Măndoiu Computer Science & Engineering Department University of Connecticut ion@engr.uconn.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649ed85503460f94be64a1/html5/thumbnails/27.jpg)
RNA-Seq read mapping strategies:
– Ungapped mapping (with mismatches) to genome
  • Cannot align reads spanning exon-junctions
– Local alignment (Smith-Waterman) to genome
  • Very slow
– Spliced alignment to genome
  • Computationally harder than ungapped alignment, but much faster than local alignment
– Mapping on transcript libraries
  • Fastest, but cannot align reads from un-annotated transcripts
– Mapping on exon-exon junction libraries
  • Cannot align reads overlapping un-annotated exons
– Hybrid approaches
![Page 28: Computational Methods for Analysis of Single Cell RNA-Seq Data Ion Măndoiu Computer Science & Engineering Department University of Connecticut ion@engr.uconn.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649ed85503460f94be64a1/html5/thumbnails/28.jpg)
Comparison of spliced read mapping tools
Kim et al. http://www.nature.com/nmeth/journal/vaop/ncurrent/full/nmeth.3317.html
![Page 29: Computational Methods for Analysis of Single Cell RNA-Seq Data Ion Măndoiu Computer Science & Engineering Department University of Connecticut ion@engr.uconn.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649ed85503460f94be64a1/html5/thumbnails/29.jpg)
• Cannot use raw read counts (why not?)
Islam et al. http://www.nature.com/nmeth/journal/v11/n2/full/nmeth.2772.html
![Page 30: Computational Methods for Analysis of Single Cell RNA-Seq Data Ion Măndoiu Computer Science & Engineering Department University of Connecticut ion@engr.uconn.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649ed85503460f94be64a1/html5/thumbnails/30.jpg)
• CPM = counts per million
  – Ignores multireads, so it underestimates expression of genes in large families
  – Does not normalize for gene length, so CPMs cannot be compared between genes
  – Comparing CPMs between samples assumes similar transcriptome size
• RPKM/FPKM = reads/fragments per kilobase per million
  – [Mortazavi et al. 08] Fractionally allocates multireads based on unique read estimates
  – Length for multi-isoform genes?
  – Comparing FPKM between samples assumes similar (weighted) transcriptome size
• TPM: transcripts per million
  – Still relative measure of expression, but comparable between samples
  – Most accurate estimation methods use multireads and isoform level estimation
• UMI counts
  – Absolute measure of expression?
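The relationship between the three relative units can be sketched as follows. The sketch assumes reads have already been assigned to genes (no multiread handling) and that each gene has a single length in base pairs. The function names and the toy numbers are illustrative only.

```python
def cpm(counts):
    """Counts per million: scale by library size only."""
    total = sum(counts)
    return [c * 1e6 / total for c in counts]

def rpkm(counts, lengths_bp):
    """Reads per kilobase per million: scale by library size and gene length."""
    total = sum(counts)
    return [c * 1e9 / (total * length) for c, length in zip(counts, lengths_bp)]

def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then rescale to sum to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    total_rate = sum(rates)
    return [r * 1e6 / total_rate for r in rates]

# Hypothetical genes whose read counts are exactly proportional to their lengths.
counts = [100, 200, 700]
lengths = [1000, 2000, 7000]
cpm_values = cpm(counts)
rpkm_values = rpkm(counts, lengths)
tpm_values = tpm(counts, lengths)
```

In this toy case CPM differs sevenfold between the first and last gene, while RPKM and TPM are identical across genes, which illustrates why CPMs cannot be compared between genes of different lengths.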
![Page 31: Computational Methods for Analysis of Single Cell RNA-Seq Data Ion Măndoiu Computer Science & Engineering Department University of Connecticut ion@engr.uconn.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649ed85503460f94be64a1/html5/thumbnails/31.jpg)
[Figure: gene ambiguous reads and isoform ambiguous reads, illustrated on transcripts A B C D E and A C]
![Page 32: Computational Methods for Analysis of Single Cell RNA-Seq Data Ion Măndoiu Computer Science & Engineering Department University of Connecticut ion@engr.uconn.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649ed85503460f94be64a1/html5/thumbnails/32.jpg)
Expectation-maximization approach (IsoEM, RSEM)
[Figure: a read compatible with two isoforms i and j (exons A B C and A C); labels $w_{r,i}$, $F_a(i)$, $F_a(j)$]
![Page 33: Computational Methods for Analysis of Single Cell RNA-Seq Data Ion Măndoiu Computer Science & Engineering Department University of Connecticut ion@engr.uconn.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649ed85503460f94be64a1/html5/thumbnails/33.jpg)
EM Algorithm
1. Start with random transcript frequencies (figure example: 0.2 for each of five transcripts)
2. Fractionally allocate reads to transcripts (figure example: uniquely mapped reads get weight 1; ambiguous reads are split 0.5/0.5)
3. Compute expected #reads for each transcript (figure example: 0.5, 2.5, 0.5, 1, 1.5)
![Page 34: Computational Methods for Analysis of Single Cell RNA-Seq Data Ion Măndoiu Computer Science & Engineering Department University of Connecticut ion@engr.uconn.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649ed85503460f94be64a1/html5/thumbnails/34.jpg)
EM Algorithm
1. Start with random transcript frequencies
2. Fractionally allocate reads to transcripts
3. Compute expected #reads for each transcript (figure example: 0.5, 2.5, 0.5, 1, 1.5)
4. Update transcript frequencies using maximum likelihood estimates (figure example: 0.5/6, 2.5/6, 0.5/6, 1/6, 1.5/6)
![Page 35: Computational Methods for Analysis of Single Cell RNA-Seq Data Ion Măndoiu Computer Science & Engineering Department University of Connecticut ion@engr.uconn.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649ed85503460f94be64a1/html5/thumbnails/35.jpg)
EM Algorithm
1. Start with random transcript frequencies
2. Fractionally allocate reads to transcripts
3. Compute expected #reads for each transcript
4. Update transcript frequencies using maximum likelihood estimates (figure example: 0.5/6, 2.5/6, 0.5/6, 1/6, 1.5/6)
5. Repeat steps 2-4 until convergence
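The five steps above can be sketched as a short loop. This is a simplified illustration, not the IsoEM or RSEM implementation: it ignores transcript lengths and alignment weights, starts from uniform rather than random frequencies, and uses hypothetical names (`em_step`, `run_em`) and toy reads.

```python
def em_step(read_compat, freqs):
    """One EM iteration over reads given as lists of compatible transcript indices."""
    expected = [0.0] * len(freqs)
    for transcripts in read_compat:
        total = sum(freqs[t] for t in transcripts)
        if total == 0:
            continue  # no compatible transcript has support; skip this read
        for t in transcripts:
            # E-step: fraction of this read allocated to transcript t.
            expected[t] += freqs[t] / total
    # M-step: maximum likelihood frequency = expected reads / total reads.
    n_reads = len(read_compat)
    return [e / n_reads for e in expected]

def run_em(read_compat, n_transcripts, tol=1e-9, max_iter=1000):
    """Iterate em_step from a uniform start until frequencies stop changing."""
    freqs = [1.0 / n_transcripts] * n_transcripts
    for _ in range(max_iter):
        new_freqs = em_step(read_compat, freqs)
        if max(abs(a - b) for a, b in zip(freqs, new_freqs)) < tol:
            return new_freqs
        freqs = new_freqs
    return freqs

# Hypothetical data: one read unique to transcript 0, one shared by transcripts 0 and 1.
reads = [[0], [0, 1]]
after_one_step = em_step(reads, [0.5, 0.5])
final_freqs = run_em(reads, n_transcripts=2)
```

With these two reads the shared read is pulled toward transcript 0 at every iteration, since only transcript 0 has unambiguous support.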
![Page 36: Computational Methods for Analysis of Single Cell RNA-Seq Data Ion Măndoiu Computer Science & Engineering Department University of Connecticut ion@engr.uconn.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649ed85503460f94be64a1/html5/thumbnails/36.jpg)
Detected genes/cell -- main population
Detected genes/cell -- minor population
Detected genes/cell -- bi-modal distribution
![Page 37: Computational Methods for Analysis of Single Cell RNA-Seq Data Ion Măndoiu Computer Science & Engineering Department University of Connecticut ion@engr.uconn.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649ed85503460f94be64a1/html5/thumbnails/37.jpg)
Batch effects can be larger than biological effects, but can be corrected by normalization procedures
CPM & TPM datasets pre-quantile normalization
CPM & TPM datasets post-quantile normalization
![Page 38: Computational Methods for Analysis of Single Cell RNA-Seq Data Ion Măndoiu Computer Science & Engineering Department University of Connecticut ion@engr.uconn.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649ed85503460f94be64a1/html5/thumbnails/38.jpg)
Quantile normalization (Irizarry et al. 2002)
• Shifts CPM/FPKM/TPM values for each cell to match a reference distribution (e.g., distribution of means)
  – Highest value gets matched to highest value in reference
  – 2nd highest gets mapped to 2nd highest value in reference
  – And so on
Distribution of TPMs
Reference distribution
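A minimal sketch of the rank-matching procedure described on this slide follows, using the distribution of means as the reference. The function name `quantile_normalize` and the toy values are assumptions. Ties are broken by gene order here, whereas production implementations usually average tied ranks.

```python
def quantile_normalize(cells):
    """cells: list of equal-length expression vectors, one per cell."""
    n_genes = len(cells[0])
    sorted_cells = [sorted(values) for values in cells]
    # Reference distribution: mean of the k-th smallest value across all cells.
    reference = [sum(column) / len(cells) for column in zip(*sorted_cells)]
    normalized = []
    for values in cells:
        # Gene indices ordered from lowest to highest expression in this cell.
        order = sorted(range(n_genes), key=lambda g: values[g])
        result = [0.0] * n_genes
        for rank, gene in enumerate(order):
            result[gene] = reference[rank]
        normalized.append(result)
    return normalized

# Two hypothetical cells with three genes each.
normalized = quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]])
```

After normalization every cell has exactly the same set of values; only the assignment of values to genes differs.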
![Page 39: Computational Methods for Analysis of Single Cell RNA-Seq Data Ion Măndoiu Computer Science & Engineering Department University of Connecticut ion@engr.uconn.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649ed85503460f94be64a1/html5/thumbnails/39.jpg)
Principal Component Analysis
• Linear transformation of the data:
  – 1st component = direction of max. variance
  – 2nd component = orthogonal to 1st, max. residual variance
• Used for dimensionality reduction (ignore high components)
  – Visualization for exploratory analysis
  – Feature selection
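The first component, the direction of maximum variance, can be found with power iteration on the covariance matrix, as in the sketch below. This is one of several ways to compute it (SVD is more common in practice). The function name, the starting vector, and the iteration count are assumptions made for illustration.

```python
import math

def first_principal_component(data, iterations=200):
    """Return a unit vector along the direction of maximum variance."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    # Sample covariance matrix of the centered data.
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Arbitrary non-zero start; it must not be orthogonal to the true component.
    v = [1.0] + [0.5] * (d - 1)
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            return v  # data has no variance
        v = [x / norm for x in w]
    return v

# Hypothetical points lying exactly on the line y = x.
pc1 = first_principal_component([[0, 0], [1, 1], [2, 2], [3, 3]])
```

Projecting each cell onto the first few such components gives the low-dimensional coordinates used for visualization.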
![Page 40: Computational Methods for Analysis of Single Cell RNA-Seq Data Ion Măndoiu Computer Science & Engineering Department University of Connecticut ion@engr.uconn.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649ed85503460f94be64a1/html5/thumbnails/40.jpg)
What makes a good clustering?
• Homogeneity: Elements within a cluster are close to each other
• Separation: Elements in different clusters are further apart from each other
[Figure: bad clustering vs. good clustering]
![Page 41: Computational Methods for Analysis of Single Cell RNA-Seq Data Ion Măndoiu Computer Science & Engineering Department University of Connecticut ion@engr.uconn.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649ed85503460f94be64a1/html5/thumbnails/41.jpg)
Many clustering algorithms!

| Algorithm | Parameters |
|---|---|
| K-means | K = number of clusters |
| Fuzzy c-means clustering (FCM) | K = number of clusters; d = degree of fuzziness |
| Hierarchical clustering (HCS) | Metric = euclidean, seuclidean, cityblock, minkowski, chebychev, cosine, correlation, spearman; Method = average, centroid, complete, median, single |
| EM clustering | K = number of clusters; S = number of initial seeds; I = number of iterations |
| SNN-Cliq | n = size of the nearest neighbor list; r = density threshold of quasi-cliques; m = threshold on the overlapping rate for merging |
![Page 42: Computational Methods for Analysis of Single Cell RNA-Seq Data Ion Măndoiu Computer Science & Engineering Department University of Connecticut ion@engr.uconn.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649ed85503460f94be64a1/html5/thumbnails/42.jpg)
K-Means Clustering
• Goal: find K clusters minimizing the mean squared distance from data points to corresponding cluster centroids
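A common heuristic for this objective is Lloyd's algorithm, which alternates between assigning points to the nearest centroid and moving each centroid to the mean of its points. The sketch below is a minimal version that finds a local optimum only. Initializing from the first K points is a simplifying assumption, and the function name and data are hypothetical.

```python
def kmeans(points, k, max_iter=100):
    """Return (assignment, centroids) after Lloyd iterations."""
    centroids = [list(p) for p in points[:k]]  # naive init: first k points
    assignment = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_assignment = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignment, centroids

# Hypothetical expression values in two conditions, forming three groups.
points = [(1.0, 1.0), (4.0, 4.0), (1.0, 4.0), (1.2, 0.8), (4.2, 3.8), (0.8, 4.2)]
assignment, centroids = kmeans(points, k=3)
```

Because the result depends on the starting centroids, practical implementations typically run several random restarts and keep the best solution.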
![Page 43: Computational Methods for Analysis of Single Cell RNA-Seq Data Ion Măndoiu Computer Science & Engineering Department University of Connecticut ion@engr.uconn.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649ed85503460f94be64a1/html5/thumbnails/43.jpg)
K-Means Clustering
[Figure: scatter plot of expression in condition 1 vs. expression in condition 2 (both axes 0–5) with cluster centroids k1, k2, k3]
![Page 44: Computational Methods for Analysis of Single Cell RNA-Seq Data Ion Măndoiu Computer Science & Engineering Department University of Connecticut ion@engr.uconn.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649ed85503460f94be64a1/html5/thumbnails/44.jpg)
K-Means Clustering
[Figure: scatter plot of expression in condition 1 vs. expression in condition 2 (both axes 0–5) with cluster centroids k1, k2, k3]
![Page 45: Computational Methods for Analysis of Single Cell RNA-Seq Data Ion Măndoiu Computer Science & Engineering Department University of Connecticut ion@engr.uconn.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649ed85503460f94be64a1/html5/thumbnails/45.jpg)
K-Means Clustering
[Figure: scatter plot of expression in condition 1 vs. expression in condition 2 (both axes 0–5) with cluster centroids k1, k2, k3]
![Page 46: Computational Methods for Analysis of Single Cell RNA-Seq Data Ion Măndoiu Computer Science & Engineering Department University of Connecticut ion@engr.uconn.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649ed85503460f94be64a1/html5/thumbnails/46.jpg)
K-Means Clustering
[Figure: scatter plot of expression in condition 1 vs. expression in condition 2 (both axes 0–5) with cluster centroids k1, k2, k3]
![Page 47: Computational Methods for Analysis of Single Cell RNA-Seq Data Ion Măndoiu Computer Science & Engineering Department University of Connecticut ion@engr.uconn.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649ed85503460f94be64a1/html5/thumbnails/47.jpg)
Accuracy measures

| Measure | Definition |
|---|---|
| Purity | U: set of ground truth classes; V: set of the computed clusters; N: total # of objects in dataset |
| Adjusted Rand Index (AR) | |
| Rand Index (RI) | RI = (TP+TN)/(TP+FP+FN+TN) |
| F1 Score | F1 Score = 2×TP/(2×TP+FP+FN) |
| Mirkin’s index (MI) | Counts the disagreements in data pairs between two clusterings: the ratio of the number of disagreeing pairs to the total number of pairs. A lower value indicates better clustering. |
| Hubert’s index (HI) | HI = RI – MI |
| Corr | Maximum weighted Pearson correlation between the average expression value of each ground truth class and each computed cluster |
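The pair-counting measures in this table can be sketched as below, where TP counts pairs placed together in both the ground truth and the clustering, FP pairs together only in the clustering, FN pairs together only in the ground truth, and TN the rest. The purity function uses the standard definition (the fraction of objects that belong to the majority class of their cluster), which is an assumption here because the slide's own formula did not survive extraction. All function names and labels are hypothetical.

```python
from collections import Counter
from itertools import combinations

def pair_counts(truth, predicted):
    """Count TP, FP, FN, TN over all unordered pairs of objects."""
    tp = fp = fn = tn = 0
    for i, j in combinations(range(len(truth)), 2):
        same_class = truth[i] == truth[j]
        same_cluster = predicted[i] == predicted[j]
        if same_class and same_cluster:
            tp += 1
        elif same_cluster:
            fp += 1
        elif same_class:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def rand_index(truth, predicted):
    tp, fp, fn, tn = pair_counts(truth, predicted)
    return (tp + tn) / (tp + fp + fn + tn)

def f1_score(truth, predicted):
    tp, fp, fn, _ = pair_counts(truth, predicted)
    return 2 * tp / (2 * tp + fp + fn)

def mirkin_index(truth, predicted):
    # Ratio of disagreeing pairs to all pairs, as described in the table.
    tp, fp, fn, tn = pair_counts(truth, predicted)
    return (fp + fn) / (tp + fp + fn + tn)

def purity(truth, predicted):
    # Standard definition (assumed): majority-class members summed over clusters, over N.
    majority_total = 0
    for cluster in set(predicted):
        labels = [t for t, p in zip(truth, predicted) if p == cluster]
        majority_total += Counter(labels).most_common(1)[0][1]
    return majority_total / len(truth)

# Hypothetical ground truth classes and computed cluster labels for six cells.
truth = ["a", "a", "a", "b", "b", "b"]
predicted = [0, 0, 1, 1, 1, 1]
ri = rand_index(truth, predicted)
mi = mirkin_index(truth, predicted)
hubert = ri - mi
```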
![Page 48: Computational Methods for Analysis of Single Cell RNA-Seq Data Ion Măndoiu Computer Science & Engineering Department University of Connecticut ion@engr.uconn.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649ed85503460f94be64a1/html5/thumbnails/48.jpg)
Accuracy comparison (Pollen et al. 2014, MiSeq)
![Page 49: Computational Methods for Analysis of Single Cell RNA-Seq Data Ion Măndoiu Computer Science & Engineering Department University of Connecticut ion@engr.uconn.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649ed85503460f94be64a1/html5/thumbnails/49.jpg)
Accuracy comparison (Pollen et al. 2014, HiSeq)
![Page 50: Computational Methods for Analysis of Single Cell RNA-Seq Data Ion Măndoiu Computer Science & Engineering Department University of Connecticut ion@engr.uconn.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649ed85503460f94be64a1/html5/thumbnails/50.jpg)
Accuracy comparison (Zeisel et al. 2015)
![Page 51: Computational Methods for Analysis of Single Cell RNA-Seq Data Ion Măndoiu Computer Science & Engineering Department University of Connecticut ion@engr.uconn.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649ed85503460f94be64a1/html5/thumbnails/51.jpg)
Tests for differential gene expression must take both fold change and statistical significance into account
[Figure: example comparisons with FC = 2, FC = 2, and FC = 1.5; asterisks mark DE calls]
![Page 52: Computational Methods for Analysis of Single Cell RNA-Seq Data Ion Măndoiu Computer Science & Engineering Department University of Connecticut ion@engr.uconn.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649ed85503460f94be64a1/html5/thumbnails/52.jpg)
• Many reliable DE methods for data with replicates
  – edgeR [Robinson et al., 2010]
  – DESeq [Anders et al., 2010]
• When no or few replicates are available, bootstrapping provides a robust alternative
![Page 53: Computational Methods for Analysis of Single Cell RNA-Seq Data Ion Măndoiu Computer Science & Engineering Department University of Connecticut ion@engr.uconn.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649ed85503460f94be64a1/html5/thumbnails/53.jpg)
Sensitivity results on Illumina MCF-7 data with varying number of replicates and minimum fold change 1.5
![Page 54: Computational Methods for Analysis of Single Cell RNA-Seq Data Ion Măndoiu Computer Science & Engineering Department University of Connecticut ion@engr.uconn.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649ed85503460f94be64a1/html5/thumbnails/54.jpg)
[Figure: enrichment analysis workflow]
• Experimental data → gene expression table
• A priori knowledge + existing experimental data → gene-set databases
• Enrichment test → enrichment table (e.g., Spindle 0.00001, Apoptosis 0.00025)
• Interpretation & hypotheses
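The slide does not say which enrichment test produces the p-values in the table. One common choice is the one-sided hypergeometric test, sketched below under that assumption. The function and parameter names are hypothetical, and `math.comb` requires Python 3.8 or later.

```python
from math import comb

def hypergeom_enrichment_p(n_background, n_in_set, n_selected, n_overlap):
    """P(overlap >= n_overlap) when n_selected genes are drawn without replacement
    from n_background genes, of which n_in_set belong to the gene set."""
    total = comb(n_background, n_selected)
    upper = min(n_in_set, n_selected)
    favorable = sum(
        comb(n_in_set, i) * comb(n_background - n_in_set, n_selected - i)
        for i in range(n_overlap, upper + 1)
    )
    return favorable / total

# Toy example: 10 genes, 5 in the gene set, 5 differentially expressed, all 5 in the set.
p_value = hypergeom_enrichment_p(10, 5, 5, 5)
```

In practice such p-values are computed for many gene sets at once and then corrected for multiple testing.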
![Page 55: Computational Methods for Analysis of Single Cell RNA-Seq Data Ion Măndoiu Computer Science & Engineering Department University of Connecticut ion@engr.uconn.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649ed85503460f94be64a1/html5/thumbnails/55.jpg)
http://david.abcc.ncifcrf.gov/
![Page 56: Computational Methods for Analysis of Single Cell RNA-Seq Data Ion Măndoiu Computer Science & Engineering Department University of Connecticut ion@engr.uconn.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649ed85503460f94be64a1/html5/thumbnails/56.jpg)
http://www.genemania.org/
![Page 57: Computational Methods for Analysis of Single Cell RNA-Seq Data Ion Măndoiu Computer Science & Engineering Department University of Connecticut ion@engr.uconn.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649ed85503460f94be64a1/html5/thumbnails/57.jpg)
![Page 58: Computational Methods for Analysis of Single Cell RNA-Seq Data Ion Măndoiu Computer Science & Engineering Department University of Connecticut ion@engr.uconn.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649ed85503460f94be64a1/html5/thumbnails/58.jpg)
Conclusions
• The range of single-cell applications continues to expand, fueled by advances in microfluidics technology and library prep protocols
  – ATAC-Seq, GT-Seq, Methyl-Seq, …
• Primary analysis is compute intensive
  – Requires server/cluster/cloud + linux + scripting
  – Galaxy framework (https://usegalaxy.org/) provides web-based interface to many tools
• Most secondary/tertiary analyses can be done on a PC/Mac
  – Using the R environment (some programming)
  – Many can be done using web-based tools and user-friendly apps (we’ll use JMP)
![Page 59: Computational Methods for Analysis of Single Cell RNA-Seq Data Ion Măndoiu Computer Science & Engineering Department University of Connecticut ion@engr.uconn.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649ed85503460f94be64a1/html5/thumbnails/59.jpg)
Conclusions
• Development of single-cell specific analysis methods critical for fully realizing the potential of the technology
  – Allele specific expression
  – Biomarker selection
  – Cell type assignment
  – Lineage reconstruction
  – Characterization of heterogeneity
• Joint analysis of bulk and single cell data still needed to get unbiased cell type frequencies
  – Can also identify and characterize cell types missed by current capture protocols
![Page 60: Computational Methods for Analysis of Single Cell RNA-Seq Data Ion Măndoiu Computer Science & Engineering Department University of Connecticut ion@engr.uconn.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649ed85503460f94be64a1/html5/thumbnails/60.jpg)
Single cells AND (not “or”) computational deconvolution
![Page 61: Computational Methods for Analysis of Single Cell RNA-Seq Data Ion Măndoiu Computer Science & Engineering Department University of Connecticut ion@engr.uconn.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649ed85503460f94be64a1/html5/thumbnails/61.jpg)
Acknowledgements
Sahar Al Seesi
Marius Nicolae
Elham Sherafat
Craig Nelson
Adrian Caciula
Serghei Mangul
Yvette Temate Tiagueu
Alex Zelikovsky
Edward Hemphill
James Lindsay