gene expression analyses - welcome to sandberg...
TRANSCRIPT
![Page 1: Gene Expression Analyses - welcome to sandberg labsandberg.cmb.ki.se/media/data/courses/bioinfocell/Gene... · 2014-05-22 · Gene quanti"cation and mRNA copy numbers in cells C N](https://reader034.vdocuments.mx/reader034/viewer/2022042101/5e7dfd3dc71ddd3942347943/html5/thumbnails/1.jpg)
Rickard Sandberg
Gene Expression Analyses
Assistant Professor, Ludwig Institute for Cancer Research, Department of Cell and Molecular Biology, Karolinska Institutet
Outline
- microarrays
- RNA-Seq
- Common gene expression analyses steps
- clustering of samples
- differential expression tests
- enrichment tests
Transcriptome analyses
- rRNAs (dominating, ~95%)
- mRNAs (~5%)
- long non-coding RNAs (e.g. lincRNAs) (~0.05%)
- snoRNAs, snRNAs
- microRNAs, piRNAs
Different protocols identify different parts of the transcriptome
PolyA selection
- rRNAs (dominating, ~95%)
- mRNAs (~5%)
- long non-coding RNAs (e.g. lincRNAs) (~0.05%)
- snoRNAs, snRNAs
- microRNAs, piRNAs
Different protocols identify different parts of the transcriptome
Ribominus (removal of ribosomal RNAs)
Not-so-random (NSR) hexamers or DSN (duplex-specific nuclease)
- rRNAs (dominating, ~95%)
- mRNAs (~5%)
- long non-coding RNAs (e.g. lincRNAs) (~0.05%)
- snoRNAs, snRNAs
- microRNAs, piRNAs
Different protocols identify different parts of the transcriptome
small RNA protocol
- rRNAs (dominating, ~95%)
- mRNAs (~5%)
- long non-coding RNAs (e.g. lincRNAs) (~0.05%)
- snoRNAs, snRNAs
- microRNAs, piRNAs
DNA microarrays
- Oligonucleotide arrays (Affymetrix, Agilent, Illumina, etc.)
- cDNA microarrays (competitive hybridization)
Important Considerations
- Microarrays were designed based on EST clusters
- Probes mapping at multiple locations
- Multiple probe sets mapping to the same gene
- Many projects curated microarray probes to only allow for uniquely mapping ones, e.g. customCDF: http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/genomic_curated_CDF.asp
Basis of Microarrays
Steps in microarray analyses
- Start with raw data (for Affymetrix arrays: CEL files)
- Normalize
  - removes systematic signal-strength biases between arrays
  - often quantile normalization
- Background adjust / transform
  - tries to separate signal from background
  - log2 transform (makes ratios symmetric, stabilizes variance)
- Gene (or probe set) summarization
  - median polish (a robust average of the probes targeting the same gene/transcript/probe set)
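The preprocessing steps above can be sketched in a few lines. This is a minimal illustration, assuming a probes-by-arrays NumPy matrix; the function names `quantile_normalize` and `median_polish_summary` and the toy intensities are made up for this example and are not taken from any array-analysis package.

```python
import numpy as np

def quantile_normalize(x):
    """Give every column (array) of x the same empirical distribution."""
    ranks = np.argsort(np.argsort(x, axis=0), axis=0)  # rank of each value within its column
    reference = np.sort(x, axis=0).mean(axis=1)        # mean of the k-th smallest values across arrays
    return reference[ranks]

def median_polish_summary(x, n_iter=10):
    """Tukey median polish on a probes x arrays matrix; returns one value per array."""
    resid = np.asarray(x, dtype=float).copy()
    overall = 0.0
    row = np.zeros(resid.shape[0])  # probe effects
    col = np.zeros(resid.shape[1])  # array effects
    for _ in range(n_iter):
        row_med = np.median(resid, axis=1)
        resid -= row_med[:, None]
        row += row_med
        shift = np.median(col)
        col -= shift
        overall += shift
        col_med = np.median(resid, axis=0)
        resid -= col_med[None, :]
        col += col_med
        shift = np.median(row)
        row -= shift
        overall += shift
    return overall + col

# Hypothetical 4 probes x 3 arrays matrix for the normalization step.
toy = np.array([[5.0, 4.0, 3.0], [2.0, 1.0, 4.0], [3.0, 4.0, 6.0], [4.0, 2.0, 8.0]])
normalized = quantile_normalize(toy)

# Hypothetical probe set: 3 probes x 2 arrays of raw intensities.
raw = np.array([[1024.0, 2048.0], [2048.0, 4096.0], [4096.0, 8192.0]])
summary = median_polish_summary(np.log2(raw))  # one log2 expression value per array
```

Median polish is used here instead of a plain mean because a single misbehaving probe then has little influence on the probe set value.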
Gene Expression - Microarray data
- Repositories of raw and processed data:
  - Gene Expression Omnibus (GEO): http://www.ncbi.nlm.nih.gov/geo/
  - ArrayExpress: http://www.ebi.ac.uk/microarray-as/ae/
- Databases with gene expression atlases:
  - Human, mouse and rat tissue atlas: SymAtlas / BioGPS, http://biogps.gnf.org/
  - Cancer gene expression atlas: Oncomine, www.oncomine.org
In what tissues is my gene expressed? Using BioGPS (formerly SymAtlas)
http://biogps.gnf.org/
Finding experiments where my gene is differentially expressed
ArrayExpress, GEO
- Do not use updated CDFs (probe-to-transcript mappings)
- Constantly evolving (hard to reproduce years later)
- Offer no quality control
- Limited capabilities for more comprehensive analyses
What are the methods measuring?
- Expressed Sequence Tags
- Traditional 3'UTR-focused microarrays
- Exon and tiling arrays
- Deep sequencing using Illumina/Solexa, SOLiD, (454)
mRNA-seq protocol
Isolate polyA+ RNA
Wang et al. 2009 Nat Rev Genet
- polyA+ RNAs
- rRNA-depleted (rRNA-) RNAs
- short RNAs (e.g. miRNAs)
- Ribosome footprint sequencing
- GRO-Seq (Global Run On sequencing)
- CLIP-Seq (RNA-protein interactions)
- non-RNA applications: ChIP-Seq, DNAse hypersensitive sites,...
Strand-specific RNA-Seq protocols
Mapping of splice junctions
1. Compile sets of junctions: Exon n | GTAAGT-----------AG | Exon n+1
2. Map reads towards genome + junction compilation: genome chromosome FASTA files + known and putative splice junctions FASTA file
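Step 1 above, compiling junction sequences, can be sketched as follows. This is a toy illustration only, assuming a single chromosome held as a Python string and exons given as 0-based, end-exclusive coordinates; the function name `junction_sequences` and the example sequence are hypothetical.

```python
def junction_sequences(genome, exons, read_len):
    """Build one sequence per exon-exon pairing (upstream exon joined to a downstream exon).

    genome   : str, a single chromosome sequence
    exons    : list of (start, end) tuples, 0-based, end-exclusive, sorted by start
    read_len : read length; each side contributes at most read_len - 1 bases,
               so a read aligned within a junction sequence must cross the junction
    """
    flank = read_len - 1
    library = {}
    for i, (s1, e1) in enumerate(exons):
        for s2, e2 in exons[i + 1:]:
            left = genome[max(s1, e1 - flank):e1]     # end of the upstream exon
            right = genome[s2:min(e2, s2 + flank)]    # start of the downstream exon
            library[(e1, s2)] = left + right
    return library

# Toy chromosome: exon, intron with GT...AG boundaries, exon.
genome = "AAACCC" + "GTAAGT" + "T" * 6 + "AG" + "GGGTTT"
exons = [(0, 6), (20, 26)]
library = junction_sequences(genome, exons, read_len=4)
```

The resulting sequences would be written to a FASTA file and searched together with the chromosome files in step 2.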
TopHat first method: identifying the transcriptome
1. Identify candidate exons (A, B, C) via genomic mapping
2. Generate possible pairings of exons
3. Align "unmappable" reads to possible junctions
Longer reads
>HWI-EAS229_75_30DY0AAXX:7:1:0:949
GATGTTCTCAGTGTCC GATGTAATCAGTGTCC AACCCTCTCAGTGTCC
Very long (100 kb+) intron
By segmenting the long reads and mapping the segments independently, we can look harder for junctions we might have missed with shorter reads.
Running time is independent of intron size.
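The segment idea can be illustrated with a toy sketch. It assumes exact matching with `str.find`, uniquely mapping segments, and a single junction that falls inside a middle segment; the function name `find_junction` and the sequences are hypothetical, and real aligners use indexed, mismatch-tolerant search instead.

```python
def find_junction(genome, read, seg_len):
    """Locate one splice junction from independently mapped read segments.

    Returns (donor_end, acceptor_start) in genome coordinates, or None.
    """
    segments = [read[i:i + seg_len] for i in range(0, len(read), seg_len)]
    positions = [genome.find(seg) for seg in segments]  # -1 when a segment does not map
    for k in range(1, len(segments) - 1):
        if positions[k] != -1 or positions[k - 1] < 0 or positions[k + 1] < 0:
            continue
        # The unmapped middle segment must span the junction between its mapped neighbours.
        left_end = positions[k - 1] + len(segments[k - 1])
        right_start = positions[k + 1]
        mid = segments[k]
        for split in range(1, len(mid)):
            tail = len(mid) - split
            if (genome[left_end:left_end + split] == mid[:split]
                    and genome[right_start - tail:right_start] == mid[split:]):
                return left_end + split, right_start - tail
    return None

# Toy gene: two exons separated by an intron with GT...AG boundaries.
exon1 = "CATTGCCAGGTC"
intron = "GTAAGT" + "A" * 30 + "TTTCAG"
exon2 = "TGCACGGATCCA"
toy_genome = exon1 + intron + exon2
toy_read = exon1[-9:] + exon2[:9]          # 18 nt read crossing the junction
junction = find_junction(toy_genome, toy_read, seg_len=6)
```

Only the split point inside one segment is searched, so the work in this step does not grow with the intron length, which matches the point made on the slide.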
Mapping to transcriptome
[Figure: gene structure (5'UTR, exons, introns, 3'UTR) on genomic DNA (W and C strands); transcription gives the pre-mRNA; RNA processing (splicing, polyadenylation) gives the polyadenylated mRNA.]
Microexons and junction coverage
[Figure: gene structure (5'UTR, exons, introns, 3'UTR) on genomic DNA (W and C strands); in-house mapping compared with TopHat mapping.]
2 or more splice junctions within the same read
Different read lengths will have different problems!
Finding novel non-annotated genes or transcript variants
Example of STAR aligned single-cell RNA-Seq data
- Mapping speed: 308 M reads / hour
- % uniquely mapping: 60
- % multimapping: 25
- % unmapped: 15
- 281 719 splice junctions
  - 279 356 with GT/AG
  - 2 123 with GC/AG
  - 215 with AT/AC
RNA-Seq generates quantitative expression estimates
[Figure: log10(reads) coverage tracks for testes, liver, skeletal muscle and heart; transcripts AK074759, BC011574, AK092689. Wang*, Sandberg* et al. Nature 2008]
[Figure: Brain reads / UHR reads (RNA-Seq) plotted against Brain expression / UHR expression (TaqMan), both axes from 10^-4 to 10^4; under 10 M reads; R = 0.953, slope = 0.933. Mortazavi et al. Nat Methods 2008; Ramskold et al. PLoS Comp Biol 2009]
[Figure: RPKM by genomic region: exon 12.3, intron 0.13, intergenic 0.10; 150x.]
How gene expression levels are estimated
gene A (2 kb transcript), gene B (600 bp transcript)
Fragmentation, then sequencing: ACGCG... TCGAG... AGGTA... CCGTG... CTGCG...
The number of fragments is proportional to the abundance and length of the transcript.
Normalize for different transcript lengths and different sequencing depths in different samples.
RPKM (reads per kilobase and million mappable reads). Given 10 million mappable reads:
RPKM, gene A: $500 \text{ reads} \times \frac{1000}{2000} \times \frac{10^{6}}{10^{7}} = \frac{500}{2 \times 10} = 25$ RPKM
RPKM roughly corresponds to transcripts per cell (Mortazavi et al. 2008), assuming a standard cell with ~300,000 transcripts.
Fragments per kilobase and million (FPKM)
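The RPKM arithmetic above, written as a small function. This is a sketch: the function name `rpkm` is hypothetical, and the read count for gene B is an assumed value chosen to show that a shorter transcript at the same abundance yields fewer reads but the same RPKM.

```python
def rpkm(reads, transcript_length_nt, mappable_reads):
    """Reads per kilobase of transcript and million mappable reads."""
    return reads * 1e9 / (transcript_length_nt * mappable_reads)

# Gene A from the slide: 500 reads, 2 kb transcript, 10 million mappable reads.
gene_a = rpkm(500, 2_000, 10_000_000)

# Gene B (600 bp): the slide gives no read count; 150 reads is an assumed value.
gene_b = rpkm(150, 600, 10_000_000)
```

For paired-end data, FPKM uses the same formula with fragments counted instead of individual reads.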
Gene quantification and mRNA copy numbers in cells
$$\frac{C}{N} = \frac{X L}{T}$$

$$X = \frac{C}{N L}\,T = \frac{R}{10^{6} \cdot 10^{3}}\,T = \frac{R\,T}{10^{9}}$$

- C, number of reads mapping to transcript
- N, total number of sequenced reads
- X, copies per cell of transcript
- T, total length of transcriptome
- L, transcript length
- R, RPKM (reads per kilobase and million mappable reads)

T can be estimated from
1. starting amount of mRNA
2. spiked-in controls
3. estimated transcriptome length: 300,000 transcripts of around 1,500 nt each -> $4.5 \times 10^{8}$ nt, so 1 RPKM ~ 0.5 transcripts per cell
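The conversion from RPKM to copies per cell follows directly from X = R T / 10^9. A minimal sketch, with the hypothetical function name `copies_per_cell` and the transcriptome size assumed on the slide (300,000 transcripts of about 1,500 nt):

```python
def copies_per_cell(rpkm_value, transcriptome_length_nt):
    """Transcript copies per cell, X = R * T / 10^9."""
    return rpkm_value * transcriptome_length_nt / 1e9

# Transcriptome length T under the slide's assumption.
transcriptome_nt = 300_000 * 1_500          # 4.5e8 nt

one_rpkm = copies_per_cell(1, transcriptome_nt)    # about 0.45, i.e. roughly 0.5 copies per cell
gene_a = copies_per_cell(25, transcriptome_nt)     # a 25 RPKM gene under the same assumption
```

The result scales linearly with T, so a cell type with more or less total mRNA shifts every estimate by the same factor.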
Depth needed for accurate expression level estimation
[Figure, two panels, x-axis: million mapped reads (1 to 45). One panel shows the percentage of genes within ±20% of final expression for the expression bins 1-9 RPKM (n=4338), 10-29 RPKM (n=3048), 30-99 RPKM (n=2817), 100-999 RPKM (n=1469) and 1000-6705 RPKM (n=56). The other shows the percentage of genes within a given fold-change of final expression: 2-fold, 1.5-fold, 1.2-fold, 1.1-fold, 1.05-fold.]
Mortazavi et al. 2008; Ramskold/Kavak et al. 2011 (book chapter)
RNA sequencing of blastocyst-derived cell lines
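Read counts for selected genes

| Gene | ES | TS | XEN | EpiSC |
|---|---|---|---|---|
| Nanog | 6525 | 20 | 1 | 263 |
| Cdx2 | 124 | 6256 | 1 | 1 |
| Sox17 | 11 | 5 | 9814 | 99 |
| Sox3 | 151 | 1234 | 6 | 796 |
| Shh | 0 | 0 | 0 | 1 |
| Ihh | 4 | 12 | 107 | 17 |
| Dhh | 10 | 212 | 575 | 80 |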
Significance of expression level
- Background RPKM ~ 0.05 RPKM
- Detection level of 0.3 RPKM
- An average 1,500 nt transcript
- 20 M uniquely mapping reads

Background model: 0.05 x 1.5 x 20 = 1.5 reads
Expressed at 0.3 RPKM: 0.3 x 1.5 x 20 = 9 reads
A binomial test for 9 reads out of 20 M mapping to the transcript, given a background probability of $1.5 / (20 \times 10^{6})$, gives a p-value of 2.8e-5
Expressed at 1 RPKM: 1 x 1.5 x 20 = 30 reads
[Figure markers: 0.05 RPKM and 1 RPKM]
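The read-count arithmetic and the binomial test above can be reproduced with the standard library. This is a sketch under the slide's numbers; the helper names `expected_reads` and `binomial_upper_tail` are hypothetical, and the test is taken to be one-sided (9 or more reads).

```python
import math

def expected_reads(rpkm_value, transcript_kb, million_reads):
    """Expected read count = RPKM x transcript length (kb) x mapped reads (millions)."""
    return rpkm_value * transcript_kb * million_reads

def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed as 1 minus the exact lower tail."""
    lower = 0.0
    for i in range(k):
        log_term = (math.log(math.comb(n, i))
                    + i * math.log(p)
                    + (n - i) * math.log1p(-p))
        lower += math.exp(log_term)
    return 1.0 - lower

total_reads = 20_000_000
background = expected_reads(0.05, 1.5, 20)        # about 1.5 reads from background
observed = round(expected_reads(0.3, 1.5, 20))    # 9 reads at 0.3 RPKM
p_value = binomial_upper_tail(observed, total_reads, background / total_reads)
```

With these inputs the p-value comes out near 2.8e-5, in line with the figure quoted on the slide.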
Mixed species/strains experiments
- Mixed-species experiments allow mapping of host and pathogen interactions
- Parasite-host interactions
- Tumor-stroma interactions
Allele-sensitive RNA-seq using mouse crosses
Fusion events, e.g. translocations in cancer
Ozsolak and Milos, Nature Rev Genet 2011
Outline
- microarrays
- RNA-Seq
- Common gene expression analyses steps
- clustering of samples
- differential expression tests
- enrichment tests
Early Quality Control
[Figure: fraction of mapped reads (0.0 to 1.0) in the 20% at 5', the middle, and the 20% at 3' of genes, for SMARTer, optimized, and variants #1 to #4.]
Supplementary Figure 6. Read coverage across genes in single-cell RNA-Seq data. Fraction of reads mapping to the 20% 5'-most, the 20% 3'-most, and the 60% in the middle region for all individual single-cell transcriptome data from HEK293T cells. Variant protocols are as the optimized except for differences in volume of TSO used (variant #1 uses 2 µl instead of 1 µl), template switching oligo (variant #2 uses rGrG+N, variant #4 uses rGrGrG) or preamplification enzyme (variant #3 uses Advantage 2).
[Figure panels, each for SMARTer, optimized, and variants #1 to #4: (A) number of mismatches (1 to 9), as fraction of mapped reads; (B) read mapping (STAR to hg19), reads (%): uniquely mapping, multimapping, no match; (C) genomic regions, as fraction of mapped reads: exonic, intronic, intergenic.]
Supplementary Figure 2. Mapping statistics for single-cell libraries generated using SMARTer, optimized Smart-Seq and variants of the optimized protocol. (A) The fraction of uniquely aligned reads with 1 to 9 mismatches for each single-cell RNA-Seq library. (B) Percentage of reads that could be aligned uniquely, aligned to multiple genomic coordinates (multimapping) or did not align for all single-cell RNA-Seq libraries. (C) The fraction of uniquely aligned reads that mapped to exonic, intronic or intergenic regions (annotations based on RefSeq gene models). Variant protocols are as the optimized except for differences in volume of TSO used (variant #1 uses 2 µl instead of 1 µl), template switching oligo (variant #2 uses rGrG+N, variant #4 uses rGrGrG) or preamplification enzyme (variant #3 uses Advantage 2).
Biological QC
Look at replicates and check that samples group by origin/type
- Hierarchical clustering
- PCA / SVD
[Figure: samples plotted on SVD component 1 against SVD component 2 (both axes −100 to 150): PC3 (n=4), T24 (n=4), LNCaP (n=4).]
NCI60 cell line expression clustering
[Figure: hierarchical clustering of the NCI60 cell lines (colour scale 1.00 to -1.00), with tissue-of-origin groups leukaemia, colon, melanoma, CNS, renal, ovarian, breast, prostate and non-small-lung.]
- Ordering is pretty arbitrary
- Be careful about high-order clustering
Singular Value Decomposition (SVD)
[Figure: expression matrix of genes by 14 arrays (time points e_0m to e_390m in 30-minute steps) decomposed into eigenarrays (1 to 14) and eigengenes (1 to 14).]
QC: Similarities between replicates
SVD Analysis of Mouse T-cell Stimulation
[Figure: sample projection onto eigengene 1 (52%) and eigengene 2 (31%) for the 0 hr, 6 hr and 48 hr samples, with the eigengene profiles across 0 hr, 6 hr and 48 hr.]
Captures 83% of variation
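A sample projection like the one above, and the fraction of variation captured by each component, can be sketched with NumPy. The function name `svd_projection` and the simulated matrix (200 genes, three conditions in duplicate) are assumptions made for illustration; they are not the T-cell data from the slide.

```python
import numpy as np

def svd_projection(expr):
    """expr: genes x samples matrix on a log scale.

    Returns sample coordinates (samples x components) and the fraction
    of total variation captured by each component.
    """
    centered = expr - expr.mean(axis=1, keepdims=True)   # center each gene across samples
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    coords = (s[:, None] * vt).T
    return coords, explained

# Simulated data: one expression pattern at three levels, two replicates each, plus noise.
rng = np.random.default_rng(0)
pattern = rng.normal(size=200)
levels = np.array([0, 0, 1, 1, 2, 2])
expr = np.outer(pattern, levels) + rng.normal(scale=0.1, size=(200, 6))

coords, explained = svd_projection(expr)
top_two = explained[:2].sum()   # share of variation shown in a two-component plot
```

Plotting the first two columns of `coords` gives the kind of scatter in which replicates should fall close together.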
QC: Outliers
Embryoid bodies; Sonic Hedgehog induced
?
Differential Expression
Either based on reads or RPKM values
Most tools developed for microarrays are based on probe set expression values, whereas RNA-Seq tools aim to use read counts.
Reads
- have more statistical power
- have unresolved biases
- need fewer replicates?
Expression levels, RPKMs
- better understood statistics, but less power
Statistical models of differential expression
Statistical models of differential expression
Transcript length effects in differential expression tests
Oshlack and Wakefield Biology Direct 2009
p-values should not be the basis for sorting
non-coding RNAs in prostate cancer: Expression and differential expression
Enrichment analyses
Goals of enrichment analyses
Factors to consider
Gene Sets, e.g. pathways and gene ontology
- Gene Ontology
- KEGG
- BioCarta
- PANTHER
- Chromosomal location
- Genes found differentially expressed in another experiment
Two strategies
List-based enrichment analyses
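| | Gene in list | Gene NOT in list |
|---|---|---|
| In category | a | b |
| NOT in category | c | d |

[Figure: the fraction of all genes that are in the category compared with the fraction of the gene set that is in the category.]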
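One common way to score a 2x2 table like the one above is a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test). The slides do not state which test is intended, so this choice is an assumption, as are the function name `enrichment_p_value` and the example counts.

```python
from math import comb

def enrichment_p_value(a, b, c, d):
    """P(at least `a` list genes fall in the category) under random sampling.

    a: in category, in list        b: in category, not in list
    c: not in category, in list    d: not in category, not in list
    """
    total = a + b + c + d
    in_category = a + b
    in_list = a + c
    upper = min(in_category, in_list)
    favourable = sum(
        comb(in_category, k) * comb(total - in_category, in_list - k)
        for k in range(a, upper + 1)
    )
    return favourable / comb(total, in_list)

# Made-up example: 10,000 background genes, 100 in the category,
# 200 in the gene list, 20 of them in the category (2 expected by chance).
p_enriched = enrichment_p_value(20, 80, 180, 9720)

# Tiny table that can be checked by hand: (9 + 1) / 20 = 0.5.
p_small = enrichment_p_value(2, 1, 1, 2)
```

The total a + b + c + d is the background set, so changing which genes count as background changes the p-value.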
Assessing significance
DAVID
Query many types of gene sets in one go
Current Background: HOMO SAPIENS
Check Defaults
- Main Accessions (0 selected)
- Other Accessions (0 selected)
- Gene Ontology (3 selected)
- Protein Domains (3 selected)
- Pathways (3 selected)
- General Annotations (0 selected)
- Functional Categories (3 selected)
- Protein Interactions (0 selected)
- Literature (0 selected)
- Disease (1 selected)
- Tissue Expression
Gene set enrichment analyses (GSEA)
Molecular Signature db
Gene Ontology analyses
- Note: background matters. Choosing the wrong background set of genes may affect/confound your results
- Depends upon preselected categories
- List-dependent methods, e.g. DAVID, http://david.abcc.ncifcrf.gov/
- List-independent methods, e.g. GSEA, http://www.broad.mit.edu/gsea/
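To illustrate what "list-independent" means, here is a simplified running-sum statistic in the spirit of GSEA. It is an unweighted sketch, not the GSEA software itself: the function name `enrichment_score` and the gene names are hypothetical, and the permutation step used to assess significance is left out.

```python
def enrichment_score(ranked_genes, gene_set):
    """Walk down a ranked gene list; step up at set members, down otherwise.

    Returns the running-sum value with the largest absolute deviation from zero.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(hits) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    running = 0.0
    best = 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

# Ten genes ranked from most up-regulated to most down-regulated (made-up names).
ranked = [f"g{i}" for i in range(1, 11)]
score_top = enrichment_score(ranked, {"g1", "g2", "g3"})        # set concentrated at the top
score_bottom = enrichment_score(ranked, {"g8", "g9", "g10"})    # set concentrated at the bottom
```

No cut-off is needed to define a list of differentially expressed genes, because the whole ranking is used.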
Questions?