David-Emlyn Parfitt, Columbia Illumina seminar 11/9/2011
![Page 1: David-Emlyn Parfitt, Columbia Illumina seminar 11/9/2011](https://reader030.vdocuments.mx/reader030/viewer/2022020116/55a7bd951a28ab013f8b4681/html5/thumbnails/1.jpg)
David-Emlyn Parfitt
Shen Lab, Irving Cancer Research Center
Using RNA Seq to conduct systems-level analysis of
embryonic pluripotency, self-renewal and differentiation
![Page 2: David-Emlyn Parfitt, Columbia Illumina seminar 11/9/2011](https://reader030.vdocuments.mx/reader030/viewer/2022020116/55a7bd951a28ab013f8b4681/html5/thumbnails/2.jpg)
The molecular regulators of self-renewal and pluripotency are
not completely defined or characterized
mESC, hESC, mEpiSC
Mouse blastocyst (3.5 days)
Mouse egg cylinder (5.5 days)
Epiblast
Inner Cell Mass
≈
Human blastocyst (5-7 days)
Self-renewal and Pluripotency
Nanog
Oct4
Sox2
JAK-STAT
MAPK
Novel Master Regulators?
![Page 3: David-Emlyn Parfitt, Columbia Illumina seminar 11/9/2011](https://reader030.vdocuments.mx/reader030/viewer/2022020116/55a7bd951a28ab013f8b4681/html5/thumbnails/3.jpg)
150 Combinatory Chemical Treatments
Genome-Wide GEP Data
Algorithmic analysis (ARACNe, MINDy)
Master Regulator Analysis
Rank
ESC/EpiSC "Interactome"
In vitro and in vivo
validation
Defining the molecular networks associated with stem cell self-renewal, pluripotency and differentiation
Which tool to use for
expression profiling?
![Page 4: David-Emlyn Parfitt, Columbia Illumina seminar 11/9/2011](https://reader030.vdocuments.mx/reader030/viewer/2022020116/55a7bd951a28ab013f8b4681/html5/thumbnails/4.jpg)
Gene Expression Profiling:
Microarrays vs RNA-Sequencing
Arrays:
Well defined technique
High throughput
Discrete measurement
Background noise + batch effect
No distinction between isoforms/alleles
![Page 5: David-Emlyn Parfitt, Columbia Illumina seminar 11/9/2011](https://reader030.vdocuments.mx/reader030/viewer/2022020116/55a7bd951a28ab013f8b4681/html5/thumbnails/5.jpg)
Gene Expression Profiling:
Microarrays vs RNA-Sequencing
RNA Sequencing:
Total RNA
Fragment
Reverse-transcribe to cDNA
![Page 6: David-Emlyn Parfitt, Columbia Illumina seminar 11/9/2011](https://reader030.vdocuments.mx/reader030/viewer/2022020116/55a7bd951a28ab013f8b4681/html5/thumbnails/6.jpg)
Gene Expression Profiling:
Microarrays vs RNA-Sequencing
RNA Sequencing:
Total RNA*
Reverse-transcribe to cDNA
Single base resolution
Low background noise
Distinction of isoform and allelic expression
Low amount of RNA needed
Algorithmic and logistic challenge
Lengthy library preparation
*Including non-coding RNAs, depending on purification protocol
![Page 7: David-Emlyn Parfitt, Columbia Illumina seminar 11/9/2011](https://reader030.vdocuments.mx/reader030/viewer/2022020116/55a7bd951a28ab013f8b4681/html5/thumbnails/7.jpg)
RNA-Sequencing Methodology:
Deciding the parameters
Read length?
-Efficiency vs faithfulness
Single end or paired end reads?
-Efficiency vs faithfulness
-Alignment accuracy
Number of reads?
-Depth of coverage
-Cost
How many reads to effectively cover
the mouse exome (~50 Mb)?
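The depth-versus-cost trade-off above can be roughed out with simple arithmetic. The sketch below assumes reads are spread uniformly over the ~50 Mb target quoted above, which real RNA-seq data does not satisfy because abundant transcripts absorb most of the reads; the function names are illustrative only and the result is an upper-level estimate rather than a prediction.

```python
import math


def mean_coverage(n_reads: int, read_length_bp: int, target_size_bp: int) -> float:
    """Average per-base depth, assuming reads are spread uniformly over the target."""
    return n_reads * read_length_bp / target_size_bp


def reads_for_coverage(depth: float, read_length_bp: int, target_size_bp: int) -> int:
    """Number of reads needed to reach a given average depth (same uniformity assumption)."""
    return math.ceil(depth * target_size_bp / read_length_bp)


TARGET_BP = 50_000_000  # ~50 Mb, the figure quoted on the slide
READ_LENGTH_BP = 100    # read length considered in the talk

for n_reads in (20_000_000, 30_000_000, 100_000_000):
    depth = mean_coverage(n_reads, READ_LENGTH_BP, TARGET_BP)
    print(f"{n_reads:>11,} reads -> {depth:.0f}x average depth")
```

Average depth says nothing about rare transcripts, which is why the question is better answered empirically from the data.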
![Page 8: David-Emlyn Parfitt, Columbia Illumina seminar 11/9/2011](https://reader030.vdocuments.mx/reader030/viewer/2022020116/55a7bd951a28ab013f8b4681/html5/thumbnails/8.jpg)
Deciding the parameters:
How many 100 bp reads are necessary for comprehensive
coverage of the mouse genome?
RPKM:
Normalized measurement of transcript abundance
Reads per kilobase of exon model per million mapped
reads
RPKM for a particular transcript does not change
when the overall number of reads changes, and it is
the same for transcripts with the same abundance
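A minimal sketch of the RPKM calculation, using the standard definition (read count scaled by transcript length in kilobases and by library size in millions of mapped reads). The function name and the example numbers are made up for illustration; they are not data from the talk.

```python
def rpkm(read_count: int, exon_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of exon model per million mapped reads."""
    if exon_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (exon_length_bp * total_mapped_reads)


# Hypothetical 2 kb transcript sequenced at two depths.
shallow = rpkm(read_count=500, exon_length_bp=2_000, total_mapped_reads=20_000_000)
deep = rpkm(read_count=2_500, exon_length_bp=2_000, total_mapped_reads=100_000_000)
print(shallow, deep)
```

If the transcript's read count scales in proportion to library size, as in this idealized example, both depths give the same RPKM. With real data, sampling noise makes this only approximately true, most visibly for low-abundance transcripts.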
![Page 9: David-Emlyn Parfitt, Columbia Illumina seminar 11/9/2011](https://reader030.vdocuments.mx/reader030/viewer/2022020116/55a7bd951a28ab013f8b4681/html5/thumbnails/9.jpg)
![Page 10: David-Emlyn Parfitt, Columbia Illumina seminar 11/9/2011](https://reader030.vdocuments.mx/reader030/viewer/2022020116/55a7bd951a28ab013f8b4681/html5/thumbnails/10.jpg)
Deciding the parameters:
How many 100 bp reads are necessary for comprehensive
coverage of the mouse genome?
100 million 100 bp single-end (SE) reads
![Page 11: David-Emlyn Parfitt, Columbia Illumina seminar 11/9/2011](https://reader030.vdocuments.mx/reader030/viewer/2022020116/55a7bd951a28ab013f8b4681/html5/thumbnails/11.jpg)
Setting the transcript ‘detection’ threshold

| | RA-72H-1 | RA-72H-2 | CM | CM |
|---|---|---|---|---|
| Number of raw reads (million) | 97.3 | 88 | 87 | 95 |
| Number of mapped reads (million) | 97 | 87.7 | 87 | 94 |
| Transcripts with RPKM > 0.01 (of 27,641) | 72% | 77% | 84% | 84% |
![Page 12: David-Emlyn Parfitt, Columbia Illumina seminar 11/9/2011](https://reader030.vdocuments.mx/reader030/viewer/2022020116/55a7bd951a28ab013f8b4681/html5/thumbnails/12.jpg)
Setting the transcript ‘detection’ threshold

| | RA-72H-1 | RA-72H-2 | CM | CM |
|---|---|---|---|---|
| Number of raw reads (million) | 97.3 | 88 | 87 | 95 |
| Number of mapped reads (million) | 97 | 87.7 | 87 | 94 |
| Transcripts with RPKM > 1 (of 27,641) | 49% | 48% | 51% | 52% |
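The effect of the detection threshold can be sketched as a simple count of transcripts whose RPKM exceeds a cutoff. The values below are toy numbers chosen for illustration, not the samples from the talk, and `fraction_detected` is a hypothetical helper name.

```python
from typing import Iterable


def fraction_detected(rpkm_values: Iterable[float], threshold: float) -> float:
    """Share of annotated transcripts whose RPKM is strictly above the threshold."""
    values = list(rpkm_values)
    if not values:
        raise ValueError("no transcripts supplied")
    return sum(1 for v in values if v > threshold) / len(values)


# Toy RPKM values for eight hypothetical transcripts (assumed, not real data).
toy_rpkm = [0.0, 0.005, 0.02, 0.5, 1.0, 3.0, 12.0, 80.0]

for cutoff in (0.01, 1.0):
    share = fraction_detected(toy_rpkm, cutoff)
    print(f"RPKM > {cutoff}: {share:.1%} of transcripts")
```

A stricter cutoff lowers the detected fraction, and the size of the drop depends on how many transcripts sit between the two cutoffs.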
![Page 13: David-Emlyn Parfitt, Columbia Illumina seminar 11/9/2011](https://reader030.vdocuments.mx/reader030/viewer/2022020116/55a7bd951a28ab013f8b4681/html5/thumbnails/13.jpg)
RPKM is constant, regardless of number of reads
[Figure: two panels, r² = 0.9 and r² = 0.97]
“RPKM for a particular transcript does not change
when overall number of reads changes”
![Page 14: David-Emlyn Parfitt, Columbia Illumina seminar 11/9/2011](https://reader030.vdocuments.mx/reader030/viewer/2022020116/55a7bd951a28ab013f8b4681/html5/thumbnails/14.jpg)
RPKM becomes relatively constant with increased read number
[Figure: median RPKM (y-axis, 0.5 to 0.95) versus reads in millions (x-axis, 20 to 80); labeled values 0.749 and 0.725]
i.e. we are not detecting significantly more genes/transcripts above
20-30 million reads
![Page 15: David-Emlyn Parfitt, Columbia Illumina seminar 11/9/2011](https://reader030.vdocuments.mx/reader030/viewer/2022020116/55a7bd951a28ab013f8b4681/html5/thumbnails/15.jpg)
How many 100 bp reads are necessary for comprehensive
coverage of the mouse genome?
[Figure: percent of final transcripts (y-axis, 0.7 to 1) versus reads in millions (x-axis, 0 to 100), one series per transcript abundance bin (RPKM): [60, ∞), [30,60), [15,30), [7.5,15), [3.75,7.5), [0.01,3.74)]
Between 20 and 30 million 100 bp reads are sufficient to capture
~100% of the most abundant transcripts and 95% of the least
abundant
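One way to sketch this kind of saturation analysis is to thin a full-depth count table to smaller library sizes and ask what share of the finally detected transcripts is still seen. The code below is an assumption-laden illustration and not the pipeline used in the talk: the counts are invented, a transcript is treated as detected if it keeps at least `min_reads` reads, and reads are kept independently at random.

```python
import random
from typing import Dict, List


def thin(count: int, fraction: float, rng: random.Random) -> int:
    """Keep each read independently with probability `fraction` (binomial thinning)."""
    return sum(1 for _ in range(count) if rng.random() < fraction)


def saturation_curve(full_counts: Dict[str, int], fractions: List[float],
                     min_reads: int = 1, seed: int = 0) -> Dict[float, float]:
    """Share of transcripts detected at full depth that are still detected
    after downsampling to each fraction of the library."""
    rng = random.Random(seed)
    final = [name for name, c in full_counts.items() if c >= min_reads]
    if not final:
        raise ValueError("no transcripts detected at full depth")
    curve = {}
    for f in fractions:
        kept = sum(1 for name in final
                   if thin(full_counts[name], f, rng) >= min_reads)
        curve[f] = kept / len(final)
    return curve


# Hypothetical full-depth read counts per transcript (assumed, not real data).
toy_counts = {"tx_high": 5000, "tx_mid": 200, "tx_low": 8, "tx_rare": 1, "tx_absent": 0}

for frac, share in saturation_curve(toy_counts, [0.2, 0.4, 0.6, 0.8, 1.0]).items():
    print(f"{frac:.0%} of reads -> {share:.0%} of final transcripts")
```

Running the same function separately on transcripts grouped by RPKM bin would give one curve per abundance class, comparable in spirit to the figure described above.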
![Page 16: David-Emlyn Parfitt, Columbia Illumina seminar 11/9/2011](https://reader030.vdocuments.mx/reader030/viewer/2022020116/55a7bd951a28ab013f8b4681/html5/thumbnails/16.jpg)
Acknowledgements
Shen Lab:
Michael Shen
Hui Zhao
Shen Lab Members
Califano Lab:
Andrea Califano
Mariano Alvarez
Yufeng Shen
Xiaoyun Sun
Olivier Couronne
Erin Bush