Comparison of RNA sequencing with 19,319 lab validated RT-qPCR assays

Jan Hellemans, PhD

London, UK

October 20-21, 2014

Acknowledgements

•  Biogazelle

Biogazelle team & collaborators

•  Ghent University

•  Steve Lefever

•  SEQC consortium

•  Christopher Mason

•  David Kreil

•  Leming Shi

•  Bio-Rad

•  qPCR: reference technology for nucleic acid quantification

•  sensitivity and specificity

•  wide dynamic range

•  speed

•  relatively low cost

•  conceptual and practical simplicity

•  easy to perform ≠ easy to do it right

•  many steps involved

•  all need to be right

Introduction

Assays & MIQE

•  design

•  amplicon length

•  primer positions (exonic or intron-spanning)

•  transcript coverage

•  in silico verification

•  specificity prediction (retropseudogenes and other homologues)

•  secondary structure analysis

•  empirical (wet lab) validation

•  specificity assessment (gel, melt, amplicon sequencing)

•  Cq of NTC (for SYBR assays)

•  amplification efficiency determination (slope, E, SE(E), r²)

The perfect assay

•  specific for the gene of interest (no off-target amplification)

•  detection of all transcript variants

•  detection not affected by polymorphisms (no allelic bias or drop out)

•  amplification efficiency ~100%

•  no gDNA co-amplification

•  no primer dimer formation

properties

The perfect assay

•  For some genes, there is no perfect assay

•  no unique sequence (homology with other genes – pseudogenes)

•  no common sequence among all transcripts

•  regions are excluded because of repeats, secondary structures, SNPs, homology, ...

•  Make the best possible compromise and report potential issues

•  Design → in silico quality control → lab validation

... or the best possible

Assay design using primerXL

•  database of genomic information (transcripts, SNPs, ...)

•  tools for target region selection (maximize transcript coverage)

•  primer3 design engine

•  analysis of secondary structures and SNPs in primer annealing regions

•  specificity prediction (BiSearch)

•  relaxation cascade (from perfect to best possible)

BiSearch specificity prediction

•  BiSearch loose

•  only the gene of interest (FFAR2)

•  BiSearch strict

reads | seq | gene_list | official_symbol | location
2843 | CATGGCAGTCACCATCTTCTGCTACTGGCGTTTTGTGTGGATCATGCTCTCCCAGCCCCTTGTGGGGGCCCAGAGGCGGCGCCGAGCCGTGGGGCTGGCTGTGGTGACGCTGCTCAATTTCCTGGTGTGCTTCGGACCTTACAGATCGGAA | ENSG00000126262 | FFAR2 | 19:35940617-35942667
1897 | GTAAGGTCCGAAGCACACCAGGAAATTGAGCAGCGTCACCACAGCCAGCCCCACGGCTCGGCGCCGCCTCTGGGCCCCCACAAGGGGCTGGGAGAGCATGATCCACACAAAACGCCAGTAGCAGAAGATGGTGACTGCCATGAGATCGGAA | ENSG00000126262 | FFAR2 | 19:35940617-35942667
1535 | GTAAGGTCCGAAGCACACCGAGAGCTGGGAGCAGGAGCTACACAGTCTGCTGGCCTCACTGCACACCCTGCTGGGGGCCCTGTACGAGGGAGCAGAGACTGCTCCTGTGCAGAATGAAGGCCCTGGGGTGGAGATGCTGCTGTCCTCAGAA | ENSG00000141456 | AC091153.1 | 17:4574680-4607632
1097 | CATGGCAGTCACCATCTTCTGAGGACAGCAGCATCTCCACCCCAGGGCCTTCATTCTGCACAGGAGCAGTCTCTGCTCCCTCGTACAGGGCCCCCAGCAGGGTGTGCAGTGAGGCCAGCAGACTGTGTAGCTCCTGCTCCCAGCTCTCGG | ENSG00000141456 | AC091153.1 | 17:4574680-4607632
1091 | CATGGCAGTCACCATCTTCTGAGGACAGCAGCATCTCCACCCCAGGGCCTTCATTCTGCACAGGAGCAGTCTCTGCTCCCTCGTACAGGGCCCCCAGCAGGGTGTGCAGTGAGGCCAGCAGACTGTGTAGCTCCTGCTCCCAGCTCTCGGT | ENSG00000141456 | AC091153.1 | 17:4574680-4607632
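The read table above can be summarized as an on-target fraction. The following is a minimal Python sketch, assuming the five rows shown are the complete read set for the FFAR2 assay (the slide may list only the most abundant sequences) and that "on-target" means the official symbol equals the intended gene. The function name `on_target_fraction` is illustrative and not taken from the presentation.

```python
# Read counts and gene symbols copied from the table above (sequences omitted).
rows = [
    (2843, "FFAR2"),
    (1897, "FFAR2"),
    (1535, "AC091153.1"),
    (1097, "AC091153.1"),
    (1091, "AC091153.1"),
]


def on_target_fraction(rows, target):
    """Fraction of reads whose gene symbol matches the intended target."""
    total = sum(reads for reads, _ in rows)
    on_target = sum(reads for reads, symbol in rows if symbol == target)
    return on_target / total


fraction = on_target_fraction(rows, "FFAR2")
print(f"{fraction:.1%} on-target")
```

Under these assumptions, 4 740 of 8 463 reads map to FFAR2, so a sizeable share of the product comes from the second locus.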

Wet lab validation

•  PCR composition

•  total volume: 5 µl

•  instrument: CFX384 (with automation)

•  mastermix: SsoAdvanced SYBR

•  primer conc: 250 nM each

•  PCR program

•  default cycling protocol for SsoAdvanced SYBR (Ta=60°C)

•  Samples

•  cDNA: 25 ng (total RNA equivalents – Agilent Universal human reference RNA = MAQC A)

•  gDNA: 2.5 ng (Roche)

•  NTC: water + carrier (5 ng/μl yeast transfer RNA)

•  synthetic template (pooled 60-mers, concentration range: 20 000 000 down to 20 copies)

setup

Wet lab validation

•  lab validation of 103 053 assays (human, mouse and rat coding genes)

•  1 456 142 reactions

•  3 822 PCR plates (384-well)

•  equivalent to 15 288 PCR plates (96-well)

some numbers

305 m

Amplification efficiency

•  initial publication: Vermeulen et al., Nucleic Acids Research, 2009

•  Biogazelle approach (easy & cost effective)

•  60-mer

•  no modifications, standard desalted

•  7-point dilution series: 20 000 000 down to 20 molecules

•  equivalent to full length double stranded template

•  limitation: the behavior of the first cycles when amplifying from cDNA is not evaluated

synthetic templates

[Figure: synthetic template layout, labeled 30 nt 3’ and 30 nt 5’]

metric | ds template | ss oligo
r² < 0.99 | 1 | 1
median E | 2.00 | 2.01
average E | 2.00 | 2.01
count E outside [1.90-2.10] | 1 | 3
paired t-test p-value: 0.14
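The efficiency values above come from a standard curve over the synthetic-template dilution series. The following is a minimal Python sketch of that calculation, not the Biogazelle pipeline. It assumes 10-fold dilution steps (seven points between 20 000 000 and 20 molecules imply this), uses the textbook relation E = 10^(-1/slope), and runs on simulated Cq values that are purely hypothetical.

```python
import math


def standard_curve(copies, cq):
    """Least-squares fit of Cq against log10(copies).

    Returns (slope, efficiency E as amplification base, r squared).
    """
    x = [math.log10(c) for c in copies]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(cq) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in cq)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, cq))
    slope = sxy / sxx
    r2 = sxy ** 2 / (sxx * syy)
    efficiency = 10 ** (-1 / slope)  # E = 2.00 means perfect doubling per cycle
    return slope, efficiency, r2


# 7-point series from 20 000 000 down to 20 molecules (assumed 10-fold steps).
copies = [2 * 10 ** k for k in range(7, 0, -1)]

# Hypothetical Cq values for illustration only: perfect doubling per cycle,
# anchored at an arbitrary Cq of 12 for the highest input.
cq = [12 + math.log2(copies[0] / c) for c in copies]

slope, E, r2 = standard_curve(copies, cq)
in_range = 1.90 <= E <= 2.10  # same acceptance window as in the table
print(f"slope={slope:.3f}  E={E:.2f}  r2={r2:.4f}  in range: {in_range}")
```

An amplification base of 1.90 to 2.10 corresponds to the 90-110% efficiency range when efficiency is expressed as (E - 1) × 100%.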

Amplification efficiency distribution (n = 50 133)

[Figure: distribution of amplification efficiencies; annotations: "89%", "redesign"]

Specificity

•  amplicon sizing (+ melt analysis for SYBR assays)

•  limited sensitivity for detecting low level non-specific coamplification

•  failure to observe non-specific amplification of sequences with similar size and/or Tm e.g. expressed pseudogenes or homologous genes

•  Next level of specificity assessment

•  in silico specificity predictions by BiSearch

•  massively parallel sequencing of pooled PCR products

•  average coverage > 1000-fold → lab specificity > 99.9%

•  50 – 200 times more sensitive than size analysis and Sanger sequencing

NGS for increased sensitivity
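The link between coverage and demonstrable specificity is simple counting: at N reads per amplicon, one off-target read represents a fraction of 1/N. The following Python sketch shows that arithmetic, assuming plain read counting with no model for sequencing errors or sampling noise. Both function names are illustrative.

```python
def smallest_detectable_off_target(coverage):
    """Off-target fraction represented by a single read at a given coverage."""
    return 1 / coverage


def max_demonstrable_specificity(coverage):
    """Highest specificity below 100% that can be resolved at this coverage."""
    return 1 - smallest_detectable_off_target(coverage)


for coverage in (100, 1000, 10000):
    print(coverage, f"{max_demonstrable_specificity(coverage):.2%}")
```

At 1000-fold coverage a single off-target read corresponds to 0.1%, which matches the 99.9% figure on the slide.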

Specificity: most assays are 100% on-target

[Figure: % on-target per assay; y-axis 0% to 100%]

2/3 of non-specific assays may go unnoticed without NGS

[Figure: axis 0% to 60%; bins of x from 0 < x < 0.1 to 0.9 < x < 1 in steps of 0.1]

Specificity

category | assays | share
perfect | 60 293 | 86%
acceptable (<10% non-specific) | 5 866 | 8%
predicted non-specificity (no specific design found) | 1 204 | 2%
failing specificity QC criteria | 2 467 | 4%

the power of in silico verification
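The percentages in the specificity breakdown can be reproduced from the counts. The following is a short Python check, assuming the four categories partition the full assay set (the total of 69 830 is derived here and is not stated on the slide).

```python
counts = {
    "perfect": 60293,
    "acceptable (<10% non-specific)": 5866,
    "predicted non-specificity": 1204,
    "failing specificity QC criteria": 2467,
}

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

for label, share in shares.items():
    print(f"{label}: {share:.0%}")
```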

MIQE compliant PrimePCR assay validation data sheet

Dynamic range

> 10 000 000 fold

[Figure: gene count (0 to 5000) versus copies per cell (bins from 16 777.216 down to 0.001 in 2-fold steps); series: human, mouse, rat]
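The copies-per-cell axis of the dynamic range plot runs from 16 777.216 down to 0.001 in 2-fold bins. The following Python sketch checks that this span is consistent with the "> 10 000 000 fold" figure. It assumes the two outermost bins mark the measured range.

```python
import math

highest = 16777.216  # copies per cell, top axis bin
lowest = 0.001       # copies per cell, bottom axis bin

fold_range = highest / lowest
doublings = math.log2(fold_range)

print(f"{fold_range:,.0f}-fold, {doublings:.0f} doublings, "
      f"{math.log10(fold_range):.1f} orders of magnitude")
```

The span is 2^24, a little over seven orders of magnitude.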

SEQC

•  multisite, cross-platform analysis of RNAseq

•  FDA-sponsored and FDA-guided MAQC-III

•  Nature Biotechnology, Sept 2014: Focus on RNA sequencing quality control (SEQC), with 2 Biogazelle co-authors

•  MAQC samples: reference RNA with built-in controls (known truths)

•  > 100 billion reads

•  compared against qPCR (PrimePCR)

RNAseq vs PrimePCR: differential expression

454: 0.83 (13,190 genes)
ILMN: 0.89 (16,264 genes)
PGM: 0.86 (14,981 genes)
PRO: 0.89 (16,242 genes)

qPCR (PrimePCR) vs RNAseq (Illumina): r² = 75% for genes detected by both platforms
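An r² restricted to genes detected by both platforms can be computed as follows. This is a minimal Python sketch and not the SEQC analysis code. The gene names and values are hypothetical placeholders (for example log-scale expression values), and `None` is an assumed convention for "not detected".

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy ** 2 / (sxx * syy)


# Hypothetical per-gene measurements; None marks "not detected".
qpcr = {"GENE1": 3.1, "GENE2": -0.4, "GENE3": 1.2, "GENE4": None, "GENE5": -2.0}
rnaseq = {"GENE1": 2.8, "GENE2": 0.1, "GENE3": None, "GENE4": 0.9, "GENE5": -1.7}

# Keep only genes detected by both platforms before correlating.
shared = sorted(
    g for g in qpcr if qpcr[g] is not None and rnaseq.get(g) is not None
)
r2 = r_squared([qpcr[g] for g in shared], [rnaseq[g] for g in shared])
print(shared, round(r2, 3))
```

Filtering before correlating matters: genes seen by only one platform would otherwise enter as zeros or missing values and distort the fit.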

Saturation analysis

preparation | sample | libraries | reads | GENCODE12 mapping | PrimePCR mapping
ribo-depleted | MAQC A | 22 | 5 304 M | 1 955 M (37%) | 1 692 M (32%)
ribo-depleted | MAQC B | 17 | 3 370 M | 1 447 M (43%) | 1 193 M (35%)
poly-A-enriched | MAQC A | 4 | 427 M | 291 M (68%) | 278 M (65%)
poly-A-enriched | MAQC B | 4 | 446 M | 323 M (72%) | 297 M (67%)

ABRF-NGS dataset
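The mapping percentages in the table above follow directly from the read counts. The following is a short Python check, with read counts in millions copied from the slide. The tuple layout and the function name `mapping_rates` are illustrative.

```python
# (preparation, sample, libraries, reads, GENCODE12-mapped, PrimePCR-mapped)
# Read counts are in millions, as on the slide.
rows = [
    ("ribo-depleted", "MAQC A", 22, 5304, 1955, 1692),
    ("ribo-depleted", "MAQC B", 17, 3370, 1447, 1193),
    ("poly-A-enriched", "MAQC A", 4, 427, 291, 278),
    ("poly-A-enriched", "MAQC B", 4, 446, 323, 297),
]


def mapping_rates(row):
    """Rounded percentage of reads mapping to GENCODE12 and to PrimePCR."""
    _, _, _, reads, gencode, primepcr = row
    return round(100 * gencode / reads), round(100 * primepcr / reads)


for row in rows:
    print(row[0], row[1], mapping_rates(row))
```

The poly-A-enriched libraries map at roughly twice the rate of the ribo-depleted ones, although they contain far fewer reads in total.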

Saturation analysis ribo-depletion RNAseq - % of GENCODE12

[Figure: y-axis 0% to 100%; x-axis 125 000 to 4 096 000 000 reads in 2-fold steps; series: MAQC A and MAQC B, detection and quantification]

Saturation analysis poly-A RNAseq - % of GENCODE12

[Figure: same axes; series: MAQC A and MAQC B, detection and quantification]

Saturation analysis ribo-depletion RNAseq for MAQC A - GENCODE12 vs PrimePCR

[Figure: same axes; series: GENCODE12 and PrimePCR, detection and quantification]

Saturation analysis MAQC A - % of PrimePCR

[Figure: same axes; series: ribo-depletion RNAseq, poly-A RNAseq and qPCR, detection and quantification]
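A saturation curve of this kind asks what share of genes is seen at each sequencing depth. The following Python sketch replaces real read subsampling with a Poisson model, so it illustrates the idea rather than the method used in the study. The expression profile is synthetic, and the thresholds (at least 1 read for detection, at least 10 reads for quantification) are assumptions, since the slides do not define them.

```python
import math


def poisson_at_least(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    term = math.exp(-lam)
    below = 0.0
    for i in range(k):
        below += term
        term *= lam / (i + 1)
    return 1.0 - below


def saturation_curve(fractions, depths, min_reads):
    """Expected share of genes with at least `min_reads` reads at each depth."""
    curve = []
    for depth in depths:
        expected = sum(poisson_at_least(min_reads, depth * f) for f in fractions)
        curve.append(expected / len(fractions))
    return curve


# Synthetic expression profile: 1000 genes spanning 6 orders of magnitude.
weights = [10 ** (6 * i / 999) for i in range(1000)]
total = sum(weights)
fractions = [w / total for w in weights]

# Same 2-fold depth grid as the plots: 125 000 up to 4 096 000 000 reads.
depths = [125_000 * 2 ** k for k in range(16)]

detection = saturation_curve(fractions, depths, min_reads=1)        # assumed threshold
quantification = saturation_curve(fractions, depths, min_reads=10)  # assumed threshold

for depth, d, q in zip(depths, detection, quantification):
    print(f"{depth:>13,} reads  detection {d:6.1%}  quantification {q:6.1%}")
```

In this model the quantification curve always lies at or below the detection curve and both rise with depth, which is the qualitative shape a saturation plot is meant to show.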

Confirmation rate of novel junctions

junction prediction | junctions | confirmed | confirmation rate
multiple algorithms (Cstar + Magic + Subread) | 136 | 136 | 100%
single algorithm | 24 | 20 | 83%

•  novel exon

•  one of the primers in the novel exon

•  novel junction

•  one of the primers overlaps the novel junction, with ≥ 5 bases on either side of the junction

•  size analysis to confirm expected size for novel transcripts
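The junction rule above (at least 5 primer bases on each side) and the confirmation rates can be written down compactly. The following is a minimal Python sketch. The coordinate convention and the function name `spans_junction` are assumptions made for illustration and do not describe the actual design tool.

```python
def spans_junction(primer_start, primer_end, junction, min_overlap=5):
    """True if a primer covers `junction` with enough bases on both sides.

    Coordinates are 0-based on the spliced transcript; the primer occupies
    [primer_start, primer_end) and `junction` is the index of the first base
    after the exon-exon boundary.
    """
    left = junction - primer_start   # primer bases upstream of the junction
    right = primer_end - junction    # primer bases downstream of the junction
    return left >= min_overlap and right >= min_overlap


# Junction counts from the slide: (tested, confirmed).
results = {
    "multiple algorithms": (136, 136),
    "single algorithm": (24, 20),
}
rates = {name: confirmed / tested for name, (tested, confirmed) in results.items()}

print(spans_junction(100, 120, 110))  # 10 bases on each side
print(spans_junction(100, 120, 117))  # only 3 bases downstream
print({name: f"{rate:.0%}" for name, rate in rates.items()})
```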

Conclusions - I

•  Assay design and in silico verification

•  Transcript coverage

•  SNPs and secondary structures

•  Specificity prediction

•  Empirical assay validation

•  Efficiency in 90-110% range

•  Stringent specificity analysis by massively parallel amplicon sequencing

•  PrimePCR: validated assays for human, mouse & rat coding genes

Conclusions - II

•  qPCR based transcriptome profiling

•  Samples from MAQC/SEQC study

•  PCR data as benchmark for evaluation of RNAseq

•  qPCR benefits: high sensitivity and large dynamic range

•  good correlation with RNAseq results

•  for individual genes, RNAseq ≤ 100 M reads gives lower sensitivity than qPCR

•  the majority of novel junctions identified by RNAseq can be confirmed by qPCR
