Improving sigmoidal fitting of real-time PCR
data: Linear-quadratic-sigmoidal hybrid
models, error analysis and weighting
Andrej-Nikolai Spiess
University Hospital Hamburg-Eppendorf
Department of Andrology
R: the lingua franca of statistical data analysis
• grew out of S, which John Chambers and colleagues
developed at Bell Labs
• S was commercialised as S-Plus; R, started by Ross Ihaka and
Robert Gentleman, made the language freely available (open-source).
• 15 core people maintain R, ~ 2 million users
worldwide
• R's own repertoire of ~ 3000 functions can be
extended by 'packages' (e.g. for spatial or
environmental statistics, genomics, no limit…)
=> ~ 4500 packages available
qpcR: An R package for real-time PCR analysis
Ratio calculation (MC simulation, permutation (REST), error propagation)
A typical workflow using qpcR
Data import: 'pcrimport'
Model creation: 'pcrfit', 'modlist', 'replist'
[Figure: three panels, raw fluorescence versus cycles]
Model selection: L3, L4, L5, L6, L7
Goodness of fit: AIC, AICc, BIC, PRESS, F-test, R^2
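The information criteria named above can be computed from the residual sum of squares of any least-squares fit. The following is a minimal sketch under the usual Gaussian-error assumption, not qpcR's own code; the function name and the residual sums of squares are hypothetical.

```python
import math

def information_criteria(rss, n, n_params):
    """Return (AIC, AICc, BIC) for a least-squares fit with Gaussian errors.

    rss      : residual sum of squares of the fit
    n        : number of data points (cycles)
    n_params : number of curve parameters (e.g. 4 for a four-parameter model)
    """
    k = n_params + 1  # the residual variance counts as one more estimated parameter
    log_lik = -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)
    aic = -2 * log_lik + 2 * k
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)  # small-sample correction
    bic = -2 * log_lik + k * math.log(n)
    return aic, aicc, bic

# Hypothetical residual sums of squares for three fits to 50 cycles.
fits = {"L4": (4, 1.90), "L5": (5, 0.30), "L6": (6, 0.29)}
for name, (p, rss) in fits.items():
    aic, aicc, bic = information_criteria(rss, n=50, n_params=p)
    print(f"{name}: AIC={aic:.2f}  AICc={aicc:.2f}  BIC={bic:.2f}")
```

With these made-up numbers the extra parameter of L6 barely lowers the residuals, so all three criteria prefer L5: the penalty term outweighs the small gain in fit.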
Eff/Cq value: 'efficiency', 'sliwin', 'expfit', 'calib'
[Figure: raw fluorescence and efficiency versus cycles. Model: l5, cpD2: 15.68, cpD1: 17.59, Eff: 1.795, resVar: 0.00603899, AICc: -102.58]
[Figure: raw fluorescence and log(RFU) versus cycles. R^2: 0.99986, Eff: 1.76]
[Figure: raw fluorescence versus cycles, and threshold cycle versus log(Dilution or copy number). AIC: -4.55, Rsq: 0.99967, Eff: 2.04]
'ratiocalc', 'ratiobatch'
[Figure: ratios obtained by Monte-Carlo simulation, permutation and error propagation for r1c1:g1c1:r1s1:g1s1, r2c1:g1c1:r2s1:g1s1, r1c1:g1c1:r1s2:g1s2 and r2c1:g1c1:r2s2:g1s2]
                 r1c1:g1c1:r1s1:g1s1   r2c1:g1c1:r2s1:g1s1   r1c1:g1c1:r1s2:g1s2   r2c1:g1c1:r2s2:g1s2
Monte-Carlo Simulation
Mean.Sim         0.419531879           6.533495786           0.223828241           5.770492727
Std.dev..Sim     0.147043285           1.353377023           0.017764239           1.047243322
Median.Sim       0.396300098           6.393701355           0.223282684           5.669149665
MAD.Sim          0.133055518           1.304387664           0.017778066           1.016963311
Conf.lower.S     0.204116613           4.284885281           0.191219778           4.00151269
Conf.upper.S     0.767489852           9.538449133           0.260947425           8.068843646
Permutation
Mean.Perm        0.410562957           6.513010694           0.223663646           5.749083527
Std.dev..Per     0.114406471           1.081009488           0.015364639           0.712937423
Median.Perm      0.330239654           6.74781282            0.236810989           6.37620751
MAD.Perm         0.079692836           1.010968373           0.006290143           0.169364005
Conf.lower.P     0.280518979           5.152642278           0.206810681           4.973967308
Conf.upper.P     0.560169465           7.961269931           0.240735435           6.481874382
perm > init.P    0.4995                0                     0.535                 0
perm == init     0.5005                0.4925                0.465                 0.5125
perm < init.P    0                     0.5075                0                     0.4875
Error Propagation
Mean.Prop        0.395972171           6.39779022            0.223126803           5.678021311
Std.dev..Pro     0.134606979           1.310527148           0.017678287           1.020096686
Conf.lower.P     0.13214734            3.829204209           0.188477998           3.678668546
Conf.upper.P     0.659797003           8.96637623            0.257775607           7.677374075
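Two of the three approaches in the table above, Monte-Carlo simulation and first-order error propagation, can be contrasted on a single ratio. This is an illustrative sketch only: the ratio formula is the common efficiency-corrected form, all function names, Cq values and standard errors are hypothetical, and the permutation approach is not shown.

```python
import math
import random
import statistics

def ratio(eff_gene, eff_ref, cq):
    """Expression ratio from mean Cq values.

    cq holds the keys gc, gs (target gene in control/sample) and
    rc, rs (reference gene in control/sample).
    """
    target = eff_gene ** (cq["gc"] - cq["gs"])
    reference = eff_ref ** (cq["rc"] - cq["rs"])
    return target / reference

def monte_carlo(eff_gene, eff_ref, mean, se, n_sim=20000, seed=1):
    """Draw each mean Cq from Normal(mean, se) and summarise the ratios."""
    rng = random.Random(seed)
    sims = []
    for _ in range(n_sim):
        draw = {k: rng.gauss(mean[k], se[k]) for k in mean}
        sims.append(ratio(eff_gene, eff_ref, draw))
    return statistics.mean(sims), statistics.stdev(sims)

def propagate(eff_gene, eff_ref, mean, se):
    """First-order (delta method) standard deviation of the ratio."""
    r = ratio(eff_gene, eff_ref, mean)
    var_log = (math.log(eff_gene) ** 2 * (se["gc"] ** 2 + se["gs"] ** 2)
               + math.log(eff_ref) ** 2 * (se["rc"] ** 2 + se["rs"] ** 2))
    return r, r * math.sqrt(var_log)

# Hypothetical mean Cq values and their standard errors.
mean = {"gc": 24.0, "gs": 22.0, "rc": 18.0, "rs": 18.2}
se = {"gc": 0.10, "gs": 0.10, "rc": 0.08, "rs": 0.08}
print("Monte-Carlo      :", monte_carlo(1.8, 1.9, mean, se))
print("Error propagation:", propagate(1.8, 1.9, mean, se))
```

For small Cq errors the two standard deviations agree closely; the simulated mean sits slightly above the plain ratio because the ratio distribution is right-skewed.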
Data export
What can qpcR do?
Sigmoidal fitting:
3-, 4-, 5-, 6-, 7-parameter sigmoidal models. Selection of the best model based on the
F-test. Goodness-of-fit measures: AIC, AICc, BIC, R^2, PRESS, Chi^2. Extract Cq and efficiency
from any part of the curve, e.g. at the first or second derivative maximum (FDM, SDM).
Other fitting methods:
Window-of-linearity, exponential fitting, LRE, Cy0.
Mechanistic models: mak2, mak3, cm3. Also: maxRatio, bilinear model, linexp.
Calibration curves:
Classical calibration (Cq versus Conc), confidence intervals, prediction of unknown
samples, bootstrap replicates for better efficiency estimates.
Outlier run detection:
KOD, SOD, multivariate outlier detection (FDM, SDM, slope, Fmax)
Batch analysis:
Fit 10000 (Fluidigm) curves and calculate Eff/Cq in 5 min!
Ratio calculation:
Calculate expression ratios for single or multiple setups by MC simulation/Permutation
(REST)/Error Propagation. Statistical display of results. Conduct reference gene
averaging, if desired.
Visualization:
Single curves, batch curves, 3D display, heatmap display. Failed runs are automatically
marked.
Melting curve analysis:
Automatic detection of melting peaks (Tm) and automatic quality control.
qPCR data import:
Advanced import function that can import ALL data through a series of 8 steps.
Testing methods on replicate qPCR data
[Figure: raw fluorescence versus cycles, one panel per dataset]
Rutledge et al. (2004): 6 / 20
Batsch et al. (2008): 5 / 3
Boggy et al. (2010): 6 / 2
Guescini et al. (2008): 7 / 12
Lievens et al. (2012): 5 / 18
Spiess et al. (2008): 5 / 18
All datasets available at www.dr-spiess.de
Why sigmoidal models?
Fitting complete curve…
[Figure: sigmoidal fit to the complete curve, raw fluorescence versus cycles, with Cq taken at the second derivative maximum (SDM) and Fq the fluorescence at Cq]
E(Cq) = F(Cq) / F(Cq - 1)
F0 = Fq / E(Cq)^Cq
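Once a sigmoidal curve has been fitted, Cq (SDM), E(Cq) and F0 follow directly from the fitted function. The sketch below assumes a four-parameter log-logistic form for the curve and locates the second derivative maximum on a fine grid; the functional form, the parameter values and the helper names are assumptions for illustration, not output of qpcR.

```python
import math

def l4(x, b, c, d, e):
    """Four-parameter log-logistic curve (assumed form, see lead-in)."""
    return c + (d - c) / (1 + math.exp(b * (math.log(x) - math.log(e))))

def second_derivative_maximum(f, lo, hi, step=0.01):
    """Locate the cycle with the largest second derivative on a fine grid."""
    best_x, best_d2 = lo, -math.inf
    n_steps = int(round((hi - lo) / step))
    for i in range(n_steps + 1):
        x = lo + i * step
        # central finite-difference approximation of the second derivative
        d2 = (f(x + step) - 2 * f(x) + f(x - step)) / step ** 2
        if d2 > best_d2:
            best_x, best_d2 = x, d2
    return best_x

# Hypothetical parameters of an already fitted curve.
PARAMS = dict(b=-12.0, c=0.1, d=10.0, e=18.0)

def curve(x):
    return l4(x, **PARAMS)

cq = second_derivative_maximum(curve, lo=2.0, hi=40.0)
fq = curve(cq)                   # fluorescence at Cq
eff = curve(cq) / curve(cq - 1)  # E(Cq) = F(Cq) / F(Cq - 1)
f0 = fq / eff ** cq              # F0 = Fq / E(Cq)^Cq
print(f"Cq = {cq:.2f}, Eff = {eff:.3f}, F0 = {f0:.3e}")
```

Because F0 depends on the efficiency raised to the power Cq, even small changes in either estimate shift F0 considerably.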
as opposed to… (curve subsets)
Mak2 (Boggy et al. 2010)
Exponential (Tichopad et al., 2003)
Sliding window (Ramakers et al., 2003)
LRE (Rutledge et al. 2008)
[Figure: one panel per method. Raw fluorescence versus cycles; log(RFU) versus cycles with R^2: 0.99949, Eff: 1.74; efficiency versus raw fluorescence with R^2: 0.99646, Eff: 1.934, cycles 15:19]
[Figure: raw fluorescence versus cycles, and AIC for models l4, l5, l6 and l7, annotated 8.3E18 (!), 6E5 (!), 2E6 (!)]
model      RMSE
expGrowth  17.84
l7         23.70
l6         28.99
l5         50.77
l4         114.78
More parameters: Better fits...
How often do we need k1·x / k2·x^2 ?
     Estimate         Pr(>|t|)
b    -1.273839e+01    < 2.22e-16  ***
c    -1.418255e-01    9.8655e-05  ***
d     1.082728e+01    < 2.22e-16  ***
e     1.781830e+01    < 2.22e-16  ***
f     9.283720e-01    < 2.22e-16  ***
k1    3.442433e-02    6.1591e-08  ***
k2   -5.524618e-04    4.3619e-09  ***

            k1        k2
Rutledge:   117/120   83/120
Guescini:   80/84     82/84
Lievens:    90/90     81/90
Reps380:    360/380   353/380
Cq and efficiency can differ between models
[Figure: raw fluorescence versus cycles, with the Cq and Eff estimates of models l4, l5, l6 and l7]
Small error in efficiency and small error
in Cq => LARGE error in E^Cq (quantities)!
[Figure: frequency histograms of efficiency, Cq and the resulting quantity E^Cq]
Error in efficiency only: Efficiency µ = 1.8, σ = 0.036, c.v. = 2 % => Quantity E^Cq µ = 136830, σ = 56474, c.v. = 41.3 %
Error in Cq only: Cq µ = 20, σ = 0.4, c.v. = 2 % => Quantity E^Cq µ = 131025, σ = 31596, c.v. = 24.1 %
Error in both: Quantity E^Cq µ = 142558, σ = 70051, c.v. = 49.1 % !!!!
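The inflation of a 2 % error into a much larger error in E^Cq can be reproduced with a few lines of simulation. A minimal sketch, using the means and standard deviations quoted above; the function name, the sample size and the seed are hypothetical choices.

```python
import random
import statistics

def quantity_cv(eff_sd, cq_sd, eff_mean=1.8, cq_mean=20.0, n=20000, seed=42):
    """Coefficient of variation of E^Cq when E and/or Cq carry normal error."""
    rng = random.Random(seed)
    quantities = []
    for _ in range(n):
        eff = rng.gauss(eff_mean, eff_sd)
        cq = rng.gauss(cq_mean, cq_sd)
        quantities.append(eff ** cq)
    return statistics.stdev(quantities) / statistics.mean(quantities)

print("error in efficiency only:", quantity_cv(eff_sd=0.036, cq_sd=0.0))
print("error in Cq only        :", quantity_cv(eff_sd=0.0, cq_sd=0.4))
print("error in both           :", quantity_cv(eff_sd=0.036, cq_sd=0.4))
```

The reason is visible on the log scale: log(E^Cq) = Cq · log(E), so a relative error in E is multiplied by Cq (about 20 here), while an absolute error in Cq is multiplied by log(E).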
Cubic spline fitting
Similar to the "maxRatio" method of Shain et al. (2008)
Splines to calculate variance in Cq/Eff of replicate datasets
[Figure: all 380 runs, raw fluorescence versus cycles, with the distributions of Cq and efficiency at Fq = 1000]
Creating qPCR curves with defined error in Cq/Efficiency or both
Create 1000 Cq values within [20, 21] and Eff = 1.8 at Fq = 1. Calculate F0
for each curve by F0 = Fq / E^Cq. Create a sigmoidal efficiency curve and,
from this, create a qPCR curve by F(n + 1) = F(n) * E(n).
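The recipe above can be sketched as follows. The shape of the efficiency curve (a logistic decline from Eff towards 1), the uniform sampling of Cq and all names and default values are assumptions made for illustration; the slide does not specify them.

```python
import math
import random

def efficiency_curve(n, eff_max, half_cycle, steepness=1.5):
    """Sigmoidal decline of the per-cycle efficiency from eff_max towards 1."""
    return 1 + (eff_max - 1) / (1 + math.exp((n - half_cycle) / steepness))

def simulate_curve(cq, eff=1.8, fq=1.0, n_cycles=40, plateau_lag=8.0):
    """Build one curve by F(n + 1) = F(n) * E(n), starting from F0 = Fq / E^Cq."""
    f0 = fq / eff ** cq
    curve = [f0]  # curve[n] is the fluorescence after n cycles
    for n in range(n_cycles):
        # efficiency starts to fall only some cycles after Cq (assumed lag)
        e_n = efficiency_curve(n, eff, half_cycle=cq + plateau_lag)
        curve.append(curve[-1] * e_n)
    return curve

rng = random.Random(0)
cq_values = [rng.uniform(20, 21) for _ in range(1000)]  # 1000 Cq values within [20, 21]
curves = [simulate_curve(cq) for cq in cq_values]
print(len(curves), "curves; plateau of the first:", round(curves[0][-1], 2))
```

Since the efficiency stays close to Eff until well after Cq, each simulated curve passes very near Fq at its Cq, so the true Cq and efficiency of every curve are known by construction.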
qPCR curves with defined Cq or efficiency…
Cq [20, 21]
Eff [1.7, 1.8]
qPCR curves with defined asymmetry or slope…
slope [1.2, 1.8]
Asym ('f') [0.8, 1.2]
Performance of the different models
on the synthetic data
[Figure: Cq and Eff estimates for Sim, L4, L5, L6, L7 and SPL]
How many replicates do we need
to estimate the error in Eff/Cq reliably ?
[Figure: st.dev versus # of replicates]
n=10: 95%, n=20: 98%, n=200: 99.9%
Create n (2 – 1000) samples from a Normal distribution =>
calculate std. dev
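One way to explore this question is to ask how far the sample standard deviation of n normal draws falls short of the true value on average. The sketch below compares the known exact expectation with a small simulation; reading the slide's percentages this way is an assumption, and the printed values need not match them, since their exact definition is not given. Function names, repetition count and seed are hypothetical.

```python
import math
import random
import statistics

def c4(n):
    """Expected sample standard deviation of n normal draws, relative to the true one."""
    return math.sqrt(2 / (n - 1)) * math.exp(math.lgamma(n / 2) - math.lgamma((n - 1) / 2))

def simulated_ratio(n, n_rep=1000, seed=7):
    """Average sample standard deviation of n draws from Normal(0, 1)."""
    rng = random.Random(seed)
    sds = [statistics.stdev([rng.gauss(0, 1) for _ in range(n)]) for _ in range(n_rep)]
    return statistics.mean(sds)

for n in (2, 5, 10, 20, 200):
    print(f"n={n:4d}  exact={c4(n):.4f}  simulated={simulated_ratio(n):.4f}")
```

With very few replicates the standard deviation is systematically underestimated, which is the practical argument for larger replicate numbers when the error in Eff/Cq itself is the quantity of interest.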
Typical variance structures in
qPCR replicates
[Figure: raw fluorescence versus cycles and variance (VAR) versus cycles for Rutledge, 20 replicates, and Lievens, 12 replicates]
Better fit in exponential region
by using weights
[Figure: raw fluorescence and standardized residual value versus cycles, for the unweighted fit and for the fit weighted by 1/Var]
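Weighting by 1/Var means that each cycle's squared residual is divided by the variance observed across replicates at that cycle. A minimal sketch of how such weights and the resulting weighted residual sum of squares can be computed; the replicate values, fitted values and function names are hypothetical.

```python
import statistics

def inverse_variance_weights(replicates):
    """One weight per cycle: 1 / variance of the replicate fluorescence values."""
    per_cycle = zip(*replicates)  # regroup: one tuple of replicate values per cycle
    return [1 / statistics.variance(values) for values in per_cycle]

def weighted_rss(observed, fitted, weights):
    """Weighted residual sum of squares, the quantity a weighted fit minimises."""
    return sum(w * (y - f) ** 2 for y, f, w in zip(observed, fitted, weights))

# Hypothetical replicates (rows) over five cycles (columns); the spread grows with the signal.
replicates = [
    [0.010, 0.021, 0.080, 0.30, 0.52],
    [0.011, 0.019, 0.085, 0.33, 0.47],
    [0.009, 0.020, 0.078, 0.28, 0.55],
]
weights = inverse_variance_weights(replicates)
mean_curve = [statistics.mean(values) for values in zip(*replicates)]
fitted = [0.012, 0.022, 0.075, 0.31, 0.50]  # hypothetical fitted values
print("weights:", [round(w) for w in weights])
print("weighted RSS:", weighted_rss(mean_curve, fitted, weights))
```

Early cycles have small variance and therefore large weights, so a weighted fit is pulled towards the baseline and exponential region instead of being dominated by the noisier plateau.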
Increased linearity in calibration curve
analysis by using weights
[Figure: Cq versus log(Dilution), unweighted and weighted by 1/Var]
unweighted: R2: 0.9993, AIC: -106.4
Weighted by 1/Var: R2: 0.9995, AIC: -130.2
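For a calibration line the weighted fit has a closed form. The sketch below fits Cq against log10(dilution) with and without 1/Var weights and converts the slope to an efficiency by the standard relation Eff = 10^(-1/slope); the data and the function name are hypothetical and do not reproduce the figures above.

```python
def weighted_line(x, y, w):
    """Weighted least-squares line y = intercept + slope * x, plus weighted R^2."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    syy = sum(wi * (yi - y_bar) ** 2 for wi, yi in zip(w, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared

# Hypothetical calibration data: log10(dilution), mean Cq and Cq variance per dilution.
log_dilution = [1, 2, 3, 4, 5, 6]
cq = [31.9, 28.4, 25.1, 21.7, 18.3, 15.0]
cq_var = [0.20, 0.08, 0.03, 0.02, 0.01, 0.01]

unweighted = weighted_line(log_dilution, cq, [1.0] * len(cq))
weighted = weighted_line(log_dilution, cq, [1 / v for v in cq_var])
for label, (slope, intercept, r2) in (("unweighted", unweighted),
                                      ("weighted by 1/Var", weighted)):
    eff = 10 ** (-1 / slope)  # standard conversion from calibration slope to efficiency
    print(f"{label}: slope={slope:.3f}  intercept={intercept:.2f}  R2={r2:.5f}  Eff={eff:.3f}")
```

Dilutions with noisy Cq values (typically the most dilute samples) contribute less to the weighted line, which is why weighting can tighten the fit.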
Questions welcome!