Improving sigmoidal fitting of real-time PCR data: Linear-quadratic-sigmoidal hybrid models, error analysis and weighting

Andrej-Nikolai Spiess

University Hospital Hamburg-Eppendorf

Department of Andrology

R: the lingua franca of statistical data analysis

• S was developed by John Chambers and colleagues at Bell Labs from the mid-1970s

• S was commercialised as S-PLUS; R, a freely available (open-source) implementation of the S language, was started by Ross Ihaka and Robert Gentleman in the early 1990s

• 15 core people maintain R, ~ 2 million users worldwide

• R's own repertoire of ~ 3000 functions can be extended by 'packages' (e.g. for spatial or environmental statistics, genomics, no limit…) => ~ 4500 packages available

qpcR: An R package for real-time PCR analysis

Ratio calculation (MC simulation, permutation (REST), error propagation)

A typical workflow using qpcR

Data import: 'pcrimport'

Model creation: 'pcrfit', 'modlist', 'replist'

[Figures: three panels of raw fluorescence versus cycles]

Model selection: L3, L4, L5, L6, L7

Goodness of fit: AIC, AICc, BIC, PRESS, F-test, R2

Eff/Cq value: 'efficiency', 'sliwin', 'expfit', 'calib'

[Figure: raw fluorescence and efficiency versus cycles. Annotations: cpD2: 15.68, cpD1: 17.59, Eff: 1.795, resVar: 0.00603899, AICc: -102.58, Model: l5]

[Figure: log(RFU) versus cycles. Annotations: R^2: 0.99986, Eff: 1.76]

[Figure: threshold cycle versus log(Dilution or copy number). Annotations: AIC: -4.55, Rsq: 0.99967, Eff: 2.04]

Ratio calculation: 'ratiocalc', 'ratiobatch'

[Figure: ratio distributions from Monte-Carlo simulation, permutation and error propagation, one panel each for r1c1:g1c1:r1s1:g1s1, r2c1:g1c1:r2s1:g1s1, r1c1:g1c1:r1s2:g1s2 and r2c1:g1c1:r2s2:g1s2]

               r1c1:g1c1:r1s1:g1s1  r2c1:g1c1:r2s1:g1s1  r1c1:g1c1:r1s2:g1s2  r2c1:g1c1:r2s2:g1s2
Mean.Sim       0.419531879          6.533495786          0.223828241          5.770492727
Std.dev..Sim   0.147043285          1.353377023          0.017764239          1.047243322
Median.Sim     0.396300098          6.393701355          0.223282684          5.669149665
MAD.Sim        0.133055518          1.304387664          0.017778066          1.016963311
Conf.lower.S   0.204116613          4.284885281          0.191219778          4.00151269
Conf.upper.S   0.767489852          9.538449133          0.260947425          8.068843646
Mean.Perm      0.410562957          6.513010694          0.223663646          5.749083527
Std.dev..Per   0.114406471          1.081009488          0.015364639          0.712937423
Median.Perm    0.330239654          6.74781282           0.236810989          6.37620751
MAD.Perm       0.079692836          1.010968373          0.006290143          0.169364005
Conf.lower.P   0.280518979          5.152642278          0.206810681          4.973967308
Conf.upper.P   0.560169465          7.961269931          0.240735435          6.481874382
perm > init.P  0.4995               0                    0.535                0
perm == init   0.5005               0.4925               0.465                0.5125
perm < init.P  0                    0.5075               0                    0.4875
Mean.Prop      0.395972171          6.39779022           0.223126803          5.678021311
Std.dev..Pro   0.134606979          1.310527148          0.017678287          1.020096686
Conf.lower.P   0.13214734           3.829204209          0.188477998          3.678668546
Conf.upper.P   0.659797003          8.96637623           0.257775607          7.677374075

Data export

What can qpcR do?

Sigmoidal fitting:
3-, 4-, 5-, 6-, 7-parameter sigmoidal models. Model selection for the best model based on F-test. Goodness-of-fit measures: AIC, AICc, BIC, R2, PRESS, Chi2. Extract Cq and efficiency from any part of the curve, FDM, SDM.

Other fitting methods:
Window-of-linearity, exponential fitting, LRE, Cy0. Mechanistic models: mak2, mak3, cm3. Also: maxRatio, bilinear model, linexp.

Calibration curves:
Classical calibration (Cq versus Conc), confidence intervals, prediction of unknown samples, bootstrap replicates for better efficiency estimates.

Outlier run detection:
KOD, SOD, multivariate outlier detection (FDM, SDM, slope, Fmax).

Batch analysis:
Fit 10000 (Fluidigm) curves and calculate Eff/Cq in 5 min!

Ratio calculation:
Calculate expression ratios for single or multiple setups by MC simulation/Permutation (REST)/Error Propagation. Statistical display of results. Conduct reference gene averaging, if desired.

Visualization:
Single curves, batch curves, 3D display, heatmap display. Failed runs are automatically marked.

Melting curve analysis:
Automatic detection of melting peaks (Tm) and automatic quality control.

qPCR data import:
Advanced import function that can import ALL data through a series of 8 steps.

Testing methods on replicate qPCR data

[Figures: raw fluorescence versus cycles for each dataset. The two numbers after each reference appear to be dilution steps / replicates per step, which matches the run totals of 120, 84 and 90 quoted for the Rutledge, Guescini and Lievens data.]

• Rutledge et al. (2004): 6 / 20

• Batsch et al. (2008): 5 / 3

• Boggy et al. (2010): 6 / 2

• Guescini et al. (2008): 7 / 12

• Lievens et al. (2012): 5 / 18

• Spiess et al. (2008): 5 / 18

All datasets available at www.dr-spiess.de

Why sigmoidal models?

Fitting complete curve…

• Sigmoidal: Cq (SDM), E(Cq) = F(Cq)/F(Cq − 1), F0 = Fq/E(Cq)^Cq

• Mak2 (Boggy et al. 2010)

as opposed to… (curve subsets)

• Exponential (Tichopad et al., 2003)

• Sliding window (Ramakers et al., 2003)

• LRE (Rutledge et al. 2008)

[Figures: raw fluorescence versus cycles with the sigmoidal fit (Cq and Fq marked) and with the Mak2 fit; log(RFU) versus cycles, annotated R^2: 0.99949, Eff: 1.74; efficiency versus raw fluorescence, annotated R^2: 0.99646, Eff: 1.934, Cycles 15:19]
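The sigmoidal route from a fitted curve to Cq, E(Cq) and F0 can be sketched in a few lines of Python. This is a minimal illustration, not the qpcR implementation: the five-parameter log-logistic form used for `l5`, the parameter values in `PARAMS` and the grid search for the second-derivative maximum are all assumptions made for the example.

```python
import math


def l5(x, b, c, d, e, f):
    """Five-parameter log-logistic curve (assumed parametrisation)."""
    return c + (d - c) / (1.0 + math.exp(b * (math.log(x) - math.log(e)))) ** f


def second_derivative(fun, x, h=0.01):
    """Central finite-difference approximation of the second derivative."""
    return (fun(x + h) - 2.0 * fun(x) + fun(x - h)) / (h * h)


def sdm_cq(fun, lo=2.0, hi=40.0, step=0.01):
    """Cycle at which the second derivative is maximal (simple grid search)."""
    n_steps = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(n_steps + 1)]
    return max(grid, key=lambda x: second_derivative(fun, x))


# Illustrative parameter values, not estimates from any real run.
PARAMS = dict(b=-12.7, c=0.1, d=10.8, e=17.8, f=0.93)


def curve(x):
    return l5(x, **PARAMS)


cq = sdm_cq(curve)            # Cq as the second-derivative maximum (SDM)
fq = curve(cq)                # fluorescence at Cq
eff = fq / curve(cq - 1.0)    # E(Cq) = F(Cq) / F(Cq - 1)
f0 = fq / eff ** cq           # F0 = Fq / E(Cq)^Cq

print(f"Cq = {cq:.2f}, Eff = {eff:.3f}, F0 = {f0:.3e}")
```

Because the whole curve is modelled, Cq and efficiency come from the same fitted function rather than from a hand-picked subset of cycles.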

More parameters: Better fits...

[Figures: raw fluorescence versus cycles; AIC for the models l4, l5, l6 and l7, annotated 8.3E18 (!), 6E5 (!) and 2E6 (!); close-up of the fitted curves]

model      RMSE
expGrowth  17.84
l7         23.70
l6         28.99
l5         50.77
l4         114.78
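Whether the extra parameters pay for themselves can be checked with an information criterion. The sketch below computes the Gaussian AIC and the small-sample AICc from the RMSE values quoted on this slide. Two things are assumptions for illustration only: the number of cycles (`N_CYCLES = 40`) and the convention RMSE = sqrt(RSS / n); neither is stated in the source.

```python
import math


def gaussian_aic(rss, n, n_params):
    """AIC of a least-squares fit with Gaussian errors.

    The residual variance is counted as one additional parameter.
    """
    k = n_params + 1
    log_lik = -0.5 * n * (math.log(2.0 * math.pi) + math.log(rss / n) + 1.0)
    return -2.0 * log_lik + 2.0 * k


def aicc(aic, n, n_params):
    """Small-sample correction of the AIC."""
    k = n_params + 1
    return aic + 2.0 * k * (k + 1) / (n - k - 1)


N_CYCLES = 40  # assumed number of data points per curve

# model name -> (RMSE from the slide, number of curve parameters)
RMSE = {"l4": (114.78, 4), "l5": (50.77, 5), "l6": (28.99, 6), "l7": (23.70, 7)}

results = {}
for name, (rmse, n_params) in RMSE.items():
    rss = rmse ** 2 * N_CYCLES  # assumes RMSE = sqrt(RSS / n)
    aic = gaussian_aic(rss, N_CYCLES, n_params)
    results[name] = (aic, aicc(aic, N_CYCLES, n_params))

best = min(results, key=lambda m: results[m][1])
for name, (aic, aic_c) in results.items():
    print(f"{name}: AIC = {aic:.1f}, AICc = {aic_c:.1f}")
print("lowest AICc:", best)
```

Under these assumptions the drop in residual error outweighs the penalty for each added parameter, so the larger models are still preferred.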

How often do we need k1·x / k2·x^2?

     Estimate        Pr(>|t|)
b    -1.273839e+01   < 2.22e-16 ***
c    -1.418255e-01   9.8655e-05 ***
d    1.082728e+01    < 2.22e-16 ***
e    1.781830e+01    < 2.22e-16 ***
f    9.283720e-01    < 2.22e-16 ***
k1   3.442433e-02    6.1591e-08 ***
k2   -5.524618e-04   4.3619e-09 ***

            k2        k2
Rutledge:   117/120   83/120
Guescini:   80/84     82/84
Lievens:    90/90     81/90
Reps380:    360/380   353/380

[Figure: raw fluorescence versus cycles, cycles 0 to 20]

Cq and efficiency can differ between models

[Figures: Cq and Eff for the models l4, l5, l6 and l7]

Small error in efficiency and small error in Cq => LARGE error in E^Cq (quantities)!

[Figures: histograms of efficiency, Cq and the resulting quantity E^Cq]

Input with error                              Quantity E^Cq
Efficiency: µ = 1.8, σ = 0.036, c.v. = 2 %    µ = 136830, σ = 56474, c.v. = 41.3 %
Cq: µ = 20, σ = 0.4, c.v. = 2 %               µ = 131025, σ = 31596, c.v. = 24.1 %
Both                                          µ = 142558, σ = 70051, c.v. = 49.1 % !!!!
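The blow-up from 2 % input error to 40 to 50 % error in the quantity can be reproduced with a short calculation. The sketch below compares a first-order (delta-method) approximation with a Monte-Carlo simulation for Q = E^Cq, using the means and standard deviations quoted on this slide. Independent, normally distributed errors, the sample size and the seed are assumptions of the example.

```python
import math
import random
import statistics


def cv_first_order(eff, cq, sd_eff, sd_cq):
    """First-order c.v. of Q = eff ** cq for independent errors.

    ln Q = cq * ln(eff), so the relative error contributions are
    cq * sd_eff / eff (from efficiency) and ln(eff) * sd_cq (from Cq).
    """
    rel_from_eff = cq * sd_eff / eff
    rel_from_cq = math.log(eff) * sd_cq
    return math.sqrt(rel_from_eff ** 2 + rel_from_cq ** 2)


def cv_monte_carlo(eff, cq, sd_eff, sd_cq, n=20000, seed=1):
    """Simulated c.v. of Q with normally distributed efficiency and Cq."""
    rng = random.Random(seed)
    q = [rng.gauss(eff, sd_eff) ** rng.gauss(cq, sd_cq) for _ in range(n)]
    return statistics.stdev(q) / statistics.mean(q)


SCENARIOS = {
    "efficiency only": (0.036, 0.0),
    "Cq only": (0.0, 0.4),
    "both": (0.036, 0.4),
}

results = {}
for label, (sd_eff, sd_cq) in SCENARIOS.items():
    results[label] = (
        cv_first_order(1.8, 20.0, sd_eff, sd_cq),
        cv_monte_carlo(1.8, 20.0, sd_eff, sd_cq),
    )
    print(f"{label}: first order {results[label][0]:.3f}, "
          f"Monte Carlo {results[label][1]:.3f}")
```

The first-order term for efficiency is Cq times the relative error in E, which is why a 2 % error in efficiency at Cq = 20 already gives roughly 40 % error in the quantity.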

Cubic spline fitting

Similar to the "maxRatio" method of Shain et al. (2008)

Splines to calculate variance in Cq/Eff of replicate datasets

[Figures: raw fluorescence versus cycles for a single run and for all 380 runs; Cq and efficiency distributions; Fq = 1000]

Creating qPCR curves with defined error in Cq/Efficiency or both

Create 1000 Cq values within [20, 21] and Eff = 1.8 at Fq = 1. Calculate F0 for each curve by F0 = Fq/E^Cq. Create a sigmoidal efficiency curve and, from this, create a qPCR curve by F(n + 1) = F(n) * E(n).

qPCR curves with defined Cq or efficiency…: Cq [20, 21], Eff [1.7, 1.8]

qPCR curves with defined asymmetry or slope…: slope [1.2, 1.8], Asym ('f') [0.8, 1.2]
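The recipe for synthetic curves can be sketched as follows. The source only says that the efficiency curve is sigmoidal, so the logistic shape in `efficiency_curve`, its `plateau_lag` and `slope` parameters, the uniform sampling of Cq and the number of cycles are all hypothetical choices made for this illustration.

```python
import math
import random


def efficiency_curve(n, e_max, midpoint, slope):
    """Per-cycle efficiency declining sigmoidally from e_max towards 1.

    The logistic shape is an assumption of this sketch.
    """
    return 1.0 + (e_max - 1.0) / (1.0 + math.exp(slope * (n - midpoint)))


def simulate_curve(cq, e_max=1.8, fq=1.0, n_cycles=40, plateau_lag=8.0, slope=0.5):
    """Build one curve; element n of the result is F(n), starting at F(0) = F0."""
    f0 = fq / e_max ** cq                      # F0 = Fq / E^Cq
    fluo = [f0]
    for n in range(n_cycles):
        e_n = efficiency_curve(n, e_max, cq + plateau_lag, slope)
        fluo.append(fluo[-1] * e_n)            # F(n + 1) = F(n) * E(n)
    return fluo


# 1000 curves with Cq drawn from [20, 21] (uniform sampling is assumed)
rng = random.Random(42)
cq_values = [rng.uniform(20.0, 21.0) for _ in range(1000)]
curves = [simulate_curve(cq) for cq in cq_values]

reference = simulate_curve(20.0)
print(f"F(20) for Cq = 20: {reference[20]:.3f} (Fq = 1)")
```

Placing the efficiency midpoint several cycles after Cq keeps the efficiency close to its maximum up to Cq, so the simulated curve crosses Fq near the intended cycle.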

Performance of the different models on the synthetic data

[Figures: Cq and Eff for Sim, L4, L5, L6, L7 and SPL]

How many replicates do we need to estimate the error in Eff/Cq reliably?

Create n (2 – 1000) samples from a Normal distribution => calculate std. dev.

[Figure: st.dev versus # of replicates. Annotations: n=10: 95%, n=20: 98%, n=200: 99.9%]
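The sampling experiment is easy to repeat. The sketch below draws n values from a standard Normal distribution many times and averages the sample standard deviation; for comparison it also evaluates the known expectation E[s]/σ (often written c4). The number of simulation rounds and the seed are arbitrary choices, and the percentages on the slide come from the author's own simulation, so they need not coincide exactly with these figures.

```python
import math
import random
import statistics


def c4(n):
    """Expected ratio E[s] / sigma for the sample std. dev. of n Normal draws."""
    log_ratio = math.lgamma(n / 2.0) - math.lgamma((n - 1) / 2.0)
    return math.sqrt(2.0 / (n - 1)) * math.exp(log_ratio)


def mean_sample_sd(n, n_sim=2000, seed=0):
    """Average sample std. dev. over n_sim simulated sets of n replicates."""
    rng = random.Random(seed)
    sds = [
        statistics.stdev([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(n_sim)
    ]
    return statistics.mean(sds)


for n in (3, 10, 20, 200):
    print(f"n = {n:3d}: simulated {mean_sample_sd(n):.3f}, expected {c4(n):.3f}")
```

With only a handful of replicates the standard deviation is both underestimated on average and highly variable from one set to the next.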

Typical variance structures in qPCR replicates

[Figures: raw fluorescence versus cycles and variance versus cycles for Rutledge, 20 replicates, and Lievens, 12 replicates]

Better fit in exponential region by using weights

[Figures: raw fluorescence and standardized residual value versus cycles, unweighted and weighted by 1/Var]
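Weighting by 1/Var can be illustrated with a small sketch: the variance is taken per cycle across replicate runs, and its inverse weights the squared residuals. The replicate values below are made up for the example, and the function names are hypothetical.

```python
import statistics


def inverse_variance_weights(replicates):
    """Per-cycle weights 1/Var from replicate runs.

    replicates: list of runs, each a list of fluorescence values per cycle.
    A cycle with zero variance would raise ZeroDivisionError.
    """
    per_cycle = zip(*replicates)
    variances = [statistics.variance(values) for values in per_cycle]
    return [1.0 / v for v in variances]


def weighted_rss(observed, fitted, weights):
    """Weighted residual sum of squares, the quantity a weighted fit minimises."""
    return sum(w * (o - f) ** 2 for o, f, w in zip(observed, fitted, weights))


# Made-up replicate data: three runs, five cycles each
replicates = [
    [0.010, 0.020, 0.10, 0.40, 0.50],
    [0.011, 0.022, 0.12, 0.44, 0.53],
    [0.009, 0.019, 0.09, 0.37, 0.48],
]

weights = inverse_variance_weights(replicates)
for cycle, w in enumerate(weights, start=1):
    print(f"cycle {cycle}: weight {w:.0f}")
```

Early cycles show little variance between replicates and therefore receive large weights, which pulls the fit towards the baseline and exponential region.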

Increased linearity in calibration curve analysis by using weights

[Figures: Cq versus log(Dilution)]

Unweighted: R2: 0.9993, AIC: -106.4

Weighted by 1/Var: R2: 0.9995, AIC: -130.2
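A weighted calibration line can be sketched with closed-form weighted least squares. The replicate Cq values below are invented for illustration, the weights are taken as 1/Var of the replicates at each dilution, and the efficiency follows from the slope by the usual relation E = 10^(-1/slope). None of this is meant to mirror the 'calib' function.

```python
import statistics


def weighted_line(x, y, w):
    """Weighted least-squares line y = intercept + slope * x."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def efficiency_from_slope(slope):
    """Amplification efficiency from the calibration slope (Cq per log10 unit)."""
    return 10.0 ** (-1.0 / slope)


# Made-up replicate Cq values per log10(copy number)
cq_replicates = {
    1: [30.3, 29.8, 30.5],
    2: [26.9, 26.7, 27.0],
    3: [23.4, 23.5, 23.3],
    4: [20.0, 20.1, 20.0],
    5: [16.7, 16.6, 16.7],
    6: [13.3, 13.3, 13.4],
}

log_conc = list(cq_replicates)
mean_cq = [statistics.mean(v) for v in cq_replicates.values()]
weights = [1.0 / statistics.variance(v) for v in cq_replicates.values()]

intercept, slope = weighted_line(log_conc, mean_cq, weights)
eff = efficiency_from_slope(slope)
print(f"slope = {slope:.3f}, intercept = {intercept:.2f}, Eff = {eff:.3f}")
```

Dilutions with tightly agreeing replicates dominate the fit, while the noisier points at low copy number are down-weighted.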

Questions welcome!
