combining asca and mixed models to analyse high dimensional … · 2018-02-09 · combining asca...

24
Combining ASCA and mixed models to analyse high dimensional designed data M. Martin P. De Tullio B. Govaerts Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA), Université catholique de Louvain (UCL), Belgium Laboratoire de Chimie Pharmaceutique, Université de Liège (ULg), Liège 1, Belgium January 30, 2018

Upload: others

Post on 12-Jul-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Combining ASCA and mixed models to analyse high dimensional … · 2018-02-09 · Combining ASCA and mixed models to analyse high dimensional designed data M. Martin 1P. De Tullio

Combining ASCA and mixed models to analysehigh dimensional designed data

M. Martin 1 P. De Tullio 2 B. Govaerts 1

1 Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA), Université catholique de Louvain (UCL), Belgium

2Laboratoire de Chimie Pharmaceutique, Université de Liège (ULg), Liège 1, Belgium

January 30, 2018

Page 2: Combining ASCA and mixed models to analyse high dimensional … · 2018-02-09 · Combining ASCA and mixed models to analyse high dimensional designed data M. Martin 1P. De Tullio

Introduction ASCA-related methods MiMoHD3 In summary Acknowledgments References

High Dimensional Experimental Design data

Challenging data analysis

Application to life science data:From -omics sciences: genomics, transcriptomics, metabolomicsHigh Dimensional: multivariate; often m variables > n samplesBiological variability, instrumental noise/artifactsMulticollinearity between variables

Nontrivial Design Of Experiments (DOE):ex.: longitudinal, multi-centre, cross-over studies, etc.Presence of random factors (day, lab variation, ...)

2 / 24

Page 3: Combining ASCA and mixed models to analyse high dimensional … · 2018-02-09 · Combining ASCA and mixed models to analyse high dimensional designed data M. Martin 1P. De Tullio

Introduction ASCA-related methods MiMoHD3 In summary Acknowledgments References

Table of content

High Dimensional designed data analysis

ASCA methodology

Extension to mixed models: description & application

3 / 24

Page 4: Combining ASCA and mixed models to analyse high dimensional … · 2018-02-09 · Combining ASCA and mixed models to analyse high dimensional designed data M. Martin 1P. De Tullio

Introduction ASCA-related methods MiMoHD3 In summary Acknowledgments References

Methods needed to analyse HD designed data

Multivariate projection methods

- PCA, ICA, (O)PLS, ...- Disadvantages:

Simple exp. designs (e.g. 2 groups comparison)

Few statistical modelling and tests

Statistical regression methods

Depends on the response dimension:y(1�m): linear or logistic regression, ANOVA, mixed modelsY(n�m) with m < n: MANOVA, multivariate-GLM

But often mo n) need to combine dimension reduction and statisticalmodelling.

4 / 24

Page 5: Combining ASCA and mixed models to analyse high dimensional … · 2018-02-09 · Combining ASCA and mixed models to analyse high dimensional designed data M. Martin 1P. De Tullio

Introduction ASCA-related methods MiMoHD3 In summary Acknowledgments References

ASCA methodology

) Combine the dimension reduction methodsand statistical modelling [Jansen et al., 2005]:

ANOVA-Simultaneous Components Analysis (ASCA)

Example: crossed ANOVA II model:yijk = �:: + �i + �j + (��)ij + �ijk

STEP 1: Parallel ANOVA decomposition of YY(n�m) is decomposed into effect matrices:

Y = M0 + MA + MB + MAB + E

STEP 2: (Residual-Augmented) effect matricesvisualisation

ASCA: Mf = TfP′f APCA: Mf +E = Tf P

′f

STEP 3: Effect importance quantification based onFrobenius norms jjMf jj

2

STEP 4: Global measure of effect significance based onpermutation tests

5 / 24

Page 6: Combining ASCA and mixed models to analyse high dimensional … · 2018-02-09 · Combining ASCA and mixed models to analyse high dimensional designed data M. Martin 1P. De Tullio

Introduction ASCA-related methods MiMoHD3 In summary Acknowledgments References

Limitations with classic ASCA and new developments

1. Balanced ANOVA designs with fixed factors

) Generalisation to unbalanced data with fixed and random factors

2. // ANOVA does not take into account the correlation between the responses

) Prior PCA reduction & back-transformations

3. Permutation tests implementation is challenging for advanced DOE[Anderson and Braak, 2003]

) Use of alternative test strategies: Likelihood ratio tests & bootstrap

6 / 24

Page 7: Combining ASCA and mixed models to analyse high dimensional … · 2018-02-09 · Combining ASCA and mixed models to analyse high dimensional designed data M. Martin 1P. De Tullio

Introduction ASCA-related methods MiMoHD3 In summary Acknowledgments References

APCA+ for unbalanced designs (Thiel et al. [2017], Guisset et al. [Submitted])

7 / 24

Page 8: Combining ASCA and mixed models to analyse high dimensional … · 2018-02-09 · Combining ASCA and mixed models to analyse high dimensional designed data M. Martin 1P. De Tullio

Introduction ASCA-related methods MiMoHD3 In summary Acknowledgments References

Mixed Models for High Dimensional Designed Data (MiMoHD3)

8 / 24

Page 9: Combining ASCA and mixed models to analyse high dimensional … · 2018-02-09 · Combining ASCA and mixed models to analyse high dimensional designed data M. Martin 1P. De Tullio

Introduction ASCA-related methods MiMoHD3 In summary Acknowledgments References

The Metabiose repeatability datasets

Context:Urine and Serum 1H-NMR spectral data from control and endometriosis patients

Statistical perspective of spectral reproducibility/repeatability and quality control: not yet wellstudied in metabolomics

Unbalanced design for the Serum dataset

3 factors:1 fixed: groups (G; 2)

2 random: patients (P; 7/group) &repetitions over weeks (W; 3)

Main research questions:Compare the groups

Quantify the variability of therepetitions and the patients

Test the significance of theserandom/fixed effects

Experimental design

9 / 24

Page 10: Combining ASCA and mixed models to analyse high dimensional … · 2018-02-09 · Combining ASCA and mixed models to analyse high dimensional designed data M. Martin 1P. De Tullio

Introduction ASCA-related methods MiMoHD3 In summary Acknowledgments References

Step 0: Prior PCA dimension reduction (1)

10 / 24

Page 11: Combining ASCA and mixed models to analyse high dimensional … · 2018-02-09 · Combining ASCA and mixed models to analyse high dimensional designed data M. Martin 1P. De Tullio

Introduction ASCA-related methods MiMoHD3 In summary Acknowledgments References

Step 0: Prior PCA dimension reduction (2)

Transform the highly correlated response matrix into a reduced number of orthogonalcomponents without information loss

PCA dimension reduction on the response matrix Y = TCP′C + F

Keep C = 11 first Principal Components (PC) withP

Cvar(PC) � 99%

●●

●●

● ●

●●

●●●

●●●●

●●

●●

●●

●●

● ●

−0.04

−0.02

0.00

0.02

0.04

0.06

−0.10 −0.05 0.00 0.05 0.10

PC1 (66.17%)

PC

2 (1

5.52

%)

Group

Endo

Control

Scores plot − Serum

●●

●●

● ●

●●

●●●

●●●●

●●

●●

●●

●●

● ●

−0.04

−0.02

0.00

0.02

0.04

0.06

−0.10 −0.05 0.00 0.05 0.10

PC1 (66.17%)

PC

2 (1

5.52

%)

Patient

210E

211E

214E

219C

221C

223E

226C

241E

248C

254E

258C

261C

265E

266C

Scores plot − Serum

11 / 24

Page 12: Combining ASCA and mixed models to analyse high dimensional … · 2018-02-09 · Combining ASCA and mixed models to analyse high dimensional designed data M. Martin 1P. De Tullio

Introduction ASCA-related methods MiMoHD3 In summary Acknowledgments References

Step 1: Parallel mixed models on Tn�C

12 / 24

Page 13: Combining ASCA and mixed models to analyse high dimensional … · 2018-02-09 · Combining ASCA and mixed models to analyse high dimensional designed data M. Martin 1P. De Tullio

Introduction ASCA-related methods MiMoHD3 In summary Acknowledgments References

General framework for mixed models

Fixed + random factors vs linear regression (only 1 source of random variation)

The mixed model for one response tc can be written as:

Variance components = �2P + �

2R + �

2

Drop the hypothesis of independence between the samples) model advanced designs

Coding: Sum coding for fixed and dummy coding for random effects

Typical applications: Multicentre study, Multilevel data, Repeated data, etc.

Parameters are estimated with the Restricted Maximum Likelihood (REML) method

13 / 24

Page 14: Combining ASCA and mixed models to analyse high dimensional … · 2018-02-09 · Combining ASCA and mixed models to analyse high dimensional designed data M. Martin 1P. De Tullio

Introduction ASCA-related methods MiMoHD3 In summary Acknowledgments References

Step 2: PCA decomposition of (RA) effect matrices

14 / 24

Page 15: Combining ASCA and mixed models to analyse high dimensional … · 2018-02-09 · Combining ASCA and mixed models to analyse high dimensional designed data M. Martin 1P. De Tullio

Introduction ASCA-related methods MiMoHD3 In summary Acknowledgments References

PCA on fixed/random pure effect matrices

●●●●●●●● ●●●●●●● ●●●●● ●●●●●● ●●●●●●●●● ●●●

−0.50

−0.25

0.00

0.25

0.50

−0.02 −0.01 0.00 0.01

PC1 (100%)

PC

2 (0

%)

Scores plot − Group effect − Serum

−4e−04

−2e−04

0e+00

2e−04

−0.002 −0.001 0.000 0.001 0.002

PC1 (97.61%)

PC

2 (2

.39%

)

Scores plot − Repetition effect − Serum

●●●

●●

●●●

●●

●●●

●●

●●●

●●

●●●

●●●

●●●

●●●

●●●●●●

−0.02

−0.01

0.00

0.01

0.02

−0.050 −0.025 0.000 0.025

PC1 (75.88%)

PC

2 (1

8.2%

)

Scores plot − Patient effect − Serum

PC

1

2.55.07.510.0−0.8

−0.6

−0.4

−0.2

0.0

Loadings plot − Group effect − Serum

PC

1P

C2

2.55.07.510.0

−0.2

0.0

0.2

0.4

0.6

−0.25

0.00

0.25

Loadings plot − Repetition effect − Serum

PC

1P

C2

2.55.07.510.0

−0.50

−0.25

0.00

−0.50

−0.25

0.00

0.25

0.50

Loadings plot − Patient effect − Serum

15 / 24

Page 16: Combining ASCA and mixed models to analyse high dimensional … · 2018-02-09 · Combining ASCA and mixed models to analyse high dimensional designed data M. Martin 1P. De Tullio

Introduction ASCA-related methods MiMoHD3 In summary Acknowledgments References

PCA on fixed/random Residual-Augmented effect matrices

In APCA: E added to Mf

But which variance components should be added for mixed models?

Solution: RA effect matrices based on the ANOVA F-tests (Expected Mean Squaresratio)

MG = MG+MP +E

MP = MP+E

MR = MR+E

●●

●●

●●

●●

●●

●●●●

●●

●●

●●

● ●

−0.02

0.00

0.02

0.04

−0.05 0.00 0.05

PC1 (66.86%)

PC

2 (1

5.67

%)

Group

Endo

Control

Scores plot − RA Group effect − Serum

●●

● ●

●●

●●●

●●●

−0.02

−0.01

0.00

0.01

0.02

−0.02 −0.01 0.00 0.01 0.02

PC1 (58.33%)

PC

2 (2

2.02

%)

Repetition

1

2

3

Scores plot − RA Repetition effect − Serum

●●

●●

●●

●●

●●●

●●

●●

● ●

−0.02

0.00

0.02

0.04

−0.075 −0.050 −0.025 0.000 0.025 0.050

PC1 (61.58%)

PC

2 (1

8.17

%)

Patient

210E

211E

214E

219C

221C

223E

226C

241E

248C

254E

258C

261C

265E

266C

Scores plot − RA Patient effect − Serum

16 / 24

Page 17: Combining ASCA and mixed models to analyse high dimensional … · 2018-02-09 · Combining ASCA and mixed models to analyse high dimensional designed data M. Martin 1P. De Tullio

Introduction ASCA-related methods MiMoHD3 In summary Acknowledgments References

Steps 3 & 4: Global measure of factor importance/significance

17 / 24

Page 18: Combining ASCA and mixed models to analyse high dimensional … · 2018-02-09 · Combining ASCA and mixed models to analyse high dimensional designed data M. Martin 1P. De Tullio

Introduction ASCA-related methods MiMoHD3 In summary Acknowledgments References

Step 3: Quantification of effect importance

For each response tc, c = 1; ::; C:

Random effects: Variance components

�2P;c; �2R;c; �

2c

Fixed effects [Nakagawa and Schielzeth, 2013]:

�2G;c = var(�G;cxG)

For all tc responses:

Total variance:

�2tot =PC

c=1(�2G;c + �2P;c + �2R;c + �2c )

Serum Urine

GroupPatientRepetitionResiduals

Variance components percentage

Medium

020

4060

80

Serum UrineGroup 14.32 2.4Patient 64.03 91.48Repetition 0.6 0Residuals 21.05 6.12

Effect importance percentage

18 / 24

Page 19: Combining ASCA and mixed models to analyse high dimensional … · 2018-02-09 · Combining ASCA and mixed models to analyse high dimensional designed data M. Martin 1P. De Tullio

Introduction ASCA-related methods MiMoHD3 In summary Acknowledgments References

Step 4: Global measure of effect significance

Likelihood Ratio Test (LRT)Test the significance of a fixed/random effect in a (mixed) linear modelCompare the likelihoods L between nested models

LRT statistic: 2[log(Lfull)� log(Lnull)] �H0�2df

a Global LRT (GLRT)The test statistic for a fixed/random effect matrix

GLRT statistic: 2[PC

c=1(log(Lfull;c)� log(Lnull;c))] �H0

�2C�df

Assess the significance of an effect based on:

The �2 distribution with known df (fixed effects)bootstrap simulations (fixed/random effects)

19 / 24

Page 20: Combining ASCA and mixed models to analyse high dimensional … · 2018-02-09 · Combining ASCA and mixed models to analyse high dimensional designed data M. Martin 1P. De Tullio

Introduction ASCA-related methods MiMoHD3 In summary Acknowledgments References

Histograms of bootstrapped GLRT (Nsim = 2000)

Fixed Group effect − Serum

0 10 20 30 40

0.00

0.02

0.04

0.06

0.08

● True GLRT: 8.93Kernel densitychi2 distrib. (df=11)qchi2=0.95

p−val: 0.72

Fixed Group effect − Urine

0 10 20 30 40

0.00

0.02

0.04

0.06

0.08

● True GLRT: 15.61Kernel densitychi2 distrib. (df=12)qchi2=0.95

p−val: 0.365

20 / 24

Page 21: Combining ASCA and mixed models to analyse high dimensional … · 2018-02-09 · Combining ASCA and mixed models to analyse high dimensional designed data M. Martin 1P. De Tullio

Introduction ASCA-related methods MiMoHD3 In summary Acknowledgments References

Histograms of boostraped GLRT (Nsim = 2000)

Random Patient effect − Serum

0 20 40 60 80 100

0.00

0.04

0.08

0.12

● True GLRT: 112.44Kernel density

p−val: 0***

Random Repetition effect − Serum

Den

sity

0 5 10 15 20

0.00

0.05

0.10

0.15

0.20

● True GLRT: 2.01Kernel density

p−val: 0.84

Random Patient effect − Urine

0 200 400 600 800

0.00

0.02

0.04

0.06

0.08

0.10

0.12

● True GLRT: 850.15Kernel density

p−val: 0***

Random Repetition effect − Urine

Den

sity

0 5 10 15 20 25

0.00

0.05

0.10

0.15

0.20

● True GLRT: 0Kernel density

p−val: 1

21 / 24

Page 22: Combining ASCA and mixed models to analyse high dimensional … · 2018-02-09 · Combining ASCA and mixed models to analyse high dimensional designed data M. Martin 1P. De Tullio

Introduction ASCA-related methods MiMoHD3 In summary Acknowledgments References

In summary

MiMoHD3, a combination of mixed models & multivariate projection methods

Innovative extension and generalisation in the ASCA(+) framework

Enable to model unbalanced designs with random factors

Take into account the correlation between the response variables

Global test of effect significance

Quantification and comparison of the mixed variability sources

Targeted applications: repeatability/reproductibility study & longitudinal data

22 / 24

Page 23: Combining ASCA and mixed models to analyse high dimensional … · 2018-02-09 · Combining ASCA and mixed models to analyse high dimensional designed data M. Martin 1P. De Tullio

Introduction ASCA-related methods MiMoHD3 In summary Acknowledgments References

Acknowledgments

23 / 24

Page 24: Combining ASCA and mixed models to analyse high dimensional … · 2018-02-09 · Combining ASCA and mixed models to analyse high dimensional designed data M. Martin 1P. De Tullio

Introduction ASCA-related methods MiMoHD3 In summary Acknowledgments References

References

Marti Anderson and Cajo Ter Braak. Permutation tests for multi-factorial analysis of variance.Journal of Statistical Computation and Simulation, 73(2):85–113, 2003. doi:10.1080/00949650215733. URL http://dx.doi.org/10.1080/00949650215733.

Séverine Guisset, Bernadette Govaerts, and Manon Martin. Comparison of parafasca, acomdim,and amopls approaches in the multivariate glm modelling of multi-factorial designs.Chemometrics and Intelligent Laboratory Systems, Submitted.

Jeroen J. Jansen, Huub C. J. Hoefsloot, Jan van der Greef, Marieke E. Timmerman, Johan A.Westerhuis, and Age K. Smilde. Asca: analysis of multivariate data obtained from anexperimental design. Journal of Chemometrics, 19(9):469–481, 2005. ISSN 1099-128X. doi:10.1002/cem.952. URL http://dx.doi.org/10.1002/cem.952.

Shinichi Nakagawa and Holger Schielzeth. A general and simple method for obtaining r2 fromgeneralized linear mixed-effects models. Methods in Ecology and Evolution, 4(2):133–142,2013. ISSN 2041-210X. doi: 10.1111/j.2041-210x.2012.00261.x. URLhttp://dx.doi.org/10.1111/j.2041-210x.2012.00261.x.

Michel Thiel, Baptiste Féraud, and Bernadette Govaerts. Asca+ and apca+: Extensions of ascaand apca in the analysis of unbalanced multifactorial designs. Journal of Chemometrics, 31(6):e2895–n/a, 2017. ISSN 1099-128X. doi: 10.1002/cem.2895. URLhttp://dx.doi.org/10.1002/cem.2895. e2895 cem.2895.

24 / 24