structured data analysis -...
TRANSCRIPT
Neuroimaging-Genetic multiblock datasets
X1 DNA arrays (SNP), p1 ~ 10^6
X2 Functional MRI, p2 ~ 10^4
X3 Developmental disorders - Reading difficulties - Basic numerical knowledge - … - Visuo-spatial abilities - Visuo-motor abilities, p3 ~ 10
n ~ 100
Design: c12 = 1, c23 = 1, c13 = 0
Block components
𝐲1 = 𝐗1𝐚1 = 𝑎11𝐒𝐍𝐏1 +⋯+ 𝑎1,𝑗1𝐒𝐍𝐏𝑗1
𝐲2 = 𝐗2𝐚2 = 𝑎21𝐂𝐆𝐇1 +⋯+ 𝑎2,𝑗2𝐂𝐆𝐇𝑗2
𝐲3 = 𝐗3𝐚3 = 𝑎31𝐁𝐄𝐇𝐀𝐕1 +⋯+ 𝑎3,𝑗3𝐁𝐄𝐇𝐀𝐕𝑗3
Block components should satisfy two properties at the same time:
(i) Block components explain their own block well.
(ii) Block components are as correlated as possible for connected blocks.
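The two properties can be made concrete with a small numerical sketch. Everything below is illustrative: the block sizes are shrunk toy values, the data are random, and the weight vectors are arbitrary unit-norm vectors rather than optimized ones. The helper names `explained_variance_share` and `connected_correlations` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
sizes = [50, 30, 10]  # toy sizes; the real blocks are far larger
blocks = [rng.standard_normal((n, p)) for p in sizes]
blocks = [X - X.mean(axis=0) for X in blocks]  # column-center each block

# Design matrix from the slide: c12 = c23 = 1, c13 = 0.
C = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])

# Arbitrary unit-norm weight vectors a_j (not optimized).
weights = [rng.standard_normal(p) for p in sizes]
weights = [a / np.linalg.norm(a) for a in weights]

# Block components y_j = X_j a_j.
components = [X @ a for X, a in zip(blocks, weights)]


def explained_variance_share(X, y):
    """Property (i): share of the block's total variance carried by y = X a, ||a|| = 1."""
    return y.var() / X.var(axis=0).sum()


def connected_correlations(components, C):
    """Property (ii): correlations between components of connected blocks only."""
    R = np.corrcoef(np.column_stack(components), rowvar=False)
    J = len(components)
    return {(j + 1, k + 1): R[j, k]
            for j in range(J) for k in range(j + 1, J) if C[j, k]}


shares = [explained_variance_share(X, y) for X, y in zip(blocks, components)]
pair_correlations = connected_correlations(components, C)
print(shares)
print(pair_correlations)
```

With random weights both quantities are small; the optimization problem on the next slide looks for weights that trade them off.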
RGCCA optimization problem
\[
\underset{\mathbf{a}_1,\mathbf{a}_2,\ldots,\mathbf{a}_J}{\operatorname{argmax}} \; \sum_{j \neq k}^{J} c_{jk}\, g\big(\operatorname{cov}(\mathbf{X}_j\mathbf{a}_j, \mathbf{X}_k\mathbf{a}_k)\big)
\]
Subject to the constraints
\[
(1-\tau_j)\operatorname{var}(\mathbf{X}_j\mathbf{a}_j) + \tau_j \lVert\mathbf{a}_j\rVert^2 = 1, \quad j = 1,\ldots,J
\]
where:
c_jk = 1 if X_j and X_k are connected, 0 otherwise
g = identity (Horst scheme), square (Factorial scheme) or absolute value (Centroid scheme)
τ_j = shrinkage constant between 0 and 1
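A minimal sketch of how the criterion and the constraint could be evaluated for given weight vectors. It only evaluates the objective and does not implement the RGCCA algorithm. The names `normalize`, `rgcca_criterion` and `SCHEMES` are hypothetical, the data are random, and variances and covariances use the 1/n convention, which is an assumption.

```python
import numpy as np

# The three choices of g named on the slide.
SCHEMES = {
    "horst": lambda x: x,          # identity
    "factorial": lambda x: x ** 2,  # square
    "centroid": abs,               # absolute value
}


def normalize(X, a, tau):
    """Rescale a so that (1 - tau) * var(X a) + tau * ||a||^2 = 1."""
    scale = (1 - tau) * np.var(X @ a) + tau * (a @ a)
    return a / np.sqrt(scale)


def rgcca_criterion(blocks, weights, C, scheme="centroid"):
    """Sum over j != k of c_jk * g(cov(X_j a_j, X_k a_k))."""
    g = SCHEMES[scheme]
    comps = [X @ a for X, a in zip(blocks, weights)]
    comps = [y - y.mean() for y in comps]
    J = len(blocks)
    total = 0.0
    for j in range(J):
        for k in range(J):
            if j != k and C[j, k]:
                total += C[j, k] * g(np.mean(comps[j] * comps[k]))
    return total


# Toy usage with random data (assumed sizes and shrinkage constants).
rng = np.random.default_rng(0)
blocks = [rng.standard_normal((100, p)) for p in (20, 15, 5)]
C = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
taus = [1.0, 0.5, 0.0]
weights = [normalize(X, rng.standard_normal(X.shape[1]), t)
           for X, t in zip(blocks, taus)]
value = rgcca_criterion(blocks, weights, C, scheme="factorial")
print(value)
```

With τ_j = 1 the constraint reduces to a unit-norm weight vector, and with τ_j = 0 to a unit-variance component.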
• Tenenhaus A. and Tenenhaus M., Regularized Generalized Canonical Correlation Analysis, Psychometrika, vol. 76, Issue 2, pp. 257-284, 2011
• Tenenhaus A., Philippe C., Frouin V., Kernel Generalized Canonical Correlation Analysis, Computational Statistics and Data Analysis, submitted.
• Tenenhaus A. and Guillemot V. (2013): RGCCA Package. http://cran.project.org/web/packages/RGCCA/index.html
Block components
Block components should satisfy three properties at the same time:
(i) Block components explain their own block well.
(ii) Block components are as correlated as possible for connected blocks.
(iii) Block components are built from sparse 𝐚𝑗.
𝐲1 = 𝐗1𝐚1 = 𝑎11𝐒𝐍𝐏1 +⋯+ 𝑎1,𝑗1𝐒𝐍𝐏𝑗1
𝐲2 = 𝐗2𝐚2 = 𝑎21𝐂𝐆𝐇1 +⋯+ 𝑎2,𝑗2𝐂𝐆𝐇𝑗2
𝐲3 = 𝐗3𝐚3 = 𝑎31𝐁𝐄𝐇𝐀𝐕1 +⋯+ 𝑎3,𝑗3𝐁𝐄𝐇𝐀𝐕𝑗3
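To illustrate what a sparse weight vector means in practice, here is a sketch using soft-thresholding, a standard operator for producing exact zeros under an L1-type penalty. Whether the method in these slides uses exactly this operator is not stated here, so treat it as an assumption; the vector and threshold are made up.

```python
import numpy as np


def soft_threshold(v, lam):
    """Shrink each entry toward zero by lam; entries with |v_i| <= lam become zero."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)


dense = np.array([0.9, -0.05, 0.3, 0.02, -0.6])  # made-up dense weights
sparse = soft_threshold(dense, 0.1)
sparse = sparse / np.linalg.norm(sparse)  # rescale to unit norm

# Only the variables with non-zero weight enter the block component.
selected = np.flatnonzero(sparse)
print(sparse, selected)
```

The component built from `sparse` involves only the selected variables, which is what makes the result interpretable when a block has around 10^6 columns.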
[Diagram labels: Genotype; Intermediate phenotype; Final phenotype; Functional MRI; Gene Expression; Behavioral data (clinical, psychometric)]
(Structured) variable selection for RGCCA
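\[
\underset{\mathbf{a}_1,\mathbf{a}_2,\ldots,\mathbf{a}_J}{\operatorname{argmax}} \; \sum_{j \neq k}^{J} c_{jk}\, g\big(\operatorname{cov}(\mathbf{X}_j\mathbf{a}_j, \mathbf{X}_k\mathbf{a}_k)\big)
\]
subject to
\[
\mathbf{a}_j^t\mathbf{M}_j\mathbf{a}_j = 1 \quad\text{and}\quad \Omega(\mathbf{a}_j) \le c_j, \quad j = 1,\ldots,J
\]
• LASSO: \( \Omega(\mathbf{a}_j) = \lVert\mathbf{a}_j\rVert_1 \)
• Group LASSO: \( \Omega(\mathbf{a}_j) = \sum_{g\in\mathcal{G}} \lVert\mathbf{a}_g\rVert_2 \)
• Fused LASSO: \( \Omega(\mathbf{a}_j) = \sum_{k=1}^{p_j} \lvert a_{jk}\rvert + \lambda \sum_{k=2}^{p_j} \lvert a_{jk} - a_{j,k-1}\rvert \)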
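The three penalties can be written as short functions. This is a sketch: the function names, the example vector and the grouping are hypothetical, and the fused term sums over consecutive differences starting at the second variable.

```python
import numpy as np


def lasso_penalty(a):
    """L1 norm: sum of absolute values."""
    return np.sum(np.abs(a))


def group_lasso_penalty(a, groups):
    """Sum over groups g of the Euclidean norm of the sub-vector a_g."""
    return sum(np.linalg.norm(a[np.asarray(g)]) for g in groups)


def fused_lasso_penalty(a, lam):
    """L1 norm plus lam times the sum of absolute consecutive differences."""
    return np.sum(np.abs(a)) + lam * np.sum(np.abs(np.diff(a)))


a = np.array([0.0, 0.5, 0.5, 0.0, -1.0])  # made-up weight vector
groups = [[0, 1, 2], [3, 4]]              # made-up grouping of the variables

print(lasso_penalty(a))
print(group_lasso_penalty(a, groups))
print(fused_lasso_penalty(a, lam=2.0))
```

The LASSO favors individual zeros, the group LASSO zeroes out whole groups together, and the fused LASSO additionally favors neighboring weights being equal, which suits variables with a natural ordering.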
• Tenenhaus A., Philippe C., Guillemot V., Lê Cao K.-A., Grill J., Frouin V., Variable Selection for Generalized Canonical Correlation Analysis, Biostatistics,
doi : 10.1093/biostatistics/kxu001, 2014.
• Löfstedt T., Hadj-Salem F., Guillemot V., Philippe C., Duchesnay E., Frouin V., and Tenenhaus A., (2014). Structured variable selection for generalized
canonical correlation analysis. In: Proceedings of the 8th International Conference on Partial Least Squares and Related Methods (PLS14), Paris, France.
multigroup data analysis
• SETTINGS: The same set of variables is measured on individuals structured in several groups.
• OBJECTIVE: Investigate the relationships between variables within the various groups.
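[Figure: data blocks X1, X2, …, XI stacked vertically, each with the same p variables (columns) and n1, n2, …, nI individuals (rows)]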
• Tenenhaus, A. and Tenenhaus, M. (2014). Regularized Generalized Canonical Correlation Analysis for multiblock or multigroup data analysis.
European Journal of Operational Research, 238 :391–403.
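\[
\underset{\mathbf{a}_1,\mathbf{a}_2,\ldots,\mathbf{a}_I}{\operatorname{argmax}} \; \sum_{i,l;\, i \neq l}^{I} c_{il}\, g\big(\langle \mathbf{X}_i^t\mathbf{X}_i\mathbf{a}_i,\; \mathbf{X}_l^t\mathbf{X}_l\mathbf{a}_l \rangle\big)
\]
subject to
\[
(1-\tau_i)\lVert\mathbf{X}_i\mathbf{a}_i\rVert^2 + \tau_i \lVert\mathbf{a}_i\rVert^2 = 1, \quad i = 1,\ldots,I
\]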
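A sketch of the multigroup setting with toy data. It assumes that the pair of vectors inside g is combined through an inner product, which is a reading of the slide rather than something it states explicitly. The function names, group sizes and shrinkage value are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
p = 8                        # same variables in every group
group_sizes = [40, 60, 25]   # n_1, ..., n_I (made-up values)
groups = [rng.standard_normal((n_i, p)) for n_i in group_sizes]
groups = [X - X.mean(axis=0) for X in groups]  # center within each group

# Fully connected design between groups (assumption).
C = np.ones((3, 3)) - np.eye(3)


def normalize_group(X, a, tau):
    """Rescale a so that (1 - tau) * ||X a||^2 + tau * ||a||^2 = 1."""
    scale = (1 - tau) * np.sum((X @ a) ** 2) + tau * (a @ a)
    return a / np.sqrt(scale)


def multigroup_criterion(groups, weights, C, g=np.abs):
    """Sum over i != l of c_il * g(<X_i^t X_i a_i, X_l^t X_l a_l>).

    Assumption: the two p-vectors are combined with an inner product.
    """
    v = [X.T @ (X @ a) for X, a in zip(groups, weights)]
    I = len(groups)
    return sum(C[i, l] * g(v[i] @ v[l])
               for i in range(I) for l in range(I) if i != l)


tau = 0.5
weights = [normalize_group(X, rng.standard_normal(p), tau) for X in groups]
value = multigroup_criterion(groups, weights, C)
print(value)
```

Unlike the multiblock case, every weight vector here has the same length p, because the groups share variables rather than individuals.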
From Multiblock data to …
[Figure: six blocks measured on the same n individuals: X1 SNP array (p1 variables); X2 to X5 Anatomical MRI, Diffusion MRI, Functional MRI and PET (p2 variables each); X6 Final Phenotype (p3 variables)]
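… to Multiblock / Multiway data
[Figure: three blocks measured on the same n individuals: X1 SNP array (p1 variables); 𝐗2 NeuroImaging, a multiway array (p2 variables per modality); X3 Final Phenotype (p3 variables)]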
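\[
\underset{\mathbf{a}_1,\mathbf{a}_2,\ldots,\mathbf{a}_J}{\operatorname{argmax}} \; \sum_{j \neq k}^{J} c_{jk}\, g\big(\operatorname{cov}(\mathbf{X}_j\mathbf{a}_j, \mathbf{X}_k\mathbf{a}_k)\big)
\]
subject to the constraints
\[
\mathbf{a}_j^t\mathbf{M}_j\mathbf{a}_j = 1 \quad\text{and}\quad \mathbf{a}_j = \mathbf{a}_j^K \otimes \mathbf{a}_j^J, \quad j = 1,\ldots,J
\]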
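The Kronecker constraint can be checked numerically. The sketch below assumes a three-way block stored as an array of shape (n, dim_J, dim_K) and unfolded so that its columns match the ordering of `np.kron(a_K, a_J)`. The dimension names and the unfolding convention are assumptions, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(2)
n, dim_J, dim_K = 20, 5, 4   # made-up sizes of the three-way block
X3 = rng.standard_normal((n, dim_J, dim_K))

a_J = rng.standard_normal(dim_J)  # weights along the first mode
a_K = rng.standard_normal(dim_K)  # weights along the second mode

# Unfold to an n x (dim_K * dim_J) matrix; column index is k * dim_J + j,
# which matches the element ordering of np.kron(a_K, a_J).
X_unfolded = X3.transpose(0, 2, 1).reshape(n, dim_K * dim_J)

# Structured weight vector a = a^K (Kronecker product) a^J.
a = np.kron(a_K, a_J)

# Block component from the unfolded matrix ...
y_flat = X_unfolded @ a
# ... equals the bilinear contraction of the three-way array.
y_bilinear = np.einsum("ijk,j,k->i", X3, a_J, a_K)

print(np.allclose(y_flat, y_bilinear))
print(a.size, a_J.size + a_K.size)
```

The structured weight vector has dim_J × dim_K entries but only dim_J + dim_K free parameters, which is the reason for imposing the constraint on multiway data.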
• Tenenhaus A., Le Brusquet L. Regularized Generalized Canonical Correlation Analysis extended to three way data, International Conference of the ERCIM
WG on Computational and Methodological Statistics, 2014
• Tenenhaus A., Le Brusquet L. Three-way Regularized Generalized Canonical Correlation Analysis, ThRee-way methods In Chemistry And Psychology (TRICAP), 2015