metabolomic data: combining wavelet representation with learning approaches


DESCRIPTION

Groupe de travail Biopuces, INRA d'Auzeville May 19th, 2010

TRANSCRIPT

Metabolomic data: combining wavelet representation with learning approaches

Nathalie Villa-Vialaneix
http://www.nathalievilla.org

In collaboration with Noslen Hernández (CENATAV, Havana, Cuba) & Philippe Besse

IUT de Carcassonne (UPVD) & Institut de Mathématiques de Toulouse

Groupe de travail BioPuces, INRA de Castanet

May 19th, 2010

Overview

1 Presentation of the data

2 Wavelet preprocessing and normalization

3 Learning methods

4 Identification of relevant metabolites

Presentation of the data

Data have been provided by Alain Paris (INRA): they are metabolomic spectra (¹H NMR) from mouse urine and consist of 950 variables (from 0.50 ppm to 9.99 ppm).

Peaks have been aligned and the baseline has been removed.

Presentation of the data

Biological question

Study the effects of Hypochoeris radicata (HR) ingestion on the metabolism: HR flowers are responsible for a fatal disease in horses, "Australian stringhalt" (attack on the nervous system, trembling...).

Experiments were carried out on 72 mice.

Presentation of the data

Description of the experiments

Mice are divided into several groups according to:

gender: 36 males; 36 females

daily HR dose ingested: 0 (control): 24 mice; 3%: 24 mice; 9%: 24 mice

3 sacrifice dates: 8th day: 24 mice; 15th day: 24 mice; 21st day: 24 mice

⇒ 18 groups (but the groups coming from the sacrifice dates are irrelevant for the biological question).
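The count of 18 groups comes from crossing the three factors (2 genders × 3 doses × 3 sacrifice dates). A minimal Python sketch of that arithmetic follows; the variable names are illustrative, and the figure of 4 mice per group assumes a fully balanced design, which is consistent with the margins above but is not stated on the slide.

```python
from itertools import product

# Factor levels as listed on the slide
genders = ["male", "female"]
doses = ["0% (control)", "3%", "9%"]
sacrifice_days = [8, 15, 21]

# Every combination of the three factors defines one group
groups = list(product(genders, doses, sacrifice_days))

# Assumption: the 72 mice are spread evenly over the groups
mice_per_group = 72 // len(groups)

print(len(groups), "groups,", mice_per_group, "mice per group if balanced")
```

Dropping the sacrifice date, which is irrelevant for the biological question, leaves 2 × 3 = 6 groups.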

Presentation of the data

Days of measurement

Urine was collected on the following days:

Day          0    1    4    8    11   15   18   21
Nb of obs.   68   68   68   66   46   44   19   18

For each mouse, from 1 to 8 measurements were taken.

In total, 397 observations with 950 variables.
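As a quick consistency check, the per-day counts in the table add up to the 397 observations quoted above. The short Python sketch below only restates the table; the dictionary name is illustrative.

```python
# Number of urine samples per collection day, copied from the table
obs_per_day = {0: 68, 1: 68, 4: 68, 8: 66, 11: 46, 15: 44, 18: 19, 21: 18}

total = sum(obs_per_day.values())
n_days = len(obs_per_day)

print(total, "observations over", n_days, "collection days")
```

The 8 collection days also explain why a mouse has at most 8 measurements.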

Wavelet preprocessing and normalization

Basics about wavelets

For a given integer J, a spectrum f can be expressed at level J by:

\[
f(x) = \underbrace{\sum_{k} \alpha_k \, 2^{-J/2} \, \Psi(2^{-J}x - k)}_{\text{Trend, based on father wavelet } \Psi}
 + \underbrace{\sum_{j=1}^{J} \sum_{k} \beta_{jk} \, 2^{-j/2} \, \Phi(2^{-j}x - k)}_{\text{Details of levels } 1, \dots, J \text{, based on mother wavelet } \Phi}
\]
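To make the trend/details split concrete, here is a minimal Python sketch of a J-level decomposition with Haar wavelets, the simplest orthonormal case. It is an illustration only: the function names and the toy spectrum are hypothetical, the signal length is assumed to be divisible by 2^J, and the sign convention for the details (first minus second sample) is one of two common choices.

```python
import math

def haar_step(signal):
    """One level of the orthonormal Haar transform (even-length input)."""
    s = math.sqrt(2.0)
    trend = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return trend, detail

def haar_decompose(signal, levels):
    """Return the level-J trend and the details of levels 1..J."""
    trend, details = list(signal), []
    for _ in range(levels):
        trend, detail = haar_step(trend)
        details.append(detail)
    return trend, details

def haar_reconstruct(trend, details):
    """Invert haar_decompose, going from the coarsest level back to the signal."""
    s = math.sqrt(2.0)
    current = list(trend)
    for detail in reversed(details):
        finer = []
        for a, d in zip(current, detail):
            finer.extend([(a + d) / s, (a - d) / s])
        current = finer
    return current

# Hypothetical toy spectrum with 8 points, decomposed at level J = 3
toy_spectrum = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
trend, details = haar_decompose(toy_spectrum, levels=3)
rebuilt = haar_reconstruct(trend, details)
print(len(trend), [len(d) for d in details])
```

The trend plays the role of the α coefficients and each entry of `details` holds the β coefficients of one level; together they contain as many numbers as the original signal and allow exact reconstruction.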

Wavelet preprocessing and normalization

Example of a hierarchical decomposition for a metabolomic spectrum

[Figure: successive decomposition of a metabolomic spectrum, ending with details 1 to 8.]

Wavelet preprocessing and normalization

Several strategies

Several wavelet bases:

Haar wavelets (easily interpretable because they are close to discrete derivatives);

D4 Daubechies wavelets (smoother representation but not directly interpretable).

Several preprocessings:

Use all wavelet coefficients as input data;

Use thresholded wavelet coefficients as input data (i.e., delete the smallest coefficients with an automatic method called "soft thresholding");

Use only the detail coefficients (and the detail coefficients of the shifted spectra) as input data.
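The thresholding option can be sketched in a few lines of Python. In its usual definition, soft thresholding sets every coefficient whose absolute value is at most the threshold to zero and shrinks the remaining ones towards zero by the threshold. The slide does not say how the threshold is chosen automatically, so the value used here is an arbitrary assumption, as are the function and variable names.

```python
def soft_threshold(coefficients, threshold):
    """Zero out small coefficients and shrink the others by `threshold`."""
    result = []
    for c in coefficients:
        magnitude = abs(c) - threshold
        if magnitude <= 0:
            result.append(0.0)  # coefficient too small: deleted
        else:
            result.append(magnitude if c > 0 else -magnitude)  # shrunk, sign kept
    return result

# Hypothetical detail coefficients and an arbitrary threshold of 1.0
example = soft_threshold([5.0, -0.3, 0.8, -2.5, 0.1], threshold=1.0)
print(example)
```

Only the two large coefficients survive, which is what makes the thresholded representation sparser than the full one.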

Wavelet preprocessing and normalization

Scaling of wavelet coefficients (e.g., Haar detail coefficients)

[Figure: Haar detail coefficients (D.1, D.57, D.125, D.297, D.370, D.443, D2.41, D2.120, D2.304, D2.389, D2.474) before scaling (axis from -40 to 40) and after scaling (axis from -15 to 15).]

Wavelet preprocessing and normalization

Normalization issue

[Figure: PCA of the wavelet coefficients, six scatter plots (PC1 vs. PC2, PC1 vs. PC3, PC1 vs. PC4, PC2 vs. PC3, PC2 vs. PC4, PC3 vs. PC4), with a legend by day of measurement (Day 0, 1, 4, 8, 11, 15, 18, 21).]

PCA of the coefficients: the day of measurement for the control group stands out on axes 2 and 4.

Wavelet preprocessing and normalization

Normalization

Find the median and variance of the coefficients for each day of measurement, based on the control group.

Use these values to normalize all the observations (according to their day of measurement).

[Figure: wavelet coefficients D2.444, D.78, D.332 and D2.289 plotted against the day of measurement, before and after normalization.]
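The two normalization steps can be sketched as follows for a single wavelet coefficient. The slide names the statistics (median and variance of the control group, per day) but not the exact formula, so centring by the median and dividing by the square root of the variance is an assumption here; the function names and toy values are hypothetical.

```python
import statistics

def control_stats_by_day(values, days, is_control):
    """Median and sample variance of the control observations for each day."""
    stats = {}
    for day in set(days):
        ctrl = [v for v, d, c in zip(values, days, is_control) if d == day and c]
        # statistics.variance needs at least two control observations per day
        stats[day] = (statistics.median(ctrl), statistics.variance(ctrl))
    return stats

def normalize(values, days, stats):
    """Centre by the control median and scale by the control standard deviation."""
    return [(v - stats[d][0]) / stats[d][1] ** 0.5 for v, d in zip(values, days)]

# Toy data: one coefficient, two days, three control mice and one treated mouse per day
values = [1.0, 3.0, 5.0, 9.0, 10.0, 12.0, 14.0, 20.0]
days = [0, 0, 0, 0, 1, 1, 1, 1]
is_control = [True, True, True, False, True, True, True, False]

stats = control_stats_by_day(values, days, is_control)
normalized = normalize(values, days, stats)
print(normalized)
```

After this step the control group has the same location and spread on every day, so the remaining differences between days are those of the treated mice.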

Wavelet preprocessing and normalization

PCA after normalization

[Figure: PCA of the normalized coefficients, six scatter plots (PC1 vs. PC2, PC1 vs. PC3, PC1 vs. PC4, PC2 vs. PC3, PC2 vs. PC4, PC3 vs. PC4), with a legend by day of measurement (Day 0, 1, 4, 8, 11, 15, 18, 21).]

Learning methods

Motivations

Purpose: validate the impact of HR ingestion on metabolism by predicting the total HR dose ingested from the spectra. If the prediction is accurate, the impact is not an artefact of the data and the biological dependency is validated.

Compared methods:

random forest (R package randomForest)

ridge regression (R package glmnet)

LASSO (R package glmnet)

Elasticnet (R package glmnet)

Partial Least Squares (PLS) (R package mixOmics)

sparse PLS (R package mixOmics)

Learning methods

Methodology

Split the data into train and test sets that are balanced according to the groups;

Preprocess (or not) the data with wavelets, then scale and normalize them;

Train each of the 6 methods (for each of the 7 kinds of preprocessing) on the train set, with a cross-validation strategy to tune the parameters;

Calculate the mean squared error on the test set.

Repeat the previous scheme 250 times.
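The evaluation loop above can be sketched in Python as follows. This is a skeleton under stated assumptions, not the original R workflow: the test fraction (25%) is arbitrary because the slide does not give it, the toy groups and doses are invented, and a placeholder learner that predicts the mean training dose stands in for the six tuned methods.

```python
import random
from collections import defaultdict

def stratified_split(groups, test_fraction, rng):
    """Split indices so that each group contributes the same share to the test set."""
    by_group = defaultdict(list)
    for index, g in enumerate(groups):
        by_group[g].append(index)
    train, test = [], []
    for members in by_group.values():
        members = members[:]
        rng.shuffle(members)
        n_test = max(1, round(len(members) * test_fraction))
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return sorted(train), sorted(test)

def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

rng = random.Random(0)
groups = ["A"] * 8 + ["B"] * 8 + ["C"] * 8   # hypothetical group labels
doses = [0.0] * 8 + [3.0] * 8 + [9.0] * 8    # hypothetical response values

errors = []
for _ in range(250):                          # repeat the scheme 250 times
    train, test = stratified_split(groups, test_fraction=0.25, rng=rng)
    # Placeholder learner: a real run would preprocess the spectra and fit a
    # tuned model on the train set here
    mean_dose = sum(doses[i] for i in train) / len(train)
    y_test = [doses[i] for i in test]
    errors.append(mean_squared_error(y_test, [mean_dose] * len(test)))

print(sum(errors) / len(errors))
```

Repeating the split many times gives a distribution of test errors per method and preprocessing, rather than a single number that depends on one lucky or unlucky split.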

Learning methods

Mean performance on the test sets

Methods    Original       Daubechies     Daubechies     Daubechies       Haar           Haar           Haar
                          - Details      - Full         - Threshold      - Details      - Full         - Threshold
ELN 0.5    16.29 (1.03)   15.38 (0.9)    14.33 (1.07)   42.94 (52.25)    15.39 (1.04)   14.49 (1.03)   30.98 (16.43)
ELN 0.25   16.12 (1.03)   15.28 (0.9)    14.35 (0.94)   44.62 (61.3)     15.2 (1)       14.47 (0.98)   32.54 (17.31)
ELN 0.1    15.81 (0.98)   15.14 (0.77)   14.38 (0.84)   42.58 (53.83)    15.15 (0.87)   14.58 (0.92)   35.41 (19.43)
ELN 0.75   16.31 (1.1)    15.48 (0.9)    14.43 (1.1)    42.62 (51.59)    15.44 (1.06)   14.5 (1.01)    30.31 (15.92)
Lasso      16.37 (1.27)   15.56 (1.01)   14.45 (1.14)   41.82 (50.86)    15.56 (1.1)    14.49 (1.01)   30.8 (17.01)
Ridge      16.82 (0.83)   16.22 (0.67)   15.56 (0.74)   41.75 (25.09)    16.16 (0.7)    15.66 (0.8)    37.58 (16.07)
PLS        16.83 (1.1)    16.25 (0.79)   15.61 (0.87)   81.56 (116.21)   16.09 (0.87)   15.87 (0.91)   42.6 (25.14)
RF         16.69 (0.91)   16.33 (1.36)   16.2 (1.16)    18.91 (1.66)     16.24 (1.06)   16.11 (1.09)   18.8 (1.32)
SPLS 5     19.71 (1.63)   19.25 (1.25)   16.55 (1.18)   36.54 (31.88)    19.1 (1.63)    17.24 (1.4)    34.25 (24.99)
SPLS 10    19.25 (1.65)   19.22 (1.23)   16.74 (1.15)   79.35 (110.56)   18.66 (1.36)   17.14 (1.25)   42.46 (23.76)
SPLS 20    18.41 (1.5)    18.81 (1.18)   17.55 (1.2)    76.05 (104.74)   18.55 (1.2)    17.11 (1.13)   42.38 (23.74)

Learning methods

Boxplot for the full Daubechies representation

[Figure: "Daubechies wavelets - Full", one boxplot per method (Lasso, Ridge, ELN 0.1, ELN 0.25, ELN 0.5, ELN 0.75, PLS, SPLS 5, SPLS 10, SPLS 20, RF), vertical axis from 12 to 20.]

Learning methods

Full Daubechies representation and ELN: accuracy (on test)

[Figure: predicted values plotted against true values, both axes from 0 to 150.]

Mean R² on test sets is equal to 89.0% (minimum is 83.1% and maximum is 92.8%).
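For reference, R² compares the squared prediction errors with the total variability of the true values. The Python sketch below uses the common definition R² = 1 − SS_res / SS_tot; the slide does not specify which variant was computed, so this choice is an assumption, and the toy doses are invented.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical true and predicted total doses on one test set
true_doses = [0.0, 50.0, 100.0, 150.0]
predicted = [10.0, 40.0, 110.0, 140.0]
score = r_squared(true_doses, predicted)
print(round(100 * score, 1), "%")
```

A value close to 100% means the points of the predicted-versus-true plot lie close to the diagonal.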

Identification of relevant metabolites

Identification issue

The full learning process is the following:

Spectra → Wavelet preprocessing → Learning → HR dose prediction

Hence, due to the preprocessing step, the coefficients selected by ELN are not directly related to metabolites (or to locations on the spectra).

Identification of relevant metabolites

Adaptation of the importance measure

for each of the 950 variables, v, of the original data set do
    Randomize the observations of the variable v
    Compute the full Daubechies wavelet representation with the randomized observations for v
    Scale and normalize according to the true values of the mean, median or variance
    for each test set, i do
        Calculate new predictions with the false values of v and the corresponding mse: mse_{v,i}
        Calculate the decrease in accuracy for the test set: DA_i = 1 − mse_i / mse_{v,i}
    end for
    Average DA_i over i to obtain the importance of v
end for
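The algorithm above is a permutation importance computed on the original variables rather than on the wavelet coefficients. A minimal Python sketch of its structure follows. It is an illustration under assumptions: `predict` stands for an already fitted model, `preprocess` stands for the wavelet transform plus scaling and normalization (here the identity), and the toy data, with only two variables, are invented.

```python
import random

def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def importance(X, y, test_sets, predict, preprocess, rng):
    """Average over test sets of DA_i = 1 - mse_i / mse_{v,i}, for each variable v."""
    scores = []
    for v in range(len(X[0])):
        # Randomize the observations of variable v
        column = [row[v] for row in X]
        rng.shuffle(column)
        X_perm = [row[:v] + [c] + row[v + 1:] for row, c in zip(X, column)]
        decreases = []
        for test in test_sets:
            y_test = [y[i] for i in test]
            mse_i = mse(y_test, [predict(preprocess(X[i])) for i in test])
            mse_vi = mse(y_test, [predict(preprocess(X_perm[i])) for i in test])
            decreases.append(1.0 - mse_i / mse_vi if mse_vi > 0 else 0.0)
        scores.append(sum(decreases) / len(decreases))
    return scores

# Toy data: the response depends on variable 0 only
X = [[float(i), float((i * 3) % 5)] for i in range(8)]
noise = [0.1, -0.1, 0.05, -0.05, 0.1, -0.1, 0.05, -0.05]
y = [2.0 * row[0] + n for row, n in zip(X, noise)]

scores = importance(
    X, y,
    test_sets=[[0, 1, 2, 3], [4, 5, 6, 7]],
    predict=lambda features: 2.0 * features[0],  # stand-in for a fitted model
    preprocess=lambda row: row,                  # stand-in for wavelets + normalization
    rng=random.Random(0),
)
print(scores)
```

Because the permutation is applied before `preprocess`, the score is attached to a position on the spectrum (in ppm) even though the model itself only sees wavelet coefficients.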

Identification of relevant metabolites

Values of importance

[Figure: importance of the variables (vertical axis from 0.0 to 0.8) plotted against their rank (horizontal axis ticks from 0 to 600).]

Identification of relevant metabolites

Identification of important metabolites

[Figure: a spectrum plotted against ppm (horizontal axis from 2 to 10, vertical axis from 0 to 20), with the important regions highlighted in colour.]

Some have already been identified: the most important is scyllo-inositol; one of the orange ones is probably valine; one of the light yellow ones is probably trimethylamine. The others are new.

Identification of relevant metabolites

What next?

Identification of the metabolites, and study of the correlation between the ones found and the ones previously emphasized.

Questions? Suggestions?