Refined blood-borne miRNome of human diseases via PCA-based feature extraction
Y-h. Taguchi
Department of Physics, Chuo University
Yoshiki Murakami
Center for Genomic Medicine, Kyoto University
Caution:
The main results obtained in collaboration with Prof. Murakami are based on his own experiments (*), but they are related to a planned patent application. We therefore present our method applied to alternative public data here.
(*) To be submitted to Journal of Hepatology
1. The concept of PCA-based feature extraction
2. What is miRNA? (will be skipped)
3. Previous Work (Dry + Wet)
4. Proposed method + Results
5. Summary & Conclusion
1. The concept of PCA-based feature extraction
Why feature extraction?
・ Avoiding overfitting
・ Need for experimental validation: too many genes/proteins cannot be tested.
・ Several methods require fewer state variables than observations.
One of the problems: feature extraction itself rarely passes a cross-validation test.
[Figure: conventional vs. proposed workflow for samples in Group1, Group2 and Group3.
Conventional: feature extraction and model construction are carried out on the training set and validated on the test set; the results obtained from different training sets do not coincide (≠).
Proposed: feature extraction is carried out without knowledge about the classification/target variable; model construction uses the training set and validation uses the test set.]
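To make the conventional scheme concrete, the following is a minimal sketch in which a supervised selection is redone on every training split and the overlap of the selected sets is measured. The selection criterion (a Welch t statistic), the synthetic data, the three-fold split and all function names are assumptions for illustration only; they are not taken from the slides.

```python
import numpy as np

def t_statistic(X, y):
    """Welch t statistic per feature for two classes labelled 0 and 1."""
    a, b = X[y == 0], X[y == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return (a.mean(axis=0) - b.mean(axis=0)) / se

def top_k_by_t(X, y, k):
    """Indices of the k features with the largest |t| (uses class labels)."""
    order = np.argsort(-np.abs(t_statistic(X, y)))
    return set(order[:k].tolist())

def jaccard(s1, s2):
    """Overlap of two index sets: 1.0 means identical, 0.0 means disjoint."""
    return len(s1 & s2) / len(s1 | s2)

# Synthetic data (assumption): 60 samples x 500 features, weak signal in 20 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 500))
y = np.repeat([0, 1], 30)
X[y == 1, :20] += 0.5

# Conventional scheme: the selection is repeated on each training split.
idx = rng.permutation(60)
folds = np.array_split(idx, 3)
selected = []
for held_out in folds:
    train = np.setdiff1d(idx, held_out)
    selected.append(top_k_by_t(X[train], y[train], k=10))

overlaps = [jaccard(selected[i], selected[j])
            for i in range(3) for j in range(i + 1, 3)]
print(overlaps)
```

An overlap well below 1 between splits would correspond to the "≠" in the conventional workflow; a label-free selection on all samples has no such dependence on the split by construction.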
2. What is miRNA?
miRNA is a kind of non-coding RNA. miRNAs are believed to suppress target gene expression by degradation of mRNAs. Important features:
・ Typically, hundreds of kinds of miRNAs are found for each species (ca. ≧1000 for human).
・ Each miRNA targets hundreds of genes or more.
・ miRNA mainly contributes to cell type change (e.g., cancer, differentiation, diseases).
・ The influence of miRNA on target gene expression is subtle (~10%) and context dependent.
・ In spite of that, miRNA critically contributes to the related processes (e.g., induction of cell cycle arrest).
3. Previous Work (Dry + Wet)
Toward the blood-borne miRNome of human diseases, A. Keller et al., Nature Methods (2011).
Discrimination between diseases using miRNA in blood
Feature (miRNA) selection: P-value (t-test)
Discrimination: SVC with several types of kernels + grid-based optimal parameter search
cf. Nature Methods, 10 miRNAs: <0.7
4. Proposed method + Results
Data
⇓
PCA
⇓
Feature selection (without classification information)
⇓
LDA
[Figure: PCA (samples: diseases/cancers). Legend: ◯ control, △ lung cancer. Groups: diseases, cancers.]
Feature extraction (miRNAs)
[Figure: PCA (miRNAs), with 10 outlier miRNAs marked.]
Why outliers?
⇓
They make the main contribution to the PCA embeddings of samples.
Why 10?
⇓
To compare with the results of the Nature Methods paper.
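A minimal sketch of this label-free selection step is given below, assuming that "outlier" means the miRNAs lying farthest from the origin in the plane of the first two PCs of the miRNA embedding. The number of PCs, the distance measure, the omission of any preprocessing or normalization, the function names and the synthetic data are all assumptions for illustration; the slides do not specify them.

```python
import numpy as np

def pca_scores(points, n_components):
    """Project the rows of `points` onto their leading principal components."""
    centered = points - points.mean(axis=0)
    # SVD of the centered matrix; u * s gives the PC scores of each row.
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

def select_outlier_features(X, n_select=10, n_components=2):
    """X has shape (samples, miRNAs). Returns the indices of the miRNAs
    farthest from the origin of the miRNA embedding. No class labels are used."""
    scores = pca_scores(X.T, n_components)   # one row per miRNA
    distance = np.linalg.norm(scores, axis=1)
    return np.argsort(distance)[::-1][:n_select]

# Synthetic data (assumption): 40 samples x 200 miRNAs, 10 planted miRNAs
# whose expression differs strongly between two halves of the samples.
rng = np.random.default_rng(1)
X = rng.normal(scale=0.1, size=(40, 200))
planted = rng.choice(200, size=10, replace=False)
X[:20, planted] += 3.0
X[20:, planted] -= 3.0

selected = select_outlier_features(X, n_select=10)
print(sorted(selected.tolist()))
print(sorted(planted.tolist()))
```

Because the function never sees class labels, it can be run once on all samples, before any division into training and test sets.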
[Figure: PCA, again (samples after feature extraction). Legend: ◯ control, △ lung cancer. Groups: diseases, cancers.]
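The PCA on samples restricted to the selected miRNAs, followed by the LDA step of the pipeline, could be sketched as follows. This is an illustration under stated assumptions, not the implementation behind the results: a two-class Fisher LDA with equal priors, PC scores computed on all samples without labels, leave-one-out evaluation, synthetic data, and hypothetical function names.

```python
import numpy as np

def pc_scores(X, n_components=5):
    """Scores of the samples (rows of X) on the leading principal components."""
    centered = X - X.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

def fit_lda(Z, y):
    """Two-class Fisher LDA. Returns the direction w and a decision threshold."""
    z0, z1 = Z[y == 0], Z[y == 1]
    m0, m1 = z0.mean(axis=0), z1.mean(axis=0)
    # Pooled within-class covariance matrix.
    sw = ((z0 - m0).T @ (z0 - m0) + (z1 - m1).T @ (z1 - m1)) / (len(Z) - 2)
    w = np.linalg.solve(sw, m1 - m0)
    threshold = w @ (m0 + m1) / 2.0          # equal priors assumed
    return w, threshold

def predict_lda(Z, w, threshold):
    """Class 1 if the projection onto w exceeds the threshold, else class 0."""
    return (Z @ w > threshold).astype(int)

def leave_one_out(Z, y):
    """Leave-one-out predictions (evaluation scheme is an assumption)."""
    preds = np.empty_like(y)
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        w, threshold = fit_lda(Z[keep], y[keep])
        preds[i] = predict_lda(Z[i:i + 1], w, threshold)[0]
    return preds

# Synthetic stand-in for the expression of 10 selected miRNAs in 60 samples.
rng = np.random.default_rng(2)
y = np.repeat([0, 1], 30)
X_sel = rng.normal(size=(60, 10))
X_sel[y == 1] += 2.0

Z = pc_scores(X_sel, n_components=5)        # up to the 5th PC
preds = leave_one_out(Z, y)
accuracy = float((preds == y).mean())
print(accuracy)
```

Only `fit_lda` uses the class labels; the PC scores are computed without them, which mirrors the separation between label-free feature extraction and supervised model construction.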
Control vs Lung Cancer
LDA with PCA (after feature extraction, up to the 5th PC)

Prediction \ Actual    control    lung cancer
control                56         8
lung cancer            14         24

Accuracy 0.784, Specificity 0.800, Sensitivity 0.750, Precision 0.632
cf. Nature Methods, 250 miRNAs: 0.813 0.844 0.781
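As a check on how the four figures follow from the confusion matrix, they can be recomputed directly from the counts, with rows read as predictions and columns as actual classes, and lung cancer treated as the positive class. The variable names are illustrative.

```python
# Counts from the confusion matrix: rows = prediction, columns = actual.
tn, fn = 56, 8     # predicted control: actual control / actual lung cancer
fp, tp = 14, 24    # predicted lung cancer: actual control / actual lung cancer

accuracy = (tp + tn) / (tp + tn + fp + fn)   # 80 / 102
specificity = tn / (tn + fp)                 # 56 / 70
sensitivity = tp / (tp + fn)                 # 24 / 32
precision = tp / (tp + fp)                   # 24 / 38

print(f"{accuracy:.3f} {specificity:.3f} {sensitivity:.3f} {precision:.3f}")
# prints "0.784 0.800 0.750 0.632"
```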
0.813 0.844 0.781 (250 miRNAs): relatively best
0.867 0.867 0.844 (150 miRNAs): relatively worst
(+)(-): comparison with the 10-miRNA results in Nature Methods
>0.70
Selected miRNAs: diseases/cancers vs normal
(+)/(-): up/downregulated after the transformation by PCA+LDA
(*): not selected independently of diseases/cancers
Advantages of proposed method
・ No need for classification information in feature selection
・ Feature selection is independent of the training/test set division (thus, stable)
5. Summary & Conclusion