
Page 1: Margin Based Sample Weighting for Stable Feature Selection

Margin Based Sample Weighting for Stable Feature Selection

Yue Han, Lei Yu
State University of New York at Binghamton

Page 2: Margin Based Sample Weighting for Stable Feature Selection

Outline
• Introduction

• Related Work

• Hypothesis-Margin Feature Space Transformation

• Margin Based Sample Weighting

• Experimental Study

• Conclusion and Future Work

Page 3: Margin Based Sample Weighting for Stable Feature Selection

Introduction

[Figure: high-dimensional data matrix with features (genes or proteins) as columns and samples as rows]

p: # of features, n: # of samples. High-dimensional data: p >> n.

Feature Selection:
• Alleviating the effect of the curse of dimensionality
• Enhancing generalization capability
• Speeding up the learning process
• Improving model interpretability

[Figure: feature selection pipeline — high-dimensional data → feature selection (filter or wrapper) → dimension-reduced data → learning model]

[Figure: example document-term matrix — documents D1, D2, …, DM as rows (samples), terms T1, T2, …, TN as columns (features), term counts as values, and class label C (Sports, Travel, Jobs)]

Page 4: Margin Based Sample Weighting for Stable Feature Selection

Cont’d

[Figure: two training sets D1 and D2 (samples × features) drawn from the same data D]

Given unlimited sample size of D: feature selection results from D1 and D2 are the same.
When the size of D is limited (n << p for high-dimensional data): feature selection results from D1 and D2 are different.
Increasing the number of samples could be very costly or impractical.

Stability of feature selection - the insensitivity of the result of a feature selection algorithm to variations in the training set.

Identifying characteristic markers to explain the observed phenomena

Page 5: Margin Based Sample Weighting for Stable Feature Selection

Related Work

• Bagging-based Ensemble Feature Selection (Saeys et al., ECML07)
Draw different bootstrapped samples of the same training set;
Apply a conventional feature selection algorithm to each;
Aggregate the feature selection results.
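As a rough illustration of this bagging idea, the sketch below scores features on each bootstrapped sample with a conventional scorer and aggregates the per-bootstrap rankings by average rank. The ANOVA F-score base scorer and rank-averaging aggregation are assumed choices for illustration, not necessarily those used by Saeys et al.

```python
import numpy as np
from sklearn.feature_selection import f_classif

def ensemble_feature_ranking(X, y, n_bootstraps=20, random_state=0):
    """Rank features by aggregating per-bootstrap rankings (average rank)."""
    rng = np.random.default_rng(random_state)
    n, p = X.shape
    rank_sum = np.zeros(p)
    for _ in range(n_bootstraps):
        idx = rng.integers(0, n, size=n)           # bootstrapped sample of the training set
        scores, _ = f_classif(X[idx], y[idx])      # conventional per-feature relevance scores
        ranks = np.argsort(np.argsort(-scores))    # rank 0 = most relevant feature in this bootstrap
        rank_sum += ranks
    return np.argsort(rank_sum)                    # features ordered by average rank (best first)
```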

• Group-based Stable Feature Selection (Yu et al., KDD08, KDD09)
Explore the intrinsic feature correlations;
Identify groups of correlated features;
Select relevant feature groups.
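A minimal sketch of the group-based idea, under stated assumptions: cluster correlated features, score each group by its most relevant member, and return whole groups. The hierarchical clustering, correlation threshold, and F-score relevance measure are illustrative choices; the actual algorithms of Yu et al. differ in detail.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.feature_selection import f_classif

def select_feature_groups(X, y, corr_threshold=0.8, n_groups=10):
    """Cluster correlated features, then pick the most relevant groups."""
    corr = np.corrcoef(X, rowvar=False)                          # feature-feature correlation matrix
    condensed = (1.0 - np.abs(corr))[np.triu_indices_from(corr, k=1)]
    Z = linkage(condensed, method="average")                     # hierarchical clustering of features
    labels = fcluster(Z, t=1.0 - corr_threshold, criterion="distance")
    scores, _ = f_classif(X, y)                                  # per-feature relevance
    group_score = {g: scores[labels == g].max() for g in np.unique(labels)}
    top = sorted(group_score, key=group_score.get, reverse=True)[:n_groups]
    return [np.flatnonzero(labels == g) for g in top]            # feature indices of each selected group
```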

Page 6: Margin Based Sample Weighting for Stable Feature Selection

Hypothesis-Margin Feature Space Transformation

A framework of margin based instance weighting for stable feature selection

• Introduce the concept of hypothesis-margin feature space;

• Propose the framework of margin based instance weighting for stable feature selection;

• Develop an efficient algorithm under the proposed framework.

Page 7: Margin Based Sample Weighting for Stable Feature Selection

Hypothesis-Margin Feature Space Transformation

The hypothesis margin (HM) of a sample X is defined by its nearest hit (closest sample of the same class) and nearest miss (closest sample of a different class). X’ captures the local profile of feature importance for all features at X. Multiple nearest neighbors can be used to compute the HM of a sample.

[Figure: a sample X with its nearest hit and nearest miss in the original feature space]
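A minimal sketch of the transformation, assuming the standard Relief-style hypothesis margin: for each sample X, the per-feature components of X’ are the distances to its nearest miss(es) minus the distances to its nearest hit(s). The distance metric (Manhattan here) and exact normalization are assumptions; the paper's formulation may differ.

```python
import numpy as np

def hm_transform(X, y, k=1):
    """Map each sample X to its HM profile X' of per-feature margins (assumed Relief-style)."""
    n, p = X.shape
    X_hm = np.zeros((n, p))
    for j in range(n):
        diff = np.abs(X - X[j])                    # per-feature distances from sample j to all samples
        dist = diff.sum(axis=1)                    # overall distance (Manhattan, an assumed metric)
        dist[j] = np.inf                           # exclude the sample itself
        same = (y == y[j])
        hits = np.argsort(np.where(same, dist, np.inf))[:k]     # k nearest hits (same class)
        misses = np.argsort(np.where(~same, dist, np.inf))[:k]  # k nearest misses (other classes)
        X_hm[j] = diff[misses].mean(axis=0) - diff[hits].mean(axis=0)
    return X_hm                                    # row j is the local feature-importance profile X'
```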

Page 8: Margin Based Sample Weighting for Stable Feature Selection

Cont’d

Hypothesis-margin based feature space transformation: (a) original feature space, and (b) hypothesis-margin (HM) feature space.

Page 9: Margin Based Sample Weighting for Stable Feature Selection

Margin Based Sample Weighting
• Discrepancy among samples w.r.t. their local profiles of feature importance (HM feature space).

• Measure the average distance of X’ to all other samples in the HM feature space; a greater average distance indicates a higher outlying degree.

• Overall time complexity is O(n²q), where n is the number of samples and q is the dimensionality of D.
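A minimal sketch of this weighting step under stated assumptions: compute pairwise distances among the HM profiles X’ (the O(n²q) step above), take each sample's average distance to all others as its outlying degree, and map that to a weight. The slide only states that a greater average distance means a more outlying sample; the inverse mapping and normalization below are assumptions.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def margin_based_weights(X_hm):
    """Weight samples by how outlying their HM profiles are (more outlying -> smaller weight, assumed)."""
    D = squareform(pdist(X_hm))                   # pairwise distances between HM profiles, O(n^2 q)
    avg_dist = D.sum(axis=1) / (len(X_hm) - 1)    # average distance of X' to all other samples
    weights = 1.0 / (1.0 + avg_dist)              # assumed mapping from outlying degree to weight
    return weights / weights.sum()                # normalized sample weights
```

These weights could then be passed to a weight-aware base selector, e.g. via the sample_weight hook in the RFE sketch later in the deck.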

Page 10: Margin Based Sample Weighting for Stable Feature Selection

Experimental Study

Stability of a feature selection algorithm is measured as the average of the pair-wise similarity of the feature selection results produced by the same algorithm from different training sets.

Stability metrics are considered at three levels: feature ranking, feature subset selection, and feature correlation.
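For the subset-selection level, one common way to instantiate this definition is the average pairwise Jaccard similarity between the selected feature subsets; the sketch below assumes that choice, though the slide does not fix a particular similarity metric.

```python
from itertools import combinations

def subset_stability(selected_sets):
    """Average pairwise Jaccard similarity of selected feature subsets, one set per training set."""
    sims = [len(a & b) / len(a | b) for a, b in combinations(selected_sets, 2)]
    return sum(sims) / len(sims)

# Example: three runs on different training sets, each selecting four features.
print(subset_stability([{0, 1, 2, 3}, {0, 1, 2, 5}, {0, 1, 4, 5}]))
```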

Page 11: Margin Based Sample Weighting for Stable Feature Selection

Cont’d

Experimental Setup
• SVM-RFE: 10 percent of the remaining features eliminated at each iteration (sketched below).
• En-RFE: 20 bootstrapped training sets to construct the ensemble.
• IW-RFE: k = 10 for the hypothesis-margin transformation.
• 10 times shuffling and 10-fold cross-validation to generate 100 datasets.
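A rough sketch of the SVM-RFE baseline as described: a linear SVM ranks the remaining features by weight magnitude, and 10 percent of them are eliminated each iteration. The target subset size, the use of LinearSVC, and the sample_weight hook (one way an instance-weighted IW-RFE variant could plug in the margin-based weights) are illustrative assumptions, not details from the slide.

```python
import numpy as np
from sklearn.svm import LinearSVC

def svm_rfe(X, y, n_select=50, frac=0.10, sample_weight=None):
    """Recursively eliminate 10% of the remaining features per iteration using a linear SVM."""
    remaining = np.arange(X.shape[1])
    while len(remaining) > n_select:
        clf = LinearSVC().fit(X[:, remaining], y, sample_weight=sample_weight)
        importance = np.abs(clf.coef_).sum(axis=0)               # per-feature weight magnitude
        n_drop = max(1, int(frac * len(remaining)))               # 10% of the *remaining* features
        n_drop = min(n_drop, len(remaining) - n_select)           # do not overshoot the target size
        remaining = remaining[np.argsort(importance)[n_drop:]]    # drop the lowest-ranked features
    return remaining                                              # indices of the selected features
```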

Page 12: Margin Based Sample Weighting for Stable Feature Selection

Consistent improvement in the stability of feature selection results across the different stability measures.

Page 13: Margin Based Sample Weighting for Stable Feature Selection

Different feature selection algorithms can lead to similarly good classification results.

Page 14: Margin Based Sample Weighting for Stable Feature Selection

Conclusion and Future Work
• Introduced the concept of hypothesis-margin feature space
• Proposed the framework of margin based sample weighting for stable feature selection
• Developed an efficient algorithm under the framework

Future work:
• Investigate alternative methods of sample weighting based on the HM feature space
• Strategies to combine margin based sample weighting with group-based stable feature selection

Page 15: Margin Based Sample Weighting for Stable Feature Selection

Questions?

Thank you!