Transcript
  • Slide 1
  • Slide 2
  • [email protected] Winter 2014
  • Slide 3
  • Presentation Outline: Feature Selection; Categorizing and Describing Various Algorithms for Feature Selection; A Short View of Dimension Reduction; My Paper
  • Slide 4
  • Slide 5
  • Dimension (Feature or Variable)
  • Slide 6
  • Dimension (Feature or Variable) Two features of a person: weight, height
  • Slide 7
  • The curse of dimensionality: observe that the data become more and more sparse in higher dimensions: (a) 12 samples fall inside the unit-sized box, (b) 7 samples fall inside the box, (c) 2 samples fall inside the box. An effective solution to the curse of dimensionality is dimensionality reduction.
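The sparsity effect described above can be checked with a small Monte Carlo sketch (the box side and sample count below are illustrative assumptions, not the slide's figures): the fraction of uniform samples that falls inside a fixed sub-box shrinks exponentially with the dimension.

```python
import random

def fraction_inside_subcube(dim, n_samples=10000, side=0.5, seed=0):
    """Estimate the fraction of uniform samples in [0, 1]^dim
    that fall inside a sub-cube with the given side length."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_samples):
        point = [rng.random() for _ in range(dim)]
        if all(x < side for x in point):
            inside += 1
    return inside / n_samples

# The expected fraction is side**dim, so it collapses as dim grows:
for d in (1, 2, 10):
    print(d, fraction_inside_subcube(d))
```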
  • Slide 8
  • Dimension Reduction General objectives of dimensionality reduction: I. Improve the quality of data for efficient data-intensive processing tasks. II. Reduce the computational cost and avoid over-fitting the data.
  • Slide 9
  • Dimension Reduction Dimensionality reduction approaches include: Feature Selection and Feature Extraction
  • Slide 10
  • Dimension Reduction Feature Extraction: create new features based on transformations or combinations of the original feature set. N: number of original features; M: number of extracted features; M < N
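As a minimal sketch of feature extraction, the mapping from N original features to M < N new ones can be written as a linear transformation; the weight matrix below is a made-up example, not one from the talk.

```python
def extract_features(x, weights):
    """Map an N-dimensional sample to M new features (M < N),
    each a linear combination of the original features."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

# Hypothetical example: combine 3 raw features into 2 extracted ones.
W = [[0.5, 0.5, 0.0],   # average of features 0 and 1
     [0.0, 0.0, 1.0]]   # feature 2 passed through unchanged
print(extract_features([2.0, 4.0, 7.0], W))  # [3.0, 7.0]
```

Methods such as PCA choose the weight matrix from the data rather than by hand, but the shape of the transformation is the same.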
  • Focus Feature selection method: seeks consistency with the least number of features; searches the tree of subsets breadth-first (BFS)
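A FOCUS-style search can be sketched as a breadth-first walk over subset sizes, returning the first subset that is consistent with the data; the consistency test and the toy data below are illustrative assumptions.

```python
from itertools import combinations

def is_consistent(subset, samples):
    """A subset is consistent if no two samples agree on all
    selected features yet disagree on the class label."""
    seen = {}
    for features, label in samples:
        key = tuple(features[i] for i in subset)
        if key in seen and seen[key] != label:
            return False
        seen[key] = label
    return True

def focus(samples, n_features):
    """Breadth-first search: try all subsets of size 1, then 2, ...
    and return the first consistent one, i.e. a minimal subset."""
    for size in range(1, n_features + 1):
        for subset in combinations(range(n_features), size):
            if is_consistent(subset, samples):
                return subset
    return tuple(range(n_features))

# Toy data: (feature vector, label); feature 1 alone determines the label.
data = [([0, 0, 1], 'a'), ([1, 0, 0], 'a'), ([0, 1, 1], 'b'), ([1, 1, 0], 'b')]
print(focus(data, 3))  # (1,)
```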
  • Slide 35
  • LVF (Las Vegas Filter) Feature selection method: searches for a minimal subset of features. N: number of features (attributes); M: number of samples (examples). Evaluation criterion: inconsistency. t_max: predetermined number of iterations
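A minimal sketch of the LVF idea, assuming the usual formulation: random subsets no larger than the current best are drawn for t_max iterations, and a subset is kept when its inconsistency count stays within a threshold.

```python
import random

def inconsistency_count(subset, samples):
    """Count the samples that would be mislabeled if every sample
    with the same projected feature values received the majority
    label of its group."""
    groups = {}
    for features, label in samples:
        key = tuple(features[i] for i in subset)
        groups.setdefault(key, []).append(label)
    count = 0
    for labels in groups.values():
        majority = max(set(labels), key=labels.count)
        count += sum(1 for l in labels if l != majority)
    return count

def lvf(samples, n_features, t_max=200, threshold=0, seed=0):
    """Las Vegas Filter sketch: randomly sample subsets t_max times
    and keep the smallest one within the inconsistency threshold."""
    rng = random.Random(seed)
    best = list(range(n_features))
    for _ in range(t_max):
        size = rng.randint(1, len(best))  # never larger than current best
        subset = sorted(rng.sample(range(n_features), size))
        if inconsistency_count(subset, samples) <= threshold:
            best = subset
    return tuple(best)

data = [([0, 0, 1], 'a'), ([1, 0, 0], 'a'), ([0, 1, 1], 'b'), ([1, 1, 0], 'b')]
print(lvf(data, 3))
```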
  • Slide 36
  • SFS (Sequential Forward Selection), SBS (Sequential Backward Selection) Feature selection methods: suffer from the nesting effect; remedies include plus-l-take-away-r, SFFS (Sequential Forward Floating Search), and SBFS (Sequential Backward Floating Search)
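Sequential Forward Selection can be sketched as a greedy loop. The nesting effect is visible in the code: once a feature is added it is never removed, which is exactly what the floating variants (SFFS/SBFS) relax by allowing backward steps. The scoring function is a hypothetical stand-in.

```python
def sfs(n_features, score, target_size):
    """Sequential Forward Selection: greedily add, one at a time,
    the feature that most improves the subset's score.
    Note the nesting effect: a selected feature is never dropped."""
    selected = []
    while len(selected) < target_size:
        remaining = [f for f in range(n_features) if f not in selected]
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
    return selected

# Hypothetical scoring function: reward features 1 and 3, penalize size.
def score(subset):
    return sum(1 for f in subset if f in (1, 3)) - 0.01 * len(subset)

print(sfs(5, score, 2))  # [1, 3]
```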
  • Slide 37
  • GA (Genetic Algorithm): crossover, mutation. SA (Simulated Annealing). RMHC-PF1 (Random Mutation Hill Climbing - Prototype and Feature selection): finds sets of prototypes for nearest-neighbor classification; it is a Monte Carlo method and can be converted to a Las Vegas algorithm by running it many times.
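Random mutation hill climbing over a feature bit-mask can be sketched as follows; the scoring function is a hypothetical stand-in, and RMHC-PF1 itself also selects prototypes, which this sketch omits.

```python
import random

def rmhc(n_features, score, iterations=100, seed=0):
    """Random Mutation Hill Climbing over a feature bit-mask:
    flip one random bit per step and keep the change only if
    the score does not get worse."""
    rng = random.Random(seed)
    mask = [rng.randint(0, 1) for _ in range(n_features)]
    best_score = score(mask)
    for _ in range(iterations):
        i = rng.randrange(n_features)
        mask[i] ^= 1                 # mutate: toggle one feature in/out
        new_score = score(mask)
        if new_score >= best_score:
            best_score = new_score   # keep the mutation
        else:
            mask[i] ^= 1             # revert the mutation
    return mask, best_score

# Hypothetical score: features 0 and 2 are useful, the rest are noise.
def score(mask):
    return mask[0] + mask[2] - 0.1 * sum(mask)

print(rmhc(5, score))
```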
  • Slide 38
  • Slide 39
  • Three models are commonly used in feature selection: the filter model → does not consider interrelationships between the features; the wrapper model → high complexity; embedded methods. Two shortcomings, feature redundancy and failure to select the appropriate number of features, motivate defining the problem as a game.
  • Slide 40
  • Defining the problem as a one-player game: the problem is defined as a Markov Decision Process, and the environment is explored with reinforcement-learning methods. This feature selection method considers the interrelationship between the features (the Upper Confidence Graph method).
  • Slide 41
  • The main algorithms: dynamic programming, the Monte Carlo method, and temporal-difference learning
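As a one-line illustration of the last item, a TD(0) update moves the estimated value of a state toward the one-step target r + gamma * V(s'). This is the generic textbook update, not code from the talk.

```python
def td0_update(value, state, reward, next_state, alpha=0.1, gamma=0.9):
    """One temporal-difference (TD(0)) update of a state-value table:
    V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s))."""
    v = value.get(state, 0.0)
    target = reward + gamma * value.get(next_state, 0.0)
    value[state] = v + alpha * (target - v)
    return value[state]

V = {}
td0_update(V, 's0', 1.0, 's1')  # V['s0'] becomes 0.1 * 1.0 = 0.1
print(V)
```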
  • Slide 42
  • The best possible policy in the situation; the rewards that have already been achieved; the whole set of features; a subset of features; each allowed action
  • Slide 43
  • Average score collected by this feature; the number of times that this feature is selected
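These two statistics are what a UCB-style selection rule combines: the feature's average collected score plus an exploration bonus that shrinks with the number of times the feature has been selected. The UCB1 form below is an assumed standard formulation, not taken from the talk.

```python
import math

def ucb_score(total_reward, times_selected, total_selections, c=math.sqrt(2)):
    """UCB1-style score for a feature: its average collected reward
    plus an exploration bonus that shrinks as the feature is tried
    more often (assumed form; the slide only names the two statistics)."""
    if times_selected == 0:
        return float('inf')          # always try untested features first
    average = total_reward / times_selected
    bonus = c * math.sqrt(math.log(total_selections) / times_selected)
    return average + bonus

# A frequently tried feature needs a higher average to beat a rarely tried one.
print(ucb_score(8.0, 10, 100))   # well explored: small bonus
print(ucb_score(1.0, 2, 100))    # barely explored: large bonus
```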
  • Slide 44
  • Benchmarks: Information Gain, CHI-squared statistic, Feature Assessment by Sliding Threshold (FAST); WEKA software
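Information Gain, the first benchmark listed, can be computed directly from entropies (a standard formulation; the toy data below are illustrative).

```python
import math

def entropy(labels):
    """Shannon entropy of a list of labels, in bits."""
    n = len(labels)
    counts = {}
    for l in labels:
        counts[l] = counts.get(l, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def information_gain(feature_values, labels):
    """Entropy of the labels minus the entropy remaining once the
    samples are split by this feature's value."""
    n = len(labels)
    splits = {}
    for v, l in zip(feature_values, labels):
        splits.setdefault(v, []).append(l)
    remainder = sum(len(ls) / n * entropy(ls) for ls in splits.values())
    return entropy(labels) - remainder

# A feature that perfectly separates the classes gains a full bit:
print(information_gain([0, 0, 1, 1], ['a', 'a', 'b', 'b']))  # 1.0
# An uninformative feature gains nothing:
print(information_gain([0, 1, 0, 1], ['a', 'a', 'b', 'b']))  # 0.0
```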
  • Slide 45
  • Slide 46
  • Slide 47
  • Any questions? May 2013. Thanks for your attention
