
Page 1: Feature Grouping-Based  Fuzzy-Rough Feature Selection

Feature Grouping-Based Fuzzy-Rough Feature Selection

Richard Jensen
Neil Mac Parthaláin

Chris Cornelis

Page 2: Feature Grouping-Based  Fuzzy-Rough Feature Selection

Outline

• Motivation/Feature Selection (FS)

• Rough set theory

• Fuzzy-rough feature selection

• Feature grouping

• Experimentation

Page 3: Feature Grouping-Based  Fuzzy-Rough Feature Selection

The problem: too much data

• The amount of data is growing exponentially
  – A staggering 4300% annual growth in global data

• Therefore, there is a need for FS and other data reduction methods
  – The curse of dimensionality is a problem for machine learning techniques

• The complexity of the problem is vast
  – e.g. the powerset of features for FS

Page 4: Feature Grouping-Based  Fuzzy-Rough Feature Selection

Feature selection

• Remove features that are:
  – Noisy
  – Irrelevant
  – Misleading

• Task: find a subset that
  – Optimises a measure of subset goodness
  – Has small/minimal cardinality

• In rough set theory, this is a search for reducts
  – Much research in this area

Page 5: Feature Grouping-Based  Fuzzy-Rough Feature Selection

Rough set theory (RST)

• For a subset of features P, a set X is approximated via the equivalence classes [x]_P

[Figure: the classic rough set picture; a set X is bounded from below by its lower approximation and from above by its upper approximation, both built from the equivalence classes [x]_P]
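The formulas behind this picture are the standard crisp approximations, stated here for completeness:

```latex
\underline{P}X = \{\, x \mid [x]_P \subseteq X \,\}
\qquad
\overline{P}X = \{\, x \mid [x]_P \cap X \neq \emptyset \,\}
```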

Page 6: Feature Grouping-Based  Fuzzy-Rough Feature Selection

Rough set feature selection

• By considering more features, concepts become easier to define…

Page 7: Feature Grouping-Based  Fuzzy-Rough Feature Selection

Rough set theory

• Problems:
  – Rough set methods (usually) require data discretisation beforehand
  – Extensions require thresholds, e.g. tolerance rough sets
  – Also, no flexibility in approximations
    • e.g. objects either belong fully to the lower (or upper) approximation, or not at all

Page 8: Feature Grouping-Based  Fuzzy-Rough Feature Selection

Fuzzy-rough sets

• Extends rough set theory
  – Uses fuzzy tolerance relations instead of crisp equivalence
  – Approximations are fuzzified
  – Collapses to traditional RST when the data is crisp

• New definitions:

Fuzzy upper approximation: $\mu_{R_P \uparrow X}(x) = \sup_{y \in U} \mathcal{T}(\mu_{R_P}(x, y), \mu_X(y))$

Fuzzy lower approximation: $\mu_{R_P \downarrow X}(x) = \inf_{y \in U} \mathcal{I}(\mu_{R_P}(x, y), \mu_X(y))$

where $\mathcal{T}$ is a t-norm and $\mathcal{I}$ a fuzzy implicator.
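To make these concrete, here is a small NumPy sketch under stated assumptions: R is an n×n fuzzy tolerance matrix, X is an n-vector of memberships, and the Łukasiewicz implicator with the min t-norm is used (one common choice; the slides do not fix the connectives). The function names are illustrative.

```python
import numpy as np

def fuzzy_lower(R, X):
    """mu_{R_P down X}(x) = inf_y I(R(x,y), X(y)),
    with the Lukasiewicz implicator I(a, b) = min(1, 1 - a + b)."""
    return np.min(np.minimum(1.0, 1.0 - R + X[None, :]), axis=1)

def fuzzy_upper(R, X):
    """mu_{R_P up X}(x) = sup_y T(R(x,y), X(y)),
    with the min t-norm T(a, b) = min(a, b)."""
    return np.max(np.minimum(R, X[None, :]), axis=1)

# Toy example: 3 objects, a fuzzy tolerance relation and a fuzzy concept X.
R = np.array([[1.0, 0.8, 0.1],
              [0.8, 1.0, 0.2],
              [0.1, 0.2, 1.0]])
X = np.array([1.0, 0.7, 0.0])
print(fuzzy_lower(R, X))  # degree to which each object certainly belongs to X
print(fuzzy_upper(R, X))  # degree to which each object possibly belongs to X
```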

Page 9: Feature Grouping-Based  Fuzzy-Rough Feature Selection

Fuzzy-rough feature selection

• Search for reducts
  – Minimal subsets of features that preserve the fuzzy lower approximations for all decision concepts

• Traditional approach
  – A greedy hill-climbing algorithm is used (a sketch follows below)
  – Other search techniques have also been applied (e.g. PSO)

• Problems
  – Complexity is problematic for large data (e.g. over several thousand features)
  – No explicit handling of redundancy
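As an illustration, a minimal Python sketch of such a greedy hill-climber, assuming a subset evaluation function M (e.g. the fuzzy-rough dependency degree, which reaches 1.0 for a reduct); the function name and stopping criterion are illustrative, not the exact published implementation.

```python
def greedy_hill_climb(features, M):
    """Greedy hill-climbing search for a reduct: repeatedly add the single
    feature that most improves the subset measure M, stopping when no
    addition helps or full dependency (1.0) is reached."""
    subset = set()
    best = M(subset)
    while best < 1.0:
        # evaluate every one-feature extension of the current subset
        scored = [(M(subset | {f}), f) for f in features if f not in subset]
        if not scored:
            break
        score, f = max(scored, key=lambda t: t[0])
        if score <= best:   # no single feature improves the measure: stop
            break
        subset.add(f)
        best = score
    return subset
```

Each pass costs one evaluation of M per remaining feature, which is what makes the approach expensive for thousands of features; FRFG aims to reduce the number of candidates considered per step.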

Page 10: Feature Grouping-Based  Fuzzy-Rough Feature Selection

Feature grouping

• Idea: we don't need to consider all features
  – Features that are highly correlated with each other carry the same or similar information
  – Therefore, we can group them and work on a group-by-group basis

• This paper: based on greedy hill-climbing
  – A group-then-rank approach

• Relevancy and redundancy are handled by
  – Correlation: similar features are grouped together
  – Internal ranking (correlation with the decision feature)

[Figure: an example feature group F1 containing the features f1, f2, f4, f7, f8, f9]

Page 11: Feature Grouping-Based  Fuzzy-Rough Feature Selection

Forming groups of features

[Diagram: the data is passed through a correlation measure with threshold τ to produce feature groups F1, F2, F3, …, Fn (grouping correlated features handles redundancy); each group is then internally ranked #1, #2, #3, …, #m (e.g. #1 f3, #2 f12, #3 f1) by correlation with the decision feature, which handles relevancy.]
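A sketch of this grouping step, assuming Pearson correlation as the correlation measure (the paper's measure may differ); one group is seeded per feature, so groups may overlap, as in the worked example later in the deck. The ordering of the groups by their best member is also an assumption.

```python
import numpy as np

def form_groups(X, y, tau):
    """Form internally ranked feature groups (a sketch).
    X: (n_samples, n_features) data; y: decision feature; tau: grouping threshold.
    Group j holds every feature whose |correlation| with feature j is >= tau;
    within a group, features are ranked by |correlation with y| (relevancy)."""
    n = X.shape[1]
    corr = np.abs(np.corrcoef(X, rowvar=False))   # feature-feature correlations (redundancy)
    rel = np.abs(np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(n)]))
    groups = []
    for j in range(n):
        members = [k for k in range(n) if corr[j, k] >= tau]
        groups.append(sorted(members, key=lambda k: -rel[k]))  # internal ranking
    groups.sort(key=lambda g: -rel[g[0]])  # order groups by their top-ranked member
    return groups
```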

Page 12: Feature Grouping-Based  Fuzzy-Rough Feature Selection

Selecting features

[Diagram: feature subset search and selection. A search mechanism and a subset evaluation interact iteratively, yielding the selected subset(s).]

Page 13: Feature Grouping-Based  Fuzzy-Rough Feature Selection

Fuzzy-rough feature grouping

Page 14: Feature Grouping-Based  Fuzzy-Rough Feature Selection

Initial experimentation

• Setup:
  – 10 datasets (9–2557 features)
  – 3 classifiers
  – Stratified 5 × 10-fold cross-validation

• Performance evaluated in terms of
  – Subset size
  – Classification accuracy
  – Execution time

• FRFG compared with
  – Traditional greedy hill-climber (GHC)
  – GA & PSO (200 generations, population size 40)

Page 15: Feature Grouping-Based  Fuzzy-Rough Feature Selection

Results: average subset size

Page 16: Feature Grouping-Based  Fuzzy-Rough Feature Selection

Results: classification accuracy

[Charts: classification accuracy for the JRip and IBk (k=3) classifiers]

Page 17: Feature Grouping-Based  Fuzzy-Rough Feature Selection

Results: execution times (s)

Page 18: Feature Grouping-Based  Fuzzy-Rough Feature Selection

Conclusion

FRFG
  – Motivation: reduce computational overhead; improve the handling of redundancy
  – A group-then-rank approach
  – The parameter τ determines the granularity of the grouping
  – Weka implementation available: http://bit.ly/1oic2xM

Future work
  – Automatic determination of the parameter τ
  – Experimentation with much larger data, other FS methods, etc.
  – Clustering of features
  – Unsupervised selection?

Page 19: Feature Grouping-Based  Fuzzy-Rough Feature Selection

Thank you!

Page 20: Feature Grouping-Based  Fuzzy-Rough Feature Selection

Simple example

• Dataset of six features

• After initialisation, the following groups are formed:

  F1 = {f1, f4, f3}
  F2 = {f2}
  F3 = {f1, f3}
  F4 = {f5, f4, f1}
  etc.

• Within each group, the internal ranking determines relevance: e.g. f4 is more relevant than f3

• Ordering of the groups for the greedy hill-climber:

  F = {F4, F1, F3, F5, F2, F6}

Page 21: Feature Grouping-Based  Fuzzy-Rough Feature Selection

Simple example...

• First group to be considered: F4 = {f5, f4, f1}
  – Feature f4 is preferable over the others (it is ranked first within F4)
  – So, add it to the current (initially empty) subset R
  – Evaluate M(R + {f4}):
    • If it has a better score than the current best evaluation, store f4
    • Current best evaluation = M(R + {f4})
  – The set of features that appear in F4, {f1, f4, f5}, is added to the set Avoids

• The next feature group with elements that do not appear in Avoids is F1 = {f1, f4, f3}

• And so on… (a code sketch of this loop follows below)
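The walk-through can be condensed into a short Python sketch; `groups` is the ordered list of internally ranked groups and `M` the subset measure, as before, and the `avoids` set mirrors the slide's Avoids. This illustrates the logic only, not the Weka implementation.

```python
def frfg_select(groups, M):
    """Group-then-rank greedy selection (a sketch of the walk-through).
    groups: ordered list of feature groups, each ranked best-feature-first.
    M: subset evaluation measure, e.g. the fuzzy-rough dependency degree."""
    R, avoids = set(), set()
    best = M(R)
    for group in groups:
        # take the best-ranked feature not yet seen in an earlier group
        candidate = next((f for f in group if f not in avoids), None)
        if candidate is None:
            continue            # all of this group's features already covered
        score = M(R | {candidate})
        if score > best:        # keep the feature only if it improves the measure
            R.add(candidate)
            best = score
        avoids.update(group)    # this group's features need not be revisited
        if best >= 1.0:         # full dependency reached: stop early
            break
    return R
```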