Modeling XCS in Class Imbalances: Population Sizing and Parameter Settings

Modeling XCS in Class Imbalances: Population Size and Parameter Settings

Albert Orriols-Puig 1,2, David E. Goldberg 2, Kumara Sastry 2, Ester Bernadó-Mansilla 1

1 Research Group in Intelligent Systems, Enginyeria i Arquitectura La Salle, Ramon Llull University

2 Illinois Genetic Algorithms Laboratory, Department of Industrial and Enterprise Systems Engineering, University of Illinois at Urbana Champaign

Upload: kknsastry

Post on 11-May-2015


DESCRIPTION

This paper analyzes the scalability of the population size required in XCS to maintain niches that are infrequently activated. Facetwise models have been developed to predict the effect of the imbalance ratio (the ratio between the number of instances of the majority class and the minority class that are sampled to XCS) on population initialization, and on the creation and deletion of classifiers of the minority class. While theoretical models show that, ideally, XCS scales linearly with the imbalance ratio, XCS with standard configuration scales exponentially. The causes that are potentially responsible for this deviation from the ideal scalability are also investigated. Specifically, the inheritance procedure of classifiers’ parameters, mutation, and subsumption are analyzed, and improvements in XCS’s mechanisms are proposed to effectively and efficiently handle imbalanced problems. Once the recommendations are incorporated into XCS, empirical results show that the population size in XCS indeed scales linearly with the imbalance ratio.

TRANSCRIPT

Page 1: Modeling XCS in class imbalances: Population sizing and parameter settings

Modeling XCS in Class Imbalances: Population Size

and Parameter Settings

Albert Orriols-Puig 1,2, David E. Goldberg 2, Kumara Sastry 2, Ester Bernadó-Mansilla 1

1 Research Group in Intelligent Systems, Enginyeria i Arquitectura La Salle, Ramon Llull University

2 Illinois Genetic Algorithms Laboratory, Department of Industrial and Enterprise Systems Engineering, University of Illinois at Urbana Champaign

Page 2: Modeling XCS in class imbalances: Population sizing and parameter settings

Illinois Genetic Algorithms Laboratory and Group of Research in Intelligent Systems, Slide 2, GECCO’07

Framework

[Diagram labels: Domain; Data; Learner; Model; Information based on experience; Knowledge extraction; New instance; Predicted Output]

Data consisting of:
– Examples
– Counter-examples

In real-world domains, typically:
– Higher cost to obtain examples of the concept to be learnt
– So, distribution of examples in the training dataset is usually imbalanced

Applications:
– Fraud detection
– Medical diagnosis of rare illnesses
– Detection of oil spills in satellite images

Page 3: Modeling XCS in class imbalances: Population sizing and parameter settings


Framework

Do learners suffer from class imbalances?

Learner: minimize the global error on the training set

$\text{error} = \frac{\text{num. errors}_{c_1} + \text{num. errors}_{c_2}}{\text{number of examples}}$

Biased towards the majority class

Maximization of the majority class accuracy, to the detriment of the minority class.

And what about incremental learning?
– Sampling instances of the minority class less frequently
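The bias of the global error can be illustrated with a small sketch. The class counts below are hypothetical, and the "learner" is a trivial rule that always predicts the majority class; neither comes from the slides.

```python
def global_error(errors_c1, errors_c2, n_examples):
    """Global error: total misclassifications over total examples."""
    return (errors_c1 + errors_c2) / n_examples

# Hypothetical training set: 990 majority-class and 10 minority-class examples.
n_majority, n_minority = 990, 10

# A trivial learner that always predicts the majority class makes no
# errors on the majority class and misses every minority example.
errors_majority = 0
errors_minority = n_minority

error = global_error(errors_majority, errors_minority, n_majority + n_minority)
minority_accuracy = 1 - errors_minority / n_minority

print(f"global error      = {error:.3f}")            # 0.010
print(f"minority accuracy = {minority_accuracy:.1f}")  # 0.0
```

The global error looks excellent even though no minority example is ever classified correctly, which is the bias the slide points at.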

Page 4: Modeling XCS in class imbalances: Population sizing and parameter settings


Aim

Facetwise analysis of XCS for class imbalances

How can XCS create rules of the minority class?

When will XCS remove these rules?

Population size bound with respect to the imbalance ratio

Up to which imbalance ratio would XCS be able to learn from the minority class?

Page 5: Modeling XCS in class imbalances: Population sizing and parameter settings


Outline

1. Description of XCS

2. Facetwise Analysis

3. Design of test Problems

4. XCS on the one-bit Problem

5. Analysis of Deviations

6. Results

7. Conclusions

Page 6: Modeling XCS in class imbalances: Population sizing and parameter settings

Description of XCS

[Diagram: XCS learning cycle. Labels: Environment; Problem instance; Population [P]; Match set generation; Match Set [M]; Prediction Array (c1 c2 … cn); Selected action; Random Action; Action Set [A]; Classifier Parameters Update; REWARD 1000/0; In single-step tasks; Genetic Algorithm (Selection, Reproduction, Mutation); Deletion]

Each classifier in [P], [M] and [A] is drawn as a row of parameters: C A P ε F num as ts exp. In the example, [P] holds classifiers 1 to 6, and [M] and [A] hold classifiers 1, 3, 5 and 6.

Minority class instance: Population [P], Match set generation, Match Set [M]: Starved niches

Majority class instance: Population [P], Match set generation, Match Set [M]: Nourished niches

Problem niche: the schema defines the relevant attributes for a particular problem niche. E.g.: 10**1*
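As a minimal sketch of match set generation, the snippet below matches ternary conditions against a binary instance. It uses `*` as the don't-care symbol, as in the schema example above; the function names and the small population are hypothetical, and each classifier is reduced to its condition string.

```python
def matches(condition, instance):
    """True if every specified (non-'*') position agrees with the instance."""
    return len(condition) == len(instance) and all(
        c == "*" or c == x for c, x in zip(condition, instance)
    )

def match_set(population, instance):
    """Return the conditions in the population that match the instance."""
    return [cl for cl in population if matches(cl, instance)]

# Hypothetical population of conditions (classifier parameters omitted).
population = ["10**1*", "0*****", "1*****", "******"]

print(match_set(population, "101110"))  # ['10**1*', '1*****', '******']
print(match_set(population, "000110"))  # ['0*****', '******']
```

A niche whose instances are rarely sampled rarely places its classifiers in [M], which is why such niches are called starved.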

Page 8: Modeling XCS in class imbalances: Population sizing and parameter settings

Facetwise Analysis

Study XCS capabilities to provide representatives of starved niches:
– Population covering
– Generation of correct representatives of starved niches
– Time of extinction of these correct classifiers

Derive a bound on the population size to guarantee that XCS will learn starved niches

Build on theory developed for XCS:
– (Butz, Kovacs, Lanzi & Wilson, 04): Model of generalization pressures of XCS
– (Butz, Goldberg & Lanzi, 04): Learning time bound
– (Butz, Goldberg, Lanzi & Sastry, 07): Population size bound to guarantee niche support
– (Butz, 2006): Rule-Based Evolutionary Online Learning Systems: A Principled Approach to LCS Analysis and Design

Page 9: Modeling XCS in class imbalances: Population sizing and parameter settings

Facetwise Analysis

Assumptions
– Problems consisting of n classes
– One class sampled with a lower frequency: minority class
– Probability of sampling an instance of the minority class:

$P_s(\min) = \frac{1}{1 + ir}$

where the imbalance ratio is

$ir = \frac{\text{num. instances of any class other than the minority class}}{\text{num. instances of the minority class}}$
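A small sketch of the sampling probability, assuming the relation Ps(min) = 1/(1 + ir) stated on this slide. The function name is hypothetical; the imbalance ratios are the ones that appear later in the one-bit experiments.

```python
def p_sample_minority(ir):
    """Probability of sampling a minority-class instance for imbalance ratio ir."""
    return 1.0 / (1.0 + ir)

for ir in (1, 64, 256, 1024):
    p = p_sample_minority(ir)
    # On average one minority instance is seen every (1 + ir) iterations.
    print(f"ir = {ir:5d}  Ps(min) = {p:.6f}  mean gap = {1 / p:.0f} iterations")
```

With ir = 1 the classes are balanced (Ps(min) = 0.5); with ir = 1024 a minority instance is expected only once every 1025 iterations.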

Page 10: Modeling XCS in class imbalances: Population sizing and parameter settings

Facetwise Analysis

– Population initialization

– Generation of correct representatives of starved niches

– Time of extinction of these correct classifiers

– Population size bound

Page 11: Modeling XCS in class imbalances: Population sizing and parameter settings

Population Initialization

Covering procedure
– Covering: Generalize over the input with probability P#
– P# needs to satisfy the covering challenge (Butz et al., 01)

Would I trigger covering on minority class instances?
– Probability that one instance is covered by at least one rule is (Butz et al., 01):

$P(\text{cover}) = 1 - \left[1 - \left(\frac{2 - \sigma[P]}{2}\right)^{l}\right]^{N}$

where l is the input length, N is the population size, and σ[P] is the population specificity (initially 1 – P#).
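The covering probability can be explored numerically. The sketch below assumes the usual form of the bound attributed to Butz et al., P(cover) = 1 - [1 - ((2 - σ[P])/2)^l]^N, with initial specificity σ[P] = 1 - P#; the function name and the parameter grid are illustrative choices, not values taken from the slides.

```python
def p_cover(l, N, specificity):
    """Probability that a random instance is matched by at least one of N
    random classifiers of the given specificity (assumed form of the bound)."""
    # Each position matches with probability (2 - specificity) / 2:
    # a don't-care always matches, a specified bit matches half the time.
    p_match_one = ((2.0 - specificity) / 2.0) ** l
    return 1.0 - (1.0 - p_match_one) ** N

l = 20  # input length
for p_hash in (0.4, 0.6, 0.8):
    specificity = 1.0 - p_hash  # initial specificity
    for N in (100, 1000):
        pc = p_cover(l, N, specificity)
        print(f"P# = {p_hash}  N = {N:4d}  P(cover) = {pc:.4f}  "
              f"P(covering triggered) = {1 - pc:.4f}")
```

Lower P# (more specific classifiers) or a smaller population makes it more likely that an instance is left unmatched and covering is triggered.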

Page 12: Modeling XCS in class imbalances: Population sizing and parameter settings

Population Initialization

[Figure: probability to apply covering on the first minority class instance, for l = 20]

Page 14: Modeling XCS in class imbalances: Population sizing and parameter settings

Creation of Representatives of Starved Niches

Assumptions
– Covering has not provided any representative of starved niches
– Simplified model: only consider mutation in our model.

How can we generate representatives of starved niches?
– In the population there are:
  • Representatives of nourished niches
  • Overgeneral classifiers
– Specifying correctly all the bits of the schema that represents the starved niche

Page 15: Modeling XCS in class imbalances: Population sizing and parameter settings

Creation of Representatives of Starved Niches

Summing up, time to get the first representative of a starved niche

Time to extinction

n: number of classes

μ: Mutation probability

km: Order of the schema

Page 17: Modeling XCS in class imbalances: Population sizing and parameter settings

Bounding the Population Size

Population size bound to guarantee that there will be representatives of starved niches

– Require that:

– Bound:


n: number of classes

μ: Mutation probability

km: Order of the schema

Page 18: Modeling XCS in class imbalances: Population sizing and parameter settings


Bounding the Population Size

Population size bound to guarantee that representatives of starved niches will receive a genetic opportunity:
– Consider θGA = 0

– We require that the best representative of a starved niche receive a genetic event before being removed

– Population size bound:


n: number of classes

ir: Imbalance ratio

Page 20: Modeling XCS in class imbalances: Population sizing and parameter settings

Design of Test Problems

One-bit problem
– Only two schemas of order one: 0***** and 1*****
– Example: 000110 : 0 (condition of length l; the class is the value of the left-most bit)

Parity problem
– The k bits of parity form a single building block
– Example: 01001010 : 1 (condition of length l with k relevant bits; the class is the number of 1s mod 2)

Undersampling instances of the class labeled as 1
– $P_s(\min) = \frac{1}{1 + ir}$
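The two test problems can be sketched as instance generators. This is only an illustration under stated assumptions: the relevant parity bits are taken to be the first k bits, and undersampling of class 1 is implemented by rejection sampling so that class 1 appears with probability 1/(1 + ir). All function names are hypothetical.

```python
import random

def one_bit_class(bits):
    """One-bit problem: the class is the value of the left-most bit."""
    return bits[0]

def parity_class(bits, k):
    """Parity problem: number of 1s among the k relevant bits, mod 2.
    Assumption: the relevant bits are the first k positions."""
    return sum(bits[:k]) % 2

def sample_instance(l, ir, classify, rng):
    """Draw a random l-bit instance so that class 1 (the minority class)
    is sampled with probability 1 / (1 + ir)."""
    target = 1 if rng.random() < 1.0 / (1.0 + ir) else 0
    while True:  # rejection sampling until the instance has the target class
        bits = [rng.randint(0, 1) for _ in range(l)]
        if classify(bits) == target:
            return bits, target

rng = random.Random(0)
print(one_bit_class([0, 0, 0, 1, 1, 0]))           # 0
print(parity_class([0, 1, 0, 0, 1, 0, 1, 0], 8))   # 1
print(sample_instance(6, 9, one_bit_class, rng))
print(sample_instance(8, 9, lambda b: parity_class(b, 3), rng))
```

Passing the classification function as an argument keeps the undersampling step identical for both problems.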

Page 22: Modeling XCS in class imbalances: Population sizing and parameter settings

XCS on the one-bit Problem

XCS configuration
– α=0.1, ν=5, ε0=1, θGA=25, χ=0.8, μ=0.4, θdel=20, θsub=200, δ=0.1, P#=0.6
– selection=tournament, mutation=niched, [A]sub=false, N = 10,000 ir

Evaluation of the results:
– Minimum population size to achieve: TP rate * TN rate > 95%
– Results are averages over 25 seeds
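The evaluation criterion can be sketched as follows, assuming the minority class is treated as the positive class (the slides do not say which class is positive). The confusion-matrix counts are hypothetical.

```python
def tp_tn_product(tp, fn, tn, fp):
    """Product of the true-positive rate and the true-negative rate.
    Assumes both classes are present (tp + fn > 0 and tn + fp > 0)."""
    tp_rate = tp / (tp + fn)
    tn_rate = tn / (tn + fp)
    return tp_rate * tn_rate

# Hypothetical test set: 10 minority (positive) and 1000 majority (negative).
always_majority = tp_tn_product(tp=0, fn=10, tn=1000, fp=0)
good_model = tp_tn_product(tp=10, fn=0, tn=960, fp=40)

print(always_majority)    # 0.0
print(good_model > 0.95)  # True
```

Unlike the global error, the product drops to zero as soon as one class is ignored, so a model cannot pass the 95% threshold by predicting only the majority class.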

Page 23: Modeling XCS in class imbalances: Population sizing and parameter settings

XCS on the one-bit Problem

N remains constant up to ir = 64

N increases linearly from ir=64 to ir=256

N increases exponentially from ir=256 to ir=1024

Higher ir could not be solved

Page 25: Modeling XCS in class imbalances: Population sizing and parameter settings

Analysis of the Deviations

Inheritance Error of Classifiers’ Parameters
– New promising representatives of starved niches are created from classifiers that belong to nourished niches.
– These new promising rules inherit parameters from these classifiers. This is especially delicate for the action set size (as).
– Approach: initialize as=1.

Subsumption
– An overgeneral classifier of the majority class may receive ir positive rewards before receiving the first negative reward
– Approach: set θsub > ir

Stabilizing the population before testing
– Overgeneral classifiers poorly evaluated
– Approach: introduce some extra runs at the end of learning with the GA switched off.
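The subsumption argument can be checked with a short calculation. The sketch assumes a fully general classifier that predicts the majority class and matches every instance, and that minority instances arrive independently with probability 1/(1 + ir); the function names are hypothetical.

```python
def p_sample_minority(ir):
    return 1.0 / (1.0 + ir)

def expected_majority_run(ir):
    """Expected number of majority instances (positive rewards) before the
    first minority instance: mean of a geometric distribution, equal to ir."""
    p = p_sample_minority(ir)
    return (1.0 - p) / p

def p_error_free_until(theta_sub, ir):
    """Probability that the first theta_sub sampled instances are all
    majority-class, so the overgeneral classifier still looks accurate."""
    return (1.0 - p_sample_minority(ir)) ** theta_sub

theta_sub = 200  # value from the standard configuration
for ir in (1, 64, 256, 1024):
    print(f"ir = {ir:5d}  expected run = {expected_majority_run(ir):7.1f}  "
          f"P(no error in first {theta_sub}) = {p_error_free_until(theta_sub, ir):.3f}")
```

For ir well above θsub, the overgeneral classifier is likely to reach the experience threshold without a single negative reward, which is consistent with the recommendation θsub > ir.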

Page 27: Modeling XCS in class imbalances: Population sizing and parameter settings

XCS+PCM in the one-bit Problem

N remains constant up to ir = 128

For higher ir, N slightly increases

We only have to guarantee that a representative of the starved niche will be created

Page 28: Modeling XCS in class imbalances: Population sizing and parameter settings

XCS+PCM in the Parity Problem

Building blocks of size 3 need to be processed

Empirical results agree with the theory

Population size bound to guarantee that a representative of the niche will receive a genetic event

Page 30: Modeling XCS in class imbalances: Population sizing and parameter settings

Conclusions and Further Work

We derived models that analyzed the representatives of starved niches provided by covering and mutation

A population size bound was derived

We saw that the empirical observations met the theory if four aspects were considered:

– as initialization

– Subsumption

– Stabilization of the population

XCS is really robust to class imbalances

Further analysis of the covering operator

Page 32: Modeling XCS in class imbalances: Population sizing and parameter settings

Motivation

And what about incremental learning?

Sampling instances of the minority class less frequently

This influences the mechanisms of XCS (Orriols & Bernadó, 2006)

Page 33: Modeling XCS in class imbalances: Population sizing and parameter settings


Analysis of the Deviations

Niched Mutation vs. Free Mutation
– Classifiers can only be created if minority class instances are sampled

Inheritance Error of Classifiers’ Parameters
– New promising representatives of starved niches are created from classifiers that belong to nourished niches
– These new promising rules inherit parameters from these classifiers. This is especially delicate for the action set size (as).
– Approach: initialize as=1.

Page 34: Modeling XCS in class imbalances: Population sizing and parameter settings

Analysis of the Deviations

Subsumption
– An overgeneral classifier of the majority class may receive ir positive rewards before receiving the first negative reward
– Approach: set θsub > ir

Stabilizing the population before testing
– Overgeneral classifiers poorly evaluated
– Approach: introduce some extra runs at the end of learning with the GA switched off.

We gather all these little tweaks in XCS+PCM
