CEC'2005: Class Imbalance Problem in UCS Classifier System: Fitness Adaptation

Albert Orriols Puig, Ester Bernadó Mansilla
Enginyeria i Arquitectura La Salle, Ramon Llull University
CEC-2005, September 4th, 2005



OUTLINE

1. Introduction
2. UCS Description
3. Dataset Design
4. UCS on Unbalanced Datasets
5. Dealing with Imbalances
6. Class-Sensitive Accuracy
7. Weighted Class-Sensitive Accuracy
8. Conclusions

INTRODUCTION

• Real-world domains: class imbalances in the samples taken
• Do class imbalances affect the learning performance of some well-known systems?
• If so, how can we deal with imbalances?
• Do class imbalances affect the performance of UCS?

Description of UCS

[Figure: UCS learning cycle. The Environment supplies a problem instance plus its output class. Match set generation builds the Match Set [M] from the Population [P]; action set generation then builds the Correct Set [C]. Each classifier stores C, A, acc, F, num, cs, ts and exp. Classifier parameters are updated, the Genetic Algorithm (selection, reproduction, mutation, recombination) is applied, and deletion acts on the population.]

Classifier parameters update:

$$acc = \frac{\#\,Correct}{Experience}$$

$$Fitness = acc^{\nu}$$
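The parameter update on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `Classifier` fields `correct`, `exp`, `acc` and `fitness`, the function name `update`, and the default `nu=10` are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Classifier:
    """Hypothetical container for the counters one rule needs."""
    correct: int = 0       # times the rule was in the correct set [C]
    exp: int = 0           # experience: times the rule was in the match set [M]
    acc: float = 0.0
    fitness: float = 0.0


def update(cl: Classifier, in_correct_set: bool, nu: float = 10.0) -> None:
    """Update one matching classifier after a training instance.

    nu is the exponent in Fitness = acc^nu; the value 10 is an assumption.
    """
    cl.exp += 1
    if in_correct_set:
        cl.correct += 1
    cl.acc = cl.correct / cl.exp
    cl.fitness = cl.acc ** nu


# A rule that matched four instances and was correct on three of them.
cl = Classifier()
for hit in (True, True, True, False):
    update(cl, hit)
print(cl.acc)  # 0.75
```

Because the fitness is a power of the accuracy, a modest drop in accuracy causes a much larger drop in fitness.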

Description of UCS: An example

Training Dataset → UCS → Evolved Model

IRIS Dataset

sepal length  sepal width  petal length  petal width  class
5.1           3.5          1.4           0.2          Setosa
7.0           3.2          4.7           1.4          Versic.
6.3           3.3          6.0           2.5          Virgin.
…             …            …             …            …

Evolved model

if sepal_length <= 6.24 and petal_length <= 4.49 and petal_width <= 0.67 then Iris-setosa
if sepal_length >= 4.95 and 2.22 <= petal_length <= 4.76 and 0.51 <= petal_width <= 2.36 then Iris-versicolour
if petal_length >= 1.80 and petal_width >= 1.75 then Iris-virginica
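As a rough illustration of how such interval rules are applied, the sketch below encodes the three evolved rules and checks them against the three training examples shown on the slide. The representation (a dictionary of attribute intervals per rule) and the names `RULES`, `matches` and `classify` are assumptions for the example, not part of UCS itself.

```python
from typing import Dict, List, Tuple

INF = float("inf")

# Each rule: ({attribute: (low, high)}, predicted class).
# Attributes that a rule does not mention are left unconstrained.
RULES: List[Tuple[Dict[str, Tuple[float, float]], str]] = [
    ({"sepal_length": (-INF, 6.24),
      "petal_length": (-INF, 4.49),
      "petal_width": (-INF, 0.67)}, "Iris-setosa"),
    ({"sepal_length": (4.95, INF),
      "petal_length": (2.22, 4.76),
      "petal_width": (0.51, 2.36)}, "Iris-versicolour"),
    ({"petal_length": (1.80, INF),
      "petal_width": (1.75, INF)}, "Iris-virginica"),
]


def matches(condition: Dict[str, Tuple[float, float]],
            example: Dict[str, float]) -> bool:
    """True if every constrained attribute lies inside its interval."""
    return all(low <= example[name] <= high
               for name, (low, high) in condition.items())


def classify(example: Dict[str, float]) -> List[str]:
    """Return the classes advocated by all rules that match the example."""
    return [label for condition, label in RULES if matches(condition, example)]


setosa = {"sepal_length": 5.1, "sepal_width": 3.5,
          "petal_length": 1.4, "petal_width": 0.2}
versicolour = {"sepal_length": 7.0, "sepal_width": 3.2,
               "petal_length": 4.7, "petal_width": 1.4}
virginica = {"sepal_length": 6.3, "sepal_width": 3.3,
             "petal_length": 6.0, "petal_width": 2.5}

print(classify(setosa))       # ['Iris-setosa']
print(classify(versicolour))  # ['Iris-versicolour']
print(classify(virginica))    # ['Iris-virginica']
```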

Chk Problem

- Two real attributes x, y ∈ [0,1]
- Two classes
- Permits varying complexity along:
  a. Concept complexity (c)
  b. Dataset size (s)
  c. Imbalance level (i)

Example: s=4096, c=4, i=2

$$\#inst.\ maj.\ class = \frac{s}{c^2} = \frac{4096}{16} = 256$$

$$\#inst.\ min.\ class = \frac{s}{c^2 \cdot 2^i} = \frac{4096}{16 \cdot 4} = 64$$
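The arithmetic above can be reproduced for any setting of s, c and i. The helper below is a small sketch; the function name `chk_instances` is an assumption introduced for the example.

```python
from typing import Tuple


def chk_instances(s: int, c: int, i: int) -> Tuple[int, int]:
    """Return (#inst. maj. class, #inst. min. class) as defined on the slide."""
    majority = s // (c ** 2)
    minority = s // (c ** 2 * 2 ** i)
    return majority, minority


print(chk_instances(4096, 4, 2))  # (256, 64)

# The same computation across the imbalance levels i = 0..7.
for level in range(8):
    print(level, chk_instances(4096, 4, level))
```

Each increment of i halves the minority count while the majority count stays fixed.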

We ran UCS on chk with s=4096, c=4 and i=[0..7]

[Figure: Training datasets for the chk problem]

Obtaining the following results

[Figure: Boundaries evolved by UCS in the chk problem with imbalance levels from 0 to 7]

Analyzing the population evolved in higher imbalance levels

Rules for imbalance level i=4

• 18 rules predicting the under-sized class
• 47 rules predicting the over-sized class

Id  Condition                       Class  Acc   F     Num
1   [0.509, 0.750] [0.259, 0.492]   1      1.00  1.00  39
2   [0.000, 0.231] [0.252, 0.492]   1      1.00  1.00  38
3   [0.000, 0.248] [0.755, 1.000]   1      1.00  1.00  35
4   [0.761, 1.000] [0.000, 0.249]   1      1.00  1.00  34
5   [0.255, 0.498] [0.520, 0.730]   1      1.00  1.00  33
6   [0.751, 1.000] [0.514, 0.737]   1      1.00  1.00  31
7   [0.259, 0.498] [0.000, 0.244]   1      1.00  1.00  27
8   [0.501, 0.743] [0.751, 1.000]   1      1.00  1.00  18
9   [0.500, 0.743] [0.751, 1.000]   1      1.00  1.00  9
10  [0.751, 1.000] [0.531, 0.737]   1      1.00  1.00  8
…
18  [0.509, 0.750] [0.246, 0.492]   1      0.64  0.01  1
19  [0.000, 1.000] [0.000, 1.000]   0      0.94  0.54  20
20  [0.000, 1.000] [0.000, 0.990]   0      0.94  0.54  13
21  [0.012, 1.000] [0.000, 0.990]   0      0.94  0.54  10
…
64  [0.012, 1.000] [0.038, 0.973]   0      0.94  0.54  1

As the imbalance level increases, the accuracy of the over-general classifiers increases too. They then become stronger in the population.
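The growth in accuracy of an over-general rule can be illustrated with a short calculation. This sketch rests on several assumptions that are not stated on the slides: the rule covers the whole feature space and predicts the majority class, both classes occupy the same number of checkerboard squares (so the majority-to-minority ratio is 2^i to 1), and the fitness exponent is ν = 10. The function names are hypothetical.

```python
def overgeneral_accuracy(i: int) -> float:
    """Accuracy of a rule covering everything and predicting the majority class.

    Assumes a majority:minority ratio of 2**i : 1.
    """
    ratio = 2 ** i
    return ratio / (ratio + 1)


def fitness(acc: float, nu: float = 10.0) -> float:
    """Fitness = acc^nu, with nu = 10 as an assumed value."""
    return acc ** nu


for level in range(8):
    acc = overgeneral_accuracy(level)
    print(f"i={level}  acc={acc:.3f}  F={fitness(acc):.3f}")
```

Under these assumptions the accuracy at i=4 is 16/17, about 0.94, in line with the accuracy listed for the over-general rules, and both accuracy and fitness keep rising with i.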

Methods to deal with imbalances

1. Methods that act at the Sampling Level
   • Oversampling, Undersampling, …
   • Suppressing the bias towards the majority class
   • Changing somehow the information available in the training dataset
2. Methods that act at the System Level
   • Cost-sensitive learning approach
   • Imbalanced datasets are a problem because rare classes tend to be more costly to learn
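As one example of a sampling-level method, random oversampling duplicates minority examples until every class is as large as the largest one. The sketch below is a generic illustration using only the standard library. It is not a method evaluated in these slides, and the `oversample` function and its `seed` parameter are assumptions.

```python
import random
from typing import Dict, List, Tuple

Example = Tuple[Tuple[float, float], int]  # ((x, y), class label)


def oversample(data: List[Example], seed: int = 0) -> List[Example]:
    """Randomly duplicate examples so that all classes reach the same size."""
    rng = random.Random(seed)
    by_class: Dict[int, List[Example]] = {}
    for example in data:
        by_class.setdefault(example[1], []).append(example)
    target = max(len(examples) for examples in by_class.values())
    balanced: List[Example] = []
    for examples in by_class.values():
        balanced.extend(examples)
        # Draw the missing examples with replacement from the same class.
        balanced.extend(rng.choice(examples)
                        for _ in range(target - len(examples)))
    return balanced


data = [((0.1, 0.1), 0), ((0.2, 0.1), 0), ((0.1, 0.2), 0), ((0.2, 0.2), 0),
        ((0.6, 0.1), 1)]
balanced = oversample(data)
print(len(balanced))  # 8
```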

Class-sensitive accuracy

• We compute accuracy for each class

$$acc_i = \frac{c_i}{exp_i}$$

c_i = number of examples of class i correctly classified
exp_i = number of examples of class i covered by the rule

• The compound accuracy

$$acc = \frac{1}{C_e} \sum_{i=1 \mid exp_i > 0}^{C} acc_i$$

C = number of classes of the problem
C_e = number of different classes that a rule covers

Changing accuracy also changes fitness. So, the fitness of individuals predicting instances of more than one class decreases very fast.
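A small sketch may help show why this penalizes over-general rules. The function below averages the per-class accuracies over the classes a rule covers. The dictionary-based interface and the name `class_sensitive_accuracy` are assumptions, and the counts in the example are illustrative rather than taken from the experiments.

```python
from typing import Dict


def class_sensitive_accuracy(correct: Dict[int, int],
                             exp: Dict[int, int]) -> float:
    """Average of acc_i = c_i / exp_i over the classes with exp_i > 0."""
    covered = [label for label, count in exp.items() if count > 0]
    if not covered:
        return 0.0
    total = sum(correct.get(label, 0) / exp[label] for label in covered)
    return total / len(covered)


# Over-general rule predicting class 0: it covers 2048 examples of class 0
# (all correct) and 128 examples of class 1 (all wrong).
exp = {0: 2048, 1: 128}
correct = {0: 2048, 1: 0}

plain = sum(correct.values()) / sum(exp.values())
print(round(plain, 3))                         # 0.941
print(class_sensitive_accuracy(correct, exp))  # 0.5

# A specific rule that only covers minority examples keeps accuracy 1.
print(class_sensitive_accuracy({1: 16}, {0: 0, 1: 16}))  # 1.0
```

With the plain accuracy the over-general rule looks nearly perfect. With the class-sensitive version it drops to 0.5, so its fitness collapses once the accuracy is raised to a power.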


Class-sensitive accuracy

Analyzing the population evolved in higher imbalance levels

Rules for imbalance level i=4 using class-sensitive accuracy

• 8 rules predicting the over-sized class
• 8 rules predicting the under-sized class

Id  Condition                       Class  Acc    F     Num
1   [0.485, 0.756] [0.483, 0.753]   0      1  -   1.00  34
2   [0.000, 0.253] [0.502, 0.756]   0      1  -   1.00  34
3   [0.252, 0.505] [0.750, 1.000]   0      1  -   1.00  32
4   [0.753, 1.000] [0.749, 1.000]   0      1  -   1.00  31
5   [0.737, 1.000] [0.238, 0.515]   0      1  -   1.00  29
6   [0.499, 0.772] [0.000, 0.277]   0      1  -   1.00  27
7   [0.000, 0.244] [0.000, 0.248]   0      1  -   1.00  27
8   [0.225, 0.544] [0.223, 0.529]   0      1  -   1.00  27
9   [0.252, 0.499] [0.000, 0.207]   1      -  1   1.00  21
10  [0.752, 1.000] [0.000, 0.242]   1      -  1   1.00  18
11  [0.751, 1.000] [0.502, 0.738]   1      -  1   1.00  15
12  [0.506, 0.734] [0.761, 1.000]   1      -  1   1.00  15
13  [0.510, 0.741] [0.252, 0.479]   1      -  1   1.00  13
14  [0.000, 0.233] [0.757, 1.000]   1      -  1   1.00  12
15  [0.000, 0.240] [0.254, 0.485]   1      -  1   1.00  11
16  [0.252, 0.488] [0.516, 0.743]   1      -  1   1.00  6
17  [0.252, 0.498] [0.516, 0.692]   1      -  1   1.00  6
18  [0.000, 0.227] [0.757, 1.000]   1      -  1   1.00  4
19  [0.504, 0.772] [0.000, 0.277]   0      1  -   1.00  4
…

Weighted Class-sensitive accuracy

• We compute accuracy for each class

$$acc_i = \frac{c_i}{exp_i}$$

c_i = number of examples of class i correctly classified
exp_i = number of examples of class i covered by the rule

• The compound accuracy

$$acc = \begin{cases} \dfrac{1}{C_e} \sum_{i=1 \mid exp_i > 0}^{C} acc_i & \text{if } \forall i: exp_i \geq \theta_{acc} \\ \sum_{i=1 \mid exp_i > 0}^{C} w_i \cdot acc_i & \text{otherwise} \end{cases}$$

C_e = number of different classes that a rule covers

• Where w_i is defined piecewise, with one case for 0 < exp_i < θ_acc and one for exp_i ≥ θ_acc

[Equation: definition of the weights w_i in terms of exp_i, θ_acc and C_ee]

C_ee = number of experienced classes
θ_acc = threshold below which a class is inexperienced
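The switch between the two cases of the compound accuracy can be sketched as follows. The weights w_i are taken as an input here rather than computed, and the condition on exp_i is checked only over the classes the rule covers, which is an interpretation. The function name and the dictionary interface are also assumptions.

```python
from typing import Dict


def weighted_class_sensitive_accuracy(acc: Dict[int, float],
                                      exp: Dict[int, int],
                                      weights: Dict[int, float],
                                      theta_acc: int) -> float:
    """Compound accuracy with a threshold on per-class experience.

    acc[i] is the per-class accuracy acc_i, exp[i] the per-class experience
    exp_i, and weights[i] a caller-supplied w_i.
    """
    covered = [label for label, count in exp.items() if count > 0]
    if not covered:
        return 0.0
    if all(exp[label] >= theta_acc for label in covered):
        # Every covered class is experienced: plain average over C_e classes.
        return sum(acc[label] for label in covered) / len(covered)
    # At least one covered class is inexperienced: weighted sum instead.
    return sum(weights[label] * acc[label] for label in covered)


acc = {0: 1.0, 1: 0.0}
print(weighted_class_sensitive_accuracy(acc, {0: 50, 1: 50},
                                        {0: 0.5, 1: 0.5}, theta_acc=10))  # 0.5
print(weighted_class_sensitive_accuracy(acc, {0: 50, 1: 2},
                                        {0: 0.9, 1: 0.1}, theta_acc=10))  # 0.9
```

In the second call the rule has seen only two examples of class 1, so that class contributes through its small weight instead of halving the accuracy outright.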

[Figure: Weighted Class-sensitive accuracy]

Conclusions

• The class imbalance problem has proved to be a real problem for UCS
  – For highly unbalanced datasets, overgeneral rules interfere with specific rules covering the minority class regions
• We proposed a fitness adaptation based on class-sensitive accuracy to diminish the generalization pressure exerted by the GA (guided by fitness)
  – UCS can discover the right boundaries, but tends to leave some regions of the feature space uncovered
• A weighted accuracy function was proposed to improve the coverage of the method

Further Work

• Enhance the study by analyzing the contribution of each complexity factor
• Introduce new problems to the analysis
• Use real-world datasets to validate the results
• Test new strategies that act at the sampling level (to appear IWLCS'05)
• How can the new strategies deal with:
  – Noise
  – Scarcity

Thanks for your attention