cec'2005: class imbalance problem in ucs classifier system: fitness adaptation
TRANSCRIPT
Class Imbalance Problem in UCS Classifier System: Fitness Adaptation
Albert Orriols Puig, Ester Bernadó Mansilla
Enginyeria i Arquitectura La Salle, Ramon Llull University
CEC-2005, Enginyeria i Arquitectura La Salle
September 4th, 2005
OUTLINE
1. Introduction
2. UCS Description
3. Dataset Design
4. UCS on Unbalanced Datasets
5. Dealing with Imbalances
6. Class-Sensitive Accuracy
7. Weighted Class-Sensitive Accuracy
8. Conclusions
INTRODUCTION
Real-world domains: class imbalances in the samples taken.
Do they affect the learning performance of some well-known systems?
If so, how can we deal with imbalances?
Do class imbalances affect the performance of UCS?
Description of UCS
[Figure: UCS learning cycle. The Environment supplies a problem instance + output class. Match set generation builds the Match Set [M] from the Population [P]; action set generation builds the Correct Set [C]. Then come the classifier parameters update, the Genetic Algorithm (selection, reproduction, mutation, recombination) and deletion. Each classifier is listed as: C A acc F num cs ts exp.]
$$\mathit{acc} = \frac{\#\,\mathit{correct}}{\mathit{experience}}$$
$$\mathit{Fitness} = \mathit{acc}^{\nu}$$
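The accuracy and fitness update on this slide (acc = #correct / experience, Fitness = acc^ν) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `Classifier` class, the `update` function and the default `nu = 10` are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Classifier:
    """Hypothetical container for the per-rule counters used below."""
    action: int           # class predicted by the rule (A)
    exp: int = 0          # experience: times the rule was in a match set
    correct: int = 0      # times the rule was also in the correct set
    acc: float = 0.0
    fitness: float = 0.0


def update(cl: Classifier, true_class: int, nu: float = 10.0) -> None:
    """Update one matching classifier after seeing a labelled instance."""
    cl.exp += 1
    if cl.action == true_class:
        cl.correct += 1
    cl.acc = cl.correct / cl.exp     # acc = #correct / experience
    cl.fitness = cl.acc ** nu        # Fitness = acc^nu (nu = 10 is assumed)


cl = Classifier(action=0)
for label in [0, 0, 1, 0]:           # the rule matches four instances, one of class 1
    update(cl, label)
print(cl.acc, cl.fitness)
```

Because fitness is a power of accuracy, a rule that is wrong on even a quarter of the instances it matches ends up with a fitness close to zero.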
Description of UCS: An example
Training Dataset -> UCS -> Evolved Model
IRIS Dataset:
sepal length  sepal width  petal length  petal width  class
5.1           3.5          1.4           0.2          Setosa
7.0           3.2          4.7           1.4          Versic.
6.3           3.3          6.0           2.5          Virgin.
…
Evolved model:
if sepal_length <= 6.24 and petal_length <= 4.49 and petal_width <= 0.67 then Iris-setosa
if sepal_length >= 4.95 and 2.22 <= petal_length <= 4.76 and 0.51 <= petal_width <= 2.36 then Iris-versicolour
if petal_length >= 1.80 and petal_width >= 1.75 then Iris-virginica
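To make the interval rules of the evolved model concrete, the sketch below encodes the three rules from this slide and checks them against the three sample rows of the Iris table. The data layout (`RULES`, `SAMPLES`) and the helper names `matches` and `predict` are assumptions made for the illustration.

```python
INF = float("inf")

# Each rule: (list of (attribute, lower, upper) bounds, predicted class).
# A missing bound is encoded as -inf or +inf.
RULES = [
    ([("sepal_length", -INF, 6.24), ("petal_length", -INF, 4.49),
      ("petal_width", -INF, 0.67)], "Iris-setosa"),
    ([("sepal_length", 4.95, INF), ("petal_length", 2.22, 4.76),
      ("petal_width", 0.51, 2.36)], "Iris-versicolour"),
    ([("petal_length", 1.80, INF), ("petal_width", 1.75, INF)],
     "Iris-virginica"),
]

SAMPLES = [
    {"sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4, "petal_width": 0.2},
    {"sepal_length": 7.0, "sepal_width": 3.2, "petal_length": 4.7, "petal_width": 1.4},
    {"sepal_length": 6.3, "sepal_width": 3.3, "petal_length": 6.0, "petal_width": 2.5},
]


def matches(bounds, instance):
    """True if every bounded attribute of the instance lies inside its interval."""
    return all(lo <= instance[attr] <= hi for attr, lo, hi in bounds)


def predict(instance):
    """Return the classes of all rules whose condition covers the instance."""
    return [cls for bounds, cls in RULES if matches(bounds, instance)]


for sample in SAMPLES:
    print(predict(sample))
```

Each of the three sample rows is covered by exactly one rule, the one predicting its own class.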
Chk Problem
- Two real attributes x, y ∈ [0,1]
- Two classes
- Permits varying complexity along:
  a. Concept complexity (c)
  b. Dataset size (s)
  c. Imbalance level (i)
Example: s=4096, c=4, i=2
#inst. maj. class = s/c^2 = 4096/16 = 256
#inst. min. class = s/(c^2 · 2^i) = 4096/(16·4) = 64
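A dataset of this kind can be generated with a short script. The sketch below is an assumption-laden illustration: it reads the two counts above as instances per checkerboard cell, samples uniformly inside each cell, and assigns the minority class (labelled 1) to cells whose column and row indices have an odd sum. The function name `make_chk` and the seed are hypothetical.

```python
import random


def make_chk(s=4096, c=4, i=2, seed=0):
    """Sample a checkerboard dataset; returns a list of (x, y, label) tuples."""
    rng = random.Random(seed)
    n_maj = s // c**2                  # instances per majority-class cell
    n_min = s // (c**2 * 2**i)         # instances per minority-class cell
    data = []
    for col in range(c):
        for row in range(c):
            minority = (col + row) % 2 == 1    # assumed colouring of the board
            n = n_min if minority else n_maj
            for _ in range(n):
                x = (col + rng.random()) / c   # uniform inside the cell
                y = (row + rng.random()) / c
                data.append((x, y, 1 if minority else 0))
    return data


data = make_chk()
n_class1 = sum(label for _, _, label in data)
print(len(data) - n_class1, n_class1)   # 2048 512
```

With c=4 there are eight cells per class, so the example parameters give 8 · 256 = 2048 majority and 8 · 64 = 512 minority instances.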
We ran UCS in chk with s=4096, c=4 and i=[0..7]
[Figure: Training datasets for chk problem]
Obtaining the following results
[Figure: Boundaries evolved by UCS in the chk problem with imbalance levels from 0 to 7]
Analyzing the population evolved in higher imbalance levels
Rules for imbalance level i=4:
Id  condition                       Class  Acc   F     Num
1   [0.509, 0.750] [0.259, 0.492]   1      1.00  1.00  39
2   [0.000, 0.231] [0.252, 0.492]   1      1.00  1.00  38
3   [0.000, 0.248] [0.755, 1.000]   1      1.00  1.00  35
4   [0.761, 1.000] [0.000, 0.249]   1      1.00  1.00  34
5   [0.255, 0.498] [0.520, 0.730]   1      1.00  1.00  33
6   [0.751, 1.000] [0.514, 0.737]   1      1.00  1.00  31
7   [0.259, 0.498] [0.000, 0.244]   1      1.00  1.00  27
8   [0.501, 0.743] [0.751, 1.000]   1      1.00  1.00  18
9   [0.500, 0.743] [0.751, 1.000]   1      1.00  1.00  9
10  [0.751, 1.000] [0.531, 0.737]   1      1.00  1.00  8
…
18  [0.509, 0.750] [0.246, 0.492]   1      0.64  0.01  1
19  [0.000, 1.000] [0.000, 1.000]   0      0.94  0.54  20
20  [0.000, 1.000] [0.000, 0.990]   0      0.94  0.54  13
21  [0.012, 1.000] [0.000, 0.990]   0      0.94  0.54  10
…
64  [0.012, 1.000] [0.038, 0.973]   0      0.94  0.54  1
18 rules predicting the under-sized class
47 rules predicting the over-sized class
As imbalance level increases, the accuracy of the over-general classifiers increases too. Then, they become stronger in the population.
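The 0.94 accuracy of the over-general rules can be checked with a small calculation. The sketch assumes that both classes occupy the same number of checkerboard cells and that a minority cell holds 2^i times fewer instances than a majority cell, so a rule covering the whole space and predicting the majority class is right on a fraction 2^i / (2^i + 1) of the instances. The exponent `nu = 10` is an assumption; it gives a fitness close to the 0.54 listed for i=4.

```python
def overgeneral_accuracy(i: int) -> float:
    """Accuracy of a rule that covers the whole space and predicts the majority class."""
    return 2**i / (2**i + 1)


def fitness(acc: float, nu: float = 10.0) -> float:
    """Fitness = acc^nu; nu = 10 is an assumed parameter value."""
    return acc ** nu


for i in range(8):
    acc = overgeneral_accuracy(i)
    print(i, round(acc, 3), round(fitness(acc), 3))
```

At i=0 the over-general rule is only 50% accurate and its fitness is negligible; as i grows its accuracy approaches 1, so it competes with, and eventually outweighs, the specific rules for the minority regions.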
Methods to deal with imbalances
1. Methods that act at the Sampling Level
   - Oversampling, undersampling, …
   - Suppressing the bias towards the majority class
   - Changing somehow the information available in the training dataset
2. Methods that act at the System Level
   - Cost-sensitive learning approach
   - Imbalanced datasets are a problem because rare classes tend to be more costly to learn
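As an illustration of a sampling-level method, the sketch below implements plain random oversampling, which duplicates randomly chosen instances of the smaller classes until every class is as large as the biggest one. The slides do not say which oversampling variant is meant, so this choice, the function name `random_oversample` and the toy data are assumptions.

```python
import random


def random_oversample(data, seed=0):
    """Balance a list of tuples whose last element is the class label."""
    rng = random.Random(seed)
    by_class = {}
    for inst in data:
        by_class.setdefault(inst[-1], []).append(inst)
    target = max(len(insts) for insts in by_class.values())
    balanced = []
    for insts in by_class.values():
        balanced.extend(insts)
        # Draw duplicates (with replacement) until the class reaches the target size.
        balanced.extend(rng.choice(insts) for _ in range(target - len(insts)))
    return balanced


toy = [(0.1, 0.2, 0)] * 6 + [(0.6, 0.2, 1)] * 2
balanced = random_oversample(toy)
print(len(balanced))   # 12
```

The learner itself is unchanged; only the class proportions it sees during training are altered.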
Class-sensitive accuracy
• We compute accuracy for each class:
$$\mathit{acc}_i = \frac{C_i}{\mathit{exp}_i}$$
  C_i = number of examples of class i correctly classified
  exp_i = number of examples of class i covered by the rule
• The compound accuracy:
$$\mathit{acc} = \frac{1}{C_e} \sum_{i=1 \mid \mathit{exp}_i > 0}^{C} \mathit{acc}_i$$
  C = number of classes of the problem
  C_e = number of different classes that a rule covers
Changing accuracy also changes fitness. So, fitness of individuals predicting instances of more than one class decreases very fast.
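A minimal sketch of the compound accuracy follows, assuming per-class counters are kept as two lists indexed by class. The function name and the example counts are illustrative, not taken from the slides.

```python
def class_sensitive_accuracy(correct, exp):
    """Average of per-class accuracies over the classes the rule covers.

    correct[i] -- examples of class i correctly classified (C_i)
    exp[i]     -- examples of class i covered by the rule (exp_i)
    """
    covered = [i for i in range(len(exp)) if exp[i] > 0]
    c_e = len(covered)                  # C_e: number of classes the rule covers
    if c_e == 0:
        return 0.0                      # assumption: an unmatched rule gets accuracy 0
    return sum(correct[i] / exp[i] for i in covered) / c_e


# Over-general rule predicting class 0 that covers 2048 class-0 and 128 class-1
# examples (illustrative counts): its plain accuracy is 2048 / 2176, about 0.94.
print(class_sensitive_accuracy(correct=[2048, 0], exp=[2048, 128]))   # 0.5

# Specific rule predicting class 1 that covers only class-1 examples.
print(class_sensitive_accuracy(correct=[0, 16], exp=[0, 16]))         # 1.0
```

Under this measure the over-general rule drops from about 0.94 to 0.5, while a rule covering a single class keeps accuracy 1, which is why fitness falls so quickly for rules spanning more than one class.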
[Figure: Class-sensitive accuracy]
Analyzing the population evolved in higher imbalance levels
Rules for imbalance level i=4 using class-sensitive accuracy:
Id  condition                       Class  Acc_0  Acc_1  F     Num
1   [0.485, 0.756] [0.483, 0.753]   0      1      -      1.00  34
2   [0.000, 0.253] [0.502, 0.756]   0      1      -      1.00  34
3   [0.252, 0.505] [0.750, 1.000]   0      1      -      1.00  32
4   [0.753, 1.000] [0.749, 1.000]   0      1      -      1.00  31
5   [0.737, 1.000] [0.238, 0.515]   0      1      -      1.00  29
6   [0.499, 0.772] [0.000, 0.277]   0      1      -      1.00  27
7   [0.000, 0.244] [0.000, 0.248]   0      1      -      1.00  27
8   [0.225, 0.544] [0.223, 0.529]   0      1      -      1.00  27
9   [0.252, 0.499] [0.000, 0.207]   1      -      1      1.00  21
10  [0.752, 1.000] [0.000, 0.242]   1      -      1      1.00  18
11  [0.751, 1.000] [0.502, 0.738]   1      -      1      1.00  15
12  [0.506, 0.734] [0.761, 1.000]   1      -      1      1.00  15
13  [0.510, 0.741] [0.252, 0.479]   1      -      1      1.00  13
14  [0.000, 0.233] [0.757, 1.000]   1      -      1      1.00  12
15  [0.000, 0.240] [0.254, 0.485]   1      -      1      1.00  11
16  [0.252, 0.488] [0.516, 0.743]   1      -      1      1.00  6
17  [0.252, 0.498] [0.516, 0.692]   1      -      1      1.00  6
18  [0.000, 0.227] [0.757, 1.000]   1      -      1      1.00  4
19  [0.504, 0.772] [0.000, 0.277]   0      1      -      1.00  4
…
8 rules predicting the over-sized class
8 rules predicting the under-sized class
Weighted Class-sensitive accuracy
• We compute accuracy for each class:
$$\mathit{acc}_i = \frac{C_i}{\mathit{exp}_i}$$
  C_i = number of examples of class i correctly classified
  exp_i = number of examples of class i covered by the rule
• The compound accuracy:
$$\mathit{acc} = \begin{cases} \dfrac{1}{C_e} \sum_{i=1 \mid \mathit{exp}_i > 0}^{C} \mathit{acc}_i & \text{if } \forall i : \mathit{exp}_i \ge \theta_{acc} \\[2ex] \sum_{i=1 \mid \mathit{exp}_i > 0}^{C} w_i \cdot \mathit{acc}_i & \text{otherwise} \end{cases}$$
• Where
$$w_i = \begin{cases} \dfrac{\mathit{exp}_i}{C_e \cdot \theta_{acc}} & \text{if } 0 < \mathit{exp}_i < \theta_{acc} \\[2ex] \dfrac{1}{C_{ee}} \left( 1 - \sum_{j=1 \mid 0 < \mathit{exp}_j < \theta_{acc}}^{C} \dfrac{\mathit{exp}_j}{C_e \cdot \theta_{acc}} \right) & \text{if } \mathit{exp}_i \ge \theta_{acc} \end{cases}$$
  C_e = number of different classes that a rule covers
  C_ee = number of experienced classes
  θ_acc = threshold below which a class is inexperienced
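The weighting idea can be sketched as follows. This is an illustration under explicit assumptions, not the authors' code: a class the rule has covered fewer than `theta_acc` times gets weight exp_i / (C_e · θ_acc), the remaining weight is shared equally among the experienced classes so that all weights sum to one, and `theta_acc = 10` in the example is a hypothetical value.

```python
def weighted_class_sensitive_accuracy(correct, exp, theta_acc):
    """Weighted average of per-class accuracies (assumed weighting scheme).

    correct[i] -- examples of class i correctly classified (C_i)
    exp[i]     -- examples of class i covered by the rule (exp_i)
    theta_acc  -- experience threshold below which a class is inexperienced
    """
    covered = [i for i in range(len(exp)) if exp[i] > 0]
    if not covered:
        return 0.0                               # assumption: unmatched rule gets 0
    c_e = len(covered)                           # C_e
    inexperienced = [i for i in covered if exp[i] < theta_acc]
    experienced = [i for i in covered if exp[i] >= theta_acc]

    # Inexperienced classes: weight grows linearly with experience.
    w = {i: exp[i] / (c_e * theta_acc) for i in inexperienced}
    # Experienced classes: share what is left equally (C_ee = len(experienced)).
    if experienced:
        rest = 1.0 - sum(w.values())
        for i in experienced:
            w[i] = rest / len(experienced)
    return sum(w[i] * correct[i] / exp[i] for i in covered)


# Rule predicting class 0 that has seen 40 class-0 but only 5 class-1 examples.
print(weighted_class_sensitive_accuracy([40, 0], [40, 5], theta_acc=10))    # 0.75
# Once class 1 is experienced, the unweighted class-sensitive value is recovered.
print(weighted_class_sensitive_accuracy([40, 0], [40, 10], theta_acc=10))   # 0.5
```

The penalty for covering a second class is thus introduced gradually, which is meant to avoid discarding rules on the evidence of a handful of examples.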
[Figure: Weighted Class-sensitive accuracy]
Conclusions
• The class imbalance problem proved to be a real problem for UCS
  – For highly unbalanced datasets, overgeneral rules interfere with specific rules covering the minority class regions
• We proposed fitness adaptation based on class-sensitive accuracy to diminish the generalization pressure exerted by the GA (guided by fitness)
  – UCS can discover the right boundaries, but tends to leave some regions of the feature space uncovered
• A weighted accuracy function was proposed to improve the coverage of the method
Further Work
• Enhance the study by analyzing the contribution of each complexity factor
• Introduce new problems to the analysis
• Use real-world datasets to validate the results
• Test new strategies that act at the sampling level (to appear IWLCS'05)
• How can the new strategies deal with:
  – Noise
  – Scarcity