Voting-Based Learning Classifier System for Multi-Label Classification
DESCRIPTION
Kaveh Ahmadi-Abhari, Ali Hamzeh, Sattar Hashemi. "Voting Based Learning Classifier System for Multi-Label Classification". IWLCS, 2011.
TRANSCRIPT
Voting-Based Learning Classifier System for Multi-Label Classification
Kaveh Ahmadi-Abhari (Presenter)
Ali Hamzeh
Sattar Hashemi
IWLCS 2011 – Dublin, Ireland, 13th July 2011
Multi-label Classification
Single-label classification: exclusive classes; each example belongs to exactly one class.
Multi-label classification: each instance can belong to more than one class.
Kaveh Ahmadi-Abhari, Shiraz University, Soft Computing Group
[Example image labelled with multiple classes: Sky, People, Sand]
Current Methods
Problem transformation: transform the problem into one or more single-label classification problems.
Algorithm adaptation: adapt single-label classifiers to solve the problem directly.
[Tsoumakas & Katakis, 2007]
Problem Transformation Approaches
Example: Copy Transformation. Each multi-label example (left) is copied once per label, producing single-label examples (right).

Ex. | Label set            Ex. | Label
1   | {λ1, λ4}             1a  | λ1
2   | {λ3, λ4}             1b  | λ4
3   | {λ1}                 2a  | λ3
4   | {λ2, λ3, λ4}         2b  | λ4
                           3   | λ1
                           4a  | λ2
                           4b  | λ3
                           4c  | λ4
[Tsoumakas et al., 2009]
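The copy transformation above can be sketched in a few lines of Python (the function name and the letter-suffix convention are illustrative; the data mirrors the slide's toy example):

```python
def copy_transform(dataset):
    """Copy transformation: duplicate each multi-label example once per
    label it carries, yielding single-label examples.
    dataset: list of (example_id, label_set) pairs."""
    single_label = []
    for ex_id, labels in dataset:
        for i, lam in enumerate(sorted(labels)):
            # Suffix a, b, c, ... distinguishes the copies (only when needed).
            suffix = chr(ord('a') + i) if len(labels) > 1 else ''
            single_label.append((f"{ex_id}{suffix}", lam))
    return single_label

data = [(1, {1, 4}), (2, {3, 4}), (3, {1}), (4, {2, 3, 4})]
print(copy_transform(data))
# [('1a', 1), ('1b', 4), ('2a', 3), ('2b', 4), ('3', 1), ('4a', 2), ('4b', 3), ('4c', 4)]
```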
Algorithm Adaptation Approaches
- Multi-label lazy algorithms: ML-kNN [Zhang & Zhou, PR '07]
- Multi-label decision trees: ADTBoost.MH [De Comité et al., MLDM '03]; Multi-Label C4.5 [Clare & King, LNCS 2168]
- Multi-label kernel methods: Rank-SVM [Elisseeff & Weston, NIPS '02]; ML-SVM [Boutell et al., PR '04]
- Multi-label text categorization: BoosTexter [Schapire & Singer, MLJ '00]; Maximal Margin Labeling [Kazawa et al., NIPS '04]
- Probabilistic generative models [McCallum, AAAI '99] [Ueda & Saito, NIPS '03]
- Neural networks: BP-MLL [Zhang & Zhou, TKDE '06]
Motivation
A lot has been done on classification using LCSs.
Most of these studies address single-label classification problems.
Multi-label classification with LCSs is still in its inception [Vallim et al., IWLCS '08].
Voting-Based Learning Classifier System
How can we guide the discovery mechanism (e.g. the evolutionary operators) in LCSs?
Using the prior knowledge gained from past experiences.
Training instances vote on their matched rules according to how correct each rule is.
These votes serve as a fitness measure.
Voting: Defining Rule Types
How can the given votes describe the quality of the rules accurately?
Define different rule types such that each type describes a quality status the rule might have.
Rule Types
Each rule receives a "correct" or "wrong" vote from each matched training instance, so overall it accumulates a combination of "correct" and "wrong" votes.
Example: in a single-label classification problem, the rule types might simply be correct or wrong.
Votes as Fitness Measure
- The given votes describe the quality of the rules
- Use them as a fitness measure for guiding the discovery mechanism
- For example, a rule with many "wrong" votes should be selected for discovery (mutation) with high probability, so that a more meaningful rule can be obtained
Rules Definition
The antecedent part matches the feature vector.
The consequent part encodes the classes predicted by the rule: one bit per class, where a 1 indicates that the corresponding class is predicted.
Examples (antecedent / consequent): ###1 / 110 and 0011 / 001.
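A minimal sketch of this rule representation in Python (function names are illustrative): a ternary antecedent string where '#' is a wildcard, and a consequent bit string decoded into a label set.

```python
def matches(antecedent, instance):
    """An antecedent bit matches if it is '#' or equals the instance bit."""
    return all(a == '#' or a == x for a, x in zip(antecedent, instance))

def predicted_classes(consequent):
    """Bit i set to '1' means class i+1 is predicted."""
    return {i + 1 for i, b in enumerate(consequent) if b == '1'}

print(matches('###1', '0011'))   # True  (last bit agrees, rest are wildcards)
print(matches('0011', '0111'))   # False (second bit disagrees)
print(predicted_classes('110'))  # {1, 2}
```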
VLCS Vote Types for Multi-label Problem
The multi-label vote types for VLCS:
- Correct
- Subset
- Superset
- Partial-set
- Wrong
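Assuming set semantics for the five vote types above (exact match, proper subset/superset, partial overlap, no overlap), the vote an instance gives a rule can be sketched as:

```python
def vote(predicted, expected):
    """Classify a rule's predicted label set against an instance's
    expected label set, yielding one of the five VLCS vote types."""
    if predicted == expected:
        return 'correct'
    if predicted and predicted < expected:   # proper subset of expected
        return 'subset'
    if predicted > expected:                 # proper superset of expected
        return 'superset'
    if predicted & expected:                 # partial overlap
        return 'partial-set'
    return 'wrong'                           # no overlap at all

print(vote({1, 2}, {1, 2}))     # correct
print(vote({1, 2}, {1, 2, 3}))  # subset
print(vote({1, 2}, {1}))        # superset
print(vote({1, 2}, {1, 3}))     # partial-set
print(vote({1, 2}, {3}))        # wrong
```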
Multi-Label Simple Dataset
[Figure: a toy multi-label dataset on the 3-bit instance space 000-111; marked instances carry the label sets {1, 4}, {2, 4}, {1, 2}, and {1, 3}. Expanded from [Vallim et al., GECCO '08].]
VLCS Voting Options for Multi-label Problem
Correct Rules (C)
Rule 00# / 1001 is correct when it matches instance 000 or 001.
VLCS Voting Options for Multi-label Problem
Wrong Rules (W)
Rule 0#0 / 0010 is wrong when it matches instance 000 or 010.
VLCS Voting Options for Multi-label Problem
Subset Rules
Rule #01 / 1000 is a subset rule when it matches instance 001 or 101: it predicts class {1}, while the expected classes are {1, 4}.
VLCS Voting Options for Multi-label Problem
Superset Rules
Rule #00 / 1101 is a superset rule when it matches instance 000 or 100: it predicts classes {1, 2, 4}, while the expected classes are {1, 4}.
VLCS Voting Options for Multi-label Problem
Partial-set Rules
Rule #1# / 0110 is a partial-set rule when it matches instance 010 or 111: it predicts classes {2, 3}, while the expected classes are {2, 4}.
VLCS Voting Options for Multi-label Problem
Rules might receive different votes over time.
Rule #0# / 1001, for example, is correct for instance 000 but partial-set for instance 101.
Using Stored Prior Knowledge
Information: consider a rule whose received votes are all superset. Then:
- the rule is covering an appropriate area of the problem space,
- but it predicts more classes than expected for its matched instances.
Inference: the number of classes the rule predicts should be reduced.
Discovery Operators
In the discovery mechanism, an evolutionary algorithm with four mutation operators is defined:
Discovery Operators
Mutation operators on rule’s antecedent part
MA-G: generalizes the rule by flipping 0 or 1 bits to #
MA-S: specializes the rule by flipping # bits to 0 or 1
Discovery Operators
Mutation operators on rule’s consequent part
MC-S: shrinks the set of predicted classes by flipping 1 bits to 0
MC-A: adds more predicted classes by flipping 0 bits to 1
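The four operators can be sketched on the string representation (single random bit flips; the helper and the choice of flipping exactly one bit are assumptions, since the slides do not fix how many bits each application flips):

```python
import random

def _flip(s, positions, new_chars):
    """Flip one randomly chosen position of s to a char from new_chars."""
    if not positions:
        return s
    i = random.choice(positions)
    return s[:i] + random.choice(new_chars) + s[i + 1:]

def ma_g(ant):
    """MA-G: generalize the antecedent by flipping a 0/1 bit to '#'."""
    return _flip(ant, [i for i, b in enumerate(ant) if b in '01'], '#')

def ma_s(ant):
    """MA-S: specialize the antecedent by flipping a '#' bit to 0 or 1."""
    return _flip(ant, [i for i, b in enumerate(ant) if b == '#'], '01')

def mc_s(cons):
    """MC-S: shrink the predicted label set by flipping a 1 bit to 0."""
    return _flip(cons, [i for i, b in enumerate(cons) if b == '1'], '0')

def mc_a(cons):
    """MC-A: add a predicted label by flipping a 0 bit to 1."""
    return _flip(cons, [i for i, b in enumerate(cons) if b == '0'], '1')

print(ma_g('0011'))  # stochastic, e.g. '0#11'
```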
Which Discovery Operator?
The votes each rule has received guide which mutation operator acts on it.
Example: a superset rule has wrongly assigned some unexpected classes, so the number of predicted classes should be reduced (MC-S).
Which Discovery Operator?

Received votes           | Activated mutation operator(s)
-------------------------|-------------------------------
Correct                  | MA-G
Subset                   | MC-A
Superset                 | MC-S
Partial-set              | MC-A, MC-S
Wrong                    | MC-A, MC-S
Correct, Subset          | MA-S
Correct, Superset        | MA-G
Correct, Partial-set     | MA-S
Correct, Wrong           | MA-S
Wrong, Subset            | MA-S, MC-A
Wrong, Partial-set       | MA-S
Correct, Subset, Wrong   | MA-S, MA-G
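The table above is a direct lookup from the set of vote types a rule has received to the operators that act on it; a straightforward transcription (covering only the combinations the slide lists):

```python
# Map from the set of received vote types to the activated mutation
# operators, transcribed from the table above.
VOTE_TO_OPERATORS = {
    frozenset({'correct'}):                    ['MA-G'],
    frozenset({'subset'}):                     ['MC-A'],
    frozenset({'superset'}):                   ['MC-S'],
    frozenset({'partial-set'}):                ['MC-A', 'MC-S'],
    frozenset({'wrong'}):                      ['MC-A', 'MC-S'],
    frozenset({'correct', 'subset'}):          ['MA-S'],
    frozenset({'correct', 'superset'}):        ['MA-G'],
    frozenset({'correct', 'partial-set'}):     ['MA-S'],
    frozenset({'correct', 'wrong'}):           ['MA-S'],
    frozenset({'wrong', 'subset'}):            ['MA-S', 'MC-A'],
    frozenset({'wrong', 'partial-set'}):       ['MA-S'],
    frozenset({'correct', 'subset', 'wrong'}): ['MA-S', 'MA-G'],
}

print(VOTE_TO_OPERATORS[frozenset({'superset'})])          # ['MC-S']
print(VOTE_TO_OPERATORS[frozenset({'correct', 'wrong'})])  # ['MA-S']
```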
Mutation Rate
- The mutation operator flips bits with a certain probability: the mutation rate.
- The strength of a rule is the amount of reward we predict the system will receive if the rule acts.
- The higher the strength, the lower the mutation rate.
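The slides state only that the mutation rate decreases with strength, not the exact relationship; a minimal sketch assuming a simple linear decay (the formula and `base_rate` value are illustrative assumptions):

```python
def mutation_rate(strength, base_rate=0.5):
    """Illustrative only: the rate decreases as rule strength grows.
    A linear decay is assumed here; the slides do not give the formula."""
    strength = max(0.0, min(1.0, strength))  # strengths are mean rewards in [0, 1]
    return base_rate * (1.0 - strength)

print(mutation_rate(0.0))  # 0.5  -> weak rule, aggressive mutation
print(mutation_rate(1.0))  # 0.0  -> strong rule, left untouched
```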
Strength of a Rule
The strength of a rule is the mean of the rewards it receives over time.
Reward function (an alteration of [Vallim et al., GECCO '08]):

    R = 1 − |C_rule Δ C_expected| / |C_rule ∪ C_expected|

where Δ is the symmetric difference:

    A Δ B = { x : (x ∈ A) ⊕ (x ∈ B) }
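The reward R = 1 − |C_rule Δ C_expected| / |C_rule ∪ C_expected| (the union denominator is inferred from the garbled slide; it reproduces the reward table that follows) can be computed directly with Python set operations:

```python
def reward(predicted, expected):
    """R = 1 - |predicted Δ expected| / |predicted ∪ expected|."""
    sym_diff = predicted ^ expected   # symmetric difference
    union = predicted | expected
    return 1 - len(sym_diff) / len(union)

rule = {1, 2}                             # consequent 110 -> classes {1, 2}
print(round(reward(rule, {1, 2}), 2))     # 1.0  (correct)
print(round(reward(rule, {1, 2, 3}), 2))  # 0.67 (subset; the slide truncates to 0.66)
print(round(reward(rule, {1}), 2))        # 0.5  (superset)
print(round(reward(rule, {1, 3}), 2))     # 0.33 (partial-set)
print(round(reward(rule, {3}), 2))        # 0.0  (wrong)
```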
Rule Rewards

Input instance | Expected output | Selected rule | Received vote | Reward
0001           | 1, 2            | ###1 / 110    | Correct       | 1
0101           | 1, 2, 3         | ###1 / 110    | Subset        | 0.66
0111           | 1               | ###1 / 110    | Superset      | 0.50
1111           | 1, 3            | ###1 / 110    | Partial-set   | 0.33
0011           | 3               | ###1 / 110    | Wrong         | 0
Experimental Results
Data sets: two binary datasets in the bioinformatics domain, one from [Chan and Freitas, GECCO '06] and one extracted from [Alves et al., 2009].
Experimental Results
Quality Metrics:
- Accuracy: proportion of correctly predicted labels among the union of predicted and true labels
- Precision: proportion of correctly predicted labels among all predicted labels
- Recall: proportion of correctly predicted labels among all true labels
[Tsoumakas & Katakis, 2007]
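The three example-based metrics above can be sketched as instance-wise averages (Y = true label set, Z = predicted label set; this assumes every instance has at least one true and one predicted label, so no division by zero):

```python
def evaluate(true_sets, pred_sets):
    """Example-based multi-label accuracy, precision, and recall,
    averaged over instances [Tsoumakas & Katakis, 2007]."""
    n = len(true_sets)
    acc = sum(len(y & z) / len(y | z) for y, z in zip(true_sets, pred_sets)) / n
    prec = sum(len(y & z) / len(z) for y, z in zip(true_sets, pred_sets)) / n
    rec = sum(len(y & z) / len(y) for y, z in zip(true_sets, pred_sets)) / n
    return acc, prec, rec

# Toy data (illustrative, not from the paper).
Y = [{1, 2}, {1}, {2, 3}]   # true label sets
Z = [{1, 2}, {1, 2}, {3}]   # predicted label sets
acc, prec, rec = evaluate(Y, Z)
print(round(acc, 2), round(prec, 2), round(rec, 2))  # 0.67 0.83 0.83
```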
Experimental Results
For VLCS, we use 5-fold cross-validation, where the training folds are used to evaluate the rules via the voting mechanism described above.
- A fixed-size population is initialized with the most general possible rules.
- In each generation, every rule is voted on by its matched instances and a reward is assigned.
- The defined mutation operators discover new rules.
- The best rules among the parents and the offspring form the next generation.
- Training stops when the mean strength of the rules decreases over a number of consecutive generations.
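The training loop above can be sketched end-to-end. Everything here is a simplified, self-contained illustration: the mutation policy is a random flip rather than the full vote-guided operator table, and the toy data, patience value, and seed are assumptions.

```python
import random
random.seed(0)

# Minimal components (simplified versions of the earlier slides).
def matches(ant, x):
    return all(a in ('#', b) for a, b in zip(ant, x))

def labels(bits):
    return {i + 1 for i, b in enumerate(bits) if b == '1'}

def reward(pred, exp):
    return 1 - len(pred ^ exp) / len(pred | exp) if pred | exp else 1.0

def strength(rule, data):
    ant, cons = rule
    rs = [reward(labels(cons), exp) for x, exp in data if matches(ant, x)]
    return sum(rs) / len(rs) if rs else 0.0

def mutate(rule):
    # Simplified stand-in for the vote-guided MA-G/MA-S/MC-A/MC-S operators.
    ant, cons = rule
    if random.random() < 0.5:
        i = random.randrange(len(ant))
        ant = ant[:i] + random.choice('01#') + ant[i + 1:]
    else:
        i = random.randrange(len(cons))
        cons = cons[:i] + random.choice('01') + cons[i + 1:]
    return ant, cons

# Generational loop as described on the slide.
def train(population, data, generations=50, patience=5):
    best_mean, stall = float('-inf'), 0
    for _ in range(generations):
        offspring = [mutate(r) for r in population]
        pool = sorted(set(population + offspring),
                      key=lambda r: strength(r, data), reverse=True)
        population = pool[:len(population)]      # best of parents + offspring
        mean = sum(strength(r, data) for r in population) / len(population)
        if mean <= best_mean:
            stall += 1
            if stall >= patience:                # stop on consecutive decline
                break
        else:
            best_mean, stall = mean, 0
    return population

data = [('000', {1, 4}), ('001', {1, 4}), ('110', {2, 4}), ('111', {1, 2})]
pop = train([('###', '0000')], data)             # start from most general rule
print(pop[0])                                    # best surviving rule
```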
Experimental Results
Dataset 1 [Chan and Freitas, GECCO '06]: 135 instances, 152 attributes, two classes; each instance can have one or both of the available class labels.

Method  | Accuracy | Precision | Recall
BR      | 0.89     | 0.89      | 0.87
ML-KNN  | 0.91     | 0.93      | 0.91
VLCS    | 0.89     | 0.89      | 0.89
Experimental Results
Dataset 2, extracted from [Alves et al., 2009]: 7877 proteins, 40 attributes, six classes; each instance can have several of the available class labels.

Method  | Accuracy | Precision | Recall
BR      | 0.78     | 0.77      | 0.78
ML-KNN  | 0.80     | 0.81      | 0.80
VLCS    | 0.81     | 0.83      | 0.82
Conclusion
Guiding the discovery mechanism with prior knowledge, as is done in VLCS, can help us solve practical problems.
Future Work
- A representation for dealing with numeric and nominal datasets.
- Further studies on the scalability and stability of the system are necessary.
- Additional studies on performance under imbalanced and noisy data are also required.
- Improving the evolutionary operators, the guiding mechanism, and rule refinement.
Any Question?
"The most exciting phrase to hear in science, the one that heralds new discoveries, is not 'Eureka!' ('I found it!') but 'That's funny...'"
- Isaac Asimov