Voting-Based Learning Classifier System for Multi-Label Classification
DESCRIPTION
Kaveh Ahmadi-Abhari, Ali Hamzeh, Sattar Hashemi. "Voting Based Learning Classifier System for Multi-Label Classification". IWLCS, 2011.
TRANSCRIPT
Voting-Based Learning Classifier System for Multi-Label Classification
Kaveh Ahmadi-Abhari (Presenter)
Ali Hamzeh
Sattar Hashemi
IWLCS 2011 – Dublin, Ireland, 13th July 2011
Multi-label Classification
Single-label classification: exclusive classes; each example belongs to exactly one class.
Multi-label classification: each instance can belong to more than one class.
Kaveh Ahmadi-Abhari, Shiraz University, Soft Computing Group
[Example image labelled with multiple classes: Sky, People, Sand]
Current Methods
Problem transformation: transform the problem into one or more single-label classification problems.
Algorithm adaptation: adapt single-label classifiers to solve the problem directly.
[Tsoumakas & Katakis, 2007]
Problem Transformation Approaches
Example: Copy Transformation. Each multi-label example (left) is copied once per label, producing single-label examples (right).

Ex. | Label set            Ex. | Label
1   | {λ1, λ4}             1a  | λ1
2   | {λ3, λ4}             1b  | λ4
3   | {λ1}                 2a  | λ3
4   | {λ2, λ3, λ4}         2b  | λ4
                           3   | λ1
                           4a  | λ2
                           4b  | λ3
                           4c  | λ4
[Tsoumakas et al., 2009]
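The copy transformation above can be sketched in a few lines of Python (the function name and the letter-suffix convention are illustrative; the data mirrors the slide's toy example):

```python
def copy_transform(dataset):
    """Copy transformation: duplicate each multi-label example once per
    label it carries, yielding single-label examples.
    dataset: list of (example_id, label_set) pairs."""
    single_label = []
    for ex_id, labels in dataset:
        for i, lam in enumerate(sorted(labels)):
            # Suffix a, b, c, ... distinguishes the copies (only when needed).
            suffix = chr(ord('a') + i) if len(labels) > 1 else ''
            single_label.append((f"{ex_id}{suffix}", lam))
    return single_label

data = [(1, {1, 4}), (2, {3, 4}), (3, {1}), (4, {2, 3, 4})]
print(copy_transform(data))
# [('1a', 1), ('1b', 4), ('2a', 3), ('2b', 4), ('3', 1), ('4a', 2), ('4b', 3), ('4c', 4)]
```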
Algorithm Adaptation Approaches
- Multi-label lazy algorithms: ML-kNN [Zhang & Zhou, PR '07]
- Multi-label decision trees: ADTBoost.MH [De Comité et al., MLDM '03]; Multi-Label C4.5 [Clare & King, LNCS 2168]
- Multi-label kernel methods: Rank-SVM [Elisseeff & Weston, NIPS '02]; ML-SVM [Boutell et al., PR '04]
- Multi-label text categorization: BoosTexter [Schapire & Singer, MLJ '00]; Maximal Margin Labeling [Kazawa et al., NIPS '04]
- Probabilistic generative models [McCallum, AAAI '99] [Ueda & Saito, NIPS '03]
- Neural networks: BP-MLL [Zhang & Zhou, TKDE '06]
Motivation
A lot has been done on classification using LCSs.
Most of these studies address single-label classification problems.
Multi-label classification with LCSs is still in its inception [Vallim et al., IWLCS '08].
Voting-Based Learning Classifier System
How can we guide the discovery mechanism (e.g. the evolutionary operators) in LCSs?
Using the prior knowledge gained from past experiences.
Training instances vote on their matched rules according to how correct each rule is.
These votes serve as a fitness measure.
Voting: Defining Rule Types
How can the given votes describe the quality of the rules accurately?
Define different rule types such that each type describes a quality status the rule might have.
Rule Types
Each rule receives a "correct" or "wrong" vote from each matched training instance, so overall it accumulates a combination of "correct" and "wrong" votes.
Example: in a single-label classification problem, the rule types might simply be correct or wrong.
Votes as Fitness Measure
- The given votes describe the quality of the rules
- Use them as a fitness measure for guiding the discovery mechanism
- For example, a rule with many "wrong" votes should be selected for discovery (mutation) with high probability, so that a more meaningful rule can be obtained
Rules Definition
The antecedent part matches the feature vector.
The consequent part encodes the classes predicted by the rule: one bit per class, where a 1 indicates that the corresponding class is predicted.
Examples (antecedent / consequent): ###1 / 110 and 0011 / 001.
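A minimal sketch of this rule representation in Python (function names are illustrative): a ternary antecedent string where '#' is a wildcard, and a consequent bit string decoded into a label set.

```python
def matches(antecedent, instance):
    """An antecedent bit matches if it is '#' or equals the instance bit."""
    return all(a == '#' or a == x for a, x in zip(antecedent, instance))

def predicted_classes(consequent):
    """Bit i set to '1' means class i+1 is predicted."""
    return {i + 1 for i, b in enumerate(consequent) if b == '1'}

print(matches('###1', '0011'))   # True  (last bit agrees, rest are wildcards)
print(matches('0011', '0111'))   # False (second bit disagrees)
print(predicted_classes('110'))  # {1, 2}
```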
VLCS Vote Types for Multi-label Problem
The multi-label vote types for VLCS:
- Correct
- Subset
- Superset
- Partial-set
- Wrong
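Assuming set semantics for the five vote types above (exact match, proper subset/superset, partial overlap, no overlap), the vote an instance gives a rule can be sketched as:

```python
def vote(predicted, expected):
    """Classify a rule's predicted label set against an instance's
    expected label set, yielding one of the five VLCS vote types."""
    if predicted == expected:
        return 'correct'
    if predicted and predicted < expected:   # proper subset of expected
        return 'subset'
    if predicted > expected:                 # proper superset of expected
        return 'superset'
    if predicted & expected:                 # partial overlap
        return 'partial-set'
    return 'wrong'                           # no overlap at all

print(vote({1, 2}, {1, 2}))     # correct
print(vote({1, 2}, {1, 2, 3}))  # subset
print(vote({1, 2}, {1}))        # superset
print(vote({1, 2}, {1, 3}))     # partial-set
print(vote({1, 2}, {3}))        # wrong
```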
Multi-Label Simple Dataset
[Figure: a toy multi-label dataset on the 3-bit instance space 000-111; marked instances carry the label sets {1, 4}, {2, 4}, {1, 2}, and {1, 3}. Expanded from [Vallim et al., GECCO '08].]
VLCS Voting Options for Multi-label Problem
Correct Rules (C)
Rule 00# / 1001 is correct when it matches instance 000 or 001.
VLCS Voting Options for Multi-label Problem
Wrong Rules (W)
Rule 0#0 / 0010 is wrong when it matches instance 000 or 010.
VLCS Voting Options for Multi-label Problem
Subset Rules
Rule #01 / 1000 is a subset rule when it matches instance 001 or 101: it predicts class {1}, while the expected classes are {1, 4}.
VLCS Voting Options for Multi-label Problem
Superset Rules
Rule #00 / 1101 is a superset rule when it matches instance 000 or 100: it predicts classes {1, 2, 4}, while the expected classes are {1, 4}.
VLCS Voting Options for Multi-label Problem
Partial-set Rules
Rule #1# / 0110 is a partial-set rule when it matches instance 010 or 111: it predicts classes {2, 3}, while the expected classes are {2, 4}.
VLCS Voting Options for Multi-label Problem
Rules might receive different votes over time.
Rule #0# / 1001, for example, is correct for instance 000 but partial-set for instance 101.
Using Stored Prior Knowledge
Information: consider a rule whose received votes are all superset. Then:
- the rule is covering an appropriate area of the problem space,
- but it predicts more classes than expected for its matched instances.
Inference: the number of classes the rule predicts should be reduced.
Discovery Operators
In the discovery mechanism, an evolutionary algorithm with four mutation operators is defined:
Discovery Operators
Mutation operators on rule’s antecedent part
MA-G: generalizes the rule by flipping 0 or 1 bits to #
MA-S: specializes the rule by flipping # bits to 0 or 1
Discovery Operators
Mutation operators on rule’s consequent part
MC-S: shrinks the set of predicted classes by flipping 1 bits to 0
MC-A: adds more predicted classes by flipping 0 bits to 1
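The four operators can be sketched on the string representation (single random bit flips; the helper and the choice of flipping exactly one bit are assumptions, since the slides do not fix how many bits each application flips):

```python
import random

def _flip(s, positions, new_chars):
    """Flip one randomly chosen position of s to a char from new_chars."""
    if not positions:
        return s
    i = random.choice(positions)
    return s[:i] + random.choice(new_chars) + s[i + 1:]

def ma_g(ant):
    """MA-G: generalize the antecedent by flipping a 0/1 bit to '#'."""
    return _flip(ant, [i for i, b in enumerate(ant) if b in '01'], '#')

def ma_s(ant):
    """MA-S: specialize the antecedent by flipping a '#' bit to 0 or 1."""
    return _flip(ant, [i for i, b in enumerate(ant) if b == '#'], '01')

def mc_s(cons):
    """MC-S: shrink the predicted label set by flipping a 1 bit to 0."""
    return _flip(cons, [i for i, b in enumerate(cons) if b == '1'], '0')

def mc_a(cons):
    """MC-A: add a predicted label by flipping a 0 bit to 1."""
    return _flip(cons, [i for i, b in enumerate(cons) if b == '0'], '1')

print(ma_g('0011'))  # stochastic, e.g. '0#11'
```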
Which Discovery Operator?
The votes each rule has received guide which mutation operator acts on it.
Example: a superset rule has wrongly assigned some unexpected classes, so the number of predicted classes should be reduced (MC-S).
Which Discovery Operator?

Received votes           | Activated mutation operator(s)
-------------------------|-------------------------------
Correct                  | MA-G
Subset                   | MC-A
Superset                 | MC-S
Partial-set              | MC-A, MC-S
Wrong                    | MC-A, MC-S
Correct, Subset          | MA-S
Correct, Superset        | MA-G
Correct, Partial-set     | MA-S
Correct, Wrong           | MA-S
Wrong, Subset            | MA-S, MC-A
Wrong, Partial-set       | MA-S
Correct, Subset, Wrong   | MA-S, MA-G
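The table above is a direct lookup from the set of vote types a rule has received to the operators that act on it; a straightforward transcription (covering only the combinations the slide lists):

```python
# Map from the set of received vote types to the activated mutation
# operators, transcribed from the table above.
VOTE_TO_OPERATORS = {
    frozenset({'correct'}):                    ['MA-G'],
    frozenset({'subset'}):                     ['MC-A'],
    frozenset({'superset'}):                   ['MC-S'],
    frozenset({'partial-set'}):                ['MC-A', 'MC-S'],
    frozenset({'wrong'}):                      ['MC-A', 'MC-S'],
    frozenset({'correct', 'subset'}):          ['MA-S'],
    frozenset({'correct', 'superset'}):        ['MA-G'],
    frozenset({'correct', 'partial-set'}):     ['MA-S'],
    frozenset({'correct', 'wrong'}):           ['MA-S'],
    frozenset({'wrong', 'subset'}):            ['MA-S', 'MC-A'],
    frozenset({'wrong', 'partial-set'}):       ['MA-S'],
    frozenset({'correct', 'subset', 'wrong'}): ['MA-S', 'MA-G'],
}

print(VOTE_TO_OPERATORS[frozenset({'superset'})])          # ['MC-S']
print(VOTE_TO_OPERATORS[frozenset({'correct', 'wrong'})])  # ['MA-S']
```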
Mutation Rate
- The mutation operator flips bits with a certain probability: the mutation rate.
- The strength of a rule is the amount of reward we predict the system will receive if the rule acts.
- The higher the strength, the lower the mutation rate.
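The slides state only that the mutation rate decreases with strength, not the exact relationship; a minimal sketch assuming a simple linear decay (the formula and `base_rate` value are illustrative assumptions):

```python
def mutation_rate(strength, base_rate=0.5):
    """Illustrative only: the rate decreases as rule strength grows.
    A linear decay is assumed here; the slides do not give the formula."""
    strength = max(0.0, min(1.0, strength))  # strengths are mean rewards in [0, 1]
    return base_rate * (1.0 - strength)

print(mutation_rate(0.0))  # 0.5  -> weak rule, aggressive mutation
print(mutation_rate(1.0))  # 0.0  -> strong rule, left untouched
```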
Strength of a Rule
The strength of a rule is the mean of the rewards it receives over time.
Reward function (an alteration of [Vallim et al., GECCO '08]):

    R = 1 − |C_rule Δ C_expected| / |C_rule ∪ C_expected|

where Δ is the symmetric difference:

    A Δ B = { x : (x ∈ A) ⊕ (x ∈ B) }
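The reward R = 1 − |C_rule Δ C_expected| / |C_rule ∪ C_expected| (the union denominator is inferred from the garbled slide; it reproduces the reward table that follows) can be computed directly with Python set operations:

```python
def reward(predicted, expected):
    """R = 1 - |predicted Δ expected| / |predicted ∪ expected|."""
    sym_diff = predicted ^ expected   # symmetric difference
    union = predicted | expected
    return 1 - len(sym_diff) / len(union)

rule = {1, 2}                             # consequent 110 -> classes {1, 2}
print(round(reward(rule, {1, 2}), 2))     # 1.0  (correct)
print(round(reward(rule, {1, 2, 3}), 2))  # 0.67 (subset; the slide truncates to 0.66)
print(round(reward(rule, {1}), 2))        # 0.5  (superset)
print(round(reward(rule, {1, 3}), 2))     # 0.33 (partial-set)
print(round(reward(rule, {3}), 2))        # 0.0  (wrong)
```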
Rule Rewards

Input instance | Expected output | Selected rule | Received vote | Reward
0001           | 1, 2            | ###1 / 110    | Correct       | 1
0101           | 1, 2, 3         | ###1 / 110    | Subset        | 0.66
0111           | 1               | ###1 / 110    | Superset      | 0.50
1111           | 1, 3            | ###1 / 110    | Partial-set   | 0.33
0011           | 3               | ###1 / 110    | Wrong         | 0
Experimental Results
Data sets: two binary datasets in the bioinformatics domain, one from [Chan and Freitas, GECCO '06] and one extracted from [Alves et al., 2009].
Experimental Results
Quality Metrics:
- Accuracy: proportion of correctly predicted labels among the union of predicted and true labels
- Precision: proportion of correctly predicted labels among all predicted labels
- Recall: proportion of correctly predicted labels among all true labels
[Tsoumakas & Katakis, 2007]
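The three example-based metrics above can be sketched as instance-wise averages (Y = true label set, Z = predicted label set; this assumes every instance has at least one true and one predicted label, so no division by zero):

```python
def evaluate(true_sets, pred_sets):
    """Example-based multi-label accuracy, precision, and recall,
    averaged over instances [Tsoumakas & Katakis, 2007]."""
    n = len(true_sets)
    acc = sum(len(y & z) / len(y | z) for y, z in zip(true_sets, pred_sets)) / n
    prec = sum(len(y & z) / len(z) for y, z in zip(true_sets, pred_sets)) / n
    rec = sum(len(y & z) / len(y) for y, z in zip(true_sets, pred_sets)) / n
    return acc, prec, rec

# Toy data (illustrative, not from the paper).
Y = [{1, 2}, {1}, {2, 3}]   # true label sets
Z = [{1, 2}, {1, 2}, {3}]   # predicted label sets
acc, prec, rec = evaluate(Y, Z)
print(round(acc, 2), round(prec, 2), round(rec, 2))  # 0.67 0.83 0.83
```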
Experimental Results
For VLCS, we use 5-fold cross-validation, where the training folds are used to evaluate the rules via the voting mechanism described above.
- A fixed-size population is initialized with the most general possible rules.
- In each generation, every rule is voted on by its matched instances and a reward is assigned.
- The defined mutation operators discover new rules.
- The best rules among the parents and the offspring form the next generation.
- Training stops when the mean strength of the rules decreases over a number of consecutive generations.
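The training loop above can be sketched end-to-end. Everything here is a simplified, self-contained illustration: the mutation policy is a random flip rather than the full vote-guided operator table, and the toy data, patience value, and seed are assumptions.

```python
import random
random.seed(0)

# Minimal components (simplified versions of the earlier slides).
def matches(ant, x):
    return all(a in ('#', b) for a, b in zip(ant, x))

def labels(bits):
    return {i + 1 for i, b in enumerate(bits) if b == '1'}

def reward(pred, exp):
    return 1 - len(pred ^ exp) / len(pred | exp) if pred | exp else 1.0

def strength(rule, data):
    ant, cons = rule
    rs = [reward(labels(cons), exp) for x, exp in data if matches(ant, x)]
    return sum(rs) / len(rs) if rs else 0.0

def mutate(rule):
    # Simplified stand-in for the vote-guided MA-G/MA-S/MC-A/MC-S operators.
    ant, cons = rule
    if random.random() < 0.5:
        i = random.randrange(len(ant))
        ant = ant[:i] + random.choice('01#') + ant[i + 1:]
    else:
        i = random.randrange(len(cons))
        cons = cons[:i] + random.choice('01') + cons[i + 1:]
    return ant, cons

# Generational loop as described on the slide.
def train(population, data, generations=50, patience=5):
    best_mean, stall = float('-inf'), 0
    for _ in range(generations):
        offspring = [mutate(r) for r in population]
        pool = sorted(set(population + offspring),
                      key=lambda r: strength(r, data), reverse=True)
        population = pool[:len(population)]      # best of parents + offspring
        mean = sum(strength(r, data) for r in population) / len(population)
        if mean <= best_mean:
            stall += 1
            if stall >= patience:                # stop on consecutive decline
                break
        else:
            best_mean, stall = mean, 0
    return population

data = [('000', {1, 4}), ('001', {1, 4}), ('110', {2, 4}), ('111', {1, 2})]
pop = train([('###', '0000')], data)             # start from most general rule
print(pop[0])                                    # best surviving rule
```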
Experimental Results
Dataset 1 [Chan and Freitas, GECCO '06]: 135 instances, 152 attributes, two classes; each instance can have one or both of the available class labels.

Method  | Accuracy | Precision | Recall
BR      | 0.89     | 0.89      | 0.87
ML-KNN  | 0.91     | 0.93      | 0.91
VLCS    | 0.89     | 0.89      | 0.89
Experimental Results
Dataset 2, extracted from [Alves et al., 2009]: 7877 proteins, 40 attributes, six classes; each instance can have several of the available class labels.

Method  | Accuracy | Precision | Recall
BR      | 0.78     | 0.77      | 0.78
ML-KNN  | 0.80     | 0.81      | 0.80
VLCS    | 0.81     | 0.83      | 0.82
Conclusion
Guiding the discovery mechanism with prior knowledge, as is done in VLCS, can help us solve practical problems.
Future Work
- A representation for dealing with numeric and nominal datasets.
- Further studies on the scalability and stability of the system are necessary.
- Additional studies on performance under imbalanced and noisy data are also required.
- Improving the evolutionary operators, the guiding mechanism, and rule refinement.
Any Question?
"The most exciting phrase to hear in science, the one that heralds new discoveries, is not 'Eureka!' ('I found it!') but 'That's funny...'"
- Isaac Asimov