Voting-Based Learning Classifier System for Multi-Label Classification

Kaveh Ahmadi-Abhari (Presenter), Ali Hamzeh, Sattar Hashemi. IWLCS 2011, Dublin, Ireland, 13th July 2011

DESCRIPTION

Kaveh Ahmadi-Abhari, Ali Hamzeh, Sattar Hashemi. "Voting Based Learning Classifier System for Multi-Label Classification". IWLCS, 2011

TRANSCRIPT

Page 1: Voting Based Learning Classifier System for Multi-Label Classification

Voting-Based Learning Classifier System

for multi-label classification

Kaveh Ahmadi-Abhari (Presenter)

Ali Hamzeh

Sattar Hashemi

IWLCS 2011 – Dublin, Ireland, 13th July 2011

Page 2: Voting Based Learning Classifier System for Multi-Label Classification

Multi-label Classification

Single Label Classification

Exclusive classes: each example belongs to one class

Multi-label Classification

Each instance can belong to more than one class

Kaveh Ahmadi-Abhari, Shiraz University, Soft Computing Group

Page 3: Voting Based Learning Classifier System for Multi-Label Classification

[Figure: example image annotated with the labels Sky, People and Sand]

Page 4: Voting Based Learning Classifier System for Multi-Label Classification

Current Methods

• Problem Transformation: transform the problem into one or more single-label classification problems

• Algorithm Adaptation: adapt single-label classifiers to solve the multi-label problem directly

[Tsoumakas & Katakis, 2007]

Page 5: Voting Based Learning Classifier System for Multi-Label Classification

Problem Transformation Approaches

Ex.  Label set
1    {λ1, λ4}
2    {λ3, λ4}
3    {λ1}
4    {λ2, λ3, λ4}

Copy Transformation

Ex.  Label
1a   λ1
1b   λ4
2a   λ3
2b   λ4
3    λ1
4a   λ2
4b   λ3
4c   λ4

[Tsoumakas et al., 2009]
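The copy transformation can be sketched in a few lines of Python. This is only an illustration of the idea on the four-example dataset above; the function name `copy_transformation` and the use of integers 1 to 4 for the labels λ1 to λ4 are assumptions made for the example.

```python
def copy_transformation(dataset):
    """Replace each multi-label example by one single-label copy per label."""
    transformed = []
    for example_id, labels in dataset:
        for label in sorted(labels):
            transformed.append((example_id, label))
    return transformed


# The four examples from the slide; integers stand for the labels λ1..λ4.
dataset = [
    (1, {1, 4}),
    (2, {3, 4}),
    (3, {1}),
    (4, {2, 3, 4}),
]

single_label = copy_transformation(dataset)
for example_id, label in single_label:
    print(example_id, label)
```

The eight resulting pairs correspond to the copies 1a, 1b, 2a, 2b, 3, 4a, 4b and 4c.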

Page 6: Voting Based Learning Classifier System for Multi-Label Classification

Algorithm Adaptation Approaches

Multi-label lazy algorithm: ML-kNN [Zhang & Zhou, PRJ07]

Multi-label decision trees: ADTBoost.MH [DeComité et al., MLDM03], Multi-Label C4.5 [Clare & King, LNCS2168]

Multi-label kernel methods: Rank-SVM [Elisseeff & Weston, NIPS02], ML-SVM [Boutell et al., PR04]

Multi-label text categorization algorithms: BoosTexter [Schapire & Singer, MLJ00], Maximal Margin Labeling [Kazawa et al., NIPS04], probabilistic generative models [McCallum, AAAI99] [Ueda & Saito, NIPS03]

BP-MLL [Zhang & Zhou, TKDE06]

Page 7: Voting Based Learning Classifier System for Multi-Label Classification

Motivation

Much work has been done on classification using LCSs.

Most of these studies address single-label classification problems.

Multi-label classification with LCSs is still in its infancy [Vallim et al., IWLCS 08]

Page 11: Voting Based Learning Classifier System for Multi-Label Classification

Voting Based Learning Classifier System

How can we guide the discovery mechanism (e.g. evolutionary operators) in LCSs?

Use the prior knowledge gained from past experience.

Training instances vote on the rules that match them, according to how correct each rule is.

The votes serve as a fitness measure.

Page 12: Voting Based Learning Classifier System for Multi-Label Classification

Voting: Defining Rule Types

How can the given votes describe the quality of the rules accurately?

Define rule types, each of which describes a quality status a rule might have.

Page 13: Voting Based Learning Classifier System for Multi-Label Classification

Rule Types

Each rule might receive a “correct” or “wrong” vote from each matched training instance.

Overall, a rule receives a combination of “correct” and “wrong” votes from its matched training instances.

Example: in a single-label classification problem, the rule types might be correct or wrong.

Page 14: Voting Based Learning Classifier System for Multi-Label Classification

Votes as Fitness Measure

• The given votes describe the quality of the rules.

• They are used as a fitness measure for guiding the discovery mechanism.

• For example, a rule with more “wrong” votes should be passed to the discovery mechanism with a high probability, so that a meaningful rule can be reached.

Page 15: Voting Based Learning Classifier System for Multi-Label Classification

Rules Definition

The antecedent part is matched against the feature vector.

The consequent part holds the classes predicted by the rule.

There is one bit for each class in the consequent part. A value of 1 in a bit indicates that the respective class is predicted.

Antecedent / Consequent
###1 / 110
0011 / 001
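The rule representation can be illustrated with a minimal Python sketch, assuming rules are stored as plain strings, `#` is the usual don't-care symbol, and classes are numbered from 1 in bit order. The helper names `matches` and `predicted_classes` are hypothetical.

```python
def matches(antecedent, instance):
    """True if every antecedent position is '#' or equals the instance bit."""
    return len(antecedent) == len(instance) and all(
        a == "#" or a == x for a, x in zip(antecedent, instance)
    )


def predicted_classes(consequent):
    """Decode the consequent bit string into a set of 1-based class indices."""
    return {i + 1 for i, bit in enumerate(consequent) if bit == "1"}


print(matches("###1", "0101"))    # the first rule matches any instance ending in 1
print(matches("0011", "0101"))    # the second rule matches only 0011
print(predicted_classes("110"))   # classes 1 and 2
print(predicted_classes("001"))   # class 3
```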

Page 16: Voting Based Learning Classifier System for Multi-Label Classification

VLCS Vote Types for Multi-label Problem

Multi-label vote types for VLCS:

Correct

Subset

Superset

Partial-set

Wrong

Page 17: Voting Based Learning Classifier System for Multi-Label Classification

Multi-Label Simple Dataset

[Figure: the eight 3-bit instances 000 to 111, grouped into regions labelled with the class sets {1, 4}, {2, 4}, {1, 2} and {1, 3}]

Expanded from [Vallim et al., GECCO’ 08]

Page 18: Voting Based Learning Classifier System for Multi-Label Classification

VLCS Voting Options for Multi-label Problem

Correct Rules (C)

00# / 1001

• Is correct when it matches 000 or 001

Page 19: Voting Based Learning Classifier System for Multi-Label Classification

VLCS Voting Options for Multi-label Problem

Wrong Rules (W)

0#0 / 0010

• Is wrong when it matches 000 or 010

Page 21: Voting Based Learning Classifier System for Multi-Label Classification

VLCS Voting Options for Multi-label Problem

Subset Rules

#01 / 1000

• Is subset when it matches 001 or 101

Expected classes: 1, 4

Page 23: Voting Based Learning Classifier System for Multi-Label Classification

VLCS Voting Options for Multi-label Problem

Superset Rules

#00 / 1101

• Is superset when it matches 001 or 101

Expected classes: 1, 4

Page 25: Voting Based Learning Classifier System for Multi-Label Classification

VLCS Voting Options for Multi-label Problem

Partial-set Rules

#1# / 0110

• Is partial-set when it matches 010 or 111

Expected classes: 2, 4

Page 28: Voting Based Learning Classifier System for Multi-Label Classification

VLCS Voting Options for Multi-label Problem

A rule may receive different votes over time.

#0# / 1001

Is correct for instance 000

Is partial-set for instance 101
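The five vote types can be read as set relations between the classes a rule predicts and the classes expected for the matched instance. The sketch below encodes that reading in Python. The reading is inferred from the type names and the examples in the deck and is not stated as a formal definition there, and the function name `vote_type` is hypothetical. The sample calls use the rule ###1 / 110 (classes 1 and 2) from the Rules Rewards slide.

```python
def vote_type(predicted, expected):
    """Classify a vote by comparing predicted and expected class sets."""
    predicted, expected = set(predicted), set(expected)
    if predicted == expected:
        return "correct"
    if not predicted & expected:
        return "wrong"          # no class in common
    if predicted < expected:
        return "subset"         # rule predicts only some of the expected classes
    if predicted > expected:
        return "superset"       # rule predicts all expected classes plus extras
    return "partial-set"        # some overlap, but misses and extras on both sides


rule_classes = {1, 2}
for expected in ({1, 2}, {1, 2, 3}, {1}, {1, 3}, {3}):
    print(sorted(expected), vote_type(rule_classes, expected))
```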

Page 29: Voting Based Learning Classifier System for Multi-Label Classification

Using Stored Prior Knowledge

Consider a rule whose received votes are all superset.

Information:

The rule covers an appropriate area of the problem.

The rule predicts more classes than expected for its matched input instances.

Inference:

The number of classes the rule predicts should be reduced.

Page 30: Voting Based Learning Classifier System for Multi-Label Classification

Discovery Operators

In the discovery mechanism an evolutionary algorithm with four mutation operators is defined:

Page 31: Voting Based Learning Classifier System for Multi-Label Classification

Discovery Operators

Mutation operators on rule’s antecedent part

MA-G: generalizes the rule by flipping 0 or 1 bits to #

MA-S: specializes the rule by flipping # bits to 1 or 0

Page 32: Voting Based Learning Classifier System for Multi-Label Classification

Discovery Operators

Mutation operators on rule’s consequent part

MC-S: reduces the number of predicted classes by flipping 1 bits to 0

MC-A: adds classes to the predicted classes by flipping 0 bits to 1
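The four mutation operators can be sketched as simple string transformations. This is an illustrative Python sketch, not the authors' implementation: rules are assumed to be strings, each eligible bit is flipped independently with probability `rate`, and MA-S is assumed to pick 0 or 1 uniformly at random (the slides do not say how the new value is chosen). The helper `_mutate` is hypothetical.

```python
import random


def _mutate(bits, candidates, replacement, rate, rng):
    """Replace each bit found in `candidates` with probability `rate`."""
    out = []
    for bit in bits:
        if bit in candidates and rng.random() < rate:
            out.append(replacement(rng))
        else:
            out.append(bit)
    return "".join(out)


def ma_g(antecedent, rate, rng):
    """MA-G: generalize by turning 0/1 bits into '#'."""
    return _mutate(antecedent, "01", lambda r: "#", rate, rng)


def ma_s(antecedent, rate, rng):
    """MA-S: specialize by turning '#' into 0 or 1 (random choice is an assumption)."""
    return _mutate(antecedent, "#", lambda r: r.choice("01"), rate, rng)


def mc_s(consequent, rate, rng):
    """MC-S: remove predicted classes by turning 1 bits into 0."""
    return _mutate(consequent, "1", lambda r: "0", rate, rng)


def mc_a(consequent, rate, rng):
    """MC-A: add predicted classes by turning 0 bits into 1."""
    return _mutate(consequent, "0", lambda r: "1", rate, rng)


rng = random.Random(0)
print(ma_g("0011", 0.5, rng), ma_s("###1", 0.5, rng))
print(mc_s("110", 0.5, rng), mc_a("110", 0.5, rng))
```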

Page 34: Voting Based Learning Classifier System for Multi-Label Classification

Which Discovery Operator?

The votes each rule has received guide which mutation operator should act.

Superset rule: it has wrongly assigned some non-expected classes, so the number of predicted classes should be reduced (MC-S).

Page 35: Voting Based Learning Classifier System for Multi-Label Classification

Which Discovery Operator?

Received Votes           Activated Mutation Operator
Correct                  MA-G
Subset                   MC-A
Superset                 MC-S
Partial-Set              MC-A, MC-S
Wrong                    MC-A, MC-S
Correct, Subset          MA-S
Correct, Superset        MA-G
Correct, Partial-Set     MA-S
Correct, Wrong           MA-S
Wrong, Subset            MA-S, MC-A
Wrong, Partial-Set       MA-S
Correct, Subset, Wrong   MA-S, MA-G
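The table above can be held as a lookup keyed by the set of vote types a rule has received. The Python sketch below is one way to do that; the names `OPERATORS_BY_VOTES` and `operators_for` are hypothetical, and returning an empty tuple for combinations the table does not list is an assumption, since the slide does not cover them.

```python
# Lookup from the set of received vote types to the activated operators,
# copied row by row from the table.
OPERATORS_BY_VOTES = {
    frozenset({"correct"}): ("MA-G",),
    frozenset({"subset"}): ("MC-A",),
    frozenset({"superset"}): ("MC-S",),
    frozenset({"partial-set"}): ("MC-A", "MC-S"),
    frozenset({"wrong"}): ("MC-A", "MC-S"),
    frozenset({"correct", "subset"}): ("MA-S",),
    frozenset({"correct", "superset"}): ("MA-G",),
    frozenset({"correct", "partial-set"}): ("MA-S",),
    frozenset({"correct", "wrong"}): ("MA-S",),
    frozenset({"wrong", "subset"}): ("MA-S", "MC-A"),
    frozenset({"wrong", "partial-set"}): ("MA-S",),
    frozenset({"correct", "subset", "wrong"}): ("MA-S", "MA-G"),
}


def operators_for(votes):
    """Return the operators for the distinct vote types in `votes`.

    Combinations not listed in the table yield an empty tuple (an assumption).
    """
    return OPERATORS_BY_VOTES.get(frozenset(votes), ())


print(operators_for(["correct", "superset", "correct"]))
print(operators_for(["wrong", "subset"]))
```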

Page 36: Voting Based Learning Classifier System for Multi-Label Classification

Mutation Rate

• A mutation operator flips each bit with a probability called the mutation rate.

• The strength of a rule is the amount of reward we predict the system will receive if the rule acts.

• The higher the strength, the lower the mutation rate.
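The slides do not give the exact mapping from strength to mutation rate. As a purely illustrative sketch, the Python function below assumes strength lies in [0, 1] and uses a linear decrease. Both the linear form and the `max_rate` parameter are assumptions, chosen only to show the inverse relationship.

```python
def mutation_rate(strength, max_rate=0.5):
    """Hypothetical mapping: the rate falls linearly as strength rises.

    `max_rate` is the rate applied to a rule with strength 0 (an assumed value).
    """
    strength = min(max(strength, 0.0), 1.0)  # clamp to [0, 1]
    return max_rate * (1.0 - strength)


for s in (0.0, 0.2, 0.8, 1.0):
    print(s, mutation_rate(s))
```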

Page 38: Voting Based Learning Classifier System for Multi-Label Classification

Strength of a Rule

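The mean of the rewards the rule gets over time.

Reward function:

$$R = 1 - \frac{|C_{rule} \,\Delta\, C_{expected}|}{|C_{rule} \cup C_{expected}|}$$

where $C_{rule}$ is the set of classes predicted by the rule, $C_{expected}$ is the set of expected classes, and $\Delta$ is the symmetric difference:

$$A \,\Delta\, B = \{x : (x \in A) \oplus (x \in B)\}$$

Alteration of [Vallim et al., GECCO’ 08]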

Page 39: Voting Based Learning Classifier System for Multi-Label Classification

Rules Rewards

Input Instance   Expected Output   Selected Rule   Received Vote   Reward
0001             1, 2              ###1 / 110      Correct         1
0101             1, 2, 3           ###1 / 110      Subset          0.66
0111             1                 ###1 / 110      Superset        0.50
1111             1, 3              ###1 / 110      Partial-set     0.33
0011             3                 ###1 / 110      Wrong           0
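The rewards in this table can be reproduced by reading the reward as one minus the size of the symmetric difference between predicted and expected classes, divided by the size of their union. The Python sketch below computes it for the rule ###1 / 110 (classes 1 and 2) and then takes the mean as the rule's strength. The function name `reward` and the value returned when both sets are empty are assumptions.

```python
def reward(predicted, expected):
    """1 - |predicted Δ expected| / |predicted ∪ expected|."""
    predicted, expected = set(predicted), set(expected)
    union = predicted | expected
    if not union:
        return 1.0  # both sets empty: treated as a perfect match (assumption)
    return 1.0 - len(predicted ^ expected) / len(union)


rule_classes = {1, 2}
expected_outputs = [{1, 2}, {1, 2, 3}, {1}, {1, 3}, {3}]

rewards = [reward(rule_classes, expected) for expected in expected_outputs]
for expected, r in zip(expected_outputs, rewards):
    print(sorted(expected), round(r, 2))

# Strength: the mean of the rewards the rule has received so far.
strength = sum(rewards) / len(rewards)
print("strength:", round(strength, 2))
```

Note that 2/3 rounds to 0.67, whereas the table lists the truncated value 0.66.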

Page 40: Voting Based Learning Classifier System for Multi-Label Classification

Experimental Results

Data sets: two binary datasets in the bioinformatics domain

[Chan and Freitas, GECCO’ 06]

Extracted from [Alves et al., 2009]

Page 41: Voting Based Learning Classifier System for Multi-Label Classification

Experimental Results

Quality Metrics:

• Accuracy: proportion of correctly predicted classes among all predicted or true classes

• Precision: proportion of correctly predicted classes among all predicted classes

• Recall: proportion of correctly predicted classes among all true classes

[Tsoumakas & Katakis, 2007]
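These three metrics are commonly computed per instance and then averaged over the dataset. The Python sketch below follows that example-based form. The function name `example_based_metrics`, the toy data, and the conventions for empty sets are assumptions made for illustration.

```python
def example_based_metrics(true_sets, predicted_sets):
    """Return (accuracy, precision, recall), each averaged over all instances."""
    n = len(true_sets)
    accuracy = precision = recall = 0.0
    for y, z in zip(true_sets, predicted_sets):
        y, z = set(y), set(z)
        hits = len(y & z)
        # Conventions for empty sets are assumptions, not taken from the slides.
        accuracy += hits / len(y | z) if (y | z) else 1.0
        precision += hits / len(z) if z else 0.0
        recall += hits / len(y) if y else 0.0
    return accuracy / n, precision / n, recall / n


true_sets = [{1, 2}, {3}]
predicted_sets = [{1}, {3}]
acc, prec, rec = example_based_metrics(true_sets, predicted_sets)
print(acc, prec, rec)
```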

Page 42: Voting Based Learning Classifier System for Multi-Label Classification

Experimental Results

For the VLCS, we use 5-fold cross-validation in which the training part is used to evaluate the rules using the voting mechanism described above.

The fixed-size population initially consists of the most general possible rules.

In each generation, each rule is voted on by its matched instances and a reward is assigned.

The defined mutation operators are applied to discover new rules.

The best rules among the parents and the offspring are combined to make the next generation.

We stop the training phase if the mean strength of the rules decreases over a number of consecutive generations.
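Two steps of this procedure, survivor selection and the stopping rule, can be sketched in Python. The sketch assumes that each rule is a `(rule_string, strength)` pair, that "best" means highest strength, and that training stops after `patience` consecutive decreases of the mean strength. The names `next_generation`, `should_stop` and `patience` are hypothetical.

```python
def next_generation(parents, offspring, size):
    """Keep the `size` strongest rules among parents and offspring."""
    pool = parents + offspring
    return sorted(pool, key=lambda rule: rule[1], reverse=True)[:size]


def should_stop(mean_strengths, patience=3):
    """True if the mean strength fell in each of the last `patience` generations."""
    if len(mean_strengths) <= patience:
        return False
    recent = mean_strengths[-(patience + 1):]
    return all(later < earlier for earlier, later in zip(recent, recent[1:]))


parents = [("###1 / 110", 0.2), ("0011 / 001", 0.9)]
offspring = [("##11 / 100", 0.5)]
print(next_generation(parents, offspring, size=2))
print(should_stop([0.5, 0.6, 0.55, 0.5, 0.45]))
```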

Page 43: Voting Based Learning Classifier System for Multi-Label Classification

Experimental Results

[Chan and Freitas, GECCO’ 06]

135 instances

152 attributes

Two classes: each instance can have one or both of the available class labels.

Method   Accuracy   Precision   Recall
BR       0.89       0.89        0.87
ML-kNN   0.91       0.93        0.91
VLCS     0.89       0.89        0.89

Page 44: Voting Based Learning Classifier System for Multi-Label Classification

Experimental Results

Extracted from [Alves et al., 2009]

7877 proteins

40 attributes

Six classes: each instance can have some of the available class labels.

Method   Accuracy   Precision   Recall
BR       0.78       0.77        0.78
ML-kNN   0.80       0.81        0.80
VLCS     0.81       0.83        0.82

Page 45: Voting Based Learning Classifier System for Multi-Label Classification

Conclusion

Guiding the discovery mechanism with prior knowledge, as is done in VLCS, can help us solve applied problems.

Page 46: Voting Based Learning Classifier System for Multi-Label Classification

Future Work

A representation for dealing with numeric and nominal datasets.

Further studies on the scalability and stability of the system are necessary.

Additional studies on system performance in dealing with imbalanced data and noise are also required.

Improving the evolutionary operators, the guiding mechanism and rule refinement.

Page 47: Voting Based Learning Classifier System for Multi-Label Classification

Any Questions?

The most exciting phrase to hear in science, the one that heralds new discoveries is not “Eureka”! (I found it!) but “That's funny...”

- Isaac Asimov