![Page 1: Classifier ensembles: Does the combination rule matter? Ludmila Kuncheva School of Computer Science Bangor University, UK l.i.kuncheva@bangor.ac.uk](https://reader035.vdocuments.mx/reader035/viewer/2022062720/56649f0c5503460f94c1fca4/html5/thumbnails/1.jpg)
Classifier ensembles: Does the combination rule matter?
Ludmila Kuncheva
School of Computer Science
Bangor University, UK
![Page 2: Classifier ensembles: Does the combination rule matter? Ludmila Kuncheva School of Computer Science Bangor University, UK l.i.kuncheva@bangor.ac.uk](https://reader035.vdocuments.mx/reader035/viewer/2022062720/56649f0c5503460f94c1fca4/html5/thumbnails/2.jpg)
[Diagram: classifier ensemble. Feature values (object description) feed several classifiers; a combiner merges their outputs into a class label.]
![Page 3: Classifier ensembles: Does the combination rule matter? Ludmila Kuncheva School of Computer Science Bangor University, UK l.i.kuncheva@bangor.ac.uk](https://reader035.vdocuments.mx/reader035/viewer/2022062720/56649f0c5503460f94c1fca4/html5/thumbnails/3.jpg)
Congratulations!
The Netflix Prize sought to substantially improve the accuracy of predictions about how much someone is going to enjoy a movie based on their movie preferences.
On September 21, 2009 we awarded the $1M Grand Prize to team “BellKor’s Pragmatic Chaos”. Read about their algorithm, checkout team scores on the Leaderboard, and join the discussions on the Forum.
We applaud all the contributors to this quest, which improves our ability to connect people to the movies they love.
[Diagram: classifier ensemble. Feature values (object description) feed several classifiers; a combiner merges their outputs into a class label.]
![Page 4: Classifier ensembles: Does the combination rule matter? Ludmila Kuncheva School of Computer Science Bangor University, UK l.i.kuncheva@bangor.ac.uk](https://reader035.vdocuments.mx/reader035/viewer/2022062720/56649f0c5503460f94c1fca4/html5/thumbnails/4.jpg)
Cited 7194 times by 28 July 2013 (Google Scholar)
[Diagram: classifier ensemble. Feature values (object description) feed several classifiers; a combiner merges their outputs into a class label.]
![Page 5: Classifier ensembles: Does the combination rule matter? Ludmila Kuncheva School of Computer Science Bangor University, UK l.i.kuncheva@bangor.ac.uk](https://reader035.vdocuments.mx/reader035/viewer/2022062720/56649f0c5503460f94c1fca4/html5/thumbnails/5.jpg)
Saso Dzeroski
David Hand
S. Dzeroski, and B. Zenko. (2004) Is combining classifiers better than selecting the best one? Machine Learning, 54, 255-273.
David J. Hand (2006) Classifier technology and the illusion of progress, Statist. Sci. 21 (1), 1-14.
Classifier combination? Hmmmm…..
We are kidding ourselves; there is no real progress in spite of ensemble methods.
Chances are that the single best classifier will be better than the ensemble.
![Page 6: Classifier ensembles: Does the combination rule matter? Ludmila Kuncheva School of Computer Science Bangor University, UK l.i.kuncheva@bangor.ac.uk](https://reader035.vdocuments.mx/reader035/viewer/2022062720/56649f0c5503460f94c1fca4/html5/thumbnails/6.jpg)
Quo Vadis?
"combining classifiers" OR "classifier combination" OR "classifier ensembles" OR "ensemble of classifiers" OR "combining multiple classifiers" OR "committee of classifiers" OR "classifier committee" OR "committees of neural networks" OR "consensus aggregation" OR "mixture of experts" OR "bagging predictors" OR adaboost OR (( "random subspace" OR "random forest" OR "rotation forest" OR boosting) AND "machine learning")
![Page 7: Classifier ensembles: Does the combination rule matter? Ludmila Kuncheva School of Computer Science Bangor University, UK l.i.kuncheva@bangor.ac.uk](https://reader035.vdocuments.mx/reader035/viewer/2022062720/56649f0c5503460f94c1fca4/html5/thumbnails/7.jpg)
[Figure: hype cycle curve, visibility against time. Phases in order: naive euphoria, peak of inflated expectations, trough of disillusionment, slope of enlightenment, asymptote of reality.]
Gartner’s Hype Cycle: a typical evolution pattern of a new technology
Where are we?...
![Page 8: Classifier ensembles: Does the combination rule matter? Ludmila Kuncheva School of Computer Science Bangor University, UK l.i.kuncheva@bangor.ac.uk](https://reader035.vdocuments.mx/reader035/viewer/2022062720/56649f0c5503460f94c1fca4/html5/thumbnails/8.jpg)
[Figure: per mil of published papers on classifier ensembles (y-axis, 0 to 0.35) against time (x-axis, 1990 to 2010). Journal labels along the curve, in order: IEEE TSMC, IEEE TPAMI, NN, ML, IEEE TPAMI, IEEE TPAMI, ML, IEEE TPAMI, ML, JASA, ML, IJCV, PR, IEEE TPAMI, IEEE TPAMI, JAE, PPL, PPL, JTB, CC.]
(6) IEEE TPAMI = IEEE Transactions on Pattern Analysis and Machine Intelligence
IEEE TSMC = IEEE Transactions on Systems, Man and Cybernetics
JASA = Journal of the American Statistical Association
IJCV = International Journal of Computer Vision
JTB = Journal of Theoretical Biology
(2) PPL = Protein and Peptide Letters
JAE = Journal of Animal Ecology
PR = Pattern Recognition
(4) ML = Machine Learning
NN = Neural Networks
CC = Cerebral Cortex
top cited paper is from…
application paper
![Page 9: Classifier ensembles: Does the combination rule matter? Ludmila Kuncheva School of Computer Science Bangor University, UK l.i.kuncheva@bangor.ac.uk](https://reader035.vdocuments.mx/reader035/viewer/2022062720/56649f0c5503460f94c1fca4/html5/thumbnails/9.jpg)
[Figure: number of citations (y-axis, 0 to 4500) against time (x-axis, 1990 to 2012), one curve per paper:]
[ML] Bagging predictors
[IEEE TPAMI] On combining classifiers
[ML] Random forests
[IJCV] Robust real-time face detection
![Page 10: Classifier ensembles: Does the combination rule matter? Ludmila Kuncheva School of Computer Science Bangor University, UK l.i.kuncheva@bangor.ac.uk](https://reader035.vdocuments.mx/reader035/viewer/2022062720/56649f0c5503460f94c1fca4/html5/thumbnails/10.jpg)
International Workshop on Multiple Classifier Systems
2000 – 2013 - continuing
![Page 11: Classifier ensembles: Does the combination rule matter? Ludmila Kuncheva School of Computer Science Bangor University, UK l.i.kuncheva@bangor.ac.uk](https://reader035.vdocuments.mx/reader035/viewer/2022062720/56649f0c5503460f94c1fca4/html5/thumbnails/11.jpg)
Levels of questions
[Diagram: Data set, Features, Classifier 1, Classifier 2, …, Classifier L, Combiner]
A Combination level
• selection or fusion?
• voting or another combination method?
• trainable or non-trainable combiner?
B Classifier level
• same or different classifiers?
• decision trees, neural networks or other?
• how many?
C Feature level
• all features or subsets of features?
• random or selected subsets?
D Data level
• independent/dependent bootstrap samples?
• selected data sets?
![Page 12: Classifier ensembles: Does the combination rule matter? Ludmila Kuncheva School of Computer Science Bangor University, UK l.i.kuncheva@bangor.ac.uk](https://reader035.vdocuments.mx/reader035/viewer/2022062720/56649f0c5503460f94c1fca4/html5/thumbnails/12.jpg)
[Diagram axes: number of classifiers L (starting at 1) against strength of classifiers]
The perfect classifier
• 3-8 classifiers
• heterogeneous
• trained combiner (stacked generalisation)

• 100+ classifiers
• same model
• non-trained combiner (bagging, boosting, etc.)

Large ensemble of nearly identical classifiers - REDUNDANCY
Small ensembles of weak classifiers - INSUFFICIENCY?
Must engineer diversity…

How about here?
• 30-50 classifiers
• same or different models?
• trained or non-trained combiner?
• selection or fusion?
• IS IT WORTH IT?
![Page 13: Classifier ensembles: Does the combination rule matter? Ludmila Kuncheva School of Computer Science Bangor University, UK l.i.kuncheva@bangor.ac.uk](https://reader035.vdocuments.mx/reader035/viewer/2022062720/56649f0c5503460f94c1fca4/html5/thumbnails/13.jpg)
[Diagram axes: number of classifiers L (starting at 1) against strength of classifiers]
The perfect classifier
• 3-8 classifiers
• heterogeneous
• trained combiner (stacked generalisation)

• 100+ classifiers
• same model
• non-trained combiner (bagging, boosting, etc.)

Large ensemble of nearly identical classifiers - REDUNDANCY
Small ensembles of weak classifiers - INSUFFICIENCY
Must engineer diversity…

• 30-50 classifiers
• same or different models?
• trained or non-trained combiner?
• selection or fusion?
• IS IT WORTH IT?

Diversity is absolutely CRUCIAL!
Diversity is pretty impossible…
![Page 14: Classifier ensembles: Does the combination rule matter? Ludmila Kuncheva School of Computer Science Bangor University, UK l.i.kuncheva@bangor.ac.uk](https://reader035.vdocuments.mx/reader035/viewer/2022062720/56649f0c5503460f94c1fca4/html5/thumbnails/14.jpg)
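Label outputs: classifiers 1, 2, 3 take x and return class labels, e.g. $\omega_1$, $\omega_2$, $\omega_1$.
Continuous-valued outputs: classifiers 1, 2, 3 take x and return supports for the classes $\omega_1$, $\omega_2$, collected in a decision profile with entries such as $P_3(\omega_2 \mid \mathbf{x})$.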
![Page 15: Classifier ensembles: Does the combination rule matter? Ludmila Kuncheva School of Computer Science Bangor University, UK l.i.kuncheva@bangor.ac.uk](https://reader035.vdocuments.mx/reader035/viewer/2022062720/56649f0c5503460f94c1fca4/html5/thumbnails/15.jpg)
Ensemble (label outputs, R,G,B)
Object: 204 R, 102 G, 54 B
Classifier outputs: Red, Blue, Red, Red, Green, Red
Majority vote: Red
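The vote on this slide can be reproduced in a few lines. This is only a minimal sketch: the function name `plurality_vote` is made up for illustration, and the tie-breaking rule (first label seen wins) is an assumption, since the slide does not specify one.

```python
from collections import Counter


def plurality_vote(labels):
    """Return the label with the most votes.

    Ties go to the label that appears first in `labels`
    (an assumption; the slide gives no tie-breaking rule).
    """
    counts = Counter(labels)  # keeps first-seen order of labels
    return max(counts, key=counts.get)


# The six label outputs shown on the slide.
votes = ["Red", "Blue", "Red", "Red", "Green", "Red"]
print(plurality_vote(votes))  # Red
```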
![Page 16: Classifier ensembles: Does the combination rule matter? Ludmila Kuncheva School of Computer Science Bangor University, UK l.i.kuncheva@bangor.ac.uk](https://reader035.vdocuments.mx/reader035/viewer/2022062720/56649f0c5503460f94c1fca4/html5/thumbnails/16.jpg)
Ensemble (label outputs, R,G,B)
Object: 200 R, 219 G, 190 B
Classifier outputs: Red, Blue, Red, Red, Green, Red
Majority vote: Red
Classifier weights: 0.05 0.50 0.02 0.10 0.70 0.10
Weighted support: R = 0.27, G = 0.70, B = 0.50
Weighted majority vote: Green
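The weighted vote adds up each classifier's weight under the label it outputs, which is how the same six labels can flip from Red to Green. A minimal sketch using the slide's labels and weights; `weighted_vote` is a hypothetical helper name, not something from the talk.

```python
from collections import defaultdict


def weighted_vote(labels, weights):
    """Sum the weight of every classifier under the label it voted for.

    Returns the winning label and the per-label support.
    """
    support = defaultdict(float)
    for label, weight in zip(labels, weights):
        support[label] += weight
    winner = max(support, key=support.get)
    return winner, dict(support)


# Labels and weights as shown on the slide, in classifier order.
votes = ["Red", "Blue", "Red", "Red", "Green", "Red"]
weights = [0.05, 0.50, 0.02, 0.10, 0.70, 0.10]

winner, support = weighted_vote(votes, weights)
print(winner)  # Green
```

The four Red voters carry only 0.27 of weight in total, so the single Green voter with weight 0.70 wins.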
![Page 17: Classifier ensembles: Does the combination rule matter? Ludmila Kuncheva School of Computer Science Bangor University, UK l.i.kuncheva@bangor.ac.uk](https://reader035.vdocuments.mx/reader035/viewer/2022062720/56649f0c5503460f94c1fca4/html5/thumbnails/17.jpg)
Ensemble (label outputs, R,G,B)
Classifier outputs: Red, Blue, Red, Red, Green, Red
RBRRGR → Classifier → Green
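One way to read "the combiner is a classifier" for label outputs is a lookup table in the spirit of BKS, which the talk lists among the combination rules: each observed string of labels is mapped to the true class seen most often with it during training. The sketch below is illustrative only. The training pairs are hypothetical toy data invented for this example, and `fit_lookup` and `predict_lookup` are made-up names.

```python
from collections import Counter, defaultdict


def fit_lookup(label_vectors, true_classes):
    """Map each label combination to its most frequent true class."""
    table = defaultdict(Counter)
    for vec, true_class in zip(label_vectors, true_classes):
        table[tuple(vec)][true_class] += 1
    return {vec: counts.most_common(1)[0][0] for vec, counts in table.items()}


def predict_lookup(table, vec, default=None):
    """Look up a label combination; unseen combinations get `default`."""
    return table.get(tuple(vec), default)


# HYPOTHETICAL toy training data (not from the talk): ensemble outputs
# written as strings of R/G/B, paired with the true class of each object.
train_outputs = ["RBRRGR", "RBRRGR", "RBRRGR", "RRRRRR"]
train_classes = ["Green", "Green", "Red", "Red"]

table = fit_lookup(train_outputs, train_classes)
print(predict_lookup(table, "RBRRGR"))  # Green
```

With this toy data the trained combiner answers Green for RBRRGR even though four of the six labels are Red, which is the behaviour the slide illustrates.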
![Page 18: Classifier ensembles: Does the combination rule matter? Ludmila Kuncheva School of Computer Science Bangor University, UK l.i.kuncheva@bangor.ac.uk](https://reader035.vdocuments.mx/reader035/viewer/2022062720/56649f0c5503460f94c1fca4/html5/thumbnails/18.jpg)
Ensemble (continuous outputs, [R,G,B])
[0.6 0.3 0.1]
[0.1 0.0 0.6]
[0.7 0.6 0.5]
[0.4 0.3 0.1]
[0 1 0]
[0.9 0.7 0.8]
![Page 19: Classifier ensembles: Does the combination rule matter? Ludmila Kuncheva School of Computer Science Bangor University, UK l.i.kuncheva@bangor.ac.uk](https://reader035.vdocuments.mx/reader035/viewer/2022062720/56649f0c5503460f94c1fca4/html5/thumbnails/19.jpg)
Ensemble (continuous outputs, [R,G,B])
[0.6 0.3 0.1]
[0.1 0.0 0.6]
[0.7 0.6 0.5]
[0.4 0.3 0.1]
[0 1 0]
[0.9 0.7 0.8]
Mean R = 0.45
![Page 20: Classifier ensembles: Does the combination rule matter? Ludmila Kuncheva School of Computer Science Bangor University, UK l.i.kuncheva@bangor.ac.uk](https://reader035.vdocuments.mx/reader035/viewer/2022062720/56649f0c5503460f94c1fca4/html5/thumbnails/20.jpg)
Ensemble (continuous outputs, [R,G,B])
[0.6 0.3 0.1]
[0.1 0.0 0.6]
[0.7 0.6 0.5]
[0.4 0.3 0.1]
[0 1 0]
[0.9 0.7 0.8]
Mean R = 0.45
Mean G = 0.48
![Page 21: Classifier ensembles: Does the combination rule matter? Ludmila Kuncheva School of Computer Science Bangor University, UK l.i.kuncheva@bangor.ac.uk](https://reader035.vdocuments.mx/reader035/viewer/2022062720/56649f0c5503460f94c1fca4/html5/thumbnails/21.jpg)
Ensemble (continuous outputs, [R,G,B])
[0.6 0.3 0.1]
[0.1 0.0 0.6]
[0.7 0.6 0.5]
[0.4 0.3 0.1]
[0 1 0]
[0.9 0.7 0.8]
Mean R = 0.45
Mean G = 0.48
Mean B = 0.35
Class GREEN
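The three means and the resulting class can be checked directly from the six output vectors. A minimal sketch of the average (mean) rule; `mean_rule` is a hypothetical name chosen for this example.

```python
def mean_rule(outputs, class_names):
    """Average the support for each class over all classifiers.

    `outputs` holds one row per classifier and one column per class.
    Returns the class with the largest mean support and all the means.
    """
    n_classifiers = len(outputs)
    means = [
        sum(row[j] for row in outputs) / n_classifiers
        for j in range(len(class_names))
    ]
    best = max(range(len(means)), key=means.__getitem__)
    return class_names[best], means


# The six [R, G, B] output vectors from the slide.
outputs = [
    [0.6, 0.3, 0.1],
    [0.1, 0.0, 0.6],
    [0.7, 0.6, 0.5],
    [0.4, 0.3, 0.1],
    [0.0, 1.0, 0.0],
    [0.9, 0.7, 0.8],
]

label, means = mean_rule(outputs, ["Red", "Green", "Blue"])
print(label, [round(m, 2) for m in means])  # Green [0.45, 0.48, 0.35]
```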
![Page 22: Classifier ensembles: Does the combination rule matter? Ludmila Kuncheva School of Computer Science Bangor University, UK l.i.kuncheva@bangor.ac.uk](https://reader035.vdocuments.mx/reader035/viewer/2022062720/56649f0c5503460f94c1fca4/html5/thumbnails/22.jpg)
Ensemble (continuous outputs, [R,G,B])
[0.6 0.3 0.1]
[0.1 0.0 0.6]
[0.7 0.6 0.5]
[0.4 0.3 0.1]
[0 1 0]
[0.9 0.7 0.8]
Mean R = 0.45
Mean G = 0.48
Mean B = 0.35
Class GREEN
Decision profile
0.6 0.3 0.1
0.1 0.0 0.6
0.7 0.6 0.5
0.4 0.3 0.1
0.0 1.0 0.0
0.9 0.7 0.8
![Page 23: Classifier ensembles: Does the combination rule matter? Ludmila Kuncheva School of Computer Science Bangor University, UK l.i.kuncheva@bangor.ac.uk](https://reader035.vdocuments.mx/reader035/viewer/2022062720/56649f0c5503460f94c1fca4/html5/thumbnails/23.jpg)
Decision profile
0.6 0.3 0.1
0.1 0.0 0.6
0.7 0.6 0.5
0.4 0.3 0.1
0.0 1.0 0.0
0.9 0.7 0.8
(rows: classifiers; columns: classes)
Support that classifier #4 gives to the hypothesis that the object to classify comes from class #3.
Would be nice if these were probability distributions...
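Reading the decision profile as a matrix makes both remarks on this slide concrete: an entry is addressed by (classifier, class), and a row is a probability distribution only if its entries are non-negative and sum to 1. A small sketch using the slide's numbers; the helper `is_distribution` and its tolerance are assumptions made for illustration.

```python
import math

# Decision profile from the slide: one row per classifier, one column per class.
DP = [
    [0.6, 0.3, 0.1],
    [0.1, 0.0, 0.6],
    [0.7, 0.6, 0.5],
    [0.4, 0.3, 0.1],
    [0.0, 1.0, 0.0],
    [0.9, 0.7, 0.8],
]

# Support that classifier #4 gives to class #3 (1-based on the slide,
# so 0-based indices 3 and 2 here).
support_4_3 = DP[3][2]
print(support_4_3)  # 0.1


def is_distribution(row, tol=1e-9):
    """True if the row is non-negative and sums to 1 within `tol`."""
    return all(v >= 0 for v in row) and math.isclose(sum(row), 1.0, abs_tol=tol)


# Which classifiers already output a probability distribution?
flags = [is_distribution(row) for row in DP]
print(flags)  # [True, False, False, False, True, False]
```

In this example only classifiers #1 and #5 output rows that sum to 1.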
![Page 24: Classifier ensembles: Does the combination rule matter? Ludmila Kuncheva School of Computer Science Bangor University, UK l.i.kuncheva@bangor.ac.uk](https://reader035.vdocuments.mx/reader035/viewer/2022062720/56649f0c5503460f94c1fca4/html5/thumbnails/24.jpg)
Decision profile
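[Diagram: decision profile with rows = classifiers $D_1$, $D_2$, $D_3$ and columns = classes $\omega_1$, $\omega_2$, $\omega_3$; example entry $P_3(\omega_2 \mid \mathbf{x})$.]
… We can take probability outputs from the classifiers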
![Page 25: Classifier ensembles: Does the combination rule matter? Ludmila Kuncheva School of Computer Science Bangor University, UK l.i.kuncheva@bangor.ac.uk](https://reader035.vdocuments.mx/reader035/viewer/2022062720/56649f0c5503460f94c1fca4/html5/thumbnails/25.jpg)
Combination Rules

For label outputs (e.g. $\omega_1$, $\omega_2$, $\omega_1$)
• Majority (plurality) vote
• Weighted majority vote
• Naïve Bayes
• BKS
• A classifier

For continuous-valued outputs (decision profile with entries such as $P_3(\omega_2 \mid \mathbf{x})$)
• Simple rules: minimum, maximum, product, average (sum)
• c Regressions
• A classifier
![Page 26: Classifier ensembles: Does the combination rule matter? Ludmila Kuncheva School of Computer Science Bangor University, UK l.i.kuncheva@bangor.ac.uk](https://reader035.vdocuments.mx/reader035/viewer/2022062720/56649f0c5503460f94c1fca4/html5/thumbnails/26.jpg)
Combination Rules

For label outputs $s_1, s_2, \ldots, s_L$
• Majority (plurality) vote
• Weighted majority vote
• Naïve Bayes
• BKS
• A classifier

For continuous-valued outputs
Decision profile $DP = [d_{i,j}],\ i = 1, \ldots, L,\ j = 1, \ldots, c$
• Simple rules: minimum, maximum, product, average (sum)
• c Regressions
• A classifier
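The simple rules all work the same way: for each class j, collapse column j of the decision profile (the supports from all L classifiers) into one number. A minimal sketch applied to the R,G,B decision profile shown earlier in the talk; the function name `combine` and the rule keys are assumptions for illustration.

```python
import math


def combine(dp, rule):
    """Apply a simple combination rule to each column of a decision profile.

    `dp` has one row per classifier and one column per class.
    Returns one combined support value per class.
    """
    rules = {
        "minimum": min,
        "maximum": max,
        "product": math.prod,
        "average": lambda column: sum(column) / len(column),
    }
    columns = zip(*dp)  # transpose: one tuple of supports per class
    return [rules[rule](column) for column in columns]


# Decision profile from the earlier R,G,B example (6 classifiers, 3 classes).
DP = [
    [0.6, 0.3, 0.1],
    [0.1, 0.0, 0.6],
    [0.7, 0.6, 0.5],
    [0.4, 0.3, 0.1],
    [0.0, 1.0, 0.0],
    [0.9, 0.7, 0.8],
]

for name in ("minimum", "maximum", "product", "average"):
    print(name, [round(v, 2) for v in combine(DP, name)])
```

On this profile the minimum and product rules give zero support to every class, because each column contains a zero, while maximum and average both favour Green.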
![Page 27: Classifier ensembles: Does the combination rule matter? Ludmila Kuncheva School of Computer Science Bangor University, UK l.i.kuncheva@bangor.ac.uk](https://reader035.vdocuments.mx/reader035/viewer/2022062720/56649f0c5503460f94c1fca4/html5/thumbnails/27.jpg)
![Page 28: Classifier ensembles: Does the combination rule matter? Ludmila Kuncheva School of Computer Science Bangor University, UK l.i.kuncheva@bangor.ac.uk](https://reader035.vdocuments.mx/reader035/viewer/2022062720/56649f0c5503460f94c1fca4/html5/thumbnails/28.jpg)
![Page 29: Classifier ensembles: Does the combination rule matter? Ludmila Kuncheva School of Computer Science Bangor University, UK l.i.kuncheva@bangor.ac.uk](https://reader035.vdocuments.mx/reader035/viewer/2022062720/56649f0c5503460f94c1fca4/html5/thumbnails/29.jpg)
[Diagram: classifier ensemble. Feature values (object description) feed several classifiers; a combiner merges their outputs into a class label.]
![Page 30: Classifier ensembles: Does the combination rule matter? Ludmila Kuncheva School of Computer Science Bangor University, UK l.i.kuncheva@bangor.ac.uk](https://reader035.vdocuments.mx/reader035/viewer/2022062720/56649f0c5503460f94c1fca4/html5/thumbnails/30.jpg)
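[Diagram: classifier ensemble, as before, but the box previously labelled "combiner" is now labelled "classifier". Feature values (object description) feed several classifiers; a classifier merges their outputs into a class label.]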
![Page 31: Classifier ensembles: Does the combination rule matter? Ludmila Kuncheva School of Computer Science Bangor University, UK l.i.kuncheva@bangor.ac.uk](https://reader035.vdocuments.mx/reader035/viewer/2022062720/56649f0c5503460f94c1fca4/html5/thumbnails/31.jpg)
http://samcnitt.tumblr.com/
Bob Duin: The Combining Classifier: to Train or Not to Train?
![Page 32: Classifier ensembles: Does the combination rule matter? Ludmila Kuncheva School of Computer Science Bangor University, UK l.i.kuncheva@bangor.ac.uk](https://reader035.vdocuments.mx/reader035/viewer/2022062720/56649f0c5503460f94c1fca4/html5/thumbnails/32.jpg)
Tin Ho: “Multiple Classifier Combination: Lessons and Next Steps”, 2002
“Instead of looking for the best set of features and the best classifier, now we look for the best set of classifiers and then the best combination method. One can imagine that very soon we will be looking for the best set of combination methods and then the best way to use them all. If we do not take the chance to review the fundamental problems arising from this challenge, we are bound to be driven into such an infinite recurrence, dragging along more and more complicated combination schemes and theories and gradually losing sight of the original problem.”
![Page 33: Classifier ensembles: Does the combination rule matter? Ludmila Kuncheva School of Computer Science Bangor University, UK l.i.kuncheva@bangor.ac.uk](https://reader035.vdocuments.mx/reader035/viewer/2022062720/56649f0c5503460f94c1fca4/html5/thumbnails/33.jpg)
Conclusions - 1
Classifier ensembles: Does the combination rule matter?
In a word, yes.
But its merit depends upon
• the base classifier model,
• the training of the individual classifiers,
• the diversity,
• the possibility to train the combiner, and more.
![Page 34: Classifier ensembles: Does the combination rule matter? Ludmila Kuncheva School of Computer Science Bangor University, UK l.i.kuncheva@bangor.ac.uk](https://reader035.vdocuments.mx/reader035/viewer/2022062720/56649f0c5503460f94c1fca4/html5/thumbnails/34.jpg)
Conclusions - 2
1. The choice of the combiner should not be side-lined.
2. The combiner should be chosen in relation to the rest of the ensemble and the available data.
![Page 35: Classifier ensembles: Does the combination rule matter? Ludmila Kuncheva School of Computer Science Bangor University, UK l.i.kuncheva@bangor.ac.uk](https://reader035.vdocuments.mx/reader035/viewer/2022062720/56649f0c5503460f94c1fca4/html5/thumbnails/35.jpg)
Questions to you:
1. What is the future of classifier ensembles? (Are they here to stay or are they a mere phase?)
2. In what direction(s) will they evolve/dissolve?
3. What will be the ‘classifier of the future’? Or the ‘classification paradigm of the future’?
4. And one last question: How can we get a handle on the ever-growing scientific literature in each and every area? How can we find the gems among the pile of stones?