
Page 1:

RESULTS OF THE NIPS 2006 MODEL SELECTION GAME

Isabelle Guyon, Amir Saffari, Gideon Dror, Gavin Cawley, Olivier Guyon, and many other volunteers, see

http://www.agnostic.inf.ethz.ch/credits.php

Page 2:

Thanks

Page 3:

Part I

INTRODUCTION

Page 4:

Model selection

• Selecting models (neural net, decision tree, SVM, …)

• Selecting hyperparameters (number of hidden units, weight decay/ridge, kernel parameters, …)

• Selecting variables or features (dimensionality reduction of the input space).

• Selecting patterns (data cleaning, data reduction, e.g. by clustering).

Page 5:

Performance prediction challenge

How good are you at predicting

how good you are?

• Practically important in pilot studies.

• Good performance predictions render model selection trivial.

Page 6:

Model Selection Game

Find which model works best in a well controlled environment.

• A given “sandbox”: the CLOP Matlab® toolbox.
• Focus only on devising a model selection strategy.
• Same datasets as the performance prediction challenge, but “reshuffled”.
• Two $500 prizes offered.

Page 7:

Agnostic Learning vs. Prior Knowledge challenge

When everything else fails,

ask for additional domain knowledge…

• Two tracks:

– Agnostic learning: Preprocessed datasets in a nice “feature-based” representation, but no knowledge about the identity of the features.

– Prior knowledge: Raw data, sometimes not in a feature-based representation. Information given about the nature and structure of the data.

Page 8:

Game rules

• Date started: October 1st, 2006.
• Date ended: December 1st, 2006.
• Duration: 2 months.
• Submissions in the Agnostic track only.
• Optional use of CLOP or Spider.
• Five last complete entries ranked:
– Total ALvsPK challenge entrants: 22.
– Total ALvsPK development entries: 546.
– Number of game-ranked participants: 10.
– Number of game-ranked submissions: 39.

Page 9:

Datasets

Dataset  Domain          Type           Features  Training ex.  Validation ex.  Test ex.
ADA      Marketing       Dense                48          4147             415     41471
GINA     Digits          Dense               970          3153             315     31532
HIVA     Drug discovery  Dense              1617          3845             384     38449
NOVA     Text classif.   Sparse binary     16969          1754             175     17537
SYLVA    Ecology         Dense               216         13086            1308    130858

http://www.agnostic.inf.ethz.ch

Page 10:

Baseline BER distribution (Performance prediction challenge, 145 entrants)

[Figure: one histogram per dataset (ADA, GINA, HIVA, NOVA, SYLVA) of the test BER achieved by the entries; test BER on the horizontal axis from 0 to 0.5, entry counts on the vertical axis.]

Page 11:

Agnostic track on Dec. 1st 2006

Entrant            ADA  GINA  HIVA  NOVA  SYLVA  Ave. rank  Ave. best                Rev-submit
Roman_Lutz           1     1     5     1      4        2.4  LogitBoost_with_trees             1
Juha_Reunanen        5     2     1     2      6        3.2  cross-indexing-7                  1
H._Jair_Escalante    7     3     2     3      7        4.4  BRun2311062                       5
J._Wichard           3     5     4     8      2        4.4  mixed_tree_ensembles              3
VladN                6     4     3     5      5        4.6  RS1                               3
Marc_Boulle          2     7     7     6      1        4.6  SNB(CMA)_+_100k_F(2D)_t           1
The_Machine          4     6     6     9      3        5.6  TMK                               5
weseeare             8     8     8     4      8        7.2  YAT                               1
pipibjc              9     9     9     7      9        8.6  naiveBayes_Ensemble               1

Yellow highlighting in the original slide: the entrant used a CLOP model.

• CLOP prize winner: Juha Reunanen (both ave. rank and ave. BER)

• Best ave. BER still held by Reference (Gavin Cawley) with the_bad.

Page 12:

Part II

PROTOCOL and

SCORING

Page 13:

Protocol

• Data split: training/validation/test.
• Data proportions: 10/1/100.
• Online feedback on validation data.
• Validation label release: not yet; scheduled one month before the end of the challenge.
• Final ranking on test data using the five last complete submissions of each entrant.

Page 14:

Performance metrics

• Balanced Error Rate (BER): average of error rates of positive class and negative class.

• Area Under the ROC Curve (AUC).

• Guess error (for the performance prediction challenge only):

deltaBER = abs(testBER - guessedBER)
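A minimal Matlab sketch (not part of CLOP) of how these quantities can be computed, assuming true labels y and predicted labels yhat coded as +/-1, and a guessed BER, guessedBER, submitted by the participant:

pos = (y == 1);                        % positive-class examples
neg = (y == -1);                       % negative-class examples
errPos = mean(yhat(pos) ~= 1);         % error rate on the positive class
errNeg = mean(yhat(neg) ~= -1);        % error rate on the negative class
testBER = 0.5 * (errPos + errNeg);     % Balanced Error Rate
deltaBER = abs(testBER - guessedBER);  % guess error used in the PPC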

Page 15:

CLOP

• CLOP=Challenge Learning Object Package.

• Based on the Spider developed at the Max Planck Institute.

• Two basic abstractions:
– Data object
– Model object

http://www.agnostic.inf.ethz.ch/models.php

Page 16:

CLOP tutorial

At the Matlab prompt:

D = data(X, Y);                               % wrap the training matrix X and labels Y in a data object
hyper = {'degree=3', 'shrinkage=0.1'};        % hyperparameters for kernel ridge regression
model = kridge(hyper);                        % create a model object
[resu, model] = train(model, D);              % train; returns results and the trained model
tresu = test(model, testD);                   % apply the trained model to the test data testD
model = chain({standardize, kridge(hyper)});  % chain a preprocessing step with the learner

Page 17:

CLOP models

Page 18:

Preprocessing and FS

Page 19:

Model grouping

for k=1:10

base_model{k}=chain({standardize, naive});

end

my_model=ensemble(base_model);
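The grouped model can then be trained and tested like any other CLOP model, following the same pattern as in the tutorial above (D and testD denote the data objects from page 16):

[resu, my_model] = train(my_model, D);   % train the whole ensemble on the training data
tresu = test(my_model, testD);           % apply it to the test data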

Page 20:

Part III

RESULT ANALYSIS

Page 21:

What did we expect?

• Learn about new competitive machine learning techniques.

• Identify competitive methods of performance prediction, model selection, and ensemble learning (theory put into practice).

• Drive research in the direction of refining such methods (on-going benchmark).

Page 22:

Method comparison (PPC)

[Figure: guess error (delta BER, log scale from 10^-4 to 10^0) versus test BER for the performance prediction challenge entries, with symbols distinguishing the method families (trees, NN/BNN, naive Bayes, LD/SVM/KLS/GP) and clusters corresponding to the datasets SYLVA, GINA, NOVA, ADA, and HIVA.]

Agnostic track: no significant improvement so far.

Page 23:

LS-SVM

Gavin Cawley, July 2006

Page 24:

Logitboost

Roman Lutz, July 2006

Page 25:

CLOP models (best entrant) 

Dataset CLOP models selected

ADA 2*{sns,std,norm,gentleboost(neural),bias}; 2*{std,norm,gentleboost(kridge),bias}; 1*{rf,bias}

GINA 6*{std,gs,svc(degree=1)}; 3*{std,svc(degree=2)}

HIVA 3*{norm,svc(degree=1),bias}

NOVA 5*{norm,gentleboost(kridge),bias}

SYLVA 4*{std,norm,gentleboost(neural),bias}; 4*{std,neural}; 1*{rf,bias}

 

Juha Reunanen, cross-indexing-7

sns = shift’n’scale, std = standardize, norm = normalize (some details of hyperparameters not shown)
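As an illustration, the GINA entry above (6*{std,gs,svc(degree=1)}; 3*{std,svc(degree=2)}) could be assembled with the same chain/ensemble pattern shown on pages 16 and 19; the exact constructor arguments for gs and svc below are assumptions, not the entrant's actual code:

% Hypothetical reconstruction of the GINA entry: 6 chains with feature
% selection (gs) and a degree-1 SVC, plus 3 chains with a degree-2 SVC.
for k = 1:6
  gina_model{k} = chain({standardize, gs, svc({'degree=1'})});
end
for k = 7:9
  gina_model{k} = chain({standardize, svc({'degree=2'})});
end
my_gina_model = ensemble(gina_model);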

Page 26:

CLOP models (2nd best entrant) 

Dataset CLOP models selected

ADA {sns, std, norm, neural(units=5), bias}

GINA {norm, svc(degree=5, shrinkage=0.01), bias}

HIVA {std, norm, gentleboost(kridge), bias}

NOVA {norm,gentleboost(neural), bias}

SYLVA {std, norm, neural(units=1), bias}

 

Hugo Jair Escalante Balderas, BRun2311062

sns = shift’n’scale, std = standardize, norm = normalize (some details of hyperparameters not shown)

Note: entry Boosting_1_001_x900 gave better results, but was older.

Page 27:

Danger of overfitting (PPC)

[Figure: BER as a function of time (0 to 160 days) for ADA, GINA, HIVA, NOVA, and SYLVA; solid lines show test BER, dashed lines show validation BER.]

Page 28:

Two best CLOP entrants (game)

[Figure: average test BER as a function of time for the two best CLOP entrants, H. Jair Escalante and Juha Reunanen.]

Statistically significant difference for 3/5 datasets.

Page 29:

Stats / CV / bounds ???

Page 30:

Top ranking methods

• Performance prediction:
  – CV with many splits, 90% train / 10% validation.
  – Nested CV loops (a minimal sketch follows this list).
• Model selection:
  – Performance prediction challenge:
    • Use of a single model family.
    • Regularized risk / Bayesian priors.
    • Ensemble methods.
    • Nested CV loops, made computationally efficient with virtual leave-one-out (VLOO).
  – Model selection game:
    • Cross-indexing.
    • Particle swarm.
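A minimal Matlab sketch of the nested cross-validation idea listed above, using plain arrays X (examples in rows) and labels Y rather than CLOP data objects, and a hypothetical helper evalBER(theta, Xtr, Ytr, Xte, Yte) that trains a model with hyperparameter theta and returns its BER on the held-out data (this is not the code of any entrant):

% Nested CV sketch: the inner loop selects a hyperparameter, the outer
% loop estimates the BER of the whole selection procedure.
thetas = [0.01 0.1 1 10];              % candidate hyperparameter values
Kout = 5; Kin = 5;                     % number of outer and inner folds
n = numel(Y);
ofold = mod(randperm(n), Kout) + 1;    % random outer fold assignment (1..Kout)
outerBER = zeros(Kout, 1);
for i = 1:Kout
  te = (ofold == i); tr = ~te;         % outer train/test split
  Xtr = X(tr, :); Ytr = Y(tr);
  ifold = mod(randperm(numel(Ytr)), Kin) + 1;   % inner fold assignment
  innerBER = zeros(numel(thetas), 1);
  for t = 1:numel(thetas)              % inner CV: score each candidate theta
    for j = 1:Kin
      v = (ifold == j);
      innerBER(t) = innerBER(t) + ...
        evalBER(thetas(t), Xtr(~v, :), Ytr(~v), Xtr(v, :), Ytr(v)) / Kin;
    end
  end
  [dummy, best] = min(innerBER);       % hyperparameter chosen by the inner loop
  outerBER(i) = evalBER(thetas(best), Xtr, Ytr, X(te, :), Y(te));
end
fprintf('Nested-CV BER estimate: %.3f\n', mean(outerBER));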

Page 31:

Part IV

COMPETE NOW

in the

PRIOR KNOWLEDGE TRACK

Page 32:

ADA

ADA is the marketing database

• Task: Discover high-revenue people from census data. Two-class problem.

• Source: Census Bureau, “Adult” database from the UCI machine-learning repository.

• Features: 14 original attributes, including age, workclass, education, marital status, occupation, and native country. Continuous, binary, and categorical features.

Page 33:

GINA

GINA is the digit database

• Task: Handwritten digit recognition. Separate the odd from the even digits. Two-class problem with heterogeneous classes.

• Source: MNIST database formatted by LeCun and Cortes.

• Features: 28x28 pixel map.

Page 34:

HIVA

HIVA is the HIV database

• Task: Find compounds active against HIV (the virus that causes AIDS). We reduced the task to a two-class problem (active vs. inactive), but provide the original labels (active, moderately active, and inactive).

• Data source: National Cancer Institute.

• Data representation: The compounds are represented by their 3D molecular structure.

Page 35:

NOVA

NOVA is the text classification database

• Task: Classify newsgroup emails into politics or religion vs. other topics.

• Source: The 20 Newsgroups dataset from the UCI machine-learning repository.

• Data representation: The raw text, with an estimated 17000-word vocabulary.

Subject: Re: Goalie masks
Lines: 21

Tom Barrasso wore a great mask, one time, last season. He unveiled it at a game in Boston.

It was all black, with Pgh city scenes on it. The "Golden Triangle" graced the top, along with a steel mill on one side and the Civic Arena on the other. On the back of the helmet was the old Pens' logo, the current (at the time) Pens logo, and a space for the "new" logo.

A great mask done in by a goalie's superstition.

Lori 

Page 36:

SYLVA

SYLVA is the ecology database

• Task: Classify forest cover types into Ponderosa pine vs. everything else.

• Source: US Forest Service (USFS).

• Data representation: Forest cover type for 30 x 30 meter cells, encoded with 108 features (elevation, hill shade, wilderness type, soil type, etc.).

Page 37:

How to enter?

• Enter results on any dataset in either track until March 1st 2007 at http://www.agnostic.inf.ethz.ch.

• Only “complete” entries (on all 5 datasets) will be ranked; the last 5 will count.

• Seven prizes:
– Best overall agnostic entry.
– Best overall prior knowledge entry.
– Best prior knowledge result on each dataset (5 prizes).
– Best paper.

Page 38:

Conclusions

• Lower participation than in the previous challenges:
– Higher entry level.
– Other ongoing competitions.

• Top methods in the agnostic track are the same as before:
– LS-SVMs and boosted logistic trees.

• Top-ranking entries are closely followed by CLOP entries, showing great advances in model selection.

• To do: upgrade CLOP with LS-SVMs and LogitBoost.

Page 39:

Open problems

Bridge the gap between theory and practice…

• What are the best estimators of the variance of CV?
• What should k be in k-fold CV?
• Are other cross-validation methods better than k-fold (e.g. bootstrap, 5x2CV; a sketch of 5x2CV follows this list)?
• Are there better “hybrid” methods?
• What search strategies are best?
• More than 2 levels of inference?
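For reference, a minimal Matlab sketch of the 5x2CV procedure mentioned above (5 repetitions of 2-fold cross-validation), reusing the hypothetical evalBER helper and a fixed hyperparameter theta from the nested-CV sketch; this is only an illustration, not a recommendation:

% 5x2CV: five random 50/50 splits, each used in both directions,
% giving 10 BER estimates whose mean and spread can be examined.
berEst = zeros(5, 2);
n = numel(Y);
for r = 1:5
  idx = randperm(n);
  half = false(n, 1);
  half(idx(1:floor(n/2))) = true;     % random 50/50 split
  berEst(r, 1) = evalBER(theta, X(half, :), Y(half), X(~half, :), Y(~half));
  berEst(r, 2) = evalBER(theta, X(~half, :), Y(~half), X(half, :), Y(half));
end
fprintf('5x2CV BER: mean %.3f, std %.3f\n', mean(berEst(:)), std(berEst(:)));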