Active Learning Challenge http://clopinet.com/al

Active Learning Challenge

Isabelle Guyon (Clopinet, California)

Gavin Cawley (University of East Anglia, UK)

Olivier Chapelle (Yahoo!, California)

Gideon Dror (Academic College of Tel-Aviv-Yaffo, Israel)

Vincent Lemaire (Orange, France)

Amir Reza Saffari Azar (Graz University of Technology)

Alexander Statnikov (New York University, USA)

What is the problem?

Labeling data is expensive

Examples of domains

• Chemo-informatics

• Handwriting and speech recognition

• Image processing

• Text processing

• Marketing

• Ecology

• Embryology

What is active learning?

What is out there?

Scenarios

Burr Settles. Active Learning Literature Survey. Computer Sciences Technical Report 1648, University of Wisconsin–Madison, 2009.

“De novo” queries

De novo queries implicitly assume interventions on the system under study, so they are not considered in this challenge.

Focus on “pool-based” AL

• Simplest scenario for a challenge.

Training data: labels can be queried

Test data: unknown labels

• Methods developed for pool-based AL should also be useful for stream-based AL.
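The pool-based scenario can be sketched as a short loop. This is only an illustration, not the challenge's reference code: the function names, the 1-D toy data, the threshold model and the random query strategy are all assumptions made for the example.

```python
import random

def run_pool_based(pool, oracle, fit, select, n_rounds, batch_size):
    """Generic pool-based loop: fit, pick pool items, query their labels, refit."""
    labeled = {0: oracle(0)}                      # start from one labeled example
    model = fit(pool, labeled)
    for _ in range(n_rounds):
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        if not unlabeled:
            break                                 # the pool is exhausted
        for i in select(model, pool, unlabeled, batch_size):
            labeled[i] = oracle(i)                # query the label of item i
        model = fit(pool, labeled)
    return model, labeled

def fit_threshold(pool, labeled):
    """Toy 1-D model (assumption): threshold halfway between the labeled classes."""
    neg = [pool[i] for i, y in labeled.items() if y == 0]
    pos = [pool[i] for i, y in labeled.items() if y == 1]
    low = max(neg) if neg else min(pool)
    high = min(pos) if pos else max(pool)
    return (low + high) / 2.0

_rng = random.Random(0)                           # fixed seed for repeatability

def select_random(model, pool, unlabeled, k):
    """Baseline strategy: ignore the model and query k random pool items."""
    return _rng.sample(unlabeled, min(k, len(unlabeled)))

pool = [i / 99 for i in range(100)]
hidden_labels = [int(x > 0.5) for x in pool]      # known only to the oracle
threshold, labeled = run_pool_based(
    pool, hidden_labels.__getitem__, fit_threshold, select_random,
    n_rounds=5, batch_size=4)
print(len(labeled), round(threshold, 3))
```

Any smarter query strategy can be plugged in through the `select` argument without touching the loop.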

Example

(a) Toy 2-class problem: 400 instances, Gaussian distributed.

(b) Linear logistic regression model trained with 30 random instances. Accuracy = 0.7.

(c) Linear logistic regression model trained with 30 actively queried instances using “uncertainty sampling”. Accuracy = 0.9.

Burr Settles, 2009
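For a two-class problem, uncertainty sampling amounts to querying the pool items whose predicted probability is closest to 0.5. The sketch below assumes the model already provides a probability of the positive class for every unlabeled item; the function name and the example values are illustrative only.

```python
def uncertainty_sampling(probabilities, k):
    """Return the indices of the k items whose P(y=1) is closest to 0.5."""
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: abs(probabilities[i] - 0.5))
    return ranked[:k]

# Hypothetical model outputs for five unlabeled pool items.
probs = [0.95, 0.52, 0.10, 0.45, 0.80]
queried = uncertainty_sampling(probs, 2)
print(queried)  # [1, 3]
```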

Learning curve

Burr Settles, 2009

Other methods

• Expected model change (greatest gradient if sample were used for training)

• Query by committee (query the sample subject to largest disagreement)

• Bayesian active learning (maximize change in revised posterior distribution)

• Expected error reduction (maximize generalization performance improvement)

• Information density (ask for examples both informative and representative)

Burr Settles, 2009
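As one illustration of the methods listed above, query by committee can be sketched by measuring disagreement with vote entropy. Vote entropy is only one possible disagreement measure, chosen here as an assumption; the committee votes in the example are made up.

```python
import math
from collections import Counter

def vote_entropy(votes):
    """Entropy of the label votes cast by the committee for one item."""
    n = len(votes)
    return -sum((c / n) * math.log(c / n) for c in Counter(votes).values())

def query_by_committee(committee_votes, k):
    """committee_votes[i] lists the label each committee member predicts for item i.
    Returns the k items on which the committee disagrees most."""
    ranked = sorted(range(len(committee_votes)),
                    key=lambda i: vote_entropy(committee_votes[i]),
                    reverse=True)
    return ranked[:k]

# Three committee members voting on four unlabeled items (hypothetical values).
votes = [[1, 1, 1], [0, 1, 1], [0, 0, 0], [1, 0, 0]]
chosen = query_by_committee(votes, 2)
print(sorted(chosen))  # [1, 3]
```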

Datasets

Data donors

This project would not have been possible without generous donations of data:

• Chemoinformatics -- Charles Bergeron, Kristin Bennett and Curt Breneman (Rensselaer Polytechnic Institute, New York) contributed a dataset, which will be used for final testing.

• Embryology -- Emmanuel Faure, Thierry Savy, Louise Duloquin, Miguel Luengo Oroz, Benoit Lombardot, Camilo Melani, Paul Bourgine, and Nadine Peyriéras (Institut des systèmes complexes, France) contributed the ZEBRA dataset.

• Handwriting recognition -- Reza Farrahi Moghaddam, Mathias Adankon, Kostyantyn Filonenko, Robert Wisnovsky, and Mohamed Chériet (Ecole de technologie supérieure de Montréal, Quebec) contributed the IBN_SINA dataset.

• Marketing -- Vincent Lemaire, Marc Boullé, Fabrice Clérot, Raphael Féraud, Aurélie Le Cam, and Pascal Gouzien (Orange, France) contributed the ORANGE dataset, previously used in the KDD Cup 2009.

We also reused data made publicly available on the Internet:

• Chemoinformatics -- The National Cancer Institute (USA) for the HIVA dataset.

• Ecology -- Jock A. Blackard, Denis J. Dean, and Charles W. Anderson (US Forest Service, USA) for the SYLVA dataset (Forest cover type).

• Text processing -- Tom Mitchell (USA) and Ron Bekkerman (Israel) for the NOVA dataset (derived from the Twenty Newsgroups).

Development datasets

Difficulties

• Sparse data

• Missing values

• Unbalanced classes

• Categorical variables

• Noisy data

• Large datasets

Final test datasets

• Will serve to do the final ranking

• Will be from the same domains

• May have different data representations and distributions

• No feedback: the results will not be revealed until the end of the challenge

Protocol

Virtual Lab

Joint work with:

• Constantin Aliferis, New York University

• Gregory F. Cooper, University of Pittsburgh

• André Elisseeff, Nhumi, Zürich

• Jean-Philippe Pellet, IBM Zürich

• Alexander Statnikov, New York University

• Peter Spirtes, Carnegie Mellon

Virtual cash

Step by step instructions

Download the data. You get 1 labeled example.

1. Predict

2. Sample

3. Submit a query

4. Retrieve the labels
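The four steps above, together with the single starting label and the virtual cash, can be mimicked offline with a small simulation. Everything here is hypothetical: the `SimulatedLab` class, its `query` method, the price of one unit of virtual cash per label and the majority-vote placeholder model are assumptions made for illustration, not the challenge's actual interface or pricing.

```python
class SimulatedLab:
    """Hypothetical stand-in for the Virtual Lab: sells labels for virtual cash."""

    def __init__(self, hidden_labels, budget, cost_per_label=1):
        self._labels = hidden_labels
        self.budget = budget
        self.cost = cost_per_label        # assumed price, not the official one

    def query(self, indices):
        price = self.cost * len(indices)
        if price > self.budget:
            raise ValueError("not enough virtual cash")
        self.budget -= price
        return {i: self._labels[i] for i in indices}

hidden = [0, 1, 1, 0, 1, 0]
lab = SimulatedLab(hidden, budget=4)
known = {0: hidden[0]}                    # download the data: 1 labeled example

while True:
    # 1. Predict: placeholder model that outputs the majority label seen so far.
    majority = round(sum(known.values()) / len(known))
    predictions = [majority] * len(hidden)
    # 2. Sample: choose which unlabeled examples to ask about (here, the next two).
    batch = [i for i in range(len(hidden)) if i not in known][:2]
    if not batch or lab.cost * len(batch) > lab.budget:
        break                             # nothing left to buy, or no cash left
    # 3. Submit a query, and 4. retrieve the labels.
    known.update(lab.query(batch))

print(len(known), lab.budget)
```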

Two phases

• Development phase:
– 6 datasets available
– Can try as many times as you want
– Matlab users can run queries on their computers
– Others can use the labels (provided)

• Final test phase:
– 6 new datasets available
– A single try
– No feedback

Evaluation

AUC score

For each set of samples queried, we assess the predictions of the learning machine with the Area under the ROC curve.
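The area under the ROC curve equals the fraction of (positive, negative) pairs that the scores rank correctly, with ties counted as one half. A minimal sketch under that definition follows; the function name and the example data are illustrative, and the quadratic pairwise loop is only suitable for small samples.

```python
def auc_score(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y != 1]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels and classifier scores for four test examples.
auc = auc_score([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
print(auc)  # 0.75
```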

Area under the Learning Curve (ALC)

Linear interpolation. Horizontal extrapolation.

[Figure panels: one query; five queries; thirteen queries]

Lazy: ask for all labels at once
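A learning curve built with linear interpolation and horizontal extrapolation can be integrated with the trapezoid rule. The sketch below is an assumption-laden illustration: it uses the raw number of labels on the x-axis and divides by the x-range, and it does not attempt to reproduce the official scoring details (axis scaling or normalization). The function name and the example numbers are made up.

```python
def area_under_learning_curve(n_labels, auc_values, n_total):
    """Area under the (number of labels, AUC) curve, divided by the x-range.

    Points are joined by straight lines (linear interpolation); the last AUC
    is extended as a flat line up to n_total (horizontal extrapolation).
    """
    points = sorted(zip(n_labels, auc_values))
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    if xs[-1] < n_total:                  # horizontal extrapolation
        xs.append(n_total)
        ys.append(ys[-1])
    if len(xs) < 2 or xs[-1] == xs[0]:
        raise ValueError("need at least two distinct x positions")
    area = sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2.0
               for i in range(len(xs) - 1))
    return area / (xs[-1] - xs[0])

# Two evaluated points, then a flat extension to 4 labels (hypothetical values).
alc = area_under_learning_curve([0, 2], [0.5, 1.0], n_total=4)
print(alc)  # 0.875
```

Under this scheme, a learner that reaches a high AUC after few labels gets a larger area than one that reaches the same AUC only after buying most of the labels.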

Prizes

If you win on…

• 1 dataset: $100

• 2 datasets: $200

• 3 datasets: $400

• 4 datasets: $800

• 5 datasets: $1600

• 6 datasets: $3200!

• Plus travel awards for top ranking students.
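The amounts on this slide double with each additional dataset won, which matches a prize of $100 × 2^(n-1) for n datasets. A quick check of that arithmetic, with a function name chosen only for this example:

```python
def prize(n_datasets_won):
    """Prize in dollars for winning on n datasets: 100 * 2**(n - 1)."""
    if n_datasets_won < 1:
        return 0
    return 100 * 2 ** (n_datasets_won - 1)

amounts = [prize(n) for n in range(1, 7)]
print(amounts)  # [100, 200, 400, 800, 1600, 3200]
```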

Schedule

Conclusion

Try our new challenge, learn, and win!!!!

– Workshops:

• AISTATS 2010, Sardinia, May 2010

• WCCI 2010 Workshop, Barcelona, July 2010

• Travel awards for top ranking students.

– Proceedings published by JMLR & IEEE.

– Prizes: P(i) = $100 × 2^(n-1) for a win on n datasets

– Your problem solved by dozens of research groups:

• Help us organize the next challenge!