Learning to Warm-Start Bayesian Hyperparameter Optimization and Task-Adaptive Ensemble of Meta-Learners for Few-Shot Classification


Source: mlg.postech.ac.kr/~jtkim/shared/reading_group/slides...

1/13

Learning to Warm-Start Bayesian Hyperparameter Optimization

and Task-Adaptive Ensemble of Meta-Learners for Few-Shot Classification

Jungtaek Kim ([email protected])

Machine Learning Group, Department of Computer Science and Engineering, POSTECH,

77 Cheongam-ro, Nam-gu, Pohang 37673, Gyeongsangbuk-do, Republic of Korea

September 11, 2018


2/13

Table of Contents

Learning to Warm-Start Bayesian Hyperparameter Optimization
- Motivation
- Main Architecture
- Experiments

Task-Adaptive Ensemble of Meta-Learners for Few-Shot Classification
- Motivation
- Main Architecture
- Experiments


3/13

Learning to Warm-Start Bayesian Hyperparameter Optimization


4/13

Motivation

- Bayesian hyperparameter optimization usually starts from random initial points.
- Better initializations can speed up Bayesian hyperparameter optimization.
- Mappings from hyperparameters to validation error can be learned.
- We attempt to transfer prior knowledge about good initializations to a new task.
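The transfer idea above can be sketched as follows. This is a minimal illustration, not the paper's implementation: task names, meta-feature vectors, and the stored history are all hypothetical, and the "nearest task" lookup stands in for the learned similarity model described later.

```python
import numpy as np

# Hypothetical history of previous tasks:
# meta-feature vector -> best hyperparameters found on that task.
history = {
    "task_a": {"meta": np.array([0.2, 0.9]), "best_hparams": [0.10, 0.12]},
    "task_b": {"meta": np.array([0.8, 0.1]), "best_hparams": [0.55, 0.60]},
}

def warm_start_inits(new_meta, history, n_init=2):
    """Pick initial points from the task whose meta-features are nearest."""
    nearest = min(history.values(),
                  key=lambda t: np.linalg.norm(t["meta"] - new_meta))
    return nearest["best_hparams"][:n_init]

def random_inits(rng, n_init=2):
    """The usual baseline: uniformly random initial points."""
    return list(rng.uniform(0.0, 1.0, size=n_init))

# A new task resembling task_a: warm start proposes task_a's best points,
# which then seed the Bayesian optimization loop instead of random draws.
new_meta = np.array([0.25, 0.85])
print(warm_start_inits(new_meta, history))  # [0.1, 0.12]
```

Bayesian optimization then proceeds as usual from these seed evaluations; only the initial design changes.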


5/13

Main Architecture

[Architecture diagram: two datasets are each passed through a deep feature extractor and a meta-feature extractor, followed by fc layers; all weights are shared (Siamese structure), and the two outputs are compared via a meta-feature distance.]
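The shared-weight structure in the diagram can be sketched with plain numpy. This is a rough illustration of the Siamese idea only: the random weights, tanh nonlinearity, and mean pooling are assumptions for the sketch, not the architecture's actual layers.

```python
import numpy as np

rng = np.random.default_rng(0)
W_deep = rng.standard_normal((4, 8))   # shared "deep feature extractor" weights
W_fc = rng.standard_normal((8, 3))     # shared fc layer -> meta-features

def meta_features(dataset):
    """dataset: (n_examples, 4). Embed examples, pool, then project."""
    deep = np.tanh(dataset @ W_deep)   # per-example deep features
    pooled = deep.mean(axis=0)         # permutation-invariant pooling over the set
    return pooled @ W_fc               # meta-feature vector for the dataset

def meta_distance(d1, d2):
    """Both branches use the SAME weights, so this is a Siamese distance."""
    return float(np.linalg.norm(meta_features(d1) - meta_features(d2)))

a = rng.standard_normal((16, 4))
b = rng.standard_normal((16, 4))
print(meta_distance(a, a))  # identical datasets -> 0.0
print(meta_distance(a, b))  # different datasets -> positive distance
```

Because the weights are shared, similar datasets map to nearby meta-features, which is what makes the nearest-task lookup for warm-starting meaningful.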


6/13

Experiments (EI)

[Figure: minimum validation error vs. iteration (0 to 20) using the EI acquisition function on eight datasets: (a) AwA2, (b) Caltech-101, (c) Caltech-256, (d) CIFAR-10, (e) CIFAR-100, (f) CUB200-2011, (g) MNIST, (h) VOC2012.]

Compared initializations: Random init. (Uniform), Random init. (Latin), Random init. (Halton), Nearest best init. (ADF), Nearest best init. (Bi-LSTM).
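"(EI)" above refers to the expected-improvement acquisition function. For reference, here is the textbook closed-form EI under a Gaussian posterior, written for minimizing validation error; this is standard EI, not code from the slides, and the `xi` exploration offset is an optional convention.

```python
import math

def expected_improvement(mu, sigma, best, xi=0.0):
    """EI for minimization: E[max(best - f - xi, 0)] with f ~ N(mu, sigma^2)."""
    if sigma <= 0.0:
        return max(best - mu - xi, 0.0)   # no uncertainty: plain improvement
    z = (best - mu - xi) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (best - mu - xi) * cdf + sigma * pdf

# A point predicted well below the incumbent error has high EI...
print(expected_improvement(mu=0.70, sigma=0.05, best=0.80))
# ...while a confident prediction above the incumbent has EI near zero.
print(expected_improvement(mu=0.90, sigma=0.01, best=0.80))
```

At each iteration the candidate with the largest EI is evaluated next, so better initial points give the posterior a head start.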


7/13

Experiments (UCB)

[Figure: minimum validation error vs. iteration (0 to 20) using the UCB acquisition function on eight datasets: (j) AwA2, (k) Caltech-101, (l) Caltech-256, (m) CIFAR-10, (n) CIFAR-100, (o) CUB200-2011, (p) MNIST, (q) VOC2012.]

Compared initializations: Random init. (Uniform), Random init. (Latin), Random init. (Halton), Nearest best init. (ADF), Nearest best init. (Bi-LSTM).
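"(UCB)" refers to the GP-UCB acquisition rule. When minimizing validation error it is usually applied as a lower confidence bound, mu - kappa*sigma, and the candidate with the smallest bound is evaluated next. The `kappa` value and candidate numbers below are illustrative choices, not values from the slides.

```python
def lower_confidence_bound(mu, sigma, kappa=2.0):
    """Optimistic estimate of validation error: mean minus kappa std devs."""
    return mu - kappa * sigma

# Candidates as (posterior mean, posterior std) of validation error:
candidates = [(0.30, 0.01), (0.35, 0.10), (0.32, 0.02)]
best = min(candidates, key=lambda c: lower_confidence_bound(*c))
# The uncertain candidate (0.35, 0.10) wins: 0.35 - 2.0*0.10 = 0.15,
# illustrating how kappa trades off exploration against exploitation.
print(best)
```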


8/13

Task-Adaptive Ensemble of Meta-Learners for Few-Shot Classification


9/13

Motivation


10/13

Motivation

- Few-shot classification needs to generalize from training episodes and perform well on test episodes.
- The domain distribution seen by a meta-learner for few-shot classification is usually assumed to be fixed.
- In practice, the domain distribution can vary.
- We build an ensemble of several meta-learners, each of which is trained on episodes from a single dataset.
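The ensemble idea above can be sketched as follows. This is a hypothetical illustration: weighting learners by a softmax over their support-set accuracy is just one plausible way to make the combination task-adaptive, and may differ from the paper's actual scheme.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def ensemble_predict(learners, support_x, support_y, query_x):
    """Weight each meta-learner by how well it fits the episode's support set."""
    accs = np.array([np.mean(l(support_x).argmax(1) == support_y)
                     for l in learners])
    weights = softmax(accs)                              # task-adaptive weights
    probs = sum(w * l(query_x) for w, l in zip(weights, learners))
    return probs.argmax(axis=1)

# Two toy "meta-learners" (each would be trained on one dataset's episodes):
good = lambda x: np.eye(2)[x[:, 0].astype(int)]          # matches this task
bad = lambda x: np.full((len(x), 2), 0.5)                # uninformative

support_x = np.array([[0.], [1.], [0.], [1.]])
support_y = np.array([0, 1, 0, 1])
query_x = np.array([[1.], [0.]])
print(ensemble_predict([good, bad], support_x, support_y, query_x))
```

Because the weights are recomputed from each new episode's support set, the ensemble leans on whichever meta-learner's training domain best matches the current task.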


11/13

Main Architecture


12/13

Experiments


13/13

Experiments