
Page 1: State of the Art - Israel Machine Vision Conference

Few-shot learning: State of the Art

Joseph Shtok

IBM Research AI

The presentation is available at http://www.research.ibm.com/haifa/dept/imt/ist_dm.shtml

Page 2: State of the Art - Israel Machine Vision Conference

Problem statement: using a large annotated offline dataset, perform for novel categories, represented by just a few samples each.

[Diagram: offline training on base categories (dog, elephant, monkey, …) → knowledge transfer → online training on the given task yields a model for the novel categories (lemur, rabbit, mongoose).]

Page 3: State of the Art - Israel Machine Vision Conference

Problem statement, instantiated for classification: the same offline training (dog, elephant, monkey, …) and knowledge transfer, with online training now yielding a classifier for the novel categories (lemur, rabbit, mongoose).

Page 4: State of the Art - Israel Machine Vision Conference

Problem statement, instantiated for detection: the same setup, with online training yielding a detector for the novel categories.

Page 5: State of the Art - Israel Machine Vision Conference

Problem statement, instantiated for regression: the same setup, with online training yielding a regressor for the novel categories.

Page 6: State of the Art - Israel Machine Vision Conference

Why work on few-shot learning?

1. It brings DL closer to real-world business use cases.

• Companies hesitate to spend much time and money on annotated data for a solution whose payoff is uncertain.

• Relevant objects are continuously replaced with new ones; DL has to be agile.

2. It involves a bunch of exciting cutting-edge technologies.

Meta-learning methods

Networks generating networks

Data synthesizers

Semantic metric spaces

Graph neural networks

Neural Turing Machines

GANs

Page 7: State of the Art - Israel Machine Vision Conference

Few-shot learning: each category is represented by just a few examples; learn to perform classification, detection, or regression. Three families of approaches:

• Meta-learning: learn a learning strategy that adjusts well to a new few-shot learning task.

• Data augmentation: synthesize more data from the novel classes to facilitate regular learning.

• Metric learning: learn a `semantic` embedding space using a distance loss function.

Page 8: State of the Art - Israel Machine Vision Conference

Meta-Learning: training a meta-learner to learn on each task.

Standard learning: a learner is trained on data instances of specific classes, producing a model; the acquired data knowledge is task-specific.

Meta-learning: a meta-learner is trained on a collection of tasks (each with its own training and target data), producing a task-agnostic learning strategy. On a new task, the learner applies this strategy to the task data and outputs a task-specific model.

Page 9: State of the Art - Israel Machine Vision Conference

Recurrent meta-learners: Matching Networks, Vinyals et al., NIPS 2016

Distance-based classification: based on similarity between the query and support samples in the embedding space (adaptive metric):

ΰ·œπ‘¦ =

𝑖

π‘Ž ොπ‘₯, π‘₯𝑖 𝑦𝑖 , π‘Ž ොπ‘₯, π‘₯𝑖 = π‘ π‘–π‘šπ‘–π‘™π‘Žπ‘Ÿπ‘–π‘‘π‘¦ 𝑓 ොπ‘₯, 𝑆 , 𝑔 π‘₯𝑖 , 𝑆

$f, g$ - LSTM embeddings of $x$, dependent on the support set $S$

• Embedding space is class-agnostic
• LSTM attention mechanism adjusts the embedding to the task

reprinted from Vinyals et al., 2016
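As a rough illustration of the distance-based classification rule above (not the authors' code), here is a sketch that replaces the LSTM context embeddings with fixed embeddings and plain cosine similarity; all tensor names are hypothetical.

```python
import torch
import torch.nn.functional as F

def matching_classify(query_emb, support_emb, support_labels, n_way):
    """Attention-weighted label prediction in the spirit of Matching Networks.

    query_emb:      (D,) embedding of the query x_hat
    support_emb:    (N*M, D) embeddings of the support samples x_i
    support_labels: (N*M,) integer labels in [0, n_way)
    Returns a (n_way,) vector of class probabilities.
    """
    # a(x_hat, x_i): softmax over cosine similarities to the support samples
    sims = F.cosine_similarity(query_emb.unsqueeze(0), support_emb, dim=1)
    attention = F.softmax(sims, dim=0)                    # (N*M,)
    one_hot = F.one_hot(support_labels, n_way).float()    # (N*M, n_way)
    # y_hat = sum_i a(x_hat, x_i) * y_i
    return attention @ one_hot

# toy usage: a 5-way 1-shot episode with random 64-d embeddings
probs = matching_classify(torch.randn(64), torch.randn(5, 64),
                          torch.arange(5), n_way=5)
print(probs)
```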

to be elaborated later

Concept of episodes: reproduce the test conditions in the training (sketched below).
• N new categories
• M training examples per category
• one query example from the {1..N} categories
• Typically N=5, M=1 or 5.
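A minimal sketch of how such an N-way, M-shot episode could be sampled from a labelled dataset; the dataset structure and names below are hypothetical.

```python
import random

def sample_episode(dataset, n_way=5, m_shot=1, n_query=1):
    """dataset: dict mapping class name -> list of samples.
    Returns (support, query) lists of (sample, episode_label) pairs."""
    classes = random.sample(list(dataset.keys()), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        picks = random.sample(dataset[cls], m_shot + n_query)
        support += [(x, label) for x in picks[:m_shot]]
        query   += [(x, label) for x in picks[m_shot:]]
    return support, query

# toy usage with string "images" standing in for real samples
toy = {c: [f"{c}_{i}" for i in range(20)] for c in
       ["lemur", "rabbit", "mongoose", "dog", "cat", "fox"]}
support, query = sample_episode(toy, n_way=5, m_shot=1, n_query=1)
```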

Memory-augmented neural networks (MANN), Santoro et al., ICML 2016
• Neural Turing Machine = differentiable MANN
• Learn to predict the distribution $p(y_t \mid x_t, S_{1:t-1}; \theta)$
• Explicitly store the support samples in the external memory

Method | miniImageNet classification accuracy, 1-shot / 5-shot
Matching networks | 43.56 / 55.31

Page 10: State of the Art - Israel Machine Vision Conference

Optimizers: optimize the learner to perform well after fine-tuning on the task data, done by a single (or a few) step(s) of gradient descent.

MAML (Model-Agnostic Meta-Learning), Finn et al., ICML 2017

Standard objective (task-specific, for task $T$): $\min_\theta \mathcal{L}_T(\theta)$, learned via the update $\theta' = \theta - \alpha \cdot \nabla_\theta \mathcal{L}_T(\theta)$

Meta-objective (across tasks): $\min_\theta \sum_{T \sim p(\mathcal{T})} \mathcal{L}_T(\theta')$, learned via the update $\theta \leftarrow \theta - \beta \nabla_\theta \sum_{T \sim p(\mathcal{T})} \mathcal{L}_T(\theta')$

reprinted from Li et al., 2017

Meta-SGD, Li et al., 2017

"Interestingly, the learning process can continue forever, thus enabling life-long learning, and at any moment, the meta-learner can be applied to learn a learner for any new task."

Render $\alpha$ as a vector of the same size as $\theta$. Training objective: minimize the loss $\mathcal{L}_T(\theta')$ with respect to $\theta$ (the weight initialization) and $\alpha$ (the update direction and scale), across the tasks.
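A schematic sketch of this inner/outer-loop update on toy linear-regression tasks (not the authors' code): the inner step computes theta' = theta - alpha * grad, and the outer step differentiates through it; alpha is kept as a per-parameter vector, as in Meta-SGD. All names and task definitions are illustrative.

```python
import torch

def inner_update(theta, alpha, task_batch, loss_fn):
    """One inner gradient step on a task: theta' = theta - alpha * grad."""
    x, y = task_batch
    loss = loss_fn(x @ theta, y)
    (grad,) = torch.autograd.grad(loss, theta, create_graph=True)
    return theta - alpha * grad   # alpha: scalar (MAML) or per-parameter vector (Meta-SGD)

# meta-learned quantities: the initialization theta and (for Meta-SGD) the step vector alpha
theta = torch.zeros(3, requires_grad=True)
alpha = torch.full((3,), 0.1, requires_grad=True)
meta_opt = torch.optim.SGD([theta, alpha], lr=1e-2)       # beta = outer learning rate
mse = torch.nn.functional.mse_loss

for step in range(100):                                   # outer loop over meta-iterations
    meta_loss = 0.0
    for _ in range(4):                                    # tasks T ~ p(T): toy linear regressions
        w_true = torch.randn(3)
        x_tr, x_te = torch.randn(5, 3), torch.randn(5, 3)
        theta_prime = inner_update(theta, alpha, (x_tr, x_tr @ w_true), mse)
        meta_loss = meta_loss + mse(x_te @ theta_prime, x_te @ w_true)
    meta_opt.zero_grad()
    meta_loss.backward()                                  # gradient flows through the inner step
    meta_opt.step()
```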

Method | miniImageNet classification accuracy, 1-shot / 5-shot
Matching networks | 43.56 / 55.31
MAML | 48.70 / 63.11
Meta-SGD | 54.24 / 70.86

Page 11: State of the Art - Israel Machine Vision Conference

Optimizers

LEO, Rusu et al., NeurIPS 2018

Latent Embedding Optimization: take the optimization problem from the high-dimensional space of weights $\theta$ to a low-dimensional space, for robust meta-learning.

Learn a generative distribution of model parameters $\theta$, by learning a stochastic latent space with an information bottleneck.

Relation network encodes distances between elements of support set for a new task

πœ‘βˆ— = π‘Žπ‘Ÿπ‘”minπœ‘

𝑇~𝑝 β„‘

ℒ𝑇 πœƒβ€² = π‘”πœ‘π‘‘π‘§β€² , 𝑧′ = 𝑧 βˆ’ 𝛼 βˆ™ 𝛻𝑧ℒ𝑇 πœƒπ‘§ ,

𝑧 = π‘”πœ‘π‘’π‘† , πœƒπ‘§= π‘”πœ‘π‘‘

𝑧

[Diagram: data instances → stochastic encoder $\varphi_e$ → latent representation $z$ → parameter generator $\varphi_d$ → model parameters $\theta$]
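A highly simplified sketch of the latent inner loop in these equations: a deterministic encoder/decoder pair stands in for the stochastic encoder, relation network, and regularization terms of the paper; all modules, dimensions, and names are hypothetical.

```python
import torch
import torch.nn as nn

latent_dim, feat_dim, n_way = 16, 64, 5
encoder = nn.Linear(feat_dim, latent_dim)              # stands in for g_phi_e: support -> z
decoder = nn.Linear(latent_dim, feat_dim * n_way)      # stands in for g_phi_d: z -> theta
alpha = 0.1

def classifier_loss(theta_flat, feats, labels):
    logits = feats @ theta_flat.view(feat_dim, n_way)
    return nn.functional.cross_entropy(logits, labels)

def leo_adapt(support_feats, support_labels, n_steps=3):
    """Inner loop in latent space: z' = z - alpha * grad_z L(theta_z)."""
    z = encoder(support_feats).mean(dim=0)              # one latent code for the task
    for _ in range(n_steps):
        theta_z = decoder(z)
        loss = classifier_loss(theta_z, support_feats, support_labels)
        (grad_z,) = torch.autograd.grad(loss, z, create_graph=True)
        z = z - alpha * grad_z
    return decoder(z)                                   # adapted parameters theta' for this task

# toy episode: 5-way 1-shot, random 64-d support features
theta_prime = leo_adapt(torch.randn(5, feat_dim), torch.arange(5))
```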

Method | miniImageNet classification accuracy, 1-shot / 5-shot
Matching networks | 43.56 / 55.31
MAML | 48.70 / 63.11
Meta-SGD | 54.24 / 70.86
LEO | 61.76 / 77.59

Page 12: State of the Art - Israel Machine Vision Conference

Metric Learning

Offline training: a deep embedding model is trained on data instances.

Training: achieve good distributions for the offline categories (classes 1, 2, 3, …) in the semantic embedding space.

Inference: Nearest Neighbour in the embedding space. A query q from the new task data is classified by comparing its distances d(q,A), d(q,B), d(q,C) to the novel classes A, B, C.
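A minimal sketch of this nearest-neighbour inference, assuming the support and query samples have already been passed through the trained embedding; the arrays and labels below are toy placeholders.

```python
import numpy as np

def nearest_neighbour_classify(query_emb, support_emb, support_labels):
    """Assign the query to the class of its closest support embedding."""
    dists = np.linalg.norm(support_emb - query_emb, axis=1)   # d(q, x_i)
    return support_labels[np.argmin(dists)]

# toy usage: three novel classes A, B, C with one embedded example each
support_emb = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
support_labels = np.array(["A", "B", "C"])
print(nearest_neighbour_classify(np.array([0.1, 0.9]), support_emb, support_labels))  # -> "A"
```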

Page 13: State of the Art - Israel Machine Vision Conference

Metric Learning: Relation networks, Sung et al., CVPR 2018

Use the Siamese Networks principle:

• Concatenate embeddings of the query and support samples

• The relation module is trained to produce a score of 1 for the correct class and 0 for the others

• Extends to zero-shot learning by replacing support embeddings with semantic features.

replicated from Sung et al., Learning to Compare: Relation Network for Few-Shot Learning, CVPR 2018
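A sketch of the comparison step under generic embedding and relation modules; the layer sizes and names are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

feat_dim = 64
relation_module = nn.Sequential(                 # scores a (query, support) pair
    nn.Linear(2 * feat_dim, 32), nn.ReLU(),
    nn.Linear(32, 1), nn.Sigmoid())              # relation score in [0, 1]

def relation_scores(query_emb, support_emb):
    """Concatenate the query embedding with each support embedding and
    let the relation module output one score per support class."""
    n = support_emb.size(0)
    pairs = torch.cat([query_emb.expand(n, -1), support_emb], dim=1)   # (n, 2*feat_dim)
    return relation_module(pairs).squeeze(1)                            # (n,)

# toy usage: 5-way 1-shot episode with random embeddings
scores = relation_scores(torch.randn(1, feat_dim), torch.randn(5, feat_dim))
# training would push the correct class's score toward 1 and the others toward 0 (MSE loss)
```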

Method | miniImageNet classification accuracy, 1-shot / 5-shot
Matching networks | 43.56 / 55.31
MAML | 48.70 / 63.11
Relation networks | 50.44 / 65.32
Meta-SGD | 54.24 / 70.86
LEO | 61.76 / 77.59

Page 14: State of the Art - Israel Machine Vision Conference

Metric Learning

Matching Networks, Vinyals et al., NIPS 2016

Objective: maximize the log-likelihood of the non-parametric softmax classifier, $\sum_{(x,y)} \log P_\theta(y \mid x, S)$, with

$$P_\theta(y \mid \hat{x}, S) = \mathrm{softmax}\big(\cos\big(f(\hat{x}, S),\, g(x_i, S)\big)\big)$$

Prototypical Networks, Snell et al., NIPS 2017:

Each category is represented by its mean sample (a single prototype $c_i$).

Objective: minimize the cross-entropy with the prototype-based probability expression:

$$P_\theta(y \mid \hat{x}, C) = \mathrm{softmax}\big(-L_2\big(f(\hat{x}),\, f(c_i)\big)\big)$$

Method | miniImageNet classification accuracy, 1-shot / 5-shot
Matching networks | 43.56 / 55.31
MAML | 48.70 / 63.11
Relation networks | 50.44 / 65.32
Prototypical Networks | 49.42 / 68.20
Meta-SGD | 54.24 / 70.86
LEO | 61.76 / 77.59

Page 15: State of the Art - Israel Machine Vision Conference

Metric Learning

Large Margin Meta-Learning, Wang et al., 2018

Regularize the cross-entropy (CE) based learning with a large-margin objective:

$$\mathcal{L} = \mathcal{L}_{CE} + \sum_{\text{triplets}} \big| d_p - d_n + \alpha \big|_+$$

where $d_p$ and $d_n$ are the distances from a sample to a positive and to a (hard) negative example (the distances used for the triplet loss).

RepMet: few-shot detection, Karlinsky et al., CVPR 2019

• Equip a standard detector with a distance-based classifier.

• Introduce class representatives $R_{ij}$ as model weights.

• Learn an embedding space using the objective

$$\mathcal{L} = \mathcal{L}_{CE} + \Big| \min_j d(E, R_{ij}) - \min_{j,\, k \neq i} d(E, R_{kj}) + \alpha \Big|_+$$
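Both objectives share the same hinged margin term; a tiny sketch of it in its simplest triplet form, with the distances given as toy placeholders:

```python
import torch

def margin_term(d_p, d_n, alpha=0.2):
    """|d_p - d_n + alpha|_+ : zero once the positive is closer than the
    (hard) negative by at least the margin alpha."""
    return torch.clamp(d_p - d_n + alpha, min=0.0)

# toy usage: distances from a sample to its positive and hard-negative neighbours
d_p = torch.tensor([0.3, 0.9])   # sample-to-positive distances
d_n = torch.tensor([1.0, 0.8])   # sample-to-(hard)-negative distances
print(margin_term(d_p, d_n))     # tensor([0.0000, 0.3000])
```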

Method | miniImageNet classification accuracy, 1-shot / 5-shot
Matching networks | 43.56 / 55.31
MAML | 48.70 / 63.11
Relation networks | 50.44 / 65.32
Prototypical Networks | 49.42 / 68.20
Large-margin | 51.08 / 67.57
Meta-SGD | 54.24 / 70.86
LEO | 61.76 / 77.59

Page 16: State of the Art - Israel Machine Vision Conference

Sample synthesis

Offline stage: using the offline data instances, train a synthesizer that samples from the class distribution; the synthesizer model encapsulates the data knowledge.

On new task data: the synthesizer takes the few data instances of the novel classes and generates many data instances, which are then used to train a model for the task.

Page 17: State of the Art - Israel Machine Vision Conference

Synthesizer optimized for classification

Low-Shot Learning from Imaginary Data, Wang et al., 2018

• The classifier $h$ is differentiable w.r.t. the training data

• In each episode, the backpropagated gradient updates the synthesizer

Method | ImageNet top-5 accuracy, Novel (1-shot / 5-shot) | Base+Novel (1-shot / 5-shot)
Wang et al., 2018 | 43.3 / 68.4 | 55.8 / 71.1

• The synthesizer is part of the classifier pipeline, trained end-to-end

reprinted from Wang et al., 2018
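An illustrative sketch of this end-to-end idea (not the paper's architecture): a feature "hallucinator" maps a support feature plus noise to an extra synthetic feature, and receives gradients through the classification loss. Module names and sizes are assumptions.

```python
import torch
import torch.nn as nn

feat_dim, n_way = 64, 5
hallucinator = nn.Sequential(nn.Linear(feat_dim * 2, feat_dim), nn.ReLU(),
                             nn.Linear(feat_dim, feat_dim))
classifier = nn.Linear(feat_dim, n_way)
opt = torch.optim.SGD(list(hallucinator.parameters()) + list(classifier.parameters()), lr=0.01)

def augment(support_feats, support_labels, n_extra=3):
    """Hallucinate extra features for each support sample (seed + noise -> new feature)."""
    seeds = support_feats.repeat_interleave(n_extra, dim=0)
    noise = torch.randn_like(seeds)
    fake = hallucinator(torch.cat([seeds, noise], dim=1))
    labels = support_labels.repeat_interleave(n_extra)
    return torch.cat([support_feats, fake]), torch.cat([support_labels, labels])

# one toy episode: the classification loss backpropagates into the hallucinator
feats, labels = augment(torch.randn(5, feat_dim), torch.arange(5))
loss = nn.functional.cross_entropy(classifier(feats), labels)
opt.zero_grad(); loss.backward(); opt.step()
```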

Page 18: State of the Art - Israel Machine Vision Conference

More augmentation approaches: Δ-encoder, Schwartz et al., NeurIPS 2018

• Use a variant of an autoencoder to capture the intra-class difference between two class samples in the latent space.

• Transfer class distributions from the training classes to the novel classes.

[Diagram: feature extractor $f_l$ with encoder $g_{Enc}$ and decoder $g_{Dec}$.]
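A simplified sketch of the idea with generic linear modules (not the published architecture): the encoder sees a (target, reference) pair and captures their difference; at synthesis time that delta is applied to a novel-class reference. All names and dimensions are hypothetical.

```python
import torch
import torch.nn as nn

feat_dim, delta_dim = 64, 16
enc = nn.Linear(feat_dim * 2, delta_dim)          # g_Enc: (target, reference) -> delta code
dec = nn.Linear(feat_dim + delta_dim, feat_dim)   # g_Dec: (reference, delta) -> reconstruction

def reconstruct(target, reference):
    """Training: encode the intra-class delta, reconstruct the target from the reference."""
    delta = enc(torch.cat([target, reference], dim=1))
    return dec(torch.cat([reference, delta], dim=1))

def synthesize(base_target, base_reference, novel_reference):
    """Synthesis: transfer a delta sampled from a base class onto a novel-class reference."""
    delta = enc(torch.cat([base_target, base_reference], dim=1))
    return dec(torch.cat([novel_reference, delta], dim=1))

# toy usage with random feature vectors (batch of 1)
x_new = synthesize(torch.randn(1, feat_dim), torch.randn(1, feat_dim), torch.randn(1, feat_dim))
```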

Semantic Feature Augmentation in Few-shot Learning, Chen et al., 2018

• Synthesize samples by adding noise in the autoencoder's bottleneck space.

• Make it into a semantic space, using word embeddings or visual attributes for the objective's fidelity term.

[Diagram: encoder-decoder with latent space Z; a delta between a sampled reference and a sampled target is applied to a new-class reference to synthesize a new-class example.]

Method | miniImageNet classification accuracy, 1-shot / 5-shot
Matching networks | 43.56 / 55.31
MAML | 48.70 / 63.11
Relation networks | 50.44 / 65.32
Prototypical Networks | 49.42 / 68.20
Large-margin | 51.08 / 67.57
Meta-SGD | 54.24 / 70.86
Semantic Feat. Aug. | 58.12 / 76.92
Δ-encoder | 59.9 / 66.7
LEO | 61.76 / 77.59

Page 19: State of the Art - Israel Machine Vision Conference

Augmentation with GANs: Covariance-Preserving Adversarial Augmentation Networks, Gao et al., NeurIPS 2018

• Act in feature space. Generate samples for novel categories from offline categories, selected by proximity of samples.

• Discriminate by classification: real image → correct class; synthetic image → 'fake'.

[Diagram: generator $G_n$ maps base-class features to novel-class features; the discriminator D outputs the class (cat, puma, …) or 'fake'.]

• Cycle-consistency constraint between the generators $G_n$ and $G_b$.

• Preserve the category distribution during the transfer of base → novel category. Objective: penalize the difference between the two covariance matrices via the Ky Fan m-norm, i.e., the sum of the singular values of the m-truncated SVD.
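For reference, a short sketch of the Ky Fan m-norm penalty (the sum of the m largest singular values); the covariance matrices here are random placeholders.

```python
import torch

def ky_fan_m_norm(matrix, m):
    """Sum of the m largest singular values (m-truncated SVD)."""
    singular_values = torch.linalg.svdvals(matrix)   # returned in descending order
    return singular_values[:m].sum()

# penalty between base-class and novel-class feature covariance matrices (toy data)
cov_base, cov_novel = torch.rand(8, 8), torch.rand(8, 8)
penalty = ky_fan_m_norm(cov_base - cov_novel, m=3)
```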

Method | ImageNet top-5 accuracy, Novel (1-shot / 5-shot) | Base+Novel (1-shot / 5-shot)
Wang et al., 2018 | 43.3 / 68.4 | 55.8 / 71.1
Gao et al., 2018 | 48.4 / 70.2 | 58.5 / 73.5

Page 20: State of the Art - Israel Machine Vision Conference

My personal view on the evolution of Machine Learning

Classic ML: one dataset, one task, one heavy training.

Few-shot ML: heavy offline training, then easy learning on similar tasks.

Developing ML: continuous life-long learning on various tasks.

Page 21: State of the Art - Israel Machine Vision Conference

Thank you

The presentation is available at http://www.research.ibm.com/haifa/dept/imt/ist_dm.shtml