Robust Deep Learning Based on Meta-learning


Page 1: Robust Deep Learning Based on Meta-learning

Deyu Meng

Xi’an Jiaotong University, [email protected]

http://gr.xjtu.edu.cn/web/dymeng

Robust Deep Learning Based on Meta-learning

Page 2: Robust Deep Learning Based on Meta-learning

• Deep Learning

• Robust

• Meta-learning

Page 3: Robust Deep Learning Based on Meta-learning

The success of deep learning relies on well-annotated & big data sets (e.g., LFW).

Page 4: Robust Deep Learning Based on Meta-learning

What we think we have vs. what we really have in practice:

Page 5: Robust Deep Learning Based on Meta-learning

Commonly Encountered Data Bias (low quality data)

Label noise Data noise Class imbalance

Page 6: Robust Deep Learning Based on Meta-learning

• Deep Learning

• Robust

• Meta-learning

Page 7: Robust Deep Learning Based on Meta-learning

Robust Machine Learning for Data Bias

Design specific optimization objective (especially, robust loss)

to make it robust to certain data bias:

Label noise  Data noise  Class imbalance

Lin et al., TPAMI, 2018; Yong et al., TPAMI, 2018; Meng et al., Information Sciences, 2017

Page 8: Robust Deep Learning Based on Meta-learning

Two Critical Issues

Generalized Cross Entropy (Zhang et al., NeurIPS, 2018)

Symmetric Cross Entropy (Wang et al., ICCV, 2019)

Bi-Tempered Logistic Loss (Amid et al., NeurIPS, 2019)

Polynomial Soft Weighting Loss (Zhao et al., AAAI, 2015)

Focal Loss (Lin et al., TPAMI, 2018)

CT Loss (Xie et al., TMI, 2018)

Hyperparameter tuning

Non-convexity

Page 9: Robust Deep Learning Based on Meta-learning

• Deep Learning

• Robust

• Meta-learning

Page 10: Robust Deep Learning Based on Meta-learning

Training Data vs. Validation Data

Hyper-parameter tuning: by validation data

Training loss:  $\boldsymbol{w}^{*}(\Theta) = \arg\min_{\boldsymbol{w}} \frac{1}{N}\sum_{i=1}^{N} L_i^{\mathrm{train}}(\boldsymbol{w};\Theta)$

Validation loss:  $\Theta^{*} \approx \arg\min_{\Theta \in \{\Theta_1,\Theta_2,\cdots,\Theta_s\}} \frac{1}{M}\sum_{i=1}^{M} L_i^{\mathrm{val}}(\boldsymbol{w}^{*}(\Theta))$

Page 11: Robust Deep Learning Based on Meta-learning

Training Data vs. Validation Data

Hyper-parameter tuning: by validation data

Training loss:  $\boldsymbol{w}^{*}(\Theta) = \arg\min_{\boldsymbol{w}} \frac{1}{N}\sum_{i=1}^{N} L_i^{\mathrm{train}}(\boldsymbol{w};\Theta)$

Validation loss:  $\Theta^{*} \approx \arg\min_{\Theta \in \{\Theta_1,\Theta_2,\cdots,\Theta_s\}} \frac{1}{M}\sum_{i=1}^{M} L_i^{\mathrm{val}}(\boldsymbol{w}^{*}(\Theta))$

✓ Low efficiency
✓ Low accuracy
✓ Search instead of optimization
✓ Heuristic instead of intelligent
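To make the search above concrete, here is a minimal, self-contained sketch (not from the slides; ridge regression on synthetic data stands in for the learner, and the regularization strength λ plays the role of Θ) of picking a hyper-parameter from a candidate set by the validation loss of the inner solution w*(Θ):

```python
# Hyper-parameter selection by search over a candidate set, as in the formula above:
# Theta* ~= argmin over {Theta_1,...,Theta_s} of the validation loss of w*(Theta).
import numpy as np

rng = np.random.default_rng(0)
X_tr, X_val = rng.normal(size=(80, 10)), rng.normal(size=(20, 10))
w_true = rng.normal(size=10)
y_tr = X_tr @ w_true + 0.5 * rng.normal(size=80)   # noisy training labels
y_val = X_val @ w_true                              # cleaner held-out labels

def fit_ridge(lam):
    """Inner problem: w*(Theta) = argmin_w training loss + lam * ||w||^2."""
    d = X_tr.shape[1]
    return np.linalg.solve(X_tr.T @ X_tr + lam * np.eye(d), X_tr.T @ y_tr)

def val_loss(w):
    """Outer objective: (1/M) sum_i L_i^val(w*(Theta)) on held-out data."""
    return np.mean((X_val @ w - y_val) ** 2)

candidates = [0.01, 0.1, 1.0, 10.0, 100.0]          # {Theta_1, ..., Theta_s}
best_lam = min(candidates, key=lambda lam: val_loss(fit_ridge(lam)))
print("selected lambda:", best_lam)
```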

Page 12: Robust Deep Learning Based on Meta-learning

• The function of validation data is at a higher level than that of training data
➢ Hyper-parameter tuning vs. classifier parameter learning
➢ It makes the model adaptable to the data to fit (general to specific)

• Validation data is different from training data!
➢ Teacher vs. student
➢ Ideal vs. real
➢ High quality vs. low quality
➢ Small scale vs. large scale
➢ Fixed vs. dynamic (relatively)

• What should we do?
➢ Lower the threshold for training data collection; raise the threshold for validation data selection

Intrinsic Functions of Validation Data

Page 13: Robust Deep Learning Based on Meta-learning

✓ Optimization instead of search

✓ Intelligent instead of heuristic (partially)

From Validation Loss Searching to Meta Loss Training

Hyper-parameter tuning: by meta data

Training loss:  $\boldsymbol{w}^{*}(\Theta) = \arg\min_{\boldsymbol{w}} \frac{1}{N}\sum_{i=1}^{N} L_i^{\mathrm{train}}(\boldsymbol{w};\Theta)$

Meta loss:  $\Theta^{*} = \arg\min_{\Theta \in \mathcal{G}} \frac{1}{M}\sum_{i=1}^{M} L_i^{\mathrm{meta}}(\boldsymbol{w}^{*}(\Theta))$
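By contrast with grid search, the meta-loss view treats Θ as a variable of an outer optimization. A minimal sketch, under the assumptions of a continuous hyper-parameter and a finite-difference hyper-gradient (the gradient-based methods cited on the next pages use exact or approximate analytic hyper-gradients instead):

```python
# "Optimization instead of search": update the hyper-parameter by gradient
# descent on the meta objective, rather than picking it from a grid.
import numpy as np

rng = np.random.default_rng(0)
X_tr, X_meta = rng.normal(size=(80, 10)), rng.normal(size=(20, 10))
w_true = rng.normal(size=10)
y_tr = X_tr @ w_true + 0.5 * rng.normal(size=80)   # noisy training labels
y_meta = X_meta @ w_true                            # clean meta labels

def w_star(log_lam):
    """Inner solution w*(Theta) in closed form; log-parameterization keeps lambda > 0."""
    lam = np.exp(log_lam)
    d = X_tr.shape[1]
    return np.linalg.solve(X_tr.T @ X_tr + lam * np.eye(d), X_tr.T @ y_tr)

def meta_loss(log_lam):
    """Outer objective: mean squared error of w*(Theta) on the meta set."""
    w = w_star(log_lam)
    return np.mean((X_meta @ w - y_meta) ** 2)

log_lam, lr, eps = 0.0, 0.3, 1e-4
for _ in range(50):
    # finite-difference hyper-gradient of the meta loss w.r.t. log(lambda)
    grad = (meta_loss(log_lam + eps) - meta_loss(log_lam - eps)) / (2 * eps)
    log_lam -= lr * grad
print("learned lambda:", np.exp(log_lam))
```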

Page 14: Robust Deep Learning Based on Meta-learning

Many Recent Attempts

◆ Loss function.

Wu L, Tian F, Xia Y, et al. Learning to teach with dynamic loss functions. In NeurIPS, 2018: 6466-6477.
Huang C, Zhai S, Talbott W, et al. Addressing the Loss-Metric Mismatch with Adaptive Loss Alignment. In ICML, 2019: 2891-2900.
Xu H, Zhang H, Hu Z, et al. AutoLoss: Learning Discrete Schedule for Alternate Optimization. In ICLR, 2019.
Li C, Yuan X, Lin C, et al. AM-LFS: AutoML for Loss Function Search. In ICCV, 2019: 8410-8419.
Grabocka J, Scholz R, Schmidt-Thieme L. Learning Surrogate Losses. arXiv preprint arXiv:1905.10108, 2019.

◆ Regularization.

Feng J, Simon N. Gradient-based regularization parameter selection for problems with nonsmooth penalty functions. Journal of Computational and Graphical Statistics, 2018, 27(2): 426-435.
Frecon J, Salzo S, Pontil M. Bilevel learning of the group lasso structure. In NeurIPS, 2018: 8301-8311.
Streeter M. Learning Optimal Linear Regularizers. In ICML, 2019: 5996-6004.

◆ Learner (NAS).

Zoph B, Le Q V. Neural architecture search with reinforcement learning. In ICLR, 2017.
Baker B, Gupta O, Naik N, et al. Designing neural network architectures using reinforcement learning. In ICLR, 2017.
Pham H, Guan M, Zoph B, et al. Efficient Neural Architecture Search via Parameter Sharing. In ICML, 2018: 4092-4101.
Zoph B, Vasudevan V, Shlens J, et al. Learning transferable architectures for scalable image recognition. In CVPR, 2018: 8697-8710.
Liu H, Simonyan K, Yang Y. DARTS: Differentiable architecture search. In ICLR, 2019.
Xie S, Zheng H, Liu C, et al. SNAS: Stochastic neural architecture search. In ICLR, 2019.
Liu C, Zoph B, Neumann M, et al. Progressive neural architecture search. In ECCV, 2018: 19-34.

Page 15: Robust Deep Learning Based on Meta-learning

Many Recent Attempts

◆ Hyper-parameter learning.

Maclaurin D, Duvenaud D, Adams R. Gradient-based hyperparameter optimization through reversible learning. In ICML, 2015: 2113-2122.
Pedregosa F. Hyperparameter optimization with approximate gradient. In ICML, 2016: 737-746.
Luketina J, Berglund M, Greff K, et al. Scalable gradient-based tuning of continuous regularization hyperparameters. In ICML, 2016: 2952-2960.
Franceschi L, Donini M, Frasconi P, et al. Forward and reverse gradient-based hyperparameter optimization. In ICML, 2017: 1165-1173.
Franceschi L, Frasconi P, Salzo S, et al. Bilevel Programming for Hyperparameter Optimization and Meta-Learning. In ICML, 2018: 1563-1572.

◆ Gradients and learning rate.

Andrychowicz M, Denil M, Gomez S, et al. Learning to learn by gradient descent by gradient descent. In NeurIPS, 2016.
Baydin A G, Cornish R, Rubio D M, et al. Online learning rate adaptation with hypergradient descent. In ICLR, 2018.
Jacobsen A, Schlegel M, Linke C, et al. Meta-descent for Online, Continual Prediction. In AAAI, 2019.
Metz L, et al. Understanding and correcting pathologies in the training of learned optimizers. In ICML, 2019: 4556-4565.
Xu Z, Dai A M, Kemp J, et al. Learning an Adaptive Learning Rate Schedule. arXiv preprint arXiv:1909.09712, 2019.

◆ Sample reweighting.

Jiang L, Zhou Z, Leung T, et al. MentorNet: Learning Data-Driven Curriculum for Very Deep Neural Networks on Corrupted Labels. In ICML, 2018: 2309-2318.
Ren M, Zeng W, Yang B, et al. Learning to Reweight Examples for Robust Deep Learning. In ICML, 2018: 4331-4340.
Shu J, Xie Q, Yi L, et al. Meta-Weight-Net: Learning an Explicit Mapping For Sample Weighting. In NeurIPS, 2019.
Zhao S, Fard M M, Narasimhan H, et al. Metric-Optimized Example Weights. In ICML, 2019: 7533-7542.

Page 16: Robust Deep Learning Based on Meta-learning

• Deep Learning

• Robust

• Meta-learning

Page 17: Robust Deep Learning Based on Meta-learning

Generalized Cross Entropy (Zhang et al., NeurIPS, 2018)

Symmetric Cross Entropy (Wang et al., ICCV, 2019)

Bi-Tempered Logistic Loss (Amid et al., NeurIPS, 2019)

Polynomial Soft Weighting Loss (Zhao et al., AAAI, 2015)

Adaptively Learning the Robust Loss

Page 18: Robust Deep Learning Based on Meta-learning

Training loss Meta loss

Hyperparameter Learning by Meta Learning

Shu, et al., submitted, 2019
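For illustration, the sketch below takes the q parameter of the Generalized Cross Entropy loss, $L_q = (1 - p_y^q)/q$, as the hyper-parameter and updates it by a one-step-lookahead meta gradient on a small clean meta set. This is a hedged sketch of the general idea, not the exact algorithm of Shu et al.; the toy linear model, synthetic data, and step sizes are assumptions.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d, C, N, M = 10, 3, 300, 30
X_tr, y_tr = torch.randn(N, d), torch.randint(0, C, (N,))       # (possibly noisy) training set
X_meta, y_meta = torch.randn(M, d), torch.randint(0, C, (M,))   # small clean meta set

W = torch.zeros(d, C, requires_grad=True)      # classifier parameters
q = torch.tensor(0.5, requires_grad=True)      # robust-loss hyper-parameter
alpha, beta = 0.1, 0.01                         # inner / outer step sizes

def gce_loss(logits, y, q):
    # Generalized Cross Entropy: L_q = (1 - p_y^q) / q  (CE as q -> 0, MAE-like at q = 1)
    p_y = torch.softmax(logits, dim=1).gather(1, y[:, None]).squeeze(1)
    return ((1.0 - p_y ** q) / q).mean()

for _ in range(100):
    # virtual one-step update of W under the current q (graph kept for the hyper-gradient)
    g_W = torch.autograd.grad(gce_loss(X_tr @ W, y_tr, q), W, create_graph=True)[0]
    W_virtual = W - alpha * g_W
    # meta loss on the clean meta set drives the update of q
    g_q = torch.autograd.grad(F.cross_entropy(X_meta @ W_virtual, y_meta), q)[0]
    with torch.no_grad():
        q -= beta * g_q
        q.clamp_(0.05, 1.0)                     # keep q inside the valid GCE range
    # actual update of W with the refreshed q
    g_W = torch.autograd.grad(gce_loss(X_tr @ W, y_tr, q.detach()), W)[0]
    with torch.no_grad():
        W -= alpha * g_W
```

Clamping q keeps the loss inside the GCE family while it is being adapted alongside the classifier.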

Page 19: Robust Deep Learning Based on Meta-learning

Experimental Results

Shu, et al., submitted, 2019

Page 20: Robust Deep Learning Based on Meta-learning

Experimental Results

✓ The hyper-parameter adaptively learned by meta-learning is actually not the optimal one for the original loss trained with a fixed hyper-parameter throughout its iterations.

✓ Meta-learning adaptively finds a proper hyper-parameter and simultaneously explores good network parameters under the current hyper-parameter, in a dynamic way.

✓ Such an adaptive learning manner is better suited to obtaining good values for both simultaneously, rather than updating one while the other is held fixed.

Shu, et al., submitted, 2019

Page 21: Robust Deep Learning Based on Meta-learning

What If the Model Contains a Large Number of Hyperparameters?

➢ The overfitting issue easily occurs (similar to conventional machine learning)
➢ How to alleviate this issue?
➢ Build a parametric prior representation (neither too large nor too small) for the hyperparameters (similar to conventional machine learning)
➢ Learner vs. meta-learner
➢ Need to deeply understand the data as well as the learning problem!

✓ Multi-view learning, multi-task learning (parameter - similar)

✓ Subspace learning (matrix – low rank)

Training loss Meta loss

Page 22: Robust Deep Learning Based on Meta-learning

What If the Model Contains a Large Number of Hyperparameters?

Page 23: Robust Deep Learning Based on Meta-learning

• Deep Learning

• Robust

• Meta-learning

Page 24: Robust Deep Learning Based on Meta-learning

Deep Learning with Training Data Bias

Problem: big data often comes with noisy labels or class imbalance.

Page 25: Robust Deep Learning Based on Meta-learning

Deep Networks tend to overfit to Training Data!

Deep neural networks easily fit (memorize) random labels.

Zhang C, Bengio S, Hardt M, et al. Understanding deep learning requires rethinking generalization. In ICLR, 2017 (Best Paper).

Zhang et al. (2017) found that:

Page 26: Robust Deep Learning Based on Meta-learning

How can we robustly train deep networks on biased training data to improve their generalization performance?

Page 27: Robust Deep Learning Based on Meta-learning

Related Work: Learning with Training Data Bias

◆ Sample weighting methods

✓ Dataset resampling (Chawla et al., 2002)
✓ Instance re-weighting (Zadrozny, 2004)
✓ AdaBoost (Freund & Schapire, 1997)
✓ Hard example mining (Malisiewicz et al., 2011)
✓ Focal loss (Lin et al., 2018)
✓ Self-paced learning (Kumar et al., 2010)
✓ Iterative reweighting (De la Torre & Black, 2003; Zhang & Sabuncu, 2018)
✓ Prediction variance (Chang et al., 2017)

◆ Meta learning methods
✓ FWL (Dehghani et al., 2018)
✓ Learning to teach (Fan et al., 2018; Wu et al., 2018)
✓ MentorNet (Jiang et al., 2018)
✓ L2RW (Ren et al., 2018)

◆ Other methods
✓ GLC (Hendrycks et al., 2018)
✓ Reed (Reed et al., 2015)
✓ Co-teaching (Han et al., 2018)
✓ D2L (Ma et al., 2018)
✓ S-Model (Goldberger & Ben-Reuven, 2017)

Page 28: Robust Deep Learning Based on Meta-learning

Sample weighting methods

Existing studies define the curriculum as a hand-designed weighting function for specific tasks, with extra hyper-parameters that must be set.

Strategy | Regularizer G | Weight v*
Self-paced [Kumar et al., NIPS 2010] | −λ‖v‖₁ | v* = 𝕀(l_i ≤ λ)
Linear weighting [Jiang et al., AAAI 2015] | (λ/2) Σ_{i=1}^{n} (v_i² − 2v_i) | v* = max(0, 1 − l_i/λ)
Focal loss [Lin et al., ICCV 2017] | − | v* = (1 − exp(−l_i))^α
Hard example mining [Malisiewicz et al., ICCV 2011] | − | v* = 𝕀(l_i > λ(1 − y_i))
Prediction variance [Chang et al., NIPS 2017] | − | v* = (1/Z)(Var(l_i) + Var(l_i)/|l_i|)

Page 29: Robust Deep Learning Based on Meta-learning

Strategy | Regularizer G | Weight v*
Self-paced [Kumar et al., NIPS 2010] | −λ‖v‖₁ | v* = 𝕀(l_i ≤ λ)
Linear weighting [Jiang et al., AAAI 2015] | (λ/2) Σ_{i=1}^{n} (v_i² − 2v_i) | v* = max(0, 1 − l_i/λ)
Focal loss [Lin et al., ICCV 2017] | − | v* = (1 − exp(−l_i))^α
Hard example mining [Malisiewicz et al., ICCV 2011] | − | v* = 𝕀(l_i > λ(1 − y_i))
Prediction variance [Chang et al., NIPS 2017] | − | v* = (1/Z)(Var(l_i) + Var(l_i)/|l_i|)

⚫ Need to pre-specify the form of the weighting function

⚫ Need to manually set hyper-parameters

Sample weighting methods
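Written as functions of the per-sample loss, the hand-designed rules from the table above look as follows; the λ and α arguments are exactly the hyper-parameters that must be set manually. A minimal numpy sketch with illustrative values:

```python
import numpy as np

def self_paced_weight(losses, lam):
    """Self-paced: v* = 1 if l_i <= lambda else 0 (hard selection of easy samples)."""
    return (losses <= lam).astype(float)

def linear_weight(losses, lam):
    """Linear (soft) self-paced weighting: v* = max(0, 1 - l_i / lambda)."""
    return np.maximum(0.0, 1.0 - losses / lam)

def focal_style_weight(losses, alpha):
    """Focal-style: larger loss gets larger weight, v* = (1 - exp(-l_i))^alpha."""
    return (1.0 - np.exp(-losses)) ** alpha

losses = np.array([0.1, 0.5, 1.0, 3.0])
print(self_paced_weight(losses, lam=1.0))      # emphasizes easy samples
print(focal_style_weight(losses, alpha=2.0))   # emphasizes hard samples
```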

Page 30: Robust Deep Learning Based on Meta-learning

Meta Data and Meta Loss

Meta data / Training data

Page 31: Robust Deep Learning Based on Meta-learning

L2RW [Ren et al., ICML 2018]

Directly learning example weights from training and meta data
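A condensed PyTorch sketch of the L2RW idea: per-batch example weights are obtained from the gradient of the meta loss with respect to zero-initialized perturbation weights, with no extra learnable meta parameters. The toy linear classifier, batch sizes, and step size below are illustrative assumptions, not the paper's setup.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d, C, alpha = 10, 3, 0.1
W = torch.zeros(d, C, requires_grad=True)                      # classifier parameters
x_tr, y_tr = torch.randn(32, d), torch.randint(0, C, (32,))    # (possibly noisy) training batch
x_meta, y_meta = torch.randn(8, d), torch.randint(0, C, (8,))  # small clean meta batch

# zero-initialized perturbation weights on the per-sample training losses
eps = torch.zeros(x_tr.size(0), requires_grad=True)
losses = F.cross_entropy(x_tr @ W, y_tr, reduction="none")
g = torch.autograd.grad((eps * losses).sum(), W, create_graph=True)[0]
W_virtual = W - alpha * g                                      # one-step lookahead

# example weights = rectified negative meta-gradient w.r.t. eps, then normalized
meta_loss = F.cross_entropy(x_meta @ W_virtual, y_meta)
v = torch.clamp(-torch.autograd.grad(meta_loss, eps)[0], min=0.0)
v = v / (v.sum() + 1e-8)

# actual update of W with the per-sample weights v
weighted_loss = (v.detach() * F.cross_entropy(x_tr @ W, y_tr, reduction="none")).sum()
g_W = torch.autograd.grad(weighted_loss, W)[0]
with torch.no_grad():
    W -= alpha * g_W
```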

Page 32: Robust Deep Learning Based on Meta-learning

Meta Data and Meta Loss

Meta data / Training data

Training Loss

Input Structure

Meta Loss

Page 33: Robust Deep Learning Based on Meta-learning

MentorNet [Jiang et al., ICML 2018]

The meta-learner is complex and hard to reproduce.

Very complex input; very complex Θ.

Page 34: Robust Deep Learning Based on Meta-learning

Our work

Meta-Weight-Net

Input: loss. Θ: MLP.
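A minimal PyTorch sketch of this meta-learner: a one-hidden-layer MLP that maps each per-sample training loss to a weight in [0, 1]. The hidden width (100) and activations follow the usual description of Meta-Weight-Net, but treat them here as illustrative assumptions.

```python
import torch
import torch.nn as nn

class MetaWeightNet(nn.Module):
    """Maps a per-sample training loss l_i to a weight v_i in [0, 1]."""
    def __init__(self, hidden=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(1, hidden), nn.ReLU(),     # input: scalar loss
            nn.Linear(hidden, 1), nn.Sigmoid(),  # output: weight in [0, 1]
        )

    def forward(self, losses):                   # losses: tensor of shape (batch,)
        return self.net(losses.unsqueeze(1)).squeeze(1)

weights = MetaWeightNet()(torch.tensor([0.2, 1.5, 4.0]))  # example usage
```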

Page 35: Robust Deep Learning Based on Meta-learning

Our work

Inner loop:

Outer loop:

Notation:

◆ Θ: parameters of the teacher
◆ 𝑤: parameters of the student

Meta-Weight-Net (Shu et al., NeurIPS, 2019)

Page 36: Robust Deep Learning Based on Meta-learning

Our work

[Steps 5–7 of the Meta-Weight-Net algorithm; the update equations are shown as figures on this slide.]

Shu, et al., NeurIPS, 2019
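A hedged, self-contained sketch of one training iteration in the spirit of Steps 5–7 above: a virtual one-step update of the student w under the current weight net Θ, a meta update of Θ on clean meta data, and then the actual update of w. The toy linear classifier, synthetic data, and step sizes are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
d, C, alpha, beta = 10, 3, 0.1, 0.01
W = torch.zeros(d, C, requires_grad=True)                        # student parameters w
vnet = nn.Sequential(nn.Linear(1, 100), nn.ReLU(),
                     nn.Linear(100, 1), nn.Sigmoid())            # teacher Theta: loss -> weight
opt_vnet = torch.optim.SGD(vnet.parameters(), lr=beta)
x_tr, y_tr = torch.randn(64, d), torch.randint(0, C, (64,))      # noisy training batch
x_meta, y_meta = torch.randn(16, d), torch.randint(0, C, (16,))  # clean meta batch

# (Step 5 analog) virtual one-step update of W with weights from the current vnet
losses = F.cross_entropy(x_tr @ W, y_tr, reduction="none")
v = vnet(losses.detach().unsqueeze(1)).squeeze(1)
g = torch.autograd.grad((v * losses).mean(), W, create_graph=True)[0]
W_virtual = W - alpha * g

# (Step 6 analog) update vnet (Theta) by the meta loss of the virtually updated student
meta_loss = F.cross_entropy(x_meta @ W_virtual, y_meta)
opt_vnet.zero_grad()
meta_loss.backward()        # second-order hyper-gradient flows back into vnet
opt_vnet.step()

# (Step 7 analog) actual update of W using weights from the updated vnet
losses = F.cross_entropy(x_tr @ W, y_tr, reduction="none")
with torch.no_grad():
    v = vnet(losses.detach().unsqueeze(1)).squeeze(1)
g_W = torch.autograd.grad((v * losses).mean(), W)[0]
with torch.no_grad():
    W -= alpha * g_W
```

Freezing the weight net in the final step keeps the student update a plain weighted SGD step, which is what makes this style of method a drop-in wrapper around ordinary training.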

Page 37: Robust Deep Learning Based on Meta-learning

Our work

Shu, et al., NeurIPS, 2019

Page 38: Robust Deep Learning Based on Meta-learning

Experiments

Page 39: Robust Deep Learning Based on Meta-learning

Experimental Setup: Class Imbalance

Datasets: CIFAR-10 & CIFAR-100

Shu, et al., NeurIPS, 2019

Page 40: Robust Deep Learning Based on Meta-learning

Experimental Setup: Noisy Label

Datasets: CIFAR-10 & CIFAR-100

Shu, et al., NeurIPS, 2019

Page 41: Robust Deep Learning Based on Meta-learning

Stability analysis of Meta-Weight-Net

Shu, et al., NeurIPS, 2019

Page 42: Robust Deep Learning Based on Meta-learning

Real Data Experiment

Shu, et al., NeurIPS, 2019

Page 43: Robust Deep Learning Based on Meta-learning

Insight: Adaptively Learn the Weight Function

Shu, et al., NeurIPS, 2019

Page 44: Robust Deep Learning Based on Meta-learning

Future research

◆ Extension to other semi-/weakly-supervised learning problems

◆ Further improvements to Meta-Weight-Net

◆ Multi-view learning, ensemble learning, domain adaptation

◆ General hyper-parameter learning (meta-learner design)

Page 45: Robust Deep Learning Based on Meta-learning

Jun Shu, Qian Zhao, Keyu Chen, Zongben Xu, Deyu Meng. Learning Adaptive Loss for Robust Learning with Noisy Labels. arXiv:2002.06482, 2020.

Jun Shu, Qi Xie, Lixuan Yi, Qian Zhao, Sanping Zhou, Zongben Xu, Deyu Meng. Meta-Weight-Net: Learning an Explicit Mapping For Sample Weighting. NeurIPS, 2019.

Page 46: Robust Deep Learning Based on Meta-learning