[email protected] mayank singh arxiv:2005.01499v1 [cs.cv ... · mayank singh adobe inc, noida,...

6
Published as a conference paper at ICLR 2020 O N THE B ENEFITS OF MODELS WITH P ERCEPTUALLY - A LIGNED G RADIENTS Gunjan Aggarwal * Adobe Inc, Noida, India [email protected] Abhishek Sinha *† Stanford University [email protected] Nupur Kumari * Adobe Inc, Noida, India [email protected] Mayank Singh * Adobe Inc, Noida, India [email protected] ABSTRACT Adversarial robust models have been shown to learn more robust and interpretable features than standard trained models. As shown in [Tsipras et al. (2018)], such robust models inherit useful interpretable properties where the gradient aligns per- ceptually well with images, and adding a large targeted adversarial perturbation leads to an image resembling the target class. We perform experiments to show that interpretable and perceptually aligned gradients are present even in models that do not show high robustness to adversarial attacks. Specifically, we perform adversarial training with attack for different max-perturbation bound. Adversarial training with low max-perturbation bound results in models that have interpretable features with only slight drop in performance over clean samples. In this paper, we leverage models with interpretable perceptually-aligned features and show that ad- versarial training with low max-perturbation bound can improve the performance of models for zero-shot and weakly supervised localization tasks. 1 I NTRODUCTION Adversarial examples are crafted by adding visually imperceptible perturbation to clean inputs that leads to a significant change in the network’s prediction [Szegedy et al. (2014); Goodfellow et al. (2015)]. Adversarially trained classifiers are optimized using min-max objective to achieve high accuracy on adversarially-perturbed training examples [Madry et al. (2018b)]. Recently, [Tsipras et al. (2018); Santurkar et al. (2019)] demonstrated that adversarial trained robust networks have gradients that are perceptually-aligned such that updating an image in the direction of the gradient to maximize the score of target class changes the image to posses visual features of the target class. Also, [Kaur et al. (2019)] extended the perceptually meaningful property to randomized smoothing robust models and argued that the perceptually-aligned gradients might be a general property of robust models. In this paper, we show that even adversarially trained models on low max-perturbation bound have perceptually-aligned interpretable features. Although these models don’t show high adversarial ro- bustness but have comparable test accuracy to standard trained models unlike adversarially trained models on high max-perturbation bound. We explore the advantages of these interpretable features on downstream tasks and show improvements on zero-shot and weakly supervised localization tasks. 2 EXPERIMENTS AND RESULTS In this section, we start with a brief description of the datasets and the model architecture used, following up with details of the experiment. * Authors contributed equally work done while author was working at Adobe, India 1 arXiv:2005.01499v1 [cs.CV] 4 May 2020

Upload: others

Post on 18-Aug-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: nupkumar@adobe.com Mayank Singh arXiv:2005.01499v1 [cs.CV ... · Mayank Singh Adobe Inc, Noida, India msingh@adobe.com ABSTRACT Adversarial robust models have been shown to learn

Published as a conference paper at ICLR 2020

ON THE BENEFITS OF MODELS WITH PERCEPTUALLY-ALIGNED GRADIENTS

Gunjan Aggarwal*  Adobe Inc, Noida, India  guaggarw@adobe.com

Abhishek Sinha*†  Stanford University  a7b23@stanford.edu

Nupur Kumari*  Adobe Inc, Noida, India  nupkumar@adobe.com

Mayank Singh*  Adobe Inc, Noida, India  msingh@adobe.com

ABSTRACT

Adversarially robust models have been shown to learn more robust and interpretable features than standard trained models. As shown in Tsipras et al. (2018), such robust models inherit useful interpretable properties where the gradient aligns perceptually well with images, and adding a large targeted adversarial perturbation leads to an image resembling the target class. We perform experiments to show that interpretable and perceptually aligned gradients are present even in models that do not show high robustness to adversarial attacks. Specifically, we perform adversarial training with attacks of different max-perturbation bounds. Adversarial training with a low max-perturbation bound results in models that have interpretable features with only a slight drop in performance over clean samples. In this paper, we leverage models with interpretable perceptually-aligned features and show that adversarial training with a low max-perturbation bound can improve the performance of models on zero-shot and weakly supervised localization tasks.

1 INTRODUCTION

Adversarial examples are crafted by adding a visually imperceptible perturbation to clean inputs that leads to a significant change in the network's prediction [Szegedy et al. (2014); Goodfellow et al. (2015)]. Adversarially trained classifiers are optimized using a min-max objective to achieve high accuracy on adversarially-perturbed training examples [Madry et al. (2018b)]. Recently, [Tsipras et al. (2018); Santurkar et al. (2019)] demonstrated that adversarially trained robust networks have gradients that are perceptually aligned, such that updating an image in the direction of the gradient to maximize the score of a target class changes the image to possess visual features of the target class. [Kaur et al. (2019)] extended this perceptually meaningful property to randomized-smoothing robust models and argued that perceptually-aligned gradients might be a general property of robust models.
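For reference, the min-max objective of adversarial training (Madry et al., 2018b) referred to above is

    min_θ E_{(x,y)∼D} [ max_{‖δ‖_∞ ≤ ε} L(f_θ(x + δ), y) ],

where f_θ is the classifier, L the classification loss, D the data distribution, and ε the max-perturbation bound; it is exactly this ε that is varied in the experiments below.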

In this paper, we show that even models adversarially trained with a low max-perturbation bound have perceptually-aligned interpretable features. Although these models do not show high adversarial robustness, they have test accuracy comparable to standard trained models, unlike models adversarially trained with a high max-perturbation bound. We explore the advantages of these interpretable features on downstream tasks and show improvements on zero-shot and weakly supervised localization tasks.

2 EXPERIMENTS AND RESULTS

In this section, we start with a brief description of the datasets and the model architecture used, followed by details of the experiments.

*Authors contributed equally.
†Work done while the author was working at Adobe, India.


Model     Adversarial accuracy (10-step PGD attack)
          ε = 0.0/255.0    ε = 4.0/255.0    ε = 8.0/255.0
Natural   90.88            0.01             0.00
AT-1      90.95            34.33            4.53
AT-2      90.88            50.76            14.41
AT-4      89.79            61.28            27.30
AT-8      87.25            73.99            47.37

Table 1: Adversarial robustness of different models on CIFAR-10.

Figure 1: Visualization of image-level gradients for the CIFAR-10 and SVHN datasets. Column 1 denotes the original image. Columns 2-7 denote the gradients from the Natural, AT-1, AT-2, AT-4, AT-8 models respectively.

Datasets: We perform experiments on five standard datasets: USPS [Hull (1994)], MNIST [Lecun et al. (1989)], CIFAR-10 [Krizhevsky et al. (2010)], SVHN [Netzer et al. (2011)] and Restricted ImageNet [Tsipras et al. (2018)]. MNIST consists of grayscale handwritten images of digits of size 28 × 28. We use the network architecture described in Madry et al. (2018a). USPS, like MNIST, has handwritten digits for 10 different classes; however, the image size is 16 × 16. MNIST and USPS follow different data distributions and are thus often used to check the zero-shot or transfer learning performance of models. CIFAR-10 consists of RGB images of size 32 × 32 for 10 different classes. SVHN contains RGB images of street-view house numbers for 10 different classes. Restricted ImageNet consists of a subset of ImageNet classes grouped into 9 different classes, with RGB images of dimension 224 × 224. We perform experiments with the WRN-28-10 model [Zagoruyko & Komodakis (2016)] for SVHN and CIFAR-10, and with ResNet50 [He et al. (2016)] for the Restricted ImageNet dataset.

Methodology: We perform adversarial training (AT) using either FGSM [Goodfellow et al. (2015); Wong et al. (2020)] or PGD [Madry et al. (2018b)]. We found similar results on the mentioned downstream tasks for FGSM and PGD adversarially trained models.

Qualitative evaluation: We perform adversarial training (AT) for CIFAR-10 and SVHN with the WRN-28-10 model under an ℓ∞ norm constraint for a range of maximum perturbation bounds (ε). Different models were trained for ε ∈ [1.0/255.0, 2.0/255.0, 4.0/255.0, 8.0/255.0] with a batch size of 128 for 30,000 and 5,000 steps for CIFAR-10 and SVHN respectively. We denote the adversarially trained model with ℓ∞ perturbation bound ε as AT-ε, and the standard trained model, which uses the clean samples only, as Natural.
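As a concrete illustration, ℓ∞ PGD adversarial training with a configurable max-perturbation bound ε can be sketched in a PyTorch style as follows; the function names, the random start, the 10-step attack with step size ε/4, and the switch to evaluation mode while crafting the attack are illustrative assumptions rather than settings reported in the paper.

import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps, alpha, steps):
    # l_inf PGD: repeatedly ascend the loss and project back into the eps-ball around x.
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0.0, 1.0)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()                    # ascent step
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # project to the eps-ball
            x_adv = x_adv.clamp(0.0, 1.0)                          # keep a valid image
    return x_adv.detach()

def adversarial_training_step(model, optimizer, x, y, eps):
    # One min-max step: craft adversarial examples at bound eps, then minimise the loss on them.
    model.eval()  # keep batch-norm statistics fixed while crafting the attack
    x_adv = pgd_attack(model, x, y, eps=eps, alpha=eps / 4, steps=10)
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()

Under this naming, AT-1 corresponds to eps = 1.0/255.0; the same pgd_attack routine, run with the ε values of Table 1, can be reused to measure the adversarial accuracies reported there.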

The robustness of the different models against a strong adversarial attack (PGD) for various values of ε over the CIFAR-10 dataset is reported in Table 1. We find that models trained against small-ε adversarial perturbations do not show high robustness against strong adversarial attacks.

Figure 2: High-ε adversarial images. Column 1 denotes the original image. Columns 2-7 denote high-ε adversarial images from the Natural, AT-1, AT-2, AT-4, AT-8 models respectively.

Figure 3: Example images generated by a large-ε untargeted ℓ2 attack using a ResNet-50 model trained by adversarial training with ε = 2 on Restricted ImageNet. The top row denotes the original images while the bottom row denotes the adversarial images.

The image-level gradients for all the different models, computed as in Tsipras et al. (2018), are visualized in Figure 1. As can be seen from the figure, while the gradients of the natural model look like a noisy version of the original image, the gradients of the models trained via adversarial training for different ε align well with the original image. This holds even for the adversarially trained models with low values of ε.
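The image-level gradients of Figure 1 need only a single backward pass of the loss with respect to the input; a short PyTorch-style sketch is given below, where the per-image min-max rescaling for display is a common visualization choice rather than a reported one.

import torch
import torch.nn.functional as F

def input_gradient(model, x, y):
    # Gradient of the classification loss with respect to the input images.
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    return torch.autograd.grad(loss, x)[0].detach()

def rescale_for_display(grad):
    # Rescale each gradient image to [0, 1] so it can be shown next to the original.
    flat = grad.flatten(start_dim=1)
    lo = flat.min(dim=1).values.view(-1, 1, 1, 1)
    hi = flat.max(dim=1).values.view(-1, 1, 1, 1)
    return (grad - lo) / (hi - lo + 1e-12)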

We also perform a non-targeted adversarial attack at a high value of ε = 32.0/255.0. We use a PGD attack with a learning rate of 1.6/255.0 and 50 steps to compute the attack. Again, we find that while such an attack induces random noise for the Natural model, for the other models such adversarial images often resemble images belonging to different classes. The results are shown in Figure 2. We can see a cat changing to a dog, a frog changing to a deer, and a truck changing to a car for the different models over the CIFAR-10 dataset. We repeat this experiment over the Restricted ImageNet dataset too. Figure 3 shows some of the images generated from a ResNet50 model trained via adversarial training with ε = 2.
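The large-ε untargeted ℓ2 attack behind Figure 3 is ordinary PGD with a normalized gradient step and an ℓ2 projection; the sketch below is an assumed implementation, with the per-step length chosen arbitrarily since it is not reported.

import torch
import torch.nn.functional as F

def l2_pgd_untargeted(model, x, y, eps=2.0, alpha=0.1, steps=50):
    # Untargeted l2 PGD: push the image away from its true class within an l2 ball of radius eps.
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            # normalise the gradient per image so every step has l2 length alpha
            g_norm = grad.flatten(1).norm(dim=1).clamp_min(1e-12).view(-1, 1, 1, 1)
            x_adv = x_adv + alpha * grad / g_norm
            # project back onto the l2 ball around the original image
            delta = x_adv - x
            d_norm = delta.flatten(1).norm(dim=1).view(-1, 1, 1, 1)
            x_adv = x + delta * torch.clamp(eps / (d_norm + 1e-12), max=1.0)
            x_adv = x_adv.clamp(0.0, 1.0)
    return x_adv.detach()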

Interestingly, both of the above phenomena were shown to be properties of adversarially robust models in Tsipras et al. (2018). In contrast, we show that not-so-robust models trained via adversarial training for small values of ε also exhibit such properties.

Zero-shot learning: Models that learn more interpretable features should also perform better under shifts in data distribution. We show the performance of adversarially trained models for two zero-shot tasks: SVHN→MNIST and MNIST→USPS. For both of these tasks, the model was trained over the source dataset and evaluated over the validation sets of both the source and target datasets. We define sourceAcc and targetAcc as the validation accuracies over the source and target datasets respectively.

SVHN→MNIST: We perform adversarial training for ε ∈ [1.0/255.0, 2.0/255.0, 4.0/255.0, 8.0/255.0] over the SVHN dataset. Again, AT-1, AT-2, AT-4, AT-8 refer to the corresponding adversarially trained models. As a baseline, we also train models with Gaussian noise of varying strength added to the inputs, as done in [Kaur et al. (2019)]. The corresponding models are referred to as N-1, N-2, N-4, N-8. The source and target validation accuracies are reported in Table 2.

Model         sourceAcc        targetAcc
Natural       94.72            53.98
AT-1 / N-1    95.40 / 95.36    75.99 / 74.65
AT-2 / N-2    94.73 / 94.93    75.73 / 72.14
AT-4 / N-4    93.31 / 94.21    77.26 / 74.30
AT-8 / N-8    90.02 / 90.68    69.25 / 58.20

Table 2: Zero-shot accuracy for SVHN→MNIST.

MNIST→USPS: We perform adversarial training for ε ∈ [0.05, 0.1, 0.2, 0.3] over the MNIST dataset and then evaluate the performance of the trained model over the USPS dataset. The results are reported in Table 3.

Model               sourceAcc        targetAcc
Natural             99.17            75.14
AT-0.05 / N-0.05    99.37 / 98.71    81.81 / 80.32
AT-0.1 / N-0.1      99.23 / 99.1     82.41 / 81.81
AT-0.2 / N-0.2      99.04 / 99.2     77.68 / 79.92
AT-0.3 / N-0.3      98.49 / 91.9     73.99 / 81.22

Table 3: Zero-shot accuracy for MNIST→USPS.

Figure 4: Examples of the estimated bounding box and heatmap produced by an adversarially trained ResNet50 model on randomly chosen images of the CUB dataset. The red bounding box is the ground truth and the green bounding box corresponds to the estimated box.

For both zero-shot tasks, we find that the adversarially trained models with low ε often achieve higher performance over the target dataset than the naturally trained models. Further, the performance of AT models with low ε often exceeds that of AT models trained with high ε.
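The zero-shot numbers are plain cross-dataset evaluation of an already-trained classifier. A sketch for SVHN→MNIST is shown below, assuming torchvision datasets and a preprocessing step that resizes MNIST digits to SVHN's 32 × 32 RGB format so that the SVHN-trained WRN-28-10 can be applied unchanged; the exact preprocessing used in the paper is not stated.

import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

def accuracy(model, loader, device="cuda"):
    # Fraction of correctly classified samples, in percent.
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            correct += (model(x).argmax(dim=1) == y).sum().item()
            total += y.numel()
    return 100.0 * correct / total

# Map MNIST digits into SVHN's input format (assumed preprocessing).
to_svhn_format = transforms.Compose([
    transforms.Resize(32),
    transforms.Grayscale(num_output_channels=3),
    transforms.ToTensor(),
])
target_loader = DataLoader(
    datasets.MNIST(root="data", train=False, download=True, transform=to_svhn_format),
    batch_size=256,
)
# targetAcc = accuracy(svhn_trained_model, target_loader)   # svhn_trained_model is a placeholder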

Weakly Supervised Object Localization (WSOL) aims to localize the object in a given image through weak supervisory signals such as image labels, instead of rich bounding-box annotations. Most prior techniques (Choe & Shim, 2019; Zhang et al., 2018a;b) exploit the CAM (Zhou et al., 2016) image interpretation technique to predict the bounding box of the object using a trained model. We hypothesize that models having perceptually aligned gradients with respect to the image and high test accuracy should prove useful in this domain. We show empirically in Table 4 that an adversarially trained model with a low max-perturbation bound, which has perceptually aligned gradients, performs comparably to the state-of-the-art in WSOL on the CUB (Wah et al., 2011) dataset. We report the evaluation metrics used in Choe & Shim (2019): Top-1 classification accuracy (Top-1 Acc); localization accuracy when the ground truth class is known (GT-Known Loc), i.e. when the intersection over union (IoU) of the estimated box and the ground-truth bounding box is > 0.5; and Top-1 localization accuracy (Top-1 Loc), i.e. when the prediction is correct and the IoU of the bounding box is > 0.5. We show some sample images and estimated bounding boxes in Figure 4.

Model          Method                      GT-Known Loc   Top-1 Loc   Top-1 Acc
ResNet50-SE    ADL (Choe & Shim, 2019)     -              62.29       80.34
ResNet50       Natural                     60.37          50.0        81.12
ResNet50       AT-0.5                      77.93          61.03       78.69
ResNet50       AT-1                        74.55          55.89       75.88
ResNet50       AT-2                        69.93          50.10       70.02

Table 4: Weakly supervised localization on the CUB dataset. Bold text refers to the best Top-1 Loc and underline denotes the second best.
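The localization numbers follow the standard CAM recipe: weight the final convolutional feature maps by the classifier weights of the chosen class, threshold the upsampled heatmap, and fit a box. The sketch below assumes a ResNet-50-style model whose last layer is a linear classifier over globally average-pooled features; the 0.2 threshold and the rule of taking a tight box around all above-threshold pixels (rather than the largest connected component) are simplifying assumptions.

import numpy as np
import torch
import torch.nn.functional as F

def cam_bounding_box(features, fc_weight, class_idx, image_size, thresh=0.2):
    # features: (C, h, w) activations of the last conv block for one image
    # fc_weight: (num_classes, C) weight matrix of the final linear layer
    cam = torch.einsum("c,chw->hw", fc_weight[class_idx], features)  # class activation map
    cam = F.relu(cam)
    cam = F.interpolate(cam[None, None], size=image_size, mode="bilinear",
                        align_corners=False)[0, 0]
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-12)        # normalise to [0, 1]
    ys, xs = np.where((cam >= thresh).cpu().numpy())
    if len(xs) == 0:
        return (0, 0, image_size[1] - 1, image_size[0] - 1)          # fall back to the full image
    return (xs.min(), ys.min(), xs.max(), ys.max())                  # (x1, y1, x2, y2)

def iou(box_a, box_b):
    # Intersection over union of two (x1, y1, x2, y2) boxes; GT-Known Loc counts IoU > 0.5.
    xa, ya = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    xb, yb = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, xb - xa + 1) * max(0, yb - ya + 1)
    area_a = (box_a[2] - box_a[0] + 1) * (box_a[3] - box_a[1] + 1)
    area_b = (box_b[2] - box_b[0] + 1) * (box_b[3] - box_b[1] + 1)
    return inter / float(area_a + area_b - inter)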


3 CONCLUSION

We observe that adversarial training with low values of ε also induces features that are perceptually aligned, unlike standard trained models. While these models do not show strong robustness against adversarial attacks, performing an adversarial attack of large magnitude often generates an image belonging to a different class. We further show that such models outperform naturally trained models and AT models with high values of ε on two different zero-shot tasks and a weakly supervised localization task. We hope that our findings will spur research aimed at harnessing perceptually-aligned model features to further generalize to additional downstream tasks.

REFERENCES

Junsuk Choe and Hyunjung Shim. Attention-based dropout layer for weakly supervised object localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2219-2228, 2019.

Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. ICLR, 2015.

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770-778, 2016.

J. J. Hull. A database for handwritten text recognition research. IEEE Transactions on Pattern Analysis and Machine Intelligence, 16(5):550-554, May 1994.

Simran Kaur, Jeremy Cohen, and Zachary C. Lipton. Are perceptually-aligned gradients a general property of robust classifiers? arXiv preprint arXiv:1910.08640, 2019.

Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. CIFAR-10. URL http://www.cs.toronto.edu/~kriz/cifar.html, 2010.

Yann LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel. Backpropagation applied to handwritten zip code recognition. 1989.

Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. ICLR, 2018a.

Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. ICLR, 2018b.

Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y. Ng. Reading digits in natural images with unsupervised feature learning. NIPS Workshop, 2011.

Shibani Santurkar, Dimitris Tsipras, Brandon Tran, Andrew Ilyas, Logan Engstrom, and Aleksander Madry. Image synthesis with a single (robust) classifier. arXiv preprint arXiv:1906.09453, 2019.

Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian J. Goodfellow, and Rob Fergus. Intriguing properties of neural networks. ICLR, 2014.

Dimitris Tsipras, Shibani Santurkar, Logan Engstrom, Alexander Turner, and Aleksander Madry. Robustness may be at odds with accuracy. arXiv preprint arXiv:1805.12152, 2018.

C. Wah, S. Branson, P. Welinder, P. Perona, and S. Belongie. The Caltech-UCSD Birds-200-2011 dataset. Technical Report CNS-TR-2011-001, California Institute of Technology, 2011.

Eric Wong, Leslie Rice, and J. Zico Kolter. Fast is better than free: Revisiting adversarial training. ICLR, 2020.

Sergey Zagoruyko and Nikos Komodakis. Wide residual networks. CoRR, abs/1605.07146, 2016. URL http://arxiv.org/abs/1605.07146.


Xiaolin Zhang, Yunchao Wei, Jiashi Feng, Yi Yang, and Thomas Huang. Adversarial complementary learning for weakly supervised object localization. In IEEE CVPR, 2018a.

Xiaolin Zhang, Yunchao Wei, Guoliang Kang, Yi Yang, and Thomas Huang. Self-produced guidance for weakly-supervised object localization. In European Conference on Computer Vision. Springer, 2018b.

Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In CVPR, 2016.
