
Gradient Masking in Machine Learning
Nicolas Papernot, Pennsylvania State University

ARO Workshop on Adversarial Machine Learning, Stanford, September 2017

@NicolasPapernot

Thank you to our collaborators

Pieter Abbeel (Berkeley), Michael Backes (CISPA), Dan Boneh (Stanford), Z. Berkay Celik (Penn State), Yan Duan (OpenAI), Ian Goodfellow (Google), Matt Fredrikson (CMU), Kathrin Grosse (CISPA), Sandy Huang (Berkeley), Somesh Jha (U of Wisconsin), Alexey Kurakin (Google), Praveen Manoharan (CISPA), Patrick McDaniel (Penn State), Arunesh Sinha (U of Michigan), Ananthram Swami (US ARL), Florian Tramèr (Stanford), Michael Wellman (U of Michigan)

Gradient Masking

A defense exhibits gradient masking when it makes a model's gradients useless or misleading to an attacker without actually removing the model's adversarial examples: gradient-based attacks fail locally, yet adversarial examples still exist and can be found through other models.

Training

The training loss J(theta, x, y) is small when the prediction is correct on legitimate input.
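A minimal LaTeX reconstruction of the objective this slide annotates (the equation itself did not survive extraction); this is standard empirical risk minimization, with D the data distribution:

    % Training loss: small when the model's prediction on x matches label y.
    \min_{\theta} \; \mathbb{E}_{(x, y) \sim \mathcal{D}} \big[ J(\theta, x, y) \big]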

Adversarial training

The loss now has two terms: one that is small when the prediction is correct on legitimate input, and one that is small when the prediction is correct on adversarial input.
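A hedged LaTeX reconstruction of this two-term loss, in the form popularized by Goodfellow et al.; the weight alpha and the FGSM perturbation inside the second term are assumptions about the slide's exact equation:

    % First term: small when correct on legitimate input.
    % Second term: small when correct on the FGSM adversarial input.
    \tilde{J}(\theta, x, y) = \alpha \, J(\theta, x, y)
        + (1 - \alpha) \, J\big(\theta,\; x + \epsilon\,\mathrm{sign}(\nabla_x J(\theta, x, y)),\; y\big)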

Gradient masking in adversarially trained models

[Figure, shown across three animation frames: around a non-adversarial example, the direction of the adversarially trained model's gradient differs from the direction of another model's gradient; an adversarial example lies along the other model's gradient direction. The adversarially trained model masks its own gradient, but the adversarial example it misses can still be found through another model.]

Tramèr et al., Ensemble Adversarial Training: Attacks and Defenses. Illustration adapted from slides by Florian Tramèr.

Evading gradient masking (1)

Threat model: white-box adversary.

Attack (R+FGSM): (1) take a random step of norm alpha, which moves off the point where the gradient is masked; (2) take an FGSM step of norm eps - alpha along the gradient at the new point.

Tramèr et al., Ensemble Adversarial Training: Attacks and Defenses. Illustration adapted from slides by Florian Tramèr.
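A minimal NumPy sketch of this attack; grad_fn(x, y), assumed supplied by the caller, returns the defended model's loss gradient, and the [0, 1] pixel range is an illustrative assumption:

    import numpy as np

    def r_fgsm(x, y, grad_fn, eps, alpha, rng=np.random.default_rng()):
        """R+FGSM: a random step followed by an FGSM step (Tramer et al.)."""
        # (1) Random step of L-infinity norm alpha: escapes the point where
        # the defended model's gradient is masked.
        x_rand = x + alpha * np.sign(rng.standard_normal(x.shape))
        # (2) FGSM step of norm eps - alpha along the loss gradient there.
        x_adv = x_rand + (eps - alpha) * np.sign(grad_fn(x_rand, y))
        # Keep the result a valid input (assumed pixel range [0, 1]).
        return np.clip(x_adv, 0.0, 1.0)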

Evading gradient masking (2)

Threat model: black-box adversary.

Attack: (1) learn a substitute for the defended model; (2) find an adversarial direction using the substitute.

Papernot et al., Practical Black-Box Attacks against Machine Learning. Papernot et al., Towards the Science of Security and Privacy in Machine Learning.

Attacking black-box models

(1) The adversary queries the remote black-box ML system with synthetic inputs and observes the returned labels (e.g., "no truck sign", "STOP sign") to learn a local substitute.

Papernot et al., Practical Black-Box Attacks against Machine Learning.

Attacking black-box models

(2) The adversary uses the local substitute to craft adversarial examples, which transfer to the remote black-box ML system (e.g., a STOP sign misclassified as "yield sign").

Papernot et al., Practical Black-Box Attacks against Machine Learning.
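A condensed sketch of step (1), the substitute-training loop with Jacobian-based dataset augmentation described in Papernot et al.; oracle_label, fit, and class_gradient are hypothetical stand-ins for querying the remote model, training the substitute, and differentiating its class scores:

    import numpy as np

    def train_substitute(oracle_label, substitute, x_init, rho=6, lam=0.1):
        """Learn a local substitute for a black-box model (Papernot et al.)."""
        x = x_init
        for _ in range(rho):  # rho substitute-training epochs
            y = oracle_label(x)    # query the remote model for labels
            substitute.fit(x, y)   # imitate the black-box decision boundary
            # Jacobian-based augmentation: step each point along the sign of
            # the substitute's gradient for its label, toward the boundary.
            grads = np.stack([substitute.class_gradient(xi, yi)
                              for xi, yi in zip(x, y)])
            x = np.concatenate([x, x + lam * np.sign(grads)])
        return substitute  # step (2): craft adversarial examples (e.g., FGSM) on it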

Adversarial example transferability

Papernot et al., Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples.

Large adversarial subspaces enable transferability

On average, about 44 orthogonal adversarial directions exist around a data point, and about 25 of them transfer to a second model.

Tramèr et al., The Space of Transferable Adversarial Examples.

Adversarial training

The loss is small when the prediction is correct on legitimate input, and small when the prediction is correct on adversarial input. But the adversarial inputs seen in training come from the model's own gradient, so after training that gradient is no longer adversarial: the gradient is masked rather than the vulnerability removed.

Ensemble Adversarial Training

Intuition: present adversarial gradients from multiple models during training.

[Animation frames consolidated: adversarial examples for model A are crafted not only from model A's own adversarial gradient but also from the adversarial gradients of separately pre-trained models B, C, and D, and model A trains on all of them. At inference, model A is deployed alone.]
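A minimal training-step sketch of this idea under assumed names: fgsm, loss_fn, the optimizer's PyTorch-style interface, and the model list are illustrative (the original work uses TensorFlow), but the control flow matches the intuition above:

    import random

    def ensemble_adv_train_step(model_a, static_models, x, y, eps,
                                optimizer, loss_fn, fgsm):
        """One step of ensemble adversarial training (Tramer et al.), sketched.

        static_models: pre-trained models B, C, D whose gradients supply
        adversarial examples; their weights are never updated.
        """
        # Craft adversarial examples on a randomly chosen source model,
        # sometimes model A itself, sometimes one of the static models.
        source = random.choice([model_a] + list(static_models))
        x_adv = fgsm(source, x, y, eps)  # e.g., x + eps * sign(grad of source loss)
        # Train model A on clean and adversarial batches; only A's weights move.
        loss = loss_fn(model_a(x), y) + loss_fn(model_a(x_adv), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss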

Experimental results on MNIST

[Results table not preserved in the transcript; it includes attacks with adversarial examples transferred from holdout models.]

Experimental results on ImageNet

[Results table not preserved in the transcript; it includes attacks with adversarial examples transferred from holdout models.]

Reproducible Adversarial ML research with CleverHans


CleverHans library guiding principles

1. Benchmark reproducibility
2. Can be used with any TensorFlow model
3. Always include state-of-the-art attacks and defenses
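A sketch of what benchmarking with the library looked like around this release; it assumes an existing TensorFlow 1.x session sess, input/label placeholders x and y, a CleverHans-wrapped classifier model, an accuracy tensor, and test arrays x_test, y_test. The attack class and argument names follow that era's API but should be treated as approximate:

    import tensorflow as tf
    from cleverhans.attacks import FastGradientMethod

    # Build a symbolic graph of FGSM adversarial examples for placeholder x.
    fgsm = FastGradientMethod(model, sess=sess)
    adv_x = fgsm.generate(x, eps=0.3, clip_min=0.0, clip_max=1.0)

    # Materialize adversarial examples for a test batch, then benchmark on them.
    adv_examples = sess.run(adv_x, feed_dict={x: x_test})
    adv_acc = sess.run(accuracy, feed_dict={x: adv_examples, y: y_test})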

Growing community

1.1K+ stars, 290+ forks, 35 contributors

Adversarial examples represent worst-case distribution drifts

[DDS04] Dalvi et al., Adversarial Classification (KDD 2004)


Adversarial examples are a tangible instance of hypothetical AI safety problems

Image source: http://www.nerdist.com/wp-content/uploads/2013/07/Space-Odyssey-4.jpg

This research was funded by: [sponsor logos]

Thank you for listening! Get involved at:

[email protected]

www.cleverhans.io
@NicolasPapernot
github.com/tensorflow/cleverhans