robust machine learning: progress, challenges, humansece739/lectures/18739-2020... ·...

50
Robust Machine Learning: Progress, Challenges, Humans Dimitris Tsipras @tsiprasd gradient-science.org

Upload: others

Post on 09-Aug-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Robust Machine Learning: Progress, Challenges, Humansece739/lectures/18739-2020... · 2020-04-09 · Robust Machine Learning: Progress, Challenges, Humans Dimitris Tsipras @tsiprasd

Robust Machine Learning: Progress, Challenges, Humans

Dimitris Tsipras

@tsiprasd gradient-science.org

Page 2: Robust Machine Learning: Progress, Challenges, Humansece739/lectures/18739-2020... · 2020-04-09 · Robust Machine Learning: Progress, Challenges, Humans Dimitris Tsipras @tsiprasd

joint work with

Logan Engstrom

Andrew Ilyas

Aleksander Mądry

Brandon Tran

Shibani Santurkar

Alexander Turner

Kunal Talwar

Ludwig Schmidt

Adrian Vladu

Aleksandar Makelov

Page 3: Robust Machine Learning: Progress, Challenges, Humansece739/lectures/18739-2020... · 2020-04-09 · Robust Machine Learning: Progress, Challenges, Humans Dimitris Tsipras @tsiprasd

Deep Learning can be amazing

Image Classification

Strategy Games

Machine Translation

Robotic ManipulationRealistic Image Generation

Page 4: Robust Machine Learning: Progress, Challenges, Humansece739/lectures/18739-2020... · 2020-04-09 · Robust Machine Learning: Progress, Challenges, Humans Dimitris Tsipras @tsiprasd

ImageNet: A success story

Page 5: Robust Machine Learning: Progress, Challenges, Humansece739/lectures/18739-2020... · 2020-04-09 · Robust Machine Learning: Progress, Challenges, Humans Dimitris Tsipras @tsiprasd

ImageNet: A success story

0

5

10

15

20

25

30

2010 2011 2012 2013 2014 Human 2015 2016 2017

ILSVRCtop-5ErroronImageNet

AlexNet

Have we achieved truly super-human performance?

Page 6: Robust Machine Learning: Progress, Challenges, Humansece739/lectures/18739-2020... · 2020-04-09 · Robust Machine Learning: Progress, Challenges, Humans Dimitris Tsipras @tsiprasd

Real-world deployment

Are ML systems ready for the real world?

Page 7: Robust Machine Learning: Progress, Challenges, Humansece739/lectures/18739-2020... · 2020-04-09 · Robust Machine Learning: Progress, Challenges, Humans Dimitris Tsipras @tsiprasd

Core issue: Brittleness

“pig” (91%)

=

“airliner” (99%)

+0.005x

adversarial noise

Long history in ”standard” ML: [Biggio et al. 2013] [Dalvi et al. 2004][Lowd Meek 2005] [Globerson Roweis 2006][Kolcz Teo 2009][Barreno et al. 2010] [Biggio et al.

2010][Biggio et al. 2014][Srndic Laskov 2013]

[Szegedy et al. 2013]

Page 8: Robust Machine Learning: Progress, Challenges, Humansece739/lectures/18739-2020... · 2020-04-09 · Robust Machine Learning: Progress, Challenges, Humans Dimitris Tsipras @tsiprasd

Real-world perturbations?

[Athalye Engstrom Ilyas Kwok 2017]

Page 9: Robust Machine Learning: Progress, Challenges, Humansece739/lectures/18739-2020... · 2020-04-09 · Robust Machine Learning: Progress, Challenges, Humans Dimitris Tsipras @tsiprasd

Training on rotations does not solve the problem

More natural examples?

[Fawzi Frossard 2015] [Engstrom Tran T Schmidt Madry 2017]

Page 10: Robust Machine Learning: Progress, Challenges, Humansece739/lectures/18739-2020... · 2020-04-09 · Robust Machine Learning: Progress, Challenges, Humans Dimitris Tsipras @tsiprasd

Black-box attacks?Microsoft Azure

Google Cloud Vision API

Input!

Output

Parameters θDoes black-box mean secure? No.

Transfer attacks: Just attack a similar model

Query attacks: Directly use input-output queries

[Szegedy et al. 2013, Papernot et al. 2016]

[Chen et al. 2017]

Page 11: Robust Machine Learning: Progress, Challenges, Humansece739/lectures/18739-2020... · 2020-04-09 · Robust Machine Learning: Progress, Challenges, Humans Dimitris Tsipras @tsiprasd

Beyond images?[Carlini Wagner. 2018]:  Can arbitrarily confuse a speech recognition system

[Grosse et al. 2017]:  Small changes can bypass malware detection systems

[Jia Liang 2017]:  Irrelevant sentences confuse reading comprehension models

Page 12: Robust Machine Learning: Progress, Challenges, Humansece739/lectures/18739-2020... · 2020-04-09 · Robust Machine Learning: Progress, Challenges, Humans Dimitris Tsipras @tsiprasd

Why should we care?

Page 13: Robust Machine Learning: Progress, Challenges, Humansece739/lectures/18739-2020... · 2020-04-09 · Robust Machine Learning: Progress, Challenges, Humans Dimitris Tsipras @tsiprasd

Security

[Sharif et al. 2016]

Already issues with spam and content filtering

[Evtimov et al. 2018]

Page 14: Robust Machine Learning: Progress, Challenges, Humansece739/lectures/18739-2020... · 2020-04-09 · Robust Machine Learning: Progress, Challenges, Humans Dimitris Tsipras @tsiprasd

ReliabilityWhat we expect from AI

What we (sometimes) get

ML models are very brittle

Page 15: Robust Machine Learning: Progress, Challenges, Humansece739/lectures/18739-2020... · 2020-04-09 · Robust Machine Learning: Progress, Challenges, Humans Dimitris Tsipras @tsiprasd

Human Alignment

How are DL models making predictions?

“pig” (91%)

=

“airliner” (99%)

+0.005x

adversarial noise

Why is this important to the model?

Page 16: Robust Machine Learning: Progress, Challenges, Humansece739/lectures/18739-2020... · 2020-04-09 · Robust Machine Learning: Progress, Challenges, Humans Dimitris Tsipras @tsiprasd

How do we train robust models?

“pig”

=

“airliner”

+

“pig”

Our focus:

Page 17: Robust Machine Learning: Progress, Challenges, Humansece739/lectures/18739-2020... · 2020-04-09 · Robust Machine Learning: Progress, Challenges, Humans Dimitris Tsipras @tsiprasd

How do we find adv. examples?

Input!

Output

Parameters θ

differentiable

min$%&,(~* [,-.. /, 0, 1 ]

labelinputmodel

parameters

min$%&,(~* [,-./∈1

2344 5, 6 + /, 8 ]

Gradient Descent to find θ

Allowed perturbations: pixel-wise, rotations, …

Standard training

Adversarial attacks

Page 18: Robust Machine Learning: Progress, Challenges, Humansece739/lectures/18739-2020... · 2020-04-09 · Robust Machine Learning: Progress, Challenges, Humans Dimitris Tsipras @tsiprasd

How do we train robustly?

Key observation: Adversarial examples are not at odds with standard learning

Adversarially Robust Generalization:

min$%&,(~* [,-.. /, 0, 1 ]

min$%&,(~* [,-./∈1

2344 5, 6 + /, 8 ]

Standard Generalization:

Explicit set of invariances

Page 19: Robust Machine Learning: Progress, Challenges, Humansece739/lectures/18739-2020... · 2020-04-09 · Robust Machine Learning: Progress, Challenges, Humans Dimitris Tsipras @tsiprasd

Towards robust models

min$%&,(~* [,-./∈1

2344 5, 6 + /, 8 ]

finding a robust model finding a worst-case perturbation

Theorem (Danskin): Gradient at maximizer → Gradient of max

∇y maxx

f(x, y) = ∇y f(x⋆, y)

(Projected) Gradient Descent on δ(Stochastic) Gradient Descent on θ(How do we get gradients of the max?)

x⋆ = arg maxx

f(x, y)

Page 20: Robust Machine Learning: Progress, Challenges, Humansece739/lectures/18739-2020... · 2020-04-09 · Robust Machine Learning: Progress, Challenges, Humans Dimitris Tsipras @tsiprasd

Towards robust models

Improve robustness: Train on perturbed inputs

(aka “adversarial training” [Goodfellow et al. 2015])

Actually leads to robust models (with some care)

min$%&,(~* [,-./∈1

2344 5, 6 + /, 8 ]

finding a robust model finding a worst-case perturbation

Page 21: Robust Machine Learning: Progress, Challenges, Humansece739/lectures/18739-2020... · 2020-04-09 · Robust Machine Learning: Progress, Challenges, Humans Dimitris Tsipras @tsiprasd

Key ingredient 1: Reliable attacks

We need to train on (almost) worst-case inputs

But: DNN loss is non-convex

PGD1

PGD2

+ε0

Page 22: Robust Machine Learning: Progress, Challenges, Humansece739/lectures/18739-2020... · 2020-04-09 · Robust Machine Learning: Progress, Challenges, Humans Dimitris Tsipras @tsiprasd

Key ingredient 1: Reliable attacks

We need to train on (almost) worst-case inputs

But: DNN loss is non-convex

PGD can still find worst-case inputs reliably

Consistent behavior from random starts

Page 23: Robust Machine Learning: Progress, Challenges, Humansece739/lectures/18739-2020... · 2020-04-09 · Robust Machine Learning: Progress, Challenges, Humans Dimitris Tsipras @tsiprasd

Key ingredient 2: Capacity

Robust models may need to be more expressive

Capacity scale

Robu

st

Acc

urac

y Weak models can fail to train

Higher capacity ⇒ more robust

Page 24: Robust Machine Learning: Progress, Challenges, Humansece739/lectures/18739-2020... · 2020-04-09 · Robust Machine Learning: Progress, Challenges, Humans Dimitris Tsipras @tsiprasd

Robust models

Reliable attacks Sufficient capacity

Result: Adversarial loss decreases steadily

Page 25: Robust Machine Learning: Progress, Challenges, Humansece739/lectures/18739-2020... · 2020-04-09 · Robust Machine Learning: Progress, Challenges, Humans Dimitris Tsipras @tsiprasd

MNIST

CIFAR-10

ImageNet

ℓ∞-norm ℓ2-norm Rotation+Translation

ε = 0.3

ε = 8/255

ε = 2.5

ε = 0.5

ε = ±3px, ±30°

ε = ±3px, ±30°

ε = ±3px, ±30°

89%

53%

66%

70%

50%

98%

82%

57%

ε = 1ε = 4/255

33%

Page 26: Robust Machine Learning: Progress, Challenges, Humansece739/lectures/18739-2020... · 2020-04-09 · Robust Machine Learning: Progress, Challenges, Humans Dimitris Tsipras @tsiprasd

Evaluating robustness can be hard

[Carlini Wagner 2016] [Carlini Wagner 2017] [Carlini Wagner 2017] [Athalye

et al. 2018] [Uesato et al. 2018]

Many defenses are broken by adaptive attacks

Try multiple adaptive attacks

(robust-ml.org)robust-ml.orgRelease code and models

Page 27: Robust Machine Learning: Progress, Challenges, Humansece739/lectures/18739-2020... · 2020-04-09 · Robust Machine Learning: Progress, Challenges, Humans Dimitris Tsipras @tsiprasd

Formal robustness verification

Prove robustness on specific examples

Verification Certification

MIP solvers

Accurate but intractable

Convex relaxation

Bounds might be too loose

Accurate and efficient verification largely open

[Tjeng et al. 2019] [Wong Kolter 2018]

Page 28: Robust Machine Learning: Progress, Challenges, Humansece739/lectures/18739-2020... · 2020-04-09 · Robust Machine Learning: Progress, Challenges, Humans Dimitris Tsipras @tsiprasd

Why is robust learning so hard?

Page 29: Robust Machine Learning: Progress, Challenges, Humansece739/lectures/18739-2020... · 2020-04-09 · Robust Machine Learning: Progress, Challenges, Humans Dimitris Tsipras @tsiprasd

Robust generalization is hard

0%

20%

40%

60%

80%

100%

0 20000 40000 60000 80000

RobustAccuracy

Train

min$%&,(~* [,-./∈1

2344 5, 6 + /, 8 ]

Page 30: Robust Machine Learning: Progress, Challenges, Humansece739/lectures/18739-2020... · 2020-04-09 · Robust Machine Learning: Progress, Challenges, Humans Dimitris Tsipras @tsiprasd

0%

20%

40%

60%

80%

100%

0 20000 40000 60000 80000

RobustAccuracy

Test Train

Robust generalization is hard

min$%&,(~* [,-./∈1

2344 5, 6 + /, 8 ]

>50% overfitting

0% 20% 40% 60% 80%

100%

0 20000 40000 60000 80000

StandardAccuracy

Train Test

Doesn’t happen “normally”

min$%&,(~* [,-.. /, 0, 1 ]

Is robust learning fundamentally harder?

Page 31: Robust Machine Learning: Progress, Challenges, Humansece739/lectures/18739-2020... · 2020-04-09 · Robust Machine Learning: Progress, Challenges, Humans Dimitris Tsipras @tsiprasd

Robust generalization is hard

Specifically: There exists a d-dimensional distribution where:

→ A single sample is enough to learn a good (standard) classifier

→ But: Need at least Ω(√d) samples for a robust classifierθ*

−θ*

Theorem: The sample complexity of robust generalization can be significantly  larger than that of “standard” generalization.

Page 32: Robust Machine Learning: Progress, Challenges, Humansece739/lectures/18739-2020... · 2020-04-09 · Robust Machine Learning: Progress, Challenges, Humans Dimitris Tsipras @tsiprasd

Robust generalization is hard

Theorem: The sample complexity of robust generalization can be significantly  larger than that of “standard” generalization.

Empirically:

Page 33: Robust Machine Learning: Progress, Challenges, Humansece739/lectures/18739-2020... · 2020-04-09 · Robust Machine Learning: Progress, Challenges, Humans Dimitris Tsipras @tsiprasd

Does robustness improve accuracy?

Data augmentation: Train on random transformations of the input

Does adversarial training improve standard accuracy?

Adversarial training ⇔ Augment with the “most helpful” example

→ Significantly improves test accuracy.

Page 34: Robust Machine Learning: Progress, Challenges, Humansece739/lectures/18739-2020... · 2020-04-09 · Robust Machine Learning: Progress, Challenges, Humans Dimitris Tsipras @tsiprasd

Does robustness improve accuracy?

Small sample Large sample

Why are robust models less accurate?

Page 35: Robust Machine Learning: Progress, Challenges, Humansece739/lectures/18739-2020... · 2020-04-09 · Robust Machine Learning: Progress, Challenges, Humans Dimitris Tsipras @tsiprasd

Does robustness improve accuracy?

Theorem: There can exist an inherent trade-off between accuracy and robustness (no “free lunch”).

Strong correlation with label

Weak correlation with label

Standard Training: use all the features to maximize accuracy

Adversarial Training: use only strong features (lower accuracy)

Page 36: Robust Machine Learning: Progress, Challenges, Humansece739/lectures/18739-2020... · 2020-04-09 · Robust Machine Learning: Progress, Challenges, Humans Dimitris Tsipras @tsiprasd

ML vs. “classical” security

Page 37: Robust Machine Learning: Progress, Challenges, Humansece739/lectures/18739-2020... · 2020-04-09 · Robust Machine Learning: Progress, Challenges, Humans Dimitris Tsipras @tsiprasd

Classical security exploits

Attackers use unintended vulnerabilities to manipulate system

Spectre: Side-effects of speculative execution

Heartbleed: Missing out-of-bounds read checks

“Correct” software should be unbreakable

Page 38: Robust Machine Learning: Progress, Challenges, Humansece739/lectures/18739-2020... · 2020-04-09 · Robust Machine Learning: Progress, Challenges, Humans Dimitris Tsipras @tsiprasd

ML security exploits

Non-robust features Correlated with label on average,

but can be manipulated

Robust features Correlated with label even with adversary

Adversary manipulates input features used for classification

Page 39: Robust Machine Learning: Progress, Challenges, Humansece739/lectures/18739-2020... · 2020-04-09 · Robust Machine Learning: Progress, Challenges, Humans Dimitris Tsipras @tsiprasd

Predictive non-robust features

High-frequency patterns [Yin et al 2019]

Texture [Geirhos et al 2019]

Linear directions [Jetley et al 2018]

Other examples of unintuitive features

Accuracy CIFAR10 R. ImageNet

Standard 95% 97%

Non-robust features 44% 64%

Features small in L2-norm

Page 40: Robust Machine Learning: Progress, Challenges, Humansece739/lectures/18739-2020... · 2020-04-09 · Robust Machine Learning: Progress, Challenges, Humans Dimitris Tsipras @tsiprasd

Relying on non-robust features directly leads to adversarial vulnerability

We train classifiers to maximize accuracy: No wonder they utilize non-robust features

Non-robust features can be quite predictive

Back to adversarial examples

Thus: Adversarial examples are not bugs, they are features

Page 41: Robust Machine Learning: Progress, Challenges, Humansece739/lectures/18739-2020... · 2020-04-09 · Robust Machine Learning: Progress, Challenges, Humans Dimitris Tsipras @tsiprasd

Consequences

Transferability: Models learn similar non-robust features

Test accuracy of X trained on non-robust features from ResNet-50

Adversarial Transferability

(ResNet-50→X)

Page 42: Robust Machine Learning: Progress, Challenges, Humansece739/lectures/18739-2020... · 2020-04-09 · Robust Machine Learning: Progress, Challenges, Humans Dimitris Tsipras @tsiprasd

Consequences

Dataset robustification: Removing non-robust features can improve standard classifiers

frog

Training setRestrict to features

of robust model

“Robustified” frog

New training set

Standard training yields robust classifiers

Page 43: Robust Machine Learning: Progress, Challenges, Humansece739/lectures/18739-2020... · 2020-04-09 · Robust Machine Learning: Progress, Challenges, Humans Dimitris Tsipras @tsiprasd

dog

Humans vs ML Models

Equally valid classification methods

We need to explicitly enforce robustness

Page 44: Robust Machine Learning: Progress, Challenges, Humansece739/lectures/18739-2020... · 2020-04-09 · Robust Machine Learning: Progress, Challenges, Humans Dimitris Tsipras @tsiprasd

Robustness beyond security: Robust models are more

human-aligned

Page 45: Robust Machine Learning: Progress, Challenges, Humansece739/lectures/18739-2020... · 2020-04-09 · Robust Machine Learning: Progress, Challenges, Humans Dimitris Tsipras @tsiprasd

Input Manipulation

Bird 1%

Dog 2%

Primate 96%

Truck 0%

Key Idea: Manipulate class scores for robust models

Class maximization introduces salient features

Page 46: Robust Machine Learning: Progress, Challenges, Humansece739/lectures/18739-2020... · 2020-04-09 · Robust Machine Learning: Progress, Challenges, Humans Dimitris Tsipras @tsiprasd

house finch armadillo chow jigsaw Norwich terrier notebook

cliff anemone fish mashed potato coffee pot

Image Generation

Image Translation Superresolution Inpainting

Downstream applications

Page 47: Robust Machine Learning: Progress, Challenges, Humansece739/lectures/18739-2020... · 2020-04-09 · Robust Machine Learning: Progress, Challenges, Humans Dimitris Tsipras @tsiprasd

Interpolation

Seed (x0) Maximizing different coordinates (i)Seed Max(different coordinates)

Direct feature visualizationActivation 444

(long fish)Activation 939 (insect legs)

Maximized from noise

Most activated

Least activated Maximized from noise

Most activated

Least activated

Feature manipulation

Better representations

Page 48: Robust Machine Learning: Progress, Challenges, Humansece739/lectures/18739-2020... · 2020-04-09 · Robust Machine Learning: Progress, Challenges, Humans Dimitris Tsipras @tsiprasd

Conclusions

Page 49: Robust Machine Learning: Progress, Challenges, Humansece739/lectures/18739-2020... · 2020-04-09 · Robust Machine Learning: Progress, Challenges, Humans Dimitris Tsipras @tsiprasd

Takeaways

Brittleness can arise from non-robust features

ML models are really brittle

Robustness as a tool for human-aligned models

Robust optimization can lead to robust models

Page 50: Robust Machine Learning: Progress, Challenges, Humansece739/lectures/18739-2020... · 2020-04-09 · Robust Machine Learning: Progress, Challenges, Humans Dimitris Tsipras @tsiprasd

Future directions

gradsci.org

More robust models

Different perturbation sets

robustness

More comprehensive theoretical models

Further exploration of robust models