deep visual representations quantifying interpretability...
TRANSCRIPT
![Page 1: Deep Visual Representations Quantifying Interpretability ofvision.cs.utexas.edu/381V-fall2017/slides/crosley-goo-exp.pdfDeep Visual Representations By David Bau, Bolei Zhou, Aditya](https://reader034.vdocuments.mx/reader034/viewer/2022051917/6009734dffdf3f1529041b68/html5/thumbnails/1.jpg)
Network Dissection: Quantifying Interpretability of Deep Visual RepresentationsBy David Bau, Bolei Zhou, Aditya Khosla, Aude Oliva, Antonio Torralba
CS 381VThomas Crosley and Wonjoon Goo
![Page 2: Deep Visual Representations Quantifying Interpretability ofvision.cs.utexas.edu/381V-fall2017/slides/crosley-goo-exp.pdfDeep Visual Representations By David Bau, Bolei Zhou, Aditya](https://reader034.vdocuments.mx/reader034/viewer/2022051917/6009734dffdf3f1529041b68/html5/thumbnails/2.jpg)
Detectors
![Page 4: Deep Visual Representations Quantifying Interpretability ofvision.cs.utexas.edu/381V-fall2017/slides/crosley-goo-exp.pdfDeep Visual Representations By David Bau, Bolei Zhou, Aditya](https://reader034.vdocuments.mx/reader034/viewer/2022051917/6009734dffdf3f1529041b68/html5/thumbnails/4.jpg)
Unit Distributions
● Compute internal activations for entire dataset
● Gather distribution for each unit across dataset
![Page 5: Deep Visual Representations Quantifying Interpretability ofvision.cs.utexas.edu/381V-fall2017/slides/crosley-goo-exp.pdfDeep Visual Representations By David Bau, Bolei Zhou, Aditya](https://reader034.vdocuments.mx/reader034/viewer/2022051917/6009734dffdf3f1529041b68/html5/thumbnails/5.jpg)
Top Quantile
● Compute Tk such that P(ak> Tk ) = 0.005 ● Tk is considered the top-quantile● Detected regions at test time are those with ak> Tk
![Page 6: Deep Visual Representations Quantifying Interpretability ofvision.cs.utexas.edu/381V-fall2017/slides/crosley-goo-exp.pdfDeep Visual Representations By David Bau, Bolei Zhou, Aditya](https://reader034.vdocuments.mx/reader034/viewer/2022051917/6009734dffdf3f1529041b68/html5/thumbnails/6.jpg)
Detector Concept
● Score of each unit is its IoU with the label
● Detectors are selected with IoU above a threshold
● Threshold is Uk,c > 0.04.
![Page 7: Deep Visual Representations Quantifying Interpretability ofvision.cs.utexas.edu/381V-fall2017/slides/crosley-goo-exp.pdfDeep Visual Representations By David Bau, Bolei Zhou, Aditya](https://reader034.vdocuments.mx/reader034/viewer/2022051917/6009734dffdf3f1529041b68/html5/thumbnails/7.jpg)
Test Data
● Compute activation map akfor all k neurons in the network
![Page 8: Deep Visual Representations Quantifying Interpretability ofvision.cs.utexas.edu/381V-fall2017/slides/crosley-goo-exp.pdfDeep Visual Representations By David Bau, Bolei Zhou, Aditya](https://reader034.vdocuments.mx/reader034/viewer/2022051917/6009734dffdf3f1529041b68/html5/thumbnails/8.jpg)
Scaling Up
● Scale each unit’s activation up to the original image size
● Call this the mask-resolution SK
● Use bi-linear interpolation
![Page 9: Deep Visual Representations Quantifying Interpretability ofvision.cs.utexas.edu/381V-fall2017/slides/crosley-goo-exp.pdfDeep Visual Representations By David Bau, Bolei Zhou, Aditya](https://reader034.vdocuments.mx/reader034/viewer/2022051917/6009734dffdf3f1529041b68/html5/thumbnails/9.jpg)
Thresholding
● Now make the binary segmentation mask Mk
● Mk = SK> TK
SK MK
![Page 10: Deep Visual Representations Quantifying Interpretability ofvision.cs.utexas.edu/381V-fall2017/slides/crosley-goo-exp.pdfDeep Visual Representations By David Bau, Bolei Zhou, Aditya](https://reader034.vdocuments.mx/reader034/viewer/2022051917/6009734dffdf3f1529041b68/html5/thumbnails/10.jpg)
Experiment: Detector Robustness
● Interest in adversarial examples
● Invariance to noise
● Composition by parts or statistics
![Page 11: Deep Visual Representations Quantifying Interpretability ofvision.cs.utexas.edu/381V-fall2017/slides/crosley-goo-exp.pdfDeep Visual Representations By David Bau, Bolei Zhou, Aditya](https://reader034.vdocuments.mx/reader034/viewer/2022051917/6009734dffdf3f1529041b68/html5/thumbnails/11.jpg)
Noisy Images
+ Unif[0, 1] + 5 * Unif[0, 1]
+ 100 * Unif[0, 1]+ 10 * Unif[0, 1]
![Page 12: Deep Visual Representations Quantifying Interpretability ofvision.cs.utexas.edu/381V-fall2017/slides/crosley-goo-exp.pdfDeep Visual Representations By David Bau, Bolei Zhou, Aditya](https://reader034.vdocuments.mx/reader034/viewer/2022051917/6009734dffdf3f1529041b68/html5/thumbnails/12.jpg)
Conv3
Original
+ 5 * Unif[0, 1]
+ 10 * Unif[0, 1]
+ Unif[0, 1]
+ 100 * Unif[0, 1]
![Page 13: Deep Visual Representations Quantifying Interpretability ofvision.cs.utexas.edu/381V-fall2017/slides/crosley-goo-exp.pdfDeep Visual Representations By David Bau, Bolei Zhou, Aditya](https://reader034.vdocuments.mx/reader034/viewer/2022051917/6009734dffdf3f1529041b68/html5/thumbnails/13.jpg)
Conv4
Original
+ 5 * Unif[0, 1]
+ 10 * Unif[0, 1]
+ Unif[0, 1]
+ 100 * Unif[0, 1]
![Page 14: Deep Visual Representations Quantifying Interpretability ofvision.cs.utexas.edu/381V-fall2017/slides/crosley-goo-exp.pdfDeep Visual Representations By David Bau, Bolei Zhou, Aditya](https://reader034.vdocuments.mx/reader034/viewer/2022051917/6009734dffdf3f1529041b68/html5/thumbnails/14.jpg)
Conv5
Original
+ 5 * Unif[0, 1]
+ 10 * Unif[0, 1]
+ Unif[0, 1]
+ 100 * Unif[0, 1]
![Page 15: Deep Visual Representations Quantifying Interpretability ofvision.cs.utexas.edu/381V-fall2017/slides/crosley-goo-exp.pdfDeep Visual Representations By David Bau, Bolei Zhou, Aditya](https://reader034.vdocuments.mx/reader034/viewer/2022051917/6009734dffdf3f1529041b68/html5/thumbnails/15.jpg)
Rotated Images
10 degreesOriginal
45 degrees 90 degrees
![Page 16: Deep Visual Representations Quantifying Interpretability ofvision.cs.utexas.edu/381V-fall2017/slides/crosley-goo-exp.pdfDeep Visual Representations By David Bau, Bolei Zhou, Aditya](https://reader034.vdocuments.mx/reader034/viewer/2022051917/6009734dffdf3f1529041b68/html5/thumbnails/16.jpg)
conv3
Original
10 degrees
45 degrees
90 degrees
![Page 17: Deep Visual Representations Quantifying Interpretability ofvision.cs.utexas.edu/381V-fall2017/slides/crosley-goo-exp.pdfDeep Visual Representations By David Bau, Bolei Zhou, Aditya](https://reader034.vdocuments.mx/reader034/viewer/2022051917/6009734dffdf3f1529041b68/html5/thumbnails/17.jpg)
conv4
10 degrees
45 degrees
90 degrees
Original
![Page 18: Deep Visual Representations Quantifying Interpretability ofvision.cs.utexas.edu/381V-fall2017/slides/crosley-goo-exp.pdfDeep Visual Representations By David Bau, Bolei Zhou, Aditya](https://reader034.vdocuments.mx/reader034/viewer/2022051917/6009734dffdf3f1529041b68/html5/thumbnails/18.jpg)
conv5
10 degrees
45 degrees
90 degrees
Original
![Page 19: Deep Visual Representations Quantifying Interpretability ofvision.cs.utexas.edu/381V-fall2017/slides/crosley-goo-exp.pdfDeep Visual Representations By David Bau, Bolei Zhou, Aditya](https://reader034.vdocuments.mx/reader034/viewer/2022051917/6009734dffdf3f1529041b68/html5/thumbnails/19.jpg)
Rearranged Images
![Page 20: Deep Visual Representations Quantifying Interpretability ofvision.cs.utexas.edu/381V-fall2017/slides/crosley-goo-exp.pdfDeep Visual Representations By David Bau, Bolei Zhou, Aditya](https://reader034.vdocuments.mx/reader034/viewer/2022051917/6009734dffdf3f1529041b68/html5/thumbnails/20.jpg)
Rearranged Images
![Page 21: Deep Visual Representations Quantifying Interpretability ofvision.cs.utexas.edu/381V-fall2017/slides/crosley-goo-exp.pdfDeep Visual Representations By David Bau, Bolei Zhou, Aditya](https://reader034.vdocuments.mx/reader034/viewer/2022051917/6009734dffdf3f1529041b68/html5/thumbnails/21.jpg)
Rearranged Images
![Page 22: Deep Visual Representations Quantifying Interpretability ofvision.cs.utexas.edu/381V-fall2017/slides/crosley-goo-exp.pdfDeep Visual Representations By David Bau, Bolei Zhou, Aditya](https://reader034.vdocuments.mx/reader034/viewer/2022051917/6009734dffdf3f1529041b68/html5/thumbnails/22.jpg)
Conv3
Original
4x4 Patches
8x8 Patches
![Page 23: Deep Visual Representations Quantifying Interpretability ofvision.cs.utexas.edu/381V-fall2017/slides/crosley-goo-exp.pdfDeep Visual Representations By David Bau, Bolei Zhou, Aditya](https://reader034.vdocuments.mx/reader034/viewer/2022051917/6009734dffdf3f1529041b68/html5/thumbnails/23.jpg)
Conv4
Original
4x4 Patches
8x8 Patches
![Page 24: Deep Visual Representations Quantifying Interpretability ofvision.cs.utexas.edu/381V-fall2017/slides/crosley-goo-exp.pdfDeep Visual Representations By David Bau, Bolei Zhou, Aditya](https://reader034.vdocuments.mx/reader034/viewer/2022051917/6009734dffdf3f1529041b68/html5/thumbnails/24.jpg)
Conv5
Original
4x4 Patches
8x8 Patches
![Page 25: Deep Visual Representations Quantifying Interpretability ofvision.cs.utexas.edu/381V-fall2017/slides/crosley-goo-exp.pdfDeep Visual Representations By David Bau, Bolei Zhou, Aditya](https://reader034.vdocuments.mx/reader034/viewer/2022051917/6009734dffdf3f1529041b68/html5/thumbnails/25.jpg)
Axis-Aligned Interpretability
![Page 26: Deep Visual Representations Quantifying Interpretability ofvision.cs.utexas.edu/381V-fall2017/slides/crosley-goo-exp.pdfDeep Visual Representations By David Bau, Bolei Zhou, Aditya](https://reader034.vdocuments.mx/reader034/viewer/2022051917/6009734dffdf3f1529041b68/html5/thumbnails/26.jpg)
Axis-Aligned Interpretability
● Hypothesis 1:○ A linear combination of high level units serves just same
or better○ No specialized interpretation for each unit
● Hypothesis 2: (the authors’ argument)○ A linear combination will degrade the interpretability○ Each unit serves for unique concept
How similar is the way CNN learns to human?
![Page 27: Deep Visual Representations Quantifying Interpretability ofvision.cs.utexas.edu/381V-fall2017/slides/crosley-goo-exp.pdfDeep Visual Representations By David Bau, Bolei Zhou, Aditya](https://reader034.vdocuments.mx/reader034/viewer/2022051917/6009734dffdf3f1529041b68/html5/thumbnails/27.jpg)
Axis-Aligned Interpretability Result from the Authors
Figure: from the paper
● It seems valid argument, but is it the best way to show?● Problems
○ It depends on a rotation matrix used for test○ A 90 degree rotation between two axis, does not affect the
number of unique detectors○ The test should be done multiple times and report the
means and stds.
![Page 28: Deep Visual Representations Quantifying Interpretability ofvision.cs.utexas.edu/381V-fall2017/slides/crosley-goo-exp.pdfDeep Visual Representations By David Bau, Bolei Zhou, Aditya](https://reader034.vdocuments.mx/reader034/viewer/2022051917/6009734dffdf3f1529041b68/html5/thumbnails/28.jpg)
Experiment: Axis-Aligned Interpretability
![Page 29: Deep Visual Representations Quantifying Interpretability ofvision.cs.utexas.edu/381V-fall2017/slides/crosley-goo-exp.pdfDeep Visual Representations By David Bau, Bolei Zhou, Aditya](https://reader034.vdocuments.mx/reader034/viewer/2022051917/6009734dffdf3f1529041b68/html5/thumbnails/29.jpg)
Is it really axis aligned?
● Principle Component Analysis (PCA)○ Find orthonormal vectors explaining samples the most ○ The projections to the vector u_1 have higher variance
Figure: From Andrew Ng’s lecture note on PCA
❖ Argument: a unit itself can explain a concept➢ Projections to unit vectors should have higher variance➢ Principal axis (Loading) from PCA should be similar to one
of the unit vectors
![Page 30: Deep Visual Representations Quantifying Interpretability ofvision.cs.utexas.edu/381V-fall2017/slides/crosley-goo-exp.pdfDeep Visual Representations By David Bau, Bolei Zhou, Aditya](https://reader034.vdocuments.mx/reader034/viewer/2022051917/6009734dffdf3f1529041b68/html5/thumbnails/30.jpg)
Our method
1. Calculate the mean and std. of each unit activation2. Grab activations for a specific concept3. Subtract mean and std from activations4. Perform SVD5. Print Loading
Hypothesis 1 Hypothesis 2
The concept is interpreted with the combination of elementary basis
The concept can be interpreted with an elementary basis (eg. e_502 := (0,...,0,1,0,...,0) )
![Page 31: Deep Visual Representations Quantifying Interpretability ofvision.cs.utexas.edu/381V-fall2017/slides/crosley-goo-exp.pdfDeep Visual Representations By David Bau, Bolei Zhou, Aditya](https://reader034.vdocuments.mx/reader034/viewer/2022051917/6009734dffdf3f1529041b68/html5/thumbnails/31.jpg)
(Supplementary) PCA and Singular Value Decomposition (SVD)
● Optimize target:
● With Lagrange multiplier:
● The eigenvector for the highest eigenvalue becomes principal axis (loading)
From Cheng Li, Bingyu Wang Notes
![Page 32: Deep Visual Representations Quantifying Interpretability ofvision.cs.utexas.edu/381V-fall2017/slides/crosley-goo-exp.pdfDeep Visual Representations By David Bau, Bolei Zhou, Aditya](https://reader034.vdocuments.mx/reader034/viewer/2022051917/6009734dffdf3f1529041b68/html5/thumbnails/32.jpg)
PCA Results - Activations for Bird Concept
● Unit 502 stands high; concept bird is aligned to the unit● Does Unit 502 only serve for concept Bird?
○ Yes○ It does not stand for other concepts except bird
● Support Hypothesis 2
![Page 33: Deep Visual Representations Quantifying Interpretability ofvision.cs.utexas.edu/381V-fall2017/slides/crosley-goo-exp.pdfDeep Visual Representations By David Bau, Bolei Zhou, Aditya](https://reader034.vdocuments.mx/reader034/viewer/2022051917/6009734dffdf3f1529041b68/html5/thumbnails/33.jpg)
PCA Results - Activations for Train Concept
● No units stands out for concept train○ Linear combination of them have better interpretability○ Support Hypothesis 1
![Page 34: Deep Visual Representations Quantifying Interpretability ofvision.cs.utexas.edu/381V-fall2017/slides/crosley-goo-exp.pdfDeep Visual Representations By David Bau, Bolei Zhou, Aditya](https://reader034.vdocuments.mx/reader034/viewer/2022051917/6009734dffdf3f1529041b68/html5/thumbnails/34.jpg)
PCA Results - Activations for Train Concept
● No units stands out for concept train○ Linear combination of them have interpretability
Some objects with circle and trestle?
![Page 35: Deep Visual Representations Quantifying Interpretability ofvision.cs.utexas.edu/381V-fall2017/slides/crosley-goo-exp.pdfDeep Visual Representations By David Bau, Bolei Zhou, Aditya](https://reader034.vdocuments.mx/reader034/viewer/2022051917/6009734dffdf3f1529041b68/html5/thumbnails/35.jpg)
PCA Results - Activations for Train Concept
● No units stands out for concept train○ Linear combination of them have interpretability
The sequence of square boxes?
![Page 36: Deep Visual Representations Quantifying Interpretability ofvision.cs.utexas.edu/381V-fall2017/slides/crosley-goo-exp.pdfDeep Visual Representations By David Bau, Bolei Zhou, Aditya](https://reader034.vdocuments.mx/reader034/viewer/2022051917/6009734dffdf3f1529041b68/html5/thumbnails/36.jpg)
PCA Results - Activations for Train Concept
● No units stands out for concept train○ Linear combination of them have interpretability Dog face!
![Page 37: Deep Visual Representations Quantifying Interpretability ofvision.cs.utexas.edu/381V-fall2017/slides/crosley-goo-exp.pdfDeep Visual Representations By David Bau, Bolei Zhou, Aditya](https://reader034.vdocuments.mx/reader034/viewer/2022051917/6009734dffdf3f1529041b68/html5/thumbnails/37.jpg)
Conclusion…?
● Actually, it seems mixed!● CNN learns some human concepts naturally, but not always
○ It might highly correlated with the label we give
![Page 38: Deep Visual Representations Quantifying Interpretability ofvision.cs.utexas.edu/381V-fall2017/slides/crosley-goo-exp.pdfDeep Visual Representations By David Bau, Bolei Zhou, Aditya](https://reader034.vdocuments.mx/reader034/viewer/2022051917/6009734dffdf3f1529041b68/html5/thumbnails/38.jpg)
Other Thoughts
● What if we regularize the network to encourage its interpretability? ○ Taxonomy-Regularized Semantic Deep Convolutional Neural Networks,
Wonjoon Goo, Juyong Kim, Gunhee Kim, and Sung Ju Hwang, ECCV 2016
![Page 39: Deep Visual Representations Quantifying Interpretability ofvision.cs.utexas.edu/381V-fall2017/slides/crosley-goo-exp.pdfDeep Visual Representations By David Bau, Bolei Zhou, Aditya](https://reader034.vdocuments.mx/reader034/viewer/2022051917/6009734dffdf3f1529041b68/html5/thumbnails/39.jpg)
Thanks!Any questions?