
Simulating a Primary Visual Cortex at the Front of CNNs Improves Robustness to Image Perturbations

Joel Dapello*,1,2,3, Tiago Marques*,1,2,4, Martin Schrimpf1,2,4, Franziska Geiger2,5,6,7, David D. Cox8,3, James J. DiCarlo1,2,4

* Joint first authors (equal contribution)
1 Department of Brain and Cognitive Sciences, MIT, Cambridge, MA 02139
2 McGovern Institute for Brain Research, MIT, Cambridge, MA 02139
3 School of Engineering and Applied Sciences, Harvard University, Cambridge, MA 02139
4 Center for Brains, Minds and Machines, MIT, Cambridge, MA 02139
5 University of Augsburg
6 Ludwig Maximilian University
7 Technical University of Munich
8 MIT-IBM Watson AI Lab

dapello@mit.edu  tmarques@mit.edu

Abstract

Current state-of-the-art object recognition models are largely based on convolutional neural network (CNN) architectures, which are loosely inspired by the primate visual system. However, these CNNs can be fooled by imperceptibly small, explicitly crafted perturbations, and struggle to recognize objects in corrupted images that are easily recognized by humans. Here, by making comparisons with primate neural data, we first observed that CNN models with a neural hidden layer that better matches primate primary visual cortex (V1) are also more robust to adversarial attacks. Inspired by this observation, we developed VOneNets, a new class of hybrid CNN vision models. Each VOneNet contains a fixed-weight neural network front-end that simulates primate V1, called the VOneBlock, followed by a neural network back-end adapted from current CNN vision models. The VOneBlock is based on a classical neuroscientific model of V1: the linear-nonlinear-Poisson model, consisting of a biologically-constrained Gabor filter bank, simple and complex cell nonlinearities, and a V1 neuronal stochasticity generator. After training, VOneNets retain high ImageNet performance, but each is substantially more robust, outperforming the base CNNs and state-of-the-art methods by 18% and 3%, respectively, on a conglomerate benchmark of perturbations comprised of white box adversarial attacks and common image corruptions. Finally, we show that all components of the VOneBlock work in synergy to improve robustness. While current CNN architectures are arguably brain-inspired, the results presented here demonstrate that more precisely mimicking just one stage of the primate visual system leads to new gains in ImageNet-level computer vision applications.

1 Introduction

For the past eight years, convolutional neural networks (CNNs) of various kinds have dominated object recognition [1, 2, 3, 4], even surpassing human performance in some benchmarks [5]. However, scratching beneath the surface reveals a different picture. These CNNs are easily fooled by imperceptibly small perturbations explicitly crafted to induce mistakes, usually referred to as adversarial attacks [6, 7, 8, 9, 10].



Further, they exhibit a surprising failure to recognize objects in images corrupted with different noise patterns that humans have no trouble with [11, 12, 13]. This remarkable fragility to image perturbations has received much attention in the machine learning community, often from the perspective of safety in real-world deployment of computer vision systems [14, 15, 16, 17, 18, 19, 20, 21, 22]. As these perturbations generally have no perceptual alignment with the object class [23], the failures suggest that current CNNs obtained through task-optimization end up relying on visual features that are not all the same as those used by humans [24, 25]. Despite these limitations, some CNNs have achieved unparalleled success in partially explaining neural responses at multiple stages of the primate ventral stream, the set of cortical regions underlying primate visual object recognition [26, 27, 28, 29, 30, 31, 32].

How can we develop CNNs that robustly generalize like human vision? Incorporating biological constraints into CNNs to make them behave more in line with primate vision is an active field of research [33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 32]. Still, no neurobiological prior has been shown to considerably improve CNN robustness to both adversarial attacks and image corruptions in challenging real-world tasks such as ImageNet [44]. Here, we build on this line of work, starting with the observation that the ability of each CNN to explain neural response patterns in primate primary visual cortex (V1) is strongly correlated with its robustness to imperceptibly small adversarial attacks. That is, the more biological a CNN's V1 is, the more adversarially robust it is.

Inspired by this, we developed VOneNets, a new class of hybrid CNNs containing a biologically-constrained neural network that simulates primate V1 as the front-end, followed by an off-the-shelf CNN back-end trained using standard methods. The V1 front-end, the VOneBlock, is based on the classical neuroscientific linear-nonlinear-Poisson (LNP) model, consisting of a fixed-weight Gabor filter bank (GFB), simple and complex cell nonlinearities, and neuronal stochasticity. The VOneBlock outperforms all standard ImageNet-trained CNNs we tested at explaining V1 responses to naturalistic textures and noise samples. After training, VOneNets retain high ImageNet performance but are substantially more robust than their corresponding base models and compete with state-of-the-art defense methods on a conglomerate benchmark covering a variety of adversarial images and common image corruptions. Importantly, these benefits transfer across different architectures, including ResNet50 [4], AlexNet [1], and CORnet-S [32]. We dissect the VOneBlock, showing that all properties work in synergy to improve robustness and that specific aspects of VOneBlock circuitry offer robustness to different perturbation types. Notably, we find that neuronal stochasticity plays a large role in white box adversarial robustness, but that stochasticity alone is insufficient to explain our results: neuronal stochasticity interacts supralinearly with the VOneBlock features to drive adversarial robustness. Finally, as a large percentage of this robustness remains even when we remove stochasticity only during the adversarial attack, we conclude that training with stochasticity at the VOneBlock level leads the downstream layers to learn more robust representations.

Model weights and code are available at https://github.com/dicarlolab/vonenet.

1.1 Related Work

V1 modeling. Since the functional characterization of simple and complex cell responses by Hubel and Wiesel [45], modeling V1 responses has been an area of intense research. Early approaches consisting of hand-designed Gabor filters were successful in predicting V1 simple cell [46] and complex cell [47] responses to relatively simple stimuli. These models were later improved with the addition of further nonlinear operations, such as normalization and pooling, to account for extra-classical functional properties [48, 49]. Early hierarchical models of object recognition, which incorporated these types of V1 models as their early layers, were used with some success in modeling neuronal responses in the ventral stream and object recognition behavior [50, 51]. Generalized LNP models expanded the V1 model class by allowing a set of fitted excitatory and suppressive spatial filters to be nonlinearly combined [52], and subunit models introduced two sequential linear-nonlinear (LN) stages for fitting V1 responses [53]. Recently, both task-optimized and data-fitted CNNs were shown to narrowly beat a GFB model in predicting V1 responses, further validating that multiple LN stages may be needed to capture the complexity of V1 computations [30].

Model robustness. Much work has been devoted to increasing model robustness to adversarial attacks [54, 17, 15, 55, 56, 40, 57] and, to a lesser extent, common image corruptions [13, 58, 59]. In the case of adversarial perturbations, the current state-of-the-art is adversarial training, where a network is explicitly trained to correctly classify adversarially perturbed images [17].


Adversarial training is computationally expensive [60, 61], known to impact clean performance [62], and overfits to the attack constraints it is trained on [63, 64]. Other defenses involve adding noise, either during training [59], inference [65, 19], or both [15, 66, 67]. In the case of stochasticity during inference, Athalye et al. demonstrated that fixing broken gradients or taking the expectation over randomness often dramatically reduces the effectiveness of the defense [68]. In a promising demonstration that biological constraints can increase CNN robustness, Li et al. showed that biasing a neural network's representations towards those of the mouse V1 increases the robustness of grey-scale CIFAR [69] trained neural networks to both noise and white box adversarial attacks [40].

2 Adversarial Robustness Correlates with V1 Explained Variance

The susceptibility of current CNNs to be fooled by imperceptibly small adversarial perturbations suggests that these CNNs rely on some visual features not used by the primate visual system. Are models that better explain neural responses in the macaque V1 more robust to adversarial attacks? We analyzed an array of publicly available neural networks with standard ImageNet training [70], including AlexNet [1], VGG [3], ResNet [4], ResNeXt [71], DenseNet [72], SqueezeNet [73], ShuffleNet [74], and MnasNet [75], as well as several ResNet50 models with specialized training routines, such as adversarial training with L∞ (δ∞ = 4/255 and δ∞ = 8/255) and L2 (δ2 = 3) constraints [76], and adversarial noise combined with Stylized ImageNet training [59].

For each model, we evaluated how well it explained the responses of single V1 neurons evoked by given images, using a standard neural predictivity methodology based on partial least squares regression (PLS) [31, 32]. We used a neural dataset with 102 neurons and 450 different 4° images consisting of naturalistic textures and noise samples [77]. Explained variance was measured using a 10-fold cross-validation strategy. Similarly to other studies, we considered the field-of-view of all models to span 8° [31, 30] (see Supplementary Section A for more details). To evaluate the adversarial robustness of each model in the pool, we used untargeted projected gradient descent (PGD) [17], an iterative gradient-based white box adversarial attack, with L∞, L2, and L1 norm constraints of δ∞ = 1/1020, δ2 = 0.15, and δ1 = 40, respectively. For each norm, the attack strength was calibrated to drive variance in performance amongst non-adversarially trained models, resulting in perturbations well below the level of perceptibility (see Supplementary Section B1 for more details).
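To make the metric concrete, a minimal sketch of this type of cross-validated neural-predictivity computation is shown below, assuming `features` holds a model layer's responses to the 450 images and `spikes` the recorded responses of the 102 V1 neurons. The component count and the aggregation across neurons are assumptions for illustration; the actual Brain-Score pipeline details may differ.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import KFold

def v1_explained_variance(features, spikes, n_components=25, n_folds=10):
    """Cross-validated fraction of V1 response variance explained by a PLS
    regression from model features (n_images x n_units) to recorded
    responses (n_images x n_neurons)."""
    per_fold = []
    for train, test in KFold(n_splits=n_folds, shuffle=True).split(features):
        pls = PLSRegression(n_components=n_components, scale=False)
        pls.fit(features[train], spikes[train])
        pred = pls.predict(features[test])
        # per-neuron explained variance on held-out images
        resid = np.var(spikes[test] - pred, axis=0)
        total = np.var(spikes[test], axis=0)
        per_fold.append(1 - resid / total)
    # average over folds, then summarize across the neuron population
    return float(np.median(np.mean(per_fold, axis=0)))
```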

We found that accuracy under these white box adversarial attacks has a strong positive correlation with V1 explained variance (Fig. 1; r=0.85, p=2.1E-9, n=30). Notably, adversarially trained ResNet50 models [76], which were hardly affected by these attacks, explained more variance in V1 neural activations than any non-adversarially trained neural network. The correlation was not driven by the CNNs' clean ImageNet performance, since it was even more pronounced when the white box accuracy was normalized by the clean accuracy (r=0.94, p=7.4E-15, n=30), and was also present when removing the adversarially trained models (r=0.73, p=1.78E-5, n=27). While increasing the perturbation strength rapidly decreases the accuracy of non-adversarially trained models, the described correlation was present for a wide range of attack strengths: when the perturbation was multiplied by a factor of 4, greatly reducing the variance of non-adversarially trained models, white box accuracy was still significantly correlated with explained variance in V1 (r=0.82, p=1.17E-8, n=30).


Figure 1: CNNs' robustness to white box attacks correlates with explained response variance in primate V1. Comparison of top-1 accuracy under white box attacks of low perturbation strengths (average of 3 PGD constraints: δ∞ = 1/1020, δ2 = 0.15, and δ1 = 40) against fraction of explained variance of V1 responses (using PLS regression) for a pool of CNN models. Perturbation strength was chosen to drive variance across model performance. White box accuracy and V1 explained variance are significantly correlated (r=0.85, p=2.1E-9, n=30 CNNs; linear fit shown as gray line). Gray circles: non-adversarially trained CNNs (n=27); red circles: adversarially trained ResNet50 models (n=3); blue circle: VOneResNet50 (not included in correlation).


3 VOneNet: a Hybrid CNN with a V1 Neural Network Front-End

Inspired by the strong correlation between V1 explained variance and robustness to white box attacks, we developed the VOneNet architecture. The major characteristic that sets the VOneNet architecture apart is its V1 front-end, the VOneBlock. While most of its layers have parameters learned during ImageNet training, VOneNet's first block is a fixed-weight, mathematically parameterized CNN model that approximates primate neural processing of images up to and including area V1. Importantly, the VOneBlock components are mapped to specific neuronal populations in V1, which can be independently manipulated or switched off completely to evaluate their functional role. Finally, the VOneBlock can be easily adapted to different CNN base architectures, as described below. Here we build VOneNets from three base architectures: ResNet50 [4], CORnet-S [32], and AlexNet [1].

A VOneNet consists of the VOneBlock and a back-end network adapted from a base CNN (Fig. 2). When building a VOneNet from a particular CNN, we replace its first block (typically one stack of convolutional, normalization, nonlinear, and pooling layers; Supplementary Table C3) by the VOneBlock and a trained transition layer. The VOneBlock matches the replaced block's spatial map dimensions (56×56 for the base CNNs considered) but can have more channels (for the standard model we chose C_V1 = 512; see Supplementary Fig. C2 for an analysis of how the number of channels affects the results). It is followed by the transition layer, a 1×1 convolution that acts as a bottleneck to compress the higher channel number to the original block's depth. The VOneBlock is inspired by the LNP model of V1 [52], consisting of three consecutive processing stages (convolution, nonlinearity, and stochasticity generator) with two distinct neuronal types (simple and complex cells), each with a certain number of units per spatial location (standard model with SC_V1 = CC_V1 = 256). The following paragraphs describe the main components of the VOneBlock (see Supplementary Section C for a more detailed description).
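The model surgery described above reduces to a few lines in PyTorch. The sketch below assumes a `vone_block` module producing C_V1 = 512 channels on a 56×56 map and shows only the ResNet50 case; layer names follow torchvision, while the transition-layer details (bias, normalization) are illustrative choices rather than the released implementation.

```python
import torch.nn as nn
from torchvision.models import resnet50

def build_vone_resnet50(vone_block, c_v1=512, block_depth=64):
    base = resnet50()
    # 1x1 convolution bottleneck: compress C_V1 channels down to the depth
    # expected by the first residual stage of ResNet50
    transition = nn.Sequential(
        nn.Conv2d(c_v1, block_depth, kernel_size=1, bias=False),
        nn.BatchNorm2d(block_depth),
        nn.ReLU(inplace=True),
    )
    # drop ResNet50's first block (conv1/bn1/relu/maxpool); keep the rest
    back_end = nn.Sequential(
        base.layer1, base.layer2, base.layer3, base.layer4,
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), base.fc,
    )
    return nn.Sequential(vone_block, transition, back_end)
```

Because the VOneBlock already outputs a 64-compatible 56×56 map after the transition layer, the back-end trains exactly as the base CNN would, which is what allows the same front-end to be reused across architectures.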

Biologically-constrained Gabor filter bank. The first layer of the VOneBlock is a mathematically parameterized GFB with parameters tuned to approximate empirical primate V1 neural response data. It convolves the RGB input images with Gabor filters of multiple orientations, sizes, shapes, and spatial frequencies in a 56×56 spatial map. To instantiate a VOneBlock, we randomly sample C_V1 values for the GFB parameters according to empirically observed distributions of preferred orientation, peak spatial frequency, and size/shape of receptive fields [78, 79, 80]. The VOneBlock keeps color processing separate, with each Gabor filter convolving a single color channel from the input image. The resulting set of spatial filters is considerably more heterogeneous than those found in the first layer of standard CNNs and better approximates the diversity of primate V1 receptive fields (Supplementary Fig. C1).
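For concreteness, the sketch below instantiates one such Gabor kernel and samples a small filter bank. The uniform sampling ranges are placeholders standing in for the empirical V1 distributions cited above [78, 79, 80], and the assumed 28 pixels-per-degree follows from a 224-pixel input spanning 8°.

```python
import numpy as np
import torch

def gabor_kernel(theta, sf_cpd, sigma_x, sigma_y, phase, ppd=28, ksize=25):
    """2D Gabor filter; preferred spatial frequency is given in cycles/degree
    and converted to cycles/pixel via the assumed pixels-per-degree (ppd)."""
    sf = sf_cpd / ppd
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    rot_x = x * np.cos(theta) + y * np.sin(theta)
    rot_y = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-0.5 * ((rot_x / sigma_x) ** 2 + (rot_y / sigma_y) ** 2))
    carrier = np.cos(2 * np.pi * sf * rot_x + phase)
    return torch.tensor(envelope * carrier, dtype=torch.float32)

def sample_gfb(n_filters=512):
    """Heterogeneous bank of randomly instantiated Gabors (placeholder ranges,
    not the published empirical parameter tables)."""
    kernels = []
    for _ in range(n_filters):
        theta = np.random.uniform(0, np.pi)       # preferred orientation
        sf_cpd = np.random.uniform(0.5, 8.0)      # peak spatial frequency
        sigma = np.random.uniform(2.0, 8.0)       # receptive field size (px)
        aspect = np.random.uniform(0.5, 1.0)      # receptive field shape
        phase = np.random.uniform(0, 2 * np.pi)
        kernels.append(gabor_kernel(theta, sf_cpd, sigma, sigma / aspect, phase))
    return torch.stack(kernels).unsqueeze(1)      # (n_filters, 1, ksize, ksize)
```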

Simple and complex cells. While simple cells were once thought to be an intermediate step for computing complex cell responses, it is now known that they form the majority of downstream projections to V2 [81]. For this reason, the VOneBlock nonlinear layer has two different nonlinearities that are applied to each channel depending on its cell type: a rectified linear transformation for simple cells, and the spectral power of a quadrature phase-pair for complex cells.
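Concretely, the two channel types can be written as below, assuming the complex-cell channels are fed by quadrature pairs (two Gabor responses 90° apart in phase); the function names are illustrative.

```python
import torch
import torch.nn.functional as F

def simple_cell(linear_response):
    # half-wave rectification of the Gabor output
    return F.relu(linear_response)

def complex_cell(response_phase0, response_phase90):
    # spectral power of a quadrature phase pair (the classical energy model),
    # which makes the response invariant to the stimulus phase
    return torch.sqrt(response_phase0 ** 2 + response_phase90 ** 2)
```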

V1 stochasticity. A defining property of neuronal responses is their stochasticity: repeated measurements of a neuron in response to nominally identical visual inputs result in different spike trains. In awake monkeys, the mean spike count (averaged over many repetitions) depends on the presented image, and the spike train for each trial is approximately Poisson: the spike count variance is equal to the mean [82]. To approximate this property of neuronal responses, we add independent Gaussian noise to each unit of the VOneBlock, with variance equal to its activation. Before doing this, we apply an affine transformation to the units' activations so that both the mean stimulus response and the mean baseline activity are the same as those of a population of primate V1 neurons measured in a 25 ms time-window (mean stimulus response and spontaneous activity of 0.324 and 0.073 spikes, respectively; see Supplementary Table C2 and Fig. C2 for other time-windows). Like in the brain, the stochasticity of the VOneBlock is always on, during both training and inference.
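A minimal sketch of this noise generator follows, assuming the input activations have already been affinely mapped to V1-like spike counts in the 25 ms window; the small epsilon stabilizing the square root is an implementation assumption.

```python
import torch

def v1_stochasticity(scaled_activations, eps=1e-3):
    """Additive Gaussian noise with variance equal to the (non-negative) mean
    activation, approximating Poisson spiking. Sampling via randn_like keeps
    gradients flowing through the mean (the reparameterization trick)."""
    variance = torch.clamp(scaled_activations, min=0.0)
    noise = torch.randn_like(scaled_activations) * torch.sqrt(variance + eps)
    return scaled_activations + noise
```

Because the noise is resampled on every forward pass, two presentations of the same image yield different VOneBlock outputs, at train time and at test time alike.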

V1 explained variance. The VOneBlock was not developed to compete with state-of-the-art data-fitting models of V1 [53, 30]. Instead, we used available empirical distributions to constrain a GFB model, generating a neuronal space that approximates that of primate V1. Despite its simplicity, the VOneBlock outperformed all tested standard ImageNet-trained CNNs in explaining responses in the V1 dataset used and, with an explained variance of 0.375±0.006, came right in the middle of the range of the adversarially trained CNNs (Fig. 1, blue circle).


On the surface, it seems that these results are at odds with Cadena et al., which showed that a task-optimized CNN marginally outperformed a GFB model in explaining V1 responses [30]. However, our GFB has parameters constrained by empirical data, resulting in a better model of V1: when we used the parameters of the GFB in Cadena et al., we obtained a much lower explained variance (0.296±0.005). Our results suggest that a well-tuned GFB and adversarially trained CNNs are currently the best models in predicting primate V1 responses.

4 Results

4.1 Simulating a V1 at the front of CNNs improves robustness to white box attacks

To evaluate the VOneNets' robustness to white box adversarial attacks, we used untargeted PGD with L∞, L2, and L1 norm constraints of δ∞ ∈ [1/1020, 1/255], δ2 ∈ [0.15, 0.6], and δ1 ∈ [40, 160]. All VOneNets were attacked end-to-end, with gradients propagated through the VOneBlock to the pixels. Because the VOneBlock is stochastic, we took extra care when attacking VOneNets. We use the reparameterization trick when sampling VOneBlock stochasticity, allowing us to keep the gradients intact [83]. Further, to combat the effects of noise in the gradients, we adapted our attack such that at every PGD iteration we take 10 gradient samples and move in the average direction to increase the loss [68] (Supplementary Table B1). To verify the effectiveness of our attacks, we used controls suggested by Carlini et al. 2019 [84]: in particular, we show that increasing the magnitude of the norm of the attack monotonically decreases VOneNet accuracy, eventually reaching 0% accuracy (Supplementary Fig. B1).


Figure 2: Simulating V1 at the front of CNNs improves robustness to white box attacks. A. A VOneNet consists of a model of primate V1, the VOneBlock, followed by a standard CNN architecture. The VOneBlock contains a convolutional layer (a GFB with fixed weights constrained by empirical data), a nonlinear layer (simple or complex cell nonlinearities), and a stochastic layer (V1 stochasticity generator with variance equal to mean). B. Top: comparison of accuracy (top-1) on low strength white box attacks (PGD with δ∞ = 1/1020, δ2 = 0.15, δ1 = 40) between three base models (ResNet50, CORnet-S, and AlexNet) and their corresponding VOneNets. Gray bars show the performance of base models. Blue bars show the improvements from the VOneBlock. Dashed lines indicate the performance on clean images. Bottom: same, but for white box attacks of higher strength (PGD with δ∞ = 1/255, δ2 = 0.6, δ1 = 160). VOneNets show consistently higher white box accuracy, even for perturbation strengths that reduce the performance of the base models to nearly chance. Error-bars represent SD (n=3 seeds).


Additionally, increasing the number of PGD steps from 1 to many strictly increases the effect of the attack under a given norm size (Supplementary Fig. B2). Both of these indicate that VOneNet gradients contain useful information for the construction of adversarial examples. For more details and controls, see Supplementary Sections B1 and B2.
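The attack protocol above can be condensed as follows: an untargeted L∞ PGD loop in which each step averages the input gradient over several stochastic forward passes. The sketch assumes images in [0, 1] and a model whose V1 noise stays on and differentiable; the step size and iteration count are illustrative, not the values from Supplementary Table B1.

```python
import torch
import torch.nn.functional as F

def pgd_linf(model, images, labels, eps=1/1020, step=None, n_steps=64, k=10):
    step = step if step is not None else eps / 8
    adv = images.clone().detach()
    for _ in range(n_steps):
        adv.requires_grad_(True)
        grad = torch.zeros_like(adv)
        for _ in range(k):  # average gradients over k noise draws [68]
            loss = F.cross_entropy(model(adv), labels)
            grad = grad + torch.autograd.grad(loss, adv)[0]
        with torch.no_grad():
            adv = adv + step * grad.sign()                       # ascend the loss
            adv = images + torch.clamp(adv - images, -eps, eps)  # project to the eps-ball
            adv = adv.clamp(0.0, 1.0)
    return adv.detach()
```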

We found that simulating a V1 front-end substantially increased the robustness to white box attacks for all three base architectures that we tested (ResNet50, CORnet-S, and AlexNet), while leaving clean ImageNet performance largely intact (Fig. 2B). This was particularly evident for the stronger perturbation attacks, which reduced the accuracy of the base models to nearly chance (Fig. 2B, bottom). This suggests that the VOneBlock works as a generic front-end, which can be transferred to a variety of different neural networks as an architectural defense to increase robustness to adversarial attacks.

4.2 VOneNets outperform state-of-the-art methods on a composite set of perturbations

We then focused on the ResNet50 architecture and compared VOneResNet50 with two state-of-the-art training-based defense methods: adversarial training with a δ∞ = 4/255 constraint (AT L∞) [76], and adversarial noise with Stylized ImageNet training (ANT3×3+SIN) [59]. Because white box adversarial attacks are only part of the issue of model robustness, we considered a larger panel of image perturbations containing a variety of common corruptions. For evaluating model performance on corruptions, we used the ImageNet-C dataset [13], which consists of 15 different corruption types, each at 5 levels of severity, divided into 4 categories: noise, blur, weather, and digital (see Supplementary Fig. B5 for example images and Supplementary Section B3 for more details).

As expected, each of the defense methods had the highest accuracy under the perturbation type that it was designed for, but did not considerably improve over the base model on the other (Table 1). While the AT L∞ model suffered substantially on corruptions and clean performance, the ANT3×3+SIN model, the current state-of-the-art for common corruptions, had virtually no benefit on white box attacks. On the other hand, VOneResNet50 improved on both perturbation types, outperforming all the models on perturbation mean (average of white box and common corruptions) and overall mean (average of clean, white box, and common corruptions), with a difference to the second best model of 3.3% and 5.3%, respectively. Specifically, VOneResNet50 showed substantial improvements for all the white box attack constraints (Supplementary Fig. B1) and more moderate improvements for common image corruptions of the categories noise, blur, and digital (Supplementary Table B2 and Fig. B6). These results are particularly remarkable since VOneResNet50 was not optimized for robustness and does not benefit from any computationally expensive training procedure like the other defense methods. When compared to the base ResNet50, which has an identical training procedure, VOneResNet50 improves 18% on perturbation mean and 10.7% on overall mean (Table 1).

Table 1: VOneResNet50 outperforms other defenses on perturbation mean and overall mean. Accuracy (top-1) on overall mean, perturbation mean, white box attacks, common corruptions, and clean images for standard ResNet50 and three defense methods: adversarial training, adversarial noise combined with Stylized ImageNet training, and VOneResNet50. Perturbation mean is the average accuracy over white box attacks and common image corruptions. The overall mean is the average accuracy over the two perturbation types and clean ImageNet. Values are reported as mean and SD (n=3 seeds).

Model              Overall Mean [%]  Perturbation Mean [%]  White box [%]  Corruption [%]  Clean [%]
Base ResNet50 [4]  43.6±0.0          27.6±0.1               16.4±0.1       38.8±0.3        75.6±0.1
AT L∞ [76]         49.0              42.3                   52.3           32.3            62.4
ANT3×3+SIN [59]    48.0              34.9                   17.3           52.6            74.1
VOneResNet50       54.3±0.1          45.6±0.2               51.1±0.4       40.0±0.3        71.7±0.1

4.3 All components of the VOneBlock contribute to improved model robustness

Since the VOneBlock was not explicitly designed to increase model robustness, but rather to approximate primate V1, we investigated which of its components are responsible for the increased robustness we observe.


We performed a series of experiments wherein six new VOneNet variants were created by removing or modifying a part of the VOneBlock (referred to as VOneBlock variants). After ImageNet training, model robustness was evaluated as before on our panel of white box attacks and image corruptions. Three variants targeted the GFB: one sampling the Gabor parameters from uniform distributions instead of those found in primate V1, another without high spatial frequency (SF) filters (f < 2 cpd), and another without low SF filters (f > 2 cpd). Two additional variants targeted the nonlinearities: one without simple cells and another without complex cells. Finally, the sixth variant had the V1 neuronal stochasticity generator removed. Even though all VOneBlock variants are poorer approximations of primate V1, all resulting VOneResNet50 variants still had improved perturbation accuracy when compared to the base ResNet50 model (Fig. 3A). On the other hand, all variants except that with uniformly sampled Gabor parameters showed significant deficits in robustness compared to the unmodified VOneResNet50 (Fig. 3A; drops in perturbation accuracy between 1% and 15%).

Interestingly, we observed that some of these changes had a highly specific effect on the type of perturbation robustness affected (Table 2 and Supplementary Section D). Removing high SF Gabors negatively affected both white box and clean accuracy, while actually improving robustness to common image corruptions, particularly those of the noise and blur categories (Supplementary Table D1). Removing complex cells only impaired white box robustness, as opposed to removing simple cells, which was particularly detrimental to performance on image corruptions. Finally, removing V1 stochasticity considerably decreased white box accuracy while improving accuracy for both clean and corrupted images.

The VOneBlock variant without V1 stochasticity suffered the most dramatic loss in robustness. This is not altogether surprising, as several approaches to adversarial robustness have focused on noise as a defense [15, 19, 66]. To investigate whether V1 stochasticity alone accounted for the majority of the robustness gains, we evaluated the perturbation accuracy of a ResNet50 model with V1 stochasticity added at the output of its first block. Neuronal stochasticity was implemented exactly the same way as in the VOneBlock, by first applying an affine transformation to scale the activations so that they match primate V1 neuronal activity. Like the VOneResNet50, this model had stochasticity during training and inference, and showed a considerable improvement in robustness compared to the standard ResNet50.


Figure 3: All components of the VOneBlock work in synergy to improve robustness to image perturbations. A. Perturbation mean accuracy (top-1) for VOneResNet50 and several variations, each with a component of the VOneBlock removed or altered. From left to right: unmodified VOneResNet50, model with Gabor parameters sampled uniformly (within biological ranges), model without high SF Gabors, model without low SF Gabors, model without complex cells, model without simple cells, model without V1 stochasticity. Gray bars show the performance of ResNet50. Blue bars show the improvements due to the VOneBlock. Dashed line indicates the accuracy of the unmodified model. Error-bars represent SD (n=3 seeds). B. Same as in A, but comparing ResNet50, VOneResNet50 without V1 stochasticity, ResNet50 with V1 stochasticity added after the first block, and VOneResNet50. Adding V1 stochasticity to ResNet50 accounts for less than half of the total improvement of VOneResNet50, demonstrating a supralinear interaction between the V1 features and V1 stochasticity.


Table 2: Removal of some VOneBlock components causes impairments with high specificity. Difference in accuracy (top-1) relative to the unmodified VOneResNet50 for each of the variants with removed components, on overall mean, white box attacks, common corruptions, and clean images. Values are reported as mean and SD (n=3 seeds).

Component removed  Overall Mean [Δ%]  White box [Δ%]  Corruption [Δ%]  Clean [Δ%]
V1 dist            -0.4±0.3           0.4±0.8         -1.1±0.3         -0.5±0.3
High SF            -2.2±0.2           -3.8±0.3        1.9±0.4          -4.7±0.2
Low SF             -0.7±0.5           -1.6±1.1        -0.7±0.1         0.1±0.3
Complex            -1.0±0.6           -4.5±1.1        0.6±0.4          0.8±0.3
Simple             -3.0±0.5           -2.0±1.1        -4.8±0.5         -2.1±0.3
V1 stoch           -8.8±0.3           -30.6±0.4       1.5±0.5          2.6±0.1

However, this improvement accounted for only a fraction of the total gains of the VOneResNet50 model (Fig. 3B), demonstrating that there is a substantial supralinear interaction between the V1 features and the neuronal stochasticity. Merely adding V1 stochasticity to the first block of a standard CNN model does not increase robustness to the same degree as the full VOneBlock: the presence of V1 features more than doubles the contribution to perturbation robustness brought by the addition of neuronal stochasticity.

Finally, we sought to determine whether stochasticity during inference is key to defending against attacks. Thus, we analyzed the white box adversarial performance of VOneResNet50 while quenching stochasticity during the adversarial attack (Fig. 4). Remarkably, the majority of improvements in adversarial robustness originate from the neuronal stochasticity during training, indicating that V1 stochasticity induces the downstream layers to learn representations that are more robust to adversarial attacks. This is particularly interesting when contrasted with the ANT3×3+SIN defense, which has noise added to input images during training but does not learn representations with notably higher robustness to white box adversarial attacks (Table 1 and Supplementary Fig. B1).


Figure 4: V1 stochasticity during training leads to more robust learned representations. White box accuracy (top-1) for the VOneResNet50 variant without V1 stochasticity, VOneResNet50 with stochasticity removed during the white box attack and inference, and VOneResNet50. VOneResNet50 without stochasticity during attack/inference maintained over half of the robustness gains. Therefore, V1 stochasticity improves model robustness in two ways: rendering the attack itself less effective, and inducing more robust representations downstream of the VOneBlock, the latter accounting for the majority of the gains on our benchmark. Gray bars show the performance of the standard ResNet50. Blue bars show the improvements due to the VOneBlock. Error-bars represent SD (n=3 seeds).

5 Discussion

In this work, we demonstrate that adversarial robustness in CNNs is correlated with their ability to explain primate V1 neuronal responses, and that simulating the image processing of primate V1 at the front of standard neural network architectures significantly improves their robustness to image perturbations. Notably, this approach outperformed state-of-the-art defense methods on a large benchmark consisting of adversarial attacks and common image corruptions. Despite not being constructed to this end, any component of the model front-end we removed or modified to be less like primate V1 resulted in less overall robustness, and revealed that different components improve robustness to different perturbation types. Remarkably, we find that simulated V1 stochasticity interacts synergistically with the V1 model features and drives the downstream layers to learn representations more robust to adversarial perturbations.


Our approach bears some similarity to the pioneering study by Li et al. [40], in which a model's representations were regularized to approximate mouse V1, increasing its robustness in the grey-scale CIFAR dataset; however, here we go further in several key ways. First, while the mouse is gaining traction as a model for studying vision, a vast literature has established macaque vision as a quantitatively accurate model of human vision in general, and human object recognition in particular [85, 86]. Visual acuity in mice is much lower than in macaques and humans [87, 79], suggesting that vision in the mouse may serve different behavioral functions than in primates. Further, the regularization approach employed by Li et al. does not allow a clear disambiguation of which aspects of mouse V1 contribute to the improved robustness. Since the components of the VOneBlock proposed here are mappable to the brain, we can dissect the contributions of different neuronal populations in primate V1 to robustness against specific image perturbations. Finally, extending the robustness gains of biologically-constrained CNNs from gray-scale CIFAR to the full ImageNet dataset is a critical step towards real-world human-level applications.

The gains achieved by VOneNets are substantial, particularly against white box attacks, and have tangible benefits over other defense methods. Though adversarial training still provides the strongest defense for the attack statistics it is trained on, it has significant downsides. Beyond its considerable additional computational cost during training, adversarially trained networks have significantly lower performance on clean images, corrupted images, and images perturbed with attack statistics not seen during training, implying that adversarial training in its current form may not be viable as a general defense method. In contrast, by deploying an architectural change, VOneNets improve robustness to all adversarial attacks tested and many common image corruptions, and they accomplish this with no additional training overhead. This also suggests that the architectural gains of the VOneNet could be stacked with other training-based defense methods to achieve even greater overall robustness gains.

Relative to current methods, the success of this approach derives from engineering in a better approximation of the architecture of the most well studied primate visual area, combined with task optimization of the remaining free parameters of the downstream architecture [26]. This points to two potentially synergistic avenues for further gains: an even more neurobiologically precise model of V1 (i.e., a better VOneBlock), and an even more neurobiologically precise model of the downstream architecture. For example, one could extend the biological fidelity of the VOneBlock in the hope that it confers even greater robustness, by including properties such as divisive normalization [48, 49] and contextual modulation [88], to name a few. In addition to V1, the retina and the Lateral Geniculate Nucleus (LGN) also play important roles in pre-processing visual information, only partially captured by the current V1 model, suggesting that extending the work done here to a retina/LGN front-end has potential to better align CNNs with human visual object recognition [89, 90]. In addition, though our initial experiments show that multiple components of our V1 model work together to produce greater robustness, the nature of the relationship between adversarial robustness and explained variance in V1 responses is far from resolved. In particular, as the VOneNet variants with and without stochasticity achieve very similar V1 explained variances but have markedly different levels of adversarial robustness, it is clear that much theoretical work remains in better understanding both when and why matching biology leads to more robust computer vision models.

While neuroscience has recently seen a huge influx of new neurally-mechanistic models and tools drawn from machine learning [91, 92], the most recent advances in machine learning and computer vision have been driven mostly by the widespread availability of computational resources and very large labeled datasets [93], rather than by an understanding of the relevant brain mechanisms. Under the belief that biological intelligence still has a lot of untapped potential to contribute, a number of researchers have been pushing for more neuroscience-inspired machine learning algorithms [37, 42, 32]. The work presented here shows that this aspiration can become reality: the models presented here, drawn directly from primate neurobiology, indeed require less training to achieve more human-like behavior. This is one turn of a new virtuous circle, wherein neuroscience and artificial intelligence each feed into and reinforce the understanding and ability of the other.


Broader Impact

From a technological perspective, the ethical implications of our work are largely aligned with those of computer vision in general. While there is undoubtedly potential for malicious and abusive uses of computer vision, particularly in the form of discrimination or invasion of privacy, we believe that our work will aid in the production of more robust and intuitive behavior of computer vision algorithms. As CNNs are deployed in real-world situations, it is critical that they behave with the same level of stability as their human counterparts. In particular, they should at the very least not be confused by changes in input statistics that do not confuse humans. We believe that this work will help to bridge that gap. Furthermore, while algorithms are often thought to be impartial or unbiased, much research has shown that data-driven models like current CNNs are often even more biased than humans, implicitly keying in on and amplifying stereotypes. For this reason, making new CNNs that behave more like humans may actually reduce, or at least make more intuitive, their implicit biases. Unfortunately, we note that even with our work, these issues are not resolved yet. While we developed a more neurobiologically-constrained algorithm, it comes nowhere close to human-like behaviour in the wide range of circumstances experienced in the real world. Finally, from the perspective of neuroscience, we think that this work introduces a more accurate model of the primate visual system. Ultimately, better models contribute to a better mechanistic understanding of how the brain works and how to intervene in the case of illness or disease states. We think that our model contributes a stronger foundation for understanding the brain and building novel medical technology, such as neural implants for restoring vision in people with impairments.

Acknowledgments and Disclosure of Funding

We thank J. Anthony Movshon and Corey M. Ziemba for access to the V1 neural dataset, MIT-IBM Satori for providing the compute necessary for this project, John Cohn for technical support, Sijia Liu and Wieland Brendel for advice on adversarial attacks, and Adam Marblestone for insightful discussion. This work was supported by the PhRMA Foundation Postdoctoral Fellowship in Informatics (TM), the Semiconductor Research Corporation (SRC) and DARPA (JD, MS, JJD), the Massachusetts Institute of Technology Shoemaker Fellowship (MS), Office of Naval Research grant MURI-114407 (JJD), the Simons Foundation grant SCGB-542965 (JJD), and the MIT-IBM Watson AI Lab grant W1771646 (JJD).

References

[1] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet Classification with Deep Convolutional Neural Networks". In: Advances in Neural Information Processing Systems 25. Ed. by F. Pereira et al. Curran Associates, Inc., 2012, pp. 1097–1105.

[2] Christian Szegedy et al. "Going Deeper with Convolutions". In: (Sept. 2014). arXiv: 1409.4842 [cs.CV].

[3] Karen Simonyan and Andrew Zisserman. "Very Deep Convolutional Networks for Large-Scale Image Recognition". In: (Sept. 2014). arXiv: 1409.1556 [cs.CV].

[4] Kaiming He et al. "Deep Residual Learning for Image Recognition". In: (Dec. 2015). arXiv: 1512.03385 [cs.CV].

[5] Kaiming He et al. "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification". In: Proceedings of the IEEE International Conference on Computer Vision (2015), pp. 1026–1034. DOI: 10.1109/ICCV.2015.123. arXiv: 1502.01852.

[6] Christian Szegedy et al. "Intriguing properties of neural networks". In: (Dec. 2013). arXiv: 1312.6199 [cs.CV].

[7] Nicholas Carlini and David Wagner. "Towards Evaluating the Robustness of Neural Networks". In: (Aug. 2016). arXiv: 1608.04644 [cs.CR].

[8] Pin-Yu Chen et al. "EAD: Elastic-Net Attacks to Deep Neural Networks via Adversarial Examples". In: (Sept. 2017). arXiv: 1709.04114 [stat.ML].

[9] Jérôme Rony et al. "Decoupling Direction and Norm for Efficient Gradient-Based L2 Adversarial Attacks and Defenses". In: (Nov. 2018). arXiv: 1811.09600 [cs.CV].


[10] Wieland Brendel et al. "Accurate, reliable and fast robustness evaluation". In: (July 2019). arXiv: 1907.01003 [stat.ML].

[11] Samuel Dodge and Lina Karam. "A Study and Comparison of Human and Deep Learning Recognition Performance Under Visual Distortions". In: (May 2017). arXiv: 1705.02498 [cs.CV].

[12] Robert Geirhos et al. "Generalisation in humans and deep neural networks". In: (Aug. 2018). arXiv: 1808.08750 [cs.CV].

[13] Dan Hendrycks and Thomas Dietterich. "Benchmarking Neural Network Robustness to Common Corruptions and Perturbations". In: (Mar. 2019). arXiv: 1903.12261 [cs.LG].

[14] Nilaksh Das et al. "Keeping the Bad Guys Out: Protecting and Vaccinating Deep Learning with JPEG Compression". In: (May 2017). arXiv: 1705.02900 [cs.CV].

[15] Xuanqing Liu et al. "Towards Robust Neural Networks via Random Self-ensemble". In: (Dec. 2017). arXiv: 1712.00673 [cs.LG].

[16] Weilin Xu, David Evans, and Yanjun Qi. "Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks". In: arXiv [cs.CV] (Apr. 2017).

[17] Aleksander Madry et al. "Towards Deep Learning Models Resistant to Adversarial Attacks". In: (June 2017). arXiv: 1706.06083 [stat.ML].

[18] Yang Song et al. "PixelDefend: Leveraging Generative Models to Understand and Defend against Adversarial Examples". In: (Oct. 2017). arXiv: 1710.10766 [cs.LG].

[19] Guneet S. Dhillon et al. "Stochastic Activation Pruning for Robust Adversarial Defense". In: (Mar. 2018). arXiv: 1803.01442 [cs.LG].

[20] Jacob Buckman et al. "Thermometer Encoding: One Hot Way To Resist Adversarial Examples". In: (Feb. 2018). URL: https://openreview.net/forum?id=S18Su--CW.

[21] Chuan Guo et al. "Countering Adversarial Images using Input Transformations". In: (Feb. 2018). URL: https://openreview.net/forum?id=SyJ7ClWCb.

[22] Claudio Michaelis et al. "Benchmarking Robustness in Object Detection: Autonomous Driving when Winter is Coming". In: (2019), pp. 1–23. arXiv: 1907.07484. URL: http://arxiv.org/abs/1907.07484.

[23] Andrew Ilyas et al. "Adversarial Examples Are Not Bugs, They Are Features". In: (May 2019). arXiv: 1905.02175 [stat.ML].

[24] D. Linsley et al. "What are the Visual Features Underlying Human Versus Machine Vision?" In: ICCV. 2017. DOI: 10.1109/ICCVW.2017.331. arXiv: 1701.02704.

[25] Robert Geirhos et al. "ImageNet-Trained CNNs Are Biased Towards Texture". In: International Conference on Learning Representations (2019), pp. 1–20. arXiv: 1811.12231.

[26] Daniel L. K. Yamins et al. "Performance-optimized hierarchical models predict neural responses in higher visual cortex". In: Proceedings of the National Academy of Sciences 111.23 (2014), pp. 8619–8624. DOI: 10.1073/pnas.1403112111.

[27] Charles F. Cadieu et al. "Deep neural networks rival the representation of primate IT cortex for core visual object recognition". In: PLoS Computational Biology 10.12 (2014), e1003963.

[28] Seyed-Mahdi Khaligh-Razavi and Nikolaus Kriegeskorte. "Deep supervised, but not unsupervised, models may explain IT cortical representation". In: PLoS Computational Biology 10.11 (Nov. 2014), e1003915.

[29] Umut Güçlü and Marcel A. J. van Gerven. "Deep Neural Networks Reveal a Gradient in the Complexity of Neural Representations across the Brain's Ventral Visual Pathway". In: The Journal of Neuroscience 35.27 (2015), pp. 10005–10014. DOI: 10.1523/JNEUROSCI.5023-14.2015.

[30] Santiago A. Cadena et al. "Deep convolutional models improve predictions of macaque V1 responses to natural images". In: PLoS Computational Biology 15.4 (Apr. 2019), e1006897.

[31] Martin Schrimpf et al. "Brain-Score: Which Artificial Neural Network for Object Recognition is most Brain-Like?" Sept. 2018. URL: https://doi.org/10.1101/407007.


[32] Jonas Kubilius et al. "Brain-Like Object Recognition with High-Performing Shallow Recurrent ANNs". In: Advances in Neural Information Processing Systems 32. Ed. by H. Wallach et al. Curran Associates, Inc., 2019, pp. 12805–12816. URL: http://papers.nips.cc/paper/9441-brain-like-object-recognition-with-high-performing-shallow-recurrent-anns.pdf.

[33] Adam H. Marblestone, Greg Wayne, and Konrad P. Kording. "Toward an Integration of Deep Learning and Neuroscience". In: Frontiers in Computational Neuroscience 10 (Sept. 2016), p. 94.

[34] William Lotter, Gabriel Kreiman, and David Cox. "Deep Predictive Coding Networks for Video Prediction and Unsupervised Learning". In: (May 2016). arXiv: 1605.08104 [cs.LG].

[35] Aran Nayebi and Surya Ganguli. "Biologically inspired protection of deep networks from adversarial attacks". In: (Mar. 2017). arXiv: 1703.09202 [stat.ML].

[36] Jordan Guerguiev, Timothy P. Lillicrap, and Blake A. Richards. "Towards deep learning with segregated dendrites". In: eLife 6 (Dec. 2017).

[37] Demis Hassabis et al. "Neuroscience-Inspired Artificial Intelligence". In: Neuron 95.2 (2017), pp. 245–258. DOI: 10.1016/j.neuron.2017.06.011.

[38] Grace W. Lindsay and Kenneth D. Miller. "How biological attention mechanisms improve task performance in a large-scale visual system model". In: eLife 7 (Oct. 2018).

[39] Hanlin Tang et al. "Recurrent computations for visual pattern completion". In: Proceedings of the National Academy of Sciences 115.35 (2018), pp. 8835–8840. DOI: 10.1073/pnas.1719397115.

[40] Zhe Li et al. "Learning From Brains How to Regularize Machines". In: (Nov. 2019). arXiv: 1911.05072 [cs.LG].

[41] Hosein Hasani, Mahdieh Soleymani, and Hamid Aghajan. "Surround Modulation: A Bio-inspired Connectivity Structure for Convolutional Neural Networks". In: NeurIPS (2019), pp. 15877–15888. URL: http://papers.nips.cc/paper/9719-surround-modulation-a-bio-inspired-connectivity-structure-for-convolutional-neural-networks.

[42] Fabian H. Sinz et al. "Engineering a Less Artificial Intelligence". In: Neuron 103.6 (Sept. 2019), pp. 967–979.

[43] Anthony M. Zador. "A critique of pure learning and what artificial neural networks can learn from animal brains". In: Nature Communications 10.1 (Aug. 2019), p. 3770.

[44] J. Deng et al. "ImageNet: A large-scale hierarchical image database". In: 2009 IEEE Conference on Computer Vision and Pattern Recognition. June 2009, pp. 248–255.

[45] D. H. Hubel and T. N. Wiesel. "Receptive Fields, Binocular Interaction and Functional Architecture in the Cat's Visual Cortex". In: Journal of Physiology (1962), pp. 106–154.

[46] J. P. Jones and L. A. Palmer. "An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex". In: Journal of Neurophysiology 58.6 (1987), pp. 1233–1258. DOI: 10.1152/jn.1987.58.6.1233.

[47] E. H. Adelson and J. R. Bergen. "Spatiotemporal energy models for the perception of motion". In: Journal of the Optical Society of America A 2.2 (1985), pp. 284–299. DOI: 10.1364/JOSAA.2.000284.

[48] David J. Heeger, E. P. Simoncelli, and J. Anthony Movshon. "Computational models of cortical visual processing". In: Proceedings of the National Academy of Sciences 93.2 (1996), pp. 623–627. DOI: 10.1073/pnas.93.2.623.

[49] M. Carandini, D. J. Heeger, and J. A. Movshon. "Linearity and normalization in simple cells of the macaque primary visual cortex". In: The Journal of Neuroscience 17.21 (Nov. 1997), pp. 8621–8644. URL: http://www.ncbi.nlm.nih.gov/pubmed/9334433.

[50] Maximilian Riesenhuber and Tomaso Poggio. "Hierarchical models of object recognition in cortex". In: Nature Neuroscience 2.11 (1999), pp. 1019–1025. DOI: 10.1038/14819.


[51] Thomas Serre, Aude Oliva, and Tomaso Poggio. "A feedforward architecture accounts for rapid categorization". In: Proceedings of the National Academy of Sciences 104.15 (Apr. 2007), pp. 6424–6429.

[52] Nicole C. Rust et al. "Spatiotemporal Elements of Macaque V1 Receptive Fields". In: Neuron 46 (2005), pp. 945–956. DOI: 10.1016/j.neuron.2005.05.021.

[53] Brett Vintch, J. A. Movshon, and E. P. Simoncelli. "A Convolutional Subunit Model for Neuronal Responses in Macaque V1". In: Journal of Neuroscience 35.44 (2015), pp. 14829–14841. DOI: 10.1523/JNEUROSCI.2815-13.2015.

[54] Nicolas Papernot et al. "Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks". In: (Nov. 2015). arXiv: 1511.04508 [cs.CR].

[55] Mayank Singh et al. "Harnessing the Vulnerability of Latent Layers in Adversarially Trained Models". In: (May 2019). arXiv: 1905.05186 [cs.LG].

[56] Cihang Xie et al. "Feature denoising for improving adversarial robustness". In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019, pp. 501–509.

[57] Eric Wong, Leslie Rice, and J. Zico Kolter. "Fast is better than free: Revisiting adversarial training". In: (Jan. 2020). URL: https://openreview.net/forum?id=BJx040EFvH.

[58] Dan Hendrycks et al. "AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty". In: (Dec. 2019). arXiv: 1912.02781 [stat.ML].

[59] Evgenia Rusak et al. "Increasing the robustness of DNNs against image corruptions by playing the Game of Noise". In: (Jan. 2020). arXiv: 2001.06057 [cs.CV].

[60] Ali Shafahi et al. "Adversarial training for free!" In: Advances in Neural Information Processing Systems 32. Ed. by H. Wallach et al. Curran Associates, Inc., 2019, pp. 3358–3369.

[61] Eric Wong, Leslie Rice, and J. Zico Kolter. "Fast is better than free: Revisiting adversarial training". In: (Sept. 2020). URL: https://openreview.net/forum?id=BJx040EFvH.

[62] Dimitris Tsipras et al. "Robustness May Be at Odds with Accuracy". Sept. 2019. URL: https://openreview.net/forum?id=SyxAb30cY7.

[63] Yash Sharma and Pin-Yu Chen. "Attacking the Madry Defense Model with L1-based Adversarial Examples". In: (Oct. 2017). arXiv: 1710.10733 [stat.ML].

[64] Lukas Schott et al. "Towards the first adversarially robust neural network model on MNIST". In: (May 2018). arXiv: 1805.09190 [cs.CV].

[65] Cihang Xie et al. "Mitigating Adversarial Effects Through Randomization". Feb. 2018. URL: https://openreview.net/forum?id=Sk9yuql0Z.

[66] Adnan Siraj Rakin, Zhezhi He, and Deliang Fan. "Parametric Noise Injection: Trainable Randomness to Improve Deep Neural Network Robustness against Adversarial Attack". In: (Nov. 2018). arXiv: 1811.09310 [cs.LG].

[67] Joonho Lee et al. "ProbAct: A Probabilistic Activation Function for Deep Neural Networks". In: (May 2019). arXiv: 1905.10761 [cs.LG].

[68] Anish Athalye, Nicholas Carlini, and David Wagner. "Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples". In: (Feb. 2018). arXiv: 1802.00420 [cs.LG].

[69] Alex Krizhevsky. "Learning Multiple Layers of Features from Tiny Images". In: (Apr. 2009). URL: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.222.9220&rep=rep1&type=pdf.

[70] Adam Paszke et al. "Automatic differentiation in PyTorch". In: (Oct. 2017). URL: https://openreview.net/forum?id=BJJsrmfCZ.

[71] Saining Xie et al. "Aggregated Residual Transformations for Deep Neural Networks". In: (Nov. 2016). arXiv: 1611.05431 [cs.CV].

[72] Gao Huang et al. "Densely Connected Convolutional Networks". In: (Aug. 2016). arXiv: 1608.06993 [cs.CV].

[73] Forrest N. Iandola et al. "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size". In: (Feb. 2016). arXiv: 1602.07360 [cs.CV].

[74] Ningning Ma et al. "ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design". In: (July 2018). arXiv: 1807.11164 [cs.CV].

[75] Mingxing Tan et al. "MnasNet: Platform-Aware Neural Architecture Search for Mobile". In: (July 2018). arXiv: 1807.11626 [cs.CV].


[76] Logan Engstrom et al. Robustness (Python Library). 2019. URL: https://github.com/MadryLab/robustness.

[77] Jeremy Freeman et al. "A functional and perceptual signature of the second visual area in primates". In: Nature Neuroscience 16.7 (July 2013), pp. 974-981.

[78] Russell L. De Valois, E. W. Yund, and Norva Hepler. "The orientation and direction selectivity of cells in macaque visual cortex". In: Vision Research 22 (1982), pp. 531-544.

[79] Russell L. De Valois, Duane G. Albrecht, and Lisa G. Thorell. "Spatial Frequency Selectivity of Cells in Macaque Visual Cortex". In: Vision Research 22 (1982), pp. 545-559.

[80] Dario L. Ringach. "Spatial Structure and Symmetry of Simple-Cell Receptive Fields in Macaque Primary Visual Cortex". In: Journal of Neurophysiology 88 (2002), pp. 455-463.

[81] Yasmine El-Shamayleh et al. "Visual response properties of V1 neurons projecting to V2 in macaque". In: Journal of Neuroscience 33.42 (2013), pp. 16594-16605. DOI: 10.1523/JNEUROSCI.2753-13.2013.

[82] W. R. Softky and C. Koch. "The highly irregular firing of cortical cells is inconsistent with temporal integration of random EPSPs". In: Journal of Neuroscience 13.1 (1993), pp. 334-350. DOI: 10.1523/jneurosci.13-01-00334.1993.

[83] Diederik P. Kingma and Max Welling. "Auto-Encoding Variational Bayes". In: (Dec. 2013). arXiv: 1312.6114v10 [stat.ML].

[84] Nicholas Carlini et al. "On Evaluating Adversarial Robustness". In: (Feb. 2019). arXiv: 1902.06705v2 [cs.LG].

[85] Nikolaus Kriegeskorte, Marieke Mur, and Peter Bandettini. "Representational similarity analysis - connecting the branches of systems neuroscience". In: Frontiers in Systems Neuroscience 2 (Nov. 2008), p. 4.

[86] R. Rajalingham, K. Schmidt, and James J. DiCarlo. "Comparison of Object Recognition Behavior in Human and Monkey". In: Journal of Neuroscience 35.35 (2015), pp. 12127-12136. DOI: 10.1523/JNEUROSCI.0573-15.2015.

[87] Cristopher M. Niell and Michael P. Stryker. "Highly selective receptive fields in mouse visual cortex". In: The Journal of Neuroscience 28.30 (July 2008), pp. 7520-7536. DOI: 10.1523/JNEUROSCI.0623-08.2008.

[88] Pieter R. Roelfsema. "Cortical algorithms for perceptual grouping". In: Annual Review of Neuroscience 29 (2006), pp. 203-227. DOI: 10.1146/annurev.neuro.29.051605.112939.

[89] Alexander Berardino et al. "Eigen-Distortions of Hierarchical Representations". In: (Oct. 2017). arXiv: 1710.02266 [cs.CV].

[90] Jack Lindsey et al. "A Unified Theory of Early Visual Representations from Retina to Cortex through Anatomically Constrained Deep CNNs". In: ICLR (2019), pp. 1-17. DOI: 10.1101/511535. arXiv: 1901.00945.

[91] Daniel L. K. Yamins and James J. DiCarlo. "Using goal-driven deep learning models to understand sensory cortex". In: Nature Neuroscience 19.3 (Mar. 2016), pp. 356-365.

[92] Alexander Mathis et al. "DeepLabCut: markerless pose estimation of user-defined body parts with deep learning". In: Nature Neuroscience (2018). DOI: 10.1038/s41593-018-0209-y.

[93] Yann LeCun, Yoshua Bengio, and Geoffrey E. Hinton. "Deep learning". In: Nature 521.7553 (2015), pp. 436-444. DOI: 10.1038/nature14539.

[94] T. Marques, M. Schrimpf, and J. J. DiCarlo. "Hierarchical neural network models that more closely match primary visual cortex tend to better explain higher level visual cortical responses". In: Cosyne. 2020.

[95] Alexey Kurakin, Ian Goodfellow, and Samy Bengio. "Adversarial examples in the physical world". In: (July 2016). arXiv: 1607.02533 [cs.CV].

[96] Maria-Irina Nicolae et al. "Adversarial Robustness Toolbox v1.2.0". In: CoRR 1807.01069 (2018). URL: https://arxiv.org/pdf/1807.01069.


[97] Florian Tramer et al. "On Adaptive Attacks to Adversarial Example Defenses". In: (Feb. 2020). arXiv: 2002.08347 [cs.LG].

[98] Jonas Rauber et al. "Foolbox Native: Fast adversarial attacks to benchmark the robustness of machine learning models in PyTorch, TensorFlow, and JAX". In: Journal of Open Source Software 5.53 (2020), p. 2607. DOI: 10.21105/joss.02607.

[99] Matthew D. Zeiler and Rob Fergus. "Visualizing and Understanding Convolutional Networks". In: (Nov. 2013). arXiv: 1311.2901 [cs.CV].

[100] Chris Olah et al. "An Overview of Early Vision in InceptionV1". In: Distill (2020). https://distill.pub/2020/circuits/early-vision. DOI: 10.23915/distill.00024.002.

[101] Ian Nauhaus et al. "Orthogonal micro-organization of orientation and spatial frequency in primate primary visual cortex". In: Nature Neuroscience 15.12 (2012), pp. 1683-1690. DOI: 10.1038/nn.3255.


Supplementary Material

A Neural Explained Variance

We evaluated how well responses to given images in candidate CNNs explain the responses of single V1 neurons to the same images, using a standard neural predictivity methodology based on partial least squares (PLS) regression [31, 32]. We used a neural dataset obtained by extracellular recordings from 102 single units while presenting 450 different images for 100 ms, 20 times each, spanning 4° of the visual space. Results from this dataset were originally published in a study comparing V1 and V2 responses to naturalistic textures [77]. To avoid over-fitting, for each CNN we first chose its best V1 layer on a subset of the dataset (n=135 images) and reported the final neural explained variance calculated for the chosen layer on the remaining images of the dataset (n=315 images). Model units were mapped to each V1 neuron linearly using a PLS regression model with 25 components. The mapping was performed for each neuron using 90% of the image responses in the respective dataset and tested on the remaining held-out 10%, in a 10-fold cross-validation strategy. We report neural explained variance values normalized by the neuronal internal consistency. We calculated V1 neural explained variance for a large pool of CNNs (n=30 models).
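As an illustration of this mapping procedure, the sketch below computes a cross-validated neural predictivity score for one neuron using scikit-learn's PLSRegression. The exact scoring and pooling conventions of the pipeline in [31, 32] may differ; array names, shapes, and the fold seed are assumptions of this sketch.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import KFold

def neural_predictivity(model_features, neuron_responses, n_components=25):
    # model_features: (n_images, n_units) activations from the candidate V1 layer
    # neuron_responses: (n_images,) trial-averaged responses of a single V1 neuron
    rs = []
    for train_idx, test_idx in KFold(n_splits=10, shuffle=True,
                                     random_state=0).split(model_features):
        pls = PLSRegression(n_components=n_components, scale=False)
        pls.fit(model_features[train_idx], neuron_responses[train_idx])
        pred = pls.predict(model_features[test_idx]).squeeze()
        rs.append(np.corrcoef(pred, neuron_responses[test_idx])[0, 1])
    return np.mean(rs) ** 2  # squared correlation as fraction of explained variance

# The reported value is then normalized by the neuron's internal consistency
# (split-half reliability), e.g.: normalized_ev = neural_predictivity(F, y) / ic
```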

When using CNNs for a computer vision task such as object recognition on ImageNet, their field-of-view (in degrees) is not defined, since the only relevant input property is the image resolution (in pixels). However, when using a CNN to model the ventral stream, the visual spatial extent of the model's input is of key importance to ensure that it is correctly mapped to the data it is trying to explain. We assigned a spatial extent of 8° to all models, since this value has been previously used in studies benchmarking CNNs as models of the primate ventral stream [31, 94] and is consistent with the results in Cadena et al. [30].


B Image Perturbations

B1 White Box Adversarial Attacks

For performing white box adversarial attacks, we used untargeted projected gradient descent (PGD) [17] (also referred to as the Basic Iterative Method [95]) with L∞, L2, and L1 norm constraints. Given an image x, this method uses the gradient of the loss to iteratively construct an adversarial image x_adv which maximizes the model loss within an Lp bound around x. Formally, in the case of an L∞ constraint, PGD iteratively computes x_adv as

$$x_{t+1} = \mathrm{Proj}_{x+S}\left(x_t + \alpha\,\mathrm{sgn}\left(\nabla_{x_t} L(\theta, x_t, y)\right)\right)$$

where x is the original image and the Proj operator ensures that the final computed adversarial image x_adv is constrained to the space x + S, here the L∞ ball around x. In the case of L1 or L2 constraints, at each iteration $\nabla_{x_t} L(\theta, x_t, y)$ is scaled to have an L1 or L2 norm of α, and the Proj operator ensures that the final x_adv is within an L1 or L2 norm ball around x.

For our white box benchmarks, we used δ∞ ∈ {1/1020, 1/255}, δ2 ∈ {0.15, 0.6}, and δ1 ∈ {40, 160} constraints, where δ = x - x_adv, for a total of six different attack variants with two strengths under each norm constraint. We arrived at these strengths by calibrating the lower-strength attack for each norm to give a reasonable degree of variance among normally trained CNNs. Because normally trained CNNs are extremely sensitive to adversarial perturbations, this resulted in perturbations well below the level of perceptibility and, in most cases, particularly with the L∞ constraint, perturbations that would not even be rendered differently by a monitor for display. We set the high attack strength at 4 times the low value, which brought standard ImageNet-trained CNNs to nearly chance performance. Still, even this higher perturbation strength remained practically imperceptible (Fig. B3, left). Each perturbation was computed with 64 PGD iterations and a step size α = ε/32, where ε = ‖δ‖_p, for the same 5,000 images from the ImageNet validation set on a per-model basis, and final top-1 accuracy was reported. The Adversarial Robustness Toolbox [96] was used for computing the attacks.
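For concreteness, here is a minimal PyTorch sketch of the PGD-L∞ variant described above. The paper used the Adversarial Robustness Toolbox [96]; the clamp to a [0, 1] input range and the function name are assumptions of this sketch, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def pgd_linf(model, x, y, eps=1/255, n_iter=64):
    alpha = eps / 32                      # step size alpha = eps/32, as in the text
    x_adv = x.clone().detach()
    for _ in range(n_iter):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()       # ascend the loss
            x_adv = x + (x_adv - x).clamp(-eps, eps)  # project onto the Linf ball
            x_adv = x_adv.clamp(0, 1)                 # assumed valid input range
    return x_adv.detach()
```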

While adversarial formulations using Lagrangian relaxation, like the C&W attack [7] and the EAD attack [8], are known to be more effective than PGD at finding minimal perturbations under L2 and L1 constraints, these formulations involve computationally expensive line searches, often requiring thousands of iterations to converge, making them difficult to scale to 5,000 ImageNet samples for a large number of models. Furthermore, because some of the models we tested operate with stochasticity during inference, we found that other state-of-the-art attacks formulated to efficiently find minimal perturbation distances [9, 10] were generally difficult to tune, as their success becomes stochastic as they near the decision boundary. Thus, we proceeded with the PGD formulation, as we found it to be the most reliable, computationally tractable, and conceptually simple to follow. We share our model and weights and encourage anyone interested to attack our model to verify our results.

Although the stochastic models were implemented with the reparameterization trick for random sampling, which leaves the gradients intact [83], we performed several sanity checks to make sure that the white box attacks were effectively implemented. Following the advice of Athalye et al. [68] and Carlini et al. [84], we observed that in all of the white box attacks, increasing the attack strength decreased model performance, eventually bringing accuracy to zero (Fig. B1), and, additionally, that for any given constraint and attack strength, increasing PGD iterations from one to many increased attack effectiveness (Fig. B2). Furthermore, to be extra cautious, we followed the method Athalye et al. [68] used to break Stochastic Activation Pruning [19] and replaced $\nabla_x f(x)$ with $\sum_{i=1}^{k} \nabla_x f(x)$, effectively performing Monte Carlo (MC) sampling of the loss gradient at every PGD iteration. While with high perturbation strength even k = 1 was sufficient to bring our model's accuracy to zero, we found that k = 10 generally achieved higher attack success at intermediate and large perturbation strengths (Table B1), and thus we used this approach for all attacks on stochastic models.


Figure B1: VOneResNet50 improves robustness to white box attacks over a wide range of perturbation strengths. White box accuracy vs. perturbation strength curves for PGD attacks with L∞ (up to 4/255), L2 (up to 2.4), and L1 (up to 640) constraints, for ResNet50, VOneResNet50, ATL∞, and ANT3×3+SIN. Adding the VOneBlock to ResNet50 consistently improves robustness under all constraints and attack strengths: VOneResNet50 can withstand perturbations roughly 4 times higher than the standard ResNet50 for the same performance level. Remarkably, VOneResNet50 outperformed the adversarially trained model (ATL∞) over a considerable range of perturbation strengths, particularly with the L1 constraint. Adversarial noise training combined with Stylized ImageNet (ANT3×3+SIN) shows virtually no improvement in robustness to white box attacks. In all white box attacks and for all models, increasing the attack strength decreased performance, eventually bringing accuracy to zero.

Figure B2: White box attack effectiveness converges for a large number of PGD iteration steps. White box accuracy vs. PGD iteration curves (1 to 64 iterations) for PGD attacks with δ∞ = 1/255, δ2 = 0.6, and δ1 = 160 constraints, for ResNet50, VOneResNet50, ATL∞, and ANT3×3+SIN. Increasing the number of PGD iteration steps makes the attack more effective, generally converging after 32 iterations.

Table B1: Improving the effectiveness of white box attacks with Monte Carlo sampling. White box attack accuracy (top-1, %) under different norm constraints, and overall white box mean, using standard PGD and PGD with Monte Carlo sampling of the loss gradients (PGD-MC, k = 10) on VOneResNet50, for 64 PGD iteration steps. For all stochastic models, we used PGD with Monte Carlo sampling in the white box attacks throughout the paper. Values are mean and SD (n = 3 seeds).

Attack type      Mean      PGD-L1 (40)  PGD-L1 (160)  PGD-L2 (0.15)  PGD-L2 (0.6)  PGD-L∞ (1/1020)  PGD-L∞ (1/255)
Standard PGD     52.8±0.2  64.4±0.4     42.2±0.1      65.2±0.3       45.3±0.4      63.4±0.4         36.4±0.4
PGD-MC (k=10)    51.1±0.4  64.2±0.5     40.4±0.5      65.3±0.4       44.3±0.3      61.9±0.4         30.3±0.5


Although VOneResNet50 was not designed to explicitly improve robustness to white box attacks, it offers substantial gains over the standard ResNet50 model (Table 1). A closer inspection of the white box accuracy vs. perturbation strength curves (Fig. B1) reveals that VOneResNet50 can withstand perturbations roughly 4 times higher than the standard ResNet50 for the same performance level (the blue curve is shifted to the right of the black curve by a factor of 4). To help visualize how this difference translates into actual images, Fig. B3 shows examples of white box adversarial images for both ResNet50 and VOneResNet50, at the perturbation strengths that bring the respective models to nearly chance performance.

Figure B3: Comparison of adversarial images for ResNet50 and VOneResNet50. Visualization of white box attacks under the 3 different norm constraints (L∞, L2, and L1) at the perturbation strength that brings each model to nearly chance performance (Fig. B1): ‖δ‖∞ = 1/255, ‖δ‖2 = 0.60, and ‖δ‖1 = 160 for ResNet50 (left), and ‖δ‖∞ = 4/255, ‖δ‖2 = 2.40, and ‖δ‖1 = 640 for VOneResNet50 (right). For each attack, the perturbation is shown on the left, scaled for visualization, with the perturbed image next to it. Perturbation strengths and the classes assigned to the adversarial images (e.g., Sleeping Bag for ResNet50; Barrel, Norwegian Elkhound, and Schipperke for VOneResNet50, from an original image of an Egyptian Cat) are shown over the images.


B2 Additional attack controls

The field of adversarial robustness is littered with examples of proposed defense methods that were quickly defeated with some minor attack optimizations or adaptations [68, 84, 97]. For this reason, we took great care to verify our claims of improved robustness of the VOneNets. In addition to the implementation details and controls described in the previous section, we validated our results by further optimizing our attacks on both the standard stochastic VOneResNet50 and the VOneResNet50 with stochasticity removed during the attack and inference.

First, on the standard stochastic VOneResNet50, we ensured that our attack hyper-parameters, including step size (α), number of gradient samples (k), and PGD iterations, are sufficient for evaluating the network's robustness. For PGD with δ∞ = 1/255 and 64 iterations, we performed a large grid search over α and k, observing only a marginal increase in attack effectiveness (Fig. B4). For the most effective step size (α = ε/8), increasing the number of gradient samples beyond k = 16 no longer decreased accuracy under the attack. At k = 128, VOneResNet50 accuracy is only reduced from 29.15% to 26.0%, remaining a large margin above the ResNet50 accuracy of 0.8%. Additionally, increasing the number of PGD iterations all the way up to 1000 (with α = ε/8 and k = 16) only further reduced performance by 0.5%. Finally, using 5-10 random restarts did not improve the attack beyond the most effective k and α settings.

Figure B4: Further attack optimization does not overturn results. White box accuracy for PGD with δ∞ = 1/255, 64 iterations, on 500 images, for a large grid search over step size (ε/32 to ε/2) and number of gradient samples (1 to 128). Further attack optimization only marginally reduces model performance, which remains far above the reported ResNet50 accuracy. Lines represent mean and shaded areas SD (n = 3 seeds).

Second, while boundary attacks are not ideal against the standard stochastic VOneResNet50 due to its shifting decision boundary, we investigated VOneResNet50 without stochasticity during the attack and inference using the Brendel and Bethge (B&B) attack [10], a powerful white box adversary that aims to find minimal adversarial perturbations along the decision boundary of the network. We reasoned that if the B&B attack offers similar results to those observed using PGD, this is a good sanity check that our observed robustness is reliable and not a result of masked gradients around the input or a quirk of PGD. Using the Foolbox [98] implementation of the L∞ B&B attack, initialized with 500 successful adversarial images generated using PGD with δ∞ = 8/255, we see nearly identical scores compared to standard PGD with optimal hyper-parameters at δ∞ = 1/1020 and δ∞ = 1/255.

We conclude that these additional controls do not qualitatively change our reported results and provide further evidence that the observed improved robustness of the VOneNets is real and not an artifact.


B3 Common Image Corruptions

For evaluating model robustness to common image corruptions, we used ImageNet-C, a dataset publicly available at https://github.com/hendrycks/robustness [13]. It consists of 15 different types of corruptions, each at five levels of severity, for a total of 75 different perturbations (Fig. B5). The corruption types fall into four categories: noise, blur, weather, and digital effects. Noise includes Gaussian noise, shot noise, and impulse noise; blur includes motion blur, defocus blur, zoom blur, and glass blur; weather includes snow, frost, fog, and brightness; digital effects include JPEG compression, elastic transformations, contrast, and pixelation. For every corruption type and at every severity, we tested on all 50,000 images from the ImageNet validation set. We note that these images have been JPEG compressed by the original authors for sharing, which is known to have a minor but noticeable effect on final network performance.
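The evaluation protocol amounts to a loop over the 15 corruption types and 5 severities. A minimal sketch follows; `make_loader(corruption, severity)` is an assumed helper that yields batches from the corresponding ImageNet-C folder, and the corruption names match the public release's directory layout.

```python
import torch

CORRUPTIONS = ["gaussian_noise", "shot_noise", "impulse_noise", "defocus_blur",
               "glass_blur", "motion_blur", "zoom_blur", "snow", "frost", "fog",
               "brightness", "contrast", "elastic_transform", "pixelate",
               "jpeg_compression"]

@torch.no_grad()
def eval_imagenet_c(model, make_loader):
    model.eval()
    results = {}
    for corruption in CORRUPTIONS:
        for severity in range(1, 6):
            correct = total = 0
            for x, y in make_loader(corruption, severity):  # 50k val images each
                correct += (model(x).argmax(dim=1) == y).sum().item()
                total += y.numel()
            results[(corruption, severity)] = correct / total  # top-1 accuracy
    return results
```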

Figure B5: Common image corruptions with intermediate severity. Visualization of all 15 common image corruption types, evaluated at severity = 3. First row: original image followed by the noise corruptions (Gaussian, shot, impulse); second row: blur corruptions (defocus, glass, motion, zoom); third row: weather corruptions (snow, frost, fog, brightness); fourth row: digital corruptions (contrast, elastic transform, pixelate, JPEG compression).


Figure B6: VOneResNet50 improves robustness to noise, blur, and digital corruptions. Corruption accuracy vs. severity curves for the four categories of common image corruptions (noise, blur, weather, and digital), for ResNet50, VOneResNet50, ATL∞, and ANT3×3+SIN. Adding the VOneBlock to ResNet50 improves robustness to all corruption categories except weather. The adversarially trained model (ATL∞) is considerably worse at common image corruptions. Adversarial noise combined with Stylized ImageNet training (ANT3×3+SIN) is consistently more robust to all corruption categories. For ResNet50 and VOneResNet50, error bars represent SD (n = 3 seeds).

Table B2: Detailed common image corruption results (top-1 accuracy, %). Adding the VOneBlock to ResNet50 improves robustness to several of the corruption types. However, for others, particularly fog and contrast, robustness decreases; interestingly, these are also the corruption types on which the adversarially trained model (ATL∞) does worst. Adversarial noise combined with Stylized ImageNet training (ANT3×3+SIN) is the most robust model for most corruption types. For ResNet50 and VOneResNet50, values are mean and SD (n = 3 seeds).

Noise / Blur:
Model            Gaussian   Shot       Impulse    Defocus   Glass     Motion    Zoom
ResNet50 [4]     29.6±1.1   27.9±0.9   24.5±1.2   38.8±0.7  26.1±0.6  37.9±0.4  34.5±0.7
ATL∞ [76]        23.6       22.3       14.8       24.4      34.3      33.2      34.9
ANT3×3+SIN [59]  61.6       60.7       59.9       41.5      38.6      46.0      37.3
VOneResNet50     34.6±1.4   33.4±1.2   31.9±1.5   37.8±0.1  35.7±0.1  37.4±0.2  34.0±0.2

Weather / Digital:
Model            Snow       Frost      Fog        Bright    Contrast  Elastic   Pixelate  JPEG
ResNet50 [4]     30.1±0.5   36.7±0.4   43.6±0.8   66.8±0.2  38.8±0.5  44.8±0.3  47.0±2.7  55.1±1.0
ATL∞ [76]        29.5       31.2       8.4        54.1      10.2      48.4      56.3      58.7
ANT3×3+SIN [59]  47.7       52.2       54.9       68.2      49.3      52.7      58.9      59.3
VOneResNet50     25.2±0.5   36.8±0.4   30.1±0.5   62.4±0.3  28.5±0.5  48.7±0.1  63.3±0.2  61.0±0.3


B4 Other defense methods

We compared the performance of VOneResNet50 on the described computer vision benchmarks with two training-based defense methods using the ResNet50 architecture: adversarial training with a δ∞ = 4/255 constraint (ATL∞) [76], and adversarial noise with Stylized ImageNet training (ANT3×3+SIN) [59].

The ATL∞ model was downloaded from the publicly available adversarially trained models at https://github.com/MadryLab/robustness [76]. This model was trained following the adversarial training paradigm of Madry et al. [17] to solve the min-max problem

$$\min_\theta \rho(\theta), \quad \text{where} \quad \rho(\theta) = \mathbb{E}_{(x,y)\sim D}\left[\max_{\delta \in S} L(\theta, x + \delta, y)\right] \quad (1)$$

Here the goal is to learn parameters θ minimizing the loss L for training images x and labels y drawn from D, while perturbing x with δ ∈ S to maximally increase the loss. In this case, D is ImageNet and PGD with δ∞ = 4/255 is used to find the perturbation maximizing the loss for a given (x, y) pair. The δ∞ = 4/255 constraint model was selected for comparison because it had the best performance on our conglomerate benchmark of adversarial attacks and common image corruptions.

The ANT3×3+SIN model was obtained from https://github.com/bethgelab/game-of-noise. Recently, Rusak et al. showed that training a ResNet50 model with several types of additive input noise improved robustness to common image corruptions [59]. Inspired by this observation, the authors decided to train a model while optimizing a noise distribution that maximally confuses it. Similarly to standard adversarial training, this results in solving a min-max problem

$$\min_\theta \max_\phi \; \mathbb{E}_{(x,y)\sim D}\,\mathbb{E}_{\delta \sim p_\phi(\delta)}\left[L(\theta, x + \delta, y)\right] \quad (2)$$

where $p_\phi(\delta)$ is the maximally confusing noise distribution. The main difference from regular adversarial training in Equation 1 is that while in the former δ is optimized directly, here δ is found by optimizing a constrained distribution with local spatial correlations. Complementing the adversarial noise training with Stylized ImageNet training [25] produced the model with the current best robustness on the ImageNet-C benchmark as a standalone method.


C VOneNet implementation details

C1 Convolutional layer

The convolutional layer of the VOneBlock is a mathematically parameterized Gabor filter bank (GFB). We set the stride of the GFB to four, yielding a 56×56 spatial map of activations. Since the number of channels in most CNNs' first convolution is relatively small (64 in the architectures adapted), we used a larger number in the VOneBlock so that the Gabors would cover the large parameter space and better approximate primate V1. We set the main VOneNet models to contain 512 channels, equally split between simple and complex cells (see Fig. C2A for an analysis of how the number of channels affects performance). Each channel in the GFB convolves a single color channel from the input image.

The Gabor function consists of a two-dimensional grating with a Gaussian envelope and is described by the following equation:

$$G_{\theta,f,\phi,n_x,n_y}(x, y) = \frac{1}{2\pi\sigma_x\sigma_y} \exp\left[-0.5\left(\frac{x_{rot}^2}{\sigma_x^2} + \frac{y_{rot}^2}{\sigma_y^2}\right)\right] \cos(2\pi f x_{rot} + \phi) \quad (3)$$

where

$$x_{rot} = x\cos(\theta) + y\sin(\theta), \qquad y_{rot} = -x\sin(\theta) + y\cos(\theta) \quad (4)$$

$$\sigma_x = \frac{n_x}{f}, \qquad \sigma_y = \frac{n_y}{f} \quad (5)$$

$x_{rot}$ and $y_{rot}$ are the coordinates orthogonal and parallel to the grating, θ is the angle of the grating orientation, f is the spatial frequency of the grating, φ is the phase of the grating relative to the Gaussian envelope, and σ_x and σ_y are the standard deviations of the Gaussian envelope orthogonal and parallel to the grating, which can be defined as multiples (n_x and n_y) of the grating cycle (the inverse of the frequency).
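To make Equations (3)-(5) concrete, here is a minimal PyTorch sketch of a single Gabor kernel. The 25×25 px kernel size is from the text; the degrees-per-pixel conversion (8° over 224 px, so a 25 px kernel spans about 0.89°) and the function name are illustrative assumptions of this sketch.

```python
import math
import torch

def gabor_kernel(theta, f, phi, nx, ny, ksize=25, fov_deg=25 * 8 / 224):
    # pixel grid in degrees of visual angle, centered on the kernel
    coords = torch.linspace(-fov_deg / 2, fov_deg / 2, ksize)
    y, x = torch.meshgrid(coords, coords, indexing="ij")
    x_rot = x * math.cos(theta) + y * math.sin(theta)    # Eq. (4)
    y_rot = -x * math.sin(theta) + y * math.cos(theta)
    sx, sy = nx / f, ny / f                              # Eq. (5): envelope SDs
    envelope = torch.exp(-0.5 * (x_rot**2 / sx**2 + y_rot**2 / sy**2))
    grating = torch.cos(2 * math.pi * f * x_rot + phi)
    return envelope * grating / (2 * math.pi * sx * sy)  # Eq. (3)
```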

Although the Gabor filter greatly reduces the number of parameters defining the linear spatial component of V1 neurons, it still has five parameters per channel. Fortunately, there is a vast neurophysiology literature with detailed characterizations of primate V1 response properties, which can be used to constrain these parameters. To instantiate a VOneBlock with C_V1 channels, we sampled C_V1 values for each of the parameters according to empirically constrained distributions (Table C1). Due to the resolution of the input images, we limited the ranges of spatial frequencies (f < 5.6 cpd) and number of cycles (n > 0.1).

Table C1: Empirical distributions used to sample the GFB parameters in the VOneBlock.

Parameter   Reference
θ           De Valois et al. [78], Fig. 5: orientation tuning (foveal)
f           De Valois et al. [79], Fig. 6: peak spatial frequency (X cells, foveal)
φ           Uniform distribution in [0, 360] deg
nx, ny      Ringach 2002 [80], Fig. 4: (nx, ny) joint distribution

Critical to the correct implementation of the biologically-constrained parameters of the GFB is the choice of the model's field of view in degrees. As previously mentioned in Section A, we used 8° as the input spatial extent for all CNN models. It is important to note that the set of spatial filters in the VOneBlock's GFB differs considerably from those in the first convolution of most CNNs (Fig. C1). While standard CNNs learn filters that resemble Gabors in their input layer [1, 99, 100], due to the limited span of their kernels, these do not vary significantly in size and spatial frequency. V1 neurons, on the other hand, are known to exhibit a wide range of receptive field properties. This phenomenon is captured in the VOneBlock, with the spatial frequencies and receptive field sizes of the Gabors ranging over more than one order of magnitude. Due to this high variability, we set the convolution kernel to be 25×25 px, which is considerably larger than those in standard CNNs (Fig. C1).


Figure C1: Comparison between filters in the first convolution of standard CNNs and the VOneBlock. Example filters from the first convolution in the VOneBlock, ResNet50, CORnet-S, and AlexNet (from left to right). VOneBlock filters are all parameterized as Gabors, varying considerably in size and spatial frequency. Standard CNNs have some filters with shapes other than Gabors, particularly center-surround, and are limited in size by their small kernels. Kernel sizes are 25 px, 7 px, 7 px, and 11 px for VOneBlock, ResNet50, CORnet-S, and AlexNet, respectively.


C2 Nonlinear layer

VOneBlock's nonlinear layer has two different nonlinearities, applied to each channel depending on its cell type: a rectified linear transformation for simple cells (6), and the spectral power of a quadrature phase-pair (7) for complex cells:

$$S^{nl}_{\theta,f,\phi,n_x,n_y} = \begin{cases} S^{l}_{\theta,f,\phi,n_x,n_y} & \text{if } S^{l}_{\theta,f,\phi,n_x,n_y} > 0 \\ 0 & \text{otherwise} \end{cases} \quad (6)$$

$$C^{nl}_{\theta,f,\phi,n_x,n_y} = \frac{1}{\sqrt{2}}\sqrt{\left(C^{l}_{\theta,f,\phi,n_x,n_y}\right)^2 + \left(C^{l}_{\theta,f,\phi+\pi/2,n_x,n_y}\right)^2} \quad (7)$$

where $S^{l}$ and $S^{nl}$ are the linear and nonlinear responses of a simple cell, and $C^{l}$ and $C^{nl}$ are the same for a complex cell.
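A minimal sketch of Equations (6) and (7) in PyTorch; the function names and the convention of passing the two quadrature phases (φ and φ + π/2) as separate tensors are illustrative assumptions.

```python
import math
import torch

def simple_cell(s_linear):
    # Eq. (6): half-wave rectification of the linear Gabor response
    return torch.relu(s_linear)

def complex_cell(c_phase0, c_phase90):
    # Eq. (7): spectral power of the quadrature pair (phases phi and phi + pi/2)
    return torch.sqrt(c_phase0**2 + c_phase90**2) / math.sqrt(2)
```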

C3 Neuronal stochasticity generator

In awake monkeys, spike trains of V1 neurons are approximately Poisson, i.e., the variance and mean of spike counts in a given time-window over a set of repetitions are roughly the same [82]. We incorporated stochasticity into the VOneBlock to emulate this property of neuronal responses. Since the Poisson distribution is not continuous, it breaks the gradients in a white box attack, giving a false sense of robustness [68]. To avoid this situation and facilitate the evaluation of the model's true robustness, our neuronal stochasticity generator as implemented uses a continuous second-order approximation of Poisson noise, adding Gaussian noise with variance equal to the activation:

$$R_s \sim \mathcal{N}\left(\mu = R_{ns},\; \sigma^2 = R_{ns}\right) \quad (8)$$

where $R_{ns}$ and $R_s$ are the non-stochastic and stochastic responses of a neuron.

To approximate the same levels of neuronal stochasticity as primate V1 neurons, it is critical that the VOneBlock activations are in the same range as the V1 neuronal responses (number of spike counts in a given time-window). Thus, we applied an affine transformation to the activations so that both the mean stimulus response and the mean baseline activity match those of a population of V1 neurons (Table C2 shows the mean responses and spontaneous activity of V1 neurons measured over different time-windows).

Table C2: Mean stimulus responses and spontaneous activity of a population of V1 neurons for different time-windows.

Time-window [ms]   Stimulus response [spikes]   Spontaneous activity [spikes]
25                 0.324                        0.073
50                 0.655                        0.145
100                1.05                         0.29
200                1.38                         0.58

Since the exact time-window over which V1 responses are integrated during visual perception is still an open question, we considered a time-window of 25 ms (Fig. C2B). To keep the outputs of the VOneBlock in a range that does not deviate considerably from the typical range of activations in CNNs, so that the model can be efficiently trained using standard training parameters, we applied the inverse transformation to scale the outputs back to their original range after the stochastic layer.
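Putting Equation (8) and the affine scaling together, a minimal sketch of the stochasticity generator might look as follows. `scale` and `shift` stand in for the affine parameters fit to the V1 responses in Table C2; their fitting is omitted, and the function name is an assumption of this sketch.

```python
import torch

def v1_stochasticity(x, scale, shift):
    r = x * scale + shift                 # map activations to V1 spike-count range
    # Eq. (8) via the reparameterization trick (keeps gradients intact [83]);
    # relu guards the sqrt numerically for near-zero activations
    r = r + torch.sqrt(torch.relu(r)) * torch.randn_like(r)
    return (r - shift) / scale            # undo the scaling for downstream layers
```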

C4 Hyper-parameters and other design choices

We developed VOneNets from three different standard CNN architectures: ResNet50, CORnet-S, and AlexNet. As previously mentioned, we replaced the first block of each architecture with the VOneBlock and a transition layer to create the respective VOneNet. Table C3 lists the layers that were removed from each base architecture. Except for CORnet-S, the removed layers contained only a single convolution, nonlinearity, and maxpool. Since CORnet-S already had a set of layers pre-committed to V1, we replaced them with the VOneBlock. The torchvision implementation of AlexNet contains a combined stride of eight in the removed layers (four in the convolution and two in the maxpool), followed by a stride of one in the second convolution (outside of the removed layers). In order to more easily adapt it to a VOneBlock with a stride of four, we slightly modified AlexNet's architecture so that it had strides of two in these three layers (first convolution, first maxpool, and second convolution). The results shown in Fig. 2 were obtained using this modified architecture.

Table C3: Layers removed from each base CNN architecture when creating the VOneNets.

Architecture   Layers removed
AlexNet        Conv, ReLU, MaxPool
ResNet50       Conv, Batch-Norm, ReLU, MaxPool
CORnet-S       Conv, Batch-Norm, ReLU, MaxPool, Conv, Batch-Norm, ReLU
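As an illustration of this model surgery for ResNet50, the sketch below builds a VOneResNet50-style model from a torchvision ResNet50 by stripping the stem listed in Table C3 and prepending a placeholder VOneBlock plus a trainable transition layer. The exact form of the transition layer (here a 1×1 convolution) is an assumption; details may differ from the released code.

```python
import torch.nn as nn
import torchvision

def build_vone_resnet50(vone_block, v1_channels=512):
    backbone = torchvision.models.resnet50()
    # trainable transition: map 512 V1 channels to the 64 channels layer1 expects
    transition = nn.Conv2d(v1_channels, 64, kernel_size=1, bias=False)
    # remove the ResNet50 stem (Table C3); the VOneBlock's stride of 4 already
    # produces 56x56 maps, matching the resolution layer1 expects
    backbone.conv1 = nn.Identity()
    backbone.bn1 = nn.Identity()
    backbone.relu = nn.Identity()
    backbone.maxpool = nn.Identity()
    return nn.Sequential(vone_block, transition, backbone)
```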

The VOneNet architecture was designed to have as few hyper-parameters as possible. When possible, these were constrained either by the base CNN architecture or by neurobiological data: most parameters of the GFB were instantiated by sampling from neuronal distributions of primate V1 neurons; the kernel size was set to 25×25 px to capture the high variability in Gabor sizes; and, like all other CNN models, the spatial extent of the field of view was set to 8° (Section A). Nevertheless, there were two hyper-parameters where the choice was rather arbitrary: the number of channels in the VOneBlock, and the time-window for integrating V1 responses used to scale the activations prior to the stochasticity generator.

Since primate V1 has neurons tuned to a wide range of spatial frequencies for each orientation and at any given location of the visual space [101], we expected that a large number of channels would be required to cover all of the combinations of Gabor parameters. For the VOneResNet50 architecture, we varied the number of channels in the VOneBlock between 64 and 1024, equally split between simple and complex cells, and measured the performance of the several models after training (Fig. C2A). Clean ImageNet and common image corruption performance improved with channel number. Remarkably, the opposite happened for white box accuracy, with variants of VOneResNet50 with 64 and 128 channels achieving the highest robustness to white box attacks. This result is particularly interesting since it had been shown that increasing the number of channels in all layers of a CNN improves robustness to white box attacks in both standard and adversarially trained models [17]. Therefore, the improvement in white box robustness of VOneNets, when compared to their base models, cannot be attributed to the higher number of channels. For the main model, we set the channel number to 512, as it offered a good compromise between the different benchmarks. Regarding the time-window for integrating V1 responses, we observed small effects on the models' performance when varying it between 25 and 200 ms. We chose 25 ms for the main model due to a small trend toward higher robustness to white box attacks with shorter time-windows.

Figure C2: Exploration of VOneBlock's hyper-parameters in the VOneResNet50 architecture. A: Performance (clean, mean perturbation, white box, and corruption accuracy) of VOneResNet50 models with the number of VOneBlock channels varied between 64 and 1024 (25 ms time-window). B: Same as A, but varying the time-window for integrating neuronal activity between 25 and 200 ms (512 channels). Error bars show SD (n = 3 seeds). Red arrows represent the values used in the main model.


C5 Training details

We used PyTorch 0.4.1 and trained the models on ImageNet 2012 [44]. Images were preprocessed (1) for training, with a random crop to 224×224 pixels and random left-right flipping, and (2) for validation, with a central crop to 224×224 pixels. Preprocessing was followed by normalization: subtraction of and division by [0.5, 0.5, 0.5]. We used a batch size of 256 images and trained either on 2 GPUs (NVIDIA Titan X / GeForce 1080Ti) or 1 GPU (Quadro RTX 6000 or V100) for 70 epochs. We used step learning rate scheduling: a starting learning rate of 0.1, divided by 10 every 20 epochs. For optimization, we used Stochastic Gradient Descent with weight decay 0.0001, momentum 0.9, and a cross-entropy loss between image labels and model predictions (logits).
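This recipe maps directly onto standard PyTorch components; a minimal sketch of the optimizer and schedule (the data pipeline and training loop are omitted, and the function name is illustrative):

```python
import torch

def make_training_setup(model):
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                                momentum=0.9, weight_decay=1e-4)
    # divide the learning rate by 10 every 20 epochs, for 70 epochs total
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)
    return optimizer, scheduler
```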

Model weights and code are available at https://github.com/dicarlolab/vonenet.


D VOneBlock variant details

To investigate which components of the VOneBlock are responsible for the increased robustness, we created six variants of the VOneBlock, each removing or modifying one of its parts. In all cases, we trained from scratch a corresponding variant of VOneResNet50 with the modified VOneBlock. All other properties of the model and components of the VOneBlock were left unmodified. Here we describe each of these variants in more detail (the name of the variant refers to the component removed):

• V1 distributions: the GFB parameters were sampled from uniform distributions with the same domain as the empirical distributions used in the default VOneBlock (Table C1).

• Low SF: when sampling the spatial frequencies for the GFB, neurons with low peak spatial frequencies (f < 2 cpd) were removed from the empirical distribution [79].

• High SF: similar to the previous variant, but removing neurons with high peak spatial frequencies (f > 2 cpd).

• Complex NL: complex cell channels were removed from the VOneBlock and replaced by simple cell channels.

• Simple NL: simple cell channels were removed from the VOneBlock and replaced by complex cell channels.

• V1 stochasticity: the V1 stochasticity generator was removed from the VOneBlock.

While all of the variants resulted in worse overall mean performance than the unmodified VOneResNet50 (Table 2), some improved specific benchmarks: the variant without high SF had higher accuracy under noise and blur corruptions (Table D1); the variant without V1 stochasticity had higher accuracy on clean images and images with noise and weather corruptions (Tables 2 and D1); and the variant without V1 distributions had higher accuracy under white box attacks with high L∞ perturbations (Table D2).

Table D1: Difference between the common image corruption accuracies of the several VOneResNet50 variants and the unmodified VOneResNet50 (Δ accuracy, percentage points; mean ± SD, n = 3 seeds). Removal of simple cells reduced accuracy for all common image corruption types. Removal of some components of the VOneBlock improved accuracy for specific corruption types: removing high SF consistently improved performance for noise and blur corruptions, and removing V1 stochasticity improved performance for noise and weather corruptions.

Noise / Blur:
Component removed   Gaussian   Shot       Impulse    Defocus    Glass      Motion     Zoom
V1 dist.            -2.8±0.9   -2.4±0.7   -2.9±1.1   -1.6±0.1    0.4±0.2   -0.8±0.3   -0.2±0.5
Low SF              -1.4±0.5   -1.0±0.2   -1.5±0.5   -0.6±0.6   -1.1±0.2   -1.2±0.4   -0.5±0.7
High SF              6.8±1.3    7.3±1.1    7.0±1.3    5.6±0.3   11.5±0.3    3.7±0.0    6.1±0.1
Complex              1.0±1.6    1.2±1.3    0.7±1.8    1.2±0.2   -1.5±0.1    0.2±0.8    0.4±1.1
Simple              -6.1±1.6   -5.4±1.5   -6.4±1.9   -4.1±0.4   -4.2±0.4   -4.9±0.5   -5.4±0.3
V1 stoch.            2.2±2.5    2.2±2.5    2.1±2.9    3.3±0.1   -5.5±0.1    0.9±0.7    0.9±0.1

Weather / Digital:
Component removed   Snow       Frost      Fog        Bright     Contrast   Elastic    Pixelate   JPEG
V1 dist.            -0.2±0.6   -0.9±0.6   -1.9±1.1   -0.8±0.5   -1.4±0.9    0.3±0.4   -1.7±0.3    0.8±0.5
Low SF              -0.4±0.3   -0.6±0.4    0.9±0.3   -0.2±0.2   -0.4±0.5   -0.9±0.5   -1.3±0.2   -0.5±0.4
High SF              1.4±1.0   -0.7±0.7   -9.5±0.4   -5.2±0.4   -4.4±0.5    2.4±0.2    1.9±0.4   -4.6±0.4
Complex              0.7±0.4    0.4±0.3    4.3±0.2    1.3±0.2    2.9±0.2   -0.5±0.2   -1.2±0.0   -1.9±0.5
Simple              -5.0±0.5   -9.0±0.5   -4.4±0.5   -3.9±0.2   -4.8±0.4   -3.9±0.5   -3.5±0.4   -1.8±0.2
V1 stoch.            2.5±0.4    0.3±0.3   12.0±0.5    2.6±0.3   10.0±0.8   -2.8±0.3   -4.0±0.7   -4.0±0.4


Table D2: Difference between the white box attack accuracies of the several VOneResNet50 variants and the unmodified VOneResNet50 (Δ accuracy, percentage points; mean ± SD, n = 3 seeds). Removal of V1 stochasticity considerably reduced accuracy under all white box attacks. Removal of V1 distributions improved accuracy for the L∞ white box attack with high perturbation strength.

Component removed   PGD-L1 (40)   PGD-L1 (160)   PGD-L2 (0.15)   PGD-L2 (0.6)   PGD-L∞ (1/1020)   PGD-L∞ (1/255)
V1 dist.             -0.5±0.8       0.1±1.1       -0.8±0.2         0.6±1.0        0.3±0.3           3.0±1.8
Low SF               -0.8±1.2      -3.1±1.1       -0.6±1.0        -2.3±0.8       -0.8±1.3          -1.8±1.0
High SF              -2.2±0.9      -4.7±2.6       -2.6±1.5        -5.1±1.2       -2.6±1.4          -5.5±1.1
Complex              -1.3±0.9      -7.4±1.6       -1.1±0.9        -7.2±1.2       -1.9±1.0          -7.9±1.2
Simple               -2.5±0.8      -1.4±1.4       -2.9±1.1        -1.7±1.2       -2.7±1.2          -0.6±0.8
V1 stoch.           -23.7±0.9     -36.5±0.8      -21.6±0.5       -40.1±0.4      -32.2±0.4         -29.2±0.4



attacks [6, 7, 8, 9, 10]. Further, they exhibit a surprising failure to recognize objects in images corrupted with different noise patterns that humans have no trouble with [11, 12, 13]. This remarkable fragility to image perturbations has received much attention in the machine learning community, often from the perspective of safety in real-world deployment of computer vision systems [14, 15, 16, 17, 18, 19, 20, 21, 22]. As these perturbations generally have no perceptual alignment with the object class [23], these failures suggest that current CNNs obtained through task-optimization end up relying on visual features that are not all the same as those used by humans [24, 25]. Despite these limitations, some CNNs have achieved unparalleled success in partially explaining neural responses at multiple stages of the primate ventral stream, the set of cortical regions underlying primate visual object recognition [26, 27, 28, 29, 30, 31, 32].

How can we develop CNNs that robustly generalize like human vision? Incorporating biological constraints into CNNs to make them behave more in line with primate vision is an active field of research [33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 32]. Still, no neurobiological prior has been shown to considerably improve CNN robustness to both adversarial attacks and image corruptions in challenging real-world tasks such as ImageNet [44]. Here we build on this line of work, starting with the observation that the ability of each CNN to explain neural response patterns in primate primary visual cortex (V1) is strongly correlated with its robustness to imperceptibly small adversarial attacks. That is, the more biological a CNN's V1 is, the more adversarially robust it is.

Inspired by this, we developed VOneNets, a new class of hybrid CNNs containing a biologically-constrained neural network that simulates primate V1 as the front-end, followed by an off-the-shelf CNN back-end trained using standard methods. The V1 front-end, the VOneBlock, is based on the classical neuroscientific linear-nonlinear-Poisson (LNP) model, consisting of a fixed-weight Gabor filter bank (GFB), simple and complex cell nonlinearities, and neuronal stochasticity. The VOneBlock outperforms all standard ImageNet-trained CNNs we tested at explaining V1 responses to naturalistic textures and noise samples. After training, VOneNets retain high ImageNet performance but are substantially more robust than their corresponding base models, and compete with state-of-the-art defense methods on a conglomerate benchmark covering a variety of adversarial images and common image corruptions. Importantly, these benefits transfer across different architectures, including ResNet50 [4], AlexNet [1], and CORnet-S [32]. We dissect the VOneBlock, showing that all properties work in synergy to improve robustness and that specific aspects of the VOneBlock circuitry offer robustness to different perturbation types. Notably, we find that neuronal stochasticity plays a large role in white box adversarial robustness, but that stochasticity alone is insufficient to explain our results: neuronal stochasticity interacts supralinearly with the VOneBlock features to drive adversarial robustness. Finally, as a large percentage of this robustness remains even when we remove stochasticity only during the adversarial attack, we conclude that training with stochasticity at the VOneBlock level leads the downstream layers to learn more robust representations.

Model weights and code are available at https://github.com/dicarlolab/vonenet.

1.1 Related Work

V1 modeling. Since the functional characterization of simple and complex cell responses by Hubel and Wiesel [45], modeling V1 responses has been an area of intense research. Early approaches consisting of hand-designed Gabor filters were successful in predicting V1 simple cell [46] and complex cell [47] responses to relatively simple stimuli. These models were later improved with the addition of further nonlinear operations, such as normalization and pooling, to account for extra-classical functional properties [48, 49]. Early hierarchical models of object recognition, which incorporated these types of V1 models as their early layers, were used with some success in modeling neuronal responses in the ventral stream and object recognition behavior [50, 51]. Generalized LNP models expanded the V1 model class by allowing a set of fitted excitatory and suppressive spatial filters to be nonlinearly combined [52], and subunit models introduced two sequential linear-nonlinear (LN) stages for fitting V1 responses [53]. Recently, both task-optimized and data-fitted CNNs were shown to narrowly beat a GFB model in predicting V1 responses, further validating that multiple LN stages may be needed to capture the complexity of V1 computations [30].

Model robustness. Much work has been devoted to increasing model robustness to adversarial attacks [54, 17, 15, 55, 56, 40, 57] and, to a lesser extent, common image corruptions [13, 58, 59]. In the case of adversarial perturbations, the current state-of-the-art is adversarial training, where a network is explicitly trained to correctly classify adversarially perturbed images [17]. Adversarial training is computationally expensive [60, 61], known to impact clean performance [62], and overfits to the attack constraints it is trained on [63, 64]. Other defenses involve adding noise, either during training [59], inference [65, 19], or both [15, 66, 67]. In the case of stochasticity during inference, Athalye et al. demonstrated that fixing broken gradients or taking the expectation over randomness often dramatically reduces the effectiveness of the defense [68]. In a promising demonstration that biological constraints can increase CNN robustness, Li et al. showed that biasing a neural network's representations towards those of the mouse V1 increases the robustness of grey-scale CIFAR [69] trained neural networks to both noise and white box adversarial attacks [40].

2 Adversarial Robustness Correlates with V1 Explained Variance

The susceptibility of current CNNs to being fooled by imperceptibly small adversarial perturbations suggests that these CNNs rely on some visual features not used by the primate visual system. Are models that better explain neural responses in macaque V1 more robust to adversarial attacks? We analyzed an array of publicly available neural networks with standard ImageNet training [70], including AlexNet [1], VGG [3], ResNet [4], ResNeXt [71], DenseNet [72], SqueezeNet [73], ShuffleNet [74], and MnasNet [75], as well as several ResNet50 models with specialized training routines, such as adversarial training with L∞ (δ∞ = 4/255 and δ∞ = 8/255) and L2 (δ2 = 3) constraints [76], and adversarial noise combined with Stylized ImageNet training [59].

For each model, we evaluated how well it explained the responses of single V1 neurons evoked by given images, using a standard neural predictivity methodology based on partial least squares (PLS) regression [31, 32]. We used a neural dataset with 102 neurons and 450 different 4° images consisting of naturalistic textures and noise samples [77]. Explained variance was measured using a 10-fold cross-validation strategy. Similarly to other studies, we considered the field-of-view of all models to span 8° [31, 30] (see Supplementary Section A for more details). To evaluate the adversarial robustness of each model in the pool, we used untargeted projected gradient descent (PGD) [17], an iterative gradient-based white box adversarial attack, with L∞, L2, and L1 norm constraints of δ∞ = 1/1020, δ2 = 0.15, and δ1 = 40, respectively. For each norm, the attack strength was calibrated to drive variance in performance amongst non-adversarially trained models, resulting in perturbations well below the level of perceptibility (see Supplementary Section B.1 for more details).

We found that accuracy under these white box adversarial attacks has a strong positive correlation with V1 explained variance (Fig. 1; r=0.85, p=2.1E-9, n=30). Notably, adversarially trained ResNet50 models [76], which were hardly affected by these attacks, explained more variance in V1 neural activations than any non-adversarially trained neural network. The correlation was not driven by the CNNs' clean ImageNet performance, since it was even more pronounced when the white box accuracy was normalized by the clean accuracy (r=0.94, p=7.4E-15, n=30), and was also present when removing the adversarially trained models (r=0.73, p=1.78E-5, n=27). While increasing the perturbation strength rapidly decreases the accuracy of non-adversarially trained models, the described correlation was present for a wide range of attack strengths: when the perturbation was multiplied by a factor of 4, greatly reducing the variance of non-adversarially trained models, white box accuracy was still significantly correlated with explained variance in V1 (r=0.82, p=1.17E-8, n=30).

Figure 1: CNNs' robustness to white box attacks correlates with explained response variance in primate V1. Comparison of top-1 accuracy under white box attacks of low perturbation strengths (average of 3 PGD constraints, δ∞ = 1/1020, δ2 = 0.15, and δ1 = 40) against fraction of explained variance of V1 responses (using PLS regression) for a pool of CNN models. Perturbation strength was chosen to drive variance across model performance. White box accuracy and V1 explained variance are significantly correlated (r=0.85, p=2.1E-9, n=30 CNNs; linear fit shown as gray line). Gray circles: non-adversarially trained CNNs (n=27); red circles: adversarially trained ResNet50 models (n=3); blue circle: VOneResNet50 (not included in correlation).


3 VOneNet: a Hybrid CNN with a V1 Neural Network Front-End

Inspired by the strong correlation between V1 explained variance and robustness to white box attacks, we developed the VOneNet architecture. The major characteristic that sets the VOneNet architecture apart is its V1 front-end, the VOneBlock. While most of its layers have parameters learned during ImageNet training, VOneNet's first block is a fixed-weight, mathematically parameterized CNN model that approximates primate neural processing of images up to and including area V1. Importantly, the VOneBlock components are mapped to specific neuronal populations in V1, which can be independently manipulated or switched off completely to evaluate their functional role. Finally, the VOneBlock can be easily adapted to different CNN base architectures, as described below. Here, we build VOneNets from three base architectures: ResNet50 [4], CORnet-S [32], and AlexNet [1].

A VOneNet consists of the VOneBlock and a back-end network adapted from a base CNN (Fig. 2). When building a VOneNet from a particular CNN, we replace its first block (typically one stack of convolutional, normalization, nonlinear, and pooling layers; Supplementary Table C3) by the VOneBlock and a trained transition layer. The VOneBlock matches the replaced block's spatial map dimensions (56 × 56 for the base CNNs considered) but can have more channels (for the standard model we chose C_V1 = 512; see Supplementary Fig. C2 for an analysis of how the number of channels affects the results). It is followed by the transition layer, a 1 × 1 convolution that acts as a bottleneck to compress the higher channel number to the original block's depth. The VOneBlock is inspired by the LNP model of V1 [52], consisting of three consecutive processing stages (convolution, nonlinearity, and stochasticity generator) with two distinct neuronal types (simple and complex cells), each with a certain number of units per spatial location (standard model with 256 simple-cell and 256 complex-cell channels). The following paragraphs describe the main components of the VOneBlock (see Supplementary Section C for a more detailed description).
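
To make the block replacement concrete, below is a minimal PyTorch sketch of assembling a VOneNet from a ResNet50 base. The build_vonenet helper and its argument names are illustrative rather than the released repository's API; the 64-channel, 56 × 56 output of the replaced first block (torchvision's conv1/bn1/relu/maxpool stack) follows the description above.

```python
import torch.nn as nn
import torchvision.models as models

def build_vonenet(vone_block: nn.Module, c_v1: int = 512) -> nn.Module:
    """Sketch: fixed V1 front-end -> trained 1x1 transition -> CNN back-end."""
    resnet = models.resnet50()
    # ResNet50's replaced first block (conv1, bn1, relu, maxpool) outputs a
    # 64 x 56 x 56 map, so the 1x1 transition convolution compresses the
    # C_V1 = 512 VOneBlock channels down to the original depth of 64.
    transition = nn.Conv2d(c_v1, 64, kernel_size=1)
    back_end = nn.Sequential(
        resnet.layer1, resnet.layer2, resnet.layer3, resnet.layer4,
        resnet.avgpool, nn.Flatten(), resnet.fc,
    )
    return nn.Sequential(vone_block, transition, back_end)
```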

Biologically-constrained Gabor filter bank The first layer of the VOneBlock is a mathematically parameterized GFB, with parameters tuned to approximate empirical primate V1 neural response data. It convolves the RGB input images with Gabor filters of multiple orientations, sizes, shapes, and spatial frequencies in a 56 × 56 spatial map. To instantiate a VOneBlock, we randomly sample C_V1 values for the GFB parameters according to empirically observed distributions of preferred orientation, peak spatial frequency, and size/shape of receptive fields [78, 79, 80]. The VOneBlock keeps color processing separate, with each Gabor filter convolving a single color channel from the input image. The resulting set of spatial filters is considerably more heterogeneous than those found in the first layer of standard CNNs and better approximates the diversity of primate V1 receptive fields (Supplementary Fig. C1).
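
The sketch below illustrates how such a fixed, biologically-sampled filter bank could be instantiated. The sampling distributions here are crude stand-ins for the empirical macaque distributions cited above [78, 79, 80], and the kernel size and Gaussian envelope widths are arbitrary illustrative choices.

```python
import numpy as np
import torch

def gabor_kernel(freq_cpp, theta, sigma_x, sigma_y, phase, ksize=25):
    """2D Gabor filter; freq_cpp is spatial frequency in cycles per pixel."""
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(np.float64)
    # Rotate coordinates to the preferred orientation.
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-0.5 * ((xr / sigma_x) ** 2 + (yr / sigma_y) ** 2))
    carrier = np.cos(2 * np.pi * freq_cpp * xr + phase)
    return torch.tensor(envelope * carrier, dtype=torch.float32)

# Illustrative parameter sampling; the actual VOneBlock draws preferred
# orientation, peak SF, and receptive field size/shape from the cited
# empirical macaque V1 distributions, not the stand-ins used here.
rng = np.random.default_rng(0)
thetas = rng.uniform(0, np.pi, size=512)            # orientation stand-in
peak_sf_cpd = rng.lognormal(np.log(2.0), 0.5, 512)  # peak SF stand-in (cycles/deg)
ppd = 224 / 8.0                                     # pixels per degree (8 deg field of view)
filters = [gabor_kernel(f / ppd, th, 6.0, 6.0, 0.0)
           for f, th in zip(peak_sf_cpd, thetas)]
```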

Simple and complex cells While simple cells were once thought to be an intermediate step for computing complex cell responses, it is now known that they form the majority of downstream projections to V2 [81]. For this reason, the VOneBlock nonlinear layer has two different nonlinearities that are applied to each channel depending on its cell type: a rectified linear transformation for simple cells and the spectral power of a quadrature phase-pair for complex cells.
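
In code, the two nonlinearities reduce to a few lines. A sketch follows; whether the complex-cell readout uses the quadrature power itself or its square root is an implementation detail not pinned down by the text, so the power form of the classic energy model is shown.

```python
import torch

def simple_cell(x: torch.Tensor) -> torch.Tensor:
    # Simple cells: half-wave rectification of the Gabor output.
    return torch.relu(x)

def complex_cell(q_even: torch.Tensor, q_odd: torch.Tensor) -> torch.Tensor:
    # Complex cells: spectral power of a quadrature phase-pair, i.e. the
    # energy of two Gabors 90 degrees out of phase.
    return q_even ** 2 + q_odd ** 2
```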

V1 stochasticity A defining property of neuronal responses is their stochasticity: repeated measurements of a neuron in response to nominally identical visual inputs result in different spike trains. In awake monkeys, the mean spike count (averaged over many repetitions) depends on the presented image, and the spike train for each trial is approximately Poisson: the spike count variance is equal to the mean [82]. To approximate this property of neuronal responses, we add independent Gaussian noise to each unit of the VOneBlock, with variance equal to its activation. Before doing this, we apply an affine transformation to the units' activations so that both the mean stimulus response and the mean baseline activity are the same as those of a population of primate V1 neurons measured in a 25 ms time-window (mean stimulus response and spontaneous activity of 0.324 and 0.073 spikes, respectively; see Supplementary Table C2 and Fig. C2 for other time-windows). As in the brain, the stochasticity of the VOneBlock is always on, during both training and inference.
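
A sketch of this layer is shown below. The affine rescaling here is a simplified stand-in for the transformation fit in the paper (only the 0.324 and 0.073 spike targets are taken from the text), and the noise is sampled with the reparameterization trick so that gradients remain intact for the white box attacks described later.

```python
import torch

def v1_stochasticity(resp: torch.Tensor,
                     mean_resp: float = 0.324,
                     spont: float = 0.073) -> torch.Tensor:
    # Illustrative affine transform: scale activations to the target mean
    # stimulus response and add the baseline rate (25 ms window values).
    scaled = resp * (mean_resp / resp.mean().clamp(min=1e-6)) + spont
    # Gaussian approximation to Poisson spiking: independent noise with
    # variance equal to the activation, via the reparameterization trick.
    noise = torch.randn_like(scaled) * scaled.clamp(min=0.0).sqrt()
    return scaled + noise
```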

V1 explained variance The VOneBlock was not developed to compete with state-of-the-art data-fitting models of V1 [53, 30]. Instead, we used available empirical distributions to constrain a GFB model, generating a neuronal space that approximates that of primate V1. Despite its simplicity, the VOneBlock outperformed all tested standard ImageNet-trained CNNs in explaining responses in the V1 dataset used and, with an explained variance of 0.375±0.006, came right in the middle of the range of the adversarially trained CNNs (Fig. 1, blue circle).


On the surface, these results seem to be at odds with Cadena et al., which showed that a task-optimized CNN marginally outperformed a GFB model in explaining V1 responses [30]. However, our GFB has parameters constrained by empirical data, resulting in a better model of V1: when we used the parameters of the GFB in Cadena et al., we obtained a much lower explained variance (0.296±0.005). Our results suggest that a well-tuned GFB and adversarially trained CNNs are currently the best models at predicting primate V1 responses.

4 Results

4.1 Simulating a V1 at the front of CNNs improves robustness to white box attacks

To evaluate the VOneNets' robustness to white box adversarial attacks, we used untargeted PGD with L∞, L2, and L1 norm constraints of δ∞ ∈ [1/1020, 1/255], δ2 ∈ [0.15, 0.6], and δ1 ∈ [40, 160]. All VOneNets were attacked end-to-end, with gradients propagated through the VOneBlock to the pixels. Because the VOneBlock is stochastic, we took extra care when attacking VOneNets. We used the reparameterization trick when sampling VOneBlock stochasticity, allowing us to keep the gradients intact [83]. Further, to combat the effects of noise in the gradients, we adapted our attack such that at every PGD iteration we take 10 gradient samples and move in the average direction to increase the loss [68] (Supplementary Table B1). To verify the effectiveness of our attacks, we used controls suggested by Carlini et al. 2019 [84]; in particular, we show that increasing the magnitude of the norm of the attack monotonically decreases VOneNet accuracy, eventually reaching 0% accuracy (Supplementary Fig. B1).

Figure 2: Simulating V1 at the front of CNNs improves robustness to white box attacks. A: A VOneNet consists of a model of primate V1, the VOneBlock, followed by a standard CNN architecture. The VOneBlock contains a convolutional layer (a GFB with fixed weights constrained by empirical data), a nonlinear layer (simple or complex cell nonlinearities), and a stochastic layer (V1 stochasticity generator with variance equal to mean). B: Top, comparison of accuracy (top-1) on low strength white box attacks (PGD with δ∞ = 1/1020, δ2 = 0.15, δ1 = 40) between three base models (ResNet50, CORnet-S, and AlexNet) and their corresponding VOneNets. Gray bars show the performance of base models. Blue bars show the improvements from the VOneBlock. Dashed lines indicate the performance on clean images. Bottom, same but for white box attacks of higher strength (PGD with δ∞ = 1/255, δ2 = 0.6, δ1 = 160). VOneNets show consistently higher white box accuracy, even for perturbation strengths that reduce the performance of the base models to nearly chance. Error-bars represent SD (n=3 seeds).


Additionally, increasing the number of PGD steps from one to many strictly increases the effect of the attack under a given norm size (Supplementary Fig. B2). Both of these indicate that VOneNet gradients contain useful information for the construction of adversarial examples. For more details and controls, see Supplementary Sections B.1 and B.2.

We found that simulating a V1 front-end substantially increased the robustness to white box attacks for all three base architectures that we tested (ResNet50, CORnet-S, and AlexNet), while leaving clean ImageNet performance largely intact (Fig. 2B). This was particularly evident for the stronger perturbation attacks, which reduced the accuracy of the base models to nearly chance (Fig. 2B, bottom). This suggests that the VOneBlock works as a generic front-end which can be transferred to a variety of different neural networks as an architectural defense to increase robustness to adversarial attacks.

4.2 VOneNets outperform state-of-the-art methods on a composite set of perturbations

We then focused on the ResNet50 architecture and compared VOneResNet50 with two state-of-the-art training-based defense methods: adversarial training with a δ∞ = 4/255 constraint (AT L∞) [76] and adversarial noise with Stylized ImageNet training (ANT3×3+SIN) [59]. Because white box adversarial attacks are only part of the issue of model robustness, we considered a larger panel of image perturbations containing a variety of common corruptions. For evaluating model performance on corruptions, we used the ImageNet-C dataset [13], which consists of 15 different corruption types, each at 5 levels of severity, divided into 4 categories: noise, blur, weather, and digital (see Supplementary Fig. B5 for example images and Supplementary Section B.3 for more details).

As expected, each of the defense methods had the highest accuracy under the perturbation type that it was designed for, but did not considerably improve over the base model on the other (Table 1). While the AT L∞ model suffered substantially on corruptions and clean performance, the ANT3×3+SIN model, the current state-of-the-art for common corruptions, had virtually no benefit on white box attacks. On the other hand, VOneResNet50 improved on both perturbation types, outperforming all the models on perturbation mean (average of white box and common corruptions) and overall mean (average of clean, white box, and common corruptions), with a difference to the second best model of 3.3% and 5.3%, respectively. Specifically, VOneResNet50 showed substantial improvements for all the white box attack constraints (Supplementary Fig. B1) and more moderate improvements for common image corruptions of the categories noise, blur, and digital (Supplementary Table B2 and Fig. B6). These results are particularly remarkable since VOneResNet50 was not optimized for robustness and does not benefit from any computationally expensive training procedure like the other defense methods. When compared to the base ResNet50, which has an identical training procedure, VOneResNet50 improves by 18.0% on perturbation mean and 10.7% on overall mean (Table 1).

Table 1: VOneResNet50 outperforms other defenses on perturbation mean and overall mean. Accuracy (top-1) on overall mean, perturbation mean, white box attacks, common corruptions, and clean images, for standard ResNet50 and three defense methods: adversarial training (AT L∞), adversarial noise combined with Stylized ImageNet training (ANT3×3+SIN), and VOneResNet50. Perturbation mean is the average accuracy over white box attacks and common image corruptions. The overall mean is the average accuracy over the two perturbation types and clean ImageNet. Values are reported as mean and SD (n=3 seeds).

Model              | Overall mean [%] | Perturbation mean [%] | White box [%] | Corruption [%] | Clean [%]
Base ResNet50 [4]  | 43.6±0.0         | 27.6±0.1              | 16.4±0.1      | 38.8±0.3       | 75.6±0.1
AT L∞ [76]         | 49.0             | 42.3                  | 52.3          | 32.3           | 62.4
ANT3×3+SIN [59]    | 48.0             | 34.9                  | 17.3          | 52.6           | 74.1
VOneResNet50       | 54.3±0.1         | 45.6±0.2              | 51.1±0.4      | 40.0±0.3       | 71.7±0.1

4.3 All components of the VOneBlock contribute to improved model robustness

Since the VOneBlock was not explicitly designed to increase model robustness, but rather to approximate primate V1, we investigated which of its components are responsible for the increased robustness we observe.


We performed a series of experiments wherein six new VOneNet variants were created by removing or modifying a part of the VOneBlock (referred to as VOneBlock variants). After ImageNet training, model robustness was evaluated as before on our panel of white box attacks and image corruptions. Three variants targeted the GFB: one sampling the Gabor parameters from uniform distributions instead of those found in primate V1, another without high spatial frequency (SF) filters (f < 2 cpd), and another without low SF filters (f > 2 cpd). Two additional variants targeted the nonlinearities: one without simple cells and another without complex cells. Finally, the sixth variant had the V1 neuronal stochasticity generator removed. Even though all VOneBlock variants are poorer approximations of primate V1, all resulting VOneResNet50 variants still had improved perturbation accuracy when compared to the base ResNet50 model (Fig. 3A). On the other hand, all variants except the one with uniformly sampled Gabor parameters showed significant deficits in robustness compared to the unmodified VOneResNet50 (Fig. 3A; drops in perturbation accuracy between 1% and 15%).

Interestingly, we observed that some of these changes had a highly specific effect on the type of perturbation robustness affected (Table 2 and Supplementary Section D). Removing high SF Gabors negatively affected both white box and clean accuracy, while actually improving robustness to common image corruptions, particularly those of the noise and blur categories (Supplementary Table D1). Removing complex cells only impaired white box robustness, as opposed to removing simple cells, which was particularly detrimental to performance on image corruptions. Finally, removing V1 stochasticity considerably decreased white box accuracy while improving accuracy for both clean and corrupted images.

The VOneBlock variant without V1 stochasticity suffered the most dramatic loss in robustness. This is not altogether surprising, as several approaches to adversarial robustness have focused on noise as a defense [15, 19, 66]. To investigate whether V1 stochasticity alone accounted for the majority of the robustness gains, we evaluated the perturbation accuracy of a ResNet50 model with V1 stochasticity added at the output of its first block. Neuronal stochasticity was implemented exactly the same way as in the VOneBlock, by first applying an affine transformation to scale the activations so that they match primate V1 neuronal activity. Like the VOneResNet50, this model had stochasticity during both training and inference.

Figure 3: All components of the VOneBlock work in synergy to improve robustness to image perturbations. A: Perturbation mean accuracy (top-1) for VOneResNet50 and several variations, each with a component of the VOneBlock removed or altered. From left to right: unmodified VOneResNet50; model with Gabor parameters sampled uniformly (within biological ranges); model without high SF Gabors; model without low SF Gabors; model without complex cells; model without simple cells; model without V1 stochasticity. Gray bars show the performance of ResNet50. Blue bars show the improvements due to the VOneBlock. Dashed line indicates the accuracy of the unmodified model. Error-bars represent SD (n=3 seeds). B: Same as in A but comparing ResNet50, VOneResNet50 without V1 stochasticity, ResNet50 with V1 stochasticity added after the first block, and VOneResNet50. Adding V1 stochasticity to ResNet50 accounts for less than half of the total improvement of VOneResNet50, demonstrating a supralinear interaction between the V1 features and V1 stochasticity.


Table 2: Removal of some VOneBlock components causes impairments with high specificity. Difference in accuracy (top-1) relative to the unmodified VOneResNet50 for each of the variants with removed components, on overall mean, white box attacks, common corruptions, and clean images. Values are reported as mean and SD (n=3 seeds).

Component removed | Overall mean [Δ%] | White box [Δ%] | Corruption [Δ%] | Clean [Δ%]
V1 dist           | -0.4±0.3          | 0.4±0.8        | -1.1±0.3        | -0.5±0.3
High SF           | -2.2±0.2          | -3.8±0.3       | 1.9±0.4         | -4.7±0.2
Low SF            | -0.7±0.5          | -1.6±1.1       | -0.7±0.1        | 0.1±0.3
Complex           | -1.0±0.6          | -4.5±1.1       | 0.6±0.4         | 0.8±0.3
Simple            | -3.0±0.5          | -2.0±1.1       | -4.8±0.5        | -2.1±0.3
V1 stoch          | -8.8±0.3          | -30.6±0.4      | 1.5±0.5         | 2.6±0.1

This model showed a considerable improvement in robustness compared to the standard ResNet50. However, this improvement accounted for only a fraction of the total gains of the VOneResNet50 model (Fig. 3B), demonstrating that there is a substantial supralinear interaction between the V1 features and the neuronal stochasticity. Merely adding V1 stochasticity to the first block of a standard CNN model does not increase robustness to the same degree as the full VOneBlock: the presence of V1 features more than doubles the contribution to perturbation robustness brought by the addition of neuronal stochasticity.

Finally, we sought to determine whether stochasticity during inference is key to defending against attacks. Thus, we analyzed the white box adversarial performance of VOneResNet50 while quenching stochasticity during the adversarial attack (Fig. 4). Remarkably, the majority of improvements in adversarial robustness originate from the neuronal stochasticity during training, indicating that V1 stochasticity induces the downstream layers to learn representations that are more robust to adversarial attacks. This is particularly interesting when contrasted with the ANT3×3+SIN defense, which has noise added to input images during training but does not learn representations with notably higher robustness to white box adversarial attacks (Table 1 and Supplementary Fig. B1).

Figure 4: V1 stochasticity during training leads to more robust learned representations. White box accuracy (top-1) for the VOneResNet50 variant without V1 stochasticity, VOneResNet50 with stochasticity removed during the white box attack and inference, and VOneResNet50. VOneResNet50 without stochasticity during attack/inference maintained over half of the robustness gains. Therefore, V1 stochasticity improves model robustness in two ways: rendering the attack itself less effective and inducing more robust representations downstream of the VOneBlock, the latter accounting for the majority of the gains on our benchmark. Gray bars show the performance of the standard ResNet50. Blue bars show the improvements due to the VOneBlock. Error-bars represent SD (n=3 seeds).

5 Discussion

In this work, we demonstrate that adversarial robustness in CNNs is correlated with their ability to explain primate V1 neuronal responses, and that simulating the image processing of primate V1 at the front of standard neural network architectures significantly improves their robustness to image perturbations. Notably, this approach outperformed state-of-the-art defense methods on a large benchmark consisting of adversarial attacks and common image corruptions. Despite not being constructed to this end, any component of the model front-end we removed or modified to be less like primate V1 resulted in less overall robustness, and this revealed that different components improve robustness to different perturbation types. Remarkably, we find that simulated V1 stochasticity interacts synergistically with the V1 model features and drives the downstream layers to learn representations more robust to adversarial perturbations.



Our approach bears some similarity to the pioneering study by Li et al. [40], in which a model's representations were regularized to approximate mouse V1, increasing its robustness on the grey-scale CIFAR dataset; however, here we go further in several key ways. First, while the mouse is gaining traction as a model for studying vision, a vast literature has established macaque vision as a quantitatively accurate model of human vision in general, and human object recognition in particular [85, 86]. Visual acuity in mice is much lower than in macaques and humans [87, 79], suggesting that vision in the mouse may serve different behavioral functions than in primates. Further, the regularization approach employed by Li et al. does not allow a clear disambiguation of which aspects of mouse V1 contribute to the improved robustness. Since the components of the VOneBlock proposed here are mappable to the brain, we can dissect the contributions of different neuronal populations in primate V1 to robustness against specific image perturbations. Finally, extending the robustness gains of biologically-constrained CNNs from gray-scale CIFAR to the full ImageNet dataset is a critical step towards real-world human-level applications.

The gains achieved by VOneNets are substantial, particularly against white box attacks, and have tangible benefits over other defense methods. Though adversarial training still provides the strongest defense for the attack statistics it is trained on, it has significant downsides. Beyond its considerable additional computational cost during training, adversarially trained networks have significantly lower performance on clean images, corrupted images, and images perturbed with attack statistics not seen during training, implying that adversarial training in its current form may not be viable as a general defense method. In contrast, by deploying an architectural change, VOneNets improve robustness to all adversarial attacks tested and many common image corruptions, and they accomplish this with no additional training overhead. This also suggests that the architectural gains of the VOneNet could be stacked with other training-based defense methods to achieve even greater overall robustness gains.

Relative to current methods, the success of this approach derives from engineering in a better approximation of the architecture of the most well-studied primate visual area, combined with task optimization of the remaining free parameters of the downstream architecture [26]. This points to two potentially synergistic avenues for further gains: an even more neurobiologically precise model of V1 (i.e., a better VOneBlock) and an even more neurobiologically precise model of the downstream architecture. For example, one could extend the biological fidelity of the VOneBlock, in the hope that it confers even greater robustness, by including properties such as divisive normalization [48, 49] and contextual modulation [88], to name a few. In addition to V1, the retina and the lateral geniculate nucleus (LGN) also play important roles in pre-processing visual information, only partially captured by the current V1 model, suggesting that extending the work done here to a retina/LGN front-end has the potential to better align CNNs with human visual object recognition [89, 90]. In addition, though our initial experiments show that multiple components of our V1 model work together to produce greater robustness, the nature of the relationship between adversarial robustness and explained variance in V1 responses is far from resolved. In particular, as the VOneNets with and without stochasticity achieve very similar V1 explained variances but have markedly different levels of adversarial robustness, it is clear that much theoretical work remains in better understanding both when and why matching biology leads to more robust computer vision models.

While neuroscience has recently seen a huge influx of new neurally-mechanistic models and tools drawn from machine learning [91, 92], the most recent advances in machine learning and computer vision have been driven mostly by the widespread availability of computational resources and very large labeled datasets [93], rather than by an understanding of the relevant brain mechanisms. Under the belief that biological intelligence still has a lot of untapped potential to contribute, a number of researchers have been pushing for more neuroscience-inspired machine learning algorithms [37, 42, 32]. The work presented here shows that this aspiration can become reality: the models presented here, drawn directly from primate neurobiology, indeed require less training to achieve more human-like behavior. This is one turn of a new virtuous circle, wherein neuroscience and artificial intelligence each feed into and reinforce the understanding and ability of the other.


Broader Impact

From a technological perspective, the ethical implications of our work are largely aligned with those of computer vision in general. While there is undoubtedly potential for malicious and abusive uses of computer vision, particularly in the form of discrimination or invasion of privacy, we believe that our work will aid in the production of more robust and intuitive behavior of computer vision algorithms. As CNNs are deployed in real-world situations, it is critical that they behave with the same level of stability as their human counterparts. In particular, they should, at the very least, not be confused by changes in input statistics that do not confuse humans. We believe that this work will help to bridge that gap. Furthermore, while algorithms are often thought to be impartial or unbiased, much research has shown that data-driven models like current CNNs are often even more biased than humans, implicitly keying in on and amplifying stereotypes. For this reason, making new CNNs that behave more like humans may actually reduce, or at least make more intuitive, their implicit biases. Unfortunately, we note that even with our work these issues are not yet resolved. While we developed a more neurobiologically-constrained algorithm, it comes nowhere close to human-like behaviour in the wide range of circumstances experienced in the real world. Finally, from the perspective of neuroscience, we think that this work introduces a more accurate model of the primate visual system. Ultimately, better models contribute to a better mechanistic understanding of how the brain works and how to intervene in the case of illness or disease states. We think that our model contributes a stronger foundation for understanding the brain and building novel medical technology, such as neural implants for restoring vision in people with impairments.

Acknowledgments and Disclosure of Funding

We thank J. Anthony Movshon and Corey M. Ziemba for access to the V1 neural dataset, MIT-IBM Satori for providing the compute necessary for this project, John Cohn for technical support, Sijia Liu and Wieland Brendel for advice on adversarial attacks, and Adam Marblestone for insightful discussion. This work was supported by the PhRMA Foundation Postdoctoral Fellowship in Informatics (TM), the Semiconductor Research Corporation (SRC) and DARPA (JD, MS, JJD), the Massachusetts Institute of Technology Shoemaker Fellowship (MS), Office of Naval Research grant MURI-114407 (JJD), the Simons Foundation grant SCGB-542965 (JJD), and the MIT-IBM Watson AI Lab grant W1771646 (JJD).

References

[1] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet Classification with Deep Convolutional Neural Networks". In: Advances in Neural Information Processing Systems 25. Ed. by F. Pereira et al. Curran Associates, Inc., 2012, pp. 1097–1105.
[2] Christian Szegedy et al. "Going Deeper with Convolutions". In: (Sept. 2014). arXiv: 1409.4842 [cs.CV].
[3] Karen Simonyan and Andrew Zisserman. "Very Deep Convolutional Networks for Large-Scale Image Recognition". In: (Sept. 2014). arXiv: 1409.1556 [cs.CV].
[4] Kaiming He et al. "Deep Residual Learning for Image Recognition". In: (Dec. 2015). arXiv: 1512.03385 [cs.CV].
[5] Kaiming He et al. "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification". In: Proceedings of the IEEE International Conference on Computer Vision (2015), pp. 1026–1034. DOI: 10.1109/ICCV.2015.123. arXiv: 1502.01852.
[6] Christian Szegedy et al. "Intriguing properties of neural networks". In: (Dec. 2013). arXiv: 1312.6199 [cs.CV].
[7] Nicholas Carlini and David Wagner. "Towards Evaluating the Robustness of Neural Networks". In: (Aug. 2016). arXiv: 1608.04644 [cs.CR].
[8] Pin-Yu Chen et al. "EAD: Elastic-Net Attacks to Deep Neural Networks via Adversarial Examples". In: (Sept. 2017). arXiv: 1709.04114 [stat.ML].
[9] Jérôme Rony et al. "Decoupling Direction and Norm for Efficient Gradient-Based L2 Adversarial Attacks and Defenses". In: (Nov. 2018). arXiv: 1811.09600 [cs.CV].


[10] Wieland Brendel et al. "Accurate, reliable and fast robustness evaluation". In: (July 2019). arXiv: 1907.01003 [stat.ML].
[11] Samuel Dodge and Lina Karam. "A Study and Comparison of Human and Deep Learning Recognition Performance Under Visual Distortions". In: (May 2017). arXiv: 1705.02498 [cs.CV].
[12] Robert Geirhos et al. "Generalisation in humans and deep neural networks". In: (Aug. 2018). arXiv: 1808.08750 [cs.CV].
[13] Dan Hendrycks and Thomas Dietterich. "Benchmarking Neural Network Robustness to Common Corruptions and Perturbations". In: (Mar. 2019). arXiv: 1903.12261 [cs.LG].
[14] Nilaksh Das et al. "Keeping the Bad Guys Out: Protecting and Vaccinating Deep Learning with JPEG Compression". In: (May 2017). arXiv: 1705.02900 [cs.CV].
[15] Xuanqing Liu et al. "Towards Robust Neural Networks via Random Self-ensemble". In: (Dec. 2017). arXiv: 1712.00673 [cs.LG].
[16] Weilin Xu, David Evans, and Yanjun Qi. "Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks". In: arXiv [cs.CV] (Apr. 2017).
[17] Aleksander Madry et al. "Towards Deep Learning Models Resistant to Adversarial Attacks". In: (June 2017). arXiv: 1706.06083 [stat.ML].
[18] Yang Song et al. "PixelDefend: Leveraging Generative Models to Understand and Defend against Adversarial Examples". In: (Oct. 2017). arXiv: 1710.10766 [cs.LG].
[19] Guneet S. Dhillon et al. "Stochastic Activation Pruning for Robust Adversarial Defense". In: (Mar. 2018). arXiv: 1803.01442 [cs.LG].
[20] Jacob Buckman et al. "Thermometer Encoding: One Hot Way To Resist Adversarial Examples". In: (Feb. 2018). URL: https://openreview.net/forum?id=S18Su--CW.
[21] Chuan Guo et al. "Countering Adversarial Images using Input Transformations". In: (Feb. 2018). URL: https://openreview.net/forum?id=SyJ7ClWCb.
[22] Claudio Michaelis et al. "Benchmarking Robustness in Object Detection: Autonomous Driving when Winter is Coming". In: (2019), pp. 1–23. arXiv: 1907.07484.
[23] Andrew Ilyas et al. "Adversarial Examples Are Not Bugs, They Are Features". In: (May 2019). arXiv: 1905.02175 [stat.ML].
[24] D. Linsley et al. "What are the Visual Features Underlying Human Versus Machine Vision?" In: ICCV (2017). DOI: 10.1109/ICCVW.2017.331. arXiv: 1701.02704.
[25] Robert Geirhos et al. "ImageNet-Trained CNNs Are Biased Towards Texture". In: International Conference on Learning Representations (2019), pp. 1–20. arXiv: 1811.12231.
[26] Daniel L. K. Yamins et al. "Performance-optimized hierarchical models predict neural responses in higher visual cortex". In: Proceedings of the National Academy of Sciences 111.23 (2014), pp. 8619–8624. DOI: 10.1073/pnas.1403112111.
[27] Charles F. Cadieu et al. "Deep neural networks rival the representation of primate IT cortex for core visual object recognition". In: PLoS Computational Biology 10.12 (2014), e1003963.
[28] Seyed-Mahdi Khaligh-Razavi and Nikolaus Kriegeskorte. "Deep supervised, but not unsupervised, models may explain IT cortical representation". In: PLoS Computational Biology 10.11 (Nov. 2014), e1003915.
[29] Umut Güçlü and Marcel A. J. van Gerven. "Deep Neural Networks Reveal a Gradient in the Complexity of Neural Representations across the Brain's Ventral Visual Pathway". In: The Journal of Neuroscience 35.27 (2015), pp. 10005–10014. DOI: 10.1523/JNEUROSCI.5023-14.2015. arXiv: 1411.6422.
[30] Santiago A. Cadena et al. "Deep convolutional models improve predictions of macaque V1 responses to natural images". In: PLoS Computational Biology 15.4 (Apr. 2019), e1006897.
[31] Martin Schrimpf et al. "Brain-Score: Which Artificial Neural Network for Object Recognition is most Brain-Like?" Sept. 2018. URL: https://doi.org/10.1101/407007.


[32] Jonas Kubilius et al. "Brain-Like Object Recognition with High-Performing Shallow Recurrent ANNs". In: Advances in Neural Information Processing Systems 32. Ed. by H. Wallach et al. Curran Associates, Inc., 2019, pp. 12805–12816.
[33] Adam H. Marblestone, Greg Wayne, and Konrad P. Kording. "Toward an Integration of Deep Learning and Neuroscience". In: Frontiers in Computational Neuroscience 10 (Sept. 2016), p. 94.
[34] William Lotter, Gabriel Kreiman, and David Cox. "Deep Predictive Coding Networks for Video Prediction and Unsupervised Learning". In: (May 2016). arXiv: 1605.08104 [cs.LG].
[35] Aran Nayebi and Surya Ganguli. "Biologically inspired protection of deep networks from adversarial attacks". In: (Mar. 2017). arXiv: 1703.09202 [stat.ML].
[36] Jordan Guerguiev, Timothy P. Lillicrap, and Blake A. Richards. "Towards deep learning with segregated dendrites". In: eLife 6 (Dec. 2017).
[37] Demis Hassabis et al. "Neuroscience-Inspired Artificial Intelligence". In: Neuron 95.2 (2017), pp. 245–258. DOI: 10.1016/j.neuron.2017.06.011.
[38] Grace W. Lindsay and Kenneth D. Miller. "How biological attention mechanisms improve task performance in a large-scale visual system model". In: eLife 7 (Oct. 2018).
[39] Hanlin Tang et al. "Recurrent computations for visual pattern completion". In: Proceedings of the National Academy of Sciences 115.35 (2018), pp. 8835–8840. DOI: 10.1073/pnas.1719397115.
[40] Zhe Li et al. "Learning From Brains How to Regularize Machines". In: (Nov. 2019). arXiv: 1911.05072 [cs.LG].
[41] Hosein Hasani, Mahdieh Soleymani, and Hamid Aghajan. "Surround Modulation: A Bio-inspired Connectivity Structure for Convolutional Neural Networks". In: NeurIPS (2019), pp. 15877–15888.
[42] Fabian H. Sinz et al. "Engineering a Less Artificial Intelligence". In: Neuron 103.6 (Sept. 2019), pp. 967–979.
[43] Anthony M. Zador. "A critique of pure learning and what artificial neural networks can learn from animal brains". In: Nature Communications 10.1 (Aug. 2019), p. 3770.
[44] J. Deng et al. "ImageNet: A large-scale hierarchical image database". In: 2009 IEEE Conference on Computer Vision and Pattern Recognition. June 2009, pp. 248–255.
[45] D. H. Hubel and T. N. Wiesel. "Receptive Fields, Binocular Interaction and Functional Architecture in the Cat's Visual Cortex". In: Journal of Physiology (1962), pp. 106–154.
[46] J. P. Jones and L. A. Palmer. "An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex". In: Journal of Neurophysiology 58.6 (1987), pp. 1233–1258. DOI: 10.1152/jn.1987.58.6.1233.
[47] E. H. Adelson and J. R. Bergen. "Spatiotemporal energy models for the perception of motion". In: Journal of the Optical Society of America A 2.2 (1985), pp. 284–299. DOI: 10.1364/JOSAA.2.000284.
[48] David J. Heeger, E. P. Simoncelli, and J. Anthony Movshon. "Computational models of cortical visual processing". In: Proceedings of the National Academy of Sciences 93.2 (1996), pp. 623–627. DOI: 10.1073/pnas.93.2.623.
[49] M. Carandini, D. J. Heeger, and J. A. Movshon. "Linearity and normalization in simple cells of the macaque primary visual cortex". In: The Journal of Neuroscience 17.21 (Nov. 1997), pp. 8621–8644.
[50] Maximilian Riesenhuber and Tomaso Poggio. "Hierarchical models of object recognition in cortex". In: Nature Neuroscience 2.11 (1999), pp. 1019–1025. DOI: 10.1038/14819.


[51] Thomas Serre, Aude Oliva, and Tomaso Poggio. "A feedforward architecture accounts for rapid categorization". In: Proceedings of the National Academy of Sciences 104.15 (Apr. 2007), pp. 6424–6429.
[52] Nicole C. Rust et al. "Spatiotemporal Elements of Macaque V1 Receptive Fields". In: Neuron 46 (2005), pp. 945–956. DOI: 10.1016/j.neuron.2005.05.021.
[53] Brett Vintch, J. A. Movshon, and E. P. Simoncelli. "A Convolutional Subunit Model for Neuronal Responses in Macaque V1". In: Journal of Neuroscience 35.44 (2015), pp. 14829–14841. DOI: 10.1523/JNEUROSCI.2815-13.2015.
[54] Nicolas Papernot et al. "Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks". In: (Nov. 2015). arXiv: 1511.04508 [cs.CR].
[55] Mayank Singh et al. "Harnessing the Vulnerability of Latent Layers in Adversarially Trained Models". In: (May 2019). arXiv: 1905.05186 [cs.LG].
[56] Cihang Xie et al. "Feature denoising for improving adversarial robustness". In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019, pp. 501–509.
[57] Eric Wong, Leslie Rice, and J. Zico Kolter. "Fast is better than free: Revisiting adversarial training". In: (Jan. 2020). URL: https://openreview.net/forum?id=BJx040EFvH.
[58] Dan Hendrycks et al. "AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty". In: (Dec. 2019). arXiv: 1912.02781 [stat.ML].
[59] Evgenia Rusak et al. "Increasing the robustness of DNNs against image corruptions by playing the Game of Noise". In: (Jan. 2020). arXiv: 2001.06057 [cs.CV].
[60] Ali Shafahi et al. "Adversarial training for free!" In: Advances in Neural Information Processing Systems 32. Ed. by H. Wallach et al. Curran Associates, Inc., 2019, pp. 3358–3369.
[61] Eric Wong, Leslie Rice, and J. Zico Kolter. "Fast is better than free: Revisiting adversarial training". In: (Sept. 2020). URL: https://openreview.net/forum?id=BJx040EFvH.
[62] Dimitris Tsipras et al. "Robustness May Be at Odds with Accuracy". Sept. 2019. URL: https://openreview.net/forum?id=SyxAb30cY7.
[63] Yash Sharma and Pin-Yu Chen. "Attacking the Madry Defense Model with L1-based Adversarial Examples". In: (Oct. 2017). arXiv: 1710.10733 [stat.ML].
[64] Lukas Schott et al. "Towards the first adversarially robust neural network model on MNIST". In: (May 2018). arXiv: 1805.09190 [cs.CV].
[65] Cihang Xie et al. "Mitigating Adversarial Effects Through Randomization". Feb. 2018. URL: https://openreview.net/forum?id=Sk9yuql0Z.
[66] Adnan Siraj Rakin, Zhezhi He, and Deliang Fan. "Parametric Noise Injection: Trainable Randomness to Improve Deep Neural Network Robustness against Adversarial Attack". In: (Nov. 2018). arXiv: 1811.09310 [cs.LG].
[67] Joonho Lee et al. "ProbAct: A Probabilistic Activation Function for Deep Neural Networks". In: (May 2019). arXiv: 1905.10761 [cs.LG].
[68] Anish Athalye, Nicholas Carlini, and David Wagner. "Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples". In: (Feb. 2018). arXiv: 1802.00420 [cs.LG].
[69] Alex Krizhevsky. "Learning Multiple Layers of Features from Tiny Images". In: (Apr. 2009).
[70] Adam Paszke et al. "Automatic differentiation in PyTorch". In: (Oct. 2017). URL: https://openreview.net/forum?id=BJJsrmfCZ.
[71] Saining Xie et al. "Aggregated Residual Transformations for Deep Neural Networks". In: (Nov. 2016). arXiv: 1611.05431 [cs.CV].
[72] Gao Huang et al. "Densely Connected Convolutional Networks". In: (Aug. 2016). arXiv: 1608.06993 [cs.CV].
[73] Forrest N. Iandola et al. "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size". In: (Feb. 2016). arXiv: 1602.07360 [cs.CV].
[74] Ningning Ma et al. "ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design". In: (July 2018). arXiv: 1807.11164 [cs.CV].
[75] Mingxing Tan et al. "MnasNet: Platform-Aware Neural Architecture Search for Mobile". In: (July 2018). arXiv: 1807.11626 [cs.CV].


[76] Logan Engstrom et al. Robustness (Python Library). 2019. URL: https://github.com/MadryLab/robustness.
[77] Jeremy Freeman et al. "A functional and perceptual signature of the second visual area in primates". In: Nature Neuroscience 16.7 (July 2013), pp. 974–981.
[78] Russell L. De Valois, E. W. Yund, and Norva Hepler. "The orientation and direction selectivity of cells in macaque visual cortex". In: Vision Research 22 (1982), pp. 531–544.
[79] Russell L. De Valois, Duane G. Albrecht, and Lisa G. Thorell. "Spatial Frequency Selectivity of Cells in Macaque Visual Cortex". In: Vision Research 22 (1982), pp. 545–559.
[80] Dario L. Ringach. "Spatial Structure and Symmetry of Simple-Cell Receptive Fields in Macaque Primary Visual Cortex". In: Journal of Neurophysiology 88 (2002), pp. 455–463.
[81] Yasmine El-Shamayleh et al. "Visual response properties of V1 neurons projecting to V2 in macaque". In: Journal of Neuroscience 33.42 (2013), pp. 16594–16605. DOI: 10.1523/JNEUROSCI.2753-13.2013.
[82] W. R. Softky and C. Koch. "The highly irregular firing of cortical cells is inconsistent with temporal integration of random EPSPs". In: Journal of Neuroscience 13.1 (1993), pp. 334–350. DOI: 10.1523/jneurosci.13-01-00334.1993.
[83] Diederik P. Kingma and Max Welling. "Auto-Encoding Variational Bayes". In: (Dec. 2013). arXiv: 1312.6114v10 [stat.ML].
[84] Nicholas Carlini et al. "On Evaluating Adversarial Robustness". In: (Feb. 2019). arXiv: 1902.06705v2 [cs.LG].
[85] Nikolaus Kriegeskorte, Marieke Mur, and Peter Bandettini. "Representational similarity analysis - connecting the branches of systems neuroscience". In: Frontiers in Systems Neuroscience 2 (Nov. 2008), p. 4.
[86] R. Rajalingham, K. Schmidt, and James J. DiCarlo. "Comparison of Object Recognition Behavior in Human and Monkey". In: Journal of Neuroscience 35.35 (2015), pp. 12127–12136. DOI: 10.1523/JNEUROSCI.0573-15.2015.
[87] Cristopher M. Niell and Michael P. Stryker. "Highly selective receptive fields in mouse visual cortex". In: The Journal of Neuroscience 28.30 (July 2008), pp. 7520–7536. DOI: 10.1523/JNEUROSCI.0623-08.2008.
[88] Pieter R. Roelfsema. "Cortical algorithms for perceptual grouping". In: Annual Review of Neuroscience 29 (Jan. 2006), pp. 203–227. DOI: 10.1146/annurev.neuro.29.051605.112939.
[89] Alexander Berardino et al. "Eigen-Distortions of Hierarchical Representations". In: (Oct. 2017). arXiv: 1710.02266 [cs.CV].
[90] Jack Lindsey et al. "A Unified Theory of Early Visual Representations from Retina to Cortex through Anatomically Constrained Deep CNNs". In: ICLR (2019), pp. 1–17. arXiv: 1901.00945.
[91] Daniel L. K. Yamins and James J. DiCarlo. "Using goal-driven deep learning models to understand sensory cortex". In: Nature Neuroscience 19.3 (Mar. 2016), pp. 356–365.
[92] Alexander Mathis et al. "DeepLabCut: markerless pose estimation of user-defined body parts with deep learning". In: Nature Neuroscience (2018). DOI: 10.1038/s41593-018-0209-y.
[93] Yann LeCun, Yoshua Bengio, and Geoffrey E. Hinton. "Deep learning". In: Nature 521.7553 (2015), pp. 436–444. DOI: 10.1038/nature14539.
[94] T. Marques, M. Schrimpf, and J. J. DiCarlo. "Hierarchical neural network models that more closely match primary visual cortex tend to better explain higher level visual cortical responses". In: Cosyne, 2020.
[95] Alexey Kurakin, Ian Goodfellow, and Samy Bengio. "Adversarial examples in the physical world". In: (July 2016). arXiv: 1607.02533 [cs.CV].
[96] Maria-Irina Nicolae et al. "Adversarial Robustness Toolbox v1.2.0". In: CoRR 1807.01069 (2018). URL: https://arxiv.org/pdf/1807.01069.


[97] Florian Tramer et al. "On Adaptive Attacks to Adversarial Example Defenses". In: (Feb. 2020). arXiv: 2002.08347 [cs.LG].
[98] Jonas Rauber et al. "Foolbox Native: Fast adversarial attacks to benchmark the robustness of machine learning models in PyTorch, TensorFlow, and JAX". In: Journal of Open Source Software 5.53 (2020), p. 2607. DOI: 10.21105/joss.02607.
[99] Matthew D. Zeiler and Rob Fergus. "Visualizing and Understanding Convolutional Networks". In: (Nov. 2013). arXiv: 1311.2901.
[100] Chris Olah et al. "An Overview of Early Vision in InceptionV1". In: Distill (2020). https://distill.pub/2020/circuits/early-vision. DOI: 10.23915/distill.00024.002.
[101] Ian Nauhaus et al. "Orthogonal micro-organization of orientation and spatial frequency in primate primary visual cortex". In: Nature Neuroscience 15.12 (2012), pp. 1683–1690. DOI: 10.1038/nn.3255.


Supplementary Material

A Neural Explained Variance

We evaluated how well responses to given images in candidate CNNs explain responses of single V1 neurons to the same images, using a standard neural predictivity methodology based on partial least squares (PLS) regression [31, 32]. We used a neural dataset obtained by extracellular recordings from 102 single-units while presenting 450 different images for 100 ms, 20 times each, spanning 4° of the visual space. Results from this dataset were originally published in a study comparing V1 and V2 responses to naturalistic textures [77]. To avoid over-fitting, for each CNN we first chose its best V1 layer on a subset of the dataset (n=135 images) and reported the final neural explained variance calculated for the chosen layer on the remaining images of the dataset (n=315 images). Model units were mapped to each V1 neuron linearly using a PLS regression model with 25 components. The mapping was performed for each neuron using 90% of the image responses in the respective dataset and tested on the remaining held-out 10%, in a 10-fold cross-validation strategy. We reported neural explained variance values normalized by the neuronal internal consistency. We calculated V1 neural explained variance for a large pool of CNNs (n=30 models).
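
The following scikit-learn sketch illustrates the shape of this mapping procedure; variable names and the exact fold bookkeeping are illustrative, and the normalization by neuronal internal consistency is omitted.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import KFold

def neural_predictivity(features: np.ndarray, responses: np.ndarray) -> np.ndarray:
    """features: model activations (images x units); responses: trial-averaged
    spike counts (images x neurons). Returns per-neuron explained variance."""
    n_neurons = responses.shape[1]
    ev = np.zeros(n_neurons)
    folds = KFold(n_splits=10, shuffle=True, random_state=0)
    for train, test in folds.split(features):
        pls = PLSRegression(n_components=25).fit(features[train], responses[train])
        pred = pls.predict(features[test])
        for n in range(n_neurons):
            r = np.corrcoef(pred[:, n], responses[test, n])[0, 1]
            ev[n] += r ** 2 / folds.get_n_splits()  # average over the 10 folds
    return ev  # divide by internal consistency to get normalized values
```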

When using CNNs in a computer vision task such as object recognition in ImageNet, their field-of-view (in degrees) is not defined, since the only relevant input property is the image resolution (in pixels). However, when using a CNN to model the ventral stream, the visual spatial extent of the model's input is of key importance to ensure that it is correctly mapped to the data it is trying to explain. We assigned a spatial extent of 8° to all models, since this value has been previously used in studies benchmarking CNNs as models of the primate ventral stream [31, 94] and is consistent with the results in Cadena et al. [30].
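
As a worked example of what this assignment implies for the front-end filters, the degrees-to-pixels conversion is sketched below; the 224 × 224 input resolution is the standard ImageNet size and is an assumption here, not stated in this paragraph.

```python
# 8 deg field of view mapped onto a 224 x 224 input (224 px is an assumption).
field_of_view_deg = 8.0
image_size_px = 224
ppd = image_size_px / field_of_view_deg    # 28 pixels per degree

# A Gabor with a 2 cycles/deg peak SF (the high/low SF boundary used in
# Section 4.3) then has a period of 14 pixels on the input image.
freq_cpp = 2.0 / ppd                        # cycles per pixel
period_px = 1.0 / freq_cpp                  # 14.0 px per cycle
print(f"{ppd:.0f} px/deg; 2 cpd = {freq_cpp:.3f} cycles/px, period {period_px:.0f} px")
```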


B Image Perturbations

B.1 White Box Adversarial Attacks

For performing white box adversarial attacks, we used untargeted projected gradient descent (PGD) [17] (also referred to as the Basic Iterative Method [95]) with L∞, L2, and L1 norm constraints. Given an image x, this method uses the gradient of the loss to iteratively construct an adversarial image x_adv which maximizes the model loss within an Lp bound around x. Formally, in the case of an L∞ constraint, PGD iteratively computes x_adv as

$$x^{t+1} = \mathrm{Proj}_{x+\mathcal{S}}\left(x^{t} + \alpha\,\mathrm{sgn}\left(\nabla_{x^{t}} L(\theta, x^{t}, y)\right)\right)$$

where x is the original image and the Proj operator ensures the final computed adversarial image x_adv is constrained to the space x + S, here the L∞ ball around x. In the case of L1 or L2 constraints, at each iteration $\nabla_{x^{t}} L(\theta, x^{t}, y)$ is scaled to have an L1 or L2 norm of α, and the Proj operator ensures the final x_adv is within an L1 or L2 norm ball around x.

For our white box benchmarks, we used δ∞ ∈ [1/1020, 1/255], δ2 ∈ [0.15, 0.6], and δ1 ∈ [40, 160] constraints, where δ = x − x_adv, for a total of six different attack variants, with two strengths under each norm constraint. We arrived at these strengths by calibrating the lower strength attack for each norm to give a reasonable degree of variance in normally trained CNNs. Because normally trained CNNs are extremely sensitive to adversarial perturbations, this resulted in perturbations well below the level of perceptibility and, in most cases, particularly with the L∞ constraint, perturbations that would not even be rendered differently by a monitor for display. We set the high attack strength at 4 times the low value, which brought standard ImageNet trained CNNs to nearly chance performance. Still, even this higher perturbation strength remained practically imperceptible (Fig. B3, left). Each perturbation was computed with 64 PGD iterations and a step size α = ε/32, where ε = δ_p, for the same 5000 images from the ImageNet validation set on a per-model basis, and final top-1 accuracy was reported. The Adversarial Robustness Toolbox [96] was used for computing the attacks.
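
A minimal PyTorch sketch of the L∞ variant, with the 64 iterations and α = ε/32 step size described above (the [0, 1] clamp assumes unnormalized image tensors; for L1/L2 variants the sign step would be replaced by the gradient rescaling described above):

```python
import torch

def pgd_linf(model, x, y, eps, steps=64):
    """Untargeted PGD under an L-infinity constraint of size eps."""
    loss_fn = torch.nn.CrossEntropyLoss()
    alpha = eps / 32                                  # step size from the text
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = loss_fn(model(x_adv), y)
        (grad,) = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()       # ascend the loss
            x_adv = x + (x_adv - x).clamp(-eps, eps)  # project onto the L-inf ball
            x_adv = x_adv.clamp(0.0, 1.0)             # stay in valid pixel range
    return x_adv.detach()
```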

While adversarial formulations using Lagrangian relaxation, like the C&W attack [7] and the EAD attack [8], are known to be more effective than PGD for finding minimal perturbations under L2 and L1 constraints, these formulations involve computationally expensive line searches, often requiring thousands of iterations to converge, making them computationally difficult to scale to 5,000 ImageNet samples for a large number of models. Furthermore, because some of the models we tested operate with stochasticity during inference, we found that other state-of-the-art attacks formulated to efficiently find minimal perturbation distances [9, 10] were generally difficult to tune, as their success becomes stochastic as they near the decision boundary. Thus, we proceeded with the PGD formulation, as we found it to be the most reliable, computationally tractable, and conceptually simple to follow. We share our model and weights, and encourage anyone interested to attack our model to verify our results.

Although the stochastic models were implemented with the reparameterization trick for random sampling, which leaves the gradients intact [83], we performed several sanity checks to make sure that the white box attacks were effectively implemented. Following the advice of Athalye et al. [68] and Carlini et al. [84], we observed that in all of the white box attacks, increasing the attack strength decreased model performance, eventually bringing accuracy to zero (Fig. B1), and additionally, that for any given constraint and attack strength, increasing PGD iterations from one to many increased attack effectiveness (Fig. B2). Furthermore, to be extra cautious, we followed the method Athalye et al. [68] used to break Stochastic Activation Pruning [19] and replaced ∇_x f(x) with

$$\sum_{i=1}^{k} \nabla_x f(x)$$

effectively performing Monte Carlo (MC) sampling of the loss gradient at every PGD iteration. While with high perturbation strength even k = 1 was sufficient to bring our model's accuracy to zero, we found that k = 10 generally achieved higher attack success at intermediate and large perturbation strengths (Table B1), and thus we used this approach for all attacks on stochastic models.
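A hedged sketch of this Monte Carlo gradient sampling is shown below: at each PGD iteration the loss gradient is averaged over k independent forward passes, so that a fresh noise sample is drawn each time (k = 10 in the attacks on stochastic models). `model`, the current adversarial image `x_adv`, and labels `y` are assumed to exist.

```python
import torch
import torch.nn.functional as F

def mc_gradient(model, x_adv, y, k=10):
    grads = []
    for _ in range(k):
        x_in = x_adv.clone().detach().requires_grad_(True)
        # Each forward pass re-samples the stochastic layer
        loss = F.cross_entropy(model(x_in), y)
        grads.append(torch.autograd.grad(loss, x_in)[0])
    # Average direction over the k gradient samples
    return torch.stack(grads).mean(dim=0)
```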


Figure B1: VOneResNet50 improves robustness to white box attacks in a wide range of perturbation strengths. White box accuracy vs. perturbation strength curves for PGD attacks with L∞, L2, and L1 constraints, for ResNet50, VOneResNet50, ATL∞, and ANT3×3+SIN. Adding the VOneBlock to ResNet50 consistently improves robustness under all constraints and attack strengths: VOneResNet50 can withstand perturbations roughly 4 times higher than the standard ResNet50 for the same performance level. Remarkably, VOneResNet50 outperformed the adversarially trained model (ATL∞) for a considerable range of perturbation strengths, particularly with the L1 constraint. Adversarial noise training combined with Stylized ImageNet (ANT3×3+SIN) shows virtually no improvement in robustness to white box attacks. In all white box attacks and for all models, increasing the attack strength decreased performance, eventually bringing accuracy to zero.

Figure B2: White box attack effectiveness converges for a large number of PGD iteration steps. White box accuracy vs. iteration curves for PGD attacks with δ∞ = 1/255, δ2 = 0.6, and δ1 = 160 constraints, for ResNet50, VOneResNet50, ATL∞, and ANT3×3+SIN. Increasing the number of PGD iteration steps makes the attack more effective, generally converging after 32 iterations.

Table B1: Improving the effectiveness of white box attacks with Monte Carlo sampling. White box attack accuracy (top-1) under different norm constraints, and white box overall mean, using standard PGD and PGD with Monte Carlo sampling of the loss gradients (PGD-MC, k = 10) on VOneResNet50, with 64 PGD iteration steps. For all stochastic models, we used PGD with Monte Carlo sampling in the white box attacks throughout the paper. Values are mean and SD (n=3 seeds).

                 Mean       PGD-L1                  PGD-L2                  PGD-L∞
Attack type      [%]        40 [%]      160 [%]     0.15 [%]    0.6 [%]     1/1020 [%]  1/255 [%]
Standard PGD     52.8±0.2   64.4±0.4    42.2±0.1    65.2±0.3    45.3±0.4    63.4±0.4    36.4±0.4
PGD-MC (k=10)    51.1±0.4   64.2±0.5    40.4±0.5    65.3±0.4    44.3±0.3    61.9±0.4    30.3±0.5


Although VOneResNet50 was not designed to explicitly improve robustness to white box attacks, it offers substantial gains over the standard ResNet50 model (Table 1). A closer inspection of the white box accuracy vs. perturbation strength curves (Fig. B1) reveals that VOneResNet50 can withstand perturbations roughly 4 times higher than the standard ResNet50 for the same performance level (the blue curve is shifted to the right of the black curve by a factor of 4). To help visualize how this difference translates into actual images, we show in Fig. B3 examples of white box adversarial images for both ResNet50 and VOneResNet50, for the perturbation strengths that bring the respective models to nearly chance performance.

Figure B3: Comparison of adversarial images for ResNet50 and VOneResNet50. Visualization of white box attacks under the 3 different norm constraints (L∞, L2, and L1) for the perturbation strength that brings the model to nearly chance performance (Fig. B1). Left: ResNet50; right: VOneResNet50. For each attack, the perturbation is shown on the left, scaled for visualization, and the perturbed image appears next to it. Perturbation strengths and classes of the adversarial images are shown over the images.


B.2 Additional attack controls

The field of adversarial robustness is littered with examples of proposed defense methods that were quickly defeated with some minor attack optimizations or adaptations [68, 84, 97]. For this reason, we took great care to verify our claims of improved robustness of the VOneNets. In addition to the implementation details and controls described in the previous section, we validated our results by further optimizing our attacks on both the standard stochastic VOneResNet50 and the VOneResNet50 with stochasticity removed during the attack and inference.

First, on the standard stochastic VOneResNet50, we ensured that our attack hyper-parameters, including step size (α), number of gradient samples (k), and PGD iterations, are sufficient for evaluating the network's robustness. For PGD with δ∞ = 1/255 and 64 iterations, we performed a large grid search over α and k, observing only a marginal increase in attack effectiveness (Fig. B4). For the most effective step size (α = ε/8), increasing the number of gradient samples beyond k = 16 no longer decreases accuracy under the attack. At k = 128, VOneResNet50 accuracy is only reduced from 29.15% to 26.0%, remaining a large margin above the ResNet50 accuracy of 0.8%. Additionally, increasing the number of PGD iterations all the way up to 1000 (with α = ε/8 and k = 16) only further reduced performance by 0.5%. Finally, using 5-10 random restarts did not improve the attack beyond the most effective k and α settings.

Figure B4: Further attack optimization does not overturn results. White box accuracy for PGD with δ∞ = 1/255, 64 iterations, on 500 images, as a function of step size (ε/2 to ε/32) and number of gradient samples (1 to 128), compared against the reported VOneResNet50 and ResNet50 accuracies. A large grid search over step size and number of gradient samples only marginally improves attack effectiveness and only marginally reduces model performance. Lines represent mean and shaded areas SD (n=3 seeds).

Second, while boundary attacks are not ideal against the standard stochastic VOneResNet50 due to the shifting decision boundary, we investigated VOneResNet50 with stochasticity removed during attack and inference using the Brendel and Bethge (B&B) attack [10], a powerful white box adversary that aims to find minimal adversarial perturbations along the decision boundary of the network. We reasoned that if the B&B attack offers similar results to those observed using PGD, this is a good sanity check that our observed robustness is reliable and not a result of masked gradients around the input or a quirk of PGD. Using the Foolbox [98] implementation of the L∞ B&B attack, initialized with 500 successful adversarial images generated using PGD with δ∞ = 8/255, we see nearly identical scores compared to standard PGD with optimal hyper-parameters at δ∞ = 1/1020 and δ∞ = 1/255.

We conclude that these additional controls do not qualitatively change our reported results and provide further evidence that the observed improved robustness of the VOneNets is real and not an artifact.


B.3 Common Image Corruptions

For evaluating model robustness to common image corruptions, we used ImageNet-C, a dataset publicly available at https://github.com/hendrycks/robustness [13]. It consists of 15 different types of corruptions, each at five levels of severity, for a total of 75 different perturbations (Fig. B5). The corruption types fall into four categories: noise, blur, weather, and digital effects. Noise includes Gaussian noise, shot noise, and impulse noise; blur includes motion blur, defocus blur, zoom blur, and glass blur; weather includes snow, frost, fog, and brightness; digital effects include JPEG compression, elastic transformations, contrast, and pixelation. For every corruption type and at every severity, we tested on all 50,000 images from the ImageNet validation set. We note that these images have been JPEG compressed by the original authors for sharing, which is known to have a minor but noticeable effect on final network performance.
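As a sketch of this evaluation protocol, the loop below iterates over all 75 corruption/severity pairs and averages top-1 accuracy; `evaluate(model, corruption, severity)` is a hypothetical helper that returns accuracy on the 50,000 corrupted validation images for one pair.

```python
# Corruption types grouped by category, as listed above
CORRUPTIONS = {
    "noise":   ["gaussian_noise", "shot_noise", "impulse_noise"],
    "blur":    ["defocus_blur", "glass_blur", "motion_blur", "zoom_blur"],
    "weather": ["snow", "frost", "fog", "brightness"],
    "digital": ["contrast", "elastic_transform", "pixelate", "jpeg_compression"],
}

def corruption_accuracy(model, evaluate):
    scores = [evaluate(model, corruption, severity)
              for names in CORRUPTIONS.values()
              for corruption in names
              for severity in range(1, 6)]   # five severity levels
    return sum(scores) / len(scores)         # mean over all 75 perturbations
```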

Figure B5: Common image corruptions with intermediate severity. Visualization of all 15 common image corruption types, evaluated at severity = 3. First row: original image followed by the noise corruptions (Gaussian noise, shot noise, impulse noise); second row: blur corruptions (defocus, glass, motion, zoom); third row: weather corruptions (snow, frost, fog, brightness); fourth row: digital corruptions (contrast, elastic transform, pixelate, JPEG compression).


Figure B6: VOneResNet50 improves robustness to noise, blur, and digital corruptions. Corruption accuracy vs. severity curves (severity 1-5) for the four categories of common image corruptions (noise, blur, weather, and digital) for ResNet50, VOneResNet50, ATL∞, and ANT3×3+SIN. Adding the VOneBlock to ResNet50 improves robustness to all corruption categories except weather. The adversarially trained model (ATL∞) is considerably worse on common image corruptions. Adversarial noise combined with Stylized ImageNet training (ANT3×3+SIN) is consistently more robust to all corruption categories. For ResNet50 and VOneResNet50, error-bars represent SD (n=3 seeds).

Table B2: Detailed common image corruption results. Adding the VOneBlock to ResNet50 improves robustness to several of the corruption types. However, for others, particularly fog and contrast, robustness decreases. Interestingly, these are also the corruption types on which the adversarially trained model (ATL∞) does the worst. Adversarial noise combined with Stylized ImageNet training (ANT3×3+SIN) is the most robust model for most of the corruption types. For ResNet50 and VOneResNet50, values are mean and SD (n=3 seeds). All values are accuracy (top-1) in [%].

                     Noise                                Blur
Model                Gaussian    Shot        Impulse     Defocus     Glass       Motion      Zoom
ResNet50 [4]         29.6±1.1    27.9±0.9    24.5±1.2    38.8±0.7    26.1±0.6    37.9±0.4    34.5±0.7
ATL∞ [76]            23.6        22.3        14.8        24.4        34.3        33.2        34.9
ANT3×3+SIN [59]      61.6        60.7        59.9        41.5        38.6        46.0        37.3
VOneResNet50         34.6±1.4    33.4±1.2    31.9±1.5    37.8±0.1    35.7±0.1    37.4±0.2    34.0±0.2

                     Weather                                          Digital
Model                Snow        Frost       Fog         Bright      Contrast    Elastic     Pixelate    JPEG
ResNet50             30.1±0.5    36.7±0.4    43.6±0.8    66.8±0.2    38.8±0.5    44.8±0.3    47.0±2.7    55.1±1.0
ATL∞                 29.5        31.2        8.4         54.1        10.2        48.4        56.3        58.7
ANT3×3+SIN           47.7        52.2        54.9        68.2        49.3        52.7        58.9        59.3
VOneResNet50         25.2±0.5    36.8±0.4    30.1±0.5    62.4±0.3    28.5±0.5    48.7±0.1    63.3±0.2    61.0±0.3


B.4 Other defense methods

We compared the performance of VOneResNet50 on the described computer vision benchmarks with two training-based defense methods using the ResNet50 architecture: adversarial training with a δ∞ = 4/255 constraint (ATL∞) [76], and adversarial noise with Stylized ImageNet training (ANT3×3+SIN) [59].

The ATL∞ model was downloaded from the publicly available adversarially trained models at https://github.com/MadryLab/robustness [76]. This model was trained following the adversarial training paradigm of Madry et al. [17] to solve the min-max problem

$$\min_\theta \rho(\theta), \quad \text{where} \quad \rho(\theta) = \mathbb{E}_{(x,y)\sim\mathcal{D}}\left[\max_{\delta\in\mathcal{S}} L(\theta, x+\delta, y)\right] \qquad (1)$$

Here the goal is to learn parameters θ minimizing the loss L for training images x and labels y drawn from D, while perturbing x with δ ∈ S to maximally increase the loss. In this case, D is ImageNet, and PGD with δ∞ = 4/255 is used to find the perturbation maximizing the loss for a given (x, y) pair. The δ∞ = 4/255 constraint model was selected for comparison because it had the best performance on our conglomerate benchmark of adversarial attacks and common image corruptions.
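For illustration, one training step solving this min-max problem can be sketched as below, reusing the `pgd_linf` sketch from Section B.1 for the inner maximization; `model`, `optimizer`, and a batch `(x, y)` are assumed, and details such as the number of inner PGD iterations differ from the released ATL∞ model.

```python
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, x, y, eps=4/255):
    # Inner maximization: worst-case perturbation within the L-infinity ball
    x_adv = pgd_linf(model, x, y, eps=eps)
    # Outer minimization: standard gradient step on the adversarial batch
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```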

The ANT3×3+SIN model was obtained from https://github.com/bethgelab/game-of-noise. Recently, Rusak et al. showed that training a ResNet50 model with several types of additive input noise improved its robustness to common image corruptions [59]. Inspired by this observation, the authors decided to train a model while optimizing a noise distribution that maximally confuses it. Similarly to standard adversarial training, this results in solving a min-max problem

$$\min_\theta \max_\phi \; \mathbb{E}_{(x,y)\sim\mathcal{D}}\,\mathbb{E}_{\delta\sim p_\phi(\delta)}\left[L(\theta, x+\delta, y)\right] \qquad (2)$$

where p_φ(δ) is the maximally confusing noise distribution. The main difference to regular adversarial training in Equation 1 is that while in the former δ is optimized directly, here δ is found by optimizing a constrained distribution with local spatial correlations. Complementing the adversarial noise training with Stylized ImageNet training [25] produced the model with the current best robustness on the ImageNet-C benchmark as a standalone method.


C VOneNet implementation details

C.1 Convolutional layer

The convolutional layer of the VOneBlock is a mathematically parameterized Gabor filter bank (GFB). We set the stride of the GFB to four, giving a 56×56 spatial map of activations. Since the number of channels in most CNNs' first convolution is relatively small (64 in the architectures adapted), we used a larger number in the VOneBlock so that the Gabors would cover the large parameter space and better approximate primate V1. We set the main VOneNet models to contain 512 channels, equally split between simple and complex cells (see Fig. C2A for an analysis of how the number of channels affects performance). Each channel in the GFB convolves a single color channel from the input image.

The Gabor function consists of a two-dimensional grating with a Gaussian envelope and is described by the following equation:

$$G_{\theta,f,\phi,n_x,n_y}(x,y) = \frac{1}{2\pi\sigma_x\sigma_y}\exp\left[-0.5\left(x_{rot}^2/\sigma_x^2 + y_{rot}^2/\sigma_y^2\right)\right]\cos\left(2\pi f x_{rot} + \phi\right) \qquad (3)$$

where

$$x_{rot} = x\cos(\theta) + y\sin(\theta), \qquad y_{rot} = -x\sin(\theta) + y\cos(\theta) \qquad (4)$$

$$\sigma_x = n_x/f, \qquad \sigma_y = n_y/f \qquad (5)$$

x_rot and y_rot are the coordinates orthogonal and parallel to the grating, θ is the angle of the grating orientation, f is the spatial frequency of the grating, φ is the phase of the grating relative to the Gaussian envelope, and σ_x and σ_y are the standard deviations of the Gaussian envelope orthogonal and parallel to the grating, which can be defined as multiples (n_x and n_y) of the grating cycle (the inverse of the frequency).

Although the Gabor filter greatly reduces the number of parameters defining the linear spatial component of V1 neurons, it still has five parameters per channel. Fortunately, there is a vast literature in neurophysiology with detailed characterizations of primate V1 response properties, which can be used to constrain these parameters. To instantiate a VOneBlock with C_V1 channels, we sampled C_V1 values for each of the parameters according to an empirically constrained distribution (Table C1). Due to the resolution of the input images, we limited the ranges of spatial frequencies (f < 5.6 cpd) and number of cycles (n > 0.1).

Table C1: Empirical distributions used to sample the GFB parameters in the VOneBlock.

Parameter   Reference
θ           De Valois et al. [78], Fig. 5: orientation tuning (foveal)
f           De Valois et al. [79], Fig. 6: peak spatial frequency (X cells, foveal)
φ           Uniform distribution ∈ [0, 360] deg
nx, ny      Ringach 2002 [80], Fig. 4: (nx, ny) joint distribution
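To make Equations 3-5 concrete, the sketch below builds a single Gabor kernel from one sampled parameter tuple; names are illustrative rather than taken from the released code, `ksize` is the 25 px kernel discussed below, and `ppd` (pixels per degree) assumes the 8 deg field of view over a 224 px input.

```python
import numpy as np

def gabor_kernel(theta, f, phi, nx, ny, ksize=25, ppd=224 / 8):
    sigma_x, sigma_y = nx / f, ny / f                 # Gaussian envelope (Eq. 5)
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1] / ppd   # grid in degrees
    x_rot = x * np.cos(theta) + y * np.sin(theta)     # rotated coordinates (Eq. 4)
    y_rot = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-0.5 * ((x_rot / sigma_x) ** 2 + (y_rot / sigma_y) ** 2))
    grating = np.cos(2 * np.pi * f * x_rot + phi)
    return envelope * grating / (2 * np.pi * sigma_x * sigma_y)   # Eq. 3
```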

Critical to the correct implementation of the biologically-constrained parameters of the GFB is the choice of the model's field of view in degrees. As previously mentioned in Section A, we used 8 deg as the input spatial extent for all CNN models. It is important to note that the set of spatial filters in the VOneBlock's GFB differs considerably from those in the first convolution of most CNNs (Fig. C1). While standard CNNs learn filters that resemble Gabors in their input layer [1, 99, 100], due to the limited span of their kernels, they do not vary significantly in size and spatial frequency. V1 neurons, on the other hand, are known to exhibit a wide range of receptive field properties. This phenomenon is captured in the VOneBlock, with the spatial frequencies and receptive field sizes of the Gabors ranging over more than one order of magnitude. Due to this high variability, we set the convolution kernel to be 25×25 px, which is considerably larger than those in standard CNNs (Fig. C1).


Figure C1: Comparison between filters in the first convolution of standard CNNs and the VOneBlock. Example filters from the first convolution of the VOneBlock, ResNet50, CORnet-S, and AlexNet (from left to right). VOneBlock filters are all parameterized as Gabors, varying considerably in size and spatial frequency. Standard CNNs have some filters with shapes other than Gabors, particularly center-surround, and are limited in size by their small kernels. Kernel sizes are 25 px, 7 px, 7 px, and 11 px for the VOneBlock, ResNet50, CORnet-S, and AlexNet, respectively.


C.2 Nonlinear layer

The VOneBlock's nonlinear layer has two different nonlinearities that are applied to each channel depending on its cell type: a rectified linear transformation for simple cells (6), and the spectral power of a quadrature phase-pair (7) for complex cells:

$$S^{nl}_{\theta,f,\phi,n_x,n_y} = \begin{cases} S^{l}_{\theta,f,\phi,n_x,n_y} & \text{if } S^{l}_{\theta,f,\phi,n_x,n_y} > 0 \\ 0 & \text{otherwise} \end{cases} \qquad (6)$$

$$C^{nl}_{\theta,f,\phi,n_x,n_y} = \frac{1}{\sqrt{2}}\sqrt{\left(C^{l}_{\theta,f,\phi,n_x,n_y}\right)^2 + \left(C^{l}_{\theta,f,\phi+\pi/2,n_x,n_y}\right)^2} \qquad (7)$$

where S^l and S^{nl} are the linear and nonlinear responses of a simple neuron, and C^l and C^{nl} are the same for a complex neuron.
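A minimal sketch of these two channel-wise nonlinearities, assuming `s_linear` holds the simple-cell channels and `c_linear`, `c_linear_shifted` hold the quadrature phase-pair (φ and φ + π/2) of the complex-cell channels:

```python
import torch

def simple_cell(s_linear):
    # Eq. 6: half-wave rectification
    return torch.relu(s_linear)

def complex_cell(c_linear, c_linear_shifted):
    # Eq. 7: spectral power of the quadrature phase-pair
    return torch.sqrt(c_linear ** 2 + c_linear_shifted ** 2) / (2 ** 0.5)
```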

C.3 Neuronal stochasticity generator

In awake monkeys, spike trains of V1 neurons are approximately Poisson, i.e. the variance and mean of spike counts in a given time-window over a set of repetitions are roughly the same [82]. We incorporated stochasticity into the VOneBlock to emulate this property of neuronal responses. Since the Poisson distribution is not continuous, it breaks the gradients in a white box attack, giving a false sense of robustness [68]. In order to avoid this situation and facilitate the evaluation of the model's real robustness, our neuronal stochasticity generator, as implemented, uses a continuous second-order approximation of Poisson noise, adding Gaussian noise with variance equal to the activation:

$$R_s \sim \mathcal{N}\left(\mu = R_{ns},\; \sigma^2 = R_{ns}\right) \qquad (8)$$

where R_{ns} and R_s are the non-stochastic and stochastic responses of a neuron.

To approximate the levels of neuronal stochasticity of primate V1 neurons, it is critical that the VOneBlock activations are in the same range as V1 neuronal responses (number of spike counts in a given time-window). Thus, we applied an affine transformation to the activations so that both the mean stimulus response and the mean baseline activity are the same as those of a population of V1 neurons (Table C2 shows the mean responses and spontaneous activity of V1 neurons measured in different time-windows).

Table C2: Mean stimulus responses and spontaneous activity of a population of V1 neurons for different time-windows.

Time-window [ms]   Stimulus response [spikes]   Spontaneous activity [spikes]
25                 0.324                        0.073
50                 0.655                        0.145
100                1.05                         0.29
200                1.38                         0.58

Since the exact time-window over which V1 responses are integrated during visual perception is still an open question, we considered a time-window of 25 ms (Fig. C2B). In order to keep the outputs of the VOneBlock in a range that does not deviate considerably from the typical range of activations in CNNs, so that the model can be efficiently trained using standard training parameters, we applied the inverse transformation to scale the outputs back to their original range after the stochastic layer.
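Putting the pieces of this section together, a hedged sketch of the stochasticity generator is shown below; `scale` and `offset` are assumed to implement the affine transformation matching the 25 ms V1 statistics (mean stimulus response 0.324 and baseline 0.073 spikes), and the Gaussian noise is drawn with the reparameterization trick so that gradients remain intact.

```python
import torch

def v1_stochasticity(r, scale, offset):
    r = r * scale + offset                            # to V1 spike-count units
    # Eq. 8: Gaussian noise with variance equal to the activation;
    # torch.relu guards against small negative values under the square root
    r = r + torch.randn_like(r) * torch.sqrt(torch.relu(r))
    return (r - offset) / scale                       # back to the CNN range
```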

C.4 Hyper-parameters and other design choices

We developed VOneNets from three different standard CNN architectures: ResNet50, CORnet-S, and AlexNet. As previously mentioned, we replaced the first block of each architecture with the VOneBlock and the transition layer to create the respective VOneNet. Table C3 contains the layers that were removed for each base architecture. Except for CORnet-S, the removed layers contained only a single


convolution, nonlinearity, and maxpool. Since CORnet-S already had a set of layers pre-committed to V1, we replaced them with the VOneBlock. The torchvision implementation of AlexNet contains a combined stride of eight in the removed layers (four in the convolution and two in the maxpool), followed by a stride of one in the second convolution (outside of the removed layers). In order to more easily adapt it to a VOneBlock with a stride of four, we slightly modified AlexNet's architecture so that it had strides of two in these three layers (first convolution, first maxpool, and second convolution). The results shown in Fig. 2 were obtained using this modified architecture.

Table C3: Layers removed from each base CNN architecture for creating the VOneNets.

Architecture   Layers removed
AlexNet        Conv, ReLU, MaxPool
ResNet50       Conv, Batch-Norm, ReLU, MaxPool
CORnet-S       Conv, Batch-Norm, ReLU, MaxPool, Conv, Batch-Norm, ReLU

The VOneNet architecture was designed to have as few hyper-parameters as possible. When possible, these were constrained either by the base CNN architecture or by neurobiological data: most parameters of the GFB were instantiated by sampling from neuronal distributions of primate V1 neurons; the kernel size was set to 25×25 px to capture the high variability in Gabor sizes; and, like all other CNN models, the spatial extent of the field of view was set to 8 deg (Section A). Nevertheless, there were two hyper-parameters where the choice was rather arbitrary: the number of channels in the VOneBlock, and the time-window for integrating V1 responses used to scale the activations prior to the stochasticity generator.

Since primate V1 has neurons tuned to a wide range of spatial frequencies for each orientation and at any given location of the visual space [101], we expected that a large number of channels would be required to cover all of the combinations of Gabor parameters. For the VOneResNet50 architecture, we varied the number of channels in the VOneBlock between 64 and 1024, equally split between simple and complex cells, and measured the performance of the several models after training (Fig. C2A). Clean ImageNet and common image corruption performance improved with channel number. Remarkably, the opposite happened for white box accuracy, with variants of VOneResNet50 with 64 and 128 channels achieving the highest robustness to white box attacks. This result is particularly interesting since it had been shown that increasing the number of channels in all layers of a CNN improves robustness to white box attacks in both standard and adversarially trained models [17]. Therefore, this shows that the improvement in white box robustness of VOneNets, when compared to their base models, cannot be attributed to the higher number of channels. For the main model, we set the channel number to 512, as it offered a good compromise between the different benchmarks. Regarding the time-window for integrating V1 responses, we observed small effects on the models' performance when varying it between 25 and 200 ms. We chose 25 ms for the main model due to a small trend towards higher robustness to white box attacks with shorter time-windows.

Figure C2: Exploration of the VOneBlock's hyper-parameters in the VOneResNet50 architecture. A: Clean, perturbation, white box, and corruption accuracy of VOneResNet50 models with a varying number of channels in the VOneBlock (64 to 1024, with a 25 ms time-window). B: Same as in A, but varying the time-window for integrating neuronal activity (25 to 200 ms, with 512 channels). Error-bars show SD (n=3 seeds). Red arrows represent the values used in the main model.


C.5 Training details

We used PyTorch 0.4.1 and trained the models using ImageNet 2012 [34]. Images were preprocessed (1) for training, by a random crop to 224×224 pixels and random left-right flipping, and (2) for validation, by a central crop to 224×224 pixels. Preprocessing was followed by normalization, i.e. subtraction of and division by [0.5, 0.5, 0.5]. We used a batch size of 256 images and trained either on 2 GPUs (NVIDIA Titan X / GeForce 1080Ti) or 1 GPU (Quadro RTX 6000 or V100) for 70 epochs. We used step learning rate scheduling, with a starting learning rate of 0.1, divided by 10 every 20 epochs. For optimization, we used Stochastic Gradient Descent with weight decay 0.0001, momentum 0.9, and a cross-entropy loss between image labels and model predictions (logits).
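For concreteness, a minimal sketch of this training configuration (SGD with momentum, step learning rate decay every 20 epochs, cross-entropy loss) is shown below; `model` and `train_loader` are assumed to exist, and multi-GPU details are omitted.

```python
import torch

optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(70):                    # 70 epochs, batch size 256
    for images, labels in train_loader:
        optimizer.zero_grad()
        criterion(model(images), labels).backward()
        optimizer.step()
    scheduler.step()                       # divide the learning rate by 10 every 20 epochs
```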

Model weights and code are available at https://github.com/dicarlolab/vonenet.


D VOneBlock variant details

To investigate which components of the VOneBlock are responsible for the increased robustness, we created six variants of the VOneBlock by removing or modifying one of its parts. In all cases, we trained from scratch a corresponding variant of VOneResNet50 with the modified VOneBlock. All other properties of the model and components of the VOneBlock were left unmodified. Here we describe in more detail each of these variants (the name of the variant refers to the component removed):

• V1 distributions: the GFB parameters were sampled from uniform distributions with the same domain as the empirical distributions used in the default VOneBlock (Table C1)
• Low SF: when sampling the spatial frequencies for the GFB, neurons with low peak spatial frequencies (f < 2 cpd) were removed from the empirical distribution [79]
• High SF: similar to the previous variant, but removing neurons with high peak spatial frequencies (f > 2 cpd)
• Complex NL: complex cell channels were removed from the VOneBlock and replaced by simple cell channels
• Simple NL: simple cell channels were removed from the VOneBlock and replaced by complex cell channels
• V1 stochasticity: the V1 stochasticity generator was removed from the VOneBlock

While all of the variants resulted in worse overall mean performance than the unmodified VOneResNet50 (Table 2), some improved specific benchmarks: the variant without high SF had higher accuracy under noise and blur corruptions (Table D1); the variant without V1 stochasticity had higher accuracy on clean images and on images with noise and weather corruptions (Tables 2 and D1); and the variant without V1 distributions had higher accuracy under white box attacks with high L∞ perturbations (Table D2).

Table D1: Difference between the common image corruption accuracies of the several VOneResNet50 variants and the unmodified VOneResNet50. Removal of simple cells reduced accuracy for all common image corruption types. Removal of some components of the VOneBlock improved accuracy for specific common image corruptions: removing high SF consistently improved performance for noise and blur corruptions, and removing V1 stochasticity improved performance for noise and weather corruptions. All values in [Δ%].

                     Noise                                 Blur
Component removed    Gaussian     Shot         Impulse     Defocus     Glass        Motion      Zoom
V1 dist.             -2.8±0.9     -2.4±0.7     -2.9±1.1    -1.6±0.1    0.4±0.2      -0.8±0.3    -0.2±0.5
Low SF               -1.4±0.5     -1.0±0.2     -1.5±0.5    -0.6±0.6    -1.1±0.2     -1.2±0.4    -0.5±0.7
High SF              6.8±1.3      7.3±1.1      7.0±1.3     5.6±0.3     11.5±0.3     3.7±0.0     6.1±0.1
Complex              1.0±1.6      1.2±1.3      0.7±1.8     1.2±0.2     -1.5±0.1     0.2±0.8     0.4±1.1
Simple               -6.1±1.6     -5.4±1.5     -6.4±1.9    -4.1±0.4    -4.2±0.4     -4.9±0.5    -5.4±0.3
V1 stoch.            2.2±2.5      2.2±2.5      2.1±2.9     3.3±0.1     -5.5±0.1     0.9±0.7     0.9±0.1

                     Weather                                           Digital
Component removed    Snow         Frost        Fog         Bright      Contrast     Elastic     Pixelate    JPEG
V1 dist.             -0.2±0.6     -0.9±0.6     -1.9±1.1    -0.8±0.5    -1.4±0.9     0.3±0.4     -1.7±0.3    0.8±0.5
Low SF               -0.4±0.3     -0.6±0.4     0.9±0.3     -0.2±0.2    -0.4±0.5     -0.9±0.5    -1.3±0.2    -0.5±0.4
High SF              1.4±1.0      -0.7±0.7     -9.5±0.4    -5.2±0.4    -4.4±0.5     2.4±0.2     1.9±0.4     -4.6±0.4
Complex              0.7±0.4      0.4±0.3      4.3±0.2     1.3±0.2     2.9±0.2      -0.5±0.2    -1.2±0.0    -1.9±0.5
Simple               -5.0±0.5     -9.0±0.5     -4.4±0.5    -3.9±0.2    -4.8±0.4     -3.9±0.5    -3.5±0.4    -1.8±0.2
V1 stoch.            2.5±0.4      0.3±0.3      12.0±0.5    2.6±0.3     10.0±0.8     -2.8±0.3    -4.0±0.7    -4.0±0.4


Table D2: Difference between the white box attack accuracies of the several VOneResNet50 variants and the unmodified VOneResNet50. Removal of V1 stochasticity considerably reduced accuracy under all white box attacks. Removal of V1 distributions improved accuracy for the L∞ white box attack with high perturbation strength. All values in [Δ%].

                     PGD-L1                    PGD-L2                    PGD-L∞
Component removed    40           160          0.15         0.6          1/1020       1/255
V1 dist.             -0.5±0.8     0.1±1.1      -0.8±0.2     0.6±1.0      0.3±0.3      3.0±1.8
Low SF               -0.8±1.2     -3.1±1.1     -0.6±1.0     -2.3±0.8     -0.8±1.3     -1.8±1.0
High SF              -2.2±0.9     -4.7±2.6     -2.6±1.5     -5.1±1.2     -2.6±1.4     -5.5±1.1
Complex              -1.3±0.9     -7.4±1.6     -1.1±0.9     -7.2±1.2     -1.9±1.0     -7.9±1.2
Simple               -2.5±0.8     -1.4±1.4     -2.9±1.1     -1.7±1.2     -2.7±1.2     -0.6±0.8
V1 stoch.            -23.7±0.9    -36.5±0.8    -21.6±0.5    -40.1±0.4    -32.2±0.4    -29.2±0.4



is computationally expensive [60, 61], known to impact clean performance [62], and overfits to the attack constraints it is trained on [63, 64]. Other defenses involve adding noise, either during training [59], inference [65, 19], or both [15, 66, 67]. In the case of stochasticity during inference, Athalye et al. demonstrated that fixing broken gradients or taking the expectation over randomness often dramatically reduces the effectiveness of the defense [68]. In a promising demonstration that biological constraints can increase CNN robustness, Li et al. showed that biasing a neural network's representations towards those of the mouse V1 increases the robustness of grey-scale CIFAR [69] trained neural networks to both noise and white box adversarial attacks [40].

2 Adversarial Robustness Correlates with V1 Explained Variance

The susceptibility of current CNNs to being fooled by imperceptibly small adversarial perturbations suggests that these CNNs rely on some visual features not used by the primate visual system. Are models that better explain neural responses in the macaque V1 more robust to adversarial attacks? We analyzed an array of publicly available neural networks with standard ImageNet training [70], including AlexNet [1], VGG [3], ResNet [4], ResNeXt [71], DenseNet [72], SqueezeNet [73], ShuffleNet [74], and MnasNet [75], as well as several ResNet50 models with specialized training routines, such as adversarial training with L∞ (δ∞ = 4/255 and δ∞ = 8/255) and L2 (δ2 = 3) constraints [76], and adversarial noise combined with Stylized ImageNet training [59].

For each model, we evaluated how well it explained the responses of single V1 neurons evoked by given images, using a standard neural predictivity methodology based on partial least squares (PLS) regression [31, 32]. We used a neural dataset with 102 neurons and 450 different 4 deg images, consisting of naturalistic textures and noise samples [77]. Explained variance was measured using a 10-fold cross-validation strategy. Similarly to other studies, we considered the field-of-view of all models to span 8 deg [31, 30] (see Supplementary Section A for more details). To evaluate the adversarial robustness of each model in the pool, we used untargeted projected gradient descent (PGD) [17], an iterative gradient-based white box adversarial attack, with L∞, L2, and L1 norm constraints of δ∞ = 1/1020, δ2 = 0.15, and δ1 = 40, respectively. For each norm, the attack strength was calibrated to drive variance in performance amongst non-adversarially trained models, resulting in perturbations well below the level of perceptibility (see Supplementary Section B.1 for more details).

We found that accuracy under these white box adversarial attacks has a strong positive correlation with V1 explained variance (Fig. 1; r=0.85, p=2.1E-9, n=30). Notably, adversarially trained ResNet50 models [76], which were hardly affected by these attacks, explained more variance in V1 neural activations than any non-adversarially trained neural network. The correlation was not driven by the CNNs' clean ImageNet performance, since it was even more pronounced when the white box accuracy was normalized by the clean accuracy (r=0.94, p=7.4E-15, n=30), and was also present when removing the adversarially trained models (r=0.73, p=1.78E-5, n=27). While increasing the perturbation strength rapidly decreases the accuracy of non-adversarially trained models, the described correlation was present for a wide range of attack strengths: when the perturbation was multiplied by a factor of 4, greatly reducing the variance of non-adversarially trained models, white box accuracy was still significantly correlated with explained variance in V1 (r=0.82, p=1.17E-8, n=30).

Figure 1: CNNs' robustness to white box attacks correlates with explained response variance in primate V1. Comparison of top-1 accuracy under white box attacks of low perturbation strength (average of 3 PGD constraints: δ∞ = 1/1020, δ2 = 0.15, and δ1 = 40) against the fraction of explained variance of V1 responses (using PLS regression) for a pool of CNN models. Perturbation strength was chosen to drive variance across model performance. White box accuracy and V1 explained variance are significantly correlated (r=0.85, p=2.1E-9, n=30 CNNs; linear fit shown as a gray line). Gray circles: non-adversarially trained CNNs (n=27); red circles: adversarially trained ResNet50 models (n=3); blue circle: VOneResNet50 (not included in the correlation).


3 VOneNet, a Hybrid CNN with a V1 Neural Network Front-End

Inspired by the strong correlation between V1 explained variance and robustness to white box attacks, we developed the VOneNet architecture. The major characteristic that sets the VOneNet architecture apart is its V1 front-end, the VOneBlock. While most of its layers have parameters learned during ImageNet training, a VOneNet's first block is a fixed-weight, mathematically parameterized CNN model that approximates primate neural processing of images up to and including area V1. Importantly, the VOneBlock components are mapped to specific neuronal populations in V1, which can be independently manipulated or switched off completely to evaluate their functional role. Finally, the VOneBlock can be easily adapted to different CNN base architectures, as described below. Here we build VOneNets from three base architectures: ResNet50 [4], CORnet-S [32], and AlexNet [1].

A VOneNet consists of the VOneBlock and a back-end network adapted from a base CNN (Fig. 2). When building a VOneNet from a particular CNN, we replace its first block (typically one stack of convolutional, normalization, nonlinear, and pooling layers; Supplementary Table C3) with the VOneBlock and a trained transition layer. The VOneBlock matches the replaced block's spatial map dimensions (56×56 for the base CNNs considered) but can have more channels (for the standard model we chose C_V1 = 512; see Supplementary Fig. C2 for an analysis of how the number of channels affects the results). It is followed by the transition layer, a 1×1 convolution that acts as a bottleneck to compress the higher channel number to the original block's depth (see the sketch below). The VOneBlock is inspired by the LNP model of V1 [52], consisting of three consecutive processing stages (convolution, nonlinearity, and stochasticity generator) with two distinct neuronal types (simple and complex cells), each with a certain number of units per spatial location (standard model with SC_V1 = CC_V1 = 256). The following paragraphs describe the main components of the VOneBlock (see Supplementary Section C for a more detailed description).
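As an illustration of this assembly for the ResNet50 case, a minimal sketch is shown below; `VOneBlock` is assumed to be defined elsewhere (e.g., along the lines of Supplementary Section C), and the names are hypothetical rather than those of the released implementation.

```python
import torch.nn as nn
from torchvision.models import resnet50

def build_vone_resnet50(v1_channels=512):
    base = resnet50()
    # Back-end: everything after the replaced first block (conv1/bn1/relu/maxpool)
    backend = nn.Sequential(base.layer1, base.layer2, base.layer3, base.layer4,
                            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                            nn.Linear(2048, 1000))
    # Transition layer: 1x1 bottleneck from 512 V1 channels to the original depth
    transition = nn.Conv2d(v1_channels, 64, kernel_size=1)
    return nn.Sequential(VOneBlock(v1_channels), transition, backend)
```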

Biologically-constrained Gabor filter bank. The first layer of the VOneBlock is a mathematically parameterized GFB with parameters tuned to approximate empirical primate V1 neural response data. It convolves the RGB input images with Gabor filters of multiple orientations, sizes, shapes, and spatial frequencies in a 56×56 spatial map. To instantiate a VOneBlock, we randomly sample C_V1 values for the GFB parameters according to empirically observed distributions of preferred orientation, peak spatial frequency, and size/shape of receptive fields [78, 79, 80]. The VOneBlock keeps color processing separate, with each Gabor filter convolving a single color channel from the input image. The resulting set of spatial filters is considerably more heterogeneous than those found in the first layer of standard CNNs and better approximates the diversity of primate V1 receptive fields (Supplementary Fig. C1).

Simple and complex cells. While simple cells were once thought to be an intermediate step for computing complex cell responses, it is now known that they form the majority of downstream projections to V2 [81]. For this reason, the VOneBlock's nonlinear layer has two different nonlinearities that are applied to each channel depending on its cell type: a rectified linear transformation for simple cells, and the spectral power of a quadrature phase-pair for complex cells.

V1 stochasticity. A defining property of neuronal responses is their stochasticity: repeated measurements of a neuron in response to nominally identical visual inputs result in different spike trains. In awake monkeys, the mean spike count (averaged over many repetitions) depends on the presented image, and the spike train for each trial is approximately Poisson: the spike count variance is equal to the mean [82]. To approximate this property of neuronal responses, we add independent Gaussian noise to each unit of the VOneBlock, with variance equal to its activation. Before doing this, we apply an affine transformation to the units' activations so that both the mean stimulus response and the mean baseline activity are the same as those of a population of primate V1 neurons measured in a 25 ms time-window (mean stimulus response and spontaneous activity of 0.324 and 0.073 spikes, respectively; see Supplementary Table C2 and Fig. C2 for other time-windows). Like in the brain, the stochasticity of the VOneBlock is always on, during both training and inference.

V1 explained variance. The VOneBlock was not developed to compete with state-of-the-art data-fitting models of V1 [53, 30]. Instead, we used available empirical distributions to constrain a GFB model, generating a neuronal space that approximates that of primate V1. Despite its simplicity, the VOneBlock outperformed all tested standard ImageNet-trained CNNs in explaining responses in the V1 dataset used, and, with an explained variance of 0.375±0.006, came right in the middle of the range of the adversarially trained CNNs (Fig. 1, blue circle). On the surface, it seems that these results are


at odds with Cadena et al., which showed that a task-optimized CNN marginally outperformed a GFB model in explaining V1 responses [30]. However, our GFB has parameters constrained by empirical data, resulting in a better model of V1: when we used the parameters of the GFB in Cadena et al., we obtained a much lower explained variance (0.296±0.005). Our results suggest that a well-tuned GFB and adversarially trained CNNs are currently the best models for predicting primate V1 responses.

4 Results

4.1 Simulating a V1 at the front of CNNs improves robustness to white box attacks

To evaluate the VOneNets' robustness to white box adversarial attacks, we used untargeted PGD with L∞, L2, and L1 norm constraints of δ∞ ∈ {1/1020, 1/255}, δ2 ∈ {0.15, 0.6}, and δ1 ∈ {40, 160}. All VOneNets were attacked end-to-end, with gradients propagated through the VOneBlock to the pixels. Because the VOneBlock is stochastic, we took extra care when attacking VOneNets. We used the reparameterization trick when sampling VOneBlock stochasticity, allowing us to keep the gradients intact [83]. Further, to combat the effects of noise in the gradients, we adapted our attack such that at every PGD iteration we took 10 gradient samples and moved in the average direction to increase the loss [68] (Supplementary Table B1). To verify the effectiveness of our attacks, we used the controls suggested by Carlini et al., 2019 [84]: in particular, we show that increasing the magnitude of the norm of the attack monotonically decreases VOneNet accuracy, eventually reaching 0% accuracy (Supplementary Fig. B1). Additionally, increasing the number of PGD steps from 1 to

Figure 2: Simulating V1 at the front of CNNs improves robustness to white box attacks. A: A VOneNet consists of a model of primate V1, the VOneBlock, followed by a standard CNN architecture. The VOneBlock (512×56×56) contains a convolutional layer (a GFB with fixed weights constrained by empirical data), a nonlinear layer (simple or complex cell nonlinearities, 256 channels each), and a stochastic layer (V1 stochasticity generator with variance equal to mean). B: Top, comparison of accuracy (top-1) on low strength white box attacks (PGD with δ∞ = 1/1020, δ2 = 0.15, δ1 = 40) between three base models (ResNet50, CORnet-S, and AlexNet) and their corresponding VOneNets. Gray bars show the performance of base models; blue bars show the improvements from the VOneBlock; dashed lines indicate the performance on clean images. Bottom, same but for white box attacks of higher strength (PGD with δ∞ = 1/255, δ2 = 0.6, δ1 = 160). VOneNets show consistently higher white box accuracy, even for perturbation strengths that reduce the performance of the base models to nearly chance. Error-bars represent SD (n=3 seeds).


many strictly increases the effect of the attack under a given norm size (Supplementary Fig. B2). Both of these indicate that VOneNet gradients contain useful information for the construction of adversarial examples. For more details and controls, see Supplementary Sections B.1 and B.2.

We found that simulating a V1 front-end substantially increased the robustness to white box attacks for all three base architectures that we tested (ResNet50, CORnet-S, and AlexNet), while leaving clean ImageNet performance largely intact (Fig. 2B). This was particularly evident for the stronger perturbation attacks, which reduced the accuracy of the base models to nearly chance (Fig. 2B, bottom). This suggests that the VOneBlock works as a generic front-end, which can be transferred to a variety of different neural networks as an architectural defense to increase robustness to adversarial attacks.

4.2 VOneNets outperform state-of-the-art methods on a composite set of perturbations

We then focused on the ResNet50 architecture and compared VOneResNet50 with two state-of-the-art training-based defense methods: adversarial training with a δ∞ = 4/255 constraint (ATL∞) [76], and adversarial noise with Stylized ImageNet training (ANT3×3+SIN) [59]. Because white box adversarial attacks are only part of the issue of model robustness, we considered a larger panel of image perturbations containing a variety of common corruptions. For evaluating model performance on corruptions, we used the ImageNet-C dataset [13], which consists of 15 different corruption types, each at 5 levels of severity, divided into 4 categories: noise, blur, weather, and digital (see Supplementary Fig. B5 for example images and Supplementary Section B.3 for more details).

As expected, each of the defense methods had the highest accuracy under the perturbation type that it was designed for, but did not considerably improve over the base model on the other (Table 1). While the ATL∞ model suffered substantially on corruptions and clean performance, the ANT3×3+SIN model, the current state-of-the-art for common corruptions, had virtually no benefit on white box attacks. On the other hand, VOneResNet50 improved on both perturbation types, outperforming all the models on perturbation mean (average of white box and common corruptions) and overall mean (average of clean, white box, and common corruptions), with a difference to the second best model of 3.3% and 5.3%, respectively. Specifically, VOneResNet50 showed substantial improvements for all the white box attack constraints (Supplementary Fig. B1) and more moderate improvements for common image corruptions of the categories noise, blur, and digital (Supplementary Table B2 and Fig. B6). These results are particularly remarkable since VOneResNet50 was not optimized for robustness and does not benefit from any computationally expensive training procedure, unlike the other defense methods. When compared to the base ResNet50, which has an identical training procedure, VOneResNet50 improves 18% on perturbation mean and 10.7% on overall mean (Table 1).

Table 1: VOneResNet50 outperforms other defenses on perturbation mean and overall mean. Accuracy (top-1) on overall mean, perturbation mean, white box attacks, common corruptions, and clean images for the standard ResNet50 and three defense methods: adversarial training, adversarial noise combined with Stylized ImageNet training, and VOneResNet50. Perturbation mean is the average accuracy over white box attacks and common image corruptions. The overall mean is the average accuracy over the two perturbation types and clean ImageNet. Values are reported as mean and SD (n=3 seeds).

                      Overall     Perturbation
Model                 Mean [%]    Mean [%]       White box [%]   Corruption [%]   Clean [%]
Base ResNet50 [4]     43.6±0.0    27.6±0.1       16.4±0.1        38.8±0.3         75.6±0.1
ATL∞ [76]             49.0        42.3           52.3            32.3             62.4
ANT3×3+SIN [59]       48.0        34.9           17.3            52.6             74.1
VOneResNet50          54.3±0.1    45.6±0.2       51.1±0.4        40.0±0.3         71.7±0.1

4.3 All components of the VOneBlock contribute to improved model robustness

Since the VOneBlock was not explicitly designed to increase model robustness, but rather to approximate primate V1, we investigated which of its components are responsible for the increased


robustness we observe. We performed a series of experiments wherein six new VOneNet variants were created by removing or modifying a part of the VOneBlock (referred to as VOneBlock variants). After ImageNet training, model robustness was evaluated, as before, on our panel of white box attacks and image corruptions. Three variants targeted the GFB: one sampling the Gabor parameters from uniform distributions instead of those found in primate V1, another without high spatial frequency (SF) filters (f < 2 cpd), and another without low SF filters (f > 2 cpd). Two additional variants targeted the nonlinearities: one without simple cells and another without complex cells. Finally, the sixth variant had the V1 neuronal stochasticity generator removed. Even though all VOneBlock variants are poorer approximations of primate V1, all resulting VOneResNet50 variants still had improved perturbation accuracy when compared to the base ResNet50 model (Fig. 3A). On the other hand, all variants except that with uniformly sampled Gabor parameters showed significant deficits in robustness compared to the unmodified VOneResNet50 (Fig. 3A; drops in perturbation accuracy between 1% and 15%).

Interestingly, we observed that some of these changes had a highly specific effect on the type of perturbation robustness affected (Table 2 and Supplementary Section D). Removing high SF Gabors negatively affected both white box and clean accuracy, while actually improving robustness to common image corruptions, particularly those of the noise and blur categories (Supplementary Table D1). Removing complex cells only impaired white box robustness, as opposed to removing simple cells, which was particularly detrimental to performance on image corruptions. Finally, removing V1 stochasticity considerably decreased white box accuracy, while improving accuracy for both clean and corrupted images.

The VOneBlock variant without V1 stochasticity suffered the most dramatic loss in robustness. This is not altogether surprising, as several approaches to adversarial robustness have focused on noise as a defense [15, 19, 66]. To investigate whether V1 stochasticity alone accounted for the majority of the robustness gains, we evaluated the perturbation accuracy of a ResNet50 model with V1 stochasticity added at the output of its first block. Neuronal stochasticity was implemented exactly the same way as in the VOneBlock, by first applying an affine transformation to scale the activations so that they match primate V1 neuronal activity. Like the VOneResNet50, this model had stochasticity during training and inference, and showed a considerable improvement in robustness compared


Figure 3: All components of the VOneBlock work in synergy to improve robustness to image perturbations. A. Perturbation mean accuracy (top-1) for VOneResNet50 and several variations with a component of the VOneBlock removed or altered. From left to right: unmodified VOneResNet50, model with Gabor parameters sampled uniformly (within biological ranges), model without high SF Gabors, model without low SF Gabors, model without complex cells, model without simple cells, model without V1 stochasticity. Gray bars show the performance of ResNet50. Blue bars show the improvements due to the VOneBlock. Dashed line indicates the accuracy of the unmodified model. Error-bars represent SD (n=3 seeds). B. Same as in A, but comparing ResNet50, VOneResNet50 without V1 stochasticity, ResNet50 with V1 stochasticity added after the first block, and VOneResNet50. Adding V1 stochasticity to ResNet50 accounts for less than half of the total improvement of VOneResNet50, demonstrating a supralinear interaction between the V1 features and V1 stochasticity.


Table 2: Removal of some VOneBlock components causes impairments with high specificity. Difference in accuracy (top-1) relative to the unmodified VOneResNet50 for each of the variants with removed components, on overall mean, white box attacks, common corruptions, and clean images. Values are reported as mean and SD (n=3 seeds).

Component removed | Overall Mean [∆%] | White box [∆%] | Corruptions [∆%] | Clean [∆%]
V1 dist           | -0.4±0.3          | 0.4±0.8        | -1.1±0.3         | -0.5±0.3
High SF           | -2.2±0.2          | -3.8±0.3       | 1.9±0.4          | -4.7±0.2
Low SF            | -0.7±0.5          | -1.6±1.1       | -0.7±0.1         | 0.1±0.3
Complex           | -1.0±0.6          | -4.5±1.1       | 0.6±0.4          | 0.8±0.3
Simple            | -3.0±0.5          | -2.0±1.1       | -4.8±0.5         | -2.1±0.3
V1 stoch          | -8.8±0.3          | -30.6±0.4      | 1.5±0.5          | 2.6±0.1

However, this improvement accounted for only a fraction of the total gains of the VOneResNet50 model (Fig. 3B), demonstrating that there is a substantial supralinear interaction between the V1 features and the neuronal stochasticity. Merely adding V1 stochasticity to the first block of a standard CNN model does not increase robustness to the same degree as the full VOneBlock: the presence of V1 features more than doubles the contribution to perturbation robustness brought by the addition of neuronal stochasticity.

Finally, we sought to determine whether stochasticity during inference is key to defending against attacks. Thus, we analyzed the white box adversarial performance of VOneResNet50 while quenching stochasticity during the adversarial attack (Fig. 4). Remarkably, the majority of the improvements in adversarial robustness originate from the neuronal stochasticity during training, indicating that V1 stochasticity induces the downstream layers to learn representations that are more robust to adversarial attacks. This is particularly interesting when contrasted with the ANT3×3+SIN defense, which has noise added to input images during training but does not learn representations with notably higher robustness to white box adversarial attacks (Table 1 and Supplementary Fig. B1).


Figure 4: V1 stochasticity during training leads to more robust learned representations. White box accuracy (top-1) for the VOneResNet50 variant without V1 stochasticity, VOneResNet50 with stochasticity removed during the white box attack and inference, and VOneResNet50. VOneResNet50 without stochasticity during attack/inference maintained over half of the robustness gains. Therefore, V1 stochasticity improves model robustness in two ways: rendering the attack itself less effective and inducing more robust representations downstream of the VOneBlock, the latter accounting for the majority of the gains on our benchmark. Gray bars show the performance of the standard ResNet50. Blue bars show the improvements due to the VOneBlock. Error-bars represent SD (n=3 seeds).

5 Discussion

In this work, we demonstrate that adversarial robustness in CNNs is correlated with their ability to explain primate V1 neuronal responses, and that simulating the image processing of primate V1 at the front of standard neural network architectures significantly improves their robustness to image perturbations. Notably, this approach outperformed state-of-the-art defense methods on a large benchmark consisting of adversarial attacks and common image corruptions. Despite not being constructed to this end, any component of the model front-end we removed or modified to be less like primate V1 resulted in less overall robustness, and revealed that different components improve robustness to different perturbation types. Remarkably, we find that simulated V1 stochasticity interacts synergistically with the V1 model features and drives the downstream layers to learn representations more robust to adversarial perturbations.

Our approach bears some similarity to the pioneering study by Li et al. [40], in which a model's representations were regularized to approximate mouse V1, increasing its robustness on the gray-scale CIFAR dataset; however, here we go further in several key ways. First, while the mouse is gaining traction as a model for studying vision, a vast literature has established macaque vision as a quantitatively accurate model of human vision in general, and of human object recognition in particular [85, 86]. Visual acuity in mice is much lower than in macaques and humans [87, 79], suggesting that vision in the mouse may serve different behavioral functions than in primates. Further, the regularization approach employed by Li et al. does not allow a clear disambiguation of which aspects of mouse V1 contribute to the improved robustness. Since the components of the VOneBlock proposed here are mappable to the brain, we can dissect the contributions of different neuronal populations in primate V1 to robustness against specific image perturbations. Finally, extending the robustness gains of biologically-constrained CNNs from gray-scale CIFAR to the full ImageNet dataset is a critical step towards real-world human-level applications.

The gains achieved by VOneNets are substantial, particularly against white box attacks, and have tangible benefits over other defense methods. Though adversarial training still provides the strongest defense for the attack statistics it is trained on, it has significant downsides. Beyond its considerable additional computational cost during training, adversarially trained networks have significantly lower performance on clean images, corrupted images, and images perturbed with attack statistics not seen during training, implying that adversarial training in its current form may not be viable as a general defense method. In contrast, by deploying an architectural change, VOneNets improve robustness to all adversarial attacks tested and to many common image corruptions, and they accomplish this with no additional training overhead. This also suggests that the architectural gains of the VOneNet could be stacked with other training-based defense methods to achieve even greater overall robustness gains.

Relative to current methods, the success of this approach derives from engineering in a better approximation of the architecture of the most well studied primate visual area, combined with task optimization of the remaining free parameters of the downstream architecture [26]. This points to two potentially synergistic avenues for further gains: an even more neurobiologically precise model of V1 (i.e., a better VOneBlock) and an even more neurobiologically precise model of the downstream architecture. For example, one could extend the biological fidelity of the VOneBlock in the hope that it confers even greater robustness, by including properties such as divisive normalization [48, 49] and contextual modulation [88], to name a few. In addition to V1, the retina and the Lateral Geniculate Nucleus (LGN) also play important roles in pre-processing visual information, only partially captured by the current V1 model, suggesting that extending the work done here to a retina/LGN front-end has the potential to better align CNNs with human visual object recognition [89, 90]. In addition, though our initial experiments show that multiple components of our V1 model work together to produce greater robustness, the nature of the relationship between adversarial robustness and explained variance in V1 responses is far from resolved. In particular, as the VOneNets with and without stochasticity achieve very similar V1 explained variances but have markedly different levels of adversarial robustness, it is clear that much theoretical work remains in better understanding both when and why matching biology leads to more robust computer vision models.

While neuroscience has recently seen a huge influx of new neurally-mechanistic models and tools drawn from machine learning [91, 92], the most recent advances in machine learning and computer vision have been driven mostly by the widespread availability of computational resources and very large labeled datasets [93], rather than by an understanding of the relevant brain mechanisms. Under the belief that biological intelligence still has a lot of untapped potential to contribute, a number of researchers have been pushing for more neuroscience-inspired machine learning algorithms [37, 42, 32]. The work presented here shows that this aspiration can become reality: the models presented here, drawn directly from primate neurobiology, indeed require less training to achieve more human-like behavior. This is one turn of a new virtuous circle, wherein neuroscience and artificial intelligence each feed into and reinforce the understanding and ability of the other.


Broader Impact

From a technological perspective, the ethical implications of our work are largely aligned with those of computer vision in general. While there is undoubtedly potential for malicious and abusive uses of computer vision, particularly in the form of discrimination or invasion of privacy, we believe that our work will aid in the production of more robust and intuitive behavior of computer vision algorithms. As CNNs are deployed in real-world situations, it is critical that they behave with the same level of stability as their human counterparts. In particular, they should, at the very least, not be confused by changes in input statistics that do not confuse humans. We believe that this work will help to bridge that gap. Furthermore, while algorithms are often thought to be impartial or unbiased, much research has shown that data-driven models like current CNNs are often even more biased than humans, implicitly keying in on and amplifying stereotypes. For this reason, making new CNNs that behave more like humans may actually reduce, or at least make more intuitive, their implicit biases. Unfortunately, we note that even with our work these issues are not yet resolved. While we developed a more neurobiologically-constrained algorithm, it comes nowhere close to human-like behaviour in the wide range of circumstances experienced in the real world. Finally, from the perspective of neuroscience, we think that this work introduces a more accurate model of the primate visual system. Ultimately, better models contribute to a better mechanistic understanding of how the brain works and of how to intervene in the case of illness or disease states. We think that our model contributes a stronger foundation for understanding the brain and building novel medical technology, such as neural implants for restoring vision in people with impairments.

Acknowledgments and Disclosure of Funding

We thank J. Anthony Movshon and Corey M. Ziemba for access to the V1 neural dataset, MIT-IBM Satori for providing the compute necessary for this project, John Cohn for technical support, Sijia Liu and Wieland Brendel for advice on adversarial attacks, and Adam Marblestone for insightful discussion. This work was supported by the PhRMA Foundation Postdoctoral Fellowship in Informatics (T.M.), the Semiconductor Research Corporation (SRC) and DARPA (J.D., M.S., J.J.D.), the Massachusetts Institute of Technology Shoemaker Fellowship (M.S.), Office of Naval Research grant MURI-114407 (J.J.D.), the Simons Foundation grant SCGB-542965 (J.J.D.), and the MIT-IBM Watson AI Lab grant W1771646 (J.J.D.).

References

[1] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet Classification with Deep Convolutional Neural Networks". In: Advances in Neural Information Processing Systems 25. Ed. by F. Pereira et al. Curran Associates, Inc., 2012, pp. 1097–1105.
[2] Christian Szegedy et al. "Going Deeper with Convolutions". In: (Sept. 2014). arXiv: 1409.4842 [cs.CV].
[3] Karen Simonyan and Andrew Zisserman. "Very Deep Convolutional Networks for Large-Scale Image Recognition". In: (Sept. 2014). arXiv: 1409.1556 [cs.CV].
[4] Kaiming He et al. "Deep Residual Learning for Image Recognition". In: (Dec. 2015). arXiv: 1512.03385 [cs.CV].
[5] Kaiming He et al. "Delving deep into rectifiers: Surpassing human-level performance on imagenet classification". In: Proceedings of the IEEE International Conference on Computer Vision (2015), pp. 1026–1034. DOI: 10.1109/ICCV.2015.123. arXiv: 1502.01852.
[6] Christian Szegedy et al. "Intriguing properties of neural networks". In: (Dec. 2013). arXiv: 1312.6199 [cs.CV].
[7] Nicholas Carlini and David Wagner. "Towards Evaluating the Robustness of Neural Networks". In: (Aug. 2016). arXiv: 1608.04644 [cs.CR].
[8] Pin-Yu Chen et al. "EAD: Elastic-Net Attacks to Deep Neural Networks via Adversarial Examples". In: (Sept. 2017). arXiv: 1709.04114 [stat.ML].
[9] Jérôme Rony et al. "Decoupling Direction and Norm for Efficient Gradient-Based L2 Adversarial Attacks and Defenses". In: (Nov. 2018). arXiv: 1811.09600 [cs.CV].


[10] Wieland Brendel et al. "Accurate, reliable and fast robustness evaluation". In: (July 2019). arXiv: 1907.01003 [stat.ML].
[11] Samuel Dodge and Lina Karam. "A Study and Comparison of Human and Deep Learning Recognition Performance Under Visual Distortions". In: (May 2017). arXiv: 1705.02498 [cs.CV].
[12] Robert Geirhos et al. "Generalisation in humans and deep neural networks". In: (Aug. 2018). arXiv: 1808.08750 [cs.CV].
[13] Dan Hendrycks and Thomas Dietterich. "Benchmarking Neural Network Robustness to Common Corruptions and Perturbations". In: (Mar. 2019). arXiv: 1903.12261 [cs.LG].
[14] Nilaksh Das et al. "Keeping the Bad Guys Out: Protecting and Vaccinating Deep Learning with JPEG Compression". In: (May 2017). arXiv: 1705.02900 [cs.CV].
[15] Xuanqing Liu et al. "Towards Robust Neural Networks via Random Self-ensemble". In: (Dec. 2017). arXiv: 1712.00673 [cs.LG].
[16] Weilin Xu, David Evans, and Yanjun Qi. "Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks". In: arXiv [cs.CV] (Apr. 2017).
[17] Aleksander Madry et al. "Towards Deep Learning Models Resistant to Adversarial Attacks". In: (June 2017). arXiv: 1706.06083 [stat.ML].
[18] Yang Song et al. "PixelDefend: Leveraging Generative Models to Understand and Defend against Adversarial Examples". In: (Oct. 2017). arXiv: 1710.10766 [cs.LG].
[19] Guneet S. Dhillon et al. "Stochastic Activation Pruning for Robust Adversarial Defense". In: (Mar. 2018). arXiv: 1803.01442 [cs.LG].
[20] Jacob Buckman et al. "Thermometer Encoding: One Hot Way To Resist Adversarial Examples". In: (Feb. 2018). URL: https://openreview.net/forum?id=S18Su--CW.
[21] Chuan Guo et al. "Countering Adversarial Images using Input Transformations". In: (Feb. 2018). URL: https://openreview.net/forum?id=SyJ7ClWCb.
[22] Claudio Michaelis et al. "Benchmarking Robustness in Object Detection: Autonomous Driving when Winter is Coming". In: (2019), pp. 1–23. arXiv: 1907.07484. URL: http://arxiv.org/abs/1907.07484.
[23] Andrew Ilyas et al. "Adversarial Examples Are Not Bugs, They Are Features". In: (May 2019). arXiv: 1905.02175 [stat.ML].
[24] D. Linsley et al. "What are the Visual Features Underlying Human Versus Machine Vision?" In: ICCV. 2017. ISBN: 9781538610343. DOI: 10.1109/ICCVW.2017.331. arXiv: 1701.02704.
[25] Robert Geirhos et al. "ImageNet-Trained CNNs Are Biased Towards Texture". In: International Conference on Learning Representations (2019), pp. 1–20. arXiv: 1811.12231.
[26] Daniel L. K. Yamins et al. "Performance-optimized hierarchical models predict neural responses in higher visual cortex". In: Proceedings of the National Academy of Sciences 111.23 (2014), pp. 8619–8624. DOI: 10.1073/pnas.1403112111.
[27] Charles F. Cadieu et al. "Deep neural networks rival the representation of primate IT cortex for core visual object recognition". In: PLoS Computational Biology 10.12 (2014), e1003963.
[28] Seyed-Mahdi Khaligh-Razavi and Nikolaus Kriegeskorte. "Deep supervised, but not unsupervised, models may explain IT cortical representation". In: PLoS Computational Biology 10.11 (Nov. 2014), e1003915.
[29] Umut Güçlü and Marcel A. J. van Gerven. "Deep Neural Networks Reveal a Gradient in the Complexity of Neural Representations across the Brain's Ventral Visual Pathway". In: The Journal of Neuroscience 35.27 (2015), pp. 10005–10014. DOI: 10.1523/JNEUROSCI.5023-14.2015. arXiv: 1411.6422.
[30] Santiago A. Cadena et al. "Deep convolutional models improve predictions of macaque V1 responses to natural images". In: PLoS Computational Biology 15.4 (Apr. 2019), e1006897.
[31] Martin Schrimpf et al. "Brain-Score: Which Artificial Neural Network for Object Recognition is most Brain-Like?" Sept. 2018. URL: https://doi.org/10.1101/407007.


[32] Jonas Kubilius et al. "Brain-Like Object Recognition with High-Performing Shallow Recurrent ANNs". In: Advances in Neural Information Processing Systems 32. Ed. by H. Wallach et al. Curran Associates, Inc., 2019, pp. 12805–12816. URL: http://papers.nips.cc/paper/9441-brain-like-object-recognition-with-high-performing-shallow-recurrent-anns.pdf.
[33] Adam H. Marblestone, Greg Wayne, and Konrad P. Kording. "Toward an Integration of Deep Learning and Neuroscience". In: Frontiers in Computational Neuroscience 10 (Sept. 2016), p. 94.
[34] William Lotter, Gabriel Kreiman, and David Cox. "Deep Predictive Coding Networks for Video Prediction and Unsupervised Learning". In: (May 2016). arXiv: 1605.08104 [cs.LG].
[35] Aran Nayebi and Surya Ganguli. "Biologically inspired protection of deep networks from adversarial attacks". In: (Mar. 2017). arXiv: 1703.09202 [stat.ML].
[36] Jordan Guerguiev, Timothy P. Lillicrap, and Blake A. Richards. "Towards deep learning with segregated dendrites". In: eLife 6 (Dec. 2017).
[37] Demis Hassabis et al. "Neuroscience-Inspired Artificial Intelligence". In: Neuron 95.2 (2017), pp. 245–258. DOI: 10.1016/j.neuron.2017.06.011.
[38] Grace W. Lindsay and Kenneth D. Miller. "How biological attention mechanisms improve task performance in a large-scale visual system model". In: eLife 7 (Oct. 2018).
[39] Hanlin Tang et al. "Recurrent computations for visual pattern completion". In: Proceedings of the National Academy of Sciences 115.35 (2018), pp. 8835–8840. DOI: 10.1073/pnas.1719397115.
[40] Zhe Li et al. "Learning From Brains How to Regularize Machines". In: (Nov. 2019). arXiv: 1911.05072 [cs.LG].
[41] Hosein Hasani, Mahdieh Soleymani, and Hamid Aghajan. "Surround Modulation: A Bio-inspired Connectivity Structure for Convolutional Neural Networks". In: NeurIPS (2019), pp. 15877–15888. URL: http://papers.nips.cc/paper/9719-surround-modulation-a-bio-inspired-connectivity-structure-for-convolutional-neural-networks.
[42] Fabian H. Sinz et al. "Engineering a Less Artificial Intelligence". In: Neuron 103.6 (Sept. 2019), pp. 967–979.
[43] Anthony M. Zador. "A critique of pure learning and what artificial neural networks can learn from animal brains". In: Nature Communications 10.1 (Aug. 2019), p. 3770.
[44] J. Deng et al. "ImageNet: A large-scale hierarchical image database". In: 2009 IEEE Conference on Computer Vision and Pattern Recognition. June 2009, pp. 248–255.
[45] D. H. Hubel and T. N. Wiesel. "Receptive Fields, Binocular Interaction and Functional Architecture in the Cat's Visual Cortex". In: Journal of Physiology (1962), pp. 106–154.
[46] J. P. Jones and L. A. Palmer. "An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex". In: Journal of Neurophysiology 58.6 (1987), pp. 1233–1258. DOI: 10.1152/jn.1987.58.6.1233.
[47] E. H. Adelson and J. R. Bergen. "Spatiotemporal energy models for the perception of motion". In: Journal of the Optical Society of America A: Optics and Image Science 2.2 (1985), pp. 284–299. DOI: 10.1364/JOSAA.2.000284.
[48] David J. Heeger, E. P. Simoncelli, and J. Anthony Movshon. "Computational models of cortical visual processing". In: Proceedings of the National Academy of Sciences 93.2 (1996), pp. 623–627. DOI: 10.1073/pnas.93.2.623.
[49] M. Carandini, D. J. Heeger, and J. A. Movshon. "Linearity and normalization in simple cells of the macaque primary visual cortex". In: The Journal of Neuroscience 17.21 (Nov. 1997), pp. 8621–8644.
[50] Maximilian Riesenhuber and Tomaso Poggio. "Hierarchical models of object recognition in cortex". In: Nature Neuroscience 2.11 (1999), pp. 1019–1025. DOI: 10.1038/14819.


[51] Thomas Serre, Aude Oliva, and Tomaso Poggio. "A feedforward architecture accounts for rapid categorization". In: Proceedings of the National Academy of Sciences 104.15 (Apr. 2007), pp. 6424–6429.
[52] Nicole C. Rust et al. "Spatiotemporal Elements of Macaque V1 Receptive Fields". In: Neuron 46 (2005), pp. 945–956. DOI: 10.1016/j.neuron.2005.05.021.
[53] Brett Vintch, J. A. Movshon, and E. P. Simoncelli. "A Convolutional Subunit Model for Neuronal Responses in Macaque V1". In: Journal of Neuroscience 35.44 (2015), pp. 14829–14841. DOI: 10.1523/JNEUROSCI.2815-13.2015.
[54] Nicolas Papernot et al. "Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks". In: (Nov. 2015). arXiv: 1511.04508 [cs.CR].
[55] Mayank Singh et al. "Harnessing the Vulnerability of Latent Layers in Adversarially Trained Models". In: (May 2019). arXiv: 1905.05186 [cs.LG].
[56] Cihang Xie et al. "Feature denoising for improving adversarial robustness". In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019, pp. 501–509.
[57] Eric Wong, Leslie Rice, and J. Zico Kolter. "Fast is better than free: Revisiting adversarial training". In: (Jan. 2020). URL: https://openreview.net/forum?id=BJx040EFvH.
[58] Dan Hendrycks et al. "AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty". In: (Dec. 2019). arXiv: 1912.02781 [stat.ML].
[59] Evgenia Rusak et al. "Increasing the robustness of DNNs against image corruptions by playing the Game of Noise". In: (Jan. 2020). arXiv: 2001.06057 [cs.CV].
[60] Ali Shafahi et al. "Adversarial training for free!" In: Advances in Neural Information Processing Systems 32. Ed. by H. Wallach et al. Curran Associates, Inc., 2019, pp. 3358–3369.
[61] Eric Wong, Leslie Rice, and J. Zico Kolter. "Fast is better than free: Revisiting adversarial training". In: (Sept. 2020). URL: https://openreview.net/forum?id=BJx040EFvH.
[62] Dimitris Tsipras et al. "Robustness May Be at Odds with Accuracy". Sept. 2019. URL: https://openreview.net/forum?id=SyxAb30cY7.
[63] Yash Sharma and Pin-Yu Chen. "Attacking the Madry Defense Model with L1-based Adversarial Examples". In: (Oct. 2017). arXiv: 1710.10733 [stat.ML].
[64] Lukas Schott et al. "Towards the first adversarially robust neural network model on MNIST". In: (May 2018). arXiv: 1805.09190 [cs.CV].
[65] Cihang Xie et al. "Mitigating Adversarial Effects Through Randomization". Feb. 2018. URL: https://openreview.net/forum?id=Sk9yuql0Z.
[66] Adnan Siraj Rakin, Zhezhi He, and Deliang Fan. "Parametric Noise Injection: Trainable Randomness to Improve Deep Neural Network Robustness against Adversarial Attack". In: (Nov. 2018). arXiv: 1811.09310 [cs.LG].
[67] Joonho Lee et al. "ProbAct: A Probabilistic Activation Function for Deep Neural Networks". In: (May 2019). arXiv: 1905.10761 [cs.LG].
[68] Anish Athalye, Nicholas Carlini, and David Wagner. "Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples". In: (Feb. 2018). arXiv: 1802.00420 [cs.LG].
[69] Alex Krizhevsky. "Learning Multiple Layers of Features from Tiny Images". In: (Apr. 2009). URL: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.222.9220&rep=rep1&type=pdf.
[70] Adam Paszke et al. "Automatic differentiation in PyTorch". In: (Oct. 2017). URL: https://openreview.net/forum?id=BJJsrmfCZ.
[71] Saining Xie et al. "Aggregated Residual Transformations for Deep Neural Networks". In: (Nov. 2016). arXiv: 1611.05431 [cs.CV].
[72] Gao Huang et al. "Densely Connected Convolutional Networks". In: (Aug. 2016). arXiv: 1608.06993 [cs.CV].
[73] Forrest N. Iandola et al. "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size". In: (Feb. 2016). arXiv: 1602.07360 [cs.CV].
[74] Ningning Ma et al. "ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design". In: (July 2018). arXiv: 1807.11164 [cs.CV].
[75] Mingxing Tan et al. "MnasNet: Platform-Aware Neural Architecture Search for Mobile". In: (July 2018). arXiv: 1807.11626 [cs.CV].


[76] Logan Engstrom et al. Robustness (Python Library). 2019. URL: https://github.com/MadryLab/robustness.
[77] Jeremy Freeman et al. "A functional and perceptual signature of the second visual area in primates". In: Nature Neuroscience 16.7 (July 2013), pp. 974–981.
[78] Russell L. De Valois, E. W. Yund, and Norva Hepler. "The orientation and direction selectivity of cells in macaque visual cortex". In: Vision Research 22 (1982), pp. 531–544.
[79] Russell L. De Valois, Duane G. Albrecht, and Lisa G. Thorell. "Spatial Frequency Selectivity of Cells in Macaque Visual Cortex". In: Vision Research 22 (1982), pp. 545–559.
[80] Dario L. Ringach. "Spatial Structure and Symmetry of Simple-Cell Receptive Fields in Macaque Primary Visual Cortex". In: Journal of Neurophysiology 88 (2002), pp. 455–463.
[81] Yasmine El-Shamayleh et al. "Visual response properties of V1 neurons projecting to V2 in macaque". In: Journal of Neuroscience 33.42 (2013), pp. 16594–16605. DOI: 10.1523/JNEUROSCI.2753-13.2013.
[82] W. R. Softky and C. Koch. "The highly irregular firing of cortical cells is inconsistent with temporal integration of random EPSPs". In: Journal of Neuroscience 13.1 (1993), pp. 334–350. DOI: 10.1523/jneurosci.13-01-00334.1993.
[83] Diederik P. Kingma and Max Welling. "Auto-Encoding Variational Bayes". In: (Dec. 2013). arXiv: 1312.6114v10 [stat.ML].
[84] Nicholas Carlini et al. "On Evaluating Adversarial Robustness". In: (Feb. 2019). arXiv: 1902.06705v2 [cs.LG].
[85] Nikolaus Kriegeskorte, Marieke Mur, and Peter Bandettini. "Representational similarity analysis - connecting the branches of systems neuroscience". In: Frontiers in Systems Neuroscience 2 (Nov. 2008), p. 4.
[86] R. Rajalingham, K. Schmidt, and James J. DiCarlo. "Comparison of Object Recognition Behavior in Human and Monkey". In: Journal of Neuroscience 35.35 (2015), pp. 12127–12136. DOI: 10.1523/JNEUROSCI.0573-15.2015.
[87] Cristopher M. Niell and Michael P. Stryker. "Highly selective receptive fields in mouse visual cortex". In: The Journal of Neuroscience 28.30 (July 2008), pp. 7520–7536. DOI: 10.1523/JNEUROSCI.0623-08.2008.
[88] Pieter R. Roelfsema. "Cortical algorithms for perceptual grouping". In: Annual Review of Neuroscience 29 (2006), pp. 203–227. DOI: 10.1146/annurev.neuro.29.051605.112939.
[89] Alexander Berardino et al. "Eigen-Distortions of Hierarchical Representations". In: (Oct. 2017). arXiv: 1710.02266 [cs.CV].
[90] Jack Lindsey et al. "A Unified Theory of Early Visual Representations from Retina to Cortex through Anatomically Constrained Deep CNNs". In: ICLR (2019), pp. 1–17. DOI: 10.1101/511535. arXiv: 1901.00945. URL: http://arxiv.org/abs/1901.00945.
[91] Daniel L. K. Yamins and James J. DiCarlo. "Using goal-driven deep learning models to understand sensory cortex". In: Nature Neuroscience 19.3 (Mar. 2016), pp. 356–365.
[92] Alexander Mathis et al. "DeepLabCut: markerless pose estimation of user-defined body parts with deep learning". In: Nature Neuroscience (2018). DOI: 10.1038/s41593-018-0209-y. arXiv: 1804.03142.
[93] Yann LeCun, Yoshua Bengio, and Geoffrey E. Hinton. "Deep learning". In: Nature 521.7553 (2015), pp. 436–444. DOI: 10.1038/nature14539.
[94] T. Marques, M. Schrimpf, and J. J. DiCarlo. "Hierarchical neural network models that more closely match primary visual cortex tend to better explain higher level visual cortical responses". In: Cosyne. 2020.
[95] Alexey Kurakin, Ian Goodfellow, and Samy Bengio. "Adversarial examples in the physical world". In: (July 2016). arXiv: 1607.02533 [cs.CV].
[96] Maria-Irina Nicolae et al. "Adversarial Robustness Toolbox v1.2.0". In: CoRR 1807.01069 (2018). URL: https://arxiv.org/pdf/1807.01069.


[97] Florian Tramer et al. "On Adaptive Attacks to Adversarial Example Defenses". In: (Feb. 2020). arXiv: 2002.08347 [cs.LG].
[98] Jonas Rauber et al. "Foolbox Native: Fast adversarial attacks to benchmark the robustness of machine learning models in PyTorch, TensorFlow, and JAX". In: Journal of Open Source Software 5.53 (2020), p. 2607. DOI: 10.21105/joss.02607. URL: https://doi.org/10.21105/joss.02607.
[99] Matthew D. Zeiler and Rob Fergus. "Visualizing and Understanding Convolutional Networks". In: arXiv (Nov. 2013). DOI: 10.1007/978-3-319-10590-1_53. arXiv: 1311.2901. URL: http://arxiv.org/abs/1311.2901.
[100] Chris Olah et al. "An Overview of Early Vision in InceptionV1". In: Distill (2020). https://distill.pub/2020/circuits/early-vision. DOI: 10.23915/distill.00024.002.
[101] Ian Nauhaus et al. "Orthogonal micro-organization of orientation and spatial frequency in primate primary visual cortex". In: Nature Neuroscience 15.12 (2012), pp. 1683–1690. DOI: 10.1038/nn.3255.


Supplementary Material

A Neural Explained Variance

We evaluated how well responses to given images in candidate CNNs explain responses of single V1 neurons to the same images, using a standard neural predictivity methodology based on partial least squares (PLS) regression [31, 32]. We used a neural dataset obtained by extracellular recordings from 102 single-units while presenting 450 different images for 100 ms, 20 times each, spanning 4° of the visual space. Results from this dataset were originally published in a study comparing V1 and V2 responses to naturalistic textures [77]. To avoid over-fitting, for each CNN we first chose its best V1 layer on a subset of the dataset (n=135 images) and reported the final neural explained variance calculated for the chosen layer on the remaining images of the dataset (n=315 images). Model units were mapped to each V1 neuron linearly using a PLS regression model with 25 components. The mapping was performed for each neuron using 90% of the image responses in the respective dataset and tested on the remaining held-out 10%, in a 10-fold cross-validation strategy. We reported neural explained variance values normalized by the neuronal internal consistency. We calculated V1 neural explained variance for a large pool of CNNs (n=30 models).
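To make the mapping step concrete, the following is a minimal Python/scikit-learn sketch of the procedure described above; function and variable names are illustrative rather than the authors' code, and the per-fold bookkeeping is simplified (e.g., a single joint PLS fit across neurons instead of a per-neuron fit).

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import KFold

def neural_predictivity(features, responses, consistency, n_components=25):
    """features: (n_images, n_model_units); responses: (n_images, n_neurons);
    consistency: (n_neurons,) trial-to-trial internal consistency."""
    n_images, n_neurons = responses.shape
    r2 = np.zeros(n_neurons)
    folds = KFold(n_splits=10, shuffle=True, random_state=0)
    for train, test in folds.split(np.arange(n_images)):
        # Map model units to neurons with a 25-component PLS regression
        pls = PLSRegression(n_components=n_components, scale=False)
        pls.fit(features[train], responses[train])
        pred = pls.predict(features[test])
        for n in range(n_neurons):
            r = np.corrcoef(pred[:, n], responses[test, n])[0, 1]
            r2[n] += r ** 2 / folds.get_n_splits()
    # Normalize each neuron's explained variance by its internal consistency
    return r2 / consistency
```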

When using CNNs in a computer vision task such as object recognition in ImageNet, their field-of-view (in degrees) is not defined, since the only relevant input property is the image resolution (in pixels). However, when using a CNN to model the ventral stream, the visual spatial extent of the model's input is of key importance to ensure that it is correctly mapped to the data it is trying to explain. We assigned a spatial extent of 8° to all models, since this value has been previously used in studies benchmarking CNNs as models of the primate ventral stream [31, 94] and is consistent with the results in Cadena et al. [30].


B Image Perturbations

B.1 White Box Adversarial Attacks

For performing white box adversarial attacks, we used untargeted projected gradient descent (PGD) [17] (also referred to as the Basic Iterative Method [95]) with L∞, L2, and L1 norm constraints. Given an image x, this method uses the gradient of the loss to iteratively construct an adversarial image x_adv which maximizes the model loss within an Lp bound around x. Formally, in the case of an L∞ constraint, PGD iteratively computes x_adv as

$$x^{t+1} = \mathrm{Proj}_{x+\mathcal{S}}\left(x^t + \alpha \, \mathrm{sgn}\left(\nabla_{x^t} L(\theta, x^t, y)\right)\right)$$

where x is the original image, and the Proj operator ensures that the final computed adversarial image x_adv is constrained to the space x + S, here the L∞ ball around x. In the case of L1 or L2 constraints, at each iteration ∇_{x^t} L(θ, x^t, y) is scaled to have an L1 or L2 norm of α, and the Proj operator ensures that the final x_adv is within an L1 or L2 norm ball around x.

For our white box benchmarks, we used δ∞ ∈ [1/1020, 1/255], δ₂ ∈ [0.15, 0.6], and δ₁ ∈ [40, 160] constraints, where δ = x − x_adv, for a total of six different attack variants, with two strengths under each norm constraint. We arrived at these strengths by calibrating the lower-strength attack for each norm to give a reasonable degree of variance in normally trained CNNs. Because normally trained CNNs are extremely sensitive to adversarial perturbations, this resulted in perturbations well below the level of perceptibility, and in most cases, particularly with the L∞ constraint, perturbations that would not even be rendered differently by a monitor for display. We set the high attack strength at 4 times the low value, which brought standard ImageNet-trained CNNs to nearly chance performance. Still, even this higher perturbation strength remained practically imperceptible (Fig. B3, left). Each perturbation was computed with 64 PGD iterations and a step size α = ε/32, where ε = ‖δ‖p, for the same 5000 images from the ImageNet validation set on a per-model basis, and final top-1 accuracy was reported. The Adversarial Robustness Toolkit [96] was used for computing the attacks.
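For reference, a minimal PyTorch sketch of the L∞-constrained PGD loop above (the actual attacks were run with the Adversarial Robustness Toolkit; names here are illustrative):

```python
import torch
import torch.nn.functional as F

def pgd_linf(model, x, y, eps=1 / 255, n_iter=64):
    """Untargeted PGD under an L-infinity constraint; x in [0, 1]."""
    alpha = eps / 32                      # step size used in the benchmarks above
    x_adv = x.clone().detach()
    for _ in range(n_iter):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()       # ascend the loss
            x_adv = x + (x_adv - x).clamp(-eps, eps)  # project onto x + S
            x_adv = x_adv.clamp(0, 1)                 # keep a valid image
    return x_adv.detach()
```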

While adversarial formulations using Lagrangian relaxation, like the C&W attack [7] and the EAD attack [8], are known to be more effective than PGD at finding minimal perturbations under L2 and L1 constraints, these formulations involve computationally expensive line searches, often requiring thousands of iterations to converge, making them computationally difficult to scale to 5000 ImageNet samples for a large number of models. Furthermore, because some of the models we tested operate with stochasticity during inference, we found that other state-of-the-art attacks formulated to efficiently find minimal perturbation distances [9, 10] were generally difficult to tune, as their success becomes stochastic as they near the decision boundary. Thus, we proceeded with the PGD formulation, as we found it to be the most reliable, computationally tractable, and conceptually simple to follow. We share our model and weights, and encourage anyone interested to attack our model to verify our results.

Although the stochastic models were implemented with the reparameterization trick for random sampling, which leaves the gradients intact [83], we performed several sanity checks to make sure that the white box attacks were effectively implemented. Following the advice of Athalye et al. [68] and Carlini et al. [84], we observed that in all of the white box attacks, increasing the attack strength decreased model performance, eventually bringing accuracy to zero (Fig. B1), and additionally that for any given constraint and attack strength, increasing PGD iterations from one to many increased attack effectiveness (Fig. B2). Furthermore, to be extra cautious, we followed the method Athalye et al. [68] used to break Stochastic Activation Pruning [19] and replaced ∇_x f(x) with $\sum_{i=1}^{k} \nabla_x f(x)$, effectively performing Monte Carlo (MC) sampling of the loss gradient at every PGD iteration. While at high perturbation strength even k = 1 was sufficient to bring our model's accuracy to zero, we found that k = 10 generally achieved higher attack success at intermediate and large perturbation strengths (Table B1), and thus we used this approach for all attacks on stochastic models.
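A sketch of this Monte Carlo gradient estimate (for a sign-based update, averaging over k samples and summing are equivalent; names are illustrative):

```python
import torch
import torch.nn.functional as F

def mc_gradient(model, x_adv, y, k=10):
    """Average the loss gradient over k stochastic forward passes (PGD-MC)."""
    x_adv = x_adv.clone().detach().requires_grad_(True)
    grad = torch.zeros_like(x_adv)
    for _ in range(k):
        loss = F.cross_entropy(model(x_adv), y)  # fresh noise drawn each pass
        grad = grad + torch.autograd.grad(loss, x_adv)[0]
    return grad / k
```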



Figure B1: VOneResNet50 improves robustness to white box attacks over a wide range of perturbation strengths. White box accuracy vs. perturbation strength curves for PGD attacks with L∞, L2, and L1 constraints, for ResNet50, VOneResNet50, AT L∞, and ANT3×3+SIN. Adding the VOneBlock to ResNet50 consistently improves robustness under all constraints and attack strengths: VOneResNet50 can withstand perturbations roughly 4 times higher than the standard ResNet50 for the same performance level. Remarkably, VOneResNet50 outperformed the adversarially trained model (AT L∞) over a considerable range of perturbation strengths, particularly with the L1 constraint. Adversarial noise training combined with Stylized ImageNet (ANT3×3+SIN) shows virtually no improvement in robustness to white box attacks. In all white box attacks and for all models, increasing the attack strength decreased performance, eventually bringing accuracy to zero.


Figure B2: White box attack effectiveness converges for a large number of PGD iteration steps. White box accuracy vs. iteration curves for PGD attacks with δ∞ = 1/255, δ₂ = 0.6, and δ₁ = 160 constraints. Increasing the number of PGD iteration steps makes the attack more effective, generally converging after 32 iterations.

Table B1: Improving the effectiveness of white box attacks with Monte Carlo sampling. White box attack accuracy (top-1) under different norm constraints, and white box overall mean, using standard PGD and PGD with Monte Carlo sampling of the loss gradients (PGD-MC, k = 10) on VOneResNet50, for 64 PGD iteration steps. For all stochastic models, we used PGD with Monte Carlo sampling in the white box attacks throughout the paper. Values are mean and SD (n=3 seeds).

Attack type    | Mean [%] | L1 ε=40 [%] | L1 ε=160 [%] | L2 ε=0.15 [%] | L2 ε=0.6 [%] | L∞ ε=1/1020 [%] | L∞ ε=1/255 [%]
Standard PGD   | 52.8±0.2 | 64.4±0.4    | 42.2±0.1     | 65.2±0.3      | 45.3±0.4     | 63.4±0.4        | 36.4±0.4
PGD-MC (k=10)  | 51.1±0.4 | 64.2±0.5    | 40.4±0.5     | 65.3±0.4      | 44.3±0.3     | 61.9±0.4        | 30.3±0.5


Although VOneResNet50 was not designed to explicitly improve robustness to white box attacks, it offers substantial gains over the standard ResNet50 model (Table 1). A closer inspection of the white box accuracy vs. perturbation strength curves (Fig. B1) reveals that VOneResNet50 can withstand perturbations roughly 4 times higher than the standard ResNet50 for the same performance level (the blue curve is shifted to the right of the black curve by a factor of 4). To help visualize how this difference translates into actual images, we show in Fig. B3 examples of white box adversarial images for both ResNet50 and VOneResNet50, at the perturbation strengths that bring the respective models to nearly chance performance.


Figure B3: Comparison of adversarial images for ResNet50 and VOneResNet50. Visualization of white box attacks under the 3 different norm constraints (L∞, L2, and L1), at the perturbation strength that brings the model to nearly chance performance (Fig. B1). Left: ResNet50; right: VOneResNet50. For each attack, the perturbation is shown on the left (scaled for visualization) and the perturbed image appears next to it. Perturbation strengths and classes of the adversarial images are shown over the images.


B.2 Additional attack controls

The field of adversarial robustness is littered with examples of proposed defense methods that were quickly defeated with some minor attack optimizations or adaptations [68, 84, 97]. For this reason, we took great care to verify our claims of improved robustness of the VOneNets. In addition to the implementation details and controls described in the previous section, we validated our results by further optimizing our attacks on both the standard stochastic VOneResNet50 and the VOneResNet50 with stochasticity removed during the attack and inference.

First, on the standard stochastic VOneResNet50, we ensured that our attack hyper-parameters, including step size (α), number of gradient samples (k), and PGD iterations, are sufficient for evaluating the network's robustness. For PGD with δ∞ = 1/255 and 64 iterations, we performed a large grid search over α and k, observing only a marginal increase in attack effectiveness (Fig. B4). For the most effective step size (α = ε/8), increasing the number of gradient samples beyond k = 16 no longer decreases accuracy under the attack. At k = 128, VOneResNet50 accuracy is only reduced from 29.15% to 26.0%, remaining a large margin above the ResNet50 accuracy of 0.8%. Additionally, increasing the number of PGD iterations all the way up to 1000 (with α = ε/8 and k = 16) only further reduced performance by 0.5%. Finally, using 5-10 random restarts did not improve the attack beyond the most effective k and α settings.


Figure B4: Further attack optimization does not overturn the results. White box accuracy for PGD with δ∞ = 1/255 and 64 iterations, on 500 images. A large grid search over step size and number of gradient samples only marginally improves attack effectiveness and only marginally reduces model performance. Lines represent mean and shaded areas SD (n=3 seeds).

Second, while boundary attacks are not ideal against the standard stochastic VOneResNet50, due to the shifting decision boundary, we investigated VOneResNet50 without stochasticity during attack/inference using the Brendel and Bethge (B&B) attack [10], a powerful white box adversary that aims to find minimal adversarial perturbations along the decision boundary of the network. We reasoned that if the B&B attack offers similar results to those observed using PGD, this is a good sanity check that our observed robustness is reliable and not a result of masked gradients around the input or a quirk of PGD. Using the FoolBox [98] implementation of the L∞ B&B attack, initialized with 500 successful adversarial images generated using PGD with δ∞ = 8/255, we see nearly identical scores compared to standard PGD with optimal hyper-parameters at δ∞ = 1/1020 and δ∞ = 1/255.
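As a rough sketch of this control, the following shows how such an attack could be set up with Foolbox 3; the class name and call signature reflect our reading of the Foolbox API, and the initialization via starting_points with precomputed PGD adversarials is our assumption about a reasonable setup, not the exact code used here.

```python
import foolbox as fb

# model: deterministic VOneResNet50 variant (stochasticity removed), inputs in [0, 1]
fmodel = fb.PyTorchModel(model.eval(), bounds=(0, 1))
attack = fb.attacks.LinfinityBrendelBethgeAttack(steps=1000)
# starting_points: successful PGD adversarials used to seed the boundary search
raw, clipped, success = attack(fmodel, images, labels,
                               epsilons=1 / 1020,
                               starting_points=pgd_adversarials)
print(f"attack success rate: {success.float().mean():.3f}")
```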

We conclude that these additional controls do not qualitatively change our reported results and providefurther evidence that the observed improved robustness of the VOneNets is real and not an artifact


B.3 Common Image Corruptions

For evaluating model robustness to common image corruptions, we used ImageNet-C, a dataset publicly available at https://github.com/hendrycks/robustness [13]. It consists of 15 different types of corruptions, each at five levels of severity, for a total of 75 different perturbations (Fig. B5). The corruption types fall into four categories: noise, blur, weather, and digital effects. Noise includes Gaussian noise, shot noise, and impulse noise; blur includes motion blur, defocus blur, zoom blur, and glass blur; weather includes snow, frost, fog, and brightness; digital effects include JPEG compression, elastic transformations, contrast, and pixelation. For every corruption type and at every severity, we tested on all 50000 images from the ImageNet validation set. We note that these images have been JPEG compressed by the original authors for sharing, which is known to have a minor but noticeable effect on final network performance.


Figure B5: Common image corruptions at intermediate severity. Visualization of all 15 common image corruption types, evaluated at severity 3. First row: original image followed by the noise corruptions; second row: blur corruptions; third row: weather corruptions; fourth row: digital corruptions.



Figure B6: VOneResNet50 improves robustness to noise, blur, and digital corruptions. Corruption accuracy vs. severity curves for the four categories of common image corruptions: noise, blur, weather, and digital. Adding the VOneBlock to ResNet50 improves robustness to all corruption categories except weather. The adversarially trained model (AT L∞) is considerably worse on common image corruptions. Adversarial noise combined with Stylized ImageNet training (ANT3×3+SIN) is consistently more robust across all corruption categories. For ResNet50 and VOneResNet50, error-bars represent SD (n=3 seeds).

Table B2: Detailed common image corruption results. Adding the VOneBlock to ResNet50 improves robustness to several of the corruption types. However, for others, particularly fog and contrast, robustness decreases. Interestingly, these are also the corruption types on which the adversarially trained model (AT L∞) does worst. Adversarial noise combined with Stylized ImageNet training (ANT3×3+SIN) is the most robust model for most of the corruption types. For ResNet50 and VOneResNet50, values are mean and SD (n=3 seeds).

Noise and blur corruptions:

Model           | Gaussian [%] | Shot [%] | Impulse [%] | Defocus [%] | Glass [%] | Motion [%] | Zoom [%]
ResNet50 [4]    | 29.6±1.1     | 27.9±0.9 | 24.5±1.2    | 38.8±0.7    | 26.1±0.6  | 37.9±0.4   | 34.5±0.7
AT L∞ [76]      | 23.6         | 22.3     | 14.8        | 24.4        | 34.3      | 33.2       | 34.9
ANT3×3+SIN [59] | 61.6         | 60.7     | 59.9        | 41.5        | 38.6      | 46.0       | 37.3
VOneResNet50    | 34.6±1.4     | 33.4±1.2 | 31.9±1.5    | 37.8±0.1    | 35.7±0.1  | 37.4±0.2   | 34.0±0.2

Weather and digital corruptions:

Model           | Snow [%] | Frost [%] | Fog [%]  | Bright [%] | Contrast [%] | Elastic [%] | Pixelate [%] | JPEG [%]
ResNet50        | 30.1±0.5 | 36.7±0.4  | 43.6±0.8 | 66.8±0.2   | 38.8±0.5     | 44.8±0.3    | 47.0±2.7     | 55.1±1.0
AT L∞           | 29.5     | 31.2      | 8.4      | 54.1       | 10.2         | 48.4        | 56.3         | 58.7
ANT3×3+SIN      | 47.7     | 52.2      | 54.9     | 68.2       | 49.3         | 52.7        | 58.9         | 59.3
VOneResNet50    | 25.2±0.5 | 36.8±0.4  | 30.1±0.5 | 62.4±0.3   | 28.5±0.5     | 48.7±0.1    | 63.3±0.2     | 61.0±0.3


B.4 Other defense methods

We compared the performance of VOneResNet50 on the described computer vision benchmarks with two training-based defense methods using the ResNet50 architecture: adversarial training with a δ∞ = 4/255 constraint (AT L∞) [76], and adversarial noise with Stylized ImageNet training (ANT3×3+SIN) [59].

The AT L∞ model was downloaded from the publicly available adversarially trained models at https://github.com/MadryLab/robustness [76]. This model was trained following the adversarial training paradigm of Madry et al. [17] to solve the min-max problem

$$\min_\theta \rho(\theta), \quad \text{where} \quad \rho(\theta) = \mathbb{E}_{(x,y)\sim\mathcal{D}}\left[\max_{\delta\in\mathcal{S}} L(\theta, x+\delta, y)\right] \quad (1)$$

Here, the goal is to learn parameters θ minimizing the loss L for training images x and labels y drawn from D, while perturbing x with δ ∈ S to maximally increase the loss. In this case, D is ImageNet, and PGD with δ∞ = 4/255 is used to find the perturbation maximizing the loss for a given (x, y) pair. The δ∞ = 4/255 constraint model was selected for comparison because it had the best performance on our conglomerate benchmark of adversarial attacks and common image corruptions.
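Schematically, one training step of this min-max objective approximates the inner maximization with PGD; the sketch below reuses the hypothetical pgd_linf helper from Section B.1 and assumes a standard model, optimizer, and train_loader already exist.

```python
# One pass of PGD-based adversarial training (Eq. 1), schematically
for x, y in train_loader:
    x_adv = pgd_linf(model, x, y, eps=4 / 255)  # inner max: find a worst-case delta
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)     # outer min: update theta on x + delta
    loss.backward()
    optimizer.step()
```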

The ANT3×3+SIN model was obtained from https://github.com/bethgelab/game-of-noise. Recently, Rusak et al. showed that training a ResNet50 model with several types of additive input noise improved its robustness to common image corruptions [59]. Inspired by this observation, the authors trained a model while optimizing a noise distribution that maximally confuses it. Similarly to standard adversarial training, this results in solving a min-max problem

$$\min_\theta \max_\phi \, \mathbb{E}_{(x,y)\sim\mathcal{D}} \, \mathbb{E}_{\delta\sim p_\phi(\delta)}\left[L(\theta, x+\delta, y)\right] \quad (2)$$

where p_φ(δ) is the maximally confusing noise distribution. The main difference from regular adversarial training in Equation 1 is that while in the former δ is optimized directly, here δ is found by optimizing a constrained distribution with local spatial correlations. Complementing the adversarial noise training with Stylized ImageNet training [25] produced the model with the current best robustness on the ImageNet-C benchmark as a standalone method.


C VOneNet implementation details

C.1 Convolutional layer

The convolutional layer of the VOneBlock is a mathematically parameterized Gabor Filter Bank (GFB). We set the stride of the GFB to four, yielding a 56×56 spatial map of activations. Since the number of channels in most CNNs' first convolution is relatively small (64 in the architectures adapted), we used a larger number in the VOneBlock so that the Gabors would cover the large parameter space and better approximate primate V1. We set the main VOneNet models to contain 512 channels, equally split between simple and complex cells (see Fig. C2A for an analysis of how the number of channels affects performance). Each channel in the GFB convolves a single color channel from the input image.

The Gabor function consists of a two-dimensional grating with a Gaussian envelope and is described by the following equation:

$$G_{\theta,f,\phi,n_x,n_y}(x, y) = \frac{1}{2\pi\sigma_x\sigma_y} \exp\!\left[-0.5\left(\frac{x_{rot}^2}{\sigma_x^2} + \frac{y_{rot}^2}{\sigma_y^2}\right)\right] \cos(2\pi f x_{rot} + \phi) \quad (3)$$

where

$$x_{rot} = x\cos(\theta) + y\sin(\theta), \qquad y_{rot} = -x\sin(\theta) + y\cos(\theta) \quad (4)$$

$$\sigma_x = \frac{n_x}{f}, \qquad \sigma_y = \frac{n_y}{f} \quad (5)$$

x_rot and y_rot are the orthogonal and parallel orientations relative to the grating, θ is the angle of the grating orientation, f is the spatial frequency of the grating, φ is the phase of the grating relative to the Gaussian envelope, and σx and σy are the standard deviations of the Gaussian envelope orthogonal and parallel to the grating, which can be defined as multiples (nx and ny) of the grating cycle (the inverse of the frequency).

Although the Gabor filter greatly reduces the number of parameters defining the linear spatial component of V1 neurons, it still has five parameters per channel. Fortunately, there is a vast literature in neurophysiology with detailed characterizations of primate V1 response properties, which can be used to constrain these parameters. To instantiate a VOneBlock with C_V1 channels, we sampled C_V1 values for each of the parameters according to empirically constrained distributions (Table C1). Due to the resolution of the input images, we limited the ranges of spatial frequencies (f < 5.6 cpd) and of the number of cycles (n > 0.1).

Table C1: Empirical distributions used to sample the GFB parameters in the VOneBlock.

Parameter | Reference
θ         | De Valois et al. [78], Fig. 5: orientation tuning (foveal)
f         | De Valois et al. [79], Fig. 6: peak spatial frequency (X cells, foveal)
φ         | Uniform distribution ∈ [0, 360] deg
nx, ny    | Ringach [80], Fig. 4: (nx, ny) joint distribution

Critical to the correct implementation of the biologically-constrained parameters of the GFB is the choice of the model's field of view in degrees. As previously mentioned in Section A, we used 8 deg as the input spatial extent for all CNN models. It is important to note that the set of spatial filters in the VOneBlock's GFB differs considerably from those in the first convolution of most CNNs (Fig. C1). While standard CNNs learn filters that resemble Gabors in their input layer [1, 99, 100], due to the limited span of their kernels, these do not vary significantly in size and spatial frequency. V1 neurons, on the other hand, are known to exhibit a wide range of receptive field properties. This phenomenon is captured in the VOneBlock, with the spatial frequencies and receptive field sizes of Gabors ranging over more than one order of magnitude. Due to this high variability, we set the convolution kernel to be 25×25 px, which is considerably larger than those in standard CNNs (Fig. C1).


Figure C1: Comparison between filters in the first convolution of standard CNNs and the VOneBlock. Example filters from the first convolution in the VOneBlock, ResNet50, CORnet-S, and AlexNet (from left to right). VOneBlock filters are all parameterized as Gabors, varying considerably in size and spatial frequency. Standard CNNs have some filters with shapes other than Gabors, particularly center-surround, and are limited in size by their small kernels. Kernel sizes are 25 px, 7 px, 7 px, and 11 px for VOneBlock, ResNet50, CORnet-S, and AlexNet, respectively.


C.2 Nonlinear layer

The VOneBlock's nonlinear layer applies one of two different nonlinearities to each channel depending on its cell type: a rectified linear transformation for simple cells (6), and the spectral power of a quadrature phase-pair (7) for complex cells:

S^{\mathrm{nl}}_{\theta,f,\phi,n_x,n_y} = \begin{cases} S^{\mathrm{l}}_{\theta,f,\phi,n_x,n_y} & \text{if } S^{\mathrm{l}}_{\theta,f,\phi,n_x,n_y} > 0 \\ 0 & \text{otherwise} \end{cases} \qquad (6)

C^{\mathrm{nl}}_{\theta,f,\phi,n_x,n_y} = \frac{1}{\sqrt{2}} \sqrt{\left(C^{\mathrm{l}}_{\theta,f,\phi,n_x,n_y}\right)^2 + \left(C^{\mathrm{l}}_{\theta,f,\phi+\pi/2,n_x,n_y}\right)^2} \qquad (7)

where S^l and S^nl are the linear and nonlinear responses of a simple neuron, and C^l and C^nl are the same for a complex neuron.
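A minimal PyTorch sketch of Equations 6 and 7, assuming the GFB output has been split into simple-cell responses `s` and quadrature phase-pairs (`q0`, `q1`, the second filtered with phase φ + π/2) for complex cells:

```python
# Hedged sketch of the nonlinear layer (Equations 6 and 7).
import math
import torch

def simple_cells(s: torch.Tensor) -> torch.Tensor:
    return torch.relu(s)                    # Equation 6: half-wave rectification

def complex_cells(q0: torch.Tensor, q1: torch.Tensor) -> torch.Tensor:
    # Equation 7: energy of the quadrature phase-pair, scaled by 1/sqrt(2)
    return torch.sqrt(q0**2 + q1**2) / math.sqrt(2.0)
```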

C.3 Neuronal stochasticity generator

In awake monkeys, spike trains of V1 neurons are approximately Poisson, i.e., the variance and mean of spike counts in a given time-window over a set of repetitions are roughly the same [82]. We incorporated stochasticity into the VOneBlock to emulate this property of neuronal responses. Since the Poisson distribution is not continuous, it breaks the gradients in a white box attack, giving a false sense of robustness [68]. To avoid this and facilitate the evaluation of the model's real robustness, our neuronal stochasticity generator as implemented uses a continuous second-order approximation of Poisson noise, adding Gaussian noise with variance equal to the activation:

R_s \sim \mathcal{N}(\mu = R_{ns}, \; \sigma^2 = R_{ns}) \qquad (8)

where R_{ns} and R_s are the non-stochastic and stochastic responses of a neuron.

To approximate the levels of neuronal stochasticity of primate V1 neurons, it is critical that the VOneBlock activations are in the same range as V1 neuronal responses (number of spike counts in a given time-window). Thus, we applied an affine transformation to the activations so that both the mean stimulus response and the mean baseline activity are the same as those of a population of V1 neurons (Table C2 shows the mean responses and spontaneous activity of V1 neurons measured in different time-windows).

Table C2: Mean stimulus responses and spontaneous activity of a population of V1 neurons for different time-windows.

Time-window [ms]   Stimulus response [spikes]   Spontaneous activity [spikes]
25                 0.324                        0.073
50                 0.655                        0.145
100                1.05                         0.29
200                1.38                         0.58

Since the exact time-window over which V1 responses are integrated during visual perception is still an open question, we considered a time-window of 25 ms (Fig. C2 B). To keep the outputs of the VOneBlock in a range that does not deviate considerably from the typical range of activations in CNNs, so that the model can be efficiently trained using standard training parameters, we applied the inverse transformation to scale the outputs back to their original range after the stochastic layer.
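Putting Equation 8 and this scaling together, a minimal PyTorch sketch of the stochastic layer; `scale` and `bias` are illustrative stand-ins for the affine transformation fit to the Table C2 values, and the noise stays on during both training and inference:

```python
# Hedged sketch of the V1 stochasticity generator with affine (un)scaling.
import torch

def v1_stochasticity(x: torch.Tensor, scale: float, bias: float) -> torch.Tensor:
    r = (x * scale + bias).clamp_min(0.0)  # map to the V1 spike-count range
    eps = torch.randn_like(r)              # reparameterization: gradients intact
    r_noisy = r + eps * torch.sqrt(r)      # N(mu = r, var = r), Equation 8
    return (r_noisy - bias) / scale        # undo the scaling for the back-end
```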

C.4 Hyper-parameters and other design choices

We developed VOneNets from three different standard CNN architectures: ResNet50, CORnet-S, and AlexNet. As previously mentioned, we replaced the first block of each architecture with the VOneBlock and the transition layer to create the respective VOneNet. Table C3 lists the layers that were removed for each base architecture. Except for CORnet-S, the layers removed only contained a single


convolution, nonlinearity, and maxpool. Since CORnet-S already had a set of layers pre-committed to V1, we replaced them with the VOneBlock. The torchvision implementation of AlexNet contains a combined stride of eight in the removed layers (four in the convolution and two in the maxpool), followed by a stride of one in the second convolution (outside of the removed layers). To more easily adapt it to a VOneBlock with a stride of four, we slightly modified AlexNet's architecture so that it had strides of two in these three layers (first convolution, first maxpool, and second convolution). The results shown in Fig. 2 were obtained using this modified architecture.

Table C3: Layers removed from each base CNN architecture when creating the VOneNets.

Architecture   Layers removed
AlexNet        Conv, ReLU, MaxPool
ResNet50       Conv, Batch-Norm, ReLU, MaxPool
CORnet-S       Conv, Batch-Norm, ReLU, MaxPool, Conv, Batch-Norm, ReLU

The VOneNet architecture was designed to have as few hyper-parameters as possible. When possible, these were constrained either by the base CNN architecture or by neurobiological data: most parameters of the GFB were instantiated by sampling from neuronal distributions of primate V1 neurons; the kernel size was set to 25×25 px to capture the high variability in Gabor sizes; like all other CNN models, the spatial extent of the field of view was set to 8 deg (Section A). Nevertheless, there were two hyper-parameters where the choice was rather arbitrary: the number of channels in the VOneBlock, and the time-window for integrating V1 responses used to scale the activations prior to the stochasticity generator.

Since primate V1 has neurons tuned to a wide range of spatial frequencies for each orientation and at any given location of the visual space [101], we expected that a large number of channels would be required to cover all of the combinations of Gabor parameters. For the VOneResNet50 architecture, we varied the number of channels in the VOneBlock between 64 and 1024, equally split between simple and complex cells, and measured the performance of the several models after training (Fig. C2 A). Clean ImageNet and common image corruption performance improved with channel number. Remarkably, the opposite happened for white box accuracy, with variants of VOneResNet50 with 64 and 128 channels achieving the highest robustness to white box attacks. This result is particularly interesting since it had been shown that increasing the number of channels in all layers of a CNN improves robustness to white box attacks in both standard and adversarially trained models [17]. Therefore, this shows that the improvement in white box robustness of VOneNets when compared to their base models cannot be attributed to the higher number of channels. For the main model, we set the channel number to 512, as it offered a good compromise between the different benchmarks. Regarding the time-window for integrating V1 responses, we observed small effects on the models' performance when varying it between 25 and 200 ms. We chose 25 ms for the main model due to a small trend for higher robustness to white box attacks with shorter time-windows.

Figure C2: Exploration of the VOneBlock's hyper-parameters in the VOneResNet50 architecture. A: Performance (top-1 accuracy on clean images, white box attacks, and common corruptions) of different VOneResNet50 models with varying number of channels in the VOneBlock (64 to 1024, with a 25 ms time-window). B: Same as in A, but varying the time-window for integrating neuronal activity (25 to 200 ms, with 512 channels). Error-bars show SD (n=3 seeds). Red arrows represent the values used in the main model.


C.5 Training details

We used PyTorch 0.4.1 and trained the models on ImageNet 2012 [34]. Images were preprocessed (1) for training, with a random crop to 224×224 pixels and random horizontal flipping, and (2) for validation, with a central crop to 224×224 pixels. Preprocessing was followed by normalization: subtraction of and division by [0.5, 0.5, 0.5]. We used a batch size of 256 images and trained either on 2 GPUs (NVIDIA Titan X / GeForce 1080Ti) or 1 GPU (Quadro RTX 6000 or V100) for 70 epochs. We used step learning rate scheduling: a starting learning rate of 0.1, divided by 10 every 20 epochs. For optimization, we used Stochastic Gradient Descent with a weight decay of 0.0001, momentum of 0.9, and a cross-entropy loss between image labels and model predictions (logits).
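A minimal sketch of this recipe in PyTorch (here `model` is a VOneNet and `train_loader` an ImageNet loader with batch size 256; both are assumed to exist):

```python
# Hedged sketch of the training loop described above.
import torch

optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(70):
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)  # VOneBlock noise stays on
        loss.backward()
        optimizer.step()
    scheduler.step()                             # lr /= 10 every 20 epochs
```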

Model weights and code are available at https://github.com/dicarlolab/vonenet.


D VOneBlock variant details

To investigate which components of the VOneBlock are responsible for the increased robustness, we created six variants of the VOneBlock by removing or modifying one of its parts. In all cases, we trained from scratch a corresponding variant of VOneResNet50 with the modified VOneBlock; all other properties of the model and components of the VOneBlock were left unmodified. Here, we describe each of these variants in more detail (the name of each variant refers to the component removed):

• V1 distributions: the GFB parameters were sampled from uniform distributions with the same domain as the empirical distributions used in the default VOneBlock (Table C1)
• Low SF: when sampling the spatial frequencies for the GFB, neurons with low peak spatial frequencies (f < 2 cpd) were removed from the empirical distribution [79]
• High SF: similar to the previous variant, but removing neurons with high peak spatial frequencies (f > 2 cpd)
• Complex NL: complex cell channels were removed from the VOneBlock and replaced by simple cell channels
• Simple NL: simple cell channels were removed from the VOneBlock and replaced by complex cell channels
• V1 stochasticity: the V1 stochasticity generator was removed from the VOneBlock

While all of the variants resulted in worse overall mean performance than the unmodified VOneResNet50 (Table 2), some improved specific benchmarks: the variant without high SF had higher accuracy under noise and blur corruptions (Table D1); the variant without V1 stochasticity had higher accuracy on clean images and images with noise and weather corruptions (Tables 2 and D1); and the variant without V1 distributions had higher accuracy against white box attacks with high L∞ perturbations (Table D2).

Table D1: Difference between the common image corruption accuracies of the several VOneResNet50 variants and the unmodified VOneResNet50. Removal of simple cells reduced accuracy for all common image corruption types. Removal of some components of the VOneBlock improved accuracy for specific corruptions: removing high SF consistently improved performance for noise and blur corruptions, and removing V1 stochasticity improved performance for noise and weather corruptions.

Component              Noise [Δ]                            Blur [Δ]
removed     Gaussian    Shot        Impulse     Defocus     Glass       Motion      Zoom
V1 dist     -2.8±0.9    -2.4±0.7    -2.9±1.1    -1.6±0.1    0.4±0.2     -0.8±0.3    -0.2±0.5
Low SF      -1.4±0.5    -1.0±0.2    -1.5±0.5    -0.6±0.6    -1.1±0.2    -1.2±0.4    -0.5±0.7
High SF     6.8±1.3     7.3±1.1     7.0±1.3     5.6±0.3     11.5±0.3    3.7±0.0     6.1±0.1
Complex     1.0±1.6     1.2±1.3     0.7±1.8     1.2±0.2     -1.5±0.1    0.2±0.8     0.4±1.1
Simple      -6.1±1.6    -5.4±1.5    -6.4±1.9    -4.1±0.4    -4.2±0.4    -4.9±0.5    -5.4±0.3
V1 stoch    2.2±2.5     2.2±2.5     2.1±2.9     3.3±0.1     -5.5±0.1    0.9±0.7     0.9±0.1

Component              Weather [Δ]                                      Digital [Δ]
removed     Snow        Frost       Fog         Bright      Contrast    Elastic     Pixelate    JPEG
V1 dist     -0.2±0.6    -0.9±0.6    -1.9±1.1    -0.8±0.5    -1.4±0.9    0.3±0.4     -1.7±0.3    0.8±0.5
Low SF      -0.4±0.3    -0.6±0.4    0.9±0.3     -0.2±0.2    -0.4±0.5    -0.9±0.5    -1.3±0.2    -0.5±0.4
High SF     1.4±1.0     -0.7±0.7    -9.5±0.4    -5.2±0.4    -4.4±0.5    2.4±0.2     1.9±0.4     -4.6±0.4
Complex     0.7±0.4     0.4±0.3     4.3±0.2     1.3±0.2     2.9±0.2     -0.5±0.2    -1.2±0.0    -1.9±0.5
Simple      -5.0±0.5    -9.0±0.5    -4.4±0.5    -3.9±0.2    -4.8±0.4    -3.9±0.5    -3.5±0.4    -1.8±0.2
V1 stoch    2.5±0.4     0.3±0.3     12.0±0.5    2.6±0.3     10.0±0.8    -2.8±0.3    -4.0±0.7    -4.0±0.4


Table D2: Difference between the white box attack accuracies of the several VOneResNet50 variants and the unmodified VOneResNet50. Removal of V1 stochasticity considerably reduced accuracy for all white box attacks. Removal of V1 distributions improved accuracy for the L∞ white box attack with high perturbation strength.

Component            PGD-L1                  PGD-L2                  PGD-L∞
removed     δ1=40       δ1=160      δ2=0.15     δ2=0.6      δ∞=1/1020   δ∞=1/255
V1 dist     -0.5±0.8    0.1±1.1     -0.8±0.2    0.6±1.0     0.3±0.3     3.0±1.8
Low SF      -0.8±1.2    -3.1±1.1    -0.6±1.0    -2.3±0.8    -0.8±1.3    -1.8±1.0
High SF     -2.2±0.9    -4.7±2.6    -2.6±1.5    -5.1±1.2    -2.6±1.4    -5.5±1.1
Complex     -1.3±0.9    -7.4±1.6    -1.1±0.9    -7.2±1.2    -1.9±1.0    -7.9±1.2
Simple      -2.5±0.8    -1.4±1.4    -2.9±1.1    -1.7±1.2    -2.7±1.2    -0.6±0.8
V1 stoch    -23.7±0.9   -36.5±0.8   -21.6±0.5   -40.1±0.4   -32.2±0.4   -29.2±0.4


3 VOneNet: a Hybrid CNN with a V1 Neural Network Front-End

Inspired by the strong correlation between V1 explained variance and robustness to white box attacks, we developed the VOneNet architecture. The major characteristic that sets the VOneNet architecture apart is its V1 front-end, the VOneBlock. While most of its layers have parameters learned during ImageNet training, VOneNet's first block is a fixed-weight, mathematically parameterized CNN model that approximates primate neural processing of images up to and including area V1. Importantly, the VOneBlock components are mapped to specific neuronal populations in V1, which can be independently manipulated or switched off completely to evaluate their functional role. Finally, the VOneBlock can be easily adapted to different CNN base architectures, as described below. Here, we build VOneNets from three base architectures: ResNet50 [4], CORnet-S [32], and AlexNet [1].

A VOneNet consists of the VOneBlock and a back-end network adapted from a base CNN (Fig. 2). When building a VOneNet from a particular CNN, we replace its first block (typically one stack of convolutional, normalization, nonlinear, and pooling layers; Supplementary Table C3) with the VOneBlock and a trained transition layer. The VOneBlock matches the replaced block's spatial map dimensions (56×56 for the base CNNs considered) but can have more channels (for the standard model we chose C_V1 = 512; see Supplementary Fig. C2 for an analysis of how the number of channels affects the results). It is followed by the transition layer, a 1×1 convolution that acts as a bottleneck to compress the higher channel number to the original block's depth. The VOneBlock is inspired by the LNP model of V1 [52], consisting of three consecutive processing stages (convolution, nonlinearity, and stochasticity generator) with two distinct neuronal types (simple and complex cells), each with a certain number of units per spatial location (standard model with SC_V1 = CC_V1 = 256). The following paragraphs describe the main components of the VOneBlock (see Supplementary Section C for a more detailed description).
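Schematically, this assembly can be sketched as below; `vone_block` and `backend_layers` are stand-ins for the actual modules (the released code at https://github.com/dicarlolab/vonenet is the authoritative implementation), and the transition layer is the 1×1 convolution described above:

```python
# Hedged sketch of VOneNet assembly: VOneBlock + transition + CNN back-end.
import torch.nn as nn

def build_vonenet(vone_block: nn.Module, backend_layers: nn.Module,
                  vone_channels: int = 512, backend_channels: int = 64) -> nn.Module:
    # 1x1 convolution bottleneck compressing the 512 VOneBlock channels
    # down to the channel depth expected by the base CNN's remaining blocks.
    transition = nn.Conv2d(vone_channels, backend_channels,
                           kernel_size=1, bias=False)
    return nn.Sequential(vone_block, transition, backend_layers)
```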

Biologically-constrained Gabor filter bank. The first layer of the VOneBlock is a mathematically parameterized GFB with parameters tuned to approximate empirical primate V1 neural response data. It convolves the RGB input images with Gabor filters of multiple orientations, sizes, shapes, and spatial frequencies in a 56×56 spatial map. To instantiate a VOneBlock, we randomly sample C_V1 values for the GFB parameters according to empirically observed distributions of preferred orientation, peak spatial frequency, and size/shape of receptive fields [78, 79, 80]. The VOneBlock keeps color processing separate, with each Gabor filter convolving a single color channel from the input image. The resulting set of spatial filters is considerably more heterogeneous than those found in the first layer of standard CNNs and better approximates the diversity of primate V1 receptive fields (Supplementary Fig. C1).

Simple and complex cells. While simple cells were once thought to be an intermediate step for computing complex cell responses, it is now known that they form the majority of downstream projections to V2 [81]. For this reason, the VOneBlock nonlinear layer has two different nonlinearities that are applied to each channel depending on its cell type: a rectified linear transformation for simple cells, and the spectral power of a quadrature phase-pair for complex cells.

V1 stochasticity. A defining property of neuronal responses is their stochasticity: repeated measurements of a neuron in response to nominally identical visual inputs result in different spike trains. In awake monkeys, the mean spike count (averaged over many repetitions) depends on the presented image, and the spike train for each trial is approximately Poisson: the spike count variance is equal to the mean [82]. To approximate this property of neuronal responses, we add independent Gaussian noise to each unit of the VOneBlock, with variance equal to its activation. Before doing this, we apply an affine transformation to the units' activations so that both the mean stimulus response and the mean baseline activity are the same as those of a population of primate V1 neurons measured in a 25 ms time-window (mean stimulus response and spontaneous activity of 0.324 and 0.073 spikes, respectively; see Supplementary Table C2 and Fig. C2 for other time-windows). Like in the brain, the stochasticity of the VOneBlock is always on, during both training and inference.

V1 explained variance. The VOneBlock was not developed to compete with state-of-the-art data-fitting models of V1 [53, 30]. Instead, we used available empirical distributions to constrain a GFB model, generating a neuronal space that approximates that of primate V1. Despite its simplicity, the VOneBlock outperformed all tested standard ImageNet-trained CNNs in explaining responses in the V1 dataset used, and with an explained variance of 0.375±0.006, came right in the middle of the range of the adversarially trained CNNs (Fig. 1, blue circle). On the surface, it seems that these results are


at odds with Cadena et al., which showed that a task-optimized CNN marginally outperformed a GFB model in explaining V1 responses [30]. However, our GFB has parameters constrained by empirical data, resulting in a better model of V1: when we used the parameters of the GFB in Cadena et al., we obtained a much lower explained variance (0.296±0.005). Our results suggest that a well-tuned GFB and adversarially trained CNNs are currently the best models at predicting primate V1 responses.

4 Results

4.1 Simulating a V1 at the front of CNNs improves robustness to white box attacks

To evaluate the VOneNets' robustness to white box adversarial attacks, we used untargeted PGD with L∞, L2, and L1 norm constraints of δ∞ ∈ [1/1020, 1/255], δ2 ∈ [0.15, 0.6], and δ1 ∈ [40, 160]. All VOneNets were attacked end-to-end, with gradients propagated through the VOneBlock to the pixels. Because the VOneBlock is stochastic, we took extra care when attacking VOneNets. We used the reparameterization trick when sampling VOneBlock stochasticity, allowing us to keep the gradients intact [83]. Further, to combat the effects of noise in the gradients, we adapted our attack such that at every PGD iteration we take 10 gradient samples and move in the average direction to increase the loss [68] (Supplementary Table B1). To verify the effectiveness of our attacks, we used controls suggested by Carlini et al. 2019 [84]: in particular, we show that increasing the magnitude of the norm of the attack monotonically decreases VOneNet accuracy, eventually reaching 0% accuracy (Supplementary Fig. B1). Additionally, increasing the number of PGD steps from 1 to

Figure 2: Simulating V1 at the front of CNNs improves robustness to white box attacks. A: A VOneNet consists of a model of primate V1, the VOneBlock, followed by a standard CNN architecture. The VOneBlock contains a convolutional layer (a GFB with fixed weights constrained by empirical data), a nonlinear layer (simple or complex cell nonlinearities), and a stochastic layer (V1 stochasticity generator with variance equal to mean). B: Top, comparison of accuracy (top-1) on low strength white box attacks (PGD with δ∞ = 1/1020, δ2 = 0.15, δ1 = 40) between three base models (ResNet50, CORnet-S, and AlexNet) and their corresponding VOneNets. Gray bars show the performance of base models; blue bars show the improvements from the VOneBlock; dashed lines indicate the performance on clean images. Bottom, same but for white box attacks of higher strength (PGD with δ∞ = 1/255, δ2 = 0.6, δ1 = 160). VOneNets show consistently higher white box accuracy, even for perturbation strengths that reduce the performance of the base models to nearly chance. Error-bars represent SD (n=3 seeds).


many strictly increases the effect of the attack under a given norm size (Supplementary Fig. B2). Both of these indicate that VOneNet gradients contain useful information for the construction of adversarial examples. For more details and controls, see Supplementary Sections B1 and B2.
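A minimal sketch of the gradient-averaging step used against a stochastic model, assuming fresh VOneBlock noise is drawn on every forward pass via the reparameterization trick; `n_samples = 10` mirrors the text:

```python
# Hedged sketch: average the input-gradient over stochastic forward passes.
import torch
import torch.nn.functional as F

def averaged_gradient(model, x, y, n_samples=10):
    x = x.clone().requires_grad_(True)
    grads = []
    for _ in range(n_samples):
        loss = F.cross_entropy(model(x), y)           # fresh noise each pass
        grads.append(torch.autograd.grad(loss, x)[0])
    return torch.stack(grads).mean(dim=0)             # PGD steps along this mean
```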

We found that simulating a V1 front-end substantially increased the robustness to white box attacks for all three base architectures that we tested (ResNet50, CORnet-S, and AlexNet), while leaving clean ImageNet performance largely intact (Fig. 2 B). This was particularly evident for the stronger perturbation attacks, which reduced the accuracy of the base models to nearly chance (Fig. 2 B, bottom). This suggests that the VOneBlock works as a generic front-end, which can be transferred to a variety of different neural networks as an architectural defense to increase robustness to adversarial attacks.

4.2 VOneNets outperform state-of-the-art methods on a composite set of perturbations

We then focused on the ResNet50 architecture and compared VOneResNet50 with two state-of-the-art training-based defense methods: adversarial training with a δ∞ = 4/255 constraint (AT L∞) [76], and adversarial noise combined with Stylized ImageNet training (ANT3×3+SIN) [59]. Because white box adversarial attacks are only part of the issue of model robustness, we considered a larger panel of image perturbations containing a variety of common corruptions. For evaluating model performance on corruptions, we used the ImageNet-C dataset [13], which consists of 15 different corruption types, each at 5 levels of severity, divided into 4 categories: noise, blur, weather, and digital (see Supplementary Fig. B5 for example images and Supplementary Section B3 for more details).
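As an illustration of the benchmark's structure, a small sketch of the accuracy aggregation over ImageNet-C (corruption names follow the public dataset; the per-severity accuracies in `acc` are assumed to come from a separate evaluation loop):

```python
# Hedged sketch: averaging top-1 accuracy over the 15 ImageNet-C corruptions.
import numpy as np

CORRUPTIONS = {
    "noise":   ["gaussian_noise", "shot_noise", "impulse_noise"],
    "blur":    ["defocus_blur", "glass_blur", "motion_blur", "zoom_blur"],
    "weather": ["snow", "frost", "fog", "brightness"],
    "digital": ["contrast", "elastic_transform", "pixelate", "jpeg_compression"],
}

def corruption_mean(acc):
    """acc maps corruption name -> list of 5 per-severity top-1 accuracies."""
    names = [c for group in CORRUPTIONS.values() for c in group]
    return float(np.mean([np.mean(acc[c]) for c in names]))
```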

As expected, each of the defense methods had the highest accuracy under the perturbation type it was designed for, but did not considerably improve over the base model on the other (Table 1). While the AT L∞ model suffered substantially on corruptions and clean performance, the ANT3×3+SIN model, the current state-of-the-art for common corruptions, had virtually no benefit on white box attacks. On the other hand, VOneResNet50 improved on both perturbation types, outperforming all the models on perturbation mean (average of white box and common corruptions) and overall mean (average of clean, white box, and common corruptions), with a difference to the second best model of 3.3% and 5.3%, respectively. Specifically, VOneResNet50 showed substantial improvements for all the white box attack constraints (Supplementary Fig. B1) and more moderate improvements for common image corruptions of the noise, blur, and digital categories (Supplementary Table B2 and Fig. B6). These results are particularly remarkable since VOneResNet50 was not optimized for robustness and does not benefit from any computationally expensive training procedure like the other defense methods. When compared to the base ResNet50, which has an identical training procedure, VOneResNet50 improves by 18.0% on perturbation mean and 10.7% on overall mean (Table 1).

Table 1: VOneResNet50 outperforms other defenses on perturbation mean and overall mean. Accuracy (top-1) on overall mean, perturbation mean, white box attacks, common corruptions, and clean images for the standard ResNet50 and three defense methods: adversarial training, adversarial noise combined with Stylized ImageNet training, and VOneResNet50. Perturbation mean is the average accuracy over white box attacks and common image corruptions. The overall mean is the average accuracy over the two perturbation types and clean ImageNet. Values are reported as mean and SD (n=3 seeds).

Model               Overall mean [%]  Perturbation mean [%]  White box [%]  Corruption [%]  Clean [%]
Base ResNet50 [4]   43.6±0.0          27.6±0.1               16.4±0.1       38.8±0.3        75.6±0.1
AT L∞ [76]          49.0              42.3                   52.3           32.3            62.4
ANT3×3+SIN [59]     48.0              34.9                   17.3           52.6            74.1
VOneResNet50        54.3±0.1          45.6±0.2               51.1±0.4       40.0±0.3        71.7±0.1

4.3 All components of the VOneBlock contribute to improved model robustness

Since the VOneBlock was not explicitly designed to increase model robustness, but rather to approximate primate V1, we investigated which of its components are responsible for the increased


robustness we observe. We performed a series of experiments wherein six new VOneNet variants were created by removing or modifying a part of the VOneBlock (referred to as VOneBlock variants). After ImageNet training, model robustness was evaluated as before on our panel of white box attacks and image corruptions. Three variants targeted the GFB: one sampling the Gabor parameters from uniform distributions instead of those found in primate V1, another without high spatial frequency (SF) filters (f < 2 cpd), and another without low SF filters (f > 2 cpd). Two additional variants targeted the nonlinearities: one without simple cells and another without complex cells. Finally, the sixth variant had the V1 neuronal stochasticity generator removed. Even though all VOneBlock variants are poorer approximations of primate V1, all resulting VOneResNet50 variants still had improved perturbation accuracy when compared to the base ResNet50 model (Fig. 3 A). On the other hand, all variants except that with uniformly sampled Gabor parameters showed significant deficits in robustness compared to the unmodified VOneResNet50 (Fig. 3 A; drops in perturbation accuracy between 1% and 15%).

Interestingly, we observed that some of these changes had a highly specific effect on the type of perturbation robustness affected (Table 2 and Supplementary Section D). Removing high SF Gabors negatively affected both white box and clean accuracy, while actually improving robustness to common image corruptions, particularly those of the noise and blur categories (Supplementary Table D1). Removing complex cells only impaired white box robustness, as opposed to removing simple cells, which was particularly detrimental to performance on image corruptions. Finally, removing V1 stochasticity considerably decreased white box accuracy while improving accuracy for both clean and corrupted images.

The VOneBlock variant without V1 stochasticity suffered the most dramatic loss in robustness. This is not altogether surprising, as several approaches to adversarial robustness have focused on noise as a defense [15, 19, 66]. To investigate whether V1 stochasticity alone accounted for the majority of the robustness gains, we evaluated the perturbation accuracy of a ResNet50 model with V1 stochasticity added at the output of its first block. Neuronal stochasticity was implemented in exactly the same way as in the VOneBlock, by first applying an affine transformation to scale the activations so that they match primate V1 neuronal activity. Like the VOneResNet50, this model had stochasticity during training and inference, and showed a considerable improvement in robustness compared

Figure 3: All components of the VOneBlock work in synergy to improve robustness to image perturbations. A: Perturbation mean accuracy (top-1) for VOneResNet50 and several variations with a component of the VOneBlock removed or altered. From left to right: unmodified VOneResNet50, model with Gabor parameters sampled uniformly (within biological ranges), model without high SF Gabors, model without low SF Gabors, model without complex cells, model without simple cells, model without V1 stochasticity. Gray bars show the performance of ResNet50; blue bars show the improvements due to the VOneBlock; dashed line indicates the accuracy of the unmodified model. Error-bars represent SD (n=3 seeds). B: Same as in A, but comparing ResNet50, VOneResNet50 without V1 stochasticity, ResNet50 with V1 stochasticity added after the first block, and VOneResNet50. Adding V1 stochasticity to ResNet50 accounts for less than half of the total improvement of VOneResNet50, demonstrating a supralinear interaction between the V1 features and V1 stochasticity.


Table 2: Removal of some VOneBlock components causes impairments with high specificity. Difference in accuracy (top-1) relative to the unmodified VOneResNet50 for each of the variants with removed components, on overall mean, white box attacks, common corruptions, and clean images. Values are reported as mean and SD (n=3 seeds).

Component removed  Overall mean [Δ]  White box [Δ]  Corruption [Δ]  Clean [Δ]
V1 dist            -0.4±0.3          0.4±0.8        -1.1±0.3        -0.5±0.3
High SF            -2.2±0.2          -3.8±0.3       1.9±0.4         -4.7±0.2
Low SF             -0.7±0.5          -1.6±1.1       -0.7±0.1        0.1±0.3
Complex            -1.0±0.6          -4.5±1.1       0.6±0.4         0.8±0.3
Simple             -3.0±0.5          -2.0±1.1       -4.8±0.5        -2.1±0.3
V1 stoch           -8.8±0.3          -30.6±0.4      1.5±0.5         2.6±0.1

to the standard ResNet50. However, this improvement accounted for only a fraction of the total gains of the VOneResNet50 model (Fig. 3 B), demonstrating that there is a substantial supralinear interaction between the V1 features and the neuronal stochasticity. Merely adding V1 stochasticity to the first block of a standard CNN model does not increase robustness to the same degree as the full VOneBlock: the presence of V1 features more than doubles the contribution to perturbation robustness brought by the addition of neuronal stochasticity.

Finally, we sought to determine whether stochasticity during inference is key to defending against attacks. Thus, we analyzed the white box adversarial performance of VOneResNet50 while quenching stochasticity during the adversarial attack (Fig. 4). Remarkably, the majority of improvements in adversarial robustness originate from the neuronal stochasticity during training, indicating that V1 stochasticity induces the downstream layers to learn representations that are more robust to adversarial attacks. This is particularly interesting when contrasted with the ANT3×3+SIN defense, which has noise added to input images during training but does not learn representations with notably higher robustness to white box adversarial attacks (Table 1 and Supplementary Fig. B1).

Figure 4: V1 stochasticity during training leads to more robust learned representations. White box accuracy (top-1) for the VOneResNet50 variant without V1 stochasticity, VOneResNet50 with stochasticity removed during the white box attack and inference, and VOneResNet50. VOneResNet50 without stochasticity during attack/inference maintained over half of the robustness gains. Therefore, V1 stochasticity improves model robustness in two ways: rendering the attack itself less effective, and inducing more robust representations downstream of the VOneBlock, the latter accounting for the majority of the gains on our benchmark. Gray bars show the performance of the standard ResNet50; blue bars show the improvements due to the VOneBlock. Error-bars represent SD (n=3 seeds).

5 Discussion

In this work, we demonstrate that adversarial robustness in CNNs is correlated with their ability to explain primate V1 neuronal responses, and that simulating the image processing of primate V1 at the front of standard neural network architectures significantly improves their robustness to image perturbations. Notably, this approach outperformed state-of-the-art defense methods on a large benchmark consisting of adversarial attacks and common image corruptions. Despite not being constructed to this end, any component of the model front-end we removed or modified to be less like primate V1 resulted in less overall robustness, and revealed that different components improve robustness to different perturbation types. Remarkably, we find that simulated V1 stochasticity


interacts synergistically with the V1 model features and drives the downstream layers to learn representations more robust to adversarial perturbations.

Our approach bears some similarity to the pioneering study by Li et al. [40], in which a model's representations were regularized to approximate mouse V1, increasing its robustness on the grey-scale CIFAR dataset; however, here we go further in several key ways. First, while the mouse is gaining traction as a model for studying vision, a vast literature has established macaque vision as a quantitatively accurate model of human vision in general, and human object recognition in particular [85, 86]. Visual acuity in mice is much lower than in macaques and humans [87, 79], suggesting that vision in the mouse may serve different behavioral functions than in primates. Further, the regularization approach employed by Li et al. does not allow a clear disambiguation of which aspects of mouse V1 contribute to the improved robustness. Since the components of the VOneBlock proposed here are mappable to the brain, we can dissect the contributions of different neuronal populations in primate V1 to robustness against specific image perturbations. Finally, extending the robustness gains of biologically-constrained CNNs from gray-scale CIFAR to the full ImageNet dataset is a critical step towards real-world, human-level applications.

The gains achieved by VOneNets are substantial, particularly against white box attacks, and have tangible benefits over other defense methods. Though adversarial training still provides the strongest defense for the attack statistics it is trained on, it has significant downsides. Beyond its considerable additional computational cost during training, adversarially trained networks have significantly lower performance on clean images, corrupted images, and images perturbed with attack statistics not seen during training, implying that adversarial training in its current form may not be viable as a general defense method. In contrast, by deploying an architectural change, VOneNets improve robustness to all adversarial attacks tested and many common image corruptions, and they accomplish this with no additional training overhead. This also suggests that the architectural gains of the VOneNet could be stacked with other training-based defense methods to achieve even greater overall robustness gains.

Relative to current methods, the success of this approach derives from engineering in a better approximation of the architecture of the most well-studied primate visual area, combined with task optimization of the remaining free parameters of the downstream architecture [26]. This points to two potentially synergistic avenues for further gains: an even more neurobiologically precise model of V1 (i.e., a better VOneBlock), and an even more neurobiologically precise model of the downstream architecture. For example, one could extend the biological fidelity of the VOneBlock in the hope that it confers even greater robustness, by including properties such as divisive normalization [48, 49] and contextual modulation [88], to name a few. In addition to V1, the retina and the Lateral Geniculate Nucleus (LGN) also play important roles in pre-processing visual information, only partially captured by the current V1 model, suggesting that extending the work done here to a retina/LGN front-end has the potential to better align CNNs with human visual object recognition [89, 90]. In addition, though our initial experiments show that multiple components of our V1 model work together to produce greater robustness, the nature of the relationship between adversarial robustness and explained variance in V1 responses is far from resolved. In particular, as the VOneNets with and without stochasticity achieve very similar V1 explained variances but have markedly different levels of adversarial robustness, it is clear that much theoretical work remains to better understand both when and why matching biology leads to more robust computer vision models.

While neuroscience has recently seen a huge influx of new neurally-mechanistic models and tools drawn from machine learning [91, 92], the most recent advances in machine learning and computer vision have been driven mostly by the widespread availability of computational resources and very large labeled datasets [93], rather than by an understanding of the relevant brain mechanisms. Under the belief that biological intelligence still has a lot of untapped potential to contribute, a number of researchers have been pushing for more neuroscience-inspired machine learning algorithms [37, 42, 32]. The work presented here shows that this aspiration can become reality: the models presented here, drawn directly from primate neurobiology, indeed require less training to achieve more human-like behavior. This is one turn of a new virtuous circle wherein neuroscience and artificial intelligence each feed into and reinforce the understanding and ability of the other.


Broader Impact

From a technological perspective, the ethical implications of our work are largely aligned with those of computer vision in general. While there is undoubtedly potential for malicious and abusive uses of computer vision, particularly in the form of discrimination or invasion of privacy, we believe that our work will aid in producing more robust and intuitive behavior in computer vision algorithms. As CNNs are deployed in real-world situations, it is critical that they behave with the same level of stability as their human counterparts. In particular, they should, at the very least, not be confused by changes in input statistics that do not confuse humans. We believe that this work will help to bridge that gap. Furthermore, while algorithms are often thought to be impartial or unbiased, much research has shown that data-driven models like current CNNs are often even more biased than humans, implicitly keying in on and amplifying stereotypes. For this reason, making new CNNs that behave more like humans may actually reduce, or at least make more intuitive, their implicit biases. Unfortunately, we note that even with our work these issues are not yet resolved. While we developed a more neurobiologically-constrained algorithm, it comes nowhere close to human-like behaviour in the wide range of circumstances experienced in the real world. Finally, from the perspective of neuroscience, we think that this work introduces a more accurate model of the primate visual system. Ultimately, better models contribute to a better mechanistic understanding of how the brain works and how to intervene in the case of illness or disease states. We think that our model contributes a stronger foundation for understanding the brain and building novel medical technology, such as neural implants for restoring vision in people with impairments.

Acknowledgments and Disclosure of Funding

We thank J. Anthony Movshon and Corey M. Ziemba for access to the V1 neural dataset, MIT-IBM Satori for providing the compute necessary for this project, John Cohn for technical support, Sijia Liu and Wieland Brendel for advice on adversarial attacks, and Adam Marblestone for insightful discussion. This work was supported by the PhRMA Foundation Postdoctoral Fellowship in Informatics (TM), the Semiconductor Research Corporation (SRC) and DARPA (JD, MS, JJD), the Massachusetts Institute of Technology Shoemaker Fellowship (MS), Office of Naval Research grant MURI-114407 (JJD), the Simons Foundation grant SCGB-542965 (JJD), and the MIT-IBM Watson AI Lab grant W1771646 (JJD).

References

[1] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet Classification with Deep Convolutional Neural Networks". In: Advances in Neural Information Processing Systems 25. Ed. by F. Pereira et al. Curran Associates, Inc., 2012, pp. 1097-1105.
[2] Christian Szegedy et al. "Going Deeper with Convolutions". Sept. 2014. arXiv: 1409.4842 [cs.CV].
[3] Karen Simonyan and Andrew Zisserman. "Very Deep Convolutional Networks for Large-Scale Image Recognition". Sept. 2014. arXiv: 1409.1556 [cs.CV].
[4] Kaiming He et al. "Deep Residual Learning for Image Recognition". Dec. 2015. arXiv: 1512.03385 [cs.CV].
[5] Kaiming He et al. "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification". In: Proceedings of the IEEE International Conference on Computer Vision (2015), pp. 1026-1034. arXiv: 1502.01852.
[6] Christian Szegedy et al. "Intriguing properties of neural networks". Dec. 2013. arXiv: 1312.6199 [cs.CV].
[7] Nicholas Carlini and David Wagner. "Towards Evaluating the Robustness of Neural Networks". Aug. 2016. arXiv: 1608.04644 [cs.CR].
[8] Pin-Yu Chen et al. "EAD: Elastic-Net Attacks to Deep Neural Networks via Adversarial Examples". Sept. 2017. arXiv: 1709.04114 [stat.ML].
[9] Jérôme Rony et al. "Decoupling Direction and Norm for Efficient Gradient-Based L2 Adversarial Attacks and Defenses". Nov. 2018. arXiv: 1811.09600 [cs.CV].


[10] Wieland Brendel et al. "Accurate, reliable and fast robustness evaluation". July 2019. arXiv: 1907.01003 [stat.ML].
[11] Samuel Dodge and Lina Karam. "A Study and Comparison of Human and Deep Learning Recognition Performance Under Visual Distortions". May 2017. arXiv: 1705.02498 [cs.CV].
[12] Robert Geirhos et al. "Generalisation in humans and deep neural networks". Aug. 2018. arXiv: 1808.08750 [cs.CV].
[13] Dan Hendrycks and Thomas Dietterich. "Benchmarking Neural Network Robustness to Common Corruptions and Perturbations". Mar. 2019. arXiv: 1903.12261 [cs.LG].
[14] Nilaksh Das et al. "Keeping the Bad Guys Out: Protecting and Vaccinating Deep Learning with JPEG Compression". May 2017. arXiv: 1705.02900 [cs.CV].
[15] Xuanqing Liu et al. "Towards Robust Neural Networks via Random Self-ensemble". Dec. 2017. arXiv: 1712.00673 [cs.LG].
[16] Weilin Xu, David Evans, and Yanjun Qi. "Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks". In: arXiv [cs.CV] (Apr. 2017).
[17] Aleksander Madry et al. "Towards Deep Learning Models Resistant to Adversarial Attacks". June 2017. arXiv: 1706.06083 [stat.ML].
[18] Yang Song et al. "PixelDefend: Leveraging Generative Models to Understand and Defend against Adversarial Examples". Oct. 2017. arXiv: 1710.10766 [cs.LG].
[19] Guneet S. Dhillon et al. "Stochastic Activation Pruning for Robust Adversarial Defense". Mar. 2018. arXiv: 1803.01442 [cs.LG].
[20] Jacob Buckman et al. "Thermometer Encoding: One Hot Way To Resist Adversarial Examples". Feb. 2018. URL: https://openreview.net/forum?id=S18Su--CW.
[21] Chuan Guo et al. "Countering Adversarial Images using Input Transformations". Feb. 2018. URL: https://openreview.net/forum?id=SyJ7ClWCb.
[22] Claudio Michaelis et al. "Benchmarking Robustness in Object Detection: Autonomous Driving when Winter is Coming". 2019, pp. 1-23. arXiv: 1907.07484.
[23] Andrew Ilyas et al. "Adversarial Examples Are Not Bugs, They Are Features". May 2019. arXiv: 1905.02175 [stat.ML].
[24] D. Linsley et al. "What are the Visual Features Underlying Human Versus Machine Vision?" In: ICCV. 2017. arXiv: 1701.02704.
[25] Robert Geirhos et al. "ImageNet-Trained CNNs Are Biased Towards Texture". In: International Conference on Learning Representations (2019), pp. 1-20. arXiv: 1811.12231.
[26] Daniel L. K. Yamins et al. "Performance-optimized hierarchical models predict neural responses in higher visual cortex". In: Proceedings of the National Academy of Sciences 111.23 (2014), pp. 8619-8624.
[27] Charles F. Cadieu et al. "Deep neural networks rival the representation of primate IT cortex for core visual object recognition". In: PLoS Computational Biology 10.12 (2014), e1003963.
[28] Seyed-Mahdi Khaligh-Razavi and Nikolaus Kriegeskorte. "Deep supervised, but not unsupervised, models may explain IT cortical representation". In: PLoS Comput. Biol. 10.11 (Nov. 2014), e1003915.
[29] Umut Güçlü and Marcel A. J. van Gerven. "Deep Neural Networks Reveal a Gradient in the Complexity of Neural Representations across the Brain's Ventral Visual Pathway". In: The Journal of Neuroscience 35.27 (2015), pp. 10005-10014.
[30] Santiago A. Cadena et al. "Deep convolutional models improve predictions of macaque V1 responses to natural images". In: PLoS Comput. Biol. 15.4 (Apr. 2019), e1006897.
[31] Martin Schrimpf et al. "Brain-Score: Which Artificial Neural Network for Object Recognition is most Brain-Like?" Sept. 2018. URL: https://doi.org/10.1101/407007.


[32] Jonas Kubilius et al. "Brain-Like Object Recognition with High-Performing Shallow Recurrent ANNs". In: Advances in Neural Information Processing Systems 32. Ed. by H. Wallach et al. Curran Associates, Inc., 2019, pp. 12805-12816.
[33] Adam H. Marblestone, Greg Wayne, and Konrad P. Kording. "Toward an Integration of Deep Learning and Neuroscience". In: Front. Comput. Neurosci. 10 (Sept. 2016), p. 94.
[34] William Lotter, Gabriel Kreiman, and David Cox. "Deep Predictive Coding Networks for Video Prediction and Unsupervised Learning". May 2016. arXiv: 1605.08104 [cs.LG].
[35] Aran Nayebi and Surya Ganguli. "Biologically inspired protection of deep networks from adversarial attacks". Mar. 2017. arXiv: 1703.09202 [stat.ML].
[36] Jordan Guerguiev, Timothy P. Lillicrap, and Blake A. Richards. "Towards deep learning with segregated dendrites". In: Elife 6 (Dec. 2017).
[37] Demis Hassabis et al. "Neuroscience-Inspired Artificial Intelligence". In: Neuron 95.2 (2017), pp. 245-258.
[38] Grace W. Lindsay and Kenneth D. Miller. "How biological attention mechanisms improve task performance in a large-scale visual system model". In: Elife 7 (Oct. 2018).
[39] Hanlin Tang et al. "Recurrent computations for visual pattern completion". In: Proceedings of the National Academy of Sciences 115.35 (2018), pp. 8835-8840.
[40] Zhe Li et al. "Learning From Brains How to Regularize Machines". Nov. 2019. arXiv: 1911.05072 [cs.LG].
[41] Hosein Hasani, Mahdieh Soleymani, and Hamid Aghajan. "Surround Modulation: A Bio-inspired Connectivity Structure for Convolutional Neural Networks". In: NeurIPS (2019), pp. 15877-15888.
[42] Fabian H. Sinz et al. "Engineering a Less Artificial Intelligence". In: Neuron 103.6 (Sept. 2019), pp. 967-979.
[43] Anthony M. Zador. "A critique of pure learning and what artificial neural networks can learn from animal brains". In: Nat. Commun. 10.1 (Aug. 2019), p. 3770.
[44] J. Deng et al. "ImageNet: A large-scale hierarchical image database". In: 2009 IEEE Conference on Computer Vision and Pattern Recognition. June 2009, pp. 248-255.
[45] D. H. Hubel and T. N. Wiesel. "Receptive Fields, Binocular Interaction and Functional Architecture in the Cat's Visual Cortex". In: Journal of Physiology (1962), pp. 106-154.
[46] J. P. Jones and L. A. Palmer. "An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex". In: Journal of Neurophysiology 58.6 (1987), pp. 1233-1258.
[47] E. H. Adelson and J. R. Bergen. "Spatiotemporal energy models for the perception of motion". In: Journal of the Optical Society of America A: Optics and Image Science 2.2 (1985), pp. 284-299.
[48] David J. Heeger, E. P. Simoncelli, and J. Anthony Movshon. "Computational models of cortical visual processing". In: Proceedings of the National Academy of Sciences of the United States of America 93.2 (1996), pp. 623-627.
[49] M. Carandini, D. J. Heeger, and J. A. Movshon. "Linearity and normalization in simple cells of the macaque primary visual cortex". In: The Journal of Neuroscience 17.21 (Nov. 1997), pp. 8621-8644.
[50] Maximilian Riesenhuber and Tomaso Poggio. "Hierarchical models of object recognition in cortex". In: Nature Neuroscience 2.11 (1999), pp. 1019-1025.


[51] Thomas Serre, Aude Oliva, and Tomaso Poggio. "A feedforward architecture accounts for rapid categorization". In: Proceedings of the National Academy of Sciences 104.15 (Apr. 2007), pp. 6424–6429.

[52] Nicole C. Rust et al. "Spatiotemporal Elements of Macaque V1 Receptive Fields". In: Neuron 46 (2005), pp. 945–956. DOI: 10.1016/j.neuron.2005.05.021.

[53] Brett Vintch, J. A. Movshon, and E. P. Simoncelli. "A Convolutional Subunit Model for Neuronal Responses in Macaque V1". In: Journal of Neuroscience 35.44 (2015), pp. 14829–14841. DOI: 10.1523/JNEUROSCI.2815-13.2015.

[54] Nicolas Papernot et al. "Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks". In: (Nov. 2015). arXiv: 1511.04508 [cs.CR].

[55] Mayank Singh et al. "Harnessing the Vulnerability of Latent Layers in Adversarially Trained Models". In: (May 2019). arXiv: 1905.05186 [cs.LG].

[56] Cihang Xie et al. "Feature denoising for improving adversarial robustness". In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019, pp. 501–509.

[57] Eric Wong, Leslie Rice, and J. Zico Kolter. "Fast is better than free: Revisiting adversarial training". In: (Jan. 2020). URL: https://openreview.net/forum?id=BJx040EFvH.

[58] Dan Hendrycks et al. "AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty". In: (Dec. 2019). arXiv: 1912.02781 [stat.ML].

[59] Evgenia Rusak et al. "Increasing the robustness of DNNs against image corruptions by playing the Game of Noise". In: (Jan. 2020). arXiv: 2001.06057 [cs.CV].

[60] Ali Shafahi et al. "Adversarial training for free!". In: Advances in Neural Information Processing Systems 32. Ed. by H. Wallach et al. Curran Associates, Inc., 2019, pp. 3358–3369.

[61] Eric Wong, Leslie Rice, and J. Zico Kolter. "Fast is better than free: Revisiting adversarial training". In: (Sept. 2020). URL: https://openreview.net/forum?id=BJx040EFvH.

[62] Dimitris Tsipras et al. "Robustness May Be at Odds with Accuracy". Sept. 2019. URL: https://openreview.net/forum?id=SyxAb30cY7.

[63] Yash Sharma and Pin-Yu Chen. "Attacking the Madry Defense Model with L1-based Adversarial Examples". In: (Oct. 2017). arXiv: 1710.10733 [stat.ML].

[64] Lukas Schott et al. "Towards the first adversarially robust neural network model on MNIST". In: (May 2018). arXiv: 1805.09190 [cs.CV].

[65] Cihang Xie et al. "Mitigating Adversarial Effects Through Randomization". Feb. 2018. URL: https://openreview.net/forum?id=Sk9yuql0Z.

[66] Adnan Siraj Rakin, Zhezhi He, and Deliang Fan. "Parametric Noise Injection: Trainable Randomness to Improve Deep Neural Network Robustness against Adversarial Attack". In: (Nov. 2018). arXiv: 1811.09310 [cs.LG].

[67] Joonho Lee et al. "ProbAct: A Probabilistic Activation Function for Deep Neural Networks". In: (May 2019). arXiv: 1905.10761 [cs.LG].

[68] Anish Athalye, Nicholas Carlini, and David Wagner. "Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples". In: (Feb. 2018). arXiv: 1802.00420 [cs.LG].

[69] Alex Krizhevsky. "Learning Multiple Layers of Features from Tiny Images". In: (Apr. 2009). URL: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.222.9220&rep=rep1&type=pdf.

[70] Adam Paszke et al. "Automatic differentiation in PyTorch". In: (Oct. 2017). URL: https://openreview.net/forum?id=BJJsrmfCZ.

[71] Saining Xie et al. "Aggregated Residual Transformations for Deep Neural Networks". In: (Nov. 2016). arXiv: 1611.05431 [cs.CV].

[72] Gao Huang et al. "Densely Connected Convolutional Networks". In: (Aug. 2016). arXiv: 1608.06993 [cs.CV].

[73] Forrest N. Iandola et al. "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size". In: (Feb. 2016). arXiv: 1602.07360 [cs.CV].

[74] Ningning Ma et al. "ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design". In: (July 2018). arXiv: 1807.11164 [cs.CV].

[75] Mingxing Tan et al. "MnasNet: Platform-Aware Neural Architecture Search for Mobile". In: (July 2018). arXiv: 1807.11626 [cs.CV].


[76] Logan Engstrom et al. Robustness (Python Library). 2019. URL: https://github.com/MadryLab/robustness.

[77] Jeremy Freeman et al. "A functional and perceptual signature of the second visual area in primates". In: Nature Neuroscience 16.7 (July 2013), pp. 974–981.

[78] Russell L. De Valois, E. W. Yund, and Norva Hepler. "The orientation and direction selectivity of cells in macaque visual cortex". In: Vision Research 22 (1982), pp. 531–544.

[79] Russell L. De Valois, Duane G. Albrecht, and Lisa G. Thorell. "Spatial Frequency Selectivity of Cells in Macaque Visual Cortex". In: Vision Research 22 (1982), pp. 545–559.

[80] Dario L. Ringach. "Spatial Structure and Symmetry of Simple-Cell Receptive Fields in Macaque Primary Visual Cortex". In: Journal of Neurophysiology 88 (2002), pp. 455–463.

[81] Yasmine El-Shamayleh et al. "Visual response properties of V1 neurons projecting to V2 in macaque". In: Journal of Neuroscience 33.42 (2013), pp. 16594–16605. DOI: 10.1523/JNEUROSCI.2753-13.2013.

[82] W. R. Softky and C. Koch. "The highly irregular firing of cortical cells is inconsistent with temporal integration of random EPSPs". In: Journal of Neuroscience 13.1 (1993), pp. 334–350. DOI: 10.1523/jneurosci.13-01-00334.1993.

[83] Diederik P. Kingma and Max Welling. "Auto-Encoding Variational Bayes". In: (Dec. 2013). arXiv: 1312.6114v10 [stat.ML].

[84] Nicholas Carlini et al. "On Evaluating Adversarial Robustness". In: (Feb. 2019). arXiv: 1902.06705v2 [cs.LG].

[85] Nikolaus Kriegeskorte, Marieke Mur, and Peter Bandettini. "Representational similarity analysis - connecting the branches of systems neuroscience". In: Frontiers in Systems Neuroscience 2 (Nov. 2008), p. 4.

[86] R. Rajalingham, K. Schmidt, and James J. DiCarlo. "Comparison of Object Recognition Behavior in Human and Monkey". In: Journal of Neuroscience 35.35 (2015), pp. 12127–12136. DOI: 10.1523/JNEUROSCI.0573-15.2015.

[87] Cristopher M. Niell and Michael P. Stryker. "Highly selective receptive fields in mouse visual cortex". In: The Journal of Neuroscience 28.30 (July 2008), pp. 7520–7536. DOI: 10.1523/JNEUROSCI.0623-08.2008.

[88] Pieter R. Roelfsema. "Cortical algorithms for perceptual grouping". In: Annual Review of Neuroscience 29 (2006), pp. 203–227. DOI: 10.1146/annurev.neuro.29.051605.112939.

[89] Alexander Berardino et al. "Eigen-Distortions of Hierarchical Representations". In: (Oct. 2017). arXiv: 1710.02266 [cs.CV].

[90] Jack Lindsey et al. "A Unified Theory of Early Visual Representations from Retina to Cortex through Anatomically Constrained Deep CNNs". In: ICLR (2019), pp. 1–17. DOI: 10.1101/511535. arXiv: 1901.00945.

[91] Daniel L. K. Yamins and James J. DiCarlo. "Using goal-driven deep learning models to understand sensory cortex". In: Nature Neuroscience 19.3 (Mar. 2016), pp. 356–365.

[92] Alexander Mathis et al. "DeepLabCut: markerless pose estimation of user-defined body parts with deep learning". In: Nature Neuroscience (2018). DOI: 10.1038/s41593-018-0209-y. arXiv: 1804.03142.

[93] Yann LeCun, Yoshua Bengio, and Geoffrey E. Hinton. "Deep learning". In: Nature 521.7553 (2015), pp. 436–444. DOI: 10.1038/nature14539.

[94] T. Marques, M. Schrimpf, and J. J. DiCarlo. "Hierarchical neural network models that more closely match primary visual cortex tend to better explain higher level visual cortical responses". In: Cosyne. 2020.

[95] Alexey Kurakin, Ian Goodfellow, and Samy Bengio. "Adversarial examples in the physical world". In: (July 2016). arXiv: 1607.02533 [cs.CV].

[96] Maria-Irina Nicolae et al. "Adversarial Robustness Toolbox v1.2.0". In: CoRR abs/1807.01069 (2018). URL: https://arxiv.org/pdf/1807.01069.


[97] Florian Tramer et al. "On Adaptive Attacks to Adversarial Example Defenses". In: (Feb. 2020). arXiv: 2002.08347 [cs.LG].

[98] Jonas Rauber et al. "Foolbox Native: Fast adversarial attacks to benchmark the robustness of machine learning models in PyTorch, TensorFlow, and JAX". In: Journal of Open Source Software 5.53 (2020), p. 2607. DOI: 10.21105/joss.02607.

[99] Matthew D. Zeiler and Rob Fergus. "Visualizing and Understanding Convolutional Networks". In: arXiv (Nov. 2013). DOI: 10.1007/978-3-319-10590-1_53. arXiv: 1311.2901.

[100] Chris Olah et al. "An Overview of Early Vision in InceptionV1". In: Distill (2020). https://distill.pub/2020/circuits/early-vision. DOI: 10.23915/distill.00024.002.

[101] Ian Nauhaus et al. "Orthogonal micro-organization of orientation and spatial frequency in primate primary visual cortex". In: Nature Neuroscience 15.12 (2012), pp. 1683–1690. DOI: 10.1038/nn.3255.


Supplementary Material

A Neural Explained Variance

We evaluated how well responses to given images in candidate CNNs explain responses of single V1 neurons to the same images, using a standard neural predictivity methodology based on partial least squares (PLS) regression [31, 32]. We used a neural dataset obtained by extracellular recordings from 102 single-units while presenting 450 different images for 100 ms, 20 times each, spanning 4 deg of the visual space. Results from this dataset were originally published in a study comparing V1 and V2 responses to naturalistic textures [77]. To avoid over-fitting, for each CNN we first chose its best V1 layer on a subset of the dataset (n=135 images) and reported the final neural explained variance calculated for the chosen layer on the remaining images of the dataset (n=315 images). Model units were mapped to each V1 neuron linearly using a PLS regression model with 25 components. The mapping was performed for each neuron using 90% of the image responses in the respective dataset and tested on the remaining held-out 10%, in a 10-fold cross-validation strategy. We reported neural explained variance values normalized by the neuronal internal consistency. We calculated V1 neural explained variance for a large pool of CNNs (n=30 models).
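To make the mapping procedure concrete, the following is a minimal sketch of the PLS predictivity pipeline under stated assumptions: `model_features` (images × model units) and `neural_responses` (images × neurons) are precomputed arrays, and the layer-selection and internal-consistency normalization steps described above are omitted. Names are illustrative, not taken from the released codebase.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import KFold

def explained_variance_pls(model_features, neural_responses,
                           n_components=25, n_folds=10):
    """Cross-validated explained variance (R^2) per neuron."""
    kf = KFold(n_splits=n_folds, shuffle=True, random_state=0)
    r2 = np.zeros((n_folds, neural_responses.shape[1]))
    for fold, (train, test) in enumerate(kf.split(model_features)):
        # fit a 25-component PLS mapping from model units to neurons
        pls = PLSRegression(n_components=n_components)
        pls.fit(model_features[train], neural_responses[train])
        pred = pls.predict(model_features[test])
        true = neural_responses[test]
        # explained variance per neuron as squared Pearson correlation
        for n in range(true.shape[1]):
            r = np.corrcoef(pred[:, n], true[:, n])[0, 1]
            r2[fold, n] = r ** 2
    # per-neuron EV, still to be normalized by internal consistency
    return r2.mean(axis=0)
```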

When using CNNs in a computer vision task such as object recognition in ImageNet, their field-of-view (in degrees) is not defined, since the only relevant input property is the image resolution (in pixels). However, when using a CNN to model the ventral stream, the visual spatial extent of the model's input is of key importance to ensure that it is correctly mapped to the data it is trying to explain. We assigned a spatial extent of 8 deg to all models, since this value has been previously used in studies benchmarking CNNs as models of the primate ventral stream [31, 94] and is consistent with the results in Cadena et al. [30].


B Image Perturbations

B.1 White Box Adversarial Attacks

For performing white box adversarial attacks, we used untargeted projected gradient descent (PGD) [17] (also referred to as the Basic Iterative Method [95]) with $L_\infty$, $L_2$, and $L_1$ norm constraints. Given an image $x$, this method uses the gradient of the loss to iteratively construct an adversarial image $x_{adv}$ which maximizes the model loss within an $L_p$ bound around $x$. Formally, in the case of an $L_\infty$ constraint, PGD iteratively computes $x_{adv}$ as

$$x_{t+1} = \mathrm{Proj}_{x+\mathcal{S}}\left(x_t + \alpha\,\mathrm{sgn}\left(\nabla_{x_t} L(\theta, x_t, y)\right)\right)$$

where $x$ is the original image and the $\mathrm{Proj}$ operator ensures the final computed adversarial image $x_{adv}$ is constrained to the space $x+\mathcal{S}$, here the $L_\infty$ ball around $x$. In the case of $L_1$ or $L_2$ constraints, at each iteration $\nabla_{x_t} L(\theta, x_t, y)$ is scaled to have an $L_1$ or $L_2$ norm of $\alpha$, and the $\mathrm{Proj}$ operator ensures the final $x_{adv}$ is within an $L_1$ or $L_2$ norm ball around $x$.
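For concreteness, a minimal PyTorch sketch of the $L_\infty$ PGD update above. The paper's attacks were run with the Adversarial Robustness Toolkit; this simplified loop is illustrative, not the exact attack code, and assumes inputs in [0, 1].

```python
import torch
import torch.nn.functional as F

def pgd_linf(model, x, y, eps, alpha, n_iter=64):
    """Untargeted L-infinity PGD: ascend the loss, then project."""
    x_adv = x.clone().detach()
    for _ in range(n_iter):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # gradient ascent step in the sign direction
        x_adv = x_adv.detach() + alpha * grad.sign()
        # projection onto the eps-ball around x, then onto valid pixel range
        x_adv = x.detach() + torch.clamp(x_adv - x, -eps, eps)
        x_adv = torch.clamp(x_adv, 0.0, 1.0)
    return x_adv
```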

For our white box benchmarks, we used $\delta_\infty \in [1/1020, 1/255]$, $\delta_2 \in [0.15, 0.6]$, and $\delta_1 \in [40, 160]$ constraints, where $\delta = x - x_{adv}$, for a total of six different attack variants with two strengths under each norm constraint. We arrived at these strengths by calibrating the lower strength attack for each norm to give a reasonable degree of variance in normally trained CNNs. Because normally trained CNNs are extremely sensitive to adversarial perturbations, this resulted in perturbations well below the level of perceptibility and, in most cases, particularly with the $L_\infty$ constraint, perturbations that would not even be rendered differently by a monitor for display. We set the high attack strength at 4 times the low value, which brought standard ImageNet-trained CNNs to nearly chance performance. Still, even this higher perturbation strength remained practically imperceptible (Fig. B3, left). Each perturbation was computed with 64 PGD iterations and a step size $\alpha = \epsilon/32$, where $\epsilon = \|\delta\|_p$, for the same 5,000 images from the ImageNet validation set on a per-model basis, and final top-1 accuracy was reported. The Adversarial Robustness Toolkit [96] was used for computing the attacks.

While adversarial formulations using Lagrangian relaxation, like the C&W attack [7] and the EAD attack [8], are known to be more effective than PGD for finding minimal perturbations under $L_2$ and $L_1$ constraints, these formulations involve computationally expensive line searches, often requiring thousands of iterations to converge, making them computationally difficult to scale to 5,000 ImageNet samples for a large number of models. Furthermore, because some of the models we tested operate with stochasticity during inference, we found that other state-of-the-art attacks formulated to efficiently find minimal perturbation distances [9, 10] were generally difficult to tune, as their success becomes stochastic as they near the decision boundary. Thus, we proceeded with the PGD formulation, as we found it to be the most reliable, computationally tractable, and conceptually simple to follow. We share our model and weights, and encourage anyone interested to attack our model to verify our results.

Although the stochastic models were implemented with the reparameterization trick for random sampling, which leaves the gradients intact [83], we performed several sanity checks to make sure that the white box attacks were effectively implemented. Following the advice of Athalye et al. [68] and Carlini et al. [84], we observed that in all of the white box attacks, increasing the attack strength decreased model performance, eventually bringing accuracy to zero (Fig. B1), and, additionally, that for any given constraint and attack strength, increasing PGD iterations from one to many increased attack effectiveness (Fig. B2). Furthermore, to be extra cautious, we followed the method Athalye et al. [68] used to break Stochastic Activation Pruning [19] and replaced $\nabla_x f(x)$ with

$$\sum_{i=1}^{k} \nabla_x f(x),$$

effectively performing Monte Carlo (MC) sampling of the loss gradient at every PGD iteration. While with high perturbation strength even $k = 1$ was sufficient to bring our model's accuracy to zero, we find that $k = 10$ generally achieved higher attack success at intermediate and large perturbation strengths (Table B1), and thus we used this approach for all attacks on stochastic models.
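The gradient-averaging control can be sketched as a drop-in replacement for the single gradient computation in the PGD loop above; the function name is illustrative, and the sketch assumes `model` draws fresh noise on every forward pass.

```python
def mc_gradient(model, x_adv, y, k=10):
    """Average k samples of the loss gradient of a stochastic model,
    so the PGD step follows the expected gradient rather than one draw."""
    grads = torch.zeros_like(x_adv)
    for _ in range(k):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)  # stochastic forward pass
        grads = grads + torch.autograd.grad(loss, x_adv)[0]
    return grads / k
```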


Figure B1: VOneResNet50 improves robustness to white box attacks over a wide range of perturbation strengths. White box accuracy vs. perturbation strength curves for PGD attacks with $L_\infty$, $L_2$, and $L_1$ constraints, for ResNet50, VOneResNet50, ATL∞, and ANT3×3+SIN. Adding the VOneBlock to ResNet50 consistently improves robustness under all constraints and attack strengths: VOneResNet50 can withstand perturbations roughly 4 times higher than the standard ResNet50 for the same performance level. Remarkably, VOneResNet50 outperformed the adversarially trained model (ATL∞) for a considerable range of perturbation strengths, particularly with the $L_1$ constraint. Adversarial noise training combined with Stylized ImageNet (ANT3×3+SIN) has virtually no improvement in robustness to white box attacks. In all white box attacks and for all models, increasing the attack strength decreased performance, eventually bringing accuracy to zero.

Figure B2: White box attack effectiveness converges for a large number of PGD iteration steps. White box accuracy vs. iteration curves for PGD attacks with $\delta_\infty = 1/255$, $\delta_2 = 0.6$, and $\delta_1 = 160$ constraints. Increasing the number of PGD iteration steps makes the attack more effective, generally converging after 32 iterations.

Table B1: Improving the effectiveness of white box attacks with Monte Carlo sampling. White box attack accuracy (top-1) under different norm constraints, and white box overall mean, using standard PGD and PGD with Monte Carlo sampling of the loss gradients (PGD-MC, k = 10) on VOneResNet50, for 64 PGD iteration steps. For all stochastic models, we used PGD with Monte Carlo sampling in the white box attacks throughout the paper. Values are mean and SD (n=3 seeds).

Attack type    | Mean [%]  | PGD-L1 40 | PGD-L1 160 | PGD-L2 0.15 | PGD-L2 0.6 | PGD-L∞ 1/1020 | PGD-L∞ 1/255
Standard PGD   | 52.8±0.2  | 64.4±0.4  | 42.2±0.1   | 65.2±0.3    | 45.3±0.4   | 63.4±0.4      | 36.4±0.4
PGD-MC (k=10)  | 51.1±0.4  | 64.2±0.5  | 40.4±0.5   | 65.3±0.4    | 44.3±0.3   | 61.9±0.4      | 30.3±0.5


Although VOneResNet50 was not designed to explicitly improve robustness to white box attacks, it offers substantial gains over the standard ResNet50 model (Table 1). A closer inspection of the white box accuracy vs. perturbation strength curves (Fig. B1) reveals that VOneResNet50 can withstand perturbations roughly 4 times higher than the standard ResNet50 for the same performance level (the blue curve is shifted to the right of the black curve by a factor of 4). To help visualize how this difference translates into actual images, we show in Fig. B3 examples of white box adversarial images for both ResNet50 and VOneResNet50, for the perturbation strengths that bring the respective models to nearly chance performance.

Figure B3: Comparison of adversarial images for ResNet50 and VOneResNet50. Visualization of white box attacks under the 3 different norm constraints ($L_\infty$, $L_2$, and $L_1$) for the perturbation strength that brings the model to nearly chance performance (Fig. B1). Left: ResNet50; right: VOneResNet50. For each attack, the perturbation is shown on the left, scaled for visualization, and the perturbed image appears next to it. Perturbation strengths and classes of the adversarial images are shown over the images.


B.2 Additional attack controls

The field of adversarial robustness is littered with examples of proposed defense methods that were quickly defeated with some minor attack optimizations or adaptations [68, 84, 97]. For this reason, we took great care to verify our claims of improved robustness of the VOneNets. In addition to the implementation details and controls described in the previous section, we validated our results by further optimizing our attacks on both the standard, stochastic VOneResNet50 and the VOneResNet50 with stochasticity removed during the attack and inference.

First, on the standard, stochastic VOneResNet50, we ensured that our attack hyper-parameters, including step-size ($\alpha$), number of gradient samples ($k$), and PGD iterations, are sufficient for evaluating the network's robustness. For PGD with $\delta_\infty = 1/255$ with 64 iterations, we performed a large grid search over $\alpha$ and $k$, observing only a marginal increase in attack effectiveness (Fig. B4). For the most effective step size ($\alpha = \epsilon/8$), increasing the number of gradient samples beyond $k = 16$ no longer decreases accuracy under the attack. At $k = 128$, VOneResNet50 accuracy is only reduced from 29.15% to 26.0%, remaining a large margin above the ResNet50 accuracy of 0.8%. Additionally, increasing the number of PGD iterations all the way up to 1,000 (with $\alpha = \epsilon/8$ and $k = 16$) only further reduced performance by 0.5%. Finally, using 5-10 random restarts did not improve the attack beyond the most effective $k$ and $\alpha$ settings.

Figure B4: Further attack optimization does not overturn results. White box accuracy for PGD with $\delta_\infty = 1/255$, 64 iterations, on 500 images. A large grid search over step size and number of gradient samples only marginally improves attack effectiveness; further attack optimization only marginally reduces model performance. Lines represent mean and shaded areas SD (n=3 seeds).

Second, while boundary attacks are not ideal against the standard, stochastic VOneResNet50 due to the shifting boundary line, we investigated VOneResNet50 without stochasticity during attack/inference using the Brendel and Bethge (B&B) attack [10], a powerful white box adversary that aims to find minimal adversarial perturbations along the decision boundary of the network. We reasoned that if the B&B attack offers similar results to those observed using PGD, this is a good sanity check that our observed robustness is reliable and not a result of masked gradients around the input or a quirk of PGD. Using the FoolBox [98] implementation of the $L_\infty$ B&B attack, initialized with 500 successful adversarial images generated using PGD $\delta_\infty = 8/255$, we see nearly identical scores as compared to standard PGD with optimal hyper-parameters at $\delta_\infty = 1/1020$ and $\delta_\infty = 1/255$.

We conclude that these additional controls do not qualitatively change our reported results and provide further evidence that the observed improved robustness of the VOneNets is real and not an artifact.


B.3 Common Image Corruptions

For evaluating model robustness to common image corruptions, we used ImageNet-C, a dataset publicly available at https://github.com/hendrycks/robustness [13]. It consists of 15 different types of corruptions, each at five levels of severity, for a total of 75 different perturbations (Fig. B5). The corruption types fall into four categories: noise, blur, weather, and digital effects. Noise includes Gaussian noise, shot noise, and impulse noise; blur includes motion blur, defocus blur, zoom blur, and glass blur; weather includes snow, frost, fog, and brightness; digital effects includes JPEG compression, elastic transformations, contrast, and pixelation. For every corruption type and at every severity, we tested on all 50,000 images from the ImageNet validation set. We note that these images have been JPEG compressed by the original authors for sharing, which is known to have a minor but noticeable effect on final network performance.
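A hedged sketch of the per-corruption evaluation loop, assuming ImageNet-C has been extracted to a directory tree of the form `<root>/<corruption>/<severity>/<class>/...` as distributed by the original authors; the paths, batch size, and the preprocessing (matching the normalization described in Section C.5) are illustrative.

```python
import torch
from torchvision import datasets, transforms

preprocess = transforms.Compose([
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5]),
])

def corruption_accuracy(model, root, corruption, severity, device="cuda"):
    """Top-1 accuracy on one ImageNet-C corruption type at one severity."""
    ds = datasets.ImageFolder(f"{root}/{corruption}/{severity}", preprocess)
    loader = torch.utils.data.DataLoader(ds, batch_size=256, num_workers=8)
    correct = total = 0
    model.eval()
    with torch.no_grad():
        for x, y in loader:
            pred = model(x.to(device)).argmax(dim=1)
            correct += (pred == y.to(device)).sum().item()
            total += y.numel()
    return correct / total
```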

Figure B5: Common image corruptions with intermediate severity. Visualization of all 15 common image corruption types, evaluated at severity = 3. First row: original image followed by the noise corruptions; second row: blur corruptions; third row: weather corruptions; fourth row: digital corruptions.


Figure B6: VOneResNet50 improves robustness to noise, blur, and digital corruptions. Corruption accuracy vs. severity curves for the four categories of common image corruptions: noise, blur, weather, and digital. Adding the VOneBlock to ResNet50 improves robustness to all corruption categories except weather. The adversarially trained model (ATL∞) is considerably worse at common image corruptions. Adversarial noise combined with Stylized ImageNet training (ANT3×3+SIN) is consistently more robust to all corruption categories. For ResNet50 and VOneResNet50, error-bars represent SD (n=3 seeds).

Table B2: Detailed common image corruption results. Adding the VOneBlock to ResNet50 improves robustness to several of the corruption types. However, for others, particularly fog and contrast, robustness decreases. Interestingly, these are the corruption types on which the adversarially trained model (ATL∞) also does the worst. Adversarial noise combined with Stylized ImageNet training (ANT3×3+SIN) is the most robust model for most of the corruption types. For ResNet50 and VOneResNet50, values are mean and SD (n=3 seeds).

Noise and blur corruptions [%]:

Model            | Gaussian | Shot     | Impulse  | Defocus  | Glass    | Motion   | Zoom
ResNet50 [4]     | 29.6±1.1 | 27.9±0.9 | 24.5±1.2 | 38.8±0.7 | 26.1±0.6 | 37.9±0.4 | 34.5±0.7
ATL∞ [76]        | 23.6     | 22.3     | 14.8     | 24.4     | 34.3     | 33.2     | 34.9
ANT3×3+SIN [59]  | 61.6     | 60.7     | 59.9     | 41.5     | 38.6     | 46.0     | 37.3
VOneResNet50     | 34.6±1.4 | 33.4±1.2 | 31.9±1.5 | 37.8±0.1 | 35.7±0.1 | 37.4±0.2 | 34.0±0.2

Weather and digital corruptions [%]:

Model            | Snow     | Frost    | Fog      | Bright   | Contrast | Elastic  | Pixelate | JPEG
ResNet50         | 30.1±0.5 | 36.7±0.4 | 43.6±0.8 | 66.8±0.2 | 38.8±0.5 | 44.8±0.3 | 47.0±2.7 | 55.1±1.0
ATL∞             | 29.5     | 31.2     | 8.4      | 54.1     | 10.2     | 48.4     | 56.3     | 58.7
ANT3×3+SIN       | 47.7     | 52.2     | 54.9     | 68.2     | 49.3     | 52.7     | 58.9     | 59.3
VOneResNet50     | 25.2±0.5 | 36.8±0.4 | 30.1±0.5 | 62.4±0.3 | 28.5±0.5 | 48.7±0.1 | 63.3±0.2 | 61.0±0.3


B.4 Other defense methods

We compared the performance of VOneResNet50 on the described computer vision benchmarks with two training-based defense methods using the ResNet50 architecture: adversarial training with a $\delta_\infty = 4/255$ constraint (ATL∞) [76], and adversarial noise with Stylized ImageNet training (ANT3×3+SIN) [59].

The ATL∞ model was downloaded from the publicly available adversarially trained models at https://github.com/MadryLab/robustness [76]. This model was trained following the adversarial training paradigm of Madry et al. [17] to solve the min-max problem

$$\min_\theta \rho(\theta), \quad \text{where} \quad \rho(\theta) = \mathbb{E}_{(x,y)\sim\mathcal{D}}\left[\max_{\delta\in\mathcal{S}} L(\theta, x+\delta, y)\right] \qquad (1)$$

Here, the goal is to learn parameters $\theta$ minimizing the loss $L$ for training images $x$ and labels $y$ drawn from $\mathcal{D}$, while perturbing $x$ with $\delta \in \mathcal{S}$ to maximally increase the loss. In this case, $\mathcal{D}$ is ImageNet and PGD with $\delta_\infty = 4/255$ is used to find the perturbation maximizing the loss for a given $(x, y)$ pair. The $\delta_\infty = 4/255$ constraint model was selected to compare against because it had the best performance on our conglomerate benchmark of adversarial attacks and common image corruptions.
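A minimal sketch of one training step under Eq. (1), reusing the `pgd_linf` sketch from Section B.1 for the inner maximization; the number of inner PGD steps and the step size are illustrative, not the settings used by Engstrom et al. [76].

```python
def adversarial_training_step(model, optimizer, x, y,
                              eps=4 / 255, alpha=1 / 255, inner_steps=7):
    """One step of min-max adversarial training (Eq. 1)."""
    model.eval()
    # inner maximization: find a loss-maximizing perturbation within S
    x_adv = pgd_linf(model, x, y, eps=eps, alpha=alpha, n_iter=inner_steps)
    model.train()
    # outer minimization: update theta on the adversarial examples
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```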

The ANT3×3+SIN model was obtained from https://github.com/bethgelab/game-of-noise. Recently, Rusak et al. showed that training a ResNet50 model with several types of additive input noise improved the robustness to common image corruptions [59]. Inspired by this observation, the authors decided to train a model while optimizing a noise distribution that maximally confuses it. Similarly to standard adversarial training, this results in solving a min-max problem

$$\min_\theta \max_\phi \mathbb{E}_{(x,y)\sim\mathcal{D}}\, \mathbb{E}_{\delta\sim p_\phi(\delta)}\left[L(\theta, x+\delta, y)\right] \qquad (2)$$

where $p_\phi(\delta)$ is the maximally confusing noise distribution. The main difference to regular adversarial training in Equation 1 is that, while in the former $\delta$ is optimized directly, here $\delta$ is found by optimizing a constrained distribution with local spatial correlations. Complementing the adversarial noise training with Stylized ImageNet training [25] produced the model with the current best robustness on the ImageNet-C benchmark as a standalone method.


C VOneNet implementation details

C.1 Convolutional layer

The convolutional layer of the VOneBlock is a mathematically parameterized Gabor filter bank (GFB). We set the stride of the GFB to four, originating a 56×56 spatial map of activations. Since the number of channels in most CNNs' first convolution is relatively small (64 in the architectures adapted), we used a larger number in the VOneBlock so that the Gabors would cover the large parameter space and better approximate primate V1. We set the main VOneNet models to contain 512 channels, equally split between simple and complex cells (see Fig. C2 A for an analysis of how the number of channels affects performance). Each channel in the GFB convolves a single color channel from the input image.

The Gabor function consists of a two-dimensional grating with a Gaussian envelope and is described by the following equation:

$$G_{\theta,f,\phi,n_x,n_y}(x, y) = \frac{1}{2\pi\sigma_x\sigma_y} \exp\left[-0.5\left(\frac{x_{rot}^2}{\sigma_x^2} + \frac{y_{rot}^2}{\sigma_y^2}\right)\right] \cos\left(2\pi f x_{rot} + \phi\right) \qquad (3)$$

where

$$x_{rot} = x\cos(\theta) + y\sin(\theta), \qquad y_{rot} = -x\sin(\theta) + y\cos(\theta) \qquad (4)$$

$$\sigma_x = \frac{n_x}{f}, \qquad \sigma_y = \frac{n_y}{f} \qquad (5)$$

Here, $x_{rot}$ and $y_{rot}$ are the orthogonal and parallel orientations relative to the grating, $\theta$ is the angle of the grating orientation, $f$ is the spatial frequency of the grating, $\phi$ is the phase of the grating relative to the Gaussian envelope, and $\sigma_x$ and $\sigma_y$ are the standard deviations of the Gaussian envelope orthogonal and parallel to the grating, which can be defined as multiples ($n_x$ and $n_y$) of the grating cycle (the inverse of the frequency).
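A sketch of how a single GFB kernel can be built from Eqs. (3)-(5); the pixel grid construction and the pixels-per-degree conversion (28 px/deg, from 224 px spanning the 8 deg field of view) are assumptions for illustration.

```python
import numpy as np

def gabor_kernel(theta, f, phi, nx, ny, ksize=25, ppd=28.0):
    """One Gabor kernel. theta, phi in radians; f in cycles/deg;
    ppd = pixels per degree of visual angle."""
    sx, sy = nx / f, ny / f                        # Eq. (5), in degrees
    half = (ksize - 1) / 2.0
    coords = (np.arange(ksize) - half) / ppd       # pixel grid in degrees
    x, y = np.meshgrid(coords, coords)
    x_rot = x * np.cos(theta) + y * np.sin(theta)  # Eq. (4)
    y_rot = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-0.5 * (x_rot**2 / sx**2 + y_rot**2 / sy**2))
    grating = np.cos(2 * np.pi * f * x_rot + phi)
    return envelope * grating / (2 * np.pi * sx * sy)  # Eq. (3)
```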

Although the Gabor filter greatly reduces the number of parameters defining the linear spatial component of V1 neurons, it still has five parameters per channel. Fortunately, there is a vast literature in neurophysiology with detailed characterizations of primate V1 response properties, which can be used to constrain these parameters. To instantiate a VOneBlock with $C_{V1}$ channels, we sampled $C_{V1}$ values for each of the parameters according to an empirically constrained distribution (Table C1). Due to the resolution of the input images, we limited the ranges of spatial frequencies ($f < 5.6$ cpd) and number of cycles ($n > 0.1$).

Table C1: Empirical distributions used to sample the GFB parameters in the VOneBlock.

Parameter | Reference
θ         | De Valois et al. [78], Fig. 5: orientation tuning (foveal)
f         | De Valois et al. [79], Fig. 6: peak spatial frequency (X cells, foveal)
φ         | Uniform distribution ∈ [0, 360] deg
nx, ny    | Ringach 2002 [80], Fig. 4: (nx, ny) joint distribution

Critical to the correct implementation of the biologically-constrained parameters of the GFB is the choice of the model's field of view in degrees. As previously mentioned in Section A, we used 8 deg as the input spatial extent for all CNN models. It is important to note that the set of spatial filters in the VOneBlock's GFB differs considerably from those in the first convolution of most CNNs (Fig. C1). While standard CNNs learn filters that resemble Gabors in their input layer [1, 99, 100], due to the limited span of their kernels, they do not vary significantly in size and spatial frequency. V1 neurons, on the other hand, are known to exhibit a wide range of receptive field properties. This phenomenon is captured in the VOneBlock, with the spatial frequencies and receptive field sizes of Gabors ranging over more than one order of magnitude. Due to this high variability, we set the convolution kernel to be 25×25 px, which is considerably larger than those in standard CNNs (Fig. C1).


Figure C1: Comparison between filters in the first convolution of standard CNNs and the VOneBlock. Example filters from the first convolution in the VOneBlock, ResNet50, CORnet-S, and AlexNet (from left to right). VOneBlock filters are all parameterized as Gabors, varying considerably in size and spatial frequency. Standard CNNs have some filters with shapes other than Gabors, particularly center-surround, and are limited in size by their small kernel. Kernel sizes are 25 px, 7 px, 7 px, and 11 px for the VOneBlock, ResNet50, CORnet-S, and AlexNet, respectively.


C.2 Nonlinear layer

The VOneBlock's nonlinear layer has two different nonlinearities that are applied to each channel depending on its cell type: a rectified linear transformation for simple cells (6), and the spectral power of a quadrature phase-pair (7) for complex cells:

$$S^{nl}_{\theta,f,\phi,n_x,n_y} = \begin{cases} S^{l}_{\theta,f,\phi,n_x,n_y} & \text{if } S^{l}_{\theta,f,\phi,n_x,n_y} > 0 \\ 0 & \text{otherwise} \end{cases} \qquad (6)$$

$$C^{nl}_{\theta,f,\phi,n_x,n_y} = \frac{1}{\sqrt{2}}\sqrt{\left(C^{l}_{\theta,f,\phi,n_x,n_y}\right)^2 + \left(C^{l}_{\theta,f,\phi+\pi/2,n_x,n_y}\right)^2} \qquad (7)$$

where $S^l$ and $S^{nl}$ are the linear and nonlinear responses of a simple neuron, and $C^l$ and $C^{nl}$ are the same for a complex neuron.
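The two nonlinearities in Eqs. (6)-(7) amount to a few tensor operations; a sketch, with tensor names (`s_l` for the simple-cell linear response, `c_l_0`/`c_l_90` for the quadrature phase-pair) chosen for illustration:

```python
import torch

def simple_cells(s_l):
    # Eq. (6): rectified linear response of simple cells
    return torch.relu(s_l)

def complex_cells(c_l_0, c_l_90):
    # Eq. (7): energy of two Gabor responses whose phases differ by pi/2
    return torch.sqrt(c_l_0**2 + c_l_90**2) / (2 ** 0.5)
```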

C.3 Neuronal stochasticity generator

In awake monkeys, spike trains of V1 neurons are approximately Poisson, i.e., the variance and mean of spike counts in a given time-window over a set of repetitions are roughly the same [82]. We incorporated stochasticity into the VOneBlock to emulate this property of neuronal responses. Since the Poisson distribution is not continuous, it breaks the gradients in a white box attack, giving a false sense of robustness [68]. In order to avoid this situation and facilitate the evaluation of the model's real robustness, our neuronal stochasticity generator, as implemented, uses a continuous second-order approximation of Poisson noise by adding Gaussian noise with variance equal to the activation:

$$R_s \sim \mathcal{N}(\mu = R_{ns}, \sigma^2 = R_{ns}) \qquad (8)$$

where $R_{ns}$ and $R_s$ are the non-stochastic and stochastic responses of a neuron.

To approximate the same levels of neuronal stochasticity as primate V1 neurons, it is critical that the VOneBlock activations are in the same range as the V1 neuronal responses (number of spike counts in a given time-window). Thus, we applied an affine transformation to the activations so that both the mean stimulus response and the mean baseline activity are the same as those of a population of V1 neurons (Table C2 shows the mean responses and spontaneous activity of V1 neurons measured in different time-windows).

Table C2: Mean stimulus responses and spontaneous activity of a population of V1 neurons for different time-windows.

Time-window [ms] | Stimulus response [spikes] | Spontaneous activity [spikes]
25               | 0.324                      | 0.073
50               | 0.655                      | 0.145
100              | 1.05                       | 0.29
200              | 1.38                       | 0.58

Since the exact time-window over which responses in V1 are integrated during visual perception is still an open question, we considered a time-window of 25 ms (Fig. C2 B). In order to keep the outputs of the VOneBlock in a range that does not deviate considerably from the typical range of activations in CNNs, so that the model can be efficiently trained using standard training parameters, we applied the inverse transformation to scale the outputs back to their original range after the stochastic layer.
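Putting the affine scaling, the reparameterized noise of Eq. (8), and the inverse transformation together, a sketch in which `scale` and `offset` stand in for the affine transform fit to the Table C2 statistics (the clamp inside the square root is a numerical-safety assumption, not from the paper):

```python
import torch

def v1_stochasticity(x, scale, offset):
    # map activations into the V1 spike-count range (Table C2 statistics)
    r = x * scale + offset
    # reparameterized Gaussian approximation of Poisson noise, Eq. (8):
    # N(mu = r, sigma^2 = r); sampling via randn keeps gradients intact
    r = r + torch.sqrt(torch.clamp(r, min=0.0)) * torch.randn_like(r)
    # undo the scaling so downstream layers see the usual activation range
    return (r - offset) / scale
```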

C.4 Hyper-parameters and other design choices

We developed VOneNets from three different standard CNN architectures: ResNet50, CORnet-S, and AlexNet. As previously mentioned, we replaced the first block of each architecture by the VOneBlock and the transition layer to create the respective VOneNet. Table C3 contains the layers that were removed for each base architecture. Except for CORnet-S, the layers removed only contained a single convolution, nonlinearity, and maxpool. Since CORnet-S already had a pre-committed set of layers for V1, we replaced them by the VOneBlock. The torchvision implementation of AlexNet contains a combined stride of eight in the removed layers (four in the convolution and two in the maxpool), followed by a stride of one in the second convolution (outside of the removed layers). In order to more easily adapt it to a VOneBlock with a stride of four, we slightly adapted AlexNet's architecture so that it had strides of two in these three layers (first convolution, first maxpool, and second convolution). The results shown in Fig. 2 were obtained using this modified architecture.

Table C3: Layers removed from each base CNN architecture for creating the VOneNets.

Architecture | Layers removed
AlexNet      | Conv, ReLU, MaxPool
ResNet50     | Conv, Batch-Norm, ReLU, MaxPool
CORnet-S     | Conv, Batch-Norm, ReLU, MaxPool, Conv, Batch-Norm, ReLU

The VOneNet architecture was designed to have as few hyper-parameters as possible. When possible, these were either constrained by the base CNN architecture or by neurobiological data: most parameters of the GFB were instantiated by sampling from neuronal distributions of primate V1 neurons; the kernel size was set to 25×25 px to capture the high variability in Gabor sizes; and, like all other CNN models, the spatial extent of the field of view was set to 8 deg (Section A). Nevertheless, there were two hyper-parameters where the choice was rather arbitrary: the number of channels in the VOneBlock, and the time-window for integrating V1 responses used to scale the activations prior to the stochasticity generator.

Since primate V1 has neurons tuned to a wide range of spatial frequencies for each orientation, and at any given location of the visual space [101], we expected that a large number of channels would be required to cover all of the combinations of Gabor parameters. For the VOneResNet50 architecture, we varied the number of channels in the VOneBlock between 64 and 1024, equally split between simple and complex cells, and measured the performance of the several models after training (Fig. C2 A). Clean ImageNet and common image corruption performance improved with channel number. Remarkably, the opposite happened for white box accuracy, with variants of VOneResNet50 with 64 and 128 channels achieving the highest robustness to white box attacks. This result is particularly interesting since it had been shown that increasing the number of channels in all layers of a CNN improves robustness to white box attacks in both standard and adversarially trained models [17]. Therefore, this shows that the improvement in white box robustness of VOneNets, when compared to their base models, cannot be attributed to the higher number of channels. For the main model, we set the channel number to 512, as it offered a good compromise between the different benchmarks. Regarding the time-window for integrating V1 responses, we observed small effects on the models' performance when varying it between 25 and 200 ms. We chose 25 ms for the main model due to a small trend for higher robustness to white box attacks with shorter time-windows.

Figure C2: Exploration of the VOneBlock's hyper-parameters in the VOneResNet50 architecture. A: Performance of different VOneResNet50 models with varying number of channels in the VOneBlock (with a 25 ms time-window). B: Same as in A, but varying the time-window for integrating neuronal activity (with 512 channels). Error-bars show SD (n=3 seeds). Red arrows represent the values used in the main model.


C.5 Training details

We used PyTorch 0.4.1 and trained the model using ImageNet 2012 [34]. Images were preprocessed (1) for training: random crop to 224×224 pixels and random flipping left and right; (2) for validation: central crop to 224×224 pixels. Preprocessing was followed by normalization: subtraction and division by [0.5, 0.5, 0.5]. We used a batch size of 256 images and trained either on 2 GPUs (NVIDIA Titan X / GeForce 1080Ti) or 1 GPU (Quadro RTX6000 or V100) for 70 epochs. We used step learning rate scheduling: a starting learning rate of 0.1, divided by 10 every 20 epochs. For optimization, we used Stochastic Gradient Descent with a weight decay of 0.0001, momentum of 0.9, and a cross-entropy loss between image labels and model predictions (logits).
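The stated optimization setup maps directly onto standard PyTorch components; a sketch, with `model` and `train_loader` (ImageNet with the preprocessing described above) assumed to be constructed elsewhere:

```python
import torch
import torch.nn.functional as F

# SGD with the stated weight decay and momentum, plus a step LR schedule
# (lr 0.1, divided by 10 every 20 epochs, 70 epochs total)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)

for epoch in range(70):
    model.train()
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x.cuda()), y.cuda())
        loss.backward()
        optimizer.step()
    scheduler.step()
```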

Model weights and code are available at https://github.com/dicarlolab/vonenet.


D VOneBlock variant details

To investigate which components of the VOneBlock are responsible for the increased robustness, we created six variants of the VOneBlock by removing or modifying one of its parts. In all cases, we trained from scratch a corresponding variant of VOneResNet50 with the modified VOneBlock. All other properties of the model and components of the VOneBlock were left unmodified. Here we describe in more detail each of these variants (the name of the variant refers to the component removed):

• V1 distributions: the GFB parameters were sampled from uniform distributions with the same domain as the empirical distributions used in the default VOneBlock (Table C1).
• Low SF: when sampling the spatial frequencies for the GFB, neurons with low peak spatial frequencies (f < 2 cpd) were removed from the empirical distribution [79].
• High SF: similar to the previous variant, but removing neurons with high peak spatial frequencies (f > 2 cpd).
• Complex NL: complex cell channels were removed from the VOneBlock and replaced by simple cell channels.
• Simple NL: simple cell channels were removed from the VOneBlock and replaced by complex cell channels.
• V1 stochasticity: the V1 stochasticity generator was removed from the VOneBlock.

While all of the variants resulted in worse overall mean performance than the unmodified VOneResNet50 (Table 2), some improved specific benchmarks: the variant without high SF had higher accuracy under noise and blur corruptions (Table D1); the variant without V1 stochasticity had higher accuracy on clean images and on images with noise and weather corruptions (Tables 2 and D1); and the variant without V1 distributions had higher accuracy on white box attacks with high $L_\infty$ perturbations (Table D2).

Table D1: Difference between the common image corruption accuracies of the several VOneResNet50 variants and the unmodified VOneResNet50. Removal of simple cells reduced accuracy for all common image corruption types. Removal of some components of the VOneBlock improved accuracy for specific common image corruptions: removing high SF consistently improved performance for noise and blur corruptions, and removing V1 stochasticity improved performance for noise and weather corruptions.

Noise and blur corruptions [Δ%]:

Component removed | Gaussian | Shot     | Impulse  | Defocus  | Glass    | Motion   | Zoom
V1 dist           | -2.8±0.9 | -2.4±0.7 | -2.9±1.1 | -1.6±0.1 | 0.4±0.2  | -0.8±0.3 | -0.2±0.5
Low SF            | -1.4±0.5 | -1.0±0.2 | -1.5±0.5 | -0.6±0.6 | -1.1±0.2 | -1.2±0.4 | -0.5±0.7
High SF           | 6.8±1.3  | 7.3±1.1  | 7.0±1.3  | 5.6±0.3  | 11.5±0.3 | 3.7±0.0  | 6.1±0.1
Complex           | 1.0±1.6  | 1.2±1.3  | 0.7±1.8  | 1.2±0.2  | -1.5±0.1 | 0.2±0.8  | 0.4±1.1
Simple            | -6.1±1.6 | -5.4±1.5 | -6.4±1.9 | -4.1±0.4 | -4.2±0.4 | -4.9±0.5 | -5.4±0.3
V1 stoch          | 2.2±2.5  | 2.2±2.5  | 2.1±2.9  | 3.3±0.1  | -5.5±0.1 | 0.9±0.7  | 0.9±0.1

Weather and digital corruptions [Δ%]:

Component removed | Snow     | Frost    | Fog      | Bright   | Contrast | Elastic  | Pixelate | JPEG
V1 dist           | -0.2±0.6 | -0.9±0.6 | -1.9±1.1 | -0.8±0.5 | -1.4±0.9 | 0.3±0.4  | -1.7±0.3 | 0.8±0.5
Low SF            | -0.4±0.3 | -0.6±0.4 | 0.9±0.3  | -0.2±0.2 | -0.4±0.5 | -0.9±0.5 | -1.3±0.2 | -0.5±0.4
High SF           | 1.4±1.0  | -0.7±0.7 | -9.5±0.4 | -5.2±0.4 | -4.4±0.5 | 2.4±0.2  | 1.9±0.4  | -4.6±0.4
Complex           | 0.7±0.4  | 0.4±0.3  | 4.3±0.2  | 1.3±0.2  | 2.9±0.2  | -0.5±0.2 | -1.2±0.0 | -1.9±0.5
Simple            | -5.0±0.5 | -9.0±0.5 | -4.4±0.5 | -3.9±0.2 | -4.8±0.4 | -3.9±0.5 | -3.5±0.4 | -1.8±0.2
V1 stoch          | 2.5±0.4  | 0.3±0.3  | 12.0±0.5 | 2.6±0.3  | 10.0±0.8 | -2.8±0.3 | -4.0±0.7 | -4.0±0.4


Table D2: Difference between the white box attack accuracies of the several VOneResNet50 variants and the unmodified VOneResNet50. Removal of V1 stochasticity considerably reduced accuracy for all white box attacks. Removal of V1 distributions improved accuracy for the $L_\infty$ white box attack with high perturbation strength.

Component removed | PGD-L1 40 | PGD-L1 160 | PGD-L2 0.15 | PGD-L2 0.6 | PGD-L∞ 1/1020 | PGD-L∞ 1/255
V1 dist           | -0.5±0.8  | 0.1±1.1    | -0.8±0.2    | 0.6±1.0    | 0.3±0.3       | 3.0±1.8
Low SF            | -0.8±1.2  | -3.1±1.1   | -0.6±1.0    | -2.3±0.8   | -0.8±1.3      | -1.8±1.0
High SF           | -2.2±0.9  | -4.7±2.6   | -2.6±1.5    | -5.1±1.2   | -2.6±1.4      | -5.5±1.1
Complex           | -1.3±0.9  | -7.4±1.6   | -1.1±0.9    | -7.2±1.2   | -1.9±1.0      | -7.9±1.2
Simple            | -2.5±0.8  | -1.4±1.4   | -2.9±1.1    | -1.7±1.2   | -2.7±1.2      | -0.6±0.8
V1 stoch          | -23.7±0.9 | -36.5±0.8  | -21.6±0.5   | -40.1±0.4  | -32.2±0.4     | -29.2±0.4



at odds with Cadena et al., which showed that a task-optimized CNN marginally outperformed a GFB model in explaining V1 responses [30]. However, our GFB has parameters constrained by empirical data, resulting in a better model of V1: when we use the parameters of the GFB in Cadena et al., we obtained a much lower explained variance (0.296±0.005). Our results suggest that a well-tuned GFB and adversarially trained CNNs are currently the best models in predicting primate V1 responses.

4 Results

4.1 Simulating a V1 at the front of CNNs improves robustness to white box attacks

To evaluate the VOneNets' robustness to white box adversarial attacks, we used untargeted PGD with $L_\infty$, $L_2$, and $L_1$ norm constraints of $\delta_\infty \in [1/1020, 1/255]$, $\delta_2 \in [0.15, 0.6]$, and $\delta_1 \in [40, 160]$. All VOneNets were attacked end-to-end, with gradients propagated through the VOneBlock to the pixels. Because the VOneBlock is stochastic, we took extra care when attacking VOneNets. We use the reparameterization trick when sampling VOneBlock stochasticity, allowing us to keep the gradients intact [83]. Further, to combat the effects of noise in the gradients, we adapted our attack such that at every PGD iteration we take 10 gradient samples and move in the average direction to increase the loss [68] (Supplementary Table B1). To verify the effectiveness of our attacks, we used controls suggested by Carlini et al. 2019 [84]; in particular, we show that increasing the magnitude of the norm of the attack monotonically decreases VOneNet accuracy, eventually reaching 0% accuracy (Supplementary Fig. B1). Additionally, increasing the number of PGD steps from 1 to

Figure 2: Simulating V1 at the front of CNNs improves robustness to white box attacks. A: A VOneNet consists of a model of primate V1, the VOneBlock, followed by a standard CNN architecture. The VOneBlock contains a convolutional layer (a GFB with fixed weights constrained by empirical data), a nonlinear layer (simple or complex cell nonlinearities), and a stochastic layer (V1 stochasticity generator with variance equal to mean). B: Top, comparison of accuracy (top-1) on low strength white box attacks (PGD with $\delta_\infty = 1/1020$, $\delta_2 = 0.15$, $\delta_1 = 40$) between three base models (ResNet50, CORnet-S, and AlexNet) and their corresponding VOneNets. Gray bars show the performance of base models. Blue bars show the improvements from the VOneBlock. Dashed lines indicate the performance on clean images. Bottom, same but for white box attacks of higher strength (PGD with $\delta_\infty = 1/255$, $\delta_2 = 0.6$, $\delta_1 = 160$). VOneNets show consistently higher white box accuracy, even for perturbation strengths that reduce the performance of the base models to nearly chance. Error-bars represent SD (n=3 seeds).


many strictly increases the effect of the attack under a given norm size (Supplementary Fig. B2). Both of these indicate that VOneNet gradients contain useful information for the construction of adversarial examples. For more details and controls, see Supplementary Sections B.1 and B.2.

We found that simulating a V1 front-end substantially increased the robustness to white box attacks for all three base architectures that we tested (ResNet50, CORnet-S, and AlexNet), while leaving clean ImageNet performance largely intact (Fig. 2 B). This was particularly evident for the stronger perturbation attacks, which reduced the accuracy of the base models to nearly chance (Fig. 2 B, bottom). This suggests that the VOneBlock works as a generic front-end, which can be transferred to a variety of different neural networks as an architectural defense to increase robustness to adversarial attacks.

4.2 VOneNets outperform state-of-the-art methods on a composite set of perturbations

We then focused on the ResNet50 architecture and compared VOneResNet50 with two state-of-the-art training-based defense methods: adversarial training with a $\delta_\infty = 4/255$ constraint (ATL∞) [76], and adversarial noise with Stylized ImageNet training (ANT3×3+SIN) [59]. Because white box adversarial attacks are only part of the issue of model robustness, we considered a larger panel of image perturbations containing a variety of common corruptions. For evaluating model performance on corruptions, we used the ImageNet-C dataset [13], which consists of 15 different corruption types, each at 5 levels of severity, divided into 4 categories: noise, blur, weather, and digital (see Supplementary Fig. B5 for example images and Supplementary Section B.3 for more details).

As expected, each of the defense methods had the highest accuracy under the perturbation type that it was designed for, but did not considerably improve over the base model on the other (Table 1). While the ATL∞ model suffered substantially on corruptions and clean performance, the ANT3×3+SIN model, the current state-of-the-art for common corruptions, had virtually no benefit on white box attacks. On the other hand, VOneResNet50 improved on both perturbation types, outperforming all the models on perturbation mean (average of white box and common corruptions) and overall mean (average of clean, white box, and common corruptions), with a difference to the second best model of 3.3% and 5.3%, respectively. Specifically, VOneResNet50 showed substantial improvements for all the white box attack constraints (Supplementary Fig. B1), and more moderate improvements for common image corruptions of the categories noise, blur, and digital (Supplementary Table B2 and Fig. B6). These results are particularly remarkable since VOneResNet50 was not optimized for robustness and does not benefit from any computationally expensive training procedure like the other defense methods. When compared to the base ResNet50, which has an identical training procedure, VOneResNet50 improves 18% on perturbation mean and 10.7% on overall mean (Table 1).

Table 1: VOneResNet50 outperforms other defenses on perturbation mean and overall mean. Accuracy (top-1) on overall mean, perturbation mean, white box attacks, common corruptions, and clean images for standard ResNet50 and three defense methods: adversarial training, adversarial noise combined with Stylized ImageNet training, and VOneResNet50. Perturbation mean is the average accuracy over white box attacks and common image corruptions. The overall mean is the average accuracy over the two perturbation types and clean ImageNet. Values are reported as mean and SD (n=3 seeds).

Model               Overall Mean [%]   Perturbation Mean [%]   White box [%]   Corruption [%]   Clean [%]
Base ResNet50 [4]   43.6±0.0           27.6±0.1                16.4±0.1        38.8±0.3         75.6±0.1
ATL∞ [76]           49.0               42.3                    52.3            32.3             62.4
ANT3×3+SIN [59]     48.0               34.9                    17.3            52.6             74.1
VOneResNet50        54.3±0.1           45.6±0.2                51.1±0.4        40.0±0.3         71.7±0.1

4.3 All components of the VOneBlock contribute to improved model robustness

Since the VOneBlock was not explicitly designed to increase model robustness, but rather to approximate primate V1, we investigated which of its components are responsible for the increased robustness we observe. We performed a series of experiments wherein six new VOneNet variants were created by removing or modifying a part of the VOneBlock (referred to as VOneBlock variants). After ImageNet training, model robustness was evaluated as before on our panel of white box attacks and image corruptions. Three variants targeted the GFB: one sampling the Gabor parameters from uniform distributions instead of those found in primate V1, another without high spatial frequency (SF) filters (retaining only f < 2 cpd), and another without low SF filters (retaining only f > 2 cpd). Two additional variants targeted the nonlinearities: one without simple cells and another without complex cells. Finally, the sixth variant had the V1 neuronal stochasticity generator removed. Even though all VOneBlock variants are poorer approximations of primate V1, all resulting VOneResNet50 variants still had improved perturbation accuracy when compared to the base ResNet50 model (Fig. 3A). On the other hand, all variants except that with uniformly sampled Gabor parameters showed significant deficits in robustness compared to the unmodified VOneResNet50 (Fig. 3A, drops in perturbation accuracy between 1% and 15%).

Interestingly, we observed that some of these changes had a highly specific effect on the type of perturbation robustness affected (Table 2 and Supplementary Section D). Removing high SF Gabors negatively affected both white box and clean accuracy, while actually improving robustness to common image corruptions, particularly those of the noise and blur categories (Supplementary Table D1). Removing complex cells only impaired white box robustness, as opposed to removing simple cells, which was particularly detrimental to performance on image corruptions. Finally, removing V1 stochasticity considerably decreased white box accuracy while improving accuracy for both clean and corrupted images.

The VOneBlock variant without V1 stochasticity suffered the most dramatic loss in robustness. This is not altogether surprising, as several approaches to adversarial robustness have focused on noise as a defense [15, 19, 66]. To investigate whether V1 stochasticity alone accounted for the majority of the robustness gains, we evaluated the perturbation accuracy of a ResNet50 model with V1 stochasticity added at the output of its first block. Neuronal stochasticity was implemented exactly the same way as in the VOneBlock, by first applying an affine transformation to scale the activations so that they match primate V1 neuronal activity. Like the VOneResNet50, this model had stochasticity during training and inference, and showed a considerable improvement in robustness compared to the standard ResNet50.


Figure 3: All components of the VOneBlock work in synergy to improve robustness to image perturbations. A. Perturbation mean accuracy (top-1) for VOneResNet50 and several variations, each with a component of the VOneBlock removed or altered. From left to right: unmodified VOneResNet50, model with Gabor parameters sampled uniformly (within biological ranges), model without high SF Gabors, model without low SF Gabors, model without complex cells, model without simple cells, model without V1 stochasticity. Gray bars show the performance of ResNet50. Blue bars show the improvements due to the VOneBlock. Dashed line indicates the accuracy of the unmodified model. Error-bars represent SD (n=3 seeds). B. Same as in A, but comparing ResNet50, VOneResNet50 without V1 stochasticity, ResNet50 with V1 stochasticity added after the first block, and VOneResNet50. Adding V1 stochasticity to ResNet50 accounts for less than half of the total improvement of VOneResNet50, demonstrating a supralinear interaction between the V1 features and V1 stochasticity.


Table 2: Removal of some VOneBlock components causes impairments with high specificity. Difference in accuracy (top-1) relative to the unmodified VOneResNet50 for each of the variants with removed components, on overall mean, white box attacks, common corruptions, and clean images. Values are reported as mean and SD (n=3 seeds).

Component removed   Overall Mean [Δ%]   White box [Δ%]   Corruption [Δ%]   Clean [Δ%]
V1 dist             -0.4±0.3            0.4±0.8          -1.1±0.3          -0.5±0.3
High SF             -2.2±0.2            -3.8±0.3         1.9±0.4           -4.7±0.2
Low SF              -0.7±0.5            -1.6±1.1         -0.7±0.1          0.1±0.3
Complex             -1.0±0.6            -4.5±1.1         0.6±0.4           0.8±0.3
Simple              -3.0±0.5            -2.0±1.1         -4.8±0.5          -2.1±0.3
V1 stoch            -8.8±0.3            -30.6±0.4        1.5±0.5           2.6±0.1

However, this improvement accounted for only a fraction of the total gains of the VOneResNet50 model (Fig. 3B), demonstrating that there is a substantial supralinear interaction between the V1 features and the neuronal stochasticity. Merely adding V1 stochasticity to the first block of a standard CNN model does not increase robustness to the same degree as the full VOneBlock—the presence of V1 features more than doubles the contribution to perturbation robustness brought by the addition of neuronal stochasticity.
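As a concrete illustration of this control, below is a minimal PyTorch sketch of a Poisson-like stochastic layer of the kind described: activations are affinely scaled and then corrupted with Gaussian noise whose variance equals the mean. The variance-equals-mean form and the scale/offset constants are our assumptions here, not the paper's fitted values.

    import torch
    import torch.nn as nn

    class V1Stochasticity(nn.Module):
        """Additive Gaussian noise with variance equal to the (scaled) activation,
        approximating Poisson-like V1 spiking variability. Active during training
        AND inference, as described in the text."""
        def __init__(self, scale=1.0, offset=0.0):  # placeholder affine constants
            super().__init__()
            self.scale, self.offset = scale, offset

        def forward(self, x):
            rate = torch.relu(self.scale * x + self.offset)   # non-negative "rates"
            return rate + torch.randn_like(rate) * rate.sqrt()  # var == mean

Wrapping the output of a standard ResNet50's first block with such a layer reproduces the control condition compared in Fig. 3B.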

Finally, we sought to determine whether stochasticity during inference is key to defending against attacks. Thus, we analyzed the white box adversarial performance of VOneResNet50 while quenching stochasticity during the adversarial attack (Fig. 4). Remarkably, the majority of improvements in adversarial robustness originate from the neuronal stochasticity during training, indicating that V1 stochasticity induces the downstream layers to learn representations that are more robust to adversarial attacks. This is particularly interesting when contrasted with the ANT3×3+SIN defense, which has noise added to input images during training but does not learn representations with notably higher robustness to white box adversarial attacks (Table 1 and Supplementary Fig. B1).


Figure 4: V1 stochasticity during training leads to more robust learned representations. White box accuracy (top-1) for the VOneResNet50 variant without V1 stochasticity, VOneResNet50 with stochasticity removed during the white box attack and inference, and VOneResNet50. VOneResNet50 without stochasticity during attack/inference maintained over half of the robustness gains. Therefore, V1 stochasticity improves model robustness in two ways: rendering the attack itself less effective and inducing more robust representations downstream of the VOneBlock, the latter accounting for the majority of the gains on our benchmark. Gray bars show the performance of the standard ResNet50. Blue bars show the improvements due to the VOneBlock. Error-bars represent SD (n=3 seeds).

5 Discussion

In this work, we demonstrate that adversarial robustness in CNNs is correlated with their ability to explain primate V1 neuronal responses, and that simulating the image processing of primate V1 at the front of standard neural network architectures significantly improves their robustness to image perturbations. Notably, this approach outperformed state-of-the-art defense methods on a large benchmark consisting of adversarial attacks and common image corruptions. Despite not being constructed to this end, any component of the model front-end that we removed or modified to be less like primate V1 resulted in less overall robustness, and revealed that different components improve robustness to different perturbation types. Remarkably, we find that simulated V1 stochasticity interacts synergistically with the V1 model features and drives the downstream layers to learn representations more robust to adversarial perturbations.

Our approach bears some similarity to the pioneering study by Li et al. [40], in which a model's representations were regularized to approximate mouse V1, increasing its robustness on the gray-scale CIFAR dataset; however, here we go further in several key ways. First, while the mouse is gaining traction as a model for studying vision, a vast literature has established macaque vision as a quantitatively accurate model of human vision in general, and human object recognition in particular [85, 86]. Visual acuity in mice is much lower than in macaques and humans [87, 79], suggesting that vision in the mouse may serve different behavioral functions than in primates. Further, the regularization approach employed by Li et al. does not allow a clear disambiguation of which aspects of mouse V1 contribute to the improved robustness. Since the components of the VOneBlock proposed here are mappable to the brain, we can dissect the contributions of different neuronal populations in primate V1 to robustness against specific image perturbations. Finally, extending the robustness gains of biologically-constrained CNNs from gray-scale CIFAR to the full ImageNet dataset is a critical step towards real-world, human-level applications.

The gains achieved by VOneNets are substantial, particularly against white box attacks, and have tangible benefits over other defense methods. Though adversarial training still provides the strongest defense for the attack statistics it is trained on, it has significant downsides. Beyond its considerable additional computational cost during training, adversarially trained networks have significantly lower performance on clean images, corrupted images, and images perturbed with attack statistics not seen during training, implying that adversarial training in its current form may not be viable as a general defense method. In contrast, by deploying an architectural change, VOneNets improve robustness to all adversarial attacks tested and many common image corruptions, and they accomplish this with no additional training overhead. This also suggests that the architectural gains of the VOneNet could be stacked with other training-based defense methods to achieve even greater overall robustness gains.

Relative to current methods, the success of this approach derives from engineering in a better approximation of the architecture of the most well studied primate visual area, combined with task optimization of the remaining free parameters of the downstream architecture [26]. This points to two potentially synergistic avenues for further gains: an even more neurobiologically precise model of V1 (i.e., a better VOneBlock) and an even more neurobiologically precise model of the downstream architecture. For example, one could extend the biological fidelity of the VOneBlock in the hope that it confers even greater robustness, by including properties such as divisive normalization [48, 49] and contextual modulation [88], to name a few. In addition to V1, the retina and the Lateral Geniculate Nucleus (LGN) also play important roles in pre-processing visual information, only partially captured by the current V1 model, suggesting that extending the work done here to a retina/LGN front-end has potential to better align CNNs with human visual object recognition [89, 90]. In addition, though our initial experiments show that multiple components of our V1 model work together to produce greater robustness, the nature of the relationship between adversarial robustness and explained variance in V1 responses is far from resolved. In particular, as the VOneNets with and without stochasticity achieve very similar V1 explained variances but have markedly different levels of adversarial robustness, it is clear that much theoretical work remains in better understanding both when and why matching biology leads to more robust computer vision models.

While neuroscience has recently seen a huge influx of new neurally-mechanistic models and tools drawn from machine learning [91, 92], the most recent advances in machine learning and computer vision have been driven mostly by the widespread availability of computational resources and very large labeled datasets [93], rather than by an understanding of the relevant brain mechanisms. Under the belief that biological intelligence still has a lot of untapped potential to contribute, a number of researchers have been pushing for more neuroscience-inspired machine learning algorithms [37, 42, 32]. The work presented here shows that this aspiration can become reality—the models presented here, drawn directly from primate neurobiology, indeed require less training to achieve more human-like behavior. This is one turn of a new virtuous circle, wherein neuroscience and artificial intelligence each feed into and reinforce the understanding and ability of the other.


Broader Impact

From a technological perspective, the ethical implications of our work are largely aligned with those of computer vision in general. While there is undoubtedly potential for malicious and abusive uses of computer vision, particularly in the form of discrimination or invasion of privacy, we believe that our work will aid in the production of more robust and intuitive behavior of computer vision algorithms. As CNNs are deployed in real-world situations, it is critical that they behave with the same level of stability as their human counterparts. In particular, they should at the very least not be confused by changes in input statistics that do not confuse humans. We believe that this work will help to bridge that gap. Furthermore, while algorithms are often thought to be impartial or unbiased, much research has shown that data-driven models like current CNNs are often even more biased than humans, implicitly keying in on and amplifying stereotypes. For this reason, making new CNNs that behave more like humans may actually reduce, or at least make more intuitive, their implicit biases. Unfortunately, we note that even with our work, these issues are not yet resolved. While we developed a more neurobiologically-constrained algorithm, it comes nowhere close to human-like behaviour in the wide range of circumstances experienced in the real world. Finally, from the perspective of neuroscience, we think that this work introduces a more accurate model of the primate visual system. Ultimately, better models contribute to a better mechanistic understanding of how the brain works, and how to intervene in the case of illness or disease states. We think that our model contributes a stronger foundation for understanding the brain and building novel medical technology, such as neural implants for restoring vision in people with impairments.

Acknowledgments and Disclosure of Funding

We thank J. Anthony Movshon and Corey M. Ziemba for access to the V1 neural dataset, MIT-IBM Satori for providing the compute necessary for this project, John Cohn for technical support, Sijia Liu and Wieland Brendel for advice on adversarial attacks, and Adam Marblestone for insightful discussion. This work was supported by the PhRMA Foundation Postdoctoral Fellowship in Informatics (TM), the Semiconductor Research Corporation (SRC) and DARPA (JD, MS, JJD), the Massachusetts Institute of Technology Shoemaker Fellowship (MS), Office of Naval Research grant MURI-114407 (JJD), the Simons Foundation grant SCGB-542965 (JJD), and the MIT-IBM Watson AI Lab grant W1771646 (JJD).

References

[1] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet Classification with Deep Convolutional Neural Networks". In: Advances in Neural Information Processing Systems 25. Ed. by F. Pereira et al. Curran Associates, Inc., 2012, pp. 1097–1105.
[2] Christian Szegedy et al. "Going Deeper with Convolutions". Sept. 2014. arXiv: 1409.4842 [cs.CV].
[3] Karen Simonyan and Andrew Zisserman. "Very Deep Convolutional Networks for Large-Scale Image Recognition". Sept. 2014. arXiv: 1409.1556 [cs.CV].
[4] Kaiming He et al. "Deep Residual Learning for Image Recognition". Dec. 2015. arXiv: 1512.03385 [cs.CV].
[5] Kaiming He et al. "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification". In: Proceedings of the IEEE International Conference on Computer Vision (2015), pp. 1026–1034. DOI: 10.1109/ICCV.2015.123. arXiv: 1502.01852.
[6] Christian Szegedy et al. "Intriguing properties of neural networks". Dec. 2013. arXiv: 1312.6199 [cs.CV].
[7] Nicholas Carlini and David Wagner. "Towards Evaluating the Robustness of Neural Networks". Aug. 2016. arXiv: 1608.04644 [cs.CR].
[8] Pin-Yu Chen et al. "EAD: Elastic-Net Attacks to Deep Neural Networks via Adversarial Examples". Sept. 2017. arXiv: 1709.04114 [stat.ML].
[9] Jérôme Rony et al. "Decoupling Direction and Norm for Efficient Gradient-Based L2 Adversarial Attacks and Defenses". Nov. 2018. arXiv: 1811.09600 [cs.CV].


[10] Wieland Brendel et al. "Accurate, reliable and fast robustness evaluation". July 2019. arXiv: 1907.01003 [stat.ML].
[11] Samuel Dodge and Lina Karam. "A Study and Comparison of Human and Deep Learning Recognition Performance Under Visual Distortions". May 2017. arXiv: 1705.02498 [cs.CV].
[12] Robert Geirhos et al. "Generalisation in humans and deep neural networks". Aug. 2018. arXiv: 1808.08750 [cs.CV].
[13] Dan Hendrycks and Thomas Dietterich. "Benchmarking Neural Network Robustness to Common Corruptions and Perturbations". Mar. 2019. arXiv: 1903.12261 [cs.LG].
[14] Nilaksh Das et al. "Keeping the Bad Guys Out: Protecting and Vaccinating Deep Learning with JPEG Compression". May 2017. arXiv: 1705.02900 [cs.CV].
[15] Xuanqing Liu et al. "Towards Robust Neural Networks via Random Self-ensemble". Dec. 2017. arXiv: 1712.00673 [cs.LG].
[16] Weilin Xu, David Evans, and Yanjun Qi. "Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks". Apr. 2017. arXiv [cs.CV].
[17] Aleksander Madry et al. "Towards Deep Learning Models Resistant to Adversarial Attacks". June 2017. arXiv: 1706.06083 [stat.ML].
[18] Yang Song et al. "PixelDefend: Leveraging Generative Models to Understand and Defend against Adversarial Examples". Oct. 2017. arXiv: 1710.10766 [cs.LG].
[19] Guneet S. Dhillon et al. "Stochastic Activation Pruning for Robust Adversarial Defense". Mar. 2018. arXiv: 1803.01442 [cs.LG].
[20] Jacob Buckman et al. "Thermometer Encoding: One Hot Way To Resist Adversarial Examples". Feb. 2018. URL: https://openreview.net/forum?id=S18Su--CW.
[21] Chuan Guo et al. "Countering Adversarial Images using Input Transformations". Feb. 2018. URL: https://openreview.net/forum?id=SyJ7ClWCb.
[22] Claudio Michaelis et al. "Benchmarking Robustness in Object Detection: Autonomous Driving when Winter is Coming". 2019, pp. 1–23. arXiv: 1907.07484.
[23] Andrew Ilyas et al. "Adversarial Examples Are Not Bugs, They Are Features". May 2019. arXiv: 1905.02175 [stat.ML].
[24] D. Linsley et al. "What are the Visual Features Underlying Human Versus Machine Vision?" In: ICCV. 2017. DOI: 10.1109/ICCVW.2017.331. arXiv: 1701.02704.
[25] Robert Geirhos et al. "ImageNet-Trained CNNs Are Biased Towards Texture". In: International Conference on Learning Representations (2019), pp. 1–20. arXiv: 1811.12231.
[26] Daniel L. K. Yamins et al. "Performance-optimized hierarchical models predict neural responses in higher visual cortex". In: Proceedings of the National Academy of Sciences 111.23 (2014), pp. 8619–8624. DOI: 10.1073/pnas.1403112111.
[27] Charles F. Cadieu et al. "Deep neural networks rival the representation of primate IT cortex for core visual object recognition". In: PLoS Computational Biology 10.12 (2014), e1003963.
[28] Seyed-Mahdi Khaligh-Razavi and Nikolaus Kriegeskorte. "Deep supervised, but not unsupervised, models may explain IT cortical representation". In: PLoS Computational Biology 10.11 (Nov. 2014), e1003915.
[29] Umut Güçlü and Marcel A. J. van Gerven. "Deep Neural Networks Reveal a Gradient in the Complexity of Neural Representations across the Brain's Ventral Visual Pathway". In: The Journal of Neuroscience 35.27 (2015), pp. 10005–10014. DOI: 10.1523/JNEUROSCI.5023-14.2015.
[30] Santiago A. Cadena et al. "Deep convolutional models improve predictions of macaque V1 responses to natural images". In: PLoS Computational Biology 15.4 (Apr. 2019), e1006897.
[31] Martin Schrimpf et al. "Brain-Score: Which Artificial Neural Network for Object Recognition is most Brain-Like?" Sept. 2018. URL: https://doi.org/10.1101/407007.


[32] Jonas Kubilius et al. "Brain-Like Object Recognition with High-Performing Shallow Recurrent ANNs". In: Advances in Neural Information Processing Systems 32. Ed. by H. Wallach et al. Curran Associates, Inc., 2019, pp. 12805–12816. URL: http://papers.nips.cc/paper/9441-brain-like-object-recognition-with-high-performing-shallow-recurrent-anns.pdf.
[33] Adam H. Marblestone, Greg Wayne, and Konrad P. Kording. "Toward an Integration of Deep Learning and Neuroscience". In: Frontiers in Computational Neuroscience 10 (Sept. 2016), p. 94.
[34] William Lotter, Gabriel Kreiman, and David Cox. "Deep Predictive Coding Networks for Video Prediction and Unsupervised Learning". May 2016. arXiv: 1605.08104 [cs.LG].
[35] Aran Nayebi and Surya Ganguli. "Biologically inspired protection of deep networks from adversarial attacks". Mar. 2017. arXiv: 1703.09202 [stat.ML].
[36] Jordan Guerguiev, Timothy P. Lillicrap, and Blake A. Richards. "Towards deep learning with segregated dendrites". In: eLife 6 (Dec. 2017).
[37] Demis Hassabis et al. "Neuroscience-Inspired Artificial Intelligence". In: Neuron 95.2 (2017), pp. 245–258. DOI: 10.1016/j.neuron.2017.06.011.
[38] Grace W. Lindsay and Kenneth D. Miller. "How biological attention mechanisms improve task performance in a large-scale visual system model". In: eLife 7 (Oct. 2018).
[39] Hanlin Tang et al. "Recurrent computations for visual pattern completion". In: Proceedings of the National Academy of Sciences 115.35 (2018), pp. 8835–8840. DOI: 10.1073/pnas.1719397115.
[40] Zhe Li et al. "Learning From Brains How to Regularize Machines". Nov. 2019. arXiv: 1911.05072 [cs.LG].
[41] Hosein Hasani, Mahdieh Soleymani, and Hamid Aghajan. "Surround Modulation: A Bio-inspired Connectivity Structure for Convolutional Neural Networks". In: NeurIPS (2019), pp. 15877–15888. URL: http://papers.nips.cc/paper/9719-surround-modulation-a-bio-inspired-connectivity-structure-for-convolutional-neural-networks.
[42] Fabian H. Sinz et al. "Engineering a Less Artificial Intelligence". In: Neuron 103.6 (Sept. 2019), pp. 967–979.
[43] Anthony M. Zador. "A critique of pure learning and what artificial neural networks can learn from animal brains". In: Nature Communications 10.1 (Aug. 2019), p. 3770.
[44] J. Deng et al. "ImageNet: A large-scale hierarchical image database". In: 2009 IEEE Conference on Computer Vision and Pattern Recognition. June 2009, pp. 248–255.
[45] D. H. Hubel and T. N. Wiesel. "Receptive Fields, Binocular Interaction and Functional Architecture in the Cat's Visual Cortex". In: Journal of Physiology (1962), pp. 106–154.
[46] J. P. Jones and L. A. Palmer. "An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex". In: Journal of Neurophysiology 58.6 (1987), pp. 1233–1258. DOI: 10.1152/jn.1987.58.6.1233.
[47] E. H. Adelson and J. R. Bergen. "Spatiotemporal energy models for the perception of motion". In: Journal of the Optical Society of America A 2.2 (1985), pp. 284–299. DOI: 10.1364/JOSAA.2.000284.
[48] David J. Heeger, E. P. Simoncelli, and J. Anthony Movshon. "Computational models of cortical visual processing". In: Proceedings of the National Academy of Sciences 93.2 (1996), pp. 623–627. DOI: 10.1073/pnas.93.2.623.
[49] M. Carandini, D. J. Heeger, and J. A. Movshon. "Linearity and normalization in simple cells of the macaque primary visual cortex". In: The Journal of Neuroscience 17.21 (Nov. 1997), pp. 8621–8644.
[50] Maximilian Riesenhuber and Tomaso Poggio. "Hierarchical models of object recognition in cortex". In: Nature Neuroscience 2.11 (1999), pp. 1019–1025. DOI: 10.1038/14819.


[51] Thomas Serre, Aude Oliva, and Tomaso Poggio. "A feedforward architecture accounts for rapid categorization". In: Proceedings of the National Academy of Sciences 104.15 (Apr. 2007), pp. 6424–6429.
[52] Nicole C. Rust et al. "Spatiotemporal Elements of Macaque V1 Receptive Fields". In: Neuron 46 (2005), pp. 945–956. DOI: 10.1016/j.neuron.2005.05.021.
[53] Brett Vintch, J. A. Movshon, and E. P. Simoncelli. "A Convolutional Subunit Model for Neuronal Responses in Macaque V1". In: Journal of Neuroscience 35.44 (2015), pp. 14829–14841. DOI: 10.1523/JNEUROSCI.2815-13.2015.
[54] Nicolas Papernot et al. "Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks". Nov. 2015. arXiv: 1511.04508 [cs.CR].
[55] Mayank Singh et al. "Harnessing the Vulnerability of Latent Layers in Adversarially Trained Models". May 2019. arXiv: 1905.05186 [cs.LG].
[56] Cihang Xie et al. "Feature denoising for improving adversarial robustness". In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019, pp. 501–509.
[57] Eric Wong, Leslie Rice, and J. Zico Kolter. "Fast is better than free: Revisiting adversarial training". Jan. 2020. URL: https://openreview.net/forum?id=BJx040EFvH.
[58] Dan Hendrycks et al. "AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty". Dec. 2019. arXiv: 1912.02781 [stat.ML].
[59] Evgenia Rusak et al. "Increasing the robustness of DNNs against image corruptions by playing the Game of Noise". Jan. 2020. arXiv: 2001.06057 [cs.CV].
[60] Ali Shafahi et al. "Adversarial training for free". In: Advances in Neural Information Processing Systems 32. Ed. by H. Wallach et al. Curran Associates, Inc., 2019, pp. 3358–3369.
[61] Eric Wong, Leslie Rice, and J. Zico Kolter. "Fast is better than free: Revisiting adversarial training". Sept. 2020. URL: https://openreview.net/forum?id=BJx040EFvH.
[62] Dimitris Tsipras et al. "Robustness May Be at Odds with Accuracy". Sept. 2019. URL: https://openreview.net/forum?id=SyxAb30cY7.
[63] Yash Sharma and Pin-Yu Chen. "Attacking the Madry Defense Model with L1-based Adversarial Examples". Oct. 2017. arXiv: 1710.10733 [stat.ML].
[64] Lukas Schott et al. "Towards the first adversarially robust neural network model on MNIST". May 2018. arXiv: 1805.09190 [cs.CV].
[65] Cihang Xie et al. "Mitigating Adversarial Effects Through Randomization". Feb. 2018. URL: https://openreview.net/forum?id=Sk9yuql0Z.
[66] Adnan Siraj Rakin, Zhezhi He, and Deliang Fan. "Parametric Noise Injection: Trainable Randomness to Improve Deep Neural Network Robustness against Adversarial Attack". Nov. 2018. arXiv: 1811.09310 [cs.LG].
[67] Joonho Lee et al. "ProbAct: A Probabilistic Activation Function for Deep Neural Networks". May 2019. arXiv: 1905.10761 [cs.LG].
[68] Anish Athalye, Nicholas Carlini, and David Wagner. "Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples". Feb. 2018. arXiv: 1802.00420 [cs.LG].
[69] Alex Krizhevsky. "Learning Multiple Layers of Features from Tiny Images". Apr. 2009. URL: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.222.9220&rep=rep1&type=pdf.
[70] Adam Paszke et al. "Automatic differentiation in PyTorch". Oct. 2017. URL: https://openreview.net/forum?id=BJJsrmfCZ.
[71] Saining Xie et al. "Aggregated Residual Transformations for Deep Neural Networks". Nov. 2016. arXiv: 1611.05431 [cs.CV].
[72] Gao Huang et al. "Densely Connected Convolutional Networks". Aug. 2016. arXiv: 1608.06993 [cs.CV].
[73] Forrest N. Iandola et al. "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size". Feb. 2016. arXiv: 1602.07360 [cs.CV].
[74] Ningning Ma et al. "ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design". July 2018. arXiv: 1807.11164 [cs.CV].
[75] Mingxing Tan et al. "MnasNet: Platform-Aware Neural Architecture Search for Mobile". July 2018. arXiv: 1807.11626 [cs.CV].


[76] Logan Engstrom et al. Robustness (Python Library). 2019. URL: https://github.com/MadryLab/robustness.
[77] Jeremy Freeman et al. "A functional and perceptual signature of the second visual area in primates". In: Nature Neuroscience 16.7 (July 2013), pp. 974–981.
[78] Russell L. De Valois, E. W. Yund, and Norva Hepler. "The orientation and direction selectivity of cells in macaque visual cortex". In: Vision Research 22 (1982), pp. 531–544.
[79] Russell L. De Valois, Duane G. Albrecht, and Lisa G. Thorell. "Spatial Frequency Selectivity of Cells in Macaque Visual Cortex". In: Vision Research 22 (1982), pp. 545–559.
[80] Dario L. Ringach. "Spatial Structure and Symmetry of Simple-Cell Receptive Fields in Macaque Primary Visual Cortex". In: Journal of Neurophysiology 88 (2002), pp. 455–463.
[81] Yasmine El-Shamayleh et al. "Visual response properties of V1 neurons projecting to V2 in macaque". In: Journal of Neuroscience 33.42 (2013), pp. 16594–16605. DOI: 10.1523/JNEUROSCI.2753-13.2013.
[82] W. R. Softky and C. Koch. "The highly irregular firing of cortical cells is inconsistent with temporal integration of random EPSPs". In: Journal of Neuroscience 13.1 (1993), pp. 334–350. DOI: 10.1523/jneurosci.13-01-00334.1993.
[83] Diederik P. Kingma and Max Welling. "Auto-Encoding Variational Bayes". Dec. 2013. arXiv: 1312.6114v10 [stat.ML].
[84] Nicholas Carlini et al. "On Evaluating Adversarial Robustness". Feb. 2019. arXiv: 1902.06705v2 [cs.LG].
[85] Nikolaus Kriegeskorte, Marieke Mur, and Peter Bandettini. "Representational similarity analysis - connecting the branches of systems neuroscience". In: Frontiers in Systems Neuroscience 2 (Nov. 2008), p. 4.
[86] R. Rajalingham, K. Schmidt, and James J. DiCarlo. "Comparison of Object Recognition Behavior in Human and Monkey". In: Journal of Neuroscience 35.35 (2015), pp. 12127–12136. DOI: 10.1523/JNEUROSCI.0573-15.2015.
[87] Cristopher M. Niell and Michael P. Stryker. "Highly selective receptive fields in mouse visual cortex". In: The Journal of Neuroscience 28.30 (July 2008), pp. 7520–7536. DOI: 10.1523/JNEUROSCI.0623-08.2008.
[88] Pieter R. Roelfsema. "Cortical algorithms for perceptual grouping". In: Annual Review of Neuroscience 29 (Jan. 2006), pp. 203–227. DOI: 10.1146/annurev.neuro.29.051605.112939.
[89] Alexander Berardino et al. "Eigen-Distortions of Hierarchical Representations". Oct. 2017. arXiv: 1710.02266 [cs.CV].
[90] Jack Lindsey et al. "A Unified Theory of Early Visual Representations from Retina to Cortex through Anatomically Constrained Deep CNNs". In: ICLR (2019), pp. 1–17. arXiv: 1901.00945.
[91] Daniel L. K. Yamins and James J. DiCarlo. "Using goal-driven deep learning models to understand sensory cortex". In: Nature Neuroscience 19.3 (Mar. 2016), pp. 356–365.
[92] Alexander Mathis et al. "DeepLabCut: markerless pose estimation of user-defined body parts with deep learning". In: Nature Neuroscience (2018). DOI: 10.1038/s41593-018-0209-y.
[93] Yann LeCun, Yoshua Bengio, and Geoffrey E. Hinton. "Deep learning". In: Nature 521.7553 (2015), pp. 436–444. DOI: 10.1038/nature14539.
[94] T. Marques, M. Schrimpf, and J. J. DiCarlo. "Hierarchical neural network models that more closely match primary visual cortex tend to better explain higher level visual cortical responses". In: Cosyne. 2020.
[95] Alexey Kurakin, Ian Goodfellow, and Samy Bengio. "Adversarial examples in the physical world". July 2016. arXiv: 1607.02533 [cs.CV].
[96] Maria-Irina Nicolae et al. "Adversarial Robustness Toolbox v1.2.0". In: CoRR 1807.01069 (2018). URL: https://arxiv.org/pdf/1807.01069.


[97] Florian Tramer et al. "On Adaptive Attacks to Adversarial Example Defenses". Feb. 2020. arXiv: 2002.08347 [cs.LG].
[98] Jonas Rauber et al. "Foolbox Native: Fast adversarial attacks to benchmark the robustness of machine learning models in PyTorch, TensorFlow, and JAX". In: Journal of Open Source Software 5.53 (2020), p. 2607. DOI: 10.21105/joss.02607.
[99] Matthew D. Zeiler and Rob Fergus. "Visualizing and Understanding Convolutional Networks". Nov. 2013. arXiv: 1311.2901.
[100] Chris Olah et al. "An Overview of Early Vision in InceptionV1". In: Distill (2020). DOI: 10.23915/distill.00024.002.
[101] Ian Nauhaus et al. "Orthogonal micro-organization of orientation and spatial frequency in primate primary visual cortex". In: Nature Neuroscience 15.12 (2012), pp. 1683–1690. DOI: 10.1038/nn.3255.


Supplementary Material

A Neural Explained Variance

We evaluated how well responses to given images in candidate CNNs explain responses of single V1 neurons to the same images, using a standard neural predictivity methodology based on partial least squares (PLS) regression [31, 32]. We used a neural dataset obtained by extracellular recordings from 102 single-units, while presenting 450 different images for 100 ms, 20 times each, spanning 4 deg of the visual space. Results from this dataset were originally published in a study comparing V1 and V2 responses to naturalistic textures [77]. To avoid over-fitting, for each CNN we first chose its best V1 layer on a subset of the dataset (n=135 images) and reported the final neural explained variance calculated for the chosen layer on the remaining images of the dataset (n=315 images). Model units were mapped to each V1 neuron linearly using a PLS regression model with 25 components. The mapping was performed for each neuron using 90% of the image responses in the respective dataset and tested on the remaining held-out 10%, in a 10-fold cross-validation strategy. We reported neural explained variance values normalized by the neuronal internal consistency. We calculated V1 neural explained variance for a large pool of CNNs (n=30 models).
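A minimal sketch of this mapping for a single neuron is below, assuming model activations (stimuli × units) and trial-averaged neural responses are already extracted; the array names and the plain R² readout are ours, and the normalization by internal consistency would be applied to the returned value afterwards.

    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.model_selection import KFold

    def v1_predictivity(model_acts, neuron_resp, n_components=25, n_folds=10):
        """Cross-validated explained variance of one V1 neuron's responses
        predicted from model activations via PLS regression."""
        scores = []
        for train, test in KFold(n_splits=n_folds).split(model_acts):
            pls = PLSRegression(n_components=n_components)
            pls.fit(model_acts[train], neuron_resp[train])
            pred = pls.predict(model_acts[test]).ravel()
            resid = neuron_resp[test] - pred
            scores.append(1 - resid.var() / neuron_resp[test].var())
        return np.mean(scores)  # divide by internal consistency to normalize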

When using CNNs in a computer vision task, such as object recognition in ImageNet, their field-of-view (in degrees) is not defined, since the only relevant input property is the image resolution (in pixels). However, when using a CNN to model the ventral stream, the visual spatial extent of the model's input is of key importance to ensure that it is correctly mapped to the data it is trying to explain. We assigned a spatial extent of 8 deg to all models, since this value has been previously used in studies benchmarking CNNs as models of the primate ventral stream [31, 94] and is consistent with the results in Cadena et al. [30].


B Image Perturbations

B1 White Box Adversarial Attacks

For performing white box adversarial attacks, we used untargeted projected gradient descent (PGD) [17] (also referred to as the Basic Iterative Method [95]) with L∞, L2, and L1 norm constraints. Given an image x, this method uses the gradient of the loss to iteratively construct an adversarial image x_adv, which maximizes the model loss within an Lp bound around x. Formally, in the case of an L∞ constraint, PGD iteratively computes x_adv as

$$x^{t+1} = \mathrm{Proj}_{x+S}\left(x^t + \alpha\,\mathrm{sgn}\left(\nabla_{x^t} L(\theta, x^t, y)\right)\right)$$

where x is the original image and the Proj operator ensures that the final computed adversarial image x_adv is constrained to the space x + S, here the L∞ ball around x. In the case of L1 or L2 constraints, at each iteration $\nabla_{x^t} L(\theta, x^t, y)$ is scaled to have an L1 or L2 norm of α, and the Proj operator ensures that the final x_adv is within an L1 or L2 norm ball around x.

For our white box benchmarks, we used δ∞ ∈ [1/1020, 1/255], δ2 ∈ [0.15, 0.6], and δ1 ∈ [40, 160] constraints, where δ = x − x_adv, for a total of six different attack variants, with two strengths under each norm constraint. We arrived at these strengths by calibrating the lower strength attack for each norm to give a reasonable degree of variance in normally trained CNNs. Because normally trained CNNs are extremely sensitive to adversarial perturbations, this resulted in perturbations well below the level of perceptibility, and in most cases, particularly with the L∞ constraint, perturbations that would not even be rendered differently by a monitor for display. We set the high attack strength at 4 times the low value, which brought standard ImageNet trained CNNs to nearly chance performance. Still, even this higher perturbation strength remained practically imperceptible (Fig. B3, left). Each perturbation was computed with 64 PGD iterations and a step size α = ε/32, where ε = δp, for the same 5000 images from the ImageNet validation set on a per model basis, and final top-1 accuracy was reported. The Adversarial Robustness Toolkit [96] was used for computing the attacks.
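To make the procedure concrete, below is a minimal PyTorch sketch of the L∞ variant under these settings (64 iterations, α = ε/32), assuming inputs scaled to [0, 1]; the actual benchmarks used the Adversarial Robustness Toolkit [96] rather than a hand-rolled loop.

    import torch

    def pgd_linf(model, x, y, eps=1/255, n_iter=64):
        """Untargeted PGD under an L-infinity constraint (sketch)."""
        loss_fn = torch.nn.CrossEntropyLoss()
        alpha = eps / 32  # step size used in the paper's benchmarks
        x_adv = x.clone().detach()
        for _ in range(n_iter):
            x_adv.requires_grad_(True)
            loss = loss_fn(model(x_adv), y)
            grad = torch.autograd.grad(loss, x_adv)[0]
            with torch.no_grad():
                x_adv = x_adv + alpha * grad.sign()        # ascend the loss
                x_adv = x + (x_adv - x).clamp(-eps, eps)   # project onto the L-inf ball
                x_adv = x_adv.clamp(0, 1)                  # keep a valid image
        return x_adv.detach()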

While adversarial formulations using Lagrangian relaxation, like the C&W attack [7] and the EAD attack [8], are known to be more effective than PGD for finding minimal perturbations under L2 and L1 constraints, these formulations involve computationally expensive line searches, often requiring thousands of iterations to converge, making them computationally difficult to scale to 5000 ImageNet samples for a large number of models. Furthermore, because some of the models we tested operate with stochasticity during inference, we found that other state-of-the-art attacks formulated to efficiently find minimal perturbation distances [9, 10] were generally difficult to tune, as their success becomes stochastic as they near the decision boundary. Thus, we proceeded with the PGD formulation, as we found it to be the most reliable, computationally tractable, and conceptually simple to follow. We share our model and weights, and encourage anyone interested to attack our model to verify our results.

Although the stochastic models were implemented with the reparameterization trick for random sampling, which leaves the gradients intact [83], we performed several sanity checks to make sure that the white box attacks were effectively implemented. Following the advice of Athalye et al. [68] and Carlini et al. [84], we observed that in all of the white box attacks, increasing the attack strength decreased model performance, eventually bringing accuracy to zero (Fig. B1), and additionally, that for any given constraint and attack strength, increasing PGD iterations from one to many increased attack effectiveness (Fig. B2). Furthermore, to be extra cautious, we followed the method Athalye et al. [68] used to break Stochastic Activation Pruning [19] and replaced $\nabla_x f(x)$ with $\sum_{i=1}^{k} \nabla_x f(x)$, effectively performing Monte Carlo (MC) sampling of the loss gradient at every PGD iteration. While with high perturbation strength even k = 1 was sufficient to bring our model's accuracy to zero, we find that k = 10 generally achieved higher attack success at intermediate and large perturbation strengths (Table B1), and thus we used this approach for all attacks on stochastic models.
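A sketch of this gradient accumulation is below; summing (rather than averaging) k per-sample gradients is equivalent for the sign-based PGD step, and the function name is ours.

    import torch

    def mc_gradient(model, x_adv, y, k=10):
        """Monte Carlo sampling of the loss gradient over k stochastic forward
        passes, used in place of a single-sample gradient when attacking models
        with noise at inference."""
        loss_fn = torch.nn.CrossEntropyLoss()
        grad = torch.zeros_like(x_adv)
        for _ in range(k):
            x_in = x_adv.clone().detach().requires_grad_(True)
            loss = loss_fn(model(x_in), y)
            grad = grad + torch.autograd.grad(loss, x_in)[0]
        return grad  # only the direction is used by the PGD update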



Figure B1: VOneResNet50 improves robustness to white box attacks in a wide range of perturbation strengths. White box accuracy vs. perturbation strength curves for PGD attacks with constraints L∞, L2, and L1, for ResNet50, VOneResNet50, ATL∞, and ANT3×3+SIN. Adding the VOneBlock to ResNet50 consistently improves robustness under all constraints and attack strengths—VOneResNet50 can withstand perturbations roughly 4 times higher than the standard ResNet50 for the same performance level. Remarkably, VOneResNet50 outperformed the adversarially trained model (ATL∞) for a considerable range of perturbation strengths, particularly with the L1 constraint. Adversarial noise training combined with Stylized ImageNet (ANT3×3+SIN) has virtually no improvement in robustness to white box attacks. In all white box attacks and for all models, increasing the attack strength decreased performance, eventually bringing accuracy to zero.


Figure B2: White box attack effectiveness converges for a large number of PGD iteration steps. White box accuracy vs. iteration curves for PGD attacks with δ∞ = 1/255, δ2 = 0.6, and δ1 = 160 constraints. Increasing the number of PGD iteration steps makes the attack more effective, generally converging after 32 iterations.

Table B1: Improving the effectiveness of white box attacks with Monte Carlo sampling. White box attack accuracy (top-1) under different norm constraints, and white box overall mean, using standard PGD and PGD with Monte Carlo sampling of the loss gradients (PGD-MC, k = 10) on VOneResNet50, for 64 PGD iteration steps. For all stochastic models, we used PGD with Monte Carlo sampling in the white box attacks throughout the paper. Values are mean and SD (n=3 seeds).

Attack type      Mean [%]   PGD-L1 ε=40   PGD-L1 ε=160   PGD-L2 ε=0.15   PGD-L2 ε=0.6   PGD-L∞ ε=1/1020   PGD-L∞ ε=1/255
Standard PGD     52.8±0.2   64.4±0.4      42.2±0.1       65.2±0.3        45.3±0.4       63.4±0.4          36.4±0.4
PGD-MC (k=10)    51.1±0.4   64.2±0.5      40.4±0.5       65.3±0.4        44.3±0.3       61.9±0.4          30.3±0.5


Although VOneResNet50 was not designed to explicitly improve robustness to white box attacks, it offers substantial gains over the standard ResNet50 model (Table 1). A closer inspection of the white box accuracy vs. perturbation strength curves (Fig. B1) reveals that VOneResNet50 can withstand perturbations roughly 4 times higher than the standard ResNet50 for the same performance level (blue curve is shifted to the right of the black curve by a factor of 4). To help visualize how this difference translates into actual images, we show in Fig. B3 examples of white box adversarial images for both ResNet50 and VOneResNet50, for the perturbation strengths that bring the respective models to nearly chance performance.


Figure B3: Comparison of adversarial images for ResNet50 and VOneResNet50. Visualization of white box attacks under the 3 different norm constraints (L∞, L2, and L1), for the perturbation strength that brings the model to nearly chance performance (Fig. B1). Left: ResNet50; right: VOneResNet50. For each attack, the perturbation is shown on the left (scaled for visualization) and the perturbed image appears next to it. Perturbation strengths and classes of the adversarial images are shown over the images.


B2 Additional attack controls

The field of adversarial robustness is littered with examples of proposed defense methods that were quickly defeated with some minor attack optimizations or adaptations [68, 84, 97]. For this reason, we took great care to verify our claims of improved robustness of the VOneNets. In addition to the implementation details and controls described in the previous section, we validated our results by further optimizing our attacks on both the standard stochastic VOneResNet50 and the VOneResNet50 with stochasticity removed during the attack and inference.

First, on the standard stochastic VOneResNet50, we ensured that our attack hyper-parameters, including step-size (α), number of gradient samples (k), and PGD iterations, are sufficient for evaluating the network's robustness. For PGD with δ∞ = 1/255 with 64 iterations, we performed a large grid search over α and k, observing only a marginal increase in the attack effectiveness (Fig. B4). For the most effective step size (α = ε/8), increasing the number of gradient samples beyond k = 16 no longer decreases accuracy under the attack. At k = 128, VOneResNet50 accuracy is only reduced from 29.15% to 26.0%, remaining a large margin above the ResNet50 accuracy of 0.8%. Additionally, increasing the number of PGD iterations all the way up to 1000 (with α = ε/8 and k = 16) only further reduced performance by 0.5%. Finally, using 5-10 random restarts did not improve the attack beyond the most effective k and α settings.


Figure B4: Further attack optimization does not overturn results. White box accuracy for PGD with δ∞ = 1/255, 64 iterations, on 500 images. A large grid search over step size and number of gradient samples only marginally improves attack effectiveness; further attack optimization only marginally reduces model performance. Lines represent mean and shaded areas SD (n=3 seeds).

Second, while boundary attacks are not ideal against the standard stochastic VOneResNet50, due to the shifting boundary line, we investigated VOneResNet50 without stochasticity during attack/inference using the Brendel and Bethge (B&B) attack [10], a powerful white box adversary that aims to find minimal adversarial perturbations along the decision boundary of the network. We reasoned that if the B&B attack offers similar results to those observed using PGD, this is a good sanity check that our observed robustness is reliable and not a result of masked gradients around the input or a quirk of PGD. Using the FoolBox [98] implementation of the L∞ B&B attack, initialized with 500 successful adversarial images generated using PGD δ∞ = 8/255, we see nearly identical scores as compared to standard PGD with optimal hyper-parameters, δ∞ = 1/1020 and PGD δ∞ = 1/255.

We conclude that these additional controls do not qualitatively change our reported results, and provide further evidence that the observed improved robustness of the VOneNets is real and not an artifact.


B3 Common Image Corruptions

For evaluating model robustness to common image corruptions, we used ImageNet-C, a dataset publicly available at https://github.com/hendrycks/robustness [13]. It consists of 15 different types of corruptions, each at five levels of severity, for a total of 75 different perturbations (Fig. B5). The corruption types fall into four categories: noise, blur, weather, and digital effects. Noise includes Gaussian noise, shot noise, and impulse noise; blur includes motion blur, defocus blur, zoom blur, and glass blur; weather includes snow, frost, fog, and brightness; digital effects includes JPEG compression, elastic transformations, contrast, and pixelation. For every corruption type and at every severity, we tested on all 50,000 images from the ImageNet validation set. We note that these images have been JPEG compressed by the original authors for sharing, which is known to have a minor but noticeable effect on final network performance.
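A sketch of this evaluation loop is below; the directory layout (one folder per corruption and severity, with class subfolders, as in the ImageNet-C archives) and the omission of model-specific preprocessing are assumptions.

    import torch
    from torchvision import datasets, transforms

    CORRUPTIONS = ["gaussian_noise", "shot_noise", "impulse_noise", "defocus_blur",
                   "glass_blur", "motion_blur", "zoom_blur", "snow", "frost", "fog",
                   "brightness", "contrast", "elastic_transform", "pixelate",
                   "jpeg_compression"]

    def imagenet_c_accuracy(model, root="ImageNet-C", device="cuda"):
        """Top-1 accuracy for each (corruption, severity) pair; 15 x 5 = 75 runs."""
        model.eval().to(device)
        results = {}
        for corruption in CORRUPTIONS:
            for severity in range(1, 6):
                ds = datasets.ImageFolder(f"{root}/{corruption}/{severity}",
                                          transforms.ToTensor())
                loader = torch.utils.data.DataLoader(ds, batch_size=128, num_workers=8)
                correct = total = 0
                with torch.no_grad():
                    for x, y in loader:
                        pred = model(x.to(device)).argmax(dim=1).cpu()
                        correct += (pred == y).sum().item()
                        total += y.numel()
                results[(corruption, severity)] = correct / total
        return results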


Figure B5: Common image corruptions with intermediate severity. Visualization of all 15 common image corruption types, evaluated at severity = 3. First row: original image, followed by the noise corruptions; second row: blur corruptions; third row: weather corruptions; fourth row: digital corruptions.



Figure B6: VOneResNet50 improves robustness to noise, blur, and digital corruptions. Corruption accuracy vs. severity curves for the four categories of common image corruptions: noise, blur, weather, and digital. Adding the VOneBlock to ResNet50 improves robustness to all corruption categories except weather. The adversarially trained model (ATL∞) is considerably worse at common image corruptions. Adversarial noise combined with Stylized ImageNet training (ANT3×3+SIN) is consistently more robust to all corruption categories. For ResNet50 and VOneResNet50, error-bars represent SD (n=3 seeds).

Table B2: Detailed common image corruption results. Adding the VOneBlock to ResNet50 improves robustness to several of the corruption types. However, for others, particularly fog and contrast, robustness decreases. Interestingly, these are also the corruption types on which the adversarially trained model (ATL∞) does the worst. Adversarial noise combined with Stylized ImageNet training (ANT3×3+SIN) is the most robust model for most of the corruption types. For ResNet50 and VOneResNet50, values are mean and SD (n=3 seeds).

Noise and blur corruptions (top-1 accuracy, %):

Model             Gaussian   Shot       Impulse    Defocus    Glass      Motion     Zoom
ResNet50 [4]      29.6±1.1   27.9±0.9   24.5±1.2   38.8±0.7   26.1±0.6   37.9±0.4   34.5±0.7
ATL∞ [76]         23.6       22.3       14.8       24.4       34.3       33.2       34.9
ANT3×3+SIN [59]   61.6       60.7       59.9       41.5       38.6       46.0       37.3
VOneResNet50      34.6±1.4   33.4±1.2   31.9±1.5   37.8±0.1   35.7±0.1   37.4±0.2   34.0±0.2

Weather and digital corruptions (top-1 accuracy, %):

Model             Snow       Frost      Fog        Bright     Contrast   Elastic    Pixelate   JPEG
ResNet50          30.1±0.5   36.7±0.4   43.6±0.8   66.8±0.2   38.8±0.5   44.8±0.3   47.0±2.7   55.1±1.0
ATL∞              29.5       31.2       8.4        54.1       10.2       48.4       56.3       58.7
ANT3×3+SIN        47.7       52.2       54.9       68.2       49.3       52.7       58.9       59.3
VOneResNet50      25.2±0.5   36.8±0.4   30.1±0.5   62.4±0.3   28.5±0.5   48.7±0.1   63.3±0.2   61.0±0.3


B4 Other defense methods

We compared the performance of VOneResNet50 on the described computer vision benchmarks with two training-based defense methods using the ResNet50 architecture: adversarial training with a δ∞ = 4/255 constraint (ATL∞) [76] and adversarial noise with Stylized ImageNet training (ANT3×3+SIN) [59].

The ATL∞ model was downloaded from the publicly available adversarially trained models at https://github.com/MadryLab/robustness [76]. This model was trained following the adversarial training paradigm of Madry et al. [17] to solve the min-max problem

$$\min_\theta \rho(\theta), \quad \text{where} \quad \rho(\theta) = \mathbb{E}_{(x,y)\sim D}\left[\max_{\delta \in S} L(\theta, x+\delta, y)\right] \tag{1}$$

Here, the goal is to learn parameters θ minimizing the loss L for training images x and labels y drawn from D, while perturbing x with δ ∈ S to maximally increase the loss. In this case, D is ImageNet, and PGD with δ∞ = 4/255 is used to find the perturbation maximizing the loss for a given (x, y) pair. The δ∞ = 4/255 constraint model was selected to compare against because it had the best performance on our conglomerate benchmark of adversarial attacks and common image corruptions.

The ANT3×3+SIN model was obtained from https://github.com/bethgelab/game-of-noise. Recently, Rusak et al. showed that training a ResNet50 model with several types of additive input noise improves robustness to common image corruptions [59]. Inspired by this observation, the authors then trained a model while optimizing a noise distribution that maximally confuses it. Similarly to standard adversarial training, this results in solving a min-max problem:

$$\min_\theta \max_\phi \; \mathbb{E}_{(x,y)\sim\mathcal{D}} \, \mathbb{E}_{\delta \sim p_\phi(\delta)} \left[\mathcal{L}(\theta, x + \delta, y)\right] \tag{2}$$

where pφ(δ) is the maximally confusing noise distribution. The main difference to regular adversarial training in Equation 1 is that, while in the former δ is optimized directly, here δ is found by optimizing a constrained distribution with local spatial correlations. Complementing the adversarial noise training with Stylized ImageNet training [25] produced the model with the current best robustness on the ImageNet-C benchmark as a standalone method.


C VOneNet implementation details

C.1 Convolutional layer

The convolutional layer of the VOneBlock is a mathematically parameterized Gabor Filter Bank (GFB). We set the stride of the GFB to four, yielding a 56×56 spatial map of activations. Since the number of channels in most CNNs' first convolution is relatively small (64 in the architectures adapted), we used a larger number in the VOneBlock so that the Gabors would cover the large parameter space and better approximate primate V1. We set the main VOneNet models to contain 512 channels, equally split between simple and complex cells (see Fig. C2 A for an analysis of how the number of channels affects performance). Each channel in the GFB convolves a single color channel of the input image.

The Gabor function consists of a two-dimensional grating with a Gaussian envelope and is described by the following equation:

$$G_{\theta,f,\phi,n_x,n_y}(x, y) = \frac{1}{2\pi\sigma_x\sigma_y} \exp\left[-0.5\left(\frac{x_{rot}^2}{\sigma_x^2} + \frac{y_{rot}^2}{\sigma_y^2}\right)\right] \cos\left(2\pi f x_{rot} + \phi\right) \tag{3}$$

where

$$x_{rot} = x\cos(\theta) + y\sin(\theta), \qquad y_{rot} = -x\sin(\theta) + y\cos(\theta) \tag{4}$$

$$\sigma_x = \frac{n_x}{f}, \qquad \sigma_y = \frac{n_y}{f} \tag{5}$$

Here, x_rot and y_rot are the coordinates orthogonal and parallel to the grating orientation, θ is the angle of the grating orientation, f is the spatial frequency of the grating, φ is the phase of the grating relative to the Gaussian envelope, and σ_x and σ_y are the standard deviations of the Gaussian envelope orthogonal and parallel to the grating, which can be defined as multiples (n_x and n_y) of the grating cycle (the inverse of the frequency).
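For illustration, a minimal sketch of Equations 3-5 as a kernel generator in PyTorch. The 28 pixels-per-degree figure follows from the 224 px input spanning an 8 deg field of view (Section A); the function name and signature are ours, not the released implementation's:

```python
import math
import torch

def gabor_kernel(theta, f, phase, nx, ny, ksize=25, ppd=28.0):
    """Build one Gabor kernel following Eqs. 3-5.

    theta: grating orientation (rad); f: spatial frequency (cycles/deg);
    phase: grating phase (rad); nx, ny: envelope size in grating cycles;
    ppd: pixels per degree (224 px spanning 8 deg of visual field).
    """
    f_px = f / ppd                      # spatial frequency in cycles/pixel
    sx, sy = nx / f_px, ny / f_px       # envelope SDs in pixels (Eq. 5)
    half = ksize // 2
    coords = torch.arange(-half, half + 1, dtype=torch.float32)
    x = coords.view(1, -1).expand(ksize, ksize)   # pixel grid, x coordinate
    y = coords.view(-1, 1).expand(ksize, ksize)   # pixel grid, y coordinate
    # Rotated coordinates (Eq. 4)
    x_rot = x * math.cos(theta) + y * math.sin(theta)
    y_rot = -x * math.sin(theta) + y * math.cos(theta)
    # Gaussian envelope times grating, with the 1/(2*pi*sx*sy) normalization (Eq. 3)
    envelope = torch.exp(-0.5 * (x_rot**2 / sx**2 + y_rot**2 / sy**2))
    grating = torch.cos(2 * math.pi * f_px * x_rot + phase)
    return envelope * grating / (2 * math.pi * sx * sy)
```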

Although the Gabor filter greatly reduces the number of parameters defining the linear spatial component of V1 neurons, it still has five parameters per channel. Fortunately, there is a vast literature in neurophysiology with detailed characterizations of primate V1 response properties, which can be used to constrain these parameters. To instantiate a VOneBlock with C_V1 channels, we sampled C_V1 values for each of the parameters according to an empirically constrained distribution (Table C1). Due to the resolution of the input images, we limited the ranges of spatial frequencies (f < 5.6 cpd) and number of cycles (n > 0.1).

Table C1: Empirical distributions used to sample the GFB parameters in the VOneBlock.

Parameter   Reference
θ           De Valois et al. [78], Fig. 5: Orientation tuning (foveal)
f           De Valois et al. [79], Fig. 6: Peak spatial freq. (X cells, foveal)
φ           Uniform distribution ∈ [0, 360] deg
n_x, n_y    Ringach 2002 [80], Fig. 4: (n_x, n_y) joint distribution

Critical to the correct implementation of the biologically-constrained parameters of the GFB is the choice of the model's field of view in degrees. As previously mentioned in Section A, we used 8 deg as the input spatial extent for all CNN models. It is important to note that the set of spatial filters in the VOneBlock's GFB differs considerably from those in the first convolution of most CNNs (Fig. C1). While standard CNNs learn filters that resemble Gabors in their input layer [1, 99, 100], due to the limited span of their kernels, these do not vary significantly in size and spatial frequency. V1 neurons, on the other hand, are known to exhibit a wide range of receptive field properties. This phenomenon is captured in the VOneBlock, with the spatial frequencies and receptive field sizes of the Gabors ranging over more than one order of magnitude. Due to this high variability, we set the convolution kernel to 25×25 px, which is considerably larger than those in standard CNNs (Fig. C1).
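Putting the pieces together, the GFB itself can be sketched as a fixed-weight convolution: 512 output channels, 25×25 kernels, stride 4, each kernel placed in a single color channel. The random color-channel assignment below is an assumption for illustration; `gabor_kernel` is the sketch given above:

```python
import torch
import torch.nn as nn

def build_gfb(params, ksize=25, stride=4, n_channels=512):
    """Fixed-weight Gabor filter bank (a sketch).

    params: n_channels tuples (theta, f, phase, nx, ny) sampled from the
    empirical distributions of Table C1 (sampling code not shown).
    """
    conv = nn.Conv2d(3, n_channels, ksize, stride=stride,
                     padding=ksize // 2, bias=False)
    weight = torch.zeros_like(conv.weight.data)
    for i, (theta, f, phase, nx, ny) in enumerate(params):
        color = torch.randint(3, (1,)).item()   # each Gabor reads one color channel
        weight[i, color] = gabor_kernel(theta, f, phase, nx, ny, ksize)
    conv.weight = nn.Parameter(weight, requires_grad=False)  # fixed, never trained
    return conv
```

With a 224×224 input, the stride of 4 and padding of 12 yield the 56×56 activation map described above.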



Figure C1: Comparison between filters in the first convolution of standard CNNs and the VOneBlock. Example filters from the first convolution in the VOneBlock, ResNet50, CORnet-S, and AlexNet (from left to right). VOneBlock filters are all parameterized as Gabors, varying considerably in size and spatial frequency. Standard CNNs have some filters with shapes other than Gabors, particularly center-surround, and are limited in size by their small kernels. Kernel sizes are 25 px, 7 px, 7 px, and 11 px for VOneBlock, ResNet50, CORnet-S, and AlexNet, respectively.


C.2 Nonlinear layer

The VOneBlock's nonlinear layer has two different nonlinearities, applied to each channel depending on its cell type: a rectified linear transformation for simple cells (Eq. 6), and the spectral power of a quadrature phase-pair (Eq. 7) for complex cells:

$$S^{nl}_{\theta,f,\phi,n_x,n_y} = \begin{cases} S^{l}_{\theta,f,\phi,n_x,n_y} & \text{if } S^{l}_{\theta,f,\phi,n_x,n_y} > 0 \\ 0 & \text{otherwise} \end{cases} \tag{6}$$

$$C^{nl}_{\theta,f,\phi,n_x,n_y} = \frac{1}{\sqrt{2}} \sqrt{\left(C^{l}_{\theta,f,\phi,n_x,n_y}\right)^2 + \left(C^{l}_{\theta,f,\phi+\pi/2,n_x,n_y}\right)^2} \tag{7}$$

where S^l and S^nl are the linear and nonlinear responses of a simple cell, and C^l and C^nl are the same for a complex cell.
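As a minimal sketch, the two nonlinearities can be applied channel-wise as follows; the complex-cell branch assumes the GFB also computed each complex Gabor at a phase offset of π/2 (the quadrature pair):

```python
import math
import torch

def vone_nonlinearities(s_lin, c_lin, c_lin_quad):
    """Apply Eqs. 6-7 to the GFB outputs (a sketch).

    s_lin:      linear responses of the simple-cell channels
    c_lin:      linear responses of the complex-cell channels
    c_lin_quad: responses of the same complex Gabors phase-shifted by pi/2
    """
    simple = torch.relu(s_lin)                                        # Eq. 6
    complex_ = torch.sqrt(c_lin**2 + c_lin_quad**2) / math.sqrt(2.0)  # Eq. 7
    return torch.cat([simple, complex_], dim=1)   # concatenate along channels
```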

C.3 Neuronal stochasticity generator

In awake monkeys, spike trains of V1 neurons are approximately Poisson, i.e., the variance and mean of spike counts in a given time-window over a set of repetitions are roughly the same [82]. We incorporated stochasticity into the VOneBlock to emulate this property of neuronal responses. Since the Poisson distribution is not continuous, it breaks the gradients in a white box attack, giving a false sense of robustness [68]. In order to avoid this situation and facilitate the evaluation of the model's real robustness, our neuronal stochasticity generator, as implemented, uses a continuous second-order approximation of Poisson noise, adding Gaussian noise with variance equal to the activation:

$$R_s \sim \mathcal{N}\left(\mu = R_{ns},\ \sigma^2 = R_{ns}\right) \tag{8}$$

where R_ns and R_s are the non-stochastic and stochastic responses of a neuron.

To approximate the same levels of neuronal stochasticity as primate V1 neurons, it is critical that the VOneBlock activations are in the same range as V1 neuronal responses (number of spike counts in a given time-window). Thus, we applied an affine transformation to the activations so that both the mean stimulus response and the mean baseline activity are the same as those of a population of V1 neurons (Table C2 shows the mean responses and spontaneous activity of V1 neurons measured in different time-windows).

Table C2: Mean stimulus responses and spontaneous activity of a population of V1 neurons for different time-windows.

Time-window [ms]   Stimulus response [spikes]   Spontaneous activity [spikes]
25                 0.324                        0.073
50                 0.655                        0.145
100                1.05                         0.29
200                1.38                         0.58

Since the exact time-window over which V1 responses are integrated during visual perception is still an open question, we considered a time-window of 25 ms (Fig. C2 B). In order to keep the outputs of the VOneBlock in a range that does not deviate considerably from the typical range of activations in CNNs, so that the model can be efficiently trained using standard training parameters, we applied the inverse transformation to scale the outputs back to their original range after the stochastic layer.
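A minimal sketch of the stochasticity generator, including the affine map and its inverse described above. The `gain` and `bias` values depend on the activation statistics needed to match the 25 ms spike counts of Table C2, so they are left as assumed inputs here:

```python
import torch

def v1_stochasticity(x, gain, bias):
    """Continuous Gaussian approximation of Poisson noise (Eq. 8), a sketch.

    gain/bias: assumed affine parameters mapping activations into V1
    spike-count units for a 25 ms window (Table C2: mean stimulus
    response 0.324 spikes, mean spontaneous activity 0.073 spikes).
    """
    r = torch.relu(gain * x + bias)           # spike-count units, non-negative
    r = r + torch.randn_like(r) * r.sqrt()    # Gaussian noise with sigma^2 = mean
    return (r - bias) / gain                  # inverse transform back to CNN range
```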

C.4 Hyper-parameters and other design choices

We developed VOneNets from three different standard CNN architectures: ResNet50, CORnet-S, and AlexNet. As previously mentioned, we replaced the first block of each architecture with the VOneBlock and the transition layer to create the respective VOneNet. Table C3 contains the layers that were removed for each base architecture. Except for CORnet-S, the removed layers only contained a single


convolution, nonlinearity, and maxpool. Since CORnet-S already had a set of layers pre-committed to V1, we replaced them with the VOneBlock (a code sketch of this replacement follows Table C3). The torchvision implementation of AlexNet contains a combined stride of eight in the removed layers (four in the convolution and two in the maxpool), followed by a stride of one in the second convolution (outside of the removed layers). In order to more easily adapt it to a VOneBlock with a stride of four, we slightly modified AlexNet's architecture so that it has strides of two in these three layers (first convolution, first maxpool, and second convolution). The results shown in Fig. 2 were obtained using this modified architecture.

Table C3: Layers removed from each base CNN architecture for creating the VOneNets.

Architecture   Layers removed
AlexNet        Conv, ReLU, MaxPool
ResNet50       Conv, Batch-Norm, ReLU, MaxPool
CORnet-S       Conv, Batch-Norm, ReLU, MaxPool, Conv, Batch-Norm, ReLU

The VOneNet architecture was designed to have as few hyper-parameters as possible. When possible, these were constrained either by the base CNN architecture or by neurobiological data: most parameters of the GFB were instantiated by sampling from neuronal distributions of primate V1 neurons; the kernel size was set to 25×25 px to capture the high variability in Gabor sizes; and, like all other CNN models, the spatial extent of the field of view was set to 8 deg (Section A). Nevertheless, there were two hyper-parameters where the choice was rather arbitrary: the number of channels in the VOneBlock, and the time-window for integrating V1 responses used to scale the activations prior to the stochasticity generator.

Since primate V1 has neurons tuned to a wide range of spatial frequencies for each orientation and at any given location of the visual space [101], we expected that a large number of channels would be required to cover all of the combinations of Gabor parameters. For the VOneResNet50 architecture, we varied the number of channels in the VOneBlock between 64 and 1024, equally split between simple and complex cells, and measured the performance of the several models after training (Fig. C2 A). Clean ImageNet and common image corruption performance improved with channel number. Remarkably, the opposite happened for white box accuracy, with variants of VOneResNet50 with 64 and 128 channels achieving the highest robustness to white box attacks. This result is particularly interesting since it had been shown that increasing the number of channels in all layers of a CNN improves robustness to white box attacks in both standard and adversarially trained models [17]. Therefore, this shows that the improvement in white box robustness of VOneNets, when compared to their base models, cannot be attributed to the higher number of channels. For the main model, we set the channel number to 512, as it offered a good compromise between the different benchmarks. Regarding the time-window for integrating V1 responses, we observed only small effects on the models' performance when varying it between 25 and 200 ms. We chose 25 ms for the main model due to a small trend toward higher robustness to white box attacks with shorter time-windows.

[Figure C2: accuracy (0.4-0.7) on clean images, perturbation mean, white box attacks, and common corruptions as a function of (A) the number of VOneBlock channels (64-1024, with a 25 ms time-window) and (B) the activity time-window (25-200 ms, with 512 channels).]

Figure C2: Exploration of the VOneBlock's hyper-parameters in the VOneResNet50 architecture. A: Performance of different VOneResNet50 models with varying numbers of channels in the VOneBlock. B: Same as in A, but varying the time-window for integrating neuronal activity. Error bars show SD (n=3 seeds). Red arrows represent the values used in the main model.


C.5 Training details

We used PyTorch 0.4.1 and trained the models on ImageNet 2012 [34]. Images were preprocessed (1) for training: random crop to 224×224 pixels and random left-right flipping; and (2) for validation: central crop to 224×224 pixels. Preprocessing was followed by normalization: subtraction of and division by [0.5, 0.5, 0.5]. We used a batch size of 256 images and trained on either 2 GPUs (NVIDIA Titan X / GeForce 1080Ti) or 1 GPU (Quadro RTX 6000 or V100) for 70 epochs. We used step learning rate scheduling: a starting learning rate of 0.1, divided by 10 every 20 epochs. For optimization, we used Stochastic Gradient Descent with a weight decay of 0.0001, momentum of 0.9, and a cross-entropy loss between image labels and model predictions (logits).
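The setup above corresponds roughly to the following PyTorch configuration; the exact crop transform (RandomResizedCrop vs. a plain random crop) and the validation resize are assumptions, and the ResNet50 below is a stand-in for any VOneNet instance:

```python
import torch
import torchvision.transforms as T
from torchvision.models import resnet50

normalize = T.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
train_tf = T.Compose([T.RandomResizedCrop(224), T.RandomHorizontalFlip(),
                      T.ToTensor(), normalize])
val_tf = T.Compose([T.Resize(256), T.CenterCrop(224), T.ToTensor(), normalize])

model = resnet50()   # stand-in; in practice this would be a VOneNet instance
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)
# Step schedule: starting LR 0.1, divided by 10 every 20 epochs (70 epochs total)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)
```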

Model weights and code are available at https://github.com/dicarlolab/vonenet.


D VOneBlock variant details

To investigate which components of the VOneBlock are responsible for the increased robustness, we created six variants of the VOneBlock by removing or modifying one of its parts. In all cases, we trained from scratch a corresponding variant of VOneResNet50 with the modified VOneBlock; all other properties of the model and components of the VOneBlock were left unmodified. Here we describe each of these variants in more detail (the name of the variant refers to the component removed):

• V1 distributions: the GFB parameters were sampled from uniform distributions with the same domain as the empirical distributions used in the default VOneBlock (Table C1)
• Low SF: when sampling the spatial frequencies for the GFB, neurons with low peak spatial frequencies (f < 2 cpd) were removed from the empirical distribution [79]
• High SF: similar to the previous variant, but removing neurons with high peak spatial frequencies (f > 2 cpd)
• Complex NL: complex cell channels were removed from the VOneBlock and replaced by simple cell channels
• Simple NL: simple cell channels were removed from the VOneBlock and replaced by complex cell channels
• V1 stochasticity: the V1 stochasticity generator was removed from the VOneBlock

While all of the variants resulted in worse overall mean performance than the unmodified VOneResNet50 (Table 2), some improved specific benchmarks: the variant without high SF had higher accuracy under noise and blur corruptions (Table D1); the variant without V1 stochasticity had higher accuracy on clean images and images with noise and weather corruptions (Tables 2 and D1); and the variant without V1 distributions had higher accuracy against white box attacks with high L∞ perturbations (Table D2).

Table D1: Difference between the common image corruption accuracies of the several VOneResNet50 variants and the unmodified VOneResNet50. Removal of simple cells reduced accuracy for all common image corruption types. Removal of some components of the VOneBlock improved accuracy on specific corruption types: removing high SF consistently improved performance for noise and blur corruptions, and removing V1 stochasticity improved performance for noise and weather corruptions.

            Noise                                Blur
Component   Gaussian   Shot       Impulse    Defocus    Glass      Motion     Zoom
removed     [Δ%]       [Δ%]       [Δ%]       [Δ%]       [Δ%]       [Δ%]       [Δ%]
V1 dist     -2.8±0.9   -2.4±0.7   -2.9±1.1   -1.6±0.1    0.4±0.2   -0.8±0.3   -0.2±0.5
Low SF      -1.4±0.5   -1.0±0.2   -1.5±0.5   -0.6±0.6   -1.1±0.2   -1.2±0.4   -0.5±0.7
High SF      6.8±1.3    7.3±1.1    7.0±1.3    5.6±0.3   11.5±0.3    3.7±0.0    6.1±0.1
Complex      1.0±1.6    1.2±1.3    0.7±1.8    1.2±0.2   -1.5±0.1    0.2±0.8    0.4±1.1
Simple      -6.1±1.6   -5.4±1.5   -6.4±1.9   -4.1±0.4   -4.2±0.4   -4.9±0.5   -5.4±0.3
V1 stoch     2.2±2.5    2.2±2.5    2.1±2.9    3.3±0.1   -5.5±0.1    0.9±0.7    0.9±0.1

            Weather                                     Digital
Component   Snow       Frost      Fog        Bright     Contrast   Elastic    Pixelate   JPEG
removed     [Δ%]       [Δ%]       [Δ%]       [Δ%]       [Δ%]       [Δ%]       [Δ%]       [Δ%]
V1 dist     -0.2±0.6   -0.9±0.6   -1.9±1.1   -0.8±0.5   -1.4±0.9    0.3±0.4   -1.7±0.3    0.8±0.5
Low SF      -0.4±0.3   -0.6±0.4    0.9±0.3   -0.2±0.2   -0.4±0.5   -0.9±0.5   -1.3±0.2   -0.5±0.4
High SF      1.4±1.0   -0.7±0.7   -9.5±0.4   -5.2±0.4   -4.4±0.5    2.4±0.2    1.9±0.4   -4.6±0.4
Complex      0.7±0.4    0.4±0.3    4.3±0.2    1.3±0.2    2.9±0.2   -0.5±0.2   -1.2±0.0   -1.9±0.5
Simple      -5.0±0.5   -9.0±0.5   -4.4±0.5   -3.9±0.2   -4.8±0.4   -3.9±0.5   -3.5±0.4   -1.8±0.2
V1 stoch     2.5±0.4    0.3±0.3   12.0±0.5    2.6±0.3   10.0±0.8   -2.8±0.3   -4.0±0.7   -4.0±0.4


Table D2: Difference between the white box attack accuracies of the several VOneResNet50 variants and the unmodified VOneResNet50. Removal of V1 stochasticity considerably reduced accuracy against all white box attacks. Removal of V1 distributions improved accuracy for the L∞ white box attack with high perturbation strength.

            PGD-L1                 PGD-L2                 PGD-L∞
Component   40         160        0.15       0.60       1/1020     1/255
removed     [Δ%]       [Δ%]       [Δ%]       [Δ%]       [Δ%]       [Δ%]
V1 dist     -0.5±0.8    0.1±1.1   -0.8±0.2    0.6±1.0    0.3±0.3    3.0±1.8
Low SF      -0.8±1.2   -3.1±1.1   -0.6±1.0   -2.3±0.8   -0.8±1.3   -1.8±1.0
High SF     -2.2±0.9   -4.7±2.6   -2.6±1.5   -5.1±1.2   -2.6±1.4   -5.5±1.1
Complex     -1.3±0.9   -7.4±1.6   -1.1±0.9   -7.2±1.2   -1.9±1.0   -7.9±1.2
Simple      -2.5±0.8   -1.4±1.4   -2.9±1.1   -1.7±1.2   -2.7±1.2   -0.6±0.8
V1 stoch    -23.7±0.9  -36.5±0.8  -21.6±0.5  -40.1±0.4  -32.2±0.4  -29.2±0.4



… many strictly increases the effect of the attack under a given norm size (Supplementary Fig. B2). Both of these indicate that VOneNet gradients contain useful information for the construction of adversarial examples. For more details and controls, see Supplementary Sections B.1 and B.2.

We found that simulating a V1 front-end substantially increased the robustness to white box attacks for all three base architectures that we tested (ResNet50, CORnet-S, and AlexNet), while leaving clean ImageNet performance largely intact (Fig. 2 B). This was particularly evident for the stronger perturbation attacks, which reduced the accuracy of the base models to nearly chance (Fig. 2 B, bottom). This suggests that the VOneBlock works as a generic front-end which can be transferred to a variety of different neural networks as an architectural defense, increasing robustness to adversarial attacks.

4.2 VOneNets outperform state-of-the-art methods on a composite set of perturbations

We then focused on the ResNet50 architecture and compared VOneResNet50 with two state-of-the-art training-based defense methods: adversarial training with a δ∞ = 4/255 constraint (AT L∞) [76], and adversarial noise with Stylized ImageNet training (ANT3×3+SIN) [59]. Because white box adversarial attacks are only part of the issue of model robustness, we considered a larger panel of image perturbations containing a variety of common corruptions. For evaluating model performance on corruptions, we used the ImageNet-C dataset [13], which consists of 15 different corruption types, each at 5 levels of severity, divided into 4 categories: noise, blur, weather, and digital (see Supplementary Fig. B5 for example images and Supplementary Section B.3 for more details).

As expected, each of the defense methods had the highest accuracy under the perturbation type that it was designed for, but did not considerably improve over the base model on the other (Table 1). While the AT L∞ model suffered substantially on corruptions and clean performance, the ANT3×3+SIN model, the current state-of-the-art for common corruptions, had virtually no benefit on white box attacks. On the other hand, VOneResNet50 improved on both perturbation types, outperforming all the models on perturbation mean (average of white box and common corruptions) and overall mean (average of clean, white box, and common corruptions), with a difference to the second best model of 3.3% and 5.3%, respectively. Specifically, VOneResNet50 showed substantial improvements for all the white box attack constraints (Supplementary Fig. B1) and more moderate improvements for common image corruptions of the noise, blur, and digital categories (Supplementary Table B2 and Fig. B6). These results are particularly remarkable since VOneResNet50 was not optimized for robustness and does not benefit from any computationally expensive training procedure like the other defense methods. When compared to the base ResNet50, which has an identical training procedure, VOneResNet50 improves by 18.0% on perturbation mean and 10.7% on overall mean (Table 1).

Table 1: VOneResNet50 outperforms other defenses on perturbation mean and overall mean. Accuracy (top-1) on overall mean, perturbation mean, white box attacks, common corruptions, and clean images for standard ResNet50 and three defense methods: adversarial training, adversarial noise combined with Stylized ImageNet training, and VOneResNet50. Perturbation mean is the average accuracy over white box attacks and common image corruptions. Overall mean is the average accuracy over the two perturbation types and clean ImageNet. Values are reported as mean and SD (n=3 seeds).

Model              Overall Mean [%]  Perturbation Mean [%]  White box [%]  Corruption [%]  Clean [%]
Base ResNet50 [4]  43.6±0.0          27.6±0.1               16.4±0.1       38.8±0.3        75.6±0.1
AT L∞ [76]         49.0              42.3                   52.3           32.3            62.4
ANT3×3+SIN [59]    48.0              34.9                   17.3           52.6            74.1
VOneResNet50       54.3±0.1          45.6±0.2               51.1±0.4       40.0±0.3        71.7±0.1

4.3 All components of the VOneBlock contribute to improved model robustness

Since the VOneBlock was not explicitly designed to increase model robustness, but rather to approximate primate V1, we investigated which of its components are responsible for the increased


robustness we observe. We performed a series of experiments wherein six new VOneNet variants were created by removing or modifying a part of the VOneBlock (referred to as VOneBlock variants). After ImageNet training, model robustness was evaluated as before on our panel of white box attacks and image corruptions. Three variants targeted the GFB: one sampling the Gabor parameters from uniform distributions instead of those found in primate V1, another without high spatial frequency (SF) filters (f < 2 cpd), and another without low SF filters (f > 2 cpd). Two additional variants targeted the nonlinearities: one without simple cells and another without complex cells. Finally, the sixth variant had the V1 neuronal stochasticity generator removed. Even though all VOneBlock variants are poorer approximations of primate V1, all resulting VOneResNet50 variants still had improved perturbation accuracy when compared to the base ResNet50 model (Fig. 3 A). On the other hand, all variants except the one with uniformly sampled Gabor parameters showed significant deficits in robustness compared to the unmodified VOneResNet50 (Fig. 3 A; drops in perturbation accuracy between 1% and 15%).

Interestingly, we observed that some of these changes had a highly specific effect on the type of perturbation robustness affected (Table 2 and Supplementary Section D). Removing high SF Gabors negatively affected both white box and clean accuracy, while actually improving robustness to common image corruptions, particularly those of the noise and blur categories (Supplementary Table D1). Removing complex cells only impaired white box robustness, as opposed to removing simple cells, which was particularly detrimental to performance on image corruptions. Finally, removing V1 stochasticity considerably decreased white box accuracy, while improving accuracy for both clean and corrupted images.

The VOneBlock variant without V1 stochasticity suffered the most dramatic loss in robustness. This is not altogether surprising, as several approaches to adversarial robustness have focused on noise as a defense [15, 19, 66]. To investigate whether V1 stochasticity alone accounted for the majority of the robustness gains, we evaluated the perturbation accuracy of a ResNet50 model with V1 stochasticity added at the output of its first block. Neuronal stochasticity was implemented in exactly the same way as in the VOneBlock, by first applying an affine transformation to scale the activations so that they match primate V1 neuronal activity. Like the VOneResNet50, this model had stochasticity during training and inference, and showed a considerable improvement in robustness compared

[Figure 3: (A) perturbation accuracy (0.2-0.5) for VOneResNet50 and the V1 dist, Low SF, High SF, Complex, Simple, and V1 stoch variants; (B) contribution and interaction of V1 features and V1 stochasticity on top of ResNet50.]

Figure 3: All components of the VOneBlock work in synergy to improve robustness to image perturbations. A: Perturbation mean accuracy (top-1) for VOneResNet50 and several variants with a component of the VOneBlock removed or altered. From left to right: unmodified VOneResNet50; model with Gabor parameters sampled uniformly (within biological ranges); model without high SF Gabors; model without low SF Gabors; model without complex cells; model without simple cells; model without V1 stochasticity. Gray bars show the performance of ResNet50. Blue bars show the improvements due to the VOneBlock. The dashed line indicates the accuracy of the unmodified model. Error bars represent SD (n=3 seeds). B: Same as in A, but comparing ResNet50, VOneResNet50 without V1 stochasticity, ResNet50 with V1 stochasticity added after the first block, and VOneResNet50. Adding V1 stochasticity to ResNet50 accounts for less than half of the total improvement of VOneResNet50, demonstrating a supralinear interaction between the V1 features and V1 stochasticity.


Table 2: Removal of some VOneBlock components causes impairments with high specificity. Difference in accuracy (top-1) relative to the unmodified VOneResNet50 for each of the variants with removed components, on overall mean, white box attacks, common corruptions, and clean images. Values are reported as mean and SD (n=3 seeds).

Component   Overall Mean  White box   Corruption  Clean
removed     [Δ%]          [Δ%]        [Δ%]        [Δ%]
V1 dist     -0.4±0.3       0.4±0.8    -1.1±0.3    -0.5±0.3
High SF     -2.2±0.2      -3.8±0.3     1.9±0.4    -4.7±0.2
Low SF      -0.7±0.5      -1.6±1.1    -0.7±0.1     0.1±0.3
Complex     -1.0±0.6      -4.5±1.1     0.6±0.4     0.8±0.3
Simple      -3.0±0.5      -2.0±1.1    -4.8±0.5    -2.1±0.3
V1 stoch    -8.8±0.3     -30.6±0.4     1.5±0.5     2.6±0.1

to the standard ResNet50. However, this improvement accounted for only a fraction of the total gains of the VOneResNet50 model (Fig. 3 B), demonstrating that there is a substantial supralinear interaction between the V1 features and the neuronal stochasticity. Merely adding V1 stochasticity to the first block of a standard CNN model does not increase robustness to the same degree as the full VOneBlock: the presence of V1 features more than doubles the contribution to perturbation robustness brought by the addition of neuronal stochasticity.

Finally, we sought to determine whether stochasticity during inference is key to defending against attacks. Thus, we analyzed the white box adversarial performance of VOneResNet50 while quenching stochasticity during the adversarial attack (Fig. 4). Remarkably, the majority of the improvements in adversarial robustness originate from the neuronal stochasticity during training, indicating that V1 stochasticity induces the downstream layers to learn representations that are more robust to adversarial attacks. This is particularly interesting when contrasted with the ANT3×3+SIN defense, which has noise added to input images during training but does not learn representations with notably higher robustness to white box adversarial attacks (Table 1 and Supplementary Fig. B1).
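A sketch of how stochasticity can be quenched at attack/inference time, assuming the VOneBlock exposes a boolean flag for its noise generator (the released code may expose this differently; `noise_active` is a hypothetical name):

```python
import torch

@torch.no_grad()
def eval_accuracy(model, loader, with_noise=True):
    """Top-1 accuracy with the V1 stochasticity generator on or off (a sketch)."""
    model.eval()
    model.vone_block.noise_active = with_noise   # hypothetical flag on the block
    correct = total = 0
    for x, y in loader:
        correct += (model(x).argmax(dim=1) == y).sum().item()
        total += y.numel()
    model.vone_block.noise_active = True         # restore default behavior
    return correct / total
```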

[Figure 4: white box accuracy (0.15-0.45) decomposing the contributions of V1 features without stochasticity, V1 stochasticity during training, and V1 stochasticity during the attack.]

Figure 4: V1 stochasticity during training leads to more robust learned representations. White box accuracy (top-1) for the VOneResNet50 variant without V1 stochasticity, VOneResNet50 with stochasticity removed during the white box attack and inference, and VOneResNet50. VOneResNet50 without stochasticity during attack/inference maintained over half of the robustness gains. Therefore, V1 stochasticity improves model robustness in two ways: rendering the attack itself less effective, and inducing more robust representations downstream of the VOneBlock, the latter accounting for the majority of the gains on our benchmark. Gray bars show the performance of the standard ResNet50. Blue bars show the improvements due to the VOneBlock. Error bars represent SD (n=3 seeds).

5 Discussion

In this work, we demonstrate that adversarial robustness in CNNs is correlated with their ability to explain primate V1 neuronal responses, and that simulating the image processing of primate V1 at the front of standard neural network architectures significantly improves their robustness to image perturbations. Notably, this approach outperformed state-of-the-art defense methods on a large benchmark consisting of adversarial attacks and common image corruptions. Despite the front-end not being constructed to this end, any component of it we removed or modified to be less like primate V1 resulted in less overall robustness, and this revealed that different components improve robustness to different perturbation types. Remarkably, we find that simulated V1 stochasticity


interacts synergistically with the V1 model features and drives the downstream layers to learn representations that are more robust to adversarial perturbations.

Our approach bears some similarity to the pioneering study by Li et al. [40], in which a model's representations were regularized to approximate mouse V1, increasing its robustness on the grey-scale CIFAR dataset; however, here we go further in several key ways. First, while the mouse is gaining traction as a model for studying vision, a vast literature has established macaque vision as a quantitatively accurate model of human vision in general, and of human object recognition in particular [85, 86]. Visual acuity in mice is much lower than in macaques and humans [87, 79], suggesting that vision in the mouse may serve different behavioral functions than in primates. Further, the regularization approach employed by Li et al. does not allow a clear disambiguation of which aspects of mouse V1 contribute to the improved robustness. Since the components of the VOneBlock proposed here are mappable to the brain, we can dissect the contributions of different neuronal populations in primate V1 to robustness against specific image perturbations. Finally, extending the robustness gains of biologically-constrained CNNs from gray-scale CIFAR to the full ImageNet dataset is a critical step towards real-world, human-level applications.

The gains achieved by VOneNets are substantial, particularly against white box attacks, and have tangible benefits over other defense methods. Though adversarial training still provides the strongest defense for the attack statistics it is trained on, it has significant downsides. Beyond its considerable additional computational cost during training, adversarially trained networks have significantly lower performance on clean images, corrupted images, and images perturbed with attack statistics not seen during training, implying that adversarial training in its current form may not be viable as a general defense method. In contrast, by deploying an architectural change, VOneNets improve robustness to all adversarial attacks tested and to many common image corruptions, and they accomplish this with no additional training overhead. This also suggests that the architectural gains of the VOneNet could be stacked with other training-based defense methods to achieve even greater overall robustness gains.

Relative to current methods, the success of this approach derives from engineering in a better approximation of the architecture of the most well studied primate visual area, combined with task optimization of the remaining free parameters of the downstream architecture [26]. This points to two potentially synergistic avenues for further gains: an even more neurobiologically precise model of V1 (i.e., a better VOneBlock), and an even more neurobiologically precise model of the downstream architecture. For example, one could extend the biological fidelity of the VOneBlock, in the hope that it confers even greater robustness, by including properties such as divisive normalization [48, 49] and contextual modulation [88], to name a few. In addition to V1, the retina and the Lateral Geniculate Nucleus (LGN) also play important roles in pre-processing visual information, roles only partially captured by the current V1 model, suggesting that extending the work done here to a retina/LGN front-end has the potential to better align CNNs with human visual object recognition [89, 90]. In addition, though our initial experiments show that multiple components of our V1 model work together to produce greater robustness, the nature of the relationship between adversarial robustness and explained variance in V1 responses is far from resolved. In particular, as the VOneNets with and without stochasticity achieve very similar V1 explained variances but have markedly different levels of adversarial robustness, it is clear that much theoretical work remains in better understanding both when and why matching biology leads to more robust computer vision models.

While neuroscience has recently seen a huge influx of new neurally-mechanistic models and tools drawn from machine learning [91, 92], the most recent advances in machine learning and computer vision have been driven mostly by the widespread availability of computational resources and very large labeled datasets [93], rather than by an understanding of the relevant brain mechanisms. Under the belief that biological intelligence still has a lot of untapped potential to contribute, a number of researchers have been pushing for more neuroscience-inspired machine learning algorithms [37, 42, 32]. The work presented here shows that this aspiration can become reality: the models presented here, drawn directly from primate neurobiology, indeed require less training to achieve more human-like behavior. This is one turn of a new virtuous circle, wherein neuroscience and artificial intelligence each feed into and reinforce the understanding and ability of the other.


Broader Impact

From a technological perspective, the ethical implications of our work are largely aligned with those of computer vision in general. While there is undoubtedly potential for malicious and abusive uses of computer vision, particularly in the form of discrimination or invasion of privacy, we believe that our work will aid in the production of more robust and intuitive behavior of computer vision algorithms. As CNNs are deployed in real-world situations, it is critical that they behave with the same level of stability as their human counterparts. In particular, they should, at the very least, not be confused by changes in input statistics that do not confuse humans. We believe that this work will help to bridge that gap. Furthermore, while algorithms are often thought to be impartial or unbiased, much research has shown that data-driven models like current CNNs are often even more biased than humans, implicitly keying in on and amplifying stereotypes. For this reason, making new CNNs that behave more like humans may actually reduce, or at least make more intuitive, their implicit biases. Unfortunately, we note that even with our work these issues are not yet resolved. While we developed a more neurobiologically-constrained algorithm, it comes nowhere close to human-like behaviour in the wide range of circumstances experienced in the real world. Finally, from the perspective of neuroscience, we think that this work introduces a more accurate model of the primate visual system. Ultimately, better models contribute to a better mechanistic understanding of how the brain works and how to intervene in the case of illness or disease states. We think that our model contributes a stronger foundation for understanding the brain and building novel medical technology, such as neural implants for restoring vision in people with impairments.

Acknowledgments and Disclosure of Funding

We thank J. Anthony Movshon and Corey M. Ziemba for access to the V1 neural dataset; MIT-IBM Satori for providing the compute necessary for this project; John Cohn for technical support; Sijia Liu and Wieland Brendel for advice on adversarial attacks; and Adam Marblestone for insightful discussion. This work was supported by the PhRMA Foundation Postdoctoral Fellowship in Informatics (T.M.); the Semiconductor Research Corporation (SRC) and DARPA (J.D., M.S., J.J.D.); the Massachusetts Institute of Technology Shoemaker Fellowship (M.S.); Office of Naval Research grant MURI-114407 (J.J.D.); the Simons Foundation grant SCGB-542965 (J.J.D.); and the MIT-IBM Watson AI Lab grant W1771646 (J.J.D.).

References

[1] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet Classification with Deep Convolutional Neural Networks". In: Advances in Neural Information Processing Systems 25. Ed. by F. Pereira et al. Curran Associates, Inc., 2012, pp. 1097-1105.
[2] Christian Szegedy et al. "Going Deeper with Convolutions". In: (Sept. 2014). arXiv: 1409.4842 [cs.CV].
[3] Karen Simonyan and Andrew Zisserman. "Very Deep Convolutional Networks for Large-Scale Image Recognition". In: (Sept. 2014). arXiv: 1409.1556 [cs.CV].
[4] Kaiming He et al. "Deep Residual Learning for Image Recognition". In: (Dec. 2015). arXiv: 1512.03385 [cs.CV].
[5] Kaiming He et al. "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification". In: Proceedings of the IEEE International Conference on Computer Vision (2015), pp. 1026-1034. DOI: 10.1109/ICCV.2015.123. arXiv: 1502.01852.
[6] Christian Szegedy et al. "Intriguing properties of neural networks". In: (Dec. 2013). arXiv: 1312.6199 [cs.CV].
[7] Nicholas Carlini and David Wagner. "Towards Evaluating the Robustness of Neural Networks". In: (Aug. 2016). arXiv: 1608.04644 [cs.CR].
[8] Pin-Yu Chen et al. "EAD: Elastic-Net Attacks to Deep Neural Networks via Adversarial Examples". In: (Sept. 2017). arXiv: 1709.04114 [stat.ML].
[9] Jérôme Rony et al. "Decoupling Direction and Norm for Efficient Gradient-Based L2 Adversarial Attacks and Defenses". In: (Nov. 2018). arXiv: 1811.09600 [cs.CV].


[10] Wieland Brendel et al. "Accurate, reliable and fast robustness evaluation". In: (July 2019). arXiv: 1907.01003 [stat.ML].
[11] Samuel Dodge and Lina Karam. "A Study and Comparison of Human and Deep Learning Recognition Performance Under Visual Distortions". In: (May 2017). arXiv: 1705.02498 [cs.CV].
[12] Robert Geirhos et al. "Generalisation in humans and deep neural networks". In: (Aug. 2018). arXiv: 1808.08750 [cs.CV].
[13] Dan Hendrycks and Thomas Dietterich. "Benchmarking Neural Network Robustness to Common Corruptions and Perturbations". In: (Mar. 2019). arXiv: 1903.12261 [cs.LG].
[14] Nilaksh Das et al. "Keeping the Bad Guys Out: Protecting and Vaccinating Deep Learning with JPEG Compression". In: (May 2017). arXiv: 1705.02900 [cs.CV].
[15] Xuanqing Liu et al. "Towards Robust Neural Networks via Random Self-ensemble". In: (Dec. 2017). arXiv: 1712.00673 [cs.LG].
[16] Weilin Xu, David Evans, and Yanjun Qi. "Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks". In: arXiv [cs.CV] (Apr. 2017).
[17] Aleksander Madry et al. "Towards Deep Learning Models Resistant to Adversarial Attacks". In: (June 2017). arXiv: 1706.06083 [stat.ML].
[18] Yang Song et al. "PixelDefend: Leveraging Generative Models to Understand and Defend against Adversarial Examples". In: (Oct. 2017). arXiv: 1710.10766 [cs.LG].
[19] Guneet S. Dhillon et al. "Stochastic Activation Pruning for Robust Adversarial Defense". In: (Mar. 2018). arXiv: 1803.01442 [cs.LG].
[20] Jacob Buckman et al. "Thermometer Encoding: One Hot Way To Resist Adversarial Examples". In: (Feb. 2018). URL: https://openreview.net/forum?id=S18Su--CW.
[21] Chuan Guo et al. "Countering Adversarial Images using Input Transformations". In: (Feb. 2018). URL: https://openreview.net/forum?id=SyJ7ClWCb.
[22] Claudio Michaelis et al. "Benchmarking Robustness in Object Detection: Autonomous Driving when Winter is Coming". In: (2019), pp. 1-23. arXiv: 1907.07484. URL: http://arxiv.org/abs/1907.07484.
[23] Andrew Ilyas et al. "Adversarial Examples Are Not Bugs, They Are Features". In: (May 2019). arXiv: 1905.02175 [stat.ML].
[24] D. Linsley et al. "What are the Visual Features Underlying Human Versus Machine Vision?" In: ICCV, 2017. DOI: 10.1109/ICCVW.2017.331. arXiv: 1701.02704.
[25] Robert Geirhos et al. "ImageNet-Trained CNNs Are Biased Towards Texture". In: International Conference on Learning Representations (2019), pp. 1-20. arXiv: 1811.12231.
[26] Daniel L. K. Yamins et al. "Performance-optimized hierarchical models predict neural responses in higher visual cortex". In: Proceedings of the National Academy of Sciences 111.23 (2014), pp. 8619-8624. DOI: 10.1073/pnas.1403112111.
[27] Charles F. Cadieu et al. "Deep neural networks rival the representation of primate IT cortex for core visual object recognition". In: PLoS Computational Biology 10.12 (2014), e1003963.
[28] Seyed-Mahdi Khaligh-Razavi and Nikolaus Kriegeskorte. "Deep supervised, but not unsupervised, models may explain IT cortical representation". In: PLoS Comput Biol 10.11 (Nov. 2014), e1003915.
[29] Umut Güçlü and Marcel A. J. van Gerven. "Deep Neural Networks Reveal a Gradient in the Complexity of Neural Representations across the Brain's Ventral Visual Pathway". In: The Journal of Neuroscience 35.27 (2015), pp. 10005-10014. DOI: 10.1523/JNEUROSCI.5023-14.2015. arXiv: 1411.6422.
[30] Santiago A. Cadena et al. "Deep convolutional models improve predictions of macaque V1 responses to natural images". In: PLoS Comput Biol 15.4 (Apr. 2019), e1006897.
[31] Martin Schrimpf et al. "Brain-Score: Which Artificial Neural Network for Object Recognition is most Brain-Like?" Sept. 2018. URL: https://doi.org/10.1101/407007.


[32] Jonas Kubilius et al. "Brain-Like Object Recognition with High-Performing Shallow Recurrent ANNs". In: Advances in Neural Information Processing Systems 32. Ed. by H. Wallach et al. Curran Associates, Inc., 2019, pp. 12805-12816. URL: http://papers.nips.cc/paper/9441-brain-like-object-recognition-with-high-performing-shallow-recurrent-anns.pdf.
[33] Adam H. Marblestone, Greg Wayne, and Konrad P. Kording. "Toward an Integration of Deep Learning and Neuroscience". In: Front. Comput. Neurosci. 10 (Sept. 2016), p. 94.
[34] William Lotter, Gabriel Kreiman, and David Cox. "Deep Predictive Coding Networks for Video Prediction and Unsupervised Learning". In: (May 2016). arXiv: 1605.08104 [cs.LG].
[35] Aran Nayebi and Surya Ganguli. "Biologically inspired protection of deep networks from adversarial attacks". In: (Mar. 2017). arXiv: 1703.09202 [stat.ML].
[36] Jordan Guerguiev, Timothy P. Lillicrap, and Blake A. Richards. "Towards deep learning with segregated dendrites". In: Elife 6 (Dec. 2017).
[37] Demis Hassabis et al. "Neuroscience-Inspired Artificial Intelligence". In: Neuron 95.2 (2017), pp. 245-258. DOI: 10.1016/j.neuron.2017.06.011.
[38] Grace W. Lindsay and Kenneth D. Miller. "How biological attention mechanisms improve task performance in a large-scale visual system model". In: Elife 7 (Oct. 2018).
[39] Hanlin Tang et al. "Recurrent computations for visual pattern completion". In: Proceedings of the National Academy of Sciences 115.35 (2018), pp. 8835-8840. DOI: 10.1073/pnas.1719397115.
[40] Zhe Li et al. "Learning From Brains How to Regularize Machines". In: (Nov. 2019). arXiv: 1911.05072 [cs.LG].
[41] Hosein Hasani, Mahdieh Soleymani, and Hamid Aghajan. "Surround Modulation: A Bio-inspired Connectivity Structure for Convolutional Neural Networks". In: NeurIPS (2019), pp. 15877-15888. URL: http://papers.nips.cc/paper/9719-surround-modulation-a-bio-inspired-connectivity-structure-for-convolutional-neural-networks.
[42] Fabian H. Sinz et al. "Engineering a Less Artificial Intelligence". In: Neuron 103.6 (Sept. 2019), pp. 967-979.
[43] Anthony M. Zador. "A critique of pure learning and what artificial neural networks can learn from animal brains". In: Nat. Commun. 10.1 (Aug. 2019), p. 3770.
[44] J. Deng et al. "ImageNet: A large-scale hierarchical image database". In: 2009 IEEE Conference on Computer Vision and Pattern Recognition. June 2009, pp. 248-255.
[45] D. H. Hubel and T. N. Wiesel. "Receptive Fields, Binocular Interaction and Functional Architecture in the Cat's Visual Cortex". In: Journal of Physiology (1962), pp. 106-154.
[46] J. P. Jones and L. A. Palmer. "An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex". In: Journal of Neurophysiology 58.6 (1987), pp. 1233-1258. DOI: 10.1152/jn.1987.58.6.1233.
[47] E. H. Adelson and J. R. Bergen. "Spatiotemporal energy models for the perception of motion". In: Journal of the Optical Society of America A 2.2 (1985), pp. 284-299. DOI: 10.1364/JOSAA.2.000284.
[48] David J. Heeger, E. P. Simoncelli, and J. Anthony Movshon. "Computational models of cortical visual processing". In: Proceedings of the National Academy of Sciences 93.2 (1996), pp. 623-627. DOI: 10.1073/pnas.93.2.623.
[49] M. Carandini, D. J. Heeger, and J. A. Movshon. "Linearity and normalization in simple cells of the macaque primary visual cortex". In: The Journal of Neuroscience 17.21 (Nov. 1997), pp. 8621-44. URL: http://www.ncbi.nlm.nih.gov/pubmed/9334433.
[50] Maximilian Riesenhuber and Tomaso Poggio. "Hierarchical models of object recognition in cortex". In: Nature Neuroscience 2.11 (1999), pp. 1019-1025. DOI: 10.1038/14819.


[51] Thomas Serre, Aude Oliva, and Tomaso Poggio. "A feedforward architecture accounts for rapid categorization". In: Proc. Natl. Acad. Sci. U S A 104.15 (Apr. 2007), pp. 6424-6429.
[52] Nicole C. Rust et al. "Spatiotemporal Elements of Macaque V1 Receptive Fields". In: Neuron 46 (2005), pp. 945-956. DOI: 10.1016/j.neuron.2005.05.021.
[53] Brett Vintch, J. A. Movshon, and E. P. Simoncelli. "A Convolutional Subunit Model for Neuronal Responses in Macaque V1". In: Journal of Neuroscience 35.44 (2015), pp. 14829-14841. DOI: 10.1523/JNEUROSCI.2815-13.2015.
[54] Nicolas Papernot et al. "Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks". In: (Nov. 2015). arXiv: 1511.04508 [cs.CR].
[55] Mayank Singh et al. "Harnessing the Vulnerability of Latent Layers in Adversarially Trained Models". In: (May 2019). arXiv: 1905.05186 [cs.LG].
[56] Cihang Xie et al. "Feature denoising for improving adversarial robustness". In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019, pp. 501-509.
[57] Eric Wong, Leslie Rice, and J. Zico Kolter. "Fast is better than free: Revisiting adversarial training". In: (Jan. 2020). URL: https://openreview.net/forum?id=BJx040EFvH.
[58] Dan Hendrycks et al. "AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty". In: (Dec. 2019). arXiv: 1912.02781 [stat.ML].
[59] Evgenia Rusak et al. "Increasing the robustness of DNNs against image corruptions by playing the Game of Noise". In: (Jan. 2020). arXiv: 2001.06057 [cs.CV].
[60] Ali Shafahi et al. "Adversarial training for free!" In: Advances in Neural Information Processing Systems 32. Ed. by H. Wallach et al. Curran Associates, Inc., 2019, pp. 3358-3369.
[61] Eric Wong, Leslie Rice, and J. Zico Kolter. "Fast is better than free: Revisiting adversarial training". In: (Sept. 2020). URL: https://openreview.net/forum?id=BJx040EFvH.
[62] Dimitris Tsipras et al. "Robustness May Be at Odds with Accuracy". Sept. 2019. URL: https://openreview.net/forum?id=SyxAb30cY7.
[63] Yash Sharma and Pin-Yu Chen. "Attacking the Madry Defense Model with L1-based Adversarial Examples". In: (Oct. 2017). arXiv: 1710.10733 [stat.ML].
[64] Lukas Schott et al. "Towards the first adversarially robust neural network model on MNIST". In: (May 2018). arXiv: 1805.09190 [cs.CV].
[65] Cihang Xie et al. "Mitigating Adversarial Effects Through Randomization". Feb. 2018. URL: https://openreview.net/forum?id=Sk9yuql0Z.
[66] Adnan Siraj Rakin, Zhezhi He, and Deliang Fan. "Parametric Noise Injection: Trainable Randomness to Improve Deep Neural Network Robustness against Adversarial Attack". In: (Nov. 2018). arXiv: 1811.09310 [cs.LG].
[67] Joonho Lee et al. "ProbAct: A Probabilistic Activation Function for Deep Neural Networks". In: (May 2019). arXiv: 1905.10761 [cs.LG].
[68] Anish Athalye, Nicholas Carlini, and David Wagner. "Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples". In: (Feb. 2018). arXiv: 1802.00420 [cs.LG].
[69] Alex Krizhevsky. "Learning Multiple Layers of Features from Tiny Images". In: (Apr. 2009).
[70] Adam Paszke et al. "Automatic differentiation in PyTorch". In: (Oct. 2017). URL: https://openreview.net/forum?id=BJJsrmfCZ.
[71] Saining Xie et al. "Aggregated Residual Transformations for Deep Neural Networks". In: (Nov. 2016). arXiv: 1611.05431 [cs.CV].
[72] Gao Huang et al. "Densely Connected Convolutional Networks". In: (Aug. 2016). arXiv: 1608.06993 [cs.CV].
[73] Forrest N. Iandola et al. "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size". In: (Feb. 2016). arXiv: 1602.07360 [cs.CV].
[74] Ningning Ma et al. "ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design". In: (July 2018). arXiv: 1807.11164 [cs.CV].
[75] Mingxing Tan et al. "MnasNet: Platform-Aware Neural Architecture Search for Mobile". In: (July 2018). arXiv: 1807.11626 [cs.CV].


[76] Logan Engstrom et al. Robustness (Python Library). 2019. URL: https://github.com/MadryLab/robustness.
[77] Jeremy Freeman et al. "A functional and perceptual signature of the second visual area in primates". In: Nat. Neurosci. 16.7 (July 2013), pp. 974-981.
[78] Russell L. De Valois, E. W. Yund, and Norva Hepler. "The orientation and direction selectivity of cells in macaque visual cortex". In: Vision Research 22 (1982), pp. 531-544.
[79] Russell L. De Valois, Duane G. Albrecht, and Lisa G. Thorell. "Spatial Frequency Selectivity of Cells in Macaque Visual Cortex". In: Vision Research 22 (1982), pp. 545-559.
[80] Dario L. Ringach. "Spatial Structure and Symmetry of Simple-Cell Receptive Fields in Macaque Primary Visual Cortex". In: Journal of Neurophysiology 88 (2002), pp. 455-463.
[81] Yasmine El-Shamayleh et al. "Visual response properties of V1 neurons projecting to V2 in macaque". In: Journal of Neuroscience 33.42 (2013), pp. 16594-16605. DOI: 10.1523/JNEUROSCI.2753-13.2013.
[82] W. R. Softky and C. Koch. "The highly irregular firing of cortical cells is inconsistent with temporal integration of random EPSPs". In: Journal of Neuroscience 13.1 (1993), pp. 334-350. DOI: 10.1523/jneurosci.13-01-00334.1993.
[83] Diederik P. Kingma and Max Welling. "Auto-Encoding Variational Bayes". In: (Dec. 2013). arXiv: 1312.6114v10 [stat.ML].
[84] Nicholas Carlini et al. "On Evaluating Adversarial Robustness". In: (Feb. 2019). arXiv: 1902.06705v2 [cs.LG].
[85] Nikolaus Kriegeskorte, Marieke Mur, and Peter Bandettini. "Representational similarity analysis - connecting the branches of systems neuroscience". In: Front. Syst. Neurosci. 2 (Nov. 2008), p. 4.
[86] R. Rajalingham, K. Schmidt, and James J. DiCarlo. "Comparison of Object Recognition Behavior in Human and Monkey". In: Journal of Neuroscience 35.35 (2015), pp. 12127-12136. DOI: 10.1523/JNEUROSCI.0573-15.2015.
[87] Cristopher M. Niell and Michael P. Stryker. "Highly selective receptive fields in mouse visual cortex". In: The Journal of Neuroscience 28.30 (July 2008), pp. 7520-36. DOI: 10.1523/JNEUROSCI.0623-08.2008.
[88] Pieter R. Roelfsema. "Cortical algorithms for perceptual grouping". In: Annual Review of Neuroscience 29 (Jan. 2006), pp. 203-27. DOI: 10.1146/annurev.neuro.29.051605.112939.
[89] Alexander Berardino et al. "Eigen-Distortions of Hierarchical Representations". In: (Oct. 2017). arXiv: 1710.02266 [cs.CV].
[90] Jack Lindsey et al. "A Unified Theory of Early Visual Representations from Retina to Cortex through Anatomically Constrained Deep CNNs". In: ICLR (2019), pp. 1-17. DOI: 10.1101/511535. arXiv: 1901.00945.
[91] Daniel L. K. Yamins and James J. DiCarlo. "Using goal-driven deep learning models to understand sensory cortex". In: Nat. Neurosci. 19.3 (Mar. 2016), pp. 356-365.
[92] Alexander Mathis et al. "DeepLabCut: markerless pose estimation of user-defined body parts with deep learning". In: Nature Neuroscience (2018). DOI: 10.1038/s41593-018-0209-y. URL: http://www.nature.com/articles/s41593-018-0209-y.
[93] Yann LeCun, Yoshua Bengio, and Geoffrey E. Hinton. "Deep learning". In: Nature 521.7553 (2015), pp. 436-444. DOI: 10.1038/nature14539.
[94] T. Marques, M. Schrimpf, and J. J. DiCarlo. "Hierarchical neural network models that more closely match primary visual cortex tend to better explain higher level visual cortical responses". In: Cosyne. 2020.
[95] Alexey Kurakin, Ian Goodfellow, and Samy Bengio. "Adversarial examples in the physical world". In: (July 2016). arXiv: 1607.02533 [cs.CV].
[96] Maria-Irina Nicolae et al. "Adversarial Robustness Toolbox v1.2.0". In: CoRR 1807.01069 (2018). URL: https://arxiv.org/pdf/1807.01069.


[97] Florian Tramèr et al. "On Adaptive Attacks to Adversarial Example Defenses". In: (Feb. 2020). arXiv: 2002.08347 [cs.LG]

[98] Jonas Rauber et al. "Foolbox Native: Fast adversarial attacks to benchmark the robustness of machine learning models in PyTorch, TensorFlow, and JAX". In: Journal of Open Source Software 5.53 (2020), p. 2607. DOI: 10.21105/joss.02607. URL: https://doi.org/10.21105/joss.02607

[99] Matthew D. Zeiler and Rob Fergus. "Visualizing and Understanding Convolutional Networks". In: (Nov. 2013). arXiv: 1311.2901. URL: http://arxiv.org/abs/1311.2901

[100] Chris Olah et al. "An Overview of Early Vision in InceptionV1". In: Distill (2020). https://distill.pub/2020/circuits/early-vision. DOI: 10.23915/distill.00024.002

[101] Ian Nauhaus et al. "Orthogonal micro-organization of orientation and spatial frequency in primate primary visual cortex". In: Nature Neuroscience 15.12 (2012), pp. 1683–1690. DOI: 10.1038/nn.3255


Supplementary Material

A Neural Explained Variance

We evaluated how well responses to given images in candidate CNNs explain responses of single V1 neurons to the same images, using a standard neural predictivity methodology based on partial least squares (PLS) regression [31, 32]. We used a neural dataset obtained by extracellular recordings from 102 single units while presenting 450 different images for 100 ms, 20 times each, spanning 4 deg of the visual space. Results from this dataset were originally published in a study comparing V1 and V2 responses to naturalistic textures [77]. To avoid over-fitting, for each CNN we first chose its best V1 layer on a subset of the dataset (n=135 images) and reported the final neural explained variance calculated for the chosen layer on the remaining images of the dataset (n=315 images). Model units were mapped to each V1 neuron linearly using a PLS regression model with 25 components. The mapping was performed for each neuron using 90% of the image responses in the respective dataset and tested on the remaining held-out 10%, in a 10-fold cross-validation strategy. We reported neural explained variance values normalized by the neuronal internal consistency. We calculated V1 neural explained variance for a large pool of CNNs (n=30 models).
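As an illustration, the following is a minimal sketch of this mapping procedure, not the exact analysis code. It assumes `features` is an array of the chosen layer's activations to the held-out images (images × units) and `responses` the corresponding trial-averaged V1 spike counts (images × neurons); normalization by the internal consistency is omitted for brevity.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import KFold

def v1_explained_variance(features, responses, n_components=25, n_folds=10):
    """Cross-validated explained variance of each V1 neuron under a PLS map."""
    ev_per_neuron = []
    for neuron in range(responses.shape[1]):
        y = responses[:, neuron]
        y_pred = np.empty_like(y)
        # Fit the 25-component PLS map on 90% of images, predict the held-out 10%
        kf = KFold(n_splits=n_folds, shuffle=True, random_state=0)
        for train_idx, test_idx in kf.split(features):
            pls = PLSRegression(n_components=n_components)
            pls.fit(features[train_idx], y[train_idx])
            y_pred[test_idx] = pls.predict(features[test_idx]).ravel()
        ev_per_neuron.append(1 - np.var(y - y_pred) / np.var(y))
    return np.array(ev_per_neuron)
```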

When using CNNs in a computer vision task such as object recognition in ImageNet, their field-of-view (in degrees) is not defined, since the only relevant input property is the image resolution (in pixels). However, when using a CNN to model the ventral stream, the visual spatial extent of the model's input is of key importance to ensure that it is correctly mapped to the data it is trying to explain. We assigned a spatial extent of 8 deg to all models, since this value has been previously used in studies benchmarking CNNs as models of the primate ventral stream [31, 94] and is consistent with the results in Cadena et al. [30].


B Image Perturbations

B.1 White Box Adversarial Attacks

For performing white box adversarial attacks, we used untargeted projected gradient descent (PGD) [17] (also referred to as the Basic Iterative Method [95]) with L∞, L2, and L1 norm constraints. Given an image x, this method uses the gradient of the loss to iteratively construct an adversarial image x_adv which maximizes the model loss within an Lp bound around x. Formally, in the case of an L∞ constraint, PGD iteratively computes x_adv as

$$x^{t+1} = \mathrm{Proj}_{x+S}\left(x^t + \alpha\, \mathrm{sgn}\left(\nabla_{x^t} L(\theta, x^t, y)\right)\right)$$

where x is the original image and the Proj operator ensures that the final computed adversarial image x_adv is constrained to the space x + S, here the L∞ ball around x. In the case of L1 or L2 constraints, at each iteration $\nabla_{x^t} L(\theta, x^t, y)$ is scaled to have an L1 or L2 norm of α, and the Proj operator ensures the final x_adv is within an L1 or L2 norm ball around x.
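For concreteness, below is a minimal PyTorch sketch of the L∞ variant, under the assumptions that `model` outputs logits and that images are scaled to [0, 1]; it illustrates the update above and is not the exact attack code used in the experiments.

```python
import torch
import torch.nn.functional as F

def pgd_linf(model, x, y, eps=1/255, n_iter=64):
    """Untargeted PGD under an L-infinity constraint (illustrative sketch)."""
    alpha = eps / 32                                   # step size used in Section B.1
    x_adv = x.clone().detach()
    for _ in range(n_iter):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()        # gradient ascent on the loss
            x_adv = x + (x_adv - x).clamp(-eps, eps)   # Proj onto the eps-ball around x
            x_adv = x_adv.clamp(0.0, 1.0)              # keep a valid image
    return x_adv.detach()
```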

For our white box benchmarks, we used δ∞ ∈ [1/1020, 1/255], δ2 ∈ [0.15, 0.6], and δ1 ∈ [40, 160] constraints, where δ = x − x_adv, for a total of six different attack variants with two strengths under each norm constraint. We arrived at these strengths by calibrating the lower-strength attack for each norm to give a reasonable degree of variance in normally trained CNNs. Because normally trained CNNs are extremely sensitive to adversarial perturbations, this resulted in perturbations well below the level of perceptibility and, in most cases, particularly with the L∞ constraint, perturbations that would not even be rendered differently by a monitor for display. We set the high attack strength at 4 times the low value, which brought standard ImageNet-trained CNNs to nearly chance performance. Still, even this higher perturbation strength remained practically imperceptible (Fig. B3, left). Each perturbation was computed with 64 PGD iterations and a step size α = ε/32, where ε is the bound on ‖δ‖p, for the same 5000 images from the ImageNet validation set on a per-model basis, and final top-1 accuracy was reported. The Adversarial Robustness Toolbox [96] was used for computing the attacks.

While adversarial formulations using Lagrangian relaxation, like the C&W attack [7] and the EAD attack [8], are known to be more effective than PGD for finding minimal perturbations under L2 and L1 constraints, these formulations involve computationally expensive line searches, often requiring thousands of iterations to converge, making them difficult to scale to 5000 ImageNet samples for a large number of models. Furthermore, because some of the models we tested operate with stochasticity during inference, we found that other state-of-the-art attacks formulated to efficiently find minimal perturbation distances [9, 10] were generally difficult to tune, as their success becomes stochastic as they near the decision boundary. Thus, we proceeded with the PGD formulation, as we found it to be the most reliable, computationally tractable, and conceptually simple to follow. We share our model and weights, and encourage anyone interested to attack our model to verify our results.

Although the stochastic models were implemented with the reparameterization trick for random sampling, which leaves the gradients intact [83], we performed several sanity checks to make sure that the white box attacks were effectively implemented. Following the advice of Athalye et al. [68] and Carlini et al. [84], we observed that in all of the white box attacks, increasing the attack strength decreased model performance, eventually bringing accuracy to zero (Fig. B1), and, additionally, that for any given constraint and attack strength, increasing PGD iterations from one to many increased attack effectiveness (Fig. B2). Furthermore, to be extra cautious, we followed the method Athalye et al. [68] used to break Stochastic Activation Pruning [19] and replaced $\nabla_x f(x)$ with

$$\sum_{i=1}^{k} \nabla_x f(x)$$

effectively performing Monte Carlo (MC) sampling of the loss gradient at every PGD iteration. While with high perturbation strength even k = 1 was sufficient to bring our model's accuracy to zero, we found that k = 10 generally achieved higher attack success at intermediate and large perturbation strengths (Table B1), and thus we used this approach for all attacks on stochastic models.
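A hedged sketch of this Monte Carlo gradient follows: each of the k forward passes draws fresh noise from the stochastic model, and the resulting sum replaces the single gradient in each PGD step.

```python
import torch
import torch.nn.functional as F

def mc_gradient(model, x_adv, y, k=10):
    """Sum of k loss gradients through a stochastic model, one noise draw per
    forward pass; used in place of the single gradient in each PGD step."""
    grad = torch.zeros_like(x_adv)
    for _ in range(k):
        x_in = x_adv.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_in), y)     # stochastic forward pass
        grad += torch.autograd.grad(loss, x_in)[0]
    return grad   # take grad.sign() (L-inf) or rescale it (L1/L2) in the update
```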


Figure B1: VOneResNet50 improves robustness to white box attacks over a wide range of perturbation strengths. White box accuracy vs. perturbation strength curves for PGD attacks with L∞, L2, and L1 constraints, for ResNet50, VOneResNet50, ATL∞, and ANT3×3+SIN. Adding the VOneBlock to ResNet50 consistently improves robustness under all constraints and attack strengths; VOneResNet50 can withstand perturbations roughly 4 times higher than the standard ResNet50 for the same performance level. Remarkably, VOneResNet50 outperformed the adversarially trained model (ATL∞) for a considerable range of perturbation strengths, particularly with the L1 constraint. Adversarial noise training combined with Stylized ImageNet (ANT3×3+SIN) shows virtually no improvement in robustness to white box attacks. In all white box attacks, and for all models, increasing the attack strength decreased performance, eventually bringing accuracy to zero.

Figure B2: White box attack effectiveness converges for a large number of PGD iteration steps. White box accuracy vs. iteration curves for PGD attacks with δ∞ = 1/255, δ2 = 0.6, and δ1 = 160 constraints. Increasing the number of PGD iteration steps makes the attack more effective, generally converging after 32 iterations.

Table B1: Improving the effectiveness of white box attacks with Monte Carlo sampling. White box attack accuracy (top-1, in %) under different norm constraints, and white box overall mean, using standard PGD and PGD with Monte Carlo sampling of the loss gradients (PGD-MC, k = 10) on VOneResNet50, for 64 PGD iteration steps. For all stochastic models, we used PGD with Monte Carlo sampling in the white box attacks throughout the paper. Values are mean and SD (n=3 seeds).

Attack type      Mean        PGD-L1                   PGD-L2                   PGD-L∞
                             40          160          0.15        0.60         1/1020      1/255
Standard PGD     52.8±0.2    64.4±0.4    42.2±0.1     65.2±0.3    45.3±0.4     63.4±0.4    36.4±0.4
PGD-MC (k=10)    51.1±0.4    64.2±0.5    40.4±0.5     65.3±0.4    44.3±0.3     61.9±0.4    30.3±0.5


Although VOneResNet50 was not designed to explicitly improve robustness to white box attacks, it offers substantial gains over the standard ResNet50 model (Table 1). A closer inspection of the white box accuracy vs. perturbation strength curves (Fig. B1) reveals that VOneResNet50 can withstand perturbations roughly 4 times higher than the standard ResNet50 for the same performance level (the blue curve is shifted to the right of the black curve by a factor of 4). To help visualize how this difference translates into actual images, we show in Fig. B3 examples of white box adversarial images for both ResNet50 and VOneResNet50, for the perturbation strengths that bring the respective models to nearly chance performance.

Figure B3: Comparison of adversarial images for ResNet50 and VOneResNet50. Visualization of white box attacks under the 3 different norm constraints (L∞, L2, and L1) for the perturbation strength that brings the model to nearly chance performance (Fig. B1). Left: ResNet50; right: VOneResNet50. For each attack, the perturbation is shown on the left (scaled for visualization), and the perturbed image appears next to it. Perturbation strengths and classes of the adversarial images are shown over the images.


B.2 Additional attack controls

The field of adversarial robustness is littered with examples of proposed defense methods that were quickly defeated with some minor attack optimizations or adaptations [68, 84, 97]. For this reason, we took great care to verify our claims of improved robustness of the VOneNets. In addition to the implementation details and controls described in the previous section, we validated our results by further optimizing our attacks on both the standard stochastic VOneResNet50 and the VOneResNet50 with stochasticity removed during the attack and inference.

First, on the standard stochastic VOneResNet50, we ensured that our attack hyper-parameters, including step size (α), number of gradient samples (k), and PGD iterations, are sufficient for evaluating the network's robustness. For PGD with δ∞ = 1/255 and 64 iterations, we performed a large grid search over α and k, observing only a marginal increase in attack effectiveness (Fig. B4). For the most effective step size (α = ε/8), increasing the number of gradient samples beyond k = 16 no longer decreases accuracy under the attack. At k = 128, VOneResNet50 accuracy is only reduced from 29.15% to 26.0%, remaining a large margin above the ResNet50 accuracy of 0.8%. Additionally, increasing the number of PGD iterations all the way up to 1000 (with α = ε/8 and k = 16) only further reduced performance by 0.5%. Finally, using 5-10 random restarts did not improve the attack beyond the most effective k and α settings.

Figure B4: Further attack optimization does not overturn results. White box accuracy for PGD with δ∞ = 1/255, 64 iterations, on 500 images. A large grid search over step size and number of gradient samples only marginally improves attack effectiveness and only marginally reduces model performance. Lines represent mean and shaded areas SD (n=3 seeds).

Second, while boundary attacks are not ideal against the standard stochastic VOneResNet50 due to the shifting decision boundary, we investigated VOneResNet50 without stochasticity during attack/inference using the Brendel and Bethge (B&B) attack [10], a powerful white box adversary that aims to find minimal adversarial perturbations along the decision boundary of the network. We reasoned that if the B&B attack offers similar results to those observed using PGD, this is a good sanity check that our observed robustness is reliable and not a result of masked gradients around the input or a quirk of PGD. Using the Foolbox [98] implementation of the L∞ B&B attack, initialized with 500 successful adversarial images generated using PGD δ∞ = 8/255, we see nearly identical scores as compared to standard PGD with optimal hyper-parameters at δ∞ = 1/1020 and δ∞ = 1/255.

We conclude that these additional controls do not qualitatively change our reported results, and provide further evidence that the observed improved robustness of the VOneNets is real and not an artifact.


B.3 Common Image Corruptions

For evaluating model robustness to common image corruptions, we used ImageNet-C, a dataset publicly available at https://github.com/hendrycks/robustness [13]. It consists of 15 different types of corruptions, each at five levels of severity, for a total of 75 different perturbations (Fig. B5). The corruption types fall into four categories: noise, blur, weather, and digital effects. Noise includes Gaussian noise, shot noise, and impulse noise; blur includes motion blur, defocus blur, zoom blur, and glass blur; weather includes snow, frost, fog, and brightness; digital effects include JPEG compression, elastic transformations, contrast, and pixelation. For every corruption type, and at every severity, we tested on all 50,000 images from the ImageNet validation set. We note that these images were JPEG-compressed by the original authors for sharing, which is known to have a minor but noticeable effect on final network performance.
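A sketch of the evaluation loop is shown below. It assumes the public ImageNet-C release is laid out as <root>/<corruption>/<severity>/<class>/... and reuses the normalization from Section C.5; it is illustrative rather than the exact benchmark code.

```python
import torch
from torchvision import datasets, transforms

CORRUPTIONS = ["gaussian_noise", "shot_noise", "impulse_noise",
               "defocus_blur", "glass_blur", "motion_blur", "zoom_blur",
               "snow", "frost", "fog", "brightness",
               "contrast", "elastic_transform", "pixelate", "jpeg_compression"]

def imagenet_c_accuracy(model, root, device="cuda"):
    # The released ImageNet-C images are already 224x224, so only tensor
    # conversion and the normalization from Section C.5 are applied.
    tf = transforms.Compose([transforms.ToTensor(),
                             transforms.Normalize([0.5] * 3, [0.5] * 3)])
    model.eval().to(device)
    accuracy = {}
    for corruption in CORRUPTIONS:
        for severity in range(1, 6):
            ds = datasets.ImageFolder(f"{root}/{corruption}/{severity}", tf)
            loader = torch.utils.data.DataLoader(ds, batch_size=256, num_workers=8)
            correct = total = 0
            with torch.no_grad():
                for x, y in loader:
                    pred = model(x.to(device)).argmax(dim=1).cpu()
                    correct += (pred == y).sum().item()
                    total += y.numel()
            accuracy[(corruption, severity)] = correct / total
    return accuracy
```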

Figure B5: Common image corruptions with intermediate severity. Visualization of all 15 common image corruption types, evaluated at severity 3. First row: original image followed by the noise corruptions; second row: blur corruptions; third row: weather corruptions; fourth row: digital corruptions.


Figure B6: VOneResNet50 improves robustness to noise, blur, and digital corruptions. Corruption accuracy vs. severity curves for the four categories of common image corruptions: noise, blur, weather, and digital. Adding the VOneBlock to ResNet50 improves robustness to all corruption categories except weather. The adversarially trained model (ATL∞) is considerably worse at common image corruptions. Adversarial noise combined with Stylized ImageNet training (ANT3×3+SIN) is consistently more robust to all corruption categories. For ResNet50 and VOneResNet50, error-bars represent SD (n=3 seeds).

Table B2: Detailed common image corruption results. Adding the VOneBlock to ResNet50 improves robustness to several of the corruption types. However, for others, particularly fog and contrast, robustness decreases. Interestingly, these are the corruption types on which the adversarially trained model (ATL∞) also performs worst. Adversarial noise combined with Stylized ImageNet training (ANT3×3+SIN) is the most robust model for most of the corruption types. Accuracies are top-1, in %. For ResNet50 and VOneResNet50, values are mean and SD (n=3 seeds).

                    Noise                               Blur
Model               Gaussian    Shot        Impulse     Defocus     Glass       Motion      Zoom
ResNet50 [4]        29.6±1.1    27.9±0.9    24.5±1.2    38.8±0.7    26.1±0.6    37.9±0.4    34.5±0.7
ATL∞ [76]           23.6        22.3        14.8        24.4        34.3        33.2        34.9
ANT3×3+SIN [59]     61.6        60.7        59.9        41.5        38.6        46.0        37.3
VOneResNet50        34.6±1.4    33.4±1.2    31.9±1.5    37.8±0.1    35.7±0.1    37.4±0.2    34.0±0.2

                    Weather                                         Digital
Model               Snow        Frost       Fog         Bright      Contrast    Elastic     Pixelate    JPEG
ResNet50            30.1±0.5    36.7±0.4    43.6±0.8    66.8±0.2    38.8±0.5    44.8±0.3    47.0±2.7    55.1±1.0
ATL∞                29.5        31.2        8.4         54.1        10.2        48.4        56.3        58.7
ANT3×3+SIN          47.7        52.2        54.9        68.2        49.3        52.7        58.9        59.3
VOneResNet50        25.2±0.5    36.8±0.4    30.1±0.5    62.4±0.3    28.5±0.5    48.7±0.1    63.3±0.2    61.0±0.3


B.4 Other defense methods

We compared the performance of VOneResNet50 on the described computer vision benchmarks with two training-based defense methods using the ResNet50 architecture: adversarial training with a δ∞ = 4/255 constraint (ATL∞) [76], and adversarial noise combined with Stylized ImageNet training (ANT3×3+SIN) [59].

The ATL∞ model was downloaded from the publicly available adversarially trained models at https://github.com/MadryLab/robustness [76]. This model was trained following the adversarial training paradigm of Madry et al. [17] to solve the min-max problem

$$\min_\theta \rho(\theta), \quad \text{where} \quad \rho(\theta) = \mathbb{E}_{(x,y)\sim D}\left[\max_{\delta \in S} L(\theta, x+\delta, y)\right] \tag{1}$$

Here, the goal is to learn parameters θ minimizing the loss L for training images x and labels y drawn from D, while perturbing x with δ ∈ S to maximally increase the loss. In this case, D is ImageNet, and PGD with δ∞ = 4/255 is used to find the perturbation maximizing the loss for a given (x, y) pair. The δ∞ = 4/255 constraint model was selected for comparison because it had the best performance on our conglomerate benchmark of adversarial attacks and common image corruptions.

The ANT3×3+SIN model was obtained from https://github.com/bethgelab/game-of-noise. Recently, Rusak et al. showed that training a ResNet50 model with several types of additive input noise improved robustness to common image corruptions [59]. Inspired by this observation, the authors decided to train a model while optimizing a noise distribution that maximally confuses it. Similarly to standard adversarial training, this results in solving a min-max problem

$$\min_\theta \max_\phi \; \mathbb{E}_{(x,y)\sim D}\, \mathbb{E}_{\delta \sim p_\phi(\delta)}\left[L(\theta, x+\delta, y)\right] \tag{2}$$

where $p_\phi(\delta)$ is the maximally confusing noise distribution. The main difference to regular adversarial training in Equation 1 is that, while in the former δ is optimized directly, here δ is found by optimizing a constrained distribution with local spatial correlations. Complementing the adversarial noise training with Stylized ImageNet training [25] produced the model with the current best robustness on the ImageNet-C benchmark as a standalone method.


C VOneNet implementation details

C.1 Convolutional layer

The convolutional layer of the VOneBlock is a mathematically parameterized Gabor filter bank (GFB). We set the stride of the GFB to four, yielding a 56×56 spatial map of activations. Since the number of channels in most CNNs' first convolution is relatively small (64 in the architectures adapted), we used a larger number in the VOneBlock so that the Gabors would cover the large parameter space and better approximate primate V1. We set the main VOneNet models to contain 512 channels, equally split between simple and complex cells (see Fig. C2A for an analysis of how the number of channels affects performance). Each channel in the GFB convolves a single color channel of the input image.

The Gabor function consists of a two-dimensional grating with a Gaussian envelope and is described by the following equation:

$$G_{\theta,f,\phi,n_x,n_y}(x,y) = \frac{1}{2\pi\sigma_x\sigma_y}\, \exp\!\left[-0.5\left(\frac{x_{rot}^2}{\sigma_x^2} + \frac{y_{rot}^2}{\sigma_y^2}\right)\right] \cos\!\left(2\pi f x_{rot} + \phi\right) \tag{3}$$

where

$$x_{rot} = x\cos(\theta) + y\sin(\theta), \qquad y_{rot} = -x\sin(\theta) + y\cos(\theta) \tag{4}$$

$$\sigma_x = \frac{n_x}{f}, \qquad \sigma_y = \frac{n_y}{f} \tag{5}$$

Here, $x_{rot}$ and $y_{rot}$ are the coordinates orthogonal and parallel to the grating, θ is the angle of the grating orientation, f is the spatial frequency of the grating, φ is the phase of the grating relative to the Gaussian envelope, and σx and σy are the standard deviations of the Gaussian envelope orthogonal and parallel to the grating, which can be defined as multiples (nx and ny) of the grating cycle (the inverse of the frequency).
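A NumPy sketch of Eqs. (3)-(5) follows. The kernel size and pixels-per-degree value are the ones implied by this section (25 px kernels; a 224 px input spanning 8 deg, i.e., 28 px/deg), but the discretization details are assumptions rather than the released implementation.

```python
import numpy as np

def gabor_kernel(theta, f, phi, nx, ny, ksize=25, ppd=224 / 8):
    """Gabor filter of Eqs. (3)-(5), sampled on a ksize x ksize pixel grid;
    `ppd` converts pixels to degrees of visual angle."""
    sigma_x, sigma_y = nx / f, ny / f                   # Eq. (5), in degrees
    coords = (np.arange(ksize) - ksize // 2) / ppd      # pixel grid in degrees
    x, y = np.meshgrid(coords, coords)
    x_rot = x * np.cos(theta) + y * np.sin(theta)       # Eq. (4)
    y_rot = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-0.5 * ((x_rot / sigma_x) ** 2 + (y_rot / sigma_y) ** 2))
    grating = np.cos(2 * np.pi * f * x_rot + phi)
    return envelope * grating / (2 * np.pi * sigma_x * sigma_y)   # Eq. (3)
```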

Although the Gabor filter greatly reduces the number of parameters defining the linear spatial component of V1 neurons, it still has five parameters per channel. Fortunately, there is a vast literature in neurophysiology with detailed characterizations of primate V1 response properties, which can be used to constrain these parameters. To instantiate a VOneBlock with C_V1 channels, we sampled C_V1 values for each of the parameters according to an empirically constrained distribution (Table C1). Due to the resolution of the input images, we limited the ranges of spatial frequencies (f < 5.6 cpd) and number of cycles (n > 0.1).

Table C1: Empirical distributions used to sample the GFB parameters in the VOneBlock.

Parameter    Reference
θ            De Valois et al. [78], Fig. 5: orientation tuning (foveal)
f            De Valois et al. [79], Fig. 6: peak spatial frequency (X cells, foveal)
φ            Uniform distribution ∈ [0, 360] deg
nx, ny       Ringach 2002 [80], Fig. 4: (nx, ny) joint distribution
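To illustrate the instantiation step, the sketch below draws one (θ, f, φ, nx, ny) tuple per channel. Only the uniform phase comes directly from Table C1; the lognormal draws are placeholder stand-ins for the empirical histograms digitized from [78, 79, 80], and the clipping reflects the f < 5.6 cpd and n > 0.1 limits above.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_gabor_parameters(n_channels=512):
    theta = rng.uniform(0, np.pi, n_channels)                       # placeholder for [78]
    f = np.clip(rng.lognormal(0.5, 0.7, n_channels), 0.1, 5.6)      # placeholder for [79]
    phi = rng.uniform(0, 2 * np.pi, n_channels)                     # Table C1: uniform phase
    nx = np.clip(rng.lognormal(-0.5, 0.5, n_channels), 0.1, None)   # placeholder for [80]
    ny = np.clip(rng.lognormal(-0.5, 0.5, n_channels), 0.1, None)
    return theta, f, phi, nx, ny
```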

Critical to the correct implementation of the biologically-constrained parameters of the GFB is the choice of the model's field of view in degrees. As previously mentioned in Section A, we used 8 deg as the input spatial extent for all CNN models. It is important to note that the set of spatial filters in the VOneBlock's GFB differs considerably from those in the first convolution of most CNNs (Fig. C1). While standard CNNs learn filters that resemble Gabors in their input layer [1, 99, 100], due to the limited span of their kernels, these do not vary significantly in size and spatial frequency. V1 neurons, on the other hand, are known to exhibit a wide range of receptive field properties. This phenomenon is captured in the VOneBlock, with the spatial frequencies and receptive field sizes of the Gabors ranging over more than one order of magnitude. Due to this high variability, we set the convolution kernel to 25×25 px, which is considerably larger than those in standard CNNs (Fig. C1).


Figure C1: Comparison between filters in the first convolution of standard CNNs and the VOneBlock. Example filters from the first convolution in the VOneBlock, ResNet50, CORnet-S, and AlexNet (from left to right). VOneBlock filters are all parameterized as Gabors, varying considerably in size and spatial frequency. Standard CNNs have some filters with shapes other than Gabors, particularly center-surround, and are limited in size by their small kernels. Kernel sizes are 25 px, 7 px, 7 px, and 11 px for the VOneBlock, ResNet50, CORnet-S, and AlexNet, respectively.


C.2 Nonlinear layer

The VOneBlock's nonlinear layer has two different nonlinearities that are applied to each channel depending on its cell type: a rectified linear transformation for simple cells (6), and the spectral power of a quadrature phase pair (7) for complex cells:

$$S^{nl}_{\theta,f,\phi,n_x,n_y} = \begin{cases} S^{l}_{\theta,f,\phi,n_x,n_y} & \text{if } S^{l}_{\theta,f,\phi,n_x,n_y} > 0 \\ 0 & \text{otherwise} \end{cases} \tag{6}$$

$$C^{nl}_{\theta,f,\phi,n_x,n_y} = \frac{1}{\sqrt{2}}\sqrt{\left(C^{l}_{\theta,f,\phi,n_x,n_y}\right)^2 + \left(C^{l}_{\theta,f,\phi+\pi/2,n_x,n_y}\right)^2} \tag{7}$$

where $S^l$ and $S^{nl}$ are the linear and nonlinear responses of a simple neuron, and $C^l$ and $C^{nl}$ are the same for a complex neuron.
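In code, the two nonlinearities reduce to a few lines; the sketch below assumes the quadrature pair for each complex cell is stored as two separate channels (phases φ and φ + π/2).

```python
import math
import torch

def simple_cell(s_lin):
    """Eq. (6): half-wave rectification of a simple cell's linear response."""
    return torch.relu(s_lin)

def complex_cell(c_lin, c_lin_quad):
    """Eq. (7): spectral power of the quadrature pair, where `c_lin` and
    `c_lin_quad` are the linear responses at phases phi and phi + pi/2."""
    return torch.sqrt(c_lin ** 2 + c_lin_quad ** 2) / math.sqrt(2)
```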

C.3 Neuronal stochasticity generator

In awake monkeys, spike trains of V1 neurons are approximately Poisson, i.e., the variance and mean of spike counts in a given time-window, over a set of repetitions, are roughly the same [82]. We incorporated stochasticity into the VOneBlock to emulate this property of neuronal responses. Since the Poisson distribution is not continuous, it breaks the gradients in a white box attack, giving a false sense of robustness [68]. In order to avoid this situation and facilitate the evaluation of the model's real robustness, our neuronal stochasticity generator, as implemented, uses a continuous second-order approximation of Poisson noise, adding Gaussian noise with variance equal to the activation:

$$R_s \sim \mathcal{N}\left(\mu = R_{ns},\ \sigma^2 = R_{ns}\right) \tag{8}$$

where $R_{ns}$ and $R_s$ are the non-stochastic and stochastic responses of a neuron.
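Below is a sketch of this generator, written with the reparameterization trick [83] so that gradients flow through the noise. `scale` and `baseline` stand in for the affine mapping to the V1 spike-count range described in the next paragraph (their values here are placeholders), and the relu/ε guard is an implementation assumption that keeps the variance non-negative.

```python
import torch

def v1_stochasticity(x, scale=1.0, baseline=0.0, eps=1e-8):
    """Eq. (8) via the reparameterization trick: noise with variance equal to
    the activation, applied in the spike-count range and then mapped back."""
    r = x * scale + baseline                                        # to spike counts
    r = r + torch.sqrt(torch.relu(r) + eps) * torch.randn_like(r)   # Poisson-like noise
    return (r - baseline) / scale                                   # back to CNN range
```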

To approximate the same levels of neuronal stochasticity as primate V1 neurons, it is critical that the VOneBlock activations are in the same range as V1 neuronal responses (number of spike counts in a given time-window). Thus, we applied an affine transformation to the activations so that both the mean stimulus response and the mean baseline activity are the same as those of a population of V1 neurons (Table C2 shows the mean responses and spontaneous activity of V1 neurons measured in different time-windows).

Table C2: Mean stimulus responses and spontaneous activity of a population of V1 neurons for different time-windows.

Time-window [ms]    Stimulus response [spikes]    Spontaneous activity [spikes]
25                  0.324                         0.073
50                  0.655                         0.145
100                 1.05                          0.29
200                 1.38                          0.58

Since the exact time-window over which V1 responses are integrated during visual perception is still an open question, we considered a time-window of 25 ms (Fig. C2B). In order to keep the outputs of the VOneBlock in a range that does not deviate considerably from the typical range of activations in CNNs, so that the model can be efficiently trained using standard training parameters, we applied the inverse transformation to scale the outputs back to their original range after the stochastic layer.

C.4 Hyper-parameters and other design choices

We developed VOneNets from three different standard CNN architectures: ResNet50, CORnet-S, and AlexNet. As previously mentioned, we replaced the first block of each architecture by the VOneBlock and the transition layer to create the respective VOneNet. Table C3 contains the layers that were removed for each base architecture. Except for CORnet-S, the removed layers contained only a single convolution, nonlinearity, and maxpool. Since CORnet-S already had a set of layers pre-committed to V1, we replaced them by the VOneBlock. The torchvision implementation of AlexNet contains a combined stride of eight in the removed layers (four in the convolution and two in the maxpool), followed by a stride of one in the second convolution (outside of the removed layers). In order to more easily adapt it to a VOneBlock with a stride of four, we slightly modified AlexNet's architecture so that it had strides of two in these three layers (first convolution, first maxpool, and second convolution). The results shown in Fig. 2 were obtained using this modified architecture.

Table C3: Layers removed from each base CNN architecture for creating the VOneNets.

Architecture    Layers removed
AlexNet         Conv, ReLU, MaxPool
ResNet50        Conv, Batch-Norm, ReLU, MaxPool
CORnet-S        Conv, Batch-Norm, ReLU, MaxPool, Conv, Batch-Norm, ReLU

The VOneNet architecture was designed to have as few hyper-parameters as possible. When possible, these were constrained either by the base CNN architecture or by neurobiological data: most parameters of the GFB were instantiated by sampling from neuronal distributions of primate V1 neurons; the kernel size was set to 25×25 px to capture the high variability in Gabor sizes; and, like all other CNN models, the spatial extent of the field of view was set to 8 deg (Section A). Nevertheless, there were two hyper-parameters for which the choice was rather arbitrary: the number of channels in the VOneBlock, and the time-window for integrating V1 responses used to scale the activations prior to the stochasticity generator.

Since primate V1 has neurons tuned to a wide range of spatial frequencies for each orientation, and at any given location of the visual space [101], we expected that a large number of channels would be required to cover all of the combinations of Gabor parameters. For the VOneResNet50 architecture, we varied the number of channels in the VOneBlock between 64 and 1024, equally split between simple and complex cells, and measured the performance of the several models after training (Fig. C2A). Clean ImageNet and common image corruption performance improved with channel number. Remarkably, the opposite happened for white box accuracy, with variants of VOneResNet50 with 64 and 128 channels achieving the highest robustness to white box attacks. This result is particularly interesting since it had been shown that increasing the number of channels in all layers of a CNN improves robustness to white box attacks in both standard and adversarially trained models [17]. Therefore, this shows that the improvement in white box robustness of VOneNets, when compared to their base models, cannot be attributed to the higher number of channels. For the main model, we set the channel number to 512, as it offered a good compromise between the different benchmarks. Regarding the time-window for integrating V1 responses, we observed small effects on the models' performance when varying it between 25 and 200 ms. We chose 25 ms for the main model due to a small trend for higher robustness to white box attacks with shorter time-windows.

Figure C2: Exploration of VOneBlock's hyper-parameters in the VOneResNet50 architecture. A: Performance of different VOneResNet50 models with varying number of channels in the VOneBlock (with a 25 ms time-window). B: Same as in A, but varying the time-window for integrating neuronal activity (with 512 channels). Error-bars show SD (n=3 seeds). Red arrows represent the values used in the main model.


C.5 Training details

We used PyTorch 0.4.1 and trained the models on ImageNet 2012 [34]. Images were preprocessed (1) for training with a random crop to 224×224 pixels and random horizontal flipping, and (2) for validation with a central crop to 224×224 pixels. Preprocessing was followed by normalization: subtraction and division by [0.5, 0.5, 0.5]. We used a batch size of 256 images and trained either on 2 GPUs (NVIDIA Titan X / GeForce 1080Ti) or 1 GPU (Quadro RTX 6000 or V100) for 70 epochs. We used step learning rate scheduling: a starting learning rate of 0.1, divided by 10 every 20 epochs. For optimization, we used Stochastic Gradient Descent with weight decay 0.0001, momentum 0.9, and a cross-entropy loss between image labels and model predictions (logits).
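A minimal sketch of the training loop these settings imply follows; `model` and `train_loader` are assumptions (e.g., a VOneResNet50 and a standard ImageNet DataLoader with the preprocessing above).

```python
import torch
import torch.nn as nn

optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)
criterion = nn.CrossEntropyLoss()

for epoch in range(70):
    for images, labels in train_loader:    # batches of 256 preprocessed images
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()                       # lr divided by 10 every 20 epochs
```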

Model weights and code are available at https://github.com/dicarlolab/vonenet.


D VOneBlock variant details

To investigate which components of the VOneBlock are responsible for the increased robustness, we created six variants of the VOneBlock by removing or modifying one of its parts. In all cases, we trained from scratch a corresponding variant of VOneResNet50 with the modified VOneBlock. All other properties of the model and components of the VOneBlock were left unmodified. Here we describe in more detail each of these variants (the name of the variant refers to the component removed):

• V1 distributions: the GFB parameters were sampled from uniform distributions with the same domain as the empirical distributions used in the default VOneBlock (Table C1).
• Low SF: when sampling the spatial frequencies for the GFB, neurons with low peak spatial frequencies (f < 2 cpd) were removed from the empirical distribution [79].
• High SF: similar to the previous variant, but removing neurons with high peak spatial frequencies (f > 2 cpd).
• Complex NL: complex cell channels were removed from the VOneBlock and replaced by simple cell channels.
• Simple NL: simple cell channels were removed from the VOneBlock and replaced by complex cell channels.
• V1 stochasticity: the V1 stochasticity generator was removed from the VOneBlock.

While all of the variants resulted in worse overall mean performance than the unmodified VOneResNet50 (Table 2), some improved specific benchmarks: the variant without high SF had higher accuracy under noise and blur corruptions (Table D1); the variant without V1 stochasticity had higher accuracy on clean images and images with noise and weather corruptions (Tables 2 and D1); and the variant without V1 distributions had higher accuracy under white box attacks with high L∞ perturbations (Table D2).

Table D1: Difference between the common image corruption accuracies of the several VOneResNet50 variants and the unmodified VOneResNet50. Removal of simple cells reduced accuracy for all common image corruption types. Removal of some components of the VOneBlock improved accuracy for specific common image corruptions: removing high SF consistently improved performance for noise and blur corruptions, and removing V1 stochasticity improved performance for noise and weather corruptions. Values are differences in top-1 accuracy [Δ%].

                Noise                               Blur
Component       Gaussian    Shot        Impulse     Defocus     Glass       Motion      Zoom
removed
V1 dist.        -2.8±0.9    -2.4±0.7    -2.9±1.1    -1.6±0.1    0.4±0.2     -0.8±0.3    -0.2±0.5
Low SF          -1.4±0.5    -1.0±0.2    -1.5±0.5    -0.6±0.6    -1.1±0.2    -1.2±0.4    -0.5±0.7
High SF         6.8±1.3     7.3±1.1     7.0±1.3     5.6±0.3     11.5±0.3    3.7±0.0     6.1±0.1
Complex         1.0±1.6     1.2±1.3     0.7±1.8     1.2±0.2     -1.5±0.1    0.2±0.8     0.4±1.1
Simple          -6.1±1.6    -5.4±1.5    -6.4±1.9    -4.1±0.4    -4.2±0.4    -4.9±0.5    -5.4±0.3
V1 stoch.       2.2±2.5     2.2±2.5     2.1±2.9     3.3±0.1     -5.5±0.1    0.9±0.7     0.9±0.1

                Weather                                         Digital
Component       Snow        Frost       Fog         Bright      Contrast    Elastic     Pixelate    JPEG
removed
V1 dist.        -0.2±0.6    -0.9±0.6    -1.9±1.1    -0.8±0.5    -1.4±0.9    0.3±0.4     -1.7±0.3    0.8±0.5
Low SF          -0.4±0.3    -0.6±0.4    0.9±0.3     -0.2±0.2    -0.4±0.5    -0.9±0.5    -1.3±0.2    -0.5±0.4
High SF         1.4±1.0     -0.7±0.7    -9.5±0.4    -5.2±0.4    -4.4±0.5    2.4±0.2     1.9±0.4     -4.6±0.4
Complex         0.7±0.4     0.4±0.3     4.3±0.2     1.3±0.2     2.9±0.2     -0.5±0.2    -1.2±0.0    -1.9±0.5
Simple          -5.0±0.5    -9.0±0.5    -4.4±0.5    -3.9±0.2    -4.8±0.4    -3.9±0.5    -3.5±0.4    -1.8±0.2
V1 stoch.       2.5±0.4     0.3±0.3     12.0±0.5    2.6±0.3     10.0±0.8    -2.8±0.3    -4.0±0.7    -4.0±0.4


Table D2: Difference between the white box attack accuracies of the several VOneResNet50 variants and the unmodified VOneResNet50. Removal of V1 stochasticity considerably reduced accuracy for all white box attacks. Removal of V1 distributions improved accuracy for the L∞ white box attack with high perturbation strength. Values are differences in top-1 accuracy [Δ%].

                PGD-L1                  PGD-L2                  PGD-L∞
Component       40          160         0.15        0.60        1/1020      1/255
removed
V1 dist.        -0.5±0.8    0.1±1.1     -0.8±0.2    0.6±1.0     0.3±0.3     3.0±1.8
Low SF          -0.8±1.2    -3.1±1.1    -0.6±1.0    -2.3±0.8    -0.8±1.3    -1.8±1.0
High SF         -2.2±0.9    -4.7±2.6    -2.6±1.5    -5.1±1.2    -2.6±1.4    -5.5±1.1
Complex         -1.3±0.9    -7.4±1.6    -1.1±0.9    -7.2±1.2    -1.9±1.0    -7.9±1.2
Simple          -2.5±0.8    -1.4±1.4    -2.9±1.1    -1.7±1.2    -2.7±1.2    -0.6±0.8
V1 stoch.       -23.7±0.9   -36.5±0.8   -21.6±0.5   -40.1±0.4   -32.2±0.4   -29.2±0.4



robustness we observe. We performed a series of experiments wherein six new VOneNet variants were created by removing or modifying a part of the VOneBlock (referred to as VOneBlock variants). After ImageNet training, model robustness was evaluated as before on our panel of white box attacks and image corruptions. Three variants targeted the GFB: one sampling the Gabor parameters from uniform distributions instead of those found in primate V1, another without high spatial frequency (SF) filters (f < 2 cpd), and another without low SF filters (f > 2 cpd). Two additional variants targeted the nonlinearities: one without simple cells and another without complex cells. Finally, the sixth variant had the V1 neuronal stochasticity generator removed. Even though all VOneBlock variants are poorer approximations of primate V1, all resulting VOneResNet50 variants still had improved perturbation accuracy when compared to the base ResNet50 model (Fig. 3A). On the other hand, all variants except that with uniformly sampled Gabor parameters showed significant deficits in robustness compared to the unmodified VOneResNet50 (Fig. 3A; drops in perturbation accuracy between 1% and 15%).

Interestingly, we observed that some of these changes had a highly specific effect on the type of perturbation robustness affected (Table 2 and Supplementary Section D). Removing high SF Gabors negatively affected both white box and clean accuracy, while actually improving robustness to common image corruptions, particularly those of the noise and blur categories (Supplementary Table D1). Removing complex cells only impaired white box robustness, as opposed to removing simple cells, which was particularly detrimental to performance on image corruptions. Finally, removing V1 stochasticity considerably decreased white box accuracy, while improving accuracy for both clean and corrupted images.

The VOneBlock variant without V1 stochasticity suffered the most dramatic loss in robustness. This is not altogether surprising, as several approaches to adversarial robustness have focused on noise as a defense [15, 19, 66]. To investigate whether V1 stochasticity alone accounted for the majority of the robustness gains, we evaluated the perturbation accuracy of a ResNet50 model with V1 stochasticity added at the output of its first block. Neuronal stochasticity was implemented exactly the same way as in the VOneBlock, by first applying an affine transformation to scale the activations so that they match primate V1 neuronal activity. Like the VOneResNet50, this model had stochasticity during training and inference, and showed a considerable improvement in robustness compared

Figure 3: All components of the VOneBlock work in synergy to improve robustness to image perturbations. A: Perturbation mean accuracy (top-1) for VOneResNet50 and several variants with a component of the VOneBlock removed or altered. From left to right: unmodified VOneResNet50; model with Gabor parameters sampled uniformly (within biological ranges); model without high SF Gabors; model without low SF Gabors; model without complex cells; model without simple cells; model without V1 stochasticity. Gray bars show the performance of ResNet50. Blue bars show the improvements due to the VOneBlock. Dashed line indicates the accuracy of the unmodified model. Error-bars represent SD (n=3 seeds). B: Same as in A, but comparing ResNet50, VOneResNet50 without V1 stochasticity, ResNet50 with V1 stochasticity added after the first block, and VOneResNet50. Adding V1 stochasticity to ResNet50 accounts for less than half of the total improvement of VOneResNet50, demonstrating a supralinear interaction between the V1 features and V1 stochasticity.


Table 2: Removal of some VOneBlock components causes impairments with high specificity. Difference in accuracy (top-1) relative to the unmodified VOneResNet50 for each of the variants with removed components, on overall perturbation mean, white box attacks, common corruptions, and clean images. Values are reported as mean and SD (n=3 seeds).

Component       Overall Mean    White box    Corruption    Clean
removed         [Δ%]            [Δ%]         [Δ%]          [Δ%]
V1 dist.        -0.4±0.3        0.4±0.8      -1.1±0.3      -0.5±0.3
High SF         -2.2±0.2        -3.8±0.3     1.9±0.4       -4.7±0.2
Low SF          -0.7±0.5        -1.6±1.1     -0.7±0.1      0.1±0.3
Complex         -1.0±0.6        -4.5±1.1     0.6±0.4       0.8±0.3
Simple          -3.0±0.5        -2.0±1.1     -4.8±0.5      -2.1±0.3
V1 stoch.       -8.8±0.3        -30.6±0.4    1.5±0.5       2.6±0.1

to the standard ResNet50. However, this improvement accounted for only a fraction of the total gains of the VOneResNet50 model (Fig. 3B), demonstrating that there is a substantial supralinear interaction between the V1 features and the neuronal stochasticity. Merely adding V1 stochasticity to the first block of a standard CNN model does not increase robustness to the same degree as the full VOneBlock: the presence of V1 features more than doubles the contribution to perturbation robustness brought by the addition of neuronal stochasticity.

Finally, we sought to determine whether stochasticity during inference is key to defending against attacks. Thus, we analyzed the white box adversarial performance of VOneResNet50 while quenching stochasticity during the adversarial attack (Fig. 4). Remarkably, the majority of the improvement in adversarial robustness originates from the neuronal stochasticity during training, indicating that V1 stochasticity induces the downstream layers to learn representations that are more robust to adversarial attacks. This is particularly interesting when contrasted with the ANT3×3+SIN defense, which has noise added to input images during training but does not learn representations with notably higher robustness to white box adversarial attacks (Table 1 and Supplementary Fig. B1).

Figure 4: V1 stochasticity during training leads to more robust learned representations. White box accuracy (top-1) for the VOneResNet50 variant without V1 stochasticity, VOneResNet50 with stochasticity removed during the white box attack and inference, and VOneResNet50. VOneResNet50 without stochasticity during attack/inference maintained over half of the robustness gains. Therefore, V1 stochasticity improves model robustness in two ways: rendering the attack itself less effective, and inducing more robust representations downstream of the VOneBlock, the latter accounting for the majority of the gains on our benchmark. Gray bars show the performance of the standard ResNet50. Blue bars show the improvements due to the VOneBlock. Error-bars represent SD (n=3 seeds).

5 Discussion

In this work, we demonstrate that adversarial robustness in CNNs is correlated with their ability to explain primate V1 neuronal responses, and that simulating the image processing of primate V1 at the front of standard neural network architectures significantly improves their robustness to image perturbations. Notably, this approach outperformed state-of-the-art defense methods on a large benchmark consisting of adversarial attacks and common image corruptions. Despite not being constructed to this end, any component of the model front-end we removed or modified to be less like primate V1 resulted in less overall robustness, and revealed that different components improve robustness to different perturbation types. Remarkably, we find that simulated V1 stochasticity interacts synergistically with the V1 model features and drives the downstream layers to learn representations more robust to adversarial perturbations.

Our approach bears some similarity to the pioneering study by Li et al. [40], in which a model's representations were regularized to approximate mouse V1, increasing its robustness on the grey-scale CIFAR dataset; however, here we go further in several key ways. First, while the mouse is gaining traction as a model for studying vision, a vast literature has established macaque vision as a quantitatively accurate model of human vision in general, and human object recognition in particular [85, 86]. Visual acuity in mice is much lower than in macaques and humans [87, 79], suggesting that vision in the mouse may serve different behavioral functions than in primates. Further, the regularization approach employed by Li et al. does not allow a clear disambiguation of which aspects of mouse V1 contribute to the improved robustness. Since the components of the VOneBlock proposed here are mappable to the brain, we can dissect the contributions of different neuronal populations in primate V1 to robustness against specific image perturbations. Finally, extending the robustness gains of biologically-constrained CNNs from gray-scale CIFAR to the full ImageNet dataset is a critical step towards real-world human-level applications.

The gains achieved by VOneNets are substantial, particularly against white box attacks, and have tangible benefits over other defense methods. Though adversarial training still provides the strongest defense for the attack statistics it is trained on, it has significant downsides. Beyond its considerable additional computational cost during training, adversarially trained networks have significantly lower performance on clean images, corrupted images, and images perturbed with attack statistics not seen during training, implying that adversarial training in its current form may not be viable as a general defense method. In contrast, by deploying an architectural change, VOneNets improve robustness to all adversarial attacks tested and many common image corruptions, and they accomplish this with no additional training overhead. This also suggests that the architectural gains of the VOneNet could be stacked with other training-based defense methods to achieve even greater overall robustness gains.

Relative to current methods, the success of this approach derives from engineering in a better approximation of the architecture of the most well-studied primate visual area, combined with task optimization of the remaining free parameters of the downstream architecture [26]. This points to two potentially synergistic avenues for further gains: an even more neurobiologically precise model of V1 (i.e., a better VOneBlock), and an even more neurobiologically precise model of the downstream architecture. For example, one could extend the biological fidelity of the VOneBlock in the hope that it confers even greater robustness, by including properties such as divisive normalization [48, 49] and contextual modulation [88], to name a few. In addition to V1, the retina and the Lateral Geniculate Nucleus (LGN) also play important roles in pre-processing visual information, only partially captured by the current V1 model, suggesting that extending the work done here to a retina/LGN front-end has the potential to better align CNNs with human visual object recognition [89, 90]. In addition, though our initial experiments show that multiple components of our V1 model work together to produce greater robustness, the nature of the relationship between adversarial robustness and explained variance in V1 responses is far from resolved. In particular, as the VOneNet with and without stochasticity achieve very similar V1 explained variances but have markedly different levels of adversarial robustness, it is clear that much theoretical work remains in better understanding both when and why matching biology leads to more robust computer vision models.

While neuroscience has recently seen a huge influx of new neurally-mechanistic models and tools drawn from machine learning [91, 92], the most recent advances in machine learning and computer vision have been driven mostly by the widespread availability of computational resources and very large labeled datasets [93], rather than by an understanding of the relevant brain mechanisms. Under the belief that biological intelligence still has a lot of untapped potential to contribute, a number of researchers have been pushing for more neuroscience-inspired machine learning algorithms [37, 42, 32]. The work presented here shows that this aspiration can become reality: the models presented here, drawn directly from primate neurobiology, indeed require less training to achieve more human-like behavior. This is one turn of a new virtuous circle, wherein neuroscience and artificial intelligence each feed into and reinforce the understanding and ability of the other.


Broader Impact

From a technological perspective, the ethical implications of our work are largely aligned with those of computer vision in general. While there is undoubtedly potential for malicious and abusive uses of computer vision, particularly in the form of discrimination or invasion of privacy, we believe that our work will aid in the production of more robust and intuitive behavior of computer vision algorithms. As CNNs are deployed in real-world situations, it is critical that they behave with the same level of stability as their human counterparts. In particular, they should, at the very least, not be confused by changes in input statistics that do not confuse humans. We believe that this work will help to bridge that gap. Furthermore, while algorithms are often thought to be impartial or unbiased, much research has shown that data-driven models like current CNNs are often even more biased than humans, implicitly keying in on and amplifying stereotypes. For this reason, making new CNNs that behave more like humans may actually reduce, or at least make more intuitive, their implicit biases. Unfortunately, we note that even with our work, these issues are not yet resolved. While we developed a more neurobiologically-constrained algorithm, it comes nowhere close to human-like behaviour in the wide range of circumstances experienced in the real world. Finally, from the perspective of neuroscience, we think that this work introduces a more accurate model of the primate visual system. Ultimately, better models contribute to a better mechanistic understanding of how the brain works and of how to intervene in the case of illness or disease states. We think that our model contributes a stronger foundation for understanding the brain and building novel medical technology, such as neural implants for restoring vision in people with impairments.

Acknowledgments and Disclosure of Funding

We thank J. Anthony Movshon and Corey M. Ziemba for access to the V1 neural dataset, MIT-IBM Satori for providing the compute necessary for this project, John Cohn for technical support, Sijia Liu and Wieland Brendel for advice on adversarial attacks, and Adam Marblestone for insightful discussion. This work was supported by the PhRMA Foundation Postdoctoral Fellowship in Informatics (T.M.), the Semiconductor Research Corporation (SRC) and DARPA (J.D., M.S., J.J.D.), the Massachusetts Institute of Technology Shoemaker Fellowship (M.S.), Office of Naval Research grant MURI-114407 (J.J.D.), the Simons Foundation grant SCGB-542965 (J.J.D.), and the MIT-IBM Watson AI Lab grant W1771646 (J.J.D.).

References

[1] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet Classification with Deep Convolutional Neural Networks". In: Advances in Neural Information Processing Systems 25. Ed. by F. Pereira et al. Curran Associates, Inc., 2012, pp. 1097–1105.
[2] Christian Szegedy et al. "Going Deeper with Convolutions". In: (Sept. 2014). arXiv: 1409.4842 [cs.CV].
[3] Karen Simonyan and Andrew Zisserman. "Very Deep Convolutional Networks for Large-Scale Image Recognition". In: (Sept. 2014). arXiv: 1409.1556 [cs.CV].
[4] Kaiming He et al. "Deep Residual Learning for Image Recognition". In: (Dec. 2015). arXiv: 1512.03385 [cs.CV].
[5] Kaiming He et al. "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification". In: Proceedings of the IEEE International Conference on Computer Vision (2015), pp. 1026–1034. DOI: 10.1109/ICCV.2015.123. arXiv: 1502.01852.
[6] Christian Szegedy et al. "Intriguing properties of neural networks". In: (Dec. 2013). arXiv: 1312.6199 [cs.CV].
[7] Nicholas Carlini and David Wagner. "Towards Evaluating the Robustness of Neural Networks". In: (Aug. 2016). arXiv: 1608.04644 [cs.CR].
[8] Pin-Yu Chen et al. "EAD: Elastic-Net Attacks to Deep Neural Networks via Adversarial Examples". In: (Sept. 2017). arXiv: 1709.04114 [stat.ML].
[9] Jérôme Rony et al. "Decoupling Direction and Norm for Efficient Gradient-Based L2 Adversarial Attacks and Defenses". In: (Nov. 2018). arXiv: 1811.09600 [cs.CV].


[10] Wieland Brendel et al. "Accurate, reliable and fast robustness evaluation". In: (July 2019). arXiv: 1907.01003 [stat.ML].
[11] Samuel Dodge and Lina Karam. "A Study and Comparison of Human and Deep Learning Recognition Performance Under Visual Distortions". In: (May 2017). arXiv: 1705.02498 [cs.CV].
[12] Robert Geirhos et al. "Generalisation in humans and deep neural networks". In: (Aug. 2018). arXiv: 1808.08750 [cs.CV].
[13] Dan Hendrycks and Thomas Dietterich. "Benchmarking Neural Network Robustness to Common Corruptions and Perturbations". In: (Mar. 2019). arXiv: 1903.12261 [cs.LG].
[14] Nilaksh Das et al. "Keeping the Bad Guys Out: Protecting and Vaccinating Deep Learning with JPEG Compression". In: (May 2017). arXiv: 1705.02900 [cs.CV].
[15] Xuanqing Liu et al. "Towards Robust Neural Networks via Random Self-ensemble". In: (Dec. 2017). arXiv: 1712.00673 [cs.LG].
[16] Weilin Xu, David Evans, and Yanjun Qi. "Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks". In: arXiv [cs.CV] (Apr. 2017).
[17] Aleksander Madry et al. "Towards Deep Learning Models Resistant to Adversarial Attacks". In: (June 2017). arXiv: 1706.06083 [stat.ML].
[18] Yang Song et al. "PixelDefend: Leveraging Generative Models to Understand and Defend against Adversarial Examples". In: (Oct. 2017). arXiv: 1710.10766 [cs.LG].
[19] Guneet S. Dhillon et al. "Stochastic Activation Pruning for Robust Adversarial Defense". In: (Mar. 2018). arXiv: 1803.01442 [cs.LG].
[20] Jacob Buckman et al. "Thermometer Encoding: One Hot Way To Resist Adversarial Examples". In: (Feb. 2018). URL: https://openreview.net/forum?id=S18Su--CW.
[21] Chuan Guo et al. "Countering Adversarial Images using Input Transformations". In: (Feb. 2018). URL: https://openreview.net/forum?id=SyJ7ClWCb.
[22] Claudio Michaelis et al. "Benchmarking Robustness in Object Detection: Autonomous Driving when Winter is Coming". In: (2019), pp. 1–23. arXiv: 1907.07484. URL: http://arxiv.org/abs/1907.07484.
[23] Andrew Ilyas et al. "Adversarial Examples Are Not Bugs, They Are Features". In: (May 2019). arXiv: 1905.02175 [stat.ML].
[24] D. Linsley et al. "What are the Visual Features Underlying Human Versus Machine Vision?" In: ICCV. 2017. DOI: 10.1109/ICCVW.2017.331. arXiv: 1701.02704.
[25] Robert Geirhos et al. "ImageNet-Trained CNNs Are Biased Towards Texture". In: International Conference on Learning Representations (2019), pp. 1–20. arXiv: 1811.12231.
[26] Daniel L. K. Yamins et al. "Performance-optimized hierarchical models predict neural responses in higher visual cortex". In: Proceedings of the National Academy of Sciences 111.23 (2014), pp. 8619–8624. DOI: 10.1073/pnas.1403112111.
[27] Charles F. Cadieu et al. "Deep neural networks rival the representation of primate IT cortex for core visual object recognition". In: PLoS Computational Biology 10.12 (2014), e1003963.
[28] Seyed-Mahdi Khaligh-Razavi and Nikolaus Kriegeskorte. "Deep supervised, but not unsupervised, models may explain IT cortical representation". In: PLoS Comput Biol 10.11 (Nov. 2014), e1003915.
[29] Umut Güçlü and Marcel A. J. van Gerven. "Deep Neural Networks Reveal a Gradient in the Complexity of Neural Representations across the Brain's Ventral Visual Pathway". In: The Journal of Neuroscience 35.27 (2015), pp. 10005–10014. DOI: 10.1523/JNEUROSCI.5023-14.2015.
[30] Santiago A. Cadena et al. "Deep convolutional models improve predictions of macaque V1 responses to natural images". In: PLoS Comput Biol 15.4 (Apr. 2019), e1006897.
[31] Martin Schrimpf et al. "Brain-Score: Which Artificial Neural Network for Object Recognition is most Brain-Like?" Sept. 2018. URL: https://doi.org/10.1101/407007.


[32] Jonas Kubilius et al. "Brain-Like Object Recognition with High-Performing Shallow Recurrent ANNs". In: Advances in Neural Information Processing Systems 32. Ed. by H. Wallach et al. Curran Associates, Inc., 2019, pp. 12805–12816. URL: http://papers.nips.cc/paper/9441-brain-like-object-recognition-with-high-performing-shallow-recurrent-anns.pdf.
[33] Adam H. Marblestone, Greg Wayne, and Konrad P. Kording. "Toward an Integration of Deep Learning and Neuroscience". In: Front. Comput. Neurosci. 10 (Sept. 2016), p. 94.
[34] William Lotter, Gabriel Kreiman, and David Cox. "Deep Predictive Coding Networks for Video Prediction and Unsupervised Learning". In: (May 2016). arXiv: 1605.08104 [cs.LG].
[35] Aran Nayebi and Surya Ganguli. "Biologically inspired protection of deep networks from adversarial attacks". In: (Mar. 2017). arXiv: 1703.09202 [stat.ML].
[36] Jordan Guerguiev, Timothy P. Lillicrap, and Blake A. Richards. "Towards deep learning with segregated dendrites". In: Elife 6 (Dec. 2017).
[37] Demis Hassabis et al. "Neuroscience-Inspired Artificial Intelligence". In: Neuron 95.2 (2017), pp. 245–258. DOI: 10.1016/j.neuron.2017.06.011.
[38] Grace W. Lindsay and Kenneth D. Miller. "How biological attention mechanisms improve task performance in a large-scale visual system model". In: Elife 7 (Oct. 2018).
[39] Hanlin Tang et al. "Recurrent computations for visual pattern completion". In: Proceedings of the National Academy of Sciences 115.35 (2018), pp. 8835–8840. DOI: 10.1073/pnas.1719397115.
[40] Zhe Li et al. "Learning From Brains How to Regularize Machines". In: (Nov. 2019). arXiv: 1911.05072 [cs.LG].
[41] Hosein Hasani, Mahdieh Soleymani, and Hamid Aghajan. "Surround Modulation: A Bio-inspired Connectivity Structure for Convolutional Neural Networks". In: NeurIPS (2019), pp. 15877–15888. URL: http://papers.nips.cc/paper/9719-surround-modulation-a-bio-inspired-connectivity-structure-for-convolutional-neural-networks.
[42] Fabian H. Sinz et al. "Engineering a Less Artificial Intelligence". In: Neuron 103.6 (Sept. 2019), pp. 967–979.
[43] Anthony M. Zador. "A critique of pure learning and what artificial neural networks can learn from animal brains". In: Nat. Commun. 10.1 (Aug. 2019), p. 3770.
[44] J. Deng et al. "ImageNet: A large-scale hierarchical image database". In: 2009 IEEE Conference on Computer Vision and Pattern Recognition. June 2009, pp. 248–255.
[45] D. H. Hubel and T. N. Wiesel. "Receptive Fields, Binocular Interaction and Functional Architecture in the Cat's Visual Cortex". In: Journal of Physiology (1962), pp. 106–154.
[46] J. P. Jones and L. A. Palmer. "An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex". In: Journal of Neurophysiology 58.6 (1987), pp. 1233–1258. DOI: 10.1152/jn.1987.58.6.1233.
[47] E. H. Adelson and J. R. Bergen. "Spatiotemporal energy models for the perception of motion". In: Journal of the Optical Society of America A 2.2 (1985), pp. 284–299. DOI: 10.1364/JOSAA.2.000284.
[48] David J. Heeger, E. P. Simoncelli, and J. Anthony Movshon. "Computational models of cortical visual processing". In: Proceedings of the National Academy of Sciences 93.2 (1996), pp. 623–627. DOI: 10.1073/pnas.93.2.623.
[49] M. Carandini, D. J. Heeger, and J. A. Movshon. "Linearity and normalization in simple cells of the macaque primary visual cortex". In: The Journal of Neuroscience 17.21 (Nov. 1997), pp. 8621–8644.
[50] Maximilian Riesenhuber and Tomaso Poggio. "Hierarchical models of object recognition in cortex". In: Nature Neuroscience 2.11 (1999), pp. 1019–1025. DOI: 10.1038/14819.


[51] Thomas Serre, Aude Oliva, and Tomaso Poggio. "A feedforward architecture accounts for rapid categorization". In: Proc. Natl. Acad. Sci. U S A 104.15 (Apr. 2007), pp. 6424–6429.
[52] Nicole C. Rust et al. "Spatiotemporal Elements of Macaque V1 Receptive Fields". In: Neuron 46 (2005), pp. 945–956. DOI: 10.1016/j.neuron.2005.05.021.
[53] Brett Vintch, J. A. Movshon, and E. P. Simoncelli. "A Convolutional Subunit Model for Neuronal Responses in Macaque V1". In: Journal of Neuroscience 35.44 (2015), pp. 14829–14841. DOI: 10.1523/JNEUROSCI.2815-13.2015.
[54] Nicolas Papernot et al. "Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks". In: (Nov. 2015). arXiv: 1511.04508 [cs.CR].
[55] Mayank Singh et al. "Harnessing the Vulnerability of Latent Layers in Adversarially Trained Models". In: (May 2019). arXiv: 1905.05186 [cs.LG].
[56] Cihang Xie et al. "Feature denoising for improving adversarial robustness". In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019, pp. 501–509.
[57] Eric Wong, Leslie Rice, and J. Zico Kolter. "Fast is better than free: Revisiting adversarial training". In: (Jan. 2020). URL: https://openreview.net/forum?id=BJx040EFvH.
[58] Dan Hendrycks et al. "AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty". In: (Dec. 2019). arXiv: 1912.02781 [stat.ML].
[59] Evgenia Rusak et al. "Increasing the robustness of DNNs against image corruptions by playing the Game of Noise". In: (Jan. 2020). arXiv: 2001.06057 [cs.CV].
[60] Ali Shafahi et al. "Adversarial training for free!" In: Advances in Neural Information Processing Systems 32. Ed. by H. Wallach et al. Curran Associates, Inc., 2019, pp. 3358–3369.
[61] Eric Wong, Leslie Rice, and J. Zico Kolter. "Fast is better than free: Revisiting adversarial training". In: (Sept. 2020). URL: https://openreview.net/forum?id=BJx040EFvH.
[62] Dimitris Tsipras et al. "Robustness May Be at Odds with Accuracy". Sept. 2019. URL: https://openreview.net/forum?id=SyxAb30cY7.
[63] Yash Sharma and Pin-Yu Chen. "Attacking the Madry Defense Model with L1-based Adversarial Examples". In: (Oct. 2017). arXiv: 1710.10733 [stat.ML].
[64] Lukas Schott et al. "Towards the first adversarially robust neural network model on MNIST". In: (May 2018). arXiv: 1805.09190 [cs.CV].
[65] Cihang Xie et al. "Mitigating Adversarial Effects Through Randomization". Feb. 2018. URL: https://openreview.net/forum?id=Sk9yuql0Z.
[66] Adnan Siraj Rakin, Zhezhi He, and Deliang Fan. "Parametric Noise Injection: Trainable Randomness to Improve Deep Neural Network Robustness against Adversarial Attack". In: (Nov. 2018). arXiv: 1811.09310 [cs.LG].
[67] Joonho Lee et al. "ProbAct: A Probabilistic Activation Function for Deep Neural Networks". In: (May 2019). arXiv: 1905.10761 [cs.LG].
[68] Anish Athalye, Nicholas Carlini, and David Wagner. "Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples". In: (Feb. 2018). arXiv: 1802.00420 [cs.LG].
[69] Alex Krizhevsky. "Learning Multiple Layers of Features from Tiny Images". In: (Apr. 2009).
[70] Adam Paszke et al. "Automatic differentiation in PyTorch". In: (Oct. 2017). URL: https://openreview.net/forum?id=BJJsrmfCZ.
[71] Saining Xie et al. "Aggregated Residual Transformations for Deep Neural Networks". In: (Nov. 2016). arXiv: 1611.05431 [cs.CV].
[72] Gao Huang et al. "Densely Connected Convolutional Networks". In: (Aug. 2016). arXiv: 1608.06993 [cs.CV].
[73] Forrest N. Iandola et al. "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size". In: (Feb. 2016). arXiv: 1602.07360 [cs.CV].
[74] Ningning Ma et al. "ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design". In: (July 2018). arXiv: 1807.11164 [cs.CV].
[75] Mingxing Tan et al. "MnasNet: Platform-Aware Neural Architecture Search for Mobile". In: (July 2018). arXiv: 1807.11626 [cs.CV].


[76] Logan Engstrom et al. Robustness (Python Library). 2019. URL: https://github.com/MadryLab/robustness.
[77] Jeremy Freeman et al. "A functional and perceptual signature of the second visual area in primates". In: Nat. Neurosci. 16.7 (July 2013), pp. 974–981.
[78] Russell L. De Valois, E. W. Yund, and Norva Hepler. "The orientation and direction selectivity of cells in macaque visual cortex". In: Vision Research 22 (1982), pp. 531–544.
[79] Russell L. De Valois, Duane G. Albrecht, and Lisa G. Thorell. "Spatial Frequency Selectivity of Cells in Macaque Visual Cortex". In: Vision Research 22 (1982), pp. 545–559.
[80] Dario L. Ringach. "Spatial Structure and Symmetry of Simple-Cell Receptive Fields in Macaque Primary Visual Cortex". In: Journal of Neurophysiology 88 (2002), pp. 455–463.
[81] Yasmine El-Shamayleh et al. "Visual response properties of V1 neurons projecting to V2 in macaque". In: Journal of Neuroscience 33.42 (2013), pp. 16594–16605. DOI: 10.1523/JNEUROSCI.2753-13.2013.
[82] W. R. Softky and C. Koch. "The highly irregular firing of cortical cells is inconsistent with temporal integration of random EPSPs". In: Journal of Neuroscience 13.1 (1993), pp. 334–350. DOI: 10.1523/jneurosci.13-01-00334.1993.
[83] Diederik P. Kingma and Max Welling. "Auto-Encoding Variational Bayes". In: (Dec. 2013). arXiv: 1312.6114v10 [stat.ML].
[84] Nicholas Carlini et al. "On Evaluating Adversarial Robustness". In: (Feb. 2019). arXiv: 1902.06705v2 [cs.LG].
[85] Nikolaus Kriegeskorte, Marieke Mur, and Peter Bandettini. "Representational similarity analysis - connecting the branches of systems neuroscience". In: Front. Syst. Neurosci. 2 (Nov. 2008), p. 4.
[86] R. Rajalingham, K. Schmidt, and James J. DiCarlo. "Comparison of Object Recognition Behavior in Human and Monkey". In: Journal of Neuroscience 35.35 (2015), pp. 12127–12136. DOI: 10.1523/JNEUROSCI.0573-15.2015.
[87] Cristopher M. Niell and Michael P. Stryker. "Highly selective receptive fields in mouse visual cortex". In: The Journal of Neuroscience 28.30 (July 2008), pp. 7520–7536. DOI: 10.1523/JNEUROSCI.0623-08.2008.
[88] Pieter R. Roelfsema. "Cortical algorithms for perceptual grouping". In: Annual Review of Neuroscience 29 (Jan. 2006), pp. 203–227. DOI: 10.1146/annurev.neuro.29.051605.112939.
[89] Alexander Berardino et al. "Eigen-Distortions of Hierarchical Representations". In: (Oct. 2017). arXiv: 1710.02266 [cs.CV].
[90] Jack Lindsey et al. "A Unified Theory of Early Visual Representations from Retina to Cortex through Anatomically Constrained Deep CNNs". In: ICLR (2019), pp. 1–17. DOI: 10.1101/511535. arXiv: 1901.00945.
[91] Daniel L. K. Yamins and James J. DiCarlo. "Using goal-driven deep learning models to understand sensory cortex". In: Nat. Neurosci. 19.3 (Mar. 2016), pp. 356–365.
[92] Alexander Mathis et al. "DeepLabCut: markerless pose estimation of user-defined body parts with deep learning". In: Nature Neuroscience (2018). DOI: 10.1038/s41593-018-0209-y. arXiv: 1804.03142.
[93] Yann LeCun, Yoshua Bengio, and Geoffrey E. Hinton. "Deep learning". In: Nature 521.7553 (2015), pp. 436–444. DOI: 10.1038/nature14539.
[94] T. Marques, M. Schrimpf, and J. J. DiCarlo. "Hierarchical neural network models that more closely match primary visual cortex tend to better explain higher level visual cortical responses". In: Cosyne. 2020.
[95] Alexey Kurakin, Ian Goodfellow, and Samy Bengio. "Adversarial examples in the physical world". In: (July 2016). arXiv: 1607.02533 [cs.CV].
[96] Maria-Irina Nicolae et al. "Adversarial Robustness Toolbox v1.2.0". In: CoRR abs/1807.01069 (2018). URL: https://arxiv.org/pdf/1807.01069.


[97] Florian Tramer et al. "On Adaptive Attacks to Adversarial Example Defenses". In: (Feb. 2020). arXiv: 2002.08347 [cs.LG].
[98] Jonas Rauber et al. "Foolbox Native: Fast adversarial attacks to benchmark the robustness of machine learning models in PyTorch, TensorFlow, and JAX". In: Journal of Open Source Software 5.53 (2020), p. 2607. DOI: 10.21105/joss.02607.
[99] Matthew D. Zeiler and Rob Fergus. "Visualizing and Understanding Convolutional Networks". In: arXiv (Nov. 2013). arXiv: 1311.2901. URL: http://arxiv.org/abs/1311.2901.
[100] Chris Olah et al. "An Overview of Early Vision in InceptionV1". In: Distill (2020). https://distill.pub/2020/circuits/early-vision. DOI: 10.23915/distill.00024.002.
[101] Ian Nauhaus et al. "Orthogonal micro-organization of orientation and spatial frequency in primate primary visual cortex". In: Nature Neuroscience 15.12 (2012), pp. 1683–1690. DOI: 10.1038/nn.3255.


Supplementary Material

A Neural Explained Variance

We evaluated how well responses to given images in candidate CNNs explain responses of single V1 neurons to the same images, using a standard neural predictivity methodology based on partial least squares (PLS) regression [31, 32]. We used a neural dataset obtained by extracellular recordings from 102 single-units while presenting 450 different images for 100 ms, 20 times each, spanning 4° of the visual space. Results from this dataset were originally published in a study comparing V1 and V2 responses to naturalistic textures [77]. To avoid over-fitting, for each CNN we first chose its best V1 layer on a subset of the dataset (n=135 images) and reported the final neural explained variance calculated for the chosen layer on the remaining images of the dataset (n=315 images). Model units were mapped to each V1 neuron linearly using a PLS regression model with 25 components. The mapping was performed for each neuron using 90% of the image responses in the respective dataset and tested on the remaining held-out 10%, in a 10-fold cross-validation strategy. We reported neural explained variance values normalized by the neuronal internal consistency. We calculated V1 neural explained variance for a large pool of CNNs (n=30 models).
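To make this procedure concrete, the following is a minimal Python sketch of the cross-validated PLS mapping for a single neuron (the function and variable names are ours, for illustration only; the actual pipeline follows the Brain-Score implementation [31, 32]):

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import KFold

def neural_predictivity(model_features, neuron_responses, n_components=25):
    """Explained variance for one neuron via 10-fold cross-validated PLS.

    model_features: (n_images, n_units) activations of the candidate layer.
    neuron_responses: (n_images,) trial-averaged responses of one neuron.
    The returned value would then be normalized by the neuron's internal
    consistency, as described above.
    """
    r2_folds = []
    kfold = KFold(n_splits=10, shuffle=True, random_state=0)
    for train_idx, test_idx in kfold.split(model_features):
        pls = PLSRegression(n_components=n_components)
        pls.fit(model_features[train_idx], neuron_responses[train_idx])
        pred = pls.predict(model_features[test_idx]).ravel()
        r = np.corrcoef(pred, neuron_responses[test_idx])[0, 1]
        r2_folds.append(r ** 2)
    return float(np.mean(r2_folds))
```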

When using CNNs in a computer vision task such as object recognition in ImageNet, their field-of-view (in degrees) is not defined, since the only relevant input property is the image resolution (in pixels). However, when using a CNN to model the ventral stream, the visual spatial extent of the model's input is of key importance to ensure that it is correctly mapped to the data it is trying to explain. We assigned a spatial extent of 8° to all models, since this value has been previously used in studies benchmarking CNNs as models of the primate ventral stream [31, 94] and is consistent with the results in Cadena et al. [30].


B Image Perturbations

B.1 White Box Adversarial Attacks

For performing white box adversarial attacks, we used untargeted projected gradient descent (PGD) [17] (also referred to as the Basic Iterative Method [95]) with L∞, L2, and L1 norm constraints. Given an image x, this method uses the gradient of the loss to iteratively construct an adversarial image x_adv which maximizes the model loss within an Lp bound around x. Formally, in the case of an L∞ constraint, PGD iteratively computes x_adv as

x_{t+1} = \mathrm{Proj}_{x+\mathcal{S}}\left(x_t + \alpha\,\mathrm{sgn}\left(\nabla_{x_t} L(\theta, x_t, y)\right)\right)

where x is the original image and the Proj operator ensures the final computed adversarial image x_adv is constrained to the space x + S, here the L∞ ball around x. In the case of L1 or L2 constraints, at each iteration ∇_{x_t} L(θ, x_t, y) is scaled to have an L1 or L2 norm of α, and the Proj operator ensures the final x_adv is within an L1 or L2 norm ball around x.
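For illustration, the following is a minimal PyTorch sketch of this L∞ PGD update (our own simplified implementation, not the Adversarial Robustness Toolbox code actually used for the benchmarks; images are assumed to lie in [0, 1]):

```python
import torch
import torch.nn.functional as F

def pgd_linf(model, x, y, eps=1/255, n_iter=64):
    """Untargeted PGD under an L-inf constraint (illustrative sketch).

    Step size alpha = eps/32 follows the setting described below.
    """
    alpha = eps / 32
    x_adv = x.clone().detach()
    for _ in range(n_iter):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()       # ascend the loss
            x_adv = x + (x_adv - x).clamp(-eps, eps)  # project onto the L-inf ball around x
            x_adv = x_adv.clamp(0, 1)                 # keep a valid image
    return x_adv.detach()
```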

For our white box benchmarks, we used δ∞ ∈ {1/1020, 1/255}, δ2 ∈ {0.15, 0.6}, and δ1 ∈ {40, 160} constraints, where δ = x − x_adv, for a total of six different attack variants with two strengths under each norm constraint. We arrived at these strengths by calibrating the lower strength attack for each norm to give a reasonable degree of variance in normally trained CNNs. Because normally trained CNNs are extremely sensitive to adversarial perturbations, this resulted in perturbations well below the level of perceptibility and, in most cases, particularly with the L∞ constraint, perturbations that would not even be rendered differently by a monitor for display. We set the high attack strength at 4 times the low value, which brought standard ImageNet-trained CNNs to nearly chance performance. Still, even this higher perturbation strength remained practically imperceptible (Fig. B3, left). Each perturbation was computed with 64 PGD iterations and a step size α = ε/32, where ε = ‖δ‖p, for the same 5000 images from the ImageNet validation set on a per-model basis, and final top-1 accuracy was reported. The Adversarial Robustness Toolbox [96] was used for computing the attacks.

While adversarial formulations using Lagrangian relaxation, like the C&W attack [7] and the EAD attack [8], are known to be more effective than PGD for finding minimal perturbations under L2 and L1 constraints, these formulations involve computationally expensive line searches, often requiring thousands of iterations to converge, making them computationally difficult to scale to 5000 ImageNet samples for a large number of models. Furthermore, because some of the models we tested operate with stochasticity during inference, we found that other state-of-the-art attacks formulated to efficiently find minimal perturbation distances [9, 10] were generally difficult to tune, as their success becomes stochastic as they near the decision boundary. Thus, we proceeded with the PGD formulation, as we found it to be the most reliable, computationally tractable, and conceptually simple to follow. We share our model and weights, and encourage anyone interested to attack our model to verify our results.

Although the stochastic models were implemented with the reparameterization trick for random sampling, which leaves the gradients intact [83], we performed several sanity checks to make sure that the white box attacks were effectively implemented. Following the advice of Athalye et al. [68] and Carlini et al. [84], we observed that in all of the white box attacks, increasing the attack strength decreased model performance, eventually bringing accuracy to zero (Fig. B1), and additionally that, for any given constraint and attack strength, increasing PGD iterations from one to many increased attack effectiveness (Fig. B2). Furthermore, to be extra cautious, we followed the method Athalye et al. [68] used to break Stochastic Activation Pruning [19] and replaced ∇_x f(x) with

\sum_{i=1}^{k} \nabla_x f(x)

effectively performing Monte Carlo (MC) sampling of the loss gradient at every PGD iteration. While with high perturbation strength even k = 1 was sufficient to bring our model's accuracy to zero, we find that k = 10 generally achieved higher attack success at intermediate and large perturbation strengths (Table B1), and thus we used this approach for all attacks on stochastic models.
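Reusing the imports from the sketch above, this Monte Carlo gradient estimate amounts to accumulating the gradient over k stochastic forward passes (again purely illustrative, with hypothetical function names):

```python
def pgd_gradient_mc(model, x_adv, y, k=10):
    """Monte Carlo estimate of the loss gradient for a stochastic model.

    Each pass samples fresh V1 noise; the sign of the summed gradient
    equals the sign of the averaged gradient, so either works in the
    L-inf PGD step of the sketch above.
    """
    grad = torch.zeros_like(x_adv)
    for _ in range(k):
        x_in = x_adv.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_in), y)  # stochastic forward pass
        grad = grad + torch.autograd.grad(loss, x_in)[0]
    return grad  # substitute for `grad` in the PGD update
```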


Figure B1: VOneResNet50 improves robustness to white box attacks over a wide range of perturbation strengths. White box accuracy vs. perturbation strength curves for PGD attacks with L∞, L2, and L1 constraints, for ResNet50, VOneResNet50, AT-L∞, and ANT3×3+SIN. Adding the VOneBlock to ResNet50 consistently improves robustness under all constraints and attack strengths: VOneResNet50 can withstand perturbations roughly 4 times higher than the standard ResNet50 for the same performance level. Remarkably, VOneResNet50 outperformed the adversarially trained model (AT-L∞) for a considerable range of perturbation strengths, particularly with the L1 constraint. Adversarial noise training combined with Stylized ImageNet (ANT3×3+SIN) yields virtually no improvement in robustness to white box attacks. In all white box attacks and for all models, increasing the attack strength decreased performance, eventually bringing accuracy to zero.

Figure B2: White box attack effectiveness converges for a large number of PGD iteration steps. White box accuracy vs. PGD iteration curves for attacks with δ∞ = 1/255, δ2 = 0.6, and δ1 = 160 constraints. Increasing the number of PGD iteration steps makes the attack more effective, generally converging after 32 iterations.

Table B1: Improving the effectiveness of white box attacks with Monte Carlo sampling. White box attack accuracy (top-1, %) under different norm constraints, and the overall white box mean, using standard PGD and PGD with Monte Carlo sampling of the loss gradients (PGD-MC, k = 10) on VOneResNet50, for 64 PGD iteration steps. For all stochastic models, we used PGD with Monte Carlo sampling in the white box attacks throughout the paper. Values are mean ± SD (n=3 seeds).

Attack type    | Mean     | PGD-L1 ε=40 | PGD-L1 ε=160 | PGD-L2 ε=0.15 | PGD-L2 ε=0.6 | PGD-L∞ ε=1/1020 | PGD-L∞ ε=1/255
Standard PGD   | 52.8±0.2 | 64.4±0.4    | 42.2±0.1     | 65.2±0.3      | 45.3±0.4     | 63.4±0.4        | 36.4±0.4
PGD-MC (k=10)  | 51.1±0.4 | 64.2±0.5    | 40.4±0.5     | 65.3±0.4      | 44.3±0.3     | 61.9±0.4        | 30.3±0.5


Although VOneResNet50 was not designed to explicitly improve robustness to white box attacks, it offers substantial gains over the standard ResNet50 model (Table 1). A closer inspection of the white box accuracy vs. perturbation strength curves (Fig. B1) reveals that VOneResNet50 can withstand perturbations roughly 4 times higher than the standard ResNet50 for the same performance level (the blue curve is shifted to the right of the black curve by a factor of 4). To help visualize how this difference translates into actual images, we show in Fig. B3 examples of white box adversarial images for both ResNet50 and VOneResNet50, for the perturbation strengths that bring the respective models to nearly chance performance.

Figure B3: Comparison of adversarial images for ResNet50 and VOneResNet50. Visualization of white box attacks under the 3 different norm constraints (L∞, L2, and L1) for the perturbation strength that brings the model to nearly chance performance (Fig. B1). Left: ResNet50; right: VOneResNet50. For each attack, the perturbation is shown on the left (scaled for visualization) and the perturbed image appears next to it. Perturbation strengths and classes of the adversarial images are shown over the images.


B.2 Additional attack controls

The field of adversarial robustness is littered with examples of proposed defense methods that were quickly defeated with some minor attack optimizations or adaptations [68, 84, 97]. For this reason, we took great care to verify our claims of improved robustness of the VOneNets. In addition to the implementation details and controls described in the previous section, we validated our results by further optimizing our attacks on both the standard stochastic VOneResNet50 and the VOneResNet50 with stochasticity removed during the attack and inference.

First, on the standard stochastic VOneResNet50, we ensured that our attack hyper-parameters, including step size (α), number of gradient samples (k), and PGD iterations, are sufficient for evaluating the network's robustness. For PGD with δ∞ = 1/255 and 64 iterations, we performed a large grid search over α and k, observing only a marginal increase in attack effectiveness (Fig. B4). For the most effective step size (α = ε/8), increasing the number of gradient samples beyond k = 16 no longer decreases accuracy under the attack. At k = 128, VOneResNet50 accuracy is only reduced from 29.15% to 26.0%, remaining a large margin above the ResNet50 accuracy of 0.8%. Additionally, increasing the number of PGD iterations all the way up to 1000 (with α = ε/8 and k = 16) only further reduced performance by 0.5%. Finally, using 5-10 random restarts did not improve the attack beyond the most effective k and α settings.

Figure B4: Further attack optimization does not overturn results. White box accuracy for PGD with δ∞ = 1/255 and 64 iterations on 500 images. A large grid search over step size and number of gradient samples only marginally improves attack effectiveness and only marginally reduces model performance. Lines represent the mean and shaded areas the SD (n=3 seeds).

Second, while boundary attacks are not ideal against the standard stochastic VOneResNet50, due to the shifting decision boundary, we investigated VOneResNet50 without stochasticity during attack/inference using the Brendel and Bethge (B&B) attack [10], a powerful white box adversary that aims to find minimal adversarial perturbations along the decision boundary of the network. We reasoned that if the B&B attack offers similar results to those observed using PGD, this is a good sanity check that our observed robustness is reliable and not a result of masked gradients around the input or a quirk of PGD. Using the Foolbox [98] implementation of the L∞ B&B attack, initialized with 500 successful adversarial images generated using PGD with δ∞ = 8/255, we see nearly identical scores as compared to standard PGD with optimal hyper-parameters at δ∞ = 1/1020 and δ∞ = 1/255.

We conclude that these additional controls do not qualitatively change our reported results and provide further evidence that the observed improved robustness of the VOneNets is real and not an artifact.


B.3 Common Image Corruptions

For evaluating model robustness to common image corruptions, we used ImageNet-C, a dataset publicly available at https://github.com/hendrycks/robustness [13]. It consists of 15 different types of corruptions, each at five levels of severity, for a total of 75 different perturbations (Fig. B5). The corruption types fall into four categories: noise, blur, weather, and digital effects. Noise includes Gaussian noise, shot noise, and impulse noise; blur includes motion blur, defocus blur, zoom blur, and glass blur; weather includes snow, frost, fog, and brightness; digital effects include JPEG compression, elastic transformations, contrast, and pixelation. For every corruption type and at every severity, we tested on all 50000 images from the ImageNet validation set. We note that these images have been JPEG-compressed by the original authors for sharing, which is known to have a minor but noticeable effect on final network performance.
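As an illustration, a per-corruption evaluation can be sketched as follows, assuming the standard ImageNet-C directory layout (<root>/<corruption>/<severity>/<class>/) and the normalization described in Section C.5 (the function name is ours):

```python
import torch
from torchvision import datasets, transforms

preprocess = transforms.Compose([
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5]),
])

def corruption_accuracy(model, root, corruption, severity, device="cuda"):
    """Top-1 accuracy on one ImageNet-C corruption/severity split."""
    folder = datasets.ImageFolder(f"{root}/{corruption}/{severity}", preprocess)
    loader = torch.utils.data.DataLoader(folder, batch_size=256, num_workers=8)
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for images, labels in loader:
            preds = model(images.to(device)).argmax(dim=1).cpu()
            correct += (preds == labels).sum().item()
            total += labels.numel()
    return correct / total
```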

Figure B5: Common image corruptions with intermediate severity. Visualization of all 15 common image corruption types, evaluated at severity = 3. First row: original image followed by the noise corruptions; second row: blur corruptions; third row: weather corruptions; fourth row: digital corruptions.


Figure B6: VOneResNet50 improves robustness to noise, blur, and digital corruptions. Corruption accuracy vs. severity curves for the four categories of common image corruptions: noise, blur, weather, and digital. Adding the VOneBlock to ResNet50 improves robustness to all corruption categories except weather. The adversarially trained model (AT-L∞) is considerably worse at common image corruptions. Adversarial noise combined with Stylized ImageNet training (ANT3×3+SIN) is consistently more robust to all corruption categories. For ResNet50 and VOneResNet50, error-bars represent SD (n=3 seeds).

Table B2: Detailed common image corruption results. Adding the VOneBlock to ResNet50 improves robustness to several of the corruption types. However, for others, particularly fog and contrast, robustness decreases; interestingly, these are the corruption types on which the adversarially trained model (AT-L∞) also performs worst. Adversarial noise combined with Stylized ImageNet training (ANT3×3+SIN) is the most robust model for most of the corruption types. For ResNet50 and VOneResNet50, values are mean ± SD (n=3 seeds). All values in %; Gaussian, Shot, and Impulse are noise corruptions, and Defocus, Glass, Motion, and Zoom are blur corruptions.

Model           | Gaussian | Shot     | Impulse  | Defocus  | Glass    | Motion   | Zoom
ResNet50 [4]    | 29.6±1.1 | 27.9±0.9 | 24.5±1.2 | 38.8±0.7 | 26.1±0.6 | 37.9±0.4 | 34.5±0.7
AT-L∞ [76]      | 23.6     | 22.3     | 14.8     | 24.4     | 34.3     | 33.2     | 34.9
ANT3×3+SIN [59] | 61.6     | 60.7     | 59.9     | 41.5     | 38.6     | 46.0     | 37.3
VOneResNet50    | 34.6±1.4 | 33.4±1.2 | 31.9±1.5 | 37.8±0.1 | 35.7±0.1 | 37.4±0.2 | 34.0±0.2

Snow, Frost, Fog, and Brightness are weather corruptions; Contrast, Elastic, Pixelate, and JPEG are digital corruptions.

Model           | Snow     | Frost    | Fog      | Bright   | Contrast | Elastic  | Pixelate | JPEG
ResNet50        | 30.1±0.5 | 36.7±0.4 | 43.6±0.8 | 66.8±0.2 | 38.8±0.5 | 44.8±0.3 | 47.0±2.7 | 55.1±1.0
AT-L∞           | 29.5     | 31.2     | 8.4      | 54.1     | 10.2     | 48.4     | 56.3     | 58.7
ANT3×3+SIN      | 47.7     | 52.2     | 54.9     | 68.2     | 49.3     | 52.7     | 58.9     | 59.3
VOneResNet50    | 25.2±0.5 | 36.8±0.4 | 30.1±0.5 | 62.4±0.3 | 28.5±0.5 | 48.7±0.1 | 63.3±0.2 | 61.0±0.3


B.4 Other defense methods

We compared the performance of VOneResNet50 on the described computer vision benchmarks with two training-based defense methods using the ResNet50 architecture: adversarial training with a δ∞ = 4/255 constraint (AT-L∞) [76], and adversarial noise with Stylized ImageNet training (ANT3×3+SIN) [59].

The AT-L∞ model was downloaded from the publicly available adversarially trained models at https://github.com/MadryLab/robustness [76]. This model was trained following the adversarial training paradigm of Madry et al. [17] to solve the min-max problem

\min_\theta \rho(\theta), \quad \text{where} \quad \rho(\theta) = \mathbb{E}_{(x,y)\sim\mathcal{D}} \left[ \max_{\delta\in\mathcal{S}} L(\theta, x+\delta, y) \right] \tag{1}

Here, the goal is to learn parameters θ minimizing the loss L for training images x and labels y drawn from D, while perturbing x with δ ∈ S to maximally increase the loss. In this case, D is ImageNet and PGD with δ∞ = 4/255 is used to find the perturbation maximizing the loss for a given (x, y) pair. The δ∞ = 4/255 constraint model was selected for comparison because it had the best performance on our conglomerate benchmark of adversarial attacks and common image corruptions.

The ANT3×3+SIN model was obtained from https://github.com/bethgelab/game-of-noise. Recently, Rusak et al. showed that training a ResNet50 model with several types of additive input noise improves robustness to common image corruptions [59]. Inspired by this observation, the authors trained a model while optimizing a noise distribution that maximally confuses it. Similarly to standard adversarial training, this results in solving a min-max problem

\min_\theta \max_\phi \; \mathbb{E}_{(x,y)\sim\mathcal{D}}\, \mathbb{E}_{\delta\sim p_\phi(\delta)} \left[ L(\theta, x+\delta, y) \right] \tag{2}

where p_φ(δ) is the maximally confusing noise distribution. The main difference from the regular adversarial training in Equation 1 is that, while in the former δ is optimized directly, here δ is found by optimizing a constrained distribution with local spatial correlations. Complementing the adversarial noise training with Stylized ImageNet training [25] produced the model with the current best robustness on the ImageNet-C benchmark as a standalone method.


C VOneNet implementation details

C.1 Convolutional layer

The convolutional layer of the VOneBlock is a mathematically parameterized Gabor filter bank (GFB). We set the stride of the GFB to four, producing a 56×56 spatial map of activations. Since the number of channels in most CNNs' first convolution is relatively small (64 in the architectures adapted), we used a larger number in the VOneBlock so that the Gabors would cover the large parameter space and better approximate primate V1. We set the main VOneNet models to contain 512 channels, equally split between simple and complex cells (see Fig. C2 A for an analysis of how the number of channels affects performance). Each channel in the GFB convolves a single color channel from the input image.

The Gabor function consists of a two-dimensional grating with a Gaussian envelope and is described by the following equation:

G_{\theta,f,\phi,n_x,n_y}(x, y) = \frac{1}{2\pi\sigma_x\sigma_y} \exp\left[-0.5\left(\frac{x_{rot}^2}{\sigma_x^2} + \frac{y_{rot}^2}{\sigma_y^2}\right)\right] \cos\left(2\pi f x_{rot} + \phi\right) \tag{3}

where

x_{rot} = x\cos(\theta) + y\sin(\theta), \qquad y_{rot} = -x\sin(\theta) + y\cos(\theta) \tag{4}

\sigma_x = \frac{n_x}{f}, \qquad \sigma_y = \frac{n_y}{f} \tag{5}

x_rot and y_rot are the orthogonal and parallel orientations relative to the grating, θ is the angle of the grating orientation, f is the spatial frequency of the grating, φ is the phase of the grating relative to the Gaussian envelope, and σ_x and σ_y are the standard deviations of the Gaussian envelope orthogonal and parallel to the grating, which can be defined as multiples (n_x and n_y) of the grating cycle (the inverse of the frequency).

Although the Gabor filter greatly reduces the number of parameters defining the linear spatial component of V1 neurons, it still has five parameters per channel. Fortunately, there is a vast literature in neurophysiology with detailed characterizations of primate V1 response properties, which can be used to constrain these parameters. To instantiate a VOneBlock with C_V1 channels, we sampled C_V1 values for each of the parameters according to an empirically constrained distribution (Table C1). Due to the resolution of the input images, we limited the ranges of spatial frequencies (f < 5.6 cpd) and number of cycles (n > 0.1).

Although the Gabor filter greatly reduces the number of parameters defining the linear spatialcomponent of V1 neurons it still has five parameters per channel Fortunately there is a vastliterature in neurophysiology with detailed characterizations of primate V1 response properties whichcan be used to constrain these parameters To instantiate a VOneBlock with CV 1 channels wesampled CV 1 values for each of the parameters according to an empirically constrained distribution(Table C1) Due to the resolution of the input images we limited the ranges of spatial frequencies(f lt 56cpd) and number of cycles (n gt 01)

Table C1 Empirical distributions used to sample the GFB parameters in the VOneBlock

Parameter Reference

θ De Valois et at [78] Fig 5 Orientation tuning (foveal)f De Valois et at [79] Fig 6 Peak spatial freq (X cells foveal)φ - Uniform distribution isin [0 360] degnx ny Ringach 2002 [80] Fig 4 (nx ny) joint distribution

Critical to the correct implementation of the biologically-constrained parameters of the GFB is the choice of the model's field of view in degrees. As previously mentioned in Section A, we used 8° as the input spatial extent for all CNN models. It is important to note that the set of spatial filters in the VOneBlock's GFB differs considerably from those in the first convolution of most CNNs (Fig. C1). While standard CNNs learn filters that resemble Gabors in their input layer [1, 99, 100], due to the limited span of their kernels, these do not vary significantly in size and spatial frequency. V1 neurons, on the other hand, are known to exhibit a wide range of receptive field properties. This phenomenon is captured in the VOneBlock, with the spatial frequencies and receptive field sizes of the Gabors ranging over more than one order of magnitude. Due to this high variability, we set the convolution kernel to 25×25 px, which is considerably larger than those in standard CNNs (Fig. C1).


Figure C1: Comparison between filters in the first convolution of standard CNNs and the VOneBlock. Example filters from the first convolution in the VOneBlock, ResNet50, CORnet-S, and AlexNet (from left to right). VOneBlock filters are all parameterized as Gabors, varying considerably in size and spatial frequency. Standard CNNs have some filters with shapes other than Gabors, particularly center-surround, and are limited in size by their small kernels. Kernel sizes are 25 px, 7 px, 7 px, and 11 px for the VOneBlock, ResNet50, CORnet-S, and AlexNet, respectively.


C.2 Nonlinear layer

The VOneBlock's nonlinear layer has two different nonlinearities that are applied to each channel depending on its cell type: a rectified linear transformation for simple cells (6), and the spectral power of a quadrature phase-pair (7) for complex cells:

S^{nl}_{\theta,f,\phi,n_x,n_y} = \begin{cases} S^{l}_{\theta,f,\phi,n_x,n_y} & \text{if } S^{l}_{\theta,f,\phi,n_x,n_y} > 0 \\ 0 & \text{otherwise} \end{cases} \tag{6}

C^{nl}_{\theta,f,n_x,n_y} = \frac{1}{\sqrt{2}} \sqrt{\left(C^{l}_{\theta,f,\phi,n_x,n_y}\right)^2 + \left(C^{l}_{\theta,f,\phi+\pi/2,n_x,n_y}\right)^2} \tag{7}

where S^l and S^{nl} are the linear and nonlinear responses of a simple neuron, and C^l and C^{nl} are the same for a complex neuron.
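In code, the two nonlinearities reduce to a few lines (a PyTorch sketch with hypothetical function names; the complex-cell inputs are the two linear responses of a quadrature phase pair):

```python
import torch

def simple_cell(s_linear):
    """Simple-cell nonlinearity (Eq. 6): half-wave rectification."""
    return torch.relu(s_linear)

def complex_cell(c_linear, c_linear_quad):
    """Complex-cell nonlinearity (Eq. 7): spectral power of a quadrature
    pair; c_linear_quad is the same Gabor's response with phase + pi/2."""
    return torch.sqrt(c_linear ** 2 + c_linear_quad ** 2) / (2 ** 0.5)
```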

C.3 Neuronal stochasticity generator

In awake monkeys, spike trains of V1 neurons are approximately Poisson, i.e., the variance and mean of spike counts in a given time-window over a set of repetitions are roughly the same [82]. We incorporated stochasticity into the VOneBlock to emulate this property of neuronal responses. Since the Poisson distribution is not continuous, it breaks the gradients in a white box attack, giving a false sense of robustness [68]. In order to avoid this situation and facilitate the evaluation of the model's real robustness, our neuronal stochasticity generator, as implemented, uses a continuous second-order approximation of Poisson noise, adding Gaussian noise with variance equal to the activation:

R_s \sim \mathcal{N}\left(\mu = R_{ns},\ \sigma^2 = R_{ns}\right) \tag{8}

where R_{ns} and R_s are the non-stochastic and stochastic responses of a neuron.
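A minimal PyTorch sketch of this noise layer, using the reparameterization trick so that gradients flow through the mean (the function name is ours; activations are assumed non-negative and already rescaled to the spike-count range, as described next):

```python
import torch

def v1_stochasticity(r):
    """Gaussian approximation of Poisson noise (Eq. 8)."""
    eps = torch.randn_like(r)        # noise sampled outside the autograd graph
    return r + torch.sqrt(r) * eps   # mean r, variance r
```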

To approximate the same levels of neuronal stochasticity as primate V1 neurons, it is critical that the VOneBlock activations are in the same range as V1 neuronal responses (number of spike counts in a given time-window). Thus, we applied an affine transformation to the activations so that both the mean stimulus response and the mean baseline activity are the same as those of a population of V1 neurons (Table C2 shows the mean responses and spontaneous activity of V1 neurons measured in different time-windows).

Table C2: Mean stimulus responses and spontaneous activity of a population of V1 neurons for different time-windows.

Time-window [ms] | Stimulus response [spikes] | Spontaneous activity [spikes]
25               | 0.324                      | 0.073
50               | 0.655                      | 0.145
100              | 1.05                       | 0.29
200              | 1.38                       | 0.58

Since the exact time-window over which responses in V1 are integrated during visual perception is still an open question, we considered a time-window of 25 ms (Fig. C2 B). In order to keep the outputs of the VOneBlock in a range that does not deviate considerably from the typical range of activations in CNNs, so that the model can be efficiently trained using standard training parameters, we applied the inverse transformation to scale the outputs back to their original range after the stochastic layer.

C.4 Hyper-parameters and other design choices

We developed VOneNets from three different standard CNN architectures: ResNet50, CORnet-S, and AlexNet. As previously mentioned, we replaced the first block of each architecture by the VOneBlock and the transition layer to create the respective VOneNet. Table C3 contains the layers that were removed for each base architecture. Except for CORnet-S, the layers removed only contained a single


convolution, nonlinearity, and maxpool. Since CORnet-S already had a pre-committed set of layers for V1, we replaced them by the VOneBlock. The torchvision implementation of AlexNet contains a combined stride of eight in the removed layers (four in the convolution and two in the maxpool), followed by a stride of one in the second convolution (outside of the removed layers). In order to more easily adapt it to a VOneBlock with a stride of four, we slightly adapted AlexNet's architecture so that it had strides of two in these three layers (first convolution, first maxpool, and second convolution). The results shown in Fig. 2 were obtained using this modified architecture.

Table C3: Layers removed from each base CNN architecture for creating the VOneNets.

Architecture | Layers removed
AlexNet      | Conv, ReLU, MaxPool
ResNet50     | Conv, Batch-Norm, ReLU, MaxPool
CORnet-S     | Conv, Batch-Norm, ReLU, MaxPool, Conv, Batch-Norm, ReLU

The VOneNet architecture was designed to have as few hyper-parameters as possible. When possible, these were either constrained by the base CNN architecture or by neurobiological data: most parameters of the GFB were instantiated by sampling from neuronal distributions of primate V1 neurons; the kernel size was set to 25×25 px to capture the high variability in Gabor sizes; and, like all other CNN models, the spatial extent of the field of view was set to 8° (Section A). Nevertheless, there were two hyper-parameters where the choice was rather arbitrary: the number of channels in the VOneBlock, and the time-window for integrating V1 responses to scale the activations prior to the stochasticity generator.

Since primate V1 has neurons tuned to a wide range of spatial frequencies for each orientation, at any given location of the visual space [101], we expected that a large number of channels would be required to cover all of the combinations of Gabor parameters. For the VOneResNet50 architecture, we varied the number of channels in the VOneBlock between 64 and 1024, equally split between simple and complex cells, and measured the performance of the several models after training (Fig. C2 A). Clean ImageNet and common image corruption performance improved with channel number. Remarkably, the opposite happened for white box accuracy, with variants of VOneResNet50 with 64 and 128 channels achieving the highest robustness to white box attacks. This result is particularly interesting since it had been shown that increasing the number of channels in all layers of a CNN improves robustness to white box attacks in both standard and adversarially trained models [17]. Therefore, this shows that the improvement in white box robustness of VOneNets, when compared to their base models, cannot be attributed to the higher number of channels. For the main model, we set the channel number to 512, as it offered a good compromise between the different benchmarks. Regarding the time-window for integrating V1 responses, we observed small effects on the models' performance when varying it between 25 and 200 ms. We chose 25 ms for the main model due to a small trend for higher robustness to white box attacks with shorter time-windows.

Figure C2: Exploration of the VOneBlock's hyper-parameters in the VOneResNet50 architecture. A: Performance of different VOneResNet50 models with a varying number of channels in the VOneBlock (with a 25 ms time-window). B: Same as in A, but varying the time-window for integrating neuronal activity (with 512 channels). Curves show clean, mean perturbation, white box, and corruption accuracy. Error-bars show SD (n=3 seeds). Red arrows represent the values used in the main model.


C.5 Training details

We used PyTorch 0.4.1 and trained the model using ImageNet 2012 [44]. Images were preprocessed: (1) for training, random crop to 224×224 pixels and random flipping left and right; (2) for validation, central crop to 224×224 pixels. Preprocessing was followed by normalization (subtraction and division by [0.5, 0.5, 0.5]). We used a batch size of 256 images and trained either on 2 GPUs (NVIDIA Titan X / GeForce 1080Ti) or 1 GPU (Quadro RTX 6000 or V100) for 70 epochs. We used step learning rate scheduling: 0.1 starting learning rate, divided by 10 every 20 epochs. For optimization, we used Stochastic Gradient Descent with a weight decay of 0.0001, momentum of 0.9, and a cross-entropy loss between image labels and model predictions (logits).
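A minimal sketch of this recipe in PyTorch follows; resnet50 stands in for a VOneNet, and the exact crop operations (RandomResizedCrop for training, a 256-pixel resize before the validation center crop) are assumptions about details not stated above.

```python
import torch
import torchvision.transforms as T
from torchvision.models import resnet50

normalize = T.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
train_tf = T.Compose([T.RandomResizedCrop(224), T.RandomHorizontalFlip(),
                      T.ToTensor(), normalize])
val_tf = T.Compose([T.Resize(256), T.CenterCrop(224), T.ToTensor(), normalize])

model = resnet50()  # placeholder for a VOneNet backbone
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9,
                            weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(70):
    # ... one pass over ImageNet with batch size 256, minimizing criterion ...
    scheduler.step()  # lr: 0.1 -> 0.01 -> 0.001 at epochs 20 and 40
```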

Model weights and code are available at https://github.com/dicarlolab/vonenet.


D VOneBlock variant details

To investigate which components of the VOneBlock are responsible for the increased robustness, we created six variants of the VOneBlock by removing or modifying one of its parts. In all cases, we trained from scratch a corresponding variant of VOneResNet50 with the modified VOneBlock. All other properties of the model and components of the VOneBlock were left unmodified. Here we describe in more detail each of these variants (the name of the variant refers to the component removed):

• V1 distributions: the GFB parameters were sampled from uniform distributions with the same domain as the empirical distributions used in the default VOneBlock (Table C1); a short sketch of this control follows the list.
• Low SF: when sampling the spatial frequencies for the GFB, neurons with low peak spatial frequencies (f < 2 cpd) were removed from the empirical distribution [79].
• High SF: similar to the previous variant, but removing neurons with high peak spatial frequencies (f > 2 cpd).
• Complex NL: complex cell channels were removed from the VOneBlock and replaced by simple cell channels.
• Simple NL: simple cell channels were removed from the VOneBlock and replaced by complex cell channels.
• V1 stochasticity: the V1 stochasticity generator was removed from the VOneBlock.
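To make the first control concrete, here is a minimal sketch of the uniform sampling, with illustrative bounds rather than the actual domains of the Table C1 distributions:

```python
import numpy as np

rng = np.random.default_rng(0)
n_channels = 512
# "V1 distributions" control: sample Gabor parameters uniformly over the same
# domain as the empirical distributions (bounds here are illustrative only).
sf_uniform = rng.uniform(0.5, 8.0, n_channels)       # peak spatial frequency (cpd)
theta_uniform = rng.uniform(0.0, 180.0, n_channels)  # preferred orientation (deg)
```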

While all of the variants resulted in worse overall mean performance than the unmodified VOneResNet50 (Table 2), some improved specific benchmarks: the variant without high SF had higher accuracy under noise and blur corruptions (Table D1); the variant without V1 stochasticity had higher accuracy on clean images and images with noise and weather corruptions (Tables 2 and D1); and the variant without V1 distributions had higher accuracy under white box attacks with high L∞ perturbations (Table D2).

Table D1: Difference between the common image corruption accuracies of the several VOneResNet50 variants and the unmodified VOneResNet50. Removal of simple cells reduced accuracy for all common image corruption types. Removal of some components of the VOneBlock improved accuracy for specific common image corruptions: removing high SF consistently improved performance for noise and blur corruptions, and removing V1 stochasticity improved performance for noise and weather corruptions.

                 Noise                            Blur
Component    Gaussian   Shot       Impulse    Defocus    Glass      Motion     Zoom
removed      [Δ%]       [Δ%]       [Δ%]       [Δ%]       [Δ%]       [Δ%]       [Δ%]

V1 dist      -2.8±0.9   -2.4±0.7   -2.9±1.1   -1.6±0.1    0.4±0.2   -0.8±0.3   -0.2±0.5
Low SF       -1.4±0.5   -1.0±0.2   -1.5±0.5   -0.6±0.6   -1.1±0.2   -1.2±0.4   -0.5±0.7
High SF       6.8±1.3    7.3±1.1    7.0±1.3    5.6±0.3   11.5±0.3    3.7±0.0    6.1±0.1
Complex       1.0±1.6    1.2±1.3    0.7±1.8    1.2±0.2   -1.5±0.1    0.2±0.8    0.4±1.1
Simple       -6.1±1.6   -5.4±1.5   -6.4±1.9   -4.1±0.4   -4.2±0.4   -4.9±0.5   -5.4±0.3
V1 stoch      2.2±2.5    2.2±2.5    2.1±2.9    3.3±0.1   -5.5±0.1    0.9±0.7    0.9±0.1

                 Weather                          Digital
Component    Snow       Frost      Fog        Bright     Contrast   Elastic    Pixelate   JPEG
removed      [Δ%]       [Δ%]       [Δ%]       [Δ%]       [Δ%]       [Δ%]       [Δ%]       [Δ%]

V1 dist      -0.2±0.6   -0.9±0.6   -1.9±1.1   -0.8±0.5   -1.4±0.9    0.3±0.4   -1.7±0.3    0.8±0.5
Low SF       -0.4±0.3   -0.6±0.4    0.9±0.3   -0.2±0.2   -0.4±0.5   -0.9±0.5   -1.3±0.2   -0.5±0.4
High SF       1.4±1.0   -0.7±0.7   -9.5±0.4   -5.2±0.4   -4.4±0.5    2.4±0.2    1.9±0.4   -4.6±0.4
Complex       0.7±0.4    0.4±0.3    4.3±0.2    1.3±0.2    2.9±0.2   -0.5±0.2   -1.2±0.0   -1.9±0.5
Simple       -5.0±0.5   -9.0±0.5   -4.4±0.5   -3.9±0.2   -4.8±0.4   -3.9±0.5   -3.5±0.4   -1.8±0.2
V1 stoch      2.5±0.4    0.3±0.3   12.0±0.5    2.6±0.3   10.0±0.8   -2.8±0.3   -4.0±0.7   -4.0±0.4


Table D2: Difference between the white box attack accuracies of the several VOneResNet50 variants and the unmodified VOneResNet50. Removal of V1 stochasticity considerably reduced accuracy under all white box attacks. Removal of V1 distributions improved accuracy for the L∞ white box attack with high perturbation strength.

                 PGD-L1                PGD-L2                PGD-L∞
Component    ε=40       ε=160      ε=0.15     ε=0.60     ε=1/1020   ε=1/255
removed      [Δ%]       [Δ%]       [Δ%]       [Δ%]       [Δ%]       [Δ%]

V1 dist      -0.5±0.8    0.1±1.1   -0.8±0.2    0.6±1.0    0.3±0.3    3.0±1.8
Low SF       -0.8±1.2   -3.1±1.1   -0.6±1.0   -2.3±0.8   -0.8±1.3   -1.8±1.0
High SF      -2.2±0.9   -4.7±2.6   -2.6±1.5   -5.1±1.2   -2.6±1.4   -5.5±1.1
Complex      -1.3±0.9   -7.4±1.6   -1.1±0.9   -7.2±1.2   -1.9±1.0   -7.9±1.2
Simple       -2.5±0.8   -1.4±1.4   -2.9±1.1   -1.7±1.2   -2.7±1.2   -0.6±0.8
V1 stoch    -23.7±0.9  -36.5±0.8  -21.6±0.5  -40.1±0.4  -32.2±0.4  -29.2±0.4



Table 2: Removal of some VOneBlock components causes impairments with high specificity. Difference in accuracy (top-1) relative to the unmodified VOneResNet50 for each of the variants with removed components, on overall mean, white box attacks, common corruptions, and clean images. Values are reported as mean and SD (n=3 seeds).

              Overall       Perturbation
Component     Mean       White box   Corruption   Clean
removed       [Δ%]       [Δ%]        [Δ%]         [Δ%]

V1 dist       -0.4±0.3     0.4±0.8    -1.1±0.3    -0.5±0.3
High SF       -2.2±0.2    -3.8±0.3     1.9±0.4    -4.7±0.2
Low SF        -0.7±0.5    -1.6±1.1    -0.7±0.1     0.1±0.3
Complex       -1.0±0.6    -4.5±1.1     0.6±0.4     0.8±0.3
Simple        -3.0±0.5    -2.0±1.1    -4.8±0.5    -2.1±0.3
V1 stoch      -8.8±0.3   -30.6±0.4     1.5±0.5     2.6±0.1

to the standard ResNet50. However, this improvement accounted for only a fraction of the total gains of the VOneResNet50 model (Fig. 3 B), demonstrating that there is a substantial supralinear interaction between the V1 features and the neuronal stochasticity. Merely adding V1 stochasticity to the first block of a standard CNN model does not increase robustness to the same degree as the full VOneBlock: the presence of V1 features more than doubles the contribution to perturbation robustness brought by the addition of neuronal stochasticity.

Finally, we sought to determine whether stochasticity during inference is key to defending against attacks. Thus, we analyzed the white box adversarial performance of VOneResNet50 while quenching stochasticity during the adversarial attack (Fig. 4). Remarkably, the majority of improvements in adversarial robustness originate from the neuronal stochasticity during training, indicating that V1 stochasticity induces the downstream layers to learn representations that are more robust to adversarial attacks. This is particularly interesting when contrasted with the ANT3×3+SIN defense, which has noise added to input images during training but does not learn representations with notably higher robustness to white box adversarial attacks (Table 1 and Supplementary Fig. B1).

[Figure 4: bar plot of white box accuracy (top-1) for the variant with V1 features but without stochasticity, VOneResNet50 with stochasticity quenched during the attack, and the full VOneResNet50, decomposing the contribution of V1 stochasticity during training vs. during the attack.]

Figure 4: V1 stochasticity during training leads to more robust learned representations. White box accuracy (top-1) for the VOneResNet50 variant without V1 stochasticity, VOneResNet50 with stochasticity removed during the white box attack and inference, and VOneResNet50. VOneResNet50 without stochasticity during attack/inference maintained over half of the robustness gains. Therefore, V1 stochasticity improves model robustness in two ways: rendering the attack itself less effective, and inducing more robust representations downstream of the VOneBlock, the latter accounting for the majority of the gains on our benchmark. Gray bars show the performance of the standard ResNet50. Blue bars show the improvements due to the VOneBlock. Error-bars represent SD (n=3 seeds).

5 Discussion

In this work, we demonstrate that adversarial robustness in CNNs is correlated with their ability to explain primate V1 neuronal responses, and that simulating the image processing of primate V1 at the front of standard neural network architectures significantly improves their robustness to image perturbations. Notably, this approach outperformed state-of-the-art defense methods on a large benchmark consisting of adversarial attacks and common image corruptions. Despite not being constructed to this end, any component of the model front-end we removed or modified to be less like primate V1 resulted in less overall robustness, and revealed that different components improve robustness to different perturbation types. Remarkably, we find that simulated V1 stochasticity


interacts synergistically with the V1 model features and drives the downstream layers to learn representations more robust to adversarial perturbations.

Our approach bears some similarity to the pioneering study by Li et al. [40], in which a model's representations were regularized to approximate mouse V1, increasing its robustness on the grey-scale CIFAR dataset; however, here we go further in several key ways. First, while the mouse is gaining traction as a model for studying vision, a vast literature has established macaque vision as a quantitatively accurate model of human vision in general, and human object recognition in particular [85, 86]. Visual acuity in mice is much lower than in macaques and humans [87, 79], suggesting that vision in the mouse may serve different behavioral functions than in primates. Further, the regularization approach employed by Li et al. does not allow a clear disambiguation of which aspects of mouse V1 contribute to the improved robustness. Since the components of the VOneBlock proposed here are mappable to the brain, we can dissect the contributions of different neuronal populations in primate V1 to robustness against specific image perturbations. Finally, extending the robustness gains of biologically-constrained CNNs from grey-scale CIFAR to the full ImageNet dataset is a critical step towards real-world, human-level applications.

The gains achieved by VOneNets are substantial, particularly against white box attacks, and have tangible benefits over other defense methods. Though adversarial training still provides the strongest defense for the attack statistics it is trained on, it has significant downsides. Beyond its considerable additional computational cost during training, adversarially trained networks have significantly lower performance on clean images, corrupted images, and images perturbed with attack statistics not seen during training, implying that adversarial training in its current form may not be viable as a general defense method. In contrast, by deploying an architectural change, VOneNets improve robustness to all adversarial attacks tested and many common image corruptions, and they accomplish this with no additional training overhead. This also suggests that the architectural gains of the VOneNet could be stacked with other training-based defense methods to achieve even greater overall robustness gains.

Relative to current methods, the success of this approach derives from engineering in a better approximation of the architecture of the most well-studied primate visual area, combined with task optimization of the remaining free parameters of the downstream architecture [26]. This points to two potentially synergistic avenues for further gains: an even more neurobiologically precise model of V1 (i.e., a better VOneBlock), and an even more neurobiologically precise model of the downstream architecture. For example, one could extend the biological fidelity of the VOneBlock in the hope that it confers even greater robustness, by including properties such as divisive normalization [48, 49] and contextual modulation [88], to name a few. In addition to V1, the retina and the Lateral Geniculate Nucleus (LGN) also play important roles in pre-processing visual information, only partially captured by the current V1 model, suggesting that extending the work done here to a retina/LGN front-end has potential to better align CNNs with human visual object recognition [89, 90]. In addition, though our initial experiments show that multiple components of our V1 model work together to produce greater robustness, the nature of the relationship between adversarial robustness and explained variance in V1 responses is far from resolved. In particular, as the VOneNets with and without stochasticity achieve very similar V1 explained variances but have markedly different levels of adversarial robustness, it is clear that much theoretical work remains in better understanding both when and why matching biology leads to more robust computer vision models.

While neuroscience has recently seen a huge influx of new neurally-mechanistic models and tools drawn from machine learning [91, 92], the most recent advances in machine learning and computer vision have been driven mostly by the widespread availability of computational resources and very large labeled datasets [93], rather than by an understanding of the relevant brain mechanisms. Under the belief that biological intelligence still has a lot of untapped potential to contribute, a number of researchers have been pushing for more neuroscience-inspired machine learning algorithms [37, 42, 32]. The work presented here shows that this aspiration can become reality: the models presented here, drawn directly from primate neurobiology, indeed require less training to achieve more human-like behavior. This is one turn of a new virtuous circle, wherein neuroscience and artificial intelligence each feed into and reinforce the understanding and ability of the other.


Broader Impact

From a technological perspective, the ethical implications of our work are largely aligned with those of computer vision in general. While there is undoubtedly potential for malicious and abusive uses of computer vision, particularly in the form of discrimination or invasion of privacy, we believe that our work will aid in the production of more robust and intuitive behavior of computer vision algorithms. As CNNs are deployed in real-world situations, it is critical that they behave with the same level of stability as their human counterparts. In particular, they should, at the very least, not be confused by changes in input statistics that do not confuse humans. We believe that this work will help to bridge that gap. Furthermore, while algorithms are often thought to be impartial or unbiased, much research has shown that data-driven models like current CNNs are often even more biased than humans, implicitly keying in on and amplifying stereotypes. For this reason, making new CNNs that behave more like humans may actually reduce, or at least make more intuitive, their implicit biases. Unfortunately, we note that even with our work, these issues are not resolved yet. While we developed a more neurobiologically-constrained algorithm, it comes nowhere close to human-like behaviour in the wide range of circumstances experienced in the real world. Finally, from the perspective of neuroscience, we think that this work introduces a more accurate model of the primate visual system. Ultimately, better models contribute to a better mechanistic understanding of how the brain works and how to intervene in the case of illness or disease states. We think that our model contributes a stronger foundation for understanding the brain and building novel medical technology, such as neural implants for restoring vision in people with impairments.

Acknowledgments and Disclosure of Funding

We thank J. Anthony Movshon and Corey M. Ziemba for access to the V1 neural dataset, MIT-IBM Satori for providing the compute necessary for this project, John Cohn for technical support, Sijia Liu and Wieland Brendel for advice on adversarial attacks, and Adam Marblestone for insightful discussion. This work was supported by the PhRMA Foundation Postdoctoral Fellowship in Informatics (T.M.), the Semiconductor Research Corporation (SRC) and DARPA (J.D., M.S., J.J.D.), the Massachusetts Institute of Technology Shoemaker Fellowship (M.S.), Office of Naval Research grant MURI-114407 (J.J.D.), the Simons Foundation grant SCGB-542965 (J.J.D.), and the MIT-IBM Watson AI Lab grant W1771646 (J.J.D.).

References

[1] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet Classification with Deep Convolutional Neural Networks". In: Advances in Neural Information Processing Systems 25. Ed. by F. Pereira et al. Curran Associates, Inc., 2012, pp. 1097–1105.
[2] Christian Szegedy et al. "Going Deeper with Convolutions". In: (Sept. 2014). arXiv: 1409.4842 [cs.CV].
[3] Karen Simonyan and Andrew Zisserman. "Very Deep Convolutional Networks for Large-Scale Image Recognition". In: (Sept. 2014). arXiv: 1409.1556 [cs.CV].
[4] Kaiming He et al. "Deep Residual Learning for Image Recognition". In: (Dec. 2015). arXiv: 1512.03385 [cs.CV].
[5] Kaiming He et al. "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification". In: Proceedings of the IEEE International Conference on Computer Vision (2015), pp. 1026–1034. DOI: 10.1109/ICCV.2015.123. arXiv: 1502.01852.
[6] Christian Szegedy et al. "Intriguing properties of neural networks". In: (Dec. 2013). arXiv: 1312.6199 [cs.CV].
[7] Nicholas Carlini and David Wagner. "Towards Evaluating the Robustness of Neural Networks". In: (Aug. 2016). arXiv: 1608.04644 [cs.CR].
[8] Pin-Yu Chen et al. "EAD: Elastic-Net Attacks to Deep Neural Networks via Adversarial Examples". In: (Sept. 2017). arXiv: 1709.04114 [stat.ML].
[9] Jérôme Rony et al. "Decoupling Direction and Norm for Efficient Gradient-Based L2 Adversarial Attacks and Defenses". In: (Nov. 2018). arXiv: 1811.09600 [cs.CV].


[10] Wieland Brendel et al. "Accurate, reliable and fast robustness evaluation". In: (July 2019). arXiv: 1907.01003 [stat.ML].
[11] Samuel Dodge and Lina Karam. "A Study and Comparison of Human and Deep Learning Recognition Performance Under Visual Distortions". In: (May 2017). arXiv: 1705.02498 [cs.CV].
[12] Robert Geirhos et al. "Generalisation in humans and deep neural networks". In: (Aug. 2018). arXiv: 1808.08750 [cs.CV].
[13] Dan Hendrycks and Thomas Dietterich. "Benchmarking Neural Network Robustness to Common Corruptions and Perturbations". In: (Mar. 2019). arXiv: 1903.12261 [cs.LG].
[14] Nilaksh Das et al. "Keeping the Bad Guys Out: Protecting and Vaccinating Deep Learning with JPEG Compression". In: (May 2017). arXiv: 1705.02900 [cs.CV].
[15] Xuanqing Liu et al. "Towards Robust Neural Networks via Random Self-ensemble". In: (Dec. 2017). arXiv: 1712.00673 [cs.LG].
[16] Weilin Xu, David Evans, and Yanjun Qi. "Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks". In: arXiv [cs.CV] (Apr. 2017).
[17] Aleksander Madry et al. "Towards Deep Learning Models Resistant to Adversarial Attacks". In: (June 2017). arXiv: 1706.06083 [stat.ML].
[18] Yang Song et al. "PixelDefend: Leveraging Generative Models to Understand and Defend against Adversarial Examples". In: (Oct. 2017). arXiv: 1710.10766 [cs.LG].
[19] Guneet S. Dhillon et al. "Stochastic Activation Pruning for Robust Adversarial Defense". In: (Mar. 2018). arXiv: 1803.01442 [cs.LG].
[20] Jacob Buckman et al. "Thermometer Encoding: One Hot Way To Resist Adversarial Examples". In: (Feb. 2018). URL: https://openreview.net/forum?id=S18Su--CW.
[21] Chuan Guo et al. "Countering Adversarial Images using Input Transformations". In: (Feb. 2018). URL: https://openreview.net/forum?id=SyJ7ClWCb.
[22] Claudio Michaelis et al. "Benchmarking Robustness in Object Detection: Autonomous Driving when Winter is Coming". In: (2019), pp. 1–23. arXiv: 1907.07484. URL: http://arxiv.org/abs/1907.07484.
[23] Andrew Ilyas et al. "Adversarial Examples Are Not Bugs, They Are Features". In: (May 2019). arXiv: 1905.02175 [stat.ML].
[24] D. Linsley et al. "What are the Visual Features Underlying Human Versus Machine Vision?" In: ICCV. 2017. DOI: 10.1109/ICCVW.2017.331. arXiv: 1701.02704.
[25] Robert Geirhos et al. "ImageNet-Trained CNNs Are Biased Towards Texture". In: International Conference on Learning Representations (2019), pp. 1–20. arXiv: 1811.12231.
[26] Daniel L. K. Yamins et al. "Performance-optimized hierarchical models predict neural responses in higher visual cortex". In: Proceedings of the National Academy of Sciences 111.23 (2014), pp. 8619–8624. DOI: 10.1073/pnas.1403112111.
[27] Charles F. Cadieu et al. "Deep neural networks rival the representation of primate IT cortex for core visual object recognition". In: PLoS Computational Biology 10.12 (2014), e1003963.
[28] Seyed-Mahdi Khaligh-Razavi and Nikolaus Kriegeskorte. "Deep supervised, but not unsupervised, models may explain IT cortical representation". In: PLoS Comput Biol 10.11 (Nov. 2014), e1003915.
[29] Umut Güçlü and Marcel A. J. van Gerven. "Deep Neural Networks Reveal a Gradient in the Complexity of Neural Representations across the Brain's Ventral Visual Pathway". In: The Journal of Neuroscience 35.27 (2015), pp. 10005–10014. DOI: 10.1523/JNEUROSCI.5023-14.2015. arXiv: 1411.6422.
[30] Santiago A. Cadena et al. "Deep convolutional models improve predictions of macaque V1 responses to natural images". In: PLoS Comput Biol 15.4 (Apr. 2019), e1006897.
[31] Martin Schrimpf et al. "Brain-Score: Which Artificial Neural Network for Object Recognition is most Brain-Like?" Sept. 2018. URL: https://doi.org/10.1101/407007.


[32] Jonas Kubilius et al. "Brain-Like Object Recognition with High-Performing Shallow Recurrent ANNs". In: Advances in Neural Information Processing Systems 32. Ed. by H. Wallach et al. Curran Associates, Inc., 2019, pp. 12805–12816. URL: http://papers.nips.cc/paper/9441-brain-like-object-recognition-with-high-performing-shallow-recurrent-anns.pdf.
[33] Adam H. Marblestone, Greg Wayne, and Konrad P. Kording. "Toward an Integration of Deep Learning and Neuroscience". In: Front Comput Neurosci 10 (Sept. 2016), p. 94.
[34] William Lotter, Gabriel Kreiman, and David Cox. "Deep Predictive Coding Networks for Video Prediction and Unsupervised Learning". In: (May 2016). arXiv: 1605.08104 [cs.LG].
[35] Aran Nayebi and Surya Ganguli. "Biologically inspired protection of deep networks from adversarial attacks". In: (Mar. 2017). arXiv: 1703.09202 [stat.ML].
[36] Jordan Guerguiev, Timothy P. Lillicrap, and Blake A. Richards. "Towards deep learning with segregated dendrites". In: Elife 6 (Dec. 2017).
[37] Demis Hassabis et al. "Neuroscience-Inspired Artificial Intelligence". In: Neuron 95.2 (2017), pp. 245–258. DOI: 10.1016/j.neuron.2017.06.011.
[38] Grace W. Lindsay and Kenneth D. Miller. "How biological attention mechanisms improve task performance in a large-scale visual system model". In: Elife 7 (Oct. 2018).
[39] Hanlin Tang et al. "Recurrent computations for visual pattern completion". In: Proceedings of the National Academy of Sciences 115.35 (2018), pp. 8835–8840. DOI: 10.1073/pnas.1719397115.
[40] Zhe Li et al. "Learning From Brains How to Regularize Machines". In: (Nov. 2019). arXiv: 1911.05072 [cs.LG].
[41] Hosein Hasani, Mahdieh Soleymani, and Hamid Aghajan. "Surround Modulation: A Bio-inspired Connectivity Structure for Convolutional Neural Networks". In: NeurIPS (2019), pp. 15877–15888. URL: http://papers.nips.cc/paper/9719-surround-modulation-a-bio-inspired-connectivity-structure-for-convolutional-neural-networks.
[42] Fabian H. Sinz et al. "Engineering a Less Artificial Intelligence". In: Neuron 103.6 (Sept. 2019), pp. 967–979.
[43] Anthony M. Zador. "A critique of pure learning and what artificial neural networks can learn from animal brains". In: Nat Commun 10.1 (Aug. 2019), p. 3770.
[44] J. Deng et al. "ImageNet: A large-scale hierarchical image database". In: 2009 IEEE Conference on Computer Vision and Pattern Recognition. June 2009, pp. 248–255.
[45] D. H. Hubel and T. N. Wiesel. "Receptive Fields, Binocular Interaction and Functional Architecture in the Cat's Visual Cortex". In: Journal of Physiology (1962), pp. 106–154.
[46] J. P. Jones and L. A. Palmer. "An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex". In: Journal of Neurophysiology 58.6 (1987), pp. 1233–1258. DOI: 10.1152/jn.1987.58.6.1233.
[47] E. H. Adelson and J. R. Bergen. "Spatiotemporal energy models for the perception of motion". In: Journal of the Optical Society of America A 2.2 (1985), pp. 284–299. DOI: 10.1364/JOSAA.2.000284.
[48] David J. Heeger, E. P. Simoncelli, and J. Anthony Movshon. "Computational models of cortical visual processing". In: Proceedings of the National Academy of Sciences of the United States of America 93.2 (1996), pp. 623–627. DOI: 10.1073/pnas.93.2.623.
[49] M. Carandini, D. J. Heeger, and J. A. Movshon. "Linearity and normalization in simple cells of the macaque primary visual cortex". In: The Journal of Neuroscience 17.21 (Nov. 1997), pp. 8621–8644.
[50] Maximilian Riesenhuber and Tomaso Poggio. "Hierarchical models of object recognition in cortex". In: Nature Neuroscience 2.11 (1999), pp. 1019–1025. DOI: 10.1038/14819.


[51] Thomas Serre, Aude Oliva, and Tomaso Poggio. "A feedforward architecture accounts for rapid categorization". In: Proc Natl Acad Sci U S A 104.15 (Apr. 2007), pp. 6424–6429.
[52] Nicole C. Rust et al. "Spatiotemporal Elements of Macaque V1 Receptive Fields". In: Neuron 46 (2005), pp. 945–956. DOI: 10.1016/j.neuron.2005.05.021.
[53] Brett Vintch, J. A. Movshon, and E. P. Simoncelli. "A Convolutional Subunit Model for Neuronal Responses in Macaque V1". In: Journal of Neuroscience 35.44 (2015), pp. 14829–14841. DOI: 10.1523/JNEUROSCI.2815-13.2015.
[54] Nicolas Papernot et al. "Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks". In: (Nov. 2015). arXiv: 1511.04508 [cs.CR].
[55] Mayank Singh et al. "Harnessing the Vulnerability of Latent Layers in Adversarially Trained Models". In: (May 2019). arXiv: 1905.05186 [cs.LG].
[56] Cihang Xie et al. "Feature denoising for improving adversarial robustness". In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019, pp. 501–509.
[57] Eric Wong, Leslie Rice, and J. Zico Kolter. "Fast is better than free: Revisiting adversarial training". In: (Jan. 2020). URL: https://openreview.net/forum?id=BJx040EFvH.
[58] Dan Hendrycks et al. "AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty". In: (Dec. 2019). arXiv: 1912.02781 [stat.ML].
[59] Evgenia Rusak et al. "Increasing the robustness of DNNs against image corruptions by playing the Game of Noise". In: (Jan. 2020). arXiv: 2001.06057 [cs.CV].
[60] Ali Shafahi et al. "Adversarial training for free!" In: Advances in Neural Information Processing Systems 32. Ed. by H. Wallach et al. Curran Associates, Inc., 2019, pp. 3358–3369.
[61] Eric Wong, Leslie Rice, and J. Zico Kolter. "Fast is better than free: Revisiting adversarial training". In: (Sept. 2020). URL: https://openreview.net/forum?id=BJx040EFvH.
[62] Dimitris Tsipras et al. "Robustness May Be at Odds with Accuracy". Sept. 2019. URL: https://openreview.net/forum?id=SyxAb30cY7.
[63] Yash Sharma and Pin-Yu Chen. "Attacking the Madry Defense Model with L1-based Adversarial Examples". In: (Oct. 2017). arXiv: 1710.10733 [stat.ML].
[64] Lukas Schott et al. "Towards the first adversarially robust neural network model on MNIST". In: (May 2018). arXiv: 1805.09190 [cs.CV].
[65] Cihang Xie et al. "Mitigating Adversarial Effects Through Randomization". Feb. 2018. URL: https://openreview.net/forum?id=Sk9yuql0Z.
[66] Adnan Siraj Rakin, Zhezhi He, and Deliang Fan. "Parametric Noise Injection: Trainable Randomness to Improve Deep Neural Network Robustness against Adversarial Attack". In: (Nov. 2018). arXiv: 1811.09310 [cs.LG].
[67] Joonho Lee et al. "ProbAct: A Probabilistic Activation Function for Deep Neural Networks". In: (May 2019). arXiv: 1905.10761 [cs.LG].
[68] Anish Athalye, Nicholas Carlini, and David Wagner. "Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples". In: (Feb. 2018). arXiv: 1802.00420 [cs.LG].
[69] Alex Krizhevsky. "Learning Multiple Layers of Features from Tiny Images". In: (Apr. 2009). URL: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.222.9220&rep=rep1&type=pdf.
[70] Adam Paszke et al. "Automatic differentiation in PyTorch". In: (Oct. 2017). URL: https://openreview.net/forum?id=BJJsrmfCZ.
[71] Saining Xie et al. "Aggregated Residual Transformations for Deep Neural Networks". In: (Nov. 2016). arXiv: 1611.05431 [cs.CV].
[72] Gao Huang et al. "Densely Connected Convolutional Networks". In: (Aug. 2016). arXiv: 1608.06993 [cs.CV].
[73] Forrest N. Iandola et al. "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size". In: (Feb. 2016). arXiv: 1602.07360 [cs.CV].
[74] Ningning Ma et al. "ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design". In: (July 2018). arXiv: 1807.11164 [cs.CV].
[75] Mingxing Tan et al. "MnasNet: Platform-Aware Neural Architecture Search for Mobile". In: (July 2018). arXiv: 1807.11626 [cs.CV].


[76] Logan Engstrom et al. Robustness (Python Library). 2019. URL: https://github.com/MadryLab/robustness.
[77] Jeremy Freeman et al. "A functional and perceptual signature of the second visual area in primates". In: Nat Neurosci 16.7 (July 2013), pp. 974–981.
[78] Russell L. De Valois, E. W. Yund, and Norva Hepler. "The orientation and direction selectivity of cells in macaque visual cortex". In: Vision Research 22 (1982), pp. 531–544.
[79] Russell L. De Valois, Duane G. Albrecht, and Lisa G. Thorell. "Spatial Frequency Selectivity of Cells in Macaque Visual Cortex". In: Vision Research 22 (1982), pp. 545–559.
[80] Dario L. Ringach. "Spatial Structure and Symmetry of Simple-Cell Receptive Fields in Macaque Primary Visual Cortex". In: Journal of Neurophysiology 88 (2002), pp. 455–463.
[81] Yasmine El-Shamayleh et al. "Visual response properties of V1 neurons projecting to V2 in macaque". In: Journal of Neuroscience 33.42 (2013), pp. 16594–16605. DOI: 10.1523/JNEUROSCI.2753-13.2013.
[82] W. R. Softky and C. Koch. "The highly irregular firing of cortical cells is inconsistent with temporal integration of random EPSPs". In: Journal of Neuroscience 13.1 (1993), pp. 334–350. DOI: 10.1523/jneurosci.13-01-00334.1993.
[83] Diederik P. Kingma and Max Welling. "Auto-Encoding Variational Bayes". In: (Dec. 2013). arXiv: 1312.6114v10 [stat.ML].
[84] Nicholas Carlini et al. "On Evaluating Adversarial Robustness". In: (Feb. 2019). arXiv: 1902.06705v2 [cs.LG].
[85] Nikolaus Kriegeskorte, Marieke Mur, and Peter Bandettini. "Representational similarity analysis - connecting the branches of systems neuroscience". In: Front Syst Neurosci 2 (Nov. 2008), p. 4.
[86] R. Rajalingham, K. Schmidt, and James J. DiCarlo. "Comparison of Object Recognition Behavior in Human and Monkey". In: Journal of Neuroscience 35.35 (2015), pp. 12127–12136. DOI: 10.1523/JNEUROSCI.0573-15.2015.
[87] Cristopher M. Niell and Michael P. Stryker. "Highly selective receptive fields in mouse visual cortex". In: The Journal of Neuroscience 28.30 (July 2008), pp. 7520–7536. DOI: 10.1523/JNEUROSCI.0623-08.2008.
[88] Pieter R. Roelfsema. "Cortical algorithms for perceptual grouping". In: Annual Review of Neuroscience 29 (Jan. 2006), pp. 203–227. DOI: 10.1146/annurev.neuro.29.051605.112939.
[89] Alexander Berardino et al. "Eigen-Distortions of Hierarchical Representations". In: (Oct. 2017). arXiv: 1710.02266 [cs.CV].
[90] Jack Lindsey et al. "A Unified Theory of Early Visual Representations from Retina to Cortex through Anatomically Constrained Deep CNNs". In: ICLR (2019), pp. 1–17. arXiv: 1901.00945. URL: http://arxiv.org/abs/1901.00945.
[91] Daniel L. K. Yamins and James J. DiCarlo. "Using goal-driven deep learning models to understand sensory cortex". In: Nat Neurosci 19.3 (Mar. 2016), pp. 356–365.
[92] Alexander Mathis et al. "DeepLabCut: markerless pose estimation of user-defined body parts with deep learning". In: Nature Neuroscience (2018). DOI: 10.1038/s41593-018-0209-y.
[93] Yann LeCun, Yoshua Bengio, and Geoffrey E. Hinton. "Deep learning". In: Nature 521.7553 (2015), pp. 436–444. DOI: 10.1038/nature14539.
[94] T. Marques, M. Schrimpf, and J. J. DiCarlo. "Hierarchical neural network models that more closely match primary visual cortex tend to better explain higher level visual cortical responses". In: Cosyne. 2020.
[95] Alexey Kurakin, Ian Goodfellow, and Samy Bengio. "Adversarial examples in the physical world". In: (July 2016). arXiv: 1607.02533 [cs.CV].
[96] Maria-Irina Nicolae et al. "Adversarial Robustness Toolbox v1.2.0". In: CoRR 1807.01069 (2018). URL: https://arxiv.org/pdf/1807.01069.


[97] Florian Tramer et al. "On Adaptive Attacks to Adversarial Example Defenses". In: (Feb. 2020). arXiv: 2002.08347 [cs.LG].
[98] Jonas Rauber et al. "Foolbox Native: Fast adversarial attacks to benchmark the robustness of machine learning models in PyTorch, TensorFlow, and JAX". In: Journal of Open Source Software 5.53 (2020), p. 2607. DOI: 10.21105/joss.02607. URL: https://doi.org/10.21105/joss.02607.
[99] Matthew D. Zeiler and Rob Fergus. "Visualizing and Understanding Convolutional Networks". In: (Nov. 2013). DOI: 10.1007/978-3-319-10590-1_53. arXiv: 1311.2901.
[100] Chris Olah et al. "An Overview of Early Vision in InceptionV1". In: Distill (2020). https://distill.pub/2020/circuits/early-vision. DOI: 10.23915/distill.00024.002.
[101] Ian Nauhaus et al. "Orthogonal micro-organization of orientation and spatial frequency in primate primary visual cortex". In: Nature Neuroscience 15.12 (2012), pp. 1683–1690. DOI: 10.1038/nn.3255.


Supplementary Material

A Neural Explained Variance

We evaluated how well responses to given images in candidate CNNs explain responses of single V1 neurons to the same images, using a standard neural predictivity methodology based on partial least squares (PLS) regression [31, 32]. We used a neural dataset obtained by extracellular recordings from 102 single-units while presenting 450 different images for 100 ms, 20 times each, spanning 4 deg of the visual space. Results from this dataset were originally published in a study comparing V1 and V2 responses to naturalistic textures [77]. To avoid over-fitting, for each CNN we first chose its best V1 layer on a subset of the dataset (n=135 images) and reported the final neural explained variance calculated for the chosen layer on the remaining images of the dataset (n=315 images). Model units were mapped to each V1 neuron linearly using a PLS regression model with 25 components. The mapping was performed for each neuron using 90% of the image responses in the respective dataset and tested on the remaining held-out 10%, in a 10-fold cross-validation strategy. We reported neural explained variance values normalized by the neuronal internal consistency. We calculated V1 neural explained variance for a large pool of CNNs (n=30 models).
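As an illustration of this fitting procedure, here is a minimal sketch using synthetic data standing in for model activations and a single neuron's responses; the internal-consistency normalization is only indicated in a comment, since it depends on the repeat structure of the recordings.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import KFold

# Synthetic stand-ins: X = images x model units, y = one neuron's responses.
rng = np.random.default_rng(0)
X = rng.standard_normal((315, 1000))
y = X[:, :5].sum(axis=1) + 0.1 * rng.standard_normal(315)

r_folds = []
for train_idx, test_idx in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
    # Map model units to the neuron with a 25-component PLS regression.
    pls = PLSRegression(n_components=25).fit(X[train_idx], y[train_idx])
    y_pred = pls.predict(X[test_idx]).ravel()
    r_folds.append(np.corrcoef(y_pred, y[test_idx])[0, 1])

explained_variance = np.mean(r_folds) ** 2
# In the paper, this value is further normalized by the neuron's internal consistency.
print(explained_variance)
```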

When using CNNs in a computer vision task such as object recognition in ImageNet, their field-of-view (in degrees) is not defined, since the only relevant input property is the image resolution (in pixels). However, when using a CNN to model the ventral stream, the visual spatial extent of the model's input is of key importance to ensure that it is correctly mapped to the data it is trying to explain. We assigned a spatial extent of 8 deg to all models, since this value has been previously used in studies benchmarking CNNs as models of the primate ventral stream [31, 94] and is consistent with the results in Cadena et al. [30].


B Image Perturbations

B.1 White Box Adversarial Attacks

For performing white box adversarial attacks, we used untargeted projected gradient descent (PGD) [17] (also referred to as the Basic Iterative Method [95]) with L∞, L2, and L1 norm constraints. Given an image x, this method uses the gradient of the loss to iteratively construct an adversarial image xadv which maximizes the model loss within an Lp bound around x. Formally, in the case of an L∞ constraint, PGD iteratively computes xadv as

x_{t+1} = \mathrm{Proj}_{x+S}\left(x_t + \alpha \, \mathrm{sgn}\left(\nabla_{x_t} L(\theta, x_t, y)\right)\right)

where x is the original image, and the Proj operator ensures the final computed adversarial image xadv is constrained to the space x + S, here the L∞ ball around x. In the case of L1 or L2 constraints, at each iteration ∇_{x_t} L(θ, x_t, y) is scaled to have an L1 or L2 norm of α, and the Proj operator ensures the final xadv is within an L1 or L2 norm ball around x.

For our white box benchmarks, we used ‖δ‖∞ ∈ [1/1020, 1/255], ‖δ‖2 ∈ [0.15, 0.6], and ‖δ‖1 ∈ [40, 160] constraints, where δ = x − xadv, for a total of six different attack variants with two strengths under each norm constraint. We arrived at these strengths by calibrating the lower-strength attack for each norm to give a reasonable degree of variance in normally trained CNNs. Because normally trained CNNs are extremely sensitive to adversarial perturbations, this resulted in perturbations well below the level of perceptibility, and in most cases, particularly with the L∞ constraint, perturbations that would not even be rendered differently by a monitor for display. We set the high attack strength at 4 times the low value, which brought standard ImageNet-trained CNNs to nearly chance performance. Still, even this higher perturbation strength remained practically imperceptible (Fig. B3, left). Each perturbation was computed with 64 PGD iterations and a step size α = ε/32, where ε = ‖δ‖p, for the same 5000 images from the ImageNet validation set on a per-model basis, and final top-1 accuracy was reported. The Adversarial Robustness Toolbox [96] was used for computing the attacks.
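For concreteness, a minimal sketch of the L∞ variant follows; it assumes images in [0, 1] before normalization and omits batching and random-start details, so it illustrates the procedure rather than reproducing the exact benchmark code.

```python
import torch

def pgd_linf(model, x, y, eps, n_iter=64):
    """Untargeted PGD under an L-infinity constraint (sketch)."""
    loss_fn = torch.nn.CrossEntropyLoss()
    alpha = eps / 32                      # step size used in our benchmarks
    x_adv = x.clone().detach()
    for _ in range(n_iter):
        x_adv.requires_grad_(True)
        loss = loss_fn(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()        # ascend the loss
            x_adv = x + (x_adv - x).clamp(-eps, eps)   # project onto the eps-ball
            x_adv = x_adv.clamp(0.0, 1.0)              # assumes [0, 1] image range
    return x_adv.detach()
```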

While adversarial formulations using Lagrangian relaxation, like the C&W attack [7] and the EAD attack [8], are known to be more effective than PGD for finding minimal perturbations under L2 and L1 constraints, these formulations involve computationally expensive line searches, often requiring thousands of iterations to converge, making them computationally difficult to scale to 5000 ImageNet samples for a large number of models. Furthermore, because some of the models we tested operate with stochasticity during inference, we found that other state-of-the-art attacks formulated to efficiently find minimal perturbation distances [9, 10] were generally difficult to tune, as their success becomes stochastic as they near the decision boundary. Thus, we proceeded with the PGD formulation, as we found it to be the most reliable, computationally tractable, and conceptually simple to follow. We share our model and weights, and encourage anyone interested to attack our model to verify our results.

Although the stochastic models were implemented with the reparameterization trick for random sampling, which leaves the gradients intact [83], we performed several sanity checks to make sure that the white box attacks were effectively implemented. Following the advice of Athalye et al. [68] and Carlini et al. [84], we observed that in all of the white box attacks, increasing the attack strength decreased model performance, eventually bringing accuracy to zero (Fig. B1), and additionally, that for any given constraint and attack strength, increasing PGD iterations from one to many increased attack effectiveness (Fig. B2). Furthermore, to be extra cautious, we followed the method Athalye et al. [68] used to break Stochastic Activation Pruning [19] and replaced \nabla_x f(x) with

\sum_{i=1}^{k} \nabla_x f(x)

effectively performing Monte Carlo (MC) sampling of the loss gradient at every PGD iteration. While with high perturbation strength even k = 1 was sufficient to bring our model's accuracy to zero, we found that k = 10 generally achieved higher attack success at intermediate and large perturbation strengths (Table B1), and thus we used this approach for all attacks on stochastic models.
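A minimal sketch of this gradient estimator follows; it assumes each forward pass of the model draws fresh noise (as the reparameterized VOneBlock does), so that summing gradients over k passes averages over the network's stochasticity before the PGD sign/projection step.

```python
import torch

def mc_loss_gradient(model, x_adv, y, k=10):
    # Sum the loss gradient over k stochastic forward passes; each call to
    # model() samples new V1 noise, yielding a Monte Carlo gradient estimate.
    loss_fn = torch.nn.CrossEntropyLoss()
    grad = torch.zeros_like(x_adv)
    for _ in range(k):
        x_in = x_adv.clone().detach().requires_grad_(True)
        loss = loss_fn(model(x_in), y)
        grad = grad + torch.autograd.grad(loss, x_in)[0]
    return grad
```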


[Figure B1: white box accuracy vs. attack strength curves for PGD-L∞ (1/1020, 1/255, 4/255), PGD-L2 (0.15, 0.60, 2.40), and PGD-L1 (40, 160, 640), for ResNet50, VOneResNet50, ANT3×3+SIN, and ATL∞.]

Figure B1: VOneResNet50 improves robustness to white box attacks in a wide range of perturbation strengths. White box accuracy vs. perturbation strength curves for PGD attacks with L∞, L2, and L1 constraints, for ResNet50, VOneResNet50, ATL∞, and ANT3×3+SIN. Adding the VOneBlock to ResNet50 consistently improves robustness under all constraints and attack strengths; VOneResNet50 can withstand perturbations roughly 4 times higher than the standard ResNet50 for the same performance level. Remarkably, VOneResNet50 outperformed the adversarially trained model (ATL∞) for a considerable range of perturbation strengths, particularly with the L1 constraint. Adversarial noise training combined with Stylized ImageNet (ANT3×3+SIN) has virtually no improvement in robustness to white box attacks. In all white box attacks and for all models, increasing the attack strength decreased performance, eventually bringing accuracy to zero.

[Figure B2: white box accuracy vs. number of PGD iterations (1-64) under each norm constraint, for ResNet50, VOneResNet50, ANT3×3+SIN, and ATL∞.]

Figure B2: White box attack effectiveness converges for a large number of PGD iteration steps. White box accuracy vs. iteration curves for PGD attacks with ‖δ‖∞ = 1/255, ‖δ‖2 = 0.6, and ‖δ‖1 = 160 constraints. Increasing the number of PGD iteration steps makes the attack more effective, generally converging after 32 iterations.

Table B1: Improving the effectiveness of white box attacks with Monte Carlo sampling. White box attack accuracy (top-1) under different norm constraints, and white box overall mean, using standard PGD and PGD with Monte Carlo sampling of the loss gradients (PGD-MC, k = 10) on VOneResNet50, for 64 PGD iteration steps. For all stochastic models, we used PGD with Monte Carlo sampling in the white box attacks throughout the paper. Values are mean and SD (n=3 seeds).

                 White box   PGD-L1                PGD-L2                 PGD-L∞
                 Mean        ε=40       ε=160      ε=0.15     ε=0.60      ε=1/1020   ε=1/255
Attack type      [%]         [%]        [%]        [%]        [%]         [%]        [%]

Standard PGD     52.8±0.2    64.4±0.4   42.2±0.1   65.2±0.3   45.3±0.4    63.4±0.4   36.4±0.4
PGD-MC (k=10)    51.1±0.4    64.2±0.5   40.4±0.5   65.3±0.4   44.3±0.3    61.9±0.4   30.3±0.5


Although VOneResNet50 was not designed to explicitly improve robustness to white box attacks, it offers substantial gains over the standard ResNet50 model (Table 1). A closer inspection of the white box accuracy vs. perturbation strength curves (Fig. B1) reveals that VOneResNet50 can withstand perturbations roughly 4 times higher than the standard ResNet50 for the same performance level (the blue curve is shifted to the right of the black curve by a factor of 4). To help visualize how this difference translates into actual images, we show in Fig. B3 examples of white box adversarial images for both ResNet50 and VOneResNet50, for the perturbation strengths that bring the respective models to nearly chance performance.

[Figure B3 panels: original image ("Egyptian Cat"); ResNet50 attacks at ‖δ‖∞ = 1/255, ‖δ‖2 = 0.60, and ‖δ‖1 = 160, all classified "Sleeping Bag"; VOneResNet50 attacks at ‖δ‖∞ = 4/255, ‖δ‖2 = 2.40, and ‖δ‖1 = 640, classified "Barrel", "Norwegian Elkhound", and "Schipperke".]

Figure B3: Comparison of adversarial images for ResNet50 and VOneResNet50. Visualization of white box attacks under the 3 different norm constraints (L∞, L2, and L1), for the perturbation strength that brings the model to nearly chance performance (Fig. B1). Left: ResNet50; right: VOneResNet50. For each attack, the perturbation is shown on the left (scaled for visualization) and the perturbed image appears next to it. Perturbation strengths and classes of the adversarial images are shown over the images.


B.2 Additional attack controls

The field of adversarial robustness is littered with examples of proposed defense methods that were quickly defeated with some minor attack optimizations or adaptations [68, 84, 97]. For this reason, we took great care to verify our claims of improved robustness of the VOneNets. In addition to the implementation details and controls described in the previous section, we validated our results by further optimizing our attacks on both the standard stochastic VOneResNet50 and the VOneResNet50 with stochasticity removed during the attack and inference.

First, on the standard stochastic VOneResNet50, we ensured that our attack hyper-parameters, including step size (α), number of gradient samples (k), and PGD iterations, are sufficient for evaluating the network's robustness. For PGD with ‖δ‖∞ = 1/255 and 64 iterations, we performed a large grid search over α and k, observing only a marginal increase in attack effectiveness (Fig. B4). For the most effective step size (α = ε/8), increasing the number of gradient samples beyond k = 16 no longer decreased accuracy under the attack. At k = 128, VOneResNet50 accuracy is only reduced from 29.15% to 26.0%, remaining a large margin above the ResNet50 accuracy of 0.8%. Additionally, increasing the number of PGD iterations all the way up to 1000 (with α = ε/8 and k = 16) only further reduced performance by 0.5%. Finally, using 5-10 random restarts did not improve the attack beyond the most effective k and α settings.

[Figure B4: white box accuracy under PGD with ‖δ‖∞ = 1/255 (64 iterations, 500 images) as a function of step size (ε/2 to ε/32) and number of gradient samples (1-128), with the reported VOneResNet50 and ResNet50 accuracies marked (n=3 seeds).]

Figure B4: Further attack optimization does not overturn results. White box accuracy for PGD with ‖δ‖∞ = 1/255, 64 iterations, on 500 images. A large grid search over step size and number of gradient samples only marginally improves attack effectiveness and only marginally reduces model performance. Lines represent mean and shaded areas SD (n=3 seeds).

Second, while boundary attacks are not ideal against the standard stochastic VOneResNet50 due to the shifting boundary line, we investigated VOneResNet50 without stochasticity during attack/inference using the Brendel and Bethge (B&B) attack [10], a powerful white box adversary that aims to find minimal adversarial perturbations along the decision boundary of the network. We reasoned that if the B&B attack offers similar results to those observed using PGD, this is a good sanity check that our observed robustness is reliable and not a result of masked gradients around the input or a quirk of PGD. Using the Foolbox [98] implementation of the L∞ B&B attack, initialized with 500 successful adversarial images generated using PGD ‖δ‖∞ = 8/255, we see nearly identical scores as compared to standard PGD with optimal hyper-parameters, ‖δ‖∞ = 1/1020 and PGD ‖δ‖∞ = 1/255.

We conclude that these additional controls do not qualitatively change our reported results and provide further evidence that the observed improved robustness of the VOneNets is real and not an artifact.


B.3 Common Image Corruptions

For evaluating model robustness to common image corruptions, we used ImageNet-C, a dataset publicly available at https://github.com/hendrycks/robustness [13]. It consists of 15 different types of corruptions, each at five levels of severity, for a total of 75 different perturbations (Fig. B5). The corruption types fall into four categories: noise, blur, weather, and digital effects. Noise includes Gaussian noise, shot noise, and impulse noise; blur includes motion blur, defocus blur, zoom blur, and glass blur; weather includes snow, frost, fog, and brightness; digital effects include JPEG compression, elastic transformations, contrast, and pixelation. For every corruption type, and at every severity, we tested on all 50,000 images from the ImageNet validation set. We note that these images have been JPEG compressed by the original authors for sharing, which is known to have a minor but noticeable effect on final network performance.
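A minimal evaluation-loop sketch follows; the directory layout (<root>/<corruption>/<severity>/<class>/...) follows the public ImageNet-C release, but the local path, batch size, and device are placeholders, and the normalization assumes the paper's own preprocessing.

```python
import torch
from torchvision import datasets, transforms

CORRUPTIONS = ["gaussian_noise", "shot_noise", "impulse_noise", "defocus_blur",
               "glass_blur", "motion_blur", "zoom_blur", "snow", "frost", "fog",
               "brightness", "contrast", "elastic_transform", "pixelate",
               "jpeg_compression"]

@torch.no_grad()
def corruption_accuracy(model, root="/path/to/ImageNet-C", device="cuda"):
    # ImageNet-C images are distributed at 224x224, so no resizing is applied.
    tf = transforms.Compose([transforms.ToTensor(),
                             transforms.Normalize([0.5] * 3, [0.5] * 3)])
    acc = {}
    for corruption in CORRUPTIONS:
        for severity in range(1, 6):
            ds = datasets.ImageFolder(f"{root}/{corruption}/{severity}", tf)
            loader = torch.utils.data.DataLoader(ds, batch_size=256)
            correct = total = 0
            for x, y in loader:
                pred = model(x.to(device)).argmax(1).cpu()
                correct += (pred == y).sum().item()
                total += y.numel()
            acc[(corruption, severity)] = correct / total
    return acc
```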

[Figure B5: 4×4 image grid showing the original image and all 15 corruption types.]

Figure B5: Common image corruptions with intermediate severity. Visualization of all 15 common image corruption types, evaluated at severity = 3. First row: original image followed by the noise corruptions; second row: blur corruptions; third row: weather corruptions; fourth row: digital corruptions.


[Figure B6: corruption accuracy vs. severity (1-5) for the four corruption categories (noise, blur, weather, digital), for ResNet50, VOneResNet50, ANT3×3+SIN, and ATL∞.]

Figure B6: VOneResNet50 improves robustness to noise, blur, and digital corruptions. Corruption accuracy vs. severity curves for the four categories of common image corruptions: noise, blur, weather, and digital. Adding the VOneBlock to ResNet50 improves robustness to all corruption categories except weather. The adversarially trained model (ATL∞) is considerably worse at common image corruptions. Adversarial noise combined with Stylized ImageNet training (ANT3×3+SIN) is consistently more robust to all corruption categories. For ResNet50 and VOneResNet50, error-bars represent SD (n=3 seeds).

Table B2: Detailed common image corruption results. Adding the VOneBlock to ResNet50 improves robustness to several of the corruption types. However, for others, particularly fog and contrast, robustness decreases. Interestingly, these are the corruption types on which the adversarially trained model (AT L∞) also does the worst. Adversarial noise combined with Stylized ImageNet training (ANT3×3+SIN) is the most robust model for most of the corruption types. For ResNet50 and VOneResNet50, values are mean and SD (n=3 seeds).

                  Noise                             Blur
Model             Gaussian   Shot       Impulse    Defocus    Glass      Motion     Zoom
                  [%]        [%]        [%]        [%]        [%]        [%]        [%]
ResNet50 [4]      29.6±1.1   27.9±0.9   24.5±1.2   38.8±0.7   26.1±0.6   37.9±0.4   34.5±0.7
AT L∞ [76]        23.6       22.3       14.8       24.4       34.3       33.2       34.9
ANT3×3+SIN [59]   61.6       60.7       59.9       41.5       38.6       46.0       37.3
VOneResNet50      34.6±1.4   33.4±1.2   31.9±1.5   37.8±0.1   35.7±0.1   37.4±0.2   34.0±0.2

                  Weather                                     Digital
Model             Snow       Frost      Fog        Bright     Contrast   Elastic    Pixelate   JPEG
                  [%]        [%]        [%]        [%]        [%]        [%]        [%]        [%]
ResNet50 [4]      30.1±0.5   36.7±0.4   43.6±0.8   66.8±0.2   38.8±0.5   44.8±0.3   47.0±2.7   55.1±1.0
AT L∞ [76]        29.5       31.2       8.4        54.1       10.2       48.4       56.3       58.7
ANT3×3+SIN [59]   47.7       52.2       54.9       68.2       49.3       52.7       58.9       59.3
VOneResNet50      25.2±0.5   36.8±0.4   30.1±0.5   62.4±0.3   28.5±0.5   48.7±0.1   63.3±0.2   61.0±0.3


B.4 Other defense methods

We compared the performance of VOneResNet50 on the described computer vision benchmarks with two training-based defense methods using the ResNet50 architecture: adversarial training with a δ∞ = 4/255 constraint (AT L∞) [76], and adversarial noise combined with Stylized ImageNet training (ANT3×3+SIN) [59].

The AT L∞ model was downloaded from the publicly available adversarially trained models at https://github.com/MadryLab/robustness [76]. This model was trained following the adversarial training paradigm of Madry et al. [17] to solve the min-max problem

\min_\theta \rho(\theta), \quad \text{where} \quad \rho(\theta) = \mathbb{E}_{(x,y)\sim\mathcal{D}}\left[\max_{\delta\in S} L(\theta, x+\delta, y)\right]   (1)

Here the goal is to learn parameters θ minimizing the loss L for training images x and labels y drawn from D, while perturbing x with δ ∈ S to maximally increase the loss. In this case, D is ImageNet, and PGD with δ∞ = 4/255 is used to find the perturbation maximizing the loss for a given (x, y) pair. The δ∞ = 4/255 constrained model was selected for comparison because it had the best performance on our conglomerate benchmark of adversarial attacks and common image corruptions.
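A minimal PyTorch sketch of this min-max procedure is shown below; it is an illustration of Equation 1, not the training code of [76], and the step size, iteration count, and random initialization are illustrative choices.

    import torch
    import torch.nn.functional as F

    def pgd_linf(model, x, y, eps=4/255, alpha=1/255, steps=8):
        # Inner maximization of Eq. (1): ascend the loss within the L-inf ball.
        delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
        for _ in range(steps):
            loss = F.cross_entropy(model(x + delta), y)
            grad, = torch.autograd.grad(loss, delta)
            delta = (delta + alpha * grad.sign()).clamp(-eps, eps)
            delta = delta.detach().requires_grad_(True)
        return (x + delta).detach()

    def adversarial_training_step(model, optimizer, x, y):
        # Outer minimization of Eq. (1): descend on the worst-case examples.
        x_adv = pgd_linf(model, x, y)
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        optimizer.step()
        return loss.item()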

The ANT3×3+SIN model was obtained from https://github.com/bethgelab/game-of-noise. Recently, Rusak et al. showed that training a ResNet50 model with several types of additive input noise improved the robustness to common image corruptions [59]. Inspired by this observation, the authors then trained a model while optimizing a noise distribution that maximally confuses it. Similarly to standard adversarial training, this results in solving a min-max problem:

\min_\theta \max_\phi \; \mathbb{E}_{(x,y)\sim\mathcal{D}}\, \mathbb{E}_{\delta\sim p_\phi(\delta)}\left[L(\theta, x+\delta, y)\right]   (2)

where p_φ(δ) is the maximally confusing noise distribution. The main difference to regular adversarial training in Equation 1 is that, while in the former δ is optimized directly, here δ is found by optimizing a constrained distribution with local spatial correlations. Complementing the adversarial noise training with Stylized ImageNet training [25] produced the model with the current best robustness on the ImageNet-C benchmark as a standalone method.


C VOneNet implementation details

C.1 Convolutional layer

The convolutional layer of the VOneBlock is a mathematically parameterized Gabor filter bank (GFB). We set the stride of the GFB to four, yielding a 56×56 spatial map of activations. Since the number of channels in most CNNs' first convolution is relatively small (64 in the architectures adapted), we used a larger number in the VOneBlock so that the Gabors would cover the large parameter space and better approximate primate V1. We set the main VOneNet models to contain 512 channels, equally split between simple and complex cells (see Fig. C2A for an analysis of how the number of channels affects performance). Each channel in the GFB convolves a single color channel from the input image.

The Gabor function consists of a two-dimensional grating with a Gaussian envelope and is described by the following equation:

G_{\theta,f,\phi,n_x,n_y}(x, y) = \frac{1}{2\pi\sigma_x\sigma_y} \exp\left[-0.5\left(\frac{x_{\mathrm{rot}}^2}{\sigma_x^2} + \frac{y_{\mathrm{rot}}^2}{\sigma_y^2}\right)\right] \cos(2\pi f x_{\mathrm{rot}} + \phi)   (3)

where

x_{\mathrm{rot}} = x\cos(\theta) + y\sin(\theta), \qquad y_{\mathrm{rot}} = -x\sin(\theta) + y\cos(\theta)   (4)

\sigma_x = \frac{n_x}{f}, \qquad \sigma_y = \frac{n_y}{f}   (5)

x_rot and y_rot are the orthogonal and parallel orientations relative to the grating, θ is the angle of the grating orientation, f is the spatial frequency of the grating, φ is the phase of the grating relative to the Gaussian envelope, and σ_x and σ_y are the standard deviations of the Gaussian envelope orthogonal and parallel to the grating, which can be defined as multiples (n_x and n_y) of the grating cycle (inverse of the frequency).
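For concreteness, Equations 3-5 can be evaluated on a pixel grid as sketched below. The pixels-per-degree value is derived from the 8 deg field of view and 224 px input discussed in Section A; the function and its defaults are illustrative, not the released implementation.

    import numpy as np

    def gabor_kernel(theta, f, phi, nx, ny, ksize=25, ppd=224 / 8.0):
        # f is in cycles/deg; ppd (pixels per degree) converts it to cycles/px.
        f_px = f / ppd
        sigma_x, sigma_y = nx / f_px, ny / f_px            # Eq. (5), in pixels
        half = ksize // 2
        y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
        x_rot = x * np.cos(theta) + y * np.sin(theta)      # Eq. (4)
        y_rot = -x * np.sin(theta) + y * np.cos(theta)
        envelope = np.exp(-0.5 * ((x_rot / sigma_x) ** 2 + (y_rot / sigma_y) ** 2))
        grating = np.cos(2 * np.pi * f_px * x_rot + phi)   # Eq. (3)
        return envelope * grating / (2 * np.pi * sigma_x * sigma_y)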

Although the Gabor filter greatly reduces the number of parameters defining the linear spatial component of V1 neurons, it still has five parameters per channel. Fortunately, there is a vast literature in neurophysiology with detailed characterizations of primate V1 response properties, which can be used to constrain these parameters. To instantiate a VOneBlock with C_V1 channels, we sampled C_V1 values for each of the parameters according to an empirically constrained distribution (Table C1). Due to the resolution of the input images, we limited the ranges of spatial frequencies (f < 5.6 cpd) and number of cycles (n > 0.1).

Table C1: Empirical distributions used to sample the GFB parameters in the VOneBlock.

Parameter    Reference
θ            De Valois et al. [78], Fig. 5: orientation tuning (foveal)
f            De Valois et al. [79], Fig. 6: peak spatial frequency (X cells, foveal)
φ            Uniform distribution ∈ [0, 360] deg
n_x, n_y     Ringach 2002 [80], Fig. 4: (n_x, n_y) joint distribution

Critical to the correct implementation of the biologically-constrained parameters of the GFB is the choice of the model's field of view in degrees. As previously mentioned in Section A, we used 8 deg as the input spatial extent for all CNN models. It is important to note that the set of spatial filters in the VOneBlock's GFB differs considerably from those in the first convolution of most CNNs (Fig. C1). While standard CNNs learn filters that resemble Gabors in their input layer [1, 99, 100], due to the limited span of their kernels, these filters do not vary significantly in size and spatial frequency. V1 neurons, on the other hand, are known to exhibit a wide range of receptive field properties. This phenomenon is captured in the VOneBlock, with the spatial frequencies and receptive field sizes of the Gabors ranging over more than one order of magnitude. Due to this high variability, we set the convolution kernel to be 25×25 px, which is considerably larger than those in standard CNNs (Fig. C1).


[Image: example first-convolution filters for VOneBlock, ResNet50, CORnet-S, and AlexNet]

Figure C1: Comparison between filters in the first convolution of standard CNNs and the VOneBlock. Example filters from the first convolution in the VOneBlock, ResNet50, CORnet-S, and AlexNet (from left to right). VOneBlock filters are all parameterized as Gabors, varying considerably in size and spatial frequency. Standard CNNs have some filters with shapes other than Gabors, particularly center-surround, and are limited in size by their small kernel. Kernel sizes are 25 px, 7 px, 7 px, and 11 px for VOneBlock, ResNet50, CORnet-S, and AlexNet, respectively.


C.2 Nonlinear layer

The VOneBlock's nonlinear layer has two different nonlinearities that are applied to each channel depending on its cell type: a rectified linear transformation for simple cells (6), and the spectral power of a quadrature phase-pair (7) for complex cells:

S^{nl}_{\theta,f,\phi,n_x,n_y} = \begin{cases} S^{l}_{\theta,f,\phi,n_x,n_y} & \text{if } S^{l}_{\theta,f,\phi,n_x,n_y} > 0 \\ 0 & \text{otherwise} \end{cases}   (6)

C^{nl}_{\theta,f,\phi,n_x,n_y} = \frac{1}{\sqrt{2}} \sqrt{\left(C^{l}_{\theta,f,\phi,n_x,n_y}\right)^2 + \left(C^{l}_{\theta,f,\phi+\pi/2,n_x,n_y}\right)^2}   (7)

where S^l and S^{nl} are the linear and nonlinear responses of a simple neuron, and C^l and C^{nl} are the same for a complex neuron.
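In PyTorch terms, Equations 6 and 7 amount to a ReLU and a quadrature-pair energy computation. The channel layout below (complex cells stored as adjacent phase / phase + π/2 pairs after the simple-cell channels) is an illustrative assumption, not necessarily the layout of the released code.

    import math
    import torch

    def vone_nonlinearities(linear_out, n_simple):
        # Simple cells, Eq. (6): half-wave rectification of the linear response.
        simple = torch.relu(linear_out[:, :n_simple])
        # Complex cells, Eq. (7): energy of a quadrature phase pair, assuming
        # the remaining channels alternate between phase phi and phi + pi/2.
        quad = linear_out[:, n_simple:]
        energy = torch.sqrt(quad[:, 0::2] ** 2 + quad[:, 1::2] ** 2) / math.sqrt(2)
        return torch.cat([simple, energy], dim=1)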

C.3 Neuronal stochasticity generator

In awake monkeys, spike trains of V1 neurons are approximately Poisson, i.e. the variance and mean of spike counts in a given time-window over a set of repetitions are roughly the same [82]. We incorporated stochasticity into the VOneBlock to emulate this property of neuronal responses. Since the Poisson distribution is not continuous, it breaks the gradients in a white box attack, giving a false sense of robustness [68]. In order to avoid this situation and facilitate the evaluation of the model's real robustness, our neuronal stochasticity generator, as implemented, uses a continuous second-order approximation of Poisson noise by adding Gaussian noise with variance equal to the activation:

R_s \sim \mathcal{N}\left(\mu = R_{ns},\ \sigma^2 = R_{ns}\right)   (8)

where R_{ns} and R_s are the non-stochastic and stochastic responses of a neuron.

To approximate the same levels of neuronal stochasticity as primate V1 neurons, it is critical that the VOneBlock activations are in the same range as the V1 neuronal responses (number of spike counts in a given time-window). Thus, we applied an affine transformation to the activations so that both the mean stimulus response and the mean baseline activity are the same as those of a population of V1 neurons (Table C2 shows the mean responses and spontaneous activity of V1 neurons measured in different time-windows).

Table C2: Mean stimulus responses and spontaneous activity of a population of V1 neurons for different time-windows.

Time-window [ms]   Stimulus response [spikes]   Spontaneous activity [spikes]
25                 0.324                        0.073
50                 0.655                        0.145
100                1.05                         0.29
200                1.38                         0.58

Since the exact time-window over which responses in V1 are integrated during visual perception is still an open question, we considered a time-window of 25 ms (Fig. C2B). In order to keep the outputs of the VOneBlock in a range that does not deviate considerably from the typical range of activations in CNNs, so that the model can be efficiently trained using standard training parameters, we applied the inverse transformation to scale the outputs back to their original range after the stochastic layer.
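Putting Equation 8 and the scaling together, the stochastic layer can be sketched as follows. The `scale` factor stands in for the affine fit to the Table C2 statistics (matching of the baseline activity is omitted for brevity), and sampling with `randn_like` is the reparameterization trick mentioned in Section B.1, which keeps gradients defined through the noise.

    import torch

    def v1_stochasticity(x, scale):
        # Map activations into the spike-count range of V1 (25 ms window).
        r = x * scale
        # Eq. (8): Gaussian noise with variance equal to the mean activation,
        # a continuous second-order approximation of Poisson spiking.
        r = r + torch.sqrt(torch.relu(r)) * torch.randn_like(r)
        # Inverse transformation: return to the typical CNN activation range.
        return r / scale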

C.4 Hyper-parameters and other design choices

We developed VOneNets from three different standard CNN architectures: ResNet50, CORnet-S, and AlexNet. As previously mentioned, we replaced the first block of each architecture by the VOneBlock and the transition layer to create the respective VOneNet. Table C3 contains the layers that were removed for each base architecture. Except for CORnet-S, the layers removed only contained a single


convolution, nonlinearity, and maxpool. Since CORnet-S already had a set of layers pre-committed to V1, we replaced them by the VOneBlock. The torchvision implementation of AlexNet contains a combined stride of eight in the removed layers (four in the convolution and two in the maxpool), followed by a stride of one in the second convolution (outside of the removed layers). In order to more easily adapt it to a VOneBlock with a stride of four, we slightly adapted AlexNet's architecture so that it had strides of two in these three layers (first convolution, first maxpool, and second convolution). The results shown in Fig. 2 were obtained using this modified architecture.

Table C3: Layers removed from each base CNN architecture for creating the VOneNets.

Architecture   Layers removed
AlexNet        Conv, ReLU, MaxPool
ResNet50       Conv, Batch-Norm, ReLU, MaxPool
CORnet-S       Conv, Batch-Norm, ReLU, MaxPool, Conv, Batch-Norm, ReLU

The VOneNet architecture was designed to have as few hyper-parameters as possible. When possible, these were constrained either by the base CNN architecture or by neurobiological data: most parameters of the GFB were instantiated by sampling from neuronal distributions of primate V1 neurons; the kernel size was set to 25×25 px to capture the high variability in Gabor sizes; like all other CNN models, the spatial extent of the field of view was set to 8 deg (Section A). Nevertheless, there were two hyper-parameters where the choice was rather arbitrary: the number of channels in the VOneBlock, and the time-window for integrating V1 responses used to scale the activations prior to the stochasticity generator.

Since primate V1 has neurons tuned to a wide range of spatial frequencies for each orientation and at any given location of the visual space [101], we expected that a large number of channels would be required to cover all of the combinations of Gabor parameters. For the VOneResNet50 architecture, we varied the number of channels in the VOneBlock between 64 and 1024, equally split between simple and complex cells, and measured the performance of the several models after training (Fig. C2A). Clean ImageNet and common image corruption performance improved with channel number. Remarkably, the opposite happened for white box accuracy, with the variants of VOneResNet50 with 64 and 128 channels achieving the highest robustness to white box attacks. This result is particularly interesting since it had been shown that increasing the number of channels in all layers of a CNN improves robustness to white box attacks in both standard and adversarially trained models [17]. Therefore, this shows that the improvement in white box robustness of VOneNets when compared to their base models cannot be attributed to the higher number of channels. For the main model, we set the channel number to 512, as it offered a good compromise between the different benchmarks. Regarding the time-window for integrating V1 responses, we observed small effects on the models' performance when varying it between 25 and 200 ms. We chose 25 ms for the main model due to a small trend for higher robustness to white box attacks with shorter time-windows.

[Plots: accuracy (clean, perturbation mean, white box, corruption) as a function of (A) the number of VOneBlock channels (64, 128, 256, 512, 1024; 25 ms time-window) and (B) the activity time-window (25, 50, 100, 200 ms; 512 channels)]

Figure C2: Exploration of VOneBlock's hyper-parameters in the VOneResNet50 architecture. A. Performance of different VOneResNet50 models with varying number of channels in the VOneBlock. B. Same as in A, but varying the time-window for integrating neuronal activity. Error bars show SD (n=3 seeds). Red arrows represent the values used in the main model.


C.5 Training details

We used PyTorch 0.4.1 and trained the models using ImageNet 2012 [44]. Images were preprocessed (1) for training: random crop to 224×224 pixels and random flipping left and right; (2) for validation: central crop to 224×224 pixels. Preprocessing was followed by normalization: subtraction and division by [0.5, 0.5, 0.5]. We used a batch size of 256 images and trained either on 2 GPUs (NVIDIA Titan X / GeForce 1080Ti) or 1 GPU (Quadro RTX 6000 or V100) for 70 epochs. We used step learning rate scheduling: 0.1 starting learning rate, divided by 10 every 20 epochs. For optimization, we used Stochastic Gradient Descent with weight decay 0.0001, momentum 0.9, and a cross-entropy loss between image labels and model predictions (logits).
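In current PyTorch idiom, this optimization setup corresponds roughly to the following sketch (`model` and `train_loader` are assumed to be defined; this is a paraphrase of the recipe above, not the released training script):

    import torch

    optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                                momentum=0.9, weight_decay=1e-4)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)
    criterion = torch.nn.CrossEntropyLoss()

    for epoch in range(70):
        for images, labels in train_loader:  # batches of 256, preprocessed as above
            optimizer.zero_grad()
            loss = criterion(model(images.cuda()), labels.cuda())
            loss.backward()
            optimizer.step()
        scheduler.step()  # learning rate divided by 10 every 20 epochs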

Model weights and code are available at https://github.com/dicarlolab/vonenet.


D VOneBlock variant details

To investigate which components of the VOneBlock are responsible for the increased robustness, we created six variants of the VOneBlock by removing or modifying one of its parts. In all cases, we trained from scratch a corresponding variant of VOneResNet50 with the modified VOneBlock. All other properties of the model and components of the VOneBlock were left unmodified. Here we describe in more detail each of these variants (the name of the variant refers to the component removed):

• V1 distributions: the GFB parameters were sampled from uniform distributions with the same domain as the empirical distributions used in the default VOneBlock (Table C1).
• Low SF: when sampling the spatial frequencies for the GFB, neurons with low peak spatial frequencies (f < 2 cpd) were removed from the empirical distribution [79].
• High SF: similar to the previous variant, but removing neurons with high peak spatial frequencies (f > 2 cpd).
• Complex NL: complex cell channels were removed from the VOneBlock and replaced by simple cell channels.
• Simple NL: simple cell channels were removed from the VOneBlock and replaced by complex cell channels.
• V1 stochasticity: the V1 stochasticity generator was removed from the VOneBlock.

While all of the variants resulted in worse overall mean performance than the unmodified VOneResNet50 (Table 2), some improved specific benchmarks: the variant without high SF had higher accuracy under noise and blur corruptions (Table D1), the variant without V1 stochasticity had higher accuracy on clean images and on images with noise and weather corruptions (Tables 2 and D1), and the variant without V1 distributions had higher accuracy under white box attacks with high L∞ perturbations (Table D2).

Table D1: Difference between the common image corruption accuracies of the several VOneResNet50 variants and the unmodified VOneResNet50. Removal of simple cells reduced accuracy for all common image corruption types. Removal of some components of the VOneBlock improved accuracy for specific common image corruptions: removing high SF consistently improved performance for noise and blur corruptions, and removing V1 stochasticity improved performance for noise and weather corruptions.

Component      Noise                             Blur
removed        Gaussian   Shot       Impulse    Defocus    Glass      Motion     Zoom
               [Δ]        [Δ]        [Δ]        [Δ]        [Δ]        [Δ]        [Δ]
V1 dist.       -2.8±0.9   -2.4±0.7   -2.9±1.1   -1.6±0.1   0.4±0.2    -0.8±0.3   -0.2±0.5
Low SF         -1.4±0.5   -1.0±0.2   -1.5±0.5   -0.6±0.6   -1.1±0.2   -1.2±0.4   -0.5±0.7
High SF        6.8±1.3    7.3±1.1    7.0±1.3    5.6±0.3    11.5±0.3   3.7±0.0    6.1±0.1
Complex        1.0±1.6    1.2±1.3    0.7±1.8    1.2±0.2    -1.5±0.1   0.2±0.8    0.4±1.1
Simple         -6.1±1.6   -5.4±1.5   -6.4±1.9   -4.1±0.4   -4.2±0.4   -4.9±0.5   -5.4±0.3
V1 stoch.      2.2±2.5    2.2±2.5    2.1±2.9    3.3±0.1    -5.5±0.1   0.9±0.7    0.9±0.1

Component      Weather                                     Digital
removed        Snow       Frost      Fog        Bright     Contrast   Elastic    Pixelate   JPEG
               [Δ]        [Δ]        [Δ]        [Δ]        [Δ]        [Δ]        [Δ]        [Δ]
V1 dist.       -0.2±0.6   -0.9±0.6   -1.9±1.1   -0.8±0.5   -1.4±0.9   0.3±0.4    -1.7±0.3   0.8±0.5
Low SF         -0.4±0.3   -0.6±0.4   0.9±0.3    -0.2±0.2   -0.4±0.5   -0.9±0.5   -1.3±0.2   -0.5±0.4
High SF        1.4±1.0    -0.7±0.7   -9.5±0.4   -5.2±0.4   -4.4±0.5   2.4±0.2    1.9±0.4    -4.6±0.4
Complex        0.7±0.4    0.4±0.3    4.3±0.2    1.3±0.2    2.9±0.2    -0.5±0.2   -1.2±0.0   -1.9±0.5
Simple         -5.0±0.5   -9.0±0.5   -4.4±0.5   -3.9±0.2   -4.8±0.4   -3.9±0.5   -3.5±0.4   -1.8±0.2
V1 stoch.      2.5±0.4    0.3±0.3    12.0±0.5   2.6±0.3    10.0±0.8   -2.8±0.3   -4.0±0.7   -4.0±0.4


Table D2: Difference between the white box attack accuracies of the several VOneResNet50 variants and the unmodified VOneResNet50. Removal of V1 stochasticity considerably reduced accuracy under all white box attacks. Removal of V1 distributions improved accuracy for the L∞ white box attack with high perturbation strength.

Component      PGD-L1                 PGD-L2                 PGD-L∞
removed        40          160        0.15        0.60       1/1020      1/255
               [Δ]         [Δ]        [Δ]         [Δ]        [Δ]         [Δ]
V1 dist.       -0.5±0.8    0.1±1.1    -0.8±0.2    0.6±1.0    0.3±0.3     3.0±1.8
Low SF         -0.8±1.2    -3.1±1.1   -0.6±1.0    -2.3±0.8   -0.8±1.3    -1.8±1.0
High SF        -2.2±0.9    -4.7±2.6   -2.6±1.5    -5.1±1.2   -2.6±1.4    -5.5±1.1
Complex        -1.3±0.9    -7.4±1.6   -1.1±0.9    -7.2±1.2   -1.9±1.0    -7.9±1.2
Simple         -2.5±0.8    -1.4±1.4   -2.9±1.1    -1.7±1.2   -2.7±1.2    -0.6±0.8
Noise          -23.7±0.9   -36.5±0.8  -21.6±0.5   -40.1±0.4  -32.2±0.4   -29.2±0.4



interacts synergistically with the V1 model features and drives the downstream layers to learn representations more robust to adversarial perturbations.

Our approach bears some similarity to the pioneering study by Li et al. [40], in which a model's representations were regularized to approximate mouse V1, increasing its robustness on the grey-scale CIFAR dataset; however, here we go further in several key ways. First, while the mouse is gaining traction as a model for studying vision, a vast literature has established macaque vision as a quantitatively accurate model of human vision in general, and human object recognition in particular [85, 86]. Visual acuity in mice is much lower than in macaques and humans [87, 79], suggesting that vision in the mouse may serve different behavioral functions than in primates. Further, the regularization approach employed by Li et al. does not allow a clear disambiguation of which aspects of mouse V1 contribute to the improved robustness. Since the components of the VOneBlock proposed here are mappable to the brain, we can dissect the contributions of different neuronal populations in primate V1 to robustness against specific image perturbations. Finally, extending the robustness gains of biologically-constrained CNNs from gray-scale CIFAR to the full ImageNet dataset is a critical step towards real-world, human-level applications.

The gains achieved by VOneNets are substantial, particularly against white box attacks, and have tangible benefits over other defense methods. Though adversarial training still provides the strongest defense for the attack statistics it is trained on, it has significant downsides. Beyond its considerable additional computational cost during training, adversarially trained networks have significantly lower performance on clean images, corrupted images, and images perturbed with attack statistics not seen during training, implying that adversarial training in its current form may not be viable as a general defense method. In contrast, by deploying an architectural change, VOneNets improve robustness to all adversarial attacks tested and many common image corruptions, and they accomplish this with no additional training overhead. This also suggests that the architectural gains of the VOneNet could be stacked with other training-based defense methods to achieve even greater overall robustness gains.

Relative to current methods, the success of this approach derives from engineering in a better approximation of the architecture of the most well-studied primate visual area, combined with task optimization of the remaining free parameters of the downstream architecture [26]. This points to two potentially synergistic avenues for further gains: an even more neurobiologically precise model of V1 (i.e. a better VOneBlock), and an even more neurobiologically precise model of the downstream architecture. For example, one could extend the biological fidelity of the VOneBlock, in the hope that it confers even greater robustness, by including properties such as divisive normalization [48, 49] and contextual modulation [88], to name a few. In addition to V1, the retina and the Lateral Geniculate Nucleus (LGN) also play important roles in pre-processing visual information, only partially captured by the current V1 model, suggesting that extending the work done here to a retina/LGN front-end has the potential to better align CNNs with human visual object recognition [89, 90]. In addition, though our initial experiments show that multiple components of our V1 model work together to produce greater robustness, the nature of the relationship between adversarial robustness and explained variance in V1 responses is far from resolved. In particular, as the VOneNets with and without stochasticity achieve very similar V1 explained variances but have markedly different levels of adversarial robustness, it is clear that much theoretical work remains in better understanding both when and why matching biology leads to more robust computer vision models.

While neuroscience has recently seen a huge influx of new neurally-mechanistic models and tools drawn from machine learning [91, 92], the most recent advances in machine learning and computer vision have been driven mostly by the widespread availability of computational resources and very large labeled datasets [93], rather than by an understanding of the relevant brain mechanisms. Under the belief that biological intelligence still has a lot of untapped potential to contribute, a number of researchers have been pushing for more neuroscience-inspired machine learning algorithms [37, 42, 32]. The work presented here shows that this aspiration can become reality: the models presented here, drawn directly from primate neurobiology, indeed require less training to achieve more human-like behavior. This is one turn of a new virtuous circle, wherein neuroscience and artificial intelligence each feed into and reinforce the understanding and ability of the other.


Broader Impact

From a technological perspective, the ethical implications of our work are largely aligned with those of computer vision in general. While there is undoubtedly potential for malicious and abusive uses of computer vision, particularly in the form of discrimination or invasion of privacy, we believe that our work will aid in the production of more robust and intuitive behavior of computer vision algorithms. As CNNs are deployed in real-world situations, it is critical that they behave with the same level of stability as their human counterparts. In particular, they should at the very least not be confused by changes in input statistics that do not confuse humans. We believe that this work will help to bridge that gap. Furthermore, while algorithms are often thought to be impartial or unbiased, much research has shown that data-driven models like current CNNs are often even more biased than humans, implicitly keying in on and amplifying stereotypes. For this reason, making new CNNs that behave more like humans may actually reduce, or at least make more intuitive, their implicit biases. Unfortunately, we note that even with our work these issues are not yet resolved. While we developed a more neurobiologically-constrained algorithm, it comes nowhere close to human-like behaviour in the wide range of circumstances experienced in the real world. Finally, from the perspective of neuroscience, we think that this work introduces a more accurate model of the primate visual system. Ultimately, better models contribute to a better mechanistic understanding of how the brain works and how to intervene in the case of illness or disease states. We think that our model contributes a stronger foundation for understanding the brain and building novel medical technology, such as neural implants for restoring vision in people with impairments.

Acknowledgments and Disclosure of Funding

We thank J. Anthony Movshon and Corey M. Ziemba for access to the V1 neural dataset; MIT-IBM Satori for providing the compute necessary for this project; John Cohn for technical support; Sijia Liu and Wieland Brendel for advice on adversarial attacks; and Adam Marblestone for insightful discussion. This work was supported by the PhRMA Foundation Postdoctoral Fellowship in Informatics (T.M.); the Semiconductor Research Corporation (SRC) and DARPA (J.D., M.S., J.J.D.); the Massachusetts Institute of Technology Shoemaker Fellowship (M.S.); Office of Naval Research grant MURI-114407 (J.J.D.); the Simons Foundation grant SCGB-542965 (J.J.D.); and the MIT-IBM Watson AI Lab grant W1771646 (J.J.D.).

References

[1] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet Classification with Deep Convolutional Neural Networks". In: Advances in Neural Information Processing Systems 25. Ed. by F. Pereira et al. Curran Associates, Inc., 2012, pp. 1097–1105.

[2] Christian Szegedy et al. "Going Deeper with Convolutions". Sept. 2014. arXiv: 1409.4842 [cs.CV].

[3] Karen Simonyan and Andrew Zisserman. "Very Deep Convolutional Networks for Large-Scale Image Recognition". Sept. 2014. arXiv: 1409.1556 [cs.CV].

[4] Kaiming He et al. "Deep Residual Learning for Image Recognition". Dec. 2015. arXiv: 1512.03385 [cs.CV].

[5] Kaiming He et al. "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification". In: Proceedings of the IEEE International Conference on Computer Vision (2015), pp. 1026–1034. DOI: 10.1109/ICCV.2015.123. arXiv: 1502.01852.

[6] Christian Szegedy et al. "Intriguing properties of neural networks". Dec. 2013. arXiv: 1312.6199 [cs.CV].

[7] Nicholas Carlini and David Wagner. "Towards Evaluating the Robustness of Neural Networks". Aug. 2016. arXiv: 1608.04644 [cs.CR].

[8] Pin-Yu Chen et al. "EAD: Elastic-Net Attacks to Deep Neural Networks via Adversarial Examples". Sept. 2017. arXiv: 1709.04114 [stat.ML].

[9] Jérôme Rony et al. "Decoupling Direction and Norm for Efficient Gradient-Based L2 Adversarial Attacks and Defenses". Nov. 2018. arXiv: 1811.09600 [cs.CV].


[10] Wieland Brendel et al. "Accurate, reliable and fast robustness evaluation". July 2019. arXiv: 1907.01003 [stat.ML].

[11] Samuel Dodge and Lina Karam. "A Study and Comparison of Human and Deep Learning Recognition Performance Under Visual Distortions". May 2017. arXiv: 1705.02498 [cs.CV].

[12] Robert Geirhos et al. "Generalisation in humans and deep neural networks". Aug. 2018. arXiv: 1808.08750 [cs.CV].

[13] Dan Hendrycks and Thomas Dietterich. "Benchmarking Neural Network Robustness to Common Corruptions and Perturbations". Mar. 2019. arXiv: 1903.12261 [cs.LG].

[14] Nilaksh Das et al. "Keeping the Bad Guys Out: Protecting and Vaccinating Deep Learning with JPEG Compression". May 2017. arXiv: 1705.02900 [cs.CV].

[15] Xuanqing Liu et al. "Towards Robust Neural Networks via Random Self-ensemble". Dec. 2017. arXiv: 1712.00673 [cs.LG].

[16] Weilin Xu, David Evans, and Yanjun Qi. "Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks". In: arXiv [cs.CV] (Apr. 2017).

[17] Aleksander Madry et al. "Towards Deep Learning Models Resistant to Adversarial Attacks". June 2017. arXiv: 1706.06083 [stat.ML].

[18] Yang Song et al. "PixelDefend: Leveraging Generative Models to Understand and Defend against Adversarial Examples". Oct. 2017. arXiv: 1710.10766 [cs.LG].

[19] Guneet S. Dhillon et al. "Stochastic Activation Pruning for Robust Adversarial Defense". Mar. 2018. arXiv: 1803.01442 [cs.LG].

[20] Jacob Buckman et al. "Thermometer Encoding: One Hot Way To Resist Adversarial Examples". Feb. 2018. URL: https://openreview.net/forum?id=S18Su--CW.

[21] Chuan Guo et al. "Countering Adversarial Images using Input Transformations". Feb. 2018. URL: https://openreview.net/forum?id=SyJ7ClWCb.

[22] Claudio Michaelis et al. "Benchmarking Robustness in Object Detection: Autonomous Driving when Winter is Coming". (2019), pp. 1–23. arXiv: 1907.07484. URL: http://arxiv.org/abs/1907.07484.

[23] Andrew Ilyas et al. "Adversarial Examples Are Not Bugs, They Are Features". May 2019. arXiv: 1905.02175 [stat.ML].

[24] D. Linsley et al. "What are the Visual Features Underlying Human Versus Machine Vision?" In: ICCV. 2017. ISBN: 9781538610343. DOI: 10.1109/ICCVW.2017.331. arXiv: 1701.02704.

[25] Robert Geirhos et al. "ImageNet-Trained CNNs Are Biased Towards Texture". In: International Conference on Learning Representations (2019), pp. 1–20. arXiv: 1811.12231.

[26] Daniel L. K. Yamins et al. "Performance-optimized hierarchical models predict neural responses in higher visual cortex". In: Proceedings of the National Academy of Sciences 111.23 (2014), pp. 8619–8624. DOI: 10.1073/pnas.1403112111.

[27] Charles F. Cadieu et al. "Deep neural networks rival the representation of primate IT cortex for core visual object recognition". In: PLoS Computational Biology 10.12 (2014), e1003963.

[28] Seyed-Mahdi Khaligh-Razavi and Nikolaus Kriegeskorte. "Deep supervised, but not unsupervised, models may explain IT cortical representation". In: PLoS Computational Biology 10.11 (Nov. 2014), e1003915.

[29] Umut Güçlü and Marcel A. J. van Gerven. "Deep Neural Networks Reveal a Gradient in the Complexity of Neural Representations across the Brain's Ventral Visual Pathway". In: The Journal of Neuroscience 35.27 (2015), pp. 10005–10014. DOI: 10.1523/JNEUROSCI.5023-14.2015.

[30] Santiago A. Cadena et al. "Deep convolutional models improve predictions of macaque V1 responses to natural images". In: PLoS Computational Biology 15.4 (Apr. 2019), e1006897.

[31] Martin Schrimpf et al. "Brain-Score: Which Artificial Neural Network for Object Recognition is most Brain-Like?" Sept. 2018. URL: https://doi.org/10.1101/407007.


[32] Jonas Kubilius et al. "Brain-Like Object Recognition with High-Performing Shallow Recurrent ANNs". In: Advances in Neural Information Processing Systems 32. Ed. by H. Wallach et al. Curran Associates, Inc., 2019, pp. 12805–12816. URL: http://papers.nips.cc/paper/9441-brain-like-object-recognition-with-high-performing-shallow-recurrent-anns.pdf.

[33] Adam H. Marblestone, Greg Wayne, and Konrad P. Kording. "Toward an Integration of Deep Learning and Neuroscience". In: Frontiers in Computational Neuroscience 10 (Sept. 2016), p. 94.

[34] William Lotter, Gabriel Kreiman, and David Cox. "Deep Predictive Coding Networks for Video Prediction and Unsupervised Learning". May 2016. arXiv: 1605.08104 [cs.LG].

[35] Aran Nayebi and Surya Ganguli. "Biologically inspired protection of deep networks from adversarial attacks". Mar. 2017. arXiv: 1703.09202 [stat.ML].

[36] Jordan Guerguiev, Timothy P. Lillicrap, and Blake A. Richards. "Towards deep learning with segregated dendrites". In: eLife 6 (Dec. 2017).

[37] Demis Hassabis et al. "Neuroscience-Inspired Artificial Intelligence". In: Neuron 95.2 (2017), pp. 245–258. DOI: 10.1016/j.neuron.2017.06.011.

[38] Grace W. Lindsay and Kenneth D. Miller. "How biological attention mechanisms improve task performance in a large-scale visual system model". In: eLife 7 (Oct. 2018).

[39] Hanlin Tang et al. "Recurrent computations for visual pattern completion". In: Proceedings of the National Academy of Sciences 115.35 (2018), pp. 8835–8840. DOI: 10.1073/pnas.1719397115.

[40] Zhe Li et al. "Learning From Brains How to Regularize Machines". Nov. 2019. arXiv: 1911.05072 [cs.LG].

[41] Hosein Hasani, Mahdieh Soleymani, and Hamid Aghajan. "Surround Modulation: A Bio-inspired Connectivity Structure for Convolutional Neural Networks". In: NeurIPS (2019), pp. 15877–15888. URL: http://papers.nips.cc/paper/9719-surround-modulation-a-bio-inspired-connectivity-structure-for-convolutional-neural-networks.

[42] Fabian H. Sinz et al. "Engineering a Less Artificial Intelligence". In: Neuron 103.6 (Sept. 2019), pp. 967–979.

[43] Anthony M. Zador. "A critique of pure learning and what artificial neural networks can learn from animal brains". In: Nature Communications 10.1 (Aug. 2019), p. 3770.

[44] J. Deng et al. "ImageNet: A large-scale hierarchical image database". In: 2009 IEEE Conference on Computer Vision and Pattern Recognition. June 2009, pp. 248–255.

[45] D. H. Hubel and T. N. Wiesel. "Receptive Fields, Binocular Interaction and Functional Architecture in the Cat's Visual Cortex". In: Journal of Physiology (1962), pp. 106–154.

[46] J. P. Jones and L. A. Palmer. "An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex". In: Journal of Neurophysiology 58.6 (1987), pp. 1233–1258. DOI: 10.1152/jn.1987.58.6.1233.

[47] E. H. Adelson and J. R. Bergen. "Spatiotemporal energy models for the perception of motion". In: Journal of the Optical Society of America A 2.2 (1985), pp. 284–299. DOI: 10.1364/JOSAA.2.000284.

[48] David J. Heeger, E. P. Simoncelli, and J. Anthony Movshon. "Computational models of cortical visual processing". In: Proceedings of the National Academy of Sciences 93.2 (1996), pp. 623–627. DOI: 10.1073/pnas.93.2.623.

[49] M. Carandini, D. J. Heeger, and J. A. Movshon. "Linearity and normalization in simple cells of the macaque primary visual cortex". In: The Journal of Neuroscience 17.21 (Nov. 1997), pp. 8621–44.

[50] Maximilian Riesenhuber and Tomaso Poggio. "Hierarchical models of object recognition in cortex". In: Nature Neuroscience 2.11 (1999), pp. 1019–1025. DOI: 10.1038/14819.


[51] Thomas Serre, Aude Oliva, and Tomaso Poggio. "A feedforward architecture accounts for rapid categorization". In: Proceedings of the National Academy of Sciences 104.15 (Apr. 2007), pp. 6424–6429.

[52] Nicole C. Rust et al. "Spatiotemporal Elements of Macaque V1 Receptive Fields". In: Neuron 46 (2005), pp. 945–956. DOI: 10.1016/j.neuron.2005.05.021.

[53] Brett Vintch, J. A. Movshon, and E. P. Simoncelli. "A Convolutional Subunit Model for Neuronal Responses in Macaque V1". In: Journal of Neuroscience 35.44 (2015), pp. 14829–14841. DOI: 10.1523/JNEUROSCI.2815-13.2015.

[54] Nicolas Papernot et al. "Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks". Nov. 2015. arXiv: 1511.04508 [cs.CR].

[55] Mayank Singh et al. "Harnessing the Vulnerability of Latent Layers in Adversarially Trained Models". May 2019. arXiv: 1905.05186 [cs.LG].

[56] Cihang Xie et al. "Feature denoising for improving adversarial robustness". In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019, pp. 501–509.

[57] Eric Wong, Leslie Rice, and J. Zico Kolter. "Fast is better than free: Revisiting adversarial training". Jan. 2020. URL: https://openreview.net/forum?id=BJx040EFvH.

[58] Dan Hendrycks et al. "AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty". Dec. 2019. arXiv: 1912.02781 [stat.ML].

[59] Evgenia Rusak et al. "Increasing the robustness of DNNs against image corruptions by playing the Game of Noise". Jan. 2020. arXiv: 2001.06057 [cs.CV].

[60] Ali Shafahi et al. "Adversarial training for free". In: Advances in Neural Information Processing Systems 32. Ed. by H. Wallach et al. Curran Associates, Inc., 2019, pp. 3358–3369.

[61] Eric Wong, Leslie Rice, and J. Zico Kolter. "Fast is better than free: Revisiting adversarial training". Sept. 2020. URL: https://openreview.net/forum?id=BJx040EFvH.

[62] Dimitris Tsipras et al. "Robustness May Be at Odds with Accuracy". Sept. 2019. URL: https://openreview.net/forum?id=SyxAb30cY7.

[63] Yash Sharma and Pin-Yu Chen. "Attacking the Madry Defense Model with L1-based Adversarial Examples". Oct. 2017. arXiv: 1710.10733 [stat.ML].

[64] Lukas Schott et al. "Towards the first adversarially robust neural network model on MNIST". May 2018. arXiv: 1805.09190 [cs.CV].

[65] Cihang Xie et al. "Mitigating Adversarial Effects Through Randomization". Feb. 2018. URL: https://openreview.net/forum?id=Sk9yuql0Z.

[66] Adnan Siraj Rakin, Zhezhi He, and Deliang Fan. "Parametric Noise Injection: Trainable Randomness to Improve Deep Neural Network Robustness against Adversarial Attack". Nov. 2018. arXiv: 1811.09310 [cs.LG].

[67] Joonho Lee et al. "ProbAct: A Probabilistic Activation Function for Deep Neural Networks". May 2019. arXiv: 1905.10761 [cs.LG].

[68] Anish Athalye, Nicholas Carlini, and David Wagner. "Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples". Feb. 2018. arXiv: 1802.00420 [cs.LG].

[69] Alex Krizhevsky. "Learning Multiple Layers of Features from Tiny Images". Apr. 2009. URL: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.222.9220&rep=rep1&type=pdf.

[70] Adam Paszke et al. "Automatic differentiation in PyTorch". Oct. 2017. URL: https://openreview.net/forum?id=BJJsrmfCZ.

[71] Saining Xie et al. "Aggregated Residual Transformations for Deep Neural Networks". Nov. 2016. arXiv: 1611.05431 [cs.CV].

[72] Gao Huang et al. "Densely Connected Convolutional Networks". Aug. 2016. arXiv: 1608.06993 [cs.CV].

[73] Forrest N. Iandola et al. "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size". Feb. 2016. arXiv: 1602.07360 [cs.CV].

[74] Ningning Ma et al. "ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design". July 2018. arXiv: 1807.11164 [cs.CV].

[75] Mingxing Tan et al. "MnasNet: Platform-Aware Neural Architecture Search for Mobile". July 2018. arXiv: 1807.11626 [cs.CV].


[76] Logan Engstrom et al. Robustness (Python Library). 2019. URL: https://github.com/MadryLab/robustness.

[77] Jeremy Freeman et al. "A functional and perceptual signature of the second visual area in primates". In: Nature Neuroscience 16.7 (July 2013), pp. 974–981.

[78] Russell L. De Valois, E. W. Yund, and Norva Hepler. "The orientation and direction selectivity of cells in macaque visual cortex". In: Vision Research 22 (1982), pp. 531–544.

[79] Russell L. De Valois, Duane G. Albrecht, and Lisa G. Thorell. "Spatial Frequency Selectivity of Cells in Macaque Visual Cortex". In: Vision Research 22 (1982), pp. 545–559.

[80] Dario L. Ringach. "Spatial Structure and Symmetry of Simple-Cell Receptive Fields in Macaque Primary Visual Cortex". In: Journal of Neurophysiology 88 (2002), pp. 455–463.

[81] Yasmine El-Shamayleh et al. "Visual response properties of V1 neurons projecting to V2 in macaque". In: Journal of Neuroscience 33.42 (2013), pp. 16594–16605. DOI: 10.1523/JNEUROSCI.2753-13.2013.

[82] W. R. Softky and C. Koch. "The highly irregular firing of cortical cells is inconsistent with temporal integration of random EPSPs". In: Journal of Neuroscience 13.1 (1993), pp. 334–350. DOI: 10.1523/jneurosci.13-01-00334.1993.

[83] Diederik P. Kingma and Max Welling. "Auto-Encoding Variational Bayes". Dec. 2013. arXiv: 1312.6114v10 [stat.ML].

[84] Nicholas Carlini et al. "On Evaluating Adversarial Robustness". Feb. 2019. arXiv: 1902.06705v2 [cs.LG].

[85] Nikolaus Kriegeskorte, Marieke Mur, and Peter Bandettini. "Representational similarity analysis - connecting the branches of systems neuroscience". In: Frontiers in Systems Neuroscience 2 (Nov. 2008), p. 4.

[86] R. Rajalingham, K. Schmidt, and James J. DiCarlo. "Comparison of Object Recognition Behavior in Human and Monkey". In: Journal of Neuroscience 35.35 (2015), pp. 12127–12136. DOI: 10.1523/JNEUROSCI.0573-15.2015.

[87] Cristopher M. Niell and Michael P. Stryker. "Highly selective receptive fields in mouse visual cortex". In: The Journal of Neuroscience 28.30 (July 2008), pp. 7520–36. DOI: 10.1523/JNEUROSCI.0623-08.2008.

[88] Pieter R. Roelfsema. "Cortical algorithms for perceptual grouping". In: Annual Review of Neuroscience 29 (Jan. 2006), pp. 203–27. DOI: 10.1146/annurev.neuro.29.051605.112939.

[89] Alexander Berardino et al. "Eigen-Distortions of Hierarchical Representations". Oct. 2017. arXiv: 1710.02266 [cs.CV].

[90] Jack Lindsey et al. "A Unified Theory of Early Visual Representations from Retina to Cortex through Anatomically Constrained Deep CNNs". In: ICLR (2019), pp. 1–17. DOI: 10.1101/511535. arXiv: 1901.00945.

[91] Daniel L. K. Yamins and James J. DiCarlo. "Using goal-driven deep learning models to understand sensory cortex". In: Nature Neuroscience 19.3 (Mar. 2016), pp. 356–365.

[92] Alexander Mathis et al. "DeepLabCut: markerless pose estimation of user-defined body parts with deep learning". In: Nature Neuroscience (2018). DOI: 10.1038/s41593-018-0209-y. arXiv: 1804.03142.

[93] Yann LeCun, Yoshua Bengio, and Geoffrey E. Hinton. "Deep learning". In: Nature 521.7553 (2015), pp. 436–444. DOI: 10.1038/nature14539.

[94] T. Marques, M. Schrimpf, and J. J. DiCarlo. "Hierarchical neural network models that more closely match primary visual cortex tend to better explain higher level visual cortical responses". In: Cosyne. 2020.

[95] Alexey Kurakin, Ian Goodfellow, and Samy Bengio. "Adversarial examples in the physical world". July 2016. arXiv: 1607.02533 [cs.CV].

[96] Maria-Irina Nicolae et al. "Adversarial Robustness Toolbox v1.2.0". In: CoRR 1807.01069 (2018). URL: https://arxiv.org/pdf/1807.01069.


[97] Florian Tramer et al. "On Adaptive Attacks to Adversarial Example Defenses". Feb. 2020. arXiv: 2002.08347 [cs.LG].

[98] Jonas Rauber et al. "Foolbox Native: Fast adversarial attacks to benchmark the robustness of machine learning models in PyTorch, TensorFlow, and JAX". In: Journal of Open Source Software 5.53 (2020), p. 2607. DOI: 10.21105/joss.02607. URL: https://doi.org/10.21105/joss.02607.

[99] Matthew D. Zeiler and Rob Fergus. "Visualizing and Understanding Convolutional Networks". Nov. 2013. arXiv: 1311.2901. DOI: 10.1007/978-3-319-10590-1_53.

[100] Chris Olah et al. "An Overview of Early Vision in InceptionV1". In: Distill (2020). https://distill.pub/2020/circuits/early-vision. DOI: 10.23915/distill.00024.002.

[101] Ian Nauhaus et al. "Orthogonal micro-organization of orientation and spatial frequency in primate primary visual cortex". In: Nature Neuroscience 15.12 (2012), pp. 1683–1690. DOI: 10.1038/nn.3255.


Supplementary Material

A Neural Explained Variance

We evaluated how well responses to given images in candidate CNNs explain responses of single V1 neurons to the same images, using a standard neural predictivity methodology based on partial least squares (PLS) regression [31, 32]. We used a neural dataset obtained by extracellular recordings from 102 single units while presenting 450 different images for 100 ms, 20 times each, spanning 4 deg of the visual space. Results from this dataset were originally published in a study comparing V1 and V2 responses to naturalistic textures [77]. To avoid over-fitting, for each CNN we first chose its best V1 layer on a subset of the dataset (n=135 images) and reported the final neural explained variance calculated for the chosen layer on the remaining images of the dataset (n=315 images). Model units were mapped to each V1 neuron linearly using a PLS regression model with 25 components. The mapping was performed for each neuron using 90% of the image responses in the respective dataset and tested on the remaining held-out 10%, in a 10-fold cross-validation strategy. We reported neural explained variance values normalized by the neuronal internal consistency. We calculated V1 neural explained variance for a large pool of CNNs (n=30 models).
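A simplified sketch of this mapping procedure using scikit-learn is given below; layer selection and the normalization by internal consistency are omitted, and `features` (images x model units) and `responses` (images x neurons) are hypothetical variable names.

    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.model_selection import KFold

    def neural_predictivity(features, responses, n_components=25, n_folds=10):
        # 10-fold cross-validated explained variance of each neuron's responses
        # predicted from model features via a 25-component PLS regression.
        folds = KFold(n_splits=n_folds, shuffle=True, random_state=0)
        ev = []
        for train, test in folds.split(features):
            pls = PLSRegression(n_components=n_components)
            pls.fit(features[train], responses[train])
            pred = pls.predict(features[test])
            resid = ((responses[test] - pred) ** 2).sum(axis=0)
            total = ((responses[test] - responses[test].mean(axis=0)) ** 2).sum(axis=0)
            ev.append(1.0 - resid / total)
        return np.mean(ev, axis=0)  # one explained-variance value per neuron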

When using CNNs in a computer vision task, such as object recognition on ImageNet, their field of view (in degrees) is not defined, since the only relevant input property is the image resolution (in pixels). However, when using a CNN to model the ventral stream, the visual spatial extent of the model's input is of key importance to ensure that it is correctly mapped to the data it is trying to explain. We assigned a spatial extent of 8 deg to all models, since this value has been previously used in studies benchmarking CNNs as models of the primate ventral stream [31, 94] and is consistent with the results in Cadena et al. [30].


B Image Perturbations

B.1 White Box Adversarial Attacks

For performing white box adversarial attacks, we used untargeted projected gradient descent (PGD) [17] (also referred to as the Basic Iterative Method [95]) with L∞, L2, and L1 norm constraints. Given an image x, this method uses the gradient of the loss to iteratively construct an adversarial image x_adv which maximizes the model loss within an Lp bound around x. Formally, in the case of an L∞ constraint, PGD iteratively computes x_adv as

x_{t+1} = \mathrm{Proj}_{x+S}\left(x_t + \alpha\, \mathrm{sgn}\left(\nabla_{x_t} L(\theta, x_t, y)\right)\right)

where x is the original image and the Proj operator ensures the final computed adversarial image x_adv is constrained to the space x + S, here the L∞ ball around x. In the case of L1 or L2 constraints, at each iteration ∇_{x_t} L(θ, x_t, y) is scaled to have an L1 or L2 norm of α, and the Proj operator ensures the final x_adv is within an L1 or L2 norm ball around x.

For our white box benchmarks, we used δ∞ ∈ [1/1020, 1/255], δ2 ∈ [0.15, 0.60], and δ1 ∈ [40, 160] constraints, where δ = x − x_adv, for a total of six different attack variants, with two strengths under each norm constraint. We arrived at these strengths by calibrating the lower-strength attack for each norm to give a reasonable degree of variance in normally trained CNNs. Because normally trained CNNs are extremely sensitive to adversarial perturbations, this resulted in perturbations well below the level of perceptibility, and in most cases, particularly with the L∞ constraint, perturbations that would not even be rendered differently by a monitor for display. We set the high attack strength at 4 times the low value, which brought standard ImageNet-trained CNNs to nearly chance performance. Still, even this higher perturbation strength remained practically imperceptible (Fig. B3, left). Each perturbation was computed with 64 PGD iterations and a step size α = ε/32, where ε = ||δ||_p, for the same 5000 images from the ImageNet validation set on a per-model basis, and final top-1 accuracy was reported. The Adversarial Robustness Toolbox [96] was used for computing the attacks.
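A reference sketch of the L∞ variant of this attack in PyTorch is shown below (64 iterations and α = ε/32 as described above; clamping to [0, 1] assumes unnormalized images, and this is an illustration rather than the toolbox implementation we used):

    import torch
    import torch.nn.functional as F

    def pgd_linf_attack(model, x, y, eps, iters=64):
        alpha = eps / 32  # step size, as described above
        x_adv = x.clone().detach()
        for _ in range(iters):
            x_adv.requires_grad_(True)
            loss = F.cross_entropy(model(x_adv), y)
            grad, = torch.autograd.grad(loss, x_adv)
            x_adv = x_adv.detach() + alpha * grad.sign()
            x_adv = x + (x_adv - x).clamp(-eps, eps)  # Proj onto x + S
            x_adv = x_adv.clamp(0.0, 1.0)             # stay a valid image
        return x_adv.detach()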

While adversarial formulations using Lagrangian relaxation, like the C&W attack [7] and the EAD attack [8], are known to be more effective than PGD for finding minimal perturbations under L2 and L1 constraints, these formulations involve computationally expensive line searches, often requiring thousands of iterations to converge, making them computationally difficult to scale to 5000 ImageNet samples for a large number of models. Furthermore, because some of the models we tested operate with stochasticity during inference, we found that other state-of-the-art attacks formulated to efficiently find minimal perturbation distances [9, 10] were generally difficult to tune, as their success becomes stochastic as they near the decision boundary. Thus, we proceeded with the PGD formulation, as we found it to be the most reliable, computationally tractable, and conceptually simple to follow. We share our model and weights, and encourage anyone interested to attack our model to verify our results.

Although the stochastic models were implemented with the reparameterization trick for randomsampling which leaves the gradients intact [83] we performed several sanity checks to make surethat the white box attacks were effectively implemented Following the advice of Athalye et al [68]and Carlini et al [84] we observed that in all of the white box attacks increasing the attack strengthdecreased model performance eventually bringing accuracy to zero (Fig B1) and additionally thatfor any given constraint and attack strength increasing PGD iterations from one to many increasedattack effectiveness (Fig B2) Furthermore to be extra cautious we followed the method Athalye etal [68] used to break Stochastic Activation Pruning [19] and replacednablaxf(x) with

Σ_{i=1}^{k} ∇_x f(x),

effectively performing Monte Carlo (MC) sampling of the loss gradient at every PGD iteration. While at high perturbation strength even k = 1 was sufficient to bring our model's accuracy to zero, we found that k = 10 generally achieved higher attack success at intermediate and large perturbation strengths (Table B1), and thus we used this approach for all attacks on stochastic models.
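A sketch of this gradient estimator, continuing the PGD sketch above and assuming `model` draws fresh internal noise on every forward pass:

```python
def mc_gradient(model, x_adv, y, k=10):
    """Monte Carlo estimate of the loss gradient for a stochastic model.
    Since PGD takes the sign of (or renormalizes) the gradient, summing
    and averaging over the k noise draws are interchangeable."""
    grad = torch.zeros_like(x_adv)
    for _ in range(k):
        x_in = x_adv.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_in), y)
        grad += torch.autograd.grad(loss, x_in)[0]
    return grad
```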


Figure B1: VOneResNet50 improves robustness to white box attacks in a wide range of perturbation strengths. White box accuracy vs. perturbation strength curves for PGD attacks with constraints L∞, L2, and L1, for ResNet50, VOneResNet50, ATL∞, and ANT3×3+SIN. Adding the VOneBlock to ResNet50 consistently improves robustness under all constraints and attack strengths: VOneResNet50 can withstand perturbations roughly 4 times higher than the standard ResNet50 for the same performance level. Remarkably, VOneResNet50 outperformed the adversarially trained model (ATL∞) for a considerable range of perturbation strengths, particularly with the L1 constraint. Adversarial noise training combined with Stylized ImageNet (ANT3×3+SIN) shows virtually no improvement in robustness to white box attacks. In all white box attacks, and for all models, increasing the attack strength decreased performance, eventually bringing accuracy to zero.

Figure B2: White box attack effectiveness converges for a large number of PGD iteration steps. White box accuracy vs. iteration curves for PGD attacks with δ∞ = 1/255, δ2 = 0.6, and δ1 = 160 constraints, for ResNet50, VOneResNet50, ATL∞, and ANT3×3+SIN. Increasing the number of PGD iteration steps makes the attack more effective, generally converging after 32 iterations.

Table B1: Improving the effectiveness of white box attacks with Monte Carlo sampling. White box attack accuracy (top-1) under different norm constraints, and white box overall mean, using standard PGD and PGD with Monte Carlo sampling of the loss gradients (PGD-MC, k = 10) on VOneResNet50, for 64 PGD iteration steps. For all stochastic models we used PGD with Monte Carlo sampling in the white box attacks throughout the paper. Values are mean and SD (n = 3 seeds).

| Attack type | White box mean [%] | PGD-L1, ε=40 [%] | PGD-L1, ε=160 [%] | PGD-L2, ε=0.15 [%] | PGD-L2, ε=0.6 [%] | PGD-L∞, ε=1/1020 [%] | PGD-L∞, ε=1/255 [%] |
|---|---|---|---|---|---|---|---|
| Standard PGD | 52.8±0.2 | 64.4±0.4 | 42.2±0.1 | 65.2±0.3 | 45.3±0.4 | 63.4±0.4 | 36.4±0.4 |
| PGD-MC (k=10) | 51.1±0.4 | 64.2±0.5 | 40.4±0.5 | 65.3±0.4 | 44.3±0.3 | 61.9±0.4 | 30.3±0.5 |


Although VOneResNet50 was not designed to explicitly improve robustness to white box attacks, it offers substantial gains over the standard ResNet50 model (Table 1). A closer inspection of the white box accuracy vs. perturbation strength curves (Fig. B1) reveals that VOneResNet50 can withstand perturbations roughly 4 times higher than the standard ResNet50 for the same performance level (the blue curve is shifted to the right of the black curve by a factor of 4). To help visualize how this difference translates into actual images, Fig. B3 shows examples of white box adversarial images for both ResNet50 and VOneResNet50 at the perturbation strengths that bring the respective models to nearly chance performance.

Figure B3: Comparison of adversarial images for ResNet50 and VOneResNet50. Visualization of white box attacks under the 3 different norm constraints (L∞, L2, and L1) at the perturbation strength that brings the model to nearly chance performance (Fig. B1): ResNet50 (left; ‖δ‖∞ = 1/255, ‖δ‖2 = 0.6, ‖δ‖1 = 160) and VOneResNet50 (right; ‖δ‖∞ = 4/255, ‖δ‖2 = 2.4, ‖δ‖1 = 640). For each attack, the perturbation is shown on the left (scaled for visualization) and the perturbed image appears next to it. Perturbation strengths and classes of the adversarial images are shown over the images (original class: Egyptian Cat; adversarial classes include Sleeping Bag, Barrel, Norwegian Elkhound, and Schipperke).


B2 Additional attack controls

The field of adversarial robustness is littered with examples of proposed defense methods that were quickly defeated with some minor attack optimizations or adaptations [68, 84, 97]. For this reason, we took great care to verify our claims of improved robustness of the VOneNets. In addition to the implementation details and controls described in the previous section, we validated our results by further optimizing our attacks on both the standard, stochastic VOneResNet50 and the VOneResNet50 with stochasticity removed during the attack and inference.

First, on the standard, stochastic VOneResNet50, we ensured that our attack hyper-parameters, including step size (α), number of gradient samples (k), and PGD iterations, are sufficient for evaluating the network's robustness. For PGD with δ∞ = 1/255 and 64 iterations, we performed a large grid search over α and k, observing only a marginal increase in attack effectiveness (Fig. B4). For the most effective step size (α = ε/8), increasing the number of gradient samples beyond k = 16 no longer decreases accuracy under the attack. At k = 128, VOneResNet50 accuracy is only reduced from 29.15% to 26.0%, remaining a large margin above the ResNet50 accuracy of 0.8%. Additionally, increasing the number of PGD iterations all the way up to 1000 (with α = ε/8 and k = 16) only further reduced performance by 0.5%. Finally, using 5-10 random restarts did not improve the attack beyond the most effective k and α settings.

Figure B4: Further attack optimization does not overturn results. White box accuracy for PGD with δ∞ = 1/255, 64 iterations, on 500 images. A large grid search over step size (ε/2 to ε/32) and number of gradient samples (1 to 128) only marginally improves attack effectiveness; accuracy remains well above the reported ResNet50 level (horizontal reference lines mark the reported VOneResNet50 and ResNet50 accuracies). Lines represent mean and shaded areas SD (n = 3 seeds).

Second, while boundary attacks are not ideal against the standard stochastic VOneResNet50, due to the shifting decision boundary, we investigated VOneResNet50 with stochasticity removed during attack and inference using the Brendel and Bethge (B&B) attack [10], a powerful white box adversary that aims to find minimal adversarial perturbations along the decision boundary of the network. We reasoned that if the B&B attack offers similar results to those observed using PGD, this is a good sanity check that our observed robustness is reliable, and not a result of masked gradients around the input or a quirk of PGD. Using the FoolBox [98] implementation of the L∞ B&B attack, initialized with 500 successful adversarial images generated using PGD with δ∞ = 8/255, we see nearly identical scores as compared to standard PGD with optimal hyper-parameters at δ∞ = 1/1020 and δ∞ = 1/255.

We conclude that these additional controls do not qualitatively change our reported results, and provide further evidence that the observed improved robustness of the VOneNets is real and not an artifact.


B3 Common Image Corruptions

For evaluating model robustness to common image corruptions we used ImageNet-C, a dataset publicly available at https://github.com/hendrycks/robustness [13]. It consists of 15 different types of corruptions, each at five levels of severity, for a total of 75 different perturbations (Fig. B5). The corruption types fall into four categories: noise, blur, weather, and digital effects. Noise includes Gaussian noise, shot noise, and impulse noise; blur includes motion blur, defocus blur, zoom blur, and glass blur; weather includes snow, frost, fog, and brightness; digital effects include JPEG compression, elastic transformations, contrast, and pixelation. For every corruption type, and at every severity, we tested on all 50,000 images from the ImageNet validation set. We note that these images have been JPEG compressed by the original authors for sharing, which is known to have a minor but noticeable effect on final network performance.
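As an illustration, a possible evaluation loop over the released corruption/severity folder layout (the paths, batch size, and preprocessing here are assumptions, not our exact pipeline):

```python
import torch
from torchvision import datasets, transforms

CORRUPTIONS = ["gaussian_noise", "shot_noise", "impulse_noise", "defocus_blur",
               "glass_blur", "motion_blur", "zoom_blur", "snow", "frost", "fog",
               "brightness", "contrast", "elastic_transform", "pixelate",
               "jpeg_compression"]

def corruption_accuracy(model, root, device="cuda"):
    # ImageNet-C images are released at 224x224; CenterCrop is a no-op safeguard.
    tf = transforms.Compose([transforms.CenterCrop(224), transforms.ToTensor(),
                             transforms.Normalize([0.5] * 3, [0.5] * 3)])
    acc = {}
    for c in CORRUPTIONS:
        for s in range(1, 6):  # five severity levels
            ds = datasets.ImageFolder(f"{root}/{c}/{s}", transform=tf)
            loader = torch.utils.data.DataLoader(ds, batch_size=256, num_workers=8)
            correct = total = 0
            with torch.no_grad():
                for x, y in loader:
                    pred = model(x.to(device)).argmax(1).cpu()
                    correct += (pred == y).sum().item()
                    total += y.numel()
            acc[(c, s)] = correct / total  # top-1 accuracy per perturbation
    return acc
```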

Figure B5: Common image corruptions with intermediate severity. Visualization of all 15 common image corruption types, evaluated at severity 3. First row: original image, followed by the noise corruptions (Gaussian noise, shot noise, impulse noise); second row: blur corruptions (defocus, glass, motion, zoom); third row: weather corruptions (snow, frost, fog, brightness); fourth row: digital corruptions (contrast, elastic transform, pixelate, JPEG compression).


Figure B6: VOneResNet50 improves robustness to noise, blur, and digital corruptions. Corruption accuracy vs. severity curves for the four categories of common image corruptions (noise, blur, weather, and digital), for ResNet50, VOneResNet50, ATL∞, and ANT3×3+SIN. Adding the VOneBlock to ResNet50 improves robustness to all corruption categories except weather. The adversarially trained model (ATL∞) is considerably worse at common image corruptions. Adversarial noise combined with Stylized ImageNet training (ANT3×3+SIN) is consistently more robust to all corruption categories. For ResNet50 and VOneResNet50, error-bars represent SD (n = 3 seeds).

Table B2: Detailed common image corruption results. Adding the VOneBlock to ResNet50 improves robustness to several of the corruption types. However, for others, particularly fog and contrast, robustness decreases. Interestingly, these are the corruption types on which the adversarially trained model (ATL∞) also does the worst. Adversarial noise combined with Stylized ImageNet training (ANT3×3+SIN) is the most robust model for most of the corruption types. For ResNet50 and VOneResNet50, values are mean and SD (n = 3 seeds).

| Model | Gaussian [%] | Shot [%] | Impulse [%] | Defocus [%] | Glass [%] | Motion [%] | Zoom [%] |
|---|---|---|---|---|---|---|---|
| ResNet50 [4] | 29.6±1.1 | 27.9±0.9 | 24.5±1.2 | 38.8±0.7 | 26.1±0.6 | 37.9±0.4 | 34.5±0.7 |
| ATL∞ [76] | 23.6 | 22.3 | 14.8 | 24.4 | 34.3 | 33.2 | 34.9 |
| ANT3×3+SIN [59] | 61.6 | 60.7 | 59.9 | 41.5 | 38.6 | 46.0 | 37.3 |
| VOneResNet50 | 34.6±1.4 | 33.4±1.2 | 31.9±1.5 | 37.8±0.1 | 35.7±0.1 | 37.4±0.2 | 34.0±0.2 |

| Model | Snow [%] | Frost [%] | Fog [%] | Bright [%] | Contrast [%] | Elastic [%] | Pixelate [%] | JPEG [%] |
|---|---|---|---|---|---|---|---|---|
| ResNet50 | 30.1±0.5 | 36.7±0.4 | 43.6±0.8 | 66.8±0.2 | 38.8±0.5 | 44.8±0.3 | 47.0±2.7 | 55.1±1.0 |
| ATL∞ | 29.5 | 31.2 | 8.4 | 54.1 | 10.2 | 48.4 | 56.3 | 58.7 |
| ANT3×3+SIN | 47.7 | 52.2 | 54.9 | 68.2 | 49.3 | 52.7 | 58.9 | 59.3 |
| VOneResNet50 | 25.2±0.5 | 36.8±0.4 | 30.1±0.5 | 62.4±0.3 | 28.5±0.5 | 48.7±0.1 | 63.3±0.2 | 61.0±0.3 |


B4 Other defense methods

We compared the performance of VOneResNet50 on the described computer vision benchmarks with two training-based defense methods using the ResNet50 architecture: adversarial training with a δ∞ = 4/255 constraint (ATL∞) [76], and adversarial noise with Stylized ImageNet training (ANT3×3+SIN) [59].

The ATL∞ model was downloaded from the publicly available adversarially trained models at https://github.com/MadryLab/robustness [76]. This model was trained following the adversarial training paradigm of Madry et al. [17] to solve the min-max problem

min_θ ρ(θ),   where   ρ(θ) = E_{(x,y)∼D} [ max_{δ∈S} L(θ, x + δ, y) ]    (1)

Here, the goal is to learn parameters θ minimizing the loss L for training images x and labels y drawn from D, while perturbing x with δ ∈ S to maximally increase the loss. In this case, D is ImageNet, and PGD with δ∞ = 4/255 is used to find the perturbation maximizing the loss for a given (x, y) pair. The δ∞ = 4/255 constraint model was selected for comparison because it had the best performance on our conglomerate benchmark of adversarial attacks and common image corruptions.
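A schematic PyTorch rendering of one step of this min-max procedure, reusing the pgd_linf sketch from Section B1 (the inner-loop iteration count and BN handling here are placeholder choices, not the settings of [76]):

```python
def adversarial_training_step(model, optimizer, x, y, eps=4/255):
    model.eval()                                      # freeze BN statistics while attacking
    x_adv = pgd_linf(model, x, y, eps=eps, iters=10)  # approximate inner maximization
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)           # outer minimization objective, Eq. (1)
    loss.backward()
    optimizer.step()
    return loss.item()
```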

The ANT3×3+SIN model was obtained from https://github.com/bethgelab/game-of-noise. Recently, Rusak et al. showed that training a ResNet50 model with several types of additive input noise improved robustness to common image corruptions [59]. Inspired by this observation, the authors decided to train a model while optimizing a noise distribution that maximally confuses it. Similarly to standard adversarial training, this results in solving a min-max problem:

min_θ max_φ E_{(x,y)∼D} E_{δ∼p_φ(δ)} [ L(θ, x + δ, y) ]    (2)

where p_φ(δ) is the maximally confusing noise distribution. The main difference to the regular adversarial training in Equation 1 is that while in the former δ is optimized directly, here δ is found by optimizing a constrained distribution with local spatial correlations. Complementing the adversarial noise training with Stylized ImageNet training [25] produced the model with the current best robustness on the ImageNet-C benchmark as a standalone method.


C VOneNet implementation details

C1 Convolutional layer

The convolutional layer of the VOneBlock is a mathematically parameterized Gabor filter bank (GFB). We set the stride of the GFB to four, yielding a 56×56 spatial map of activations. Since the number of channels in most CNNs' first convolution is relatively small (64 in the architectures adapted), we used a larger number in the VOneBlock so that the Gabors would cover the large parameter space and better approximate primate V1. We set the main VOneNet models to contain 512 channels, equally split between simple and complex cells (see Fig. C2A for an analysis of how the number of channels affects performance). Each channel in the GFB convolves a single color channel from the input image.

The Gabor function consists of a two-dimensional grating with a Gaussian envelope and is described by the following equation:

G_{θ,f,φ,n_x,n_y}(x, y) = (1 / (2π σ_x σ_y)) exp[-0.5 (x_rot²/σ_x² + y_rot²/σ_y²)] cos(2πf x_rot + φ)    (3)

where

x_rot = x cos(θ) + y sin(θ)
y_rot = -x sin(θ) + y cos(θ)    (4)

σ_x = n_x / f
σ_y = n_y / f    (5)

Here, x_rot and y_rot are the coordinates orthogonal and parallel to the grating, θ is the angle of the grating orientation, f is the spatial frequency of the grating, φ is the phase of the grating relative to the Gaussian envelope, and σ_x and σ_y are the standard deviations of the Gaussian envelope orthogonal and parallel to the grating, which can be defined as multiples (n_x and n_y) of the grating cycle (the inverse of the frequency).
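A NumPy sketch of Equation 3 as a discrete convolution kernel (our own illustration; the 28 px/deg conversion follows from the 8 deg field of view over a 224 px input discussed below, and the function name is an assumption):

```python
import numpy as np

def gabor_kernel(theta, f, phi, nx, ny, ksize=25, ppd=28.0):
    """One Gabor filter from Eq. (3). `ppd` (pixels per degree) converts the
    cyc/deg spatial frequency into image units: 224 px / 8 deg = 28 px/deg."""
    sigx, sigy = nx / f, ny / f                            # envelope SDs in degrees, Eq. (5)
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1] / ppd  # pixel grid in degrees
    x_rot = x * np.cos(theta) + y * np.sin(theta)          # Eq. (4)
    y_rot = -x * np.sin(theta) + y * np.cos(theta)
    env = np.exp(-0.5 * ((x_rot / sigx) ** 2 + (y_rot / sigy) ** 2))
    grating = np.cos(2 * np.pi * f * x_rot + phi)
    return env * grating / (2 * np.pi * sigx * sigy)
```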

Although the Gabor filter greatly reduces the number of parameters defining the linear spatial component of V1 neurons, it still has five parameters per channel. Fortunately, there is a vast literature in neurophysiology with detailed characterizations of primate V1 response properties, which can be used to constrain these parameters. To instantiate a VOneBlock with C_V1 channels, we sampled C_V1 values for each of the parameters according to an empirically constrained distribution (Table C1). Due to the resolution of the input images, we limited the ranges of spatial frequencies (f < 5.6 cpd) and number of cycles (n > 0.1).

Table C1: Empirical distributions used to sample the GFB parameters in the VOneBlock.

| Parameter | Reference |
|---|---|
| θ | De Valois et al. [78], Fig. 5, orientation tuning (foveal) |
| f | De Valois et al. [79], Fig. 6, peak spatial freq. (X cells, foveal) |
| φ | Uniform distribution ∈ [0, 360] deg |
| n_x, n_y | Ringach 2002 [80], Fig. 4, (n_x, n_y) joint distribution |

Critical to the correct implementation of the biologically-constrained parameters of the GFB is the choice of the model's field of view in degrees. As previously mentioned in Section A, we used 8 deg as the input spatial extent for all CNN models. It is important to note that the set of spatial filters in the VOneBlock's GFB differs considerably from those in the first convolution of most CNNs (Fig. C1). While standard CNNs learn filters that resemble Gabors in their input layer [1, 99, 100], due to the limited span of their kernels, they do not vary significantly in size and spatial frequency. V1 neurons, on the other hand, are known to exhibit a wide range of receptive field properties. This phenomenon is captured in the VOneBlock, with the spatial frequencies and receptive field sizes of the Gabors ranging over more than one order of magnitude. Due to this high variability, we set the convolution kernel to be 25×25 px, which is considerably larger than those in standard CNNs (Fig. C1).


Figure C1: Comparison between filters in the first convolution of standard CNNs and the VOneBlock. Example filters from the first convolution in the VOneBlock, ResNet50, CORnet-S, and AlexNet (from left to right). VOneBlock filters are all parameterized as Gabors, varying considerably in size and spatial frequency. Standard CNNs have some filters with shapes other than Gabors, particularly center-surround, and are limited in size by their small kernels. Kernel sizes are 25 px, 7 px, 7 px, and 11 px for the VOneBlock, ResNet50, CORnet-S, and AlexNet, respectively.


C2 Nonlinear layer

The VOneBlock's nonlinear layer has two different nonlinearities, applied to each channel depending on its cell type: a rectified linear transformation for simple cells (6), and the spectral power of a quadrature phase-pair (7) for complex cells:

S^nl_{θ,f,φ,n_x,n_y} = S^l_{θ,f,φ,n_x,n_y} if S^l_{θ,f,φ,n_x,n_y} > 0, and 0 otherwise    (6)

C^nl_{θ,f,φ,n_x,n_y} = (1/√2) √[ (C^l_{θ,f,φ,n_x,n_y})² + (C^l_{θ,f,φ+π/2,n_x,n_y})² ]    (7)

where S^l and S^nl are the linear and nonlinear responses of a simple neuron, and C^l and C^nl are the same for a complex neuron.
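In code, the two nonlinearities amount to the following (a sketch; the tensors hold linear GFB responses, with the complex-cell quadrature partner computed from the same Gabor shifted by π/2 in phase):

```python
import math
import torch

def simple_nonlinearity(s_lin):
    # Eq. (6): half-wave rectification of the simple-cell linear response
    return torch.relu(s_lin)

def complex_nonlinearity(c_lin, c_lin_quad):
    # Eq. (7): quadrature-pair energy; c_lin_quad is the phase-shifted pair
    return torch.sqrt(c_lin ** 2 + c_lin_quad ** 2) / math.sqrt(2)
```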

C3 Neuronal stochasticity generator

In awake monkeys, spike trains of V1 neurons are approximately Poisson, i.e., the variance and mean of spike counts in a given time-window over a set of repetitions are roughly the same [82]. We incorporated stochasticity into the VOneBlock to emulate this property of neuronal responses. Since the Poisson distribution is not continuous, it breaks the gradients in a white box attack, giving a false sense of robustness [68]. In order to avoid this situation and facilitate the evaluation of the model's real robustness, our neuronal stochasticity generator, as implemented, uses a continuous second-order approximation of Poisson noise, adding Gaussian noise with variance equal to the activation:

Rs sim N (micro = Rns σ2 = Rns) (8)

where R_ns and R_s are the non-stochastic and stochastic responses of a neuron.

To approximate the levels of neuronal stochasticity of primate V1 neurons, it is critical that the VOneBlock activations are in the same range as the V1 neuronal responses (number of spike counts in a given time-window). Thus, we applied an affine transformation to the activations so that both the mean stimulus response and the mean baseline activity are the same as those of a population of V1 neurons (Table C2 shows the mean responses and spontaneous activity of V1 neurons measured in different time-windows).

Table C2: Mean stimulus responses and spontaneous activity of a population of V1 neurons for different time-windows.

| Time-window [ms] | Stimulus response [spikes] | Spontaneous activity [spikes] |
|---|---|---|
| 25 | 0.324 | 0.073 |
| 50 | 0.655 | 0.145 |
| 100 | 1.05 | 0.29 |
| 200 | 1.38 | 0.58 |

Since the exact time-window over which V1 responses are integrated during visual perception is still an open question, we considered a time-window of 25 ms (Fig. C2B). In order to keep the outputs of the VOneBlock in a range that does not deviate considerably from the typical range of activations in CNNs, so that the model can be efficiently trained using standard training parameters, we applied the inverse transformation to scale the outputs back to their original range after the stochastic layer.
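Putting the pieces of this subsection together, a sketch of the stochasticity generator (the scale and offset of the affine map are hypothetical stand-ins for the values fitted to Table C2):

```python
import torch

def v1_stochasticity(resp, scale=1.0, offset=0.0):
    r = resp * scale + offset                  # affine map into spike-count units (assumed form)
    noise = torch.randn_like(r)                # reparameterization trick keeps gradients intact
    r = r + noise * torch.sqrt(torch.relu(r))  # Gaussian with variance equal to the mean, Eq. (8)
    return (r - offset) / scale                # inverse transformation back to activation range
```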

C4 Hyper-parameters and other design choices

We developed VOneNets from three different standard CNN architectures: ResNet50, CORnet-S, and AlexNet. As previously mentioned, we replaced the first block of each architecture by the VOneBlock and the transition layer to create the respective VOneNet. Table C3 lists the layers that were removed for each base architecture. Except for CORnet-S, the layers removed only contained a single


convolution, nonlinearity, and maxpool. Since CORnet-S already had a pre-committed set of layers for V1, we replaced them by the VOneBlock. The torchvision implementation of AlexNet contains a combined stride of eight in the removed layers (four in the convolution and two in the maxpool), followed by a stride of one in the second convolution (outside of the removed layers). In order to more easily adapt it to a VOneBlock with a stride of four, we slightly modified AlexNet's architecture so that it had strides of two in these three layers (first convolution, first maxpool, and second convolution). The results shown in Fig. 2 were obtained using this modified architecture.

Table C3: Layers removed from each base CNN architecture for creating the VOneNets.

| Architecture | Layers removed |
|---|---|
| AlexNet | Conv, ReLU, MaxPool |
| ResNet50 | Conv, Batch-Norm, ReLU, MaxPool |
| CORnet-S | Conv, Batch-Norm, ReLU, MaxPool, Conv, Batch-Norm, ReLU |

The VOneNet architecture was designed to have as few hyper-parameters as possible. When possible, these were constrained either by the base CNN architecture or by neurobiological data: most parameters of the GFB were instantiated by sampling from neuronal distributions of primate V1 neurons; the kernel size was set to 25×25 px to capture the high variability in Gabor sizes; and, like all other CNN models, the spatial extent of the field of view was set to 8 deg (Section A). Nevertheless, there were two hyper-parameters where the choice was rather arbitrary: the number of channels in the VOneBlock, and the time-window for integrating V1 responses, used to scale the activations prior to the stochasticity generator.

Since primate V1 has neurons tuned to a wide range of spatial frequencies for each orientation, at any given location of the visual space [101], we expected that a large number of channels would be required to cover all of the combinations of Gabor parameters. For the VOneResNet50 architecture, we varied the number of channels in the VOneBlock between 64 and 1024, equally split between simple and complex cells, and measured the performance of the several models after training (Fig. C2A). Clean ImageNet and common image corruption performance improved with channel number. Remarkably, the opposite happened for white box accuracy, with variants of VOneResNet50 with 64 and 128 channels achieving the highest robustness to white box attacks. This result is particularly interesting since it had been shown that increasing the number of channels in all layers of a CNN improves robustness to white box attacks in both standard and adversarially trained models [17]. Therefore, this shows that the improvement in white box robustness of VOneNets, when compared to their base models, cannot be attributed to the higher number of channels. For the main model we set the channel number to 512, as it offered a good compromise between the different benchmarks. Regarding the time-window for integrating V1 responses, we observed small effects on the models' performance when varying it between 25 and 200 ms. We chose 25 ms for the main model due to a small trend toward higher robustness to white box attacks with shorter time-windows.

Figure C2: Exploration of the VOneBlock's hyper-parameters in the VOneResNet50 architecture. A: Performance (clean, perturbation mean, white box, and corruption accuracy) of different VOneResNet50 models with varying number of channels in the VOneBlock (64 to 1024, with a 25 ms time-window). B: Same as in A, but varying the time-window for integrating neuronal activity (25 to 200 ms, with 512 channels). Error-bars show SD (n = 3 seeds). Red arrows represent the values used in the main model.


C5 Training details

We used PyTorch 0.4.1 and trained the models on ImageNet 2012 [44]. Images were preprocessed (1) for training with a random crop to 224×224 pixels and random left-right flipping, and (2) for validation with a central crop to 224×224 pixels. Preprocessing was followed by normalization: subtraction of and division by [0.5, 0.5, 0.5]. We used a batch size of 256 images, and trained either on 2 GPUs (NVIDIA Titan X / GeForce 1080Ti) or 1 GPU (Quadro RTX 6000 or V100) for 70 epochs. We used step learning rate scheduling: a starting learning rate of 0.1, divided by 10 every 20 epochs. For optimization we used Stochastic Gradient Descent with a weight decay of 0.0001, momentum of 0.9, and a cross-entropy loss between image labels and model predictions (logits).
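The corresponding optimizer setup, as a sketch (`model` and `train_loader` are placeholders for a VOneNet and an ImageNet loader with the preprocessing above):

```python
import torch

optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(70):
    for x, y in train_loader:
        optimizer.zero_grad()
        criterion(model(x), y).backward()  # cross-entropy on the logits
        optimizer.step()
    scheduler.step()                       # divide the learning rate by 10 every 20 epochs
```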

Model weights and code are available at https://github.com/dicarlolab/vonenet.


D VOneBlock variant details

To investigate which components of the VOneBlock are responsible for the increased robustness, we created six variants of the VOneBlock, each removing or modifying one of its parts. In all cases, we trained from scratch a corresponding variant of VOneResNet50 with the modified VOneBlock; all other properties of the model and components of the VOneBlock were left unmodified. Here we describe each of these variants in more detail (the name of the variant refers to the component removed):

• V1 distributions: the GFB parameters were sampled from uniform distributions with the same domain as the empirical distributions used in the default VOneBlock (Table C1).
• Low SF: when sampling the spatial frequencies for the GFB, neurons with low peak spatial frequencies (f < 2 cpd) were removed from the empirical distribution [79].
• High SF: similar to the previous variant, but removing neurons with high peak spatial frequencies (f > 2 cpd).
• Complex NL: complex cell channels were removed from the VOneBlock and replaced by simple cell channels.
• Simple NL: simple cell channels were removed from the VOneBlock and replaced by complex cell channels.
• V1 stochasticity: the V1 stochasticity generator was removed from the VOneBlock.

While all of the variants resulted in worse overall mean performance than the unmodified VOneResNet50 (Table 2), some improved specific benchmarks: the variant without high SF had higher accuracy under noise and blur corruptions (Table D1); the variant without V1 stochasticity had higher accuracy on clean images and on images with noise and weather corruptions (Tables 2 and D1); and the variant without V1 distributions had higher accuracy under white box attacks with high L∞ perturbations (Table D2).

Table D1: Difference between the common image corruption accuracies of the several VOneResNet50 variants and the unmodified VOneResNet50. Removal of simple cells reduced accuracy for all common image corruption types. Removal of some components of the VOneBlock improved accuracy on specific common image corruptions: removing high SF consistently improved performance for noise and blur corruptions, and removing V1 stochasticity improved performance for noise and weather corruptions. Values are accuracy differences (mean ± SD, n = 3 seeds).

| Component removed | Gaussian [Δ] | Shot [Δ] | Impulse [Δ] | Defocus [Δ] | Glass [Δ] | Motion [Δ] | Zoom [Δ] |
|---|---|---|---|---|---|---|---|
| V1 dist. | -2.8±0.9 | -2.4±0.7 | -2.9±1.1 | -1.6±0.1 | 0.4±0.2 | -0.8±0.3 | -0.2±0.5 |
| Low SF | -1.4±0.5 | -1.0±0.2 | -1.5±0.5 | -0.6±0.6 | -1.1±0.2 | -1.2±0.4 | -0.5±0.7 |
| High SF | 6.8±1.3 | 7.3±1.1 | 7.0±1.3 | 5.6±0.3 | 11.5±0.3 | 3.7±0.0 | 6.1±0.1 |
| Complex | 1.0±1.6 | 1.2±1.3 | 0.7±1.8 | 1.2±0.2 | -1.5±0.1 | 0.2±0.8 | 0.4±1.1 |
| Simple | -6.1±1.6 | -5.4±1.5 | -6.4±1.9 | -4.1±0.4 | -4.2±0.4 | -4.9±0.5 | -5.4±0.3 |
| V1 stoch. | 2.2±2.5 | 2.2±2.5 | 2.1±2.9 | 3.3±0.1 | -5.5±0.1 | 0.9±0.7 | 0.9±0.1 |

| Component removed | Snow [Δ] | Frost [Δ] | Fog [Δ] | Bright [Δ] | Contrast [Δ] | Elastic [Δ] | Pixelate [Δ] | JPEG [Δ] |
|---|---|---|---|---|---|---|---|---|
| V1 dist. | -0.2±0.6 | -0.9±0.6 | -1.9±1.1 | -0.8±0.5 | -1.4±0.9 | 0.3±0.4 | -1.7±0.3 | 0.8±0.5 |
| Low SF | -0.4±0.3 | -0.6±0.4 | 0.9±0.3 | -0.2±0.2 | -0.4±0.5 | -0.9±0.5 | -1.3±0.2 | -0.5±0.4 |
| High SF | 1.4±1.0 | -0.7±0.7 | -9.5±0.4 | -5.2±0.4 | -4.4±0.5 | 2.4±0.2 | 1.9±0.4 | -4.6±0.4 |
| Complex | 0.7±0.4 | 0.4±0.3 | 4.3±0.2 | 1.3±0.2 | 2.9±0.2 | -0.5±0.2 | -1.2±0.0 | -1.9±0.5 |
| Simple | -5.0±0.5 | -9.0±0.5 | -4.4±0.5 | -3.9±0.2 | -4.8±0.4 | -3.9±0.5 | -3.5±0.4 | -1.8±0.2 |
| V1 stoch. | 2.5±0.4 | 0.3±0.3 | 12.0±0.5 | 2.6±0.3 | 10.0±0.8 | -2.8±0.3 | -4.0±0.7 | -4.0±0.4 |


Table D2: Difference between the white box attack accuracies of the several VOneResNet50 variants and the unmodified VOneResNet50. Removal of V1 stochasticity considerably reduced accuracy under all white box attacks. Removal of V1 distributions improved accuracy for the L∞ white box attack with high perturbation strength.

| Component removed | PGD-L1, ε=40 [Δ] | PGD-L1, ε=160 [Δ] | PGD-L2, ε=0.15 [Δ] | PGD-L2, ε=0.6 [Δ] | PGD-L∞, ε=1/1020 [Δ] | PGD-L∞, ε=1/255 [Δ] |
|---|---|---|---|---|---|---|
| V1 dist. | -0.5±0.8 | 0.1±1.1 | -0.8±0.2 | 0.6±1.0 | 0.3±0.3 | 3.0±1.8 |
| Low SF | -0.8±1.2 | -3.1±1.1 | -0.6±1.0 | -2.3±0.8 | -0.8±1.3 | -1.8±1.0 |
| High SF | -2.2±0.9 | -4.7±2.6 | -2.6±1.5 | -5.1±1.2 | -2.6±1.4 | -5.5±1.1 |
| Complex | -1.3±0.9 | -7.4±1.6 | -1.1±0.9 | -7.2±1.2 | -1.9±1.0 | -7.9±1.2 |
| Simple | -2.5±0.8 | -1.4±1.4 | -2.9±1.1 | -1.7±1.2 | -2.7±1.2 | -0.6±0.8 |
| V1 stoch. | -23.7±0.9 | -36.5±0.8 | -21.6±0.5 | -40.1±0.4 | -32.2±0.4 | -29.2±0.4 |



Broader Impact

From a technological perspective, the ethical implications of our work are largely aligned with those of computer vision in general. While there is undoubtedly potential for malicious and abusive uses of computer vision, particularly in the form of discrimination or invasion of privacy, we believe that our work will aid in the production of more robust and intuitive behavior of computer vision algorithms. As CNNs are deployed in real-world situations, it is critical that they behave with the same level of stability as their human counterparts. In particular, they should, at the very least, not be confused by changes in input statistics that do not confuse humans. We believe that this work will help to bridge that gap. Furthermore, while algorithms are often thought to be impartial or unbiased, much research has shown that data-driven models like current CNNs are often even more biased than humans, implicitly keying in on and amplifying stereotypes. For this reason, making new CNNs that behave more like humans may actually reduce, or at least make more intuitive, their implicit biases. Unfortunately, we note that even with our work these issues are not yet resolved. While we developed a more neurobiologically-constrained algorithm, it comes nowhere close to human-like behaviour in the wide range of circumstances experienced in the real world. Finally, from the perspective of neuroscience, we think that this work introduces a more accurate model of the primate visual system. Ultimately, better models contribute to a better mechanistic understanding of how the brain works, and of how to intervene in the case of illness or disease states. We think that our model contributes a stronger foundation for understanding the brain and for building novel medical technology, such as neural implants for restoring vision in people with impairments.

Acknowledgments and Disclosure of Funding

We thank J. Anthony Movshon and Corey M. Ziemba for access to the V1 neural dataset, MIT-IBM Satori for providing the compute necessary for this project, John Cohn for technical support, Sijia Liu and Wieland Brendel for advice on adversarial attacks, and Adam Marblestone for insightful discussion. This work was supported by the PhRMA Foundation Postdoctoral Fellowship in Informatics (T.M.), the Semiconductor Research Corporation (SRC) and DARPA (J.D., M.S., J.J.D.), the Massachusetts Institute of Technology Shoemaker Fellowship (M.S.), Office of Naval Research grant MURI-114407 (J.J.D.), the Simons Foundation grant SCGB-542965 (J.J.D.), and the MIT-IBM Watson AI Lab grant W1771646 (J.J.D.).

References

[1] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet Classification with Deep Convolutional Neural Networks". In: Advances in Neural Information Processing Systems 25. Ed. by F. Pereira et al. Curran Associates, Inc., 2012, pp. 1097-1105.
[2] Christian Szegedy et al. "Going Deeper with Convolutions". (Sept. 2014). arXiv: 1409.4842 [cs.CV].
[3] Karen Simonyan and Andrew Zisserman. "Very Deep Convolutional Networks for Large-Scale Image Recognition". (Sept. 2014). arXiv: 1409.1556 [cs.CV].
[4] Kaiming He et al. "Deep Residual Learning for Image Recognition". (Dec. 2015). arXiv: 1512.03385 [cs.CV].
[5] Kaiming He et al. "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification". In: Proceedings of the IEEE International Conference on Computer Vision (2015), pp. 1026-1034. DOI: 10.1109/ICCV.2015.123. arXiv: 1502.01852.
[6] Christian Szegedy et al. "Intriguing properties of neural networks". (Dec. 2013). arXiv: 1312.6199 [cs.CV].
[7] Nicholas Carlini and David Wagner. "Towards Evaluating the Robustness of Neural Networks". (Aug. 2016). arXiv: 1608.04644 [cs.CR].
[8] Pin-Yu Chen et al. "EAD: Elastic-Net Attacks to Deep Neural Networks via Adversarial Examples". (Sept. 2017). arXiv: 1709.04114 [stat.ML].
[9] Jérôme Rony et al. "Decoupling Direction and Norm for Efficient Gradient-Based L2 Adversarial Attacks and Defenses". (Nov. 2018). arXiv: 1811.09600 [cs.CV].


[10] Wieland Brendel et al. "Accurate, reliable and fast robustness evaluation". (July 2019). arXiv: 1907.01003 [stat.ML].
[11] Samuel Dodge and Lina Karam. "A Study and Comparison of Human and Deep Learning Recognition Performance Under Visual Distortions". (May 2017). arXiv: 1705.02498 [cs.CV].
[12] Robert Geirhos et al. "Generalisation in humans and deep neural networks". (Aug. 2018). arXiv: 1808.08750 [cs.CV].
[13] Dan Hendrycks and Thomas Dietterich. "Benchmarking Neural Network Robustness to Common Corruptions and Perturbations". (Mar. 2019). arXiv: 1903.12261 [cs.LG].
[14] Nilaksh Das et al. "Keeping the Bad Guys Out: Protecting and Vaccinating Deep Learning with JPEG Compression". (May 2017). arXiv: 1705.02900 [cs.CV].
[15] Xuanqing Liu et al. "Towards Robust Neural Networks via Random Self-ensemble". (Dec. 2017). arXiv: 1712.00673 [cs.LG].
[16] Weilin Xu, David Evans, and Yanjun Qi. "Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks". arXiv [cs.CV] (Apr. 2017).
[17] Aleksander Madry et al. "Towards Deep Learning Models Resistant to Adversarial Attacks". (June 2017). arXiv: 1706.06083 [stat.ML].
[18] Yang Song et al. "PixelDefend: Leveraging Generative Models to Understand and Defend against Adversarial Examples". (Oct. 2017). arXiv: 1710.10766 [cs.LG].
[19] Guneet S. Dhillon et al. "Stochastic Activation Pruning for Robust Adversarial Defense". (Mar. 2018). arXiv: 1803.01442 [cs.LG].
[20] Jacob Buckman et al. "Thermometer Encoding: One Hot Way To Resist Adversarial Examples". (Feb. 2018). URL: https://openreview.net/forum?id=S18Su--CW.
[21] Chuan Guo et al. "Countering Adversarial Images using Input Transformations". (Feb. 2018). URL: https://openreview.net/forum?id=SyJ7ClWCb.
[22] Claudio Michaelis et al. "Benchmarking Robustness in Object Detection: Autonomous Driving when Winter is Coming". (2019), pp. 1-23. arXiv: 1907.07484. URL: http://arxiv.org/abs/1907.07484.
[23] Andrew Ilyas et al. "Adversarial Examples Are Not Bugs, They Are Features". (May 2019). arXiv: 1905.02175 [stat.ML].
[24] D. Linsley et al. "What are the Visual Features Underlying Human Versus Machine Vision?" In: ICCV. 2017. DOI: 10.1109/ICCVW.2017.331. arXiv: 1701.02704.
[25] Robert Geirhos et al. "ImageNet-Trained CNNs Are Biased Towards Texture". In: International Conference on Learning Representations (2019), pp. 1-20. arXiv: 1811.12231.
[26] Daniel L. K. Yamins et al. "Performance-optimized hierarchical models predict neural responses in higher visual cortex". In: Proceedings of the National Academy of Sciences 111.23 (2014), pp. 8619-8624. DOI: 10.1073/pnas.1403112111.
[27] Charles F. Cadieu et al. "Deep neural networks rival the representation of primate IT cortex for core visual object recognition". In: PLoS Computational Biology 10.12 (2014), e1003963.
[28] Seyed-Mahdi Khaligh-Razavi and Nikolaus Kriegeskorte. "Deep supervised, but not unsupervised, models may explain IT cortical representation". In: PLoS Comput. Biol. 10.11 (Nov. 2014), e1003915.
[29] Umut Güçlü and Marcel A. J. van Gerven. "Deep Neural Networks Reveal a Gradient in the Complexity of Neural Representations across the Brain's Ventral Visual Pathway". In: The Journal of Neuroscience 35.27 (2015), pp. 10005-10014. DOI: 10.1523/JNEUROSCI.5023-14.2015.
[30] Santiago A. Cadena et al. "Deep convolutional models improve predictions of macaque V1 responses to natural images". In: PLoS Comput. Biol. 15.4 (Apr. 2019), e1006897.
[31] Martin Schrimpf et al. "Brain-Score: Which Artificial Neural Network for Object Recognition is most Brain-Like?" Sept. 2018. URL: https://doi.org/10.1101/407007.


[32] Jonas Kubilius et al. "Brain-Like Object Recognition with High-Performing Shallow Recurrent ANNs". In: Advances in Neural Information Processing Systems 32. Ed. by H. Wallach et al. Curran Associates, Inc., 2019, pp. 12805-12816. URL: http://papers.nips.cc/paper/9441-brain-like-object-recognition-with-high-performing-shallow-recurrent-anns.pdf.
[33] Adam H. Marblestone, Greg Wayne, and Konrad P. Kording. "Toward an Integration of Deep Learning and Neuroscience". In: Front. Comput. Neurosci. 10 (Sept. 2016), p. 94.
[34] William Lotter, Gabriel Kreiman, and David Cox. "Deep Predictive Coding Networks for Video Prediction and Unsupervised Learning". (May 2016). arXiv: 1605.08104 [cs.LG].
[35] Aran Nayebi and Surya Ganguli. "Biologically inspired protection of deep networks from adversarial attacks". (Mar. 2017). arXiv: 1703.09202 [stat.ML].
[36] Jordan Guerguiev, Timothy P. Lillicrap, and Blake A. Richards. "Towards deep learning with segregated dendrites". In: eLife 6 (Dec. 2017).
[37] Demis Hassabis et al. "Neuroscience-Inspired Artificial Intelligence". In: Neuron 95.2 (2017), pp. 245-258. DOI: 10.1016/j.neuron.2017.06.011.
[38] Grace W. Lindsay and Kenneth D. Miller. "How biological attention mechanisms improve task performance in a large-scale visual system model". In: eLife 7 (Oct. 2018).
[39] Hanlin Tang et al. "Recurrent computations for visual pattern completion". In: Proceedings of the National Academy of Sciences 115.35 (2018), pp. 8835-8840. DOI: 10.1073/pnas.1719397115.
[40] Zhe Li et al. "Learning From Brains How to Regularize Machines". (Nov. 2019). arXiv: 1911.05072 [cs.LG].
[41] Hosein Hasani, Mahdieh Soleymani, and Hamid Aghajan. "Surround Modulation: A Bio-inspired Connectivity Structure for Convolutional Neural Networks". In: NeurIPS (2019), pp. 15877-15888. URL: http://papers.nips.cc/paper/9719-surround-modulation-a-bio-inspired-connectivity-structure-for-convolutional-neural-networks.
[42] Fabian H. Sinz et al. "Engineering a Less Artificial Intelligence". In: Neuron 103.6 (Sept. 2019), pp. 967-979.
[43] Anthony M. Zador. "A critique of pure learning and what artificial neural networks can learn from animal brains". In: Nat. Commun. 10.1 (Aug. 2019), p. 3770.
[44] J. Deng et al. "ImageNet: A large-scale hierarchical image database". In: 2009 IEEE Conference on Computer Vision and Pattern Recognition. June 2009, pp. 248-255.
[45] D. H. Hubel and T. N. Wiesel. "Receptive Fields, Binocular Interaction and Functional Architecture in the Cat's Visual Cortex". In: Journal of Physiology (1962), pp. 106-154.
[46] J. P. Jones and L. A. Palmer. "An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex". In: Journal of Neurophysiology 58.6 (1987), pp. 1233-1258. DOI: 10.1152/jn.1987.58.6.1233.
[47] E. H. Adelson and J. R. Bergen. "Spatiotemporal energy models for the perception of motion". In: Journal of the Optical Society of America A 2.2 (1985), pp. 284-299. DOI: 10.1364/JOSAA.2.000284.
[48] David J. Heeger, E. P. Simoncelli, and J. Anthony Movshon. "Computational models of cortical visual processing". In: Proceedings of the National Academy of Sciences 93.2 (1996), pp. 623-627. DOI: 10.1073/pnas.93.2.623.
[49] M. Carandini, D. J. Heeger, and J. A. Movshon. "Linearity and normalization in simple cells of the macaque primary visual cortex". In: The Journal of Neuroscience 17.21 (Nov. 1997), pp. 8621-8644.
[50] Maximilian Riesenhuber and Tomaso Poggio. "Hierarchical models of object recognition in cortex". In: Nature Neuroscience 2.11 (1999), pp. 1019-1025. DOI: 10.1038/14819.


[51] Thomas Serre, Aude Oliva, and Tomaso Poggio. "A feedforward architecture accounts for rapid categorization". In: Proc. Natl. Acad. Sci. U.S.A. 104.15 (Apr. 2007), pp. 6424-6429.
[52] Nicole C. Rust et al. "Spatiotemporal Elements of Macaque V1 Receptive Fields". In: Neuron 46 (2005), pp. 945-956. DOI: 10.1016/j.neuron.2005.05.021.
[53] Brett Vintch, J. A. Movshon, and E. P. Simoncelli. "A Convolutional Subunit Model for Neuronal Responses in Macaque V1". In: Journal of Neuroscience 35.44 (2015), pp. 14829-14841. DOI: 10.1523/JNEUROSCI.2815-13.2015.
[54] Nicolas Papernot et al. "Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks". (Nov. 2015). arXiv: 1511.04508 [cs.CR].
[55] Mayank Singh et al. "Harnessing the Vulnerability of Latent Layers in Adversarially Trained Models". (May 2019). arXiv: 1905.05186 [cs.LG].
[56] Cihang Xie et al. "Feature denoising for improving adversarial robustness". In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019, pp. 501-509.
[57] Eric Wong, Leslie Rice, and J. Zico Kolter. "Fast is better than free: Revisiting adversarial training". (Jan. 2020). URL: https://openreview.net/forum?id=BJx040EFvH.
[58] Dan Hendrycks et al. "AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty". (Dec. 2019). arXiv: 1912.02781 [stat.ML].
[59] Evgenia Rusak et al. "Increasing the robustness of DNNs against image corruptions by playing the Game of Noise". (Jan. 2020). arXiv: 2001.06057 [cs.CV].
[60] Ali Shafahi et al. "Adversarial training for free!" In: Advances in Neural Information Processing Systems 32. Ed. by H. Wallach et al. Curran Associates, Inc., 2019, pp. 3358-3369.
[61] Eric Wong, Leslie Rice, and J. Zico Kolter. "Fast is better than free: Revisiting adversarial training". (Sept. 2020). URL: https://openreview.net/forum?id=BJx040EFvH.
[62] Dimitris Tsipras et al. "Robustness May Be at Odds with Accuracy". Sept. 2019. URL: https://openreview.net/forum?id=SyxAb30cY7.
[63] Yash Sharma and Pin-Yu Chen. "Attacking the Madry Defense Model with L1-based Adversarial Examples". (Oct. 2017). arXiv: 1710.10733 [stat.ML].
[64] Lukas Schott et al. "Towards the first adversarially robust neural network model on MNIST". (May 2018). arXiv: 1805.09190 [cs.CV].
[65] Cihang Xie et al. "Mitigating Adversarial Effects Through Randomization". Feb. 2018. URL: https://openreview.net/forum?id=Sk9yuql0Z.
[66] Adnan Siraj Rakin, Zhezhi He, and Deliang Fan. "Parametric Noise Injection: Trainable Randomness to Improve Deep Neural Network Robustness against Adversarial Attack". (Nov. 2018). arXiv: 1811.09310 [cs.LG].
[67] Joonho Lee et al. "ProbAct: A Probabilistic Activation Function for Deep Neural Networks". (May 2019). arXiv: 1905.10761 [cs.LG].
[68] Anish Athalye, Nicholas Carlini, and David Wagner. "Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples". (Feb. 2018). arXiv: 1802.00420 [cs.LG].
[69] Alex Krizhevsky. "Learning Multiple Layers of Features from Tiny Images". (Apr. 2009). URL: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.222.9220&rep=rep1&type=pdf.
[70] Adam Paszke et al. "Automatic differentiation in PyTorch". (Oct. 2017). URL: https://openreview.net/forum?id=BJJsrmfCZ.
[71] Saining Xie et al. "Aggregated Residual Transformations for Deep Neural Networks". (Nov. 2016). arXiv: 1611.05431 [cs.CV].
[72] Gao Huang et al. "Densely Connected Convolutional Networks". (Aug. 2016). arXiv: 1608.06993 [cs.CV].
[73] Forrest N. Iandola et al. "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size". (Feb. 2016). arXiv: 1602.07360 [cs.CV].
[74] Ningning Ma et al. "ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design". (July 2018). arXiv: 1807.11164 [cs.CV].
[75] Mingxing Tan et al. "MnasNet: Platform-Aware Neural Architecture Search for Mobile". (July 2018). arXiv: 1807.11626 [cs.CV].


[76] Logan Engstrom et al. Robustness (Python Library). 2019. URL: https://github.com/MadryLab/robustness.
[77] Jeremy Freeman et al. "A functional and perceptual signature of the second visual area in primates". In: Nat. Neurosci. 16.7 (July 2013), pp. 974-981.
[78] Russell L. De Valois, E. W. Yund, and Norva Hepler. "The orientation and direction selectivity of cells in macaque visual cortex". In: Vision Research 22 (1982), pp. 531-544.
[79] Russell L. De Valois, Duane G. Albrecht, and Lisa G. Thorell. "Spatial Frequency Selectivity of Cells in Macaque Visual Cortex". In: Vision Research 22 (1982), pp. 545-559.
[80] Dario L. Ringach. "Spatial Structure and Symmetry of Simple-Cell Receptive Fields in Macaque Primary Visual Cortex". In: Journal of Neurophysiology 88 (2002), pp. 455-463.
[81] Yasmine El-Shamayleh et al. "Visual response properties of V1 neurons projecting to V2 in macaque". In: Journal of Neuroscience 33.42 (2013), pp. 16594-16605. DOI: 10.1523/JNEUROSCI.2753-13.2013.
[82] W. R. Softky and C. Koch. "The highly irregular firing of cortical cells is inconsistent with temporal integration of random EPSPs". In: Journal of Neuroscience 13.1 (1993), pp. 334-350. DOI: 10.1523/jneurosci.13-01-00334.1993.
[83] Diederik P. Kingma and Max Welling. "Auto-Encoding Variational Bayes". (Dec. 2013). arXiv: 1312.6114v10 [stat.ML].
[84] Nicholas Carlini et al. "On Evaluating Adversarial Robustness". (Feb. 2019). arXiv: 1902.06705v2 [cs.LG].
[85] Nikolaus Kriegeskorte, Marieke Mur, and Peter Bandettini. "Representational similarity analysis - connecting the branches of systems neuroscience". In: Front. Syst. Neurosci. 2 (Nov. 2008), p. 4.
[86] R. Rajalingham, K. Schmidt, and James J. DiCarlo. "Comparison of Object Recognition Behavior in Human and Monkey". In: Journal of Neuroscience 35.35 (2015), pp. 12127-12136. DOI: 10.1523/JNEUROSCI.0573-15.2015.
[87] Cristopher M. Niell and Michael P. Stryker. "Highly selective receptive fields in mouse visual cortex". In: The Journal of Neuroscience 28.30 (July 2008), pp. 7520-7536. DOI: 10.1523/JNEUROSCI.0623-08.2008.
[88] Pieter R. Roelfsema. "Cortical algorithms for perceptual grouping". In: Annual Review of Neuroscience 29 (2006), pp. 203-227. DOI: 10.1146/annurev.neuro.29.051605.112939.
[89] Alexander Berardino et al. "Eigen-Distortions of Hierarchical Representations". (Oct. 2017). arXiv: 1710.02266 [cs.CV].
[90] Jack Lindsey et al. "A Unified Theory of Early Visual Representations from Retina to Cortex through Anatomically Constrained Deep CNNs". In: ICLR (2019), pp. 1-17. DOI: 10.1101/511535. arXiv: 1901.00945.
[91] Daniel L. K. Yamins and James J. DiCarlo. "Using goal-driven deep learning models to understand sensory cortex". In: Nat. Neurosci. 19.3 (Mar. 2016), pp. 356-365.
[92] Alexander Mathis et al. "DeepLabCut: markerless pose estimation of user-defined body parts with deep learning". In: Nature Neuroscience (2018). DOI: 10.1038/s41593-018-0209-y. URL: https://www.nature.com/articles/s41593-018-0209-y.
[93] Yann LeCun, Yoshua Bengio, and Geoffrey E. Hinton. "Deep learning". In: Nature 521.7553 (2015), pp. 436-444. DOI: 10.1038/nature14539.
[94] T. Marques, M. Schrimpf, and J. J. DiCarlo. "Hierarchical neural network models that more closely match primary visual cortex tend to better explain higher level visual cortical responses". In: Cosyne, 2020.
[95] Alexey Kurakin, Ian Goodfellow, and Samy Bengio. "Adversarial examples in the physical world". (July 2016). arXiv: 1607.02533 [cs.CV].
[96] Maria-Irina Nicolae et al. "Adversarial Robustness Toolbox v1.2.0". In: CoRR abs/1807.01069 (2018). URL: https://arxiv.org/pdf/1807.01069.


[97] Florian Tramer et al. "On Adaptive Attacks to Adversarial Example Defenses". (Feb. 2020). arXiv: 2002.08347 [cs.LG].
[98] Jonas Rauber et al. "Foolbox Native: Fast adversarial attacks to benchmark the robustness of machine learning models in PyTorch, TensorFlow, and JAX". In: Journal of Open Source Software 5.53 (2020), p. 2607. DOI: 10.21105/joss.02607. URL: https://doi.org/10.21105/joss.02607.
[99] Matthew D. Zeiler and Rob Fergus. "Visualizing and Understanding Convolutional Networks". (Nov. 2013). arXiv: 1311.2901 [cs.CV].
[100] Chris Olah et al. "An Overview of Early Vision in InceptionV1". In: Distill (2020). https://distill.pub/2020/circuits/early-vision. DOI: 10.23915/distill.00024.002.
[101] Ian Nauhaus et al. "Orthogonal micro-organization of orientation and spatial frequency in primate primary visual cortex". In: Nature Neuroscience 15.12 (2012), pp. 1683-1690. DOI: 10.1038/nn.3255.


Supplementary Material

A Neural Explained Variance

We evaluated how well responses to given images in candidate CNNs explain responses of single V1 neurons to the same images, using a standard neural predictivity methodology based on partial least squares (PLS) regression [31, 32]. We used a neural dataset obtained by extracellular recordings from 102 single-units while presenting 450 different images for 100ms, 20 times each, spanning 4° of the visual space. Results from this dataset were originally published in a study comparing V1 and V2 responses to naturalistic textures [77]. To avoid over-fitting, for each CNN we first chose its best V1 layer on a subset of the dataset (n=135 images) and reported the final neural explained variance calculated for the chosen layer on the remaining images of the dataset (n=315 images). Model units were mapped to each V1 neuron linearly using a PLS regression model with 25 components. The mapping was performed for each neuron using 90% of the image responses in the respective dataset and tested on the remaining held-out 10%, in a 10-fold cross-validation strategy. We reported neural explained variance values normalized by the neuronal internal consistency. We calculated V1 neural explained variance for a large pool of CNNs (n=30 models).
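For concreteness, below is a minimal sketch of this mapping procedure, assuming precomputed model activations and neuronal responses as NumPy arrays; the array names, shapes, and function are illustrative, not the actual analysis code:

```python
# Minimal sketch of PLS-based neural predictivity (illustrative, not the actual pipeline).
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import KFold

def neural_predictivity(model_features, neuron_responses, n_components=25, n_folds=10):
    """model_features: (n_images, n_units); neuron_responses: (n_images,) for one neuron."""
    scores = []
    for train_idx, test_idx in KFold(n_splits=n_folds, shuffle=True, random_state=0).split(model_features):
        pls = PLSRegression(n_components=n_components)
        pls.fit(model_features[train_idx], neuron_responses[train_idx])
        pred = pls.predict(model_features[test_idx]).ravel()
        resid = neuron_responses[test_idx] - pred
        # fraction of held-out response variance explained by the mapped model units
        scores.append(1 - resid.var() / neuron_responses[test_idx].var())
    # the reported values are additionally normalized by the neuron's internal consistency
    return float(np.mean(scores))
```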

When using CNNs in a computer vision task such as object recognition in ImageNet, their field-of-view (in degrees) is not defined, since the only relevant input property is the image resolution (in pixels). However, when using a CNN to model the ventral stream, the visual spatial extent of the model's input is of key importance to ensure that it is correctly mapped to the data it is trying to explain. We assigned a spatial extent of 8° to all models, since this value has been previously used in studies benchmarking CNNs as models of the primate ventral stream [31, 94] and is consistent with the results in Cadena et al. [30].


B Image Perturbations

B1 White Box Adversarial Attacks

For performing white box adversarial attacks we used untargeted projected gradient descent (PGD) [17] (also referred to as the Basic Iterative Method [95]) with L∞, L2, and L1 norm constraints. Given an image x, this method uses the gradient of the loss to iteratively construct an adversarial image x_adv which maximizes the model loss within an Lp bound around x. Formally, in the case of an L∞ constraint, PGD iteratively computes x_adv as

$$x^{t+1} = \mathrm{Proj}_{x+S}\!\left(x^{t} + \alpha\,\mathrm{sgn}\!\left(\nabla_{x^{t}} L(\theta, x^{t}, y)\right)\right)$$

where x is the original image and the Proj operator ensures the final computed adversarial image x_adv is constrained to the space x + S, here the L∞ ball around x. In the case of L1 or L2 constraints, at each iteration ∇_{x^t} L(θ, x^t, y) is scaled to have an L1 or L2 norm of α, and the Proj operator ensures the final x_adv is within an L1 or L2 norm ball around x.

For our white box benchmarks we used δ∞ ∈ [1/1020, 1/255], δ2 ∈ [0.15, 0.6], and δ1 ∈ [40, 160] constraints, where δ = x − x_adv, for a total of six different attack variants with two strengths under each norm constraint. We arrived at these strengths by calibrating the lower strength attack for each norm to give a reasonable degree of variance in normally trained CNNs. Because normally trained CNNs are extremely sensitive to adversarial perturbations, this resulted in perturbations well below the level of perceptibility and, in most cases, particularly with the L∞ constraint, perturbations that would not even be rendered differently by a monitor for display. We set the high attack strength at 4 times the low value, which brought standard ImageNet trained CNNs to nearly chance performance. Still, even this higher perturbation strength remained practically imperceptible (Fig. B3, left). Each perturbation was computed with 64 PGD iterations and a step size α = ε/32, where ε = ‖δ‖p, for the same 5000 images from the ImageNet validation set on a per model basis, and final top-1 accuracy was reported. The Adversarial Robustness Toolbox [96] was used for computing the attacks.

While adversarial formulations using Lagrangian relaxation, like the C&W attack [7] and the EAD attack [8], are known to be more effective than PGD for finding minimal perturbations under L2 and L1 constraints, these formulations involve computationally expensive line searches, often requiring thousands of iterations to converge, making them computationally difficult to scale to 5000 ImageNet samples for a large number of models. Furthermore, because some of the models we tested operate with stochasticity during inference, we found that other state-of-the-art attacks formulated to efficiently find minimal perturbation distances [9, 10] were generally difficult to tune, as their success becomes stochastic as they near the decision boundary. Thus, we proceeded with the PGD formulation, as we found it to be the most reliable, computationally tractable, and conceptually simple to follow. We share our model and weights and encourage anyone interested to attack our model to verify our results.

Although the stochastic models were implemented with the reparameterization trick for random sampling, which leaves the gradients intact [83], we performed several sanity checks to make sure that the white box attacks were effectively implemented. Following the advice of Athalye et al. [68] and Carlini et al. [84], we observed that in all of the white box attacks, increasing the attack strength decreased model performance, eventually bringing accuracy to zero (Fig. B1), and additionally that, for any given constraint and attack strength, increasing PGD iterations from one to many increased attack effectiveness (Fig. B2). Furthermore, to be extra cautious, we followed the method Athalye et al. [68] used to break Stochastic Activation Pruning [19] and replaced ∇x f(x) with

$$\sum_{i=1}^{k} \nabla_x f(x)$$

effectively performing Monte Carlo (MC) sampling of the loss gradient at every PGD iteration. While with high perturbation strength even k = 1 was sufficient to bring our model's accuracy to zero, we find that k = 10 generally achieved higher attack success at intermediate and large perturbation strengths (Table B1), and thus we used this approach for all attacks on stochastic models.



Figure B1: VOneResNet50 improves robustness to white box attacks in a wide range of perturbation strengths. White box accuracy vs. perturbation strength curves for PGD attacks with constraints L∞, L2, and L1, for ResNet50, VOneResNet50, ATL∞, and ANT3×3+SIN. Adding the VOneBlock to ResNet50 consistently improves robustness under all constraints and attack strengths: VOneResNet50 can withstand perturbations roughly 4 times higher than the standard ResNet50 for the same performance level. Remarkably, VOneResNet50 outperformed the adversarially trained model (ATL∞) for a considerable range of perturbation strengths, particularly with the L1 constraint. Adversarial noise training combined with Stylized ImageNet (ANT3×3+SIN) has virtually no improvement in robustness to white box attacks. In all white box attacks and for all models, increasing the attack strength decreased performance, eventually bringing accuracy to zero.


Figure B2: White box attack effectiveness converges for a large number of PGD iteration steps. White box accuracy vs. iteration curves for PGD attacks with δ∞ = 1/255, δ2 = 0.6, and δ1 = 160 constraints. Increasing the number of PGD iteration steps makes the attack more effective, generally converging after 32 iterations.

Table B1: Improving the effectiveness of white box attacks with Monte Carlo sampling. White box attack accuracy (top-1) under different norm constraints, and the white box overall mean, using standard PGD and PGD with Monte Carlo sampling of the loss gradients (PGD-MC, k = 10), on VOneResNet50 for 64 PGD iteration steps. For all stochastic models we used PGD with Monte Carlo sampling in the white box attacks throughout the paper. Values are mean and SD (n=3 seeds).

                 White box   PGD-L1                PGD-L2                 PGD-L∞
Attack type      Mean [%]    40 [%]     160 [%]    0.15 [%]   0.6 [%]     1/1020 [%]  1/255 [%]
Standard PGD     52.8±0.2    64.4±0.4   42.2±0.1   65.2±0.3   45.3±0.4    63.4±0.4    36.4±0.4
PGD-MC (k=10)    51.1±0.4    64.2±0.5   40.4±0.5   65.3±0.4   44.3±0.3    61.9±0.4    30.3±0.5


Although VOneResNet50 was not designed to explicitly improve robustness to white box attacks, it offers substantial gains over the standard ResNet50 model (Table 1). A closer inspection of the white box accuracy vs. perturbation strength curves (Fig. B1) reveals that VOneResNet50 can withstand perturbations roughly 4 times higher than the standard ResNet50 for the same performance level (the blue curve is shifted to the right of the black curve by a factor of 4). To help visualize how this difference translates into actual images, we show in Fig. B3 examples of white box adversarial images for both ResNet50 and VOneResNet50, for the perturbation strengths that bring the respective models to nearly chance performance.


Figure B3: Comparison of adversarial images for ResNet50 and VOneResNet50. Visualization of white box attacks under the 3 different norm constraints (L∞, L2, and L1) for the perturbation strength that brings the model to nearly chance performance (Fig. B1). Left: ResNet50; right: VOneResNet50. For each attack, the perturbation is shown on the left, scaled for visualization, and the perturbed image appears next to it. Perturbation strengths and classes of the adversarial images are shown over the images.


B2 Additional attack controls

The field of adversarial robustness is littered with examples of proposed defense methods that were quickly defeated with some minor attack optimizations or adaptations [68, 84, 97]. For this reason, we took great care to verify our claims of improved robustness of the VOneNets. In addition to the implementation details and controls described in the previous section, we validated our results by further optimizing our attacks on both the standard stochastic VOneResNet50 and the VOneResNet50 with stochasticity removed during the attack and inference.

First, on the standard stochastic VOneResNet50, we ensured that our attack hyper-parameters, including step-size (α), number of gradient samples (k), and PGD iterations, are sufficient for evaluating the network's robustness. For PGD with δ∞ = 1/255 with 64 iterations, we performed a large grid search over α and k, observing only a marginal increase in attack effectiveness (Fig. B4). For the most effective step size (α = ε/8), increasing the number of gradient samples beyond k = 16 no longer decreases accuracy under the attack. At k = 128, VOneResNet50 accuracy is only reduced from 29.15% to 26.0%, remaining a large margin above the ResNet50 accuracy of 0.8%. Additionally, increasing the number of PGD iterations all the way up to 1000 (with α = ε/8 and k = 16) only further reduced performance by 0.5%. Finally, using 5-10 random restarts did not improve the attack beyond the most effective k and α settings.


Figure B4: Further attack optimization does not overturn results. White box accuracy for PGD with δ∞ = 1/255, 64 iterations, on 500 images. A large grid search over step size and number of gradient samples only marginally improves attack effectiveness; further attack optimization only marginally reduces model performance. Lines represent mean and shaded areas SD (n=3 seeds).

Second, while boundary attacks are not ideal against the standard stochastic VOneResNet50 due to the shifting boundary line, we investigated VOneResNet50 without stochasticity during attack/inference using the Brendel and Bethge (B&B) attack [10], a powerful white box adversary that aims to find minimal adversarial perturbations along the decision boundary of the network. We reasoned that if the B&B attack offers similar results to those observed using PGD, this is a good sanity check that our observed robustness is reliable and not a result of masked gradients around the input or a quirk of PGD. Using the FoolBox [98] implementation of the L∞ B&B attack, initialized with 500 successful adversarial images generated using PGD δ∞ = 8/255, we see nearly identical scores as compared to standard PGD with optimal hyper-parameters at δ∞ = 1/1020 and δ∞ = 1/255.

We conclude that these additional controls do not qualitatively change our reported results and provide further evidence that the observed improved robustness of the VOneNets is real and not an artifact.


B3 Common Image Corruptions

For evaluating model robustness to common image corruptions we used ImageNet-C, a dataset publicly available at https://github.com/hendrycks/robustness [13]. It consists of 15 different types of corruptions, each at five levels of severity, for a total of 75 different perturbations (Fig. B5). The corruption types fall into four categories: noise, blur, weather, and digital effects. Noise includes Gaussian noise, shot noise, and impulse noise; blur includes motion blur, defocus blur, zoom blur, and glass blur; weather includes snow, frost, fog, and brightness; digital effects include JPEG compression, elastic transformations, contrast, and pixelation. For every corruption type and at every severity, we tested on all 50,000 images from the ImageNet validation set. We note that these images have been JPEG compressed by the original authors for sharing, which is known to have a minor but noticeable effect on final network performance.
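A hedged sketch of this evaluation loop follows; the corruption names match the public ImageNet-C release, while the loader helper is an assumed placeholder:

```python
# Sketch of the ImageNet-C evaluation: top-1 accuracy for each of the 15 corruption
# types at each of the 5 severities. make_loader is a placeholder for a DataLoader
# over the corrupted validation images.
import torch

CORRUPTIONS = ["gaussian_noise", "shot_noise", "impulse_noise", "defocus_blur",
               "glass_blur", "motion_blur", "zoom_blur", "snow", "frost", "fog",
               "brightness", "contrast", "elastic_transform", "pixelate",
               "jpeg_compression"]

@torch.no_grad()
def corruption_accuracy(model, make_loader):
    model.eval()
    results = {}
    for corruption in CORRUPTIONS:
        for severity in range(1, 6):
            correct = total = 0
            for images, labels in make_loader(corruption, severity):
                preds = model(images).argmax(dim=1)
                correct += (preds == labels).sum().item()
                total += labels.numel()
            results[(corruption, severity)] = correct / total
    return results
```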


Figure B5: Common image corruptions with intermediate severity. Visualization of all 15 common image corruption types, evaluated at severity = 3. First row: original image followed by the noise corruptions; second row: blur corruptions; third row: weather corruptions; fourth row: digital corruptions.



Figure B6: VOneResNet50 improves robustness to noise, blur, and digital corruptions. Corruption accuracy vs. severity curves for the four categories of common image corruptions: noise, blur, weather, and digital. Adding the VOneBlock to ResNet50 improves robustness to all corruption categories except weather. The adversarially trained model (ATL∞) is considerably worse at common image corruptions. Adversarial noise combined with Stylized ImageNet training (ANT3×3+SIN) is consistently more robust to all corruption categories. For ResNet50 and VOneResNet50, error-bars represent SD (n=3 seeds).

Table B2: Detailed common image corruption results. Adding the VOneBlock to ResNet50 improves robustness to several of the corruption types. However, for others, particularly fog and contrast, robustness decreases. Interestingly, these are the corruption types on which the adversarially trained model (ATL∞) also performs worst. Adversarial noise combined with Stylized ImageNet training (ANT3×3+SIN) is the most robust model for most of the corruption types. For ResNet50 and VOneResNet50, values are mean and SD (n=3 seeds).

                   Noise                              Blur
Model              Gaussian   Shot       Impulse     Defocus    Glass      Motion     Zoom
                   [%]        [%]        [%]         [%]        [%]        [%]        [%]
ResNet50 [4]       29.6±1.1   27.9±0.9   24.5±1.2    38.8±0.7   26.1±0.6   37.9±0.4   34.5±0.7
ATL∞ [76]          23.6       22.3       14.8        24.4       34.3       33.2       34.9
ANT3×3+SIN [59]    61.6       60.7       59.9        41.5       38.6       46.0       37.3
VOneResNet50       34.6±1.4   33.4±1.2   31.9±1.5    37.8±0.1   35.7±0.1   37.4±0.2   34.0±0.2

                   Weather                                      Digital
Model              Snow       Frost      Fog        Bright     Contrast   Elastic    Pixelate   JPEG
                   [%]        [%]        [%]        [%]        [%]        [%]        [%]        [%]
ResNet50           30.1±0.5   36.7±0.4   43.6±0.8   66.8±0.2   38.8±0.5   44.8±0.3   47.0±2.7   55.1±1.0
ATL∞               29.5       31.2        8.4       54.1       10.2       48.4       56.3       58.7
ANT3×3+SIN         47.7       52.2       54.9       68.2       49.3       52.7       58.9       59.3
VOneResNet50       25.2±0.5   36.8±0.4   30.1±0.5   62.4±0.3   28.5±0.5   48.7±0.1   63.3±0.2   61.0±0.3


B4 Other defense methods

We compared the performance of VOneResNet50 in the described computer vision benchmarks with two training-based defense methods using the ResNet50 architecture: adversarial training with a δ∞ = 4/255 constraint (ATL∞) [76], and adversarial noise with Stylized ImageNet training (ANT3×3+SIN) [59].

The ATL∞ model was downloaded from the publicly available adversarially trained models at https://github.com/MadryLab/robustness [76]. This model was trained following the adversarial training paradigm of Madry et al. [17] to solve the min-max problem

$$\min_{\theta}\rho(\theta), \quad \text{where} \quad \rho(\theta) = \mathbb{E}_{(x,y)\sim D}\left[\max_{\delta\in S} L(\theta, x+\delta, y)\right] \qquad (1)$$

Here the goal is to learn parameters θ minimizing the loss L for training images x and labels y drawn from D, while perturbing x with δ ∈ S to maximally increase the loss. In this case, D is ImageNet, and PGD with δ∞ = 4/255 is used to find the perturbation maximizing the loss for a given (x, y) pair. The δ∞ = 4/255 constraint model was selected to compare against because it had the best performance on our conglomerate benchmark of adversarial attacks and common image corruptions.
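As an illustration, a minimal, self-contained sketch of one such adversarial training step is given below; the inner maximization is approximated with a few PGD steps, and the step counts and step size are illustrative rather than those used to train ATL∞:

```python
# Sketch of one Madry-style adversarial training step solving Eq. (1): the inner
# maximization finds a worst-case delta in the L-inf ball S; the outer minimization
# takes a gradient step on the adversarially perturbed batch.
import torch
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, x, y, eps=4/255, n_pgd=7):
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(n_pgd):  # inner max: PGD on delta
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + (eps / 4) * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)
    optimizer.zero_grad()   # outer min: descend on the worst-case loss
    loss = F.cross_entropy(model(x + delta.detach()), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```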

The ANT3×3+SIN model was obtained from https://github.com/bethgelab/game-of-noise. Recently, Rusak et al. showed that training a ResNet50 model with several types of additive input noise improved robustness to common image corruptions [59]. Inspired by this observation, the authors decided to train a model while optimizing a noise distribution that maximally confuses it. Similarly to standard adversarial training, this results in solving a min-max problem

$$\min_{\theta}\max_{\phi}\ \mathbb{E}_{(x,y)\sim D}\,\mathbb{E}_{\delta\sim p_{\phi}(\delta)}\left[L(\theta, x+\delta, y)\right] \qquad (2)$$

where p_φ(δ) is the maximally confusing noise distribution. The main difference to regular adversarial training in Equation 1 is that, while in the former δ is optimized directly, here δ is found by optimizing a constrained distribution with local spatial correlations. Complementing the adversarial noise training with Stylized ImageNet training [25] produced the model with the current best robustness on the ImageNet-C benchmark as a standalone method.


C VOneNet implementation details

C1 Convolutional layer

The convolutional layer of the VOneBlock is a mathematically parameterized Gabor filter bank (GFB). We set the stride of the GFB to four, originating a 56×56 spatial map of activations. Since the number of channels in most CNNs' first convolution is relatively small (64 in the architectures adapted), we used a larger number in the VOneBlock so that the Gabors would cover the large parameter space and better approximate primate V1. We set the main VOneNet models to contain 512 channels, equally split between simple and complex cells (see Fig. C2 A for an analysis of how the number of channels affects performance). Each channel in the GFB convolves a single color channel from the input image.

The Gabor function consists of a two-dimensional grating with a Gaussian envelope and is described by the following equation:

$$G_{\theta,f,\phi,n_x,n_y}(x,y) = \frac{1}{2\pi\sigma_x\sigma_y}\exp\!\left[-0.5\left(\frac{x_{rot}^{2}}{\sigma_x^{2}} + \frac{y_{rot}^{2}}{\sigma_y^{2}}\right)\right]\cos(2\pi f x_{rot} + \phi) \qquad (3)$$

where

$$x_{rot} = x\cos(\theta) + y\sin(\theta), \qquad y_{rot} = -x\sin(\theta) + y\cos(\theta) \qquad (4)$$

$$\sigma_x = \frac{n_x}{f}, \qquad \sigma_y = \frac{n_y}{f} \qquad (5)$$

x_rot and y_rot are the orthogonal and parallel orientations relative to the grating, θ is the angle of the grating orientation, f is the spatial frequency of the grating, φ is the phase of the grating relative to the Gaussian envelope, and σx and σy are the standard deviations of the Gaussian envelope orthogonal and parallel to the grating, which can be defined as multiples (nx and ny) of the grating cycle (the inverse of the frequency).
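A NumPy sketch of Eqs. (3)-(5) follows, building a single Gabor kernel from the five sampled parameters; the pixels-per-degree conversion assumes the 8° field of view over a 224px input described in the text, and the variable names are ours:

```python
# Sketch of a single Gabor kernel from Eqs. (3)-(5). Angles in radians, f in cycles
# per degree, nx/ny in cycles; ppd converts the pixel grid to degrees.
import numpy as np

def gabor_kernel(theta, f, phi, nx, ny, ksize=25, ppd=224 / 8):
    sigma_x, sigma_y = nx / f, ny / f                       # Eq. (5), in degrees
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1] / ppd   # pixel grid in degrees
    x_rot = x * np.cos(theta) + y * np.sin(theta)           # Eq. (4)
    y_rot = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-0.5 * (x_rot**2 / sigma_x**2 + y_rot**2 / sigma_y**2))
    grating = np.cos(2 * np.pi * f * x_rot + phi)
    return envelope * grating / (2 * np.pi * sigma_x * sigma_y)  # Eq. (3)
```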

Although the Gabor filter greatly reduces the number of parameters defining the linear spatial component of V1 neurons, it still has five parameters per channel. Fortunately, there is a vast literature in neurophysiology with detailed characterizations of primate V1 response properties, which can be used to constrain these parameters. To instantiate a VOneBlock with C_V1 channels, we sampled C_V1 values for each of the parameters according to an empirically constrained distribution (Table C1). Due to the resolution of the input images, we limited the ranges of spatial frequencies (f < 5.6 cpd) and number of cycles (n > 0.1).

Table C1: Empirical distributions used to sample the GFB parameters in the VOneBlock.

Parameter   Reference
θ           De Valois et al. [78], Fig. 5: Orientation tuning (foveal)
f           De Valois et al. [79], Fig. 6: Peak spatial frequency (X cells, foveal)
φ           Uniform distribution ∈ [0, 360] deg
nx, ny      Ringach 2002 [80], Fig. 4: (nx, ny) joint distribution

Critical to the correct implementation of the biologically-constrained parameters of the GFB is the choice of the model's field of view in degrees. As previously mentioned in Section A, we used 8° as the input spatial extent for all CNN models. It is important to note that the set of spatial filters in the VOneBlock's GFB differs considerably from those in the first convolution of most CNNs (Fig. C1). While standard CNNs learn filters that resemble Gabors in their input layer [1, 99, 100], due to the limited span of their kernels, they do not vary significantly in size and spatial frequency. V1 neurons, on the other hand, are known to exhibit a wide range of receptive field properties. This phenomenon is captured in the VOneBlock, with the spatial frequencies and receptive field sizes of Gabors ranging over more than one order of magnitude. Due to this high variability, we set the convolution kernel to be 25×25 px, which is considerably larger than those in standard CNNs (Fig. C1).



Figure C1: Comparison between filters in the first convolution of standard CNNs and the VOneBlock. Example filters from the first convolution in the VOneBlock, ResNet50, CORnet-S, and AlexNet (from left to right). VOneBlock filters are all parameterized as Gabors, varying considerably in size and spatial frequency. Standard CNNs have some filters with shapes other than Gabors, particularly center-surround, and are limited in size by their small kernel. Kernel sizes are 25px, 7px, 7px, and 11px for the VOneBlock, ResNet50, CORnet-S, and AlexNet, respectively.


C2 Nonlinear layer

The VOneBlock's nonlinear layer has two different nonlinearities that are applied to each channel depending on its cell type: a rectified linear transformation for simple cells (6), and the spectral power of a quadrature phase-pair (7) for complex cells:

$$S^{nl}_{\theta,f,\phi,n_x,n_y} = \begin{cases} S^{l}_{\theta,f,\phi,n_x,n_y} & \text{if } S^{l}_{\theta,f,\phi,n_x,n_y} > 0 \\ 0 & \text{otherwise} \end{cases} \qquad (6)$$

$$C^{nl}_{\theta,f,\phi,n_x,n_y} = \frac{1}{\sqrt{2}}\sqrt{\left(C^{l}_{\theta,f,\phi,n_x,n_y}\right)^{2} + \left(C^{l}_{\theta,f,\phi+\pi/2,n_x,n_y}\right)^{2}} \qquad (7)$$

where S^l and S^nl are the linear and nonlinear responses of a simple neuron, and C^l and C^nl are the same for a complex neuron.
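In PyTorch terms, the two nonlinearities can be sketched as follows; tensor names are ours, with c_lin and c_lin_quad denoting responses of two Gabors differing only by a 90° phase shift:

```python
# Sketch of Eqs. (6)-(7): rectification for simple cells, quadrature-pair spectral
# power for complex cells.
import torch
import torch.nn.functional as F

def simple_cell(s_lin):
    return F.relu(s_lin)  # Eq. (6): half-wave rectification

def complex_cell(c_lin, c_lin_quad):
    return torch.sqrt(c_lin**2 + c_lin_quad**2) / (2 ** 0.5)  # Eq. (7)
```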

C3 Neuronal stochasticity generator

In awake monkeys, spike trains of V1 neurons are approximately Poisson, i.e. the variance and mean of spike counts in a given time-window over a set of repetitions are roughly the same [82]. We incorporated stochasticity into the VOneBlock to emulate this property of neuronal responses. Since the Poisson distribution is not continuous, it breaks the gradients in a white box attack, giving a false sense of robustness [68]. In order to avoid this situation and facilitate the evaluation of the model's real robustness, our neuronal stochasticity generator as implemented uses a continuous second-order approximation of Poisson noise, adding Gaussian noise with variance equal to the activation:

$$R_s \sim \mathcal{N}\!\left(\mu = R_{ns},\ \sigma^{2} = R_{ns}\right) \qquad (8)$$

where R_ns and R_s are the non-stochastic and stochastic responses of a neuron.
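A sketch of Eq. (8), written with the reparameterization trick so that gradients pass through the mean, is given below; the small eps constant is our addition for numerical safety:

```python
# Sketch of the neuronal stochasticity generator: Gaussian noise with variance equal
# to the (non-negative) activation, a continuous approximation of Poisson spiking.
import torch

def v1_stochasticity(r_ns, eps=1e-8):
    std = torch.sqrt(torch.clamp(r_ns, min=0.0) + eps)
    return r_ns + std * torch.randn_like(r_ns)  # R_s ~ N(mu=R_ns, sigma^2=R_ns)
```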

To approximate the same levels of neuronal stochasticity as primate V1 neurons, it is critical that the VOneBlock activations are in the same range as V1 neuronal responses (number of spike counts in a given time-window). Thus, we applied an affine transformation to the activations so that both the mean stimulus response and the mean baseline activity are the same as those of a population of V1 neurons (Table C2 shows the mean responses and spontaneous activity of V1 neurons measured in different time-windows).

Table C2: Mean stimulus responses and spontaneous activity of a population of V1 neurons for different time-windows.

Time-window [ms]   Stimulus response [spikes]   Spontaneous activity [spikes]
25                 0.324                        0.073
50                 0.655                        0.145
100                1.05                         0.29
200                1.38                         0.58

Since the exact time-window over which responses in V1 are integrated during visual perception is still an open question, we considered a time-window of 25ms (Fig. C2 B). In order to keep the outputs of the VOneBlock in a range that does not deviate considerably from the typical range of activations in CNNs, so that the model can be efficiently trained using standard training parameters, we applied the inverse transformation to scale the outputs back to their original range after the stochastic layer.

C4 Hyper-parameters and other design choices

We developed VOneNets from three different standard CNN architectures: ResNet50, CORnet-S, and AlexNet. As previously mentioned, we replaced the first block of each architecture by the VOneBlock and the transition layer to create the respective VOneNet. Table C3 contains the layers that were removed for each base architecture. Except for CORnet-S, the removed layers only contained a single convolution, nonlinearity, and maxpool. Since CORnet-S already had a pre-committed set of layers to V1, we replaced them by the VOneBlock. The torchvision implementation of AlexNet contains a combined stride of eight in the removed layers (four in the convolution and two in the maxpool), followed by a stride of one in the second convolution (outside of the removed layers). In order to more easily adapt it to a VOneBlock with a stride of four, we slightly adapted AlexNet's architecture so that it had strides of two in these three layers (first convolution, first maxpool, and second convolution). The results shown in Fig. 2 were obtained using this modified architecture.

Table C3: Layers removed from each base CNN architecture for creating the VOneNets.

Architecture   Layers removed
AlexNet        Conv, ReLU, MaxPool
ResNet50       Conv, Batch-Norm, ReLU, MaxPool
CORnet-S       Conv, Batch-Norm, ReLU, MaxPool, Conv, Batch-Norm, ReLU

The VOneNet architecture was designed to have as few hyper-parameters as possible. When possible, these were either constrained by the base CNN architecture or by neurobiological data: most parameters of the GFB were instantiated by sampling from neuronal distributions of primate V1 neurons; the kernel size was set to 25×25 px to capture the high variability in Gabor sizes; like all other CNN models, the spatial extent of the field of view was set to 8° (Section A). Nevertheless, there were two hyper-parameters where the choice was rather arbitrary: the number of channels in the VOneBlock, and the time-window for integrating V1 responses to scale the activations prior to the stochasticity generator.

Since primate V1 has neurons tuned to a wide range of spatial frequencies for each orientation and at any given location of the visual space [101], we expected that a large number of channels would be required to cover all of the combinations of Gabor parameters. For the VOneResNet50 architecture, we varied the number of channels in the VOneBlock between 64 and 1024, equally split between simple and complex cells, and measured the performance of the several models after training (Fig. C2 A). Clean ImageNet and common image corruption performance improved with channel number. Remarkably, the opposite happened for white box accuracy, with variants of VOneResNet50 with 64 and 128 channels achieving the highest robustness to white box attacks. This result is particularly interesting since it had been shown that increasing the number of channels in all layers of a CNN improves robustness to white box attacks in both standard and adversarially trained models [17]. Therefore, this shows that the improvement in white box robustness of VOneNets, when compared to their base models, cannot be attributed to the higher number of channels. For the main model we set the channel number to 512, as it offered a good compromise between the different benchmarks. Regarding the time-window for integrating V1 responses, we observed small effects on the models' performance when varying it between 25 and 200ms. We chose 25ms for the main model due to a small trend for higher robustness to white box attacks with shorter time-windows.


Figure C2: Exploration of the VOneBlock's hyper-parameters in the VOneResNet50 architecture. A: Performance of different VOneResNet50 models with varying number of channels in the VOneBlock (with a 25ms time-window). B: Same as in A, but varying the time-window for integrating neuronal activity (with 512 channels). Error-bars show SD (n=3 seeds). Red arrows represent the values used in the main model.


C5 Training details

We used PyTorch 0.4.1 and trained the model using ImageNet 2012 [34]. Images were preprocessed: (1) for training, random crop to 224×224 pixels and random flipping left and right; (2) for validation, central crop to 224×224 pixels. Preprocessing was followed by normalization: subtraction and division by [0.5, 0.5, 0.5]. We used a batch size of 256 images and trained either on 2 GPUs (NVIDIA Titan X, GeForce 1080Ti) or 1 GPU (QuadroRTX6000 or V100) for 70 epochs. We use step learning rate scheduling: 0.1 starting learning rate, divided by 10 every 20 epochs. For optimization we use Stochastic Gradient Descent with a weight decay of 0.0001, momentum of 0.9, and a cross-entropy loss between image labels and model predictions (logits).
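A sketch of this recipe, written for a current PyTorch rather than the original 0.4.1, follows; the names are ours:

```python
# Sketch of the training setup: preprocessing, SGD with momentum and weight decay,
# and step learning rate scheduling (divide by 10 every 20 epochs, 70 epochs total).
import torch
from torchvision import transforms

train_tf = transforms.Compose([
    transforms.RandomCrop(224),          # random 224x224 crop for training
    transforms.RandomHorizontalFlip(),   # random left-right flipping
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])
val_tf = transforms.Compose([
    transforms.CenterCrop(224),          # central crop for validation
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])

def make_optimizer(model):
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                                momentum=0.9, weight_decay=1e-4)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)
    return optimizer, scheduler
```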

Model weights and code are available at https://github.com/dicarlolab/vonenet.


D VOneBlock variant details

To investigate which components of the VOneBlock are responsible for the increased robustness, we created six variants of the VOneBlock by removing or modifying one of its parts. In all cases, we trained from scratch a corresponding variant of VOneResNet50 with the modified VOneBlock. All other properties of the model and components of the VOneBlock were left unmodified. Here we describe in more detail each of these variants (the name of the variant refers to the component removed):

• V1 distributions: the GFB parameters were sampled from uniform distributions with the same domain as the empirical distributions used in the default VOneBlock (Table C1)
• Low SF: when sampling the spatial frequencies for the GFB, neurons with low peak spatial frequencies (f < 2 cpd) were removed from the empirical distribution [79]
• High SF: similar to the previous variant, but removing neurons with high peak spatial frequencies (f > 2 cpd)
• Complex NL: complex cell channels were removed from the VOneBlock and replaced by simple cell channels
• Simple NL: simple cell channels were removed from the VOneBlock and replaced by complex cell channels
• V1 stochasticity: the V1 stochasticity generator was removed from the VOneBlock

While all of the variants resulted in worse overall mean performance than the unmodified VOneResNet50 (Table 2), some improved specific benchmarks: the variant without high SF had higher accuracy under noise and blur corruptions (Table D1); the variant without V1 stochasticity had higher accuracy on clean images and on images with noise and weather corruptions (Tables 2 and D1); and the variant without V1 distributions had higher accuracy under white box attacks with high L∞ perturbations (Table D2).

Table D1: Difference between the common image corruption accuracies of the several VOneResNet50 variants and the unmodified VOneResNet50. Removal of simple cells reduced accuracy for all common image corruption types. Removal of some components of the VOneBlock improved accuracy for specific common image corruptions: removing high SF consistently improved performance for noise and blur corruptions, and removing V1 stochasticity improved performance for noise and weather corruptions.

                    Noise                               Blur
Component removed   Gaussian   Shot       Impulse    Defocus    Glass      Motion     Zoom
                    [Δ%]       [Δ%]       [Δ%]       [Δ%]       [Δ%]       [Δ%]       [Δ%]
V1 dist.            -2.8±0.9   -2.4±0.7   -2.9±1.1   -1.6±0.1    0.4±0.2   -0.8±0.3   -0.2±0.5
Low SF              -1.4±0.5   -1.0±0.2   -1.5±0.5   -0.6±0.6   -1.1±0.2   -1.2±0.4   -0.5±0.7
High SF              6.8±1.3    7.3±1.1    7.0±1.3    5.6±0.3   11.5±0.3    3.7±0.0    6.1±0.1
Complex              1.0±1.6    1.2±1.3    0.7±1.8    1.2±0.2   -1.5±0.1    0.2±0.8    0.4±1.1
Simple              -6.1±1.6   -5.4±1.5   -6.4±1.9   -4.1±0.4   -4.2±0.4   -4.9±0.5   -5.4±0.3
V1 stoch.            2.2±2.5    2.2±2.5    2.1±2.9    3.3±0.1   -5.5±0.1    0.9±0.7    0.9±0.1

                    Weather                                      Digital
Component removed   Snow       Frost      Fog        Bright     Contrast   Elastic    Pixelate   JPEG
                    [Δ%]       [Δ%]       [Δ%]       [Δ%]       [Δ%]       [Δ%]       [Δ%]       [Δ%]
V1 dist.            -0.2±0.6   -0.9±0.6   -1.9±1.1   -0.8±0.5   -1.4±0.9    0.3±0.4   -1.7±0.3    0.8±0.5
Low SF              -0.4±0.3   -0.6±0.4    0.9±0.3   -0.2±0.2   -0.4±0.5   -0.9±0.5   -1.3±0.2   -0.5±0.4
High SF              1.4±1.0   -0.7±0.7   -9.5±0.4   -5.2±0.4   -4.4±0.5    2.4±0.2    1.9±0.4   -4.6±0.4
Complex              0.7±0.4    0.4±0.3    4.3±0.2    1.3±0.2    2.9±0.2   -0.5±0.2   -1.2±0.0   -1.9±0.5
Simple              -5.0±0.5   -9.0±0.5   -4.4±0.5   -3.9±0.2   -4.8±0.4   -3.9±0.5   -3.5±0.4   -1.8±0.2
V1 stoch.            2.5±0.4    0.3±0.3   12.0±0.5    2.6±0.3   10.0±0.8   -2.8±0.3   -4.0±0.7   -4.0±0.4


Table D2: Difference between the white box attack accuracies of the several VOneResNet50 variants and the unmodified VOneResNet50. Removal of V1 stochasticity considerably reduced accuracy under all white box attacks. Removal of V1 distributions improved accuracy for the L∞ white box attack with high perturbation strength.

                    PGD-L1                  PGD-L2                  PGD-L∞
Component removed   40 [Δ%]     160 [Δ%]    0.15 [Δ%]   0.6 [Δ%]    1/1020 [Δ%]  1/255 [Δ%]
V1 dist.            -0.5±0.8     0.1±1.1    -0.8±0.2     0.6±1.0     0.3±0.3      3.0±1.8
Low SF              -0.8±1.2    -3.1±1.1    -0.6±1.0    -2.3±0.8    -0.8±1.3     -1.8±1.0
High SF             -2.2±0.9    -4.7±2.6    -2.6±1.5    -5.1±1.2    -2.6±1.4     -5.5±1.1
Complex             -1.3±0.9    -7.4±1.6    -1.1±0.9    -7.2±1.2    -1.9±1.0     -7.9±1.2
Simple              -2.5±0.8    -1.4±1.4    -2.9±1.1    -1.7±1.2    -2.7±1.2     -0.6±0.8
V1 stoch.           -23.7±0.9   -36.5±0.8   -21.6±0.5   -40.1±0.4   -32.2±0.4    -29.2±0.4


  • Introduction
    • Related Work
      • Adversarial Robustness Correlates with V1 Explained Variance
      • VOneNet a Hybrid CNN with a V1 Neural Network Front-End
      • Results
        • Simulating a V1 at the front of CNNs improves robustness to white box attacks
        • VOneNets outperform state-of-the-art methods on a composite set of perturbations
        • All components of the VOneBlock contribute to improved model robustness
          • Discussion
          • Neural Explained Variance
          • Image Perturbations
            • White Box Adversarial Attacks
            • Additional attack controls
            • Common Image Corruptions
            • Other defense methods
              • VOneNet implementation details
                • Convolutional layer
                • Nonlinear layer
                • Neuronal stochasticity generator
                • Hyper-parameters and other design choices
                • Training details
                  • VOneBlock variant details
Page 11: bioRxiv - Simulating a Primary Visual Cortex at the Front of CNNs … · 2020. 6. 16. · 8MIT-IBM Watson AI Lab dapello@mit.edu tmarques@mit.edu Abstract Current state-of-the-art

[10] Wieland Brendel et al ldquoAccurate reliable and fast robustness evaluationrdquo In (July 2019)arXiv 190701003 [statML]

[11] Samuel Dodge and Lina Karam ldquoA Study and Comparison of Human and Deep LearningRecognition Performance Under Visual Distortionsrdquo In (May 2017) arXiv 170502498[csCV]

[12] Robert Geirhos et al ldquoGeneralisation in humans and deep neural networksrdquo In (Aug 2018)arXiv 180808750 [csCV]

[13] Dan Hendrycks and Thomas Dietterich ldquoBenchmarking Neural Network Robustness toCommon Corruptions and Perturbationsrdquo In (Mar 2019) arXiv 190312261 [csLG]

[14] Nilaksh Das et al ldquoKeeping the Bad Guys Out Protecting and Vaccinating Deep Learningwith JPEG Compressionrdquo In (May 2017) arXiv 170502900 [csCV]

[15] Xuanqing Liu et al ldquoTowards Robust Neural Networks via Random Self-ensemblerdquo In (Dec2017) arXiv 171200673 [csLG]

[16] Weilin Xu David Evans and Yanjun Qi ldquoFeature Squeezing Detecting Adversarial Examplesin Deep Neural Networksrdquo In arXiv [csCV] (Apr 2017)

[17] Aleksander Madry et al ldquoTowards Deep Learning Models Resistant to Adversarial AttacksrdquoIn (June 2017) arXiv 170606083 [statML]

[18] Yang Song et al ldquoPixelDefend Leveraging Generative Models to Understand and Defendagainst Adversarial Examplesrdquo In (Oct 2017) arXiv 171010766 [csLG]

[19] Guneet S Dhillon et al ldquoStochastic Activation Pruning for Robust Adversarial Defenserdquo In(Mar 2018) arXiv 180301442 [csLG]

[20] Jacob Buckman et al ldquoThermometer Encoding One Hot Way To Resist Adversarial Exam-plesrdquo In (Feb 2018) URL httpsopenreviewnetforumid=S18Su--CW

[21] Chuan Guo et al ldquoCountering Adversarial Images using Input Transformationsrdquo In (Feb2018) URL httpsopenreviewnetforumid=SyJ7ClWCb

[22] Claudio Michaelis et al ldquoBenchmarking Robustness in Object Detection AutonomousDriving when Winter is Comingrdquo In (2019) pp 1ndash23 arXiv 190707484 URL httparxivorgabs190707484

[23] Andrew Ilyas et al ldquoAdversarial Examples Are Not Bugs They Are Featuresrdquo In (May2019) arXiv 190502175 [statML]

[24] D Linsley et al ldquoWhat are the Visual Features Underlying Human Versus Machine VisionrdquoIn ICCV 2017 ISBN 9781538610343 DOI 101109ICCVW2017331 arXiv 170102704

[25] Robert Geirhos et al ldquoImageNet-Trained CNNs Are Biased Towards Texturerdquo In Inter-national Conference on Learning Representations c (2019) pp 1ndash20 DOI arXiv181112231v1 arXiv 181112231

[26] Daniel L K Yamins et al ldquoPerformance-optimized hierarchical models predict neuralresponses in higher visual cortexrdquo In Proceedings of the National Academy of Sciences11123 (2014) pp 8619ndash8624 ISSN 0027-8424 DOI 101073pnas1403112111 arXiv07061062v1 URL httpwwwpnasorgcgidoi101073pnas1403112111

[27] Charles F Cadieu et al ldquoDeep neural networks rival the representation of primate IT cortexfor core visual object recognitionrdquo In PLoS computational biology 1012 (2014) e1003963

[28] Seyed-Mahdi Khaligh-Razavi and Nikolaus Kriegeskorte ldquoDeep supervised but not unsu-pervised models may explain IT cortical representationrdquo en In PLoS Comput Biol 1011(Nov 2014) e1003915

[29] Umut Guumlccedilluuml and Marcel A J van Gerven ldquoDeep Neural Networks Reveal a Gradientin the Complexity of Neural Representations across the Brainrsquos Ventral Visual PathwayrdquoIn The Journal of neuroscience 3527 (2015) pp 10005ndash10014 ISSN 0270-6474 DOI101523JNEUROSCI5023-142015 arXiv 14116422 URL httparxivorgabs141164227B5C7D0Ahttpdxdoiorg101523JNEUROSCI5023-142015

[30] Santiago A Cadena et al ldquoDeep convolutional models improve predictions of macaque V1responses to natural imagesrdquo en In PLoS Comput Biol 154 (Apr 2019) e1006897

[31] Martin Schrimpf et al ldquoBrain-Score Which Artificial Neural Network for Object Recognitionis most Brain-Likerdquo en Sept 2018 URL httpsdoiorg101101407007

11

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

[32] Jonas Kubilius et al ldquoBrain-Like Object Recognition with High-Performing Shallow Recur-rent ANNsrdquo In Advances in Neural Information Processing Systems 32 Ed by H Wallachet al Curran Associates Inc 2019 pp 12805ndash12816 URL httppapersnipsccpaper9441- brain- like- object- recognition- with- high- performing-shallow-recurrent-annspdf

[33] Adam H Marblestone Greg Wayne and Konrad P Kording ldquoToward an Integration of DeepLearning and Neurosciencerdquo en In Front Comput Neurosci 10 (Sept 2016) p 94

[34] William Lotter Gabriel Kreiman and David Cox ldquoDeep Predictive Coding Networksfor Video Prediction and Unsupervised Learningrdquo In (May 2016) arXiv 160508104[csLG]

[35] Aran Nayebi and Surya Ganguli ldquoBiologically inspired protection of deep networks fromadversarial attacksrdquo In (Mar 2017) arXiv 170309202 [statML]

[36] Jordan Guerguiev Timothy P Lillicrap and Blake A Richards ldquoTowards deep learning withsegregated dendritesrdquo en In Elife 6 (Dec 2017)

[37] Demis Hassabis et al ldquoNeuroscience-Inspired Artificial Intelligencerdquo In Neuron 952 (2017)pp 245ndash258 ISSN 10974199 DOI 101016jneuron201706011 arXiv 14047282

[38] Grace W Lindsay and Kenneth D Miller ldquoHow biological attention mechanisms improvetask performance in a large-scale visual system modelrdquo en In Elife 7 (Oct 2018)

[39] Hanlin Tang et al ldquoRecurrent computations for visual pattern completionrdquo In Proceedingsof the National Academy of Sciences 11535 (2018) pp 8835ndash8840 ISSN 0027-8424 DOI101073pnas1719397115

[40] Zhe Li et al ldquoLearning From Brains How to Regularize Machinesrdquo In (Nov 2019) arXiv191105072 [csLG]

[41] Hosein Hasani Mahdieh Soleymani and Hamid Aghajan ldquoSurround Modulation A Bio-inspired Connectivity Structure for Convolutional Neural Networksrdquo In NeurIPS NeurIPS(2019) pp 15877ndash15888 URL httppapersnipsccpaper9719-surround-modulation-a-bio-inspired-connectivity-structure-for-convolutional-neural-networks

[42] Fabian H Sinz et al ldquoEngineering a Less Artificial Intelligencerdquo en In Neuron 1036 (Sept2019) pp 967ndash979

[43] Anthony M Zador ldquoA critique of pure learning and what artificial neural networks can learnfrom animal brainsrdquo en In Nat Commun 101 (Aug 2019) p 3770

[44] J Deng et al ldquoImageNet A large-scale hierarchical image databaserdquo In 2009 IEEE Confer-ence on Computer Vision and Pattern Recognition June 2009 pp 248ndash255

[45] D H Hubel and T N Wiesel ldquoReceptive Fields Binocular Interaction and FunctionalArchitecture in the Catrsquos Visual Cortexrdquo In Journal of Physiology (1962) pp 106ndash154

[46] J P Jones and L A Palmer ldquoAn evaluation of the two-dimensional Gabor filter model ofsimple receptive fields in cat striate cortexrdquo In Journal of Neurophysiology 586 (1987)pp 1233ndash1258 ISSN 0022-3077 DOI 101152jn19875861233 URL httpwwwphysiologyorgdoi101152jn19875861233

[47] EH Adelson and JR Bergen ldquoSpatiotemporal energy models for the perception of motionrdquoIn Journal of the Optical Society of America A Optics and image science 22 (1985)pp 284ndash299 ISSN 1084-7529 DOI 101364JOSAA2000284 URL httpwwwopticsinfobaseorgabstractcfmURI=josaa-2-2-284

[48] David J Heeger E P Simoncelli and J Anthony Movshon ldquoComputational models ofcortical visual processingrdquo In Proceedings of the National Academy of Sciences of theUnited States of America 932 (1996) pp 623ndash627 ISSN 00278424 DOI 101073pnas932623

[49] M Carandini D J Heeger and J A Movshon ldquoLinearity and normalization in simple cellsof the macaque primary visual cortexrdquo In The Journal of neuroscience the official journalof the Society for Neuroscience 1721 (Nov 1997) pp 8621ndash44 ISSN 0270-6474 URLhttpwwwncbinlmnihgovpubmed9334433

[50] Maximilian Riesenhuber and Tomaso Poggio ldquoHierarchical models of object recognitionin cortexrdquo In Nature Neuroscience 211 (1999) pp 1019ndash1025 ISSN 10976256 DOI10103814819

12

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

[51] Thomas Serre Aude Oliva and Tomaso Poggio ldquoA feedforward architecture accounts forrapid categorizationrdquo en In Proc Natl Acad Sci U S A 10415 (Apr 2007) pp 6424ndash6429

[52] Nicole C Rust et al ldquoSpatiotemporal Elements of Macaque V1 Receptive Fieldsrdquo In Neuron46 (2005) pp 945ndash956 DOI 101016jneuron200505021

[53] Brett Vintch J A Movshon and E P Simoncelli ldquoA Convolutional Subunit Model forNeuronal Responses in Macaque V1rdquo In Journal of Neuroscience 3544 (2015) pp 14829ndash14841 DOI 101523JNEUROSCI2815-132015

[54] Nicolas Papernot et al ldquoDistillation as a Defense to Adversarial Perturbations against DeepNeural Networksrdquo In (Nov 2015) arXiv 151104508 [csCR]

[55] Mayank Singh et al ldquoHarnessing the Vulnerability of Latent Layers in Adversarially TrainedModelsrdquo In (May 2019) arXiv 190505186 [csLG]

[56] Cihang Xie et al ldquoFeature denoising for improving adversarial robustnessrdquo In Proceedingsof the IEEE Conference on Computer Vision and Pattern Recognition 2019 pp 501ndash509

[57] Eric Wong Leslie Rice and J Zico Kolter ldquoFast is better than free Revisiting adversarialtrainingrdquo In (Jan 2020) URL httpsopenreviewnetforumid=BJx040EFvH

[58] Dan Hendrycks et al ldquoAugMix A Simple Data Processing Method to Improve Robustnessand Uncertaintyrdquo In (Dec 2019) arXiv 191202781 [statML]

[59] Evgenia Rusak et al ldquoIncreasing the robustness of DNNs against image corruptions byplaying the Game of Noiserdquo In (Jan 2020) arXiv 200106057 [csCV]

[60] Ali Shafahi et al ldquoAdversarial training for freerdquo In Advances in Neural Information Pro-cessing Systems 32 Ed by H Wallach et al Curran Associates Inc 2019 pp 3358ndash3369

[61] Eric Wong Leslie Rice and J Zico Kolter ldquoFast is better than free Revisiting adversarialtrainingrdquo In (Sept 2020) URL httpsopenreviewnetforumid=BJx040EFvH

[62] Dimitris Tsipras et al ldquoRobustness May Be at Odds with Accuracyrdquo Sept 2019 URLhttpsopenreviewnetforumid=SyxAb30cY7

[63] Yash Sharma and Pin-Yu Chen ldquoAttacking the Madry Defense Model with L1-based Adver-sarial Examplesrdquo In (Oct 2017) arXiv 171010733 [statML]

[64] Lukas Schott et al ldquoTowards the first adversarially robust neural network model on MNISTrdquoIn (May 2018) arXiv 180509190 [csCV]

[65] Cihang Xie et al ldquoMitigating Adversarial Effects Through Randomizationrdquo Feb 2018 URLhttpsopenreviewnetforumid=Sk9yuql0Z

[66] Adnan Siraj Rakin Zhezhi He and Deliang Fan ldquoParametric Noise Injection TrainableRandomness to Improve Deep Neural Network Robustness against Adversarial Attackrdquo In(Nov 2018) arXiv 181109310 [csLG]

[67] Joonho Lee et al ldquoProbAct A Probabilistic Activation Function for Deep Neural NetworksrdquoIn (May 2019) arXiv 190510761 [csLG]

[68] Anish Athalye Nicholas Carlini and David Wagner ldquoObfuscated Gradients Give a FalseSense of Security Circumventing Defenses to Adversarial Examplesrdquo In (Feb 2018) arXiv180200420 [csLG]

[69] Alex Krizhevsky ldquoLearning Multiple Layers of Features from Tiny Imagesrdquo In (Apr 2009)URL httpciteseerxistpsueduviewdocdownloaddoi=10112229220amprep=rep1amptype=pdf

[70] Adam Paszke et al ldquoAutomatic differentiation in PyTorchrdquo In (Oct 2017) URL httpsopenreviewnetforumid=BJJsrmfCZ

[71] Saining Xie et al ldquoAggregated Residual Transformations for Deep Neural Networksrdquo In(Nov 2016) arXiv 161105431 [csCV]

[72] Gao Huang et al ldquoDensely Connected Convolutional Networksrdquo In (Aug 2016) arXiv160806993 [csCV]

[73] Forrest N Iandola et al ldquoSqueezeNet AlexNet-level accuracy with 50x fewer parameters andlt05MB model sizerdquo In (Feb 2016) arXiv 160207360 [csCV]

[74] Ningning Ma et al ldquoShuffleNet V2 Practical Guidelines for Efficient CNN ArchitectureDesignrdquo In (July 2018) arXiv 180711164 [csCV]

[75] Mingxing Tan et al ldquoMnasNet Platform-Aware Neural Architecture Search for Mobilerdquo In(July 2018) arXiv 180711626 [csCV]

13

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

[76] Logan Engstrom et al. Robustness (Python Library). 2019. URL: https://github.com/MadryLab/robustness.

[77] Jeremy Freeman et al. "A functional and perceptual signature of the second visual area in primates". In: Nat. Neurosci. 16.7 (July 2013), pp. 974–981.

[78] Russell L. De Valois, E. W. Yund, and Norva Hepler. "The orientation and direction selectivity of cells in macaque visual cortex". In: Vision Research 22 (1982), pp. 531–544.

[79] Russell L. De Valois, Duane G. Albrecht, and Lisa G. Thorell. "Spatial Frequency Selectivity of Cells in Macaque Visual Cortex". In: Vision Research 22 (1982), pp. 545–559.

[80] Dario L. Ringach. "Spatial Structure and Symmetry of Simple-Cell Receptive Fields in Macaque Primary Visual Cortex". In: Journal of Neurophysiology 88 (2002), pp. 455–463.

[81] Yasmine El-Shamayleh et al. "Visual response properties of V1 neurons projecting to V2 in macaque". In: Journal of Neuroscience 33.42 (2013), pp. 16594–16605. DOI: 10.1523/JNEUROSCI.2753-13.2013.

[82] W. R. Softky and C. Koch. "The highly irregular firing of cortical cells is inconsistent with temporal integration of random EPSPs". In: Journal of Neuroscience 13.1 (1993), pp. 334–350. DOI: 10.1523/jneurosci.13-01-00334.1993.

[83] Diederik P. Kingma and Max Welling. "Auto-Encoding Variational Bayes". Dec. 2013. arXiv: 1312.6114v10 [stat.ML].

[84] Nicholas Carlini et al. "On Evaluating Adversarial Robustness". Feb. 2019. arXiv: 1902.06705v2 [cs.LG].

[85] Nikolaus Kriegeskorte, Marieke Mur, and Peter Bandettini. "Representational similarity analysis - connecting the branches of systems neuroscience". In: Front. Syst. Neurosci. 2 (Nov. 2008), p. 4.

[86] R. Rajalingham, K. Schmidt, and James J. DiCarlo. "Comparison of Object Recognition Behavior in Human and Monkey". In: Journal of Neuroscience 35.35 (2015), pp. 12127–12136. DOI: 10.1523/JNEUROSCI.0573-15.2015.

[87] Cristopher M. Niell and Michael P. Stryker. "Highly selective receptive fields in mouse visual cortex". In: The Journal of Neuroscience 28.30 (July 2008), pp. 7520–7536. DOI: 10.1523/JNEUROSCI.0623-08.2008.

[88] Pieter R. Roelfsema. "Cortical algorithms for perceptual grouping". In: Annual Review of Neuroscience 29 (Jan. 2006), pp. 203–227. DOI: 10.1146/annurev.neuro.29.051605.112939.

[89] Alexander Berardino et al. "Eigen-Distortions of Hierarchical Representations". Oct. 2017. arXiv: 1710.02266 [cs.CV].

[90] Jack Lindsey et al. "A Unified Theory of Early Visual Representations from Retina to Cortex through Anatomically Constrained Deep CNNs". In: ICLR (2019), pp. 1–17. DOI: 10.1101/511535. arXiv: 1901.00945.

[91] Daniel L. K. Yamins and James J. DiCarlo. "Using goal-driven deep learning models to understand sensory cortex". In: Nat. Neurosci. 19.3 (Mar. 2016), pp. 356–365.

[92] Alexander Mathis et al. "DeepLabCut: markerless pose estimation of user-defined body parts with deep learning". In: Nature Neuroscience (2018). DOI: 10.1038/s41593-018-0209-y.

[93] Yann LeCun, Yoshua Bengio, and Geoffrey E. Hinton. "Deep learning". In: Nature 521.7553 (2015), pp. 436–444. DOI: 10.1038/nature14539.

[94] T. Marques, M. Schrimpf, and J. J. DiCarlo. "Hierarchical neural network models that more closely match primary visual cortex tend to better explain higher level visual cortical responses". In: Cosyne. 2020.

[95] Alexey Kurakin, Ian Goodfellow, and Samy Bengio. "Adversarial examples in the physical world". July 2016. arXiv: 1607.02533 [cs.CV].

[96] Maria-Irina Nicolae et al. "Adversarial Robustness Toolbox v1.2.0". In: CoRR 1807.01069 (2018). URL: https://arxiv.org/pdf/1807.01069.


[97] Florian Tramer et al. "On Adaptive Attacks to Adversarial Example Defenses". Feb. 2020. arXiv: 2002.08347 [cs.LG].

[98] Jonas Rauber et al. "Foolbox Native: Fast adversarial attacks to benchmark the robustness of machine learning models in PyTorch, TensorFlow, and JAX". In: Journal of Open Source Software 5.53 (2020), p. 2607. DOI: 10.21105/joss.02607.

[99] Matthew D. Zeiler and Rob Fergus. "Visualizing and Understanding Convolutional Networks". Nov. 2013. arXiv: 1311.2901 [cs.CV].

[100] Chris Olah et al. "An Overview of Early Vision in InceptionV1". In: Distill (2020). https://distill.pub/2020/circuits/early-vision. DOI: 10.23915/distill.00024.002.

[101] Ian Nauhaus et al. "Orthogonal micro-organization of orientation and spatial frequency in primate primary visual cortex". In: Nature Neuroscience 15.12 (2012), pp. 1683–1690. DOI: 10.1038/nn.3255.


Supplementary Material

A Neural Explained Variance

We evaluated how well responses to given images in candidate CNNs explain responses of single V1 neurons to the same images using a standard neural predictivity methodology based on partial least squares (PLS) regression [31, 32]. We used a neural dataset obtained by extracellular recordings from 102 single-units while presenting 450 different images for 100 ms, 20 times each, spanning 4 deg of the visual space. Results from this dataset were originally published in a study comparing V1 and V2 responses to naturalistic textures [77]. To avoid over-fitting, for each CNN we first chose its best V1 layer on a subset of the dataset (n=135 images) and reported the final neural explained variance calculated for the chosen layer on the remaining images of the dataset (n=315 images). Model units were mapped to each V1 neuron linearly using a PLS regression model with 25 components. The mapping was performed for each neuron using 90% of the image responses in the respective dataset and tested on the remaining held-out 10%, in a 10-fold cross-validation strategy. We reported neural explained variance values normalized by the neuronal internal consistency. We calculated V1 neural explained variance for a large pool of CNNs (n=30 models).
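
The following is a minimal sketch of this predictivity pipeline, assuming generic `model_features` (images × units) and `neural_responses` (images × neurons) arrays standing in for the actual activations and recordings; the normalization by internal consistency is omitted for brevity.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import KFold

def neural_predictivity(model_features, neural_responses, n_components=25):
    """Cross-validated explained variance of single V1 neurons."""
    fold_scores = []
    kfold = KFold(n_splits=10, shuffle=True, random_state=0)
    for train_idx, test_idx in kfold.split(model_features):
        pls = PLSRegression(n_components=n_components)
        pls.fit(model_features[train_idx], neural_responses[train_idx])
        predictions = pls.predict(model_features[test_idx])
        actual = neural_responses[test_idx]
        # Per-neuron explained variance on the held-out 10% of images
        ev = 1 - np.var(actual - predictions, axis=0) / np.var(actual, axis=0)
        fold_scores.append(ev)
    # Average over folds; in the paper this is further normalized by the
    # neuronal internal consistency (not shown here)
    return np.mean(fold_scores, axis=0)
```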

When using CNNs in a computer vision task such as object recognition in ImageNet, their field-of-view (in degrees) is not defined, since the only relevant input property is the image resolution (in pixels). However, when using a CNN to model the ventral stream, the visual spatial extent of the model's input is of key importance to ensure that it is correctly mapped to the data it is trying to explain. We assigned a spatial extent of 8 deg to all models, since this value has been previously used in studies benchmarking CNNs as models of the primate ventral stream [31, 94] and is consistent with the results in Cadena et al. [30].


B Image Perturbations

B.1 White Box Adversarial Attacks

For performing white box adversarial attacks, we used untargeted projected gradient descent (PGD) [17] (also referred to as the Basic Iterative Method [95]) with L∞, L2, and L1 norm constraints. Given an image x, this method uses the gradient of the loss to iteratively construct an adversarial image x_adv which maximizes the model loss within an Lp bound around x. Formally, in the case of an L∞ constraint, PGD iteratively computes x_adv as

$$x_{t+1} = \mathrm{Proj}_{x+S}\left(x_t + \alpha \, \mathrm{sgn}\left(\nabla_{x_t} L(\theta, x_t, y)\right)\right)$$

where x is the original image and the Proj operator ensures the final computed adversarial image x_adv is constrained to the space x + S, here the L∞ ball around x. In the case of L1 or L2 constraints, at each iteration $\nabla_{x_t} L(\theta, x_t, y)$ is scaled to have an L1 or L2 norm of α, and the Proj operator ensures the final x_adv is within an L1 or L2 norm ball around x.

For our white box benchmarks, we used δ∞ ∈ [1/1020, 1/255], δ2 ∈ [0.15, 0.6], and δ1 ∈ [40, 160] constraints, where δ = x − x_adv, for a total of six different attack variants with two strengths under each norm constraint. We arrived at these strengths by calibrating the lower strength attack for each norm to give a reasonable degree of variance in normally trained CNNs. Because normally trained CNNs are extremely sensitive to adversarial perturbations, this resulted in perturbations well below the level of perceptibility, and in most cases, particularly with the L∞ constraint, perturbations that would not even be rendered differently by a monitor for display. We set the high attack strength at 4 times the low value, which brought standard ImageNet trained CNNs to nearly chance performance. Still, even this higher perturbation strength remained practically imperceptible (Fig. B3, left). Each perturbation was computed with 64 PGD iterations and a step size α = ε/32, where ε = ‖δ‖p, for the same 5000 images from the ImageNet validation set on a per model basis, and final top-1 accuracy was reported. The Adversarial Robustness Toolbox [96] was used for computing the attacks.
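
As a concrete illustration, here is a minimal sketch of the L∞ variant of this loop for a generic PyTorch classifier with inputs in [0, 1]; the paper itself used the Adversarial Robustness Toolbox, so this standalone version only mirrors the settings quoted above.

```python
import torch
import torch.nn.functional as F

def pgd_linf(model, x, y, epsilon=1/255, n_iter=64):
    """Untargeted L-infinity PGD with the step size used above (alpha = eps/32)."""
    alpha = epsilon / 32
    x_adv = x.clone().detach()
    for _ in range(n_iter):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Ascend the loss, then project back onto the L-inf ball around x
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon)
        x_adv = x_adv.clamp(0, 1)  # keep pixel values valid
    return x_adv.detach()
```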

While adversarial formulations using Lagrangian relaxation, like the C&W attack [7] and the EAD attack [8], are known to be more effective than PGD for finding minimal perturbations under L2 and L1 constraints, these formulations involve computationally expensive line searches, often requiring thousands of iterations to converge, making them computationally difficult to scale to 5000 ImageNet samples for a large number of models. Furthermore, because some of the models we tested operate with stochasticity during inference, we found that other state-of-the-art attacks formulated to efficiently find minimal perturbation distances [9, 10] were generally difficult to tune, as their success becomes stochastic as they near the decision boundary. Thus, we proceeded with the PGD formulation, as we found it to be the most reliable, computationally tractable, and conceptually simple to follow. We share our model and weights, and encourage anyone interested to attack our model to verify our results.

Although the stochastic models were implemented with the reparameterization trick for random sampling, which leaves the gradients intact [83], we performed several sanity checks to make sure that the white box attacks were effectively implemented. Following the advice of Athalye et al. [68] and Carlini et al. [84], we observed that in all of the white box attacks, increasing the attack strength decreased model performance, eventually bringing accuracy to zero (Fig. B1), and additionally, that for any given constraint and attack strength, increasing PGD iterations from one to many increased attack effectiveness (Fig. B2). Furthermore, to be extra cautious, we followed the method Athalye et al. [68] used to break Stochastic Activation Pruning [19] and replaced $\nabla_x f(x)$ with

$$\sum_{i=1}^{k} \nabla_x f(x)$$

effectively performing Monte Carlo (MC) sampling of the loss gradient at every PGD iteration. While with high perturbation strength even k = 1 was sufficient to bring our model's accuracy to zero, we find that k = 10 generally achieved higher attack success at intermediate and large perturbation strengths (Table B1), and thus we used this approach for all attacks on stochastic models.


Figure B1: VOneResNet50 improves robustness to white box attacks over a wide range of perturbation strengths. White box accuracy vs. perturbation strength curves for PGD attacks with L∞ (strengths 1/1020, 1/255, 4/255), L2 (0.15, 0.60, 2.40), and L1 (40, 160, 640) constraints, for ResNet50, VOneResNet50, AT L∞, and ANT3×3+SIN. Adding the VOneBlock to ResNet50 consistently improves robustness under all constraints and attack strengths: VOneResNet50 can withstand perturbations roughly 4 times higher than the standard ResNet50 for the same performance level. Remarkably, VOneResNet50 outperformed the adversarially trained model (AT L∞) for a considerable range of perturbation strengths, particularly with the L1 constraint. Adversarial noise training combined with Stylized ImageNet (ANT3×3+SIN) provides virtually no improvement in robustness to white box attacks. In all white box attacks and for all models, increasing the attack strength decreased performance, eventually bringing accuracy to zero.

Figure B2: White box attack effectiveness converges for a large number of PGD iteration steps. White box accuracy vs. PGD iterations (1 to 64) for PGD attacks with δ∞ = 1/255, δ2 = 0.6, and δ1 = 160 constraints, for ResNet50, VOneResNet50, AT L∞, and ANT3×3+SIN. Increasing the number of PGD iteration steps makes the attack more effective, generally converging after 32 iterations.

Table B1: Improving the effectiveness of white box attacks with Monte Carlo sampling. White box attack accuracy (top-1, %) under different norm constraints, and the overall white box mean, using standard PGD and PGD with Monte Carlo sampling of the loss gradients (PGD-MC, k = 10) on VOneResNet50, for 64 PGD iteration steps. For all stochastic models, we used PGD with Monte Carlo sampling in the white box attacks throughout the paper. Values are mean and SD (n=3 seeds).

Attack type   | Mean     | L1, 40   | L1, 160  | L2, 0.15 | L2, 0.60 | L∞, 1/1020 | L∞, 1/255
Standard PGD  | 52.8±0.2 | 64.4±0.4 | 42.2±0.1 | 65.2±0.3 | 45.3±0.4 | 63.4±0.4   | 36.4±0.4
PGD-MC (k=10) | 51.1±0.4 | 64.2±0.5 | 40.4±0.5 | 65.3±0.4 | 44.3±0.3 | 61.9±0.4   | 30.3±0.5


Although VOneResNet50 was not designed to explicitly improve robustness to white box attacks, it offers substantial gains over the standard ResNet50 model (Table 1). A closer inspection of the white box accuracy vs. perturbation strength curves (Fig. B1) reveals that VOneResNet50 can withstand perturbations roughly 4 times higher than the standard ResNet50 for the same performance level (the blue curve is shifted to the right of the black curve by a factor of 4). To help visualize how this difference translates into actual images, we show in Fig. B3 examples of white box adversarial images for both ResNet50 and VOneResNet50, for the perturbation strengths that bring the respective models to nearly chance performance.

Figure B3: Comparison of adversarial images for ResNet50 and VOneResNet50. Visualization of white box attacks under the 3 different norm constraints (L∞, L2, and L1) for the perturbation strength that brings each model to nearly chance performance (Fig. B1): ‖δ‖∞ = 1/255, ‖δ‖2 = 0.60, and ‖δ‖1 = 160 for ResNet50 (left), and ‖δ‖∞ = 4/255, ‖δ‖2 = 2.40, and ‖δ‖1 = 640 for VOneResNet50 (right). For each attack, the perturbation is shown on the left, scaled for visualization, and the perturbed image appears next to it. Perturbation strengths and classes of the adversarial images (e.g., Sleeping Bag, Barrel, Norwegian Elkhound, and Schipperke for an original Egyptian Cat image) are shown over the images.


B.2 Additional attack controls

The field of adversarial robustness is littered with examples of proposed defense methods that were quickly defeated with some minor attack optimizations or adaptations [68, 84, 97]. For this reason, we took great care to verify our claims of improved robustness of the VOneNets. In addition to the implementation details and controls described in the previous section, we validated our results by further optimizing our attacks on both the standard stochastic VOneResNet50 and the VOneResNet50 with stochasticity removed during the attack and inference.

First, on the standard stochastic VOneResNet50, we ensured that our attack hyper-parameters, including step-size (α), number of gradient samples (k), and PGD iterations, are sufficient for evaluating the network's robustness. For PGD with δ∞ = 1/255 and 64 iterations, we performed a large grid search over α and k, observing only a marginal increase in attack effectiveness (Fig. B4). For the most effective step size (α = ε/8), increasing the number of gradient samples beyond k = 16 no longer decreases accuracy under the attack. At k = 128, VOneResNet50 accuracy is only reduced from 29.15% to 26.0%, remaining a large margin above the ResNet50 accuracy of 0.8%. Additionally, increasing the number of PGD iterations all the way up to 1000 (with α = ε/8 and k = 16) only further reduced performance by 0.5%. Finally, using 5-10 random restarts did not improve the attack beyond the most effective k and α settings.

Figure B4: Further attack optimization does not overturn results. White box accuracy for PGD with δ∞ = 1/255 and 64 iterations, on 500 images, as a function of the number of gradient samples (1 to 128) for step sizes between ε/32 and ε/2, compared with the reported VOneResNet50 and ResNet50 accuracies. A large grid search over step size and number of gradient samples only marginally improves attack effectiveness; further attack optimization only marginally reduces model performance. Lines represent mean and shaded areas SD (n=3 seeds).

Second, while boundary attacks are not ideal against the standard stochastic VOneResNet50 due to the shifting boundary line, we investigated VOneResNet50 without stochasticity during attack/inference using the Brendel and Bethge (B&B) attack [10], a powerful white box adversary that aims to find minimal adversarial perturbations along the decision boundary of the network. We reasoned that if the B&B attack offers similar results to those observed using PGD, this is a good sanity check that our observed robustness is reliable and not a result of masked gradients around the input or a quirk of PGD. Using the Foolbox [98] implementation of the L∞ B&B attack, initialized with 500 successful adversarial images generated using PGD with δ∞ = 8/255, we see nearly identical scores as compared to standard PGD with optimal hyper-parameters at δ∞ = 1/1020 and δ∞ = 1/255.
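
For reference, a hedged sketch of this control, assuming the Foolbox 3 native API (fb.PyTorchModel, fb.attacks.LinfinityBrendelBethgeAttack), is shown below; exact argument names may differ across Foolbox versions, and `model`, `images`, `labels`, and `pgd_adversarials` are placeholders.

```python
import foolbox as fb

# Wrap the deterministic (noise-removed) VOneResNet50 for Foolbox
fmodel = fb.PyTorchModel(model.eval(), bounds=(0, 1))

# B&B searches for minimal perturbations along the decision boundary,
# starting from adversarial images previously found by PGD at 8/255
attack = fb.attacks.LinfinityBrendelBethgeAttack()
raw, clipped, success = attack(
    fmodel, images, labels,
    epsilons=1/255,                    # evaluation budget
    starting_points=pgd_adversarials,  # 500 PGD-found adversarial images
)
```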

We conclude that these additional controls do not qualitatively change our reported results and provide further evidence that the observed improved robustness of the VOneNets is real and not an artifact.


B.3 Common Image Corruptions

For evaluating model robustness to common image corruptions, we used ImageNet-C, a dataset publicly available at https://github.com/hendrycks/robustness [13]. It consists of 15 different types of corruptions, each at five levels of severity, for a total of 75 different perturbations (Fig. B5). The corruption types fall into four categories: noise, blur, weather, and digital effects. Noise includes Gaussian noise, shot noise, and impulse noise; blur includes motion blur, defocus blur, zoom blur, and glass blur; weather includes snow, frost, fog, and brightness; digital effects includes JPEG compression, elastic transformations, contrast, and pixelation. For every corruption type and at every severity, we tested on all 50000 images from the ImageNet validation set. We note that these images have been JPEG compressed by the original authors for sharing, which is known to have a minor but noticeable effect on final network performance.
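
An illustrative evaluation loop over this benchmark might look as follows; `make_imagenet_c_loader` and `top1_accuracy` are placeholder helpers, with the corruption names following the folder layout of the public ImageNet-C release.

```python
CORRUPTIONS = [
    "gaussian_noise", "shot_noise", "impulse_noise",           # noise
    "defocus_blur", "glass_blur", "motion_blur", "zoom_blur",  # blur
    "snow", "frost", "fog", "brightness",                      # weather
    "contrast", "elastic_transform", "pixelate",               # digital
    "jpeg_compression",
]

accuracy = {}
for corruption in CORRUPTIONS:
    for severity in range(1, 6):  # five severity levels per corruption
        loader = make_imagenet_c_loader(corruption, severity)  # placeholder
        accuracy[(corruption, severity)] = top1_accuracy(model, loader)
```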

Figure B5: Common image corruptions with intermediate severity. Visualization of all 15 common image corruption types, evaluated at severity = 3. First row: original image followed by the noise corruptions (Gaussian noise, shot noise, impulse noise); second row: blur corruptions (defocus, glass, motion, and zoom blur); third row: weather corruptions (snow, frost, fog, brightness); fourth row: digital corruptions (contrast, elastic transform, pixelate, JPEG compression).


Figure B6: VOneResNet50 improves robustness to noise, blur, and digital corruptions. Corruption accuracy vs. severity curves for the four categories of common image corruptions (noise, blur, weather, and digital), for ResNet50, VOneResNet50, AT L∞, and ANT3×3+SIN. Adding the VOneBlock to ResNet50 improves robustness to all corruption categories except weather. The adversarially trained model (AT L∞) is considerably worse at common image corruptions. Adversarial noise combined with Stylized ImageNet training (ANT3×3+SIN) is consistently more robust to all corruption categories. For ResNet50 and VOneResNet50, error-bars represent SD (n=3 seeds).

Table B2: Detailed common image corruption results. Accuracy (%) for each corruption type. Adding the VOneBlock to ResNet50 improves robustness to several of the corruption types. However, for others, particularly fog and contrast, robustness decreases. Interestingly, these are the corruption types on which the adversarially trained model (AT L∞) also does the worst. Adversarial noise combined with Stylized ImageNet training (ANT3×3+SIN) is the most robust model for most of the corruption types. For ResNet50 and VOneResNet50, values are mean and SD (n=3 seeds).

Noise and blur corruptions:

Model           | Gaussian | Shot     | Impulse  | Defocus  | Glass    | Motion   | Zoom
ResNet50 [4]    | 29.6±1.1 | 27.9±0.9 | 24.5±1.2 | 38.8±0.7 | 26.1±0.6 | 37.9±0.4 | 34.5±0.7
AT L∞ [76]      | 23.6     | 22.3     | 14.8     | 24.4     | 34.3     | 33.2     | 34.9
ANT3×3+SIN [59] | 61.6     | 60.7     | 59.9     | 41.5     | 38.6     | 46.0     | 37.3
VOneResNet50    | 34.6±1.4 | 33.4±1.2 | 31.9±1.5 | 37.8±0.1 | 35.7±0.1 | 37.4±0.2 | 34.0±0.2

Weather and digital corruptions:

Model           | Snow     | Frost    | Fog      | Bright   | Contrast | Elastic  | Pixelate | JPEG
ResNet50 [4]    | 30.1±0.5 | 36.7±0.4 | 43.6±0.8 | 66.8±0.2 | 38.8±0.5 | 44.8±0.3 | 47.0±2.7 | 55.1±1.0
AT L∞ [76]      | 29.5     | 31.2     | 8.4      | 54.1     | 10.2     | 48.4     | 56.3     | 58.7
ANT3×3+SIN [59] | 47.7     | 52.2     | 54.9     | 68.2     | 49.3     | 52.7     | 58.9     | 59.3
VOneResNet50    | 25.2±0.5 | 36.8±0.4 | 30.1±0.5 | 62.4±0.3 | 28.5±0.5 | 48.7±0.1 | 63.3±0.2 | 61.0±0.3


B.4 Other defense methods

We compared the performance of VOneResNet50 in the described computer vision benchmarks with two training-based defense methods using the ResNet50 architecture: adversarial training with a δ∞ = 4/255 constraint (AT L∞) [76], and adversarial noise combined with Stylized ImageNet training (ANT3×3+SIN) [59].

The AT L∞ model was downloaded from the publicly available adversarially trained models at https://github.com/MadryLab/robustness [76]. This model was trained following the adversarial training paradigm of Madry et al. [17] to solve the min-max problem

$$\min_\theta \rho(\theta), \quad \text{where} \quad \rho(\theta) = \mathbb{E}_{(x,y)\sim D}\left[\max_{\delta \in S} L(\theta, x + \delta, y)\right] \tag{1}$$

Here, the goal is to learn parameters θ minimizing the loss L for training images x and labels y drawn from D, while perturbing x with δ ∈ S to maximally increase the loss. In this case, D is ImageNet, and PGD with δ∞ = 4/255 is used to find the perturbation maximizing the loss for a given (x, y) pair. The δ∞ = 4/255 constraint model was selected to compare against because it had the best performance on our conglomerate benchmark of adversarial attacks and common image corruptions.
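
A minimal sketch of one step of this min-max training, reusing the pgd_linf routine from Section B.1 as the inner maximizer, is shown below; it only illustrates Equation 1 and is not the training code of [76].

```python
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, x, y, epsilon=4/255):
    """One outer-minimization step of Equation 1 with a PGD inner maximizer."""
    model.eval()                                    # inner max: craft x + delta
    x_adv = pgd_linf(model, x, y, epsilon=epsilon)  # from Section B.1 sketch
    model.train()                                   # outer min: descend on the loss
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```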

The ANT3×3+SIN model was obtained from https://github.com/bethgelab/game-of-noise. Recently, Rusak et al. showed that training a ResNet50 model with several types of additive input noise improved the robustness to common image corruptions [59]. Inspired by this observation, the authors decided to train a model while optimizing a noise distribution that maximally confuses it. Similarly to standard adversarial training, this results in solving a min-max problem

$$\min_\theta \max_\phi \; \mathbb{E}_{(x,y)\sim D}\, \mathbb{E}_{\delta \sim p_\phi(\delta)}\left[L(\theta, x + \delta, y)\right] \tag{2}$$

where $p_\phi(\delta)$ is the maximally confusing noise distribution. The main difference to regular adversarial training in Equation 1 is that, while in the former δ is optimized directly, here δ is found by optimizing a constrained distribution with local spatial correlations. Complementing the adversarial noise training with Stylized ImageNet training [25] produced the model with the current best robustness on the ImageNet-C benchmark as a standalone method.


C VOneNet implementation details

C.1 Convolutional layer

The convolutional layer of the VOneBlock is a mathematically parameterized Gabor Filter Bank (GFB). We set the stride of the GFB to four, yielding a 56×56 spatial map of activations. Since the number of channels in most CNNs' first convolution is relatively small (64 in the architectures adapted), we used a larger number in the VOneBlock so that the Gabors would cover the large parameter space and better approximate primate V1. We set the main VOneNet models to contain 512 channels, equally split between simple and complex cells (see Fig. C2A for an analysis of how the number of channels affects performance). Each channel in the GFB convolves a single color channel from the input image.

The Gabor function consists of a two-dimensional grating with a Gaussian envelope and is described by the following equation:

$$G_{\theta,f,\phi,n_x,n_y}(x, y) = \frac{1}{2\pi\sigma_x\sigma_y} \exp\left[-0.5\left(\frac{x_{rot}^2}{\sigma_x^2} + \frac{y_{rot}^2}{\sigma_y^2}\right)\right] \cos\left(2\pi f x_{rot} + \phi\right) \tag{3}$$

where

$$x_{rot} = x\cos(\theta) + y\sin(\theta), \qquad y_{rot} = -x\sin(\theta) + y\cos(\theta) \tag{4}$$

$$\sigma_x = \frac{n_x}{f}, \qquad \sigma_y = \frac{n_y}{f} \tag{5}$$

Here, x_rot and y_rot are the orthogonal and parallel orientations relative to the grating, θ is the angle of the grating orientation, f is the spatial frequency of the grating, φ is the phase of the grating relative to the Gaussian envelope, and σ_x and σ_y are the standard deviations of the Gaussian envelope, orthogonal and parallel to the grating, which can be defined as multiples (n_x and n_y) of the grating cycle (the inverse of the frequency).

Although the Gabor filter greatly reduces the number of parameters defining the linear spatial component of V1 neurons, it still has five parameters per channel. Fortunately, there is a vast literature in neurophysiology with detailed characterizations of primate V1 response properties, which can be used to constrain these parameters. To instantiate a VOneBlock with C_V1 channels, we sampled C_V1 values for each of the parameters according to an empirically constrained distribution (Table C1). Due to the resolution of the input images, we limited the ranges of spatial frequencies (f < 5.6 cpd) and number of cycles (n > 0.1).

Table C1: Empirical distributions used to sample the GFB parameters in the VOneBlock.

Parameter | Reference
θ         | De Valois et al. [78], Fig. 5, orientation tuning (foveal)
f         | De Valois et al. [79], Fig. 6, peak spatial freq. (X cells, foveal)
φ         | Uniform distribution ∈ [0, 360] deg
n_x, n_y  | Ringach 2002 [80], Fig. 4, (n_x, n_y) joint distribution
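
The sketch below builds a single GFB kernel from Equations 3-5; the 25×25 px grid is expressed in degrees using the 8 deg / 224 px field of view from Section A, and the parameter values passed in are placeholders for samples drawn from the Table C1 distributions.

```python
import numpy as np

def gabor_kernel(theta, f, phi, nx, ny, ksize=25, ppd=224 / 8):
    """One Gabor filter of the GFB; f in cycles/deg, ppd = pixels per degree."""
    sigma_x, sigma_y = nx / f, ny / f                      # Equation 5
    half = ksize // 2
    coords = np.arange(-half, half + 1) / ppd              # pixel grid in degrees
    x, y = np.meshgrid(coords, coords)
    x_rot = x * np.cos(theta) + y * np.sin(theta)          # Equation 4
    y_rot = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-0.5 * ((x_rot / sigma_x) ** 2 + (y_rot / sigma_y) ** 2))
    grating = np.cos(2 * np.pi * f * x_rot + phi)
    return envelope * grating / (2 * np.pi * sigma_x * sigma_y)   # Equation 3
```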

Critical to the correct implementation of the biologically-constrained parameters of the GFB is the choice of the model's field of view in degrees. As previously mentioned in Section A, we used 8 deg as the input spatial extent for all CNN models. It is important to note that the set of spatial filters in the VOneBlock's GFB differs considerably from those in the first convolution of most CNNs (Fig. C1). While standard CNNs learn filters that resemble Gabors in their input layer [1, 99, 100], due to the limited span of their kernels, they do not vary significantly in size and spatial frequency. V1 neurons, on the other hand, are known to exhibit a wide range of receptive field properties. This phenomenon is captured in the VOneBlock, with the spatial frequencies and receptive field sizes of the Gabors ranging over more than one order of magnitude. Due to this high variability, we set the convolution kernel to be 25×25 px, which is considerably larger than those in standard CNNs (Fig. C1).


Figure C1: Comparison between filters in the first convolution of standard CNNs and the VOneBlock. Example filters from the first convolution in the VOneBlock, ResNet50, CORnet-S, and AlexNet (from left to right). VOneBlock filters are all parameterized as Gabors, varying considerably in size and spatial frequency. Standard CNNs have some filters with shapes other than Gabors, particularly center-surround, and are limited in size by their small kernels. Kernel sizes are 25 px, 7 px, 7 px, and 11 px for the VOneBlock, ResNet50, CORnet-S, and AlexNet, respectively.


C.2 Nonlinear layer

The VOneBlock's nonlinear layer has two different nonlinearities that are applied to each channel depending on its cell type: a rectified linear transformation for simple cells (6), and the spectral power of a quadrature phase-pair (7) for complex cells:

$$S^{nl}_{\theta,f,\phi,n_x,n_y} = \begin{cases} S^{l}_{\theta,f,\phi,n_x,n_y} & \text{if } S^{l}_{\theta,f,\phi,n_x,n_y} > 0 \\ 0 & \text{otherwise} \end{cases} \tag{6}$$

$$C^{nl}_{\theta,f,\phi,n_x,n_y} = \frac{1}{\sqrt{2}} \sqrt{\left(C^{l}_{\theta,f,\phi,n_x,n_y}\right)^2 + \left(C^{l}_{\theta,f,\phi+\pi/2,n_x,n_y}\right)^2} \tag{7}$$

where S^l and S^nl are the linear and nonlinear responses of a simple neuron, and C^l and C^nl are the same for a complex neuron.
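
In PyTorch terms, these two nonlinearities reduce to the following sketch, where `s` holds the simple-cell channels of the GFB output and `c`, `c_shifted` hold a complex-cell quadrature pair (identical Gabor parameters with phases φ and φ + π/2).

```python
import torch

def simple_cells(s):
    """Equation 6: half-wave rectification of the linear response."""
    return torch.relu(s)

def complex_cells(c, c_shifted):
    """Equation 7: energy of a quadrature phase-pair, scaled by 1/sqrt(2)."""
    return torch.sqrt(c ** 2 + c_shifted ** 2) / (2 ** 0.5)
```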

C.3 Neuronal stochasticity generator

In awake monkeys, spike trains of V1 neurons are approximately Poisson, i.e., the variance and mean of spike counts in a given time-window over a set of repetitions are roughly the same [82]. We incorporated stochasticity into the VOneBlock to emulate this property of neuronal responses. Since the Poisson distribution is not continuous, it breaks the gradients in a white box attack, giving a false sense of robustness [68]. In order to avoid this situation and facilitate the evaluation of the model's real robustness, our neuronal stochasticity generator, as implemented, uses a continuous second-order approximation of Poisson noise by adding Gaussian noise with variance equal to the activation:

$$R_s \sim \mathcal{N}\left(\mu = R_{ns}, \sigma^2 = R_{ns}\right) \tag{8}$$

where R_ns and R_s are the non-stochastic and stochastic responses of a neuron.

To approximate the same levels of neuronal stochasticity as primate V1 neurons, it is critical that the VOneBlock activations are in the same range as the V1 neuronal responses (number of spike counts in a given time-window). Thus, we applied an affine transformation to the activations so that both the mean stimulus response and the mean baseline activity are the same as those of a population of V1 neurons (Table C2 shows the mean responses and spontaneous activity of V1 neurons measured in different time-windows).

Table C2: Mean stimulus responses and spontaneous activity of a population of V1 neurons for different time-windows.

Time-window [ms] | Stimulus response [spikes] | Spontaneous activity [spikes]
25               | 0.324                      | 0.073
50               | 0.655                      | 0.145
100              | 1.05                       | 0.29
200              | 1.38                       | 0.58

Since the exact time-window over which responses in V1 are integrated during visual perception is still an open question, we considered a time-window of 25 ms (Fig. C2B). In order to keep the outputs of the VOneBlock in a range that does not deviate considerably from the typical range of activations in CNNs, so that the model can be efficiently trained using standard training parameters, we applied the inverse transformation to scale the outputs back to their original range after the stochastic layer.
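
Putting Equation 8 and the affine scaling together, a differentiable sketch of the stochasticity generator might read as follows; `scale` and `offset` are placeholders for the affine map fit to the 25 ms row of Table C2, and the reparameterized Gaussian keeps gradients intact for white box attacks.

```python
import torch

def v1_stochasticity(r, scale=1.0, offset=0.0):
    """Reparameterized Gaussian approximation of Poisson noise (Equation 8)."""
    r = scale * r + offset                       # map activations to spike counts
    noise = torch.randn_like(r)                  # reparameterization trick
    r = r + torch.sqrt(torch.relu(r)) * noise    # variance equal to the mean
    return (r - offset) / scale                  # map back to the original range
```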

C.4 Hyper-parameters and other design choices

We developed VOneNets from three different standard CNN architectures: ResNet50, CORnet-S, and AlexNet. As previously mentioned, we replaced the first block of each architecture by the VOneBlock and the transition layer to create the respective VOneNet. Table C3 contains the layers that were removed for each base architecture. Except for CORnet-S, the layers removed only contained a single


convolution, nonlinearity, and maxpool. Since CORnet-S already had a pre-committed set of layers for V1, we replaced them by the VOneBlock. The torchvision implementation of AlexNet contains a combined stride of eight in the removed layers (four in the convolution and two in the maxpool), followed by a stride of one in the second convolution (outside of the removed layers). In order to more easily adapt it to a VOneBlock with a stride of four, we slightly adapted AlexNet's architecture so that it had strides of two in these three layers (first convolution, first maxpool, and second convolution). The results shown in Fig. 2 were obtained using this modified architecture.

Table C3: Layers removed from each base CNN architecture for creating the VOneNets.

Architecture | Layers removed
AlexNet      | Conv, ReLU, MaxPool
ResNet50     | Conv, Batch-Norm, ReLU, MaxPool
CORnet-S     | Conv, Batch-Norm, ReLU, MaxPool, Conv, Batch-Norm, ReLU

The VOneNet architecture was designed to have as few hyper-parameters as possible. When possible, these were either constrained by the base CNN architecture or by neurobiological data: most parameters of the GFB were instantiated by sampling from neuronal distributions of primate V1 neurons; the kernel size was set to 25×25 px to capture the high variability in Gabor sizes; like all other CNN models, the spatial extent of the field of view was set to 8 deg (Section A). Nevertheless, there were two hyper-parameters where the choice was rather arbitrary: the number of channels in the VOneBlock, and the time-window for integrating V1 responses used to scale the activations prior to the stochasticity generator.

Since primate V1 has neurons tuned to a wide range of spatial frequencies for each orientation and at any given location of the visual space [101], we expected that a large number of channels would be required to cover all of the combinations of Gabor parameters. For the VOneResNet50 architecture, we varied the number of channels in the VOneBlock between 64 and 1024, equally split between simple and complex cells, and measured the performance of the several models after training (Fig. C2A). Clean ImageNet and common image corruption performance improved with channel number. Remarkably, the opposite happened for white box accuracy, with variants of VOneResNet50 with 64 and 128 channels achieving the highest robustness to white box attacks. This result is particularly interesting since it had been shown that increasing the number of channels in all layers of a CNN improves robustness to white box attacks in both standard and adversarially trained models [17]. Therefore, this shows that the improvement in white box robustness of VOneNets, when compared to their base models, cannot be attributed to the higher number of channels. For the main model, we set the channel number to 512, as it offered a good compromise between the different benchmarks. Regarding the time-window for integrating V1 responses, we observed small effects on the models' performance when varying it between 25 and 200 ms. We chose 25 ms for the main model due to a small trend for higher robustness to white box attacks with shorter time-windows.

Figure C2: Exploration of the VOneBlock's hyper-parameters in the VOneResNet50 architecture. A: Clean, overall perturbation, white box, and corruption accuracy of different VOneResNet50 models with varying number of channels in the VOneBlock (64 to 1024, with a 25 ms time-window). B: Same as in A, but varying the time-window for integrating neuronal activity (25 to 200 ms, with 512 channels). Error-bars show SD (n=3 seeds). Red arrows represent the values used in the main model.


C.5 Training details

We used PyTorch 0.4.1 and trained the models using ImageNet 2012 [34]. Images were preprocessed (1) for training: random crop to 224×224 pixels and random flipping left and right; (2) for validation: central crop to 224×224 pixels. Preprocessing was followed by normalization: subtraction of and division by [0.5, 0.5, 0.5]. We used a batch size of 256 images and trained either on 2 GPUs (NVIDIA Titan X / GeForce 1080Ti) or 1 GPU (Quadro RTX 6000 or V100) for 70 epochs. We used step learning rate scheduling: a starting learning rate of 0.1, divided by 10 every 20 epochs. For optimization, we used Stochastic Gradient Descent with weight decay 0.0001, momentum 0.9, and a cross-entropy loss between image labels and model predictions (logits).

Model weights and code are available at https://github.com/dicarlolab/vonenet.
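
For concreteness, the recipe above corresponds to roughly the following PyTorch setup (an illustrative sketch; `model` and `train_loader` are placeholders, and the released repository is authoritative).

```python
import torch.nn as nn
import torch.optim as optim

optimizer = optim.SGD(model.parameters(), lr=0.1,
                      momentum=0.9, weight_decay=1e-4)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)
criterion = nn.CrossEntropyLoss()

for epoch in range(70):
    for images, labels in train_loader:   # batch size 256, preprocessed as above
        optimizer.zero_grad()
        criterion(model(images), labels).backward()
        optimizer.step()
    scheduler.step()                      # divide the learning rate every 20 epochs
```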


D VOneBlock variant details

To investigate which components of the VOneBlock are responsible for the increased robustness, we created six variants of the VOneBlock by removing or modifying one of its parts. In all cases, we trained from scratch a corresponding variant of VOneResNet50 with the modified VOneBlock. All other properties of the model and components of the VOneBlock were left unmodified. Here, we describe in more detail each of these variants (the name of the variant refers to the component removed):

• V1 distributions: the GFB parameters were sampled from uniform distributions with the same domain as the empirical distributions used in the default VOneBlock (Table C1).
• Low SF: when sampling the spatial frequencies for the GFB, neurons with low peak spatial frequencies (f < 2 cpd) were removed from the empirical distribution [79].
• High SF: similar to the previous variant, but removing neurons with high peak spatial frequencies (f > 2 cpd).
• Complex NL: complex cell channels were removed from the VOneBlock and replaced by simple cell channels.
• Simple NL: simple cell channels were removed from the VOneBlock and replaced by complex cell channels.
• V1 stochasticity: the V1 stochasticity generator was removed from the VOneBlock.

While all of the variants resulted in worse overall mean performance than the unmodified VOneResNet50 (Table 2), some improved specific benchmarks: the variant without high SF had higher accuracy under noise and blur corruptions (Table D1); the variant without V1 stochasticity had higher accuracy on clean images and images with noise and weather corruptions (Tables 2 and D1); and the variant without V1 distributions had higher accuracy under white box attacks with high L∞ perturbations (Table D2).

Table D1: Difference between the common image corruption accuracies of the several VOneResNet50 variants and the unmodified VOneResNet50. Removal of simple cells reduced accuracy for all common image corruption types. Removal of some components of the VOneBlock improved accuracy on specific common image corruptions: removing high SF consistently improved performance for noise and blur corruptions, and removing V1 stochasticity improved performance for noise and weather corruptions.

Noise and blur corruptions (Δ accuracy, %):

Component removed | Gaussian | Shot     | Impulse  | Defocus  | Glass    | Motion   | Zoom
V1 dist.          | -2.8±0.9 | -2.4±0.7 | -2.9±1.1 | -1.6±0.1 | 0.4±0.2  | -0.8±0.3 | -0.2±0.5
Low SF            | -1.4±0.5 | -1.0±0.2 | -1.5±0.5 | -0.6±0.6 | -1.1±0.2 | -1.2±0.4 | -0.5±0.7
High SF           | 6.8±1.3  | 7.3±1.1  | 7.0±1.3  | 5.6±0.3  | 11.5±0.3 | 3.7±0.0  | 6.1±0.1
Complex           | 1.0±1.6  | 1.2±1.3  | 0.7±1.8  | 1.2±0.2  | -1.5±0.1 | 0.2±0.8  | 0.4±1.1
Simple            | -6.1±1.6 | -5.4±1.5 | -6.4±1.9 | -4.1±0.4 | -4.2±0.4 | -4.9±0.5 | -5.4±0.3
V1 stoch.         | 2.2±2.5  | 2.2±2.5  | 2.1±2.9  | 3.3±0.1  | -5.5±0.1 | 0.9±0.7  | 0.9±0.1

Weather and digital corruptions (Δ accuracy, %):

Component removed | Snow     | Frost    | Fog      | Bright   | Contrast | Elastic  | Pixelate | JPEG
V1 dist.          | -0.2±0.6 | -0.9±0.6 | -1.9±1.1 | -0.8±0.5 | -1.4±0.9 | 0.3±0.4  | -1.7±0.3 | 0.8±0.5
Low SF            | -0.4±0.3 | -0.6±0.4 | 0.9±0.3  | -0.2±0.2 | -0.4±0.5 | -0.9±0.5 | -1.3±0.2 | -0.5±0.4
High SF           | 1.4±1.0  | -0.7±0.7 | -9.5±0.4 | -5.2±0.4 | -4.4±0.5 | 2.4±0.2  | 1.9±0.4  | -4.6±0.4
Complex           | 0.7±0.4  | 0.4±0.3  | 4.3±0.2  | 1.3±0.2  | 2.9±0.2  | -0.5±0.2 | -1.2±0.0 | -1.9±0.5
Simple            | -5.0±0.5 | -9.0±0.5 | -4.4±0.5 | -3.9±0.2 | -4.8±0.4 | -3.9±0.5 | -3.5±0.4 | -1.8±0.2
V1 stoch.         | 2.5±0.4  | 0.3±0.3  | 12.0±0.5 | 2.6±0.3  | 10.0±0.8 | -2.8±0.3 | -4.0±0.7 | -4.0±0.4


Table D2: Difference between the white box attack accuracies of the several VOneResNet50 variants and the unmodified VOneResNet50. Removal of V1 stochasticity considerably reduced accuracy for all white box attacks. Removal of V1 distributions improved accuracy for the L∞ white box attack with high perturbation strength.

Component removed | PGD-L1, 40 | PGD-L1, 160 | PGD-L2, 0.15 | PGD-L2, 0.60 | PGD-L∞, 1/1020 | PGD-L∞, 1/255   (Δ accuracy, %)
V1 dist.          | -0.5±0.8   | 0.1±1.1     | -0.8±0.2     | 0.6±1.0      | 0.3±0.3        | 3.0±1.8
Low SF            | -0.8±1.2   | -3.1±1.1    | -0.6±1.0     | -2.3±0.8     | -0.8±1.3       | -1.8±1.0
High SF           | -2.2±0.9   | -4.7±2.6    | -2.6±1.5     | -5.1±1.2     | -2.6±1.4       | -5.5±1.1
Complex           | -1.3±0.9   | -7.4±1.6    | -1.1±0.9     | -7.2±1.2     | -1.9±1.0       | -7.9±1.2
Simple            | -2.5±0.8   | -1.4±1.4    | -2.9±1.1     | -1.7±1.2     | -2.7±1.2       | -0.6±0.8
V1 stoch.         | -23.7±0.9  | -36.5±0.8   | -21.6±0.5    | -40.1±0.4    | -32.2±0.4      | -29.2±0.4




[76] Logan Engstrom et al Robustness (Python Library) 2019 URL httpsgithubcomMadryLabrobustness

[77] Jeremy Freeman et al ldquoA functional and perceptual signature of the second visual area inprimatesrdquo en In Nat Neurosci 167 (July 2013) pp 974ndash981

[78] Russell L De Valois E W Yund and Norva Hepler ldquoThe orientation and direction selectivityof cells in macaque visual cortexrdquo In Vision Research 22 (1982) pp 531ndash544

[79] Russell L De Valois Duane G Albrecht and Lisa G Thorell ldquoSpatial Frequency Selectivityof Cells in Macaque Visual Cortexrdquo In Vision Research 22 (1982) pp 545ndash559

[80] Dario L Ringach ldquoSpatial Structure and Symmetry of Simple-Cell Receptive Fields inMacaque Primary Visual Cortexrdquo In Journal of Neurophysiology 88 (2002) pp 455ndash463

[81] Yasmine El-Shamayleh et al ldquoVisual response properties of V1 neurons projecting to V2 inmacaquerdquo In Journal of Neuroscience 3342 (2013) pp 16594ndash16605 ISSN 02706474DOI 101523JNEUROSCI2753-132013

[82] W R Softky and C Koch ldquoThe highly irregular firing of cortical cells is inconsistent withtemporal integration of random EPSPsrdquo In Journal of Neuroscience 131 (1993) pp 334ndash350 ISSN 02706474 DOI 101523jneurosci13-01-003341993

[83] Diederik P Kingma and Max Welling ldquoAuto-Encoding Variational Bayesrdquo In (Dec 2013)arXiv 13126114v10 [statML]

[84] Nicholas Carlini et al ldquoOn Evaluating Adversarial Robustnessrdquo In (Feb 2019) arXiv190206705v2 [csLG]

[85] Nikolaus Kriegeskorte Marieke Mur and Peter Bandettini ldquoRepresentational similarityanalysis - connecting the branches of systems neurosciencerdquo en In Front Syst Neurosci 2(Nov 2008) p 4

[86] R Rajalingham K Schmidt and James J DiCarlo ldquoComparison of Object RecognitionBehavior in Human and Monkeyrdquo In Journal of Neuroscience 3535 (2015) pp 12127ndash12136 ISSN 0270-6474 DOI 101523JNEUROSCI0573-152015 URL httpwwwjneurosciorgcgidoi101523JNEUROSCI0573-152015

[87] Cristopher M Niell and Michael P Stryker ldquoHighly selective receptive fields in mouse visualcortexrdquo In The Journal of neuroscience 2830 (July 2008) pp 7520ndash36 ISSN 1529-2401DOI 101523JNEUROSCI0623-082008 URL httpwwwncbinlmnihgovpubmed18650330

[88] Pieter R Roelfsema ldquoCortical algorithms for perceptual groupingrdquo In Annual review ofneuroscience 29March (Jan 2006) pp 203ndash27 ISSN 0147-006X DOI 101146annurevneuro 29 051605 112939 URL http www ncbi nlm nih gov pubmed 16776584

[89] Alexander Berardino et al ldquoEigen-Distortions of Hierarchical Representationsrdquo In (Oct2017) arXiv 171002266 [csCV]

[90] Jack Lindsey et al ldquoA Unified Theory of Early Visual Representations from Retina toCortex through Anatomically Constrained Deep CNNsrdquo In ICLR (2019) pp 1ndash17 DOI101101511535 arXiv 190100945 URL httparxivorgabs190100945

[91] Daniel L K Yamins and James J DiCarlo ldquoUsing goal-driven deep learning models tounderstand sensory cortexrdquo en In Nat Neurosci 193 (Mar 2016) pp 356ndash365

[92] Alexander Mathis et al ldquoDeepLabCut markerless pose estimation of user-defined body partswith deep learningrdquo In Nature Neuroscience (2018) ISSN 1097-6256 DOI 101038s41593-018-0209-y arXiv 180403142 URL httpwwwnaturecomarticless41593-018-0209-y

[93] Yann LeCun Yoshua Bengio and Geoffrey E Hinton ldquoDeep learningrdquo In Nature 5217553(2015) pp 436ndash444 ISSN 14764687 DOI 101038nature14539 arXiv 180707987

[94] T Marques M Schrimpf and J J DiCarlo ldquoHierarchical neural network models thatmore closely match primary visual cortex tend to better explain higher level visual corticalresponsesrdquo In Cosyne 2020

[95] Alexey Kurakin Ian Goodfellow and Samy Bengio ldquoAdversarial examples in the physicalworldrdquo In (July 2016) arXiv 160702533 [csCV]

[96] Maria-Irina Nicolae et al ldquoAdversarial Robustness Toolbox v120rdquo In CoRR 180701069(2018) URL httpsarxivorgpdf180701069

14

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

[97] Florian Tramer et al ldquoOn Adaptive Attacks to Adversarial Example Defensesrdquo In (Feb2020) arXiv 200208347 [csLG]

[98] Jonas Rauber et al ldquoFoolbox Native Fast adversarial attacks to benchmark the robustness ofmachine learning models in PyTorch TensorFlow and JAXrdquo In Journal of Open SourceSoftware 553 (2020) p 2607 DOI 1021105joss02607 URL httpsdoiorg1021105joss02607

[99] Matthew D Zeiler and Rob Fergus ldquoVisualizing and Understanding Convolutional Net-worksrdquo In arXiv (Nov 2013) ISSN 16113349 DOI 101007978- 3- 319- 10590-1_53 arXiv 13112901 URL httplinkspringercom101007978- 3-319 - 10590 - 1 7B 5C _ 7D53 7B 5C 7D5Cnhttp arxiv org abs 1311 29017B5C7D5Cnpapers3publicationuuid44feb4b1-873a-4443-8baa-1730ecd1629120httparxivorgabs13112901

[100] Chris Olah et al ldquoAn Overview of Early Vision in InceptionV1rdquo In Distill (2020)httpsdistillpub2020circuitsearly-vision DOI 1023915distill00024002

[101] Ian Nauhaus et al ldquoOrthogonal micro-organization of orientation and spatial frequency inprimate primary visual cortexrdquo In Nature Neuroscience 1512 (2012) pp 1683ndash1690 ISSN10976256 DOI 101038nn3255


Supplementary Material

A Neural Explained Variance

We evaluated how well responses to given images in candidate CNNs explain responses of single V1 neurons to the same images, using a standard neural predictivity methodology based on partial least squares (PLS) regression [31, 32]. We used a neural dataset obtained by extracellular recordings from 102 single units while presenting 450 different images for 100 ms, 20 times each, spanning 4° of the visual space. Results from this dataset were originally published in a study comparing V1 and V2 responses to naturalistic textures [77]. To avoid over-fitting, for each CNN we first chose its best V1 layer on a subset of the dataset (n=135 images) and reported the final neural explained variance calculated for the chosen layer on the remaining images of the dataset (n=315 images). Model units were mapped to each V1 neuron linearly using a PLS regression model with 25 components. The mapping was performed for each neuron using 90% of the image responses in the respective dataset and tested on the remaining held-out 10%, in a 10-fold cross-validation strategy. We reported neural explained variance values normalized by the neuronal internal consistency. We calculated V1 neural explained variance for a large pool of CNNs (n=30 models).
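As an illustration of this pipeline, the following sketch maps precomputed model activations to neural responses with PLS regression and scores held-out images. Array names and shapes are assumptions, and the mapping is fit jointly across neurons here for brevity (the analysis above fits it per neuron):

```python
# Sketch of the PLS-based neural predictivity metric described above.
# `model_acts` (450 x n_model_units) and `neural_resps` (450 x 102,
# trial-averaged responses) are assumed to be precomputed.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import KFold

def neural_predictivity(model_acts, neural_resps, n_components=25, n_folds=10):
    fold_scores = []
    for train, test in KFold(n_folds, shuffle=True, random_state=0).split(model_acts):
        pls = PLSRegression(n_components=n_components, scale=False)
        pls.fit(model_acts[train], neural_resps[train])
        pred = pls.predict(model_acts[test])
        # Per-neuron r^2 on the held-out images of this fold
        r = [np.corrcoef(pred[:, i], neural_resps[test, i])[0, 1]
             for i in range(neural_resps.shape[1])]
        fold_scores.append(np.square(r))
    # Average over folds; normalization by internal consistency is done downstream
    return np.mean(fold_scores, axis=0)
```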

When using CNNs in a computer vision task such as object recognition in ImageNet, their field of view (in degrees) is not defined, since the only relevant input property is the image resolution (in pixels). However, when using a CNN to model the ventral stream, the visual spatial extent of the model's input is of key importance to ensure that it is correctly mapped to the data it is trying to explain. We assigned a spatial extent of 8° to all models, since this value has been previously used in studies benchmarking CNNs as models of the primate ventral stream [31, 94] and is consistent with the results in Cadena et al. [30].


B Image Perturbations

B.1 White Box Adversarial Attacks

For performing white box adversarial attacks, we used untargeted projected gradient descent (PGD) [17] (also referred to as the Basic Iterative Method [95]) with L∞, L2, and L1 norm constraints. Given an image x, this method uses the gradient of the loss to iteratively construct an adversarial image x_adv which maximizes the model loss within an L_p bound around x. Formally, in the case of an L∞ constraint, PGD iteratively computes x_adv as

x_{t+1} = \mathrm{Proj}_{x+S}\left(x_t + \alpha\,\mathrm{sgn}(\nabla_{x_t} L(\theta, x_t, y))\right)

where x is the original image and the Proj operator ensures the final computed adversarial image x_adv is constrained to the space x + S, here the L∞ ball around x. In the case of L1 or L2 constraints, at each iteration \nabla_{x_t} L(\theta, x_t, y) is scaled to have an L1 or L2 norm of α, and the Proj operator ensures the final x_adv is within an L1 or L2 norm ball around x.

For our white box benchmarks we used ‖δ‖∞ ∈ {1/1020, 1/255}, ‖δ‖2 ∈ {0.15, 0.6}, and ‖δ‖1 ∈ {40, 160} constraints, where δ = x − x_adv, for a total of six different attack variants with two strengths under each norm constraint. We arrived at these strengths by calibrating the lower strength attack for each norm to give a reasonable degree of variance in normally trained CNNs. Because normally trained CNNs are extremely sensitive to adversarial perturbations, this resulted in perturbations well below the level of perceptibility, and in most cases, particularly with the L∞ constraint, perturbations that would not even be rendered differently by a monitor for display. We set the high attack strength at 4 times the low value, which brought standard ImageNet-trained CNNs to nearly chance performance. Still, even this higher perturbation strength remained practically imperceptible (Fig. B3, left). Each perturbation was computed with 64 PGD iterations and a step size α = ε/32, where ε = ‖δ‖_p, for the same 5000 images from the ImageNet validation set on a per-model basis, and final top-1 accuracy was reported. The Adversarial Robustness Toolbox [96] was used for computing the attacks.
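For concreteness, here is a minimal PyTorch sketch of the untargeted L∞ PGD loop described above, with the ε/32 step size; it illustrates the procedure only, not the Adversarial Robustness Toolbox code actually used for the benchmarks:

```python
# Minimal untargeted PGD-Linf sketch (illustrative; the benchmarks in the
# paper were run with the Adversarial Robustness Toolbox instead).
import torch
import torch.nn.functional as F

def pgd_linf(model, x, y, eps=1/255, n_iter=64):
    alpha = eps / 32                       # step size used in the paper
    x_adv = x.clone().detach()
    for _ in range(n_iter):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = (x_adv + alpha * grad.sign()).detach()        # ascend the loss
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # project to Linf ball
        x_adv = x_adv.clamp(0, 1)                              # stay in image range
    return x_adv
```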

While adversarial formulations using Lagrangian relaxation, like the C&W attack [7] and the EAD attack [8], are known to be more effective than PGD for finding minimal perturbations under L2 and L1 constraints, these formulations involve computationally expensive line searches, often requiring thousands of iterations to converge, making them computationally difficult to scale to 5000 ImageNet samples for a large number of models. Furthermore, because some of the models we tested operate with stochasticity during inference, we found that other state-of-the-art attacks formulated to efficiently find minimal perturbation distances [9, 10] were generally difficult to tune, as their success becomes stochastic as they near the decision boundary. Thus, we proceeded with the PGD formulation, as we found it to be the most reliable, computationally tractable, and conceptually simple to follow. We share our model and weights, and encourage anyone interested to attack our model to verify our results.

Although the stochastic models were implemented with the reparameterization trick for random sampling, which leaves the gradients intact [83], we performed several sanity checks to make sure that the white box attacks were effectively implemented. Following the advice of Athalye et al. [68] and Carlini et al. [84], we observed that in all of the white box attacks, increasing the attack strength decreased model performance, eventually bringing accuracy to zero (Fig. B1), and additionally, that for any given constraint and attack strength, increasing PGD iterations from one to many increased attack effectiveness (Fig. B2). Furthermore, to be extra cautious, we followed the method Athalye et al. [68] used to break Stochastic Activation Pruning [19] and replaced \nabla_x f(x) with

\sum_{i=1}^{k} \nabla_x f(x)

effectively performing Monte Carlo (MC) sampling of the loss gradient at every PGD iteration. While with high perturbation strength even k = 1 was sufficient to bring our model's accuracy to zero, we find that k = 10 generally achieved higher attack success at intermediate and large perturbation strengths (Table B1), and thus we used this approach for all attacks on stochastic models.
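Under the same caveats, the modification for stochastic models amounts to summing the loss gradient over k stochastic forward passes at each PGD step:

```python
# Monte Carlo gradient sampling for stochastic models (k passes per PGD
# step); a sketch of the EOT-style estimator described above.
import torch
import torch.nn.functional as F

def mc_grad(model, x_adv, y, k=10):
    grad = torch.zeros_like(x_adv)
    for _ in range(k):
        x_in = x_adv.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_in), y)  # fresh noise sample each pass
        grad += torch.autograd.grad(loss, x_in)[0]
    return grad  # sign() is taken next, so summing vs. averaging is equivalent
```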


[Figure B1: three panels (PGD-L∞, PGD-L2, PGD-L1); x-axis: attack strength (L∞: 1/1020, 1/255, 4/255; L2: 0.15, 0.60, 2.40; L1: 40, 160, 640); y-axis: white box accuracy; models: ResNet50, VOneResNet50, ATL∞, ANT3×3+SIN.]

Figure B1: VOneResNet50 improves robustness to white box attacks over a wide range of perturbation strengths. White box accuracy vs. perturbation strength curves for PGD attacks with L∞, L2, and L1 constraints, for ResNet50, VOneResNet50, ATL∞, and ANT3×3+SIN. Adding the VOneBlock to ResNet50 consistently improves robustness under all constraints and attack strengths: VOneResNet50 can withstand perturbations roughly 4 times higher than the standard ResNet50 for the same performance level. Remarkably, VOneResNet50 outperformed the adversarially trained model (ATL∞) for a considerable range of perturbation strengths, particularly with the L1 constraint. Adversarial noise training combined with Stylized ImageNet (ANT3×3+SIN) shows virtually no improvement in robustness to white box attacks. In all white box attacks and for all models, increasing the attack strength decreased performance, eventually bringing accuracy to zero.

[Figure B2: three panels (PGD-L∞, PGD-L2, PGD-L1); x-axis: PGD iterations (1 to 64); y-axis: white box accuracy; models: ResNet50, VOneResNet50, ATL∞, ANT3×3+SIN.]

Figure B2: White box attack effectiveness converges for a large number of PGD iteration steps. White box accuracy vs. iteration curves for PGD attacks with ‖δ‖∞ = 1/255, ‖δ‖2 = 0.6, and ‖δ‖1 = 160 constraints. Increasing the number of PGD iteration steps makes the attack more effective, generally converging after 32 iterations.

Table B1: Improving the effectiveness of white box attacks with Monte Carlo sampling. White box attack accuracy (top-1) under different norm constraints, and white box overall mean, using standard PGD and PGD with Monte Carlo sampling of the loss gradients (PGD-MC, k = 10) on VOneResNet50, for 64 PGD iteration steps. For all stochastic models, we used PGD with Monte Carlo sampling in the white box attacks throughout the paper. Values are mean and SD (n=3 seeds).

                 White box   PGD-L1               PGD-L2                 PGD-L∞
Attack type      Mean [%]    40 [%]     160 [%]   0.15 [%]   0.60 [%]   1/1020 [%]  1/255 [%]
Standard PGD     52.8±0.2    64.4±0.4   42.2±0.1  65.2±0.3   45.3±0.4   63.4±0.4    36.4±0.4
PGD-MC (k=10)    51.1±0.4    64.2±0.5   40.4±0.5  65.3±0.4   44.3±0.3   61.9±0.4    30.3±0.5


Although VOneResNet50 was not designed to explicitly improve robustness to white box attacks, it offers substantial gains over the standard ResNet50 model (Table 1). A closer inspection of the white box accuracy vs. perturbation strength curves (Fig. B1) reveals that VOneResNet50 can withstand perturbations roughly 4 times higher than the standard ResNet50 for the same performance level (the blue curve is shifted to the right of the black curve by a factor of 4). To help visualize how this difference translates into actual images, we show in Fig. B3 examples of white box adversarial images for both ResNet50 and VOneResNet50, for the perturbation strengths that bring the respective models to nearly chance performance.

[Figure B3: image panels comparing adversarial examples. Left: ResNet50 at ‖δ‖∞ = 1/255, ‖δ‖2 = 0.60, ‖δ‖1 = 160; right: VOneResNet50 at ‖δ‖∞ = 4/255, ‖δ‖2 = 2.40, ‖δ‖1 = 640. Original class: Egyptian Cat; adversarial classes include Sleeping Bag, Barrel, Norwegian Elkhound, and Schipperke.]

Figure B3: Comparison of adversarial images for ResNet50 and VOneResNet50. Visualization of white box attacks under the 3 different norm constraints (L∞, L2, and L1) for the perturbation strength that brings the model to nearly chance performance (Fig. B1). Left: ResNet50; right: VOneResNet50. For each attack, the perturbation is shown on the left (scaled for visualization), and the perturbed image appears next to it. Perturbation strengths and classes of the adversarial images are shown over the images.


B.2 Additional attack controls

The field of adversarial robustness is littered with examples of proposed defense methods that were quickly defeated with some minor attack optimizations or adaptations [68, 84, 97]. For this reason, we took great care to verify our claims of improved robustness of the VOneNets. In addition to the implementation details and controls described in the previous section, we validated our results by further optimizing our attacks on both the standard stochastic VOneResNet50 and the VOneResNet50 with stochasticity removed during the attack and inference.

First, on the standard stochastic VOneResNet50, we ensured that our attack hyper-parameters, including step size (α), number of gradient samples (k), and PGD iterations, are sufficient for evaluating the network's robustness. For PGD with ‖δ‖∞ = 1/255 and 64 iterations, we performed a large grid search over α and k, observing only a marginal increase in attack effectiveness (Fig. B4). For the most effective step size (α = ε/8), increasing the number of gradient samples beyond k = 16 no longer decreases accuracy under the attack. At k = 128, VOneResNet50 accuracy is only reduced from 29.15% to 26.0%, remaining a large margin above the ResNet50 accuracy of 0.8%. Additionally, increasing the number of PGD iterations all the way up to 1000 (with α = ε/8 and k = 16) only further reduced performance by 0.5%. Finally, using 5-10 random restarts did not improve the attack beyond the most effective k and α settings.

[Figure B4: white box accuracy vs. number of gradient samples (up to 128) for step sizes ε/2 to ε/32, with ‖δ‖∞ = 1/255; horizontal reference lines mark the reported VOneResNet50 and ResNet50 accuracies; n=3 seeds.]

Figure B4: Further attack optimization does not overturn results. White box accuracy for PGD with ‖δ‖∞ = 1/255, 64 iterations, on 500 images. A large grid search over step size and number of gradient samples only marginally improves attack effectiveness; further attack optimization only marginally reduces model performance. Lines represent mean and shaded areas SD (n=3 seeds).

Second, while boundary attacks are not ideal against the standard stochastic VOneResNet50 due to the shifting boundary line, we investigated VOneResNet50 without stochasticity during attack/inference using the Brendel and Bethge (B&B) attack [10], a powerful white box adversary that aims to find minimal adversarial perturbations along the decision boundary of the network. We reasoned that if the B&B attack offers similar results to those observed using PGD, this is a good sanity check that our observed robustness is reliable and not a result of masked gradients around the input or a quirk of PGD. Using the Foolbox [98] implementation of the L∞ B&B attack, initialized with 500 successful adversarial images generated using PGD with ‖δ‖∞ = 8/255, we see nearly identical scores as compared to standard PGD with optimal hyper-parameters at ‖δ‖∞ = 1/1020 and ‖δ‖∞ = 1/255.

We conclude that these additional controls do not qualitatively change our reported results, and provide further evidence that the observed improved robustness of the VOneNets is real and not an artifact.


B.3 Common Image Corruptions

For evaluating model robustness to common image corruptions, we used ImageNet-C, a dataset publicly available at https://github.com/hendrycks/robustness [13]. It consists of 15 different types of corruptions, each at five levels of severity, for a total of 75 different perturbations (Fig. B5). The corruption types fall into four categories: noise, blur, weather, and digital effects. Noise includes Gaussian noise, shot noise, and impulse noise; blur includes motion blur, defocus blur, zoom blur, and glass blur; weather includes snow, frost, fog, and brightness; digital effects include JPEG compression, elastic transformations, contrast, and pixelation. For every corruption type and at every severity, we tested on all 50,000 images from the ImageNet validation set. We note that these images have been JPEG compressed by the original authors for sharing, which is known to have a minor but noticeable effect on final network performance.
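Since the evaluation reduces to averaging top-1 accuracy over the 75 corruption-severity combinations, a sketch might look as follows, assuming the corruption/severity folder layout of the public release and a generic `evaluate_top1` helper (both are assumptions, not the authors' code):

```python
# Sketch of the ImageNet-C evaluation: top-1 accuracy averaged over the
# 5 severities of each of the 15 corruption types.
CORRUPTIONS = ["gaussian_noise", "shot_noise", "impulse_noise",
               "defocus_blur", "glass_blur", "motion_blur", "zoom_blur",
               "snow", "frost", "fog", "brightness",
               "contrast", "elastic_transform", "pixelate", "jpeg_compression"]

def imagenet_c_accuracy(model, root, evaluate_top1):
    # `evaluate_top1(model, folder)` stands in for a standard eval loop
    return {c: sum(evaluate_top1(model, f"{root}/{c}/{s}")
                   for s in range(1, 6)) / 5
            for c in CORRUPTIONS}
```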

[Figure B5: 4×4 grid of example images. Panels: Original, Gaussian noise, shot noise, impulse noise; defocus blur, glass blur, motion blur, zoom blur; snow, frost, fog, brightness; contrast, elastic transform, pixelate, JPEG compression.]

Figure B5: Common image corruptions with intermediate severity. Visualization of all 15 common image corruption types, evaluated at severity = 3. First row: original image followed by the noise corruptions; second row: blur corruptions; third row: weather corruptions; fourth row: digital corruptions.


[Figure B6: four panels (Noise, Blur, Weather, Digital); x-axis: severity (1 to 5); y-axis: corruption accuracy; models: ResNet50, VOneResNet50, ATL∞, ANT3×3+SIN.]

Figure B6: VOneResNet50 improves robustness to noise, blur, and digital corruptions. Corruption accuracy vs. severity curves for the four categories of common image corruptions: noise, blur, weather, and digital. Adding the VOneBlock to ResNet50 improves robustness to all corruption categories except weather. The adversarially trained model (ATL∞) is considerably worse at common image corruptions. Adversarial noise combined with Stylized ImageNet training (ANT3×3+SIN) is consistently more robust to all corruption categories. For ResNet50 and VOneResNet50, error-bars represent SD (n=3 seeds).

Table B2: Detailed common image corruption results. Adding the VOneBlock to ResNet50 improves robustness to several of the corruption types. However, for others, particularly fog and contrast, robustness decreases. Interestingly, these are also the corruption types on which the adversarially trained model (ATL∞) performs worst. Adversarial noise combined with Stylized ImageNet training (ANT3×3+SIN) is the most robust model for most of the corruption types. For ResNet50 and VOneResNet50, values are mean and SD (n=3 seeds).

                   Noise                                    Blur
Model              Gaussian [%]  Shot [%]   Impulse [%]     Defocus [%]  Glass [%]  Motion [%]  Zoom [%]
ResNet50 [4]       29.6±1.1      27.9±0.9   24.5±1.2        38.8±0.7     26.1±0.6   37.9±0.4    34.5±0.7
ATL∞ [76]          23.6          22.3       14.8            24.4         34.3       33.2        34.9
ANT3×3+SIN [59]    61.6          60.7       59.9            41.5         38.6       46.0        37.3
VOneResNet50       34.6±1.4      33.4±1.2   31.9±1.5        37.8±0.1     35.7±0.1   37.4±0.2    34.0±0.2

                   Weather                                          Digital
Model              Snow [%]   Frost [%]  Fog [%]    Bright [%]      Contrast [%]  Elastic [%]  Pixelate [%]  JPEG [%]
ResNet50           30.1±0.5   36.7±0.4   43.6±0.8   66.8±0.2        38.8±0.5      44.8±0.3     47.0±2.7      55.1±1.0
ATL∞               29.5       31.2       8.4        54.1            10.2          48.4         56.3          58.7
ANT3×3+SIN         47.7       52.2       54.9       68.2            49.3          52.7         58.9          59.3
VOneResNet50       25.2±0.5   36.8±0.4   30.1±0.5   62.4±0.3        28.5±0.5      48.7±0.1     63.3±0.2      61.0±0.3


B.4 Other defense methods

We compared the performance of VOneResNet50 on the described computer vision benchmarks with two training-based defense methods using the ResNet50 architecture: adversarial training with a ‖δ‖∞ = 4/255 constraint (ATL∞) [76], and adversarial noise with Stylized ImageNet training (ANT3×3+SIN) [59].

The ATL∞ model was downloaded from the publicly available adversarially trained models at https://github.com/MadryLab/robustness [76]. This model was trained following the adversarial training paradigm of Madry et al. [17] to solve the min-max problem

\min_\theta \rho(\theta), \quad \text{where} \quad \rho(\theta) = \mathbb{E}_{(x,y)\sim D}\left[\max_{\delta \in S} L(\theta, x+\delta, y)\right]    (1)

Here the goal is to learn parameters θ minimizing the loss L for training images x and labels y drawn from D, while perturbing x with δ ∈ S to maximally increase the loss. In this case, D is ImageNet, and PGD with ‖δ‖∞ = 4/255 is used to find the perturbation maximizing the loss for a given (x, y) pair. The ‖δ‖∞ = 4/255 constraint model was selected for comparison because it had the best performance on our conglomerate benchmark of adversarial attacks and common image corruptions.
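Schematically, one outer step of Eq. 1 alternates inner maximization (the PGD sketch from Section B.1) with an ordinary gradient step on the adversarial batch. This is an illustrative sketch, not how the downloaded ATL∞ weights were produced, and the number of inner PGD steps is an assumption:

```python
# One outer step of adversarial training (Eq. 1), sketched in PyTorch.
# `pgd_linf` is the attack sketched in Section B.1.
import torch.nn.functional as F

def adv_train_step(model, optimizer, x, y, eps=4/255):
    model.eval()
    x_adv = pgd_linf(model, x, y, eps=eps, n_iter=10)  # inner maximization
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)            # outer minimization
    loss.backward()
    optimizer.step()
    return loss.item()
```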

The ANT3×3+SIN model was obtained from https://github.com/bethgelab/game-of-noise. Recently, Rusak et al. showed that training a ResNet50 model with several types of additive input noise improved the robustness to common image corruptions [59]. Inspired by this observation, the authors decided to train a model while optimizing a noise distribution that maximally confuses it. Similarly to standard adversarial training, this results in solving a min-max problem

\min_\theta \max_\phi \mathbb{E}_{(x,y)\sim D}\,\mathbb{E}_{\delta \sim p_\phi(\delta)}\left[L(\theta, x+\delta, y)\right]    (2)

where p_φ(δ) is the maximally confusing noise distribution. The main difference to regular adversarial training in Equation 1 is that, while in the former δ is optimized directly, here δ is found by optimizing a constrained distribution with local spatial correlations. Complementing the adversarial noise training with Stylized ImageNet training [25] produced the model with the current best robustness on the ImageNet-C benchmark as a standalone method.


C VOneNet implementation details

C.1 Convolutional layer

The convolutional layer of the VOneBlock is a mathematically parameterized Gabor filter bank (GFB). We set the stride of the GFB to four, yielding a 56×56 spatial map of activations. Since the number of channels in most CNNs' first convolution is relatively small (64 in the architectures adapted), we used a larger number in the VOneBlock so that the Gabors would cover the large parameter space and better approximate primate V1. We set the main VOneNet models to contain 512 channels, equally split between simple and complex cells (see Fig. C2 A for an analysis of how the number of channels affects performance). Each channel in the GFB convolves a single color channel from the input image.

The Gabor function consists of a two-dimensional grating with a Gaussian envelope and is described by the following equation:

G_{\theta,f,\phi,n_x,n_y}(x, y) = \frac{1}{2\pi\sigma_x\sigma_y} \exp\left[-0.5\left(\frac{x_{rot}^2}{\sigma_x^2} + \frac{y_{rot}^2}{\sigma_y^2}\right)\right] \cos(2\pi f x_{rot} + \phi)    (3)

where

x_{rot} = x\cos(\theta) + y\sin(\theta)
y_{rot} = -x\sin(\theta) + y\cos(\theta)    (4)

\sigma_x = \frac{n_x}{f}, \quad \sigma_y = \frac{n_y}{f}    (5)

x_rot and y_rot are the orthogonal and parallel orientations relative to the grating, θ is the angle of the grating orientation, f is the spatial frequency of the grating, φ is the phase of the grating relative to the Gaussian envelope, and σ_x and σ_y are the standard deviations of the Gaussian envelope orthogonal and parallel to the grating, which can be defined as multiples (n_x and n_y) of the grating cycle (the inverse of the frequency).
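A direct transcription of Eqs. 3-5 into NumPy, assuming a 224-pixel, 8° input (28 pixels per degree); the grid construction and normalization details are assumptions of this sketch rather than the released implementation:

```python
# Gabor kernel from Eqs. 3-5. `ppd` (pixels per degree) converts the
# biological units (cpd for f, degrees for sigma) to pixels.
import numpy as np

def gabor_kernel(theta, f, phase, nx, ny, ksize=25, ppd=28.0):
    sigx, sigy = nx / f, ny / f                          # Eq. 5, in degrees
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1] / ppd  # pixel grid in degrees
    xr = x * np.cos(theta) + y * np.sin(theta)           # Eq. 4
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-0.5 * ((xr / sigx) ** 2 + (yr / sigy) ** 2))
    carrier = np.cos(2 * np.pi * f * xr + phase)         # Eq. 3
    return envelope * carrier / (2 * np.pi * sigx * sigy)
```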

Although the Gabor filter greatly reduces the number of parameters defining the linear spatial component of V1 neurons, it still has five parameters per channel. Fortunately, there is a vast literature in neurophysiology with detailed characterizations of primate V1 response properties, which can be used to constrain these parameters. To instantiate a VOneBlock with C_V1 channels, we sampled C_V1 values for each of the parameters according to an empirically constrained distribution (Table C1). Due to the resolution of the input images, we limited the ranges of spatial frequencies (f < 5.6 cpd) and number of cycles (n > 0.1).

Table C1: Empirical distributions used to sample the GFB parameters in the VOneBlock.

Parameter   Reference
θ           De Valois et al. [78], Fig. 5: orientation tuning (foveal)
f           De Valois et al. [79], Fig. 6: peak spatial frequency (X cells, foveal)
φ           Uniform distribution ∈ [0, 360] deg
n_x, n_y    Ringach 2002 [80], Fig. 4: (n_x, n_y) joint distribution

Critical to the correct implementation of the biologically-constrained parameters of the GFB is the choice of the model's field of view in degrees. As previously mentioned in Section A, we used 8° as the input spatial extent for all CNN models. It is important to note that the set of spatial filters in the VOneBlock's GFB differs considerably from those in the first convolution of most CNNs (Fig. C1). While standard CNNs learn filters that resemble Gabors in their input layer [1, 99, 100], due to the limited span of their kernels, these filters do not vary significantly in size and spatial frequency. V1 neurons, on the other hand, are known to exhibit a wide range of receptive field properties. This phenomenon is captured in the VOneBlock, with the spatial frequencies and receptive field sizes of its Gabors ranging over more than one order of magnitude. Due to this high variability, we set the convolution kernel to 25×25 px, which is considerably larger than those in standard CNNs (Fig. C1).


[Figure C1: example first-convolution filters for VOneBlock, ResNet50, CORnet-S, and AlexNet; 7 px scale bars shown.]

Figure C1: Comparison between filters in the first convolution of standard CNNs and the VOneBlock. Example filters from the first convolution in the VOneBlock, ResNet50, CORnet-S, and AlexNet (from left to right). VOneBlock filters are all parameterized as Gabors, varying considerably in size and spatial frequency. Standard CNNs have some filters with shapes other than Gabors, particularly center-surround, and are limited in size by their small kernel. Kernel sizes are 25 px, 7 px, 7 px, and 11 px for VOneBlock, ResNet50, CORnet-S, and AlexNet, respectively.


C.2 Nonlinear layer

The VOneBlock's nonlinear layer has two different nonlinearities that are applied to each channel depending on its cell type: a rectified linear transformation for simple cells (6), and the spectral power of a quadrature phase-pair (7) for complex cells:

S^{nl}_{\theta,f,\phi,n_x,n_y} = \begin{cases} S^{l}_{\theta,f,\phi,n_x,n_y} & \text{if } S^{l}_{\theta,f,\phi,n_x,n_y} > 0 \\ 0 & \text{otherwise} \end{cases}    (6)

C^{nl}_{\theta,f,\phi,n_x,n_y} = \frac{1}{\sqrt{2}}\sqrt{\left(C^{l}_{\theta,f,\phi,n_x,n_y}\right)^2 + \left(C^{l}_{\theta,f,\phi+\pi/2,n_x,n_y}\right)^2}    (7)

where S^l and S^nl are the linear and nonlinear responses of a simple neuron, and C^l and C^nl are the same for a complex neuron.
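In code, with the quadrature pair assumed to be stored as two phase-shifted channel groups (a layout assumption of this sketch), the nonlinear stage might read:

```python
# Nonlinear stage of Eqs. 6-7: ReLU for simple cells, quadrature-pair
# energy for complex cells.
import math
import torch

def vone_nonlinearity(simple_lin, complex_q0, complex_q90):
    simple = torch.relu(simple_lin)                                   # Eq. 6
    energy = torch.sqrt(complex_q0 ** 2 + complex_q90 ** 2) / math.sqrt(2.0)  # Eq. 7
    return torch.cat([simple, energy], dim=1)  # concatenate along channels
```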

C.3 Neuronal stochasticity generator

In awake monkeys, spike trains of V1 neurons are approximately Poisson, i.e., the variance and mean of spike counts in a given time-window over a set of repetitions are roughly the same [82]. We incorporated stochasticity into the VOneBlock to emulate this property of neuronal responses. Since the Poisson distribution is not continuous, it breaks the gradients in a white box attack, giving a false sense of robustness [68]. In order to avoid this situation and facilitate the evaluation of the model's real robustness, our neuronal stochasticity generator, as implemented, uses a continuous second-order approximation of Poisson noise by adding Gaussian noise with variance equal to the activation:

R_s \sim \mathcal{N}(\mu = R_{ns}, \sigma^2 = R_{ns})    (8)

where R_ns and R_s are the non-stochastic and stochastic responses of a neuron.

To approximate the same levels of neuronal stochasticity as primate V1 neurons, it is critical that the VOneBlock activations are in the same range as the V1 neuronal responses (number of spike counts in a given time-window). Thus, we applied an affine transformation to the activations so that both the mean stimulus response and the mean baseline activity are the same as those of a population of V1 neurons (Table C2 shows the mean responses and spontaneous activity of V1 neurons measured in different time-windows).

Table C2: Mean stimulus responses and spontaneous activity of a population of V1 neurons for different time-windows.

Time-window [ms]   Stimulus response [spikes]   Spontaneous activity [spikes]
25                 0.324                        0.073
50                 0.655                        0.145
100                1.05                         0.29
200                1.38                         0.58

Since the exact time-window over which responses in V1 are integrated during visual perception is still an open question, we considered a time-window of 25 ms (Fig. C2 B). In order to keep the outputs of the VOneBlock in a range that does not deviate considerably from the typical range of activations in CNNs, so that the model can be efficiently trained using standard training parameters, we applied the inverse transformation to scale the outputs back to their original range after the stochastic layer.
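Putting Section C.3 together, a sketch of the stochastic layer: an affine map into spike-count units, mean-matched Gaussian noise via the reparameterization trick, and the inverse map back. Here `scale` and `offset` stand in for the affine transformation fit to the Table C2 statistics; the clamping to non-negative rates is an assumption needed for the square root:

```python
# Neuronal stochasticity generator (Eq. 8) with affine scaling into
# spike-count units and its inverse.
import torch

def v1_stochasticity(resp, scale, offset):
    r = torch.relu(resp * scale + offset)  # into spike-count range; rates >= 0
    noise = torch.randn_like(r)            # reparameterization trick keeps gradients
    r = r + noise * torch.sqrt(r)          # Eq. 8: N(mu = r, sigma^2 = r)
    return (r - offset) / scale            # inverse transform back to CNN range
```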

C.4 Hyper-parameters and other design choices

We developed VOneNets from three different standard CNN architectures: ResNet50, CORnet-S, and AlexNet. As previously mentioned, we replaced the first block of each architecture by the VOneBlock and the transition layer to create the respective VOneNet. Table C3 contains the layers that were removed for each base architecture. Except for CORnet-S, the removed layers contained only a single convolution, nonlinearity, and maxpool. Since CORnet-S already had a set of layers pre-committed to V1, we replaced them by the VOneBlock. The torchvision implementation of AlexNet contains a combined stride of eight in the removed layers (four in the convolution and two in the maxpool), followed by a stride of one in the second convolution (outside of the removed layers). In order to more easily adapt it to a VOneBlock with a stride of four, we slightly modified AlexNet's architecture so that it had strides of two in these three layers (first convolution, first maxpool, and second convolution). The results shown in Fig. 2 were obtained using this modified architecture.

Table C3: Layers removed from each base CNN architecture for creating the VOneNets.

Architecture   Layers removed
AlexNet        Conv, ReLU, MaxPool
ResNet50       Conv, Batch-Norm, ReLU, MaxPool
CORnet-S       Conv, Batch-Norm, ReLU, MaxPool, Conv, Batch-Norm, ReLU

The VOneNet architecture was designed to have as few hyper-parameters as possible. When possible, these were constrained either by the base CNN architecture or by neurobiological data: most parameters of the GFB were instantiated by sampling from neuronal distributions of primate V1 neurons; the kernel size was set to 25×25 px to capture the high variability in Gabor sizes; and, like all other CNN models, the spatial extent of the field of view was set to 8° (Section A). Nevertheless, there were two hyper-parameters where the choice was rather arbitrary: the number of channels in the VOneBlock, and the time-window for integrating V1 responses used to scale the activations prior to the stochasticity generator.

Since primate V1 has neurons tuned to a wide range of spatial frequencies for each orientation, and at any given location of the visual space [101], we expected that a large number of channels would be required to cover all of the combinations of Gabor parameters. For the VOneResNet50 architecture, we varied the number of channels in the VOneBlock between 64 and 1024, equally split between simple and complex cells, and measured the performance of the several models after training (Fig. C2 A). Clean ImageNet and common image corruption performance improved with channel number. Remarkably, the opposite happened for white box accuracy, with variants of VOneResNet50 with 64 and 128 channels achieving the highest robustness to white box attacks. This result is particularly interesting since it had been shown that increasing the number of channels in all layers of a CNN improves robustness to white box attacks in both standard and adversarially trained models [17]. Therefore, this shows that the improvement in white box robustness of VOneNets, when compared to their base models, cannot be attributed to the higher number of channels. For the main model, we set the channel number to 512, as it offered a good compromise between the different benchmarks. Regarding the time-window for integrating V1 responses, we observed small effects on the models' performance when varying it between 25 and 200 ms. We chose 25 ms for the main model due to a small trend toward higher robustness to white box attacks with shorter time-windows.

[Figure C2: panel A, accuracy vs. number of VOneBlock channels (64 to 1024, with a 25 ms time-window); panel B, accuracy vs. activity time-window (25 to 200 ms, with 512 channels); curves: Clean, Perturbation, White box, Corruption.]

Figure C2: Exploration of the VOneBlock's hyper-parameters in the VOneResNet50 architecture. A: Performance of different VOneResNet50 models with varying numbers of channels in the VOneBlock. B: Same as in A, but varying the time-window for integrating neuronal activity. Error-bars show SD (n=3 seeds). Red arrows represent the values used in the main model.


C.5 Training details

We used PyTorch 0.4.1 and trained the models using ImageNet 2012 [34]. Images were preprocessed (1) for training: random crop to 224×224 pixels and random flipping left and right; (2) for validation: central crop to 224×224 pixels. Preprocessing was followed by normalization: subtraction of and division by [0.5, 0.5, 0.5]. We used a batch size of 256 images and trained either on 2 GPUs (NVIDIA Titan X / GeForce 1080Ti) or 1 GPU (Quadro RTX 6000 or V100) for 70 epochs. We used step learning rate scheduling: a starting learning rate of 0.1, divided by 10 every 20 epochs. For optimization, we used Stochastic Gradient Descent with weight decay 0.0001, momentum 0.9, and a cross-entropy loss between image labels and model predictions (logits).
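The corresponding optimizer and scheduler configuration, mirroring the listed hyper-parameters (`model` and `train_one_epoch` are placeholders for the VOneNet being trained and a standard cross-entropy training loop, not the released code):

```python
# Optimizer/scheduler matching the listed training hyper-parameters.
import torch

optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)

for epoch in range(70):
    train_one_epoch(model, optimizer)  # standard cross-entropy training loop
    scheduler.step()                   # divide the learning rate by 10 every 20 epochs
```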

Model weights and code are available at https://github.com/dicarlolab/vonenet.


D VOneBlock variant details

To investigate which components of the VOneBlock are responsible for the increased robustness, we created six variants of the VOneBlock by removing or modifying one of its parts. In all cases, we trained from scratch a corresponding variant of VOneResNet50 with the modified VOneBlock. All other properties of the model and components of the VOneBlock were left unmodified. Here we describe each of these variants in more detail (the name of the variant refers to the component removed):

• V1 distributions: the GFB parameters were sampled from uniform distributions with the same domain as the empirical distributions used in the default VOneBlock (Table C1).
• Low SF: when sampling the spatial frequencies for the GFB, neurons with low peak spatial frequencies (f < 2 cpd) were removed from the empirical distribution [79].
• High SF: similar to the previous variant, but removing neurons with high peak spatial frequencies (f > 2 cpd).
• Complex NL: complex cell channels were removed from the VOneBlock and replaced by simple cell channels.
• Simple NL: simple cell channels were removed from the VOneBlock and replaced by complex cell channels.
• V1 stochasticity: the V1 stochasticity generator was removed from the VOneBlock.

While all of the variants resulted in worse overall mean performance than the unmodified VOneResNet50 (Table 2), some improved specific benchmarks: the variant without high SF had higher accuracy under noise and blur corruptions (Table D1); the variant without V1 stochasticity had higher accuracy on clean images and on images with noise and weather corruptions (Tables 2 and D1); and the variant without V1 distributions had higher accuracy under white box attacks with high L∞ perturbations (Table D2).

Table D1: Difference between the common image corruption accuracies of the several VOneResNet50 variants and the unmodified VOneResNet50. Removal of simple cells reduced accuracy for all common image corruption types. Removal of some components of the VOneBlock improved accuracy for specific corruption types: removing high SF consistently improved performance for noise and blur corruptions, and removing V1 stochasticity improved performance for noise and weather corruptions.

                    Noise                                      Blur
Component removed   Gaussian [Δ%]  Shot [Δ%]  Impulse [Δ%]     Defocus [Δ%]  Glass [Δ%]  Motion [Δ%]  Zoom [Δ%]
V1 dist.            -2.8±0.9       -2.4±0.7   -2.9±1.1         -1.6±0.1      0.4±0.2     -0.8±0.3     -0.2±0.5
Low SF              -1.4±0.5       -1.0±0.2   -1.5±0.5         -0.6±0.6      -1.1±0.2    -1.2±0.4     -0.5±0.7
High SF             6.8±1.3        7.3±1.1    7.0±1.3          5.6±0.3       11.5±0.3    3.7±0.0      6.1±0.1
Complex             1.0±1.6        1.2±1.3    0.7±1.8          1.2±0.2       -1.5±0.1    0.2±0.8      0.4±1.1
Simple              -6.1±1.6       -5.4±1.5   -6.4±1.9         -4.1±0.4      -4.2±0.4    -4.9±0.5     -5.4±0.3
V1 stoch.           2.2±2.5        2.2±2.5    2.1±2.9          3.3±0.1       -5.5±0.1    0.9±0.7      0.9±0.1

                    Weather                                            Digital
Component removed   Snow [Δ%]  Frost [Δ%]  Fog [Δ%]   Bright [Δ%]     Contrast [Δ%]  Elastic [Δ%]  Pixelate [Δ%]  JPEG [Δ%]
V1 dist.            -0.2±0.6   -0.9±0.6    -1.9±1.1   -0.8±0.5        -1.4±0.9       0.3±0.4       -1.7±0.3       0.8±0.5
Low SF              -0.4±0.3   -0.6±0.4    0.9±0.3    -0.2±0.2        -0.4±0.5       -0.9±0.5      -1.3±0.2       -0.5±0.4
High SF             1.4±1.0    -0.7±0.7    -9.5±0.4   -5.2±0.4        -4.4±0.5       2.4±0.2       1.9±0.4        -4.6±0.4
Complex             0.7±0.4    0.4±0.3     4.3±0.2    1.3±0.2         2.9±0.2        -0.5±0.2      -1.2±0.0       -1.9±0.5
Simple              -5.0±0.5   -9.0±0.5    -4.4±0.5   -3.9±0.2        -4.8±0.4       -3.9±0.5      -3.5±0.4       -1.8±0.2
V1 stoch.           2.5±0.4    0.3±0.3     12.0±0.5   2.6±0.3         10.0±0.8       -2.8±0.3      -4.0±0.7       -4.0±0.4


Table D2: Difference between the white box attack accuracies of the several VOneResNet50 variants and the unmodified VOneResNet50. Removal of V1 stochasticity considerably reduced accuracy for all white box attacks. Removal of V1 distributions improved accuracy for the L∞ white box attack with high perturbation strength.

                    PGD-L1                 PGD-L2                   PGD-L∞
Component removed   40 [Δ%]     160 [Δ%]   0.15 [Δ%]   0.60 [Δ%]    1/1020 [Δ%]  1/255 [Δ%]
V1 dist.            -0.5±0.8    0.1±1.1    -0.8±0.2    0.6±1.0      0.3±0.3      3.0±1.8
Low SF              -0.8±1.2    -3.1±1.1   -0.6±1.0    -2.3±0.8     -0.8±1.3     -1.8±1.0
High SF             -2.2±0.9    -4.7±2.6   -2.6±1.5    -5.1±1.2     -2.6±1.4     -5.5±1.1
Complex             -1.3±0.9    -7.4±1.6   -1.1±0.9    -7.2±1.2     -1.9±1.0     -7.9±1.2
Simple              -2.5±0.8    -1.4±1.4   -2.9±1.1    -1.7±1.2     -2.7±1.2     -0.6±0.8
V1 stoch.           -23.7±0.9   -36.5±0.8  -21.6±0.5   -40.1±0.4    -32.2±0.4    -29.2±0.4




Supplementary MaterialA Neural Explained Variance

We evaluated how well responses to given images in candidate CNNs explain responses of single V1neurons to the same images using a standard neural predictivity methodology based on partial leastsquare (PLS) regression [31 32] We used a neural dataset obtained by extracellular recordings from102 single-units while presenting 450 different images for 100ms 20 times each spanning 4deg ofthe visual space Results from this dataset were originally published in a study comparing V1 and V2responses to naturalistic textures [77] To avoid over-fitting for each CNN we first chose its bestV1 layer on a subset of the dataset (n=135 images) and reported the final neural explained variancecalculated for the chosen layer on the remaining images of the dataset (n=315 images) Model unitswere mapped to each V1 neuron linearly using a PLS regression model with 25 components Themapping was performed for each neuron using 90 of the image responses in the respective datasetand tested on the remaining held-out 10 in a 10-fold cross-validation strategy We reported neuralexplained variance values normalized by the neuronal internal consistency We calculated V1 neuralexplained variance for a large pool of CNNs (n=30 models)

When using CNNs in a computer vision task such as object recognition in ImageNet their field-of-view (in degrees) is not defined since the only relevant input property is the image resolution (inpixels) However when using a CNN to model the ventral stream the visual spatial extent of themodelrsquos input is of key importance to ensure that it is correctly mapped to the data it is trying toexplain We assigned a spatial extent of 8deg to all models since this value has been previously usedin studies benchmarking CNNs as models of the primate ventral stream [31 94] and is consistentwith the results in Cadena et al [30]

16

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

B Image Perturbations

B1 White Box Adversarial Attacks

For performing white box adversarial attacks we used untargeted projected gradient descent (PGD)[17] (also referred to as the Basic Iterative Method [95]) with Linfin L2 and L1 norm constraintsGiven an image x This method uses the gradient of the loss to iteratively construct an adversarialimage xadv which maximizes the model loss within an Lp bound around x Formally in the case ofan Linfin constraint PGD iteratively computes xadv as

xt+1 = Projx+S(xt + αsgn(nablaxtL(θ xt y)))

where x is the original image and the Proj operator ensures the final computed adversarial imagexadv is constrained to the space x+S here the Linfin ball around x In the case of L1 or L2 constraintsat each iteration nablaxtL(θ xt y) is scaled to have an L1 or L2 norm of α and the Proj operatorensures the final xadv is within an L1 or L2 norm ball around x

For our white box benchmarks we used δinfin isin [11020 1255] δ2 isin [015 06] and δ1 isin[40 160] constraints where δ = xminusxadv for a total of six different attack variants with two strengthsunder each norm constraint We arrived at these strengths by calibrating the lower strength attackfor each norm to give a reasonable degree of variance in normally trained CNNs Because normallytrained CNNs are extremely sensitive to adversarial perturbations this resulted in perturbations wellbelow the level of perceptibility and in most cases particularly with the Linfin constraint perturbationsthat would not even be rendered differently by a monitor for display We set the high attack strength at4 times the low value which brought standard ImageNet trained CNNs to nearly chance performanceStill even this higher perturbation strength remained practically imperceptible (Fig B3 left) Eachperturbation was computed with 64 PGD iterations and a step size α = ε32 where ε = δp for thesame 5000 images from the ImageNet validation set on a per model basis and final top-1 accuracywas reported The Adversarial Robustness Toolkit [96] was used for computing the attacks

While adversarial formulations using Lagrangian relaxation like the CampW attack [7] and the EADattack [8] are known to be more effective than PGD for finding minimal perturbations under L2 andL1 constraints these formulations involve computationally expensive line searches often requiringthousands of iterations to converge making them computationally difficult to scale to 5000 ImageNetsamples for a large number of models Furthermore because some of the models we tested operatewith stochasticity during inference we found that other state-of-the-art attacks formulated to effi-ciently find minimal perturbation distances [9 10] were generally difficult to tune as their successbecomes stochastic as they near the decision boundary Thus we proceeded with the PGD formulationas we found it to be the most reliable computationally tractable and conceptually simple to followWe share our model and weights and encourage anyone interested to attack our model to verify ourresults

Although the stochastic models were implemented with the reparameterization trick for randomsampling which leaves the gradients intact [83] we performed several sanity checks to make surethat the white box attacks were effectively implemented Following the advice of Athalye et al [68]and Carlini et al [84] we observed that in all of the white box attacks increasing the attack strengthdecreased model performance eventually bringing accuracy to zero (Fig B1) and additionally thatfor any given constraint and attack strength increasing PGD iterations from one to many increasedattack effectiveness (Fig B2) Furthermore to be extra cautious we followed the method Athalye etal [68] used to break Stochastic Activation Pruning [19] and replacednablaxf(x) with

\sum_{i=1}^{k} \nabla_x f(x)

effectively performing Monte Carlo (MC) sampling of the loss gradient at every PGD iteration. While with high perturbation strength even k = 1 was sufficient to bring our model's accuracy to zero, we found that k = 10 generally achieved higher attack success at intermediate and large perturbation strengths (Table B1), and thus we used this approach for all attacks on stochastic models.
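The following is a sketch of the same loop with Monte Carlo gradient sampling for stochastic models, accumulating k gradients per PGD step as in the sum above; pgd_linf_mc is, again, an illustrative name.

```python
import torch

def pgd_linf_mc(model, x, y, eps=1/255, n_iter=64, k=10):
    """PGD with Monte Carlo accumulation of the loss gradient (k samples per
    step), for models that are stochastic at inference time."""
    alpha = eps / 32
    loss_fn = torch.nn.CrossEntropyLoss()
    x_adv = x.clone().detach()
    for _ in range(n_iter):
        x_adv.requires_grad_(True)
        grad = torch.zeros_like(x_adv)
        for _ in range(k):  # each forward pass draws fresh noise in the model
            loss = loss_fn(model(x_adv), y)
            grad = grad + torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x + (x_adv + alpha * grad.sign() - x).clamp(-eps, eps)
            x_adv = x_adv.clamp(0, 1)
    return x_adv.detach()
```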



Figure B1: VOneResNet50 improves robustness to white box attacks in a wide range of perturbation strengths. White box accuracy against perturbation strength curves for PGD attacks with constraints L∞, L2, and L1, for ResNet50, VOneResNet50, AT L∞, and ANT3×3+SIN. Adding the VOneBlock to ResNet50 consistently improves robustness under all constraints and attack strengths: VOneResNet50 can withstand perturbations roughly 4 times higher than the standard ResNet50 for the same performance level. Remarkably, VOneResNet50 outperformed the adversarially trained model (AT L∞) for a considerable range of perturbation strengths, particularly with the L1 constraint. Adversarial noise training combined with Stylized ImageNet (ANT3×3+SIN) shows virtually no improvement in robustness to white box attacks. In all white box attacks and for all models, increasing the attack strength decreased performance, eventually bringing accuracy to zero.


Figure B2: White box attack effectiveness converges for a large number of PGD iteration steps. White box accuracy against iteration curves for PGD attacks with δ∞ = 1/255, δ2 = 0.6, and δ1 = 160 constraints. Increasing the number of PGD iteration steps makes the attack more effective, generally converging after 32 iterations.

Table B1: Improving the effectiveness of white box attacks with Monte Carlo sampling. White box attack accuracy (top-1) under different norm constraints, and white box overall mean, using standard PGD and PGD with Monte Carlo sampling of the loss gradients (PGD-MC, k = 10), on VOneResNet50 with 64 PGD iteration steps. For all stochastic models we used PGD with Monte Carlo sampling in the white box attacks throughout the paper. Values are mean and SD (n=3 seeds).

Attack type      Mean [%]   PGD-L1 40 [%]  PGD-L1 160 [%]  PGD-L2 0.15 [%]  PGD-L2 0.60 [%]  PGD-L∞ 1/1020 [%]  PGD-L∞ 1/255 [%]
Standard PGD     52.8±0.2   64.4±0.4       42.2±0.1        65.2±0.3         45.3±0.4         63.4±0.4           36.4±0.4
PGD-MC (k=10)    51.1±0.4   64.2±0.5       40.4±0.5        65.3±0.4         44.3±0.3         61.9±0.4           30.3±0.5


Although VOneResNet50 was not designed to explicitly improve robustness to white box attacks, it offers substantial gains over the standard ResNet50 model (Table 1). A closer inspection of the white box accuracy against perturbation strength curves (Fig. B1) reveals that VOneResNet50 can withstand perturbations roughly 4 times higher than the standard ResNet50 for the same performance level (the blue curve is shifted to the right of the black curve by a factor of 4). To help visualize how this difference translates into actual images, we show in Fig. B3 examples of white box adversarial images for both ResNet50 and VOneResNet50, for the perturbation strengths that bring the respective models to nearly chance performance.

[Figure B3 image panels: the original image (Egyptian Cat) plus, for each norm, a scaled perturbation and the resulting adversarial image. ResNet50 (left): ||δ||∞ = 1/255, ||δ||2 = 0.60, and ||δ||1 = 160, each misclassified as Sleeping Bag. VOneResNet50 (right): ||δ||∞ = 4/255 (Barrel), ||δ||2 = 2.40 (Norwegian Elkhound), and ||δ||1 = 640 (Schipperke).]

Figure B3: Comparison of adversarial images for ResNet50 and VOneResNet50. Visualization of white box attacks under the 3 different norm constraints (L∞, L2, and L1), for the perturbation strength that brings the model to nearly chance performance (Fig. B1). Left: ResNet50; right: VOneResNet50. For each attack, the perturbation is shown on the left (scaled for visualization) and the perturbed image appears next to it. Perturbation strengths and classes of the adversarial images are shown over the images.


B2 Additional attack controls

The field of adversarial robustness is littered with examples of proposed defense methods that were quickly defeated with some minor attack optimizations or adaptations [68, 84, 97]. For this reason, we took great care to verify our claims of improved robustness of the VOneNets. In addition to the implementation details and controls described in the previous section, we validated our results by further optimizing our attacks on both the standard stochastic VOneResNet50 and the VOneResNet50 with stochasticity removed during the attack and inference.

First, on the standard stochastic VOneResNet50, we ensured that our attack hyper-parameters, including step size (α), number of gradient samples (k), and PGD iterations, are sufficient for evaluating the network's robustness. For PGD with δ∞ = 1/255 and 64 iterations, we performed a large grid search over α and k, observing only a marginal increase in attack effectiveness (Fig. B4). For the most effective step size (α = ε/8), increasing the number of gradient samples beyond k = 16 no longer decreases accuracy under the attack. At k = 128, VOneResNet50 accuracy is only reduced from 29.15% to 26.0%, remaining a large margin above the ResNet50 accuracy of 0.8%. Additionally, increasing the number of PGD iterations all the way up to 1,000 (with α = ε/8 and k = 16) only further reduced performance by 0.5%. Finally, using 5-10 random restarts did not improve the attack beyond the most effective k and α settings.


Figure B4: Further attack optimization does not overturn results. White box accuracy for PGD with δ∞ = 1/255, 64 iterations, on 500 images. A large grid search over step size and number of gradient samples only marginally improves attack effectiveness, and thus only marginally reduces model performance. Lines represent the mean and shaded areas the SD (n=3 seeds).

Second, while boundary attacks are not ideal against the standard stochastic VOneResNet50 due to the shifting decision boundary, we investigated VOneResNet50 without stochasticity during attack/inference using the Brendel and Bethge (B&B) attack [10], a powerful white box adversary that aims to find minimal adversarial perturbations along the decision boundary of the network. We reasoned that if the B&B attack offers similar results to those observed using PGD, this is a good sanity check that our observed robustness is reliable and not a result of masked gradients around the input or a quirk of PGD. Using the FoolBox [98] implementation of the L∞ B&B attack, initialized with 500 successful adversarial images generated using PGD with δ∞ = 8/255, we see nearly identical scores compared to standard PGD with optimal hyper-parameters at δ∞ = 1/1020 and δ∞ = 1/255.

We conclude that these additional controls do not qualitatively change our reported results and provide further evidence that the observed improved robustness of the VOneNets is real and not an artifact.


B3 Common Image Corruptions

For evaluating model robustness to common image corruptions, we used ImageNet-C, a dataset publicly available at https://github.com/hendrycks/robustness [13]. It consists of 15 different types of corruptions, each at five levels of severity, for a total of 75 different perturbations (Fig. B5). The corruption types fall into four categories: noise, blur, weather, and digital effects. Noise includes Gaussian noise, shot noise, and impulse noise; blur includes motion blur, defocus blur, zoom blur, and glass blur; weather includes snow, frost, fog, and brightness; digital effects include JPEG compression, elastic transformations, contrast, and pixelation. For every corruption type and at every severity, we tested on all 50,000 images from the ImageNet validation set. We note that these images have been JPEG compressed by the original authors for sharing, which is known to have a minor but noticeable effect on final network performance.
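As a concrete illustration, here is a minimal evaluation loop, assuming ImageNet-C has been extracted locally with its standard <corruption>/<severity>/<class> folder layout; imagenet_c_accuracy is a hypothetical helper, and the normalization values mirror the preprocessing described in Section C5.

```python
import torch
from torchvision import datasets, transforms

preprocess = transforms.Compose([
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5]),
])

def imagenet_c_accuracy(model, root, corruption, severity, device="cuda"):
    """Top-1 accuracy on one corruption type at one severity level (1-5)."""
    ds = datasets.ImageFolder(f"{root}/{corruption}/{severity}", preprocess)
    loader = torch.utils.data.DataLoader(ds, batch_size=256, num_workers=8)
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for x, y in loader:
            pred = model(x.to(device)).argmax(dim=1)
            correct += (pred == y.to(device)).sum().item()
            total += y.numel()
    return correct / total
```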


Figure B5: Common image corruptions with intermediate severity. Visualization of all 15 common image corruption types, evaluated at severity = 3. First row: original image followed by the noise corruptions; second row: blur corruptions; third row: weather corruptions; fourth row: digital corruptions.



Figure B6: VOneResNet50 improves robustness to noise, blur, and digital corruptions. Corruption accuracy against severity curves for the four categories of common image corruptions: noise, blur, weather, and digital. Adding the VOneBlock to ResNet50 improves robustness to all corruption categories except weather. The adversarially trained model (AT L∞) is considerably worse at common image corruptions. Adversarial noise combined with Stylized ImageNet training (ANT3×3+SIN) is consistently more robust across all corruption categories. For ResNet50 and VOneResNet50, error bars represent SD (n=3 seeds).

Table B2: Detailed common image corruption results. Adding the VOneBlock to ResNet50 improves robustness to several of the corruption types. However, for others, particularly fog and contrast, robustness decreases. Interestingly, these are the corruption types on which the adversarially trained model (AT L∞) also does the worst. Adversarial noise combined with Stylized ImageNet training (ANT3×3+SIN) is the most robust model for most of the corruption types. For ResNet50 and VOneResNet50, values are mean and SD (n=3 seeds).

Noise and blur corruptions (accuracy, %):

Model            Gaussian   Shot       Impulse    Defocus    Glass      Motion     Zoom
ResNet50 [4]     29.6±1.1   27.9±0.9   24.5±1.2   38.8±0.7   26.1±0.6   37.9±0.4   34.5±0.7
AT L∞ [76]       23.6       22.3       14.8       24.4       34.3       33.2       34.9
ANT3×3+SIN [59]  61.6       60.7       59.9       41.5       38.6       46.0       37.3
VOneResNet50     34.6±1.4   33.4±1.2   31.9±1.5   37.8±0.1   35.7±0.1   37.4±0.2   34.0±0.2

Weather and digital corruptions (accuracy, %):

Model            Snow       Frost      Fog        Bright     Contrast   Elastic    Pixelate   JPEG
ResNet50         30.1±0.5   36.7±0.4   43.6±0.8   66.8±0.2   38.8±0.5   44.8±0.3   47.0±2.7   55.1±1.0
AT L∞            29.5       31.2       8.4        54.1       10.2       48.4       56.3       58.7
ANT3×3+SIN       47.7       52.2       54.9       68.2       49.3       52.7       58.9       59.3
VOneResNet50     25.2±0.5   36.8±0.4   30.1±0.5   62.4±0.3   28.5±0.5   48.7±0.1   63.3±0.2   61.0±0.3


B4 Other defense methods

We compared the performance of VOneResNet50 on the described computer vision benchmarks with two training-based defense methods using the ResNet50 architecture: adversarial training with a δ∞ = 4/255 constraint (AT L∞) [76], and adversarial noise with Stylized ImageNet training (ANT3×3+SIN) [59].

The AT L∞ model was downloaded from the publicly available adversarially trained models at https://github.com/MadryLab/robustness [76]. This model was trained following the adversarial training paradigm of Madry et al. [17] to solve the min-max problem

\min_\theta \rho(\theta), \quad \text{where} \quad \rho(\theta) = \mathbb{E}_{(x,y)\sim D}\left[\max_{\delta \in S} L(\theta, x+\delta, y)\right] \qquad (1)

Here, the goal is to learn parameters θ minimizing the loss L for training images x and labels y drawn from D, while perturbing x with δ ∈ S to maximally increase the loss. In this case, D is ImageNet, and PGD with δ∞ = 4/255 is used to find the perturbation maximizing the loss for a given (x, y) pair. The δ∞ = 4/255 constraint model was selected for comparison because it had the best performance on our conglomerate benchmark of adversarial attacks and common image corruptions.
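A minimal sketch of one training step under Eq. (1), reusing the pgd_linf helper sketched in Section B1; the 7 inner PGD steps are an illustrative choice, not the setting used in [76].

```python
import torch

def adversarial_training_step(model, optimizer, x, y, eps=4/255):
    """Inner maximization: craft worst-case perturbations with PGD.
    Outer minimization: descend the loss on the perturbed batch."""
    model.eval()
    x_adv = pgd_linf(model, x, y, eps=eps, n_iter=7)  # see sketch in Section B1
    model.train()
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```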

The ANT3×3+SIN model was obtained from https://github.com/bethgelab/game-of-noise. Recently, Rusak et al. showed that training a ResNet50 model with several types of additive input noise improved robustness to common image corruptions [59]. Inspired by this observation, the authors trained a model while optimizing a noise distribution that maximally confuses it. Similarly to standard adversarial training, this results in solving a min-max problem

\min_\theta \max_\varphi \; \mathbb{E}_{(x,y)\sim D}\, \mathbb{E}_{\delta \sim p_\varphi(\delta)}\left[L(\theta, x+\delta, y)\right] \qquad (2)

where p_φ(δ) is the maximally confusing noise distribution. The main difference to regular adversarial training (Equation 1) is that while in the former δ is optimized directly, here δ is found by optimizing a constrained distribution with local spatial correlations. Complementing the adversarial noise training with Stylized ImageNet training [25] produced the model with the current best robustness on the ImageNet-C benchmark as a standalone method.


C VOneNet implementation details

C1 Convolutional layer

The convolutional layer of the VOneBlock is a mathematically parameterized Gabor filter bank (GFB). We set the stride of the GFB to four, giving a 56×56 spatial map of activations. Since the number of channels in most CNNs' first convolution is relatively small (64 in the architectures adapted), we used a larger number in the VOneBlock so that the Gabors would cover the large parameter space and better approximate primate V1. We set the main VOneNet models to contain 512 channels, equally split between simple and complex cells (see Fig. C2A for an analysis of how the number of channels affects performance). Each channel in the GFB convolves a single color channel from the input image.

The Gabor function consists of a two-dimensional grating with a Gaussian envelope and is described by the following equation:

G_{\theta,f,\varphi,n_x,n_y}(x, y) = \frac{1}{2\pi\sigma_x\sigma_y} \exp\left[-0.5\left(\frac{x_{rot}^2}{\sigma_x^2} + \frac{y_{rot}^2}{\sigma_y^2}\right)\right] \cos\left(2\pi f x_{rot} + \varphi\right) \qquad (3)

where

x_{rot} = x\cos(\theta) + y\sin(\theta), \qquad y_{rot} = -x\sin(\theta) + y\cos(\theta) \qquad (4)

\sigma_x = \frac{n_x}{f}, \qquad \sigma_y = \frac{n_y}{f} \qquad (5)

Here x_rot and y_rot are the orthogonal and parallel directions relative to the grating, θ is the angle of the grating orientation, f is the spatial frequency of the grating, φ is the phase of the grating relative to the Gaussian envelope, and σ_x and σ_y are the standard deviations of the Gaussian envelope orthogonal and parallel to the grating, which can be defined as multiples (n_x and n_y) of the grating cycle (the inverse of the frequency).
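For concreteness, here is a sketch of how a single Gabor kernel from Eqs. (3)-(5) might be instantiated; gabor_kernel and the pixels-per-degree conversion (8 deg over 224 px, see Section A) are our own illustrative choices, not the released implementation.

```python
import numpy as np

def gabor_kernel(theta, f, phase, nx, ny, ksize=25, px_per_deg=224 / 8):
    """Build a 2D Gabor kernel per Eq. (3). f is in cycles/deg and is converted
    to cycles/px using the model's assumed 8 deg field of view over 224 px."""
    f_px = f / px_per_deg                       # spatial frequency in cycles/px
    sx, sy = nx / f_px, ny / f_px               # envelope SDs (Eq. 5), in px
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_rot = x * np.cos(theta) + y * np.sin(theta)    # orthogonal to grating (Eq. 4)
    y_rot = -x * np.sin(theta) + y * np.cos(theta)   # parallel to grating
    envelope = np.exp(-0.5 * ((x_rot / sx) ** 2 + (y_rot / sy) ** 2)) / (2 * np.pi * sx * sy)
    return envelope * np.cos(2 * np.pi * f_px * x_rot + phase)
```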

Although the Gabor filter greatly reduces the number of parameters defining the linear spatial component of V1 neurons, it still has five parameters per channel. Fortunately, there is a vast neurophysiology literature with detailed characterizations of primate V1 response properties, which can be used to constrain these parameters. To instantiate a VOneBlock with C_V1 channels, we sampled C_V1 values for each of the parameters according to empirically constrained distributions (Table C1). Due to the resolution of the input images, we limited the ranges of spatial frequencies (f < 5.6 cpd) and number of cycles (n > 0.1).

Table C1: Empirical distributions used to sample the GFB parameters in the VOneBlock.

Parameter    Reference
θ            De Valois et al. [78], Fig. 5: orientation tuning (foveal)
f            De Valois et al. [79], Fig. 6: peak spatial frequency (X cells, foveal)
φ            Uniform distribution ∈ [0, 360] deg
n_x, n_y     Ringach 2002 [80], Fig. 4: (n_x, n_y) joint distribution

Critical to the correct implementation of the biologically constrained parameters of the GFB is the choice of the model's field of view in degrees. As previously mentioned in Section A, we used 8 deg as the input spatial extent for all CNN models. It is important to note that the set of spatial filters in the VOneBlock's GFB differs considerably from those in the first convolution of most CNNs (Fig. C1). While standard CNNs learn filters that resemble Gabors in their input layer [1, 99, 100], due to the limited span of their kernels, these filters do not vary significantly in size and spatial frequency. V1 neurons, on the other hand, are known to exhibit a wide range of receptive field properties. This phenomenon is captured in the VOneBlock, with the spatial frequencies and receptive field sizes of the Gabors ranging over more than one order of magnitude. Due to this high variability, we set the convolution kernel to be 25×25 px, which is considerably larger than those in standard CNNs (Fig. C1).



Figure C1: Comparison between filters in the first convolution of standard CNNs and the VOneBlock. Example filters from the first convolution in the VOneBlock, ResNet50, CORnet-S, and AlexNet (from left to right). VOneBlock filters are all parameterized as Gabors, varying considerably in size and spatial frequency. Standard CNNs have some filters with shapes other than Gabors, particularly center-surround, and are limited in size by their small kernels. Kernel sizes are 25 px, 7 px, 7 px, and 11 px for the VOneBlock, ResNet50, CORnet-S, and AlexNet, respectively.


C2 Nonlinear layer

The VOneBlock's nonlinear layer applies one of two different nonlinearities to each channel, depending on its cell type: a rectified linear transformation for simple cells (Eq. 6), and the spectral power of a quadrature phase pair (Eq. 7) for complex cells:

S^{nl}_{\theta,f,\varphi,n_x,n_y} = \begin{cases} S^{l}_{\theta,f,\varphi,n_x,n_y} & \text{if } S^{l}_{\theta,f,\varphi,n_x,n_y} > 0 \\ 0 & \text{otherwise} \end{cases} \qquad (6)

C^{nl}_{\theta,f,\varphi,n_x,n_y} = \frac{1}{\sqrt{2}} \sqrt{\left(C^{l}_{\theta,f,\varphi,n_x,n_y}\right)^2 + \left(C^{l}_{\theta,f,\varphi+\pi/2,n_x,n_y}\right)^2} \qquad (7)

where S^l and S^nl are the linear and nonlinear responses of a simple cell, and C^l and C^nl are the same for a complex cell.
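A minimal sketch of the two nonlinearities, assuming the linear responses (one simple-cell map and a quadrature pair of complex-cell maps, 90 deg apart in phase) have already been computed by the GFB:

```python
import torch
import torch.nn.functional as F

def vone_nonlinearities(simple_lin, complex_lin, complex_lin_shifted):
    """ReLU for simple cells (Eq. 6); quadrature-pair energy for complex
    cells (Eq. 7), where complex_lin_shifted uses phase phi + pi/2."""
    simple_out = F.relu(simple_lin)                                        # Eq. (6)
    complex_out = torch.sqrt(complex_lin ** 2 + complex_lin_shifted ** 2) / (2 ** 0.5)  # Eq. (7)
    return simple_out, complex_out
```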

C3 Neuronal stochasticity generator

In awake monkeys, spike trains of V1 neurons are approximately Poisson, i.e., the variance and mean of spike counts in a given time window over a set of repetitions are roughly the same [82]. We incorporated stochasticity into the VOneBlock to emulate this property of neuronal responses. Since the Poisson distribution is not continuous, it breaks the gradients in a white box attack, giving a false sense of robustness [68]. To avoid this situation and facilitate the evaluation of the model's real robustness, our neuronal stochasticity generator, as implemented, uses a continuous second-order approximation of Poisson noise, adding Gaussian noise with variance equal to the activation:

R_s \sim \mathcal{N}\left(\mu = R_{ns},\; \sigma^2 = R_{ns}\right) \qquad (8)

where R_ns and R_s are the non-stochastic and stochastic responses of a neuron.

To approximate the levels of neuronal stochasticity of primate V1 neurons, it is critical that the VOneBlock activations lie in the same range as V1 neuronal responses (number of spike counts in a given time window). Thus, we applied an affine transformation to the activations so that both the mean stimulus response and the mean baseline activity match those of a population of V1 neurons (Table C2 shows the mean responses and spontaneous activity of V1 neurons measured over different time windows).

Table C2: Mean stimulus responses and spontaneous activity of a population of V1 neurons for different time windows.

Time window [ms]   Stimulus response [spikes]   Spontaneous activity [spikes]
25                 0.324                        0.073
50                 0.655                        0.145
100                1.05                         0.29
200                1.38                         0.58

Since the exact time window over which V1 responses are integrated during visual perception is still an open question, we considered a time window of 25 ms (Fig. C2B). To keep the outputs of the VOneBlock in a range that does not deviate considerably from the typical range of activations in CNNs, so that the model can be efficiently trained using standard training parameters, we applied the inverse transformation to scale the outputs back to their original range after the stochastic layer.
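Putting the pieces together, here is a sketch of the stochastic layer as described (scale to spike-count range, add mean-matched Gaussian noise via the reparameterization trick, scale back); the V1Stochasticity module and its scale/offset parameters are hypothetical names, with values to be fit to Table C2.

```python
import torch

class V1Stochasticity(torch.nn.Module):
    """Eq. (8) as a differentiable layer: N(mu=r, var=r) via reparameterization."""
    def __init__(self, scale=1.0, offset=0.0):
        super().__init__()
        self.scale, self.offset = scale, offset  # affine map to spike-count range

    def forward(self, r):
        r = r * self.scale + self.offset                          # to spike counts
        noise = torch.sqrt(torch.clamp(r, min=0.0)) * torch.randn_like(r)
        r = r + noise                                             # var equals mean
        return (r - self.offset) / self.scale                     # back to CNN range
```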

C4 Hyper-parameters and other design choices

We developed VOneNets from three different standard CNN architectures: ResNet50, CORnet-S, and AlexNet. As previously mentioned, we replaced the first block of each architecture with the VOneBlock and a transition layer to create the respective VOneNet. Table C3 lists the layers that were removed from each base architecture. Except for CORnet-S, the removed layers contained only a single


convolution, nonlinearity, and maxpool. Since CORnet-S already had a pre-committed set of layers for V1, we replaced those with the VOneBlock. The torchvision implementation of AlexNet contains a combined stride of eight in the removed layers (four in the convolution and two in the maxpool), followed by a stride of one in the second convolution (outside of the removed layers). To more easily adapt it to a VOneBlock with a stride of four, we slightly modified AlexNet's architecture so that it had strides of two in these three layers (first convolution, first maxpool, and second convolution). The results shown in Fig. 2 were obtained using this modified architecture; a sketch of the corresponding model surgery for ResNet50 follows Table C3.

Table C3: Layers removed from each base CNN architecture when creating the VOneNets.

Architecture   Layers removed
AlexNet        Conv, ReLU, MaxPool
ResNet50       Conv, Batch-Norm, ReLU, MaxPool
CORnet-S       Conv, Batch-Norm, ReLU, MaxPool, Conv, Batch-Norm, ReLU
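Here is a sketch of the model surgery for ResNet50; build_vone_resnet50 is a hypothetical helper, and the exact form of the transition layer is our assumption (a 1×1 convolution mapping the VOneBlock's 512 channels to the 64 expected by ResNet50's first residual stage).

```python
import torch.nn as nn
from torchvision.models import resnet50

def build_vone_resnet50(vone_block, vone_channels=512):
    """Drop the ResNet50 stem listed in Table C3 (conv1, bn1, relu, maxpool)
    and prepend the VOneBlock plus a transition layer. `vone_block` is
    assumed to output a 512 x 56 x 56 tensor for a 224 x 224 input."""
    backbone = resnet50()
    transition = nn.Sequential(
        nn.Conv2d(vone_channels, 64, kernel_size=1, bias=False),
        nn.BatchNorm2d(64),
        nn.ReLU(inplace=True),
    )
    return nn.Sequential(
        vone_block,
        transition,
        backbone.layer1, backbone.layer2, backbone.layer3, backbone.layer4,
        backbone.avgpool, nn.Flatten(), backbone.fc,
    )
```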

The VOneNet architecture was designed to have as few hyper-parameters as possible. When possible, these were constrained either by the base CNN architecture or by neurobiological data: most parameters of the GFB were instantiated by sampling from neuronal distributions of primate V1 neurons; the kernel size was set to 25×25 px to capture the high variability in Gabor sizes; and, like all other CNN models, the spatial extent of the field of view was set to 8 deg (Section A). Nevertheless, there were two hyper-parameters where the choice was rather arbitrary: the number of channels in the VOneBlock, and the time window for integrating V1 responses used to scale the activations prior to the stochasticity generator.

Since primate V1 has neurons tuned to a wide range of spatial frequencies for each orientation and at any given location of visual space [101], we expected that a large number of channels would be required to cover all combinations of Gabor parameters. For the VOneResNet50 architecture, we varied the number of channels in the VOneBlock between 64 and 1,024, equally split between simple and complex cells, and measured the performance of the resulting models after training (Fig. C2A). Clean ImageNet and common image corruption performance improved with channel number. Remarkably, the opposite happened for white box accuracy, with VOneResNet50 variants of 64 and 128 channels achieving the highest robustness to white box attacks. This result is particularly interesting since it had been shown that increasing the number of channels in all layers of a CNN improves robustness to white box attacks in both standard and adversarially trained models [17]. Therefore, the improvement in white box robustness of VOneNets over their base models cannot be attributed to the higher number of channels. For the main model we set the channel number to 512, as it offered a good compromise between the different benchmarks. Regarding the time window for integrating V1 responses, we observed small effects on model performance when varying it between 25 and 200 ms. We chose 25 ms for the main model due to a small trend toward higher robustness to white box attacks with shorter time windows.


Figure C2: Exploration of the VOneBlock's hyper-parameters in the VOneResNet50 architecture. A: Performance of VOneResNet50 models with a varying number of channels in the VOneBlock (25 ms time window). B: Same as in A, but varying the time window for integrating neuronal activity (512 channels). Error bars show SD (n=3 seeds). Red arrows indicate the values used in the main model.


C5 Training details

We used PyTorch 0.4.1 and trained the models on ImageNet 2012 [34]. Images were preprocessed (1) for training, with a random crop to 224×224 pixels and random left-right flipping, and (2) for validation, with a central crop to 224×224 pixels. Preprocessing was followed by normalization: subtraction of and division by [0.5, 0.5, 0.5]. We used a batch size of 256 images and trained either on 2 GPUs (NVIDIA Titan X / GeForce 1080Ti) or 1 GPU (Quadro RTX 6000 or V100) for 70 epochs. We used step learning rate scheduling: a starting learning rate of 0.1, divided by 10 every 20 epochs. For optimization, we used stochastic gradient descent with weight decay 0.0001, momentum 0.9, and a cross-entropy loss between image labels and model predictions (logits).
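The corresponding PyTorch configuration might look as follows (a sketch: the exact random-crop variant is not specified in the text, and build_vone_resnet50/vone_block refer to the earlier hypothetical sketches):

```python
import torch
from torchvision import transforms

train_tf = transforms.Compose([
    transforms.RandomResizedCrop(224),   # assumption: crop variant unspecified in text
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5]),
])

model = build_vone_resnet50(vone_block)  # see sketch in Section C4
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)
criterion = torch.nn.CrossEntropyLoss()
```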

Model weights and code are available at https://github.com/dicarlolab/vonenet.


D VOneBlock variant details

To investigate which components of the VOneBlock are responsible for the increased robustness, we created six variants of the VOneBlock, each removing or modifying one of its parts. In all cases, we trained from scratch a corresponding variant of VOneResNet50 with the modified VOneBlock; all other properties of the model and components of the VOneBlock were left unmodified. Here we describe each of these variants in more detail (the name of the variant refers to the component removed):

• V1 distributions: the GFB parameters were sampled from uniform distributions with the same domain as the empirical distributions used in the default VOneBlock (Table C1).
• Low SF: when sampling the spatial frequencies for the GFB, neurons with low peak spatial frequencies (f < 2 cpd) were removed from the empirical distribution [79].
• High SF: similar to the previous variant, but removing neurons with high peak spatial frequencies (f > 2 cpd).
• Complex NL: complex cell channels were removed from the VOneBlock and replaced by simple cell channels.
• Simple NL: simple cell channels were removed from the VOneBlock and replaced by complex cell channels.
• V1 stochasticity: the V1 stochasticity generator was removed from the VOneBlock.

While all of the variants resulted in worse overall mean performance than the unmodified VOneResNet50 (Table 2), some improved specific benchmarks: the variant without high SF had higher accuracy under noise and blur corruptions (Table D1); the variant without V1 stochasticity had higher accuracy on clean images and on images with noise and weather corruptions (Tables 2 and D1); and the variant without V1 distributions had higher accuracy under white box attacks with high L∞ perturbations (Table D2).

Table D1: Difference between the common image corruption accuracies of the several VOneResNet50 variants and the unmodified VOneResNet50. Removal of simple cells reduced accuracy under all common image corruption types. Removal of some components of the VOneBlock improved accuracy under specific corruptions: removing high SF consistently improved performance for noise and blur corruptions, and removing V1 stochasticity improved performance for noise and weather corruptions.

Noise and blur corruptions (∆ accuracy, %):

Component removed  Gaussian   Shot       Impulse    Defocus    Glass      Motion     Zoom
V1 dist            -2.8±0.9   -2.4±0.7   -2.9±1.1   -1.6±0.1   0.4±0.2    -0.8±0.3   -0.2±0.5
Low SF             -1.4±0.5   -1.0±0.2   -1.5±0.5   -0.6±0.6   -1.1±0.2   -1.2±0.4   -0.5±0.7
High SF            6.8±1.3    7.3±1.1    7.0±1.3    5.6±0.3    11.5±0.3   3.7±0.0    6.1±0.1
Complex            1.0±1.6    1.2±1.3    0.7±1.8    1.2±0.2    -1.5±0.1   0.2±0.8    0.4±1.1
Simple             -6.1±1.6   -5.4±1.5   -6.4±1.9   -4.1±0.4   -4.2±0.4   -4.9±0.5   -5.4±0.3
V1 stoch           2.2±2.5    2.2±2.5    2.1±2.9    3.3±0.1    -5.5±0.1   0.9±0.7    0.9±0.1

Weather and digital corruptions (∆ accuracy, %):

Component removed  Snow       Frost      Fog        Bright     Contrast   Elastic    Pixelate   JPEG
V1 dist            -0.2±0.6   -0.9±0.6   -1.9±1.1   -0.8±0.5   -1.4±0.9   0.3±0.4    -1.7±0.3   0.8±0.5
Low SF             -0.4±0.3   -0.6±0.4   0.9±0.3    -0.2±0.2   -0.4±0.5   -0.9±0.5   -1.3±0.2   -0.5±0.4
High SF            1.4±1.0    -0.7±0.7   -9.5±0.4   -5.2±0.4   -4.4±0.5   2.4±0.2    1.9±0.4    -4.6±0.4
Complex            0.7±0.4    0.4±0.3    4.3±0.2    1.3±0.2    2.9±0.2    -0.5±0.2   -1.2±0.0   -1.9±0.5
Simple             -5.0±0.5   -9.0±0.5   -4.4±0.5   -3.9±0.2   -4.8±0.4   -3.9±0.5   -3.5±0.4   -1.8±0.2
V1 stoch           2.5±0.4    0.3±0.3    12.0±0.5   2.6±0.3    10.0±0.8   -2.8±0.3   -4.0±0.7   -4.0±0.4


Table D2: Difference between the white box attack accuracies of the several VOneResNet50 variants and the unmodified VOneResNet50. Removal of V1 stochasticity considerably reduced accuracy under all white box attacks. Removal of V1 distributions improved accuracy for the L∞ white box attack with high perturbation strength.

Component removed  PGD-L1 40   PGD-L1 160  PGD-L2 0.15  PGD-L2 0.60  PGD-L∞ 1/1020  PGD-L∞ 1/255
(all values ∆ accuracy, %)
V1 dist            -0.5±0.8    0.1±1.1     -0.8±0.2     0.6±1.0      0.3±0.3        3.0±1.8
Low SF             -0.8±1.2    -3.1±1.1    -0.6±1.0     -2.3±0.8     -0.8±1.3       -1.8±1.0
High SF            -2.2±0.9    -4.7±2.6    -2.6±1.5     -5.1±1.2     -2.6±1.4       -5.5±1.1
Complex            -1.3±0.9    -7.4±1.6    -1.1±0.9     -7.2±1.2     -1.9±1.0       -7.9±1.2
Simple             -2.5±0.8    -1.4±1.4    -2.9±1.1     -1.7±1.2     -2.7±1.2       -0.6±0.8
V1 stoch           -23.7±0.9   -36.5±0.8   -21.6±0.5    -40.1±0.4    -32.2±0.4      -29.2±0.4



[76] Logan Engstrom et al. Robustness (Python Library). 2019. URL: https://github.com/MadryLab/robustness.

[77] Jeremy Freeman et al. "A functional and perceptual signature of the second visual area in primates". In: Nat. Neurosci. 16.7 (July 2013), pp. 974-981.

[78] Russell L. De Valois, E. W. Yund, and Norva Hepler. "The orientation and direction selectivity of cells in macaque visual cortex". In: Vision Research 22 (1982), pp. 531-544.

[79] Russell L. De Valois, Duane G. Albrecht, and Lisa G. Thorell. "Spatial Frequency Selectivity of Cells in Macaque Visual Cortex". In: Vision Research 22 (1982), pp. 545-559.

[80] Dario L. Ringach. "Spatial Structure and Symmetry of Simple-Cell Receptive Fields in Macaque Primary Visual Cortex". In: Journal of Neurophysiology 88 (2002), pp. 455-463.

[81] Yasmine El-Shamayleh et al. "Visual response properties of V1 neurons projecting to V2 in macaque". In: Journal of Neuroscience 33.42 (2013), pp. 16594-16605. DOI: 10.1523/JNEUROSCI.2753-13.2013.

[82] W. R. Softky and C. Koch. "The highly irregular firing of cortical cells is inconsistent with temporal integration of random EPSPs". In: Journal of Neuroscience 13.1 (1993), pp. 334-350. DOI: 10.1523/jneurosci.13-01-00334.1993.

[83] Diederik P. Kingma and Max Welling. "Auto-Encoding Variational Bayes". Dec. 2013. arXiv: 1312.6114v10 [stat.ML].

[84] Nicholas Carlini et al. "On Evaluating Adversarial Robustness". Feb. 2019. arXiv: 1902.06705v2 [cs.LG].

[85] Nikolaus Kriegeskorte, Marieke Mur, and Peter Bandettini. "Representational similarity analysis - connecting the branches of systems neuroscience". In: Front. Syst. Neurosci. 2 (Nov. 2008), p. 4.

[86] R. Rajalingham, K. Schmidt, and James J. DiCarlo. "Comparison of Object Recognition Behavior in Human and Monkey". In: Journal of Neuroscience 35.35 (2015), pp. 12127-12136. DOI: 10.1523/JNEUROSCI.0573-15.2015.

[87] Cristopher M. Niell and Michael P. Stryker. "Highly selective receptive fields in mouse visual cortex". In: The Journal of Neuroscience 28.30 (July 2008), pp. 7520-7536. DOI: 10.1523/JNEUROSCI.0623-08.2008.

[88] Pieter R. Roelfsema. "Cortical algorithms for perceptual grouping". In: Annual Review of Neuroscience 29 (Jan. 2006), pp. 203-227. DOI: 10.1146/annurev.neuro.29.051605.112939.

[89] Alexander Berardino et al. "Eigen-Distortions of Hierarchical Representations". Oct. 2017. arXiv: 1710.02266 [cs.CV].

[90] Jack Lindsey et al. "A Unified Theory of Early Visual Representations from Retina to Cortex through Anatomically Constrained Deep CNNs". In: ICLR (2019), pp. 1-17. DOI: 10.1101/511535. arXiv: 1901.00945.

[91] Daniel L. K. Yamins and James J. DiCarlo. "Using goal-driven deep learning models to understand sensory cortex". In: Nat. Neurosci. 19.3 (Mar. 2016), pp. 356-365.

[92] Alexander Mathis et al. "DeepLabCut: markerless pose estimation of user-defined body parts with deep learning". In: Nature Neuroscience (2018). DOI: 10.1038/s41593-018-0209-y. arXiv: 1804.03142.

[93] Yann LeCun, Yoshua Bengio, and Geoffrey E. Hinton. "Deep learning". In: Nature 521.7553 (2015), pp. 436-444. DOI: 10.1038/nature14539.

[94] T. Marques, M. Schrimpf, and J. J. DiCarlo. "Hierarchical neural network models that more closely match primary visual cortex tend to better explain higher level visual cortical responses". In: Cosyne, 2020.

[95] Alexey Kurakin, Ian Goodfellow, and Samy Bengio. "Adversarial examples in the physical world". July 2016. arXiv: 1607.02533 [cs.CV].

[96] Maria-Irina Nicolae et al. "Adversarial Robustness Toolbox v1.2.0". In: CoRR 1807.01069 (2018). URL: https://arxiv.org/pdf/1807.01069.


[97] Florian Tramer et al. "On Adaptive Attacks to Adversarial Example Defenses". Feb. 2020. arXiv: 2002.08347 [cs.LG].

[98] Jonas Rauber et al. "Foolbox Native: Fast adversarial attacks to benchmark the robustness of machine learning models in PyTorch, TensorFlow, and JAX". In: Journal of Open Source Software 5.53 (2020), p. 2607. DOI: 10.21105/joss.02607.

[99] Matthew D. Zeiler and Rob Fergus. "Visualizing and Understanding Convolutional Networks". Nov. 2013. arXiv: 1311.2901 [cs.CV].

[100] Chris Olah et al. "An Overview of Early Vision in InceptionV1". In: Distill (2020). https://distill.pub/2020/circuits/early-vision. DOI: 10.23915/distill.00024.002.

[101] Ian Nauhaus et al. "Orthogonal micro-organization of orientation and spatial frequency in primate primary visual cortex". In: Nature Neuroscience 15.12 (2012), pp. 1683-1690. DOI: 10.1038/nn.3255.


Supplementary Material

A Neural Explained Variance

We evaluated how well responses to given images in candidate CNNs explain the responses of single V1 neurons to the same images, using a standard neural predictivity methodology based on partial least squares (PLS) regression [31, 32]. We used a neural dataset obtained by extracellular recordings from 102 single units while presenting 450 different images for 100 ms, 20 times each, spanning 4 deg of the visual space. Results from this dataset were originally published in a study comparing V1 and V2 responses to naturalistic textures [77]. To avoid over-fitting, for each CNN we first chose its best V1 layer on a subset of the dataset (n=135 images) and reported the final neural explained variance calculated for the chosen layer on the remaining images of the dataset (n=315 images). Model units were mapped to each V1 neuron linearly using a PLS regression model with 25 components. The mapping was performed for each neuron using 90% of the image responses in the respective dataset and tested on the remaining held-out 10%, in a 10-fold cross-validation strategy. We report neural explained variance values normalized by the neuronal internal consistency. We calculated V1 neural explained variance for a large pool of CNNs (n=30 models).
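A minimal sketch of the per-neuron fitting procedure, using scikit-learn's PLSRegression with 25 components and 10-fold cross-validation; neural_predictivity is a hypothetical helper, and the normalization by internal consistency is left as a separate step.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import KFold

def neural_predictivity(model_feats, neuron_resp, n_components=25, n_folds=10):
    """Map model features (images x units) to one neuron's responses (images,)
    and return the mean explained variance (r^2) over held-out folds."""
    scores = []
    for train, test in KFold(n_splits=n_folds, shuffle=True, random_state=0).split(model_feats):
        pls = PLSRegression(n_components=n_components)
        pls.fit(model_feats[train], neuron_resp[train])
        pred = pls.predict(model_feats[test]).ravel()
        r = np.corrcoef(pred, neuron_resp[test])[0, 1]
        scores.append(r ** 2)
    return np.mean(scores)
```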

When using CNNs in a computer vision task such as object recognition on ImageNet, their field of view (in degrees) is not defined, since the only relevant input property is the image resolution (in pixels). However, when using a CNN to model the ventral stream, the visual spatial extent of the model's input is of key importance to ensure that it is correctly mapped to the data it is trying to explain. We assigned a spatial extent of 8 deg to all models, since this value has been used in previous studies benchmarking CNNs as models of the primate ventral stream [31, 94] and is consistent with the results in Cadena et al. [30].



23

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

C VOneNet implementation details

C1 Convolutional layer

The convolutional layer of the VOneBlock is a mathematically parameterized Gabor Filter Bank(GFB) We set the stride of the GFB to be four originating a 56times56 spatial map of activations Sincethe number of channels in most CNNsrsquo first convolution is relatively small (64 in the architecturesadapted) we used a larger number in the VOneBlock so that the Gabors would cover the largeparameter space and better approximate primate V1 We set the main VOneNet models to contain512 channels equally split between simple and complex cells (see Fig C2 A for an analysis of howthe number of channels affects performance) Each channel in the GFB convolves a single colorchannel from the input image

The Gabor function consists of a two-dimensional grating with a Gaussian envelope and is describedby the following equation

Gθfφnxny (x y) =1

2πσxσyexp

[minus05(x2rotσ

2x + y2rotσ

2y)]

cos (2πf + φ) (3)

where

xrot = x cos(θ) + y sin(θ)

yrot = minusx sin(θ) + y cos(θ)(4)

σx =nxf

σy =nyf

(5)

xrot and yrot are the orthogonal and parallel orientations relative to the grating θ is the angle of thegrating orientation f is the spatial frequency of the grating φ is the phase of the grating relative to theGaussian envelope and σx and σy are the standard deviations of the Gaussian envelope orthogonaland parallel to the grating which can be defined as multiples (nx and ny) of the grating cycle (inverseof the frequency)

Although the Gabor filter greatly reduces the number of parameters defining the linear spatialcomponent of V1 neurons it still has five parameters per channel Fortunately there is a vastliterature in neurophysiology with detailed characterizations of primate V1 response properties whichcan be used to constrain these parameters To instantiate a VOneBlock with CV 1 channels wesampled CV 1 values for each of the parameters according to an empirically constrained distribution(Table C1) Due to the resolution of the input images we limited the ranges of spatial frequencies(f lt 56cpd) and number of cycles (n gt 01)

Table C1 Empirical distributions used to sample the GFB parameters in the VOneBlock

Parameter Reference

θ De Valois et at [78] Fig 5 Orientation tuning (foveal)f De Valois et at [79] Fig 6 Peak spatial freq (X cells foveal)φ - Uniform distribution isin [0 360] degnx ny Ringach 2002 [80] Fig 4 (nx ny) joint distribution

Critical to the correct implementation of the biologically-constrained parameters of the GFB is thechoice of the modelrsquos field of view in degrees As previously mentioned in Section A we used 8degas the input spatial extent for all CNN models Itrsquos important to note that the set of spatial filters inthe VOneBlockrsquos GFB differs considerably from those in the first convolution in most CNNs (FigC1) While standard CNNs learn filters that resemble Gabors in their input layer [1 99 100] due thelimited span of their kernels they do not vary significantly in size and spatial frequency V1 neuronson the other hand are known to exhibit a wide range of receptive field properties This phenomena iscaptured in the VOneBlock with the spatial frequencies and receptive field sizes of Gabors rangingmore than one order of magnitude Due to this high variability we set the convolution kernel to be25times 25px which is considerably larger than those in standard CNNs (Fig C1)

24

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

VOneBlock ResNet50 CORnet-S AlexNet

7 px7 px 7 px7 px

Figure C1 Comparison between filters in the first convolution of standard CNNs and theVOneBlock Example filters from the first convolution in the VOneBlock ResNet50 CORnet-S andAlexNet (from left to right) VOneBlock filters are all parameterized as Gabors varying considerablyin size and spatial frequency Standard CNNs have some filters with shapes other than Gaborsparticularly center-surround and are limited in size by their small kernel Kernel sizes are 25px 7px7px and 11px for VOneBlock ResNet50 CORnet-S and AlexNet respectively

25

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

C2 Nonlinear layer

VOneBlockrsquos nonlinear layer has two different nonlinearities that are applied to each channel depend-ing on its cell type a rectified linear transformation for simple cells (6) and the spectral power of aquadrature phase-pair (7) for complex cells

Snlθfφnxny =

Slθfφnxny if Slθfφnxny gt 0

0 otherwise(6)

Cnlθfφnxny =1radic2

radic(Clθfφnxny )2 + (Clθfφ+π2nxny )2 (7)

where Sl and Snl are the linear and nonlinear responses of a simple neuron and Cl and Cnl are thesame for a complex neuron

C3 Neuronal stochasticity generator

In awake monkeys spike trains of V1 neurons are approximately Poisson ie the variance and meanof spike counts in a given time-window over a set of repetitions are roughly the same [82] Weincorporated stochasticity into the VOneBlock to emulate this property of neuronal responses Sincethe Poisson distribution is not continuous it breaks the gradients in a white box attack giving a falsesense of robustness [68] In order to avoid this situation and facilitate the evaluation of the modelrsquosreal robustness our neuronal stochasticity generator as implemented uses a continuous second-orderapproximation of Poisson noise by adding Gaussian noise with variance equal to the activation

Rs sim N (micro = Rns σ2 = Rns) (8)

where Rns and Rs are the non-stochastic and stochastic responses of a neuron

To approximate the same levels of neuronal stochasticity of primate V1 neurons it is critical that theVOneBlock activations are on the same range as the V1 neuronal responses (number of spike countsin a given time-window) Thus we applied an affine transformation to the activations so that both themean stimulus response and the mean baseline activity are the same as those of a population of V1neurons (Table C2 shows the mean responses and spontaneous activity of V1 neurons measured indifferent time-windows)

Table C2 Mean stimulus responses and spontaneous activity of a population of V1 neuronsfor different time-windows

Time-window Stimulus response Spontaneous activity[ms] [spikes] [spikes]

25 0324 007350 0655 0145

100 105 029200 138 058

Since the exact time-window that responses in V1 are integrated during visual perception is still anopen question we considered a time-window of 25ms (Fig C2 B) In order to keep the outputs ofthe VOneBlock on a range that does not deviate considerably from the typical range of activations inCNNs so that the model can be efficiently trained using standard training parameters we applied theinverse transformation to scale the outputs back to their original range after the stochastic layer

C4 Hyper-parameters and other design choices

We developed VOneNets from three different standard CNN architectures ResNet50 CORnet-S andAlexNet As previously mentioned we replaced the first block of each architecture by the VOneBlockand the transition layer to create the respective VOneNet Table C3 contains the layers that wereremoved for each base architecture Except CORnet-S the layers removed only contained a single

26

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

convolution nonlinearity and maxpool Since CORnet-S already had a pre-committed set of layersto V1 we replaced them by the VOneBlock The torchvision implementation of AlexNet containsa combined stride of eight in the removed layers (four in the convolution and two in the maxpool)followed by a stride of one in the second convolution (outside of the removed layers) In order to moreeasily adapt it to a VOneBlock with a stride of four we slightly adapted AlexNetrsquos architecture so thatit had strides of two in these three layers (first convolution first maxpool and second convolution)The results shown in Fig 2 were obtained using this modified architecture

Table C3 Layers removed from each base CNN architecture for creating the VOneNets

Architecture Layers removed

AlexNet ConvReLUMaxPoolResNet50 ConvBatch-NormReLUMaxPoolCORnet-S ConvBatch-NormReLUMaxPoolConvBatch-NormReLU

The VOneNet architecture was designed to have as few hyper-parameters as possible When possiblethese were either constrained by the base CNN architecture or by neurobiological data mostparameters of the GFB were instantiated by sampling from neuronal distributions of primate V1neurons the kernel size was set to 25times25px to capture the high variability in Gabor sizes like allother CNN models the spatial extent of the field of view was set to 8deg (Section A) Neverthelessthere were two hyper-parameters where the choice was rather arbitrary the number of channels inthe VOneBlock and the time-window for integrating V1 responses to scale the activations prior tothe stochasticity generator

Since primate V1 has neurons tuned to a wide range of spatial frequencies for each orientation andat any given location of the visual space [101] we expected that a large number of channels would berequired to cover all of the combinations of Gabor parameters For the VOneResNet50 architecturewe varied the number of channels in the VOneBlock between 64 and 1024 equally split betweensimple and complex cells and measured the performance of the several models after training (FigC2 A) Clean ImageNet and common image corruption performance improved with channel numberRemarkably the opposite happened for white box accuracy with variants of VOneResNet50 with 64and 128 channels achieving the highest robustness to white box attacks This result is particularlyinteresting since it had been shown that increasing the number of channels in all layers of a CNNimproves robustness to white box attacks in both standard and adversarially trained models [17]Therefore this shows that the improvement in white box robustness of VOneNets when compared totheir base models cannot be attributed to the higher number of channels For the main model we setthe channel number to be 512 as it offered a good compromise between the different benchmarksRegarding the time-window for integrating V1 responses we observed small effects on the modelsrsquoperformance when varying it between 25 and 200ms We chose 25ms for the main model due to asmall trend for higher robustness to white box attacks with shorter time-windows

102451225612864 channels

04

05

06

07

Acc

urac

y

25 50 100 200Activity time-window (ms)

04

05

06

07

Acc

urac

y

CleanPerturbationWhite boxCorruption

A Bwith 25 ms time-window with 512 channels

Figure C2 Exploration of VOneBlockrsquos hyper-parameters in the VOneResNet50 architec-ture A Performance of different VOneResNet50 models with varying number of channels in theVOneBlock B Same as in A but varying time-window for integrating neuronal activity Error-barsshow SD (n=3 seeds) Red arrows represent the values used in the main model

27

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

C5 Training details

We used PyTorch 041 and trained the model using ImageNet 2012 [34] Images were prepro-cessed (1) for trainingmdashrandom crop to 224 times 224 pixels and random flipping left and right (2)for validationmdashcentral crop to 224 times 224 pixels Preprocessing was followed by normalizationmdashsubtraction and division by [05 05 05] We used a batch size of 256 images and trained either on 2GPUs (NVIDIA Titan X GeForce 1080Ti) or 1 GPU (QuadroRTX6000 or V100) for 70 epochsWe use step learning rate scheduling 01 starting learning rate divided by 10 every 20 epochs Foroptimization we use Stochastic Gradient Descent with a weight decay 00001 momentum 09 and across-entropy loss between image labels and model predictions (logits)

Model weights and code are available at httpsgithubcomdicarlolabvonenet

28

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

D VOneBlock variant details

To investigate which components of the VOneBlock are responsible for the increased robustnesswe created six variants of the VOneBlock by removing or modifying one of its parts In all caseswe trained from scratch a corresponding variant of VOneResNet50 with the modified VOneBlockAll other properties of the model and components of the VOneBlock were left unmodified Herewe describe in more detail each of these variants (the name of the variant refers to the componentremoved)

bull V1 distributions the GFB parameters were sampled from uniform distributions with thesame domain as the empirical distributions used in the default VOneBlock (Table C1)bull Low SF when sampling the spatial frequencies for the GFB neurons with low peak spatial

frequencies (f lt 2cpd) were removed from the empirical distribution [79]bull High SF similar to the previous variant but removing neurons with high peak spatial

frequencies (f gt 2cpd)bull Complex NL complex cell channels were removed from the VOneBlock and replaced by

simple cell channelsbull Simple NL simple cell channels were removed from the VOneBlock and replaced by

complex cell channelsbull V1 stochasticity V1 stochasticity generator was removed from the VOneBlock

While all of the variants resulted in worse overall mean performance than the unmodified VOneRes-Net50 (Table 2) some improved specific benchmarks the variant without high SF had higher accuracyunder noise and blur corruptions (Table D1) the variant without V1 stochasticity had higher accuracyto clean images and images with noise and weather corruptions (Tables 2 and D1) and the variantwithout V1 distributions had higher accuracy to white box attacks with high Linfin perturbations (TableD2)



[97] Florian Tramer et al. "On Adaptive Attacks to Adversarial Example Defenses". In: (Feb. 2020). arXiv: 2002.08347 [cs.LG].

[98] Jonas Rauber et al. "Foolbox Native: Fast adversarial attacks to benchmark the robustness of machine learning models in PyTorch, TensorFlow, and JAX". In: Journal of Open Source Software 5.53 (2020), p. 2607. DOI: 10.21105/joss.02607. URL: https://doi.org/10.21105/joss.02607.

[99] Matthew D. Zeiler and Rob Fergus. "Visualizing and Understanding Convolutional Networks". In: arXiv (Nov. 2013). DOI: 10.1007/978-3-319-10590-1_53. arXiv: 1311.2901. URL: http://arxiv.org/abs/1311.2901.

[100] Chris Olah et al. "An Overview of Early Vision in InceptionV1". In: Distill (2020). https://distill.pub/2020/circuits/early-vision. DOI: 10.23915/distill.00024.002.

[101] Ian Nauhaus et al. "Orthogonal micro-organization of orientation and spatial frequency in primate primary visual cortex". In: Nature Neuroscience 15.12 (2012), pp. 1683-1690. ISSN: 1097-6256. DOI: 10.1038/nn.3255.


Supplementary Material

A Neural Explained Variance

We evaluated how well responses to given images in candidate CNNs explain responses of single V1 neurons to the same images, using a standard neural predictivity methodology based on partial least squares (PLS) regression [31, 32]. We used a neural dataset obtained by extracellular recordings from 102 single-units while presenting 450 different images for 100ms, 20 times each, spanning 4° of the visual space. Results from this dataset were originally published in a study comparing V1 and V2 responses to naturalistic textures [77]. To avoid over-fitting, for each CNN we first chose its best V1 layer on a subset of the dataset (n=135 images) and reported the final neural explained variance calculated for the chosen layer on the remaining images of the dataset (n=315 images). Model units were mapped to each V1 neuron linearly using a PLS regression model with 25 components. The mapping was performed for each neuron using 90% of the image responses in the respective dataset and tested on the remaining held-out 10%, in a 10-fold cross-validation strategy. We reported neural explained variance values normalized by the neuronal internal consistency. We calculated V1 neural explained variance for a large pool of CNNs (n=30 models).
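As a concrete illustration, the mapping step can be sketched with scikit-learn (a minimal sketch, assuming `features` holds the chosen layer's activations with shape n_images x n_units and `responses` holds trial-averaged neuronal responses with shape n_images x n_neurons; the variable names and scikit-learn routines are ours, not the released pipeline):

    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.model_selection import KFold

    def neural_predictivity(features, responses, n_components=25, n_folds=10):
        """Cross-validated explained variance (r^2) per neuron."""
        r2 = np.zeros((n_folds, responses.shape[1]))
        folds = KFold(n_splits=n_folds, shuffle=True, random_state=0)
        for fold, (train, test) in enumerate(folds.split(features)):
            pls = PLSRegression(n_components=n_components, scale=False)
            pls.fit(features[train], responses[train])      # fit on 90% of images
            pred = pls.predict(features[test])               # predict held-out 10%
            for n in range(responses.shape[1]):
                # squared Pearson correlation between predicted and measured responses
                r2[fold, n] = np.corrcoef(pred[:, n], responses[test, n])[0, 1] ** 2
        # normalization by neuronal internal consistency is applied separately
        return r2.mean(axis=0)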

When using CNNs in a computer vision task such as object recognition in ImageNet, their field-of-view (in degrees) is not defined, since the only relevant input property is the image resolution (in pixels). However, when using a CNN to model the ventral stream, the visual spatial extent of the model's input is of key importance to ensure that it is correctly mapped to the data it is trying to explain. We assigned a spatial extent of 8° to all models, since this value has been previously used in studies benchmarking CNNs as models of the primate ventral stream [31, 94] and is consistent with the results in Cadena et al. [30].


B Image Perturbations

B1 White Box Adversarial Attacks

For performing white box adversarial attacks, we used untargeted projected gradient descent (PGD) [17] (also referred to as the Basic Iterative Method [95]) with L∞, L2, and L1 norm constraints. Given an image x, this method uses the gradient of the loss to iteratively construct an adversarial image x_adv which maximizes the model loss within an Lp bound around x. Formally, in the case of an L∞ constraint, PGD iteratively computes x_adv as

    x_{t+1} = \mathrm{Proj}_{x+S}\left(x_t + \alpha\,\mathrm{sgn}\left(\nabla_{x_t} L(\theta, x_t, y)\right)\right)

where x is the original image and the Proj operator ensures the final computed adversarial image x_adv is constrained to the space x+S, here the L∞ ball around x. In the case of L1 or L2 constraints, at each iteration \nabla_{x_t} L(\theta, x_t, y) is scaled to have an L1 or L2 norm of α, and the Proj operator ensures the final x_adv is within an L1 or L2 norm ball around x.
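In PyTorch, the L∞ update above can be sketched as follows (a minimal sketch, assuming images in [0, 1] and a differentiable classifier `model`; not the attack implementation actually used for the benchmarks):

    import torch
    import torch.nn.functional as F

    def pgd_linf(model, x, y, eps=1/255, n_iter=64):
        alpha = eps / 32                    # step size used in the paper: alpha = eps/32
        x_adv = x.clone().detach()
        for _ in range(n_iter):
            x_adv.requires_grad_(True)
            loss = F.cross_entropy(model(x_adv), y)
            grad = torch.autograd.grad(loss, x_adv)[0]
            with torch.no_grad():
                x_adv = x_adv + alpha * grad.sign()       # ascend the loss
                x_adv = x + (x_adv - x).clamp(-eps, eps)  # Proj onto the L-inf ball
                x_adv = x_adv.clamp(0, 1)                 # stay in valid image range
        return x_adv.detach()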

For our white box benchmarks, we used δ∞ ∈ [1/1020, 1/255], δ2 ∈ [0.15, 0.6], and δ1 ∈ [40, 160] constraints, where δ = x − x_adv, for a total of six different attack variants with two strengths under each norm constraint. We arrived at these strengths by calibrating the lower strength attack for each norm to give a reasonable degree of variance in normally trained CNNs. Because normally trained CNNs are extremely sensitive to adversarial perturbations, this resulted in perturbations well below the level of perceptibility, and in most cases, particularly with the L∞ constraint, perturbations that would not even be rendered differently by a monitor for display. We set the high attack strength at 4 times the low value, which brought standard ImageNet trained CNNs to nearly chance performance. Still, even this higher perturbation strength remained practically imperceptible (Fig. B3, left). Each perturbation was computed with 64 PGD iterations and a step size α = ε/32, where ε = δ_p, for the same 5000 images from the ImageNet validation set on a per model basis, and final top-1 accuracy was reported. The Adversarial Robustness Toolbox [96] was used for computing the attacks.

While adversarial formulations using Lagrangian relaxation, like the C&W attack [7] and the EAD attack [8], are known to be more effective than PGD for finding minimal perturbations under L2 and L1 constraints, these formulations involve computationally expensive line searches, often requiring thousands of iterations to converge, making them computationally difficult to scale to 5000 ImageNet samples for a large number of models. Furthermore, because some of the models we tested operate with stochasticity during inference, we found that other state-of-the-art attacks formulated to efficiently find minimal perturbation distances [9, 10] were generally difficult to tune, as their success becomes stochastic as they near the decision boundary. Thus, we proceeded with the PGD formulation, as we found it to be the most reliable, computationally tractable, and conceptually simple to follow. We share our model and weights and encourage anyone interested to attack our model to verify our results.

Although the stochastic models were implemented with the reparameterization trick for random sampling, which leaves the gradients intact [83], we performed several sanity checks to make sure that the white box attacks were effectively implemented. Following the advice of Athalye et al. [68] and Carlini et al. [84], we observed that in all of the white box attacks, increasing the attack strength decreased model performance, eventually bringing accuracy to zero (Fig. B1), and additionally, that for any given constraint and attack strength, increasing PGD iterations from one to many increased attack effectiveness (Fig. B2). Furthermore, to be extra cautious, we followed the method Athalye et al. [68] used to break Stochastic Activation Pruning [19] and replaced \nabla_x f(x) with \sum_{i=1}^{k} \nabla_x f(x), effectively performing Monte Carlo (MC) sampling of the loss gradient at every PGD iteration. While with high perturbation strength even k = 1 was sufficient to bring our model's accuracy to zero, we find that k = 10 generally achieved higher attack success at intermediate and large perturbation strengths (Table B1), and thus we used this approach for all attacks on stochastic models.
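Schematically, the gradient used at each PGD step on a stochastic model becomes the following (a sketch; `model`, `x_adv`, and `y` as in the PGD sketch above):

    import torch
    import torch.nn.functional as F

    def mc_gradient(model, x_adv, y, k=10):
        # Sum the loss gradient over k independent forward passes; each pass
        # draws a fresh noise sample inside the stochastic VOneBlock.
        grad = torch.zeros_like(x_adv)
        for _ in range(k):
            x_in = x_adv.clone().detach().requires_grad_(True)
            loss = F.cross_entropy(model(x_in), y)
            grad = grad + torch.autograd.grad(loss, x_in)[0]
        return grad  # replaces the single-sample gradient in each PGD step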


[Figure B1 panels: white box accuracy vs. attack strength for PGD-L∞ (1/1020, 1/255, 4/255), PGD-L2 (0.15, 0.60, 2.40), and PGD-L1 (40, 160, 640); models: ResNet50, VOneResNet50, ANT3×3+SIN, ATL∞.]

Figure B1: VOneResNet50 improves robustness to white box attacks in a wide range of perturbation strengths. White box accuracy against perturbation strength curves for PGD attacks with L∞, L2, and L1 constraints, for ResNet50, VOneResNet50, ATL∞, and ANT3×3+SIN. Adding the VOneBlock to ResNet50 consistently improves robustness under all constraints and attack strengths: VOneResNet50 can withstand perturbations roughly 4 times higher than the standard ResNet50 for the same performance level. Remarkably, VOneResNet50 outperformed the adversarially trained model (ATL∞) for a considerable range of perturbation strengths, particularly with the L1 constraint. Adversarial noise training combined with Stylized ImageNet (ANT3×3+SIN) has virtually no improvement in robustness to white box attacks. In all white box attacks and for all models, increasing the attack strength decreased performance, eventually bringing accuracy to zero.

[Figure B2 panels: white box accuracy vs. number of PGD iterations (1 to 64) for PGD-L∞, PGD-L2, and PGD-L1; models: ResNet50, VOneResNet50, ANT3×3+SIN, ATL∞.]

Figure B2: White box attack effectiveness converges for a large number of PGD iteration steps. White box accuracy against iteration curves for PGD attacks with δ∞ = 1/255, δ2 = 0.6, and δ1 = 160 constraints. Increasing the number of PGD iteration steps makes the attack more effective, generally converging after 32 iterations.

Table B1: Improving the effectiveness of white box attacks with Monte Carlo sampling. White box attack accuracy (top-1) under different norm constraints, and white box overall mean, using standard PGD and PGD with Monte Carlo sampling of the loss gradients (PGD-MC, k = 10), on VOneResNet50, for 64 PGD iteration steps. For all stochastic models, we used PGD with Monte Carlo sampling in the white box attacks throughout the paper. Values are mean and SD (n=3 seeds).

    Attack type      Mean [%]   L1 40 [%]  L1 160 [%]  L2 0.15 [%]  L2 0.60 [%]  L∞ 1/1020 [%]  L∞ 1/255 [%]
    Standard PGD     52.8±0.2   64.4±0.4   42.2±0.1    65.2±0.3     45.3±0.4     63.4±0.4       36.4±0.4
    PGD-MC (k=10)    51.1±0.4   64.2±0.5   40.4±0.5    65.3±0.4     44.3±0.3     61.9±0.4       30.3±0.5


Although VOneResNet50 was not designed to explicitly improve robustness to white box attacks, it offers substantial gains over the standard ResNet50 model (Table 1). A closer inspection of the white box accuracy against perturbation strength curves (Fig. B1) reveals that VOneResNet50 can withstand perturbations roughly 4 times higher than the standard ResNet50 for the same performance level (the blue curve is shifted to the right of the black curve by a factor of 4). To help visualize how this difference translates into actual images, we show in Fig. B3 examples of white box adversarial images for both ResNet50 and VOneResNet50, for the perturbation strengths that bring the respective models to nearly chance performance.

[Figure B3 panels: an original image (Egyptian Cat) plus perturbation and adversarial-image pairs; ResNet50 at ‖δ‖∞ = 1/255, ‖δ‖2 = 0.60, and ‖δ‖1 = 160 (all classified Sleeping Bag), and VOneResNet50 at ‖δ‖∞ = 4/255 (Barrel), ‖δ‖2 = 2.40 (Norwegian Elkhound), and ‖δ‖1 = 640 (Schipperke).]

Figure B3: Comparison of adversarial images for ResNet50 and VOneResNet50. Visualization of white box attacks under the 3 different norm constraints (L∞, L2, and L1) for the perturbation strength that brings the model to nearly chance performance (Fig. B1). Left: ResNet50; right: VOneResNet50. For each attack, the perturbation is shown on the left, scaled for visualization, and the perturbed image appears next to it. Perturbation strengths and classes of the adversarial images are shown over the images.


B2 Additional attack controls

The field of adversarial robustness is littered with examples of proposed defense methods that were quickly defeated with some minor attack optimizations or adaptations [68, 84, 97]. For this reason, we took great care to verify our claims of improved robustness of the VOneNets. In addition to the implementation details and controls described in the previous section, we validated our results by further optimizing our attacks on both the standard stochastic VOneResNet50 and the VOneResNet50 with stochasticity removed during the attack and inference.

First, on the standard stochastic VOneResNet50, we ensured that our attack hyper-parameters, including step-size (α), number of gradient samples (k), and PGD iterations, are sufficient for evaluating the network's robustness. For PGD with δ∞ = 1/255 with 64 iterations, we performed a large grid search over α and k, observing only a marginal increase in attack effectiveness (Fig. B4). For the most effective step size (α = ε/8), increasing the number of gradient samples beyond k = 16 no longer decreases accuracy under the attack. At k = 128, VOneResNet50 accuracy is only reduced from 29.15% to 26.0%, remaining a large margin above the ResNet50 accuracy of 0.8%. Additionally, increasing the number of PGD iterations all the way up to 1000 (with α = ε/8 and k = 16) only further reduced performance by 0.5%. Finally, using 5-10 random restarts did not improve the attack beyond the most effective k and α settings.

[Figure B4 panels: white box accuracy (δ∞ = 1/255) vs. step size (fractions of ε, down to ε/32) and vs. number of gradient samples (up to 128), with the reported VOneResNet50 and ResNet50 accuracies shown as reference lines; n = 3 seeds.]

Figure B4: Further attack optimization does not overturn results. White box accuracy for PGD with δ∞ = 1/255, 64 iterations, on 500 images. A large grid search over step size and number of gradient samples only marginally improves attack effectiveness. Further attack optimization only marginally reduces model performance. Lines represent mean and shaded areas SD (n=3 seeds).

Second, while boundary attacks are not ideal against the standard stochastic VOneResNet50 due to the shifting boundary line, we investigated VOneResNet50 without stochasticity during attack/inference using the Brendel and Bethge (B&B) attack [10], a powerful white box adversary that aims to find minimal adversarial perturbations along the decision boundary of the network. We reasoned that if the B&B attack offers similar results to those observed using PGD, this is a good sanity check that our observed robustness is reliable and not a result of masked gradients around the input or a quirk of PGD. Using the FoolBox [98] implementation of the L∞ B&B attack, initialized with 500 successful adversarial images generated using PGD δ∞ = 8/255, we see nearly identical scores as compared to standard PGD with optimal hyper-parameters at δ∞ = 1/1020 and δ∞ = 1/255.
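In outline, this control corresponds to roughly the following foolbox usage (a sketch from memory of the foolbox 3 API: the class name, the `starting_points` argument, and the call convention should be checked against the installed version; `model` is VOneResNet50 with its stochastic layer disabled, and `x_pgd` holds the PGD-initialized starting points):

    import foolbox as fb

    fmodel = fb.PyTorchModel(model.eval(), bounds=(0, 1))
    attack = fb.attacks.LinfinityBrendelBethgeAttack()
    # x_pgd: 500 successful PGD adversarials (delta_inf = 8/255) as starting points
    raw, clipped, is_adv = attack(fmodel, images, labels,
                                  starting_points=x_pgd, epsilons=1 / 1020)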

We conclude that these additional controls do not qualitatively change our reported results and provide further evidence that the observed improved robustness of the VOneNets is real and not an artifact.


B3 Common Image Corruptions

For evaluating model robustness to common image corruptions, we used ImageNet-C, a dataset publicly available at https://github.com/hendrycks/robustness [13]. It consists of 15 different types of corruptions, each at five levels of severity, for a total of 75 different perturbations (Fig. B5). The corruption types fall into four categories: noise, blur, weather, and digital effects. Noise includes Gaussian noise, shot noise, and impulse noise; blur includes motion blur, defocus blur, zoom blur, and glass blur; weather includes snow, frost, fog, and brightness; digital effects includes JPEG compression, elastic transformations, contrast, and pixelation. For every corruption type and at every severity, we tested on all 50,000 images from the ImageNet validation set. We note that these images have been JPEG compressed by the original authors for sharing, which is known to have a minor but noticeable effect on final network performance.
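Assuming the public per-corruption folder layout (`<root>/<corruption>/<severity>/...`), the evaluation protocol reduces to a loop like the following (a sketch; `model` is a stand-in and the preprocessing mirrors Section C5):

    import torch
    import torchvision.datasets as datasets
    import torchvision.transforms as T
    from torch.utils.data import DataLoader

    IMAGENET_C_ROOT = "/path/to/ImageNet-C"   # extracted per-corruption folders
    CORRUPTIONS = ["gaussian_noise", "shot_noise", "impulse_noise",
                   "defocus_blur", "glass_blur", "motion_blur", "zoom_blur",
                   "snow", "frost", "fog", "brightness",
                   "contrast", "elastic_transform", "pixelate", "jpeg_compression"]

    preprocess = T.Compose([T.CenterCrop(224), T.ToTensor(),
                            T.Normalize([0.5] * 3, [0.5] * 3)])

    @torch.no_grad()
    def top1(model, loader):
        correct = total = 0
        for x, y in loader:
            correct += (model(x).argmax(1) == y).sum().item()
            total += y.numel()
        return correct / total

    scores = {}
    for corruption in CORRUPTIONS:
        for severity in range(1, 6):
            data = datasets.ImageFolder(f"{IMAGENET_C_ROOT}/{corruption}/{severity}",
                                        preprocess)
            scores[(corruption, severity)] = top1(model, DataLoader(data, batch_size=256))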

Figure B5: Common image corruptions with intermediate severity. Visualization of all 15 common image corruption types, evaluated at severity = 3. First row: original image followed by the noise corruptions; second row: blur corruptions; third row: weather corruptions; fourth row: digital corruptions.


[Figure B6 panels: corruption accuracy vs. severity (1-5) for the noise, blur, weather, and digital categories; models: ResNet50, VOneResNet50, ANT3×3+SIN, ATL∞.]

Figure B6: VOneResNet50 improves robustness to noise, blur, and digital corruptions. Corruption accuracy against severity curves for the four categories of common image corruptions: noise, blur, weather, and digital. Adding the VOneBlock to ResNet50 improves robustness to all corruption categories except weather. The adversarially trained model (ATL∞) is considerably worse at common image corruptions. Adversarial noise combined with Stylized ImageNet training (ANT3×3+SIN) is consistently more robust to all corruption categories. For ResNet50 and VOneResNet50, error-bars represent SD (n=3 seeds).

Table B2: Detailed common image corruption results. Adding the VOneBlock to ResNet50 improves robustness to several of the corruption types. However, for others, particularly fog and contrast, robustness decreases. Interestingly, these are also the corruption types on which the adversarially trained model (ATL∞) performs the worst. Adversarial noise combined with Stylized ImageNet training (ANT3×3+SIN) is the most robust model for most of the corruption types. For ResNet50 and VOneResNet50, values are mean and SD (n=3 seeds).

                       Noise                                Blur
    Model              Gaussian [%]  Shot [%]  Impulse [%]  Defocus [%]  Glass [%]  Motion [%]  Zoom [%]
    ResNet50 [4]       29.6±1.1      27.9±0.9  24.5±1.2     38.8±0.7     26.1±0.6   37.9±0.4    34.5±0.7
    ATL∞ [76]          23.6          22.3      14.8         24.4         34.3       33.2        34.9
    ANT3×3+SIN [59]    61.6          60.7      59.9         41.5         38.6       46.0        37.3
    VOneResNet50       34.6±1.4      33.4±1.2  31.9±1.5     37.8±0.1     35.7±0.1   37.4±0.2    34.0±0.2

                       Weather                                        Digital
    Model              Snow [%]   Frost [%]  Fog [%]   Bright [%]  Contrast [%]  Elastic [%]  Pixelate [%]  JPEG [%]
    ResNet50           30.1±0.5   36.7±0.4   43.6±0.8  66.8±0.2    38.8±0.5      44.8±0.3     47.0±2.7      55.1±1.0
    ATL∞               29.5       31.2       8.4       54.1        10.2          48.4         56.3          58.7
    ANT3×3+SIN         47.7       52.2       54.9      68.2        49.3          52.7         58.9          59.3
    VOneResNet50       25.2±0.5   36.8±0.4   30.1±0.5  62.4±0.3    28.5±0.5      48.7±0.1     63.3±0.2      61.0±0.3


B4 Other defense methods

We compared the performance of VOneResNet50 in the described computer vision benchmarks with two training-based defense methods using the ResNet50 architecture: adversarial training with a δ∞ = 4/255 constraint (ATL∞) [76], and adversarial noise with Stylized ImageNet training (ANT3×3+SIN) [59].

The ATL∞ model was downloaded from the publicly available adversarially trained models at https://github.com/MadryLab/robustness [76]. This model was trained following the adversarial training paradigm of Madry et al. [17] to solve the min-max problem

    \min_\theta \rho(\theta), \quad \text{where} \quad \rho(\theta) = \mathbb{E}_{(x,y)\sim D}\left[\max_{\delta \in S} L(\theta, x+\delta, y)\right]    (1)

Here, the goal is to learn parameters θ minimizing the loss L for training images x and labels y drawn from D, while perturbing x with δ ∈ S to maximally increase the loss. In this case, D is ImageNet, and PGD with δ∞ = 4/255 is used to find the perturbation maximizing the loss for a given (x, y) pair. The δ∞ = 4/255 constraint model was selected to compare against because it had the best performance on our conglomerate benchmark of adversarial attacks and common image corruptions.
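Schematically, each training step solves the inner maximization with PGD and then descends the loss on the adversarial batch (a sketch reusing the `pgd_linf` function from Section B1; `model`, `optimizer`, `train_loader`, and the inner step count are illustrative stand-ins, not the settings of [76]):

    import torch.nn.functional as F

    for x, y in train_loader:                                 # (x, y) ~ D
        x_adv = pgd_linf(model, x, y, eps=4/255, n_iter=5)    # inner max over delta in S
        optimizer.zero_grad()
        F.cross_entropy(model(x_adv), y).backward()           # outer min over theta
        optimizer.step()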

The ANT3×3+SIN model was obtained from https://github.com/bethgelab/game-of-noise. Recently, Rusak et al. showed that training a ResNet50 model with several types of additive input noise improved the robustness to common image corruptions [59]. Inspired by this observation, the authors decided to train a model while optimizing a noise distribution that maximally confuses it. Similarly to standard adversarial training, this results in solving a min-max problem

    \min_\theta \max_\phi \; \mathbb{E}_{(x,y)\sim D}\, \mathbb{E}_{\delta\sim p_\phi(\delta)}\left[L(\theta, x+\delta, y)\right]    (2)

where p_φ(δ) is the maximally confusing noise distribution. The main difference to regular adversarial training in Equation 1 is that, while in the former δ is optimized directly, here δ is found by optimizing a constrained distribution with local spatial correlations. Complementing the adversarial noise training with Stylized ImageNet training [25] produced the model with the current best robustness on the ImageNet-C benchmark as a standalone method.


C VOneNet implementation details

C1 Convolutional layer

The convolutional layer of the VOneBlock is a mathematically parameterized Gabor Filter Bank (GFB). We set the stride of the GFB to four, originating a 56×56 spatial map of activations. Since the number of channels in most CNNs' first convolution is relatively small (64 in the architectures adapted), we used a larger number in the VOneBlock so that the Gabors would cover the large parameter space and better approximate primate V1. We set the main VOneNet models to contain 512 channels, equally split between simple and complex cells (see Fig. C2 A for an analysis of how the number of channels affects performance). Each channel in the GFB convolves a single color channel from the input image.

The Gabor function consists of a two-dimensional grating with a Gaussian envelope and is described by the following equation:

    G_{\theta,f,\phi,n_x,n_y}(x, y) = \frac{1}{2\pi\sigma_x\sigma_y} \exp\left[-0.5\left(\frac{x_{rot}^2}{\sigma_x^2} + \frac{y_{rot}^2}{\sigma_y^2}\right)\right] \cos\left(2\pi f x_{rot} + \phi\right)    (3)

where

    x_{rot} = x\cos(\theta) + y\sin(\theta), \qquad y_{rot} = -x\sin(\theta) + y\cos(\theta)    (4)

    \sigma_x = n_x/f, \qquad \sigma_y = n_y/f    (5)

x_rot and y_rot are the orthogonal and parallel orientations relative to the grating, θ is the angle of the grating orientation, f is the spatial frequency of the grating, φ is the phase of the grating relative to the Gaussian envelope, and σx and σy are the standard deviations of the Gaussian envelope orthogonal and parallel to the grating, which can be defined as multiples (nx and ny) of the grating cycle (the inverse of the frequency).
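The filter construction in Eqs. (3)-(5) translates directly to code (a sketch; the 25×25 px kernel and the 8°/224 px conversion follow the text, while the sampling of θ, f, φ, nx, and ny from Table C1 is left out):

    import numpy as np

    def gabor_kernel(theta, f, phi, nx, ny, ksize=25, deg_per_px=8 / 224):
        sigma_x, sigma_y = nx / f, ny / f                  # envelope SDs, Eq. (5)
        half = (ksize - 1) / 2
        coords = (np.arange(ksize) - half) * deg_per_px    # pixel centers in degrees
        x, y = np.meshgrid(coords, coords)
        x_rot = x * np.cos(theta) + y * np.sin(theta)      # Eq. (4)
        y_rot = -x * np.sin(theta) + y * np.cos(theta)
        envelope = np.exp(-0.5 * (x_rot**2 / sigma_x**2 + y_rot**2 / sigma_y**2))
        grating = np.cos(2 * np.pi * f * x_rot + phi)      # Eq. (3)
        return envelope * grating / (2 * np.pi * sigma_x * sigma_y)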

Although the Gabor filter greatly reduces the number of parameters defining the linear spatial component of V1 neurons, it still has five parameters per channel. Fortunately, there is a vast literature in neurophysiology with detailed characterizations of primate V1 response properties, which can be used to constrain these parameters. To instantiate a VOneBlock with C_V1 channels, we sampled C_V1 values for each of the parameters according to an empirically constrained distribution (Table C1). Due to the resolution of the input images, we limited the ranges of spatial frequencies (f < 5.6 cpd) and number of cycles (n > 0.1).

Table C1: Empirical distributions used to sample the GFB parameters in the VOneBlock.

    Parameter   Reference
    θ           De Valois et al. [78], Fig. 5: orientation tuning (foveal)
    f           De Valois et al. [79], Fig. 6: peak spatial freq. (X cells, foveal)
    φ           Uniform distribution ∈ [0, 360] deg
    nx, ny      Ringach 2002 [80], Fig. 4: (nx, ny) joint distribution

Critical to the correct implementation of the biologically-constrained parameters of the GFB is the choice of the model's field of view in degrees. As previously mentioned in Section A, we used 8° as the input spatial extent for all CNN models. It's important to note that the set of spatial filters in the VOneBlock's GFB differs considerably from those in the first convolution in most CNNs (Fig. C1). While standard CNNs learn filters that resemble Gabors in their input layer [1, 99, 100], due to the limited span of their kernels, they do not vary significantly in size and spatial frequency. V1 neurons, on the other hand, are known to exhibit a wide range of receptive field properties. This phenomenon is captured in the VOneBlock, with the spatial frequencies and receptive field sizes of Gabors ranging over more than one order of magnitude. Due to this high variability, we set the convolution kernel to be 25×25 px, which is considerably larger than those in standard CNNs (Fig. C1).


Figure C1: Comparison between filters in the first convolution of standard CNNs and the VOneBlock. Example filters from the first convolution in the VOneBlock, ResNet50, CORnet-S, and AlexNet (from left to right). VOneBlock filters are all parameterized as Gabors, varying considerably in size and spatial frequency. Standard CNNs have some filters with shapes other than Gabors, particularly center-surround, and are limited in size by their small kernels. Kernel sizes are 25px, 7px, 7px, and 11px for VOneBlock, ResNet50, CORnet-S, and AlexNet, respectively.


C2 Nonlinear layer

The VOneBlock's nonlinear layer has two different nonlinearities that are applied to each channel depending on its cell type: a rectified linear transformation for simple cells (6), and the spectral power of a quadrature phase-pair (7) for complex cells:

    S^{nl}_{\theta,f,\phi,n_x,n_y} = \begin{cases} S^{l}_{\theta,f,\phi,n_x,n_y} & \text{if } S^{l}_{\theta,f,\phi,n_x,n_y} > 0 \\ 0 & \text{otherwise} \end{cases}    (6)

    C^{nl}_{\theta,f,\phi,n_x,n_y} = \frac{1}{\sqrt{2}}\sqrt{\left(C^{l}_{\theta,f,\phi,n_x,n_y}\right)^2 + \left(C^{l}_{\theta,f,\phi+\pi/2,n_x,n_y}\right)^2}    (7)

where S^l and S^nl are the linear and nonlinear responses of a simple neuron, and C^l and C^nl are the same for a complex neuron.
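In code, the two channel types reduce to the following (a sketch; `s_linear` is a simple-cell channel's GFB response, and `c_linear`, `c_linear_quadrature` are a complex-cell channel's quadrature pair, i.e., the same Gabor evaluated at φ and φ + π/2):

    import torch

    def simple_cell(s_linear):
        # Eq. (6): half-wave rectification of the linear Gabor response
        return torch.relu(s_linear)

    def complex_cell(c_linear, c_linear_quadrature):
        # Eq. (7): energy of a quadrature phase-pair, scaled by 1/sqrt(2)
        return torch.sqrt(c_linear ** 2 + c_linear_quadrature ** 2) / (2 ** 0.5)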

C3 Neuronal stochasticity generator

In awake monkeys, spike trains of V1 neurons are approximately Poisson, i.e., the variance and mean of spike counts in a given time-window, over a set of repetitions, are roughly the same [82]. We incorporated stochasticity into the VOneBlock to emulate this property of neuronal responses. Since the Poisson distribution is not continuous, it breaks the gradients in a white box attack, giving a false sense of robustness [68]. In order to avoid this situation and facilitate the evaluation of the model's real robustness, our neuronal stochasticity generator, as implemented, uses a continuous second-order approximation of Poisson noise by adding Gaussian noise with variance equal to the activation:

    R_s \sim \mathcal{N}(\mu = R_{ns}, \; \sigma^2 = R_{ns})    (8)

where R_ns and R_s are the non-stochastic and stochastic responses of a neuron.

To approximate the same levels of neuronal stochasticity of primate V1 neurons, it is critical that the VOneBlock activations are in the same range as the V1 neuronal responses (number of spike counts in a given time-window). Thus, we applied an affine transformation to the activations so that both the mean stimulus response and the mean baseline activity are the same as those of a population of V1 neurons (Table C2 shows the mean responses and spontaneous activity of V1 neurons measured in different time-windows).

Table C2: Mean stimulus responses and spontaneous activity of a population of V1 neurons for different time-windows.

    Time-window [ms]   Stimulus response [spikes]   Spontaneous activity [spikes]
    25                 0.324                        0.073
    50                 0.655                        0.145
    100                1.05                         0.29
    200                1.38                         0.58

Since the exact time-window over which responses in V1 are integrated during visual perception is still an open question, we considered a time-window of 25ms (Fig. C2 B). In order to keep the outputs of the VOneBlock in a range that does not deviate considerably from the typical range of activations in CNNs, so that the model can be efficiently trained using standard training parameters, we applied the inverse transformation to scale the outputs back to their original range after the stochastic layer.
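Putting the affine scaling and Eq. (8) together, the stochastic layer can be sketched as follows (the `scale` and `offset` parameters are illustrative stand-ins for the fitted affine transformation, and the clamp is a sketch-level detail to keep the variance non-negative):

    import torch

    def v1_stochasticity(r, scale, offset):
        # Affine map into the V1 spike-count range of a 25 ms window (Table C2)
        spikes = (r * scale + offset).clamp(min=0)
        # Eq. (8) via the reparameterization trick, keeping gradients intact
        spikes = spikes + torch.sqrt(spikes) * torch.randn_like(spikes)
        # Inverse transform back to the typical CNN activation range
        return (spikes - offset) / scale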

C4 Hyper-parameters and other design choices

We developed VOneNets from three different standard CNN architectures: ResNet50, CORnet-S, and AlexNet. As previously mentioned, we replaced the first block of each architecture by the VOneBlock and the transition layer to create the respective VOneNet. Table C3 contains the layers that were removed for each base architecture. Except for CORnet-S, the layers removed only contained a single


convolution, nonlinearity, and maxpool. Since CORnet-S already had a pre-committed set of layers to V1, we replaced them by the VOneBlock. The torchvision implementation of AlexNet contains a combined stride of eight in the removed layers (four in the convolution and two in the maxpool), followed by a stride of one in the second convolution (outside of the removed layers). In order to more easily adapt it to a VOneBlock with a stride of four, we slightly adapted AlexNet's architecture so that it had strides of two in these three layers (first convolution, first maxpool, and second convolution). The results shown in Fig. 2 were obtained using this modified architecture.

Table C3: Layers removed from each base CNN architecture for creating the VOneNets.

    Architecture   Layers removed
    AlexNet        Conv, ReLU, MaxPool
    ResNet50       Conv, Batch-Norm, ReLU, MaxPool
    CORnet-S       Conv, Batch-Norm, ReLU, MaxPool, Conv, Batch-Norm, ReLU

The VOneNet architecture was designed to have as few hyper-parameters as possible. When possible, these were constrained either by the base CNN architecture or by neurobiological data: most parameters of the GFB were instantiated by sampling from neuronal distributions of primate V1 neurons; the kernel size was set to 25×25 px to capture the high variability in Gabor sizes; and, like all other CNN models, the spatial extent of the field of view was set to 8° (Section A). Nevertheless, there were two hyper-parameters where the choice was rather arbitrary: the number of channels in the VOneBlock, and the time-window for integrating V1 responses to scale the activations prior to the stochasticity generator.

Since primate V1 has neurons tuned to a wide range of spatial frequencies for each orientation and at any given location of the visual space [101], we expected that a large number of channels would be required to cover all of the combinations of Gabor parameters. For the VOneResNet50 architecture, we varied the number of channels in the VOneBlock between 64 and 1024, equally split between simple and complex cells, and measured the performance of the several models after training (Fig. C2 A). Clean ImageNet and common image corruption performance improved with channel number. Remarkably, the opposite happened for white box accuracy, with the variants of VOneResNet50 with 64 and 128 channels achieving the highest robustness to white box attacks. This result is particularly interesting since it had been shown that increasing the number of channels in all layers of a CNN improves robustness to white box attacks in both standard and adversarially trained models [17]. Therefore, this shows that the improvement in white box robustness of VOneNets when compared to their base models cannot be attributed to the higher number of channels. For the main model, we set the channel number to 512, as it offered a good compromise between the different benchmarks. Regarding the time-window for integrating V1 responses, we observed small effects on the models' performance when varying it between 25 and 200ms. We chose 25ms for the main model due to a small trend for higher robustness to white box attacks with shorter time-windows.

102451225612864 channels

04

05

06

07

Acc

urac

y

25 50 100 200Activity time-window (ms)

04

05

06

07

Acc

urac

y

CleanPerturbationWhite boxCorruption

A Bwith 25 ms time-window with 512 channels

Figure C2 Exploration of VOneBlockrsquos hyper-parameters in the VOneResNet50 architec-ture A Performance of different VOneResNet50 models with varying number of channels in theVOneBlock B Same as in A but varying time-window for integrating neuronal activity Error-barsshow SD (n=3 seeds) Red arrows represent the values used in the main model

27

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

C5 Training details

We used PyTorch 041 and trained the model using ImageNet 2012 [34] Images were prepro-cessed (1) for trainingmdashrandom crop to 224 times 224 pixels and random flipping left and right (2)for validationmdashcentral crop to 224 times 224 pixels Preprocessing was followed by normalizationmdashsubtraction and division by [05 05 05] We used a batch size of 256 images and trained either on 2GPUs (NVIDIA Titan X GeForce 1080Ti) or 1 GPU (QuadroRTX6000 or V100) for 70 epochsWe use step learning rate scheduling 01 starting learning rate divided by 10 every 20 epochs Foroptimization we use Stochastic Gradient Descent with a weight decay 00001 momentum 09 and across-entropy loss between image labels and model predictions (logits)

Model weights and code are available at httpsgithubcomdicarlolabvonenet

28

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

D VOneBlock variant details

To investigate which components of the VOneBlock are responsible for the increased robustnesswe created six variants of the VOneBlock by removing or modifying one of its parts In all caseswe trained from scratch a corresponding variant of VOneResNet50 with the modified VOneBlockAll other properties of the model and components of the VOneBlock were left unmodified Herewe describe in more detail each of these variants (the name of the variant refers to the componentremoved)

bull V1 distributions the GFB parameters were sampled from uniform distributions with thesame domain as the empirical distributions used in the default VOneBlock (Table C1)bull Low SF when sampling the spatial frequencies for the GFB neurons with low peak spatial

frequencies (f lt 2cpd) were removed from the empirical distribution [79]bull High SF similar to the previous variant but removing neurons with high peak spatial

frequencies (f gt 2cpd)bull Complex NL complex cell channels were removed from the VOneBlock and replaced by

simple cell channelsbull Simple NL simple cell channels were removed from the VOneBlock and replaced by

complex cell channelsbull V1 stochasticity V1 stochasticity generator was removed from the VOneBlock

While all of the variants resulted in worse overall mean performance than the unmodified VOneRes-Net50 (Table 2) some improved specific benchmarks the variant without high SF had higher accuracyunder noise and blur corruptions (Table D1) the variant without V1 stochasticity had higher accuracyto clean images and images with noise and weather corruptions (Tables 2 and D1) and the variantwithout V1 distributions had higher accuracy to white box attacks with high Linfin perturbations (TableD2)

Table D1 Difference between the common image corruption accuracies of the severalVOneResNet50 variants and the unmodified VOneResNet50 Removal of simple cells reducedaccuracy to all common image corruption types Removal of some components of the VOneBlockimproved accuracy in specific common image corruption removing high SF consistently improvedperformance for noise and blur corruptions and removing V1 stochasticity improved performancefor noise and weather corruptions

Noise Blur

Component Gaussian Shot Impulse Defocus Glass Motion Zoomremoved [∆] [∆] [∆] [∆] [∆] [∆] [∆]

V1 dist -28plusmn09 -24plusmn07 -29plusmn11 -16plusmn01 04plusmn02 -08plusmn03 -02plusmn05

Low SF -14plusmn05 -10plusmn02 -15plusmn05 -06plusmn06 -11plusmn02 -12plusmn04 -05plusmn07

High SF 68plusmn13 73plusmn11 70plusmn13 56plusmn03 115plusmn03 37plusmn00 61plusmn01

Complex 10plusmn16 12plusmn13 07plusmn18 12plusmn02 -15plusmn01 02plusmn08 04plusmn11

Simple -61plusmn16 -54plusmn15 -64plusmn19 -41plusmn04 -42plusmn04 -49plusmn05 -54plusmn03

V1 stoch 22plusmn25 22plusmn25 21plusmn29 33plusmn01 -55plusmn01 09plusmn07 09plusmn01

Weather Digital

Component Snow Frost Fog Bright Contrast Elastic Pixelate JPEGremoved [∆] [∆] [∆] [∆] [∆] [∆] [∆] [∆]

V1 dist -02plusmn06 -09plusmn06 -19plusmn11 -08plusmn05 -14plusmn09 03plusmn04 -17plusmn03 08plusmn05

Low SF -04plusmn03 -06plusmn04 09plusmn03 -02plusmn02 -04plusmn05 -09plusmn05 -13plusmn02 -05plusmn04

High SF 14plusmn10 -07plusmn07 -95plusmn04 -52plusmn04 -44plusmn05 24plusmn02 19plusmn04 -46plusmn04

Complex 07plusmn04 04plusmn03 43plusmn02 13plusmn02 29plusmn02 -05plusmn02 -12plusmn00 -19plusmn05

Simple -50plusmn05 -90plusmn05 -44plusmn05 -39plusmn02 -48plusmn04 -39plusmn05 -35plusmn04 -18plusmn02

V1 stoch 25plusmn04 03plusmn03 120plusmn05 26plusmn03 100plusmn08 -28plusmn03 -40plusmn07 -40plusmn04

29

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

Table D2 Difference between the white box attack accuracies of the several VOneResNet50variants and the unmodified VOneResNet50 Removal of V1 stochasticity considerably reducedaccuracy to all white box attacks Removal of V1 distributions improved accuracy for the Linfin whitebox attack with high perturbation strength

PGD-L1 PGD-L2 PGD-LinfinComponent 40 160 015 060 11020 1255removed [∆] [∆] [∆] [∆] [∆] [∆]

V1 dist -05plusmn08 01plusmn11 -08plusmn02 06plusmn10 03plusmn03 30plusmn18Low SF -08plusmn12 -31plusmn11 -06plusmn10 -23plusmn08 -08plusmn13 -18plusmn10High SF -22plusmn09 -47plusmn26 -26plusmn15 -51plusmn12 -26plusmn14 -55plusmn11Complex -13plusmn09 -74plusmn16 -11plusmn09 -72plusmn12 -19plusmn10 -79plusmn12Simple -25plusmn08 -14plusmn14 -29plusmn11 -17plusmn12 -27plusmn12 -06plusmn08Noise -237plusmn09 -365plusmn08 -216plusmn05 -401plusmn04 -322plusmn04 -292plusmn04

30

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

  • Introduction
    • Related Work
      • Adversarial Robustness Correlates with V1 Explained Variance
      • VOneNet a Hybrid CNN with a V1 Neural Network Front-End
      • Results
        • Simulating a V1 at the front of CNNs improves robustness to white box attacks
        • VOneNets outperform state-of-the-art methods on a composite set of perturbations
        • All components of the VOneBlock contribute to improved model robustness
          • Discussion
          • Neural Explained Variance
          • Image Perturbations
            • White Box Adversarial Attacks
            • Additional attack controls
            • Common Image Corruptions
            • Other defense methods
              • VOneNet implementation details
                • Convolutional layer
                • Nonlinear layer
                • Neuronal stochasticity generator
                • Hyper-parameters and other design choices
                • Training details
                  • VOneBlock variant details
Page 16: bioRxiv - Simulating a Primary Visual Cortex at the Front of CNNs … · 2020. 6. 16. · 8MIT-IBM Watson AI Lab dapello@mit.edu tmarques@mit.edu Abstract Current state-of-the-art

Supplementary Material

A Neural Explained Variance

We evaluated how well responses to given images in candidate CNNs explain responses of single V1 neurons to the same images, using a standard neural predictivity methodology based on partial least squares (PLS) regression [31, 32]. We used a neural dataset obtained by extracellular recordings from 102 single-units while presenting 450 different images for 100 ms, 20 times each, spanning 4° of the visual space. Results from this dataset were originally published in a study comparing V1 and V2 responses to naturalistic textures [77]. To avoid over-fitting, for each CNN we first chose its best V1 layer on a subset of the dataset (n=135 images) and reported the final neural explained variance calculated for the chosen layer on the remaining images of the dataset (n=315 images). Model units were mapped to each V1 neuron linearly using a PLS regression model with 25 components. The mapping was performed for each neuron using 90% of the image responses in the respective dataset and tested on the remaining held-out 10%, in a 10-fold cross-validation strategy. We reported neural explained variance values normalized by the neuronal internal consistency. We calculated V1 neural explained variance for a large pool of CNNs (n=30 models).
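To make the procedure concrete, the following is a minimal sketch of the mapping step, assuming pre-computed arrays model_acts (images × model units) and neural_resp (images × neurons); the variable names, and the choice to fit all neurons jointly with scikit-learn's PLSRegression, are ours rather than the released pipeline.

```python
# Minimal sketch of cross-validated PLS neural predictivity (illustrative names).
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import KFold

def neural_predictivity(model_acts, neural_resp, n_components=25, n_folds=10):
    """model_acts: (n_images, n_units); neural_resp: (n_images, n_neurons).
    Returns the mean cross-validated explained variance per neuron (to be
    normalized afterwards by each neuron's internal consistency)."""
    n_neurons = neural_resp.shape[1]
    r2 = np.zeros((n_folds, n_neurons))
    folds = KFold(n_splits=n_folds, shuffle=True, random_state=0)
    for i, (train, test) in enumerate(folds.split(model_acts)):
        pls = PLSRegression(n_components=n_components, scale=False)
        pls.fit(model_acts[train], neural_resp[train])
        pred = pls.predict(model_acts[test])
        for n in range(n_neurons):
            r = np.corrcoef(pred[:, n], neural_resp[test, n])[0, 1]
            r2[i, n] = r ** 2  # squared Pearson r as explained variance
    return r2.mean(axis=0)
```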

When using CNNs in a computer vision task such as object recognition in ImageNet, their field of view (in degrees) is not defined, since the only relevant input property is the image resolution (in pixels). However, when using a CNN to model the ventral stream, the visual spatial extent of the model's input is of key importance to ensure that it is correctly mapped to the data it is trying to explain. We assigned a spatial extent of 8° to all models, since this value has been previously used in studies benchmarking CNNs as models of the primate ventral stream [31, 94] and is consistent with the results in Cadena et al. [30].


B Image Perturbations

B.1 White Box Adversarial Attacks

For performing white box adversarial attacks, we used untargeted projected gradient descent (PGD) [17] (also referred to as the Basic Iterative Method [95]) with L∞, L2, and L1 norm constraints. Given an image x, this method uses the gradient of the loss to iteratively construct an adversarial image x_adv which maximizes the model loss within an Lp bound around x. Formally, in the case of an L∞ constraint, PGD iteratively computes x_adv as

$$x_{t+1} = \mathrm{Proj}_{x+S}\left(x_t + \alpha\,\mathrm{sgn}\left(\nabla_{x_t} L(\theta, x_t, y)\right)\right)$$

where x is the original image and the Proj operator ensures the final computed adversarial image x_adv is constrained to the space x+S, here the L∞ ball around x. In the case of L1 or L2 constraints, at each iteration the gradient ∇_{x_t} L(θ, x_t, y) is scaled to have an L1 or L2 norm of α, and the Proj operator ensures the final x_adv is within an L1 or L2 norm ball around x.
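For concreteness, here is a minimal PyTorch sketch of this L∞ update; the function name and the assumption of inputs in [0, 1] are ours, not the attack code used in the paper (which relied on the toolbox cited below).

```python
# Minimal sketch of untargeted PGD under an L-infinity constraint.
import torch
import torch.nn.functional as F

def pgd_linf(model, x, y, eps, n_iter=64, alpha=None):
    alpha = eps / 32 if alpha is None else alpha  # step size used in the paper
    x_adv = x.clone().detach()
    for _ in range(n_iter):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()       # ascend the loss
            x_adv = x + (x_adv - x).clamp(-eps, eps)  # project onto the eps-ball
            x_adv = x_adv.clamp(0, 1)                 # stay a valid image
    return x_adv.detach()
```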

For our white box benchmarks we used δ∞ ∈ {1/1020, 1/255}, δ2 ∈ {0.15, 0.6}, and δ1 ∈ {40, 160} constraints, where δ = x − x_adv, for a total of six different attack variants with two strengths under each norm constraint. We arrived at these strengths by calibrating the lower strength attack for each norm to give a reasonable degree of variance in normally trained CNNs. Because normally trained CNNs are extremely sensitive to adversarial perturbations, this resulted in perturbations well below the level of perceptibility and, in most cases, particularly with the L∞ constraint, perturbations that would not even be rendered differently by a monitor for display. We set the high attack strength at 4 times the low value, which brought standard ImageNet-trained CNNs to nearly chance performance. Still, even this higher perturbation strength remained practically imperceptible (Fig. B3, left). Each perturbation was computed with 64 PGD iterations and a step size α = ε/32, where ε = ‖δ‖p, for the same 5000 images from the ImageNet validation set on a per-model basis, and final top-1 accuracy was reported. The Adversarial Robustness Toolbox [96] was used for computing the attacks.

While adversarial formulations using Lagrangian relaxation, like the C&W attack [7] and the EAD attack [8], are known to be more effective than PGD for finding minimal perturbations under L2 and L1 constraints, these formulations involve computationally expensive line searches, often requiring thousands of iterations to converge, making them computationally difficult to scale to 5000 ImageNet samples for a large number of models. Furthermore, because some of the models we tested operate with stochasticity during inference, we found that other state-of-the-art attacks formulated to efficiently find minimal perturbation distances [9, 10] were generally difficult to tune, as their success becomes stochastic as they near the decision boundary. Thus, we proceeded with the PGD formulation, as we found it to be the most reliable, computationally tractable, and conceptually simple to follow. We share our model and weights, and encourage anyone interested to attack our model to verify our results.

Although the stochastic models were implemented with the reparameterization trick for random sampling, which leaves the gradients intact [83], we performed several sanity checks to make sure that the white box attacks were effectively implemented. Following the advice of Athalye et al. [68] and Carlini et al. [84], we observed that in all of the white box attacks, increasing the attack strength decreased model performance, eventually bringing accuracy to zero (Fig. B1), and, additionally, that for any given constraint and attack strength, increasing PGD iterations from one to many increased attack effectiveness (Fig. B2). Furthermore, to be extra cautious, we followed the method Athalye et al. [68] used to break Stochastic Activation Pruning [19] and replaced $\nabla_x f(x)$ with

$$\sum_{i=1}^{k} \nabla_x f(x),$$

effectively performing Monte Carlo (MC) sampling of the loss gradient at every PGD iteration (each of the k evaluations draws fresh noise in the stochastic layer). While with high perturbation strength even k = 1 was sufficient to bring our model's accuracy to zero, we found that k = 10 generally achieved higher attack success at intermediate and large perturbation strengths (Table B1), and thus we used this approach for all attacks on stochastic models.
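A sketch of this per-step Monte Carlo gradient estimate (fresh VOneBlock noise is drawn on every forward pass; with a sign or norm-rescaled step, summing and averaging the k gradients give the same direction):

```python
# Monte Carlo estimate of the loss gradient for stochastic models.
import torch
import torch.nn.functional as F

def mc_gradient(model, x_adv, y, k=10):
    grads = []
    for _ in range(k):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)  # stochastic forward pass
        grads.append(torch.autograd.grad(loss, x_adv)[0])
    return torch.stack(grads).mean(dim=0)
```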


[Figure B1 shows white box accuracy as a function of attack strength, one panel per norm constraint (PGD-L∞: 1/1020, 1/255, 4/255; PGD-L2: 0.15, 0.6, 2.4; PGD-L1: 40, 160, 640), for ResNet50, VOneResNet50, ATL∞, and ANT3×3+SIN.]

Figure B1: VOneResNet50 improves robustness to white box attacks in a wide range of perturbation strengths. White box accuracy vs. perturbation strength curves for PGD attacks with L∞, L2, and L1 constraints, for ResNet50, VOneResNet50, ATL∞, and ANT3×3+SIN. Adding the VOneBlock to ResNet50 consistently improves robustness under all constraints and attack strengths: VOneResNet50 can withstand perturbations roughly 4 times higher than the standard ResNet50 for the same performance level. Remarkably, VOneResNet50 outperformed the adversarially trained model (ATL∞) for a considerable range of perturbation strengths, particularly with the L1 constraint. Adversarial noise training combined with Stylized ImageNet (ANT3×3+SIN) yields virtually no improvement in robustness to white box attacks. In all white box attacks and for all models, increasing the attack strength decreased performance, eventually bringing accuracy to zero.

[Figure B2 shows white box accuracy as a function of the number of PGD iterations (1 to 64), one panel per norm constraint (PGD-L∞, PGD-L2, PGD-L1), for the same four models.]

Figure B2: White box attack effectiveness converges for a large number of PGD iteration steps. White box accuracy vs. iteration curves for PGD attacks with δ∞ = 1/255, δ2 = 0.6, and δ1 = 160 constraints. Increasing the number of PGD iteration steps makes the attack more effective, generally converging after 32 iterations.

Table B1: Improving the effectiveness of white box attacks with Monte Carlo sampling. White box attack accuracy (top-1, %) under different norm constraints, and white box overall mean, using standard PGD and PGD with Monte Carlo sampling of the loss gradients (PGD-MC, k = 10) on VOneResNet50, for 64 PGD iteration steps. For all stochastic models, we used PGD with Monte Carlo sampling in the white box attacks throughout the paper. Values are mean and SD (n=3 seeds).

Attack type   | Mean     | PGD-L1 (40) | PGD-L1 (160) | PGD-L2 (0.15) | PGD-L2 (0.6) | PGD-L∞ (1/1020) | PGD-L∞ (1/255)
Standard PGD  | 52.8±0.2 | 64.4±0.4    | 42.2±0.1     | 65.2±0.3      | 45.3±0.4     | 63.4±0.4        | 36.4±0.4
PGD-MC (k=10) | 51.1±0.4 | 64.2±0.5    | 40.4±0.5     | 65.3±0.4      | 44.3±0.3     | 61.9±0.4        | 30.3±0.5


Although VOneResNet50 was not designed to explicitly improve robustness to white box attacks, it offers substantial gains over the standard ResNet50 model (Table 1). A closer inspection of the white box accuracy vs. perturbation strength curves (Fig. B1) reveals that VOneResNet50 can withstand perturbations roughly 4 times higher than the standard ResNet50 for the same performance level (the blue curve is shifted to the right of the black curve by a factor of 4). To help visualize how this difference translates into actual images, we show in Fig. B3 examples of white box adversarial images for both ResNet50 and VOneResNet50, for the perturbation strengths that bring the respective models to nearly chance performance.

[Figure B3 shows the original image (Egyptian Cat) and, for each norm constraint, the scaled perturbation plus the resulting adversarial image. ResNet50 (left): ‖δ‖∞ = 1/255, ‖δ‖2 = 0.60, and ‖δ‖1 = 160, all misclassified as Sleeping Bag. VOneResNet50 (right): ‖δ‖∞ = 4/255 (Barrel), ‖δ‖2 = 2.40 (Norwegian Elkhound), ‖δ‖1 = 640 (Schipperke).]

Figure B3: Comparison of adversarial images for ResNet50 and VOneResNet50. Visualization of white box attacks under the 3 different norm constraints (L∞, L2, and L1), for the perturbation strength that brings the model to nearly chance performance (Fig. B1). Left: ResNet50; right: VOneResNet50. For each attack, the perturbation is shown on the left, scaled for visualization, and the perturbed image appears next to it. Perturbation strengths and classes of the adversarial images are shown over the images.


B.2 Additional attack controls

The field of adversarial robustness is littered with examples of proposed defense methods that were quickly defeated with some minor attack optimizations or adaptations [68, 84, 97]. For this reason, we took great care to verify our claims of improved robustness of the VOneNets. In addition to the implementation details and controls described in the previous section, we validated our results by further optimizing our attacks on both the standard stochastic VOneResNet50 and the VOneResNet50 with stochasticity removed during the attack and inference.

First, on the standard stochastic VOneResNet50, we ensured that our attack hyper-parameters, including step size (α), number of gradient samples (k), and PGD iterations, are sufficient for evaluating the network's robustness. For PGD with δ∞ = 1/255 and 64 iterations, we performed a large grid search over α and k, observing only a marginal increase in attack effectiveness (Fig. B4). For the most effective step size (α = ε/8), increasing the number of gradient samples beyond k = 16 no longer decreases accuracy under the attack. At k = 128, VOneResNet50 accuracy is only reduced from 29.15% to 26.0%, remaining a large margin above the ResNet50 accuracy of 0.8%. Additionally, increasing the number of PGD iterations all the way up to 1000 (with α = ε/8 and k = 16) only further reduced performance by 0.5%. Finally, using 5-10 random restarts did not improve the attack beyond the most effective k and α settings.

[Figure B4 shows white box accuracy (PGD, ‖δ‖∞ = 1/255, 64 iterations) over a grid of step sizes (ε/2 to ε/32) and numbers of gradient samples (1 to 128), with the reported VOneResNet50 and ResNet50 accuracies as reference lines; n = 3 seeds.]

Figure B4: Further attack optimization does not overturn results. White box accuracy for PGD with δ∞ = 1/255, 64 iterations, on 500 images. A large grid search over step size and number of gradient samples only marginally improves attack effectiveness, i.e., further attack optimization only marginally reduces model performance. Lines represent mean and shaded areas SD (n=3 seeds).

Second, while boundary attacks are not ideal against the standard stochastic VOneResNet50 due to its shifting decision boundary, we investigated VOneResNet50 without stochasticity during attack/inference using the Brendel and Bethge (B&B) attack [10], a powerful white box adversary that aims to find minimal adversarial perturbations along the decision boundary of the network. We reasoned that if the B&B attack offers similar results to those observed using PGD, this is a good sanity check that our observed robustness is reliable and not a result of masked gradients around the input or a quirk of PGD. Using the FoolBox [98] implementation of the L∞ B&B attack, initialized with 500 successful adversarial images generated using PGD with δ∞ = 8/255, we see nearly identical scores as compared to standard PGD with optimal hyper-parameters at δ∞ = 1/1020 and δ∞ = 1/255.

We conclude that these additional controls do not qualitatively change our reported results, and provide further evidence that the observed improved robustness of the VOneNets is real and not an artifact.


B.3 Common Image Corruptions

For evaluating model robustness to common image corruptions, we used ImageNet-C, a dataset publicly available at https://github.com/hendrycks/robustness [13]. It consists of 15 different types of corruptions, each at five levels of severity, for a total of 75 different perturbations (Fig. B5). The corruption types fall into four categories: noise, blur, weather, and digital effects. Noise includes Gaussian noise, shot noise, and impulse noise; blur includes motion blur, defocus blur, zoom blur, and glass blur; weather includes snow, frost, fog, and brightness; digital effects includes JPEG compression, elastic transformations, contrast, and pixelation. For every corruption type and at every severity, we tested on all 50,000 images from the ImageNet validation set. We note that these images have been JPEG compressed by the original authors for sharing, which is known to have a minor but noticeable effect on final network performance.
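A minimal evaluation sketch for one corruption/severity pair, assuming the standard ImageNet-C folder layout (corruption/severity/class/image) and the normalization used for our models; the function name is illustrative.

```python
# Sketch: top-1 accuracy on one ImageNet-C corruption at one severity level.
import torch
from torchvision import datasets, transforms

def imagenet_c_accuracy(model, root, corruption="gaussian_noise", severity=3,
                        device="cuda"):
    tfm = transforms.Compose([transforms.CenterCrop(224), transforms.ToTensor(),
                              transforms.Normalize([0.5] * 3, [0.5] * 3)])
    ds = datasets.ImageFolder(f"{root}/{corruption}/{severity}", transform=tfm)
    loader = torch.utils.data.DataLoader(ds, batch_size=256, num_workers=8)
    model = model.eval().to(device)
    correct = total = 0
    with torch.no_grad():
        for x, y in loader:
            pred = model(x.to(device)).argmax(dim=1).cpu()
            correct += (pred == y).sum().item()
            total += y.numel()
    return correct / total
```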

[Figure B5 shows the original image and one example panel per corruption type, arranged as listed in the caption.]

Figure B5: Common image corruptions with intermediate severity. Visualization of all 15 common image corruption types, evaluated at severity = 3. First row: original image followed by the noise corruptions (Gaussian noise, shot noise, impulse noise); second row: blur corruptions (defocus, glass, motion, zoom); third row: weather corruptions (snow, frost, fog, brightness); fourth row: digital corruptions (contrast, elastic transform, pixelate, JPEG compression).


[Figure B6 shows corruption accuracy as a function of severity (1-5), one panel per corruption category (noise, blur, weather, digital), for ResNet50, VOneResNet50, ATL∞, and ANT3×3+SIN.]

Figure B6: VOneResNet50 improves robustness to noise, blur, and digital corruptions. Corruption accuracy vs. severity curves for the four categories of common image corruptions: noise, blur, weather, and digital. Adding the VOneBlock to ResNet50 improves robustness to all corruption categories except weather. The adversarially trained model (ATL∞) is considerably worse at common image corruptions. Adversarial noise combined with Stylized ImageNet training (ANT3×3+SIN) is consistently more robust to all corruption categories. For ResNet50 and VOneResNet50, error-bars represent SD (n=3 seeds).

Table B2: Detailed common image corruption results. Adding the VOneBlock to ResNet50 improves robustness to several of the corruption types. However, for others, particularly fog and contrast, robustness decreases. Interestingly, these are the corruption types on which the adversarially trained model (ATL∞) also does the worst. Adversarial noise combined with Stylized ImageNet training (ANT3×3+SIN) is the most robust model for most of the corruption types. For ResNet50 and VOneResNet50, values are mean and SD (n=3 seeds). All values are top-1 accuracy (%).

                | Noise                          | Blur
Model           | Gaussian | Shot     | Impulse  | Defocus  | Glass    | Motion   | Zoom
ResNet50 [4]    | 29.6±1.1 | 27.9±0.9 | 24.5±1.2 | 38.8±0.7 | 26.1±0.6 | 37.9±0.4 | 34.5±0.7
ATL∞ [76]       | 23.6     | 22.3     | 14.8     | 24.4     | 34.3     | 33.2     | 34.9
ANT3×3+SIN [59] | 61.6     | 60.7     | 59.9     | 41.5     | 38.6     | 46.0     | 37.3
VOneResNet50    | 34.6±1.4 | 33.4±1.2 | 31.9±1.5 | 37.8±0.1 | 35.7±0.1 | 37.4±0.2 | 34.0±0.2

                | Weather                                   | Digital
Model           | Snow     | Frost    | Fog      | Bright   | Contrast | Elastic  | Pixelate | JPEG
ResNet50        | 30.1±0.5 | 36.7±0.4 | 43.6±0.8 | 66.8±0.2 | 38.8±0.5 | 44.8±0.3 | 47.0±2.7 | 55.1±1.0
ATL∞            | 29.5     | 31.2     | 8.4      | 54.1     | 10.2     | 48.4     | 56.3     | 58.7
ANT3×3+SIN      | 47.7     | 52.2     | 54.9     | 68.2     | 49.3     | 52.7     | 58.9     | 59.3
VOneResNet50    | 25.2±0.5 | 36.8±0.4 | 30.1±0.5 | 62.4±0.3 | 28.5±0.5 | 48.7±0.1 | 63.3±0.2 | 61.0±0.3


B.4 Other defense methods

We compared the performance of VOneResNet50 on the described computer vision benchmarks with two training-based defense methods using the ResNet50 architecture: adversarial training with a δ∞ = 4/255 constraint (ATL∞) [76], and adversarial noise with Stylized ImageNet training (ANT3×3+SIN) [59].

The ATL∞ model was downloaded from the publicly available adversarially trained models at https://github.com/MadryLab/robustness [76]. This model was trained following the adversarial training paradigm of Madry et al. [17] to solve the min-max problem

$$\min_\theta \rho(\theta), \quad \text{where} \quad \rho(\theta) = \mathbb{E}_{(x,y)\sim D}\left[\max_{\delta \in S} L(\theta, x+\delta, y)\right] \tag{1}$$

Here, the goal is to learn parameters θ minimizing the loss L for training images x and labels y drawn from D, while perturbing x with δ ∈ S to maximally increase the loss. In this case, D is ImageNet, and PGD with δ∞ = 4/255 is used to find the perturbation maximizing the loss for a given (x, y) pair. The δ∞ = 4/255 constraint model was selected to compare against because it had the best performance on our conglomerate benchmark of adversarial attacks and common image corruptions.
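Schematically, one training step of Equation 1 alternates an inner PGD maximization with an outer SGD minimization; the sketch below reuses the pgd_linf sketch from Section B.1, and the inner iteration count is illustrative rather than the setting of Madry et al.

```python
# One adversarial training step: inner maximization (PGD), outer minimization (SGD).
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, x, y, eps=4 / 255):
    model.eval()
    x_adv = pgd_linf(model, x, y, eps=eps, n_iter=7)  # inner max over delta in S
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)           # outer min over theta
    loss.backward()
    optimizer.step()
    return loss.item()
```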

The ANT3×3+SIN model was obtained from https://github.com/bethgelab/game-of-noise. Recently, Rusak et al. showed that training a ResNet50 model with several types of additive input noise improved the robustness to common image corruptions [59]. Inspired by this observation, the authors decided to train a model while optimizing a noise distribution that maximally confuses it. Similarly to standard adversarial training, this results in solving a min-max problem

$$\min_\theta \max_\phi \mathbb{E}_{(x,y)\sim D}\,\mathbb{E}_{\delta\sim p_\phi(\delta)}\left[L(\theta, x+\delta, y)\right] \tag{2}$$

where p_φ(δ) is the maximally confusing noise distribution. The main difference to regular adversarial training in Equation 1 is that, while in the former δ is optimized directly, here δ is found by optimizing a constrained distribution with local spatial correlations. Complementing the adversarial noise training with Stylized ImageNet training [25] produced the model with the current best robustness on the ImageNet-C benchmark as a standalone method.


C VOneNet implementation details

C.1 Convolutional layer

The convolutional layer of the VOneBlock is a mathematically parameterized Gabor Filter Bank (GFB). We set the stride of the GFB to four, yielding a 56×56 spatial map of activations. Since the number of channels in most CNNs' first convolution is relatively small (64 in the architectures adapted), we used a larger number in the VOneBlock so that the Gabors would cover the large parameter space and better approximate primate V1. We set the main VOneNet models to contain 512 channels, equally split between simple and complex cells (see Fig. C2A for an analysis of how the number of channels affects performance). Each channel in the GFB convolves a single color channel from the input image.

The Gabor function consists of a two-dimensional grating with a Gaussian envelope, and is described by the following equation:

$$G_{\theta,f,\phi,n_x,n_y}(x, y) = \frac{1}{2\pi\sigma_x\sigma_y}\exp\left[-0.5\left(\frac{x_{rot}^2}{\sigma_x^2} + \frac{y_{rot}^2}{\sigma_y^2}\right)\right]\cos\left(2\pi f x_{rot} + \phi\right) \tag{3}$$

where

$$x_{rot} = x\cos\theta + y\sin\theta, \qquad y_{rot} = -x\sin\theta + y\cos\theta \tag{4}$$

$$\sigma_x = \frac{n_x}{f}, \qquad \sigma_y = \frac{n_y}{f} \tag{5}$$

Here, x_rot and y_rot are the orthogonal and parallel orientations relative to the grating, θ is the angle of the grating orientation, f is the spatial frequency of the grating, φ is the phase of the grating relative to the Gaussian envelope, and σ_x and σ_y are the standard deviations of the Gaussian envelope orthogonal and parallel to the grating, which can be defined as multiples (n_x and n_y) of the grating cycle (the inverse of the frequency).

Although the Gabor filter greatly reduces the number of parameters defining the linear spatial component of V1 neurons, it still has five parameters per channel. Fortunately, there is a vast literature in neurophysiology with detailed characterizations of primate V1 response properties, which can be used to constrain these parameters. To instantiate a VOneBlock with C_V1 channels, we sampled C_V1 values for each of the parameters according to an empirically constrained distribution (Table C1; see the sketch after the table). Due to the resolution of the input images, we limited the ranges of spatial frequencies (f < 5.6 cpd) and number of cycles (n > 0.1).

Table C1: Empirical distributions used to sample the GFB parameters in the VOneBlock.

Parameter | Reference
θ         | De Valois et al. [78], Fig. 5: orientation tuning (foveal)
f         | De Valois et al. [79], Fig. 6: peak spatial freq. (X cells, foveal)
φ         | uniform distribution ∈ [0, 360] deg
n_x, n_y  | Ringach 2002 [80], Fig. 4: (n_x, n_y) joint distribution
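The sketch below illustrates the construction, with simple stand-in distributions in place of the empirical ones in Table C1, and cycles per degree converted to cycles per pixel under the 8° / 224 px field of view (Sections A and C.5).

```python
# Sketch: sample Gabor parameters and render a kernel per Eqs. (3)-(5).
import numpy as np

rng = np.random.default_rng(0)
PPD = 224 / 8.0  # pixels per degree of visual angle

def sample_params():
    theta = rng.uniform(0, np.pi)                        # stand-in for [78]
    f_cpd = np.clip(rng.lognormal(0.5, 0.6), 0.1, 5.6)   # stand-in for [79]
    phase = rng.uniform(0, 2 * np.pi)
    nx, ny = rng.uniform(0.1, 1.0, size=2)               # stand-in for [80]
    return theta, f_cpd, phase, nx, ny

def gabor_kernel(theta, f_cpd, phase, nx, ny, ksize=25):
    f = f_cpd / PPD                      # cycles per pixel
    sx, sy = nx / f, ny / f              # envelope SDs in pixels, Eq. (5)
    g = np.arange(ksize) - ksize // 2
    x, y = np.meshgrid(g, g)
    x_rot = x * np.cos(theta) + y * np.sin(theta)        # Eq. (4)
    y_rot = -x * np.sin(theta) + y * np.cos(theta)
    env = np.exp(-0.5 * ((x_rot / sx) ** 2 + (y_rot / sy) ** 2))
    return env * np.cos(2 * np.pi * f * x_rot + phase) / (2 * np.pi * sx * sy)
```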

Critical to the correct implementation of the biologically-constrained parameters of the GFB is the choice of the model's field of view in degrees. As previously mentioned in Section A, we used 8° as the input spatial extent for all CNN models. It is important to note that the set of spatial filters in the VOneBlock's GFB differs considerably from those in the first convolution of most CNNs (Fig. C1). While standard CNNs learn filters that resemble Gabors in their input layer [1, 99, 100], due to the limited span of their kernels they do not vary significantly in size and spatial frequency. V1 neurons, on the other hand, are known to exhibit a wide range of receptive field properties. This phenomenon is captured in the VOneBlock, with the spatial frequencies and receptive field sizes of Gabors ranging over more than one order of magnitude. Due to this high variability, we set the convolution kernel to be 25×25 px, which is considerably larger than those in standard CNNs (Fig. C1).


[Figure C1 shows example first-convolution filters, one column per model: VOneBlock, ResNet50, CORnet-S, and AlexNet.]

Figure C1: Comparison between filters in the first convolution of standard CNNs and the VOneBlock. Example filters from the first convolution in the VOneBlock, ResNet50, CORnet-S, and AlexNet (from left to right). VOneBlock filters are all parameterized as Gabors, varying considerably in size and spatial frequency. Standard CNNs have some filters with shapes other than Gabors, particularly center-surround, and are limited in size by their small kernels. Kernel sizes are 25 px, 7 px, 7 px, and 11 px for VOneBlock, ResNet50, CORnet-S, and AlexNet, respectively.


C.2 Nonlinear layer

The VOneBlock's nonlinear layer has two different nonlinearities that are applied to each channel depending on its cell type: a rectified linear transformation for simple cells (6), and the spectral power of a quadrature phase-pair (7) for complex cells:

$$S^{nl}_{\theta,f,\phi,n_x,n_y} = \begin{cases} S^{l}_{\theta,f,\phi,n_x,n_y} & \text{if } S^{l}_{\theta,f,\phi,n_x,n_y} > 0 \\ 0 & \text{otherwise} \end{cases} \tag{6}$$

$$C^{nl}_{\theta,f,\phi,n_x,n_y} = \frac{1}{\sqrt{2}}\sqrt{\left(C^{l}_{\theta,f,\phi,n_x,n_y}\right)^2 + \left(C^{l}_{\theta,f,\phi+\pi/2,n_x,n_y}\right)^2} \tag{7}$$

where S^l and S^nl are the linear and nonlinear responses of a simple neuron, and C^l and C^nl are the same for a complex neuron.
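In PyTorch, these two nonlinearities reduce to a few lines; the sketch below assumes the linear GFB responses arrive already split into simple-cell channels and the two quadrature-phase (φ and φ+π/2) halves of the complex-cell channels, with illustrative names.

```python
# Sketch of the simple (Eq. 6) and complex (Eq. 7) cell nonlinearities.
import torch
import torch.nn.functional as F

def vone_nonlinearities(s_lin, c_lin_0, c_lin_90):
    simple = F.relu(s_lin)                                          # Eq. (6)
    complex_ = torch.sqrt(c_lin_0 ** 2 + c_lin_90 ** 2) / 2 ** 0.5  # Eq. (7)
    return torch.cat([simple, complex_], dim=1)
```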

C.3 Neuronal stochasticity generator

In awake monkeys, spike trains of V1 neurons are approximately Poisson, i.e., the variance and mean of spike counts in a given time-window over a set of repetitions are roughly the same [82]. We incorporated stochasticity into the VOneBlock to emulate this property of neuronal responses. Since the Poisson distribution is not continuous, it breaks the gradients in a white box attack, giving a false sense of robustness [68]. In order to avoid this situation and facilitate the evaluation of the model's real robustness, our neuronal stochasticity generator as implemented uses a continuous second-order approximation of Poisson noise, adding Gaussian noise with variance equal to the activation:

$$R_s \sim \mathcal{N}\left(\mu = R_{ns},\ \sigma^2 = R_{ns}\right) \tag{8}$$

where R_ns and R_s are the non-stochastic and stochastic responses of a neuron.

To approximate the same levels of neuronal stochasticity as primate V1 neurons, it is critical that the VOneBlock activations are in the same range as the V1 neuronal responses (number of spike counts in a given time-window). Thus, we applied an affine transformation to the activations so that both the mean stimulus response and the mean baseline activity are the same as those of a population of V1 neurons (Table C2 shows the mean responses and spontaneous activity of V1 neurons measured in different time-windows).

Table C2: Mean stimulus responses and spontaneous activity of a population of V1 neurons for different time-windows.

Time-window [ms] | Stimulus response [spikes] | Spontaneous activity [spikes]
25               | 0.324                      | 0.073
50               | 0.655                      | 0.145
100              | 1.05                       | 0.29
200              | 1.38                       | 0.58

Since the exact time-window over which responses in V1 are integrated during visual perception is still an open question, we considered a time-window of 25 ms (Fig. C2B). In order to keep the outputs of the VOneBlock in a range that does not deviate considerably from the typical range of activations in CNNs, so that the model can be efficiently trained using standard training parameters, we applied the inverse transformation to scale the outputs back to their original range after the stochastic layer.
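Putting the pieces together, a sketch of the stochastic layer reads as follows; scale and offset stand for the fitted affine parameters (matching Table C2 for the chosen time-window) and are illustrative names.

```python
# Sketch: affine map to spike counts, reparameterized Gaussian noise with
# variance equal to the mean (Eq. 8), then the inverse affine map.
import torch

def v1_stochasticity(r, scale, offset, eps=1e-6):
    counts = (r * scale + offset).clamp(min=0)       # V1-like spike-count range
    std = counts.clamp(min=eps).sqrt()               # sigma^2 = mean
    noisy = counts + std * torch.randn_like(counts)  # reparameterization trick
    return (noisy - offset) / scale                  # back to CNN activation range
```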

C.4 Hyper-parameters and other design choices

We developed VOneNets from three different standard CNN architectures: ResNet50, CORnet-S, and AlexNet. As previously mentioned, we replaced the first block of each architecture with the VOneBlock and the transition layer to create the respective VOneNet. Table C3 contains the layers that were removed for each base architecture. Except for CORnet-S, the removed layers only contained a single


convolution, nonlinearity, and maxpool. Since CORnet-S already had a set of layers pre-committed to V1, we replaced them with the VOneBlock. The torchvision implementation of AlexNet contains a combined stride of eight in the removed layers (four in the convolution and two in the maxpool), followed by a stride of one in the second convolution (outside of the removed layers). In order to more easily adapt it to a VOneBlock with a stride of four, we slightly modified AlexNet's architecture so that it had strides of two in these three layers (first convolution, first maxpool, and second convolution). The results shown in Fig. 2 were obtained using this modified architecture.

Table C3: Layers removed from each base CNN architecture for creating the VOneNets.

Architecture | Layers removed
AlexNet      | Conv, ReLU, MaxPool
ResNet50     | Conv, Batch-Norm, ReLU, MaxPool
CORnet-S     | Conv, Batch-Norm, ReLU, MaxPool, Conv, Batch-Norm, ReLU
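As an illustration of this surgery for ResNet50, the sketch below drops the stem listed in Table C3 and prepends a VOneBlock plus a 1×1 transition convolution mapping the 512 VOneBlock channels to the 64 channels that ResNet50's first residual stage expects; the transition-layer composition is our assumption, and vone_block is assumed defined elsewhere (the released repository, Section C.5, is authoritative).

```python
# Sketch: build VOneResNet50 by replacing the ResNet50 stem.
import torch.nn as nn
from torchvision.models import resnet50

def build_vone_resnet50(vone_block, vone_channels=512):
    base = resnet50()
    transition = nn.Sequential(              # 56x56x512 -> 56x56x64
        nn.Conv2d(vone_channels, 64, kernel_size=1, bias=False),
        nn.BatchNorm2d(64),
        nn.ReLU(inplace=True),
    )
    backend = nn.Sequential(                 # everything after the removed stem
        base.layer1, base.layer2, base.layer3, base.layer4,
        base.avgpool, nn.Flatten(), base.fc,
    )
    return nn.Sequential(vone_block, transition, backend)
```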

The VOneNet architecture was designed to have as few hyper-parameters as possible. When possible, these were either constrained by the base CNN architecture or by neurobiological data: most parameters of the GFB were instantiated by sampling from neuronal distributions of primate V1 neurons; the kernel size was set to 25×25 px to capture the high variability in Gabor sizes; like all other CNN models, the spatial extent of the field of view was set to 8° (Section A). Nevertheless, there were two hyper-parameters where the choice was rather arbitrary: the number of channels in the VOneBlock, and the time-window for integrating V1 responses to scale the activations prior to the stochasticity generator.

Since primate V1 has neurons tuned to a wide range of spatial frequencies for each orientation, and at any given location of the visual space [101], we expected that a large number of channels would be required to cover all of the combinations of Gabor parameters. For the VOneResNet50 architecture, we varied the number of channels in the VOneBlock between 64 and 1024, equally split between simple and complex cells, and measured the performance of the resulting models after training (Fig. C2A). Clean ImageNet and common image corruption performance improved with channel number. Remarkably, the opposite happened for white box accuracy, with variants of VOneResNet50 with 64 and 128 channels achieving the highest robustness to white box attacks. This result is particularly interesting since it had been shown that increasing the number of channels in all layers of a CNN improves robustness to white box attacks in both standard and adversarially trained models [17]. Therefore, this shows that the improvement in white box robustness of VOneNets when compared to their base models cannot be attributed to the higher number of channels. For the main model, we set the channel number to 512, as it offered a good compromise between the different benchmarks. Regarding the time-window for integrating V1 responses, we observed small effects on the models' performance when varying it between 25 and 200 ms. We chose 25 ms for the main model due to a small trend for higher robustness to white box attacks with shorter time-windows.

[Figure C2 shows clean, perturbation-mean, white box, and corruption accuracies: panel A as a function of channel number (64 to 1024, with a 25 ms time-window) and panel B as a function of activity time-window (25 to 200 ms, with 512 channels).]

Figure C2: Exploration of the VOneBlock's hyper-parameters in the VOneResNet50 architecture. A: Performance of different VOneResNet50 models with varying number of channels in the VOneBlock. B: Same as in A, but varying the time-window for integrating neuronal activity. Error-bars show SD (n=3 seeds). Red arrows represent the values used in the main model.


C.5 Training details

We used PyTorch 0.4.1 and trained the models using ImageNet 2012 [34]. Images were preprocessed (1) for training: random crop to 224×224 pixels and random flipping left and right; (2) for validation: central crop to 224×224 pixels. Preprocessing was followed by normalization: subtraction and division by [0.5, 0.5, 0.5]. We used a batch size of 256 images and trained either on 2 GPUs (NVIDIA Titan X / GeForce 1080Ti) or 1 GPU (Quadro RTX 6000 or V100) for 70 epochs. We used step learning rate scheduling: a starting learning rate of 0.1, divided by 10 every 20 epochs. For optimization, we used Stochastic Gradient Descent with a weight decay of 0.0001, momentum of 0.9, and a cross-entropy loss between image labels and model predictions (logits).
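Assembled as code, the configuration reads roughly as below; the transform choices (e.g., RandomResizedCrop for the random crop) are our interpretation, and build_vone_resnet50/vone_block refer to the sketch in Section C.4.

```python
# Sketch of the training configuration (illustrative; see the released code).
import torch
from torchvision import transforms

train_tfm = transforms.Compose([
    transforms.RandomResizedCrop(224),       # random crop to 224x224
    transforms.RandomHorizontalFlip(),       # random left-right flip
    transforms.ToTensor(),
    transforms.Normalize([0.5] * 3, [0.5] * 3),
])
model = build_vone_resnet50(vone_block)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)
criterion = torch.nn.CrossEntropyLoss()
# Standard 70-epoch loop over ImageNet (batch size 256),
# calling scheduler.step() once per epoch.
```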

Model weights and code are available at https://github.com/dicarlolab/vonenet.


D VOneBlock variant details

To investigate which components of the VOneBlock are responsible for the increased robustness, we created six variants of the VOneBlock by removing or modifying one of its parts. In all cases, we trained from scratch a corresponding variant of VOneResNet50 with the modified VOneBlock. All other properties of the model and components of the VOneBlock were left unmodified. Here, we describe each of these variants in more detail (the name of the variant refers to the component removed):

• V1 distributions: the GFB parameters were sampled from uniform distributions with the same domain as the empirical distributions used in the default VOneBlock (Table C1).
• Low SF: when sampling the spatial frequencies for the GFB, neurons with low peak spatial frequencies (f < 2 cpd) were removed from the empirical distribution [79].
• High SF: similar to the previous variant, but removing neurons with high peak spatial frequencies (f > 2 cpd).
• Complex NL: complex cell channels were removed from the VOneBlock and replaced by simple cell channels.
• Simple NL: simple cell channels were removed from the VOneBlock and replaced by complex cell channels.
• V1 stochasticity: the V1 stochasticity generator was removed from the VOneBlock.

While all of the variants resulted in worse overall mean performance than the unmodified VOneResNet50 (Table 2), some improved specific benchmarks: the variant without high SF had higher accuracy under noise and blur corruptions (Table D1); the variant without V1 stochasticity had higher accuracy on clean images and on images with noise and weather corruptions (Tables 2 and D1); and the variant without V1 distributions had higher accuracy under white box attacks with high L∞ perturbations (Table D2).

Table D1: Difference between the common image corruption accuracies of the several VOneResNet50 variants and the unmodified VOneResNet50. Removal of simple cells reduced accuracy for all common image corruption types. Removal of some components of the VOneBlock improved accuracy for specific common image corruptions: removing high SF consistently improved performance for noise and blur corruptions, and removing V1 stochasticity improved performance for noise and weather corruptions. All values are accuracy differences Δ (%).

                  | Noise                          | Blur
Component removed | Gaussian | Shot     | Impulse  | Defocus  | Glass    | Motion   | Zoom
V1 dist           | -2.8±0.9 | -2.4±0.7 | -2.9±1.1 | -1.6±0.1 | 0.4±0.2  | -0.8±0.3 | -0.2±0.5
Low SF            | -1.4±0.5 | -1.0±0.2 | -1.5±0.5 | -0.6±0.6 | -1.1±0.2 | -1.2±0.4 | -0.5±0.7
High SF           | 6.8±1.3  | 7.3±1.1  | 7.0±1.3  | 5.6±0.3  | 11.5±0.3 | 3.7±0.0  | 6.1±0.1
Complex           | 1.0±1.6  | 1.2±1.3  | 0.7±1.8  | 1.2±0.2  | -1.5±0.1 | 0.2±0.8  | 0.4±1.1
Simple            | -6.1±1.6 | -5.4±1.5 | -6.4±1.9 | -4.1±0.4 | -4.2±0.4 | -4.9±0.5 | -5.4±0.3
V1 stoch          | 2.2±2.5  | 2.2±2.5  | 2.1±2.9  | 3.3±0.1  | -5.5±0.1 | 0.9±0.7  | 0.9±0.1

                  | Weather                                   | Digital
Component removed | Snow     | Frost    | Fog      | Bright   | Contrast | Elastic  | Pixelate | JPEG
V1 dist           | -0.2±0.6 | -0.9±0.6 | -1.9±1.1 | -0.8±0.5 | -1.4±0.9 | 0.3±0.4  | -1.7±0.3 | 0.8±0.5
Low SF            | -0.4±0.3 | -0.6±0.4 | 0.9±0.3  | -0.2±0.2 | -0.4±0.5 | -0.9±0.5 | -1.3±0.2 | -0.5±0.4
High SF           | 1.4±1.0  | -0.7±0.7 | -9.5±0.4 | -5.2±0.4 | -4.4±0.5 | 2.4±0.2  | 1.9±0.4  | -4.6±0.4
Complex           | 0.7±0.4  | 0.4±0.3  | 4.3±0.2  | 1.3±0.2  | 2.9±0.2  | -0.5±0.2 | -1.2±0.0 | -1.9±0.5
Simple            | -5.0±0.5 | -9.0±0.5 | -4.4±0.5 | -3.9±0.2 | -4.8±0.4 | -3.9±0.5 | -3.5±0.4 | -1.8±0.2
V1 stoch          | 2.5±0.4  | 0.3±0.3  | 12.0±0.5 | 2.6±0.3  | 10.0±0.8 | -2.8±0.3 | -4.0±0.7 | -4.0±0.4


Table D2: Difference between the white box attack accuracies of the several VOneResNet50 variants and the unmodified VOneResNet50. Removal of V1 stochasticity considerably reduced accuracy for all white box attacks. Removal of V1 distributions improved accuracy for the L∞ white box attack with high perturbation strength. All values are accuracy differences Δ (%).

                  | PGD-L1                | PGD-L2                | PGD-L∞
Component removed | 40        | 160       | 0.15      | 0.6       | 1/1020    | 1/255
V1 dist           | -0.5±0.8  | 0.1±1.1   | -0.8±0.2  | 0.6±1.0   | 0.3±0.3   | 3.0±1.8
Low SF            | -0.8±1.2  | -3.1±1.1  | -0.6±1.0  | -2.3±0.8  | -0.8±1.3  | -1.8±1.0
High SF           | -2.2±0.9  | -4.7±2.6  | -2.6±1.5  | -5.1±1.2  | -2.6±1.4  | -5.5±1.1
Complex           | -1.3±0.9  | -7.4±1.6  | -1.1±0.9  | -7.2±1.2  | -1.9±1.0  | -7.9±1.2
Simple            | -2.5±0.8  | -1.4±1.4  | -2.9±1.1  | -1.7±1.2  | -2.7±1.2  | -0.6±0.8
V1 stoch          | -23.7±0.9 | -36.5±0.8 | -21.6±0.5 | -40.1±0.4 | -32.2±0.4 | -29.2±0.4

30

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

  • Introduction
    • Related Work
      • Adversarial Robustness Correlates with V1 Explained Variance
      • VOneNet a Hybrid CNN with a V1 Neural Network Front-End
      • Results
        • Simulating a V1 at the front of CNNs improves robustness to white box attacks
        • VOneNets outperform state-of-the-art methods on a composite set of perturbations
        • All components of the VOneBlock contribute to improved model robustness
          • Discussion
          • Neural Explained Variance
          • Image Perturbations
            • White Box Adversarial Attacks
            • Additional attack controls
            • Common Image Corruptions
            • Other defense methods
              • VOneNet implementation details
                • Convolutional layer
                • Nonlinear layer
                • Neuronal stochasticity generator
                • Hyper-parameters and other design choices
                • Training details
                  • VOneBlock variant details
Page 17: bioRxiv - Simulating a Primary Visual Cortex at the Front of CNNs … · 2020. 6. 16. · 8MIT-IBM Watson AI Lab dapello@mit.edu tmarques@mit.edu Abstract Current state-of-the-art

B Image Perturbations

B1 White Box Adversarial Attacks

For performing white box adversarial attacks we used untargeted projected gradient descent (PGD)[17] (also referred to as the Basic Iterative Method [95]) with Linfin L2 and L1 norm constraintsGiven an image x This method uses the gradient of the loss to iteratively construct an adversarialimage xadv which maximizes the model loss within an Lp bound around x Formally in the case ofan Linfin constraint PGD iteratively computes xadv as

xt+1 = Projx+S(xt + αsgn(nablaxtL(θ xt y)))

where x is the original image and the Proj operator ensures the final computed adversarial imagexadv is constrained to the space x+S here the Linfin ball around x In the case of L1 or L2 constraintsat each iteration nablaxtL(θ xt y) is scaled to have an L1 or L2 norm of α and the Proj operatorensures the final xadv is within an L1 or L2 norm ball around x

For our white box benchmarks we used δinfin isin [11020 1255] δ2 isin [015 06] and δ1 isin[40 160] constraints where δ = xminusxadv for a total of six different attack variants with two strengthsunder each norm constraint We arrived at these strengths by calibrating the lower strength attackfor each norm to give a reasonable degree of variance in normally trained CNNs Because normallytrained CNNs are extremely sensitive to adversarial perturbations this resulted in perturbations wellbelow the level of perceptibility and in most cases particularly with the Linfin constraint perturbationsthat would not even be rendered differently by a monitor for display We set the high attack strength at4 times the low value which brought standard ImageNet trained CNNs to nearly chance performanceStill even this higher perturbation strength remained practically imperceptible (Fig B3 left) Eachperturbation was computed with 64 PGD iterations and a step size α = ε32 where ε = δp for thesame 5000 images from the ImageNet validation set on a per model basis and final top-1 accuracywas reported The Adversarial Robustness Toolkit [96] was used for computing the attacks

While adversarial formulations using Lagrangian relaxation like the CampW attack [7] and the EADattack [8] are known to be more effective than PGD for finding minimal perturbations under L2 andL1 constraints these formulations involve computationally expensive line searches often requiringthousands of iterations to converge making them computationally difficult to scale to 5000 ImageNetsamples for a large number of models Furthermore because some of the models we tested operatewith stochasticity during inference we found that other state-of-the-art attacks formulated to effi-ciently find minimal perturbation distances [9 10] were generally difficult to tune as their successbecomes stochastic as they near the decision boundary Thus we proceeded with the PGD formulationas we found it to be the most reliable computationally tractable and conceptually simple to followWe share our model and weights and encourage anyone interested to attack our model to verify ourresults

Although the stochastic models were implemented with the reparameterization trick for randomsampling which leaves the gradients intact [83] we performed several sanity checks to make surethat the white box attacks were effectively implemented Following the advice of Athalye et al [68]and Carlini et al [84] we observed that in all of the white box attacks increasing the attack strengthdecreased model performance eventually bringing accuracy to zero (Fig B1) and additionally thatfor any given constraint and attack strength increasing PGD iterations from one to many increasedattack effectiveness (Fig B2) Furthermore to be extra cautious we followed the method Athalye etal [68] used to break Stochastic Activation Pruning [19] and replacednablaxf(x) with

sumki=1nablaxf(x)

effectively performing Monte Carlo (MC) sampling of the loss gradient at every PGD iteration Whilewith high perturbation strength even k = 1 was sufficient to bring our modelrsquos accuracy to zerowe find that k = 10 generally achieved higher attack success at intermediate and large perturbationstrengths (Table B1) and thus we used this approach for all attacks on stochastic models

17

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

11020 1255 4255Attack strength

000

025

050

075

Whi

te b

ox a

ccur

acy

015 060 240Attack strength

PGD-L2

40 160 640Attack strength

PGD-L1PGD-Linfin ResNet50VOneResNet50

ANT3times3+SINATLinfin

Figure B1 VOneResNet50 improves robustness to white box attacks in a wide range of pertur-bation strengths White box accuracy perturbation strength curves for PGD attacks with constraintsLinfin L2 and L1 for ResNet50 VOneResNet50 ATLinfin and ANT3times3+SIN Adding the VOneBlock toResNet50 consistently improves robustness under all constraints and attack strengthmdashVOneResNet50can withstand perturbations roughly 4 times higher than the standard ResNet50 for the same perfor-mance level Remarkably VOneResNet50 outperformed the adversarially trained model (ATLinfin)for a considerable range of perturbation strengths particularly with the L1 constraint Adversarialnoise training combined with Stylized ImageNet (ANT3times3+SIN) has virtually no improvement inrobustness to white box attacks In all white box attacks and for all models increasing the attackstrength decreased performance eventually bringing accuracy to zero

1 4 16 64 PGD iterations

000

025

050

075

Whi

te b

ox a

ccur

acy

1 4 16 64 PGD iterations

1 4 16 64 PGD iterations

PGD-L2 PGD-L1PGD-Linfin ResNet50VOneResNet50

ANT3times3+SINATLinfin

Figure B2 White box attack effectiveness converges for large number of PGD iteration stepsWhite box accuracy iteration curves for PGD attacks with δinfin = 1255 δ2 = 06 δ1 = 160constraints Increasing the number of PGD iteration steps makes the attack more effective generallyconverging after 32 iterations

Table B1 Improving the effectiveness of white box attacks with Monte Carlo sampling Whitebox attack accuracy (top-1) under different norm constraints and white box overall mean usingstandard PGD and PGD with Monte Carlo sampling of the loss gradients (PGD-MC k = 10) onVOneResNet50 for 64 PGD iteration steps For all stochastic models we used PGD with Monte Carlosampling in the white box attacks throughout the paper Values are mean and SD (n=3 seeds)

White box PGD-L1 PGD-L2 PGD-LinfinMean 40 160 015 060 11020 1255

Attack type [] [] [] [] [] [] []

Standard PGD 528plusmn02 644plusmn04 422plusmn01 652plusmn03 453plusmn04 634plusmn04 364plusmn04PGD-MC (k=10) 511plusmn04 642plusmn05 404plusmn05 653plusmn04 443plusmn03 619plusmn04 303plusmn05

18

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

Although VOneResNet50 was not designed to explicitly improve robustness to white box attacksit offers substantial gains over the standard ResNet50 model (Table 1) A closer inspection of thewhite box accuracy perturbation strength curves (Fig B1) reveals that VOneResNet50 can withstandperturbations roughly 4 times higher than the standard ResNet50 for the same performance level (bluecurve is shifted to the right of the black curve by a factor of 4) To help visualize how this differencetranslates into actual images we show in Fig B3 examples of white box adversarial images for bothResNet50 and VOneResNe50 for the perturbation strengths that bring the respective models to nearlychance performance

[Figure B3: adversarial images built from an original image labeled Egyptian Cat. ResNet50 (left): ||δ||∞ = 1/255, ||δ||2 = 0.60, and ||δ||1 = 160 all yield the class Sleeping Bag. VOneResNet50 (right): ||δ||∞ = 4/255 yields Barrel, ||δ||2 = 2.40 yields Norwegian Elkhound, and ||δ||1 = 640 yields Schipperke.]

Figure B3: Comparison of adversarial images for ResNet50 and VOneResNet50. Visualization of white box attacks under the three norm constraints (L∞, L2, and L1), at the perturbation strength that brings each model to nearly chance performance (Fig. B1). Left: ResNet50; right: VOneResNet50. For each attack, the perturbation is shown on the left (scaled for visualization) and the perturbed image appears next to it. Perturbation strengths and the classes assigned to the adversarial images are shown over the images.


B2 Additional attack controls

The field of adversarial robustness is littered with examples of proposed defense methods that were quickly defeated with some minor attack optimizations or adaptations [68, 84, 97]. For this reason, we took great care to verify our claims of improved robustness of the VOneNets. In addition to the implementation details and controls described in the previous section, we validated our results by further optimizing our attacks on both the standard stochastic VOneResNet50 and the VOneResNet50 with stochasticity removed during the attack and inference.

First, on the standard stochastic VOneResNet50, we ensured that our attack hyper-parameters, including step size (α), number of gradient samples (k), and PGD iterations, are sufficient for evaluating the network's robustness. For PGD with ||δ||∞ = 1/255 and 64 iterations, we performed a large grid search over α and k, observing only a marginal increase in attack effectiveness (Fig. B4). For the most effective step size (α = ε/8), increasing the number of gradient samples beyond k = 16 no longer decreases accuracy under the attack. At k = 128, VOneResNet50 accuracy is only reduced from 29.15% to 26.0%, remaining a large margin above the ResNet50 accuracy of 0.8%. Additionally, increasing the number of PGD iterations all the way up to 1,000 (with α = ε/8 and k = 16) only further reduced performance by 0.5%. Finally, using 5-10 random restarts did not improve the attack beyond the most effective k and α settings.
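To make the gradient-based attack well defined on a stochastic model, the loss gradient is averaged over k Monte Carlo samples of the model's noise before each PGD step. The sketch below is a minimal reconstruction of this PGD-MC procedure for the L∞ constraint, using our own function and variable names; it is not the exact attack code used for the paper.

    import torch

    def pgd_mc_linf(model, x, y, eps, alpha, steps, k):
        """PGD under an L-inf constraint, averaging loss gradients over k
        Monte Carlo samples of the model's stochasticity (a sketch)."""
        loss_fn = torch.nn.CrossEntropyLoss()
        delta = torch.zeros_like(x, requires_grad=True)
        for _ in range(steps):
            grad = torch.zeros_like(x)
            for _ in range(k):  # each forward pass draws fresh V1 noise
                loss = loss_fn(model(x + delta), y)
                grad += torch.autograd.grad(loss, delta)[0]
            with torch.no_grad():
                delta += alpha * grad.sign()                # ascend the averaged gradient
                delta.clamp_(-eps, eps)                     # project back to the L-inf ball
                delta.copy_((x + delta).clamp(0, 1) - x)    # keep pixel values valid
        return (x + delta).detach()

For example, the most effective setting in the grid search above corresponds to pgd_mc_linf(model, x, y, eps=1/255, alpha=(1/255)/8, steps=64, k=16).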

[Figure B4: white box accuracy (0.1-0.4) for ||δ||∞ = 1/255 as a function of step size (ε down to ε/32) and number of gradient samples (1-128), with the reported VOneResNet50 and ResNet50 accuracies marked as reference lines; n = 3 seeds.]

Figure B4: Further attack optimization does not overturn the results. White box accuracy for PGD with ||δ||∞ = 1/255 and 64 iterations, on 500 images. A large grid search over step size and number of gradient samples only marginally improves attack effectiveness, and thus only marginally reduces model performance. Lines represent means and shaded areas SD (n = 3 seeds).

Second, since boundary attacks are not ideal against the standard stochastic VOneResNet50 (its decision boundary shifts with the noise), we investigated VOneResNet50 with stochasticity removed during attack and inference, using the Brendel and Bethge (B&B) attack [10], a powerful white box adversary that aims to find minimal adversarial perturbations along the decision boundary of the network. We reasoned that if the B&B attack gives similar results to those observed using PGD, this is a good sanity check that the observed robustness is reliable and not a result of masked gradients around the input or a quirk of PGD. Using the Foolbox [98] implementation of the L∞ B&B attack, initialized with 500 successful adversarial images generated using PGD with ||δ||∞ = 8/255, we see nearly identical scores compared to standard PGD with optimal hyper-parameters at ||δ||∞ = 1/1020 and ||δ||∞ = 1/255.
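For reference, this control can be reproduced along the following lines with Foolbox 3; this is a sketch assuming its current interface, where deterministic_model stands for VOneResNet50 with the stochasticity generator disabled and pgd_advs for the PGD-generated starting points.

    import foolbox as fb

    # Wrap the noise-free model for Foolbox (inputs assumed in [0, 1]).
    fmodel = fb.PyTorchModel(deterministic_model.eval(), bounds=(0, 1))

    # L-inf Brendel & Bethge attack, started from successful PGD adversarials.
    attack = fb.attacks.LinfinityBrendelBethgeAttack(steps=1000)
    raw, clipped, success = attack(
        fmodel, images, labels,
        epsilons=1 / 1020,          # evaluate at one of the budgets used above
        starting_points=pgd_advs,   # PGD adversarials, eps = 8/255 (assumed to be
    )                               # forwarded to the attack's run method)
    print("accuracy under attack:", 1.0 - success.float().mean().item())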

We conclude that these additional controls do not qualitatively change our reported results, and they provide further evidence that the observed improved robustness of the VOneNets is real and not an artifact.


B3 Common Image Corruptions

For evaluating model robustness to common image corruptions, we used ImageNet-C, a dataset publicly available at https://github.com/hendrycks/robustness [13]. It consists of 15 different types of corruptions, each at five levels of severity, for a total of 75 different perturbations (Fig. B5). The corruption types fall into four categories: noise, blur, weather, and digital effects. Noise includes Gaussian noise, shot noise, and impulse noise; blur includes motion blur, defocus blur, zoom blur, and glass blur; weather includes snow, frost, fog, and brightness; digital effects include JPEG compression, elastic transformations, contrast, and pixelation. For every corruption type and at every severity, we tested on all 50,000 images from the ImageNet validation set. We note that these images were JPEG compressed by the original authors for sharing, which is known to have a minor but noticeable effect on final network performance.
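A minimal evaluation loop over ImageNet-C can be written as follows. This is a sketch under two assumptions: the dataset is stored in its usual layout of one ImageFolder-style directory per corruption type and severity, and images are normalized with the same statistics used during training.

    import torch
    from torchvision import datasets, transforms

    CORRUPTIONS = [
        "gaussian_noise", "shot_noise", "impulse_noise",                   # noise
        "defocus_blur", "glass_blur", "motion_blur", "zoom_blur",          # blur
        "snow", "frost", "fog", "brightness",                              # weather
        "contrast", "elastic_transform", "pixelate", "jpeg_compression",   # digital
    ]

    # ImageNet-C images are already 224x224, so only tensor conversion and
    # normalization are needed (the stats below match the training details).
    tfm = transforms.Compose([transforms.ToTensor(),
                              transforms.Normalize([0.5] * 3, [0.5] * 3)])

    @torch.no_grad()
    def corruption_accuracy(model, root, device="cuda"):
        model.eval()
        results = {}
        for corruption in CORRUPTIONS:
            for severity in range(1, 6):
                ds = datasets.ImageFolder(f"{root}/{corruption}/{severity}", tfm)
                loader = torch.utils.data.DataLoader(ds, batch_size=256, num_workers=8)
                correct = total = 0
                for x, y in loader:
                    pred = model(x.to(device)).argmax(dim=1)
                    correct += (pred == y.to(device)).sum().item()
                    total += y.numel()
                results[(corruption, severity)] = correct / total
        return results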

[Figure B5: 4×4 grid of example images: original, Gaussian noise, shot noise, impulse noise; defocus blur, glass blur, motion blur, zoom blur; snow, frost, fog, brightness; contrast, elastic transform, pixelate, JPEG compression.]

Figure B5: Common image corruptions at intermediate severity. Visualization of all 15 common image corruption types, evaluated at severity 3. First row: original image followed by the noise corruptions; second row: blur corruptions; third row: weather corruptions; fourth row: digital corruptions.


[Figure B6: four panels (noise, blur, weather, digital) plotting corruption accuracy (0.00-0.75) against severity (1-5) for ResNet50, VOneResNet50, AT L∞, and ANT3×3+SIN.]

Figure B6: VOneResNet50 improves robustness to noise, blur, and digital corruptions. Corruption accuracy vs. severity curves for the four categories of common image corruptions: noise, blur, weather, and digital. Adding the VOneBlock to ResNet50 improves robustness to all corruption categories except weather. The adversarially trained model (AT L∞) is considerably worse on common image corruptions. Adversarial noise combined with Stylized ImageNet training (ANT3×3+SIN) is consistently more robust across all corruption categories. For ResNet50 and VOneResNet50, error bars represent SD (n = 3 seeds).

Table B2: Detailed common image corruption results. Adding the VOneBlock to ResNet50 improves robustness to several of the corruption types. However, for others, particularly fog and contrast, robustness decreases; interestingly, these are the corruption types on which the adversarially trained model (AT L∞) also performs worst. Adversarial noise combined with Stylized ImageNet training (ANT3×3+SIN) is the most robust model for most of the corruption types. For ResNet50 and VOneResNet50, values are mean and SD (n = 3 seeds).

                   Noise                              Blur
Model              Gaussian   Shot       Impulse     Defocus    Glass      Motion     Zoom
                   [%]        [%]        [%]         [%]        [%]        [%]        [%]

ResNet50 [4]       29.6±1.1   27.9±0.9   24.5±1.2    38.8±0.7   26.1±0.6   37.9±0.4   34.5±0.7
AT L∞ [76]         23.6       22.3       14.8        24.4       34.3       33.2       34.9
ANT3×3+SIN [59]    61.6       60.7       59.9        41.5       38.6       46.0       37.3
VOneResNet50       34.6±1.4   33.4±1.2   31.9±1.5    37.8±0.1   35.7±0.1   37.4±0.2   34.0±0.2

                   Weather                                      Digital
Model              Snow       Frost      Fog        Bright      Contrast   Elastic    Pixelate   JPEG
                   [%]        [%]        [%]        [%]         [%]        [%]        [%]        [%]

ResNet50           30.1±0.5   36.7±0.4   43.6±0.8   66.8±0.2    38.8±0.5   44.8±0.3   47.0±2.7   55.1±1.0
AT L∞              29.5       31.2       8.4        54.1        10.2       48.4       56.3       58.7
ANT3×3+SIN         47.7       52.2       54.9       68.2        49.3       52.7       58.9       59.3
VOneResNet50       25.2±0.5   36.8±0.4   30.1±0.5   62.4±0.3    28.5±0.5   48.7±0.1   63.3±0.2   61.0±0.3


B4 Other defense methods

We compared the performance of VOneResNet50 on the described computer vision benchmarks with two training-based defense methods using the ResNet50 architecture: adversarial training with a ||δ||∞ = 4/255 constraint (AT L∞) [76], and adversarial noise with Stylized ImageNet training (ANT3×3+SIN) [59].

The AT L∞ model was downloaded from the publicly available adversarially trained models at https://github.com/MadryLab/robustness [76]. This model was trained following the adversarial training paradigm of Madry et al. [17] to solve the min-max problem

min_θ ρ(θ),  where  ρ(θ) = E_{(x,y)∼D} [ max_{δ∈S} L(θ, x + δ, y) ]    (1)

Here the goal is to learn parameters θ minimizing the loss L for training images x and labels y drawn from D, while x is perturbed with δ ∈ S to maximally increase the loss. In this case, D is ImageNet, and PGD with ||δ||∞ = 4/255 is used to find the perturbation maximizing the loss for a given (x, y) pair. The ||δ||∞ = 4/255 model was selected for comparison because it had the best performance on our conglomerate benchmark of adversarial attacks and common image corruptions.
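Concretely, Equation 1 is optimized by alternating the inner maximization (a short PGD attack on the current network) with an outer SGD step on the resulting adversarial images. The following is a schematic training step, not the Madry lab code; the PGD settings (alpha, steps) are illustrative assumptions.

    import torch

    def adversarial_training_step(model, optimizer, x, y, eps=4/255, alpha=1/255, steps=7):
        """One step of Madry-style adversarial training (a sketch):
        inner PGD maximization followed by an outer SGD minimization."""
        loss_fn = torch.nn.CrossEntropyLoss()
        # Inner maximization: find delta in the L-inf ball that maximizes the loss.
        delta = (torch.rand_like(x) * 2 - 1) * eps   # random start inside the ball
        for _ in range(steps):
            delta.requires_grad_(True)
            loss = loss_fn(model(x + delta), y)
            grad = torch.autograd.grad(loss, delta)[0]
            delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach()
        # Outer minimization: update theta on the adversarial batch.
        optimizer.zero_grad()
        loss_fn(model(x + delta), y).backward()
        optimizer.step()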

The ANT3×3+SIN model was obtained from https://github.com/bethgelab/game-of-noise. Recently, Rusak et al. showed that training a ResNet50 model with several types of additive input noise improves robustness to common image corruptions [59]. Inspired by this observation, the authors then trained a model while optimizing a noise distribution that maximally confuses it. Similarly to standard adversarial training, this results in solving a min-max problem

min_θ max_φ E_{(x,y)∼D} E_{δ∼p_φ(δ)} [ L(θ, x + δ, y) ]    (2)

where p_φ(δ) is the maximally confusing noise distribution. The main difference from regular adversarial training in Equation 1 is that, while in the former δ is optimized directly, here δ is found by optimizing a constrained distribution with local spatial correlations. Complementing the adversarial noise training with Stylized ImageNet training [25] produced the model with the current best robustness on the ImageNet-C benchmark as a standalone method.


C VOneNet implementation details

C1 Convolutional layer

The convolutional layer of the VOneBlock is a mathematically parameterized Gabor filter bank (GFB). We set the stride of the GFB to four, yielding a 56×56 spatial map of activations. Since the number of channels in most CNNs' first convolution is relatively small (64 in the architectures adapted), we used a larger number in the VOneBlock so that the Gabors would cover the large parameter space and better approximate primate V1. We set the main VOneNet models to contain 512 channels, equally split between simple and complex cells (see Fig. C2 A for an analysis of how the number of channels affects performance). Each channel in the GFB convolves a single color channel of the input image.

The Gabor function consists of a two-dimensional grating with a Gaussian envelope and is described by the following equation:

G_{θ,f,φ,nx,ny}(x, y) = (1 / (2π σx σy)) · exp[ −0.5 ( x_rot²/σx² + y_rot²/σy² ) ] · cos(2π f x_rot + φ)    (3)

where

x_rot = x cos(θ) + y sin(θ),
y_rot = −x sin(θ) + y cos(θ)    (4)

σx = nx / f,    σy = ny / f    (5)

Here, x_rot and y_rot are the coordinates orthogonal and parallel to the grating, θ is the angle of the grating orientation, f is the spatial frequency of the grating, φ is the phase of the grating relative to the Gaussian envelope, and σx and σy are the standard deviations of the Gaussian envelope orthogonal and parallel to the grating, which can be defined as multiples (nx and ny) of the grating cycle (the inverse of the frequency).
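Equations 3-5 translate directly into code. The sketch below builds a single Gabor kernel on a pixel grid; it is a minimal reconstruction under our own naming, where ppd converts pixels to degrees given the model's 8 deg field of view over 224 px inputs.

    import numpy as np

    def gabor_kernel(f, theta, phi, nx, ny, ksize=25, ppd=224 / 8):
        """Gabor filter of Eqs. 3-5. f in cycles/deg, theta and phi in radians,
        nx/ny in grating cycles; ppd = pixels per degree."""
        sigma_x, sigma_y = nx / f, ny / f                      # Eq. 5, in degrees
        half = (ksize - 1) / 2
        coords = (np.arange(ksize) - half) / ppd               # pixel grid in degrees
        x, y = np.meshgrid(coords, coords)
        x_rot = x * np.cos(theta) + y * np.sin(theta)          # Eq. 4
        y_rot = -x * np.sin(theta) + y * np.cos(theta)
        envelope = np.exp(-0.5 * ((x_rot / sigma_x) ** 2 + (y_rot / sigma_y) ** 2))
        grating = np.cos(2 * np.pi * f * x_rot + phi)
        return envelope * grating / (2 * np.pi * sigma_x * sigma_y)  # Eq. 3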

Although the Gabor filter greatly reduces the number of parameters defining the linear spatial component of V1 neurons, it still has five parameters per channel. Fortunately, there is a vast neurophysiology literature with detailed characterizations of primate V1 response properties, which can be used to constrain these parameters. To instantiate a VOneBlock with C_V1 channels, we sampled C_V1 values for each of the parameters according to an empirically constrained distribution (Table C1). Due to the resolution of the input images, we limited the range of spatial frequencies (f < 5.6 cpd) and the number of cycles (n > 0.1).

Table C1: Empirical distributions used to sample the GFB parameters in the VOneBlock.

Parameter   Reference
θ           De Valois et al. [78], Fig. 5: orientation tuning (foveal)
f           De Valois et al. [79], Fig. 6: peak spatial freq. (X cells, foveal)
φ           Uniform distribution ∈ [0, 360] deg
nx, ny      Ringach 2002 [80], Fig. 4: (nx, ny) joint distribution

Critical to the correct implementation of the biologically constrained parameters of the GFB is the choice of the model's field of view in degrees. As mentioned in Section A, we used 8 deg as the input spatial extent for all CNN models. It is important to note that the set of spatial filters in the VOneBlock's GFB differs considerably from those in the first convolution of most CNNs (Fig. C1). While standard CNNs learn filters that resemble Gabors in their input layer [1, 99, 100], due to the limited span of their kernels these filters do not vary significantly in size and spatial frequency. V1 neurons, on the other hand, are known to exhibit a wide range of receptive field properties. This phenomenon is captured in the VOneBlock, with the spatial frequencies and receptive field sizes of the Gabors ranging over more than one order of magnitude. Due to this high variability, we set the convolution kernel to 25×25 px, which is considerably larger than those in standard CNNs (Fig. C1).


[Figure C1: example first-convolution filters from the VOneBlock, ResNet50, CORnet-S, and AlexNet, shown with kernel-size scale bars.]

Figure C1: Comparison between filters in the first convolution of standard CNNs and the VOneBlock. Example filters from the first convolution of the VOneBlock, ResNet50, CORnet-S, and AlexNet (from left to right). VOneBlock filters are all parameterized as Gabors, varying considerably in size and spatial frequency. Standard CNNs have some filters with shapes other than Gabors, particularly center-surround, and are limited in size by their small kernels. Kernel sizes are 25 px, 7 px, 7 px, and 11 px for the VOneBlock, ResNet50, CORnet-S, and AlexNet, respectively.


C2 Nonlinear layer

The VOneBlock's nonlinear layer applies one of two different nonlinearities to each channel, depending on its cell type: a rectified linear transformation for simple cells (6), and the spectral power of a quadrature phase pair (7) for complex cells:

S^nl_{θ,f,φ,nx,ny} = S^l_{θ,f,φ,nx,ny} if S^l_{θ,f,φ,nx,ny} > 0, and 0 otherwise    (6)

C^nl_{θ,f,φ,nx,ny} = (1/√2) · √( (C^l_{θ,f,φ,nx,ny})² + (C^l_{θ,f,φ+π/2,nx,ny})² )    (7)

where S^l and S^nl are the linear and nonlinear responses of a simple neuron, and C^l and C^nl are the same for a complex neuron.
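In code, the two nonlinearities amount to a ReLU and a quadrature-pair energy computation. The sketch below assumes the linear GFB responses arrive as three tensors: one for the simple-cell channels and two for the complex-cell channels at phases φ and φ + π/2 (a layout we assume for illustration).

    import torch

    def vone_nonlinearities(simple_lin, complex_q0, complex_q1):
        """simple_lin: linear responses of simple-cell channels (Eq. 6).
        complex_q0/q1: quadrature-pair linear responses at phases phi and
        phi + pi/2 for the complex-cell channels (Eq. 7)."""
        simple = torch.relu(simple_lin)                                        # Eq. 6
        complex_ = torch.sqrt(complex_q0 ** 2 + complex_q1 ** 2) / (2 ** 0.5)  # Eq. 7
        return torch.cat([simple, complex_], dim=1)                            # channel dim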

C3 Neuronal stochasticity generator

In awake monkeys, spike trains of V1 neurons are approximately Poisson, i.e. the variance and the mean of the spike counts in a given time-window, taken over a set of repetitions, are roughly the same [82]. We incorporated stochasticity into the VOneBlock to emulate this property of neuronal responses. Since the Poisson distribution is not continuous, it breaks the gradients in a white box attack, giving a false sense of robustness [68]. To avoid this situation and facilitate the evaluation of the model's real robustness, our neuronal stochasticity generator, as implemented, uses a continuous second-order approximation of Poisson noise, adding Gaussian noise with variance equal to the activation:

R_s ∼ N(μ = R_ns, σ² = R_ns)    (8)

where R_ns and R_s are the non-stochastic and stochastic responses of a neuron.

To approximate the levels of neuronal stochasticity of primate V1 neurons, it is critical that the VOneBlock activations are in the same range as V1 neuronal responses (spike counts in a given time-window). Thus, we applied an affine transformation to the activations so that both the mean stimulus response and the mean baseline activity match those of a population of V1 neurons (Table C2 shows the mean responses and spontaneous activity of V1 neurons measured over different time-windows).

Table C2: Mean stimulus responses and spontaneous activity of a population of V1 neurons for different time-windows.

Time-window   Stimulus response   Spontaneous activity
[ms]          [spikes]            [spikes]

25            0.324               0.073
50            0.655               0.145
100           1.05                0.29
200           1.38                0.58

Since the exact time-window over which V1 responses are integrated during visual perception is still an open question, we considered a time-window of 25 ms (Fig. C2 B). To keep the outputs of the VOneBlock in a range that does not deviate considerably from the typical range of activations in CNNs, so that the model can be trained efficiently with standard training parameters, we applied the inverse transformation after the stochastic layer to scale the outputs back to their original range.

C4 Hyper-parameters and other design choices

We developed VOneNets from three different standard CNN architectures: ResNet50, CORnet-S, and AlexNet. As previously mentioned, we replaced the first block of each architecture with the VOneBlock and the transition layer to create the respective VOneNet. Table C3 lists the layers that were removed from each base architecture. Except for CORnet-S, the removed layers contained only a single


convolution, nonlinearity, and maxpool. Since CORnet-S already had a set of layers pre-committed to V1, we replaced those with the VOneBlock. The torchvision implementation of AlexNet contains a combined stride of eight in the removed layers (four in the convolution and two in the maxpool), followed by a stride of one in the second convolution (outside the removed layers). To more easily adapt it to a VOneBlock with a stride of four, we slightly modified AlexNet's architecture so that it had strides of two in these three layers (first convolution, first maxpool, and second convolution). The results shown in Fig. 2 were obtained using this modified architecture.

Table C3: Layers removed from each base CNN architecture when creating the VOneNets.

Architecture   Layers removed
AlexNet        Conv, ReLU, MaxPool
ResNet50       Conv, Batch-Norm, ReLU, MaxPool
CORnet-S       Conv, Batch-Norm, ReLU, MaxPool, Conv, Batch-Norm, ReLU

The VOneNet architecture was designed to have as few hyper-parameters as possible. Where possible, these were constrained either by the base CNN architecture or by neurobiological data: most parameters of the GFB were instantiated by sampling from neuronal distributions of primate V1 neurons; the kernel size was set to 25×25 px to capture the high variability in Gabor sizes; and, like all other CNN models, the spatial extent of the field of view was set to 8 deg (Section A). Nevertheless, there were two hyper-parameters where the choice was rather arbitrary: the number of channels in the VOneBlock, and the time-window for integrating V1 responses used to scale the activations prior to the stochasticity generator.

Since primate V1 has neurons tuned to a wide range of spatial frequencies for each orientation and at any given location of visual space [101], we expected that a large number of channels would be required to cover all combinations of Gabor parameters. For the VOneResNet50 architecture, we varied the number of channels in the VOneBlock between 64 and 1024, equally split between simple and complex cells, and measured the performance of the several models after training (Fig. C2 A). Clean ImageNet and common image corruption performance improved with channel number. Remarkably, the opposite happened for white box accuracy, with the variants of VOneResNet50 with 64 and 128 channels achieving the highest robustness to white box attacks. This result is particularly interesting since it had been shown that increasing the number of channels in all layers of a CNN improves robustness to white box attacks in both standard and adversarially trained models [17]. It therefore shows that the improvement in white box robustness of VOneNets relative to their base models cannot be attributed to the higher number of channels. For the main model, we set the channel number to 512, as it offered a good compromise between the different benchmarks. Regarding the time-window for integrating V1 responses, we observed small effects on the models' performance when varying it between 25 and 200 ms. We chose 25 ms for the main model due to a small trend towards higher robustness to white box attacks with shorter time-windows.

[Figure C2: panel A, accuracy (0.4-0.7) on clean images, all perturbations, white box attacks, and common corruptions as a function of the number of VOneBlock channels (64-1024), with a 25 ms time-window; panel B, the same as a function of the activity time-window (25-200 ms), with 512 channels.]

Figure C2: Exploration of the VOneBlock's hyper-parameters in the VOneResNet50 architecture. A: Performance of different VOneResNet50 models with a varying number of channels in the VOneBlock. B: Same as in A, but varying the time-window for integrating neuronal activity. Error bars show SD (n = 3 seeds). Red arrows mark the values used in the main model.


C5 Training details

We used PyTorch 0.4.1 and trained the models on ImageNet 2012 [34]. Images were preprocessed (1) for training: random crop to 224×224 pixels and random horizontal flipping; (2) for validation: central crop to 224×224 pixels. Preprocessing was followed by normalization: subtraction and division by [0.5, 0.5, 0.5]. We used a batch size of 256 images and trained either on 2 GPUs (NVIDIA Titan X / GeForce 1080Ti) or 1 GPU (Quadro RTX 6000 or V100) for 70 epochs. We used step learning rate scheduling: a starting learning rate of 0.1, divided by 10 every 20 epochs. For optimization, we used Stochastic Gradient Descent with weight decay 0.0001, momentum 0.9, and a cross-entropy loss between image labels and model predictions (logits).
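In current PyTorch terms, this recipe corresponds to the following optimizer and schedule; this is a sketch of the stated hyper-parameters (model and train_loader are assumed to exist), not the original training script, which is available at the repository below.

    import torch

    optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                                momentum=0.9, weight_decay=1e-4)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)
    criterion = torch.nn.CrossEntropyLoss()

    for epoch in range(70):
        for x, y in train_loader:          # batches of 256 preprocessed images
            optimizer.zero_grad()
            criterion(model(x.cuda()), y.cuda()).backward()
            optimizer.step()
        scheduler.step()                   # lr / 10 every 20 epochs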

Model weights and code are available at https://github.com/dicarlolab/vonenet.


D VOneBlock variant details

To investigate which components of the VOneBlock are responsible for the increased robustness, we created six variants of the VOneBlock, each removing or modifying one of its parts. In all cases, we trained from scratch a corresponding variant of VOneResNet50 with the modified VOneBlock; all other properties of the model and components of the VOneBlock were left unmodified. Here we describe each of these variants in more detail (the name of each variant refers to the component removed):

• V1 distributions: the GFB parameters were sampled from uniform distributions with the same domain as the empirical distributions used in the default VOneBlock (Table C1).
• Low SF: when sampling the spatial frequencies for the GFB, neurons with low peak spatial frequencies (f < 2 cpd) were removed from the empirical distribution [79].
• High SF: similar to the previous variant, but removing neurons with high peak spatial frequencies (f > 2 cpd).
• Complex NL: complex cell channels were removed from the VOneBlock and replaced by simple cell channels.
• Simple NL: simple cell channels were removed from the VOneBlock and replaced by complex cell channels.
• V1 stochasticity: the V1 stochasticity generator was removed from the VOneBlock.

While all of the variants resulted in a worse overall mean performance than the unmodified VOneResNet50 (Table 2), some improved specific benchmarks: the variant without high SF had higher accuracy under noise and blur corruptions (Table D1); the variant without V1 stochasticity had higher accuracy on clean images and on images with noise and weather corruptions (Tables 2 and D1); and the variant without V1 distributions had higher accuracy under white box attacks with a high L∞ perturbation (Table D2).

Table D1: Difference between the common image corruption accuracies of the several VOneResNet50 variants and the unmodified VOneResNet50. Removing simple cells reduced accuracy for all common image corruption types. Removing some components of the VOneBlock improved accuracy on specific common image corruptions: removing high SF consistently improved performance for noise and blur corruptions, and removing V1 stochasticity improved performance for noise and weather corruptions.

                    Noise                              Blur
Component removed   Gaussian   Shot       Impulse     Defocus    Glass      Motion     Zoom
                    [Δ%]       [Δ%]       [Δ%]        [Δ%]       [Δ%]       [Δ%]       [Δ%]

V1 dist             -2.8±0.9   -2.4±0.7   -2.9±1.1    -1.6±0.1   0.4±0.2    -0.8±0.3   -0.2±0.5
Low SF              -1.4±0.5   -1.0±0.2   -1.5±0.5    -0.6±0.6   -1.1±0.2   -1.2±0.4   -0.5±0.7
High SF             6.8±1.3    7.3±1.1    7.0±1.3     5.6±0.3    11.5±0.3   3.7±0.0    6.1±0.1
Complex             1.0±1.6    1.2±1.3    0.7±1.8     1.2±0.2    -1.5±0.1   0.2±0.8    0.4±1.1
Simple              -6.1±1.6   -5.4±1.5   -6.4±1.9    -4.1±0.4   -4.2±0.4   -4.9±0.5   -5.4±0.3
V1 stoch            2.2±2.5    2.2±2.5    2.1±2.9     3.3±0.1    -5.5±0.1   0.9±0.7    0.9±0.1

                    Weather                                      Digital
Component removed   Snow       Frost      Fog        Bright      Contrast   Elastic    Pixelate   JPEG
                    [Δ%]       [Δ%]       [Δ%]       [Δ%]        [Δ%]       [Δ%]       [Δ%]       [Δ%]

V1 dist             -0.2±0.6   -0.9±0.6   -1.9±1.1   -0.8±0.5    -1.4±0.9   0.3±0.4    -1.7±0.3   0.8±0.5
Low SF              -0.4±0.3   -0.6±0.4   0.9±0.3    -0.2±0.2    -0.4±0.5   -0.9±0.5   -1.3±0.2   -0.5±0.4
High SF             1.4±1.0    -0.7±0.7   -9.5±0.4   -5.2±0.4    -4.4±0.5   2.4±0.2    1.9±0.4    -4.6±0.4
Complex             0.7±0.4    0.4±0.3    4.3±0.2    1.3±0.2     2.9±0.2    -0.5±0.2   -1.2±0.0   -1.9±0.5
Simple              -5.0±0.5   -9.0±0.5   -4.4±0.5   -3.9±0.2    -4.8±0.4   -3.9±0.5   -3.5±0.4   -1.8±0.2
V1 stoch            2.5±0.4    0.3±0.3    12.0±0.5   2.6±0.3     10.0±0.8   -2.8±0.3   -4.0±0.7   -4.0±0.4


Table D2: Difference between the white box attack accuracies of the several VOneResNet50 variants and the unmodified VOneResNet50. Removing V1 stochasticity considerably reduced accuracy under all white box attacks. Removing the V1 distributions improved accuracy for the L∞ white box attack with high perturbation strength.

                    PGD-L1                PGD-L2                PGD-L∞
Component removed   ε=40       ε=160      ε=0.15     ε=0.60     ε=1/1020   ε=1/255
                    [Δ%]       [Δ%]       [Δ%]       [Δ%]       [Δ%]       [Δ%]

V1 dist             -0.5±0.8   0.1±1.1    -0.8±0.2   0.6±1.0    0.3±0.3    3.0±1.8
Low SF              -0.8±1.2   -3.1±1.1   -0.6±1.0   -2.3±0.8   -0.8±1.3   -1.8±1.0
High SF             -2.2±0.9   -4.7±2.6   -2.6±1.5   -5.1±1.2   -2.6±1.4   -5.5±1.1
Complex             -1.3±0.9   -7.4±1.6   -1.1±0.9   -7.2±1.2   -1.9±1.0   -7.9±1.2
Simple              -2.5±0.8   -1.4±1.4   -2.9±1.1   -1.7±1.2   -2.7±1.2   -0.6±0.8
V1 stoch            -23.7±0.9  -36.5±0.8  -21.6±0.5  -40.1±0.4  -32.2±0.4  -29.2±0.4

30

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

  • Introduction
    • Related Work
      • Adversarial Robustness Correlates with V1 Explained Variance
      • VOneNet a Hybrid CNN with a V1 Neural Network Front-End
      • Results
        • Simulating a V1 at the front of CNNs improves robustness to white box attacks
        • VOneNets outperform state-of-the-art methods on a composite set of perturbations
        • All components of the VOneBlock contribute to improved model robustness
          • Discussion
          • Neural Explained Variance
          • Image Perturbations
            • White Box Adversarial Attacks
            • Additional attack controls
            • Common Image Corruptions
            • Other defense methods
              • VOneNet implementation details
                • Convolutional layer
                • Nonlinear layer
                • Neuronal stochasticity generator
                • Hyper-parameters and other design choices
                • Training details
                  • VOneBlock variant details
Page 18: bioRxiv - Simulating a Primary Visual Cortex at the Front of CNNs … · 2020. 6. 16. · 8MIT-IBM Watson AI Lab dapello@mit.edu tmarques@mit.edu Abstract Current state-of-the-art

11020 1255 4255Attack strength

000

025

050

075

Whi

te b

ox a

ccur

acy

015 060 240Attack strength

PGD-L2

40 160 640Attack strength

PGD-L1PGD-Linfin ResNet50VOneResNet50

ANT3times3+SINATLinfin

Figure B1 VOneResNet50 improves robustness to white box attacks in a wide range of pertur-bation strengths White box accuracy perturbation strength curves for PGD attacks with constraintsLinfin L2 and L1 for ResNet50 VOneResNet50 ATLinfin and ANT3times3+SIN Adding the VOneBlock toResNet50 consistently improves robustness under all constraints and attack strengthmdashVOneResNet50can withstand perturbations roughly 4 times higher than the standard ResNet50 for the same perfor-mance level Remarkably VOneResNet50 outperformed the adversarially trained model (ATLinfin)for a considerable range of perturbation strengths particularly with the L1 constraint Adversarialnoise training combined with Stylized ImageNet (ANT3times3+SIN) has virtually no improvement inrobustness to white box attacks In all white box attacks and for all models increasing the attackstrength decreased performance eventually bringing accuracy to zero

1 4 16 64 PGD iterations

000

025

050

075

Whi

te b

ox a

ccur

acy

1 4 16 64 PGD iterations

1 4 16 64 PGD iterations

PGD-L2 PGD-L1PGD-Linfin ResNet50VOneResNet50

ANT3times3+SINATLinfin

Figure B2 White box attack effectiveness converges for large number of PGD iteration stepsWhite box accuracy iteration curves for PGD attacks with δinfin = 1255 δ2 = 06 δ1 = 160constraints Increasing the number of PGD iteration steps makes the attack more effective generallyconverging after 32 iterations

Table B1 Improving the effectiveness of white box attacks with Monte Carlo sampling Whitebox attack accuracy (top-1) under different norm constraints and white box overall mean usingstandard PGD and PGD with Monte Carlo sampling of the loss gradients (PGD-MC k = 10) onVOneResNet50 for 64 PGD iteration steps For all stochastic models we used PGD with Monte Carlosampling in the white box attacks throughout the paper Values are mean and SD (n=3 seeds)

White box PGD-L1 PGD-L2 PGD-LinfinMean 40 160 015 060 11020 1255

Attack type [] [] [] [] [] [] []

Standard PGD 528plusmn02 644plusmn04 422plusmn01 652plusmn03 453plusmn04 634plusmn04 364plusmn04PGD-MC (k=10) 511plusmn04 642plusmn05 404plusmn05 653plusmn04 443plusmn03 619plusmn04 303plusmn05

18

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

Although VOneResNet50 was not designed to explicitly improve robustness to white box attacksit offers substantial gains over the standard ResNet50 model (Table 1) A closer inspection of thewhite box accuracy perturbation strength curves (Fig B1) reveals that VOneResNet50 can withstandperturbations roughly 4 times higher than the standard ResNet50 for the same performance level (bluecurve is shifted to the right of the black curve by a factor of 4) To help visualize how this differencetranslates into actual images we show in Fig B3 examples of white box adversarial images for bothResNet50 and VOneResNe50 for the perturbation strengths that bring the respective models to nearlychance performance

Original Egyptian Cat

ResNet50 VOneResNet50

||δ||infin = 1255 ||δ||infin = 4255

||δ||2 = 060

||δ||1 = 160

+ +Sleeping Bag Barrel

Sleeping Bag ||δ||2 = 240 Norwegian Elkhound

Sleeping Bag ||δ||1 = 640 Schipperke

= =

=

=

=

=

Figure B3 Comparison of adversarial images for ResNet50 and VOneResNet50 Visualizationof white box attacks under the 3 different norm constraints (Linfin L2 and L1) for the perturbationstrength that brings the model to nearly chance performance (Fig B1) Left ResNet50 rightVOneResNet50 For each attack the perturbation is shown on the left scaled for visualization andthe perturbed image appears next to it Perturbation strengths and classes of the adversarial imagesare shown over the images

19

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

B2 Additional attack controls

The field of adversarial robustness is littered with examples of proposed defense methods that werequickly defeated with some minor attack optimizations or adaptations [68 84 97] For this reasonwe took great care to verify our claims of improved robustness of the VOneNets In addition to theimplementation details and controls described in the previous section we validated our results byfurther optimizing our attacks on both the standard stochastic VOneResNet50 and the VOneResNet50with stochasticity removed during the attack and inference

First on the standard stochastic VOneResNet50 we ensured that our attack hyper-parametersincluding step-size (α) number of gradient samples (k) and PGD iterations are sufficient forevaluating the networkrsquos robustness For PGD with δinfin = 1255 with 64 iterations we performeda large grid search over α and k observing only a marginal increase in the attack effectiveness (FigB4) For the most effective step size (α = ε8) increasing the number of gradient samples beyondk = 16 no longer decreases accuracy under the attack At k = 128 VOneResNet50 accuracy isonly reduced from 2915 to 260 remaining a large margin above ResNet50 accuracy of 08Additionally increasing the number of PGD iterations all the way up to 1000 (with α = ε8 andk = 16) only further reduced performance by 05 Finally using 5-10 random restarts did notimprove the attack beyond the most effective k and α settings

Reported VOneResNet50

Reported ResNet50

Whi

te b

ox a

ccur

acy

gradient samples

n = 3 seeds

||δ||infin = 1255

ε 2ε 4ε 8ε 16ε 32

Step size

001 8 32 64 128

01

02

03

04

Figure B4 Further attack optimization does not overturn results White box accuracy for PGDwith δinfin = 1255 64 iterations on 500 images Large grid search over step size and number ofgradient samples only marginally improves attack effectiveness Further attack optimization onlymarginally reduces model performance Lines represent mean and shaded areas SD (n=3 seeds)

Second while boundary attacks are not ideal against the standard stochastic VOneResNet50 due to theshifting boundary line we investigated VOneResNet50 without stochasticity during attackinferenceusing the Brendel and Bethge (BampB) attack [10] a powerful white box adversary that aims to findminimal adversarial perturbations along the decision boundary of the network We reasoned that ifthe BampB attack offers similar results to those observed using PGD this is a good sanity check thatour observed robustness is reliable and not a result of masked gradients around the input or a quirk ofPGD Using the FoolBox [98] implementation of the Linfin BampB attack initialized with 500 successfuladversarial images generated using PGD δinfin = 8255 we see nearly identical scores as comparedto standard PGD with optimal hyper-parameters δinfin = 11020 and PGD δinfin = 1255

We conclude that these additional controls do not qualitatively change our reported results and providefurther evidence that the observed improved robustness of the VOneNets is real and not an artifact

20

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

B3 Common Image Corruptions

For evaluating model robustness to common image corruptions we used ImageNet-C a datasetpublicly available at httpsgithubcomhendrycksrobustness [13] It consists of 15 dif-ferent types of corruptions each at five levels of severity for a total of 75 different perturbations (FigB5) The corruption types fall into four categories noise blur weather and digital effects Noiseincludes Gaussian noise shot noise and impulse noise blur includes motion blur defocus blur zoomblur and glass blur weather includes snow frost fog and brightness digital effects includes JPEGcompression elastic transformations contrast and pixelation For every corruption type and at everyseverity we tested on all 50000 images from the ImageNet validation set We note that these imageshave been JPEG compressed by the original authors for sharing which is known to have minor butnoticeable effect on final network performance

Original Gaussian noise Shot noise Impulse noise

Defocus blur Glass blur Motion blur Zoom blur

Snow Frost Fog Brightness

Contrast Elastic transform Pixelate JPEG compression

Figure B5 Common image corruptions with intermediate severity Visualization of all 15common image corruption types evaluated at severity = 3 First row original image followed by thenoise corruptions second row blur corruptions third row weather corruptions fourth row digitalcorruptions

21

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

1 2 3 4 5Severity

000

025

050

075C

orru

ptui

ons a

ccur

acy

Noise

1 2 3 4 5Severity

Blur

1 2 3 4 5Severity

Weather

1 2 3 4 5Severity

Digital

ResNet50VOneResNet50

ANT3times3+SINATLinfin

Figure B6 VOneResNet50 improves robustness to noise blur and digital corruptions Corrup-tion accuracy severity curves for the four categories of common image corruptions noise blurweather and digital Adding the VOneBlock to ResNet50 improves robustness to all corruptioncategories except weather The adversarially trained model (ATLinfin ) is considerably worse at commonimage corruptions Adversarial noise combined with Stylized ImageNet training (ANT3times3+SIN) isconsistently more robust to all corruption categories For ResNet50 and VOneResNet50 error-barsrepresent SD (n=3 seeds)

Table B2 Detailed common image corruption results Adding the VOneBlock to ResNet50improves robustness to several of the corruption types However for others particularly fog andcontrast robustness decreases Interestingly these are the corruption types that the adversariallytrained model (ATLinfin) also does the worst Adversarial noise combined with Stylized ImageNettraining (ANT3times3+SIN) is the most robust model for most of the corruption types For ResNet50 andVOneResNet50 values are mean and SD (n=3 seeds)

Noise Blur

Gaussian Shot Impulse Defocus Glass Motion ZoomModel [] [] [] [] [] [] []

ResNet50 [4] 296plusmn11 279plusmn09 245plusmn12 388plusmn07 261plusmn06 379plusmn04 345plusmn07

ATLinfin [76] 236 223 148 244 343 332 349ANT3times3+SIN [59] 616 607 599 415 386 460 373VOneResNet50 346plusmn14 334plusmn12 319plusmn15 378plusmn01 357plusmn01 374plusmn02 340plusmn02

Weather Digital

Snow Frost Fog Bright Contrast Elastic Pixelate JPEGModel [] [] [] [] [] [] [] []

ResNet50 301plusmn05 367plusmn04 436plusmn08 668plusmn02 388plusmn05 448plusmn03 470plusmn27 551plusmn10

ATLinfin 295 312 84 541 102 484 563 587ANT3times3+SIN 477 522 549 682 493 527 589 593VOneResNet50 252plusmn05 368plusmn04 301plusmn05 624plusmn03 285plusmn05 487plusmn01 633plusmn02 610plusmn03

22

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

B4 Other defense methods

We compared the performance of VOneResNet50 in the described computer vision benchmarkswith two training-based defense methods using the ResNet50 architecture adversarial training witha δinfin = 4255 constraint (ATLinfin) [76] and adversarial noise with Stylized ImageNet training(ANT3times3+SIN) [59]

The ATLinfin model was downloaded from the publicly available adversarially trained models at httpsgithubcomMadryLabrobustness [76] This model was trained following the adversarialtraining paradigm of Madry et al [17] to solve the min-max problem

minθρ(θ) where ρ(θ) = E(xy)simD

[maxδisinS

L(θ x+ δ y)

](1)

Here the goal is to learn parameters θ minimizing the loss L for training images x and labels y drawnfrom D while perturbing x with δ isin S to maximally increase the loss In this case D is ImageNetand PGD with δinfin = 4255 is used to find the perturbation maximizing the loss for a given (x y)pair The δinfin = 4255 constraint model was selected to compare against because it had the bestperformance on our conglomerate benchmark of adversarial attacks and common image corruptions

The ANT3times3+SIN model was obtained from httpsgithubcombethgelabgame-of-noiseRecently Rusak et al showed that training a ResNet50 model with several types of additive inputnoise improved the robustness to common image corruptions [59] Inspired by this observation theauthors decided to train a model while optimizing a noise distribution that maximally confuses itSimilarly to standard adversarial training this results in solving a min-max problem

minθ

maxφ

E(xy)simDEδsimpφ(δ) [L(θ x+ δ y)] (2)

where pφ(δ) is the maximally confusing noise distribution The main difference to regular adversarialtraining in Equation 1 is that while in the former δ is optimized directly here δ is found by optimizinga constrained distribution with local spatial correlations Complementing the adversarial noise trainingwith Stylized ImageNet training [25] produced the model with the current best robustness in theImageNet-C benchmark as a standalone method

23

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

C VOneNet implementation details

C1 Convolutional layer

The convolutional layer of the VOneBlock is a mathematically parameterized Gabor Filter Bank(GFB) We set the stride of the GFB to be four originating a 56times56 spatial map of activations Sincethe number of channels in most CNNsrsquo first convolution is relatively small (64 in the architecturesadapted) we used a larger number in the VOneBlock so that the Gabors would cover the largeparameter space and better approximate primate V1 We set the main VOneNet models to contain512 channels equally split between simple and complex cells (see Fig C2 A for an analysis of howthe number of channels affects performance) Each channel in the GFB convolves a single colorchannel from the input image

The Gabor function consists of a two-dimensional grating with a Gaussian envelope and is describedby the following equation

Gθfφnxny (x y) =1

2πσxσyexp

[minus05(x2rotσ

2x + y2rotσ

2y)]

cos (2πf + φ) (3)

where

xrot = x cos(θ) + y sin(θ)

yrot = minusx sin(θ) + y cos(θ)(4)

σx =nxf

σy =nyf

(5)

xrot and yrot are the orthogonal and parallel orientations relative to the grating θ is the angle of thegrating orientation f is the spatial frequency of the grating φ is the phase of the grating relative to theGaussian envelope and σx and σy are the standard deviations of the Gaussian envelope orthogonaland parallel to the grating which can be defined as multiples (nx and ny) of the grating cycle (inverseof the frequency)

Although the Gabor filter greatly reduces the number of parameters defining the linear spatialcomponent of V1 neurons it still has five parameters per channel Fortunately there is a vastliterature in neurophysiology with detailed characterizations of primate V1 response properties whichcan be used to constrain these parameters To instantiate a VOneBlock with CV 1 channels wesampled CV 1 values for each of the parameters according to an empirically constrained distribution(Table C1) Due to the resolution of the input images we limited the ranges of spatial frequencies(f lt 56cpd) and number of cycles (n gt 01)

Table C1 Empirical distributions used to sample the GFB parameters in the VOneBlock

Parameter Reference

θ De Valois et at [78] Fig 5 Orientation tuning (foveal)f De Valois et at [79] Fig 6 Peak spatial freq (X cells foveal)φ - Uniform distribution isin [0 360] degnx ny Ringach 2002 [80] Fig 4 (nx ny) joint distribution

Critical to the correct implementation of the biologically-constrained parameters of the GFB is thechoice of the modelrsquos field of view in degrees As previously mentioned in Section A we used 8degas the input spatial extent for all CNN models Itrsquos important to note that the set of spatial filters inthe VOneBlockrsquos GFB differs considerably from those in the first convolution in most CNNs (FigC1) While standard CNNs learn filters that resemble Gabors in their input layer [1 99 100] due thelimited span of their kernels they do not vary significantly in size and spatial frequency V1 neuronson the other hand are known to exhibit a wide range of receptive field properties This phenomena iscaptured in the VOneBlock with the spatial frequencies and receptive field sizes of Gabors rangingmore than one order of magnitude Due to this high variability we set the convolution kernel to be25times 25px which is considerably larger than those in standard CNNs (Fig C1)

24

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

VOneBlock ResNet50 CORnet-S AlexNet

7 px7 px 7 px7 px

Figure C1 Comparison between filters in the first convolution of standard CNNs and theVOneBlock Example filters from the first convolution in the VOneBlock ResNet50 CORnet-S andAlexNet (from left to right) VOneBlock filters are all parameterized as Gabors varying considerablyin size and spatial frequency Standard CNNs have some filters with shapes other than Gaborsparticularly center-surround and are limited in size by their small kernel Kernel sizes are 25px 7px7px and 11px for VOneBlock ResNet50 CORnet-S and AlexNet respectively

25

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

C2 Nonlinear layer

VOneBlockrsquos nonlinear layer has two different nonlinearities that are applied to each channel depend-ing on its cell type a rectified linear transformation for simple cells (6) and the spectral power of aquadrature phase-pair (7) for complex cells

Snlθfφnxny =

Slθfφnxny if Slθfφnxny gt 0

0 otherwise(6)

Cnlθfφnxny =1radic2

radic(Clθfφnxny )2 + (Clθfφ+π2nxny )2 (7)

where Sl and Snl are the linear and nonlinear responses of a simple neuron and Cl and Cnl are thesame for a complex neuron

C3 Neuronal stochasticity generator

In awake monkeys spike trains of V1 neurons are approximately Poisson ie the variance and meanof spike counts in a given time-window over a set of repetitions are roughly the same [82] Weincorporated stochasticity into the VOneBlock to emulate this property of neuronal responses Sincethe Poisson distribution is not continuous it breaks the gradients in a white box attack giving a falsesense of robustness [68] In order to avoid this situation and facilitate the evaluation of the modelrsquosreal robustness our neuronal stochasticity generator as implemented uses a continuous second-orderapproximation of Poisson noise by adding Gaussian noise with variance equal to the activation

Rs sim N (micro = Rns σ2 = Rns) (8)

where Rns and Rs are the non-stochastic and stochastic responses of a neuron

To approximate the same levels of neuronal stochasticity of primate V1 neurons it is critical that theVOneBlock activations are on the same range as the V1 neuronal responses (number of spike countsin a given time-window) Thus we applied an affine transformation to the activations so that both themean stimulus response and the mean baseline activity are the same as those of a population of V1neurons (Table C2 shows the mean responses and spontaneous activity of V1 neurons measured indifferent time-windows)

Table C2 Mean stimulus responses and spontaneous activity of a population of V1 neuronsfor different time-windows

Time-window Stimulus response Spontaneous activity[ms] [spikes] [spikes]

25 0324 007350 0655 0145

100 105 029200 138 058

Since the exact time-window that responses in V1 are integrated during visual perception is still anopen question we considered a time-window of 25ms (Fig C2 B) In order to keep the outputs ofthe VOneBlock on a range that does not deviate considerably from the typical range of activations inCNNs so that the model can be efficiently trained using standard training parameters we applied theinverse transformation to scale the outputs back to their original range after the stochastic layer

C4 Hyper-parameters and other design choices

We developed VOneNets from three different standard CNN architectures ResNet50 CORnet-S andAlexNet As previously mentioned we replaced the first block of each architecture by the VOneBlockand the transition layer to create the respective VOneNet Table C3 contains the layers that wereremoved for each base architecture Except CORnet-S the layers removed only contained a single

26

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

convolution nonlinearity and maxpool Since CORnet-S already had a pre-committed set of layersto V1 we replaced them by the VOneBlock The torchvision implementation of AlexNet containsa combined stride of eight in the removed layers (four in the convolution and two in the maxpool)followed by a stride of one in the second convolution (outside of the removed layers) In order to moreeasily adapt it to a VOneBlock with a stride of four we slightly adapted AlexNetrsquos architecture so thatit had strides of two in these three layers (first convolution first maxpool and second convolution)The results shown in Fig 2 were obtained using this modified architecture

Table C3 Layers removed from each base CNN architecture for creating the VOneNets

Architecture Layers removed

AlexNet ConvReLUMaxPoolResNet50 ConvBatch-NormReLUMaxPoolCORnet-S ConvBatch-NormReLUMaxPoolConvBatch-NormReLU

The VOneNet architecture was designed to have as few hyper-parameters as possible When possiblethese were either constrained by the base CNN architecture or by neurobiological data mostparameters of the GFB were instantiated by sampling from neuronal distributions of primate V1neurons the kernel size was set to 25times25px to capture the high variability in Gabor sizes like allother CNN models the spatial extent of the field of view was set to 8deg (Section A) Neverthelessthere were two hyper-parameters where the choice was rather arbitrary the number of channels inthe VOneBlock and the time-window for integrating V1 responses to scale the activations prior tothe stochasticity generator

Since primate V1 has neurons tuned to a wide range of spatial frequencies for each orientation andat any given location of the visual space [101] we expected that a large number of channels would berequired to cover all of the combinations of Gabor parameters For the VOneResNet50 architecturewe varied the number of channels in the VOneBlock between 64 and 1024 equally split betweensimple and complex cells and measured the performance of the several models after training (FigC2 A) Clean ImageNet and common image corruption performance improved with channel numberRemarkably the opposite happened for white box accuracy with variants of VOneResNet50 with 64and 128 channels achieving the highest robustness to white box attacks This result is particularlyinteresting since it had been shown that increasing the number of channels in all layers of a CNNimproves robustness to white box attacks in both standard and adversarially trained models [17]Therefore this shows that the improvement in white box robustness of VOneNets when compared totheir base models cannot be attributed to the higher number of channels For the main model we setthe channel number to be 512 as it offered a good compromise between the different benchmarksRegarding the time-window for integrating V1 responses we observed small effects on the modelsrsquoperformance when varying it between 25 and 200ms We chose 25ms for the main model due to asmall trend for higher robustness to white box attacks with shorter time-windows

102451225612864 channels

04

05

06

07

Acc

urac

y

25 50 100 200Activity time-window (ms)

04

05

06

07

Acc

urac

y

CleanPerturbationWhite boxCorruption

A Bwith 25 ms time-window with 512 channels

Figure C2 Exploration of VOneBlockrsquos hyper-parameters in the VOneResNet50 architec-ture A Performance of different VOneResNet50 models with varying number of channels in theVOneBlock B Same as in A but varying time-window for integrating neuronal activity Error-barsshow SD (n=3 seeds) Red arrows represent the values used in the main model

27

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

C5 Training details

We used PyTorch 041 and trained the model using ImageNet 2012 [34] Images were prepro-cessed (1) for trainingmdashrandom crop to 224 times 224 pixels and random flipping left and right (2)for validationmdashcentral crop to 224 times 224 pixels Preprocessing was followed by normalizationmdashsubtraction and division by [05 05 05] We used a batch size of 256 images and trained either on 2GPUs (NVIDIA Titan X GeForce 1080Ti) or 1 GPU (QuadroRTX6000 or V100) for 70 epochsWe use step learning rate scheduling 01 starting learning rate divided by 10 every 20 epochs Foroptimization we use Stochastic Gradient Descent with a weight decay 00001 momentum 09 and across-entropy loss between image labels and model predictions (logits)

Model weights and code are available at https://github.com/dicarlolab/vonenet.


D VOneBlock variant details

To investigate which components of the VOneBlock are responsible for the increased robustness, we created six variants of the VOneBlock by removing or modifying one of its parts. In all cases, we trained from scratch a corresponding variant of VOneResNet50 with the modified VOneBlock; all other properties of the model and components of the VOneBlock were left unmodified. Here we describe each of these variants in more detail (the name of each variant refers to the component removed; a configuration sketch follows the list):

• V1 distributions: the GFB parameters were sampled from uniform distributions with the same domain as the empirical distributions used in the default VOneBlock (Table C1).
• Low SF: when sampling the spatial frequencies for the GFB, neurons with low peak spatial frequencies (f < 2 cpd) were removed from the empirical distribution [79].
• High SF: similar to the previous variant, but removing neurons with high peak spatial frequencies (f > 2 cpd).
• Complex NL: complex cell channels were removed from the VOneBlock and replaced by simple cell channels.
• Simple NL: simple cell channels were removed from the VOneBlock and replaced by complex cell channels.
• V1 stochasticity: the V1 stochasticity generator was removed from the VOneBlock.
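
One way to express these ablations is as a set of constructor switches; the sketch below is purely illustrative and does not mirror the released code.

    from dataclasses import dataclass

    @dataclass
    class VOneBlockConfig:
        """Hypothetical ablation switches mirroring the six variants above."""
        uniform_param_distributions: bool = False  # True  -> "V1 distributions"
        min_peak_sf_cpd: float = 0.0               # 2.0   -> "Low SF" (f < 2 cpd removed)
        max_peak_sf_cpd: float = float("inf")      # 2.0   -> "High SF" (f > 2 cpd removed)
        use_complex_cells: bool = True             # False -> "Complex NL"
        use_simple_cells: bool = True              # False -> "Simple NL"
        use_stochasticity: bool = True             # False -> "V1 stochasticity"

    # Example: the variant with the stochasticity generator removed
    no_noise_cfg = VOneBlockConfig(use_stochasticity=False)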

While all of the variants resulted in worse overall mean performance than the unmodified VOneResNet50 (Table 2), some improved specific benchmarks: the variant without high SF had higher accuracy under noise and blur corruptions (Table D1); the variant without V1 stochasticity had higher accuracy on clean images and on images with noise and weather corruptions (Tables 2 and D1); and the variant without V1 distributions had higher accuracy under white box attacks with high L∞ perturbations (Table D2).

Table D1: Difference between the common image corruption accuracies of the several VOneResNet50 variants and the unmodified VOneResNet50. Removal of simple cells reduced accuracy for all common image corruption types. Removal of some components of the VOneBlock improved accuracy for specific corruptions: removing high SF consistently improved performance for noise and blur corruptions, and removing V1 stochasticity improved performance for noise and weather corruptions.

                              Noise                                   Blur
Component    Gaussian    Shot        Impulse     Defocus     Glass       Motion      Zoom
removed      [Δ%]        [Δ%]        [Δ%]        [Δ%]        [Δ%]        [Δ%]        [Δ%]

V1 dist      -2.8±0.9    -2.4±0.7    -2.9±1.1    -1.6±0.1     0.4±0.2    -0.8±0.3    -0.2±0.5
Low SF       -1.4±0.5    -1.0±0.2    -1.5±0.5    -0.6±0.6    -1.1±0.2    -1.2±0.4    -0.5±0.7
High SF       6.8±1.3     7.3±1.1     7.0±1.3     5.6±0.3    11.5±0.3     3.7±0.0     6.1±0.1
Complex       1.0±1.6     1.2±1.3     0.7±1.8     1.2±0.2    -1.5±0.1     0.2±0.8     0.4±1.1
Simple       -6.1±1.6    -5.4±1.5    -6.4±1.9    -4.1±0.4    -4.2±0.4    -4.9±0.5    -5.4±0.3
V1 stoch      2.2±2.5     2.2±2.5     2.1±2.9     3.3±0.1    -5.5±0.1     0.9±0.7     0.9±0.1

                              Weather                                 Digital
Component    Snow        Frost       Fog         Bright      Contrast    Elastic     Pixelate    JPEG
removed      [Δ%]        [Δ%]        [Δ%]        [Δ%]        [Δ%]        [Δ%]        [Δ%]        [Δ%]

V1 dist      -0.2±0.6    -0.9±0.6    -1.9±1.1    -0.8±0.5    -1.4±0.9     0.3±0.4    -1.7±0.3     0.8±0.5
Low SF       -0.4±0.3    -0.6±0.4     0.9±0.3    -0.2±0.2    -0.4±0.5    -0.9±0.5    -1.3±0.2    -0.5±0.4
High SF       1.4±1.0    -0.7±0.7    -9.5±0.4    -5.2±0.4    -4.4±0.5     2.4±0.2     1.9±0.4    -4.6±0.4
Complex       0.7±0.4     0.4±0.3     4.3±0.2     1.3±0.2     2.9±0.2    -0.5±0.2    -1.2±0.0    -1.9±0.5
Simple       -5.0±0.5    -9.0±0.5    -4.4±0.5    -3.9±0.2    -4.8±0.4    -3.9±0.5    -3.5±0.4    -1.8±0.2
V1 stoch      2.5±0.4     0.3±0.3    12.0±0.5     2.6±0.3    10.0±0.8    -2.8±0.3    -4.0±0.7    -4.0±0.4


Table D2: Difference between the white box attack accuracies of the several VOneResNet50 variants and the unmodified VOneResNet50. Removal of V1 stochasticity considerably reduced accuracy under all white box attacks. Removal of V1 distributions improved accuracy for the L∞ white box attack with high perturbation strength.

                   PGD-L1                  PGD-L2                  PGD-L∞
Component    40          160         0.15        0.60        1/1020      1/255
removed      [Δ%]        [Δ%]        [Δ%]        [Δ%]        [Δ%]        [Δ%]

V1 dist      -0.5±0.8     0.1±1.1    -0.8±0.2     0.6±1.0     0.3±0.3     3.0±1.8
Low SF       -0.8±1.2    -3.1±1.1    -0.6±1.0    -2.3±0.8    -0.8±1.3    -1.8±1.0
High SF      -2.2±0.9    -4.7±2.6    -2.6±1.5    -5.1±1.2    -2.6±1.4    -5.5±1.1
Complex      -1.3±0.9    -7.4±1.6    -1.1±0.9    -7.2±1.2    -1.9±1.0    -7.9±1.2
Simple       -2.5±0.8    -1.4±1.4    -2.9±1.1    -1.7±1.2    -2.7±1.2    -0.6±0.8
V1 stoch    -23.7±0.9   -36.5±0.8   -21.6±0.5   -40.1±0.4   -32.2±0.4   -29.2±0.4

Page 19: bioRxiv - Simulating a Primary Visual Cortex at the Front of CNNs … · 2020. 6. 16. · 8MIT-IBM Watson AI Lab dapello@mit.edu tmarques@mit.edu Abstract Current state-of-the-art

Although VOneResNet50 was not designed to explicitly improve robustness to white box attacksit offers substantial gains over the standard ResNet50 model (Table 1) A closer inspection of thewhite box accuracy perturbation strength curves (Fig B1) reveals that VOneResNet50 can withstandperturbations roughly 4 times higher than the standard ResNet50 for the same performance level (bluecurve is shifted to the right of the black curve by a factor of 4) To help visualize how this differencetranslates into actual images we show in Fig B3 examples of white box adversarial images for bothResNet50 and VOneResNe50 for the perturbation strengths that bring the respective models to nearlychance performance

Original Egyptian Cat

ResNet50 VOneResNet50

||δ||infin = 1255 ||δ||infin = 4255

||δ||2 = 060

||δ||1 = 160

+ +Sleeping Bag Barrel

Sleeping Bag ||δ||2 = 240 Norwegian Elkhound

Sleeping Bag ||δ||1 = 640 Schipperke

= =

=

=

=

=

Figure B3 Comparison of adversarial images for ResNet50 and VOneResNet50 Visualizationof white box attacks under the 3 different norm constraints (Linfin L2 and L1) for the perturbationstrength that brings the model to nearly chance performance (Fig B1) Left ResNet50 rightVOneResNet50 For each attack the perturbation is shown on the left scaled for visualization andthe perturbed image appears next to it Perturbation strengths and classes of the adversarial imagesare shown over the images

19

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

B2 Additional attack controls

The field of adversarial robustness is littered with examples of proposed defense methods that werequickly defeated with some minor attack optimizations or adaptations [68 84 97] For this reasonwe took great care to verify our claims of improved robustness of the VOneNets In addition to theimplementation details and controls described in the previous section we validated our results byfurther optimizing our attacks on both the standard stochastic VOneResNet50 and the VOneResNet50with stochasticity removed during the attack and inference

First on the standard stochastic VOneResNet50 we ensured that our attack hyper-parametersincluding step-size (α) number of gradient samples (k) and PGD iterations are sufficient forevaluating the networkrsquos robustness For PGD with δinfin = 1255 with 64 iterations we performeda large grid search over α and k observing only a marginal increase in the attack effectiveness (FigB4) For the most effective step size (α = ε8) increasing the number of gradient samples beyondk = 16 no longer decreases accuracy under the attack At k = 128 VOneResNet50 accuracy isonly reduced from 2915 to 260 remaining a large margin above ResNet50 accuracy of 08Additionally increasing the number of PGD iterations all the way up to 1000 (with α = ε8 andk = 16) only further reduced performance by 05 Finally using 5-10 random restarts did notimprove the attack beyond the most effective k and α settings

Reported VOneResNet50

Reported ResNet50

Whi

te b

ox a

ccur

acy

gradient samples

n = 3 seeds

||δ||infin = 1255

ε 2ε 4ε 8ε 16ε 32

Step size

001 8 32 64 128

01

02

03

04

Figure B4 Further attack optimization does not overturn results White box accuracy for PGDwith δinfin = 1255 64 iterations on 500 images Large grid search over step size and number ofgradient samples only marginally improves attack effectiveness Further attack optimization onlymarginally reduces model performance Lines represent mean and shaded areas SD (n=3 seeds)

Second while boundary attacks are not ideal against the standard stochastic VOneResNet50 due to theshifting boundary line we investigated VOneResNet50 without stochasticity during attackinferenceusing the Brendel and Bethge (BampB) attack [10] a powerful white box adversary that aims to findminimal adversarial perturbations along the decision boundary of the network We reasoned that ifthe BampB attack offers similar results to those observed using PGD this is a good sanity check thatour observed robustness is reliable and not a result of masked gradients around the input or a quirk ofPGD Using the FoolBox [98] implementation of the Linfin BampB attack initialized with 500 successfuladversarial images generated using PGD δinfin = 8255 we see nearly identical scores as comparedto standard PGD with optimal hyper-parameters δinfin = 11020 and PGD δinfin = 1255

We conclude that these additional controls do not qualitatively change our reported results and providefurther evidence that the observed improved robustness of the VOneNets is real and not an artifact

20

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

B3 Common Image Corruptions

For evaluating model robustness to common image corruptions we used ImageNet-C a datasetpublicly available at httpsgithubcomhendrycksrobustness [13] It consists of 15 dif-ferent types of corruptions each at five levels of severity for a total of 75 different perturbations (FigB5) The corruption types fall into four categories noise blur weather and digital effects Noiseincludes Gaussian noise shot noise and impulse noise blur includes motion blur defocus blur zoomblur and glass blur weather includes snow frost fog and brightness digital effects includes JPEGcompression elastic transformations contrast and pixelation For every corruption type and at everyseverity we tested on all 50000 images from the ImageNet validation set We note that these imageshave been JPEG compressed by the original authors for sharing which is known to have minor butnoticeable effect on final network performance

Original Gaussian noise Shot noise Impulse noise

Defocus blur Glass blur Motion blur Zoom blur

Snow Frost Fog Brightness

Contrast Elastic transform Pixelate JPEG compression

Figure B5 Common image corruptions with intermediate severity Visualization of all 15common image corruption types evaluated at severity = 3 First row original image followed by thenoise corruptions second row blur corruptions third row weather corruptions fourth row digitalcorruptions

21

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

1 2 3 4 5Severity

000

025

050

075C

orru

ptui

ons a

ccur

acy

Noise

1 2 3 4 5Severity

Blur

1 2 3 4 5Severity

Weather

1 2 3 4 5Severity

Digital

ResNet50VOneResNet50

ANT3times3+SINATLinfin

Figure B6 VOneResNet50 improves robustness to noise blur and digital corruptions Corrup-tion accuracy severity curves for the four categories of common image corruptions noise blurweather and digital Adding the VOneBlock to ResNet50 improves robustness to all corruptioncategories except weather The adversarially trained model (ATLinfin ) is considerably worse at commonimage corruptions Adversarial noise combined with Stylized ImageNet training (ANT3times3+SIN) isconsistently more robust to all corruption categories For ResNet50 and VOneResNet50 error-barsrepresent SD (n=3 seeds)

Table B2 Detailed common image corruption results Adding the VOneBlock to ResNet50improves robustness to several of the corruption types However for others particularly fog andcontrast robustness decreases Interestingly these are the corruption types that the adversariallytrained model (ATLinfin) also does the worst Adversarial noise combined with Stylized ImageNettraining (ANT3times3+SIN) is the most robust model for most of the corruption types For ResNet50 andVOneResNet50 values are mean and SD (n=3 seeds)

Noise Blur

Gaussian Shot Impulse Defocus Glass Motion ZoomModel [] [] [] [] [] [] []

ResNet50 [4] 296plusmn11 279plusmn09 245plusmn12 388plusmn07 261plusmn06 379plusmn04 345plusmn07

ATLinfin [76] 236 223 148 244 343 332 349ANT3times3+SIN [59] 616 607 599 415 386 460 373VOneResNet50 346plusmn14 334plusmn12 319plusmn15 378plusmn01 357plusmn01 374plusmn02 340plusmn02

Weather Digital

Snow Frost Fog Bright Contrast Elastic Pixelate JPEGModel [] [] [] [] [] [] [] []

ResNet50 301plusmn05 367plusmn04 436plusmn08 668plusmn02 388plusmn05 448plusmn03 470plusmn27 551plusmn10

ATLinfin 295 312 84 541 102 484 563 587ANT3times3+SIN 477 522 549 682 493 527 589 593VOneResNet50 252plusmn05 368plusmn04 301plusmn05 624plusmn03 285plusmn05 487plusmn01 633plusmn02 610plusmn03

22

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

B4 Other defense methods

We compared the performance of VOneResNet50 in the described computer vision benchmarkswith two training-based defense methods using the ResNet50 architecture adversarial training witha δinfin = 4255 constraint (ATLinfin) [76] and adversarial noise with Stylized ImageNet training(ANT3times3+SIN) [59]

The ATLinfin model was downloaded from the publicly available adversarially trained models at httpsgithubcomMadryLabrobustness [76] This model was trained following the adversarialtraining paradigm of Madry et al [17] to solve the min-max problem

minθρ(θ) where ρ(θ) = E(xy)simD

[maxδisinS

L(θ x+ δ y)

](1)

Here the goal is to learn parameters θ minimizing the loss L for training images x and labels y drawnfrom D while perturbing x with δ isin S to maximally increase the loss In this case D is ImageNetand PGD with δinfin = 4255 is used to find the perturbation maximizing the loss for a given (x y)pair The δinfin = 4255 constraint model was selected to compare against because it had the bestperformance on our conglomerate benchmark of adversarial attacks and common image corruptions

The ANT3times3+SIN model was obtained from httpsgithubcombethgelabgame-of-noiseRecently Rusak et al showed that training a ResNet50 model with several types of additive inputnoise improved the robustness to common image corruptions [59] Inspired by this observation theauthors decided to train a model while optimizing a noise distribution that maximally confuses itSimilarly to standard adversarial training this results in solving a min-max problem

minθ

maxφ

E(xy)simDEδsimpφ(δ) [L(θ x+ δ y)] (2)

where pφ(δ) is the maximally confusing noise distribution The main difference to regular adversarialtraining in Equation 1 is that while in the former δ is optimized directly here δ is found by optimizinga constrained distribution with local spatial correlations Complementing the adversarial noise trainingwith Stylized ImageNet training [25] produced the model with the current best robustness in theImageNet-C benchmark as a standalone method

23

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

C VOneNet implementation details

C1 Convolutional layer

The convolutional layer of the VOneBlock is a mathematically parameterized Gabor Filter Bank(GFB) We set the stride of the GFB to be four originating a 56times56 spatial map of activations Sincethe number of channels in most CNNsrsquo first convolution is relatively small (64 in the architecturesadapted) we used a larger number in the VOneBlock so that the Gabors would cover the largeparameter space and better approximate primate V1 We set the main VOneNet models to contain512 channels equally split between simple and complex cells (see Fig C2 A for an analysis of howthe number of channels affects performance) Each channel in the GFB convolves a single colorchannel from the input image

The Gabor function consists of a two-dimensional grating with a Gaussian envelope and is describedby the following equation

Gθfφnxny (x y) =1

2πσxσyexp

[minus05(x2rotσ

2x + y2rotσ

2y)]

cos (2πf + φ) (3)

where

xrot = x cos(θ) + y sin(θ)

yrot = minusx sin(θ) + y cos(θ)(4)

σx =nxf

σy =nyf

(5)

xrot and yrot are the orthogonal and parallel orientations relative to the grating θ is the angle of thegrating orientation f is the spatial frequency of the grating φ is the phase of the grating relative to theGaussian envelope and σx and σy are the standard deviations of the Gaussian envelope orthogonaland parallel to the grating which can be defined as multiples (nx and ny) of the grating cycle (inverseof the frequency)

Although the Gabor filter greatly reduces the number of parameters defining the linear spatialcomponent of V1 neurons it still has five parameters per channel Fortunately there is a vastliterature in neurophysiology with detailed characterizations of primate V1 response properties whichcan be used to constrain these parameters To instantiate a VOneBlock with CV 1 channels wesampled CV 1 values for each of the parameters according to an empirically constrained distribution(Table C1) Due to the resolution of the input images we limited the ranges of spatial frequencies(f lt 56cpd) and number of cycles (n gt 01)

Table C1 Empirical distributions used to sample the GFB parameters in the VOneBlock

Parameter Reference

θ De Valois et at [78] Fig 5 Orientation tuning (foveal)f De Valois et at [79] Fig 6 Peak spatial freq (X cells foveal)φ - Uniform distribution isin [0 360] degnx ny Ringach 2002 [80] Fig 4 (nx ny) joint distribution

Critical to the correct implementation of the biologically-constrained parameters of the GFB is thechoice of the modelrsquos field of view in degrees As previously mentioned in Section A we used 8degas the input spatial extent for all CNN models Itrsquos important to note that the set of spatial filters inthe VOneBlockrsquos GFB differs considerably from those in the first convolution in most CNNs (FigC1) While standard CNNs learn filters that resemble Gabors in their input layer [1 99 100] due thelimited span of their kernels they do not vary significantly in size and spatial frequency V1 neuronson the other hand are known to exhibit a wide range of receptive field properties This phenomena iscaptured in the VOneBlock with the spatial frequencies and receptive field sizes of Gabors rangingmore than one order of magnitude Due to this high variability we set the convolution kernel to be25times 25px which is considerably larger than those in standard CNNs (Fig C1)

24

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

VOneBlock ResNet50 CORnet-S AlexNet

7 px7 px 7 px7 px

Figure C1 Comparison between filters in the first convolution of standard CNNs and theVOneBlock Example filters from the first convolution in the VOneBlock ResNet50 CORnet-S andAlexNet (from left to right) VOneBlock filters are all parameterized as Gabors varying considerablyin size and spatial frequency Standard CNNs have some filters with shapes other than Gaborsparticularly center-surround and are limited in size by their small kernel Kernel sizes are 25px 7px7px and 11px for VOneBlock ResNet50 CORnet-S and AlexNet respectively

25

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

C2 Nonlinear layer

VOneBlockrsquos nonlinear layer has two different nonlinearities that are applied to each channel depend-ing on its cell type a rectified linear transformation for simple cells (6) and the spectral power of aquadrature phase-pair (7) for complex cells

Snlθfφnxny =

Slθfφnxny if Slθfφnxny gt 0

0 otherwise(6)

Cnlθfφnxny =1radic2

radic(Clθfφnxny )2 + (Clθfφ+π2nxny )2 (7)

where Sl and Snl are the linear and nonlinear responses of a simple neuron and Cl and Cnl are thesame for a complex neuron

C3 Neuronal stochasticity generator

In awake monkeys spike trains of V1 neurons are approximately Poisson ie the variance and meanof spike counts in a given time-window over a set of repetitions are roughly the same [82] Weincorporated stochasticity into the VOneBlock to emulate this property of neuronal responses Sincethe Poisson distribution is not continuous it breaks the gradients in a white box attack giving a falsesense of robustness [68] In order to avoid this situation and facilitate the evaluation of the modelrsquosreal robustness our neuronal stochasticity generator as implemented uses a continuous second-orderapproximation of Poisson noise by adding Gaussian noise with variance equal to the activation

Rs sim N (micro = Rns σ2 = Rns) (8)

where Rns and Rs are the non-stochastic and stochastic responses of a neuron

To approximate the same levels of neuronal stochasticity of primate V1 neurons it is critical that theVOneBlock activations are on the same range as the V1 neuronal responses (number of spike countsin a given time-window) Thus we applied an affine transformation to the activations so that both themean stimulus response and the mean baseline activity are the same as those of a population of V1neurons (Table C2 shows the mean responses and spontaneous activity of V1 neurons measured indifferent time-windows)

Table C2 Mean stimulus responses and spontaneous activity of a population of V1 neuronsfor different time-windows

Time-window Stimulus response Spontaneous activity[ms] [spikes] [spikes]

25 0324 007350 0655 0145

100 105 029200 138 058

Since the exact time-window that responses in V1 are integrated during visual perception is still anopen question we considered a time-window of 25ms (Fig C2 B) In order to keep the outputs ofthe VOneBlock on a range that does not deviate considerably from the typical range of activations inCNNs so that the model can be efficiently trained using standard training parameters we applied theinverse transformation to scale the outputs back to their original range after the stochastic layer

C4 Hyper-parameters and other design choices

We developed VOneNets from three different standard CNN architectures ResNet50 CORnet-S andAlexNet As previously mentioned we replaced the first block of each architecture by the VOneBlockand the transition layer to create the respective VOneNet Table C3 contains the layers that wereremoved for each base architecture Except CORnet-S the layers removed only contained a single

26

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

convolution nonlinearity and maxpool Since CORnet-S already had a pre-committed set of layersto V1 we replaced them by the VOneBlock The torchvision implementation of AlexNet containsa combined stride of eight in the removed layers (four in the convolution and two in the maxpool)followed by a stride of one in the second convolution (outside of the removed layers) In order to moreeasily adapt it to a VOneBlock with a stride of four we slightly adapted AlexNetrsquos architecture so thatit had strides of two in these three layers (first convolution first maxpool and second convolution)The results shown in Fig 2 were obtained using this modified architecture

Table C3 Layers removed from each base CNN architecture for creating the VOneNets

Architecture Layers removed

AlexNet ConvReLUMaxPoolResNet50 ConvBatch-NormReLUMaxPoolCORnet-S ConvBatch-NormReLUMaxPoolConvBatch-NormReLU

The VOneNet architecture was designed to have as few hyper-parameters as possible When possiblethese were either constrained by the base CNN architecture or by neurobiological data mostparameters of the GFB were instantiated by sampling from neuronal distributions of primate V1neurons the kernel size was set to 25times25px to capture the high variability in Gabor sizes like allother CNN models the spatial extent of the field of view was set to 8deg (Section A) Neverthelessthere were two hyper-parameters where the choice was rather arbitrary the number of channels inthe VOneBlock and the time-window for integrating V1 responses to scale the activations prior tothe stochasticity generator

Since primate V1 has neurons tuned to a wide range of spatial frequencies for each orientation andat any given location of the visual space [101] we expected that a large number of channels would berequired to cover all of the combinations of Gabor parameters For the VOneResNet50 architecturewe varied the number of channels in the VOneBlock between 64 and 1024 equally split betweensimple and complex cells and measured the performance of the several models after training (FigC2 A) Clean ImageNet and common image corruption performance improved with channel numberRemarkably the opposite happened for white box accuracy with variants of VOneResNet50 with 64and 128 channels achieving the highest robustness to white box attacks This result is particularlyinteresting since it had been shown that increasing the number of channels in all layers of a CNNimproves robustness to white box attacks in both standard and adversarially trained models [17]Therefore this shows that the improvement in white box robustness of VOneNets when compared totheir base models cannot be attributed to the higher number of channels For the main model we setthe channel number to be 512 as it offered a good compromise between the different benchmarksRegarding the time-window for integrating V1 responses we observed small effects on the modelsrsquoperformance when varying it between 25 and 200ms We chose 25ms for the main model due to asmall trend for higher robustness to white box attacks with shorter time-windows

102451225612864 channels

04

05

06

07

Acc

urac

y

25 50 100 200Activity time-window (ms)

04

05

06

07

Acc

urac

y

CleanPerturbationWhite boxCorruption

A Bwith 25 ms time-window with 512 channels

Figure C2 Exploration of VOneBlockrsquos hyper-parameters in the VOneResNet50 architec-ture A Performance of different VOneResNet50 models with varying number of channels in theVOneBlock B Same as in A but varying time-window for integrating neuronal activity Error-barsshow SD (n=3 seeds) Red arrows represent the values used in the main model

27

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

C5 Training details

We used PyTorch 041 and trained the model using ImageNet 2012 [34] Images were prepro-cessed (1) for trainingmdashrandom crop to 224 times 224 pixels and random flipping left and right (2)for validationmdashcentral crop to 224 times 224 pixels Preprocessing was followed by normalizationmdashsubtraction and division by [05 05 05] We used a batch size of 256 images and trained either on 2GPUs (NVIDIA Titan X GeForce 1080Ti) or 1 GPU (QuadroRTX6000 or V100) for 70 epochsWe use step learning rate scheduling 01 starting learning rate divided by 10 every 20 epochs Foroptimization we use Stochastic Gradient Descent with a weight decay 00001 momentum 09 and across-entropy loss between image labels and model predictions (logits)

Model weights and code are available at httpsgithubcomdicarlolabvonenet

28

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

D VOneBlock variant details

To investigate which components of the VOneBlock are responsible for the increased robustnesswe created six variants of the VOneBlock by removing or modifying one of its parts In all caseswe trained from scratch a corresponding variant of VOneResNet50 with the modified VOneBlockAll other properties of the model and components of the VOneBlock were left unmodified Herewe describe in more detail each of these variants (the name of the variant refers to the componentremoved)

bull V1 distributions the GFB parameters were sampled from uniform distributions with thesame domain as the empirical distributions used in the default VOneBlock (Table C1)bull Low SF when sampling the spatial frequencies for the GFB neurons with low peak spatial

frequencies (f lt 2cpd) were removed from the empirical distribution [79]bull High SF similar to the previous variant but removing neurons with high peak spatial

frequencies (f gt 2cpd)bull Complex NL complex cell channels were removed from the VOneBlock and replaced by

simple cell channelsbull Simple NL simple cell channels were removed from the VOneBlock and replaced by

complex cell channelsbull V1 stochasticity V1 stochasticity generator was removed from the VOneBlock

While all of the variants resulted in worse overall mean performance than the unmodified VOneRes-Net50 (Table 2) some improved specific benchmarks the variant without high SF had higher accuracyunder noise and blur corruptions (Table D1) the variant without V1 stochasticity had higher accuracyto clean images and images with noise and weather corruptions (Tables 2 and D1) and the variantwithout V1 distributions had higher accuracy to white box attacks with high Linfin perturbations (TableD2)

Table D1 Difference between the common image corruption accuracies of the severalVOneResNet50 variants and the unmodified VOneResNet50 Removal of simple cells reducedaccuracy to all common image corruption types Removal of some components of the VOneBlockimproved accuracy in specific common image corruption removing high SF consistently improvedperformance for noise and blur corruptions and removing V1 stochasticity improved performancefor noise and weather corruptions

Noise Blur

Component Gaussian Shot Impulse Defocus Glass Motion Zoomremoved [∆] [∆] [∆] [∆] [∆] [∆] [∆]

V1 dist -28plusmn09 -24plusmn07 -29plusmn11 -16plusmn01 04plusmn02 -08plusmn03 -02plusmn05

Low SF -14plusmn05 -10plusmn02 -15plusmn05 -06plusmn06 -11plusmn02 -12plusmn04 -05plusmn07

High SF 68plusmn13 73plusmn11 70plusmn13 56plusmn03 115plusmn03 37plusmn00 61plusmn01

Complex 10plusmn16 12plusmn13 07plusmn18 12plusmn02 -15plusmn01 02plusmn08 04plusmn11

Simple -61plusmn16 -54plusmn15 -64plusmn19 -41plusmn04 -42plusmn04 -49plusmn05 -54plusmn03

V1 stoch 22plusmn25 22plusmn25 21plusmn29 33plusmn01 -55plusmn01 09plusmn07 09plusmn01

Weather Digital

Component Snow Frost Fog Bright Contrast Elastic Pixelate JPEGremoved [∆] [∆] [∆] [∆] [∆] [∆] [∆] [∆]

V1 dist -02plusmn06 -09plusmn06 -19plusmn11 -08plusmn05 -14plusmn09 03plusmn04 -17plusmn03 08plusmn05

Low SF -04plusmn03 -06plusmn04 09plusmn03 -02plusmn02 -04plusmn05 -09plusmn05 -13plusmn02 -05plusmn04

High SF 14plusmn10 -07plusmn07 -95plusmn04 -52plusmn04 -44plusmn05 24plusmn02 19plusmn04 -46plusmn04

Complex 07plusmn04 04plusmn03 43plusmn02 13plusmn02 29plusmn02 -05plusmn02 -12plusmn00 -19plusmn05

Simple -50plusmn05 -90plusmn05 -44plusmn05 -39plusmn02 -48plusmn04 -39plusmn05 -35plusmn04 -18plusmn02

V1 stoch 25plusmn04 03plusmn03 120plusmn05 26plusmn03 100plusmn08 -28plusmn03 -40plusmn07 -40plusmn04

29

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

Table D2 Difference between the white box attack accuracies of the several VOneResNet50variants and the unmodified VOneResNet50 Removal of V1 stochasticity considerably reducedaccuracy to all white box attacks Removal of V1 distributions improved accuracy for the Linfin whitebox attack with high perturbation strength

PGD-L1 PGD-L2 PGD-LinfinComponent 40 160 015 060 11020 1255removed [∆] [∆] [∆] [∆] [∆] [∆]

V1 dist -05plusmn08 01plusmn11 -08plusmn02 06plusmn10 03plusmn03 30plusmn18Low SF -08plusmn12 -31plusmn11 -06plusmn10 -23plusmn08 -08plusmn13 -18plusmn10High SF -22plusmn09 -47plusmn26 -26plusmn15 -51plusmn12 -26plusmn14 -55plusmn11Complex -13plusmn09 -74plusmn16 -11plusmn09 -72plusmn12 -19plusmn10 -79plusmn12Simple -25plusmn08 -14plusmn14 -29plusmn11 -17plusmn12 -27plusmn12 -06plusmn08Noise -237plusmn09 -365plusmn08 -216plusmn05 -401plusmn04 -322plusmn04 -292plusmn04

30

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

  • Introduction
    • Related Work
      • Adversarial Robustness Correlates with V1 Explained Variance
      • VOneNet a Hybrid CNN with a V1 Neural Network Front-End
      • Results
        • Simulating a V1 at the front of CNNs improves robustness to white box attacks
        • VOneNets outperform state-of-the-art methods on a composite set of perturbations
        • All components of the VOneBlock contribute to improved model robustness
          • Discussion
          • Neural Explained Variance
          • Image Perturbations
            • White Box Adversarial Attacks
            • Additional attack controls
            • Common Image Corruptions
            • Other defense methods
              • VOneNet implementation details
                • Convolutional layer
                • Nonlinear layer
                • Neuronal stochasticity generator
                • Hyper-parameters and other design choices
                • Training details
                  • VOneBlock variant details
Page 20: bioRxiv - Simulating a Primary Visual Cortex at the Front of CNNs … · 2020. 6. 16. · 8MIT-IBM Watson AI Lab dapello@mit.edu tmarques@mit.edu Abstract Current state-of-the-art

B2 Additional attack controls

The field of adversarial robustness is littered with examples of proposed defense methods that werequickly defeated with some minor attack optimizations or adaptations [68 84 97] For this reasonwe took great care to verify our claims of improved robustness of the VOneNets In addition to theimplementation details and controls described in the previous section we validated our results byfurther optimizing our attacks on both the standard stochastic VOneResNet50 and the VOneResNet50with stochasticity removed during the attack and inference

First on the standard stochastic VOneResNet50 we ensured that our attack hyper-parametersincluding step-size (α) number of gradient samples (k) and PGD iterations are sufficient forevaluating the networkrsquos robustness For PGD with δinfin = 1255 with 64 iterations we performeda large grid search over α and k observing only a marginal increase in the attack effectiveness (FigB4) For the most effective step size (α = ε8) increasing the number of gradient samples beyondk = 16 no longer decreases accuracy under the attack At k = 128 VOneResNet50 accuracy isonly reduced from 2915 to 260 remaining a large margin above ResNet50 accuracy of 08Additionally increasing the number of PGD iterations all the way up to 1000 (with α = ε8 andk = 16) only further reduced performance by 05 Finally using 5-10 random restarts did notimprove the attack beyond the most effective k and α settings

Reported VOneResNet50

Reported ResNet50

Whi

te b

ox a

ccur

acy

gradient samples

n = 3 seeds

||δ||infin = 1255

ε 2ε 4ε 8ε 16ε 32

Step size

001 8 32 64 128

01

02

03

04

Figure B4 Further attack optimization does not overturn results White box accuracy for PGDwith δinfin = 1255 64 iterations on 500 images Large grid search over step size and number ofgradient samples only marginally improves attack effectiveness Further attack optimization onlymarginally reduces model performance Lines represent mean and shaded areas SD (n=3 seeds)

Second while boundary attacks are not ideal against the standard stochastic VOneResNet50 due to theshifting boundary line we investigated VOneResNet50 without stochasticity during attackinferenceusing the Brendel and Bethge (BampB) attack [10] a powerful white box adversary that aims to findminimal adversarial perturbations along the decision boundary of the network We reasoned that ifthe BampB attack offers similar results to those observed using PGD this is a good sanity check thatour observed robustness is reliable and not a result of masked gradients around the input or a quirk ofPGD Using the FoolBox [98] implementation of the Linfin BampB attack initialized with 500 successfuladversarial images generated using PGD δinfin = 8255 we see nearly identical scores as comparedto standard PGD with optimal hyper-parameters δinfin = 11020 and PGD δinfin = 1255

We conclude that these additional controls do not qualitatively change our reported results and providefurther evidence that the observed improved robustness of the VOneNets is real and not an artifact

20

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

B3 Common Image Corruptions

For evaluating model robustness to common image corruptions we used ImageNet-C a datasetpublicly available at httpsgithubcomhendrycksrobustness [13] It consists of 15 dif-ferent types of corruptions each at five levels of severity for a total of 75 different perturbations (FigB5) The corruption types fall into four categories noise blur weather and digital effects Noiseincludes Gaussian noise shot noise and impulse noise blur includes motion blur defocus blur zoomblur and glass blur weather includes snow frost fog and brightness digital effects includes JPEGcompression elastic transformations contrast and pixelation For every corruption type and at everyseverity we tested on all 50000 images from the ImageNet validation set We note that these imageshave been JPEG compressed by the original authors for sharing which is known to have minor butnoticeable effect on final network performance

Original Gaussian noise Shot noise Impulse noise

Defocus blur Glass blur Motion blur Zoom blur

Snow Frost Fog Brightness

Contrast Elastic transform Pixelate JPEG compression

Figure B5 Common image corruptions with intermediate severity Visualization of all 15common image corruption types evaluated at severity = 3 First row original image followed by thenoise corruptions second row blur corruptions third row weather corruptions fourth row digitalcorruptions

21

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

1 2 3 4 5Severity

000

025

050

075C

orru

ptui

ons a

ccur

acy

Noise

1 2 3 4 5Severity

Blur

1 2 3 4 5Severity

Weather

1 2 3 4 5Severity

Digital

ResNet50VOneResNet50

ANT3times3+SINATLinfin

Figure B6 VOneResNet50 improves robustness to noise blur and digital corruptions Corrup-tion accuracy severity curves for the four categories of common image corruptions noise blurweather and digital Adding the VOneBlock to ResNet50 improves robustness to all corruptioncategories except weather The adversarially trained model (ATLinfin ) is considerably worse at commonimage corruptions Adversarial noise combined with Stylized ImageNet training (ANT3times3+SIN) isconsistently more robust to all corruption categories For ResNet50 and VOneResNet50 error-barsrepresent SD (n=3 seeds)

Table B2 Detailed common image corruption results Adding the VOneBlock to ResNet50improves robustness to several of the corruption types However for others particularly fog andcontrast robustness decreases Interestingly these are the corruption types that the adversariallytrained model (ATLinfin) also does the worst Adversarial noise combined with Stylized ImageNettraining (ANT3times3+SIN) is the most robust model for most of the corruption types For ResNet50 andVOneResNet50 values are mean and SD (n=3 seeds)

Noise Blur

Gaussian Shot Impulse Defocus Glass Motion ZoomModel [] [] [] [] [] [] []

ResNet50 [4] 296plusmn11 279plusmn09 245plusmn12 388plusmn07 261plusmn06 379plusmn04 345plusmn07

ATLinfin [76] 236 223 148 244 343 332 349ANT3times3+SIN [59] 616 607 599 415 386 460 373VOneResNet50 346plusmn14 334plusmn12 319plusmn15 378plusmn01 357plusmn01 374plusmn02 340plusmn02

Weather Digital

Snow Frost Fog Bright Contrast Elastic Pixelate JPEGModel [] [] [] [] [] [] [] []

ResNet50 301plusmn05 367plusmn04 436plusmn08 668plusmn02 388plusmn05 448plusmn03 470plusmn27 551plusmn10

ATLinfin 295 312 84 541 102 484 563 587ANT3times3+SIN 477 522 549 682 493 527 589 593VOneResNet50 252plusmn05 368plusmn04 301plusmn05 624plusmn03 285plusmn05 487plusmn01 633plusmn02 610plusmn03

22

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

B4 Other defense methods

We compared the performance of VOneResNet50 in the described computer vision benchmarkswith two training-based defense methods using the ResNet50 architecture adversarial training witha δinfin = 4255 constraint (ATLinfin) [76] and adversarial noise with Stylized ImageNet training(ANT3times3+SIN) [59]

The ATLinfin model was downloaded from the publicly available adversarially trained models at httpsgithubcomMadryLabrobustness [76] This model was trained following the adversarialtraining paradigm of Madry et al [17] to solve the min-max problem

minθρ(θ) where ρ(θ) = E(xy)simD

[maxδisinS

L(θ x+ δ y)

](1)

Here the goal is to learn parameters θ minimizing the loss L for training images x and labels y drawnfrom D while perturbing x with δ isin S to maximally increase the loss In this case D is ImageNetand PGD with δinfin = 4255 is used to find the perturbation maximizing the loss for a given (x y)pair The δinfin = 4255 constraint model was selected to compare against because it had the bestperformance on our conglomerate benchmark of adversarial attacks and common image corruptions

The ANT3times3+SIN model was obtained from httpsgithubcombethgelabgame-of-noiseRecently Rusak et al showed that training a ResNet50 model with several types of additive inputnoise improved the robustness to common image corruptions [59] Inspired by this observation theauthors decided to train a model while optimizing a noise distribution that maximally confuses itSimilarly to standard adversarial training this results in solving a min-max problem

minθ

maxφ

E(xy)simDEδsimpφ(δ) [L(θ x+ δ y)] (2)

where pφ(δ) is the maximally confusing noise distribution The main difference to regular adversarialtraining in Equation 1 is that while in the former δ is optimized directly here δ is found by optimizinga constrained distribution with local spatial correlations Complementing the adversarial noise trainingwith Stylized ImageNet training [25] produced the model with the current best robustness in theImageNet-C benchmark as a standalone method

23

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

C VOneNet implementation details

C1 Convolutional layer

The convolutional layer of the VOneBlock is a mathematically parameterized Gabor Filter Bank(GFB) We set the stride of the GFB to be four originating a 56times56 spatial map of activations Sincethe number of channels in most CNNsrsquo first convolution is relatively small (64 in the architecturesadapted) we used a larger number in the VOneBlock so that the Gabors would cover the largeparameter space and better approximate primate V1 We set the main VOneNet models to contain512 channels equally split between simple and complex cells (see Fig C2 A for an analysis of howthe number of channels affects performance) Each channel in the GFB convolves a single colorchannel from the input image

The Gabor function consists of a two-dimensional grating with a Gaussian envelope and is describedby the following equation

Gθfφnxny (x y) =1

2πσxσyexp

[minus05(x2rotσ

2x + y2rotσ

2y)]

cos (2πf + φ) (3)

where

xrot = x cos(θ) + y sin(θ)

yrot = minusx sin(θ) + y cos(θ)(4)

σx =nxf

σy =nyf

(5)

xrot and yrot are the orthogonal and parallel orientations relative to the grating θ is the angle of thegrating orientation f is the spatial frequency of the grating φ is the phase of the grating relative to theGaussian envelope and σx and σy are the standard deviations of the Gaussian envelope orthogonaland parallel to the grating which can be defined as multiples (nx and ny) of the grating cycle (inverseof the frequency)

Although the Gabor filter greatly reduces the number of parameters defining the linear spatialcomponent of V1 neurons it still has five parameters per channel Fortunately there is a vastliterature in neurophysiology with detailed characterizations of primate V1 response properties whichcan be used to constrain these parameters To instantiate a VOneBlock with CV 1 channels wesampled CV 1 values for each of the parameters according to an empirically constrained distribution(Table C1) Due to the resolution of the input images we limited the ranges of spatial frequencies(f lt 56cpd) and number of cycles (n gt 01)

Table C1 Empirical distributions used to sample the GFB parameters in the VOneBlock

Parameter Reference

θ De Valois et at [78] Fig 5 Orientation tuning (foveal)f De Valois et at [79] Fig 6 Peak spatial freq (X cells foveal)φ - Uniform distribution isin [0 360] degnx ny Ringach 2002 [80] Fig 4 (nx ny) joint distribution

Critical to the correct implementation of the biologically-constrained parameters of the GFB is thechoice of the modelrsquos field of view in degrees As previously mentioned in Section A we used 8degas the input spatial extent for all CNN models Itrsquos important to note that the set of spatial filters inthe VOneBlockrsquos GFB differs considerably from those in the first convolution in most CNNs (FigC1) While standard CNNs learn filters that resemble Gabors in their input layer [1 99 100] due thelimited span of their kernels they do not vary significantly in size and spatial frequency V1 neuronson the other hand are known to exhibit a wide range of receptive field properties This phenomena iscaptured in the VOneBlock with the spatial frequencies and receptive field sizes of Gabors rangingmore than one order of magnitude Due to this high variability we set the convolution kernel to be25times 25px which is considerably larger than those in standard CNNs (Fig C1)

24

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

VOneBlock ResNet50 CORnet-S AlexNet

7 px7 px 7 px7 px

Figure C1 Comparison between filters in the first convolution of standard CNNs and theVOneBlock Example filters from the first convolution in the VOneBlock ResNet50 CORnet-S andAlexNet (from left to right) VOneBlock filters are all parameterized as Gabors varying considerablyin size and spatial frequency Standard CNNs have some filters with shapes other than Gaborsparticularly center-surround and are limited in size by their small kernel Kernel sizes are 25px 7px7px and 11px for VOneBlock ResNet50 CORnet-S and AlexNet respectively

25

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

C2 Nonlinear layer

VOneBlockrsquos nonlinear layer has two different nonlinearities that are applied to each channel depend-ing on its cell type a rectified linear transformation for simple cells (6) and the spectral power of aquadrature phase-pair (7) for complex cells

Snlθfφnxny =

Slθfφnxny if Slθfφnxny gt 0

0 otherwise(6)

Cnlθfφnxny =1radic2

radic(Clθfφnxny )2 + (Clθfφ+π2nxny )2 (7)

where Sl and Snl are the linear and nonlinear responses of a simple neuron and Cl and Cnl are thesame for a complex neuron

C3 Neuronal stochasticity generator

In awake monkeys spike trains of V1 neurons are approximately Poisson ie the variance and meanof spike counts in a given time-window over a set of repetitions are roughly the same [82] Weincorporated stochasticity into the VOneBlock to emulate this property of neuronal responses Sincethe Poisson distribution is not continuous it breaks the gradients in a white box attack giving a falsesense of robustness [68] In order to avoid this situation and facilitate the evaluation of the modelrsquosreal robustness our neuronal stochasticity generator as implemented uses a continuous second-orderapproximation of Poisson noise by adding Gaussian noise with variance equal to the activation

Rs sim N (micro = Rns σ2 = Rns) (8)

where Rns and Rs are the non-stochastic and stochastic responses of a neuron

To approximate the same levels of neuronal stochasticity of primate V1 neurons it is critical that theVOneBlock activations are on the same range as the V1 neuronal responses (number of spike countsin a given time-window) Thus we applied an affine transformation to the activations so that both themean stimulus response and the mean baseline activity are the same as those of a population of V1neurons (Table C2 shows the mean responses and spontaneous activity of V1 neurons measured indifferent time-windows)

Table C2 Mean stimulus responses and spontaneous activity of a population of V1 neuronsfor different time-windows

Time-window Stimulus response Spontaneous activity[ms] [spikes] [spikes]

25 0324 007350 0655 0145

100 105 029200 138 058

Since the exact time-window that responses in V1 are integrated during visual perception is still anopen question we considered a time-window of 25ms (Fig C2 B) In order to keep the outputs ofthe VOneBlock on a range that does not deviate considerably from the typical range of activations inCNNs so that the model can be efficiently trained using standard training parameters we applied theinverse transformation to scale the outputs back to their original range after the stochastic layer

C4 Hyper-parameters and other design choices

We developed VOneNets from three different standard CNN architectures ResNet50 CORnet-S andAlexNet As previously mentioned we replaced the first block of each architecture by the VOneBlockand the transition layer to create the respective VOneNet Table C3 contains the layers that wereremoved for each base architecture Except CORnet-S the layers removed only contained a single

26

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

convolution nonlinearity and maxpool Since CORnet-S already had a pre-committed set of layersto V1 we replaced them by the VOneBlock The torchvision implementation of AlexNet containsa combined stride of eight in the removed layers (four in the convolution and two in the maxpool)followed by a stride of one in the second convolution (outside of the removed layers) In order to moreeasily adapt it to a VOneBlock with a stride of four we slightly adapted AlexNetrsquos architecture so thatit had strides of two in these three layers (first convolution first maxpool and second convolution)The results shown in Fig 2 were obtained using this modified architecture

Table C3 Layers removed from each base CNN architecture for creating the VOneNets

Architecture Layers removed

AlexNet ConvReLUMaxPoolResNet50 ConvBatch-NormReLUMaxPoolCORnet-S ConvBatch-NormReLUMaxPoolConvBatch-NormReLU

The VOneNet architecture was designed to have as few hyper-parameters as possible When possiblethese were either constrained by the base CNN architecture or by neurobiological data mostparameters of the GFB were instantiated by sampling from neuronal distributions of primate V1neurons the kernel size was set to 25times25px to capture the high variability in Gabor sizes like allother CNN models the spatial extent of the field of view was set to 8deg (Section A) Neverthelessthere were two hyper-parameters where the choice was rather arbitrary the number of channels inthe VOneBlock and the time-window for integrating V1 responses to scale the activations prior tothe stochasticity generator

Since primate V1 has neurons tuned to a wide range of spatial frequencies for each orientation and at any given location of the visual space [101], we expected that a large number of channels would be required to cover all of the combinations of Gabor parameters. For the VOneResNet50 architecture, we varied the number of channels in the VOneBlock between 64 and 1024, equally split between simple and complex cells, and measured the performance of the several models after training (Fig C2 A). Clean ImageNet and common image corruption performance improved with channel number. Remarkably, the opposite happened for white box accuracy, with the 64- and 128-channel variants of VOneResNet50 achieving the highest robustness to white box attacks. This result is particularly interesting since it had been shown that increasing the number of channels in all layers of a CNN improves robustness to white box attacks in both standard and adversarially trained models [17]. Therefore, the improvement in white box robustness of VOneNets over their base models cannot be attributed to the higher number of channels. For the main model, we set the channel number to 512, as it offered a good compromise between the different benchmarks. Regarding the time-window for integrating V1 responses, we observed only small effects on the models' performance when varying it between 25 and 200 ms. We chose 25 ms for the main model due to a small trend toward higher robustness to white box attacks with shorter time-windows.

[Figure C2: two-panel plot of accuracy (0.4 to 0.7) vs. number of VOneBlock channels (64, 128, 256, 512, 1024; panel A, 25 ms time-window) and vs. activity time-window (25, 50, 100, 200 ms; panel B, 512 channels), with curves for Clean, Perturbation, White box, and Corruption accuracy.]

Figure C2: Exploration of VOneBlock's hyper-parameters in the VOneResNet50 architecture. A: Performance of different VOneResNet50 models with varying number of channels in the VOneBlock. B: Same as in A, but varying the time-window for integrating neuronal activity. Error-bars show SD (n=3 seeds). Red arrows represent the values used in the main model.


C5 Training details

We used PyTorch 0.4.1 and trained the models on ImageNet 2012 [34]. Images were preprocessed as follows: (1) for training, random crop to 224×224 pixels and random left-right flipping; (2) for validation, central crop to 224×224 pixels. Preprocessing was followed by normalization, subtracting and dividing by [0.5, 0.5, 0.5]. We used a batch size of 256 images and trained either on 2 GPUs (NVIDIA Titan X/GeForce 1080Ti) or 1 GPU (Quadro RTX6000 or V100) for 70 epochs. We used step learning rate scheduling: a starting learning rate of 0.1, divided by 10 every 20 epochs. For optimization, we used Stochastic Gradient Descent with a weight decay of 0.0001, momentum of 0.9, and a cross-entropy loss between image labels and model predictions (logits).
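These settings translate directly into standard PyTorch calls; the following sketch is ours (any resizing applied before the crops is omitted), with the released repository as the reference.

```python
import torch.nn as nn
import torch.optim as optim
from torchvision import transforms

# Preprocessing as described above.
train_tf = transforms.Compose([
    transforms.RandomCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])
val_tf = transforms.Compose([
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])

def make_training_objects(model: nn.Module):
    # SGD with momentum 0.9 and weight decay 1e-4; step LR schedule that
    # divides the learning rate by 10 every 20 epochs (70 epochs total).
    optimizer = optim.SGD(model.parameters(), lr=0.1,
                          momentum=0.9, weight_decay=1e-4)
    scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)
    criterion = nn.CrossEntropyLoss()  # cross-entropy on the logits
    return optimizer, scheduler, criterion
```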

Model weights and code are available at https://github.com/dicarlolab/vonenet.


D VOneBlock variant details

To investigate which components of the VOneBlock are responsible for the increased robustness, we created six variants of the VOneBlock by removing or modifying one of its parts. In all cases, we trained from scratch a corresponding variant of VOneResNet50 with the modified VOneBlock; all other properties of the model and components of the VOneBlock were left unmodified. Here we describe each of these variants in more detail (the name of the variant refers to the component removed):

• V1 distributions: the GFB parameters were sampled from uniform distributions with the same domain as the empirical distributions used in the default VOneBlock (Table C1); see the sampling sketch after this list.
• Low SF: when sampling the spatial frequencies for the GFB, neurons with low peak spatial frequencies (f < 2 cpd) were removed from the empirical distribution [79].
• High SF: similar to the previous variant, but removing neurons with high peak spatial frequencies (f > 2 cpd).
• Complex NL: complex cell channels were removed from the VOneBlock and replaced by simple cell channels.
• Simple NL: simple cell channels were removed from the VOneBlock and replaced by complex cell channels.
• V1 stochasticity: the V1 stochasticity generator was removed from the VOneBlock.
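A minimal sketch of default vs. uniform sampling for one Gabor parameter; the helper and the toy measurement values are hypothetical, with the empirical distributions being those referenced in Table C1.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_param(empirical_values, n, use_v1_distribution=True):
    """Sample n values for one Gabor parameter, either by bootstrapping an
    empirical V1 distribution (default VOneBlock) or, for the 'V1
    distributions' variant, uniformly over the same domain."""
    values = np.asarray(empirical_values, dtype=float)
    if use_v1_distribution:
        return rng.choice(values, size=n, replace=True)
    return rng.uniform(values.min(), values.max(), size=n)

# Toy example: peak spatial frequencies (cpd) for a 16-channel block.
peak_sf = sample_param([0.5, 0.8, 1.2, 2.2, 3.5, 5.0], n=16)
uniform_sf = sample_param([0.5, 0.8, 1.2, 2.2, 3.5, 5.0], n=16,
                          use_v1_distribution=False)
```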

While all of the variants resulted in worse overall mean performance than the unmodified VOneResNet50 (Table 2), some improved on specific benchmarks: the variant without high SF had higher accuracy under noise and blur corruptions (Table D1); the variant without V1 stochasticity had higher accuracy on clean images and on images with noise and weather corruptions (Tables 2 and D1); and the variant without V1 distributions had higher accuracy under white box attacks with high L∞ perturbations (Table D2).

Table D1: Difference between the common image corruption accuracies of the several VOneResNet50 variants and the unmodified VOneResNet50. Removal of simple cells reduced accuracy for all common image corruption types. Removal of some components of the VOneBlock improved accuracy for specific corruptions: removing high SF consistently improved performance for noise and blur corruptions, and removing V1 stochasticity improved performance for noise and weather corruptions.

Noise and Blur

Component removed   Gaussian [Δ]   Shot [Δ]    Impulse [Δ]   Defocus [Δ]   Glass [Δ]   Motion [Δ]   Zoom [Δ]
V1 dist             -2.8±0.9       -2.4±0.7    -2.9±1.1      -1.6±0.1      0.4±0.2     -0.8±0.3     -0.2±0.5
Low SF              -1.4±0.5       -1.0±0.2    -1.5±0.5      -0.6±0.6      -1.1±0.2    -1.2±0.4     -0.5±0.7
High SF             6.8±1.3        7.3±1.1     7.0±1.3       5.6±0.3       11.5±0.3    3.7±0.0      6.1±0.1
Complex             1.0±1.6        1.2±1.3     0.7±1.8       1.2±0.2       -1.5±0.1    0.2±0.8      0.4±1.1
Simple              -6.1±1.6       -5.4±1.5    -6.4±1.9      -4.1±0.4      -4.2±0.4    -4.9±0.5     -5.4±0.3
V1 stoch            2.2±2.5        2.2±2.5     2.1±2.9       3.3±0.1       -5.5±0.1    0.9±0.7      0.9±0.1

Weather and Digital

Component removed   Snow [Δ]    Frost [Δ]   Fog [Δ]     Bright [Δ]   Contrast [Δ]   Elastic [Δ]   Pixelate [Δ]   JPEG [Δ]
V1 dist             -0.2±0.6    -0.9±0.6    -1.9±1.1    -0.8±0.5     -1.4±0.9       0.3±0.4       -1.7±0.3       0.8±0.5
Low SF              -0.4±0.3    -0.6±0.4    0.9±0.3     -0.2±0.2     -0.4±0.5       -0.9±0.5      -1.3±0.2       -0.5±0.4
High SF             1.4±1.0     -0.7±0.7    -9.5±0.4    -5.2±0.4     -4.4±0.5       2.4±0.2       1.9±0.4        -4.6±0.4
Complex             0.7±0.4     0.4±0.3     4.3±0.2     1.3±0.2      2.9±0.2        -0.5±0.2      -1.2±0.0       -1.9±0.5
Simple              -5.0±0.5    -9.0±0.5    -4.4±0.5    -3.9±0.2     -4.8±0.4       -3.9±0.5      -3.5±0.4       -1.8±0.2
V1 stoch            2.5±0.4     0.3±0.3     12.0±0.5    2.6±0.3      10.0±0.8       -2.8±0.3      -4.0±0.7       -4.0±0.4


Table D2: Difference between the white box attack accuracies of the several VOneResNet50 variants and the unmodified VOneResNet50. Removal of V1 stochasticity considerably reduced accuracy under all white box attacks. Removal of V1 distributions improved accuracy for the L∞ white box attack with high perturbation strength.

Component removed   PGD-L1 (40) [Δ]   PGD-L1 (160) [Δ]   PGD-L2 (0.15) [Δ]   PGD-L2 (0.60) [Δ]   PGD-L∞ (1/1020) [Δ]   PGD-L∞ (1/255) [Δ]
V1 dist             -0.5±0.8          0.1±1.1            -0.8±0.2            0.6±1.0             0.3±0.3               3.0±1.8
Low SF              -0.8±1.2          -3.1±1.1           -0.6±1.0            -2.3±0.8            -0.8±1.3              -1.8±1.0
High SF             -2.2±0.9          -4.7±2.6           -2.6±1.5            -5.1±1.2            -2.6±1.4              -5.5±1.1
Complex             -1.3±0.9          -7.4±1.6           -1.1±0.9            -7.2±1.2            -1.9±1.0              -7.9±1.2
Simple              -2.5±0.8          -1.4±1.4           -2.9±1.1            -1.7±1.2            -2.7±1.2              -0.6±0.8
Noise               -23.7±0.9         -36.5±0.8          -21.6±0.5           -40.1±0.4           -32.2±0.4             -29.2±0.4

30

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

  • Introduction
    • Related Work
      • Adversarial Robustness Correlates with V1 Explained Variance
      • VOneNet a Hybrid CNN with a V1 Neural Network Front-End
      • Results
        • Simulating a V1 at the front of CNNs improves robustness to white box attacks
        • VOneNets outperform state-of-the-art methods on a composite set of perturbations
        • All components of the VOneBlock contribute to improved model robustness
          • Discussion
          • Neural Explained Variance
          • Image Perturbations
            • White Box Adversarial Attacks
            • Additional attack controls
            • Common Image Corruptions
            • Other defense methods
              • VOneNet implementation details
                • Convolutional layer
                • Nonlinear layer
                • Neuronal stochasticity generator
                • Hyper-parameters and other design choices
                • Training details
                  • VOneBlock variant details
Page 21: bioRxiv - Simulating a Primary Visual Cortex at the Front of CNNs … · 2020. 6. 16. · 8MIT-IBM Watson AI Lab dapello@mit.edu tmarques@mit.edu Abstract Current state-of-the-art

B3 Common Image Corruptions

For evaluating model robustness to common image corruptions we used ImageNet-C a datasetpublicly available at httpsgithubcomhendrycksrobustness [13] It consists of 15 dif-ferent types of corruptions each at five levels of severity for a total of 75 different perturbations (FigB5) The corruption types fall into four categories noise blur weather and digital effects Noiseincludes Gaussian noise shot noise and impulse noise blur includes motion blur defocus blur zoomblur and glass blur weather includes snow frost fog and brightness digital effects includes JPEGcompression elastic transformations contrast and pixelation For every corruption type and at everyseverity we tested on all 50000 images from the ImageNet validation set We note that these imageshave been JPEG compressed by the original authors for sharing which is known to have minor butnoticeable effect on final network performance

Original Gaussian noise Shot noise Impulse noise

Defocus blur Glass blur Motion blur Zoom blur

Snow Frost Fog Brightness

Contrast Elastic transform Pixelate JPEG compression

Figure B5 Common image corruptions with intermediate severity Visualization of all 15common image corruption types evaluated at severity = 3 First row original image followed by thenoise corruptions second row blur corruptions third row weather corruptions fourth row digitalcorruptions

21

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

1 2 3 4 5Severity

000

025

050

075C

orru

ptui

ons a

ccur

acy

Noise

1 2 3 4 5Severity

Blur

1 2 3 4 5Severity

Weather

1 2 3 4 5Severity

Digital

ResNet50VOneResNet50

ANT3times3+SINATLinfin

Figure B6 VOneResNet50 improves robustness to noise blur and digital corruptions Corrup-tion accuracy severity curves for the four categories of common image corruptions noise blurweather and digital Adding the VOneBlock to ResNet50 improves robustness to all corruptioncategories except weather The adversarially trained model (ATLinfin ) is considerably worse at commonimage corruptions Adversarial noise combined with Stylized ImageNet training (ANT3times3+SIN) isconsistently more robust to all corruption categories For ResNet50 and VOneResNet50 error-barsrepresent SD (n=3 seeds)

Table B2 Detailed common image corruption results Adding the VOneBlock to ResNet50improves robustness to several of the corruption types However for others particularly fog andcontrast robustness decreases Interestingly these are the corruption types that the adversariallytrained model (ATLinfin) also does the worst Adversarial noise combined with Stylized ImageNettraining (ANT3times3+SIN) is the most robust model for most of the corruption types For ResNet50 andVOneResNet50 values are mean and SD (n=3 seeds)

Noise Blur

Gaussian Shot Impulse Defocus Glass Motion ZoomModel [] [] [] [] [] [] []

ResNet50 [4] 296plusmn11 279plusmn09 245plusmn12 388plusmn07 261plusmn06 379plusmn04 345plusmn07

ATLinfin [76] 236 223 148 244 343 332 349ANT3times3+SIN [59] 616 607 599 415 386 460 373VOneResNet50 346plusmn14 334plusmn12 319plusmn15 378plusmn01 357plusmn01 374plusmn02 340plusmn02

Weather Digital

Snow Frost Fog Bright Contrast Elastic Pixelate JPEGModel [] [] [] [] [] [] [] []

ResNet50 301plusmn05 367plusmn04 436plusmn08 668plusmn02 388plusmn05 448plusmn03 470plusmn27 551plusmn10

ATLinfin 295 312 84 541 102 484 563 587ANT3times3+SIN 477 522 549 682 493 527 589 593VOneResNet50 252plusmn05 368plusmn04 301plusmn05 624plusmn03 285plusmn05 487plusmn01 633plusmn02 610plusmn03

22

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

B4 Other defense methods

We compared the performance of VOneResNet50 in the described computer vision benchmarkswith two training-based defense methods using the ResNet50 architecture adversarial training witha δinfin = 4255 constraint (ATLinfin) [76] and adversarial noise with Stylized ImageNet training(ANT3times3+SIN) [59]

The ATLinfin model was downloaded from the publicly available adversarially trained models at httpsgithubcomMadryLabrobustness [76] This model was trained following the adversarialtraining paradigm of Madry et al [17] to solve the min-max problem

minθρ(θ) where ρ(θ) = E(xy)simD

[maxδisinS

L(θ x+ δ y)

](1)

Here the goal is to learn parameters θ minimizing the loss L for training images x and labels y drawnfrom D while perturbing x with δ isin S to maximally increase the loss In this case D is ImageNetand PGD with δinfin = 4255 is used to find the perturbation maximizing the loss for a given (x y)pair The δinfin = 4255 constraint model was selected to compare against because it had the bestperformance on our conglomerate benchmark of adversarial attacks and common image corruptions

The ANT3times3+SIN model was obtained from httpsgithubcombethgelabgame-of-noiseRecently Rusak et al showed that training a ResNet50 model with several types of additive inputnoise improved the robustness to common image corruptions [59] Inspired by this observation theauthors decided to train a model while optimizing a noise distribution that maximally confuses itSimilarly to standard adversarial training this results in solving a min-max problem

minθ

maxφ

E(xy)simDEδsimpφ(δ) [L(θ x+ δ y)] (2)

where pφ(δ) is the maximally confusing noise distribution The main difference to regular adversarialtraining in Equation 1 is that while in the former δ is optimized directly here δ is found by optimizinga constrained distribution with local spatial correlations Complementing the adversarial noise trainingwith Stylized ImageNet training [25] produced the model with the current best robustness in theImageNet-C benchmark as a standalone method

23

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

C VOneNet implementation details

C1 Convolutional layer

The convolutional layer of the VOneBlock is a mathematically parameterized Gabor Filter Bank(GFB) We set the stride of the GFB to be four originating a 56times56 spatial map of activations Sincethe number of channels in most CNNsrsquo first convolution is relatively small (64 in the architecturesadapted) we used a larger number in the VOneBlock so that the Gabors would cover the largeparameter space and better approximate primate V1 We set the main VOneNet models to contain512 channels equally split between simple and complex cells (see Fig C2 A for an analysis of howthe number of channels affects performance) Each channel in the GFB convolves a single colorchannel from the input image

The Gabor function consists of a two-dimensional grating with a Gaussian envelope and is describedby the following equation

Gθfφnxny (x y) =1

2πσxσyexp

[minus05(x2rotσ

2x + y2rotσ

2y)]

cos (2πf + φ) (3)

where

xrot = x cos(θ) + y sin(θ)

yrot = minusx sin(θ) + y cos(θ)(4)

σx =nxf

σy =nyf

(5)

xrot and yrot are the orthogonal and parallel orientations relative to the grating θ is the angle of thegrating orientation f is the spatial frequency of the grating φ is the phase of the grating relative to theGaussian envelope and σx and σy are the standard deviations of the Gaussian envelope orthogonaland parallel to the grating which can be defined as multiples (nx and ny) of the grating cycle (inverseof the frequency)

Although the Gabor filter greatly reduces the number of parameters defining the linear spatialcomponent of V1 neurons it still has five parameters per channel Fortunately there is a vastliterature in neurophysiology with detailed characterizations of primate V1 response properties whichcan be used to constrain these parameters To instantiate a VOneBlock with CV 1 channels wesampled CV 1 values for each of the parameters according to an empirically constrained distribution(Table C1) Due to the resolution of the input images we limited the ranges of spatial frequencies(f lt 56cpd) and number of cycles (n gt 01)

Table C1 Empirical distributions used to sample the GFB parameters in the VOneBlock

Parameter Reference

θ De Valois et at [78] Fig 5 Orientation tuning (foveal)f De Valois et at [79] Fig 6 Peak spatial freq (X cells foveal)φ - Uniform distribution isin [0 360] degnx ny Ringach 2002 [80] Fig 4 (nx ny) joint distribution

Critical to the correct implementation of the biologically-constrained parameters of the GFB is thechoice of the modelrsquos field of view in degrees As previously mentioned in Section A we used 8degas the input spatial extent for all CNN models Itrsquos important to note that the set of spatial filters inthe VOneBlockrsquos GFB differs considerably from those in the first convolution in most CNNs (FigC1) While standard CNNs learn filters that resemble Gabors in their input layer [1 99 100] due thelimited span of their kernels they do not vary significantly in size and spatial frequency V1 neuronson the other hand are known to exhibit a wide range of receptive field properties This phenomena iscaptured in the VOneBlock with the spatial frequencies and receptive field sizes of Gabors rangingmore than one order of magnitude Due to this high variability we set the convolution kernel to be25times 25px which is considerably larger than those in standard CNNs (Fig C1)

24

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

VOneBlock ResNet50 CORnet-S AlexNet

7 px7 px 7 px7 px

Figure C1 Comparison between filters in the first convolution of standard CNNs and theVOneBlock Example filters from the first convolution in the VOneBlock ResNet50 CORnet-S andAlexNet (from left to right) VOneBlock filters are all parameterized as Gabors varying considerablyin size and spatial frequency Standard CNNs have some filters with shapes other than Gaborsparticularly center-surround and are limited in size by their small kernel Kernel sizes are 25px 7px7px and 11px for VOneBlock ResNet50 CORnet-S and AlexNet respectively

25

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

C2 Nonlinear layer

VOneBlockrsquos nonlinear layer has two different nonlinearities that are applied to each channel depend-ing on its cell type a rectified linear transformation for simple cells (6) and the spectral power of aquadrature phase-pair (7) for complex cells

Snlθfφnxny =

Slθfφnxny if Slθfφnxny gt 0

0 otherwise(6)

Cnlθfφnxny =1radic2

radic(Clθfφnxny )2 + (Clθfφ+π2nxny )2 (7)

where Sl and Snl are the linear and nonlinear responses of a simple neuron and Cl and Cnl are thesame for a complex neuron

C3 Neuronal stochasticity generator

In awake monkeys spike trains of V1 neurons are approximately Poisson ie the variance and meanof spike counts in a given time-window over a set of repetitions are roughly the same [82] Weincorporated stochasticity into the VOneBlock to emulate this property of neuronal responses Sincethe Poisson distribution is not continuous it breaks the gradients in a white box attack giving a falsesense of robustness [68] In order to avoid this situation and facilitate the evaluation of the modelrsquosreal robustness our neuronal stochasticity generator as implemented uses a continuous second-orderapproximation of Poisson noise by adding Gaussian noise with variance equal to the activation

Rs sim N (micro = Rns σ2 = Rns) (8)

where Rns and Rs are the non-stochastic and stochastic responses of a neuron

To approximate the same levels of neuronal stochasticity of primate V1 neurons it is critical that theVOneBlock activations are on the same range as the V1 neuronal responses (number of spike countsin a given time-window) Thus we applied an affine transformation to the activations so that both themean stimulus response and the mean baseline activity are the same as those of a population of V1neurons (Table C2 shows the mean responses and spontaneous activity of V1 neurons measured indifferent time-windows)

Table C2 Mean stimulus responses and spontaneous activity of a population of V1 neuronsfor different time-windows

Time-window Stimulus response Spontaneous activity[ms] [spikes] [spikes]

25 0324 007350 0655 0145

100 105 029200 138 058

Since the exact time-window that responses in V1 are integrated during visual perception is still anopen question we considered a time-window of 25ms (Fig C2 B) In order to keep the outputs ofthe VOneBlock on a range that does not deviate considerably from the typical range of activations inCNNs so that the model can be efficiently trained using standard training parameters we applied theinverse transformation to scale the outputs back to their original range after the stochastic layer

C4 Hyper-parameters and other design choices

We developed VOneNets from three different standard CNN architectures ResNet50 CORnet-S andAlexNet As previously mentioned we replaced the first block of each architecture by the VOneBlockand the transition layer to create the respective VOneNet Table C3 contains the layers that wereremoved for each base architecture Except CORnet-S the layers removed only contained a single

26

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

convolution nonlinearity and maxpool Since CORnet-S already had a pre-committed set of layersto V1 we replaced them by the VOneBlock The torchvision implementation of AlexNet containsa combined stride of eight in the removed layers (four in the convolution and two in the maxpool)followed by a stride of one in the second convolution (outside of the removed layers) In order to moreeasily adapt it to a VOneBlock with a stride of four we slightly adapted AlexNetrsquos architecture so thatit had strides of two in these three layers (first convolution first maxpool and second convolution)The results shown in Fig 2 were obtained using this modified architecture

Table C3 Layers removed from each base CNN architecture for creating the VOneNets

Architecture Layers removed

AlexNet ConvReLUMaxPoolResNet50 ConvBatch-NormReLUMaxPoolCORnet-S ConvBatch-NormReLUMaxPoolConvBatch-NormReLU

The VOneNet architecture was designed to have as few hyper-parameters as possible When possiblethese were either constrained by the base CNN architecture or by neurobiological data mostparameters of the GFB were instantiated by sampling from neuronal distributions of primate V1neurons the kernel size was set to 25times25px to capture the high variability in Gabor sizes like allother CNN models the spatial extent of the field of view was set to 8deg (Section A) Neverthelessthere were two hyper-parameters where the choice was rather arbitrary the number of channels inthe VOneBlock and the time-window for integrating V1 responses to scale the activations prior tothe stochasticity generator

Since primate V1 has neurons tuned to a wide range of spatial frequencies for each orientation andat any given location of the visual space [101] we expected that a large number of channels would berequired to cover all of the combinations of Gabor parameters For the VOneResNet50 architecturewe varied the number of channels in the VOneBlock between 64 and 1024 equally split betweensimple and complex cells and measured the performance of the several models after training (FigC2 A) Clean ImageNet and common image corruption performance improved with channel numberRemarkably the opposite happened for white box accuracy with variants of VOneResNet50 with 64and 128 channels achieving the highest robustness to white box attacks This result is particularlyinteresting since it had been shown that increasing the number of channels in all layers of a CNNimproves robustness to white box attacks in both standard and adversarially trained models [17]Therefore this shows that the improvement in white box robustness of VOneNets when compared totheir base models cannot be attributed to the higher number of channels For the main model we setthe channel number to be 512 as it offered a good compromise between the different benchmarksRegarding the time-window for integrating V1 responses we observed small effects on the modelsrsquoperformance when varying it between 25 and 200ms We chose 25ms for the main model due to asmall trend for higher robustness to white box attacks with shorter time-windows

102451225612864 channels

04

05

06

07

Acc

urac

y

25 50 100 200Activity time-window (ms)

04

05

06

07

Acc

urac

y

CleanPerturbationWhite boxCorruption

A Bwith 25 ms time-window with 512 channels

Figure C2 Exploration of VOneBlockrsquos hyper-parameters in the VOneResNet50 architec-ture A Performance of different VOneResNet50 models with varying number of channels in theVOneBlock B Same as in A but varying time-window for integrating neuronal activity Error-barsshow SD (n=3 seeds) Red arrows represent the values used in the main model

27

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

C5 Training details

We used PyTorch 041 and trained the model using ImageNet 2012 [34] Images were prepro-cessed (1) for trainingmdashrandom crop to 224 times 224 pixels and random flipping left and right (2)for validationmdashcentral crop to 224 times 224 pixels Preprocessing was followed by normalizationmdashsubtraction and division by [05 05 05] We used a batch size of 256 images and trained either on 2GPUs (NVIDIA Titan X GeForce 1080Ti) or 1 GPU (QuadroRTX6000 or V100) for 70 epochsWe use step learning rate scheduling 01 starting learning rate divided by 10 every 20 epochs Foroptimization we use Stochastic Gradient Descent with a weight decay 00001 momentum 09 and across-entropy loss between image labels and model predictions (logits)

Model weights and code are available at httpsgithubcomdicarlolabvonenet

28

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

D VOneBlock variant details

To investigate which components of the VOneBlock are responsible for the increased robustnesswe created six variants of the VOneBlock by removing or modifying one of its parts In all caseswe trained from scratch a corresponding variant of VOneResNet50 with the modified VOneBlockAll other properties of the model and components of the VOneBlock were left unmodified Herewe describe in more detail each of these variants (the name of the variant refers to the componentremoved)

bull V1 distributions the GFB parameters were sampled from uniform distributions with thesame domain as the empirical distributions used in the default VOneBlock (Table C1)bull Low SF when sampling the spatial frequencies for the GFB neurons with low peak spatial

frequencies (f lt 2cpd) were removed from the empirical distribution [79]bull High SF similar to the previous variant but removing neurons with high peak spatial

frequencies (f gt 2cpd)bull Complex NL complex cell channels were removed from the VOneBlock and replaced by

simple cell channelsbull Simple NL simple cell channels were removed from the VOneBlock and replaced by

complex cell channelsbull V1 stochasticity V1 stochasticity generator was removed from the VOneBlock

While all of the variants resulted in worse overall mean performance than the unmodified VOneRes-Net50 (Table 2) some improved specific benchmarks the variant without high SF had higher accuracyunder noise and blur corruptions (Table D1) the variant without V1 stochasticity had higher accuracyto clean images and images with noise and weather corruptions (Tables 2 and D1) and the variantwithout V1 distributions had higher accuracy to white box attacks with high Linfin perturbations (TableD2)

Table D1 Difference between the common image corruption accuracies of the severalVOneResNet50 variants and the unmodified VOneResNet50 Removal of simple cells reducedaccuracy to all common image corruption types Removal of some components of the VOneBlockimproved accuracy in specific common image corruption removing high SF consistently improvedperformance for noise and blur corruptions and removing V1 stochasticity improved performancefor noise and weather corruptions

Noise Blur

Component Gaussian Shot Impulse Defocus Glass Motion Zoomremoved [∆] [∆] [∆] [∆] [∆] [∆] [∆]

V1 dist -28plusmn09 -24plusmn07 -29plusmn11 -16plusmn01 04plusmn02 -08plusmn03 -02plusmn05

Low SF -14plusmn05 -10plusmn02 -15plusmn05 -06plusmn06 -11plusmn02 -12plusmn04 -05plusmn07

High SF 68plusmn13 73plusmn11 70plusmn13 56plusmn03 115plusmn03 37plusmn00 61plusmn01

Complex 10plusmn16 12plusmn13 07plusmn18 12plusmn02 -15plusmn01 02plusmn08 04plusmn11

Simple -61plusmn16 -54plusmn15 -64plusmn19 -41plusmn04 -42plusmn04 -49plusmn05 -54plusmn03

V1 stoch 22plusmn25 22plusmn25 21plusmn29 33plusmn01 -55plusmn01 09plusmn07 09plusmn01

Weather Digital

Component Snow Frost Fog Bright Contrast Elastic Pixelate JPEGremoved [∆] [∆] [∆] [∆] [∆] [∆] [∆] [∆]

V1 dist -02plusmn06 -09plusmn06 -19plusmn11 -08plusmn05 -14plusmn09 03plusmn04 -17plusmn03 08plusmn05

Low SF -04plusmn03 -06plusmn04 09plusmn03 -02plusmn02 -04plusmn05 -09plusmn05 -13plusmn02 -05plusmn04

High SF 14plusmn10 -07plusmn07 -95plusmn04 -52plusmn04 -44plusmn05 24plusmn02 19plusmn04 -46plusmn04

Complex 07plusmn04 04plusmn03 43plusmn02 13plusmn02 29plusmn02 -05plusmn02 -12plusmn00 -19plusmn05

Simple -50plusmn05 -90plusmn05 -44plusmn05 -39plusmn02 -48plusmn04 -39plusmn05 -35plusmn04 -18plusmn02

V1 stoch 25plusmn04 03plusmn03 120plusmn05 26plusmn03 100plusmn08 -28plusmn03 -40plusmn07 -40plusmn04

29

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

Table D2 Difference between the white box attack accuracies of the several VOneResNet50variants and the unmodified VOneResNet50 Removal of V1 stochasticity considerably reducedaccuracy to all white box attacks Removal of V1 distributions improved accuracy for the Linfin whitebox attack with high perturbation strength

PGD-L1 PGD-L2 PGD-LinfinComponent 40 160 015 060 11020 1255removed [∆] [∆] [∆] [∆] [∆] [∆]

V1 dist -05plusmn08 01plusmn11 -08plusmn02 06plusmn10 03plusmn03 30plusmn18Low SF -08plusmn12 -31plusmn11 -06plusmn10 -23plusmn08 -08plusmn13 -18plusmn10High SF -22plusmn09 -47plusmn26 -26plusmn15 -51plusmn12 -26plusmn14 -55plusmn11Complex -13plusmn09 -74plusmn16 -11plusmn09 -72plusmn12 -19plusmn10 -79plusmn12Simple -25plusmn08 -14plusmn14 -29plusmn11 -17plusmn12 -27plusmn12 -06plusmn08Noise -237plusmn09 -365plusmn08 -216plusmn05 -401plusmn04 -322plusmn04 -292plusmn04

30

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

  • Introduction
    • Related Work
      • Adversarial Robustness Correlates with V1 Explained Variance
      • VOneNet a Hybrid CNN with a V1 Neural Network Front-End
      • Results
        • Simulating a V1 at the front of CNNs improves robustness to white box attacks
        • VOneNets outperform state-of-the-art methods on a composite set of perturbations
        • All components of the VOneBlock contribute to improved model robustness
          • Discussion
          • Neural Explained Variance
          • Image Perturbations
            • White Box Adversarial Attacks
            • Additional attack controls
            • Common Image Corruptions
            • Other defense methods
              • VOneNet implementation details
                • Convolutional layer
                • Nonlinear layer
                • Neuronal stochasticity generator
                • Hyper-parameters and other design choices
                • Training details
                  • VOneBlock variant details
Page 22: bioRxiv - Simulating a Primary Visual Cortex at the Front of CNNs … · 2020. 6. 16. · 8MIT-IBM Watson AI Lab dapello@mit.edu tmarques@mit.edu Abstract Current state-of-the-art

1 2 3 4 5Severity

000

025

050

075C

orru

ptui

ons a

ccur

acy

Noise

1 2 3 4 5Severity

Blur

1 2 3 4 5Severity

Weather

1 2 3 4 5Severity

Digital

ResNet50VOneResNet50

ANT3times3+SINATLinfin

Figure B6 VOneResNet50 improves robustness to noise blur and digital corruptions Corrup-tion accuracy severity curves for the four categories of common image corruptions noise blurweather and digital Adding the VOneBlock to ResNet50 improves robustness to all corruptioncategories except weather The adversarially trained model (ATLinfin ) is considerably worse at commonimage corruptions Adversarial noise combined with Stylized ImageNet training (ANT3times3+SIN) isconsistently more robust to all corruption categories For ResNet50 and VOneResNet50 error-barsrepresent SD (n=3 seeds)

Table B2 Detailed common image corruption results Adding the VOneBlock to ResNet50improves robustness to several of the corruption types However for others particularly fog andcontrast robustness decreases Interestingly these are the corruption types that the adversariallytrained model (ATLinfin) also does the worst Adversarial noise combined with Stylized ImageNettraining (ANT3times3+SIN) is the most robust model for most of the corruption types For ResNet50 andVOneResNet50 values are mean and SD (n=3 seeds)

Noise Blur

Gaussian Shot Impulse Defocus Glass Motion ZoomModel [] [] [] [] [] [] []

ResNet50 [4] 296plusmn11 279plusmn09 245plusmn12 388plusmn07 261plusmn06 379plusmn04 345plusmn07

ATLinfin [76] 236 223 148 244 343 332 349ANT3times3+SIN [59] 616 607 599 415 386 460 373VOneResNet50 346plusmn14 334plusmn12 319plusmn15 378plusmn01 357plusmn01 374plusmn02 340plusmn02

Weather Digital

Snow Frost Fog Bright Contrast Elastic Pixelate JPEGModel [] [] [] [] [] [] [] []

ResNet50 301plusmn05 367plusmn04 436plusmn08 668plusmn02 388plusmn05 448plusmn03 470plusmn27 551plusmn10

ATLinfin 295 312 84 541 102 484 563 587ANT3times3+SIN 477 522 549 682 493 527 589 593VOneResNet50 252plusmn05 368plusmn04 301plusmn05 624plusmn03 285plusmn05 487plusmn01 633plusmn02 610plusmn03

22

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

B4 Other defense methods

We compared the performance of VOneResNet50 in the described computer vision benchmarkswith two training-based defense methods using the ResNet50 architecture adversarial training witha δinfin = 4255 constraint (ATLinfin) [76] and adversarial noise with Stylized ImageNet training(ANT3times3+SIN) [59]

The ATLinfin model was downloaded from the publicly available adversarially trained models at httpsgithubcomMadryLabrobustness [76] This model was trained following the adversarialtraining paradigm of Madry et al [17] to solve the min-max problem

minθρ(θ) where ρ(θ) = E(xy)simD

[maxδisinS

L(θ x+ δ y)

](1)

Here the goal is to learn parameters θ minimizing the loss L for training images x and labels y drawnfrom D while perturbing x with δ isin S to maximally increase the loss In this case D is ImageNetand PGD with δinfin = 4255 is used to find the perturbation maximizing the loss for a given (x y)pair The δinfin = 4255 constraint model was selected to compare against because it had the bestperformance on our conglomerate benchmark of adversarial attacks and common image corruptions

The ANT3times3+SIN model was obtained from httpsgithubcombethgelabgame-of-noiseRecently Rusak et al showed that training a ResNet50 model with several types of additive inputnoise improved the robustness to common image corruptions [59] Inspired by this observation theauthors decided to train a model while optimizing a noise distribution that maximally confuses itSimilarly to standard adversarial training this results in solving a min-max problem

minθ

maxφ

E(xy)simDEδsimpφ(δ) [L(θ x+ δ y)] (2)

where pφ(δ) is the maximally confusing noise distribution The main difference to regular adversarialtraining in Equation 1 is that while in the former δ is optimized directly here δ is found by optimizinga constrained distribution with local spatial correlations Complementing the adversarial noise trainingwith Stylized ImageNet training [25] produced the model with the current best robustness in theImageNet-C benchmark as a standalone method

23

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

C VOneNet implementation details

C1 Convolutional layer

The convolutional layer of the VOneBlock is a mathematically parameterized Gabor Filter Bank(GFB) We set the stride of the GFB to be four originating a 56times56 spatial map of activations Sincethe number of channels in most CNNsrsquo first convolution is relatively small (64 in the architecturesadapted) we used a larger number in the VOneBlock so that the Gabors would cover the largeparameter space and better approximate primate V1 We set the main VOneNet models to contain512 channels equally split between simple and complex cells (see Fig C2 A for an analysis of howthe number of channels affects performance) Each channel in the GFB convolves a single colorchannel from the input image

The Gabor function consists of a two-dimensional grating with a Gaussian envelope and is describedby the following equation

Gθfφnxny (x y) =1

2πσxσyexp

[minus05(x2rotσ

2x + y2rotσ

2y)]

cos (2πf + φ) (3)

where

xrot = x cos(θ) + y sin(θ)

yrot = minusx sin(θ) + y cos(θ)(4)

σx =nxf

σy =nyf

(5)

xrot and yrot are the orthogonal and parallel orientations relative to the grating θ is the angle of thegrating orientation f is the spatial frequency of the grating φ is the phase of the grating relative to theGaussian envelope and σx and σy are the standard deviations of the Gaussian envelope orthogonaland parallel to the grating which can be defined as multiples (nx and ny) of the grating cycle (inverseof the frequency)

Although the Gabor filter greatly reduces the number of parameters defining the linear spatialcomponent of V1 neurons it still has five parameters per channel Fortunately there is a vastliterature in neurophysiology with detailed characterizations of primate V1 response properties whichcan be used to constrain these parameters To instantiate a VOneBlock with CV 1 channels wesampled CV 1 values for each of the parameters according to an empirically constrained distribution(Table C1) Due to the resolution of the input images we limited the ranges of spatial frequencies(f lt 56cpd) and number of cycles (n gt 01)

Table C1 Empirical distributions used to sample the GFB parameters in the VOneBlock

Parameter Reference

θ De Valois et at [78] Fig 5 Orientation tuning (foveal)f De Valois et at [79] Fig 6 Peak spatial freq (X cells foveal)φ - Uniform distribution isin [0 360] degnx ny Ringach 2002 [80] Fig 4 (nx ny) joint distribution

Critical to the correct implementation of the biologically-constrained parameters of the GFB is thechoice of the modelrsquos field of view in degrees As previously mentioned in Section A we used 8degas the input spatial extent for all CNN models Itrsquos important to note that the set of spatial filters inthe VOneBlockrsquos GFB differs considerably from those in the first convolution in most CNNs (FigC1) While standard CNNs learn filters that resemble Gabors in their input layer [1 99 100] due thelimited span of their kernels they do not vary significantly in size and spatial frequency V1 neuronson the other hand are known to exhibit a wide range of receptive field properties This phenomena iscaptured in the VOneBlock with the spatial frequencies and receptive field sizes of Gabors rangingmore than one order of magnitude Due to this high variability we set the convolution kernel to be25times 25px which is considerably larger than those in standard CNNs (Fig C1)

24

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

VOneBlock ResNet50 CORnet-S AlexNet

7 px7 px 7 px7 px

Figure C1 Comparison between filters in the first convolution of standard CNNs and theVOneBlock Example filters from the first convolution in the VOneBlock ResNet50 CORnet-S andAlexNet (from left to right) VOneBlock filters are all parameterized as Gabors varying considerablyin size and spatial frequency Standard CNNs have some filters with shapes other than Gaborsparticularly center-surround and are limited in size by their small kernel Kernel sizes are 25px 7px7px and 11px for VOneBlock ResNet50 CORnet-S and AlexNet respectively

25

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

C2 Nonlinear layer

VOneBlockrsquos nonlinear layer has two different nonlinearities that are applied to each channel depend-ing on its cell type a rectified linear transformation for simple cells (6) and the spectral power of aquadrature phase-pair (7) for complex cells

Snlθfφnxny =

Slθfφnxny if Slθfφnxny gt 0

0 otherwise(6)

Cnlθfφnxny =1radic2

radic(Clθfφnxny )2 + (Clθfφ+π2nxny )2 (7)

where Sl and Snl are the linear and nonlinear responses of a simple neuron and Cl and Cnl are thesame for a complex neuron

C3 Neuronal stochasticity generator

In awake monkeys spike trains of V1 neurons are approximately Poisson ie the variance and meanof spike counts in a given time-window over a set of repetitions are roughly the same [82] Weincorporated stochasticity into the VOneBlock to emulate this property of neuronal responses Sincethe Poisson distribution is not continuous it breaks the gradients in a white box attack giving a falsesense of robustness [68] In order to avoid this situation and facilitate the evaluation of the modelrsquosreal robustness our neuronal stochasticity generator as implemented uses a continuous second-orderapproximation of Poisson noise by adding Gaussian noise with variance equal to the activation

Rs sim N (micro = Rns σ2 = Rns) (8)

where Rns and Rs are the non-stochastic and stochastic responses of a neuron

To approximate the same levels of neuronal stochasticity of primate V1 neurons it is critical that theVOneBlock activations are on the same range as the V1 neuronal responses (number of spike countsin a given time-window) Thus we applied an affine transformation to the activations so that both themean stimulus response and the mean baseline activity are the same as those of a population of V1neurons (Table C2 shows the mean responses and spontaneous activity of V1 neurons measured indifferent time-windows)

Table C2 Mean stimulus responses and spontaneous activity of a population of V1 neuronsfor different time-windows

Time-window Stimulus response Spontaneous activity[ms] [spikes] [spikes]

25 0324 007350 0655 0145

100 105 029200 138 058

Since the exact time-window that responses in V1 are integrated during visual perception is still anopen question we considered a time-window of 25ms (Fig C2 B) In order to keep the outputs ofthe VOneBlock on a range that does not deviate considerably from the typical range of activations inCNNs so that the model can be efficiently trained using standard training parameters we applied theinverse transformation to scale the outputs back to their original range after the stochastic layer

C4 Hyper-parameters and other design choices

We developed VOneNets from three different standard CNN architectures ResNet50 CORnet-S andAlexNet As previously mentioned we replaced the first block of each architecture by the VOneBlockand the transition layer to create the respective VOneNet Table C3 contains the layers that wereremoved for each base architecture Except CORnet-S the layers removed only contained a single

26

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

convolution nonlinearity and maxpool Since CORnet-S already had a pre-committed set of layersto V1 we replaced them by the VOneBlock The torchvision implementation of AlexNet containsa combined stride of eight in the removed layers (four in the convolution and two in the maxpool)followed by a stride of one in the second convolution (outside of the removed layers) In order to moreeasily adapt it to a VOneBlock with a stride of four we slightly adapted AlexNetrsquos architecture so thatit had strides of two in these three layers (first convolution first maxpool and second convolution)The results shown in Fig 2 were obtained using this modified architecture

Table C3 Layers removed from each base CNN architecture for creating the VOneNets

Architecture Layers removed

AlexNet ConvReLUMaxPoolResNet50 ConvBatch-NormReLUMaxPoolCORnet-S ConvBatch-NormReLUMaxPoolConvBatch-NormReLU

The VOneNet architecture was designed to have as few hyper-parameters as possible When possiblethese were either constrained by the base CNN architecture or by neurobiological data mostparameters of the GFB were instantiated by sampling from neuronal distributions of primate V1neurons the kernel size was set to 25times25px to capture the high variability in Gabor sizes like allother CNN models the spatial extent of the field of view was set to 8deg (Section A) Neverthelessthere were two hyper-parameters where the choice was rather arbitrary the number of channels inthe VOneBlock and the time-window for integrating V1 responses to scale the activations prior tothe stochasticity generator

Since primate V1 has neurons tuned to a wide range of spatial frequencies for each orientation andat any given location of the visual space [101] we expected that a large number of channels would berequired to cover all of the combinations of Gabor parameters For the VOneResNet50 architecturewe varied the number of channels in the VOneBlock between 64 and 1024 equally split betweensimple and complex cells and measured the performance of the several models after training (FigC2 A) Clean ImageNet and common image corruption performance improved with channel numberRemarkably the opposite happened for white box accuracy with variants of VOneResNet50 with 64and 128 channels achieving the highest robustness to white box attacks This result is particularlyinteresting since it had been shown that increasing the number of channels in all layers of a CNNimproves robustness to white box attacks in both standard and adversarially trained models [17]Therefore this shows that the improvement in white box robustness of VOneNets when compared totheir base models cannot be attributed to the higher number of channels For the main model we setthe channel number to be 512 as it offered a good compromise between the different benchmarksRegarding the time-window for integrating V1 responses we observed small effects on the modelsrsquoperformance when varying it between 25 and 200ms We chose 25ms for the main model due to asmall trend for higher robustness to white box attacks with shorter time-windows

102451225612864 channels

04

05

06

07

Acc

urac

y

25 50 100 200Activity time-window (ms)

04

05

06

07

Acc

urac

y

CleanPerturbationWhite boxCorruption

A Bwith 25 ms time-window with 512 channels

Figure C2 Exploration of VOneBlockrsquos hyper-parameters in the VOneResNet50 architec-ture A Performance of different VOneResNet50 models with varying number of channels in theVOneBlock B Same as in A but varying time-window for integrating neuronal activity Error-barsshow SD (n=3 seeds) Red arrows represent the values used in the main model

27

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

C5 Training details

We used PyTorch 041 and trained the model using ImageNet 2012 [34] Images were prepro-cessed (1) for trainingmdashrandom crop to 224 times 224 pixels and random flipping left and right (2)for validationmdashcentral crop to 224 times 224 pixels Preprocessing was followed by normalizationmdashsubtraction and division by [05 05 05] We used a batch size of 256 images and trained either on 2GPUs (NVIDIA Titan X GeForce 1080Ti) or 1 GPU (QuadroRTX6000 or V100) for 70 epochsWe use step learning rate scheduling 01 starting learning rate divided by 10 every 20 epochs Foroptimization we use Stochastic Gradient Descent with a weight decay 00001 momentum 09 and across-entropy loss between image labels and model predictions (logits)


B.4 Other defense methods

We compared the performance of VOneResNet50 on the described computer vision benchmarks with two training-based defense methods using the ResNet50 architecture: adversarial training with a δ∞ = 4/255 constraint (AT L∞) [76], and adversarial noise combined with Stylized ImageNet training (ANT3×3+SIN) [59].

The AT L∞ model was downloaded from the publicly available adversarially trained models at https://github.com/MadryLab/robustness [76]. This model was trained following the adversarial training paradigm of Madry et al. [17] to solve the min-max problem

$$\min_\theta \rho(\theta), \quad \text{where} \quad \rho(\theta) = \mathbb{E}_{(x,y)\sim\mathcal{D}}\left[\max_{\delta \in S}\, \mathcal{L}(\theta, x+\delta, y)\right] \tag{1}$$

Here the goal is to learn parameters θ that minimize the loss L for training images x and labels y drawn from D, while perturbing x with δ ∈ S to maximally increase the loss. In this case, D is ImageNet, and PGD with δ∞ = 4/255 is used to find the perturbation maximizing the loss for a given (x, y) pair. The δ∞ = 4/255 model was selected for comparison because it had the best performance on our conglomerate benchmark of adversarial attacks and common image corruptions.
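For concreteness, the inner maximization of Equation 1 under an L∞ constraint can be sketched as follows (a minimal PGD sketch in PyTorch; `model`, the step size `alpha`, and the step count are illustrative choices, not the exact settings of [76]):

```python
import torch
import torch.nn.functional as F

def pgd_linf(model, x, y, eps=4/255, alpha=1/255, steps=16):
    """Inner maximization of Eq. 1: find delta with ||delta||_inf <= eps
    that maximizes the cross-entropy loss."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()        # gradient ascent step
            delta.clamp_(-eps, eps)                   # project onto the L-inf ball
            delta.copy_((x + delta).clamp(0, 1) - x)  # keep x + delta a valid image
        delta.grad.zero_()
    return (x + delta).detach()
```

During adversarial training, the outer minimization then takes a standard SGD step on the loss evaluated at these adversarial images.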

The ANT3×3+SIN model was obtained from https://github.com/bethgelab/game-of-noise. Recently, Rusak et al. showed that training a ResNet50 model with several types of additive input noise improves robustness to common image corruptions [59]. Inspired by this observation, the authors then trained a model while optimizing a noise distribution that maximally confuses it. Similarly to standard adversarial training, this results in solving a min-max problem

$$\min_\theta\, \max_\phi\; \mathbb{E}_{(x,y)\sim\mathcal{D}}\, \mathbb{E}_{\delta\sim p_\phi(\delta)}\left[\mathcal{L}(\theta, x+\delta, y)\right] \tag{2}$$

where p_φ(δ) is the maximally confusing noise distribution. The main difference from regular adversarial training (Equation 1) is that, while there δ is optimized directly, here δ is found by optimizing a constrained distribution with local spatial correlations. Complementing adversarial noise training with Stylized ImageNet training [25] produced the model with the best robustness on the ImageNet-C benchmark as a standalone method.


C VOneNet implementation details

C.1 Convolutional layer

The convolutional layer of the VOneBlock is a mathematically parameterized Gabor filter bank (GFB). We set the stride of the GFB to four, yielding a 56×56 spatial map of activations. Since the number of channels in most CNNs' first convolution is relatively small (64 in the adapted architectures), we used a larger number in the VOneBlock so that the Gabors cover the large parameter space and better approximate primate V1. The main VOneNet models contain 512 channels, equally split between simple and complex cells (see Fig. C2A for an analysis of how the number of channels affects performance). Each channel in the GFB convolves a single color channel of the input image.

The Gabor function consists of a two-dimensional grating with a Gaussian envelope and is described by the following equation:

$$G_{\theta,f,\phi,n_x,n_y}(x,y) = \frac{1}{2\pi\sigma_x\sigma_y}\,\exp\!\left[-0.5\left(\frac{x_{\mathrm{rot}}^2}{\sigma_x^2} + \frac{y_{\mathrm{rot}}^2}{\sigma_y^2}\right)\right]\cos\!\left(2\pi f x_{\mathrm{rot}} + \phi\right) \tag{3}$$

where

$$x_{\mathrm{rot}} = x\cos(\theta) + y\sin(\theta), \qquad y_{\mathrm{rot}} = -x\sin(\theta) + y\cos(\theta) \tag{4}$$

$$\sigma_x = \frac{n_x}{f}, \qquad \sigma_y = \frac{n_y}{f} \tag{5}$$

x_rot and y_rot are the coordinates orthogonal and parallel to the grating, θ is the angle of the grating orientation, f is the spatial frequency of the grating, φ is the phase of the grating relative to the Gaussian envelope, and σ_x and σ_y are the standard deviations of the Gaussian envelope orthogonal and parallel to the grating, which can be defined as multiples (n_x and n_y) of the grating cycle (the inverse of the frequency).
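As an illustration, Equations 3-5 can be implemented directly; the sketch below assumes a 25×25 px kernel sampled at 28 px/deg (224 px spanning the 8 deg field of view described below), with `theta` and `phi` in radians:

```python
import numpy as np

def gabor_kernel(theta, f, phi, nx, ny, ksize=25, px_per_deg=28.0):
    """Gabor filter of Eq. 3; f in cycles/deg, theta and phi in radians."""
    sigma_x, sigma_y = nx / f, ny / f                       # Eq. 5, in degrees
    coords = (np.arange(ksize) - ksize // 2) / px_per_deg   # pixel grid in degrees
    x, y = np.meshgrid(coords, coords)
    x_rot = x * np.cos(theta) + y * np.sin(theta)           # Eq. 4
    y_rot = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-0.5 * (x_rot**2 / sigma_x**2 + y_rot**2 / sigma_y**2))
    grating = np.cos(2 * np.pi * f * x_rot + phi)
    return envelope * grating / (2 * np.pi * sigma_x * sigma_y)
```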

Although the Gabor filter greatly reduces the number of parameters defining the linear spatial component of V1 neurons, it still has five parameters per channel. Fortunately, there is a vast neurophysiology literature with detailed characterizations of primate V1 response properties, which can be used to constrain these parameters. To instantiate a VOneBlock with C_V1 channels, we sampled C_V1 values for each of the parameters according to an empirically constrained distribution (Table C1). Due to the resolution of the input images, we limited the ranges of spatial frequencies (f < 5.6 cpd) and number of cycles (n > 0.1).

Table C1: Empirical distributions used to sample the GFB parameters in the VOneBlock

| Parameter | Reference |
| θ | De Valois et al. [78], Fig. 5: orientation tuning (foveal) |
| f | De Valois et al. [79], Fig. 6: peak spatial frequency (X cells, foveal) |
| φ | Uniform distribution ∈ [0, 360] deg |
| n_x, n_y | Ringach 2002 [80], Fig. 4: (n_x, n_y) joint distribution |

Critical to the correct implementation of the biologically-constrained parameters of the GFB is the choice of the model's field of view in degrees. As mentioned in Section A, we used 8 deg as the input spatial extent for all CNN models. It is important to note that the set of spatial filters in the VOneBlock's GFB differs considerably from those in the first convolution of most CNNs (Fig. C1). While standard CNNs learn filters that resemble Gabors in their input layer [1, 99, 100], due to the limited span of their kernels these filters do not vary significantly in size and spatial frequency. V1 neurons, on the other hand, are known to exhibit a wide range of receptive field properties. This phenomenon is captured in the VOneBlock, with the spatial frequencies and receptive field sizes of the Gabors ranging over more than one order of magnitude. Due to this high variability, we set the convolution kernel to 25×25 px, considerably larger than in standard CNNs (Fig. C1).


Figure C1: Comparison between filters in the first convolution of standard CNNs and the VOneBlock. Example filters from the first convolution in the VOneBlock, ResNet50, CORnet-S, and AlexNet (from left to right). VOneBlock filters are all parameterized as Gabors, varying considerably in size and spatial frequency. Standard CNNs have some filters with shapes other than Gabors, particularly center-surround, and are limited in size by their small kernels. Kernel sizes are 25 px, 7 px, 7 px, and 11 px for the VOneBlock, ResNet50, CORnet-S, and AlexNet, respectively.


C.2 Nonlinear layer

The VOneBlock's nonlinear layer applies one of two different nonlinearities to each channel, depending on its cell type: a rectified linear transformation for simple cells (6), and the spectral power of a quadrature phase pair (7) for complex cells:

$$S^{nl}_{\theta,f,\phi,n_x,n_y} = \begin{cases} S^{l}_{\theta,f,\phi,n_x,n_y} & \text{if } S^{l}_{\theta,f,\phi,n_x,n_y} > 0 \\ 0 & \text{otherwise} \end{cases} \tag{6}$$

$$C^{nl}_{\theta,f,\phi,n_x,n_y} = \frac{1}{\sqrt{2}}\sqrt{\left(C^{l}_{\theta,f,\phi,n_x,n_y}\right)^2 + \left(C^{l}_{\theta,f,\phi+\pi/2,n_x,n_y}\right)^2} \tag{7}$$

where S^l and S^nl are the linear and nonlinear responses of a simple neuron, and C^l and C^nl are the same for a complex neuron.
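A minimal sketch of Equations 6 and 7 in PyTorch (assuming the quadrature pair for each complex channel is available as two tensors; the function names are illustrative):

```python
import torch

def simple_nl(s_lin):
    # Eq. 6: half-wave rectification of the linear simple-cell response
    return torch.relu(s_lin)

def complex_nl(c_lin, c_lin_quad):
    # Eq. 7: spectral power of a quadrature pair (phases phi and phi + pi/2)
    return torch.sqrt(c_lin**2 + c_lin_quad**2) / (2 ** 0.5)
```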

C.3 Neuronal stochasticity generator

In awake monkeys, spike trains of V1 neurons are approximately Poisson, i.e., the variance and mean of spike counts in a given time-window, over a set of repetitions, are roughly the same [82]. We incorporated stochasticity into the VOneBlock to emulate this property of neuronal responses. Since the Poisson distribution is not continuous, it breaks the gradients in a white box attack, giving a false sense of robustness [68]. To avoid this situation and to facilitate the evaluation of the model's real robustness, our neuronal stochasticity generator uses a continuous second-order approximation of Poisson noise, adding Gaussian noise with variance equal to the activation:

$$R_s \sim \mathcal{N}\!\left(\mu = R_{ns},\; \sigma^2 = R_{ns}\right) \tag{8}$$

where R_ns and R_s are the non-stochastic and stochastic responses of a neuron.

To approximate the levels of neuronal stochasticity of primate V1 neurons, it is critical that the VOneBlock activations lie in the same range as V1 neuronal responses (spike counts in a given time-window). Thus, we applied an affine transformation to the activations so that both the mean stimulus response and the mean baseline activity match those of a population of V1 neurons (Table C2 shows the mean responses and spontaneous activity of V1 neurons measured in different time-windows).

Table C2: Mean stimulus responses and spontaneous activity of a population of V1 neurons for different time-windows

| Time-window [ms] | Stimulus response [spikes] | Spontaneous activity [spikes] |
| 25 | 0.324 | 0.073 |
| 50 | 0.655 | 0.145 |
| 100 | 1.05 | 0.29 |
| 200 | 1.38 | 0.58 |

Since the exact time-window over which V1 responses are integrated during visual perception is still an open question, we used a time-window of 25 ms (Fig. C2B). To keep the outputs of the VOneBlock in a range that does not deviate considerably from the typical range of CNN activations, so that the model can be trained efficiently with standard training parameters, we applied the inverse transformation after the stochastic layer to scale the outputs back to their original range.
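Putting this together, the stochastic layer can be sketched as below; how `scale` and `beta` are fit to the mean stimulus response and baseline activity of Table C2 is an assumption of this sketch:

```python
import torch

def v1_stochasticity(x, scale, beta):
    """Affine-map activations to the V1 spike-count range, add Gaussian noise
    with variance equal to the mean (Eq. 8), then undo the affine map."""
    r = x * scale + beta                                  # spike-count range
    r = r + r.clamp(min=0).sqrt() * torch.randn_like(r)   # sigma^2 = mu
    return (r - beta) / scale                             # back to CNN range
```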

C.4 Hyper-parameters and other design choices

We developed VOneNets from three different standard CNN architectures: ResNet50, CORnet-S, and AlexNet. As previously mentioned, we replaced the first block of each architecture with the VOneBlock and a transition layer to create the respective VOneNet. Table C3 lists the layers removed from each base architecture. Except for CORnet-S, the removed layers contained only a single


convolution, nonlinearity, and maxpool. Since CORnet-S already had a set of layers pre-committed to V1, we replaced those with the VOneBlock. The torchvision implementation of AlexNet has a combined stride of eight in the removed layers (four in the convolution and two in the maxpool), followed by a stride of one in the second convolution (outside the removed layers). To adapt it more easily to a VOneBlock with a stride of four, we slightly modified AlexNet's architecture so that it has strides of two in these three layers (first convolution, first maxpool, and second convolution). The results shown in Fig. 2 were obtained with this modified architecture.

Table C3: Layers removed from each base CNN architecture when creating the VOneNets

| Architecture | Layers removed |
| AlexNet | Conv, ReLU, MaxPool |
| ResNet50 | Conv, Batch-Norm, ReLU, MaxPool |
| CORnet-S | Conv, Batch-Norm, ReLU, MaxPool, Conv, Batch-Norm, ReLU |
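As a hedged sketch of this model surgery for ResNet50 (using torchvision; `vone_block` and the 1×1 transition layer stand in for the repository's actual modules and may differ from its exact API):

```python
import torch.nn as nn
from torchvision.models import resnet50

def build_vone_resnet50(vone_block, num_classes=1000):
    """Replace ResNet50's Conv/Batch-Norm/ReLU/MaxPool (Table C3) with a
    VOneBlock plus a 1x1 transition layer from 512 to 64 channels."""
    backbone = resnet50(num_classes=num_classes)
    transition = nn.Conv2d(512, 64, kernel_size=1)  # match layer1's input width
    return nn.Sequential(
        vone_block,                 # fixed-weight V1 front-end, 56x56x512 output
        transition,
        backbone.layer1, backbone.layer2, backbone.layer3, backbone.layer4,
        backbone.avgpool, nn.Flatten(), backbone.fc,
    )
```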

The VOneNet architecture was designed to have as few hyper-parameters as possible. Where possible, these were constrained either by the base CNN architecture or by neurobiological data: most parameters of the GFB were instantiated by sampling from neuronal distributions of primate V1 neurons; the kernel size was set to 25×25 px to capture the high variability in Gabor sizes; and, like all other CNN models, the spatial extent of the field of view was set to 8 deg (Section A). Nevertheless, two hyper-parameter choices were rather arbitrary: the number of channels in the VOneBlock, and the time-window for integrating V1 responses used to scale the activations prior to the stochasticity generator.

Since primate V1 has neurons tuned to a wide range of spatial frequencies for each orientation and at any given location of visual space [101], we expected that a large number of channels would be required to cover all combinations of Gabor parameters. For the VOneResNet50 architecture, we varied the number of channels in the VOneBlock between 64 and 1024, equally split between simple and complex cells, and measured the performance of the resulting models after training (Fig. C2A). Clean ImageNet and common image corruption performance improved with channel number. Remarkably, the opposite happened for white box accuracy, with the 64- and 128-channel variants of VOneResNet50 achieving the highest robustness to white box attacks. This result is particularly interesting since increasing the number of channels in all layers of a CNN had been shown to improve robustness to white box attacks in both standard and adversarially trained models [17]. It therefore shows that the improvement in white box robustness of VOneNets over their base models cannot be attributed to the higher number of channels. For the main model we set the channel number to 512, as it offered a good compromise between the different benchmarks. Regarding the time-window for integrating V1 responses, we observed only small effects on model performance when varying it between 25 and 200 ms. We chose 25 ms for the main model due to a small trend toward higher robustness to white box attacks with shorter time-windows.

Figure C2: Exploration of the VOneBlock's hyper-parameters in the VOneResNet50 architecture. A, Accuracy on clean images, white box attacks, common image corruptions, and the overall perturbation set for VOneResNet50 models with a varying number of VOneBlock channels (64 to 1024), using a 25 ms time-window. B, Same as A, but varying the time-window for integrating neuronal activity (25 to 200 ms), with 512 channels. Error bars show SD (n = 3 seeds). Red arrows mark the values used in the main model.


C.5 Training details

We used PyTorch 0.4.1 and trained the models on ImageNet 2012 [34]. Images were preprocessed (1) for training, with a random crop to 224×224 pixels and random left-right flipping, and (2) for validation, with a central crop to 224×224 pixels. Preprocessing was followed by normalization: subtraction of and division by [0.5, 0.5, 0.5]. We used a batch size of 256 images and trained either on 2 GPUs (NVIDIA Titan X / GeForce 1080Ti) or 1 GPU (Quadro RTX 6000 or V100) for 70 epochs. We used step learning-rate scheduling: a starting learning rate of 0.1, divided by 10 every 20 epochs. For optimization we used Stochastic Gradient Descent with weight decay 0.0001, momentum 0.9, and a cross-entropy loss between image labels and model predictions (logits).
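In current PyTorch/torchvision terms, this recipe corresponds roughly to the following sketch (a stand-in `model` is used here; the original used PyTorch 0.4.1-era APIs):

```python
import torch
from torchvision import models, transforms

train_tf = transforms.Compose([
    transforms.RandomCrop(224),                 # random 224x224 crop
    transforms.RandomHorizontalFlip(),          # random left-right flip
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5] * 3, std=[0.5] * 3),
])
val_tf = transforms.Compose([
    transforms.CenterCrop(224),                 # central 224x224 crop
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5] * 3, std=[0.5] * 3),
])

model = models.resnet50()                       # stand-in; a VOneNet in practice
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)
criterion = torch.nn.CrossEntropyLoss()         # cross-entropy on logits
```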

Model weights and code are available at https://github.com/dicarlolab/vonenet.


D VOneBlock variant details

To investigate which components of the VOneBlock are responsible for the increased robustness, we created six variants of the VOneBlock, each removing or modifying one of its parts. In all cases, we trained from scratch a corresponding variant of VOneResNet50 with the modified VOneBlock; all other properties of the model and components of the VOneBlock were left unmodified. Here we describe each of these variants in more detail (the name of each variant refers to the component removed):

• V1 distributions: the GFB parameters were sampled from uniform distributions with the same domain as the empirical distributions used in the default VOneBlock (Table C1)
• Low SF: when sampling the spatial frequencies for the GFB, neurons with low peak spatial frequencies (f < 2 cpd) were removed from the empirical distribution [79]
• High SF: similar to the previous variant, but removing neurons with high peak spatial frequencies (f > 2 cpd)
• Complex NL: complex cell channels were removed from the VOneBlock and replaced by simple cell channels
• Simple NL: simple cell channels were removed from the VOneBlock and replaced by complex cell channels
• V1 stochasticity: the V1 stochasticity generator was removed from the VOneBlock

While all of the variants resulted in worse overall mean performance than the unmodified VOneResNet50 (Table 2), some improved specific benchmarks: the variant without high SF had higher accuracy under noise and blur corruptions (Table D1); the variant without V1 stochasticity had higher accuracy on clean images and on images with noise and weather corruptions (Tables 2 and D1); and the variant without V1 distributions had higher accuracy under white box attacks with high L∞ perturbations (Table D2).

Table D1: Difference between the common image corruption accuracies of the several VOneResNet50 variants and the unmodified VOneResNet50. Removal of simple cells reduced accuracy for all common image corruption types. Removal of some components of the VOneBlock improved accuracy for specific corruptions: removing high SF consistently improved performance for noise and blur corruptions, and removing V1 stochasticity improved performance for noise and weather corruptions.

Noise and blur corruptions (all values are accuracy differences, ∆):

| Component removed | Gaussian noise | Shot noise | Impulse noise | Defocus blur | Glass blur | Motion blur | Zoom blur |
| V1 dist | -2.8±0.9 | -2.4±0.7 | -2.9±1.1 | -1.6±0.1 | 0.4±0.2 | -0.8±0.3 | -0.2±0.5 |
| Low SF | -1.4±0.5 | -1.0±0.2 | -1.5±0.5 | -0.6±0.6 | -1.1±0.2 | -1.2±0.4 | -0.5±0.7 |
| High SF | 6.8±1.3 | 7.3±1.1 | 7.0±1.3 | 5.6±0.3 | 11.5±0.3 | 3.7±0.0 | 6.1±0.1 |
| Complex | 1.0±1.6 | 1.2±1.3 | 0.7±1.8 | 1.2±0.2 | -1.5±0.1 | 0.2±0.8 | 0.4±1.1 |
| Simple | -6.1±1.6 | -5.4±1.5 | -6.4±1.9 | -4.1±0.4 | -4.2±0.4 | -4.9±0.5 | -5.4±0.3 |
| V1 stoch | 2.2±2.5 | 2.2±2.5 | 2.1±2.9 | 3.3±0.1 | -5.5±0.1 | 0.9±0.7 | 0.9±0.1 |

Weather and digital corruptions (all values are accuracy differences, ∆):

| Component removed | Snow | Frost | Fog | Brightness | Contrast | Elastic | Pixelate | JPEG |
| V1 dist | -0.2±0.6 | -0.9±0.6 | -1.9±1.1 | -0.8±0.5 | -1.4±0.9 | 0.3±0.4 | -1.7±0.3 | 0.8±0.5 |
| Low SF | -0.4±0.3 | -0.6±0.4 | 0.9±0.3 | -0.2±0.2 | -0.4±0.5 | -0.9±0.5 | -1.3±0.2 | -0.5±0.4 |
| High SF | 1.4±1.0 | -0.7±0.7 | -9.5±0.4 | -5.2±0.4 | -4.4±0.5 | 2.4±0.2 | 1.9±0.4 | -4.6±0.4 |
| Complex | 0.7±0.4 | 0.4±0.3 | 4.3±0.2 | 1.3±0.2 | 2.9±0.2 | -0.5±0.2 | -1.2±0.0 | -1.9±0.5 |
| Simple | -5.0±0.5 | -9.0±0.5 | -4.4±0.5 | -3.9±0.2 | -4.8±0.4 | -3.9±0.5 | -3.5±0.4 | -1.8±0.2 |
| V1 stoch | 2.5±0.4 | 0.3±0.3 | 12.0±0.5 | 2.6±0.3 | 10.0±0.8 | -2.8±0.3 | -4.0±0.7 | -4.0±0.4 |


Table D2: Difference between the white box attack accuracies of the several VOneResNet50 variants and the unmodified VOneResNet50. Removal of V1 stochasticity considerably reduced accuracy under all white box attacks. Removal of V1 distributions improved accuracy for the L∞ white box attack with high perturbation strength.

All values are accuracy differences, ∆:

| Component removed | PGD-L1 (ε=40) | PGD-L1 (ε=160) | PGD-L2 (ε=0.15) | PGD-L2 (ε=0.60) | PGD-L∞ (ε=1/1020) | PGD-L∞ (ε=1/255) |
| V1 dist | -0.5±0.8 | 0.1±1.1 | -0.8±0.2 | 0.6±1.0 | 0.3±0.3 | 3.0±1.8 |
| Low SF | -0.8±1.2 | -3.1±1.1 | -0.6±1.0 | -2.3±0.8 | -0.8±1.3 | -1.8±1.0 |
| High SF | -2.2±0.9 | -4.7±2.6 | -2.6±1.5 | -5.1±1.2 | -2.6±1.4 | -5.5±1.1 |
| Complex | -1.3±0.9 | -7.4±1.6 | -1.1±0.9 | -7.2±1.2 | -1.9±1.0 | -7.9±1.2 |
| Simple | -2.5±0.8 | -1.4±1.4 | -2.9±1.1 | -1.7±1.2 | -2.7±1.2 | -0.6±0.8 |
| V1 stoch | -23.7±0.9 | -36.5±0.8 | -21.6±0.5 | -40.1±0.4 | -32.2±0.4 | -29.2±0.4 |

30

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

  • Introduction
    • Related Work
      • Adversarial Robustness Correlates with V1 Explained Variance
      • VOneNet a Hybrid CNN with a V1 Neural Network Front-End
      • Results
        • Simulating a V1 at the front of CNNs improves robustness to white box attacks
        • VOneNets outperform state-of-the-art methods on a composite set of perturbations
        • All components of the VOneBlock contribute to improved model robustness
          • Discussion
          • Neural Explained Variance
          • Image Perturbations
            • White Box Adversarial Attacks
            • Additional attack controls
            • Common Image Corruptions
            • Other defense methods
              • VOneNet implementation details
                • Convolutional layer
                • Nonlinear layer
                • Neuronal stochasticity generator
                • Hyper-parameters and other design choices
                • Training details
                  • VOneBlock variant details
Page 24: bioRxiv - Simulating a Primary Visual Cortex at the Front of CNNs … · 2020. 6. 16. · 8MIT-IBM Watson AI Lab dapello@mit.edu tmarques@mit.edu Abstract Current state-of-the-art

C VOneNet implementation details

C1 Convolutional layer

The convolutional layer of the VOneBlock is a mathematically parameterized Gabor Filter Bank(GFB) We set the stride of the GFB to be four originating a 56times56 spatial map of activations Sincethe number of channels in most CNNsrsquo first convolution is relatively small (64 in the architecturesadapted) we used a larger number in the VOneBlock so that the Gabors would cover the largeparameter space and better approximate primate V1 We set the main VOneNet models to contain512 channels equally split between simple and complex cells (see Fig C2 A for an analysis of howthe number of channels affects performance) Each channel in the GFB convolves a single colorchannel from the input image

The Gabor function consists of a two-dimensional grating with a Gaussian envelope and is describedby the following equation

Gθfφnxny (x y) =1

2πσxσyexp

[minus05(x2rotσ

2x + y2rotσ

2y)]

cos (2πf + φ) (3)

where

xrot = x cos(θ) + y sin(θ)

yrot = minusx sin(θ) + y cos(θ)(4)

σx =nxf

σy =nyf

(5)

xrot and yrot are the orthogonal and parallel orientations relative to the grating θ is the angle of thegrating orientation f is the spatial frequency of the grating φ is the phase of the grating relative to theGaussian envelope and σx and σy are the standard deviations of the Gaussian envelope orthogonaland parallel to the grating which can be defined as multiples (nx and ny) of the grating cycle (inverseof the frequency)

Although the Gabor filter greatly reduces the number of parameters defining the linear spatialcomponent of V1 neurons it still has five parameters per channel Fortunately there is a vastliterature in neurophysiology with detailed characterizations of primate V1 response properties whichcan be used to constrain these parameters To instantiate a VOneBlock with CV 1 channels wesampled CV 1 values for each of the parameters according to an empirically constrained distribution(Table C1) Due to the resolution of the input images we limited the ranges of spatial frequencies(f lt 56cpd) and number of cycles (n gt 01)

Table C1 Empirical distributions used to sample the GFB parameters in the VOneBlock

Parameter Reference

θ De Valois et at [78] Fig 5 Orientation tuning (foveal)f De Valois et at [79] Fig 6 Peak spatial freq (X cells foveal)φ - Uniform distribution isin [0 360] degnx ny Ringach 2002 [80] Fig 4 (nx ny) joint distribution

Critical to the correct implementation of the biologically-constrained parameters of the GFB is thechoice of the modelrsquos field of view in degrees As previously mentioned in Section A we used 8degas the input spatial extent for all CNN models Itrsquos important to note that the set of spatial filters inthe VOneBlockrsquos GFB differs considerably from those in the first convolution in most CNNs (FigC1) While standard CNNs learn filters that resemble Gabors in their input layer [1 99 100] due thelimited span of their kernels they do not vary significantly in size and spatial frequency V1 neuronson the other hand are known to exhibit a wide range of receptive field properties This phenomena iscaptured in the VOneBlock with the spatial frequencies and receptive field sizes of Gabors rangingmore than one order of magnitude Due to this high variability we set the convolution kernel to be25times 25px which is considerably larger than those in standard CNNs (Fig C1)

24

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

VOneBlock ResNet50 CORnet-S AlexNet

7 px7 px 7 px7 px

Figure C1 Comparison between filters in the first convolution of standard CNNs and theVOneBlock Example filters from the first convolution in the VOneBlock ResNet50 CORnet-S andAlexNet (from left to right) VOneBlock filters are all parameterized as Gabors varying considerablyin size and spatial frequency Standard CNNs have some filters with shapes other than Gaborsparticularly center-surround and are limited in size by their small kernel Kernel sizes are 25px 7px7px and 11px for VOneBlock ResNet50 CORnet-S and AlexNet respectively

25

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

C2 Nonlinear layer

VOneBlockrsquos nonlinear layer has two different nonlinearities that are applied to each channel depend-ing on its cell type a rectified linear transformation for simple cells (6) and the spectral power of aquadrature phase-pair (7) for complex cells

Snlθfφnxny =

Slθfφnxny if Slθfφnxny gt 0

0 otherwise(6)

Cnlθfφnxny =1radic2

radic(Clθfφnxny )2 + (Clθfφ+π2nxny )2 (7)

where Sl and Snl are the linear and nonlinear responses of a simple neuron and Cl and Cnl are thesame for a complex neuron

C3 Neuronal stochasticity generator

In awake monkeys spike trains of V1 neurons are approximately Poisson ie the variance and meanof spike counts in a given time-window over a set of repetitions are roughly the same [82] Weincorporated stochasticity into the VOneBlock to emulate this property of neuronal responses Sincethe Poisson distribution is not continuous it breaks the gradients in a white box attack giving a falsesense of robustness [68] In order to avoid this situation and facilitate the evaluation of the modelrsquosreal robustness our neuronal stochasticity generator as implemented uses a continuous second-orderapproximation of Poisson noise by adding Gaussian noise with variance equal to the activation

Rs sim N (micro = Rns σ2 = Rns) (8)

where Rns and Rs are the non-stochastic and stochastic responses of a neuron

To approximate the same levels of neuronal stochasticity of primate V1 neurons it is critical that theVOneBlock activations are on the same range as the V1 neuronal responses (number of spike countsin a given time-window) Thus we applied an affine transformation to the activations so that both themean stimulus response and the mean baseline activity are the same as those of a population of V1neurons (Table C2 shows the mean responses and spontaneous activity of V1 neurons measured indifferent time-windows)

Table C2 Mean stimulus responses and spontaneous activity of a population of V1 neuronsfor different time-windows

Time-window Stimulus response Spontaneous activity[ms] [spikes] [spikes]

25 0324 007350 0655 0145

100 105 029200 138 058

Since the exact time-window that responses in V1 are integrated during visual perception is still anopen question we considered a time-window of 25ms (Fig C2 B) In order to keep the outputs ofthe VOneBlock on a range that does not deviate considerably from the typical range of activations inCNNs so that the model can be efficiently trained using standard training parameters we applied theinverse transformation to scale the outputs back to their original range after the stochastic layer

C4 Hyper-parameters and other design choices

We developed VOneNets from three different standard CNN architectures ResNet50 CORnet-S andAlexNet As previously mentioned we replaced the first block of each architecture by the VOneBlockand the transition layer to create the respective VOneNet Table C3 contains the layers that wereremoved for each base architecture Except CORnet-S the layers removed only contained a single

26

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

convolution nonlinearity and maxpool Since CORnet-S already had a pre-committed set of layersto V1 we replaced them by the VOneBlock The torchvision implementation of AlexNet containsa combined stride of eight in the removed layers (four in the convolution and two in the maxpool)followed by a stride of one in the second convolution (outside of the removed layers) In order to moreeasily adapt it to a VOneBlock with a stride of four we slightly adapted AlexNetrsquos architecture so thatit had strides of two in these three layers (first convolution first maxpool and second convolution)The results shown in Fig 2 were obtained using this modified architecture

Table C3 Layers removed from each base CNN architecture for creating the VOneNets

Architecture Layers removed

AlexNet ConvReLUMaxPoolResNet50 ConvBatch-NormReLUMaxPoolCORnet-S ConvBatch-NormReLUMaxPoolConvBatch-NormReLU

The VOneNet architecture was designed to have as few hyper-parameters as possible When possiblethese were either constrained by the base CNN architecture or by neurobiological data mostparameters of the GFB were instantiated by sampling from neuronal distributions of primate V1neurons the kernel size was set to 25times25px to capture the high variability in Gabor sizes like allother CNN models the spatial extent of the field of view was set to 8deg (Section A) Neverthelessthere were two hyper-parameters where the choice was rather arbitrary the number of channels inthe VOneBlock and the time-window for integrating V1 responses to scale the activations prior tothe stochasticity generator

Since primate V1 has neurons tuned to a wide range of spatial frequencies for each orientation andat any given location of the visual space [101] we expected that a large number of channels would berequired to cover all of the combinations of Gabor parameters For the VOneResNet50 architecturewe varied the number of channels in the VOneBlock between 64 and 1024 equally split betweensimple and complex cells and measured the performance of the several models after training (FigC2 A) Clean ImageNet and common image corruption performance improved with channel numberRemarkably the opposite happened for white box accuracy with variants of VOneResNet50 with 64and 128 channels achieving the highest robustness to white box attacks This result is particularlyinteresting since it had been shown that increasing the number of channels in all layers of a CNNimproves robustness to white box attacks in both standard and adversarially trained models [17]Therefore this shows that the improvement in white box robustness of VOneNets when compared totheir base models cannot be attributed to the higher number of channels For the main model we setthe channel number to be 512 as it offered a good compromise between the different benchmarksRegarding the time-window for integrating V1 responses we observed small effects on the modelsrsquoperformance when varying it between 25 and 200ms We chose 25ms for the main model due to asmall trend for higher robustness to white box attacks with shorter time-windows

102451225612864 channels

04

05

06

07

Acc

urac

y

25 50 100 200Activity time-window (ms)

04

05

06

07

Acc

urac

y

CleanPerturbationWhite boxCorruption

A Bwith 25 ms time-window with 512 channels

Figure C2 Exploration of VOneBlockrsquos hyper-parameters in the VOneResNet50 architec-ture A Performance of different VOneResNet50 models with varying number of channels in theVOneBlock B Same as in A but varying time-window for integrating neuronal activity Error-barsshow SD (n=3 seeds) Red arrows represent the values used in the main model

27

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

C5 Training details

We used PyTorch 041 and trained the model using ImageNet 2012 [34] Images were prepro-cessed (1) for trainingmdashrandom crop to 224 times 224 pixels and random flipping left and right (2)for validationmdashcentral crop to 224 times 224 pixels Preprocessing was followed by normalizationmdashsubtraction and division by [05 05 05] We used a batch size of 256 images and trained either on 2GPUs (NVIDIA Titan X GeForce 1080Ti) or 1 GPU (QuadroRTX6000 or V100) for 70 epochsWe use step learning rate scheduling 01 starting learning rate divided by 10 every 20 epochs Foroptimization we use Stochastic Gradient Descent with a weight decay 00001 momentum 09 and across-entropy loss between image labels and model predictions (logits)

Model weights and code are available at httpsgithubcomdicarlolabvonenet

28

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

D VOneBlock variant details

To investigate which components of the VOneBlock are responsible for the increased robustnesswe created six variants of the VOneBlock by removing or modifying one of its parts In all caseswe trained from scratch a corresponding variant of VOneResNet50 with the modified VOneBlockAll other properties of the model and components of the VOneBlock were left unmodified Herewe describe in more detail each of these variants (the name of the variant refers to the componentremoved)

bull V1 distributions the GFB parameters were sampled from uniform distributions with thesame domain as the empirical distributions used in the default VOneBlock (Table C1)bull Low SF when sampling the spatial frequencies for the GFB neurons with low peak spatial

frequencies (f lt 2cpd) were removed from the empirical distribution [79]bull High SF similar to the previous variant but removing neurons with high peak spatial

frequencies (f gt 2cpd)bull Complex NL complex cell channels were removed from the VOneBlock and replaced by

simple cell channelsbull Simple NL simple cell channels were removed from the VOneBlock and replaced by

complex cell channelsbull V1 stochasticity V1 stochasticity generator was removed from the VOneBlock

While all of the variants resulted in worse overall mean performance than the unmodified VOneRes-Net50 (Table 2) some improved specific benchmarks the variant without high SF had higher accuracyunder noise and blur corruptions (Table D1) the variant without V1 stochasticity had higher accuracyto clean images and images with noise and weather corruptions (Tables 2 and D1) and the variantwithout V1 distributions had higher accuracy to white box attacks with high Linfin perturbations (TableD2)

Table D1 Difference between the common image corruption accuracies of the severalVOneResNet50 variants and the unmodified VOneResNet50 Removal of simple cells reducedaccuracy to all common image corruption types Removal of some components of the VOneBlockimproved accuracy in specific common image corruption removing high SF consistently improvedperformance for noise and blur corruptions and removing V1 stochasticity improved performancefor noise and weather corruptions

Noise Blur

Component Gaussian Shot Impulse Defocus Glass Motion Zoomremoved [∆] [∆] [∆] [∆] [∆] [∆] [∆]

V1 dist -28plusmn09 -24plusmn07 -29plusmn11 -16plusmn01 04plusmn02 -08plusmn03 -02plusmn05

Low SF -14plusmn05 -10plusmn02 -15plusmn05 -06plusmn06 -11plusmn02 -12plusmn04 -05plusmn07

High SF 68plusmn13 73plusmn11 70plusmn13 56plusmn03 115plusmn03 37plusmn00 61plusmn01

Complex 10plusmn16 12plusmn13 07plusmn18 12plusmn02 -15plusmn01 02plusmn08 04plusmn11

Simple -61plusmn16 -54plusmn15 -64plusmn19 -41plusmn04 -42plusmn04 -49plusmn05 -54plusmn03

V1 stoch 22plusmn25 22plusmn25 21plusmn29 33plusmn01 -55plusmn01 09plusmn07 09plusmn01

Weather Digital

Component Snow Frost Fog Bright Contrast Elastic Pixelate JPEGremoved [∆] [∆] [∆] [∆] [∆] [∆] [∆] [∆]

V1 dist -02plusmn06 -09plusmn06 -19plusmn11 -08plusmn05 -14plusmn09 03plusmn04 -17plusmn03 08plusmn05

Low SF -04plusmn03 -06plusmn04 09plusmn03 -02plusmn02 -04plusmn05 -09plusmn05 -13plusmn02 -05plusmn04

High SF 14plusmn10 -07plusmn07 -95plusmn04 -52plusmn04 -44plusmn05 24plusmn02 19plusmn04 -46plusmn04

Complex 07plusmn04 04plusmn03 43plusmn02 13plusmn02 29plusmn02 -05plusmn02 -12plusmn00 -19plusmn05

Simple -50plusmn05 -90plusmn05 -44plusmn05 -39plusmn02 -48plusmn04 -39plusmn05 -35plusmn04 -18plusmn02

V1 stoch 25plusmn04 03plusmn03 120plusmn05 26plusmn03 100plusmn08 -28plusmn03 -40plusmn07 -40plusmn04

29

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

Table D2 Difference between the white box attack accuracies of the several VOneResNet50variants and the unmodified VOneResNet50 Removal of V1 stochasticity considerably reducedaccuracy to all white box attacks Removal of V1 distributions improved accuracy for the Linfin whitebox attack with high perturbation strength

PGD-L1 PGD-L2 PGD-LinfinComponent 40 160 015 060 11020 1255removed [∆] [∆] [∆] [∆] [∆] [∆]

V1 dist -05plusmn08 01plusmn11 -08plusmn02 06plusmn10 03plusmn03 30plusmn18Low SF -08plusmn12 -31plusmn11 -06plusmn10 -23plusmn08 -08plusmn13 -18plusmn10High SF -22plusmn09 -47plusmn26 -26plusmn15 -51plusmn12 -26plusmn14 -55plusmn11Complex -13plusmn09 -74plusmn16 -11plusmn09 -72plusmn12 -19plusmn10 -79plusmn12Simple -25plusmn08 -14plusmn14 -29plusmn11 -17plusmn12 -27plusmn12 -06plusmn08Noise -237plusmn09 -365plusmn08 -216plusmn05 -401plusmn04 -322plusmn04 -292plusmn04

30

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

  • Introduction
    • Related Work
      • Adversarial Robustness Correlates with V1 Explained Variance
      • VOneNet a Hybrid CNN with a V1 Neural Network Front-End
      • Results
        • Simulating a V1 at the front of CNNs improves robustness to white box attacks
        • VOneNets outperform state-of-the-art methods on a composite set of perturbations
        • All components of the VOneBlock contribute to improved model robustness
          • Discussion
          • Neural Explained Variance
          • Image Perturbations
            • White Box Adversarial Attacks
            • Additional attack controls
            • Common Image Corruptions
            • Other defense methods
              • VOneNet implementation details
                • Convolutional layer
                • Nonlinear layer
                • Neuronal stochasticity generator
                • Hyper-parameters and other design choices
                • Training details
                  • VOneBlock variant details
Page 25: bioRxiv - Simulating a Primary Visual Cortex at the Front of CNNs … · 2020. 6. 16. · 8MIT-IBM Watson AI Lab dapello@mit.edu tmarques@mit.edu Abstract Current state-of-the-art

VOneBlock ResNet50 CORnet-S AlexNet

7 px7 px 7 px7 px

Figure C1 Comparison between filters in the first convolution of standard CNNs and theVOneBlock Example filters from the first convolution in the VOneBlock ResNet50 CORnet-S andAlexNet (from left to right) VOneBlock filters are all parameterized as Gabors varying considerablyin size and spatial frequency Standard CNNs have some filters with shapes other than Gaborsparticularly center-surround and are limited in size by their small kernel Kernel sizes are 25px 7px7px and 11px for VOneBlock ResNet50 CORnet-S and AlexNet respectively

25

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

C2 Nonlinear layer

VOneBlockrsquos nonlinear layer has two different nonlinearities that are applied to each channel depend-ing on its cell type a rectified linear transformation for simple cells (6) and the spectral power of aquadrature phase-pair (7) for complex cells

Snlθfφnxny =

Slθfφnxny if Slθfφnxny gt 0

0 otherwise(6)

Cnlθfφnxny =1radic2

radic(Clθfφnxny )2 + (Clθfφ+π2nxny )2 (7)

where Sl and Snl are the linear and nonlinear responses of a simple neuron and Cl and Cnl are thesame for a complex neuron

C3 Neuronal stochasticity generator

In awake monkeys spike trains of V1 neurons are approximately Poisson ie the variance and meanof spike counts in a given time-window over a set of repetitions are roughly the same [82] Weincorporated stochasticity into the VOneBlock to emulate this property of neuronal responses Sincethe Poisson distribution is not continuous it breaks the gradients in a white box attack giving a falsesense of robustness [68] In order to avoid this situation and facilitate the evaluation of the modelrsquosreal robustness our neuronal stochasticity generator as implemented uses a continuous second-orderapproximation of Poisson noise by adding Gaussian noise with variance equal to the activation

Rs sim N (micro = Rns σ2 = Rns) (8)

where Rns and Rs are the non-stochastic and stochastic responses of a neuron

To approximate the same levels of neuronal stochasticity of primate V1 neurons it is critical that theVOneBlock activations are on the same range as the V1 neuronal responses (number of spike countsin a given time-window) Thus we applied an affine transformation to the activations so that both themean stimulus response and the mean baseline activity are the same as those of a population of V1neurons (Table C2 shows the mean responses and spontaneous activity of V1 neurons measured indifferent time-windows)

Table C2 Mean stimulus responses and spontaneous activity of a population of V1 neuronsfor different time-windows

Time-window Stimulus response Spontaneous activity[ms] [spikes] [spikes]

25 0324 007350 0655 0145

100 105 029200 138 058

Since the exact time-window that responses in V1 are integrated during visual perception is still anopen question we considered a time-window of 25ms (Fig C2 B) In order to keep the outputs ofthe VOneBlock on a range that does not deviate considerably from the typical range of activations inCNNs so that the model can be efficiently trained using standard training parameters we applied theinverse transformation to scale the outputs back to their original range after the stochastic layer

C4 Hyper-parameters and other design choices

We developed VOneNets from three different standard CNN architectures ResNet50 CORnet-S andAlexNet As previously mentioned we replaced the first block of each architecture by the VOneBlockand the transition layer to create the respective VOneNet Table C3 contains the layers that wereremoved for each base architecture Except CORnet-S the layers removed only contained a single

26

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

convolution nonlinearity and maxpool Since CORnet-S already had a pre-committed set of layersto V1 we replaced them by the VOneBlock The torchvision implementation of AlexNet containsa combined stride of eight in the removed layers (four in the convolution and two in the maxpool)followed by a stride of one in the second convolution (outside of the removed layers) In order to moreeasily adapt it to a VOneBlock with a stride of four we slightly adapted AlexNetrsquos architecture so thatit had strides of two in these three layers (first convolution first maxpool and second convolution)The results shown in Fig 2 were obtained using this modified architecture

Table C3 Layers removed from each base CNN architecture for creating the VOneNets

Architecture Layers removed

AlexNet ConvReLUMaxPoolResNet50 ConvBatch-NormReLUMaxPoolCORnet-S ConvBatch-NormReLUMaxPoolConvBatch-NormReLU

The VOneNet architecture was designed to have as few hyper-parameters as possible When possiblethese were either constrained by the base CNN architecture or by neurobiological data mostparameters of the GFB were instantiated by sampling from neuronal distributions of primate V1neurons the kernel size was set to 25times25px to capture the high variability in Gabor sizes like allother CNN models the spatial extent of the field of view was set to 8deg (Section A) Neverthelessthere were two hyper-parameters where the choice was rather arbitrary the number of channels inthe VOneBlock and the time-window for integrating V1 responses to scale the activations prior tothe stochasticity generator

Since primate V1 has neurons tuned to a wide range of spatial frequencies for each orientation andat any given location of the visual space [101] we expected that a large number of channels would berequired to cover all of the combinations of Gabor parameters For the VOneResNet50 architecturewe varied the number of channels in the VOneBlock between 64 and 1024 equally split betweensimple and complex cells and measured the performance of the several models after training (FigC2 A) Clean ImageNet and common image corruption performance improved with channel numberRemarkably the opposite happened for white box accuracy with variants of VOneResNet50 with 64and 128 channels achieving the highest robustness to white box attacks This result is particularlyinteresting since it had been shown that increasing the number of channels in all layers of a CNNimproves robustness to white box attacks in both standard and adversarially trained models [17]Therefore this shows that the improvement in white box robustness of VOneNets when compared totheir base models cannot be attributed to the higher number of channels For the main model we setthe channel number to be 512 as it offered a good compromise between the different benchmarksRegarding the time-window for integrating V1 responses we observed small effects on the modelsrsquoperformance when varying it between 25 and 200ms We chose 25ms for the main model due to asmall trend for higher robustness to white box attacks with shorter time-windows

102451225612864 channels

04

05

06

07

Acc

urac

y

25 50 100 200Activity time-window (ms)

04

05

06

07

Acc

urac

y

CleanPerturbationWhite boxCorruption

A Bwith 25 ms time-window with 512 channels

Figure C2 Exploration of VOneBlockrsquos hyper-parameters in the VOneResNet50 architec-ture A Performance of different VOneResNet50 models with varying number of channels in theVOneBlock B Same as in A but varying time-window for integrating neuronal activity Error-barsshow SD (n=3 seeds) Red arrows represent the values used in the main model

27

CC-BY 40 International licenseavailable under a(which was not certified by peer review) is the authorfunder who has granted bioRxiv a license to display the preprint in perpetuity It is made

The copyright holder for this preprintthis version posted October 22 2020 httpsdoiorg10110120200616154542doi bioRxiv preprint

C5 Training details


C.2 Nonlinear layer

The VOneBlock's nonlinear layer has two different nonlinearities, applied to each channel depending on its cell type: a rectified linear transformation for simple cells (6) and the spectral power of a quadrature phase-pair (7) for complex cells:

$$
S^{nl}_{\theta,f,\phi,n_x,n_y} =
\begin{cases}
S^{l}_{\theta,f,\phi,n_x,n_y} & \text{if } S^{l}_{\theta,f,\phi,n_x,n_y} > 0\\
0 & \text{otherwise}
\end{cases}
\qquad (6)
$$

$$
C^{nl}_{\theta,f,\phi,n_x,n_y} = \frac{1}{\sqrt{2}}\sqrt{\left(C^{l}_{\theta,f,\phi,n_x,n_y}\right)^2 + \left(C^{l}_{\theta,f,\phi+\pi/2,n_x,n_y}\right)^2}
\qquad (7)
$$

where $S^{l}$ and $S^{nl}$ are the linear and nonlinear responses of a simple neuron, and $C^{l}$ and $C^{nl}$ are the same for a complex neuron.
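In code, these two nonlinearities amount to the following (a minimal PyTorch sketch, not the repository's implementation; the function names are ours, the tensors stand for the linear responses of the convolutional layer, and the second argument of the complex-cell function is the response of the Gabor filter offset by π/2 in phase):

```python
import math
import torch

def simple_cell_nl(s_lin: torch.Tensor) -> torch.Tensor:
    # Eq. (6): half-wave rectification of the linear Gabor response.
    return torch.relu(s_lin)

def complex_cell_nl(c_lin: torch.Tensor, c_lin_quad: torch.Tensor) -> torch.Tensor:
    # Eq. (7): spectral power of a quadrature phase-pair, i.e. the energy of
    # two Gabor responses whose phases differ by pi/2.
    return torch.sqrt(c_lin ** 2 + c_lin_quad ** 2) / math.sqrt(2.0)
```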

C.3 Neuronal stochasticity generator

In awake monkeys, spike trains of V1 neurons are approximately Poisson, i.e., the variance and the mean of the spike counts in a given time-window over a set of repetitions are roughly the same [82]. We incorporated stochasticity into the VOneBlock to emulate this property of neuronal responses. Since the Poisson distribution is not continuous, it breaks the gradients in a white box attack, giving a false sense of robustness [68]. To avoid this situation and facilitate the evaluation of the model's real robustness, our neuronal stochasticity generator, as implemented, uses a continuous second-order approximation of Poisson noise, adding Gaussian noise with variance equal to the activation:

$$
R_s \sim \mathcal{N}\!\left(\mu = R_{ns},\ \sigma^2 = R_{ns}\right)
\qquad (8)
$$

where $R_{ns}$ and $R_s$ are the non-stochastic and stochastic responses of a neuron.

To approximate the levels of neuronal stochasticity of primate V1 neurons, it is critical that the VOneBlock activations be in the same range as the V1 neuronal responses (number of spike counts in a given time-window). Thus, we applied an affine transformation to the activations so that both the mean stimulus response and the mean baseline activity match those of a population of V1 neurons (Table C2 shows the mean responses and spontaneous activity of V1 neurons measured in different time-windows).

Table C2: Mean stimulus responses and spontaneous activity of a population of V1 neurons for different time-windows.

Time-window [ms]   Stimulus response [spikes]   Spontaneous activity [spikes]
25                 0.324                        0.073
50                 0.655                        0.145
100                1.05                         0.29
200                1.38                         0.58

Since the exact time-window over which V1 responses are integrated during visual perception is still an open question, we considered a time-window of 25 ms (Fig. C2 B). To keep the outputs of the VOneBlock in a range that does not deviate considerably from the typical range of activations in CNNs, so that the model can be efficiently trained using standard training parameters, we applied the inverse transformation to scale the outputs back to their original range after the stochastic layer.
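To make the scale-add-noise-rescale scheme concrete: with a 25 ms window the mean stimulus response is 0.324 spikes (Table C2), so the noise SD at that mean, √0.324 ≈ 0.57 spikes, exceeds the mean response itself. Below is a minimal sketch under the simplifying assumption of a purely multiplicative scaling; the actual transform is affine, also matching the mean baseline activity:

```python
import torch

def v1_stochasticity(r: torch.Tensor, scale: float = 1.0) -> torch.Tensor:
    """Continuous second-order approximation of Poisson noise (Eq. 8).

    `scale` is an illustrative stand-in for the affine transform that maps
    activations into the spike-count range of V1 neurons (Table C2).
    """
    r_spikes = r * scale
    # Variance equals the mean; clamp at zero so the SD is defined for any
    # non-positive activations.
    sd = torch.sqrt(torch.clamp(r_spikes, min=0.0))
    r_noisy = r_spikes + sd * torch.randn_like(r_spikes)
    # Inverse transformation: return outputs to the original activation range.
    return r_noisy / scale
```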

C.4 Hyper-parameters and other design choices

We developed VOneNets from three different standard CNN architectures: ResNet50, CORnet-S, and AlexNet. As previously mentioned, we replaced the first block of each architecture by the VOneBlock and the transition layer to create the respective VOneNet. Table C3 lists the layers that were removed from each base architecture. Except for CORnet-S, the removed layers contained only a single convolution, nonlinearity, and maxpool. Since CORnet-S already had a set of layers pre-committed to V1, we replaced those by the VOneBlock. The torchvision implementation of AlexNet has a combined stride of eight in the removed layers (four in the convolution and two in the maxpool), followed by a stride of one in the second convolution (outside of the removed layers). To more easily adapt it to a VOneBlock with a stride of four, we slightly modified AlexNet's architecture so that it had strides of two in these three layers (first convolution, first maxpool, and second convolution). The results shown in Fig. 2 were obtained using this modified architecture. A sketch of this block-replacement assembly follows Table C3.

Table C3: Layers removed from each base CNN architecture for creating the VOneNets.

Architecture   Layers removed
AlexNet        Conv, ReLU, MaxPool
ResNet50       Conv, Batch-Norm, ReLU, MaxPool
CORnet-S       Conv, Batch-Norm, ReLU, MaxPool, Conv, Batch-Norm, ReLU

The VOneNet architecture was designed to have as few hyper-parameters as possible. When possible, these were constrained either by the base CNN architecture or by neurobiological data: most parameters of the GFB were instantiated by sampling from neuronal distributions of primate V1 neurons; the kernel size was set to 25×25 px to capture the high variability in Gabor sizes; and, like all other CNN models, the spatial extent of the field of view was set to 8 deg (Section A). Nevertheless, there were two hyper-parameters where the choice was rather arbitrary: the number of channels in the VOneBlock and the time-window for integrating V1 responses to scale the activations prior to the stochasticity generator.

Since primate V1 has neurons tuned to a wide range of spatial frequencies for each orientation and at any given location of the visual space [101], we expected that a large number of channels would be required to cover all of the combinations of Gabor parameters. For the VOneResNet50 architecture, we varied the number of channels in the VOneBlock between 64 and 1024, equally split between simple and complex cells, and measured the performance of the resulting models after training (Fig. C2 A). Clean ImageNet and common image corruption performance improved with channel number. Remarkably, the opposite happened for white box accuracy, with the variants of VOneResNet50 with 64 and 128 channels achieving the highest robustness to white box attacks. This result is particularly interesting since it had been shown that increasing the number of channels in all layers of a CNN improves robustness to white box attacks in both standard and adversarially trained models [17]. Therefore, the improvement in white box robustness of VOneNets over their base models cannot be attributed to a higher number of channels. For the main model, we set the channel number to 512, as it offered a good compromise between the different benchmarks. Regarding the time-window for integrating V1 responses, we observed only small effects on the models' performance when varying it between 25 and 200 ms. We chose 25 ms for the main model due to a small trend toward higher robustness to white box attacks with shorter time-windows.

[Figure C2: two panels showing accuracy (0.4 to 0.7) on clean images, white box attacks, common image corruptions, and the overall perturbation benchmark. A, accuracy as a function of the number of VOneBlock channels (64, 128, 256, 512, 1024) with a 25 ms time-window; B, accuracy as a function of the activity time-window (25, 50, 100, 200 ms) with 512 channels.]

Figure C2: Exploration of the VOneBlock's hyper-parameters in the VOneResNet50 architecture. A, Performance of different VOneResNet50 models with varying number of channels in the VOneBlock. B, Same as in A, but varying the time-window for integrating neuronal activity. Error-bars show SD (n=3 seeds). Red arrows represent the values used in the main model.


C.5 Training details

We used PyTorch 0.4.1 and trained the models on ImageNet 2012 [34]. Images were preprocessed (1) for training, with a random crop to 224×224 pixels and random left-right flipping, and (2) for validation, with a central crop to 224×224 pixels. Preprocessing was followed by normalization: subtraction of and division by [0.5, 0.5, 0.5]. We used a batch size of 256 images and trained either on 2 GPUs (NVIDIA Titan X, GeForce 1080Ti) or 1 GPU (Quadro RTX 6000 or V100) for 70 epochs. We used step learning rate scheduling: a starting learning rate of 0.1, divided by 10 every 20 epochs. For optimization, we used Stochastic Gradient Descent with weight decay 0.0001, momentum 0.9, and a cross-entropy loss between image labels and model predictions (logits).

Model weights and code are available at https://github.com/dicarlolab/vonenet
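A minimal sketch of this recipe in PyTorch (assuming standard torchvision transforms; the exact crop op is an assumption, the authors' actual training script lives in the repository above, and a plain ResNet50 stands in for the model here):

```python
import torch
import torch.nn as nn
import torchvision.transforms as T
from torchvision import models

normalize = T.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
train_transform = T.Compose([
    T.RandomResizedCrop(224),     # random crop to 224 x 224 for training
    T.RandomHorizontalFlip(),     # random left/right flip
    T.ToTensor(),
    normalize,                    # subtract and divide by [0.5, 0.5, 0.5]
])
val_transform = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),            # central crop to 224 x 224 for validation
    T.ToTensor(),
    normalize,
])

model = models.resnet50()          # stand-in for the VOneNet being trained
criterion = nn.CrossEntropyLoss()  # cross-entropy on the logits
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)
# Step schedule: start at 0.1, divide by 10 every 20 epochs
# (70 epochs total, batch size 256 in the DataLoader).
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)
```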


D VOneBlock variant details

To investigate which components of the VOneBlock are responsible for the increased robustness, we created six variants of the VOneBlock by removing or modifying one of its parts. In all cases, we trained from scratch a corresponding variant of VOneResNet50 with the modified VOneBlock. All other properties of the model and components of the VOneBlock were left unmodified. Here, we describe each of these variants in more detail (the name of the variant refers to the component removed):

• V1 distributions: the GFB parameters were sampled from uniform distributions with the same domain as the empirical distributions used in the default VOneBlock (Table C1).
• Low SF: when sampling the spatial frequencies for the GFB, neurons with low peak spatial frequencies (f < 2 cpd) were removed from the empirical distribution [79].
• High SF: similar to the previous variant, but removing neurons with high peak spatial frequencies (f > 2 cpd).
• Complex NL: complex cell channels were removed from the VOneBlock and replaced by simple cell channels.
• Simple NL: simple cell channels were removed from the VOneBlock and replaced by complex cell channels.
• V1 stochasticity: the V1 stochasticity generator was removed from the VOneBlock.

While all of the variants resulted in worse overall mean performance than the unmodified VOneResNet50 (Table 2), some improved on specific benchmarks: the variant without high SF had higher accuracy under noise and blur corruptions (Table D1); the variant without V1 stochasticity had higher accuracy on clean images and on images with noise and weather corruptions (Tables 2 and D1); and the variant without V1 distributions had higher accuracy against white box attacks with high L∞ perturbations (Table D2).

Table D1: Difference between the common image corruption accuracies of the several VOneResNet50 variants and the unmodified VOneResNet50. Removal of simple cells reduced accuracy for all common image corruption types. Removal of some components of the VOneBlock improved accuracy for specific common image corruptions: removing high SF consistently improved performance for noise and blur corruptions, and removing V1 stochasticity improved performance for noise and weather corruptions.

                         Noise                            Blur
Component    Gaussian   Shot      Impulse   Defocus   Glass     Motion    Zoom
removed      [∆]        [∆]       [∆]       [∆]       [∆]       [∆]       [∆]
V1 dist      -2.8±0.9   -2.4±0.7  -2.9±1.1  -1.6±0.1   0.4±0.2  -0.8±0.3  -0.2±0.5
Low SF       -1.4±0.5   -1.0±0.2  -1.5±0.5  -0.6±0.6  -1.1±0.2  -1.2±0.4  -0.5±0.7
High SF       6.8±1.3    7.3±1.1   7.0±1.3   5.6±0.3  11.5±0.3   3.7±0.0   6.1±0.1
Complex       1.0±1.6    1.2±1.3   0.7±1.8   1.2±0.2  -1.5±0.1   0.2±0.8   0.4±1.1
Simple       -6.1±1.6   -5.4±1.5  -6.4±1.9  -4.1±0.4  -4.2±0.4  -4.9±0.5  -5.4±0.3
V1 stoch      2.2±2.5    2.2±2.5   2.1±2.9   3.3±0.1  -5.5±0.1   0.9±0.7   0.9±0.1

                        Weather                           Digital
Component    Snow      Frost     Fog       Bright    Contrast  Elastic   Pixelate  JPEG
removed      [∆]       [∆]       [∆]       [∆]       [∆]       [∆]       [∆]       [∆]
V1 dist      -0.2±0.6  -0.9±0.6  -1.9±1.1  -0.8±0.5  -1.4±0.9   0.3±0.4  -1.7±0.3   0.8±0.5
Low SF       -0.4±0.3  -0.6±0.4   0.9±0.3  -0.2±0.2  -0.4±0.5  -0.9±0.5  -1.3±0.2  -0.5±0.4
High SF       1.4±1.0  -0.7±0.7  -9.5±0.4  -5.2±0.4  -4.4±0.5   2.4±0.2   1.9±0.4  -4.6±0.4
Complex       0.7±0.4   0.4±0.3   4.3±0.2   1.3±0.2   2.9±0.2  -0.5±0.2  -1.2±0.0  -1.9±0.5
Simple       -5.0±0.5  -9.0±0.5  -4.4±0.5  -3.9±0.2  -4.8±0.4  -3.9±0.5  -3.5±0.4  -1.8±0.2
V1 stoch      2.5±0.4   0.3±0.3  12.0±0.5   2.6±0.3  10.0±0.8  -2.8±0.3  -4.0±0.7  -4.0±0.4


Table D2: Difference between the white box attack accuracies of the several VOneResNet50 variants and the unmodified VOneResNet50. Removal of V1 stochasticity considerably reduced accuracy for all white box attacks. Removal of V1 distributions improved accuracy for the L∞ white box attack with high perturbation strength.

             PGD-L1                PGD-L2                PGD-L∞
Component    40         160        0.15       0.60       1/1020     1/255
removed      [∆]        [∆]        [∆]        [∆]        [∆]        [∆]
V1 dist      -0.5±0.8    0.1±1.1   -0.8±0.2    0.6±1.0    0.3±0.3    3.0±1.8
Low SF       -0.8±1.2   -3.1±1.1   -0.6±1.0   -2.3±0.8   -0.8±1.3   -1.8±1.0
High SF      -2.2±0.9   -4.7±2.6   -2.6±1.5   -5.1±1.2   -2.6±1.4   -5.5±1.1
Complex      -1.3±0.9   -7.4±1.6   -1.1±0.9   -7.2±1.2   -1.9±1.0   -7.9±1.2
Simple       -2.5±0.8   -1.4±1.4   -2.9±1.1   -1.7±1.2   -2.7±1.2   -0.6±0.8
Noise       -23.7±0.9  -36.5±0.8  -21.6±0.5  -40.1±0.4  -32.2±0.4  -29.2±0.4
